UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.
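
As a hedged illustration (this helper is not part of the original text, and
the kernel also provides an IS_ALIGNED() macro for power-of-two sizes), the
natural alignment rule can be expressed directly in C:

	/* Illustrative only: non-zero if ptr is naturally aligned for an
	 * N-byte access, i.e. addr % N == 0. */
	#define NATURALLY_ALIGNED(ptr, N) \
		(((unsigned long)(ptr) % (N)) == 0)

	/* A 4-byte load from p is naturally aligned only if
	 * NATURALLY_ALIGNED(p, sizeof(u32)) holds. */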


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access to the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding in between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is:

	struct foo {
		u32 field2;
		u16 field1;
		u8 field3;
	};

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.
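
The padding behaviour of the two layouts can be verified with a small
userspace sketch (not part of the original document; it uses the standard
uint*_t types as stand-ins for the kernel's u8/u16/u32, and the exact sizes
assume a typical ABI with natural alignment):

	#include <stddef.h>
	#include <stdint.h>
	#include <stdio.h>

	struct foo_padded {		/* original field order */
		uint16_t field1;	/* offset 0 */
		uint32_t field2;	/* offset 4: 2 bytes of padding before it */
		uint8_t  field3;	/* offset 8 */
	};				/* sizeof is 12: 3 bytes of tail padding */

	struct foo_reordered {		/* reordered as described above */
		uint32_t field2;	/* offset 0 */
		uint16_t field1;	/* offset 4 */
		uint8_t  field3;	/* offset 6 */
	};				/* sizeof is 8: 1 byte of tail padding */

	int main(void)
	{
		printf("padded:    offsetof(field2)=%zu sizeof=%zu\n",
		       offsetof(struct foo_padded, field2),
		       sizeof(struct foo_padded));
		printf("reordered: offsetof(field2)=%zu sizeof=%zu\n",
		       offsetof(struct foo_reordered, field2),
		       sizeof(struct foo_reordered));
		return 0;
	}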

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within the structure, which is useful when you want to
use a C struct to represent data that arrives in a fixed arrangement 'off
the wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. Of course,
these extra instructions cause a loss in performance compared to the
non-packed case, so the packed attribute should only be used when avoiding
structure padding is important.
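
For example, a hedged sketch of a packed wire-format structure (the header
layout below is hypothetical, not taken from kernel code) looks like this:

	/* Hypothetical on-the-wire header: packed, so no padding is
	 * inserted and 'len' sits at offset 1, which is not naturally
	 * aligned for a 4-byte access. */
	struct wire_hdr {
		u8  type;	/* offset 0 */
		u32 len;	/* offset 1 */
		u16 flags;	/* offset 5 */
	} __attribute__((packed));

	/* Reading hdr->len through this declaration is safe: the compiler
	 * knows the field may be misaligned and emits suitable code, at
	 * some cost on strict-alignment architectures. */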


Code that causes unaligned access
=================================

With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, adapted
from include/linux/etherdevice.h, is an optimized routine to compare two
ethernet MAC addresses for equality.

	unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
	{
		const u16 *a = (const u16 *) addr1;
		const u16 *b = (const u16 *) addr2;
		return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
	}

In the above function, the reference to a[0] causes 2 bytes (16 bits) to
be read from memory starting at address addr1. Think about what would happen
if addr1 was an odd address such as 0x10003. (Hint: it'd be an unaligned
access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway, but is understood to only work on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
because it is a decent optimization for the cases where you can ensure
alignment, which is almost always the case in the context of ethernet
networking.


Here is another example of some code that could cause unaligned accesses:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access
problems involve:
 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data (see
    the sketch below)
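
To make the second scenario concrete, here is a hedged sketch (the frame
layout is invented for illustration) of the kind of parsing code that runs
into trouble:

	/* Hypothetical frame: a 1-byte type followed by a 4-byte big-endian
	 * length. 'buf' is an arbitrary byte pointer, so buf + 1 has no
	 * particular alignment. */
	u32 parse_frame_len(const u8 *buf)
	{
		/* BAD: casting a byte pointer at offset 1 to __be32 *
		 * produces an unaligned 4-byte load on many architectures. */
		return be32_to_cpu(*(const __be32 *)(buf + 1));
	}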


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

To avoid the unaligned memory access, you would rewrite it as follows:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		put_unaligned(value, (u32 *) data);
		[...]
	}

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows:

	u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly in
terms of performance.
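
Returning to the hypothetical frame-parsing sketch from the previous section,
the unaligned load can be avoided like this (still illustrative code, not
taken from the kernel):

	u32 parse_frame_len(const u8 *buf)
	{
		/* get_unaligned() performs the 4-byte read safely even
		 * though buf + 1 has no particular alignment. */
		return be32_to_cpu(get_unaligned((const __be32 *)(buf + 1)));
	}

Recent kernels also provide combined helpers such as get_unaligned_be32(),
which fold the byte-order conversion into the unaligned access.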

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
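
For example, the earlier myfunc() could equally be written with memcpy()
(a sketch; the surrounding [...] is unchanged from the original example):

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		/* byte-wise copy: the compiler will not emit a multi-byte
		 * store to the possibly-unaligned address in 'data' */
		memcpy(data, &value, sizeof(value));
		[...]
	}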


Alignment vs. Networking
========================

On architectures that require aligned loads, networking requires that the IP
header is aligned on a four-byte boundary to optimise the IP stack. For
regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
architectures this constant has the value 2 because the normal ethernet
header is 14 bytes long, so in order to get proper alignment one needs to
DMA to an address which can be expressed as 4*n + 2. One notable exception
here is powerpc which defines NET_IP_ALIGN to 0 because DMA to unaligned
addresses can be very expensive and dwarf the cost of unaligned loads.

Some ethernet hardware cannot DMA to unaligned addresses such as 4*n + 2,
and some non-ethernet hardware has similar restrictions; in those cases the
incoming frame must be copied into an aligned buffer. Because this copy is
unnecessary on architectures that can perform unaligned accesses efficiently,
the code can be made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
like so:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	skb = original skb
#else
	skb = copy skb
#endif
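
In driver receive paths, the usual way to obtain the 4*n + 2 alignment is to
reserve NET_IP_ALIGN bytes at the head of the receive buffer before DMA or
copying begins. A minimal sketch (error handling omitted; 'dev' and
RX_BUF_SIZE stand in for the driver's own net_device and buffer size):

	struct sk_buff *skb;

	skb = netdev_alloc_skb(dev, RX_BUF_SIZE + NET_IP_ALIGN);
	if (skb)
		/* shift skb->data so the IP header, which starts 14 bytes
		 * into an ethernet frame, lands on a 4-byte boundary */
		skb_reserve(skb, NET_IP_ALIGN);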

--
Authors: Daniel Drake <dsd@gentoo.org>,
         Johannes Berg <johannes@sipsolutions.net>
With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
Vadim Lobanov
