author     Daniel Drake <dsd@gentoo.org>  2008-02-06 04:37:30 -0500
committer  Linus Torvalds <torvalds@woody.linux-foundation.org>  2008-02-06 13:41:07 -0500
commit     d156042f9fdffcb0171dc20f0d8b6df3fbf779c4
tree       891b779de32def2568f2bd2dcaa35866b0f18a79 /Documentation/unaligned-memory-access.txt
parent     0d71bd5993b630a989d15adc2562a9ffe41cd26d
Documentation about unaligned memory access
Here's a document I wrote after figuring out what unaligned memory access
is all about. I've tried to cover the information I was looking for when
trying to learn about this, without producing a hopelessly detailed/complex
spew. I hope it is useful to others.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Cc: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Jan Engelhardt <jengelh@computergmbh.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Kyle Moffett <mrmacman_g4@mac.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/unaligned-memory-access.txt')
-rw-r--r--  Documentation/unaligned-memory-access.txt  226
1 files changed, 226 insertions, 0 deletions
diff --git a/Documentation/unaligned-memory-access.txt b/Documentation/unaligned-memory-access.txt
new file mode 100644
index 000000000000..6223eace3c09
--- /dev/null
+++ b/Documentation/unaligned-memory-access.txt
@@ -0,0 +1,226 @@
UNALIGNED MEMORY ACCESSES
=========================

Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read or write N bytes of data
starting from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.
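
The rule can be written down directly in C. The following is a minimal
userspace sketch rather than kernel code: the helper name is_aligned() is
hypothetical, and addresses are passed as plain uintptr_t integers so that no
memory is actually touched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if an access of n bytes starting at
 * addr is aligned in the sense defined above, i.e. addr % n == 0.
 */
static bool is_aligned(uintptr_t addr, size_t n)
{
	return addr % n == 0;
}
```

With the addresses from the text, is_aligned(0x10004, 4) is true and
is_aligned(0x10005, 4) is false, while any address passes when n is 1.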

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
When accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access from the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
memory addresses of certain variables, etc.

Fortunately, things are not too complex: in most cases, the compiler
ensures that things will work for you. For example, take the following
structure:

	struct foo {
		u16 field1;
		u32 field2;
		u8 field3;
	};

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4-byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of different length).
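
The padding can be observed with offsetof(). This is a userspace sketch that
assumes u8, u16 and u32 map onto the <stdint.h> fixed-width types; the offsets
noted in the comments are those of a typical ABI where u32 is 4-byte aligned,
and other ABIs may differ. field2_offset() and padding_before_field2() are
hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: kernel-style type names mapped onto <stdint.h> types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

struct foo {
	u16 field1;	/* offset 0 */
	u32 field2;	/* typically offset 4: 2 bytes of padding precede it */
	u8 field3;	/* typically offset 8 */
};

/* Byte offset of field2 within struct foo. */
static size_t field2_offset(void)
{
	return offsetof(struct foo, field2);
}

/* Padding the compiler inserted between field1 and field2. */
static size_t padding_before_field2(void)
{
	return offsetof(struct foo, field2) - sizeof(u16);
}
```

On such an ABI, field2_offset() returns 4 and padding_before_field2()
returns 2.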

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is:

	struct foo {
		u32 field2;
		u16 field1;
		u8 field3;
	};

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.
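
Comparing the two layouts with sizeof() makes the saving visible. This
userspace sketch assumes u8, u16 and u32 map onto the <stdint.h> types;
struct foo_reordered is a hypothetical name used only so that both layouts
can coexist in one file.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: kernel-style type names mapped onto <stdint.h> types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Original field order: interior padding after field1, tail padding after field3. */
struct foo {
	u16 field1;
	u32 field2;
	u8 field3;
};

/* Largest member first: no interior padding is needed. */
struct foo_reordered {
	u32 field2;	/* offset 0 */
	u16 field1;	/* offset 4 */
	u8 field3;	/* offset 6, followed by 1 byte of tail padding */
};
```

Where u32 is 4-byte aligned, sizeof(struct foo) is 12 and
sizeof(struct foo_reordered) is 8, so the reordering saves 4 bytes per
instance.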

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, which is useful when you want to use a
C struct to represent some data that comes in a fixed arrangement 'off the
wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. Of course,
the extra instructions cause a loss in performance compared to the
non-packed case, so the packed attribute should only be used when avoiding
structure padding is important.
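
A short userspace sketch of the packed case follows, assuming u8, u16 and u32
map onto the <stdint.h> types. It uses the GCC attribute named above (Clang
accepts it too); struct foo_packed and read_field2() are hypothetical names.
field2 ends up at offset 2, and reading it through the structure leaves it to
the compiler to generate a safe access where the architecture needs one.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: kernel-style type names mapped onto <stdint.h> types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* No padding: field1 at 0, field2 at 2, field3 at 6, total size 7. */
struct foo_packed {
	u16 field1;
	u32 field2;
	u8 field3;
} __attribute__((packed));

/*
 * The member access goes through the packed type, so the compiler knows
 * field2 may be misaligned and emits a suitable access sequence.
 */
static u32 read_field2(const struct foo_packed *p)
{
	return p->field2;
}
```

The compiler can only do this while it knows it is accessing a packed member;
a plain u32 pointer taken from &p->field2 no longer carries that information.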


Code that causes unaligned access
=================================

With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, adapted
from include/linux/etherdevice.h, is an optimized routine to compare two
Ethernet MAC addresses for equality. It returns 0 when the addresses are
equal and 1 otherwise.

	unsigned int compare_ether_addr(const u8 *addr1, const u8 *addr2)
	{
		const u16 *a = (const u16 *) addr1;
		const u16 *b = (const u16 *) addr2;
		return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) != 0;
	}

In the above function, the reference to a[0] causes 2 bytes (16 bits) to
be read from memory starting at address addr1. Think about what would happen
if addr1 was an odd address such as 0x10003. (Hint: it'd be an unaligned
access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway but is understood to only work on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in the Ethernet networking context.
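
Where alignment cannot be guaranteed, the same comparison can be done with
single-byte accesses only. The sketch below is a hypothetical alternative
written for illustration, not a kernel function: mac_addrs_differ() and
MAC_LEN are assumed names, u8 is taken from <stdint.h>, and it keeps the
convention of returning 0 for equal addresses.

```c
#include <stdint.h>

/* Assumption: kernel-style type name mapped onto <stdint.h>. */
typedef uint8_t u8;

#define MAC_LEN 6	/* hypothetical name for the 6-byte MAC length */

/*
 * Returns 0 if the two MAC addresses are equal, 1 otherwise. Every
 * access is a single byte, so addr1 and addr2 may have any alignment.
 */
static unsigned int mac_addrs_differ(const u8 *addr1, const u8 *addr2)
{
	u8 diff = 0;

	for (int i = 0; i < MAC_LEN; i++)
		diff |= addr1[i] ^ addr2[i];
	return diff != 0;
}
```

The trade-off is six byte loads per address instead of three 16-bit loads,
which is exactly the cost compare_ether_addr() avoids by requiring 16-bit
alignment. memcmp(addr1, addr2, 6) != 0 would be an equally alignment-safe
choice.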

Here is another example of some code that could cause unaligned accesses:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access
problems involve:
 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

To avoid the unaligned memory access, you would rewrite it as follows:

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		put_unaligned(value, (u32 *) data);
		[...]
	}
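
For comparison, the same little-endian store can be written with no pointer
cast at all, by storing each byte explicitly. This is a hypothetical userspace
helper (store_le32() is not an API described in this document) that combines
the effect of cpu_to_le32() and put_unaligned() in the rewrite above: it
produces little-endian bytes on any host and works for any alignment of data.

```c
#include <stdint.h>

/* Assumption: kernel-style type names mapped onto <stdint.h> types. */
typedef uint8_t  u8;
typedef uint32_t u32;

/*
 * Write value to data[0..3] in little-endian byte order using only
 * single-byte stores, so data may have any alignment.
 */
static void store_le32(u8 *data, u32 value)
{
	data[0] = (u8)(value);
	data[1] = (u8)(value >> 8);
	data[2] = (u8)(value >> 16);
	data[3] = (u8)(value >> 24);
}
```

Calling store_le32(data, value) would leave the same bytes at data as the two
lines in myfunc() above, without converting value in place.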

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows:

	u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly in
terms of performance.
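
To illustrate how one pair of accessors can serve every access length, here
is a hypothetical userspace sketch. GET_UNALIGNED() and PUT_UNALIGNED() are
illustrative names, not the contents of <asm/unaligned.h>; they rely on the
GCC/Clang extensions __typeof__ and statement expressions, and they move the
bytes with memcpy(), which makes no alignment assumptions. The pointer's
target type decides how many bytes are copied, mirroring the (u32 *) casts
used above.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read *ptr without assuming it is aligned. The temporary takes its type
 * from *ptr, so ptr must point to a non-const type.
 */
#define GET_UNALIGNED(ptr) ({ \
	__typeof__(*(ptr)) gu_val_; \
	memcpy(&gu_val_, (ptr), sizeof(gu_val_)); \
	gu_val_; \
})

/* Store val at ptr without assuming ptr is aligned. */
#define PUT_UNALIGNED(val, ptr) do { \
	__typeof__(*(ptr)) pu_val_ = (val); \
	memcpy((ptr), &pu_val_, sizeof(pu_val_)); \
} while (0)
```

For example, GET_UNALIGNED((uint32_t *) data) copies four bytes from data,
while PUT_UNALIGNED(v, (uint16_t *) data) writes two.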

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
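
As an example of the memcpy() approach, the sketch below parses a hypothetical
record whose fields are laid out without padding. Pointer arithmetic therefore
places a u32 at offset 1 and a u16 at offset 5, which is the second scenario
listed earlier. The record format, struct record and parse_record() are
illustrative assumptions, u8/u16/u32 are mapped onto <stdint.h>, and byte
order conversion is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: kernel-style type names mapped onto <stdint.h> types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical wire format: 1-byte type, u32 at offset 1, u16 at offset 5. */
struct record {
	u8 type;
	u32 length;
	u16 flags;
};

/*
 * memcpy() makes no alignment assumption about its arguments, so reading
 * from wire + 1 and wire + 5 is safe even though those addresses are not
 * suitably aligned for u32 and u16. Values are kept in host byte order.
 */
static void parse_record(const u8 *wire, struct record *out)
{
	out->type = wire[0];
	memcpy(&out->length, wire + 1, sizeof(out->length));
	memcpy(&out->flags, wire + 5, sizeof(out->flags));
}
```

Casting wire + 1 to (u32 *) and dereferencing it instead would be exactly the
kind of unaligned access this document warns about.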

--
Author: Daniel Drake <dsd@gentoo.org>
With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
Johannes Berg, Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock,
Uli Kunitz, Vadim Lobanov