author | Grant Likely <grant.likely@secretlab.ca> | 2011-11-04 11:51:22 -0400 |
---|---|---|
committer | Grant Likely <grant.likely@secretlab.ca> | 2012-03-29 21:13:50 -0400 |
commit | 31134efc681a5440e2b952eed3bf9a5306a95062 (patch) | |
tree | a84bf836230d073995712a8e4148a6b7aaf6cc5d /Documentation | |
parent | 83619ea08e9abe0f5ebcfc569a829d1105a1685e (diff) |
dt: Linux DT usage model documentation
v2: 2nd draft
- Editorial cleanups (Randy Dunlap and Stephen Warren)
- Added missing Microblaze reference (Stephen Neuendorffer)
- Make example of platform_device creation clearer (Shawn Guo)
- Expand on PowerPC history and mention i2c mess (David Gibson)
- Convert to plain text (remove bits of HTML formatting)
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/devicetree/usage-model.txt | 412 |
1 files changed, 412 insertions, 0 deletions
diff --git a/Documentation/devicetree/usage-model.txt b/Documentation/devicetree/usage-model.txt
new file mode 100644
index 000000000000..c5a80099b71c
--- /dev/null
+++ b/Documentation/devicetree/usage-model.txt
@@ -0,0 +1,412 @@
1 | Linux and the Device Tree | ||
2 | ------------------------- | ||
3 | The Linux usage model for device tree data | ||
4 | |||
5 | Author: Grant Likely <grant.likely@secretlab.ca> | ||
6 | |||
7 | This article describes how Linux uses the device tree. An overview of | ||
8 | the device tree data format can be found on the device tree usage page | ||
9 | at devicetree.org[1]. | ||
10 | |||
11 | [1] http://devicetree.org/Device_Tree_Usage | ||
12 | |||
13 | The "Open Firmware Device Tree", or simply Device Tree (DT), is a data | ||
14 | structure and language for describing hardware. More specifically, it | ||
15 | is a description of hardware that is readable by an operating system | ||
16 | so that the operating system doesn't need to hard code details of the | ||
17 | machine. | ||
18 | |||
19 | Structurally, the DT is a tree, or acyclic graph with named nodes, and | ||
20 | nodes may have an arbitrary number of named properties encapsulating | ||
21 | arbitrary data. A mechanism also exists to create arbitrary | ||
22 | links from one node to another outside of the natural tree structure. | ||
23 | |||
24 | Conceptually, a common set of usage conventions, called 'bindings', | ||
25 | is defined for how data should appear in the tree to describe typical | ||
26 | hardware characteristics including data busses, interrupt lines, GPIO | ||
27 | connections, and peripheral devices. | ||
28 | |||
29 | As much as possible, hardware is described using existing bindings to | ||
30 | maximize use of existing support code, but since property and node | ||
31 | names are simply text strings, it is easy to extend existing bindings | ||
32 | or create new ones by defining new nodes and properties. Be wary, | ||
33 | however, of creating a new binding without first doing some homework | ||
34 | about what already exists. There are currently two different, | ||
35 | incompatible, bindings for i2c busses that came about because the new | ||
36 | binding was created without first investigating how i2c devices were | ||
37 | already being enumerated in existing systems. | ||
38 | |||
39 | 1. History | ||
40 | ---------- | ||
41 | The DT was originally created by Open Firmware as part of the | ||
42 | communication method for passing data from Open Firmware to a client | ||
43 | program (such as an operating system). An operating system used the | ||
44 | Device Tree to discover the topology of the hardware at runtime, and | ||
45 | thereby support a majority of available hardware without hard coded | ||
46 | information (assuming drivers were available for all devices). | ||
47 | |||
48 | Since Open Firmware is commonly used on PowerPC and SPARC platforms, | ||
49 | the Linux support for those architectures has for a long time used the | ||
50 | Device Tree. | ||
51 | |||
52 | In 2005, when PowerPC Linux began a major cleanup and a merge of 32-bit | ||
53 | and 64-bit support, the decision was made to require DT support on all | ||
54 | powerpc platforms, regardless of whether or not they used Open | ||
55 | Firmware. To do this, a DT representation called the Flattened Device | ||
56 | Tree (FDT) was created which could be passed to the kernel as a binary | ||
57 | blob without requiring a real Open Firmware implementation. U-Boot, | ||
58 | kexec, and other bootloaders were modified to support both passing a | ||
59 | Device Tree Binary (dtb) and modifying a dtb at boot time. DT was | ||
60 | also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that | ||
61 | a dtb could be wrapped up with the kernel image to support booting | ||
62 | existing non-DT-aware firmware. | ||
63 | |||
64 | Some time later, FDT infrastructure was generalized to be usable by | ||
65 | all architectures. At the time of this writing, 6 mainlined | ||
66 | architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1 | ||
67 | out of mainline (nios) have some level of DT support. | ||
68 | |||
69 | 2. Data Model | ||
70 | ------------- | ||
71 | If you haven't already read the Device Tree Usage[1] page, | ||
72 | then go read it now. It's okay, I'll wait.... | ||
73 | |||
74 | 2.1 High Level View | ||
75 | ------------------- | ||
76 | The most important thing to understand is that the DT is simply a data | ||
77 | structure that describes the hardware. There is nothing magical about | ||
78 | it, and it doesn't magically make all hardware configuration problems | ||
79 | go away. What it does do is provide a language for decoupling the | ||
80 | hardware configuration from the board and device driver support in the | ||
81 | Linux kernel (or any other operating system for that matter). Using | ||
82 | it allows board and device support to become data driven; to make | ||
83 | setup decisions based on data passed into the kernel instead of on | ||
84 | per-machine hard coded selections. | ||
85 | |||
86 | Ideally, data driven platform setup should result in less code | ||
87 | duplication and make it easier to support a wide range of hardware | ||
88 | with a single kernel image. | ||
89 | |||
90 | Linux uses DT data for three major purposes: | ||
91 | 1) platform identification, | ||
92 | 2) runtime configuration, and | ||
93 | 3) device population. | ||
94 | |||
95 | 2.2 Platform Identification | ||
96 | --------------------------- | ||
97 | First and foremost, the kernel will use data in the DT to identify the | ||
98 | specific machine. In a perfect world, the specific platform shouldn't | ||
99 | matter to the kernel because all platform details would be described | ||
100 | perfectly by the device tree in a consistent and reliable manner. | ||
101 | Hardware is not perfect though, and so the kernel must identify the | ||
102 | machine during early boot so that it has the opportunity to run | ||
103 | machine-specific fixups. | ||
104 | |||
105 | In the majority of cases, the machine identity is irrelevant, and the | ||
106 | kernel will instead select setup code based on the machine's core | ||
107 | CPU or SoC. On ARM for example, setup_arch() in | ||
108 | arch/arm/kernel/setup.c will call setup_machine_fdt() in | ||
109 | arch/arm/kernel/devicetree.c which searches through the machine_desc | ||
110 | table and selects the machine_desc which best matches the device tree | ||
111 | data. It determines the best match by looking at the 'compatible' | ||
112 | property in the root device tree node, and comparing it with the | ||
113 | dt_compat list in struct machine_desc. | ||
114 | |||
115 | The 'compatible' property contains a sorted list of strings starting | ||
116 | with the exact name of the machine, followed by an optional list of | ||
117 | boards it is compatible with sorted from most compatible to least. For | ||
118 | example, the root compatible properties for the TI BeagleBoard and its | ||
119 | successor, the BeagleBoard xM board might look like: | ||
120 | |||
121 |     compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3"; | ||
122 |     compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3"; | ||
123 | |||
124 | Here "ti,omap3-beagleboard-xm" specifies the exact model, and the list also | ||
125 | claims that it is compatible with the OMAP 3450 SoC, and the omap3 family | ||
126 | of SoCs in general. You'll notice that the list is sorted from most | ||
127 | specific (exact board) to least specific (SoC family). | ||
128 | |||
129 | Astute readers might point out that the Beagle xM could also claim | ||
130 | compatibility with the original Beagle board. However, one should be | ||
131 | cautioned about doing so at the board level since there is typically a | ||
132 | high level of change from one board to another, even within the same | ||
133 | product line, and it is hard to nail down exactly what is meant when one | ||
134 | board claims to be compatible with another. For the top level, it is | ||
135 | better to err on the side of caution and not claim one board is | ||
136 | compatible with another. The notable exception would be when one | ||
137 | board is a carrier for another, such as a CPU module attached to a | ||
138 | carrier board. | ||
139 | |||
140 | One more note on compatible values. Any string used in a compatible | ||
141 | property must be documented as to what it indicates. Add | ||
142 | documentation for compatible strings in Documentation/devicetree/bindings. | ||
143 | |||
144 | Again on ARM, for each machine_desc, the kernel looks to see if | ||
145 | any of the dt_compat list entries appear in the compatible property. | ||
146 | If one does, then that machine_desc is a candidate for driving the | ||
147 | machine. After searching the entire table of machine_descs, | ||
148 | setup_machine_fdt() returns the 'most compatible' machine_desc based | ||
149 | on which entry in the compatible property each machine_desc matches | ||
150 | against. If no matching machine_desc is found, then it returns NULL. | ||
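
The selection rule described above can be sketched in plain userspace
C. This is only an illustration of the idea, not the kernel's code:
struct machine_desc_sketch, compat_score() and select_machine() are
hypothetical names, and the score is assumed to be the position of the
first matching entry in the root 'compatible' list (lower means more
specific).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct machine_desc. */
struct machine_desc_sketch {
	const char *name;
	const char *const *dt_compat;	/* NULL-terminated list */
};

/*
 * Return the index of the first entry in the root 'compatible' list that
 * also appears in dt_compat, or -1 if none does. Lower is more specific.
 */
int compat_score(const char *const *root_compat,
		 const char *const *dt_compat)
{
	for (int i = 0; root_compat[i]; i++)
		for (int j = 0; dt_compat[j]; j++)
			if (strcmp(root_compat[i], dt_compat[j]) == 0)
				return i;
	return -1;
}

/* Search the whole table and keep the lowest score; NULL if none match. */
const struct machine_desc_sketch *
select_machine(const struct machine_desc_sketch *table, size_t n,
	       const char *const *root_compat)
{
	const struct machine_desc_sketch *best = NULL;
	int best_score = -1;

	assert(root_compat != NULL);
	for (size_t k = 0; k < n; k++) {
		int score = compat_score(root_compat, table[k].dt_compat);

		if (score >= 0 && (best == NULL || score < best_score)) {
			best = &table[k];
			best_score = score;
		}
	}
	return best;
}
```

With a table holding a generic entry that lists "ti,omap3" and a
board-specific entry that lists "ti,omap3-beagleboard", the original
BeagleBoard tree selects the board-specific entry, while the xM tree
falls back to the generic one.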
151 | |||
152 | The reasoning behind this scheme is the observation that in the majority | ||
153 | of cases, a single machine_desc can support a large number of boards | ||
154 | if they all use the same SoC, or same family of SoCs. However, | ||
155 | invariably there will be some exceptions where a specific board will | ||
156 | require special setup code that is not useful in the generic case. | ||
157 | Special cases could be handled by explicitly checking for the | ||
158 | troublesome board(s) in generic setup code, but doing so very quickly | ||
159 | becomes ugly and/or unmaintainable if it is more than just a couple of | ||
160 | cases. | ||
161 | |||
162 | Instead, the compatible list allows a generic machine_desc to provide | ||
163 | support for a wide common set of boards by specifying "less | ||
164 | compatible" values in the dt_compat list. In the example above, | ||
165 | generic board support can claim compatibility with "ti,omap3" or | ||
166 | "ti,omap3450". If a bug was discovered on the original beagleboard | ||
167 | that required special workaround code during early boot, then a new | ||
168 | machine_desc could be added which implements the workarounds and only | ||
169 | matches on "ti,omap3-beagleboard". | ||
170 | |||
171 | PowerPC uses a slightly different scheme where it calls the .probe() | ||
172 | hook from each machine_desc, and the first one returning TRUE is used. | ||
173 | However, this approach does not take into account the priority of the | ||
174 | compatible list, and probably should be avoided for new architecture | ||
175 | support. | ||
176 | |||
177 | 2.3 Runtime configuration | ||
178 | ------------------------- | ||
179 | In most cases, a DT will be the sole method of communicating data from | ||
180 | firmware to the kernel, so it also gets used to pass in runtime and | ||
181 | configuration data like the kernel parameters string and the location | ||
182 | of an initrd image. | ||
183 | |||
184 | Most of this data is contained in the /chosen node, and when booting | ||
185 | Linux it will look something like this: | ||
186 | |||
187 |     chosen { | ||
188 |         bootargs = "console=ttyS0,115200 loglevel=8"; | ||
189 |         initrd-start = <0xc8000000>; | ||
190 |         initrd-end = <0xc8200000>; | ||
191 |     }; | ||
192 | |||
193 | The bootargs property contains the kernel arguments, and the initrd-* | ||
194 | properties define the start and end addresses of an initrd blob. The | ||
195 | chosen node may also optionally contain an arbitrary number of | ||
196 | additional properties for platform-specific configuration data. | ||
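
As a small worked example, the size of the initrd follows from the two
addresses. The sketch below assumes that initrd-end is the first
address past the image (an exclusive end); initrd_size() is a
hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the initrd, treating 'end' as the first address past the image. */
uint64_t initrd_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start;
}
```

For the /chosen example above, 0xc8200000 - 0xc8000000 gives 0x200000
bytes, i.e. 2 MiB.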
197 | |||
198 | During early boot, the architecture setup code calls of_scan_flat_dt() | ||
199 | several times with different helper callbacks to parse device tree | ||
200 | data before paging is set up. The of_scan_flat_dt() code scans through | ||
201 | the device tree and uses the helpers to extract information required | ||
202 | during early boot. Typically the early_init_dt_scan_chosen() helper | ||
203 | is used to parse the chosen node including kernel parameters, | ||
204 | early_init_dt_scan_root() to initialize the DT address space model, | ||
205 | and early_init_dt_scan_memory() to determine the size and | ||
206 | location of usable RAM. | ||
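
The callback style of scanning can be modelled in a few lines of
userspace C. This is a rough sketch only: struct flat_node,
scan_flat_nodes(), scan_chosen() and CMDLINE_MAX are hypothetical
stand-ins, and the real helpers walk the binary blob rather than an
array of structs. The convention assumed here is that a helper returns
non-zero to stop the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CMDLINE_MAX 256	/* hypothetical limit for this sketch */

/* Hypothetical stand-in for one node of the flattened tree. */
struct flat_node {
	const char *uname;	/* unit name, e.g. "chosen" */
	int depth;		/* 0 for the root node */
	const char *bootargs;	/* NULL when the property is absent */
};

typedef int (*scan_cb)(const struct flat_node *node, const char *uname,
		       int depth, void *data);

/* Call 'it' on each node in order; stop once it returns non-zero. */
int scan_flat_nodes(const struct flat_node *nodes, size_t count,
		    scan_cb it, void *data)
{
	int rc = 0;

	assert(it != NULL);
	for (size_t i = 0; i < count && rc == 0; i++)
		rc = it(&nodes[i], nodes[i].uname, nodes[i].depth, data);
	return rc;
}

/* Helper that copies bootargs from /chosen into the buffer at 'data'. */
int scan_chosen(const struct flat_node *node, const char *uname,
		int depth, void *data)
{
	if (depth != 1 || strcmp(uname, "chosen") != 0)
		return 0;	/* not /chosen: keep scanning */
	if (node->bootargs)
		snprintf((char *)data, CMDLINE_MAX, "%s", node->bootargs);
	return 1;		/* found it: stop the scan */
}
```

Calling scan_flat_nodes() once per helper mirrors the "several times
with different helper callbacks" pattern: each pass extracts one kind
of information and ignores every other node.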
207 | |||
208 | On ARM, the function setup_machine_fdt() is responsible for early | ||
209 | scanning of the device tree after selecting the correct machine_desc | ||
210 | that supports the board. | ||
211 | |||
212 | 2.4 Device population | ||
213 | --------------------- | ||
214 | After the board has been identified, and after the early configuration data | ||
215 | has been parsed, then kernel initialization can proceed in the normal | ||
216 | way. At some point in this process, unflatten_device_tree() is called | ||
217 | to convert the data into a more efficient runtime representation. | ||
218 | This is also when machine-specific setup hooks will get called, like | ||
219 | the machine_desc .init_early(), .init_irq() and .init_machine() hooks | ||
220 | on ARM. The remainder of this section uses examples from the ARM | ||
221 | implementation, but all architectures will do pretty much the same | ||
222 | thing when using a DT. | ||
223 | |||
224 | As can be guessed by the names, .init_early() is used for any machine- | ||
225 | specific setup that needs to be executed early in the boot process, | ||
226 | and .init_irq() is used to set up interrupt handling. Using a DT | ||
227 | doesn't materially change the behaviour of either of these functions. | ||
228 | If a DT is provided, then both .init_early() and .init_irq() are able | ||
229 | to call any of the DT query functions (of_* in include/linux/of*.h) to | ||
230 | get additional data about the platform. | ||
231 | |||
232 | The most interesting hook in the DT context is .init_machine() which | ||
233 | is primarily responsible for populating the Linux device model with | ||
234 | data about the platform. Historically this has been implemented on | ||
235 | embedded platforms by defining a set of static clock structures, | ||
236 | platform_devices, and other data in the board support .c file, and | ||
237 | registering it en-masse in .init_machine(). When DT is used, then | ||
238 | instead of hard coding static devices for each platform, the list of | ||
239 | devices can be obtained by parsing the DT, and allocating device | ||
240 | structures dynamically. | ||
241 | |||
242 | The simplest case is when .init_machine() is only responsible for | ||
243 | registering a block of platform_devices. A platform_device is a concept | ||
244 | used by Linux for memory or I/O mapped devices which cannot be detected | ||
245 | by hardware, and for 'composite' or 'virtual' devices (more on those | ||
246 | later). While there is no 'platform device' terminology for the DT, | ||
247 | platform devices roughly correspond to device nodes at the root of the | ||
248 | tree and children of simple memory mapped bus nodes. | ||
249 | |||
250 | About now is a good time to lay out an example. Here is part of the | ||
251 | device tree for the NVIDIA Tegra board. | ||
252 | |||
253 |     /{ | ||
254 |         compatible = "nvidia,harmony", "nvidia,tegra20"; | ||
255 |         #address-cells = <1>; | ||
256 |         #size-cells = <1>; | ||
257 |         interrupt-parent = <&intc>; | ||
258 | |||
259 |         chosen { }; | ||
260 |         aliases { }; | ||
261 | |||
262 |         memory { | ||
263 |             device_type = "memory"; | ||
264 |             reg = <0x00000000 0x40000000>; | ||
265 |         }; | ||
266 | |||
267 |         soc { | ||
268 |             compatible = "nvidia,tegra20-soc", "simple-bus"; | ||
269 |             #address-cells = <1>; | ||
270 |             #size-cells = <1>; | ||
271 |             ranges; | ||
272 | |||
273 |             intc: interrupt-controller@50041000 { | ||
274 |                 compatible = "nvidia,tegra20-gic"; | ||
275 |                 interrupt-controller; | ||
276 |                 #interrupt-cells = <1>; | ||
277 |                 reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >; | ||
278 |             }; | ||
279 | |||
280 |             serial@70006300 { | ||
281 |                 compatible = "nvidia,tegra20-uart"; | ||
282 |                 reg = <0x70006300 0x100>; | ||
283 |                 interrupts = <122>; | ||
284 |             }; | ||
285 | |||
286 |             i2s1: i2s@70002800 { | ||
287 |                 compatible = "nvidia,tegra20-i2s"; | ||
288 |                 reg = <0x70002800 0x100>; | ||
289 |                 interrupts = <77>; | ||
290 |                 codec = <&wm8903>; | ||
291 |             }; | ||
292 | |||
293 |             i2c@7000c000 { | ||
294 |                 compatible = "nvidia,tegra20-i2c"; | ||
295 |                 #address-cells = <1>; | ||
296 |                 #size-cells = <0>; | ||
297 |                 reg = <0x7000c000 0x100>; | ||
298 |                 interrupts = <70>; | ||
299 | |||
300 |                 wm8903: codec@1a { | ||
301 |                     compatible = "wlf,wm8903"; | ||
302 |                     reg = <0x1a>; | ||
303 |                     interrupts = <347>; | ||
304 |                 }; | ||
305 |             }; | ||
306 |         }; | ||
307 | |||
308 |         sound { | ||
309 |             compatible = "nvidia,harmony-sound"; | ||
310 |             i2s-controller = <&i2s1>; | ||
311 |             i2s-codec = <&wm8903>; | ||
312 |         }; | ||
313 |     }; | ||
314 | |||
315 | At .init_machine() time, Tegra board support code will need to look at | ||
316 | this DT and decide which nodes to create platform_devices for. | ||
317 | However, looking at the tree, it is not immediately obvious what kind | ||
318 | of device each node represents, or even if a node represents a device | ||
319 | at all. The /chosen, /aliases, and /memory nodes are informational | ||
320 | nodes that don't describe devices (although arguably memory could be | ||
321 | considered a device). The children of the /soc node are memory mapped | ||
322 | devices, but the codec@1a is an i2c device, and the sound node | ||
323 | represents not a device, but rather how other devices are connected | ||
324 | together to create the audio subsystem. I know what each device is | ||
325 | because I'm familiar with the board design, but how does the kernel | ||
326 | know what to do with each node? | ||
327 | |||
328 | The trick is that the kernel starts at the root of the tree and looks | ||
329 | for nodes that have a 'compatible' property. First, it is generally | ||
330 | assumed that any node with a 'compatible' property represents a device | ||
331 | of some kind, and second, it can be assumed that any node at the root | ||
332 | of the tree is either directly attached to the processor bus, or is a | ||
333 | miscellaneous system device that cannot be described any other way. | ||
334 | For each of these nodes, Linux allocates and registers a | ||
335 | platform_device, which in turn may get bound to a platform_driver. | ||
336 | |||
337 | Why is using a platform_device for these nodes a safe assumption? | ||
338 | Well, for the way that Linux models devices, just about all bus_types | ||
339 | assume that their devices are children of a bus controller. For | ||
340 | example, each i2c_client is a child of an i2c_master. Each spi_device | ||
341 | is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The | ||
342 | same hierarchy is also found in the DT, where I2C device nodes only | ||
343 | ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB, | ||
344 | etc. The only devices which do not require a specific type of parent | ||
345 | device are platform_devices (and amba_devices, but more on that | ||
346 | later), which will happily live at the base of the Linux /sys/devices | ||
347 | tree. Therefore, if a DT node is at the root of the tree, then it | ||
348 | is probably best registered as a platform_device. | ||
349 | |||
350 | Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL) | ||
351 | to kick off discovery of devices at the root of the tree. The | ||
352 | parameters are all NULL because when starting from the root of the | ||
353 | tree, there is no need to provide a starting node (the first NULL), a | ||
354 | parent struct device (the last NULL), and we're not using a match | ||
355 | table (yet). For a board that only needs to register devices, | ||
356 | .init_machine() can be completely empty except for the | ||
357 | of_platform_populate() call. | ||
358 | |||
359 | In the Tegra example, this accounts for the /soc and /sound nodes, but | ||
360 | what about the children of the SoC node? Shouldn't they be registered | ||
361 | as platform devices too? For Linux DT support, the generic behaviour | ||
362 | is for child devices to be registered by the parent's device driver at | ||
363 | driver .probe() time. So, an i2c bus device driver will register an | ||
364 | i2c_client for each child node, an SPI bus driver will register | ||
365 | its spi_device children, and similarly for other bus_types. | ||
366 | According to that model, a driver could be written that binds to the | ||
367 | SoC node and simply registers platform_devices for each of its | ||
368 | children. The board support code would allocate and register an SoC | ||
369 | device, a (theoretical) SoC device driver could bind to the SoC device, | ||
370 | and register platform_devices for /soc/interrupt-controller, /soc/serial, | ||
371 | /soc/i2s, and /soc/i2c in its .probe() hook. Easy, right? | ||
372 | |||
373 | Actually, it turns out that registering children of some | ||
374 | platform_devices as more platform_devices is a common pattern, and the | ||
375 | device tree support code reflects that and makes the above example | ||
376 | simpler. The second argument to of_platform_populate() is an | ||
377 | of_device_id table, and any node that matches an entry in that table | ||
378 | will also get its child nodes registered. In the Tegra case, the code | ||
379 | can look something like this: | ||
380 | |||
381 |     static void __init harmony_init_machine(void) | ||
382 |     { | ||
383 |         /* ... */ | ||
384 |         of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); | ||
385 |     } | ||
386 | |||
387 | "simple-bus" is defined in the ePAPR 1.0 specification as a compatible value | ||
388 | meaning a simple memory mapped bus, so the of_platform_populate() code | ||
389 | could be written to just assume simple-bus compatible nodes will | ||
390 | always be traversed. However, we pass it in as an argument so that | ||
391 | board support code can always override the default behaviour. | ||
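
To make the traversal rule concrete, here is a userspace C model of it
applied to the Tegra tree above. It is a sketch under stated
assumptions, not the kernel implementation: struct dt_node,
node_matches() and populate() are hypothetical, "registering" a device
just records its name, and nodes without a 'compatible' property are
assumed to be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory node, sized for the Tegra example only. */
struct dt_node {
	const char *name;
	const char *compat[3];		/* NULL-terminated */
	const struct dt_node *child[6];	/* NULL-terminated */
};

static const struct dt_node codec = { "codec@1a", { "wlf,wm8903" }, { NULL } };
static const struct dt_node intc = { "interrupt-controller@50041000",
				     { "nvidia,tegra20-gic" }, { NULL } };
static const struct dt_node serial = { "serial@70006300",
				       { "nvidia,tegra20-uart" }, { NULL } };
static const struct dt_node i2s = { "i2s@70002800",
				    { "nvidia,tegra20-i2s" }, { NULL } };
static const struct dt_node i2c = { "i2c@7000c000",
				    { "nvidia,tegra20-i2c" }, { &codec } };
static const struct dt_node soc = { "soc",
				    { "nvidia,tegra20-soc", "simple-bus" },
				    { &intc, &serial, &i2s, &i2c } };
static const struct dt_node sound = { "sound",
				      { "nvidia,harmony-sound" }, { NULL } };
static const struct dt_node chosen = { "chosen", { NULL }, { NULL } };
static const struct dt_node aliases = { "aliases", { NULL }, { NULL } };
static const struct dt_node memory = { "memory", { NULL }, { NULL } };
const struct dt_node tegra_root = { "/",
				    { "nvidia,harmony", "nvidia,tegra20" },
				    { &chosen, &aliases, &memory, &soc, &sound } };

/* Does the node's compatible list contain any string from 'matches'? */
int node_matches(const struct dt_node *np, const char *const *matches)
{
	if (!matches)
		return 0;
	for (int i = 0; np->compat[i]; i++)
		for (int j = 0; matches[j]; j++)
			if (strcmp(np->compat[i], matches[j]) == 0)
				return 1;
	return 0;
}

/*
 * Record a device for each child of 'parent' that has a 'compatible'
 * property, descending only into children that match the bus table.
 * Returns the running count; names are stored in out[] up to 'max'.
 */
int populate(const struct dt_node *parent, const char *const *bus_matches,
	     const char **out, int count, int max)
{
	assert(parent != NULL);
	for (int i = 0; parent->child[i]; i++) {
		const struct dt_node *np = parent->child[i];

		if (!np->compat[0])
			continue;	/* no 'compatible': not a device */
		if (count < max)
			out[count] = np->name;
		count++;
		if (node_matches(np, bus_matches))
			count = populate(np, bus_matches, out, count, max);
	}
	return count;
}
```

Without a match table only /soc and /sound are recorded. With a table
containing "simple-bus", the four children of /soc are recorded as
well, but codec@1a is not, because its parent is an i2c bus rather
than a simple-bus; the i2c bus driver is expected to register it.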
392 | |||
393 | [Need to add discussion of adding i2c/spi/etc child devices] | ||
394 | |||
395 | Appendix A: AMBA devices | ||
396 | ------------------------ | ||
397 | |||
398 | ARM Primecells are a certain kind of device attached to the ARM AMBA | ||
399 | bus which include some support for hardware detection and power | ||
400 | management. In Linux, struct amba_device and the amba_bus_type are | ||
401 | used to represent Primecell devices. However, the fiddly bit is that | ||
402 | not all devices on an AMBA bus are Primecells, and for Linux it is | ||
403 | typical for both amba_device and platform_device instances to be | ||
404 | siblings of the same bus segment. | ||
405 | |||
406 | When using the DT, this creates problems for of_platform_populate() | ||
407 | because it must decide whether to register each node as either a | ||
408 | platform_device or an amba_device. This unfortunately complicates the | ||
409 | device creation model a little bit, but the solution turns out not to | ||
410 | be too invasive. If a node is compatible with "arm,primecell", then | ||
411 | of_platform_populate() will register it as an amba_device instead of a | ||
412 | platform_device. | ||