aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Intel-IOMMU.txt115
-rw-r--r--Documentation/filesystems/Exporting115
-rw-r--r--Documentation/i386/boot.txt34
-rw-r--r--Documentation/kbuild/makefiles.txt10
-rw-r--r--Documentation/kernel-parameters.txt17
-rw-r--r--Documentation/memory-hotplug.txt58
6 files changed, 268 insertions, 81 deletions
diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt
new file mode 100644
index 000000000000..c2321903aa09
--- /dev/null
+++ b/Documentation/Intel-IOMMU.txt
@@ -0,0 +1,115 @@
1Linux IOMMU Support
2===================
3
4The architecture spec can be obtained from the below location.
5
6http://www.intel.com/technology/virtualization/
7
8This guide gives a quick cheat sheet for some basic understanding.
9
10Some Keywords
11
12DMAR - DMA remapping
13DRHD - DMA Engine Reporting Structure
14RMRR - Reserved memory Region Reporting Structure
15ZLR - Zero length reads from PCI devices
16IOVA - IO Virtual address.
17
18Basic stuff
19-----------
20
21ACPI enumerates and lists the different DMA engines in the platform, and
22device scope relationships between PCI devices and which DMA engine controls
23them.
24
25What is RMRR?
26-------------
27
28There are some devices the BIOS controls, for e.g USB devices to perform
29PS2 emulation. The regions of memory used for these devices are marked
30reserved in the e820 map. When we turn on DMA translation, DMA to those
31regions will fail. Hence BIOS uses RMRR to specify these regions along with
32devices that need to access these regions. OS is expected to setup
33unity mappings for these regions for these devices to access these regions.
34
35How is IOVA generated?
36---------------------
37
38Well behaved drivers call pci_map_*() calls before sending command to device
39that needs to perform DMA. Once DMA is completed and mapping is no longer
40required, device performs a pci_unmap_*() calls to unmap the region.
41
42The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
43device has its own domain (hence protection). Devices under p2p bridges
44share the virtual address with all devices under the p2p bridge due to
45transaction id aliasing for p2p bridges.
46
47IOVA generation is pretty generic. We used the same technique as vmalloc()
48but these are not global address spaces, but separate for each domain.
49Different DMA engines may support different number of domains.
50
51We also allocate gaurd pages with each mapping, so we can attempt to catch
52any overflow that might happen.
53
54
55Graphics Problems?
56------------------
57If you encounter issues with graphics devices, you can try adding
58option intel_iommu=igfx_off to turn off the integrated graphics engine.
59
60If it happens to be a PCI device included in the INCLUDE_ALL Engine,
61then try enabling CONFIG_DMAR_GFX_WA to setup a 1-1 map. We hear
62graphics drivers may be in process of using DMA api's in the near
63future and at that time this option can be yanked out.
64
65Some exceptions to IOVA
66-----------------------
67Interrupt ranges are not address translated, (0xfee00000 - 0xfeefffff).
68The same is true for peer to peer transactions. Hence we reserve the
69address from PCI MMIO ranges so they are not allocated for IOVA addresses.
70
71
72Fault reporting
73---------------
74When errors are reported, the DMA engine signals via an interrupt. The fault
75reason and device that caused it with fault reason is printed on console.
76
77See below for sample.
78
79
80Boot Message Sample
81-------------------
82
83Something like this gets printed indicating presence of DMAR tables
84in ACPI.
85
86ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
87
88When DMAR is being processed and initialized by ACPI, prints DMAR locations
89and any RMRR's processed.
90
91ACPI DMAR:Host address width 36
92ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
93ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
94ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
95ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
96ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
97
98When DMAR is enabled for use, you will notice..
99
100PCI-DMA: Using DMAR IOMMU
101
102Fault reporting
103---------------
104
105DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
106DMAR:[fault reason 05] PTE Write access is not set
107DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
108DMAR:[fault reason 05] PTE Write access is not set
109
110TBD
111----
112
113- For compatibility testing, could use unity map domain for all devices, just
114 provide a 1-1 for all useful memory under a single domain for all devices.
115- API for paravirt ops for abstracting functionlity for VMM folks.
diff --git a/Documentation/filesystems/Exporting b/Documentation/filesystems/Exporting
index 31047e0fe14b..87019d2b5981 100644
--- a/Documentation/filesystems/Exporting
+++ b/Documentation/filesystems/Exporting
@@ -2,9 +2,12 @@
2Making Filesystems Exportable 2Making Filesystems Exportable
3============================= 3=============================
4 4
5Most filesystem operations require a dentry (or two) as a starting 5Overview
6--------
7
8All filesystem operations require a dentry (or two) as a starting
6point. Local applications have a reference-counted hold on suitable 9point. Local applications have a reference-counted hold on suitable
7dentrys via open file descriptors or cwd/root. However remote 10dentries via open file descriptors or cwd/root. However remote
8applications that access a filesystem via a remote filesystem protocol 11applications that access a filesystem via a remote filesystem protocol
9such as NFS may not be able to hold such a reference, and so need a 12such as NFS may not be able to hold such a reference, and so need a
10different way to refer to a particular dentry. As the alternative 13different way to refer to a particular dentry. As the alternative
@@ -13,14 +16,14 @@ server-reboot (among other things, though these tend to be the most
13problematic), there is no simple answer like 'filename'. 16problematic), there is no simple answer like 'filename'.
14 17
15The mechanism discussed here allows each filesystem implementation to 18The mechanism discussed here allows each filesystem implementation to
16specify how to generate an opaque (out side of the filesystem) byte 19specify how to generate an opaque (outside of the filesystem) byte
17string for any dentry, and how to find an appropriate dentry for any 20string for any dentry, and how to find an appropriate dentry for any
18given opaque byte string. 21given opaque byte string.
19This byte string will be called a "filehandle fragment" as it 22This byte string will be called a "filehandle fragment" as it
20corresponds to part of an NFS filehandle. 23corresponds to part of an NFS filehandle.
21 24
22A filesystem which supports the mapping between filehandle fragments 25A filesystem which supports the mapping between filehandle fragments
23and dentrys will be termed "exportable". 26and dentries will be termed "exportable".
24 27
25 28
26 29
@@ -89,11 +92,9 @@ For a filesystem to be exportable it must:
89 1/ provide the filehandle fragment routines described below. 92 1/ provide the filehandle fragment routines described below.
90 2/ make sure that d_splice_alias is used rather than d_add 93 2/ make sure that d_splice_alias is used rather than d_add
91 when ->lookup finds an inode for a given parent and name. 94 when ->lookup finds an inode for a given parent and name.
92 Typically the ->lookup routine will end: 95 Typically the ->lookup routine will end with a:
93 if (inode) 96
94 return d_splice(inode, dentry); 97 return d_splice_alias(inode, dentry);
95 d_add(dentry, inode);
96 return NULL;
97 } 98 }
98 99
99 100
@@ -101,67 +102,39 @@ For a filesystem to be exportable it must:
101 A file system implementation declares that instances of the filesystem 102 A file system implementation declares that instances of the filesystem
102are exportable by setting the s_export_op field in the struct 103are exportable by setting the s_export_op field in the struct
103super_block. This field must point to a "struct export_operations" 104super_block. This field must point to a "struct export_operations"
104struct which could potentially be full of NULLs, though normally at 105struct which has the following members:
105least get_parent will be set. 106
106 107 encode_fh (optional)
107 The primary operations are decode_fh and encode_fh. 108 Takes a dentry and creates a filehandle fragment which can later be used
108decode_fh takes a filehandle fragment and tries to find or create a 109 to find or create a dentry for the same object. The default
109dentry for the object referred to by the filehandle. 110 implementation creates a filehandle fragment that encodes a 32bit inode
110encode_fh takes a dentry and creates a filehandle fragment which can 111 and generation number for the inode encoded, and if necessary the
111later be used to find/create a dentry for the same object. 112 same information for the parent.
112 113
113decode_fh will probably make use of "find_exported_dentry". 114 fh_to_dentry (mandatory)
114This function lives in the "exportfs" module which a filesystem does 115 Given a filehandle fragment, this should find the implied object and
115not need unless it is being exported. So rather that calling 116 create a dentry for it (possibly with d_alloc_anon).
116find_exported_dentry directly, each filesystem should call it through 117
117the find_exported_dentry pointer in it's export_operations table. 118 fh_to_parent (optional but strongly recommended)
118This field is set correctly by the exporting agent (e.g. nfsd) when a 119 Given a filehandle fragment, this should find the parent of the
119filesystem is exported, and before any export operations are called. 120 implied object and create a dentry for it (possibly with d_alloc_anon).
120 121 May fail if the filehandle fragment is too small.
121find_exported_dentry needs three support functions from the 122
122filesystem: 123 get_parent (optional but strongly recommended)
123 get_name. When given a parent dentry and a child dentry, this 124 When given a dentry for a directory, this should return a dentry for
124 should find a name in the directory identified by the parent 125 the parent. Quite possibly the parent dentry will have been allocated
125 dentry, which leads to the object identified by the child dentry. 126 by d_alloc_anon. The default get_parent function just returns an error
126 If no get_name function is supplied, a default implementation is 127 so any filehandle lookup that requires finding a parent will fail.
127 provided which uses vfs_readdir to find potential names, and 128 ->lookup("..") is *not* used as a default as it can leave ".." entries
128 matches inode numbers to find the correct match. 129 in the dcache which are too messy to work with.
129 130
130 get_parent. When given a dentry for a directory, this should return 131 get_name (optional)
131 a dentry for the parent. Quite possibly the parent dentry will 132 When given a parent dentry and a child dentry, this should find a name
132 have been allocated by d_alloc_anon. 133 in the directory identified by the parent dentry, which leads to the
133 The default get_parent function just returns an error so any 134 object identified by the child dentry. If no get_name function is
134 filehandle lookup that requires finding a parent will fail. 135 supplied, a default implementation is provided which uses vfs_readdir
135 ->lookup("..") is *not* used as a default as it can leave ".." 136 to find potential names, and matches inode numbers to find the correct
136 entries in the dcache which are too messy to work with. 137 match.
137
138 get_dentry. When given an opaque datum, this should find the
139 implied object and create a dentry for it (possibly with
140 d_alloc_anon).
141 The opaque datum is whatever is passed down by the decode_fh
142 function, and is often simply a fragment of the filehandle
143 fragment.
144 decode_fh passes two datums through find_exported_dentry. One that
145 should be used to identify the target object, and one that can be
146 used to identify the object's parent, should that be necessary.
147 The default get_dentry function assumes that the datum contains an
148 inode number and a generation number, and it attempts to get the
149 inode using "iget" and check it's validity by matching the
150 generation number. A filesystem should only depend on the default
151 if iget can safely be used this way.
152
153If decode_fh and/or encode_fh are left as NULL, then default
154implementations are used. These defaults are suitable for ext2 and
155extremely similar filesystems (like ext3).
156
157The default encode_fh creates a filehandle fragment from the inode
158number and generation number of the target together with the inode
159number and generation number of the parent (if the parent is
160required).
161
162The default decode_fh extract the target and parent datums from the
163filehandle assuming the format used by the default encode_fh and
164passed them to find_exported_dentry.
165 138
166 139
167A filehandle fragment consists of an array of 1 or more 4byte words, 140A filehandle fragment consists of an array of 1 or more 4byte words,
@@ -172,5 +145,3 @@ generated by encode_fh, in which case it will have been padded with
172nuls. Rather, the encode_fh routine should choose a "type" which 145nuls. Rather, the encode_fh routine should choose a "type" which
173indicates the decode_fh how much of the filehandle is valid, and how 146indicates the decode_fh how much of the filehandle is valid, and how
174it should be interpreted. 147it should be interpreted.
175
176
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 35985b34d5a6..2f75e750e4f5 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -168,6 +168,8 @@ Offset Proto Name Meaning
1680234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 1680234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not
1690235/3 N/A pad2 Unused 1690235/3 N/A pad2 Unused
1700238/4 2.06+ cmdline_size Maximum size of the kernel command line 1700238/4 2.06+ cmdline_size Maximum size of the kernel command line
171023C/4 2.07+ hardware_subarch Hardware subarchitecture
1720240/8 2.07+ hardware_subarch_data Subarchitecture-specific data
171 173
172(1) For backwards compatibility, if the setup_sects field contains 0, the 174(1) For backwards compatibility, if the setup_sects field contains 0, the
173 real value is 4. 175 real value is 4.
@@ -204,7 +206,7 @@ boot loaders can ignore those fields.
204 206
205The byte order of all fields is littleendian (this is x86, after all.) 207The byte order of all fields is littleendian (this is x86, after all.)
206 208
207Field name: setup_secs 209Field name: setup_sects
208Type: read 210Type: read
209Offset/size: 0x1f1/1 211Offset/size: 0x1f1/1
210Protocol: ALL 212Protocol: ALL
@@ -356,6 +358,13 @@ Protocol: 2.00+
356 - If 0, the protected-mode code is loaded at 0x10000. 358 - If 0, the protected-mode code is loaded at 0x10000.
357 - If 1, the protected-mode code is loaded at 0x100000. 359 - If 1, the protected-mode code is loaded at 0x100000.
358 360
361 Bit 6 (write): KEEP_SEGMENTS
362 Protocol: 2.07+
363 - if 0, reload the segment registers in the 32bit entry point.
364 - if 1, do not reload the segment registers in the 32bit entry point.
365 Assume that %cs %ds %ss %es are all set to flat segments with
366 a base of 0 (or the equivalent for their environment).
367
359 Bit 7 (write): CAN_USE_HEAP 368 Bit 7 (write): CAN_USE_HEAP
360 Set this bit to 1 to indicate that the value entered in the 369 Set this bit to 1 to indicate that the value entered in the
361 heap_end_ptr is valid. If this field is clear, some setup code 370 heap_end_ptr is valid. If this field is clear, some setup code
@@ -480,6 +489,29 @@ Protocol: 2.06+
480 cmdline_size characters. With protocol version 2.05 and earlier, the 489 cmdline_size characters. With protocol version 2.05 and earlier, the
481 maximum size was 255. 490 maximum size was 255.
482 491
492Field name: hardware_subarch
493Type: write
494Offset/size: 0x23c/4
495Protocol: 2.07+
496
497 In a paravirtualized environment the hardware low level architectural
498 pieces such as interrupt handling, page table handling, and
499 accessing process control registers needs to be done differently.
500
501 This field allows the bootloader to inform the kernel we are in one
502 one of those environments.
503
504 0x00000000 The default x86/PC environment
505 0x00000001 lguest
506 0x00000002 Xen
507
508Field name: hardware_subarch_data
509Type: write
510Offset/size: 0x240/8
511Protocol: 2.07+
512
513 A pointer to data that is specific to hardware subarch
514
483 515
484**** THE KERNEL COMMAND LINE 516**** THE KERNEL COMMAND LINE
485 517
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 6166e2d7da76..7a7753321a26 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -519,17 +519,17 @@ more details, with real examples.
519 to the user why it stops. 519 to the user why it stops.
520 520
521 cc-cross-prefix 521 cc-cross-prefix
522 cc-cross-prefix is used to check if there exist a $(CC) in path with 522 cc-cross-prefix is used to check if there exists a $(CC) in path with
523 one of the listed prefixes. The first prefix where there exist a 523 one of the listed prefixes. The first prefix where there exist a
524 prefix$(CC) in the PATH is returned - and if no prefix$(CC) is found 524 prefix$(CC) in the PATH is returned - and if no prefix$(CC) is found
525 then nothing is returned. 525 then nothing is returned.
526 Additional prefixes are separated by a single space in the 526 Additional prefixes are separated by a single space in the
527 call of cc-cross-prefix. 527 call of cc-cross-prefix.
528 This functionality is usefull for architecture Makefile that try 528 This functionality is useful for architecture Makefiles that try
529 to set CROSS_COMPILE to well know values but may have several 529 to set CROSS_COMPILE to well-known values but may have several
530 values to select between. 530 values to select between.
531 It is recommended only to try to set CROSS_COMPILE is it is a cross 531 It is recommended only to try to set CROSS_COMPILE if it is a cross
532 build (host arch is different from target arch). And is CROSS_COMPILE 532 build (host arch is different from target arch). And if CROSS_COMPILE
533 is already set then leave it with the old value. 533 is already set then leave it with the old value.
534 534
535 Example: 535 Example:
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6accd360da73..b2361667839f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -772,6 +772,23 @@ and is between 256 and 4096 characters. It is defined in the file
772 772
773 inttest= [IA64] 773 inttest= [IA64]
774 774
775 intel_iommu= [DMAR] Intel IOMMU driver (DMAR) option
776 off
777 Disable intel iommu driver.
778 igfx_off [Default Off]
779 By default, gfx is mapped as normal device. If a gfx
780 device has a dedicated DMAR unit, the DMAR unit is
781 bypassed by not enabling DMAR with this option. In
782 this case, gfx device will use physical address for
783 DMA.
784 forcedac [x86_64]
785 With this option iommu will not optimize to look
786 for io virtual address below 32 bit forcing dual
787 address cycle on pci bus for cards supporting greater
788 than 32 bit addressing. The default is to look
789 for translation below 32 bit and if not available
790 then look in the higher range.
791
775 io7= [HW] IO7 for Marvel based alpha systems 792 io7= [HW] IO7 for Marvel based alpha systems
776 See comment before marvel_specify_io7 in 793 See comment before marvel_specify_io7 in
777 arch/alpha/kernel/core_marvel.c. 794 arch/alpha/kernel/core_marvel.c.
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 5fbcc22c98e9..168117bd6ee8 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -2,7 +2,8 @@
2Memory Hotplug 2Memory Hotplug
3============== 3==============
4 4
5Last Updated: Jul 28 2007 5Created: Jul 28 2007
6Add description of notifier of memory hotplug Oct 11 2007
6 7
7This document is about memory hotplug including how-to-use and current status. 8This document is about memory hotplug including how-to-use and current status.
8Because Memory Hotplug is still under development, contents of this text will 9Because Memory Hotplug is still under development, contents of this text will
@@ -24,7 +25,8 @@ be changed often.
24 6.1 Memory offline and ZONE_MOVABLE 25 6.1 Memory offline and ZONE_MOVABLE
25 6.2. How to offline memory 26 6.2. How to offline memory
267. Physical memory remove 277. Physical memory remove
278. Future Work List 288. Memory hotplug event notifier
299. Future Work List
28 30
29Note(1): x86_64's has special implementation for memory hotplug. 31Note(1): x86_64's has special implementation for memory hotplug.
30 This text does not describe it. 32 This text does not describe it.
@@ -307,8 +309,58 @@ Need more implementation yet....
307 - Notification completion of remove works by OS to firmware. 309 - Notification completion of remove works by OS to firmware.
308 - Guard from remove if not yet. 310 - Guard from remove if not yet.
309 311
312--------------------------------
3138. Memory hotplug event notifier
314--------------------------------
315Memory hotplug has event notifer. There are 6 types of notification.
316
317MEMORY_GOING_ONLINE
318 Generated before new memory becomes available in order to be able to
319 prepare subsystems to handle memory. The page allocator is still unable
320 to allocate from the new memory.
321
322MEMORY_CANCEL_ONLINE
323 Generated if MEMORY_GOING_ONLINE fails.
324
325MEMORY_ONLINE
326 Generated when memory has succesfully brought online. The callback may
327 allocate pages from the new memory.
328
329MEMORY_GOING_OFFLINE
330 Generated to begin the process of offlining memory. Allocations are no
331 longer possible from the memory but some of the memory to be offlined
332 is still in use. The callback can be used to free memory known to a
333 subsystem from the indicated memory section.
334
335MEMORY_CANCEL_OFFLINE
336 Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from
337 the section that we attempted to offline.
338
339MEMORY_OFFLINE
340 Generated after offlining memory is complete.
341
342A callback routine can be registered by
343 hotplug_memory_notifier(callback_func, priority)
344
345The second argument of callback function (action) is event types of above.
346The third argument is passed by pointer of struct memory_notify.
347
348struct memory_notify {
349 unsigned long start_pfn;
350 unsigned long nr_pages;
351 int status_cahnge_nid;
352}
353
354start_pfn is start_pfn of online/offline memory.
355nr_pages is # of pages of online/offline memory.
356status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
357set/clear. It means a new(memoryless) node gets new memory by online and a
358node loses all memory. If this is -1, then nodemask status is not changed.
359If status_changed_nid >= 0, callback should create/discard structures for the
360node if necessary.
361
310-------------- 362--------------
3118. Future Work 3639. Future Work
312-------------- 364--------------
313 - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like 365 - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
314 sysctl or new control file. 366 sysctl or new control file.