Diffstat (limited to 'Documentation')
 -rw-r--r--  Documentation/Intel-IOMMU.txt         115
 -rw-r--r--  Documentation/filesystems/Exporting   115
 -rw-r--r--  Documentation/i386/boot.txt            34
 -rw-r--r--  Documentation/kbuild/makefiles.txt     10
 -rw-r--r--  Documentation/kernel-parameters.txt    17
 -rw-r--r--  Documentation/memory-hotplug.txt       58
 6 files changed, 268 insertions, 81 deletions
diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt
new file mode 100644
index 000000000000..c2321903aa09
--- /dev/null
+++ b/Documentation/Intel-IOMMU.txt
@@ -0,0 +1,115 @@
+Linux IOMMU Support
+===================
+
+The architecture spec can be obtained from the following location.
+
+http://www.intel.com/technology/virtualization/
+
+This guide gives a quick cheat sheet for some basic understanding.
+
+Some Keywords
+
+DMAR - DMA remapping
+DRHD - DMA Remapping Hardware Unit Definition Structure
+RMRR - Reserved Memory Region Reporting Structure
+ZLR  - Zero length reads from PCI devices
+IOVA - IO Virtual address.
+
+Basic stuff
+-----------
+
+ACPI enumerates and lists the different DMA engines in the platform, and
+device scope relationships between PCI devices and which DMA engine controls
+them.
+
+What is RMRR?
+-------------
+
+There are some devices the BIOS controls, e.g. USB devices that perform
+PS/2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence the BIOS uses RMRR to specify these regions along
+with the devices that need to access them. The OS is expected to set up
+unity mappings for these regions so that these devices can access them.
+
+How is IOVA generated?
+----------------------
+
+Well-behaved drivers call pci_map_*() before sending a command to a device
+that needs to perform DMA. Once the DMA has completed and the mapping is no
+longer required, the driver calls pci_unmap_*() to unmap the region.
+
+The Intel IOMMU driver allocates a virtual address space per domain. Each
+PCIe device has its own domain (hence protection). Devices under p2p bridges
+share the virtual address space with all devices under the same p2p bridge
+due to transaction ID aliasing for p2p bridges.
+
+IOVA generation is pretty generic. We use the same technique as vmalloc(),
+but the address spaces are separate for each domain rather than global.
+Different DMA engines may support a different number of domains.
+
+We also allocate guard pages with each mapping, so we can attempt to catch
+any overflow that might happen.
+
+
+Graphics Problems?
+------------------
+If you encounter issues with graphics devices, you can try adding the
+option intel_iommu=igfx_off to bypass DMA remapping for integrated graphics.
+
+If it happens to be a PCI device included in the INCLUDE_ALL engine,
+then try enabling CONFIG_DMAR_GFX_WA to set up a 1-1 map. We hear
+graphics drivers may be in the process of moving to the DMA APIs in the
+near future, and at that time this option can be yanked out.
+
+Some exceptions to IOVA
+-----------------------
+Interrupt ranges (0xfee00000 - 0xfeefffff) are not address translated.
+The same is true for peer-to-peer transactions. Hence we reserve the
+interrupt range and the PCI MMIO ranges so they are never allocated as IOVAs.
+
+
+Fault reporting
+---------------
+When errors are reported, the DMA engine signals them via an interrupt. The
+faulting device and the fault reason are printed on the console.
+
+See below for a sample.
+
+
+Boot Message Sample
+-------------------
+
+Something like this gets printed, indicating the presence of DMAR tables
+in ACPI.
+
+	ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
+
+When the DMAR table is processed and initialized by ACPI, the DMAR locations
+and any RMRRs processed are printed.
+
+	ACPI DMAR:Host address width 36
+	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
+	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
+	ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
+	ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
+	ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
+
+When DMAR is enabled for use, you will notice:
+
+	PCI-DMA: Using DMAR IOMMU
+
+Fault reporting
+---------------
+
+	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+	DMAR:[fault reason 05] PTE Write access is not set
+	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+	DMAR:[fault reason 05] PTE Write access is not set
+
+TBD
+---
+
+- For compatibility testing, we could use a unity map domain for all devices:
+  just provide a 1-1 map for all useful memory under a single domain.
+- API for paravirt ops for abstracting functionality for VMM folks.
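The IOVA scheme described in Intel-IOMMU.txt above can be illustrated with a small user-space model. It covers a vmalloc()-style address space per domain, a guard page after each mapping, and a hole for the untranslated interrupt range. This is a sketch under simplifying assumptions, not the driver's allocator. `struct toy_domain`, `toy_alloc_iova()` and the bottom-up bump strategy are hypothetical. Only the 0xfee00000 - 0xfeefffff window is taken from the text.

```c
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL
/* Interrupt window that is never address translated (from the text). */
#define TOY_INTR_START 0xfee00000ULL
#define TOY_INTR_END   0xfeefffffULL

/* Hypothetical per-domain IOVA space. Each domain keeps its own cursor,
 * so two domains can hand out the same IOVA without conflicting. */
struct toy_domain {
	uint64_t next;   /* next candidate IOVA in this domain */
	uint64_t limit;  /* first address past the usable space */
};

static uint64_t toy_round_up(uint64_t len)
{
	return (len + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* Allocate a page-aligned IOVA range for len bytes and leave one unmapped
 * guard page after it. Returns 0 on failure. The returned range never
 * overlaps the interrupt window. */
uint64_t toy_alloc_iova(struct toy_domain *d, uint64_t len)
{
	uint64_t size = toy_round_up(len);
	uint64_t start = d->next;

	/* Jump over the reserved interrupt range if the request would touch it. */
	if (start <= TOY_INTR_END && start + size > TOY_INTR_START)
		start = TOY_INTR_END + 1;

	/* Is there room for the mapping plus its guard page? */
	if (size == 0 || start + size + TOY_PAGE_SIZE > d->limit)
		return 0;

	d->next = start + size + TOY_PAGE_SIZE;  /* guard page stays unmapped */
	return start;
}
```

Suppose a domain's cursor starts at 0x1000. Requests of 100 and 8192 bytes then return 0x1000 and 0x3000, and the page at 0x2000 is the guard page. A second domain starting at 0x1000 can hand out 0x1000 again, because the address spaces are separate.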
diff --git a/Documentation/filesystems/Exporting b/Documentation/filesystems/Exporting
index 31047e0fe14b..87019d2b5981 100644
--- a/Documentation/filesystems/Exporting
+++ b/Documentation/filesystems/Exporting
@@ -2,9 +2,12 @@
 Making Filesystems Exportable
 =============================
 
-Most filesystem operations require a dentry (or two) as a starting
+Overview
+--------
+
+All filesystem operations require a dentry (or two) as a starting
 point. Local applications have a reference-counted hold on suitable
-dentrys via open file descriptors or cwd/root. However remote
+dentries via open file descriptors or cwd/root. However remote
 applications that access a filesystem via a remote filesystem protocol
 such as NFS may not be able to hold such a reference, and so need a
 different way to refer to a particular dentry. As the alternative
@@ -13,14 +16,14 @@ server-reboot (among other things, though these tend to be the most
 problematic), there is no simple answer like 'filename'.
 
 The mechanism discussed here allows each filesystem implementation to
-specify how to generate an opaque (out side of the filesystem) byte
+specify how to generate an opaque (outside of the filesystem) byte
 string for any dentry, and how to find an appropriate dentry for any
 given opaque byte string.
 This byte string will be called a "filehandle fragment" as it
 corresponds to part of an NFS filehandle.
 
 A filesystem which supports the mapping between filehandle fragments
-and dentrys will be termed "exportable".
+and dentries will be termed "exportable".
 
 
 
@@ -89,11 +92,9 @@ For a filesystem to be exportable it must:
 1/ provide the filehandle fragment routines described below.
 2/ make sure that d_splice_alias is used rather than d_add
    when ->lookup finds an inode for a given parent and name.
-   Typically the ->lookup routine will end:
-	if (inode)
-		return d_splice(inode, dentry);
-	d_add(dentry, inode);
-	return NULL;
+   Typically the ->lookup routine will end with a:
+
+		return d_splice_alias(inode, dentry);
 	}
 
 
@@ -101,67 +102,39 @@ For a filesystem to be exportable it must:
 A file system implementation declares that instances of the filesystem
 are exportable by setting the s_export_op field in the struct
 super_block. This field must point to a "struct export_operations"
-struct which could potentially be full of NULLs, though normally at
-least get_parent will be set.
-
-The primary operations are decode_fh and encode_fh.
-decode_fh takes a filehandle fragment and tries to find or create a
-dentry for the object referred to by the filehandle.
-encode_fh takes a dentry and creates a filehandle fragment which can
-later be used to find/create a dentry for the same object.
-
-decode_fh will probably make use of "find_exported_dentry".
-This function lives in the "exportfs" module which a filesystem does
-not need unless it is being exported. So rather that calling
-find_exported_dentry directly, each filesystem should call it through
-the find_exported_dentry pointer in it's export_operations table.
-This field is set correctly by the exporting agent (e.g. nfsd) when a
-filesystem is exported, and before any export operations are called.
-
-find_exported_dentry needs three support functions from the
-filesystem:
-  get_name. When given a parent dentry and a child dentry, this
-    should find a name in the directory identified by the parent
-    dentry, which leads to the object identified by the child dentry.
-    If no get_name function is supplied, a default implementation is
-    provided which uses vfs_readdir to find potential names, and
-    matches inode numbers to find the correct match.
-
-  get_parent. When given a dentry for a directory, this should return
-    a dentry for the parent. Quite possibly the parent dentry will
-    have been allocated by d_alloc_anon.
-    The default get_parent function just returns an error so any
-    filehandle lookup that requires finding a parent will fail.
-    ->lookup("..") is *not* used as a default as it can leave ".."
-    entries in the dcache which are too messy to work with.
-
-  get_dentry. When given an opaque datum, this should find the
-    implied object and create a dentry for it (possibly with
-    d_alloc_anon).
-    The opaque datum is whatever is passed down by the decode_fh
-    function, and is often simply a fragment of the filehandle
-    fragment.
-    decode_fh passes two datums through find_exported_dentry. One that
-    should be used to identify the target object, and one that can be
-    used to identify the object's parent, should that be necessary.
-    The default get_dentry function assumes that the datum contains an
-    inode number and a generation number, and it attempts to get the
-    inode using "iget" and check it's validity by matching the
-    generation number. A filesystem should only depend on the default
-    if iget can safely be used this way.
-
-If decode_fh and/or encode_fh are left as NULL, then default
-implementations are used. These defaults are suitable for ext2 and
-extremely similar filesystems (like ext3).
-
-The default encode_fh creates a filehandle fragment from the inode
-number and generation number of the target together with the inode
-number and generation number of the parent (if the parent is
-required).
-
-The default decode_fh extract the target and parent datums from the
-filehandle assuming the format used by the default encode_fh and
-passed them to find_exported_dentry.
+struct which has the following members:
+
+  encode_fh (optional)
+    Takes a dentry and creates a filehandle fragment which can later be used
+    to find or create a dentry for the same object. The default
+    implementation creates a filehandle fragment that encodes a 32bit inode
+    and generation number for the inode encoded, and if necessary the
+    same information for the parent.
+
+  fh_to_dentry (mandatory)
+    Given a filehandle fragment, this should find the implied object and
+    create a dentry for it (possibly with d_alloc_anon).
+
+  fh_to_parent (optional but strongly recommended)
+    Given a filehandle fragment, this should find the parent of the
+    implied object and create a dentry for it (possibly with d_alloc_anon).
+    May fail if the filehandle fragment is too small.
+
+  get_parent (optional but strongly recommended)
+    When given a dentry for a directory, this should return a dentry for
+    the parent. Quite possibly the parent dentry will have been allocated
+    by d_alloc_anon. The default get_parent function just returns an error
+    so any filehandle lookup that requires finding a parent will fail.
+    ->lookup("..") is *not* used as a default as it can leave ".." entries
+    in the dcache which are too messy to work with.
+
+  get_name (optional)
+    When given a parent dentry and a child dentry, this should find a name
+    in the directory identified by the parent dentry, which leads to the
+    object identified by the child dentry. If no get_name function is
+    supplied, a default implementation is provided which uses vfs_readdir
+    to find potential names, and matches inode numbers to find the correct
+    match.
 
 
 A filehandle fragment consists of an array of 1 or more 4byte words,
@@ -172,5 +145,3 @@ generated by encode_fh, in which case it will have been padded with
 nuls. Rather, the encode_fh routine should choose a "type" which
 indicates the decode_fh how much of the filehandle is valid, and how
 it should be interpreted.
-
-
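The encode_fh / fh_to_dentry / fh_to_parent contract above can be modelled in user space. The encoder packs an inode number and generation number into 4-byte words, plus the parent's pair if a parent is given. It returns a "type" that tells the decoder how many words are meaningful. Everything here is hypothetical and for illustration only: the `toy_*` names, the type values, and the inode table standing in for an inode cache. It is not the exportfs API. It only models the default 32-bit inode + generation encoding that the text describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment types chosen by the encoder. */
enum {
	TOY_FH_INO_GEN = 1,         /* 2 words: ino, gen */
	TOY_FH_INO_GEN_PARENT = 2,  /* 4 words: ino, gen, parent ino, parent gen */
	TOY_FH_INVALID = 255,       /* caller's buffer was too small */
};

struct toy_inode {
	uint32_t ino;
	uint32_t generation;  /* changes when an inode number is reused */
};

/* Encode obj (and parent, if non-NULL) into fh. *max_len is the capacity in
 * 4-byte words on entry and the number of words needed/used on return. */
int toy_encode_fh(const struct toy_inode *obj, const struct toy_inode *parent,
		  uint32_t *fh, int *max_len)
{
	int need = parent ? 4 : 2;

	if (*max_len < need) {
		*max_len = need;
		return TOY_FH_INVALID;
	}
	fh[0] = obj->ino;
	fh[1] = obj->generation;
	if (parent) {
		fh[2] = parent->ino;
		fh[3] = parent->generation;
	}
	*max_len = need;
	return parent ? TOY_FH_INO_GEN_PARENT : TOY_FH_INO_GEN;
}

/* Find the object named by a fragment in a table of live inodes. Returns
 * NULL for unknown types, short fragments, or a stale generation number. */
const struct toy_inode *toy_fh_to_obj(const struct toy_inode *table, size_t n,
				      const uint32_t *fh, int len, int type)
{
	if ((type != TOY_FH_INO_GEN && type != TOY_FH_INO_GEN_PARENT) || len < 2)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		if (table[i].ino == fh[0])
			return table[i].generation == fh[1] ? &table[i] : NULL;
	}
	return NULL;
}

/* Parent lookup fails when the fragment carries no parent information. */
const struct toy_inode *toy_fh_to_parent(const struct toy_inode *table, size_t n,
					 const uint32_t *fh, int len, int type)
{
	if (type != TOY_FH_INO_GEN_PARENT || len < 4)
		return NULL;
	return toy_fh_to_obj(table, n, fh + 2, 2, TOY_FH_INO_GEN);
}
```

Checking the generation number lets a decoder reject a stale handle after an inode number has been reused. The parent lookup returns NULL on a short fragment, which matches the note that fh_to_parent may fail if the filehandle fragment is too small.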
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 35985b34d5a6..2f75e750e4f5 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -168,6 +168,8 @@ Offset	Proto	Name		Meaning
 0234/1	2.05+	relocatable_kernel Whether kernel is relocatable or not
 0235/3	N/A	pad2		Unused
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023C/4	2.07+	hardware_subarch Hardware subarchitecture
+0240/8	2.07+	hardware_subarch_data Subarchitecture-specific data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -204,7 +206,7 @@ boot loaders can ignore those fields.
 
 The byte order of all fields is littleendian (this is x86, after all.)
 
-Field name:	setup_secs
+Field name:	setup_sects
 Type:		read
 Offset/size:	0x1f1/1
 Protocol:	ALL
@@ -356,6 +358,13 @@ Protocol:	2.00+
 	  - If 0, the protected-mode code is loaded at 0x10000.
 	  - If 1, the protected-mode code is loaded at 0x100000.
 
+  Bit 6 (write): KEEP_SEGMENTS
+	Protocol: 2.07+
+	- if 0, reload the segment registers in the 32bit entry point.
+	- if 1, do not reload the segment registers in the 32bit entry point.
+		Assume that %cs %ds %ss %es are all set to flat segments with
+		a base of 0 (or the equivalent for their environment).
+
   Bit 7 (write): CAN_USE_HEAP
 	Set this bit to 1 to indicate that the value entered in the
 	heap_end_ptr is valid. If this field is clear, some setup code
@@ -480,6 +489,29 @@ Protocol:	2.06+
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:	hardware_subarch
+Type:		write
+Offset/size:	0x23c/4
+Protocol:	2.07+
+
+  In a paravirtualized environment the hardware low level architectural
+  pieces such as interrupt handling, page table handling, and
+  accessing processor control registers need to be done differently.
+
+  This field allows the bootloader to inform the kernel that we are in
+  one of those environments.
+
+  0x00000000	The default x86/PC environment
+  0x00000001	lguest
+  0x00000002	Xen
+
+Field name:	hardware_subarch_data
+Type:		write
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  A pointer to data that is specific to the hardware subarch.
+
 
 **** THE KERNEL COMMAND LINE
 
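The sketch below illustrates the 2.07 additions. It patches an in-memory copy of the setup header the way a boot loader might. It refuses to touch kernels older than 2.07, writes hardware_subarch and hardware_subarch_data little-endian, and optionally sets KEEP_SEGMENTS. Three offsets come from parts of boot.txt not shown in this diff: 0x202 (the "HdrS" signature), 0x206 (version) and 0x211 (loadflags). The function and constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_MAGIC_OFF        0x202  /* "HdrS" signature */
#define HDR_VERSION_OFF      0x206  /* protocol version, e.g. 0x0207 */
#define HDR_LOADFLAGS_OFF    0x211
#define HDR_SUBARCH_OFF      0x23c  /* hardware_subarch, 4 bytes */
#define HDR_SUBARCH_DATA_OFF 0x240  /* hardware_subarch_data, 8 bytes */

#define LOADFLAG_KEEP_SEGMENTS 0x40 /* loadflags bit 6 */

enum { SUBARCH_PC = 0, SUBARCH_LGUEST = 1, SUBARCH_XEN = 2 };

/* Header fields are little-endian, so read and write them byte by byte. */
static unsigned get_le16(const uint8_t *p)
{
	return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static void put_le(uint8_t *p, uint64_t v, int bytes)
{
	for (int i = 0; i < bytes; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Patch an in-memory copy of the setup header. Returns 0 on success, or -1
 * if the buffer is too short, the signature is missing, or the kernel only
 * speaks a protocol older than 2.07 (the new fields do not exist there). */
int toy_set_subarch(uint8_t *hdr, size_t len, uint32_t subarch,
		    uint64_t subarch_data, int keep_segments)
{
	if (len < HDR_SUBARCH_DATA_OFF + 8)
		return -1;
	if (memcmp(hdr + HDR_MAGIC_OFF, "HdrS", 4) != 0)
		return -1;
	if (get_le16(hdr + HDR_VERSION_OFF) < 0x0207)
		return -1;

	put_le(hdr + HDR_SUBARCH_OFF, subarch, 4);
	put_le(hdr + HDR_SUBARCH_DATA_OFF, subarch_data, 8);
	if (keep_segments)
		hdr[HDR_LOADFLAGS_OFF] |= LOADFLAG_KEEP_SEGMENTS;
	return 0;
}
```

A real loader would also have to place whatever hardware_subarch_data points to somewhere the kernel can reach. This sketch only stores the 64-bit value.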
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 6166e2d7da76..7a7753321a26 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -519,17 +519,17 @@ more details, with real examples.
 	to the user why it stops.
 
     cc-cross-prefix
-	cc-cross-prefix is used to check if there exist a $(CC) in path with
+	cc-cross-prefix is used to check if there exists a $(CC) in path with
 	one of the listed prefixes. The first prefix where there exist a
 	prefix$(CC) in the PATH is returned - and if no prefix$(CC) is found
 	then nothing is returned.
 	Additional prefixes are separated by a single space in the
 	call of cc-cross-prefix.
-	This functionality is usefull for architecture Makefile that try
-	to set CROSS_COMPILE to well know values but may have several
+	This functionality is useful for architecture Makefiles that try
+	to set CROSS_COMPILE to well-known values but may have several
 	values to select between.
-	It is recommended only to try to set CROSS_COMPILE is it is a cross
-	build (host arch is different from target arch). And is CROSS_COMPILE
+	It is recommended only to try to set CROSS_COMPILE if it is a cross
+	build (host arch is different from target arch). And if CROSS_COMPILE
 	is already set then leave it with the old value.
 
 	Example:
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6accd360da73..b2361667839f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -772,6 +772,23 @@ and is between 256 and 4096 characters. It is defined in the file
 
 	inttest=	[IA64]
 
+	intel_iommu=	[DMAR] Intel IOMMU driver (DMAR) option
+		off
+			Disable the Intel IOMMU driver.
+		igfx_off [Default Off]
+			By default, gfx is mapped as a normal device. If a gfx
+			device has a dedicated DMAR unit, the DMAR unit is
+			bypassed by not enabling DMAR with this option. In
+			this case, the gfx device will use physical addresses
+			for DMA.
+		forcedac [x86_64]
+			With this option the IOMMU will not optimize to look
+			for IO virtual addresses below 32 bits, forcing dual
+			address cycle on the PCI bus for cards supporting more
+			than 32-bit addressing. The default is to look for a
+			translation below 32 bits and, if none is available,
+			then look in the higher range.
+
 	io7=		[HW] IO7 for Marvel based alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
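Below is a minimal user-space sketch of how the three intel_iommu= options might be interpreted. Two things are assumptions rather than statements from the text above. The first is that several options are passed comma-separated (e.g. intel_iommu=igfx_off,forcedac). The second is the `toy_*` names. The second function encodes the documented forcedac behaviour. By default the search starts below 32 bits and falls back to the higher range. With forcedac, the low attempt is skipped for devices that can address more than 32 bits.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags, one per documented intel_iommu= option. */
struct toy_iommu_opts {
	bool disabled;   /* off */
	bool igfx_off;   /* igfx_off */
	bool forcedac;   /* forcedac */
};

/* Parse an option string such as "igfx_off,forcedac". The comma-separated
 * syntax is an assumption of this sketch; unknown words are ignored. */
void toy_parse_intel_iommu(const char *str, struct toy_iommu_opts *o)
{
	memset(o, 0, sizeof(*o));
	while (*str) {
		size_t len = strcspn(str, ",");

		if (len == 3 && strncmp(str, "off", 3) == 0)
			o->disabled = true;
		else if (len == 8 && strncmp(str, "igfx_off", 8) == 0)
			o->igfx_off = true;
		else if (len == 8 && strncmp(str, "forcedac", 8) == 0)
			o->forcedac = true;

		str += len;
		while (*str == ',')
			str++;
	}
}

/* Should the IOVA search start below 4 GiB for a device that can generate
 * dma_bits-bit addresses? Mirrors the documented default and forcedac. */
bool toy_try_low_first(const struct toy_iommu_opts *o, unsigned int dma_bits)
{
	if (dma_bits <= 32)
		return true;      /* the device cannot use the higher range at all */
	return !o->forcedac;  /* default: below 32 bits first, then higher */
}
```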
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 5fbcc22c98e9..168117bd6ee8 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -2,7 +2,8 @@
 Memory Hotplug
 ==============
 
-Last Updated: Jul 28 2007
+Created: Jul 28 2007
+Add description of notifier of memory hotplug Oct 11 2007
 
 This document is about memory hotplug including how-to-use and current status.
 Because Memory Hotplug is still under development, contents of this text will
@@ -24,7 +25,8 @@ be changed often.
   6.1 Memory offline and ZONE_MOVABLE
   6.2. How to offline memory
 7. Physical memory remove
-8. Future Work List
+8. Memory hotplug event notifier
+9. Future Work List
 
 Note(1): x86_64's has special implementation for memory hotplug.
          This text does not describe it.
@@ -307,8 +309,58 @@ Need more implementation yet....
     - Notification completion of remove works by OS to firmware.
     - Guard from remove if not yet.
 
+--------------------------------
+8. Memory hotplug event notifier
+--------------------------------
+Memory hotplug has an event notifier. There are 6 types of notification.
+
+MEMORY_GOING_ONLINE
+  Generated before new memory becomes available in order to be able to
+  prepare subsystems to handle memory. The page allocator is still unable
+  to allocate from the new memory.
+
+MEMORY_CANCEL_ONLINE
+  Generated if MEMORY_GOING_ONLINE fails.
+
+MEMORY_ONLINE
+  Generated when memory has been successfully brought online. The callback
+  may allocate pages from the new memory.
+
+MEMORY_GOING_OFFLINE
+  Generated to begin the process of offlining memory. Allocations are no
+  longer possible from the memory but some of the memory to be offlined
+  is still in use. The callback can be used to free memory known to a
+  subsystem from the indicated memory section.
+
+MEMORY_CANCEL_OFFLINE
+  Generated if MEMORY_GOING_OFFLINE fails. Memory is available again from
+  the section that we attempted to offline.
+
+MEMORY_OFFLINE
+  Generated after offlining memory is complete.
+
+A callback routine can be registered by
+  hotplug_memory_notifier(callback_func, priority)
+
+The second argument of the callback function (action) is one of the event
+types above. The third argument is a pointer to struct memory_notify.
+
+struct memory_notify {
+	unsigned long start_pfn;
+	unsigned long nr_pages;
+	int status_change_nid;
+};
+
+start_pfn is the first page frame number of the memory being onlined/offlined.
+nr_pages is the number of pages of the memory being onlined/offlined.
+status_change_nid is set to the node id when that node's N_HIGH_MEMORY bit
+in the nodemask is (or will be) set or cleared, i.e. when a memoryless node
+gains memory by onlining, or when a node loses all of its memory. It is -1
+if the nodemask status is not changed. If status_change_nid >= 0, the
+callback should create/discard structures for the node if necessary.
+
 --------------
-8. Future Work
+9. Future Work
 --------------
 - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
   sysctl or new control file.
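To make the notifier contract concrete, here is a user-space model of a subsystem callback. In the kernel, the function would take a struct notifier_block pointer as its first argument and be registered with hotplug_memory_notifier(callback_func, priority). In this model, the event constants, the TOY_NOTIFY_OK / TOY_NOTIFY_BAD return codes and the per-node bookkeeping are local stand-ins, and their values are assumptions. That keeps the logic self-contained so it can be compiled and exercised on its own. Here a TOY_NOTIFY_BAD return stands for vetoing the operation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the six events; the kernel's numeric values may differ. */
enum toy_mem_action {
	MEMORY_GOING_ONLINE,
	MEMORY_CANCEL_ONLINE,
	MEMORY_ONLINE,
	MEMORY_GOING_OFFLINE,
	MEMORY_CANCEL_OFFLINE,
	MEMORY_OFFLINE,
};

#define TOY_NOTIFY_OK  0
#define TOY_NOTIFY_BAD 1   /* model's veto: the online attempt should fail */
#define TOY_MAX_NODES  4

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid;
};

/* Hypothetical subsystem state: per-node structures and a page count. */
static bool node_state_ready[TOY_MAX_NODES];
static unsigned long tracked_pages;

/* Same three-argument shape as a notifier callback: self, action, data. */
int toy_mem_callback(void *self, unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int nid = mn->status_change_nid;

	(void)self;
	switch (action) {
	case MEMORY_GOING_ONLINE:
		/* A memoryless node is gaining memory: prepare its structures
		 * before the page allocator can hand out the new pages. */
		if (nid >= 0) {
			if (nid >= TOY_MAX_NODES)
				return TOY_NOTIFY_BAD;
			node_state_ready[nid] = true;
		}
		break;
	case MEMORY_CANCEL_ONLINE:
		/* Undo what MEMORY_GOING_ONLINE prepared. */
		if (nid >= 0 && nid < TOY_MAX_NODES)
			node_state_ready[nid] = false;
		break;
	case MEMORY_ONLINE:
		tracked_pages += mn->nr_pages;
		break;
	case MEMORY_GOING_OFFLINE:
	case MEMORY_CANCEL_OFFLINE:
		/* Nothing cached in the section in this model. */
		break;
	case MEMORY_OFFLINE:
		tracked_pages -= mn->nr_pages;
		/* The node lost all of its memory: discard its structures. */
		if (nid >= 0 && nid < TOY_MAX_NODES)
			node_state_ready[nid] = false;
		break;
	}
	return TOY_NOTIFY_OK;
}
```

The callback sees status_change_nid >= 0 only when a node gains its first memory or loses its last. Per-node setup and teardown therefore hang off that field, while per-range work uses start_pfn and nr_pages.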