author     Matthew Wilcox <willy@linux.intel.com>    2009-03-17 08:54:05 -0400
committer  Jesse Barnes <jbarnes@virtuousgeek.org>   2009-03-20 13:48:11 -0400
commit     c41ade2ee1dc146d2de2ee470a87cd6b878a08f4
tree       28de511376f31c5b6cc84348e46408a879f7a9f4 /Documentation/PCI
parent     0994375e9614f78657031e04e30019b9cdb62795
Rewrite MSI-HOWTO
I didn't find the previous version very useful, so I rewrote it.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Diffstat (limited to 'Documentation/PCI')
-rw-r--r-- | Documentation/PCI/MSI-HOWTO.txt | 758 |
1 files changed, 277 insertions, 481 deletions
diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 256defd7e174..1c02431f1d1a 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -4,506 +4,302 @@
4 | Revised Feb 12, 2004 by Martine Silbermann | 4 | Revised Feb 12, 2004 by Martine Silbermann |
5 | email: Martine.Silbermann@hp.com | 5 | email: Martine.Silbermann@hp.com |
6 | Revised Jun 25, 2004 by Tom L Nguyen | 6 | Revised Jun 25, 2004 by Tom L Nguyen |
7 | Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com> | ||
8 | Copyright 2003, 2008 Intel Corporation | ||
7 | 9 | ||
8 | 1. About this guide | 10 | 1. About this guide |
9 | 11 | ||
10 | This guide describes the basics of Message Signaled Interrupts (MSI), | 12 | This guide describes the basics of Message Signaled Interrupts (MSIs), |
11 | the advantages of using MSI over traditional interrupt mechanisms, | 13 | the advantages of using MSI over traditional interrupt mechanisms, how |
12 | and how to enable your driver to use MSI or MSI-X. Also included is | 14 | to change your driver to use MSI or MSI-X and some basic diagnostics to |
13 | a Frequently Asked Questions (FAQ) section. | 15 | try if a device doesn't support MSIs. |
14 | 16 | ||
15 | 1.1 Terminology | 17 | |
16 | 18 | 2. What are MSIs? | |
17 | PCI devices can be single-function or multi-function. In either case, | 19 | |
18 | when this text talks about enabling or disabling MSI on a "device | 20 | A Message Signaled Interrupt is a write from the device to a special |
19 | function," it is referring to one specific PCI device and function and | 21 | address which causes an interrupt to be received by the CPU. |
20 | not to all functions on a PCI device (unless the PCI device has only | 22 | |
21 | one function). | 23 | The MSI capability was first specified in PCI 2.2 and was later enhanced |
22 | 24 | in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X | |
23 | 2. Copyright 2003 Intel Corporation | 25 | capability was also introduced with PCI 3.0. It supports more interrupts |
24 | 26 | per device than MSI and allows interrupts to be independently configured. | |
25 | 3. What is MSI/MSI-X? | 27 | |
26 | 28 | Devices may support both MSI and MSI-X, but only one can be enabled at | |
27 | Message Signaled Interrupt (MSI), as described in the PCI Local Bus | 29 | a time. |
28 | Specification Revision 2.3 or later, is an optional feature, and a | 30 | |
29 | required feature for PCI Express devices. MSI enables a device function | 31 | |
30 | to request service by sending an Inbound Memory Write on its PCI bus to | 32 | 3. Why use MSIs? |
31 | the FSB as a Message Signal Interrupt transaction. Because MSI is | 33 | |
32 | generated in the form of a Memory Write, all transaction conditions, | 34 | There are three reasons why using MSIs can give an advantage over |
33 | such as a Retry, Master-Abort, Target-Abort or normal completion, are | 35 | traditional pin-based interrupts. |
34 | supported. | 36 | |
35 | 37 | Pin-based PCI interrupts are often shared amongst several devices. | |
36 | A PCI device that supports MSI must also support pin IRQ assertion | 38 | To support this, the kernel must call each interrupt handler associated |
37 | interrupt mechanism to provide backward compatibility for systems that | 39 | with an interrupt, which leads to reduced performance for the system as |
38 | do not support MSI. In systems which support MSI, the bus driver is | 40 | a whole. MSIs are never shared, so this problem cannot arise. |
39 | responsible for initializing the message address and message data of | 41 | |
40 | the device function's MSI/MSI-X capability structure during device | 42 | When a device writes data to memory, then raises a pin-based interrupt, |
41 | initial configuration. | 43 | it is possible that the interrupt may arrive before all the data has |
42 | 44 | arrived in memory (this becomes more likely with devices behind PCI-PCI | |
43 | An MSI capable device function indicates MSI support by implementing | 45 | bridges). In order to ensure that all the data has arrived in memory, |
44 | the MSI/MSI-X capability structure in its PCI capability list. The | 46 | the interrupt handler must read a register on the device which raised |
45 | device function may implement both the MSI capability structure and | 47 | the interrupt. PCI transaction ordering rules require that all the data |
46 | the MSI-X capability structure; however, the bus driver should not | 48 | arrives in memory before the value can be returned from the register. |
47 | enable both. | 49 | Using MSIs avoids this problem as the interrupt-generating write cannot |
48 | 50 | pass the data writes, so by the time the interrupt is raised, the driver | |
49 | The MSI capability structure contains Message Control register, | 51 | knows that all the data has arrived in memory. |
50 | Message Address register and Message Data register. These registers | 52 | |
51 | provide the bus driver control over MSI. The Message Control register | 53 | PCI devices can only support a single pin-based interrupt per function. |
52 | indicates the MSI capability supported by the device. The Message | 54 | Often drivers have to query the device to find out what event has |
53 | Address register specifies the target address and the Message Data | 55 | occurred, slowing down interrupt handling for the common case. With |
54 | register specifies the characteristics of the message. To request | 56 | MSIs, a device can support more interrupts, allowing each interrupt |
55 | service, the device function writes the content of the Message Data | 57 | to be specialised to a different purpose. One possible design gives |
56 | register to the target address. The device and its software driver | 58 | infrequent conditions (such as errors) their own interrupt which allows |
57 | are prohibited from writing to these registers. | 59 | the driver to handle the normal interrupt handling path more efficiently. |
58 | 60 | Other possible designs include giving one interrupt to each packet queue | |
59 | The MSI-X capability structure is an optional extension to MSI. It | 61 | in a network card or each port in a storage controller. |
60 | uses an independent and separate capability structure. There are | 62 | |
61 | some key advantages to implementing the MSI-X capability structure | 63 | |
62 | over the MSI capability structure as described below. | 64 | 4. How to use MSIs |
63 | 65 | ||
64 | - Support a larger maximum number of vectors per function. | 66 | PCI devices are initialised to use pin-based interrupts. The device |
65 | 67 | driver has to set up the device to use MSI or MSI-X. Not all machines | |
66 | - Provide the ability for system software to configure | 68 | support MSIs correctly, and for those machines, the APIs described below |
67 | each vector with an independent message address and message | 69 | will simply fail and the device will continue to use pin-based interrupts. |
68 | data, specified by a table that resides in Memory Space. | 70 | |
69 | 71 | 4.1 Include kernel support for MSIs | |
70 | - MSI and MSI-X both support per-vector masking. Per-vector | 72 | |
71 | masking is an optional extension of MSI but a required | 73 | To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI |
72 | feature for MSI-X. Per-vector masking provides the kernel the | 74 | option enabled. This option is only available on some architectures, |
73 | ability to mask/unmask a single MSI while running its | 75 | and it may depend on some other options also being set. For example, |
74 | interrupt service routine. If per-vector masking is | 76 | on x86, you must also enable X86_UP_APIC or SMP in order to see the |
75 | not supported, then the device driver should provide the | 77 | CONFIG_PCI_MSI option. |
76 | hardware/software synchronization to ensure that the device | 78 | |
77 | generates MSI when the driver wants it to do so. | 79 | 4.2 Using MSI |
78 | 80 | ||
79 | 4. Why use MSI? | 81 | Most of the hard work is done for the driver in the PCI layer. It simply |
80 | 82 | has to request that the PCI layer set up the MSI capability for this | |
81 | As a benefit to the simplification of board design, MSI allows board | 83 | device. |
82 | designers to remove out-of-band interrupt routing. MSI is another | 84 | |
83 | step towards a legacy-free environment. | 85 | 4.2.1 pci_enable_msi |
84 | |||
85 | Due to increasing pressure on chipset and processor packages to | ||
86 | reduce pin count, the need for interrupt pins is expected to | ||
87 | diminish over time. Devices, due to pin constraints, may implement | ||
88 | messages to increase performance. | ||
89 | |||
90 | PCI Express endpoints uses INTx emulation (in-band messages) instead | ||
91 | of IRQ pin assertion. Using INTx emulation requires interrupt | ||
92 | sharing among devices connected to the same node (PCI bridge) while | ||
93 | MSI is unique (non-shared) and does not require BIOS configuration | ||
94 | support. As a result, the PCI Express technology requires MSI | ||
95 | support for better interrupt performance. | ||
96 | |||
97 | Using MSI enables the device functions to support two or more | ||
98 | vectors, which can be configured to target different CPUs to | ||
99 | increase scalability. | ||
100 | |||
101 | 5. Configuring a driver to use MSI/MSI-X | ||
102 | |||
103 | By default, the kernel will not enable MSI/MSI-X on all devices that | ||
104 | support this capability. The CONFIG_PCI_MSI kernel option | ||
105 | must be selected to enable MSI/MSI-X support. | ||
106 | |||
107 | 5.1 Including MSI/MSI-X support into the kernel | ||
108 | |||
109 | To allow MSI/MSI-X capable device drivers to selectively enable | ||
110 | MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described | ||
111 | below), the VECTOR based scheme needs to be enabled by setting | ||
112 | CONFIG_PCI_MSI during kernel config. | ||
113 | |||
114 | Since the target of the inbound message is the local APIC, providing | ||
115 | CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI. | ||
116 | |||
117 | 5.2 Configuring for MSI support | ||
118 | |||
119 | Due to the non-contiguous fashion in vector assignment of the | ||
120 | existing Linux kernel, this version does not support multiple | ||
121 | messages regardless of a device function is capable of supporting | ||
122 | more than one vector. To enable MSI on a device function's MSI | ||
123 | capability structure requires a device driver to call the function | ||
124 | pci_enable_msi() explicitly. | ||
125 | |||
126 | 5.2.1 API pci_enable_msi | ||
127 | 86 | ||
128 | int pci_enable_msi(struct pci_dev *dev) | 87 | int pci_enable_msi(struct pci_dev *dev) |
129 | 88 | ||
130 | With this new API, a device driver that wants to have MSI | 89 | A successful call will allocate ONE interrupt to the device, regardless |
131 | enabled on its device function must call this API to enable MSI. | 90 | of how many MSIs the device supports. The device will be switched from |
132 | A successful call will initialize the MSI capability structure | 91 | pin-based interrupt mode to MSI mode. The dev->irq number is changed |
133 | with ONE vector, regardless of whether a device function is | 92 | to a new number which represents the message signaled interrupt. |
134 | capable of supporting multiple messages. This vector replaces the | 93 | This function should be called before the driver calls request_irq() |
135 | pre-assigned dev->irq with a new MSI vector. To avoid a conflict | 94 | since enabling MSIs disables the pin-based IRQ and the driver will not |
136 | of the new assigned vector with existing pre-assigned vector requires | 95 | receive interrupts on the old interrupt. |
137 | a device driver to call this API before calling request_irq(). | ||
138 | 96 | ||
139 | 5.2.2 API pci_disable_msi | 97 | 4.2.2 pci_disable_msi |
140 | 98 | ||
141 | void pci_disable_msi(struct pci_dev *dev) | 99 | void pci_disable_msi(struct pci_dev *dev) |
142 | 100 | ||
143 | This API should always be used to undo the effect of pci_enable_msi() | 101 | This function should be used to undo the effect of pci_enable_msi(). |
144 | when a device driver is unloading. This API restores dev->irq with | 102 | Calling it restores dev->irq to the pin-based interrupt number and frees |
145 | the pre-assigned IOAPIC vector and switches a device's interrupt | 103 | the previously allocated message signaled interrupt(s). The interrupt |
146 | mode to PCI pin-irq assertion/INTx emulation mode. | 104 | may subsequently be assigned to another device, so drivers should not |
147 | 105 | cache the value of dev->irq. | |
148 | Note that a device driver should always call free_irq() on the MSI vector | ||
149 | that it has done request_irq() on before calling this API. Failure to do | ||
150 | so results in a BUG_ON() and a device will be left with MSI enabled and | ||
151 | leaks its vector. | ||
152 | |||
153 | 5.2.3 MSI mode vs. legacy mode diagram | ||
154 | |||
155 | The below diagram shows the events which switch the interrupt | ||
156 | mode on the MSI-capable device function between MSI mode and | ||
157 | PIN-IRQ assertion mode. | ||
158 | |||
159 | ------------ pci_enable_msi ------------------------ | ||
160 | | | <=============== | | | ||
161 | | MSI MODE | | PIN-IRQ ASSERTION MODE | | ||
162 | | | ===============> | | | ||
163 | ------------ pci_disable_msi ------------------------ | ||
164 | |||
165 | |||
166 | Figure 1. MSI Mode vs. Legacy Mode | ||
167 | |||
168 | In Figure 1, a device operates by default in legacy mode. Legacy | ||
169 | in this context means PCI pin-irq assertion or PCI-Express INTx | ||
170 | emulation. A successful MSI request (using pci_enable_msi()) switches | ||
171 | a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector | ||
172 | stored in dev->irq will be saved by the PCI subsystem and a new | ||
173 | assigned MSI vector will replace dev->irq. | ||
174 | |||
175 | To return back to its default mode, a device driver should always call | ||
176 | pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a | ||
177 | device driver should always call free_irq() on the MSI vector it has | ||
178 | done request_irq() on before calling pci_disable_msi(). Failure to do | ||
179 | so results in a BUG_ON() and a device will be left with MSI enabled and | ||
180 | leaks its vector. Otherwise, the PCI subsystem restores a device's | ||
181 | dev->irq with a pre-assigned IOAPIC vector and marks the released | ||
182 | MSI vector as unused. | ||
183 | |||
184 | Once being marked as unused, there is no guarantee that the PCI | ||
185 | subsystem will reserve this MSI vector for a device. Depending on | ||
186 | the availability of current PCI vector resources and the number of | ||
187 | MSI/MSI-X requests from other drivers, this MSI may be re-assigned. | ||
188 | |||
189 | For the case where the PCI subsystem re-assigns this MSI vector to | ||
190 | another driver, a request to switch back to MSI mode may result | ||
191 | in being assigned a different MSI vector or a failure if no more | ||
192 | vectors are available. | ||
193 | |||
194 | 5.3 Configuring for MSI-X support | ||
195 | |||
196 | Due to the ability of the system software to configure each vector of | ||
197 | the MSI-X capability structure with an independent message address | ||
198 | and message data, the non-contiguous fashion in vector assignment of | ||
199 | the existing Linux kernel has no impact on supporting multiple | ||
200 | messages on an MSI-X capable device functions. To enable MSI-X on | ||
201 | a device function's MSI-X capability structure requires its device | ||
202 | driver to call the function pci_enable_msix() explicitly. | ||
203 | |||
204 | The function pci_enable_msix(), once invoked, enables either | ||
205 | all or nothing, depending on the current availability of PCI vector | ||
206 | resources. If the PCI vector resources are available for the number | ||
207 | of vectors requested by a device driver, this function will configure | ||
208 | the MSI-X table of the MSI-X capability structure of a device with | ||
209 | requested messages. To emphasize this reason, for example, a device | ||
210 | may be capable for supporting the maximum of 32 vectors while its | ||
211 | software driver usually may request 4 vectors. It is recommended | ||
212 | that the device driver should call this function once during the | ||
213 | initialization phase of the device driver. | ||
214 | |||
215 | Unlike the function pci_enable_msi(), the function pci_enable_msix() | ||
216 | does not replace the pre-assigned IOAPIC dev->irq with a new MSI | ||
217 | vector because the PCI subsystem writes the 1:1 vector-to-entry mapping | ||
218 | into the field vector of each element contained in a second argument. | ||
219 | Note that the pre-assigned IOAPIC dev->irq is valid only if the device | ||
220 | operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at | ||
221 | using dev->irq by the device driver to request for interrupt service | ||
222 | may result in unpredictable behavior. | ||
223 | |||
224 | For each MSI-X vector granted, a device driver is responsible for calling | ||
225 | other functions like request_irq(), enable_irq(), etc. to enable | ||
226 | this vector with its corresponding interrupt service handler. It is | ||
227 | a device driver's choice to assign all vectors with the same | ||
228 | interrupt service handler or each vector with a unique interrupt | ||
229 | service handler. | ||
230 | |||
231 | 5.3.1 Handling MMIO address space of MSI-X Table | ||
232 | |||
233 | The PCI 3.0 specification has implementation notes that MMIO address | ||
234 | space for a device's MSI-X structure should be isolated so that the | ||
235 | software system can set different pages for controlling accesses to the | ||
236 | MSI-X structure. The implementation of MSI support requires the PCI | ||
237 | subsystem, not a device driver, to maintain full control of the MSI-X | ||
238 | table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X | ||
239 | table/MSI-X PBA. A device driver should not access the MMIO address | ||
240 | space of the MSI-X table/MSI-X PBA. | ||
241 | |||
242 | 5.3.2 API pci_enable_msix | ||
243 | |||
244 | int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec) | ||
245 | 106 | ||
246 | This API enables a device driver to request the PCI subsystem | 107 | A device driver must always call free_irq() on the interrupt(s) |
247 | to enable MSI-X messages on its hardware device. Depending on | 108 | for which it has called request_irq() before calling this function. |
248 | the availability of PCI vectors resources, the PCI subsystem enables | 109 | Failure to do so will result in a BUG_ON(); the device will be left with |
249 | either all or none of the requested vectors. | 110 | MSI enabled and will leak its vector. |
250 | 111 | ||
251 | Argument 'dev' points to the device (pci_dev) structure. | 112 | 4.3 Using MSI-X |
252 | 113 | ||
253 | Argument 'entries' is a pointer to an array of msix_entry structs. | 114 | The MSI-X capability is much more flexible than the MSI capability. |
254 | The number of entries is indicated in argument 'nvec'. | 115 | It supports up to 2048 interrupts, each of which can be controlled |
255 | struct msix_entry is defined in /driver/pci/msi.h: | 116 | independently. To support this flexibility, drivers must use an array of |
117 | `struct msix_entry': | ||
256 | 118 | ||
257 | struct msix_entry { | 119 | struct msix_entry { |
258 | u16 vector; /* kernel uses to write alloc vector */ | 120 | u16 vector; /* kernel uses to write alloc vector */ |
259 | u16 entry; /* driver uses to specify entry */ | 121 | u16 entry; /* driver uses to specify entry */ |
260 | }; | 122 | }; |
261 | 123 | ||
262 | A device driver is responsible for initializing the field 'entry' of | 124 | This allows for the device to use these interrupts in a sparse fashion; |
263 | each element with a unique entry supported by MSI-X table. Otherwise, | 125 | for example it could use interrupts 3 and 1027 and allocate only a |
264 | -EINVAL will be returned as a result. A successful return of zero | 126 | two-element array. The driver is expected to fill in the 'entry' value |
265 | indicates the PCI subsystem completed initializing each of the requested | 127 | in each element of the array to indicate which entries it wants the kernel |
266 | entries of the MSI-X table with message address and message data. | 128 | to assign interrupts for. It is invalid to fill in two entries with the |
267 | Last but not least, the PCI subsystem will write the 1:1 | 129 | same number. |
268 | vector-to-entry mapping into the field 'vector' of each element. A | 130 | |
269 | device driver is responsible for keeping track of allocated MSI-X | 131 | 4.3.1 pci_enable_msix |
270 | vectors in its internal data structure. | 132 | |
271 | 133 | int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec) | |
272 | A return of zero indicates that the number of MSI-X vectors was | 134 | |
273 | successfully allocated. A return of greater than zero indicates | 135 | Calling this function asks the PCI subsystem to allocate 'nvec' MSIs. |
274 | MSI-X vector shortage. Or a return of less than zero indicates | 136 | The 'entries' argument is a pointer to an array of msix_entry structs |
275 | a failure. This failure may be a result of duplicate entries | 137 | which should be at least 'nvec' entries in size. On success, the |
276 | specified in second argument, or a result of no available vector, | 138 | function will return 0 and the device will have been switched into |
277 | or a result of failing to initialize MSI-X table entries. | 139 | MSI-X interrupt mode. The 'vector' elements in each entry will have |
278 | 140 | been filled in with the interrupt number. The driver should then call | |
279 | 5.3.3 API pci_disable_msix | 141 | request_irq() for each 'vector' that it decides to use. |
142 | |||
143 | If this function returns a negative number, it indicates an error and | ||
144 | the driver should not attempt to allocate any more MSI-X interrupts for | ||
145 | this device. If it returns a positive number, it indicates the maximum | ||
146 | number of interrupt vectors that could have been allocated. | ||
147 | |||
148 | This function, in contrast with pci_enable_msi(), does not adjust | ||
149 | dev->irq. The device will not generate interrupts for this interrupt | ||
150 | number once MSI-X is enabled. The device driver is responsible for | ||
151 | keeping track of the interrupts assigned to the MSI-X vectors so it can | ||
152 | free them again later. | ||
153 | |||
154 | Device drivers should normally call this function once per device | ||
155 | during the initialization phase. | ||
156 | |||
157 | 4.3.2 pci_disable_msix | ||
280 | 158 | ||
281 | void pci_disable_msix(struct pci_dev *dev) | 159 | void pci_disable_msix(struct pci_dev *dev) |
282 | 160 | ||
283 | This API should always be used to undo the effect of pci_enable_msix() | 161 | This API should be used to undo the effect of pci_enable_msix(). It frees |
284 | when a device driver is unloading. Note that a device driver should | 162 | the previously allocated message signaled interrupts. The interrupts may |
285 | always call free_irq() on all MSI-X vectors it has done request_irq() | 163 | subsequently be assigned to another device, so drivers should not cache |
286 | on before calling this API. Failure to do so results in a BUG_ON() and | 164 | the value of the 'vector' elements over a call to pci_disable_msix(). |
287 | a device will be left with MSI-X enabled and leaks its vectors. | 165 | |
288 | 166 | A device driver must always call free_irq() on the interrupt(s) | |
289 | 5.3.4 MSI-X mode vs. legacy mode diagram | 167 | for which it has called request_irq() before calling this function. |
290 | | 168 | Failure to do so will result in a BUG_ON(); the device will be left with |
291 | The below diagram shows the events which switch the interrupt | 169 | MSI-X enabled and will leak its vectors. |
292 | mode on the MSI-X capable device function between MSI-X mode and | 170 | |
293 | PIN-IRQ assertion mode (legacy). | 171 | 4.3.3 The MSI-X Table |
294 | 172 | ||
295 | ------------ pci_enable_msix(,,n) ------------------------ | 173 | The MSI-X capability specifies a BAR and offset within that BAR for the |
296 | | | <=============== | | | 174 | MSI-X Table. This address is mapped by the PCI subsystem, and should not |
297 | | MSI-X MODE | | PIN-IRQ ASSERTION MODE | | 175 | be accessed directly by the device driver. If the driver wishes to |
298 | | | ===============> | | | 176 | mask or unmask an interrupt, it should call disable_irq() / enable_irq(). |
299 | ------------ pci_disable_msix ------------------------ | 177 | |
300 | 178 | 4.4 Handling devices implementing both MSI and MSI-X capabilities | |
301 | Figure 2. MSI-X Mode vs. Legacy Mode | 179 | |
302 | 180 | If a device implements both MSI and MSI-X capabilities, it can | |
303 | In Figure 2, a device operates by default in legacy mode. A | 181 | run in either MSI mode or MSI-X mode but not both simultaneously. |
304 | successful MSI-X request (using pci_enable_msix()) switches a | 182 | This is a requirement of the PCI spec, and it is enforced by the |
305 | device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector | 183 | PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or |
306 | stored in dev->irq will be saved by the PCI subsystem; however, | 184 | pci_enable_msix() when MSI is already enabled will result in an error. |
307 | unlike MSI mode, the PCI subsystem will not replace dev->irq with | 185 | If a device driver wishes to switch between MSI and MSI-X at runtime, |
308 | assigned MSI-X vector because the PCI subsystem already writes the 1:1 | 186 | it must first quiesce the device, then switch it back to pin-interrupt |
309 | vector-to-entry mapping into the field 'vector' of each element | 187 | mode, before calling pci_enable_msi() or pci_enable_msix() and resuming |
310 | specified in second argument. | 188 | operation. This is not expected to be a common operation but may be |
311 | 189 | useful for debugging or testing during development. | |
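The rule that MSI and MSI-X are mutually exclusive, together with the switching sequence described above, can be modelled as a three-state machine. This is a toy model with hypothetical names; it also rejects enabling the mode that is already active, a simplification the text does not spell out.

```c
/* Hypothetical model of the "one mode at a time" rule from section 4.4. */
enum demo_irq_mode { DEMO_PIN, DEMO_MSI, DEMO_MSIX };

struct demo_dev {
	enum demo_irq_mode mode;
};

/* Enabling MSI or MSI-X is only allowed from pin-based mode;
 * otherwise the call fails, as the PCI layer's check would. */
static int demo_enable(struct demo_dev *dev, enum demo_irq_mode want)
{
	if (want == DEMO_PIN || dev->mode != DEMO_PIN)
		return -1;
	dev->mode = want;
	return 0;
}

/* Models pci_disable_msi()/pci_disable_msix(): back to pin-based mode. */
static void demo_disable(struct demo_dev *dev)
{
	dev->mode = DEMO_PIN;
}

/* Runtime switch. The caller is assumed to have quiesced the device
 * and freed its IRQs before calling this. */
static int demo_switch(struct demo_dev *dev, enum demo_irq_mode want)
{
	demo_disable(dev);              /* back to pin-interrupt mode first */
	return demo_enable(dev, want);  /* then enable the other mode */
}
```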
312 | To return back to its default mode, a device driver should always call | 190 | |
313 | pci_disable_msix() to undo the effect of pci_enable_msix(). Note that | 191 | 4.5 Considerations when using MSIs |
314 | a device driver should always call free_irq() on all MSI-X vectors it | 192 | |
315 | has done request_irq() on before calling pci_disable_msix(). Failure | 193 | 4.5.1 Choosing between MSI-X and MSI |
316 | to do so results in a BUG_ON() and a device will be left with MSI-X | 194 | |
317 | enabled and leaks its vectors. Otherwise, the PCI subsystem switches a | 195 | If your device supports both MSI-X and MSI capabilities, you should use |
318 | device function's interrupt mode from MSI-X mode to legacy mode and | 196 | the MSI-X facilities in preference to the MSI facilities. As mentioned |
319 | marks all allocated MSI-X vectors as unused. | 197 | above, MSI-X supports any number of interrupts between 1 and 2048. |
320 | | 198 | In contrast, MSI is restricted to a maximum of 32 interrupts (and |
321 | Once being marked as unused, there is no guarantee that the PCI | 199 | must be a power of two). In addition, the MSI interrupt vectors must |
322 | subsystem will reserve these MSI-X vectors for a device. Depending on | 200 | be allocated consecutively, so the system may not be able to allocate |
323 | the availability of current PCI vector resources and the number of | 201 | as many vectors for MSI as it could for MSI-X. On some platforms, MSI |
324 | MSI/MSI-X requests from other drivers, these MSI-X vectors may be | 202 | interrupts must all be targeted at the same set of CPUs whereas MSI-X |
325 | re-assigned. | 203 | interrupts can all be targeted at different CPUs. |
326 | 204 | ||
327 | For the case where the PCI subsystem re-assigned these MSI-X vectors | 205 | 4.5.2 Spinlocks |
328 | to other drivers, a request to switch back to MSI-X mode may result | 206 | |
329 | being assigned with another set of MSI-X vectors or a failure if no | 207 | Most device drivers have a per-device spinlock which is taken in the |
330 | more vectors are available. | 208 | interrupt handler. With pin-based interrupts or a single MSI, it is not |
331 | 209 | necessary to disable interrupts (Linux guarantees the same interrupt will | |
332 | 5.4 Handling function implementing both MSI and MSI-X capabilities | 210 | not be re-entered). If a device uses multiple interrupts, the driver |
333 | | 211 | must disable interrupts while the lock is held. Otherwise, if the device |
334 | For the case where a function implements both MSI and MSI-X | 212 | sends a different interrupt while the lock is held, the driver will |
335 | capabilities, the PCI subsystem enables a device to run either in MSI | 213 | deadlock trying to recursively acquire the spinlock. |
336 | mode or MSI-X mode but not both. A device driver determines whether it | 214 | |
337 | wants MSI or MSI-X enabled on its hardware device. Once a device | 215 | There are two solutions. The first is to take the lock with |
338 | driver requests for MSI, for example, it is prohibited from requesting | 216 | spin_lock_irqsave() or spin_lock_irq() (see |
339 | MSI-X; in other words, a device driver is not permitted to ping-pong | 217 | Documentation/DocBook/kernel-locking). The second is to specify |
340 | between MSI mod MSI-X mode during a run-time. | 218 | IRQF_DISABLED to request_irq() so that the kernel runs the entire |
341 | 219 | interrupt routine with interrupts disabled. | |
342 | 5.5 Hardware requirements for MSI/MSI-X support | 220 | |
343 | 221 | If your MSI interrupt routine does not hold the lock for the whole time | |
344 | MSI/MSI-X support requires support from both system hardware and | 222 | it is running, the first solution may be best. The second solution is |
345 | individual hardware device functions. | 223 | normally preferred as it avoids making two transitions from interrupt |
346 | 224 | disabled to enabled and back again. | |
347 | 5.5.1 Required x86 hardware support | 225 | |
348 | 226 | 4.6 How to tell whether MSI/MSI-X is enabled on a device | |
349 | Since the target of MSI address is the local APIC CPU, enabling | 227 | |
350 | MSI/MSI-X support in the Linux kernel is dependent on whether existing | 228 | Using 'lspci -v' (as root) may show some devices with "MSI", "Message |
351 | system hardware supports local APIC. Users should verify that their | 229 | Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities |
352 | system supports local APIC operation by testing that it runs when | 230 | has an 'Enable' flag which will be followed by either "+" (enabled) |
353 | CONFIG_X86_LOCAL_APIC=y. | 231 | or "-" (disabled). |
354 | 232 | ||
355 | In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set; | 233 | |
356 | however, in UP environment, users must manually set | 234 | 5. MSI quirks |
357 | CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting | 235 | |
358 | CONFIG_PCI_MSI enables the VECTOR based scheme and the option for | 236 | Several PCI chipsets or devices are known not to support MSIs. |
359 | MSI-capable device drivers to selectively enable MSI/MSI-X. | 237 | The PCI stack provides three ways to disable MSIs: |
360 | 238 | ||
361 | Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X | 239 | 1. globally |
362 | vector is allocated new during runtime and MSI/MSI-X support does not | 240 | 2. on all devices behind a specific bridge |
363 | depend on BIOS support. This key independency enables MSI/MSI-X | 241 | 3. on a single device |
364 | support on future IOxAPIC free platforms. | 242 | |
365 | 243 | 5.1. Disabling MSIs globally | |
366 | 5.5.2 Device hardware support | 244 | |
367 | 245 | Some host chipsets simply don't support MSIs properly. If we're | |
368 | The hardware device function supports MSI by indicating the | 246 | lucky, the manufacturer knows this and has indicated it in the ACPI |
369 | MSI/MSI-X capability structure on its PCI capability list. By | 247 | FADT table. In this case, Linux will automatically disable MSIs. |
370 | default, this capability structure will not be initialized by | 248 | Some boards don't include this information in the table and so we have |
371 | the kernel to enable MSI during the system boot. In other words, | 249 | to detect them ourselves. The complete list of these is found near the |
372 | the device function is running on its default pin assertion mode. | 250 | quirk_disable_all_msi() function in drivers/pci/quirks.c. |
373 | Note that in many cases the hardware supporting MSI have bugs, | 251 | |
374 | which may result in system hangs. The software driver of specific | 252 | If you have a board which has problems with MSIs, you can pass pci=nomsi |
375 | MSI-capable hardware is responsible for deciding whether to call | 253 | on the kernel command line to disable MSIs on all devices. It would be |
376 | pci_enable_msi or not. A return of zero indicates the kernel | 254 | in your best interests to report the problem to linux-pci@vger.kernel.org |
377 | successfully initialized the MSI/MSI-X capability structure of the | 255 | including a full 'lspci -v' so we can add the quirks to the kernel. |
378 | device function. The device function is now running on MSI/MSI-X mode. | 256 | |
379 | 257 | 5.2. Disabling MSIs below a bridge | |
380 | 5.6 How to tell whether MSI/MSI-X is enabled on device function | 258 | |
381 | 259 | Some PCI bridges are not able to route MSIs between busses properly. | |
382 | At the driver level, a return of zero from the function call of | 260 | In this case, MSIs must be disabled on all devices behind the bridge. |
383 | pci_enable_msi()/pci_enable_msix() indicates to a device driver that | 261 | |
384 | its device function is initialized successfully and ready to run in | 262 | Some bridges allow you to enable MSIs by changing some bits in their |
385 | MSI/MSI-X mode. | 263 | PCI configuration space (especially the Hypertransport chipsets such |
386 | 264 | as the nVidia nForce and Serverworks HT2000). As with host chipsets, | |
387 | At the user level, users can use the command 'cat /proc/interrupts' | 265 | Linux mostly knows about them and automatically enables MSIs if it can. |
388 | to display the vectors allocated for devices and their interrupt | 266 | If you have a bridge which Linux doesn't yet know about, you can enable |
389 | MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). Below shows MSI mode is | 267 | MSIs in configuration space using whatever method you know works, then |
390 | enabled on a SCSI Adaptec 39320D Ultra320 controller. | 268 | enable MSIs on that bridge by doing: |
391 | 269 | ||
392 | CPU0 CPU1 | 270 | echo 1 > /sys/bus/pci/devices/$bridge/msi_bus |
393 | 0: 324639 0 IO-APIC-edge timer | 271 | |
394 | 1: 1186 0 IO-APIC-edge i8042 | 272 | where $bridge is the PCI address of the bridge you've enabled (eg |
395 | 2: 0 0 XT-PIC cascade | 273 | 0000:00:0e.0). |
396 | 12: 2797 0 IO-APIC-edge i8042 | 274 | |
397 | 14: 6543 0 IO-APIC-edge ide0 | 275 | To disable MSIs, echo 0 instead of 1. Changing this value should be |
398 | 15: 1 0 IO-APIC-edge ide1 | 276 | done with caution as it can break interrupt handling for all devices |
399 | 169: 0 0 IO-APIC-level uhci-hcd | 277 | below this bridge. |
400 | 185: 0 0 IO-APIC-level uhci-hcd | 278 | |
401 | 193: 138 10 PCI-MSI aic79xx | 279 | Again, please notify linux-pci@vger.kernel.org of any bridges that need |
402 | 201: 30 0 PCI-MSI aic79xx | 280 | special handling. |
403 | 225: 30 0 IO-APIC-level aic7xxx | 281 | |
404 | 233: 30 0 IO-APIC-level aic7xxx | 282 | 5.3. Disabling MSIs on a single device |
405 | NMI: 0 0 | 283 | |
406 | LOC: 324553 325068 | 284 | Some devices are known to have faulty MSI implementations. Usually this |
407 | ERR: 0 | 285 | is handled in the individual device driver but occasionally it's necessary |
408 | MIS: 0 | 286 | to handle this with a quirk. Some drivers have an option to disable use |
409 | 287 | of MSI. While this is a convenient workaround for the driver author, | |
410 | 6. MSI quirks | 288 | it is not good practise, and should not be emulated. |
411 | 289 | ||
412 | Several PCI chipsets or devices are known to not support MSI. | 290 | 5.4. Finding why MSIs are disabled on a device |
413 | The PCI stack provides 3 possible levels of MSI disabling: | 291 | |
414 | * on a single device | 292 | From the above three sections, you can see that there are many reasons |
415 | * on all devices behind a specific bridge | 293 | why MSIs may not be enabled for a given device. Your first step should |
416 | * globally | 294 | be to examine your dmesg carefully to determine whether MSIs are enabled |
417 | 295 | for your machine. You should also check your .config to be sure you | |
418 | 6.1. Disabling MSI on a single device | 296 | have enabled CONFIG_PCI_MSI. |
419 | 297 | ||
420 | Under some circumstances it might be required to disable MSI on a | 298 | Then, 'lspci -t' gives the list of bridges above a device. Reading |
421 | single device. This may be achieved by either not calling pci_enable_msi() | 299 | /sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1) |
422 | or all, or setting the pci_dev->no_msi flag before (most of the time | 300 | or disabled (0). If 0 is found in any of the msi_bus files belonging |
423 | in a quirk). | 301 | to bridges between the PCI root and the device, MSIs are disabled. |
424 | 302 | ||
425 | 6.2. Disabling MSI below a bridge | 303 | It is also worth checking the device driver to see whether it supports MSIs. |
426 | 304 | For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or | |
427 | The vast majority of MSI quirks are required by PCI bridges not | 305 | pci_enable_msi_block(). |
428 | being able to route MSI between busses. In this case, MSI have to be | ||
429 | disabled on all devices behind this bridge. It is achieves by setting | ||
430 | the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge | ||
431 | subordinate bus. There is no need to set the same flag on bridges that | ||
432 | are below the broken bridge. When pci_enable_msi() is called to enable | ||
433 | MSI on a device, pci_msi_supported() takes care of checking the NO_MSI | ||
434 | flag in all parent busses of the device. | ||
435 | |||
436 | Some bridges actually support dynamic MSI support enabling/disabling | ||
437 | by changing some bits in their PCI configuration space (especially | ||
438 | the Hypertransport chipsets such as the nVidia nForce and Serverworks | ||
439 | HT2000). It may then be required to update the NO_MSI flag on the | ||
440 | corresponding devices in the sysfs hierarchy. To enable MSI support | ||
441 | on device "0000:00:0e", do: | ||
442 | |||
443 | echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus | ||
444 | |||
445 | To disable MSI support, echo 0 instead of 1. Note that it should be | ||
446 | used with caution since changing this value might break interrupts. | ||
447 | |||
448 | 6.3. Disabling MSI globally | ||
449 | |||
450 | Some extreme cases may require to disable MSI globally on the system. | ||
451 | For now, the only known case is a Serverworks PCI-X chipsets (MSI are | ||
452 | not supported on several busses that are not all connected to the | ||
453 | chipset in the Linux PCI hierarchy). In the vast majority of other | ||
454 | cases, disabling only behind a specific bridge is enough. | ||
455 | |||
456 | For debugging purpose, the user may also pass pci=nomsi on the kernel | ||
457 | command-line to explicitly disable MSI globally. But, once the appro- | ||
458 | priate quirks are added to the kernel, this option should not be | ||
459 | required anymore. | ||
460 | |||
461 | 6.4. Finding why MSI cannot be enabled on a device | ||
462 | |||
463 | Assuming that MSI are not enabled on a device, you should look at | ||
464 | dmesg to find messages that quirks may output when disabling MSI | ||
465 | on some devices, some bridges or even globally. | ||
466 | Then, lspci -t gives the list of bridges above a device. Reading | ||
467 | /sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI | ||
468 | are enabled (1) or disabled (0). In 0 is found in a single bridge | ||
469 | msi_bus file above the device, MSI cannot be enabled. | ||
470 | |||
471 | 7. FAQ | ||
472 | |||
473 | Q1. Are there any limitations on using the MSI? | ||
474 | |||
475 | A1. If the PCI device supports MSI and conforms to the | ||
476 | specification and the platform supports the APIC local bus, | ||
477 | then using MSI should work. | ||
478 | |||
479 | Q2. Will it work on all the Pentium processors (P3, P4, Xeon, | ||
480 | AMD processors)? In P3 IPI's are transmitted on the APIC local | ||
481 | bus and in P4 and Xeon they are transmitted on the system | ||
482 | bus. Are there any implications with this? | ||
483 | |||
484 | A2. MSI support enables a PCI device sending an inbound | ||
485 | memory write (0xfeexxxxx as target address) on its PCI bus | ||
486 | directly to the FSB. Since the message address has a | ||
487 | redirection hint bit cleared, it should work. | ||
488 | |||
489 | Q3. The target address 0xfeexxxxx will be translated by the | ||
490 | Host Bridge into an interrupt message. Are there any | ||
491 | limitations on the chipsets such as Intel 8xx, Intel e7xxx, | ||
492 | or VIA? | ||
493 | |||
494 | A3. If these chipsets support an inbound memory write with | ||
495 | target address set as 0xfeexxxxx, as conformed to PCI | ||
496 | specification 2.3 or latest, then it should work. | ||
497 | |||
498 | Q4. From the driver point of view, if the MSI is lost because | ||
499 | of errors occurring during inbound memory write, then it may | ||
500 | wait forever. Is there a mechanism for it to recover? | ||
501 | |||
502 | A4. Since the target of the transaction is an inbound memory | ||
503 | write, all transaction termination conditions (Retry, | ||
504 | Master-Abort, Target-Abort, or normal completion) are | ||
505 | supported. A device sending an MSI must abide by all the PCI | ||
506 | rules and conditions regarding that inbound memory write. So, | ||
507 | if a retry is signaled it must retry, etc... We believe that | ||
508 | the recommendation for Abort is also a retry (refer to PCI | ||
509 | specification 2.3 or latest). | ||