author: Matthew Wilcox <willy@linux.intel.com> 2009-03-17 08:54:05 -0400
committer: Jesse Barnes <jbarnes@virtuousgeek.org> 2009-03-20 13:48:11 -0400
commit: c41ade2ee1dc146d2de2ee470a87cd6b878a08f4
tree: 28de511376f31c5b6cc84348e46408a879f7a9f4 /Documentation
parent: 0994375e9614f78657031e04e30019b9cdb62795
Rewrite MSI-HOWTO
I didn't find the previous version very useful, so I rewrote it.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/PCI/MSI-HOWTO.txt 758
1 files changed, 277 insertions, 481 deletions
diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 256defd7e174..1c02431f1d1a 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -4,506 +4,302 @@
4 Revised Feb 12, 2004 by Martine Silbermann 4 Revised Feb 12, 2004 by Martine Silbermann
5 email: Martine.Silbermann@hp.com 5 email: Martine.Silbermann@hp.com
6 Revised Jun 25, 2004 by Tom L Nguyen 6 Revised Jun 25, 2004 by Tom L Nguyen
7 Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
8 Copyright 2003, 2008 Intel Corporation
7 9
81. About this guide 101. About this guide
9 11
10This guide describes the basics of Message Signaled Interrupts (MSI), 12This guide describes the basics of Message Signaled Interrupts (MSIs),
11the advantages of using MSI over traditional interrupt mechanisms, 13the advantages of using MSI over traditional interrupt mechanisms, how
12and how to enable your driver to use MSI or MSI-X. Also included is 14to change your driver to use MSI or MSI-X and some basic diagnostics to
13a Frequently Asked Questions (FAQ) section. 15try if a device doesn't support MSIs.
14 16
151.1 Terminology 17
16 182. What are MSIs?
17PCI devices can be single-function or multi-function. In either case, 19
18when this text talks about enabling or disabling MSI on a "device 20A Message Signaled Interrupt is a write from the device to a special
19function," it is referring to one specific PCI device and function and 21address which causes an interrupt to be received by the CPU.
20not to all functions on a PCI device (unless the PCI device has only 22
21one function). 23The MSI capability was first specified in PCI 2.2 and was later enhanced
22 24in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X
232. Copyright 2003 Intel Corporation 25capability was also introduced with PCI 3.0. It supports more interrupts
24 26per device than MSI and allows interrupts to be independently configured.
253. What is MSI/MSI-X? 27
26 28Devices may support both MSI and MSI-X, but only one can be enabled at
27Message Signaled Interrupt (MSI), as described in the PCI Local Bus 29a time.
28Specification Revision 2.3 or later, is an optional feature, and a 30
29required feature for PCI Express devices. MSI enables a device function 31
30to request service by sending an Inbound Memory Write on its PCI bus to 323. Why use MSIs?
31the FSB as a Message Signal Interrupt transaction. Because MSI is 33
32generated in the form of a Memory Write, all transaction conditions, 34There are three reasons why using MSIs can give an advantage over
33such as a Retry, Master-Abort, Target-Abort or normal completion, are 35traditional pin-based interrupts.
34supported. 36
35 37Pin-based PCI interrupts are often shared amongst several devices.
36A PCI device that supports MSI must also support pin IRQ assertion 38To support this, the kernel must call each interrupt handler associated
37interrupt mechanism to provide backward compatibility for systems that 39with an interrupt, which leads to reduced performance for the system as
38do not support MSI. In systems which support MSI, the bus driver is 40a whole. MSIs are never shared, so this problem cannot arise.
39responsible for initializing the message address and message data of 41
40the device function's MSI/MSI-X capability structure during device 42When a device writes data to memory, then raises a pin-based interrupt,
41initial configuration. 43it is possible that the interrupt may arrive before all the data has
42 44arrived in memory (this becomes more likely with devices behind PCI-PCI
43An MSI capable device function indicates MSI support by implementing 45bridges). In order to ensure that all the data has arrived in memory,
44the MSI/MSI-X capability structure in its PCI capability list. The 46the interrupt handler must read a register on the device which raised
45device function may implement both the MSI capability structure and 47the interrupt. PCI transaction ordering rules require that all the data
46the MSI-X capability structure; however, the bus driver should not 48arrives in memory before the value can be returned from the register.
47enable both. 49Using MSIs avoids this problem as the interrupt-generating write cannot
48 50pass the data writes, so by the time the interrupt is raised, the driver
49The MSI capability structure contains Message Control register, 51knows that all the data has arrived in memory.
50Message Address register and Message Data register. These registers 52
51provide the bus driver control over MSI. The Message Control register 53PCI devices can only support a single pin-based interrupt per function.
52indicates the MSI capability supported by the device. The Message 54Often drivers have to query the device to find out what event has
53Address register specifies the target address and the Message Data 55occurred, slowing down interrupt handling for the common case. With
54register specifies the characteristics of the message. To request 56MSIs, a device can support more interrupts, allowing each interrupt
55service, the device function writes the content of the Message Data 57to be specialised to a different purpose. One possible design gives
56register to the target address. The device and its software driver 58infrequent conditions (such as errors) their own interrupt which allows
57are prohibited from writing to these registers. 59the driver to handle the normal interrupt handling path more efficiently.
58 60Other possible designs include giving one interrupt to each packet queue
59The MSI-X capability structure is an optional extension to MSI. It 61in a network card or each port in a storage controller.
60uses an independent and separate capability structure. There are 62
61some key advantages to implementing the MSI-X capability structure 63
62over the MSI capability structure as described below. 644. How to use MSIs
63 65
64 - Support a larger maximum number of vectors per function. 66PCI devices are initialised to use pin-based interrupts. The device
65 67driver has to set up the device to use MSI or MSI-X. Not all machines
66 - Provide the ability for system software to configure 68support MSIs correctly, and for those machines, the APIs described below
67 each vector with an independent message address and message 69will simply fail and the device will continue to use pin-based interrupts.
68 data, specified by a table that resides in Memory Space. 70
69 714.1 Include kernel support for MSIs
70 - MSI and MSI-X both support per-vector masking. Per-vector 72
71 masking is an optional extension of MSI but a required 73To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
72 feature for MSI-X. Per-vector masking provides the kernel the 74option enabled. This option is only available on some architectures,
73 ability to mask/unmask a single MSI while running its 75and it may depend on some other options also being set. For example,
74 interrupt service routine. If per-vector masking is 76on x86, you must also enable X86_UP_APIC or SMP in order to see the
75 not supported, then the device driver should provide the 77CONFIG_PCI_MSI option.
76 hardware/software synchronization to ensure that the device 78
77 generates MSI when the driver wants it to do so. 794.2 Using MSI
78 80
794. Why use MSI? 81Most of the hard work is done for the driver in the PCI layer. It simply
80 82has to request that the PCI layer set up the MSI capability for this
81As a benefit to the simplification of board design, MSI allows board 83device.
82designers to remove out-of-band interrupt routing. MSI is another 84
83step towards a legacy-free environment. 854.2.1 pci_enable_msi
84
85Due to increasing pressure on chipset and processor packages to
86reduce pin count, the need for interrupt pins is expected to
87diminish over time. Devices, due to pin constraints, may implement
88messages to increase performance.
89
90PCI Express endpoints uses INTx emulation (in-band messages) instead
91of IRQ pin assertion. Using INTx emulation requires interrupt
92sharing among devices connected to the same node (PCI bridge) while
93MSI is unique (non-shared) and does not require BIOS configuration
94support. As a result, the PCI Express technology requires MSI
95support for better interrupt performance.
96
97Using MSI enables the device functions to support two or more
98vectors, which can be configured to target different CPUs to
99increase scalability.
100
1015. Configuring a driver to use MSI/MSI-X
102
103By default, the kernel will not enable MSI/MSI-X on all devices that
104support this capability. The CONFIG_PCI_MSI kernel option
105must be selected to enable MSI/MSI-X support.
106
1075.1 Including MSI/MSI-X support into the kernel
108
109To allow MSI/MSI-X capable device drivers to selectively enable
110MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
111below), the VECTOR based scheme needs to be enabled by setting
112CONFIG_PCI_MSI during kernel config.
113
114Since the target of the inbound message is the local APIC, providing
115CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
116
1175.2 Configuring for MSI support
118
119Due to the non-contiguous fashion in vector assignment of the
120existing Linux kernel, this version does not support multiple
121messages regardless of a device function is capable of supporting
122more than one vector. To enable MSI on a device function's MSI
123capability structure requires a device driver to call the function
124pci_enable_msi() explicitly.
125
1265.2.1 API pci_enable_msi
127 86
128int pci_enable_msi(struct pci_dev *dev) 87int pci_enable_msi(struct pci_dev *dev)
129 88
130With this new API, a device driver that wants to have MSI 89A successful call will allocate ONE interrupt to the device, regardless
131enabled on its device function must call this API to enable MSI. 90of how many MSIs the device supports. The device will be switched from
132A successful call will initialize the MSI capability structure 91pin-based interrupt mode to MSI mode. The dev->irq number is changed
133with ONE vector, regardless of whether a device function is 92to a new number which represents the message signaled interrupt.
134capable of supporting multiple messages. This vector replaces the 93This function should be called before the driver calls request_irq()
135pre-assigned dev->irq with a new MSI vector. To avoid a conflict 94since enabling MSIs disables the pin-based IRQ and the driver will not
136of the new assigned vector with existing pre-assigned vector requires 95receive interrupts on the old interrupt.
137a device driver to call this API before calling request_irq().
138 96
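The call order described for pci_enable_msi() can be sketched as follows.
This is a user-space illustration only, not driver code: struct pci_dev,
pci_enable_msi() and request_irq_stub() below are simplified stand-ins
(the real definitions live in the kernel headers, and the real
request_irq() takes more arguments), and the 'msi_capable' flag and the
interrupt number 64 are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the kernel. In a real driver
 * struct pci_dev and pci_enable_msi() come from the kernel's PCI headers;
 * everything defined here is hypothetical. */
struct pci_dev {
    unsigned int irq;   /* current interrupt number */
    int msi_capable;    /* hypothetical flag used only by the stub */
};

/* Stub: on success switch dev->irq to a new number, as the text describes. */
static int pci_enable_msi(struct pci_dev *dev)
{
    if (!dev->msi_capable)
        return -1;      /* failure: device stays in pin-based mode */
    dev->irq = 64;      /* arbitrary number standing for the MSI */
    return 0;
}

static unsigned int requested_irq;  /* records what the stub was asked for */

/* Stub for request_irq(); only the interrupt number is modelled. */
static int request_irq_stub(unsigned int irq)
{
    requested_irq = irq;
    return 0;
}

/* Enable MSI first, then request whichever interrupt dev->irq now names.
 * Returns 1 when running on MSI, 0 on the pin-based fallback, -1 on error. */
static int example_setup_irq(struct pci_dev *dev)
{
    assert(dev != NULL);
    int using_msi = (pci_enable_msi(dev) == 0);

    if (request_irq_stub(dev->irq) != 0)
        return -1;
    return using_msi;
}
```

In this sketch a failed pci_enable_msi() leaves dev->irq untouched, so the
same request call registers the handler on the pin-based interrupt.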
1395.2.2 API pci_disable_msi 974.2.2 pci_disable_msi
140 98
141void pci_disable_msi(struct pci_dev *dev) 99void pci_disable_msi(struct pci_dev *dev)
142 100
143This API should always be used to undo the effect of pci_enable_msi() 101This function should be used to undo the effect of pci_enable_msi().
144when a device driver is unloading. This API restores dev->irq with 102Calling it restores dev->irq to the pin-based interrupt number and frees
145the pre-assigned IOAPIC vector and switches a device's interrupt 103the previously allocated message signaled interrupt(s). The interrupt
146mode to PCI pin-irq assertion/INTx emulation mode. 104may subsequently be assigned to another device, so drivers should not
147 105cache the value of dev->irq.
148Note that a device driver should always call free_irq() on the MSI vector
149that it has done request_irq() on before calling this API. Failure to do
150so results in a BUG_ON() and a device will be left with MSI enabled and
151leaks its vector.
152
1535.2.3 MSI mode vs. legacy mode diagram
154
155The below diagram shows the events which switch the interrupt
156mode on the MSI-capable device function between MSI mode and
157PIN-IRQ assertion mode.
158
159 ------------ pci_enable_msi ------------------------
160 | | <=============== | |
161 | MSI MODE | | PIN-IRQ ASSERTION MODE |
162 | | ===============> | |
163 ------------ pci_disable_msi ------------------------
164
165
166Figure 1. MSI Mode vs. Legacy Mode
167
168In Figure 1, a device operates by default in legacy mode. Legacy
169in this context means PCI pin-irq assertion or PCI-Express INTx
170emulation. A successful MSI request (using pci_enable_msi()) switches
171a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
172stored in dev->irq will be saved by the PCI subsystem and a new
173assigned MSI vector will replace dev->irq.
174
175To return back to its default mode, a device driver should always call
176pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
177device driver should always call free_irq() on the MSI vector it has
178done request_irq() on before calling pci_disable_msi(). Failure to do
179so results in a BUG_ON() and a device will be left with MSI enabled and
180leaks its vector. Otherwise, the PCI subsystem restores a device's
181dev->irq with a pre-assigned IOAPIC vector and marks the released
182MSI vector as unused.
183
184Once being marked as unused, there is no guarantee that the PCI
185subsystem will reserve this MSI vector for a device. Depending on
186the availability of current PCI vector resources and the number of
187MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
188
189For the case where the PCI subsystem re-assigns this MSI vector to
190another driver, a request to switch back to MSI mode may result
191in being assigned a different MSI vector or a failure if no more
192vectors are available.
193
1945.3 Configuring for MSI-X support
195
196Due to the ability of the system software to configure each vector of
197the MSI-X capability structure with an independent message address
198and message data, the non-contiguous fashion in vector assignment of
199the existing Linux kernel has no impact on supporting multiple
200messages on an MSI-X capable device functions. To enable MSI-X on
201a device function's MSI-X capability structure requires its device
202driver to call the function pci_enable_msix() explicitly.
203
204The function pci_enable_msix(), once invoked, enables either
205all or nothing, depending on the current availability of PCI vector
206resources. If the PCI vector resources are available for the number
207of vectors requested by a device driver, this function will configure
208the MSI-X table of the MSI-X capability structure of a device with
209requested messages. To emphasize this reason, for example, a device
210may be capable for supporting the maximum of 32 vectors while its
211software driver usually may request 4 vectors. It is recommended
212that the device driver should call this function once during the
213initialization phase of the device driver.
214
215Unlike the function pci_enable_msi(), the function pci_enable_msix()
216does not replace the pre-assigned IOAPIC dev->irq with a new MSI
217vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
218into the field vector of each element contained in a second argument.
219Note that the pre-assigned IOAPIC dev->irq is valid only if the device
220operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
221using dev->irq by the device driver to request for interrupt service
222may result in unpredictable behavior.
223
224For each MSI-X vector granted, a device driver is responsible for calling
225other functions like request_irq(), enable_irq(), etc. to enable
226this vector with its corresponding interrupt service handler. It is
227a device driver's choice to assign all vectors with the same
228interrupt service handler or each vector with a unique interrupt
229service handler.
230
2315.3.1 Handling MMIO address space of MSI-X Table
232
233The PCI 3.0 specification has implementation notes that MMIO address
234space for a device's MSI-X structure should be isolated so that the
235software system can set different pages for controlling accesses to the
236MSI-X structure. The implementation of MSI support requires the PCI
237subsystem, not a device driver, to maintain full control of the MSI-X
238table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
239table/MSI-X PBA. A device driver should not access the MMIO address
240space of the MSI-X table/MSI-X PBA.
241
2425.3.2 API pci_enable_msix
243
244int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
245 106
246This API enables a device driver to request the PCI subsystem 107A device driver must always call free_irq() on the interrupt(s)
247to enable MSI-X messages on its hardware device. Depending on 108for which it has called request_irq() before calling this function.
248the availability of PCI vectors resources, the PCI subsystem enables 109Failure to do so will result in a BUG_ON(), the device will be left with
249either all or none of the requested vectors. 110MSI enabled and will leak its vector.
250 111
251Argument 'dev' points to the device (pci_dev) structure. 1124.3 Using MSI-X
252 113
253Argument 'entries' is a pointer to an array of msix_entry structs. 114The MSI-X capability is much more flexible than the MSI capability.
254The number of entries is indicated in argument 'nvec'. 115It supports up to 2048 interrupts, each of which can be controlled
255struct msix_entry is defined in /driver/pci/msi.h: 116independently. To support this flexibility, drivers must use an array of
117`struct msix_entry':
256 118
257struct msix_entry { 119struct msix_entry {
258 u16 vector; /* kernel uses to write alloc vector */ 120 u16 vector; /* kernel uses to write alloc vector */
259 u16 entry; /* driver uses to specify entry */ 121 u16 entry; /* driver uses to specify entry */
260}; 122};
261 123
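As the comments in the struct indicate, the driver fills in 'entry' and
the kernel fills in 'vector'. A minimal user-space sketch of preparing
such an array is shown here, assuming uint16_t as a stand-in for the
kernel's u16. The helper names and the table entry numbers 3 and 1027 are
chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the struct above, with uint16_t standing in for the
 * kernel's u16 so the sketch compiles in user space. */
struct msix_entry {
    uint16_t vector;  /* filled in by the kernel on success */
    uint16_t entry;   /* filled in by the driver */
};

/* Request MSI-X table entries 3 and 1027 with a two-element array. */
static void example_fill_entries(struct msix_entry *entries, int nvec)
{
    assert(nvec == 2);
    entries[0].entry = 3;
    entries[1].entry = 1027;
    entries[0].vector = 0;  /* not meaningful until the kernel sets it */
    entries[1].vector = 0;
}

/* Hypothetical helper: returns 1 if any two elements name the same entry,
 * which would make the array invalid. */
static int example_has_duplicates(const struct msix_entry *entries, int nvec)
{
    for (int i = 0; i < nvec; i++)
        for (int j = i + 1; j < nvec; j++)
            if (entries[i].entry == entries[j].entry)
                return 1;
    return 0;
}
```

Only the entries the driver actually wants appear in the array, so its
length tracks the number of requested interrupts rather than the size of
the device's MSI-X table.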
262A device driver is responsible for initializing the field 'entry' of 124This allows for the device to use these interrupts in a sparse fashion;
263each element with a unique entry supported by MSI-X table. Otherwise, 125for example it could use interrupts 3 and 1027 and allocate only a
264-EINVAL will be returned as a result. A successful return of zero 126two-element array. The driver is expected to fill in the 'entry' value
265indicates the PCI subsystem completed initializing each of the requested 127in each element of the array to indicate which entries it wants the kernel
266entries of the MSI-X table with message address and message data. 128to assign interrupts for. It is invalid to fill in two entries with the
267Last but not least, the PCI subsystem will write the 1:1 129same number.
268vector-to-entry mapping into the field 'vector' of each element. A 130
269device driver is responsible for keeping track of allocated MSI-X 1314.3.1 pci_enable_msix
270vectors in its internal data structure. 132
271 133int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
272A return of zero indicates that the number of MSI-X vectors was 134
273successfully allocated. A return of greater than zero indicates 135Calling this function asks the PCI subsystem to allocate 'nvec' MSIs.
274MSI-X vector shortage. Or a return of less than zero indicates 136The 'entries' argument is a pointer to an array of msix_entry structs
275a failure. This failure may be a result of duplicate entries 137which should be at least 'nvec' entries in size. On success, the
276specified in second argument, or a result of no available vector, 138function will return 0 and the device will have been switched into
277or a result of failing to initialize MSI-X table entries. 139MSI-X interrupt mode. The 'vector' elements in each entry will have
278 140been filled in with the interrupt number. The driver should then call
2795.3.3 API pci_disable_msix 141request_irq() for each 'vector' that it decides to use.
142
143If this function returns a negative number, it indicates an error and
144the driver should not attempt to allocate any more MSI-X interrupts for
145this device. If it returns a positive number, it indicates the maximum
146number of interrupt vectors that could have been allocated.
147
148This function, in contrast with pci_enable_msi(), does not adjust
149dev->irq. The device will not generate interrupts for this interrupt
150number once MSI-X is enabled. The device driver is responsible for
151keeping track of the interrupts assigned to the MSI-X vectors so it can
152free them again later.
153
154Device drivers should normally call this function once per device
155during the initialization phase.
156
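One plausible way to act on the three kinds of return value from
pci_enable_msix() is to retry with the smaller count when the result is
positive. The sketch below is a user-space illustration under stated
assumptions: struct pci_dev and pci_enable_msix() are stubs that only
mimic the return convention, the 'msix_available' field and the interrupt
numbers are invented, and example_enable_msix() is a hypothetical helper,
not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles outside the kernel; in a driver these
 * come from the kernel's PCI headers. */
struct msix_entry {
    uint16_t vector;
    uint16_t entry;
};

struct pci_dev {
    int msix_available;  /* hypothetical: vectors the stub can hand out */
};

/* Stub following the return convention in the text: 0 on success,
 * negative on error, positive = number that could have been allocated. */
static int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
                           int nvec)
{
    if (dev->msix_available <= 0)
        return -1;
    if (nvec > dev->msix_available)
        return dev->msix_available;
    for (int i = 0; i < nvec; i++)
        entries[i].vector = (uint16_t)(100 + i);  /* arbitrary numbers */
    return 0;
}

/* Ask for nvec vectors; on a positive return, retry with that smaller
 * count. Returns the number of vectors enabled, or a negative error. */
static int example_enable_msix(struct pci_dev *dev,
                               struct msix_entry *entries, int nvec)
{
    assert(nvec > 0);
    while (nvec > 0) {
        int rc = pci_enable_msix(dev, entries, nvec);

        if (rc == 0)
            return nvec;   /* MSI-X mode is on; 'vector' fields are set */
        if (rc < 0)
            return rc;     /* error: do not try MSI-X again */
        if (rc >= nvec)
            return -1;     /* guard against a non-shrinking retry */
        nvec = rc;         /* retry with what could be allocated */
    }
    return -1;
}
```

A driver using this pattern would then call request_irq() for each
'vector' it was given and remember those numbers so that it can free them
later.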
1574.3.2 pci_disable_msix
280 158
281void pci_disable_msix(struct pci_dev *dev) 159void pci_disable_msix(struct pci_dev *dev)
282 160
283This API should always be used to undo the effect of pci_enable_msix() 161This API should be used to undo the effect of pci_enable_msix(). It frees
284when a device driver is unloading. Note that a device driver should 162the previously allocated message signaled interrupts. The interrupts may
285always call free_irq() on all MSI-X vectors it has done request_irq() 163subsequently be assigned to another device, so drivers should not cache
286on before calling this API. Failure to do so results in a BUG_ON() and 164the value of the 'vector' elements over a call to pci_disable_msix().
287a device will be left with MSI-X enabled and leaks its vectors. 165
288 166A device driver must always call free_irq() on the interrupt(s)
2895.3.4 MSI-X mode vs. legacy mode diagram 167for which it has called request_irq() before calling this function.
290 168Failure to do so will result in a BUG_ON(), the device will be left with
291The below diagram shows the events which switch the interrupt 169MSI-X enabled and will leak its vectors.
292mode on the MSI-X capable device function between MSI-X mode and 170
293PIN-IRQ assertion mode (legacy). 1714.3.3 The MSI-X Table
294 172
295 ------------ pci_enable_msix(,,n) ------------------------ 173The MSI-X capability specifies a BAR and offset within that BAR for the
296 | | <=============== | | 174MSI-X Table. This address is mapped by the PCI subsystem, and should not
297 | MSI-X MODE | | PIN-IRQ ASSERTION MODE | 175be accessed directly by the device driver. If the driver wishes to
298 | | ===============> | | 176mask or unmask an interrupt, it should call disable_irq() / enable_irq().
299 ------------ pci_disable_msix ------------------------ 177
300 1784.4 Handling devices implementing both MSI and MSI-X capabilities
301Figure 2. MSI-X Mode vs. Legacy Mode 179
302 180If a device implements both MSI and MSI-X capabilities, it can
303In Figure 2, a device operates by default in legacy mode. A 181run in either MSI mode or MSI-X mode but not both simultaneously.
304successful MSI-X request (using pci_enable_msix()) switches a 182This is a requirement of the PCI spec, and it is enforced by the
305device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector 183PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or
306stored in dev->irq will be saved by the PCI subsystem; however, 184pci_enable_msix() when MSI is already enabled will result in an error.
307unlike MSI mode, the PCI subsystem will not replace dev->irq with 185If a device driver wishes to switch between MSI and MSI-X at runtime,
308assigned MSI-X vector because the PCI subsystem already writes the 1:1 186it must first quiesce the device, then switch it back to pin-interrupt
309vector-to-entry mapping into the field 'vector' of each element 187mode, before calling pci_enable_msi() or pci_enable_msix() and resuming
310specified in second argument. 188operation. This is not expected to be a common operation but may be
311 189useful for debugging or testing during development.
312To return back to its default mode, a device driver should always call 190
313pci_disable_msix() to undo the effect of pci_enable_msix(). Note that 1914.5 Considerations when using MSIs
314a device driver should always call free_irq() on all MSI-X vectors it 192
315has done request_irq() on before calling pci_disable_msix(). Failure 1934.5.1 Choosing between MSI-X and MSI
316to do so results in a BUG_ON() and a device will be left with MSI-X 194
317enabled and leaks its vectors. Otherwise, the PCI subsystem switches a 195If your device supports both MSI-X and MSI capabilities, you should use
318device function's interrupt mode from MSI-X mode to legacy mode and 196the MSI-X facilities in preference to the MSI facilities. As mentioned
319marks all allocated MSI-X vectors as unused. 197above, MSI-X supports any number of interrupts between 1 and 2048.
320 198In contrast, MSI is restricted to a maximum of 32 interrupts (and
321Once being marked as unused, there is no guarantee that the PCI 199must be a power of two). In addition, the MSI interrupt vectors must
322subsystem will reserve these MSI-X vectors for a device. Depending on 200be allocated consecutively, so the system may not be able to allocate
323the availability of current PCI vector resources and the number of 201as many vectors for MSI as it could for MSI-X. On some platforms, MSI
324MSI/MSI-X requests from other drivers, these MSI-X vectors may be 202interrupts must all be targeted at the same set of CPUs whereas MSI-X
325re-assigned. 203interrupts can all be targeted at different CPUs.
326 204
327For the case where the PCI subsystem re-assigned these MSI-X vectors 2054.5.2 Spinlocks
328to other drivers, a request to switch back to MSI-X mode may result 206
329being assigned with another set of MSI-X vectors or a failure if no 207Most device drivers have a per-device spinlock which is taken in the
330more vectors are available. 208interrupt handler. With pin-based interrupts or a single MSI, it is not
331 209necessary to disable interrupts (Linux guarantees the same interrupt will
3325.4 Handling function implementing both MSI and MSI-X capabilities 210not be re-entered). If a device uses multiple interrupts, the driver
333 211must disable interrupts while the lock is held. If the device sends
334For the case where a function implements both MSI and MSI-X 212a different interrupt, the driver will deadlock trying to recursively
335capabilities, the PCI subsystem enables a device to run either in MSI 213acquire the spinlock.
336mode or MSI-X mode but not both. A device driver determines whether it 214
337wants MSI or MSI-X enabled on its hardware device. Once a device 215There are two solutions. The first is to take the lock with
338driver requests for MSI, for example, it is prohibited from requesting 216spin_lock_irqsave() or spin_lock_irq() (see
339MSI-X; in other words, a device driver is not permitted to ping-pong 217Documentation/DocBook/kernel-locking). The second is to specify
340between MSI mode and MSI-X mode during run-time. 218IRQF_DISABLED to request_irq() so that the kernel runs the entire
341 219interrupt routine with interrupts disabled.
3425.5 Hardware requirements for MSI/MSI-X support 220
343 221If your MSI interrupt routine does not hold the lock for the whole time
344MSI/MSI-X support requires support from both system hardware and 222it is running, the first solution may be best. The second solution is
345individual hardware device functions. 223normally preferred as it avoids making two transitions from interrupt
346 224disabled to enabled and back again.
3475.5.1 Required x86 hardware support 225
348 2264.6 How to tell whether MSI/MSI-X is enabled on a device
349Since the target of MSI address is the local APIC CPU, enabling 227
350MSI/MSI-X support in the Linux kernel is dependent on whether existing 228Using 'lspci -v' (as root) may show some devices with "MSI", "Message
351system hardware supports local APIC. Users should verify that their 229Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
352system supports local APIC operation by testing that it runs when 230has an 'Enable' flag which will be followed with either "+" (enabled)
353CONFIG_X86_LOCAL_APIC=y. 231or "-" (disabled).
354 232
355In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set; 233
356however, in UP environment, users must manually set 2345. MSI quirks
357CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting 235
358CONFIG_PCI_MSI enables the VECTOR based scheme and the option for 236Several PCI chipsets or devices are known not to support MSIs.
359MSI-capable device drivers to selectively enable MSI/MSI-X. 237The PCI stack provides three ways to disable MSIs:
360 238
361Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X 2391. globally
362vector is allocated new during runtime and MSI/MSI-X support does not 2402. on all devices behind a specific bridge
363depend on BIOS support. This key independency enables MSI/MSI-X 2413. on a single device
364support on future IOxAPIC free platforms. 242
365 2435.1. Disabling MSIs globally
3665.5.2 Device hardware support 244
367 245Some host chipsets simply don't support MSIs properly. If we're
368The hardware device function supports MSI by indicating the 246lucky, the manufacturer knows this and has indicated it in the ACPI
369MSI/MSI-X capability structure on its PCI capability list. By 247FADT table. In this case, Linux will automatically disable MSIs.
370default, this capability structure will not be initialized by 248Some boards don't include this information in the table and so we have
371the kernel to enable MSI during the system boot. In other words, 249to detect them ourselves. The complete list of these is found near the
372the device function is running on its default pin assertion mode. 250quirk_disable_all_msi() function in drivers/pci/quirks.c.
373Note that in many cases the hardware supporting MSI have bugs, 251
374which may result in system hangs. The software driver of specific 252If you have a board which has problems with MSIs, you can pass pci=nomsi
375MSI-capable hardware is responsible for deciding whether to call 253on the kernel command line to disable MSIs on all devices. It would be
376pci_enable_msi or not. A return of zero indicates the kernel 254in your best interests to report the problem to linux-pci@vger.kernel.org
377successfully initialized the MSI/MSI-X capability structure of the 255including a full 'lspci -v' so we can add the quirks to the kernel.
378device function. The device function is now running on MSI/MSI-X mode. 256
379 2575.2. Disabling MSIs below a bridge
3805.6 How to tell whether MSI/MSI-X is enabled on device function 258
381 259Some PCI bridges are not able to route MSIs between busses properly.
382At the driver level, a return of zero from the function call of 260In this case, MSIs must be disabled on all devices behind the bridge.
383pci_enable_msi()/pci_enable_msix() indicates to a device driver that 261
384its device function is initialized successfully and ready to run in 262Some bridges allow you to enable MSIs by changing some bits in their
385MSI/MSI-X mode. 263PCI configuration space (especially the Hypertransport chipsets such
386 264as the nVidia nForce and Serverworks HT2000). As with host chipsets,
387At the user level, users can use the command 'cat /proc/interrupts' 265Linux mostly knows about them and automatically enables MSIs if it can.
388to display the vectors allocated for devices and their interrupt 266If you have a bridge which Linux doesn't yet know about, you can enable
389MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). Below shows MSI mode is 267MSIs in configuration space using whatever method you know works, then
390enabled on a SCSI Adaptec 39320D Ultra320 controller. 268enable MSIs on that bridge by doing:
391 269
392 CPU0 CPU1 270 echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
393 0: 324639 0 IO-APIC-edge timer 271
394 1: 1186 0 IO-APIC-edge i8042 272where $bridge is the PCI address of the bridge you've enabled (eg
395 2: 0 0 XT-PIC cascade 2730000:00:0e.0).
396 12: 2797 0 IO-APIC-edge i8042 274
397 14: 6543 0 IO-APIC-edge ide0 275To disable MSIs, echo 0 instead of 1. Changing this value should be
398 15: 1 0 IO-APIC-edge ide1 276done with caution as it can break interrupt handling for all devices
399169: 0 0 IO-APIC-level uhci-hcd 277below this bridge.
400185: 0 0 IO-APIC-level uhci-hcd 278
401193: 138 10 PCI-MSI aic79xx 279Again, please notify linux-pci@vger.kernel.org of any bridges that need
402201: 30 0 PCI-MSI aic79xx 280special handling.
403225: 30 0 IO-APIC-level aic7xxx 281
404233: 30 0 IO-APIC-level aic7xxx 2825.3. Disabling MSIs on a single device
405NMI: 0 0 283
406LOC: 324553 325068 284Some devices are known to have faulty MSI implementations. Usually this
407ERR: 0 285is handled in the individual device driver but occasionally it's necessary
408MIS: 0 286to handle this with a quirk. Some drivers have an option to disable use
409 287of MSI. While this is a convenient workaround for the driver author,
4106. MSI quirks 288it is not good practice, and should not be emulated.
411 289
412Several PCI chipsets or devices are known to not support MSI. 2905.4. Finding why MSIs are disabled on a device
413The PCI stack provides 3 possible levels of MSI disabling: 291
414* on a single device 292From the above three sections, you can see that there are many reasons
415* on all devices behind a specific bridge 293why MSIs may not be enabled for a given device. Your first step should
416* globally 294be to examine your dmesg carefully to determine whether MSIs are enabled
417 295for your machine. You should also check your .config to be sure you
4186.1. Disabling MSI on a single device 296have enabled CONFIG_PCI_MSI.
419 297
420Under some circumstances it might be required to disable MSI on a 298Then, 'lspci -t' gives the list of bridges above a device. Reading
421single device. This may be achieved by either not calling pci_enable_msi() 299/sys/bus/pci/devices/*/msi_bus will tell you whether MSIs are enabled (1)
422at all, or setting the pci_dev->no_msi flag beforehand (most of the time 300or disabled (0). If 0 is found in any of the msi_bus files belonging
423in a quirk). 301to bridges between the PCI root and the device, MSIs are disabled.
424 302
4256.2. Disabling MSI below a bridge 303It is also worth checking the device driver to see whether it supports MSIs.
426 304For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or
427The vast majority of MSI quirks are required by PCI bridges not 305pci_enable_msi_block().
428being able to route MSI between busses. In this case, MSI have to be
429disabled on all devices behind this bridge. This is achieved by setting
430the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
431subordinate bus. There is no need to set the same flag on bridges that
432are below the broken bridge. When pci_enable_msi() is called to enable
433MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
434flag in all parent busses of the device.
435
436Some bridges actually support dynamic MSI support enabling/disabling
437by changing some bits in their PCI configuration space (especially
438the Hypertransport chipsets such as the nVidia nForce and Serverworks
439HT2000). It may then be required to update the NO_MSI flag on the
440corresponding devices in the sysfs hierarchy. To enable MSI support
441on device "0000:00:0e", do:
442
443 echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus
444
445To disable MSI support, echo 0 instead of 1. Note that it should be
446used with caution since changing this value might break interrupts.
447
4486.3. Disabling MSI globally
449
450Some extreme cases may require disabling MSI globally on the system.
451For now, the only known case is the Serverworks PCI-X chipset (MSI are
452not supported on several busses that are not all connected to the
453chipset in the Linux PCI hierarchy). In the vast majority of other
454cases, disabling only behind a specific bridge is enough.
455
456For debugging purposes, the user may also pass pci=nomsi on the kernel
457command-line to explicitly disable MSI globally. But, once the
458appropriate quirks are added to the kernel, this option should not be
459required anymore.
460
4616.4. Finding why MSI cannot be enabled on a device
462
463Assuming that MSI are not enabled on a device, you should look at
464dmesg to find messages that quirks may output when disabling MSI
465on some devices, some bridges or even globally.
466Then, lspci -t gives the list of bridges above a device. Reading
467/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
468are enabled (1) or disabled (0). If 0 is found in the msi_bus file of
469any bridge above the device, MSI cannot be enabled.
470
4717. FAQ
472
473Q1. Are there any limitations on using the MSI?
474
475A1. If the PCI device supports MSI and conforms to the
476specification and the platform supports the APIC local bus,
477then using MSI should work.
478
479Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
480AMD processors)? In P3 IPIs are transmitted on the APIC local
481bus and in P4 and Xeon they are transmitted on the system
482bus. Are there any implications with this?
483
484A2. MSI support enables a PCI device to send an inbound
485memory write (0xfeexxxxx as target address) on its PCI bus
486directly to the FSB. Since the message address has a
487redirection hint bit cleared, it should work.
488
489Q3. The target address 0xfeexxxxx will be translated by the
490Host Bridge into an interrupt message. Are there any
491limitations on the chipsets such as Intel 8xx, Intel e7xxx,
492or VIA?
493
494A3. If these chipsets support an inbound memory write with
495target address set as 0xfeexxxxx, in conformance with PCI
496specification 2.3 or later, then it should work.
497
498Q4. From the driver point of view, if the MSI is lost because
499of errors occurring during inbound memory write, then it may
500wait forever. Is there a mechanism for it to recover?
501
502A4. Since the target of the transaction is an inbound memory
503write, all transaction termination conditions (Retry,
504Master-Abort, Target-Abort, or normal completion) are
505supported. A device sending an MSI must abide by all the PCI
506rules and conditions regarding that inbound memory write. So,
507if a retry is signaled it must retry, etc... We believe that
508the recommendation for Abort is also a retry (refer to PCI
509specification 2.3 or later).