author | Alexander Gordeev <agordeev@redhat.com> | 2013-12-30 02:28:16 -0500 |
---|---|---|
committer | Bjorn Helgaas <bhelgaas@google.com> | 2014-01-03 19:17:55 -0500 |
commit | 302a2523c277bea0bbe8340312b09507905849ed (patch) | |
tree | a2d971a8aa28744c65dd454bfe4509a9be7be623 /Documentation/PCI/MSI-HOWTO.txt | |
parent | ff1aa430a2fa43189e89c7ddd559f0bee2298288 (diff) |
PCI/MSI: Add pci_enable_msi_range() and pci_enable_msix_range()
This adds pci_enable_msi_range(), which supersedes the pci_enable_msi()
and pci_enable_msi_block() MSI interfaces.
It also adds pci_enable_msix_range(), which supersedes the
pci_enable_msix() MSI-X interface.
The old interfaces have three categories of return values:
    negative: failure; caller should not retry
    positive: failure; value indicates number of interrupts that *could*
              have been allocated, and caller may retry with a smaller request
    zero:     success; at least as many interrupts allocated as requested
It is error-prone to handle these three cases correctly in drivers.
The new functions return either a negative error code or a number of
successfully allocated MSI/MSI-X interrupts, which is expected to lead to
clearer device driver code.
pci_enable_msi(), pci_enable_msi_block() and pci_enable_msix() still exist
unchanged, but are deprecated and may be removed after callers are updated.
[bhelgaas: tweak changelog]
Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
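The difference between the two return conventions described in the changelog can be illustrated outside the kernel. What follows is a minimal user-space sketch under stated assumptions, not the kernel implementation: mock_available, mock_enable_block() and mock_enable_range() are hypothetical stand-ins invented for this illustration, and the mock platform never consumes vectors between calls.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the platform: number of vectors it can give. */
static int mock_available = 4;

/*
 * Mock of the OLD three-way convention:
 *   0  = success, 'count' vectors allocated
 *   >0 = failure, value is how many could have been allocated
 *   <0 = failure, caller should not retry
 */
static int mock_enable_block(int count)
{
	if (count < 1)
		return -EINVAL;
	if (mock_available == 0)
		return -ENOSPC;
	if (count > mock_available)
		return mock_available;
	return 0;
}

/*
 * Mock of the NEW convention: returns the number of vectors allocated
 * (between 'minvec' and 'maxvec', inclusive) or a negative error code.
 * The retry loop the old convention forced on every caller lives here.
 */
static int mock_enable_range(int minvec, int maxvec)
{
	int nvec = maxvec;
	int rc;

	if (minvec < 1 || maxvec < minvec)
		return -ERANGE;

	do {
		rc = mock_enable_block(nvec);
		if (rc < 0)
			return rc;		/* fatal, do not retry */
		if (rc > 0) {
			if (rc < minvec)
				return -ENOSPC;	/* cannot satisfy 'minvec' */
			nvec = rc;		/* retry with the hint */
		}
	} while (rc);

	/* On success the result always lies within the requested range. */
	assert(nvec >= minvec && nvec <= maxvec);
	return nvec;
}
```

With mock_available set to 4, mock_enable_range(1, 8) returns 4 and mock_enable_range(5, 8) returns -ENOSPC, so a caller only has to distinguish a negative error from a positive count.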
Diffstat (limited to 'Documentation/PCI/MSI-HOWTO.txt')
-rw-r--r-- | Documentation/PCI/MSI-HOWTO.txt | 261 |
1 files changed, 186 insertions, 75 deletions
diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index b58f4a4d14bb..a8d01005f480 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -82,67 +82,97 @@ Most of the hard work is done for the driver in the PCI layer. It simply | |||
82 | has to request that the PCI layer set up the MSI capability for this | 82 | has to request that the PCI layer set up the MSI capability for this |
83 | device. | 83 | device. |
84 | 84 | ||
85 | 4.2.1 pci_enable_msi | 85 | 4.2.1 pci_enable_msi_range |
86 | 86 | ||
87 | int pci_enable_msi(struct pci_dev *dev) | 87 | int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec) |
88 | 88 | ||
89 | A successful call allocates ONE interrupt to the device, regardless | 89 | This function allows a device driver to request any number of MSI |
90 | of how many MSIs the device supports. The device is switched from | 90 | interrupts within the specified range from 'minvec' to 'maxvec'. |
91 | pin-based interrupt mode to MSI mode. The dev->irq number is changed | ||
92 | to a new number which represents the message signaled interrupt; | ||
93 | consequently, this function should be called before the driver calls | ||
94 | request_irq(), because an MSI is delivered via a vector that is | ||
95 | different from the vector of a pin-based interrupt. | ||
96 | 91 | ||
97 | 4.2.2 pci_enable_msi_block | 92 | If this function returns a positive number it indicates the number of |
93 | MSI interrupts that have been successfully allocated. In this case | ||
94 | the device is switched from pin-based interrupt mode to MSI mode and | ||
95 | updates dev->irq to be the lowest of the new interrupts assigned to it. | ||
96 | The other interrupts assigned to the device are in the range dev->irq | ||
97 | to dev->irq + returned value - 1. Device driver can use the returned | ||
98 | number of successfully allocated MSI interrupts to further allocate | ||
99 | and initialize device resources. | ||
98 | 100 | ||
99 | int pci_enable_msi_block(struct pci_dev *dev, int count) | 101 | If this function returns a negative number, it indicates an error and |
102 | the driver should not attempt to request any more MSI interrupts for | ||
103 | this device. | ||
100 | 104 | ||
101 | This variation on the above call allows a device driver to request multiple | 105 | This function should be called before the driver calls request_irq(), |
102 | MSIs. The MSI specification only allows interrupts to be allocated in | 106 | because MSI interrupts are delivered via vectors that are different |
103 | powers of two, up to a maximum of 2^5 (32). | 107 | from the vector of a pin-based interrupt. |
104 | 108 | ||
105 | If this function returns 0, it has succeeded in allocating at least as many | 109 | It is ideal if drivers can cope with a variable number of MSI interrupts; |
106 | interrupts as the driver requested (it may have allocated more in order | 110 | there are many reasons why the platform may not be able to provide the |
107 | to satisfy the power-of-two requirement). In this case, the function | 111 | exact number that a driver asks for. |
108 | enables MSI on this device and updates dev->irq to be the lowest of | ||
109 | the new interrupts assigned to it. The other interrupts assigned to | ||
110 | the device are in the range dev->irq to dev->irq + count - 1. | ||
111 | 112 | ||
112 | If this function returns a negative number, it indicates an error and | 113 | There could be devices that can not operate with just any number of MSI |
113 | the driver should not attempt to request any more MSI interrupts for | 114 | interrupts within a range. See chapter 4.3.1.3 to get the idea how to |
114 | this device. If this function returns a positive number, it is | 115 | handle such devices for MSI-X - the same logic applies to MSI. |
115 | less than 'count' and indicates the number of interrupts that could have | 116 | |
116 | been allocated. In neither case is the irq value updated or the device | 117 | 4.2.1.1 Maximum possible number of MSI interrupts |
117 | switched into MSI mode. | 118 | |
118 | 119 | The typical usage of MSI interrupts is to allocate as many vectors as | |
119 | The device driver must decide what action to take if | 120 | possible, likely up to the limit returned by pci_msi_vec_count() function: |
120 | pci_enable_msi_block() returns a value less than the number requested. | 121 | |
121 | For instance, the driver could still make use of fewer interrupts; | 122 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) |
122 | in this case the driver should call pci_enable_msi_block() | 123 | { |
123 | again. Note that it is not guaranteed to succeed, even when the | 124 | return pci_enable_msi_range(pdev, 1, nvec); |
124 | 'count' has been reduced to the value returned from a previous call to | 125 | } |
125 | pci_enable_msi_block(). This is because there are multiple constraints | 126 | |
126 | on the number of vectors that can be allocated; pci_enable_msi_block() | 127 | Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive, |
127 | returns as soon as it finds any constraint that doesn't allow the | 128 | the value of 0 would be meaningless and could result in error. |
128 | call to succeed. | 129 | |
129 | 130 | Some devices have a minimal limit on number of MSI interrupts. | |
130 | 4.2.3 pci_disable_msi | 131 | In this case the function could look like this: |
132 | |||
133 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | ||
134 | { | ||
135 | return pci_enable_msi_range(pdev, FOO_DRIVER_MINIMUM_NVEC, nvec); | ||
136 | } | ||
137 | |||
138 | 4.2.1.2 Exact number of MSI interrupts | ||
139 | |||
140 | If a driver is unable or unwilling to deal with a variable number of MSI | ||
141 | interrupts it could request a particular number of interrupts by passing | ||
142 | that number to pci_enable_msi_range() function as both 'minvec' and 'maxvec' | ||
143 | parameters: | ||
144 | |||
145 | static int foo_driver_enable_msi(struct pci_dev *pdev, int nvec) | ||
146 | { | ||
147 | return pci_enable_msi_range(pdev, nvec, nvec); | ||
148 | } | ||
149 | |||
150 | 4.2.1.3 Single MSI mode | ||
151 | |||
152 | The most notorious example of the request type described above is | ||
153 | enabling the single MSI mode for a device. It could be done by passing | ||
154 | two 1s as 'minvec' and 'maxvec': | ||
155 | |||
156 | static int foo_driver_enable_single_msi(struct pci_dev *pdev) | ||
157 | { | ||
158 | return pci_enable_msi_range(pdev, 1, 1); | ||
159 | } | ||
160 | |||
161 | 4.2.2 pci_disable_msi | ||
131 | 162 | ||
132 | void pci_disable_msi(struct pci_dev *dev) | 163 | void pci_disable_msi(struct pci_dev *dev) |
133 | 164 | ||
134 | This function should be used to undo the effect of pci_enable_msi() or | 165 | This function should be used to undo the effect of pci_enable_msi_range(). |
135 | pci_enable_msi_block(). Calling it restores dev->irq to the pin-based | 166 | Calling it restores dev->irq to the pin-based interrupt number and frees |
136 | interrupt number and frees the previously allocated message signaled | 167 | the previously allocated MSIs. The interrupts may subsequently be assigned |
137 | interrupt(s). The interrupt may subsequently be assigned to another | 168 | to another device, so drivers should not cache the value of dev->irq. |
138 | device, so drivers should not cache the value of dev->irq. | ||
139 | 169 | ||
140 | Before calling this function, a device driver must always call free_irq() | 170 | Before calling this function, a device driver must always call free_irq() |
141 | on any interrupt for which it previously called request_irq(). | 171 | on any interrupt for which it previously called request_irq(). |
142 | Failure to do so results in a BUG_ON(), leaving the device with | 172 | Failure to do so results in a BUG_ON(), leaving the device with |
143 | MSI enabled and thus leaking its vector. | 173 | MSI enabled and thus leaking its vector. |
144 | 174 | ||
145 | 4.2.4 pci_msi_vec_count | 175 | 4.2.3 pci_msi_vec_count |
146 | 176 | ||
147 | int pci_msi_vec_count(struct pci_dev *dev) | 177 | int pci_msi_vec_count(struct pci_dev *dev) |
148 | 178 | ||
@@ -176,26 +206,31 @@ in each element of the array to indicate for which entries the kernel | |||
176 | should assign interrupts; it is invalid to fill in two entries with the | 206 | should assign interrupts; it is invalid to fill in two entries with the |
177 | same number. | 207 | same number. |
178 | 208 | ||
179 | 4.3.1 pci_enable_msix | 209 | 4.3.1 pci_enable_msix_range |
180 | 210 | ||
181 | int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec) | 211 | int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, |
212 | int minvec, int maxvec) | ||
182 | 213 | ||
183 | Calling this function asks the PCI subsystem to allocate 'nvec' MSIs. | 214 | Calling this function asks the PCI subsystem to allocate any number of |
215 | MSI-X interrupts within the specified range from 'minvec' to 'maxvec'. | ||
184 | The 'entries' argument is a pointer to an array of msix_entry structs | 216 | The 'entries' argument is a pointer to an array of msix_entry structs |
185 | which should be at least 'nvec' entries in size. On success, the | 217 | which should be at least 'maxvec' entries in size. |
186 | device is switched into MSI-X mode and the function returns 0. | 218 | |
187 | The 'vector' member in each entry is populated with the interrupt number; | 219 | On success, the device is switched into MSI-X mode and the function |
220 | returns the number of MSI-X interrupts that have been successfully | ||
221 | allocated. In this case the 'vector' member in entries numbered from | ||
222 | 0 to the returned value - 1 is populated with the interrupt number; | ||
188 | the driver should then call request_irq() for each 'vector' that it | 223 | the driver should then call request_irq() for each 'vector' that it |
189 | decides to use. The device driver is responsible for keeping track of the | 224 | decides to use. The device driver is responsible for keeping track of the |
190 | interrupts assigned to the MSI-X vectors so it can free them again later. | 225 | interrupts assigned to the MSI-X vectors so it can free them again later. |
226 | Device driver can use the returned number of successfully allocated MSI-X | ||
227 | interrupts to further allocate and initialize device resources. | ||
191 | 228 | ||
192 | If this function returns a negative number, it indicates an error and | 229 | If this function returns a negative number, it indicates an error and |
193 | the driver should not attempt to allocate any more MSI-X interrupts for | 230 | the driver should not attempt to allocate any more MSI-X interrupts for |
194 | this device. If it returns a positive number, it indicates the maximum | 231 | this device. |
195 | number of interrupt vectors that could have been allocated. See example | ||
196 | below. | ||
197 | 232 | ||
198 | This function, in contrast with pci_enable_msi(), does not adjust | 233 | This function, in contrast with pci_enable_msi_range(), does not adjust |
199 | dev->irq. The device will not generate interrupts for this interrupt | 234 | dev->irq. The device will not generate interrupts for this interrupt |
200 | number once MSI-X is enabled. | 235 | number once MSI-X is enabled. |
201 | 236 | ||
@@ -206,28 +241,103 @@ It is ideal if drivers can cope with a variable number of MSI-X interrupts; | |||
206 | there are many reasons why the platform may not be able to provide the | 241 | there are many reasons why the platform may not be able to provide the |
207 | exact number that a driver asks for. | 242 | exact number that a driver asks for. |
208 | 243 | ||
209 | A request loop to achieve that might look like: | 244 | There could be devices that can not operate with just any number of MSI-X |
245 | interrupts within a range. E.g., a network adapter might need, let's say, | ||
246 | four vectors per queue it provides. Therefore, the number of MSI-X | ||
247 | interrupts allocated should be a multiple of four. In this case the | ||
248 | pci_enable_msix_range() interface can not be used alone to request MSI-X | ||
249 | interrupts (since it can allocate any number within the range, without any | ||
250 | notion of the multiple of four) and the device driver should implement | ||
251 | custom logic to request the required number of MSI-X interrupts. | ||
252 | |||
253 | 4.3.1.1 Maximum possible number of MSI-X interrupts | ||
254 | |||
255 | The typical usage of MSI-X interrupts is to allocate as many vectors as | ||
256 | possible, likely up to the limit returned by pci_msix_vec_count() function: | ||
257 | |||
258 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | ||
259 | { | ||
260 | return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||
261 | 1, nvec); | ||
262 | } | ||
263 | |||
264 | Note the value of 'minvec' parameter is 1. As 'minvec' is inclusive, | ||
265 | the value of 0 would be meaningless and could result in error. | ||
266 | |||
267 | Some devices have a minimal limit on number of MSI-X interrupts. | ||
268 | In this case the function could look like this: | ||
210 | 269 | ||
211 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | 270 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) |
212 | { | 271 | { |
213 | while (nvec >= FOO_DRIVER_MINIMUM_NVEC) { | 272 | return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, |
214 | rc = pci_enable_msix(adapter->pdev, | 273 | FOO_DRIVER_MINIMUM_NVEC, nvec); |
215 | adapter->msix_entries, nvec); | 274 | } |
216 | if (rc > 0) | 275 | |
217 | nvec = rc; | 276 | 4.3.1.2 Exact number of MSI-X interrupts |
218 | else | 277 | |
219 | return rc; | 278 | If a driver is unable or unwilling to deal with a variable number of MSI-X |
279 | interrupts it could request a particular number of interrupts by passing | ||
280 | that number to pci_enable_msix_range() function as both 'minvec' and 'maxvec' | ||
281 | parameters: | ||
282 | |||
283 | static int foo_driver_enable_msix(struct foo_adapter *adapter, int nvec) | ||
284 | { | ||
285 | return pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||
286 | nvec, nvec); | ||
287 | } | ||
288 | |||
289 | 4.3.1.3 Specific requirements to the number of MSI-X interrupts | ||
290 | |||
291 | As noted above, there could be devices that can not operate with just any | ||
292 | number of MSI-X interrupts within a range. E.g., let's assume a device that | ||
293 | is only capable of sending a number of MSI-X interrupts that is a power of | ||
294 | two. A routine that enables MSI-X mode for such a device might look like this: | ||
295 | |||
296 | /* | ||
297 | * Assume 'minvec' and 'maxvec' are non-zero | ||
298 | */ | ||
299 | static int foo_driver_enable_msix(struct foo_adapter *adapter, | ||
300 | int minvec, int maxvec) | ||
301 | { | ||
302 | int rc; | ||
303 | |||
304 | minvec = roundup_pow_of_two(minvec); | ||
305 | maxvec = rounddown_pow_of_two(maxvec); | ||
306 | |||
307 | if (minvec > maxvec) | ||
308 | return -ERANGE; | ||
309 | |||
310 | retry: | ||
311 | rc = pci_enable_msix_range(adapter->pdev, adapter->msix_entries, | ||
312 | maxvec, maxvec); | ||
313 | /* | ||
314 | * -ENOSPC is the only error code allowed to be analyzed | ||
315 | */ | ||
316 | if (rc == -ENOSPC) { | ||
317 | if (maxvec == 1) | ||
318 | return -ENOSPC; | ||
319 | |||
320 | maxvec /= 2; | ||
321 | |||
322 | if (minvec > maxvec) | ||
323 | return -ENOSPC; | ||
324 | |||
325 | goto retry; | ||
220 | } | 326 | } |
221 | 327 | ||
222 | return -ENOSPC; | 328 | return rc; |
223 | } | 329 | } |
224 | 330 | ||
331 | Note how the pci_enable_msix_range() return value is analyzed for a fallback - | ||
332 | any error code other than -ENOSPC indicates a fatal error and should not | ||
333 | be retried. | ||
334 | |||
225 | 4.3.2 pci_disable_msix | 335 | 4.3.2 pci_disable_msix |
226 | 336 | ||
227 | void pci_disable_msix(struct pci_dev *dev) | 337 | void pci_disable_msix(struct pci_dev *dev) |
228 | 338 | ||
229 | This function should be used to undo the effect of pci_enable_msix(). It frees | 339 | This function should be used to undo the effect of pci_enable_msix_range(). |
230 | the previously allocated message signaled interrupts. The interrupts may | 340 | It frees the previously allocated MSI-X interrupts. The interrupts may |
231 | subsequently be assigned to another device, so drivers should not cache | 341 | subsequently be assigned to another device, so drivers should not cache |
232 | the value of the 'vector' elements over a call to pci_disable_msix(). | 342 | the value of the 'vector' elements over a call to pci_disable_msix(). |
233 | 343 | ||
@@ -261,13 +371,14 @@ number of MSI-X interrupt vectors that could be allocated. | |||
261 | If a device implements both MSI and MSI-X capabilities, it can | 371 | If a device implements both MSI and MSI-X capabilities, it can |
262 | run in either MSI mode or MSI-X mode, but not both simultaneously. | 372 | run in either MSI mode or MSI-X mode, but not both simultaneously. |
263 | This is a requirement of the PCI spec, and it is enforced by the | 373 | This is a requirement of the PCI spec, and it is enforced by the |
264 | PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or | 374 | PCI layer. Calling pci_enable_msi_range() when MSI-X is already |
265 | pci_enable_msix() when MSI is already enabled results in an error. | 375 | enabled or pci_enable_msix_range() when MSI is already enabled |
266 | If a device driver wishes to switch between MSI and MSI-X at runtime, | 376 | results in an error. If a device driver wishes to switch between MSI |
267 | it must first quiesce the device, then switch it back to pin-interrupt | 377 | and MSI-X at runtime, it must first quiesce the device, then switch |
268 | mode, before calling pci_enable_msi() or pci_enable_msix() and resuming | 378 | it back to pin-interrupt mode, before calling pci_enable_msi_range() |
269 | operation. This is not expected to be a common operation but may be | 379 | or pci_enable_msix_range() and resuming operation. This is not expected |
270 | useful for debugging or testing during development. | 380 | to be a common operation but may be useful for debugging or testing |
381 | during development. | ||
271 | 382 | ||
272 | 4.5 Considerations when using MSIs | 383 | 4.5 Considerations when using MSIs |
273 | 384 | ||
@@ -382,5 +493,5 @@ or disabled (0). If 0 is found in any of the msi_bus files belonging | |||
382 | to bridges between the PCI root and the device, MSIs are disabled. | 493 | to bridges between the PCI root and the device, MSIs are disabled. |
383 | 494 | ||
384 | It is also worth checking the device driver to see whether it supports MSIs. | 495 | It is also worth checking the device driver to see whether it supports MSIs. |
385 | For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or | 496 | For example, it may contain calls to pci_enable_msi_range() or |
386 | pci_enable_msi_block(). | 497 | pci_enable_msix_range(). |
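The power-of-two fallback from section 4.3.1.3 of the patched document can be tried out in user space. This is only a sketch under stated assumptions: mock_free_vectors and mock_enable_msix_range() are hypothetical stand-ins for the platform and for pci_enable_msix_range(), round_up_pow2() and round_down_pow2() are local helpers written for this example (intended for small positive values only), and the goto-based retry is expressed as a for loop.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: number of vectors the platform can provide. */
static int mock_free_vectors = 6;

/* Mock with the new convention: positive count on success, <0 on error. */
static int mock_enable_msix_range(int minvec, int maxvec)
{
	if (minvec < 1 || maxvec < minvec)
		return -ERANGE;
	if (mock_free_vectors < minvec)
		return -ENOSPC;
	return maxvec < mock_free_vectors ? maxvec : mock_free_vectors;
}

/* Smallest power of two >= n, for small positive n. */
static int round_up_pow2(int n)
{
	int p = 1;

	while (p < n)
		p *= 2;
	return p;
}

/* Largest power of two <= n, for small positive n. */
static int round_down_pow2(int n)
{
	int p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Request an exact power-of-two count, halving it on -ENOSPC only. */
static int enable_pow2_msix(int minvec, int maxvec)
{
	int rc;

	assert(minvec > 0 && maxvec > 0);

	minvec = round_up_pow2(minvec);
	maxvec = round_down_pow2(maxvec);
	if (minvec > maxvec)
		return -ERANGE;

	for (; maxvec >= minvec; maxvec /= 2) {
		/* minvec == maxvec asks for exactly 'maxvec' vectors. */
		rc = mock_enable_msix_range(maxvec, maxvec);
		if (rc != -ENOSPC)
			return rc;	/* success or fatal error */
	}

	return -ENOSPC;
}
```

With mock_free_vectors set to 6, enable_pow2_msix(1, 8) first asks for 8, gets -ENOSPC, then succeeds with 4. Any error other than -ENOSPC ends the loop immediately, matching the note that only -ENOSPC should trigger a retry.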