path: root/drivers/pci/msi.c
Commit message (Author, Age)
* PCI MSI: Add support for multiple MSI (Matthew Wilcox, 2009-03-20)
| | | | | | | | | | Add the new API pci_enable_msi_block() to allow drivers to request multiple MSI and reimplement pci_enable_msi in terms of pci_enable_msi_block. Ensure that the architecture back ends don't have to know about multiple MSI. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI MSI: Refactor interrupt masking code (Matthew Wilcox, 2009-03-20)
| | | | | | | | | | | | | | | | | | Since most of the callers already know whether they have an MSI or an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq() and msix_mask_irq(). The only callers which don't (mask_msi_irq() and unmask_msi_irq()) can share code in msi_set_mask_bit(). This then becomes the only caller of msix_flush_writes(), so we can inline it. The flushing read can be to any address that belongs to the device, so we can eliminate the calculation too. We can also get rid of maskbits_mask from struct msi_desc and simply recalculate it on the rare occasion that we need it. The single-bit 'masked' element is replaced by a copy of the 32-bit 'masked' register, so this patch does not affect the size of msi_desc. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI MSI: Use mask_pos instead of mask_base when appropriate (Matthew Wilcox, 2009-03-20)
| | | | | | | | MSI interrupts have a mask_pos where MSI-X have a mask_base. Use a transparent union to get rid of some ugly casts. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI MSI: msi_desc->dev is always initialised (Matthew Wilcox, 2009-03-20)
| | | | | | | | | By passing the pci_dev into alloc_msi_entry() we can be sure that the ->dev entry is always assigned and so we don't need to check it. Also, we used kzalloc() so we don't need to initialise ->irq to 0. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI MSI: Replace 'type' with 'is_msix' (Matthew Wilcox, 2009-03-20)
| | | | | | | | | | By changing from a 5-bit field to a 1-bit field, we free up some bits that can be used by a later patch. Also rearrange the fields for better packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes to 64 bytes). Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI/MSI: Allow arch code to return the number of MSI-X available (Michael Ellerman, 2009-03-19)
| | | | | | | | | | | | | | | | | | | | | | | | | There is code in msix_capability_init() which, when the requested number of MSI-X couldn't be allocated, calculates how many MSI-X /could/ be allocated and returns that to the driver. That allows the driver to then make a second request, with a number of MSIs that should succeed. The current code requires the arch code to setup as many msi_descs as it can, and then return to the generic code. On some platforms the arch code may already know how many MSI-X it can allocate, before it sets up any of the msi_descs. So change the logic such that if the arch code returns a positive error code, that is taken to be the number of MSI-X that could be allocated. If the error code is negative we still calculate the number available using the old method. Because it's a little subtle, make sure the error return code from arch_setup_msi_irq() is always negative. That way only implementations of arch_setup_msi_irqs() need to be careful about returning a positive error code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
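The return-value convention described in this commit can be sketched in user-space C. This is an illustration only, not the kernel code: `msix_setup_result()` and `single_irq_result()` are hypothetical helpers, and the choice of -ENOSPC for a stray positive return is an assumption.

```c
#include <errno.h>

/*
 * Hypothetical helper, not a kernel function: turn the value returned by the
 * multi-vector arch hook into what the generic code reports to the driver.
 * `allocated` is the number of msi_descs that ended up with irq != 0.
 */
static int msix_setup_result(int ret, int allocated)
{
	if (ret == 0)
		return 0;		/* every vector was set up */
	if (ret > 0)
		return ret;		/* arch already knows how many it could allocate */
	if (allocated > 0)
		return allocated;	/* old method: count descriptors that got an irq */
	return ret;			/* nothing allocated: pass the error through */
}

/*
 * The single-vector hook must never be mistaken for a count, so a stray
 * positive return is folded into a negative errno (-ENOSPC is assumed here).
 */
static int single_irq_result(int ret)
{
	return ret > 0 ? -ENOSPC : ret;
}
```

A driver that receives a positive value from its MSI-X request could then retry with that smaller number of vectors.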
* PCI/MSI: Use #ifdefs instead of weak functions (Michael Ellerman, 2009-03-19)
|
| Weak functions aren't all they're cracked up to be. They lead to incorrect binaries with some toolchains, they require us to have empty functions we otherwise wouldn't, and the unused code is not elided (as of gcc 4.3.2 anyway).
|
| So replace the weak MSI arch hooks with the #define foo foo idiom. We no longer need empty versions of arch_setup/teardown_msi_irq().
|
| This is less source (by 1 line!), and results in smaller binaries too:
|
|    text    data     bss      dec    hex filename
| 9354300 1693916  678424 11726640 b2ef30 build/powerpc/vmlinux-before
| 9354052 1693852  678424 11726328 b2edf8 build/powerpc/vmlinux-after
|
| Also smaller on x86_64 and arm (iop13xx).
|
| Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
| Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
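A minimal sketch of the `#define foo foo` idiom, compressed into one file for illustration. The function bodies and the `int irq` parameter are hypothetical stand-ins; only the preprocessor pattern is the point.

```c
/* --- what a (hypothetical) architecture header would provide --- */
static int arch_setup_msi_irq(int irq)
{
	return irq + 100;	/* pretend arch-specific work */
}
/* Defining the name to itself lets generic code detect the override. */
#define arch_setup_msi_irq arch_setup_msi_irq

/* --- generic code --- */
#ifndef arch_setup_msi_irq
/* Compiled only when the arch did not supply its own version. */
static int arch_setup_msi_irq(int irq)
{
	(void)irq;
	return -1;
}
#endif

#ifndef arch_teardown_msi_irq
/* No arch override in this sketch, so the generic default is used. */
static int arch_teardown_msi_irq(int irq)
{
	(void)irq;
	return 0;
}
#endif
```

Unlike a weak symbol, the unused default is never compiled at all, which is consistent with the size reduction reported in the commit message.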
* PCI/MSI: Introduce pci_msix_table_size() (Rafael J. Wysocki, 2009-03-19)
| | | | | | | | | | Introduce new function pci_msix_table_size() returning the size of the MSI-X table of given PCI device or 0 if the device doesn't support MSI-X. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI/MSI: fix msi_mask() shift fix (Matthew Wilcox, 2009-02-13)
| | | | | | | | | | | | | Hidetoshi Seto points out that commit bffac3c593eba1f9da3efd0199e49ea6558a40ce has wrong values in the array. Rather than correct the array, we can just use a bounds check and perform the calculation specified in the comment. As a bonus, this will not run off the end of the array if the device specifies an illegal value in the MSI capability. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
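The bounds check plus calculation described here can be sketched as follows. This is a user-space approximation using `<stdint.h>` types in place of the kernel's `u32`; the argument is assumed to be the log2-encoded vector count from the MSI capability.

```c
#include <stdint.h>

/* Return a mask with one bit per MSI vector, where the device has 2^x vectors. */
static uint32_t msi_mask(unsigned int x)
{
	/* Don't shift by >= the width of the type: that would be undefined. */
	if (x >= 5)
		return 0xffffffffu;
	return (1u << (1u << x)) - 1;
}
```

Because out-of-range values take the early return, an illegal value in the capability cannot index past any table or trigger an undefined shift.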
* PCI MSI: Fix undefined shift by 32 (Matthew Wilcox, 2009-01-27)
| | | | | | | | | Add an msi_mask() function which returns the correct bitmask for the number of MSI interrupts you have. This fixes undefined behaviour (a shift by 32) in msi_capability_init(). Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI/MSI: bugfix/utilize for msi_capability_init() (Hidetoshi Seto, 2009-01-16)
|
| This patch fixes the following bug and does a cleanup.
|
| bug:
|   commit 5993760f7fc75b77e4701f1e56dc84c0d6cf18d5 made a wrong change (since is_64 is a boolean [0|1]):
|
|   -    pci_write_config_dword(dev,
|   -            msi_mask_bits_reg(pos, is_64bit_address(control)),
|   -            maskbits);
|   +    pci_write_config_dword(dev, entry->msi_attrib.is_64, maskbits);
|
| utilize:
|   Unify separated if (entry->msi_attrib.maskbit) statements.
|
| Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
| Acked-by: "Jike Song" <albcamus@gmail.com>
| Cc: stable@vger.kernel.org
| Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
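To make the bug concrete: the boolean `is_64` was passed where a config-space offset was expected, so the write landed at offset 0 or 1 instead of the mask register. A minimal sketch of the intended calculation, assuming the standard MSI capability layout (mask register at offset 12 for a 32-bit address capability, 16 for a 64-bit one); `msi_mask_reg()` is a hypothetical name:

```c
#define MSI_MASK_32 12	/* mask register offset, 32-bit address capability */
#define MSI_MASK_64 16	/* mask register offset, 64-bit address capability */

/* Config-space offset of the mask bits register for a capability at `pos`. */
static unsigned int msi_mask_reg(unsigned int pos, int is_64)
{
	return pos + (is_64 ? MSI_MASK_64 : MSI_MASK_32);
}
```

For a capability at 0x50 this yields 0x5c or 0x60, never the bare 0 or 1 that the buggy call used.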
* ACPI/PCI: PCI MSI _OSC support capabilities called when root bridge added (Andrew Patterson, 2009-01-07)
| | | | | | | | | | The _OSC capability OSC_MSI_SUPPORT is set when the root bridge is added with pci_acpi_osc_support(), so we no longer need to do it in the PCI MSI driver. Also adds the function pci_msi_enabled, which returns true if pci=nomsi is not on the kernel command-line. Signed-off-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* x86, MSI: pass irq_cfg and irq_desc (Yinghai Lu, 2008-12-08)
| | | | | | | | | | Impact: simplify code. Pass irq_desc and cfg around instead of raw IRQ numbers; this way we don't have to look them up again and again. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* ACPI/PCI: Set support bit for MSI in support field of _OSC (Taku Izumi, 2008-10-22)
| | | | | | | | | Currently Linux doesn't have any code to set the "MSI supported" bit in the Support Field of _OSC. This patch adds the code for that. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI: utilize calculated results when detecting MSI features (Jike Song, 2008-10-20)
| | | | | | | | In msi_capability_init, we can make use of the calculated results instead of calling is_mask_bit_support and is_64bit_address twice. Signed-off-by: Jike Song <albcamus@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI: fully restore MSI state at resume time (Jesse Barnes, 2008-08-07)
| | | | | | | | | | | | With the recent change to avoid masking MSIs using the MSI enable bit, devices without an MSI mask bit will have their MSI capability always enabled when MSI is in use, so we need to restore it regardless of the mask bit state. Fixes kernel bz 11178. Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI MSI: Don't disable MSIs if the mask bit isn't supported (Matthew Wilcox, 2008-07-28)
| | | | | | | | | | David Vrabel has a device which generates an interrupt storm on the INTx pin if we disable MSI interrupts altogether. Masking interrupts is only a performance optimisation, so we can ignore the request to mask the interrupt. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI: use dev_printk when possible (Bjorn Helgaas, 2008-06-25)
| | | | | | | | | | | | | | | | | | | | | | Convert printks to use dev_printk(). I converted pr_debug() to dev_dbg(). Both use KERN_DEBUG and are enabled only when DEBUG is defined. I converted printk(KERN_DEBUG) to dev_printk(KERN_DEBUG), not to dev_dbg(), because dev_dbg() is only enabled when DEBUG is defined. I converted DBG(KERN_INFO) (only in setup-bus.c) to dev_info(). The DBG() name makes it sound like debug, but it's been enabled forever, so dev_info() preserves the previous behavior. I tried to make the resource assignment formats more consistent, e.g., "BAR %d: got res [%#llx-%#llx] bus [%#llx-%#llx] flags %#lx\n" instead of sometimes using "start-end" and sometimes using "size@start". I'm not attached to one or the other; I'd just like them consistent. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* PCI/MSI: skip calling pci_find_capability from msi_set_mask_bits (Hidetoshi Seto, 2008-06-10)
| | | | | | | | | The position of MSI capability is already cached in the msi_desc when we enter the msi_set_mask_bits(). Use it instead. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2 (Yinghai Lu, 2008-04-29)
|
| This change
|
| | commit 23a274c8a5adafc74a66f16988776fc7dd6f6e51
| | Author: Prakash, Sathya <sathya.prakash@lsi.com>
| | Date: Fri Mar 7 15:53:21 2008 +0530
| |
| | [SCSI] mpt fusion: Enable MSI by default for SAS controllers
| |
| | This patch modifies the driver to enable MSI by default for all SAS chips.
| |
| | Signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
| | Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
| causes the kexec of a RHEL 5.1 kernel to fail.
|
| Root cause: the RHEL 5.1 kernel still uses INTx emulation, and mptscsih_shutdown doesn't call pci_disable_msi to re-enable INTx on the kexec path.
|
| So call pci_msi_shutdown in the shutdown path to do the same thing to msix.
|
| Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
| Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
* pci/irq: restore mask_bits in msi shutdown -v3 (Yinghai Lu, 2008-04-29)
|
| Yinghai found that kexec'ing a RHEL 5.1 kernel with 2.6.25-rc3+ kernels prevents his NIC from working. He bisected to
|
| | commit 89d694b9dbe769ca1004e01db0ca43964806a611
| | Author: Thomas Gleixner <tglx@linutronix.de>
| | Date: Mon Feb 18 18:25:17 2008 +0100
| |
| | genirq: do not leave interupts enabled on free_irq
| |
| | The default_disable() function was changed in commit:
| |
| | 76d2160147f43f982dfe881404cfde9fd0a9da21
| | genirq: do not mask interrupts by default
|
| For MSI, default_shutdown will call mask_bit for the MSI device. All mask bits will be left disabled after free_irq. Then in the kexec case, the next kernel can only use the msi_enable bit, so none of the device's MSIs can be used.
|
| So let's restore the mask bit to its PCI-reset-defined value (enabled) when we disable the kernel's use of MSI, to be a little friendlier to kexec'd kernels.
|
| Extend msi_set_mask_bit to msi_set_mask_bits to take a mask, so we can fully restore that to 0x00 instead of 0xfe.
|
| Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
| Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
* PCI: drivers/pci/msi.c: move arch hooks to the top (Adrian Bunk, 2008-02-01)
|
| This patch fixes the following problem present with older gcc versions:
|
| <-- snip -->
|
| ...
|   CC      drivers/pci/msi.o
| /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/pci/msi.c:692: warning: weak declaration of `arch_msi_check_device' after first use results in unspecified behavior
| /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/pci/msi.c:704: warning: weak declaration of `arch_setup_msi_irqs' after first use results in unspecified behavior
| /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/pci/msi.c:724: warning: weak declaration of `arch_teardown_msi_irqs' after first use results in unspecified behavior
| ...
|
| <-- snip -->
|
| Signed-off-by: Adrian Bunk <bunk@kernel.org>
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* PCI: export pci_restore_msi_state() (Linas Vepstas, 2008-02-01)
| | | | | | | | | | | | | PCI error recovery usually involves the PCI adapter being reset. If the device is using MSI, the reset will cause the MSI state to be lost; the device driver needs to restore the MSI state. The pci_restore_msi_state() routine is currently protected by CONFIG_PM; remove this, and also export the symbol, so that it can be used in a module. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* PCI: Add quirk for devices which disable MSI when INTX_DISABLE is set. (David Miller, 2007-11-05)
| | | | | | | | | | | | | | | | | | A reasonably common problem with some devices is that they will disable MSI generation when the INTX_DISABLE bit is set in the PCI_COMMAND register. Quirk this explicitly, guarding the pci_intx() calls in msi.c with this quirk indication. The first entries for this quirk are for 5714 and 5780 Tigon3 chips, and thus we can remove the workaround code from the tg3.c driver. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Michael Chan <mchan@broadcom.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Use correct data offset for 32-bit MSI in read_msi_msg() (Roland Dreier, 2007-10-12)
| | | | | | | | | | | | | | While reading the MSI code trying to find a reason why MSI wouldn't work for devices that have a 32-bit MSI address capability, I noticed that read_msi_msg() seems to read the message data from the wrong offset in this case. Signed-off-by: Roland Dreier <roland@digitalvampire.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
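Why the offset matters: in the MSI capability the message data register follows the address, so it sits at a different offset depending on whether the upper address dword is present. A user-space sketch over a byte array standing in for config space; the helper names and `struct msi_msg_sketch` are hypothetical, and the offsets are assumed from the standard capability layout.

```c
#include <stdint.h>
#include <string.h>

#define MSI_ADDRESS_LO	4	/* lower 32 bits of the message address */
#define MSI_ADDRESS_HI	8	/* upper 32 bits, only with a 64-bit capability */
#define MSI_DATA_32	8	/* data register when there is no upper address */
#define MSI_DATA_64	12	/* data register after the upper address */

struct msi_msg_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint16_t data;
};

static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	uint32_t v;
	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	uint16_t v;
	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

/* Read the message back, picking the data offset that matches the capability. */
static void read_msi_msg_sketch(const uint8_t *cfg, unsigned int pos, int is_64,
				struct msi_msg_sketch *msg)
{
	msg->address_lo = cfg_read32(cfg, pos + MSI_ADDRESS_LO);
	if (is_64) {
		msg->address_hi = cfg_read32(cfg, pos + MSI_ADDRESS_HI);
		msg->data = cfg_read16(cfg, pos + MSI_DATA_64);
	} else {
		msg->address_hi = 0;
		msg->data = cfg_read16(cfg, pos + MSI_DATA_32);
	}
}
```

Reading a 32-bit capability with the 64-bit data offset would return whatever follows the data register rather than the message data itself.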
* msi: mask the msix vector before we unmap it (Eric W. Biederman, 2007-06-01)
|
| With these two lines in the reverse order, drivers/block/cciss.c was oopsing in msi_free_irqs. Silly us calling writel on an area after we unmap it.
|
| BUG: unable to handle kernel paging request at virtual address f8b2200c
|  printing eip:
| c01e9cc7
| *pdpt = 0000000000003001
| *pde = 0000000037e48067
| *pte = 0000000000000000
| Oops: 0002 [#1]
| SMP
| Modules linked in: cciss ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc loop dm_multipath button battery asus_acpi ac tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptsas scsi_transport_sas mptspi scsi_transport_spi mptscsih mptbase sd_mod scsi_mod
| CPU: 1
| EIP: 0060:[<c01e9cc7>] Not tainted VLI
| EFLAGS: 00010286 (2.6.22-rc2-gd2579053 #1)
| EIP is at msi_free_irqs+0x81/0xbe
| eax: f8b22000 ebx: f71f3180 ecx: f7fff280 edx: c1886eb8
| esi: f7c4e800 edi: f7c4ec48 ebp: 00000002 esp: f5a0dec8
| ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
| Process rmmod (pid: 5286, ti=f5a0d000 task=c47d2550 task.ti=f5a0d000)
| Stack: 00000002 f8b72294 00000400 f8b69ca7 f8b6bc6c 00000002 00000000 00000000
|        00000000 00000000 00000000 f5a997f4 f8b69d61 f7c5a4b0 f7c4e848 f7c4e848
|        f7c4e800 f7c4e800 f8b72294 f7c4e848 f8b72294 c01e3cdf f7c4e848 c024c469
| Call Trace:
|  [<f8b69ca7>] cciss_shutdown+0xae/0xc3 [cciss]
|  [<f8b69d61>] cciss_remove_one+0xa5/0x178 [cciss]
|  [<c01e3cdf>] pci_device_remove+0x16/0x35
|  [<c024c469>] __device_release_driver+0x71/0x8e
|  [<c024c56e>] driver_detach+0xa0/0xde
|  [<c024bc5c>] bus_remove_driver+0x27/0x41
|  [<c01e3ef3>] pci_unregister_driver+0xb/0x13
|  [<f8b6a343>] cciss_cleanup+0xf/0x51 [cciss]
|  [<c0139ced>] sys_delete_module+0x110/0x135
|  [<c0104c7a>] sysenter_past_esp+0x5f/0x85
|
| Here's a patch that just reverses the 2 lines of code as Eric suggests. Please consider this for inclusion.
|
| Signed-off-by: Mike Miller <mike.miller@hp.com>
| Signed-off-by: Chase Maupin <chase.maupin@hp.com>
| Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| Cc: Andi Kleen <ak@suse.de>
| Cc: Greg KH <greg@kroah.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* msi: fix the ordering of msix irqs (Eric W. Biederman, 2007-06-01)
| | | | | | | | | | | | | | | | | "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> writes: Found what seems the problem with our vectors being listed backward. In drivers/pci/msi.c we should be using list_add_tail rather than list_add to preserve the ordering across various kernels. Please consider this for inclusion. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Screwed-up-by: Michael Ellerman <michael@ellerman.id.au> Cc: "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> Cc: Andi Kleen <ak@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
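The ordering difference between the two insertion helpers can be demonstrated with a small user-space imitation of the kernel's circular list. Everything here is a sketch: the list helpers mimic `list_add()` and `list_add_tail()` but are reimplemented locally, and `struct desc_sketch` and `build_order()` are hypothetical.

```c
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

/* Link `n` between two known-adjacent nodes. */
static void list_insert(struct list_head *n, struct list_head *prev,
			struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert right after the head: the newest entry is walked first. */
static void list_add(struct list_head *n, struct list_head *head)
{
	list_insert(n, head, head->next);
}

/* Insert right before the head: walk order matches insertion order. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	list_insert(n, head->prev, head);
}

struct desc_sketch {
	struct list_head list;	/* must stay the first member for the cast below */
	int entry_nr;
};

/* Add entries 0..n-1 (n <= 8), then record entry numbers in list-walk order. */
static void build_order(int n, int use_tail, int *out)
{
	struct list_head head, *p;
	struct desc_sketch d[8];
	int i;

	list_init(&head);
	for (i = 0; i < n && i < 8; i++) {
		d[i].entry_nr = i;
		if (use_tail)
			list_add_tail(&d[i].list, &head);
		else
			list_add(&d[i].list, &head);
	}
	i = 0;
	for (p = head.next; p != &head; p = p->next)
		out[i++] = ((struct desc_sketch *)p)->entry_nr;
}
```

With head insertion the vectors come back reversed, which matches the "listed backward" symptom in the report.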
* msi: fix ARM compile (Dan Williams, 2007-05-31)
|
| In file included from drivers/pci/msi.c:22:
| include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory
| include/asm/smp.h:20:3: #error "<asm-arm/smp.h> included in non-SMP build"
| include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined
| In file included from include/linux/sched.h:65,
|                  from include/linux/mm.h:4,
|                  from drivers/pci/msi.c:10:
| include/linux/smp.h:85:1: warning: this is the location of the previous definition
|
| Tested on powerpc, i386, and x86_64.
|
| Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| Acked-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Fix assertion failure with MSI on sparc64 (David Miller, 2007-05-11)
|
| Today's find is a triggered assertion in msi_free_irqs() when the system doesn't support MSI, in which case arch_setup_msi_irqs() always returns an error.
|
| The problem is that when this happens we branch into msi_free_irqs(), to which you added the following assertion loop:
|
|     list_for_each_entry(entry, &dev->msi_list, list)
|         BUG_ON(irq_has_action(entry->irq));
|
| Well, if arch_setup_msi_irqs() fails, entry->irq will be zero and although that's never assigned to any normal devices we use that IRQ number for the timer interrupt on sparc64 so this assertion triggers.
|
| Better to test for zero before doing the irq_has_action() assertion thing.
|
| Signed-off-by: David S. Miller <davem@davemloft.net>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
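The fix amounts to skipping never-allocated entries before asserting. A user-space sketch, in which the `irq_action` table and `msi_free_would_bug()` are hypothetical stand-ins for the kernel's irq bookkeeping and for the BUG_ON loop:

```c
#include <stddef.h>

#define NR_IRQS_SKETCH 16

/* Nonzero means a handler is registered on that irq (hypothetical table). */
static int irq_action[NR_IRQS_SKETCH];

static int irq_has_action(unsigned int irq)
{
	return irq < NR_IRQS_SKETCH && irq_action[irq];
}

/* Returns 1 where the kernel would hit BUG_ON, 0 otherwise. */
static int msi_free_would_bug(const unsigned int *irqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (irqs[i] == 0)
			continue;	/* never allocated: irq 0 may be the timer */
		if (irq_has_action(irqs[i]))
			return 1;
	}
	return 0;
}
```

With the zero test in place, a platform that uses irq 0 for something unrelated no longer trips the assertion after a failed setup.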
* header cleaning: don't include smp_lock.h when not used (Randy Dunlap, 2007-05-08)
| | | | | | | | | | | | Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* MSI: Give archs the option to free all MSI/Xs at once. (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | | | | | | | This patch introduces an optional function, arch_teardown_msi_irqs(), which gives an arch the opportunity to do per-device teardown for MSI/X. If that's not required, the default version simply calls arch_teardown_msi_irq() for each msi irq required. arch_teardown_msi_irqs() is simply passed a pdev; attached to the pdev is a list of msi_descs, and it is up to the arch to free the irq associated with each of these as appropriate. For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs with irq == 0 are considered unallocated, and the arch teardown routine is not called on them. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Give archs the option to allocate all MSI/Xs at once. (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces an optional function, arch_setup_msi_irqs(), (note the plural) which gives an arch the opportunity to do per-device setup for MSI/X and then allocate all the requested MSI/Xs at once. If that's not required by the arch, the default version simply calls arch_setup_msi_irq() for each MSI irq required. arch_setup_msi_irqs() is passed a pdev; attached to the pdev is a list of msi_descs with irq == 0, and it is up to the arch to connect these up to an irq (via set_irq_msi()) or return an error. For convenience the number of vectors and the type are also passed. All msi_descs with irq != 0 are considered allocated, and the arch teardown routine will be called on them when necessary. The existing semantics of pci_enable_msix() are that if the requested number of irqs can not be allocated, the maximum number that _could_ be allocated is returned. To support that, we define that in case of an error from arch_setup_msi_irqs(), the msi_descs with irq != 0 are considered allocated, and are counted toward the "max that could be allocated". Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
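The default behaviour and the "max that could be allocated" count can be sketched in user space. The types and functions below are hypothetical simplifications: `setup_one_irq()` stands in for the per-vector arch hook and hands out irqs from a fixed budget.

```c
#include <errno.h>

struct msi_desc_sketch {
	unsigned int irq;	/* 0 means "not allocated" */
};

/* Hypothetical per-vector hook: succeeds while *budget irqs remain. */
static int setup_one_irq(struct msi_desc_sketch *desc, int *budget)
{
	if (*budget <= 0)
		return -ENOSPC;
	desc->irq = 100u + (unsigned int)*budget;	/* any nonzero irq number */
	(*budget)--;
	return 0;
}

/* Default multi-vector behaviour: call the per-vector hook for each entry. */
static int setup_all_irqs(struct msi_desc_sketch *descs, int nvec, int *budget)
{
	int i, ret;

	for (i = 0; i < nvec; i++) {
		ret = setup_one_irq(&descs[i], budget);
		if (ret)
			return ret;
	}
	return 0;
}

/* After a failure, entries with irq != 0 count as allocated. */
static int count_allocated(const struct msi_desc_sketch *descs, int nvec)
{
	int i, n = 0;

	for (i = 0; i < nvec; i++)
		if (descs[i].irq != 0)
			n++;
	return n;
}
```

Requesting four vectors with a budget of two fails, and the count of two is what a caller could report back as the number worth retrying with.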
* MSI: arch must connect the irq and the msi_desc (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | | | | | | | | set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call it at some point in their setup routine, and then the generic code sets up the reverse mapping from the msi_desc back to the irq. set_irq_msi() should do both connections, making it the one and only call required to connect an irq with its MSI desc and vice versa. The arch code MUST call set_irq_msi(), and it must do so only once it's sure it's not going to fail the irq allocation. Given that there's no need for the arch to return the irq anymore, the return value from the arch setup routine just becomes 0 for success and anything else for failure. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Remove dev->first_msi_irq (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | Now that we keep a list of msi descriptors, we don't need first_msi_irq in the pci dev. If we somehow have zero MSIs configured, list_entry() will give us weird oopses or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also a check in pci_msi_check_device() to make sure nvec > 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Use a list instead of the custom link structure (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | The msi descriptors are linked together with what looks a lot like a linked list, but isn't a struct list_head list. Make it one. The only complication is that previously we walked a list of irqs, and got the descriptor for each with get_irq_msi(). Now we have a list of descriptors and need to get the irq out of it, so it needs to be in the actual struct msi_desc. We use 0 to indicate no irq is set up. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Add an arch_msi_check_device() (Michael Ellerman, 2007-05-02)
| | | | | | | | | Add an arch_msi_check_device(), which gives archs a chance to check the input to pci_enable_msi/x. The arch might be interested in the value of nvec so pass it in. Propagate the error value returned from the arch routine out to the caller. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Rename pci_msi_supported() to pci_msi_check_device() (Michael Ellerman, 2007-05-02)
| | | | | | | | | | As pointed out by Eric, the name pci_msi_supported() suggests it should return a boolean value, however it doesn't. So update the name to be a bit less confusing and update the doco too. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Consolidate precondition checks (Michael Ellerman, 2007-05-02)
| | | | | | | | | Consolidate precondition checks into a single if statement. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Expand pci_msi_supported() (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | pci_enable_msi() and pci_enable_msix() both search for the MSI/MSI-X capability, we can fold this into pci_msi_supported() by passing the type in. Update the code to match the comment for pci_msi_supported(). That is it returns 0 on success, and anything else indicates an error. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Remove msi_cache (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | We don't need a special cache just for msi descriptors. They're not particularly large, under 100 bytes for sure, and don't seem to require any special alignment etc. On most systems there will be relatively few MSIs, and hence we waste most of a page on the cache. Better to just kzalloc the space for the few we do need. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Move EXPORT_SYMBOL()s near their definition (Michael Ellerman, 2007-05-02)
| | | | | | | | | Move EXPORT_SYMBOL()s near their definition. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Consolidate BUG_ON()s. (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | When freeing MSIs and MSI-Xs, we BUG_ON() if the irq has not been freed, ie. if it still has an action. We can consolidate all of these BUG_ON()s into msi_free_irqs() as all the code paths lead there almost immediately anyway. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Consolidate MSI-X irq freeing code (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | For the MSI-X case we do exactly the same logic in pci_disable_msix() and msi_remove_pci_irq_vectors(), so consolidate them. msi_remove_pci_irq_vectors() wasn't setting dev->first_msi_irq to 0, but I think it should have been, so the consolidated version does. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Simplify BUG() handling in msi_remove_pci_irq_vectors() part 2 (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | | Although it might be nice to do a printk before BUG'ing, it's really not necessary, and it complicates the code. The behaviour has changed slightly, in that before we set a flag if the irq had an action, and continued freeing the other irqs. But as I see it that's all irrelevant because we end up BUG'ing anyway. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Simplify BUG() handling in msi_remove_pci_irq_vectors() part 1 (Michael Ellerman, 2007-05-02)
| | | | | | | | | | Although it might be nice to do a printk before BUG'ing, it's really not necessary, and it complicates the code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Simplify BUG() handling in pci_disable_msix() (Michael Ellerman, 2007-05-02)
| | | | | | | | | | | | | | Although it might be nice to do a printk before BUG'ing, it's really not necessary, and it complicates the code. The behaviour has changed slightly, in that before we set a flag if the irq had an action, and continued freeing the other irqs. But as I see it that's all irrelevant because we end up BUG'ing anyway. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* MSI: Simplify BUG() handling in pci_disable_msi() (Michael Ellerman, 2007-05-02)
| | | | | | | | | | Although it might be nice to do a printk before BUG'ing, it's really not necessary, and it complicates the code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* PCI: Flush MSI-X table writes (Mitch Williams, 2007-05-02)
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a kernel bug which is triggered when using the irqbalance daemon with MSI-X hardware. Because both MSI-X interrupt messages and MSI-X table writes are posted, it's possible for them to cross while in-flight. This results in interrupts being received long after the kernel thinks they're disabled, and in interrupts being sent to stale vectors after rebalancing. This patch performs a read flush after writes to the MSI-X table for mask and unmask operations. Since the SMP affinity is set while the interrupt is masked, and since it's unmasked immediately after, no additional flushes are required in the various affinity setting routines. This patch has been validated with (unreleased) network hardware which uses MSI-X. Revised with input from Eric Biederman. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] msi: synchronously mask and unmask msi-x irqs. (Eric W. Biederman, 2007-04-03)
|
| This is a simplified and actually more comprehensive form of a bug fix from Mitch Williams <mitch.a.williams@intel.com>.
|
| When we mask or unmask msi-x irqs the writes may be posted because we are writing to a memory mapped region. This means the mask and unmask don't happen immediately but at some unspecified time in the future. This is out of sync with how the mask/unmask logic works for ioapic irqs.
|
| The practical result is that we get very subtle and hard to track down irq migration bugs.
|
| This patch performs a read flush after writes to the MSI-X table for mask and unmask operations. Since the SMP affinity is set while the interrupt is masked, and since it's unmasked immediately after, no additional flushes are required in the various affinity setting routines.
|
| The testing by Mitch Williams on his especially problematic system should still be valid as I have only simplified the code, not changed the functionality.
|
| We currently have 7 drivers: cciss, mthca, cxgb3, forcedeth, s2io, pcie/portdrv_core, and qla2xxx in 2.6.21 that are affected by this problem when the hardware they drive is plugged into the right slot.
|
| Given the difficulty of reproducing this bug and tracing it down to anything that even remotely resembles a cause, even if people are being affected we aren't likely to see many meaningful bug reports, and the people who see this bug aren't likely to be able to reproduce this bug in a timely fashion. So it is best to get this problem fixed as soon as we can so people don't have problems.
|
| Then if people do have a kernel message stating "No irq for vector" we will know it is yet another novel cause that needs a complete new investigation.
|
| Cc: Greg KH <greg@kroah.com>
| Cc: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| Acked-by: Mitch Williams <mitch.a.williams@intel.com>
| Acked-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
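The write-then-read pattern can be sketched as below. This is a user-space stand-in only: a `volatile` array plays the role of the memory-mapped MSI-X table, where kernel code would use `writel()` and `readl()` on an `__iomem` pointer. The entry size (16 bytes) and vector-control offset (12) are assumed from the MSI-X table layout, and `msix_set_mask()` is a hypothetical name.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE		16	/* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL	12	/* offset of the vector control dword */

/* Mask or unmask one table entry, then read back so the write is not left posted. */
static void msix_set_mask(volatile uint32_t *table, unsigned int entry, int masked)
{
	volatile uint32_t *ctrl =
		table + (entry * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL) / 4;

	*ctrl = masked ? 1u : 0u;	/* on real hardware this write may be posted */
	(void)*ctrl;			/* the read forces the write to complete first */
}
```

On ordinary memory the read-back has no visible effect; on a device it ensures the mask change has reached the hardware before the caller continues.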
* [PATCH] msi: Safer state caching. (Eric W. Biederman, 2007-03-12)
|
| There are two ways pci_save_state and pci_restore_state are used: as helper functions during suspend/resume, and as helper functions around a hardware reset event.
|
| When used as helper functions around a hardware reset event there is no reason to believe the calls will be paired, nor is there a good reason to believe that if we restore the msi state from before the reset that it will match the current msi state. Since arch code may change the msi message without going through the driver, drivers currently do not have enough information to even know when to call pci_save_state to ensure they will have msi state in sync with the other kernel irq reception data structures.
|
| It turns out the solution is straightforward: cache the state in the existing msi data structures (not the magic pci saved things) and have the msi code update the cached state each time we write to the hardware. This means we never need to read the hardware to figure out what the hardware state should be.
|
| By modifying the caching in this manner we get to remove our save_state routines and only need to provide restore_state routines.
|
| The only fields that were at all tricky to regenerate were the msi and msi-x control registers, and the way we regenerate them currently is a bit dependent upon assumptions about how we allow the msi registers to be configured and used, making the code a little bit brittle. If we ever change what cases we allow or how we configure the msi bits we can address the fragility then.
|
| Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
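The caching scheme described above can be sketched as a write-through cache. All names here are hypothetical (`struct hw_sketch` stands in for the device registers, `struct desc_sketch` for the msi descriptor); the point is only that every hardware write also updates the cached copy, so a restore never reads the device.

```c
#include <stdint.h>

struct msg_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct hw_sketch {
	struct msg_sketch regs;		/* stands in for the device's registers */
};

struct desc_sketch {
	struct msg_sketch cached;	/* last message written to the hardware */
};

/* Every write goes to the hardware and to the cache together. */
static void write_msg(struct hw_sketch *hw, struct desc_sketch *desc,
		      const struct msg_sketch *msg)
{
	hw->regs = *msg;
	desc->cached = *msg;
}

/* A reset wipes the device state; no save call precedes it. */
static void hw_reset(struct hw_sketch *hw)
{
	struct msg_sketch zero = { 0, 0, 0 };
	hw->regs = zero;
}

/* Restore needs only the cache, never a read of the hardware. */
static void restore_msg(struct hw_sketch *hw, const struct desc_sketch *desc)
{
	hw->regs = desc->cached;
}
```

Because the cache is updated on every write, including writes made by arch code, the restore path stays correct even when save and restore calls are not paired.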