author | Bryan Boatright <b1@omega71.com> | 2008-02-07 03:14:58 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2008-02-07 11:42:23 -0500 |
commit | 6b09ff9d787911b0b46a4d286e68f1f84e8b0b94 (patch) | |
tree | 933ef684e8881d7d9b5dbbcc60694b8f36815fea | |
parent | 4f4aeeabc061826376c9a72b4714d062664999ea (diff) |
drivers/edac: pci: broken parity regression
Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
following problem:
In the kernel there is a pci device attribute located in sysfs that is
checked by the EDAC PCI scanning code. If that attribute is set,
PCI parity/error scanning is skipped for that device. The attribute
is:
broken_parity_status
and is located in the /sys/devices/pci<XXX>/0000:XX:YY.Z directories for
PCI devices.
I don't think this check was actually implemented. I have a misbehaving card
that reports a parity error every 1000 ms:
Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Setting that card's broken_parity_status bit did not mask the error:
echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status
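To confirm that the write above actually took effect, the attribute can be read back. The following is only a minimal user-space sketch: the helper names bps_path, bps_parse and bps_read are hypothetical and are not part of EDAC or the PCI core, and it assumes the attribute reads back as a decimal number.

```c
#include <stdio.h>
#include <stddef.h>

/* Build the sysfs path of the broken_parity_status attribute for a PCI
 * address such as "0000:05:01.0". Returns 0 on success, -1 if the
 * buffer is too small. (Hypothetical helper, for illustration only.)
 */
int bps_path(char *buf, size_t len, const char *pci_addr)
{
	int n = snprintf(buf, len,
			 "/sys/bus/pci/devices/%s/broken_parity_status",
			 pci_addr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Interpret the attribute text (assumed to be a decimal number):
 * returns 1 if the flag is set, 0 if it is clear, -1 if the text
 * does not start with a number.
 */
int bps_parse(const char *text)
{
	int value;

	if (sscanf(text, "%d", &value) != 1)
		return -1;
	return value != 0;
}

/* Read the flag for one device: 1 = marked broken, 0 = not marked,
 * -1 = attribute missing or unreadable.
 */
int bps_read(const char *pci_addr)
{
	char path[256];
	char text[16];
	FILE *f;
	int ret = -1;

	if (bps_path(path, sizeof(path), pci_addr) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fgets(text, sizeof(text), f))
		ret = bps_parse(text);

	fclose(f);
	return ret;
}
```

Calling bps_read("0000:05:01.0") after the echo should return 1 if the flag was stored. That only shows the PCI core holds the flag; it says nothing about whether EDAC honours it, which is the problem described here.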
I looked through the EDAC code and did not readily see any reference to
broken_parity_status at all (which makes sense based on the behavior I am
seeing). I applied the following patch as a proof-of-concept and now EDAC's
PCI parity error reporting behaves as documented:
bryan
Good regression find, bryan. It used to work. sigh.
I added more logic to your patch, for more coverage of the error.
Doug T
Signed-off-by: Bryan Boatright <b1@omega71.com>
Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r-- | drivers/edac/edac_pci_sysfs.c | 12 |
1 files changed, 8 insertions, 4 deletions
diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 5b075da99145..71c3195d3704 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -558,8 +558,10 @@ static void edac_pci_dev_parity_test(struct pci_dev *dev)
 
 	debugf4("PCI STATUS= 0x%04x %s\n", status, dev->dev.bus_id);
 
-	/* check the status reg for errors */
-	if (status) {
+	/* check the status reg for errors on boards NOT marked as broken
+	 * if broken, we cannot trust any of the status bits
+	 */
+	if (status && !dev->broken_parity_status) {
 		if (status & (PCI_STATUS_SIG_SYSTEM_ERROR)) {
 			edac_printk(KERN_CRIT, EDAC_PCI,
 				"Signaled System Error on %s\n",
@@ -593,8 +595,10 @@ static void edac_pci_dev_parity_test(struct pci_dev *dev)
 
 		debugf4("PCI SEC_STATUS= 0x%04x %s\n", status, dev->dev.bus_id);
 
-		/* check the secondary status reg for errors */
-		if (status) {
+		/* check the secondary status reg for errors,
+		 * on NOT broken boards
+		 */
+		if (status && !dev->broken_parity_status) {
 			if (status & (PCI_STATUS_SIG_SYSTEM_ERROR)) {
 				edac_printk(KERN_CRIT, EDAC_PCI, "Bridge "
 					"Signaled System Error on %s\n",
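
The effect of the added condition can be modelled outside the kernel. The sketch below is an illustration, not the driver code: struct fake_pci_dev and count_reportable_errors are hypothetical names, and the two parity bits tested alongside PCI_STATUS_SIG_SYSTEM_ERROR are an assumption about the parts of edac_pci_dev_parity_test() that these hunks do not show. The bit values are the ones defined in the kernel's pci_regs.h.

```c
#include <stdint.h>

/* Status-register bits, values as in the kernel's pci_regs.h. */
#define PCI_STATUS_PARITY		0x0100	/* master data parity error */
#define PCI_STATUS_SIG_SYSTEM_ERROR	0x4000	/* signaled system error */
#define PCI_STATUS_DETECTED_PARITY	0x8000	/* detected parity error */

/* Hypothetical stand-in for the single struct pci_dev field used here. */
struct fake_pci_dev {
	unsigned int broken_parity_status:1;
};

/* Count how many error conditions would be reported for a status word.
 * Mirrors the patched guard: a device marked broken reports nothing,
 * because none of its status bits can be trusted.
 */
int count_reportable_errors(const struct fake_pci_dev *dev, uint16_t status)
{
	int count = 0;

	if (status && !dev->broken_parity_status) {
		if (status & PCI_STATUS_SIG_SYSTEM_ERROR)
			count++;
		if (status & PCI_STATUS_PARITY)
			count++;
		if (status & PCI_STATUS_DETECTED_PARITY)
			count++;
	}
	return count;
}
```

With broken_parity_status clear, a status word containing PCI_STATUS_PARITY counts one error. With the flag set, the same word counts none, which corresponds to the once-per-second "Master Data Parity Error" message no longer being logged for the marked card.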