aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/edac
Commit message (Collapse)AuthorAge
* edac: i5000_edac critical fix panic out of boundsTamas Vincze2010-01-16
| | | | | | | | | | | | | | | | | | EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4) Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to that, both bits 28 and 29 may be equal to one, returning channel = 3. As this value is invalid, EDAC core generates the panic. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568 Signed-off-by: Tamas Vincze <tom@vincze.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rateRoel Kluin2010-01-15
| | | | | | | | Add a missing iterator variable thus fixing the conditional of the for-loop in amd64_get_scrub_rate(). Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* edac, pci: remove pesky debug printkBorislav Petkov2009-12-24
| | | | | | | Do not spam the logs needlessly with the sole info that edac_pci_dev_parity_clear is being called. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: restrict PCI config space accessBorislav Petkov2009-12-24
| | | | | | Do not access F2x19[0,4] on K8 since they're undefined there. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: fix forcing module load/unloadBorislav Petkov2009-12-24
| | | | | | Clear the override flag after force-loading the module. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: make driver loading more robustBorislav Petkov2009-12-24
| | | | | | | | | | Currently, the module does not initialize fully when the DIMMs aren't ECC but remains still loaded. Propagate the error when no instance of the driver is properly initialized and prevent further loading. Reorganize and polish error handling in amd64_edac_init() while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: fix driver instance freeingBorislav Petkov2009-12-24
| | | | | | | | | Fix use-after-free errors by pushing all memory-freeing calls to the end of amd64_remove_one_instance(). Reported-by: Darren Jenkins <darrenrjenkins@gmail.com> LKML-Reference: <1261370306.11354.52.camel@ICE-BOX> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: fix K8 chip select reportingBorislav Petkov2009-12-24
| | | | | | | | | | Fix the case when amd64_debug_display_dimm_sizes() reports only half the amount of DRAM on it because it doesn't account for when the single DCT operates in 128-bit mode and merges chip selects from different DIMMs. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds2009-12-16
|\ | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: edac, mce, amd: silence GART TLB errors edac, mce: correct corenum reporting
| * edac, mce, amd: silence GART TLB errorsBorislav Petkov2009-12-16
| | | | | | | | | | | | | | | | | | | | Although reporting of benign GART TLB errors is disabled in __mcheck_cpu_apply_quirks, those are still being logged, and, as a result, trip up amd64_edac. Pull up reporting check so that machines with loaded edac module bail out early and don't spit fragments into dmesg. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
| * edac, mce: correct corenum reportingBorislav Petkov2009-12-15
| | | | | | | | | | | | Fix core number reporting with NB MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* | edac: i5100 add 6 ranks per channelNils Carlson2009-12-16
| | | | | | | | | | | | | | | | | | | | | | | | Add support for 6 ranks per channel to the i5100 chipset. I have tested the patch as far as possible with correctible errors and things appear good. The DIMM mapping is correct for our board, but boards may differ. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Acked-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | edac: i5100 add scrubbingNils Carlson2009-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Addscrubbing to the i5100 chipset. The i5100 chipset only supports one scrubbing rate, which is not constant but dependent on memory load. The rate returned by this driver is an estimate based on some experimentation, but is substantially closer to the truth than the speed supplied in the documentation. Also, scrubbing is done once, and then a done-bit is set. This means that to accomplish continuous scrubbing a re-enabling mechanism must be used. I have created the simplest possible such mechanism in the form of a work-queue which will check every five minutes. This interval is quite arbitrary but should be sufficient for all sizes of system memory. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | edac: i5100 clean controller to channel termsNils Carlson2009-12-16
| | | | | | | | | | | | | | | | | | | | The i5100 driver uses the word controller instead of channel in a lot of places, this is simply a cleanup of the patch. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | x86, msr: Add support for non-contiguous cpumasksBorislav Petkov2009-12-11
|/ | | | | | | | | | | | | | | | | | | | | | | The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* amd64_edac: bump driver versionBorislav Petkov2009-12-08
| | | | | | This was long overdue ... Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: fix use-uninitialised bugAndrew Morton2009-12-08
| | | | | | | | | | drivers/edac/amd64_edac.c: In function 'amd64_edac_init': drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: correct sys address to chip select mappingBorislav Petkov2009-12-08
| | | | | | | | | | | | | | The routine does the reverse mapping of the error address of a CECC back to the node id, DRAM controller and chip select of the DIMM which caused the error. We should lookup the channel using the syndromes _only_ when the DCTs are ganged so fix that. Also, add an early exit when there's an error while scanning for the csrow thus decreasing indentation levels for better readability. Finally, fixup comments. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: add a leaner syndrome decoding algorithmBorislav Petkov2009-12-08
| | | | | | | | | Instead of using the whole syndrome tables for channel decoding, use a set of eigenvectors with which the tables can be generated to search for the syndrome in error. The algorithm operates independently of symbol size and can be used for both x4 and x8 syndromes. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: remove early hw support checkBorislav Petkov2009-12-07
| | | | | | | | The .probe_valid_hardware low_ops member checked whether the DCTs are in DDR3 mode and bailed out if so. Now that all the needed changes for DDR3 support is in place, remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: detect DDR3 memory typeBorislav Petkov2009-12-07
| | | | Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* edac: add memory types strings for debuggingBorislav Petkov2009-12-07
| | | | | | | | | | Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* edac, mce: update AMD F10h revD checkBorislav Petkov2009-12-07
| | | | | | F10h revD start with model number 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: remove unneeded extract_error_address wrapperBorislav Petkov2009-12-07
| | | | Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: rename StinkyIdentifierBorislav Petkov2009-12-07
| | | | | | SystemAddress -> sys_addr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: remove superfluous dbg printkBorislav Petkov2009-12-07
| | | | Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: enhance address to DRAM bank mappingBorislav Petkov2009-12-07
| | | | | | | | | | | | | | | Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10 and all K8 flavors and remove klugdy table of pseudo values. Add a low_ops->dbam_to_cs member which is family-specific and replaces low_ops->dbam_map_to_pages since the pages calculation is a one liner now. Further cleanups, while at it: - shorten family name defines - align amd64_family_types struct members Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: cleanup f10_early_channel_countBorislav Petkov2009-12-07
| | | | | | | | | Do not read DCLR[01] again since this is done in amd64_read_mc_registers() earlier. There can be more than two physical DIMMs present so clamp the channels value to max 2. Also, do not report DCT data width - it is also done earlier. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: dump DIMM sizes on K8 tooBorislav Petkov2009-12-07
| | | | | | | | | | | | | Extend f10_debug_display_dimm_sizes to dump the logical DIMMs configuration on K8 revF too. Remove the ganged arg since we print the DCT operating mode (ganged vs unganged) earlier. Also, DCT csrow configuration is relevant therefore dump it as KERN_DEBUG instead of only on debug builds. Remove misleading DIMM output since there's no reliable way of mapping of chip selects to actual physical DIMMs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: cleanup rest of amd64_dump_misc_regsBorislav Petkov2009-12-07
| | | | | | | | Clarify bitfields description, add PCI config function/offset names to registers for easy reference, simplify code layout, remove unneeded info. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: cleanup DRAM cfg low debug outputBorislav Petkov2009-12-07
| | | | | | | | | | Carve out the register-specific debug statements into a separate function, clarify meanings of the single bitfields in the register, remove irrelevant output and macros. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: wrap-up pci config read error handlingBorislav Petkov2009-12-07
| | | | | | | | | | | | | Add a pci config read wrapper for signaling pci config space access errors instead of them being visible only on a debug build. This is important on amd64_edac since it uses all those pci config register values to access the DRAM/DIMM configuration of the nodes. In addition, the wrapper makes a _lot_ (look at the diffstat!) of error handling code superfluous and improves much of the overall code readability by removing error handling details out of the way. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: unify MCGCTL ECC switchingBorislav Petkov2009-12-07
| | | | | | | | | | Unify almost identical code into one function and remove NUMA-specific usage (specifically cpumask_of_node()) in favor of generic topology methods. Remove unused defines, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* cpumask: use modern cpumask style in drivers/edac/amd64_edac.cRusty Russell2009-12-07
| | | | | | | | cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: make DRAM regions output more human-readableBorislav Petkov2009-12-07
| | | | | | | | | | | Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits are only a subset of the 64bit register, the values are correct since the remaining bits are Read-As-Zero and no shifting is needed. Also, cleanup DRAM base/limit debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: clarify DRAM CTL debug reportingBorislav Petkov2009-12-07
| | | | | | | Make debug info formulations about the DRAM and DCT configuration of the machine more human readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* Merge branch 'perf/mce' into perf/coreIngo Molnar2009-12-03
|\ | | | | | | | | | | Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * mce, edac: Use an atomic notifier for MCEs decodingBorislav Petkov2009-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | Add an atomic notifier which ensures proper locking when conveying MCE info to EDAC for decoding. The actual notifier call overrides a default, negative priority notifier. Note: make sure we register the default decoder only once since mcheck_init() runs on each CPU. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091003065752.GA8935@liondog.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | amd64_edac: fix CECCs reportingBorislav Petkov2009-11-04
| | | | | | | | | | | | Shift error type bits properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* | amd64_edac: fix a wrong goto clause in amd64_edac.cLi Hong2009-11-04
| | | | | | | | | | | | | | | | | | | | In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is called before pci_register_driver. If it fails, should exit with err directly. Signed-off-by: Li Hong <lihong.hi@gmail.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* | edac: i5100 fix initialization codeKeith Mannthey2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow csrows to properly initialize when the topology only has active channels on 2 and 3. This new check allows proper detection and initialization in this topology. Only checking the first mrt that represented channels 0 and 1 is not sufficient. I also fixed up the related debug information path. I can submit as a 2nd patch if needed. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Acked-by: Aristeu Rozanski <aris@ruivo.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | edac: i5400 fix missing CONFIG_PCI defineIra W. Snyder2009-10-29
| | | | | | | | | | | | | | | | | | | | | | When building without CONFIG_PCI the edac_pci_idx variable is unused, causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI, just like the rest of the PCI support. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | edac: i5400 fix csrow mappingJeff Roberson2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i5400 EDAC driver has several bugs with chip-select row computation which most likely lead to bugs in detailed error reporting. Attempts to contact the authors have gone mostly unanswered so I am presenting my diff here. I do not subscribe to lkml and would appreciate being kept in the cc. The most egregious problem was miscalculating the addresses of MTR registers after register 0 by assuming they are 32bit rather than 16. This caused the driver to miss half of the memories. Most motherboards tend to have only 8 dimm slots and not 16, so this may not have been noticed before. Further, the row calculations multiplied the number of dimms several times, ultimately ending up with a maximum row of 32. The chipset only supports 4 dimms in each of 4 channels, so csrow could not be higher than 4 unless you use a row per-rank with dual-rank dimms. I opted to eliminate this behavior as it is confusing to the user and the error reporting works by slot and not rank. This gives a much clearer view of memory by slot and channel in /sys. Signed-off-by: Jeff Roberson <jroberson@jroberson.net> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | amd64_edac: fix DRAM base and limit extraction masks, v2Borislav Petkov2009-10-16
|/ | | | | | This is a proper fix as a follow-up to 66216a7 and 916d11b. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2009-10-08
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pci: Correct spelling in a comment x86: Simplify bound checks in the MTRR code x86: EDAC: carve out AMD MCE decoding logic initcalls: Add early_initcall() for modules x86: EDAC: MCE: Fix MCE decoding callback logic
| * x86: EDAC: carve out AMD MCE decoding logicBorislav Petkov2009-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts the MCE decoding logic into a standalone config option which can be built-in or a module, the first one being the default for MCEs happening early on in the boot process. This, beyond being separated in a cleaner way, also saves RAM by making the decoding logic modular. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20091002133148.GD28682@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: EDAC: MCE: Fix MCE decoding callback logicIngo Molnar2009-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make decoding of MCEs happen only on AMD hardware by registering a non-default callback only on CPU families which support it. While looking at the interaction of decode_mce() with the other MCE code i also noticed a few other things and made the following cleanups/fixes: - Fixed the mce_decode() weak alias - a weak alias is really not good here, it should be a proper callback. A weak alias will be overriden if a piece of code is built into the kernel - not good, obviously. - The patch initializes the callback on AMD family 10h and 11h. - Added the more correct fallback printk of: No support for human readable MCE decoding on this CPU type. Transcribe the message and run it through 'mcelog --ascii' to decode. On CPUs that dont have a decoder. - Made the surrounding code more readable. Note that the callback allows us to have a default fallback - without having to check the CPU versions during the printout itself. When an EDAC module registers itself, it can install the decode-print function. (there's no unregister needed as this is core code.) version -v2 by Borislav Petkov: - add K8 to the set of supported CPUs - always build in edac_mce_amd since we use an early_initcall now - fix checkpatch warnings Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20091001141432.GA11410@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | amd64_edac: beef up DRAM error injectionBorislav Petkov2009-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask of which bits should be error injected when written to and holds the payload of 16-bit DRAM word when read, respectively. Add /sysfs members to show the DRAM ECC section/word/vector. Fail wrong injection values entered over /sysfs instead of truncating them. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* | amd64_edac: fix DRAM base and limit extractionBorislav Petkov2009-10-07
| | | | | | | | | | | | | | | | | | On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers which specify the destination node of a DRAM address. Those address boundaries are being extracted into ->dram_base[] and ->dram_limit[]. Correct the extraction masks to match the respective address bits. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* | amd64_edac: fix chip select handlingBorislav Petkov2009-10-07
| | | | | | | | | | | | | | | | | | | | | | | | Different processor families support a different number of chip selects. Handle this in a family-dependent way with the proper values assigned at init time (see amd64_set_dct_base_and_mask). Remove _DCSM_COUNT defines since they're used at one place and originate from public documentation. CC: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>