aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* HWPOISON: Enable .remove_error_page for migration aware file systemsAndi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | | | | | Enable removing of corrupted pages through truncation for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs These should cover most server needs. I chose the set of migration aware file systems for this for now, assuming they have been especially audited. But in general it should be safe for all file systems on the data area that support read/write and truncate. Caveat: the hardware error handler does not take i_mutex for now before calling the truncate function. Is that ok? Cc: tytso@mit.edu Cc: hch@infradead.org Cc: mfasheh@suse.com Cc: aia21@cantab.net Cc: hugh.dickins@tiscali.co.uk Cc: swhiteho@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: The high level memory error handler in the VM v7Andi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the high level memory handler that poisons pages that got corrupted by hardware (typically by a two bit flip in a DIMM or a cache) on the Linux level. The goal is to prevent everyone from accessing these pages in the future. This done at the VM level by marking a page hwpoisoned and doing the appropriate action based on the type of page it is. The code that does this is portable and lives in mm/memory-failure.c To quote the overview comment: High level machine check handler. Handles pages reported by the hardware as being corrupted usually due to a 2bit ECC memory or cache failure. This focuses on pages detected as corrupted in the background. When the current CPU tries to consume corruption the currently running process can just be killed directly instead. This implies that if the error cannot be handled for some reason it's safe to just ignore it because no corruption has been consumed yet. Instead when that happens another machine check will happen. Handles page cache pages in various states. The tricky part here is that we can access any page asynchronous to other VM users, because memory failures could happen anytime and anywhere, possibly violating some of their assumptions. This is why this code has to be extremely careful. Generally it tries to use normal locking rules, as in get the standard locks, even if that means the error handling takes potentially a long time. Some of the operations here are somewhat inefficient and have non linear algorithmic complexity, because the data structures have not been optimized for this case. This is in particular the case for the mapping from a vma to a process. Since this case is expected to be rare we hope we can get away with this. There are in principle two strategies to kill processes on poison: - just unmap the data and wait for an actual reference before killing - kill as soon as corruption is detected. Both have advantages and disadvantages and should be used in different situations. Right now both are implemented and can be switched with a new sysctl vm.memory_failure_early_kill The default is early kill. The patch does some rmap data structure walking on its own to collect processes to kill. This is unusual because normally all rmap data structure knowledge is in rmap.c only. I put it here for now to keep everything together and rmap knowledge has been seeping out anyways Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu, Nick Piggin (who did a lot of great work) and others. Cc: npiggin@suse.de Cc: riel@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
* HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per processAndi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows processes to override their early/late kill behaviour on hardware memory errors. Typically applications which are memory error aware is better of with early kill (see the error as soon as possible), all others with late kill (only see the error when the error is really impacting execution) There's a global sysctl, but this way an application can set its specific policy. We're using two bits, one to signify that the process stated its intention and that I also made the prctl future proof by enforcing the unused arguments are 0. The state is inherited to children. Note this makes us officially run out of process flags on 32bit, but the next patch can easily add another field. Manpage patch will be supplied separately. Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: shmem: call set_page_dirty() with locked pageWu Fengguang2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dirtying of page and set_page_dirty() can be moved into the page lock. - In shmem_write_end(), the page was dirtied while the page lock was held, but it's being marked dirty just after dropping the page lock. - In shmem_symlink(), both dirtying and marking can be moved into page lock. It's valuable for the hwpoison code to know whether one bad page can be dropped without losing data. It mainly judges by testing the PG_dirty bit after taking the page lock. So it becomes important that the dirtying of page and the marking of dirtiness are both done inside the page lock. Which is a common practice, but sadly not a rule. The noticeable exceptions are - mapped pages - pages with buffer_heads The above pages could go dirty at any time. Fortunately the hwpoison will unmap the page and release the buffer_heads beforehand anyway. Many other types of pages (eg. metadata pages) can also be dirtied at will by their owners, the hwpoison code cannot do meaningful things to them anyway. Only the dirtiness of pagecache pages owned by regular files are interested. v2: AK: Add comment about set_page_dirty rules (suggested by Peter Zijlstra) Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Define a new error_remove_page address space op for async truncationAndi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | Truncating metadata pages is not safe right now before we haven't audited all file systems. To enable truncation only for data address space define a new address_space callback error_remove_page. This is used for memory_failure.c memory error handling. This can be then set to truncate_inode_page() This patch just defines the new operation and adds documentation. Callers and users come in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add invalidate_inode_pageWu Fengguang2009-09-16
| | | | | | | | Add a simple way to invalidate a single page This is just a refactoring of the truncate.c code. Originally from Fengguang, modified by Andi Kleen. Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Refactor truncate to allow direct truncating of page v2Nick Piggin2009-09-16
| | | | | | | | | | Extract out truncate_inode_page() out of the truncate path so that it can be used by memory-failure.c [AK: description, headers, fix typos] v2: Some white space changes from Fengguang Wu Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: check and isolate corrupted free pages v2Wu Fengguang2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | If memory corruption hits the free buddy pages, we can safely ignore them. No one will access them until page allocation time, then prep_new_page() will automatically check and isolate PG_hwpoison page for us (for 0-order allocation). This patch expands prep_new_page() to check every component page in a high order page allocation, in order to completely stop PG_hwpoison pages from being recirculated. Note that the common case -- only allocating a single page, doesn't do any more work than before. Allocating > order 0 does a bit more work, but that's relatively uncommon. This simple implementation may drop some innocent neighbor pages, hopefully it is not a big problem because the event should be rare enough. This patch adds some runtime costs to high order page users. [AK: Improved description] v2: Andi Kleen: Port to -mm code Move check into separate function. Don't dump stack in bad_pages for hwpoisoned pages. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Handle hardware poisoned pages in try_to_unmapAndi Kleen2009-09-16
| | | | | | | | | | | | | | When a page has the poison bit set replace the PTE with a poison entry. This causes the right error handling to be done later when a process runs into it. v2: add a new flag to not do that (needed for the memory-failure handler later) (Fengguang) v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan) Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Use bitmask/action code for try_to_unmap behaviourAndi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | try_to_unmap currently has multiple modi (migration, munlock, normal unmap) which are selected by magic flag variables. The logic is not very straight forward, because each of these flag change multiple behaviours (e.g. migration turns off aging, not only sets up migration ptes etc.) Also the different flags interact in magic ways. A later patch in this series adds another mode to try_to_unmap, so this becomes quickly unmanageable. Replace the different flags with a action code (migration, munlock, munmap) and some additional flags as modifiers (ignore mlock, ignore aging). This makes the logic more straight forward and allows easier extension to new behaviours. Change all the caller to declare what they want to do. This patch is supposed to be a nop in behaviour. If anyone can prove it is not that would be a bug. Cc: Lee.Schermerhorn@hp.com Cc: npiggin@suse.de Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2Andi Kleen2009-09-16
| | | | | | | | | | | | Add VM_FAULT_HWPOISON handling to the x86 page fault handler. This is very similar to VM_FAULT_OOM, the only difference is that a different si_code is passed to user space and the new addr_lsb field is initialized. v2: Make the printk more verbose/unique Cc: x86@kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add poison check to page fault handlingAndi Kleen2009-09-16
| | | | | | | | | | | | | Bail out early when hardware poisoned pages are found in page fault handling. Since they are poisoned they should not be mapped freshly into processes, because that would cause another (potentially deadly) machine check This is generally handled in the same way as OOM, just a different error code is returned to the architecture code. v2: Do a page unlock if needed (Fengguang Wu) Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add basic support for poisoned pages in fault handler v3Andi Kleen2009-09-16
| | | | | | | | | | | | | - Add a new VM_FAULT_HWPOISON error code to handle_mm_fault. Right now architectures have to explicitely enable poison page support, so this is forward compatible to all architectures. They only need to add it when they enable poison page support. - Add poison page handling in swap in fault code v2: Add missing delayacct_clear_flag (Hidehiro Kawai) v3: Really use delayacct_clear_flag (Hidehiro Kawai) Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add new SIGBUS error codes for hardware poison signalsAndi Kleen2009-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new SIGBUS codes for reporting machine checks as signals. When the hardware detects an uncorrected ECC error it can trigger these signals. This is needed for telling KVM's qemu about machine checks that happen to guests, so that it can inject them, but might be also useful for other programs. I find it useful in my test programs. This patch merely defines the new types. - Define two new si_codes for SIGBUS. BUS_MCEERR_AO and BUS_MCEERR_AR * BUS_MCEERR_AO is for "Action Optional" machine checks, which means that some corruption has been detected in the background, but nothing has been consumed so far. The program can ignore those if it wants (but most programs would already get killed) * BUS_MCEERR_AR is for "Action Required" machine checks. This happens when corrupted data is consumed or the application ran into an area which has been known to be corrupted earlier. These require immediate action and cannot just returned to. Most programs would kill themselves. - They report the address of the corruption in the user address space in si_addr. - Define a new si_addr_lsb field that reports the extent of the corruption to user space. That's currently always a (small) page. The user application cannot tell where in this page the corruption happened. AK: I plan to write a man page update before anyone asks. Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add support for poison swap entries v2Andi Kleen2009-09-16
| | | | | | | | | | | | Memory migration uses special swap entry types to trigger special actions on page faults. Extend this mechanism to also support poisoned swap entries, to trigger poison handling on page faults. This allows follow-on patches to prevent processes from faulting in poisoned pages again. v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu) v3: Better overflow fix (Hidehiro Kawai) Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Export some rmap vma locking to outside worldAndi Kleen2009-09-16
| | | | | | | | | | Needed for later patch that walks rmap entries on its own. This used to be very frowned upon, but memory-failure.c does some rather specialized rmap walking and rmap has been stable for quite some time, so I think it's ok now to export it. Signed-off-by: Andi Kleen <ak@linux.intel.com>
* HWPOISON: Add page flag for poisoned pagesAndi Kleen2009-09-16
| | | | | | | | | | | | | Hardware poisoned pages need special handling in the VM and shouldn't be touched again. This requires a new page flag. Define it here. The page flags wars seem to be over, so it shouldn't be a problem to get a new one. v2: Add TestSetHWPoison (suggested by Johannes Weiner) Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6Linus Torvalds2009-09-15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-next-2.6: ide: fixup for fujitsu disk ide: convert to ->proc_fops at91_ide: remove headers specific for at91sam9263 IDE: palm_bk3710: convert clock usage after clkdev conversion ide: fix races in handling of user-space SET XFER commands ide: allow ide_dev_read_id() to be called from the IRQ context ide: ide-taskfile.c fix style problems drivers/ide/ide-cd.c: Use DIV_ROUND_CLOSEST ide-tape: fix handling of postponed rqs ide-tape: convert to ide_debug_log macro ide-tape: fix debug call ide: Fix annoying warning in ide_pio_bytes(). IDE: Save a call to PageHighMem()
| * ide: fixup for fujitsu diskWu Zhangjin2009-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch will fix the following problem on Yeeloong netbook with fujitsu disk. irq 14: nobody cared (try booting with the "irqpoll" option) Call Trace: [<ffffffff8020d438>] dump_stack+0x8/0x40 [<ffffffff8027ec64>] __report_bad_irq+0x58/0xe4 [<ffffffff8027ee6c>] note_interrupt+0x17c/0x23c [<ffffffff8027f9b8>] handle_level_irq+0xcc/0x134 [<ffffffff802125b0>] mach_irq_dispatch+0xb8/0x1e0 [<ffffffff8020041c>] ret_from_irq+0x0/0x4 [<ffffffff8029e678>] free_hot_cold_page+0x224/0x2a0 [<ffffffff8026f794>] swsusp_free+0xb0/0x14c [<ffffffff8026ec08>] hibernate+0x198/0x218 [<ffffffff8026cfa8>] state_store+0x90/0x138 [<ffffffff8032b5a4>] sysfs_write_file+0x130/0x194 [<ffffffff802c94fc>] vfs_write+0xb8/0x180 [<ffffffff802c96b8>] SyS_write+0x50/0x98 [<ffffffff80203fd8>] handle_sys+0x158/0x174 handlers: [<ffffffff80429670>] (ide_intr+0x0/0x300) Disabling IRQ #14 References: 1. commit 1fde02e7146d4a1bab80fd1506f9018fe71e8521 of git://dev.lemote.com/linux_loongson.git 2. 8bc1e5aa06a2a9a425c4a6795fc564cba1521487 (ide: respect quirk_drives[] list on all controllers) Signed-off-by: Yan Hua <yanh@lemote.com> Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide: convert to ->proc_fopsAlexey Dobriyan2009-09-01
| | | | | | | | | | | | | | | | | | | | | | | | ->read_proc, ->write_proc are going away, ->proc_fops should be used instead. The only tricky place is IDENTIFY handling: if for some reason taskfile_lib_get_identify() fails, buffer _is_ changed and at least first byte is overwritten. Emulate old behaviour with returning that first byte to userspace and reporting length=1 despite overall -E. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * at91_ide: remove headers specific for at91sam9263Sergey Matyukevich2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | This driver requires only static memory controller definitions and macroses contained in generic header at91sam9_smc.h. Those extra headers are misleading since this driver also works fine for at91sam9260 SoC: tests were performed on afeb9260 board. Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com> Acked-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
| * IDE: palm_bk3710: convert clock usage after clkdev conversionKevin Hilman2009-08-15
| | | | | | | | | | | | | | | | | | | | DaVinci core code has converted to the new clkdev API so clock name strings are not needed. Instead, just the a 'struct device' pointer is needed. Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide: fix races in handling of user-space SET XFER commandsBartlomiej Zolnierkiewicz2009-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Make cmd->tf_flags field 'u16' and add IDE_TFLAG_SET_XFER taskfile flag. * Update ide_finish_cmd() to set xfer / re-read id if the new flag is set. * Convert set_xfer_rate() (write handler for /proc/ide/hd?/current_speed) and ide_cmd_ioctl() (HDIO_DRIVE_CMD ioctl handler) to use the new flag. * Remove no longer needed disable_irq_nosync() + enable_irq() from ide_config_drive_speed(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide: allow ide_dev_read_id() to be called from the IRQ contextBartlomiej Zolnierkiewicz2009-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Un-static __ide_wait_stat(). * Allow ide_dev_read_id() helper to be called from the IRQ context by adding irq_ctx flag and using mdelay()/__ide_wait_stat() when needed. * Switch ide_driveid_update() to set irq_ctx flag. This change is needed for the consecutive patch which fixes races in handling of user-space SET XFER commands but for improved bisectability and clarity it is better to do it in a separate patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide: ide-taskfile.c fix style problemsJaswinder Singh Rajput2009-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix trivial style problems: WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h> WARNING: space prohibited between function name and open parenthesis '(' WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable ERROR: do not use C99 // comments X 2 ERROR: trailing statements should be on next line ERROR: trailing whitespace ERROR: switch and case should be at the same indent WARNING: line over 80 characters total: 5 errors, 4 warnings Also removed dead code Also used pr_err() to avoid line breaks Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * drivers/ide/ide-cd.c: Use DIV_ROUND_CLOSESTJulia Lawall2009-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel.h macro DIV_ROUND_CLOSEST performs the computation (x + d/2)/d but is perhaps more readable. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @haskernel@ @@ #include <linux/kernel.h> @depends on haskernel@ expression x,__divisor; @@ - (((x) + ((__divisor) / 2)) / (__divisor)) + DIV_ROUND_CLOSEST(x,__divisor) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide-tape: fix handling of postponed rqsBorislav Petkov2009-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ide-tape used to hit [ 58.614854] ide-tape: ht0: BUG: Two DSC requests queued! due to the fact that another rq was being issued while the driver was waiting for DSC to get set for the device executing ATAPI commands which set the DSC to 1 to indicate completion. Here's a sample output of that case: issue REZERO_UNIT [ 143.088505] ide-tape: ide_tape_issue_pc: retry #0, cmd: 0x01 [ 143.095122] ide: Enter ide_pc_intr - interrupt handler [ 143.096118] ide: Packet command completed, 0 bytes transferred [ 143.106319] ide-tape: ide_tape_callback: cmd: 0x1, dsc: 1, err: 0 [ 143.112601] ide-tape: idetape_postpone_request: cmd: 0x1, dsc_poll_freq: 2000 we stall the ide-tape queue here waiting for DSC [ 143.119936] ide-tape: ide_tape_read_position: enter [ 145.119019] ide-tape: idetape_do_request: sector: 4294967295, nr_sectors: 0 and issue the new READ_POSITION rq and hit the check. [ 145.126247] ide-tape: ht0: BUG: Two DSC requests queued! [ 145.131748] ide-tape: ide_tape_read_position: BOP - No [ 145.137059] ide-tape: ide_tape_read_position: EOP - No Also, ->postponed_rq used to point to that postponed request. To make things worse, in certain circumstances the rq it was pointing to got replaced unterneath it by swiftly reusing the same rq from the mempool of the block layer practically confusing stuff even more. However, we don't need to keep a pointer to that rq but simply wait for DSC to be set first before issuing the follow-up request in the drive's queue. In order to do that, we make idetape_do_request() first check the DSC and if not set, we stall the drive queue giving the other device on that IDE channel a chance. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide-tape: convert to ide_debug_log macroBorislav Petkov2009-08-07
| | | | | | | | | | | | | | | | | | Remove tape->debug_mask and use drive->debug_mask instead. There should be no functional change resulting from this patch. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide-tape: fix debug callMark de Wever2009-08-07
| | | | | | | | | | | | | | This error only occurs when IDETAPE_DEBUG_LOG is enabled. Signed-off-by: Mark de Wever <koraq@xs4all.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ide: Fix annoying warning in ide_pio_bytes().David S. Miller2009-08-07
| | | | | | | | | | | | | | | | | | GCC can't see that flags is only set and used when PageHighmem() is true. Inspired by a patch from Jean Delvare. Signed-off-by: David S. Miller <davem@davemloft.net>
| * IDE: Save a call to PageHighMem()Jean Delvare2009-08-07
| | | | | | | | | | | | | | | | PageHighMem() isn't cheap so avoid calling it twice on the same page. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'next' of ↵Linus Torvalds2009-09-15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c
| * | powerpc/nvram: Enable use Generic NVRAM driver for different size chipsMartyn Welch2009-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the reliance on a staticly defined NVRAM size, allowing platforms to support NVRAMs with sizes differing from the standard. A fall back value is provided for platforms not supporting this extension. Signed-off-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdlineBenjamin Herrenschmidt2009-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That code uses dma_mapping_error() with a NULL device, which is a bad idea :-) The proper fix might be to start using some kind of pseudo device for all these low level mappings with the hypervisor but that will be for another day. Since it directly calls into the low level iommu code, I see no problem in having it directly test against DMA_ERROR_CODE instead of using the accessors with a NULL argument for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/ps3: Workaround for flash memory I/O errorGeoff Levand2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A workaround for flash memory I/O errors when the PS3 internal hard disk has not been formatted for OtherOS use. This error condition mainly effects 'Live CD' users who have not formatted the PS3's internal hard disk for OtherOS. Fixes errors similar to these when using the ps3-flash-util or ps3-boot-game-os programs: ps3flash read failed 0x2050000 os_area_header_read: read error: os_area_header: Input/output error main:627: os_area_read_hp error. ERROR: can't change boot flag Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 insteadBenjamin Herrenschmidt2009-09-10
| | | | | | | | | | | | | | | | | | | | | Also remove a duplicate setting of it in the context switch path on BookE. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/perf_counters: Reduce stack usage of power_check_constraintsPaul Mackerras2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Ellerman reported stack-frame size warnings being produced for power_check_constraints(), which uses an 8*8 array of u64 and two 8*8 arrays of unsigned long, which are currently allocated on the stack, along with some other smaller variables. These arrays come to 1.5kB on 64-bit or 1kB on 32-bit, which is a bit too much for the stack. This fixes the problem by putting these arrays in the existing per-cpu cpu_hw_counters struct. This is OK because two of the call sites have interrupts disabled already; for the third call site we use get_cpu_var, which disables preemption, so we know we won't get a context switch while we're in power_check_constraints(). Note that power_check_constraints() can be called during context switch but is not called from interrupts. Reported-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: <stable@kernel.org) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Fix bug where perf_counters breaks oprofilePaul Mackerras2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is a bug where if you use oprofile on a pSeries machine, then use perf_counters, then use oprofile again, oprofile will not work correctly; it will lose the PMU configuration the next time the hypervisor does a partition context switch, and thereafter won't count anything. Maynard Johnson identified the sequence causing the problem: - oprofile setup calls ppc_enable_pmcs(), which calls pseries_lpar_enable_pmcs, which tells the hypervisor that we want to use the PMU, and sets the "PMU in use" flag in the lppaca. This flag tells the hypervisor whether it needs to save and restore the PMU config. - The perf_counter code sets and clears the "PMU in use" flag directly as it context-switches the PMU between tasks, and leaves it clear when it finishes. - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs, which does nothing because it has already been called. In particular it doesn't set the "PMU in use" flag. This fixes the problem by arranging for ppc_enable_pmcs to always set the "PMU in use" flag. It makes the perf_counter code call ppc_enable_pmcs also rather than calling the lower-level function directly, and removes the setting of the "PMU in use" flag from pseries_lpar_enable_pmcs, since that is now done in its caller. This also removes the declaration of pasemi_enable_pmcs because it isn't defined anywhere. Reported-by: Maynard Johnson <mpjohn@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: <stable@kernel.org) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/85xx: Fix SMP compile error and allow NULL for smp_opsKumar Gala2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit introduced a compile error since it removed the implementation of smp_85xx_basic_setup: commit 77c0a700c1c292edafa11c1e52821ce4636f81b0 Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Fri Aug 28 14:25:04 2009 +1000 powerpc: Properly start decrementer on BookE secondary CPUs Make it so that smp_ops probe() and setup_cpu() can be set to NULL. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/irq: Improve nanodocWolfram Sang2009-09-10
| | | | | | | | | | | | | | | | | | | | | | | | The OF helpers look like nanodoc but are missing the header. Fix this and a typo (s/nad/and/) while we are here. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Fix some late PowerMac G5 with PCIe ATI graphicsBenjamin Herrenschmidt2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A misconfiguration by the firmware of the U4 PCIe bridge on PowerMac G5 with the U4 bridge (latest generations, may also affect the iMac G5 "iSight") is causing us to re-assign the PCI BARs of the video card, which can get it out of sync with the firmware, thus breaking offb. This works around it by fixing up the bridge configuration properly at boot time. It also fixes a bug where the firmware provides us with an incorrect set of accessible regions in the device-tree. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BITKumar Gala2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch to using the Power ISA defined PTE format when we have a 64-bit PTE. This makes the code handling between fsl-booke and book3e-64 similiar for TLB faults. Additionally this lets use take advantage of the page size encodings and full permissions that the HW PTE defines. Also defined _PMD_PRESENT, _PMD_PRESENT_MASK, and _PMD_BAD since the 32-bit ppc arch code expects them. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/book3e: Add missing page sizesKumar Gala2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | Add defines for the other page sizes. Even if HW doesn't support them we made them use them for hugetlbfs support. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/pseries: Fix to handle slb resize across migrationBrian King2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/powermac: Thermal control turns system off too eagerlyLyonel Vincent2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On certain PowerMacs, a module (therm_windtunnel) controls various thermal settings (it can report CPU/case temperature, change speed of internal fans, etc.) By default, the hardware thermal control has a temperature limit to protect the computer from damages (the default limit seems to be 80°C) but therm_windtunnel.c reduces it to an anormaly low value (65°C), which means that he computer will shut down randomly when hit by direct sun light or during summer (summer in France can be quite hot), actually possibly losing data instead of protecting it. The overheat limit in therm_windtunnel.c:253-254 should be set to 75°C and 70°C instead of 65°C and 60°C respectively. From: Lyonel Vincent <lyonel@ezix.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()Grant Likely2009-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The two versions are doing almost exactly the same thing. No need to maintain them as separate files. This patch also has the side effect of making the PCI device tree scanning code available to 32 bit powerpc machines, but no board ports actually make use of this feature at this point. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/405ex: support cuImage via included dtbTiejun Chen2009-08-31
| | | | | | | | | | | | | | | | | | | | | | | | To support cuImage, we need to initialize the required sections and ensure that it is built. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * | powerpc/405ex: provide necessary fixup function to support cuImageTiejun Chen2009-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For cuImage format it's necessary to provide clock fixups since u-boot will not pass necessary clock frequency into the dtb included into cuImage so we implement the clock fixups as defined in the technical documentation for the board and update header file with the basic register definitions. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * | powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBCSolomon Peachy2009-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the ESTeem 195E Hotfoot SBC. There are several variants of the SBC deployed, single/dual ethernet+serial, and also 4MB/8MB flash variations. In the interest of having a single kernel image boot on all boards, the cuboot shim detects the differences and mangles the DTS tree appropriately. With the exception of the CF interface that was never populated on production boards, this code/DTS supports all boardpop options. Signed-off-by: Solomon Peachy <solomon@linux-wlan.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * | powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.fkan@amcc.com2009-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the AMCC (AppliedMicro) PPC460SX Eiger evaluation board. Signed-off-by: Tai Tri Nguyen <ttnguyen@amcc.com> Acked-by: Feng Kan <fkan@amcc.com> Acked-by: Tirumala Marri <tmarri@amcc.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>