path: root/arch
Commit message (author, date)
* vhost_net: a kernel-level virtio server (Michael S. Tsirkin, 2010-01-15)
| What it is: vhost net is a character device that can be used to reduce the number of system calls involved in virtio networking. Existing virtio net code is used in the guest without modification.
|
| There's similarity with vringfd, with some differences and reduced scope:
| - uses eventfd for signalling
| - structures can be moved around in memory at any time (good for migration, bug work-arounds in userspace)
| - write logging is supported (good for migration)
| - support memory table and not just an offset (needed for kvm)
|
| Common virtio related code has been put in a separate file vhost.c and can be made into a separate module if/when more backends appear. I used Rusty's lguest.c as the source for developing this part: this supplied me with witty comments I wouldn't be able to write myself.
|
| What it is not: vhost net is not a bus, and not a generic new system call. No assumptions are made on how guest performs hypercalls. Userspace hypervisors are supported as well as kvm.
|
| How it works: Basically, we connect virtio frontend (configured by userspace) to a backend. The backend could be a network device, or a tap device. Backend is also configured by userspace, including vlan/mac etc.
|
| Status: This works for me, and I haven't seen any crashes. Compared to userspace, people reported improved latency (as I save up to 4 system calls per packet), as well as better bandwidth and CPU utilization.
|
| Features that I plan to look at in the future:
| - mergeable buffers
| - zero copy
| - scalability tuning: figure out the best threading model to use
|
| Note on RCU usage (this is also documented in vhost.h, near private_pointer which is the value protected by this variant of RCU): what is happening is that the rcu_dereference() is being used in a workqueue item. The role of rcu_read_lock() is taken on by the start of execution of the workqueue item, of rcu_read_unlock() by the end of execution of the workqueue item, and of synchronize_rcu() by flush_workqueue()/flush_work(). In the future we might need to apply some gcc attribute or sparse annotation to the function passed to INIT_WORK().
|
| Paul's ack below is for this RCU usage.
|
| (Includes fixes by Alan Cox <alan@linux.intel.com>, David L Stevens <dlstevens@us.ibm.com>, Chris Wright <chrisw@redhat.com>)
|
| Acked-by: Rusty Russell <rusty@rustcorp.com.au>
| Acked-by: Arnd Bergmann <arnd@arndb.de>
| Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
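The RCU note in the vhost_net message can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: `Device`, `Backend`, `handle_tx` and `set_backend` are hypothetical names, a `queue.Queue` with one worker thread stands in for the workqueue, and `Queue.join()` stands in for flush_workqueue(). The point it shows is that a work item reads the protected pointer once at its start, and the updater may release the old value only after the flush returns.

```python
import queue
import threading


class Backend:
    """Hypothetical stand-in for the object behind private_pointer."""

    def __init__(self, name):
        self.name = name
        self.released = False


class Device:
    def __init__(self, backend):
        self.private_pointer = backend
        self.work = queue.Queue()
        self.seen = []        # backend name observed by each work item
        self.violations = []  # work items that saw a released backend
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # One item at a time: item start/end act as the read-side section.
        while True:
            item = self.work.get()
            try:
                if item is None:
                    return
                item()
            finally:
                self.work.task_done()

    def handle_tx(self):
        backend = self.private_pointer  # plays the role of rcu_dereference()
        if backend.released:
            self.violations.append(backend.name)
        self.seen.append(backend.name)

    def set_backend(self, new):
        old = self.private_pointer
        self.private_pointer = new  # publish the new value
        self.work.join()            # plays the role of flush_workqueue()
        old.released = True         # now safe: no work item can still hold it

    def stop(self):
        self.work.put(None)
        self.thread.join()


dev = Device(Backend("old"))
for _ in range(3):
    dev.work.put(dev.handle_tx)
dev.set_backend(Backend("new"))
for _ in range(2):
    dev.work.put(dev.handle_tx)
dev.stop()
```

Work items queued after `set_backend()` returns can only observe the new backend, and no item ever observes a released one, which is the guarantee the commit message attributes to this RCU variant.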
* Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (Linus Torvalds, 2009-12-22)
|\
| | * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (36 commits)
| |   powerpc/gc/wii: Remove get_irq_desc()
| |   powerpc/gc/wii: hlwd-pic: convert irq_desc.lock to raw_spinlock
| |   powerpc/gamecube/wii: Fix off-by-one error in ugecon/usbgecko_udbg
| |   powerpc/mpic: Fix problem that affinity is not updated
| |   powerpc/mm: Fix stupid bug in subpge protection handling
| |   powerpc/iseries: use DECLARE_COMPLETION_ONSTACK for non-constant completion
| |   powerpc: Fix MSI support on U4 bridge PCIe slot
| |   powerpc: Handle VSX alignment faults correctly in little-endian mode
| |   powerpc/mm: Fix typo of cpumask_clear_cpu()
| |   powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.
| |   powerpc: Convert BUG() to use unreachable()
| |   powerpc/pseries: Make declarations of cpu_hotplug_driver_lock() ANSI compatible.
| |   powerpc/pseries: Don't panic when H_PROD fails during cpu-online.
| |   powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM
| |   powerpc/defconfigs: Set HZ=100 on pseries and ppc64 defconfigs
| |   powerpc/defconfigs: Disable token ring in powerpc defconfigs
| |   powerpc/defconfigs: Reduce 64bit vmlinux by making acenic and cramfs modules
| |   powerpc/pseries: Select XICS and PCI_MSI PSERIES
| |   powerpc/85xx: Wrong variable returned on error
| |   powerpc/iseries: Convert to proc_fops
| |   ...
| * powerpc/gc/wii: Remove get_irq_desc() (Albert Herranz, 2009-12-20)
| | Fix the following build failures:
| |
| |   arch/powerpc/platforms/embedded6xx/flipper-pic.c: In function 'flipper_pic_map':
| |   arch/powerpc/platforms/embedded6xx/flipper-pic.c:105: error: implicit declaration of function 'get_irq_desc'
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c: In function 'hlwd_pic_map':
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c:98: error: implicit declaration of function 'get_irq_desc'
| |
| | These failures are caused by the changes introduced in commit "powerpc: Remove get_irq_desc()". The reason these drivers were not updated is that they weren't merged yet.
| |
| | Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
| | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc/gc/wii: hlwd-pic: convert irq_desc.lock to raw_spinlock (Albert Herranz, 2009-12-20)
| | Fix the following build failures:
| |
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c: In function 'hlwd_pic_irq_cascade':
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c:135: error: passing argument 1 of 'spin_lock' from incompatible pointer type
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c:137: error: passing argument 1 of 'spin_unlock' from incompatible pointer type
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c:145: error: passing argument 1 of 'spin_lock' from incompatible pointer type
| |   arch/powerpc/platforms/embedded6xx/hlwd-pic.c:149: error: passing argument 1 of 'spin_unlock' from incompatible pointer type
| |
| | These failures are caused by the changes introduced in commit "genirq: Convert irq_desc.lock to raw_spinlock". The reason this driver was not updated is that it wasn't merged yet.
| |
| | Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
| | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * Merge commit 'jwb/next' into merge (Benjamin Herrenschmidt, 2009-12-20)
| |\
| | * powerpc/44x: Increase warp SD buffer (Sean MacLennan, 2009-12-11)
| | | Newer revs of the FPGA have a larger SD buffer.
| | |
| | | Signed-off-by: Sean MacLennan <smaclennan@pikatech.com>
| | | Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| | * powerpc/44x: Extend Katmai dts for ADMA and RAID56 support (Anatolij Gustschin, 2009-12-11)
| | | Add nodes for PPC440SPe DMA, I2O, XOR engines and Memory Queue module which are used in the updated PPC440SPe ADMA driver. Also extend plb ranges property to specify address ranges for DMA0/1 and I2O engines.
| | |
| | | Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
| | | Signed-off-by: Anatolij Gustschin <agust@denx.de>
| | | Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * | Merge commit 'kumar/next' into merge (Benjamin Herrenschmidt, 2009-12-20)
| |\ \
| | * | powerpc/85xx: Workaround MPC8572/MPC8536 GPIO 1 errata. (Felix Radensky, 2009-12-10)
| | | | On MPC8572 and MPC8536 the status of GPIO pins configured as output cannot be determined by reading GPDAT register. Workaround by reading the status of input pins from GPDAT and the status of output pins from a shadow register.
| | | |
| | | | Signed-off-by: Felix Radensky <felix@embedded-sol.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
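The shadow-register workaround described for the GPIO erratum amounts to a masked merge of two values. The sketch below is a Python model under stated assumptions, not the driver code: `read_gpio_state`, `gpdat`, `shadow` and `output_mask` are hypothetical names, and the bit order is arbitrary. Input bits are taken from the value read back from GPDAT, output bits from the software copy of what was last written.

```python
def read_gpio_state(gpdat, shadow, output_mask):
    """Combine hardware and shadow values into one pin-state word.

    gpdat       -- value read from the data register (only input bits trusted)
    shadow      -- software copy of the last value written to output pins
    output_mask -- 1 bits mark pins configured as outputs
    """
    input_bits = gpdat & ~output_mask    # trust hardware for input pins
    output_bits = shadow & output_mask   # trust the shadow copy for outputs
    return input_bits | output_bits


# Low nibble = outputs; the hardware read-back of those bits is ignored.
state = read_gpio_state(gpdat=0b10101001, shadow=0b00000110,
                        output_mask=0b00001111)
```

The price of this design is that every write to an output pin must also update the shadow copy, otherwise the two drift apart.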
| | * | powerpc/gpio: support gpio_to_irq() (Peter Korsgaard, 2009-12-10)
| | | | gpiolib returns -ENXIO if struct gpio_chip::to_irq isn't set, so it's safe to always call.
| | | |
| | | | Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx: Add power management support for MPC8315E-RDB boards (Anton Vorontsov, 2009-12-10)
| | | | - Add nodes for PMC and GTM controllers. GTM4 can be used as a wakeup source;
| | | | - Add fsl,magic-packet properties to eTSEC nodes, i.e. wake-on-lan support. Unlike MPC8313 processors, MPC8315 can resume from deep sleep upon magic packet reception.
| | | |
| | | | Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
| | | | Acked-by: Scott Wood <scottwood@freescale.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx/suspend: Save and restore SICRL, SICRH and SCCR (Anton Vorontsov, 2009-12-10)
| | | | We need to save SICRL, SICRH and SCCR registers on suspend, and restore them on resume. Otherwise, we lose IO and clocks setup on MPC8315E-RDB boards when ULPI USB PHY is used (non-POR setup).
| | | |
| | | | Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
| | | | Acked-by: Scott Wood <scottwood@freescale.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx/suspend: Clear deep_sleeping after devices resume (Anton Vorontsov, 2009-12-10)
| | | | Currently 83xx PMC driver clears deep_sleeping variable very early, before devices are resumed. This makes fsl_deep_sleep() unusable in drivers' resume() callback.
| | | |
| | | | Sure, drivers can store fsl_deep_sleep() value on suspend and use the stored value on resume. But a better solution is to postpone clearing the deep_sleeping variable, i.e. move it into finish() callback.
| | | |
| | | | Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
| | | | Acked-by: Scott Wood <scottwood@freescale.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/cpm2_pic: Allow correct flow_types for port C interrupts (Mark Ware, 2009-12-10)
| | | | Port C interrupts can be either falling edge, or either edge. Other external interrupts are either falling edge or active low. Tested on a custom 8280 based board.
| | | |
| | | | Signed-off-by: Mark Ware <mware@elphinstone.net>
| | | | Acked-by: Anton Vorontsov <avorontsov@ru.mvista.com>
| | | | Acked-by: Scott Wood <scottwood@freescale.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx: mpc8349emitx - add leds-gpio binding (Dmitry Eremin-Solenikov, 2009-12-09)
| | | | Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx: mpc8349emitx - add OF descriptions of LocalBus devices (Dmitry Eremin-Solenikov, 2009-12-09)
| | | | Describe all LocalBus chipselects on MPC8349E-MITX board. Also add flash bindings.
| | | |
| | | | Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx: mpc8349emitx - populate I2C busses in device tree (Dmitry Eremin-Solenikov, 2009-12-09)
| | | | Add OF descriptions of EEPROM, two GPIO extenders and SPD hanging on I2C on this board.
| | | |
| | | | Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/83xx: mpc8349emitx - add gpio controller declarations (Dmitry Eremin-Solenikov, 2009-12-09)
| | | | mpc8349 bears two GPIO controllers. Enable support for them.
| | | |
| | | | Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
| | | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| | * | powerpc/fsl_pci: Fix P2P bridge handling for MPC83xx PCIe controllers (Anton Vorontsov, 2009-12-09)
| | |/
| | | It appears that we wrongly calculate dev_base for type1 config cycles. The thing is: we shouldn't subtract hose->first_busno because PCI core sets PCI primary, secondary and subordinate bus numbers, and PCIe controller actually takes the registers into account. So we should use just bus->number.
| | |
| | | Also, according to MPC8315 reference manual, primary bus number should always remain 0. We have PPC_INDIRECT_TYPE_SURPRESS_PRIMARY_BUS quirk in indirect_pci.c, but since 83xx is somewhat special, it doesn't use indirect_pci.c routines, so we have to implement the quirk specifically for 83xx PCIe controllers.
| | |
| | | Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
| | | Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
| * | powerpc/gamecube/wii: Fix off-by-one error in ugecon/usbgecko_udbg (Albert Herranz, 2009-12-17)
| | | The retry logic in ug_putc() is broken. If the TX fifo is not ready and the counter runs out it will have a value of -1 and no transfer should be attempted. Also, a counter with a value of 0 means that the TX fifo got ready in the last try and the transfer should be attempted.
| | |
| | | Reported-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
| | | Signed-off-by: "Juha Leppanen" <juha_motorsportcom@luukku.com>
| | | Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
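The counter values named in the ug_putc() message (-1 on timeout, 0 on success at the last try) are what a C loop of the form `while (!ready() && count--)` produces. The Python model below is a sketch of that idiom only. The actual ug_putc() source is not shown on this page, so the loop shape, the "buggy" check and all function names here are assumptions.

```python
def wait_for_tx_ready(ready_polls, retries):
    """Model of `while (!ready() && count--)`; returns the final counter."""
    count = retries
    for ready in ready_polls:
        if ready:
            break        # `!ready()` is false: exit without touching count
        count -= 1       # post-decrement happens even on the final pass
        if count < 0:    # old value was 0, so the loop condition failed
            break
    return count


def should_send_buggy(count):
    return count != 0    # assumed broken test, equivalent to C `if (count)`


def should_send_fixed(count):
    return count >= 0    # -1 means timeout, 0 means ready on the last try


timed_out = wait_for_tx_ready([False, False, False, False], retries=3)
last_try = wait_for_tx_ready([False, False, False, True], retries=3)
```

With the assumed broken test, a timeout (-1) triggers a transfer and a success on the last try (0) suppresses it, which is exactly the inversion the commit message describes.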
| * | powerpc/mpic: Fix problem that affinity is not updated (Yang Li, 2009-12-17)
| | | Since commit 57b150cce8e004ddd36330490a68bfb59b7271e9, desc->affinity of an irq is changed after calling desc->chip->set_affinity. Therefore we need to fix the irq_choose_cpu() not to depend on the desc->affinity for new mask.
| | |
| | | Signed-off-by: Jiajun Wu <b06378@freescale.com>
| | | Signed-off-by: Li Yang <leoli@freescale.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/mm: Fix stupid bug in subpge protection handling (David Gibson, 2009-12-17)
| | | Commit d28513bc7f675d28b479db666d572e078ecf182d ("Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for breakage caused by an earlier clean up patch of mine, contains a stupid bug. I changed the parameters of the subpage_protection() function, but failed to update one of the callers.
| | |
| | | This patch fixes it, and replaces a void * with a typed pointer so that the compiler will warn on such an error in future.
| | |
| | | Signed-off-by: David Gibson <dwg@au1.ibm.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/iseries: use DECLARE_COMPLETION_ONSTACK for non-constant completion (Yong Zhang, 2009-12-17)
| | | The _ONSTACK variant should be used for on-stack completion, otherwise it will break lockdep.
| | |
| | | Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
| | | Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | Cc: Paul Mackerras <paulus@samba.org>
| | | Cc: linuxppc-dev@ozlabs.org
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Fix MSI support on U4 bridge PCIe slot (Benjamin Herrenschmidt, 2009-12-17)
| | | On machines using the Apple U4 bridge (AKA IBM CPC945) PCIe interface such as the latest generation G5 machines x16 slot or the x16 slot of the PowerStation, MSIs are currently broken (and will oops when enabling).
| | |
| | | This fixes the oops and implements proper support for those. Instead of using the PCIe <-> HT bridge conversion, on such slots we need to use a bunch of magic registers in the bridge as the MSI target, encoding the interrupt number in the low bits of the address itself.
| | |
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Handle VSX alignment faults correctly in little-endian mode (Neil Campbell, 2009-12-17)
| | | This patch fixes the handling of VSX alignment faults in little-endian mode (the current code assumes the processor is in big-endian mode).
| | |
| | | The patch also makes the handlers clear the top 8 bytes of the register when handling an 8 byte VSX load.
| | |
| | | This is based on 2.6.32.
| | |
| | | Signed-off-by: Neil Campbell <neilc@linux.vnet.ibm.com>
| | | Cc: <stable@kernel.org>
| | | Acked-by: Michael Neuling <mikey@neuling.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/mm: Fix typo of cpumask_clear_cpu() (Yang Li, 2009-12-17)
| | | The function name of cpumask_clear_cpu was not correct. Fortunately nobody uses that code with hotplug yet :-)
| | |
| | | Reported-by: Jin Qing <b24347@freescale.com>
| | | Signed-off-by: Li Yang <leoli@freescale.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled. (Sachin P. Sant, 2009-12-17)
| | | This time without the funny characters.
| | |
| | | Fix following build errors generated with DEBUG=1
| | |
| | |   cc1: warnings being treated as errors
| | |   arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
| | |   arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
| | |   arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
| | |   arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
| | |   arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
| | |   ... SNIP ...
| | |
| | | Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Convert BUG() to use unreachable() (David Daney, 2009-12-17)
| | | Use the new unreachable() macro instead of for(;;);
| | |
| | | Signed-off-by: David Daney <ddaney@caviumnetworks.com>
| | | CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | CC: Paul Mackerras <paulus@samba.org>
| | | CC: linuxppc-dev@ozlabs.org
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/pseries: Make declarations of cpu_hotplug_driver_lock() ANSI compatible. (Gautham R Shenoy, 2009-12-17)
| | | And add the __acquires() and __releases() annotations, while at it.
| | |
| | | Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/pseries: Don't panic when H_PROD fails during cpu-online. (Gautham R Shenoy, 2009-12-17)
| | | If an online-attempt on a CPU which has been offlined using H_CEDE with an appropriate cede latency hint fails, don't panic.
| | |
| | | Instead print the error message and let the __cpu_up() code notify the CPU Hotplug framework of the failure, which in turn can notify the other subsystem through CPU_UP_CANCELED.
| | |
| | | Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM (Benjamin Herrenschmidt, 2009-12-17)
| | | We need to call __set_pte_at() and not set_pte_at() from __change_page_attr() since the latter will perform checks with CONFIG_DEBUG_VM that aren't suitable to the way we override an existing PTE. (More specifically, it doesn't let you write over a present PTE).
| | |
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/defconfigs: Set HZ=100 on pseries and ppc64 defconfigs (Anton Blanchard, 2009-12-17)
| | | Now we have high res timers there is less of a reason for a high HZ value. Furthermore I think there are a few reasons we should reduce HZ to 100:
| | |
| | | - Timer interrupt overhead. While this overhead is small, there are applications that are very sensitive to jitter (eg some HPC apps).
| | |
| | | - Issues with the timer wheel code. When coming out of NO_HZ idle we work our way through the timer code one tick at a time. If we have been idle a long time, this adds up - I sometimes see milliseconds of time spent in that loop.
| | |
| | | Long term we should fix the timer wheel algorithm, but for now if we reduce HZ then we reduce the amount of work the timer code has to do when coming out of idle.
| | |
| | | Signed-off-by: Anton Blanchard <anton@samba.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
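The timer wheel argument in the HZ=100 message is simple proportionality: if catching up after NO_HZ idle costs one loop iteration per missed tick, the work scales with idle time multiplied by HZ. The Python sketch below only does that arithmetic. `catchup_ticks` is a hypothetical helper, and the comparison value of 1000 is an assumption for illustration, since the commit message does not state the previous HZ setting.

```python
def catchup_ticks(idle_seconds, hz):
    """Ticks to step through when leaving idle, one iteration per tick."""
    return int(idle_seconds * hz)


# Hypothetical 10 second idle period at two HZ settings.
ticks_at_1000 = catchup_ticks(10, 1000)  # assumed "high" HZ, illustration only
ticks_at_100 = catchup_ticks(10, 100)    # the value chosen by this commit
```

Under this model, going from the assumed HZ of 1000 to 100 cuts the catch-up loop by a factor of ten for the same idle period.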
| * | powerpc/defconfigs: Disable token ring in powerpc defconfigs (Anton Blanchard, 2009-12-17)
| | | Token what? Let's save some space in our powerpc kernels and remove token ring support.
| | |
| | | Signed-off-by: Anton Blanchard <anton@samba.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/defconfigs: Reduce 64bit vmlinux by making acenic and cramfs modules (Anton Blanchard, 2009-12-17)
| | | Machines with acenic adapters are rare these days, so we may as well make it a module. Cramfs is also very rarely used so we can make it a module.
| | |
| | | Together this saves 143kB on a 64bit compile:
| | |
| | |      text    data     bss      dec    hex filename
| | |   8247176 1729404 1221988 11198568 aae068 vmlinux~
| | |   8134997 1727588 1188836 11051421 a8a19d vmlinux
| | |
| | | Signed-off-by: Anton Blanchard <anton@samba.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
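The 143kB figure in the acenic/cramfs message can be checked from the `dec` column of the size output. The Python snippet below just redoes that arithmetic. The variable names are illustrative, and reading "kB" as units of 1024 bytes is an assumption that happens to match the quoted number.

```python
# text, data, bss as reported by `size` for each build (from the commit message)
before = (8247176, 1729404, 1221988)   # vmlinux~
after = (8134997, 1727588, 1188836)    # vmlinux

total_before = sum(before)             # should equal the `dec` column
total_after = sum(after)
saved_bytes = total_before - total_after
saved_kib = saved_bytes // 1024        # whole units of 1024 bytes
```

The sums reproduce the `dec` values 11198568 and 11051421, and the difference of 147147 bytes is 143 whole units of 1024 bytes.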
| * | powerpc/pseries: Select XICS and PCI_MSI PSERIES (Mel Gorman, 2009-12-17)
| | | It's possible to set CONFIG_XICS without CONFIG_PCI_MSI. When that happens, the kernel fails to build with
| | |
| | |   arch/powerpc/platforms/built-in.o: In function `.xics_startup':
| | |   xics.c:(.text+0x12f60): undefined reference to `.unmask_msi_irq'
| | |   make: *** [.tmp_vmlinux1] Error 1
| | |
| | | Furthermore, as noted by Benjamin Herrenschmidt, "CONFIG_XICS should be made invisible and selected by PSERIES." This patch fixes PSERIES to select both options
| | |
| | | Signed-off-by: Mel Gorman <mel[at]csn.ul.ie>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/85xx: Wrong variable returned on error (Roel Kluin, 2009-12-17)
| | | The wrong variable was returned in the case of an error.
| | |
| | | Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
| | | Cc: Kumar Gala <galak@kernel.crashing.org>
| | | Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc/iseries: Convert to proc_fops (Alexey Dobriyan, 2009-12-17)
| | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | Cc: Michael Ellerman <michael@ellerman.id.au>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | powerpc: Make the CMM memory hotplug aware (Robert Jennings, 2009-12-17)
| | | The Collaborative Memory Manager (CMM) module allocates individual pages over time that are not migratable. On a long running system this can severely impact the ability to find enough pages to support a hotplug memory remove operation.
| | |
| | | This patch adds a memory isolation notifier and a memory hotplug notifier. The memory isolation notifier will return the number of pages found in the range specified. This is used to determine if all of the used pages in a pageblock are owned by the balloon (or other entities in the notifier chain). The hotplug notifier will free pages in the range which is to be removed. The priority of this hotplug notifier is low so that it will be called near last; this helps avoid removing loaned pages in operations that fail due to other handlers.
| | |
| | | CMM activity will be halted when hotplug remove operations are active and resume activity after a delay period to allow the hypervisor time to adjust.
| | |
| | | Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
| | | Cc: Mel Gorman <mel@csn.ul.ie>
| | | Cc: Ingo Molnar <mingo@elte.hu>
| | | Cc: Brian King <brking@linux.vnet.ibm.com>
| | | Cc: Paul Mackerras <paulus@samba.org>
| | | Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
| | | Cc: Gerald Schaefer <geralds@linux.vnet.ibm.com>
| | | Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
| | | Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6 (Linus Torvalds, 2009-12-19)
|\ \ \
| | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha-2.6:
| | | |   alpha: Convert BUG() to use unreachable()
| | | |   alpha: Add minimal support for software performance events
| | | |   alpha: Wire up missing/new syscalls
| * | | alpha: Convert BUG() to use unreachable() (David Daney, 2009-12-18)
| | | | Use the new unreachable() macro instead of for(;;);
| | | |
| | | | Signed-off-by: David Daney <ddaney@caviumnetworks.com>
| | | | CC: Richard Henderson <rth@twiddle.net>
| | | | CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
| | | | CC: linux-alpha@vger.kernel.org
| | | | Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | | alpha: Add minimal support for software performance events (Michael Cree, 2009-12-18)
| | | | In the kernel the patch enables configuration of the perf event option, adds the perf_event_open syscall, and includes a minimal architecture specific asm/perf_event.h header file.
| | | |
| | | | Signed-off-by: Michael Cree <mcree@orcon.net.nz>
| | | | Cc: Richard Henderson <rth@twiddle.net>
| | | | Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
| | | | Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | | | Cc: Paul Mackerras <paulus@samba.org>
| | | | Cc: Ingo Molnar <mingo@elte.hu>
| | | | Signed-off-by: Matt Turner <mattst88@gmail.com>
| * | | alpha: Wire up missing/new syscalls (Daniele Calore, 2009-12-18)
| |/ /
| | | This wires up the fallocate, timerfd_create, timerfd_settime, timerfd_gettime, signalfd4, eventfd2, epoll_create1, dup3, pipe2, inotify_init1, preadv, pwritev and rt_tgsigqueueinfo syscalls for the alpha port.
| | |
| | | For umount2, alpha has an "old" and a "new" version called oldumount and umount, so ignore umount2.
| | |
| | | Rebased on top of 6e17e8b9fb74b9fb9f6ea331f7f4a049c5b4c4b8 by Matt Turner.
| | |
| | | Signed-off-by: Daniele Calore <orkaan@orkaan.org>
| | | Cc: Richard Henderson <rth@twiddle.net>
| | | Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
| | | Signed-off-by: Matt Turner <mattst88@gmail.com>
* | | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2009-12-19)
|\ \ \
| | | | * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | | |   perf session: Make events_stats u64 to avoid overflow on 32-bit arches
| | | |   hw-breakpoints: Fix hardware breakpoints -> perf events dependency
| | | |   perf events: Dont report side-band events on each cpu for per-task-per-cpu events
| | | |   perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
| | | |   perf events, x86/stacktrace: Make stack walking optional
| | | |   perf events: Remove unused perf_counter.h header file
| | | |   perf probe: Check new event name
| | | |   kprobe-tracer: Check new event/group name
| | | |   perf probe: Check whether debugfs path is correct
| | | |   perf probe: Fix libdwarf include path for Debian
| * | | hw-breakpoints: Fix hardware breakpoints -> perf events dependency (Frederic Weisbecker, 2009-12-18)
| | | | The kbuild's select command doesn't propagate through the config dependencies.
| | | |
| | | | Hence the current rules of hardware breakpoint's config can't ensure perf can never be disabled under us.
| | | |
| | | | We have:
| | | |
| | | |   config X86
| | | |       selects HAVE_HW_BREAKPOINTS
| | | |
| | | |   config HAVE_HW_BREAKPOINTS
| | | |       select PERF_EVENTS
| | | |
| | | |   config PERF_EVENTS
| | | |       [...]
| | | |
| | | | x86 will select the breakpoints but that won't propagate to perf events. The user can still disable the latter, but it is necessary for the breakpoints.
| | | |
| | | | What we need is:
| | | |
| | | | - x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS
| | | | - HAVE_HW_BREAKPOINTS depends on PERF_EVENTS
| | | |
| | | | so that we ensure PERF_EVENTS is enabled and frozen for x86.
| | | |
| | | | This fixes the following kind of build errors:
| | | |
| | | |   In file included from arch/x86/kernel/hw_breakpoint.c:31:
| | | |   include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr':
| | | |   include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr'
| | | |
| | | | v2: Select also ANON_INODES from x86, required for perf
| | | |
| | | | Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
| | | | Reported-by: Michal Marek <mmarek@suse.cz>
| | | | Reported-by: Andrew Randrianasulu <randrik_a@yahoo.com>
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
| | | | Cc: Paul Mackerras <paulus@samba.org>
| | | | Cc: Randy Dunlap <randy.dunlap@oracle.com>
| | | | Cc: K.Prasad <prasad@linux.vnet.ibm.com>
| | | | LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker (Frederic Weisbecker, 2009-12-17)
| | | | It's just wasteful for stacktrace users like perf to walk through every entry on the stack when they only accept reliable ones, i.e. those that the frame pointer validates.
| | | |
| | | | Since perf requires pure reliable stacktraces, it needs a stack walker based on frame pointers-only to optimize the stacktrace processing.
| | | |
| | | | This might solve some near-lockup scenarios that can be triggered by call-graph tracing timer events.
| | | |
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
| | | | Cc: Paul Mackerras <paulus@samba.org>
| | | | LKML-Reference: <1261024834-5336-2-git-send-regression-fweisbec@gmail.com>
| | | | [ v2: fix for modular builds and small detail tidyup ]
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | perf events, x86/stacktrace: Make stack walking optional (Frederic Weisbecker, 2009-12-17)
| | | | The current print_context_stack helper that does the stack walking job is good for usual stacktraces as it walks through all the stack and reports even addresses that look unreliable, which is nice when we don't have frame pointers for example.
| | | |
| | | | But we have users like perf that only require reliable stacktraces, and those may want a more adapted stack walker, so let's make this function a callback in stacktrace_ops that users can tune for their needs.
| | | |
| | | | Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
| | | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | | Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
| | | | Cc: Paul Mackerras <paulus@samba.org>
| | | | LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
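The difference between the two stack walkers discussed in the stacktrace commits can be shown on a toy stack. This is a Python model with a made-up layout, not the x86 implementation: the stack is a dict from address to word, each frame stores the saved frame pointer at `fp` and the return address at `fp + 1`, and `is_text` is a hypothetical test for "looks like a kernel text address".

```python
def walk_frame_pointers(stack, fp):
    """Follow the saved frame pointer chain; touches two words per frame."""
    addrs = []
    while fp in stack:
        addrs.append(stack[fp + 1])  # return address next to the saved fp
        fp = stack[fp]               # hop to the caller's frame
    return addrs


def scan_all_words(stack, is_text):
    """Visit every stack word and report anything that looks like text."""
    return [word for _, word in sorted(stack.items()) if is_text(word)]


def is_text(word):
    return word >= 0xC0000000        # assumed text range, illustration only


toy_stack = {
    100: 104, 101: 0xC0001000,       # frame 1: saved fp, return address
    102: 0xC0005555,                 # stale pointer left by an old call
    103: 7,                          # ordinary local variable
    104: 0, 105: 0xC0002000,         # frame 2: end of chain, return address
}
reliable = walk_frame_pointers(toy_stack, fp=100)
everything = scan_all_words(toy_stack, is_text)
```

The full scan also reports the stale address and has to examine every word, while the frame pointer walk does work proportional to the number of frames; making the walker a callback lets each user pick the trade-off it needs.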
* | | | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  Linus Torvalds  2009-12-19
|\ \ \ \
| | | | | * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | | | |   x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
| | | | |   Makefile: Unexport LC_ALL instead of clearing it
| | | | |   x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
| | | | |   x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
| | | | |   x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
| | | | |   Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
| | | | |   x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
| | | | |   x86: Fix checking of SRAT when node 0 ram is not from 0
| | | | |   x86, cpuid: Add "volatile" to asm in native_cpuid()
| | | | |   x86, msr: msrs_alloc/free for CONFIG_SMP=n
| | | | |   x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
| | | | |   x86: Add IA32_TSC_AUX MSR and use it
| | | | |   x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
| | | | |   initramfs: add missing decompressor error check
| | | | |   bzip2: Add missing checks for malloc returning NULL
| | | | |   bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
| * | | | x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system  Suresh Siddha  2009-12-18
| | | | | John Blackwood reported:
| | | | | > on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded),
| | | | | > and 32 bit (x86) kernel, once you change the irq smp_affinity of an irq
| | | | | > to be less than all cpus in the system, you can never change really the
| | | | | > irq smp_affinity back to be all cpus in the system (0xff) again,
| | | | | > even though no error status is returned on the "/bin/echo ff >
| | | | | > /proc/irq/[n]/smp_affinity" operation.
| | | | | >
| | | | | > This is due to that fact that BAD_APICID has the same value as
| | | | | > all cpus (0xff) on 32bit kernels, and thus the value returned from
| | | | | > set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated
| | | | | > as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes
| | | | | > are made.
| | | | |
| | | | | set_desc_affinity() is already checking if the incoming cpu mask
| | | | | intersects with the cpu online mask or not. So there is no need
| | | | | for the apic op cpu_mask_to_apicid_and() to check again
| | | | | and return BAD_APICID.
| | | | |
| | | | | Remove the BAD_APICID return value from cpu_mask_to_apicid_and()
| | | | | and also fix set_desc_affinity() to return -1 instead of using BAD_APICID
| | | | | to represent error conditions (as cpu_mask_to_apicid_and() can return
| | | | | logical or physical apicid values and BAD_APICID is really to represent
| | | | | bad physical apic id).
| | | | |
| | | | | Reported-by: John Blackwood <john.blackwood@ccur.com>
| | | | | Root-caused-by: John Blackwood <john.blackwood@ccur.com>
| | | | | Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
| | | | | LKML-Reference: <1261103386.2535.409.camel@sbs-t61>
| | | | | Signed-off-by: H. Peter Anvin <hpa@zytor.com>
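The collision described in the report can be reproduced in a few lines of user-space C. This is a model, not the kernel code: it assumes an 8-CPU system where the logical destination is simply the CPU bitmask, and all names carrying `_model` or the `old_`/`new_` prefixes are hypothetical.

```c
#include <stdint.h>

#define BAD_APICID_MODEL 0xFFu  /* value the report cites for 32-bit kernels */

/* Old scheme: the error is signalled in-band through the returned id. */
static unsigned int old_mask_to_apicid(uint8_t mask, uint8_t online)
{
    uint8_t both = mask & online;

    return both ? both : BAD_APICID_MODEL;
}

static int old_set_affinity(uint8_t mask, uint8_t online, unsigned int *dest)
{
    unsigned int id = old_mask_to_apicid(mask, online);

    if (id == BAD_APICID_MODEL)
        return -1;  /* also fires for the valid all-CPUs mask 0xff */
    *dest = id;
    return 0;
}

/* New scheme: -1 reports failure, the id travels separately via *dest. */
static int new_set_affinity(uint8_t mask, uint8_t online, unsigned int *dest)
{
    if (!(mask & online))
        return -1;  /* no requested CPU is online */
    *dest = (unsigned int)(mask & online);
    return 0;
}
```

With all eight CPUs online, the old path rejects a request for mask 0xff because the legitimate result equals the error sentinel, while the new path accepts it and still rejects an empty intersection.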
| * | | | x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk  akpm@linux-foundation.org  2009-12-17
| | | | | It says
| | | | |
| | | | |     Warning: objdump version is older than 2.19
| | | | |     Warning: Skipping posttest.
| | | | |
| | | | | because it used the wrong field from `objdump -v':
| | | | |
| | | | |     akpm:/usr/src/25> /opt/crosstool/gcc-4.0.2-glibc-2.3.6/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-objdump -v
| | | | |     GNU objdump 2.16.1
| | | | |     Copyright 2005 Free Software Foundation, Inc.
| | | | |     This program is free software; you may redistribute it under the terms of
| | | | |     the GNU General Public License. This program has absolutely no warranty.
| | | | |
| | | | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | | | Cc: Ingo Molnar <mingo@elte.hu>
| | | | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | | | LKML-Reference: <200912172326.nBHNQaQl024796@imap1.linux-foundation.org>
| | | | | Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | | | | Cc: Masami Hiramatsu <mhiramat@redhat.com>
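A fixed field position is fragile because the banner line of `objdump -v` does not always have the same number of words. The sketch below shows one way to make the check independent of the field count by reading the last whitespace-separated token. This is an assumption made for illustration: the commit message does not say which field the fix selects, the actual check is an awk script rather than C, and the function names are hypothetical. The input is assumed to have no trailing whitespace.

```c
#include <stdio.h>
#include <string.h>

/* Parse "major.minor" from the last space-separated token of the banner. */
static int parse_objdump_version(const char *line, int *major, int *minor)
{
    const char *last = strrchr(line, ' ');

    last = last ? last + 1 : line;
    return sscanf(last, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* True when major.minor is at least want_major.want_minor. */
static int version_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

Applied to the banner quoted above, `GNU objdump 2.16.1`, this yields 2.16, which is correctly reported as older than 2.19. A banner with extra words before the version number would parse the same way.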
| * | | | x86: Reenable TSC sync check at boot, even with NONSTOP_TSC  Pallipadi, Venkatesh  2009-12-17
| | | | | Commit 83ce4009 did the following change
| | | | |     If the TSC is constant and non-stop, also set it reliable.
| | | | |
| | | | | But, there seems to be few systems that will end up with TSC warp across
| | | | | sockets, depending on how the cpus come out of reset. Skipping TSC sync
| | | | | test on such systems may result in time inconsistency later.
| | | | |
| | | | | So, reenable TSC sync test even on constant and non-stop TSC systems.
| | | | | Set, sched_clock_stable to 1 by default and reset it in
| | | | | mark_tsc_unstable, if TSC sync fails.
| | | | |
| | | | | This change still gives perf benefit mentioned in 83ce4009 for systems
| | | | | where TSC is reliable.
| | | | |
| | | | | Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
| | | | | Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
| | | | | LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com>
| | | | | Signed-off-by: H. Peter Anvin <hpa@zytor.com>
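The policy in this message is to assume the clock is stable and to revoke that assumption only when the sync test observes a warp. It can be modelled in user space as below. This is a simplified sketch, not the kernel's sync test: the `clock_model` struct, the `_model` function names, and the representation of TSC readings as an array in observation order are all assumptions.

```c
#include <stddef.h>

struct clock_model {
    int sched_clock_stable;  /* optimistic default: 1 */
    int tsc_unstable;
};

static void clock_model_init(struct clock_model *c)
{
    c->sched_clock_stable = 1;
    c->tsc_unstable = 0;
}

/* Revokes the optimistic default once the TSC is found unreliable. */
static void mark_tsc_unstable_model(struct clock_model *c)
{
    c->tsc_unstable = 1;
    c->sched_clock_stable = 0;
}

/*
 * readings[] holds TSC values in the order they were observed across
 * CPUs. A later observation that is smaller than an earlier one is a
 * warp, so the clock can no longer be trusted.
 */
static void tsc_sync_check_model(struct clock_model *c,
                                 const unsigned long long *readings, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (readings[i] < readings[i - 1]) {
            mark_tsc_unstable_model(c);
            return;
        }
    }
}
```

Systems whose readings stay monotonic keep `sched_clock_stable` at 1 and retain the fast path, while a single backwards step clears it.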