aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAge
* rcu: Use softirq to address performance regressionShaohua Li2011-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: "Alex,Shi" <alex.shi@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2011-06-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits) tg3: Fix tg3_skb_error_unmap() net: tracepoint of net_dev_xmit sees freed skb and causes panic drivers/net/can/flexcan.c: add missing clk_put net: dm9000: Get the chip in a known good state before enabling interrupts drivers/net/davinci_emac.c: add missing clk_put af-packet: Add flag to distinguish VID 0 from no-vlan. caif: Fix race when conditionally taking rtnl lock usbnet/cdc_ncm: add missing .reset_resume hook vlan: fix typo in vlan_dev_hard_start_xmit() net/ipv4: Check for mistakenly passed in non-IPv4 address iwl4965: correctly validate temperature value bluetooth l2cap: fix locking in l2cap_global_chan_by_psm ath9k: fix two more bugs in tx power cfg80211: don't drop p2p probe responses Revert "net: fix section mismatches" drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run() sctp: stop pending timers and purge queues when peer restart asoc drivers/net: ks8842 Fix crash on received packet when in PIO mode. ip_options_compile: properly handle unaligned pointer iwlagn: fix incorrect PCI subsystem id for 6150 devices ...
| * Merge branch 'master' of ↵John W. Linville2011-06-03
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into for-davem
| | * cfg80211: don't drop p2p probe responsesEliad Peller2011-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0a35d36 ("cfg80211: Use capability info to detect mesh beacons") assumed that probe response with both ESS and IBSS bits cleared means that the frame was sent by a mesh sta. However, these capabilities are also being used in the p2p_find phase, and the mesh-validation broke it. Rename the WLAN_CAPABILITY_IS_MBSS macro, and verify that mesh ies exist before assuming this frame was sent by a mesh sta. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | net: tracepoint of net_dev_xmit sees freed skb and causes panicKoki Sanagi2011-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because there is a possibility that skb is kfree_skb()ed and zero cleared after ndo_start_xmit, we should not see the contents of skb like skb->len and skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that and causes panic by NULL pointer dereference. This patch fixes trace_net_dev_xmit not to see the contents of skb directly. If you want to reproduce this panic, 1. Get tracepoint of net_dev_xmit on 2. Create 2 guests on KVM 2. Make 2 guests use virtio_net 4. Execute netperf from one to another for a long time as a network burden 5. host will panic(It takes about 30 minutes) Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | af-packet: Add flag to distinguish VID 0 from no-vlan.Ben Greear2011-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, user-space cannot determine if a 0 tcp_vlan_tci means there is no VLAN tag or the VLAN ID was zero. Add flag to make this explicit. User-space can check for TP_STATUS_VLAN_VALID || tp_vlan_tci > 0, which will be backwards compatible. Older could would have just checked for tp_vlan_tci, so it will work no worse than before. Signed-off-by: Ben Greear <greearb@candelatech.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sctp: stop pending timers and purge queues when peer restart asocWei Yongjun2011-05-31
| |/ | | | | | | | | | | | | | | | | If the peer restart the asoc, we should not only fail any unsent/unacked data, but also stop the T3-rtx, SACK, T4-rto timers, and teardown ASCONF queues. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-06-03
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: block: Use hlist_entry() for io_context.cic_list.first cfq-iosched: Remove bogus check in queue_fail path xen/blkback: potential null dereference in error handling xen/blkback: don't call vbd_size() if bd_disk is NULL block: blkdev_get() should access ->bd_disk only after success CFQ: Fix typo and remove unnecessary semicolon block: remove unwanted semicolons Revert "block: Remove extra discard_alignment from hd_struct." nbd: adjust 'max_part' according to part_shift nbd: limit module parameters to a sane value nbd: pass MSG_* flags to kernel_recvmsg() block: improve the bio_add_page() and bio_add_pc_page() descriptions
| * | block: remove unwanted semicolonsNamhyung Kim2011-05-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since those defined functions require additional semicolon from the caller, they could cause potential syntax errors when used in if-else statements. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | Revert "block: Remove extra discard_alignment from hd_struct."Jens Axboe2011-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was not a good idea to start dereferencing disk->queue from the fs sysfs strategy for displaying discard alignment. We ran into first a NULL pointer deref, and after fixing that we sometimes see unvalid disk->queue pointer values. Since discard is the only one of the bunch actually looking into the queue, just revert the change. This reverts commit 23ceb5b7719e9276d4fa72a3ecf94dd396755276. Conflicts: fs/partitions/check.c
* | | Merge branch 'stable' of ↵Linus Torvalds2011-06-03
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: asm-generic/unistd.h: support sendmmsg syscall tile: enable CONFIG_BUGVERBOSE
| * | | asm-generic/unistd.h: support sendmmsg syscallChris Metcalf2011-06-02
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Arnd Bergmann <arnd@arndb.de>
* | | | Revert "tty: make receive_buf() return the amout of bytes received"Linus Torvalds2011-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b1c43f82c5aa265442f82dba31ce985ebb7aa71c. It was broken in so many ways, and results in random odd pty issues. It re-introduced the buggy schedule_work() in flush_to_ldisc() that can cause endless work-loops (see commit a5660b41af6a: "tty: fix endless work loop when the buffer fills up"). It also used an "unsigned int" return value fo the ->receive_buf() function, but then made multiple functions return a negative error code, and didn't actually check for the error in the caller. And it didn't actually work at all. BenH bisected down odd tty behavior to it: "It looks like the patch is causing some major malfunctions of the X server for me, possibly related to PTYs. For example, cat'ing a large file in a gnome terminal hangs the kernel for -minutes- in a loop of what looks like flush_to_ldisc/workqueue code, (some ftrace data in the quoted bits further down). ... Some more data: It -looks- like what happens is that the flush_to_ldisc work queue entry constantly re-queues itself (because the PTY is full ?) and the workqueue thread will basically loop forver calling it without ever scheduling, thus starving the consumer process that could have emptied the PTY." which is pretty much exactly the problem we fixed in a5660b41af6a. Milton Miller pointed out the 'unsigned int' issue. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Milton Miller <miltonm@bga.com> Cc: Stefan Bigler <stefan.bigler@keymile.com> Cc: Toby Gray <toby.gray@realvnc.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge git://git.infradead.org/iommu-2.6Linus Torvalds2011-06-01
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/iommu-2.6: intel-iommu: Fix off-by-one in RMRR setup intel-iommu: Add domain check in domain_remove_one_dev_info intel-iommu: Remove Host Bridge devices from identity mapping intel-iommu: Use coherent DMA mask when requested intel-iommu: Dont cache iova above 32bit intel-iommu: Speed up processing of the identity_mapping function intel-iommu: Check for identity mapping candidate using system dma mask intel-iommu: Only unlink device domains from iommu intel-iommu: Enable super page (2MiB, 1GiB, etc.) support intel-iommu: Flush unmaps at domain_exit intel-iommu: Remove obsolete comment from detect_intel_iommu intel-iommu: fix VT-d PMR disable for TXT on S3 resume
| * | | | intel-iommu: Enable super page (2MiB, 1GiB, etc.) supportYouquan Song2011-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no externally-visible changes with this. In the loop in the internal __domain_mapping() function, we simply detect if we are mapping: - size >= 2MiB, and - virtual address aligned to 2MiB, and - physical address aligned to 2MiB, and - on hardware that supports superpages. (and likewise for larger superpages). We automatically use a superpage for such mappings. We never have to worry about *breaking* superpages, since we trust that we will always *unmap* the same range that was mapped. So all we need to do is ensure that dma_pte_clear_range() will also cope with superpages. Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so it can return a PTE at the appropriate level rather than always extending the page tables all the way down to level 1. Again, this is simplified by the fact that we should never encounter existing small pages when we're creating a mapping; any old mapping that used the same virtual range will have been entirely removed and its obsolete page tables freed. Provide an 'intel_iommu=sp_off' argument on the command line as a chicken bit. Not that it should ever be required. == The original commit seen in the iommu-2.6.git was Youquan's implementation (and completion) of my own half-baked code which I'd typed into an email. Followed by half a dozen subsequent 'fixes'. I've taken the unusual step of rewriting history and collapsing the original commits in order to keep the main history simpler, and make life easier for the people who are going to have to backport this to older kernels. And also so I can give it a more coherent commit comment which (hopefully) gives a better explanation of what's going on. The original sequence of commits leading to identical code was: Youquan Song (3): intel-iommu: super page support intel-iommu: Fix superpage alignment calculation error intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte() David Woodhouse (4): intel-iommu: Precalculate superpage support for dmar_domain intel-iommu: Fix hardware_largepage_caps() intel-iommu: Fix inappropriate use of superpages in __domain_mapping() intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | Merge git://git.infradead.org/mtd-2.6Linus Torvalds2011-06-01
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/mtd-2.6: mtd: fix physmap.h warnings
| * | | | | mtd: fix physmap.h warningsRandy Dunlap2011-06-01
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix build warnings in physmap.h: include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | virtio: add api for delayed callbacksMichael S. Tsirkin2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an API that tells the other side that callbacks should be delayed until a lot of work has been done. Implement using the new event_idx feature. Note: it might seem advantageous to let the drivers ask for a callback after a specific capacity has been reached. However, as a single head can free many entries in the descriptor table, we don't really have a clue about capacity until get_buf is called. The API is the simplest to implement at the moment, we'll see what kind of hints drivers can pass when there's more than one user of the feature. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | virtio ring: inline function to check for eventsMichael S. Tsirkin2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new used_event and avail_event and features, both host and guest need similar logic to check whether events are enabled, so it helps to put the common code in the header. Note that Xen has similar logic for notification hold-off in include/xen/interface/io/ring.h with req_event and req_prod corresponding to event_idx + 1 and new_idx respectively. +1 comes from the fact that req_event and req_prod in Xen start at 1, while event index in virtio starts at 0. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | virtio: event index interfaceMichael S. Tsirkin2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new feature bit for the guest and host to utilize an event index (like Xen) instead if a flag bit to enable/disable interrupts and kicks. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | | virtio: add full three-clause BSD text to headers.Rusty Russell2011-05-29
| |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's unclear to me if it's important, but it's obviously causing my technical colleages some headaches and I'd hate such imprecision to slow virtio adoption. I've emailed this to all non-trivial contributors for approval, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Ryan Harper <ryanh@us.ibm.com> Acked-by: Anthony Liguori <aliguori@us.ibm.com> Acked-by: Eric Van Hensbergen <ericvh@gmail.com> Acked-by: john cooper <john.cooper@redhat.com> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
* | | | Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osdLinus Torvalds2011-05-29
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits) pnfs-obj: pg_test check for max_io_size NFSv4.1: define nfs_generic_pg_test NFSv4.1: use pnfs_generic_pg_test directly by layout driver NFSv4.1: change pg_test return type to bool NFSv4.1: unify pnfs_pageio_init functions pnfs-obj: objlayout_encode_layoutcommit implementation pnfs: encode_layoutcommit pnfs-obj: report errors and .encode_layoutreturn Implementation. pnfs: encode_layoutreturn pnfs: layoutret_on_setattr pnfs: layoutreturn pnfs-obj: osd raid engine read/write implementation pnfs: support for non-rpc layout drivers pnfs-obj: define per-inode private structure pnfs: alloc and free layout_hdr layoutdriver methods pnfs-obj: objio_osd device information retrieval and caching pnfs-obj: decode layout, alloc/free lseg pnfs-obj: pnfs_osd XDR client implementation pnfs-obj: pnfs_osd XDR definitions pnfs-obj: objlayoutdriver module skeleton ...
| * | | | NFSv4.1: change pg_test return type to boolBenny Halevy2011-05-29
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Benny Halevy <bhalevy@panasas.com>
| * | | | pnfs: layoutreturnBenny Halevy2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFSv4.1 LAYOUTRETURN implementation Currently, does not support layout-type payload encoding. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> [call pnfs_return_layout right before pnfs_destroy_layout] [remove assert_spin_locked from pnfs_clear_lseg_list] [remove wait parameter from the layoutreturn path.] [remove return_type field from nfs4_layoutreturn_args] [remove range from nfs4_layoutreturn_args] [no need to send layoutcommit from _pnfs_return_layout] [don't wait on sync layoutreturn] [fix layout stateid in layoutreturn args] [fixed NULL deref in _pnfs_return_layout] [removed recaim member of nfs4_layoutreturn_args] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
| * | | | pnfs: support for non-rpc layout driversBenny Halevy2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-rpc layout driver such as for objects and blocks implement their own I/O path and error handling logic. Therefore bypass NFS-based error handling for these layout drivers. [fix lseg ref-count bugs, and null de-refs] [Fall out from: non-rpc layout drivers] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [get rid of PNFS_USE_RPC_CODE] [get rid of __nfs4_write_done_cb] [revert useless change in nfs4_write_done_cb] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
| * | | | pnfs-obj: pnfs_osd XDR definitionsBenny Halevy2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add the pnfs_osd_xdr.h header * defintions the pnfs_osd_layout structure including all it's sub-types and constants. * Declare the pnfs_osd_xdr_decode_layout API + all needed inline helpers. * Define the pnfs_osd_deviceaddr structure and all its subtypes and constants. * Declare API for decoding of a pnfs_osd_deviceaddr from XDR stream. * Define the pnfs_osd_ioerr structure, its substructures and constants. * Declare API for encoding of a pnfs_osd_ioerr into XDR stream. * Define the pnfs_osd_layoutupdate structure and its substructures. * Declare API for encoding of a pnfs_osd_layoutupdate into XDR stream. [Remove server definitions] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
| * | | | SUNRPC: introduce xdr_init_decode_pagesBenny Halevy2011-05-29
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | Initialize xdr_stream and xdr_buf using an array of page pointers and length of buffer. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
* | | | mm: Fix boot crash in mm_alloc()Linus Torvalds2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thomas Gleixner reports that we now have a boot crash triggered by CONFIG_CPUMASK_OFFSTACK=y: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c11ae035>] find_next_bit+0x55/0xb0 Call Trace: [<c11addda>] cpumask_any_but+0x2a/0x70 [<c102396b>] flush_tlb_mm+0x2b/0x80 [<c1022705>] pud_populate+0x35/0x50 [<c10227ba>] pgd_alloc+0x9a/0xf0 [<c103a3fc>] mm_init+0xec/0x120 [<c103a7a3>] mm_alloc+0x53/0xd0 which was introduced by commit de03c72cfce5 ("mm: convert mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of mm_init() vs mm_init_cpumask Thomas wrote a patch to just fix the ordering of initialization, but I hate the new double allocation in the fork path, so I ended up instead doing some more radical surgery to clean it all up. Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Ingo Molnar <mingo@elte.hu> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds2011-05-29
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] mm: fix mmu_gather rework [S390] mm: fix storage key handling
| * | | | [S390] mm: fix storage key handlingHeiko Carstens2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | page_get_storage_key() and page_set_storage_key() expect a page address and not its page frame number. This got inconsistent with 2d42552d "[S390] merge page_test_dirty and page_clear_dirty". Result is that we read/write storage keys from random pages and do not have a working dirty bit tracking at all. E.g. SetPageUpdate() doesn't clear the dirty bit of requested pages, which for example ext4 doesn't like very much and panics after a while. Unable to handle kernel paging request at virtual user address (null) Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 Not tainted 2.6.39-07551-g139f37f-dirty #152 Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70) Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000 0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88 0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000 0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988 Krnl Code: 0000000000316b04: f0a0000407f4 srp 4(11,%r0),2036,0 0000000000316b0a: b9020022 ltgr %r2,%r2 0000000000316b0e: a7740015 brc 7,316b38 >0000000000316b12: e3d0c0000024 stg %r13,0(%r12) 0000000000316b18: 4120c010 la %r2,16(%r12) 0000000000316b1c: 4130d060 la %r3,96(%r13) 0000000000316b20: e340d0600004 lg %r4,96(%r13) 0000000000316b26: c0e50002b567 brasl %r14,36d5f4 Call Trace: ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138) [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c [<00000000002daac2>] ext4_da_writepages+0x2da/0x504 [<00000000002597e8>] writeback_single_inode+0xf8/0x268 [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c [<000000000025a700>] writeback_inodes_wb+0x80/0x168 [<000000000025aa92>] wb_writeback+0x2aa/0x324 [<000000000025abde>] wb_do_writeback+0xd2/0x274 [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4 [<00000000001737be>] kthread+0xa6/0xb0 [<000000000056c1da>] kernel_thread_starter+0x6/0xc [<000000000056c1d4>] kernel_thread_starter+0x0/0xc INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138 Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* | | | | Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2011-05-29
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits) nfsd: make local functions static NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session() NFSD: Check status from nfsd4_map_bcts_dir() NFSD: Remove setting unused variable in nfsd_vfs_read() nfsd41: error out on repeated RECLAIM_COMPLETE nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence nfsd v4.1 lOCKT clientid field must be ignored nfsd41: add flag checking for create_session nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly nfsd4: fix wrongsec handling for PUTFH + op cases nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller nfsd4: introduce OPDESC helper nfsd4: allow fh_verify caller to skip pseudoflavor checks nfsd: distinguish functions of NFSD_MAY_* flags svcrpc: complete svsk processing on cb receive failure svcrpc: take advantage of tcp autotuning SUNRPC: Don't wait for full record to receive tcp data svcrpc: copy cb reply instead of pages svcrpc: close connection if client sends short packet svcrpc: note network-order types in svc_process_calldir ...
| * | | | | nfsd41: add flag checking for create_sessionMi Jinlong2011-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach the NFS server to reject invalid create_session flags. Also do some minor formatting adjustments. Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | SUNRPC: Don't wait for full record to receive tcp dataJ. Bruce Fields2011-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that we immediately read and buffer data from the incoming TCP stream so that we grow the receive window quickly, and don't deadlock on large READ or WRITE requests. Also do some minor exit cleanup. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds2011-05-29
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm kcopyd: return client directly and not through a pointer dm kcopyd: reserve fewer pages dm io: use fixed initial mempool size dm kcopyd: alloc pages from the main page allocator dm kcopyd: add gfp parm to alloc_pl dm kcopyd: remove superfluous page allocation spinlock dm kcopyd: preallocate sub jobs to avoid deadlock dm kcopyd: avoid pointless job splitting dm mpath: do not fail paths after integrity errors dm table: reject devices without request fns dm table: allow targets to support discards internally
| * | | | | | dm kcopyd: return client directly and not through a pointerMikulas Patocka2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return client directly from dm_kcopyd_client_create, not through a parameter, making it consistent with dm_io_client_create. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | | | dm kcopyd: reserve fewer pagesMikulas Patocka2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reserve just the minimum of pages needed to process one job. Because we allocate pages from page allocator, we don't need to reserve a large number of pages. The maximum job size is SUB_JOB_SIZE and we calculate the number of reserved pages based on this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | | | dm io: use fixed initial mempool sizeMikulas Patocka2011-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the arbitrary calculation of an initial io struct mempool size with a constant. The code calculated the number of reserved structures based on the request size and used a "magic" multiplication constant of 4. This patch changes it to reserve a fixed number - itself still chosen quite arbitrarily. Further testing might show if there is a better number to choose. Note that if there is no memory pressure, we can still allocate an arbitrary number of "struct io" structures. One structure is enough to process the whole request. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
| * | | | | | dm table: allow targets to support discards internallyMike Snitzer2011-05-29
| | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Permit a target to support discards regardless of whether or not all its underlying devices do. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | | | | Merge branch 'nfs-for-2.6.40' of ↵Linus Torvalds2011-05-29
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Support for RPC over AF_LOCAL transports SUNRPC: Remove obsolete comment SUNRPC: Use AF_LOCAL for rpcbind upcalls SUNRPC: Clean up use of curly braces in switch cases NFS: Revert NFSROOT default mount options SUNRPC: Rename xs_encode_tcp_fragment_header() nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu() nfs41: Correct offset for LAYOUTCOMMIT NFS: nfs_update_inode: print current and new inode size in debug output NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors NFSv4: Handle expired stateids when the lease is still valid SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
| * | | | | | SUNRPC: Support for RPC over AF_LOCAL transportsChuck Lever2011-05-27
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TI-RPC introduces the capability of performing RPC over AF_LOCAL sockets. It uses this mainly for registering and unregistering local RPC services securely with the local rpcbind, but we could also conceivably use it as a generic upcall mechanism. This patch provides a client-side only implementation for the moment. We might also consider a server-side implementation to provide AF_LOCAL access to NLM (for statd downcalls, and such like). Autobinding is not supported on kernel AF_LOCAL transports at this time. Kernel ULPs must specify the pathname of the remote endpoint when an AF_LOCAL transport is created. rpcbind supports registering services available via AF_LOCAL, so the kernel could handle it with some adjustment to ->rpcbind and ->set_port. But we don't need this feature for doing upcalls via well-known named sockets. This has not been tested with ULPs that move a substantial amount of data. Thus, I can't attest to how robust the write_space and congestion management logic is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | | | Merge branch 'release' of ↵Linus Torvalds2011-05-29
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI EC: remove redundant code ACPI: Add D3 cold state ACPI: processor: fix processor_physically_present in UP kernel ACPI: Split out custom_method functionality into an own driver ACPI: Cleanup custom_method debug stuff ACPI EC: enable MSI workaround for Quanta laptops ACPICA: Update to version 20110413 ACPICA: Execute an orphan _REG method under the EC device ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place ACPICA: Update internal address SpaceID for DataTable regions ACPICA: Add more methods eligible for NULL package element removal ACPICA: Split all internal Global Lock functions to new file - evglock ACPI: EC: add another DMI check for ASUS hardware ACPI EC: remove dead code ACPICA: Fix code divergence of global lock handling ACPICA: Use acpi_os_create_lock interface ACPI: osl, add acpi_os_create_lock interface ACPI:Fix goto flows in thermal-sys
| * \ \ \ \ \ Merge branch 'ec-cleanup' into releaseLen Brown2011-05-29
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/platform/x86/compal-laptop.c
| | * | | | | | ACPI EC: remove dead codeThomas Renninger2011-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | static void acpi_ec_gpe_query(void *ec_cxt); -> The function is right above this declaration -> not needed. poll_force is also not used, cleaned up in ec.c and its users: compal-laptop and msi-laptop. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
| | | | | | | |
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| | \ \ \ \ \ \
| *-------. | | | | | | Merge branches 'acpica', 'aml-custom', 'bugzilla-16548', 'bugzilla-20242', ↵Len Brown2011-05-29
| |\ \ \ \ \| | | | | | | | | |_|_|_|/ / / / / | | |/| | | | | | | | | | | | | | | | | | | 'd3-cold', 'ec-asus' and 'thermal-fix' into release
| | | | * | | | | | | ACPI: Add D3 cold stateLin Ming2011-05-29
| | | |/ / / / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _SxW returns an Integer containing the lowest D-state supported in state Sx. If OSPM has not indicated that it supports _PR3, then the value “3” corresponds to D3. If it has indicated _PR3 support, the value “3” represents D3hot and the value “4” represents D3cold. Linux does set _OSC._PR3, so we should fix it to expect that _SxW can return 4. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Len Brown <len.brown@intel.com>
| | | * | | | | | | ACPI: processor: fix processor_physically_present in UP kernelLin Ming2011-05-29
| | |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Usually, there are multiple processors defined in ACPI table, for example Scope (_PR) { Processor (CPU0, 0x00, 0x00000410, 0x06) {} Processor (CPU1, 0x01, 0x00000410, 0x06) {} Processor (CPU2, 0x02, 0x00000410, 0x06) {} Processor (CPU3, 0x03, 0x00000410, 0x06) {} } processor_physically_present(...) will be called to check whether those processors are physically present. Currently we have below codes in processor_physically_present, cpuid = acpi_get_cpuid(...); if ((cpuid == -1) && (num_possible_cpus() > 1)) return false; return true; In UP kernel, acpi_get_cpuid(...) always return -1 and num_possible_cpus() always return 1, so processor_physically_present(...) always returns true for all passed in processor handles. This is wrong for UP processor or SMP processor running UP kernel. This patch removes the !SMP version of acpi_get_cpuid(), so both UP and SMP kernel use the same acpi_get_cpuid function. And for UP kernel, only processor 0 is valid. https://bugzilla.kernel.org/show_bug.cgi?id=16548 https://bugzilla.kernel.org/show_bug.cgi?id=16357 Tested-by: Anton Kochkov <anton.kochkov@gmail.com> Tested-by: Ambroz Bizjak <ambrop7@gmail.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | | | | ACPICA: Update to version 20110413Bob Moore2011-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Version 20110413 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | | | | ACPICA: Execute an orphan _REG method under the EC deviceBob Moore2011-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change will force the execution of a _REG method underneath the EC device even if there is no corresponding operation region of type EmbeddedControl. Fixes a problem seen on some machines and apparently is compatible with Windows behavior. http://www.acpica.org/bugzilla/show_bug.cgi?id=875 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | | | | ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate placeBob Moore2011-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moved to where the predefined regions are actually defined. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | | | | | ACPICA: Update internal address SpaceID for DataTable regionsBob Moore2011-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moved this internal space id in preparation for ACPI 5.0 changes that will include some new space IDs. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>