aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
*---------------. Merge branches 'cma', 'cxgb3', 'ehca', 'ipath', 'ipoib', 'mad', 'misc', ↵Roland Dreier2008-10-09
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'mlx4', 'mthca' and 'nes' into for-next
| | | | | | | | | * RDMA/nes: Fix slab corruptionChien Tung2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Referencing cm_node after it is freed via rem_ref_cm_node() causes a slab corruption. There is no need to set cm_node->cm_id to NULL in mini_cm_close(). Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Correct error_module bit maskChien Tung2008-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error_module is 5 bits wide not 4. The corresponding crit_error_count array is correct with 32 entries. Signed-off-by: Chien Tung <ctung@neteffect.com> -- drivers/infiniband/hw/nes/nes_hw.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Fix routed RDMA connectionsBob Sharp2008-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix routed RDMA connections to destinations where the next hop is not the final destination. Use neigh_*() to properly locate neighbor. Signed-off-by: Bob Sharp <bsharp@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com>
| | | | | | | | | * RDMA/nes: Enhanced PFT management schemeVadim Makhervaks2008-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change management of perfect filter table to allow enhanced performance applications. Signed-off-by: Vadim Makhervaks <vmakhervaks@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Handle AE bounds violationFaisal Latif2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle async error NES_AEQE_AEID_AMP_BOUNDS_VIOLATION. Signed-off-by: Faisal Latif <flatif@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Limit critical error interruptsChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mask off a critical error after 100 critical error interrupts to keep the system "sane". Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Stop spurious MAC interruptsChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mask off MAC interrupts on netdev_stop to prevent spurious MAC interrupts on unload/reload of iw_nes. Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Correct tso_wqe_lengthChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Fill in firmware version for ethtoolChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fill in firmware version for ethtool_drvinfo. Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Use ethtool timer valueJohn Lacombe2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use timer value set via ethtool intead of #defines. Signed-off-by: John Lacombe <jlacombe@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Correct MAX TSO frags valueBob Sharp2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use correct define for max TSO fragments. Signed-off-by: Bob Sharp <bsharp@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Enable MC/UC after changing MTUBob Sharp2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-enable multicast and unicast after changing MTU. Signed-off-by: Bob Sharp <bsharp@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Free NIC TX buffers when destroying NIC QPBob Sharp2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Bob Sharp <bsharp@neteffect.com> Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Fix MDC settingChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear MDC bits before setting them to a new value. Adjust MDC value for 10G. Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Add wqm_quanta module optionChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a module parameter wqm_quanta. It controls the number of segments transmitted at a time. Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Module parameter permissionsChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change permission to 0644 so root can set mpa_version, disable_mpa_crc, send_first, and nes_drv_opt at runtime. Signed-off-by: Sweta Bhatt <sweta.bhatt@einfochips.com> Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Add support for 4-port 1G HP blade cardChien Tung2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for NetEffect 4 port 1G HP blade card. The mapping between physical port and MAC is different from the standup card. Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | | * RDMA/nes: Make mini_cm_connect() staticFaisal Latif2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Faisal Latif <flatif@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | | * | IB/mthca: Use pci_request_regions()Roland Dreier2008-09-30
| | | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | | * / IB/mlx4: Set RLKEY bit for kernel QPsVladimir Sokolovsky2008-10-08
| | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set RLKEY bit in the HW context for kernel QPs so that kernel QPs can use the reserved L_Key for memory reference. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | | * / IB: Drop code after return statementJulia Lawall2008-09-20
| | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A break after a return serves no purpose, remove it. Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | | * / IB/mad: Don't discard BMA responses in kernelMichael Brooks2008-09-20
| | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the problem of incoming BMA responses being dropped due to a bad "is response" check. Fix the test to use the ib_response_mad() predicate, which correctly handles BMA MADs. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=988>. Signed-off-by: Michael Brooks <michael.brooks@qlogic.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | * | IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTXRoland Dreier2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | | * | IPoIB: Fix crash when path record fails after path flushRoland Dreier2008-09-25
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the path record query triggered by fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and that path AH from getting out of sync. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194> Reported-by: Rabah Salem <ravah@mellanox.com> Debugged-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | * | IB/ipath: Fix hang on module unloadYannick Cote2008-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle the case where posting a send is requested when the link is down. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1117>. Signed-off-by: Yannick Cote <yannick.cote@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | | * | IB/ipath: Fix SLID generation for RC/UC QPs when LMC > 0Ralph Campbell2008-09-20
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to set the source LID in the sent LRH was not setting the low bits if LMC != 0 for RC/UC QPs. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * / IB/ehca: Generate flush status CQ entriesAlexander Schmidt2008-09-20
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a QP goes into error state, it is required that CQ entries with a flush error status are delivered to the application for any outstanding work requests. eHCA does not do this in hardware, so this patch adds software flush CQE generation to the ehca driver. Whenever a QP gets into error state, it is added to the QP error list of its respective CQ. If the error QP list of a CQ is not empty, poll_cq() generates flush CQEs before polling the actual CQ. Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * / RDMA/cxgb3: Set active_mtu in ib_port_attrJon Mason2008-09-30
| |/ | | | | | | | | | | | | | | | | | | | | | | When running ibv_devinfo, the active_mtu returned is garbage. This is due to the field not being populated in the query_port function in the driver. The patch below populates the active_mtu field with a MTU of 2k. It also zeros the struct, so that any new additions to it will return 0. Signed-off-by: Jon Mason <jon@opengridcomputing.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* / IB/cm: Correctly free cm_device structureHefty, Sean2008-09-30
|/ | | | | | | | | | commit 110cf374 ("infiniband: make cm_device use a struct device and not a kobject.") introduced a memory leak, since it deleted cm_release_dev_obj(), which was where cm_dev was freed. Fix this by freeing the leaked structure after calling device_unregister(). Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds2008-09-19
|\ | | | | | | | | | | * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] cio: fix orb initialization in cio_start_key [S390] cio: Fix driver_data handling for ccwgroup devices.
| * [S390] cio: fix orb initialization in cio_start_keyStefan Weinhuber2008-09-16
| | | | | | | | | | | | | | | | | | | | | | The functions cio_tm_start_key and cio_start_key use the same private orb structure of a subchannel, so the orb needs to be cleared of old data before it is used again. A respective memset is missing from cio_start_key and hereby added. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * [S390] cio: Fix driver_data handling for ccwgroup devices.Cornelia Huck2008-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 16f7f9564c3ae190954f2ec55f385a268b93ac4d, we've seen oopses when grouping/ungrouping devices: Unable to handle kernel pointer dereference at virtual kernel address 0000000000 114000 Oops: 0004 [#1] PREEMPT SMP Modules linked in: bonding qeth_l2 dm_multipath sunrpc qeth_l3 dm_mod qeth chsc_ sch ccwgroup CPU: 1 Not tainted 2.6.26-29.x.20080815-s390xdefault #1 Process iperf (pid: 24412, task: 000000003f446038, ksp: 000000003c929e08) Krnl PSW : 0404d00180000000 000003e00006f6e6 (qeth_irq+0xda/0xb28 [qeth]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3 Krnl GPRS: 0000000000000000 000003e000000003 0000000000000000 0000000000114ccc 000000003fb82e48 000003e00006f60c 000000000000000c 000000003ce72100 0000000000114944 000000003fb82e48 0000000000114ccc 000000003fe8fd28 000003e000066000 000003e000076128 000000003fe8fdb8 000000003fe8fd28 Krnl Code: 000003e00006f6da: bf3f2024 icm %r3,15,36(%r2) 000003e00006f6de: a774023c brc 7,3e00006fb56 000003e00006f6e2: a7280000 lhi %r2,0 >000003e00006f6e6: 5020a1a0 st %r2,416(%r10) 000003e00006f6ea: 58109000 l %r1,0(%r9) 000003e00006f6ee: a7111000 tmll %r1,4096 000003e00006f6f2: a77400f9 brc 7,3e00006f8e4 000003e00006f6f6: 8810000c srl %r1,12 Call Trace: ([<000000003fe8fd20>] 0x3fe8fd20) [<000000000033bf2a>] ccw_device_call_handler+0xb2/0xd8 [<0000000000339e1c>] ccw_device_irq+0x124/0x164 [<0000000000339758>] io_subchannel_irq+0x8c/0x118 [<00000000003309ba>] do_IRQ+0x192/0x1bc [<0000000000114f66>] io_return+0x0/0x8 [<00000000001149cc>] sysc_do_svc+0x0/0x22 ([<0000000000114a18>] sysc_noemu+0x10/0x16) [<00000200002e047c>] 0x200002e047c Last Breaking-Event-Address: [<000003e00006f6d6>] qeth_irq+0xca/0xb28 [qeth] The problem is that dev->driver_data for a ccw device is NULL, while it should point to the ccwgroup device it is a member of. This happened due to incorrect cleanup if creating a ccwgroup device failed because the ccw devices were already grouped. Fix this by setting cdev[i] to NULL in the error handling of ccwgroup_create_from_string() after we give up our reference and by checking if the driver_data points to the ccwgroup device in ccwgroup_release() just to be really sure. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | Merge git://oss.sgi.com:8090/xfs/linux-2.6Linus Torvalds2008-09-19
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://oss.sgi.com:8090/xfs/linux-2.6: [XFS] Don't do I/O beyond eof when unreserving space [XFS] Fix use-after-free with buffers [XFS] Prevent lockdep false positives when locking two inodes. [XFS] Fix barrier status change detection. [XFS] Prevent direct I/O from mapping extents beyond eof [XFS] Fix regression introduced by remount fixup [XFS] Move memory allocations for log tracing out of the critical path
| * | [XFS] Don't do I/O beyond eof when unreserving spaceLachlan McIlroy2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When unreserving space with boundaries that are not block aligned we round up the start and round down the end boundaries and then use this function, xfs_zero_remaining_bytes(), to zero the parts of the blocks that got dropped during the rounding. The problem is we don't consider if these blocks are beyond eof. Worse still is if we encounter delayed allocations beyond eof we will try to use the magic delayed allocation block number as a real block number. If the file size is ever extended to expose these blocks then we'll go through xfs_zero_eof() to zero them anyway. SGI-PV: 983683 SGI-Modid: xfs-linux-melb:xfs-kern:32055a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
| * | [XFS] Fix use-after-free with buffersLachlan McIlroy2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a use-after-free issue where log completions access buffers via the buffer log item and the buffer has already been freed. Fix this by taking a reference on the buffer when attaching the buffer log item and release the hold when the buffer log item is detached and we no longer need the buffer. Also create a new function xfs_buf_item_free() to combine some common code. SGI-PV: 985757 SGI-Modid: xfs-linux-melb:xfs-kern:32025a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
| * | [XFS] Prevent lockdep false positives when locking two inodes.David Chinner2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we call xfs_lock_two_inodes() to grab both the iolock and the ilock, then drop the ilocks on both inodes, then grab them again (as xfs_swap_extents() does) then lockdep will report a locking order problem. This is a false positive. To avoid this, disallow xfs_lock_two_inodes() fom locking both inode locks at once - force calers to make two separate calls. This means that nested dropping and regaining of the ilocks will retain the same lockdep subclass and so lockdep will not see anything wrong with this code. SGI-PV: 986238 SGI-Modid: xfs-linux-melb:xfs-kern:31999a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
| * | [XFS] Fix barrier status change detection.David Chinner2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code in xlog_iodone() uses the wrong macro to check if the barrier has been cleared due to an EOPNOTSUPP error form the lower layer. SGI-PV: 986143 SGI-Modid: xfs-linux-melb:xfs-kern:31984a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
| * | [XFS] Prevent direct I/O from mapping extents beyond eofLachlan McIlroy2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the help from some tracing I found that we try to map extents beyond eof when doing a direct I/O read. It appears that the way to inform the generic direct I/O path (ie do_direct_IO()) that we have breached eof is to return an unmapped buffer from xfs_get_blocks_direct(). This will cause do_direct_IO() to jump to the hole handling code where is will check for eof and then abort. This problem was found because a direct I/O read was trying to map beyond eof and was encountering delayed allocations. The delayed allocations beyond eof are speculative allocations and they didn't get converted when the direct I/O flushed the file because there was only enough space in the current AG to convert and write out the dirty pages within eof. Note that xfs_iomap_write_allocate() wont necessarily convert all the delayed allocation passed to it - it will return after allocating the first extent - so if the delayed allocation extends beyond eof then it will stay that way. SGI-PV: 983683 SGI-Modid: xfs-linux-melb:xfs-kern:31929a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
| * | [XFS] Fix regression introduced by remount fixupChristoph Hellwig2008-09-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Logically we would return an error in xfs_fs_remount code to prevent users from believing they might have changed mount options using remount which can't be changed. But unfortunately mount(8) adds all options from mtab and fstab to the mount arguments in some cases so we can't blindly reject options, but have to check for each specified option if it actually differs from the currently set option and only reject it if that's the case. Until that is implemented we return success for every remount request, and silently ignore all options that we can't actually change. SGI-PV: 985710 SGI-Modid: xfs-linux-melb:xfs-kern:31908a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
| * | [XFS] Move memory allocations for log tracing out of the critical pathLachlan McIlroy2008-09-17
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory allocations for log->l_grant_trace and iclog->ic_trace are done on demand when the first event is logged. In xlog_state_get_iclog_space() we call xlog_trace_iclog() under a spinlock and allocating memory here can cause us to sleep with a spinlock held and deadlock the system. For the log grant tracing we use KM_NOSLEEP but that means we can lose trace entries. Since there is no locking to serialize the log grant tracing we could race and have multiple allocations and leak memory. So move the allocations to where we initialize the log/iclog structures. Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace since it's not even used. SGI-PV: 983738 SGI-Modid: xfs-linux-melb:xfs-kern:31896a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-09-19
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() RDMA/nes: Fix client side QP destroy IB/mlx4: Fix up fast register page list format mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entries
| | \
| | \
| *-. \ Merge branches 'ipoib', 'mlx4' and 'nes' into for-linusRoland Dreier2008-09-16
| |\ \ \
| | | * | RDMA/nes: Fix client side QP destroyFaisal Latif2008-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix QP not being destroyed properly on the client, which leads to userspace programs hanging on exit. This is a missing chunk from the connection management rewrite in commit 6492cdf3 ("RDMA/nes: CM connection setup/teardown rework"). Signed-off-by: Faisal Latif <flatif@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | | IB/mlx4: Fix up fast register page list formatVladimir Sokolovsky2008-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Byte swap the addresses in the page list for fast register work requests to big endian to match what the HCA expectx. Also, the addresses must have the "present" bit set so that the HCA knows it can access them. Otherwise the HCA will fault the first time it accesses the memory region. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| | * | | mlx4_core: Set RAE and init mtt_sz field in FRMR MPT entriesVladimir Sokolovsky2008-09-02
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the RAE (remote access enable) bit and correctly initialize the MTT size in MPT entries being set up for fast register memory regions. Otherwise the callers can't enable remote access and in fact can't fast register at all (since the HCA will think no MTT entries are allocated). Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * / / IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()Yossi Etigin2008-09-16
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2008-09-19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix deadlock in setting scheduler parameter to zero sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly
| * | | sched: fix deadlock in setting scheduler parameter to zeroHiroshi Shimamoto2008-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrei Gusev wrote: > I played witch scheduler settings. After doing something like: > echo -n 1000000 >sched_rt_period_us > > command is locked. I found in kernel.log: > > Sep 11 00:39:34 zaratustra > Sep 11 00:39:34 zaratustra Pid: 4495, comm: bash Tainted: G W > (2.6.26.3 #12) > Sep 11 00:39:34 zaratustra EIP: 0060:[<c0213fc7>] EFLAGS: 00210246 CPU: 0 > Sep 11 00:39:34 zaratustra EIP is at div64_u64+0x57/0x80 > Sep 11 00:39:34 zaratustra EAX: 0000389f EBX: 00000000 ECX: 00000000 > EDX: 00000000 > Sep 11 00:39:34 zaratustra ESI: d9800000 EDI: d9800000 EBP: 0000389f > ESP: ea7a6edc > Sep 11 00:39:34 zaratustra DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > Sep 11 00:39:34 zaratustra Process bash (pid: 4495, ti=ea7a6000 > task=ea744000 task.ti=ea7a6000) > Sep 11 00:39:34 zaratustra Stack: 00000000 000003e8 d9800000 0000389f > c0119042 00000000 00000000 00000001 > Sep 11 00:39:34 zaratustra 00000000 00000000 ea7a6f54 00010000 00000000 > c04d2e80 00000001 000e7ef0 > Sep 11 00:39:34 zaratustra c01191a3 00000000 00000000 ea7a6fa0 00000001 > ffffffff c04d2e80 ea5b2480 > Sep 11 00:39:34 zaratustra Call Trace: > Sep 11 00:39:34 zaratustra [<c0119042>] __rt_schedulable+0x52/0x130 > Sep 11 00:39:34 zaratustra [<c01191a3>] sched_rt_handler+0x83/0x120 > Sep 11 00:39:34 zaratustra [<c01a76a6>] proc_sys_call_handler+0xb6/0xd0 > Sep 11 00:39:34 zaratustra [<c01a76c0>] proc_sys_write+0x0/0x20 > Sep 11 00:39:34 zaratustra [<c01a76d9>] proc_sys_write+0x19/0x20 > Sep 11 00:39:34 zaratustra [<c016cc68>] vfs_write+0xa8/0x140 > Sep 11 00:39:34 zaratustra [<c016cdd1>] sys_write+0x41/0x80 > Sep 11 00:39:34 zaratustra [<c0103051>] sysenter_past_esp+0x6a/0x91 > Sep 11 00:39:34 zaratustra ======================= > Sep 11 00:39:34 zaratustra Code: c8 41 0f ad f3 d3 ee f6 c1 20 0f 45 de > 31 f6 0f ad ef d3 ed f6 c1 20 0f 45 fd 0f 45 ee 31 c9 39 eb 89 fe 89 ea > 77 08 89 e8 31 d2 <f7> f3 89 c1 89 f0 8b 7c 24 08 f7 f3 8b 74 24 04 89 > ca 8b 1c 24 > Sep 11 00:39:34 zaratustra EIP: [<c0213fc7>] div64_u64+0x57/0x80 SS:ESP > 0068:ea7a6edc > Sep 11 00:39:34 zaratustra ---[ end trace 4eaa2a86a8e2da22 ]--- fix the boundary condition. sysctl_sched_rt_period=0 makes exception at to_ratio(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomlyZhang, Yanmin2008-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On my tulsa x86-64 machine, kernel 2.6.25-rc5 couldn't boot randomly. Basically, function __enable_runtime forgets to reset rt_rq->rt_throttled to 0. When every cpu is up, per-cpu migration_thread is created and it runs very fast, sometimes to mark the corresponding rt_rq->rt_throttled to 1 very quickly. After all cpus are up, with below calling chain: sched_init_smp => arch_init_sched_domains => build_sched_domains => ... => cpu_attach_domain => rq_attach_root => set_rq_online => ... => _enable_runtime _enable_runtime is called against every rt_rq again, so rt_rq->rt_time is reset to 0, but rt_rq->rt_throttled might be still 1. Later on function do_sched_rt_period_timer couldn't reset it, and all RT tasks couldn't be scheduled to run on that cpu. here is RT task migration_thread which is woken up when a task is migrated to another cpu. Below patch fixes it against 2.6.27-rc5. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>