aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* sata_inic162x: use IDMA for non DMA ATA commandsTejun Heo2008-05-06
| | | | | | | | | | | | Use IDMA for PIO and non-data commands. This allows sata_inic162x to safely drive LBA48 devices. Kill inic_dev_config() which contains code to reject LBA48 devices. With this change, status checking in inic_qc_issue() to avoid hard lock up after hotplug can go away too. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_inic162x: kill now unused bmdma related stuffTejun Heo2008-05-06
| | | | | | | | | | | | | | | sata_inic162x doesn't use BMDMA anymore. Kill bmdma related stuff. * prdctl manipulation * port IRQ mask manipulation * inherit ATA_BASE_SHT instead of ATA_BMDMA_SHT * BMDMA methods Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_inic162x: use IDMA for ATA_PROT_DMATejun Heo2008-05-06
| | | | | | | | | | | | | | | | | | The modified driver on initio site has enough clue on how to use IDMA. Use IDMA for ATA_PROT_DMA. * LBA48 now works as long as it uses DMA (LBA48 devices still aren't allowed as it can destroy data if PIO is used for any reason). * No need to mask IRQs for read DMAs as IDMA_DONE is properly raised after transfer to memory is actually completed. There will be some spurious interrupts but host_intr will handle it correctly and manipulating port IRQ mask interacts badly with the other port for some reason, so command type dependent port IRQ masking is not used anymore. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_inic162x: update TF read handlingTejun Heo2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | inic162x can't reliably read back TF or at least we don't know how to do it yet. The only values which seem reliable are status and error. This patch updates access to TF. * implement inic_tf_read() which reads the TF area in mmio area * implement custom inic_qc_fill_rtf() which only returns true if status indicates device error. it'll be returning bogus addresses for device errors but it'll be able to report why it failed at least. * implement custom inic_check_ready() and use ata_wait_after_reset() instead of the SFF version. * use inic_tf_read() for classification. This is not perfect but it fixes hotplug detection failure and at least makes the driver report 0's instead of random garbages while reporting valid status and error for device errors. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_inic162x: add / update constantsTejun Heo2008-05-06
| | | | | | | | | | | | * add a bunch of constants, most are from the datasheet, a few undocumented ones are from initio's modified driver * HCTL_PWRDWN is bit 12 not 13 This is in preparation of further inic162x updates. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_inic162x: misc clean upsTejun Heo2008-05-06
| | | | | | | | | | | | | | | * use larger indents for structure member definitions * kill unused variable @addr in inic_scr_write() * kill unnecessary flushes in inic_freeze/thaw() * kill buggy explicit kfree() on devres managed port private data This is in preparation of further inic162x updates. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv use hweight16() for bit counting (V2)Mark Lord2008-05-06
| | | | | | | | | | Some tidying as suggested by Grant Grundler. Nuke local bit-counting function from sata_mv in favour of using hweight16(). Also add a short explanation for the 15msec timeout used when waiting for empty/idle. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv NCQ-EH for FIS-based switchingMark Lord2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | Convert sata_mv's EH for FIS-based switching (FBS) over to the sequence recommended by Marvell. This enables us to catch/analyze multiple failed links on a port-multiplier when using NCQ. To do this, we clear the ERR_DEV bit in the EDMA Halt-Conditions register, so that the EDMA engine doesn't self-disable on the first NCQ error. Our EH code sets the MV_PP_FLAG_DELAYED_EH flag to prevent new commands being queued while we await completion of all outstanding NCQ commands on all links of the failed PM. The SATA Test Control register tells us which links have failed, so we must only wait for any other active links to finish up before we stop the EDMA and run the .error_handler afterward. The patch also includes skeleton code for handling of non-NCQ FBS operation. This is more for documentation purposes right now, as that mode is not yet enabled in sata_mv. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv delayed eh handlingMark Lord2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | Introduce a new "delayed error handling" mechanism in sata_mv, to enable us to eventually deal with multiple simultaneous NCQ failures on a single host link when a PM is present. This involves a port flag (MV_PP_FLAG_DELAYED_EH) to prevent new commands being queued, and a pmp bitmap to indicate which pmp links had NCQ errors. The new mv_pmp_error_handler() uses those values to invoke ata_eh_analyze_ncq_error() on each failed link, prior to freezing the port and passing control to sata_pmp_error_handler(). This is based upon a strategy suggested by Tejun. For now, we just implement the delayed mechanism. The next patch in this series will add the multiple-NCQ EH code to take advantage of it. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* libata: export ata_eh_analyze_ncq_errorMark Lord2008-05-06
| | | | | | | | Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv, as suggested by Tejun. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv new mv_port_intr functionMark Lord2008-05-06
| | | | | | | | | | Separate out the inner loop body of mv_host_intr() into it's own function called mv_port_intr(). This should help maintainabilty. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv fix mv_host_intr bug for hc_irq_causeMark Lord2008-05-06
| | | | | | | | Remove the unwanted reads of hc_irq_cause from mv_host_intr(), thereby removing a bug whereby we were not always reading it when needed.. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv NCQ and SError fixes for mv_err_intrMark Lord2008-05-06
| | | | | | | | | | | | | | | | | | Sigh. Undo some earlier changes to mv_port_intr(), so that we now read/clear SError again in all cases. Arrange the top of the function to be as close as possible to what we need for a later update (in this series) for ERR_DEV handling. Fix things so that libata-eh can attempt a READ_LOG_EXT_10H in response to a failed NCQ command, by just doing a local mv_eh_freeze() rather than ata_port_freeze(). This will now fully handle NCQ errors much of the time, but more fixes are needed for FBS/PMP, and for certain chip errata. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv rearrange mv_config_fbsMark Lord2008-05-06
| | | | | | | | | | | Rearrange mv_config_fbs() to more closely follow the (corrected) datasheet recommendations for NCQ and FIS-based switching (FBS). Also, maintain a port flag to let us know when FBS is enabled. We will make more use of that flag later in this patch series. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv errata workaround for sata25 part 1Mark Lord2008-05-06
| | | | | | | | | | Part 1 of workaround for errata "sata#25" for the 60x1 series (the second half of this errata workaround is still in development. Bit22 of the GPIO port has to be set "on" when in NCQ mode. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv new mv_qc_defer methodMark Lord2008-05-06
| | | | | | | | | | | The EDMA engine cannot tolerate a mix of NCQ/non-NCQ commands, and cannot be used for PIO at all. So we need to prevent libata from trying to feed us such mixtures. Introduce mv_qc_defer() for this purpose, and use it for all chip versions. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv wait for empty+idleMark Lord2008-05-06
| | | | | | | | | | | | | | | | When performing EH, it is recommended to wait for the EDMA engine to empty out requests-in-progress before disabling EDMA. Introduce code to poll the EDMA_STATUS register for idle/empty bits before disabling EDMA. For non-EH operation, this will normally exit without delay, other than the register read. A later series of patches may focus on eliminating this and various other register reads (when possible) throughout the driver, but for now we're focussing on solid reliablity. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv pci featuresMark Lord2008-05-06
| | | | | | | | | | | Some of the GenIIe EDMA optimizations should not be used for non-PCI (SOC) devices, and nor for certain configurations of conventional PCI (non PCI-X, PCIe) buses. Logic taken/simplified from that in the Marvell proprietary driver. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* sata_mv more cosmetic changesMark Lord2008-05-06
| | | | | | | | | | | | More cosmetic changes; no code changes. -- try and improve consistency of naming. -- add missing _OFS to tails of register offset definitions. -- rename mv_setup_ifctl() to mv_setup_ifcfg(), since that's what it really does. -- remove/move some dead comments Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* libata: Add Intel SCH PATA driverAlek Du2008-05-06
| | | | | | | | This patch adds Intel SCH chipsets (AF82US15W, AF82US15L, AF82UL11L) PATA controller support. Signed-off-by: Alek Du <alek.du@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* ata_piix: verify SIDPR access before enabling itTejun Heo2008-05-06
| | | | | | | | | | | | | On certain configurations (certain macbooks), even though all the conditions for SIDPR access described in the datasheet are met, actually reading those registers just returns 0 and have no effect on write. Verify SIDPR is actually working before enabling it. This is reported by Ryan Roth in bz#10512. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Ryan Roth <ryan.roth@ch2m.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* libata: improve post-reset device ready testTejun Heo2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | Some controllers (jmb and inic162x) use 0x77 and 0x7f to indicate that the device isn't ready yet. It looks like they use 0xff if device presence is detected but connection isn't established. 0x77 or 0x7f after connection is established and use the value from signature FIS after receiving it. This patch implements ata_check_ready(), which takes TF status value and determines whether the port is ready or not considering the above and other conditions, and use it in @check_ready() functions. This is safe as both 0x77 and 0x7f aren't valid ready status value even though they have BSY bit cleared. This fixes hot plug detection failures which can be triggered with certain drives if they aren't already spun up when the data connector is hot plugged. Tested on sil, sil24, ahci (jmb/ich), piix and inic162x combined with eight drives from all major vendors. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-05-05
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Support creation of FMRs with pages smaller than 4K IB/ehca: Fix function return types RDMA/cxgb3: Bump up the MPA connection setup timeout. RDMA/cxgb3: Silently ignore close reply after abort. RDMA/cxgb3: QP flush fixes IB/ipoib: Fix transmit queue stalling forever IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()
| * mlx4_core: Support creation of FMRs with pages smaller than 4KOren Duer2008-05-05
| | | | | | | | | | | | | | | | | | | | Don't hard code a test against a minimum page shift of 12, since the device may support smaller pages. Test against the actual smallest page size from the device capabilities. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ehca: Fix function return typesStefan Roscher2008-05-05
| | | | | | | | | | | | | | | | Also remove duplicate assignment of local_ca_ack_delay and change min_t check for local_ca_ack_delay to u8 instead of int. Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cxgb3: Bump up the MPA connection setup timeout.Steve Wise2008-05-02
| | | | | | | | | | | | | | Testing on large clusters shows its way too short at 10 secs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cxgb3: Silently ignore close reply after abort.Steve Wise2008-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove bad BUG_ON() that can trigger in correct operation from close_con_rpl(). It is possible to get a close_rpl message on a dead connection. The sequence is: - host refs ep for close exchange - host posts close_req - hw posts PEER_ABORT from incoming RST - host marks ep DEAD - host posts ABORT_RPL and releases ep resources - hw posts CLOSE_RPL - host derefs ep and ep freed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * RDMA/cxgb3: QP flush fixesSteve Wise2008-05-02
| | | | | | | | | | | | | | | | | | | | | | | | - Flush the QP only after the HW disables the connection. Currently we flush the QP when transitioning to CLOSING. This exposes a race condition where the HW can complete a RECV WR, for instance, -and- the SW can flush that same WR. - Only call CQ event handlers on flush IFF we actually flushed something. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/ipoib: Fix transmit queue stalling foreverEli Cohen2008-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions") introduced a bug where the transmit queue could get stopped and never woken up. The problem is that send completions are only polled at the end of the xmit function, so if the send queue fills up and the xmit path stops the queue, then there is no way for send completions to ever get polled, and so the transmit queue stays stopped forever. Fix this by arming the send CQ just before posting the last send request that fills the send queue. Then, when the completion event handler is called, drain the send CQ. Since it is possible that not enough send completions are in the CQ, verify that the the net queue has been woken up after draining the send CQ, and if not arm a timer and drain again at the timer function. Signed-off-by: Roland Dreier <rolandd@cisco.com>
| * IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()Roland Dreier2008-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-05-05
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes: sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK sched: fix cpu clock sched: fair-group: fix a Div0 error of the fair group scheduler sched: fix missing locking in sched_domains code sched: make clock sync tunable by architecture code sched: fix debugging sched: fix sched_info_switch not being called according to documentation sched: fix hrtick_start_fair and CPU-Hotplug sched: fix SCHED_FAIR wake-idle logic error sched: fix RT task-wakeup logic sched: add statics, don't return void expressions sched: add debug checks to idle functions sched: remove old sched doc sched: make rt_sched_class, idle_sched_class static sched: optimize calc_delta_mine() sched: fix normalized sleeper
| * | sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHEDParag Warudkar2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GROUP_SCHED is confirmed to cause unacceptable latencies, see: http://lkml.org/lkml/2008/5/2/370. Mark it EXPERIMENTAL and default to no for now. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCKPeter Zijlstra2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this replaces the rq->clock stuff (and possibly cpu_clock()). - architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK - the 'jiffie' window might be superfulous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick() - cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks? [ mingo@elte.hu: various fixes and cleanups ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched, x86: add HAVE_UNSTABLE_SCHED_CLOCKIngo Molnar2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select. the next change utilizes it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix cpu clockIngo Molnar2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Miller pointed it out that nothing in cpu_clock() sets prev_cpu_time. This caused __sync_cpu_clock() to be called all the time - against the intention of this code. The result was that in practice we hit a global spinlock every time cpu_clock() is called - which - even though cpu_clock() is used for tracing and debugging, is suboptimal. While at it, also: - move the irq disabling to the outest layer, this should make cpu_clock() warp-free when called with irqs enabled. - use long long instead of cycles_t - for platforms where cycles_t is 32-bit. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fair-group: fix a Div0 error of the fair group schedulerMiao Xie2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I echoed 0 into the "cpu.shares" file, a Div0 error occured. We found it is caused by the following calling. sched_group_set_shares(tg, shares) set_se_shares(tg->se[i], shares/nr_cpu_ids) __set_se_shares(se, shares) div64_64((1ULL<<32), shares) When the echoed value was less than the number of processores, the result of the sentence "shares/nr_cpu_ids" was 0, and then the system called div64() to divide the result, the Div0 error occured. It is unnecessary that the shares value is divided by nr_cpu_ids, I think. Because in the function __update_group_shares_cpu() and init_tg_cfs_entry(), the shares value isn't divided by nr_cpu_ids when setting shares of the sched entity. This patch fixes this bug. And echoing ULONG_MAX value into cpu.shares also causes Div0 error, so we set a macro MAX_SHARES to limit the max value of shares. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix missing locking in sched_domains codeHeiko Carstens2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Concurrent calls to detach_destroy_domains and arch_init_sched_domains were prevented by the old scheduler subsystem cpu hotplug mutex. When this got converted to get_online_cpus() the locking got broken. Unlike before now several processes can concurrently enter the critical sections that were protected by the old lock. So use the already present doms_cur_mutex to protect these sections again. Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: make clock sync tunable by architecture codeIngo Molnar2008-05-05
| | | | | | | | | | | | | | | | | | make time_sync_thresh tunable to architecture code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix debuggingMike Galbraith2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | Revert debugging commit 7ba2e74ab5a0518bc953042952dd165724bc70c9. print_cfs_rq_tasks() can induce live-lock if a task is dequeued during list traversal. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix sched_info_switch not being called according to documentationDavid Simner2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | http://bugzilla.kernel.org/show_bug.cgi?id=10545 sched_stats.h says that __sched_info_switch is "called when prev != next" in the comment. sched.c should therefore do that. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix hrtick_start_fair and CPU-HotplugPeter Zijlstra2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gautham R Shenoy reported: > While running the usual CPU-Hotplug stress tests on linux-2.6.25, > I noticed the following in the console logs. > > This is a wee bit difficult to reproduce. In the past 10 runs I hit this > only once. > > ------------[ cut here ]------------ > > WARNING: at kernel/sched.c:962 hrtick+0x2e/0x65() > > Just wondering if we are doing a good job at handling the cancellation > of any per-cpu scheduler timers during CPU-Hotplug. This looks like its indeed not cancelled at all and migrates the it to another cpu. Fix it via a proper hotplug notifier mechanism. Reported-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix SCHED_FAIR wake-idle logic errorGregory Haskins2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use an optimization to skip the overhead of wake-idle processing if more than one task is assigned to a run-queue. The assumption is that the system must already be load-balanced or we wouldnt be overloaded to begin with. The problem is that we are looking at rq->nr_running, which may include RT tasks in addition to CFS tasks. Since the presence of RT tasks really has no bearing on the balance status of CFS tasks, this throws the calculation off. This patch changes the logic to only consider the number of CFS tasks when making the decision to optimze the wake-idle. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix RT task-wakeup logicGregory Haskins2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry Adamushko pointed out a logic error in task_wake_up_rt() where we will always evaluate to "true". You can find the thread here: http://lkml.org/lkml/2008/4/22/296 In reality, we only want to try to push tasks away when a wake up request is not going to preempt the current task. So lets fix it. Note: We introduce test_tsk_need_resched() instead of open-coding the flag check so that the merge-conflict with -rt should help remind us that we may need to support NEEDS_RESCHED_DELAYED in the future, too. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Dmitry Adamushko <dmitry.adamushko@gmail.com> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: add statics, don't return void expressionsHarvey Harrison2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Noticed by sparse: kernel/sched.c:760:20: warning: symbol 'sched_feat_names' was not declared. Should it be static? kernel/sched.c:767:5: warning: symbol 'sched_feat_open' was not declared. Should it be static? kernel/sched_fair.c:845:3: warning: returning void-valued expression kernel/sched.c:4386:3: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: add debug checks to idle functionsAndrew Morton2008-05-05
| | | | | | | | | | | | | | | | | | | | | Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: "Justin Mattock" <justinmattock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: remove old sched docIngo Molnar2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | Fabio Checconi noticed that Documentation/scheduler/sched-design.txt was a stale copy of the old scheduler. Remove it. Reported-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: make rt_sched_class, idle_sched_class staticHarvey Harrison2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | The C files are included directly in sched.c, so they are effectively static. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: optimize calc_delta_mine()Peter Zijlstra2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joel noticed that the !lw->inv_weight contition isn't unlikely anymore so remove the unlikely annotation. Also, remove the two div64_u64() inv_weight calculations, which makes them rely on the calc_delta_mine() path as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Joel Schopp <jschopp@austin.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix normalized sleeperPeter Zijlstra2008-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | Normalized sleeper uses calc_delta*() which requires that the rq load is already updated, so move account_entity_enqueue() before place_entity() Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'powerpc-next' of ↵Linus Torvalds2008-05-05
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'powerpc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Assign PDE->data before gluing PDE into /proc tree [POWERPC] devres: Add devm_ioremap_prot() [POWERPC] macintosh: ADB driver: adb_handler_sem semaphore to mutex [POWERPC] macintosh: windfarm_smu_sat: semaphore to mutex [POWERPC] macintosh: therm_pm72: driver_lock semaphore to mutex