Commit message (Author, Age)
* Revert "target: Do not special-case loop and iscsi fabric module loads" (Nicholas Bellinger, 2012-07-16)
| Existing lio_dump.py code expects this to be in place for /iscsi. Revert for now to avoid userspace breakage in lio-utils.
| This reverts commit fd88a785f9ac5d6be437c528571ccd85cdf2d493.
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move unmap to struct spc_ops (Christoph Hellwig, 2012-07-16)
| Having all the unmap payload parsing in the backend is a bit ugly, but until more drivers support it and we can find a good interface for all of them that seems the way to go.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move write_same to struct spc_ops (Christoph Hellwig, 2012-07-16)
| Add spc_ops->execute_write_same() caller for ->execute_cmd() setup, and update IBLOCK backends to use it.
| (nab: add export of spc_get_write_same_sectors symbol)
| (roland: Carry forward: Fix range calculation in WRITE SAME emulation when num blocks == 0)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move sync_cache to struct spc_ops (Christoph Hellwig, 2012-07-16)
| Add spc_ops->execute_sync_cache() caller for ->execute_cmd() setup, and update IBLOCK + FILEIO backends to use it.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: add struct spc_ops + initial ->execute_rw pointer usage (Christoph Hellwig, 2012-07-16)
| Remove the execute_cmd method in struct se_subsystem_api, and always use the one directly in struct se_cmd. To make life simpler for SBC virtual backends a struct spc_ops that is passed to sbc_parse_cmd is added. For now it only contains an execute_rw member, but more will follow with the subsequent commits.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: remove dead SCF_ flags (Christoph Hellwig, 2012-07-16)
| Remove the dead SCF_SE_ALLOW_EOO and SCF_DELAYED_CMD_FROM_SAM_ATTR from se_cmd_flags_table.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target/iscsi: Remove dead code in lio_get_tpg_from_tpg_item() (Roland Dreier, 2012-07-16)
| It's got no callers...
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target/iblock: Add parameter to specify read-only devices (Andy Grover, 2012-07-16)
| See https://bugzilla.redhat.com/show_bug.cgi?id=818855
| Adds a parameter so read-only block devices may be registered as LIO backstores.
| Signed-off-by: Andy Grover <agrover@redhat.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: Do not special-case loop and iscsi fabric module loads (Andy Grover, 2012-07-16)
| These modules, along with other fabrics, should be loaded as-needed by the LIO userspace tools.
| Signed-off-by: Andy Grover <agrover@redhat.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move ref_cmd from the generic se_tmr_req into iscsi code (Christoph Hellwig, 2012-07-16)
| Also remove the unused ref_task_lun field in struct se_tmr_req.
| (nab: Add missing TASK_REASSIGN ref_lun vs. ref_cmd orig_fe_lun checks in iscsit_tmr_task_reassign)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: remove the execute list (Christoph Hellwig, 2012-07-16)
| Since "target: Drop se_device TCQ queue_depth usage from I/O path" we always submit all commands (or back then, tasks) from __transport_execute_tasks.
| That means the execute list has lost its purpose, as we can simply submit the commands that are restarted in transport_complete_task_attr directly while we walk the list. In fact doing so also solves a race in the way it currently walks the delayed_cmd_list as well.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target/pscsi: Only emulate REPORT_LUNS for passthrough (Nicholas Bellinger, 2012-07-16)
| This patch changes the pSCSI backend back to follow pre 3.6-queue code and pass SPC-3 persistent reservations + SPC-2 legacy reservation handling through to the underlying LLD / physical hardware.
| For folks who really need this for their own SPC-3 emulation logic, avoid changing the functionality of this beyond what is exported for REPORT_LUNS for existing code, and to avoid problems with SPC-3 PR/ALUA as INQUIRY EVPD=0x83 emulation needs to be in place in order for this to work as expected with spc_parse_cdb() code.
| Cc: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: Move MAINTENANCE_[IN,OUT] from pscsi_parse_cdb -> spc_parse_cdb (Nicholas Bellinger, 2012-07-16)
| The MAINTENANCE_[IN,OUT] CDB parsing required for generic ALUA emulation needs to be in spc_parse_cdb() to function for virtual TYPE_DISK exports, instead of in backend pscsi_parse_cdb() code used only for passthrough ops.
| Cc: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move transport_generic_prepare_cdb into pscsi (Christoph Hellwig, 2012-07-16)
| The virtual drivers don't need to clear cdb fields they never look at, so move this code into the pscsi backend.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move code for CDB emulation (Christoph Hellwig, 2012-07-16)
| Move the existing code in target_core_cdb.c into the files for the command sets that the emulations implement.
| (roland + nab: Squash patch: Fix range calculation in WRITE SAME emulation when num blocks == 0)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: add a parse_cdb method to the backend drivers (Christoph Hellwig, 2012-07-16)
| Instead of trying to handle all SCSI command sets in one function (transport_generic_cmd_sequencer) call out to the backend driver to perform this functionality. For pSCSI a copy of the existing code is used, but for all virtual backends a new parse_sbc_cdb helper is used to provide a simple SBC emulation.
| For now this setup means a fair amount of duplication between pSCSI and the SBC library, but patches later in this series will sort out that problem.
| (nab: Fix up build failure in target_core_pscsi.c)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: split parsing of SPC commands into a separate helper (Christoph Hellwig, 2012-07-16)
| (nab: Add EXPORT_SYMBOL usage for spc_parse_cdb)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: split overflow and underflow checks into a helper (Christoph Hellwig, 2012-07-16)
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: remove control CDB flags (Christoph Hellwig, 2012-07-16)
| We don't need three flags to classify the CDB as we can check for a NULL S/G list for a dataless command, and can infer from the absence of the data flag that we deal with a control CDB. Also remove the _SG_IO from the data CDB flag as all I/O is done on S/G lists now.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: move unrelated code out of transport_generic_cmd_sequencer (Christoph Hellwig, 2012-07-16)
| Move all code not related to cdb parsing from transport_generic_cmd_sequencer into target_setup_cmd_from_cdb.
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: Fix range calculation in WRITE SAME emulation when num blocks == 0 (Roland Dreier, 2012-07-16)
| When NUMBER OF LOGICAL BLOCKS is 0, WRITE SAME is supposed to write all the blocks from the specified LBA through the end of the device. However, dev->transport->get_blocks(dev) (perhaps confusingly) returns the last valid LBA rather than the number of blocks, so the correct number of blocks to write starting with lba is
|     dev->transport->get_blocks(dev) - lba + 1
| (nab: Backport roland's for-3.6 patch to for-3.5)
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
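The off-by-one described above can be sketched in userspace C. This is only an illustration under stated assumptions: `last_valid_lba()` is a hypothetical stand-in for `dev->transport->get_blocks(dev)`, and `write_same_blocks()` is an invented helper name, not the kernel function touched by the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for dev->transport->get_blocks(dev): per the commit
 * message, the value is the last valid LBA, not a block count.
 */
uint64_t last_valid_lba(uint64_t capacity_in_blocks)
{
	assert(capacity_in_blocks > 0);
	return capacity_in_blocks - 1;
}

/*
 * Number of blocks a WRITE SAME should cover. A NUMBER OF LOGICAL BLOCKS
 * value of 0 means "from lba through the end of the device".
 */
uint64_t write_same_blocks(uint64_t last_lba, uint64_t lba, uint32_t num_blocks)
{
	assert(lba <= last_lba);
	if (num_blocks != 0)
		return num_blocks;
	/* last_lba is inclusive, hence the + 1. */
	return last_lba - lba + 1;
}
```

Without the `+ 1`, a request starting at the final LBA of the device would compute a length of zero blocks instead of one.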
* target: Clean up returning errors in PR handling code (Roland Dreier, 2012-07-16)
| - instead of (PTR_ERR(file) < 0) just use IS_ERR(file)
| - return -EINVAL instead of EINVAL
| - all other error returns in target_scsi3_emulate_pr_out() use "goto out" -- get rid of the one remaining straight "return."
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
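The first two items refer to the kernel's error-pointer convention, where a small negative errno is encoded in the top of the address range. The following is a userspace sketch of that idea, not the kernel's own macros: the lowercase names `err_ptr`, `ptr_err`, `is_err_ptr` and the constant `SKETCH_MAX_ERRNO` are hypothetical, chosen to avoid implying they are the real definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed size of the reserved errno window at the top of the address space. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the integer stored in the pointer. */
long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * True only for pointers inside the reserved errno window. A plain
 * "ptr_err(p) < 0" test is weaker: any valid address whose top bit is
 * set would also look negative.
 */
int is_err_ptr(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

Callers then return the negative value (`-EINVAL`, not `EINVAL`), so that a single "negative means failure" check works everywhere.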
* tcm_fc: Fix crash seen with aborts and large reads (Mark Rustad, 2012-07-14)
| This patch fixes a crash seen when large reads have their exchange aborted by either timing out or being reset. Because the exchange abort results in the seq pointer being set to NULL, because the sequence is no longer valid, it must not be dereferenced. This patch changes the function ft_get_task_tag to return ~0 if it is unable to get the tag for this reason. Because the get_task_tag interface provides no means of returning an error, this seems like the best way to fix this issue at the moment.
| Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* qla2xxx: print the right array elements in qlt_async_event (Alan Cox, 2012-07-06)
| Based upon Alan's patch from Coverity scan id 793583, these debug messages in qlt_async_event() should be starting from byte 0, which is always the Asynchronous Event Status Code from the parent switch statement.
| Also, rename reason_code -> login_code following the language used in 2500 FW spec for Port Database Changed (0x8014) -> Port Database Changed Event Mailbox Register for mailbox[2].
| Signed-off-by: Alan Cox <alan@linux.intel.com>
| Cc: Chad Dupuis <chad.dupuis@qlogic.com>
| Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* tcm_fc: Resolve suspicious RCU usage warnings (Mark Rustad, 2012-07-06)
| Use rcu_dereference_protected to tell rcu that the ft_lport_lock is held during ft_lport_create. This resolves "suspicious RCU usage" warnings when debugging options are turned on.
| Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
| Tested-by: Ross Brattain <ross.b.brattain@intel.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* qla2xxx: Remove version.h header file inclusion (Sachin Kamat, 2012-06-13)
| version.h header file is no longer required for qla_target code.
| Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* tcm_qla2xxx: Handle malformed wwn strings properly (Roland Dreier, 2012-06-12)
| If we make a variable an unsigned int and then expect it to be < 0 on a bad character, we're going to have a bad time. Fix the tcm_qla2xxx code to actually notice if hex_to_bin() returns a negative variable.
| This was detected by the compiler warning:
|     scsi/qla2xxx/tcm_qla2xxx.c: In function ‘tcm_qla2xxx_npiv_extract_wwn’:
|     scsi/qla2xxx/tcm_qla2xxx.c:148:3: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
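The signedness problem can be shown with a small userspace sketch. The assumptions are explicit: `hex_digit_value()` is a hypothetical local decoder that follows the same contract the commit relies on for hex_to_bin() (a value 0..15, or a negative number for a bad character), and `parse_wwn_hex()` is an invented helper that accepts exactly 16 hex digits, which is simpler than the real WWN parser.

```c
#include <stdint.h>

/* Decode one hex digit; returns 0..15, or -1 for any other character. */
int hex_digit_value(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	if (ch >= 'A' && ch <= 'F')
		return ch - 'A' + 10;
	return -1;
}

/* Parse exactly 16 hex digits into *wwn; returns 0 on success, -1 on error. */
int parse_wwn_hex(const char *s, uint64_t *wwn)
{
	uint64_t value = 0;
	int i;

	for (i = 0; i < 16; i++) {
		/* Keep the result signed so the -1 error value survives. */
		int nibble = hex_digit_value(s[i]);

		if (nibble < 0)
			return -1;
		value = (value << 4) | (uint64_t)nibble;
	}
	if (s[16] != '\0')
		return -1;
	*wwn = value;
	return 0;
}
```

Had `nibble` been declared `unsigned int`, the `nibble < 0` test could never be true and a malformed string would be folded silently into the result.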
* tcm_qla2xxx: tcm_qla2xxx_handle_tmr() can be static (Roland Dreier, 2012-06-12)
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* qla2xxx: Don't leak commands we give up on in qlt_do_work() (Roland Dreier, 2012-06-12)
| If we go to the "out_term:" exit path in qlt_do_work(), we call qlt_send_term_exchange() with a NULL cmd, which means that it can't possibly free the cmd for us. Add an explicit call to free the command memory, so we don't leak the allocation.
| This will also fix warnings about "BUG qla_tgt_cmd_cachep: Objects remaining on kmem_cache_close" from slub when unloading the qla2xxx target module.
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* qla2xxx: Don't crash if we can't find cmd for failed CTIO (Roland Dreier, 2012-06-12)
| In qlt_do_ctio_completion(), there's no point in calling qlt_term_ctio_exchange() with a NULL cmd -- all that it does is crash in a NULL pointer dereference, since it does
|     qlt_send_term_exchange(vha, cmd, &cmd->atio, 1);
| and dereferencing &cmd->atio is a bad idea if cmd itself is NULL.
| If we really need to do this, we could take the values from the failed CTIO we're processing, but it's not clear if it's worth the replumbing to do that.
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* tcm_qla2xxx: Don't insert nacls without sessions into the btree (Roland Dreier, 2012-06-12)
| When we create an explicit node ACL in tcm_qla2xxx_make_nodeacl(), there is a call to tcm_qla2xxx_setup_nacl_from_rport(), which puts the node ACL into the lport_fcport_map even though there is no session yet for the initiator. Since the only time we remove entries from this map is when we free a session, this means that if we later delete this node ACL without the initiator ever creating a session, we'll leave the nacl pointer in the btree pointing at freed memory.
| This is especially bad if that initiator later does send us a command that would cause us to create a dynamic ACL and session: we'll find the stale freed nacl pointer in the btree and end up with use-after-free.
| We could add more code to clear the btree entry when deleting the explicit nacl, but the original insertion is pointless: without a session attached, we'll just have to update the entry when a session appears anyway. So we can just delete tcm_qla2xxx_setup_nacl_from_rport() and the code that calls it.
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Cc: Chad Dupuis <chad.dupuis@qlogic.com>
| Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
| Cc: Arun Easi <arun.easi@qlogic.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: Return error to initiator if SET TARGET PORT GROUPS emulation fails (Roland Dreier, 2012-06-12)
| The error paths in target_emulate_set_target_port_groups() are all essentially "rc = -EINVAL; goto out;" but the code at "out:" ignores rc and always returns success. This means that even if, e.g., explicit ALUA is turned off, the initiator will always see a good SCSI status for SET TARGET PORT GROUPS.
| Fix this by returning rc as is intended. It appears this bug was added by the following patch:
|     commit 05d1c7c0d0db4cc25548d9aadebb416888a82327
|     Author: Andy Grover <agrover@redhat.com>
|     Date: Wed Jul 20 19:13:28 2011 +0000
|     target: Make all control CDBs scatter-gather
| Signed-off-by: Roland Dreier <roland@purestorage.com>
| Cc: Andy Grover <agrover@redhat.com>
| Cc: <stable@vger.kernel.org>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* tcm_qla2xxx: Clear session s_id + loop_id earlier during shutdown (Nicholas Bellinger, 2012-06-12)
| This patch adds a new tcm_qla2xxx_clear_sess_lookup() call to clear session specific s_id + loop_id entries used for se_node_acl pointer lookup ahead of releasing se_session within the process context workqueue callback in tcm_qla2xxx_free_session().
| It makes the call in existing tcm_qla2xxx_clear_nacl_from_fcport_map() code invoked from qlt_unreg_sess() in interrupt context w/ hardware_lock held, ahead of the process context callback into qlt_free_session_done() -> tcm_qla2xxx_free_session().
| We are doing this to address a race between incoming ATIO or TMR packets using stale se_node_acl pointer once session shutdown has been invoked via qlt_unreg_sess() in qla_target.c LLD code, and when the entire tcm_qla2xxx endpoint has not been forced into shutdown w/
|     echo 0 > ../$QLA2XXX_PORT/enable
| Cc: Joern Engel <joern@logfs.org>
| Cc: Roland Dreier <roland@purestorage.com>
| Cc: Arun Easi <arun.easi@qlogic.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* tcm_qla2xxx: Convert to TFO->put_session() usage (Joern Engel, 2012-06-12)
| This patch converts tcm_qla2xxx code to use an internal kref_put() for se_session->sess_kref in order to ensure that qla_hw_data->hardware_lock can be held while calling qlt_unreg_sess() for the final put.
| Signed-off-by: Joern Engel <joern@logfs.org>
| Cc: Roland Dreier <roland@purestorage.com>
| Cc: Arun Easi <arun.easi@qlogic.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* target: Add TFO->put_session() caller for HW fabric session shutdown (Joern Engel, 2012-06-12)
| This patch adds an optional target_core_fabric_ops->put_session() caller within the existing target_put_session() code path.
| This is required by tcm_qla2xxx code in order to invoke its own fabric specific session shutdown handler using se_session->sess_kref.
| Signed-off-by: Joern Engel <joern@logfs.org>
| Cc: Roland Dreier <roland@purestorage.com>
| Cc: Arun Easi <arun.easi@qlogic.com>
| Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
* Linux 3.5-rc2 (Linus Torvalds, 2012-06-08)
|
* mm, oom: fix badness score underflow (David Rientjes, 2012-06-08)
| If the privileges given to root threads (3% of allowable memory) or a negative value of /proc/pid/oom_score_adj happen to exceed the amount of rss of a thread, its badness score underflows as a result of commit a7f638f999ff ("mm, oom: normalize oom scores to oom_score_adj scale only for userspace").
| Fix this by making the type signed and returning 1, meaning the thread is still eligible for kill, if the value is negative.
| Reported-by: Dave Jones <davej@redhat.com>
| Acked-by: Oleg Nesterov <oleg@redhat.com>
| Signed-off-by: David Rientjes <rientjes@google.com>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
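The shape of the fix can be illustrated with a deliberately simplified userspace sketch. `badness_score()` and its two parameters are hypothetical; the real scoring function takes more inputs, and this example only shows why the intermediate value has to be signed.

```c
/*
 * Simplified, hypothetical scoring: memory usage minus an adjustment
 * (root bonus or negative oom_score_adj), both expressed in pages.
 */
unsigned long badness_score(long rss_pages, long adjustment_pages)
{
	/* Signed intermediate, so a large adjustment goes negative instead of wrapping. */
	long points = rss_pages - adjustment_pages;

	/* A non-positive result still returns 1: the thread stays eligible for kill. */
	return points > 0 ? (unsigned long)points : 1;
}
```

With an unsigned intermediate, `10 - 30` would wrap around to a huge value and make a small process look like the best candidate to kill.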
* Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2012-06-08)
|\
| | Pull scheduler fixes from Ingo Molnar.
| |
| | * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
| |   sched: Fix the relax_domain_level boot parameter
| |   sched: Validate assumptions in sched_init_numa()
| |   sched: Always initialize cpu-power
| |   sched: Fix domain iteration
| |   sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
| |   sched/numa: Load balance between remote nodes
| |   sched/x86: Calculate booted cores after construction of sibling_mask
| * sched: Fix the relax_domain_level boot parameter (Dimitri Sivanich, 2012-06-06)
| | It does not get processed because sched_domain_level_max is 0 at the time that setup_relax_domain_level() is run.
| | Simply accept the value as it is, as we don't know the value of sched_domain_level_max until sched domain construction is completed.
| | Fix sched_relax_domain_level in cpuset. The build_sched_domain() routine calls the set_domain_attribute() routine prior to setting the sd->level, however, the set_domain_attribute() routine relies on the sd->level to decide whether idle load balancing will be off/on.
| | Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched: Validate assumptions in sched_init_numa() (Peter Zijlstra, 2012-06-06)
| | Add some code to validate assumptions we're making and output warnings if they do not hold. If this triggers we want to know about it.
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Cc: Alex Shi <lkml.alex@gmail.com>
| | Link: http://lkml.kernel.org/n/tip-6uc3wk5s9udxtdl9cnku0vtt@git.kernel.org
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched: Always initialize cpu-power (Peter Zijlstra, 2012-06-06)
| | Often when we run into misshapen topologies the balance iteration fails to update the cpu power properly and we'll end up in /0 traps.
| | Always initialize the cpu-power to a semi-sane value so that we can at least boot the machine, even if the load-balancer might not function correctly.
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/n/tip-3lbhyj25sr169ha7z3qht5na@git.kernel.org
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched: Fix domain iteration (Peter Zijlstra, 2012-06-06)
| | Weird topologies can lead to asymmetric domain setups. This needs further consideration since these setups are typically non-minimal too.
| | For now, make it work by adding an extra mask selecting which CPUs are allowed to iterate up.
| | The topology that triggered it is the one from David Rientjes:
| |     10 20 20 30
| |     20 10 20 20
| |     20 20 10 20
| |     30 20 20 10
| | resulting in boxes that wouldn't even boot.
| | Reported-by: David Rientjes <rientjes@google.com>
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/n/tip-3p86l9cuaqnxz7uxsojmz5rm@git.kernel.org
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched/rt: Fix lockdep annotation within find_lock_lowest_rq() (Peter Zijlstra, 2012-06-06)
| | Roland Dreier reported spurious, hard to trigger lockdep warnings within the scheduler - without any real lockup.
| | This bit gives us the right clue:
| | > [89945.640512] [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
| | > [89945.640568] [<ffffffff8104c546>] push_rt_task+0xc6/0x290
| | If you look at that code you'll find the double_lock_balance() in question is the one in find_lock_lowest_rq() [yay for inlining].
| | Now find_lock_lowest_rq() has a bug: it fails to use double_unlock_balance() in one exit path. If this results in a retry in push_rt_task() we'll call double_lock_balance() again, at which point we'll run into said lockdep confusion.
| | Reported-by: Roland Dreier <roland@kernel.org>
| | Acked-by: Steven Rostedt <rostedt@goodmis.org>
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/r/1337282386.4281.77.camel@twins
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched/numa: Load balance between remote nodes (Alex Shi, 2012-06-06)
| | Commit cb83b629b ("sched/numa: Rewrite the CONFIG_NUMA sched domain support") removed the NODE sched domain and started checking if the node distance in the SLIT table is farther than REMOTE_DISTANCE; if so, it will lose the load balance chance at exec/fork/wake_affine points.
| | But actually, even if the node distance is farther than REMOTE_DISTANCE, modern CPUs also have QPI-like connections, which ensure that memory access is not too slow between nodes. So the above change in behavior on NUMA machines causes a performance regression on various benchmarks: hackbench, tbench, netperf, oltp, etc.
| | This patch restores the old scheduler behavior on all my Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and thus fixes the performance regressions. (All of them have just 2 kinds of distance: 10 and 21.)
| | Signed-off-by: Alex Shi <alex.shi@intel.com>
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/r/1338965571-9812-1-git-send-email-alex.shi@intel.com
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched/x86: Calculate booted cores after construction of sibling_mask (Kamalesh Babulal, 2012-06-06)
| | Commit 316ad248307fb ("sched/x86: Rewrite set_cpu_sibling_map()") broke the booted_cores accounting.
| | The problem is that the booted_cores accounting needs all the sibling links set up. So restore the second loop and add a comment as to why it's needed.
| | On qemu booted with -smp sockets=1,cores=2,threads=2:
| | Before:
| |     $ grep cores /proc/cpuinfo
| |     cpu cores : 2
| |     cpu cores : 1
| |     cpu cores : 4
| |     cpu cores : 3
| | With the patch:
| |     $ grep cores /proc/cpuinfo
| |     cpu cores : 2
| |     cpu cores : 2
| |     cpu cores : 2
| |     cpu cores : 2
| | Reported-by: Prarit Bhargava <prarit@redhat.com>
| | Reported-by: Borislav Petkov <bp@amd64.org>
| | Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
| | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com
| | Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched/fair: fix lots of kernel-doc warnings (Randy Dunlap, 2012-06-08)
| | Fix lots of new kernel-doc warnings in kernel/sched/fair.c:
| |     Warning(kernel/sched/fair.c:3625): No description found for parameter 'env'
| |     Warning(kernel/sched/fair.c:3625): Excess function parameter 'sd' description in 'update_sg_lb_stats'
| |     Warning(kernel/sched/fair.c:3735): No description found for parameter 'env'
| |     Warning(kernel/sched/fair.c:3735): Excess function parameter 'sd' description in 'update_sd_pick_busiest'
| |     Warning(kernel/sched/fair.c:3735): Excess function parameter 'this_cpu' description in 'update_sd_pick_busiest'
| |     .. more warnings
| | Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
| | Cc: Ingo Molnar <mingo@redhat.com>
| | Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
| | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Revert "drm/i915/crt: Do not rely upon the HPD presence pin" (Linus Torvalds, 2012-06-08)
| | This reverts commit 9e612a008fa7fe493a473454def56aa321479495.
| | It incorrectly finds VGA connectors where none are attached, apparently not noticing that nothing replied to the EDID queries, and happily using the default EDID modes that have nothing to do with actual hardware.
| | That in turn then causes X to fall down to the lowest common denominator (usually the default 1024x768 mode that is in the default EDID and that pretty much anything supports).
| | I'd suggest that if not relying on the HPD pin, the code should at least check whether it gets valid EDID data back, rather than just assume there's something on the VGA connector.
| | Cc: Dave Airlie <airlied@linux.ie>
| | Cc: Chris Wilson <chris@chris-wilson.co.uk>
| | Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
| | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (Linus Torvalds, 2012-06-08)
|\ \
| | | Pull ext4 bug fixes from Theodore Ts'o:
| | | "This update contains two bug fixes, both destined for the stable tree. Perhaps the most important is one which fixes ext4 when used with file systems originally formatted for use with ext3, but then later converted to take advantage of ext4."
| | |
| | | * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
| | |   ext4: don't set i_flags in EXT4_IOC_SETFLAGS
| | |   ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
| * | ext4: don't set i_flags in EXT4_IOC_SETFLAGS (Tao Ma, 2012-06-07)
| | | Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to change the i_flags automatically but fails to remove the error setting of i_flags. So we still have the problem of trashing state flags.
| | | Fix this by removing the assignment.
| | | Signed-off-by: Tao Ma <boyu.mt@taobao.com>
| | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | | Cc: stable@kernel.org
| * | ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg (Theodore Ts'o, 2012-06-07)
| | | Ext3 filesystems that are converted to use as many ext4 file system features as possible will enable uninit_bg to speed up e2fsck times. These file systems will have a native ext3 layout of inode tables and block allocation bitmaps (as opposed to ext4's flex_bg layout). Unfortunately, in these cases, when first allocating a block in an uninitialized block group, ext4 would incorrectly calculate the number of free blocks in that block group, and then erroneously report that the file system was corrupt:
| | |     EXT4-fs error (device vdd): ext4_mb_generate_buddy:741: group 30, 32254 clusters in bitmap, 32258 in gd
| | | This problem can be reproduced via:
| | |     mke2fs -q -t ext4 -O ^flex_bg /dev/vdd 5g
| | |     mount -t ext4 /dev/vdd /mnt
| | |     fallocate -l 4600m /mnt/test
| | | The problem was caused by a bone headed mistake in the check to see if a particular metadata block was part of the block group.
| | | Many thanks to Kees Cook for finding and bisecting the buggy commit which introduced this bug (commit fd034a84e1, present since v3.2).
| | | Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
| | | Reported-by: Kees Cook <keescook@chromium.org>
| | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| | | Tested-by: Kees Cook <keescook@chromium.org>
| | | Cc: stable@kernel.org