| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
The function textsearch_prepare has a new flag to support case
insensitive searching.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
The help text should refer to nflog instead of ulog. Noticed by
Krzysztof Halasa <khc@pm.waw.pl>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
| |
Kernel functions are not for userspace.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
One still needs to remove checks in nf_hook_slow() and nf_sockopt_find()
to test this, though.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
ctnetlink does not need to allocate the conntrack entries with GFP_ATOMIC
as its code is executed in user context.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
Get rid of refrences to no longer existant Fast NAT.
IP_ROUTE_NAT support was removed in August of 2004, but references to Fast
NAT were left in a couple of config options.
Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
| |
Signed-off-by: Alexey Dobriyan <adobriyan@parallels.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
| |
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
| |
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
| |
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
The vlan_hwaccel_{rx,receive_skb} functions expect the full TCI field
for priority mappings, don't truncate the upper 4 bits.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
Commit dad9b335 (netdevice: Fix promiscuity and allmulti overflow) broke
dev_set_promiscuity() by returning on success without reprogramming the
device.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
| |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
The filter_cnt is supposed to count filter references to a class.
Since the qdisc can't be the target of a filter, it doesn't need
a filter_cnt. In fact the counter is never decreased since cls_api
considers a return value of zero a failure and doesn't unbind again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
Now that the qdisc isn't destroyed in hierarchical order anymore,
the only user of the child lists left is htb_parent_last_child().
This can be easily changed to use a counter of children to save
a few bytes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hash list removal currently happens twice (once in htb_delete, once
in htb_destroy_class), which makes it harder to use the dynamically
sized class hash without adding special cases for HTB. The reason is
that qdisc destruction destroys classes in hierarchical order, which
is not necessary if filters are destroyed in a separate iteration
during qdisc destruction.
Adjust qdisc destruction to follow the same scheme as other hierarchical
qdiscs by first performing a filter destruction pass, then destroying
all classes in hash order.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
| |
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently all qdiscs which allow to create classes uses a fixed sized hash
table with size 16 to hash the classes. This causes a large bottleneck
when using thousands of classes and unbound filters.
Add helpers for dynamically sized class hashes to fix this. The following
patches will convert the qdiscs to use them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
Documentation/feature-removal-schedule.txt
drivers/net/wan/hdlc_fr.c
drivers/net/wireless/iwlwifi/iwl-4965.c
drivers/net/wireless/iwlwifi/iwl3945-base.c
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The restart() function is called when the link state changes and resets
multicast and promiscuous settings. This patch restores those settings at the
end of restart().
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Short packets has to be discarded by the driver. So this patch addresses the
issue of discarding the short packets of size lesser then ethernet header
size.
Signed-off-by: Sathya Narayanan <sathyan@teamf1.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The descriptor pointers were not initialized to NIL values, so it was
poiniting to some random addresses which was completely invalid. This
fix takes care of initializing the descriptor to NIL values and clearing
the valid descriptors on clean ring operation.
Signed-off-by: Sathya Narayanan <sathyan@teamf1.com>
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
iph->tot_len is stored in network byte order, so access it using
ntohs(). This doesn't have any real world impact on pasemi_mac, since
the device only exists as part of a big-endian system-on-chip, but
fixing this gets rid of a sparse warning and avoids having a bad example
in the tree.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
iph->tot_len is stored in network byte order, so access it using
ntohs(). This doesn't have any real world impact on ehea, since ehea
only exists for big-endian platfroms (at the moment at least) but fixing
this gets rid of a sparse warning and avoids having a bad example in the
tree.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When ehea_stop is called the function
cancel_work_sync(&port->reset_task) is used to ensure
that the reset task is not running anymore. We need an
additional flag to ensure that it can not be scheduled
after this call again for a certain time.
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Required to allow distros to easily detect when ehea
module needs to be loaded
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A mutex has to be replaced by spinlocks as it can be called from
a context which does not allow sleeping.
The kzalloc flag GFP_KERNEL has to be replaced by GFP_ATOMIC
for the same reason.
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After enabling CONFIG_LOCKDEP and CONFIG_PROVE_LOCKING I get the
following warning when ethtool -s is first called on one of the
forcedeth ports:
=================================
[ INFO: inconsistent lock state ]
2.6.26-rc4 #28
---------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ethtool/1985 [HC0[0]:SC0[1]:HE1:SE0] takes:
(&np->lock){++..}, at: [<ffffffffa000c5fd>] nv_set_settings+0xc8/0x3de [forcedeth]
{in-hardirq-W} state was registered at:
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 3606
hardirqs last enabled at (3605): [<ffffffff8068106f>] _spin_unlock_irqrestore+0x3f/0x68
hardirqs last disabled at (3604): [<ffffffff80680d38>] _spin_lock_irqsave+0x13/0x46
softirqs last enabled at (3534): [<ffffffff80246ba5>] __do_softirq+0xbc/0xc5
softirqs last disabled at (3606): [<ffffffff80680b33>] _spin_lock_bh+0x11/0x41
other info that might help us debug this:
2 locks held by ethtool/1985:
#0: (rtnl_mutex){--..}, at: [<ffffffff80596072>] rtnl_lock+0x12/0x14
#1: (_xmit_ETHER){-+..}, at: [<ffffffffa000c5e8>] nv_set_settings+0xb3/0x3de [forcedeth]
stack backtrace:
Pid: 1985, comm: ethtool Not tainted 2.6.26-rc4 #28
Call Trace:
[<ffffffff8025f190>] print_usage_bug+0x162/0x173
[<ffffffff8025fa8b>] mark_lock+0x231/0x41f
[<ffffffff802607cf>] __lock_acquire+0x4e7/0xcac
[<ffffffff8025fe64>] ? trace_hardirqs_on+0xf1/0x115
[<ffffffff80272c3a>] ? disable_irq_nosync+0x6f/0x7b
[<ffffffff80261375>] lock_acquire+0x55/0x6e
[<ffffffffa000c5fd>] ? :forcedeth:nv_set_settings+0xc8/0x3de
[<ffffffff80680b15>] _spin_lock+0x2f/0x3c
[<ffffffffa000c5fd>] :forcedeth:nv_set_settings+0xc8/0x3de
[<ffffffff8058f8bb>] dev_ethtool+0x186/0xea3
[<ffffffff8067f446>] ? mutex_lock_nested+0x243/0x275
[<ffffffff8025df2b>] ? debug_mutex_free_waiter+0x46/0x4a
[<ffffffff8067f469>] ? mutex_lock_nested+0x266/0x275
[<ffffffff8058e1ce>] dev_ioctl+0x4eb/0x600
[<ffffffff8068106f>] ? _spin_unlock_irqrestore+0x3f/0x68
[<ffffffff80580f91>] sock_ioctl+0x1f5/0x202
[<ffffffff802a322e>] vfs_ioctl+0x2a/0x77
[<ffffffff802a34d6>] do_vfs_ioctl+0x25b/0x270
[<ffffffff806807b6>] ? trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff802a352d>] sys_ioctl+0x42/0x65
[<ffffffff8021fffb>] system_call_after_swapgs+0x7b/0x80
This is caused by the following snippet in nv_set_settings:
netif_carrier_off(dev);
if (netif_running(dev)) {
nv_disable_irq(dev);
netif_tx_lock_bh(dev);
spin_lock(&np->lock);
/* stop engines */
nv_stop_rxtx(dev);
spin_unlock(&np->lock);
netif_tx_unlock_bh(dev);
}
Because of nv_disable_irq this is probably not really a problem
though (I guess) and replacing the spin_lock with spin_lock_irqsave
could keep interrupts disabled for a longer period of time because
of delays in nv_stop_rx and nv_stop_tx.
Signed-off-by: Tobias Diedrich <ranma+kernel@tdiedrich.de>
Cc: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 4c13eb6657fe9ef7b4dc8f1a405c902e9e5234e0 ([ETH]: Make
eth_type_trans set skb->dev like the other *_type_trans) removed
skb->dev assignment from hdlc_fr.c:fr_rx(). Unfortunately it was also
needed for cases other than eth_type_trans().
Adding it back.
It's quite serious and may be a security risk as it causes a wrong
input interface indication (the physical hdlcX instead of logical
pvcX). Probably -stable class fix.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Unregistering a bridge device may cause virtual devices stacked on the
bridge, like vlan or macvlan devices, to be unregistered as well.
br_cleanup_bridges() uses for_each_netdev_safe() to iterate over all
devices during cleanup. This is not enough however, if one of the
additionally unregistered devices is next in the list to the bridge
device, it will get freed as well and the iteration continues on
the freed element.
Restart iteration after each bridge device removal from the beginning to
fix this, similar to what rtnl_link_unregister() does.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| | |
<used> should be of type int (not size_t) since recv_actor can return
negative values and it is also used in a < 0 comparison.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
alpha:
net/ipv4/tcp.c: In function 'tcp_calc_md5_hash':
net/ipv4/tcp.c:2479: error: implicit declaration of function 'sg_init_table' net/ipv4/tcp.c:2482: error: implicit declaration of function 'sg_set_buf'
net/ipv4/tcp.c:2507: error: implicit declaration of function 'sg_mark_end'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\
| | |
| | |
| | | |
master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Handle .reset_resume() so that libertas can survive suspend/resume without
reloading the firmware.
Signed-off-by: Andrey Yurovsky <andrey@cozybit.com>
Acked-by: Deepak Saxena <dsaxena@laptop.org>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes the problem to keep mac80211 resubmitting SKBs
when Tx request cannot be met in monitor mode.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes the rates reported in monitor mode operation
(Wireshark) for iwlwifi.
Previously, packets with rates of 6M..24M would be reported
incorrectly and packets with rates of 36M..54M would not passed
up the stack.
Signed-off-by: Rick Farrington <rickdic@hotmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: acpiphp: cleanup notify handler on all root bridges
PCI: Limit VPD read/write lengths for Broadcom 5706, 5708, 5709 rev.
PCI: Restrict VPD read permission to root
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
During the development of the physical PCI slot patch series, Gary Hade
kept on reporting strange oopses due to interactions between pci_slot
and acpiphp.
http://lkml.org/lkml/2007/11/28/319
find_root_bridges() unconditionally installs
handle_hotplug_event_bridge() as an ACPI_SYSTEM_NOTIFY handler for all
root bridges.
However, during module cleanup, remove_bridge() will only remove the
notify handler iff the root bridge had a hot-pluggable slot directly
underneath. That is:
root bridge -> hotplug slot
But, if the topology looks like either of the following:
root bridge -> non-hotplug slot
root bridge -> p2p bridge -> hotplug slot
Then we currently do not remove the notify handler from that root
bridge.
This can cause a kernel oops if we modprobe acpiphp later and it gets
loaded somewhere else in memory. If the root bridge then receives a
hotplug event, it will then attempt to call a stale, non-existent notify
handler and we blow up.
Much thanks goes to Gary Hade for his persistent debugging efforts.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For Broadcom 5706, 5708, 5709 rev. A nics, any read beyond the
VPD end tag will hang the device. This problem was initially
observed when a vpd entry was created in sysfs
('/sys/bus/pci/devices/<id>/vpd'). A read to this sysfs entry
will dump 32k of data. Reading a full 32k will cause an access
beyond the VPD end tag causing the device to hang. Once the device
is hung, the bnx2 driver will not be able to reset the device.
We believe that it is legal to read beyond the end tag and
therefore the solution is to limit the read/write length.
A majority of this patch is from Matthew Wilcox who gave code for
reworking the PCI vpd size information. A PCI quirk added for the
Broadcom NIC's to limit the read/write's.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Some PCI devices will lock up if we attempt to read from VPD addresses
beyond some device-dependent limit. Until we can identify these
devices and adjust the file size accordingly, only let root read VPD
through sysfs to prevent a DoS by normal users.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* 'i2c-fix' of git://aeryn.fluff.org.uk/bjdooks/linux:
I2C: S3C2410: Add MODULE_ALIAS() for s3c2440 device.
I2C: S3C2410: Fixup error codes returned rom a transfer.
I2C: S3C2410: Check ACK on byte transmission
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a MODULE_ALIAS() statement for the i2c-s3c2410 controller
to ensure that it can be autoloaded on the S3C2440 systems that
we support.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The driver should be returning -ENXIO for transfers that do not
pass the initial address byte stage.
Note, also small tidyups to the driver comments in the area.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We should check for the reception of an ACK after transmitting each
data byte. The address send has been correctly checking this, but the
data write byte state should have also been checking for these failures.
As part of the same fix, we remove the ACK checking from the receive
path where it should not have been checking for an ACK which our hardware
was sending.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
* 'for-2.6.26' of git://git.kernel.dk/linux-2.6-block:
Properly notify block layer of sync writes
block: Fix the starving writes bug in the anticipatory IO scheduler
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.
This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:
INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald D ffff810010820978 6712 20558 2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12
Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
AS scheduler alternates between issuing read and write batches. It does
the batch switch only after all requests from the previous batch are
completed.
When switching to a write batch, if there is an on-going read request,
it waits for its completion and indicates its intention of switching by
setting ad->changed_batch and the new direction but does not update the
batch_expire_time for the new write batch which it does in the case of
no previous pending requests.
On completion of the read request, it sees that we were waiting for the
switch and schedules work for kblockd right away and resets the
ad->changed_data flag.
Now when kblockd enters dispatch_request where it is expected to pick
up a write request, it in turn ends the write batch because the
batch_expire_timer was not updated and shows the expire timestamp for
the previous batch.
This results in the write starvation for all the cases where there is
the intention for switching to a write batch, but there is a previous
in-flight read request and the batch gets reverted to a read_batch
right away.
This also holds true in the reverse case (switching from a write batch
to a read batch with an in-flight write request).
I've checked that this bug exists on 2.6.11, 2.6.18, 2.6.24 and
linux-2.6-block git HEAD. I've tested the fix on x86 platforms with
SCSI drives where the driver asks for the next request while a current
request is in-flight.
This patch is based off linux-2.6-block git HEAD.
Bug reproduction:
A simple scenario which reproduces this bug is:
- dd if=/dev/hda3 of=/dev/null &
- lilo
The lilo takes forever to complete.
This can also be reproduced fairly easily with the earlier dd and
another test
program doing msync().
The example test program below should print out a message after every
iteration
but it simply hangs forever. With this bugfix it makes forward progress.
====
Example test program using msync() (thanks to suleiman AT google DOT
com)
inline uint64_t
rdtsc(void)
{
int64_t tsc;
__asm __volatile("rdtsc" : "=A" (tsc));
return (tsc);
}
int
main(int argc, char **argv)
{
struct stat st;
uint64_t e, s, t;
char *p, q;
long i;
int fd;
if (argc < 2) {
printf("Usage: %s <file>\n", argv[0]);
return (1);
}
if ((fd = open(argv[1], O_RDWR | O_NOATIME)) < 0)
err(1, "open");
if (fstat(fd, &st) < 0)
err(1, "fstat");
p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
t = 0;
for (i = 0; i < 1000; i++) {
*p = 0;
msync(p, 4096, MS_SYNC);
s = rdtsc();
*p = 0;
__asm __volatile(""::: "memory");
e = rdtsc();
if (argc > 2)
printf("%d: %lld cycles %jd %jd\n",
i, e - s, (intmax_t)s, (intmax_t)e);
t += e - s;
}
printf("average time: %lld cycles\n", t / 1000);
return (0);
}
Cc: <stable@kernel.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] export account_system_vtime
[IA64] Bugfix for system with 32 cpus
|