aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* md/raid5: change raid5_compute_sector and stripe_to_pdidx to take a ↵NeilBrown2009-03-30
| | | | | | | | | | 'previous' argument This similar to the recent change to get_active_stripe. There is no functional change, just come rearrangement to make future patches cleaner. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid5: simplify interface for init_stripe and get_active_stripeNeilBrown2009-03-30
| | | | | | | | | | | | | | Rather than passing 'pd_idx' and 'disks' to these functions, just pass 'previous' which tells whether to use the 'previous' or 'current' geometry during a reshape, and let init_stripe calculate disks and pd_idx and anything else it might need. This is not a substantial simplification and even adds a division. However we will shortly be adding more complexity to init_stripe to handle more interesting 'reshape' activities, and without this change, the interface to these functions would get very complex. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Represent raid device size in sectors.Andre Noll2009-03-30
| | | | | | | | | | | | This patch renames the "size" field of struct mdk_rdev_s to "sectors" and changes this field to store sectors instead of blocks. All users of this field, linear.c, raid0.c and md.c, are fixed up accordingly which gets rid of many multiplications and divisions. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Make mddev->size sector-based.Andre Noll2009-03-30
| | | | | | | | | | | | | | | | | This patch renames the "size" field of struct mddev_s to "dev_sectors" and stores the number of 512-byte sectors instead of the number of 1K-blocks in it. All users of that field, including raid levels 1,4-6,10, are adjusted accordingly. This simplifies the code a bit because it allows to get rid of a couple of divisions/multiplications by two. In order to make checkpatch happy, some minor coding style issues have also been addressed. In particular, size_store() now uses strict_strtoull() instead of simple_strtoull(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: be more consistent about setting WriteMostly flag when adding a drive to ↵NeilBrown2009-03-30
| | | | | | | | | | | | | | | | | | | | | | an array When a drive is added to an array using ADD_NEW_DISK, there are two places we can get certain flags from: the metadata on the disk or the flags passed through the IOCTL. For the WriteMostly flag (aka MD_DISK_WRITEMOSTLY) we take the value from either of those sources depending on if it is set (i.e. we effectively 'or' the two sources together). This makes it awkward to clear, and is at best inconsistent. As documented code (in mdadm) requires that setting MD_DISK_WRITEMOSTLY in the ioctl will be effective, we resolve the inconsistency by always using the value for this flag from the ioctl, and ignoring the value on disk. Signed-off-by: NeilBrown <neilb@suse.de>
* md: occasionally checkpoint drive recovery to reduce duplicate effort after ↵NeilBrown2009-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a crash Version 1.x metadata has the ability to record the status of a partially completed drive recovery. However we only update that record on a clean shutdown. It would be nice to update it on unclean shutdowns too, particularly when using a bitmap that removes much to the 'sync' effort after an unclean shutdown. One complication with checkpointing recovery is that we only know where we are up to in terms of IO requests started, not which ones have completed. And we need to know what has completed to record how much is recovered. So occasionally pause the recovery until all submitted requests are completed, then update the record of where we are up to. When we have a bitmap, we already do that pause occasionally to keep the bitmap up-to-date. So enhance that code to record the recovery offset and schedule a superblock update. And when there is no bitmap, just pause 16 times during the resync to do a checkpoint. '16' is a fairly arbitrary number. But we don't really have any good way to judge how often is acceptable, and it seems like a reasonable number for now. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move md_k.h from include/linux/raid/ to drivers/md/NeilBrown2009-03-30
| | | | | | It really is nicer to keep related code together.. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move lots of #include lines out of .h files and into .cNeilBrown2009-03-30
| | | | | | | | | | This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move most content from md.h to md_k.hNeilBrown2009-03-30
| | | | | | | | | | | | The extern function definitions are kernel-internal definitions, so they belong in md_k.h The MD_*_VERSION values could reasonably go in a number of places, but md_u.h seems most reasonable. This leaves almost nothing in md.h. It will go soon. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move LEVEL_* definition from md_k.h to md_u.hNeilBrown2009-03-30
| | | | | | | | | | | .. as they are part of the user-space interface. Also move MdpMinorShift into there so we can remove duplication. Lastly move mdp_major in. It is less obviously part of the user-space interface, but do_mounts_md.c uses it, and it is acting a bit like user-space. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move headers out of include/linux/raid/Christoph Hellwig2009-03-30
| | | | | | | | | | Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>
* cleanup drivers/md/MakefileChristoph Hellwig2009-03-30
| | | | | | | | | | Use the -y variables instead of the old -objs so we can easily add conditional objects to the modules. Also always use += to add subobjects to avoid problems when placing additional objects in some place in the file. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>
* md: stop defining MAJOR_NRChristoph Hellwig2009-03-30
| | | | | | | | MAJOR_NR was only required for magic in linux/blk.h in 2.4 or earlier kernels, so no need to keep it around. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>
* MD data integrity supportMartin K. Petersen2009-03-30
| | | | | | | | | | md: Add support for data integrity to MD If all subdevices support the same protection format the MD device is flagged as integrity capable. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: write bitmap information to devices that are undergoing recovery.NeilBrown2009-03-30
| | | | | | | | | | | | | | | When we add some spares to an array and start recovery, and we have a bitmap which is stored 'internally' on all devices, we call bitmap_write_all to make sure the bitmap is correct on the new device(s). However that doesn't work as write_sb_page only writes to 'In_sync' devices, and devices undergoing recovery are not 'In_sync' until recovery finishes. So extend write_sb_page (actually next_active_rdev) to include devices that are under recovery. Signed-off-by: NeilBrown <neilb@suse.de>
* md: never clear bit from the write-intent bitmap when the array is degraded.NeilBrown2009-03-30
| | | | | | | | | | | | | | | | | | | | | It is safe to clear a bit from the write-intent bitmap for a raid1 if we know the data has been written to all devices, which is what the current test does. But it is not always safe to update the 'events_cleared' counter in that case. This is because one request could complete successfully after some other request has partially failed. So simply disable the clearing and updating of events_cleared whenever the array is degraded. This might end up not clearing some bits that could safely be cleared, but it is safest approach. Note that the bug fixed here did not risk corrupting data by letting the array get out-of-sync. Rather it meant that when a device is removed and re-added to the array, it might incorrectly require a full recovery rather than just recovering based on the bitmap. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Allow write-intent bitmaps to have chunksize < PAGE_SIZENeilBrown2009-03-30
| | | | | | | | | | | | | | | | | | | | | | | | md currently insists that the chunk size used for write-intent bitmaps (the amount of data that corresponds to one chunk) be at least one page. The reason for this restriction is lost in the mists of time, but a review of the code (and a vague memory) suggests that the only problem would be related to resync. Resync tries very hard to work in multiples of a page, but also needs to sync with units of a bitmap_chunk too. This connection comes out in the bitmap_start_sync call. So change bitmap_start_sync to always work in multiples of a page. If the bitmap chunk size is less that one page, we flag multiple chunks as 'syncing' and generally make them all appear to the resync routines like one chunk. All other code either already works with data ranges that could span multiple chunks, or explicitly only cares about a single chunk. Signed-off-by: Neil Brown <neilb@suse.de>
* md: Fix is_mddev_idle test (again).NeilBrown2009-03-30
| | | | | | | | | | | | | | | | | | | | There are two problems with is_mddev_idle. 1/ sync_io is 'atomic_t' and hence 'int'. curr_events and all the rest are 'long'. So if sync_io were to wrap on a 64bit host, the value of curr_events would go very negative suddenly, and take a very long time to return to positive. So do all calculations as 'int'. That gives us plenty of precision for what we need. 2/ To initialise rdev->last_events we simply call is_mddev_idle, on the assumption that it will make sure that last_events is in a suitable range. It used to do this, but now it does not. So now we need to be more explicit about initialisation. Signed-off-by: NeilBrown <neilb@suse.de>
* Merge branch 'fixes' of ↵Linus Torvalds2009-03-09
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Add p4-clockmod sysfs-ui removal to feature-removal schedule. Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."
| * [CPUFREQ] Add p4-clockmod sysfs-ui removal to feature-removal schedule.Dave Jones2009-03-09
| | | | | | | | | | Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com>
| * Revert "[CPUFREQ] Disable sysfs ui for p4-clockmod."Dave Jones2009-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit e088e4c9cdb618675874becb91b2fd581ee707e6. Removing the sysfs interface for p4-clockmod was flagged as a regression in bug 12826. Course of action: - Find out the remaining causes of overheating, and fix them if possible. ACPI should be doing the right thing automatically. If it isn't, we need to fix that. - mark p4-clockmod ui as deprecated - try again with the removal in six months. It's not really feasible to printk about the deprecation, because it needs to happen at all the sysfs entry points, which means adding a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch. Signed-off-by: Dave Jones <davej@redhat.com>
* | copy_process: fix CLONE_PARENT && parent_exec_id interactionOleg Nesterov2009-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CLONE_PARENT can fool the ->self_exec_id/parent_exec_id logic. If we re-use the old parent, we must also re-use ->parent_exec_id to make sure exit_notify() sees the right ->xxx_exec_id's when the CLONE_PARENT'ed task exits. Also, move down the "p->parent_exec_id = p->self_exec_id" thing, to place two different cases together. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-03-09
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits) p54: fix race condition in memory management cfg80211: test before subtraction on unsigned iwlwifi: fix error flow in iwl*_pci_probe rt2x00 : more devices to rt73usb.c rt2x00 : more devices to rt2500usb.c bonding: Fix device passed into ->ndo_neigh_setup(). vlan: Fix vlan-in-vlan crashes. net: Fix missing dev->neigh_setup in register_netdevice(). tmspci: fix request_irq race pkt_sched: act_police: Fix a rate estimator test. tg3: Fix 5906 link problems SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 fails IPv6: add "disable" module parameter support to ipv6.ko sungem: another error printed one too early aoe: error printed 1 too early net pcmcia: worklimit reaches -1 net: more timeouts that reach -1 net: fix tokenring license dm9601: new vendor/product IDs netlink: invert error code in netlink_set_err() ...
| * | p54: fix race condition in memory managementChristian Lamparter2009-03-06
| | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a number of race conditions in the driver. Up until now, "entry" pointer was initialized before acquiring the right lock. Signed-off-by: Christian Lamparter <chunkeey@web.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | cfg80211: test before subtraction on unsignedRoel Kluin2009-03-06
| | | | | | | | | | | | | | | | | | | | | freq_diff is unsigned, so test before subtraction Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | iwlwifi: fix error flow in iwl*_pci_probeReinette Chatre2009-03-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | Both the agn and 3945 drivers has some problems with dealing with errors in their probe functions. Ensure that a goto will undo only things that was done before the goto was called. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | rt2x00 : more devices to rt73usb.cXose Vazquez Perez2009-03-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | add more usb_dev to rt73usb.c . IDs 'stolen' from the windows inf file(10/21/2008, 1.03.02.0000) plus some from the Ralink linux driver(2009_0206_RT73_Linux_STA_Drv1.1.0.2.tar.bz2) Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | rt2x00 : more devices to rt2500usb.cXose Vazquez Perez2009-03-05
| | | | | | | | | | | | | | | | | | | | | | | | add more usb_dev to rt2500usb.c . IDs 'stolen' from the windows inf file(02/12/2009, 2.01.01.0015). Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | bonding: Fix device passed into ->ndo_neigh_setup().Patrick McHardy2009-03-05
| | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller2009-03-05
| |\ \
| * | | vlan: Fix vlan-in-vlan crashes.David S. Miller2009-03-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As analyzed by Patrick McHardy, vlan needs to reset it's netdev_ops pointer in it's ->init() function but this leaves the compat method pointers stale. Add a netdev_resync_ops() and call it from the vlan code. Any other driver which changes ->netdev_ops after register_netdevice() will need to call this new function after doing so too. With help from Patrick McHardy. Tested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Fix missing dev->neigh_setup in register_netdevice().David S. Miller2009-03-05
| | | | | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tmspci: fix request_irq raceMeelis Roos2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, tmspci tokenring driver crashes on device initialization because it requests its irq before initializing corresponding data structures. Fix this by moving request_irq call to a safer place. Signed-off-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | pkt_sched: act_police: Fix a rate estimator test.Jarek Poplawski2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A commit c1b56878fb68e9c14070939ea4537ad4db79ffae "tc: policing requires a rate estimator" introduced a test which invalidates previously working configs, based on examples from iproute2: doc/actions/actions-general. This is too rigorous: a rate estimator is needed only when police's "avrate" option is used. Reported-by: Joao Correia <joaomiguelcorreia@gmail.com> Diagnosed-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tg3: Fix 5906 link problemsMatt Carlson2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6833c043f9fc03696fde623914c4a0277df2a0bc introduced the phy auto-powerdown capability. While the APD feature only works for 5761 and 5784 asic revisions, the (harmless portion of the) code was applied to all 5705 and newer devices. However, the 5906 phy departs from the usual design. This commit was interfering with the 5906's ability to negotiate link against some switches. This patch corrects the problem. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 failsBrian Haley2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change sctp_ctl_sock_init() to try IPv4 if IPv6 socket registration fails. Required if the IPv6 module is loaded with "disable=1", else SCTP will fail to load. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | IPv6: add "disable" module parameter support to ipv6.koBrian Haley2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add "disable" module parameter support to ipv6.ko by specifying "disable=1" on module load. We just do the minimum of initializing inetsw6[] so calls from other modules to inet6_register_protosw() won't OOPs, then bail out. No IPv6 addresses or sockets can be created as a result, and a reboot is required to enable IPv6. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | sungem: another error printed one too earlyRoel Kluin2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Another error was printed one too early. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | aoe: error printed 1 too earlyRoel Kluin2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with while (i-- > 0); i reaches -1 after the loop, so the test below is printed one too early: 0 still means success. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net pcmcia: worklimit reaches -1Roel Kluin2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with while (--worklimit >= 0); worklimit reaches -1 after the loop. In 3c589_cs.c this caused a warning not to be printed. In 3c574_cs.c contrastingly, el3_rx() treats worklimit differently: static int el3_rx(struct net_device *dev, int worklimit) { while (--worklimit >= 0) { ... } return worklimit; } el3_rx() is only called by function el3_interrupt(): twice: static irqreturn_t el3_interrupt(int irq, void *dev_id) { int work_budget = max_interrupt_work; while(...) { if (...) work_budget = el3_rx(dev, work_budget); if (...) work_budget = el3_rx(dev, work_budget); if (--work_budget < 0) { ... break; } } } The error path can occur 2 too early. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: more timeouts that reach -1Roel Kluin2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with while (timeout-- > 0); timeout reaches -1 after the loop, so the tests below are off by one. also don't do an '< 0' test on an unsigned. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: fix tokenring licenseMeelis Roos2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, modular tokenring ("tr") lacks a license and fails to load: tr: module license 'unspecified' taints kernel. tr: Unknown symbol proc_net_fops_create Beacuse of this, no tokenring driver can load if it depends on modular tr. Fix this by adding GPL module license as it is in the kernel. With this fix, tr module loads fine and tms380 driver also loads. Well, it does'nt work but that's a different bug. Signed-off-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dm9601: new vendor/product IDsPeter Korsgaard2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add vendor/product IDs for new no name dm9601 compatible usb ethernet adaptors. Reported-by: Eric Lauriault <eric@linux.ca> Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netlink: invert error code in netlink_set_err()Pablo Neira Ayuso2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callers of netlink_set_err() currently pass a negative value as parameter for the error code. However, sk->sk_err wants a positive error value. Without this patch, skb_recv_datagram() called by netlink_recvmsg() may return a positive value to report an error. Another choice to fix this is to change callers to pass a positive error value, but this seems a bit inconsistent and error prone to me. Indeed, the callers of netlink_set_err() assumed that the (usual) negative value for error codes was fine before this patch :). This patch also includes some documentation in docbook format for netlink_set_err() to avoid this sort of confusion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netns: Remove net_aliveEric W. Biederman2009-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that net_alive is unnecessary, and the original problem that led to it being added was simply that the icmp code thought it was a network device and wound up being unable to handle packets while there were still packets in the network namespace. Now that icmp and tcp have been fixed to properly register themselves this problem is no longer present and we have a stronger guarantee that packets will not arrive in a network namespace then that provided by net_alive in netif_receive_skb. So remove net_alive allowing packet reception run a little faster. Additionally document the strong reason why network namespace cleanup is safe so that if something happens again someone else will have a chance of figuring it out. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp: Like icmp use register_pernet_subsysEric W. Biederman2009-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To remove the possibility of packets flying around when network devices are being cleaned up use reisger_pernet_subsys instead of register_pernet_device. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netns: Fix icmp shutdown.Eric W. Biederman2009-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently I had a kernel panic in icmp_send during a network namespace cleanup. There were packets in the arp queue that failed to be sent and we attempted to generate an ICMP host unreachable message, but failed because icmp_sk_exit had already been called. The network devices are removed from a network namespace and their arp queues are flushed before we do attempt to shutdown subsystems so this error should have been impossible. It turns out icmp_init is using register_pernet_device instead of register_pernet_subsys. Which resulted in icmp being shut down while we still had the possibility of packets in flight, making a nasty NULL pointer deference in interrupt context possible. Changing this to register_pernet_subsys fixes the problem in my testing. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netns: fix addrconf_ifdown kernel panicDaniel Lezcano2009-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a network namespace is destroyed the network interfaces are all unregistered, making addrconf_ifdown called by the netdevice notifier. In the other hand, the addrconf exit method does a loop on the network devices and does addrconf_ifdown on each of them. But the ordering of the netns subsystem is not right because it uses the register_pernet_device instead of register_pernet_subsys. If we handle the loopback as any network device, we can safely use register_pernet_subsys. But if we use register_pernet_subsys, the addrconf exit method will do exactly what was already done with the unregistering of the network devices. So in definitive, this code is pointless. I removed the netns addrconf exit method and moved the code to the addrconf cleanup function. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: Fix sysctl unregistration deadlockStephen Hemminger2009-03-03
| | | | | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Avoid race between network down and sysfsStephen Hemminger2009-03-03
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>