aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux
Commit message (Collapse)AuthorAge
* net: Make devnet_rename_seq staticThomas Gleixner2013-07-24
| | | | | | | | No users outside net/core/dev.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: TCP_NOTSENT_LOWAT socket optionEric Dumazet2013-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Idea of this patch is to add optional limitation of number of unsent bytes in TCP sockets, to reduce usage of kernel memory. TCP receiver might announce a big window, and TCP sender autotuning might allow a large amount of bytes in write queue, but this has little performance impact if a large part of this buffering is wasted : Write queue needs to be large only to deal with large BDP, not necessarily to cope with scheduling delays (incoming ACKS make room for the application to queue more bytes) For most workloads, using a value of 128 KB or less is OK to give applications enough time to react to POLLOUT events in time (or being awaken in a blocking sendmsg()) This patch adds two ways to set the limit : 1) Per socket option TCP_NOTSENT_LOWAT 2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets not using TCP_NOTSENT_LOWAT socket option (or setting a zero value) Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect. This changes poll()/select()/epoll() to report POLLOUT only if number of unsent bytes is below tp->nosent_lowat Note this might increase number of sendmsg()/sendfile() calls when using non blocking sockets, and increase number of context switches for blocking sockets. Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is defined as : Specify the minimum number of bytes in the buffer until the socket layer will pass the data to the protocol) Tested: netperf sessions, and watching /proc/net/protocols "memory" column for TCP With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458) lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y Using 128KB has no bad effect on the throughput or cpu usage of a single flow, although there is an increase of context switches. A bonus is that we hold socket lock for a shorter amount of time and should improve latencies of ACK processing. lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 412,514 context-switches 200.034645535 seconds time elapsed lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 2,675,818 context-switches 200.029651391 seconds time elapsed Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-By: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* team: add support for sending multicast rejoinsJiri Pirko2013-07-23
| | | | | | | | | Similar to what is implemented in bonding. User is able to ask team driver to send IGMP rejoins in case port is enabled or disabled. Using previously introduced netdev notifier. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert resend IGMP to notifier eventJiri Pirko2013-07-23
| | | | | | | | | | | | Until now, bond_resend_igmp_join_requests() looks for vlans attached to bonding device, bridge where bonding act as port manually. It does not care of other scenarios, like stacked bonds or team device above. Make this more generic and use netdev notifier to propagate the event to upper devices and to actually call ip_mc_rejoin_groups(). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* team: add peer notificationJiri Pirko2013-07-23
| | | | | | | | When port is enabled or disabled, allow to notify peers by unsolicitated NAs or gratuitous ARPs. Disabled by default. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-07-18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "A couple interesting SKB fragment handling fixes, plus the usual small bits here and there: 1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from Tim Gardner. 2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC address printing helper. 3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the device can't do checksumming offloads then it shouldn't say it can do SG either. From Haiyang Zhang. 4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens. 5) Don't leak DMA mappings on mapping failures, from Neil Horman. 6) We need to reset the transport header of SKBs in ipv4 before we attempt to perform early socket demux, just like ipv6 does. From Eric Dumazet. 7) Add missing locking on vxlan device removal, from Stephen Hemminger. 8) xen-netfront has to make two passes over an SKB to prepare it for transfer. One pass calculates the number of slots needed, the second massages the SKB and fills the slots. Unfortunately, the first pass doesn't calculate the number of slots properly so we can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't work out so well. Fix from Jan Beulich with help and discussion with several others. 9) Fix a similar problem in tun and macvtap, which have to split up scatter-gather elements at PAGE_SIZE boundaries. Don't do zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from Jason Wang. 10) On receive, once we've decoded the VLAN state completely, clear skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger the VLAN code again, corrupting the packet. Fix from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: vlan: fix a race in egress prio management vlan: mask vlan prio bits macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS pkt_sched: sch_qfq: remove a source of high packet delay/jitter xen-netfront: pull on receive skb may need to happen earlier vxlan: add necessary locking on device removal hyperv: Fix the NETIF_F_SG flag setting in netvsc net: Fix sysfs_format_mac() code duplication. be2net: Fix to avoid hardware workaround when not needed macvtap: do not assume 802.1Q when send vlan packets macvtap: fix the missing ret value of TUNSETQUEUE ipv4: set transport header earlier mlx5 core: Fix __udivdi3 when compiling for 32 bit arches bgmac: add dependency to phylib net/irda: fixed style issues in irlan_eth ethtool: fixed trailing statements in ethtool ndisc: bool initializations should use true and false atl1e: unmap partially mapped skb on dma error and free skb
| * vlan: mask vlan prio bitsEric Dumazet2013-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e ("vlan: don't deliver frames for unknown vlans to protocols") Florian made sure we set pkt_type to PACKET_OTHERHOST if the vlan id is set and we could find a vlan device for this particular id. But we also have a problem if prio bits are set. Steinar reported an issue on a router receiving IPv6 frames with a vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device, because skb->vlan_tci is set. Forwarded frame is completely corrupted : We can see (8100:4000) being inserted in the middle of IPv6 source address : 16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 > 9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64 0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651 0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213 0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 It seems we are not really ready to properly cope with this right now. We can probably do better in future kernels : vlan_get_ingress_priority() should be a netdev property instead of a per vlan_dev one. For stable kernels, lets clear vlan_tci to fix the bugs. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'driver-core-3.11-rc2' of ↵Linus Torvalds2013-07-18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg KH: "Here are some driver core patches for 3.11-rc2. They aren't really bugfixes, but a bunch of new helper macros for drivers to properly create attribute groups, which drivers and subsystems need to fix up a ton of race issues with incorrectly creating sysfs files (binary and normal) after userspace has been told that the device is present. Also here is the ability to create binary files as attribute groups, to solve that race condition, which was impossible to do before this, so that's my fault the drivers were broken. The majority of the .c changes is indenting and moving code around a bit. It affects no existing code, but allows the large backlog of 70+ patches that I already have created to start flowing into the different subtrees, instead of having to live in my driver-core tree, causing merge nightmares in linux-next for the next few months. These were finalized too late for the -rc1 merge window, which is why they were didn't make that pull request, testing and review from others didn't happen until a few weeks ago, and then there's the whole distraction of the past few days, which prevented these from getting to you sooner, sorry about that. Oh, and there's a bugfix for the documentation build warning in here as well. All of these have been in linux-next this week, with no reported problems" * tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver-core: fix new kernel-doc warning in base/platform.c sysfs: use file mode defines from stat.h sysfs: add more helper macro's for (bin_)attribute(_groups) driver core: add default groups to struct class driver core: Introduce device_create_groups sysfs: prevent warning when only using binary attributes sysfs: add support for binary attributes in groups driver core: device.h: add RW and RO attribute macros sysfs.h: add BIN_ATTR macro sysfs.h: add ATTRIBUTE_GROUPS() macro sysfs.h: add __ATTR_RW() macro
| * | sysfs: use file mode defines from stat.hOliver Schinagl2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | With the last patches stat.h was included to the header, and thus those permission defines should be used. Signed-off-by: Oliver Schinagl <oliver@schinagl.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sysfs: add more helper macro's for (bin_)attribute(_groups)Oliver Schinagl2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | With the recent changes to sysfs there's various helper macro's. However there's no RW, RO BIN_ helper macro's. This patch adds them. Signed-off-by: Oliver Schinagl <oliver@schinagl.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | driver core: add default groups to struct classGreg Kroah-Hartman2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should be using groups, not attribute lists, for classes to allow subdirectories, and soon, binary files. Groups are just more flexible overall, so add them. The dev_attrs list will go away after all in-kernel users are converted to use dev_groups. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | driver core: Introduce device_create_groupsGuenter Roeck2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | device_create_groups lets callers create devices as well as associated sysfs attributes with a single call. This avoids race conditions seen if sysfs attributes on new devices are created later. [fixed up comment block placement and add checks for printk buffer formats - gregkh] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sysfs: add support for binary attributes in groupsGreg Kroah-Hartman2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | groups should be able to support binary attributes, just like it supports "normal" attributes. This lets us only handle one type of structure, groups, throughout the driver core and subsystems, making binary attributes a "full fledged" part of the driver model, and not something just "tacked on". Reported-by: Oliver Schinagl <oliver@schinagl.nl> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | driver core: device.h: add RW and RO attribute macrosGreg Kroah-Hartman2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make it easier to create attributes without having to always audit the mode settings. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sysfs.h: add BIN_ATTR macroGreg Kroah-Hartman2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | This makes it easier to create static binary attributes, which is needed in a number of drivers, instead of "open coding" them. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sysfs.h: add ATTRIBUTE_GROUPS() macroGreg Kroah-Hartman2013-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make it easier for driver subsystems to work with attribute groups, create the ATTRIBUTE_GROUPS macro to remove some of the repetitive typing for the most common use for attribute groups. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | sysfs.h: add __ATTR_RW() macroGreg Kroah-Hartman2013-07-16
| |/ | | | | | | | | | | | | | | | | A number of parts of the kernel created their own version of this, might as well have the sysfs core provide it instead. Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge branch 'cpuinit_phase2' of ↵Linus Torvalds2013-07-18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull phase two of __cpuinit removal from Paul Gortmaker: "With the __cpuinit infrastructure removed earlier, this group of commits only removes the function/data tagging that was done with the various (now no-op) __cpuinit related prefixes. Now that the dust has settled with yesterday's v3.11-rc1, there hopefully shouldn't be any new users leaking back in tree, but I think we can leave the harmless no-op stubs there for a release as a courtesy to those who still have out of tree stuff and weren't paying attention. Although the commits are against the recent tag to allow for minor context refreshes for things like yesterday's v3.11-rc1~ slab content, the patches have been largely unchanged for weeks, aside from such trivial updates. For detail junkies, the largely boring and mostly irrelevant history of the patches can be viewed at: http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git If nothing else, I guess it does at least demonstrate the level of involvement required to shepherd such a treewide change to completion. This is the same repository of patches that has been applied to the end of the daily linux-next branches for the past several weeks" * 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits) block: delete __cpuinit usage from all block files drivers: delete __cpuinit usage from all remaining drivers files kernel: delete __cpuinit usage from all core kernel files rcu: delete __cpuinit usage from all rcu files net: delete __cpuinit usage from all net files acpi: delete __cpuinit usage from all acpi files hwmon: delete __cpuinit usage from all hwmon files cpufreq: delete __cpuinit usage from all cpufreq files clocksource+irqchip: delete __cpuinit usage from all related files x86: delete __cpuinit usage from all x86 files score: delete __cpuinit usage from all score files xtensa: delete __cpuinit usage from all xtensa files openrisc: delete __cpuinit usage from all openrisc files m32r: delete __cpuinit usage from all m32r files hexagon: delete __cpuinit usage from all hexagon files frv: delete __cpuinit usage from all frv files cris: delete __cpuinit usage from all cris files metag: delete __cpuinit usage from all metag files tile: delete __cpuinit usage from all tile files sh: delete __cpuinit usage from all sh files ...
| * | kernel: delete __cpuinit usage from all core kernel filesPaul Gortmaker2013-07-14
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* / linked-list: Remove __list_for_eachDave Jones2013-07-17
|/ | | | | | | | | | | | | | __list_for_each used to be the non prefetch() aware list walking primitive. When we removed the prefetch macros from the list routines, it became redundant. Given it does exactly the same thing as list_for_each now, we might as well remove it and call list_for_each directly. All users of __list_for_each have been converted to list_for_each calls in the current merge window. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'slab/for-linus' of ↵Linus Torvalds2013-07-14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab update from Pekka Enberg: "Highlights: - Fix for boot-time problems on some architectures due to init_lock_keys() not respecting kmalloc_caches boundaries (Christoph Lameter) - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim) - Fix for excessive slab freelist draining (Wanpeng Li) - SLUB and SLOB cleanups and fixes (various people)" I ended up editing the branch, and this avoids two commits at the end that were immediately reverted, and I instead just applied the oneliner fix in between myself. * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux slub: Check for page NULL before doing the node_match check mm/slab: Give s_next and s_stop slab-specific names slob: Check for NULL pointer before calling ctor() slub: Make cpu partial slab support configurable slab: add kmalloc() to kernel API documentation slab: fix init_lock_keys slob: use DIV_ROUND_UP where possible slub: do not put a slab to cpu partial list when cpu_partial is 0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo mm/slub: Drop unnecessary nr_partials mm/slab: Fix /proc/slabinfo unwriteable for slab mm/slab: Sharing s_next and s_stop between slab and slub mm/slab: Fix drain freelist excessively slob: Rework #ifdeffery in slab.h mm, slab: moved kmem_cache_alloc_node comment to correct place
| * slab: add kmalloc() to kernel API documentationMichael Opdenacker2013-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment, kmalloc() isn't even listed in the kernel API documentation (DocBook/kernel-api.html after running "make htmldocs"). Another issue is that the documentation for kmalloc_node() refers to kcalloc()'s documentation to describe its 'flags' parameter, while kcalloc() refered to kmalloc()'s documentation, which doesn't exist! This patch is a proposed fix for this. It also removes the documentation for kmalloc() in include/linux/slob_def.h which isn't included to generate the documentation anyway. This way, kmalloc() is described in only one place. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slob: Rework #ifdeffery in slab.hChristoph Lameter2013-06-18
| | | | | | | | | | | | | | | | | | Make the SLOB specific stuff harmonize more with the way the other allocators do it. Create the typical kmalloc constants for that purpose. SLOB does not support it but the constants help us avoid #ifdefs. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-07-14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs stuff from Al Viro: "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including making simple_lookup() usable for filesystems with non-NULL s_d_op, which allows us to get rid of quite a bit of ugliness" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sunrpc: now we can just set ->s_d_op cgroup: we can use simple_lookup() now efivarfs: we can use simple_lookup() now make simple_lookup() usable for filesystems that set ->s_d_op configfs: don't open-code d_alloc_name() __rpc_lookup_create_exclusive: pass string instead of qstr rpc_create_*_dir: don't bother with qstr llist: llist_add() can use llist_add_batch() llist: fix/simplify llist_add() and llist_add_batch() fput: turn "list_head delayed_fput_list" into llist_head fs/file_table.c:fput(): add comment Safer ABI for O_TMPFILE
| * | rpc_create_*_dir: don't bother with qstrAl Viro2013-07-14
| | | | | | | | | | | | | | | | | | just pass the name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | llist: llist_add() can use llist_add_batch()Oleg Nesterov2013-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | llist_add(new, head) can simply use llist_add_batch(new, new, head), no need to duplicate the code. This obviously uninlines llist_add() and to me this is a win. But we can make llist_add_batch() inline if this is desirable, in this case gcc can notice that new_first == new_last if the caller is llist_add(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | llist: fix/simplify llist_add() and llist_add_batch()Oleg Nesterov2013-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. This is mostly theoretical, but llist_add*() need ACCESS_ONCE(). Otherwise it is not guaranteed that the first cmpxchg() uses the same value for old_entry and new_last->next. 2. These helpers cache the result of cmpxchg() and read the initial value of head->first before the main loop. I do not think this makes sense. In the likely case cmpxchg() succeeds, otherwise it doesn't hurt to reload head->first. I think it would be better to simplify the code and simply read ->first before cmpxchg(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fput: turn "list_head delayed_fput_list" into llist_headOleg Nesterov2013-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fput() and delayed_fput() can use llist and avoid the locking. This is unlikely path, it is not that this change can improve the performance, but this way the code looks simpler. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Vagin <avagin@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-07-13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Just a bunch of small fixes and tidy ups: 1) Finish the "busy_poll" renames, from Eliezer Tamir. 2) Fix RCU stalls in IFB driver, from Ding Tianhong. 3) Linearize buffers properly in tun/macvtap zerocopy code. 4) Don't crash on rmmod in vxlan, from Pravin B Shelar. 5) Spinlock used before init in alx driver, from Maarten Lankhorst. 6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry Kravkov. 7) Dummy and ifb driver load failure paths can oops, fixes from Tan Xiaojun and Ding Tianhong. 8) Correct MTU calculations in IP tunnels, from Alexander Duyck. 9) Account all TCP retransmits in SNMP stats properly, from Yuchung Cheng. 10) atl1e and via-rhine do not handle DMA mapping failures properly, from Neil Horman. 11) Various equal-cost multipath route fixes in ipv6 from Hannes Frederic Sowa" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits) ipv6: only static routes qualify for equal cost multipathing via-rhine: fix dma mapping errors atl1e: fix dma mapping warnings tcp: account all retransmit failures usb/net/r815x: fix cast to restricted __le32 usb/net/r8152: fix integer overflow in expression net: access page->private by using page_private net: strict_strtoul is obsolete, use kstrtoul instead drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe net/usb: add relative mii functions for r815x net/tipc: use %*phC to dump small buffers in hex form qlcnic: Adding Maintainers. gre: Fix MTU sizing check for gretap tunnels pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts pkt_sched: sch_qfq: improve efficiency of make_eligible gso: Update tunnel segmentation to support Tx checksum offload inet: fix spacing in assignment ifb: fix oops when loading the ifb failed ...
| * | | net: rename ll methods to busy-pollEliezer Tamir2013-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename ndo_ll_poll to ndo_busy_poll. Rename sk_mark_ll to sk_mark_napi_id. Rename skb_mark_ll to skb_mark_napi_id. Correct all useres of these functions. Update comments and defines in include/net/busy_poll.h Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2013-07-13
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: - watchdog fixes for full dynticks - improved debug output for full dynticks - remove an obsolete full dynticks check - two ARM SoC clocksource drivers for sharing across SoCs - tick broadcast fix for CPU hotplug * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: broadcast: Check broadcast mode on CPU hotplug clocksource: arm_global_timer: Add ARM global timer support clocksource: Add Marvell Orion SoC timer nohz: Remove obsolete check for full dynticks CPUs to be RCU nocbs watchdog: Boot-disable by default on full dynticks watchdog: Rename confusing state variable watchdog: Register / unregister watchdog kthreads on sysctl control nohz: Warn if the machine can not perform nohz_full
| * \ \ \ Merge branch 'linus' into timers/urgentThomas Gleixner2013-07-12
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get upstream changes so we can apply fixes against them Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * \ \ \ \ Merge branch 'timers/core' of ↵Ingo Molnar2013-07-10
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent Pull nohz updates/fixes from Frederic Weisbecker: ' Note that "watchdog: Boot-disable by default on full dynticks" is a temporary solution to solve the issue with the watchdog that prevents the tick from stopping. This is to make sure that 3.11 doesn't have that problem as several people complained about it. A proper and longer term solution has been proposed by Peterz: http://lkml.kernel.org/r/20130618103632.GO3204@twins.programming.kicks-ass.net ' Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | watchdog: Rename confusing state variableFrederic Weisbecker2013-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have two very conflicting state variable names in the watchdog: * watchdog_enabled: This one reflects the user interface. It's set to 1 by default and can be overriden with boot options or sysctl/procfs interface. * watchdog_disabled: This is the internal toggle state that tells if watchdog threads, timers and NMI events are currently running or not. This state mostly depends on the user settings. It's a convenient state latch. Now we really need to find clearer names because those are just too confusing to encourage deep review. watchdog_enabled now becomes watchdog_user_enabled to reflect its purpose as an interface. watchdog_disabled becomes watchdog_running to suggest its role as a pure internal state. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Don Zickus <dzickus@redhat.com>
* | | | | | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2013-07-13
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Thomas Gleixner: "Header cleanup as requested by Linus" (This is the "don't include support for ww_mutex in a header file that everybody wants, when almost nobody wants the ww part" change) * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: mutex: Move ww_mutex definitions to ww_mutex.h
| * | | | | | | mutex: Move ww_mutex definitions to ww_mutex.hMaarten Lankhorst2013-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the definitions for wound/wait mutexes out to a separate header, ww_mutex.h. This reduces clutter in mutex.h, and increases readability. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Dave Airlie <airlied@gmail.com> Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com [ Tidied up the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | | Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-07-13
|\ \ \ \ \ \ \ \ | |_|_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - AF_IB (native IB addressing) for CMA from Sean Hefty - new mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - other small changes and fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) mlx5: Return -EFAULT instead of -EPERM IB/qib: Log all SDMA errors unconditionally IB/qib: Fix module-level leak mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() mlx5_core: Fixes for sparse warnings IB/mlx5: Make profile[] static in main.c mlx5: Fix parameter type of health_handler_t mlx5: Add driver for Mellanox Connect-IB adapters IB/core: Add reserved values to enums for low-level driver use IB/srp: Bump driver version and release date IB/srp: Make HCA completion vector configurable IB/srp: Maintain a single connection per I_T nexus IB/srp: Fail I/O fast if target offline IB/srp: Skip host settle delay IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Fix remove_one crash due to resource exhaustion IB/qib: New transmitter tunning settings for Dell 1.1 backplane IB/core: Fix error return code in add_port() ...
| | | | | | | |
| | \ \ \ \ \ \
| *-. \ \ \ \ \ \ Merge branches 'mlx5', 'qib' and 'srp' into for-nextRoland Dreier2013-07-11
| |\ \ \ \ \ \ \ \
| | * | | | | | | | mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB specMoshe Lazer2013-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sparse reported an endianness bug in the assignment to hca_cap.uar_page_sz. Fix the declaration of this field to be __be16 (which is what is in the firmware spec), renaming the field to log_uar_pg_size to conform to the spec, which fixes the endianness bug reported by sparse. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | |
| | \ \ \ \ \ \ \ \
| *-. | | | | | | | | Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' ↵Roland Dreier2013-07-08
| |\ \| | | | | | | | | | | |/ / / / / / / | | |/| | | | | | | | | | | | | | | | | into for-next
| | | * | | | | | | mlx5: Fix parameter type of health_handler_tRoland Dreier2013-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This deals with the sparse warning: drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54: warning: incorrect type in argument 2 (different address spaces) drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54: expected void *buf drivers/net/ethernet/mellanox/mlx5/core/health.c:94:54: got struct health_buffer [noderef] <asn:2>*health Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * | | | | | | mlx5: Add driver for Mellanox Connect-IB adaptersEli Cohen2013-07-08
| | |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver is comprised of two kernel modules: mlx5_ib and mlx5_core. This partitioning resembles what we have for mlx4, except that mlx5_ib is the pci device driver and not mlx5_core. mlx5_core is essentially a library that provides general functionality that is intended to be used by other Mellanox devices that will be introduced in the future. mlx5_ib has a similar role as any hardware device under drivers/infiniband/hw. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Merge in coccinelle fixes from Fengguang Wu <fengguang.wu@intel.com>. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
| * / / / / / / / RDMA/cma: Define native IB addressSean Hefty2013-06-20
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define AF_IB and sockaddr_ib to allow the rdma_cm to use native IB addressing. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | | | | | | Merge tag 'dm-3.11-changes' of ↵Linus Torvalds2013-07-11
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper changes from Alasdair G Kergon: "Add a device-mapper target called dm-switch to provide a multipath framework for storage arrays that dynamically reconfigure their preferred paths for different device regions. Fix a bug in the verity target that prevented its use with some specific sizes of devices. Improve some locking mechanisms in the device-mapper core and bufio. Add Mike Snitzer as a device-mapper maintainer. A few more clean-ups and fixes" * tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm: add switch target dm: update maintainers dm: optimize reorder structure dm: optimize use SRCU and RCU dm bufio: submit writes outside lock dm cache: fix arm link errors with inline dm verity: use __ffs and __fls dm flakey: correct ctr alloc failure mesg dm verity: remove pointless comparison dm: use __GFP_HIGHMEM in __vmalloc dm verity: fix inability to use a few specific devices sizes dm ioctl: set noio flag to avoid __vmalloc deadlock dm mpath: fix ioctl deadlock when no paths
| * | | | | | | | dm: optimize use SRCU and RCUMikulas Patocka2013-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes "io_lock" and "map_lock" in struct mapped_device and "holders" in struct dm_table and replaces these mechanisms with sleepable-rcu. Previously, the code would call "dm_get_live_table" and "dm_table_put" to get and release table. Now, the code is changed to call "dm_get_live_table" and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and dm_put_live_table unlocks it. dm_get_live_table_fast/dm_put_live_table_fast can be used instead of dm_get_live_table/dm_put_live_table. These *_fast functions use non-sleepable RCU, so the caller must not block between them. If the code changes active or inactive dm table, it must call dm_sync_table before destroying the old table. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | | | | | | | | Merge branch 'for-3.11/core' of git://git.kernel.dk/linux-blockLinus Torvalds2013-07-11
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block IO updates from Jens Axboe: "Here are the core IO block bits for 3.11. It contains: - A tweak to the reserved tag logic from Jan, for weirdo devices with just 3 free tags. But for those it improves things substantially for random writes. - Periodic writeback fix from Jan. Marked for stable as well. - Fix for a race condition in IO scheduler switching from Jianpeng. - The hierarchical blk-cgroup support from Tejun. This is the grunt of the series. - blk-throttle fix from Vivek. Just a note that I'm in the middle of a relocation, whole family is flying out tomorrow. Hence I will be awal the remainder of this week, but back at work again on Monday the 15th. CC'ing Tejun, since any potential "surprises" will most likely be from the blk-cgroup work. But it's been brewing for a while and sitting in my tree and linux-next for a long time, so should be solid." * 'for-3.11/core' of git://git.kernel.dk/linux-block: (36 commits) elevator: Fix a race in elevator switching block: Reserve only one queue tag for sync IO if only 3 tags are available writeback: Fix periodic writeback after fs mount blk-throttle: implement proper hierarchy support blk-throttle: implement throtl_grp->has_rules[] blk-throttle: Account for child group's start time in parent while bio climbs up blk-throttle: add throtl_qnode for dispatch fairness blk-throttle: make throtl_pending_timer_fn() ready for hierarchy blk-throttle: make tg_dispatch_one_bio() ready for hierarchy blk-throttle: make blk_throtl_bio() ready for hierarchy blk-throttle: make blk_throtl_drain() ready for hierarchy blk-throttle: dispatch from throtl_pending_timer_fn() blk-throttle: implement dispatch looping blk-throttle: separate out throtl_service_queue->pending_timer from throtl_data->dispatch_work blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats update with it blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log() blk-throttle: add throtl_service_queue->parent_sq blk-throttle: generalize update_disptime optimization in blk_throtl_bio() blk-throttle: dispatch to throtl_data->service_queue.bio_lists[] blk-throttle: move bio_lists[] and friends to throtl_service_queue ...
| * | | | | | | | | elevator: Fix a race in elevator switchingJianpeng Ma2013-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race between elevator switching and normal io operation. Because the allocation of struct elevator_queue and struct elevator_data don't in a atomic operation.So there are have chance to use NULL ->elevator_data. For example: Thread A: Thread B blk_queu_bio elevator_switch spin_lock_irq(q->queue_block) elevator_alloc elv_merge elevator_init_fn Because call elevator_alloc, it can't hold queue_lock and the ->elevator_data is NULL.So at the same time, threadA call elv_merge and nedd some info of elevator_data.So the crash happened. Move the elevator_alloc into func elevator_init_fn, it make the operations in a atomic operation. Using the follow method can easy reproduce this bug 1:dd if=/dev/sdb of=/dev/null 2:while true;do echo noop > scheduler;echo deadline > scheduler;done The test method also use this method. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | | | | | blk-throttle: implement proper hierarchy supportTejun Heo2013-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the recent updates, blk-throttle is finally ready for proper hierarchy support. Dispatching now honors service_queue->parent_sq and propagates correctly. The only thing missing is setting ->parent_sq correctly so that throtl_grp hierarchy matches the cgroup hierarchy. This patch updates throtl_pd_init() such that service_queues form the same hierarchy as the cgroup hierarchy if sane_behavior is enabled. As this concludes proper hierarchy support for blkcg, the shameful .broken_hierarchy tag is removed from blkio_subsys. v2: Updated blkio-controller.txt as suggested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
* | | | | | | | | | Merge tag 'remoteproc-3.11-fixes' of ↵Linus Torvalds2013-07-11
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc Pull remoteproc fixes from Ohad Ben-Cohen: "Trivial remoteproc fixes by Suman Anna, Wei Yongjun and Thomas Meyer" * tag 'remoteproc-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc: remoteproc: Cocci spatch "memdup.spatch" remoteproc: free carveout memories only after unmapping them remoteproc/omap: fix a sparse warning remoteproc: fix checkpatch errors in remoteproc code remoteproc: fix error return code in rproc_fw_boot()
| * | | | | | | | | | remoteproc/omap: fix a sparse warningSuman Anna2013-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a sparse warning in the omap remoteproc code when OMAP_REMOTEPROC is disabled. include/linux/platform_data/remoteproc-omap.h:76:13: warning: symbol 'omap_rproc_reserve_cma' was not declared. Should it be static? Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>