aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* missing bits of net-namespace / sysctlAl Viro2008-07-27
| | | | | | | | | | | | | | | | | Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-07-26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...
| * [PATCH] f_count may wrap aroundAl Viro2008-07-26
| | | | | | | | | | | | | | make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns onesAl Viro2008-07-26
| | | | | | | | | | | | | | | | | | | | Massage ipv4 initialization - make sure that net.ipv4 appears as non-per-net-namespace before it shows up in per-net-namespace sysctls. That's the only change outside of sysctl.c needed to get sane ordering rules and data structures for sysctls (esp. for procfs side of that mess). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] beginning of sysctl cleanup - ctl_table_setAl Viro2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | New object: set of sysctls [currently - root and per-net-ns]. Contains: pointer to parent set, list of tables and "should I see this set?" method (->is_seen(set)). Current lists of tables are subsumed by that; net-ns contains such a beast. ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of that to ->list of that ctl_table_set. [folded compile fixes by rdd for configs without sysctl] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-07-26
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON
| * netns: fix ip_rt_frag_needed rt_is_expiredHugh Dickins2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | Running recent kernels, and using a particular vpn gateway, I've been having to edit my mails down to get them accepted by the smtp server. Git bisect led to commit e84f84f276473dcc673f360e8ff3203148bdf0e2 - netns: place rt_genid into struct net. The conversion from a != test to rt_is_expired() put one negative too many: and now my mail works. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferencesPatrick McHardy2008-07-26
| | | | | | | | | | | | | | | | As Linus points out, "ct->ext" and "new" are always equal, avoid unnecessary dereferences and use "new" directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: fix double-free and use-after freePekka Enberg2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | As suggested by Patrick McHardy, introduce a __krealloc() that doesn't free the original buffer to fix a double-free and use-after-free bug introduced by me in netfilter that uses RCU. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Tested-by: Dieter Ries <clip2@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: arptables in netns for realAlexey Dobriyan2008-07-26
| | | | | | | | | | | | | | | | IN, FORWARD -- grab netns from in device, OUT -- from out device. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ip{,6}tables_security: fix future section mismatchAlexey Dobriyan2008-07-26
| | | | | | | | | | | | | | | | | | Currently not visible, because NET_NS is mutually exclusive with SYSFS which is required by SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ebtables: use nf_register_hooks()Alexey Dobriyan2008-07-26
| | | | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "pkt_sched: sch_sfq: dump a real number of flows"David S. Miller2008-07-26
| | | | | | | | | | | | | | | | | | | | | | This reverts commit f867e6af94239a04ec23aeec2fcda5aa58e41db7. Based upon discussions between Jarek and Patrick McHardy this is field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>
| * syncookies: Make sure ECN is disabledFlorian Westphal2008-07-26
| | | | | | | | | | | | | | | | | | | | ecn_ok is not initialized when a connection is established by cookies. The cookie syn-ack never sets ECN, so ecn_ok must be set to 0. Spotted using ns-3/network simulation cradle simulator and valgrind. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | SL*B: drop kmem cache argument from constructorAlexey Dobriyan2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | dma-mapping: add the device argument to dma_mapping_error()FUJITA Tomonori2008-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-07-25
|\| | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipsec: ipcomp - Decompress into frags if necessary ipsec: ipcomp - Merge IPComp implementations pkt_sched: Fix locking in shutdown_scheduler_queue()
| * ipsec: ipcomp - Decompress into frags if necessaryHerbert Xu2008-07-25
| | | | | | | | | | | | | | | | | | When decompressing extremely large packets allocating them through kmalloc is prone to failure. Therefore it's better to use page frags instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipsec: ipcomp - Merge IPComp implementationsHerbert Xu2008-07-25
| | | | | | | | | | | | | | | | | | This patch merges the IPv4/IPv6 IPComp implementations since most of the code is identical. As a result future enhancements will no longer need to be duplicated. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * pkt_sched: Fix locking in shutdown_scheduler_queue()David S. Miller2008-07-25
| | | | | | | | | | | | | | Qdisc locks need to be held with BH disabled. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sysctl: allow override of /proc/sys/net with CAP_NET_ADMINStephen Hemminger2008-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the permission check for networking sysctl's to allow modification when current process has CAP_NET_ADMIN capability and is not root. This version uses the until now unused permissions hook to override the mode value for /proc/sys/net if accessed by a user with capabilities. Found while working with Quagga. It is impossible to turn forwarding on/off through the command interface because Quagga uses secure coding practice of dropping privledges during initialization and only raising via capabilities when necessary. Since the dameon has reset real/effective uid after initialization, all attempts to access /proc/sys/net variables will fail. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | printk ratelimiting rewriteDave Young2008-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | list_for_each_rcu must die: networkingPaul E. McKenney2008-07-25
| | | | | | | | | | | | | | | | | | | | | | | | All uses of list_for_each_rcu() can be profitably replaced by the easier-to-use list_for_each_entry_rcu(). This patch makes this change for networking, in preparation for removing the list_for_each_rcu() API entirely. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-07-24
|\| | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: sch_sfq: dump a real number of flows atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups netfilter: make security table depend on NETFILTER_ADVANCED tcp: Clear probes_out more aggressively in tcp_ack(). e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call net: Update entry in af_family_clock_key_strings netdev: Remove warning from __netif_schedule(). sky2: don't stop queue on shutdown
| * pkt_sched: sch_sfq: dump a real number of flowsJarek Poplawski2008-07-24
| | | | | | | | | | | | | | | | | | Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: make security table depend on NETFILTER_ADVANCEDPatrick McHardy2008-07-23
| | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tcp: Clear probes_out more aggressively in tcp_ack().David S. Miller2008-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based upon an excellent bug report from Eric Dumazet. tcp_ack() should clear ->icsk_probes_out even if there are packets outstanding. Otherwise if we get a sequence of ACKs while we do have packets outstanding over and over again, we'll never clear the probes_out value and eventually think the connection is too sick and we'll reset it. This appears to be some "optimization" added to tcp_ack() in the 2.4.x timeframe. In 2.2.x, probes_out is pretty much always cleared by tcp_ack(). Here is Eric's original report: ---------------------------------------- Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost. In order to reproduce the problem, please try the following program on linux-2.6.25.* Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000 iptables -N SLOWLO iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT iptables -A SLOWLO -j DROP iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO Then run the attached program and see the output : # ./loop State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15) write(): Connection timed out wrote 890 bytes but was interrupted after 9 seconds ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455 Exiting read() because no data available (4000 ms timeout). read 860 bytes While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15) tcpdump : 15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257 15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257 15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257 15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257 15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257 15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257 15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257 15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257 15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257 15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257 15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257 15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257 15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257 15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257 15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257 15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257 15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257 15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257 15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257 15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257 15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257 15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257 15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257 15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257 15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257 15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257 15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257 15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257 15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257 15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257 15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257 15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257 15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257 15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257 15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257 15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0 Source of program : /* * small producer/consumer program. * setup a listener on 127.0.0.1:12000 * Forks a child * child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms * Father accepts connection, and read all data */ #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <stdio.h> #include <time.h> #include <sys/poll.h> int port = 12000; char buffer[4096]; int main(int argc, char *argv[]) { int lfd = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in socket_address; time_t t0, t1; int on = 1, sfd, res; unsigned long total = 0; socklen_t alen = sizeof(socket_address); pid_t pid; time(&t0); socket_address.sin_family = AF_INET; socket_address.sin_port = htons(port); socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK); if (lfd == -1) { perror("socket()"); return 1; } setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int)); if (bind(lfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) { perror("bind"); close(lfd); return 1; } if (listen(lfd, 1) == -1) { perror("listen()"); close(lfd); return 1; } pid = fork(); if (pid == 0) { int i, cfd = socket(AF_INET, SOCK_STREAM, 0); close(lfd); if (connect(cfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) { perror("connect()"); return 1; } for (i = 0 ; ;) { res = write(cfd, "blablabla\n", 10); if (res > 0) total += res; else if (res == -1) { perror("write()"); break; } else break; usleep(100000); if (++i == 10) { system("ss -on dst 127.0.0.1:12000"); i = 0; } } time(&t1); fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0)); system("ss -on | grep 127.0.0.1:12000"); close(cfd); return 0; } sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen); if (sfd == -1) { perror("accept"); return 1; } close(lfd); while (1) { struct pollfd pfd[1]; pfd[0].fd = sfd; pfd[0].events = POLLIN; if (poll(pfd, 1, 4000) == 0) { fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n"); break; } res = read(sfd, buffer, sizeof(buffer)); if (res > 0) total += res; else if (res == 0) break; else perror("read()"); } fprintf(stderr, "read %lu bytes\n", total); close(sfd); return 0; } ---------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Update entry in af_family_clock_key_stringsOliver Hartkopp2008-07-23
| | | | | | | | | | | | | | | | | | | | | | In the merge phase of the CAN subsystem the af_family_clock_key_strings[] have been added to sock.c in commit 443aef0eddfa44c158d1b94ebb431a70638fcab4 (lockdep: fixup sk_callback_lock annotation). This trivial patch adds the missing name for address family 29 (AF_CAN). Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netdev: Remove warning from __netif_schedule().David S. Miller2008-07-23
| | | | | | | | | | | | | | | | | | | | | | It isn't helping anything and we aren't going to be able to change all the drivers that do queue wakeups in strange situations. Just letting a noop_qdisc get scheduled will work because when qdisc_run() executes via net_tx_work() it will simply find no packets pending when it makes the ->dequeue() call in qdisc_restart. Signed-off-by: David S. Miller <davem@davemloft.net>
* | flag parameters: check magic constantsUlrich Drepper2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds test that ensure the boundary conditions for the various constants introduced in the previous patches is met. No code is generated. [akpm@linux-foundation.org: fix alpha] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | flag parameters: NONBLOCK in socket and socketpairUlrich Drepper2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces support for the SOCK_NONBLOCK flag in socket, socketpair, and paccept. To do this the internal function sock_attach_fd gets an additional parameter which it uses to set the appropriate flag for the file descriptor. Given that in modern, scalable programs almost all socket connections are non-blocking and the minimal additional cost for the new functionality I see no reason not to add this code. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_NONBLOCK O_NONBLOCK static pthread_barrier_t b; static void * tf (void *arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); return NULL; } int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("socket(0) set non-blocking mode"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM|SOCK_NONBLOCK, 0); if (fd == -1) { puts ("socket(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0, fds) == -1) { puts ("socketpair(SOCK_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, NULL) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } fl = fcntl (s2, F_GETFL); if (fl & O_NONBLOCK) { puts ("paccept(0) set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK); if (s2 < 0) { puts ("paccept(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (s2, F_GETFL); if ((fl & O_NONBLOCK) == 0) { puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | flag parameters: paccept w/out set_restore_sigmaskUlrich Drepper2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some platforms do not have support to restore the signal mask in the return path from a syscall. For those platforms syscalls like pselect are not defined at all. This is, I think, not a good choice for paccept() since paccept() adds more value on top of accept() than just the signal mask handling. Therefore this patch defines a scaled down version of the sys_paccept function for those platforms. It returns -EINVAL in case the signal mask is non-NULL but behaves the same otherwise. Note that I explicitly included <linux/thread_info.h>. I saw that it is currently included but indirectly two levels down. There is too much risk in relying on this. The header might change and then suddenly the function definition would change without anyone immediately noticing. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | flag parameters: pacceptUlrich Drepper2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void *arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 || errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | flag parameters: socket and socketpairUlrich Drepper2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for flag values which are ORed to the type passwd to socket and socketpair. The additional code is minimal. The flag values in this implementation can and must match the O_* flags. This avoids overhead in the conversion. The internal functions sock_alloc_fd and sock_map_fd get a new parameters and all callers are changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #define PORT 57392 /* For Linux these must be the same. */ #define SOCK_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("socket(0) set close-on-exec flag"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0); if (fd == -1) { puts ("socket(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1) { puts ("socketpair(SOCK_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'cpus4096-for-linus' of ↵Linus Torvalds2008-07-23
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
| * \ Merge branch 'linus' into cpus4096-for-linusIngo Molnar2008-07-21
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/sunrpc/svc.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptrMike Travis2008-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'linus' into cpus4096Ingo Molnar2008-07-18
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ \ \ Merge branch 'linus' into cpus4096Ingo Molnar2008-07-15
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ \ \ \ Merge commit 'v2.6.26-rc9' into cpus4096Ingo Molnar2008-07-06
| |\ \ \ \ \
| * | | | | | net: Pass reference to cpumask variable in net/sunrpc/svc.cMike Travis2008-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Pass reference to cpumask variable instead of using stack. For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | net: use performance variant for_each_cpu_mask_nrMike Travis2008-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change references from for_each_cpu_mask to for_each_cpu_mask_nr where appropriate Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2008-07-23
|\ \ \ \ \ \ \ | |_|_|_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits) I/OAT: I/OAT version 3.0 support I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT: Add watchdog/reset functionality to ioatdma iop_adma: cleanup iop_chan_xor_slot_count iop_adma: document how to calculate the minimum descriptor pool size iop_adma: directly reclaim descriptors on allocation failure async_tx: make async_tx_test_ack a boolean routine async_tx: remove depend_tx from async_tx_sync_epilog async_tx: export async_tx_quiesce async_tx: fix handling of the "out of descriptor" condition in async_xor async_tx: ensure the xor destination buffer remains dma-mapped async_tx: list_for_each_entry_rcu() cleanup dmaengine: Driver for the Synopsys DesignWare DMA controller dmaengine: Add slave DMA interface dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap dmaengine: Add dma_client parameter to device_alloc_chan_resources dmatest: Simple DMA memcpy test client dmaengine: DMA engine driver for Marvell XOR engine iop-adma: fix platform driver hotplug/coldplug dmaengine: track the number of clients using a channel ... Fixed up conflict in drivers/dca/dca-sysfs.c manually
| * | | | | | I/OAT: tcp_dma_copybreak default value dependent on I/OAT versionMaciej Sosnowski2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I/OAT DMA performance tuning showed different optimal values of tcp_dma_copybreak for different I/OAT versions (4096 for 1.2 and 2048 for 2.0). This patch lets ioatdma driver set tcp_dma_copybreak value according to these results. [dan.j.williams@intel.com: remove some ifdefs] Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-07-22
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits) ipw2200: Call netif_*_queue() interfaces properly. netxen: Needs to include linux/vmalloc.h [netdrvr] atl1d: fix !CONFIG_PM build r6040: rework init_one error handling r6040: bump release number to 0.18 r6040: handle RX fifo full and no descriptor interrupts r6040: change the default waiting time r6040: use definitions for magic values in descriptor status r6040: completely rework the RX path r6040: call napi_disable when puting down the interface and set lp->dev accordingly. mv643xx_eth: fix NETPOLL build r6040: rework the RX buffers allocation routine r6040: fix scheduling while atomic in r6040_tx_timeout r6040: fix null pointer access and tx timeouts r6040: prefix all functions with r6040 rndis_host: support WM6 devices as modems at91_ether: use netstats in net_device structure sfc: Create one RX queue and interrupt per CPU package by default sfc: Use a separate workqueue for resets sfc: I2C adapter initialisation fixes ...
| * | | | | | | ipv6: icmp6_dst_gc return changeStephen Hemminger2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change icmp6_dst_gc to return the one value the caller cares about rather than using call by reference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6: use kcallocStephen Hemminger2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Th fib_table_hash is an array, so use kcalloc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6: use spin_trylock_bhStephen Hemminger2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now there is spin_trylock_bh, use it rather than open coding. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv6: use round_jiffiesStephen Hemminger2008-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This timer normally happens once a minute, there is no need to cause an early wakeup for it, so align it to next second boundary to safe power. It can't be deferred because then it could take too long on cleanup or DoS. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>