aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* ethtool: Centralise validation of ETHTOOL_{G, S}RXFHINDIR parametersBen Hutchings2011-12-16
| | | | | | | | | | | | | Add a new ethtool operation (get_rxfh_indir_size) to get the indirectional table size. Use this to validate the user buffer size before calling get_rxfh_indir or set_rxfh_indir. Use get_rxnfc to get the number of RX rings, and validate the contents of the new indirection table before calling set_rxfh_indir. Remove this validation from drivers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Write it into kbuildPavel Emelyanov2011-12-16
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Receive queue lenght NLAPavel Emelyanov2011-12-16
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Pending connections IDs NLAPavel Emelyanov2011-12-16
| | | | | | | | | | | | | | | | | When establishing a unix connection on stream sockets the server end receives an skb with socket in its receive queue. Report who is waiting for these ends to be accepted for listening sockets via NLA. There's a lokcing issue with this -- the unix sk state lock is required to access the peer, and it is taken under the listening sk's queue lock. Strictly speaking the queue lock should be taken inside the state lock, but since in this case these two sockets are different it shouldn't lead to deadlock. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Unix peer inode NLAPavel Emelyanov2011-12-16
| | | | | | | | Report the peer socket inode ID as NLA. With this it's finally possible to find out the other end of an interesting unix connection. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Unix inode info NLAPavel Emelyanov2011-12-16
| | | | | | | | | | | | | Actually, the socket path if it's not anonymous doesn't give a clue to which file the socket is bound to. Even if the path is absolute, it can be unlinked and then new socket can be bound to it. With this NLA it's possible to check which file a particular socket is really bound to. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Unix socket name NLAPavel Emelyanov2011-12-16
| | | | | | | | Report the sun_path when requested as NLA. With leading '\0' if present but without the leading AF_UNIX bits. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Dumping exact socket corePavel Emelyanov2011-12-16
| | | | | | | | | | | | The socket inode is used as a key for lookup. This is effectively the only really unique ID of a unix socket, but using this for search currently has one problem -- it is O(number of sockets) :( Does it worth fixing this lookup or inventing some other ID for unix sockets? Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Dumping all sockets corePavel Emelyanov2011-12-16
| | | | | | | | Walk the unix sockets table and fill the core response structure, which includes type, state and inode. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* unix_diag: Basic module skeletonPavel Emelyanov2011-12-16
| | | | | | | | Includes basic module_init/_exit functionality, dump/get_exact stubs and declares the basic API structures for request and response. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_unix: Export stuff required for diag modulePavel Emelyanov2011-12-16
| | | | | Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Generalize requests cookies managementsPavel Emelyanov2011-12-16
| | | | | | | | | The sk address is used as a cookie between dump/get_exact calls. It will be required for unix socket sdumping, so move it from inet_diag to sock_diag. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sock_diag: Fix module netlink aliasesPavel Emelyanov2011-12-16
| | | | | | | | | | | | | | I've made a mistake when fixing the sock_/inet_diag aliases :( 1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one; 2. The inet_diag layer should request for AF_INET+protocol alias, not just the protocol one. Thus fix this. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-16
|\ | | | | | | | | | | | | Conflicts: drivers/net/ethernet/freescale/fsl_pq_mdio.c net/batman-adv/translation-table.c net/ipv6/route.c
| * ipv6: Check dest prefix length on original route not copied one in ↵David S. Miller2011-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rt6_alloc_cow(). After commit 8e2ec639173f325977818c45011ee176ef2b11f6 ("ipv6: don't use inetpeer to store metrics for routes.") the test in rt6_alloc_cow() for setting the ANYCAST flag is now wrong. 'rt' will always now have a plen of 128, because it is set explicitly to 128 by ip6_rt_copy. So to restore the semantics of the test, check the destination prefix length of 'ort'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * sch_gred: should not use GFP_KERNEL while holding a spinlockEric Dumazet2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | gred_change_vq() is called under sch_tree_lock(sch). This means a spinlock is held, and we are not allowed to sleep in this context. We might pre-allocate memory using GFP_KERNEL before taking spinlock, but this is not suitable for stable material. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipip, sit: copy parms.name after register_netdeviceTed Feng2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Same fix as 731abb9cb2 for ipip and sit tunnel. Commit 1c5cae815d removed an explicit call to dev_alloc_name in ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice will now create a valid name, however the tunnel keeps a copy of the name in the private parms structure. Fix this by copying the name back after register_netdevice has successfully returned. This shows up if you do a simple tunnel add, followed by a tunnel show: $ sudo ip tunnel add mode ipip remote 10.2.20.211 $ ip tunnel tunl0: ip/ip remote any local any ttl inherit nopmtudisc tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit $ sudo ip tunnel add mode sit remote 10.2.20.212 $ ip tunnel sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16 sit%d: ioctl 89f8 failed: No such device sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit Cc: stable@vger.kernel.org Signed-off-by: Ted Feng <artisdom@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: Fix for adding multicast route for loopback device automatically.Li Wei2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no obvious reason to add a default multicast route for loopback devices, otherwise there would be a route entry whose dst.error set to -ENETUNREACH that would blocking all multicast packets. ==================== [ more detailed explanation ] The problem is that the resulting routing table depends on the sequence of interface's initialization and in some situation, that would block all muticast packets. Suppose there are two interfaces on my computer (lo and eth0), if we initailize 'lo' before 'eth0', the resuting routing table(for multicast) would be # ip -6 route show | grep ff00:: unreachable ff00::/8 dev lo metric 256 error -101 ff00::/8 dev eth0 metric 256 When sending multicasting packets, routing subsystem will return the first route entry which with a error set to -101(ENETUNREACH). I know the kernel will set the default ipv6 address for 'lo' when it is up and won't set the default multicast route for it, but there is no reason to stop 'init' program from setting address for 'lo', and that is exactly what systemd did. I am sure there is something wrong with kernel or systemd, currently I preferred kernel caused this problem. ==================== Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵John W. Linville2011-12-09
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
| | * mac80211: fix another race in aggregation startJohannes Berg2011-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emmanuel noticed that when mac80211 stops the queues for aggregation that can leave a packet pending. This packet will be given to the driver after the AMPDU callback, but as a non-aggregated packet which messes up the sequence number etc. I also noticed by looking at the code that if packets are being processed while we clear the WANT_START bit, they might see it cleared already and queue up on tid_tx->pending. If the driver then rejects the new aggregation session we leak the packet. Fix both of these issues by changing this code to not stop the queues at all. Instead, let packets queue up on the tid_tx->pending queue instead of letting them get to the driver, and add code to recover properly in case the driver rejects the session. (The patch looks large because it has to move two functions to before their new use.) Cc: stable@vger.kernel.org Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * Merge branch 'master' of ↵John W. Linville2011-12-06
| | |\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth
| * | \ Merge branch 'batman-adv/maint' of git://git.open-mesh.org/linux-mergeDavid S. Miller2011-12-07
| |\ \ \
| | * | | batman-adv: delete global entry in case of roamingAntonio Quartulli2011-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When receiving a DEL change for a client due to a roaming event (change is marked with TT_CLIENT_ROAM), each node has to check if the client roamed to itself or somewhere else. In the latter case the global entry is kept to avoid having no route at all otherwise we can safely delete the global entry Signed-off-by: Antonio Quartulli <ordex@autistici.org>
| | * | | batman-adv: in case of roaming mark the client with TT_CLIENT_ROAMAntonio Quartulli2011-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of a client roaming from node A to node B, the latter have to mark the corresponding global entry with TT_CLIENT_ROAM (instead of TT_CLIENT_PENDING). Marking a global entry with TT_CLIENT_PENDING will end up in keeping such entry forever (because this flag is only meant to be used with local entries and it is never checked on global ones). In the worst case (all the clients roaming to the same node A) the local and the global table will contain exactly the same clients. Batman-adv will continue to work, but the memory usage is duplicated. Signed-off-by: Antonio Quartulli <ordex@autistici.org>
* | | | | tcp_memcontrol: fix reversed if conditionDan Carpenter2011-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should only dereference the pointer if it's valid, not the other way round. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: ping: remove some sparse errorsEric Dumazet2011-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net/ipv4/sysctl_net_ipv4.c:78:6: warning: symbol 'inet_get_ping_group_range_table' was not declared. Should it be static? net/ipv4/sysctl_net_ipv4.c:119:31: warning: incorrect type in argument 2 (different signedness) net/ipv4/sysctl_net_ipv4.c:119:31: expected int *range net/ipv4/sysctl_net_ipv4.c:119:31: got unsigned int *<noident> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | cls_flow: remove one dynamic arrayEric Dumazet2011-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Its better to use a predefined size for this small automatic variable. Removes a sparse error as well : net/sched/cls_flow.c:288:13: error: bad constant expression Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | vlan: static functionsEric Dumazet2011-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6d4cdf47d2 (vlan: add 802.1q netpoll support) forgot to declare as static some private functions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | vlan: add rtnl_dereference() annotationsDan Carpenter2011-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original code generates a Sparse warning: net/8021q/vlan_core.c:336:9: error: incompatible types in comparison expression (different address spaces) It's ok to dereference __rcu pointers here because we are holding the RTNL lock. I've added some calls to rtnl_dereference() to silence the warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | rtnetlink: rtnl_link_register() sanity testEric Dumazet2011-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before adding a struct rtnl_link_ops into link_ops list, check it doesnt clash with a prior one. Based on a previous patch from Alexander Smirnov Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv6: If neigh lookup fails during icmp6 dst allocation, propagate error.David S. Miller2011-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't just succeed with a route that has a NULL neighbour attached. This follows the behavior of addrconf_dst_alloc(). Allowing this kind of route to end up with a NULL neigh attached will result in packet drops on output until the route is somehow invalidated, since nothing will meanwhile try to lookup the neigh again. A statistic is bumped for the case where we see a neigh-less route on output, but the resulting packet drop is otherwise silent in nature, and frankly it's a hard error for this to happen and ipv6 should do what ipv4 does which is say something in the kernel logs. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Remove unused neighbour layer ops.David S. Miller2011-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's simpler to just keep these things out until there is a real user of them, so we can see what the needs actually are, rather than keep these things around as useless overhead. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | netem: add cell concept to simulate special MAC behaviorHagen Paul Pfeifer2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extension can be used to simulate special link layer characteristics. Simulate because packet data is not modified, only the calculation base is changed to delay a packet based on the original packet size and artificial cell information. packet_overhead can be used to simulate a link layer header compression scheme (e.g. set packet_overhead to -20) or with a positive packet_overhead value an additional MAC header can be simulated. It is also possible to "replace" the 14 byte Ethernet header with something else. cell_size and cell_overhead can be used to simulate link layer schemes, based on cells, like some TDMA schemes. Another application area are MAC schemes using a link layer fragmentation with a (small) header each. Cell size is the maximum amount of data bytes within one cell. Cell overhead is an additional variable to change the per-cell-overhead (e.g. 5 byte header per fragment). Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per cell overhead 5 byte): tc qdisc add dev eth0 root netem rate 5kbit 20 100 5 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'batman-adv/next' of git://git.open-mesh.org/linux-mergeDavid S. Miller2011-12-12
|\ \ \ \ \
| * | | | | batman-adv: Only write requested number of byte to user bufferSven Eckelmann2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't write more than the requested number of bytes of an batman-adv icmp packet to the userspace buffer. Otherwise unrelated userspace memory might get overridden by the kernel. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | | | batman-adv: Directly check read of icmp packet in copy_from_userSven Eckelmann2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access_ok read check can be directly done in copy_from_user since a failure of access_ok is handled the same way as an error in __copy_from_user. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | | | batman-adv: bat_socket_read missing checksPaul Kot2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writing a icmp_packet_rr and then reading icmp_packet can lead to kernel memory corruption, if __user *buf is just below TASK_SIZE. Signed-off-by: Paul Kot <pawlkt@gmail.com> [sven@narfation.org: made it checkpatch clean] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | | | batman-adv: format multi-line if in the correct wayAntonio Quartulli2011-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in an multi-line if statement leading edges should line up to the opening parenthesis Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | | | batman-adv: remove extra negation in gw_out_of_range()Dan Carpenter2011-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a typo here where an extra '!' made the check to the opposite of what was intended. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | | | batman-adv: use unregister_netdevice() when softif_create failsSimon Wunderlich2011-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When entering softif_create(), the rtnl lock has already been acquired by store_mesh_iface(). (store_mesh_iface() -> hardif_enable_interface() -> softif_create) In case of an error, we should therefore call unregister_netdevice() instead of unregister_netdev(). unregister_netdev() tries to acquire the rtnl lock itself and deadlocks in this situation. unregister_netdevice() assumes that the rtnl lock is already been held. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
| * | | | | batman-adv: check return value for hash_add()Simon Wunderlich2011-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if hash_add() fails, we should remove the structure to avoid memory leaks. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
| * | | | | batman-adv: generalise tt_local_reset_flags()Antonio Quartulli2011-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tt_local_reset_flags() is actually used for one use case only. It is not generalised enough to be used indifferent situations. This patch make it general enough in order to let other code use it whenever a flag set is requested over the whole hash table (passed as parameter). The function is now called tt_set_flags() Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
| * | | | | batman-adv: create a common substructure for tt_global/local_entryAntonio Quartulli2011-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several functions in the translation table management code assume that the tt_global_entry and tt_local_entry structures have the same initial fields such as 'addr' and 'hash_entry'. To improve the code readability and to avoid mistakes in later changes, a common substructure that substitute the shared fields has been introduced (struct tt_common_entry). Thanks to this modification, it has also been possible to slightly reduce the code length by merging some functions like compare_ltt/gtt() and tt_local/global_hash_find() Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Sven Eckelmann <sven@narfation.org>
| * | | | | batman-adv: report compat_version in version field in case of version mismatchMarek Lindner2011-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>
* | | | | | Display maximum tcp memory allocation in kmem cgroupGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces kmem.tcp.max_usage_in_bytes file, living in the kmem_cgroup filesystem. The root cgroup will display a value equal to RESOURCE_MAX. This is to avoid introducing any locking schemes in the network paths when cgroups are not being actively used. All others, will see the maximum memory ever used by this cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Display current tcp failcnt in kmem cgroupGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces kmem.tcp.failcnt file, living in the kmem_cgroup filesystem. Following the pattern in the other memcg resources, this files keeps a counter of how many times allocation failed due to limits being hit in this cgroup. The root cgroup will always show a failcnt of 0. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Display current tcp memory allocation in kmem cgroupGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces kmem.tcp.usage_in_bytes file, living in the kmem_cgroup filesystem. It is a simple read-only file that displays the amount of kernel memory currently consumed by the cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tcp buffer limitation: per-cgroup limitGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to effectively control the amount of kernel memory pinned by a cgroup. This value is ignored in the root cgroup, and in all others, caps the value specified by the admin in the net namespaces' view of tcp_sysctl_mem. If namespaces are being used, the admin is allowed to set a value bigger than cgroup's maximum, the same way it is allowed to set pretty much unlimited values in a real box. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | per-netns ipv4 sysctl_tcp_memGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tcp memory pressure controlsGlauber Costa2011-12-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>