aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
...
| * | | | netfilter: nf_tables: expression ops overloadingPatrick McHardy2013-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split the expression ops into two parts and support overloading of the runtime expression ops based on the requested function through a ->select_ops() callback. This can be used to provide optimized implementations, for instance for loading small aligned amounts of data from the packet or inlining frequently used operations into the main evaluation loop. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_tables: add netlink set APIPatrick McHardy2013-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the new netlink API for maintaining nf_tables sets independently of the ruleset. The API supports the following operations: - creation of sets - deletion of sets - querying of specific sets - dumping of all sets - addition of set elements - removal of set elements - dumping of all set elements Sets are identified by name, each table defines an individual namespace. The name of a set may be allocated automatically, this is mostly useful in combination with the NFT_SET_ANONYMOUS flag, which destroys a set automatically once the last reference has been released. Sets can be marked constant, meaning they're not allowed to change while linked to a rule. This allows to perform lockless operation for set types that would otherwise require locking. Additionally, if the implementation supports it, sets can (as before) be used as maps, associating a data value with each key (or range), by specifying the NFT_SET_MAP flag and can be used for interval queries by specifying the NFT_SET_INTERVAL flag. Set elements are added and removed incrementally. All element operations support batching, reducing netlink message and set lookup overhead. The old "set" and "hash" expressions are replaced by a generic "lookup" expression, which binds to the specified set. Userspace is not aware of the actual set implementation used by the kernel anymore, all configuration options are generic. Currently the implementation selection logic is largely missing and the kernel will simply use the first registered implementation supporting the requested operation. Eventually, the plan is to have userspace supply a description of the data characteristics and select the implementation based on expected performance and memory use. This patch includes the new 'lookup' expression to look up for element matching in the set. This patch includes kernel-doc descriptions for this set API and it also includes the following fixes. From Patrick McHardy: * netfilter: nf_tables: fix set element data type in dumps * netfilter: nf_tables: fix indentation of struct nft_set_elem comments * netfilter: nf_tables: fix oops in nft_validate_data_load() * netfilter: nf_tables: fix oops while listing sets of built-in tables * netfilter: nf_tables: destroy anonymous sets immediately if binding fails * netfilter: nf_tables: propagate context to set iter callback * netfilter: nf_tables: add loop detection From Pablo Neira Ayuso: * netfilter: nf_tables: allow to dump all existing sets * netfilter: nf_tables: fix wrong type for flags variable in newelem Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: add nftablesPatrick McHardy2013-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: nf_nat: move alloc_null_binding to nf_nat_core.cPablo Neira Ayuso2013-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to nat_decode_session, alloc_null_binding is needed for both ip_tables and nf_tables, so move it to nf_nat_core.c. This change is required by nf_tables. This is an adapted version of the original patch from Patrick McHardy. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: pass hook ops to hookfnPatrick McHardy2013-10-14
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | inet_diag: use sock_gen_put()Eric Dumazet2013-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 6 : Use sock_gen_put() from inet_diag_dump_one_icsk() for future SYN_RECV support. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | batman-adv: Add dummy soft-interface rx mode handlerLinus Lüssing2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not actually need to set any rx filters for the virtual batman soft interface. However a dummy handler enables a user to set static multicast listeners for instance. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: make batadv_tt_save_orig_buffer() genericAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simple batadv_tt_save_orig_buffer() refactoring aiming to make it more generic and avoid useless casts. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: implement batadv_tt_entriesAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement batadv_tt_entries() to get the number of entries fitting in a given amount of bytes. This computation is done several times in the code and therefore it is useful to have an helper function. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: remove useless find_router look upSimon Wunderlich2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is not used anymore with the new fragmentation, and it might actually mess up the bonding code because find_router() assumes it is only called once per packet. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: consider network coding overhead when calculating required mtuMarek Lindner2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The module prints a warning when the MTU on the hard interface is too small to transfer payload traffic without fragmentation. The required MTU is calculated based on the encapsulation header size. If network coding is compild into the module its header size is taken into account as well. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: create common header for ICMP packetsAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the icmp and the icmp_rr packets share the same initial fields since they use the same code to be processed and forwarded. Extract the common fields and put them into a separate struct so that future ICMP packets can be easily added without bloating the packet definition. However, keep the seqno field outside of the newly created common header because future ICMP types may require a bigger sequence number space. This change breaks compatibility due to fields reordering in the ICMP headers. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: use htons when possibleAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When comparing a network ordered value with a constant, it is better to convert the constant at compile time by means of htons() instead of converting the value at runtime using ntohs(). This refactoring may slightly improve the code performance. Moreover substitute __constant_htons() with htons() since the latter increase readability and it is smart enough to be as efficient as the former Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
* | | | batman-adv: Fragment and send skbs larger than mtuMartin Hundebøll2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Non-broadcast packets larger than MTU are fragmented and sent with an encapsulating header. Up to 16 fragments are supported, which are sent in reverse order on the wire to allow minimal memory copying when creating fragments. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: Receive fragmented packets and mergeMartin Hundebøll2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fragments arriving at their destination are buffered for later merge. Merged packets are passed to the main receive function as had they never been fragmented. Fragments are forwarded without merging if the MTU of the outgoing interface is smaller than the size of the merged packet. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: Remove old fragmentation codeMartin Hundebøll2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the existing fragmentation code before adding the new version and delete unicast.{h,c}. batadv_unicast_send_skb() is moved to send.c and renamed to batadv_send_skb_unicast(). fragmentation entry in sysfs (bat_priv->fragmentation) is kept for use in the new fragmentation code. BATADV_UNICAST_FRAG packet type is renamed to BATADV_FRAG for use in the new fragmentation code. Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | | batman-adv: use VLAN_ETH_HLEN instead of sizeof(struct vlan_eth_hdr)Antonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: h_vlan_encapsulated_proto access refactoringAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of a VLAN tagged frame the ethhdr pointer is moved forward by 4 bytes so that the offset of h_proto in struct ethhdr matches the real h_vlan_encapsulated_proto address in the skb. While this trickery is correct it makes the code harder to understand and may lead to bugs in case of re-use of ethhdr for other purposes. This patch introduces a proto variable to make things cleaner and easier to understand. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: don't use call_rcu if not neededAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | batadv_tt_global_entry_free_ref uses call_rcu to schedule a function which will only free the global entry itself. For this reason call_rcu is useless and kfree_rcu can be used to simplify the code. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: remove batadv_tt_global_add_orig declarationAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | batadv_tt_global_add_orig is neither used nor implemented anymore, therefore it is possible to remove its declaration Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: make tt_global_add static and return boolAntonio Quartulli2013-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | batadv_tt_global_add is not used anymore outside of the TT code thanks to the TVLV implementation. It can therefore be declared as static Last user has been removed by 3de4e64df0f1326db7cc0ef25f5af8522850252d ("batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packets") Moreover make it return bool since its result can be either 0 or 1. Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
* | | | batman-adv: only add recordroute information to icmp request/replySimon Wunderlich2013-10-12
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding host information for record route is only required for ICMP requests and replys, and should not be added to just any (future?) packet type. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
* | | tcp: tcp_transmit_skb() optimizationsEric Dumazet2013-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) We need to take a timestamp only for skb that should be cloned. Other skbs are not in write queue and no rtt estimation is done on them. 2) the unlikely() hint is wrong for receivers (they send pure ACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: MF Nowlan <fitz@cs.yale.edu> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-By: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller2013-10-11
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Included changes: - update emails for A. Quartulli and M. Lindner in MAINTAINERS - switch to the next on-the-wire protocol version - introduce the T(ype) V(ersion) L(ength) V(alue) framework - adjust the existing components to make them use the new TVLV code - make the TT component use CRC32 instead of CRC16 - totally remove the VIS functionality (has been moved to userspace) - reorder packet types and flags - add static checks on packet format - remove __packed from batadv_ogm_packet
| * | | batman-adv: reorder batadv_iv_flagsSimon Wunderlich2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vis flag is not needed anymore, and since we do a compat bump we can start with the first bit again Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: remove packed from batadv_ogm_packetSimon Wunderlich2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we decreased the struct size from 26 to 24 byte, we can remove __packed as the compiler will not add any more padding. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: reorder packet typesSimon Wunderlich2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reordering the packet type numbers allows us to handle unicast packets in a general way - even if we don't know the specific packet type, we can still forward it. There was already code handling this for a couple of unicast packets, and this is the more generalized version to do that. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: add build check macros for packet member offsetSimon Wunderlich2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we removed the __packed from most of the packets, we should make sure that the offset generated by the compiler are correct for sent/received data. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: remove vis functionalitySimon Wunderlich2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is replaced by a userspace program, we don't need this functionality to bloat the kernel. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: move BATADV_TT_CLIENT_TEMP to higher bitAntonio Quartulli2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Client flags from bit 0 to 7 are sent over the wire. BATADV_TT_CLIENT_TEMP is a local flag and is not supposed to be sent to the network. Therefore it has occupy a higher bit. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | batman-adv: use CRC32C instead of CRC16 in TT codeAntonio Quartulli2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CRC32C has to be preferred to CRC16 because of its possible HW native support and because of the reduced collision probability. With this change the Translation Table component now uses CRC32C to compute the local and global table checksum. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
| * | | batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packetsMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of generating roaming specific packets the TVLV unicast API is used to send roaming information. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - convert tt query packet to use tvlv unicast packetsMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of generating TT specific packets the TVLV unicast API is used to send translation table data. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - convert tt data sent within OGMsMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The translation table meta data (version number, crc checksum, etc) as well as the translation table diff propgated within OGMs now uses the newly introduced tvlv infrastructure. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - add network coding containerMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create network coding container to announce network coding capabilities (if enabled). Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - add distributed arp table containerMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Create DAT container to announce DAT capabilities (if enabled). Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - gateway download/upload bandwidth containerMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to this patch batman-adv read the advertised uplink bandwidth from userspace and compressed this information into a single byte called "gateway class". Now the download & upload bandwidth information is sent as-is. No userspace change is necessary since the sysfs API always allowed to specify a bandwidth. Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: tvlv - basic infrastructureMarek Lindner2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal is to provide the infrastructure for sending, receiving and parsing information 'containers' while preserving backward compatibility. TVLV (based on the commonly known Type Length Value technique) was chosen as the format for those containers. Even if a node does not know the tvlv type of a certain container it can simply skip the current container and proceed with the next. Past experience has shown features evolve over time, so a 'version' field was added right from the start to allow differentiating between feature variants - hence the name: T(ype) V(ersion) L(ength) V(alue). This patch introduces the basic TVLV infrastructure: * register / unregister tvlv containers to be sent with each OGM (on primary interfaces only) * register / unregister callback handlers to be called upon finding the corresponding tvlv type in a tvlv buffer * unicast tvlv send / receive API calls Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Spyros Gasteratos <morfeas3000@gmail.com> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
| * | | batman-adv: switch to a new packet compatibility versionAntonio Quartulli2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this change batman-adv is breaking compatibility with older versions and it is moving to compat-version 15. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
* | | | inet: rename ir_loc_port to ir_numEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 634fb979e8f ("inet: includes a sock_common in request_sock") I forgot that the two ports in sock_common do not have same byte order : skc_dport is __be16 (network order), but skc_num is __u16 (host order) So sparse complains because ir_loc_port (mapped into skc_num) is considered as __u16 while it should be __be16 Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num), and perform appropriate htons/ntohs conversions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: use ACCESS_ONCE() in tcp_update_pacing_rate()Eric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_pacing_rate is read by sch_fq packet scheduler at any time, with no synchronization, so make sure we update it in a sensible way. ACCESS_ONCE() is how we instruct compiler to not do stupid things, like using the memory location as a temporary variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | inet: includes a sock_common in request_sockEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: gro: allow to build full sized skbEric Dumazet2013-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb, typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags. It's relatively easy to extend the skb using frag_list to allow more frags to be appended into the last sk_buff. This still builds very efficient skbs, and allows reaching 45 MSS per skb. (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional sk_buff) High speed TCP flows benefit from this extension by lowering TCP stack cpu usage (less packets stored in receive queue, less ACK packets processed) Forwarding setups could be hurt, as such skbs will need to be linearized, although its not a new problem, as GRO could already provide skbs with a frag_list. We could make the 65536 bytes threshold a tunable to mitigate this. (First time we need to linearize skb in skb_needs_linearize(), we could lower the tunable to ~16*1460 so that following skb_gro_receive() calls build smaller skbs) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | fib_trie: only calc for the un-first nodebaker.zhang2013-10-10
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a enhancement. for the first node in fib_trie, newpos is 0, bit is 1. Only for the leaf or node with unmatched key need calc pos. Signed-off-by: baker.zhang <baker.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: fix build errors if ipv6 is disabledEric Dumazet2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_IPV6=n is still a valid choice ;) It appears we can remove dead code. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | udp: fix a typo in __udp4_lib_mcast_demux_lookupEric Dumazet2013-10-09
| | | | | | | | | | | | | | | | | | | | | At this point sk might contain garbage. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: make lookups simpler and fasterEric Dumazet2013-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp/dccp: remove twchainEric Dumazet2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP listener refactoring, part 3 : Our goal is to hash SYN_RECV sockets into main ehash for fast lookup, and parallel SYN processing. Current inet_ehash_bucket contains two chains, one for ESTABLISH (and friend states) sockets, another for TIME_WAIT sockets only. As the hash table is sized to get at most one socket per bucket, it makes little sense to have separate twchain, as it makes the lookup slightly more complicated, and doubles hash table memory usage. If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4. [ INET_TW_MATCH() is no longer needed ] I'll provide a follow-up to factorize IPv6 lookup as well, to remove INET6_TW_MATCH() This way, SYN_RECV pseudo sockets will be supported the same. A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and will support SYN_RECV later ]. Note this helper should only be called in real slow path, when rcu lookup found a socket that was moved to another identity (freed/reused immediately), but could eventually be used in other contexts, like sock_edemux() Before patch : dmesg | grep "TCP established" TCP established hash table entries: 524288 (order: 11, 8388608 bytes) After patch : TCP established hash table entries: 524288 (order: 10, 4194304 bytes) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-10-08
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: include/linux/netdevice.h net/core/sock.c Trivial merge issues. Removal of "extern" for functions declaration in netdevice.h at the same time "const" was added to an argument. Two parallel line additions in net/core/sock.c Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | pkt_sched: fq: fix non TCP flows pacingEric Dumazet2013-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Steinar reported FQ pacing was not working for UDP flows. It looks like the initial sk->sk_pacing_rate value of 0 was a wrong choice. We should init it to ~0U (unlimited) Then, TCA_FQ_FLOW_DEFAULT_RATE should be removed because it makes no real sense. The default rate is really unlimited, and we need to avoid a zero divide. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>