path: root/net
Commit message (Author, Age)
* mac80211: Look out for some other AP when disassoc is received. (Vivek Natarajan, 2008-11-26)

  When a disassoc packet is received from the AP with a reason code of
  'leaving the BSS', mac80211 should go into DISABLED state just as it
  would do if the AP suddenly went away for some reason, as that is what
  will happen shortly after the AP leaves anyway.

  Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* nl80211: Report max TX power in NL80211_BAND_ATTR_FREQS (Jouni Malinen, 2008-11-26)

  This is useful information to provide for userspace (e.g., hostapd
  needs this to generate Country IE).

  Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* mac80211: Fix pid rate-setting algorithm to allow rate changes (Larry Finger, 2008-11-26)

  In commit 9ea2c74 named "mac80211/drivers: rewrite the rate control API",
  the meaning of status.rates[i].count was changed from number of retries
  to total number of tries. As a result, the pid rate-setting algorithm
  fails because every packet appears to have needed a retransmit.

  Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
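The off-by-one described above can be illustrated with a small sketch. This is not the mac80211 code; `retries_from_tries` is a hypothetical helper, and it assumes the only difference between the two meanings is that the first transmission attempt is now included in the count.

```c
/* Hypothetical helper, not part of mac80211: convert a "total tries"
 * count (the new meaning of the field) back into a "retries" count
 * (the old meaning). Assumes the first attempt is the only difference. */
int retries_from_tries(int tries)
{
	/* A frame that was never attempted has no retries either. */
	return tries > 0 ? tries - 1 : 0;
}
```

Under that assumption a frame delivered on its first attempt reports one try and zero retries, so code that reads the count as retries sees a retransmit on every packet.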
* mac80211: make Minstrel the default rate control algorithm (Luis R. Rodriguez, 2008-11-26)

  This makes minstrel the default rate control algorithm for mac80211.
  For more information see:

  http://wireless.kernel.org/en/developers/Documentation/mac80211/RateControl/minstrel

  If someone can come up with a better algorithm they get a prize
  (undisclosed).

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* mac80211: don't assume driver has been attached on registration (Luis R. Rodriguez, 2008-11-25)

  mac80211's ieee80211_register_hw() is often called within the probe
  path, so it cannot assume the device's driver structure has been
  attached yet. To create the workqueue, use the wiphy's phy%d name
  instead of driver->name. The name doesn't really matter anyway.

  This should fix sporadic oopses found when we race to beat the driver
  pointer setting. Not even sure how this was working properly.

  http://www.kerneloops.org/search.php?search=ieee80211_register_hw

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* mac80211: Use the HT capabilities from the IE instead of the station's caps. (Sujith, 2008-11-25)

  Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: add support for custom firmware regulatory solutions (Luis R. Rodriguez, 2008-11-25)

  This adds an API to cfg80211 to allow wireless drivers to inform us if
  their firmware can handle regulatory considerations *and* they cannot
  map these regulatory domains to an ISO / IEC 3166 alpha2. In these
  cases we skip the first regulatory hint instead of expecting the
  driver to build their own regulatory structure, providing us with an
  alpha2, or using the reg_notifier().

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Zhu Yi <yi.zhu@intel.com>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211/mac80211: Add 802.11d support (Luis R. Rodriguez, 2008-11-25)

  This adds country IE parsing to mac80211 and enables its usage within
  the new regulatory infrastructure in cfg80211. We parse the country IEs
  only on management beacons for the BSSID you are associated to and
  disregard the IEs when the country and environment (indoor, outdoor,
  any) matches the already processed country IE.

  To avoid following misinformed or outdated APs we build and use a
  regulatory domain out of the intersection between what the AP provides
  us on the country IE and what CRDA is aware is allowed on the same
  country.

  A secondary device is allowed to follow only the same country IE as it
  makes no sense for two devices on a system to be in two different
  countries.

  In the case the AP is using country IEs for an incorrect country the
  user may help compliance further by setting the regulatory domain
  before or after the IE is parsed and in that case another intersection
  will be performed.

  CONFIG_WIRELESS_OLD_REGULATORY is supported but requires CRDA present.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
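The intersection idea in this commit can be sketched for a single pair of rules. This is a simplified illustration, not the cfg80211 implementation: `struct simple_reg_rule` and `intersect_rule` are hypothetical names, and real regulatory rules carry more fields (bandwidth, antenna gain, flags) than the three assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified regulatory rule (assumption: a frequency
 * range in kHz plus one power limit is enough for the illustration). */
struct simple_reg_rule {
	uint32_t start_khz;
	uint32_t end_khz;
	uint32_t max_eirp_mbm;
};

static uint32_t u32_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t u32_max(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Intersect what the AP advertises with what the local database allows.
 * Returns false when the two ranges do not overlap at all. */
bool intersect_rule(const struct simple_reg_rule *ap,
		    const struct simple_reg_rule *crda,
		    struct simple_reg_rule *out)
{
	uint32_t start = u32_max(ap->start_khz, crda->start_khz);
	uint32_t end = u32_min(ap->end_khz, crda->end_khz);

	if (start >= end)
		return false;

	out->start_khz = start;
	out->end_khz = end;
	/* The stricter (lower) power limit wins. */
	out->max_eirp_mbm = u32_min(ap->max_eirp_mbm, crda->max_eirp_mbm);
	return true;
}
```

In this sketch the result is never more permissive than either input, which is the property that protects against a misinformed or outdated AP.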
* cfg80211: mark regdomains with > NL80211_MAX_SUPP_REG_RULES invalid (Luis R. Rodriguez, 2008-11-25)

  Let's remain consistent and mark rds with more than
  NL80211_MAX_SUPP_REG_RULES reg rules as invalid in is_valid_rd().

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: call_crda() won't tell us if CRDA was present (Luis R. Rodriguez, 2008-11-25)

  kobject_uevent_env() can return an error but it just tells us if the
  uevent was built/sent or not. It doesn't tell us anything about what
  happened in userspace, whether the udev rule was present, or whether
  CRDA was present. So remove the informative complaint that assumed it
  would tell us such things.

  Note that you can determine if CRDA is present after loading cfg80211
  by using:

    is_old_static_regdom(cfg80211_regdomain)

  but this doesn't account for possible user install after initial boot,
  and also for when the user uses the static EU regulatory domain.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: expect different rd in cfg80211 when intersecting (Luis R. Rodriguez, 2008-11-25)

  When intersecting it is possible that set_regdom() was called with a
  regulatory domain which we'll only use as an aid to build a final
  regulatory domain.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: separate intersection section in __set_regdom() (Luis R. Rodriguez, 2008-11-25)

  So far the __set_regdom() code is pretty generic as the intersection
  case is fairly straightforward; this will however change when 802.11d
  support is added, so let's separate the intersection code for now in
  preparation for 802.11d support.

  This patch only has slight functional changes.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: remove switch from __set_regdom() (Luis R. Rodriguez, 2008-11-25)

  We have control over the REGDOM_SET_BY_* macros passed so remove the
  switch.

  This patch has no functional changes.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: remove switch from __regulatory_hint() (Luis R. Rodriguez, 2008-11-25)

  We have complete control over the REGDOM_SET_BY_* enum passed down to
  __regulatory_hint(), so there is no need to account for unexpected
  REGDOM_SET_BY_* values. Let's just remove the switch statement, as this
  code does not change and won't change even when we add 802.11d support.

  This patch has no functional changes.

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
* cfg80211: mark negative frequencies as invalid (Luis R. Rodriguez, 2008-11-25)

  Regulatory rules with negative frequencies are now marked as invalid
  in is_valid_reg_rule().

  Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
  Acked-by: Johannes Berg <johannes@sipsolutions.net>
  Signed-off-by: John W. Linville <linville@tuxdriver.com>
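A validity check of this kind might look roughly like the sketch below. It is not the body of is_valid_reg_rule(); `struct simple_freq_range` and `range_is_valid` are hypothetical names, the frequencies are assumed to be stored as signed kHz values, and the ordering check is an extra assumption not stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified frequency range (assumption: signed kHz). */
struct simple_freq_range {
	int32_t start_khz;
	int32_t end_khz;
};

/* Reject ranges that cannot describe real spectrum. */
bool range_is_valid(const struct simple_freq_range *r)
{
	/* Negative frequencies are meaningless: mark the rule invalid. */
	if (r->start_khz < 0 || r->end_khz < 0)
		return false;
	/* Assumption for this sketch: a range must not run backwards. */
	if (r->start_khz > r->end_khz)
		return false;
	return true;
}
```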
* xfrm: remove useless forward declarations (Alexey Dobriyan, 2008-11-25)

  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* ah4/ah6: remove useless NULL assignments (Alexey Dobriyan, 2008-11-25)

  struct will be kfreed in a moment, so...

  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* DCB: fix kconfig option (Jeff Kirsher, 2008-11-25)

  Since the netlink option for DCB is necessary for DCB to actually be
  useful, simplify the Kconfig option. In addition, add useful help text
  for the Kconfig option.

  Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: handle shift/merge of cloned skbs too (Ilpo Järvinen, 2008-11-25)

  This caused me to get repeatably:

    tcpdump: pcap_loop: recvfrom: Bad address

  Happens occasionally when I tcpdump my for-looped test xfers:

    while [ : ]; do echo -n "$(date '+%s.%N') "; ./sendfile; sleep 20; done

  Rest of the relevant commands:

    ethtool -K eth0 tso off
    tc qdisc add dev eth0 root netem drop 4%
    tcpdump -n -s0 -i eth0 -w sacklog.all

  Running net-next under kvm, connection goes to the same host (basically
  just out of kvm). The connection itself works ok and data gets sent
  without corruption even with a large number of tests while tcpdump
  fails usually within less than 5 tests.

  Whether it only happens because of this change or not, I don't know
  for sure but it's the only thing with which I've seen that error. The
  non-cloned variant works w/o it for much longer time. I'm yet to debug
  where the error actually comes from.

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: add some mibs to track collapsing (Ilpo Järvinen, 2008-11-25)

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Make shifting not clear the hints (Ilpo Järvinen, 2008-11-25)

  The earlier version was just a very basic one which is "playing safe"
  by always clearing the hints. However, clearing a hint is an extremely
  costly operation with large windows, so it must be avoided whenever
  possible. With shifting, too, there is a way to achieve not-clearing.

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Try to restore large SKBs while SACK processing (Ilpo Järvinen, 2008-11-25)

  During SACK processing, most of the benefits of TSO are eaten by the
  SACK blocks that one-by-one fragment SKBs to MSS sized chunks. Then
  we're in trouble when cleanup work for them has to be done when a large
  cumulative ACK comes. Try to return to the pre-split state already
  while more and more SACK info gets discovered, by combining newly
  discovered SACK areas with the previous skb if that's SACKed as well.

  This approach has a number of benefits:

  1) The processing overhead is spread more equally over the RTT
  2) Write queue has fewer skbs to process (affects everything which has
     to walk in the queue past the sacked areas)
  3) Write queue is consistent the whole time, so no other parts of TCP
     have to be aware of this (this was not the case with some other
     approach that was, well, quite intrusive all around).
  4) Clean_rtx_queue can release most of the pages using a single
     put_page instead of the previous PAGE_SIZE/mss+1 calls

  In case a hole is fully filled by the new SACK block, we attempt to
  combine the next skb too, which allows construction of skbs that are
  even larger than what tso split them to, and it handles patterns with a
  hole on every nth segment, which often occur during slow start
  overshoot, pretty nicely. Though for this to be really useful, a
  retransmission would also have to get lost, since cumulative ACKs
  advance one hole at a time in the most typical case.

  TODO: handle upwards only merging. That should be rather easy when
  segment is fully sacked but I'm leaving that as future work item (it
  won't make very large difference anyway since this current approach
  already covers quite a lot of normal cases).

  I was earlier thinking of some sophisticated way of tracking timestamps
  of the first and the last segment but later on realized that it won't
  be that necessary at all to store the timestamp of the last segment.
  The cases that can occur are basically either:

  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous is due to reordering => having the timestamp of the
     last segment there is just skewing things more off than does some
     good since the ack got triggered by one of the holes (besides some
     subtle issues that would make determining right hole/skb even
     harder problem). Anyway, it has nothing to do with this change
     then.

  I choose to route some abnormal looking cases with goto noop, some
  could be handled differently (eg., by stopping the walking at that skb
  but again). In general, they either shouldn't happen at all or are rare
  enough to make no difference in practice.

  In theory this change (as whole) could cause some macroscale regression
  (global) because of cache misses that are taken over the round-trip
  time but it gets very likely better because of much less (local) cache
  misses per other write queue walkers and the big recovery clearing
  cumulative ack.

  Worth noting that these benefits would be very easy to get also without
  TSO/GSO being on as long as the data is in pages so that we can merge
  them. Currently I won't let that happen because DSACK splitting at a
  fragment would mess up pcounts due to sk_can_gso in
  tcp_set_skb_tso_segs. Once DSACK fragments are avoided, we have some
  conditions that can be made less strict.

  TODO: I will probably have to convert the excessive pointer passing to
  struct sacktag_state... :-)

  My testing revealed that considerable amount of skbs couldn't be
  shifted because they were cloned (most likely still awaiting tx
  reclaim)...

  [The rest is considering future work instead since I got repeatably
  EFAULT to tcpdump's recvfrom when I added pskb_expand_head to deal with
  clones, so I separated that into another, later patch]

  ...To counter that, I gave up on the fifth advantage:

  5) When growing previous SACK block, less allocs for new skbs are done,
     basically a new alloc is needed only when new hole is detected and
     when the previous skb runs out of frags space

  ...which now only happens if reclaim is fast enough to dispose the
  clone before the SACK block comes in (the window is RTT long),
  otherwise we'll have to alloc some.

  With clones being handled I got these numbers (will be somewhat worse
  without that), taken with fine-grained mibs:

    TCPSackShifted 398
    TCPSackMerged 877
    TCPSackShiftFallback 320
    TCPSACKCOLLAPSEFALLBACKGSO 0
    TCPSACKCOLLAPSEFALLBACKSKBBITS 0
    TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
    TCPSACKCOLLAPSEFALLBACKPREVBITS 318
    TCPSACKCOLLAPSEFALLBACKMSS 1
    TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
    TCPSACKCOLLAPSENOOPSEQ 0
    TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
    TCPSACKCOLLAPSENOOPSMALLLEN 0
    TCPSACKCOLLAPSEHOLE 12

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: make tcp_sacktag_one able to handle partial skb too (Ilpo Järvinen, 2008-11-25)

  This is preparatory work for the SACK combiner patch, which may have to
  count TCP state changes for only a part of the skb because it will
  intentionally avoid splitting an skb into SACKed and not sacked parts.

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Make SACK code to split only at mss boundaries (Ilpo Järvinen, 2008-11-25)

  Sadly enough, this adds a possible divide, though we try to avoid it by
  checking for one mss as the common case.

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
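The "check one mss first, divide only otherwise" idea can be sketched as follows. This is an illustration rather than the patch itself: `round_up_to_mss` is a hypothetical helper, rounding upwards is an assumption made for the example, and `mss` is assumed to be non-zero.

```c
/* Hypothetical helper: move a split point to an mss boundary.
 * Assumptions: mss > 0, and rounding goes upwards. */
unsigned int round_up_to_mss(unsigned int len, unsigned int mss)
{
	unsigned int rounded;

	/* Common case: at most one segment, so no division is needed. */
	if (len <= mss)
		return mss;

	/* Slow path: the divide the commit message regrets. */
	rounded = (len / mss) * mss;
	if (rounded < len)
		rounded += mss;
	return rounded;
}
```

With an mss of 1448, a length of 1000 takes the fast path and yields 1448, while 1449 takes the slow path and yields 2896.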
* tcp: more aggressive skipping (Ilpo Järvinen, 2008-11-25)

  I knew already when rewriting the sacktag that this condition was too
  conservative. Change it now, since it prevents a lot of useless work
  (especially in the sack shifter decision code that is being added by a
  later patch). This shouldn't change anything really, just save some
  processing regardless of the shifter.

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: move tcp_simple_retransmit to tcp_input (Ilpo Järvinen, 2008-11-25)

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: collapse more than two on retransmission (Ilpo Järvinen, 2008-11-25)

  I always had thought that collapsing up to two at a time was an
  intentional decision to avoid excessive processing if 1 byte sized skbs
  are to be combined for a full mtu, and consecutive retransmissions
  would make the size of the retransmittee double each round anyway, but
  some recent discussion made me understand that was not the case. Thus
  make collapse work more and wait less.

  It would be possible to take advantage of the shifting machinery (added
  in the later patch) in the case of paged data but that can be
  implemented on top of this change.

  tcp_skb_is_last check is now provided by the loop.

  I tested a bit (ss-after-idle-off, fill 4096x4096B xfer, 10s sleep +
  4096 x 1byte writes while dropping them for a while with netem):

    . 16774097:16775545(1448) ack 1 win 46
    . 16775545:16776993(1448) ack 1 win 46
    . ack 16759617 win 2399
    P 16776993:16777217(224) ack 1 win 46
    . ack 16762513 win 2399
    . ack 16765409 win 2399
    . ack 16768305 win 2399
    . ack 16771201 win 2399
    . ack 16774097 win 2399
    . ack 16776993 win 2399
    . ack 16777217 win 2399
    P 16777217:16777257(40) ack 1 win 46
    . ack 16777257 win 2399
    P 16777257:16778705(1448) ack 1 win 46
    P 16778705:16780153(1448) ack 1 win 46
    FP 16780153:16781313(1160) ack 1 win 46
    . ack 16778705 win 2399
    . ack 16780153 win 2399
    F 1:1(0) ack 16781314 win 2399

  While without drop-all period I get this:

    . 16773585:16775033(1448) ack 1 win 46
    . ack 16764897 win 9367
    . ack 16767793 win 9367
    . ack 16770689 win 9367
    . ack 16773585 win 9367
    . 16775033:16776481(1448) ack 1 win 46
    P 16776481:16777217(736) ack 1 win 46
    . ack 16776481 win 9367
    . ack 16777217 win 9367
    P 16777217:16777218(1) ack 1 win 46
    P 16777218:16777219(1) ack 1 win 46
    P 16777219:16777220(1) ack 1 win 46
    ...
    P 16777247:16777248(1) ack 1 win 46
    . ack 16777218 win 9367
    . ack 16777219 win 9367
    ...
    . ack 16777233 win 9367
    . ack 16777248 win 9367
    P 16777248:16778696(1448) ack 1 win 46
    P 16778696:16780144(1448) ack 1 win 46
    FP 16780144:16781313(1169) ack 1 win 46
    . ack 16780144 win 9367
    F 1:1(0) ack 16781314 win 9367

  The window seems to be 30-40 segments, which were successfully combined
  into: P 16777217:16777257(40) ack 1 win 46

  Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() (Eric Dumazet, 2008-11-24)

  We can reduce pressure on the dst entry refcount that slows down the
  UDP transmit path on SMP machines. This pressure is visible on RTP
  servers when delivering content to mediagateways, especially big ones,
  handling thousands of streams. Several cpus send UDP frames to the same
  destination, hence use the same dst entry.

  This patch makes ip_push_pending_frames() steal the refcount its
  callers had to take when filling inet->cork.dst.

  This doesn't avoid all refcounting, but still gives speedups on SMP, on
  the UDP/RAW transmit path.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid a pair of dst_hold()/dst_release() in ip_append_data() (Eric Dumazet, 2008-11-24)

  We can reduce pressure on the dst entry refcount that slows down the
  UDP transmit path on SMP machines. This pressure is visible on RTP
  servers when delivering content to mediagateways, especially big ones,
  handling thousands of streams. Several cpus send UDP frames to the same
  destination, hence use the same dst entry.

  This patch makes ip_append_data() eventually steal the refcount its
  callers had to take on the dst entry.

  This doesn't avoid all refcounting, but still gives speedups on SMP, on
  the UDP/RAW transmit path.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: gen_estimator: Fix gen_kill_estimator() lookups (Jarek Poplawski, 2008-11-24)

  gen_kill_estimator() linear list lookups are very slow, and e.g. while
  deleting a large number of HTB classes soft lockups were reported. Here
  is another try to fix this problem: this time internally, with rbtree,
  so similarly to Jamal's hashing idea IIRC. (Looking for next hits could
  be still optimized, but it's really fast as it is.)

  Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru>
  Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
  Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
  Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: sch_drr: fix drr_dequeue loop() (Patrick McHardy, 2008-11-24)

  Jarek Poplawski points out:

    If all child qdiscs of sch_drr are non-work-conserving (e.g. sch_tbf)
    drr_dequeue() will busy-loop waiting for skbs instead of leaving the
    job for a watchdog. Checking for list_empty() in each loop isn't
    necessary either, because this can never be true except the first
    time.

  Using non-work-conserving qdiscs as children of DRR makes no sense,
  simply bail out in that case.

  Reported-by: Jarek Poplawski <jarkao2@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make sure BHs are disabled in sock_prot_inuse_add() (Eric Dumazet, 2008-11-24)

  There is still a call to sock_prot_inuse_add() in af_netlink while in a
  preemptible section. Add explicit BH disable around this call.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make sure BHs are disabled in sock_prot_inuse_add() (Eric Dumazet, 2008-11-24)

  The rule of calling sock_prot_inuse_add() is that BHs must be disabled.
  Some new calls were added where this was not true and this triggers
  warnings as reported by Ilpo.

  Fix this by adding explicit BH disabling around those call sites, or
  moving the sock_prot_inuse_add() call inside an existing BH disabled
  section.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* eth: Declare an optimized compare_ether_addr_64bits() function (Eric Dumazet, 2008-11-24)

  Linus mentioned we could try to perform long word operations, even on
  potentially unaligned addresses, on x86 at least. David mentioned the
  HAVE_EFFICIENT_UNALIGNED_ACCESS test to handle this on all arches that
  have efficient unaligned accesses.

  I tried this idea and got nice assembly on 32 bits:

    158: 33 82 38 01 00 00   xor 0x138(%edx),%eax
    15e: 33 8a 34 01 00 00   xor 0x134(%edx),%ecx
    164: c1 e0 10            shl $0x10,%eax
    167: 09 c1               or %eax,%ecx
    169: 74 0b               je 176 <eth_type_trans+0x87>

  And very nice assembly on 64 bits of course (one xor, one shl)

  Nice oprofile improvement in eth_type_trans(), 0.17 % instead of
  0.41 %, expected since we remove 8 instructions on a fast path.

  This patch implements a compare_ether_addr_64bits() function, that uses
  the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ifdef to efficiently perform
  the 6 bytes comparison on all capable arches.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
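The "one xor, one shl" comparison can be approximated in portable userspace C. This is a sketch of the technique, not the kernel function: `ether_addr_differs_64` is a hypothetical name, both buffers are assumed to have two readable padding bytes after the 6-byte address, `memcpy` stands in for the direct unaligned load, and the byte-order test relies on the GCC/Clang `__BYTE_ORDER__` macro.

```c
#include <stdint.h>
#include <string.h>

/* Returns 0 when the two 6-byte addresses are equal, non-zero otherwise.
 * Assumption: each buffer is at least 8 bytes long (6 address bytes
 * followed by 2 bytes of padding whose contents do not matter). */
unsigned int ether_addr_differs_64(const uint8_t a[8], const uint8_t b[8])
{
	uint64_t x, y, fold;

	/* memcpy is the portable way to express an unaligned 64-bit load. */
	memcpy(&x, a, sizeof(x));
	memcpy(&y, b, sizeof(y));
	fold = x ^ y;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* Padding bytes 6 and 7 are the most significant: shift them out. */
	fold <<= 16;
#else
	/* On big endian the padding bytes are the least significant. */
	fold >>= 16;
#endif
	return fold != 0;
}
```

The shift discards whatever differences exist in the two padding bytes, so only the six address bytes decide the result.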
* net: Make sure BHs are disabled in sock_prot_inuse_add() (David S. Miller, 2008-11-23)

  The rule of calling sock_prot_inuse_add() is that BHs must be disabled.
  Some new calls were added where this was not true and this triggers
  warnings as reported by Ilpo.

  Fix this by adding explicit BH disabling around those call sites.

  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix tunnels in netns after ndo_ changes (Alexey Dobriyan, 2008-11-23)

  dev_net_set() should be the very first thing after alloc_netdev().

  "ndo_" changes turned simple assignment (which is OK to do before netns
  assignment) into quite non-trivial operation (which is not OK, init_net
  was used). This leads to incomplete initialisation of tunnel device in
  netns.

    BUG: unable to handle kernel NULL pointer dereference at 00000004
    IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f
    *pde = 00000000
    Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
    last sysfs file: /sys/class/net/lo/operstate

    Pid: 10, comm: netns Not tainted (2.6.28-rc6 #1)
    EIP: 0060:[<c02efdb5>] EFLAGS: 00010246 CPU: 0
    EIP is at ip6_tnl_exit_net+0x37/0x4f
    EAX: 00000000 EBX: 00000020 ECX: 00000000 EDX: 00000003
    ESI: c5caef30 EDI: c782bbe8 EBP: c7909f50 ESP: c7909f48
     DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
    Process netns (pid: 10, ti=c7908000 task=c7905780 task.ti=c7908000)
    Stack:
     c03e75e0 c7390bc8 c7909f60 c0245448 c7390bd8 c7390bf0 c7909fa8 c012577a
     00000000 00000002 00000000 c0125736 c782bbe8 c7909f90 c0308fe3 c782bc04
     c7390bd4 c0245406 c084b718 c04f0770 c03ad785 c782bbe8 c782bc04 c782bc0c
    Call Trace:
     [<c0245448>] ? cleanup_net+0x42/0x82
     [<c012577a>] ? run_workqueue+0xd6/0x1ae
     [<c0125736>] ? run_workqueue+0x92/0x1ae
     [<c0308fe3>] ? schedule+0x275/0x285
     [<c0245406>] ? cleanup_net+0x0/0x82
     [<c0125ae1>] ? worker_thread+0x81/0x8d
     [<c0128344>] ? autoremove_wake_function+0x0/0x33
     [<c0125a60>] ? worker_thread+0x0/0x8d
     [<c012815c>] ? kthread+0x39/0x5e
     [<c0128123>] ? kthread+0x0/0x5e
     [<c0103b9f>] ? kernel_thread_helper+0x7/0x10
    Code: db e8 05 ff ff ff 89 c6 e8 dc 04 f6 ff eb 08 8b 40 04 e8 38 89 f5 ff 8b 44 9e 04 85 c0 75 f0 43 83 fb 20 75 f2 8b 86 84 00 00 00 <8b> 40 04 e8 1c 89 f5 ff e8 98 04 f6 ff 89 f0 e8 f8 63 e6 ff 5b
    EIP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f SS:ESP 0068:c7909f48
    ---[ end trace 6c2f2328fccd3e0c ]---

  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Convert TCP/DCCP listening hash tables to use RCU (Eric Dumazet, 2008-11-23)

  This is the last step to be able to perform full RCU lookups in
  __inet_lookup(): after the established/timewait tables, we add RCU
  lookups to the listening hash table.

  The only trick here is that a socket of a given type (TCP ipv4, TCP
  ipv6, ...) can now move between two different tables (established and
  listening) during an RCU grace period, so we must use different 'nulls'
  end-of-chain values for the two tables.

  We define a large value:

    #define LISTENING_NULLS_BASE (1U << 29)

  So that slots in the listening table are guaranteed to have different
  end-of-chain values than slots in the established table. A reader can
  still detect that it finished its lookup in the right chain.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
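The role of the distinct end-of-chain values can be sketched like this. Only the LISTENING_NULLS_BASE definition comes from the commit message; the three helper functions are hypothetical, and the sketch assumes established-table slot numbers always stay below the base.

```c
#include <stdbool.h>

#define LISTENING_NULLS_BASE (1U << 29)

/* Hypothetical helpers: the end-of-chain marker stored in each chain.
 * Assumption: established slot numbers are always < LISTENING_NULLS_BASE. */
unsigned int established_nulls(unsigned int slot)
{
	return slot;
}

unsigned int listening_nulls(unsigned int slot)
{
	return LISTENING_NULLS_BASE + slot;
}

/* A lockless reader walks a chain and looks at the marker it ends on.
 * If the marker is not the one for the chain it started in, the socket
 * it was standing on moved to another chain and the lookup restarts. */
bool lookup_must_restart(unsigned int seen_nulls, unsigned int expected_nulls)
{
	return seen_nulls != expected_nulls;
}
```

Because the two ranges never overlap, a reader that started in a listening chain and drifted into an established chain (or the reverse) always notices, even when the slot numbers happen to be equal.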
* dccp: Header option insertion routine for feature-negotiation (Gerrit Renker, 2008-11-23)

  The patch extends existing code:
   * Confirm options divide into the confirmed value plus an optional
     preference list for SP values. Previously only the preference list
     was echoed for SP values, now the confirmed value is added as per
     RFC 4340, 6.1;
   * length and sanity checks are added to avoid illegal memory (or
     NULL) access.

  Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Support for Mandatory options (Gerrit Renker, 2008-11-23)

  Support for Mandatory options is provided by this patch, which will be
  used by subsequent feature-negotiation patches.

  Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
  Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Increase the scope of variable-length htonl/ntohl functions (Gerrit Renker, 2008-11-23)

  This extends the scope of two available functions,
  encode|decode_value_var, to work up to 6 (8) bytes, to match maximum
  requirements in the RFC.

  These functions are going to be used both by general option processing
  and feature negotiation code, hence declarations have been put into
  feat.h.

  Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
  Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
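A variable-length network-byte-order encoder and decoder of this kind could look like the sketch below. The `_sketch` names are hypothetical and are not the functions declared in feat.h; the sketch assumes `len` is between 1 and 8 and that the buffers are at least `len` bytes long.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the low `len` bytes of `value` most-significant byte first.
 * Assumption: 1 <= len <= 8 and `to` has room for `len` bytes. */
void encode_value_var_sketch(uint64_t value, uint8_t *to, size_t len)
{
	for (size_t i = 0; i < len; i++)
		to[i] = (uint8_t)(value >> (8 * (len - 1 - i)));
}

/* Read `len` bytes, most-significant first, back into an integer.
 * Assumption: 1 <= len <= 8 and `from` holds `len` readable bytes. */
uint64_t decode_value_var_sketch(const uint8_t *from, size_t len)
{
	uint64_t value = 0;

	for (size_t i = 0; i < len; i++)
		value = (value << 8) | from[i];
	return value;
}
```

Encoding 0x010203040506 with a length of 6 produces the bytes 01 02 03 04 05 06, and decoding those six bytes returns the original value.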
* dccp: API to query the current TX/RX CCID (Gerrit Renker, 2008-11-23)

  This provides a function to query the current TX/RX CCID dynamically,
  without reliance on the minisock value, using dynamic information
  available in the currently loaded CCID module.

  This query function is then used to
  (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
  (b) replace the current test for "which CCID is in use" in probe.c.

  Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Set per-connection CCIDs via socket options (Gerrit Renker, 2008-11-23)

  With this patch, TX/RX CCIDs can now be changed on a per-connection
  basis, which overrides the defaults set by the global sysctl variables
  for TX/RX CCIDs.

  To make full use of this facility, the remaining patches of this patch
  set are needed, which track dependencies and activate negotiated
  feature values.

  Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: af_netlink should update its inuse counter (Eric Dumazet, 2008-11-23)

  In order to have relevant information for NETLINK protocol, in
  /proc/net/protocols, we should use sock_prot_inuse_add() to update a
  (percpu and pernamespace) counter of inuse sockets.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: some optimizations in af_inet (Eric Dumazet, 2008-11-23)

  1) Use eq_net() in inet_netns_ok() to speed up socket creation if
     !CONFIG_NET_NS

  2) Reorder the tests about inet_ehash_secret generation (once only).
     Use the unlikely() macro when testing if inet_ehash_secret is
     already generated.

  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove redundant argument comments (Qinghuang Feng, 2008-11-21)

  Remove redundant argument comments in files of net/*

  Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 (David S. Miller, 2008-11-21)
  * wireless: clean up sysfs code using %pM (Johannes Berg, 2008-11-21)

    Remove converting the MAC address to a string by a direct byte
    conversion and use %pM instead. Since the code is now boilerplate,
    use a macro to define the show functions, and also use the shorter
    __ATTR_RO macro to define the attributes.

    Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  * net/ieee80211 -> drivers/net/ipw2x00/libipw_* rename (John W. Linville, 2008-11-21)

    The old ieee80211 code only remains as a support library for the
    ipw2100 and ipw2200 drivers. So, move the code and rename it
    appropriately to reflect its true purpose and status.

    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  * lib80211: consolidate crypt init routines (John W. Linville, 2008-11-21)

    Signed-off-by: John W. Linville <linville@tuxdriver.com>
  * lib80211: absorb crypto bits from net/ieee80211 (John W. Linville, 2008-11-21)

    These bits are shared already between ipw2x00 and hostap, and could
    probably be shared both more cleanly and with other drivers. This
    commit simply relocates the code to lib80211 and adjusts the drivers
    appropriately.

    Signed-off-by: John W. Linville <linville@tuxdriver.com>