...
| * ipvs: reorganize dest trash (Julian Anastasov, 2013-04-01)
    All dests will go to trash, no exceptions. But we have to use the new
    list node t_list for this, due to RCU changes in following patches.
    Dests will wait there for an initial grace period and later for all
    conns and schedulers to put their reference. The dests don't get a
    reference for staying in the dest trash as before.

    As a result, we do not load ip_vs_dest_put with extra checks for the
    last refcnt, and the schedulers do not need to play games with
    atomic_inc_not_zero while selecting the best destination.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert wrr scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations. As the weight for some dest can be reduced during
    dest selection, change the algorithm to check weights by using minimum
    weights in the 1 .. max_weight-(di-1) range, with the same step (di).
    This way we ensure that there will always be a weight >= 1 check
    before claiming that all destinations are overloaded.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
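    A minimal, self-contained C sketch (not the kernel code) of the
    threshold sequence described above; max_weight and di are made-up
    example values. It only illustrates that stepping down by di from
    max_weight always ends with a threshold >= 1:

        #include <stdio.h>

        int main(void)
        {
                /* hypothetical service: weights {3, 6, 9} => di = gcd = 3 */
                int max_weight = 9, di = 3;
                int cw;

                for (cw = max_weight; cw >= 1; cw -= di)
                        printf("pass with threshold: weight >= %d\n", cw);
                /* prints 9, 6, 3 -- the last pass always uses a threshold
                 * between 1 and di, so a weight >= 1 check happens before
                 * all destinations can be declared overloaded */
                return 0;
        }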
| * ipvs: convert wlc scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert sh scheduler to rcu (Julian Anastasov, 2013-04-01)
    Use the 3 new methods to reassign dests.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert sed scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert rr scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations. As the previous entry could be unlinked, limit the
    list traversals to 2 when lookup started from previous entry.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert nq scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert lc scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert lblcr scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations. The read_lock for sched_lock is removed. The
    set.lock is removed because now it is used in rare cases, mostly under
    sched_lock.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert lblc scheduler to rcu (Julian Anastasov, 2013-04-01)
    The schedule method now needs _rcu list-traversal primitive for
    svc->destinations. The read_lock for sched_lock is removed. Use a dead
    flag to prevent new entries from being created while the scheduler is
    reclaimed. Use hlist for the hash table.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert dh scheduler to rcu (Julian Anastasov, 2013-04-01)
    Use the new add_dest and del_dest methods to reassign dests.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: add ip_vs_dest_hold and ip_vs_dest_put (Julian Anastasov, 2013-04-01)
    ip_vs_dest_hold will be used under RCU lock while ip_vs_dest_put can
    be called even after dest is removed from service, as it happens for
    conns and some schedulers.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
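    A rough sketch of what such a hold/put pair typically looks like (the
    real helpers live in the IPVS headers; the refcnt field name is an
    assumption for illustration):

        /* Must be called inside an RCU read-side critical section, so
         * the dest cannot be freed while the reference is taken. */
        static inline void ip_vs_dest_hold(struct ip_vs_dest *dest)
        {
                atomic_inc(&dest->refcnt);
        }

        /* May run long after the dest was unlinked from its service;
         * the trash/cleanup code watches refcnt before freeing. */
        static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
        {
                atomic_dec(&dest->refcnt);
        }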
| * ipvs: preparations for using rcu in schedulers (Julian Anastasov, 2013-04-01)
    Allow schedulers to use rcu_dereference when returning destination on
    lookup. The RCU read-side critical section will allow ip_vs_bind_dest
    to get dest refcnt as preparation for the step where destinations will
    be deleted without an IP_VS_WAIT_WHILE guard that holds the packet
    processing during update.

    Add new optional scheduler methods add_dest, del_dest and upd_dest.
    For now the methods are called together with update_service but
    update_service will be removed in a following change.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
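    A hedged sketch of the calling pattern this implies (helper names
    follow the surrounding commits; the real glue code differs in
    details):

        rcu_read_lock();
        /* scheduler walks svc->destinations with rcu_dereference and
         * returns a candidate dest, or NULL if all are overloaded */
        dest = svc->scheduler->schedule(svc, skb);
        if (dest)
                ip_vs_dest_hold(dest);  /* take refcnt while still inside
                                         * the read-side section */
        rcu_read_unlock();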
| * ipvs: change ip_vs_sched_lock to mutex (Julian Anastasov, 2013-04-01)
    The global list with schedulers ip_vs_schedulers is accessed only from
    user context - configuration and scheduler module [un]registration.
    Use ip_vs_sched_mutex instead.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: avoid kmem_cache_zalloc in ip_vs_conn_new (Julian Anastasov, 2013-04-01)
    We have many fields to set and few to reset, use kmem_cache_alloc
    instead to save some cycles.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: reorder keys in connection structure (Julian Anastasov, 2013-04-01)
    __ip_vs_conn_in_get and ip_vs_conn_out_get are hot places. Optimize
    them, so that ports are matched first. By moving net and fwmark below,
    on 32-bit arch we can fit caddr in 32-byte cache line and all
    addresses in 64-byte cache line.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert connection locking (Julian Anastasov, 2013-04-01)
    Convert __ip_vs_conntbl_lock_array as follows:

    - readers that do not modify conn lists will use RCU lock
    - updaters that modify lists will use spinlock_t

    Now for conn lookups we will use RCU read-side critical section.
    Without using __ip_vs_conn_get such places have access to connection
    fields and can dereference some pointers like pe and pe_data plus the
    ability to update timer expiration. If full access is required we
    contend for reference.

    We add barrier in __ip_vs_conn_put, so that other CPUs see the refcnt
    operation after other writes.

    With the introduction of ip_vs_conn_unlink() we try to reorganize
    ip_vs_conn_expire(), so that unhashing of connections that should stay
    more time is avoided, even if it is for very short time.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
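    A sketch of the reader pattern described above (list, field and helper
    names are assumptions for illustration, not the exact ip_vs_conn.c
    code):

        rcu_read_lock();
        hlist_for_each_entry_rcu(cp, &conn_tab[hash], c_list) {
                if (!conn_matches(cp, param))   /* hypothetical matcher */
                        continue;
                /* limited access (e.g. pe_data, refreshing the timer) is
                 * fine right here under RCU only; contend for a reference
                 * only when full access is needed */
                if (!__ip_vs_conn_get(cp))      /* refcnt may already be 0 */
                        continue;
                rcu_read_unlock();
                return cp;
        }
        rcu_read_unlock();
        return NULL;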
| * ipvs: convert locks used in persistence engines (Julian Anastasov, 2013-04-01)
    Allow the readers to use RCU lock and for PE module registrations use
    global mutex instead of spinlock. All PE modules need to use
    synchronize_rcu in their module exit handler.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: remove rs_lock by using RCU (Julian Anastasov, 2013-04-01)
    rs_lock was used to protect rs_table (hash table) from updaters (under
    global mutex) and readers (packet handlers). We can remove rs_lock by
    using RCU lock for readers. Reclaiming dest only with kfree_rcu is
    enough because the readers access only fields from the ip_vs_dest
    structure.

    Use hlist for rs_table. As we are now using hlist_del_rcu, introduce
    in_rs_table flag as replacement for the list_empty checks which do not
    work with RCU. It is needed because only NAT dests are in the
    rs_table.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
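    A sketch of the in_rs_table idea (names follow the text above; the
    surrounding code and hash computation are assumed):

        /* add: only NAT dests enter rs_table */
        hlist_add_head_rcu(&dest->d_list, &ipvs->rs_table[hash]);
        dest->in_rs_table = 1;

        /* remove: list_empty()-style checks cannot be trusted once
         * entries are unlinked with hlist_del_rcu(), so the flag
         * remembers membership instead */
        if (dest->in_rs_table) {
                hlist_del_rcu(&dest->d_list);
                dest->in_rs_table = 0;
        }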
| * ipvs: convert app locks (Julian Anastasov, 2013-04-01)
    We use locks like tcp_app_lock, udp_app_lock, sctp_app_lock to protect
    access to the protocol hash tables from readers in packet context
    while the application instances (inc) are [un]registered under global
    mutex.

    As the hash tables are mostly read when conns are created and bound to
    app, use RCU for readers and reclaim app instance after grace period.

    Simplify ip_vs_app_inc_get because we use usecnt only for statistics
    and rely on module refcounting.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: optimize dst usage for real server (Julian Anastasov, 2013-04-01)
    Currently when forwarding requests to real servers we use dst_lock and
    atomic operations when cloning the dst_cache value. As the dst_cache
    value does not change most of the time it is better to use RCU and to
    lock dst_lock only when we need to replace the obsoleted dst. For this
    to work we keep dst_cache in new structure protected by RCU.

    For packets to remote real servers we will use noref version of
    dst_cache, it will be valid while we are in RCU read-side critical
    section because now dst_release for replaced dsts will be invoked
    after the grace period. Packets to local real servers that are passed
    to local stack with NF_ACCEPT need a dst clone.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
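    A hedged sketch of the "dst_cache behind RCU" idea (structure and
    field names are assumptions; the real layout in the patch differs in
    details):

        struct ip_vs_dest_dst {
                struct dst_entry *dst_cache;    /* cached route */
                struct rcu_head   rcu_head;     /* freed after grace period */
        };

        /* packet path to a remote real server: borrow the dst without
         * taking a reference -- safe only inside rcu_read_lock(), because
         * the old entry is only released after the grace period */
        rcu_read_lock();
        dest_dst = rcu_dereference(dest->dest_dst);
        if (dest_dst)
                skb_dst_set_noref(skb, dest_dst->dst_cache);
        rcu_read_unlock();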
| * ipvs: consolidate all dst checks on transmit in one place (Julian Anastasov, 2013-04-01)
    Consolidate the PMTU checks, ICMP sending and skb_dst modification in
    __ip_vs_get_out_rt and __ip_vs_get_out_rt_v6. Now skb_dst is changed
    early to simplify the transmitters. Make sure update_pmtu is called
    only for local clients.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: do not use skb_share_check (Julian Anastasov, 2013-04-01)
    We run in contexts like ip_rcv, ipv6_rcv and br_handle_frame, and do
    not expect shared skbs.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: no need to reroute anymore on DNAT over loopback (Julian Anastasov, 2013-04-01)
    After commit 70e7341673 (ipv4: Show that ip_send_reply() is purely
    unicast routine.) we do not need to reroute DNAT-ed traffic over
    loopback because reply uses iph daddr and not rt_spec_dst.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: rename functions related to dst_cache reset (Julian Anastasov, 2013-04-01)
    Move and give better names to two functions:

    - ip_vs_dst_reset to __ip_vs_dst_cache_reset
    - __ip_vs_dev_reset to ip_vs_forget_dev

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: convert the IP_VS_XMIT macros to functions (Julian Anastasov, 2013-04-01)
    It was a bad idea to hide return statements in macros.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
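    An illustration of the problem (made-up names, not the actual
    IP_VS_XMIT macro): a return buried in a macro makes the early exit
    invisible at the call site, while a function keeps control flow
    explicit.

        /* before: the macro silently returns from the *caller* */
        #define XMIT_OR_BAIL(rc)                        \
                do {                                    \
                        if ((rc) < 0)                   \
                                return NF_STOLEN;       \
                } while (0)

        /* after: the caller decides what to do with the verdict */
        static inline int xmit_verdict(int rc)
        {
                return rc < 0 ? NF_STOLEN : NF_ACCEPT;
        }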
| * ipvs: prefer NETDEV_DOWN event to free cached dsts (Julian Anastasov, 2013-04-01)
    The real server becomes unreachable on down event, no need to wait for
    device unregistration. Should help in releasing dsts early, before
    dst->dev is replaced with lo.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: avoid routing by TOS for real server (Julian Anastasov, 2013-04-01)
    Avoid replacing the cached route for real server on every packet with
    different TOS. I doubt that routing by TOS for real server is used at
    all, so we should be better off with such an optimization.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>
| * net: add skb_dst_set_noref_force (Julian Anastasov, 2013-04-01)
    Rename skb_dst_set_noref to __skb_dst_set_noref and add force flag as
    suggested by David Miller. The new wrapper skb_dst_set_noref_force
    will force dst entries that are not cached to be attached as skb dst
    without taking reference as long as provided dst is reclaimed after
    RCU grace period.

    Signed-off-by: Julian Anastasov <ja@ssi.bg>
    Signed-off by: Hans Schillstrom <hans@schillstrom.com>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Simon Horman <horms@verge.net.au>
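    The wrapper arrangement described above would look roughly like this
    (a sketch; the exact prototypes in skbuff.h may differ):

        void __skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst,
                                 bool force);

        static inline void skb_dst_set_noref(struct sk_buff *skb,
                                             struct dst_entry *dst)
        {
                __skb_dst_set_noref(skb, dst, false);
        }

        /* Caller guarantees the dst is only reclaimed after an RCU grace
         * period, so it is safe to attach it without a reference even if
         * the dst is not cached. */
        static inline void skb_dst_set_noref_force(struct sk_buff *skb,
                                                   struct dst_entry *dst)
        {
                __skb_dst_set_noref(skb, dst, true);
        }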
* | net: frag queue per hash bucket locking (Jesper Dangaard Brouer, 2013-04-04)
    This patch implements per hash bucket locking for the frag queue hash.
    This removes two write locks, and the only remaining write lock is for
    protecting hash rebuild. This essentially reduces the readers-writer
    lock to a rebuild lock.

    This patch is part of "net: frag performance followup"
    http://thread.gmane.org/gmane.linux.network/263644
    of which two patches have already been accepted.

    Same test setup as previous
    (http://thread.gmane.org/gmane.linux.network/257155):
    Two 10G interfaces, on separate NUMA nodes, are under test and use
    Ethernet flow-control. A third interface is used for generating the
    DoS attack (with trafgen).

    Notice, I have changed the frag DoS generator script to be more
    efficient/deadly. Before it would only hit one RX queue, now it is
    sending packets causing multi-queue RX, due to "better" RX hashing.

    Test types summary (netperf UDP_STREAM):
     Test-20G64K     == 2x10G with 65K fragments
     Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
     Test-20G64K+DoS == Same as 20G64K with frag DoS
     Test-20G3F+DoS  == Same as 20G3F with frag DoS
     Test-20G64K+MQ  == Same as 20G64K with Multi-Queue frag DoS
     Test-20G3F+MQ   == Same as 20G3F with Multi-Queue frag DoS

    When I rebased this patch (03) (on top of net-next commit a210576c)
    and removed the _bh spinlock, I saw a performance regression. BUT this
    was caused by some unrelated change in between. See tests below.

    Test (A) is what I reported before for patch-02, accepted in commit
    1b5ab0de. Test (B) is a verifying retest of commit 1b5ab0de,
    corresponding to patch-02. Test (C) is what I reported before for this
    patch. Test (D) is net-next master HEAD (commit a210576c), which
    reveals some (unknown) performance regression (compared against test
    (B)). Test (D) functions as a new base test.

    Performance table summary (in Mbit/s):

    (#) Test-type:  20G64K    20G3F   20G64K+DoS  20G3F+DoS  20G64K+MQ  20G3F+MQ
        ----------  -------   ------- ----------  ---------  ---------  --------
    (A) Patch-02  : 18848.7   13230.1  4103.04     5310.36     130.0     440.2
    (B) 1b5ab0de  : 18841.5   13156.8  4101.08     5314.57     129.0     424.2
    (C) Patch-03v1: 18838.0   13490.5  4405.11     6814.72     196.6     461.6
    (D) a210576c  : 18321.5   11250.4  3635.34     5160.13     119.1     405.2
    (E) with _bh  : 17247.3   11492.6  3994.74     6405.29     166.7     413.6
    (F) without bh: 17471.3   11298.7  3818.05     6102.11     165.7     406.3

    Test (E) and (F) are this patch (03), with (V1) and without (V2) the
    _bh spinlocks. I cannot explain the slowdown for 20G64K (but it is an
    artificial "lab test", so I'm not worried). But the other results do
    show improvements. And the test (E) "with _bh" version is slightly
    better.

    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Acked-by: Eric Dumazet <edumazet@google.com>

    ----
    V2:
    - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't
      need the spinlock _bh versions, as Netfilter currently does a
      local_bh_disable() before entering inet_fragment.
    - Fold-in desc from cover-mail
    V3:
    - Drop the chain_len counter per hash bucket.

    Signed-off-by: David S. Miller <davem@davemloft.net>
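    The core data-structure change would look roughly like this (a sketch
    based on the description above; the actual inet_frags definitions may
    differ):

        struct inet_frag_bucket {
                struct hlist_head chain;       /* frag queues in this bucket */
                spinlock_t        chain_lock;  /* protects only this chain   */
        };

    Readers and writers of a single chain take only chain_lock; the
    remaining global lock is needed solely while the hash is rebuilt.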
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2013-04-03)
|\ \
    Pull net into net-next to get the synchronize_net() bug fix in
    bonding.

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (Linus Torvalds, 2013-04-02)
| |\ \
    Pull networking fixes from David Miller:

    1) Fix VSOCK layer handling of context ID changes, from Reilly Grant.

    2) Now that we have a synchronize_net() in
       netdev_rx_handler_unregister(), we can't let any call sites hold
       locks. Unfortunately bonding does, so we have to drop the rwlock
       there a little bit earlier, fix from Veaceslav Falico.

    3) MAC address setting loop exits one iteration too early in mlx4
       driver, from Yan Burman.

    4) Restore ipv6 routes properly upon ifdown/ifup of loopback, from
       Balakumaran Kannan.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
      VSOCK: Handle changes to the VMCI context ID.
      net IPv6 : Fix broken IPv6 routing table after loopback down-up
      cbq: incorrect processing of high limits
      net/mlx4_en: Fix setting initial MAC address
      bonding: get netdev_rx_handler_unregister out of locks
| | * | VSOCK: Handle changes to the VMCI context ID. (Reilly Grant, 2013-04-02)
    The VMCI context ID of a virtual machine may change at any time. There
    is a VMCI event which signals this but datagrams may be processed
    before this is handled. It is therefore necessary to be flexible about
    the destination context ID of any datagrams received. (It can be
    assumed to be correct because it is provided by the hypervisor.) The
    context ID on existing sockets should be updated to reflect how the
    hypervisor is currently referring to the system.

    Signed-off-by: Reilly Grant <grantr@vmware.com>
    Acked-by: Andy King <acking@vmware.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net IPv6 : Fix broken IPv6 routing table after loopback down-up (Balakumaran Kannan, 2013-04-02)
    The IPv6 routing table becomes broken once we do ifdown, ifup of the
    loopback (lo) interface. After down-up, routes of other interfaces'
    IPv6 addresses through 'lo' are lost.

    IPv6 addresses assigned to all interfaces are routed through 'lo' for
    internal communication. Once 'lo' is down, those routing entries are
    removed from the routing table. But those removed entries are not
    being re-created properly when 'lo' is brought up. So IPv6 addresses
    of other interfaces become unreachable from the same machine. Also
    this breaks communication with other machines because of NDISC packet
    processing failure.

    This patch fixes this issue by reading all interfaces' IPv6 addresses
    and adding them to the IPv6 routing table while bringing up 'lo'.

    ==Testing==
    Before applying the patch:
    $ route -A inet6
    Kernel IPv6 routing table
    Destination                    Next Hop  Flag Met Ref Use If
    2000::20/128                   ::        U    256 0   0   eth0
    fe80::/64                      ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    ::1/128                        ::        Un   0   1   0   lo
    2000::20/128                   ::        Un   0   1   0   lo
    fe80::xxxx:xxxx:xxxx:xxxx/128  ::        Un   0   1   0   lo
    ff00::/8                       ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    $ sudo ifdown lo
    $ sudo ifup lo
    $ route -A inet6
    Kernel IPv6 routing table
    Destination                    Next Hop  Flag Met Ref Use If
    2000::20/128                   ::        U    256 0   0   eth0
    fe80::/64                      ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    ::1/128                        ::        Un   0   1   0   lo
    ff00::/8                       ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    $

    After applying the patch:
    $ route -A inet6
    Kernel IPv6 routing table
    Destination                    Next Hop  Flag Met Ref Use If
    2000::20/128                   ::        U    256 0   0   eth0
    fe80::/64                      ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    ::1/128                        ::        Un   0   1   0   lo
    2000::20/128                   ::        Un   0   1   0   lo
    fe80::xxxx:xxxx:xxxx:xxxx/128  ::        Un   0   1   0   lo
    ff00::/8                       ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    $ sudo ifdown lo
    $ sudo ifup lo
    $ route -A inet6
    Kernel IPv6 routing table
    Destination                    Next Hop  Flag Met Ref Use If
    2000::20/128                   ::        U    256 0   0   eth0
    fe80::/64                      ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    ::1/128                        ::        Un   0   1   0   lo
    2000::20/128                   ::        Un   0   1   0   lo
    fe80::xxxx:xxxx:xxxx:xxxx/128  ::        Un   0   1   0   lo
    ff00::/8                       ::        U    256 0   0   eth0
    ::/0                           ::        !n   -1  1   1   lo
    $

    Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
    Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | cbq: incorrect processing of high limits (Vasily Averin, 2013-04-02)
    Currently cbq works incorrectly for limits > 10% real link bandwidth,
    and practically does not work for limits > 50% real link bandwidth.
    Below are results of experiments taken on a 1 Gbit link:

     In shaper | Actual Result
    -----------+---------------
      100M     |   108 Mbps
      200M     |   244 Mbps
      300M     |   412 Mbps
      500M     |   893 Mbps

    This happens because q->now changes incorrectly in cbq_dequeue(): when
    it is called before the real end of packet transmitting, L2T is
    greater than the real time delay, and q->now gets an extra boost but
    never compensates for it.

    To fix this problem we prevent change of q->now until its
    synchronization with real time.

    Signed-off-by: Vasily Averin <vvs@openvz.org>
    Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net/mlx4_en: Fix setting initial MAC address (Yan Burman, 2013-04-02)
    Commit 6bbb6d9 "net/mlx4_en: Optimize Rx fast path filter checks"
    introduced a regression under which the MAC address read from the card
    was not converted correctly (the most significant byte was not
    handled), fix that.

    Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: Yan Burman <yanb@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | bonding: get netdev_rx_handler_unregister out of locks (Veaceslav Falico, 2013-04-02)
    Now that netdev_rx_handler_unregister contains synchronize_net(), we
    need to call it outside of bond->lock, because it might sleep. Also,
    remove the already unneeded synchronize_net().

    Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge tag 'regmap-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap (Linus Torvalds, 2013-04-02)
| |\ \ \
    Pull regmap fixes from Mark Brown:
     "A small collection of fixes. The most important ones are those from
      Stephen and Lars-Peter, both of which fix cache issues that have
      been lurking for a while but not manifesting noticeably enough for
      anyone to report them."

    * tag 'regmap-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
      regmap: async: Add missing return
      regmap: don't corrupt work buffer in _regmap_raw_write()
      regmap: cache Fix regcache-rbtree sync
      regmap: Initialize `map->debugfs' before regcache
| | * \ \ Merge remote-tracking branch 'regmap/fix/async' into tmp (Mark Brown, 2013-03-31)
| | |\ \ \
| | | * | | regmap: async: Add missing return (Mark Brown, 2013-03-27)
    Let's only write once...

    Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
| | * | | | Merge remote-tracking branch 'regmap/fix/core' into tmp (Mark Brown, 2013-03-31)
| | |\ \ \ \
| | | * | | | regmap: Initialize `map->debugfs' before regcache (Dimitris Papastamos, 2013-03-12)
    In the rbtree code we are exposing statistics relating to the number
    of nodes/registers of the rbtree cache for each of the devices. Ensure
    that `map->debugfs' has been initialized before we attempt to
    initialize the debugfs entry for the rbtree cache.

    Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
    Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
    Cc: stable@vger.kernel.org
| | * | | | | Merge remote-tracking branch 'regmap/fix/cache' into tmp (Mark Brown, 2013-03-31)
| | |\ \ \ \ \
| | | |_|/ / /
| | |/| | | |
| | | * | | | regmap: don't corrupt work buffer in _regmap_raw_write() (Stephen Warren, 2013-03-21)
    _regmap_raw_write() contains code to call regcache_write() to write
    values to the cache. That code calls memcpy() to copy the value data
    to the start of the work_buf. However, at least when
    _regmap_raw_write() is called from _regmap_bus_raw_write(), the value
    data is in the work_buf, and this memcpy() operation may over-write
    part of that value data, depending on the value of
    reg_bytes + pad_bytes. At least when using reg_bytes==1 and
    pad_bytes==0, corruption of the value data does occur.

    To solve this, remove the memcpy() operation, and modify the
    subsequent .parse_val() call to parse the original value buffer
    directly.

    At least in the case of 8-bit register address and 16-bit values, and
    writes of single registers at a time, this memcpy-then-parse
    combination used to cancel each-other out; for a work-buffer
    containing xx 89 03, the memcpy changed it to 89 03 03, and the
    parse_val changed it back to 89 89 03, thus leaving the value
    uncorrupted. This appears completely accidental though.

    Since commit 8a819ff "regmap: core: Split out in place value parsing",
    .parse_val only returns the parsed value, and does not modify the
    buffer, and hence does not (accidentally) undo the corruption caused
    by memcpy(). This caused bogus values to get written to HW, thus
    preventing e.g. audio playback on systems with a WM8903 CODEC. This
    patch fixes that.

    Signed-off-by: Stephen Warren <swarren@nvidia.com>
    Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
| | | * | | | regmap: cache Fix regcache-rbtree sync (Lars-Peter Clausen, 2013-03-13)
| | | |/ / /
    The last register block, which falls into the specified range, is not
    handled correctly. The formula which calculates the number of
    registers which should be synced is inverse (and off by one). E.g. if
    all registers in that block should be synced only one is synced, and
    if only one should be synced all (but one) are synced.

    To calculate the number of registers that need to be synced we need to
    subtract the number of the first register in the block from the max
    register number and add one. This patch updates the code accordingly.

    The issue was introduced in commit ac8d91c ("regmap: Supply ranges to
    the sync operations").

    Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
    Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
    Cc: stable@vger.kernel.org
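    The corrected count, expressed as code (variable names are
    illustrative, not the exact regcache-rbtree identifiers):

        /* base_reg: first register in the block,
         * max: last register of the range to sync */
        unsigned int end = max - base_reg + 1;  /* registers to sync */
        /* e.g. base_reg = 8, max = 11  ->  11 - 8 + 1 = 4 registers (8..11) */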
| * | | | | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux (Linus Torvalds, 2013-04-02)
| |\ \ \ \ \
    Pull DRM fixes from Dave Airlie:
     "Two core fixes, both regressions, along with some intel and some
      nouveau fixes for regressions and oopses"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
      drm: correctly restore mappings if drm_open fails
      drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
      drm/nouveau: fix handling empty channel list in ioctl's
      drm: don't unlock in the addfb error paths
      drm/i915: Fix build failure
      drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
      drm/i915: duct-tape locking when eDP init fails
| | * | | | | drm: correctly restore mappings if drm_open fails (Ilija Hadzic, 2013-04-02)
    If the first drm_open fails, the error-handling path will incorrectly
    restore the inode's mapping to NULL. This can cause a crash later on.
    Fix by separately storing away the mapping pointers that drm_open can
    touch and restoring each from its own respective variable if the call
    fails.

    Fixes: https://bugzilla.novell.com/show_bug.cgi?id=807850
    (thanks to Michal Hocko for investigating and finding the root cause
    of the bug)

    Reference: http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html

    v2: Use one variable to store file and inode mapping since they are
        the same at the function entry. Fix spelling mistakes in commit
        message.
    v3: Add reference to the original bug report.

    Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
    Tested-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
    Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: stable@vger.kernel.org
    Signed-off-by: Dave Airlie <airlied@redhat.com>
| | * | | | | Merge branch 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next (Dave Airlie, 2013-04-02)
| | |\ \ \ \ \
    Oops fixers.

    * 'drm-nouveau-fixes-3.9' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
      drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
      drm/nouveau: fix handling empty channel list in ioctl's
| | | * | | | | drm/nouveau: fix NULL ptr dereference from nv50_disp_intr() (Maarten Lankhorst, 2013-03-28)
    On 23-03-13 12:47, Peter Hurley wrote:
    > On Tue, 2013-03-19 at 11:13 -0400, Peter Hurley wrote:
    >> On vanilla 3.9.0-rc3, I get this 100% repeatable oops after login
    >> when the user X session is coming up:
    > Perhaps I wasn't clear that this happens on every boot and is a
    > regression from 3.8
    >
    > I'd be happy to help resolve this but time is of the essence; it
    > would be a shame to have to revert all of this for 3.9
    Well it broke on my system too, so it was easy to fix. I didn't even
    need gdm to trigger it!
    >8----
    This fixes a regression caused by 1d7c71a3e2f7 (drm/nouveau/disp: port
    vblank handling to event interface), which causes an oops in the
    following way:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
    IP: [<0000000000000001>] 0x0
    PGD 0
    Oops: 0010 [#1] PREEMPT SMP
    Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables ...<snip>...
    CPU 3
    Pid: 0, comm: swapper/3 Not tainted 3.9.0-rc3-xeon #rc3 Dell Inc. Precision WorkStation T5400 /0RW203
    RIP: 0010:[<0000000000000001>]  [<0000000000000001>] 0x0
    RSP: 0018:ffff8802afcc3d80  EFLAGS: 00010087
    RAX: ffff88029f6e5808 RBX: 0000000000000001 RCX: 0000000000000000
    RDX: 0000000000000096 RSI: 0000000000000001 RDI: ffff88029f6e5808
    RBP: ffff8802afcc3dc8 R08: 0000000000000000 R09: 0000000000000004
    R10: 000000000000002c R11: ffff88029e559a98 R12: ffff8802a376cb78
    R13: ffff88029f6e57e0 R14: ffff88029f6e57f8 R15: ffff88029f6e5808
    FS:  0000000000000000(0000) GS:ffff8802afcc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000001 CR3: 000000029fa67000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process swapper/3 (pid: 0, threadinfo ffff8802a355e000, task ffff8802a3535c40)
    Stack:
     ffffffffa0159d8a 0000000000000082 ffff88029f6e5820 0000000000000001
     ffff88029f71aa00 0000000000000000 0000000000000000 0000000004000000
     0000000004000000 ffff8802afcc3e38 ffffffffa01843b5 ffff8802afcc3df8
    Call Trace:
     <IRQ>
     [<ffffffffa0159d8a>] ? nouveau_event_trigger+0xaa/0xe0 [nouveau]
     [<ffffffffa01843b5>] nv50_disp_intr+0xc5/0x200 [nouveau]
     [<ffffffff816fbacc>] ? _raw_spin_unlock_irqrestore+0x2c/0x50
     [<ffffffff816ff98d>] ? notifier_call_chain+0x4d/0x70
     [<ffffffffa017a105>] nouveau_mc_intr+0xb5/0x110 [nouveau]
     [<ffffffffa01d45ff>] nouveau_irq_handler+0x6f/0x80 [nouveau]
     [<ffffffff810eec95>] handle_irq_event_percpu+0x75/0x260
     [<ffffffff810eeec8>] handle_irq_event+0x48/0x70
     [<ffffffff810f205a>] handle_fasteoi_irq+0x5a/0x100
     [<ffffffff810182f2>] handle_irq+0x22/0x40
     [<ffffffff8170561a>] do_IRQ+0x5a/0xd0
     [<ffffffff816fc2ad>] common_interrupt+0x6d/0x6d
     <EOI>
     [<ffffffff810449b6>] ? native_safe_halt+0x6/0x10
     [<ffffffff8101ea1d>] default_idle+0x3d/0x170
     [<ffffffff8101f736>] cpu_idle+0x116/0x130
     [<ffffffff816e2a06>] start_secondary+0x251/0x258
    Code:  Bad RIP value.
    RIP  [<0000000000000001>] 0x0
     RSP <ffff8802afcc3d80>
    CR2: 0000000000000001
    ---[ end trace 907323cb8ce6f301 ]---

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
| | | * | | | | drm/nouveau: fix handling empty channel list in ioctl's (Maarten Lankhorst, 2013-03-28)
    If there are no channels, chan would never end up being NULL, and so
    the NULL pointer check would fail to catch it. Solve this by
    initializing chan to NULL, and iterating over temp instead.

    Fixes an oops when running intel-gpu-tools/tests/kms_flip, which
    attempts to do some intel ioctl's on a nouveau device.

    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
    Cc: stable@vger.kernel.org [3.7+]
    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
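    A sketch of the fix pattern described above (structure and field names
    are assumptions for illustration; the real ioctl code differs):

        struct nouveau_channel *chan = NULL, *temp;

        list_for_each_entry(temp, &abi16->channels, head) {
                if (temp->handle == info->channel) {
                        chan = temp;
                        break;
                }
        }
        if (!chan)              /* empty list or no match */
                return -ENOENT;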