author Linus Torvalds <torvalds@linux-foundation.org> 2013-07-09 21:24:39 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2013-07-09 21:24:39 -0400
commit 496322bc91e35007ed754184dcd447a02b6dd685
tree f5298d0a74c0a6e65c0e98050b594b8d020904c1 /fs/select.c
parent 2e17c5a97e231f3cb426f4b7895eab5be5c5442e
parent 56e0ef527b184b3de2d7f88c6190812b2b2ac6bf
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Pull networking updates from David Miller:
 "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickled in.

  Highlights:

   1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'"). From Eliezer Tamir.

   2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet.

   3) Add driver for Atheros QCA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan.

   4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov.

   5) Allow controlling network device VF link state using netlink, from Rony Efraim.

   6) Support GRE tunneling in openvswitch, from Pravin B Shelar.

   7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet.

   8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang.

   9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports.

  10) Major cleanups, particularly in the area of debugging and cookie lifetime handling, to the SCTP protocol code. From Daniel Borkmann.

  11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel.

  12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann.

  13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg.

  14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet.

  15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng.

  16) Add support for GSO segmentation of MPLS packets, from Simon Horman.

  17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs.

  18) Convert several drivers over to module_pci_driver(), from Peter Huewe.

  19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use an O(1) calculation instead. From Eric Dumazet.

  20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel.

  21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.

  22) Prevent a single high rate flow from overrunning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn.

  23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet.

  24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet.

  25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich.

  26) Support IPV6 in ping sockets. From Lorenzo Colitti.

  27) Receive flow steering targets should be updated at poll() time too, from David Majnemer.

  28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs.

  29) We have to be mindful of ipv4 mapped ipv6 sockets in udp_v6_push_pending_frames(). From Hannes Frederic Sowa.

  30) Fix L2TP sequence number handling bugs, from James Chapman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
  drivers/net: caif: fix wrong rtnl_is_locked() usage
  drivers/net: enic: release rtnl_lock on error-path
  vhost-net: fix use-after-free in vhost_net_flush
  net: mv643xx_eth: do not use port number as platform device id
  net: sctp: confirm route during forward progress
  virtio_net: fix race in RX VQ processing
  virtio: support unlocked queue poll
  net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
  Documentation: Fix references to defunct linux-net@vger.kernel.org
  net/fs: change busy poll time accounting
  net: rename low latency sockets functions to busy poll
  bridge: fix some kernel warning in multicast timer
  sfc: Fix memory leak when discarding scattered packets
  sit: fix tunnel update via netlink
  dt:net:stmmac: Add dt specific phy reset callback support.
  dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
  dt:net:stmmac: Allocate platform data only if its NULL.
  net:stmmac: fix memleak in the open method
  ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
  net: ipv6: fix wrong ping_v6_sendmsg return value
  ...

Diffstat (limited to 'fs/select.c')
-rw-r--r-- fs/select.c | 62
1 files changed, 57 insertions, 5 deletions
diff --git a/fs/select.c b/fs/select.c
index 6b14dc7df3a4..f9f49c40cfd4 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -28,6 +28,7 @@
 #include <linux/hrtimer.h>
 #include <linux/sched/rt.h>
 #include <linux/freezer.h>
+#include <net/ll_poll.h>
 
 #include <asm/uaccess.h>
 
@@ -386,9 +387,10 @@ get_max:
 #define POLLEX_SET (POLLPRI)
 
 static inline void wait_key_set(poll_table *wait, unsigned long in,
-				unsigned long out, unsigned long bit)
+				unsigned long out, unsigned long bit,
+				unsigned int ll_flag)
 {
-	wait->_key = POLLEX_SET;
+	wait->_key = POLLEX_SET | ll_flag;
 	if (in & bit)
 		wait->_key |= POLLIN_SET;
 	if (out & bit)
@@ -402,6 +404,8 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	poll_table *wait;
 	int retval, i, timed_out = 0;
 	unsigned long slack = 0;
+	unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
+	unsigned long busy_end = 0;
 
 	rcu_read_lock();
 	retval = max_select_fd(n, fds);
@@ -424,6 +428,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	retval = 0;
 	for (;;) {
 		unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
+		bool can_busy_loop = false;
 
 		inp = fds->in; outp = fds->out; exp = fds->ex;
 		rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
@@ -451,7 +456,8 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 					f_op = f.file->f_op;
 					mask = DEFAULT_POLLMASK;
 					if (f_op && f_op->poll) {
-						wait_key_set(wait, in, out, bit);
+						wait_key_set(wait, in, out,
+							     bit, busy_flag);
 						mask = (*f_op->poll)(f.file, wait);
 					}
 					fdput(f);
@@ -470,6 +476,18 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 						retval++;
 						wait->_qproc = NULL;
 					}
+					/* got something, stop busy polling */
+					if (retval) {
+						can_busy_loop = false;
+						busy_flag = 0;
+
+					/*
+					 * only remember a returned
+					 * POLL_BUSY_LOOP if we asked for it
+					 */
+					} else if (busy_flag & mask)
+						can_busy_loop = true;
+
 				}
 			}
 			if (res_in)
475 if (res_in) 493 if (res_in)
@@ -488,6 +506,17 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
488 break; 506 break;
489 } 507 }
490 508
509 /* only if found POLL_BUSY_LOOP sockets && not out of time */
510 if (can_busy_loop && !need_resched()) {
511 if (!busy_end) {
512 busy_end = busy_loop_end_time();
513 continue;
514 }
515 if (!busy_loop_timeout(busy_end))
516 continue;
517 }
518 busy_flag = 0;
519
491 /* 520 /*
492 * If this is the first loop and we have a timeout 521 * If this is the first loop and we have a timeout
493 * given, then we convert to ktime_t and set the to 522 * given, then we convert to ktime_t and set the to
@@ -719,7 +748,9 @@ struct poll_list {
  * pwait poll_table will be used by the fd-provided poll handler for waiting,
  * if pwait->_qproc is non-NULL.
  */
-static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
+static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait,
+				     bool *can_busy_poll,
+				     unsigned int busy_flag)
 {
 	unsigned int mask;
 	int fd;
@@ -733,7 +764,10 @@ static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait)
 			mask = DEFAULT_POLLMASK;
 			if (f.file->f_op && f.file->f_op->poll) {
 				pwait->_key = pollfd->events|POLLERR|POLLHUP;
+				pwait->_key |= busy_flag;
 				mask = f.file->f_op->poll(f.file, pwait);
+				if (mask & busy_flag)
+					*can_busy_poll = true;
 			}
 			/* Mask out unneeded events. */
 			mask &= pollfd->events | POLLERR | POLLHUP;
@@ -752,6 +786,8 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
 	ktime_t expire, *to = NULL;
 	int timed_out = 0, count = 0;
 	unsigned long slack = 0;
+	unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
+	unsigned long busy_end = 0;
 
 	/* Optimise the no-wait case */
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -764,6 +800,7 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
 
 	for (;;) {
 		struct poll_list *walk;
+		bool can_busy_loop = false;
 
 		for (walk = list; walk != NULL; walk = walk->next) {
 			struct pollfd * pfd, * pfd_end;
@@ -778,9 +815,13 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
 				 * this. They'll get immediately deregistered
 				 * when we break out and return.
 				 */
-				if (do_pollfd(pfd, pt)) {
+				if (do_pollfd(pfd, pt, &can_busy_loop,
+					      busy_flag)) {
 					count++;
 					pt->_qproc = NULL;
+					/* found something, stop busy polling */
+					busy_flag = 0;
+					can_busy_loop = false;
 				}
 			}
 		}
@@ -797,6 +838,17 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
 		if (count || timed_out)
 			break;
 
+		/* only if found POLL_BUSY_LOOP sockets && not out of time */
+		if (can_busy_loop && !need_resched()) {
+			if (!busy_end) {
+				busy_end = busy_loop_end_time();
+				continue;
+			}
+			if (!busy_loop_timeout(busy_end))
+				continue;
+		}
+		busy_flag = 0;
+
 		/*
 		 * If this is the first loop and we have a timeout
 		 * given, then we convert to ktime_t and set the to
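
Editor's note: the control flow that this patch adds to do_select() and do_poll() may be easier to follow in isolation. What follows is a minimal userspace sketch, not kernel code. The names fake_source, scan_stats and scan_with_busy_loop are hypothetical and exist only for illustration. The sketch also makes two simplifying assumptions: the time budget is counted in rescans rather than measured with busy_loop_end_time() and busy_loop_timeout(), and the need_resched() check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pollable file; not a kernel type. */
struct fake_source {
	bool supports_busy_poll;	/* would set POLL_BUSY_LOOP in its mask */
	int ready_at_scan;		/* 1-based scan on which it becomes ready, 0 = never */
};

enum scan_result { SCAN_READY, SCAN_WOULD_SLEEP };

struct scan_stats {
	enum scan_result result;
	int scans;			/* how many full passes were made */
};

/*
 * Rescan the sources while nothing is ready, at least one source asked
 * for busy looping, and the rescan budget (an assumption standing in
 * for the kernel's time limit) is not used up.
 */
static struct scan_stats scan_with_busy_loop(const struct fake_source *src,
					     size_t n, bool busy_enabled,
					     int budget)
{
	struct scan_stats st = { SCAN_WOULD_SLEEP, 0 };
	bool busy_flag = busy_enabled;
	int rescans_left = -1;		/* -1 = not armed yet, like busy_end == 0 */

	assert(budget >= 0);
	for (;;) {
		bool can_busy_loop = false;
		int count = 0;

		st.scans++;
		for (size_t i = 0; i < n; i++) {
			if (src[i].ready_at_scan &&
			    st.scans >= src[i].ready_at_scan) {
				/* found something, stop busy polling */
				count++;
				busy_flag = false;
				can_busy_loop = false;
			} else if (busy_flag && src[i].supports_busy_poll) {
				can_busy_loop = true;
			}
		}
		if (count) {
			st.result = SCAN_READY;
			return st;
		}
		if (can_busy_loop) {
			if (rescans_left < 0)
				rescans_left = budget;	/* arm the budget once */
			if (rescans_left > 0) {
				rescans_left--;
				continue;		/* spin: scan again */
			}
		}
		/* Budget spent or nobody can busy loop: the real code would sleep. */
		return st;
	}
}
```

With a single busy-poll-capable source that becomes ready on the third scan and a budget of 5, the sketch returns SCAN_READY after 3 scans. With busy polling disabled it gives up after the first scan, which corresponds to the point where the kernel falls through to its normal timeout and sleep path.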