Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/DocBook/kernel-api.tmpl    |   8
-rw-r--r--  Documentation/networking/NAPI_HOWTO.txt  | 766
-rw-r--r--  Documentation/networking/netdevices.txt  |  12
3 files changed, 15 insertions, 771 deletions
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index b886f52a9aac..e5da4f2b7c22 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -240,17 +240,23 @@ X!Ilib/string.c | |||
240 | <sect1><title>Driver Support</title> | 240 | <sect1><title>Driver Support</title> |
241 | !Enet/core/dev.c | 241 | !Enet/core/dev.c |
242 | !Enet/ethernet/eth.c | 242 | !Enet/ethernet/eth.c |
243 | !Enet/sched/sch_generic.c | ||
243 | !Iinclude/linux/etherdevice.h | 244 | !Iinclude/linux/etherdevice.h |
245 | !Iinclude/linux/netdevice.h | ||
246 | </sect1> | ||
247 | <sect1><title>PHY Support</title> | ||
244 | !Edrivers/net/phy/phy.c | 248 | !Edrivers/net/phy/phy.c |
245 | !Idrivers/net/phy/phy.c | 249 | !Idrivers/net/phy/phy.c |
246 | !Edrivers/net/phy/phy_device.c | 250 | !Edrivers/net/phy/phy_device.c |
247 | !Idrivers/net/phy/phy_device.c | 251 | !Idrivers/net/phy/phy_device.c |
248 | !Edrivers/net/phy/mdio_bus.c | 252 | !Edrivers/net/phy/mdio_bus.c |
249 | !Idrivers/net/phy/mdio_bus.c | 253 | !Idrivers/net/phy/mdio_bus.c |
254 | </sect1> | ||
250 | <!-- FIXME: Removed for now since no structured comments in source | 255 | <!-- FIXME: Removed for now since no structured comments in source |
256 | <sect1><title>Wireless</title> | ||
251 | X!Enet/core/wireless.c | 257 | X!Enet/core/wireless.c |
252 | --> | ||
253 | </sect1> | 258 | </sect1> |
259 | --> | ||
254 | <sect1><title>Synchronous PPP</title> | 260 | <sect1><title>Synchronous PPP</title> |
255 | !Edrivers/net/wan/syncppp.c | 261 | !Edrivers/net/wan/syncppp.c |
256 | </sect1> | 262 | </sect1> |
diff --git a/Documentation/networking/NAPI_HOWTO.txt b/Documentation/networking/NAPI_HOWTO.txt
deleted file mode 100644
index 7907435a661c..000000000000
--- a/Documentation/networking/NAPI_HOWTO.txt
+++ /dev/null
@@ -1,766 +0,0 @@ | |||
1 | HISTORY: | ||
2 | February 16/2002 -- revision 0.2.1: | ||
3 | COR typo corrected | ||
4 | February 10/2002 -- revision 0.2: | ||
5 | some spell checking ;-> | ||
6 | January 12/2002 -- revision 0.1 | ||
7 | This is still work in progress so may change. | ||
8 | To keep up to date please watch this space. | ||
9 | |||
10 | Introduction to NAPI | ||
11 | ==================== | ||
12 | |||
13 | NAPI is a proven (www.cyberus.ca/~hadi/usenix-paper.tgz) technique | ||
14 | to improve network performance on Linux. For more details please | ||
15 | read that paper. | ||
16 | NAPI provides an "inherent mitigation" which is bound by system capacity,
17 | as can be seen from the following data collected by Robert on Gigabit
18 | Ethernet (e1000):
19 | |||
20 | Psize Ipps Tput Rxint Txint Done Ndone | ||
21 | --------------------------------------------------------------- | ||
22 | 60 890000 409362 17 27622 7 6823 | ||
23 | 128 758150 464364 21 9301 10 7738 | ||
24 | 256 445632 774646 42 15507 21 12906 | ||
25 | 512 232666 994445 241292 19147 241192 1062 | ||
26 | 1024 119061 1000003 872519 19258 872511 0 | ||
27 | 1440 85193 1000003 946576 19505 946569 0 | ||
28 | |||
29 | |||
30 | Legend: | ||
31 | "Ipps" stands for input packets per second. | ||
32 | "Tput" == packets out of total 1M that made it out. | ||
33 | "Rxint"/"Txint" == receive / transmit-completion interrupts seen
34 | "Done" == The number of times that the poll() managed to pull all | ||
35 | packets out of the rx ring. Note from this that the lower the | ||
36 | load the more we could clean up the rxring | ||
37 | "Ndone" == is the converse of "Done". Note again, that the higher | ||
38 | the load the more times we couldn't clean up the rxring. | ||
39 | |||
40 | Observe that:
41 | when the NIC receives 890Kpackets/sec only 17 rx interrupts are generated.
42 | The system can't handle the processing at 1 interrupt/packet at that load level.
43 | At lower rates, on the other hand, rx interrupts go up and therefore the
44 | interrupt/packet ratio goes up (as observable from that table). So there is
45 | a possibility that under low enough input, you get one poll call for each
46 | input packet caused by a single interrupt each time. And if the system
47 | can't handle an interrupt-per-packet ratio of 1, then it will just have to
48 | chug along ....
49 | |||
50 | |||
51 | 0) Prerequisites: | ||
52 | ================== | ||
53 | A driver MAY continue using the old 2.4 technique for interfacing | ||
54 | to the network stack and not benefit from the NAPI changes. | ||
55 | NAPI additions to the kernel do not break backward compatibility. | ||
56 | NAPI, however, requires the following features to be available: | ||
57 | |||
58 | A) DMA ring or enough RAM to store packets in software devices. | ||
59 | |||
60 | B) Ability to turn off interrupts or maybe events that send packets up | ||
61 | the stack. | ||
62 | |||
63 | NAPI processes packet events in what is known as the dev->poll() method.
64 | Typically, only packet receive events are processed in dev->poll().
65 | The rest of the events MAY be processed by the regular interrupt handler
66 | to reduce processing latency (justified also because there are not that
67 | many of them).
68 | Note, however, that NAPI does not enforce that dev->poll() only processes
69 | receive events.
70 | Tests with the tulip driver indicated slightly increased latency if
71 | all of the interrupt handling is moved to dev->poll(). Also MII handling
72 | gets a little trickier.
73 | The example used in this document moves only the receive processing
74 | to dev->poll(); this is shown with the patch for the tulip driver.
75 | For an example of code that moves all of the interrupt handling to
76 | dev->poll(), look at the ported e1000 code.
77 | |||
78 | There are caveats that might force you to go with moving everything to | ||
79 | dev->poll(). Different NICs work differently depending on their status/event | ||
80 | acknowledgement setup. | ||
81 | There are two types of event register ACK mechanisms. | ||
82 | I) what is known as Clear-on-read (COR):
83 | when you read the status/event register, it clears everything!
84 | The natsemi and sunbmac NICs are known to do this.
85 | In this case your only choice is to move all to dev->poll().
86 |
87 | II) Clear-on-write (COW):
88 | i) you clear the status by writing a 1 in the bit-location you want.
89 | These are the majority of the NICs and work the best with NAPI.
90 | Put only receive events in dev->poll(); leave the rest in
91 | the old interrupt handler.
92 | ii) whatever you write in the status register clears everything ;->
93 | We can't seem to find any chip supported by Linux which does this. If
94 | someone knows such a chip, please email us.
95 | Move all to dev->poll().
96 | |||
97 | C) Ability to detect new work correctly. | ||
98 | NAPI works by shutting down event interrupts when there's work and | ||
99 | turning them on when there's none. | ||
100 | New packets might show up in the small window while interrupts are being
101 | re-enabled (refer to appendix 2). A packet might sneak in during the period
102 | we are enabling interrupts; we only get to know about such a packet when the
103 | next new packet arrives and generates an interrupt.
104 | Essentially, there is a small window of opportunity for a race condition,
105 | which for clarity we'll refer to as the "rotting packet" problem.
106 | |||
107 | This is a very important topic and appendix 2 is dedicated for more | ||
108 | discussion. | ||
109 | |||
110 | Locking rules and environmental guarantees | ||
111 | ========================================== | ||
112 | |||
113 | -Guarantee: Only one CPU at any time can call dev->poll(); this is because | ||
114 | only one CPU can pick the initial interrupt and hence the initial | ||
115 | netif_rx_schedule(dev); | ||
116 | - The core layer invokes devices to send packets up in a round-robin fashion.
117 | This implies receive is totally lockless because of the guarantee that only | ||
118 | one CPU is executing it. | ||
119 | - contention can only be the result of some other CPU accessing the rx | ||
120 | ring. This happens only in close() and suspend() (when these methods | ||
121 | try to clean the rx ring); | ||
122 | ****guarantee: driver authors need not worry about this; synchronization | ||
123 | is taken care of for them by the top net layer.
124 | - local interrupts are enabled (if you don't move all to dev->poll()). For
125 | example link/MII and txcomplete continue functioning just the same old way.
126 | This improves the latency of processing these events. It is also assumed that | ||
127 | the receive interrupt is the largest cause of noise. Note this might not | ||
128 | always be true. | ||
129 | [according to Manfred Spraul, the winbond insists on sending one | ||
130 | txmitcomplete interrupt for each packet (although this can be mitigated)]. | ||
131 | For these broken drivers, move all to dev->poll(). | ||
132 | |||
133 | For the rest of this text, we'll assume that dev->poll() only | ||
134 | processes receive events. | ||
135 | |||
136 | new methods introduced by NAPI
137 | ============================= | ||
138 | |||
139 | a) netif_rx_schedule(dev) | ||
140 | Called by an IRQ handler to schedule a poll for the device
141 | |||
142 | b) netif_rx_schedule_prep(dev) | ||
143 | puts the device in a state which allows for it to be added to the | ||
144 | CPU polling list if it is up and running. You can look at this as | ||
145 | the first half of netif_rx_schedule(dev) above; the second half | ||
146 | being c) below. | ||
147 | |||
148 | c) __netif_rx_schedule(dev) | ||
149 | Add device to the poll list for this CPU; assuming that _prep above | ||
150 | has already been called and returned 1. | ||
151 | |||
152 | d) netif_rx_reschedule(dev, undo) | ||
153 | Called to reschedule polling for device specifically for some | ||
154 | deficient hardware. Read Appendix 2 for more details. | ||
155 | |||
156 | e) netif_rx_complete(dev) | ||
157 | |||
158 | Remove the interface from the CPU poll list: it must be in the poll list
159 | on the current CPU. This primitive is called by dev->poll() when
160 | it completes its work. The device cannot be out of the poll list at this
161 | call; if it is, then clearly it is a BUG(). You'll know ;->
162 | |||
163 | All of the above methods are used below, so keep reading for clarity. | ||
164 | |||
165 | Device driver changes to be made when porting NAPI | ||
166 | ================================================== | ||
167 | |||
168 | Below we describe what kind of changes are required for NAPI to work. | ||
169 | |||
170 | 1) introduction of dev->poll() method | ||
171 | ===================================== | ||
172 | |||
173 | This is the method that is invoked by the network core when it requests
174 | new packets from the driver. A driver is allowed to send up to
175 | dev->quota packets on the current CPU before yielding to the network
176 | subsystem (so other devices also get an opportunity to send to the stack).
177 | |||
178 | dev->poll() prototype looks as follows: | ||
179 | int my_poll(struct net_device *dev, int *budget) | ||
180 | |||
181 | budget is the remaining number of packets the network subsystem on the | ||
182 | current CPU can send up the stack before yielding to other system tasks. | ||
183 | *Each driver is responsible for decrementing budget by the total number of | ||
184 | packets sent. | ||
185 | The total number of packets sent cannot exceed dev->quota.
186 | |||
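In condensed form, the contract looks roughly like the sketch below. Note that
my_rx_work() and no_more_rx_work() are placeholders for driver-specific logic,
not real functions; the complete poll routine is developed in section 4.

-------------------------------------------------------------------
int my_poll(struct net_device *dev, int *budget)
{
	/* never do more than the core asked for, nor more than our quota */
	int work_to_do = min(*budget, dev->quota);
	int work_done;

	/* placeholder: pull up to work_to_do packets off the rx ring and
	 * feed them to netif_receive_skb(); returns how many were sent up */
	work_done = my_rx_work(dev, work_to_do);

	/* account for what was actually sent up the stack */
	*budget -= work_done;
	dev->quota -= work_done;

	if (no_more_rx_work(dev)) {		/* placeholder test */
		netif_rx_complete(dev);		/* back to interrupt mode */
		enable_rx_and_rxnobuf_ints();
		return 0;			/* done */
	}
	return 1;				/* not done, poll me again */
}
-------------------------------------------------------------------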
187 | The dev->poll() method is invoked by the top layer; the driver simply sends
188 | up the stack, if it can, the packet quantity requested.
189 | |||
190 | more on dev->poll() below after the interrupt changes are explained. | ||
191 | |||
192 | 2) registering dev->poll() method | ||
193 | =================================== | ||
194 | |||
195 | dev->poll should be set in the dev->probe() method. | ||
196 | e.g: | ||
197 | dev->open = my_open; | ||
198 | . | ||
199 | . | ||
200 | /* two new additions */ | ||
201 | /* first register my poll method */ | ||
202 | dev->poll = my_poll; | ||
203 | /* next register my weight/quanta; can be overridden in /proc */ | ||
204 | dev->weight = 16; | ||
205 | . | ||
206 | . | ||
207 | dev->stop = my_close; | ||
208 | |||
209 | |||
210 | |||
211 | 3) scheduling dev->poll() | ||
212 | ============================= | ||
213 | This involves modifying the interrupt handler and the code | ||
214 | path which takes the packet off the NIC and sends them to the | ||
215 | stack. | ||
216 | |||
217 | It's important at this point to introduce the classical D. Becker
218 | interrupt handler:
219 | |||
220 | ------------------ | ||
221 | static irqreturn_t | ||
222 | netdevice_interrupt(int irq, void *dev_instance, struct pt_regs *regs)
223 | { | ||
224 | |||
225 | struct net_device *dev = (struct net_device *)dev_instance; | ||
226 | struct my_private *tp = (struct my_private *)dev->priv; | ||
227 | |||
228 | int work_count = my_work_count; | ||
229 | u32 status = read_interrupt_status_reg();
230 | if (status == 0) | ||
231 | return IRQ_NONE; /* Shared IRQ: not us */ | ||
232 | if (status == 0xffff) | ||
233 | return IRQ_HANDLED; /* Hot unplug */ | ||
234 | if (status & error) | ||
235 | do_some_error_handling();
236 | |||
237 | do { | ||
238 | acknowledge_ints_ASAP(); | ||
239 | |||
240 | if (status & link_interrupt) { | ||
241 | spin_lock(&tp->link_lock); | ||
242 | do_some_link_stat_stuff(); | ||
243 | spin_unlock(&tp->link_lock);
244 | } | ||
245 | |||
246 | if (status & rx_interrupt) { | ||
247 | receive_packets(dev); | ||
248 | } | ||
249 | |||
250 | if (status & rx_nobufs) { | ||
251 | make_rx_buffs_avail(); | ||
252 | } | ||
253 | |||
254 | if (status & tx_related) { | ||
255 | spin_lock(&tp->lock); | ||
256 | tx_ring_free(dev); | ||
257 | if (tx_died) | ||
258 | restart_tx(); | ||
259 | spin_unlock(&tp->lock); | ||
260 | } | ||
261 | |||
262 | status = read_interrupt_status_reg(); | ||
263 | |||
264 | } while (!(status & error) || more_work_to_be_done); | ||
265 | return IRQ_HANDLED; | ||
266 | } | ||
267 | |||
268 | ---------------------------------------------------------------------- | ||
269 | |||
270 | We now change this to what is shown below to NAPI-enable it: | ||
271 | |||
272 | ---------------------------------------------------------------------- | ||
273 | static irqreturn_t | ||
274 | netdevice_interrupt(int irq, void *dev_instance, struct pt_regs *regs)
275 | { | ||
276 | struct net_device *dev = (struct net_device *)dev_instance; | ||
277 | struct my_private *tp = (struct my_private *)dev->priv; | ||
278 | |||
279 | u32 status = read_interrupt_status_reg();
280 | if (status == 0) | ||
281 | return IRQ_NONE; /* Shared IRQ: not us */ | ||
282 | if (status == 0xffff) | ||
283 | return IRQ_HANDLED; /* Hot unplug */ | ||
284 | if (status & error) | ||
285 | do_some_error_handling(); | ||
286 | |||
287 | do { | ||
288 | /************************ start note *********************************/ | ||
289 | acknowledge_ints_ASAP(); // don't ack rx and rxnobuff here
290 | /************************ end note *********************************/ | ||
291 | |||
292 | if (status & link_interrupt) { | ||
293 | spin_lock(&tp->link_lock); | ||
294 | do_some_link_stat_stuff(); | ||
295 | spin_unlock(&tp->link_lock); | ||
296 | } | ||
297 | /************************ start note *********************************/ | ||
298 | if ((status & rx_interrupt) || (status & rx_nobufs)) {
299 | if (netif_rx_schedule_prep(dev)) { | ||
300 | |||
301 | /* disable interrupts caused | ||
302 | * by arriving packets */ | ||
303 | disable_rx_and_rxnobuff_ints(); | ||
304 | /* tell system we have work to be done. */ | ||
305 | __netif_rx_schedule(dev); | ||
306 | } else { | ||
307 | printk("driver bug! interrupt while in poll\n"); | ||
308 | /* FIX by disabling interrupts */ | ||
309 | disable_rx_and_rxnobuff_ints(); | ||
310 | } | ||
311 | } | ||
312 | /************************ end note *********************************/
313 | |||
314 | if (status & tx_related) { | ||
315 | spin_lock(&tp->lock); | ||
316 | tx_ring_free(dev); | ||
317 | |||
318 | if (tx_died) | ||
319 | restart_tx(); | ||
320 | spin_unlock(&tp->lock); | ||
321 | } | ||
322 | |||
323 | status = read_interrupt_status_reg(); | ||
324 | |||
325 | /************************ start note *********************************/ | ||
326 | } while (!(status & error) || more_work_to_be_done(status)); | ||
327 | /************************ end note *********************************/
328 | return IRQ_HANDLED; | ||
329 | } | ||
330 | |||
331 | --------------------------------------------------------------------- | ||
332 | |||
333 | |||
334 | We note several things from above: | ||
335 | |||
336 | I) Any interrupt source which is caused by arriving packets is now | ||
337 | turned off when it occurs. Depending on the hardware, there could be | ||
338 | several reasons that arriving packets would cause interrupts; these are the | ||
339 | interrupt sources we wish to avoid. The two common ones are a) a packet | ||
340 | arriving (rxint) b) a packet arriving and finding no DMA buffers available | ||
341 | (rxnobuff) . | ||
342 | This also means that acknowledge_ints_ASAP() will not clear the status
343 | register for those two items above; clearing is done where the proper work
344 | is done within NAPI, i.e. in poll() and refill_rx_ring(), discussed further
345 | below.
346 | netif_rx_schedule_prep() returns 1 if the device is in a running state and
347 | was successfully added to the core poll list. If we get a zero value we can
348 | _almost_ assume we are already on the list (rather than not running, since
349 | you shouldn't get an interrupt if the device is not running). We rectify
350 | this by disabling the rx and rxnobuf interrupts (see the sketch after these notes).
351 | |||
352 | II) receive_packets(dev) and make_rx_buffs_avail() may appear to have
353 | disappeared. Their functionality is actually still around.
354 |
355 | In fact, receive_packets(dev) is very close to my_poll() and
356 | make_rx_buffs_avail() is invoked from my_poll().
357 | |||
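To make point I) concrete, a helper in the spirit of acknowledge_ints_ASAP()
for a COW (type i) chip could look roughly like the sketch below. The function
name, register offset and bit masks are invented for the example and will
differ for every chip:

-------------------------------------------------------------------
/* hypothetical register/bit names -- purely illustrative */
#define INTR_STATUS	0x10		/* clear-on-write: write 1 to clear */
#define RX_INTR		0x0001		/* packet arrived */
#define RX_NO_BUF	0x0002		/* packet arrived, no DMA buffer left */

static void ack_non_rx_events(void __iomem *ioaddr, u32 status)
{
	/* ack everything except the packet-driven sources; those bits are
	 * cleared later, from dev->poll() and refill_rx_ring(), where the
	 * real work is done */
	writel(status & ~(RX_INTR | RX_NO_BUF), ioaddr + INTR_STATUS);
}
-------------------------------------------------------------------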
358 | 4) converting receive_packets() to dev->poll() | ||
359 | =============================================== | ||
360 | |||
361 | We need to convert the classical D Becker receive_packets(dev) to my_poll() | ||
362 | |||
363 | First the typical receive_packets() below: | ||
364 | ------------------------------------------------------------------- | ||
365 | |||
366 | /* this is called by interrupt handler */ | ||
367 | static void receive_packets (struct net_device *dev) | ||
368 | { | ||
369 | |||
370 | struct my_private *tp = (struct my_private *)dev->priv; | ||
371 | rx_ring = tp->rx_ring; | ||
372 | cur_rx = tp->cur_rx; | ||
373 | int entry = cur_rx % RX_RING_SIZE; | ||
374 | int received = 0; | ||
375 | int rx_work_limit = tp->dirty_rx + RX_RING_SIZE - tp->cur_rx; | ||
376 | |||
377 | while (rx_ring_not_empty) { | ||
378 | u32 rx_status; | ||
379 | unsigned int rx_size; | ||
380 | unsigned int pkt_size; | ||
381 | struct sk_buff *skb; | ||
382 | /* read size+status of next frame from DMA ring buffer */ | ||
383 | /* the number 16 and 4 are just examples */ | ||
384 | rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset)); | ||
385 | rx_size = rx_status >> 16; | ||
386 | pkt_size = rx_size - 4; | ||
387 | |||
388 | /* process errors */ | ||
389 | if ((rx_size > (MAX_ETH_FRAME_SIZE+4)) || | ||
390 | (!(rx_status & RxStatusOK))) { | ||
391 | netdrv_rx_err (rx_status, dev, tp, ioaddr); | ||
392 | return; | ||
393 | } | ||
394 | |||
395 | if (--rx_work_limit < 0) | ||
396 | break; | ||
397 | |||
398 | /* grab a skb */ | ||
399 | skb = dev_alloc_skb (pkt_size + 2); | ||
400 | if (skb) { | ||
401 | . | ||
402 | . | ||
403 | netif_rx (skb); | ||
404 | . | ||
405 | . | ||
406 | } else { /* OOM */ | ||
407 | /*seems very driver specific ... some just pass | ||
408 | whatever is on the ring already. */ | ||
409 | } | ||
410 | |||
411 | /* move to the next skb on the ring */ | ||
412 | entry = (++cur_rx) % RX_RING_SIZE;
413 | received++ ; | ||
414 | |||
415 | } | ||
416 | |||
417 | /* store current ring pointer state */ | ||
418 | tp->cur_rx = cur_rx; | ||
419 | |||
420 | /* Refill the Rx ring buffers if they are needed */ | ||
421 | refill_rx_ring(); | ||
422 | . | ||
423 | . | ||
424 | |||
425 | } | ||
426 | ------------------------------------------------------------------- | ||
427 | We change it to a new one below; note the additional parameter in | ||
428 | the call. | ||
429 | |||
430 | ------------------------------------------------------------------- | ||
431 | |||
432 | /* this is called by the network core */ | ||
433 | static int my_poll (struct net_device *dev, int *budget) | ||
434 | { | ||
435 | |||
436 | struct my_private *tp = (struct my_private *)dev->priv; | ||
437 | rx_ring = tp->rx_ring; | ||
438 | cur_rx = tp->cur_rx; | ||
439 | int entry = cur_rx % RX_RING_SIZE;
440 | /* maximum packets to send to the stack */ | ||
441 | /************************ note note *********************************/ | ||
442 | int rx_work_limit = dev->quota; | ||
443 | |||
444 | /************************ end note *********************************/
445 | do { // outer loop starts here
446 | |||
447 | clear_rx_status_register_bit(); | ||
448 | |||
449 | while (rx_ring_not_empty) { | ||
450 | u32 rx_status; | ||
451 | unsigned int rx_size; | ||
452 | unsigned int pkt_size; | ||
453 | struct sk_buff *skb; | ||
454 | /* read size+status of next frame from DMA ring buffer */ | ||
455 | /* the number 16 and 4 are just examples */ | ||
456 | rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset)); | ||
457 | rx_size = rx_status >> 16; | ||
458 | pkt_size = rx_size - 4; | ||
459 | |||
460 | /* process errors */ | ||
461 | if ((rx_size > (MAX_ETH_FRAME_SIZE+4)) || | ||
462 | (!(rx_status & RxStatusOK))) { | ||
463 | netdrv_rx_err (rx_status, dev, tp, ioaddr); | ||
464 | return 1; | ||
465 | } | ||
466 | |||
467 | /************************ note note *********************************/ | ||
468 | if (--rx_work_limit < 0) { /* we got packets, but no quota */ | ||
469 | /* store current ring pointer state */ | ||
470 | tp->cur_rx = cur_rx; | ||
471 | |||
472 | /* Refill the Rx ring buffers if they are needed */ | ||
473 | refill_rx_ring(dev); | ||
474 | goto not_done; | ||
475 | } | ||
476 | /********************** end note **********************************/ | ||
477 | |||
478 | /* grab a skb */ | ||
479 | skb = dev_alloc_skb (pkt_size + 2); | ||
480 | if (skb) { | ||
481 | . | ||
482 | . | ||
483 | /************************ note note *********************************/ | ||
484 | netif_receive_skb (skb); | ||
485 | /********************** end note **********************************/ | ||
486 | . | ||
487 | . | ||
488 | } else { /* OOM */ | ||
489 | /*seems very driver specific ... common is just pass | ||
490 | whatever is on the ring already. */ | ||
491 | } | ||
492 | |||
493 | /* move to the next skb on the ring */ | ||
494 | entry = (++cur_rx) % RX_RING_SIZE;
495 | received++ ; | ||
496 | |||
497 | } | ||
498 | |||
499 | /* store current ring pointer state */ | ||
500 | tp->cur_rx = cur_rx; | ||
501 | |||
502 | /* Refill the Rx ring buffers if they are needed */ | ||
503 | refill_rx_ring(dev); | ||
504 | |||
505 | /* no packets on ring; but new ones can arrive since we last | ||
506 | checked */ | ||
507 | status = read_interrupt_status_reg(); | ||
508 | if (rx status is not set) { | ||
509 | /* If something arrives in this narrow window, | ||
510 | an interrupt will be generated */ | ||
511 | goto done; | ||
512 | } | ||
513 | /* done! at least that's what it looks like ;-> | ||
514 | if new packets came in after our last check on status bits | ||
515 | they'll be caught by the while check and we go back and clear them | ||
516 | since we haven't exceeded our quota */
517 | } while (rx_status_is_set); | ||
518 | |||
519 | done: | ||
520 | |||
521 | /************************ note note *********************************/ | ||
522 | dev->quota -= received; | ||
523 | *budget -= received; | ||
524 | |||
525 | /* If RX ring is not full we are out of memory. */ | ||
526 | if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) | ||
527 | goto oom; | ||
528 | |||
529 | /* we are happy/done, no more packets on ring; put us back | ||
530 | to where we can start processing interrupts again */ | ||
531 | netif_rx_complete(dev); | ||
532 | enable_rx_and_rxnobuf_ints(); | ||
533 | |||
534 | /* The last op happens after poll completion. Which means the following: | ||
535 | * 1. it can race with disabling irqs in irq handler (which are done to | ||
536 | * schedule polls) | ||
537 | * 2. it can race with dis/enabling irqs in other poll threads | ||
538 | * 3. if an irq was raised after the beginning of the outer loop
539 | * (marked in the code above), it will be triggered here
540 | * immediately.
541 | * | ||
542 | * Summarizing: the logic may result in some redundant irqs both | ||
543 | * due to races in masking and due to too late acking of already | ||
544 | * processed irqs. The good news: no events are ever lost. | ||
545 | */ | ||
546 | |||
547 | return 0; /* done */ | ||
548 | |||
549 | not_done: | ||
550 | if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 || | ||
551 | tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) | ||
552 | refill_rx_ring(dev); | ||
553 | |||
554 | if (!received) { | ||
555 | printk("received==0\n"); | ||
556 | received = 1; | ||
557 | } | ||
558 | dev->quota -= received; | ||
559 | *budget -= received; | ||
560 | return 1; /* not_done */ | ||
561 | |||
562 | oom: | ||
563 | /* Start timer, stop polling, but do not enable rx interrupts. */ | ||
564 | start_poll_timer(dev); | ||
565 | return 0; /* we'll take it from here so tell core "done"*/ | ||
566 | |||
567 | /************************ end note *********************************/
568 | } | ||
569 | ------------------------------------------------------------------- | ||
570 | |||
571 | From above we note that: | ||
572 | 0) rx_work_limit = dev->quota | ||
573 | 1) refill_rx_ring() is in charge of clearing the bit for rxnobuff when | ||
574 | it does the work. | ||
575 | 2) We have a done and not_done state. | ||
576 | 3) instead of netif_rx() we call netif_receive_skb() to pass the skb. | ||
577 | 4) we have a new way of handling the OOM condition
578 | 5) A new outer do { } while loop has been added. It ensures that if a new
579 | packet comes in after we think we are done, and we have not yet exceeded
580 | our quota, we continue sending packets up.
581 | |||
582 | |||
583 | ----------------------------------------------------------- | ||
584 | Poll timer code will need to do the following: | ||
585 | |||
586 | a) | ||
587 | |||
588 | if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 || | ||
589 | tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) | ||
590 | refill_rx_ring(dev); | ||
591 | |||
592 | /* If RX ring is not full we are still out of memory. | ||
593 | Restart the timer again. Else we re-add ourselves | ||
594 | to the master poll list. | ||
595 | */ | ||
596 | |||
597 | if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL) | ||
598 | restart_timer(); | ||
599 | |||
600 | else netif_rx_schedule(dev); /* we are back on the poll list */ | ||
601 | |||
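Neither start_poll_timer() nor the timer handler is spelled out anywhere in
this document; one plausible way to wire them up with the classic timer API is
sketched below. The oom_timer field, my_poll_timer() and my_oom_timer_work()
are assumptions made for the example, not part of the original text:

-------------------------------------------------------------------
/* assumed field in struct my_private:  struct timer_list oom_timer; */

static void my_poll_timer(unsigned long data)
{
	struct net_device *dev = (struct net_device *)data;

	/* perform the a) block above: refill, then either restart the
	 * timer or rejoin the poll list via netif_rx_schedule(dev) */
	my_oom_timer_work(dev);			/* placeholder */
}

static void start_poll_timer(struct net_device *dev)
{
	struct my_private *tp = (struct my_private *)dev->priv;

	init_timer(&tp->oom_timer);
	tp->oom_timer.function = my_poll_timer;
	tp->oom_timer.data = (unsigned long)dev;
	tp->oom_timer.expires = jiffies + HZ/50;	/* back off ~20ms */
	add_timer(&tp->oom_timer);
}
-------------------------------------------------------------------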
602 | 5) dev->close() and dev->suspend() issues | ||
603 | ========================================== | ||
604 | The driver writer needn't worry about this; the top net layer takes | ||
605 | care of it. | ||
606 | |||
607 | 6) Adding new Stats to /proc | ||
608 | ============================= | ||
609 | In order to debug some of the new features, we introduce new stats | ||
610 | that need to be collected. | ||
611 | TODO: Fill this later. | ||
612 | |||
613 | APPENDIX 1: discussion on using ethernet HW FC | ||
614 | ============================================== | ||
615 | Most chips with FC only send a pause packet when they run out of Rx buffers. | ||
616 | Since packets are pulled off the DMA ring by a softirq in NAPI, | ||
617 | if the system is slow in grabbing them and we have a high input | ||
618 | rate (faster than the system's capacity to remove packets), then theoretically | ||
619 | there will only be one rx interrupt for all packets during a given packetstorm. | ||
620 | Under low load, we might have a single interrupt per packet. | ||
621 | FC should be programmed to apply in the case when the system can't pull out
622 | packets fast enough, i.e. send a pause only when you run out of rx buffers.
623 | Note FC in itself is a good solution but we have found it to not be
624 | much of a commodity feature (both in NICs and switches) and hence falls
625 | under the same category as using NIC based mitigation. Also, experiments
626 | indicate that it's much harder to resolve the resource allocation
627 | issue (aka the lazy receiving that NAPI offers), and hence quantifying its
628 | usefulness proved harder. In any case, FC works even better with NAPI but is
629 | not necessary.
630 | |||
631 | |||
632 | APPENDIX 2: the "rotting packet" race-window avoidance scheme | ||
633 | ============================================================= | ||
634 | |||
635 | There are two types of associations seen here | ||
636 | |||
637 | 1) status/int which honors level triggered IRQ | ||
638 | |||
639 | If a status bit for receive or rxnobuff is set and the corresponding | ||
640 | interrupt-enable bit is not on, then no interrupts will be generated. However, | ||
641 | as soon as the "interrupt-enable" bit is unmasked, an immediate interrupt is | ||
642 | generated. [assuming the status bit was not turned off]. | ||
643 | Generally the concept of level triggered IRQs in association with a status and | ||
644 | interrupt-enable CSR register set is used to avoid the race. | ||
645 | |||
646 | If we take the example of the tulip:
647 | "pending work" is indicated by the status bit (CSR5 in the tulip).
648 | The corresponding interrupt-enable bit (CSR7 in the tulip) might be turned
649 | off (but CSR5 will continue to be set by new packet arrivals even if we
650 | clear it the first time).
651 | Very important is the fact that if we turn the interrupt-enable bit on while
652 | the status bit is set, an immediate irq is triggered.
653 | |||
654 | If we cleared the rx ring and proclaimed there was "no more work | ||
655 | to be done" and then went on to do a few other things; then when we enable | ||
656 | interrupts, there is a possibility that a new packet might sneak in during | ||
657 | this phase. It helps to look at the pseudo code for the tulip poll | ||
658 | routine: | ||
659 | |||
660 | -------------------------- | ||
661 | do { | ||
662 | ACK; | ||
663 | while (ring_is_not_empty()) { | ||
664 | work-work-work | ||
665 | if quota is exceeded: exit, no touching irq status/mask | ||
666 | } | ||
667 | /* No packets, but new can arrive while we are doing this*/ | ||
668 | CSR5 := read | ||
669 | if (CSR5 is not set) { | ||
670 | /* If something arrives in this narrow window here, | ||
671 | * where the comments are ;-> irq will be generated */ | ||
672 | unmask irqs; | ||
673 | exit poll; | ||
674 | } | ||
675 | } while (rx_status_is_set); | ||
676 | ------------------------ | ||
677 | |||
678 | The CSR5 bit of interest is only the rx status.
679 | If you look at the last if statement:
680 | you just finished grabbing all the packets from the rx ring; you check if
681 | the status bit says there are more packets just in ... it says none; you then
682 | enable rx interrupts again. If a new packet came in during this check,
683 | we are counting on CSR5 being set in that small window of opportunity,
684 | so that by re-enabling interrupts we would actually trigger an interrupt
685 | to register the new packet for processing.
686 | |||
687 | [The above description may be very verbose; if you have better wording
688 | that will make this more understandable, please suggest it.]
689 | |||
690 | 2) non-capable hardware | ||
691 | |||
692 | These do not generally respect level triggered IRQs. Normally, | ||
693 | irqs may be lost while being masked and the only way to leave poll is to do | ||
694 | a double check for new input after netif_rx_complete() is invoked | ||
695 | and re-enable polling (after seeing this new input). | ||
696 | |||
697 | Sample code: | ||
698 | |||
699 | --------- | ||
700 | . | ||
701 | . | ||
702 | restart_poll: | ||
703 | while (ring_is_not_empty()) { | ||
704 | work-work-work | ||
705 | if quota is exceeded: exit, not touching irq status/mask | ||
706 | } | ||
707 | . | ||
708 | . | ||
709 | . | ||
710 | enable_rx_interrupts();
711 | netif_rx_complete(dev);
712 | if (ring_has_new_packet() && netif_rx_reschedule(dev, received)) {
713 | disable_rx_and_rxnobufs();
714 | goto restart_poll;
715 | }
716 | --------- | ||
717 | |||
718 | Basically netif_rx_complete() removes us from the poll list; but because a
719 | new packet might have sneaked in during the race window (and would otherwise
720 | never be noticed), we check again and, if needed, re-add ourselves to the list.
721 | |||
722 | |||
723 | |||
724 | |||
725 | APPENDIX 3: Scheduling issues. | ||
726 | ============================== | ||
727 | As seen, NAPI moves processing to softirq level. Linux uses ksoftirqd as the
728 | general solution to schedule softirqs to run before the next interrupt and to
729 | put them under scheduler control. This also prevents consecutive softirqs from
730 | monopolizing the CPU. It also means that the priority of ksoftirqd needs
731 | to be considered when running very CPU-intensive applications alongside
732 | networking, to get the proper softirq/user balance. Increasing ksoftirqd
733 | priority to 0 (eventually more) is reported to cure problems with low network
734 | performance at high CPU load.
735 | |||
736 | Most used processes in a GIGE router: | ||
737 | USER PID %CPU %MEM SIZE RSS TTY STAT START TIME COMMAND | ||
738 | root 3 0.2 0.0 0 0 ? RWN Aug 15 602:00 (ksoftirqd_CPU0) | ||
739 | root 232 0.0 7.9 41400 40884 ? S Aug 15 74:12 gated | ||
740 | |||
741 | -------------------------------------------------------------------- | ||
742 | |||
743 | relevant sites: | ||
744 | ================== | ||
745 | ftp://robur.slu.se/pub/Linux/net-development/NAPI/ | ||
746 | |||
747 | |||
748 | -------------------------------------------------------------------- | ||
749 | TODO: Write net-skeleton.c driver. | ||
750 | ------------------------------------------------------------- | ||
751 | |||
752 | Authors: | ||
753 | ======== | ||
754 | Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> | ||
755 | Jamal Hadi Salim <hadi@cyberus.ca> | ||
756 | Robert Olsson <Robert.Olsson@data.slu.se> | ||
757 | |||
758 | Acknowledgements: | ||
759 | ================ | ||
760 | People who made this document better: | ||
761 | |||
762 | Lennert Buytenhek <buytenh@gnu.org> | ||
763 | Andrew Morton <akpm@zip.com.au> | ||
764 | Manfred Spraul <manfred@colorfullife.com> | ||
765 | Donald Becker <becker@scyld.com> | ||
766 | Jeff Garzik <jgarzik@pobox.com> | ||
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index 37869295fc70..9f7be9b7785e 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -95,9 +95,13 @@ dev->set_multicast_list: | |||
95 | Synchronization: netif_tx_lock spinlock. | 95 | Synchronization: netif_tx_lock spinlock. |
96 | Context: BHs disabled | 96 | Context: BHs disabled |
97 | 97 | ||
98 | dev->poll: | 98 | struct napi_struct synchronization rules |
99 | Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See | 99 | ======================================== |
100 | dev_close code and comments in net/core/dev.c for more info. | 100 | napi->poll: |
101 | Synchronization: NAPI_STATE_SCHED bit in napi->state. Device | ||
102 | driver's dev->close method will invoke napi_disable() on | ||
103 | all NAPI instances which will do a sleeping poll on the | ||
104 | NAPI_STATE_SCHED napi->state bit, waiting for all pending | ||
105 | NAPI activity to cease. | ||
101 | Context: softirq | 106 | Context: softirq |
102 | will be called with interrupts disabled by netconsole. | 107 | will be called with interrupts disabled by netconsole. |
103 | |||
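As a rough illustration of these rules, a NAPI driver's close routine might
quiesce its instance as sketched below. The function my_close(), the layout of
struct my_private and its napi member, and my_free_rx_ring() are assumptions
made for the example; error handling is omitted:

-------------------------------------------------------------------
static int my_close(struct net_device *dev)
{
	struct my_private *tp = netdev_priv(dev);

	netif_stop_queue(dev);

	/* sleeps until any in-flight poll has cleared NAPI_STATE_SCHED */
	napi_disable(&tp->napi);

	/* now it is safe to mask the IRQ and tear down the rx ring */
	disable_irq(dev->irq);
	my_free_rx_ring(dev);			/* placeholder */

	return 0;
}
-------------------------------------------------------------------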