83 files changed, 2423 insertions, 1261 deletions
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index a986e9bbba3d..bcebb9eaedce 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -160,7 +160,7 @@ Description:
                match the driver to the device.  For example:
                # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id
-What:           /sys/bus/usb/device/.../avoid_reset
+What:           /sys/bus/usb/device/.../avoid_reset_quirk
 Date:           December 2009
 Contact:        Oliver Neukum <oliver@neukum.org>
 Description:
diff --git a/Documentation/PCI/PCI-DMA-mapping.txt b/Documentation/DMA-API-HOWTO.txt
index ecad88d9fe59..52618ab069ad 100644
--- a/Documentation/PCI/PCI-DMA-mapping.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -1,12 +1,12 @@
-                        Dynamic DMA mapping
+                     Dynamic DMA mapping Guide
-                        ===================
+                     =========================
                 David S. Miller <davem@redhat.com>
                 Richard Henderson <rth@cygnus.com>
                  Jakub Jelinek <jakub@redhat.com>
-This document describes the DMA mapping system in terms of the pci_
+This is a guide for device driver writers on how to use the DMA API
-API.  For a similar API that works for generic devices, see
+with example pseudo-code.  For a concise description of the API, see
 DMA-API.txt.
 Most of the 64bit platforms have special hardware that translates bus
@@ -26,12 +26,15 @@ mapped only for the time they are actually used and unmapped after the DMA
 transfer.
 The following API will work of course even on platforms where no such
-hardware exists, see e.g. arch/x86/include/asm/pci.h for how it is implemented on
+hardware exists.
-top of the virt_to_bus interface.
+Note that the DMA API works with any bus, independent of the underlying
+microprocessor architecture. You should use the DMA API rather than
+the bus-specific DMA API (e.g. pci_dma_*).
 First of all, you should make sure
-#include <linux/pci.h>
+#include <linux/dma-mapping.h>
 is in your driver. This file will obtain for you the definition of the
 dma_addr_t (which can hold any valid DMA address for the platform)
@@ -78,44 +81,43 @@ for you to DMA from/to.
                        DMA addressing limitations
 Does your device have any DMA addressing limitations?  For example, is
-your device only capable of driving the low order 24-bits of address
+your device only capable of driving the low order 24-bits of address?
-on the PCI bus for SAC DMA transfers?  If so, you need to inform the
+If so, you need to inform the kernel of this fact.
-PCI layer of this fact.
 By default, the kernel assumes that your device can address the full
-32-bits in a SAC cycle.  For a 64-bit DAC capable device, this needs
+32-bits.  For a 64-bit capable device, this needs to be increased.
-to be increased.  And for a device with limitations, as discussed in
+And for a device with limitations, as discussed in the previous
-the previous paragraph, it needs to be decreased.
+paragraph, it needs to be decreased.
-pci_alloc_consistent() by default will return 32-bit DMA addresses.
+Special note about PCI: the PCI-X specification requires PCI-X devices to
-PCI-X specification requires PCI-X devices to support 64-bit
+support 64-bit addressing (DAC) for all transactions.  And at least
-addressing (DAC) for all transactions. And at least one platform (SGI
+one platform (SGI SN2) requires 64-bit consistent allocations to
-SN2) requires 64-bit consistent allocations to operate correctly when
+operate correctly when the IO bus is in PCI-X mode.
-the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(),
-it's good practice to call pci_set_consistent_dma_mask() to set the
+For correct operation, you must interrogate the kernel in your device
-appropriate mask even if your device only supports 32-bit DMA
+probe routine to see if the DMA controller on the machine can properly
-(default) and especially if it's a PCI-X device.
+support the DMA addressing limitation your device has.  It is good
+style to do this even if your device holds the default setting,
-For correct operation, you must interrogate the PCI layer in your
-device probe routine to see if the PCI controller on the machine can
-properly support the DMA addressing limitation your device has.  It is
-good style to do this even if your device holds the default setting,
 because this shows that you did think about these issues wrt. your
 device.
-The query is performed via a call to pci_set_dma_mask():
+The query is performed via a call to dma_set_mask():
-        int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask);
+        int dma_set_mask(struct device *dev, u64 mask);
 The query for consistent allocations is performed via a call to
-pci_set_consistent_dma_mask():
+dma_set_coherent_mask():
-        int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask);
+        int dma_set_coherent_mask(struct device *dev, u64 mask);
-Here, pdev is a pointer to the PCI device struct of your device, and
+Here, dev is a pointer to the device struct of your device, and mask
-device_mask is a bit mask describing which bits of a PCI address your
+is a bit mask describing which bits of an address your device
-device supports.  It returns zero if your card can perform DMA
+supports.  It returns zero if your card can perform DMA properly on
-properly on the machine given the address mask you provided.
+the machine given the address mask you provided.  In general, the
+device struct of your device is embedded in the bus specific device
+struct of your device.  For example, a pointer to the device struct of
+your PCI device is pdev->dev (pdev is a pointer to the PCI device
+struct of your device).
 If it returns non-zero, your device cannot perform DMA properly on
 this platform, and attempting to do so will result in undefined
@@ -133,31 +135,30 @@ of your driver reports that performance is bad or that the device is not
 even detected, you can ask them for the kernel messages to find out
 exactly why.
-The standard 32-bit addressing PCI device would do something like
+The standard 32-bit addressing device would do something like this:
-this:
-        if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
+        if (dma_set_mask(dev, DMA_BIT_MASK(32))) {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }
-Another common scenario is a 64-bit capable device.  The approach
+Another common scenario is a 64-bit capable device.  The approach here
-here is to try for 64-bit DAC addressing, but back down to a
+is to try for 64-bit addressing, but back down to a 32-bit mask that
-32-bit mask should that fail.  The PCI platform code may fail the
+should not fail.  The kernel may fail the 64-bit mask not because the
-64-bit mask not because the platform is not capable of 64-bit
+platform is not capable of 64-bit addressing.  Rather, it may fail in
-addressing.  Rather, it may fail in this case simply because
+this case simply because 32-bit addressing is done more efficiently
-32-bit SAC addressing is done more efficiently than DAC addressing.
+than 64-bit addressing.  For example, Sparc64 PCI SAC addressing is
-Sparc64 is one platform which behaves in this way.
+more efficient than DAC addressing.
 Here is how you would handle a 64-bit capable device which can drive
 all 64-bits when accessing streaming DMA:
        int using_dac;
-        if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+        if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
                using_dac = 1;
-        } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
+        } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
                using_dac = 0;
        } else {
                printk(KERN_WARNING
@@ -170,36 +171,36 @@ the case would look like this:
        int using_dac, consistent_using_dac;
-        if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+        if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
                using_dac = 1;
                consistent_using_dac = 1;
-                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+                dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
-        } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
+        } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
                using_dac = 0;
                consistent_using_dac = 0;
-                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
+                dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
        } else {
                printk(KERN_WARNING
                       "mydev: No suitable DMA available.\n");
                goto ignore_this_device;
        }
-pci_set_consistent_dma_mask() will always be able to set the same or a
+dma_set_coherent_mask() will always be able to set the same or a
-smaller mask as pci_set_dma_mask(). However for the rare case that a
+smaller mask as dma_set_mask(). However for the rare case that a
 device driver only uses consistent allocations, one would have to
-check the return value from pci_set_consistent_dma_mask().
+check the return value from dma_set_coherent_mask().
 Finally, if your device can only drive the low 24-bits of
-address during PCI bus mastering you might do something like:
+address you might do something like:
-        if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) {
+        if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
                printk(KERN_WARNING
                       "mydev: 24-bit DMA addressing not available.\n");
                goto ignore_this_device;
        }
-When pci_set_dma_mask() is successful, and returns zero, the PCI layer
+When dma_set_mask() is successful, and returns zero, the kernel saves
-saves away this mask you have provided.  The PCI layer will use this
+away this mask you have provided.  The kernel will use this
 information later when you make DMA mappings.
 There is a case which we are aware of at this time, which is worth
@@ -208,7 +209,7 @@ functions (for example a sound card provides playback and record
 functions) and the various different functions have _different_
 DMA addressing limitations, you may wish to probe each mask and
 only provide the functionality which the machine can handle.  It
-is important that the last call to pci_set_dma_mask() be for the
+is important that the last call to dma_set_mask() be for the
 most specific mask.
 Here is pseudo-code showing how this might be done:
@@ -217,17 +218,17 @@ Here is pseudo-code showing how this might be done:
        #define RECORD_ADDRESS_BITS     DMA_BIT_MASK(24)
        struct my_sound_card *card;
-        struct pci_dev *pdev;
+        struct device *dev;
        ...
-        if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) {
+        if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
                       card->name);
        }
-        if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) {
+        if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
@@ -252,8 +253,8 @@ There are two types of DMA mappings:
  Think of "consistent" as "synchronous" or "coherent".
  The current default is to return consistent memory in the low 32
-  bits of the PCI bus space.  However, for future compatibility you
+  bits of the bus space.  However, for future compatibility you should
-  should set the consistent mask even if this default is fine for your
+  set the consistent mask even if this default is fine for your
  driver.
  Good examples of what to use consistent mappings for are:
@@ -285,9 +286,9 @@ There are two types of DMA mappings:
             found in PCI bridges (such as by reading a register's value
             after writing it).
- Streaming DMA mappings which are usually mapped for one DMA transfer,
+- Streaming DMA mappings which are usually mapped for one DMA
-  unmapped right after it (unless you use pci_dma_sync_* below) and for which
+  transfer, unmapped right after it (unless you use dma_sync_* below)
-  hardware can optimize for sequential accesses.
+  and for which hardware can optimize for sequential accesses.
  This of "streaming" as "asynchronous" or "outside the coherency
  domain".
@@ -302,8 +303,8 @@ There are two types of DMA mappings:
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.
-Neither type of DMA mapping has alignment restrictions that come
+Neither type of DMA mapping has alignment restrictions that come from
-from PCI, although some devices may have such restrictions.
+the underlying bus, although some devices may have such restrictions.
 Also, systems with caches that aren't DMA-coherent will work better
 when the underlying buffers don't share cache lines with other data.
@@ -315,33 +316,27 @@ you should do:
        dma_addr_t dma_handle;
-        cpu_addr = pci_alloc_consistent(pdev, size, &dma_handle);
+        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);
-where pdev is a struct pci_dev *. This may be called in interrupt context.
-You should use dma_alloc_coherent (see DMA-API.txt) for buses
-where devices don't have struct pci_dev (like ISA, EISA).
-This argument is needed because the DMA translations may be bus
+where dev is a struct device *. This may be called in interrupt
-specific (and often is private to the bus which the device is attached
+context with the GFP_ATOMIC flag.
-to).
 Size is the length of the region you want to allocate, in bytes.
 This routine will allocate RAM for that region, so it acts similarly to
 __get_free_pages (but takes size instead of a page order).  If your
 driver needs regions sized smaller than a page, you may prefer using
-the pci_pool interface, described below.
+the dma_pool interface, described below.
-The consistent DMA mapping interfaces, for non-NULL pdev, will by
+The consistent DMA mapping interfaces, for non-NULL dev, will by
-default return a DMA address which is SAC (Single Address Cycle)
+default return a DMA address which is 32-bit addressable.  Even if the
-addressable.  Even if the device indicates (via PCI dma mask) that it
+device indicates (via DMA mask) that it may address the upper 32-bits,
-may address the upper 32-bits and thus perform DAC cycles, consistent
+consistent allocation will only return > 32-bit addresses for DMA if
-allocation will only return > 32-bit PCI addresses for DMA if the
+the consistent DMA mask has been explicitly changed via
-consistent dma mask has been explicitly changed via
+dma_set_coherent_mask().  This is true of the dma_pool interface as
-pci_set_consistent_dma_mask().  This is true of the pci_pool interface
+well.
-as well.
+dma_alloc_coherent returns two values: the virtual address which you
-pci_alloc_consistent returns two values: the virtual address which you
 can use to access it from the CPU and dma_handle which you pass to the
 card.
@@ -354,54 +349,54 @@ buffer you receive will not cross a 64K boundary.
 To unmap and free such a DMA region, you call:
-        pci_free_consistent(pdev, size, cpu_addr, dma_handle);
+        dma_free_coherent(dev, size, cpu_addr, dma_handle);
-where pdev, size are the same as in the above call and cpu_addr and
+where dev, size are the same as in the above call and cpu_addr and
-dma_handle are the values pci_alloc_consistent returned to you.
+dma_handle are the values dma_alloc_coherent returned to you.
 This function may not be called in interrupt context.
 If your driver needs lots of smaller memory regions, you can write
-custom code to subdivide pages returned by pci_alloc_consistent,
+custom code to subdivide pages returned by dma_alloc_coherent,
-or you can use the pci_pool API to do that.  A pci_pool is like
+or you can use the dma_pool API to do that.  A dma_pool is like
-a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages.
+a kmem_cache, but it uses dma_alloc_coherent not __get_free_pages.
 Also, it understands common hardware constraints for alignment,
 like queue heads needing to be aligned on N byte boundaries.
-Create a pci_pool like this:
+Create a dma_pool like this:
-        struct pci_pool *pool;
+        struct dma_pool *pool;
-        pool = pci_pool_create(name, pdev, size, align, alloc);
+        pool = dma_pool_create(name, dev, size, align, alloc);
-The "name" is for diagnostics (like a kmem_cache name); pdev and size
+The "name" is for diagnostics (like a kmem_cache name); dev and size
 are as above.  The device's hardware alignment requirement for this
 type of data is "align" (which is expressed in bytes, and must be a
 power of two).  If your device has no boundary crossing restrictions,
 pass 0 for alloc; passing 4096 says memory allocated from this pool
 must not cross 4KByte boundaries (but at that time it may be better to
-go for pci_alloc_consistent directly instead).
+go for dma_alloc_coherent directly instead).
-Allocate memory from a pci pool like this:
+Allocate memory from a dma_pool like this:
-        cpu_addr = pci_pool_alloc(pool, flags, &dma_handle);
+        cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);
 flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
-holding SMP locks), SLAB_ATOMIC otherwise.  Like pci_alloc_consistent,
+holding SMP locks), SLAB_ATOMIC otherwise.  Like dma_alloc_coherent,
 this returns two values, cpu_addr and dma_handle.
-Free memory that was allocated from a pci_pool like this:
+Free memory that was allocated from a dma_pool like this:
-        pci_pool_free(pool, cpu_addr, dma_handle);
+        dma_pool_free(pool, cpu_addr, dma_handle);
-where pool is what you passed to pci_pool_alloc, and cpu_addr and
+where pool is what you passed to dma_pool_alloc, and cpu_addr and
-dma_handle are the values pci_pool_alloc returned. This function
+dma_handle are the values dma_pool_alloc returned. This function
 may be called in interrupt context.
-Destroy a pci_pool by calling:
+Destroy a dma_pool by calling:
-        pci_pool_destroy(pool);
+        dma_pool_destroy(pool);
-Make sure you've called pci_pool_free for all memory allocated
+Make sure you've called dma_pool_free for all memory allocated
 from a pool before you destroy the pool. This function may not
 be called in interrupt context.
@@ -411,15 +406,15 @@ The interfaces described in subsequent portions of this document
 take a DMA direction argument, which is an integer and takes on
 one of the following values:
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
- PCI_DMA_NONE
+ DMA_NONE
 One should provide the exact DMA direction if you know it.
-PCI_DMA_TODEVICE means "from main memory to the PCI device"
+DMA_TO_DEVICE means "from main memory to the device"
-PCI_DMA_FROMDEVICE means "from the PCI device to main memory"
+DMA_FROM_DEVICE means "from the device to main memory"
 It is the direction in which the data moves during the DMA
 transfer.
@@ -427,12 +422,12 @@ You are _strongly_ encouraged to specify this as precisely
 as you possibly can.
 If you absolutely cannot know the direction of the DMA transfer,
-specify PCI_DMA_BIDIRECTIONAL.  It means that the DMA can go in
+specify DMA_BIDIRECTIONAL.  It means that the DMA can go in
 either direction.  The platform guarantees that you may legally
 specify this, and that it will work, but this may be at the
 cost of performance for example.
-The value PCI_DMA_NONE is to be used for debugging.  One can
+The value DMA_NONE is to be used for debugging.  One can
 hold this in a data structure before you come to know the
 precise direction, and this will help catch cases where your
 direction tracking logic has failed to set things up properly.
@@ -442,21 +437,21 @@ potential platform-specific optimizations of such) is for debugging.
 Some platforms actually have a write permission boolean which DMA
 mappings can be marked with, much like page protections in the user
 program address space.  Such platforms can and do report errors in the
-kernel logs when the PCI controller hardware detects violation of the
+kernel logs when the DMA controller hardware detects violation of the
 permission setting.
 Only streaming mappings specify a direction, consistent mappings
 implicitly have a direction attribute setting of
-PCI_DMA_BIDIRECTIONAL.
+DMA_BIDIRECTIONAL.
 The SCSI subsystem tells you the direction to use in the
 'sc_data_direction' member of the SCSI command your driver is
 working on.
 For Networking drivers, it's a rather simple affair.  For transmit
-packets, map/unmap them with the PCI_DMA_TODEVICE direction
+packets, map/unmap them with the DMA_TO_DEVICE direction
 specifier.  For receive packets, just the opposite, map/unmap them
-with the PCI_DMA_FROMDEVICE direction specifier.
+with the DMA_FROM_DEVICE direction specifier.
                  Using Streaming DMA mappings
@@ -467,43 +462,43 @@ scatterlist.
 To map a single region, you do:
-        struct pci_dev *pdev = mydev->pdev;
+        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;
-        dma_handle = pci_map_single(pdev, addr, size, direction);
+        dma_handle = dma_map_single(dev, addr, size, direction);
 and to unmap it:
-        pci_unmap_single(pdev, dma_handle, size, direction);
+        dma_unmap_single(dev, dma_handle, size, direction);
-You should call pci_unmap_single when the DMA activity is finished, e.g.
+You should call dma_unmap_single when the DMA activity is finished, e.g.
 from the interrupt which told you that the DMA transfer is done.
 Using cpu pointers like this for single mappings has a disadvantage,
 you cannot reference HIGHMEM memory in this way.  Thus, there is a
-map/unmap interface pair akin to pci_{map,unmap}_single.  These
+map/unmap interface pair akin to dma_{map,unmap}_single.  These
 interfaces deal with page/offset pairs instead of cpu pointers.
 Specifically:
-        struct pci_dev *pdev = mydev->pdev;
+        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;
-        dma_handle = pci_map_page(pdev, page, offset, size, direction);
+        dma_handle = dma_map_page(dev, page, offset, size, direction);
        ...
-        pci_unmap_page(pdev, dma_handle, size, direction);
+        dma_unmap_page(dev, dma_handle, size, direction);
 Here, "offset" means byte offset within the given page.
 With scatterlists, you map a region gathered from several regions by:
-        int i, count = pci_map_sg(pdev, sglist, nents, direction);
+        int i, count = dma_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;
        for_each_sg(sglist, sg, count, i) {
@@ -527,16 +522,16 @@ accessed sg->address and sg->length as shown above.
 To unmap a scatterlist, just call:
-        pci_unmap_sg(pdev, sglist, nents, direction);
+        dma_unmap_sg(dev, sglist, nents, direction);
 Again, make sure DMA activity has already finished.
-PLEASE NOTE:  The 'nents' argument to the pci_unmap_sg call must be
+PLEASE NOTE:  The 'nents' argument to the dma_unmap_sg call must be
-              the _same_ one you passed into the pci_map_sg call,
+              the _same_ one you passed into the dma_map_sg call,
              it should _NOT_ be the 'count' value _returned_ from the
-              pci_map_sg call.
+              dma_map_sg call.
-Every pci_map_{single,sg} call should have its pci_unmap_{single,sg}
+Every dma_map_{single,sg} call should have its dma_unmap_{single,sg}
 counterpart, because the bus address space is a shared resource (although
 in some ports the mapping is per each BUS so less devices contend for the
 same bus address space) and you could render the machine unusable by eating
@@ -547,14 +542,14 @@ the data in between the DMA transfers, the buffer needs to be synced
 properly in order for the cpu and device to see the most uptodate and
 correct copy of the DMA buffer.
-So, firstly, just map it with pci_map_{single,sg}, and after each DMA
+So, firstly, just map it with dma_map_{single,sg}, and after each DMA
 transfer call either:
-        pci_dma_sync_single_for_cpu(pdev, dma_handle, size, direction);
+        dma_sync_single_for_cpu(dev, dma_handle, size, direction);
 or:
-        pci_dma_sync_sg_for_cpu(pdev, sglist, nents, direction);
+        dma_sync_sg_for_cpu(dev, sglist, nents, direction);
 as appropriate.
@@ -562,27 +557,27 @@ Then, if you wish to let the device get at the DMA area again,
 finish accessing the data with the cpu, and then before actually
 giving the buffer to the hardware call either:
-        pci_dma_sync_single_for_device(pdev, dma_handle, size, direction);
+        dma_sync_single_for_device(dev, dma_handle, size, direction);
 or:
-        pci_dma_sync_sg_for_device(dev, sglist, nents, direction);
+        dma_sync_sg_for_device(dev, sglist, nents, direction);
 as appropriate.
 After the last DMA transfer call one of the DMA unmap routines
-pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_*
+dma_unmap_{single,sg}. If you don't touch the data from the first dma_map_*
-call till pci_unmap_*, then you don't have to call the pci_dma_sync_*
+call till dma_unmap_*, then you don't have to call the dma_sync_*
 routines at all.
 Here is pseudo code which shows a situation in which you would need
-to use the pci_dma_sync_*() interfaces.
+to use the dma_sync_*() interfaces.
        my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;
-                mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE);
+                mapping = dma_map_single(&cp->dev, buffer, len, DMA_FROM_DEVICE);
                cp->rx_buf = buffer;
                cp->rx_len = len;
@@ -606,25 +601,25 @@ to use the pci_dma_sync_*() interfaces.
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
-                        pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma,
+                        dma_sync_single_for_cpu(&cp->dev, cp->rx_dma,
-                                                    cp->rx_len,
+                                                cp->rx_len,
-                                                    PCI_DMA_FROMDEVICE);
+                                                DMA_FROM_DEVICE);
                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
-                                pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len,
+                                dma_unmap_single(&cp->dev, cp->rx_dma, cp->rx_len,
-                                                 PCI_DMA_FROMDEVICE);
+                                                 DMA_FROM_DEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* Just sync the buffer and give it back
                                 * to the card.
                                 */
-                                pci_dma_sync_single_for_device(cp->pdev,
+                                dma_sync_single_for_device(&cp->dev,
-                                                               cp->rx_dma,
+                                                           cp->rx_dma,
-                                                               cp->rx_len,
+                                                           cp->rx_len,
-                                                               PCI_DMA_FROMDEVICE);
+                                                           DMA_FROM_DEVICE);
                                give_rx_buf_to_card(cp);
                        }
                }
@@ -634,19 +629,19 @@ Drivers converted fully to this interface should not use virt_to_bus any
 longer, nor should they use bus_to_virt. Some drivers have to be changed a
 little bit, because there is no longer an equivalent to bus_to_virt in the
 dynamic DMA mapping scheme - you have to always store the DMA addresses
-returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single
+returned by the dma_alloc_coherent, dma_pool_alloc, and dma_map_single
-calls (pci_map_sg stores them in the scatterlist itself if the platform
+calls (dma_map_sg stores them in the scatterlist itself if the platform
 supports dynamic DMA mapping in hardware) in your driver structures and/or
 in the card registers.
-All PCI drivers should be using these interfaces with no exceptions.
+All drivers should be using these interfaces with no exceptions.  It
-It is planned to completely remove virt_to_bus() and bus_to_virt() as
+is planned to completely remove virt_to_bus() and bus_to_virt() as
 they are entirely deprecated.  Some ports already do not provide these
 as it is impossible to correctly support them.
                Optimizing Unmap State Space Consumption
-On many platforms, pci_unmap_{single,page}() is simply a nop.
+On many platforms, dma_unmap_{single,page}() is simply a nop.
 Therefore, keeping track of the mapping address and length is a waste
 of space.  Instead of filling your drivers up with ifdefs and the like
 to "work around" this (which would defeat the whole purpose of a
@@ -655,7 +650,7 @@ portable API) the following facilities are provided.
 Actually, instead of describing the macros one by one, we'll
 transform some example code.
-1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures.
+1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before:
        struct ring_state {
@@ -668,14 +663,11 @@ transform some example code.
        struct ring_state {
                struct sk_buff *skb;
-                DECLARE_PCI_UNMAP_ADDR(mapping)
+                DEFINE_DMA_UNMAP_ADDR(mapping);
-                DECLARE_PCI_UNMAP_LEN(len)
+                DEFINE_DMA_UNMAP_LEN(len);
        };
-   NOTE: DO NOT put a semicolon at the end of the DECLARE_*()
+2) Use dma_unmap_{addr,len}_set to set these values.
-         macro.
-2) Use pci_unmap_{addr,len}_set to set these values.
   Example, before:
        ringp->mapping = FOO;
@@ -683,21 +675,21 @@ transform some example code.
   after:
-        pci_unmap_addr_set(ringp, mapping, FOO);
+        dma_unmap_addr_set(ringp, mapping, FOO);
-        pci_unmap_len_set(ringp, len, BAR);
+        dma_unmap_len_set(ringp, len, BAR);
-3) Use pci_unmap_{addr,len} to access these values.
+3) Use dma_unmap_{addr,len} to access these values.
   Example, before:
-        pci_unmap_single(pdev, ringp->mapping, ringp->len,
+        dma_unmap_single(dev, ringp->mapping, ringp->len,
-                         PCI_DMA_FROMDEVICE);
+                         DMA_FROM_DEVICE);
   after:
-        pci_unmap_single(pdev,
+        dma_unmap_single(dev,
-                         pci_unmap_addr(ringp, mapping),
+                         dma_unmap_addr(ringp, mapping),
-                         pci_unmap_len(ringp, len),
+                         dma_unmap_len(ringp, len),
-                         PCI_DMA_FROMDEVICE);
+                         DMA_FROM_DEVICE);
 It really should be self-explanatory.  We treat the ADDR and LEN
 separately, because it is possible for an implementation to only
@@ -732,15 +724,15 @@ to "Closing".
 DMA address space is limited on some architectures and an allocation
 failure can be determined by:
- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0
+- checking if dma_alloc_coherent returns NULL or dma_map_sg returns 0
- checking the returned dma_addr_t of pci_map_single and pci_map_page
+- checking the returned dma_addr_t of dma_map_single and dma_map_page
-  by using pci_dma_mapping_error():
+  by using dma_mapping_error():
        dma_addr_t dma_handle;
-        dma_handle = pci_map_single(pdev, addr, size, direction);
+        dma_handle = dma_map_single(dev, addr, size, direction);
-        if (pci_dma_mapping_error(pdev, dma_handle)) {
+        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
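The three transformation steps in the HOWTO hunk above (declare the fields, set them, read them back at unmap time) can be tried outside the kernel. What follows is only a userspace sketch: the macro names match the patch, but the bodies are stand-ins written for illustration, and SKETCH_NEED_DMA_MAP_STATE, ring_record(), ring_unmap_addr() and ring_unmap_len() are hypothetical names that do not appear in the patch. It assumes a platform that does need the unmap state; the #else branch shows how the fields could compile away when it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in; kernel code gets dma_addr_t from kernel headers. */
typedef uint64_t dma_addr_t;

/* Assumption for this sketch: the platform needs the unmap state. */
#define SKETCH_NEED_DMA_MAP_STATE 1

#if SKETCH_NEED_DMA_MAP_STATE
#define DEFINE_DMA_UNMAP_ADDR(name)         dma_addr_t name
#define DEFINE_DMA_UNMAP_LEN(name)          uint32_t name
#define dma_unmap_addr(ptr, name)           ((ptr)->name)
#define dma_unmap_addr_set(ptr, name, val)  (((ptr)->name) = (val))
#define dma_unmap_len(ptr, name)            ((ptr)->name)
#define dma_unmap_len_set(ptr, name, val)   (((ptr)->name) = (val))
#else
/* No state needed: the fields vanish and the accessors become no-ops. */
#define DEFINE_DMA_UNMAP_ADDR(name)
#define DEFINE_DMA_UNMAP_LEN(name)
#define dma_unmap_addr(ptr, name)           (0)
#define dma_unmap_addr_set(ptr, name, val)  do { } while (0)
#define dma_unmap_len(ptr, name)            (0)
#define dma_unmap_len_set(ptr, name, val)   do { } while (0)
#endif

struct ring_state {
        void *skb;                      /* stands in for struct sk_buff * */
        DEFINE_DMA_UNMAP_ADDR(mapping); /* step 1: note the semicolons */
        DEFINE_DMA_UNMAP_LEN(len);
};

/* Step 2: record the mapping through the setters, not by direct assignment. */
void ring_record(struct ring_state *ringp, dma_addr_t addr, uint32_t size)
{
        assert(ringp != NULL);
        dma_unmap_addr_set(ringp, mapping, addr);
        dma_unmap_len_set(ringp, len, size);
}

/* Step 3: read the values back through the accessors at unmap time. */
dma_addr_t ring_unmap_addr(const struct ring_state *ringp)
{
        return dma_unmap_addr(ringp, mapping);
}

uint32_t ring_unmap_len(const struct ring_state *ringp)
{
        return dma_unmap_len(ringp, len);
}
```

Because every access goes through the macros, driver code would compile unchanged whichever branch is selected, which is why the HOWTO treats ADDR and LEN as separate, individually removable fields.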
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 5aceb88b3f8b..05e2ae236865 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -4,20 +4,18 @@
        James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 This document describes the DMA API.  For a more gentle introduction
-phrased in terms of the pci_ equivalents (and actual examples) see
+to the API (with actual examples), see
-Documentation/PCI/PCI-DMA-mapping.txt.
+Documentation/DMA-API-HOWTO.txt.
-This API is split into two pieces.  Part I describes the API and the
+This API is split into two pieces.  Part I describes the API.  Part II
-corresponding pci_ API.  Part II describes the extensions to the API
+describes the extensions to the API for supporting non-consistent
-for supporting non-consistent memory machines.  Unless you know that
+memory machines.  Unless you know that your driver absolutely has to
-your driver absolutely has to support non-consistent platforms (this
+support non-consistent platforms (this is usually only legacy
-is usually only legacy platforms) you should only use the API
+platforms) you should only use the API described in part I.
-described in part I.
-Part I - pci_ and dma_ Equivalent API 
+Part I - dma_ API
 -------------------------------------
-To get the pci_ API, you must #include <linux/pci.h>
 To get the dma_ API, you must #include <linux/dma-mapping.h>
@@ -27,9 +25,6 @@ Part Ia - Using large dma-coherent buffers
 void *
 dma_alloc_coherent(struct device *dev, size_t size,
                             dma_addr_t *dma_handle, gfp_t flag)
-void *
-pci_alloc_consistent(struct pci_dev *dev, size_t size,
-                             dma_addr_t *dma_handle)
 Consistent memory is memory for which a write by either the device or
 the processor can immediately be read by the processor or device
@@ -53,15 +48,11 @@ The simplest way to do that is to use the dma_pool calls (see below).
 The flag parameter (dma_alloc_coherent only) allows the caller to
 specify the GFP_ flags (see kmalloc) for the allocation (the
 implementation may choose to ignore flags that affect the location of
-the returned memory, like GFP_DMA).  For pci_alloc_consistent, you
+the returned memory, like GFP_DMA).
-must assume GFP_ATOMIC behaviour.
 void
 dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
                           dma_addr_t dma_handle)
-void
-pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr,
-                           dma_addr_t dma_handle)
 Free the region of consistent memory you previously allocated.  dev,
 size and dma_handle must all be the same as those passed into the
@@ -89,10 +80,6 @@ for alignment, like queue heads needing to be aligned on N-byte boundaries.
        dma_pool_create(const char *name, struct device *dev,
                        size_t size, size_t align, size_t alloc);
-        struct pci_pool *
-        pci_pool_create(const char *name, struct pci_device *dev,
-                        size_t size, size_t align, size_t alloc);
 The pool create() routines initialize a pool of dma-coherent buffers
 for use with a given device.  It must be called in a context which
 can sleep.
@@ -108,9 +95,6 @@ from this pool must not cross 4KByte boundaries.
        void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
                        dma_addr_t *dma_handle);
-        void *pci_pool_alloc(struct pci_pool *pool, gfp_t gfp_flags,
-                        dma_addr_t *dma_handle);
 This allocates memory from the pool; the returned memory will meet the size
 and alignment requirements specified at creation time.  Pass GFP_ATOMIC to
 prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks),
@@ -122,9 +106,6 @@ pool's device.
        void dma_pool_free(struct dma_pool *pool, void *vaddr,
                        dma_addr_t addr);
-        void pci_pool_free(struct pci_pool *pool, void *vaddr,
-                        dma_addr_t addr);
 This puts memory back into the pool.  The pool is what was passed to
 the pool allocation routine; the cpu (vaddr) and dma addresses are what
 were returned when that routine allocated the memory being freed.
@@ -132,8 +113,6 @@ were returned when that routine allocated the memory being freed.
        void dma_pool_destroy(struct dma_pool *pool);
-        void pci_pool_destroy(struct pci_pool *pool);
 The pool destroy() routines free the resources of the pool.  They must be
 called in a context which can sleep.  Make sure you've freed all allocated
 memory back to the pool before you destroy it.
@@ -144,8 +123,6 @@ Part Ic - DMA addressing limitations
 int
 dma_supported(struct device *dev, u64 mask)
-int
-pci_dma_supported(struct pci_dev *hwdev, u64 mask)
 Checks to see if the device can support DMA to the memory described by
 mask.
@@ -159,8 +136,14 @@ driver writers.
 int
 dma_set_mask(struct device *dev, u64 mask)
+Checks to see if the mask is possible and updates the device
+parameters if it is.
+Returns: 0 if successful and a negative error if not.
 int
-pci_set_dma_mask(struct pci_device *dev, u64 mask)
+dma_set_coherent_mask(struct device *dev, u64 mask)
 Checks to see if the mask is possible and updates the device
 parameters if it is.
@@ -187,9 +170,6 @@ Part Id - Streaming DMA mappings
 dma_addr_t
 dma_map_single(struct device *dev, void *cpu_addr, size_t size,
                      enum dma_data_direction direction)
-dma_addr_t
-pci_map_single(struct pci_dev *hwdev, void *cpu_addr, size_t size,
-                      int direction)
 Maps a piece of processor virtual memory so it can be accessed by the
 device and returns the physical handle of the memory.
@@ -198,14 +178,10 @@ The direction for both api's may be converted freely by casting.
 However the dma_ API uses a strongly typed enumerator for its
 direction:
-DMA_NONE                = PCI_DMA_NONE          no direction (used for
+DMA_NONE                no direction (used for debugging)
-                                                debugging)
+DMA_TO_DEVICE           data is going from the memory to the device
-DMA_TO_DEVICE           = PCI_DMA_TODEVICE      data is going from the
+DMA_FROM_DEVICE         data is coming from the device to the memory
-                                                memory to the device
+DMA_BIDIRECTIONAL       direction isn't known
-DMA_FROM_DEVICE         = PCI_DMA_FROMDEVICE    data is coming from
-                                                the device to the
-                                                memory
-DMA_BIDIRECTIONAL       = PCI_DMA_BIDIRECTIONAL direction isn't known
 Notes:  Not all memory regions in a machine can be mapped by this
 API.  Further, regions that appear to be physically contiguous in
@@ -268,9 +244,6 @@ cache lines are updated with data that the device may have changed).
 void
 dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
                 enum dma_data_direction direction)
-void
-pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
-                 size_t size, int direction)
 Unmaps the region previously mapped.  All the parameters passed in
 must be identical to those passed in (and returned) by the mapping
@@ -280,15 +253,9 @@ dma_addr_t
 dma_map_page(struct device *dev, struct page *page,
                    unsigned long offset, size_t size,
                    enum dma_data_direction direction)
-dma_addr_t
-pci_map_page(struct pci_dev *hwdev, struct page *page,
-                    unsigned long offset, size_t size, int direction)
 void
 dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
               enum dma_data_direction direction)
-void
-pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
-               size_t size, int direction)
 API for mapping and unmapping for pages.  All the notes and warnings
 for the other mapping APIs apply here.  Also, although the <offset>
@@ -299,9 +266,6 @@ cache width is.
 int
 dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-int
-pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr)
 In some circumstances dma_map_single and dma_map_page will fail to create
 a mapping. A driver can check for these errors by testing the returned
 dma address with dma_mapping_error(). A non-zero return value means the mapping
@@ -311,9 +275,6 @@ reduce current DMA mapping usage or delay and try again later).
        int
        dma_map_sg(struct device *dev, struct scatterlist *sg,
                int nents, enum dma_data_direction direction)
-        int
-        pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-                int nents, int direction)
 Returns: the number of physical segments mapped (this may be shorter
 than <nents> passed in if some elements of the scatter/gather list are
@@ -353,9 +314,6 @@ accessed sg->address and sg->length as shown above.
        void
        dma_unmap_sg(struct device *dev, struct scatterlist *sg,
                int nhwentries, enum dma_data_direction direction)
-        void
-        pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-                int nents, int direction)
 Unmap the previously mapped scatter/gather list.  All the parameters
 must be the same as those and passed in to the scatter/gather mapping
@@ -365,21 +323,23 @@ Note: <nents> must be the number you passed in, *not* the number of
 physical entries returned.
 void
-dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size,
+dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
-                enum dma_data_direction direction)
+                        enum dma_data_direction direction)
 void
-pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle,
+dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size,
-                           size_t size, int direction)
+                           enum dma_data_direction direction)
 void
-dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems,
+dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
-                          enum dma_data_direction direction)
+                    enum dma_data_direction direction)
 void
-pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
-                       int nelems, int direction)
+                       enum dma_data_direction direction)
-Synchronise a single contiguous or scatter/gather mapping.  All the
+Synchronise a single contiguous or scatter/gather mapping for the cpu
-parameters must be the same as those passed into the single mapping
+and device. With the sync_sg API, all the parameters must be the same
-API.
+as those passed into the single mapping API. With the sync_single API,
+you can use dma_handle and size parameters that aren't identical to
+those passed into the single mapping API to do a partial sync.
 Notes:  You must do this:
@@ -461,9 +421,9 @@ void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
 Part II - Advanced dma_ usage
 -----------------------------
-Warning: These pieces of the DMA API have no PCI equivalent.  They
+Warning: These pieces of the DMA API should not be used in the
-should also not be used in the majority of cases, since they cater for
+majority of cases, since they cater for unlikely corner cases that
-unlikely corner cases that don't belong in usual drivers.
+don't belong in usual drivers.
 If you don't understand how cache line coherency works between a
 processor and an I/O device, you should not be using this part of the
@@ -514,16 +474,6 @@ into the width returned by this call.  It will also always be a power
 of two for easy alignment.
 void
-dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
-                      unsigned long offset, size_t size,
-                      enum dma_data_direction direction)
-Does a partial sync, starting at offset and continuing for size.  You
-must be careful to observe the cache alignment and width when doing
-anything like this.  You must also be extra careful about accessing
-memory you intend to sync partially.
-void
 dma_cache_sync(struct device *dev, void *vaddr, size_t size,
               enum dma_data_direction direction)
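The dma_set_mask() return convention described in the DMA-API.txt hunks above (0 on success, a negative error otherwise) lends itself to a try-wide-then-fall-back probe. The following is a userspace sketch only. struct sketch_device, sketch_dma_set_mask() and sketch_negotiate_mask() are hypothetical stand-ins, and the acceptance rule (reject any mask wider than an imaginary platform limit) is a toy assumption; real platforms decide this in their own way.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t u64;

/* Mask with the low n bits set; n == 64 is special-cased to avoid 1 << 64. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for struct device, holding only what the sketch needs. */
struct sketch_device {
        u64 platform_limit;     /* toy assumption: widest mask accepted */
        u64 dma_mask;           /* mask currently in effect */
};

/*
 * Mirrors the documented contract: check whether the mask is possible,
 * update the device parameters if it is, return 0 or a negative error.
 */
int sketch_dma_set_mask(struct sketch_device *dev, u64 mask)
{
        if (mask & ~dev->platform_limit)
                return -EIO;
        dev->dma_mask = mask;
        return 0;
}

/* Try 64-bit addressing first, then 32-bit; give up if neither is accepted. */
int sketch_negotiate_mask(struct sketch_device *dev)
{
        if (sketch_dma_set_mask(dev, DMA_BIT_MASK(64)) == 0)
                return 64;
        if (sketch_dma_set_mask(dev, DMA_BIT_MASK(32)) == 0)
                return 32;
        return -EIO;
}
```

A driver following this shape would stop probing when neither mask is accepted rather than attempt DMA with an unsupported mask. A coherent mask set with dma_set_coherent_mask() could be negotiated the same way, since the patch documents the same return convention for it.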
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 5e7d84b48505..133cd6c3f3c1 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -488,7 +488,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip)
                                The ECC bytes must be placed immidiately after the data
                                bytes in order to make the syndrome generator work. This
                                is contrary to the usual layout used by software ECC. The
-                                seperation of data and out of band area is not longer
+                                separation of data and out of band area is no longer
                                possible. The nand driver code handles this layout and
                                the remaining free bytes in the oob area are managed by 
                                the autoplacement code. Provide a matching oob-layout
@@ -560,7 +560,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip)
                                bad blocks. They have factory marked good blocks. The marker pattern
                                is erased when the block is erased to be reused. So in case of
                                powerloss before writing the pattern back to the chip this block 
-                                would be lost and added to the bad blocks. Therefor we scan the 
+                                would be lost and added to the bad blocks. Therefore we scan the 
                                chip(s) when we detect them the first time for good blocks and 
                                store this information in a bad block table before erasing any 
                                of the blocks.
@@ -1094,7 +1094,7 @@ in this page</entry>
                manufacturers specifications. This applies similar to the spare area. 
        </para>
        <para>
-                Therefor NAND aware filesystems must either write in page size chunks
+                Therefore NAND aware filesystems must either write in page size chunks
                or hold a writebuffer to collect smaller writes until they sum up to 
                pagesize. Available NAND aware filesystems: JFFS2, YAFFS.               
        </para>
diff --git a/Documentation/DocBook/tracepoint.tmpl b/Documentation/DocBook/tracepoint.tmpl
index 8bca1d5cec09..e8473eae2a20 100644
--- a/Documentation/DocBook/tracepoint.tmpl
+++ b/Documentation/DocBook/tracepoint.tmpl
@@ -16,6 +16,15 @@
     </address>
    </affiliation>
   </author>
+   <author>
+    <firstname>William</firstname>
+    <surname>Cohen</surname>
+    <affiliation>
+     <address>
+      <email>wcohen@redhat.com</email>
+     </address>
+    </affiliation>
+   </author>
  </authorgroup>
  <legalnotice>
@@ -91,4 +100,8 @@
 !Iinclude/trace/events/signal.h
  </chapter>
+  <chapter id="block">
+   <title>Block IO</title>
+!Iinclude/trace/events/block.h
+  </chapter>
 </book>
diff --git a/Documentation/DocBook/v4l/common.xml b/Documentation/DocBook/v4l/common.xml
index c65f0ac9b6ee..cea23e1c4fc6 100644
--- a/Documentation/DocBook/v4l/common.xml
+++ b/Documentation/DocBook/v4l/common.xml
@@ -1170,7 +1170,7 @@ frames per second. If less than this number of frames is to be
 captured or output, applications can request frame skipping or
 duplicating on the driver side. This is especially useful when using
 the &func-read; or &func-write;, which are not augmented by timestamps
-or sequence counters, and to avoid unneccessary data copying.</para>
+or sequence counters, and to avoid unnecessary data copying.</para>
    <para>Finally these ioctls can be used to determine the number of
 buffers used internally by a driver in read/write mode. For
diff --git a/Documentation/DocBook/v4l/vidioc-g-parm.xml b/Documentation/DocBook/v4l/vidioc-g-parm.xml
index 78332d365ce9..392aa9e5571e 100644
--- a/Documentation/DocBook/v4l/vidioc-g-parm.xml
+++ b/Documentation/DocBook/v4l/vidioc-g-parm.xml
@@ -55,7 +55,7 @@ captured or output, applications can request frame skipping or
 duplicating on the driver side. This is especially useful when using
 the <function>read()</function> or <function>write()</function>, which
 are not augmented by timestamps or sequence counters, and to avoid
-unneccessary data copying.</para>
+unnecessary data copying.</para>
    <para>Further these ioctls can be used to determine the number of
 buffers used internally by a driver in read/write mode. For
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index f5395af88a41..40ada93b820a 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -234,7 +234,7 @@ process is as follows:
    Linus, usually the patches that have already been included in the
    -next kernel for a few weeks.  The preferred way to submit big changes
    is using git (the kernel's source management tool, more information
-    can be found at http://git.or.cz/) but plain patches are also just
+    can be found at http://git-scm.com/) but plain patches are also just
    fine.
  - After two weeks a -rc1 kernel is released it is now possible to push
    only patches that do not include new features that could affect the
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt
index bc38283379f0..69dd29ed824e 100644
--- a/Documentation/IPMI.txt
+++ b/Documentation/IPMI.txt
@@ -365,6 +365,7 @@ You can change this at module load time (for a module) with:
       regshifts=<shift1>,<shift2>,...
       slave_addrs=<addr1>,<addr2>,...
       force_kipmid=<enable1>,<enable2>,...
+       kipmid_max_busy_us=<ustime1>,<ustime2>,...
       unload_when_empty=[0|1]
 Each of these except si_trydefaults is a list, the first item for the
@@ -433,6 +434,7 @@ kernel command line as:
       ipmi_si.regshifts=<shift1>,<shift2>,...
       ipmi_si.slave_addrs=<addr1>,<addr2>,...
       ipmi_si.force_kipmid=<enable1>,<enable2>,...
+       ipmi_si.kipmid_max_busy_us=<ustime1>,<ustime2>,...
 It works the same as the module parameters of the same names.
@@ -450,6 +452,16 @@ force this thread on or off.  If you force it off and don't have
 interrupts, the driver will run VERY slowly.  Don't blame me,
 these interfaces suck.
+Unfortunately, this thread can use a lot of CPU depending on the
+interface's performance.  This can waste a lot of CPU and cause
+various issues with detecting idle CPU and using extra power.  To
+avoid this, the kipmid_max_busy_us parameter sets the maximum amount
+of time, in microseconds, that kipmid will spin before sleeping for a
+tick.  This value sets a balance between performance and CPU waste and
+needs to be tuned to your needs.  Maybe, someday, auto-tuning will be
+added, but that's not a simple thing and even the auto-tuning would
+need to be tuned to the user's desired performance.
 The driver supports a hot add and remove of interfaces.  This way,
 interfaces can be added or removed after the kernel is up and running.
 This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 94b945733534..6fc7ea1d1f9d 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1,3 +1,3 @@
 obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
-        filesystems/configfs/ ia64/ networking/ \
+        filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
-        pcmcia/ spi/ video4linux/ vm/ watchdog/src/
+        pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt
index a6d32e65d222..a8536cb88091 100644
--- a/Documentation/RCU/NMI-RCU.txt
+++ b/Documentation/RCU/NMI-RCU.txt
@@ -34,7 +34,7 @@ NMI handler.
                cpu = smp_processor_id();
                ++nmi_count(cpu);
-                if (!rcu_dereference(nmi_callback)(regs, cpu))
+                if (!rcu_dereference_sched(nmi_callback)(regs, cpu))
                        default_do_nmi(regs);
                nmi_exit();
@@ -47,12 +47,13 @@ function pointer.  If this handler returns zero, do_nmi() invokes the
 default_do_nmi() function to handle a machine-specific NMI.  Finally,
 preemption is restored.
-Strictly speaking, rcu_dereference() is not needed, since this code runs
+In theory, rcu_dereference_sched() is not needed, since this code runs
-only on i386, which does not need rcu_dereference() anyway.  However,
+only on i386, which in theory does not need rcu_dereference_sched()
-it is a good documentation aid, particularly for anyone attempting to
+anyway.  However, in practice it is a good documentation aid, particularly
-do something similar on Alpha.
+for anyone attempting to do something similar on Alpha or on systems
+with aggressive optimizing compilers.
-Quick Quiz:  Why might the rcu_dereference() be necessary on Alpha,
+Quick Quiz:  Why might the rcu_dereference_sched() be necessary on Alpha,
             given that the code referenced by the pointer is read-only?
@@ -99,17 +100,21 @@ invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
 Answer to Quick Quiz
-        Why might the rcu_dereference() be necessary on Alpha, given
+        Why might the rcu_dereference_sched() be necessary on Alpha, given
        that the code referenced by the pointer is read-only?
        Answer: The caller to set_nmi_callback() might well have
-                initialized some data that is to be used by the
+                initialized some data that is to be used by the new NMI
-                new NMI handler.  In this case, the rcu_dereference()
+                handler.  In this case, the rcu_dereference_sched() would
-                would be needed, because otherwise a CPU that received
+                be needed, because otherwise a CPU that received an NMI
-                an NMI just after the new handler was set might see
+                just after the new handler was set might see the pointer
-                the pointer to the new NMI handler, but the old
+                to the new NMI handler, but the old pre-initialized
-                pre-initialized version of the handler's data.
+                version of the handler's data.
-                More important, the rcu_dereference() makes it clear
+                This same sad story can happen on other CPUs when using
-                to someone reading the code that the pointer is being
+                a compiler with aggressive pointer-value speculation
-                protected by RCU.
+                optimizations.
+                More important, the rcu_dereference_sched() makes it
+                clear to someone reading the code that the pointer is
+                being protected by RCU-sched.
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index cbc180f90194..790d1a812376 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -260,7 +260,8 @@ over a rather long period of time, but improvements are always welcome!
        The reason that it is permissible to use RCU list-traversal
        primitives when the update-side lock is held is that doing so
        can be quite helpful in reducing code bloat when common code is
-        shared between readers and updaters.
+        shared between readers and updaters.  Additional primitives
+        are provided for this case, as discussed in lockdep.txt.
 10.     Conversely, if you are in an RCU read-side critical section,
        and you don't hold the appropriate update-side lock, you -must-
@@ -344,8 +345,8 @@ over a rather long period of time, but improvements are always welcome!
        requiring SRCU's read-side deadlock immunity or low read-side
        realtime latency.
-        Note that, rcu_assign_pointer() and rcu_dereference() relate to
+        Note that rcu_assign_pointer() relates to SRCU just as it does
-        SRCU just as they do to other forms of RCU.
+        to other forms of RCU.
 15.     The whole point of call_rcu(), synchronize_rcu(), and friends
        is to wait until all pre-existing readers have finished before
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
index fe24b58627bd..d7a49b2f6994 100644
--- a/Documentation/RCU/lockdep.txt
+++ b/Documentation/RCU/lockdep.txt
@@ -32,9 +32,20 @@ checking of rcu_dereference() primitives:
        srcu_dereference(p, sp):
                Check for SRCU read-side critical section.
        rcu_dereference_check(p, c):
-                Use explicit check expression "c".
+                Use explicit check expression "c".  This is useful in
+                code that is invoked by both readers and updaters.
        rcu_dereference_raw(p)
                Don't check.  (Use sparingly, if at all.)
+        rcu_dereference_protected(p, c):
+                Use explicit check expression "c", and omit all barriers
+                and compiler constraints.  This is useful when the data
+                structure cannot change, for example, in code that is
+                invoked only by updaters.
+        rcu_access_pointer(p):
+                Return the value of the pointer and omit all barriers,
+                but retain the compiler constraints that prevent duplicating
+                or coalescing.  This is useful when testing the
+                value of the pointer itself, for example, against NULL.
 The rcu_dereference_check() check expression can be any boolean
 expression, but would normally include one of the rcu_read_lock_held()
@@ -59,7 +70,20 @@ In case (1), the pointer is picked up in an RCU-safe manner for vanilla
 RCU read-side critical sections, in case (2) the ->file_lock prevents
 any change from taking place, and finally, in case (3) the current task
 is the only task accessing the file_struct, again preventing any change
-from taking place.
+from taking place.  If the above statement was invoked only from updater
+code, it could instead be written as follows:
+        file = rcu_dereference_protected(fdt->fd[fd],
+                                         lockdep_is_held(&files->file_lock) ||
+                                         atomic_read(&files->count) == 1);
+This would verify cases #2 and #3 above, and furthermore lockdep would
+complain if this was used in an RCU read-side critical section unless one
+of these two cases held.  Because rcu_dereference_protected() omits all
+barriers and compiler constraints, it generates better code than do the
+other flavors of rcu_dereference().  On the other hand, it is illegal
+to use rcu_dereference_protected() if either the RCU-protected pointer
+or the RCU-protected data that it points to can change concurrently.
 There are currently only "universal" versions of the rcu_assign_pointer()
 and RCU list-/tree-traversal primitives, which do not (yet) check for
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 1dc00ee97163..cfaac34c4557 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -840,6 +840,12 @@ SRCU:	Initialization/cleanup
        init_srcu_struct
        cleanup_srcu_struct
+All:  lockdep-checked RCU-protected pointer access
+        rcu_dereference_check
+        rcu_dereference_protected
+        rcu_access_pointer
 See the comment headers in the source code (or the docbook generated
 from them) for more information.
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 1053a56be3b1..8916ca48bc95 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -9,10 +9,14 @@ Documentation/SubmittingPatches and elsewhere regarding submitting Linux
 kernel patches.
-1: Builds cleanly with applicable or modified CONFIG options =y, =m, and
+1: If you use a facility then #include the file that defines/declares
+   that facility.  Don't depend on other header files pulling in ones
+   that you use.
+2: Builds cleanly with applicable or modified CONFIG options =y, =m, and
   =n.  No gcc warnings/errors, no linker warnings/errors.
-2: Passes allnoconfig, allmodconfig
+2b: Passes allnoconfig, allmodconfig
 3: Builds on multiple CPU architectures by using local cross-compile tools
   or some other build farm.
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
index 76b3a11e90be..fa968aa99d67 100644
--- a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
+++ b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
@@ -14,8 +14,8 @@ Introduction
 how the clocks are arranged. The first implementation used as single
 PLL to feed the ARM, memory and peripherals via a series of dividers
 and muxes and this is the implementation that is documented here. A
- newer version where there is a seperate PLL and clock divider for the
+ newer version where there is a separate PLL and clock divider for the
- ARM core is available as a seperate driver.
+ ARM core is available as a separate driver.
 Layout
diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt
new file mode 100644
index 000000000000..7cced1fea9c3
--- /dev/null
+++ b/Documentation/arm/Samsung/Overview.txt
@@ -0,0 +1,86 @@
+                Samsung ARM Linux Overview
+                ==========================
+Introduction
+------------
+  The Samsung range of ARM SoCs spans many similar devices, from the initial
+  ARM9 through to the newest ARM cores. This document shows an overview of
+  the current kernel support, how to use it and where to find the code
+  that supports this.
+  The currently supported SoCs are:
+  - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list
+  - S3C64XX: S3C6400 and S3C6410
+  - S5P6440
+  S5PC100 and S5PC110 support is currently being merged
+S3C24XX Systems
+---------------
+  There is still documentation in Documentation/arm/Samsung-S3C24XX/ which
+  deals with the architecture and drivers specific to these devices.
+  See Documentation/arm/Samsung-S3C24XX/Overview.txt for more information
+  on the implementation details and specific support.
+Configuration
+-------------
+  A number of configurations are supplied, as there is no current way of
+  unifying all the SoCs into one kernel.
+  s5p6440_defconfig - S5P6440 specific default configuration
+  s5pc100_defconfig - S5PC100 specific default configuration
+Layout
+------
+  The directory layout is currently being restructured, and consists of
+  several platform directories and then the machine specific directories
+  of the CPUs being built for.
+  plat-samsung provides the base for all the implementations, and is the
+  last in the line of include directories that are processed for the build
+  specific information. It contains the base clock, GPIO and device definitions
+  to get the system running.
+  plat-s3c is the s3c24xx/s3c64xx platform directory. Although it is currently
+  involved in other builds, this will be phased out once the relevant code is
+  moved elsewhere.
+  plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs.
+  plat-s3c64xx is for the s3c64xx specific bits, see the S3C24XX docs.
+  plat-s5p is for s5p specific builds, more to be added.
+  [ to finish ]
+Port Contributors
+-----------------
+  Ben Dooks (BJD)
+  Vincent Sanders
+  Herbert Potzl
+  Arnaud Patard (RTP)
+  Roc Wu
+  Klaus Fetscher
+  Dimitry Andric
+  Shannon Holland
+  Guillaume Gourat (NexVision)
+  Christer Weinigel (wingel) (Acer N30)
+  Lucas Correia Villa Real (S3C2400 port)
+Document Author
+---------------
+Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org>
diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk
new file mode 100755
index 000000000000..0c50220851fb
--- /dev/null
+++ b/Documentation/arm/Samsung/clksrc-change-registers.awk
@@ -0,0 +1,167 @@
+#!/usr/bin/awk -f
+#
+# Copyright 2010 Ben Dooks <ben-linux@fluff.org>
+#
+# Released under GPLv2
+# example usage
+# ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst
+function extract_value(s)
+{
+    eqat = index(s, "=")
+    comat = index(s, ",")
+    return substr(s, eqat+2, (comat-eqat)-2)
+}
+function remove_brackets(b)
+{
+    return substr(b, 2, length(b)-2)
+}
+function splitdefine(l, p)
+{
+    r = split(l, tp)
+    p[0] = tp[2]
+    p[1] = remove_brackets(tp[3])
+}
+function find_length(f)
+{
+    if (0)
+        printf "find_length " f "\n" > "/dev/stderr"
+    if (f ~ /0x1/)
+        return 1
+    else if (f ~ /0x3/)
+        return 2
+    else if (f ~ /0x7/)
+        return 3
+    else if (f ~ /0xf/)
+        return 4
+    printf "unknown length " f "\n" > "/dev/stderr"
+    exit
+}
+function find_shift(s)
+{
+    id = index(s, "<")
+    if (id <= 0) {
+        printf "cannot find shift " s "\n" > "/dev/stderr"
+        exit
+    }
+    return substr(s, id+2)
+}
+BEGIN {
+    if (ARGC < 2) {
+        print "too few arguments" > "/dev/stderr"
+        exit
+    }
+# read the header file and find the mask values that we will need
+# to replace and create an associative array of values
+    while (getline line < ARGV[1] > 0) {
+        if (line ~ /\#define.*_MASK/ &&
+            !(line ~ /S5PC100_EPLL_MASK/) &&
+            !(line ~ /USB_SIG_MASK/)) {
+            splitdefine(line, fields)
+            name = fields[0]
+            if (0)
+                printf "MASK " line "\n" > "/dev/stderr"
+            dmask[name,0] = find_length(fields[1])
+            dmask[name,1] = find_shift(fields[1])
+            if (0)
+                printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr"
+        } else {
+        }
+    }
+    delete ARGV[1]
+}
+/clksrc_clk.*=.*{/ {
+    shift=""
+    mask=""
+    divshift=""
+    reg_div=""
+    reg_src=""
+    indent=1
+    print $0
+    for(; indent >= 1;) {
+        if ((getline line) <= 0) {
+            printf "unexpected end of file\n" > "/dev/stderr"
+            exit 1;
+        }
+        if (line ~ /\.shift/) {
+            shift = extract_value(line)
+        } else if (line ~ /\.mask/) {
+            mask = extract_value(line)
+        } else if (line ~ /\.reg_divider/) {
+            reg_div = extract_value(line)
+        } else if (line ~ /\.reg_source/) {
+            reg_src = extract_value(line)
+        } else if (line ~ /\.divider_shift/) {
+            divshift = extract_value(line)
+        } else if (line ~ /{/) {
+            indent++
+            print line
+        } else if (line ~ /}/) {
+            indent--
+            if (indent == 0) {
+                if (0) {
+                    printf "shift '" shift   "' ='" dmask[shift,0] "'\n" > "/dev/stderr"
+                    printf "mask  '" mask    "'\n" > "/dev/stderr"
+                    printf "dshft '" divshift "'\n" > "/dev/stderr"
+                    printf "rdiv  '" reg_div "'\n" > "/dev/stderr"
+                    printf "rsrc  '" reg_src "'\n" > "/dev/stderr"
+                }
+                generated = mask
+                sub(reg_src, reg_div, generated)
+                if (0) {
+                    printf "/* rsrc " reg_src " */\n"
+                    printf "/* rdiv " reg_div " */\n"
+                    printf "/* shift " shift " */\n"
+                    printf "/* mask " mask " */\n"
+                    printf "/* generated " generated " */\n"
+                }
+                if (reg_div != "") {
+                    printf "\t.reg_div = { "
+                    printf ".reg = " reg_div ", "
+                    printf ".shift = " dmask[generated,1] ", "
+                    printf ".size = " dmask[generated,0] ", "
+                    printf "},\n"
+                }
+                printf "\t.reg_src = { "
+                printf ".reg = " reg_src ", "
+                printf ".shift = " dmask[mask,1] ", "
+                printf ".size = " dmask[mask,0] ", "
+                printf "},\n"
+            }
+            print line
+        } else {
+            print line
+        }
+        if (0)
+            printf indent ":" line "\n" > "/dev/stderr"
+    }
+}
+// && ! /clksrc_clk.*=.*{/ { print $0 }
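The awk script above turns each `_MASK` definition into a (size, shift) pair: `find_length()` maps `0x1`, `0x3`, `0x7` and `0xf` to 1 to 4 bits, and `find_shift()` takes the number after `<<`. The following is a minimal C sketch of that mapping, not part of the patch. The names `mask_size()` and `parse_mask()` are hypothetical. It assumes the definition has already had its brackets removed and has the form `0x7<<12` with no spaces. Unlike the awk regex checks, it accepts any contiguous run of low bits.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Number of bits in a mask value made of contiguous low bits
 * (0x1 -> 1, 0x3 -> 2, 0x7 -> 3, 0xf -> 4), or -1 for any other value.
 */
int mask_size(unsigned long value)
{
    int size = 0;

    while (value & 1UL) {
        size++;
        value >>= 1;
    }
    return (size > 0 && value == 0) ? size : -1;
}

/*
 * Split a definition such as "0x7<<12" into its field size and shift.
 * Returns 0 on success and -1 if the text does not have that form.
 */
int parse_mask(const char *def, int *size, int *shift)
{
    char *end;
    const char *digits;
    long parsed_shift;
    unsigned long value = strtoul(def, &end, 16);

    /* A hex value must be present and must be followed by "<<". */
    if (end == def || strncmp(end, "<<", 2) != 0)
        return -1;

    digits = end + 2;
    parsed_shift = strtol(digits, &end, 10);
    if (end == digits || *end != '\0' || parsed_shift < 0)
        return -1;

    if (mask_size(value) < 0)
        return -1;

    *size = mask_size(value);
    *shift = (int)parsed_shift;
    return 0;
}
```

Under these assumptions, `parse_mask("0x7<<12", &size, &shift)` fills in a size of 3 and a shift of 12, which are the values the script would print in the `.size` and `.shift` fields.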
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 6fab97ea7e6b..508b5b2b0289 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -1162,8 +1162,8 @@ where a driver received a request ala this before:
 As mentioned, there is no virtual mapping of a bio. For DMA, this is
 not a problem as the driver probably never will need a virtual mapping.
-Instead it needs a bus mapping (pci_map_page for a single segment or
+Instead it needs a bus mapping (dma_map_page for a single segment or
-use blk_rq_map_sg for scatter gather) to be able to ship it to the driver. For
+use dma_map_sg for scatter gather) to be able to ship it to the driver. For
 PIO drivers (or drivers that need to revert to PIO transfer once in a
 while (IDE for example)), where the CPU is doing the actual data
 transfer a virtual mapping is needed. If the driver supports highmem I/O,
diff --git a/Documentation/cgroups/cgroup_event_listener.c b/Documentation/cgroups/cgroup_event_listener.c
new file mode 100644
index 000000000000..8c2bfc4a6358
--- /dev/null
+++ b/Documentation/cgroups/cgroup_event_listener.c
@@ -0,0 +1,110 @@
+/*
+ * cgroup_event_listener.c - Simple listener of cgroup events
+ *
+ * Copyright (C) Kirill A. Shutemov <kirill@shutemov.name>
+ */
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <libgen.h>
+#include <limits.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/eventfd.h>
+#define USAGE_STR "Usage: cgroup_event_listener <path-to-control-file> <args>\n"
+int main(int argc, char **argv)
+{
+        int efd = -1;
+        int cfd = -1;
+        int event_control = -1;
+        char event_control_path[PATH_MAX];
+        char line[LINE_MAX];
+        int ret;
+        if (argc != 3) {
+                fputs(USAGE_STR, stderr);
+                return 1;
+        }
+        cfd = open(argv[1], O_RDONLY);
+        if (cfd == -1) {
+                fprintf(stderr, "Cannot open %s: %s\n", argv[1],
+                                strerror(errno));
+                goto out;
+        }
+        ret = snprintf(event_control_path, PATH_MAX, "%s/cgroup.event_control",
+                        dirname(argv[1]));
+        if (ret >= PATH_MAX) {
+                fputs("Path to cgroup.event_control is too long\n", stderr);
+                goto out;
+        }
+        event_control = open(event_control_path, O_WRONLY);
+        if (event_control == -1) {
+                fprintf(stderr, "Cannot open %s: %s\n", event_control_path,
+                                strerror(errno));
+                goto out;
+        }
+        efd = eventfd(0, 0);
+        if (efd == -1) {
+                perror("eventfd() failed");
+                goto out;
+        }
+        ret = snprintf(line, LINE_MAX, "%d %d %s", efd, cfd, argv[2]);
+        if (ret >= LINE_MAX) {
+                fputs("Arguments string is too long\n", stderr);
+                goto out;
+        }
+        ret = write(event_control, line, strlen(line) + 1);
+        if (ret == -1) {
+                perror("Cannot write to cgroup.event_control");
+                goto out;
+        }
+        while (1) {
+                uint64_t result;
+                ret = read(efd, &result, sizeof(result));
+                if (ret == -1) {
+                        if (errno == EINTR)
+                                continue;
+                        perror("Cannot read from eventfd");
+                        break;
+                }
+                assert(ret == sizeof(result));
+                ret = access(event_control_path, W_OK);
+                if ((ret == -1) && (errno == ENOENT)) {
+                        puts("The cgroup seems to have been removed.");
+                        ret = 0;
+                        break;
+                }
+                if (ret == -1) {
+                        perror("cgroup.event_control "
+                                        "is not accessible any more");
+                        break;
+                }
+                printf("%s %s: crossed\n", argv[1], argv[2]);
+        }
+out:
+        if (efd >= 0)
+                close(efd);
+        if (event_control >= 0)
+                close(event_control);
+        if (cfd >= 0)
+                close(cfd);
+        return (ret != 0);
+}
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 0b33bfe7dde9..a1ca5924faff 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -22,6 +22,8 @@ CONTENTS:
 2. Usage Examples and Syntax
  2.1 Basic Usage
  2.2 Attaching processes
+  2.3 Mounting hierarchies by name
+  2.4 Notification API
 3. Kernel API
  3.1 Overview
  3.2 Synchronization
@@ -233,8 +235,7 @@ containing the following files describing that cgroup:
 - cgroup.procs: list of tgids in the cgroup.  This list is not
   guaranteed to be sorted or free of duplicate tgids, and userspace
   should sort/uniquify the list if this property is required.
-   Writing a tgid into this file moves all threads with that tgid into
+   This is a read-only file, for now.
-   this cgroup.
 - notify_on_release flag: run the release agent on exit?
 - release_agent: the path to use for release notifications (this file
   exists in the top cgroup only)
@@ -434,6 +435,25 @@ you give a subsystem a name.
 The name of the subsystem appears as part of the hierarchy description
 in /proc/mounts and /proc/<pid>/cgroups.
+2.4 Notification API
+--------------------
+There is a mechanism which allows you to get notifications about the changing
+status of a cgroup.
+To register a new notification handler you need to:
+ - create a file descriptor for event notification using eventfd(2);
+ - open a control file to be monitored (e.g. memory.usage_in_bytes);
+ - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
+   Interpretation of args is defined by the control file implementation.
+The eventfd will be woken up by the control file implementation or when the
+cgroup is removed.
+To unregister a notification handler, just close the eventfd.
+NOTE: Support of notifications should be implemented for the control
+file. See documentation for the subsystem.
 3. Kernel API
 =============
@@ -488,6 +508,11 @@ Each subsystem should:
 - add an entry in linux/cgroup_subsys.h
 - define a cgroup_subsys object called <name>_subsys
+If a subsystem can be compiled as a module, it should also have in its
+module initcall a call to cgroup_load_subsys(), and in its exitcall a
+call to cgroup_unload_subsys(). It should also set its_subsys.module =
+THIS_MODULE in its .c file.
 Each subsystem may export the following methods. The only mandatory
 methods are create/destroy. Any others that are null are presumed to
 be successful no-ops.
@@ -536,10 +561,21 @@ returns an error, this will abort the attach operation.  If a NULL
 task is passed, then a successful result indicates that *any*
 unspecified task can be moved into the cgroup. Note that this isn't
 called on a fork. If this method returns 0 (success) then this should
-remain valid while the caller holds cgroup_mutex. If threadgroup is
+remain valid while the caller holds cgroup_mutex and it is ensured that either
+attach() or cancel_attach() will be called in future. If threadgroup is
 true, then a successful result indicates that all threads in the given
 thread's threadgroup can be moved together.
+void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
+               struct task_struct *task, bool threadgroup)
+(cgroup_mutex held by caller)
+Called when a task attach operation has failed after can_attach() has succeeded.
+A subsystem whose can_attach() has some side-effects should provide this
+function, so that the subsystem can implement a rollback; otherwise it is not
+needed. This will be called only for subsystems whose can_attach() operation
+has succeeded.
 void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
            struct cgroup *old_cgrp, struct task_struct *task,
            bool threadgroup)
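The can_attach()/cancel_attach() contract added to cgroups.txt above can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct model_subsys`, `model_attach_task()` and the three callbacks are hypothetical names, and the `reserved` field stands in for whatever side effect a real can_attach() might have. It shows only the ordering: every can_attach() runs first, and on failure cancel_attach() is called just for the subsystems whose can_attach() had already succeeded.

```c
#include <stddef.h>

/* Hypothetical userspace model of a subsystem; not the kernel's types. */
struct model_subsys {
    int (*can_attach)(struct model_subsys *ss);     /* returns 0 on success */
    void (*cancel_attach)(struct model_subsys *ss); /* optional rollback */
    void (*attach)(struct model_subsys *ss);        /* optional commit */
    int reserved;                                   /* can_attach() side effect */
};

/* Example callbacks used to exercise the model. */
int reserve_ok(struct model_subsys *ss)
{
    ss->reserved = 1;
    return 0;
}

int reserve_fail(struct model_subsys *ss)
{
    (void)ss;
    return -1;
}

void undo_reserve(struct model_subsys *ss)
{
    ss->reserved = 0;
}

/*
 * Ask every subsystem whether the attach may proceed. If one refuses,
 * roll back only the subsystems that had already agreed, then report
 * the error. Otherwise commit with attach().
 */
int model_attach_task(struct model_subsys **subsys, size_t n)
{
    size_t i, failed = n;
    int ret = 0;

    for (i = 0; i < n; i++) {
        if (!subsys[i]->can_attach)
            continue;
        ret = subsys[i]->can_attach(subsys[i]);
        if (ret) {
            failed = i;
            break;
        }
    }

    if (ret) {
        for (i = 0; i < failed; i++)
            if (subsys[i]->cancel_attach)
                subsys[i]->cancel_attach(subsys[i]);
        return ret;
    }

    for (i = 0; i < n; i++)
        if (subsys[i]->attach)
            subsys[i]->attach(subsys[i]);
    return 0;
}
```

In this model, attaching through a subsystem using `reserve_ok` followed by one using `reserve_fail` returns the error and leaves the first subsystem's `reserved` field back at 0, because its rollback callback ran.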
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 1d7e9784439a..4160df82b3f5 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -168,20 +168,20 @@ Each cpuset is represented by a directory in the cgroup file system
 containing (on top of the standard cgroup files) the following
 files describing that cpuset:
- - cpus: list of CPUs in that cpuset
+ - cpuset.cpus: list of CPUs in that cpuset
- - mems: list of Memory Nodes in that cpuset
+ - cpuset.mems: list of Memory Nodes in that cpuset
- - memory_migrate flag: if set, move pages to cpusets nodes
+ - cpuset.memory_migrate flag: if set, move pages to cpusets nodes
- - cpu_exclusive flag: is cpu placement exclusive?
+ - cpuset.cpu_exclusive flag: is cpu placement exclusive?
- - mem_exclusive flag: is memory placement exclusive?
+ - cpuset.mem_exclusive flag: is memory placement exclusive?
- - mem_hardwall flag:  is memory allocation hardwalled
+ - cpuset.mem_hardwall flag:  is memory allocation hardwalled
- - memory_pressure: measure of how much paging pressure in cpuset
+ - cpuset.memory_pressure: measure of how much paging pressure in cpuset
- - memory_spread_page flag: if set, spread page cache evenly on allowed nodes
+ - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
- - memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
+ - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
- - sched_load_balance flag: if set, load balance within CPUs on that cpuset
+ - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
- - sched_relax_domain_level: the searching range when migrating tasks
+ - cpuset.sched_relax_domain_level: the searching range when migrating tasks
 In addition, the root cpuset only has the following file:
- - memory_pressure_enabled flag: compute memory_pressure?
+ - cpuset.memory_pressure_enabled flag: compute memory_pressure?
 New cpusets are created using the mkdir system call or shell
 command.  The properties of a cpuset, such as its flags, allowed
@@ -229,7 +229,7 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
 a direct ancestor or descendant, may share any of the same CPUs or
 Memory Nodes.
-A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled",
+A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled",
 i.e. it restricts kernel allocations for page, buffer and other data
 commonly shared by the kernel across multiple users.  All cpusets,
 whether hardwalled or not, restrict allocations of memory for user
@@ -304,15 +304,15 @@ times 1000.
 ---------------------------
 There are two boolean flag files per cpuset that control where the
 kernel allocates pages for the file system buffers and related in
-kernel data structures.  They are called 'memory_spread_page' and
+kernel data structures.  They are called 'cpuset.memory_spread_page' and
-'memory_spread_slab'.
+'cpuset.memory_spread_slab'.
-If the per-cpuset boolean flag file 'memory_spread_page' is set, then
+If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then
 the kernel will spread the file system buffers (page cache) evenly
 over all the nodes that the faulting task is allowed to use, instead
 of preferring to put those pages on the node where the task is running.
-If the per-cpuset boolean flag file 'memory_spread_slab' is set,
+If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set,
 then the kernel will spread some file system related slab caches,
 such as for inodes and dentries evenly over all the nodes that the
 faulting task is allowed to use, instead of preferring to put those
@@ -337,21 +337,21 @@ their containing tasks memory spread settings.  If memory spreading
 is turned off, then the currently specified NUMA mempolicy once again
 applies to memory page allocations.
-Both 'memory_spread_page' and 'memory_spread_slab' are boolean flag
+Both 'cpuset.memory_spread_page' and 'cpuset.memory_spread_slab' are boolean flag
 files.  By default they contain "0", meaning that the feature is off
 for that cpuset.  If a "1" is written to that file, then that turns
 the named feature on.
 The implementation is simple.
-Setting the flag 'memory_spread_page' turns on a per-process flag
+Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
 PF_SPREAD_PAGE for each task that is in that cpuset or subsequently
 joins that cpuset.  The page allocation calls for the page cache
 is modified to perform an inline check for this PF_SPREAD_PAGE task
 flag, and if set, a call to a new routine cpuset_mem_spread_node()
 returns the node to prefer for the allocation.
-Similarly, setting 'memory_spread_slab' turns on the flag
+Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
 PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
 pages from the node returned by cpuset_mem_spread_node().
@@ -404,24 +404,24 @@ the following two situations:
    system overhead on those CPUs, including avoiding task load
    balancing if that is not needed.
-When the per-cpuset flag "sched_load_balance" is enabled (the default
+When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default
-setting), it requests that all the CPUs in that cpusets allowed 'cpus'
+setting), it requests that all the CPUs in that cpusets allowed 'cpuset.cpus'
 be contained in a single sched domain, ensuring that load balancing
 can move a task (not otherwised pinned, as by sched_setaffinity)
 from any CPU in that cpuset to any other.
-When the per-cpuset flag "sched_load_balance" is disabled, then the
+When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the
 scheduler will avoid load balancing across the CPUs in that cpuset,
 --except-- in so far as is necessary because some overlapping cpuset
 has "sched_load_balance" enabled.
-So, for example, if the top cpuset has the flag "sched_load_balance"
+So, for example, if the top cpuset has the flag "cpuset.sched_load_balance"
 enabled, then the scheduler will have one sched domain covering all
-CPUs, and the setting of the "sched_load_balance" flag in any other
+CPUs, and the setting of the "cpuset.sched_load_balance" flag in any other
 cpusets won't matter, as we're already fully load balancing.
 Therefore in the above two situations, the top cpuset flag
-"sched_load_balance" should be disabled, and only some of the smaller,
+"cpuset.sched_load_balance" should be disabled, and only some of the smaller,
 child cpusets have this flag enabled.
 When doing this, you don't usually want to leave any unpinned tasks in
@@ -433,7 +433,7 @@ scheduler might not consider the possibility of load balancing that
 task to that underused CPU.
 Of course, tasks pinned to a particular CPU can be left in a cpuset
-that disables "sched_load_balance" as those tasks aren't going anywhere
+that disables "cpuset.sched_load_balance" as those tasks aren't going anywhere
 else anyway.
 There is an impedance mismatch here, between cpusets and sched domains.
@@ -443,19 +443,19 @@ overlap and each CPU is in at most one sched domain.
 It is necessary for sched domains to be flat because load balancing
 across partially overlapping sets of CPUs would risk unstable dynamics
 that would be beyond our understanding.  So if each of two partially
-overlapping cpusets enables the flag 'sched_load_balance', then we
+overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
 form a single sched domain that is a superset of both.  We won't move
 a task to a CPU outside it cpuset, but the scheduler load balancing
 code might waste some compute cycles considering that possibility.
 This mismatch is why there is not a simple one-to-one relation
-between which cpusets have the flag "sched_load_balance" enabled,
+between which cpusets have the flag "cpuset.sched_load_balance" enabled,
 and the sched domain configuration.  If a cpuset enables the flag, it
 will get balancing across all its CPUs, but if it disables the flag,
 it will only be assured of no load balancing if no other overlapping
 cpuset enables the flag.
-If two cpusets have partially overlapping 'cpus' allowed, and only
+If two cpusets have partially overlapping 'cpuset.cpus' allowed, and only
 one of them has this flag enabled, then the other may find its
 tasks only partially load balanced, just on the overlapping CPUs.
 This is just the general case of the top_cpuset example given a few
@@ -468,23 +468,23 @@ load balancing to the other CPUs.
 1.7.1 sched_load_balance implementation details.
 ------------------------------------------------
-The per-cpuset flag 'sched_load_balance' defaults to enabled (contrary
+The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary
 to most cpuset flags.)  When enabled for a cpuset, the kernel will
 ensure that it can load balance across all the CPUs in that cpuset
 (makes sure that all the CPUs in the cpus_allowed of that cpuset are
 in the same sched domain.)
-If two overlapping cpusets both have 'sched_load_balance' enabled,
+If two overlapping cpusets both have 'cpuset.sched_load_balance' enabled,
 then they will be (must be) both in the same sched domain.
-If, as is the default, the top cpuset has 'sched_load_balance' enabled,
+If, as is the default, the top cpuset has 'cpuset.sched_load_balance' enabled,
 then by the above that means there is a single sched domain covering
 the whole system, regardless of any other cpuset settings.
 The kernel commits to user space that it will avoid load balancing
 where it can.  It will pick as fine a granularity partition of sched
 domains as it can while still providing load balancing for any set
-of CPUs allowed to a cpuset having 'sched_load_balance' enabled.
+of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled.
 The internal kernel cpuset to scheduler interface passes from the
 cpuset code to the scheduler code a partition of the load balanced
@@ -495,9 +495,9 @@ all the CPUs that must be load balanced.
 The cpuset code builds a new such partition and passes it to the
 scheduler sched domain setup code, to have the sched domains rebuilt
 as necessary, whenever:
- - the 'sched_load_balance' flag of a cpuset with non-empty CPUs changes,
+ - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes,
 - or CPUs come or go from a cpuset with this flag enabled,
- - or 'sched_relax_domain_level' value of a cpuset with non-empty CPUs
+ - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs
   and with this flag enabled changes,
 - or a cpuset with non-empty CPUs and with this flag enabled is removed,
 - or a cpu is offlined/onlined.
@@ -542,7 +542,7 @@ As the result, task B on CPU X need to wait task A or wait load balance
 on the next tick.  For some applications in special situation, waiting
 1 tick may be too long.
-The 'sched_relax_domain_level' file allows you to request changing
+The 'cpuset.sched_relax_domain_level' file allows you to request changing
 this searching range as you like.  This file takes int value which
 indicates size of searching range in levels ideally as follows,
 otherwise initial value -1 that indicates the cpuset has no request.
@@ -559,8 +559,8 @@ The system default is architecture dependent.  The system default
 can be changed using the relax_domain_level= boot parameter.
 This file is per-cpuset and affect the sched domain where the cpuset
-belongs to.  Therefore if the flag 'sched_load_balance' of a cpuset
+belongs to.  Therefore if the flag 'cpuset.sched_load_balance' of a cpuset
-is disabled, then 'sched_relax_domain_level' have no effect since
+is disabled, then 'cpuset.sched_relax_domain_level' have no effect since
 there is no sched domain belonging the cpuset.
 If multiple cpusets are overlapping and hence they form a single sched
@@ -607,9 +607,9 @@ from one cpuset to another, then the kernel will adjust the tasks
 memory placement, as above, the next time that the kernel attempts
 to allocate a page of memory for that task.
-If a cpuset has its 'cpus' modified, then each task in that cpuset
+If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset
 will have its allowed CPU placement changed immediately.  Similarly,
 if a tasks pid is written to another cpusets 'tasks' file, then its
 allowed CPU placement is changed immediately.  If such a task had been
 bound to some subset of its cpuset using the sched_setaffinity() call,
 the task will be allowed to run on any CPU allowed in its new cpuset,
@@ -622,8 +622,8 @@ and the processor placement is updated immediately.
 Normally, once a page is allocated (given a physical page
 of main memory) then that page stays on whatever node it
 was allocated, so long as it remains allocated, even if the
-cpusets memory placement policy 'mems' subsequently changes.
+cpusets memory placement policy 'cpuset.mems' subsequently changes.
-If the cpuset flag file 'memory_migrate' is set true, then when
+If the cpuset flag file 'cpuset.memory_migrate' is set true, then when
 tasks are attached to that cpuset, any pages that task had
 allocated to it on nodes in its previous cpuset are migrated
 to the tasks new cpuset. The relative placement of the page within
@@ -631,12 +631,12 @@ the cpuset is preserved during these migration operations if possible.
 For example if the page was on the second valid node of the prior cpuset
 then the page will be placed on the second valid node of the new cpuset.
-Also if 'memory_migrate' is set true, then if that cpusets
+Also if 'cpuset.memory_migrate' is set true, then if that cpusets
-'mems' file is modified, pages allocated to tasks in that
+'cpuset.mems' file is modified, pages allocated to tasks in that
-cpuset, that were on nodes in the previous setting of 'mems',
+cpuset, that were on nodes in the previous setting of 'cpuset.mems',
 will be moved to nodes in the new setting of 'mems.'
 Pages that were not in the tasks prior cpuset, or in the cpusets
-prior 'mems' setting, will not be moved.
+prior 'cpuset.mems' setting, will not be moved.
 There is an exception to the above.  If hotplug functionality is used
 to remove all the CPUs that are currently assigned to a cpuset,
@@ -678,8 +678,8 @@ and then start a subshell 'sh' in that cpuset:
  cd /dev/cpuset
  mkdir Charlie
  cd Charlie
-  /bin/echo 2-3 > cpus
+  /bin/echo 2-3 > cpuset.cpus
-  /bin/echo 1 > mems
+  /bin/echo 1 > cpuset.mems
  /bin/echo $$ > tasks
  sh
  # The subshell 'sh' is now running in cpuset Charlie
@@ -725,10 +725,13 @@ Now you want to do something with this cpuset.
 In this directory you can find several files:
 # ls
-cpu_exclusive  memory_migrate      mems                      tasks
+cpuset.cpu_exclusive       cpuset.memory_spread_slab
-cpus           memory_pressure     notify_on_release
+cpuset.cpus                cpuset.mems
-mem_exclusive  memory_spread_page  sched_load_balance
+cpuset.mem_exclusive       cpuset.sched_load_balance
-mem_hardwall   memory_spread_slab  sched_relax_domain_level
+cpuset.mem_hardwall        cpuset.sched_relax_domain_level
+cpuset.memory_migrate      notify_on_release
+cpuset.memory_pressure     tasks
+cpuset.memory_spread_page
 Reading them will give you information about the state of this cpuset:
 the CPUs and Memory Nodes it can use, the processes that are using
@@ -736,13 +739,13 @@ it, its properties.  By writing to these files you can manipulate
 the cpuset.
 Set some flags:
-# /bin/echo 1 > cpu_exclusive
+# /bin/echo 1 > cpuset.cpu_exclusive
 Add some cpus:
-# /bin/echo 0-7 > cpus
+# /bin/echo 0-7 > cpuset.cpus
 Add some mems:
-# /bin/echo 0-7 > mems
+# /bin/echo 0-7 > cpuset.mems
 Now attach your shell to this cpuset:
 # /bin/echo $$ > tasks
@@ -774,28 +777,28 @@ echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
 This is the syntax to use when writing in the cpus or mems files
 in cpuset directories:
-# /bin/echo 1-4 > cpus          -> set cpus list to cpus 1,2,3,4
+# /bin/echo 1-4 > cpuset.cpus           -> set cpus list to cpus 1,2,3,4
-# /bin/echo 1,2,3,4 > cpus      -> set cpus list to cpus 1,2,3,4
+# /bin/echo 1,2,3,4 > cpuset.cpus       -> set cpus list to cpus 1,2,3,4
 To add a CPU to a cpuset, write the new list of CPUs including the
 CPU to be added. To add 6 to the above cpuset:
-# /bin/echo 1-4,6 > cpus        -> set cpus list to cpus 1,2,3,4,6
+# /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6
 Similarly to remove a CPU from a cpuset, write the new list of CPUs
 without the CPU to be removed.
 To remove all the CPUs:
-# /bin/echo "" > cpus           -> clear cpus list
+# /bin/echo "" > cpuset.cpus            -> clear cpus list
 2.3 Setting flags
 -----------------
 The syntax is very simple:
-# /bin/echo 1 > cpu_exclusive   -> set flag 'cpu_exclusive'
+# /bin/echo 1 > cpuset.cpu_exclusive    -> set flag 'cpuset.cpu_exclusive'
-# /bin/echo 0 > cpu_exclusive   -> unset flag 'cpu_exclusive'
+# /bin/echo 0 > cpuset.cpu_exclusive    -> unset flag 'cpuset.cpu_exclusive'
 2.4 Attaching processes
 -----------------------
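The list syntax used in the cpusets.txt examples above for cpuset.cpus and cpuset.mems (`1-4`, `1,2,3,4`, `1-4,6`, or an empty string to clear the list) can be illustrated with a small parser. This is a sketch under stated assumptions, not the kernel's parser: `parse_cpu_list()` is a hypothetical name, ids are limited to 0..63 so that the result fits in one 64-bit mask, and whitespace, including the trailing newline that echo writes, is not handled.

```c
#include <stdlib.h>

/*
 * Parse a list such as "1-4,6" into a bitmask with one bit per id.
 * An empty string yields an empty mask. Returns 0 on success and -1
 * on a syntax error or an id above 63.
 */
int parse_cpu_list(const char *s, unsigned long long *mask)
{
    *mask = 0;

    while (*s != '\0') {
        char *end;
        long first, last;

        /* Each element must start with a digit. */
        if (*s < '0' || *s > '9')
            return -1;
        first = strtol(s, &end, 10);
        last = first;

        /* An optional "-N" turns the element into a range. */
        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            last = strtol(s, &end, 10);
        }

        if (first > last || last > 63)
            return -1;
        for (long i = first; i <= last; i++)
            *mask |= 1ULL << i;

        /* Elements are separated by commas; a trailing comma is an error. */
        if (*end == ',') {
            end++;
            if (*end == '\0')
                return -1;
        } else if (*end != '\0') {
            return -1;
        }
        s = end;
    }
    return 0;
}
```

With this sketch, `1-4` and `1,2,3,4` both produce the mask 0x1e, and `1-4,6` produces 0x5e, matching the "set cpus list to cpus 1,2,3,4,6" example.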
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index 72db89ed0609..f7f68b2ac199 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -1,6 +1,6 @@
 Memory Resource Controller(Memcg)  Implementation Memo.
-Last Updated: 2009/1/20
+Last Updated: 2010/2
-Base Kernel Version: based on 2.6.29-rc2.
+Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34).
 Because VM is getting complex (one of reasons is memcg...), memcg's behavior
 is complex. This is a document for memcg's internal behavior.
@@ -337,7 +337,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
        race and lock dependency with other cgroup subsystems.
        example)
-        # mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices
+        # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices
        and do task move, mkdir, rmdir etc...under this.
@@ -348,7 +348,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
        For example, test like following is good.
        (Shell-A)
-        # mount -t cgroup none /cgroup -t memory
+        # mount -t cgroup none /cgroup -o memory
        # mkdir /cgroup/test
        # echo 40M > /cgroup/test/memory.limit_in_bytes
        # echo 0 > /cgroup/test/tasks
@@ -378,3 +378,42 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
        #echo 50M > memory.limit_in_bytes
        #echo 50M > memory.memsw.limit_in_bytes
        run 51M of malloc
+ 9.9 Move charges at task migration
+        Charges associated with a task can be moved along with task migration.
+        (Shell-A)
+        #mkdir /cgroup/A
+        #echo $$ >/cgroup/A/tasks
+        run some programs which use some amount of memory in /cgroup/A.
+        (Shell-B)
+        #mkdir /cgroup/B
+        #echo 1 >/cgroup/B/memory.move_charge_at_immigrate
+        #echo "pid of the program running in group A" >/cgroup/B/tasks
+        You can see charges have been moved by reading *.usage_in_bytes or
+        memory.stat of both A and B.
+        See 8.2 of Documentation/cgroups/memory.txt to see what value should be
+        written to move_charge_at_immigrate.
+ 9.10 Memory thresholds
+        The memory controller implements memory thresholds using the cgroups
+        notification API. You can use
+        Documentation/cgroups/cgroup_event_listener.c to test it.
+        (Shell-A) Create cgroup and run event listener
+        # mkdir /cgroup/A
+        # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
+        (Shell-B) Add task to cgroup and try to allocate and free memory
+        # echo $$ >/cgroup/A/tasks
+        # a="$(dd if=/dev/zero bs=1M count=10)"
+        # a=
+        You will see a message from cgroup_event_listener every time you cross
+        the thresholds.
+        Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
+        It's a good idea to test the root cgroup as well.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index b871f2552b45..3a6aecd078ba 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -182,6 +182,8 @@ list.
 NOTE: Reclaim does not work for the root cgroup, since we cannot set any
 limits on the root cgroup.
+Note2: When panic_on_oom is set to "2", the whole system will panic.
 2. Locking
 The memory controller uses the following hierarchy
@@ -262,10 +264,12 @@ some of the pages cached in the cgroup (page cache pages).
 4.2 Task migration
 When a task migrates from one cgroup to another, it's charge is not
-carried forward. The pages allocated from the original cgroup still
+carried forward by default. The pages allocated from the original cgroup still
 remain charged to it, the charge is dropped when the page is freed or
 reclaimed.
+Note: You can move charges of a task along with task migration. See 8.
 4.3 Removing a cgroup
 A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
@@ -336,7 +340,7 @@ Note:
 5.3 swappiness
  Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
-  Following cgroups' swapiness can't be changed.
+  Following cgroups' swappiness can't be changed.
  - root cgroup (uses /proc/sys/vm/swappiness).
  - a cgroup which uses hierarchy and it has child cgroup.
  - a cgroup which uses hierarchy and not the root of hierarchy.
@@ -377,7 +381,8 @@ The feature can be disabled by
 NOTE1: Enabling/disabling will fail if the cgroup already has other
 cgroups created below it.
-NOTE2: This feature can be enabled/disabled per subtree.
+NOTE2: When panic_on_oom is set to "2", the whole system will panic in
+case of an oom event in any cgroup.
 7. Soft limits
@@ -414,7 +419,76 @@ NOTE1: Soft limits take effect over a long period of time, since they involve
 NOTE2: It is recommended to set the soft limit always below the hard limit,
       otherwise the hard limit will take precedence.
-8. TODO
+8. Move charges at task migration
+Users can move charges associated with a task along with task migration, that
+is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+This feature is not supported in !CONFIG_MMU environments because of lack of
+page tables.
+8.1 Interface
+This feature is disabled by default. It can be enabled (and disabled again) by
+writing to memory.move_charge_at_immigrate of the destination cgroup.
+If you want to enable it:
+# echo (some positive value) > memory.move_charge_at_immigrate
+Note: Each bit of move_charge_at_immigrate has its own meaning about what type
+      of charges should be moved. See 8.2 for details.
+Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread
+      group.
+Note: If we cannot find enough space for the task in the destination cgroup, we
+      try to make space by reclaiming memory. Task migration may fail if we
+      cannot make enough space.
+Note: It can take several seconds if you move charges on the order of gigabytes.
+And if you want to disable it again:
+# echo 0 > memory.move_charge_at_immigrate
+8.2 Types of charges which can be moved
+Each bit of move_charge_at_immigrate has its own meaning about what type of
+charges should be moved.
+  bit | what type of charges would be moved ?
+ -----+------------------------------------------------------------------------
+   0  | A charge of an anonymous page (or swap of it) used by the target task.
+      | Those pages and swaps must be used only by the target task. You must
+      | enable Swap Extension(see 2.4) to enable move of swap charges.
+Note: Those pages and swaps must be charged to the old cgroup.
+Note: More types of pages (e.g. file cache, shmem) will be supported by other
+      bits in the future.
+8.3 TODO
+- Add support for other types of pages (e.g. file cache, shmem, etc.).
+- Implement madvise(2) to let users decide the vma to be moved or not to be
+  moved.
+- All charge-moving operations are done under cgroup_mutex. Holding the mutex
+  for too long is not good behavior, so we may need some trick.
+9. Memory thresholds
+The memory controller implements memory thresholds using the cgroups
+notification API (see cgroups.txt). It allows an application to register
+multiple memory and memsw thresholds and to be notified when usage crosses
+them.
+To register a threshold, an application needs to:
+ - create an eventfd using eventfd(2);
+ - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
+ - write a string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>"
+   to cgroup.event_control.
+The application will be notified through the eventfd when memory usage crosses
+a threshold in either direction.
+This applies to both root and non-root cgroups.
+10. TODO
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt
new file mode 100644
index 000000000000..8117e5bf6065
--- /dev/null
+++ b/Documentation/circular-buffers.txt
@@ -0,0 +1,234 @@
+                               ================
+                               CIRCULAR BUFFERS
+                               ================
+By: David Howells <dhowells@redhat.com>
+    Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+Linux provides a number of features that can be used to implement circular
+buffering.  There are two sets of such features:
+ (1) Convenience functions for determining information about power-of-2 sized
+     buffers.
+ (2) Memory barriers for when the producer and the consumer of objects in the
+     buffer don't want to share a lock.
+To use these facilities, as discussed below, there needs to be just one
+producer and just one consumer.  It is possible to handle multiple producers by
+serialising them, and to handle multiple consumers by serialising them.
+Contents:
+ (*) What is a circular buffer?
+ (*) Measuring power-of-2 buffers.
+ (*) Using memory barriers with circular buffers.
+     - The producer.
+     - The consumer.
+==========================
+WHAT IS A CIRCULAR BUFFER?
+==========================
+First of all, what is a circular buffer?  A circular buffer is a buffer of
+fixed, finite size into which there are two indices:
+ (1) A 'head' index - the point at which the producer inserts items into the
+     buffer.
+ (2) A 'tail' index - the point at which the consumer finds the next item in
+     the buffer.
+Typically when the tail pointer is equal to the head pointer, the buffer is
+empty; and the buffer is full when the head pointer is one less than the tail
+pointer.
+The head index is incremented when items are added, and the tail index when
+items are removed.  The tail index should never jump the head index, and both
+indices should be wrapped to 0 when they reach the end of the buffer, thus
+allowing an infinite amount of data to flow through the buffer.
+Typically, items will all be of the same unit size, but this isn't strictly
+required to use the techniques below.  The indices can be increased by more
+than 1 if multiple items or variable-sized items are to be included in the
+buffer, provided that neither index overtakes the other.  The implementer must
+be careful, however, as a region more than one unit in size may wrap the end of
+the buffer and be broken into two segments.
+============================
+MEASURING POWER-OF-2 BUFFERS
+============================
+Calculation of the occupancy or the remaining capacity of an arbitrarily sized
+circular buffer would normally be a slow operation, requiring the use of a
+modulus (divide) instruction.  However, if the buffer is of a power-of-2 size,
+then a much quicker bitwise-AND instruction can be used instead.
+Linux provides a set of macros for handling power-of-2 circular buffers.  These
+can be made use of by:
+        #include <linux/circ_buf.h>
+The macros are:
+ (*) Measure the remaining capacity of a buffer:
+        CIRC_SPACE(head_index, tail_index, buffer_size);
+     This returns the amount of space left in the buffer[1] into which items
+     can be inserted.
+ (*) Measure the maximum consecutive immediate space in a buffer:
+        CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);
+     This returns the amount of consecutive space left in the buffer[1] into
+     which items can be immediately inserted without having to wrap back to the
+     beginning of the buffer.
+ (*) Measure the occupancy of a buffer:
+        CIRC_CNT(head_index, tail_index, buffer_size);
+     This returns the number of items currently occupying a buffer[2].
+ (*) Measure the non-wrapping occupancy of a buffer:
+        CIRC_CNT_TO_END(head_index, tail_index, buffer_size);
+     This returns the number of consecutive items[2] that can be extracted from
+     the buffer without having to wrap back to the beginning of the buffer.
+Each of these macros will nominally return a value between 0 and buffer_size-1,
+however:
+ [1] CIRC_SPACE*() are intended to be used in the producer.  To the producer
+     they will return a lower bound as the producer controls the head index,
+     but the consumer may still be depleting the buffer on another CPU and
+     moving the tail index.
+     To the consumer it will show an upper bound as the producer may be busy
+     depleting the space.
+ [2] CIRC_CNT*() are intended to be used in the consumer.  To the consumer they
+     will return a lower bound as the consumer controls the tail index, but the
+     producer may still be filling the buffer on another CPU and moving the
+     head index.
+     To the producer it will show an upper bound as the consumer may be busy
+     emptying the buffer.
+ [3] To a third party, the order in which the writes to the indices by the
+     producer and consumer become visible cannot be guaranteed as they are
+     independent and may be made on different CPUs - so the result in such a
+     situation will merely be a guess, and may even be negative.
+===========================================
+USING MEMORY BARRIERS WITH CIRCULAR BUFFERS
+===========================================
+By using memory barriers in conjunction with circular buffers, you can avoid
+the need to:
+ (1) use a single lock to govern access to both ends of the buffer, thus
+     allowing the buffer to be filled and emptied at the same time; and
+ (2) use atomic counter operations.
+There are two sides to this: the producer that fills the buffer, and the
+consumer that empties it.  Only one thing should be filling a buffer at any one
+time, and only one thing should be emptying a buffer at any one time, but the
+two sides can operate simultaneously.
+THE PRODUCER
+------------
+The producer will look something like this:
+        spin_lock(&producer_lock);
+        unsigned long head = buffer->head;
+        unsigned long tail = ACCESS_ONCE(buffer->tail);
+        if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
+                /* insert one item into the buffer */
+                struct item *item = buffer[head];
+                produce_item(item);
+                smp_wmb(); /* commit the item before incrementing the head */
+                buffer->head = (head + 1) & (buffer->size - 1);
+                /* wake_up() will make sure that the head is committed before
+                 * waking anyone up */
+                wake_up(consumer);
+        }
+        spin_unlock(&producer_lock);
+This will instruct the CPU that the contents of the new item must be written
+before the head index makes it available to the consumer and then instructs the
+CPU that the revised head index must be written before the consumer is woken.
+Note that wake_up() doesn't have to be the exact mechanism used, but whatever
+is used must guarantee a (write) memory barrier between the update of the head
+index and the change of state of the consumer, if a change of state occurs.
+THE CONSUMER
+------------
+The consumer will look something like this:
+        spin_lock(&consumer_lock);
+        unsigned long head = ACCESS_ONCE(buffer->head);
+        unsigned long tail = buffer->tail;
+        if (CIRC_CNT(head, tail, buffer->size) >= 1) {
+                /* read index before reading contents at that index */
+                smp_read_barrier_depends();
+                /* extract one item from the buffer */
+                struct item *item = buffer[tail];
+                consume_item(item);
+                smp_mb(); /* finish reading descriptor before incrementing tail */
+                buffer->tail = (tail + 1) & (buffer->size - 1);
+        }
+        spin_unlock(&consumer_lock);
+This will instruct the CPU to make sure the index is up to date before reading
+the new item, and then it shall make sure the CPU has finished reading the item
+before it writes the new tail pointer, which will erase the item.
+Note the use of ACCESS_ONCE() in both algorithms to read the opposition index.
+This prevents the compiler from discarding and reloading its cached value -
+which some compilers will do across smp_read_barrier_depends().  This isn't
+strictly needed if you can be sure that the opposition index will _only_ be
+used the once.
+===============
+FURTHER READING
+===============
+See also Documentation/memory-barriers.txt for a description of Linux's memory
+barrier facilities.
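Note on circular-buffers.txt above: the index arithmetic from "MEASURING POWER-OF-2 BUFFERS" can be tried in userspace with a sketch like the following. The two macros are redefined locally under DEMO_ names so the snippet builds outside the kernel; they are assumed to mirror CIRC_CNT() and CIRC_SPACE() from <linux/circ_buf.h>. The demo_ring type and the demo_put/demo_get helpers are hypothetical. There are no memory barriers here, so the sketch is only valid single-threaded and does not replace the producer/consumer patterns in the text.

```c
#include <stddef.h>

/* Local copies; assumed to mirror CIRC_CNT/CIRC_SPACE in <linux/circ_buf.h>. */
#define DEMO_CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define DEMO_CIRC_SPACE(head, tail, size) \
	DEMO_CIRC_CNT((tail), ((head) + 1), (size))

#define DEMO_RING_SIZE 8UL	/* must be a power of 2 */

struct demo_ring {
	unsigned long head;	/* next slot the producer writes */
	unsigned long tail;	/* next slot the consumer reads */
	int slots[DEMO_RING_SIZE];
};

/* Insert one item; returns 1 on success, 0 if the ring is full. */
static int demo_put(struct demo_ring *r, int value)
{
	if (DEMO_CIRC_SPACE(r->head, r->tail, DEMO_RING_SIZE) < 1)
		return 0;
	r->slots[r->head] = value;
	/* Wrap with a bitwise AND instead of a modulus. */
	r->head = (r->head + 1) & (DEMO_RING_SIZE - 1);
	return 1;
}

/* Extract one item; returns 1 on success, 0 if the ring is empty. */
static int demo_get(struct demo_ring *r, int *value)
{
	if (DEMO_CIRC_CNT(r->head, r->tail, DEMO_RING_SIZE) < 1)
		return 0;
	*value = r->slots[r->tail];
	r->tail = (r->tail + 1) & (DEMO_RING_SIZE - 1);
	return 1;
}
```

With a size of 8 the ring holds at most 7 items, which matches the statement that the macros nominally return a value between 0 and buffer_size-1: one slot stays unused so that a full ring can be told apart from an empty one.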
diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c
index b07add3467f1..7764594778d4 100644
--- a/Documentation/connector/cn_test.c
+++ b/Documentation/connector/cn_test.c
@@ -25,6 +25,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/skbuff.h>
+#include <linux/slab.h>
 #include <linux/timer.h>
 #include <linux/connector.h>
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt
index 877a1b26cc3d..926cf1b5e63e 100644
--- a/Documentation/console/console.txt
+++ b/Documentation/console/console.txt
@@ -74,7 +74,7 @@ driver takes over the consoles vacated by the driver. Binding, on the other
 hand, will bind the driver to the consoles that are currently occupied by a
 system driver.
-NOTE1: Binding and binding must be selected in Kconfig. It's under:
+NOTE1: Binding and unbinding must be selected in Kconfig. It's under:
 Device Drivers -> Character devices -> Support for binding and unbinding
 console drivers
diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt
index 2e2c2ea90ceb..41f41632ee55 100644
--- a/Documentation/driver-model/platform.txt
+++ b/Documentation/driver-model/platform.txt
@@ -192,7 +192,7 @@ command line. This will execute all matching early_param() callbacks.
 User specified early platform devices will be registered at this point.
 For the early serial console case the user can specify port on the
 kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
-the class string, "serial" is the name of the platfrom driver and
+the class string, "serial" is the name of the platform driver and
 0 is the platform device id. If the id is -1 then the dot and the
 id can be omitted.
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt
index 60e361ba08c0..f297fc1202ae 100644
--- a/Documentation/eisa.txt
+++ b/Documentation/eisa.txt
@@ -171,7 +171,7 @@ device.
 virtual_root.force_probe :
 Force the probing code to probe EISA slots even when it cannot find an
-EISA compliant mainboard (nothing appears on slot 0). Defaultd to 0
+EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
 (don't force), and set to 1 (force probing) when either
 CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
diff --git a/Documentation/email-clients.txt b/Documentation/email-clients.txt
index a618efab7b15..945ff3fda433 100644
--- a/Documentation/email-clients.txt
+++ b/Documentation/email-clients.txt
@@ -216,26 +216,14 @@ Works.  Use "Insert file..." or external editor.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Gmail (Web GUI)
-If you just have to use Gmail to send patches, it CAN be made to work.  It
+Does not work for sending patches.
-requires a bit of external help, though.
+Gmail web client converts tabs to spaces automatically.
-The first problem is that Gmail converts tabs to spaces.  This will
-totally break your patches.  To prevent this, you have to use a different
+At the same time it wraps lines every 78 chars with CRLF style line breaks
-editor.  There is a firefox extension called "ViewSourceWith"
+although the tab2space problem can be solved with an external editor.
-(https://addons.mozilla.org/en-US/firefox/addon/394) which allows you to
-edit any text box in the editor of your choice.  Configure it to launch
+Another problem is that Gmail will base64-encode any message that has a
-your favorite editor.  When you want to send a patch, use this technique.
+non-ASCII character. That includes things like European names.
-Once you have crafted your messsage + patch, save and exit the editor,
-which should reload the Gmail edit box.  GMAIL WILL PRESERVE THE TABS.
-Hoorah.  Apparently you can cut-n-paste literal tabs, but Gmail will
-convert those to spaces upon sending!
-The second problem is that Gmail converts tabs to spaces on replies.  If
-you reply to a patch, don't expect to be able to apply it as a patch.
-The last problem is that Gmail will base64-encode any message that has a
-non-ASCII character.  That includes things like European names.  Be aware.
-Gmail is not convenient for lkml patches, but CAN be made to work.
                                ###
diff --git a/Documentation/fb/imacfb.txt b/Documentation/fb/efifb.txt
index 316ec9bb7deb..a59916c29b33 100644
--- a/Documentation/fb/imacfb.txt
+++ b/Documentation/fb/efifb.txt
@@ -1,9 +1,9 @@
-What is imacfb?
+What is efifb?
 ===============
 This is a generic EFI platform driver for Intel based Apple computers.
-Imacfb is only for EFI booted Intel Macs.
+efifb is only for EFI booted Intel Macs.
 Supported Hardware
 ==================
@@ -16,16 +16,16 @@ MacMini
 How to use it?
 ==============
-Imacfb does not have any kind of autodetection of your machine.
+efifb does not have any kind of autodetection of your machine.
 You have to add the following kernel parameters in your elilo.conf:
        Macbook :
-                video=imacfb:macbook
+                video=efifb:macbook
        MacMini :
-                video=imacfb:mini
+                video=efifb:mini
        Macbook Pro 15", iMac 17" :
-                video=imacfb:i17
+                video=efifb:i17
        Macbook Pro 17", iMac 20" :
-                video=imacfb:i20
+                video=efifb:i20
 --
 Edgar Hucek <gimli@dark-green.com>
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index a5cc0db63d7a..ed511af0f79a 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -582,3 +582,10 @@ Why:	The paravirt mmu host support is slower than non-paravirt mmu, both
 Who:    Avi Kivity <avi@redhat.com>
 ----------------------------
+What:   "acpi=ht" boot option
+When:   2.6.35
+Why:    Useful in 2003, implementation is a hack.
+        Generally invoked by accident today.
+        Seen as doing more harm than good.
+Who:    Len Brown <len.brown@intel.com>
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 5139b8c9d5af..4303614b5add 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -16,6 +16,8 @@ befs.txt
        - information about the BeOS filesystem for Linux.
 bfs.txt
        - info for the SCO UnixWare Boot Filesystem (BFS).
+ceph.txt
+        - info for the Ceph Distributed File System
 cifs.txt
        - description of the CIFS filesystem.
 coda.txt
@@ -32,6 +34,8 @@ dlmfs.txt
        - info on the userspace interface to the OCFS2 DLM.
 dnotify.txt
        - info about directory notification in Linux.
+dnotify_test.c
+        - example program for dnotify
 ecryptfs.txt
        - docs on eCryptfs: stacked cryptographic filesystem for Linux.
 exofs.txt
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index 57e0b80a5274..c0236e753bc8 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -37,6 +37,15 @@ For Plan 9 From User Space applications (http://swtch.com/plan9)
        mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
+For a server running on a QEMU host with the virtio transport:
+        mount -t 9p -o trans=virtio <mount_tag> /mnt/9
+where mount_tag is the tag associated by the server to each of the exported
+mount points. Each 9P export is seen by the client as a virtio device with an
+associated "mount_tag" property. Available mount tags can be
+seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
 OPTIONS
 =======
@@ -47,7 +56,7 @@ OPTIONS
                        fd      - used passed file descriptors for connection
                                (see rfdno and wfdno)
                        virtio  - connect to the next virtio channel available
-                                (from lguest or KVM with trans_virtio module)
+                                (from QEMU with trans_virtio module)
                        rdma    - connect to a specified RDMA channel
  uname=name    user name to attempt mount as on the remote server.  The
@@ -85,7 +94,12 @@ OPTIONS
  port=n        port to connect to on the remote server
-  noextend      force legacy mode (no 9p2000.u semantics)
+  noextend      force legacy mode (no 9p2000.u or 9p2000.L semantics)
+  version=name  Select 9P protocol version. Valid options are:
+                        9p2000          - Legacy mode (same as noextend)
+                        9p2000.u        - Use 9P2000.u protocol
+                        9p2000.L        - Use 9P2000.L protocol
  dfltuid       attempt to mount as a particular uid
diff --git a/Documentation/filesystems/Makefile b/Documentation/filesystems/Makefile
new file mode 100644
index 000000000000..a5dd114da14f
--- /dev/null
+++ b/Documentation/filesystems/Makefile
@@ -0,0 +1,8 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+# List of programs to build
+hostprogs-y := dnotify_test
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
new file mode 100644
index 000000000000..0660c9f5deef
--- /dev/null
+++ b/Documentation/filesystems/ceph.txt
@@ -0,0 +1,140 @@
+Ceph Distributed File System
+============================
+Ceph is a distributed network file system designed to provide good
+performance, reliability, and scalability.
+Basic features include:
+ * POSIX semantics
+ * Seamless scaling from 1 to many thousands of nodes
+ * High availability and reliability.  No single point of failure.
+ * N-way replication of data across storage nodes
+ * Fast recovery from node failures
+ * Automatic rebalancing of data on node addition/removal
+ * Easy deployment: most FS components are userspace daemons
+Also,
+ * Flexible snapshots (on any directory)
+ * Recursive accounting (nested files, directories, bytes)
+In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
+on symmetric access by all clients to shared block devices, Ceph
+separates data and metadata management into independent server
+clusters, similar to Lustre.  Unlike Lustre, however, metadata and
+storage nodes run entirely as user space daemons.  Storage nodes
+utilize btrfs to store data objects, leveraging its advanced features
+(checksumming, metadata replication, etc.).  File data is striped
+across storage nodes in large chunks to distribute workload and
+facilitate high throughputs.  When storage nodes fail, data is
+re-replicated in a distributed fashion by the storage nodes themselves
+(with some minimal coordination from a cluster monitor), making the
+system extremely efficient and scalable.
+Metadata servers effectively form a large, consistent, distributed
+in-memory cache above the file namespace that is extremely scalable,
+dynamically redistributes metadata in response to workload changes,
+and can tolerate arbitrary (well, non-Byzantine) node failures.  The
+metadata server takes a somewhat unconventional approach to metadata
+storage to significantly improve performance for common workloads.  In
+particular, inodes with only a single link are embedded in
+directories, allowing entire directories of dentries and inodes to be
+loaded into its cache with a single I/O operation.  The contents of
+extremely large directories can be fragmented and managed by
+independent metadata servers, allowing scalable concurrent access.
+The system offers automatic data rebalancing/migration when scaling
+from a small cluster of just a few nodes to many hundreds, without
+requiring an administrator to carve the data set into static volumes or
+go through the tedious process of migrating data between servers.
+When the file system approaches full, new nodes can be easily added
+and things will "just work."
+Ceph includes a flexible snapshot mechanism that allows a user to create
+a snapshot on any subdirectory (and its nested contents) in the
+system.  Snapshot creation and deletion are as simple as 'mkdir
+.snap/foo' and 'rmdir .snap/foo'.
+Ceph also provides some recursive accounting on directories for nested
+files and bytes.  That is, a 'getfattr -d foo' on any directory in the
+system will reveal the total number of nested regular files and
+subdirectories, and a summation of all nested file sizes.  This makes
+the identification of large disk space consumers relatively quick, as
+no 'du' or similar recursive scan of the file system is required.
+Mount Syntax
+============
+The basic mount syntax is:
+ # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
+You only need to specify a single monitor, as the client will get the
+full list when it connects.  (However, if the monitor you specify
+happens to be down, the mount won't succeed.)  The port can be left
+off if the monitor is using the default.  So if the monitor is at
+1.2.3.4,
+ # mount -t ceph 1.2.3.4:/ /mnt/ceph
+is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
+used instead of an IP address.
+Mount Options
+=============
+  ip=A.B.C.D[:N]
+        Specify the IP and/or port the client should bind to locally.
+        There is normally not much reason to do this.  If the IP is not
+        specified, the client's IP address is determined by looking at the
+        address its connection to the monitor originates from.
+  wsize=X
+        Specify the maximum write size in bytes.  By default there is no
+        maximum.  Ceph will normally size writes based on the file stripe
+        size.
+  rsize=X
+        Specify the maximum readahead.
+  mount_timeout=X
+        Specify the timeout value for mount (in seconds), in the case
+        of a non-responsive Ceph file system.  The default is 30
+        seconds.
+  rbytes
+        When stat() is called on a directory, set st_size to 'rbytes',
+        the summation of file sizes over all files nested beneath that
+        directory.  This is the default.
+  norbytes
+        When stat() is called on a directory, set st_size to the
+        number of entries in that directory.
+  nocrc
+        Disable CRC32C calculation for data writes.  If set, the storage node
+        must rely on TCP's error correction to detect data corruption
+        in the data payload.
+  noasyncreaddir
+        Disable the client's use of its local cache to satisfy readdir
+        requests.  (This does not change correctness; the client uses
+        cached metadata only when a lease or capability ensures it is
+        valid.)
+More Information
+================
+For more information on Ceph, see the home page at
+        http://ceph.newdream.net/
+The Linux kernel client source tree is available at
+        git://ceph.newdream.net/git/ceph-client.git
+        git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+and the source for the full system is at
+        git://ceph.newdream.net/git/ceph.git
diff --git a/Documentation/filesystems/dnotify.txt b/Documentation/filesystems/dnotify.txt
index 9f5d338ddbb8..6baf88f46859 100644
--- a/Documentation/filesystems/dnotify.txt
+++ b/Documentation/filesystems/dnotify.txt
@@ -62,38 +62,9 @@ disabled, fcntl(fd, F_NOTIFY, ...) will return -EINVAL.
 Example
 -------
+See Documentation/filesystems/dnotify_test.c for an example.
-        #define _GNU_SOURCE     /* needed to get the defines */
+NOTE
-        #include <fcntl.h>      /* in glibc 2.2 this has the needed
+----
-                                           values defined */
+Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
-        #include <signal.h>
+See Documentation/filesystems/inotify.txt for more information on it.
-        #include <stdio.h>
-        #include <unistd.h>
-        static volatile int event_fd;
-        static void handler(int sig, siginfo_t *si, void *data)
-        {
-                event_fd = si->si_fd;
-        }
-        int main(void)
-        {
-                struct sigaction act;
-                int fd;
-                act.sa_sigaction = handler;
-                sigemptyset(&act.sa_mask);
-                act.sa_flags = SA_SIGINFO;
-                sigaction(SIGRTMIN + 1, &act, NULL);
-                fd = open(".", O_RDONLY);
-                fcntl(fd, F_SETSIG, SIGRTMIN + 1);
-                fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT);
-                /* we will now be notified if any of the files
-                   in "." is modified or new files are created */
-                while (1) {
-                        pause();
-                        printf("Got event on fd=%d\n", event_fd);
-                }
-        }
diff --git a/Documentation/filesystems/dnotify_test.c b/Documentation/filesystems/dnotify_test.c
new file mode 100644
index 000000000000..8b37b4a1e18d
--- /dev/null
+++ b/Documentation/filesystems/dnotify_test.c
@@ -0,0 +1,34 @@
+#define _GNU_SOURCE     /* needed to get the defines */
+#include <fcntl.h>      /* in glibc 2.2 this has the needed
+                                   values defined */
+#include <signal.h>
+#include <stdio.h>
+#include <unistd.h>
+static volatile int event_fd;
+static void handler(int sig, siginfo_t *si, void *data)
+{
+        event_fd = si->si_fd;
+}
+int main(void)
+{
+        struct sigaction act;
+        int fd;
+        act.sa_sigaction = handler;
+        sigemptyset(&act.sa_mask);
+        act.sa_flags = SA_SIGINFO;
+        sigaction(SIGRTMIN + 1, &act, NULL);
+        fd = open(".", O_RDONLY);
+        fcntl(fd, F_SETSIG, SIGRTMIN + 1);
+        fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT);
+        /* we will now be notified if any of the files
+           in "." is modified or new files are created */
+        while (1) {
+                pause();
+                printf("Got event on fd=%d\n", event_fd);
+        }
+}
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 96a44dd95e03..a4f30faa4f1f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -195,7 +195,7 @@ asynchronous manner and the vaule may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
-Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
+Table 1-2: Contents of the status files (as of 2.6.30-rc7)
 ..............................................................................
 Field                       Content
 Name                        filename of the executable
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 3015da0c6b2a..fe09a2cb1858 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -82,11 +82,13 @@ tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
 adjusted on the fly via 'mount -o remount ...'
-mpol=default             prefers to allocate memory from the local node
+mpol=default             use the process allocation policy
+                         (see set_mempolicy(2))
 mpol=prefer:Node         prefers to allocate memory from the given Node
 mpol=bind:NodeList       allocates memory only from nodes in NodeList
 mpol=interleave          prefers to allocate from each node in turn
 mpol=interleave:NodeList allocates from each node of NodeList in turn
+mpol=local               prefers to allocate memory from the local node
 NodeList format is a comma-separated list of decimal numbers and ranges,
 a range being two hyphen-separated decimal numbers, the smallest and
@@ -134,3 +136,5 @@ Author:
   Christoph Rohland <cr@sap.com>, 1.12.01
 Updated:
   Hugh Dickins, 4 June 2007
+Updated:
+   KOSAKI Motohiro, 16 Mar 2010
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru
index 87ffa0f5ec70..5eb3b9d5f0d5 100644
--- a/Documentation/hwmon/abituguru
+++ b/Documentation/hwmon/abituguru
@@ -30,7 +30,7 @@ Supported chips:
           bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1
           You may also need to specify the fan_sensors option for these boards
           fan_sensors=5
-        2) There is a seperate abituguru3 driver for these motherboards,
+        2) There is a separate abituguru3 driver for these motherboards,
           the abituguru (without the 3 !) driver will not work on these
           motherboards (and visa versa)!
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 3219ee0dbfef..5ebf5af1d716 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -74,6 +74,11 @@ structure at all.  You should use this to keep device-specific data.
        /* retrieve the value */
        void *i2c_get_clientdata(const struct i2c_client *client);
+Note that starting with kernel 2.6.34, you don't have to set the `data' field
+to NULL in remove() or if probe() failed anymore. The i2c-core does this
+automatically on these occasions. Those are also the only times the core will
+touch this field.
 Accessing the client
 ====================
diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt
index 3a6aec40c0b0..8b4129de1d2d 100644
--- a/Documentation/input/rotary-encoder.txt
+++ b/Documentation/input/rotary-encoder.txt
@@ -75,7 +75,7 @@ and the number of steps or will clamp at the maximum and zero depending on
 the configuration.
 Because GPIO to IRQ mapping is platform specific, this information must
-be given in seperately to the driver. See the example below.
+be given in separately to the driver. See the example below.
 ---------<snip>---------
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 35c9b51d20ea..dd5806f4fcc4 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -291,6 +291,7 @@ Code  Seq#(hex)	Include File		Comments
 0x92    00-0F   drivers/usb/mon/mon_bin.c
 0x93    60-7F   linux/auto_fs.h
 0x94    all     fs/btrfs/ioctl.h
+0x97    00-7F   fs/ceph/ioctl.h         Ceph file system
 0x99    00-0F                           537-Addinboard driver
                                        <mailto:buk@buks.ipn.de>
 0xA0    all     linux/sdp/sdp.h         Industrial Device Project
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 3bc48b0bd3a9..839b21b0699a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -200,10 +200,6 @@ and is between 256 and 4096 characters. It is defined in the file
                        acpi_display_output=video
                        See above.
-        acpi_early_pdc_eval     [HW,ACPI] Evaluate processor _PDC methods
-                                early. Needed on some platforms to properly
-                                initialize the EC.
        acpi_irq_balance [HW,ACPI]
                        ACPI will balance active IRQs
                        default in APIC mode
@@ -324,11 +320,6 @@ and is between 256 and 4096 characters. It is defined in the file
        amd_iommu=      [HW,X86-84]
                        Pass parameters to the AMD IOMMU driver in the system.
                        Possible values are:
-                        isolate - enable device isolation (each device, as far
-                                  as possible, will get its own protection
-                                  domain) [default]
-                        share - put every device behind one IOMMU into the
-                                same protection domain
                        fullflush - enable flushing of IO/TLB entries when
                                    they are unmapped. Otherwise they are
                                    flushed before they will be reused, which
@@ -1203,7 +1194,7 @@ and is between 256 and 4096 characters. It is defined in the file
        libata.force=   [LIBATA] Force configurations.  The format is comma
                        separated list of "[ID:]VAL" where ID is
-                        PORT[:DEVICE].  PORT and DEVICE are decimal numbers
+                        PORT[.DEVICE].  PORT and DEVICE are decimal numbers
                        matching port, link or device.  Basically, it matches
                        the ATA ID string printed on console by libata.  If
                        the whole ID part is omitted, the last PORT and DEVICE
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index bdb13817e1e9..3ab2472509cb 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -59,37 +59,56 @@ nice to have in other objects.  The C language does not allow for the
 direct expression of inheritance, so other techniques - such as structure
 embedding - must be used.
-So, for example, the UIO code has a structure that defines the memory
+(As an aside, for those familiar with the kernel linked list implementation,
-region associated with a uio device:
+this is analogous to how "list_head" structs are rarely useful on
+their own, but are invariably found embedded in the larger objects of
+interest.)
-struct uio_mem {
+So, for example, the UIO code in drivers/uio/uio.c has a structure that
+defines the memory region associated with a uio device:
+    struct uio_map {
        struct kobject kobj;
-        unsigned long addr;
+        struct uio_mem *mem;
-        unsigned long size;
+    };
-        int memtype;
-        void __iomem *internal_addr;
-};
-If you have a struct uio_mem structure, finding its embedded kobject is
+If you have a struct uio_map structure, finding its embedded kobject is
 just a matter of using the kobj member.  Code that works with kobjects will
 often have the opposite problem, however: given a struct kobject pointer,
 what is the pointer to the containing structure?  You must avoid tricks
 (such as assuming that the kobject is at the beginning of the structure)
 and, instead, use the container_of() macro, found in <linux/kernel.h>:
-        container_of(pointer, type, member)
+    container_of(pointer, type, member)
+where:
+  * "pointer" is the pointer to the embedded kobject,
+  * "type" is the type of the containing structure, and
+  * "member" is the name of the structure field to which "pointer" points.
+The return value from container_of() is a pointer to the corresponding
+container type. So, for example, a pointer "kp" to a struct kobject
+embedded *within* a struct uio_map could be converted to a pointer to the
+*containing* uio_map structure with:
+    struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
+For convenience, programmers often define a simple macro for "back-casting"
+kobject pointers to the containing type.  Exactly this happens in the
+earlier drivers/uio/uio.c, as you can see here:
+    struct uio_map {
+        struct kobject kobj;
+        struct uio_mem *mem;
+    };
-where pointer is the pointer to the embedded kobject, type is the type of
+    #define to_map(map) container_of(map, struct uio_map, kobj)
-the containing structure, and member is the name of the structure field to
-which pointer points.  The return value from container_of() is a pointer to
-the given type. So, for example, a pointer "kp" to a struct kobject
-embedded within a struct uio_mem could be converted to a pointer to the
-containing uio_mem structure with:
-    struct uio_mem *u_mem = container_of(kp, struct uio_mem, kobj);
+where the macro argument "map" is a pointer to the struct kobject in
+question.  That macro is subsequently invoked with:
-Programmers often define a simple macro for "back-casting" kobject pointers
+    struct uio_map *map = to_map(kobj);
-to the containing type.
 Initialization of kobjects
@@ -387,4 +406,5 @@ called, and the objects in the former circle release each other.
 Example code to copy from
 For a more complete example of using ksets and kobjects properly, see the
-sample/kobject/kset-example.c code.
+example programs samples/kobject/{kobject-example.c,kset-example.c},
+which will be built as loadable modules if you select CONFIG_SAMPLE_KOBJECT.
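Note on the kobject.txt change above: the container_of() back-cast it documents can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code. The container_of() definition here is a simplified stand-in built on offsetof() (the kernel macro adds type checking), struct kobject and struct uio_mem are reduced placeholders, and the "id" member and map_size_from_kobj() are hypothetical, added only so that kobj is not the first member of the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced placeholders for the kernel types (assumptions for this sketch). */
struct kobject { const char *name; };
struct uio_mem { unsigned long addr; unsigned long size; };

struct uio_map {
        int id;         /* hypothetical: keeps kobj away from offset 0 */
        struct kobject kobj;
        struct uio_mem *mem;
};

/* Same back-casting macro as quoted from drivers/uio/uio.c in the patch. */
#define to_map(map) container_of(map, struct uio_map, kobj)

/* Given only the embedded kobject, recover the map and return its size. */
static unsigned long map_size_from_kobj(struct kobject *kobj)
{
        struct uio_map *map;

        assert(kobj != NULL);
        map = to_map(kobj);
        printf("%s: id=%d size=%lu\n", kobj->name, map->id, map->mem->size);
        return map->mem->size;
}
```

For a struct uio_map m, calling map_size_from_kobj(&m.kobj) reaches m.mem->size even though kobj sits at a non-zero offset, which is why the text warns against assuming that the kobject is at the beginning of the structure.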
diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX
index ee5692b26dd4..fa688538e757 100644
--- a/Documentation/laptops/00-INDEX
+++ b/Documentation/laptops/00-INDEX
@@ -2,6 +2,12 @@
        - This file
 acer-wmi.txt
        - information on the Acer Laptop WMI Extras driver.
+asus-laptop.txt
+        - information on the Asus Laptop Extras driver.
+disk-shock-protection.txt
+        - information on hard disk shock protection.
+dslm.c
+        - Simple Disk Sleep Monitor program
 laptop-mode.txt
        - how to conserve battery power using laptop-mode.
 sony-laptop.txt
diff --git a/Documentation/laptops/Makefile b/Documentation/laptops/Makefile
new file mode 100644
index 000000000000..5cb144af3c09
--- /dev/null
+++ b/Documentation/laptops/Makefile
@@ -0,0 +1,8 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+# List of programs to build
+hostprogs-y := dslm
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
diff --git a/Documentation/laptops/dslm.c b/Documentation/laptops/dslm.c
new file mode 100644
index 000000000000..72ff290c5fc6
--- /dev/null
+++ b/Documentation/laptops/dslm.c
@@ -0,0 +1,166 @@
+/*
+ * dslm.c
+ * Simple Disk Sleep Monitor
+ *  by Bartek Kania
+ * Licenced under the GPL
+ */
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <time.h>
+#include <string.h>
+#include <signal.h>
+#include <sys/ioctl.h>
+#include <linux/hdreg.h>
+#ifdef DEBUG
+#define D(x) x
+#else
+#define D(x)
+#endif
+int endit = 0;
+/* Check if the disk is in powersave-mode
+ * Most of the code is stolen from hdparm.
+ * 1 = active, 0 = standby/sleep, -1 = unknown */
+static int check_powermode(int fd)
+{
+    unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0};
+    int state;
+    if (ioctl(fd, HDIO_DRIVE_CMD, &args)
+        && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */
+        && ioctl(fd, HDIO_DRIVE_CMD, &args)) {
+        if (errno != EIO || args[0] != 0 || args[1] != 0) {
+            state = -1; /* "unknown"; */
+        } else
+            state = 0; /* "sleeping"; */
+    } else {
+        state = (args[2] == 255) ? 1 : 0;
+    }
+    D(printf(" drive state is:  %d\n", state));
+    return state;
+}
+static char *state_name(int i)
+{
+    if (i == -1) return "unknown";
+    if (i == 0) return "sleeping";
+    if (i == 1) return "active";
+    return "internal error";
+}
+static char *myctime(time_t time)
+{
+    char *ts = ctime(&time);
+    ts[strlen(ts) - 1] = 0;
+    return ts;
+}
+static void measure(int fd)
+{
+    time_t start_time;
+    int last_state;
+    time_t last_time;
+    int curr_state;
+    time_t curr_time = 0;
+    time_t time_diff;
+    time_t active_time = 0;
+    time_t sleep_time = 0;
+    time_t unknown_time = 0;
+    time_t total_time = 0;
+    int changes = 0;
+    float tmp;
+    printf("Starting measurements\n");
+    last_state = check_powermode(fd);
+    start_time = last_time = time(0);
+    printf("  System is in state %s\n\n", state_name(last_state));
+    while(!endit) {
+        sleep(1);
+        curr_state = check_powermode(fd);
+        if (curr_state != last_state || endit) {
+            changes++;
+            curr_time = time(0);
+            time_diff = curr_time - last_time;
+            if (last_state == 1) active_time += time_diff;
+            else if (last_state == 0) sleep_time += time_diff;
+            else unknown_time += time_diff;
+            last_state = curr_state;
+            last_time = curr_time;
+            printf("%s: State-change to %s\n", myctime(curr_time),
+                   state_name(curr_state));
+        }
+    }
+    changes--; /* Compensate for SIGINT */
+    total_time = time(0) - start_time;
+    printf("\nTotal running time:  %lus\n", curr_time - start_time);
+    printf(" State changed %d times\n", changes);
+    tmp = (float)sleep_time / (float)total_time * 100;
+    printf(" Time in sleep state:   %lus (%.2f%%)\n", sleep_time, tmp);
+    tmp = (float)active_time / (float)total_time * 100;
+    printf(" Time in active state:  %lus (%.2f%%)\n", active_time, tmp);
+    tmp = (float)unknown_time / (float)total_time * 100;
+    printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp);
+}
+static void ender(int s)
+{
+    endit = 1;
+}
+static void usage(void)
+{
+    puts("usage: dslm [-w <time>] <disk>");
+    exit(0);
+}
+int main(int argc, char **argv)
+{
+    int fd;
+    char *disk = 0;
+    int settle_time = 60;
+    /* Parse the simple command-line */
+    if (argc == 2)
+        disk = argv[1];
+    else if (argc == 4) {
+        settle_time = atoi(argv[2]);
+        disk = argv[3];
+    } else
+        usage();
+    if ((fd = open(disk, O_RDONLY|O_NONBLOCK)) < 0) {
+        printf("Can't open %s, because: %s\n", disk, strerror(errno));
+        exit(-1);
+    }
+    if (settle_time) {
+        printf("Waiting %d seconds for the system to settle down to "
+               "'normal'\n", settle_time);
+        sleep(settle_time);
+    } else
+        puts("Not waiting for system to settle down");
+    signal(SIGINT, ender);
+    measure(fd);
+    close(fd);
+    return 0;
+}
diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt
index eeedee11c8c2..2c3c35093023 100644
--- a/Documentation/laptops/laptop-mode.txt
+++ b/Documentation/laptops/laptop-mode.txt
@@ -779,172 +779,4 @@ Monitoring tool
 ---------------
 Bartek Kania submitted this, it can be used to measure how much time your disk
-spends spun up/down.
+spends spun up/down.  See Documentation/laptops/dslm.c
---------------------------dslm.c BEGIN-----------------------------------------
-/*
- * Simple Disk Sleep Monitor
- *  by Bartek Kania
- * Licenced under the GPL
- */
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <time.h>
-#include <string.h>
-#include <signal.h>
-#include <sys/ioctl.h>
-#include <linux/hdreg.h>
-#ifdef DEBUG
-#define D(x) x
-#else
-#define D(x)
-#endif
-int endit = 0;
-/* Check if the disk is in powersave-mode
- * Most of the code is stolen from hdparm.
- * 1 = active, 0 = standby/sleep, -1 = unknown */
-int check_powermode(int fd)
-{
-    unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0};
-    int state;
-    if (ioctl(fd, HDIO_DRIVE_CMD, &args)
-        && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */
-        && ioctl(fd, HDIO_DRIVE_CMD, &args)) {
-        if (errno != EIO || args[0] != 0 || args[1] != 0) {
-            state = -1; /* "unknown"; */
-        } else
-            state = 0; /* "sleeping"; */
-    } else {
-        state = (args[2] == 255) ? 1 : 0;
-    }
-    D(printf(" drive state is:  %d\n", state));
-    return state;
-}
-char *state_name(int i)
-{
-    if (i == -1) return "unknown";
-    if (i == 0) return "sleeping";
-    if (i == 1) return "active";
-    return "internal error";
-}
-char *myctime(time_t time)
-{
-    char *ts = ctime(&time);
-    ts[strlen(ts) - 1] = 0;
-    return ts;
-}
-void measure(int fd)
-{
-    time_t start_time;
-    int last_state;
-    time_t last_time;
-    int curr_state;
-    time_t curr_time = 0;
-    time_t time_diff;
-    time_t active_time = 0;
-    time_t sleep_time = 0;
-    time_t unknown_time = 0;
-    time_t total_time = 0;
-    int changes = 0;
-    float tmp;
-    printf("Starting measurements\n");
-    last_state = check_powermode(fd);
-    start_time = last_time = time(0);
-    printf("  System is in state %s\n\n", state_name(last_state));
-    while(!endit) {
-        sleep(1);
-        curr_state = check_powermode(fd);
-        if (curr_state != last_state || endit) {
-            changes++;
-            curr_time = time(0);
-            time_diff = curr_time - last_time;
-            if (last_state == 1) active_time += time_diff;
-            else if (last_state == 0) sleep_time += time_diff;
-            else unknown_time += time_diff;
-            last_state = curr_state;
-            last_time = curr_time;
-            printf("%s: State-change to %s\n", myctime(curr_time),
-                   state_name(curr_state));
-        }
-    }
-    changes--; /* Compensate for SIGINT */
-    total_time = time(0) - start_time;
-    printf("\nTotal running time:  %lus\n", curr_time - start_time);
-    printf(" State changed %d times\n", changes);
-    tmp = (float)sleep_time / (float)total_time * 100;
-    printf(" Time in sleep state:   %lus (%.2f%%)\n", sleep_time, tmp);
-    tmp = (float)active_time / (float)total_time * 100;
-    printf(" Time in active state:  %lus (%.2f%%)\n", active_time, tmp);
-    tmp = (float)unknown_time / (float)total_time * 100;
-    printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp);
-}
-void ender(int s)
-{
-    endit = 1;
-}
-void usage()
-{
-    puts("usage: dslm [-w <time>] <disk>");
-    exit(0);
-}
-int main(int argc, char **argv)
-{
-    int fd;
-    char *disk = 0;
-    int settle_time = 60;
-    /* Parse the simple command-line */
-    if (argc == 2)
-        disk = argv[1];
-    else if (argc == 4) {
-        settle_time = atoi(argv[2]);
-        disk = argv[3];
-    } else
-        usage();
-    if (!(fd = open(disk, O_RDONLY|O_NONBLOCK))) {
-        printf("Can't open %s, because: %s\n", disk, strerror(errno));
-        exit(-1);
-    }
-    if (settle_time) {
-        printf("Waiting %d seconds for the system to settle down to "
-               "'normal'\n", settle_time);
-        sleep(settle_time);
-    } else
-        puts("Not waiting for system to settle down");
-    signal(SIGINT, ender);
-    measure(fd);
-    close(fd);
-    return 0;
-}
---------------------------dslm.c END-------------------------------------------
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 7f5809eddee6..631ad2f1b229 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -3,6 +3,7 @@
                         ============================
 By: David Howells <dhowells@redhat.com>
+    Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 Contents:
@@ -60,6 +61,10 @@ Contents:
     - And then there's the Alpha.
+ (*) Example uses.
+     - Circular buffers.
 (*) References.
@@ -2226,6 +2231,21 @@ The Alpha defines the Linux kernel's memory barrier model.
 See the subsection on "Cache Coherency" above.
+============
+EXAMPLE USES
+============
+CIRCULAR BUFFERS
+----------------
+Memory barriers can be used to implement circular buffering without the need
+of a lock to serialise the producer with the consumer.  See:
+        Documentation/circular-buffers.txt
+for details.
 ==========
 REFERENCES
 ==========
diff --git a/Documentation/networking/Makefile b/Documentation/networking/Makefile
index 6d8af1ac56c4..5aba7a33aeeb 100644
--- a/Documentation/networking/Makefile
+++ b/Documentation/networking/Makefile
@@ -6,3 +6,5 @@ hostprogs-y := ifenslave
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
+obj-m := timestamping/
diff --git a/Documentation/networking/skfp.txt b/Documentation/networking/skfp.txt
index abfddf81e34a..203ec66c9fb4 100644
--- a/Documentation/networking/skfp.txt
+++ b/Documentation/networking/skfp.txt
@@ -68,7 +68,7 @@ Compaq adapters (not tested):
 =======================
 From v2.01 on, the driver is integrated in the linux kernel sources.
-Therefor, the installation is the same as for any other adapter
+Therefore, the installation is the same as for any other adapter
 supported by the kernel.
 Refer to the manual of your distribution about the installation
 of network adapters.
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
new file mode 100644
index 000000000000..7ee770b5ef5f
--- /dev/null
+++ b/Documentation/networking/stmmac.txt
@@ -0,0 +1,143 @@
+       STMicroelectronics 10/100/1000 Synopsys Ethernet driver
+Copyright (C) 2007-2010  STMicroelectronics Ltd
+Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
+This is the driver for the MAC 10/100/1000 on-chip Ethernet controllers
+(Synopsys IP blocks); it has been fully tested on STLinux platforms.
+Currently this network device driver is for all STM embedded MAC/GMAC
+(7xxx SoCs).
+DWC Ether MAC 10/100/1000 Universal version 3.41a and DWC Ether MAC 10/100
+Universal version 4.0 have been used for developing the first code
+implementation.
+For more information, please also visit: www.stlinux.com
+1) Kernel Configuration
+The kernel configuration option is STMMAC_ETH:
+ Device Drivers ---> Network device support ---> Ethernet (1000 Mbit) --->
+ STMicroelectronics 10/100/1000 Ethernet driver (STMMAC_ETH)
+2) Driver parameters list:
+        debug: message level (0: no output, 16: all);
+        phyaddr: to manually provide the physical address to the PHY device;
+        dma_rxsize: DMA rx ring size;
+        dma_txsize: DMA tx ring size;
+        buf_sz: DMA buffer size;
+        tc: control the HW FIFO threshold;
+        tx_coe: Enable/Disable Tx Checksum Offload engine;
+        watchdog: transmit timeout (in milliseconds);
+        flow_ctrl: Flow control ability [on/off];
+        pause: Flow Control Pause Time;
+        tmrate: timer period (only if timer optimisation is configured).
+3) Command line options
+Driver parameters can also be passed on the command line by using:
+        stmmaceth=dma_rxsize:128,dma_txsize:512
+4) Driver information and notes
+4.1) Transmit process
+The xmit method is invoked when the kernel needs to transmit a packet; it sets
+the descriptors in the ring and informs the DMA engine that there is a packet
+ready to be transmitted.
+Once the controller has finished transmitting the packet, an interrupt is
+triggered, so the driver is able to release the socket buffers.
+By default, the driver sets the NETIF_F_SG bit in the features field of the
+net_device structure enabling the scatter/gather feature.
+4.2) Receive process
+When one or more packets are received, an interrupt happens. The interrupts
+are not queued so the driver has to scan all the descriptors in the ring during
+the receive process.
+This is based on NAPI so the interrupt handler signals only if there is work to be
+done, and it exits.
+Then the poll method will be scheduled at some future point.
+The incoming packets are stored, by the DMA, in a list of pre-allocated socket
+buffers in order to avoid the memcpy (Zero-copy).
+4.3) Timer-Driven Interrupt
+Instead of having the device asynchronously notify frame receptions, the
+driver configures a timer to generate an interrupt at regular intervals.
+Depending on the granularity of the timer, the frames received by the device
+will experience different levels of latency. Some NICs have a dedicated timer
+device to perform this task. STMMAC can use either the RTC device or the TMU
+channel 2 on STLinux platforms.
+The timer frequency can be passed to the driver as a parameter; when changing
+it, consider both hardware capability and network stability/performance impact.
+Several performance tests on STM platforms showed that this optimisation spares
+the CPU while sustaining the maximum throughput.
+4.4) WOL
+The Wake-on-LAN feature through Magic Frame is supported only for the GMAC
+core.
+4.5) DMA descriptors
+The driver handles both normal and enhanced descriptors. The latter have been
+tested only on DWC Ether MAC 10/100/1000 Universal version 3.41a.
+4.6) Ethtool support
+Ethtool is supported. Driver statistics and internal errors can be retrieved
+with the "ethtool -S ethX" command. It is also possible to dump registers etc.
+4.7) Jumbo and Segmentation Offloading
+Jumbo frames are supported and tested for the GMAC.
+GSO has also been added, but it is performed in software.
+LRO is not supported.
+4.8) Physical
+The driver is compatible with PAL to work with PHY and GPHY devices.
+4.9) Platform information
+Several pieces of information come from the platform; please refer to the
+driver's header file in the include/linux directory.
+struct plat_stmmacenet_data {
+        int bus_id;
+        int pbl;
+        int has_gmac;
+        void (*fix_mac_speed)(void *priv, unsigned int speed);
+        void (*bus_setup)(unsigned long ioaddr);
+#ifdef CONFIG_STM_DRIVERS
+        struct stm_pad_config *pad_config;
+#endif
+        void *bsp_priv;
+};
+Where:
+- pbl (Programmable Burst Length) is the maximum number of
+  beats to be transferred in one DMA transaction.
+  GMAC also enables the 4xPBL by default.
+- fix_mac_speed and bus_setup are used to configure internal target
+  registers (on STM platforms);
+- has_gmac: GMAC core is on board (get it at run-time in the next step);
+- bus_id: bus identifier.
+struct plat_stmmacphy_data {
+        int bus_id;
+        int phy_addr;
+        unsigned int phy_mask;
+        int interface;
+        int (*phy_reset)(void *priv);
+        void *priv;
+};
+Where:
+- bus_id: bus identifier;
+- phy_addr: physical address used for the attached phy device;
+            set it to -1 to get it at run-time;
+- interface: physical MII interface mode;
+- phy_reset: hook to reset HW function.
+TODO:
+- Continue to make the driver more generic and suitable for other Synopsys
+  Ethernet controllers used on other architectures (e.g. ARM).
+- 10G controllers are not supported.
+- MAC uses Normal descriptors and GMAC uses enhanced ones.
+  This is a limit that should be reviewed. MAC could want to
+  use the enhanced structure.
+- Checksumming: Rx/Tx csum is done in HW in case of GMAC only.
+- Review the timer optimisation code to use an embedded device that seems to be
+  available in new chip generations.
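Note on section 3 of stmmac.txt above: the "stmmaceth=" option carries a comma-separated list of key:value pairs. The following userspace sketch shows one way such a string could be split; it is not the driver's parser. struct ring_opts, key_is() and parse_ring_opts() are hypothetical names, and only the two keys that appear in the documented example are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two values used in the documented example. */
struct ring_opts {
        int dma_rxsize;
        int dma_txsize;
};

/* True if the first klen characters at str spell exactly the given key. */
static int key_is(const char *str, size_t klen, const char *key)
{
        return klen == strlen(key) && strncmp(str, key, klen) == 0;
}

/*
 * Parse "key:value[,key:value...]".
 * Returns the number of recognised pairs, or -1 on a malformed entry
 * or an unknown key.
 */
static int parse_ring_opts(const char *str, struct ring_opts *opts)
{
        int seen = 0;

        assert(str != NULL && opts != NULL);
        while (*str) {
                const char *colon = strchr(str, ':');
                const char *end = strchr(str, ',');
                size_t klen;
                long val;

                if (!end)
                        end = str + strlen(str);
                if (!colon || colon > end)
                        return -1;      /* entry without "key:value" */
                klen = (size_t)(colon - str);
                val = strtol(colon + 1, NULL, 10);      /* stops at ',' */
                if (key_is(str, klen, "dma_rxsize"))
                        opts->dma_rxsize = (int)val;
                else if (key_is(str, klen, "dma_txsize"))
                        opts->dma_txsize = (int)val;
                else
                        return -1;      /* unknown key */
                seen++;
                str = *end ? end + 1 : end;     /* skip the comma, if any */
        }
        return seen;
}
```

With the string from the documentation, parse_ring_opts("dma_rxsize:128,dma_txsize:512", &opts) fills both fields and returns 2. The remaining parameters from section 2 could be added as further key_is() branches.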
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 0e58b4539176..e8c8f4f06c67 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -41,11 +41,12 @@ SOF_TIMESTAMPING_SOFTWARE:     return system time stamp generated in
 SOF_TIMESTAMPING_TX/RX determine how time stamps are generated.
 SOF_TIMESTAMPING_RAW/SYS determine how they are reported in the
 following control message:
-    struct scm_timestamping {
-           struct timespec systime;
+struct scm_timestamping {
-           struct timespec hwtimetrans;
+        struct timespec systime;
-           struct timespec hwtimeraw;
+        struct timespec hwtimetrans;
-    };
+        struct timespec hwtimeraw;
+};
 recvmsg() can be used to get this control message for regular incoming
 packets. For send time stamps the outgoing packet is looped back to
@@ -87,12 +88,13 @@ by the network device and will be empty without that support.
 SIOCSHWTSTAMP:
 Hardware time stamping must also be initialized for each device driver
-that is expected to do hardware time stamping. The parameter is:
+that is expected to do hardware time stamping. The parameter is defined in
+include/linux/net_tstamp.h as:
 struct hwtstamp_config {
-    int flags;           /* no flags defined right now, must be zero */
+        int flags;      /* no flags defined right now, must be zero */
-    int tx_type;         /* HWTSTAMP_TX_* */
+        int tx_type;    /* HWTSTAMP_TX_* */
-    int rx_filter;       /* HWTSTAMP_FILTER_* */
+        int rx_filter;  /* HWTSTAMP_FILTER_* */
 };
 Desired behavior is passed into the kernel and to a specific device by
@@ -139,42 +141,56 @@ enum {
        /* time stamp any incoming packet */
        HWTSTAMP_FILTER_ALL,
-        /* return value: time stamp all packets requested plus some others */
+        /* return value: time stamp all packets requested plus some others */
-        HWTSTAMP_FILTER_SOME,
+        HWTSTAMP_FILTER_SOME,
        /* PTP v1, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
-        ...
+        /* for the complete list of values, please check
+         * the include file include/linux/net_tstamp.h
+         */
 };
 DEVICE IMPLEMENTATION
 A driver which supports hardware time stamping must support the
-SIOCSHWTSTAMP ioctl. Time stamps for received packets must be stored
+SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
-in the skb with skb_hwtstamp_set().
+the actual values as described in the section on SIOCSHWTSTAMP.
+Time stamps for received packets must be stored in the skb. To get a pointer
+to the shared time stamp structure of the skb call skb_hwtstamps(). Then
+set the time stamps in the structure:
+struct skb_shared_hwtstamps {
+        /* hardware time stamp transformed into duration
+         * since arbitrary point in time
+         */
+        ktime_t hwtstamp;
+        ktime_t syststamp; /* hwtstamp transformed to system time base */
+};
 Time stamps for outgoing packets are to be generated as follows:
- In hard_start_xmit(), check if skb_hwtstamp_check_tx_hardware()
+- In hard_start_xmit(), check if skb_tx(skb)->hardware is set non-zero.
-  returns non-zero. If yes, then the driver is expected
+  If yes, then the driver is expected to do hardware time stamping.
-  to do hardware time stamping.
 - If this is possible for the skb and requested, then declare
-  that the driver is doing the time stamping by calling
+  that the driver is doing the time stamping by setting the field
-  skb_hwtstamp_tx_in_progress(). A driver not supporting
+  skb_tx(skb)->in_progress non-zero. You might want to keep a pointer
-  hardware time stamping doesn't do that. A driver must never
+  to the associated skb for the next step and not free the skb. A driver
-  touch sk_buff::tstamp! It is used to store how time stamping
+  not supporting hardware time stamping doesn't do that. A driver must
-  for an outgoing packets is to be done.
+  never touch sk_buff::tstamp! It is used to store software generated
+  time stamps by the network subsystem.
 - As soon as the driver has sent the packet and/or obtained a
  hardware time stamp for it, it passes the time stamp back by
  calling skb_hwtstamp_tx() with the original skb, the raw
-  hardware time stamp and a handle to the device (necessary
+  hardware time stamp. skb_hwtstamp_tx() clones the original skb and
-  to convert the hardware time stamp to system time). If obtaining
+  adds the timestamps, therefore the original skb has to be freed now.
-  the hardware time stamp somehow fails, then the driver should
+  If obtaining the hardware time stamp somehow fails, then the driver
-  not fall back to software time stamping. The rationale is that
+  should not fall back to software time stamping. The rationale is that
-  this would occur at a later time in the processing pipeline
+  this would occur at a later time in the processing pipeline than other
-  than other software time stamping and therefore could lead
+  software time stamping and therefore could lead to unexpected deltas
-  to unexpected deltas between time stamps.
+  between time stamps.
- If the driver did not call skb_hwtstamp_tx_in_progress(), then
+- If the driver did not set skb_tx(skb)->in_progress, then
  dev_hard_start_xmit() checks whether software time stamping
  is wanted as fallback and potentially generates the time stamp.
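The SIOCSHWTSTAMP request that a driver has to handle can be sketched from the
user-space side as follows. This is only a minimal sketch, assuming a Linux
system with the exported kernel headers installed; the helper names
fill_hwtstamp_request() and request_hw_timestamping(), and the choice of
HWTSTAMP_TX_ON and HWTSTAMP_FILTER_ALL, are illustrative assumptions and not
part of this patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for struct ifreq under strict -std modes */
#endif
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>

/* Hypothetical helper: prepare an ifreq that carries a hwtstamp_config. */
void fill_hwtstamp_request(struct ifreq *ifr, struct hwtstamp_config *cfg,
                           const char *ifname)
{
        memset(cfg, 0, sizeof(*cfg));
        cfg->flags = 0;                         /* no flags defined, must be zero */
        cfg->tx_type = HWTSTAMP_TX_ON;          /* request TX hardware time stamps */
        cfg->rx_filter = HWTSTAMP_FILTER_ALL;   /* request stamps for all RX packets */

        memset(ifr, 0, sizeof(*ifr));
        strncpy(ifr->ifr_name, ifname, sizeof(ifr->ifr_name) - 1);
        ifr->ifr_data = (void *)cfg;
}

/*
 * Hypothetical helper: returns 0 on success, -1 on failure (errno is set
 * by ioctl()).  On success the driver is expected to have updated *cfg
 * with the values actually in use.
 */
int request_hw_timestamping(int sock, const char *ifname,
                            struct hwtstamp_config *cfg)
{
        struct ifreq ifr;

        fill_hwtstamp_request(&ifr, cfg, ifname);
        if (ioctl(sock, SIOCSHWTSTAMP, &ifr) < 0)
                return -1;
        return 0;
}
```

A caller would typically compare cfg->rx_filter after a successful call with
the value it asked for, since the driver may have selected a more permissive
filter such as HWTSTAMP_FILTER_SOME.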
diff --git a/Documentation/networking/timestamping/Makefile b/Documentation/networking/timestamping/Makefile
index 2a1489fdc036..e79973443e9f 100644
--- a/Documentation/networking/timestamping/Makefile
+++ b/Documentation/networking/timestamping/Makefile
@@ -1,6 +1,13 @@
-CPPFLAGS = -I../../../include
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
-timestamping: timestamping.c
+# List of programs to build
+hostprogs-y := timestamping
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+HOSTCFLAGS_timestamping.o += -I$(objtree)/usr/include
 clean:
        rm -f timestamping
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index a7936fe8444a..8ba82bfe6a33 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -41,9 +41,9 @@
 #include <arpa/inet.h>
 #include <net/if.h>
-#include "asm/types.h"
+#include <asm/types.h>
-#include "linux/net_tstamp.h"
+#include <linux/net_tstamp.h>
-#include "linux/errqueue.h"
+#include <linux/errqueue.h>
 #ifndef SO_TIMESTAMPING
 # define SO_TIMESTAMPING         37
@@ -164,7 +164,7 @@ static void printpacket(struct msghdr *msg, int res,
        gettimeofday(&now, 0);
-        printf("%ld.%06ld: received %s data, %d bytes from %s, %d bytes control messages\n",
+        printf("%ld.%06ld: received %s data, %d bytes from %s, %zu bytes control messages\n",
               (long)now.tv_sec, (long)now.tv_usec,
               (recvmsg_flags & MSG_ERRQUEUE) ? "error" : "regular",
               res,
@@ -173,7 +173,7 @@ static void printpacket(struct msghdr *msg, int res,
        for (cmsg = CMSG_FIRSTHDR(msg);
             cmsg;
             cmsg = CMSG_NXTHDR(msg, cmsg)) {
-                printf("   cmsg len %d: ", cmsg->cmsg_len);
+                printf("   cmsg len %zu: ", cmsg->cmsg_len);
                switch (cmsg->cmsg_level) {
                case SOL_SOCKET:
                        printf("SOL_SOCKET ");
@@ -370,7 +370,7 @@ int main(int argc, char **argv)
        }
        sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
-        if (socket < 0)
+        if (sock < 0)
                bail("socket");
        memset(&device, 0, sizeof(device));
diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt
index a327db67782a..763e4659bf18 100644
--- a/Documentation/pnp.txt
+++ b/Documentation/pnp.txt
@@ -57,7 +57,7 @@ PC standard floppy disk controller
 # cat resources
 DISABLED
- Notice the string "DISABLED".  THis means the device is not active.
+- Notice the string "DISABLED".  This means the device is not active.
 3.) check the device's possible configurations (optional)
 # cat options
@@ -139,7 +139,7 @@ Plug and Play but it is planned to be in the near future.
 Requirements for a Linux PnP protocol:
 1.) the protocol must use EISA IDs
-2.) the protocol must inform the PnP Layer of a devices current configuration
+2.) the protocol must inform the PnP Layer of a device's current configuration
 - the ability to set resources is optional but preferred.
 The following are PnP protocol related functions:
@@ -158,7 +158,7 @@ pnp_remove_device
 - automatically will free mem used by the device and related structures
 pnp_add_id
- adds a EISA ID to the list of supported IDs for the specified device
+- adds an EISA ID to the list of supported IDs for the specified device
 For more information consult the source of a protocol such as
 /drivers/pnp/pnpbios/core.c.
@@ -167,7 +167,7 @@ For more information consult the source of a protocol such as
 Linux Plug and Play Drivers
 ---------------------------
-        This section contains information for linux PnP driver developers.
+        This section contains information for Linux PnP driver developers.
 The New Way
 ...........
@@ -235,11 +235,10 @@ static int __init serial8250_pnp_init(void)
 The Old Way
 ...........
-a series of compatibility functions have been created to make it easy to convert 
+A series of compatibility functions have been created to make it easy to convert
 ISAPNP drivers.  They should serve as a temporary solution only.
-they are as follows:
+They are as follows:
 struct pnp_card *pnp_find_card(unsigned short vendor,
                                 unsigned short device,
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index ab00eeddecaf..55b859b3bc72 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -256,7 +256,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
      to suspend the device again in future
  int pm_runtime_resume(struct device *dev);
-    - execute the subsystem-leve resume callback for the device; returns 0 on
+    - execute the subsystem-level resume callback for the device; returns 0 on
      success, 1 if the device's run-time PM status was already 'active' or
      error code on failure, where -EAGAIN means it may be safe to attempt to
      resume the device again in future, but 'power.runtime_error' should be
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
index 6e37be1eeb2d..4f8930263dd9 100644
--- a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
@@ -21,6 +21,15 @@ Required properties:
 - fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for the
  threads.
+Optional properties:
+- fsl,firmware-phandle:
+    Usage: required only if there is no fsl,qe-firmware child node
+    Value type: <phandle>
+    Definition: Points to a firmware node (see "QE Firmware Node" below)
+        that contains the firmware that should be uploaded for this QE.
+        The compatible property for the firmware node should say,
+        "fsl,qe-firmware".
 Recommended properties
 - brg-frequency : the internal clock source frequency for baud-rate
  generators in Hz.
@@ -59,3 +68,48 @@ Example:
                reg = <0 c000>;
        };
     };
+* QE Firmware Node
+This node defines a firmware binary that is embedded in the device tree, for
+the purpose of passing the firmware from bootloader to the kernel, or from
+the hypervisor to the guest.
+The firmware node itself contains the firmware binary contents, a compatible
+property, and any firmware-specific properties.  The node should be placed
+inside a QE node that needs it.  Doing so eliminates the need for an
+fsl,firmware-phandle property.  Other QE nodes that need the same firmware
+should define an fsl,firmware-phandle property that points to the firmware node
+in the first QE node.
+The fsl,firmware property can be specified in the DTS (possibly using incbin)
+or can be inserted by the boot loader at boot time.
+Required properties:
+  - compatible
+      Usage: required
+      Value type: <string>
+      Definition: A standard property.  Specify a string that indicates what
+          kind of firmware it is.  For QE, this should be "fsl,qe-firmware".
+  - fsl,firmware
+      Usage: required
+      Value type: <prop-encoded-array>, encoded as an array of bytes
+      Definition: A standard property.  This property contains the firmware
+          binary "blob".
+Example:
+        qe1@e0080000 {
+                compatible = "fsl,qe";
+                qe_firmware:qe-firmware {
+                        compatible = "fsl,qe-firmware";
+                        fsl,firmware = [0x70 0xcd 0x00 0x00 0x01 0x46 0x45 ...];
+                };
+                ...
+        };
+        qe2@e0090000 {
+                compatible = "fsl,qe";
+                fsl,firmware-phandle = <&qe_firmware>;
+                ...
+        };
diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
index 6f5ceb0f09fc..85f3280d7ef6 100644
--- a/Documentation/s390/kvm.txt
+++ b/Documentation/s390/kvm.txt
@@ -102,7 +102,7 @@ args:		unsigned long
 see also:       include/linux/kvm.h
 This ioctl stores the state of the cpu at the guest real address given as
 argument, unless one of the following values defined in include/linux/kvm.h
-is given as arguement:
+is given as argument:
 KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
 absolute lowcore as defined by the principles of operation
 KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
diff --git a/Documentation/scsi/ChangeLog.lpfc b/Documentation/scsi/ChangeLog.lpfc
index ff19a52fe004..2ffc1148eb95 100644
--- a/Documentation/scsi/ChangeLog.lpfc
+++ b/Documentation/scsi/ChangeLog.lpfc
@@ -989,8 +989,8 @@ Changes from 20040709 to 20040716
        * Remove redundant port_cmp != 2 check in if
          (!port_cmp) { .... if (port_cmp != 2).... }
        * Clock changes: removed struct clk_data and timerList.
-        * Clock changes: seperate nodev_tmo and els_retry_delay into 2
+        * Clock changes: separate nodev_tmo and els_retry_delay into 2
-          seperate timers and convert to 1 argument changed
+          separate timers and convert to 1 argument changed
          LPFC_NODE_FARP_PEND_t to struct lpfc_node_farp_pend convert
          ipfarp_tmo to 1 argument convert target struct tmofunc and
          rtplunfunc to 1 argument * cr_count, cr_delay and
@@ -1514,7 +1514,7 @@ Changes from 20040402 to 20040409
        * Remove unused elxclock declaration in elx_sli.h.
        * Since everywhere IOCB_ENTRY is used, the return value is cast,
          move the cast into the macro.
-        * Split ioctls out into seperate files
+        * Split ioctls out into separate files
 Changes from 20040326 to 20040402
@@ -1534,7 +1534,7 @@ Changes from 20040326 to 20040402
        * Unused variable cleanup
        * Use Linux list macros for DMABUF_t
        * Break up ioctls into 3 sections, dfc, util, hbaapi
-          rearranged code so this could be easily seperated into a
+          rearranged code so this could be easily separated into a
          differnet module later All 3 are currently turned on by
          defines in lpfc_ioctl.c LPFC_DFC_IOCTL, LPFC_UTIL_IOCTL,
          LPFC_HBAAPI_IOCTL
@@ -1551,7 +1551,7 @@ Changes from 20040326 to 20040402
          started by lpfc_online().  lpfc_offline() only stopped
          els_timeout routine.  It now stops all timeout routines
          associated with that hba.
-        * Replace seperate next and prev pointers in struct
+        * Replace separate next and prev pointers in struct
          lpfc_bindlist with list_head type.  In elxHBA_t, replace
          fc_nlpbind_start and _end with fc_nlpbind_list and use
          list_head macros to access it.
diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt
index 5e5349a4fcd2..7c900507279f 100644
--- a/Documentation/serial/tty.txt
+++ b/Documentation/serial/tty.txt
@@ -105,6 +105,10 @@ write_wakeup()	-	May be called at any point between open and close.
                        is permitted to call the driver write method from
                        this function. In such a situation defer it.
+dcd_change()    -       Report changes in the DCD pin status to the tty
+                        line, together with the associated timestamp.
+                        The timestamp can be NULL.
 Driver Access
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt
index f4dd3bf99d12..98d14cb8a85d 100644
--- a/Documentation/sound/alsa/HD-Audio.txt
+++ b/Documentation/sound/alsa/HD-Audio.txt
@@ -119,10 +119,18 @@ the codec slots 0 and 1 no matter what the hardware reports.
 Interrupt Handling
 ~~~~~~~~~~~~~~~~~~
-In rare but some cases, the interrupt isn't properly handled as
+The HD-audio driver has used MSI by default (if available) since the
-default.  You would notice this by the DMA transfer error reported by
+2.6.33 kernel, as MSI works better on some machines and, in general,
-ALSA PCM core, for example.  Using MSI might help in such a case.
+is better for performance.  However, Nvidia controllers showed bad
-Pass `enable_msi=1` option for enabling MSI.
+regressions with MSI (especially in combination with AMD chipsets),
+so MSI is disabled for them.
+There still seem to be other devices that don't work with MSI.  If you
+see a regression in sound quality (stuttering, etc.) or a lock-up
+in a recent kernel, try passing the `enable_msi=0` option to disable
+MSI.  If that helps, you can add the known bad device to the blacklist
+defined in hda_intel.c.  In such a case, please report it and send the
+patch to the upstream developer.
 HD-AUDIO CODEC
diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c
index 10abd3773e49..16feda901469 100644
--- a/Documentation/spi/spidev_test.c
+++ b/Documentation/spi/spidev_test.c
@@ -58,7 +58,7 @@ static void transfer(int fd)
        };
        ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
-        if (ret == 1)
+        if (ret < 1)
                pabort("can't send spi message");
        for (ret = 0; ret < ARRAY_SIZE(tx); ret++) {
diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt
index 5effa5bd993b..e213f45cf9d7 100644
--- a/Documentation/stable_kernel_rules.txt
+++ b/Documentation/stable_kernel_rules.txt
@@ -18,16 +18,15 @@ Rules on what kind of patches are accepted, and which ones are not, into the
 - It cannot contain any "trivial" fixes in it (spelling changes,
   whitespace cleanups, etc).
 - It must follow the Documentation/SubmittingPatches rules.
- - It or an equivalent fix must already exist in Linus' tree.  Quote the
+ - It or an equivalent fix must already exist in Linus' tree (upstream).
-   respective commit ID in Linus' tree in your patch submission to -stable.
 Procedure for submitting patches to the -stable tree:
 - Send the patch, after verifying that it follows the above rules, to
-   stable@kernel.org.
+   stable@kernel.org.  You must note the upstream commit ID in the changelog
- - To have the patch automatically included in the stable tree, add the
+   of your submission.
-   the tag
+ - To have the patch automatically included in the stable tree, add the tag
     Cc: stable@kernel.org
   in the sign-off area. Once the patch is merged it will be applied to
   the stable tree without anything else needing to be done by the author
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index fc5790d36cd9..6c7d18c53f84 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -573,11 +573,14 @@ Because other nodes' memory may be free. This means system total status
 may be not fatal yet.
 If this is set to 2, the kernel panics compulsorily even on the
-above-mentioned.
+above-mentioned. Even if oom happens under a memory cgroup, the whole
+system panics.
 The default value is 0.
 1 and 2 are for failover of clustering. Please select either
 according to your policy of failover.
+panic_on_oom=2+kdump gives you a very strong tool to investigate
+why oom happens. You can get a snapshot.
 =============================================================
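The panic_on_oom values discussed in this hunk can be inspected from user
space. The following is a minimal sketch, assuming the sysctl is exposed at
/proc/sys/vm/panic_on_oom; the helper names describe_panic_on_oom() and
read_panic_on_oom() are hypothetical and the description strings are a loose
paraphrase of vm.txt, not kernel output.

```c
#include <stdio.h>

/* Hypothetical helper: map a panic_on_oom value to a short description. */
const char *describe_panic_on_oom(int value)
{
        switch (value) {
        case 0:
                return "no panic, the oom killer is used (default)";
        case 1:
                return "panic on oom unless limited by mempolicy/cpusets";
        case 2:
                return "always panic on oom, even under a memory cgroup";
        default:
                return "unknown value";
        }
}

/* Hypothetical helper: read the current setting; returns -1 on error. */
int read_panic_on_oom(const char *path)
{
        FILE *f = fopen(path, "r");
        int value;

        if (!f)
                return -1;
        if (fscanf(f, "%d", &value) != 1)
                value = -1;     /* file present but not parsable */
        fclose(f);
        return value;
}
```

For example, describe_panic_on_oom(read_panic_on_oom("/proc/sys/vm/panic_on_oom"))
yields a one-line summary of the running system's policy.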
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX
index 397dc35e1323..a9248da5cdbc 100644
--- a/Documentation/timers/00-INDEX
+++ b/Documentation/timers/00-INDEX
@@ -4,6 +4,8 @@ highres.txt
        - High resolution timers and dynamic ticks design notes
 hpet.txt
        - High Precision Event Timer Driver for Linux
+hpet_example.c
+        - sample hpet timer test program
 hrtimers.txt
        - subsystem for high-resolution kernel timers
 timer_stats.txt
diff --git a/Documentation/timers/Makefile b/Documentation/timers/Makefile
new file mode 100644
index 000000000000..c85625f4ab25
--- /dev/null
+++ b/Documentation/timers/Makefile
@@ -0,0 +1,8 @@
+# kbuild trick to avoid linker error. Can be omitted if a module is built.
+obj- := dummy.o
+# List of programs to build
+hostprogs-y := hpet_example
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
diff --git a/Documentation/timers/hpet.txt b/Documentation/timers/hpet.txt
index 16d25e6b5a00..767392ffd31e 100644
--- a/Documentation/timers/hpet.txt
+++ b/Documentation/timers/hpet.txt
@@ -26,274 +26,5 @@ initialization.  An example of this initialization can be found in
 arch/x86/kernel/hpet.c.
 The driver provides a userspace API which resembles the API found in the
-RTC driver framework.  An example user space program is provided below.
+RTC driver framework.  An example user space program is provided in
+file:Documentation/timers/hpet_example.c
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-#include <fcntl.h>
-#include <string.h>
-#include <memory.h>
-#include <malloc.h>
-#include <time.h>
-#include <ctype.h>
-#include <sys/types.h>
-#include <sys/wait.h>
-#include <signal.h>
-#include <fcntl.h>
-#include <errno.h>
-#include <sys/time.h>
-#include <linux/hpet.h>
-extern void hpet_open_close(int, const char **);
-extern void hpet_info(int, const char **);
-extern void hpet_poll(int, const char **);
-extern void hpet_fasync(int, const char **);
-extern void hpet_read(int, const char **);
-#include <sys/poll.h>
-#include <sys/ioctl.h>
-#include <signal.h>
-struct hpet_command {
-        char            *command;
-        void            (*func)(int argc, const char ** argv);
-} hpet_command[] = {
-        {
-                "open-close",
-                hpet_open_close
-        },
-        {
-                "info",
-                hpet_info
-        },
-        {
-                "poll",
-                hpet_poll
-        },
-        {
-                "fasync",
-                hpet_fasync
-        },
-};
-int
-main(int argc, const char ** argv)
-{
-        int     i;
-        argc--;
-        argv++;
-        if (!argc) {
-                fprintf(stderr, "-hpet: requires command\n");
-                return -1;
-        }
-        for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++)
-                if (!strcmp(argv[0], hpet_command[i].command)) {
-                        argc--;
-                        argv++;
-                        fprintf(stderr, "-hpet: executing %s\n",
-                                hpet_command[i].command);
-                        hpet_command[i].func(argc, argv);
-                        return 0;
-                }
-        fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]);
-        return -1;
-}
-void
-hpet_open_close(int argc, const char **argv)
-{
-        int     fd;
-        if (argc != 1) {
-                fprintf(stderr, "hpet_open_close: device-name\n");
-                return;
-        }
-        fd = open(argv[0], O_RDONLY);
-        if (fd < 0)
-                fprintf(stderr, "hpet_open_close: open failed\n");
-        else
-                close(fd);
-        return;
-}
-void
-hpet_info(int argc, const char **argv)
-{
-}
-void
-hpet_poll(int argc, const char **argv)
-{
-        unsigned long           freq;
-        int                     iterations, i, fd;
-        struct pollfd           pfd;
-        struct hpet_info        info;
-        struct timeval          stv, etv;
-        struct timezone         tz;
-        long                    usec;
-        if (argc != 3) {
-                fprintf(stderr, "hpet_poll: device-name freq iterations\n");
-                return;
-        }
-        freq = atoi(argv[1]);
-        iterations = atoi(argv[2]);
-        fd = open(argv[0], O_RDONLY);
-        if (fd < 0) {
-                fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]);
-                return;
-        }
-        if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
-                fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n");
-                goto out;
-        }
-        if (ioctl(fd, HPET_INFO, &info) < 0) {
-                fprintf(stderr, "hpet_poll: failed to get info\n");
-                goto out;
-        }
-        fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags);
-        if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
-                fprintf(stderr, "hpet_poll: HPET_EPI failed\n");
-                goto out;
-        }
-        if (ioctl(fd, HPET_IE_ON, 0) < 0) {
-                fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n");
-                goto out;
-        }
-        pfd.fd = fd;
-        pfd.events = POLLIN;
-        for (i = 0; i < iterations; i++) {
-                pfd.revents = 0;
-                gettimeofday(&stv, &tz);
-                if (poll(&pfd, 1, -1) < 0)
-                        fprintf(stderr, "hpet_poll: poll failed\n");
-                else {
-                        long    data;
-                        gettimeofday(&etv, &tz);
-                        usec = stv.tv_sec * 1000000 + stv.tv_usec;
-                        usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec;
-                        fprintf(stderr,
-                                "hpet_poll: expired time = 0x%lx\n", usec);
-                        fprintf(stderr, "hpet_poll: revents = 0x%x\n",
-                                pfd.revents);
-                        if (read(fd, &data, sizeof(data)) != sizeof(data)) {
-                                fprintf(stderr, "hpet_poll: read failed\n");
-                        }
-                        else
-                                fprintf(stderr, "hpet_poll: data 0x%lx\n",
-                                        data);
-                }
-        }
-out:
-        close(fd);
-        return;
-}
-static int hpet_sigio_count;
-static void
-hpet_sigio(int val)
-{
-        fprintf(stderr, "hpet_sigio: called\n");
-        hpet_sigio_count++;
-}
-void
-hpet_fasync(int argc, const char **argv)
-{
-        unsigned long           freq;
-        int                     iterations, i, fd, value;
-        sig_t                   oldsig;
-        struct hpet_info        info;
-        hpet_sigio_count = 0;
-        fd = -1;
-        if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) {
-                fprintf(stderr, "hpet_fasync: failed to set signal handler\n");
-                return;
-        }
-        if (argc != 3) {
-                fprintf(stderr, "hpet_fasync: device-name freq iterations\n");
-                goto out;
-        }
-        fd = open(argv[0], O_RDONLY);
-        if (fd < 0) {
-                fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]);
-                return;
-        }
-        if ((fcntl(fd, F_SETOWN, getpid()) == 1) ||
-                ((value = fcntl(fd, F_GETFL)) == 1) ||
-                (fcntl(fd, F_SETFL, value | O_ASYNC) == 1)) {
-                fprintf(stderr, "hpet_fasync: fcntl failed\n");
-                goto out;
-        }
-        freq = atoi(argv[1]);
-        iterations = atoi(argv[2]);
-        if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
-                fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n");
-                goto out;
-        }
-        if (ioctl(fd, HPET_INFO, &info) < 0) {
-                fprintf(stderr, "hpet_fasync: failed to get info\n");
-                goto out;
-        }
-        fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags);
-        if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
-                fprintf(stderr, "hpet_fasync: HPET_EPI failed\n");
-                goto out;
-        }
-        if (ioctl(fd, HPET_IE_ON, 0) < 0) {
-                fprintf(stderr, "hpet_fasync, HPET_IE_ON failed\n");
-                goto out;
-        }
-        for (i = 0; i < iterations; i++) {
-                (void) pause();
-                fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count);
-        }
-out:
-        signal(SIGIO, oldsig);
-        if (fd >= 0)
-                close(fd);
-        return;
-}
diff --git a/Documentation/timers/hpet_example.c b/Documentation/timers/hpet_example.c
new file mode 100644
index 000000000000..f9ce2d9fdfd5
--- /dev/null
+++ b/Documentation/timers/hpet_example.c
@@ -0,0 +1,269 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <string.h>
+#include <memory.h>
+#include <malloc.h>
+#include <time.h>
+#include <ctype.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <signal.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <sys/time.h>
+#include <linux/hpet.h>
+extern void hpet_open_close(int, const char **);
+extern void hpet_info(int, const char **);
+extern void hpet_poll(int, const char **);
+extern void hpet_fasync(int, const char **);
+extern void hpet_read(int, const char **);
+#include <sys/poll.h>
+#include <sys/ioctl.h>
+#include <signal.h>
+struct hpet_command {
+        char            *command;
+        void            (*func)(int argc, const char ** argv);
+} hpet_command[] = {
+        {
+                "open-close",
+                hpet_open_close
+        },
+        {
+                "info",
+                hpet_info
+        },
+        {
+                "poll",
+                hpet_poll
+        },
+        {
+                "fasync",
+                hpet_fasync
+        },
+};
+int
+main(int argc, const char ** argv)
+{
+        int     i;
+        argc--;
+        argv++;
+        if (!argc) {
+                fprintf(stderr, "-hpet: requires command\n");
+                return -1;
+        }
+        for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++)
+                if (!strcmp(argv[0], hpet_command[i].command)) {
+                        argc--;
+                        argv++;
+                        fprintf(stderr, "-hpet: executing %s\n",
+                                hpet_command[i].command);
+                        hpet_command[i].func(argc, argv);
+                        return 0;
+                }
+        fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]);
+        return -1;
+}
+void
+hpet_open_close(int argc, const char **argv)
+{
+        int     fd;
+        if (argc != 1) {
+                fprintf(stderr, "hpet_open_close: device-name\n");
+                return;
+        }
+        fd = open(argv[0], O_RDONLY);
+        if (fd < 0)
+                fprintf(stderr, "hpet_open_close: open failed\n");
+        else
+                close(fd);
+        return;
+}
+void
+hpet_info(int argc, const char **argv)
+{
+}
+void
+hpet_poll(int argc, const char **argv)
+{
+        unsigned long           freq;
+        int                     iterations, i, fd;
+        struct pollfd           pfd;
+        struct hpet_info        info;
+        struct timeval          stv, etv;
+        struct timezone         tz;
+        long                    usec;
+        if (argc != 3) {
+                fprintf(stderr, "hpet_poll: device-name freq iterations\n");
+                return;
+        }
+        freq = atoi(argv[1]);
+        iterations = atoi(argv[2]);
+        fd = open(argv[0], O_RDONLY);
+        if (fd < 0) {
+                fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]);
+                return;
+        }
+        if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
+                fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n");
+                goto out;
+        }
+        if (ioctl(fd, HPET_INFO, &info) < 0) {
+                fprintf(stderr, "hpet_poll: failed to get info\n");
+                goto out;
+        }
+        fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags);
+        if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
+                fprintf(stderr, "hpet_poll: HPET_EPI failed\n");
+                goto out;
+        }
+        if (ioctl(fd, HPET_IE_ON, 0) < 0) {
+                fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n");
+                goto out;
+        }
+        pfd.fd = fd;
+        pfd.events = POLLIN;
+        for (i = 0; i < iterations; i++) {
+                pfd.revents = 0;
+                gettimeofday(&stv, &tz);
+                if (poll(&pfd, 1, -1) < 0)
+                        fprintf(stderr, "hpet_poll: poll failed\n");
+                else {
+                        long    data;
+                        gettimeofday(&etv, &tz);
+                        usec = stv.tv_sec * 1000000 + stv.tv_usec;
+                        usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec;
+                        fprintf(stderr,
+                                "hpet_poll: expired time = 0x%lx\n", usec);
+                        fprintf(stderr, "hpet_poll: revents = 0x%x\n",
+                                pfd.revents);
+                        if (read(fd, &data, sizeof(data)) != sizeof(data)) {
+                                fprintf(stderr, "hpet_poll: read failed\n");
+                        }
+                        else
+                                fprintf(stderr, "hpet_poll: data 0x%lx\n",
+                                        data);
+                }
+        }
+out:
+        close(fd);
+        return;
+}
+static int hpet_sigio_count;
+static void
+hpet_sigio(int val)
+{
+        fprintf(stderr, "hpet_sigio: called\n");
+        hpet_sigio_count++;
+}
+void
+hpet_fasync(int argc, const char **argv)
+{
+        unsigned long           freq;
+        int                     iterations, i, fd, value;
+        sig_t                   oldsig;
+        struct hpet_info        info;
+        hpet_sigio_count = 0;
+        fd = -1;
+        if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) {
+                fprintf(stderr, "hpet_fasync: failed to set signal handler\n");
+                return;
+        }
+        if (argc != 3) {
+                fprintf(stderr, "hpet_fasync: device-name freq iterations\n");
+                goto out;
+        }
+        fd = open(argv[0], O_RDONLY);
+        if (fd < 0) {
+                fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]);
+                return;
+        }
+        if ((fcntl(fd, F_SETOWN, getpid()) == -1) ||
+                ((value = fcntl(fd, F_GETFL)) == -1) ||
+                (fcntl(fd, F_SETFL, value | O_ASYNC) == -1)) {
+                fprintf(stderr, "hpet_fasync: fcntl failed\n");
+                goto out;
+        }
+        freq = atoi(argv[1]);
+        iterations = atoi(argv[2]);
+        if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
+                fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n");
+                goto out;
+        }
+        if (ioctl(fd, HPET_INFO, &info) < 0) {
+                fprintf(stderr, "hpet_fasync: failed to get info\n");
+                goto out;
+        }
+        fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags);
+        if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
+                fprintf(stderr, "hpet_fasync: HPET_EPI failed\n");
+                goto out;
+        }
+        if (ioctl(fd, HPET_IE_ON, 0) < 0) {
+                fprintf(stderr, "hpet_fasync: HPET_IE_ON failed\n");
+                goto out;
+        }
+        for (i = 0; i < iterations; i++) {
+                (void) pause();
+                fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count);
+        }
+out:
+        signal(SIGIO, oldsig);
+        if (fd >= 0)
+                close(fd);
+        return;
+}
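The fcntl() calls used by hpet_fasync() report failure by returning -1, so each result has to be compared against -1. A minimal sketch of that setup step follows. The helper name enable_sigio() is hypothetical and not part of the patch, and the sketch assumes a Linux system where O_ASYNC is available.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): route SIGIO for fd to the
 * calling process and turn on asynchronous notification.
 * Returns 0 on success, -1 if any fcntl() call fails.
 */
int enable_sigio(int fd)
{
        int flags;

        /* Deliver SIGIO for this descriptor to the current process. */
        if (fcntl(fd, F_SETOWN, getpid()) == -1)
                return -1;

        /* Read the current file status flags ... */
        flags = fcntl(fd, F_GETFL);
        if (flags == -1)
                return -1;

        /* ... and add O_ASYNC without disturbing the other flags. */
        if (fcntl(fd, F_SETFL, flags | O_ASYNC) == -1)
                return -1;

        return 0;
}
```

A caller would install its SIGIO handler first, then call enable_sigio(fd) on the opened device and bail out if it returns -1.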
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index bab3040da548..03485bfbd797 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1588,7 +1588,7 @@ module author does not need to worry about it.
 When tracing is enabled, kstop_machine is called to prevent
 races with the CPUS executing code being modified (which can
-cause the CPU to do undesireable things), and the nops are
+cause the CPU to do undesirable things), and the nops are
 patched back to calls. But this time, they do not call mcount
 (which is just a function stub). They now call into the ftrace
 infrastructure.
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index e57d6a9dd32b..dca82d7c83d8 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -4,23 +4,35 @@ active_mm.txt
        - An explanation from Linus about tsk->active_mm vs tsk->mm.
 balance
        - various information on memory balancing.
+hugepage-mmap.c
+        - Example app using huge page memory with the mmap system call.
+hugepage-shm.c
+        - Example app using huge page memory with Sys V shared memory system calls.
 hugetlbpage.txt
        - a brief summary of hugetlbpage support in the Linux kernel.
+hwpoison.txt
+        - explains what hwpoison is
 ksm.txt
        - how to use the Kernel Samepage Merging feature.
 locking
        - info on how locking and synchronization is done in the Linux vm code.
+map_hugetlb.c
+        - an example program that uses the MAP_HUGETLB mmap flag.
 numa
        - information about NUMA specific code in the Linux vm.
 numa_memory_policy.txt
        - documentation of concepts and APIs of the 2.6 memory policy support.
 overcommit-accounting
        - description of the Linux kernels overcommit handling modes.
+page-types.c
+        - Tool for querying page flags
 page_migration
        - description of page migration in NUMA systems.
+pagemap.txt
+        - pagemap, from the userspace perspective
 slabinfo.c
        - source code for a tool to get reports about slabs.
 slub.txt
        - a short users guide for SLUB.
-map_hugetlb.c
+unevictable-lru.txt
-        - an example program that uses the MAP_HUGETLB mmap flag.
+        - Unevictable LRU infrastructure
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile
index 5bd269b3731a..9dcff328b964 100644
--- a/Documentation/vm/Makefile
+++ b/Documentation/vm/Makefile
@@ -2,7 +2,7 @@
 obj- := dummy.o
 # List of programs to build
-hostprogs-y := slabinfo page-types
+hostprogs-y := slabinfo page-types hugepage-mmap hugepage-shm map_hugetlb
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c
new file mode 100644
index 000000000000..db0dd9a33d54
--- /dev/null
+++ b/Documentation/vm/hugepage-mmap.c
@@ -0,0 +1,91 @@
+/*
+ * hugepage-mmap:
+ *
+ * Example of using huge page memory in a user application using the mmap
+ * system call.  Before running this application, make sure that the
+ * administrator has mounted the hugetlbfs filesystem (on some directory
+ * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this
+ * example, the app is requesting memory of size 256MB that is backed by
+ * huge pages.
+ *
+ * For the ia64 architecture, the Linux kernel reserves Region number 4 for
+ * huge pages.  That means that if one requires a fixed address, a huge page
+ * aligned address starting with 0x800000... will be required.  If a fixed
+ * address is not required, the kernel will select an address in the proper
+ * range.
+ * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#define FILE_NAME "/mnt/hugepagefile"
+#define LENGTH (256UL*1024*1024)
+#define PROTECTION (PROT_READ | PROT_WRITE)
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define ADDR (void *)(0x8000000000000000UL)
+#define FLAGS (MAP_SHARED | MAP_FIXED)
+#else
+#define ADDR (void *)(0x0UL)
+#define FLAGS (MAP_SHARED)
+#endif
+static void check_bytes(char *addr)
+{
+        printf("First hex is %x\n", *((unsigned int *)addr));
+}
+static void write_bytes(char *addr)
+{
+        unsigned long i;
+        for (i = 0; i < LENGTH; i++)
+                *(addr + i) = (char)i;
+}
+static void read_bytes(char *addr)
+{
+        unsigned long i;
+        check_bytes(addr);
+        for (i = 0; i < LENGTH; i++)
+                if (*(addr + i) != (char)i) {
+                        printf("Mismatch at %lu\n", i);
+                        break;
+                }
+}
+int main(void)
+{
+        void *addr;
+        int fd;
+        fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
+        if (fd < 0) {
+                perror("Open failed");
+                exit(1);
+        }
+        addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
+        if (addr == MAP_FAILED) {
+                perror("mmap");
+                unlink(FILE_NAME);
+                exit(1);
+        }
+        printf("Returned address is %p\n", addr);
+        check_bytes(addr);
+        write_bytes(addr);
+        read_bytes(addr);
+        munmap(addr, LENGTH);
+        close(fd);
+        unlink(FILE_NAME);
+        return 0;
+}
diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c
new file mode 100644
index 000000000000..07956d8592c9
--- /dev/null
+++ b/Documentation/vm/hugepage-shm.c
@@ -0,0 +1,98 @@
+/*
+ * hugepage-shm:
+ *
+ * Example of using huge page memory in a user application using Sys V shared
+ * memory system calls.  In this example the app is requesting 256MB of
+ * memory that is backed by huge pages.  The application uses the flag
+ * SHM_HUGETLB in the shmget system call to inform the kernel that it is
+ * requesting huge pages.
+ *
+ * For the ia64 architecture, the Linux kernel reserves Region number 4 for
+ * huge pages.  That means that if one requires a fixed address, a huge page
+ * aligned address starting with 0x800000... will be required.  If a fixed
+ * address is not required, the kernel will select an address in the proper
+ * range.
+ * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
+ *
+ * Note: The default shared memory limit is quite low on many kernels,
+ * you may need to increase it via:
+ *
+ * echo 268435456 > /proc/sys/kernel/shmmax
+ *
+ * This will increase the maximum size per shared memory segment to 256MB.
+ * The other limit that you will hit eventually is shmall which is the
+ * total amount of shared memory in pages. To set it to 16GB on a system
+ * with a 4kB pagesize do:
+ *
+ * echo 4194304 > /proc/sys/kernel/shmall
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <sys/mman.h>
+#ifndef SHM_HUGETLB
+#define SHM_HUGETLB 04000
+#endif
+#define LENGTH (256UL*1024*1024)
+#define dprintf(x)  printf(x)
+/* Only ia64 requires this */
+#ifdef __ia64__
+#define ADDR (void *)(0x8000000000000000UL)
+#define SHMAT_FLAGS (SHM_RND)
+#else
+#define ADDR (void *)(0x0UL)
+#define SHMAT_FLAGS (0)
+#endif
+int main(void)
+{
+        int shmid;
+        unsigned long i;
+        char *shmaddr;
+        if ((shmid = shmget(2, LENGTH,
+                            SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
+                perror("shmget");
+                exit(1);
+        }
+        printf("shmid: 0x%x\n", shmid);
+        shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
+        if (shmaddr == (char *)-1) {
+                perror("Shared memory attach failure");
+                shmctl(shmid, IPC_RMID, NULL);
+                exit(2);
+        }
+        printf("shmaddr: %p\n", shmaddr);
+        dprintf("Starting the writes:\n");
+        for (i = 0; i < LENGTH; i++) {
+                shmaddr[i] = (char)(i);
+                if (!(i % (1024 * 1024)))
+                        dprintf(".");
+        }
+        dprintf("\n");
+        dprintf("Starting the Check...");
+        for (i = 0; i < LENGTH; i++)
+                if (shmaddr[i] != (char)i)
+                        printf("\nIndex %lu mismatched\n", i);
+        dprintf("Done.\n");
+        if (shmdt((const void *)shmaddr) != 0) {
+                perror("Detach failure");
+                shmctl(shmid, IPC_RMID, NULL);
+                exit(3);
+        }
+        shmctl(shmid, IPC_RMID, NULL);
+        return 0;
+}
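The shmmax and shmall values quoted in the header comment of hugepage-shm.c follow from simple arithmetic: shmmax is a byte count, while shmall is counted in pages. The sketch below reproduces both numbers, assuming a 4kB base page size as in the comment. The helper names bytes_to_pages() and print_shm_limits() are made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: number of pages needed to cover `bytes`,
 * rounding up.  page_size must be non-zero.
 */
unsigned long long bytes_to_pages(unsigned long long bytes,
                                  unsigned long long page_size)
{
        return (bytes + page_size - 1) / page_size;
}

/* Print the two sysctl values discussed in the comment. */
void print_shm_limits(void)
{
        unsigned long long shmmax = 256ULL * 1024 * 1024;        /* 256MB in bytes */
        unsigned long long total  = 16ULL * 1024 * 1024 * 1024;  /* 16GB in bytes */

        printf("shmmax = %llu\n", shmmax);
        printf("shmall = %llu\n", bytes_to_pages(total, 4096));
}
```

On a system with a different base page size, only the second argument to bytes_to_pages() changes; the shmmax value stays in bytes.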
diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index bc31636973e3..457634c1e03e 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -299,176 +299,11 @@ map_hugetlb.c.
 *******************************************************************
 /*
- * Example of using huge page memory in a user application using Sys V shared
+ * hugepage-shm:  see Documentation/vm/hugepage-shm.c
- * memory system calls.  In this example the app is requesting 256MB of
- * memory that is backed by huge pages.  The application uses the flag
- * SHM_HUGETLB in the shmget system call to inform the kernel that it is
- * requesting huge pages.
- *
- * For the ia64 architecture, the Linux kernel reserves Region number 4 for
- * huge pages.  That means that if one requires a fixed address, a huge page
- * aligned address starting with 0x800000... will be required.  If a fixed
- * address is not required, the kernel will select an address in the proper
- * range.
- * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
- *
- * Note: The default shared memory limit is quite low on many kernels,
- * you may need to increase it via:
- *
- * echo 268435456 > /proc/sys/kernel/shmmax
- *
- * This will increase the maximum size per shared memory segment to 256MB.
- * The other limit that you will hit eventually is shmall which is the
- * total amount of shared memory in pages. To set it to 16GB on a system
- * with a 4kB pagesize do:
- *
- * echo 4194304 > /proc/sys/kernel/shmall
 */
-#include <stdlib.h>
-#include <stdio.h>
-#include <sys/types.h>
-#include <sys/ipc.h>
-#include <sys/shm.h>
-#include <sys/mman.h>
-#ifndef SHM_HUGETLB
-#define SHM_HUGETLB 04000
-#endif
-#define LENGTH (256UL*1024*1024)
-#define dprintf(x)  printf(x)
-#define ADDR (void *)(0x0UL)    /* let kernel choose address */
-#define SHMAT_FLAGS (0)
-int main(void)
-{
-        int shmid;
-        unsigned long i;
-        char *shmaddr;
-        if ((shmid = shmget(2, LENGTH,
-                            SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
-                perror("shmget");
-                exit(1);
-        }
-        printf("shmid: 0x%x\n", shmid);
-        shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
-        if (shmaddr == (char *)-1) {
-                perror("Shared memory attach failure");
-                shmctl(shmid, IPC_RMID, NULL);
-                exit(2);
-        }
-        printf("shmaddr: %p\n", shmaddr);
-        dprintf("Starting the writes:\n");
-        for (i = 0; i < LENGTH; i++) {
-                shmaddr[i] = (char)(i);
-                if (!(i % (1024 * 1024)))
-                        dprintf(".");
-        }
-        dprintf("\n");
-        dprintf("Starting the Check...");
-        for (i = 0; i < LENGTH; i++)
-                if (shmaddr[i] != (char)i)
-                        printf("\nIndex %lu mismatched\n", i);
-        dprintf("Done.\n");
-        if (shmdt((const void *)shmaddr) != 0) {
-                perror("Detach failure");
-                shmctl(shmid, IPC_RMID, NULL);
-                exit(3);
-        }
-        shmctl(shmid, IPC_RMID, NULL);
-        return 0;
-}
 *******************************************************************
 /*
- * Example of using huge page memory in a user application using the mmap
+ * hugepage-mmap:  see Documentation/vm/hugepage-mmap.c
- * system call.  Before running this application, make sure that the
- * administrator has mounted the hugetlbfs filesystem (on some directory
- * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this
- * example, the app is requesting memory of size 256MB that is backed by
- * huge pages.
- *
- * For the ia64 architecture, the Linux kernel reserves Region number 4 for
- * huge pages.  That means that if one requires a fixed address, a huge page
- * aligned address starting with 0x800000... will be required.  If a fixed
- * address is not required, the kernel will select an address in the proper
- * range.
- * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
 */
-#include <stdlib.h>
-#include <stdio.h>
-#include <unistd.h>
-#include <sys/mman.h>
-#include <fcntl.h>
-#define FILE_NAME "/mnt/hugepagefile"
-#define LENGTH (256UL*1024*1024)
-#define PROTECTION (PROT_READ | PROT_WRITE)
-#define ADDR (void *)(0x0UL)    /* let kernel choose address */
-#define FLAGS (MAP_SHARED)
-void check_bytes(char *addr)
-{
-        printf("First hex is %x\n", *((unsigned int *)addr));
-}
-void write_bytes(char *addr)
-{
-        unsigned long i;
-        for (i = 0; i < LENGTH; i++)
-                *(addr + i) = (char)i;
-}
-void read_bytes(char *addr)
-{
-        unsigned long i;
-        check_bytes(addr);
-        for (i = 0; i < LENGTH; i++)
-                if (*(addr + i) != (char)i) {
-                        printf("Mismatch at %lu\n", i);
-                        break;
-                }
-}
-int main(void)
-{
-        void *addr;
-        int fd;
-        fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
-        if (fd < 0) {
-                perror("Open failed");
-                exit(1);
-        }
-        addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
-        if (addr == MAP_FAILED) {
-                perror("mmap");
-                unlink(FILE_NAME);
-                exit(1);
-        }
-        printf("Returned address is %p\n", addr);
-        check_bytes(addr);
-        write_bytes(addr);
-        read_bytes(addr);
-        munmap(addr, LENGTH);
-        close(fd);
-        unlink(FILE_NAME);
-        return 0;
-}
diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c
index e2bdae37f499..9969c7d9f985 100644
--- a/Documentation/vm/map_hugetlb.c
+++ b/Documentation/vm/map_hugetlb.c
@@ -31,12 +31,12 @@
 #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
 #endif
-void check_bytes(char *addr)
+static void check_bytes(char *addr)
 {
        printf("First hex is %x\n", *((unsigned int *)addr));
 }
-void write_bytes(char *addr)
+static void write_bytes(char *addr)
 {
        unsigned long i;
@@ -44,7 +44,7 @@ void write_bytes(char *addr)
                *(addr + i) = (char)i;
 }
-void read_bytes(char *addr)
+static void read_bytes(char *addr)
 {
        unsigned long i;
diff --git a/Documentation/volatile-considered-harmful.txt b/Documentation/volatile-considered-harmful.txt
index 991c26a6ef64..db0cb228d64a 100644
--- a/Documentation/volatile-considered-harmful.txt
+++ b/Documentation/volatile-considered-harmful.txt
@@ -63,9 +63,9 @@ way to perform a busy wait is:
        cpu_relax();
 The cpu_relax() call can lower CPU power consumption or yield to a
-hyperthreaded twin processor; it also happens to serve as a memory barrier,
+hyperthreaded twin processor; it also happens to serve as a compiler
-so, once again, volatile is unnecessary.  Of course, busy-waiting is
+barrier, so, once again, volatile is unnecessary.  Of course, busy-
-generally an anti-social act to begin with.
+waiting is generally an anti-social act to begin with.
 There are still a few rare situations where volatile makes sense in the
 kernel:
diff --git a/Documentation/voyager.txt b/Documentation/voyager.txt
deleted file mode 100644
index 2749af552cdf..000000000000
--- a/Documentation/voyager.txt
+++ /dev/null
@@ -1,95 +0,0 @@
-Running Linux on the Voyager Architecture
-=========================================
-For full details and current project status, see
-http://www.hansenpartnership.com/voyager
-The voyager architecture was designed by NCR in the mid 80s to be a
-fully SMP capable RAS computing architecture built around intel's 486
-chip set.  The voyager came in three levels of architectural
-sophistication: 3,4 and 5 --- 1 and 2 never made it out of prototype.
-The linux patches support only the Level 5 voyager architecture (any
-machine class 3435 and above).
-The Voyager Architecture
------------------------
-Voyager machines consist of a Baseboard with a 386 diagnostic
-processor, a Power Supply Interface (PSI) a Primary and possibly
-Secondary Microchannel bus and between 2 and 20 voyager slots.  The
-voyager slots can be populated with memory and cpu cards (up to 4GB
-memory and from 1 486 to 32 Pentium Pro processors).  Internally, the
-voyager has a dual arbitrated system bus and a configuration and test
-bus (CAT).  The voyager bus speed is 40MHz.  Therefore (since all
-voyager cards are dual ported for each system bus) the maximum
-transfer rate is 320Mb/s but only if you have your slot configuration
-tuned (only memory cards can communicate with both busses at once, CPU
-cards utilise them one at a time).
-Voyager SMP
-----------
-Since voyager was the first intel based SMP system, it is slightly
-more primitive than the Intel IO-APIC approach to SMP.  Voyager allows
-arbitrary interrupt routing (including processor affinity routing) of
-all 16 PC type interrupts.  However it does this by using a modified
-5259 master/slave chip set instead of an APIC bus.  Additionally,
-voyager supports Cross Processor Interrupts (CPI) equivalent to the
-APIC IPIs.  There are two routed voyager interrupt lines provided to
-each slot.
-Processor Cards
---------------
-These come in single, dyadic and quad configurations (the quads are
-problematic--see later).  The maximum configuration is 8 quad cards
-for 32 way SMP.
-Quad Processors
---------------
-Because voyager only supplies two interrupt lines to each Processor
-card, the Quad processors have to be configured (and Bootstrapped) in
-as a pair of Master/Slave processors.
-In fact, most Quad cards only accept one VIC interrupt line, so they
-have one interrupt handling processor (called the VIC extended
-processor) and three non-interrupt handling processors.
-Current Status
--------------
-The System will boot on Mono, Dyad and Quad cards.  There was
-originally a Quad boot problem which has been fixed by proper gdt
-alignment in the initial boot loader.  If you still cannot get your
-voyager system to boot, email me at:
-<J.E.J.Bottomley@HansenPartnership.com>
-The Quad cards now support using the separate Quad CPI vectors instead
-of going through the VIC mailbox system.
-The Level 4 architecture (3430 and 3360 Machines) should also work
-fine.
-Dump Switch
-----------
-The voyager dump switch sends out a broadcast NMI which the voyager
-code intercepts and does a task dump.
-Power Switch
------------
-The front panel power switch is intercepted by the kernel and should
-cause a system shutdown and power off.
-A Note About Mixed CPU Systems
------------------------------
-Linux isn't designed to handle mixed CPU systems very well.  In order
-to get everything going you *must* make sure that your lowest
-capability CPU is used for booting.  Also, mixing CPU classes
-(e.g. 486 and 586) is really not going to work very well at all.
diff --git a/Documentation/watchdog/src/watchdog-simple.c b/Documentation/watchdog/src/watchdog-simple.c
index 4cf72f3fa8e9..ba45803a2216 100644
--- a/Documentation/watchdog/src/watchdog-simple.c
+++ b/Documentation/watchdog/src/watchdog-simple.c
@@ -17,9 +17,6 @@ int main(void)
                        ret = -1;
                        break;
                }
-                ret = fsync(fd);
-                if (ret)
-                        break;
                sleep(10);
        }
        close(fd);
diff --git a/Documentation/watchdog/src/watchdog-test.c b/Documentation/watchdog/src/watchdog-test.c
index a750532ffcf8..63fdc34ceb98 100644
--- a/Documentation/watchdog/src/watchdog-test.c
+++ b/Documentation/watchdog/src/watchdog-test.c
@@ -31,6 +31,8 @@ static void keep_alive(void)
 */
 int main(int argc, char *argv[])
 {
+    int flags;
    fd = open("/dev/watchdog", O_WRONLY);
    if (fd == -1) {
@@ -41,12 +43,14 @@ int main(int argc, char *argv[])
    if (argc > 1) {
        if (!strncasecmp(argv[1], "-d", 2)) {
-            ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD);
+            flags = WDIOS_DISABLECARD;
+            ioctl(fd, WDIOC_SETOPTIONS, &flags);
            fprintf(stderr, "Watchdog card disabled.\n");
            fflush(stderr);
            exit(0);
        } else if (!strncasecmp(argv[1], "-e", 2)) {
-            ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD);
+            flags = WDIOS_ENABLECARD;
+            ioctl(fd, WDIOC_SETOPTIONS, &flags);
            fprintf(stderr, "Watchdog card enabled.\n");
            fflush(stderr);
            exit(0);
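WDIOC_SETOPTIONS takes a pointer to an int holding the option flags rather than the flag value itself, which is what the change to watchdog-test.c corrects. A minimal sketch of a wrapper follows. The name watchdog_set_options() is hypothetical, and the sketch assumes the <linux/watchdog.h> UAPI header is installed.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical wrapper: pass option flags such as WDIOS_DISABLECARD or
 * WDIOS_ENABLECARD to the watchdog driver behind fd.
 * Returns the ioctl() result, which is -1 with errno set on failure.
 */
int watchdog_set_options(int fd, int flags)
{
        /* The ioctl expects the address of the flags, not the value. */
        return ioctl(fd, WDIOC_SETOPTIONS, &flags);
}
```

When trying this out, keep in mind that with most drivers opening /dev/watchdog arms the timer, so the descriptor should only be opened on a machine where a reset is acceptable.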
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
index 4cc4ba9d7150..eb7132ed8bbc 100644
--- a/Documentation/watchdog/watchdog-api.txt
+++ b/Documentation/watchdog/watchdog-api.txt
@@ -222,11 +222,10 @@ returned value is the temperature in degrees fahrenheit.
    ioctl(fd, WDIOC_GETTEMP, &temperature);
 Finally the SETOPTIONS ioctl can be used to control some aspects of
-the cards operation; right now the pcwd driver is the only one
+the card's operation.
-supporting this ioctl.
    int options = 0;
-    ioctl(fd, WDIOC_SETOPTIONS, options);
+    ioctl(fd, WDIOC_SETOPTIONS, &options);
 The following options are available: