10 files changed, 649 insertions, 57 deletions
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index d9014aa0eb68..e33ee74eee77 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -227,7 +227,6 @@ Each cgroup is represented by a directory in the cgroup file system
 containing the following files describing that cgroup:
 - tasks: list of tasks (by pid) attached to that cgroup
- - releasable flag: cgroup currently removeable?
 - notify_on_release flag: run the release agent on exit?
 - release_agent: the path to use for release notifications (this file
   exists in the top cgroup only)
@@ -360,7 +359,7 @@ Now you want to do something with this cgroup.
 In this directory you can find several files:
 # ls
-notify_on_release releasable tasks
+notify_on_release tasks
 (plus whatever files added by the attached subsystems)
 Now attach your shell to this cgroup:
@@ -479,7 +478,6 @@ newly-created cgroup if an error occurs after this subsystem's
 create() method has been called for the new cgroup).
 void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp);
-(cgroup_mutex held by caller)
 Called before checking the reference count on each subsystem. This may
 be useful for subsystems which have some extra references even if
@@ -498,6 +496,7 @@ remain valid while the caller holds cgroup_mutex.
 void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
            struct cgroup *old_cgrp, struct task_struct *task)
+(cgroup_mutex held by caller)
 Called after the task has been attached to the cgroup, to allow any
 post-attachment activity that requires memory allocations or blocking.
@@ -511,6 +510,7 @@ void exit(struct cgroup_subsys *ss, struct task_struct *task)
 Called during task exit.
 int populate(struct cgroup_subsys *ss, struct cgroup *cgrp)
+(cgroup_mutex held by caller)
 Called after creation of a cgroup to allow a subsystem to populate
 the cgroup directory with file entries.  The subsystem should make
@@ -520,6 +520,7 @@ method can return an error code, the error code is currently not
 always handled well.
 void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
+(cgroup_mutex held by caller)
 Called at the end of cgroup_clone() to do any paramater
 initialization which might be required before a task could attach.  For
@@ -527,7 +528,7 @@ example in cpusets, no task may attach before 'cpus' and 'mems' are set
 up.
 void bind(struct cgroup_subsys *ss, struct cgroup *root)
-(cgroup_mutex held by caller)
+(cgroup_mutex and ss->hierarchy_mutex held by caller)
 Called when a cgroup subsystem is rebound to a different hierarchy
 and root cgroup. Currently this will only involve movement between
diff --git a/Documentation/controllers/memcg_test.txt b/Documentation/controllers/memcg_test.txt
new file mode 100644
index 000000000000..08d4d3ea0d79
--- /dev/null
+++ b/Documentation/controllers/memcg_test.txt
@@ -0,0 +1,342 @@
+Memory Resource Controller(Memcg)  Implementation Memo.
+Last Updated: 2008/12/15
+Base Kernel Version: based on 2.6.28-rc8-mm.
+Because VM is getting complex (one of the reasons is memcg...), memcg's
+behavior is complex too. This document describes memcg's internal behavior.
+Please note that implementation details can change.
+(*) Topics on API should be in Documentation/controllers/memory.txt
+0. How to record usage ?
+   2 objects are used.
+   page_cgroup ....an object per page.
+        Allocated at boot or memory hotplug. Freed at memory hot removal.
+   swap_cgroup ... an entry per swp_entry.
+        Allocated at swapon(). Freed at swapoff().
+   The page_cgroup has a USED bit, so a page_cgroup is never charged twice.
+   swap_cgroup is used only when a charged page is swapped out.
+1. Charge
+   a page/swp_entry may be charged (usage += PAGE_SIZE) at
+        mem_cgroup_newpage_charge()
+          Called at new page fault and Copy-On-Write.
+        mem_cgroup_try_charge_swapin()
+          Called at do_swap_page() (page fault on swap entry) and swapoff.
+          Followed by charge-commit-cancel protocol. (With swap accounting)
+          At commit, a charge recorded in swap_cgroup is removed.
+        mem_cgroup_cache_charge()
+          Called at add_to_page_cache()
+        mem_cgroup_cache_charge_swapin()
+          Called at shmem's swapin.
+        mem_cgroup_prepare_migration()
+          Called before migration. "extra" charge is done and followed by
+          charge-commit-cancel protocol.
+          At commit, charge against oldpage or newpage will be committed.
+2. Uncharge
+  a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
+        mem_cgroup_uncharge_page()
+          Called when an anonymous page is fully unmapped. I.e., mapcount goes
+          to 0. If the page is SwapCache, uncharge is delayed until
+          mem_cgroup_uncharge_swapcache().
+        mem_cgroup_uncharge_cache_page()
+          Called when a page-cache is deleted from radix-tree. If the page is
+          SwapCache, uncharge is delayed until mem_cgroup_uncharge_swapcache().
+        mem_cgroup_uncharge_swapcache()
+          Called when SwapCache is removed from radix-tree. The charge itself
+          is moved to swap_cgroup. (If mem+swap controller is disabled, no
+          charge to swap occurs.)
+        mem_cgroup_uncharge_swap()
+          Called when swp_entry's refcnt goes down to 0. A charge against swap
+          disappears.
+        mem_cgroup_end_migration(old, new)
+          On successful migration, old is uncharged (if necessary) and the
+          charge to the new page is committed. On failure, the charge to the
+          old page is committed.
+3. charge-commit-cancel
+        In some cases, we can't know at charging time whether this "charge"
+        is valid or not (because of races).
+        To handle such cases, there are charge-commit-cancel functions.
+                mem_cgroup_try_charge_XXX
+                mem_cgroup_commit_charge_XXX
+                mem_cgroup_cancel_charge_XXX
+        These are used in swap-in and migration.
+        At try_charge(), there are no flags to say "this page is charged".
+        At this point, usage += PAGE_SIZE.
+        At commit(), the function checks whether the page should be charged.
+        It then either sets the flags or avoids charging (usage -= PAGE_SIZE).
+        At cancel(), simply usage -= PAGE_SIZE.
+In the explanation below, we assume CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y.
+4. Anonymous
+        Anonymous page is newly allocated at
+                  - page fault into MAP_ANONYMOUS mapping.
+                  - Copy-On-Write.
+        It is charged right after it's allocated before doing any page table
+        related operations. Of course, it's uncharged when another page is used
+        for the fault address.
+        When an anonymous page is freed (by exit() or munmap()), zap_pte() is
+        called and pages for ptes are freed one by one (see mm/memory.c).
+        Uncharges are done at page_remove_rmap() when page_mapcount() goes
+        down to 0.
+        Pages are also freed by page reclaim (vmscan.c), where anonymous
+        pages are swapped out. In this case, the page is marked as
+        PageSwapCache(). The uncharge() routine doesn't uncharge a page marked
+        as SwapCache; that is delayed until __delete_from_swap_cache().
+        4.1 Swap-in.
+        At swap-in, the page is taken from swap-cache. There are 2 cases.
+        (a) If the SwapCache is newly allocated and read, it has no charges.
+        (b) If the SwapCache has been mapped by processes, it has been
+            charged already.
+        Swap-in is one of the most complicated operations. In do_swap_page(),
+        the following events occur when the pte is unchanged.
+        (1) the page (SwapCache) is looked up.
+        (2) lock_page()
+        (3) try_charge_swapin()
+        (4) reuse_swap_page() (may call delete_from_swap_cache())
+        (5) commit_charge_swapin()
+        (6) swap_free().
+        Consider the following situations, for example.
+        (A) The page has not been charged before (2) and reuse_swap_page()
+            doesn't call delete_from_swap_cache().
+        (B) The page has not been charged before (2) and reuse_swap_page()
+            calls delete_from_swap_cache().
+        (C) The page has been charged before (2) and reuse_swap_page() doesn't
+            call delete_from_swap_cache().
+        (D) The page has been charged before (2) and reuse_swap_page() calls
+            delete_from_swap_cache().
+        memory.usage/memsw.usage changes to this page/swp_entry will be:
+         Case          (A)      (B)       (C)     (D)
+         Event
+       Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
+          ===========================================
+          (3)        +1/+1    +1/+1     +1/+1   +1/+1
+          (4)          -       0/ 0       -     -1/ 0
+          (5)         0/-1     0/ 0     -1/-1    0/ 0
+          (6)          -       0/-1       -      0/-1
+          ===========================================
+       Result         1/ 1     1/ 1      1/ 1    1/ 1
+       In all cases, charges to this page should be 1/ 1.
+        4.2 Swap-out.
+        At swap-out, typical state transition is below.
+        (a) add to swap cache. (marked as SwapCache)
+            swp_entry's refcnt += 1.
+        (b) fully unmapped.
+            swp_entry's refcnt += # of ptes.
+        (c) write back to swap.
+        (d) delete from swap cache. (remove from SwapCache)
+            swp_entry's refcnt -= 1.
+        At (b), the page is marked as SwapCache and not uncharged.
+        At (d), the page is removed from SwapCache and a charge in page_cgroup
+        is moved to swap_cgroup.
+        Finally, at task exit,
+        (e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
+        Here, a charge in swap_cgroup disappears.
+5. Page Cache
+        Page Cache is charged at
+        - add_to_page_cache_locked().
+        uncharged at
+        - __remove_from_page_cache().
+        The logic is very clear. (About migration, see below)
+        Note: __remove_from_page_cache() is called by remove_from_page_cache()
+        and __remove_mapping().
+6. Shmem(tmpfs) Page Cache
+        Memcg's charge/uncharge have special handlers of shmem. The best way
+        to understand shmem's page state transition is to read mm/shmem.c.
+        But brief explanation of the behavior of memcg around shmem will be
+        helpful to understand the logic.
+        Shmem's page (just leaf page, not direct/indirect block) can be on
+                - radix-tree of shmem's inode.
+                - SwapCache.
+                - Both on radix-tree and SwapCache. This happens at swap-in
+                  and swap-out.
+        It's charged when
+        - A new page is added to shmem's radix-tree.
+        - A swap page is read. (A charge moves from swap_cgroup to
+          page_cgroup.)
+        It's uncharged when
+        - A page is removed from radix-tree and is not SwapCache.
+        - SwapCache is removed. (A charge moves to swap_cgroup.)
+        - swp_entry's refcnt goes down to 0. (A charge in swap_cgroup
+          disappears.)
+7. Page Migration
+        One of the most complicated functions is page-migration-handler.
+        Memcg has 2 routines. Assume that we are migrating a page's contents
+        from OLDPAGE to NEWPAGE.
+        Usual migration logic is..
+        (a) remove the page from LRU.
+        (b) allocate NEWPAGE (migration target)
+        (c) lock by lock_page().
+        (d) unmap all mappings.
+        (e-1) If necessary, replace entry in radix-tree.
+        (e-2) move contents of a page.
+        (f) map all mappings again.
+        (g) pushback the page to LRU.
+        (-) OLDPAGE will be freed.
+        Before (g), memcg should complete all necessary charge/uncharge to
+        NEWPAGE/OLDPAGE.
+        The point is....
+        - If OLDPAGE is anonymous, all charges will be dropped at (d) because
+          try_to_unmap() drops all mapcount and the page will not be
+          SwapCache.
+        - If OLDPAGE is SwapCache, charges will be kept at (g) because
+          __delete_from_swap_cache() isn't called at (e-1).
+        - If OLDPAGE is page-cache, charges will be kept at (g) because
+          remove_from_page_cache() isn't called at (e-1).
+        memcg provides following hooks.
+        - mem_cgroup_prepare_migration(OLDPAGE)
+          Called after (b) to account a charge (usage += PAGE_SIZE) against
+          memcg which OLDPAGE belongs to.
+        - mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
+          Called after (f) before (g).
+          If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
+          charged, a charge by prepare_migration() is automatically canceled.
+          If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.
+          But zap_pte() (by exit or munmap) can be called during migration, so
+          we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
+8. LRU
+        Each memcg has its own private LRU. Now, its handling is under global
+        VM's control (meaning that it's handled under global zone->lru_lock).
+        Almost all routines around memcg's LRU are called by global LRU's
+        list management functions under zone->lru_lock.
+        A special function is mem_cgroup_isolate_pages(). This scans
+        memcg's private LRU and calls __isolate_lru_page() to extract a page
+        from LRU.
+        (By __isolate_lru_page(), the page is removed from both the global and
+         private LRU.)
+9. Typical Tests.
+ Tests for racy cases.
+ 9.1 Small limit to memcg.
+        When testing racy cases, it's a good idea to set memcg's limit to be
+        very small rather than GB. Many races were found in tests under
+        xKB or xxMB limits.
+        (Memory behavior under a GB limit and under a MB limit is very
+         different.)
+ 9.2 Shmem
+        Historically, memcg's shmem handling was poor and we saw a fair amount
+        of trouble here. This is because shmem is page-cache but can be
+        SwapCache. Testing with shmem/tmpfs is always a good test.
+ 9.3 Migration
+        For NUMA, migration is another special case. For easy testing, cpuset
+        is useful. The following is a sample script to set up migration.
+        mount -t cgroup -o cpuset none /opt/cpuset
+        mkdir /opt/cpuset/01
+        echo 1 > /opt/cpuset/01/cpuset.cpus
+        echo 0 > /opt/cpuset/01/cpuset.mems
+        echo 1 > /opt/cpuset/01/cpuset.memory_migrate
+        mkdir /opt/cpuset/02
+        echo 1 > /opt/cpuset/02/cpuset.cpus
+        echo 1 > /opt/cpuset/02/cpuset.mems
+        echo 1 > /opt/cpuset/02/cpuset.memory_migrate
+        In the above setup, when you move a task from 01 to 02, page migration
+        from node 0 to node 1 will occur. The following is a script to migrate
+        all tasks under a cpuset.
+        --
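+        # Assumed values: G1/G2 are the source and destination cpusets
+        # created above.
+        G1=/opt/cpuset/01
+        G2=/opt/cpuset/02
+        move_task()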
+        {
+        for pid in $1
+        do
+                /bin/echo $pid >$2/tasks 2>/dev/null
+                echo -n $pid
+                echo -n " "
+        done
+        echo END
+        }
+        G1_TASK=`cat ${G1}/tasks`
+        G2_TASK=`cat ${G2}/tasks`
+        move_task "${G1_TASK}" ${G2} &
+        --
+ 9.4 Memory hotplug.
+        Memory hotplug is another good test.
+        To offline memory, do the following.
+        # echo offline > /sys/devices/system/memory/memoryXXX/state
+        (XXX is the place of memory)
+        This is an easy way to test page migration, too.
+ 9.5 mkdir/rmdir
+        When using hierarchy, mkdir/rmdir test should be done.
+        Use tests like the following.
+        echo 1 >/opt/cgroup/01/memory.use_hierarchy
+        mkdir /opt/cgroup/01/child_a
+        mkdir /opt/cgroup/01/child_b
+        set limit to 01.
+        add limit to 01/child_b
+        run jobs under child_a and child_b
+        create/delete following groups at random while jobs are running.
+        /opt/cgroup/01/child_a/child_aa
+        /opt/cgroup/01/child_b/child_bb
+        /opt/cgroup/01/child_c
+        running new jobs in new group is also good.
+ 9.6 Mount with other subsystems.
+        Mounting with other subsystems is a good test because there is a
+        race and lock dependency with other cgroup subsystems.
+        example)
+        # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices
+        and do task move, mkdir, rmdir etc...under this.
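A minimal sketch for checking the swap-in table in section 4.1 of memcg_test.txt above. It copies the per-event memory.usage/memsw.usage deltas from that table and sums them for each case. The names BEFORE, DELTAS and final_usage are made up for this illustration and are not kernel identifiers; the numbers are taken from the table as written, not from the kernel code.

```python
# Starting (memory.usage, memsw.usage) for each case, from the "Before (2)" row.
BEFORE = {"A": (0, 1), "B": (0, 1), "C": (1, 1), "D": (1, 1)}

# Deltas for events (3), (4), (5), (6), copied from the table.
# None stands for "-" in the table: the event does not happen in that case.
DELTAS = {
    "A": [(1, 1), None, (0, -1), None],
    "B": [(1, 1), (0, 0), (0, 0), (0, -1)],
    "C": [(1, 1), None, (-1, -1), None],
    "D": [(1, 1), (-1, 0), (0, 0), (0, -1)],
}


def final_usage(case):
    """Apply every event delta of one case to its starting usage."""
    mem, memsw = BEFORE[case]
    for delta in DELTAS[case]:
        if delta is None:
            continue
        mem += delta[0]
        memsw += delta[1]
    return mem, memsw


RESULTS = {case: final_usage(case) for case in BEFORE}
```

Each entry of RESULTS should match the "Result" row of the table, i.e. one charge in memory.usage and one in memsw.usage for every case.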
diff --git a/Documentation/controllers/memory.txt b/Documentation/controllers/memory.txt
index 1c07547d3f81..e1501964df1e 100644
--- a/Documentation/controllers/memory.txt
+++ b/Documentation/controllers/memory.txt
@@ -137,7 +137,32 @@ behind this approach is that a cgroup that aggressively uses a shared
 page will eventually get charged for it (once it is uncharged from
 the cgroup that brought it in -- this will happen on memory pressure).
-2.4 Reclaim
+Exception: If CONFIG_CGROUP_MEM_RES_CTLR_SWAP is not used.
+When you do swapoff and force swapped-out pages of shmem(tmpfs) to be
+brought back into memory, charges for those pages are accounted against the
+caller of swapoff rather than the users of shmem.
+2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
+Swap Extension allows you to record charge for swap. A swapped-in page is
+charged back to original page allocator if possible.
+When swap is accounted, following files are added.
+ - memory.memsw.usage_in_bytes.
+ - memory.memsw.limit_in_bytes.
+usage of mem+swap is limited by memsw.limit_in_bytes.
+Note: why 'mem+swap' rather than swap.
+The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
+to move account from memory to swap...there is no change in usage of
+mem+swap.
+In other words, when we want to limit the usage of swap without affecting
+global LRU, mem+swap limit is better than just limiting swap from OS point
+of view.
+2.5 Reclaim
 Each cgroup maintains a per cgroup LRU that consists of an active
 and inactive list. When a cgroup goes over its limit, we first try
@@ -207,12 +232,6 @@ exceeded.
 The memory.stat file gives accounting information. Now, the number of
 caches, RSS and Active pages/Inactive pages are shown.
-The memory.force_empty gives an interface to drop *all* charges by force.
-# echo 1 > memory.force_empty
-will drop all charges in cgroup. Currently, this is maintained for test.
 4. Testing
 Balbir posted lmbench, AIM9, LTP and vmmstress results [10] and [11].
@@ -242,10 +261,106 @@ reclaimed.
 A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
 cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. Such charges are automatically dropped at
+tasks have migrated away from it.
-rmdir() if there are no tasks.
+Such charges are freed (by default) or moved to the parent. When moved,
+both RSS and CACHES are moved to the parent.
+If both of them are busy, rmdir() returns -EBUSY. See also 5.1.
+Charges recorded in swap information are not updated at removal of a cgroup.
+The recorded information is discarded, and a cgroup which uses the swap
+(swapcache) will be charged as its new owner.
+5. Misc. interfaces.
+5.1 force_empty
+  memory.force_empty interface is provided to make cgroup's memory usage empty.
+  You can use this interface only when the cgroup has no tasks.
+  When anything is written to this file, e.g.
+  # echo 0 > memory.force_empty
+  almost all pages tracked by this memcg will be unmapped and freed. Some
+  pages cannot be freed because they are locked or in use. Such pages are
+  moved to the parent and this cgroup will be empty. But this may return
+  -EBUSY in some too-busy cases.
+  A typical use case of this interface is calling it before rmdir().
+  Because rmdir() moves all pages to the parent, some out-of-use page caches
+  can be moved to the parent. If you want to avoid that, force_empty will be
+  useful.
+5.2 stat file
+  memory.stat file includes the following statistics (now)
+        cache                   - # of pages from page-cache and shmem.
+        rss                     - # of pages from anonymous memory.
+        pgpgin                  - # of charging events
+        pgpgout                 - # of uncharging events
+        active_anon             - # of pages on active lru of anon, shmem.
+        inactive_anon           - # of pages on inactive lru of anon, shmem
+        active_file             - # of pages on active lru of file-cache
+        inactive_file           - # of pages on inactive lru of file cache
+        unevictable             - # of unreclaimable pages (mlocked etc)
+        The following depend on CONFIG_DEBUG_VM.
+        inactive_ratio          - VM internal parameter. (see mm/page_alloc.c)
+        recent_rotated_anon     - VM internal parameter. (see mm/vmscan.c)
+        recent_rotated_file     - VM internal parameter. (see mm/vmscan.c)
+        recent_scanned_anon     - VM internal parameter. (see mm/vmscan.c)
+        recent_scanned_file     - VM internal parameter. (see mm/vmscan.c)
+  Memo:
+        recent_rotated means recent frequency of lru rotation.
+        recent_scanned means recent # of scans to lru.
+        These are shown for better debugging; see the code for their meanings.
+5.3 swappiness
+  Similar to /proc/sys/vm/swappiness, but affecting a hierarchy of groups only.
+  The swappiness of the following cgroups can't be changed.
+  - root cgroup (uses /proc/sys/vm/swappiness).
+  - a cgroup which uses hierarchy and has a child cgroup.
+  - a cgroup which uses hierarchy and is not the root of the hierarchy.
+6. Hierarchy support
+The memory controller supports a deep hierarchy and hierarchical accounting.
+The hierarchy is created by creating the appropriate cgroups in the
+cgroup filesystem. Consider for example, the following cgroup filesystem
+hierarchy
+                root
+             /  |   \
+           /    |    \
+          a     b       c
+                        | \
+                        |  \
+                        d   e
+In the diagram above, with hierarchical accounting enabled, all memory
+usage of e is accounted to its ancestors that have memory.use_hierarchy
+enabled, up to the root (i.e., c and root). If one of the ancestors goes over
+its limit, the reclaim algorithm reclaims from the tasks in the ancestor and
+the children of the ancestor.
+6.1 Enabling hierarchical accounting and reclaim
+The memory controller by default disables the hierarchy feature. Support
+can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup
+# echo 1 > memory.use_hierarchy
+The feature can be disabled by
+# echo 0 > memory.use_hierarchy
+NOTE1: Enabling/disabling will fail if the cgroup already has other
+cgroups created below it.
+NOTE2: This feature can be enabled/disabled per subtree.
-5. TODO
+7. TODO
 1. Add support for accounting huge pages (as a separate controller)
 2. Make per-cgroup scanner reclaim not-shared pages first
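A minimal sketch of the hierarchical accounting described in section 6 of memory.txt above, using the example tree from that section. It assumes memory.use_hierarchy is enabled on every ancestor; PARENT and charge are hypothetical names for this illustration and say nothing about how the kernel implements the walk.

```python
# Parent links for the example tree in section 6 (root -> a, b, c; c -> d, e).
PARENT = {"root": None, "a": "root", "b": "root", "c": "root", "d": "c", "e": "c"}


def charge(usage, group, nbytes):
    """Add nbytes to group and to every ancestor up to the root.

    Assumption: memory.use_hierarchy is enabled on every ancestor.
    """
    node = group
    while node is not None:
        usage[node] = usage.get(node, 0) + nbytes
        node = PARENT[node]
    return usage


# Charging one 4096-byte page to e also accounts it to c and root,
# while the siblings a, b and d stay untouched.
usage = charge({}, "e", 4096)
```

The same walk is what makes a limit on c apply to memory used in d and e: their usage is included in the usage of c.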
diff --git a/Documentation/hwmon/abituguru-datasheet b/Documentation/hwmon/abituguru-datasheet
index 4d184f2db0ea..d9251efdcec7 100644
--- a/Documentation/hwmon/abituguru-datasheet
+++ b/Documentation/hwmon/abituguru-datasheet
@@ -121,7 +121,7 @@ Once all bytes have been read data will hold 0x09, but there is no reason to
 test for this. Notice that the number of bytes is bank address dependent see
 above and below.
-After completing a successfull read it is advised to put the uGuru back in
+After completing a successful read it is advised to put the uGuru back in
 ready mode, so that it is ready for the next read / write cycle. This way
 if your program / driver is unloaded and later loaded again the detection
 algorithm described above will still work.
@@ -141,7 +141,7 @@ don't ask why this is the way it is.
 Once DATA holds 0x01 read CMD it should hold 0xAC now.
-After completing a successfull write it is advised to put the uGuru back in
+After completing a successful write it is advised to put the uGuru back in
 ready mode, so that it is ready for the next read / write cycle. This way
 if your program / driver is unloaded and later loaded again the detection
 algorithm described above will still work.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 532eacbbed62..fb849020aea9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1562,6 +1562,9 @@ and is between 256 and 4096 characters. It is defined in the file
        nosoftlockup    [KNL] Disable the soft-lockup detector.
+        noswapaccount   [KNL] Disable accounting of swap in memory resource
+                        controller. (See Documentation/controllers/memory.txt)
        nosync          [HW,M68K] Disables sync negotiation for all devices.
        notsc           [BUGS=X86-32] Disable Time Stamp Counter
diff --git a/Documentation/powerpc/dts-bindings/fsl/board.txt b/Documentation/powerpc/dts-bindings/fsl/board.txt
index 81a917ef96e9..6c974d28eeb4 100644
--- a/Documentation/powerpc/dts-bindings/fsl/board.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/board.txt
@@ -18,7 +18,7 @@ This is the memory-mapped registers for on board FPGA.
 Required properities:
 - compatible : should be "fsl,fpga-pixis".
- reg : should contain the address and the lenght of the FPPGA register
+- reg : should contain the address and the length of the FPGA register
  set.
 Example (MPC8610HPCD):
@@ -27,3 +27,33 @@ Example (MPC8610HPCD):
                compatible = "fsl,fpga-pixis";
                reg = <0xe8000000 32>;
        };
+* Freescale BCSR GPIO banks
+Some BCSR registers act as simple GPIO controllers. Each such
+register can be represented by a gpio-controller node.
+Required properties:
+- compatible : Should be "fsl,<board>-bcsr-gpio".
+- reg : Should contain the address and the length of the GPIO bank
+  register.
+- #gpio-cells : Should be two. The first cell is the pin number and the
+  second cell is used to specify optional parameters (currently unused).
+- gpio-controller : Marks the port as GPIO controller.
+Example:
+        bcsr@1,0 {
+                #address-cells = <1>;
+                #size-cells = <1>;
+                compatible = "fsl,mpc8360mds-bcsr";
+                reg = <1 0 0x8000>;
+                ranges = <0 1 0 0x8000>;
+                bcsr13: gpio-controller@d {
+                        #gpio-cells = <2>;
+                        compatible = "fsl,mpc8360mds-bcsr-gpio";
+                        reg = <0xd 1>;
+                        gpio-controller;
+                };
+        };
diff --git a/Documentation/scsi/scsi_fc_transport.txt b/Documentation/scsi/scsi_fc_transport.txt
index 38d324d62b25..e5b071d46619 100644
--- a/Documentation/scsi/scsi_fc_transport.txt
+++ b/Documentation/scsi/scsi_fc_transport.txt
@@ -191,7 +191,7 @@ Vport States:
      This is equivalent to a driver "attach" on an adapter, which is
      independent of the adapter's link state.
    - Instantiation of the vport on the FC link via ELS traffic, etc.
-      This is equivalent to a "link up" and successfull link initialization.
+      This is equivalent to a "link up" and successful link initialization.
  Further information can be found in the interfaces section below for
  Vport Creation.
@@ -320,7 +320,7 @@ Vport Creation:
      This is equivalent to a driver "attach" on an adapter, which is
      independent of the adapter's link state.
    - Instantiation of the vport on the FC link via ELS traffic, etc.
-      This is equivalent to a "link up" and successfull link initialization.
+      This is equivalent to a "link up" and successful link initialization.
  The LLDD's vport_create() function will not synchronously wait for both
  parts to be fully completed before returning. It must validate that the
diff --git a/Documentation/w1/masters/00-INDEX b/Documentation/w1/masters/00-INDEX
index 7b0ceaaad7af..d63fa024ac05 100644
--- a/Documentation/w1/masters/00-INDEX
+++ b/Documentation/w1/masters/00-INDEX
@@ -4,5 +4,7 @@ ds2482
        - The Maxim/Dallas Semiconductor DS2482 provides 1-wire busses.
 ds2490
        - The Maxim/Dallas Semiconductor DS2490 builds USB <-> W1 bridges.
+mxc-w1
+        - W1 master controller driver found on Freescale MX2/MX3 SoCs
 w1-gpio
        - GPIO 1-wire bus master driver.
diff --git a/Documentation/w1/masters/mxc-w1 b/Documentation/w1/masters/mxc-w1
new file mode 100644
index 000000000000..97f6199a7f39
--- /dev/null
+++ b/Documentation/w1/masters/mxc-w1
@@ -0,0 +1,11 @@
+Kernel driver mxc_w1
+====================
+Supported chips:
+  * Freescale MX27, MX31 and probably other i.MX SoCs
+    Datasheets:
+        http://www.freescale.com/files/32bit/doc/data_sheet/MCIMX31.pdf?fpsp=1
+        http://www.freescale.com/files/dsp/MCIMX27.pdf?fpsp=1
+Author: Originally based on Freescale code, prepared for mainline by
+        Sascha Hauer <s.hauer@pengutronix.de>
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index 3640c7c87d45..804445f745ed 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -5,69 +5,157 @@ Message types.
 =============
 There are three types of messages between w1 core and userspace:
-1. Events. They are generated each time new master or slave device found
+1. Events. They are generated each time a new master or slave device
-        either due to automatic or requested search.
+        is found either due to automatic or requested search.
-2. Userspace commands. Includes read/write and search/alarm search comamnds.
+2. Userspace commands.
 3. Replies to userspace commands.
 Protocol.
 ========
-[struct cn_msg] - connector header. It's length field is equal to size of the attached data.
+[struct cn_msg] - connector header.
+        Its length field is equal to size of the attached data
 [struct w1_netlink_msg] - w1 netlink header.
        __u8 type       - message type.
-                        W1_SLAVE_ADD/W1_SLAVE_REMOVE - slave add/remove events.
+                        W1_LIST_MASTERS
-                        W1_MASTER_ADD/W1_MASTER_REMOVE - master add/remove events.
+                                list current bus masters
-                        W1_MASTER_CMD - userspace command for bus master device (search/alarm search).
+                        W1_SLAVE_ADD/W1_SLAVE_REMOVE
-                        W1_SLAVE_CMD - userspace command for slave device (read/write/ search/alarm search
+                                slave add/remove events
-                                        for bus master device where given slave device found).
+                        W1_MASTER_ADD/W1_MASTER_REMOVE
+                                master add/remove events
+                        W1_MASTER_CMD
+                                userspace command for bus master
+                                device (search/alarm search)
+                        W1_SLAVE_CMD
+                                userspace command for slave device
+                                (read/write/touch)
        __u8 res        - reserved
-        __u16 len       - size of attached to this header data.
+        __u16 len       - size of data attached to this header
        union {
-                __u8 id;                         - slave unique device id
+                __u8 id[8];                      - slave unique device id
                struct w1_mst {
-                        __u32           id;      - master's id.
+                        __u32           id;      - master's id
                        __u32           res;     - reserved
                } mst;
        } id;
-[strucrt w1_netlink_cmd] - command for gived master or slave device.
+[struct w1_netlink_cmd] - command for given master or slave device.
        __u8 cmd        - command opcode.
-                        W1_CMD_READ     - read command.
+                        W1_CMD_READ     - read command
-                        W1_CMD_WRITE    - write command.
+                        W1_CMD_WRITE    - write command
-                        W1_CMD_SEARCH   - search command.
+                        W1_CMD_TOUCH    - touch command
-                        W1_CMD_ALARM_SEARCH - alarm search command.
+                                (write and sample data back to userspace)
+                        W1_CMD_SEARCH   - search command
+                        W1_CMD_ALARM_SEARCH - alarm search command
        __u8 res        - reserved
-        __u16 len       - length of data for this command.
+        __u16 len       - length of data for this command
-                        For read command data must be allocated like for write command.
+                For read command data must be allocated like for write command
-        __u8 data[0]    - data for this command.
+        __u8 data[0]    - data for this command
-Each connector message can include one or more w1_netlink_msg with zero of more attached w1_netlink_cmd messages.
+Each connector message can include one or more w1_netlink_msg with
+zero or more attached w1_netlink_cmd messages.
-For event messages there are no w1_netlink_cmd embedded structures, only connector header
+For event messages there are no w1_netlink_cmd embedded structures,
-and w1_netlink_msg strucutre with "len" field being zero and filled type (one of event types)
+only connector header and w1_netlink_msg structure with "len" field
-and id - either 8 bytes of slave unique id in host order, or master's id, which is assigned
+being zero and filled type (one of event types) and id:
-to bus master device when it is added to w1 core.
+either 8 bytes of slave unique id in host order,
+or master's id, which is assigned to bus master device
+when it is added to w1 core.
+Currently replies to userspace commands are only generated for read
+command requests. Exactly one reply is generated for each w1_netlink_cmd
+read request. Replies are not combined when sent, i.e. a typical reply
+message looks like the following:
-Currently replies to userspace commands are only generated for read command request.
-One reply is generated exactly for one w1_netlink_cmd read request.
-Replies are not combined when sent - i.e. typical reply messages looks like the following:
 [cn_msg][w1_netlink_msg][w1_netlink_cmd]
-cn_msg.len = sizeof(struct w1_netlink_msg) + sizeof(struct w1_netlink_cmd) + cmd->len;
+cn_msg.len = sizeof(struct w1_netlink_msg) +
+             sizeof(struct w1_netlink_cmd) +
+             cmd->len;
 w1_netlink_msg.len = sizeof(struct w1_netlink_cmd) + cmd->len;
 w1_netlink_cmd.len = cmd->len;
+Replies to W1_LIST_MASTERS should send a message back to the userspace
+which will contain list of all registered master ids in the following
+format:
+        cn_msg (CN_W1_IDX.CN_W1_VAL as id, len is equal to sizeof(struct
+        w1_netlink_msg) plus number of masters multiplied by 4)
+        w1_netlink_msg (type: W1_LIST_MASTERS, len is equal to
+                number of masters multiplied by 4 (u32 size))
+        id0 ... idN
+        Each message is at most 4k in size, so if the list of master ids
+        does not fit, it will be split into several messages, and
+        cn.seq will be increased for each one.
+W1 search and alarm search commands.
+request:
+[cn_msg]
+  [w1_netlink_msg type = W1_MASTER_CMD
+        id is equal to the bus master id to use for searching]
+  [w1_netlink_cmd cmd = W1_CMD_SEARCH or W1_CMD_ALARM_SEARCH]
+reply:
+  [cn_msg, ack = 1 and increasing, 0 means the last message,
+        seq is equal to the request seq]
+  [w1_netlink_msg type = W1_MASTER_CMD]
+  [w1_netlink_cmd cmd = W1_CMD_SEARCH or W1_CMD_ALARM_SEARCH
+        len is equal to number of IDs multiplied by 8]
+  [64bit-id0 ... 64bit-idN]
+Length in each header corresponds to the size of the data behind it, so
+w1_netlink_cmd->len = N * 8; where N is number of IDs in this message.
+        Can be zero.
+w1_netlink_msg->len = sizeof(struct w1_netlink_cmd) + N * 8;
+cn_msg->len = sizeof(struct w1_netlink_msg) +
+              sizeof(struct w1_netlink_cmd) +
+              N*8;
+W1 reset command.
+[cn_msg]
+  [w1_netlink_msg type = W1_MASTER_CMD
+        id is equal to the bus master id to reset]
+  [w1_netlink_cmd cmd = W1_CMD_RESET]
+Command status replies.
+======================
+Each command (either root, master or slave with or without w1_netlink_cmd
+structure) will be 'acked' by the w1 core. Format of the reply is the same
+as request message except that length parameters do not account for data
+requested by the user, i.e. read/write/touch IO requests will not contain
+data, so w1_netlink_cmd.len will be 0, w1_netlink_msg.len will be size
+of the w1_netlink_cmd structure and cn_msg.len will be equal to the sum
+of the sizeof(struct w1_netlink_msg) and sizeof(struct w1_netlink_cmd).
+If reply is generated for master or root command (which do not have
+w1_netlink_cmd attached), reply will contain only cn_msg and w1_netlink_msg
+structures.
+w1_netlink_msg.status field will carry positive error value
+(EINVAL for example) or zero in case of success.
+All other fields in every structure will mirror the same parameters in the
+request message (except lengths as described above).
+Status reply is generated for every w1_netlink_cmd embedded in the
+w1_netlink_msg, if there are no w1_netlink_cmd structures,
+reply will be generated for the w1_netlink_msg.
+All w1_netlink_cmd command structures are handled in every w1_netlink_msg,
+even if there were errors; only a length mismatch interrupts message processing.
 Operation steps in w1 core when new command is received.
 =======================================================
-When new message (w1_netlink_msg) is received w1 core detects if it is master of slave request,
+When new message (w1_netlink_msg) is received w1 core detects if it is
-according to w1_netlink_msg.type field.
+master or slave request, according to w1_netlink_msg.type field.
 Then master or slave device is searched for.
-When found, master device (requested or those one on where slave device is found) is locked.
+When found, master device (requested or the one on which the slave device
-If slave command is requested, then reset/select procedure is started to select given device.
+is found) is locked. If slave command is requested, then reset/select
+procedure is started to select given device.
 Then all requested in w1_netlink_msg operations are performed one by one.
 If command requires reply (like read command) it is sent on command completion.
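The nested length rules in the hunk above (each header's len covers only
the data behind it) can be sketched in C as follows. This is a minimal
illustration, not the w1 core's code: the struct names w1_msg_hdr and
w1_cmd_hdr, the w1_lens type and the helper functions are hypothetical
stand-ins, and the field layouts are assumed from the field lists in the
documentation text, with the second byte of the message header taken to
be the status field mentioned in the status reply section. Real code
would use the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for w1_netlink_msg; layout assumed from the documentation. */
struct w1_msg_hdr {
	uint8_t  type;
	uint8_t  status;
	uint16_t len;
	union {
		uint8_t id[8];                  /* slave unique device id */
		struct {
			uint32_t id;            /* master's id */
			uint32_t res;           /* reserved */
		} mst;
	} id;
};

/* Stand-in for w1_netlink_cmd without the trailing data[] bytes. */
struct w1_cmd_hdr {
	uint8_t  cmd;
	uint8_t  res;
	uint16_t len;
};

/* Guard the sizes the arithmetic below relies on. */
static_assert(sizeof(struct w1_msg_hdr) == 12, "unexpected padding");
static_assert(sizeof(struct w1_cmd_hdr) == 4, "unexpected padding");

/* The three length fields of one reply (hypothetical helper type). */
struct w1_lens {
	size_t cmd_len;   /* w1_netlink_cmd.len */
	size_t msg_len;   /* w1_netlink_msg.len */
	size_t cn_len;    /* cn_msg.len */
};

/* Lengths for a reply carrying `payload` bytes behind w1_netlink_cmd. */
static struct w1_lens w1_reply_lens(size_t payload)
{
	struct w1_lens l;

	l.cmd_len = payload;
	l.msg_len = sizeof(struct w1_cmd_hdr) + payload;
	l.cn_len  = sizeof(struct w1_msg_hdr) + l.msg_len;
	return l;
}

/* Search reply: N 64-bit ids, so the payload is N * 8 bytes. */
static struct w1_lens w1_search_reply_lens(size_t n_ids)
{
	return w1_reply_lens(n_ids * 8);
}

/* Status reply to a command: no data is echoed back. */
static struct w1_lens w1_status_reply_lens(void)
{
	return w1_reply_lens(0);
}
```

With these assumed layouts, a search reply carrying three ids would have
w1_netlink_cmd.len = 24, w1_netlink_msg.len = 28 and cn_msg.len = 40.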
@@ -82,8 +170,8 @@ Connector [1] specific documentation.
 Each connector message includes two u32 fields as "address".
 w1 uses CN_W1_IDX and CN_W1_VAL defined in include/linux/connector.h header.
 Each message also includes sequence and acknowledge numbers.
-Sequence number for event messages is appropriate bus master sequence number increased with
+Sequence number for event messages is appropriate bus master sequence number
-each event message sent "through" this master.
+increased with each event message sent "through" this master.
 Sequence number for userspace requests is set by userspace application.
 Sequence number for reply is the same as was in request, and
 acknowledge number is set to seq+1.
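A userspace client could pair replies with its requests using the
sequence and acknowledge rules in the hunk above. The sketch below is
only an illustration: both function names are hypothetical, and it takes
the seq and ack values as plain integers instead of reading them from a
real connector message. Search replies are treated separately because
the search section describes ack there as a counter, with 0 marking the
last message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Ordinary reply: seq is copied from the request and ack is seq + 1.
 * Unsigned arithmetic makes the comparison wrap safely at UINT32_MAX.
 */
static bool w1_reply_matches(uint32_t req_seq, uint32_t reply_seq,
			     uint32_t reply_ack)
{
	return reply_seq == req_seq && reply_ack == (uint32_t)(req_seq + 1);
}

/*
 * Search reply: seq still equals the request seq, but ack starts at 1
 * and increases, and 0 marks the final message of the series.
 */
static bool w1_search_reply_is_last(uint32_t req_seq, uint32_t reply_seq,
				    uint32_t reply_ack)
{
	return reply_seq == req_seq && reply_ack == 0;
}
```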
@@ -93,6 +181,6 @@ Additional documantion, source code examples.
 ============================================
 1. Documentation/connector
-2. http://tservice.net.ru/~s0mbre/archive/w1
+2. http://www.ioremap.net/archive/w1
-This archive includes userspace application w1d.c which
+This archive includes userspace application w1d.c which uses
-uses read/write/search commands for all master/slave devices found on the bus.
+read/write/search commands for all master/slave devices found on the bus.