author Ingo Molnar <mingo@elte.hu> 2010-10-30 04:43:08 -0400
committer Ingo Molnar <mingo@elte.hu> 2010-10-30 04:43:08 -0400
commit 169ed55bd30305b933f52bfab32a58671d44ab68
tree 32e280957474f458901abfce16fa2a1687ef7497 /Documentation
parent 3d7851b3cdd43a734e5cc4c643fd886ab28ad4d5
parent 45f81b1c96d9793e47ce925d257ea693ce0b193e
Merge branch 'tip/perf/jump-label-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/obsolete/dv1394 | 9
-rw-r--r-- Documentation/ABI/removed/dv1394 | 14
-rw-r--r-- Documentation/ABI/removed/raw1394 | 15
-rw-r--r-- Documentation/ABI/removed/raw1394_legacy_isochronous | 16
-rw-r--r-- Documentation/ABI/removed/video1394 | 16
-rw-r--r-- Documentation/ABI/testing/sysfs-devices-system-ibm-rtl | 22
-rw-r--r-- Documentation/ABI/testing/sysfs-driver-hid-roccat-pyra | 98
-rw-r--r-- Documentation/ABI/testing/sysfs-module | 12
-rw-r--r-- Documentation/DocBook/80211.tmpl | 495
-rw-r--r-- Documentation/DocBook/Makefile | 2
-rw-r--r-- Documentation/DocBook/device-drivers.tmpl | 5
-rw-r--r-- Documentation/DocBook/drm.tmpl | 1
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl | 9
-rw-r--r-- Documentation/DocBook/mac80211.tmpl | 337
-rw-r--r-- Documentation/accounting/getdelays.c | 38
-rw-r--r-- Documentation/arm/SA1100/FreeBird | 4
-rw-r--r-- Documentation/block/00-INDEX | 4
-rw-r--r-- Documentation/block/barrier.txt | 261
-rw-r--r-- Documentation/block/writeback_cache_control.txt | 86
-rw-r--r-- Documentation/cgroups/blkio-controller.txt | 106
-rw-r--r-- Documentation/cgroups/cgroups.txt | 14
-rw-r--r-- Documentation/devices.txt | 6
-rw-r--r-- Documentation/dynamic-debug-howto.txt | 22
-rw-r--r-- Documentation/fb/viafb.txt | 48
-rw-r--r-- Documentation/feature-removal-schedule.txt | 46
-rw-r--r-- Documentation/filesystems/Locking | 31
-rw-r--r-- Documentation/filesystems/nfs/00-INDEX | 4
-rw-r--r-- Documentation/filesystems/nfs/idmapper.txt | 67
-rw-r--r-- Documentation/filesystems/nfs/nfsroot.txt | 22
-rw-r--r-- Documentation/filesystems/nfs/pnfs.txt | 48
-rw-r--r-- Documentation/filesystems/proc.txt | 25
-rw-r--r-- Documentation/filesystems/sharedsubtree.txt | 4
-rw-r--r-- Documentation/hwmon/ltc4261 | 63
-rw-r--r-- Documentation/input/ntrig.txt | 126
-rw-r--r-- Documentation/kernel-parameters.txt | 33
-rw-r--r-- Documentation/kvm/api.txt | 61
-rw-r--r-- Documentation/kvm/ppc-pv.txt | 196
-rw-r--r-- Documentation/kvm/timekeeping.txt | 612
-rw-r--r-- Documentation/lguest/lguest.c | 29
-rw-r--r-- Documentation/misc-devices/apds990x.txt | 111
-rw-r--r-- Documentation/misc-devices/bh1770glc.txt | 116
-rw-r--r-- Documentation/networking/bonding.txt | 8
-rw-r--r-- Documentation/networking/can.txt | 12
-rw-r--r-- Documentation/networking/dccp.txt | 29
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 27
-rw-r--r-- Documentation/networking/phonet.txt | 56
-rw-r--r-- Documentation/networking/phy.txt | 18
-rw-r--r-- Documentation/networking/timestamping.txt | 22
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/usb.txt | 22
-rw-r--r-- Documentation/scsi/st.txt | 15
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 82
-rw-r--r-- Documentation/sound/alsa/HD-Audio.txt | 8
-rw-r--r-- Documentation/sysctl/vm.txt | 12
-rw-r--r-- Documentation/sysrq.txt | 7
-rw-r--r-- Documentation/timers/hpet_example.c | 27
-rw-r--r-- Documentation/trace/postprocess/trace-vmscan-postprocess.pl | 39
-rw-r--r-- Documentation/usb/proc_usb_info.txt | 34
-rw-r--r-- Documentation/vm/highmem.txt | 162
-rw-r--r-- Documentation/vm/numa_memory_policy.txt | 2
-rw-r--r-- Documentation/workqueue.txt | 29
-rw-r--r-- Documentation/x86/x86_64/kernel-stacks | 6
61 files changed, 3055 insertions, 796 deletions
diff --git a/Documentation/ABI/obsolete/dv1394 b/Documentation/ABI/obsolete/dv1394
deleted file mode 100644
index 2ee36864ca10..000000000000
--- a/Documentation/ABI/obsolete/dv1394
+++ /dev/null
@@ -1,9 +0,0 @@
-What: dv1394 (a.k.a. "OHCI-DV I/O support" for FireWire)
-Contact: linux1394-devel@lists.sourceforge.net
-Description:
- New application development should use raw1394 + userspace libraries
- instead, notably libiec61883 which is functionally equivalent.
-
-Users:
- ffmpeg/libavformat (used by a variety of media players)
- dvgrab v1.x (replaced by dvgrab2 on top of raw1394 and resp. libraries)
diff --git a/Documentation/ABI/removed/dv1394 b/Documentation/ABI/removed/dv1394
new file mode 100644
index 000000000000..c2310b6676f4
--- /dev/null
+++ b/Documentation/ABI/removed/dv1394
@@ -0,0 +1,14 @@
+What: dv1394 (a.k.a. "OHCI-DV I/O support" for FireWire)
+Date: May 2010 (scheduled), finally removed in kernel v2.6.37
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ /dev/dv1394/* were character device files, one for each FireWire
+ controller and for NTSC and PAL respectively, from which DV data
+ could be received by read() or transmitted by write(). A few
+ ioctl()s allowed limited control.
+ This special-purpose interface has been superseded by libraw1394 +
+ libiec61883 which are functionally equivalent, support HDV, and
+ transparently work on top of the newer firewire kernel drivers.
+
+Users:
+ ffmpeg/libavformat (if configured for DV1394)
diff --git a/Documentation/ABI/removed/raw1394 b/Documentation/ABI/removed/raw1394
new file mode 100644
index 000000000000..490aa1efc4ae
--- /dev/null
+++ b/Documentation/ABI/removed/raw1394
@@ -0,0 +1,15 @@
+What: raw1394 (a.k.a. "Raw IEEE1394 I/O support" for FireWire)
+Date: May 2010 (scheduled), finally removed in kernel v2.6.37
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ /dev/raw1394 was a character device file that allowed low-level
+ access to FireWire buses. Its major drawbacks were its inability
+ to implement sensible device security policies, and its low level
+ of abstraction that required userspace clients do duplicate much
+ of the kernel's ieee1394 core functionality.
+ Replaced by /dev/fw*, i.e. the <linux/firewire-cdev.h> ABI of
+ firewire-core.
+
+Users:
+ libraw1394 (works with firewire-cdev too, transparent to library ABI
+ users)
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous
deleted file mode 100644
index 1b629622d883..000000000000
--- a/Documentation/ABI/removed/raw1394_legacy_isochronous
+++ /dev/null
@@ -1,16 +0,0 @@
-What: legacy isochronous ABI of raw1394 (1st generation iso ABI)
-Date: June 2007 (scheduled), removed in kernel v2.6.23
-Contact: linux1394-devel@lists.sourceforge.net
-Description:
- The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have
- been deprecated for quite some time. They are very inefficient as they
- come with high interrupt load and several layers of callbacks for each
- packet. Because of these deficiencies, the video1394 and dv1394 drivers
- and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created.
-
-Users:
- libraw1394 users via the long deprecated API raw1394_iso_write,
- raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv
-
- libdc1394, which optionally uses these old libraw1394 calls
- alternatively to the more efficient video1394 ABI
diff --git a/Documentation/ABI/removed/video1394 b/Documentation/ABI/removed/video1394
new file mode 100644
index 000000000000..c39c25aee77b
--- /dev/null
+++ b/Documentation/ABI/removed/video1394
@@ -0,0 +1,16 @@
+What: video1394 (a.k.a. "OHCI-1394 Video support" for FireWire)
+Date: May 2010 (scheduled), finally removed in kernel v2.6.37
+Contact: linux1394-devel@lists.sourceforge.net
+Description:
+ /dev/video1394/* were character device files, one for each FireWire
+ controller, which were used for isochronous I/O. It was added as an
+ alternative to raw1394's isochronous I/O functionality which had
+ performance issues in its first generation. Any video1394 user had
+ to use raw1394 + libraw1394 too because video1394 did not provide
+ asynchronous I/O for device discovery and configuration.
+ Replaced by /dev/fw*, i.e. the <linux/firewire-cdev.h> ABI of
+ firewire-core.
+
+Users:
+ libdc1394 (works with firewire-cdev too, transparent to library ABI
+ users)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-ibm-rtl b/Documentation/ABI/testing/sysfs-devices-system-ibm-rtl
new file mode 100644
index 000000000000..b82deeaec314
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-system-ibm-rtl
@@ -0,0 +1,22 @@
+What: state
+Date: Sep 2010
+KernelVersion: 2.6.37
+Contact: Vernon Mauery <vernux@us.ibm.com>
+Description: The state file allows a means by which to change in and
+ out of Premium Real-Time Mode (PRTM), as well as the
+ ability to query the current state.
+ 0 => PRTM off
+ 1 => PRTM enabled
+Users: The ibm-prtm userspace daemon uses this interface.
+
+
+What: version
+Date: Sep 2010
+KernelVersion: 2.6.37
+Contact: Vernon Mauery <vernux@us.ibm.com>
+Description: The version file provides a means by which to query
+ the RTL table version that lives in the Extended
+ BIOS Data Area (EBDA).
+Users: The ibm-prtm userspace daemon uses this interface.
+
+
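Reviewer note on the `state` attribute above: the ABI text defines only two values, 0 (PRTM off) and 1 (PRTM enabled). A minimal sketch of how a userspace tool might interpret what it reads from that file follows. The function name `parse_prtm_state` is hypothetical, and the sketch assumes the attribute is read as text with an optional trailing newline, which the ABI entry does not state. The sysfs path is deliberately left out because the entry gives only the attribute name.

```python
def parse_prtm_state(text: str) -> bool:
    """Interpret the contents of the ibm-rtl `state` attribute.

    Hypothetical helper: returns True when PRTM is enabled ("1") and
    False when it is off ("0"), as listed in the ABI description.
    Any other content is rejected rather than guessed at.
    """
    value = text.strip()  # assumption: value may carry a trailing newline
    if value == "0":
        return False
    if value == "1":
        return True
    raise ValueError(f"unexpected PRTM state: {value!r}")
```

Rejecting unknown values keeps the helper from silently treating a future or malformed state as "off".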
diff --git a/Documentation/ABI/testing/sysfs-driver-hid-roccat-pyra b/Documentation/ABI/testing/sysfs-driver-hid-roccat-pyra
new file mode 100644
index 000000000000..ad1125b02ff4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-hid-roccat-pyra
@@ -0,0 +1,98 @@
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/actual_cpi
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: It is possible to switch the cpi setting of the mouse with the
+ press of a button.
+ When read, this file returns the raw number of the actual cpi
+ setting reported by the mouse. This number has to be further
+ processed to receive the real dpi value.
+
+ VALUE DPI
+ 1 400
+ 2 800
+ 4 1600
+
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/actual_profile
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When read, this file returns the number of the actual profile in
+ range 0-4.
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/firmware_version
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When read, this file returns the raw integer version number of the
+ firmware reported by the mouse. Using the integer value eases
+ further usage in other programs. To receive the real version
+ number the decimal point has to be shifted 2 positions to the
+ left. E.g. a returned value of 138 means 1.38
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/profile_settings
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The mouse can store 5 profiles which can be switched by the
+ press of a button. A profile is split in settings and buttons.
+ profile_settings holds informations like resolution, sensitivity
+ and light effects.
+ When written, this file lets one write the respective profile
+ settings back to the mouse. The data has to be 13 bytes long.
+ The mouse will reject invalid data.
+ Which profile to write is determined by the profile number
+ contained in the data.
+ This file is writeonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/profile[1-5]_settings
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The mouse can store 5 profiles which can be switched by the
+ press of a button. A profile is split in settings and buttons.
+ profile_settings holds informations like resolution, sensitivity
+ and light effects.
+ When read, these files return the respective profile settings.
+ The returned data is 13 bytes in size.
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/profile_buttons
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The mouse can store 5 profiles which can be switched by the
+ press of a button. A profile is split in settings and buttons.
+ profile_buttons holds informations about button layout.
+ When written, this file lets one write the respective profile
+ buttons back to the mouse. The data has to be 19 bytes long.
+ The mouse will reject invalid data.
+ Which profile to write is determined by the profile number
+ contained in the data.
+ This file is writeonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/profile[1-5]_buttons
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The mouse can store 5 profiles which can be switched by the
+ press of a button. A profile is split in settings and buttons.
+ profile_buttons holds informations about button layout.
+ When read, these files return the respective profile buttons.
+ The returned data is 19 bytes in size.
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/startup_profile
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: The integer value of this attribute ranges from 0-4.
+ When read, this attribute returns the number of the profile
+ that's active when the mouse is powered on.
+ This file is readonly.
+
+What: /sys/bus/usb/devices/<busnum>-<devnum>:<config num>.<interface num>/settings
+Date: August 2010
+Contact: Stefan Achatz <erazor_de@users.sourceforge.net>
+Description: When read, this file returns the settings stored in the mouse.
+ The size of the data is 3 bytes and holds information on the
+ startup_profile.
+ When written, this file lets write settings back to the mouse.
+ The data has to be 3 bytes long. The mouse will reject invalid
+ data.
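Reviewer note on the roccat-pyra attributes above: `actual_cpi` and `firmware_version` both return raw integers that need post-processing. A minimal sketch of that post-processing follows, using only the VALUE/DPI table and the "138 means 1.38" example from the ABI text. The names `CPI_TO_DPI`, `cpi_to_dpi` and `format_firmware_version` are hypothetical and not part of the driver. Raw cpi values outside the documented table are rejected, since the entry does not say how they map.

```python
# Raw actual_cpi value -> DPI, exactly as tabulated in the ABI entry.
CPI_TO_DPI = {1: 400, 2: 800, 4: 1600}


def cpi_to_dpi(raw: int) -> int:
    """Translate the raw actual_cpi number into the real DPI value."""
    try:
        return CPI_TO_DPI[raw]
    except KeyError:
        # The ABI text lists only 1, 2 and 4; do not guess at others.
        raise ValueError(f"undocumented actual_cpi value: {raw}") from None


def format_firmware_version(raw: int) -> str:
    """Shift the decimal point two positions left, e.g. 138 -> '1.38'."""
    major, minor = divmod(raw, 100)
    return f"{major}.{minor:02d}"
```

The minor part is zero-padded so that a raw value such as 105 is rendered as 1.05 rather than 1.5.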
diff --git a/Documentation/ABI/testing/sysfs-module b/Documentation/ABI/testing/sysfs-module
new file mode 100644
index 000000000000..cfcec3bffc0a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-module
@@ -0,0 +1,12 @@
+What: /sys/module/pch_phub/drivers/.../pch_mac
+Date: August 2010
+KernelVersion: 2.6.35
+Contact: masa-korg@dsn.okisemi.com
+Description: Write/read GbE MAC address.
+
+What: /sys/module/pch_phub/drivers/.../pch_firmware
+Date: August 2010
+KernelVersion: 2.6.35
+Contact: masa-korg@dsn.okisemi.com
+Description: Write/read Option ROM data.
+
diff --git a/Documentation/DocBook/80211.tmpl b/Documentation/DocBook/80211.tmpl
new file mode 100644
index 000000000000..19a1210c2530
--- /dev/null
+++ b/Documentation/DocBook/80211.tmpl
@@ -0,0 +1,495 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE set PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+<set>
+ <setinfo>
+ <title>The 802.11 subsystems &ndash; for kernel developers</title>
+ <subtitle>
+ Explaining wireless 802.11 networking in the Linux kernel
+ </subtitle>
+
+ <copyright>
+ <year>2007-2009</year>
+ <holder>Johannes Berg</holder>
+ </copyright>
+
+ <authorgroup>
+ <author>
+ <firstname>Johannes</firstname>
+ <surname>Berg</surname>
+ <affiliation>
+ <address><email>johannes@sipsolutions.net</email></address>
+ </affiliation>
+ </author>
+ </authorgroup>
+
+ <legalnotice>
+ <para>
+ This documentation is free software; you can redistribute
+ it and/or modify it under the terms of the GNU General Public
+ License version 2 as published by the Free Software Foundation.
+ </para>
+ <para>
+ This documentation is distributed in the hope that it will be
+ useful, but WITHOUT ANY WARRANTY; without even the implied
+ warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ See the GNU General Public License for more details.
+ </para>
+ <para>
+ You should have received a copy of the GNU General Public
+ License along with this documentation; if not, write to the Free
+ Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ MA 02111-1307 USA
+ </para>
+ <para>
+ For more details see the file COPYING in the source
+ distribution of Linux.
+ </para>
+ </legalnotice>
+
+ <abstract>
+ <para>
+ These books attempt to give a description of the
+ various subsystems that play a role in 802.11 wireless
+ networking in Linux. Since these books are for kernel
+ developers they attempts to document the structures
+ and functions used in the kernel as well as giving a
+ higher-level overview.
+ </para>
+ <para>
+ The reader is expected to be familiar with the 802.11
+ standard as published by the IEEE in 802.11-2007 (or
+ possibly later versions). References to this standard
+ will be given as "802.11-2007 8.1.5".
+ </para>
+ </abstract>
+ </setinfo>
+ <book id="cfg80211-developers-guide">
+ <bookinfo>
+ <title>The cfg80211 subsystem</title>
+
+ <abstract>
+!Pinclude/net/cfg80211.h Introduction
+ </abstract>
+ </bookinfo>
+ <chapter>
+ <title>Device registration</title>
+!Pinclude/net/cfg80211.h Device registration
+!Finclude/net/cfg80211.h ieee80211_band
+!Finclude/net/cfg80211.h ieee80211_channel_flags
+!Finclude/net/cfg80211.h ieee80211_channel
+!Finclude/net/cfg80211.h ieee80211_rate_flags
+!Finclude/net/cfg80211.h ieee80211_rate
+!Finclude/net/cfg80211.h ieee80211_sta_ht_cap
+!Finclude/net/cfg80211.h ieee80211_supported_band
+!Finclude/net/cfg80211.h cfg80211_signal_type
+!Finclude/net/cfg80211.h wiphy_params_flags
+!Finclude/net/cfg80211.h wiphy_flags
+!Finclude/net/cfg80211.h wiphy
+!Finclude/net/cfg80211.h wireless_dev
+!Finclude/net/cfg80211.h wiphy_new
+!Finclude/net/cfg80211.h wiphy_register
+!Finclude/net/cfg80211.h wiphy_unregister
+!Finclude/net/cfg80211.h wiphy_free
+
+!Finclude/net/cfg80211.h wiphy_name
+!Finclude/net/cfg80211.h wiphy_dev
+!Finclude/net/cfg80211.h wiphy_priv
+!Finclude/net/cfg80211.h priv_to_wiphy
+!Finclude/net/cfg80211.h set_wiphy_dev
+!Finclude/net/cfg80211.h wdev_priv
+ </chapter>
+ <chapter>
+ <title>Actions and configuration</title>
+!Pinclude/net/cfg80211.h Actions and configuration
+!Finclude/net/cfg80211.h cfg80211_ops
+!Finclude/net/cfg80211.h vif_params
+!Finclude/net/cfg80211.h key_params
+!Finclude/net/cfg80211.h survey_info_flags
+!Finclude/net/cfg80211.h survey_info
+!Finclude/net/cfg80211.h beacon_parameters
+!Finclude/net/cfg80211.h plink_actions
+!Finclude/net/cfg80211.h station_parameters
+!Finclude/net/cfg80211.h station_info_flags
+!Finclude/net/cfg80211.h rate_info_flags
+!Finclude/net/cfg80211.h rate_info
+!Finclude/net/cfg80211.h station_info
+!Finclude/net/cfg80211.h monitor_flags
+!Finclude/net/cfg80211.h mpath_info_flags
+!Finclude/net/cfg80211.h mpath_info
+!Finclude/net/cfg80211.h bss_parameters
+!Finclude/net/cfg80211.h ieee80211_txq_params
+!Finclude/net/cfg80211.h cfg80211_crypto_settings
+!Finclude/net/cfg80211.h cfg80211_auth_request
+!Finclude/net/cfg80211.h cfg80211_assoc_request
+!Finclude/net/cfg80211.h cfg80211_deauth_request
+!Finclude/net/cfg80211.h cfg80211_disassoc_request
+!Finclude/net/cfg80211.h cfg80211_ibss_params
+!Finclude/net/cfg80211.h cfg80211_connect_params
+!Finclude/net/cfg80211.h cfg80211_pmksa
+!Finclude/net/cfg80211.h cfg80211_send_rx_auth
+!Finclude/net/cfg80211.h cfg80211_send_auth_timeout
+!Finclude/net/cfg80211.h __cfg80211_auth_canceled
+!Finclude/net/cfg80211.h cfg80211_send_rx_assoc
+!Finclude/net/cfg80211.h cfg80211_send_assoc_timeout
+!Finclude/net/cfg80211.h cfg80211_send_deauth
+!Finclude/net/cfg80211.h __cfg80211_send_deauth
+!Finclude/net/cfg80211.h cfg80211_send_disassoc
+!Finclude/net/cfg80211.h __cfg80211_send_disassoc
+!Finclude/net/cfg80211.h cfg80211_ibss_joined
+!Finclude/net/cfg80211.h cfg80211_connect_result
+!Finclude/net/cfg80211.h cfg80211_roamed
+!Finclude/net/cfg80211.h cfg80211_disconnected
+!Finclude/net/cfg80211.h cfg80211_ready_on_channel
+!Finclude/net/cfg80211.h cfg80211_remain_on_channel_expired
+!Finclude/net/cfg80211.h cfg80211_new_sta
+!Finclude/net/cfg80211.h cfg80211_rx_mgmt
+!Finclude/net/cfg80211.h cfg80211_mgmt_tx_status
+!Finclude/net/cfg80211.h cfg80211_cqm_rssi_notify
+!Finclude/net/cfg80211.h cfg80211_michael_mic_failure
+ </chapter>
+ <chapter>
+ <title>Scanning and BSS list handling</title>
+!Pinclude/net/cfg80211.h Scanning and BSS list handling
+!Finclude/net/cfg80211.h cfg80211_ssid
+!Finclude/net/cfg80211.h cfg80211_scan_request
+!Finclude/net/cfg80211.h cfg80211_scan_done
+!Finclude/net/cfg80211.h cfg80211_bss
+!Finclude/net/cfg80211.h cfg80211_inform_bss_frame
+!Finclude/net/cfg80211.h cfg80211_inform_bss
+!Finclude/net/cfg80211.h cfg80211_unlink_bss
+!Finclude/net/cfg80211.h cfg80211_find_ie
+!Finclude/net/cfg80211.h ieee80211_bss_get_ie
+ </chapter>
+ <chapter>
+ <title>Utility functions</title>
+!Pinclude/net/cfg80211.h Utility functions
+!Finclude/net/cfg80211.h ieee80211_channel_to_frequency
+!Finclude/net/cfg80211.h ieee80211_frequency_to_channel
+!Finclude/net/cfg80211.h ieee80211_get_channel
+!Finclude/net/cfg80211.h ieee80211_get_response_rate
+!Finclude/net/cfg80211.h ieee80211_hdrlen
+!Finclude/net/cfg80211.h ieee80211_get_hdrlen_from_skb
+!Finclude/net/cfg80211.h ieee80211_radiotap_iterator
+ </chapter>
+ <chapter>
+ <title>Data path helpers</title>
+!Pinclude/net/cfg80211.h Data path helpers
+!Finclude/net/cfg80211.h ieee80211_data_to_8023
+!Finclude/net/cfg80211.h ieee80211_data_from_8023
+!Finclude/net/cfg80211.h ieee80211_amsdu_to_8023s
+!Finclude/net/cfg80211.h cfg80211_classify8021d
+ </chapter>
+ <chapter>
+ <title>Regulatory enforcement infrastructure</title>
+!Pinclude/net/cfg80211.h Regulatory enforcement infrastructure
+!Finclude/net/cfg80211.h regulatory_hint
+!Finclude/net/cfg80211.h wiphy_apply_custom_regulatory
+!Finclude/net/cfg80211.h freq_reg_info
+ </chapter>
+ <chapter>
+ <title>RFkill integration</title>
+!Pinclude/net/cfg80211.h RFkill integration
+!Finclude/net/cfg80211.h wiphy_rfkill_set_hw_state
+!Finclude/net/cfg80211.h wiphy_rfkill_start_polling
+!Finclude/net/cfg80211.h wiphy_rfkill_stop_polling
+ </chapter>
+ <chapter>
+ <title>Test mode</title>
+!Pinclude/net/cfg80211.h Test mode
+!Finclude/net/cfg80211.h cfg80211_testmode_alloc_reply_skb
+!Finclude/net/cfg80211.h cfg80211_testmode_reply
+!Finclude/net/cfg80211.h cfg80211_testmode_alloc_event_skb
+!Finclude/net/cfg80211.h cfg80211_testmode_event
+ </chapter>
+ </book>
+ <book id="mac80211-developers-guide">
+ <bookinfo>
+ <title>The mac80211 subsystem</title>
+ <abstract>
+!Pinclude/net/mac80211.h Introduction
+!Pinclude/net/mac80211.h Warning
+ </abstract>
+ </bookinfo>
+
+ <toc></toc>
+
+ <!--
+ Generally, this document shall be ordered by increasing complexity.
+ It is important to note that readers should be able to read only
+ the first few sections to get a working driver and only advanced
+ usage should require reading the full document.
+ -->
+
+ <part>
+ <title>The basic mac80211 driver interface</title>
+ <partintro>
+ <para>
+ You should read and understand the information contained
+ within this part of the book while implementing a driver.
+ In some chapters, advanced usage is noted, that may be
+ skipped at first.
+ </para>
+ <para>
+ This part of the book only covers station and monitor mode
+ functionality, additional information required to implement
+ the other modes is covered in the second part of the book.
+ </para>
+ </partintro>
+
+ <chapter id="basics">
+ <title>Basic hardware handling</title>
+ <para>TBD</para>
+ <para>
+ This chapter shall contain information on getting a hw
+ struct allocated and registered with mac80211.
+ </para>
+ <para>
+ Since it is required to allocate rates/modes before registering
+ a hw struct, this chapter shall also contain information on setting
+ up the rate/mode structs.
+ </para>
+ <para>
+ Additionally, some discussion about the callbacks and
+ the general programming model should be in here, including
+ the definition of ieee80211_ops which will be referred to
+ a lot.
+ </para>
+ <para>
+ Finally, a discussion of hardware capabilities should be done
+ with references to other parts of the book.
+ </para>
+ <!-- intentionally multiple !F lines to get proper order -->
+!Finclude/net/mac80211.h ieee80211_hw
+!Finclude/net/mac80211.h ieee80211_hw_flags
+!Finclude/net/mac80211.h SET_IEEE80211_DEV
+!Finclude/net/mac80211.h SET_IEEE80211_PERM_ADDR
+!Finclude/net/mac80211.h ieee80211_ops
+!Finclude/net/mac80211.h ieee80211_alloc_hw
+!Finclude/net/mac80211.h ieee80211_register_hw
+!Finclude/net/mac80211.h ieee80211_get_tx_led_name
+!Finclude/net/mac80211.h ieee80211_get_rx_led_name
+!Finclude/net/mac80211.h ieee80211_get_assoc_led_name
+!Finclude/net/mac80211.h ieee80211_get_radio_led_name
+!Finclude/net/mac80211.h ieee80211_unregister_hw
+!Finclude/net/mac80211.h ieee80211_free_hw
+ </chapter>
+
+ <chapter id="phy-handling">
+ <title>PHY configuration</title>
+ <para>TBD</para>
+ <para>
+ This chapter should describe PHY handling including
+ start/stop callbacks and the various structures used.
+ </para>
+!Finclude/net/mac80211.h ieee80211_conf
+!Finclude/net/mac80211.h ieee80211_conf_flags
+ </chapter>
+
+ <chapter id="iface-handling">
+ <title>Virtual interfaces</title>
+ <para>TBD</para>
+ <para>
+ This chapter should describe virtual interface basics
+ that are relevant to the driver (VLANs, MGMT etc are not.)
+ It should explain the use of the add_iface/remove_iface
+ callbacks as well as the interface configuration callbacks.
+ </para>
+ <para>Things related to AP mode should be discussed there.</para>
+ <para>
+ Things related to supporting multiple interfaces should be
+ in the appropriate chapter, a BIG FAT note should be here about
+ this though and the recommendation to allow only a single
+ interface in STA mode at first!
+ </para>
+!Finclude/net/mac80211.h ieee80211_vif
+ </chapter>
+
+ <chapter id="rx-tx">
+ <title>Receive and transmit processing</title>
+ <sect1>
+ <title>what should be here</title>
+ <para>TBD</para>
+ <para>
+ This should describe the receive and transmit
+ paths in mac80211/the drivers as well as
+ transmit status handling.
+ </para>
+ </sect1>
+ <sect1>
+ <title>Frame format</title>
+!Pinclude/net/mac80211.h Frame format
+ </sect1>
+ <sect1>
+ <title>Packet alignment</title>
+!Pnet/mac80211/rx.c Packet alignment
+ </sect1>
+ <sect1>
+ <title>Calling into mac80211 from interrupts</title>
+!Pinclude/net/mac80211.h Calling mac80211 from interrupts
+ </sect1>
+ <sect1>
+ <title>functions/definitions</title>
+!Finclude/net/mac80211.h ieee80211_rx_status
+!Finclude/net/mac80211.h mac80211_rx_flags
+!Finclude/net/mac80211.h ieee80211_tx_info
+!Finclude/net/mac80211.h ieee80211_rx
+!Finclude/net/mac80211.h ieee80211_rx_irqsafe
+!Finclude/net/mac80211.h ieee80211_tx_status
+!Finclude/net/mac80211.h ieee80211_tx_status_irqsafe
+!Finclude/net/mac80211.h ieee80211_rts_get
+!Finclude/net/mac80211.h ieee80211_rts_duration
+!Finclude/net/mac80211.h ieee80211_ctstoself_get
+!Finclude/net/mac80211.h ieee80211_ctstoself_duration
+!Finclude/net/mac80211.h ieee80211_generic_frame_duration
+!Finclude/net/mac80211.h ieee80211_wake_queue
+!Finclude/net/mac80211.h ieee80211_stop_queue
+!Finclude/net/mac80211.h ieee80211_wake_queues
+!Finclude/net/mac80211.h ieee80211_stop_queues
+ </sect1>
+ </chapter>
+
+ <chapter id="filters">
+ <title>Frame filtering</title>
+!Pinclude/net/mac80211.h Frame filtering
+!Finclude/net/mac80211.h ieee80211_filter_flags
+ </chapter>
+ </part>
+
+ <part id="advanced">
+ <title>Advanced driver interface</title>
+ <partintro>
+ <para>
+ Information contained within this part of the book is
+ of interest only for advanced interaction of mac80211
+ with drivers to exploit more hardware capabilities and
+ improve performance.
+ </para>
+ </partintro>
+
+ <chapter id="hardware-crypto-offload">
+ <title>Hardware crypto acceleration</title>
+!Pinclude/net/mac80211.h Hardware crypto acceleration
+ <!-- intentionally multiple !F lines to get proper order -->
+!Finclude/net/mac80211.h set_key_cmd
+!Finclude/net/mac80211.h ieee80211_key_conf
+!Finclude/net/mac80211.h ieee80211_key_flags
+ </chapter>
+
+ <chapter id="powersave">
+ <title>Powersave support</title>
+!Pinclude/net/mac80211.h Powersave support
+ </chapter>
+
+ <chapter id="beacon-filter">
+ <title>Beacon filter support</title>
+!Pinclude/net/mac80211.h Beacon filter support
+!Finclude/net/mac80211.h ieee80211_beacon_loss
+ </chapter>
+
+ <chapter id="qos">
+ <title>Multiple queues and QoS support</title>
+ <para>TBD</para>
+!Finclude/net/mac80211.h ieee80211_tx_queue_params
+ </chapter>
+
+ <chapter id="AP">
+ <title>Access point mode support</title>
+ <para>TBD</para>
+ <para>Some parts of the if_conf should be discussed here instead</para>
+ <para>
+ Insert notes about VLAN interfaces with hw crypto here or
+ in the hw crypto chapter.
+ </para>
+!Finclude/net/mac80211.h ieee80211_get_buffered_bc
+!Finclude/net/mac80211.h ieee80211_beacon_get
+ </chapter>
+
+ <chapter id="multi-iface">
+ <title>Supporting multiple virtual interfaces</title>
+ <para>TBD</para>
+ <para>
+ Note: WDS with identical MAC address should almost always be OK
+ </para>
+ <para>
+ Insert notes about having multiple virtual interfaces with
+ different MAC addresses here, note which configurations are
+ supported by mac80211, add notes about supporting hw crypto
+ with it.
+ </para>
+ </chapter>
+
+ <chapter id="hardware-scan-offload">
+ <title>Hardware scan offload</title>
+ <para>TBD</para>
+!Finclude/net/mac80211.h ieee80211_scan_completed
+ </chapter>
+ </part>
+
+ <part id="rate-control">
+ <title>Rate control interface</title>
+ <partintro>
+ <para>TBD</para>
+ <para>
+ This part of the book describes the rate control algorithm
+ interface and how it relates to mac80211 and drivers.
+ </para>
+ </partintro>
+ <chapter id="dummy">
+ <title>dummy chapter</title>
+ <para>TBD</para>
+ </chapter>
+ </part>
+
+ <part id="internal">
+ <title>Internals</title>
+ <partintro>
+ <para>TBD</para>
+ <para>
+ This part of the book describes mac80211 internals.
+ </para>
+ </partintro>
+
+ <chapter id="key-handling">
+ <title>Key handling</title>
+ <sect1>
+ <title>Key handling basics</title>
+!Pnet/mac80211/key.c Key handling basics
+ </sect1>
+ <sect1>
+ <title>MORE TBD</title>
+ <para>TBD</para>
+ </sect1>
+ </chapter>
+
+ <chapter id="rx-processing">
+ <title>Receive processing</title>
+ <para>TBD</para>
+ </chapter>
+
+ <chapter id="tx-processing">
+ <title>Transmit processing</title>
+ <para>TBD</para>
+ </chapter>
+
+ <chapter id="sta-info">
+ <title>Station info handling</title>
+ <sect1>
+ <title>Programming information</title>
+!Fnet/mac80211/sta_info.h sta_info
+!Fnet/mac80211/sta_info.h ieee80211_sta_info_flags
+ </sect1>
+ <sect1>
+ <title>STA information lifetime rules</title>
+!Pnet/mac80211/sta_info.c STA information lifetime rules
+ </sect1>
+ </chapter>
+
+ <chapter id="synchronisation">
+ <title>Synchronisation</title>
+ <para>TBD</para>
+ <para>Locking, lots of RCU</para>
+ </chapter>
+ </part>
+ </book>
+</set>
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 34929f24c284..8b6e00a71034 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -12,7 +12,7 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
12 kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \ 12 kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
13 gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ 13 gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
14 genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \ 14 genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
15 mac80211.xml debugobjects.xml sh.xml regulator.xml \ 15 80211.xml debugobjects.xml sh.xml regulator.xml \
16 alsa-driver-api.xml writing-an-alsa-driver.xml \ 16 alsa-driver-api.xml writing-an-alsa-driver.xml \
17 tracepoint.xml media.xml drm.xml 17 tracepoint.xml media.xml drm.xml
18 18
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index feca0758391e..22edcbb9ddaf 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -51,8 +51,13 @@
51 <sect1><title>Delaying, scheduling, and timer routines</title> 51 <sect1><title>Delaying, scheduling, and timer routines</title>
52!Iinclude/linux/sched.h 52!Iinclude/linux/sched.h
53!Ekernel/sched.c 53!Ekernel/sched.c
54!Iinclude/linux/completion.h
54!Ekernel/timer.c 55!Ekernel/timer.c
55 </sect1> 56 </sect1>
57 <sect1><title>Wait queues and Wake events</title>
58!Iinclude/linux/wait.h
59!Ekernel/wait.c
60 </sect1>
56 <sect1><title>High-resolution timers</title> 61 <sect1><title>High-resolution timers</title>
57!Iinclude/linux/ktime.h 62!Iinclude/linux/ktime.h
58!Iinclude/linux/hrtimer.h 63!Iinclude/linux/hrtimer.h
diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
index 910c923a9b86..2861055afd7a 100644
--- a/Documentation/DocBook/drm.tmpl
+++ b/Documentation/DocBook/drm.tmpl
@@ -136,6 +136,7 @@
136#ifdef CONFIG_COMPAT 136#ifdef CONFIG_COMPAT
137 .compat_ioctl = i915_compat_ioctl, 137 .compat_ioctl = i915_compat_ioctl,
138#endif 138#endif
139 .llseek = noop_llseek,
139 }, 140 },
140 .pci_driver = { 141 .pci_driver = {
141 .name = DRIVER_NAME, 142 .name = DRIVER_NAME,
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 6899f471fb15..7160652a8736 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -93,6 +93,12 @@ X!Ilib/string.c
93!Elib/crc32.c 93!Elib/crc32.c
94!Elib/crc-ccitt.c 94!Elib/crc-ccitt.c
95 </sect1> 95 </sect1>
96
97 <sect1 id="idr"><title>idr/ida Functions</title>
98!Pinclude/linux/idr.h idr sync
99!Plib/idr.c IDA description
100!Elib/idr.c
101 </sect1>
96 </chapter> 102 </chapter>
97 103
98 <chapter id="mm"> 104 <chapter id="mm">
@@ -257,7 +263,8 @@ X!Earch/x86/kernel/mca_32.c
257!Iblock/blk-sysfs.c 263!Iblock/blk-sysfs.c
258!Eblock/blk-settings.c 264!Eblock/blk-settings.c
259!Eblock/blk-exec.c 265!Eblock/blk-exec.c
260!Eblock/blk-barrier.c 266!Eblock/blk-flush.c
267!Eblock/blk-lib.c
261!Eblock/blk-tag.c 268!Eblock/blk-tag.c
262!Iblock/blk-tag.c 269!Iblock/blk-tag.c
263!Eblock/blk-integrity.c 270!Eblock/blk-integrity.c
diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl
deleted file mode 100644
index affb15a344a1..000000000000
--- a/Documentation/DocBook/mac80211.tmpl
+++ /dev/null
@@ -1,337 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="mac80211-developers-guide">
6 <bookinfo>
7 <title>The mac80211 subsystem for kernel developers</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Johannes</firstname>
12 <surname>Berg</surname>
13 <affiliation>
14 <address><email>johannes@sipsolutions.net</email></address>
15 </affiliation>
16 </author>
17 </authorgroup>
18
19 <copyright>
20 <year>2007-2009</year>
21 <holder>Johannes Berg</holder>
22 </copyright>
23
24 <legalnotice>
25 <para>
26 This documentation is free software; you can redistribute
27 it and/or modify it under the terms of the GNU General Public
28 License version 2 as published by the Free Software Foundation.
29 </para>
30
31 <para>
32 This documentation is distributed in the hope that it will be
33 useful, but WITHOUT ANY WARRANTY; without even the implied
34 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
35 See the GNU General Public License for more details.
36 </para>
37
38 <para>
39 You should have received a copy of the GNU General Public
40 License along with this documentation; if not, write to the Free
41 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
42 MA 02111-1307 USA
43 </para>
44
45 <para>
46 For more details see the file COPYING in the source
47 distribution of Linux.
48 </para>
49 </legalnotice>
50
51 <abstract>
52!Pinclude/net/mac80211.h Introduction
53!Pinclude/net/mac80211.h Warning
54 </abstract>
55 </bookinfo>
56
57 <toc></toc>
58
59<!--
60Generally, this document shall be ordered by increasing complexity.
61It is important to note that readers should be able to read only
62the first few sections to get a working driver and only advanced
63usage should require reading the full document.
64-->
65
66 <part>
67 <title>The basic mac80211 driver interface</title>
68 <partintro>
69 <para>
70 You should read and understand the information contained
71 within this part of the book while implementing a driver.
72 In some chapters, advanced usage is noted that may be
73 skipped at first.
74 </para>
75 <para>
76 This part of the book only covers station and monitor mode
77 functionality; additional information required to implement
78 the other modes is covered in the second part of the book.
79 </para>
80 </partintro>
81
82 <chapter id="basics">
83 <title>Basic hardware handling</title>
84 <para>TBD</para>
85 <para>
86 This chapter shall contain information on getting a hw
87 struct allocated and registered with mac80211.
88 </para>
89 <para>
90 Since it is required to allocate rates/modes before registering
91 a hw struct, this chapter shall also contain information on setting
92 up the rate/mode structs.
93 </para>
94 <para>
95 Additionally, some discussion about the callbacks and
96 the general programming model should be in here, including
97 the definition of ieee80211_ops which will be referred to
98 a lot.
99 </para>
100 <para>
101 Finally, a discussion of hardware capabilities should be done
102 with references to other parts of the book.
103 </para>
104<!-- intentionally multiple !F lines to get proper order -->
105!Finclude/net/mac80211.h ieee80211_hw
106!Finclude/net/mac80211.h ieee80211_hw_flags
107!Finclude/net/mac80211.h SET_IEEE80211_DEV
108!Finclude/net/mac80211.h SET_IEEE80211_PERM_ADDR
109!Finclude/net/mac80211.h ieee80211_ops
110!Finclude/net/mac80211.h ieee80211_alloc_hw
111!Finclude/net/mac80211.h ieee80211_register_hw
112!Finclude/net/mac80211.h ieee80211_get_tx_led_name
113!Finclude/net/mac80211.h ieee80211_get_rx_led_name
114!Finclude/net/mac80211.h ieee80211_get_assoc_led_name
115!Finclude/net/mac80211.h ieee80211_get_radio_led_name
116!Finclude/net/mac80211.h ieee80211_unregister_hw
117!Finclude/net/mac80211.h ieee80211_free_hw
118 </chapter>
119
120 <chapter id="phy-handling">
121 <title>PHY configuration</title>
122 <para>TBD</para>
123 <para>
124 This chapter should describe PHY handling including
125 start/stop callbacks and the various structures used.
126 </para>
127!Finclude/net/mac80211.h ieee80211_conf
128!Finclude/net/mac80211.h ieee80211_conf_flags
129 </chapter>
130
131 <chapter id="iface-handling">
132 <title>Virtual interfaces</title>
133 <para>TBD</para>
134 <para>
135 This chapter should describe virtual interface basics
136 that are relevant to the driver (VLANs, MGMT etc are not.)
137 It should explain the use of the add_iface/remove_iface
138 callbacks as well as the interface configuration callbacks.
139 </para>
140 <para>Things related to AP mode should be discussed there.</para>
141 <para>
142 Things related to supporting multiple interfaces should be
143 in the appropriate chapter, a BIG FAT note should be here about
144 this though and the recommendation to allow only a single
145 interface in STA mode at first!
146 </para>
147!Finclude/net/mac80211.h ieee80211_vif
148 </chapter>
149
150 <chapter id="rx-tx">
151 <title>Receive and transmit processing</title>
152 <sect1>
153 <title>what should be here</title>
154 <para>TBD</para>
155 <para>
156 This should describe the receive and transmit
157 paths in mac80211/the drivers as well as
158 transmit status handling.
159 </para>
160 </sect1>
161 <sect1>
162 <title>Frame format</title>
163!Pinclude/net/mac80211.h Frame format
164 </sect1>
165 <sect1>
166 <title>Packet alignment</title>
167!Pnet/mac80211/rx.c Packet alignment
168 </sect1>
169 <sect1>
170 <title>Calling into mac80211 from interrupts</title>
171!Pinclude/net/mac80211.h Calling mac80211 from interrupts
172 </sect1>
173 <sect1>
174 <title>functions/definitions</title>
175!Finclude/net/mac80211.h ieee80211_rx_status
176!Finclude/net/mac80211.h mac80211_rx_flags
177!Finclude/net/mac80211.h ieee80211_tx_info
178!Finclude/net/mac80211.h ieee80211_rx
179!Finclude/net/mac80211.h ieee80211_rx_irqsafe
180!Finclude/net/mac80211.h ieee80211_tx_status
181!Finclude/net/mac80211.h ieee80211_tx_status_irqsafe
182!Finclude/net/mac80211.h ieee80211_rts_get
183!Finclude/net/mac80211.h ieee80211_rts_duration
184!Finclude/net/mac80211.h ieee80211_ctstoself_get
185!Finclude/net/mac80211.h ieee80211_ctstoself_duration
186!Finclude/net/mac80211.h ieee80211_generic_frame_duration
187!Finclude/net/mac80211.h ieee80211_wake_queue
188!Finclude/net/mac80211.h ieee80211_stop_queue
189!Finclude/net/mac80211.h ieee80211_wake_queues
190!Finclude/net/mac80211.h ieee80211_stop_queues
191 </sect1>
192 </chapter>
193
194 <chapter id="filters">
195 <title>Frame filtering</title>
196!Pinclude/net/mac80211.h Frame filtering
197!Finclude/net/mac80211.h ieee80211_filter_flags
198 </chapter>
199 </part>
200
201 <part id="advanced">
202 <title>Advanced driver interface</title>
203 <partintro>
204 <para>
205 Information contained within this part of the book is
206 of interest only for advanced interaction of mac80211
207 with drivers to exploit more hardware capabilities and
208 improve performance.
209 </para>
210 </partintro>
211
212 <chapter id="hardware-crypto-offload">
213 <title>Hardware crypto acceleration</title>
214!Pinclude/net/mac80211.h Hardware crypto acceleration
215<!-- intentionally multiple !F lines to get proper order -->
216!Finclude/net/mac80211.h set_key_cmd
217!Finclude/net/mac80211.h ieee80211_key_conf
218!Finclude/net/mac80211.h ieee80211_key_alg
219!Finclude/net/mac80211.h ieee80211_key_flags
220 </chapter>
221
222 <chapter id="powersave">
223 <title>Powersave support</title>
224!Pinclude/net/mac80211.h Powersave support
225 </chapter>
226
227 <chapter id="beacon-filter">
228 <title>Beacon filter support</title>
229!Pinclude/net/mac80211.h Beacon filter support
230!Finclude/net/mac80211.h ieee80211_beacon_loss
231 </chapter>
232
233 <chapter id="qos">
234 <title>Multiple queues and QoS support</title>
235 <para>TBD</para>
236!Finclude/net/mac80211.h ieee80211_tx_queue_params
237 </chapter>
238
239 <chapter id="AP">
240 <title>Access point mode support</title>
241 <para>TBD</para>
242 <para>Some parts of the if_conf should be discussed here instead</para>
243 <para>
244 Insert notes about VLAN interfaces with hw crypto here or
245 in the hw crypto chapter.
246 </para>
247!Finclude/net/mac80211.h ieee80211_get_buffered_bc
248!Finclude/net/mac80211.h ieee80211_beacon_get
249 </chapter>
250
251 <chapter id="multi-iface">
252 <title>Supporting multiple virtual interfaces</title>
253 <para>TBD</para>
254 <para>
255 Note: WDS with identical MAC address should almost always be OK
256 </para>
257 <para>
258 Insert notes about having multiple virtual interfaces with
259 different MAC addresses here, note which configurations are
260 supported by mac80211, add notes about supporting hw crypto
261 with it.
262 </para>
263 </chapter>
264
265 <chapter id="hardware-scan-offload">
266 <title>Hardware scan offload</title>
267 <para>TBD</para>
268!Finclude/net/mac80211.h ieee80211_scan_completed
269 </chapter>
270 </part>
271
272 <part id="rate-control">
273 <title>Rate control interface</title>
274 <partintro>
275 <para>TBD</para>
276 <para>
277 This part of the book describes the rate control algorithm
278 interface and how it relates to mac80211 and drivers.
279 </para>
280 </partintro>
281 <chapter id="dummy">
282 <title>dummy chapter</title>
283 <para>TBD</para>
284 </chapter>
285 </part>
286
287 <part id="internal">
288 <title>Internals</title>
289 <partintro>
290 <para>TBD</para>
291 <para>
292 This part of the book describes mac80211 internals.
293 </para>
294 </partintro>
295
296 <chapter id="key-handling">
297 <title>Key handling</title>
298 <sect1>
299 <title>Key handling basics</title>
300!Pnet/mac80211/key.c Key handling basics
301 </sect1>
302 <sect1>
303 <title>MORE TBD</title>
304 <para>TBD</para>
305 </sect1>
306 </chapter>
307
308 <chapter id="rx-processing">
309 <title>Receive processing</title>
310 <para>TBD</para>
311 </chapter>
312
313 <chapter id="tx-processing">
314 <title>Transmit processing</title>
315 <para>TBD</para>
316 </chapter>
317
318 <chapter id="sta-info">
319 <title>Station info handling</title>
320 <sect1>
321 <title>Programming information</title>
322!Fnet/mac80211/sta_info.h sta_info
323!Fnet/mac80211/sta_info.h ieee80211_sta_info_flags
324 </sect1>
325 <sect1>
326 <title>STA information lifetime rules</title>
327!Pnet/mac80211/sta_info.c STA information lifetime rules
328 </sect1>
329 </chapter>
330
331 <chapter id="synchronisation">
332 <title>Synchronisation</title>
333 <para>TBD</para>
334 <para>Locking, lots of RCU</para>
335 </chapter>
336 </part>
337</book>
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index 6e25c2659e0a..a2976a6de033 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -21,6 +21,7 @@
21#include <sys/types.h> 21#include <sys/types.h>
22#include <sys/stat.h> 22#include <sys/stat.h>
23#include <sys/socket.h> 23#include <sys/socket.h>
24#include <sys/wait.h>
24#include <signal.h> 25#include <signal.h>
25 26
26#include <linux/genetlink.h> 27#include <linux/genetlink.h>
@@ -266,11 +267,13 @@ int main(int argc, char *argv[])
266 int containerset = 0; 267 int containerset = 0;
267 char containerpath[1024]; 268 char containerpath[1024];
268 int cfd = 0; 269 int cfd = 0;
270 int forking = 0;
271 sigset_t sigset;
269 272
270 struct msgtemplate msg; 273 struct msgtemplate msg;
271 274
272 while (1) { 275 while (!forking) {
273 c = getopt(argc, argv, "qdiw:r:m:t:p:vlC:"); 276 c = getopt(argc, argv, "qdiw:r:m:t:p:vlC:c:");
274 if (c < 0) 277 if (c < 0)
275 break; 278 break;
276 279
@@ -319,6 +322,28 @@ int main(int argc, char *argv[])
319 err(1, "Invalid pid\n"); 322 err(1, "Invalid pid\n");
320 cmd_type = TASKSTATS_CMD_ATTR_PID; 323 cmd_type = TASKSTATS_CMD_ATTR_PID;
321 break; 324 break;
325 case 'c':
326
327 /* Block SIGCHLD for sigwait() later */
328 if (sigemptyset(&sigset) == -1)
329 err(1, "Failed to empty sigset");
330 if (sigaddset(&sigset, SIGCHLD))
331 err(1, "Failed to set sigchld in sigset");
332 sigprocmask(SIG_BLOCK, &sigset, NULL);
333
334 /* fork/exec a child */
335 tid = fork();
336 if (tid < 0)
337 err(1, "Fork failed\n");
338 if (tid == 0)
339 if (execvp(argv[optind - 1],
340 &argv[optind - 1]) < 0)
341 exit(-1);
342
343 /* Set the command type and avoid further processing */
344 cmd_type = TASKSTATS_CMD_ATTR_PID;
345 forking = 1;
346 break;
322 case 'v': 347 case 'v':
323 printf("debug on\n"); 348 printf("debug on\n");
324 dbg = 1; 349 dbg = 1;
@@ -370,6 +395,15 @@ int main(int argc, char *argv[])
370 goto err; 395 goto err;
371 } 396 }
372 397
398 /*
399 * If we forked a child, wait for it to exit. Cannot use waitpid()
400 * as all the delicious data would be reaped as part of the wait
401 */
402 if (tid && forking) {
403 int sig_received;
404 sigwait(&sigset, &sig_received);
405 }
406
373 if (tid) { 407 if (tid) {
374 rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET, 408 rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
375 cmd_type, &tid, sizeof(__u32)); 409 cmd_type, &tid, sizeof(__u32));
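The fork-then-sigwait pattern added to getdelays.c above can be sketched in isolation. This is a minimal userspace sketch, not the getdelays.c code itself: the function name spawn_and_wait_unreaped() is hypothetical, and the error handling is simplified to a -1 return. It shows why SIGCHLD is blocked before fork(): the signal then stays pending until sigwait() collects it, and the child is left unreaped so its accounting data can still be queried.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork and exec argv[0], then block until SIGCHLD
 * arrives. The child is NOT reaped here, so the caller can still query
 * per-task statistics before calling waitpid().
 * Returns the child's pid on success, -1 on failure.
 */
pid_t spawn_and_wait_unreaped(char *const argv[])
{
	sigset_t set;
	int sig;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	/* Block SIGCHLD first so it cannot be delivered before sigwait(). */
	if (sigemptyset(&set) == -1 || sigaddset(&set, SIGCHLD) == -1)
		return -1;
	if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}

	/* sigwait() consumes the pending SIGCHLD without reaping the child. */
	if (sigwait(&set, &sig) != 0)
		return -1;
	return sig == SIGCHLD ? pid : -1;
}
```

A caller would query the statistics for the returned pid and only afterwards call waitpid() to reap the child.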
diff --git a/Documentation/arm/SA1100/FreeBird b/Documentation/arm/SA1100/FreeBird
index fb23b770aaf4..ab9193663b2b 100644
--- a/Documentation/arm/SA1100/FreeBird
+++ b/Documentation/arm/SA1100/FreeBird
@@ -1,6 +1,6 @@
1Freebird-1.1 is produced by Legned(C) ,Inc. 1Freebird-1.1 is produced by Legend(C), Inc.
2http://web.archive.org/web/*/http://www.legend.com.cn 2http://web.archive.org/web/*/http://www.legend.com.cn
3and software/linux mainatined by Coventive(C),Inc. 3and software/linux maintained by Coventive(C), Inc.
4(http://www.coventive.com) 4(http://www.coventive.com)
5 5
6Based on the Nicolas's strongarm kernel tree. 6Based on the Nicolas's strongarm kernel tree.
diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX
index a406286f6f3e..d111e3b23db0 100644
--- a/Documentation/block/00-INDEX
+++ b/Documentation/block/00-INDEX
@@ -1,7 +1,5 @@
100-INDEX 100-INDEX
2 - This file 2 - This file
3barrier.txt
4 - I/O Barriers
5biodoc.txt 3biodoc.txt
6 - Notes on the Generic Block Layer Rewrite in Linux 2.5 4 - Notes on the Generic Block Layer Rewrite in Linux 2.5
7capability.txt 5capability.txt
@@ -16,3 +14,5 @@ stat.txt
16 - Block layer statistics in /sys/block/<dev>/stat 14 - Block layer statistics in /sys/block/<dev>/stat
17switching-sched.txt 15switching-sched.txt
18 - Switching I/O schedulers at runtime 16 - Switching I/O schedulers at runtime
17writeback_cache_control.txt
18 - Control of volatile write back caches
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt
deleted file mode 100644
index 2c2f24f634e4..000000000000
--- a/Documentation/block/barrier.txt
+++ /dev/null
@@ -1,261 +0,0 @@
1I/O Barriers
2============
3Tejun Heo <htejun@gmail.com>, July 22 2005
4
5I/O barrier requests are used to guarantee ordering around the barrier
6requests. Unless you're crazy enough to use disk drives for
7implementing synchronization constructs (wow, sounds interesting...),
8the ordering is meaningful only for write requests for things like
9journal checkpoints. All requests queued before a barrier request
10must be finished (made it to the physical medium) before the barrier
11request is started, and all requests queued after the barrier request
12must be started only after the barrier request is finished (again,
13made it to the physical medium).
14
15In other words, I/O barrier requests have the following two properties.
16
171. Request ordering
18
19Requests cannot pass the barrier request. Preceding requests are
20processed before the barrier and following requests after.
21
22Depending on what features a drive supports, this can be done in one
23of the following three ways.
24
25i. For devices which have queue depth greater than 1 (TCQ devices) and
26support ordered tags, block layer can just issue the barrier as an
27ordered request and the lower level driver, controller and drive
28itself are responsible for making sure that the ordering constraint is
29met. Most modern SCSI controllers/drives should support this.
30
31NOTE: SCSI ordered tag isn't currently used due to limitation in the
32 SCSI midlayer, see the following random notes section.
33
34ii. For devices which have queue depth greater than 1 but don't
35support ordered tags, block layer ensures that the requests preceding
36a barrier request finish before issuing the barrier request. Also,
37it defers requests following the barrier until the barrier request is
38finished. Older SCSI controllers/drives and SATA drives fall in this
39category.
40
41iii. Devices which have queue depth of 1. This is a degenerate case
42of ii. Just keeping issue order suffices. Ancient SCSI
43controllers/drives and IDE drives are in this category.
44
452. Forced flushing to physical medium
46
47Again, if you're not gonna do synchronization with disk drives (dang,
48it sounds even more appealing now!), the reason you use I/O barriers
49is mainly to protect filesystem integrity when power failure or some
50other events abruptly stop the drive from operating and possibly make
51the drive lose data in its cache. So, I/O barriers need to guarantee
52that requests actually get written to non-volatile medium in order.
53
54There are four cases,
55
56i. No write-back cache. Keeping requests ordered is enough.
57
58ii. Write-back cache but no flush operation. There's no way to
59guarantee physical-medium commit order. This kind of device can't do
60I/O barriers.
61
62iii. Write-back cache and flush operation but no FUA (forced unit
63access). We need two cache flushes - before and after the barrier
64request.
65
66iv. Write-back cache, flush operation and FUA. We still need one
67flush to make sure requests preceding a barrier are written to medium,
68but post-barrier flush can be avoided by using FUA write on the
69barrier itself.
70
71
72How to support barrier requests in drivers
73------------------------------------------
74
75All barrier handling is done inside block layer proper. All low level
76drivers have to do is implement their prepare_flush_fn and use one of
77the following two functions to indicate what barrier type they support
78and how to prepare flush requests. Note that the term 'ordered' is
79used to indicate the whole sequence of performing barrier requests
80including draining and flushing.
81
82typedef void (prepare_flush_fn)(struct request_queue *q, struct request *rq);
83
84int blk_queue_ordered(struct request_queue *q, unsigned ordered,
85 prepare_flush_fn *prepare_flush_fn);
86
87@q : the queue in question
88@ordered : the ordered mode the driver/device supports
89@prepare_flush_fn : this function should prepare @rq such that it
90 flushes cache to physical medium when executed
91
92For example, SCSI disk driver's prepare_flush_fn looks like the
93following.
94
95static void sd_prepare_flush(struct request_queue *q, struct request *rq)
96{
97 memset(rq->cmd, 0, sizeof(rq->cmd));
98 rq->cmd_type = REQ_TYPE_BLOCK_PC;
99 rq->timeout = SD_TIMEOUT;
100 rq->cmd[0] = SYNCHRONIZE_CACHE;
101 rq->cmd_len = 10;
102}
103
104The following seven ordered modes are supported. The following table
105shows which mode should be used depending on what features a
106device/driver supports. In the leftmost column of table,
107QUEUE_ORDERED_ prefix is omitted from the mode names to save space.
108
109The table is followed by description of each mode. Note that in the
110descriptions of QUEUE_ORDERED_DRAIN*, '=>' is used whereas '->' is
111used for QUEUE_ORDERED_TAG* descriptions. '=>' indicates that the
112preceding step must be complete before proceeding to the next step.
113'->' indicates that the next step can start as soon as the previous
114step is issued.
115
116 write-back cache ordered tag flush FUA
117-----------------------------------------------------------------------
118NONE yes/no N/A no N/A
119DRAIN no no N/A N/A
120DRAIN_FLUSH yes no yes no
121DRAIN_FUA yes no yes yes
122TAG no yes N/A N/A
123TAG_FLUSH yes yes yes no
124TAG_FUA yes yes yes yes
125
126
127QUEUE_ORDERED_NONE
128 I/O barriers are not needed and/or supported.
129
130 Sequence: N/A
131
132QUEUE_ORDERED_DRAIN
133 Requests are ordered by draining the request queue and cache
134 flushing isn't needed.
135
136 Sequence: drain => barrier
137
138QUEUE_ORDERED_DRAIN_FLUSH
139 Requests are ordered by draining the request queue and both
140 pre-barrier and post-barrier cache flushings are needed.
141
142 Sequence: drain => preflush => barrier => postflush
143
144QUEUE_ORDERED_DRAIN_FUA
145 Requests are ordered by draining the request queue and
146 pre-barrier cache flushing is needed. By using FUA on barrier
147 request, post-barrier flushing can be skipped.
148
149 Sequence: drain => preflush => barrier
150
151QUEUE_ORDERED_TAG
152 Requests are ordered by ordered tag and cache flushing isn't
153 needed.
154
155 Sequence: barrier
156
157QUEUE_ORDERED_TAG_FLUSH
158 Requests are ordered by ordered tag and both pre-barrier and
159 post-barrier cache flushings are needed.
160
161 Sequence: preflush -> barrier -> postflush
162
163QUEUE_ORDERED_TAG_FUA
164 Requests are ordered by ordered tag and pre-barrier cache
165 flushing is needed. By using FUA on barrier request,
166 post-barrier flushing can be skipped.
167
168 Sequence: preflush -> barrier
169
170
171Random notes/caveats
172--------------------
173
174* SCSI layer currently can't use TAG ordering even if the drive,
175controller and driver support it. The problem is that SCSI midlayer
176request dispatch function is not atomic. It releases queue lock and
177switches to SCSI host lock during issue and it's possible and likely to
178happen in time that requests change their relative positions. Once
179this problem is solved, TAG ordering can be enabled.
180
181* Currently, no matter which ordered mode is used, there can be only
182one barrier request in progress. All I/O barriers are held off by
183block layer until the previous I/O barrier is complete. This doesn't
184make any difference for DRAIN ordered devices, but, for TAG ordered
185devices with very high command latency, passing multiple I/O barriers
186to low level *might* be helpful if they are very frequent. Well, this
187certainly is a non-issue. I'm writing this just to make clear that no
188two I/O barriers are ever passed to the low-level driver at once.
189
190* Completion order. Requests in ordered sequence are issued in order
191but not required to finish in order. Barrier implementation can
192handle out-of-order completion of ordered sequence. IOW, the requests
193MUST be processed in order but the hardware/software completion paths
194are allowed to reorder completion notifications - eg. current SCSI
195midlayer doesn't preserve completion order during error handling.
196
197* Requeueing order. Low-level drivers are free to requeue any request
198after they removed it from the request queue with
199blkdev_dequeue_request(). As barrier sequence should be kept in order
200when requeued, generic elevator code takes care of putting requests in
201order around barrier. See blk_ordered_req_seq() and
202ELEVATOR_INSERT_REQUEUE handling in __elv_add_request() for details.
203
204Note that block drivers must not requeue preceding requests while
205completing latter requests in an ordered sequence. Currently, no
206error checking is done against this.
207
208* Error handling. Currently, block layer will report error to upper
209layer if any of requests in an ordered sequence fails. Unfortunately,
210this doesn't seem to be enough. Look at the following request flow.
211QUEUE_ORDERED_TAG_FLUSH is in use.
212
213 [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
214 still in elevator
215
216Let's say request [2], [3] are write requests to update file system
217metadata (journal or whatever) and [barrier] is used to mark that
218those updates are valid. Consider the following sequence.
219
220 i. Requests [0] ~ [post] leaves the request queue and enters
221 low-level driver.
222 ii. After a while, unfortunately, something goes wrong and the
223 drive fails [2]. Note that any of [0], [1] and [3] could have
224 completed by this time, but [pre] couldn't have been finished
225 as the drive must process it in order and it failed before
226 processing that command.
227 iii. Error handling kicks in and determines that the error is
228 unrecoverable and fails [2], and resumes operation.
229 iv. [pre] [barrier] [post] gets processed.
230 v. *BOOM* power fails
231
232The problem here is that the barrier request is *supposed* to indicate
233that filesystem update requests [2] and [3] made it safely to the
234physical medium and, if the machine crashes after the barrier is
235written, filesystem recovery code can depend on that. Sadly, that
236isn't true in this case anymore. IOW, the success of an I/O barrier
237should also be dependent on success of some of the preceding requests,
238where only upper layer (filesystem) knows what 'some' is.
239
240This can be solved by implementing a way to tell the block layer which
241requests affect the success of the following barrier request and
242making lower level drivers resume operation on error only after
243block layer tells it to do so.
244
245As the probability of this happening is very low and the drive should
246be faulty, implementing the fix is probably an overkill. But, still,
247it's there.
248
249* In previous drafts of barrier implementation, there was fallback
250mechanism such that, if FUA or ordered TAG fails, less fancy ordered
251mode can be selected and the failed barrier request is retried
252automatically. The rationale for this feature was that as FUA is
253pretty new in ATA world and ordered tag was never used widely, there
254could be devices which report to support those features but choke when
255actually given such requests.
256
257 This was removed for two reasons 1. it's an overkill 2. it's
258impossible to implement properly when TAG ordering is used as low
259level drivers resume after an error automatically. If it's ever
260needed adding it back and modifying low level drivers accordingly
261shouldn't be difficult.
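The feature table in barrier.txt above can be read as a small decision function. The following is a userspace sketch only: the enum and struct names are hypothetical stand-ins for the QUEUE_ORDERED_* modes, not the kernel's definitions, and it assumes the NONE row is meant for devices that have a write-back cache but no flush operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names mirroring the QUEUE_ORDERED_* modes in the table. */
enum ordered_mode {
	ORDERED_NONE,
	ORDERED_DRAIN,
	ORDERED_DRAIN_FLUSH,
	ORDERED_DRAIN_FUA,
	ORDERED_TAG,
	ORDERED_TAG_FLUSH,
	ORDERED_TAG_FUA,
};

/* Hypothetical description of what a device/driver supports. */
struct dev_features {
	bool write_back_cache;
	bool ordered_tag;
	bool flush;
	bool fua;
};

enum ordered_mode pick_ordered_mode(const struct dev_features *f)
{
	assert(f != NULL);

	/* No write-back cache: keeping requests ordered is enough. */
	if (!f->write_back_cache)
		return f->ordered_tag ? ORDERED_TAG : ORDERED_DRAIN;

	/* Cache but no flush: commit order cannot be guaranteed. */
	if (!f->flush)
		return ORDERED_NONE;

	/* Cache, flush and FUA: the post-barrier flush can be skipped. */
	if (f->fua)
		return f->ordered_tag ? ORDERED_TAG_FUA : ORDERED_DRAIN_FUA;

	/* Cache and flush only: flush before and after the barrier. */
	return f->ordered_tag ? ORDERED_TAG_FLUSH : ORDERED_DRAIN_FLUSH;
}
```

For example, a device with a write-back cache and flush support but neither ordered tags nor FUA maps to the DRAIN_FLUSH row.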
diff --git a/Documentation/block/writeback_cache_control.txt b/Documentation/block/writeback_cache_control.txt
new file mode 100644
index 000000000000..83407d36630a
--- /dev/null
+++ b/Documentation/block/writeback_cache_control.txt
@@ -0,0 +1,86 @@
1
2Explicit volatile write back cache control
3==========================================
4
5Introduction
6------------
7
8Many storage devices, especially in the consumer market, come with volatile
9write back caches. That means the devices signal I/O completion to the
10operating system before data actually has hit the non-volatile storage. This
11behavior obviously speeds up various workloads, but it means the operating
12system needs to force data out to the non-volatile storage when it performs
13a data integrity operation like fsync, sync or an unmount.
14
15The Linux block layer provides two simple mechanisms that let filesystems
16control the caching behavior of the storage device. These mechanisms are
17a forced cache flush, and the Force Unit Access (FUA) flag for requests.
18
19
20Explicit cache flushes
21----------------------
22
23The REQ_FLUSH flag can be ORed into the r/w flags of a bio submitted from
24the filesystem and will make sure the volatile cache of the storage device
25has been flushed before the actual I/O operation is started. This explicitly
26guarantees that previously completed write requests are on non-volatile
27storage before the flagged bio starts. In addition the REQ_FLUSH flag can be
28set on an otherwise empty bio structure, which causes only an explicit cache
29flush without any dependent I/O. It is recommended to use
30the blkdev_issue_flush() helper for a pure cache flush.
31
32
33Forced Unit Access
34------------------
35
36The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
37filesystem and will make sure that I/O completion for this request is only
38signaled after the data has been committed to non-volatile storage.
39
40
41Implementation details for filesystems
42--------------------------------------
43
44Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
45worry if the underlying devices need any explicit cache flushing and how
46the Forced Unit Access is implemented. The REQ_FLUSH and REQ_FUA flags
47may both be set on a single bio.
48
49
50Implementation details for make_request_fn based block drivers
51--------------------------------------------------------------
52
53These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
54directly below the submit_bio interface. For remapping drivers the REQ_FUA
55bits need to be propagated to underlying devices, and a global flush needs
56to be implemented for bios with the REQ_FLUSH bit set. For real device
57drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA bits
58on non-empty bios can simply be ignored, and REQ_FLUSH requests without
59data can be completed successfully without doing any work. Drivers for
60devices with volatile caches need to implement the support for these
61flags themselves without any help from the block layer.
62
63
64Implementation details for request_fn based block drivers
65---------------------------------------------------------
66
67For devices that do not support volatile write caches there is no driver
68support required; the block layer completes empty REQ_FLUSH requests before
69entering the driver and strips off the REQ_FLUSH and REQ_FUA bits from
70requests that have a payload. For devices with volatile write caches the
71driver needs to tell the block layer that it supports flushing caches by
72doing:
73
74 blk_queue_flush(sdkp->disk->queue, REQ_FLUSH);
75
76and handle empty REQ_FLUSH requests in its prep_fn/request_fn. Note that
77REQ_FLUSH requests with a payload are automatically turned into a sequence
78of an empty REQ_FLUSH request followed by the actual write by the block
79layer. For devices that also support the FUA bit the block layer needs
80to be told to pass through the REQ_FUA bit using:
81
82 blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA);
83
84and the driver must handle write requests that have the REQ_FUA bit set
85in prep_fn/request_fn. If the FUA bit is not natively supported the block
86layer turns it into an empty REQ_FLUSH request after the actual write.
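The sequencing rules described above for request_fn based drivers can be
sketched as a small model. This is not kernel code: the flag values, the
decompose() function and the step names are hypothetical and exist only for
this illustration. The queue_flush_flags argument stands in for whatever the
driver passed to blk_queue_flush().

```python
# Illustrative model only; flag values and names are made up for this sketch.
REQ_FLUSH = 1 << 0
REQ_FUA = 1 << 1


def decompose(bio_flags, has_data, queue_flush_flags):
    """Return the list of steps the driver would see for one bio.

    queue_flush_flags models the argument of blk_queue_flush():
    0, REQ_FLUSH, or REQ_FLUSH | REQ_FUA.
    """
    if not queue_flush_flags & REQ_FLUSH:
        # No volatile write cache: the flags are stripped, and an empty
        # flush is completed before it ever reaches the driver.
        return ["write"] if has_data else []

    steps = []
    if bio_flags & REQ_FLUSH:
        steps.append("flush")  # empty REQ_FLUSH request comes first
    if has_data:
        if bio_flags & REQ_FUA and queue_flush_flags & REQ_FUA:
            steps.append("write+fua")  # FUA is passed through to the driver
        else:
            steps.append("write")
            if bio_flags & REQ_FUA:
                steps.append("flush")  # FUA emulated by a flush after the write
    return steps
```

For example, a REQ_FUA write on a queue that only advertised REQ_FLUSH comes
out as a plain write followed by an empty flush.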
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 6919d62591d9..d6da611f8f63 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -8,12 +8,17 @@ both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
8Plan is to use the same cgroup based management interface for blkio controller 8Plan is to use the same cgroup based management interface for blkio controller
9and based on user options switch IO policies in the background. 9and based on user options switch IO policies in the background.
10 10
11In the first phase, this patchset implements proportional weight time based 11Currently two IO control policies are implemented. First one is proportional
12division of disk policy. It is implemented in CFQ. Hence this policy takes 12weight time based division of disk policy. It is implemented in CFQ. Hence
13effect only on leaf nodes when CFQ is being used. 13this policy takes effect only on leaf nodes when CFQ is being used. The second
14one is throttling policy which can be used to specify upper IO rate limits
15on devices. This policy is implemented in generic block layer and can be
16used on leaf nodes as well as higher level logical devices like device mapper.
14 17
15HOWTO 18HOWTO
16===== 19=====
20Proportional Weight division of bandwidth
21-----------------------------------------
17You can do a very simple testing of running two dd threads in two different 22You can do a very simple testing of running two dd threads in two different
18cgroups. Here is what you can do. 23cgroups. Here is what you can do.
19 24
@@ -55,6 +60,35 @@ cgroups. Here is what you can do.
55 group dispatched to the disk. We provide fairness in terms of disk time, so 60 group dispatched to the disk. We provide fairness in terms of disk time, so
56 ideally io.disk_time of cgroups should be in proportion to the weight. 61 ideally io.disk_time of cgroups should be in proportion to the weight.
57 62
63Throttling/Upper Limit policy
64-----------------------------
65- Enable Block IO controller
66 CONFIG_BLK_CGROUP=y
67
68- Enable throttling in block layer
69 CONFIG_BLK_DEV_THROTTLING=y
70
71- Mount blkio controller
72 mount -t cgroup -o blkio none /cgroup/blkio
73
74- Specify a bandwidth rate on particular device for root group. The format
75 for policy is "<major>:<minor> <bytes_per_second>".
76
77 echo "8:16 1048576" > /cgroup/blkio/blkio.throttle.read_bps_device
78
79 Above will put a limit of 1MB/second on reads happening for root group
80 on device having major/minor number 8:16.
81
82- Run dd to read a file and see if rate is throttled to 1MB/s or not.
83
84 # dd if=/mnt/common/zerofile of=/dev/null bs=4K count=1024 \
85 iflag=direct
86 1024+0 records in
87 1024+0 records out
88 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
89
90 Limits for writes can be put using blkio.throttle.write_bps_device file.
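As a quick sanity check of the numbers in the dd example, the rule string and
the expected run time can be computed as below. This is only a sketch; the
helper names throttle_rule() and expected_seconds() are invented for this
example and are not part of any kernel or cgroup tooling.

```python
def throttle_rule(major, minor, bytes_per_second):
    """Build a "<major>:<minor> <bytes_per_second>" rule string."""
    return f"{major}:{minor} {bytes_per_second}"


def expected_seconds(total_bytes, bytes_per_second):
    """Lower bound on the transfer time under a bytes-per-second limit."""
    return total_bytes / bytes_per_second


# 1 MB/s read limit on device 8:16, as in the echo command above.
rule = throttle_rule(8, 16, 1024 * 1024)

# dd with bs=4K count=1024 transfers 4194304 bytes.
seconds = expected_seconds(4096 * 1024, 1024 * 1024)
```

Here rule is "8:16 1048576" and seconds is 4.0, which agrees with the roughly
four seconds reported by dd.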
91
58Various user visible config options 92Various user visible config options
59=================================== 93===================================
60CONFIG_BLK_CGROUP 94CONFIG_BLK_CGROUP
@@ -68,8 +102,13 @@ CONFIG_CFQ_GROUP_IOSCHED
68 - Enables group scheduling in CFQ. Currently only 1 level of group 102 - Enables group scheduling in CFQ. Currently only 1 level of group
69 creation is allowed. 103 creation is allowed.
70 104
105CONFIG_BLK_DEV_THROTTLING
106 - Enable block device throttling support in block layer.
107
71Details of cgroup files 108Details of cgroup files
72======================= 109=======================
110Proportional weight policy files
111--------------------------------
73- blkio.weight 112- blkio.weight
74 - Specifies per cgroup weight. This is default weight of the group 113 - Specifies per cgroup weight. This is default weight of the group
75 on all the devices until and unless overridden by per device rule. 114 on all the devices until and unless overridden by per device rule.
@@ -210,6 +249,67 @@ Details of cgroup files
210 and minor number of the device and third field specifies the number 249 and minor number of the device and third field specifies the number
211 of times a group was dequeued from a particular device. 250 of times a group was dequeued from a particular device.
212 251
252Throttling/Upper limit policy files
253-----------------------------------
254- blkio.throttle.read_bps_device
255 - Specifies upper limit on READ rate from the device. IO rate is
256 specified in bytes per second. Rules are per device. Following is
257 the format.
258
259 echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
260
261- blkio.throttle.write_bps_device
262 - Specifies upper limit on WRITE rate to the device. IO rate is
263 specified in bytes per second. Rules are per device. Following is
264 the format.
265
266 echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
267
268- blkio.throttle.read_iops_device
269 - Specifies upper limit on READ rate from the device. IO rate is
270 specified in IO per second. Rules are per device. Following is
271 the format.
272
273 echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
274
275- blkio.throttle.write_iops_device
276 - Specifies upper limit on WRITE rate to the device. IO rate is
277 specified in IO per second. Rules are per device. Following is
278 the format.
279
280 echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
281
282Note: If both BW and IOPS rules are specified for a device, then IO is
283 subjected to both the constraints.
284
285- blkio.throttle.io_serviced
286 - Number of IOs (bio) completed to/from the disk by the group (as
287 seen by throttling policy). These are further divided by the type
288 of operation - read or write, sync or async. First two fields specify
289 the major and minor number of the device, third field specifies the
290 operation type and the fourth field specifies the number of IOs.
291
292 blkio.io_serviced does accounting as seen by CFQ and counts are in
293 number of requests (struct request). On the other hand,
294 blkio.throttle.io_serviced counts number of IO in terms of number
295 of bios as seen by throttling policy. These bios can later be
296 merged by elevator and total number of requests completed can be
297 lower.
298
299- blkio.throttle.io_service_bytes
300 - Number of bytes transferred to/from the disk by the group. These
301 are further divided by the type of operation - read or write, sync
302 or async. First two fields specify the major and minor number of the
303 device, third field specifies the operation type and the fourth field
304 specifies the number of bytes.
305
306 These numbers should roughly be same as blkio.io_service_bytes as
307 updated by CFQ. The difference between two is that
308 blkio.io_service_bytes will not be updated if CFQ is not operating
309 on request queue.
310
311Common files among various policies
312-----------------------------------
213- blkio.reset_stats 313- blkio.reset_stats
214 - Writing an int to this file will result in resetting all the stats 314 - Writing an int to this file will result in resetting all the stats
215 for that cgroup. 315 for that cgroup.
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index b34823ff1646..190018b0c649 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -18,7 +18,8 @@ CONTENTS:
18 1.2 Why are cgroups needed ? 18 1.2 Why are cgroups needed ?
19 1.3 How are cgroups implemented ? 19 1.3 How are cgroups implemented ?
20 1.4 What does notify_on_release do ? 20 1.4 What does notify_on_release do ?
21 1.5 How do I use cgroups ? 21 1.5 What does clone_children do ?
22 1.6 How do I use cgroups ?
222. Usage Examples and Syntax 232. Usage Examples and Syntax
23 2.1 Basic Usage 24 2.1 Basic Usage
24 2.2 Attaching processes 25 2.2 Attaching processes
@@ -293,7 +294,16 @@ notify_on_release in the root cgroup at system boot is disabled
293value of their parents notify_on_release setting. The default value of 294value of their parents notify_on_release setting. The default value of
294a cgroup hierarchy's release_agent path is empty. 295a cgroup hierarchy's release_agent path is empty.
295 296
2961.5 How do I use cgroups ? 2971.5 What does clone_children do ?
298---------------------------------
299
300If the clone_children flag is enabled (1) in a cgroup, then all
301cgroups created beneath will call the post_clone callbacks for each
302subsystem of the newly created cgroup. Usually when this callback is
303implemented for a subsystem, it copies the values of the parent
304subsystem; this is the case for the cpuset.
305
3061.6 How do I use cgroups ?
297-------------------------- 307--------------------------
298 308
299To start a new job that is to be contained within a cgroup, using 309To start a new job that is to be contained within a cgroup, using
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index d0d1df6cb5de..c58abf1ccc71 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -239,6 +239,7 @@ Your cooperation is appreciated.
239 0 = /dev/tty Current TTY device 239 0 = /dev/tty Current TTY device
240 1 = /dev/console System console 240 1 = /dev/console System console
241 2 = /dev/ptmx PTY master multiplex 241 2 = /dev/ptmx PTY master multiplex
242 3 = /dev/ttyprintk User messages via printk TTY device
242 64 = /dev/cua0 Callout device for ttyS0 243 64 = /dev/cua0 Callout device for ttyS0
243 ... 244 ...
244 255 = /dev/cua191 Callout device for ttyS191 245 255 = /dev/cua191 Callout device for ttyS191
@@ -2553,7 +2554,10 @@ Your cooperation is appreciated.
2553 175 = /dev/usb/legousbtower15 16th USB Legotower device 2554 175 = /dev/usb/legousbtower15 16th USB Legotower device
2554 176 = /dev/usb/usbtmc1 First USB TMC device 2555 176 = /dev/usb/usbtmc1 First USB TMC device
2555 ... 2556 ...
2556 192 = /dev/usb/usbtmc16 16th USB TMC device 2557 191 = /dev/usb/usbtmc16 16th USB TMC device
2558 192 = /dev/usb/yurex1 First USB Yurex device
2559 ...
2560 209 = /dev/usb/yurex16 16th USB Yurex device
2557 240 = /dev/usb/dabusb0 First daubusb device 2561 240 = /dev/usb/dabusb0 First daubusb device
2558 ... 2562 ...
2559 243 = /dev/usb/dabusb3 Fourth dabusb device 2563 243 = /dev/usb/dabusb3 Fourth dabusb device
diff --git a/Documentation/dynamic-debug-howto.txt b/Documentation/dynamic-debug-howto.txt
index 674c5663d346..58ea64a96165 100644
--- a/Documentation/dynamic-debug-howto.txt
+++ b/Documentation/dynamic-debug-howto.txt
@@ -24,7 +24,7 @@ Dynamic debug has even more useful features:
24 read to display the complete list of known debug statements, to help guide you 24 read to display the complete list of known debug statements, to help guide you
25 25
26Controlling dynamic debug Behaviour 26Controlling dynamic debug Behaviour
27=============================== 27===================================
28 28
29The behaviour of pr_debug()/dev_debug()s are controlled via writing to a 29The behaviour of pr_debug()/dev_debug()s are controlled via writing to a
30control file in the 'debugfs' filesystem. Thus, you must first mount the debugfs 30control file in the 'debugfs' filesystem. Thus, you must first mount the debugfs
@@ -212,6 +212,26 @@ Note the regexp ^[-+=][scp]+$ matches a flags specification.
212Note also that there is no convenient syntax to remove all 212Note also that there is no convenient syntax to remove all
213the flags at once, you need to use "-psc". 213the flags at once, you need to use "-psc".
214 214
215
216Debug messages during boot process
217==================================
218
219To be able to activate debug messages during the boot process,
220even before userspace and debugfs exist, use the boot parameter:
221ddebug_query="QUERY"
222
223QUERY follows the syntax described above, but must not exceed 1023
224characters. The enablement of debug messages is done as an arch_initcall.
225Thus you can enable debug messages in all code processed after this
226arch_initcall via this boot parameter.
227On an x86 system for example ACPI enablement is a subsys_initcall and
228ddebug_query="file ec.c +p"
229will show early Embedded Controller transactions during ACPI setup if
230your machine (typically a laptop) has an Embedded Controller.
231PCI (or other devices) initialization also is a hot candidate for using
232this boot parameter for debugging purposes.
233
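The length restriction on QUERY is easy to overlook when the boot command
line is generated by a script. A minimal sketch of a check, assuming a
hypothetical helper named ddebug_boot_param() (not part of the kernel or any
boot loader tooling):

```python
MAX_QUERY_LEN = 1023  # limit stated in the text above


def ddebug_boot_param(query):
    """Return the ddebug_query= boot parameter for a dynamic debug query."""
    if len(query) > MAX_QUERY_LEN:
        raise ValueError("ddebug_query must not exceed 1023 characters")
    return f'ddebug_query="{query}"'


param = ddebug_boot_param("file ec.c +p")
```

The resulting string is what would be appended to the kernel command line.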
234
215Examples 235Examples
216======== 236========
217 237
diff --git a/Documentation/fb/viafb.txt b/Documentation/fb/viafb.txt
index f3e046a6a987..1a2e8aa3fbb1 100644
--- a/Documentation/fb/viafb.txt
+++ b/Documentation/fb/viafb.txt
@@ -197,6 +197,54 @@ Notes:
197 example, 197 example,
198 # fbset -depth 16 198 # fbset -depth 16
199 199
200
201[Configure viafb via /proc]
202---------------------------
203 The following files exist in /proc/viafb
204
205 supported_output_devices
206
207 This read-only file contains a full ',' separated list containing all
208 output devices that could be available on your platform. It is likely
209 that not all of those have a connector on your hardware but it should
210 provide a good starting point to figure out which of those names match
211 a real connector.
212 Example:
213 # cat /proc/viafb/supported_output_devices
214
215 iga1/output_devices
216 iga2/output_devices
217
218 These two files are readable and writable. iga1 and iga2 are the two
219 independent units that produce the screen image. Those images can be
220 forwarded to one or more output devices. Reading those files is a way
221 to query which output devices are currently used by an iga.
222 Example:
223 # cat /proc/viafb/iga1/output_devices
224 If there are no output devices printed the output of this iga is lost.
225 This can happen for example if only one (the other) iga is used.
226 Writing to these files allows adjusting the output devices during
227 runtime. One can add new devices, remove existing ones or switch
228 between igas. Essentially you can write a ',' separated list of device
229 names (or a single one) in the same format as the output to those
230 files. You can add a '+' or '-' as a prefix allowing simple addition
231 and removal of devices. So a prefix '+' adds the devices from your list
232 to the already existing ones, '-' removes the listed devices from the
233 existing ones and if no prefix is given it replaces all existing ones
234 with the listed ones. If you remove devices they are expected to turn
235 off. If you add devices that are already part of the other iga they are
236 removed there and added to the new one.
237 Examples:
238 Add CRT as output device to iga1
239 # echo +CRT > /proc/viafb/iga1/output_devices
240
241 Remove (turn off) DVP1 and LVDS1 as output devices of iga2
242 # echo -DVP1,LVDS1 > /proc/viafb/iga2/output_devices
243
244 Replace all iga1 output devices by CRT
245 # echo CRT > /proc/viafb/iga1/output_devices
246
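The prefix rules for writing to output_devices can be summarised with a small
model. This is only an illustration of the '+', '-' and no-prefix semantics
described above, not driver code: apply_output_devices() is a hypothetical
name, device names are treated as opaque strings, and the step where a device
is taken away from the other iga is not modelled.

```python
def apply_output_devices(current, written):
    """Model the effect of writing `written` to an output_devices file.

    `current` is the set of device names the iga uses now; the return
    value is the set it would use afterwards.
    """
    prefix = written[0] if written[:1] in ("+", "-") else ""
    names = {n.strip() for n in written[len(prefix):].split(",") if n.strip()}
    if prefix == "+":
        return current | names  # add to the already existing devices
    if prefix == "-":
        return current - names  # remove the listed devices
    return names                # no prefix: replace all existing devices
```

With iga2 currently driving DVP1, LVDS1 and CRT, writing "-DVP1,LVDS1" leaves
only CRT in this model.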
247
200[Bootup with viafb]: 248[Bootup with viafb]:
201-------------------- 249--------------------
202 Add the following line to your grub.conf: 250 Add the following line to your grub.conf:
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 5e2bc4ab897a..f3da8c0a3af2 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -502,16 +502,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
502 502
503---------------------------- 503----------------------------
504 504
505What: old ieee1394 subsystem (CONFIG_IEEE1394)
506When: 2.6.37
507Files: drivers/ieee1394/ except init_ohci1394_dma.c
508Why: superseded by drivers/firewire/ (CONFIG_FIREWIRE) which offers more
509 features, better performance, and better security, all with smaller
510 and more modern code base
511Who: Stefan Richter <stefanr@s5r6.in-berlin.de>
512
513----------------------------
514
515What: The acpi_sleep=s4_nonvs command line option 505What: The acpi_sleep=s4_nonvs command line option
516When: 2.6.37 506When: 2.6.37
517Files: arch/x86/kernel/acpi/sleep.c 507Files: arch/x86/kernel/acpi/sleep.c
@@ -536,3 +526,39 @@ Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
536 526
537---------------------------- 527----------------------------
538 528
529What: namespace cgroup (ns_cgroup)
530When: 2.6.38
531Why: The ns_cgroup leads to some problems:
532 * cgroup creation is out-of-control
533 * cgroup name can conflict when pids are looping
534 * it is not possible to have a single process handling
535 a lot of namespaces without falling in an exponential creation time
536 * we may want to create a namespace without creating a cgroup
537
538 The ns_cgroup is replaced by a compatibility flag 'clone_children',
539 where a newly created cgroup will copy the parent cgroup values.
540 The userspace has to manually create a cgroup and add a task to
541 the 'tasks' file.
542Who: Daniel Lezcano <daniel.lezcano@free.fr>
543
544----------------------------
545
546What: iwlwifi disable_hw_scan module parameters
547When: 2.6.40
548Why: Hardware scan is the preferred method for iwlwifi devices for
549 scanning operation. Remove software scan support for all the
550 iwlwifi devices.
551
552Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>
553
554----------------------------
555
556What: access to nfsd auth cache through sys_nfsservctl or '.' files
557 in the 'nfsd' filesystem.
558When: 2.6.40
559Why: This is a legacy interface which has been replaced by a more
560 dynamic cache. Continuing to maintain this interface is an
561 unnecessary burden.
562Who: NeilBrown <neilb@suse.de>
563
564----------------------------
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 2db4283efa8d..8a817f656f0a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -349,21 +349,36 @@ call this method upon the IO completion.
349 349
350--------------------------- block_device_operations ----------------------- 350--------------------------- block_device_operations -----------------------
351prototypes: 351prototypes:
352 int (*open) (struct inode *, struct file *); 352 int (*open) (struct block_device *, fmode_t);
353 int (*release) (struct inode *, struct file *); 353 int (*release) (struct gendisk *, fmode_t);
354 int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); 354 int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
355 int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
356 int (*direct_access) (struct block_device *, sector_t, void **, unsigned long *);
355 int (*media_changed) (struct gendisk *); 357 int (*media_changed) (struct gendisk *);
358 void (*unlock_native_capacity) (struct gendisk *);
356 int (*revalidate_disk) (struct gendisk *); 359 int (*revalidate_disk) (struct gendisk *);
360 int (*getgeo)(struct block_device *, struct hd_geometry *);
361 void (*swap_slot_free_notify) (struct block_device *, unsigned long);
357 362
358locking rules: 363locking rules:
359 BKL bd_sem 364 BKL bd_mutex
360open: yes yes 365open: no yes
361release: yes yes 366release: no yes
362ioctl: yes no 367ioctl: no no
368compat_ioctl: no no
369direct_access: no no
363media_changed: no no 370media_changed: no no
371unlock_native_capacity: no no
364revalidate_disk: no no 372revalidate_disk: no no
373getgeo: no no
374swap_slot_free_notify: no no (see below)
375
376media_changed, unlock_native_capacity and revalidate_disk are called only from
377check_disk_change().
378
379swap_slot_free_notify is called with swap_lock and sometimes the page lock
380held.
365 381
366The last two are called only from check_disk_change().
367 382
368--------------------------- file_operations ------------------------------- 383--------------------------- file_operations -------------------------------
369prototypes: 384prototypes:
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
index 2f68cd688769..a57e12411d2a 100644
--- a/Documentation/filesystems/nfs/00-INDEX
+++ b/Documentation/filesystems/nfs/00-INDEX
@@ -12,5 +12,9 @@ nfs-rdma.txt
12 - how to install and setup the Linux NFS/RDMA client and server software 12 - how to install and setup the Linux NFS/RDMA client and server software
13nfsroot.txt 13nfsroot.txt
14 - short guide on setting up a diskless box with NFS root filesystem. 14 - short guide on setting up a diskless box with NFS root filesystem.
15pnfs.txt
16 - short explanation of some of the internals of the pnfs client code
15rpc-cache.txt 17rpc-cache.txt
16 - introduction to the caching mechanisms in the sunrpc layer. 18 - introduction to the caching mechanisms in the sunrpc layer.
19idmapper.txt
20 - information for configuring request-keys to be used by idmapper
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt
new file mode 100644
index 000000000000..b9b4192ea8b5
--- /dev/null
+++ b/Documentation/filesystems/nfs/idmapper.txt
@@ -0,0 +1,67 @@
1
2=========
3ID Mapper
4=========
5Id mapper is used by NFS to translate user and group ids into names, and to
6translate user and group names into ids. Part of this translation involves
7performing an upcall to userspace to request the information. Id mapper will
8use request-key to perform this upcall and cache the result. The program
9/usr/sbin/nfs.idmap should be called by request-key, and will perform the
10translation and initialize a key with the resulting information.
11
12 NFS_USE_NEW_IDMAPPER must be selected when configuring the kernel to use this
13 feature.
14
15===========
16Configuring
17===========
18The file /etc/request-key.conf will need to be modified so /sbin/request-key can
19direct the upcall. The following line should be added:
20
21#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
22#====== ======= =============== =============== ===============================
23create id_resolver * * /usr/sbin/nfs.idmap %k %d 600
24
25This will direct all id_resolver requests to the program /usr/sbin/nfs.idmap.
26The last parameter, 600, defines how many seconds into the future the key will
27expire. This parameter is optional for /usr/sbin/nfs.idmap. When the timeout
28is not specified, nfs.idmap will default to 600 seconds.
29
30ID mapper uses four key descriptions:
31 uid: Find the UID for the given user
32 gid: Find the GID for the given group
33 user: Find the user name for the given UID
34 group: Find the group name for the given GID
35
36You can handle any of these individually, rather than using the generic upcall
37program. If you would like to use your own program for a uid lookup then you
38would edit your request-key.conf so it looks similar to this:
39
40#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
41#====== ======= =============== =============== ===============================
42create id_resolver uid:* * /some/other/program %k %d 600
43create id_resolver * * /usr/sbin/nfs.idmap %k %d 600
44
45Notice that the new line was added above the line for the generic program.
46request-key will find the first matching line and corresponding program. In
47this case, /some/other/program will handle all uid lookups and
48/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
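The first-match behaviour described above can be sketched as follows. This is
a simplified model and not how request-key is implemented: only the TYPE and
DESCRIPTION columns are considered, Python's fnmatch is used as a stand-in
for request-key's own pattern matching, and find_program() is a hypothetical
name.

```python
from fnmatch import fnmatchcase

# (type, description pattern, program), in file order; the entries
# mirror the example request-key.conf lines above.
RULES = [
    ("id_resolver", "uid:*", "/some/other/program"),
    ("id_resolver", "*", "/usr/sbin/nfs.idmap"),
]


def find_program(key_type, description, rules=RULES):
    """Return the program of the first rule that matches, else None."""
    for rule_type, pattern, program in rules:
        if rule_type == key_type and fnmatchcase(description, pattern):
            return program
    return None
```

A description such as "uid:user@domain" selects /some/other/program, while
"gid:group@domain" falls through to /usr/sbin/nfs.idmap. Swapping the two
rules would send every lookup to nfs.idmap, which is why the order matters.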
49
50See <file:Documentation/keys-request-keys.txt> for more information about the
51request-key function.
52
53
54=========
55nfs.idmap
56=========
57nfs.idmap is designed to be called by request-key, and should not be run "by
58hand". This program takes two arguments, a serialized key and a key
59description. The serialized key is first converted into a key_serial_t, and
60then passed as an argument to keyctl_instantiate (both are part of keyutils.h).
61
62The actual lookups are performed by functions found in nfsidmap.h. nfs.idmap
63determines the correct function to call by looking at the first part of the
64description string. For example, a uid lookup description will appear as
65"uid:user@domain".
66
67nfs.idmap will return 0 if the key was instantiated, and non-zero otherwise.
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index f2430a7974e1..90c71c6f0d00 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -159,6 +159,28 @@ ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
159 Default: any 159 Default: any
160 160
161 161
162nfsrootdebug
163
164 This parameter enables debugging messages to appear in the kernel
165 log at boot time so that administrators can verify that the correct
166 NFS mount options, server address, and root path are passed to the
167 NFS client.
168
169
170rdinit=<executable file>
171
172 To specify which file contains the program that starts system
173 initialization, administrators can use this command line parameter.
174 The default value of this parameter is "/init". If the specified
175 file exists and the kernel can execute it, root filesystem related
176 kernel command line parameters, including `nfsroot=', are ignored.
177
178 A description of the process of mounting the root file system can be
179 found in:
180
181 Documentation/early-userspace/README
182
183
162 184
163 185
1643.) Boot Loader 1863.) Boot Loader
diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
new file mode 100644
index 000000000000..bc0b9cfe095b
--- /dev/null
+++ b/Documentation/filesystems/nfs/pnfs.txt
@@ -0,0 +1,48 @@
1Reference counting in pnfs:
2===========================
3
4There are several inter-related caches. We have layouts which can
5reference multiple devices, each of which can reference multiple data servers.
6Each data server can be referenced by multiple devices. Each device
7can be referenced by multiple layouts. To keep all of this straight,
8we need to reference count.
9
10
11struct pnfs_layout_hdr
12----------------------
13The on-the-wire command LAYOUTGET corresponds to struct
14pnfs_layout_segment, usually referred to by the variable name lseg.
15Each nfs_inode may hold a pointer to a cache of these layout
16segments in nfsi->layout, of type struct pnfs_layout_hdr.
17
18We reference the header for the inode pointing to it, across each
19outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
20LAYOUTCOMMIT), and for each lseg held within.
21
22Each header is also (when non-empty) put on a list associated with
23struct nfs_client (cl_layouts). Being put on this list does not bump
24the reference count, as the layout is kept around by the lseg that
25keeps it in the list.
26
27deviceid_cache
28--------------
29lsegs reference device ids, which are resolved per nfs_client and
30layout driver type. The device ids are held in an RCU cache (struct
31nfs4_deviceid_cache). The cache itself is referenced across each
32mount. The entries (struct nfs4_deviceid) themselves are held across
33the lifetime of each lseg referencing them.
34
35RCU is used because the deviceid is basically a write once, read many
36data structure. The hlist size of 32 buckets needs better
37justification, but seems reasonable given that we can have multiple
38deviceid's per filesystem, and multiple filesystems per nfs_client.
39
40The hash code is copied from the nfsd code base. A discussion of
41hashing and variations of this algorithm can be found at:
42http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
43
44data server cache
45-----------------
46file driver devices refer to data servers, which are kept in a module
47level cache. Its reference is held over the lifetime of the deviceid
48pointing to it.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a6aca8740883..e73df2722ff3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -136,6 +136,7 @@ Table 1-1: Process specific entries in /proc
136 statm Process memory status information 136 statm Process memory status information
137 status Process status in human readable form 137 status Process status in human readable form
138 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan 138 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
139 pagemap Page table
139 stack Report full stack trace, enable via CONFIG_STACKTRACE 140 stack Report full stack trace, enable via CONFIG_STACKTRACE
140 smaps a extension based on maps, showing the memory consumption of 141 smaps a extension based on maps, showing the memory consumption of
141 each mapping 142 each mapping
@@ -370,17 +371,24 @@ Shared_Dirty: 0 kB
370Private_Clean: 0 kB 371Private_Clean: 0 kB
371Private_Dirty: 0 kB 372Private_Dirty: 0 kB
372Referenced: 892 kB 373Referenced: 892 kB
374Anonymous: 0 kB
373Swap: 0 kB 375Swap: 0 kB
374KernelPageSize: 4 kB 376KernelPageSize: 4 kB
375MMUPageSize: 4 kB 377MMUPageSize: 4 kB
376 378
377The first of these lines shows the same information as is displayed for the 379The first of these lines shows the same information as is displayed for the
378mapping in /proc/PID/maps. The remaining lines show the size of the mapping, 380mapping in /proc/PID/maps. The remaining lines show the size of the mapping
379the amount of the mapping that is currently resident in RAM, the "proportional 381(size), the amount of the mapping that is currently resident in RAM (RSS), the
380set size” (divide each shared page by the number of processes sharing it), the 382process' proportional share of this mapping (PSS), the number of clean and
381number of clean and dirty shared pages in the mapping, and the number of clean 383dirty private pages in the mapping. Note that even a page which is part of a
382and dirty private pages in the mapping. The "Referenced" indicates the amount 384MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used
383of memory currently marked as referenced or accessed. 385by only one process, is accounted as private and not as shared. "Referenced"
386indicates the amount of memory currently marked as referenced or accessed.
387"Anonymous" shows the amount of memory that does not belong to any file. Even
388a mapping associated with a file may contain anonymous pages: when a page in
389a MAP_PRIVATE mapping is modified, the file page is replaced by a private
390anonymous copy. "Swap" shows how much would-be-anonymous memory is also used,
391but is out on swap.
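As a rough illustration of how these fields might be consumed, the following Python sketch totals selected fields over text in the smaps format shown above. The function name sum_smaps_fields and the SAMPLE text are hypothetical; on a live system the text would be read from /proc/PID/smaps.

```python
import re

# Matches value lines such as "Private_Dirty:         0 kB".
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps_fields(text, fields=("Rss", "Pss", "Anonymous", "Swap")):
    """Return {field: total kB} summed over every mapping in smaps text."""
    totals = {name: 0 for name in fields}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        # Mapping header lines (address ranges) do not match and are skipped.
        if match and match.group(1) in totals:
            totals[match.group(1)] += int(match.group(2))
    return totals


# Hypothetical excerpt in the format documented above.
SAMPLE = """\
Rss:                 892 kB
Pss:                 374 kB
Referenced:          892 kB
Anonymous:             0 kB
Swap:                  0 kB
"""
```

Calling sum_smaps_fields(SAMPLE) returns one total per requested field; fields that are not requested, such as Referenced, are ignored.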
384 392
385This file is only present if the CONFIG_MMU kernel configuration option is 393This file is only present if the CONFIG_MMU kernel configuration option is
386enabled. 394enabled.
@@ -397,6 +405,9 @@ To clear the bits for the file mapped pages associated with the process
397 > echo 3 > /proc/PID/clear_refs 405 > echo 3 > /proc/PID/clear_refs
398Any other value written to /proc/PID/clear_refs will have no effect. 406Any other value written to /proc/PID/clear_refs will have no effect.
399 407
408The /proc/PID/pagemap file gives the PFN, which can be used to look up the page
409flags in /proc/kpageflags and the number of times a page is mapped in
410/proc/kpagecount. For a detailed explanation, see Documentation/vm/pagemap.txt.
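The lookup can be sketched in Python as follows. This is a minimal sketch that assumes the entry layout described in Documentation/vm/pagemap.txt (one 64-bit entry per virtual page, PFN in bits 0-54, "present" flag in bit 63); check that file before relying on it. The helper names are hypothetical, and the code only decodes bytes, it does not open any /proc file.

```python
import struct

PAGEMAP_ENTRY_BYTES = 8       # assumed: one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # assumed: bits 0-54 hold the PFN when present
PAGE_PRESENT = 1 << 63        # assumed: bit 63 means the page is in RAM


def pagemap_offset(vaddr, page_size=4096):
    """File offset in /proc/PID/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_BYTES


def decode_pagemap_entry(raw):
    """Decode one 8-byte entry into (present, pfn); pfn is None if absent."""
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 64 bits
    present = bool(entry & PAGE_PRESENT)
    pfn = (entry & PFN_MASK) if present else None
    return present, pfn
```

The PFN obtained this way would then index /proc/kpageflags and /proc/kpagecount, which are assumed here to hold one 64-bit value per page frame as well.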
400 411
4011.2 Kernel data 4121.2 Kernel data
402--------------- 413---------------
diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt
index fc0e39af43c3..4ede421c9687 100644
--- a/Documentation/filesystems/sharedsubtree.txt
+++ b/Documentation/filesystems/sharedsubtree.txt
@@ -62,10 +62,10 @@ replicas continue to be exactly same.
62 # mount /dev/sd0 /tmp/a 62 # mount /dev/sd0 /tmp/a
63 63
64 #ls /tmp/a 64 #ls /tmp/a
65 t1 t2 t2 65 t1 t2 t3
66 66
67 #ls /mnt/a 67 #ls /mnt/a
68 t1 t2 t2 68 t1 t2 t3
69 69
70 Note that the mount has propagated to the mount at /mnt as well. 70 Note that the mount has propagated to the mount at /mnt as well.
71 71
diff --git a/Documentation/hwmon/ltc4261 b/Documentation/hwmon/ltc4261
new file mode 100644
index 000000000000..eba2e2c4b94d
--- /dev/null
+++ b/Documentation/hwmon/ltc4261
@@ -0,0 +1,63 @@
1Kernel driver ltc4261
2=====================
3
4Supported chips:
5 * Linear Technology LTC4261
6 Prefix: 'ltc4261'
7 Addresses scanned: -
8 Datasheet:
9 http://cds.linear.com/docs/Datasheet/42612fb.pdf
10
11Author: Guenter Roeck <guenter.roeck@ericsson.com>
12
13
14Description
15-----------
16
17The LTC4261/LTC4261-2 negative voltage Hot Swap controllers allow a board
18to be safely inserted and removed from a live backplane.
19
20
21Usage Notes
22-----------
23
24This driver does not probe for LTC4261 devices, since there is no register
25which can be safely used to identify the chip. You will have to instantiate
26the devices explicitly.
27
28Example: the following will load the driver for an LTC4261 at address 0x10
29on I2C bus #1:
30$ modprobe ltc4261
31$ echo ltc4261 0x10 > /sys/bus/i2c/devices/i2c-1/new_device
32
33
34Sysfs entries
35-------------
36
37Voltage readings provided by this driver are reported as obtained from the ADC
38registers. If a set of voltage divider resistors is installed, calculate the
39real voltage by multiplying the reported value by (R1+R2)/R2, where R1 is the
40value of the divider resistor against the measured voltage and R2 is the value
41of the divider resistor against Ground.
42
43Current reading provided by this driver is reported as obtained from the ADC
44Current Sense register. The reported value assumes that a 1 mOhm sense resistor
45is installed. If a different sense resistor is installed, calculate the real
46current by dividing the reported value by the sense resistor value in mOhm.
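The two corrections above amount to simple scaling. A minimal Python sketch, with hypothetical function names and example resistor values chosen only for illustration:

```python
def real_voltage_mv(reported_mv, r1_ohm, r2_ohm):
    """Undo the divider: R1 faces the measured voltage, R2 faces ground."""
    return reported_mv * (r1_ohm + r2_ohm) / r2_ohm


def real_current_ma(reported_ma, sense_mohm):
    """The reported value assumes 1 mOhm; divide by the actual value in mOhm."""
    return reported_ma / sense_mohm
```

For example, with a 9 kOhm / 1 kOhm divider a reported 1000 mV corresponds to 10000 mV, and with a 2 mOhm sense resistor a reported 500 mA corresponds to 250 mA.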
47
48The chip has two voltage sensors, but only one set of voltage alarm status bits.
49In many designs, those alarms are associated with the ADIN2 sensor, due to
50the proximity of the ADIN2 pin to the OV pin. ADIN2 is, however, not available
51on all chip variants. To ensure that the alarm condition is reported to the user,
52the driver reports it with both voltage sensors.
53
54in1_input ADIN2 voltage (mV)
55in1_min_alarm ADIN/ADIN2 Undervoltage alarm
56in1_max_alarm ADIN/ADIN2 Overvoltage alarm
57
58in2_input ADIN voltage (mV)
59in2_min_alarm ADIN/ADIN2 Undervoltage alarm
60in2_max_alarm ADIN/ADIN2 Overvoltage alarm
61
62curr1_input SENSE current (mA)
63curr1_alarm SENSE overcurrent alarm
diff --git a/Documentation/input/ntrig.txt b/Documentation/input/ntrig.txt
new file mode 100644
index 000000000000..be1fd981f73f
--- /dev/null
+++ b/Documentation/input/ntrig.txt
@@ -0,0 +1,126 @@
1N-Trig touchscreen Driver
2-------------------------
3 Copyright (c) 2008-2010 Rafi Rubin <rafi@seas.upenn.edu>
4 Copyright (c) 2009-2010 Stephane Chatty
5
6This driver provides support for N-Trig pen and multi-touch sensors. Single
7and multi-touch events are translated to the appropriate protocols for
8the hid and input systems. Pen events are sufficiently hid compliant and
9are left to the hid core. The driver also provides additional filtering
10and utility functions accessible with sysfs and module parameters.
11
12This driver has been reported to work properly with multiple N-Trig devices
13attached.
14
15
16Parameters
17----------
18
19Note: values set at load time are global and will apply to all applicable
20devices. Adjusting parameters with sysfs will override the load time values,
21but only for that one device.
22
23The following parameters are used to configure filters to reduce noise:
24
25activate_slack number of fingers to ignore before processing events
26
27activation_height size threshold to activate immediately
28activation_width
29
30min_height size threshold below which fingers are ignored
31min_width both to decide activation and during activity
32
33deactivate_slack the number of "no contact" frames to ignore before
34 propagating the end of activity events
35
36When the last finger is removed from the device, it sends a number of empty
37frames. By holding off on deactivation for a few frames we can tolerate
38erroneous disconnects, where the sensor may mistakenly not detect a finger that
39is still present. Thus deactivate_slack addresses problems where a user might
40see breaks in lines during drawing, or drop an object during a long drag.
41
42
43Additional sysfs items
44----------------------
45
46These nodes just provide easy access to the ranges reported by the device.
47sensor_logical_height the range for positions reported during activity
48sensor_logical_width
49
50sensor_physical_height internal ranges not used for normal events but
51sensor_physical_width useful for tuning
52
53All N-Trig devices with product id of 1 report events in the ranges of
54X: 0-9600
55Y: 0-7200
56However not all of these devices have the same physical dimensions. Most
57seem to be 12" sensors (Dell Latitude XT and XT2 and the HP TX2), and
58at least one model (Dell Studio 17) has a 17" sensor. The ratio of physical
59to logical sizes is used to adjust the size based filter parameters.
60
61
62Filtering
63---------
64
65With the release of the early multi-touch firmwares it became increasingly
66obvious that these sensors were prone to erroneous events. Users reported
67seeing both inappropriately dropped contact and ghosts, contacts reported
68where no finger was actually touching the screen.
69
70Deactivation slack helps prevent dropped contact for single touch use, but does
71not address the problem of dropping one or more contacts while other contacts
72are still active. Drops in the multi-touch context require additional
73processing and should be handled in tandem with tracking.
74
75As observed, ghost contacts are similar to actual use of the sensor, but they
76seem to have different profiles. Ghost activity typically shows up as small
77short lived touches. As such, I assume that the longer the continuous stream
78of events the more likely those events are from a real contact, and that the
79larger the size of each contact the more likely it is real. Balancing the
80goals of preventing ghosts and accepting real events quickly (to minimize
81user observable latency), the filter accumulates confidence for incoming
82events until it hits thresholds and begins propagating. In the interest of
83minimizing stored state as well as the cost of operations to make a decision,
84I've kept that decision simple.
85
86Time is measured in terms of the number of fingers reported, not frames since
87the probability of multiple simultaneous ghosts is expected to drop off
88dramatically with increasing numbers. Rather than accumulate weight as a
89function of size, I just use it as a binary threshold. A sufficiently large
90contact immediately overrides the waiting period and leads to activation.
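The decision logic described above might be modelled roughly as follows. This Python sketch is not the driver's code: the class name, the choice to require both width and height to reach the activation thresholds, and the reset of the finger count on an empty frame are all assumptions made for illustration.

```python
class ActivationFilter:
    """Hypothetical model of the activation/deactivation filter."""

    def __init__(self, activate_slack, activation_width, activation_height,
                 deactivate_slack, min_width=0, min_height=0):
        self.activate_slack = activate_slack
        self.activation_width = activation_width
        self.activation_height = activation_height
        self.deactivate_slack = deactivate_slack
        self.min_width = min_width
        self.min_height = min_height
        self.active = False
        self.fingers_seen = 0     # "time" is counted in fingers, not frames
        self.empty_frames = 0

    def process_frame(self, contacts):
        """Take a list of (width, height) contacts; return those propagated."""
        # Contacts below the minimum size are ignored in every state.
        contacts = [(w, h) for (w, h) in contacts
                    if w >= self.min_width and h >= self.min_height]
        if not contacts:
            return self._handle_empty_frame()
        self.empty_frames = 0
        if not self.active:
            for w, h in contacts:
                self.fingers_seen += 1
                # Size is a binary threshold: a large contact activates at once.
                big = (w >= self.activation_width
                       and h >= self.activation_height)
                if big or self.fingers_seen > self.activate_slack:
                    self.active = True
                    break
        return contacts if self.active else []

    def _handle_empty_frame(self):
        if self.active:
            # Tolerate a few "no contact" frames before ending the activity.
            self.empty_frames += 1
            if self.empty_frames > self.deactivate_slack:
                self.active = False
                self.fingers_seen = 0
        else:
            self.fingers_seen = 0  # assumed: a gap restarts the waiting period
        return []
```

With activate_slack=2, two small contacts are swallowed and the third is propagated, while a single contact at or above the activation size is propagated immediately.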
91
92Setting the activation size thresholds to large values will result in deciding
93primarily on activation slack. If you see longer lived ghosts, turning up the
94activation slack while reducing the size thresholds may suffice to eliminate
95the ghosts while keeping the screen quite responsive to firm taps.
96
97Contacts continue to be filtered with min_height and min_width even after
98the initial activation filter is satisfied. The intent is to provide
99a mechanism for filtering out ghosts in the form of an extra finger while
100you actually are using the screen. In practice this sort of ghost has
101been far less problematic or relatively rare and I've left the defaults
102set to 0 for both parameters, effectively turning off that filter.
103
104I don't know what the optimal values are for these filters. If the defaults
105don't work for you, please play with the parameters. If you do find other
106values more comfortable, I would appreciate feedback.
107
108The calibration of these devices does drift over time. If ghosts or contact
109dropping worsen and interfere with the normal usage of your device, try
110recalibrating it.
111
112
113Calibration
114-----------
115
116The N-Trig windows tools provide calibration and testing routines. Also an
117unofficial unsupported set of user space tools including a calibrator is
118available at:
119http://code.launchpad.net/~rafi-seas/+junk/ntrig_calib
120
121
122Tracking
123--------
124
125As of yet, none of the tested N-Trig firmwares track fingers. When multiple
126contacts are active they seem to be sorted primarily by Y position.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 02f21d9220ce..4bc2f3c3da5b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -43,10 +43,11 @@ parameter is applicable:
43 AVR32 AVR32 architecture is enabled. 43 AVR32 AVR32 architecture is enabled.
44 AX25 Appropriate AX.25 support is enabled. 44 AX25 Appropriate AX.25 support is enabled.
45 BLACKFIN Blackfin architecture is enabled. 45 BLACKFIN Blackfin architecture is enabled.
46 DRM Direct Rendering Management support is enabled.
47 EDD BIOS Enhanced Disk Drive Services (EDD) is enabled 46 EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
48 EFI EFI Partitioning (GPT) is enabled 47 EFI EFI Partitioning (GPT) is enabled
49 EIDE EIDE/ATAPI support is enabled. 48 EIDE EIDE/ATAPI support is enabled.
49 DRM Direct Rendering Management support is enabled.
50 DYNAMIC_DEBUG Build in debug messages and enable them at runtime
50 FB The frame buffer device is enabled. 51 FB The frame buffer device is enabled.
51 GCOV GCOV profiling is enabled. 52 GCOV GCOV profiling is enabled.
52 HW Appropriate hardware is enabled. 53 HW Appropriate hardware is enabled.
@@ -570,6 +571,10 @@ and is between 256 and 4096 characters. It is defined in the file
570 Format: <port#>,<type> 571 Format: <port#>,<type>
571 See also Documentation/input/joystick-parport.txt 572 See also Documentation/input/joystick-parport.txt
572 573
574 ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
575 time. See Documentation/dynamic-debug-howto.txt for
576 details.
577
573 debug [KNL] Enable kernel debugging (events log level). 578 debug [KNL] Enable kernel debugging (events log level).
574 579
575 debug_locks_verbose= 580 debug_locks_verbose=
@@ -1126,9 +1131,13 @@ and is between 256 and 4096 characters. It is defined in the file
1126 kvm.oos_shadow= [KVM] Disable out-of-sync shadow paging. 1131 kvm.oos_shadow= [KVM] Disable out-of-sync shadow paging.
1127 Default is 1 (enabled) 1132 Default is 1 (enabled)
1128 1133
1129 kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM. 1134 kvm.mmu_audit= [KVM] This is an R/W parameter which allows auditing the
1135 KVM MMU at runtime.
1130 Default is 0 (off) 1136 Default is 0 (off)
1131 1137
1138 kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
1139 Default is 1 (enabled)
1140
1132 kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU) 1141 kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
1133 for all guests. 1142 for all guests.
1134 Default is 1 (enabled) if in 64bit or 32bit-PAE mode 1143 Default is 1 (enabled) if in 64bit or 32bit-PAE mode
@@ -1532,12 +1541,15 @@ and is between 256 and 4096 characters. It is defined in the file
1532 1 to enable accounting 1541 1 to enable accounting
1533 Default value is 0. 1542 Default value is 0.
1534 1543
1535 nfsaddrs= [NFS] 1544 nfsaddrs= [NFS] Deprecated. Use ip= instead.
1536 See Documentation/filesystems/nfs/nfsroot.txt. 1545 See Documentation/filesystems/nfs/nfsroot.txt.
1537 1546
1538 nfsroot= [NFS] nfs root filesystem for disk-less boxes. 1547 nfsroot= [NFS] nfs root filesystem for disk-less boxes.
1539 See Documentation/filesystems/nfs/nfsroot.txt. 1548 See Documentation/filesystems/nfs/nfsroot.txt.
1540 1549
1550 nfsrootdebug [NFS] enable nfsroot debugging messages.
1551 See Documentation/filesystems/nfs/nfsroot.txt.
1552
1541 nfs.callback_tcpport= 1553 nfs.callback_tcpport=
1542 [NFS] set the TCP port on which the NFSv4 callback 1554 [NFS] set the TCP port on which the NFSv4 callback
1543 channel should listen. 1555 channel should listen.
@@ -1693,6 +1705,8 @@ and is between 256 and 4096 characters. It is defined in the file
1693 1705
1694 nojitter [IA64] Disables jitter checking for ITC timers. 1706 nojitter [IA64] Disables jitter checking for ITC timers.
1695 1707
1708 no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
1709
1696 nolapic [X86-32,APIC] Do not enable or use the local APIC. 1710 nolapic [X86-32,APIC] Do not enable or use the local APIC.
1697 1711
1698 nolapic_timer [X86-32,APIC] Do not use the local APIC timer. 1712 nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
@@ -1713,7 +1727,7 @@ and is between 256 and 4096 characters. It is defined in the file
1713 norandmaps Don't use address space randomization. Equivalent to 1727 norandmaps Don't use address space randomization. Equivalent to
1714 echo 0 > /proc/sys/kernel/randomize_va_space 1728 echo 0 > /proc/sys/kernel/randomize_va_space
1715 1729
1716 noreplace-paravirt [X86-32,PV_OPS] Don't patch paravirt_ops 1730 noreplace-paravirt [X86,IA-64,PV_OPS] Don't patch paravirt_ops
1717 1731
1718 noreplace-smp [X86-32,SMP] Don't replace SMP instructions 1732 noreplace-smp [X86-32,SMP] Don't replace SMP instructions
1719 with UP alternatives 1733 with UP alternatives
@@ -2370,6 +2384,15 @@ and is between 256 and 4096 characters. It is defined in the file
2370 2384
2371 switches= [HW,M68k] 2385 switches= [HW,M68k]
2372 2386
2387 sysfs.deprecated=0|1 [KNL]
2388 Enable/disable old style sysfs layout for old udev
2389 on older distributions. When this option is enabled
2390 very new udev will not work anymore. When this option
2391 is disabled (or CONFIG_SYSFS_DEPRECATED is not compiled
2392 in), older udev will not work anymore.
2393 Default depends on CONFIG_SYSFS_DEPRECATED_V2 set in
2394 the kernel configuration.
2395
2373 sysrq_always_enabled 2396 sysrq_always_enabled
2374 [KNL] 2397 [KNL]
2375 Ignore sysrq setting - this boot parameter will 2398 Ignore sysrq setting - this boot parameter will
@@ -2418,7 +2441,7 @@ and is between 256 and 4096 characters. It is defined in the file
2418 topology information if the hardware supports it. 2441 topology information if the hardware supports it.
2419 The scheduler will make use of this information and 2442 The scheduler will make use of this information and
2420 e.g. base its process migration decisions on it. 2443 e.g. base its process migration decisions on it.
2421 Default is off. 2444 Default is on.
2422 2445
2423 tp720= [HW,PS2] 2446 tp720= [HW,PS2]
2424 2447
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 5f5b64982b1a..b336266bea5e 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -320,13 +320,13 @@ struct kvm_translation {
3204.15 KVM_INTERRUPT 3204.15 KVM_INTERRUPT
321 321
322Capability: basic 322Capability: basic
323Architectures: x86 323Architectures: x86, ppc
324Type: vcpu ioctl 324Type: vcpu ioctl
325Parameters: struct kvm_interrupt (in) 325Parameters: struct kvm_interrupt (in)
326Returns: 0 on success, -1 on error 326Returns: 0 on success, -1 on error
327 327
328Queues a hardware interrupt vector to be injected. This is only 328Queues a hardware interrupt vector to be injected. This is only
329useful if in-kernel local APIC is not used. 329useful if in-kernel local APIC or equivalent is not used.
330 330
331/* for KVM_INTERRUPT */ 331/* for KVM_INTERRUPT */
332struct kvm_interrupt { 332struct kvm_interrupt {
@@ -334,8 +334,37 @@ struct kvm_interrupt {
334 __u32 irq; 334 __u32 irq;
335}; 335};
336 336
337X86:
338
337Note 'irq' is an interrupt vector, not an interrupt pin or line. 339Note 'irq' is an interrupt vector, not an interrupt pin or line.
338 340
341PPC:
342
343Queues an external interrupt to be injected. This ioctl is overloaded
344with 3 different irq values:
345
346a) KVM_INTERRUPT_SET
347
348 This injects an edge type external interrupt into the guest once it's ready
349 to receive interrupts. When injected, the interrupt is done.
350
351b) KVM_INTERRUPT_UNSET
352
353 This unsets any pending interrupt.
354
355 Only available with KVM_CAP_PPC_UNSET_IRQ.
356
357c) KVM_INTERRUPT_SET_LEVEL
358
359 This injects a level type external interrupt into the guest context. The
360 interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
361 is triggered.
362
363 Only available with KVM_CAP_PPC_IRQ_LEVEL.
364
365Note that any value for 'irq' other than the ones stated above is invalid
366and results in unexpected behavior.
367
3394.16 KVM_DEBUG_GUEST 3684.16 KVM_DEBUG_GUEST
340 369
341Capability: basic 370Capability: basic
@@ -1013,8 +1042,9 @@ number is just right, the 'nent' field is adjusted to the number of valid
1013entries in the 'entries' array, which is then filled. 1042entries in the 'entries' array, which is then filled.
1014 1043
1015The entries returned are the host cpuid as returned by the cpuid instruction, 1044The entries returned are the host cpuid as returned by the cpuid instruction,
1016with unknown or unsupported features masked out. The fields in each entry 1045with unknown or unsupported features masked out. Some features (for example,
1017are defined as follows: 1046x2apic) may not be present in the host cpu, but are exposed by kvm if it can
1047emulate them efficiently. The fields in each entry are defined as follows:
1018 1048
1019 function: the eax value used to obtain the entry 1049 function: the eax value used to obtain the entry
1020 index: the ecx value used to obtain the entry (for entries that are 1050 index: the ecx value used to obtain the entry (for entries that are
@@ -1032,6 +1062,29 @@ are defined as follows:
1032 eax, ebx, ecx, edx: the values returned by the cpuid instruction for 1062 eax, ebx, ecx, edx: the values returned by the cpuid instruction for
1033 this function/index combination 1063 this function/index combination
1034 1064
10654.46 KVM_PPC_GET_PVINFO
1066
1067Capability: KVM_CAP_PPC_GET_PVINFO
1068Architectures: ppc
1069Type: vm ioctl
1070Parameters: struct kvm_ppc_pvinfo (out)
1071Returns: 0 on success, !0 on error
1072
1073struct kvm_ppc_pvinfo {
1074 __u32 flags;
1075 __u32 hcall[4];
1076 __u8 pad[108];
1077};
1078
1079This ioctl fetches PV specific information that needs to be passed to the guest
1080using the device tree or other means from vm context.
1081
1082For now the only implemented piece of information distributed here is an array
1083of 4 instructions that make up a hypercall.
1084
1085If any additional field gets added to this structure later on, a bit for that
1086additional piece of information will be set in the flags bitmap.
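To make the structure layout concrete, here is a minimal Python sketch that decodes a buffer laid out like struct kvm_ppc_pvinfo above. It does not issue the ioctl; the buffer is assumed to have been obtained elsewhere. Big-endian byte order is an assumption about the ppc host, and the function name and the sample values are hypothetical placeholders, not real opcodes.

```python
import struct

# flags (__u32), hcall[4] (__u32 each), pad[108] (__u8 each).
# Big-endian byte order is assumed here.
PVINFO_FORMAT = ">I4I108s"
PVINFO_SIZE = struct.calcsize(PVINFO_FORMAT)   # 4 + 16 + 108 bytes


def parse_pvinfo(raw):
    """Return (flags, [four hypercall instruction words]) from raw bytes."""
    flags, h0, h1, h2, h3, _pad = struct.unpack(PVINFO_FORMAT, raw)
    return flags, [h0, h1, h2, h3]
```

The fixed padding keeps the structure at 128 bytes, which leaves room for the later additions that the flags bitmap is meant to announce.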
1087
10355. The kvm_run structure 10885. The kvm_run structure
1036 1089
1037Application code obtains a pointer to the kvm_run structure by 1090Application code obtains a pointer to the kvm_run structure by
diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt
new file mode 100644
index 000000000000..a7f2244b3be9
--- /dev/null
+++ b/Documentation/kvm/ppc-pv.txt
@@ -0,0 +1,196 @@
1The PPC KVM paravirtual interface
2=================================
3
4The basic execution principle by which KVM on PowerPC works is to run all kernel
5space code in PR=1 which is user space. This way we trap all privileged
6instructions and can emulate them accordingly.
7
8Unfortunately that is also the downfall. There are quite a few privileged
9instructions that needlessly return us to the hypervisor even though they
10could be handled differently.
11
12This is what the PPC PV interface helps with. It takes privileged instructions
13and transforms them into unprivileged ones with some help from the hypervisor.
14This cuts down virtualization costs by about 50% on some of my benchmarks.
15
16The code for that interface can be found in arch/powerpc/kernel/kvm*
17
18Querying for existence
19======================
20
21To find out if we're running on KVM or not, we leverage the device tree. When
22Linux is running on KVM, a node /hypervisor exists. That node contains a
23compatible property with the value "linux,kvm".
24
25Once you have determined you're running under a PV capable KVM, you can now use
26hypercalls as described below.
27
28KVM hypercalls
29==============
30
31Inside the device tree's /hypervisor node there's a property called
32'hypercall-instructions'. This property contains at most 4 opcodes that make
33up the hypercall. To issue a hypercall, just execute these instructions.
34
35The parameters are as follows:
36
37 Register IN OUT
38
39 r0 - volatile
40 r3 1st parameter Return code
41 r4 2nd parameter 1st output value
42 r5 3rd parameter 2nd output value
43 r6 4th parameter 3rd output value
44 r7 5th parameter 4th output value
45 r8 6th parameter 5th output value
46 r9 7th parameter 6th output value
47 r10 8th parameter 7th output value
48 r11 hypercall number 8th output value
49 r12 - volatile
50
51Hypercall definitions are shared in generic code, so the same hypercall numbers
52apply for x86 and powerpc alike with the exception that each KVM hypercall
53also needs to be ORed with the KVM vendor code which is (42 << 16).
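The vendor-code rule above can be written out as a one-line computation. In this Python sketch the constant and function names are hypothetical, and the example hypercall number is an arbitrary value, not a reference to any specific hypercall.

```python
KVM_VENDOR_CODE = 42 << 16   # vendor code as stated in the text above


def ppc_hypercall_number(generic_nr):
    """Combine a generic KVM hypercall number with the KVM vendor code."""
    return generic_nr | KVM_VENDOR_CODE
```

For instance, a generic number of 4 would be placed in r11 as 0x2A0004.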
54
55Return codes can be as follows:
56
57 Code Meaning
58
59 0 Success
60 12 Hypercall not implemented
61 <0 Error
62
63The magic page
64==============
65
66To enable communication between the hypervisor and guest there is a new shared
67page that contains parts of supervisor visible register state. The guest can
68map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
69
70With this hypercall issued the guest always gets the magic page mapped at the
71desired location in effective and physical address space. For now, we always
72map the page to -4096. This way we can access it using absolute load and store
73functions. The following instruction reads the first field of the magic page:
74
75 ld rX, -4096(0)
76
77The interface is designed to be extensible should there be need later to add
78additional registers to the magic page. If you add fields to the magic page,
79also define a new hypercall feature to indicate that the host can give you more
80registers. Only if the host supports the additional features, make use of them.
81
82The magic page has the following layout as described in
83arch/powerpc/include/asm/kvm_para.h:
84
85struct kvm_vcpu_arch_shared {
86 __u64 scratch1;
87 __u64 scratch2;
88 __u64 scratch3;
89 __u64 critical; /* Guest may not get interrupts if == r1 */
90 __u64 sprg0;
91 __u64 sprg1;
92 __u64 sprg2;
93 __u64 sprg3;
94 __u64 srr0;
95 __u64 srr1;
96 __u64 dar;
97 __u64 msr;
98 __u32 dsisr;
99 __u32 int_pending; /* Tells the guest if we have an interrupt */
100};
101
102Additions to the page must only occur at the end. Struct fields are always 32
103or 64 bit aligned, depending on them being 32 or 64 bit wide respectively.
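Given the layout and alignment rule above, field offsets and the effective address of each field can be derived mechanically. The following Python sketch assumes no padding between fields (which follows from the field order shown, since all 64-bit fields come first) and uses the -4096 mapping address mentioned earlier; the helper names are hypothetical.

```python
import struct

# (name, format code) in declaration order; Q = __u64, I = __u32.
FIELDS = [
    ("scratch1", "Q"), ("scratch2", "Q"), ("scratch3", "Q"),
    ("critical", "Q"), ("sprg0", "Q"), ("sprg1", "Q"), ("sprg2", "Q"),
    ("sprg3", "Q"), ("srr0", "Q"), ("srr1", "Q"), ("dar", "Q"),
    ("msr", "Q"), ("dsisr", "I"), ("int_pending", "I"),
]

MAGIC_PAGE_BASE = -4096   # mapping address stated in the text


def field_offsets():
    """Return ({field: byte offset}, total size), assuming no padding."""
    offsets, pos = {}, 0
    for name, code in FIELDS:
        offsets[name] = pos
        pos += struct.calcsize(">" + code)
    return offsets, pos


def field_address(name):
    """Effective address of a field when the page is mapped at -4096."""
    offsets, _size = field_offsets()
    return MAGIC_PAGE_BASE + offsets[name]
```

Under these assumptions msr sits at offset 88, so it would be reached at address -4008, and the structure occupies 104 bytes of the page.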
104
105Magic page features
106===================
107
108When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
109a second return value is passed to the guest. This second return value contains
110a bitmap of available features inside the magic page.
111
112The following enhancements to the magic page are currently available:
113
114 KVM_MAGIC_FEAT_SR Maps SR registers r/w in the magic page
115
116For enhanced features in the magic page, please check for the existence of the
117feature before using it!
118
119MSR bits
120========
121
122The MSR contains bits that require hypervisor intervention and bits that do
123not require direct hypervisor intervention because they only get interpreted
124when entering the guest or don't have any impact on the hypervisor's behavior.
125
126The following bits are safe to be set inside the guest:
127
128 MSR_EE
129 MSR_RI
130 MSR_CR
131 MSR_ME
132
133If any other bit changes in the MSR, please still use mtmsr(d).
134
135Patched instructions
136====================
137
138The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
139respectively on 32 bit systems with an added offset of 4 to account for big
140endianness.
141
142The following is a list of mappings the Linux kernel performs when running as
143guest. Implementing any of those mappings is optional, as the instruction traps
144also act on the shared page. So calling privileged instructions still works as
145before.
146
147From To
148==== ==
149
150mfmsr rX ld rX, magic_page->msr
151mfsprg rX, 0 ld rX, magic_page->sprg0
152mfsprg rX, 1 ld rX, magic_page->sprg1
153mfsprg rX, 2 ld rX, magic_page->sprg2
154mfsprg rX, 3 ld rX, magic_page->sprg3
155mfsrr0 rX ld rX, magic_page->srr0
156mfsrr1 rX ld rX, magic_page->srr1
157mfdar rX ld rX, magic_page->dar
158mfdsisr rX lwz rX, magic_page->dsisr
159
160mtmsr rX std rX, magic_page->msr
161mtsprg 0, rX std rX, magic_page->sprg0
162mtsprg 1, rX std rX, magic_page->sprg1
163mtsprg 2, rX std rX, magic_page->sprg2
164mtsprg 3, rX std rX, magic_page->sprg3
165mtsrr0 rX std rX, magic_page->srr0
166mtsrr1 rX std rX, magic_page->srr1
167mtdar rX std rX, magic_page->dar
168mtdsisr rX stw rX, magic_page->dsisr
169
170tlbsync nop
171
172mtmsrd rX, 0 b <special mtmsr section>
173mtmsr rX b <special mtmsr section>
174
175mtmsrd rX, 1 b <special mtmsrd section>
176
177[Book3S only]
178mtsrin rX, rY b <special mtsrin section>
179
180[BookE only]
181wrteei [0|1] b <special wrteei section>
182
183
184Some instructions require more logic to determine what's going on than a load
185or store instruction can deliver. To enable patching of those, we keep some
186RAM around where we can live translate instructions to. What happens is the
187following:
188
189 1) copy emulation code to memory
190 2) patch that code to fit the emulated instruction
191 3) patch that code to return to the original pc + 4
192 4) patch the original instruction to branch to the new code
193
194That way we can inject an arbitrary amount of code as replacement for a single
195instruction. This allows us to check for pending interrupts when setting EE=1
196for example.
diff --git a/Documentation/kvm/timekeeping.txt b/Documentation/kvm/timekeeping.txt
new file mode 100644
index 000000000000..0c5033a58c9e
--- /dev/null
+++ b/Documentation/kvm/timekeeping.txt
@@ -0,0 +1,612 @@
1
2 Timekeeping Virtualization for X86-Based Architectures
3
4 Zachary Amsden <zamsden@redhat.com>
5 Copyright (c) 2010, Red Hat. All rights reserved.
6
71) Overview
82) Timing Devices
93) TSC Hardware
104) Virtualization Problems
11
12=========================================================================
13
141) Overview
15
16One of the most complicated parts of the X86 platform, and specifically,
17the virtualization of this platform is the plethora of timing devices available
18and the complexity of emulating those devices. In addition, virtualization of
19time introduces a new set of challenges because it introduces a multiplexed
20division of time beyond the control of the guest CPU.
21
22First, we will describe the various timekeeping hardware available, then
23present some of the problems which arise and solutions available, giving
24specific recommendations for certain classes of KVM guests.
25
26The purpose of this document is to collect data and information relevant to
27timekeeping which may be difficult to find elsewhere, specifically,
28information relevant to KVM and hardware-based virtualization.
29
30=========================================================================
31
322) Timing Devices
33
34First we discuss the basic hardware devices available. TSC and the related
35KVM clock are special enough to warrant a full exposition and are described in
36the following section.
37
382.1) i8254 - PIT
39
40One of the first timer devices available is the programmable interval timer,
41or PIT. The PIT has a fixed frequency 1.193182 MHz base clock and three
42channels which can be programmed to deliver periodic or one-shot interrupts.
43These three channels can be configured in different modes and have individual
44counters. Channels 1 and 2 were not available for general use in the original
45IBM PC, and historically were connected to control RAM refresh and the PC
46speaker. Now the PIT is typically integrated as part of an emulated chipset
47and a separate physical PIT is not used.
48
49The PIT uses I/O ports 0x40 - 0x43. Access to the 16-bit counters is done
50using single or multiple byte access to the I/O ports. There are 6 modes
51available, but not all modes are available to all timers, as only timer 2
52has a connected gate input, required for modes 1 and 5. The gate line is
53controlled by port 61h, bit 0, as illustrated in the following diagram.
54
55 -------------- ----------------
56| | | |
57| 1.1932 MHz |---------->| CLOCK OUT | ---------> IRQ 0
58| Clock | | | |
59 -------------- | +->| GATE TIMER 0 |
60 | ----------------
61 |
62 | ----------------
63 | | |
64 |------>| CLOCK OUT | ---------> 66.3 KHZ DRAM
65 | | | (aka /dev/null)
66 | +->| GATE TIMER 1 |
67 | ----------------
68 |
69 | ----------------
70 | | |
71 |------>| CLOCK OUT | ---------> Port 61h, bit 5
72 | | |
73Port 61h, bit 0 ---------->| GATE TIMER 2 | \_.---- ____
74 ---------------- _| )--|LPF|---Speaker
75 / *---- \___/
76Port 61h, bit 1 -----------------------------------/
77
78The timer modes are now described.
79
80Mode 0: Single Timeout. This is a one-shot software timeout that counts down
81 when the gate is high (always true for timers 0 and 1). When the count
82 reaches zero, the output goes high.
83
84Mode 1: Triggered One-shot. The output is initially set high. When the gate
85 line is set high, a countdown is initiated (which does not stop if the gate is
86 lowered), during which the output is set low. When the count reaches zero,
87 the output goes high.
88
89Mode 2: Rate Generator. The output is initially set high. When the countdown
90 reaches 1, the output goes low for one count and then returns high. The value
91 is reloaded and the countdown automatically resumes. If the gate line goes
92 low, the count is halted. If the output is low when the gate is lowered, the
93 output automatically goes high (this only affects timer 2).
94
95Mode 3: Square Wave. This generates a high / low square wave. The count
96 determines the length of the pulse, which alternates between high and low
97 when zero is reached. The count only proceeds when gate is high and is
98 automatically reloaded on reaching zero. The count is decremented twice at
99 each clock to generate a full high / low cycle at the full periodic rate.
100 If the count is even, the clock remains high for N/2 counts and low for N/2
101 counts; if the count is odd, the clock is high for (N+1)/2 counts and low
102 for (N-1)/2 counts. Only even values are latched by the counter, so odd
103 values are not observed when reading. This is the intended mode for timer 2,
104 which generates sine-like tones by low-pass filtering the square wave output.
105
106Mode 4: Software Strobe. After programming this mode and loading the counter,
107 the output remains high until the counter reaches zero. Then the output
108 goes low for 1 clock cycle and returns high. The counter is not reloaded.
109 Counting only occurs when gate is high.
110
111Mode 5: Hardware Strobe. After programming and loading the counter, the
112 output remains high. When the gate is raised, a countdown is initiated
113 (which does not stop if the gate is lowered). When the counter reaches zero,
114 the output goes low for 1 clock cycle and then returns high. The counter is
115 not reloaded.
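
The mode 3 arithmetic above can be sketched in C as follows. This is only an
illustration: the helper names are hypothetical, and 1193182 Hz is an assumed
exact value for the "1.1932 MHz" input clock shown in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed exact value of the "1.1932 MHz" input clock, in Hz. */
#define PIT_INPUT_HZ 1193182u

/* Hypothetical container for the two phases of one square wave period. */
struct pit_square_wave {
	uint32_t high_counts;
	uint32_t low_counts;
};

/* Split a mode 3 reload value N into its high and low phases. */
static struct pit_square_wave pit_mode3_phases(uint32_t n)
{
	struct pit_square_wave w;

	assert(n >= 2 && n <= 65536);
	w.high_counts = (n + 1) / 2;	/* N/2 if even, (N+1)/2 if odd */
	w.low_counts = n / 2;		/* N/2 if even, (N-1)/2 if odd */
	return w;
}

/* Reload value for a requested frequency, rounded to the nearest integer. */
static uint32_t pit_reload_for_hz(uint32_t hz)
{
	assert(hz > 0);
	return (PIT_INPUT_HZ + hz / 2) / hz;
}
```

For a 1000 Hz tone on timer 2, pit_reload_for_hz(1000) yields a reload value
of 1193, which mode 3 splits into 597 high counts and 596 low counts.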
116
117In addition to normal binary counting, the PIT supports BCD counting. The
118command port, 0x43, is used to set the counter and mode for each of the three
119timers.
120
121PIT commands are issued to port 0x43 using the following bit encoding:
122
123Bit 7-4: Command (See table below)
124Bit 3-1: Mode (000 = Mode 0, 101 = Mode 5, 11X = undefined)
125Bit 0 : Binary (0) / BCD (1)
126
127Command table:
128
1290000 - Latch Timer 0 count for port 0x40
130 sample and hold the count to be read in port 0x40;
131 additional commands ignored until counter is read;
132 mode bits ignored.
133
1340001 - Set Timer 0 LSB mode for port 0x40
135 set timer to read LSB only and force MSB to zero;
136 mode bits set timer mode
137
1380010 - Set Timer 0 MSB mode for port 0x40
139 set timer to read MSB only and force LSB to zero;
140 mode bits set timer mode
141
1420011 - Set Timer 0 16-bit mode for port 0x40
143 set timer to read / write LSB first, then MSB;
144 mode bits set timer mode
145
1460100 - Latch Timer 1 count for port 0x41 - as described above
1470101 - Set Timer 1 LSB mode for port 0x41 - as described above
1480110 - Set Timer 1 MSB mode for port 0x41 - as described above
1490111 - Set Timer 1 16-bit mode for port 0x41 - as described above
150
1511000 - Latch Timer 2 count for port 0x42 - as described above
1521001 - Set Timer 2 LSB mode for port 0x42 - as described above
1531010 - Set Timer 2 MSB mode for port 0x42 - as described above
1541011 - Set Timer 2 16-bit mode for port 0x42 - as described above
155
1561101 - General counter latch
157 Latch combination of counters into corresponding ports
158 Bit 3 = Counter 2
159 Bit 2 = Counter 1
160 Bit 1 = Counter 0
161 Bit 0 = Unused
162
1631110 - Latch timer status
164 Latch combination of counter mode into corresponding ports
165 Bit 3 = Counter 2
166 Bit 2 = Counter 1
167 Bit 1 = Counter 0
168
169 The output of ports 0x40-0x42 following this command will be:
170
171 Bit 7 = Output pin
172 Bit 6 = Count loaded (0 if timer has expired)
173 Bit 5-4 = Read / Write mode
174 01 = LSB only
175 10 = MSB only
176 11 = LSB / MSB (16-bit)
177 Bit 3-1 = Mode
178 Bit 0 = Binary (0) / BCD mode (1)
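
As a rough sketch of the bit encoding above, the following C helpers compose a
command byte for port 0x43 and split a latched status byte into its fields.
The function, enum and struct names are hypothetical and chosen only for this
example; no port I/O is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Low two bits of the command field (bits 5-4 of the command byte). */
enum pit_access {
	PIT_ACCESS_LATCH = 0,
	PIT_ACCESS_LSB = 1,
	PIT_ACCESS_MSB = 2,
	PIT_ACCESS_16BIT = 3,
};

/* Build a command byte: bits 7-6 timer, 5-4 access, 3-1 mode, 0 BCD. */
static uint8_t pit_command(unsigned int timer, enum pit_access access,
			   unsigned int mode, int bcd)
{
	assert(timer <= 2 && mode <= 5);
	return (uint8_t)((timer << 6) | ((unsigned int)access << 4) |
			 (mode << 1) | (bcd ? 1u : 0u));
}

/* Raw fields of the byte read back after a "latch timer status" command. */
struct pit_status {
	unsigned int output;	/* bit 7 */
	unsigned int bit6;	/* bit 6, left uninterpreted here */
	unsigned int rw_mode;	/* bits 5-4 */
	unsigned int mode;	/* bits 3-1 */
	unsigned int bcd;	/* bit 0 */
};

static struct pit_status pit_decode_status(uint8_t s)
{
	struct pit_status st;

	st.output = (s >> 7) & 1;
	st.bit6 = (s >> 6) & 1;
	st.rw_mode = (s >> 4) & 3;
	st.mode = (s >> 1) & 7;
	st.bcd = s & 1;
	return st;
}
```

With this encoding, programming timer 0 as a 16-bit binary rate generator
gives the command byte 0x34, and timer 2 as a 16-bit binary square wave gives
0xB6.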
179
1802.2) RTC
181
182The second device which was available in the original PC was the MC146818 real
183time clock. The original device is now obsolete, and usually emulated by the
184system chipset, sometimes by an HPET and some frankenstein IRQ routing.
185
186The RTC is accessed through CMOS variables, which use an index register to
187control which bytes are read. Since there is only one index register, reads
188of the CMOS and reads of the RTC require lock protection (in addition, it is
189dangerous to allow userspace utilities such as hwclock to have direct RTC
190access, as they could corrupt kernel reads and writes of CMOS memory).
191
192The RTC generates an interrupt which is usually routed to IRQ 8. The interrupt
193can function as a periodic timer, an additional once a day alarm, and can issue
194interrupts after an update of the CMOS registers by the MC146818 is complete.
195The type of interrupt is signalled in the RTC status registers.
196
197The RTC will update the current time fields by battery power even while the
198system is off. The current time fields should not be read while an update is
199in progress, as indicated in the status register.
200
201The clock uses a 32.768kHz crystal, so bits 6-4 of register A should be
202programmed to a 32kHz divider if the RTC is to count seconds.
203
204This is the RAM map originally used for the RTC/CMOS:
205
206Location Size Description
207------------------------------------------
20800h byte Current second (BCD)
20901h byte Seconds alarm (BCD)
21002h byte Current minute (BCD)
21103h byte Minutes alarm (BCD)
21204h byte Current hour (BCD)
21305h byte Hours alarm (BCD)
21406h byte Current day of week (BCD)
21507h byte Current day of month (BCD)
21608h byte Current month (BCD)
21709h byte Current year (BCD)
2180Ah byte Register A
219 bit 7 = Update in progress
220 bit 6-4 = Divider for clock
221 000 = 4.194 MHz
222 001 = 1.049 MHz
223 010 = 32 kHz
224 10X = test modes
225 110 = reset / disable
226 111 = reset / disable
227 bit 3-0 = Rate selection for periodic interrupt
228 0000 = periodic timer disabled
229 0001 = 3.90625 mS
230 0010 = 7.8125 mS
231 0011 = .122070 mS
232 0100 = .244141 mS
233 ...
234 1101 = 125 mS
235 1110 = 250 mS
236 1111 = 500 mS
2370Bh byte Register B
238 bit 7 = Run (0) / Halt (1)
239 bit 6 = Periodic interrupt enable
240 bit 5 = Alarm interrupt enable
241 bit 4 = Update-ended interrupt enable
242 bit 3 = Square wave output enable
243 bit 2 = BCD calendar (0) / Binary (1)
244 bit 1 = 12-hour mode (0) / 24-hour mode (1)
245 bit 0 = 0 (DST off) / 1 (DST enabled)
2460Ch byte Register C (read only)
247 bit 7 = interrupt request flag (IRQF)
248 bit 6 = periodic interrupt flag (PF)
249 bit 5 = alarm interrupt flag (AF)
250 bit 4 = update interrupt flag (UF)
251 bit 3-0 = reserved
2520Dh byte Register D (read only)
253 bit 7 = RTC has power
254 bit 6-0 = reserved
25532h byte Current century BCD (*)
256 (*) location vendor specific and now determined from ACPI global tables
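
A minimal sketch of how software might interpret these registers, assuming
the 32.768kHz time base and the MC146818 data sheet behaviour in which rate
selections 1 and 2 produce 256 Hz and 128 Hz. The helper names are
hypothetical, and the functions operate on values already read from CMOS.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one BCD-encoded time field (e.g. location 00h) to binary. */
static unsigned int rtc_bcd_to_bin(uint8_t bcd)
{
	return (bcd >> 4) * 10u + (bcd & 0x0fu);
}

/* Register A, bit 7: time fields must not be read while this is set. */
static int rtc_update_in_progress(uint8_t reg_a)
{
	return (reg_a & 0x80) != 0;
}

/* Periodic interrupt frequency in Hz for the rate in Register A bits 3-0. */
static unsigned int rtc_periodic_hz(uint8_t reg_a)
{
	unsigned int rate = reg_a & 0x0fu;

	if (rate == 0)
		return 0;			/* periodic timer disabled */
	if (rate <= 2)
		return 256u >> (rate - 1);	/* assumed: 256 Hz, 128 Hz */
	return 32768u >> (rate - 1);		/* 8192 Hz down to 2 Hz */
}
```

For example, a Register A value of 0x26 selects the 32 kHz divider with rate
0110, which this sketch maps to 1024 Hz.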
257
2582.3) APIC
259
260On Pentium and later processors, an on-board timer is available to each CPU
261as part of the Advanced Programmable Interrupt Controller. The APIC is
262accessed through memory-mapped registers and provides interrupt service to each
263CPU, used for IPIs and local timer interrupts.
264
265Although in theory the APIC is a safe and stable source for local interrupts,
266in practice, many bugs and glitches have occurred due to the special nature of
267the APIC CPU-local memory-mapped hardware. Beware that CPU errata may affect
268the use of the APIC and that workarounds may be required. In addition, some of
269these workarounds pose unique constraints for virtualization - requiring either
270extra overhead incurred from extra reads of memory-mapped I/O or additional
271functionality that may be more computationally expensive to implement.
272
273Since the APIC is documented quite well in the Intel and AMD manuals, we will
274avoid repetition of the detail here. It should be pointed out that the APIC
275timer is programmed through the LVT (local vector table) register, is capable
276of one-shot or periodic operation, and is based on the bus clock divided down
277by the programmable divider register.
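
To illustrate the relationship between bus clock, divider and timer period,
the following sketch computes an initial count value for a desired period. The
function name is hypothetical, the bus frequency is assumed to be known from
calibration, and the divider is assumed to be a power of two between 1 and
128.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Initial count for a period of period_ns, clamped to the 32-bit counter. */
static uint32_t apic_initial_count(uint64_t bus_hz, unsigned int divider,
				   uint64_t period_ns)
{
	uint64_t timer_hz, count;

	assert(divider >= 1 && divider <= 128 &&
	       (divider & (divider - 1)) == 0);

	timer_hz = bus_hz / divider;
	/* Split the multiplication so ordinary inputs do not overflow. */
	count = (timer_hz / NSEC_PER_SEC) * period_ns +
		(timer_hz % NSEC_PER_SEC) * period_ns / NSEC_PER_SEC;

	return count > UINT32_MAX ? UINT32_MAX : (uint32_t)count;
}
```

With an assumed 100 MHz bus clock and a divider of 16, a 1 ms period
corresponds to an initial count of 6250.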
278
2792.4) HPET
280
281HPET is quite complex, and was originally intended to replace the PIT / RTC
282support of the X86 PC. It remains to be seen whether that will be the case, as
283the de facto standard of PC hardware is to emulate these older devices. Some
284systems designated as legacy free may support only the HPET as a hardware timer
285device.
286
287The HPET spec is rather loose and vague, requiring at least 3 hardware timers,
288but allowing implementation freedom to support many more. It also imposes no
289fixed rate on the timer frequency, but does impose some extremal values on
290frequency, error and slew.
291
292In general, the HPET is recommended as a high precision (compared to PIT / RTC)
293time source which is independent of local variation (as there is only one HPET
294in any given system). The HPET is also memory-mapped, and its presence is
295indicated through ACPI tables by the BIOS.
296
297Detailed specification of the HPET is beyond the current scope of this
298document, as it is also very well documented elsewhere.
299
3002.5) Offboard Timers
301
302Several cards, both proprietary (watchdog boards) and commonplace (e1000) have
303timing chips built into the cards which may have registers which are accessible
304to kernel or user drivers. To the author's knowledge, using these to generate
305a clocksource for a Linux or other kernel has not yet been attempted and is in
306general frowned upon as not playing by the agreed rules of the game. Such a
307timer device would require additional support to be virtualized properly and is
308not considered important at this time as no known operating system does this.
309
310=========================================================================
311
3123) TSC Hardware
313
314The TSC or time stamp counter is relatively simple in theory; it counts
315instruction cycles issued by the processor, which can be used as a measure of
316time. In practice, due to a number of problems, it is the most complicated
317timekeeping device to use.
318
319The TSC is represented internally as a 64-bit MSR which can be read with the
320RDMSR, RDTSC, or RDTSCP (when available) instructions. In the past, hardware
321limitations restricted writes to the TSC: on old hardware it was generally only
322possible to write the low 32 bits of the 64-bit counter, and the upper 32 bits
323of the counter were cleared. Now, however, on Intel processors family
3240Fh, for models 3, 4 and 6, and family 06h, models e and f, this restriction
325has been lifted and all 64-bits are writable. On AMD systems, the ability to
326write the TSC MSR is not an architectural guarantee.
327
328The TSC is accessible from CPL-0 and conditionally, for CPL > 0 software by
329means of the CR4.TSD bit, which when enabled, disables CPL > 0 TSC access.
330
331Some vendors have implemented an additional instruction, RDTSCP, which returns
332atomically not just the TSC, but an indicator which corresponds to the
333processor number. This can be used to index into an array of TSC variables to
334determine offset information in SMP systems where TSCs are not synchronized.
335The presence of this instruction must be determined by consulting CPUID feature
336bits.
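
The array lookup described above might look roughly like the following. This
is a sketch under stated assumptions: the struct, the function and the example
offset table are hypothetical, and the sample is taken to be a (TSC, processor
number) pair as returned together by RDTSCP. The instruction itself is not
issued here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two values RDTSCP returns atomically. */
struct tsc_sample {
	uint64_t tsc;
	uint32_t cpu;
};

/*
 * Hypothetical table: how far each CPU's TSC runs ahead of (positive)
 * or behind (negative) a chosen reference CPU.
 */
static const int64_t example_tsc_offsets[3] = { 0, 100, -50 };

/* Map a per-CPU TSC reading onto the reference CPU's time base. */
static uint64_t tsc_normalize(struct tsc_sample s, const int64_t *offsets,
			      size_t ncpus)
{
	assert(s.cpu < ncpus);
	/* Unsigned subtraction wraps correctly for negative offsets. */
	return s.tsc - (uint64_t)offsets[s.cpu];
}
```

Because the TSC value and the processor number come from one instruction, the
lookup cannot pair a reading with the wrong CPU if the task migrates between
the read and the correction.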
337
338Both VMX and SVM provide extension fields in the virtualization hardware which
339allow the guest visible TSC to be offset by a constant. Newer implementations
340promise to allow the TSC to additionally be scaled, but this hardware is not
341yet widely available.
342
3433.1) TSC synchronization
344
345The TSC is a CPU-local clock in most implementations. This means, on SMP
346platforms, the TSCs of different CPUs may start at different times depending
347on when the CPUs are powered on. Generally, CPUs on the same die will share
348the same clock, however, this is not always the case.
349
350The BIOS may attempt to resynchronize the TSCs during the poweron process and
351the operating system or other system software may attempt to do this as well.
352Several hardware limitations make the problem worse - if it is not possible to
353write the full 64-bits of the TSC, it may be impossible to match the TSC in
354newly arriving CPUs to that of the rest of the system, resulting in
355unsynchronized TSCs. This may be done by BIOS or system software, but in
356practice, getting a perfectly synchronized TSC will not be possible unless all
357values are read from the same clock, which generally only is possible on single
358socket systems or those with special hardware support.
359
3603.2) TSC and CPU hotplug
361
362As touched on already, CPUs which arrive later than the boot time of the system
363may not have a TSC value that is synchronized with the rest of the system.
364Either system software, BIOS, or SMM code may actually try to establish the TSC
365to a value matching the rest of the system, but a perfect match is usually not
366guaranteed. This can have the effect of bringing a system from a state where
367TSC is synchronized back to a state where TSC synchronization flaws, however
368small, may be exposed to the OS and any virtualization environment.
369
3703.3) TSC and multi-socket / NUMA
371
372Multi-socket systems, especially large multi-socket systems are likely to have
373individual clocksources rather than a single, universally distributed clock.
374Since these clocks are driven by different crystals, they will not have
375perfectly matched frequency, and temperature and electrical variations will
376cause the CPU clocks, and thus the TSCs to drift over time. Depending on the
377exact clock and bus design, the drift may or may not be fixed in absolute
378error, and may accumulate over time.
379
380In addition, very large systems may deliberately slew the clocks of individual
381cores. This technique, known as spread-spectrum clocking, reduces EMI at the
382clock frequency and harmonics of it, which may be required to pass FCC
383standards for telecommunications and computer equipment.
384
385It is recommended not to trust the TSCs to remain synchronized on NUMA or
386multiple socket systems for these reasons.
387
3883.4) TSC and C-states
389
390C-states, or idling states of the processor, especially C1E and deeper sleep
391states may be problematic for TSC as well. The TSC may stop advancing in such
392a state, resulting in a TSC which is behind that of other CPUs when execution
393is resumed. Such CPUs must be detected and flagged by the operating system
394based on CPU and chipset identifications.
395
396The TSC in such a case may be corrected by catching it up to a known external
397clocksource.
398
3993.5) TSC frequency change / P-states
400
401To make things slightly more interesting, some CPUs may change frequency. They
402may or may not run the TSC at the same rate, and because the frequency change
403may be staggered or slewed, at some points in time, the TSC rate may not be
404known other than falling within a range of values. In this case, the TSC will
405not be a stable time source, and must be calibrated against a known, stable,
406external clock to be a usable source of time.
407
408Whether the TSC runs at a constant rate or scales with the P-state is model
409dependent and must be determined by inspecting CPUID, chipset or vendor
410specific MSR fields.
411
412In addition, some vendors have known bugs where the P-state is actually
413compensated for properly during normal operation, but when the processor is
414inactive, the P-state may be raised temporarily to service cache misses from
415other processors. In such cases, the TSC on halted CPUs could advance faster
416than that of non-halted processors. AMD Turion processors are known to have
417this problem.
418
4193.6) TSC and STPCLK / T-states
420
421External signals given to the processor may also have the effect of stopping
422the TSC. This is typically done for thermal emergency power control to prevent
423an overheating condition, and typically, there is no way to detect that this
424condition has happened.
425
4263.7) TSC virtualization - VMX
427
428VMX provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
429instructions, which is enough for full virtualization of TSC in any manner. In
430addition, VMX allows passing through the host TSC plus an additional TSC_OFFSET
431field specified in the VMCS. Special instructions must be used to read and
432write the VMCS field.
433
4343.8) TSC virtualization - SVM
435
436SVM provides conditional trapping of RDTSC, RDMSR, WRMSR and RDTSCP
437instructions, which is enough for full virtualization of TSC in any manner. In
438addition, SVM allows passing through the host TSC plus an additional offset
439field specified in the SVM control block.
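
The offset mechanism common to VMX and SVM can be sketched as plain 64-bit
arithmetic. The function names are hypothetical, and the sketch only models
the calculation; it does not touch the VMCS or the SVM control block.

```c
#include <assert.h>
#include <stdint.h>

/* What the guest observes: host TSC plus the offset, modulo 2^64. */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
	return host_tsc + tsc_offset;
}

/* Offset to program so the guest reads guest_wanted at host time host_now. */
static uint64_t tsc_offset_for(uint64_t host_now, uint64_t guest_wanted)
{
	return guest_wanted - host_now;
}
```

A guest started when the host TSC reads 1000000 and meant to see a TSC of
zero needs an offset of -1000000, which is stored as its two's complement
64-bit value; 500 host cycles later the guest then reads 500.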
440
4413.9) TSC feature bits in Linux
442
443In summary, there is no way to guarantee the TSC remains in perfect
444synchronization unless it is explicitly guaranteed by the architecture. Even
445if so, the TSCs in multi-sockets or NUMA systems may still run independently
446despite being locally consistent.
447
448The following feature bits are used by Linux to signal various TSC attributes,
449but they can only be taken to be meaningful for UP or single node systems.
450
451X86_FEATURE_TSC : The TSC is available in hardware
452X86_FEATURE_RDTSCP : The RDTSCP instruction is available
453X86_FEATURE_CONSTANT_TSC : The TSC rate is unchanged with P-states
454X86_FEATURE_NONSTOP_TSC : The TSC does not stop in C-states
455X86_FEATURE_TSC_RELIABLE : TSC sync checks are skipped (VMware)
456
4574) Virtualization Problems
458
459Timekeeping is especially problematic for virtualization because a number of
460challenges arise. The most obvious problem is that time is now shared between
461the host and, potentially, a number of virtual machines. Thus the virtual
462operating system does not run with 100% usage of the CPU, despite the fact that
463it may very well make that assumption. It may expect it to remain true to very
464exacting bounds when interrupt sources are disabled, but in reality only its
465virtual interrupt sources are disabled, and the machine may still be preempted
466at any time. This causes problems as the passage of real time, the injection
467of machine interrupts and the associated clock sources are no longer completely
468synchronized with real time.
469
470This same problem can occur on native hardware to a degree, as SMM mode may
471steal cycles from the operating system on X86 systems when SMM mode is used by the
472BIOS, but not in such an extreme fashion. However, the fact that SMM mode may
473cause similar problems to virtualization makes it a good justification for
474solving many of these problems on bare metal.
475
4764.1) Interrupt clocking
477
478One of the most immediate problems that occurs with legacy operating systems
479is that the system timekeeping routines are often designed to keep track of
480time by counting periodic interrupts. These interrupts may come from the PIT
481or the RTC, but the problem is the same: the host virtualization engine may not
482be able to deliver the proper number of interrupts per second, and so guest
483time may fall behind. This is especially problematic if a high interrupt rate
484is selected, such as 1000 HZ, which is unfortunately the default for many Linux
485guests.
486
487There are three approaches to solving this problem; first, it may be possible
488to simply ignore it. Guests which have a separate time source for tracking
489'wall clock' or 'real time' may not need any adjustment of their interrupts to
490maintain proper time. If this is not sufficient, it may be necessary to inject
491additional interrupts into the guest in order to increase the effective
492interrupt rate. This approach leads to complications in extreme conditions,
493where host load or guest lag is too much to compensate for, and thus another
494solution to the problem has arisen: the guest may need to become aware of lost
495ticks and compensate for them internally. Although promising in theory, the
496implementation of this policy in Linux has been extremely error prone, and a
497number of buggy variants of lost tick compensation are distributed across
498commonly used Linux systems.
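
The bookkeeping behind injecting additional interrupts can be sketched as
follows. This is an illustration only, with hypothetical function names; it
assumes the host knows the elapsed real time since the guest timer was
programmed and how many ticks it has delivered so far.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ull

/* Number of periodic interrupts that should have fired in elapsed_ns. */
static uint64_t ticks_due(uint64_t elapsed_ns, unsigned int hz)
{
	/* Whole seconds and the remainder are scaled separately. */
	return (elapsed_ns / NSEC_PER_SEC) * hz +
	       (elapsed_ns % NSEC_PER_SEC) * hz / NSEC_PER_SEC;
}

/* Backlog of ticks the guest has not yet seen; zero if it is caught up. */
static uint64_t ticks_to_inject(uint64_t elapsed_ns, unsigned int hz,
				uint64_t delivered)
{
	uint64_t due = ticks_due(elapsed_ns, hz);

	return due > delivered ? due - delivered : 0;
}
```

A guest running at 1000 HZ that received only 900 interrupts during one
second of real time has a backlog of 100 ticks. How quickly such a backlog is
replayed is a policy decision that this sketch leaves open.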
499
500Windows uses periodic RTC clocking as a means of keeping time internally, and
501thus requires interrupt slewing to keep proper time. It does use a low enough
502rate (ed: is it 18.2 Hz?) however that it has not yet been a problem in
503practice.
504
5054.2) TSC sampling and serialization
506
507As the highest precision time source available, the cycle counter of the CPU
508has aroused much interest from developers. As explained above, this timer has
509many problems unique to its nature as a local, potentially unstable and
510potentially unsynchronized source. One issue which is not unique to the TSC,
511but is highlighted because of its very precise nature is sampling delay. By
512definition, the counter, once read is already old. However, it is also
513possible for the counter to be read ahead of the actual use of the result.
514This is a consequence of the superscalar execution of the instruction stream,
515which may execute instructions out of order. Such execution is called
516non-serialized. Forcing serialized execution is necessary for precise
517measurement with the TSC, and requires a serializing instruction, such as CPUID
518or an MSR read.
519
520Since CPUID may actually be virtualized by a trap and emulate mechanism, this
521serialization can pose a performance issue for hardware virtualization. An
522accurate time stamp counter reading may therefore not always be available, and
523it may be necessary for an implementation to guard against "backwards" reads of
524the TSC as seen from other CPUs, even in an otherwise perfectly synchronized
525system.
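
One way to guard against "backwards" reads is to remember the largest value
handed out so far and never return less. The sketch below uses C11 atomics so
that several threads can share the guard; the variable and function names are
hypothetical, and this is one possible approach rather than the method used
by any particular kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest counter value returned to any caller so far. */
static _Atomic uint64_t last_tsc_seen;

/* Return sample, or the last value handed out if sample is older. */
static uint64_t tsc_monotonic_filter(uint64_t sample)
{
	uint64_t last = atomic_load(&last_tsc_seen);

	while (sample > last) {
		/* On failure, last is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&last_tsc_seen, &last,
						 sample))
			return sample;
	}
	return last;
}
```

The cost of this guard is a shared cache line touched on every read, which
trades some of the TSC's low overhead for monotonicity.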
526
5274.3) Timespec aliasing
528
529Additionally, this lack of serialization from the TSC poses another challenge
530when using results of the TSC when measured against another time source. As
531the TSC is much higher precision, many possible values of the TSC may be read
532while another clock is still expressing the same value.
533
534That is, you may read (T,T+10) while external clock C maintains the same value.
535Due to non-serialized reads, you may actually end up with a range which
536fluctuates - from (T-1.. T+10). Thus, any time calculated from a TSC, but
537calibrated against an external value may have a range of valid values.
538Re-calibrating this computation may actually cause time, as computed after the
539calibration, to go backwards, compared with time computed before the
540calibration.
541
542This problem is particularly pronounced with an internal time source in Linux,
543the kernel time, which is expressed in the theoretically high resolution
544timespec - but which advances in much larger granularity intervals, sometimes
545at the rate of jiffies, and possibly in catchup modes, at a much larger step.
546
547This aliasing requires care in the computation and recalibration of kvmclock
548and any other values derived from TSC computation (such as TSC virtualization
549itself).
550
5514.4) Migration
552
553Migration of a virtual machine raises problems for timekeeping in two ways.
554First, the migration itself may take time, during which interrupts cannot be
555delivered, and after which, the guest time may need to be caught up. NTP may
556be able to help to some degree here, as the clock correction required is
557typically small enough to fall in the NTP-correctable window.
558
559An additional concern is that timers based off the TSC (or HPET, if the raw bus
560clock is exposed) may now be running at different rates, requiring compensation
561in some way in the hypervisor by virtualizing these timers. In addition,
562migrating to a faster machine may preclude the use of a passthrough TSC, as a
563faster clock cannot be made visible to a guest without the potential of time
564advancing faster than usual. A slower clock is less of a problem, as it can
565always be caught up to the original rate. KVM clock avoids these problems by
566simply storing multipliers and offsets against the TSC for the guest to convert
567back into nanosecond resolution values.
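
The multiplier and offset scheme can be sketched as a fixed-point conversion.
The struct layout and field names below are illustrative assumptions, not
taken from this document: the host is assumed to publish a TSC timestamp, the
nanosecond time at that timestamp, a shift, and a 32.32 fixed-point
multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter block published by the host for one guest CPU. */
struct tsc_clock_params {
	uint64_t tsc_timestamp;		/* TSC value at the last update */
	uint64_t system_time_ns;	/* nanoseconds at tsc_timestamp */
	uint32_t tsc_to_ns_mul;		/* ns per cycle, 32.32 fixed point */
	int8_t tsc_shift;		/* pre-scaling shift for the delta */
};

/* Example block for an assumed 2 GHz TSC: 0.5 ns per cycle. */
static const struct tsc_clock_params example_2ghz = {
	.tsc_timestamp = 1000,
	.system_time_ns = 5000,
	.tsc_to_ns_mul = 0x80000000u,
	.tsc_shift = 0,
};

static uint64_t tsc_to_ns(const struct tsc_clock_params *p, uint64_t tsc)
{
	uint64_t delta = tsc - p->tsc_timestamp;

	if (p->tsc_shift >= 0)
		delta <<= p->tsc_shift;
	else
		delta >>= -p->tsc_shift;

	/* (delta * mul) >> 32, split in halves to avoid a 128-bit type. */
	return p->system_time_ns +
	       (delta >> 32) * p->tsc_to_ns_mul +
	       (((delta & 0xffffffffull) * p->tsc_to_ns_mul) >> 32);
}
```

After migration to a host with a different TSC rate, only the published
parameters change; the guest keeps using the same conversion.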
568
5694.5) Scheduling
570
571Since scheduling may be based on precise timing and firing of interrupts, the
572scheduling algorithms of an operating system may be adversely affected by
573virtualization. In theory, the effect is random and should be uniformly
574distributed, but in contrived as well as real scenarios (guest device access,
575causes of virtualization exits, possible context switch), this may not always
576be the case. The effect of this has not been well studied.
577
578In an attempt to work around this, several implementations have provided a
579paravirtualized scheduler clock, which reveals the true amount of CPU time for
580which a virtual machine has been running.
581
5824.6) Watchdogs
583
584Watchdog timers, such as the lockup detector in Linux, may fire accidentally when
585running under hardware virtualization due to timer interrupts being delayed or
586misinterpretation of the passage of real time. Usually, these warnings are
587spurious and can be ignored, but in some circumstances it may be necessary to
588disable such detection.
589
5904.7) Delays and precision timing
591
592Precise timing and delays may not be possible in a virtualized system. This
593can happen if the system is controlling physical hardware, or issues delays to
594compensate for slower I/O to and from devices. The first issue is not solvable
595in general for a virtualized system; hardware control software can't be
596adequately virtualized without a full real-time operating system, which would
597require an RT aware virtualization platform.
598
599The second issue may cause performance problems, but this is unlikely to be a
600significant issue. In many cases these delays may be eliminated through
601configuration or paravirtualization.
602
6034.8) Covert channels and leaks
604
605In addition to the above problems, time information will inevitably leak to the
606guest about the host in anything but a perfect implementation of virtualized
607time. This may allow the guest to infer the presence of a hypervisor (as in a
608red-pill type detection), and it may allow information to leak between guests
609by using CPU utilization itself as a signalling channel. Preventing such
610problems would require completely isolated virtual time which may not track
611real time any longer. This may be useful in certain security or QA contexts,
612but in general isn't recommended for real-world deployment scenarios.
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 8a6a8c6d4980..dc73bc54cc4e 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1640,15 +1640,6 @@ static void blk_request(struct virtqueue *vq)
 	off = out->sector * 512;
 
 	/*
-	 * The block device implements "barriers", where the Guest indicates
-	 * that it wants all previous writes to occur before this write.  We
-	 * don't have a way of asking our kernel to do a barrier, so we just
-	 * synchronize all the data in the file.  Pretty poor, no?
-	 */
-	if (out->type & VIRTIO_BLK_T_BARRIER)
-		fdatasync(vblk->fd);
-
-	/*
 	 * In general the virtio block driver is allowed to try SCSI commands.
 	 * It'd be nice if we supported eject, for example, but we don't.
 	 */
@@ -1680,6 +1671,13 @@ static void blk_request(struct virtqueue *vq)
 			/* Die, bad Guest, die. */
 			errx(1, "Write past end %llu+%u", off, ret);
 		}
+
+		wlen = sizeof(*in);
+		*in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
+	} else if (out->type & VIRTIO_BLK_T_FLUSH) {
+		/* Flush */
+		ret = fdatasync(vblk->fd);
+		verbose("FLUSH fdatasync: %i\n", ret);
 		wlen = sizeof(*in);
 		*in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
 	} else {
@@ -1703,15 +1701,6 @@ static void blk_request(struct virtqueue *vq)
 		}
 	}
 
-	/*
-	 * OK, so we noted that it was pretty poor to use an fdatasync as a
-	 * barrier.  But Christoph Hellwig points out that we need a sync
-	 * *afterwards* as well: "Barriers specify no reordering to the front
-	 * or the back."  And Jens Axboe confirmed it, so here we are:
-	 */
-	if (out->type & VIRTIO_BLK_T_BARRIER)
-		fdatasync(vblk->fd);
-
 	/* Finished that request. */
 	add_used(vq, head, wlen);
 }
@@ -1736,8 +1725,8 @@ static void setup_block_file(const char *filename)
 	vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE);
 	vblk->len = lseek64(vblk->fd, 0, SEEK_END);
 
-	/* We support barriers. */
-	add_feature(dev, VIRTIO_BLK_F_BARRIER);
+	/* We support FLUSH. */
+	add_feature(dev, VIRTIO_BLK_F_FLUSH);
 
 	/* Tell Guest how many sectors this device has. */
 	conf.capacity = cpu_to_le64(vblk->len / 512);
diff --git a/Documentation/misc-devices/apds990x.txt b/Documentation/misc-devices/apds990x.txt
new file mode 100644
index 000000000000..d5408cade32f
--- /dev/null
+++ b/Documentation/misc-devices/apds990x.txt
@@ -0,0 +1,111 @@
1Kernel driver apds990x
2======================
3
4Supported chips:
5Avago APDS990X
6
7Data sheet:
8Not freely available
9
10Author:
11Samu Onkalo <samu.p.onkalo@nokia.com>
12
13Description
14-----------
15
16APDS990x is a combined ambient light sensor (ALS) and proximity sensor. ALS and
17proximity functionality are closely connected: the ALS measurement path must be
18running while the proximity functionality is enabled.
19
20ALS produces raw measurement values for two channels: clear channel
21(infrared + visible light) and IR only. However, threshold comparisons use
22the clear channel only. The lux value and the threshold level on the HW
23may vary considerably depending on the spectrum of the light source.
24
25The driver makes the necessary conversions in both directions so that the user
26handles only lux values. The lux value is calculated using information from both
27channels. The HW threshold level is calculated from the given lux value to match
28the current type of lighting. Sometimes inaccuracy of the estimations
29leads to a false interrupt, but that does no harm.
30
31ALS contains 4 different gain steps. The driver automatically
32selects a suitable gain step. After each measurement, the reliability of the results
33is estimated and a new measurement is triggered if necessary.
34
35Platform data can provide tuned values for the conversion formulas if the
36values are known. Otherwise plain sensor default values are used.
37
38The proximity side is a little simpler. There is no need for complex conversions.
39It produces directly usable values.
40
41The driver controls the chip operational state using the pm_runtime framework.
42Voltage regulators are controlled based on the chip operational state.
43
44SYSFS
45-----
46
47
48chip_id
49 RO - shows detected chip type and version
50
51power_state
52 RW - enable / disable chip. Uses counting logic
53 1 enables the chip
54 0 disables the chip
55lux0_input
56 RO - measured lux value
57 sysfs_notify called when threshold interrupt occurs
58
59lux0_sensor_range
60 RO - lux0_input max value. Never actually reached, since the sensor tends
61 to saturate well before that. The real max value varies depending
62 on the light spectrum etc.
63
64lux0_rate
65 RW - measurement rate in Hz
66
67lux0_rate_avail
68 RO - supported measurement rates
69
70lux0_calibscale
71 RW - calibration value. Set to neutral value by default.
72 Output results are multiplied by the calibscale / calibscale_default
73 value.
74
75lux0_calibscale_default
76 RO - neutral calibration value
77
78lux0_thresh_above_value
79 RW - HI level threshold value. All results above the value
80 trigger an interrupt. 65535 (i.e. sensor_range) disables the above
81 interrupt.
82
83lux0_thresh_below_value
84 RW - LO level threshold value. All results below the value
85 trigger an interrupt. 0 disables the below interrupt.
86
87prox0_raw
88 RO - measured proximity value
89 sysfs_notify called when threshold interrupt occurs
90
91prox0_sensor_range
92 RO - prox0_raw max value (1023)
93
94prox0_raw_en
95 RW - enable / disable proximity - uses counting logic
96 1 enables the proximity
97 0 disables the proximity
98
99prox0_reporting_mode
100 RW - trigger / periodic. In "trigger" mode the driver tells two possible
101 values: 0 or prox0_sensor_range value. 0 means no proximity,
102 1023 means proximity. This causes minimal number of interrupts.
103 In "periodic" mode the driver reports all values above
104 prox0_thresh_above. This causes more interrupts, but it can give
105 _rough_ estimate about the distance.
106
107prox0_reporting_mode_avail
108 RO - accepted values to prox0_reporting_mode (trigger, periodic)
109
110prox0_thresh_above_value
111 RW - threshold level which trigs proximity events.
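
The lux0_calibscale entry above states that output results are multiplied by
calibscale / calibscale_default. As a rough illustration only, that arithmetic
can be sketched in C as follows. This is not the driver's code: the function
name apply_calibscale, the 64-bit intermediate and the clamping to 32 bits are
assumptions made for the example.

```c
#include <stdint.h>

/*
 * Illustrative sketch of the documented scaling:
 *     result = raw * calibscale / calibscale_default
 * All names here are hypothetical and not taken from the driver source.
 */
uint32_t apply_calibscale(uint32_t raw_lux, uint32_t calibscale,
                          uint32_t calibscale_default)
{
    uint64_t scaled;

    /* Assumption: a zero default is treated as "no scaling". */
    if (calibscale_default == 0)
        return raw_lux;

    /* 64-bit intermediate so raw_lux * calibscale cannot overflow. */
    scaled = (uint64_t)raw_lux * calibscale / calibscale_default;

    /* Clamp instead of wrapping if the result does not fit. */
    return scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
}
```

With calibscale equal to calibscale_default the value passes through
unchanged, which corresponds to the neutral setting described in the text.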
diff --git a/Documentation/misc-devices/bh1770glc.txt b/Documentation/misc-devices/bh1770glc.txt
new file mode 100644
index 000000000000..7d64c014dc70
--- /dev/null
+++ b/Documentation/misc-devices/bh1770glc.txt
@@ -0,0 +1,116 @@
+Kernel driver bh1770glc
+=======================
+
+Supported chips:
+ROHM BH1770GLC
+OSRAM SFH7770
+
+Data sheet:
+Not freely available
+
+Author:
+Samu Onkalo <samu.p.onkalo@nokia.com>
+
+Description
+-----------
+BH1770GLC and SFH7770 are combined ambient light and proximity sensors.
+The ALS and proximity parts operate on their own, but they share a common I2C
+interface and interrupt logic. In principle they can run on their own,
+but ALS side results are used to estimate reliability of the proximity sensor.
+
+ALS produces 16 bit lux values. The chip contains interrupt logic to produce
+low and high threshold interrupts.
+
+The proximity part contains an IR LED driver for up to 3 IR LEDs. The chip
+measures the amount of reflected IR light and produces a proximity result.
+Resolution is 8 bit. Driver supports only one channel. Driver uses ALS results
+to estimate reliability of the proximity results. Thus ALS is always running
+while proximity detection is needed.
+
+Driver uses threshold interrupts to avoid the need for polling the values.
+The chip has no proximity low interrupt. This is simulated
+by using a delayed work. As long as there are proximity threshold above
+interrupts the delayed work is pushed forward. So, when proximity level goes
+below the threshold value, there is no interrupt and the delayed work will
+finally run. This is handled as no proximity indication.
+
+Chip state is controlled via runtime pm framework when enabled in config.
+
+Calibscale factor is used to hide differences between the chips. By default
+the value is set to a neutral state, meaning a factor of 1.00. To get proper
+values, a calibrated source of light is needed as a reference. Calibscale
+factor is set so that measurement produces about the expected lux value.
+
+SYSFS
+-----
+
+chip_id
+        RO - shows detected chip type and version
+
+power_state
+        RW - enable / disable chip. Uses counting logic
+             1 enables the chip
+             0 disables the chip
+
+lux0_input
+        RO - measured lux value
+             sysfs_notify called when threshold interrupt occurs
+
+lux0_sensor_range
+        RO - lux0_input max value
+
+lux0_rate
+        RW - measurement rate in Hz
+
+lux0_rate_avail
+        RO - supported measurement rates
+
+lux0_thresh_above_value
+        RW - HI level threshold value. All results above the value
+             trigger an interrupt. 65535 (i.e. sensor_range) disables the above
+             interrupt.
+
+lux0_thresh_below_value
+        RW - LO level threshold value. All results below the value
+             trigger an interrupt. 0 disables the below interrupt.
+
+lux0_calibscale
+        RW - calibration value. Set to neutral value by default.
+             Output results are multiplied with calibscale / calibscale_default
+             value.
+
+lux0_calibscale_default
+        RO - neutral calibration value
+
+prox0_raw
+        RO - measured proximity value
+             sysfs_notify called when threshold interrupt occurs
+
+prox0_sensor_range
+        RO - prox0_raw max value
+
+prox0_raw_en
+        RW - enable / disable proximity - uses counting logic
+             1 enables the proximity
+             0 disables the proximity
+
+prox0_thresh_above_count
+        RW - number of proximity interrupts needed before triggering the event
+
+prox0_rate_above
+        RW - Measurement rate (in Hz) when the level is above threshold
+             i.e. when proximity on has been reported.
+
+prox0_rate_below
+        RW - Measurement rate (in Hz) when the level is below threshold
+             i.e. when proximity off has been reported.
+
+prox0_rate_avail
+        RO - Supported proximity measurement rates in Hz
+
+prox0_thresh_above0_value
+        RW - threshold level which triggers proximity events.
+             Filtered by persistence filter (prox0_thresh_above_count)
+
+prox0_thresh_above1_value
+        RW - threshold level which triggers the event immediately
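
Since lux0_input and prox0_raw are documented above as calling sysfs_notify
when a threshold interrupt occurs, user space can sleep in poll() rather than
re-reading the values in a loop. The following is a minimal sketch of that
generic sysfs pattern, not code from this commit. The helper name
wait_sysfs_value and its signature are hypothetical, and the attribute path
has to be supplied by the caller because the device's sysfs location depends
on the platform.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until the attribute at `path` is notified
 * (or timeout_ms elapses), then re-read it as a decimal number.
 * Returns 0 and stores the value on success, -1 on error or timeout.
 */
int wait_sysfs_value(const char *path, int timeout_ms, long *value)
{
    char buf[32];
    struct pollfd pfd;
    ssize_t n;
    int ret = -1;

    pfd.fd = open(path, O_RDONLY);
    if (pfd.fd < 0)
        return -1;
    pfd.events = POLLPRI | POLLERR;
    pfd.revents = 0;

    /* An initial read is needed before poll() will block on sysfs. */
    if (read(pfd.fd, buf, sizeof(buf) - 1) < 0)
        goto out;

    /* sysfs_notify() wakes pollers with POLLPRI | POLLERR. */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        goto out; /* error or timeout */

    /* Rewind and fetch the value that caused the wakeup. */
    if (lseek(pfd.fd, 0, SEEK_SET) < 0)
        goto out;
    n = read(pfd.fd, buf, sizeof(buf) - 1);
    if (n <= 0)
        goto out;
    buf[n] = '\0';
    *value = strtol(buf, NULL, 10);
    ret = 0;
out:
    close(pfd.fd);
    return ret;
}
```

A caller would pass the full path of lux0_input or prox0_raw for its device
and a timeout in milliseconds (or -1 to wait indefinitely).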
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index d2b62b71b617..5dc638791d97 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -765,6 +765,14 @@ xmit_hash_policy
         does not exist, and the layer2 policy is the only policy.  The
         layer2+3 value was added for bonding version 3.2.2.
 
+resend_igmp
+
+        Specifies the number of IGMP membership reports to be issued after
+        a failover event. One membership report is issued immediately after
+        the failover; subsequent reports are sent at 200 ms intervals.
+
+        The valid range is 0 - 255; the default value is 1. This option
+        was added for bonding version 3.7.0.
 
 3. Configuring Bonding Devices
 ==============================
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index cd79735013f9..5b04b67ddca2 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -22,6 +22,7 @@ This file contains
       4.1.2 RAW socket option CAN_RAW_ERR_FILTER
       4.1.3 RAW socket option CAN_RAW_LOOPBACK
       4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+      4.1.5 RAW socket returned message flags
     4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
     4.3 connected transport protocols (SOCK_SEQPACKET)
     4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -471,6 +472,17 @@ solution for a couple of reasons:
     setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                &recv_own_msgs, sizeof(recv_own_msgs));
 
+  4.1.5 RAW socket returned message flags
+
+  When using the recvmsg() call, msg->msg_flags may contain the following flags:
+
+    MSG_DONTROUTE: set when the received frame was created on the local host.
+
+    MSG_CONFIRM: set when the frame was sent via the socket it is received on.
+      This flag can be interpreted as a 'transmission confirmation' when the
+      CAN driver supports the echo of frames on driver level, see 3.2 and 6.2.
+      In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
+
   4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
   4.3 connected transport protocols (SOCK_SEQPACKET)
   4.4 unconnected transport protocols (SOCK_DGRAM)
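
The two flags added in section 4.1.5 above can be turned into a simple
classification of where a received frame came from. The sketch below only
restates the documented meaning of MSG_DONTROUTE and MSG_CONFIRM; the enum
and the function name classify_frame_origin are hypothetical and exist only
for this example. The msg_flags argument is assumed to be the value found in
msg->msg_flags after a successful recvmsg() on a CAN_RAW socket.

```c
#include <sys/socket.h>

/* Hypothetical classification, derived from the flag descriptions above. */
enum frame_origin {
    ORIGIN_REMOTE,      /* neither flag set: frame was not created locally */
    ORIGIN_LOCAL_OTHER, /* created on the local host, not via this socket */
    ORIGIN_LOCAL_OWN    /* sent via this socket: transmission confirmation */
};

enum frame_origin classify_frame_origin(int msg_flags)
{
    /* MSG_CONFIRM is the more specific case, so test it first. */
    if (msg_flags & MSG_CONFIRM)
        return ORIGIN_LOCAL_OWN;
    if (msg_flags & MSG_DONTROUTE)
        return ORIGIN_LOCAL_OTHER;
    return ORIGIN_REMOTE;
}
```

As the text notes, ORIGIN_LOCAL_OWN can only be observed when
CAN_RAW_RECV_OWN_MSGS has been enabled on the socket.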
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index a62fdf7a6bff..271d524a4c8d 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -1,18 +1,20 @@
 DCCP protocol
-============
+=============
 
 
 Contents
 ========
-
 - Introduction
 - Missing features
 - Socket options
+- Sysctl variables
+- IOCTLs
+- Other tunables
 - Notes
 
+
 Introduction
 ============
-
 Datagram Congestion Control Protocol (DCCP) is an unreliable, connection
 oriented protocol designed to solve issues present in UDP and TCP, particularly
 for real-time and multimedia (streaming) traffic.
@@ -29,9 +31,9 @@ It has a base protocol and pluggable congestion control IDs (CCIDs).
 DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
 is at http://www.ietf.org/html.charters/dccp-charter.html
 
+
 Missing features
 ================
-
 The Linux DCCP implementation does not currently support all the features that are
 specified in RFCs 4340...42.
 
@@ -45,7 +47,6 @@ http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree
 
 Socket options
 ==============
-
 DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
 service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
 the socket will fall back to 0 (which means that no meaningful service code
@@ -112,6 +113,7 @@ DCCP_SOCKOPT_CCID_TX_INFO
 On unidirectional connections it is useful to close the unused half-connection
 via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.
 
+
 Sysctl variables
 ================
 Several DCCP default parameters can be managed by the following sysctls
@@ -155,15 +157,30 @@ sync_ratelimit = 125 ms
         sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
         of this parameter is milliseconds; a value of 0 disables rate-limiting.
 
+
 IOCTLS
 ======
 FIONREAD
         Works as in udp(7): returns in the `int' argument pointer the size of
         the next pending datagram in bytes, or 0 when no datagram is pending.
 
+
+Other tunables
+==============
+Per-route rto_min support
+        CCID-2 supports the RTAX_RTO_MIN per-route setting for the minimum value
+        of the RTO timer. This setting can be modified via the 'rto_min' option
+        of iproute2; for example:
+                > ip route change 10.0.0.0/24   rto_min 250j dev wlan0
+                > ip route add    10.0.0.254/32 rto_min 800j dev wlan0
+                > ip route show dev wlan0
+        CCID-3 also supports the rto_min setting: it is used to define the lower
+        bound for the expiry of the nofeedback timer. This can be useful on LANs
+        with very low RTTs (e.g., loopback, Gbit ethernet).
+
+
 Notes
 =====
-
 DCCP does not travel through NAT successfully at present on many boxes. This is
 because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT
 support for DCCP has been added.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f350c69b2bb4..c7165f4cb792 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1014,6 +1014,12 @@ conf/interface/*:
 accept_ra - BOOLEAN
         Accept Router Advertisements; autoconfigure using them.
 
+        Possible values are:
+                0 Do not accept Router Advertisements.
+                1 Accept Router Advertisements if forwarding is disabled.
+                2 Overrule forwarding behaviour. Accept Router Advertisements
+                  even if forwarding is enabled.
+
         Functional default: enabled if local forwarding is disabled.
                             disabled if local forwarding is enabled.
 
@@ -1075,7 +1081,12 @@ forwarding - BOOLEAN
         Note: It is recommended to have the same setting on all
         interfaces; mixed router/host scenarios are rather uncommon.
 
-        FALSE:
+        Possible values are:
+                0 Forwarding disabled
+                1 Forwarding enabled
+                2 Forwarding enabled (Hybrid Mode)
+
+        FALSE (0):
 
         By default, Host behaviour is assumed.  This means:
 
@@ -1085,18 +1096,24 @@ forwarding - BOOLEAN
            Advertisements (and do autoconfiguration).
         4. If accept_redirects is TRUE (default), accept Redirects.
 
-        TRUE:
+        TRUE (1):
 
         If local forwarding is enabled, Router behaviour is assumed.
         This means exactly the reverse from the above:
 
         1. IsRouter flag is set in Neighbour Advertisements.
         2. Router Solicitations are not sent.
-        3. Router Advertisements are ignored.
+        3. Router Advertisements are ignored unless accept_ra is 2.
         4. Redirects are ignored.
 
-        Default: FALSE if global forwarding is disabled (default),
-                 otherwise TRUE.
+        TRUE (2):
+
+        Hybrid mode. Same behaviour as TRUE, except for:
+
+        2. Router Solicitations are being sent when necessary.
+
+        Default: 0 (disabled) if global forwarding is disabled (default),
+                 otherwise 1 (enabled).
 
 hop_limit - INTEGER
         Default Hop Limit to set.
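
The interaction between accept_ra (0, 1, 2) and forwarding (0, 1, 2) described
in the hunks above can be summarised as two small predicates. This is a sketch
of the documented behaviour only, not the kernel's implementation; the
function names ra_is_accepted and rs_may_be_sent are hypothetical. It assumes
that host mode (forwarding 0) sends Router Solicitations when necessary, as
the phrase "exactly the reverse from the above" in the TRUE (1) case implies.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: are Router Advertisements accepted for the given
 * accept_ra and forwarding values, following the documented rules?
 */
bool ra_is_accepted(int accept_ra, int forwarding)
{
    if (accept_ra <= 0)
        return false;   /* 0: never accept RAs */
    if (forwarding != 0 && accept_ra < 2)
        return false;   /* 1: accept only while forwarding is disabled */
    return true;        /* 2: accept even if forwarding is enabled */
}

/*
 * Hypothetical helper: may Router Solicitations be sent?
 * Documented as sent in host mode (0) and hybrid mode (2), not in mode 1.
 */
bool rs_may_be_sent(int forwarding)
{
    return forwarding == 0 || forwarding == 2;
}
```

For instance, a router that should still autoconfigure from upstream
advertisements would combine forwarding=1 (or 2) with accept_ra=2, the only
combination above for which ra_is_accepted() is true while forwarding is on.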
diff --git a/Documentation/networking/phonet.txt b/Documentation/networking/phonet.txt
index 6e8ce09f9c73..24ad2adba6e5 100644
--- a/Documentation/networking/phonet.txt
+++ b/Documentation/networking/phonet.txt
@@ -112,6 +112,22 @@ However, connect() and getpeername() are not supported, as they did
 not seem useful with Phonet usages (could be added easily).
 
 
+Resource subscription
+---------------------
+
+A Phonet datagram socket can be subscribed to any number of 8-bit
+Phonet resources, as follows:
+
+  uint32_t res = 0xXX;
+  ioctl(fd, SIOCPNADDRESOURCE, &res);
+
+Subscription is similarly cancelled using the SIOCPNDELRESOURCE I/O
+control request, or when the socket is closed.
+
+Note that no more than one socket can be subscribed to any given
+resource at a time. Otherwise, ioctl() will fail with EBUSY.
+
+
 Phonet Pipe protocol
 --------------------
 
@@ -166,6 +182,46 @@ The pipe protocol provides two socket options at the SOL_PNPIPE level:
     or zero if encapsulation is off.
 
 
+Phonet Pipe-controller Implementation
+-------------------------------------
+
+Phonet Pipe-controller is enabled by selecting the CONFIG_PHONET_PIPECTRLR Kconfig
+option. It is useful when communicating with those Nokia Modems which do not
+implement a Pipe controller themselves, e.g. Nokia Slim Modem used in ST-Ericsson
+U8500 platform.
+
+The implementation is based on the Data Connection Establishment Sequence
+depicted in 'Nokia Wireless Modem API - Wireless_modem_user_guide.pdf'
+document.
+
+It allows a phonet sequenced socket (host-pep) to initiate a Pipe connection
+between itself and a remote pipe-end point (e.g. modem).
+
+The implementation adds socket options at SOL_PNPIPE level:
+
+ PNPIPE_PIPE_HANDLE
+       It accepts an integer argument for setting value of pipe handle.
+
+  PNPIPE_ENABLE accepts one integer value (int). If set to zero, the pipe
+    is disabled. If the value is non-zero, the pipe is enabled. If the pipe
+    is not (yet) connected, the ENOTCONN error is returned.
+
+The implementation also adds socket 'connect'. On calling the 'connect', pipe
+will be created between the source socket and the destination, and the pipe
+state will be set to PIPE_DISABLED.
+
+After a pipe has been created and enabled successfully, the Pipe data can be
+exchanged between the host-pep and remote-pep (modem).
+
+User-space would typically follow the sequence below with Pipe controller:
+-socket
+-bind
+-setsockopt for PNPIPE_PIPE_HANDLE
+-connect
+-setsockopt for PNPIPE_ENCAP_IP
+-setsockopt for PNPIPE_ENABLE
+
+
 Authors
 -------
 
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 88bb71b46da4..9eb1ba52013d 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -177,18 +177,6 @@ Doing it all yourself
 
    A convenience function to print out the PHY status neatly.
 
- int phy_clear_interrupt(struct phy_device *phydev);
- int phy_config_interrupt(struct phy_device *phydev, u32 interrupts);
-
-   Clear the PHY's interrupt, and configure which ones are allowed,
-   respectively.  Currently only supports all on, or all off.
-
- int phy_enable_interrupts(struct phy_device *phydev);
- int phy_disable_interrupts(struct phy_device *phydev);
-
-   Functions which enable/disable PHY interrupts, clearing them
-   before and after, respectively.
-
  int phy_start_interrupts(struct phy_device *phydev);
  int phy_stop_interrupts(struct phy_device *phydev);
 
@@ -213,12 +201,6 @@ Doing it all yourself
    Fills the phydev structure with up-to-date information about the current
    settings in the PHY.
 
- void phy_sanitize_settings(struct phy_device *phydev)
-
-   Resolves differences between currently desired settings, and
-   supported settings for the given PHY device.  Does not make
-   the changes in the hardware, though.
-
  int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
  int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd);
 
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index e8c8f4f06c67..98097d8cb910 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -172,15 +172,19 @@ struct skb_shared_hwtstamps {
 };
 
 Time stamps for outgoing packets are to be generated as follows:
-- In hard_start_xmit(), check if skb_tx(skb)->hardware is set no-zero.
-  If yes, then the driver is expected to do hardware time stamping.
+- In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
+  is set non-zero. If yes, then the driver is expected to do hardware time
+  stamping.
 - If this is possible for the skb and requested, then declare
-  that the driver is doing the time stamping by setting the field
-  skb_tx(skb)->in_progress non-zero. You might want to keep a pointer
-  to the associated skb for the next step and not free the skb. A driver
-  not supporting hardware time stamping doesn't do that. A driver must
-  never touch sk_buff::tstamp! It is used to store software generated
-  time stamps by the network subsystem.
+  that the driver is doing the time stamping by setting the flag
+  SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g. with
+
+      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+
+  You might want to keep a pointer to the associated skb for the next step
+  and not free the skb. A driver not supporting hardware time stamping doesn't
+  do that. A driver must never touch sk_buff::tstamp! It is used to store
+  software generated time stamps by the network subsystem.
 - As soon as the driver has sent the packet and/or obtained a
   hardware time stamp for it, it passes the time stamp back by
   calling skb_hwtstamp_tx() with the original skb, the raw
@@ -191,6 +195,6 @@ Time stamps for outgoing packets are to be generated as follows:
   this would occur at a later time in the processing pipeline than other
   software time stamping and therefore could lead to unexpected deltas
   between time stamps.
-- If the driver did not call set skb_tx(skb)->in_progress, then
+- If the driver did not set the SKBTX_IN_PROGRESS flag (see above), then
   dev_hard_start_xmit() checks whether software time stamping
   is wanted as fallback and potentially generates the time stamp.
diff --git a/Documentation/powerpc/dts-bindings/fsl/usb.txt b/Documentation/powerpc/dts-bindings/fsl/usb.txt
index b00152402694..bd5723f0b67e 100644
--- a/Documentation/powerpc/dts-bindings/fsl/usb.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/usb.txt
@@ -8,6 +8,7 @@ and additions :
 Required properties :
  - compatible : Should be "fsl-usb2-mph" for multi port host USB
    controllers, or "fsl-usb2-dr" for dual role USB controllers
+   or "fsl,mpc5121-usb2-dr" for dual role USB controllers of MPC5121
  - phy_type : For multi port host USB controllers, should be one of
    "ulpi", or "serial". For dual role USB controllers, should be
    one of "ulpi", "utmi", "utmi_wide", or "serial".
@@ -33,6 +34,12 @@ Recommended properties :
  - interrupt-parent : the phandle for the interrupt controller that
    services interrupts for this device.
 
+Optional properties :
+ - fsl,invert-drvvbus : boolean; for MPC5121 USB0 only. Indicates the
+   port power polarity of internal PHY signal DRVVBUS is inverted.
+ - fsl,invert-pwr-fault : boolean; for MPC5121 USB0 only. Indicates
+   the PWR_FAULT signal polarity is inverted.
+
 Example multi port host USB controller device node :
         usb@22000 {
                 compatible = "fsl-usb2-mph";
@@ -57,3 +64,18 @@ Example dual role USB controller device node :
                 dr_mode = "otg";
                 phy = "ulpi";
         };
+
+Example dual role USB controller device node for MPC5121ADS:
+
+        usb@4000 {
+                compatible = "fsl,mpc5121-usb2-dr";
+                reg = <0x4000 0x1000>;
+                #address-cells = <1>;
+                #size-cells = <0>;
+                interrupt-parent = < &ipic >;
+                interrupts = <44 0x8>;
+                dr_mode = "otg";
+                phy_type = "utmi_wide";
+                fsl,invert-drvvbus;
+                fsl,invert-pwr-fault;
+        };
diff --git a/Documentation/scsi/st.txt b/Documentation/scsi/st.txt
index 40752602c050..691ca292c24d 100644
--- a/Documentation/scsi/st.txt
+++ b/Documentation/scsi/st.txt
@@ -2,7 +2,7 @@ This file contains brief information about the SCSI tape driver.
 The driver is currently maintained by Kai Mäkisara (email
 Kai.Makisara@kolumbus.fi)
 
-Last modified: Sun Feb 24 21:59:07 2008 by kai.makisara
+Last modified: Sun Aug 29 18:25:47 2010 by kai.makisara
 
 
 BASICS
@@ -85,6 +85,17 @@ writing and the last operation has been a write. Two filemarks can be
 optionally written. In both cases end of data is signified by
 returning zero bytes for two consecutive reads.
 
+Writing filemarks without the immediate bit set in the SCSI command block acts
+as a synchronization point, i.e., all remaining data from the drive buffers is
+written to tape before the command returns. This makes sure that write errors
+are caught at that point, but this takes time. In some applications, several
+consecutive files must be written fast. The MTWEOFI operation can be used to
+write the filemarks without flushing the drive buffer. Writing a filemark at
+close() always flushes the drive buffers. However, if the previous
+operation is MTWEOFI, close() does not write a filemark. This can be used if
+the program wants to close/open the tape device between files and wants to
+skip waiting.
+
 If rewind, offline, bsf, or seek is done and previous tape operation was
 write, a filemark is written before moving tape.
 
@@ -301,6 +312,8 @@ MTBSR Space backward over count records.
 MTFSS   Space forward over count setmarks.
 MTBSS   Space backward over count setmarks.
 MTWEOF  Write count filemarks.
+MTWEOFI Write count filemarks with immediate bit set (i.e., does not
+        wait until data is on tape)
 MTWSM   Write count setmarks.
 MTREW   Rewind tape.
 MTOFFL  Set device off line (often rewind plus eject).
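
The MTWEOFI operation documented above is issued like the other tape
operations in that table, through the MTIOCTOP ioctl with a struct mtop. The
following is a minimal sketch under the assumption that the installed kernel
headers (<linux/mtio.h>) already define MTWEOFI, which is only the case for
kernels that include this change. The wrapper name write_filemarks_immediate
is hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/mtio.h>

/*
 * Hypothetical wrapper: write `count` filemarks with the immediate bit
 * set, i.e. without waiting for the drive buffer to reach the tape.
 * Returns the ioctl() result: 0 on success, -1 with errno set on error.
 */
int write_filemarks_immediate(int fd, int count)
{
    struct mtop op;

    op.mt_op = MTWEOFI;   /* immediate variant of MTWEOF */
    op.mt_count = count;  /* number of filemarks to write */

    return ioctl(fd, MTIOCTOP, &op);
}
```

Because the command returns before the data is on tape, write errors are not
caught at this point; per the text above, a plain MTWEOF remains the
synchronization point when that guarantee is needed.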
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 7f4dcebda9c6..d0eb696d32e8 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -300,6 +300,74 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
     control correctly. If you have problems regarding this, try
     another ALSA compliant mixer (alsamixer works).
 
+  Module snd-azt1605
+  ------------------
+
+    Module for Aztech Sound Galaxy soundcards based on the Aztech AZT1605
+    chipset.
+
+    port       - port # for BASE (0x220,0x240,0x260,0x280)
+    wss_port   - port # for WSS (0x530,0x604,0xe80,0xf40)
+    irq        - IRQ # for WSS (7,9,10,11)
+    dma1       - DMA # for WSS playback (0,1,3)
+    dma2       - DMA # for WSS capture (0,1), -1 = disabled (default)
+    mpu_port   - port # for MPU-401 UART (0x300,0x330), -1 = disabled (default)
+    mpu_irq    - IRQ # for MPU-401 UART (3,5,7,9), -1 = disabled (default)
+    fm_port    - port # for OPL3 (0x388), -1 = disabled (default)
+
+    This module supports multiple cards. It does not support autoprobe: port,
+    wss_port, irq and dma1 have to be specified. The other values are
+    optional.
+
+    "port" needs to match the BASE ADDRESS jumper on the card (0x220 or 0x240)
+    or the value stored in the card's EEPROM for cards that have an EEPROM and
+    their "CONFIG MODE" jumper set to "EEPROM SETTING". The other values can
+    be chosen freely from the options enumerated above.
+
+    If dma2 is specified and different from dma1, the card will operate in
+    full-duplex mode. When dma1=3, only dma2=0 is valid and the only way to
+    enable capture since only channels 0 and 1 are available for capture.
+
+    Generic settings are "port=0x220 wss_port=0x530 irq=10 dma1=1 dma2=0
+    mpu_port=0x330 mpu_irq=9 fm_port=0x388".
+
+    Whatever IRQ and DMA channels you pick, be sure to reserve them for
+    legacy ISA in your BIOS.
+
+  Module snd-azt2316
+  ------------------
+
+    Module for Aztech Sound Galaxy soundcards based on the Aztech AZT2316
+    chipset.
+
+    port       - port # for BASE (0x220,0x240,0x260,0x280)
+    wss_port   - port # for WSS (0x530,0x604,0xe80,0xf40)
+    irq        - IRQ # for WSS (7,9,10,11)
+    dma1       - DMA # for WSS playback (0,1,3)
+    dma2       - DMA # for WSS capture (0,1), -1 = disabled (default)
+    mpu_port   - port # for MPU-401 UART (0x300,0x330), -1 = disabled (default)
+    mpu_irq    - IRQ # for MPU-401 UART (5,7,9,10), -1 = disabled (default)
+    fm_port    - port # for OPL3 (0x388), -1 = disabled (default)
+
+    This module supports multiple cards. It does not support autoprobe: port,
+    wss_port, irq and dma1 have to be specified. The other values are
+    optional.
+
+    "port" needs to match the BASE ADDRESS jumper on the card (0x220 or 0x240)
+    or the value stored in the card's EEPROM for cards that have an EEPROM and
+    their "CONFIG MODE" jumper set to "EEPROM SETTING". The other values can
+    be chosen freely from the options enumerated above.
+
+    If dma2 is specified and different from dma1, the card will operate in
+    full-duplex mode. When dma1=3, only dma2=0 is valid and the only way to
+    enable capture since only channels 0 and 1 are available for capture.
+
+    Generic settings are "port=0x220 wss_port=0x530 irq=10 dma1=1 dma2=0
+    mpu_port=0x330 mpu_irq=9 fm_port=0x388".
+
+    Whatever IRQ and DMA channels you pick, be sure to reserve them for
+    legacy ISA in your BIOS.
+
   Module snd-aw2
   --------------
 
@@ -1641,20 +1709,6 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
 
     This card is also known as Audio Excel DSP 16 or Zoltrix AV302.
 
-  Module snd-sgalaxy
-  ------------------
-
-    Module for Aztech Sound Galaxy sound card.
-
-    sbport     - Port # for SB16 interface (0x220,0x240)
-    wssport    - Port # for WSS interface (0x530,0xe80,0xf40,0x604)
-    irq        - IRQ # (7,9,10,11)
-    dma1       - DMA #
-
-    This module supports multiple cards.
-
-    The power-management is supported.
-
   Module snd-sscape
   -----------------
 
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt
index 278cc2122ea0..c82beb007634 100644
--- a/Documentation/sound/alsa/HD-Audio.txt
+++ b/Documentation/sound/alsa/HD-Audio.txt
@@ -57,9 +57,11 @@ dead. However, this detection isn't perfect on some devices. In such
57a case, you can change the default method via `position_fix` option. 57a case, you can change the default method via `position_fix` option.
58 58
59`position_fix=1` means to use LPIB method explicitly. 59`position_fix=1` means to use LPIB method explicitly.
60`position_fix=2` means to use the position-buffer. 0 is the default 60`position_fix=2` means to use the position-buffer.
61value, the automatic check and fallback to LPIB as described in the 61`position_fix=3` means to use a combination of both methods, needed
62above. If you get a problem of repeated sounds, this option might 62for some VIA and ATI controllers. 0 is the default value for all other
63controllers, the automatic check and fallback to LPIB as described in
64the above. If you get a problem of repeated sounds, this option might
63help. 65help.
64 66
65In addition to that, every controller is known to be broken regarding 67In addition to that, every controller is known to be broken regarding
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index b606c2c4dd37..30289fab86eb 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -80,8 +80,10 @@ dirty_background_bytes
80Contains the amount of dirty memory at which the pdflush background writeback 80Contains the amount of dirty memory at which the pdflush background writeback
81daemon will start writeback. 81daemon will start writeback.
82 82
83If dirty_background_bytes is written, dirty_background_ratio becomes a function 83Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only
84of its value (dirty_background_bytes / the amount of dirtyable system memory). 84one of them may be specified at a time. When one sysctl is written it is
85immediately taken into account to evaluate the dirty memory limits and the
86other appears as 0 when read.
85 87
86============================================================== 88==============================================================
87 89
@@ -97,8 +99,10 @@ dirty_bytes
97Contains the amount of dirty memory at which a process generating disk writes 99Contains the amount of dirty memory at which a process generating disk writes
98will itself start writeback. 100will itself start writeback.
99 101
100If dirty_bytes is written, dirty_ratio becomes a function of its value 102Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
101(dirty_bytes / the amount of dirtyable system memory). 103specified at a time. When one sysctl is written it is immediately taken into
104account to evaluate the dirty memory limits and the other appears as 0 when
105read.
102 106
103Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any 107Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
104value lower than this limit will be ignored and the old configuration will be 108value lower than this limit will be ignored and the old configuration will be
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index 5c17196c8fe9..312e3754e8c5 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -75,7 +75,7 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
75 75
76'f' - Will call oom_kill to kill a memory hog process. 76'f' - Will call oom_kill to kill a memory hog process.
77 77
78'g' - Used by kgdb on ppc and sh platforms. 78'g' - Used by kgdb (kernel debugger)
79 79
80'h' - Will display help (actually any other key than those listed 80'h' - Will display help (actually any other key than those listed
81 here will display help. but 'h' is easy to remember :-) 81 here will display help. but 'h' is easy to remember :-)
@@ -110,12 +110,15 @@ On all - write a character to /proc/sysrq-trigger. e.g.:
110 110
111'u' - Will attempt to remount all mounted filesystems read-only. 111'u' - Will attempt to remount all mounted filesystems read-only.
112 112
113'v' - Dumps Voyager SMP processor info to your console. 113'v' - Forcefully restores framebuffer console
114'v' - Causes ETM buffer dump [ARM-specific]
114 115
115'w' - Dumps tasks that are in uninterruptible (blocked) state. 116'w' - Dumps tasks that are in uninterruptible (blocked) state.
116 117
117'x' - Used by xmon interface on ppc/powerpc platforms. 118'x' - Used by xmon interface on ppc/powerpc platforms.
118 119
120'y' - Show global CPU Registers [SPARC-64 specific]
121
119'z' - Dump the ftrace buffer 122'z' - Dump the ftrace buffer
120 123
121'0'-'9' - Sets the console log level, controlling which kernel messages 124'0'-'9' - Sets the console log level, controlling which kernel messages
diff --git a/Documentation/timers/hpet_example.c b/Documentation/timers/hpet_example.c
index 4bfafb7bc4c5..9a3e7012c190 100644
--- a/Documentation/timers/hpet_example.c
+++ b/Documentation/timers/hpet_example.c
@@ -97,6 +97,33 @@ hpet_open_close(int argc, const char **argv)
97void 97void
98hpet_info(int argc, const char **argv) 98hpet_info(int argc, const char **argv)
99{ 99{
100 struct hpet_info info;
101 int fd;
102
103 if (argc != 1) {
104 fprintf(stderr, "hpet_info: device-name\n");
105 return;
106 }
107
108 fd = open(argv[0], O_RDONLY);
109 if (fd < 0) {
110 fprintf(stderr, "hpet_info: open of %s failed\n", argv[0]);
111 return;
112 }
113
114 if (ioctl(fd, HPET_INFO, &info) < 0) {
115 fprintf(stderr, "hpet_info: failed to get info\n");
116 goto out;
117 }
118
119 fprintf(stderr, "hpet_info: hi_irqfreq 0x%lx hi_flags 0x%lx ",
120 info.hi_ireqfreq, info.hi_flags);
121 fprintf(stderr, "hi_hpet %d hi_timer %d\n",
122 info.hi_hpet, info.hi_timer);
123
124out:
125 close(fd);
126 return;
100} 127}
101 128
102void 129void
diff --git a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
index 1b55146d1c8d..b3e73ddb1567 100644
--- a/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-vmscan-postprocess.pl
@@ -46,7 +46,7 @@ use constant HIGH_KSWAPD_LATENCY => 20;
46use constant HIGH_KSWAPD_REWAKEUP => 21; 46use constant HIGH_KSWAPD_REWAKEUP => 21;
47use constant HIGH_NR_SCANNED => 22; 47use constant HIGH_NR_SCANNED => 22;
48use constant HIGH_NR_TAKEN => 23; 48use constant HIGH_NR_TAKEN => 23;
49use constant HIGH_NR_RECLAIM => 24; 49use constant HIGH_NR_RECLAIMED => 24;
50use constant HIGH_NR_CONTIG_DIRTY => 25; 50use constant HIGH_NR_CONTIG_DIRTY => 25;
51 51
52my %perprocesspid; 52my %perprocesspid;
@@ -58,11 +58,13 @@ my $opt_read_procstat;
58my $total_wakeup_kswapd; 58my $total_wakeup_kswapd;
59my ($total_direct_reclaim, $total_direct_nr_scanned); 59my ($total_direct_reclaim, $total_direct_nr_scanned);
60my ($total_direct_latency, $total_kswapd_latency); 60my ($total_direct_latency, $total_kswapd_latency);
61my ($total_direct_nr_reclaimed);
61my ($total_direct_writepage_file_sync, $total_direct_writepage_file_async); 62my ($total_direct_writepage_file_sync, $total_direct_writepage_file_async);
62my ($total_direct_writepage_anon_sync, $total_direct_writepage_anon_async); 63my ($total_direct_writepage_anon_sync, $total_direct_writepage_anon_async);
63my ($total_kswapd_nr_scanned, $total_kswapd_wake); 64my ($total_kswapd_nr_scanned, $total_kswapd_wake);
64my ($total_kswapd_writepage_file_sync, $total_kswapd_writepage_file_async); 65my ($total_kswapd_writepage_file_sync, $total_kswapd_writepage_file_async);
65my ($total_kswapd_writepage_anon_sync, $total_kswapd_writepage_anon_async); 66my ($total_kswapd_writepage_anon_sync, $total_kswapd_writepage_anon_async);
67my ($total_kswapd_nr_reclaimed);
66 68
67# Catch sigint and exit on request 69# Catch sigint and exit on request
68my $sigint_report = 0; 70my $sigint_report = 0;
@@ -104,7 +106,7 @@ my $regex_kswapd_wake_default = 'nid=([0-9]*) order=([0-9]*)';
104my $regex_kswapd_sleep_default = 'nid=([0-9]*)'; 106my $regex_kswapd_sleep_default = 'nid=([0-9]*)';
105my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*)'; 107my $regex_wakeup_kswapd_default = 'nid=([0-9]*) zid=([0-9]*) order=([0-9]*)';
106my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) contig_taken=([0-9]*) contig_dirty=([0-9]*) contig_failed=([0-9]*)'; 108my $regex_lru_isolate_default = 'isolate_mode=([0-9]*) order=([0-9]*) nr_requested=([0-9]*) nr_scanned=([0-9]*) nr_taken=([0-9]*) contig_taken=([0-9]*) contig_dirty=([0-9]*) contig_failed=([0-9]*)';
107my $regex_lru_shrink_inactive_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) priority=([0-9]*)'; 109my $regex_lru_shrink_inactive_default = 'nid=([0-9]*) zid=([0-9]*) nr_scanned=([0-9]*) nr_reclaimed=([0-9]*) priority=([0-9]*) flags=([A-Z_|]*)';
108my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)'; 110my $regex_lru_shrink_active_default = 'lru=([A-Z_]*) nr_scanned=([0-9]*) nr_rotated=([0-9]*) priority=([0-9]*)';
109my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)'; 111my $regex_writepage_default = 'page=([0-9a-f]*) pfn=([0-9]*) flags=([A-Z_|]*)';
110 112
@@ -203,8 +205,8 @@ $regex_lru_shrink_inactive = generate_traceevent_regex(
203 "vmscan/mm_vmscan_lru_shrink_inactive", 205 "vmscan/mm_vmscan_lru_shrink_inactive",
204 $regex_lru_shrink_inactive_default, 206 $regex_lru_shrink_inactive_default,
205 "nid", "zid", 207 "nid", "zid",
206 "lru", 208 "nr_scanned", "nr_reclaimed", "priority",
207 "nr_scanned", "nr_reclaimed", "priority"); 209 "flags");
208$regex_lru_shrink_active = generate_traceevent_regex( 210$regex_lru_shrink_active = generate_traceevent_regex(
209 "vmscan/mm_vmscan_lru_shrink_active", 211 "vmscan/mm_vmscan_lru_shrink_active",
210 $regex_lru_shrink_active_default, 212 $regex_lru_shrink_active_default,
@@ -375,6 +377,16 @@ EVENT_PROCESS:
375 my $nr_contig_dirty = $7; 377 my $nr_contig_dirty = $7;
376 $perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned; 378 $perprocesspid{$process_pid}->{HIGH_NR_SCANNED} += $nr_scanned;
377 $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty; 379 $perprocesspid{$process_pid}->{HIGH_NR_CONTIG_DIRTY} += $nr_contig_dirty;
380 } elsif ($tracepoint eq "mm_vmscan_lru_shrink_inactive") {
381 $details = $5;
382 if ($details !~ /$regex_lru_shrink_inactive/o) {
383 print "WARNING: Failed to parse mm_vmscan_lru_shrink_inactive as expected\n";
384 print " $details\n";
385 print " $regex_lru_shrink_inactive/o\n";
386 next;
387 }
388 my $nr_reclaimed = $4;
389 $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED} += $nr_reclaimed;
378 } elsif ($tracepoint eq "mm_vmscan_writepage") { 390 } elsif ($tracepoint eq "mm_vmscan_writepage") {
379 $details = $5; 391 $details = $5;
380 if ($details !~ /$regex_writepage/o) { 392 if ($details !~ /$regex_writepage/o) {
@@ -464,8 +476,8 @@ sub dump_stats {
464 476
465 # Print out process activity 477 # Print out process activity
466 printf("\n"); 478 printf("\n");
467 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s\n", "Process", "Direct", "Wokeup", "Pages", "Pages", "Pages", "Time"); 479 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s %8s\n", "Process", "Direct", "Wokeup", "Pages", "Pages", "Pages", "Pages", "Time");
468 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s\n", "details", "Rclms", "Kswapd", "Scanned", "Sync-IO", "ASync-IO", "Stalled"); 480 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s %8s %8s\n", "details", "Rclms", "Kswapd", "Scanned", "Rclmed", "Sync-IO", "ASync-IO", "Stalled");
469 foreach $process_pid (keys %stats) { 481 foreach $process_pid (keys %stats) {
470 482
471 if (!$stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}) { 483 if (!$stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}) {
@@ -475,6 +487,7 @@ sub dump_stats {
475 $total_direct_reclaim += $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}; 487 $total_direct_reclaim += $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN};
476 $total_wakeup_kswapd += $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}; 488 $total_wakeup_kswapd += $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD};
477 $total_direct_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED}; 489 $total_direct_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED};
490 $total_direct_nr_reclaimed += $stats{$process_pid}->{HIGH_NR_RECLAIMED};
478 $total_direct_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 491 $total_direct_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
479 $total_direct_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 492 $total_direct_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
480 $total_direct_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 493 $total_direct_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
@@ -489,11 +502,12 @@ sub dump_stats {
489 $index++; 502 $index++;
490 } 503 }
491 504
492 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8u %8.3f", 505 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8u %8u %8.3f",
493 $process_pid, 506 $process_pid,
494 $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN}, 507 $stats{$process_pid}->{MM_VMSCAN_DIRECT_RECLAIM_BEGIN},
495 $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}, 508 $stats{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD},
496 $stats{$process_pid}->{HIGH_NR_SCANNED}, 509 $stats{$process_pid}->{HIGH_NR_SCANNED},
510 $stats{$process_pid}->{HIGH_NR_RECLAIMED},
497 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}, 511 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC},
498 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}, 512 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC},
499 $this_reclaim_delay / 1000); 513 $this_reclaim_delay / 1000);
@@ -529,8 +543,8 @@ sub dump_stats {
529 543
530 # Print out kswapd activity 544 # Print out kswapd activity
531 printf("\n"); 545 printf("\n");
532 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Kswapd", "Kswapd", "Order", "Pages", "Pages", "Pages"); 546 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Kswapd", "Kswapd", "Order", "Pages", "Pages", "Pages", "Pages");
533 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Instance", "Wakeups", "Re-wakeup", "Scanned", "Sync-IO", "ASync-IO"); 547 printf("%-" . $max_strlen . "s %8s %10s %8s %8s %8s %8s\n", "Instance", "Wakeups", "Re-wakeup", "Scanned", "Rclmed", "Sync-IO", "ASync-IO");
534 foreach $process_pid (keys %stats) { 548 foreach $process_pid (keys %stats) {
535 549
536 if (!$stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}) { 550 if (!$stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}) {
@@ -539,16 +553,18 @@ sub dump_stats {
539 553
540 $total_kswapd_wake += $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}; 554 $total_kswapd_wake += $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE};
541 $total_kswapd_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED}; 555 $total_kswapd_nr_scanned += $stats{$process_pid}->{HIGH_NR_SCANNED};
556 $total_kswapd_nr_reclaimed += $stats{$process_pid}->{HIGH_NR_RECLAIMED};
542 $total_kswapd_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 557 $total_kswapd_writepage_file_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
543 $total_kswapd_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 558 $total_kswapd_writepage_anon_sync += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
544 $total_kswapd_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 559 $total_kswapd_writepage_file_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
545 $total_kswapd_writepage_anon_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}; 560 $total_kswapd_writepage_anon_async += $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC};
546 561
547 printf("%-" . $max_strlen . "s %8d %10d %8u %8i %8u", 562 printf("%-" . $max_strlen . "s %8d %10d %8u %8u %8i %8u",
548 $process_pid, 563 $process_pid,
549 $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE}, 564 $stats{$process_pid}->{MM_VMSCAN_KSWAPD_WAKE},
550 $stats{$process_pid}->{HIGH_KSWAPD_REWAKEUP}, 565 $stats{$process_pid}->{HIGH_KSWAPD_REWAKEUP},
551 $stats{$process_pid}->{HIGH_NR_SCANNED}, 566 $stats{$process_pid}->{HIGH_NR_SCANNED},
567 $stats{$process_pid}->{HIGH_NR_RECLAIMED},
552 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}, 568 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC},
553 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC}); 569 $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} + $stats{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_ASYNC});
554 570
@@ -579,6 +595,7 @@ sub dump_stats {
579 print "\nSummary\n"; 595 print "\nSummary\n";
580 print "Direct reclaims: $total_direct_reclaim\n"; 596 print "Direct reclaims: $total_direct_reclaim\n";
581 print "Direct reclaim pages scanned: $total_direct_nr_scanned\n"; 597 print "Direct reclaim pages scanned: $total_direct_nr_scanned\n";
598 print "Direct reclaim pages reclaimed: $total_direct_nr_reclaimed\n";
582 print "Direct reclaim write file sync I/O: $total_direct_writepage_file_sync\n"; 599 print "Direct reclaim write file sync I/O: $total_direct_writepage_file_sync\n";
583 print "Direct reclaim write anon sync I/O: $total_direct_writepage_anon_sync\n"; 600 print "Direct reclaim write anon sync I/O: $total_direct_writepage_anon_sync\n";
584 print "Direct reclaim write file async I/O: $total_direct_writepage_file_async\n"; 601 print "Direct reclaim write file async I/O: $total_direct_writepage_file_async\n";
@@ -588,6 +605,7 @@ sub dump_stats {
588 print "\n"; 605 print "\n";
589 print "Kswapd wakeups: $total_kswapd_wake\n"; 606 print "Kswapd wakeups: $total_kswapd_wake\n";
590 print "Kswapd pages scanned: $total_kswapd_nr_scanned\n"; 607 print "Kswapd pages scanned: $total_kswapd_nr_scanned\n";
608 print "Kswapd pages reclaimed: $total_kswapd_nr_reclaimed\n";
591 print "Kswapd reclaim write file sync I/O: $total_kswapd_writepage_file_sync\n"; 609 print "Kswapd reclaim write file sync I/O: $total_kswapd_writepage_file_sync\n";
592 print "Kswapd reclaim write anon sync I/O: $total_kswapd_writepage_anon_sync\n"; 610 print "Kswapd reclaim write anon sync I/O: $total_kswapd_writepage_anon_sync\n";
593 print "Kswapd reclaim write file async I/O: $total_kswapd_writepage_file_async\n"; 611 print "Kswapd reclaim write file async I/O: $total_kswapd_writepage_file_async\n";
@@ -612,6 +630,7 @@ sub aggregate_perprocesspid() {
612 $perprocess{$process}->{MM_VMSCAN_WAKEUP_KSWAPD} += $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD}; 630 $perprocess{$process}->{MM_VMSCAN_WAKEUP_KSWAPD} += $perprocesspid{$process_pid}->{MM_VMSCAN_WAKEUP_KSWAPD};
613 $perprocess{$process}->{HIGH_KSWAPD_REWAKEUP} += $perprocesspid{$process_pid}->{HIGH_KSWAPD_REWAKEUP}; 631 $perprocess{$process}->{HIGH_KSWAPD_REWAKEUP} += $perprocesspid{$process_pid}->{HIGH_KSWAPD_REWAKEUP};
614 $perprocess{$process}->{HIGH_NR_SCANNED} += $perprocesspid{$process_pid}->{HIGH_NR_SCANNED}; 632 $perprocess{$process}->{HIGH_NR_SCANNED} += $perprocesspid{$process_pid}->{HIGH_NR_SCANNED};
633 $perprocess{$process}->{HIGH_NR_RECLAIMED} += $perprocesspid{$process_pid}->{HIGH_NR_RECLAIMED};
615 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC}; 634 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_SYNC};
616 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC}; 635 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_ANON_SYNC};
617 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC}; 636 $perprocess{$process}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC} += $perprocesspid{$process_pid}->{MM_VMSCAN_WRITEPAGE_FILE_ASYNC};
diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt
index fafcd4723260..afe596d5f201 100644
--- a/Documentation/usb/proc_usb_info.txt
+++ b/Documentation/usb/proc_usb_info.txt
@@ -1,12 +1,17 @@
1/proc/bus/usb filesystem output 1/proc/bus/usb filesystem output
2=============================== 2===============================
3(version 2003.05.30) 3(version 2010.09.13)
4 4
5 5
6The usbfs filesystem for USB devices is traditionally mounted at 6The usbfs filesystem for USB devices is traditionally mounted at
7/proc/bus/usb. It provides the /proc/bus/usb/devices file, as well as 7/proc/bus/usb. It provides the /proc/bus/usb/devices file, as well as
8the /proc/bus/usb/BBB/DDD files. 8the /proc/bus/usb/BBB/DDD files.
9 9
10In many modern systems the usbfs filesystem isn't used at all. Instead
11USB device nodes are created under /dev/usb/ or someplace similar. The
12"devices" file is available in debugfs, typically as
13/sys/kernel/debug/usb/devices.
14
10 15
11**NOTE**: If /proc/bus/usb appears empty, and a host controller 16**NOTE**: If /proc/bus/usb appears empty, and a host controller
12 driver has been linked, then you need to mount the 17 driver has been linked, then you need to mount the
@@ -106,8 +111,8 @@ Legend:
106 111
107Topology info: 112Topology info:
108 113
109T: Bus=dd Lev=dd Prnt=dd Port=dd Cnt=dd Dev#=ddd Spd=ddd MxCh=dd 114T: Bus=dd Lev=dd Prnt=dd Port=dd Cnt=dd Dev#=ddd Spd=dddd MxCh=dd
110| | | | | | | | |__MaxChildren 115| | | | | | | | |__MaxChildren
111| | | | | | | |__Device Speed in Mbps 116| | | | | | | |__Device Speed in Mbps
112| | | | | | |__DeviceNumber 117| | | | | | |__DeviceNumber
113| | | | | |__Count of devices at this level 118| | | | | |__Count of devices at this level
@@ -120,8 +125,13 @@ T: Bus=dd Lev=dd Prnt=dd Port=dd Cnt=dd Dev#=ddd Spd=ddd MxCh=dd
120 Speed may be: 125 Speed may be:
121 1.5 Mbit/s for low speed USB 126 1.5 Mbit/s for low speed USB
122 12 Mbit/s for full speed USB 127 12 Mbit/s for full speed USB
123 480 Mbit/s for high speed USB (added for USB 2.0) 128 480 Mbit/s for high speed USB (added for USB 2.0);
129 also used for Wireless USB, which has no fixed speed
130 5000 Mbit/s for SuperSpeed USB (added for USB 3.0)
124 131
132 For reasons lost in the mists of time, the Port number is always
133 too low by 1. For example, a device plugged into port 4 will
134 show up with "Port=03".
125 135
126Bandwidth info: 136Bandwidth info:
127B: Alloc=ddd/ddd us (xx%), #Int=ddd, #Iso=ddd 137B: Alloc=ddd/ddd us (xx%), #Int=ddd, #Iso=ddd
@@ -291,7 +301,7 @@ Here's an example, from a system which has a UHCI root hub,
291an external hub connected to the root hub, and a mouse and 301an external hub connected to the root hub, and a mouse and
292a serial converter connected to the external hub. 302a serial converter connected to the external hub.
293 303
294T: Bus=00 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2 304T: Bus=00 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
295B: Alloc= 28/900 us ( 3%), #Int= 2, #Iso= 0 305B: Alloc= 28/900 us ( 3%), #Int= 2, #Iso= 0
296D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 306D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
297P: Vendor=0000 ProdID=0000 Rev= 0.00 307P: Vendor=0000 ProdID=0000 Rev= 0.00
@@ -301,21 +311,21 @@ C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr= 0mA
301I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub 311I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
302E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=255ms 312E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=255ms
303 313
304T: Bus=00 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 4 314T: Bus=00 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 4
305D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 315D: Ver= 1.00 Cls=09(hub ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
306P: Vendor=0451 ProdID=1446 Rev= 1.00 316P: Vendor=0451 ProdID=1446 Rev= 1.00
307C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA 317C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
308I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub 318I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
309E: Ad=81(I) Atr=03(Int.) MxPS= 1 Ivl=255ms 319E: Ad=81(I) Atr=03(Int.) MxPS= 1 Ivl=255ms
310 320
311T: Bus=00 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=1.5 MxCh= 0 321T: Bus=00 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=1.5 MxCh= 0
312D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 322D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
313P: Vendor=04b4 ProdID=0001 Rev= 0.00 323P: Vendor=04b4 ProdID=0001 Rev= 0.00
314C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA 324C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=100mA
315I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=mouse 325I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=mouse
316E: Ad=81(I) Atr=03(Int.) MxPS= 3 Ivl= 10ms 326E: Ad=81(I) Atr=03(Int.) MxPS= 3 Ivl= 10ms
317 327
318T: Bus=00 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#= 4 Spd=12 MxCh= 0 328T: Bus=00 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#= 4 Spd=12 MxCh= 0
319D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1 329D: Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
320P: Vendor=0565 ProdID=0001 Rev= 1.08 330P: Vendor=0565 ProdID=0001 Rev= 1.08
321S: Manufacturer=Peracom Networks, Inc. 331S: Manufacturer=Peracom Networks, Inc.
@@ -330,12 +340,12 @@ E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl= 8ms
330Selecting only the "T:" and "I:" lines from this (for example, by using 340Selecting only the "T:" and "I:" lines from this (for example, by using
331"procusb ti"), we have: 341"procusb ti"), we have:
332 342
333T: Bus=00 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2 343T: Bus=00 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
334T: Bus=00 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 4 344T: Bus=00 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 4
335I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub 345I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
336T: Bus=00 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=1.5 MxCh= 0 346T: Bus=00 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=1.5 MxCh= 0
337I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=mouse 347I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=mouse
338T: Bus=00 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#= 4 Spd=12 MxCh= 0 348T: Bus=00 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#= 4 Spd=12 MxCh= 0
339I: If#= 0 Alt= 0 #EPs= 3 Cls=00(>ifc ) Sub=00 Prot=00 Driver=serial 349I: If#= 0 Alt= 0 #EPs= 3 Cls=00(>ifc ) Sub=00 Prot=00 Driver=serial
340 350
341 351
diff --git a/Documentation/vm/highmem.txt b/Documentation/vm/highmem.txt
new file mode 100644
index 000000000000..4324d24ffacd
--- /dev/null
+++ b/Documentation/vm/highmem.txt
@@ -0,0 +1,162 @@
1
2 ====================
3 HIGH MEMORY HANDLING
4 ====================
5
6By: Peter Zijlstra <a.p.zijlstra@chello.nl>
7
8Contents:
9
10 (*) What is high memory?
11
12 (*) Temporary virtual mappings.
13
14 (*) Using kmap_atomic.
15
16 (*) Cost of temporary mappings.
17
18 (*) i386 PAE.
19
20
21====================
22WHAT IS HIGH MEMORY?
23====================
24
25High memory (highmem) is used when the size of physical memory approaches or
26exceeds the maximum size of virtual memory. At that point it becomes
27impossible for the kernel to keep all of the available physical memory mapped
28at all times. This means the kernel needs to start using temporary mappings of
29the pieces of physical memory that it wants to access.
30
31The part of (physical) memory not covered by a permanent mapping is what we
32refer to as 'highmem'. There are various architecture dependent constraints on
33where exactly that border lies.
34
35In the i386 arch, for example, we choose to map the kernel into every process's
36VM space so that we don't have to pay the full TLB invalidation costs for
37kernel entry/exit. This means the available virtual memory space (4GiB on
38i386) has to be divided between user and kernel space.
39
40The traditional split for architectures using this approach is 3:1, 3GiB for
41userspace and the top 1GiB for kernel space:
42
43 +--------+ 0xffffffff
44 | Kernel |
45 +--------+ 0xc0000000
46 | |
47 | User |
48 | |
49 +--------+ 0x00000000
50
51This means that the kernel can at most map 1GiB of physical memory at any one
52time, but because we need virtual address space for other things - including
53temporary maps to access the rest of the physical memory - the actual direct
54map will typically be less (usually around ~896MiB).
55
56Other architectures that have mm context tagged TLBs can have separate kernel
57and user maps. Some hardware (like some ARMs), however, have limited virtual
58space when they use mm context tags.
59
60
61==========================
62TEMPORARY VIRTUAL MAPPINGS
63==========================
64
65The kernel contains several ways of creating temporary mappings:
66
67 (*) vmap(). This can be used to make a long duration mapping of multiple
68 physical pages into a contiguous virtual space. It needs global
69 synchronization to unmap.
70
71 (*) kmap(). This permits a short duration mapping of a single page. It needs
72 global synchronization, but is amortized somewhat. It is also prone to
73 deadlocks when using in a nested fashion, and so it is not recommended for
74 new code.
75
76 (*) kmap_atomic(). This permits a very short duration mapping of a single
77 page. Since the mapping is restricted to the CPU that issued it, it
78 performs well, but the issuing task is therefore required to stay on that
79 CPU until it has finished, lest some other task displace its mappings.
80
81 kmap_atomic() may also be used by interrupt contexts, since it is does not
82 sleep and the caller may not sleep until after kunmap_atomic() is called.
83
84 It may be assumed that k[un]map_atomic() won't fail.
85
86
87=================
88USING KMAP_ATOMIC
89=================
90
91When and where to use kmap_atomic() is straightforward. It is used when code
92wants to access the contents of a page that might be allocated from high memory
93(see __GFP_HIGHMEM), for example a page in the pagecache. The API has two
94functions, and they can be used in a manner similar to the following:
95
96 /* Find the page of interest. */
97 struct page *page = find_get_page(mapping, offset);
98
99 /* Gain access to the contents of that page. */
100 void *vaddr = kmap_atomic(page);
101
102 /* Do something to the contents of that page. */
103 memset(vaddr, 0, PAGE_SIZE);
104
105 /* Unmap that page. */
106 kunmap_atomic(vaddr);
107
108Note that the kunmap_atomic() call takes the result of the kmap_atomic() call,
109not its argument.
110
111If you need to map two pages because you want to copy from one page to
112another, you need to keep the kmap_atomic calls strictly nested, like:
113
114 vaddr1 = kmap_atomic(page1);
115 vaddr2 = kmap_atomic(page2);
116
117 memcpy(vaddr1, vaddr2, PAGE_SIZE);
118
119 kunmap_atomic(vaddr2);
120 kunmap_atomic(vaddr1);
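
Why the nesting has to be strict can be illustrated with a toy userspace
model. This is only a sketch under the assumption that atomic mappings are
handed out from a small per-CPU stack of slots; toy_map(), toy_unmap() and
NR_SLOTS are hypothetical names for illustration, not kernel interfaces:

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4			/* hypothetical per-CPU slot count */

static int slot_depth;			/* number of slots currently in use */
static const void *slot_page[NR_SLOTS];

/* Toy stand-in for kmap_atomic(): take the next free slot on the stack. */
static int toy_map(const void *page)
{
	assert(slot_depth < NR_SLOTS);
	slot_page[slot_depth] = page;
	return slot_depth++;
}

/*
 * Toy stand-in for kunmap_atomic(): only the most recently taken slot may
 * be released.  Returns 0 on success, -1 if the nesting order was broken.
 */
static int toy_unmap(int slot)
{
	if (slot != slot_depth - 1)
		return -1;
	slot_page[--slot_depth] = NULL;
	return 0;
}
```

In this model, mapping two pages and then releasing the first slot before the
second makes toy_unmap() return -1, whereas releasing the second slot and then
the first succeeds, which mirrors the unmap order used in the example above.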
121
122
123==========================
124COST OF TEMPORARY MAPPINGS
125==========================
126
127The cost of creating temporary mappings can be quite high. The arch has to
128manipulate the kernel's page tables, the data TLB and/or the MMU's registers.
129
130If CONFIG_HIGHMEM is not set, then the kernel will try to create a mapping
131simply with a bit of arithmetic that will convert the page struct address into
132a pointer to the page contents rather than juggling mappings about. In such a
133case, the unmap operation may be a null operation.
134
135If CONFIG_MMU is not set, then there can be no temporary mappings and no
136highmem. In such a case, the arithmetic approach will also be used.
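
The "bit of arithmetic" can be sketched in userspace as follows. This is a
simplified model, not the kernel's code: it assumes a flat array of page
structs starting at page frame 0, 4KiB pages, and a linear mapping that begins
at 0xC0000000. The names toy_page, toy_mem_map and toy_page_address() are
made up for illustration:

```c
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumed 4KiB pages */
#define PAGE_OFFSET	UINT64_C(0xC0000000)	/* assumed start of linear map */

struct toy_page { unsigned long flags; };	/* stand-in for struct page */

static struct toy_page toy_mem_map[16];	/* one entry per page frame */

/*
 * The index of the page struct in the array is the page frame number;
 * shifting it gives the physical address, and adding PAGE_OFFSET gives the
 * address in the permanent linear mapping.  No page tables are touched.
 */
static uint64_t toy_page_address(const struct toy_page *page)
{
	uint64_t pfn = (uint64_t)(page - toy_mem_map);

	return PAGE_OFFSET + (pfn << PAGE_SHIFT);
}
```

Under these assumptions the fourth page struct, &toy_mem_map[3], converts to
0xC0003000, and the matching "unmap" has nothing to undo.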
137
138
139========
140i386 PAE
141========
142
143The i386 arch, under some circumstances, will permit you to stick up to 64GiB
144of RAM into your 32-bit machine. This has a number of consequences:
145
146 (*) Linux needs a page-frame structure for each page in the system and the
147 pageframes need to live in the permanent mapping, which means:
148
149 (*) you can have 896M/sizeof(struct page) page-frames at most; with struct
150 page being 32 bytes, that would end up being something in the order of 112G
151 worth of pages; the kernel, however, needs to store more than just
152 page-frames in that memory...
153
154 (*) PAE makes your page tables larger - which slows the system down as more
155 data has to be accessed to traverse them during TLB fills and the like. One
156 advantage is that PAE has more PTE bits and can provide advanced features
157 like NX and PAT.
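
The page-frame arithmetic above can be checked with a short userspace sketch.
It takes the 896M permanent mapping and the 32-byte struct page from the text,
and additionally assumes 4KiB pages; the helper names are made up for
illustration:

```c
#include <stdint.h>

#define MiB	(UINT64_C(1) << 20)
#define GiB	(UINT64_C(1) << 30)

#define LOWMEM_BYTES		(896 * MiB)	/* permanent mapping, from the text */
#define STRUCT_PAGE_SIZE	32		/* bytes, as assumed in the text */
#define PAGE_SIZE_BYTES		4096		/* assumed 4KiB pages */

/* Upper bound on RAM if the permanent mapping held nothing but page structs. */
static uint64_t max_ram_bytes(void)
{
	return LOWMEM_BYTES / STRUCT_PAGE_SIZE * PAGE_SIZE_BYTES;
}

/* Permanent-mapping space taken by the page structs describing ram_bytes. */
static uint64_t page_struct_bytes(uint64_t ram_bytes)
{
	return ram_bytes / PAGE_SIZE_BYTES * STRUCT_PAGE_SIZE;
}
```

max_ram_bytes() evaluates to 112GiB, matching the figure above. With the same
numbers, page_struct_bytes(64 * GiB) is 512MiB, more than half of the 896M
permanent mapping, while page_struct_bytes(8 * GiB) is only 64MiB.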
158
159The general recommendation is that you don't use more than 8GiB on a 32-bit
160machine - although more might work for you and your workload, you're pretty
161much on your own - don't expect kernel developers to really care much if things
162come apart.
diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index 6690fc34ef6d..4e7da6543424 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -424,7 +424,7 @@ a command line tool, numactl(8), exists that allows one to:
 
 + set the shared policy for a shared memory segment via mbind(2)
 
-The numactl(8) tool is packages with the run-time version of the library
+The numactl(8) tool is packaged with the run-time version of the library
 containing the memory policy system call wrappers. Some distributions
 package the headers and compile-time libraries in a separate development
 package.
diff --git a/Documentation/workqueue.txt b/Documentation/workqueue.txt
index e4498a2872c3..996a27d9b8db 100644
--- a/Documentation/workqueue.txt
+++ b/Documentation/workqueue.txt
@@ -196,11 +196,11 @@ resources, scheduled and executed.
  suspend operations. Work items on the wq are drained and no
  new work item starts execution until thawed.
 
- WQ_RESCUER
+ WQ_MEM_RECLAIM
 
  All wq which might be used in the memory reclaim paths _MUST_
- have this flag set. This reserves one worker exclusively for
- the execution of this wq under memory pressure.
+ have this flag set. The wq is guaranteed to have at least one
+ execution context regardless of memory pressure.
 
  WQ_HIGHPRI
 
@@ -356,11 +356,11 @@ If q1 has WQ_CPU_INTENSIVE set,
 
 6. Guidelines
 
-* Do not forget to use WQ_RESCUER if a wq may process work items which
- are used during memory reclaim. Each wq with WQ_RESCUER set has one
- rescuer thread reserved for it. If there is dependency among
- multiple work items used during memory reclaim, they should be
- queued to separate wq each with WQ_RESCUER.
+* Do not forget to use WQ_MEM_RECLAIM if a wq may process work items
+ which are used during memory reclaim. Each wq with WQ_MEM_RECLAIM
+ set has an execution context reserved for it. If there is
+ dependency among multiple work items used during memory reclaim,
+ they should be queued to separate wq each with WQ_MEM_RECLAIM.
 
 * Unless strict ordering is required, there is no need to use ST wq.
 
@@ -368,12 +368,13 @@ If q1 has WQ_CPU_INTENSIVE set,
  recommended. In most use cases, concurrency level usually stays
  well under the default limit.
 
-* A wq serves as a domain for forward progress guarantee (WQ_RESCUER),
- flush and work item attributes. Work items which are not involved
- in memory reclaim and don't need to be flushed as a part of a group
- of work items, and don't require any special attribute, can use one
- of the system wq. There is no difference in execution
- characteristics between using a dedicated wq and a system wq.
+* A wq serves as a domain for forward progress guarantee
+ (WQ_MEM_RECLAIM, flush and work item attributes. Work items which
+ are not involved in memory reclaim and don't need to be flushed as a
+ part of a group of work items, and don't require any special
+ attribute, can use one of the system wq. There is no difference in
+ execution characteristics between using a dedicated wq and a system
+ wq.
 
 * Unless work items are expected to consume a huge amount of CPU
  cycles, using a bound wq is usually beneficial due to the increased
diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/x86_64/kernel-stacks
index 5ad65d51fb95..a01eec5d1d0b 100644
--- a/Documentation/x86/x86_64/kernel-stacks
+++ b/Documentation/x86/x86_64/kernel-stacks
@@ -18,9 +18,9 @@ specialized stacks contain no useful data. The main CPU stacks are:
  Used for external hardware interrupts. If this is the first external
  hardware interrupt (i.e. not a nested hardware interrupt) then the
  kernel switches from the current task to the interrupt stack. Like
- the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS),
- this gives more room for kernel interrupt processing without having
- to increase the size of every per thread stack.
+ the split thread and interrupt stacks on i386, this gives more room
+ for kernel interrupt processing without having to increase the size
+ of every per thread stack.
 
  The interrupt stack is also used when processing a softirq.
 