Diffstat (limited to 'Documentation')
46 files changed, 8814 insertions, 0 deletions
diff --git a/Documentation/ABI/obsolete/proc-pid-oom_adj b/Documentation/ABI/obsolete/proc-pid-oom_adj new file mode 100644 index 00000000000..9a3cb88ade4 --- /dev/null +++ b/Documentation/ABI/obsolete/proc-pid-oom_adj | |||
@@ -0,0 +1,22 @@ | |||
1 | What: /proc/<pid>/oom_adj | ||
2 | When: August 2012 | ||
3 | Why: /proc/<pid>/oom_adj allows userspace to influence the oom killer's | ||
4 | badness heuristic used to determine which task to kill when the kernel | ||
5 | is out of memory. | ||
6 | |||
7 | The badness heuristic has been rewritten since the introduction of | ||
8 | this tunable such that its meaning is deprecated. The value was | ||
9 | implemented as a bitshift on a score generated by the badness() | ||
10 | function that did not have any precise units of measure. With the | ||
11 | rewrite, the score is given as a proportion of available memory to the | ||
12 | task allocating pages, so using a bitshift which grows the score | ||
13 | exponentially is, thus, impossible to tune with fine granularity. | ||
14 | |||
15 | A much more powerful interface, /proc/<pid>/oom_score_adj, was | ||
16 | introduced with the oom killer rewrite that allows users to increase or | ||
17 | decrease the badness score linearly. This interface will replace | ||
18 | /proc/<pid>/oom_adj. | ||
19 | |||
20 | A warning will be emitted to the kernel log if an application uses this | ||
21 | deprecated interface. After it is printed once, future warnings will be | ||
22 | suppressed until the kernel is rebooted. | ||
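To make the bitshift-versus-linear distinction concrete, here is a minimal sketch that converts a legacy oom_adj value into an approximate oom_score_adj value and writes it for a pid. It assumes the usual ranges (-17..15 for oom_adj, -1000..1000 for oom_score_adj) and a linear scaling factor of 1000/17, none of which are stated in this file. The helper names and the PROC_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and PROC_ROOT are illustrative, not kernel-provided.
PROC_ROOT=${PROC_ROOT:-/proc}

# Scale a legacy oom_adj value (assumed range -17..15) to the linear
# oom_score_adj range (assumed -1000..1000). Assumption: 15 maps to the
# maximum and every other value is scaled by 1000/17 with truncation.
oom_adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        echo 1000
    else
        echo $(( $1 * 1000 / 17 ))
    fi
}

# Write an oom_score_adj value ($2) for a pid ($1).
set_oom_score_adj() {
    printf '%s\n' "$2" > "$PROC_ROOT/$1/oom_score_adj"
}
```

A caller migrating from the old interface might run `set_oom_score_adj "$pid" "$(oom_adj_to_score_adj 8)"` instead of writing 8 to oom_adj.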
diff --git a/Documentation/ABI/testing/sysfs-devices-node b/Documentation/ABI/testing/sysfs-devices-node new file mode 100644 index 00000000000..453a210c3ce --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-node | |||
@@ -0,0 +1,7 @@ | |||
1 | What: /sys/devices/system/node/nodeX/compact | ||
2 | Date: February 2010 | ||
3 | Contact: Mel Gorman <mel@csn.ul.ie> | ||
4 | Description: | ||
5 | When this file is written to, all memory within that node | ||
6 | will be compacted. When it completes, memory will be freed | ||
7 | into blocks which have as many contiguous pages as possible. | ||
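A minimal sketch of triggering compaction for one node from a script. The description above only says the file must be written to, so writing the value 1 is an assumption. SYSFS_ROOT and compact_node are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT and compact_node are illustrative names.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Trigger compaction of one NUMA node by writing to its "compact" file.
# $1 is the node number (the X in nodeX). The written value (1) is an
# assumption; the ABI text only requires a write.
compact_node() {
    f="$SYSFS_ROOT/devices/system/node/node$1/compact"
    if [ ! -w "$f" ]; then
        echo "cannot write $f" 1>&2
        return 1
    fi
    echo 1 > "$f"
}
```

For example, `compact_node 0` would request compaction of node0, provided the file exists and the caller may write to it.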
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache new file mode 100644 index 00000000000..662ae646ea1 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache | |||
@@ -0,0 +1,11 @@ | |||
1 | What: /sys/kernel/mm/cleancache/ | ||
2 | Date: April 2011 | ||
3 | Contact: Dan Magenheimer <dan.magenheimer@oracle.com> | ||
4 | Description: | ||
5 | /sys/kernel/mm/cleancache/ contains a number of files which | ||
6 | record a count of various cleancache operations | ||
7 | (sum across all filesystems): | ||
8 | succ_gets | ||
9 | failed_gets | ||
10 | puts | ||
11 | flushes | ||
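The four counters listed above could be dumped with a small loop such as the following sketch. It assumes each file holds a single value. CLEANCACHE_DIR and print_cleancache_stats are hypothetical names.

```shell
#!/bin/sh
# Sketch only: CLEANCACHE_DIR and print_cleancache_stats are illustrative.
CLEANCACHE_DIR=${CLEANCACHE_DIR:-/sys/kernel/mm/cleancache}

# Print "name value" for each documented counter, or "name unavailable"
# when the file cannot be read (for example, cleancache not enabled).
print_cleancache_stats() {
    for name in succ_gets failed_gets puts flushes; do
        if [ -r "$CLEANCACHE_DIR/$name" ]; then
            echo "$name $(cat "$CLEANCACHE_DIR/$name")"
        else
            echo "$name unavailable"
        fi
    done
}
```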
diff --git a/Documentation/ABI/testing/sysfs-wacom b/Documentation/ABI/testing/sysfs-wacom new file mode 100644 index 00000000000..1517976e25c --- /dev/null +++ b/Documentation/ABI/testing/sysfs-wacom | |||
@@ -0,0 +1,10 @@ | |||
1 | What: /sys/class/hidraw/hidraw*/device/speed | ||
2 | Date: April 2010 | ||
3 | Kernel Version: 2.6.35 | ||
4 | Contact: linux-bluetooth@vger.kernel.org | ||
5 | Description: | ||
6 | The /sys/class/hidraw/hidraw*/device/speed file controls the | ||
7 | reporting speed of a Wacom Bluetooth tablet. Reading from | ||
8 | this file returns 1 if the tablet reports in high speed mode | ||
9 | or 0 otherwise. Writing one of these values to this file | ||
10 | switches reporting speed. | ||
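A short sketch of reading and switching the reporting speed from a script. The helper names are hypothetical, and the caller is assumed to pass the hidraw directory of the tablet (for example /sys/class/hidraw/hidraw0).

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.
# $1 is the hidraw device directory, e.g. /sys/class/hidraw/hidraw0.

# Print the current mode: 1 for high speed, 0 otherwise.
get_wacom_speed() {
    cat "$1/device/speed"
}

# Switch the mode; only the two documented values are accepted.
set_wacom_speed() {
    case "$2" in
        0|1) echo "$2" > "$1/device/speed" ;;
        *)   echo "speed must be 0 or 1" 1>&2; return 1 ;;
    esac
}
```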
diff --git a/Documentation/DocBook/mcabook.tmpl b/Documentation/DocBook/mcabook.tmpl new file mode 100644 index 00000000000..467ccac6ec5 --- /dev/null +++ b/Documentation/DocBook/mcabook.tmpl | |||
@@ -0,0 +1,107 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="MCAGuide"> | ||
6 | <bookinfo> | ||
7 | <title>MCA Driver Programming Interface</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Alan</firstname> | ||
12 | <surname>Cox</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>alan@lxorguk.ukuu.org.uk</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | <author> | ||
20 | <firstname>David</firstname> | ||
21 | <surname>Weinehall</surname> | ||
22 | </author> | ||
23 | <author> | ||
24 | <firstname>Chris</firstname> | ||
25 | <surname>Beauregard</surname> | ||
26 | </author> | ||
27 | </authorgroup> | ||
28 | |||
29 | <copyright> | ||
30 | <year>2000</year> | ||
31 | <holder>Alan Cox</holder> | ||
32 | <holder>David Weinehall</holder> | ||
33 | <holder>Chris Beauregard</holder> | ||
34 | </copyright> | ||
35 | |||
36 | <legalnotice> | ||
37 | <para> | ||
38 | This documentation is free software; you can redistribute | ||
39 | it and/or modify it under the terms of the GNU General Public | ||
40 | License as published by the Free Software Foundation; either | ||
41 | version 2 of the License, or (at your option) any later | ||
42 | version. | ||
43 | </para> | ||
44 | |||
45 | <para> | ||
46 | This program is distributed in the hope that it will be | ||
47 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
48 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
49 | See the GNU General Public License for more details. | ||
50 | </para> | ||
51 | |||
52 | <para> | ||
53 | You should have received a copy of the GNU General Public | ||
54 | License along with this program; if not, write to the Free | ||
55 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
56 | MA 02111-1307 USA | ||
57 | </para> | ||
58 | |||
59 | <para> | ||
60 | For more details see the file COPYING in the source | ||
61 | distribution of Linux. | ||
62 | </para> | ||
63 | </legalnotice> | ||
64 | </bookinfo> | ||
65 | |||
66 | <toc></toc> | ||
67 | |||
68 | <chapter id="intro"> | ||
69 | <title>Introduction</title> | ||
70 | <para> | ||
71 | The MCA bus functions provide a generalised interface to find MCA | ||
72 | bus cards, to claim them for a driver, and to read and manipulate POS | ||
73 | registers without being aware of the motherboard internals or | ||
74 | certain deep magic specific to onboard devices. | ||
75 | </para> | ||
76 | <para> | ||
77 | The basic interface to the MCA bus devices is the slot. Each slot | ||
78 | is numbered and virtual slot numbers are assigned to the internal | ||
79 | devices. Using a pci_dev as other busses do does not really make | ||
80 | sense in the MCA context as the MCA bus resources require card | ||
81 | specific interpretation. | ||
82 | </para> | ||
83 | <para> | ||
84 | Finally the MCA bus functions provide a parallel set of DMA | ||
85 | functions mimicking the ISA bus DMA functions as closely as possible, | ||
86 | although also supporting the additional DMA functionality on the | ||
87 | MCA bus controllers. | ||
88 | </para> | ||
89 | </chapter> | ||
90 | <chapter id="bugs"> | ||
91 | <title>Known Bugs And Assumptions</title> | ||
92 | <para> | ||
93 | None. | ||
94 | </para> | ||
95 | </chapter> | ||
96 | |||
97 | <chapter id="pubfunctions"> | ||
98 | <title>Public Functions Provided</title> | ||
99 | !Edrivers/mca/mca-legacy.c | ||
100 | </chapter> | ||
101 | |||
102 | <chapter id="dmafunctions"> | ||
103 | <title>DMA Functions Provided</title> | ||
104 | !Iarch/x86/include/asm/mca_dma.h | ||
105 | </chapter> | ||
106 | |||
107 | </book> | ||
diff --git a/Documentation/android.txt b/Documentation/android.txt new file mode 100644 index 00000000000..72a62afdf20 --- /dev/null +++ b/Documentation/android.txt | |||
@@ -0,0 +1,121 @@ | |||
1 | ============= | ||
2 | A N D R O I D | ||
3 | ============= | ||
4 | |||
5 | Copyright (C) 2009 Google, Inc. | ||
6 | Written by Mike Chan <mike@android.com> | ||
7 | |||
8 | CONTENTS: | ||
9 | --------- | ||
10 | |||
11 | 1. Android | ||
12 | 1.1 Required enabled config options | ||
13 | 1.2 Required disabled config options | ||
14 | 1.3 Recommended enabled config options | ||
15 | 2. Contact | ||
16 | |||
17 | |||
18 | 1. Android | ||
19 | ========== | ||
20 | |||
21 | Android (www.android.com) is an open source operating system for mobile devices. | ||
22 | This document describes configurations needed to run the Android framework on | ||
23 | top of the Linux kernel. | ||
24 | |||
25 | To see a working defconfig look at msm_defconfig or goldfish_defconfig | ||
26 | which can be found at http://android.git.kernel.org in kernel/common.git | ||
27 | and kernel/msm.git | ||
28 | |||
29 | |||
30 | 1.1 Required enabled config options | ||
31 | ----------------------------------- | ||
32 | After building a standard defconfig, ensure that these options are enabled in | ||
33 | your .config or defconfig if they are not already. The list is based on msm_defconfig. | ||
34 | You should keep the rest of the default options enabled in the defconfig | ||
35 | unless you know what you are doing. | ||
36 | |||
37 | ANDROID_PARANOID_NETWORK | ||
38 | ASHMEM | ||
39 | CONFIG_FB_MODE_HELPERS | ||
40 | CONFIG_FONT_8x16 | ||
41 | CONFIG_FONT_8x8 | ||
42 | CONFIG_YAFFS_SHORT_NAMES_IN_RAM | ||
43 | DAB | ||
44 | EARLYSUSPEND | ||
45 | FB | ||
46 | FB_CFB_COPYAREA | ||
47 | FB_CFB_FILLRECT | ||
48 | FB_CFB_IMAGEBLIT | ||
49 | FB_DEFERRED_IO | ||
50 | FB_TILEBLITTING | ||
51 | HIGH_RES_TIMERS | ||
52 | INOTIFY | ||
53 | INOTIFY_USER | ||
54 | INPUT_EVDEV | ||
55 | INPUT_GPIO | ||
56 | INPUT_MISC | ||
57 | LEDS_CLASS | ||
58 | LEDS_GPIO | ||
59 | LOCK_KERNEL | ||
60 | LOGGER | ||
61 | LOW_MEMORY_KILLER | ||
62 | MISC_DEVICES | ||
63 | NEW_LEDS | ||
64 | NO_HZ | ||
65 | POWER_SUPPLY | ||
66 | PREEMPT | ||
67 | RAMFS | ||
68 | RTC_CLASS | ||
69 | RTC_LIB | ||
70 | SWITCH | ||
71 | SWITCH_GPIO | ||
72 | TMPFS | ||
73 | UID_STAT | ||
74 | UID16 | ||
75 | USB_FUNCTION | ||
76 | USB_FUNCTION_ADB | ||
77 | USER_WAKELOCK | ||
78 | VIDEO_OUTPUT_CONTROL | ||
79 | WAKELOCK | ||
80 | YAFFS_AUTO_YAFFS2 | ||
81 | YAFFS_FS | ||
82 | YAFFS_YAFFS1 | ||
83 | YAFFS_YAFFS2 | ||
84 | |||
85 | |||
86 | 1.2 Required disabled config options | ||
87 | ------------------------------------ | ||
88 | CONFIG_YAFFS_DISABLE_LAZY_LOAD | ||
89 | DNOTIFY | ||
90 | |||
91 | |||
92 | 1.3 Recommended enabled config options | ||
93 | -------------------------------------- | ||
94 | ANDROID_PMEM | ||
95 | ANDROID_RAM_CONSOLE | ||
96 | ANDROID_RAM_CONSOLE_ERROR_CORRECTION | ||
97 | SCHEDSTATS | ||
98 | DEBUG_PREEMPT | ||
99 | DEBUG_MUTEXES | ||
100 | DEBUG_SPINLOCK_SLEEP | ||
101 | DEBUG_INFO | ||
102 | FRAME_POINTER | ||
103 | CPU_FREQ | ||
104 | CPU_FREQ_TABLE | ||
105 | CPU_FREQ_DEFAULT_GOV_ONDEMAND | ||
106 | CPU_FREQ_GOV_ONDEMAND | ||
107 | CRC_CCITT | ||
108 | EMBEDDED | ||
109 | INPUT_TOUCHSCREEN | ||
110 | I2C | ||
111 | I2C_BOARDINFO | ||
112 | LOG_BUF_SHIFT=17 | ||
113 | SERIAL_CORE | ||
114 | SERIAL_CORE_CONSOLE | ||
115 | |||
116 | |||
117 | 2. Contact | ||
118 | ========== | ||
119 | website: http://android.git.kernel.org | ||
120 | |||
121 | mailing-lists: android-kernel@googlegroups.com | ||
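The option lists in sections 1.1 and 1.3 can be checked against a kernel configuration with a sketch like the one below. It assumes "enabled" means built in (=y), which the document does not state, and it adds the CONFIG_ prefix where the lists omit it. missing_options is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: missing_options is an illustrative helper.
# Print every listed option that is not set to "y" in the config file.
# $1 is the .config path; the remaining arguments are option names, with or
# without the CONFIG_ prefix (the lists in android.txt mix both forms).
# Assumption: modular (=m) options are reported as missing.
missing_options() {
    config=$1
    shift
    for opt in "$@"; do
        case "$opt" in
            CONFIG_*) ;;
            *) opt="CONFIG_$opt" ;;
        esac
        grep -q "^$opt=y\$" "$config" || echo "$opt"
    done
}
```

For instance, `missing_options .config ASHMEM WAKELOCK LOW_MEMORY_KILLER` prints one line per option that still needs to be enabled and prints nothing when all of them are set.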
diff --git a/Documentation/aoe/mkdevs.sh b/Documentation/aoe/mkdevs.sh new file mode 100644 index 00000000000..44c0ab70243 --- /dev/null +++ b/Documentation/aoe/mkdevs.sh | |||
@@ -0,0 +1,41 @@ | |||
1 | #!/bin/sh | ||
2 | |||
3 | n_shelves=${n_shelves:-10} | ||
4 | n_partitions=${n_partitions:-16} | ||
5 | |||
6 | if test "$#" != "1"; then | ||
7 | echo "Usage: sh `basename $0` {dir}" 1>&2 | ||
8 | echo " n_partitions=16 sh `basename $0` {dir}" 1>&2 | ||
9 | exit 1 | ||
10 | fi | ||
11 | dir=$1 | ||
12 | |||
13 | MAJOR=152 | ||
14 | |||
15 | echo "Creating AoE devnode files in $dir ..." | ||
16 | |||
17 | set -e | ||
18 | |||
19 | mkdir -p $dir | ||
20 | |||
21 | # (Status info is in sysfs. See status.sh.) | ||
22 | # rm -f $dir/stat | ||
23 | # mknod -m 0400 $dir/stat c $MAJOR 1 | ||
24 | rm -f $dir/err | ||
25 | mknod -m 0400 $dir/err c $MAJOR 2 | ||
26 | rm -f $dir/discover | ||
27 | mknod -m 0200 $dir/discover c $MAJOR 3 | ||
28 | rm -f $dir/interfaces | ||
29 | mknod -m 0200 $dir/interfaces c $MAJOR 4 | ||
30 | rm -f $dir/revalidate | ||
31 | mknod -m 0200 $dir/revalidate c $MAJOR 5 | ||
32 | rm -f $dir/flush | ||
33 | mknod -m 0200 $dir/flush c $MAJOR 6 | ||
34 | |||
35 | export n_partitions | ||
36 | mkshelf=`echo $0 | sed 's!mkdevs!mkshelf!'` | ||
37 | i=0 | ||
38 | while test $i -lt $n_shelves; do | ||
39 | sh -xc "sh $mkshelf $dir $i" | ||
40 | i=`expr $i + 1` | ||
41 | done | ||
diff --git a/Documentation/aoe/mkshelf.sh b/Documentation/aoe/mkshelf.sh new file mode 100644 index 00000000000..32615814271 --- /dev/null +++ b/Documentation/aoe/mkshelf.sh | |||
@@ -0,0 +1,28 @@ | |||
1 | #! /bin/sh | ||
2 | |||
3 | if test "$#" != "2"; then | ||
4 | echo "Usage: sh `basename $0` {dir} {shelfaddress}" 1>&2 | ||
5 | echo " n_partitions=16 sh `basename $0` {dir} {shelfaddress}" 1>&2 | ||
6 | exit 1 | ||
7 | fi | ||
8 | n_partitions=${n_partitions:-16} | ||
9 | dir=$1 | ||
10 | shelf=$2 | ||
11 | nslots=16 | ||
12 | maxslot=`echo $nslots 1 - p | dc` | ||
13 | MAJOR=152 | ||
14 | |||
15 | set -e | ||
16 | |||
17 | minor=`echo $nslots \* $shelf \* $n_partitions | bc` | ||
18 | endp=`echo $n_partitions - 1 | bc` | ||
19 | for slot in `seq 0 $maxslot`; do | ||
20 | for part in `seq 0 $endp`; do | ||
21 | name=e$shelf.$slot | ||
22 | test "$part" != "0" && name=${name}p$part | ||
23 | rm -f $dir/$name | ||
24 | mknod -m 0660 $dir/$name b $MAJOR $minor | ||
25 | |||
26 | minor=`expr $minor + 1` | ||
27 | done | ||
28 | done | ||
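The nested loops in mkshelf.sh assign minor numbers sequentially, so the minor for any device follows directly from its shelf, slot and partition. The sketch below computes it without iterating. The function names are hypothetical; the constants (16 slots per shelf, a default of 16 partitions) are taken from the script.

```shell
#!/bin/sh
# Sketch only: aoe_minor and aoe_name are illustrative helpers that mirror
# the numbering produced by the loops in mkshelf.sh.
NSLOTS=16

# Minor number for shelf $1, slot $2, partition $3.
aoe_minor() {
    np=${n_partitions:-16}
    echo $(( ($1 * NSLOTS + $2) * np + $3 ))
}

# Device node name: e<shelf>.<slot>, with p<part> appended when part != 0.
aoe_name() {
    name=e$1.$2
    test "$3" != "0" && name=${name}p$3
    echo "$name"
}
```

With the defaults, shelf 1, slot 0, partition 0 starts at minor 16 * 1 * 16 = 256, which matches the starting value mkshelf.sh computes for that shelf.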
diff --git a/Documentation/arm/IXP2000 b/Documentation/arm/IXP2000 new file mode 100644 index 00000000000..68d21d92a30 --- /dev/null +++ b/Documentation/arm/IXP2000 | |||
@@ -0,0 +1,69 @@ | |||
1 | |||
2 | ------------------------------------------------------------------------- | ||
3 | Release Notes for Linux on Intel's IXP2000 Network Processor | ||
4 | |||
5 | Maintained by Deepak Saxena <dsaxena@plexity.net> | ||
6 | ------------------------------------------------------------------------- | ||
7 | |||
8 | 1. Overview | ||
9 | |||
10 | Intel's IXP2000 family of NPUs (IXP2400, IXP2800, IXP2850) is designed | ||
11 | for high-performance network applications such as high-availability | ||
12 | telecom systems. In addition to an XScale core, it contains up to 8 | ||
13 | "MicroEngines" that run special code, several high-end networking | ||
14 | interfaces (UTOPIA, SPI, etc), a PCI host bridge, one serial port, | ||
15 | flash interface, and some other odds and ends. For more information, see: | ||
16 | |||
17 | http://developer.intel.com | ||
18 | |||
19 | 2. Linux Support | ||
20 | |||
21 | Linux currently supports the following features on the IXP2000 NPUs: | ||
22 | |||
23 | - On-chip serial | ||
24 | - PCI | ||
25 | - Flash (MTD/JFFS2) | ||
26 | - I2C through GPIO | ||
27 | - Timers (watchdog, OS) | ||
28 | |||
29 | That is about all we can support under Linux ATM b/c the core networking | ||
30 | components of the chip are accessed via Intel's closed source SDK. | ||
31 | Please contact Intel directly on issues with using those. There is | ||
32 | also a mailing list run by some folks at Princeton University that might | ||
33 | be of help: https://lists.cs.princeton.edu/mailman/listinfo/ixp2xxx | ||
34 | |||
35 | WHATEVER YOU DO, DO NOT POST EMAIL TO THE LINUX-ARM OR LINUX-ARM-KERNEL | ||
36 | MAILING LISTS REGARDING THE INTEL SDK. | ||
37 | |||
38 | 3. Supported Platforms | ||
39 | |||
40 | - Intel IXDP2400 Reference Platform | ||
41 | - Intel IXDP2800 Reference Platform | ||
42 | - Intel IXDP2401 Reference Platform | ||
43 | - Intel IXDP2801 Reference Platform | ||
44 | - RadiSys ENP-2611 | ||
45 | |||
46 | 4. Usage Notes | ||
47 | |||
48 | - The IXP2000 platforms usually have rather complex PCI bus topologies | ||
49 | with large memory space requirements. In addition, b/c of the way the | ||
50 | Intel SDK is designed, devices are enumerated in a very specific | ||
51 | way. B/c of this, we use the "pci=firmware" option in the kernel | ||
52 | command line so that we do not re-enumerate the bus. | ||
53 | |||
54 | - IXDP2x01 systems have variable clock tick rates that we cannot determine | ||
55 | via HW registers. The "ixdp2x01_clk=XXX" cmd line options allow you | ||
56 | to pass the clock rate to the board port. | ||
57 | |||
58 | 5. Thanks | ||
59 | |||
60 | The IXP2000 work has been funded by Intel Corp. and MontaVista Software, Inc. | ||
61 | |||
62 | The following people have contributed patches/comments/etc: | ||
63 | |||
64 | Naeem F. Afzal | ||
65 | Lennert Buytenhek | ||
66 | Jeffrey Daly | ||
67 | |||
68 | ------------------------------------------------------------------------- | ||
69 | Last Update: 8/09/2004 | ||
diff --git a/Documentation/arm/nvidia/tegra_parameters.txt b/Documentation/arm/nvidia/tegra_parameters.txt new file mode 100644 index 00000000000..4c73fe7269f --- /dev/null +++ b/Documentation/arm/nvidia/tegra_parameters.txt | |||
@@ -0,0 +1,169 @@ | |||
1 | This file documents NVIDIA Tegra specific sysfs and debugfs files and | ||
2 | kernel module parameters. | ||
3 | |||
4 | /sys/power/suspend/mode | ||
5 | ----------------------- | ||
6 | |||
7 | Used to select the LP1 or LP0 power state during system suspend. | ||
8 | # echo lp0 > /sys/kernel/debug/suspend_mode | ||
9 | # echo lp1 > /sys/kernel/debug/suspend_mode | ||
10 | |||
11 | /sys/module/cpuidle/parameters/lp2_in_idle | ||
12 | ------------------------------------------ | ||
13 | |||
14 | Used to enable/disable LP2 in idle. | ||
15 | # echo 1 > /sys/module/cpuidle/parameters/lp2_in_idle | ||
16 | # echo 0 > /sys/module/cpuidle/parameters/lp2_in_idle | ||
17 | |||
18 | /sys/kernel/debug/cpuidle/lp2 | ||
19 | ----------------------------- | ||
20 | |||
21 | Contains LP2 statistics. | ||
22 | # cat /sys/kernel/debug/cpuidle/lp2 | ||
23 | |||
24 | /sys/kernel/debug/powergate | ||
25 | --------------------------- | ||
26 | |||
27 | Contains power gating state of different tegra blocks. | ||
28 | |||
29 | # cat /sys/kernel/debug/powergate | ||
30 | |||
31 | /sys/module/cpu_tegra3/parameters/auto_hotplug | ||
32 | ---------------------------------------------- | ||
33 | |||
34 | Used to control auto hotplug governor | ||
35 | # echo 0 >/sys/module/cpu_tegra3/parameters/auto_hotplug | ||
36 | # echo 1 >/sys/module/cpu_tegra3/parameters/auto_hotplug | ||
37 | # cat /sys/module/cpu_tegra3/parameters/auto_hotplug | ||
38 | 0: disabled | ||
39 | 1: idle | ||
40 | 2: down | ||
41 | 3: up | ||
42 | |||
43 | /sys/module/cpu_tegra3/parameters/no_lp | ||
44 | --------------------------------------- | ||
45 | |||
46 | Used to enable/disable shadow cluster. | ||
47 | # echo 0 >/sys/module/cpu_tegra3/parameters/no_lp | ||
48 | # echo 1 >/sys/module/cpu_tegra3/parameters/no_lp | ||
49 | |||
50 | /sys/module/cpu_tegra3/parameters/idle_bottom_freq | ||
51 | -------------------------------------------------- | ||
52 | |||
53 | Shadow cluster maximum frequency. | ||
54 | |||
55 | /sys/module/cpu_tegra3/parameters/idle_top_freq | ||
56 | ----------------------------------------------- | ||
57 | |||
58 | Main cluster minimum frequency. | ||
59 | |||
60 | /sys/module/cpu_tegra3/parameters/down_delay | ||
61 | --------------------------------------------- | ||
62 | |||
63 | Auto hotplug delay (in jiffies) for reducing cores. | ||
64 | |||
65 | /sys/module/cpu_tegra3/parameters/up2g0_delay | ||
66 | --------------------------------------------- | ||
67 | |||
68 | Delay (in jiffies) for switching to the main cluster. | ||
69 | |||
70 | /sys/module/cpu_tegra3/parameters/up2gn_delay | ||
71 | --------------------------------------------- | ||
72 | |||
73 | Delay (in jiffies) for bringing additional cores online in main | ||
74 | cluster. | ||
75 | |||
76 | /sys/module/cpu_tegra3/parameters/balance_level | ||
77 | ----------------------------------------------- | ||
78 | |||
79 | Percentage of max speed considered to be in balance. Half of balanced | ||
80 | speed is considered skewed. Speed balance states: | ||
81 | * balanced: freq targets for all CPUs are above 50% of highest speed | ||
82 | * biased: freq target for at least one CPU is below 50% threshold | ||
83 | * skewed: freq targets for at least 2 CPUs are below 25% threshold | ||
84 | Speed balance state and hotplug state dictate auto hotplug behavior. | ||
85 | |||
86 | /sys/module/cpu_tegra3/parameters/mp_overhead | ||
87 | --------------------------------------------- | ||
88 | |||
89 | Multi-core overhead percentage for EDP limit calculation. | ||
90 | |||
91 | /sys/kernel/debug/tegra_hotplug/stats | ||
92 | ------------------------------------- | ||
93 | |||
94 | Contains hotplug statistics. | ||
95 | |||
96 | /sys/kernel/cluster/active | ||
97 | -------------------------- | ||
98 | |||
99 | Controls active CPU cluster: main (G) or shadow (LP). | ||
100 | For manual control, disable auto hotplug, enable immediate switch and | ||
101 | possibly force switch to happen always: | ||
102 | # echo 0 > /sys/module/cpu_tegra3/parameters/auto_hotplug | ||
103 | # echo 1 > /sys/kernel/cluster/immediate | ||
104 | # echo 1 > /sys/kernel/cluster/force | ||
105 | |||
106 | Cluster switching can happen only when only core 0 is online. | ||
107 | |||
108 | Active cluster can be set or toggled: | ||
109 | # echo "G" > /sys/kernel/cluster/active | ||
110 | # echo "LP" > /sys/kernel/cluster/active | ||
111 | # echo "toggle" > /sys/kernel/cluster/active | ||
112 | |||
113 | /sys/module/tegra3_clocks/parameters/detach_shared_bus | ||
114 | ------------------------------------------------------ | ||
115 | |||
116 | Enable/disable shared bus clock update. | ||
117 | |||
118 | /sys/module/tegra3_emc/parameters/emc_enable | ||
119 | -------------------------------------------- | ||
120 | |||
121 | Enable/disable EMC DFS. | ||
122 | |||
123 | /sys/kernel/debug/tegra_emc/stats | ||
124 | --------------------------------- | ||
125 | |||
126 | Contains EMC clock statistics. | ||
127 | |||
128 | /sys/module/tegra3_dvfs/parameters/disable_cpu | ||
129 | ---------------------------------------------- | ||
130 | |||
131 | Enable/disable DVFS for CPU domain. | ||
132 | |||
133 | /sys/module/tegra3_dvfs/parameters/disable_core | ||
134 | ----------------------------------------------- | ||
135 | |||
136 | Enable/disable DVFS for CORE domain. | ||
137 | |||
138 | /sys/kernel/debug/clock/emc/rate | ||
139 | -------------------------------- | ||
140 | |||
141 | Get/set EMC clock rate. | ||
142 | |||
143 | /sys/kernel/debug/clock/<module>/rate | ||
144 | ------------------------------------- | ||
145 | |||
146 | /sys/kernel/debug/clock/<module>/parent | ||
147 | --------------------------------------- | ||
148 | |||
149 | /sys/kernel/debug/clock/<module>/state | ||
150 | -------------------------------------- | ||
151 | |||
152 | /sys/kernel/debug/clock/<module>/time_on | ||
153 | ---------------------------------------- | ||
154 | |||
155 | /sys/kernel/debug/clock/clock_tree | ||
156 | ---------------------------------- | ||
157 | |||
158 | Shows the state of the clock tree. | ||
159 | |||
160 | /sys/kernel/debug/clock/dvfs | ||
161 | ---------------------------- | ||
162 | |||
163 | Contains voltage state. | ||
164 | |||
165 | /sys/kernel/debug/tegra_actmon/avp/state | ||
166 | ---------------------------------------- | ||
167 | |||
168 | /sys/kernel/debug/clock/mon.avp/rate | ||
169 | ------------------------------------ | ||
diff --git a/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt new file mode 100644 index 00000000000..eb4b530d64e --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt | |||
@@ -0,0 +1,8 @@ | |||
1 | NVIDIA Tegra 2 GPIO controller | ||
2 | |||
3 | Required properties: | ||
4 | - compatible : "nvidia,tegra20-gpio" | ||
5 | - #gpio-cells : Should be two. The first cell is the pin number and the | ||
6 | second cell is used to specify optional parameters: | ||
7 | - bit 0 specifies polarity (0 for normal, 1 for inverted) | ||
8 | - gpio-controller : Marks the device node as a GPIO controller. | ||
diff --git a/Documentation/devicetree/bindings/gpio/led.txt b/Documentation/devicetree/bindings/gpio/led.txt new file mode 100644 index 00000000000..064db928c3c --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/led.txt | |||
@@ -0,0 +1,58 @@ | |||
1 | LEDs connected to GPIO lines | ||
2 | |||
3 | Required properties: | ||
4 | - compatible : should be "gpio-leds". | ||
5 | |||
6 | Each LED is represented as a sub-node of the gpio-leds device. Each | ||
7 | node's name represents the name of the corresponding LED. | ||
8 | |||
9 | LED sub-node properties: | ||
10 | - gpios : Should specify the LED's GPIO, see "Specifying GPIO information | ||
11 | for devices" in Documentation/powerpc/booting-without-of.txt. Active | ||
12 | low LEDs should be indicated using flags in the GPIO specifier. | ||
13 | - label : (optional) The label for this LED. If omitted, the label is | ||
14 | taken from the node name (excluding the unit address). | ||
15 | - linux,default-trigger : (optional) This parameter, if present, is a | ||
16 | string defining the trigger assigned to the LED. Current triggers are: | ||
17 | "backlight" - LED will act as a back-light, controlled by the framebuffer | ||
18 | system | ||
19 | "default-on" - LED will turn on, but see "default-state" below | ||
20 | "heartbeat" - LED "double" flashes at a load average based rate | ||
21 | "ide-disk" - LED indicates disk activity | ||
22 | "timer" - LED flashes at a fixed, configurable rate | ||
23 | - default-state: (optional) The initial state of the LED. Valid | ||
24 | values are "on", "off", and "keep". If the LED is already on or off | ||
25 | and the default-state property is set to the same value, then no | ||
26 | glitch should be produced where the LED momentarily turns off (or | ||
27 | on). The "keep" setting will keep the LED at whatever its current | ||
28 | state is, without producing a glitch. The default is off if this | ||
29 | property is not present. | ||
30 | |||
31 | Examples: | ||
32 | |||
33 | leds { | ||
34 | compatible = "gpio-leds"; | ||
35 | hdd { | ||
36 | label = "IDE Activity"; | ||
37 | gpios = <&mcu_pio 0 1>; /* Active low */ | ||
38 | linux,default-trigger = "ide-disk"; | ||
39 | }; | ||
40 | |||
41 | fault { | ||
42 | gpios = <&mcu_pio 1 0>; | ||
43 | /* Keep LED on if BIOS detected hardware fault */ | ||
44 | default-state = "keep"; | ||
45 | }; | ||
46 | }; | ||
47 | |||
48 | run-control { | ||
49 | compatible = "gpio-leds"; | ||
50 | red { | ||
51 | gpios = <&mpc8572 6 0>; | ||
52 | default-state = "off"; | ||
53 | }; | ||
54 | green { | ||
55 | gpios = <&mpc8572 7 0>; | ||
56 | default-state = "on"; | ||
57 | }; | ||
58 | }; | ||
diff --git a/Documentation/devicetree/bindings/i2c/arm-versatile.txt b/Documentation/devicetree/bindings/i2c/arm-versatile.txt new file mode 100644 index 00000000000..361d31c51b6 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/arm-versatile.txt | |||
@@ -0,0 +1,10 @@ | |||
1 | i2c Controller on ARM Versatile platform: | ||
2 | |||
3 | Required properties: | ||
4 | - compatible : Must be "arm,versatile-i2c"; | ||
5 | - reg | ||
6 | - #address-cells = <1>; | ||
7 | - #size-cells = <0>; | ||
8 | |||
9 | Optional properties: | ||
10 | - Child nodes conforming to i2c bus binding | ||
diff --git a/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt new file mode 100644 index 00000000000..569b1624851 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt | |||
@@ -0,0 +1,93 @@ | |||
1 | CE4100 I2C | ||
2 | ---------- | ||
3 | |||
4 | CE4100 has one PCI device which is described as the I2C-Controller. This | ||
5 | PCI device has three PCI-bars, each bar contains a complete I2C | ||
6 | controller. So we have a total of three independent I2C-Controllers | ||
7 | which share only an interrupt line. | ||
8 | The driver is probed via the PCI-ID and gathers information about | ||
9 | attached devices from the device tree. | ||
10 | Grant Likely recommended using the ranges property to map the PCI-Bar | ||
11 | number to its physical address and to use this to find the child nodes | ||
12 | of the specific I2C controller. These were his exact words: | ||
13 | |||
14 | Here's where the magic happens. Each entry in | ||
15 | ranges describes how the parent pci address space | ||
16 | (middle group of 3) is translated to the local | ||
17 | address space (first group of 2) and the size of | ||
18 | each range (last cell). In this particular case, | ||
19 | the first cell of the local address is chosen to be | ||
20 | 1:1 mapped to the BARs, and the second is the | ||
21 | offset from the base of the BAR (which would be | ||
22 | non-zero if you had 2 or more devices mapped off | ||
23 | the same BAR) | ||
24 | |||
25 | ranges allows the address mapping to be described | ||
26 | in a way that the OS can interpret without | ||
27 | requiring custom device driver code. | ||
28 | |||
29 | This is an example which is used on FalconFalls: | ||
30 | ------------------------------------------------ | ||
31 | i2c-controller@b,2 { | ||
32 | #address-cells = <2>; | ||
33 | #size-cells = <1>; | ||
34 | compatible = "pci8086,2e68.2", | ||
35 | "pci8086,2e68", | ||
36 | "pciclass,ff0000", | ||
37 | "pciclass,ff00"; | ||
38 | |||
39 | reg = <0x15a00 0x0 0x0 0x0 0x0>; | ||
40 | interrupts = <16 1>; | ||
41 | |||
42 | /* as described by Grant, the first number in the group of | ||
43 | * three is the bar number followed by the 64bit bar address | ||
44 | * followed by the size of the mapping. The bar address | ||
45 | * also requires a valid translation in the parent's ranges | ||
46 | * property. | ||
47 | */ | ||
48 | ranges = <0 0 0x02000000 0 0xdffe0500 0x100 | ||
49 | 1 0 0x02000000 0 0xdffe0600 0x100 | ||
50 | 2 0 0x02000000 0 0xdffe0700 0x100>; | ||
51 | |||
52 | i2c@0 { | ||
53 | #address-cells = <1>; | ||
54 | #size-cells = <0>; | ||
55 | compatible = "intel,ce4100-i2c-controller"; | ||
56 | |||
57 | /* The first number in the reg property is the | ||
58 | * number of the bar | ||
59 | */ | ||
60 | reg = <0 0 0x100>; | ||
61 | |||
62 | /* This I2C controller has no devices */ | ||
63 | }; | ||
64 | |||
65 | i2c@1 { | ||
66 | #address-cells = <1>; | ||
67 | #size-cells = <0>; | ||
68 | compatible = "intel,ce4100-i2c-controller"; | ||
69 | reg = <1 0 0x100>; | ||
70 | |||
71 | /* This I2C controller has one gpio controller */ | ||
72 | gpio@26 { | ||
73 | #gpio-cells = <2>; | ||
74 | compatible = "ti,pcf8575"; | ||
75 | reg = <0x26>; | ||
76 | gpio-controller; | ||
77 | }; | ||
78 | }; | ||
79 | |||
80 | i2c@2 { | ||
81 | #address-cells = <1>; | ||
82 | #size-cells = <0>; | ||
83 | compatible = "intel,ce4100-i2c-controller"; | ||
84 | reg = <2 0 0x100>; | ||
85 | |||
86 | gpio@26 { | ||
87 | #gpio-cells = <2>; | ||
88 | compatible = "ti,pcf8575"; | ||
89 | reg = <0x26>; | ||
90 | gpio-controller; | ||
91 | }; | ||
92 | }; | ||
93 | }; | ||
diff --git a/Documentation/devicetree/bindings/i2c/fsl-i2c.txt b/Documentation/devicetree/bindings/i2c/fsl-i2c.txt new file mode 100644 index 00000000000..1eacd6b20ed --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/fsl-i2c.txt | |||
@@ -0,0 +1,64 @@ | |||
1 | * I2C | ||
2 | |||
3 | Required properties : | ||
4 | |||
5 | - reg : Offset and length of the register set for the device | ||
6 | - compatible : should be "fsl,CHIP-i2c" where CHIP is the name of a | ||
7 | compatible processor, e.g. mpc8313, mpc8543, mpc8544, mpc5121, | ||
8 | mpc5200 or mpc5200b. For the mpc5121, an additional node | ||
9 | "fsl,mpc5121-i2c-ctrl" is required as shown in the example below. | ||
10 | |||
11 | Recommended properties : | ||
12 | |||
13 | - interrupts : <a b> where a is the interrupt number and b is a | ||
14 | field that represents an encoding of the sense and level | ||
15 | information for the interrupt. This should be encoded based on | ||
16 | the information in section 2) depending on the type of interrupt | ||
17 | controller you have. | ||
18 | - interrupt-parent : the phandle for the interrupt controller that | ||
19 | services interrupts for this device. | ||
20 | - fsl,preserve-clocking : boolean; if defined, the clock settings | ||
21 | from the bootloader are preserved (not touched). | ||
22 | - clock-frequency : desired I2C bus clock frequency in Hz. | ||
23 | - fsl,timeout : I2C bus timeout in microseconds. | ||
24 | |||
25 | Examples : | ||
26 | |||
27 | /* MPC5121 based board */ | ||
28 | i2c@1740 { | ||
29 | #address-cells = <1>; | ||
30 | #size-cells = <0>; | ||
31 | compatible = "fsl,mpc5121-i2c", "fsl-i2c"; | ||
32 | reg = <0x1740 0x20>; | ||
33 | interrupts = <11 0x8>; | ||
34 | interrupt-parent = <&ipic>; | ||
35 | clock-frequency = <100000>; | ||
36 | }; | ||
37 | |||
38 | i2ccontrol@1760 { | ||
39 | compatible = "fsl,mpc5121-i2c-ctrl"; | ||
40 | reg = <0x1760 0x8>; | ||
41 | }; | ||
42 | |||
43 | /* MPC5200B based board */ | ||
44 | i2c@3d00 { | ||
45 | #address-cells = <1>; | ||
46 | #size-cells = <0>; | ||
47 | compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c"; | ||
48 | reg = <0x3d00 0x40>; | ||
49 | interrupts = <2 15 0>; | ||
50 | interrupt-parent = <&mpc5200_pic>; | ||
51 | fsl,preserve-clocking; | ||
52 | }; | ||
53 | |||
54 | /* MPC8544 based board */ | ||
55 | i2c@3100 { | ||
56 | #address-cells = <1>; | ||
57 | #size-cells = <0>; | ||
58 | compatible = "fsl,mpc8544-i2c", "fsl-i2c"; | ||
59 | reg = <0x3100 0x100>; | ||
60 | interrupts = <43 2>; | ||
61 | interrupt-parent = <&mpic>; | ||
62 | clock-frequency = <400000>; | ||
63 | fsl,timeout = <10000>; | ||
64 | }; | ||
diff --git a/Documentation/devicetree/bindings/spi/spi_nvidia.txt b/Documentation/devicetree/bindings/spi/spi_nvidia.txt new file mode 100644 index 00000000000..6b9e5189669 --- /dev/null +++ b/Documentation/devicetree/bindings/spi/spi_nvidia.txt | |||
@@ -0,0 +1,5 @@ | |||
1 | NVIDIA Tegra 2 SPI device | ||
2 | |||
3 | Required properties: | ||
4 | - compatible : should be "nvidia,tegra20-spi". | ||
5 | - gpios : should specify GPIOs used for chipselect. | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt new file mode 100644 index 00000000000..4dc46547766 --- /dev/null +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -0,0 +1,602 @@ | |||
1 | The following is a list of files and features that are going to be | ||
2 | removed in the kernel source tree. Every entry should contain what | ||
3 | exactly is going away, why it is happening, and who is going to be doing | ||
4 | the work. When the feature is removed from the kernel, it should also | ||
5 | be removed from this file. | ||
6 | |||
7 | --------------------------- | ||
8 | |||
9 | What: x86 floppy disable_hlt | ||
10 | When: 2012 | ||
11 | Why: ancient workaround of dubious utility clutters the | ||
12 | code used by everybody else. | ||
13 | Who: Len Brown <len.brown@intel.com> | ||
14 | |||
15 | --------------------------- | ||
16 | |||
17 | What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle | ||
18 | When: 2012 | ||
19 | Why: This optional sub-feature of APM is of dubious reliability, | ||
20 | and ancient APM laptops are likely better served by calling HLT. | ||
21 | Deleting CONFIG_APM_CPU_IDLE allows x86 to stop exporting | ||
22 | the pm_idle function pointer to modules. | ||
23 | Who: Len Brown <len.brown@intel.com> | ||
24 | |||
25 | ---------------------------- | ||
26 | |||
27 | What: x86_32 "no-hlt" cmdline param | ||
28 | When: 2012 | ||
29 | Why: remove a branch from idle path, simplify code used by everybody. | ||
30 | This option disabled the use of HLT in idle and machine_halt() | ||
31 | for hardware that was flaky 15 years ago. Today we have | ||
32 | "idle=poll" that removed HLT from idle, and so if such a machine | ||
33 | is still running the upstream kernel, "idle=poll" is likely sufficient. | ||
34 | Who: Len Brown <len.brown@intel.com> | ||
35 | |||
36 | ---------------------------- | ||
37 | |||
38 | What: x86 "idle=mwait" cmdline param | ||
39 | When: 2012 | ||
40 | Why: simplify x86 idle code | ||
41 | Who: Len Brown <len.brown@intel.com> | ||
42 | |||
43 | ---------------------------- | ||
44 | |||
45 | What: PRISM54 | ||
46 | When: 2.6.34 | ||
47 | |||
48 | Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the | ||
49 | prism54 wireless driver. After Intersil stopped selling these | ||
50 | devices in preference for the newer more flexible SoftMAC devices | ||
51 | a SoftMAC device driver was required and prism54 did not support | ||
52 | them. The p54pci driver now exists and has been present in the kernel for | ||
53 | a while. This driver supports both SoftMAC devices and FullMAC devices. | ||
54 | The main difference between these devices was the amount of memory which | ||
55 | could be used for the firmware. The SoftMAC devices support a smaller | ||
56 | amount of memory. Because of this the SoftMAC firmware fits into FullMAC | ||
57 | devices' memory. p54pci supports not only PCI / Cardbus but also USB | ||
58 | and SPI. Since p54pci supports all devices prism54 supports | ||
59 | you will have a conflict. I'm not quite sure how distributions are | ||
60 | handling this conflict right now. prism54 was kept around due to | ||
61 | claims users may experience issues when using the SoftMAC driver. | ||
62 | Time has passed and users have not reported issues. If you use prism54 | ||
63 | and for whatever reason you cannot use p54pci please let us know! | ||
64 | E-mail us at: linux-wireless@vger.kernel.org | ||
65 | |||
66 | For more information see the p54 wiki page: | ||
67 | |||
68 | http://wireless.kernel.org/en/users/Drivers/p54 | ||
69 | |||
70 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | ||
71 | |||
72 | --------------------------- | ||
73 | |||
74 | What: IRQF_SAMPLE_RANDOM | ||
75 | Check: IRQF_SAMPLE_RANDOM | ||
76 | When: July 2009 | ||
77 | |||
78 | Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy | ||
79 | sources in the kernel's current entropy model. To resolve this, every | ||
80 | input point to the kernel's entropy pool needs to better document the | ||
81 | type of entropy source it actually is. This will be replaced with | ||
82 | additional add_*_randomness functions in drivers/char/random.c | ||
83 | |||
84 | Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> | ||
85 | |||
86 | --------------------------- | ||
87 | |||
88 | What: Deprecated snapshot ioctls | ||
89 | When: 2.6.36 | ||
90 | |||
91 | Why: The ioctls in kernel/power/user.c were marked as deprecated a long time | ||
92 | ago. They now warn users about this so that they know to update | ||
93 | their userspace. After some more time, remove them completely. | ||
94 | |||
95 | Who: Jiri Slaby <jirislaby@gmail.com> | ||
96 | |||
97 | --------------------------- | ||
98 | |||
99 | What: The ieee80211_regdom module parameter | ||
100 | When: March 2010 / desktop catchup | ||
101 | |||
102 | Why: This was inherited by the CONFIG_WIRELESS_OLD_REGULATORY code, | ||
103 | and currently serves as an option for users to define an | ||
104 | ISO / IEC 3166 alpha2 code for the country they are currently | ||
105 | present in. Although there are userspace API replacements for this | ||
106 | through nl80211, distributions haven't yet caught up with implementing | ||
107 | decent alternatives through standard GUIs. Although available as an | ||
108 | option through iw or wpa_supplicant, it's just a matter of time before | ||
109 | distributions pick up good GUI options for this. The ideal solution | ||
110 | would actually consist of intelligent designs which would do this for | ||
111 | the user automatically even when travelling through different countries. | ||
112 | Until then we leave this module parameter as a compromise. | ||
113 | |||
114 | When userspace improves with reasonable widely-available alternatives for | ||
115 | this we will no longer need this module parameter. This entry hopes that | ||
116 | by the super-futuristically looking date of "March 2010" we will have | ||
117 | such replacements widely available. | ||
118 | |||
119 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | ||
120 | |||
121 | --------------------------- | ||
122 | |||
123 | What: dev->power.power_state | ||
124 | When: July 2007 | ||
125 | Why: Broken design for runtime control over driver power states, confusing | ||
126 | driver-internal runtime power management with: mechanisms to support | ||
127 | system-wide sleep state transitions; event codes that distinguish | ||
128 | different phases of swsusp "sleep" transitions; and userspace policy | ||
129 | inputs. This framework was never widely used, and most attempts to | ||
130 | use it were broken. Drivers should instead be exposing domain-specific | ||
131 | interfaces either to kernel or to userspace. | ||
132 | Who: Pavel Machek <pavel@ucw.cz> | ||
133 | |||
134 | --------------------------- | ||
135 | |||
136 | What: sys_sysctl | ||
137 | When: September 2010 | ||
138 | Option: CONFIG_SYSCTL_SYSCALL | ||
139 | Why: The same information is available in a more convenient form | ||
140 | /proc/sys, and none of the sysctl variables appear to be | ||
141 | important performance wise. | ||
142 | |||
143 | Binary sysctls are a long standing source of subtle kernel | ||
144 | bugs and security issues. | ||
145 | |||
146 | When I looked several months ago all I could find after | ||
147 | searching several distributions were 5 user space programs and | ||
148 | glibc (which falls back to /proc/sys) using this syscall. | ||
149 | |||
150 | The man page for sysctl(2) documents it as unusable for user | ||
151 | space programs. | ||
152 | |||
153 | sysctl(2) is not generally ABI compatible to a 32bit user | ||
154 | space application on a 64bit and a 32bit kernel. | ||
155 | |||
156 | For the last several months the policy has been no new binary | ||
157 | sysctls and no one has put forward an argument to use them. | ||
158 | |||
159 | Binary sysctl issues seem to keep appearing, so | ||
160 | properly deprecating them (with a warning to user space) and a | ||
161 | 2 year grace period will mean eventually we can kill | ||
162 | them and end the pain. | ||
163 | |||
164 | In the meantime individual binary sysctls can be dealt with | ||
165 | in a piecewise fashion. | ||
166 | |||
167 | Who: Eric Biederman <ebiederm@xmission.com> | ||
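For programs that still call the binary syscall, the /proc/sys replacement mentioned in this entry can be sketched in userspace as follows. This is only an illustration: the helper names sysctl_name_to_path() and read_proc_sys() are hypothetical, and the dot-to-slash mapping follows the convention of the sysctl(8) tool, so it assumes the name contains no component that itself includes a dot.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a dotted name such as "kernel.ostype" into
 * "/proc/sys/kernel/ostype". Returns 0 on success, -1 if buf is too small.
 */
static int sysctl_name_to_path(const char *name, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/sys/%s", name);
	char *p;

	if (n < 0 || (size_t)n >= len)
		return -1;
	/* Only rewrite the part after the fixed "/proc/sys/" prefix. */
	for (p = buf + strlen("/proc/sys/"); *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/*
 * Hypothetical helper: read the first line of a /proc/sys entry into val,
 * with the trailing newline removed. Returns 0 on success, -1 on error.
 */
static int read_proc_sys(const char *name, char *val, size_t len)
{
	char path[256];
	FILE *f;

	if (sysctl_name_to_path(name, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(val, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	val[strcspn(val, "\n")] = '\0';
	return 0;
}
```

A caller would use read_proc_sys("kernel.ostype", buf, sizeof(buf)) where it previously issued sysctl(2) with a binary name vector.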
168 | |||
169 | --------------------------- | ||
170 | |||
171 | What: /proc/<pid>/oom_adj | ||
172 | When: August 2012 | ||
173 | Why: /proc/<pid>/oom_adj allows userspace to influence the oom killer's | ||
174 | badness heuristic used to determine which task to kill when the kernel | ||
175 | is out of memory. | ||
176 | |||
177 | The badness heuristic has been rewritten since the introduction of | ||
178 | this tunable such that its meaning is deprecated. The value was | ||
179 | implemented as a bitshift on a score generated by the badness() | ||
180 | function that did not have any precise units of measure. With the | ||
181 | rewrite, the score is given as a proportion of available memory to the | ||
182 | task allocating pages, so a bitshift, which grows the score | ||
183 | exponentially, is impossible to tune with fine granularity. | ||
184 | |||
185 | A much more powerful interface, /proc/<pid>/oom_score_adj, was | ||
186 | introduced with the oom killer rewrite that allows users to increase or | ||
187 | decrease the badness score linearly. This interface will replace | ||
188 | /proc/<pid>/oom_adj. | ||
189 | |||
190 | A warning will be emitted to the kernel log if an application uses this | ||
191 | deprecated interface. After it is printed once, future warnings will be | ||
192 | suppressed until the kernel is rebooted. | ||
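A userspace tool migrating from oom_adj to oom_score_adj might look roughly like the sketch below. The constants (-17 as the legacy "disable" value, 15 as the legacy maximum, 1000 as the new maximum) and the linear scaling are assumptions about how legacy values correspond to the new range; they are not stated in this entry. The function names are hypothetical.

```c
#include <stdio.h>

/* Assumed ranges: legacy oom_adj is -17..15, oom_score_adj is -1000..1000. */
#define OOM_DISABLE		(-17)
#define OOM_ADJUST_MAX		15
#define OOM_SCORE_ADJ_MAX	1000

/*
 * Hypothetical helper: scale a legacy oom_adj value linearly onto the
 * oom_score_adj range, pinning the legacy maximum to the new maximum.
 */
static int oom_adj_to_score_adj(int oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;
	return (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
}

/*
 * Hypothetical helper: write a value to an oom_score_adj file, for example
 * "/proc/self/oom_score_adj". Returns 0 on success, -1 on error.
 * Lowering the value may require extra privileges.
 */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* The write is buffered, so a failure may only show up at close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Because the new score is adjusted linearly, a step of 100 always means the same change in badness, unlike the old bitshift where each step doubled or halved the score.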
193 | |||
194 | --------------------------- | ||
195 | |||
196 | What: remove EXPORT_SYMBOL(kernel_thread) | ||
197 | When: August 2006 | ||
198 | Files: arch/*/kernel/*_ksyms.c | ||
199 | Check: kernel_thread | ||
200 | Why: kernel_thread is a low-level implementation detail. Drivers should | ||
201 | use the <linux/kthread.h> API instead which shields them from | ||
202 | implementation details and provides a higher-level interface that | ||
203 | prevents bugs and code duplication. | ||
204 | Who: Christoph Hellwig <hch@lst.de> | ||
205 | |||
206 | --------------------------- | ||
207 | |||
208 | What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports | ||
209 | (temporary transition config option provided until then) | ||
210 | The transition config option will also be removed at the same time. | ||
211 | When: before 2.6.19 | ||
212 | Why: Unused symbols are both increasing the size of the kernel binary | ||
213 | and are often a sign of "wrong API" | ||
214 | Who: Arjan van de Ven <arjan@linux.intel.com> | ||
215 | |||
216 | --------------------------- | ||
217 | |||
218 | What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment | ||
219 | When: October 2008 | ||
220 | Why: The stacking of class devices makes these values misleading and | ||
221 | inconsistent. | ||
222 | Class devices should not carry any of these properties, and bus | ||
223 | devices have SUBSYSTEM and DRIVER as a replacement. | ||
224 | Who: Kay Sievers <kay.sievers@suse.de> | ||
225 | |||
226 | --------------------------- | ||
227 | |||
228 | What: ACPI procfs interface | ||
229 | When: July 2008 | ||
230 | Why: ACPI sysfs conversion should be finished by January 2008. | ||
231 | ACPI procfs interface will be removed in July 2008 so that | ||
232 | there is enough time for the user space to catch up. | ||
233 | Who: Zhang Rui <rui.zhang@intel.com> | ||
234 | |||
235 | --------------------------- | ||
236 | |||
237 | What: CONFIG_ACPI_PROCFS_POWER | ||
238 | When: 2.6.39 | ||
239 | Why: sysfs I/F for ACPI power devices, including AC and Battery, | ||
240 | has been working in upstream kernel since 2.6.24, Sep 2007. | ||
241 | In 2.6.37, we make the sysfs I/F always built in and this option | ||
242 | disabled by default. | ||
243 | Remove this option and the ACPI power procfs interface in 2.6.39. | ||
244 | Who: Zhang Rui <rui.zhang@intel.com> | ||
245 | |||
246 | --------------------------- | ||
247 | |||
248 | What: /proc/acpi/event | ||
249 | When: February 2008 | ||
250 | Why: /proc/acpi/event has been replaced by events via the input layer | ||
251 | and netlink since 2.6.23. | ||
252 | Who: Len Brown <len.brown@intel.com> | ||
253 | |||
254 | --------------------------- | ||
255 | |||
256 | What: i386/x86_64 bzImage symlinks | ||
257 | When: April 2010 | ||
258 | |||
259 | Why: The i386/x86_64 merge provides a symlink to the old bzImage | ||
260 | location so not yet updated user space tools, e.g. package | ||
261 | scripts, do not break. | ||
262 | Who: Thomas Gleixner <tglx@linutronix.de> | ||
263 | |||
264 | --------------------------- | ||
265 | |||
266 | What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib | ||
267 | When: February 2010 | ||
268 | Why: All callers should use explicit gpio_request()/gpio_free(). | ||
269 | The autorequest mechanism in gpiolib was provided mostly as a | ||
270 | migration aid for legacy GPIO interfaces (for SOC based GPIOs). | ||
271 | Those users have now largely migrated. Platforms implementing | ||
272 | the GPIO interfaces without using gpiolib will see no changes. | ||
273 | Who: David Brownell <dbrownell@users.sourceforge.net> | ||
274 | --------------------------- | ||
275 | |||
276 | What: b43 support for firmware revision < 410 | ||
277 | When: The schedule was July 2008, but it was decided that we are going to keep the | ||
278 | code as long as there are no major maintenance headaches. | ||
279 | So it _could_ be removed _any_ time now, if it conflicts with something new. | ||
280 | Why: The support code for the old firmware hurts code readability/maintainability | ||
281 | and slightly hurts runtime performance. Bugfixes for the old firmware | ||
282 | are not provided by Broadcom anymore. | ||
283 | Who: Michael Buesch <m@bues.ch> | ||
284 | |||
285 | --------------------------- | ||
286 | |||
287 | What: Ability for non root users to shm_get hugetlb pages based on mlock | ||
288 | resource limits | ||
289 | When: 2.6.31 | ||
290 | Why: Non root users need to be part of /proc/sys/vm/hugetlb_shm_group or | ||
291 | have CAP_IPC_LOCK to be able to allocate shm segments backed by | ||
292 | huge pages. The mlock based rlimit check to allow shm hugetlb is | ||
293 | inconsistent with mmap based allocations. Hence it is being | ||
294 | deprecated. | ||
295 | Who: Ravikiran Thirumalai <kiran@scalex86.org> | ||
296 | |||
297 | --------------------------- | ||
298 | |||
299 | What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS | ||
300 | (in net/core/net-sysfs.c) | ||
301 | When: After the only user (hal) has seen a release with the patches | ||
302 | for enough time, probably some time in 2010. | ||
303 | Why: Over 1K .text/.data size reduction, data is available in other | ||
304 | ways (ioctls) | ||
305 | Who: Johannes Berg <johannes@sipsolutions.net> | ||
306 | |||
307 | --------------------------- | ||
308 | |||
309 | What: sysfs ui for changing p4-clockmod parameters | ||
310 | When: September 2009 | ||
311 | Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and | ||
312 | e088e4c9cdb618675874becb91b2fd581ee707e6. | ||
313 | Removal is subject to fixing any remaining bugs in ACPI which may | ||
314 | cause the thermal throttling not to happen at the right time. | ||
315 | Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com> | ||
316 | |||
317 | ----------------------------- | ||
318 | |||
319 | What: fakephp and associated sysfs files in /sys/bus/pci/slots/ | ||
320 | When: 2011 | ||
321 | Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to | ||
322 | represent a machine's physical PCI slots. The change in semantics | ||
323 | had userspace implications, as the hotplug core no longer allowed | ||
324 | drivers to create multiple sysfs files per physical slot (required | ||
325 | for multi-function devices, e.g.). fakephp was seen as a developer's | ||
326 | tool only, and its interface changed. Too late, we learned that | ||
327 | there were some users of the fakephp interface. | ||
328 | |||
329 | In 2.6.30, the original fakephp interface was restored. At the same | ||
330 | time, the PCI core gained the ability that fakephp provided, namely | ||
331 | function-level hot-remove and hot-add. | ||
332 | |||
333 | Since the PCI core now provides the same functionality, exposed in: | ||
334 | |||
335 | /sys/bus/pci/rescan | ||
336 | /sys/bus/pci/devices/.../remove | ||
337 | /sys/bus/pci/devices/.../rescan | ||
338 | |||
339 | there is no functional reason to maintain fakephp as well. | ||
340 | |||
341 | We will keep the existing module so that 'modprobe fakephp' will | ||
342 | present the old /sys/bus/pci/slots/... interface for compatibility, | ||
343 | but users are urged to migrate their applications to the API above. | ||
344 | |||
345 | After a reasonable transition period, we will remove the legacy | ||
346 | fakephp interface. | ||
347 | Who: Alex Chiang <achiang@hp.com> | ||
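An application moving from the fakephp slot files to the PCI core files listed in this entry could drive them along these lines. This is a sketch only: the helper names are hypothetical, the device address used in the usage note is made up, and it assumes that writing "1" to a remove or rescan file triggers the action.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the path of a per-device attribute ("remove"
 * or "rescan") for a device address such as "0000:00:1f.2".
 * Returns 0 on success, -1 if buf is too small.
 */
static int pci_attr_path(const char *bdf, const char *attr,
			 char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write "1" to a sysfs trigger file.
 * Returns 0 on success, -1 on error (missing file, no permission, ...).
 */
static int sysfs_trigger(const char *path)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("1\n", f) >= 0;
	/* The write is buffered, so a failure may only show up at close. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Hot-removing a function and later rediscovering it would then be pci_attr_path("0000:00:1f.2", "remove", ...) followed by sysfs_trigger() on the result, and later sysfs_trigger("/sys/bus/pci/rescan").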
348 | |||
349 | --------------------------- | ||
350 | |||
351 | What: CONFIG_RFKILL_INPUT | ||
352 | When: 2.6.33 | ||
353 | Why: Should be implemented in userspace, policy daemon. | ||
354 | Who: Johannes Berg <johannes@sipsolutions.net> | ||
355 | |||
356 | ---------------------------- | ||
357 | |||
358 | What: sound-slot/service-* module aliases and related clutter in | ||
359 | sound/sound_core.c | ||
360 | When: August 2010 | ||
361 | Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR | ||
362 | (14) and requests modules using custom sound-slot/service-* | ||
363 | module aliases. The only benefit of doing this is allowing | ||
364 | use of custom module aliases which might as well be considered | ||
365 | a bug at this point. This preemptive claiming prevents | ||
366 | alternative OSS implementations. | ||
367 | |||
368 | Till the feature is removed, the kernel will be requesting | ||
369 | both sound-slot/service-* and the standard char-major-* module | ||
370 | aliases and allow turning off the pre-claiming selectively via | ||
371 | CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss | ||
372 | kernel parameter. | ||
373 | |||
374 | After the transition phase is complete, both the custom module | ||
375 | aliases and switches to disable it will go away. This removal | ||
376 | will also allow making ALSA OSS emulation independent of | ||
377 | sound_core. The dependency will be broken then too. | ||
378 | Who: Tejun Heo <tj@kernel.org> | ||
379 | |||
380 | ---------------------------- | ||
381 | |||
382 | What: sysfs-class-rfkill state file | ||
383 | When: Feb 2014 | ||
384 | Files: net/rfkill/core.c | ||
385 | Why: Documented as obsolete since Feb 2010. This file is limited to 3 | ||
386 | states while the rfkill drivers can have 4 states. | ||
387 | Who: anybody or Florian Mickler <florian@mickler.org> | ||
388 | |||
389 | ---------------------------- | ||
390 | |||
391 | What: sysfs-class-rfkill claim file | ||
392 | When: Feb 2012 | ||
393 | Files: net/rfkill/core.c | ||
394 | Why: It has not been possible to claim an rfkill driver since 2007. This is | ||
395 | documented as obsolete since Feb 2010. | ||
396 | Who: anybody or Florian Mickler <florian@mickler.org> | ||
397 | |||
398 | ---------------------------- | ||
399 | |||
400 | What: KVM paravirt mmu host support | ||
401 | When: January 2011 | ||
402 | Why: The paravirt mmu host support is slower than non-paravirt mmu, both | ||
403 | on newer and older hardware. It is already not exposed to the guest, | ||
404 | and kept only for live migration purposes. | ||
405 | Who: Avi Kivity <avi@redhat.com> | ||
406 | |||
407 | ---------------------------- | ||
408 | |||
409 | What: iwlwifi 50XX module parameters | ||
410 | When: 3.0 | ||
411 | Why: The "..50" module parameters were used to configure 5000 series and | ||
412 | up devices; a different set of module parameters is also available for 4965 | ||
413 | with the same functionality. Consolidate both sets into a single place | ||
414 | in drivers/net/wireless/iwlwifi/iwl-agn.c | ||
415 | |||
416 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> | ||
417 | |||
418 | ---------------------------- | ||
419 | |||
420 | What: iwl4965 alias support | ||
421 | When: 3.0 | ||
422 | Why: Internal alias support has been present in module-init-tools for some | ||
423 | time, so the MODULE_ALIAS("iwl4965") boilerplate aliases can be removed | ||
424 | with no impact. | ||
425 | |||
426 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> | ||
427 | |||
428 | --------------------------- | ||
429 | |||
430 | What: xt_NOTRACK | ||
431 | Files: net/netfilter/xt_NOTRACK.c | ||
432 | When: April 2011 | ||
433 | Why: Superseded by xt_CT | ||
434 | Who: Netfilter developer team <netfilter-devel@vger.kernel.org> | ||
435 | |||
436 | ---------------------------- | ||
437 | |||
438 | What: IRQF_DISABLED | ||
439 | When: 2.6.36 | ||
440 | Why: The flag is a NOOP as we run interrupt handlers with interrupts disabled | ||
441 | Who: Thomas Gleixner <tglx@linutronix.de> | ||
442 | |||
443 | ---------------------------- | ||
444 | |||
445 | What: PCI DMA unmap state API | ||
446 | When: August 2012 | ||
447 | Why: PCI DMA unmap state API (include/linux/pci-dma.h) was replaced | ||
448 | with DMA unmap state API (DMA unmap state API can be used for | ||
449 | any bus). | ||
450 | Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> | ||
451 | |||
452 | ---------------------------- | ||
453 | |||
454 | What: iwlwifi disable_hw_scan module parameters | ||
455 | When: 3.0 | ||
456 | Why: Hardware scan is the preferred method for iwlwifi devices for | ||
457 | scanning operations. Remove software scan support for all the | ||
458 | iwlwifi devices. | ||
459 | |||
460 | Who: Wey-Yi Guy <wey-yi.w.guy@intel.com> | ||
461 | |||
462 | ---------------------------- | ||
463 | |||
464 | What: Legacy, non-standard chassis intrusion detection interface. | ||
465 | When: June 2011 | ||
466 | Why: The adm9240, w83792d and w83793 hardware monitoring drivers have | ||
467 | legacy interfaces for chassis intrusion detection. A standard | ||
468 | interface has been added to each driver, so the legacy interface | ||
469 | can be removed. | ||
470 | Who: Jean Delvare <khali@linux-fr.org> | ||
471 | |||
472 | ---------------------------- | ||
473 | |||
474 | What: xt_connlimit rev 0 | ||
475 | When: 2012 | ||
476 | Who: Jan Engelhardt <jengelh@medozas.de> | ||
477 | Files: net/netfilter/xt_connlimit.c | ||
478 | |||
479 | ---------------------------- | ||
480 | |||
481 | What: ipt_addrtype match include file | ||
482 | When: 2012 | ||
483 | Why: superseded by xt_addrtype | ||
484 | Who: Florian Westphal <fw@strlen.de> | ||
485 | Files: include/linux/netfilter_ipv4/ipt_addrtype.h | ||
486 | |||
487 | ---------------------------- | ||
488 | |||
489 | What: i2c_driver.attach_adapter | ||
490 | i2c_driver.detach_adapter | ||
491 | When: September 2011 | ||
492 | Why: These legacy callbacks should no longer be used as i2c-core offers | ||
493 | a variety of preferable alternative ways to instantiate I2C devices. | ||
494 | Who: Jean Delvare <khali@linux-fr.org> | ||
495 | |||
496 | ---------------------------- | ||
497 | |||
498 | What: Support for UVCIOC_CTRL_ADD in the uvcvideo driver | ||
499 | When: 3.2 | ||
500 | Why: The information passed to the driver by this ioctl is now queried | ||
501 | dynamically from the device. | ||
502 | Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> | ||
503 | |||
504 | ---------------------------- | ||
505 | |||
506 | What: Support for UVCIOC_CTRL_MAP_OLD in the uvcvideo driver | ||
507 | When: 3.2 | ||
508 | Why: Used only by applications compiled against older driver versions. | ||
509 | Superseded by UVCIOC_CTRL_MAP which supports V4L2 menu controls. | ||
510 | Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> | ||
511 | |||
512 | ---------------------------- | ||
513 | |||
514 | What: Support for UVCIOC_CTRL_GET and UVCIOC_CTRL_SET in the uvcvideo driver | ||
515 | When: 3.2 | ||
516 | Why: Superseded by the UVCIOC_CTRL_QUERY ioctl. | ||
517 | Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com> | ||
518 | |||
519 | ---------------------------- | ||
520 | |||
521 | What: Support for driver specific ioctls in the pwc driver (everything | ||
522 | defined in media/pwc-ioctl.h) | ||
523 | When: 3.3 | ||
524 | Why: This stems from the v4l1 era; with v4l2 everything can be done with | ||
525 | standardized v4l2 API calls. | ||
526 | Who: Hans de Goede <hdegoede@redhat.com> | ||
527 | |||
528 | ---------------------------- | ||
529 | |||
530 | What: Driver specific sysfs API in the pwc driver | ||
531 | When: 3.3 | ||
532 | Why: Setting pan/tilt should be done with v4l2 controls, like with other | ||
533 | cams. The button is available as a standard input device | ||
534 | Who: Hans de Goede <hdegoede@redhat.com> | ||
535 | |||
536 | ---------------------------- | ||
537 | |||
538 | What: Driver specific use of pixfmt.priv in the pwc driver | ||
539 | When: 3.3 | ||
540 | Why: The .priv field was never intended for this; setting a framerate is | ||
541 | supported using the standardized S_PARM ioctl. | ||
542 | Who: Hans de Goede <hdegoede@redhat.com> | ||
543 | |||
544 | ---------------------------- | ||
545 | |||
546 | What: Software emulation of arbitrary resolutions in the pwc driver | ||
547 | When: 3.3 | ||
548 | Why: The pwc driver claims to support any resolution between 160x120 | ||
549 | and 640x480, but emulates this by simply drawing a black border | ||
550 | around the image. Userspace can draw its own black border if it | ||
551 | really wants one. | ||
552 | Who: Hans de Goede <hdegoede@redhat.com> | ||
553 | |||
554 | ---------------------------- | ||
555 | |||
556 | What: For VIDIOC_S_FREQUENCY the type field must match the device node's type. | ||
557 | If not, return -EINVAL. | ||
558 | When: 3.2 | ||
559 | Why: It makes no sense to switch the tuner to radio mode by calling | ||
560 | VIDIOC_S_FREQUENCY on a video node, or to switch the tuner to tv mode by | ||
561 | calling VIDIOC_S_FREQUENCY on a radio node. This is the first step of a | ||
562 | move to more consistent handling of tv and radio tuners. | ||
563 | Who: Hans Verkuil <hans.verkuil@cisco.com> | ||
564 | |||
565 | ---------------------------- | ||
566 | |||
567 | What: Opening a radio device node will no longer automatically switch the | ||
568 | tuner mode from tv to radio. | ||
569 | When: 3.3 | ||
570 | Why: Just opening a V4L device should not change the state of the hardware | ||
571 | like that. It's very unexpected and against the V4L spec. Instead, you | ||
572 | switch to radio mode by calling VIDIOC_S_FREQUENCY. This is the second | ||
573 | and last step of the move to consistent handling of tv and radio tuners. | ||
574 | Who: Hans Verkuil <hans.verkuil@cisco.com> | ||
575 | |||
576 | ---------------------------- | ||
577 | |||
578 | What: g_file_storage driver | ||
579 | When: 3.8 | ||
580 | Why: This driver has been superseded by g_mass_storage. | ||
581 | Who: Alan Stern <stern@rowland.harvard.edu> | ||
582 | |||
583 | ---------------------------- | ||
584 | |||
585 | What: threeg and interface sysfs files in /sys/devices/platform/acer-wmi | ||
586 | When: 2012 | ||
587 | Why: In 3.0, we can now autodetect the internal 3G device and already have | ||
588 | the threeg rfkill device, so we plan to remove threeg sysfs support | ||
589 | because it is no longer necessary. | ||
590 | |||
591 | We also plan to remove the interface sysfs file that exposes which ACPI-WMI | ||
592 | interface is used by the acer-wmi driver. It will be replaced by | ||
593 | an informational log message when acer-wmi initializes. | ||
594 | Who: Lee, Chun-Yi <jlee@novell.com> | ||
595 | |||
596 | ---------------------------- | ||
597 | What: The XFS nodelaylog mount option | ||
598 | When: 3.3 | ||
599 | Why: The delaylog mode that has been the default since 2.6.39 has proven | ||
600 | stable, and the old code is in the way of additional improvements in | ||
601 | the log code. | ||
602 | Who: Christoph Hellwig <hch@lst.de> | ||
diff --git a/Documentation/i2c/muxes/gpio-i2cmux b/Documentation/i2c/muxes/gpio-i2cmux new file mode 100644 index 00000000000..811cd78d4cd --- /dev/null +++ b/Documentation/i2c/muxes/gpio-i2cmux | |||
@@ -0,0 +1,65 @@ | |||
1 | Kernel driver gpio-i2cmux | ||
2 | |||
3 | Author: Peter Korsgaard <peter.korsgaard@barco.com> | ||
4 | |||
5 | Description | ||
6 | ----------- | ||
7 | |||
8 | gpio-i2cmux is an i2c mux driver providing access to I2C bus segments | ||
9 | from a master I2C bus and a hardware MUX controlled through GPIO pins. | ||
10 | |||
11 | E.G.: | ||
12 | |||
13 | ---------- ---------- Bus segment 1 - - - - - | ||
14 | | | SCL/SDA | |-------------- | | | ||
15 | | |------------| | | ||
16 | | | | | Bus segment 2 | | | ||
17 | | Linux | GPIO 1..N | MUX |--------------- Devices | ||
18 | | |------------| | | | | ||
19 | | | | | Bus segment M | ||
20 | | | | |---------------| | | ||
21 | ---------- ---------- - - - - - | ||
22 | |||
23 | SCL/SDA of the master I2C bus is multiplexed to bus segment 1..M | ||
24 | according to the settings of the GPIO pins 1..N. | ||
25 | |||
26 | Usage | ||
27 | ----- | ||
28 | |||
29 | gpio-i2cmux uses the platform bus, so you need to provide a struct | ||
30 | platform_device with the platform_data pointing to a struct | ||
31 | gpio_i2cmux_platform_data with the I2C adapter number of the master | ||
32 | bus, the number of bus segments to create and the GPIO pins used | ||
33 | to control it. See include/linux/gpio-i2cmux.h for details. | ||
34 | |||
35 | E.G. something like this for a MUX providing 4 bus segments | ||
36 | controlled through 3 GPIO pins: | ||
37 | |||
38 | #include <linux/gpio-i2cmux.h> | ||
39 | #include <linux/platform_device.h> | ||
40 | |||
41 | static const unsigned myboard_gpiomux_gpios[] = { | ||
42 | AT91_PIN_PC26, AT91_PIN_PC25, AT91_PIN_PC24 | ||
43 | }; | ||
44 | |||
45 | static const unsigned myboard_gpiomux_values[] = { | ||
46 | 0, 1, 2, 3 | ||
47 | }; | ||
48 | |||
49 | static struct gpio_i2cmux_platform_data myboard_i2cmux_data = { | ||
50 | .parent = 1, | ||
51 | .base_nr = 2, /* optional */ | ||
52 | .values = myboard_gpiomux_values, | ||
53 | .n_values = ARRAY_SIZE(myboard_gpiomux_values), | ||
54 | .gpios = myboard_gpiomux_gpios, | ||
55 | .n_gpios = ARRAY_SIZE(myboard_gpiomux_gpios), | ||
56 | .idle = 4, /* optional */ | ||
57 | }; | ||
58 | |||
59 | static struct platform_device myboard_i2cmux = { | ||
60 | .name = "gpio-i2cmux", | ||
61 | .id = 0, | ||
62 | .dev = { | ||
63 | .platform_data = &myboard_i2cmux_data, | ||
64 | }, | ||
65 | }; | ||
diff --git a/Documentation/mca.txt b/Documentation/mca.txt new file mode 100644 index 00000000000..dfd130c2207 --- /dev/null +++ b/Documentation/mca.txt | |||
@@ -0,0 +1,313 @@ | |||
1 | i386 Micro Channel Architecture Support | ||
2 | ======================================= | ||
3 | |||
4 | MCA support is enabled using the CONFIG_MCA define. A machine with a MCA | ||
5 | bus will have the kernel variable MCA_bus set, assuming the BIOS feature | ||
6 | bits are set properly (see arch/i386/boot/setup.S for information on | ||
7 | how this detection is done). | ||
8 | |||
9 | Adapter Detection | ||
10 | ================= | ||
11 | |||
12 | The ideal MCA adapter detection is done through the use of the | ||
13 | Programmable Option Select registers. Generic functions for doing | ||
14 | this have been added in include/linux/mca.h and arch/x86/kernel/mca_32.c. | ||
15 | Everything needed to detect adapters and read (and write) configuration | ||
16 | information is there. A number of MCA-specific drivers already use | ||
17 | this. The typical probe code looks like the following: | ||
18 | |||
19 | #include <linux/mca.h> | ||
20 | |||
21 | unsigned char pos2, pos3, pos4, pos5; | ||
22 | struct net_device* dev; | ||
23 | int slot; | ||
24 | |||
25 | if( MCA_bus ) { | ||
26 | slot = mca_find_adapter( ADAPTER_ID, 0 ); | ||
27 | if( slot == MCA_NOTFOUND ) { | ||
28 | return -ENODEV; | ||
29 | } | ||
30 | /* optional - see below */ | ||
31 | mca_set_adapter_name( slot, "adapter name & description" ); | ||
32 | mca_set_adapter_procfn( slot, dev_getinfo, dev ); | ||
33 | |||
34 | /* read the POS registers. Most devices only use 2 and 3 */ | ||
35 | pos2 = mca_read_stored_pos( slot, 2 ); | ||
36 | pos3 = mca_read_stored_pos( slot, 3 ); | ||
37 | pos4 = mca_read_stored_pos( slot, 4 ); | ||
38 | pos5 = mca_read_stored_pos( slot, 5 ); | ||
39 | } else { | ||
40 | return -ENODEV; | ||
41 | } | ||
42 | |||
43 | /* extract configuration from pos[2345] and set everything up */ | ||
44 | |||
45 | Loadable modules should modify this to test that the specified IRQ and | ||
46 | IO ports (plus whatever other stuff) match. See 3c523.c for example | ||
47 | code (actually, smc-mca.c has a slightly more complex example that can | ||
48 | handle a list of adapter ids). | ||
49 | |||
50 | Keep in mind that devices should never directly access the POS registers | ||
51 | (via inb(), outb(), etc). While it's generally safe, there is a small | ||
52 | potential for blowing up hardware when it's done at the wrong time. | ||
53 | Furthermore, accessing a POS register disables a device temporarily. | ||
54 | This is usually okay during startup, but do _you_ want to rely on it? | ||
55 | During initial configuration, mca_init() reads all the POS registers | ||
56 | into memory. mca_read_stored_pos() accesses that data. mca_read_pos() | ||
57 | and mca_write_pos() are also available for (safer) direct POS access, | ||
58 | but their use is _highly_ discouraged. mca_write_pos() is particularly | ||
59 | dangerous, as it is possible for adapters to be put in inconsistent | ||
60 | states (e.g. sharing an IO address) and may result in crashes, toasted | ||
61 | hardware, and blindness. | ||
62 | |||
63 | User level drivers (such as the AGX X server) can use /proc/mca/pos to | ||
64 | find adapters (see below). | ||
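A user-level program could pick the adapter id out of /proc/mca/pos roughly as sketched here. This assumes the line layout of the Appendix A sample ("Slot N:" followed by the POS bytes) and that the id is the second POS byte (high) combined with the first (low), which is what the sample's Id fields suggest. The function name mca_pos_line_to_id() is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper: parse one "Slot N: xx xx ..." line and
 * return the 16-bit adapter id, or -1 if the line is not a slot line.
 * Assumes id = (second POS byte << 8) | first POS byte.
 */
static long mca_pos_line_to_id(const char *line, int *slot)
{
	unsigned int lo, hi;

	assert(line != NULL && slot != NULL);
	if (sscanf(line, " Slot %d: %2x %2x", slot, &lo, &hi) != 3)
		return -1;
	return (long)((hi << 8) | lo);
}
```

In the sample, empty slots read back as all ff, so an id of 0xffff appears to mean that nothing is installed in that slot.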
65 | |||
66 | Some MCA adapters can also be detected via the usual ISA-style device | ||
67 | probing (many SCSI adapters, for example). This sort of thing is highly | ||
68 | discouraged. Perfectly good information is available telling you what's | ||
69 | there, so there's no excuse for messing with random IO ports. However, | ||
70 | we MCA people still appreciate any ISA-style driver that will work with | ||
71 | our hardware. You take what you can get... | ||
72 | |||
73 | Level-Triggered Interrupts | ||
74 | ========================== | ||
75 | |||
76 | Because MCA uses level-triggered interrupts, a few problems arise with | ||
77 | what might best be described as the ISA mindset and its effects on | ||
78 | drivers. These sorts of problems are expected to become less common as | ||
79 | more people use shared IRQs on PCI machines. | ||
80 | |||
81 | In general, an interrupt must be acknowledged not only at the ICU (which | ||
82 | is done automagically by the kernel), but at the device level. In | ||
83 | particular, IRQ 0 must be reset after a timer interrupt (now done in | ||
84 | arch/x86/kernel/time.c) or the first timer interrupt hangs the system. | ||
85 | There were also problems with the 1.3.x floppy drivers, but that seems | ||
86 | to have been fixed. | ||
87 | |||
88 | IRQs are also shareable, and most MCA-specific devices should be coded | ||
89 | with shared IRQs in mind. | ||
90 | |||
91 | /proc/mca | ||
92 | ========= | ||
93 | |||
94 | /proc/mca is a directory containing various files for adapters and | ||
95 | other stuff. | ||
96 | |||
97 | /proc/mca/pos Straight listing of POS registers | ||
98 | /proc/mca/slot[1-8] Information on adapter in specific slot | ||
99 | /proc/mca/video Same for integrated video | ||
100 | /proc/mca/scsi Same for integrated SCSI | ||
101 | /proc/mca/machine Machine information | ||
102 | |||
103 | See Appendix A for a sample. | ||
104 | |||
105 | Device drivers can easily add their own information function for | ||
106 | specific slots (including integrated ones) via the | ||
107 | mca_set_adapter_procfn() call. Drivers that support this are ESDI, IBM | ||
108 | SCSI, and 3c523. If a device is also a module, make sure that the proc | ||
109 | function is removed in the module cleanup. This will require storing | ||
110 | the slot information in a private structure somewhere. See the 3c523 | ||
111 | driver for details. | ||
112 | |||
113 | Your typical proc function will look something like this: | ||
114 | |||
115 | static int | ||
116 | dev_getinfo( char* buf, int slot, void* d ) { | ||
117 | struct net_device* dev = (struct net_device*) d; | ||
118 | int len = 0; | ||
119 | |||
120 | len += sprintf( buf+len, "Device: %s\n", dev->name ); | ||
121 | len += sprintf( buf+len, "IRQ: %d\n", dev->irq ); | ||
122 | len += sprintf( buf+len, "IO Port: %#lx-%#lx\n", ... ); | ||
123 | ... | ||
124 | |||
125 | return len; | ||
126 | } | ||
127 | |||
128 | Some of the standard MCA information will already be printed, so don't | ||
129 | bother repeating it. Don't try putting in more than 3K of information. | ||
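One way to stay under that limit is to bound every append instead of calling sprintf() blindly. The following is a minimal userspace sketch: info_append() and PROC_INFO_LIMIT are hypothetical names, and treating "3K" as exactly 3072 bytes is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define PROC_INFO_LIMIT 3072	/* "3K" from the text; exact value assumed */

/*
 * Hypothetical helper: append formatted text at buf+len without letting
 * the total (including the trailing NUL) exceed PROC_INFO_LIMIT.
 * Returns the new length.
 */
static int info_append(char *buf, int len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && len >= 0);
	if (len >= PROC_INFO_LIMIT - 1)
		return len;			/* already full, drop the text */

	va_start(ap, fmt);
	n = vsnprintf(buf + len, (size_t)(PROC_INFO_LIMIT - len), fmt, ap);
	va_end(ap);

	if (n < 0)
		return len;			/* formatting error, nothing added */
	if (n >= PROC_INFO_LIMIT - len)
		n = PROC_INFO_LIMIT - len - 1;	/* output was truncated */
	return len + n;
}
```

A proc function would then write len = info_append( buf, len, "IRQ: %d\n", dev->irq ); in place of the len += sprintf(...) lines. Inside the kernel, scnprintf() fills a similar role.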
130 | |||
131 | Enable this function with: | ||
132 | mca_set_adapter_procfn( slot, dev_getinfo, dev ); | ||
133 | |||
134 | Disable it with: | ||
135 | mca_set_adapter_procfn( slot, NULL, NULL ); | ||
136 | |||
137 | It is also recommended that, even if you don't write a proc function, you | ||
138 | set the name of the adapter (e.g. "PS/2 ESDI Controller") via | ||
139 | mca_set_adapter_name( int slot, char* name ). | ||
140 | |||
141 | MCA Device Drivers | ||
142 | ================== | ||
143 | |||
144 | Currently, there are a number of MCA-specific device drivers. | ||
145 | |||
146 | 1) PS/2 SCSI | ||
147 | drivers/scsi/ibmmca.c | ||
148 | drivers/scsi/ibmmca.h | ||
149 | The driver for the IBM SCSI subsystem. Includes both integrated | ||
150 | controllers and adapter cards. May require command-line arg | ||
151 | "ibmmcascsi=io_port" to force detection of an adapter. If you have a | ||
152 | machine with a front-panel display (i.e. model 95), you can use | ||
153 | "ibmmcascsi=display" to enable a drive activity indicator. | ||
154 | |||
155 | 2) 3c523 | ||
156 | drivers/net/3c523.c | ||
157 | drivers/net/3c523.h | ||
158 | 3Com 3c523 Etherlink/MC ethernet driver. | ||
159 | |||
160 | 3) SMC Ultra/MCA and IBM Adapter/A | ||
161 | drivers/net/smc-mca.c | ||
162 | drivers/net/smc-mca.h | ||
163 | Driver for the MCA version of the SMC Ultra and various other | ||
164 | OEM'ed and work-alike cards (Elite, Adapter/A, etc). | ||
165 | |||
166 | 4) NE/2 | ||
167 | drivers/net/ne2.c | ||
168 | drivers/net/ne2.h | ||
169 | The NE/2 is the MCA version of the NE2000. This may not work | ||
170 | with clones that have a different adapter id than the original | ||
171 | NE/2. | ||
172 | |||
173 | 5) Future Domain MCS-600/700, OEM'd IBM Fast SCSI Adapter/A and | ||
174 | Reply Sound Blaster/SCSI (SCSI part) | ||
175 | Better support for these cards than the driver for ISA. | ||
176 | Supports multiple cards with IRQ sharing. | ||
177 | |||
178 | Also added boot time option of scsi-probe, which can do reordering of | ||
179 | SCSI host adapters. This tells the kernel the order in which | ||
180 | SCSI adapters should be detected. Example: | ||
181 | scsi-probe=ibmmca,fd_mcs,adaptec1542,buslogic | ||
182 | |||
183 | The serial drivers were modified to support the extended IO port range | ||
184 | of the typical MCA system (also #ifdef CONFIG_MCA). | ||
185 | |||
186 | The following devices work with existing drivers: | ||
187 | 1) Token-ring | ||
188 | 2) Future Domain SCSI (MCS-600, MCS-700, not MCS-350, OEM'ed IBM SCSI) | ||
189 | 3) Adaptec 1640 SCSI (using the aha1542 driver) | ||
190 | 4) Bustek/Buslogic SCSI (various) | ||
191 | 5) Probably all Arcnet cards. | ||
192 | 6) Some, possibly all, MCA IDE controllers. | ||
193 | 7) 3Com 3c529 (MCA version of 3c509) (patched) | ||
194 | |||
195 | 8) Intel EtherExpressMC (patched version) | ||
196 | You need to have CONFIG_MCA defined to have EtherExpressMC support. | ||
197 | 9) Reply Sound Blaster/SCSI (SB part) (patched version) | ||
198 | |||
199 | Bugs & Other Weirdness | ||
200 | ====================== | ||
201 | |||
202 | NMIs tend to occur with MCA machines because of various hardware | ||
203 | weirdness, bus timeouts, and many other non-critical things. Some basic | ||
204 | code to handle them (inspired by the NetBSD MCA code) has been added to | ||
205 | detect the guilty device, but it's pretty incomplete. If NMIs are a | ||
206 | persistent problem (on some model 70 or 80s, they occur every couple | ||
207 | shell commands), the CONFIG_IGNORE_NMI flag will take care of that. | ||
208 | |||
209 | Various Pentium machines have had serious problems with the FPU test in | ||
210 | bugs.h. Basically, the machine hangs after the HLT test. This occurs, | ||
211 | as far as we know, on the Pentium-equipped 85s, 95s, and some PC Servers. | ||
212 | The PCI/MCA PC 750s are fine as far as I can tell. The ``mca-pentium'' | ||
213 | boot-prompt flag will disable the FPU bug check if this is a problem | ||
214 | with your machine. | ||
215 | |||
216 | The model 80 has a raft of problems that are just too weird and unique | ||
217 | to get into here. Some people have no trouble while others have nothing | ||
218 | but problems. I'd suspect some problems are related to the age of the | ||
219 | average 80 and accompanying hardware deterioration, although others | ||
220 | are definitely design problems with the hardware. The problems | ||
221 | include SCSI controller problems, ESDI controller problems, and serious | ||
222 | screw-ups in the floppy controller. Oh, and the parallel port is also | ||
223 | pretty flaky. There were about 5 or 6 different model 80 motherboards | ||
224 | produced to fix various obscure problems. As far as I know, it's pretty | ||
225 | much impossible to tell which bugs a particular model 80 has (other than | ||
226 | triggering them, that is). | ||
227 | |||
228 | Drivers are required for some MCA memory adapters. If you're suddenly | ||
229 | short a few megs of RAM, this might be the reason. The (I think) Enhanced | ||
230 | Memory Adapter commonly found on the model 70 is one. There's a very | ||
231 | alpha driver floating around, but it's pretty ugly (disassembled from | ||
232 | the DOS driver, actually). See the MCA Linux web page (URL below) | ||
233 | for more current memory info. | ||
234 | |||
235 | The Thinkpad 700 and 720 will work, but various components are either | ||
236 | non-functional, flaky, or we don't know anything about them. The | ||
237 | graphics controller is supposed to be some WD, but we can't get things | ||
238 | working properly. The PCMCIA slots don't seem to work. Ditto for APM. | ||
239 | The serial ports work, but detection seems to be flaky. | ||
240 | |||
241 | Credits | ||
242 | ======= | ||
243 | A whole pile of people have contributed to the MCA code. I'd include | ||
244 | their names here, but I don't have a list handy. Check the MCA Linux | ||
245 | home page (URL below) for a perpetually out-of-date list. | ||
246 | |||
247 | ===================================================================== | ||
248 | MCA Linux Home Page: http://www.dgmicro.com/mca/ | ||
249 | |||
250 | Christophe Beauregard | ||
251 | chrisb@truespectra.com | ||
252 | cpbeaure@calum.csclub.uwaterloo.ca | ||
253 | |||
254 | ===================================================================== | ||
255 | Appendix A: Sample /proc/mca | ||
256 | |||
257 | This is from my model 8595. Slot 1 contains the standard IBM SCSI | ||
258 | adapter, slot 3 is an Adaptec AHA-1640, slot 5 is a XGA-1 video adapter, | ||
259 | and slot 7 is the 3c523 Etherlink/MC. | ||
260 | |||
261 | /proc/mca/machine: | ||
262 | Model Id: 0xf8 | ||
263 | Submodel Id: 0x14 | ||
264 | BIOS Revision: 0x5 | ||
265 | |||
266 | /proc/mca/pos: | ||
267 | Slot 1: ff 8e f1 fc a0 ff ff ff IBM SCSI Adapter w/Cache | ||
268 | Slot 2: ff ff ff ff ff ff ff ff | ||
269 | Slot 3: 1f 0f 81 3b bf b6 ff ff | ||
270 | Slot 4: ff ff ff ff ff ff ff ff | ||
271 | Slot 5: db 8f 1d 5e fd c0 00 00 | ||
272 | Slot 6: ff ff ff ff ff ff ff ff | ||
273 | Slot 7: 42 60 ff 08 ff ff ff ff 3Com 3c523 Etherlink/MC | ||
274 | Slot 8: ff ff ff ff ff ff ff ff | ||
275 | Video : ff ff ff ff ff ff ff ff | ||
276 | SCSI : ff ff ff ff ff ff ff ff | ||
277 | |||
278 | /proc/mca/slot1: | ||
279 | Slot: 1 | ||
280 | Adapter Name: IBM SCSI Adapter w/Cache | ||
281 | Id: 8eff | ||
282 | Enabled: Yes | ||
283 | POS: ff 8e f1 fc a0 ff ff ff | ||
284 | Subsystem PUN: 7 | ||
285 | Detected at boot: Yes | ||
286 | |||
287 | /proc/mca/slot3: | ||
288 | Slot: 3 | ||
289 | Adapter Name: Unknown | ||
290 | Id: 0f1f | ||
291 | Enabled: Yes | ||
292 | POS: 1f 0f 81 3b bf b6 ff ff | ||
293 | |||
294 | /proc/mca/slot5: | ||
295 | Slot: 5 | ||
296 | Adapter Name: Unknown | ||
297 | Id: 8fdb | ||
298 | Enabled: Yes | ||
299 | POS: db 8f 1d 5e fd c0 00 00 | ||
300 | |||
301 | /proc/mca/slot7: | ||
302 | Slot: 7 | ||
303 | Adapter Name: 3Com 3c523 Etherlink/MC | ||
304 | Id: 6042 | ||
305 | Enabled: Yes | ||
306 | POS: 42 60 ff 08 ff ff ff ff | ||
307 | Revision: 0xe | ||
308 | IRQ: 9 | ||
309 | IO Address: 0x3300-0x3308 | ||
310 | Memory: 0xd8000-0xdbfff | ||
311 | Transceiver: External | ||
312 | Device: eth0 | ||
313 | Hardware Address: 02 60 8c 45 c4 2a | ||
diff --git a/Documentation/memory.txt b/Documentation/memory.txt new file mode 100644 index 00000000000..802efe58647 --- /dev/null +++ b/Documentation/memory.txt | |||
@@ -0,0 +1,33 @@ | |||
1 | There are several classic problems related to memory on Linux | ||
2 | systems. | ||
3 | |||
4 | 1) There are some motherboards that will not cache above | ||
5 | a certain quantity of memory. If you have one of these | ||
6 | motherboards, your system will be SLOWER, not faster | ||
7 | as you add more memory. Consider exchanging your | ||
8 | motherboard. | ||
9 | |||
10 | All of these problems can be addressed with the "mem=XXXM" boot option | ||
11 | (where XXX is the size of RAM to use in megabytes). | ||
12 | It can also tell Linux to use less memory than is actually installed. | ||
13 | If you use "mem=" on a machine with PCI, consider using "memmap=" to avoid | ||
14 | physical address space collisions. | ||
15 | |||
16 | See the documentation of your boot loader (LILO, grub, loadlin, etc.) about | ||
17 | how to pass options to the kernel. | ||
18 | |||
19 | There are other memory problems which Linux cannot deal with. Random | ||
20 | corruption of memory is usually a sign of serious hardware trouble. | ||
21 | Try: | ||
22 | |||
23 | * Reducing memory settings in the BIOS to the most conservative | ||
24 | timings. | ||
25 | |||
26 | * Adding a cooling fan. | ||
27 | |||
28 | * Not overclocking your CPU. | ||
29 | |||
30 | * Having the memory tested in a memory tester or exchanged | ||
31 | with the vendor. Consider testing it with memtest86 yourself. | ||
32 | |||
33 | * Exchanging your CPU, cache, or motherboard for one that works. | ||
diff --git a/Documentation/networking/3c359.txt b/Documentation/networking/3c359.txt new file mode 100644 index 00000000000..dadfe8147ab --- /dev/null +++ b/Documentation/networking/3c359.txt | |||
@@ -0,0 +1,58 @@ | |||
1 | |||
2 | 3COM PCI TOKEN LINK VELOCITY XL TOKEN RING CARDS README | ||
3 | |||
4 | Release 0.9.0 - Release | ||
5 | Jul 17th 2000 Mike Phillips | ||
6 | |||
7 | 1.2.0 - Final | ||
8 | Feb 17th 2002 Mike Phillips | ||
9 | Updated for submission to the 2.4.x kernel. | ||
10 | |||
11 | Thanks: | ||
12 | Terry Murphy from 3Com for tech docs and support, | ||
13 | Adam D. Ligas for testing the driver. | ||
14 | |||
15 | Note: | ||
16 | This driver will NOT work with the 3C339 Token Ring cards; you need | ||
17 | to use the tms380 driver instead. | ||
18 | |||
19 | Options: | ||
20 | |||
21 | The driver accepts three options: ringspeed, pkt_buf_sz and message_level. | ||
22 | |||
23 | These options can be specified differently for each card found. | ||
24 | |||
25 | ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will | ||
26 | make the card autosense the ringspeed and join at the appropriate speed, | ||
27 | this will be the default option for most people. 4 or 16 allow you to | ||
28 | explicitly force the card to operate at a certain speed. The card will fail | ||
29 | if you try to insert it at the wrong speed. (Although some hubs will allow | ||
30 | this so be *very* careful). The main purpose for explicitly setting the ring | ||
31 | speed is for when the card is first on the ring. In autosense mode, if the card | ||
32 | cannot detect any active monitors on the ring it will open at the same speed as | ||
33 | its last opening. This can be hazardous if this speed does not match the speed | ||
34 | you want the ring to operate at. | ||
35 | |||
36 | pkt_buf_sz: This is the initial receive buffer allocation size. This will | ||
37 | default to 4096 if no value is entered. You may increase performance of the | ||
38 | driver by setting this to a value larger than the network packet size, although | ||
39 | the driver now re-sizes buffers based on MTU settings as well. | ||
40 | |||
41 | message_level: Controls level of messages created by the driver. Defaults to 0: | ||
42 | which only displays start-up and critical messages. Presently any non-zero | ||
43 | value will display all soft messages as well. NB This does not turn | ||
44 | debugging messages on; that must be done by modifying the source code. | ||
45 | |||
46 | Variable MTU size: | ||
47 | |||
48 | The driver can handle a MTU size up to either 4500 or 18000 depending upon | ||
49 | ring speed. The driver also changes the size of the receive buffers as part | ||
50 | of the mtu re-sizing, so if you set mtu = 18000, you will need to be able | ||
51 | to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring | ||
52 | position = 296,000 bytes of memory space, plus of course anything | ||
53 | necessary for the tx sk_buff's. Remember this is per card, so if you are | ||
54 | building routers, gateway's etc, you could start to use a lot of memory | ||
55 | real fast. | ||
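The arithmetic above is easy to redo for other buffer sizes or card counts. A minimal sketch follows; the ring size of 16 and the 18500-byte estimate come from the text, while the function and macro names are invented for the example.

```c
#include <assert.h>

#define RX_RING_ENTRIES 16UL	/* ring positions, from the text */

/*
 * Rough receive-ring memory for a number of cards, given the estimated
 * bytes per ring position. Transmit sk_buffs are not counted.
 */
static unsigned long rx_ring_bytes(unsigned long cards,
				   unsigned long bytes_per_position)
{
	assert(cards > 0 && bytes_per_position > 0);
	return cards * RX_RING_ENTRIES * bytes_per_position;
}
```

rx_ring_bytes(1, 18500) gives the 296,000 bytes quoted above, and four such cards in a router would need about 1.18 MB for receive buffers alone.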
56 | |||
57 | 2/17/02 Mike Phillips | ||
58 | |||
diff --git a/Documentation/networking/olympic.txt b/Documentation/networking/olympic.txt new file mode 100644 index 00000000000..b95b5bf9675 --- /dev/null +++ b/Documentation/networking/olympic.txt | |||
@@ -0,0 +1,79 @@ | |||
1 | |||
2 | IBM PCI Pit/Pit-Phy/Olympic CHIPSET BASED TOKEN RING CARDS README | ||
3 | |||
4 | Release 0.2.0 - Release | ||
5 | June 8th 1999 Peter De Schrijver & Mike Phillips | ||
6 | Release 0.9.C - Release | ||
7 | April 18th 2001 Mike Phillips | ||
8 | |||
9 | Thanks: | ||
10 | Erik De Cock, Adrian Bridgett and Frank Fiene for their | ||
11 | patience and testing. | ||
12 | Donald Champion for the cardbus support | ||
13 | Kyle Lucke for the dma api changes. | ||
14 | Jonathon Bitner for hardware support. | ||
15 | Everybody on linux-tr for their continued support. | ||
16 | |||
17 | Options: | ||
18 | |||
19 | The driver accepts four options: ringspeed, pkt_buf_sz, | ||
20 | message_level and network_monitor. | ||
21 | |||
22 | These options can be specified differently for each card found. | ||
23 | |||
24 | ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will | ||
25 | make the card autosense the ringspeed and join at the appropriate speed, | ||
26 | this will be the default option for most people. 4 or 16 allow you to | ||
27 | explicitly force the card to operate at a certain speed. The card will fail | ||
28 | if you try to insert it at the wrong speed. (Although some hubs will allow | ||
29 | this so be *very* careful). The main purpose for explicitly setting the ring | ||
30 | speed is for when the card is first on the ring. In autosense mode, if the card | ||
31 | cannot detect any active monitors on the ring it will not open, so you must | ||
32 | re-init the card at the appropriate speed. Unfortunately at present the only | ||
33 | way of doing this is rmmod and insmod which is a bit tough if it is compiled | ||
34 | in the kernel. | ||
35 | |||
36 | pkt_buf_sz: This is the initial receive buffer allocation size. This will | ||
37 | default to 4096 if no value is entered. You may increase performance of the | ||
38 | driver by setting this to a value larger than the network packet size, although | ||
39 | the driver now re-sizes buffers based on MTU settings as well. | ||
40 | |||
41 | message_level: Controls level of messages created by the driver. Defaults to 0: | ||
42 | which only displays start-up and critical messages. Presently any non-zero | ||
43 | value will display all soft messages as well. NB This does not turn | ||
44 | debugging messages on; that must be done by modifying the source code. | ||
45 | |||
46 | network_monitor: Any non-zero value will provide a quasi network monitoring | ||
47 | mode. All unexpected MAC frames (beaconing etc.) will be received | ||
48 | by the driver and the source and destination addresses printed. | ||
49 | Also an entry will be added in /proc/net called olympic_tr%d, where tr%d | ||
50 | is the registered device name, i.e tr0, tr1, etc. This displays low | ||
51 | level information about the configuration of the ring and the adapter. | ||
52 | This feature has been designed for network administrators to assist in | ||
53 | the diagnosis of network / ring problems. (This used to be OLYMPIC_NETWORK_MONITOR, | ||
54 | but has now changed to allow each adapter to be configured differently and | ||
55 | to alleviate the necessity to re-compile olympic to turn the option on). | ||
56 | |||
57 | Multi-card: | ||
58 | |||
59 | The driver will detect multiple cards and will work with shared interrupts, | ||
60 | each card is assigned the next token ring device, i.e. tr0 , tr1, tr2. The | ||
61 | driver should also happily reside in the system with other drivers. It has | ||
62 | been tested with ibmtr.c running, and I personally have had one Olicom PCI | ||
63 | card and two IBM olympic cards (all on the same interrupt), all running | ||
64 | together. | ||
65 | |||
66 | Variable MTU size: | ||
67 | |||
68 | The driver can handle a MTU size up to either 4500 or 18000 depending upon | ||
69 | ring speed. The driver also changes the size of the receive buffers as part | ||
70 | of the mtu re-sizing, so if you set mtu = 18000, you will need to be able | ||
71 | to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring | ||
72 | position = 296,000 bytes of memory space, plus of course anything | ||
73 | necessary for the tx sk_buff's. Remember this is per card, so if you are | ||
74 | building routers, gateway's etc, you could start to use a lot of memory | ||
75 | real fast. | ||
76 | |||
77 | |||
78 | 6/8/99 Peter De Schrijver and Mike Phillips | ||
79 | |||
diff --git a/Documentation/networking/smctr.txt b/Documentation/networking/smctr.txt new file mode 100644 index 00000000000..9af25b810c1 --- /dev/null +++ b/Documentation/networking/smctr.txt | |||
@@ -0,0 +1,66 @@ | |||
1 | Text File for the SMC TokenCard TokenRing Linux driver (smctr.c). | ||
2 | By Jay Schulist <jschlst@samba.org> | ||
3 | |||
4 | The Linux SMC Token Ring driver works with the SMC TokenCard Elite (8115T) | ||
5 | ISA and SMC TokenCard Elite/A (8115T/A) MCA adapters. | ||
6 | |||
7 | Latest information on this driver can be obtained on the Linux-SNA WWW site. | ||
8 | Please point your browser to: http://www.linux-sna.org | ||
9 | |||
10 | This driver is rather simple to use. Select Y to Token Ring adapter support | ||
11 | in the kernel configuration. A choice for SMC Token Ring adapters will | ||
12 | appear. This driver supports all SMC ISA/MCA adapters. Choose this | ||
13 | option. I personally recommend compiling the driver as a module (M), but if | ||
14 | you would like to compile it statically answer Y instead. | ||
15 | |||
16 | This driver supports multiple adapters without the need to load multiple copies | ||
17 | of the driver. You should be able to load up to 7 adapters without any kernel | ||
18 | modifications, if you are in need of more please contact the maintainer of this | ||
19 | driver. | ||
20 | |||
21 | Load the driver either by lilo/loadlin or as a module. When loaded as a module, | ||
22 | the following command will suffice for most: | ||
23 | |||
24 | # modprobe smctr | ||
25 | smctr.c: v1.00 12/6/99 by jschlst@samba.org | ||
26 | tr0: SMC TokenCard 8115T at Io 0x300, Irq 10, Rom 0xd8000, Ram 0xcc000. | ||
27 | |||
28 | Now just set up the device via ifconfig and set any routes you may have. After | ||
29 | this you are ready to start sending some tokens. | ||
30 | |||
31 | Errata: | ||
32 | 1). For anyone wondering where to pick up the SMC adapters please browse | ||
33 | to http://www.smc.com | ||
34 | |||
35 | 2). If you are the first/only Token Ring Client on a Token Ring LAN, please | ||
36 | specify the ringspeed with the ringspeed=[4/16] module option. If no | ||
37 | ringspeed is specified the driver will attempt to autodetect the ring | ||
38 | speed and/or if the adapter is the first/only station on the ring take | ||
39 | the appropriate actions. | ||
40 | |||
41 | NOTE: Default ring speed is 16MB UTP. | ||
42 | |||
43 | 3). PnP support for this adapter sucks. I recommend hard setting the | ||
44 | IO/MEM/IRQ by the jumpers on the adapter. If this is not possible | ||
45 | load the module with the following io=[ioaddr] mem=[mem_addr] | ||
46 | irq=[irq_num]. | ||
47 | |||
48 | The following IRQ, IO, and MEM settings are supported. | ||
49 | |||
50 | IO ports: | ||
51 | 0x200, 0x220, 0x240, 0x260, 0x280, 0x2A0, 0x2C0, 0x2E0, 0x300, | ||
52 | 0x320, 0x340, 0x360, 0x380. | ||
53 | |||
54 | IRQs: | ||
55 | 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15 | ||
56 | |||
57 | Memory addresses: | ||
58 | 0xA0000, 0xA4000, 0xA8000, 0xAC000, 0xB0000, 0xB4000, | ||
59 | 0xB8000, 0xBC000, 0xC0000, 0xC4000, 0xC8000, 0xCC000, | ||
60 | 0xD0000, 0xD4000, 0xD8000, 0xDC000, 0xE0000, 0xE4000, | ||
61 | 0xE8000, 0xEC000, 0xF0000, 0xF4000, 0xF8000, 0xFC000 | ||
62 | |||
63 | This driver is under the GNU General Public License. Its Firmware image is | ||
64 | included as an initialized C-array and is licensed by SMC to the Linux | ||
65 | users of this driver. However no warranty about its fitness is expressed or | ||
66 | implied by SMC. | ||
diff --git a/Documentation/networking/tms380tr.txt b/Documentation/networking/tms380tr.txt new file mode 100644 index 00000000000..1f73e13058d --- /dev/null +++ b/Documentation/networking/tms380tr.txt | |||
@@ -0,0 +1,147 @@ | |||
1 | Text file for the Linux SysKonnect Token Ring ISA/PCI Adapter Driver. | ||
2 | Text file by: Jay Schulist <jschlst@samba.org> | ||
3 | |||
4 | The Linux SysKonnect Token Ring driver works with the SysKonnect TR4/16(+) ISA, | ||
5 | SysKonnect TR4/16(+) PCI, SysKonnect TR4/16 PCI, and older revisions of the | ||
6 | SK NET TR4/16 ISA card. | ||
7 | |||
8 | Latest information on this driver can be obtained on the Linux-SNA WWW site. | ||
9 | Please point your browser to: | ||
10 | http://www.linux-sna.org | ||
11 | |||
12 | Many thanks to Christoph Goos for his excellent work on this driver and | ||
13 | SysKonnect for donating the adapters to Linux-SNA for the testing and | ||
14 | maintenance of this device driver. | ||
15 | |||
16 | Important information to be noted: | ||
17 | 1. Adapters can be slow to open (~20 secs) and close (~5 secs), please be | ||
18 | patient. | ||
19 | 2. This driver works very well when autoprobing for adapters. Why even | ||
20 | think about those nasty io/int/dma settings of modprobe when the driver | ||
21 | will do it all for you! | ||
22 | |||
23 | This driver is rather simple to use. Select Y to Token Ring adapter support | ||
24 | in the kernel configuration. A choice for SysKonnect Token Ring adapters will | ||
25 | appear. This driver supports all SysKonnect ISA and PCI adapters. Choose this | ||
26 | option. I personally recommend compiling the driver as a module (M), but if | ||
27 | you would like to compile it statically answer Y instead. | ||
28 | |||
29 | This driver supports multiple adapters without the need to load multiple copies | ||
30 | of the driver. You should be able to load up to 7 adapters without any kernel | ||
31 | modifications, if you are in need of more please contact the maintainer of this | ||
32 | driver. | ||
33 | |||
34 | Load the driver either by lilo/loadlin or as a module. When loaded as a module, | ||
35 | the following command will suffice for most: | ||
36 | |||
37 | # modprobe sktr | ||
38 | |||
39 | This will produce output similar to the following: (Output is user specific) | ||
40 | |||
41 | sktr.c: v1.01 08/29/97 by Christoph Goos | ||
42 | tr0: SK NET TR 4/16 PCI found at 0x6100, using IRQ 17. | ||
43 | tr1: SK NET TR 4/16 PCI found at 0x6200, using IRQ 16. | ||
44 | tr2: SK NET TR 4/16 ISA found at 0xa20, using IRQ 10 and DMA 5. | ||
45 | |||
46 | Now just set up the device via ifconfig and set any routes you may have. After | ||
47 | this you are ready to start sending some tokens. | ||
48 | |||
49 | Errata: | ||
50 | For anyone wondering where to pick up the SysKonnect adapters please browse | ||
51 | to http://www.syskonnect.com | ||
52 | |||
53 | This driver is under the GNU General Public License. Its Firmware image is | ||
54 | included as an initialized C-array and is licensed by SysKonnect to the Linux | ||
55 | users of this driver. However no warranty about its fitness is expressed or | ||
56 | implied by SysKonnect. | ||
57 | |||
58 | Below find attached the setting for the SK NET TR 4/16 ISA adapters | ||
59 | ------------------------------------------------------------------- | ||
60 | |||
61 | *************************** | ||
62 | *** C O N T E N T S *** | ||
63 | *************************** | ||
64 | |||
65 | 1) Location of DIP-Switch W1 | ||
66 | 2) Default settings | ||
67 | 3) DIP-Switch W1 description | ||
68 | |||
69 | |||
70 | ============================================================== | ||
71 | CHAPTER 1 LOCATION OF DIP-SWITCH | ||
72 | ============================================================== | ||
73 | |||
74 | +------------------------------------------------------------------+ | ||
75 | |+------+                        +-----+                +---+      | | ||
76 | |+------+                     W1 +-----+      +----+    |   |      | | ||
77 | |+------+                                     |    |    |   |   +--++ | ||
78 | |+------+      +-----------+                  +----+    |   |   |  || | ||
79 | |+------+      |           |                  +---+     +---+   +--++ | ||
80 | |+------+      | TMS380C26 |                  |   |                | | ||
81 | |+------+      |           |                  +---+                +-+ | ||
82 | |+------+      |           |                                       | | | ||
83 | |              +-----------+                                       | | | ||
84 | |                                                                  | | | ||
85 | |                                                                  +-+ | ||
86 | |                                                                  | | ||
87 | |                                                                  | | ||
88 | |                                                                  | | ||
89 | |                                                                  | | ||
90 | +------------+----------------+--+-----------------------+---------+ | ||
91 |              +----------------+  +-----------------------+ | ||
92 | |||
93 | ============================================================== | ||
94 | CHAPTER 2 DEFAULT SETTINGS | ||
95 | ============================================================== | ||
96 | |||
97 | W1      1  2  3  4  5  6  7  8 | ||
98 | +------------------------------+ | ||
99 | | ON    X                      | | ||
100 | | OFF      X  X  X  X  X  X  X | | ||
101 | +------------------------------+ | ||
102 | |||
103 | W1.1       = ON   Adapter drives address lines SA17..19 | ||
104 | W1.2 - 1.5 = OFF  BootROM disabled | ||
105 | W1.6 - 1.8 = OFF  I/O address 0A20h | ||
106 | |||
107 | ============================================================== | ||
108 | CHAPTER 3 DIP SWITCH W1 DESCRIPTION | ||
109 | ============================================================== | ||
110 | |||
111 | +---+---+---+---+---+---+---+---+ ON | ||
112 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | | ||
113 | +---+---+---+---+---+---+---+---+ OFF | ||
114 | |AD | BootROM Addr. |    I/O    | | ||
115 | +-+-+-------+-------+-----+-----+ | ||
116 |   |         |             | | ||
117 |   |         |             +------ 6     7     8 | ||
118 |   |         |                     ON    ON    ON     1900h | ||
119 |   |         |                     ON    ON    OFF    0900h | ||
120 |   |         |                     ON    OFF   ON     1980h | ||
121 |   |         |                     ON    OFF   OFF    0980h | ||
122 |   |         |                     OFF   ON    ON     1b20h | ||
123 |   |         |                     OFF   ON    OFF    0b20h | ||
124 |   |         |                     OFF   OFF   ON     1a20h | ||
125 |   |         |                     OFF   OFF   OFF    0a20h (+) | ||
126 |   |         | | ||
127 |   |         | | ||
128 |   |         +-------- 2     3     4     5 | ||
129 |   |                   OFF   x     x     x      disabled (+) | ||
130 |   |                   ON    ON    ON    ON     C0000 | ||
131 |   |                   ON    ON    ON    OFF    C4000 | ||
132 |   |                   ON    ON    OFF   ON     C8000 | ||
133 |   |                   ON    ON    OFF   OFF    CC000 | ||
134 |   |                   ON    OFF   ON    ON     D0000 | ||
135 |   |                   ON    OFF   ON    OFF    D4000 | ||
136 |   |                   ON    OFF   OFF   ON     D8000 | ||
137 |   |                   ON    OFF   OFF   OFF    DC000 | ||
138 |   | | ||
139 |   | | ||
140 |   +----- 1 | ||
141 |          OFF   adapter does NOT drive SA<17..19> | ||
142 |          ON    adapter drives SA<17..19> (+) | ||
143 | |||
144 | |||
145 | (+) means default setting | ||
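
The switch table above can also be written as a small lookup, which may help when comparing a board's switch positions with the I/O address the driver reports. This is only a sketch: the struct and function names are invented for illustration and are not part of the sktr driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Positions of W1.1..W1.8; index 0 is unused so that on[n] matches W1.n.
 * Hypothetical helper type, not taken from the driver source. */
struct w1_switches {
    bool on[9];
};

/* I/O base selected by W1.6..W1.8, indexed by (sw6 << 2 | sw7 << 1 | sw8)
 * with ON = 1, copied from the table in chapter 3. */
static const uint16_t w1_io_base[8] = {
    0x0a20, /* OFF OFF OFF (default) */
    0x1a20, /* OFF OFF ON  */
    0x0b20, /* OFF ON  OFF */
    0x1b20, /* OFF ON  ON  */
    0x0980, /* ON  OFF OFF */
    0x1980, /* ON  OFF ON  */
    0x0900, /* ON  ON  OFF */
    0x1900, /* ON  ON  ON  */
};

uint16_t w1_io_address(const struct w1_switches *w)
{
    unsigned idx = (w->on[6] ? 4u : 0u) | (w->on[7] ? 2u : 0u) |
                   (w->on[8] ? 1u : 0u);
    return w1_io_base[idx];
}

/* Returns 0 when the BootROM is disabled (W1.2 OFF). */
uint32_t w1_bootrom_address(const struct w1_switches *w)
{
    if (!w->on[2])
        return 0;
    /* ON ON ON selects C0000; every OFF switch adds its binary weight
     * in steps of 0x4000. */
    unsigned step = (w->on[3] ? 0u : 4u) | (w->on[4] ? 0u : 2u) |
                    (w->on[5] ? 0u : 1u);
    return 0xC0000u + step * 0x4000u;
}

bool w1_drives_sa17_19(const struct w1_switches *w)
{
    return w->on[1];
}
```

With the default setting (only W1.1 ON), w1_io_address() yields 0x0a20 and w1_bootrom_address() yields 0, matching the rows marked (+).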
146 | |||
147 | ******************************** | ||
diff --git a/Documentation/nmi_watchdog.txt b/Documentation/nmi_watchdog.txt new file mode 100644 index 00000000000..bf9f80a9828 --- /dev/null +++ b/Documentation/nmi_watchdog.txt | |||
@@ -0,0 +1,83 @@ | |||
1 | |||
2 | [NMI watchdog is available for x86 and x86-64 architectures] | ||
3 | |||
4 | Is your system locking up unpredictably? No keyboard activity, just | ||
5 | a frustrating complete hard lockup? Do you want to help us debug | ||
6 | such lockups? If so, this document is definitely for you. | ||
7 | |||
8 | Many x86/x86-64 systems have a feature that enables | ||
9 | us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt, | ||
10 | which gets executed even if the system is otherwise locked up hard.) | ||
11 | This can be used to debug hard kernel lockups. By executing periodic | ||
12 | NMI interrupts, the kernel can monitor whether any CPU has locked up, | ||
13 | and print out debugging messages if so. | ||
14 | |||
15 | In order to use the NMI watchdog, you need to have APIC support in your | ||
16 | kernel. For SMP kernels, APIC support gets compiled in automatically. For | ||
17 | UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local | ||
18 | APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and | ||
19 | features -> IO-APIC support on uniprocessors) in your kernel config. | ||
20 | CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC. | ||
21 | CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain | ||
22 | kernel debugging options, such as Kernel Stack Meter or Kernel Tracer, | ||
23 | may implicitly disable the NMI watchdog.] | ||
24 | |||
25 | For x86-64, the needed APIC is always compiled in. | ||
26 | |||
27 | Using local APIC (nmi_watchdog=2) needs the first performance register, so | ||
28 | you can't use it for other purposes (such as high precision performance | ||
29 | profiling.) However, at least oprofile and the perfctr driver disable the | ||
30 | local APIC NMI watchdog automatically. | ||
31 | |||
32 | To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot | ||
33 | parameter. Eg. the relevant lilo.conf entry: | ||
34 | |||
35 | append="nmi_watchdog=1" | ||
36 | |||
37 | For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1. | ||
38 | For UP machines without an IO-APIC use nmi_watchdog=2; this only works | ||
39 | for some processor types. If in doubt, boot with nmi_watchdog=1 and | ||
40 | check the NMI count in /proc/interrupts; if the count is zero then | ||
41 | reboot with nmi_watchdog=2 and check the NMI count. If it is still | ||
42 | zero then report the problem; you probably have a processor that needs to be | ||
43 | added to the nmi code. | ||
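
The NMI count check described above can be automated. The sketch below assumes the /proc/interrupts layout in which the NMI row begins with "NMI:" followed by one counter per CPU; the function name is made up for this example, and the caller is expected to have read the file into a string already.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on the "NMI:" row of a /proc/interrupts dump.
 * Returns -1 if the text contains no NMI row.
 * (Illustrative helper, not a kernel or libc interface.)
 */
long nmi_count_total(const char *interrupts_text)
{
    const char *p = interrupts_text;

    while (p && *p) {
        const char *line = p;

        /* Skip leading blanks on this row. */
        while (*line == ' ' || *line == '\t')
            line++;

        if (strncmp(line, "NMI:", 4) == 0) {
            const char *q = line + 4;
            long total = 0;

            /* Add up consecutive decimal fields; stop at the description. */
            for (;;) {
                char *end;

                while (*q == ' ' || *q == '\t')
                    q++;
                if (!isdigit((unsigned char)*q))
                    break;
                total += strtol(q, &end, 10);
                q = end;
            }
            return total;
        }

        /* Advance to the start of the next row, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}
```

A result of zero after the machine has been up for a while suggests that the selected watchdog mode is not generating NMIs.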
44 | |||
45 | A 'lockup' is the following scenario: if any CPU in the system does not | ||
46 | execute the periodic local timer interrupt for more than 5 seconds, then | ||
47 | the NMI handler generates an oops and kills the process. This | ||
48 | 'controlled crash' (and the resulting kernel messages) can be used to | ||
49 | debug the lockup. Thus whenever the lockup happens, wait 5 seconds and | ||
50 | the oops will show up automatically. If the kernel produces no messages | ||
51 | then the system has crashed so hard (eg. hardware-wise) that either it | ||
52 | cannot even accept NMI interrupts, or the crash has made the kernel | ||
53 | unable to print messages. | ||
54 | |||
55 | Be aware that when using local APIC, the frequency of NMI interrupts | ||
56 | it generates depends on the system load. The local APIC NMI watchdog, | ||
57 | lacking a better source, uses the "cycles unhalted" event. As you may | ||
58 | guess it doesn't tick when the CPU is in the halted state (which happens | ||
59 | when the system is idle), but if your system locks up on anything but the | ||
60 | "hlt" processor instruction, the watchdog will trigger very soon as the | ||
61 | "cycles unhalted" event will happen every clock tick. If it locks up on | ||
62 | "hlt", then you are out of luck -- the event will not happen at all and the | ||
63 | watchdog won't trigger. This is a shortcoming of the local APIC watchdog | ||
64 | -- unfortunately there is no "clock ticks" event that would work all the | ||
65 | time. The I/O APIC watchdog is driven externally and has no such shortcoming. | ||
66 | But its NMI frequency is much higher, resulting in a more significant hit | ||
67 | to the overall system performance. | ||
68 | |||
69 | On x86 nmi_watchdog is disabled by default so you have to enable it with | ||
70 | a boot time parameter. | ||
71 | |||
72 | It's possible to disable the NMI watchdog at run time by writing "0" to | ||
73 | /proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable | ||
74 | the NMI watchdog. Note that you still need to use the "nmi_watchdog=" parameter | ||
75 | at boot time. | ||
76 | |||
77 | NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally | ||
78 | on x86 SMP boxes. | ||
79 | |||
80 | [ feel free to send bug reports, suggestions and patches to | ||
81 | Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing | ||
82 | list at <linux-smp@vger.kernel.org> ] | ||
83 | |||
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt new file mode 100644 index 00000000000..ad340205d96 --- /dev/null +++ b/Documentation/powerpc/phyp-assisted-dump.txt | |||
@@ -0,0 +1,127 @@ | |||
1 | |||
2 | Hypervisor-Assisted Dump | ||
3 | ------------------------ | ||
4 | November 2007 | ||
5 | |||
6 | The goal of hypervisor-assisted dump is to enable the dump of | ||
7 | a crashed system, and to do so from a fully-reset system, and | ||
8 | to minimize the total elapsed time until the system is back | ||
9 | in production use. | ||
10 | |||
11 | As compared to kdump or other strategies, hypervisor-assisted | ||
12 | dump offers several strong, practical advantages: | ||
13 | |||
14 | -- Unlike kdump, the system has been reset, and loaded | ||
15 | with a fresh copy of the kernel. In particular, | ||
16 | PCI and I/O devices have been reinitialized and are | ||
17 | in a clean, consistent state. | ||
18 | -- As the dump is performed, the dumped memory becomes | ||
19 | immediately available to the system for normal use. | ||
20 | -- After the dump is completed, no further reboots are | ||
21 | required; the system will be fully usable, and running | ||
22 | in its normal, production mode on its normal kernel. | ||
23 | |||
24 | The above can only be accomplished by coordination with, | ||
25 | and assistance from the hypervisor. The procedure is | ||
26 | as follows: | ||
27 | |||
28 | -- When a system crashes, the hypervisor will save | ||
29 | the low 256MB of RAM to a previously registered | ||
30 | save region. It will also save system state, system | ||
31 | registers, and hardware PTEs. | ||
32 | |||
33 | -- After the low 256MB area has been saved, the | ||
34 | hypervisor will reset PCI and other hardware state. | ||
35 | It will *not* clear RAM. It will then launch the | ||
36 | bootloader, as normal. | ||
37 | |||
38 | -- The freshly booted kernel will notice that there | ||
39 | is a new node (ibm,dump-kernel) in the device tree, | ||
40 | indicating that there is crash data available from | ||
41 | a previous boot. It will boot into only 256MB of RAM, | ||
42 | reserving the rest of system memory. | ||
43 | |||
44 | -- Userspace tools will parse /sys/kernel/release_region | ||
45 | and read /proc/vmcore to obtain the contents of memory, | ||
46 | which holds the previous crashed kernel. The userspace | ||
47 | tools may copy this info to disk, or network, NAS, SAN, | ||
48 | iSCSI, etc. as desired. | ||
49 | |||
50 | For example, the values in /sys/kernel/release_region | ||
51 | would look something like this (address-range pairs). | ||
52 | CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: / | ||
53 | DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A | ||
54 | |||
55 | -- As the userspace tools complete saving a portion of | ||
56 | dump, they echo an offset and size to | ||
57 | /sys/kernel/release_region to release the reserved | ||
58 | memory back to general use. | ||
59 | |||
60 | An example of this is: | ||
61 | "echo 0x40000000 0x10000000 > /sys/kernel/release_region" | ||
62 | which will release 256MB at the 1GB boundary. | ||
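
The release request in the last step is just an offset and a size written as text. A minimal sketch of how a userspace tool might build that string follows; the function name is hypothetical, and writing the result to /sys/kernel/release_region is left to the caller.

```c
#include <stdio.h>

/*
 * Format the "<offset> <size>" request that userspace writes to
 * /sys/kernel/release_region. Returns the number of characters produced
 * (excluding the terminating NUL), or -1 if buf is too small.
 * (Illustrative helper; not part of any existing tool.)
 */
int format_release_request(char *buf, size_t len,
                           unsigned long long offset,
                           unsigned long long size)
{
    int n = snprintf(buf, len, "0x%llx 0x%llx", offset, size);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For the example in the text, an offset of 1GB (1ULL << 30) and a size of 256MB (256ULL << 20) produce the string "0x40000000 0x10000000".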
63 | |||
64 | Please note that the hypervisor-assisted dump feature | ||
65 | is only available on Power6-based systems with recent | ||
66 | firmware versions. | ||
67 | |||
68 | Implementation details: | ||
69 | ---------------------- | ||
70 | |||
71 | During boot, a check is made to see if firmware supports | ||
72 | this feature on this particular machine. If it does, then | ||
73 | we check to see if an active dump is waiting for us. If yes | ||
74 | then everything but 256 MB of RAM is reserved during early | ||
75 | boot. This area is released once the dump has been collected | ||
76 | by userland scripts. If there is dump data, then | ||
77 | the /sys/kernel/release_region file is created, and | ||
78 | the reserved memory is held. | ||
79 | |||
80 | If there is no waiting dump data, then only the highest | ||
81 | 256MB of the ram is reserved as a scratch area. This area | ||
82 | is *not* released: this region will be kept permanently | ||
83 | reserved, so that it can act as a receptacle for a copy | ||
84 | of the low 256MB in the case a crash does occur. See, | ||
85 | however, "open issues" below, as to whether | ||
86 | such a reserved region is really needed. | ||
87 | |||
88 | Currently the dump will be copied from /proc/vmcore to | ||
89 | a new file upon user intervention. The starting address | ||
90 | to be read and the range for each data point are provided | ||
91 | in /sys/kernel/release_region. | ||
92 | |||
93 | The tools to examine the dump will be the same as the ones | ||
94 | used for kdump. | ||
95 | |||
96 | General notes: | ||
97 | -------------- | ||
98 | Security: please note that there are potential security issues | ||
99 | with any sort of dump mechanism. In particular, plaintext | ||
100 | (unencrypted) data, and possibly passwords, may be present in | ||
101 | the dump data. Userspace tools must take adequate precautions to | ||
102 | preserve security. | ||
103 | |||
104 | Open issues/ToDo: | ||
105 | ------------ | ||
106 | o The various code paths that tell the hypervisor that a crash | ||
107 | occurred, vs. it simply being a normal reboot, should be | ||
108 | reviewed, and possibly clarified/fixed. | ||
109 | |||
110 | o Instead of using /sys/kernel, should there be a /sys/dump | ||
111 | instead? There is a dump_subsys being created by the s390 code, | ||
112 | perhaps the pseries code should use a similar layout as well. | ||
113 | |||
114 | o Is reserving a 256MB region really required? The goal of | ||
115 | reserving a 256MB scratch area is to make sure that no | ||
116 | important crash data is clobbered when the hypervisor | ||
117 | saves low memory to the scratch area. But, if one could assure | ||
118 | that nothing important is located in some 256MB area, then | ||
119 | it would not need to be reserved. Something that can be | ||
120 | improved in subsequent versions. | ||
121 | |||
122 | o Still working with the kdump team to integrate this with kdump; | ||
123 | some work remains but this would not affect the current | ||
124 | patches. | ||
125 | |||
126 | o Still need to write a shell script, to copy the dump away. | ||
127 | Currently I am parsing it manually. | ||
diff --git a/Documentation/prio_tree.txt b/Documentation/prio_tree.txt new file mode 100644 index 00000000000..3aa68f9a117 --- /dev/null +++ b/Documentation/prio_tree.txt | |||
@@ -0,0 +1,107 @@ | |||
1 | The prio_tree.c code indexes vmas using 3 different indexes: | ||
2 | * heap_index = vm_pgoff + vm_size_in_pages : end_vm_pgoff | ||
3 | * radix_index = vm_pgoff : start_vm_pgoff | ||
4 | * size_index = vm_size_in_pages | ||
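
As a quick illustration of the three definitions, the sketch below derives all indices from a vma's page offset and size. The struct and function names are invented for the example; only the arithmetic comes from the list above.

```c
/* The three indices, following the definitions given above.
 * (Hypothetical type, not the kernel's struct prio_tree_node.) */
struct prio_indices {
    unsigned long radix_index; /* start_vm_pgoff */
    unsigned long size_index;  /* vm_size_in_pages */
    unsigned long heap_index;  /* end_vm_pgoff */
};

struct prio_indices prio_indices_of(unsigned long vm_pgoff,
                                    unsigned long vm_size_in_pages)
{
    struct prio_indices idx;

    idx.radix_index = vm_pgoff;
    idx.size_index = vm_size_in_pages;
    idx.heap_index = vm_pgoff + vm_size_in_pages;
    return idx;
}
```

For instance, vm_pgoff = 2 and vm_size_in_pages = 5 give heap_index = 7, which is the node written as [2,5,7] in the example tree further down.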
5 | |||
6 | A regular radix-priority-search-tree indexes vmas using only heap_index and | ||
7 | radix_index. The conditions for indexing are: | ||
8 | * ->heap_index >= ->left->heap_index && | ||
9 | ->heap_index >= ->right->heap_index | ||
10 | * if (->heap_index == ->left->heap_index) | ||
11 | then ->radix_index < ->left->radix_index; | ||
12 | * if (->heap_index == ->right->heap_index) | ||
13 | then ->radix_index < ->right->radix_index; | ||
14 | * nodes are hashed to left or right subtree using radix_index | ||
15 | similar to a pure binary radix tree. | ||
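
The ordering conditions can be checked mechanically. The following sketch verifies the heap conditions on a small tree; it uses a simplified node type invented for this example and does not check the radix placement rule from the last bullet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified node for illustration; not the kernel's prio_tree_node. */
struct rps_node {
    unsigned long radix_index;
    unsigned long heap_index;
    const struct rps_node *left, *right;
};

/* True if child may hang below parent under the heap conditions. */
static bool child_ok(const struct rps_node *parent,
                     const struct rps_node *child)
{
    if (child == NULL)
        return true;
    if (parent->heap_index < child->heap_index)
        return false;
    /* Equal heap indices are ordered by radix_index. */
    if (parent->heap_index == child->heap_index &&
        parent->radix_index >= child->radix_index)
        return false;
    return true;
}

/* Recursively check every parent/child pair in the tree. */
bool rps_heap_ordered(const struct rps_node *n)
{
    if (n == NULL)
        return true;
    return child_ok(n, n->left) && child_ok(n, n->right) &&
           rps_heap_ordered(n->left) && rps_heap_ordered(n->right);
}
```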
16 | |||
17 | A regular radix-priority-search-tree helps to store and query | ||
18 | intervals (vmas). However, a regular radix-priority-search-tree is only | ||
19 | suitable for storing vmas with different radix indices (vm_pgoff). | ||
20 | |||
21 | Therefore, the prio_tree.c extends the regular radix-priority-search-tree | ||
22 | to handle many vmas with the same vm_pgoff. Such vmas are handled in | ||
23 | 2 different ways: 1) All vmas with the same radix _and_ heap indices are | ||
24 | linked using vm_set.list, 2) if there are many vmas with the same radix | ||
25 | index, but different heap indices and if the regular radix-priority-search | ||
26 | tree cannot index them all, we build an overflow-sub-tree that indexes such | ||
27 | vmas using heap and size indices instead of heap and radix indices. For | ||
28 | example, in the figure below some vmas with vm_pgoff = 0 (zero) are | ||
29 | indexed by regular radix-priority-search-tree whereas others are pushed | ||
30 | into an overflow-subtree. Note that all vmas in an overflow-sub-tree have | ||
31 | the same vm_pgoff (radix_index) and if necessary we build different | ||
32 | overflow-sub-trees to handle each possible radix_index. For example, | ||
33 | in the figure we have 3 overflow-sub-trees corresponding to radix indices | ||
34 | 0, 2, and 4. | ||
35 | |||
36 | In the final tree the first few (prio_tree_root->index_bits) levels | ||
37 | are indexed using heap and radix indices whereas the overflow-sub-trees below | ||
38 | those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are | ||
39 | indexed using heap and size indices. In overflow-sub-trees the size_index | ||
40 | is used for hashing the nodes to appropriate places. | ||
41 | |||
42 | Now, an example prio_tree: | ||
43 | |||
44 | vmas are represented as [radix_index, size_index, heap_index] | ||
45 | i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff] | ||
46 | |||
47 | level prio_tree_root->index_bits = 3 | ||
48 | ----- | ||
49 |                                                                                        _ | ||
50 | 0                                            [0,7,7]                                   | | ||
51 |                                             /       \                                  | | ||
52 |                                 ------------         ------------                      |  Regular | ||
53 |                                /                                 \                     |  radix priority | ||
54 | 1                          [1,6,7]                             [4,3,7]                 |  search tree | ||
55 |                           /       \                           /       \                | | ||
56 |                        ---         ---                     ---         ---             |  heap-and-radix | ||
57 |                       /               \                   /               \            |  indexed | ||
58 | 2                 [0,6,6]           [2,5,7]           [5,2,7]           [6,1,7]        | | ||
59 |                  /       \         /       \         /       \         /       \       | | ||
60 | 3             [0,5,5]  [1,5,6]  [2,4,6]  [3,4,7]  [4,2,6]  [5,1,6]  [6,0,6]  [7,0,7]   | | ||
61 |                 /                 /                 /                                  _ | ||
62 |                /                 /                 /                                   _ | ||
63 | 4           [0,4,4]           [2,3,5]           [4,1,5]                                | | ||
64 |              /                 /                 /                                     | | ||
65 | 5         [0,3,3]           [2,2,4]           [4,0,4]                                  |  Overflow-sub-trees | ||
66 |            /                 /                                                         | | ||
67 | 6       [0,2,2]           [2,1,3]                                                      |  heap-and-size | ||
68 |          /                 /                                                           |  indexed | ||
69 | 7     [0,1,1]           [2,0,2]                                                        | | ||
70 |        /                                                                               | | ||
71 | 8   [0,0,0]                                                                            | | ||
72 |                                                                                        _ | ||
73 | |||
74 | Note that we use prio_tree_root->index_bits to optimize the height | ||
75 | of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is | ||
76 | set according to the maximum end_vm_pgoff mapped, we are sure that all | ||
77 | bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore, | ||
78 | we only use the first prio_tree_root->index_bits as radix_index. | ||
79 | Whenever index_bits is increased in prio_tree_expand, we shuffle the tree | ||
80 | to make sure that the first prio_tree_root->index_bits levels of the tree | ||
81 | are indexed properly using heap and radix indices. | ||
82 | |||
83 | We do not optimize the height of overflow-sub-trees using index_bits. | ||
84 | The reason is: there can be many such overflow-sub-trees and all of | ||
85 | them have to be shuffled whenever the index_bits increases. This may involve | ||
86 | walking the whole prio_tree in prio_tree_insert->prio_tree_expand code | ||
87 | path which is not desirable. Hence, we do not optimize the height of the | ||
88 | heap-and-size indexed overflow-sub-trees using prio_tree->index_bits. | ||
89 | Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits | ||
90 | of size_index. This may lead to skewed sub-trees because most of the | ||
91 | higher significant bits of the size_index are likely to be 0 (zero). In | ||
92 | the example above, all 3 overflow-sub-trees are skewed. This may marginally | ||
93 | affect the performance. However, processes rarely map many vmas with the | ||
94 | same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally | ||
95 | do not require overflow-sub-trees to index all vmas. | ||
96 | |||
97 | From the above discussion it is clear that the maximum height of | ||
98 | a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG. | ||
99 | However, in most of the common cases we do not need overflow-sub-trees, | ||
100 | so the tree height in the common cases will be prio_tree_root->index_bits. | ||
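
The relationship between the largest mapped end_vm_pgoff, index_bits and the worst-case height can be sketched as follows. The function names are invented, and the minimum of one bit is an assumption made for this example rather than something stated in this document.

```c
#include <limits.h>

/* Width of unsigned long in bits; stands in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Smallest number of bits b (assumed to be at least 1) such that
 * max_heap_index <= 2^b - 1, i.e. all higher bits of any vm_pgoff are zero.
 */
unsigned int index_bits_for(unsigned long max_heap_index)
{
    unsigned int bits = 1;

    /* The first test also prevents shifting by the full word width. */
    while (bits < SKETCH_BITS_PER_LONG && (max_heap_index >> bits) != 0)
        bits++;
    return bits;
}

/* Worst case: the radix levels plus one fully skewed overflow-sub-tree. */
unsigned int max_tree_height(unsigned int index_bits)
{
    return index_bits + SKETCH_BITS_PER_LONG;
}
```

In the example tree the largest heap_index is 7, so index_bits_for(7) returns 3, which agrees with the index_bits = 3 shown in the figure.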
101 | |||
102 | It is fair to mention here that the prio_tree_root->index_bits | ||
103 | is increased on demand, however, the index_bits is not decreased when | ||
104 | vmas are removed from the prio_tree. That's tricky to do. Hence, it's | ||
105 | left as a homework problem. | ||
106 | |||
107 | |||
diff --git a/Documentation/scsi/ibmmca.txt b/Documentation/scsi/ibmmca.txt new file mode 100644 index 00000000000..ac41a9fcac7 --- /dev/null +++ b/Documentation/scsi/ibmmca.txt | |||
@@ -0,0 +1,1402 @@ | |||
1 | |||
2 | -=< The IBM Microchannel SCSI-Subsystem >=- | ||
3 | |||
4 | for the IBM PS/2 series | ||
5 | |||
6 | Low Level Software-Driver for Linux | ||
7 | |||
8 | Copyright (c) 1995 Strom Systems, Inc. under the terms of the GNU | ||
9 | General Public License. Originally written by Martin Kolinek, December 1995. | ||
10 | Officially modified and maintained by Michael Lang since January 1999. | ||
11 | |||
12 | Version 4.0a | ||
13 | |||
14 | Last update: January 3, 2001 | ||
15 | |||
16 | Before you Start | ||
17 | ---------------- | ||
18 | This is the common README.ibmmca file for all driver releases of the | ||
19 | IBM MCA SCSI driver for Linux. Please note that driver releases 4.0 | ||
20 | or newer do not work with kernel versions older than 2.4.0, while driver | ||
21 | versions older than 4.0 do not work with kernels 2.4.0 or later! If you | ||
22 | try to compile your kernel with the wrong driver source, the | ||
23 | compilation is aborted and you get a corresponding error message. This is | ||
24 | no bug in the driver; it prevents you from using the wrong source code | ||
25 | with the wrong kernel version. | ||
26 | |||
27 | Authors of this Driver | ||
28 | ---------------------- | ||
29 | - Chris Beauregard (improvement of the SCSI-device mapping by the driver) | ||
30 | - Martin Kolinek (origin, first release of this driver) | ||
31 | - Klaus Kudielka (multiple SCSI-host management/detection, adaption to | ||
32 | Linux Kernel 2.1.x, module support) | ||
33 | - Michael Lang (assigning original pun/lun mapping, dynamical ldn | ||
34 | assignment, rewritten adapter detection, this file, | ||
35 | patches, official driver maintenance and subsequent | ||
36 | debugging, related with the driver) | ||
37 | |||
38 | Table of Contents | ||
39 | ----------------- | ||
40 | 1 Abstract | ||
41 | 2 Driver Description | ||
42 | 2.1 IBM SCSI-Subsystem Detection | ||
43 | 2.2 Physical Units, Logical Units, and Logical Devices | ||
44 | 2.3 SCSI-Device Recognition and dynamical ldn Assignment | ||
45 | 2.4 SCSI-Device Order | ||
46 | 2.5 Regular SCSI-Command-Processing | ||
47 | 2.6 Abort & Reset Commands | ||
48 | 2.7 Disk Geometry | ||
49 | 2.8 Kernel Boot Option | ||
50 | 2.9 Driver Module Support | ||
51 | 2.10 Multiple Hostadapter Support | ||
52 | 2.11 /proc/scsi-Filesystem Information | ||
53 | 2.12 /proc/mca-Filesystem Information | ||
54 | 2.13 Supported IBM SCSI-Subsystems | ||
55 | 2.14 Linux Kernel Versions | ||
56 | 3 Code History | ||
57 | 4 To do | ||
58 | 5 Users' Manual | ||
59 | 5.1 Commandline Parameters | ||
60 | 5.2 Troubleshooting | ||
61 | 5.3 Bug reports | ||
62 | 5.4 Support WWW-page | ||
63 | 6 References | ||
64 | 7 Credits to | ||
65 | 7.1 People | ||
66 | 7.2 Sponsors & Supporters | ||
67 | 8 Trademarks | ||
68 | 9 Disclaimer | ||
69 | |||
70 | * * * | ||
71 | |||
72 | 1 Abstract | ||
73 | ---------- | ||
74 | This README-file describes the IBM SCSI-subsystem low level driver for | ||
75 | Linux. The descriptions which were formerly kept in the source code have | ||
76 | been moved into this file to improve the code's readability. The driver | ||
77 | description has been updated, as most of the former description was already | ||
78 | quite outdated. The history of the driver development is also kept inside | ||
79 | here. Multiple historical developments have been summarized to shorten the | ||
80 | text size a bit. At the end of this file you can find a small manual for | ||
81 | this driver and hints to get it running on your machine. | ||
82 | |||
83 | 2 Driver Description | ||
84 | -------------------- | ||
85 | 2.1 IBM SCSI-Subsystem Detection | ||
86 | -------------------------------- | ||
87 | This is done in the ibmmca_detect() function. It first checks whether the | ||
88 | Microchannel-bus support is enabled, as the IBM SCSI-subsystem needs the | ||
89 | Microchannel. In a next step, a free interrupt is chosen and the main | ||
90 | interrupt handler is connected to it to handle answers of the SCSI- | ||
91 | subsystem(s). If the F/W SCSI-adapter is forced by the BIOS to use IRQ11 | ||
92 | instead of IRQ14, IRQ11 is used for the IBM SCSI-2 F/W adapter. In a | ||
93 | further step it is checked, if the adapter gets detected by force from | ||
94 | the kernel commandline, where the I/O port and the SCSI-subsystem id can | ||
95 | be specified. The next step checks if there is an integrated SCSI-subsystem | ||
96 | installed. This register area is fixed through all IBM PS/2 MCA-machines | ||
97 | and appears as something like a virtual slot 10 of the MCA-bus. On most | ||
98 | PS/2 machines, the POS registers of slot 10 are set to 0xff or 0x00 if no | ||
99 | integrated SCSI-controller is available. But on certain PS/2s, like model | ||
100 | 9595, this slot 10 is used to store other information which at an earlier | ||
101 | stage confused the driver and resulted in the detection of some ghost-SCSI. | ||
102 | If POS-register 2 and 3 are not 0x00 and not 0xff, but all other POS | ||
103 | registers are either 0xff or 0x00, there must be an integrated SCSI- | ||
104 | subsystem present and it will be registered as IBM Integrated SCSI- | ||
105 | Subsystem. The next step checks, if there is a slot-adapter installed on | ||
106 | the MCA-bus. To get this, the first two POS-registers, that represent the | ||
107 | adapter ID are checked. If they fit to one of the ids, stored in the | ||
108 | adapter list, a SCSI-subsystem is assumed to be found in a slot and will be | ||
109 | registered. This check is done through all possible MCA-bus slots to allow | ||
110 | more than one SCSI-adapter to be present in the PS/2-system and this is | ||
111 | already the first point of problems. Looking into the technical reference | ||
112 | manual for the IBM PS/2 common interfaces, the POS2 register must have | ||
113 | different interpretation of its single bits to avoid overlapping I/O | ||
114 | regions. While one can assume that the integrated subsystem has a fixed | ||
115 | I/O-address at 0x3540 - 0x3547, further installed IBM SCSI-adapters must | ||
116 | use a different I/O-address. This is expressed by bit 1 to 3 of POS2 | ||
117 | (multiplied by 8 + 0x3540). Bits 2 and 3 are reserved for the integrated | ||
118 | subsystem, but not for the adapters! The following list shows, how the | ||
119 | bits of POS2 and POS3 should be interpreted. | ||
120 | |||
121 | The POS2-register of all PS/2 models' integrated SCSI-subsystems has the | ||
122 | following interpretation of bits: | ||
123 | Bit 7 - 4 : Chip Revision ID (Release) | ||
124 | Bit 3 - 2 : Reserved | ||
125 | Bit 1 : 8k NVRAM Disabled | ||
126 | Bit 0 : Chip Enable (EN-Signal) | ||
127 | The POS3-register is interpreted as follows (for most IBM SCSI-subsys.): | ||
128 | Bit 7 - 5 : SCSI ID | ||
129 | Bit 4 - 0 : Reserved = 0 | ||
130 | The slot-adapters have different interpretation of these bits. The IBM SCSI | ||
131 | adapter (w/Cache) and the IBM SCSI-2 F/W adapter use the following | ||
132 | interpretation of the POS2 register: | ||
133 | Bit 7 - 4 : ROM Segment Address Select | ||
134 | Bit 3 - 1 : Adapter I/O Address Select (*8+0x3540) | ||
135 | Bit 0 : Adapter Enable (EN-Signal) | ||
136 | and for the POS3 register: | ||
137 | Bit 7 - 5 : SCSI ID | ||
138 | Bit 4 : Fairness Enable (SCSI ID3 f. F/W) | ||
139 | Bit 3 - 0 : Arbitration Level | ||
140 | The most modern product of the series is the IBM SCSI-2 F/W adapter. It | ||
141 | allows dual-bus SCSI and SCSI-wide addressing, which means PUNs may be | ||
142 | between 0 and 15. Here, Bit 4 is the high-order bit of the 4-bit wide | ||
143 | adapter PUN expression. In short, this means that IBM PS/2 machines | ||
144 | can only support 1 single integrated subsystem by default. Additional | ||
145 | slot-adapters get ports assigned by the automatic configuration tool. | ||
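
The bit layouts above translate directly into shifts and masks. The sketch below decodes POS2 and POS3 for a slot adapter and applies a literal reading of the detection rule for the integrated subsystem described earlier. All names are invented for illustration and the code is not taken from ibmmca.c; in particular, treating the rule as covering POS registers 0 to 7 is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IO_BASE 0x3540u /* fixed base of the integrated subsystem */

/* Decoded view of a slot adapter's POS2/POS3 (hypothetical type). */
struct slot_adapter_pos {
    bool     enabled;     /* POS2 bit 0 */
    uint16_t io_base;     /* POS2 bits 3-1, times 8, plus 0x3540 */
    uint8_t  rom_select;  /* POS2 bits 7-4 */
    uint8_t  scsi_id;     /* POS3 bits 7-5 */
    bool     fairness;    /* POS3 bit 4 */
    uint8_t  arbitration; /* POS3 bits 3-0 */
};

struct slot_adapter_pos decode_slot_adapter(uint8_t pos2, uint8_t pos3)
{
    struct slot_adapter_pos a;

    a.enabled     = (pos2 & 0x01) != 0;
    a.io_base     = (uint16_t)(SKETCH_IO_BASE + ((pos2 >> 1) & 0x07) * 8);
    a.rom_select  = (uint8_t)(pos2 >> 4);
    a.scsi_id     = (uint8_t)(pos3 >> 5);
    a.fairness    = ((pos3 >> 4) & 0x01) != 0;
    a.arbitration = (uint8_t)(pos3 & 0x0f);
    return a;
}

static bool pos_is_blank(uint8_t v)
{
    return v == 0x00 || v == 0xff;
}

/*
 * Literal reading of the rule for virtual slot 10: POS2 and POS3 carry
 * data while every other POS register is 0x00 or 0xff.
 */
bool looks_like_integrated_scsi(const uint8_t pos[8])
{
    if (pos_is_blank(pos[2]) || pos_is_blank(pos[3]))
        return false;
    for (int i = 0; i < 8; i++)
        if (i != 2 && i != 3 && !pos_is_blank(pos[i]))
            return false;
    return true;
}
```

Decoding POS2 = 5 with the slot-adapter layout gives an I/O base of 0x3550, which shows why an integrated subsystem reporting that value must not be decoded this way: its bits 3 and 2 are reserved and its base is always 0x3540.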
146 | |||
147 | One day I found a patch in ibmmca_detect(), forcing the I/O-address to be | ||
148 | 0x3540 for integrated SCSI-subsystems. A remark there noted that on | ||
149 | integrated IBM SCSI-subsystems of model 56, the POS2 register was showing 5. | ||
150 | This means that for these models, POS2 really has to be interpreted | ||
151 | according to the technical reference guide. In this case, bit 2 (value 4) is | ||
152 | a reserved bit and may not be interpreted. These differences between the | ||
153 | adapters and the integrated controllers are taken into account by the | ||
154 | detection routine of the driver in versions newer than 3.0g. | ||
155 | |||
156 | Every time a SCSI-subsystem is discovered, the ibmmca_register() function | ||
157 | is called. This function checks first, if the requested area for the I/O- | ||
158 | address of this SCSI-subsystem is still available and assigns this I/O- | ||
159 | area to the SCSI-subsystem. There are always 8 sequential I/O-addresses | ||
160 | taken for each individual SCSI-subsystem found, which are: | ||
161 | |||
162 | Offset  Type                          Permissions | ||
163 |   0     Command Interface Register 1  Read/Write | ||
164 |   1     Command Interface Register 2  Read/Write | ||
165 |   2     Command Interface Register 3  Read/Write | ||
166 |   3     Command Interface Register 4  Read/Write | ||
167 |   4     Attention Register            Read/Write | ||
168 |   5     Basic Control Register        Read/Write | ||
169 |   6     Interrupt Status Register     Read | ||
170 |   7     Basic Status Register         Read | ||
171 | |||
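The register table maps naturally onto an enumeration of offsets from the subsystem's I/O base. The identifiers below are made up for this sketch and do not match the macro names used in the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets of the eight registers from the I/O base (illustrative names). */
enum subsys_reg {
    REG_COMMAND_1        = 0, /* read/write */
    REG_COMMAND_2        = 1, /* read/write */
    REG_COMMAND_3        = 2, /* read/write */
    REG_COMMAND_4        = 3, /* read/write */
    REG_ATTENTION        = 4, /* read/write */
    REG_BASIC_CONTROL    = 5, /* read/write */
    REG_INTERRUPT_STATUS = 6, /* read only */
    REG_BASIC_STATUS     = 7, /* read only */
    REG_COUNT            = 8
};

/* Port address of a register for a subsystem located at io_base. */
uint16_t subsys_port(uint16_t io_base, enum subsys_reg reg)
{
    return (uint16_t)(io_base + (uint16_t)reg);
}

/* Only the two status registers at offsets 6 and 7 are read-only. */
bool subsys_reg_writable(enum subsys_reg reg)
{
    return reg <= REG_BASIC_CONTROL;
}
```

For the integrated subsystem at 0x3540, the Basic Status Register is therefore at port 0x3547, the last address of the 0x3540 - 0x3547 range mentioned earlier.
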
172 | After the I/O-address range is assigned, the host-adapter is assigned | ||
173 | to a local structure which keeps all adapter information needed for the | ||
174 | driver itself and the mid- and higher-level SCSI-drivers. The SCSI pun/lun | ||
175 | and the adapters' ldn tables are initialized and get probed afterwards by | ||
176 | the check_devices() function. If no further adapters are found, | ||
177 | ibmmca_detect() quits. | ||
178 | |||
179 | 2.2 Physical Units, Logical Units, and Logical Devices | ||
180 | ------------------------------------------------------ | ||
181 | There can be up to 56 devices on the SCSI bus (besides the adapter): | ||
182 | there are up to 7 "physical units" (each identified by physical unit | ||
183 | number or pun, also called the scsi id, this is the number you select | ||
184 | with hardware jumpers), and each physical unit can have up to 8 | ||
185 | "logical units" (each identified by logical unit number, or lun, | ||
186 | between 0 and 7). The IBM SCSI-2 F/W adapter offers this on up to two | ||
187 | busses and provides support for 30 logical devices at the same time, where | ||
188 | in wide-addressing mode you can have 16 puns with 32 luns on each device. | ||
189 | This section describes the handling of devices on non-F/W adapters. | ||
190 | Just imagine, that you can have 16 * 32 = 512 devices on a F/W adapter | ||
191 | which means a lot of possible devices for such a small machine. | ||
192 | |||
193 | Typically the adapter has pun=7, so the puns of the other physical | ||
194 | units are between 0 and 6 (or up to 15 on a wide adapter). On a wide | ||
195 | adapter a pun higher than 7 is possible, but is normally not used. Almost | ||
196 | all physical units have only one logical unit, with lun=0. A CD-ROM jukebox | ||
197 | would be an example of a physical unit with more than one logical unit. | ||
198 | |||
199 | The embedded microprocessor of the IBM SCSI-subsystem hides the complex | ||
200 | two-dimensional (pun,lun) organization from the operating system. | ||
201 | When the machine is powered-up (or rebooted), the embedded microprocessor | ||
202 | checks, on its own, all 56 possible (pun,lun) combinations, and the first | ||
203 | 15 devices found are assigned into a one-dimensional array of so-called | ||
204 | "logical devices", identified by "logical device numbers" or ldn. The last | ||
205 | ldn=15 is reserved for the subsystem itself. Wide adapters may have | ||
206 | to check up to 15 * 8 = 120 pun/lun combinations. | ||
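
The power-on scan described above can be sketched as follows, in Python for
illustration only. The function names are hypothetical, and the scan order
(pun ascending, then lun ascending) is an assumption; the text only states
that the first 15 devices found receive an ldn.

```python
ADAPTER_PUN = 7      # typical adapter id, as stated in the text
USABLE_LDNS = 15     # ldn 0..14; ldn 15 is reserved for the subsystem


def all_pun_lun(adapter_pun=ADAPTER_PUN):
    """All (pun, lun) pairs on a narrow bus, excluding the adapter itself."""
    return [(pun, lun)
            for pun in range(8) if pun != adapter_pun
            for lun in range(8)]


def power_on_ldn_table(present, adapter_pun=ADAPTER_PUN):
    """Return {ldn: (pun, lun)} for the first 15 devices that are present.

    'present' is the set of (pun, lun) pairs that answer on the bus.
    The scan order is assumed, not documented.
    """
    table = {}
    for pun_lun in all_pun_lun(adapter_pun):
        if pun_lun in present and len(table) < USABLE_LDNS:
            table[len(table)] = pun_lun
    return table
```

The list returned by all_pun_lun() has 7 * 8 = 56 entries, matching the
count given above. Any device beyond the fifteenth simply gets no ldn in
this model.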
207 | |||
208 | 2.3 SCSI-Device Recognition and Dynamical ldn Assignment | ||
209 | -------------------------------------------------------- | ||
210 | One consequence of information hiding is that the real (pun,lun) | ||
211 | numbers are also hidden. The two possibilities to get around this problem | ||
212 | are to offer fake pun/lun combinations to the operating system or to | ||
213 | delete the whole mapping of the adapter and to reassign the ldns, using | ||
214 | the immediate assign command of the SCSI-subsystem for probing through | ||
215 | all possible pun/lun combinations. An ldn is a "logical device number" | ||
216 | which is used by IBM SCSI-subsystems to access some valid SCSI-device. | ||
217 | At the beginning of the development of this driver, the following approach | ||
218 | was used: | ||
219 | |||
220 | First, the driver checked the ldn's (0 to 6) to find out which ldn's | ||
221 | have devices assigned. This was done by the functions check_devices() and | ||
222 | device_exists(). The interrupt handler has a special paragraph of code | ||
223 | (see local_checking_phase_flag) to assist in the checking. Assume, for | ||
224 | example, that three logical devices were found assigned at ldn 0, 1, 2. | ||
225 | These are presented to the upper layer of the Linux SCSI driver | ||
226 | as devices with bogus (pun, lun) equal to (0,0), (1,0), (2,0). | ||
227 | On the other hand, if the upper layer issues a command to a device, | ||
228 | say (4,0), this driver returns a DID_NO_CONNECT error. | ||
229 | |||
230 | In a second step of the driver development, the following improvement was | ||
231 | applied: because the first approach limited the number of devices to 7, far | ||
232 | fewer than the 15 that could be used, the driver simply mapped ldn -> | ||
233 | (ldn/8,ldn%8) for pun,lun. We ended up with a real mishmash of puns | ||
234 | and luns, but it all seemed to work. | ||
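
The two historical fake mappings can be written down in a few lines. This is
an illustrative Python sketch; the function names are hypothetical and do
not appear in the driver.

```python
def fake_pun_lun_first(ldn):
    """First approach: ldn 0..6 is reported as the bogus pair (ldn, 0)."""
    if not 0 <= ldn <= 6:
        raise ValueError("first approach only covers ldn 0..6")
    return (ldn, 0)


def fake_pun_lun_second(ldn):
    """Second approach: ldn 0..14 is reported as (ldn / 8, ldn % 8)."""
    if not 0 <= ldn <= 14:
        raise ValueError("ldn must be in 0..14")
    return divmod(ldn, 8)
```

Under the second scheme, ldn 9 shows up as (1,1) and ldn 14 as (1,6),
whatever the real pun and lun of the device are. That is the "mishmash"
mentioned above.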
235 | |||
236 | The latest development, which is implemented from driver version 3.0 | ||
237 | onward, realizes the device recognition in the following way: | ||
238 | The physical SCSI-devices on the SCSI-bus are probed via immediate_assign- | ||
239 | and device_inquiry-commands, all of which is implemented in a completely | ||
240 | rewritten check_devices() subroutine. This delivers an exact map of the physical | ||
241 | SCSI-world that is now stored in the get_scsi[][]-array. This means | ||
242 | that the once hidden pun,lun assignment is now known to this driver. | ||
243 | It no longer relies on the default settings of the subsystem and maps all | ||
244 | ldns to existing pun,lun "by hand". This assures full control of the ldn | ||
245 | mapping and allows dynamical remapping of ldns to different pun,lun, if | ||
246 | there are more SCSI-devices installed than ldns available (n>15). The | ||
247 | ldns from 0 to 6 get 'hardwired' by this driver to puns 0 to 7 at lun=0, | ||
248 | excluding the pun of the subsystem. This assures that at least simple | ||
249 | SCSI-installations have optimum access-speed and are not touched by | ||
250 | dynamical remapping. The ldns 7 to 14 are put to existing devices with | ||
251 | lun>0 or to non-existing devices, in order to satisfy the subsystem, if | ||
252 | there are fewer than 15 SCSI-devices connected. In the case of more than 15 | ||
253 | devices, the dynamical mapping becomes active. If get_scsi[][] reports a | ||
254 | device as existing, but it has no ldn assigned, it gets an ldn from the range 7 | ||
255 | to 14. The numbers are assigned in cyclic order, therefore it takes 8 | ||
256 | dynamical reassignments on the SCSI-devices until a certain device | ||
257 | loses its ldn again. This assures that dynamical remapping is avoided | ||
258 | during intense I/O between up to 15 SCSI-devices (that is, pun,lun | ||
259 | combinations). A further advantage of this method is that people who | ||
260 | build their kernel without probing on all luns will get what they expect, | ||
261 | because the driver just won't assign anything with lun>0 when | ||
262 | multiple lun probing is inactive. | ||
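
The cyclic hand-out of ldns 7 to 14 can be modelled roughly like this. It is
a Python sketch for illustration only: the class and its members are
hypothetical, and the real driver must also issue immediate_assign commands
to the subsystem, which is left out here.

```python
FIRST_DYNAMIC_LDN = 7
DYNAMIC_LDN_COUNT = 8    # ldns 7..14


class DynamicLdnPool:
    """Hands out ldns 7..14 in cyclic order to (pun, lun) pairs."""

    def __init__(self):
        # owner[ldn] is the (pun, lun) currently mapped to that ldn.
        self.owner = {FIRST_DYNAMIC_LDN + i: None
                      for i in range(DYNAMIC_LDN_COUNT)}
        self._next = FIRST_DYNAMIC_LDN

    def ldn_for(self, pun_lun):
        """Return the ldn of a device, reassigning one if it has none."""
        for ldn, dev in self.owner.items():
            if dev == pun_lun:
                return ldn           # already mapped, nothing to do
        ldn = self._next             # take the next ldn in the cycle
        self.owner[ldn] = pun_lun    # the previous owner loses its ldn
        offset = (ldn - FIRST_DYNAMIC_LDN + 1) % DYNAMIC_LDN_COUNT
        self._next = FIRST_DYNAMIC_LDN + offset
        return ldn
```

In this model a device that just received ldn 7 keeps it while seven other
devices are mapped to ldns 8 to 14. Only the eighth reassignment wraps
around and takes ldn 7 away again, which is the behaviour described above.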
263 | |||
264 | 2.4 SCSI-Device Order | ||
265 | --------------------- | ||
266 | Because of the now correct recognition of physical pun,lun, and | ||
267 | their report to mid-level- and higher-level-drivers, the newly reported puns | ||
268 | can be different from the old, faked puns. Therefore, Linux may | ||
269 | change /dev/sdXXX assignments and prompt you for corrupted superblock | ||
270 | repair at boot time. In this case DO NOT PANIC, YOUR DISKS ARE STILL OK!!! | ||
271 | You have to reboot (CTRL-D) with an old kernel and set the /etc/fstab-file | ||
272 | entries right. After that, the system should come up as error-free as before. | ||
273 | If your boot-partition is not coming up, also edit the /etc/lilo.conf-file | ||
274 | in a Linux session booted on the old kernel and run lilo before rebooting. Check | ||
275 | lilo.conf anyway to get booting of other partitions with foreign OSes right | ||
276 | again. However, this driver has a feature that allows you to change | ||
277 | the assignment order of the SCSI-devices by flipping the PUN-assignment. | ||
278 | See the next paragraph for a description. | ||
279 | |||
280 | The reason for this is that Linux does not assign the SCSI-devices in the | ||
281 | way described in the ANSI-SCSI-standard. Linux assigns /dev/sda to | ||
282 | the device with the lowest id, starting at id 0. But the first drive should be at id 6, | ||
283 | because for historical reasons, a drive at id 6 has, by hardware, the highest | ||
284 | priority and a drive at id 0 the lowest. IBM was one of the rare producers | ||
285 | whose BIOS assigns drives according to the ANSI-SCSI-standard. Most | ||
286 | other producers' BIOSes do not (I think even the Adaptec-BIOS). The | ||
287 | IBMMCA_SCSI_ORDER_STANDARD flag, which you set while configuring the | ||
288 | kernel, lets you choose the preferred way of SCSI-device-assignment. | ||
289 | Defining this flag results in Linux determining the devices in the | ||
290 | same order as DOS and OS/2 do on your MCA-machine. This is also standard | ||
291 | on most industrial computers and OSes, e.g. OS-9. Leaving this flag | ||
292 | undefined will get your devices ordered in the default way of Linux. See | ||
293 | also the remarks of Chris Beauregard from Dec 15, 1997 and the followups | ||
294 | in section 3. | ||
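
The effect of the flag on disk naming can be illustrated with a small Python
sketch. It is a simplification under stated assumptions: only disks at lun=0
are considered, and probe_order() and disk_names() are hypothetical helpers,
not driver functions.

```python
def probe_order(puns, order_standard):
    """Order in which disk puns are reported to the upper layers.

    order_standard=True models IBMMCA_SCSI_ORDER_STANDARD being defined
    (highest id first, as the IBM BIOS does); False models the Linux
    default (lowest id first).
    """
    return sorted(puns, reverse=order_standard)


def disk_names(puns, order_standard):
    """Map each disk pun to the /dev/sdX name it would receive."""
    ordered = probe_order(puns, order_standard)
    return {pun: "/dev/sd" + chr(ord("a") + i)
            for i, pun in enumerate(ordered)}
```

For two disks at id 6 and id 0, the disk at id 6 becomes /dev/sda when the
flag is defined, while the disk at id 0 becomes /dev/sda when it is not.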
295 | |||
296 | 2.5 Regular SCSI-Command-Processing | ||
297 | ----------------------------------- | ||
298 | Only three functions get involved: ibmmca_queuecommand(), issue_cmd(), | ||
299 | and interrupt_handler(). | ||
300 | |||
301 | The upper layer issues a scsi command by calling the function | ||
302 | ibmmca_queuecommand(). This function fills a "subsystem control block" | ||
303 | (scb) and calls a local function issue_cmd(), which writes the scb | ||
304 | command into the subsystem I/O ports. Once the scb command is carried out, | ||
305 | the interrupt_handler() is invoked. If a device is found to exist | ||
306 | but has no ldn assigned, it gets one dynamically. | ||
307 | All of this is done in ibmmca_queuecommand(). | ||
308 | |||
309 | 2.6 Abort & Reset Commands | ||
310 | -------------------------- | ||
311 | These are implemented with busy waiting for the interrupt to arrive. | ||
312 | ibmmca_reset() and ibmmca_abort() do not yet work sufficiently well | ||
313 | and still need a lot of development work. This seems | ||
314 | to be a problem with other low-level SCSI drivers too; however, | ||
315 | this should be no excuse. | ||
316 | |||
317 | 2.7 Disk Geometry | ||
318 | ----------------- | ||
319 | The ibmmca_biosparams() function should return the same disk geometry | ||
320 | as the bios. This is needed for fdisk, etc. The returned geometry is | ||
321 | certainly correct for disks smaller than 1 gigabyte. In the meantime, | ||
322 | it has been shown that this works fine even with disks larger than | ||
323 | 1 gigabyte. | ||
324 | |||
325 | 2.8 Kernel Boot Option | ||
326 | ---------------------- | ||
327 | The function ibmmca_scsi_setup() is called if option ibmmcascsi=n | ||
328 | is passed to the kernel. See file linux/init/main.c for details. | ||
329 | |||
330 | 2.9 Driver Module Support | ||
331 | ------------------------- | ||
332 | Module support is implemented and was tested by K. Kudielka. It will probably | ||
333 | not work on kernels <2.1.0. | ||
334 | |||
335 | 2.10 Multiple Hostadapter Support | ||
336 | --------------------------------- | ||
337 | This driver supports up to eight interfaces of type IBM-SCSI-Subsystem. | ||
338 | Integrated-, and MCA-adapters are automatically recognized. Unrecognizable | ||
339 | IBM-SCSI-Subsystem interfaces can be specified as kernel-parameters. | ||
340 | |||
341 | 2.11 /proc/scsi-Filesystem Information | ||
342 | -------------------------------------- | ||
343 | Information about the driver condition is given in | ||
344 | /proc/scsi/ibmmca/<host_no>. ibmmca_proc_info() provides this information. | ||
345 | |||
346 | This table is quite informative for interested users. It shows the load | ||
347 | of commands on the subsystem and whether you are running the bypassed | ||
348 | (software) or integrated (hardware) SCSI-command set (see below). The | ||
349 | number of accesses is shown. Read, write and modeselect are shown separately | ||
350 | in order to help debugging problems with CD-ROMs or tapedrives. | ||
351 | |||
352 | The following table shows the list of 15 logical device numbers that are | ||
353 | used by the SCSI-subsystem. The load on each ldn is shown in the table; | ||
354 | again, read and write commands are split. The last column shows the number | ||
355 | of reassignments that have been applied to the ldns, if you have more than | ||
356 | 15 pun/lun combinations available on the SCSI-bus. | ||
357 | |||
358 | The last two tables show the pun/lun map and the positions of the ldns | ||
359 | on this pun/lun map. This may change during operation, when an ldn is | ||
360 | reassigned to another pun/lun combination. If the necessity for dynamical | ||
361 | assignments is set to 'no', the ldn structure stays static. | ||
362 | |||
363 | 2.12 /proc/mca-Filesystem Information | ||
364 | ------------------------------------- | ||
365 | The slot-file contains all default entries and in addition chip and I/O- | ||
366 | address information of the SCSI-subsystem. This information is provided | ||
367 | by ibmmca_getinfo(). | ||
368 | |||
369 | 2.13 Supported IBM SCSI-Subsystems | ||
370 | ---------------------------------- | ||
371 | The following IBM SCSI-subsystems are supported by this driver: | ||
372 | |||
373 | - IBM Fast/Wide SCSI-2 Adapter | ||
374 | - IBM 7568 Industrial Computer SCSI Adapter w/Cache | ||
375 | - IBM Expansion Unit SCSI Controller | ||
376 | - IBM SCSI Adapter w/Cache | ||
377 | - IBM SCSI Adapter | ||
378 | - IBM Integrated SCSI Controller | ||
379 | - All clones, 100% compatible with the chipset and subsystem command | ||
380 | system of IBM SCSI-adapters (forced detection) | ||
381 | |||
382 | 2.14 Linux Kernel Versions | ||
383 | -------------------------- | ||
384 | The IBM SCSI-subsystem low level driver is prepared to be used with | ||
385 | all versions of Linux between 2.0.x and 2.4.x. The compatibility checks | ||
386 | are fully implemented from version 3.1e of the driver onward. This means that | ||
387 | you just need the latest ibmmca.h and ibmmca.c files and copy them into the | ||
388 | linux/drivers/scsi directory. The code is automatically adapted during | ||
389 | kernel compilation. This is different from kernel 2.4.0 on! Here version | ||
390 | 4.0 or later of the driver must be used for kernel 2.4.0 or later. Version | ||
391 | 4.0 or later does not work together with older kernels! Driver versions | ||
392 | older than 4.0 do not work together with kernel 2.4.0 or later. They work | ||
393 | on all older kernels. | ||
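
The pairing rule between driver and kernel versions can be condensed into a
one-line check. This is an illustrative Python sketch; the function name and
the tuple representation of versions are assumptions, and letter suffixes
such as "3.1e" are ignored.

```python
def driver_matches_kernel(driver_version, kernel_version):
    """True if the driver/kernel pair is one the text describes as working.

    Versions are tuples, e.g. (4, 0) for the driver and (2, 4, 0) for the
    kernel. Driver 4.0 or later pairs only with kernel 2.4.0 or later;
    older drivers pair only with older kernels.
    """
    new_driver = driver_version >= (4, 0)
    new_kernel = kernel_version >= (2, 4, 0)
    return new_driver == new_kernel
```

The check only encodes the split at driver 4.0 and kernel 2.4.0. It says
nothing about kernels outside the 2.0.x to 2.4.x range covered above.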
394 | |||
395 | 3 Code History | ||
396 | -------------- | ||
397 | Jan 15 1996: First public release. | ||
398 | - Martin Kolinek | ||
399 | |||
400 | Jan 23 1996: Scrapped code which reassigned scsi devices to logical | ||
401 | device numbers. Instead, the existing assignment (created | ||
402 | when the machine is powered-up or rebooted) is used. | ||
403 | A side effect is that the upper layer of Linux SCSI | ||
404 | device driver gets bogus scsi ids (this is benign), | ||
405 | and also the hard disks are ordered under Linux the | ||
406 | same way as they are under dos (i.e., C: disk is sda, | ||
407 | D: disk is sdb, etc.). | ||
408 | - Martin Kolinek | ||
409 | |||
410 | I think that the CD-ROM is now detected only if a CD is | ||
411 | inside CD_ROM while Linux boots. This can be fixed later, | ||
412 | once the driver works on all types of PS/2's. | ||
413 | - Martin Kolinek | ||
414 | |||
415 | Feb 7 1996: Modified biosparam function. Fixed the CD-ROM detection. | ||
416 | For now, devices other than harddisk and CD_ROM are | ||
417 | ignored. Temporarily modified abort() function | ||
418 | to behave like reset(). | ||
419 | - Martin Kolinek | ||
420 | |||
421 | Mar 31 1996: The integrated scsi subsystem is correctly found | ||
422 | in PS/2 models 56,57, but not in model 76. Therefore | ||
423 | the ibmmca_scsi_setup() function has been added today. | ||
424 | This function allows the user to force detection of | ||
425 | scsi subsystem. The kernel option has format | ||
426 | ibmmcascsi=n | ||
427 | where n is the scsi_id (pun) of the subsystem. Most likely, n is 7. | ||
428 | - Martin Kolinek | ||
429 | |||
430 | Aug 21 1996: Modified the code which maps ldns to (pun,0). It was | ||
431 | insufficient for those of us with CD-ROM changers. | ||
432 | - Chris Beauregard | ||
433 | |||
434 | Dec 14 1996: More improvements to the ldn mapping. See check_devices | ||
435 | for details. Did more fiddling with the integrated SCSI detection, | ||
436 | but I think it's ultimately hopeless without actually testing the | ||
437 | model of the machine. The 56, 57, 76 and 95 (ultimedia) all have | ||
438 | different integrated SCSI register configurations. However, the 56 | ||
439 | and 57 are the only ones that have problems with forced detection. | ||
440 | - Chris Beauregard | ||
441 | |||
442 | Mar 8-16 1997: Modified driver to run as a module and to support | ||
443 | multiple adapters. A structure, called ibmmca_hostdata, is now | ||
444 | present, containing all the variables, that were once only | ||
445 | available for one single adapter. The find_subsystem-routine has vanished. | ||
446 | The hardware recognition is now done in ibmmca_detect directly. | ||
447 | This routine checks for presence of MCA-bus, checks the interrupt | ||
448 | level and continues with checking the installed hardware. | ||
449 | Certain PS/2-models do not recognize a SCSI-subsystem automatically. | ||
450 | Hence, the setup defined by command-line-parameters is checked first. | ||
451 | Thereafter, the routine probes for an integrated SCSI-subsystem. | ||
452 | Finally, adapters are checked. This method has the advantage to cover all | ||
453 | possible combinations of multiple SCSI-subsystems on one MCA-board. Up to | ||
454 | eight SCSI-subsystems can be recognized and announced to the upper-level | ||
455 | drivers with this improvement. A set of defines made changes to other | ||
456 | routines as small as possible. | ||
457 | - Klaus Kudielka | ||
458 | |||
459 | May 30 1997: (v1.5b) | ||
460 | 1) SCSI-command capability enlarged by the recognition of MODE_SELECT. | ||
461 | This needs the RD-Bit to be disabled on IM_OTHER_SCSI_CMD_CMD which | ||
462 | allows data to be written from the system to the device. It is a | ||
463 | necessary step to be allowed to set blocksize of SCSI-tape-drives and | ||
464 | the tape-speed, without confusing the SCSI-Subsystem. | ||
465 | 2) The recognition of a tape is included in the check_devices routine. | ||
466 | This is done by checking for TYPE_TAPE, that is already defined in | ||
467 | the kernel-scsi-environment. The markup of a tape is done in the | ||
468 | global ldn_is_tape[] array. If the entry on index ldn | ||
469 | is 1, there is a tapedrive connected. | ||
470 | 3) The ldn_is_tape[] array is necessary to distinguish between tape- and | ||
471 | other devices. Fixed blocklength devices should not cause a problem | ||
472 | with the SCB-command for read and write in the ibmmca_queuecommand | ||
473 | subroutine. Therefore, I only derivate the READ_XX, WRITE_XX for | ||
474 | the tape-devices, as recommended by IBM in this Technical Reference, | ||
475 | mentioned below. (IBM recommends to avoid using the read/write of the | ||
476 | subsystem, but the fact was, that read/write causes a command error from | ||
477 | the subsystem and this causes kernel-panic.) | ||
478 | 4) In addition, I propose to use the ldn instead of a fix char for the | ||
479 | display of PS2_DISK_LED_ON(). On 95, one can distinguish between the | ||
480 | devices that are accessed. It shows activity and eases debugging. | ||
481 | The tape-support has been tested with a SONY SDT-5200 and a HP DDS-2 | ||
482 | (I do not know yet the type). Optimization and CD-ROM audio-support, | ||
483 | I am working on ... | ||
484 | - Michael Lang | ||
485 | |||
486 | June 19 1997: (v1.6b) | ||
487 | 1) Submitting the extra-array ldn_is_tape[] -> to the local ld[] | ||
488 | device-array. | ||
489 | 2) CD-ROM Audio-Play seems to work now. | ||
490 | 3) When using DDS-2 (120M) DAT-Tapes, mtst shows still density-code | ||
491 | 0x13 for ordinary DDS (61000 BPM) instead 0x24 for DDS-2. This appears | ||
492 | also on Adaptec 2940 adaptor in a PCI-System. Therefore, I assume that | ||
493 | the problem is independent of the low-level-driver/bus-architecture. | ||
494 | 4) Hexadecimal ldn on PS/2-95 LED-display. | ||
495 | 5) Fixing of the PS/2-LED on/off that it works right with tapedrives and | ||
496 | does not confuse the disk_rw_in_progress counter. | ||
497 | - Michael Lang | ||
498 | |||
499 | June 21 1997: (v1.7b) | ||
500 | 1) Adding of a proc_info routine to inform in /proc/scsi/ibmmca/<host> the | ||
501 | outer-world about operational load statistics on the different ldns, | ||
502 | seen by the driver. Everybody that has more than one IBM-SCSI should | ||
503 | test this, because I only have one and cannot see what happens with more | ||
504 | than one IBM-SCSI hosts. | ||
505 | 2) Definition of a driver version-number to have a better recognition of | ||
506 | the source when there are too many releases that may confuse | ||
507 | the user, when reading about release-specific problems. Up to now, | ||
508 | I calculated the version-number to be 1.7. Because we are still in BETA-test, | ||
509 | it is today 1.7b. | ||
510 | 3) Sorry for the heavy bug I programmed on June 19 1997! After that, the | ||
511 | CD-ROM did not work any more! The C7-command was a fake impression | ||
512 | I got while programming. Now, the READ and WRITE commands for CD-ROM are | ||
513 | no longer running over the subsystem, but just over | ||
514 | IM_OTHER_SCSI_CMD_CMD. On my observations (PS/2-95), now CD-ROM mounts | ||
515 | much faster(!) and hopefully all fancy multimedia-functions, like direct | ||
516 | digital recording from audio-CDs also work. (I tried it with cdda2wav | ||
517 | from the cdwtools-package and it filled up the harddisk immediately :-).) | ||
518 | To simplify the boolean logic, a further local device-type in ld[], called | ||
519 | is_cdrom has been included. | ||
520 | 4) If one uses a SCSI-device of unsupported type/commands, one | ||
521 | immediately runs into a kernel-panic caused by Command Error. To better | ||
522 | understand which SCSI-command caused the problem, I extended this | ||
523 | specific panic-message slightly. | ||
524 | - Michael Lang | ||
525 | |||
526 | June 25 1997: (v1.8b) | ||
527 | 1) Some cosmetic changes for the handling of SCSI-device-types. | ||
528 | Now, also CD-Burners / WORMs and SCSI-scanners should work. For | ||
529 | MO-drives I have no experience, therefore not yet supported. | ||
530 | In logical_devices I changed from different type-variables to one | ||
531 | called 'device_type' where the values, corresponding to scsi.h, | ||
532 | of a SCSI-device are stored. | ||
533 | 2) There existed a small bug, that maps a device, coming after a SCSI-tape | ||
534 | wrong. Therefore, e.g. a CD-ROM changer would have been mapped wrong | ||
535 | -> problem removed. | ||
536 | 3) Extension of the logical_device structure. Now it contains also device, | ||
537 | vendor and revision-level of a SCSI-device for internal usage. | ||
538 | - Michael Lang | ||
539 | |||
540 | June 26-29 1997: (v2.0b) | ||
541 | 1) The release number 2.0b is necessary because of the completely new done | ||
542 | recognition and handling of SCSI-devices with the adapter. As I got | ||
543 | from Chris the hint, that the subsystem can reassign ldns dynamically, | ||
544 | I remembered this immediate_assign-command, I found once in the handbook. | ||
545 | Now, the driver first kills all ldn assignments that are set by default | ||
546 | on the SCSI-subsystem. After that, it probes on all puns and luns for | ||
547 | devices by going through all combinations with immediate_assign and | ||
548 | probing for devices, using device_inquiry. The found physical(!) pun,lun | ||
549 | structure is stored in get_scsi[][] as device types. This is followed | ||
550 | by the assignment of all ldns to existing SCSI-devices. If more ldns | ||
551 | than devices are available, they are assigned to non existing pun,lun | ||
552 | combinations to satisfy the adapter. With this, the dynamical mapping | ||
553 | was possible to implement. (For further info see the text in the | ||
554 | source code and in the description below. Read the description | ||
555 | below BEFORE installing this driver on your system!) | ||
556 | 2) Changed the name IBMMCA_DRIVER_VERSION to IBMMCA_SCSI_DRIVER_VERSION. | ||
557 | 3) The LED-display shows on PS/2-95 no longer the ldn, but the SCSI-ID | ||
558 | (pun) of the accessed SCSI-device. This is now senseful, because the | ||
559 | pun known within the driver is exactly the pun of the physical device | ||
560 | and no longer a fake one. | ||
561 | 4) The /proc/scsi/ibmmca/<host_no> consists now of the first part, where | ||
562 | hit-statistics of ldns is shown and a second part, where the maps of | ||
563 | physical and logical SCSI-devices are displayed. This could be very | ||
564 | interesting, when one is using more than 15 SCSI-devices in order to | ||
565 | follow the dynamical remapping of ldns. | ||
566 | - Michael Lang | ||
567 | |||
568 | June 26-29 1997: (v2.0b-1) | ||
569 | 1) I forgot to switch the local_checking_phase_flag to 1 and back to 0 | ||
570 | in the dynamical remapping part in ibmmca_queuecommand for the | ||
571 | device_exist routine. Sorry. | ||
572 | - Michael Lang | ||
573 | |||
574 | July 1-13 1997: (v3.0b,c) | ||
575 | 1) Merging of the driver-developments of Klaus Kudielka and Michael Lang | ||
576 | in order to get an optimum and unified driver-release for the | ||
577 | IBM-SCSI-Subsystem-Adapter(s). | ||
578 | For people, using the Kernel-release >=2.1.0, module-support should | ||
579 | be no problem. For users, running under <2.1.0, module-support may not | ||
580 | work, because the methods have changed between 2.0.x and 2.1.x. | ||
581 | 2) Added some more effective statistics for /proc-output. | ||
582 | 3) Change typecasting at necessary points from (unsigned long) to | ||
583 | virt_to_bus(). | ||
584 | 4) Included #if... at special points to have specific adaption of the | ||
585 | driver to kernel 2.0.x and 2.1.x. It should therefore also run with | ||
586 | later releases. | ||
587 | 5) Magneto-Optical drives and medium-changers are also recognized, now. | ||
588 | Therefore, we have a completely gapfree recognition of all SCSI- | ||
589 | device-types, that are known by Linux up to kernel 2.1.31. | ||
590 | 6) The flag SCSI_IBMMCA_DEV_RESET has been inserted. If it is set within | ||
591 | the configuration, each connected SCSI-device will get a reset command | ||
592 | during boottime. This can be necessary for some special SCSI-devices. | ||
593 | This flag should be included in Config.in. | ||
594 | (See also the new Config.in file.) | ||
595 | Probable next improvement: bad disk handler. | ||
596 | - Michael Lang | ||
597 | |||
598 | Sept 14 1997: (v3.0c) | ||
599 | 1) Some debugging and speed optimization applied. | ||
600 | - Michael Lang | ||
601 | |||
602 | Dec 15, 1997 | ||
603 | - chrisb@truespectra.com | ||
604 | - made the front panel display thingy optional, specified from the | ||
605 | command-line via ibmmcascsi=display. Along the lines of the /LED | ||
606 | option for the OS/2 driver. | ||
607 | - fixed small bug in the LED display that would hang some machines. | ||
608 | - reversed ordering of the drives (using the | ||
609 | IBMMCA_SCSI_ORDER_STANDARD define). This is necessary for two main | ||
610 | reasons: | ||
611 | - users who've already installed Linux won't be screwed. Keep | ||
612 | in mind that not everyone is a kernel hacker. | ||
613 | - be consistent with the BIOS ordering of the drives. In the | ||
614 | BIOS, id 6 is C:, id 0 might be D:. With this scheme, they'd be | ||
615 | backwards. This confuses the crap out of those heathens who've | ||
616 | got a impure Linux installation (which, <wince>, I'm one of). | ||
617 | This whole problem arises because IBM is actually non-standard with | ||
618 | the id to BIOS mappings. You'll find, in fdomain.c, a similar | ||
619 | comment about a few FD BIOS revisions. The Linux (and apparently | ||
620 | industry) standard is that C: maps to scsi id (0,0). Let's stick | ||
621 | with that standard. | ||
622 | - Since this is technically a branch of my own, I changed the | ||
623 | version number to 3.0e-cpb. | ||
624 | |||
625 | Jan 17, 1998: (v3.0f) | ||
626 | 1) Addition of some statistical info for /proc in proc_info. | ||
627 | 2) Taking care of the SCSI-assignment problem raised by Chris on Dec 15 | ||
628 | 1997. In fact, IBM is right concerning the assignment of SCSI-devices | ||
629 | to driveletters. It conforms to the ANSI-definition of the SCSI- | ||
630 | standard to assign drive C: to SCSI-id 6, because it is the highest | ||
631 | hardware priority after the hostadapter (that has still today by | ||
632 | default everywhere id 7). Also realtime-operating systems that I use, | ||
633 | like LynxOS and OS9, which are quite industrial systems use top-down | ||
634 | numbering of the harddisks, that is also starting at id 6. Now, one | ||
635 | sits a bit between two chairs. On the one hand, using the define | ||
636 | IBMMCA_SCSI_ORDER_STANDARD makes Linux assign disks in conformance with | ||
637 | the IBM- and ANSI-SCSI-standard and keeps this driver downward | ||
638 | compatible with older releases; on the other hand, people are quite | ||
639 | used to believing that C: is assigned to (0,0) and many other | ||
640 | SCSI-BIOSes do so. Therefore, I moved the IBMMCA_SCSI_ORDER_STANDARD | ||
641 | define out of the driver and put it into Config.in as subitem of | ||
642 | 'IBM SCSI support'. A help, added to Documentation/Configure.help | ||
643 | explains the differences between saying 'y' or 'n' to the user, when | ||
644 | IBMMCA_SCSI_ORDER_STANDARD prompts, so the ordinary user is enabled to | ||
645 | choose the way of assignment, depending on his own situation and gusto. | ||
646 | 3) Adapted SCSI_IBMMCA_DEV_RESET to the local naming convention, so it is | ||
647 | now called IBMMCA_SCSI_DEV_RESET. | ||
648 | 4) Optimization of proc_info and its subroutines. | ||
649 | 5) Added more in-source-comments and extended the driver description by | ||
650 | some explanation about the SCSI-device-assignment problem. | ||
651 | - Michael Lang | ||
652 | |||
653 | Jan 18, 1998: (v3.0g) | ||
654 | 1) Correcting names to be absolutely conform to the later 2.1.x releases. | ||
655 | This is necessary for | ||
656 | IBMMCA_SCSI_DEV_RESET -> CONFIG_IBMMCA_SCSI_DEV_RESET | ||
657 | IBMMCA_SCSI_ORDER_STANDARD -> CONFIG_IBMMCA_SCSI_ORDER_STANDARD | ||
658 | - Michael Lang | ||
659 | |||
660 | Jan 18, 1999: (v3.1 MCA-team internal) | ||
661 | 1) The multiple hosts structure is accessed from every subroutine, so there | ||
662 | is no longer the address of the device structure passed from function | ||
663 | to function, but only the hostindex. A call by value, nothing more. This | ||
664 | should really be understood by the compiler and the subsystem should get | ||
665 | the right values and addresses. | ||
666 | 2) The SCSI-subsystem detection was not complete and quite hugely buggy up | ||
667 | to now, compared to the technical manual. The interpretation of the pos2 | ||
668 | register is not as assumed by people before, therefore, I dropped a note | ||
669 | in the ibmmca_detect function to show the registers' interpretation. | ||
670 | The pos-registers of integrated SCSI-subsystems do not contain any | ||
671 | information concerning the IO-port offset, really. Instead, they contain | ||
672 | some info about the adapter, the chip, the NVRAM .... The I/O-port is | ||
673 | fixed to 0x3540 - 0x3547. There can be more than one adapter in the | ||
674 | slots and they get an offset for the I/O area in order to get their own | ||
675 | I/O-address area. See chapter 2 for detailed description. At least, the | ||
676 | detection should now work right, even on models other than 95. The 95ers | ||
677 | came happily around the bug, as their pos2 register contains always 0 | ||
678 | in the critical area. Reserved bits are not allowed to be interpreted, | ||
679 | therefore, IBM is allowed to set those bits as they like and they may | ||
680 | really vary between different PS/2 models. So, now, no interpretation | ||
681 | of reserved bits - hopefully no trouble here anymore. | ||
682 | 3) The command error, which you may get on models 55, 56, 57, 70, 77 and | ||
683 | P70 may have been caused by the fact, that adapters of older design do | ||
684 | not like sending commands to non-existing SCSI-devices and will react | ||
685 | with a command error as a sign of protest. While this error is not | ||
686 | present on IBM SCSI Adapter w/cache, it appears on IBM Integrated SCSI | ||
687 | Adapters. Therefore, I implemented a workaround to forgive those | ||
688 | adapters their protests, but it is marked up in the statistics, so | ||
689 | after a successful boot, you can see in /proc/scsi/ibmmca/<host_number> | ||
690 | how often the command errors have been forgiven to the SCSI-subsystem. | ||
691 | If the number is bigger than 0, you have a SCSI subsystem of older | ||
692 | design, which should no longer matter. | ||
693 | 4) ibmmca_getinfo() has been adapted very carefully, so it shows in the | ||
694 | slotn file really, what is senseful to be presented. | ||
695 | 5) ibmmca_register() has been extended in its parameter list in order to | ||
696 | pass the right name of the SCSI-adapter to Linux. | ||
697 | - Michael Lang | ||
698 | |||
699 | Feb 6, 1999: (v3.1) | ||
700 | 1) Finally, after some 3.1Beta-releases, the 3.1 release. Sorry, for | ||
701 | the delayed release, but it was not finished with the release of | ||
702 | Kernel 2.2.0. | ||
703 | - Michael Lang | ||
704 | |||
705 | Feb 10, 1999 (v3.1) | ||
706 | 1) Added a new commandline parameter called 'bypass' in order to bypass | ||
707 | every integrated subsystem SCSI-command consequently in case of | ||
708 | troubles. | ||
709 | 2) Concatenated read_capacity requests to the harddisks. It gave a lot | ||
710 | of troubles with some controllers and after I wanted to apply some | ||
711 | extensions, it jumped out in the same situation, on my w/cache, just as | ||
712 | on D. Weinehall's Model 56, having integrated SCSI. This gave me the | ||
713 | decisive hint to move the code-part out and declare it global. Now | ||
714 | it seems to work far better and more stable. Let us see what | ||
715 | the world thinks of it... | ||
716 | 3) By the way, only Sony DAT-drives seem to show density code 0x13. A | ||
717 | test with a HP drive gave right results, so the problem is vendor- | ||
718 | specific and not a problem of the OS or the driver. | ||
719 | - Michael Lang | ||
720 | |||
721 | Feb 18, 1999 (v3.1d) | ||
722 | 1) The abort command and the reset function have been checked for | ||
723 | inconsistencies. From the logical point of thinking, they work | ||
724 | at their optimum, now, but as the subsystem does not answer with an | ||
725 | interrupt, abort never finishes, sigh... | ||
726 | 2) Everything, that is accessed by a busmaster request from the adapter | ||
727 | is now declared as global variable, even the return-buffer in the | ||
728 | local checking phase. This assures, that no accesses to undefined memory | ||
729 | areas are performed. | ||
730 | 3) In ibmmca.h, the line unchecked_isa_dma is added with 1 in order to | ||
731 | avoid memory-pointers for the areas higher than 16MByte in order to | ||
732 | be sure, it also works on 16-Bit Microchannel bus systems. | ||
733 | 4) A lot of small things have been found, but nothing that endangered the | ||
734 | driver operations. Just it should be more stable, now. | ||
735 | - Michael Lang | ||
736 | |||
737 | Feb 20, 1999 (v3.1e) | ||
738 | 1) I took the warning from the Linux Kernel Hackers Guide seriously and | ||
739 | checked the cmd->result return value to the done-function very carefully. | ||
740 | It is obvious, that the IBM SCSI only delivers the tsb.dev_status, if | ||
741 | some error appeared, else it is undefined. Now, this is fixed. Before | ||
742 | any SCB command gets queued, the tsb.dev_status is set to 0, so the | ||
743 | cmd->result won't screw up Linux higher level drivers. | ||
744 | 2) The reset-function has slightly improved. This is still planned for | ||
745 | abort. During the abort and the reset function, no interrupts are | ||
746 | allowed. This is however quite hard to cope with, so the INT-status | ||
747 | register is read. When the interrupt gets queued, one can find its | ||
748 | status immediately on that register and is enabled to continue in the | ||
749 | reset function. I had no chance to test this really, only in a bogus | ||
750 | situation, I got this function running, but the situation was too much | ||
751 | worse for Linux :-(, so tests will continue. | ||
752 | 3) Buffers are now consistent. No open address mapping, as before and | ||
753 | therefore no further troubles with the unassigned memory segmentation | ||
754 | faults that scrambled probes on 95XX series and even on 85XX series, | ||
755 | when the kernel is done in a not so perfectly fitting way. | ||
756 | 4) Spontaneous interrupts from the subsystem, appearing without any | ||
757 | command previously queued are answered with a DID_BAD_INTR result. | ||
758 | 5) Taken into account ZP Gu's proposals to reverse the SCSI-device | ||
759 | scan order. As it does not work on Kernel 2.1.x or 2.2.x, as proposed | ||
760 | by him, I implemented it in a slightly derived way, which offers in | ||
761 | addition more flexibility. | ||
762 | - Michael Lang | ||
763 | |||
764 | Apr 23, 2000 (v3.2pre1) | ||
765 | 1) During a very long time, I collected a huge amount of bug reports from | ||
766 | various people, trying really quite different things on their SCSI- | ||
767 | PS/2s. Today, all these bug reports are taken into account and should be | ||
768 | mostly solved. The major topics were: | ||
769 | - Driver crashes during boottime for no obvious reason. | ||
770 | - Driver panics while the midlevel-SCSI-driver is trying to inquire | ||
771 | the SCSI-device properties, even though hardware is in perfect state. | ||
772 | - Displayed info for the various slot-cards is interpreted wrong. | ||
773 | The main reasons for the crashes were two: | ||
774 | 1) The commands to check for device information like INQUIRY, | ||
775 | TEST_UNIT_READY, REQUEST_SENSE and MODE_SENSE cause the devices | ||
776 | to deliver information of up to 255 bytes. Midlevel drivers offer | ||
777 | 1024 bytes of space for the answer, but the IBM-SCSI-adapters do | ||
778 | not accept this, as they stick quite near to ANSI-SCSI and report | ||
779 | a COMMAND_ERROR message which causes the driver to panic. The main | ||
780 | problem was located around the INQUIRY command. Now, for all the | ||
781 | mentioned commands, the buffersize sent to the adapter is at | ||
782 | maximum 255 which seems to be a quite reasonable solution. | ||
783 | TEST_UNIT_READY gets a buffersize of 0 to make sure that no | ||
784 | data is transferred in order to avoid any possible command failure. | ||
785 | 2) On unsuccessful TEST_UNIT_READY, the mid-level driver has to send | ||
786 | a REQUEST_SENSE in order to see where the problem is located. This | ||
787 | REQUEST_SENSE may have varying lengths in its answer-buffer. IBM | ||
788 | SCSI-subsystems report a command failure if the returned buffersize | ||
789 | is different from the sent buffersize, but this can be suppressed by | ||
790 | a special bit, which is now done and problems seem to be solved. | ||
791 | 2) Code adaptation to all kernel-releases. Now, the 3.2 code compiles on | ||
792 | 2.0.x, 2.1.x, 2.2.x and 2.3.x kernel releases without any code-changes. | ||
793 | 3) Commandline-parameters are recognized again, even under Kernel 2.3.x or | ||
794 | higher. | ||
795 | - Michael Lang | ||
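
The buffer-size rule from item 1 of the v3.2pre1 entry above can be sketched
as follows. This is only an illustration in Python, not driver code: the
opcode values are the standard SCSI ones, while the function name and the
constant MAX_INFO_BUFFER are assumptions made for this example.

```python
# Illustration only. Opcodes are the standard SCSI values; the names
# adapter_buffer_size and MAX_INFO_BUFFER are hypothetical.
TEST_UNIT_READY = 0x00
REQUEST_SENSE = 0x03
INQUIRY = 0x12
MODE_SENSE = 0x1A

MAX_INFO_BUFFER = 255  # largest answer the adapters accept for these commands


def adapter_buffer_size(opcode, requested):
    """Return the buffer size to announce to the adapter for one command."""
    if opcode == TEST_UNIT_READY:
        return 0  # no data phase, so no data is transferred
    if opcode in (REQUEST_SENSE, INQUIRY, MODE_SENSE):
        # Cap at 255, but never enlarge a smaller request.
        return min(requested, MAX_INFO_BUFFER)
    return requested  # all other commands keep the requested size
```

A mid-level request for 1024 bytes of INQUIRY data would be announced to the
adapter as 255 bytes, while a shorter request keeps its own size.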
796 | |||
797 | April 27, 2000 (v3.2pre2) | ||
798 | 1) Bypassed commands get read by the adapter by one cycle instead of two. | ||
799 | This increases SCSI-performance. | ||
800 | 2) Synchronous datatransfer is now guaranteed to run at 5 MHz on older | ||
801 | SCSI and 10 MHz on internal F/W SCSI-adapter. | ||
802 | 3) New commandline parameters allow forcing the adapter to slow down while | ||
803 | in synchronous transfer. Could be helpful for very old devices. | ||
804 | - Michael Lang | ||
805 | |||
806 | June 2, 2000 (v3.2pre5) | ||
807 | 1) Added Jim Shorney's contribution to make the activity indicator | ||
808 | flash in addition to the LED-alphanumeric display-panel on | ||
809 | models 95A. So that this feature can be chosen freely, a new | ||
810 | commandline parameter is added, called 'activity'. | ||
811 | 2) Added the READ_CONTROL bit for test_unit_ready SCSI-command. | ||
812 | 3) Added some suppress_exception bits to read_device_capacity and | ||
813 | all device_inquiry occurrences in the driver code. | ||
814 | 4) Complaints about the various KERNEL_VERSION implementations are | ||
815 | taken into account. Every local_LinuxKernelVersion occurrence is | ||
816 | now replaced by KERNEL_VERSION, defined in linux/version.h. | ||
817 | Corresponding changes were applied to ibmmca.h, too. This was a | ||
818 | contribution to all kernel-parts by Philipp Hahn. | ||
819 | - Michael Lang | ||
820 | |||
821 | July 17, 2000 (v3.2pre8) | ||
822 | A long period of collecting bug reports from all corners of the world | ||
823 | now led to the following corrections to the code: | ||
824 | 1) SCSI-2 F/W support crashed with a COMMAND ERROR. The reason for this | ||
825 | was that it is possible to disable Fast-SCSI for the external bus. | ||
826 | The feature-control command, where this crash appeared regularly, tried | ||
827 | to set the maximum speed of 10MHz synchronous transfer speed and that | ||
828 | reports a COMMAND ERROR if external bus Fast-SCSI is disabled. Now, | ||
829 | the feature-command probes down from maximum speed until the adapter | ||
830 | stops complaining, which is at the same time the maximum possible | ||
831 | speed selected in the reference program. So, F/W external can run at | ||
832 | 5 MHz (slow-) or 10 MHz (fast-SCSI). During feature probing, the | ||
833 | COMMAND ERROR message is used to detect if the adapter does not complain. | ||
834 | 2) Up to now, only combined busmode is supported, if you use external | ||
835 | SCSI-devices, attached to the F/W-controller. If dual bus is selected, | ||
836 | only the internal SCSI-devices get accessed by Linux. For most | ||
837 | applications, this should do fine. | ||
838 | 3) Wide-SCSI-addressing (16-Bit) is now possible for the internal F/W | ||
839 | bus on the F/W adapter. If F/W adapter is detected, the driver | ||
840 | automatically uses the extended PUN/LUN <-> LDN mapping tables, which | ||
841 | are now new from 3.2pre8. This allows PUNs between 0 and 15 and should | ||
842 | provide more fun with the F/W adapter. | ||
843 | 4) Several machines use the SCSI POS registers for internal/undocumented | ||
844 | storage of system relevant info. This confused the driver, mainly on | ||
845 | models 9595, as it expected no onboard SCSI only, if all POS in | ||
846 | the integrated SCSI-area are set to 0x00 or 0xff. Now, the mechanism | ||
847 | to check for integrated SCSI is much more restrictive and these problems | ||
848 | should be history. | ||
849 | - Michael Lang | ||
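
The probing strategy from item 1 of the v3.2pre8 entry above (start at the
maximum speed and step down until the adapter stops reporting a COMMAND
ERROR) can be sketched like this. It is a simplified illustration in Python;
the function name, the callback and the speed list are assumptions for the
example, and the real driver works on adapter-specific rate codes.

```python
def probe_sync_speed(try_speed, candidates):
    """Return the highest candidate speed the adapter accepts, or None.

    try_speed is a hypothetical callback that issues the feature-control
    command for one speed and returns True if no COMMAND ERROR came back.
    """
    for speed in sorted(candidates, reverse=True):
        if try_speed(speed):
            return speed  # first accepted speed is the maximum possible one
    return None  # the adapter rejected every candidate
```

With a stand-in for an adapter whose external bus has Fast-SCSI disabled,
e.g. try_speed = lambda mhz: mhz <= 5.0, probing the candidates 10.0, 5.0
and 2.0 settles on 5.0.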
850 | |||
851 | July 18, 2000 (v3.2pre9) | ||
852 | Things develop rather quickly at the moment. Two major things were still | ||
853 | missing in 3.2pre8: | ||
854 | 1) The adapter PUN for F/W adapters has 4-bits, while all other adapters | ||
855 | have 3-bits. This is now taken into account for F/W. | ||
856 | 2) When you select CONFIG_IBMMCA_SCSI_ORDER_STANDARD, you should | ||
857 | normally get the inverse probing order of your devices on the SCSI-bus. | ||
858 | The ANSI device order gets scrambled in version 3.2pre8!! Now, a new | ||
859 | and tested algorithm inverts the device-order on the SCSI-bus and | ||
860 | automatically avoids accidental access to whatever SCSI PUN the adapter | ||
861 | is set and works with SCSI- and Wide-SCSI-addressing. | ||
862 | - Michael Lang | ||
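
The two points of the v3.2pre9 entry above can be pictured with a small
sketch: the PUN range depends on the adapter type (3 bits, or 4 bits for
F/W), and the scan order is inverted for the ANSI case without ever touching
the adapter's own PUN. The Python below is only an illustration; the function
and its parameters are assumptions for this example and not the driver's
algorithm.

```python
def scan_order(adapter_pun, wide=False, ansi=False):
    """List the device PUNs in probing order, skipping the adapter itself.

    wide=True stands for a F/W adapter with 4-bit PUNs (0..15),
    otherwise 3-bit PUNs (0..7) are assumed.
    ansi=True scans from the highest PUN down, else from PUN 0 up.
    """
    max_pun = 15 if wide else 7
    puns = [p for p in range(max_pun + 1) if p != adapter_pun]
    return puns[::-1] if ansi else puns
```

For an adapter at PUN 7, scan_order(7) walks from 0 up to 6, and
scan_order(7, ansi=True) walks from 6 down to 0, which matches the 'normal'
and 'ansi' orders described in section 5.1.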
863 | |||
864 | July 23, 2000 (v3.2pre10 unpublished) | ||
865 | 1) LED panel display supports wide-addressing in ibmmca=display mode. | ||
866 | 2) Adapter-information and autoadaption to address-space is done. | ||
867 | 3) Auto-probing for maximum synchronous SCSI transfer rate is working. | ||
868 | 4) Optimization to some embedded function calls is applied. | ||
869 | 5) Added some comment for the user to wait for SCSI-devices being probed. | ||
870 | 6) Finished version 3.2 for Kernel 2.4.0. At least, I thought it was, but... | ||
871 | - Michael Lang | ||
872 | |||
873 | July 26, 2000 (v3.2pre11) | ||
874 | 1) I passed a horrible weekend getting mad with NMIs on kernel 2.2.14 and | ||
875 | a model 9595. Asking around in the community, nobody except me has | ||
876 | seen such errors. Weird, but I am trying to recompile everything on | ||
877 | the model 9595. Maybe, as I use a specially modified gcc, that could | ||
878 | cause problems. But, it was not the reason. The true background was, | ||
879 | that the kernel was compiled for i386 and the 9595 has a 486DX-2. | ||
880 | Normally, no troubles should appear, but for this special machine, | ||
881 | only the right processor support is working fine! | ||
882 | 2) Previous problems with synchronous speed, slowing down from one adapter | ||
883 | to the next during probing are corrected. Now, local variables store | ||
884 | the synchronous bitmask for every single adapter found on the MCA bus. | ||
885 | 3) LED alphanumeric panel support for XX95 systems is now showing some | ||
886 | alive rotator during boottime. This makes sense, when no monitor is | ||
887 | connected to the system. You can get rid of all display activity, if | ||
888 | you do not use any parameter or just ibmmcascsi=activity, for the | ||
889 | harddrive activity LED, existent on all PS/2, except models 8595-XXX. | ||
890 | If no monitor is available, please use ibmmcascsi=display, which works | ||
891 | fine together with the linuxinfo utility for the LED-panel. | ||
892 | - Michael Lang | ||
893 | |||
894 | July 29, 2000 (v3.2) | ||
895 | 1) Submission of this driver for kernel 2.4test-XX and 2.2.17. | ||
896 | - Michael Lang | ||
897 | |||
898 | December 28, 2000 (v3.2d / v4.0) | ||
899 | 1) The interrupt handler had some wrong statement to wait for. This | ||
900 | was done for experimental reasons during 3.2 development but it | ||
901 | has turned out that this is not stable enough. Going back to waiting for the | ||
902 | adapter to be not busy is best. | ||
903 | 2) Inquiry requests can be shorter than 255 bytes of return buffer. Due | ||
904 | to a bug in the ibmmca_queuecommand routine, this buffer was forced | ||
905 | to 255 at minimum. If the memory area this return buffer is pointing | ||
906 | to does not offer more space, invalid memory accesses destabilized the | ||
907 | kernel. | ||
908 | 3) version 4.0 is only valid for kernel 2.4.0 or later. This is necessary | ||
909 | to remove old kernel version dependent waste from the driver. 3.2d is | ||
910 | only distributed with older kernels but keeps compatibility with older | ||
911 | kernel versions. 4.0 and higher versions cannot be used with older | ||
912 | kernels anymore!! You must have at least kernel 2.4.0!! | ||
913 | 4) The commandline argument 'bypass' and all its functionality got removed | ||
914 | in version 4.0. This was never really necessary, as all troubles were | ||
915 | based on non-command related reasons up to now, so bypassing commands | ||
916 | did not help to avoid any bugs. It is kept in 3.2X for debugging reasons. | ||
917 | 5) Dynamic reassignment of ldns was again verified and analyzed to be | ||
918 | completely inoperational. This is corrected and should work now. | ||
919 | 6) All commands that get sent to the SCSI adapter were verified and | ||
920 | completed in such a way, that they now fully conform to the | ||
921 | demands in the technical description of IBM. Main candidates were the | ||
922 | DEVICE_INQUIRY, REQUEST_SENSE and DEVICE_CAPACITY commands. They must | ||
923 | be transferred by bypassing the internal command buffer of the adapter | ||
924 | or else the response can be a random result. GET_POS_INFO would be | ||
925 | safer to use, if one could use the SUPRESS_EXCEPTION_SHORT, but this | ||
926 | is not allowed by the technical references of IBM. (Sorry, folks, the | ||
927 | model 80 problem is still a task to be solved in a different way.) | ||
928 | 7) v3.2d is still held back for some days for testing, while 4.0 is | ||
929 | released. | ||
930 | - Michael Lang | ||
931 | |||
932 | January 3, 2001 (v4.0a) | ||
933 | 1) A lot of complaints came in after the 2.4.0-prerelease kernel about | ||
934 | the impossibility to compile the driver as a module. This problem is | ||
935 | solved. In combination with that problem, some imprecise declaration | ||
936 | of the function option_setup() gave some warnings during compilation. | ||
937 | This is solved, too, by a forward declaration in ibmmca.c. | ||
938 | 2) #ifdef argument concerning CONFIG_SCSI_IBMMCA is no longer needed and | ||
939 | was entirely removed. | ||
940 | 3) Some switch statements got optimized in code, as did some minor variables | ||
941 | in internal SCSI-command handlers. | ||
942 | - Michael Lang | ||
943 | |||
944 | 4 To do | ||
945 | ------- | ||
946 | - IBM SCSI-2 F/W external SCSI bus support in separate mode! | ||
947 | - It seems that the handling of bad disks is really bad - | ||
948 | non-existent, in fact. However, a low-level driver cannot help | ||
949 | much, if such things happen. | ||
950 | |||
951 | 5 Users' Manual | ||
952 | --------------- | ||
953 | 5.1 Commandline Parameters | ||
954 | -------------------------- | ||
955 | There exist several features for the IBM SCSI-subsystem driver. | ||
956 | The commandline parameter format is: | ||
957 | |||
958 | ibmmcascsi=<command1>,<command2>,<command3>,... | ||
959 | |||
960 | where commandN can be one of the following: | ||
961 | |||
962 | display Owners of a model 95 or other PS/2 systems with an | ||
963 | alphanumeric LED display may set this to have their | ||
964 | display showing the following output of the 8 digits: | ||
965 | |||
966 | ------DA | ||
967 | |||
968 | where '-' stays dark, 'D' shows the SCSI-device id | ||
969 | and 'A' shows the SCSI hostindex, being currently | ||
970 | accessed. During boottime, this will give the message | ||
971 | |||
972 | SCSIini* | ||
973 | |||
974 | on the LED-panel, where the * represents a rotator, | ||
975 | showing the activity during the probing phase of the | ||
976 | driver which can take up to two minutes per SCSI-adapter. | ||
977 | adisplay This works like display, but gives more optical overview | ||
978 | of the activities on the SCSI-bus. The display will have | ||
979 | the following output: | ||
980 | |||
981 | 6543210A | ||
982 | |||
983 | where the numbers 0 to 6 light up at the shown position, | ||
984 | when the SCSI-device is accessed. 'A' shows again the SCSI | ||
985 | hostindex. If neither display nor adisplay is set, the internal | ||
986 | PS/2 harddisk LED is used for media-activities. So, if | ||
987 | you really do not have a system with a LED-display, you | ||
988 | should not set display or adisplay. Keep in mind, that | ||
989 | display and adisplay are mutually exclusive. It | ||
990 | is not recommended to use this option, if you have some | ||
991 | wide-addressed devices e.g. at the SCSI-2 F/W adapter in | ||
992 | your system. In addition, the usage of the display for | ||
993 | other tasks in parallel, like the linuxinfo-utility makes | ||
994 | no sense with this option. | ||
995 | activity This enables the PS/2 harddisk LED activity indicator. | ||
996 | Most PS/2s have no alphanumeric LED display, but do have some | ||
997 | indicator. So you should use this parameter to activate it. | ||
998 | If you own model 9595 (Server95), you can have both, the | ||
999 | LED panel and the activity indicator in parallel. However, | ||
1000 | some PS/2s, like the 8595 do not have any harddisk LED | ||
1001 | activity indicator, which means, that you must use the | ||
1002 | alphanumeric LED display if you want to monitor SCSI- | ||
1003 | activity. | ||
1004 | bypass This is obsolete from driver version 4.0, as the adapters | ||
1005 | are now understood well enough that the selection between | ||
1006 | integrated and bypassed commands should work completely | ||
1007 | correctly! For historical reasons, the old description is | ||
1008 | kept here: | ||
1009 | This commandline parameter forces the driver never to use | ||
1010 | the SCSI-subsystem's integrated SCSI-command set, except for | ||
1011 | the immediate assign, which is of vital importance for | ||
1012 | every IBM SCSI-subsystem to set its ldns right. Instead, | ||
1013 | the ordinary ANSI-SCSI-commands are used and passed by the | ||
1014 | controller to the SCSI-devices, therefore 'bypass'. The | ||
1015 | effort, done by the subsystem is quite bogus and at a | ||
1016 | minimum and therefore it should work everywhere. This | ||
1017 | could maybe solve troubles with old or integrated SCSI- | ||
1018 | controllers and nasty harddisks. Keep in mind, that using | ||
1019 | this flag will slow down SCSI-accesses slightly, as the | ||
1020 | software generated commands are always slower than the | ||
1021 | hardware. Non-harddisk devices always get read/write- | ||
1022 | commands in bypass mode. On the most recent releases of | ||
1023 | the Linux IBM-SCSI-driver, the bypass command should be | ||
1024 | no longer a necessary thing, if you are sure about your | ||
1025 | SCSI-hardware! | ||
1026 | normal This is the parameter, introduced on the 2.0.x development | ||
1027 | rail by ZP Gu. This parameter defines the SCSI-device | ||
1028 | scan order in the new industry standard. This means, that | ||
1029 | the first SCSI-device is the one with the lowest pun. | ||
1030 | E.g. harddisk at pun=0 is scanned before harddisk at | ||
1031 | pun=6, which means, that harddisk at pun=0 gets sda | ||
1032 | and the one at pun=6 gets sdb. | ||
1033 | ansi The ANSI-standard for the right scan order, as done by | ||
1034 | IBM, Microware and Microsoft, scans SCSI-devices starting | ||
1035 | at the highest pun, which means, that e.g. harddisk at | ||
1036 | pun=6 gets sda and a harddisk at pun=0 gets sdb. If you | ||
1037 | like to have the same SCSI-device order, as in DOS, OS-9 | ||
1038 | or OS/2, just use this parameter. | ||
1039 | fast SCSI-I/O in synchronous mode is done at 5 MHz for IBM- | ||
1040 | SCSI-devices. SCSI-2 Fast/Wide Adapter/A external bus | ||
1041 | should then run at 10 MHz if Fast-SCSI is enabled, | ||
1042 | and at 5 MHz if Fast-SCSI is disabled on the external | ||
1043 | bus. This is the default setting when nothing is | ||
1044 | specified here. | ||
1045 | medium Synchronous rate is at 50% approximately, which means | ||
1046 | 2.5 MHz for IBM SCSI-adapters and 5.0 MHz for F/W ext. | ||
1047 | SCSI-bus (when Fast-SCSI speed enabled on external bus). | ||
1048 | slow The slowest possible synchronous transfer rate is set. | ||
1049 | This means 1.82 MHz for IBM SCSI-adapters and 2.0 MHz | ||
1050 | for F/W external bus at Fast-SCSI speed on the external | ||
1051 | bus. | ||
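
To picture the two LED-panel layouts described under 'display' and 'adisplay'
above, here is a small sketch that builds the 8-character panel text. It is
an illustration in Python only: the function names are made up for this
example, a dark digit is rendered as a space, and a single-digit hostindex
and device ids 0 to 6 are assumed.

```python
def display_text(device_id, host_index):
    """'display' layout ------DA: six dark digits, device id, hostindex."""
    if not 0 <= device_id <= 6:
        raise ValueError("this sketch only covers device ids 0..6")
    return " " * 6 + str(device_id) + str(host_index)


def adisplay_text(device_id, host_index):
    """'adisplay' layout 6543210A: the accessed id lights up at its position."""
    if not 0 <= device_id <= 6:
        raise ValueError("this sketch only covers device ids 0..6")
    cells = [" "] * 7
    cells[6 - device_id] = str(device_id)  # id 6 is leftmost, id 0 is rightmost
    return "".join(cells) + str(host_index)
```

Accessing device 6 on host 1 would light the leftmost digit in adisplay mode,
while display mode always shows the id in the seventh position.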
1052 | |||
1053 | A further option is that you can force the SCSI-driver to accept a SCSI- | ||
1054 | subsystem at a certain I/O-address with a predefined adapter PUN. This | ||
1055 | is done by entering | ||
1056 | |||
1057 | commandN = I/O-base | ||
1058 | commandN+1 = adapter PUN | ||
1059 | |||
1060 | e.g. ibmmcascsi=0x3540,7 will force the driver to detect a SCSI-subsystem | ||
1061 | at I/O-address 0x3540 with adapter PUN 7. Please only use this method, if | ||
1062 | the driver really does not recognize your SCSI-adapter! With driver version | ||
1063 | 3.2, this recognition of various adapters was hugely improved and you | ||
1064 | should try first to remove your commandline arguments of such type with a | ||
1065 | newer driver. I bet, it will be recognized correctly. Even multiple and | ||
1066 | different types of IBM SCSI-adapters should be recognized correctly, too. | ||
1067 | Use the forced detection method only as a last resort! | ||
1068 | |||
1069 | Examples: | ||
1070 | |||
1071 | ibmmcascsi=adisplay | ||
1072 | |||
1073 | This will use the advanced display mode for the model 95 LED alphanumeric | ||
1074 | display. | ||
1075 | |||
1076 | ibmmcascsi=display,0x3558,7 | ||
1077 | |||
1078 | This will activate the default display mode for the model 95 LED display | ||
1079 | and will force the driver to accept a SCSI-subsystem at I/O-base 0x3558 | ||
1080 | with adapter PUN 7. | ||
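
How such a parameter string breaks down can be sketched as follows. This is a
hedged illustration in Python of the format described in this section, not
the driver's parser: the function name and the KEYWORDS set are assumptions,
and the removed 'bypass' keyword is left out on purpose.

```python
# Keywords taken from the list in this section (without the removed 'bypass').
KEYWORDS = {"display", "adisplay", "activity", "normal", "ansi",
            "fast", "medium", "slow"}


def parse_ibmmcascsi(value):
    """Split an ibmmcascsi= value into keywords and forced (I/O-base, PUN) pairs."""
    flags = []
    numbers = []
    for token in value.split(","):
        token = token.strip()
        if token in KEYWORDS:
            flags.append(token)
        else:
            # Base 0 accepts both hexadecimal (0x3558) and decimal (7) input.
            numbers.append(int(token, 0))
    if len(numbers) % 2:
        raise ValueError("every I/O-base needs an adapter PUN after it")
    forced = list(zip(numbers[0::2], numbers[1::2]))
    return flags, forced
```

Applied to the second example above, parse_ibmmcascsi("display,0x3558,7")
yields the keyword list ['display'] and one forced adapter, (0x3558, 7).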
1081 | |||
1082 | 5.2 Troubleshooting | ||
1083 | ------------------- | ||
1084 | The following FAQs should help you to solve some major problems with this | ||
1085 | driver. | ||
1086 | |||
1087 | Q: "Reset SCSI-devices at boottime" halts the system at boottime, why? | ||
1088 | A: This is only tested with the IBM SCSI Adapter w/cache. It is not | ||
1089 | yet proven to run on other adapters, however you may be lucky. | ||
1090 | In version 3.1d this has been hugely improved and should work better, | ||
1091 | now. Normally you really won't need to activate this flag in the | ||
1092 | kernel configuration, as all post 1989 SCSI-devices should accept | ||
1093 | the reset-signal, when the computer is switched on. The SCSI- | ||
1094 | subsystem generates this reset while being initialized. This flag | ||
1095 | is really reserved for users with very old, very strange or self-made | ||
1096 | SCSI-devices. | ||
1097 | Q: Why is the SCSI-order of my drives mirrored compared to the device-order | ||
1098 | seen from OS/2 or DOS? | ||
1099 | A: It depends on the operating system, if it looks at the devices in | ||
1100 | ANSI-SCSI-standard (starting from pun 6 and going down to pun 0) or | ||
1101 | if it just starts at pun 0 and counts up. If you want to conform | ||
1102 | with OS/2 and DOS, you have to activate this flag in the kernel | ||
1103 | configuration or you should set 'ansi' as parameter for the kernel. | ||
1104 | The parameter 'normal' sets the new industry standard, starting | ||
1105 | from pun 0, scanning up to pun 6. This allows you to change your | ||
1106 | mind even after having already compiled the kernel. | ||
1107 | Q: Why can't I find IBM MCA SCSI support in the config menu? | ||
1108 | A: You have to activate MCA bus support, first. | ||
1109 | Q: Where can I find the latest info about this driver? | ||
1110 | A: See the file MAINTAINERS for the current WWW-address, which offers | ||
1111 | updates, info and Q/A lists. At this file's origin, the webaddress | ||
1112 | was: http://www.staff.uni-mainz.de/mlang/linux.html | ||
1113 | Q: My SCSI-adapter is not recognized by the driver, what can I do? | ||
1114 | A: Just force it to be recognized by kernel parameters. See section 5.1. | ||
1115 | If this really happens, do also send e-mail to the maintainer, as | ||
1116 | forced detection should never be necessary. Needing forced detection | ||
1117 | in principle indicates a flaw in the driver's adapter detection and | ||
1118 | belongs in a bug report. | ||
1119 | Q: The driver screws up, if it starts to probe SCSI-devices, is there | ||
1120 | some way out of it? | ||
1121 | A: Yes, that was some recognition problem of the correct SCSI-adapter | ||
1122 | and its I/O base addresses. Upgrade your driver to the latest release | ||
1123 | and it should be fine again. | ||
1124 | Q: I get a message: panic IBM MCA SCSI: command error .... , what can | ||
1125 | I do about this? | ||
1126 | A: Previously, I dealt with this by ignoring command errors using | ||
1127 | ibmmcascsi=forgiveall, but this command no longer exists and is | ||
1128 | obsolete. If such a problem appears, it is caused by some segmentation | ||
1129 | fault of the driver, which maps to some disallowed area. The latest | ||
1130 | version of the driver should be ok, as most bugs have been solved. | ||
1131 | Q: There are still kernel panics, even after having set | ||
1132 | ibmmcascsi=forgiveall. Are there other possibilities to prevent | ||
1133 | such panics? | ||
1134 | A: No, just get the latest release of the driver and it should work | ||
1135 | better and better with increasing version number. Forget about | ||
1136 | ibmmcascsi=forgiveall; it and ignorecmd are both obsolete! | ||
1137 | Q: Linux panics or stops without any comment, but it is probable, that my | ||
1138 | harddisk(s) have bad blocks. | ||
1139 | A: Sorry, the bad-block handling is still a feeble point of this driver, | ||
1140 | but is on the schedule for development in the near future. | ||
1141 | Q: Linux panics while dynamically assigning SCSI-ids or ldns. | ||
1142 | A: If you disconnect a SCSI-device from the machine, while Linux is up | ||
1143 | and the driver uses dynamic reassignment of logical device numbers | ||
1144 | (ldn), it really gets "angry" if it won't find devices, that were still | ||
1145 | present at boottime and stops Linux. | ||
1146 | Q: The system does not recover after an abort-command has been generated. | ||
1147 | A: This is regrettably true, as it is not yet understood, why the | ||
1148 | SCSI-adapter does really NOT generate any interrupt at the end of | ||
1149 | the abort-command. As no interrupt is generated, the abort command | ||
1150 | cannot get finished and the system hangs, sorry, but checks are | ||
1151 | running to hunt down this problem. If there is a real pending command, | ||
1152 | the interrupt MUST get generated after abort. In this case, it | ||
1153 | should finish well. | ||
1154 | Q: The system gets in bad shape after a SCSI-reset, is this known? | ||
1155 | A: Yes, as there are a lot of prescriptions (see the Linux Hackers' | ||
1156 | Guide) what has to be done for reset, we still share the bad shape of | ||
1157 | the reset functions with all other low level SCSI-drivers. | ||
1158 | Astonishingly, reset works in most cases quite ok, but the harddisks | ||
1159 | won't run in synchronous mode anymore after a reset, until you reboot. | ||
1160 | Q: Why does my XXX w/Cache adapter not use read-prefetch? | ||
1161 | A: Ok, that is not completely possible. If a cache is present, the | ||
1162 | adapter tries to use it internally. Explicitly, one can use the cache | ||
1163 | with a read prefetch command, maybe in future, but this requires | ||
1164 | some major overhead of SCSI-commands that risks degrading performance | ||
1165 | more than improving it. Tests with that are running. | ||
1166 | Q: I have an IBM SCSI-2 Fast/Wide adapter, it boots in some way and hangs. | ||
1167 | A: Yes, that is understood, as for sure, your SCSI-2 Fast/Wide adapter | ||
1168 | was in such a case recognized as integrated SCSI-adapter or something | ||
1169 | else, but not as the correct adapter. As the I/O-ports get assigned | ||
1170 | wrongly by that reason, the system should crash in most cases. You | ||
1171 | should upgrade to the latest release of the SCSI-driver. The | ||
1172 | recommended version is 3.2 or later. Here, the F/W support is in | ||
1173 | a stable and reliable condition. Wide-addressing is in addition | ||
1174 | supported. | ||
1175 | Q: I get an Oops message and something like "killing interrupt". | ||
1176 | A: The reason for this is that the IBM SCSI-subsystem only sends a | ||
1177 | termination status back, if some error appeared. In former releases | ||
1178 | of the driver, it was not checked, if the termination status block | ||
1179 | is NULL. From version 3.2 on, this is taken care of. | ||
1180 | Q: I have a F/W adapter and the driver sees my internal SCSI-devices, | ||
1181 | but ignores the external ones. | ||
1182 | A: Select combined busmode in the IBM config-program and check that | ||
1183 | no SCSI-id used by an external device is also used by an internal device. | ||
1184 | Reboot afterwards. Dual busmode is supported, but so far works only for the | ||
1185 | internal bus. External bus is still ignored. Take care with your | ||
1186 | SCSI-ids. If combined bus-mode is activated, on some adapters, | ||
1187 | the wide-addressing is not possible, so devices with ids between 8 | ||
1188 | and 15 get ignored by the driver & adapter! | ||
1189 | Q: I have a 9595 and I get an NMI during heavy SCSI I/O e.g. during fsck. | ||
1190 | A COMMAND ERROR is reported and characters on the screen are missing. | ||
1191 | Warm reboot is not possible. Things look quite weird. | ||
1192 | A: Check the processor type of your 9595. If you have an 80486 or 486DX-2 | ||
1193 | processor complex on your mainboard and you compiled a kernel that | ||
1194 | supports 80386 processors, it is possible, that the kernel cannot | ||
1195 | keep track of the PS/2 interrupt handling and stops on an NMI. Just | ||
1196 | compile a kernel for the correct processor type of your PS/2 and | ||
1197 | everything should be fine. This is necessary even if one assumes, | ||
1198 | that some 80486 system should be downward compatible to 80386 | ||
1199 | software. | ||
1200 | Q: Some commands hang and interrupts block the machine. After some | ||
1201 | timeout, the syslog reports that it tries to call abort, but the | ||
1202 | machine is frozen. | ||
1203 | A: This can be a busy wait bug in the interrupt handler of driver | ||
1204 | version 3.2. You should upgrade to at least 3.2c if you use | ||
1205 | kernel < 2.4.0 and to driver version 4.0 if you use kernel 2.4.0 or | ||
1206 | later (including all test releases). | ||
1207 | Q: I have a PS/2 model 80 and more than 16 MBytes of RAM. The driver | ||
1208 | completely refuses to work, reports NMIs, COMMAND ERRORs or other | ||
1209 | ambiguous stuff. When reducing the RAM size down below 16 MB, | ||
1210 | everything is running smoothly. | ||
1211 | A: No real answer yet. In any case, one should force the kernel to | ||
1212 | present SCBs only below the 16 MBytes barrier. Maybe this solves the | ||
1213 | problem. This has not been tried yet, but it might work. To do this, | ||
1214 | change the unchecked_isa_dma argument in ibmmca.h from 0 to 1. | ||
1215 | |||
1216 | 5.3 Bug reports | ||
1217 | --------------- | ||
1218 | If you really find bugs in the source code or the driver flatly refuses | ||
1219 | to work on your machine, you should send a bug report to me. The | ||
1220 | best way to do this is to follow the instructions on the WWW-page for this | ||
1221 | driver. Fill out the bug-report form on the WWW-page and submit it, | ||
1222 | so the bugs can be dealt with as thoroughly as possible. But, please | ||
1223 | do not send bug reports about this driver to Linus Torvalds or Leonard | ||
1224 | Zubkoff, as Linus is buried in E-Mail and Leonard is supervising all | ||
1225 | SCSI-drivers and won't have the time left to look inside every single | ||
1226 | driver to fix a bug. Especially, DO NOT send modified code to Linus | ||
1227 | Torvalds or Alan J. Cox which has not been checked here!!! They are both | ||
1228 | quite buried in E-mail (as am I, sometimes) and problems should first be | ||
1229 | checked on my local teststand. Recently, I got a lot of | ||
1230 | bug reports for errors in the ibmmca.c code which I could not explain, but | ||
1231 | a look inside some Linux distributions quite often showed me modified | ||
1232 | code, which no longer worked on most machines other than that of the | ||
1233 | modifier. So, now that there is maintenance service available for this | ||
1234 | driver, please use this address first in order to keep the level of | ||
1235 | confusion low. Thank you! | ||
1236 | |||
1237 | When you get a SCSI-error message that panics your system, a list of | ||
1238 | register-entries of the SCSI-subsystem is shown (from Version 3.1d). With | ||
1239 | this list, it is very easy for the maintainer to locate the problem in | ||
1240 | the driver or in the configuration of the user. Please write down all the | ||
1241 | values from this report and send them to the maintainer. This would really | ||
1242 | help a lot and avoids misunderstandings. | ||
1243 | |||
1244 | Use the bug-report form (see 5.4 for its address) to send all the bug | ||
1245 | details to the maintainer or write an e-mail with the values from the table. | ||
1246 | |||
1247 | 5.4 Support WWW-page | ||
1248 | -------------------- | ||
1249 | The address of the IBM SCSI-subsystem supporting WWW-page is: | ||
1250 | |||
1251 | http://www.staff.uni-mainz.de/mlang/linux.html | ||
1252 | |||
1253 | Here you can find info about the background of this driver, patches, | ||
1254 | troubleshooting support, news and a bugreport form. Please check that | ||
1255 | WWW-page regularly for latest hints. If ever this URL changes, please | ||
1256 | refer to the MAINTAINERS file in order to get the latest address. | ||
1257 | |||
1258 | For the bugreport, please fill out the form on the corresponding | ||
1259 | WWW-page. Read the dedicated instructions and write as much as you | ||
1260 | know about your problem. If you do not like such forms, please send | ||
1261 | some e-mail directly, but with at least the same information as required by | ||
1262 | the form. | ||
1263 | |||
1264 | If you have extensive bug reports, including Oops messages and | ||
1265 | screen-shots, please feel free to send them directly to the address | ||
1266 | of the maintainer, too. The current address of the maintainer is: | ||
1267 | |||
1268 | Michael Lang <langa2@kph.uni-mainz.de> | ||
1269 | |||
1270 | 6 References | ||
1271 | ------------ | ||
1272 | IBM Corp., "Update for the PS/2 Hardware Interface Technical Reference, | ||
1273 | Common Interfaces", Armonk, September 1991, PN 04G3281, | ||
1274 | (available in the U.S. for $21.75 at 1-800-IBM-PCTB or in Germany for | ||
1275 | around 40,-DM at "Hallo IBM"). | ||
1276 | |||
1277 | IBM Corp., "Personal System/2 Micro Channel SCSI | ||
1278 | Adapter with Cache Technical Reference", Armonk, March 1990, PN 68X2365. | ||
1279 | |||
1280 | IBM Corp., "Personal System/2 Micro Channel SCSI | ||
1281 | Adapter Technical Reference", Armonk, March 1990, PN 68X2397. | ||
1282 | |||
1283 | IBM Corp., "SCSI-2 Fast/Wide Adapter/A Technical Reference - Dual Bus", | ||
1284 | Armonk, March 1994, PN 83G7545. | ||
1285 | |||
1286 | Friedhelm Schmidt, "SCSI-Bus und IDE-Schnittstelle - Moderne Peripherie- | ||
1287 | Schnittstellen: Hardware, Protokollbeschreibung und Anwendung", 2. Aufl. | ||
1288 | Addison Wesley, 1996. | ||
1289 | |||
1290 | Michael K. Johnson, "The Linux Kernel Hackers' Guide", Version 0.6, Chapel | ||
1291 | Hill - North Carolina, 1995 | ||
1292 | |||
1293 | Andreas Kaiser, "SCSI TAPE BACKUP for OS/2 2.0", Version 2.12, Stuttgart | ||
1294 | 1993 | ||
1295 | |||
1296 | Helmut Rompel, "IBM Computerwelt GUIDE", What is what bei IBM., Systeme * | ||
1297 | Programme * Begriffe, IWT-Verlag GmbH - Muenchen, 1988 | ||
1298 | |||
1299 | 7 Credits to | ||
1300 | ------------ | ||
1301 | 7.1 People | ||
1302 | ---------- | ||
1303 | Klaus Grimm | ||
1304 | who a long time ago gave me the old code from the | ||
1305 | SCSI-driver in order to get it running for some old machine | ||
1306 | in our institute. | ||
1307 | Martin Kolinek | ||
1308 | who wrote the first release of the IBM SCSI-subsystem driver. | ||
1309 | Chris Beauregard | ||
1310 | who for a long time maintained MCA-Linux and the SCSI-driver | ||
1311 | in the beginning. Chris, wherever you are: Cheers to you! | ||
1312 | Klaus Kudielka | ||
1313 | with whom in the 2.1.x times, I had a quite fruitful | ||
1314 | cooperation to get the driver running as a module and to get | ||
1315 | it running with multiple SCSI-adapters. | ||
1316 | David Weinehall | ||
1317 | for his excellent maintenance of the MCA-stuff and the quite | ||
1318 | detailed bug reports and ideas for this driver (and his | ||
1319 | patience ;-)). | ||
1320 | Alan J. Cox | ||
1321 | for his bug reports and his bold activities in cross-checking | ||
1322 | the driver-code with his teststand. | ||
1323 | |||
1324 | 7.2 Sponsors & Supporters | ||
1325 | ------------------------- | ||
1326 | "Hallo IBM", | ||
1327 | IBM-Deutschland GmbH | ||
1328 | the service of IBM-Deutschland for customers. Their E-Mail | ||
1329 | service is unbeatable. Whatever old stuff I asked for, I | ||
1330 | always got some helpful answers. | ||
1331 | Karl-Otto Reimers, | ||
1332 | IBM Klub - Sparte IBM Geschichte, Sindelfingen | ||
1333 | for sending me a copy of the w/Cache manual from the | ||
1334 | IBM-Deutschland archives. | ||
1335 | Harald Staiger | ||
1336 | for his extensive hardware donations which still allow me today | ||
1337 | to test the driver in various constellations. | ||
1338 | Erich Fritscher | ||
1339 | for his very kind sponsoring. | ||
1340 | Louis Ohland, | ||
1341 | Charles Lasitter | ||
1342 | for support by shipping me an IBM SCSI-2 Fast/Wide manual. | ||
1343 | In addition, the contribution of various hardware is quite | ||
1344 | decisive and will make it possible to add FWSR (RAID) | ||
1345 | adapter support to the driver in the near future! So, | ||
1346 | complaints about no RAID support won't remain forever. | ||
1347 | Yes, folks, that is no joke, RAID support is going to rise! | ||
1348 | Erik Weber | ||
1349 | for the great deal we made about a model 9595 and the nice | ||
1350 | surrounding equipment and the cool trip to the Mannheim | ||
1351 | second-hand computer market. In addition, I would like | ||
1352 | to thank him for his exhaustive SCSI-driver testing on his | ||
1353 | 95er PS/2 park. | ||
1354 | Anthony Hogbin | ||
1355 | for his direct shipment of a SCSI F/W adapter, which allowed | ||
1356 | me immediately on the first stage to try it on model 8557 | ||
1357 | together with onboard SCSI adapter and some SCSI w/Cache. | ||
1358 | Andreas Hotz | ||
1359 | for his support by memory and an IBM SCSI-adapter. Collecting | ||
1360 | all this together now allows me to really try things with | ||
1361 | the driver at maximum load and variety on various models in | ||
1362 | a very quick and efficient way. | ||
1363 | Peter Jennewein | ||
1364 | for his model 30, which serves me as part of my teststand | ||
1365 | and his cool remark about how to get an ordinary diskette | ||
1366 | drive working and how to connect it to an IBM-diskette port. | ||
1367 | Johannes Gutenberg-Universitaet, Mainz & | ||
1368 | Institut fuer Kernphysik, Mainz Microtron (MAMI) | ||
1369 | for the offered space, the link, placed on the central | ||
1370 | homepage and the space to store and offer the driver and | ||
1371 | related material and the free working times, which allow | ||
1372 | me to answer all your e-mail. | ||
1373 | |||
1374 | 8 Trademarks | ||
1375 | ------------ | ||
1376 | IBM, PS/2, OS/2, Microchannel are registered trademarks of International | ||
1377 | Business Machines Corporation | ||
1378 | |||
1379 | MS-DOS is a registered trademark of Microsoft Corporation | ||
1380 | |||
1381 | Microware, OS-9 are registered trademarks of Microware Systems | ||
1382 | |||
1383 | 9 Disclaimer | ||
1384 | ------------ | ||
1385 | Besides the GNU General Public License, its associated disclaimers, and the | ||
1386 | disclaimers concerning the Linux kernel in particular, this SCSI-driver comes | ||
1387 | without any warranty. Its functionality has been tested as well as possible on | ||
1388 | certain machines and combinations of computer hardware, which does not rule | ||
1389 | out that data loss or severe hardware damage may occur while using this | ||
1390 | software on some arbitrary computer hardware or in combination | ||
1391 | with other software packages. It is highly recommended to make backup | ||
1392 | copies of your data before using this software. Furthermore, personal | ||
1393 | injuries from hardware defects that could be caused by this SCSI-driver are | ||
1394 | not excluded, and it is highly recommended to handle this driver with the | ||
1395 | utmost care. | ||
1396 | |||
1397 | This driver supports hardware produced by International Business Machines | ||
1398 | Corporation (IBM). | ||
1399 | |||
1400 | ------ | ||
1401 | Michael Lang | ||
1402 | (langa2@kph.uni-mainz.de) | ||
diff --git a/Documentation/serial/computone.txt b/Documentation/serial/computone.txt new file mode 100644 index 00000000000..60a6f657c37 --- /dev/null +++ b/Documentation/serial/computone.txt | |||
@@ -0,0 +1,522 @@ | |||
1 | NOTE: This is an unmaintained driver. It is not guaranteed to work due to | ||
2 | changes made in the tty layer in 2.6. If you wish to take over maintenance of | ||
3 | this driver, contact Michael Warfield <mhw@wittsend.com>. | ||
4 | |||
5 | Changelog: | ||
6 | ---------- | ||
7 | 11-01-2001: Original Document | ||
8 | |||
9 | 10-29-2004: Minor misspelling & format fix, update status of driver. | ||
10 | James Nelson <james4765@gmail.com> | ||
11 | |||
12 | Computone Intelliport II/Plus Multiport Serial Driver | ||
13 | ----------------------------------------------------- | ||
14 | |||
15 | Release Notes For Linux Kernel 2.2 and higher. | ||
16 | These notes are for the drivers which have already been integrated into the | ||
17 | kernel and have been tested on Linux kernels 2.0, 2.2, 2.3, and 2.4. | ||
18 | |||
19 | Version: 1.2.14 | ||
20 | Date: 11/01/2001 | ||
21 | Historical Author: Andrew Manison <amanison@america.net> | ||
22 | Primary Author: Doug McNash | ||
23 | Support: support@computone.com | ||
24 | Fixes and Updates: Mike Warfield <mhw@wittsend.com> | ||
25 | |||
26 | This file assumes that you are using the Computone drivers which are | ||
27 | integrated into the kernel sources. For updating the drivers or installing | ||
28 | drivers into kernels which do not already have Computone drivers, please | ||
29 | refer to the instructions in the README.computone file in the driver patch. | ||
30 | |||
31 | |||
32 | 1. INTRODUCTION | ||
33 | |||
34 | This driver supports the entire family of Intelliport II/Plus controllers | ||
35 | with the exception of the MicroChannel controllers. It does not support | ||
36 | products previous to the Intelliport II. | ||
37 | |||
38 | This driver was developed on the v2.0.x Linux tree and has been tested up | ||
39 | to v2.4.14; it will probably not work with earlier v1.X kernels. | ||
40 | |||
41 | |||
42 | 2. QUICK INSTALLATION | ||
43 | |||
44 | Hardware - If you have an ISA card, find a free interrupt and io port. | ||
45 | List those in use with `cat /proc/interrupts` and | ||
46 | `cat /proc/ioports`. Set the card dip switches to a free | ||
47 | address. You may need to configure your BIOS to reserve an | ||
48 | irq for an ISA card. PCI and EISA parameters are set | ||
49 | automagically. Insert card into computer with the power off | ||
50 | before or after driver installation. | ||
51 | |||
52 | Note the hardware addresses of the Computone ISA cards installed in | ||
53 | the system. These are required for editing ip2.c or editing | ||
54 | /etc/modprobe.conf, or for specification on the modprobe | ||
55 | command line. | ||
56 | |||
57 | Note that /etc/modules.conf should be used for older (pre-2.6) | ||
58 | kernels. | ||
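
As a rough aid for the "find a free interrupt" step above, the following
sketch lists which of the Intelliport II interrupts named in section 3 do
not show up in the interrupt table. The function name ip2_free_irqs is
made up for this illustration and is not part of the driver. Note that an
irq missing from /proc/interrupts is only a candidate: the BIOS may still
have it reserved for something else.

```shell
# Interrupts the Intelliport II can use, as listed in section 3.
IP2_IRQS="3 4 5 7 10 11 12 15"

# Hypothetical helper (not shipped with the driver): print every
# candidate irq that has no entry in the given interrupt table.
# The table defaults to /proc/interrupts.
ip2_free_irqs() {
    table=${1:-/proc/interrupts}
    for irq in $IP2_IRQS; do
        # Lines in the table start with optional blanks, then "<irq>:".
        grep -q "^ *$irq:" "$table" || printf '%s\n' "$irq"
    done
}

# Usage on a running system:
#   ip2_free_irqs
```

The same check can be done by eye with `cat /proc/interrupts`; the
function only saves comparing the two lists by hand.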
59 | |||
60 | Software - | ||
61 | |||
62 | Module installation: | ||
63 | |||
64 | a) Determine free irq/address to use if any (configure BIOS if need be) | ||
65 | b) Run "make config" or "make menuconfig" or "make xconfig" | ||
66 | Select (m) module for CONFIG_COMPUTONE under character | ||
67 | devices. CONFIG_PCI and CONFIG_MODULES also may need to be set. | ||
68 | c) Set address on ISA cards then: | ||
69 | edit /usr/src/linux/drivers/char/ip2.c if needed | ||
70 | or | ||
71 | edit /etc/modprobe.conf if needed (module). | ||
72 | or both to match this setting. | ||
73 | d) Run "make modules" | ||
74 | e) Run "make modules_install" | ||
75 | f) Run "/sbin/depmod -a" | ||
76 | g) Install driver using `modprobe ip2 <options>` (options listed below) | ||
77 | h) Run ip2mkdev (either the script below or the binary version) | ||
78 | |||
79 | |||
80 | Kernel installation: | ||
81 | |||
82 | a) Determine free irq/address to use if any (configure BIOS if need be) | ||
83 | b) Run "make config" or "make menuconfig" or "make xconfig" | ||
84 | Select (y) kernel for CONFIG_COMPUTONE under character | ||
85 | devices. CONFIG_PCI may need to be set if you have PCI bus. | ||
86 | c) Set address on ISA cards then: | ||
87 | edit /usr/src/linux/drivers/char/ip2.c | ||
88 | (Optional - may be specified on kernel command line now) | ||
89 | d) Run "make zImage" or whatever target you prefer. | ||
90 | e) Move /usr/src/linux/arch/x86/boot/zImage to /boot. | ||
91 | f) Add new config for this kernel into /etc/lilo.conf, run "lilo" | ||
92 | or copy to a floppy disk and boot from that floppy disk. | ||
93 | g) Reboot using this kernel | ||
94 | h) Run ip2mkdev (either the script below or the binary version) | ||
95 | |||
96 | Kernel command line options: | ||
97 | |||
98 | When compiling the driver into the kernel, io and irq may be | ||
99 | compiled into the driver by editing ip2.c and setting the values for | ||
100 | io and irq in the appropriate array. An alternative is to specify | ||
101 | a command line parameter to the kernel at boot up. | ||
102 | |||
103 | ip2=io0,irq0,io1,irq1,io2,irq2,io3,irq3 | ||
104 | |||
105 | Note that this order is very different from the specifications for the | ||
106 | modload parameters which have separate IRQ and IO specifiers. | ||
107 | |||
108 | The io port also selects PCI (1) and EISA (2) boards. | ||
109 | |||
110 | io=0 No board | ||
111 | io=1 PCI board | ||
112 | io=2 EISA board | ||
113 | else ISA board io address | ||
114 | |||
115 | You only need to specify the boards which are present. | ||
116 | |||
117 | Examples: | ||
118 | |||
119 | 2 PCI boards: | ||
120 | |||
121 | ip2=1,0,1,0 | ||
122 | |||
123 | 1 ISA board at 0x310 irq 5: | ||
124 | |||
125 | ip2=0x310,5 | ||
126 | |||
127 | This can be added to an "append" option in lilo.conf similar to this: | ||
128 | |||
129 | append="ip2=1,0,1,0" | ||
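
The meaning of the io value in the table above can be spelled out as a
small shell function. This is only an illustration of the table; the name
ip2_board_type is invented here and nothing like it exists in the driver.

```shell
# Hypothetical helper: describe what an "io" value selects,
# following the table in this section.
ip2_board_type() {
    case $1 in
        0) echo "No board" ;;
        1) echo "PCI board" ;;
        2) echo "EISA board" ;;
        *) echo "ISA board at io address $1" ;;
    esac
}

# The ISA example from above, ip2=0x310,5:
ip2_board_type 0x310
```

Applied to the first example, ip2=1,0,1,0, both io values are 1, so both
entries select PCI boards and the irq fields are left at 0.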
130 | |||
131 | |||
132 | 3. INSTALLATION | ||
133 | |||
134 | Previously, the driver sources were packaged with a set of patch files | ||
135 | to update the character drivers' makefile and configuration file, and other | ||
136 | kernel source files. A build script (ip2build) was included which applied | ||
137 | the patches if needed, and built any utilities needed. | ||
138 | What you receive may instead be a single patch file in conventional kernel | ||
139 | patch format. That form can be applied by | ||
140 | running patch -p1 < ThePatchFile. Otherwise run ip2build. | ||
141 | |||
142 | The driver can be installed as a module (recommended) or built into the | ||
143 | kernel. This is selected as for other drivers through the `make config` | ||
144 | command from the root of the Linux source tree. If the driver is built | ||
145 | into the kernel you will need to edit the file ip2.c to match the boards | ||
146 | you are installing. See that file for instructions. If the driver is | ||
147 | installed as a module the configuration can also be specified on the | ||
148 | modprobe command line as follows: | ||
149 | |||
150 | modprobe ip2 irq=irq1,irq2,irq3,irq4 io=addr1,addr2,addr3,addr4 | ||
151 | |||
152 | where irq1-4 are each one of the valid Intelliport II interrupts (3,4,5,7,10,11, | ||
153 | 12,15) and addr1-4 are the base addresses for up to four controllers. If | ||
154 | the irqs are not specified the driver uses the default in ip2.c (which | ||
155 | selects polled mode). If no base addresses are specified the defaults in | ||
156 | ip2.c are used. If you are autoloading the driver module with kerneld or | ||
157 | kmod, the base addresses and interrupt numbers must either be set in ip2.c | ||
158 | (and the driver recompiled) or given in an options line in /etc/modprobe.conf, or both. | ||
159 | The options line is equivalent to the command line and takes precedence over | ||
160 | what is in ip2.c. | ||
161 | |||
162 | /etc/modprobe.conf sample: | ||
163 | options ip2 io=1,0x328 irq=1,10 | ||
164 | alias char-major-71 ip2 | ||
165 | alias char-major-72 ip2 | ||
166 | alias char-major-73 ip2 | ||
167 | |||
168 | The equivalent in ip2.c: | ||
169 | |||
170 | static int io[IP2_MAX_BOARDS]= { 1, 0x328, 0, 0 }; | ||
171 | static int irq[IP2_MAX_BOARDS] = { 1, 10, -1, -1 }; | ||
172 | |||
173 | The equivalent for the kernel command line (in lilo.conf): | ||
174 | |||
175 | append="ip2=1,1,0x328,10" | ||
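
Because the kernel command line interleaves io and irq while the module
options keep them in two separate lists, it is easy to mix the two up.
The sketch below converts one form into the other. The function name
ip2_cmdline is hypothetical, and filling a missing irq with 0 is an
assumption taken from the PCI example in section 2, not documented
driver behaviour.

```shell
# Hypothetical helper: turn module-style lists (io=..., irq=...) into
# the io0,irq0,io1,irq1,... order of the ip2= kernel parameter.
ip2_cmdline() {
    io_list=$1
    irq_list=$2
    out=
    # Split the io list on commas into the positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $io_list
    IFS=$old_ifs
    for io in "$@"; do
        # Take the first remaining irq and drop it from the list.
        irq=${irq_list%%,*}
        case $irq_list in
            *,*) irq_list=${irq_list#*,} ;;
            *)   irq_list= ;;
        esac
        # Assumption: a missing irq is written as 0.
        out="${out:+$out,}$io,${irq:-0}"
    done
    printf 'ip2=%s\n' "$out"
}

# The sample from above: options ip2 io=1,0x328 irq=1,10
ip2_cmdline 1,0x328 1,10
```

For the sample above this prints ip2=1,1,0x328,10, the same string used
in the lilo.conf append line.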
176 | |||
177 | |||
178 | Note: Both io and irq should be updated to reflect YOUR system. An "io" | ||
179 | address of 1 or 2 indicates a PCI or EISA card in the board table. | ||
180 | The PCI or EISA irq will be assigned automatically. | ||
181 | |||
182 | Specifying an invalid or in-use irq will default the driver into | ||
183 | running in polled mode for that card. If all irq entries are 0 then | ||
184 | all cards will operate in polled mode. | ||
185 | |||
186 | If you select the driver as part of the kernel run : | ||
187 | |||
188 | make zlilo (or whatever you do to create a bootable kernel) | ||
189 | |||
190 | If you selected a module run : | ||
191 | |||
192 | make modules && make modules_install | ||
193 | |||
194 | The utility ip2mkdev (see 5 and 7 below) creates all the device nodes | ||
195 | required by the driver. For a device to be created it must be configured | ||
196 | in the driver and the board must be installed. Only devices corresponding | ||
197 | to real IntelliPort II ports are created. With multiple boards and expansion | ||
198 | boxes this will leave gaps in the sequence of device names. ip2mkdev uses | ||
199 | Linux tty naming conventions: ttyF0 - ttyF255 for normal devices, and | ||
200 | cuf0 - cuf255 for callout devices. | ||
201 | |||
202 | |||
203 | 4. USING THE DRIVERS | ||
204 | |||
205 | As noted above, the driver implements the ports in accordance with Linux | ||
206 | conventions, and the devices should be interchangeable with the standard | ||
207 | serial devices. (This is a key point for problem reporting: please make | ||
208 | sure that what you are trying to do works on the ttySx/cuax ports first; then | ||
209 | tell us what went wrong with the ip2 ports!) | ||
210 | |||
211 | Higher speeds can be obtained using the setserial utility which remaps | ||
212 | 38,400 bps (extb) to 57,600 bps, 115,200 bps, or a custom speed. | ||
213 | Intelliport II installations using the PowerPort expansion module can | ||
214 | use the custom speed setting to select the highest speeds: 153,600 bps, | ||
215 | 230,400 bps, 307,200 bps, 460,800bps and 921,600 bps. The base for | ||
216 | custom baud rate configuration is fixed at 921,600 for cards/expansion | ||
217 | modules with ST654's and 115200 for those with Cirrus CD1400's. This | ||
218 | corresponds to the maximum bit rates those chips are capable of. | ||
219 | For example if the baud base is 921600 and the baud divisor is 18 then | ||
220 | the custom rate is 921600/18 = 51200 bps. See the setserial man page for | ||
221 | complete details. Of course if stty accepts the higher rates now you can | ||
222 | use that as well as the standard ioctls(). | ||
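
The divisor arithmetic above can be checked with a few lines of shell.
This is a sketch only: ip2_divisor is a name made up for this example,
and rejecting rates that do not divide the baud base evenly is a choice
made here, not something the driver is known to do.

```shell
# Hypothetical helper: divisor needed to get a wanted rate from a
# given baud base. Fails if the base is not an exact multiple.
ip2_divisor() {
    base=$1
    rate=$2
    if [ $(( base % rate )) -ne 0 ]; then
        echo "rate $rate does not divide baud base $base evenly" >&2
        return 1
    fi
    echo $(( base / rate ))
}

# Divisors for the PowerPort speeds named above, baud base 921600.
for rate in 153600 230400 307200 460800 921600; do
    printf '%7d bps -> divisor %d\n' "$rate" "$(ip2_divisor 921600 "$rate")"
done
```

The divisor found this way is what would be given to setserial together
with its custom speed setting; see the setserial man page for the exact
option names on your system.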
223 | |||
224 | |||
225 | 5. ip2mkdev and assorted utilities... | ||
226 | |||
227 | Several utilities, including the source for a binary ip2mkdev utility, are | ||
228 | available under .../drivers/char/ip2. These can be built by changing to | ||
229 | that directory and typing "make" after the kernel has been built. If you do | ||
230 | not wish to compile the binary utilities, the shell script below can be | ||
231 | cut out and run as "ip2mkdev" to create the necessary device files. To | ||
232 | use the ip2mkdev script, you must have procfs enabled and the proc file | ||
233 | system mounted on /proc. | ||
234 | |||
235 | |||
236 | 6. NOTES | ||
237 | |||
238 | This is a release version of the driver, but it is impossible to test it | ||
239 | in all configurations of Linux. If there is any anomalous behaviour that | ||
240 | does not match the standard serial port's behaviour please let us know. | ||
241 | |||
242 | |||
243 | 7. ip2mkdev shell script | ||
244 | |||
245 | Previously, this script was simply attached here. It is now attached as a | ||
246 | shar archive to make it easier to extract the script from the documentation. | ||
247 | To create the ip2mkdev shell script change to a convenient directory (/tmp | ||
248 | works just fine) and run the following command: | ||
249 | |||
250 | unshar Documentation/serial/computone.txt | ||
251 | (This file) | ||
252 | |||
253 | You should now have a file ip2mkdev in your current working directory with | ||
254 | permissions set to execute. Running that script will then create the | ||
255 | necessary devices for the Computone boards, interfaces, and ports which | ||
256 | are present on your system at the time it is run. | ||
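
If unshar is not installed, the archive can also be unpacked by hand, as
its own header comments describe: keep everything from the interpreter
line onward and feed it to sh. The following is a minimal sketch of that
step; ip2_extract_shar is a hypothetical name, and it assumes the archive
starts at the first line consisting of nothing but the interpreter line.

```shell
# Hypothetical helper: print a file from the first line that is exactly
# the sh interpreter line through to the end of the file.
ip2_extract_shar() {
    sed -n '/^#!\/bin\/sh$/,$p' "$1"
}

# Usage, run from the top of the kernel source tree:
#   ip2_extract_shar Documentation/serial/computone.txt > /tmp/ip2mkdev.shar
#   cd /tmp && sh ip2mkdev.shar
```

The interpreter line of the embedded ip2mkdev script carries a trailing
" -", so it is not mistaken for the start of the archive.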
257 | |||
258 | |||
259 | #!/bin/sh | ||
260 | # This is a shell archive (produced by GNU sharutils 4.2.1). | ||
261 | # To extract the files from this archive, save it to some FILE, remove | ||
262 | # everything before the `!/bin/sh' line above, then type `sh FILE'. | ||
263 | # | ||
264 | # Made on 2001-10-29 10:32 EST by <mhw@alcove.wittsend.com>. | ||
265 | # Source directory was `/home2/src/tmp'. | ||
266 | # | ||
267 | # Existing files will *not* be overwritten unless `-c' is specified. | ||
268 | # | ||
269 | # This shar contains: | ||
270 | # length mode name | ||
271 | # ------ ---------- ------------------------------------------ | ||
272 | # 4251 -rwxr-xr-x ip2mkdev | ||
273 | # | ||
274 | save_IFS="${IFS}" | ||
275 | IFS="${IFS}:" | ||
276 | gettext_dir=FAILED | ||
277 | locale_dir=FAILED | ||
278 | first_param="$1" | ||
279 | for dir in $PATH | ||
280 | do | ||
281 | if test "$gettext_dir" = FAILED && test -f $dir/gettext \ | ||
282 | && ($dir/gettext --version >/dev/null 2>&1) | ||
283 | then | ||
284 | set `$dir/gettext --version 2>&1` | ||
285 | if test "$3" = GNU | ||
286 | then | ||
287 | gettext_dir=$dir | ||
288 | fi | ||
289 | fi | ||
290 | if test "$locale_dir" = FAILED && test -f $dir/shar \ | ||
291 | && ($dir/shar --print-text-domain-dir >/dev/null 2>&1) | ||
292 | then | ||
293 | locale_dir=`$dir/shar --print-text-domain-dir` | ||
294 | fi | ||
295 | done | ||
296 | IFS="$save_IFS" | ||
297 | if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED | ||
298 | then | ||
299 | echo=echo | ||
300 | else | ||
301 | TEXTDOMAINDIR=$locale_dir | ||
302 | export TEXTDOMAINDIR | ||
303 | TEXTDOMAIN=sharutils | ||
304 | export TEXTDOMAIN | ||
305 | echo="$gettext_dir/gettext -s" | ||
306 | fi | ||
307 | if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then | ||
308 | shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"' | ||
309 | elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then | ||
310 | shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"' | ||
311 | elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then | ||
312 | shar_touch='touch -am $3$4$5$6$2 "$8"' | ||
313 | else | ||
314 | shar_touch=: | ||
315 | echo | ||
316 | $echo 'WARNING: not restoring timestamps. Consider getting and' | ||
317 | $echo "installing GNU \`touch', distributed in GNU File Utilities..." | ||
318 | echo | ||
319 | fi | ||
320 | rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch | ||
321 | # | ||
322 | if mkdir _sh17581; then | ||
323 | $echo 'x -' 'creating lock directory' | ||
324 | else | ||
325 | $echo 'failed to create lock directory' | ||
326 | exit 1 | ||
327 | fi | ||
328 | # ============= ip2mkdev ============== | ||
329 | if test -f 'ip2mkdev' && test "$first_param" != -c; then | ||
330 | $echo 'x -' SKIPPING 'ip2mkdev' '(file already exists)' | ||
331 | else | ||
332 | $echo 'x -' extracting 'ip2mkdev' '(text)' | ||
333 | sed 's/^X//' << 'SHAR_EOF' > 'ip2mkdev' && | ||
334 | #!/bin/sh - | ||
335 | # | ||
336 | # ip2mkdev | ||
337 | # | ||
338 | # Make or remove devices as needed for Computone Intelliport drivers | ||
339 | # | ||
340 | # First rule! If the dev file exists and you need it, don't mess | ||
341 | # with it. That prevents us from screwing up open ttys, ownership | ||
342 | # and permissions on a running system! | ||
343 | # | ||
344 | # This script will NOT remove devices that no longer exist if their | ||
345 | # board or interface box has been removed. If you want to get rid | ||
346 | # of them, you can manually do an "rm -f /dev/ttyF* /dev/cuaf*" | ||
347 | # before running this script. Running this script will then recreate | ||
348 | # all the valid devices. | ||
349 | # | ||
350 | # Michael H. Warfield | ||
351 | # /\/\|=mhw=|\/\/ | ||
352 | # mhw@wittsend.com | ||
353 | # | ||
354 | # Updated 10/29/2000 for version 1.2.13 naming convention | ||
355 | # under devfs. /\/\|=mhw=|\/\/ | ||
356 | # | ||
357 | # Updated 03/09/2000 for devfs support in ip2 drivers. /\/\|=mhw=|\/\/ | ||
358 | # | ||
359 | X | ||
360 | if test -d /dev/ip2 ; then | ||
361 | # This is devfs mode... We don't do anything except create symlinks | ||
362 | # from the real devices to the old names! | ||
363 | X cd /dev | ||
364 | X echo "Creating symbolic links to devfs devices" | ||
365 | X for i in `ls ip2` ; do | ||
366 | X if test ! -L ip2$i ; then | ||
367 | X # Remove it incase it wasn't a symlink (old device) | ||
368 | X rm -f ip2$i | ||
369 | X ln -s ip2/$i ip2$i | ||
370 | X fi | ||
371 | X done | ||
372 | X for i in `( cd tts ; ls F* )` ; do | ||
373 | X if test ! -L tty$i ; then | ||
374 | X # Remove it incase it wasn't a symlink (old device) | ||
375 | X rm -f tty$i | ||
376 | X ln -s tts/$i tty$i | ||
377 | X fi | ||
378 | X done | ||
379 | X for i in `( cd cua ; ls F* )` ; do | ||
380 | X DEVNUMBER=`expr $i : 'F\(.*\)'` | ||
381 | X if test ! -L cuf$DEVNUMBER ; then | ||
382 | X # Remove it incase it wasn't a symlink (old device) | ||
383 | X rm -f cuf$DEVNUMBER | ||
384 | X ln -s cua/$i cuf$DEVNUMBER | ||
385 | X fi | ||
386 | X done | ||
387 | X exit 0 | ||
388 | fi | ||
389 | X | ||
390 | if test ! -f /proc/tty/drivers | ||
391 | then | ||
392 | X echo "\ | ||
393 | Unable to check driver status. | ||
394 | Make sure proc file system is mounted." | ||
395 | X | ||
396 | X exit 255 | ||
397 | fi | ||
398 | X | ||
399 | if test ! -f /proc/tty/driver/ip2 | ||
400 | then | ||
401 | X echo "\ | ||
402 | Unable to locate ip2 proc file. | ||
403 | Attempting to load driver" | ||
404 | X | ||
405 | X if /sbin/insmod ip2 | ||
406 | X then | ||
407 | X if test ! -f /proc/tty/driver/ip2 | ||
408 | X then | ||
409 | X echo "\ | ||
410 | Unable to locate ip2 proc file after loading driver. | ||
411 | Driver initialization failure or driver version error. | ||
412 | " | ||
413 | X exit 255 | ||
414 | X fi | ||
415 | X else | ||
416 | X echo "Unable to load ip2 driver." | ||
417 | X exit 255 | ||
418 | X fi | ||
419 | fi | ||
420 | X | ||
421 | # Ok... So we got the driver loaded and we can locate the procfs files. | ||
422 | # Next we need our major numbers. | ||
423 | X | ||
424 | TTYMAJOR=`sed -e '/^ip2/!d' -e '/\/dev\/tt/!d' -e 's/.*tt[^ ]*[ ]*\([0-9]*\)[ ]*.*/\1/' < /proc/tty/drivers` | ||
425 | CUAMAJOR=`sed -e '/^ip2/!d' -e '/\/dev\/cu/!d' -e 's/.*cu[^ ]*[ ]*\([0-9]*\)[ ]*.*/\1/' < /proc/tty/drivers` | ||
426 | BRDMAJOR=`sed -e '/^Driver: /!d' -e 's/.*IMajor=\([0-9]*\)[ ]*.*/\1/' < /proc/tty/driver/ip2` | ||
427 | X | ||
428 | echo "\ | ||
429 | TTYMAJOR = $TTYMAJOR | ||
430 | CUAMAJOR = $CUAMAJOR | ||
431 | BRDMAJOR = $BRDMAJOR | ||
432 | " | ||
433 | X | ||
434 | # Ok... Now we should know our major numbers, if appropriate... | ||
435 | # Now we need our boards and start the device loops. | ||
436 | X | ||
437 | grep '^Board [0-9]:' /proc/tty/driver/ip2 | while read token number type alltherest | ||
438 | do | ||
439 | X # The test for blank "type" will catch the stats lead-in lines | ||
440 | X # if they exist in the file | ||
441 | X if test "$type" = "vacant" -o "$type" = "Vacant" -o "$type" = "" | ||
442 | X then | ||
443 | X continue | ||
444 | X fi | ||
445 | X | ||
446 | X BOARDNO=`expr "$number" : '\([0-9]\):'` | ||
447 | X PORTS=`expr "$alltherest" : '.*ports=\([0-9]*\)' | tr ',' ' '` | ||
448 | X MINORS=`expr "$alltherest" : '.*minors=\([0-9,]*\)' | tr ',' ' '` | ||
449 | X | ||
450 | X if test "$BOARDNO" = "" -o "$PORTS" = "" | ||
451 | X then | ||
452 | # This may be a bug. We should at least get this much information | ||
453 | X echo "Unable to process board line" | ||
454 | X continue | ||
455 | X fi | ||
456 | X | ||
457 | X if test "$MINORS" = "" | ||
458 | X then | ||
459 | # Silently skip this one. This board seems to have no boxes | ||
460 | X continue | ||
461 | X fi | ||
462 | X | ||
463 | X echo "board $BOARDNO: $type ports = $PORTS; port numbers = $MINORS" | ||
464 | X | ||
465 | X if test "$BRDMAJOR" != "" | ||
466 | X then | ||
467 | X BRDMINOR=`expr $BOARDNO \* 4` | ||
468 | X STSMINOR=`expr $BRDMINOR + 1` | ||
469 | X if test ! -c /dev/ip2ipl$BOARDNO ; then | ||
470 | X mknod /dev/ip2ipl$BOARDNO c $BRDMAJOR $BRDMINOR | ||
471 | X fi | ||
472 | X if test ! -c /dev/ip2stat$BOARDNO ; then | ||
473 | X mknod /dev/ip2stat$BOARDNO c $BRDMAJOR $STSMINOR | ||
474 | X fi | ||
475 | X fi | ||
476 | X | ||
477 | X if test "$TTYMAJOR" != "" | ||
478 | X then | ||
479 | X PORTNO=$BOARDBASE | ||
480 | X | ||
481 | X for PORTNO in $MINORS | ||
482 | X do | ||
483 | X if test ! -c /dev/ttyF$PORTNO ; then | ||
484 | X # We got the hardware but no device - make it | ||
485 | X mknod /dev/ttyF$PORTNO c $TTYMAJOR $PORTNO | ||
486 | X fi | ||
487 | X done | ||
488 | X fi | ||
489 | X | ||
490 | X if test "$CUAMAJOR" != "" | ||
491 | X then | ||
492 | X PORTNO=$BOARDBASE | ||
493 | X | ||
494 | X for PORTNO in $MINORS | ||
495 | X do | ||
496 | X if test ! -c /dev/cuf$PORTNO ; then | ||
497 | X # We got the hardware but no device - make it | ||
498 | X mknod /dev/cuf$PORTNO c $CUAMAJOR $PORTNO | ||
499 | X fi | ||
500 | X done | ||
501 | X fi | ||
502 | done | ||
503 | X | ||
504 | Xexit 0 | ||
505 | SHAR_EOF | ||
506 | (set 20 01 10 29 10 32 01 'ip2mkdev'; eval "$shar_touch") && | ||
507 | chmod 0755 'ip2mkdev' || | ||
508 | $echo 'restore of' 'ip2mkdev' 'failed' | ||
509 | if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \ | ||
510 | && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then | ||
511 | md5sum -c << SHAR_EOF >/dev/null 2>&1 \ | ||
512 | || $echo 'ip2mkdev:' 'MD5 check failed' | ||
513 | cb5717134509f38bad9fde6b1f79b4a4 ip2mkdev | ||
514 | SHAR_EOF | ||
515 | else | ||
516 | shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'ip2mkdev'`" | ||
517 | test 4251 -eq "$shar_count" || | ||
518 | $echo 'ip2mkdev:' 'original size' '4251,' 'current size' "$shar_count!" | ||
519 | fi | ||
520 | fi | ||
521 | rm -fr _sh17581 | ||
522 | exit 0 | ||
diff --git a/Documentation/sparc/README-2.5 b/Documentation/sparc/README-2.5 new file mode 100644 index 00000000000..806fe490a56 --- /dev/null +++ b/Documentation/sparc/README-2.5 | |||
@@ -0,0 +1,46 @@ | |||
1 | BTFIXUP | ||
2 | ------- | ||
3 | |||
4 | To build new kernels you have to issue "make image". The ready kernel | ||
5 | in ELF format is placed in arch/sparc/boot/image. Explanation is below. | ||
6 | |||
7 | BTFIXUP is a unique feature of Linux/sparc among other architectures, | ||
8 | developed by Jakub Jelinek (I think... Obviously David S. Miller took | ||
9 | part, too). It allows booting the same kernel on different | ||
10 | sub-architectures, such as sun4c, sun4m, sun4d, where SunOS uses | ||
11 | different kernels. This feature is convenient for people who move | ||
12 | disks between boxes and for distribution builders. | ||
13 | |||
14 | To function, BTFIXUP must link the kernel "in the draft" first, | ||
15 | analyze the result, write a special stub code based on that, and | ||
16 | build the final kernel with the stub (btfix.o). | ||
17 | |||
18 | Kai Germaschewski improved the build system of the kernel in the 2.5 series | ||
19 | significantly. Unfortunately, the traditional way of running the draft | ||
20 | linking from architecture specific Makefile before the actual linking | ||
21 | by generic Makefile is nearly impossible to support properly in the | ||
22 | new build system. Therefore, the way we integrate BTFIXUP with the | ||
23 | build system was changed in 2.5.40. Now, generic Makefile performs | ||
24 | the draft linking and stores the result in file vmlinux. Architecture | ||
25 | specific post-processing invokes BTFIXUP machinery and final linking | ||
26 | in the same way as other architectures do bootstraps. | ||
27 | |||
28 | Implications of that change are as follows. | ||
29 | |||
30 | 1. Hackers must type "make image" now, instead of just "make", in the same | ||
31 | way as s390 people do now. It is analogous to "make bzImage" on i386. | ||
32 | This does NOT affect sparc64, you continue to use "make" to build sparc64 | ||
33 | kernels. | ||
34 | |||
35 | 2. vmlinux is not the final kernel, so RPM builders have to adjust | ||
36 | their spec files (if they delivered vmlinux for debugging). | ||
37 | System.map generated for vmlinux is still valid. | ||
38 | |||
39 | 3. Scripts that produce a.out images have to be changed. First, if they | ||
40 | invoke make, they have to use "make image". Second, they have to pick up | ||
41 | the new kernel in arch/sparc/boot/image instead of vmlinux. | ||
42 | |||
43 | 4. Since we are compliant with Kai's build system now, make -j is permitted. | ||
44 | |||
45 | -- Pete Zaitcev | ||
46 | zaitcev@yahoo.com | ||
diff --git a/Documentation/telephony/00-INDEX b/Documentation/telephony/00-INDEX new file mode 100644 index 00000000000..4ffe0ed5b6f --- /dev/null +++ b/Documentation/telephony/00-INDEX | |||
@@ -0,0 +1,4 @@ | |||
1 | 00-INDEX | ||
2 | - this file. | ||
3 | ixj.txt | ||
4 | - document describing the Quicknet drivers. | ||
diff --git a/Documentation/telephony/ixj.txt b/Documentation/telephony/ixj.txt new file mode 100644 index 00000000000..db94fb6c567 --- /dev/null +++ b/Documentation/telephony/ixj.txt | |||
@@ -0,0 +1,394 @@ | |||
1 | Linux Quicknet-Drivers-Howto | ||
2 | Quicknet Technologies, Inc. (www.quicknet.net) | ||
3 | Version 0.3.4 December 18, 1999 | ||
4 | |||
5 | 1.0 Introduction | ||
6 | |||
7 | This document describes the first GPL release version of the Linux | ||
8 | driver for the Quicknet Internet PhoneJACK and Internet LineJACK | ||
9 | cards. More information about these cards is available at | ||
10 | www.quicknet.net. The driver version discussed in this document is | ||
11 | 0.3.4. | ||
12 | |||
13 | These cards offer nice telco style interfaces to use your standard | ||
14 | telephone/key system/PBX as the user interface for VoIP applications. | ||
15 | The Internet LineJACK also offers PSTN connectivity for a single line | ||
16 | Internet to PSTN gateway. Of course, you can add more than one card | ||
17 | to a system to obtain multi-line functionality. At this time, the | ||
18 | driver supports the POTS port on both the Internet PhoneJACK and the | ||
19 | Internet LineJACK, but the PSTN port on the latter card is not yet | ||
20 | supported. | ||
21 | |||
22 | This document, and the drivers for the cards, are intended for a | ||
23 | limited audience that includes technically capable programmers who | ||
24 | would like to experiment with Quicknet cards. The drivers are | ||
25 | considered in ALPHA status and are not yet considered stable enough | ||
26 | for general, widespread use in an unlimited audience. | ||
27 | |||
28 | That's worth saying again: | ||
29 | |||
30 | THE LINUX DRIVERS FOR QUICKNET CARDS ARE PRESENTLY IN AN ALPHA STATE | ||
31 | AND SHOULD NOT BE CONSIDERED AS READY FOR NORMAL WIDESPREAD USE. | ||
32 | |||
33 | They are released early in the spirit of Internet development and to | ||
34 | make this technology available to innovators who would benefit from | ||
35 | early exposure. | ||
36 | |||
37 | When we promote the device driver to "beta" level it will be | ||
38 | considered ready for non-programmer, non-technical users. Until then, | ||
39 | please be aware that these drivers may not be stable and may affect | ||
40 | the performance of your system. | ||
41 | |||
42 | |||
43 | 1.1 Latest Additions/Improvements | ||
44 | |||
45 | The 0.3.4 version of the driver is the first GPL release. Several | ||
46 | features had to be removed from the prior binary only module, mostly | ||
47 | for reasons of Intellectual Property rights. We can't release | ||
48 | information that is not ours - so certain aspects of the driver had to | ||
49 | be removed to protect the rights of others. | ||
50 | |||
51 | Specifically, very old Internet PhoneJACK cards have non-standard | ||
52 | G.723.1 codecs (due to the early nature of the DSPs in those days). | ||
53 | The auto-conversion code to bring those cards into compliance with | ||
54 | today's standards is available as a binary only module to those people | ||
55 | needing it. If you bought your card after 1997 or so, you are OK - | ||
56 | it's only the very old cards that are affected. | ||
57 | |||
58 | Also, the code to download G.728/G.729/G.729a codecs to the DSP is | ||
59 | available as a binary only module as well. This IP is not ours to | ||
60 | release. | ||
61 | |||
62 | Hooks are built into the GPL driver to allow it to work with other | ||
63 | companion modules that are completely separate from this module. | ||
64 | |||
65 | 1.2 Copyright, Trademarks, Disclaimer, & Credits | ||
66 | |||
67 | Copyright | ||
68 | |||
69 | Copyright (c) 1999 Quicknet Technologies, Inc. Permission is granted | ||
70 | to freely copy and distribute this document provided you preserve it | ||
71 | in its original form. For corrections and minor changes contact the | ||
72 | maintainer at linux@quicknet.net. | ||
73 | |||
74 | Trademarks | ||
75 | |||
76 | Internet PhoneJACK and Internet LineJACK are registered trademarks of | ||
77 | Quicknet Technologies, Inc. | ||
78 | |||
79 | Disclaimer | ||
80 | |||
81 | Much of the info in this HOWTO is early information released by | ||
82 | Quicknet Technologies, Inc. for the express purpose of allowing early | ||
83 | testing and use of the Linux drivers developed for their products. | ||
84 | While every attempt has been made to be thorough, complete and | ||
85 | accurate, the information contained here may be unreliable and there | ||
86 | are likely a number of errors in this document. Please let the | ||
87 | maintainer know about them. Since this is free documentation, it | ||
88 | should be obvious that neither I nor previous authors can be held | ||
89 | legally responsible for any errors. | ||
90 | |||
91 | Credits | ||
92 | |||
93 | This HOWTO was written by: | ||
94 | |||
95 | Greg Herlein <gherlein@quicknet.net> | ||
96 | Ed Okerson <eokerson@quicknet.net> | ||
97 | |||
98 | 1.3 Future Plans: You Can Help | ||
99 | |||
100 | Please let the maintainer know of any errors in facts, opinions, | ||
101 | logic, spelling, grammar, clarity, links, etc. But first, if the date | ||
102 | is over a month old, check to see that you have the latest | ||
103 | version. Please send any info that you think belongs in this document. | ||
104 | |||
105 | You can also contribute code and/or bug-fixes for the sample | ||
106 | applications. | ||
107 | |||
108 | |||
109 | 1.4 Where to get things | ||
110 | |||
111 | Info on the latest versions of the driver is here: | ||
112 | |||
113 | http://web.archive.org/web/*/http://www.quicknet.net/develop.htm | ||
114 | |||
115 | 1.5 Mailing List | ||
116 | |||
117 | Quicknet operates a mailing list to provide a public forum on using | ||
118 | these drivers. | ||
119 | |||
120 | To subscribe to the linux-sdk mailing list, send an email to: | ||
121 | |||
122 | majordomo@linux.quicknet.net | ||
123 | |||
124 | In the body of the email, type: | ||
125 | |||
126 | subscribe linux-sdk <your-email-address> | ||
127 | |||
128 | Please delete any signature block that you would normally add to the | ||
129 | bottom of your email - it tends to confuse majordomo. | ||
130 | |||
131 | To send mail to the list, address your mail to | ||
132 | |||
133 | linux-sdk@linux.quicknet.net | ||
134 | |||
135 | Your message will go out to everyone on the list. | ||
136 | |||
137 | To unsubscribe from the linux-sdk mailing list, send an email to: | ||
138 | |||
139 | majordomo@linux.quicknet.net | ||
140 | |||
141 | In the body of the email, type: | ||
142 | |||
143 | unsubscribe linux-sdk <your-email-address> | ||
144 | |||
145 | |||
146 | |||
147 | 2.0 Requirements | ||
148 | |||
149 | 2.1 Quicknet Card(s) | ||
150 | |||
151 | You will need at least one Internet PhoneJACK or Internet LineJACK | ||
152 | card. These are ISA or PCI bus devices that use Plug-n-Play for | ||
153 | configuration, and use no IRQs. The driver will support up to 16 | ||
154 | cards in any one system, of any mix between the two types. | ||
155 | |||
156 | Note that you will need two cards to do any useful testing alone, since | ||
157 | you will need a card on both ends of the connection. Of course, if | ||
158 | you are doing collaborative work, perhaps your friends or coworkers | ||
159 | have cards too. If not, we'll gladly sell them some! | ||
160 | |||
161 | |||
162 | 2.2 ISAPNP | ||
163 | |||
164 | Since the Quicknet cards are Plug-n-Play devices, you will need the | ||
165 | isapnp tools package to configure the cards, or you can use the isapnp | ||
166 | module to autoconfigure them. The former package probably came with | ||
167 | your Linux distribution. Documentation on this package is available | ||
168 | online at: | ||
169 | |||
170 | http://mailer.wiwi.uni-marburg.de/linux/LDP/HOWTO/Plug-and-Play-HOWTO.html | ||
171 | |||
172 | The isapnp autoconfiguration is available on the Quicknet website at: | ||
173 | |||
174 | http://www.quicknet.net/develop.htm | ||
175 | |||
176 | though it may be in the kernel by the time you read this. | ||
177 | |||
178 | |||
179 | 3.0 Card Configuration | ||
180 | |||
181 | If you did not get your drivers as part of the linux kernel, do the | ||
182 | following to install them: | ||
183 | |||
184 | a. untar the distribution file. We use the following command: | ||
185 | tar -xvzf ixj-0.x.x.tgz | ||
186 | |||
187 | This creates a subdirectory holding all the necessary files. Go to that | ||
188 | subdirectory. | ||
189 | |||
190 | b. run the "ixj_dev_create" script to remove any stray device | ||
191 | files left in the /dev directory, and to create the new officially | ||
192 | designated device files. Note that the old devices were called | ||
193 | /dev/ixj, and the new method uses /dev/phone. | ||
194 | |||
195 | c. type "make;make install" - this will compile and install the | ||
196 | module. | ||
197 | |||
198 | d. type "depmod -av" to rebuild all your kernel version dependencies. | ||
199 | |||
200 | e. if you are using the isapnp module to configure the cards | ||
201 | automatically, then skip to step f. Otherwise, ensure that you | ||
202 | have run the isapnp configuration utility to properly configure | ||
203 | the cards. | ||
204 | |||
205 | e1. The Internet PhoneJACK has one configuration register that | ||
206 | requires 16 IO ports. The Internet LineJACK card has two | ||
207 | configuration registers and isapnp reports that IO 0 | ||
208 | requires 16 IO ports and IO 1 requires 8. The Quicknet | ||
209 | driver assumes that these registers are configured to be | ||
210 | contiguous, i.e. if IO 0 is set to 0x340 then IO 1 should | ||
211 | be set to 0x350. | ||
212 | |||
213 | Make sure that none of the cards overlap if you have | ||
214 | multiple cards in the system. | ||
215 | |||
216 | If you are new to the isapnp tools, you can jumpstart | ||
217 | yourself by doing the following: | ||
218 | |||
219 | e2. go to the /etc directory and run pnpdump to get a blank | ||
220 | isapnp.conf file. | ||
221 | |||
222 | pnpdump > /etc/isapnp.conf | ||
223 | |||
224 | e3. edit the /etc/isapnp.conf file to set the IO warnings and | ||
225 | the register IO addresses. The IO warnings means that you | ||
226 | should find the line in the file that looks like this: | ||
227 | |||
228 | (CONFLICT (IO FATAL)(IRQ FATAL)(DMA FATAL)(MEM FATAL)) # or WARNING | ||
229 | |||
230 | and you should edit the line to look like this: | ||
231 | |||
232 | (CONFLICT (IO WARNING)(IRQ FATAL)(DMA FATAL)(MEM FATAL)) # or WARNING | ||
233 | |||
234 | |||
235 | The next step is to set the IO port addresses. The issue | ||
236 | here is that isapnp does not identify all of the ports out | ||
237 | there. Specifically any device that does not have a driver | ||
238 | or module loaded by Linux will not be registered. This | ||
239 | includes older sound cards and network cards. We have | ||
240 | found that the IO port 0x300 is often used even though | ||
241 | isapnp claims that no-one is using those ports. We | ||
242 | recommend that for a single card installation that port | ||
243 | 0x340 (and 0x350) be used. The IO port line should change | ||
244 | from this: | ||
245 | |||
246 | (IO 0 (SIZE 16) (BASE 0x0300) (CHECK)) | ||
247 | |||
248 | to this: | ||
249 | |||
250 | (IO 0 (SIZE 16) (BASE 0x0340) ) | ||
251 | |||
252 | e4. if you have multiple Quicknet cards, make sure that you do | ||
253 | not have any overlaps. Be especially careful if you are | ||
254 | mixing Internet PhoneJACK and Internet LineJACK cards in | ||
255 | the same system. In these cases we recommend moving the | ||
256 | IO port addresses to the 0x400 block. Please note that on | ||
257 | a few machines the 0x400 series are used. Feel free to | ||
258 | experiment with other addresses. Our cards have been | ||
259 | proven to work using IO addresses of up to 0xFF0. | ||
260 | |||
261 | e5. the last step is to uncomment the activation line so the | ||
262 | drivers will be associated with the port. This means the | ||
263 | line (immediately below) the IO line should go from this: | ||
264 | |||
265 | # (ACT Y) | ||
266 | |||
267 | to this: | ||
268 | |||
269 | (ACT Y) | ||
270 | |||
271 | Once you have finished editing the isapnp.conf file you | ||
272 | must submit it to the pnp driver to configure the cards. | ||
273 | This is done using the following command: | ||
274 | |||
275 | isapnp isapnp.conf | ||
276 | |||
277 | If this works you should see a line that identifies the | ||
278 | Quicknet device, the IO port(s) chosen, and a message | ||
279 | "Enabled OK". | ||
280 | |||
281 | f. if you are loading the module by hand, use insmod. An example | ||
282 | of this would look like this: | ||
283 | |||
284 | insmod phonedev | ||
285 | insmod ixj dspio=0x320,0x310 xio=0,0x330 | ||
286 | |||
287 | Then verify the module loaded by running lsmod. If you are not using a | ||
288 | module that matches your kernel version, you may need to "force" the | ||
289 | load using the -f option in the insmod command. | ||
290 | |||
291 | insmod phonedev | ||
292 | insmod -f ixj dspio=0x320,0x310 xio=0,0x330 | ||
293 | |||
294 | |||
295 | If you are using isapnp to autoconfigure your card, then you do NOT | ||
296 | need any of the above, though you need to use modprobe to load the | ||
297 | driver, like this: | ||
298 | |||
299 | modprobe ixj | ||
300 | |||
301 | which will result in the needed drivers getting loaded automatically. | ||
302 | |||
303 | g. if you are planning on having the kernel automatically request | ||
304 | the module for you, then you need to edit /etc/conf.modules and add the | ||
305 | following lines: | ||
306 | |||
307 | options ixj dspio=0x340 xio=0x330 ixjdebug=0 | ||
308 | |||
309 | If you do this, then when you execute an application that uses the | ||
310 | module the kernel will request that it is loaded. | ||
311 | |||
312 | h. if you want non-root users to be able to read and write to the | ||
313 | ixj devices (this is a good idea!) you should do the following: | ||
314 | |||
315 | - decide upon a group name to use and create that group if | ||
316 | needed. Add the user names to that group that you wish to | ||
317 | have access to the device. For example, we typically will | ||
318 | create a group named "ixj" in /etc/group and add all users | ||
319 | to that group that we want to run software that can use the | ||
320 | ixjX devices. | ||
321 | |||
322 | - change the permissions on the device files, like this: | ||
323 | |||
324 | chgrp ixj /dev/ixj* | ||
325 | chmod 660 /dev/ixj* | ||
326 | |||
327 | Once this is done, then non-root users should be able to use the | ||
328 | devices. If you have enabled autoloading of modules, then the user | ||
329 | should be able to open the device and have the module loaded | ||
330 | automatically for them. | ||
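
The port arithmetic described in steps e1 and e4 can be checked before
editing isapnp.conf. What follows is a minimal Python sketch, not part of
the ixj driver or the isapnp tools. The function names are hypothetical.
The port counts (16 for IO 0, 8 for IO 1) are the ones given in step e1,
and the sketch assumes IO 1 is placed directly after IO 0 as that step
requires.

```python
# Illustrative helper only; these names are hypothetical and do not come
# from isapnp or the ixj driver.
IO0_SIZE = 16  # ports required by IO 0 (both card types, per step e1)
IO1_SIZE = 8   # ports required by IO 1 (Internet LineJACK only, per step e1)


def linejack_io1_base(io0_base):
    """Return the IO 1 base that makes the two registers contiguous."""
    return io0_base + IO0_SIZE


def card_range(io0_base, is_linejack):
    """Return the half-open port range [start, end) occupied by one card."""
    size = IO0_SIZE + (IO1_SIZE if is_linejack else 0)
    return io0_base, io0_base + size


def find_overlaps(cards):
    """Return index pairs of cards whose port ranges overlap.

    cards is a list of (io0_base, is_linejack) tuples.
    """
    ranges = [card_range(base, lj) for base, lj in cards]
    overlaps = []
    for i in range(len(ranges)):
        for j in range(i + 1, len(ranges)):
            # Two half-open ranges overlap when each starts before the
            # other one ends.
            if ranges[i][0] < ranges[j][1] and ranges[j][0] < ranges[i][1]:
                overlaps.append((i, j))
    return overlaps


print(hex(linejack_io1_base(0x340)))                   # 0x350, as in step e1
print(find_overlaps([(0x340, True), (0x350, False)]))  # [(0, 1)]
```

In the second call a LineJACK at 0x340 occupies 0x340 through 0x357, so a
PhoneJACK placed at 0x350 collides with it. Note that this only checks
Quicknet cards against each other; it cannot see ports used by other
hardware that isapnp does not report.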
331 | |||
332 | |||
333 | 4.0 Driver Installation problems. | ||
334 | |||
335 | We have tested these drivers on the 2.2.9, 2.2.10, 2.2.12, and 2.2.13 kernels | ||
336 | and in all cases have eventually been able to get the drivers to load and | ||
337 | run. We have found three types of problems that prevent this from happening. | ||
338 | The problems and solutions are: | ||
339 | |||
340 | a. A step was missed in the installation. Go back and use section 3 | ||
341 | as a checklist. Many people miss running the ixj_dev_create script and thus | ||
342 | never load the device names into the filesystem. | ||
343 | |||
344 | b. The kernel is inconsistently linked. We have found this problem in | ||
345 | the Out Of the Box installation of several distributions. The symptoms | ||
346 | are that neither driver will load, and that the unknown symbols include "jiffy" | ||
347 | and "kmalloc". The solution is to recompile both the kernel and the | ||
348 | modules. The command string for the final compile looks like this: | ||
349 | |||
350 | In the kernel directory: | ||
351 | 1. cp .config /tmp | ||
352 | 2. make mrproper | ||
353 | 3. cp /tmp/.config . | ||
354 | 4. make clean;make bzImage;make modules;make modules_install | ||
355 | |||
356 | This rebuilds both the kernel and all the modules and makes sure they all | ||
357 | have the same linkages. This generally solves the problem once the new | ||
358 | kernel is installed and the system rebooted. | ||
359 | |||
360 | c. The kernel has been patched, then unpatched. This happens when | ||
361 | someone decides to use an earlier kernel after they load a later kernel. | ||
362 | The symptoms are proceeding through all three above steps and still not | ||
363 | being able to load the driver. What has happened is that the generated | ||
364 | header files are out of sync with the kernel itself. The solution is | ||
365 | to recompile (again) using "make mrproper". This will remove and then | ||
366 | regenerate all the necessary header files. Once this is done, then you | ||
367 | need to install and reboot the kernel. We have not seen any problem | ||
368 | loading one of our drivers after this treatment. | ||
369 | |||
370 | 5.0 Known Limitations | ||
371 | |||
372 | We cannot currently play "dial-tone" and listen for DTMF digits at the | ||
373 | same time using the ISA PhoneJACK. This is a bug in the 8020 DSP chip | ||
374 | used on that product. All other Quicknet products function normally | ||
375 | in this regard. We have a work-around, but it's not done yet. Until | ||
376 | then, if you want dial-tone, you can always play a recorded dial-tone | ||
377 | sound into the audio until you have gathered the DTMF digits. | ||
378 | |||
379 | |||
380 | |||
381 | |||
382 | |||
383 | |||
384 | |||
385 | |||
386 | |||
387 | |||
388 | |||
389 | |||
390 | |||
391 | |||
392 | |||
393 | |||
394 | |||
diff --git a/Documentation/trace/tracedump.txt b/Documentation/trace/tracedump.txt new file mode 100644 index 00000000000..cba0decc3fc --- /dev/null +++ b/Documentation/trace/tracedump.txt | |||
@@ -0,0 +1,58 @@ | |||
1 | Tracedump | ||
2 | |||
3 | Documentation written by Alon Farchy | ||
4 | |||
5 | 1. Overview | ||
6 | ============ | ||
7 | |||
8 | The tracedump module provides additional mechanisms to retrieve tracing data. | ||
9 | It can be used to retrieve traces after a kernel panic or while the system | ||
10 | is running in either binary format or plaintext. The dumped data is compressed | ||
11 | with zlib to conserve space. | ||
12 | |||
13 | 2. Configuration Options | ||
14 | ======================== | ||
15 | |||
16 | CONFIG_TRACEDUMP - enable the tracedump module. | ||
17 | CONFIG_TRACEDUMP_PANIC - dump to console on kernel panic | ||
18 | CONFIG_TRACEDUMP_PROCFS - add file /proc/tracedump for userspace access. | ||
19 | |||
20 | 3. Module Parameters | ||
21 | ==================== | ||
22 | |||
23 | format_ascii | ||
24 | |||
25 | If 1, data will be dumped in human-readable format, ordered by time. | ||
26 | If 0, data will be dumped as raw pages from the ring buffer, | ||
27 | ordered by CPU, followed by the saved cmdlines so that the | ||
28 | raw data can be decoded. Default: 0 | ||
29 | |||
30 | panic_size | ||
31 | |||
32 | Maximum amount of compressed data to dump during a kernel panic | ||
33 | in kilobytes. This only applies if format_ascii == 1. In this case, | ||
34 | tracedump will compress the data, check the size, and if it is too big | ||
35 | toss out some data, compress again, etc, until the size is below | ||
36 | panic_size. Default: 512KB | ||
37 | |||
38 | compress_level | ||
39 | |||
40 | Determines the compression level that zlib will use. Available levels | ||
41 | are 0-9, with 0 as no compression and 9 as maximum compression. | ||
42 | Default: 9. | ||
43 | |||
44 | 4. Usage | ||
45 | ======== | ||
46 | |||
47 | If configured with CONFIG_TRACEDUMP_PROCFS, the tracing data can be pulled | ||
48 | by reading from /proc/tracedump. For example: | ||
49 | |||
50 | # cat /proc/tracedump > my_tracedump | ||
51 | |||
52 | Tracedump will surround the dump with a magic word (TRACEDUMP). Between the | ||
53 | magic words is the compressed data, which can be decompressed with a standard | ||
54 | zlib implementation. After decompression, if format_ascii == 1, then the | ||
55 | output should be readable. | ||
56 | |||
57 | If format_ascii == 0, the output should be in binary form, delimited by | ||
58 | CPU_END. After the last CPU should be the saved cmdlines, delimited by |. | ||
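
The extraction step described in the Usage section can be sketched in
Python as follows. This is an illustration under stated assumptions, not
a tool shipped with tracedump: it assumes the magic word appears as the
literal bytes TRACEDUMP immediately before and after the payload, with no
extra framing bytes, and that the payload is a standard zlib-wrapped
stream. The function name extract_tracedump is hypothetical.

```python
import zlib

MAGIC = b"TRACEDUMP"  # magic word named in the Usage section


def extract_tracedump(raw):
    """Return the decompressed payload found between the two magic words."""
    # Use the first and last occurrence so that a chance match inside the
    # compressed data does not truncate the payload.
    start = raw.find(MAGIC)
    end = raw.rfind(MAGIC)
    if start == -1 or end <= start:
        raise ValueError("magic words not found around the payload")
    payload = raw[start + len(MAGIC):end]
    # Assumption: the payload is a plain zlib stream with nothing else
    # (for example a newline) between it and the magic words.
    return zlib.decompress(payload)


# Round trip on synthetic data standing in for a real dump.
sample = MAGIC + zlib.compress(b"cpu0: sched_switch\n") + MAGIC
print(extract_tracedump(sample))  # b'cpu0: sched_switch\n'
```

For a real dump, read the file saved from /proc/tracedump in binary mode
and pass its contents to the function. If decompression fails, the
framing assumptions above are the first thing to check.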
diff --git a/Documentation/trace/tracelevel.txt b/Documentation/trace/tracelevel.txt new file mode 100644 index 00000000000..b282dd2b329 --- /dev/null +++ b/Documentation/trace/tracelevel.txt | |||
@@ -0,0 +1,42 @@ | |||
1 | Tracelevel | ||
2 | |||
3 | Documentation by Alon Farchy | ||
4 | |||
5 | 1. Overview | ||
6 | =========== | ||
7 | |||
8 | Tracelevel allows subsystem authors to add trace priorities to | ||
9 | their tracing events. High priority traces will be enabled | ||
10 | automatically at boot time. | ||
11 | |||
12 | This module is configured with CONFIG_TRACELEVEL. | ||
13 | |||
14 | 2. Usage | ||
15 | ========= | ||
16 | |||
17 | To give an event a priority, use the function tracelevel_register | ||
18 | at any time. | ||
19 | |||
20 | tracelevel_register(my_event, level); | ||
21 | |||
22 | my_event corresponds directly to the event name as defined in the | ||
23 | event header file. Available levels are: | ||
24 | |||
25 | TRACELEVEL_ERR 3 | ||
26 | TRACELEVEL_WARN 2 | ||
27 | TRACELEVEL_INFO 1 | ||
28 | TRACELEVEL_DEBUG 0 | ||
29 | |||
30 | Any event registered at boot time as TRACELEVEL_ERR will be enabled | ||
31 | by default. The header also exposes the function tracelevel_set_level | ||
32 | to change the trace level at runtime. Any trace event registered with the | ||
33 | specified level or higher will be enabled with this call. | ||
34 | |||
35 | A userspace handle to tracelevel_set_level is available via the module | ||
36 | parameter 'level'. For example, | ||
37 | |||
38 | echo 1 > /sys/module/tracelevel/parameters/level | ||
39 | |||
40 | is logically equivalent to: | ||
41 | |||
42 | tracelevel_set_level(TRACELEVEL_INFO); | ||
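
The "specified level or higher" rule can be modelled in a few lines of
Python. This is only a sketch of the selection rule as stated in the
text, not the kernel implementation; the event names and the function
events_enabled_by are hypothetical, and the numeric values are copied
from the level table above.

```python
# Numeric values copied from the level table in the Usage section.
TRACELEVEL_ERR = 3
TRACELEVEL_WARN = 2
TRACELEVEL_INFO = 1
TRACELEVEL_DEBUG = 0


def events_enabled_by(registered, level):
    """Return the sorted names of events whose level is >= the given level.

    registered maps an event name to the level it was registered with.
    """
    return sorted(name for name, lvl in registered.items() if lvl >= level)


# Hypothetical events, for illustration only.
registered = {
    "disk_error": TRACELEVEL_ERR,
    "thermal_warn": TRACELEVEL_WARN,
    "sched_debug": TRACELEVEL_DEBUG,
}
print(events_enabled_by(registered, TRACELEVEL_INFO))
# ['disk_error', 'thermal_warn']
```

Under this model, writing 1 to the 'level' module parameter selects every
event registered as INFO, WARN or ERR, while events registered as DEBUG
are not selected by that call.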
diff --git a/Documentation/video/tegra_dc_ext.txt b/Documentation/video/tegra_dc_ext.txt new file mode 100644 index 00000000000..6fc3394c665 --- /dev/null +++ b/Documentation/video/tegra_dc_ext.txt | |||
@@ -0,0 +1,83 @@ | |||
1 | The Tegra display controller (dc) driver has two frontends that implement | ||
2 | different interfaces: | ||
3 | 1. The traditional fbdev interface, implemented in drivers/video/tegra/fb.c | ||
4 | 2. A new interface that exposes the unique capabilities of the controller, | ||
5 | implemented in drivers/video/tegra/dc/ext | ||
6 | |||
7 | The Tegra fbdev capabilities are documented in fb/tegrafb.c [TODO]. This | ||
8 | document will describe the new "extended" dc interface. | ||
9 | |||
10 | The extended interface is only available when its frontend has been compiled | ||
11 | in, i.e., CONFIG_TEGRA_DC_EXTENSIONS=y. The dc_ext frontend can coexist with | ||
12 | tegrafb, but takes precedence (more on that later). | ||
13 | |||
14 | The dc_ext frontend's interface to userspace is exposed through a set of | ||
15 | device nodes: one for each controller (generally /dev/tegra_dc_N), and one | ||
16 | "control" node (generally /dev/tegra_dc_ctrl). Communication through these | ||
17 | device nodes is done with special IOCTLs. There is also an event delivery | ||
18 | mechanism; userspace can wait for and receive events with read() or poll(). | ||
19 | |||
20 | The tegra_dc_N interface is stateful; each fresh open() of the device node | ||
21 | creates a client instance. In order to prevent multiple processes from | ||
22 | "fighting" for the hardware, only one client instance is permitted to control | ||
23 | certain resources at a time, on a first-come, first-served basis. | ||
24 | |||
25 | Overview of tegra_dc_N IOCTLs: | ||
26 | SET_NVMAP_FD: This is used to associate your nvmap client with this dc_ext | ||
27 | client instance. This is necessary so that the kernel can | ||
28 | appropriately enforce permissions on nvmap buffers. | ||
29 | |||
30 | GET_WINDOW: A dc_ext client must call this on each window that it wishes to | ||
31 | control. This strictly enforces a single dc_ext client on a | ||
32 | window at a time. | ||
33 | |||
34 | PUT_WINDOW: A dc_ext client may call this to release a window previously | ||
35 | reserved with GET_WINDOW. | ||
36 | |||
37 | FLIP: This ioctl is used to actually display an nvmap surface using one or | ||
38 | more windows. Each time a dc_ext client performs a FLIP, the request is | ||
39 | put on a flip queue and executed asynchronously (the FLIP ioctl will | ||
40 | return immediately). Various parameters are available in the | ||
41 | tegra_dc_ext_flip structure. | ||
42 | A dc_ext client may only use this on windows that it has previously | ||
43 | reserved with a successful GET_WINDOW call. | ||
44 | |||
45 | GET_CURSOR: This is analogous to GET_WINDOW, but for the hardware cursor | ||
46 | instead of a window. | ||
47 | |||
48 | PUT_CURSOR: This is analogous to PUT_WINDOW, but for the hardware cursor | ||
49 | instead of a window. | ||
50 | |||
51 | SET_CURSOR_IMAGE: This is used to change the hardware cursor image. May only | ||
52 | be used by a client who has successfully performed a | ||
53 | GET_CURSOR call. | ||
54 | |||
55 | SET_CURSOR: This is used to actually place the hardware cursor on the screen. | ||
56 | May only be used by a client who has successfully performed a | ||
57 | GET_CURSOR call. | ||
58 | |||
59 | SET_CSC: This may be used to set a color space conversion matrix on a window. | ||
60 | A dc_ext client may only use this on windows that it has previously | ||
61 | reserved with a successful GET_WINDOW call. | ||
62 | |||
63 | GET_STATUS: This is used to retrieve general status about the dc. | ||
64 | |||
65 | GET_VBLANK_SYNCPT: This is used to retrieve the auto-incrementing vblank | ||
66 | syncpoint for the head associated with this dc. | ||
67 | |||
68 | |||
69 | Overview of tegra_dc_ctrl IOCTLs: | ||
70 | GET_NUM_OUTPUTS: This returns the number of available output devices on the | ||
71 | system, which may exceed the number of display controllers. | ||
72 | |||
73 | GET_OUTPUT_PROPERTIES: This returns data about the given output, such as what | ||
74 | kind of output it is, whether it's currently associated | ||
75 | with a head, etc. | ||
76 | |||
77 | GET_OUTPUT_EDID: This returns the binary EDID read from the device connected | ||
78 | to the given output, if any. | ||
79 | |||
80 | SET_EVENT_MASK: A dc_ext client may call this ioctl with a bitmask of events | ||
81 | that it wishes to receive. These events will then be | ||
82 | available to that client on a subsequent read() on the same | ||
83 | file descriptor. | ||
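
The event delivery path (poll() followed by read() on the same file
descriptor) can be sketched in Python as shown below. This is a minimal
illustration, not driver code. It deliberately omits the SET_EVENT_MASK
ioctl, because the request code and argument layout come from the
driver's header and are not given in this document. The layout of an
event record is likewise not described here, so the sketch returns raw
bytes. The function name wait_for_event and the 4096-byte read size are
assumptions, and a pipe stands in for the control node so that the
example runs anywhere.

```python
import os
import select


def wait_for_event(fd, timeout_ms, max_bytes=4096):
    """Wait until fd is readable or the timeout expires.

    Returns the raw bytes read, or None if the timeout expired first.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None
    return os.read(fd, max_bytes)


# Demonstration with a pipe standing in for the device node.
read_end, write_end = os.pipe()
os.write(write_end, b"event")
print(wait_for_event(read_end, 1000))  # b'event'
os.close(read_end)
os.close(write_end)
```

On real hardware the descriptor would come from opening the control node
(generally /dev/tegra_dc_ctrl), and events would only arrive after the
client has selected them with SET_EVENT_MASK on that same descriptor.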
diff --git a/Documentation/virtual/lguest/Makefile b/Documentation/virtual/lguest/Makefile new file mode 100644 index 00000000000..0ac34206f7a --- /dev/null +++ b/Documentation/virtual/lguest/Makefile | |||
@@ -0,0 +1,8 @@ | |||
1 | # This creates the demonstration utility "lguest" which runs a Linux guest. | ||
2 | # Missing headers? Add "-I../../../include -I../../../arch/x86/include" | ||
3 | CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE | ||
4 | |||
5 | all: lguest | ||
6 | |||
7 | clean: | ||
8 | rm -f lguest | ||
diff --git a/Documentation/virtual/lguest/extract b/Documentation/virtual/lguest/extract
new file mode 100644
index 00000000000..7730bb6e4b9
--- /dev/null
+++ b/Documentation/virtual/lguest/extract
@@ -0,0 +1,58 @@ | |||
1 | #! /bin/sh | ||
2 | |||
3 | set -e | ||
4 | |||
5 | PREFIX=$1 | ||
6 | shift | ||
7 | |||
8 | trap 'rm -r $TMPDIR' 0 | ||
9 | TMPDIR=`mktemp -d` | ||
10 | |||
11 | exec 3>/dev/null | ||
12 | for f; do | ||
13 | while IFS=" | ||
14 | " read -r LINE; do | ||
15 | case "$LINE" in | ||
16 | *$PREFIX:[0-9]*:\**) | ||
17 | NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"` | ||
18 | if [ -f $TMPDIR/$NUM ]; then | ||
19 | echo "$TMPDIR/$NUM already exists prior to $f" | ||
20 | exit 1 | ||
21 | fi | ||
22 | exec 3>>$TMPDIR/$NUM | ||
23 | echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM | ||
24 | /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3 | ||
25 | ;; | ||
26 | *$PREFIX:[0-9]*) | ||
27 | NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"` | ||
28 | if [ -f $TMPDIR/$NUM ]; then | ||
29 | echo "$TMPDIR/$NUM already exists prior to $f" | ||
30 | exit 1 | ||
31 | fi | ||
32 | exec 3>>$TMPDIR/$NUM | ||
33 | echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM | ||
34 | /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3 | ||
35 | ;; | ||
36 | *:\**) | ||
37 | /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3 | ||
38 | echo >&3 | ||
39 | exec 3>/dev/null | ||
40 | ;; | ||
41 | *) | ||
42 | /bin/echo "$LINE" >&3 | ||
43 | ;; | ||
44 | esac | ||
45 | done < $f | ||
46 | echo >&3 | ||
47 | exec 3>/dev/null | ||
48 | done | ||
49 | |||
50 | LASTFILE="" | ||
51 | for f in $TMPDIR/*; do | ||
52 | if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then | ||
53 | LASTFILE=$(cat $TMPDIR/.$(basename $f) ) | ||
54 | echo "[ $LASTFILE ]" | ||
55 | fi | ||
56 | cat $f | ||
57 | done | ||
58 | |||
diff --git a/Documentation/virtual/lguest/lguest.c b/Documentation/virtual/lguest/lguest.c
new file mode 100644
index 00000000000..d928c134dee
--- /dev/null
+++ b/Documentation/virtual/lguest/lguest.c
@@ -0,0 +1,2065 @@ | |||
1 | /*P:100 | ||
2 | * This is the Launcher code, a simple program which lays out the "physical" | ||
3 | * memory for the new Guest by mapping the kernel image and the virtual | ||
4 | * devices, then opens /dev/lguest to tell the kernel about the Guest and | ||
5 | * control it. | ||
6 | :*/ | ||
7 | #define _LARGEFILE64_SOURCE | ||
8 | #define _GNU_SOURCE | ||
9 | #include <stdio.h> | ||
10 | #include <string.h> | ||
11 | #include <unistd.h> | ||
12 | #include <err.h> | ||
13 | #include <stdint.h> | ||
14 | #include <stdlib.h> | ||
15 | #include <elf.h> | ||
16 | #include <sys/mman.h> | ||
17 | #include <sys/param.h> | ||
18 | #include <sys/types.h> | ||
19 | #include <sys/stat.h> | ||
20 | #include <sys/wait.h> | ||
21 | #include <sys/eventfd.h> | ||
22 | #include <fcntl.h> | ||
23 | #include <stdbool.h> | ||
24 | #include <errno.h> | ||
25 | #include <ctype.h> | ||
26 | #include <sys/socket.h> | ||
27 | #include <sys/ioctl.h> | ||
28 | #include <sys/time.h> | ||
29 | #include <time.h> | ||
30 | #include <netinet/in.h> | ||
31 | #include <net/if.h> | ||
32 | #include <linux/sockios.h> | ||
33 | #include <linux/if_tun.h> | ||
34 | #include <sys/uio.h> | ||
35 | #include <termios.h> | ||
36 | #include <getopt.h> | ||
37 | #include <assert.h> | ||
38 | #include <sched.h> | ||
39 | #include <limits.h> | ||
40 | #include <stddef.h> | ||
41 | #include <signal.h> | ||
42 | #include <pwd.h> | ||
43 | #include <grp.h> | ||
44 | |||
45 | #include <linux/virtio_config.h> | ||
46 | #include <linux/virtio_net.h> | ||
47 | #include <linux/virtio_blk.h> | ||
48 | #include <linux/virtio_console.h> | ||
49 | #include <linux/virtio_rng.h> | ||
50 | #include <linux/virtio_ring.h> | ||
51 | #include <asm/bootparam.h> | ||
52 | #include "../../../include/linux/lguest_launcher.h" | ||
53 | /*L:110 | ||
54 | * We can ignore the 43 include files we need for this program, but I do want | ||
55 | * to draw attention to the use of kernel-style types. | ||
56 | * | ||
57 | * As Linus said, "C is a Spartan language, and so should your naming be." I | ||
58 | * like these abbreviations, so we define them here. Note that u64 is always | ||
59 | * unsigned long long, which works on all Linux systems: this means that we can | ||
60 | * use %llu in printf for any u64. | ||
61 | */ | ||
62 | typedef unsigned long long u64; | ||
63 | typedef uint32_t u32; | ||
64 | typedef uint16_t u16; | ||
65 | typedef uint8_t u8; | ||
66 | /*:*/ | ||
67 | |||
68 | #define BRIDGE_PFX "bridge:" | ||
69 | #ifndef SIOCBRADDIF | ||
70 | #define SIOCBRADDIF 0x89a2 /* add interface to bridge */ | ||
71 | #endif | ||
72 | /* We can have up to 256 pages for devices. */ | ||
73 | #define DEVICE_PAGES 256 | ||
74 | /* This will occupy 3 pages: it must be a power of 2. */ | ||
75 | #define VIRTQUEUE_NUM 256 | ||
76 | |||
77 | /*L:120 | ||
78 | * verbose is both a global flag and a macro. The C preprocessor allows | ||
79 | * this, and although I wouldn't recommend it, it works quite nicely here. | ||
80 | */ | ||
81 | static bool verbose; | ||
82 | #define verbose(args...) \ | ||
83 | do { if (verbose) printf(args); } while(0) | ||
84 | /*:*/ | ||
85 | |||
86 | /* The pointer to the start of guest memory. */ | ||
87 | static void *guest_base; | ||
88 | /* The maximum guest physical address allowed, and maximum possible. */ | ||
89 | static unsigned long guest_limit, guest_max; | ||
90 | /* The /dev/lguest file descriptor. */ | ||
91 | static int lguest_fd; | ||
92 | |||
93 | /* A per-thread variable indicating which vcpu is currently running. */ | ||
94 | static unsigned int __thread cpu_id; | ||
95 | |||
96 | /* This is our list of devices. */ | ||
97 | struct device_list { | ||
98 | /* Counter to assign interrupt numbers. */ | ||
99 | unsigned int next_irq; | ||
100 | |||
101 | /* Counter to print out convenient device numbers. */ | ||
102 | unsigned int device_num; | ||
103 | |||
104 | /* The descriptor page for the devices. */ | ||
105 | u8 *descpage; | ||
106 | |||
107 | /* A singly linked list of devices. */ | ||
108 | struct device *dev; | ||
109 | /* And a pointer to the last device for easy append. */ | ||
110 | struct device *lastdev; | ||
111 | }; | ||
112 | |||
113 | /* The list of Guest devices, based on command line arguments. */ | ||
114 | static struct device_list devices; | ||
115 | |||
116 | /* The device structure describes a single device. */ | ||
117 | struct device { | ||
118 | /* The linked-list pointer. */ | ||
119 | struct device *next; | ||
120 | |||
121 | /* The device's descriptor, as mapped into the Guest. */ | ||
122 | struct lguest_device_desc *desc; | ||
123 | |||
124 | /* We can't trust desc values once Guest has booted: we use these. */ | ||
125 | unsigned int feature_len; | ||
126 | unsigned int num_vq; | ||
127 | |||
128 | /* The name of this device, for --verbose. */ | ||
129 | const char *name; | ||
130 | |||
131 | /* Any queues attached to this device */ | ||
132 | struct virtqueue *vq; | ||
133 | |||
134 | /* Is it operational */ | ||
135 | bool running; | ||
136 | |||
137 | /* Device-specific data. */ | ||
138 | void *priv; | ||
139 | }; | ||
140 | |||
141 | /* The virtqueue structure describes a queue attached to a device. */ | ||
142 | struct virtqueue { | ||
143 | struct virtqueue *next; | ||
144 | |||
145 | /* Which device owns me. */ | ||
146 | struct device *dev; | ||
147 | |||
148 | /* The configuration for this queue. */ | ||
149 | struct lguest_vqconfig config; | ||
150 | |||
151 | /* The actual ring of buffers. */ | ||
152 | struct vring vring; | ||
153 | |||
154 | /* Last available index we saw. */ | ||
155 | u16 last_avail_idx; | ||
156 | |||
157 | /* How many are used since we sent last irq? */ | ||
158 | unsigned int pending_used; | ||
159 | |||
160 | /* Eventfd where Guest notifications arrive. */ | ||
161 | int eventfd; | ||
162 | |||
163 | /* Function for the thread which is servicing this virtqueue. */ | ||
164 | void (*service)(struct virtqueue *vq); | ||
165 | pid_t thread; | ||
166 | }; | ||
167 | |||
168 | /* Remember the arguments to the program so we can "reboot" */ | ||
169 | static char **main_args; | ||
170 | |||
171 | /* The original tty settings to restore on exit. */ | ||
172 | static struct termios orig_term; | ||
173 | |||
174 | /* | ||
175 | * We have to be careful with barriers: our devices are all run in separate | ||
176 | * threads and so we need to make sure that changes visible to the Guest happen | ||
177 | * in precise order. | ||
178 | */ | ||
179 | #define wmb() __asm__ __volatile__("" : : : "memory") | ||
180 | #define mb() __asm__ __volatile__("" : : : "memory") | ||
181 | |||
182 | /* | ||
183 | * Convert an iovec element to the given type. | ||
184 | * | ||
185 | * This is a fairly ugly trick: we need to know the size of the type and | ||
186 | * alignment requirement to check the pointer is kosher. It's also nice to | ||
187 | * have the name of the type in case we report failure. | ||
188 | * | ||
189 | * Typing those three things all the time is cumbersome and error prone, so we | ||
190 | * have a macro which sets them all up and passes to the real function. | ||
191 | */ | ||
192 | #define convert(iov, type) \ | ||
193 | ((type *)_convert((iov), sizeof(type), __alignof__(type), #type)) | ||
194 | |||
195 | static void *_convert(struct iovec *iov, size_t size, size_t align, | ||
196 | const char *name) | ||
197 | { | ||
198 | if (iov->iov_len != size) | ||
199 | errx(1, "Bad iovec size %zu for %s", iov->iov_len, name); | ||
200 | if ((unsigned long)iov->iov_base % align != 0) | ||
201 | errx(1, "Bad alignment %p for %s", iov->iov_base, name); | ||
202 | return iov->iov_base; | ||
203 | } | ||
204 | |||
205 | /* Wrapper for the last available index. Makes it easier to change. */ | ||
206 | #define lg_last_avail(vq) ((vq)->last_avail_idx) | ||
207 | |||
208 | /* | ||
209 | * The virtio configuration space is defined to be little-endian. x86 is | ||
210 | * little-endian too, but it's nice to be explicit so we have these helpers. | ||
211 | */ | ||
212 | #define cpu_to_le16(v16) (v16) | ||
213 | #define cpu_to_le32(v32) (v32) | ||
214 | #define cpu_to_le64(v64) (v64) | ||
215 | #define le16_to_cpu(v16) (v16) | ||
216 | #define le32_to_cpu(v32) (v32) | ||
217 | #define le64_to_cpu(v64) (v64) | ||
218 | |||
219 | /* Is this iovec empty? */ | ||
220 | static bool iov_empty(const struct iovec iov[], unsigned int num_iov) | ||
221 | { | ||
222 | unsigned int i; | ||
223 | |||
224 | for (i = 0; i < num_iov; i++) | ||
225 | if (iov[i].iov_len) | ||
226 | return false; | ||
227 | return true; | ||
228 | } | ||
229 | |||
230 | /* Take len bytes from the front of this iovec. */ | ||
231 | static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len) | ||
232 | { | ||
233 | unsigned int i; | ||
234 | |||
235 | for (i = 0; i < num_iov; i++) { | ||
236 | unsigned int used; | ||
237 | |||
238 | used = iov[i].iov_len < len ? iov[i].iov_len : len; | ||
239 | iov[i].iov_base += used; | ||
240 | iov[i].iov_len -= used; | ||
241 | len -= used; | ||
242 | } | ||
243 | assert(len == 0); | ||
244 | } | ||
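The consume step above can be tried in isolation. The following is a minimal sketch, not part of the patch: demo_iov_consume() and demo_remaining() are hypothetical names, and the pointer is cast to char * so the arithmetic does not rely on the GNU void-pointer extension that the original uses.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical stand-alone copy of the consume logic, for illustration. */
static void demo_iov_consume(struct iovec iov[], unsigned num_iov, size_t len)
{
	unsigned i;

	for (i = 0; i < num_iov; i++) {
		/* Take as much as this element holds, but no more than needed. */
		size_t used = iov[i].iov_len < len ? iov[i].iov_len : len;

		iov[i].iov_base = (char *)iov[i].iov_base + used;
		iov[i].iov_len -= used;
		len -= used;
	}
	/* Asking for more bytes than the iovec holds is a caller bug. */
	assert(len == 0);
}

/* Bytes left after consuming len from a 4-byte element plus a 6-byte one. */
static size_t demo_remaining(size_t len)
{
	static char a[4], b[6];
	struct iovec iov[2] = {
		{ .iov_base = a, .iov_len = sizeof(a) },
		{ .iov_base = b, .iov_len = sizeof(b) },
	};

	demo_iov_consume(iov, 2, len);
	return iov[0].iov_len + iov[1].iov_len;
}
```

Consuming 5 bytes empties the first element and takes one byte from the second, so demo_remaining(5) leaves 5 bytes.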
245 | |||
246 | /* The device virtqueue descriptors are followed by feature bitmasks. */ | ||
247 | static u8 *get_feature_bits(struct device *dev) | ||
248 | { | ||
249 | return (u8 *)(dev->desc + 1) | ||
250 | + dev->num_vq * sizeof(struct lguest_vqconfig); | ||
251 | } | ||
252 | |||
253 | /*L:100 | ||
254 | * The Launcher code itself takes us out into userspace, that scary place where | ||
255 | * pointers run wild and free! Unfortunately, like most userspace programs, | ||
256 | * it's quite boring (which is why everyone likes to hack on the kernel!). | ||
257 | * Perhaps if you make up an Lguest Drinking Game at this point, it will get | ||
258 | * you through this section. Or, maybe not. | ||
259 | * | ||
260 | * The Launcher sets up a big chunk of memory to be the Guest's "physical" | ||
261 | * memory and stores it in "guest_base". In other words, Guest physical == | ||
262 | * Launcher virtual with an offset. | ||
263 | * | ||
264 | * This can be tough to get your head around, but usually it just means that we | ||
265 | * use these trivial conversion functions when the Guest gives us its | ||
266 | * "physical" addresses: | ||
267 | */ | ||
268 | static void *from_guest_phys(unsigned long addr) | ||
269 | { | ||
270 | return guest_base + addr; | ||
271 | } | ||
272 | |||
273 | static unsigned long to_guest_phys(const void *addr) | ||
274 | { | ||
275 | return (addr - guest_base); | ||
276 | } | ||
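To make the "Guest physical == Launcher virtual with an offset" idea concrete, here is a small sketch under stated assumptions: demo_memory is a hypothetical static array standing in for the mmap()ed Guest memory, and the demo_ functions mirror the two helpers above without being the patch's code.

```c
#include <assert.h>

/* Hypothetical stand-in for the Guest's "physical" memory. */
static char demo_memory[4096];
static char *demo_guest_base = demo_memory;

/* Guest physical address -> pointer the Launcher can dereference. */
static void *demo_from_guest_phys(unsigned long addr)
{
	return demo_guest_base + addr;
}

/* Launcher pointer -> Guest physical address. */
static unsigned long demo_to_guest_phys(const void *addr)
{
	return (unsigned long)((const char *)addr - demo_guest_base);
}
```

The two conversions are inverses of each other: converting Guest address 0x123 to a Launcher pointer and back yields 0x123 again.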
277 | |||
278 | /*L:130 | ||
279 | * Loading the Kernel. | ||
280 | * | ||
281 | * We start with a couple of simple helper routines. open_or_die() avoids | ||
282 | * error-checking code cluttering the callers: | ||
283 | */ | ||
284 | static int open_or_die(const char *name, int flags) | ||
285 | { | ||
286 | int fd = open(name, flags); | ||
287 | if (fd < 0) | ||
288 | err(1, "Failed to open %s", name); | ||
289 | return fd; | ||
290 | } | ||
291 | |||
292 | /* map_zeroed_pages() takes a number of pages. */ | ||
293 | static void *map_zeroed_pages(unsigned int num) | ||
294 | { | ||
295 | int fd = open_or_die("/dev/zero", O_RDONLY); | ||
296 | void *addr; | ||
297 | |||
298 | /* | ||
299 | * We use a private mapping (ie. if we write to the page, it will be | ||
300 | * copied). We allocate an extra two pages PROT_NONE to act as guard | ||
301 | * pages against read/write attempts that exceed allocated space. | ||
302 | */ | ||
303 | addr = mmap(NULL, getpagesize() * (num+2), | ||
304 | PROT_NONE, MAP_PRIVATE, fd, 0); | ||
305 | |||
306 | if (addr == MAP_FAILED) | ||
307 | err(1, "Mmapping %u pages of /dev/zero", num); | ||
308 | |||
309 | if (mprotect(addr + getpagesize(), getpagesize() * num, | ||
310 | PROT_READ|PROT_WRITE) == -1) | ||
311 | err(1, "mprotect rw %u pages failed", num); | ||
312 | |||
313 | /* | ||
314 | * One neat mmap feature is that you can close the fd, and it | ||
315 | * stays mapped. | ||
316 | */ | ||
317 | close(fd); | ||
318 | |||
319 | /* Return address after PROT_NONE page */ | ||
320 | return addr + getpagesize(); | ||
321 | } | ||
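The guard-page layout used above can be sketched on its own. This version is an illustration rather than the Launcher's code: demo_map_guarded() is a hypothetical name, and it assumes MAP_ANONYMOUS is available (as it is on Linux) instead of opening /dev/zero.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: num usable pages with a PROT_NONE page on each side. */
static void *demo_map_guarded(size_t num)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *addr;

	/* Reserve num + 2 pages, all inaccessible to begin with. */
	addr = mmap(NULL, page * (num + 2), PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Open up only the middle; the first and last page stay as guards. */
	if (mprotect(addr + page, page * num, PROT_READ | PROT_WRITE) == -1) {
		munmap(addr, page * (num + 2));
		return NULL;
	}
	return addr + page;
}
```

Any access one byte before the returned pointer, or one byte past the last usable page, hits a PROT_NONE page and faults instead of silently touching neighbouring memory.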
322 | |||
323 | /* Get some more pages for a device. */ | ||
324 | static void *get_pages(unsigned int num) | ||
325 | { | ||
326 | void *addr = from_guest_phys(guest_limit); | ||
327 | |||
328 | guest_limit += num * getpagesize(); | ||
329 | if (guest_limit > guest_max) | ||
330 | errx(1, "Not enough memory for devices"); | ||
331 | return addr; | ||
332 | } | ||
333 | |||
334 | /* | ||
335 | * This routine is used to load the kernel or initrd. It tries mmap, but if | ||
336 | * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries), | ||
337 | * it falls back to reading the memory in. | ||
338 | */ | ||
339 | static void map_at(int fd, void *addr, unsigned long offset, unsigned long len) | ||
340 | { | ||
341 | ssize_t r; | ||
342 | |||
343 | /* | ||
344 | * We map writable even though some segments are marked read-only. | ||
345 | * The kernel really wants to be writable: it patches its own | ||
346 | * instructions. | ||
347 | * | ||
348 | * MAP_PRIVATE means that the page won't be copied until a write is | ||
349 | * done to it. This allows us to share untouched memory between | ||
350 | * Guests. | ||
351 | */ | ||
352 | if (mmap(addr, len, PROT_READ|PROT_WRITE, | ||
353 | MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED) | ||
354 | return; | ||
355 | |||
356 | /* pread does a seek and a read in one shot: saves a few lines. */ | ||
357 | r = pread(fd, addr, len, offset); | ||
358 | if (r != len) | ||
359 | err(1, "Reading offset %lu len %lu gave %zi", offset, len, r); | ||
360 | } | ||
361 | |||
362 | /* | ||
363 | * This routine takes an open vmlinux image, which is in ELF, and maps it into | ||
364 | * the Guest memory. ELF = Executable and Linkable Format, which is the format | ||
365 | * used by all modern binaries on Linux including the kernel. | ||
366 | * | ||
367 | * The ELF headers give *two* addresses: a physical address, and a virtual | ||
368 | * address. We use the physical address; the Guest will map itself to the | ||
369 | * virtual address. | ||
370 | * | ||
371 | * We return the starting address. | ||
372 | */ | ||
373 | static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr) | ||
374 | { | ||
375 | Elf32_Phdr phdr[ehdr->e_phnum]; | ||
376 | unsigned int i; | ||
377 | |||
378 | /* | ||
379 | * Sanity checks on the main ELF header: an x86 executable with a | ||
380 | * reasonable number of correctly-sized program headers. | ||
381 | */ | ||
382 | if (ehdr->e_type != ET_EXEC | ||
383 | || ehdr->e_machine != EM_386 | ||
384 | || ehdr->e_phentsize != sizeof(Elf32_Phdr) | ||
385 | || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr)) | ||
386 | errx(1, "Malformed elf header"); | ||
387 | |||
388 | /* | ||
389 | * An ELF executable contains an ELF header and a number of "program" | ||
390 | * headers which indicate which parts ("segments") of the program to | ||
391 | * load where. | ||
392 | */ | ||
393 | |||
394 | /* We read in all the program headers at once: */ | ||
395 | if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0) | ||
396 | err(1, "Seeking to program headers"); | ||
397 | if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr)) | ||
398 | err(1, "Reading program headers"); | ||
399 | |||
400 | /* | ||
401 | * Try all the headers: there are usually only three. A read-only one, | ||
402 | * a read-write one, and a "note" section which we don't load. | ||
403 | */ | ||
404 | for (i = 0; i < ehdr->e_phnum; i++) { | ||
405 | /* If this isn't a loadable segment, we ignore it */ | ||
406 | if (phdr[i].p_type != PT_LOAD) | ||
407 | continue; | ||
408 | |||
409 | verbose("Section %i: size %i addr %p\n", | ||
410 | i, phdr[i].p_memsz, (void *)phdr[i].p_paddr); | ||
411 | |||
412 | /* We map this section of the file at its physical address. */ | ||
413 | map_at(elf_fd, from_guest_phys(phdr[i].p_paddr), | ||
414 | phdr[i].p_offset, phdr[i].p_filesz); | ||
415 | } | ||
416 | |||
417 | /* The entry point is given in the ELF header. */ | ||
418 | return ehdr->e_entry; | ||
419 | } | ||
420 | |||
421 | /*L:150 | ||
422 | * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed | ||
423 | * to jump into it and it will unpack itself. We used to have to perform some | ||
424 | * hairy magic because the unpacking code scared me. | ||
425 | * | ||
426 | * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote | ||
427 | * a small patch to jump over the tricky bits in the Guest, so now we just read | ||
428 | * the funky header so we know where in the file to load, and away we go! | ||
429 | */ | ||
430 | static unsigned long load_bzimage(int fd) | ||
431 | { | ||
432 | struct boot_params boot; | ||
433 | int r; | ||
434 | /* Modern bzImages get loaded at 1M. */ | ||
435 | void *p = from_guest_phys(0x100000); | ||
436 | |||
437 | /* | ||
438 | * Go back to the start of the file and read the header. It should be | ||
439 | * a Linux boot header (see Documentation/x86/i386/boot.txt) | ||
440 | */ | ||
441 | lseek(fd, 0, SEEK_SET); | ||
442 | read(fd, &boot, sizeof(boot)); | ||
443 | |||
444 | /* Inside the setup_hdr, we expect the magic "HdrS" */ | ||
445 | if (memcmp(&boot.hdr.header, "HdrS", 4) != 0) | ||
446 | errx(1, "This doesn't look like a bzImage to me"); | ||
447 | |||
448 | /* Skip over the extra sectors of the header. */ | ||
449 | lseek(fd, (boot.hdr.setup_sects+1) * 512, SEEK_SET); | ||
450 | |||
451 | /* Now read everything into memory, in nice big chunks. */ | ||
452 | while ((r = read(fd, p, 65536)) > 0) | ||
453 | p += r; | ||
454 | |||
455 | /* Finally, code32_start tells us where to enter the kernel. */ | ||
456 | return boot.hdr.code32_start; | ||
457 | } | ||
458 | |||
459 | /*L:140 | ||
460 | * Loading the kernel is easy when it's a "vmlinux", but most kernels | ||
461 | * come wrapped up in the self-decompressing "bzImage" format. With a little | ||
462 | * work, we can load those, too. | ||
463 | */ | ||
464 | static unsigned long load_kernel(int fd) | ||
465 | { | ||
466 | Elf32_Ehdr hdr; | ||
467 | |||
468 | /* Read in the first few bytes. */ | ||
469 | if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr)) | ||
470 | err(1, "Reading kernel"); | ||
471 | |||
472 | /* If it's an ELF file, it starts with "\177ELF" */ | ||
473 | if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0) | ||
474 | return map_elf(fd, &hdr); | ||
475 | |||
476 | /* Otherwise we assume it's a bzImage, and try to load it. */ | ||
477 | return load_bzimage(fd); | ||
478 | } | ||
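The format test in load_kernel() comes down to comparing the first four bytes against the ELF magic. A minimal sketch follows, assuming the image header has already been read into a buffer; demo_is_elf() is a hypothetical helper, while ELFMAG and SELFMAG come from <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* True if buf starts with "\177ELF"; anything shorter cannot be ELF. */
static bool demo_is_elf(const unsigned char *buf, size_t len)
{
	return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}
```

Anything that fails this test is treated by the Launcher as a bzImage candidate, which is then validated separately by its own "HdrS" magic.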
479 | |||
480 | /* | ||
481 | * This is a trivial little helper to align pages. Andi Kleen hated it because | ||
482 | * it calls getpagesize() twice: "it's dumb code." | ||
483 | * | ||
484 | * Kernel guys get really het up about optimization, even when it's not | ||
485 | * necessary. I leave this code as a reaction against that. | ||
486 | */ | ||
487 | static inline unsigned long page_align(unsigned long addr) | ||
488 | { | ||
489 | /* Add upwards and truncate downwards. */ | ||
490 | return ((addr + getpagesize()-1) & ~(getpagesize()-1)); | ||
491 | } | ||
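The "add upwards and truncate downwards" trick is easy to check by hand. This sketch takes the page size as a parameter instead of calling getpagesize(), purely so the arithmetic can be verified with a fixed value; demo_page_align() is a hypothetical name.

```c
#include <assert.h>

/* Round addr up to a multiple of page_size, which must be a power of two. */
static unsigned long demo_page_align(unsigned long addr,
				     unsigned long page_size)
{
	/* Adding page_size - 1 pushes any partial page over the boundary;
	 * masking the low bits then drops back to the boundary itself. */
	return (addr + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, 1 rounds up to 4096, 4096 stays at 4096, and 4097 rounds up to 8192.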
492 | |||
493 | /*L:180 | ||
494 | * An "initial ram disk" is a disk image loaded into memory along with the | ||
495 | * kernel which the kernel can use to boot from without needing any drivers. | ||
496 | * Most distributions now use this as standard: the initrd contains the code to | ||
497 | * load the appropriate driver modules for the current machine. | ||
498 | * | ||
499 | * Importantly, James Morris works for Red Hat, and Fedora uses initrds for its | ||
500 | * kernels. He sent me this (and tells me when I break it). | ||
501 | */ | ||
502 | static unsigned long load_initrd(const char *name, unsigned long mem) | ||
503 | { | ||
504 | int ifd; | ||
505 | struct stat st; | ||
506 | unsigned long len; | ||
507 | |||
508 | ifd = open_or_die(name, O_RDONLY); | ||
509 | /* fstat() is needed to get the file size. */ | ||
510 | if (fstat(ifd, &st) < 0) | ||
511 | err(1, "fstat() on initrd '%s'", name); | ||
512 | |||
513 | /* | ||
514 | * We map the initrd at the top of memory, but mmap wants it to be | ||
515 | * page-aligned, so we round the size up for that. | ||
516 | */ | ||
517 | len = page_align(st.st_size); | ||
518 | map_at(ifd, from_guest_phys(mem - len), 0, st.st_size); | ||
519 | /* | ||
520 | * Once a file is mapped, you can close the file descriptor. It's a | ||
521 | * little odd, but quite useful. | ||
522 | */ | ||
523 | close(ifd); | ||
524 | verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len); | ||
525 | |||
526 | /* We return the initrd size. */ | ||
527 | return len; | ||
528 | } | ||
529 | /*:*/ | ||
530 | |||
531 | /* | ||
532 | * Simple routine to roll all the commandline arguments together with spaces | ||
533 | * between them. | ||
534 | */ | ||
535 | static void concat(char *dst, char *args[]) | ||
536 | { | ||
537 | unsigned int i, len = 0; | ||
538 | |||
539 | for (i = 0; args[i]; i++) { | ||
540 | if (i) { | ||
541 | strcat(dst+len, " "); | ||
542 | len++; | ||
543 | } | ||
544 | strcpy(dst+len, args[i]); | ||
545 | len += strlen(args[i]); | ||
546 | } | ||
547 | /* In case it's empty. */ | ||
548 | dst[len] = '\0'; | ||
549 | } | ||
550 | |||
551 | /*L:185 | ||
552 | * This is where we actually tell the kernel to initialize the Guest. We | ||
553 | * saw the arguments it expects when we looked at initialize() in lguest_user.c: | ||
554 | * the base of Guest "physical" memory, the top physical page to allow and the | ||
555 | * entry point for the Guest. | ||
556 | */ | ||
557 | static void tell_kernel(unsigned long start) | ||
558 | { | ||
559 | unsigned long args[] = { LHREQ_INITIALIZE, | ||
560 | (unsigned long)guest_base, | ||
561 | guest_limit / getpagesize(), start }; | ||
562 | verbose("Guest: %p - %p (%#lx)\n", | ||
563 | guest_base, guest_base + guest_limit, guest_limit); | ||
564 | lguest_fd = open_or_die("/dev/lguest", O_RDWR); | ||
565 | if (write(lguest_fd, args, sizeof(args)) < 0) | ||
566 | err(1, "Writing to /dev/lguest"); | ||
567 | } | ||
568 | /*:*/ | ||
569 | |||
570 | /*L:200 | ||
571 | * Device Handling. | ||
572 | * | ||
573 | * When the Guest gives us a buffer, it sends an array of addresses and sizes. | ||
574 | * We need to make sure it's not trying to reach into the Launcher itself, so | ||
575 | * we have a convenient routine which checks it and exits with an error message | ||
576 | * if something funny is going on: | ||
577 | */ | ||
578 | static void *_check_pointer(unsigned long addr, unsigned int size, | ||
579 | unsigned int line) | ||
580 | { | ||
581 | /* | ||
582 | * Check whether the requested address plus size exceeds the allocated memory, | ||
583 | * or addr + size wraps around. | ||
584 | */ | ||
585 | if ((addr + size) > guest_limit || (addr + size) < addr) | ||
586 | errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr); | ||
587 | /* | ||
588 | * We return a pointer for the caller's convenience, now we know it's | ||
589 | * safe to use. | ||
590 | */ | ||
591 | return from_guest_phys(addr); | ||
592 | } | ||
593 | /* A macro which transparently hands the line number to the real function. */ | ||
594 | #define check_pointer(addr,size) _check_pointer(addr, size, __LINE__) | ||
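The two-part test in _check_pointer() matters because addr + size is unsigned arithmetic and can wrap past zero. A small sketch of the same check, with a hypothetical demo_range_ok() that returns a boolean instead of exiting:

```c
#include <assert.h>
#include <stdbool.h>

/* True if [addr, addr + size) lies within [0, limit) and does not wrap. */
static bool demo_range_ok(unsigned long addr, unsigned long size,
			  unsigned long limit)
{
	unsigned long end = addr + size;	/* unsigned: wraps on overflow */

	/* end < addr means the sum wrapped, so the limit test alone
	 * would wrongly accept a huge addr. */
	return end >= addr && end <= limit;
}
```

A range ending exactly at the limit is accepted, one crossing it is rejected, and an address near ULONG_MAX whose end wraps to a small number is rejected by the end >= addr test.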
595 | |||
596 | /* | ||
597 | * Each buffer in the virtqueues is actually a chain of descriptors. This | ||
598 | * function returns the next descriptor in the chain, or max if we're at the | ||
599 | * end. | ||
600 | */ | ||
601 | static unsigned next_desc(struct vring_desc *desc, | ||
602 | unsigned int i, unsigned int max) | ||
603 | { | ||
604 | unsigned int next; | ||
605 | |||
606 | /* If this descriptor says it doesn't chain, we're done. */ | ||
607 | if (!(desc[i].flags & VRING_DESC_F_NEXT)) | ||
608 | return max; | ||
609 | |||
610 | /* Check they're not leading us off the end of the descriptors. */ | ||
611 | next = desc[i].next; | ||
612 | /* Make sure compiler knows to grab that: we don't want it changing! */ | ||
613 | wmb(); | ||
614 | |||
615 | if (next >= max) | ||
616 | errx(1, "Desc next is %u", next); | ||
617 | |||
618 | return next; | ||
619 | } | ||
620 | |||
621 | /* | ||
622 | * This actually sends the interrupt for this virtqueue, if we've used a | ||
623 | * buffer. | ||
624 | */ | ||
625 | static void trigger_irq(struct virtqueue *vq) | ||
626 | { | ||
627 | unsigned long buf[] = { LHREQ_IRQ, vq->config.irq }; | ||
628 | |||
629 | /* Don't inform them if nothing used. */ | ||
630 | if (!vq->pending_used) | ||
631 | return; | ||
632 | vq->pending_used = 0; | ||
633 | |||
634 | /* If they don't want an interrupt, don't send one... */ | ||
635 | if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) { | ||
636 | return; | ||
637 | } | ||
638 | |||
639 | /* Send the Guest an interrupt to tell them we used something up. */ | ||
640 | if (write(lguest_fd, buf, sizeof(buf)) != 0) | ||
641 | err(1, "Triggering irq %i", vq->config.irq); | ||
642 | } | ||
643 | |||
644 | /* | ||
645 | * This looks in the virtqueue for the first available buffer, and converts | ||
646 | * it to an iovec for convenient access. Since descriptors consist of some | ||
647 | * number of output then some number of input descriptors, it's actually two | ||
648 | * iovecs, but we pack them into one and note how many of each there were. | ||
649 | * | ||
650 | * This function waits if necessary, and returns the descriptor number found. | ||
651 | */ | ||
652 | static unsigned wait_for_vq_desc(struct virtqueue *vq, | ||
653 | struct iovec iov[], | ||
654 | unsigned int *out_num, unsigned int *in_num) | ||
655 | { | ||
656 | unsigned int i, head, max; | ||
657 | struct vring_desc *desc; | ||
658 | u16 last_avail = lg_last_avail(vq); | ||
659 | |||
660 | /* There's nothing available? */ | ||
661 | while (last_avail == vq->vring.avail->idx) { | ||
662 | u64 event; | ||
663 | |||
664 | /* | ||
665 | * Since we're about to sleep, now is a good time to tell the | ||
666 | * Guest about what we've used up to now. | ||
667 | */ | ||
668 | trigger_irq(vq); | ||
669 | |||
670 | /* OK, now we need to know about added descriptors. */ | ||
671 | vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY; | ||
672 | |||
673 | /* | ||
674 | * They could have slipped one in as we were doing that: make | ||
675 | * sure it's written, then check again. | ||
676 | */ | ||
677 | mb(); | ||
678 | if (last_avail != vq->vring.avail->idx) { | ||
679 | vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; | ||
680 | break; | ||
681 | } | ||
682 | |||
683 | /* Nothing new? Wait for eventfd to tell us they refilled. */ | ||
684 | if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event)) | ||
685 | errx(1, "Event read failed?"); | ||
686 | |||
687 | /* We don't need to be notified again. */ | ||
688 | vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY; | ||
689 | } | ||
690 | |||
691 | /* Check it isn't doing very strange things with descriptor numbers. */ | ||
692 | if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num) | ||
693 | errx(1, "Guest moved avail index from %u to %u", | ||
694 | last_avail, vq->vring.avail->idx); | ||
695 | |||
696 | /* | ||
697 | * Grab the next descriptor number they're advertising, and increment | ||
698 | * the index we've seen. | ||
699 | */ | ||
700 | head = vq->vring.avail->ring[last_avail % vq->vring.num]; | ||
701 | lg_last_avail(vq)++; | ||
702 | |||
703 | /* If their number is silly, that's a fatal mistake. */ | ||
704 | if (head >= vq->vring.num) | ||
705 | errx(1, "Guest says index %u is available", head); | ||
706 | |||
707 | /* When we start there are neither input nor output descriptors. */ | ||
708 | *out_num = *in_num = 0; | ||
709 | |||
710 | max = vq->vring.num; | ||
711 | desc = vq->vring.desc; | ||
712 | i = head; | ||
713 | |||
714 | /* | ||
715 | * If this is an indirect entry, then this buffer contains a descriptor | ||
716 | * table which we handle as if it's any normal descriptor chain. | ||
717 | */ | ||
718 | if (desc[i].flags & VRING_DESC_F_INDIRECT) { | ||
719 | if (desc[i].len % sizeof(struct vring_desc)) | ||
720 | errx(1, "Invalid size for indirect buffer table"); | ||
721 | |||
722 | max = desc[i].len / sizeof(struct vring_desc); | ||
723 | desc = check_pointer(desc[i].addr, desc[i].len); | ||
724 | i = 0; | ||
725 | } | ||
726 | |||
727 | do { | ||
728 | /* Grab the first descriptor, and check it's OK. */ | ||
729 | iov[*out_num + *in_num].iov_len = desc[i].len; | ||
730 | iov[*out_num + *in_num].iov_base | ||
731 | = check_pointer(desc[i].addr, desc[i].len); | ||
732 | /* If this is an input descriptor, increment that count. */ | ||
733 | if (desc[i].flags & VRING_DESC_F_WRITE) | ||
734 | (*in_num)++; | ||
735 | else { | ||
736 | /* | ||
737 | * If it's an output descriptor, they're all supposed | ||
738 | * to come before any input descriptors. | ||
739 | */ | ||
740 | if (*in_num) | ||
741 | errx(1, "Descriptor has out after in"); | ||
742 | (*out_num)++; | ||
743 | } | ||
744 | |||
745 | /* If we've got too many, that implies a descriptor loop. */ | ||
746 | if (*out_num + *in_num > max) | ||
747 | errx(1, "Looped descriptor"); | ||
748 | } while ((i = next_desc(desc, i, max)) != max); | ||
749 | |||
750 | return head; | ||
751 | } | ||
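The chain walk and its loop check can be shown with a stripped-down descriptor table. This is a sketch under assumptions: struct demo_desc and DEMO_F_NEXT are simplified, hypothetical stand-ins for struct vring_desc and VRING_DESC_F_NEXT, and errors are reported as -1 rather than by exiting.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_F_NEXT 1u	/* hypothetical stand-in for VRING_DESC_F_NEXT */

struct demo_desc {
	uint16_t flags;
	uint16_t next;
};

/* Length of the chain starting at head, or -1 if it leaves the table
 * or visits more than max entries (which can only happen in a loop). */
static int demo_chain_len(const struct demo_desc *desc, unsigned head,
			  unsigned max)
{
	unsigned i = head, count = 0;

	for (;;) {
		if (i >= max)
			return -1;	/* index points off the table */
		if (++count > max)
			return -1;	/* more links than entries: a loop */
		if (!(desc[i].flags & DEMO_F_NEXT))
			return (int)count;
		i = desc[i].next;
	}
}
```

Counting links against the table size is enough to detect a cycle: a well-formed chain can never contain more descriptors than the table has entries.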
752 | |||
753 | /* | ||
754 | * After we've used one of their buffers, we tell the Guest about it. Sometime | ||
755 | * later we'll want to send them an interrupt using trigger_irq(); note that | ||
756 | * wait_for_vq_desc() does that for us if it has to wait. | ||
757 | */ | ||
758 | static void add_used(struct virtqueue *vq, unsigned int head, int len) | ||
759 | { | ||
760 | struct vring_used_elem *used; | ||
761 | |||
762 | /* | ||
763 | * The virtqueue contains a ring of used buffers. Get a pointer to the | ||
764 | * next entry in that used ring. | ||
765 | */ | ||
766 | used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num]; | ||
767 | used->id = head; | ||
768 | used->len = len; | ||
769 | /* Make sure buffer is written before we update index. */ | ||
770 | wmb(); | ||
771 | vq->vring.used->idx++; | ||
772 | vq->pending_used++; | ||
773 | } | ||
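The ordering in add_used() (fill in the slot, barrier, then bump the index) can be seen in isolation in the minimal sketch below. The names `used_ring`, `publish_used` and `RING_NUM` are hypothetical and not part of lguest, and `__sync_synchronize()` (a GCC/Clang builtin, full barrier) merely stands in for wmb() here.

```c
#include <stdint.h>

#define RING_NUM 4	/* hypothetical ring size, for illustration only */

struct used_elem {
	uint32_t id;
	uint32_t len;
};

struct used_ring {
	uint16_t idx;			/* free-running; reduced modulo RING_NUM */
	struct used_elem ring[RING_NUM];
};

/* Fill in the slot first, then publish it by bumping the index. */
static void publish_used(struct used_ring *used, uint32_t head, uint32_t len)
{
	struct used_elem *e = &used->ring[used->idx % RING_NUM];

	e->id = head;
	e->len = len;
	/* Assumption: a full barrier is an acceptable stand-in for wmb(). */
	__sync_synchronize();
	used->idx++;
}
```

Because the index is only reduced modulo the ring size when it is used to pick a slot, the fifth entry published into a four-slot ring lands back in slot 0.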
774 | |||
775 | /* And here's the combo meal deal. Supersize me! */ | ||
776 | static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len) | ||
777 | { | ||
778 | add_used(vq, head, len); | ||
779 | trigger_irq(vq); | ||
780 | } | ||
781 | |||
782 | /* | ||
783 | * The Console | ||
784 | * | ||
785 | * We associate some data with the console for our exit hack. | ||
786 | */ | ||
787 | struct console_abort { | ||
788 | /* How many times have they hit ^C? */ | ||
789 | int count; | ||
790 | /* When did they start? */ | ||
791 | struct timeval start; | ||
792 | }; | ||
793 | |||
794 | /* This is the routine which handles console input (ie. stdin). */ | ||
795 | static void console_input(struct virtqueue *vq) | ||
796 | { | ||
797 | int len; | ||
798 | unsigned int head, in_num, out_num; | ||
799 | struct console_abort *abort = vq->dev->priv; | ||
800 | struct iovec iov[vq->vring.num]; | ||
801 | |||
802 | /* Make sure there's a descriptor available. */ | ||
803 | head = wait_for_vq_desc(vq, iov, &out_num, &in_num); | ||
804 | if (out_num) | ||
805 | errx(1, "Output buffers in console in queue?"); | ||
806 | |||
807 | /* Read into it. This is where we usually wait. */ | ||
808 | len = readv(STDIN_FILENO, iov, in_num); | ||
809 | if (len <= 0) { | ||
810 | /* Ran out of input? */ | ||
811 | warnx("Failed to get console input, ignoring console."); | ||
812 | /* | ||
813 | * For simplicity, dying threads kill the whole Launcher. So | ||
814 | * just nap here. | ||
815 | */ | ||
816 | for (;;) | ||
817 | pause(); | ||
818 | } | ||
819 | |||
820 | /* Tell the Guest we used a buffer. */ | ||
821 | add_used_and_trigger(vq, head, len); | ||
822 | |||
823 | /* | ||
824 | * Three ^C within one second? Exit. | ||
825 | * | ||
826 | * This is such a hack, but works surprisingly well. Each ^C has to | ||
827 | * be in a buffer by itself, so they can't be too fast. But we check | ||
828 | * that we get three within about a second, so they can't be too | ||
829 | * slow. | ||
830 | */ | ||
831 | if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) { | ||
832 | abort->count = 0; | ||
833 | return; | ||
834 | } | ||
835 | |||
836 | abort->count++; | ||
837 | if (abort->count == 1) | ||
838 | gettimeofday(&abort->start, NULL); | ||
839 | else if (abort->count == 3) { | ||
840 | struct timeval now; | ||
841 | gettimeofday(&now, NULL); | ||
842 | /* Kill all Launcher processes with SIGINT, like normal ^C */ | ||
843 | if (now.tv_sec <= abort->start.tv_sec+1) | ||
844 | kill(0, SIGINT); | ||
845 | abort->count = 0; | ||
846 | } | ||
847 | } | ||
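The ^C counting at the end of console_input() is easier to follow once it is separated from the I/O. The sketch below pulls it into a pure function with the current time passed in; `abort_state` and `saw_triple_ctrl_c` are hypothetical names, and whole seconds are used instead of struct timeval to keep it short.

```c
#include <stdbool.h>
#include <time.h>

struct abort_state {
	int count;	/* consecutive lone ^C bytes seen so far */
	time_t start;	/* when the first of them arrived */
};

/* Returns true when the third lone ^C arrives within about a second. */
static bool saw_triple_ctrl_c(struct abort_state *st, const char *buf,
			      int len, time_t now)
{
	/* Anything other than a single ^C (ASCII 3) resets the count. */
	if (len != 1 || buf[0] != 3) {
		st->count = 0;
		return false;
	}

	st->count++;
	if (st->count == 1) {
		st->start = now;
		return false;
	}
	if (st->count == 3) {
		st->count = 0;
		return now <= st->start + 1;
	}
	return false;
}
```

As in the original, a slow third ^C resets the count without exiting, so the user has to start over.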
848 | |||
849 | /* This is the routine which handles console output (ie. stdout). */ | ||
850 | static void console_output(struct virtqueue *vq) | ||
851 | { | ||
852 | unsigned int head, out, in; | ||
853 | struct iovec iov[vq->vring.num]; | ||
854 | |||
855 | /* We usually wait in here, for the Guest to give us something. */ | ||
856 | head = wait_for_vq_desc(vq, iov, &out, &in); | ||
857 | if (in) | ||
858 | errx(1, "Input buffers in console output queue?"); | ||
859 | |||
860 | /* writev can return a partial write, so we loop here. */ | ||
861 | while (!iov_empty(iov, out)) { | ||
862 | int len = writev(STDOUT_FILENO, iov, out); | ||
863 | if (len <= 0) { | ||
864 | warn("Write to stdout gave %i (%d)", len, errno); | ||
865 | break; | ||
866 | } | ||
867 | iov_consume(iov, out, len); | ||
868 | } | ||
869 | |||
870 | /* | ||
871 | * We're finished with that buffer: if we're going to sleep, | ||
872 | * wait_for_vq_desc() will prod the Guest with an interrupt. | ||
873 | */ | ||
874 | add_used(vq, head, 0); | ||
875 | } | ||
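The partial-write loop in console_output() relies on iov_consume(), which is defined earlier in lguest.c and not shown in this part of the listing. The following is only a guess at what such a helper has to do (advance past the bytes already written), under the hypothetical name `consume`; it is not the Launcher's actual code.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Drop the first `len` bytes from an iovec array, in place. */
static void consume(struct iovec iov[], unsigned int num, size_t len)
{
	unsigned int i;

	for (i = 0; i < num && len; i++) {
		/* Take as much as this element holds, but no more than len. */
		size_t used = len < iov[i].iov_len ? len : iov[i].iov_len;

		iov[i].iov_base = (char *)iov[i].iov_base + used;
		iov[i].iov_len -= used;
		len -= used;
	}
}
```

After a short writev(), calling this with the returned byte count leaves the array describing only the unwritten tail, so the next writev() picks up where the last one stopped.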
876 | |||
877 | /* | ||
878 | * The Network | ||
879 | * | ||
880 | * Handling output for network is also simple: we get all the output buffers | ||
881 | * and write them to /dev/net/tun. | ||
882 | */ | ||
883 | struct net_info { | ||
884 | int tunfd; | ||
885 | }; | ||
886 | |||
887 | static void net_output(struct virtqueue *vq) | ||
888 | { | ||
889 | struct net_info *net_info = vq->dev->priv; | ||
890 | unsigned int head, out, in; | ||
891 | struct iovec iov[vq->vring.num]; | ||
892 | |||
893 | /* We usually wait in here for the Guest to give us a packet. */ | ||
894 | head = wait_for_vq_desc(vq, iov, &out, &in); | ||
895 | if (in) | ||
896 | errx(1, "Input buffers in net output queue?"); | ||
897 | /* | ||
898 | * Send the whole thing through to /dev/net/tun. It expects the exact | ||
899 | * same format: what a coincidence! | ||
900 | */ | ||
901 | if (writev(net_info->tunfd, iov, out) < 0) | ||
902 | warnx("Write to tun failed (%d)?", errno); | ||
903 | |||
904 | /* | ||
905 | * Done with that one; wait_for_vq_desc() will send the interrupt if | ||
906 | * all packets are processed. | ||
907 | */ | ||
908 | add_used(vq, head, 0); | ||
909 | } | ||
910 | |||
911 | /* | ||
912 | * Handling network input is a bit trickier, because I've tried to optimize it. | ||
913 | * | ||
914 | * First we have a helper routine which tells us if reading from this file | ||
915 | * descriptor (ie. the /dev/net/tun device) will block: | ||
916 | */ | ||
917 | static bool will_block(int fd) | ||
918 | { | ||
919 | fd_set fdset; | ||
920 | struct timeval zero = { 0, 0 }; | ||
921 | FD_ZERO(&fdset); | ||
922 | FD_SET(fd, &fdset); | ||
923 | return select(fd+1, &fdset, NULL, NULL, &zero) != 1; | ||
924 | } | ||
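One way to convince yourself that a zero-timeout select() really answers "would a read block right now?" is to try it on a pipe. The sketch below repeats the same test under the hypothetical name `read_would_block` and adds a small self-check, `pipe_demo`, which is also made up for this illustration.

```c
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Same zero-timeout select() poll as will_block(), for any descriptor. */
static bool read_would_block(int fd)
{
	fd_set fdset;
	struct timeval zero = { 0, 0 };

	FD_ZERO(&fdset);
	FD_SET(fd, &fdset);
	return select(fd + 1, &fdset, NULL, NULL, &zero) != 1;
}

/* True if an empty pipe would block and a pipe holding a byte would not. */
static bool pipe_demo(void)
{
	int fds[2];
	bool before, after;

	if (pipe(fds) != 0)
		return false;
	before = read_would_block(fds[0]);
	if (write(fds[1], "x", 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return false;
	}
	after = read_would_block(fds[0]);
	close(fds[0]);
	close(fds[1]);
	return before && !after;
}
```

Note that a select() error (return value -1) is also reported as "will block", which matches the `!= 1` test in the original.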
925 | |||
926 | /* | ||
927 | * This handles packets coming in from the tun device to our Guest. Like all | ||
928 | * service routines, it gets called again as soon as it returns, so you don't | ||
929 | * see a while(1) loop here. | ||
930 | */ | ||
931 | static void net_input(struct virtqueue *vq) | ||
932 | { | ||
933 | int len; | ||
934 | unsigned int head, out, in; | ||
935 | struct iovec iov[vq->vring.num]; | ||
936 | struct net_info *net_info = vq->dev->priv; | ||
937 | |||
938 | /* | ||
939 | * Get a descriptor to write an incoming packet into. This will also | ||
940 | * send an interrupt if they're out of descriptors. | ||
941 | */ | ||
942 | head = wait_for_vq_desc(vq, iov, &out, &in); | ||
943 | if (out) | ||
944 | errx(1, "Output buffers in net input queue?"); | ||
945 | |||
946 | /* | ||
947 | * If it looks like we'll block reading from the tun device, send them | ||
948 | * an interrupt. | ||
949 | */ | ||
950 | if (vq->pending_used && will_block(net_info->tunfd)) | ||
951 | trigger_irq(vq); | ||
952 | |||
953 | /* | ||
954 | * Read in the packet. This is where we normally wait (when there's no | ||
955 | * incoming network traffic). | ||
956 | */ | ||
957 | len = readv(net_info->tunfd, iov, in); | ||
958 | if (len <= 0) | ||
959 | warn("Failed to read from tun (%d).", errno); | ||
960 | |||
961 | /* | ||
962 | * Mark that packet buffer as used, but don't interrupt here. We want | ||
963 | * to wait until we've done as much work as we can. | ||
964 | */ | ||
965 | add_used(vq, head, len); | ||
966 | } | ||
967 | /*:*/ | ||
968 | |||
969 | /* This is the helper to create threads: run the service routine in a loop. */ | ||
970 | static int do_thread(void *_vq) | ||
971 | { | ||
972 | struct virtqueue *vq = _vq; | ||
973 | |||
974 | for (;;) | ||
975 | vq->service(vq); | ||
976 | return 0; | ||
977 | } | ||
978 | |||
979 | /* | ||
980 | * When a child dies, we kill our entire process group with SIGTERM. This | ||
981 | * also has the side effect that the shell restores the console for us! | ||
982 | */ | ||
983 | static void kill_launcher(int signal) | ||
984 | { | ||
985 | kill(0, SIGTERM); | ||
986 | } | ||
987 | |||
988 | static void reset_device(struct device *dev) | ||
989 | { | ||
990 | struct virtqueue *vq; | ||
991 | |||
992 | verbose("Resetting device %s\n", dev->name); | ||
993 | |||
994 | /* Clear any features they've acked. */ | ||
995 | memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len); | ||
996 | |||
997 | /* We're going to be explicitly killing threads, so ignore them. */ | ||
998 | signal(SIGCHLD, SIG_IGN); | ||
999 | |||
1000 | /* Zero out the virtqueues, get rid of their threads */ | ||
1001 | for (vq = dev->vq; vq; vq = vq->next) { | ||
1002 | if (vq->thread != (pid_t)-1) { | ||
1003 | kill(vq->thread, SIGTERM); | ||
1004 | waitpid(vq->thread, NULL, 0); | ||
1005 | vq->thread = (pid_t)-1; | ||
1006 | } | ||
1007 | memset(vq->vring.desc, 0, | ||
1008 | vring_size(vq->config.num, LGUEST_VRING_ALIGN)); | ||
1009 | lg_last_avail(vq) = 0; | ||
1010 | } | ||
1011 | dev->running = false; | ||
1012 | |||
1013 | /* Now we care if threads die. */ | ||
1014 | signal(SIGCHLD, (void *)kill_launcher); | ||
1015 | } | ||
1016 | |||
1017 | /*L:216 | ||
1018 | * This actually creates the thread which services the virtqueue for a device. | ||
1019 | */ | ||
1020 | static void create_thread(struct virtqueue *vq) | ||
1021 | { | ||
1022 | /* | ||
1023 | * Create stack for thread. Since the stack grows downwards, we point | ||
1024 | * the stack pointer to the end of this region. | ||
1025 | */ | ||
1026 | char *stack = malloc(32768); | ||
1027 | unsigned long args[] = { LHREQ_EVENTFD, | ||
1028 | vq->config.pfn*getpagesize(), 0 }; | ||
1029 | |||
1030 | /* Create a zero-initialized eventfd. */ | ||
1031 | vq->eventfd = eventfd(0, 0); | ||
1032 | if (vq->eventfd < 0) | ||
1033 | err(1, "Creating eventfd"); | ||
1034 | args[2] = vq->eventfd; | ||
1035 | |||
1036 | /* | ||
1037 | * Attach an eventfd to this virtqueue: it will go off when the Guest | ||
1038 | * does an LHCALL_NOTIFY for this vq. | ||
1039 | */ | ||
1040 | if (write(lguest_fd, &args, sizeof(args)) != 0) | ||
1041 | err(1, "Attaching eventfd"); | ||
1042 | |||
1043 | /* | ||
1044 | * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so | ||
1045 | * we get a signal if it dies. | ||
1046 | */ | ||
1047 | vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq); | ||
1048 | if (vq->thread == (pid_t)-1) | ||
1049 | err(1, "Creating clone"); | ||
1050 | |||
1051 | /* We close our local copy now the child has it. */ | ||
1052 | close(vq->eventfd); | ||
1053 | } | ||
1054 | |||
1055 | static void start_device(struct device *dev) | ||
1056 | { | ||
1057 | unsigned int i; | ||
1058 | struct virtqueue *vq; | ||
1059 | |||
1060 | verbose("Device %s OK: offered", dev->name); | ||
1061 | for (i = 0; i < dev->feature_len; i++) | ||
1062 | verbose(" %02x", get_feature_bits(dev)[i]); | ||
1063 | verbose(", accepted"); | ||
1064 | for (i = 0; i < dev->feature_len; i++) | ||
1065 | verbose(" %02x", get_feature_bits(dev) | ||
1066 | [dev->feature_len+i]); | ||
1067 | |||
1068 | for (vq = dev->vq; vq; vq = vq->next) { | ||
1069 | if (vq->service) | ||
1070 | create_thread(vq); | ||
1071 | } | ||
1072 | dev->running = true; | ||
1073 | } | ||
1074 | |||
1075 | static void cleanup_devices(void) | ||
1076 | { | ||
1077 | struct device *dev; | ||
1078 | |||
1079 | for (dev = devices.dev; dev; dev = dev->next) | ||
1080 | reset_device(dev); | ||
1081 | |||
1082 | /* If we saved off the original terminal settings, restore them now. */ | ||
1083 | if (orig_term.c_lflag & (ISIG|ICANON|ECHO)) | ||
1084 | tcsetattr(STDIN_FILENO, TCSANOW, &orig_term); | ||
1085 | } | ||
1086 | |||
1087 | /* When the Guest tells us they updated the status field, we handle it. */ | ||
1088 | static void update_device_status(struct device *dev) | ||
1089 | { | ||
1090 | /* A zero status is a reset, otherwise it's a set of flags. */ | ||
1091 | if (dev->desc->status == 0) | ||
1092 | reset_device(dev); | ||
1093 | else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) { | ||
1094 | warnx("Device %s configuration FAILED", dev->name); | ||
1095 | if (dev->running) | ||
1096 | reset_device(dev); | ||
1097 | } else { | ||
1098 | if (dev->running) | ||
1099 | err(1, "Device %s features finalized twice", dev->name); | ||
1100 | start_device(dev); | ||
1101 | } | ||
1102 | } | ||
1103 | |||
1104 | /*L:215 | ||
1105 | * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In | ||
1106 | * particular, it's used to notify us of device status changes during boot. | ||
1107 | */ | ||
1108 | static void handle_output(unsigned long addr) | ||
1109 | { | ||
1110 | struct device *i; | ||
1111 | |||
1112 | /* Check each device. */ | ||
1113 | for (i = devices.dev; i; i = i->next) { | ||
1114 | struct virtqueue *vq; | ||
1115 | |||
1116 | /* | ||
1117 | * Notifications to device descriptors mean they updated the | ||
1118 | * device status. | ||
1119 | */ | ||
1120 | if (from_guest_phys(addr) == i->desc) { | ||
1121 | update_device_status(i); | ||
1122 | return; | ||
1123 | } | ||
1124 | |||
1125 | /* Devices should not be used before features are finalized. */ | ||
1126 | for (vq = i->vq; vq; vq = vq->next) { | ||
1127 | if (addr != vq->config.pfn*getpagesize()) | ||
1128 | continue; | ||
1129 | errx(1, "Notification on %s before setup!", i->name); | ||
1130 | } | ||
1131 | } | ||
1132 | |||
1133 | /* | ||
1134 | * Early console write is done using notify on a nul-terminated string | ||
1135 | * in Guest memory. It's also great for hacking debugging messages | ||
1136 | * into a Guest. | ||
1137 | */ | ||
1138 | if (addr >= guest_limit) | ||
1139 | errx(1, "Bad NOTIFY %#lx", addr); | ||
1140 | |||
1141 | write(STDOUT_FILENO, from_guest_phys(addr), | ||
1142 | strnlen(from_guest_phys(addr), guest_limit - addr)); | ||
1143 | } | ||
1144 | |||
1145 | /*L:190 | ||
1146 | * Device Setup | ||
1147 | * | ||
1148 | * All devices need a descriptor so the Guest knows it exists, and a "struct | ||
1149 | * device" so the Launcher can keep track of it. We have common helper | ||
1150 | * routines to allocate and manage them. | ||
1151 | */ | ||
1152 | |||
1153 | /* | ||
1154 | * The layout of the device page is a "struct lguest_device_desc" followed by a | ||
1155 | * number of virtqueue descriptors, then two sets of feature bits, then an | ||
1156 | * array of configuration bytes. This routine returns the configuration | ||
1157 | * pointer. | ||
1158 | */ | ||
1159 | static u8 *device_config(const struct device *dev) | ||
1160 | { | ||
1161 | return (void *)(dev->desc + 1) | ||
1162 | + dev->num_vq * sizeof(struct lguest_vqconfig) | ||
1163 | + dev->feature_len * 2; | ||
1164 | } | ||
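The pointer arithmetic in device_config() amounts to summing three sizes. The sketch below spells that out as an offset calculation; `desc_hdr` and `vqconfig` are stand-ins whose fields are assumed for illustration, not copied from the lguest headers, and `config_offset` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for "struct lguest_device_desc" (five one-byte fields). */
struct desc_hdr {
	uint8_t type, num_vq, feature_len, config_len, status;
};

/* Assumed stand-in for "struct lguest_vqconfig". */
struct vqconfig {
	uint16_t num, irq;
	uint32_t pfn;
};

/* Byte offset of the config area from the start of the descriptor. */
static size_t config_offset(unsigned int num_vq, unsigned int feature_len)
{
	return sizeof(struct desc_hdr)			/* the descriptor itself */
		+ num_vq * sizeof(struct vqconfig)	/* one entry per virtqueue */
		+ feature_len * 2;			/* offered + accepted bits */
}
```

With these assumed sizes, a device with two virtqueues and one byte of feature bits has its config area at offset 5 + 2*8 + 2 = 23, which is why new_dev_desc() cannot assume the next descriptor is aligned.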
1165 | |||
1166 | /* | ||
1167 | * This routine allocates a new "struct lguest_device_desc" from the | ||
1168 | * descriptor table page just above the Guest's normal memory. It returns a | ||
1169 | * pointer to that descriptor. | ||
1170 | */ | ||
1171 | static struct lguest_device_desc *new_dev_desc(u16 type) | ||
1172 | { | ||
1173 | struct lguest_device_desc d = { .type = type }; | ||
1174 | void *p; | ||
1175 | |||
1176 | /* Figure out where the next device config is, based on the last one. */ | ||
1177 | if (devices.lastdev) | ||
1178 | p = device_config(devices.lastdev) | ||
1179 | + devices.lastdev->desc->config_len; | ||
1180 | else | ||
1181 | p = devices.descpage; | ||
1182 | |||
1183 | /* We only have one page for all the descriptors. */ | ||
1184 | if (p + sizeof(d) > (void *)devices.descpage + getpagesize()) | ||
1185 | errx(1, "Too many devices"); | ||
1186 | |||
1187 | /* p might not be aligned, so we memcpy in. */ | ||
1188 | return memcpy(p, &d, sizeof(d)); | ||
1189 | } | ||
1190 | |||
1191 | /* | ||
1192 | * Each device descriptor is followed by the description of its virtqueues. We | ||
1193 | * specify how many descriptors the virtqueue is to have. | ||
1194 | */ | ||
1195 | static void add_virtqueue(struct device *dev, unsigned int num_descs, | ||
1196 | void (*service)(struct virtqueue *)) | ||
1197 | { | ||
1198 | unsigned int pages; | ||
1199 | struct virtqueue **i, *vq = malloc(sizeof(*vq)); | ||
1200 | void *p; | ||
1201 | |||
1202 | /* First we need some memory for this virtqueue. */ | ||
1203 | pages = (vring_size(num_descs, LGUEST_VRING_ALIGN) + getpagesize() - 1) | ||
1204 | / getpagesize(); | ||
1205 | p = get_pages(pages); | ||
1206 | |||
1207 | /* Initialize the virtqueue */ | ||
1208 | vq->next = NULL; | ||
1209 | vq->last_avail_idx = 0; | ||
1210 | vq->dev = dev; | ||
1211 | |||
1212 | /* | ||
1213 | * This is the routine the service thread will run, and its Process ID | ||
1214 | * once it's running. | ||
1215 | */ | ||
1216 | vq->service = service; | ||
1217 | vq->thread = (pid_t)-1; | ||
1218 | |||
1219 | /* Initialize the configuration. */ | ||
1220 | vq->config.num = num_descs; | ||
1221 | vq->config.irq = devices.next_irq++; | ||
1222 | vq->config.pfn = to_guest_phys(p) / getpagesize(); | ||
1223 | |||
1224 | /* Initialize the vring. */ | ||
1225 | vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN); | ||
1226 | |||
1227 | /* | ||
1228 | * Append virtqueue to this device's descriptor. We use | ||
1229 | * device_config() to get the end of the device's current virtqueues; | ||
1230 | * we check that we haven't added any config or feature information | ||
1231 | * yet, otherwise we'd be overwriting them. | ||
1232 | */ | ||
1233 | assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0); | ||
1234 | memcpy(device_config(dev), &vq->config, sizeof(vq->config)); | ||
1235 | dev->num_vq++; | ||
1236 | dev->desc->num_vq++; | ||
1237 | |||
1238 | verbose("Virtqueue page %#lx\n", to_guest_phys(p)); | ||
1239 | |||
1240 | /* | ||
1241 | * Add to tail of list, so dev->vq is first vq, dev->vq->next is | ||
1242 | * second. | ||
1243 | */ | ||
1244 | for (i = &dev->vq; *i; i = &(*i)->next); | ||
1245 | *i = vq; | ||
1246 | } | ||
1247 | |||
1248 | /* | ||
1249 | * The first half of the feature bitmask is for us to advertise features. The | ||
1250 | * second half is for the Guest to accept features. | ||
1251 | */ | ||
1252 | static void add_feature(struct device *dev, unsigned bit) | ||
1253 | { | ||
1254 | u8 *features = get_feature_bits(dev); | ||
1255 | |||
1256 | /* We can't extend the feature bits once we've added config bytes */ | ||
1257 | if (dev->desc->feature_len <= bit / CHAR_BIT) { | ||
1258 | assert(dev->desc->config_len == 0); | ||
1259 | dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1; | ||
1260 | } | ||
1261 | |||
1262 | features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT)); | ||
1263 | } | ||
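The byte-and-bit indexing used by add_feature() can be tried on a plain array. This is a minimal sketch under the hypothetical name `set_feature_bit`; it returns the length the bitmask must have instead of updating a device structure.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Set feature `bit` in a byte array and return the bitmask length (in
 * bytes) needed to hold it.  The caller must supply a large enough array.
 */
static unsigned int set_feature_bit(uint8_t *features, unsigned int cur_len,
				    unsigned int bit)
{
	/* Grow the length if this bit lies beyond the current last byte. */
	if (cur_len <= bit / CHAR_BIT)
		cur_len = bit / CHAR_BIT + 1;

	features[bit / CHAR_BIT] |= (uint8_t)(1u << (bit % CHAR_BIT));
	return cur_len;
}
```

For example, bit 28 lives in byte 3 (28 / 8) as mask 0x10 (1 << (28 % 8)), so setting it forces the bitmask to be at least four bytes long.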
1264 | |||
1265 | /* | ||
1266 | * This routine sets the configuration fields for an existing device's | ||
1267 | * descriptor. It only works for the last device, but that's OK because that's | ||
1268 | * how we use it. | ||
1269 | */ | ||
1270 | static void set_config(struct device *dev, unsigned len, const void *conf) | ||
1271 | { | ||
1272 | /* Check we haven't overflowed our single page. */ | ||
1273 | if (device_config(dev) + len > devices.descpage + getpagesize()) | ||
1274 | errx(1, "Too many devices"); | ||
1275 | |||
1276 | /* Copy in the config information, and store the length. */ | ||
1277 | memcpy(device_config(dev), conf, len); | ||
1278 | dev->desc->config_len = len; | ||
1279 | |||
1280 | /* Size must fit in config_len field (8 bits)! */ | ||
1281 | assert(dev->desc->config_len == len); | ||
1282 | } | ||
1283 | |||
1284 | /* | ||
1285 | * This routine does all the creation and setup of a new device, including | ||
1286 | * calling new_dev_desc() to allocate the descriptor and device memory. We | ||
1287 | * don't actually start the service threads until later. | ||
1288 | * | ||
1289 | * See what I mean about userspace being boring? | ||
1290 | */ | ||
1291 | static struct device *new_device(const char *name, u16 type) | ||
1292 | { | ||
1293 | struct device *dev = malloc(sizeof(*dev)); | ||
1294 | |||
1295 | /* Now we populate the fields one at a time. */ | ||
1296 | dev->desc = new_dev_desc(type); | ||
1297 | dev->name = name; | ||
1298 | dev->vq = NULL; | ||
1299 | dev->feature_len = 0; | ||
1300 | dev->num_vq = 0; | ||
1301 | dev->running = false; | ||
1302 | |||
1303 | /* | ||
1304 | * Append to device list. Prepending to a singly-linked list is | ||
1305 | * easier, but the user expects the devices to be arranged on the bus | ||
1306 | * in command-line order. The first network device on the command line | ||
1307 | * is eth0, the first block device /dev/vda, etc. | ||
1308 | */ | ||
1309 | if (devices.lastdev) | ||
1310 | devices.lastdev->next = dev; | ||
1311 | else | ||
1312 | devices.dev = dev; | ||
1313 | devices.lastdev = dev; | ||
1314 | |||
1315 | return dev; | ||
1316 | } | ||
1317 | |||
1318 | /* | ||
1319 | * Our first setup routine is the console. It's a fairly simple device, but | ||
1320 | * UNIX tty handling makes it uglier than it could be. | ||
1321 | */ | ||
1322 | static void setup_console(void) | ||
1323 | { | ||
1324 | struct device *dev; | ||
1325 | |||
1326 | /* If we can save the initial standard input settings... */ | ||
1327 | if (tcgetattr(STDIN_FILENO, &orig_term) == 0) { | ||
1328 | struct termios term = orig_term; | ||
1329 | /* | ||
1330 | * Then we turn off echo, line buffering and ^C etc: We want a | ||
1331 | * raw input stream to the Guest. | ||
1332 | */ | ||
1333 | term.c_lflag &= ~(ISIG|ICANON|ECHO); | ||
1334 | tcsetattr(STDIN_FILENO, TCSANOW, &term); | ||
1335 | } | ||
1336 | |||
1337 | dev = new_device("console", VIRTIO_ID_CONSOLE); | ||
1338 | |||
1339 | /* We store the console state in dev->priv, and initialize it. */ | ||
1340 | dev->priv = malloc(sizeof(struct console_abort)); | ||
1341 | ((struct console_abort *)dev->priv)->count = 0; | ||
1342 | |||
1343 | /* | ||
1344 | * The console needs two virtqueues: the input then the output. When | ||
1345 | * they put something in the input queue, we make sure we're listening to | ||
1346 | * stdin. When they put something in the output queue, we write it to | ||
1347 | * stdout. | ||
1348 | */ | ||
1349 | add_virtqueue(dev, VIRTQUEUE_NUM, console_input); | ||
1350 | add_virtqueue(dev, VIRTQUEUE_NUM, console_output); | ||
1351 | |||
1352 | verbose("device %u: console\n", ++devices.device_num); | ||
1353 | } | ||
1354 | /*:*/ | ||
1355 | |||
1356 | /*M:010 | ||
1357 | * Inter-guest networking is an interesting area. Simplest is to have a | ||
1358 | * --sharenet=<name> option which opens or creates a named pipe. This can be | ||
1359 | * used to send packets to another guest in a 1:1 manner. | ||
1360 | * | ||
1361 | * More sophisticated is to use one of the tools developed for projects like UML | ||
1362 | * to do networking. | ||
1363 | * | ||
1364 | * Faster is to do virtio bonding in kernel. Doing this 1:1 would be | ||
1365 | * completely generic ("here's my vring, attach to your vring") and would work | ||
1366 | * for any traffic. Of course, namespace and permissions issues need to be | ||
1367 | * dealt with. A more sophisticated "multi-channel" virtio_net.c could hide | ||
1368 | * multiple inter-guest channels behind one interface, although it would | ||
1369 | * require some manner of hotplugging new virtio channels. | ||
1370 | * | ||
1371 | * Finally, we could use a virtio network switch in the kernel, ie. vhost. | ||
1372 | :*/ | ||
1373 | |||
1374 | static u32 str2ip(const char *ipaddr) | ||
1375 | { | ||
1376 | unsigned int b[4]; | ||
1377 | |||
1378 | if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4) | ||
1379 | errx(1, "Failed to parse IP address '%s'", ipaddr); | ||
1380 | return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]; | ||
1381 | } | ||
1382 | |||
1383 | static void str2mac(const char *macaddr, unsigned char mac[6]) | ||
1384 | { | ||
1385 | unsigned int m[6]; | ||
1386 | if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x", | ||
1387 | &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6) | ||
1388 | errx(1, "Failed to parse mac address '%s'", macaddr); | ||
1389 | mac[0] = m[0]; | ||
1390 | mac[1] = m[1]; | ||
1391 | mac[2] = m[2]; | ||
1392 | mac[3] = m[3]; | ||
1393 | mac[4] = m[4]; | ||
1394 | mac[5] = m[5]; | ||
1395 | } | ||
1396 | |||
1397 | /* | ||
1398 | * This code is "adapted" from libbridge: it attaches the Host end of the | ||
1399 | * network device to the bridge device specified by the command line. | ||
1400 | * | ||
1401 | * This is yet another James Morris contribution (I'm an IP-level guy, so I | ||
1402 | * dislike bridging), and I just try not to break it. | ||
1403 | */ | ||
1404 | static void add_to_bridge(int fd, const char *if_name, const char *br_name) | ||
1405 | { | ||
1406 | int ifidx; | ||
1407 | struct ifreq ifr; | ||
1408 | |||
1409 | if (!*br_name) | ||
1410 | errx(1, "must specify bridge name"); | ||
1411 | |||
1412 | ifidx = if_nametoindex(if_name); | ||
1413 | if (!ifidx) | ||
1414 | errx(1, "interface %s does not exist!", if_name); | ||
1415 | |||
1416 | strncpy(ifr.ifr_name, br_name, IFNAMSIZ); | ||
1417 | ifr.ifr_name[IFNAMSIZ-1] = '\0'; | ||
1418 | ifr.ifr_ifindex = ifidx; | ||
1419 | if (ioctl(fd, SIOCBRADDIF, &ifr) < 0) | ||
1420 | err(1, "can't add %s to bridge %s", if_name, br_name); | ||
1421 | } | ||
1422 | |||
1423 | /* | ||
1424 | * This sets up the Host end of the network device with an IP address and | ||
1425 | * brings it up so packets will flow. (Any MAC address given on the command | ||
1426 | * line is parsed separately, in setup_tun_net().) | ||
1427 | */ | ||
1428 | static void configure_device(int fd, const char *tapif, u32 ipaddr) | ||
1429 | { | ||
1430 | struct ifreq ifr; | ||
1431 | struct sockaddr_in sin; | ||
1432 | |||
1433 | memset(&ifr, 0, sizeof(ifr)); | ||
1434 | strcpy(ifr.ifr_name, tapif); | ||
1435 | |||
1436 | /* Don't read these incantations. Just cut & paste them like I did! */ | ||
1437 | sin.sin_family = AF_INET; | ||
1438 | sin.sin_addr.s_addr = htonl(ipaddr); | ||
1439 | memcpy(&ifr.ifr_addr, &sin, sizeof(sin)); | ||
1440 | if (ioctl(fd, SIOCSIFADDR, &ifr) != 0) | ||
1441 | err(1, "Setting %s interface address", tapif); | ||
1442 | ifr.ifr_flags = IFF_UP; | ||
1443 | if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0) | ||
1444 | err(1, "Bringing interface %s up", tapif); | ||
1445 | } | ||
1446 | |||
1447 | static int get_tun_device(char tapif[IFNAMSIZ]) | ||
1448 | { | ||
1449 | struct ifreq ifr; | ||
1450 | int netfd; | ||
1451 | |||
1452 | /* Start with this zeroed. Messy but sure. */ | ||
1453 | memset(&ifr, 0, sizeof(ifr)); | ||
1454 | |||
1455 | /* | ||
1456 | * We open the /dev/net/tun device and tell it we want a tap device. A | ||
1457 | * tap device is like a tun device, only somehow different. To tell | ||
1458 | * the truth, I completely blundered my way through this code, but it | ||
1459 | * works now! | ||
1460 | */ | ||
1461 | netfd = open_or_die("/dev/net/tun", O_RDWR); | ||
1462 | ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; | ||
1463 | strcpy(ifr.ifr_name, "tap%d"); | ||
1464 | if (ioctl(netfd, TUNSETIFF, &ifr) != 0) | ||
1465 | err(1, "configuring /dev/net/tun"); | ||
1466 | |||
1467 | if (ioctl(netfd, TUNSETOFFLOAD, | ||
1468 | TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0) | ||
1469 | err(1, "Could not set features for tun device"); | ||
1470 | |||
1471 | /* | ||
1472 | * We don't need checksums calculated for packets coming in this | ||
1473 | * device: trust us! | ||
1474 | */ | ||
1475 | ioctl(netfd, TUNSETNOCSUM, 1); | ||
1476 | |||
1477 | memcpy(tapif, ifr.ifr_name, IFNAMSIZ); | ||
1478 | return netfd; | ||
1479 | } | ||
1480 | |||
1481 | /*L:195 | ||
1482 | * Our network is a Host<->Guest network. This can either use bridging or | ||
1483 | * routing, but the principle is the same: it uses the "tun" device to inject | ||
1484 | * packets into the Host as if they came in from a normal network card. We | ||
1485 | * just shunt packets between the Guest and the tun device. | ||
1486 | */ | ||
1487 | static void setup_tun_net(char *arg) | ||
1488 | { | ||
1489 | struct device *dev; | ||
1490 | struct net_info *net_info = malloc(sizeof(*net_info)); | ||
1491 | int ipfd; | ||
1492 | u32 ip = INADDR_ANY; | ||
1493 | bool bridging = false; | ||
1494 | char tapif[IFNAMSIZ], *p; | ||
1495 | struct virtio_net_config conf; | ||
1496 | |||
1497 | net_info->tunfd = get_tun_device(tapif); | ||
1498 | |||
1499 | /* First we create a new network device. */ | ||
1500 | dev = new_device("net", VIRTIO_ID_NET); | ||
1501 | dev->priv = net_info; | ||
1502 | |||
1503 | /* Network devices need a recv and a send queue, just like console. */ | ||
1504 | add_virtqueue(dev, VIRTQUEUE_NUM, net_input); | ||
1505 | add_virtqueue(dev, VIRTQUEUE_NUM, net_output); | ||
1506 | |||
1507 | /* | ||
1508 | * We need a socket to perform the magic network ioctls to bring up the | ||
1509 | * tap interface, connect to the bridge etc. Any socket will do! | ||
1510 | */ | ||
1511 | ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); | ||
1512 | if (ipfd < 0) | ||
1513 | err(1, "opening IP socket"); | ||
1514 | |||
1515 | /* If the command line was --tunnet=bridge:<name> do bridging. */ | ||
1516 | if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) { | ||
1517 | arg += strlen(BRIDGE_PFX); | ||
1518 | bridging = true; | ||
1519 | } | ||
1520 | |||
1521 | /* A mac address may follow the bridge name or IP address */ | ||
1522 | p = strchr(arg, ':'); | ||
1523 | if (p) { | ||
1524 | str2mac(p+1, conf.mac); | ||
1525 | add_feature(dev, VIRTIO_NET_F_MAC); | ||
1526 | *p = '\0'; | ||
1527 | } | ||
1528 | |||
1529 | /* arg is now either an IP address or a bridge name */ | ||
1530 | if (bridging) | ||
1531 | add_to_bridge(ipfd, tapif, arg); | ||
1532 | else | ||
1533 | ip = str2ip(arg); | ||
1534 | |||
1535 | /* Set up the tun device. */ | ||
1536 | configure_device(ipfd, tapif, ip); | ||
1537 | |||
1538 | /* Expect Guest to handle everything except UFO */ | ||
1539 | add_feature(dev, VIRTIO_NET_F_CSUM); | ||
1540 | add_feature(dev, VIRTIO_NET_F_GUEST_CSUM); | ||
1541 | add_feature(dev, VIRTIO_NET_F_GUEST_TSO4); | ||
1542 | add_feature(dev, VIRTIO_NET_F_GUEST_TSO6); | ||
1543 | add_feature(dev, VIRTIO_NET_F_GUEST_ECN); | ||
1544 | add_feature(dev, VIRTIO_NET_F_HOST_TSO4); | ||
1545 | add_feature(dev, VIRTIO_NET_F_HOST_TSO6); | ||
1546 | add_feature(dev, VIRTIO_NET_F_HOST_ECN); | ||
1547 | /* We handle indirect ring entries */ | ||
1548 | add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC); | ||
1549 | set_config(dev, sizeof(conf), &conf); | ||
1550 | |||
1551 | /* We don't need the socket any more; setup is done. */ | ||
1552 | close(ipfd); | ||
1553 | |||
1554 | devices.device_num++; | ||
1555 | |||
1556 | if (bridging) | ||
1557 | verbose("device %u: tun %s attached to bridge: %s\n", | ||
1558 | devices.device_num, tapif, arg); | ||
1559 | else | ||
1560 | verbose("device %u: tun %s: %s\n", | ||
1561 | devices.device_num, tapif, arg); | ||
1562 | } | ||
1563 | /*:*/ | ||
1564 | |||
1565 | /* This hangs off device->priv. */ | ||
1566 | struct vblk_info { | ||
1567 | /* The size of the file. */ | ||
1568 | off64_t len; | ||
1569 | |||
1570 | /* The file descriptor for the file. */ | ||
1571 | int fd; | ||
1572 | |||
1573 | }; | ||
1574 | |||
1575 | /*L:210 | ||
1576 | * The Disk | ||
1577 | * | ||
1578 | * The disk only has one virtqueue, so it only has one thread. It is really | ||
1579 | * simple: the Guest asks for a block number and we read or write that position | ||
1580 | * in the file. | ||
1581 | * | ||
1582 | * Before we serviced each virtqueue in a separate thread, that was unacceptably | ||
1583 | * slow: the Guest waits until the read is finished before running anything | ||
1584 | * else, even if it could have been doing useful work. | ||
1585 | * | ||
1586 | * We could have used async I/O, except it's reputed to suck so hard that | ||
1587 | * characters actually go missing from your code when you try to use it. | ||
1588 | */ | ||
1589 | static void blk_request(struct virtqueue *vq) | ||
1590 | { | ||
1591 | struct vblk_info *vblk = vq->dev->priv; | ||
1592 | unsigned int head, out_num, in_num, wlen; | ||
1593 | int ret; | ||
1594 | u8 *in; | ||
1595 | struct virtio_blk_outhdr *out; | ||
1596 | struct iovec iov[vq->vring.num]; | ||
1597 | off64_t off; | ||
1598 | |||
1599 | /* | ||
1600 | * Get the next request, where we normally wait. It triggers the | ||
1601 | * interrupt to acknowledge previously serviced requests (if any). | ||
1602 | */ | ||
1603 | head = wait_for_vq_desc(vq, iov, &out_num, &in_num); | ||
1604 | |||
1605 | /* | ||
1606 | * Every block request should contain at least one output buffer | ||
1607 | * (detailing the location on disk and the type of request) and one | ||
1608 | * input buffer (to hold the result). | ||
1609 | */ | ||
1610 | if (out_num == 0 || in_num == 0) | ||
1611 | errx(1, "Bad virtblk cmd %u out=%u in=%u", | ||
1612 | head, out_num, in_num); | ||
1613 | |||
1614 | out = convert(&iov[0], struct virtio_blk_outhdr); | ||
1615 | in = convert(&iov[out_num+in_num-1], u8); | ||
1616 | /* | ||
1617 | * For historical reasons, block operations are expressed in 512 byte | ||
1618 | * "sectors". | ||
1619 | */ | ||
1620 | off = out->sector * 512; | ||
1621 | |||
1622 | /* | ||
1623 | * In general the virtio block driver is allowed to try SCSI commands. | ||
1624 | * It'd be nice if we supported eject, for example, but we don't. | ||
1625 | */ | ||
1626 | if (out->type & VIRTIO_BLK_T_SCSI_CMD) { | ||
1627 | fprintf(stderr, "Scsi commands unsupported\n"); | ||
1628 | *in = VIRTIO_BLK_S_UNSUPP; | ||
1629 | wlen = sizeof(*in); | ||
1630 | } else if (out->type & VIRTIO_BLK_T_OUT) { | ||
1631 | /* | ||
1632 | * Write | ||
1633 | * | ||
1634 | * Move to the right location in the block file. This can fail | ||
1635 | * if they try to write past end. | ||
1636 | */ | ||
1637 | if (lseek64(vblk->fd, off, SEEK_SET) != off) | ||
1638 | err(1, "Bad seek to sector %llu", out->sector); | ||
1639 | |||
1640 | ret = writev(vblk->fd, iov+1, out_num-1); | ||
1641 | verbose("WRITE to sector %llu: %i\n", out->sector, ret); | ||
1642 | |||
1643 | /* | ||
1644 | * Grr... Now we know how long the descriptor they sent was, we | ||
1645 | * make sure they didn't try to write over the end of the block | ||
1646 | * file (possibly extending it). | ||
1647 | */ | ||
1648 | if (ret > 0 && off + ret > vblk->len) { | ||
1649 | /* Trim it back to the correct length */ | ||
1650 | ftruncate64(vblk->fd, vblk->len); | ||
1651 | /* Die, bad Guest, die. */ | ||
1652 | errx(1, "Write past end %llu+%u", off, ret); | ||
1653 | } | ||
1654 | |||
1655 | wlen = sizeof(*in); | ||
1656 | *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR); | ||
1657 | } else if (out->type & VIRTIO_BLK_T_FLUSH) { | ||
1658 | /* Flush */ | ||
1659 | ret = fdatasync(vblk->fd); | ||
1660 | verbose("FLUSH fdatasync: %i\n", ret); | ||
1661 | wlen = sizeof(*in); | ||
1662 | *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR); | ||
1663 | } else { | ||
1664 | /* | ||
1665 | * Read | ||
1666 | * | ||
1667 | * Move to the right location in the block file. This can fail | ||
1668 | * if they try to read past end. | ||
1669 | */ | ||
1670 | if (lseek64(vblk->fd, off, SEEK_SET) != off) | ||
1671 | err(1, "Bad seek to sector %llu", out->sector); | ||
1672 | |||
1673 | ret = readv(vblk->fd, iov+1, in_num-1); | ||
1674 | verbose("READ from sector %llu: %i\n", out->sector, ret); | ||
1675 | if (ret >= 0) { | ||
1676 | wlen = sizeof(*in) + ret; | ||
1677 | *in = VIRTIO_BLK_S_OK; | ||
1678 | } else { | ||
1679 | wlen = sizeof(*in); | ||
1680 | *in = VIRTIO_BLK_S_IOERR; | ||
1681 | } | ||
1682 | } | ||
1683 | |||
1684 | /* Finished that request. */ | ||
1685 | add_used(vq, head, wlen); | ||
1686 | } | ||
1687 | |||
1688 | /*L:198 This actually sets up a virtual block device. */ | ||
1689 | static void setup_block_file(const char *filename) | ||
1690 | { | ||
1691 | struct device *dev; | ||
1692 | struct vblk_info *vblk; | ||
1693 | struct virtio_blk_config conf; | ||
1694 | |||
1695 | /* Create the device. */ | ||
1696 | dev = new_device("block", VIRTIO_ID_BLOCK); | ||
1697 | |||
1698 | /* The device has one virtqueue, where the Guest places requests. */ | ||
1699 | add_virtqueue(dev, VIRTQUEUE_NUM, blk_request); | ||
1700 | |||
1701 | /* Allocate the room for our own bookkeeping */ | ||
1702 | vblk = dev->priv = malloc(sizeof(*vblk)); | ||
1703 | |||
1704 | /* First we open the file and store the length. */ | ||
1705 | vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE); | ||
1706 | vblk->len = lseek64(vblk->fd, 0, SEEK_END); | ||
1707 | |||
1708 | /* We support FLUSH. */ | ||
1709 | add_feature(dev, VIRTIO_BLK_F_FLUSH); | ||
1710 | |||
1711 | /* Tell Guest how many sectors this device has. */ | ||
1712 | conf.capacity = cpu_to_le64(vblk->len / 512); | ||
1713 | |||
1714 | /* | ||
1715 | * Tell Guest not to put in too many descriptors at once: two are used | ||
1716 | * for the in and out elements. | ||
1717 | */ | ||
1718 | add_feature(dev, VIRTIO_BLK_F_SEG_MAX); | ||
1719 | conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2); | ||
1720 | |||
1721 | /* Don't try to put the whole struct: we have an 8-bit limit. */ | ||
1722 | set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf); | ||
1723 | |||
1724 | verbose("device %u: virtblock %llu sectors\n", | ||
1725 | ++devices.device_num, le64_to_cpu(conf.capacity)); | ||
1726 | } | ||
1727 | |||
1728 | /*L:211 | ||
1729 | * Our random number generator device reads from /dev/random into the Guest's | ||
1730 | * input buffers. The usual case is that the Guest doesn't want random numbers | ||
1731 | * and so has no buffers although /dev/random is still readable, whereas | ||
1732 | * console is the reverse. | ||
1733 | * | ||
1734 | * The same logic applies, however. | ||
1735 | */ | ||
1736 | struct rng_info { | ||
1737 | int rfd; | ||
1738 | }; | ||
1739 | |||
1740 | static void rng_input(struct virtqueue *vq) | ||
1741 | { | ||
1742 | int len; | ||
1743 | unsigned int head, in_num, out_num, totlen = 0; | ||
1744 | struct rng_info *rng_info = vq->dev->priv; | ||
1745 | struct iovec iov[vq->vring.num]; | ||
1746 | |||
1747 | /* First we need a buffer from the Guest's virtqueue. */ | ||
1748 | head = wait_for_vq_desc(vq, iov, &out_num, &in_num); | ||
1749 | if (out_num) | ||
1750 | errx(1, "Output buffers in rng?"); | ||
1751 | |||
1752 | /* | ||
1753 | * Just like the console write, we loop to cover the whole iovec. | ||
1754 | * In this case, short reads actually happen quite a bit. | ||
1755 | */ | ||
1756 | while (!iov_empty(iov, in_num)) { | ||
1757 | len = readv(rng_info->rfd, iov, in_num); | ||
1758 | if (len <= 0) | ||
1759 | err(1, "Read from /dev/random gave %i", len); | ||
1760 | iov_consume(iov, in_num, len); | ||
1761 | totlen += len; | ||
1762 | } | ||
1763 | |||
1764 | /* Tell the Guest about the new input. */ | ||
1765 | add_used(vq, head, totlen); | ||
1766 | } | ||
1767 | |||
1768 | /*L:199 | ||
1769 | * This creates a "hardware" random number device for the Guest. | ||
1770 | */ | ||
1771 | static void setup_rng(void) | ||
1772 | { | ||
1773 | struct device *dev; | ||
1774 | struct rng_info *rng_info = malloc(sizeof(*rng_info)); | ||
1775 | |||
1776 | /* Our device's private info simply contains the /dev/random fd. */ | ||
1777 | rng_info->rfd = open_or_die("/dev/random", O_RDONLY); | ||
1778 | |||
1779 | /* Create the new device. */ | ||
1780 | dev = new_device("rng", VIRTIO_ID_RNG); | ||
1781 | dev->priv = rng_info; | ||
1782 | |||
1783 | /* The device has one virtqueue, where the Guest places inbufs. */ | ||
1784 | add_virtqueue(dev, VIRTQUEUE_NUM, rng_input); | ||
1785 | |||
1786 | verbose("device %u: rng\n", devices.device_num++); | ||
1787 | } | ||
1788 | /* That's the end of device setup. */ | ||
1789 | |||
1790 | /*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */ | ||
1791 | static void __attribute__((noreturn)) restart_guest(void) | ||
1792 | { | ||
1793 | unsigned int i; | ||
1794 | |||
1795 | /* | ||
1796 | * Since we don't track all open fds, we simply close everything beyond | ||
1797 | * stderr. | ||
1798 | */ | ||
1799 | for (i = 3; i < FD_SETSIZE; i++) | ||
1800 | close(i); | ||
1801 | |||
1802 | /* Reset all the devices (kills all threads). */ | ||
1803 | cleanup_devices(); | ||
1804 | |||
1805 | execv(main_args[0], main_args); | ||
1806 | err(1, "Could not exec %s", main_args[0]); | ||
1807 | } | ||
1808 | |||
1809 | /*L:220 | ||
1810 | * Finally we reach the core of the Launcher which runs the Guest, serves | ||
1811 | * its input and output, and finally, lays it to rest. | ||
1812 | */ | ||
1813 | static void __attribute__((noreturn)) run_guest(void) | ||
1814 | { | ||
1815 | for (;;) { | ||
1816 | unsigned long notify_addr; | ||
1817 | int readval; | ||
1818 | |||
1819 | /* We read from the /dev/lguest device to run the Guest. */ | ||
1820 | readval = pread(lguest_fd, &notify_addr, | ||
1821 | sizeof(notify_addr), cpu_id); | ||
1822 | |||
1823 | /* One unsigned long means the Guest did HCALL_NOTIFY */ | ||
1824 | if (readval == sizeof(notify_addr)) { | ||
1825 | verbose("Notify on address %#lx\n", notify_addr); | ||
1826 | handle_output(notify_addr); | ||
1827 | /* ENOENT means the Guest died. Reading tells us why. */ | ||
1828 | } else if (errno == ENOENT) { | ||
1829 | char reason[1024] = { 0 }; | ||
1830 | pread(lguest_fd, reason, sizeof(reason)-1, cpu_id); | ||
1831 | errx(1, "%s", reason); | ||
1832 | /* ERESTART means that we need to reboot the guest */ | ||
1833 | } else if (errno == ERESTART) { | ||
1834 | restart_guest(); | ||
1835 | /* Anything else means a bug or incompatible change. */ | ||
1836 | } else | ||
1837 | err(1, "Running guest failed"); | ||
1838 | } | ||
1839 | } | ||
1840 | /*L:240 | ||
1841 | * This is the end of the Launcher. The good news: we are over halfway | ||
1842 | * through! The bad news: the most fiendish part of the code still lies ahead | ||
1843 | * of us. | ||
1844 | * | ||
1845 | * Are you ready? Take a deep breath and join me in the core of the Host, in | ||
1846 | * "make Host". | ||
1847 | :*/ | ||
1848 | |||
1849 | static struct option opts[] = { | ||
1850 | { "verbose", 0, NULL, 'v' }, | ||
1851 | { "tunnet", 1, NULL, 't' }, | ||
1852 | { "block", 1, NULL, 'b' }, | ||
1853 | { "rng", 0, NULL, 'r' }, | ||
1854 | { "initrd", 1, NULL, 'i' }, | ||
1855 | { "username", 1, NULL, 'u' }, | ||
1856 | { "chroot", 1, NULL, 'c' }, | ||
1857 | { NULL }, | ||
1858 | }; | ||
1859 | static void usage(void) | ||
1860 | { | ||
1861 | errx(1, "Usage: lguest [--verbose] " | ||
1862 | "[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n" | ||
1863 | "|--block=<filename>|--initrd=<filename>]...\n" | ||
1864 | "<mem-in-mb> vmlinux [args...]"); | ||
1865 | } | ||
1866 | |||
1867 | /*L:105 The main routine is where the real work begins: */ | ||
1868 | int main(int argc, char *argv[]) | ||
1869 | { | ||
1870 | /* Memory, code startpoint and size of the (optional) initrd. */ | ||
1871 | unsigned long mem = 0, start, initrd_size = 0; | ||
1872 | /* Two temporaries. */ | ||
1873 | int i, c; | ||
1874 | /* The boot information for the Guest. */ | ||
1875 | struct boot_params *boot; | ||
1876 | /* If they specify an initrd file to load. */ | ||
1877 | const char *initrd_name = NULL; | ||
1878 | |||
1879 | /* Password structure for initgroups/setres[gu]id */ | ||
1880 | struct passwd *user_details = NULL; | ||
1881 | |||
1882 | /* Directory to chroot to */ | ||
1883 | char *chroot_path = NULL; | ||
1884 | |||
1885 | /* Save the args: we "reboot" by execing ourselves again. */ | ||
1886 | main_args = argv; | ||
1887 | |||
1888 | /* | ||
1889 | * First we initialize the device list. We keep a pointer to the last | ||
1890 | * device, and the next interrupt number to use for devices (1: | ||
1891 | * remember that 0 is used by the timer). | ||
1892 | */ | ||
1893 | devices.lastdev = NULL; | ||
1894 | devices.next_irq = 1; | ||
1895 | |||
1896 | /* We're CPU 0. In fact, that's the only CPU possible right now. */ | ||
1897 | cpu_id = 0; | ||
1898 | |||
1899 | /* | ||
1900 | * We need to know how much memory so we can set up the device | ||
1901 | * descriptor and memory pages for the devices as we parse the command | ||
1902 | * line. So we quickly look through the arguments to find the amount | ||
1903 | * of memory now. | ||
1904 | */ | ||
1905 | for (i = 1; i < argc; i++) { | ||
1906 | if (argv[i][0] != '-') { | ||
1907 | mem = atoi(argv[i]) * 1024 * 1024; | ||
1908 | /* | ||
1909 | * We start by mapping anonymous pages over all of | ||
1910 | * guest-physical memory range. This fills it with 0, | ||
1911 | * and ensures that the Guest won't be killed when it | ||
1912 | * tries to access it. | ||
1913 | */ | ||
1914 | guest_base = map_zeroed_pages(mem / getpagesize() | ||
1915 | + DEVICE_PAGES); | ||
1916 | guest_limit = mem; | ||
1917 | guest_max = mem + DEVICE_PAGES*getpagesize(); | ||
1918 | devices.descpage = get_pages(1); | ||
1919 | break; | ||
1920 | } | ||
1921 | } | ||
1922 | |||
1923 | /* The options are fairly straight-forward */ | ||
1924 | while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) { | ||
1925 | switch (c) { | ||
1926 | case 'v': | ||
1927 | verbose = true; | ||
1928 | break; | ||
1929 | case 't': | ||
1930 | setup_tun_net(optarg); | ||
1931 | break; | ||
1932 | case 'b': | ||
1933 | setup_block_file(optarg); | ||
1934 | break; | ||
1935 | case 'r': | ||
1936 | setup_rng(); | ||
1937 | break; | ||
1938 | case 'i': | ||
1939 | initrd_name = optarg; | ||
1940 | break; | ||
1941 | case 'u': | ||
1942 | user_details = getpwnam(optarg); | ||
1943 | if (!user_details) | ||
1944 | err(1, "getpwnam failed, incorrect username?"); | ||
1945 | break; | ||
1946 | case 'c': | ||
1947 | chroot_path = optarg; | ||
1948 | break; | ||
1949 | default: | ||
1950 | warnx("Unknown argument %s", argv[optind]); | ||
1951 | usage(); | ||
1952 | } | ||
1953 | } | ||
1954 | /* | ||
1955 | * After the other arguments we expect memory and kernel image name, | ||
1956 | * followed by command line arguments for the kernel. | ||
1957 | */ | ||
1958 | if (optind + 2 > argc) | ||
1959 | usage(); | ||
1960 | |||
1961 | verbose("Guest base is at %p\n", guest_base); | ||
1962 | |||
1963 | /* We always have a console device */ | ||
1964 | setup_console(); | ||
1965 | |||
1966 | /* Now we load the kernel */ | ||
1967 | start = load_kernel(open_or_die(argv[optind+1], O_RDONLY)); | ||
1968 | |||
1969 | /* Boot information is stashed at physical address 0 */ | ||
1970 | boot = from_guest_phys(0); | ||
1971 | |||
1972 | /* Map the initrd image if requested (at top of physical memory) */ | ||
1973 | if (initrd_name) { | ||
1974 | initrd_size = load_initrd(initrd_name, mem); | ||
1975 | /* | ||
1976 | * These are the location in the Linux boot header where the | ||
1977 | * start and size of the initrd are expected to be found. | ||
1978 | */ | ||
1979 | boot->hdr.ramdisk_image = mem - initrd_size; | ||
1980 | boot->hdr.ramdisk_size = initrd_size; | ||
1981 | /* The bootloader type 0xFF means "unknown"; that's OK. */ | ||
1982 | boot->hdr.type_of_loader = 0xFF; | ||
1983 | } | ||
1984 | |||
1985 | /* | ||
1986 | * The Linux boot header contains an "E820" memory map: ours is a | ||
1987 | * simple, single region. | ||
1988 | */ | ||
1989 | boot->e820_entries = 1; | ||
1990 | boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM }); | ||
1991 | /* | ||
1992 | * The boot header contains a command line pointer: we put the command | ||
1993 | * line after the boot header. | ||
1994 | */ | ||
1995 | boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1); | ||
1996 | /* We use a simple helper to copy the arguments separated by spaces. */ | ||
1997 | concat((char *)(boot + 1), argv+optind+2); | ||
1998 | |||
1999 | /* Set kernel alignment to 16M (CONFIG_PHYSICAL_ALIGN) */ | ||
2000 | boot->hdr.kernel_alignment = 0x1000000; | ||
2001 | |||
2002 | /* Boot protocol version: 2.07 supports the fields for lguest. */ | ||
2003 | boot->hdr.version = 0x207; | ||
2004 | |||
2005 | /* The hardware_subarch value of "1" tells the Guest it's an lguest. */ | ||
2006 | boot->hdr.hardware_subarch = 1; | ||
2007 | |||
2008 | /* Tell the entry path not to try to reload segment registers. */ | ||
2009 | boot->hdr.loadflags |= KEEP_SEGMENTS; | ||
2010 | |||
2011 | /* We tell the kernel to initialize the Guest. */ | ||
2012 | tell_kernel(start); | ||
2013 | |||
2014 | /* Ensure that we terminate if a device-servicing child dies. */ | ||
2015 | signal(SIGCHLD, kill_launcher); | ||
2016 | |||
2017 | /* If we exit via err(), this kills all the threads, restores tty. */ | ||
2018 | atexit(cleanup_devices); | ||
2019 | |||
2020 | /* If requested, chroot to a directory */ | ||
2021 | if (chroot_path) { | ||
2022 | if (chroot(chroot_path) != 0) | ||
2023 | err(1, "chroot(\"%s\") failed", chroot_path); | ||
2024 | |||
2025 | if (chdir("/") != 0) | ||
2026 | err(1, "chdir(\"/\") failed"); | ||
2027 | |||
2028 | verbose("chroot done\n"); | ||
2029 | } | ||
2030 | |||
2031 | /* If requested, drop privileges */ | ||
2032 | if (user_details) { | ||
2033 | uid_t u; | ||
2034 | gid_t g; | ||
2035 | |||
2036 | u = user_details->pw_uid; | ||
2037 | g = user_details->pw_gid; | ||
2038 | |||
2039 | if (initgroups(user_details->pw_name, g) != 0) | ||
2040 | err(1, "initgroups failed"); | ||
2041 | |||
2042 | if (setresgid(g, g, g) != 0) | ||
2043 | err(1, "setresgid failed"); | ||
2044 | |||
2045 | if (setresuid(u, u, u) != 0) | ||
2046 | err(1, "setresuid failed"); | ||
2047 | |||
2048 | verbose("Dropping privileges completed\n"); | ||
2049 | } | ||
2050 | |||
2051 | /* Finally, run the Guest. This doesn't return. */ | ||
2052 | run_guest(); | ||
2053 | } | ||
2054 | /*:*/ | ||
2055 | |||
2056 | /*M:999 | ||
2057 | * Mastery is done: you now know everything I do. | ||
2058 | * | ||
2059 | * But surely you have seen code, features and bugs in your wanderings which | ||
2060 | * you now yearn to attack? That is the real game, and I look forward to you | ||
2061 | * patching and forking lguest into the Your-Name-Here-visor. | ||
2062 | * | ||
2063 | * Farewell, and good coding! | ||
2064 | * Rusty Russell. | ||
2065 | */ | ||
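The disk handler in blk_request() above works in 512-byte sectors: the request's sector number is multiplied by 512 to get a file offset, the capacity reported to the Guest is the file length divided by 512, and a write that runs past the end of the backing file is treated as fatal. The following is a minimal sketch of that arithmetic only, not the Launcher's code. The names SECTOR_SIZE, sector_to_offset, capacity_in_sectors and write_past_end are hypothetical helpers introduced for illustration. The Launcher performs its check after the write, using the byte count returned by writev(), whereas this sketch evaluates the same condition as a pure function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Block operations are expressed in 512-byte sectors. */
#define SECTOR_SIZE 512ULL

/* Byte offset in the backing file for a given sector (hypothetical helper). */
static uint64_t sector_to_offset(uint64_t sector)
{
	/* Guard against wrap-around in the multiplication. */
	assert(sector <= UINT64_MAX / SECTOR_SIZE);
	return sector * SECTOR_SIZE;
}

/* Whole sectors in a file of file_len bytes; a partial tail is ignored. */
static uint64_t capacity_in_sectors(uint64_t file_len)
{
	return file_len / SECTOR_SIZE;
}

/* True if writing nbytes starting at sector would run past file_len. */
static bool write_past_end(uint64_t sector, uint64_t nbytes, uint64_t file_len)
{
	return sector_to_offset(sector) + nbytes > file_len;
}
```

For a 1024-byte backing file this reports a capacity of two sectors, and a 513-byte write starting at sector 1 counts as running past the end.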
diff --git a/Documentation/virtual/lguest/lguest.txt b/Documentation/virtual/lguest/lguest.txt new file mode 100644 index 00000000000..bff0c554485 --- /dev/null +++ b/Documentation/virtual/lguest/lguest.txt | |||
@@ -0,0 +1,129 @@ | |||
1 | __ | ||
2 | (___()'`; Rusty's Remarkably Unreliable Guide to Lguest | ||
3 | /, /` - or, A Young Coder's Illustrated Hypervisor | ||
4 | \\"--\\ http://lguest.ozlabs.org | ||
5 | |||
6 | Lguest is designed to be a minimal 32-bit x86 hypervisor for the Linux kernel, | ||
7 | for Linux developers and users to experiment with virtualization with the | ||
8 | minimum of complexity. Nonetheless, it should have sufficient features to | ||
9 | make it useful for specific tasks, and, of course, you are encouraged to fork | ||
10 | and enhance it (see drivers/lguest/README). | ||
11 | |||
12 | Features: | ||
13 | |||
14 | - Kernel module which runs in a normal kernel. | ||
15 | - Simple I/O model for communication. | ||
16 | - Simple program to create new guests. | ||
17 | - Logo contains cute puppies: http://lguest.ozlabs.org | ||
18 | |||
19 | Developer features: | ||
20 | |||
21 | - Fun to hack on. | ||
22 | - No ABI: being tied to a specific kernel anyway, you can change anything. | ||
23 | - Many opportunities for improvement or feature implementation. | ||
24 | |||
25 | Running Lguest: | ||
26 | |||
27 | - The easiest way to run lguest is to use the same kernel as guest and host. | ||
28 | You can configure them differently, but usually it's easiest not to. | ||
29 | |||
30 | You will need to configure your kernel with the following options: | ||
31 | |||
32 | "General setup": | ||
33 | "Prompt for development and/or incomplete code/drivers" = Y | ||
34 | (CONFIG_EXPERIMENTAL=y) | ||
35 | |||
36 | "Processor type and features": | ||
37 | "Paravirtualized guest support" = Y | ||
38 | "Lguest guest support" = Y | ||
39 | "High Memory Support" = off/4GB | ||
40 | "Alignment value to which kernel should be aligned" = 0x100000 | ||
41 | (CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and | ||
42 | CONFIG_PHYSICAL_ALIGN=0x100000) | ||
43 | |||
44 | "Device Drivers": | ||
45 | "Block devices" | ||
46 | "Virtio block driver (EXPERIMENTAL)" = M/Y | ||
47 | "Network device support" | ||
48 | "Universal TUN/TAP device driver support" = M/Y | ||
49 | "Virtio network driver (EXPERIMENTAL)" = M/Y | ||
50 | (CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m and CONFIG_TUN=m) | ||
51 | |||
52 | "Virtualization" | ||
53 | "Linux hypervisor example code" = M/Y | ||
54 | (CONFIG_LGUEST=m) | ||
55 | |||
56 | - A tool called "lguest" is available in this directory: type "make" | ||
57 | to build it. If you didn't build your kernel in-tree, use "make | ||
58 | O=<builddir>". | ||
59 | |||
60 | - Create or find a root disk image. There are several useful ones | ||
61 | around, such as the xm-test tiny root image at | ||
62 | http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img | ||
63 | |||
64 | For more serious work, I usually use a distribution ISO image and | ||
65 | install it under qemu, then make multiple copies: | ||
66 | |||
67 | dd if=/dev/zero of=rootfile bs=1M count=2048 | ||
68 | qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d | ||
69 | |||
70 | Make sure that you install a getty on /dev/hvc0 if you want to log in on the | ||
71 | console! | ||
72 | |||
73 | - "modprobe lg" if you built it as a module. | ||
74 | |||
75 | - Run an lguest as root: | ||
76 | |||
77 | Documentation/virtual/lguest/lguest 64 vmlinux --tunnet=192.168.19.1 \ | ||
78 | --block=rootfile root=/dev/vda | ||
79 | |||
80 | Explanation: | ||
81 | 64: the amount of memory to use, in MB. | ||
82 | |||
83 | vmlinux: the kernel image found in the top of your build directory. You | ||
84 | can also use a standard bzImage. | ||
85 | |||
86 | --tunnet=192.168.19.1: configures a "tap" device for networking with this | ||
87 | IP address. | ||
88 | |||
89 | --block=rootfile: a file or block device which becomes /dev/vda | ||
90 | inside the guest. | ||
91 | |||
92 | root=/dev/vda: this (and anything else on the command line) are | ||
93 | kernel boot parameters. | ||
94 | |||
95 | - Configuring networking. I usually have the host masquerade, using | ||
96 | "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 > | ||
97 | /proc/sys/net/ipv4/ip_forward". In this example, I would configure | ||
98 | eth0 inside the guest at 192.168.19.2. | ||
99 | |||
100 | Another method is to bridge the tap device to an external interface | ||
101 | using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest | ||
102 | to obtain an IP address. The bridge needs to be configured first: | ||
103 | this option simply adds the tap interface to it. | ||
104 | |||
105 | A simple example on my system: | ||
106 | |||
107 | ifconfig eth0 0.0.0.0 | ||
108 | brctl addbr lg0 | ||
109 | ifconfig lg0 up | ||
110 | brctl addif lg0 eth0 | ||
111 | dhclient lg0 | ||
112 | |||
113 | Then use --tunnet=bridge:lg0 when launching the guest. | ||
114 | |||
115 | See: | ||
116 | |||
117 | http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge | ||
118 | |||
119 | for general information on how to get bridging to work. | ||
120 | |||
121 | - Random number generation. Using the --rng option will provide a | ||
122 | /dev/hwrng in the guest that will read from the host's /dev/random. | ||
123 | Use this option in conjunction with rng-tools (see ../hw_random.txt) | ||
124 | to provide entropy to the guest kernel's /dev/random. | ||
125 | |||
126 | There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest | ||
127 | |||
128 | Good luck! | ||
129 | Rusty Russell rusty@rustcorp.com.au. | ||
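The first non-option argument in the example command line (64) is a size in megabytes. As a rough sketch of the arithmetic the Launcher's main() performs on it: the value is scaled to bytes, device pages are reserved above the Guest's RAM, and an initrd, if one is given, is placed at the top of RAM. The struct and function names below are hypothetical, and the page size and device page count are passed in as parameters rather than taken from the real getpagesize() or DEVICE_PAGES.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of the Guest's physical memory layout. */
struct guest_layout {
	uint64_t limit;        /* bytes of Guest RAM (cf. guest_limit) */
	uint64_t max;          /* RAM plus device pages (cf. guest_max) */
	uint64_t total_pages;  /* pages to map: RAM pages + device pages */
};

/* Compute the layout for mem_mb megabytes of Guest RAM. */
static struct guest_layout layout_for(unsigned int mem_mb,
				      uint64_t page_size,
				      uint64_t device_pages)
{
	struct guest_layout l;

	assert(page_size > 0);
	/* Do the scaling in 64 bits so large sizes cannot overflow an int. */
	l.limit = (uint64_t)mem_mb * 1024 * 1024;
	l.total_pages = l.limit / page_size + device_pages;
	l.max = l.limit + device_pages * page_size;
	return l;
}

/* The initrd sits at the top of RAM, so it starts this many bytes in. */
static uint64_t initrd_start(uint64_t mem_bytes, uint64_t initrd_size)
{
	assert(initrd_size <= mem_bytes);
	return mem_bytes - initrd_size;
}
```

With 64 MB of RAM, 4096-byte pages and an assumed 256 device pages, this gives 67108864 bytes of RAM and 16640 pages to map in total.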
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile new file mode 100644 index 00000000000..3fa4d066886 --- /dev/null +++ b/Documentation/vm/Makefile | |||
@@ -0,0 +1,8 @@ | |||
1 | # kbuild trick to avoid linker error. Can be omitted if a module is built. | ||
2 | obj- := dummy.o | ||
3 | |||
4 | # List of programs to build | ||
5 | hostprogs-y := page-types hugepage-mmap hugepage-shm map_hugetlb | ||
6 | |||
7 | # Tell kbuild to always build the programs | ||
8 | always := $(hostprogs-y) | ||
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c new file mode 100644 index 00000000000..db0dd9a33d5 --- /dev/null +++ b/Documentation/vm/hugepage-mmap.c | |||
@@ -0,0 +1,91 @@ | |||
1 | /* | ||
2 | * hugepage-mmap: | ||
3 | * | ||
4 | * Example of using huge page memory in a user application using the mmap | ||
5 | * system call. Before running this application, make sure that the | ||
6 | * administrator has mounted the hugetlbfs filesystem (on some directory | ||
7 | * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this | ||
8 | * example, the app is requesting memory of size 256MB that is backed by | ||
9 | * huge pages. | ||
10 | * | ||
11 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
12 | * huge pages. That means that if one requires a fixed address, a huge page | ||
13 | * aligned address starting with 0x800000... will be required. If a fixed | ||
14 | * address is not required, the kernel will select an address in the proper | ||
15 | * range. | ||
16 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
17 | */ | ||
18 | |||
19 | #include <stdlib.h> | ||
20 | #include <stdio.h> | ||
21 | #include <unistd.h> | ||
22 | #include <sys/mman.h> | ||
23 | #include <fcntl.h> | ||
24 | |||
25 | #define FILE_NAME "/mnt/hugepagefile" | ||
26 | #define LENGTH (256UL*1024*1024) | ||
27 | #define PROTECTION (PROT_READ | PROT_WRITE) | ||
28 | |||
29 | /* Only ia64 requires this */ | ||
30 | #ifdef __ia64__ | ||
31 | #define ADDR (void *)(0x8000000000000000UL) | ||
32 | #define FLAGS (MAP_SHARED | MAP_FIXED) | ||
33 | #else | ||
34 | #define ADDR (void *)(0x0UL) | ||
35 | #define FLAGS (MAP_SHARED) | ||
36 | #endif | ||
37 | |||
38 | static void check_bytes(char *addr) | ||
39 | { | ||
40 | printf("First hex is %x\n", *((unsigned int *)addr)); | ||
41 | } | ||
42 | |||
43 | static void write_bytes(char *addr) | ||
44 | { | ||
45 | unsigned long i; | ||
46 | |||
47 | for (i = 0; i < LENGTH; i++) | ||
48 | *(addr + i) = (char)i; | ||
49 | } | ||
50 | |||
51 | static void read_bytes(char *addr) | ||
52 | { | ||
53 | unsigned long i; | ||
54 | |||
55 | check_bytes(addr); | ||
56 | for (i = 0; i < LENGTH; i++) | ||
57 | if (*(addr + i) != (char)i) { | ||
58 | printf("Mismatch at %lu\n", i); | ||
59 | break; | ||
60 | } | ||
61 | } | ||
62 | |||
63 | int main(void) | ||
64 | { | ||
65 | void *addr; | ||
66 | int fd; | ||
67 | |||
68 | fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755); | ||
69 | if (fd < 0) { | ||
70 | perror("Open failed"); | ||
71 | exit(1); | ||
72 | } | ||
73 | |||
74 | addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0); | ||
75 | if (addr == MAP_FAILED) { | ||
76 | perror("mmap"); | ||
77 | unlink(FILE_NAME); | ||
78 | exit(1); | ||
79 | } | ||
80 | |||
81 | printf("Returned address is %p\n", addr); | ||
82 | check_bytes(addr); | ||
83 | write_bytes(addr); | ||
84 | read_bytes(addr); | ||
85 | |||
86 | munmap(addr, LENGTH); | ||
87 | close(fd); | ||
88 | unlink(FILE_NAME); | ||
89 | |||
90 | return 0; | ||
91 | } | ||
diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c new file mode 100644 index 00000000000..07956d8592c --- /dev/null +++ b/Documentation/vm/hugepage-shm.c | |||
@@ -0,0 +1,98 @@ | |||
1 | /* | ||
2 | * hugepage-shm: | ||
3 | * | ||
4 | * Example of using huge page memory in a user application using Sys V shared | ||
5 | * memory system calls. In this example the app is requesting 256MB of | ||
6 | * memory that is backed by huge pages. The application uses the flag | ||
7 | * SHM_HUGETLB in the shmget system call to inform the kernel that it is | ||
8 | * requesting huge pages. | ||
9 | * | ||
10 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
11 | * huge pages. That means that if one requires a fixed address, a huge page | ||
12 | * aligned address starting with 0x800000... will be required. If a fixed | ||
13 | * address is not required, the kernel will select an address in the proper | ||
14 | * range. | ||
15 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
16 | * | ||
17 | * Note: The default shared memory limit is quite low on many kernels; | ||
18 | * you may need to increase it via: | ||
19 | * | ||
20 | * echo 268435456 > /proc/sys/kernel/shmmax | ||
21 | * | ||
22 | * This will increase the maximum size per shared memory segment to 256MB. | ||
23 | * The other limit that you will hit eventually is shmall which is the | ||
24 | * total amount of shared memory in pages. To set it to 16GB on a system | ||
25 | * with a 4kB pagesize do: | ||
26 | * | ||
27 | * echo 4194304 > /proc/sys/kernel/shmall | ||
28 | */ | ||
29 | |||
30 | #include <stdlib.h> | ||
31 | #include <stdio.h> | ||
32 | #include <sys/types.h> | ||
33 | #include <sys/ipc.h> | ||
34 | #include <sys/shm.h> | ||
35 | #include <sys/mman.h> | ||
36 | |||
37 | #ifndef SHM_HUGETLB | ||
38 | #define SHM_HUGETLB 04000 | ||
39 | #endif | ||
40 | |||
41 | #define LENGTH (256UL*1024*1024) | ||
42 | |||
43 | #define dprintf(x) printf(x) | ||
44 | |||
45 | /* Only ia64 requires this */ | ||
46 | #ifdef __ia64__ | ||
47 | #define ADDR (void *)(0x8000000000000000UL) | ||
48 | #define SHMAT_FLAGS (SHM_RND) | ||
49 | #else | ||
50 | #define ADDR (void *)(0x0UL) | ||
51 | #define SHMAT_FLAGS (0) | ||
52 | #endif | ||
53 | |||
54 | int main(void) | ||
55 | { | ||
56 | int shmid; | ||
57 | unsigned long i; | ||
58 | char *shmaddr; | ||
59 | |||
60 | if ((shmid = shmget(2, LENGTH, | ||
61 | SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) { | ||
62 | perror("shmget"); | ||
63 | exit(1); | ||
64 | } | ||
65 | printf("shmid: 0x%x\n", shmid); | ||
66 | |||
67 | shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS); | ||
68 | if (shmaddr == (char *)-1) { | ||
69 | perror("Shared memory attach failure"); | ||
70 | shmctl(shmid, IPC_RMID, NULL); | ||
71 | exit(2); | ||
72 | } | ||
73 | printf("shmaddr: %p\n", shmaddr); | ||
74 | |||
75 | dprintf("Starting the writes:\n"); | ||
76 | for (i = 0; i < LENGTH; i++) { | ||
77 | shmaddr[i] = (char)(i); | ||
78 | if (!(i % (1024 * 1024))) | ||
79 | dprintf("."); | ||
80 | } | ||
81 | dprintf("\n"); | ||
82 | |||
83 | dprintf("Starting the Check..."); | ||
84 | for (i = 0; i < LENGTH; i++) | ||
85 | if (shmaddr[i] != (char)i) | ||
86 | printf("\nIndex %lu mismatched\n", i); | ||
87 | dprintf("Done.\n"); | ||
88 | |||
89 | if (shmdt((const void *)shmaddr) != 0) { | ||
90 | perror("Detach failure"); | ||
91 | shmctl(shmid, IPC_RMID, NULL); | ||
92 | exit(3); | ||
93 | } | ||
94 | |||
95 | shmctl(shmid, IPC_RMID, NULL); | ||
96 | |||
97 | return 0; | ||
98 | } | ||
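The two sysctl values quoted in the header comment of hugepage-shm.c follow from simple arithmetic: shmmax is a size in bytes, while shmall is a count of pages. A small sketch, using hypothetical helper names, that reproduces both numbers:

```c
#include <assert.h>
#include <stdint.h>

/* shmmax is in bytes: convert a segment size given in megabytes. */
static uint64_t shmmax_for_mb(uint64_t megabytes)
{
	return megabytes * 1024 * 1024;
}

/* shmall is in pages: convert a total size in gigabytes for a page size. */
static uint64_t shmall_for_gb(uint64_t gigabytes, uint64_t page_size)
{
	assert(page_size > 0);
	return gigabytes * 1024 * 1024 * 1024 / page_size;
}
```

A 256 MB segment limit gives the 268435456 written to shmmax, and 16 GB of 4096-byte pages gives the 4194304 written to shmall.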
diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c new file mode 100644 index 00000000000..eda1a6d3578 --- /dev/null +++ b/Documentation/vm/map_hugetlb.c | |||
@@ -0,0 +1,77 @@ | |||
1 | /* | ||
2 | * Example of using hugepage memory in a user application using the mmap | ||
3 | * system call with MAP_HUGETLB flag. Before running this program make | ||
4 | * sure the administrator has allocated enough default sized huge pages | ||
5 | * to cover the 256 MB allocation. | ||
6 | * | ||
7 | * For ia64 architecture, Linux kernel reserves Region number 4 for hugepages. | ||
8 | * That means the addresses starting with 0x800000... will need to be | ||
9 | * specified. Specifying a fixed address is not required on ppc64, i386 | ||
10 | * or x86_64. | ||
11 | */ | ||
12 | #include <stdlib.h> | ||
13 | #include <stdio.h> | ||
14 | #include <unistd.h> | ||
15 | #include <sys/mman.h> | ||
16 | #include <fcntl.h> | ||
17 | |||
18 | #define LENGTH (256UL*1024*1024) | ||
19 | #define PROTECTION (PROT_READ | PROT_WRITE) | ||
20 | |||
21 | #ifndef MAP_HUGETLB | ||
22 | #define MAP_HUGETLB 0x40000 /* arch specific */ | ||
23 | #endif | ||
24 | |||
25 | /* Only ia64 requires this */ | ||
26 | #ifdef __ia64__ | ||
27 | #define ADDR (void *)(0x8000000000000000UL) | ||
28 | #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED) | ||
29 | #else | ||
30 | #define ADDR (void *)(0x0UL) | ||
31 | #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) | ||
32 | #endif | ||
33 | |||
34 | static void check_bytes(char *addr) | ||
35 | { | ||
36 | printf("First hex is %x\n", *((unsigned int *)addr)); | ||
37 | } | ||
38 | |||
39 | static void write_bytes(char *addr) | ||
40 | { | ||
41 | unsigned long i; | ||
42 | |||
43 | for (i = 0; i < LENGTH; i++) | ||
44 | *(addr + i) = (char)i; | ||
45 | } | ||
46 | |||
47 | static void read_bytes(char *addr) | ||
48 | { | ||
49 | unsigned long i; | ||
50 | |||
51 | check_bytes(addr); | ||
52 | for (i = 0; i < LENGTH; i++) | ||
53 | if (*(addr + i) != (char)i) { | ||
54 | printf("Mismatch at %lu\n", i); | ||
55 | break; | ||
56 | } | ||
57 | } | ||
58 | |||
59 | int main(void) | ||
60 | { | ||
61 | void *addr; | ||
62 | |||
63 | addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, -1, 0); | ||
64 | if (addr == MAP_FAILED) { | ||
65 | perror("mmap"); | ||
66 | exit(1); | ||
67 | } | ||
68 | |||
69 | printf("Returned address is %p\n", addr); | ||
70 | check_bytes(addr); | ||
71 | write_bytes(addr); | ||
72 | read_bytes(addr); | ||
73 | |||
74 | munmap(addr, LENGTH); | ||
75 | |||
76 | return 0; | ||
77 | } | ||
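The header comment of map_hugetlb.c asks that enough default-sized huge pages be available to cover the 256 MB mapping. As a rough sketch of that arithmetic, the helpers below read the page size from a `Hugepagesize:` line as it appears in /proc/meminfo and compute how many pages a mapping needs. Both function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one /proc/meminfo line and return the huge
 * page size in kB, or 0 if the line is not the "Hugepagesize:" entry.
 */
unsigned long parse_hugepagesize_kb(const char *line)
{
	unsigned long kb;

	if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
		return kb;
	return 0;
}

/*
 * Hypothetical helper: number of huge pages needed to cover length bytes,
 * rounding up without overflowing on large lengths.
 */
unsigned long hugepages_needed(unsigned long length,
			       unsigned long hugepage_bytes)
{
	assert(hugepage_bytes != 0);
	return length / hugepage_bytes + (length % hugepage_bytes != 0);
}
```

With a 2048 kB huge page, the 256 MB mapping in the example needs 128 pages; that figure can then be compared against `HugePages_Free` in /proc/meminfo before running the program.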
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c new file mode 100644 index 00000000000..7445caa26d0 --- /dev/null +++ b/Documentation/vm/page-types.c | |||
@@ -0,0 +1,1100 @@ | |||
1 | /* | ||
2 | * page-types: Tool for querying page flags | ||
3 | * | ||
4 | * This program is free software; you can redistribute it and/or modify it | ||
5 | * under the terms of the GNU General Public License as published by the Free | ||
6 | * Software Foundation; version 2. | ||
7 | * | ||
8 | * This program is distributed in the hope that it will be useful, but WITHOUT | ||
9 | * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||
10 | * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | ||
11 | * more details. | ||
12 | * | ||
13 | * You should find a copy of v2 of the GNU General Public License somewhere on | ||
14 | * your Linux system; if not, write to the Free Software Foundation, Inc., 59 | ||
15 | * Temple Place, Suite 330, Boston, MA 02111-1307 USA. | ||
16 | * | ||
17 | * Copyright (C) 2009 Intel corporation | ||
18 | * | ||
19 | * Authors: Wu Fengguang <fengguang.wu@intel.com> | ||
20 | */ | ||
21 | |||
22 | #define _LARGEFILE64_SOURCE | ||
23 | #include <stdio.h> | ||
24 | #include <stdlib.h> | ||
25 | #include <unistd.h> | ||
26 | #include <stdint.h> | ||
27 | #include <stdarg.h> | ||
28 | #include <string.h> | ||
29 | #include <getopt.h> | ||
30 | #include <limits.h> | ||
31 | #include <assert.h> | ||
32 | #include <sys/types.h> | ||
33 | #include <sys/errno.h> | ||
34 | #include <sys/fcntl.h> | ||
35 | #include <sys/mount.h> | ||
36 | #include <sys/statfs.h> | ||
37 | #include "../../include/linux/magic.h" | ||
38 | |||
39 | |||
40 | #ifndef MAX_PATH | ||
41 | # define MAX_PATH 256 | ||
42 | #endif | ||
43 | |||
44 | #ifndef STR | ||
45 | # define _STR(x) #x | ||
46 | # define STR(x) _STR(x) | ||
47 | #endif | ||
48 | |||
49 | /* | ||
50 | * pagemap kernel ABI bits | ||
51 | */ | ||
52 | |||
53 | #define PM_ENTRY_BYTES sizeof(uint64_t) | ||
54 | #define PM_STATUS_BITS 3 | ||
55 | #define PM_STATUS_OFFSET (64 - PM_STATUS_BITS) | ||
56 | #define PM_STATUS_MASK (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET) | ||
57 | #define PM_STATUS(nr) (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK) | ||
58 | #define PM_PSHIFT_BITS 6 | ||
59 | #define PM_PSHIFT_OFFSET (PM_STATUS_OFFSET - PM_PSHIFT_BITS) | ||
60 | #define PM_PSHIFT_MASK (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET) | ||
61 | #define PM_PSHIFT(x) (((uint64_t) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK) | ||
62 | #define PM_PFRAME_MASK ((1LL << PM_PSHIFT_OFFSET) - 1) | ||
63 | #define PM_PFRAME(x) ((x) & PM_PFRAME_MASK) | ||
64 | |||
65 | #define PM_PRESENT PM_STATUS(4LL) | ||
66 | #define PM_SWAP PM_STATUS(2LL) | ||
67 | |||
68 | |||
69 | /* | ||
70 | * kernel page flags | ||
71 | */ | ||
72 | |||
73 | #define KPF_BYTES 8 | ||
74 | #define PROC_KPAGEFLAGS "/proc/kpageflags" | ||
75 | |||
76 | /* copied from kpageflags_read() */ | ||
77 | #define KPF_LOCKED 0 | ||
78 | #define KPF_ERROR 1 | ||
79 | #define KPF_REFERENCED 2 | ||
80 | #define KPF_UPTODATE 3 | ||
81 | #define KPF_DIRTY 4 | ||
82 | #define KPF_LRU 5 | ||
83 | #define KPF_ACTIVE 6 | ||
84 | #define KPF_SLAB 7 | ||
85 | #define KPF_WRITEBACK 8 | ||
86 | #define KPF_RECLAIM 9 | ||
87 | #define KPF_BUDDY 10 | ||
88 | |||
89 | /* [11-20] new additions in 2.6.31 */ | ||
90 | #define KPF_MMAP 11 | ||
91 | #define KPF_ANON 12 | ||
92 | #define KPF_SWAPCACHE 13 | ||
93 | #define KPF_SWAPBACKED 14 | ||
94 | #define KPF_COMPOUND_HEAD 15 | ||
95 | #define KPF_COMPOUND_TAIL 16 | ||
96 | #define KPF_HUGE 17 | ||
97 | #define KPF_UNEVICTABLE 18 | ||
98 | #define KPF_HWPOISON 19 | ||
99 | #define KPF_NOPAGE 20 | ||
100 | #define KPF_KSM 21 | ||
101 | |||
102 | /* [32-] kernel hacking aids */ | ||
103 | #define KPF_RESERVED 32 | ||
104 | #define KPF_MLOCKED 33 | ||
105 | #define KPF_MAPPEDTODISK 34 | ||
106 | #define KPF_PRIVATE 35 | ||
107 | #define KPF_PRIVATE_2 36 | ||
108 | #define KPF_OWNER_PRIVATE 37 | ||
109 | #define KPF_ARCH 38 | ||
110 | #define KPF_UNCACHED 39 | ||
111 | |||
112 | /* [48-] take some arbitrary free slots for expanding overloaded flags | ||
113 | * not part of kernel API | ||
114 | */ | ||
115 | #define KPF_READAHEAD 48 | ||
116 | #define KPF_SLOB_FREE 49 | ||
117 | #define KPF_SLUB_FROZEN 50 | ||
118 | #define KPF_SLUB_DEBUG 51 | ||
119 | |||
120 | #define KPF_ALL_BITS ((uint64_t)~0ULL) | ||
121 | #define KPF_HACKERS_BITS (0xffffULL << 32) | ||
122 | #define KPF_OVERLOADED_BITS (0xffffULL << 48) | ||
123 | #define BIT(name) (1ULL << KPF_##name) | ||
124 | #define BITS_COMPOUND (BIT(COMPOUND_HEAD) | BIT(COMPOUND_TAIL)) | ||
125 | |||
126 | static const char *page_flag_names[] = { | ||
127 | [KPF_LOCKED] = "L:locked", | ||
128 | [KPF_ERROR] = "E:error", | ||
129 | [KPF_REFERENCED] = "R:referenced", | ||
130 | [KPF_UPTODATE] = "U:uptodate", | ||
131 | [KPF_DIRTY] = "D:dirty", | ||
132 | [KPF_LRU] = "l:lru", | ||
133 | [KPF_ACTIVE] = "A:active", | ||
134 | [KPF_SLAB] = "S:slab", | ||
135 | [KPF_WRITEBACK] = "W:writeback", | ||
136 | [KPF_RECLAIM] = "I:reclaim", | ||
137 | [KPF_BUDDY] = "B:buddy", | ||
138 | |||
139 | [KPF_MMAP] = "M:mmap", | ||
140 | [KPF_ANON] = "a:anonymous", | ||
141 | [KPF_SWAPCACHE] = "s:swapcache", | ||
142 | [KPF_SWAPBACKED] = "b:swapbacked", | ||
143 | [KPF_COMPOUND_HEAD] = "H:compound_head", | ||
144 | [KPF_COMPOUND_TAIL] = "T:compound_tail", | ||
145 | [KPF_HUGE] = "G:huge", | ||
146 | [KPF_UNEVICTABLE] = "u:unevictable", | ||
147 | [KPF_HWPOISON] = "X:hwpoison", | ||
148 | [KPF_NOPAGE] = "n:nopage", | ||
149 | [KPF_KSM] = "x:ksm", | ||
150 | |||
151 | [KPF_RESERVED] = "r:reserved", | ||
152 | [KPF_MLOCKED] = "m:mlocked", | ||
153 | [KPF_MAPPEDTODISK] = "d:mappedtodisk", | ||
154 | [KPF_PRIVATE] = "P:private", | ||
155 | [KPF_PRIVATE_2] = "p:private_2", | ||
156 | [KPF_OWNER_PRIVATE] = "O:owner_private", | ||
157 | [KPF_ARCH] = "h:arch", | ||
158 | [KPF_UNCACHED] = "c:uncached", | ||
159 | |||
160 | [KPF_READAHEAD] = "I:readahead", | ||
161 | [KPF_SLOB_FREE] = "P:slob_free", | ||
162 | [KPF_SLUB_FROZEN] = "A:slub_frozen", | ||
163 | [KPF_SLUB_DEBUG] = "E:slub_debug", | ||
164 | }; | ||
165 | |||
166 | |||
167 | static const char *debugfs_known_mountpoints[] = { | ||
168 | "/sys/kernel/debug", | ||
169 | "/debug", | ||
170 | 0, | ||
171 | }; | ||
172 | |||
173 | /* | ||
174 | * data structures | ||
175 | */ | ||
176 | |||
177 | static int opt_raw; /* for kernel developers */ | ||
178 | static int opt_list; /* list pages (in ranges) */ | ||
179 | static int opt_no_summary; /* don't show summary */ | ||
180 | static pid_t opt_pid; /* process to walk */ | ||
181 | |||
182 | #define MAX_ADDR_RANGES 1024 | ||
183 | static int nr_addr_ranges; | ||
184 | static unsigned long opt_offset[MAX_ADDR_RANGES]; | ||
185 | static unsigned long opt_size[MAX_ADDR_RANGES]; | ||
186 | |||
187 | #define MAX_VMAS 10240 | ||
188 | static int nr_vmas; | ||
189 | static unsigned long pg_start[MAX_VMAS]; | ||
190 | static unsigned long pg_end[MAX_VMAS]; | ||
191 | |||
192 | #define MAX_BIT_FILTERS 64 | ||
193 | static int nr_bit_filters; | ||
194 | static uint64_t opt_mask[MAX_BIT_FILTERS]; | ||
195 | static uint64_t opt_bits[MAX_BIT_FILTERS]; | ||
196 | |||
197 | static int page_size; | ||
198 | |||
199 | static int pagemap_fd; | ||
200 | static int kpageflags_fd; | ||
201 | |||
202 | static int opt_hwpoison; | ||
203 | static int opt_unpoison; | ||
204 | |||
205 | static char hwpoison_debug_fs[MAX_PATH+1]; | ||
206 | static int hwpoison_inject_fd; | ||
207 | static int hwpoison_forget_fd; | ||
208 | |||
209 | #define HASH_SHIFT 13 | ||
210 | #define HASH_SIZE (1 << HASH_SHIFT) | ||
211 | #define HASH_MASK (HASH_SIZE - 1) | ||
212 | #define HASH_KEY(flags) (flags & HASH_MASK) | ||
213 | |||
214 | static unsigned long total_pages; | ||
215 | static unsigned long nr_pages[HASH_SIZE]; | ||
216 | static uint64_t page_flags[HASH_SIZE]; | ||
217 | |||
218 | |||
219 | /* | ||
220 | * helper functions | ||
221 | */ | ||
222 | |||
223 | #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) | ||
224 | |||
225 | #define min_t(type, x, y) ({ \ | ||
226 | type __min1 = (x); \ | ||
227 | type __min2 = (y); \ | ||
228 | __min1 < __min2 ? __min1 : __min2; }) | ||
229 | |||
230 | #define max_t(type, x, y) ({ \ | ||
231 | type __max1 = (x); \ | ||
232 | type __max2 = (y); \ | ||
233 | __max1 > __max2 ? __max1 : __max2; }) | ||
234 | |||
235 | static unsigned long pages2mb(unsigned long pages) | ||
236 | { | ||
237 | return (pages * page_size) >> 20; | ||
238 | } | ||
239 | |||
240 | static void fatal(const char *x, ...) | ||
241 | { | ||
242 | va_list ap; | ||
243 | |||
244 | va_start(ap, x); | ||
245 | vfprintf(stderr, x, ap); | ||
246 | va_end(ap); | ||
247 | exit(EXIT_FAILURE); | ||
248 | } | ||
249 | |||
250 | static int checked_open(const char *pathname, int flags) | ||
251 | { | ||
252 | int fd = open(pathname, flags); | ||
253 | |||
254 | if (fd < 0) { | ||
255 | perror(pathname); | ||
256 | exit(EXIT_FAILURE); | ||
257 | } | ||
258 | |||
259 | return fd; | ||
260 | } | ||
261 | |||
262 | /* | ||
263 | * pagemap/kpageflags routines | ||
264 | */ | ||
265 | |||
266 | static unsigned long do_u64_read(int fd, char *name, | ||
267 | uint64_t *buf, | ||
268 | unsigned long index, | ||
269 | unsigned long count) | ||
270 | { | ||
271 | long bytes; | ||
272 | |||
273 | if (index > ULONG_MAX / 8) | ||
274 | fatal("index overflow: %lu\n", index); | ||
275 | |||
276 | if (lseek(fd, index * 8, SEEK_SET) < 0) { | ||
277 | perror(name); | ||
278 | exit(EXIT_FAILURE); | ||
279 | } | ||
280 | |||
281 | bytes = read(fd, buf, count * 8); | ||
282 | if (bytes < 0) { | ||
283 | perror(name); | ||
284 | exit(EXIT_FAILURE); | ||
285 | } | ||
286 | if (bytes % 8) | ||
287 | fatal("partial read: %ld bytes\n", bytes); | ||
288 | |||
289 | return bytes / 8; | ||
290 | } | ||
291 | |||
292 | static unsigned long kpageflags_read(uint64_t *buf, | ||
293 | unsigned long index, | ||
294 | unsigned long pages) | ||
295 | { | ||
296 | return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages); | ||
297 | } | ||
298 | |||
299 | static unsigned long pagemap_read(uint64_t *buf, | ||
300 | unsigned long index, | ||
301 | unsigned long pages) | ||
302 | { | ||
303 | return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages); | ||
304 | } | ||
305 | |||
306 | static unsigned long pagemap_pfn(uint64_t val) | ||
307 | { | ||
308 | unsigned long pfn; | ||
309 | |||
310 | if (val & PM_PRESENT) | ||
311 | pfn = PM_PFRAME(val); | ||
312 | else | ||
313 | pfn = 0; | ||
314 | |||
315 | return pfn; | ||
316 | } | ||
317 | |||
318 | |||
319 | /* | ||
320 | * page flag names | ||
321 | */ | ||
322 | |||
323 | static char *page_flag_name(uint64_t flags) | ||
324 | { | ||
325 | static char buf[65]; | ||
326 | int present; | ||
327 | int i, j; | ||
328 | |||
329 | for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) { | ||
330 | present = (flags >> i) & 1; | ||
331 | if (!page_flag_names[i]) { | ||
332 | if (present) | ||
333 | fatal("unknown flag bit %d\n", i); | ||
334 | continue; | ||
335 | } | ||
336 | buf[j++] = present ? page_flag_names[i][0] : '_'; | ||
337 | } | ||
338 | |||
339 | return buf; | ||
340 | } | ||
341 | |||
342 | static char *page_flag_longname(uint64_t flags) | ||
343 | { | ||
344 | static char buf[1024]; | ||
345 | int i, n; | ||
346 | |||
347 | for (i = 0, n = 0; i < ARRAY_SIZE(page_flag_names); i++) { | ||
348 | if (!page_flag_names[i]) | ||
349 | continue; | ||
350 | if ((flags >> i) & 1) | ||
351 | n += snprintf(buf + n, sizeof(buf) - n, "%s,", | ||
352 | page_flag_names[i] + 2); | ||
353 | } | ||
354 | if (n) | ||
355 | n--; | ||
356 | buf[n] = '\0'; | ||
357 | |||
358 | return buf; | ||
359 | } | ||
360 | |||
361 | |||
362 | /* | ||
363 | * page list and summary | ||
364 | */ | ||
365 | |||
366 | static void show_page_range(unsigned long voffset, | ||
367 | unsigned long offset, uint64_t flags) | ||
368 | { | ||
369 | static uint64_t flags0; | ||
370 | static unsigned long voff; | ||
371 | static unsigned long index; | ||
372 | static unsigned long count; | ||
373 | |||
374 | if (flags == flags0 && offset == index + count && | ||
375 | (!opt_pid || voffset == voff + count)) { | ||
376 | count++; | ||
377 | return; | ||
378 | } | ||
379 | |||
380 | if (count) { | ||
381 | if (opt_pid) | ||
382 | printf("%lx\t", voff); | ||
383 | printf("%lx\t%lx\t%s\n", | ||
384 | index, count, page_flag_name(flags0)); | ||
385 | } | ||
386 | |||
387 | flags0 = flags; | ||
388 | index = offset; | ||
389 | voff = voffset; | ||
390 | count = 1; | ||
391 | } | ||
392 | |||
393 | static void show_page(unsigned long voffset, | ||
394 | unsigned long offset, uint64_t flags) | ||
395 | { | ||
396 | if (opt_pid) | ||
397 | printf("%lx\t", voffset); | ||
398 | printf("%lx\t%s\n", offset, page_flag_name(flags)); | ||
399 | } | ||
400 | |||
401 | static void show_summary(void) | ||
402 | { | ||
403 | int i; | ||
404 | |||
405 | printf(" flags\tpage-count MB" | ||
406 | " symbolic-flags\t\t\tlong-symbolic-flags\n"); | ||
407 | |||
408 | for (i = 0; i < ARRAY_SIZE(nr_pages); i++) { | ||
409 | if (nr_pages[i]) | ||
410 | printf("0x%016llx\t%10lu %8lu %s\t%s\n", | ||
411 | (unsigned long long)page_flags[i], | ||
412 | nr_pages[i], | ||
413 | pages2mb(nr_pages[i]), | ||
414 | page_flag_name(page_flags[i]), | ||
415 | page_flag_longname(page_flags[i])); | ||
416 | } | ||
417 | |||
418 | printf(" total\t%10lu %8lu\n", | ||
419 | total_pages, pages2mb(total_pages)); | ||
420 | } | ||
421 | |||
422 | |||
423 | /* | ||
424 | * page flag filters | ||
425 | */ | ||
426 | |||
427 | static int bit_mask_ok(uint64_t flags) | ||
428 | { | ||
429 | int i; | ||
430 | |||
431 | for (i = 0; i < nr_bit_filters; i++) { | ||
432 | if (opt_bits[i] == KPF_ALL_BITS) { | ||
433 | if ((flags & opt_mask[i]) == 0) | ||
434 | return 0; | ||
435 | } else { | ||
436 | if ((flags & opt_mask[i]) != opt_bits[i]) | ||
437 | return 0; | ||
438 | } | ||
439 | } | ||
440 | |||
441 | return 1; | ||
442 | } | ||
443 | |||
444 | static uint64_t expand_overloaded_flags(uint64_t flags) | ||
445 | { | ||
446 | /* SLOB/SLUB overload several page flags */ | ||
447 | if (flags & BIT(SLAB)) { | ||
448 | if (flags & BIT(PRIVATE)) | ||
449 | flags ^= BIT(PRIVATE) | BIT(SLOB_FREE); | ||
450 | if (flags & BIT(ACTIVE)) | ||
451 | flags ^= BIT(ACTIVE) | BIT(SLUB_FROZEN); | ||
452 | if (flags & BIT(ERROR)) | ||
453 | flags ^= BIT(ERROR) | BIT(SLUB_DEBUG); | ||
454 | } | ||
455 | |||
456 | /* PG_reclaim is overloaded as PG_readahead in the read path */ | ||
457 | if ((flags & (BIT(RECLAIM) | BIT(WRITEBACK))) == BIT(RECLAIM)) | ||
458 | flags ^= BIT(RECLAIM) | BIT(READAHEAD); | ||
459 | |||
460 | return flags; | ||
461 | } | ||
462 | |||
463 | static uint64_t well_known_flags(uint64_t flags) | ||
464 | { | ||
465 | /* hide flags intended only for kernel hackers */ | ||
466 | flags &= ~KPF_HACKERS_BITS; | ||
467 | |||
468 | /* hide non-hugeTLB compound pages */ | ||
469 | if ((flags & BITS_COMPOUND) && !(flags & BIT(HUGE))) | ||
470 | flags &= ~BITS_COMPOUND; | ||
471 | |||
472 | return flags; | ||
473 | } | ||
474 | |||
475 | static uint64_t kpageflags_flags(uint64_t flags) | ||
476 | { | ||
477 | flags = expand_overloaded_flags(flags); | ||
478 | |||
479 | if (!opt_raw) | ||
480 | flags = well_known_flags(flags); | ||
481 | |||
482 | return flags; | ||
483 | } | ||
484 | |||
485 | /* verify that a mountpoint is actually a debugfs instance */ | ||
486 | static int debugfs_valid_mountpoint(const char *debugfs) | ||
487 | { | ||
488 | struct statfs st_fs; | ||
489 | |||
490 | if (statfs(debugfs, &st_fs) < 0) | ||
491 | return -ENOENT; | ||
492 | else if (st_fs.f_type != (long) DEBUGFS_MAGIC) | ||
493 | return -ENOENT; | ||
494 | |||
495 | return 0; | ||
496 | } | ||
497 | |||
498 | /* find the path to the mounted debugfs */ | ||
499 | static const char *debugfs_find_mountpoint(void) | ||
500 | { | ||
501 | const char **ptr; | ||
502 | char type[100] = ""; | ||
503 | FILE *fp; | ||
504 | |||
505 | ptr = debugfs_known_mountpoints; | ||
506 | while (*ptr) { | ||
507 | if (debugfs_valid_mountpoint(*ptr) == 0) { | ||
508 | strcpy(hwpoison_debug_fs, *ptr); | ||
509 | return hwpoison_debug_fs; | ||
510 | } | ||
511 | ptr++; | ||
512 | } | ||
513 | |||
514 | /* give up and parse /proc/mounts */ | ||
515 | fp = fopen("/proc/mounts", "r"); | ||
516 | if (fp == NULL) | ||
517 | return NULL; /* cannot read /proc/mounts: treat debugfs as not found */ | ||
518 | |||
519 | while (fscanf(fp, "%*s %" | ||
520 | STR(MAX_PATH) | ||
521 | "s %99s %*s %*d %*d\n", | ||
522 | hwpoison_debug_fs, type) == 2) { | ||
523 | if (strcmp(type, "debugfs") == 0) | ||
524 | break; | ||
525 | } | ||
526 | fclose(fp); | ||
527 | |||
528 | if (strcmp(type, "debugfs") != 0) | ||
529 | return NULL; | ||
530 | |||
531 | return hwpoison_debug_fs; | ||
532 | } | ||
533 | |||
534 | /* mount the debugfs somewhere if it's not mounted */ | ||
535 | |||
536 | static void debugfs_mount(void) | ||
537 | { | ||
538 | const char **ptr; | ||
539 | |||
540 | /* see if it's already mounted */ | ||
541 | if (debugfs_find_mountpoint()) | ||
542 | return; | ||
543 | |||
544 | ptr = debugfs_known_mountpoints; | ||
545 | while (*ptr) { | ||
546 | if (mount(NULL, *ptr, "debugfs", 0, NULL) == 0) { | ||
547 | /* save the mountpoint */ | ||
548 | strcpy(hwpoison_debug_fs, *ptr); | ||
549 | break; | ||
550 | } | ||
551 | ptr++; | ||
552 | } | ||
553 | |||
554 | if (*ptr == NULL) { | ||
555 | perror("mount debugfs"); | ||
556 | exit(EXIT_FAILURE); | ||
557 | } | ||
558 | } | ||
559 | |||
560 | /* | ||
561 | * page actions | ||
562 | */ | ||
563 | |||
564 | static void prepare_hwpoison_fd(void) | ||
565 | { | ||
566 | char buf[MAX_PATH + 1]; | ||
567 | |||
568 | debugfs_mount(); | ||
569 | |||
570 | if (opt_hwpoison && !hwpoison_inject_fd) { | ||
571 | snprintf(buf, MAX_PATH, "%s/hwpoison/corrupt-pfn", | ||
572 | hwpoison_debug_fs); | ||
573 | hwpoison_inject_fd = checked_open(buf, O_WRONLY); | ||
574 | } | ||
575 | |||
576 | if (opt_unpoison && !hwpoison_forget_fd) { | ||
577 | snprintf(buf, MAX_PATH, "%s/hwpoison/unpoison-pfn", | ||
578 | hwpoison_debug_fs); | ||
579 | hwpoison_forget_fd = checked_open(buf, O_WRONLY); | ||
580 | } | ||
581 | } | ||
582 | |||
583 | static int hwpoison_page(unsigned long offset) | ||
584 | { | ||
585 | char buf[100]; | ||
586 | int len; | ||
587 | |||
588 | len = sprintf(buf, "0x%lx\n", offset); | ||
589 | len = write(hwpoison_inject_fd, buf, len); | ||
590 | if (len < 0) { | ||
591 | perror("hwpoison inject"); | ||
592 | return len; | ||
593 | } | ||
594 | return 0; | ||
595 | } | ||
596 | |||
597 | static int unpoison_page(unsigned long offset) | ||
598 | { | ||
599 | char buf[100]; | ||
600 | int len; | ||
601 | |||
602 | len = sprintf(buf, "0x%lx\n", offset); | ||
603 | len = write(hwpoison_forget_fd, buf, len); | ||
604 | if (len < 0) { | ||
605 | perror("hwpoison forget"); | ||
606 | return len; | ||
607 | } | ||
608 | return 0; | ||
609 | } | ||
610 | |||
611 | /* | ||
612 | * page frame walker | ||
613 | */ | ||
614 | |||
615 | static int hash_slot(uint64_t flags) | ||
616 | { | ||
617 | int k = HASH_KEY(flags); | ||
618 | int i; | ||
619 | |||
620 | /* Explicitly reserve slot 0 for flags 0: the following logic | ||
621 | * cannot distinguish an unoccupied slot from slot (flags==0). | ||
622 | */ | ||
623 | if (flags == 0) | ||
624 | return 0; | ||
625 | |||
626 | /* search through the remaining (HASH_SIZE-1) slots */ | ||
627 | for (i = 1; i < ARRAY_SIZE(page_flags); i++, k++) { | ||
628 | if (!k || k >= ARRAY_SIZE(page_flags)) | ||
629 | k = 1; | ||
630 | if (page_flags[k] == 0) { | ||
631 | page_flags[k] = flags; | ||
632 | return k; | ||
633 | } | ||
634 | if (page_flags[k] == flags) | ||
635 | return k; | ||
636 | } | ||
637 | |||
638 | fatal("hash table full: bump up HASH_SHIFT?\n"); | ||
639 | exit(EXIT_FAILURE); | ||
640 | } | ||
641 | |||
642 | static void add_page(unsigned long voffset, | ||
643 | unsigned long offset, uint64_t flags) | ||
644 | { | ||
645 | flags = kpageflags_flags(flags); | ||
646 | |||
647 | if (!bit_mask_ok(flags)) | ||
648 | return; | ||
649 | |||
650 | if (opt_hwpoison) | ||
651 | hwpoison_page(offset); | ||
652 | if (opt_unpoison) | ||
653 | unpoison_page(offset); | ||
654 | |||
655 | if (opt_list == 1) | ||
656 | show_page_range(voffset, offset, flags); | ||
657 | else if (opt_list == 2) | ||
658 | show_page(voffset, offset, flags); | ||
659 | |||
660 | nr_pages[hash_slot(flags)]++; | ||
661 | total_pages++; | ||
662 | } | ||
663 | |||
664 | #define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */ | ||
665 | static void walk_pfn(unsigned long voffset, | ||
666 | unsigned long index, | ||
667 | unsigned long count) | ||
668 | { | ||
669 | uint64_t buf[KPAGEFLAGS_BATCH]; | ||
670 | unsigned long batch; | ||
671 | long pages; | ||
672 | unsigned long i; | ||
673 | |||
674 | while (count) { | ||
675 | batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH); | ||
676 | pages = kpageflags_read(buf, index, batch); | ||
677 | if (pages == 0) | ||
678 | break; | ||
679 | |||
680 | for (i = 0; i < pages; i++) | ||
681 | add_page(voffset + i, index + i, buf[i]); | ||
682 | |||
683 | index += pages; | ||
684 | count -= pages; | ||
685 | } | ||
686 | } | ||
687 | |||
688 | #define PAGEMAP_BATCH (64 << 10) | ||
689 | static void walk_vma(unsigned long index, unsigned long count) | ||
690 | { | ||
691 | uint64_t buf[PAGEMAP_BATCH]; | ||
692 | unsigned long batch; | ||
693 | unsigned long pages; | ||
694 | unsigned long pfn; | ||
695 | unsigned long i; | ||
696 | |||
697 | while (count) { | ||
698 | batch = min_t(unsigned long, count, PAGEMAP_BATCH); | ||
699 | pages = pagemap_read(buf, index, batch); | ||
700 | if (pages == 0) | ||
701 | break; | ||
702 | |||
703 | for (i = 0; i < pages; i++) { | ||
704 | pfn = pagemap_pfn(buf[i]); | ||
705 | if (pfn) | ||
706 | walk_pfn(index + i, pfn, 1); | ||
707 | } | ||
708 | |||
709 | index += pages; | ||
710 | count -= pages; | ||
711 | } | ||
712 | } | ||
713 | |||
714 | static void walk_task(unsigned long index, unsigned long count) | ||
715 | { | ||
716 | const unsigned long end = index + count; | ||
717 | unsigned long start; | ||
718 | int i = 0; | ||
719 | |||
720 | while (index < end) { | ||
721 | |||
722 | while (pg_end[i] <= index) | ||
723 | if (++i >= nr_vmas) | ||
724 | return; | ||
725 | if (pg_start[i] >= end) | ||
726 | return; | ||
727 | |||
728 | start = max_t(unsigned long, pg_start[i], index); | ||
729 | index = min_t(unsigned long, pg_end[i], end); | ||
730 | |||
731 | assert(start < index); | ||
732 | walk_vma(start, index - start); | ||
733 | } | ||
734 | } | ||
735 | |||
736 | static void add_addr_range(unsigned long offset, unsigned long size) | ||
737 | { | ||
738 | if (nr_addr_ranges >= MAX_ADDR_RANGES) | ||
739 | fatal("too many addr ranges\n"); | ||
740 | |||
741 | opt_offset[nr_addr_ranges] = offset; | ||
742 | opt_size[nr_addr_ranges] = min_t(unsigned long, size, ULONG_MAX-offset); | ||
743 | nr_addr_ranges++; | ||
744 | } | ||
745 | |||
746 | static void walk_addr_ranges(void) | ||
747 | { | ||
748 | int i; | ||
749 | |||
750 | kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY); | ||
751 | |||
752 | if (!nr_addr_ranges) | ||
753 | add_addr_range(0, ULONG_MAX); | ||
754 | |||
755 | for (i = 0; i < nr_addr_ranges; i++) | ||
756 | if (!opt_pid) | ||
757 | walk_pfn(0, opt_offset[i], opt_size[i]); | ||
758 | else | ||
759 | walk_task(opt_offset[i], opt_size[i]); | ||
760 | |||
761 | close(kpageflags_fd); | ||
762 | } | ||
763 | |||
764 | |||
765 | /* | ||
766 | * user interface | ||
767 | */ | ||
768 | |||
769 | static const char *page_flag_type(uint64_t flag) | ||
770 | { | ||
771 | if (flag & KPF_HACKERS_BITS) | ||
772 | return "(r)"; | ||
773 | if (flag & KPF_OVERLOADED_BITS) | ||
774 | return "(o)"; | ||
775 | return " "; | ||
776 | } | ||
777 | |||
778 | static void usage(void) | ||
779 | { | ||
780 | int i, j; | ||
781 | |||
782 | printf( | ||
783 | "page-types [options]\n" | ||
784 | " -r|--raw Raw mode, for kernel developers\n" | ||
785 | " -d|--describe flags Describe flags\n" | ||
786 | " -a|--addr addr-spec Walk a range of pages\n" | ||
787 | " -b|--bits bits-spec Walk pages with specified bits\n" | ||
788 | " -p|--pid pid Walk process address space\n" | ||
789 | #if 0 /* planned features */ | ||
790 | " -f|--file filename Walk file address space\n" | ||
791 | #endif | ||
792 | " -l|--list Show page details in ranges\n" | ||
793 | " -L|--list-each Show page details one by one\n" | ||
794 | " -N|--no-summary Don't show summary info\n" | ||
795 | " -X|--hwpoison hwpoison pages\n" | ||
796 | " -x|--unpoison unpoison pages\n" | ||
797 | " -h|--help Show this usage message\n" | ||
798 | "flags:\n" | ||
799 | " 0x10 bitfield format, e.g.\n" | ||
800 | " anon bit-name, e.g.\n" | ||
801 | " 0x10,anon comma-separated list, e.g.\n" | ||
802 | "addr-spec:\n" | ||
803 | " N one page at offset N (unit: pages)\n" | ||
804 | " N+M pages range from N to N+M-1\n" | ||
805 | " N,M pages range from N to M-1\n" | ||
806 | " N, pages range from N to end\n" | ||
807 | " ,M pages range from 0 to M-1\n" | ||
808 | "bits-spec:\n" | ||
809 | " bit1,bit2 (flags & (bit1|bit2)) != 0\n" | ||
810 | " bit1,bit2=bit1 (flags & (bit1|bit2)) == bit1\n" | ||
811 | " bit1,~bit2 (flags & (bit1|bit2)) == bit1\n" | ||
812 | " =bit1,bit2 flags == (bit1|bit2)\n" | ||
813 | "bit-names:\n" | ||
814 | ); | ||
815 | |||
816 | for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) { | ||
817 | if (!page_flag_names[i]) | ||
818 | continue; | ||
819 | printf("%16s%s", page_flag_names[i] + 2, | ||
820 | page_flag_type(1ULL << i)); | ||
821 | if (++j > 3) { | ||
822 | j = 0; | ||
823 | putchar('\n'); | ||
824 | } | ||
825 | } | ||
826 | printf("\n " | ||
827 | "(r) raw mode bits (o) overloaded bits\n"); | ||
828 | } | ||
829 | |||
830 | static unsigned long long parse_number(const char *str) | ||
831 | { | ||
832 | unsigned long long n; | ||
833 | |||
834 | n = strtoull(str, NULL, 0); | ||
835 | |||
836 | if (n == 0 && str[0] != '0') | ||
837 | fatal("invalid name or number: %s\n", str); | ||
838 | |||
839 | return n; | ||
840 | } | ||
841 | |||
842 | static void parse_pid(const char *str) | ||
843 | { | ||
844 | FILE *file; | ||
845 | char buf[5000]; | ||
846 | |||
847 | opt_pid = parse_number(str); | ||
848 | |||
849 | sprintf(buf, "/proc/%d/pagemap", opt_pid); | ||
850 | pagemap_fd = checked_open(buf, O_RDONLY); | ||
851 | |||
852 | sprintf(buf, "/proc/%d/maps", opt_pid); | ||
853 | file = fopen(buf, "r"); | ||
854 | if (!file) { | ||
855 | perror(buf); | ||
856 | exit(EXIT_FAILURE); | ||
857 | } | ||
858 | |||
859 | while (fgets(buf, sizeof(buf), file) != NULL) { | ||
860 | unsigned long vm_start; | ||
861 | unsigned long vm_end; | ||
862 | unsigned long long pgoff; | ||
863 | int major, minor; | ||
864 | char r, w, x, s; | ||
865 | unsigned long ino; | ||
866 | int n; | ||
867 | |||
868 | n = sscanf(buf, "%lx-%lx %c%c%c%c %llx %x:%x %lu", | ||
869 | &vm_start, | ||
870 | &vm_end, | ||
871 | &r, &w, &x, &s, | ||
872 | &pgoff, | ||
873 | &major, &minor, | ||
874 | &ino); | ||
875 | if (n < 10) { | ||
876 | fprintf(stderr, "unexpected line: %s\n", buf); | ||
877 | continue; | ||
878 | } | ||
879 | pg_start[nr_vmas] = vm_start / page_size; | ||
880 | pg_end[nr_vmas] = vm_end / page_size; | ||
881 | if (++nr_vmas >= MAX_VMAS) { | ||
882 | fprintf(stderr, "too many VMAs\n"); | ||
883 | break; | ||
884 | } | ||
885 | } | ||
886 | fclose(file); | ||
887 | } | ||
888 | |||
889 | static void parse_file(const char *name) | ||
890 | { | ||
891 | } | ||
892 | |||
893 | static void parse_addr_range(const char *optarg) | ||
894 | { | ||
895 | unsigned long offset; | ||
896 | unsigned long size; | ||
897 | char *p; | ||
898 | |||
899 | p = strchr(optarg, ','); | ||
900 | if (!p) | ||
901 | p = strchr(optarg, '+'); | ||
902 | |||
903 | if (p == optarg) { | ||
904 | offset = 0; | ||
905 | size = parse_number(p + 1); | ||
906 | } else if (p) { | ||
907 | offset = parse_number(optarg); | ||
908 | if (p[1] == '\0') | ||
909 | size = ULONG_MAX; | ||
910 | else { | ||
911 | size = parse_number(p + 1); | ||
912 | if (*p == ',') { | ||
913 | if (size < offset) | ||
914 | fatal("invalid range: %lu,%lu\n", | ||
915 | offset, size); | ||
916 | size -= offset; | ||
917 | } | ||
918 | } | ||
919 | } else { | ||
920 | offset = parse_number(optarg); | ||
921 | size = 1; | ||
922 | } | ||
923 | |||
924 | add_addr_range(offset, size); | ||
925 | } | ||
926 | |||
927 | static void add_bits_filter(uint64_t mask, uint64_t bits) | ||
928 | { | ||
929 | if (nr_bit_filters >= MAX_BIT_FILTERS) | ||
930 | fatal("too many bit filters\n"); | ||
931 | |||
932 | opt_mask[nr_bit_filters] = mask; | ||
933 | opt_bits[nr_bit_filters] = bits; | ||
934 | nr_bit_filters++; | ||
935 | } | ||
936 | |||
937 | static uint64_t parse_flag_name(const char *str, int len) | ||
938 | { | ||
939 | int i; | ||
940 | |||
941 | if (!*str || !len) | ||
942 | return 0; | ||
943 | |||
944 | if (len <= 8 && !strncmp(str, "compound", len)) | ||
945 | return BITS_COMPOUND; | ||
946 | |||
947 | for (i = 0; i < ARRAY_SIZE(page_flag_names); i++) { | ||
948 | if (!page_flag_names[i]) | ||
949 | continue; | ||
950 | if (!strncmp(str, page_flag_names[i] + 2, len)) | ||
951 | return 1ULL << i; | ||
952 | } | ||
953 | |||
954 | return parse_number(str); | ||
955 | } | ||
956 | |||
957 | static uint64_t parse_flag_names(const char *str, int all) | ||
958 | { | ||
959 | const char *p = str; | ||
960 | uint64_t flags = 0; | ||
961 | |||
962 | while (1) { | ||
963 | if (*p == ',' || *p == '=' || *p == '\0') { | ||
964 | if ((*str != '~') || (*str == '~' && all && *++str)) | ||
965 | flags |= parse_flag_name(str, p - str); | ||
966 | if (*p != ',') | ||
967 | break; | ||
968 | str = p + 1; | ||
969 | } | ||
970 | p++; | ||
971 | } | ||
972 | |||
973 | return flags; | ||
974 | } | ||
975 | |||
976 | static void parse_bits_mask(const char *optarg) | ||
977 | { | ||
978 | uint64_t mask; | ||
979 | uint64_t bits; | ||
980 | const char *p; | ||
981 | |||
982 | p = strchr(optarg, '='); | ||
983 | if (p == optarg) { | ||
984 | mask = KPF_ALL_BITS; | ||
985 | bits = parse_flag_names(p + 1, 0); | ||
986 | } else if (p) { | ||
987 | mask = parse_flag_names(optarg, 0); | ||
988 | bits = parse_flag_names(p + 1, 0); | ||
989 | } else if (strchr(optarg, '~')) { | ||
990 | mask = parse_flag_names(optarg, 1); | ||
991 | bits = parse_flag_names(optarg, 0); | ||
992 | } else { | ||
993 | mask = parse_flag_names(optarg, 0); | ||
994 | bits = KPF_ALL_BITS; | ||
995 | } | ||
996 | |||
997 | add_bits_filter(mask, bits); | ||
998 | } | ||
999 | |||
1000 | static void describe_flags(const char *optarg) | ||
1001 | { | ||
1002 | uint64_t flags = parse_flag_names(optarg, 0); | ||
1003 | |||
1004 | printf("0x%016llx\t%s\t%s\n", | ||
1005 | (unsigned long long)flags, | ||
1006 | page_flag_name(flags), | ||
1007 | page_flag_longname(flags)); | ||
1008 | } | ||
1009 | |||
1010 | static const struct option opts[] = { | ||
1011 | { "raw" , 0, NULL, 'r' }, | ||
1012 | { "pid" , 1, NULL, 'p' }, | ||
1013 | { "file" , 1, NULL, 'f' }, | ||
1014 | { "addr" , 1, NULL, 'a' }, | ||
1015 | { "bits" , 1, NULL, 'b' }, | ||
1016 | { "describe" , 1, NULL, 'd' }, | ||
1017 | { "list" , 0, NULL, 'l' }, | ||
1018 | { "list-each" , 0, NULL, 'L' }, | ||
1019 | { "no-summary", 0, NULL, 'N' }, | ||
1020 | { "hwpoison" , 0, NULL, 'X' }, | ||
1021 | { "unpoison" , 0, NULL, 'x' }, | ||
1022 | { "help" , 0, NULL, 'h' }, | ||
1023 | { NULL , 0, NULL, 0 } | ||
1024 | }; | ||
1025 | |||
1026 | int main(int argc, char *argv[]) | ||
1027 | { | ||
1028 | int c; | ||
1029 | |||
1030 | page_size = getpagesize(); | ||
1031 | |||
1032 | while ((c = getopt_long(argc, argv, | ||
1033 | "rp:f:a:b:d:lLNXxh", opts, NULL)) != -1) { | ||
1034 | switch (c) { | ||
1035 | case 'r': | ||
1036 | opt_raw = 1; | ||
1037 | break; | ||
1038 | case 'p': | ||
1039 | parse_pid(optarg); | ||
1040 | break; | ||
1041 | case 'f': | ||
1042 | parse_file(optarg); | ||
1043 | break; | ||
1044 | case 'a': | ||
1045 | parse_addr_range(optarg); | ||
1046 | break; | ||
1047 | case 'b': | ||
1048 | parse_bits_mask(optarg); | ||
1049 | break; | ||
1050 | case 'd': | ||
1051 | describe_flags(optarg); | ||
1052 | exit(0); | ||
1053 | case 'l': | ||
1054 | opt_list = 1; | ||
1055 | break; | ||
1056 | case 'L': | ||
1057 | opt_list = 2; | ||
1058 | break; | ||
1059 | case 'N': | ||
1060 | opt_no_summary = 1; | ||
1061 | break; | ||
1062 | case 'X': | ||
1063 | opt_hwpoison = 1; | ||
1064 | prepare_hwpoison_fd(); | ||
1065 | break; | ||
1066 | case 'x': | ||
1067 | opt_unpoison = 1; | ||
1068 | prepare_hwpoison_fd(); | ||
1069 | break; | ||
1070 | case 'h': | ||
1071 | usage(); | ||
1072 | exit(0); | ||
1073 | default: | ||
1074 | usage(); | ||
1075 | exit(1); | ||
1076 | } | ||
1077 | } | ||
1078 | |||
1079 | if (opt_list && opt_pid) | ||
1080 | printf("voffset\t"); | ||
1081 | if (opt_list == 1) | ||
1082 | printf("offset\tlen\tflags\n"); | ||
1083 | if (opt_list == 2) | ||
1084 | printf("offset\tflags\n"); | ||
1085 | |||
1086 | walk_addr_ranges(); | ||
1087 | |||
1088 | if (opt_list == 1) | ||
1089 | show_page_range(0, 0, 0); /* drain the buffer */ | ||
1090 | |||
1091 | if (opt_no_summary) | ||
1092 | return 0; | ||
1093 | |||
1094 | if (opt_list) | ||
1095 | printf("\n\n"); | ||
1096 | |||
1097 | show_summary(); | ||
1098 | |||
1099 | return 0; | ||
1100 | } | ||
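The four argument shapes accepted by parse_bits_mask() above ("=bits", "mask=bits", "a,~b" and "a,b") can be sketched in standalone form. This is a minimal illustration, not the tool's code: the three-entry flag table and the names flag_from_name, flags_from_list, make_filter, filter_matches and demo are hypothetical, and the matching rule in filter_matches is an assumption about how a (mask, bits) pair is applied later. Only the construction of the pair follows the function shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ALL_BITS (~(uint64_t)0)	/* stands in for KPF_ALL_BITS */

/* Hypothetical flag table; the real tool uses page_flag_names[]. */
static const char *const names[] = { "locked", "error", "referenced" };

struct bit_filter { uint64_t mask, bits; };

/* Prefix-match one name of length len; unknown names yield 0 here. */
static uint64_t flag_from_name(const char *s, size_t len)
{
	size_t i;

	if (len == 0)
		return 0;
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (!strncmp(s, names[i], len))
			return 1ULL << i;
	return 0;
}

/* OR together a comma-separated list; "~name" counts only if asked for. */
static uint64_t flags_from_list(const char *s, int include_negated)
{
	uint64_t flags = 0;
	const char *tok = s;
	const char *p;

	for (p = s; ; p++) {
		if (*p != ',' && *p != '=' && *p != '\0')
			continue;
		if (*tok != '~')
			flags |= flag_from_name(tok, (size_t)(p - tok));
		else if (include_negated)
			flags |= flag_from_name(tok + 1, (size_t)(p - tok - 1));
		if (*p != ',')
			break;
		tok = p + 1;
	}
	return flags;
}

/* Build (mask, bits) the same way parse_bits_mask() does. */
static struct bit_filter make_filter(const char *arg)
{
	struct bit_filter f;
	const char *eq = strchr(arg, '=');

	if (eq == arg) {			/* "=bits": exact match on all bits */
		f.mask = ALL_BITS;
		f.bits = flags_from_list(eq + 1, 0);
	} else if (eq) {			/* "mask=bits" */
		f.mask = flags_from_list(arg, 0);
		f.bits = flags_from_list(eq + 1, 0);
	} else if (strchr(arg, '~')) {		/* "a,~b": a set, b clear */
		f.mask = flags_from_list(arg, 1);
		f.bits = flags_from_list(arg, 0);
	} else {				/* "a,b": ALL_BITS marks "any of" */
		f.mask = flags_from_list(arg, 0);
		f.bits = ALL_BITS;
	}
	return f;
}

/* Assumed matching rule: ALL_BITS in bits means "any masked bit is set". */
static int filter_matches(const struct bit_filter *f, uint64_t flags)
{
	if (f->bits == ALL_BITS)
		return (flags & f->mask) != 0;
	return (flags & f->mask) == f->bits;
}

void demo(void)
{
	struct bit_filter f = make_filter("locked,~error");

	assert(f.mask == 0x3 && f.bits == 0x1);
	assert(filter_matches(&f, 0x1));	/* locked set, error clear */
	assert(!filter_matches(&f, 0x3));	/* error is also set */
}
```

Under these assumptions, an argument such as "locked,~error" puts both names into the mask but only the non-negated one into bits, which is why the negated form needs two passes over the same string.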
diff --git a/Documentation/watchdog/00-INDEX b/Documentation/watchdog/00-INDEX new file mode 100644 index 00000000000..fc51128071c --- /dev/null +++ b/Documentation/watchdog/00-INDEX | |||
@@ -0,0 +1,17 @@ | |||
1 | 00-INDEX | ||
2 | - this file. | ||
3 | hpwdt.txt | ||
4 | - information on the HP iLO2 NMI watchdog | ||
5 | pcwd-watchdog.txt | ||
6 | - documentation for Berkshire Products PC Watchdog ISA cards. | ||
7 | src/ | ||
8 | - directory holding watchdog related example programs. | ||
9 | watchdog-api.txt | ||
10 | - description of the Linux Watchdog driver API. | ||
11 | watchdog-kernel-api.txt | ||
12 | - description of the Linux WatchDog Timer Driver Core kernel API. | ||
13 | watchdog-parameters.txt | ||
14 | - information on driver parameters (for drivers other than | ||
15 | the ones that have driver-specific files here) | ||
16 | wdt.txt | ||
17 | - description of the Watchdog Timer Interfaces for Linux. | ||
diff --git a/Documentation/zh_CN/SubmitChecklist b/Documentation/zh_CN/SubmitChecklist new file mode 100644 index 00000000000..4c741d6bc04 --- /dev/null +++ b/Documentation/zh_CN/SubmitChecklist | |||
@@ -0,0 +1,109 @@ | |||
1 | Chinese translated version of Documentation/SubmitChecklist | ||
2 | |||
3 | If you have any comment or update to the content, please contact the | ||
4 | original document maintainer directly. However, if you have a problem | ||
5 | communicating in English you can also ask the Chinese maintainer for | ||
6 | help. Contact the Chinese maintainer if this translation is outdated | ||
7 | or if there is a problem with the translation. | ||
8 | |||
9 | Chinese maintainer: Harry Wei <harryxiyou@gmail.com> | ||
10 | --------------------------------------------------------------------- | ||
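11 | Chinese translation of Documentation/SubmitChecklist | ||
12 | |||
13 | To comment on or update this document, please contact the maintainer of the original document | ||
14 | directly. If you have difficulty communicating in English, you can also ask the Chinese maintainer | ||
15 | for help. If this translation is out of date or has problems, please contact the Chinese maintainer. | ||
16 | |||
17 | Chinese maintainer: 贾威威 Harry Wei <harryxiyou@gmail.com> | ||
18 | Chinese translator: 贾威威 Harry Wei <harryxiyou@gmail.com> | ||
19 | Chinese proofreader: 贾威威 Harry Wei <harryxiyou@gmail.com> | ||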
20 | |||
21 | |||
22 | The text follows | ||
23 | --------------------------------------------------------------------- | ||
24 | Linux kernel patch submission checklist | ||
25 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
26 | |||
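27 | Here are some basic things that kernel developers should do if they want to see their | ||
28 | kernel patch submissions accepted more quickly. | ||
29 | |||
30 | These are all above and beyond what is provided in Documentation/SubmittingPatches and | ||
31 | in other instructions on submitting Linux kernel patches. | ||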
32 | |||
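33 | 1: If you use a facility, then #include the file that defines/declares that facility. | ||
34 | Don't depend on other header files pulling in the ones that you use. | ||
35 | |||
36 | 2: Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. | ||
37 | No compiler warnings/errors, no linker warnings/errors. | ||
38 | |||
39 | 2b: Passes allnoconfig, allmodconfig | ||
40 | |||
41 | 2c: Builds successfully when using O=builddir | ||
42 | |||
43 | 3: Builds on multiple CPU architectures by using local cross-compile tools or some other build farm. | ||
44 | |||
45 | 4: ppc64 is a good architecture for cross-compilation checking because it tends to use | ||
46 | `unsigned long' for 64-bit quantities. | ||
47 | |||
48 | 5: Check your patch for general style as detailed in Documentation/CodingStyle. | ||
49 | Check for trivial violations with the patch style checker (scripts/checkpatch.pl) prior to submission. | ||
50 | You should be able to justify all violations that remain in your patch. | ||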
51 | |||
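52 | 6: Any new or modified CONFIG options don't muck up the config menu. | ||
53 | |||
54 | 7: All new Kconfig options have help text. | ||
55 | |||
56 | 8: Has been carefully reviewed with respect to relevant Kconfig combinations. This is very hard to get right with testing -- brainpower pays off here. | ||
57 | |||
58 | 9: Check cleanly with sparse. | ||
59 | |||
60 | 10: Use 'make checkstack' and 'make namespacecheck' and fix any problems that they find. | ||
61 | Note: checkstack does not point out problems explicitly, but any one function that uses more than 512 bytes | ||
62 | on the stack is a candidate for change. | ||
63 | |||
64 | 11: Include kernel-doc to document global kernel APIs. (Not required for static functions, but OK there also.) | ||
65 | Use 'make htmldocs' or 'make mandocs' to check the kernel-doc and fix any | ||
66 | issues found. | ||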
67 | |||
68 | 12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, | ||
69 | CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, | ||
70 | CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP all simultaneously | ||
71 | enabled. | ||
72 | |||
73 | 13: Has been build- and runtime tested with and without CONFIG_SMP and CONFIG_PREEMPT. | ||
74 | |||
75 | 14: If the patch affects IO/Disk, etc.: has been tested with and without CONFIG_LBDAF. | ||
76 | |||
77 | 15: All codepaths have been exercised with all lockdep features enabled. | ||
78 | |||
79 | 16: All new /proc entries are documented under Documentation/. | ||
80 | |||
81 | 17: All new kernel boot parameters are documented in Documentation/kernel-parameters.txt. | ||
82 | |||
83 | 18: All new module parameters are documented with MODULE_PARM_DESC(). | ||
84 | |||
85 | 19: All new userspace interfaces are documented in Documentation/ABI/. See Documentation/ABI/README | ||
86 | for more information. Patches that change userspace interfaces should be CCed to linux-api@vger.kernel.org. | ||
87 | |||
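88 | 20: Check that it all passes `make headers_check'. | ||
89 | |||
90 | 21: Has been checked with injection of at least slab and page-allocation failures. See Documentation/fault-injection/. | ||
91 | |||
92 | 22: Newly added code has been compiled with `gcc -W' (use "make EXTRA_CFLAGS=-W"). This will generate lots of noise, | ||
93 | but is good for finding bugs like "warning: comparison between signed and unsigned". | ||
94 | |||
95 | 23: Tested after it has been merged into the -mm patchset to make sure that it still works with all of the other queued patches and | ||
96 | with various changes in the VM, VFS, and other subsystems. | ||
97 | |||
98 | 24: All memory barriers {e.g., barrier(), rmb(), wmb()} need a comment in the source code that explains the logic of what they are doing | ||
99 | and why. | ||
100 | |||
101 | 25: If any ioctls are added by the patch, then also update Documentation/ioctl/ioctl-number.txt. | ||
102 | |||
103 | 26: If your modified source code depends on or uses any of the kernel APIs or features that are related to the following kconfig symbols, | ||
104 | then test multiple builds with the related kconfig symbols disabled and/or =m (if that option is available) [not all of these at the same time, | ||
105 | just various/random combinations of them]: | ||
106 | |||
107 | CONFIG_SMP, CONFIG_SYSFS, CONFIG_PROC_FS, CONFIG_INPUT, CONFIG_PCI, | ||
108 | CONFIG_BLOCK, CONFIG_PM, CONFIG_HOTPLUG, CONFIG_MAGIC_SYSRQ, | ||
109 | CONFIG_NET, CONFIG_INET=n (but the latter with CONFIG_NET=y) | ||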
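
The warning quoted in item 22 points at a real class of bug, which a short userspace sketch can make concrete. The function names below are hypothetical and the code is not taken from the kernel. The explicit cast in naive_less() spells out the conversion that the compiler applies implicitly when an int is compared with an unsigned int, so the sketch itself should compile without triggering the warning.

```c
#include <assert.h>

/*
 * What "a < b" means when a is int and b is unsigned int: a is first
 * converted to unsigned int, so a negative a becomes a very large value.
 * The cast only makes that implicit step visible.
 */
static int naive_less(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Handle the negative case before comparing in the unsigned domain. */
static int safe_less(int a, unsigned int b)
{
	return a < 0 || (unsigned int)a < b;
}

void demo_sign_compare(void)
{
	assert(!naive_less(-1, 1u));	/* -1 wraps to UINT_MAX, so "less" is false */
	assert(safe_less(-1, 1u));	/* the mathematically expected answer */
	assert(naive_less(0, 1u) && safe_less(0, 1u));
}
```

Both functions agree for non-negative inputs; they differ only when the signed operand is negative, which is the case the compiler warning is meant to flag.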