author: Jonathan Herman <hermanjl@cs.unc.edu> 2013-01-22 10:38:37 -0500
committer: Jonathan Herman <hermanjl@cs.unc.edu> 2013-01-22 10:38:37 -0500
commit: fcc9d2e5a6c89d22b8b773a64fb4ad21ac318446
tree: a57612d1888735a2ec7972891b68c1ac5ec8faea /Documentation
parent: 8dea78da5cee153b8af9c07a2745f6c55057fe12

Added missing tegra files.
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/obsolete/proc-pid-oom_adj | 22
-rw-r--r-- Documentation/ABI/testing/sysfs-devices-node | 7
-rw-r--r-- Documentation/ABI/testing/sysfs-kernel-mm-cleancache | 11
-rw-r--r-- Documentation/ABI/testing/sysfs-wacom | 10
-rw-r--r-- Documentation/DocBook/mcabook.tmpl | 107
-rw-r--r-- Documentation/android.txt | 121
-rw-r--r-- Documentation/aoe/mkdevs.sh | 41
-rw-r--r-- Documentation/aoe/mkshelf.sh | 28
-rw-r--r-- Documentation/arm/IXP2000 | 69
-rw-r--r-- Documentation/arm/nvidia/tegra_parameters.txt | 169
-rw-r--r-- Documentation/devicetree/bindings/gpio/gpio_nvidia.txt | 8
-rw-r--r-- Documentation/devicetree/bindings/gpio/led.txt | 58
-rw-r--r-- Documentation/devicetree/bindings/i2c/arm-versatile.txt | 10
-rw-r--r-- Documentation/devicetree/bindings/i2c/ce4100-i2c.txt | 93
-rw-r--r-- Documentation/devicetree/bindings/i2c/fsl-i2c.txt | 64
-rw-r--r-- Documentation/devicetree/bindings/spi/spi_nvidia.txt | 5
-rw-r--r-- Documentation/feature-removal-schedule.txt | 602
-rw-r--r-- Documentation/i2c/muxes/gpio-i2cmux | 65
-rw-r--r-- Documentation/mca.txt | 313
-rw-r--r-- Documentation/memory.txt | 33
-rw-r--r-- Documentation/networking/3c359.txt | 58
-rw-r--r-- Documentation/networking/olympic.txt | 79
-rw-r--r-- Documentation/networking/smctr.txt | 66
-rw-r--r-- Documentation/networking/tms380tr.txt | 147
-rw-r--r-- Documentation/nmi_watchdog.txt | 83
-rw-r--r-- Documentation/powerpc/phyp-assisted-dump.txt | 127
-rw-r--r-- Documentation/prio_tree.txt | 107
-rw-r--r-- Documentation/scsi/ibmmca.txt | 1402
-rw-r--r-- Documentation/serial/computone.txt | 522
-rw-r--r-- Documentation/sparc/README-2.5 | 46
-rw-r--r-- Documentation/telephony/00-INDEX | 4
-rw-r--r-- Documentation/telephony/ixj.txt | 394
-rw-r--r-- Documentation/trace/tracedump.txt | 58
-rw-r--r-- Documentation/trace/tracelevel.txt | 42
-rw-r--r-- Documentation/video/tegra_dc_ext.txt | 83
-rw-r--r-- Documentation/virtual/lguest/Makefile | 8
-rw-r--r-- Documentation/virtual/lguest/extract | 58
-rw-r--r-- Documentation/virtual/lguest/lguest.c | 2065
-rw-r--r-- Documentation/virtual/lguest/lguest.txt | 129
-rw-r--r-- Documentation/vm/Makefile | 8
-rw-r--r-- Documentation/vm/hugepage-mmap.c | 91
-rw-r--r-- Documentation/vm/hugepage-shm.c | 98
-rw-r--r-- Documentation/vm/map_hugetlb.c | 77
-rw-r--r-- Documentation/vm/page-types.c | 1100
-rw-r--r-- Documentation/watchdog/00-INDEX | 17
-rw-r--r-- Documentation/zh_CN/SubmitChecklist | 109
46 files changed, 8814 insertions, 0 deletions
diff --git a/Documentation/ABI/obsolete/proc-pid-oom_adj b/Documentation/ABI/obsolete/proc-pid-oom_adj
new file mode 100644
index 00000000000..9a3cb88ade4
--- /dev/null
+++ b/Documentation/ABI/obsolete/proc-pid-oom_adj
@@ -0,0 +1,22 @@
1What: /proc/<pid>/oom_adj
2When: August 2012
3Why: /proc/<pid>/oom_adj allows userspace to influence the oom killer's
4 badness heuristic used to determine which task to kill when the kernel
5 is out of memory.
6
7 The badness heuristic has been rewritten since the introduction of
8 this tunable, so its original meaning no longer applies. The value was
9 implemented as a bitshift on a score generated by the badness()
10 function that did not have any precise units of measure. With the
11 rewrite, the score is given as a proportion of available memory to the
12 task allocating pages, so a bitshift, which grows the score
13 exponentially, cannot be used to tune it with fine granularity.
14
15 A much more powerful interface, /proc/<pid>/oom_score_adj, was
16 introduced with the oom killer rewrite that allows users to increase or
17 decrease the badness score linearly. This interface will replace
18 /proc/<pid>/oom_adj.
19
20 A warning will be emitted to the kernel log if an application uses this
21 deprecated interface. After it is printed once, future warnings will be
22 suppressed until the kernel is rebooted.
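
Note (illustration only, not part of the patch): the contrast between the old exponential tunable and the new linear one can be sketched as below. The constants OOM_DISABLE = -17, OOM_ADJUST_MAX = 15 and OOM_SCORE_ADJ_MAX = 1000 are assumptions about kernels of this era and are not stated in the file above. Both function names are hypothetical.

```python
OOM_DISABLE = -17          # assumed legacy oom_adj value that disables OOM killing
OOM_ADJUST_MAX = 15        # assumed largest legacy oom_adj value
OOM_SCORE_ADJ_MAX = 1000   # assumed largest oom_score_adj value


def legacy_badness_shift(points: int, oom_adj: int) -> int:
    """Apply oom_adj as a bitshift: each step doubles or halves the score."""
    if oom_adj > 0:
        return max(points, 1) << oom_adj
    if oom_adj < 0:
        return points >> -oom_adj
    return points


def oom_adj_to_score_adj(oom_adj: int) -> int:
    """Scale a legacy oom_adj value linearly onto the oom_score_adj range."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj out of range")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX  # keep the maximum value attainable
    scaled = abs(oom_adj) * OOM_SCORE_ADJ_MAX // -OOM_DISABLE
    return scaled if oom_adj >= 0 else -scaled  # truncate toward zero
```

Under these assumptions a step of one in oom_adj doubles the old score, while the same step moves oom_score_adj by roughly 59 points out of 1000, which is the granularity argument made in the text.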
diff --git a/Documentation/ABI/testing/sysfs-devices-node b/Documentation/ABI/testing/sysfs-devices-node
new file mode 100644
index 00000000000..453a210c3ce
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-node
@@ -0,0 +1,7 @@
1What: /sys/devices/system/node/nodeX/compact
2Date: February 2010
3Contact: Mel Gorman <mel@csn.ul.ie>
4Description:
5 When this file is written to, all memory within that node
6 will be compacted. When it completes, memory will be freed
7 into blocks which have as many contiguous pages as possible.
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache
new file mode 100644
index 00000000000..662ae646ea1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache
@@ -0,0 +1,11 @@
1What: /sys/kernel/mm/cleancache/
2Date: April 2011
3Contact: Dan Magenheimer <dan.magenheimer@oracle.com>
4Description:
5 /sys/kernel/mm/cleancache/ contains a number of files which
6 record a count of various cleancache operations
7 (sum across all filesystems):
8 succ_gets
9 failed_gets
10 puts
11 flushes
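
Note (illustration only, not part of the patch): a minimal sketch of collecting the four counters listed above. It assumes each file holds a single decimal integer, which the description does not state. The function name and the `base` parameter are hypothetical.

```python
from pathlib import Path

# Counter file names taken from the ABI description above.
COUNTERS = ("succ_gets", "failed_gets", "puts", "flushes")


def read_cleancache_counters(base="/sys/kernel/mm/cleancache"):
    """Return {counter name: integer value} for the counter files found under base."""
    counts = {}
    for name in COUNTERS:
        path = Path(base) / name
        if path.is_file():  # skip counters that are absent on this system
            counts[name] = int(path.read_text().strip())
    return counts
```

Passing a different `base` makes the helper usable against a copied or mocked directory, since the real sysfs path exists only when cleancache is built in.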
diff --git a/Documentation/ABI/testing/sysfs-wacom b/Documentation/ABI/testing/sysfs-wacom
new file mode 100644
index 00000000000..1517976e25c
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-wacom
@@ -0,0 +1,10 @@
1What: /sys/class/hidraw/hidraw*/device/speed
2Date: April 2010
3Kernel Version: 2.6.35
4Contact: linux-bluetooth@vger.kernel.org
5Description:
6 The /sys/class/hidraw/hidraw*/device/speed file controls
7 the reporting speed of a Wacom Bluetooth tablet. Reading from
8 this file returns 1 if the tablet reports in high-speed mode
9 or 0 otherwise. Writing one of these values to this file
10 switches the reporting speed.
diff --git a/Documentation/DocBook/mcabook.tmpl b/Documentation/DocBook/mcabook.tmpl
new file mode 100644
index 00000000000..467ccac6ec5
--- /dev/null
+++ b/Documentation/DocBook/mcabook.tmpl
@@ -0,0 +1,107 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="MCAGuide">
6 <bookinfo>
7 <title>MCA Driver Programming Interface</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Alan</firstname>
12 <surname>Cox</surname>
13 <affiliation>
14 <address>
15 <email>alan@lxorguk.ukuu.org.uk</email>
16 </address>
17 </affiliation>
18 </author>
19 <author>
20 <firstname>David</firstname>
21 <surname>Weinehall</surname>
22 </author>
23 <author>
24 <firstname>Chris</firstname>
25 <surname>Beauregard</surname>
26 </author>
27 </authorgroup>
28
29 <copyright>
30 <year>2000</year>
31 <holder>Alan Cox</holder>
32 <holder>David Weinehall</holder>
33 <holder>Chris Beauregard</holder>
34 </copyright>
35
36 <legalnotice>
37 <para>
38 This documentation is free software; you can redistribute
39 it and/or modify it under the terms of the GNU General Public
40 License as published by the Free Software Foundation; either
41 version 2 of the License, or (at your option) any later
42 version.
43 </para>
44
45 <para>
46 This program is distributed in the hope that it will be
47 useful, but WITHOUT ANY WARRANTY; without even the implied
48 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
49 See the GNU General Public License for more details.
50 </para>
51
52 <para>
53 You should have received a copy of the GNU General Public
54 License along with this program; if not, write to the Free
55 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
56 MA 02111-1307 USA
57 </para>
58
59 <para>
60 For more details see the file COPYING in the source
61 distribution of Linux.
62 </para>
63 </legalnotice>
64 </bookinfo>
65
66<toc></toc>
67
68 <chapter id="intro">
69 <title>Introduction</title>
70 <para>
71 The MCA bus functions provide a generalised interface to find MCA
72 bus cards, to claim them for a driver, and to read and manipulate POS
73 registers without being aware of the motherboard internals or
74 certain deep magic specific to onboard devices.
75 </para>
76 <para>
77 The basic interface to the MCA bus devices is the slot. Each slot
78 is numbered and virtual slot numbers are assigned to the internal
79 devices. Using a pci_dev as other busses do does not really make
80 sense in the MCA context as the MCA bus resources require card
81 specific interpretation.
82 </para>
83 <para>
84 Finally the MCA bus functions provide a parallel set of DMA
85 functions mimicking the ISA bus DMA functions as closely as possible,
86 although also supporting the additional DMA functionality on the
87 MCA bus controllers.
88 </para>
89 </chapter>
90 <chapter id="bugs">
91 <title>Known Bugs And Assumptions</title>
92 <para>
93 None.
94 </para>
95 </chapter>
96
97 <chapter id="pubfunctions">
98 <title>Public Functions Provided</title>
99!Edrivers/mca/mca-legacy.c
100 </chapter>
101
102 <chapter id="dmafunctions">
103 <title>DMA Functions Provided</title>
104!Iarch/x86/include/asm/mca_dma.h
105 </chapter>
106
107</book>
diff --git a/Documentation/android.txt b/Documentation/android.txt
new file mode 100644
index 00000000000..72a62afdf20
--- /dev/null
+++ b/Documentation/android.txt
@@ -0,0 +1,121 @@
1 =============
2 A N D R O I D
3 =============
4
5Copyright (C) 2009 Google, Inc.
6Written by Mike Chan <mike@android.com>
7
8CONTENTS:
9---------
10
111. Android
12 1.1 Required enabled config options
13 1.2 Required disabled config options
14 1.3 Recommended enabled config options
152. Contact
16
17
181. Android
19==========
20
21Android (www.android.com) is an open source operating system for mobile devices.
22This document describes configurations needed to run the Android framework on
23top of the Linux kernel.
24
25To see a working defconfig look at msm_defconfig or goldfish_defconfig
26which can be found at http://android.git.kernel.org in kernel/common.git
27and kernel/msm.git
28
29
301.1 Required enabled config options
31-----------------------------------
32After building a standard defconfig, ensure that these options are enabled in
33your .config or defconfig if they are not already. The list is based on
34msm_defconfig. Keep the rest of the default options in the defconfig enabled
35unless you know what you are doing.
36
37ANDROID_PARANOID_NETWORK
38ASHMEM
39CONFIG_FB_MODE_HELPERS
40CONFIG_FONT_8x16
41CONFIG_FONT_8x8
42CONFIG_YAFFS_SHORT_NAMES_IN_RAM
43DAB
44EARLYSUSPEND
45FB
46FB_CFB_COPYAREA
47FB_CFB_FILLRECT
48FB_CFB_IMAGEBLIT
49FB_DEFERRED_IO
50FB_TILEBLITTING
51HIGH_RES_TIMERS
52INOTIFY
53INOTIFY_USER
54INPUT_EVDEV
55INPUT_GPIO
56INPUT_MISC
57LEDS_CLASS
58LEDS_GPIO
59LOCK_KERNEL
60LOGGER
61LOW_MEMORY_KILLER
62MISC_DEVICES
63NEW_LEDS
64NO_HZ
65POWER_SUPPLY
66PREEMPT
67RAMFS
68RTC_CLASS
69RTC_LIB
70SWITCH
71SWITCH_GPIO
72TMPFS
73UID_STAT
74UID16
75USB_FUNCTION
76USB_FUNCTION_ADB
77USER_WAKELOCK
78VIDEO_OUTPUT_CONTROL
79WAKELOCK
80YAFFS_AUTO_YAFFS2
81YAFFS_FS
82YAFFS_YAFFS1
83YAFFS_YAFFS2
84
85
861.2 Required disabled config options
87------------------------------------
88CONFIG_YAFFS_DISABLE_LAZY_LOAD
89DNOTIFY
90
91
921.3 Recommended enabled config options
93--------------------------------------
94ANDROID_PMEM
95ANDROID_RAM_CONSOLE
96ANDROID_RAM_CONSOLE_ERROR_CORRECTION
97SCHEDSTATS
98DEBUG_PREEMPT
99DEBUG_MUTEXES
100DEBUG_SPINLOCK_SLEEP
101DEBUG_INFO
102FRAME_POINTER
103CPU_FREQ
104CPU_FREQ_TABLE
105CPU_FREQ_DEFAULT_GOV_ONDEMAND
106CPU_FREQ_GOV_ONDEMAND
107CRC_CCITT
108EMBEDDED
109INPUT_TOUCHSCREEN
110I2C
111I2C_BOARDINFO
112LOG_BUF_SHIFT=17
113SERIAL_CORE
114SERIAL_CORE_CONSOLE
115
116
1172. Contact
118==========
119website: http://android.git.kernel.org
120
121mailing-lists: android-kernel@googlegroups.com
diff --git a/Documentation/aoe/mkdevs.sh b/Documentation/aoe/mkdevs.sh
new file mode 100644
index 00000000000..44c0ab70243
--- /dev/null
+++ b/Documentation/aoe/mkdevs.sh
@@ -0,0 +1,41 @@
1#!/bin/sh
2
3n_shelves=${n_shelves:-10}
4n_partitions=${n_partitions:-16}
5
6if test "$#" != "1"; then
7 echo "Usage: sh `basename $0` {dir}" 1>&2
8 echo " n_partitions=16 sh `basename $0` {dir}" 1>&2
9 exit 1
10fi
11dir=$1
12
13MAJOR=152
14
15echo "Creating AoE devnode files in $dir ..."
16
17set -e
18
19mkdir -p $dir
20
21# (Status info is in sysfs. See status.sh.)
22# rm -f $dir/stat
23# mknod -m 0400 $dir/stat c $MAJOR 1
24rm -f $dir/err
25mknod -m 0400 $dir/err c $MAJOR 2
26rm -f $dir/discover
27mknod -m 0200 $dir/discover c $MAJOR 3
28rm -f $dir/interfaces
29mknod -m 0200 $dir/interfaces c $MAJOR 4
30rm -f $dir/revalidate
31mknod -m 0200 $dir/revalidate c $MAJOR 5
32rm -f $dir/flush
33mknod -m 0200 $dir/flush c $MAJOR 6
34
35export n_partitions
36mkshelf=`echo $0 | sed 's!mkdevs!mkshelf!'`
37i=0
38while test $i -lt $n_shelves; do
39 sh -xc "sh $mkshelf $dir $i"
40 i=`expr $i + 1`
41done
diff --git a/Documentation/aoe/mkshelf.sh b/Documentation/aoe/mkshelf.sh
new file mode 100644
index 00000000000..32615814271
--- /dev/null
+++ b/Documentation/aoe/mkshelf.sh
@@ -0,0 +1,28 @@
1#! /bin/sh
2
3if test "$#" != "2"; then
4 echo "Usage: sh `basename $0` {dir} {shelfaddress}" 1>&2
5 echo " n_partitions=16 sh `basename $0` {dir} {shelfaddress}" 1>&2
6 exit 1
7fi
8n_partitions=${n_partitions:-16}
9dir=$1
10shelf=$2
11nslots=16
12maxslot=`echo $nslots 1 - p | dc`
13MAJOR=152
14
15set -e
16
17minor=`echo $nslots \* $shelf \* $n_partitions | bc`
18endp=`echo $n_partitions - 1 | bc`
19for slot in `seq 0 $maxslot`; do
20 for part in `seq 0 $endp`; do
21 name=e$shelf.$slot
22 test "$part" != "0" && name=${name}p$part
23 rm -f $dir/$name
24 mknod -m 0660 $dir/$name b $MAJOR $minor
25
26 minor=`expr $minor + 1`
27 done
28done
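
Note (illustration only, not part of the patch): the device naming and minor numbering that mkshelf.sh produces can be sketched as below. The defaults mirror the script (16 slots per shelf, 16 partitions per slot). The function name is hypothetical, and the sketch only computes names and numbers; it does not call mknod.

```python
def aoe_shelf_nodes(shelf, n_partitions=16, nslots=16):
    """Return (name, minor) pairs in the order mkshelf.sh creates them."""
    nodes = []
    minor = nslots * shelf * n_partitions  # first minor number of this shelf
    for slot in range(nslots):
        for part in range(n_partitions):
            name = f"e{shelf}.{slot}"
            if part != 0:
                name += f"p{part}"  # part 0 gets no suffix, as in the script
            nodes.append((name, minor))
            minor += 1
    return nodes
```

For example, `aoe_shelf_nodes(1)[0]` is `("e1.0", 256)`, because shelf 0 occupies minors 0 through 255 with the default sizes.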
diff --git a/Documentation/arm/IXP2000 b/Documentation/arm/IXP2000
new file mode 100644
index 00000000000..68d21d92a30
--- /dev/null
+++ b/Documentation/arm/IXP2000
@@ -0,0 +1,69 @@
1
2-------------------------------------------------------------------------
3Release Notes for Linux on Intel's IXP2000 Network Processor
4
5Maintained by Deepak Saxena <dsaxena@plexity.net>
6-------------------------------------------------------------------------
7
81. Overview
9
10Intel's IXP2000 family of NPUs (IXP2400, IXP2800, IXP2850) is designed
11for high-performance network applications such as high-availability
12telecom systems. In addition to an XScale core, it contains up to 8
13"MicroEngines" that run special code, several high-end networking
14interfaces (UTOPIA, SPI, etc), a PCI host bridge, one serial port,
15flash interface, and some other odds and ends. For more information, see:
16
17http://developer.intel.com
18
192. Linux Support
20
21Linux currently supports the following features on the IXP2000 NPUs:
22
23- On-chip serial
24- PCI
25- Flash (MTD/JFFS2)
26- I2C through GPIO
27- Timers (watchdog, OS)
28
29That is about all we can support under Linux ATM b/c the core networking
30components of the chip are accessed via Intel's closed source SDK.
31Please contact Intel directly on issues with using those. There is
32also a mailing list run by some folks at Princeton University that might
33be of help: https://lists.cs.princeton.edu/mailman/listinfo/ixp2xxx
34
35WHATEVER YOU DO, DO NOT POST EMAIL TO THE LINUX-ARM OR LINUX-ARM-KERNEL
36MAILING LISTS REGARDING THE INTEL SDK.
37
383. Supported Platforms
39
40- Intel IXDP2400 Reference Platform
41- Intel IXDP2800 Reference Platform
42- Intel IXDP2401 Reference Platform
43- Intel IXDP2801 Reference Platform
44- RadiSys ENP-2611
45
464. Usage Notes
47
48- The IXP2000 platforms usually have rather complex PCI bus topologies
49 with large memory space requirements. In addition, b/c of the way the
50 Intel SDK is designed, devices are enumerated in a very specific
51 way. B/c of this, we use the "pci=firmware" option in the kernel
52 command line so that we do not re-enumerate the bus.
53
54- IXDP2x01 systems have variable clock tick rates that we cannot determine
55 via HW registers. The "ixdp2x01_clk=XXX" cmd line options allow you
56 to pass the clock rate to the board port.
57
585. Thanks
59
60The IXP2000 work has been funded by Intel Corp. and MontaVista Software, Inc.
61
62The following people have contributed patches/comments/etc:
63
64Naeem F. Afzal
65Lennert Buytenhek
66Jeffrey Daly
67
68-------------------------------------------------------------------------
69Last Update: 8/09/2004
diff --git a/Documentation/arm/nvidia/tegra_parameters.txt b/Documentation/arm/nvidia/tegra_parameters.txt
new file mode 100644
index 00000000000..4c73fe7269f
--- /dev/null
+++ b/Documentation/arm/nvidia/tegra_parameters.txt
@@ -0,0 +1,169 @@
1This file documents NVIDIA Tegra specific sysfs and debugfs files and
2kernel module parameters.
3
4/sys/power/suspend/mode
5-----------------------
6
7Used to select the LP1 or LP0 power state during system suspend.
8# echo lp0 > /sys/kernel/debug/suspend_mode
9# echo lp1 > /sys/kernel/debug/suspend_mode
10
11/sys/module/cpuidle/parameters/lp2_in_idle
12------------------------------------------
13
14Used to enable/disable LP2 in idle.
15# echo 1 > /sys/module/cpuidle/parameters/lp2_in_idle
16# echo 0 > /sys/module/cpuidle/parameters/lp2_in_idle
17
18/sys/kernel/debug/cpuidle/lp2
19-----------------------------
20
21Contains LP2 statistics.
22# cat /sys/kernel/debug/cpuidle/lp2
23
24/sys/kernel/debug/powergate
25---------------------------
26
27Contains power gating state of different tegra blocks.
28
29# cat /sys/kernel/debug/powergate
30
31/sys/module/cpu_tegra3/parameters/auto_hotplug
32----------------------------------------------
33
34Used to control auto hotplug governor
35# echo 0 >/sys/module/cpu_tegra3/parameters/auto_hotplug
36# echo 1 >/sys/module/cpu_tegra3/parameters/auto_hotplug
37# cat /sys/module/cpu_tegra3/parameters/auto_hotplug
380: disabled
391: idle
402: down
413: up
42
43/sys/module/cpu_tegra3/parameters/no_lp
44---------------------------------------
45
46Used to enable/disable shadow cluster.
47# echo 0 >/sys/module/cpu_tegra3/parameters/no_lp
48# echo 1 >/sys/module/cpu_tegra3/parameters/no_lp
49
50/sys/module/cpu_tegra3/parameters/idle_bottom_freq
51--------------------------------------------------
52
53Shadow cluster maximum frequency.
54
55/sys/module/cpu_tegra3/parameters/idle_top_freq
56-----------------------------------------------
57
58Main cluster minimum frequency.
59
60/sys/module/cpu_tegra3/parameters/down_delay
61---------------------------------------------
62
63Auto hotplug delay (in jiffies) for reducing cores.
64
65/sys/module/cpu_tegra3/parameters/up2g0_delay
66---------------------------------------------
67
68Delay (in jiffies) for switching to main cluster.
69
70/sys/module/cpu_tegra3/parameters/up2gn_delay
71---------------------------------------------
72
73Delay (in jiffies) for bringing additional cores online in main
74cluster.
75
76/sys/module/cpu_tegra3/parameters/balance_level
77-----------------------------------------------
78
79Percentage of max speed considered to be in balance. Half of balanced
80speed is considered skewed. Speed balance states:
81* balanced: freq targets for all CPUs are above 50% of highest speed
82* biased: freq target for at least one CPU is below 50% threshold
83* skewed: freq targets for at least 2 CPUs are below 25% threshold
84Speed balance state and hotplug state dictate auto hotplug behavior.
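
Note (illustration only, not part of the patch): one possible reading of the three speed balance states, as a sketch. It assumes balance_level is 50 (matching the 50% and 25% figures above), that thresholds are taken relative to the highest per-CPU frequency target, that comparisons are strict, and that "skewed" takes precedence over "biased". None of these details are confirmed by the file, and the function name is hypothetical.

```python
def speed_balance(freq_targets, balance_level=50):
    """Classify per-CPU frequency targets as 'balanced', 'biased' or 'skewed'."""
    highest = max(freq_targets)
    balanced_threshold = highest * balance_level / 100
    skewed_threshold = balanced_threshold / 2  # half of the balanced speed
    below_skewed = sum(1 for f in freq_targets if f < skewed_threshold)
    below_balanced = sum(1 for f in freq_targets if f < balanced_threshold)
    if below_skewed >= 2:
        return "skewed"
    if below_balanced >= 1:
        return "biased"
    return "balanced"
```

With targets of 1000, 200, 100 and 600 (arbitrary units), two CPUs fall below the 250 threshold, so this sketch reports "skewed".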
85
86/sys/module/cpu_tegra3/parameters/mp_overhead
87---------------------------------------------
88
89Multi-core overhead percentage for EDP limit calculation.
90
91/sys/kernel/debug/tegra_hotplug/stats
92-------------------------------------
93
94Contains hotplug statistics.
95
96/sys/kernel/cluster/active
97--------------------------
98
99Controls active CPU cluster: main (G) or shadow (LP).
100For manual control disable auto hotplug, enable immediate switch and
101possibly force switch to happen always:
102# echo 0 > /sys/module/cpu_tegra3/parameters/auto_hotplug
103# echo 1 > /sys/kernel/cluster/immediate
104# echo 1 > /sys/kernel/cluster/force
105
106Cluster switching can happen only when only core 0 is online.
107
108Active cluster can be set or toggled:
109# echo "G" > /sys/kernel/cluster/active
110# echo "LP" > /sys/kernel/cluster/active
111# echo "toggle" > /sys/kernel/cluster/active
112
113/sys/module/tegra3_clocks/parameters/detach_shared_bus
114------------------------------------------------------
115
116Enable/disable shared bus clock update.
117
118/sys/module/tegra3_emc/parameters/emc_enable
119--------------------------------------------
120
121Enable/disable EMC DFS.
122
123/sys/kernel/debug/tegra_emc/stats
124---------------------------------
125
126Contains EMC clock statistics.
127
128/sys/module/tegra3_dvfs/parameters/disable_cpu
129----------------------------------------------
130
131Enable/disable DVFS for CPU domain.
132
133/sys/module/tegra3_dvfs/parameters/disable_core
134-----------------------------------------------
135
136Enable/disable DVFS for CORE domain.
137
138/sys/kernel/debug/clock/emc/rate
139--------------------------------
140
141Get/set EMC clock rate.
142
143/sys/kernel/debug/clock/<module>/rate
144-------------------------------------
145
146/sys/kernel/debug/clock/<module>/parent
147---------------------------------------
148
149/sys/kernel/debug/clock/<module>/state
150--------------------------------------
151
152/sys/kernel/debug/clock/<module>/time_on
153----------------------------------------
154
155/sys/kernel/debug/clock/clock_tree
156----------------------------------
157
158Shows the state of the clock tree.
159
160/sys/kernel/debug/clock/dvfs
161----------------------------
162
163Contains voltage state.
164
165/sys/kernel/debug/tegra_actmon/avp/state
166----------------------------------------
167
168/sys/kernel/debug/clock/mon.avp/rate
169------------------------------------
diff --git a/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt
new file mode 100644
index 00000000000..eb4b530d64e
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio_nvidia.txt
@@ -0,0 +1,8 @@
1NVIDIA Tegra 2 GPIO controller
2
3Required properties:
4- compatible : "nvidia,tegra20-gpio"
5- #gpio-cells : Should be two. The first cell is the pin number and the
6 second cell is used to specify optional parameters:
7 - bit 0 specifies polarity (0 for normal, 1 for inverted)
8- gpio-controller : Marks the device node as a GPIO controller.
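
Note (illustration only, not part of the patch): a small sketch of how a consumer might decode the two-cell specifier described above. Only bit 0 of the second cell is interpreted, since that is the only flag the binding documents. The function name and the returned dictionary layout are hypothetical.

```python
def decode_tegra_gpio_specifier(cells):
    """Split a two-cell GPIO specifier into pin number and polarity."""
    pin, flags = cells  # raises ValueError unless exactly two cells are given
    return {"pin": pin, "inverted": bool(flags & 0x1)}
```

Masking with 0x1 rather than comparing the whole cell keeps the polarity decoding stable if further flag bits are ever defined.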
diff --git a/Documentation/devicetree/bindings/gpio/led.txt b/Documentation/devicetree/bindings/gpio/led.txt
new file mode 100644
index 00000000000..064db928c3c
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/led.txt
@@ -0,0 +1,58 @@
1LEDs connected to GPIO lines
2
3Required properties:
4- compatible : should be "gpio-leds".
5
6Each LED is represented as a sub-node of the gpio-leds device. Each
7node's name represents the name of the corresponding LED.
8
9LED sub-node properties:
10- gpios : Should specify the LED's GPIO, see "Specifying GPIO information
11 for devices" in Documentation/powerpc/booting-without-of.txt. Active
12 low LEDs should be indicated using flags in the GPIO specifier.
13- label : (optional) The label for this LED. If omitted, the label is
14 taken from the node name (excluding the unit address).
15- linux,default-trigger : (optional) This parameter, if present, is a
16 string defining the trigger assigned to the LED. Current triggers are:
17 "backlight" - LED will act as a back-light, controlled by the framebuffer
18 system
19 "default-on" - LED will turn on, but see "default-state" below
20 "heartbeat" - LED "double" flashes at a load average based rate
21 "ide-disk" - LED indicates disk activity
22 "timer" - LED flashes at a fixed, configurable rate
23- default-state: (optional) The initial state of the LED. Valid
24 values are "on", "off", and "keep". If the LED is already on or off
25 and the default-state property is set to the same value, then no
26 glitch should be produced where the LED momentarily turns off (or
27 on). The "keep" setting will keep the LED at whatever its current
28 state is, without producing a glitch. The default is off if this
29 property is not present.
30
31Examples:
32
33leds {
34 compatible = "gpio-leds";
35 hdd {
36 label = "IDE Activity";
37 gpios = <&mcu_pio 0 1>; /* Active low */
38 linux,default-trigger = "ide-disk";
39 };
40
41 fault {
42 gpios = <&mcu_pio 1 0>;
43 /* Keep LED on if BIOS detected hardware fault */
44 default-state = "keep";
45 };
46};
47
48run-control {
49 compatible = "gpio-leds";
50 red {
51 gpios = <&mpc8572 6 0>;
52 default-state = "off";
53 };
54 green {
55 gpios = <&mpc8572 7 0>;
56 default-state = "on";
57 };
58};
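
Note (illustration only, not part of the patch): the default-state rules above reduce to a small decision, sketched here. The function name and the boolean representation of the LED state are hypothetical, and the sketch ignores the glitch-avoidance aspect, which depends on the hardware driver.

```python
def initial_led_state(default_state=None, currently_on=False):
    """Return True if the LED should start lit, following the binding's rules."""
    if default_state is None or default_state == "off":
        return False  # property absent means off
    if default_state == "on":
        return True
    if default_state == "keep":
        return currently_on  # leave the LED in whatever state it is in
    raise ValueError(f"invalid default-state: {default_state!r}")
```

In the first example above, the "fault" LED uses "keep", so it would stay lit only if the firmware had already turned it on.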
diff --git a/Documentation/devicetree/bindings/i2c/arm-versatile.txt b/Documentation/devicetree/bindings/i2c/arm-versatile.txt
new file mode 100644
index 00000000000..361d31c51b6
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/arm-versatile.txt
@@ -0,0 +1,10 @@
1i2c Controller on ARM Versatile platform:
2
3Required properties:
4- compatible : Must be "arm,versatile-i2c";
5- reg
6- #address-cells = <1>;
7- #size-cells = <0>;
8
9Optional properties:
10- Child nodes conforming to i2c bus binding
diff --git a/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
new file mode 100644
index 00000000000..569b1624851
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
@@ -0,0 +1,93 @@
1CE4100 I2C
2----------
3
4CE4100 has one PCI device which is described as the I2C-Controller. This
5PCI device has three PCI-bars, each bar contains a complete I2C
6controller. So we have a total of three independent I2C-Controllers
7which share only an interrupt line.
8The driver is probed via the PCI-ID and gathers the information about
9attached devices from the device tree.
10Grant Likely recommended using the ranges property to map the PCI-Bar
11number to its physical address and to use this to find the child nodes
12of the specific I2C controller. These were his exact words:
13
14 Here's where the magic happens. Each entry in
15 ranges describes how the parent pci address space
16 (middle group of 3) is translated to the local
17 address space (first group of 2) and the size of
18 each range (last cell). In this particular case,
19 the first cell of the local address is chosen to be
20 1:1 mapped to the BARs, and the second is the
21 offset from the base of the BAR (which would be
22 non-zero if you had 2 or more devices mapped off
23 the same BAR)
24
25 ranges allows the address mapping to be described
26 in a way that the OS can interpret without
27 requiring custom device driver code.
28
29This is an example which is used on FalconFalls:
30------------------------------------------------
31 i2c-controller@b,2 {
32 #address-cells = <2>;
33 #size-cells = <1>;
34 compatible = "pci8086,2e68.2",
35 "pci8086,2e68",
36 "pciclass,ff0000",
37 "pciclass,ff00";
38
39 reg = <0x15a00 0x0 0x0 0x0 0x0>;
40 interrupts = <16 1>;
41
42 /* as described by Grant, the first number in the group of
43 * three is the bar number followed by the 64bit bar address
44 * followed by size of the mapping. The bar address
45 * requires also a valid translation in parents ranges
46 * property.
47 */
48 ranges = <0 0 0x02000000 0 0xdffe0500 0x100
49 1 0 0x02000000 0 0xdffe0600 0x100
50 2 0 0x02000000 0 0xdffe0700 0x100>;
51
52 i2c@0 {
53 #address-cells = <1>;
54 #size-cells = <0>;
55 compatible = "intel,ce4100-i2c-controller";
56
57 /* The first number in the reg property is the
58 * number of the bar
59 */
60 reg = <0 0 0x100>;
61
62 /* This I2C controller has no devices */
63 };
64
65 i2c@1 {
66 #address-cells = <1>;
67 #size-cells = <0>;
68 compatible = "intel,ce4100-i2c-controller";
69 reg = <1 0 0x100>;
70
71 /* This I2C controller has one gpio controller */
72 gpio@26 {
73 #gpio-cells = <2>;
74 compatible = "ti,pcf8575";
75 reg = <0x26>;
76 gpio-controller;
77 };
78 };
79
80 i2c@2 {
81 #address-cells = <1>;
82 #size-cells = <0>;
83 compatible = "intel,ce4100-i2c-controller";
84 reg = <2 0 0x100>;
85
86 gpio@26 {
87 #gpio-cells = <2>;
88 compatible = "ti,pcf8575";
89 reg = <0x26>;
90 gpio-controller;
91 };
92 };
93 };
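
Note (illustration only, not part of the patch): a sketch of how the ranges property in the example maps a (BAR number, offset) pair to a parent PCI address. It follows the layout in the quoted explanation: two local address cells, three parent PCI address cells, one size cell. Treating the last two parent cells as a 64-bit address is an assumption, and all names here are hypothetical.

```python
from typing import NamedTuple


class Range(NamedTuple):
    bar: int        # first local address cell: BAR number
    offset: int     # second local address cell: offset within the BAR
    pci_hi: int     # first parent cell (PCI address flags)
    pci_addr: int   # remaining two parent cells combined into one address
    size: int       # length of the mapped window


def parse_ranges(cells):
    """Group a flat list of ranges cells into 6-cell entries."""
    if len(cells) % 6 != 0:
        raise ValueError("ranges length must be a multiple of 6 cells")
    entries = []
    for i in range(0, len(cells), 6):
        bar, offset, pci_hi, pci_mid, pci_lo, size = cells[i:i + 6]
        entries.append(Range(bar, offset, pci_hi, (pci_mid << 32) | pci_lo, size))
    return entries


def translate(entries, bar, offset):
    """Return the parent address for (bar, offset), or None if unmapped."""
    for entry in entries:
        if entry.bar == bar and entry.offset <= offset < entry.offset + entry.size:
            return entry.pci_addr + (offset - entry.offset)
    return None
```

With the values from the example, the child node with `reg = <1 0 0x100>` resolves to 0xdffe0600, the second window in ranges.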
diff --git a/Documentation/devicetree/bindings/i2c/fsl-i2c.txt b/Documentation/devicetree/bindings/i2c/fsl-i2c.txt
new file mode 100644
index 00000000000..1eacd6b20ed
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/fsl-i2c.txt
@@ -0,0 +1,64 @@
1* I2C
2
3Required properties :
4
5 - reg : Offset and length of the register set for the device
6 - compatible : should be "fsl,CHIP-i2c" where CHIP is the name of a
7 compatible processor, e.g. mpc8313, mpc8543, mpc8544, mpc5121,
8 mpc5200 or mpc5200b. For the mpc5121, an additional node
9 "fsl,mpc5121-i2c-ctrl" is required as shown in the example below.
10
11Recommended properties :
12
13 - interrupts : <a b> where a is the interrupt number and b is a
14 field that represents an encoding of the sense and level
15 information for the interrupt. This should be encoded based on
16 the information in section 2) depending on the type of interrupt
17 controller you have.
18 - interrupt-parent : the phandle for the interrupt controller that
19 services interrupts for this device.
20 - fsl,preserve-clocking : boolean; if defined, the clock settings
21 from the bootloader are preserved (not touched).
22 - clock-frequency : desired I2C bus clock frequency in Hz.
23 - fsl,timeout : I2C bus timeout in microseconds.
24
25Examples :
26
27 /* MPC5121 based board */
28 i2c@1740 {
29 #address-cells = <1>;
30 #size-cells = <0>;
31 compatible = "fsl,mpc5121-i2c", "fsl-i2c";
32 reg = <0x1740 0x20>;
33 interrupts = <11 0x8>;
34 interrupt-parent = <&ipic>;
35 clock-frequency = <100000>;
36 };
37
38 i2ccontrol@1760 {
39 compatible = "fsl,mpc5121-i2c-ctrl";
40 reg = <0x1760 0x8>;
41 };
42
43 /* MPC5200B based board */
44 i2c@3d00 {
45 #address-cells = <1>;
46 #size-cells = <0>;
47 compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c";
48 reg = <0x3d00 0x40>;
49 interrupts = <2 15 0>;
50 interrupt-parent = <&mpc5200_pic>;
51 fsl,preserve-clocking;
52 };
53
54 /* MPC8544 based board */
55 i2c@3100 {
56 #address-cells = <1>;
57 #size-cells = <0>;
58 compatible = "fsl,mpc8544-i2c", "fsl-i2c";
59 reg = <0x3100 0x100>;
60 interrupts = <43 2>;
61 interrupt-parent = <&mpic>;
62 clock-frequency = <400000>;
63 fsl,timeout = <10000>;
64 };
diff --git a/Documentation/devicetree/bindings/spi/spi_nvidia.txt b/Documentation/devicetree/bindings/spi/spi_nvidia.txt
new file mode 100644
index 00000000000..6b9e5189669
--- /dev/null
+++ b/Documentation/devicetree/bindings/spi/spi_nvidia.txt
@@ -0,0 +1,5 @@
1NVIDIA Tegra 2 SPI device
2
3Required properties:
4- compatible : should be "nvidia,tegra20-spi".
5- gpios : should specify GPIOs used for chipselect.
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
new file mode 100644
index 00000000000..4dc46547766
--- /dev/null
+++ b/Documentation/feature-removal-schedule.txt
@@ -0,0 +1,602 @@
1The following is a list of files and features that are going to be
2removed from the kernel source tree. Every entry should contain what
3exactly is going away, why it is happening, and who is going to be doing
4the work. When the feature is removed from the kernel, it should also
5be removed from this file.
6
7---------------------------
8
9What: x86 floppy disable_hlt
10When: 2012
11Why: ancient workaround of dubious utility clutters the
12 code used by everybody else.
13Who: Len Brown <len.brown@intel.com>
14
15---------------------------
16
17What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle
18When: 2012
19Why: This optional sub-feature of APM is of dubious reliability,
20 and ancient APM laptops are likely better served by calling HLT.
21 Deleting CONFIG_APM_CPU_IDLE allows x86 to stop exporting
22 the pm_idle function pointer to modules.
23Who: Len Brown <len.brown@intel.com>
24
25----------------------------
26
27What: x86_32 "no-hlt" cmdline param
28When: 2012
29Why: remove a branch from idle path, simplify code used by everybody.
30 This option disabled the use of HLT in idle and machine_halt()
31 for hardware that was flaky 15 years ago. Today we have
32 "idle=poll" that removes HLT from idle, and so if such a machine
33 is still running the upstream kernel, "idle=poll" is likely sufficient.
34Who: Len Brown <len.brown@intel.com>
35
36----------------------------
37
38What: x86 "idle=mwait" cmdline param
39When: 2012
40Why: simplify x86 idle code
41Who: Len Brown <len.brown@intel.com>
42
43----------------------------
44
45What: PRISM54
46When: 2.6.34
47
48Why: prism54 FullMAC PCI / Cardbus devices used to be supported only by the
49 prism54 wireless driver. After Intersil stopped selling these
50 devices in preference for the newer more flexible SoftMAC devices
51 a SoftMAC device driver was required and prism54 did not support
52 them. The p54pci driver now exists and has been present in the kernel for
53 a while. This driver supports both SoftMAC devices and FullMAC devices.
54 The main difference between these devices was the amount of memory which
55 could be used for the firmware. The SoftMAC devices support a smaller
56 amount of memory. Because of this the SoftMAC firmware fits into FullMAC
57 devices' memory. p54pci supports not only PCI / Cardbus but also USB
58 and SPI. Since p54pci supports all devices prism54 supports
59 you will have a conflict. I'm not quite sure how distributions are
60 handling this conflict right now. prism54 was kept around due to
61 claims users may experience issues when using the SoftMAC driver.
62 Time has passed and users have not reported issues. If you use prism54
63 and for whatever reason you cannot use p54pci please let us know!
64 E-mail us at: linux-wireless@vger.kernel.org
65
66 For more information see the p54 wiki page:
67
68 http://wireless.kernel.org/en/users/Drivers/p54
69
70Who: Luis R. Rodriguez <lrodriguez@atheros.com>
71
72---------------------------
73
74What: IRQF_SAMPLE_RANDOM
75Check: IRQF_SAMPLE_RANDOM
76When: July 2009
77
78Why: Many IRQF_SAMPLE_RANDOM users are technically bogus as entropy
79 sources in the kernel's current entropy model. To resolve this, every
80 input point to the kernel's entropy pool needs to better document the
81 type of entropy source it actually is. This will be replaced with
82 additional add_*_randomness functions in drivers/char/random.c
83
84Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
85
86---------------------------
87
88What: Deprecated snapshot ioctls
89When: 2.6.36
90
91Why: The ioctls in kernel/power/user.c were marked as deprecated a long time
92 ago. They now warn users about this so that they replace
93 their userspace. After some more time, remove them completely.
94
95Who: Jiri Slaby <jirislaby@gmail.com>
96
97---------------------------
98
99What: The ieee80211_regdom module parameter
100When: March 2010 / desktop catchup
101
102Why: This was inherited by the CONFIG_WIRELESS_OLD_REGULATORY code,
103 and currently serves as an option for users to define an
104 ISO / IEC 3166 alpha2 code for the country they are currently
105 present in. Although there are userspace API replacements for this
106 through nl80211, distributions haven't yet caught up with implementing
107 decent alternatives through standard GUIs. Although available as an
108 option through iw or wpa_supplicant, it's just a matter of time before
109 distributions pick up good GUI options for this. The ideal solution
110 would actually consist of intelligent designs which would do this for
111 the user automatically even when travelling through different countries.
112 Until then we leave this module parameter as a compromise.
113
114 When userspace improves with reasonable widely-available alternatives for
115 this we will no longer need this module parameter. This entry hopes that
116 by the super-futuristically looking date of "March 2010" we will have
117 such replacements widely available.
118
119Who: Luis R. Rodriguez <lrodriguez@atheros.com>
120
121---------------------------
122
123What: dev->power.power_state
124When: July 2007
125Why: Broken design for runtime control over driver power states, confusing
126 driver-internal runtime power management with: mechanisms to support
127 system-wide sleep state transitions; event codes that distinguish
128 different phases of swsusp "sleep" transitions; and userspace policy
129 inputs. This framework was never widely used, and most attempts to
130 use it were broken. Drivers should instead be exposing domain-specific
131 interfaces either to kernel or to userspace.
132Who: Pavel Machek <pavel@ucw.cz>
133
134---------------------------
135
136What: sys_sysctl
137When: September 2010
138Option: CONFIG_SYSCTL_SYSCALL
139Why: The same information is available in a more convenient form in
140 /proc/sys, and none of the sysctl variables appear to be
141 important performance wise.
142
143 Binary sysctls are a long standing source of subtle kernel
144 bugs and security issues.
145
146 When I looked several months ago all I could find after
147 searching several distributions were 5 user space programs and
148 glibc (which falls back to /proc/sys) using this syscall.
149
150 The man page for sysctl(2) documents it as unusable for user
151 space programs.
152
153 sysctl(2) is not generally ABI compatible to a 32bit user
154 space application on a 64bit and a 32bit kernel.
155
156 For the last several months the policy has been no new binary
157 sysctls and no one has put forward an argument to use them.
158
159 Binary sysctl issues seem to keep appearing, so
160 properly deprecating them (with a warning to user space) and a
161 2-year grace period will mean eventually we can kill
162 them and end the pain.
163
164 In the mean time individual binary sysctls can be dealt with
165 in a piecewise fashion.
166
167Who: Eric Biederman <ebiederm@xmission.com>
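
A minimal userspace sketch of the /proc/sys route mentioned above, not
part of the original entry. The helper name read_proc_sys() is
hypothetical, and kernel/ostype is used only as an example of an entry
that exists on typical Linux systems.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the sysctl "name" (e.g. "kernel/ostype")
 * from /proc/sys into buf as a NUL-terminated string, without the
 * trailing newline. Returns 0 on success, -1 on failure.
 */
int read_proc_sys(const char *name, char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	if (len == 0)
		return -1;

	/* Build the text path that replaces the binary sysctl(2) name. */
	n = snprintf(path, sizeof(path), "/proc/sys/%s", name);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Strip the newline that /proc/sys entries end with. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

On a typical Linux system, read_proc_sys("kernel/ostype", buf,
sizeof(buf)) leaves the string "Linux" in buf. Because the value is
plain text, the same code works for 32bit and 64bit user space.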
168
169---------------------------
170
171What: /proc/<pid>/oom_adj
172When: August 2012
173Why: /proc/<pid>/oom_adj allows userspace to influence the oom killer's
174 badness heuristic used to determine which task to kill when the kernel
175 is out of memory.
176
177 The badness heuristic has been rewritten since the introduction of
178 this tunable such that its meaning is deprecated. The value was
179 implemented as a bitshift on a score generated by the badness()
180 function that did not have any precise units of measure. With the
181 rewrite, the score is given as a proportion of available memory to the
182 task allocating pages, so a bitshift, which grows the score
183 exponentially, is impossible to tune with fine granularity.
184
185 A much more powerful interface, /proc/<pid>/oom_score_adj, was
186 introduced with the oom killer rewrite that allows users to increase or
187 decrease the badness score linearly. This interface will replace
188 /proc/<pid>/oom_adj.
189
190 A warning will be emitted to the kernel log if an application uses this
191 deprecated interface. After it is printed once, future warnings will be
192 suppressed until the kernel is rebooted.
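
The difference between the two tunables can be sketched with a
simplified model. This is not the kernel's code: the function names are
hypothetical, special values and clamping are ignored, and the linear
case assumes the score is already expressed as a proportion of
available memory.

```c
/*
 * Simplified model of the old tunable: the adjustment is a bitshift,
 * so each step of oom_adj doubles or halves the score.
 */
unsigned long oom_adj_model(unsigned long score, int oom_adj)
{
	if (oom_adj > 0)
		return score << oom_adj;
	return score >> -oom_adj;
}

/*
 * Simplified model of the new tunable: the adjustment is added to the
 * score, so each step of oom_score_adj moves it by the same amount.
 */
long oom_score_adj_model(long score, int oom_score_adj)
{
	return score + oom_score_adj;
}
```

In this model a score of 100 becomes 200, 400 and 800 for oom_adj
values of 1, 2 and 3, but 101, 102 and 103 for the same oom_score_adj
values, which is why only the latter can be tuned finely.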
193
194---------------------------
195
196What: remove EXPORT_SYMBOL(kernel_thread)
197When: August 2006
198Files: arch/*/kernel/*_ksyms.c
199Check: kernel_thread
200Why: kernel_thread is a low-level implementation detail. Drivers should
201 use the <linux/kthread.h> API instead which shields them from
202 implementation details and provides a higher-level interface that
203 prevents bugs and code duplication.
204Who: Christoph Hellwig <hch@lst.de>
205
206---------------------------
207
208What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports
209 (temporary transition config option provided until then)
210 The transition config option will also be removed at the same time.
211When: before 2.6.19
212Why: Unused symbols are both increasing the size of the kernel binary
213 and are often a sign of "wrong API"
214Who: Arjan van de Ven <arjan@linux.intel.com>
215
216---------------------------
217
218What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
219When: October 2008
220Why: The stacking of class devices makes these values misleading and
221 inconsistent.
222 Class devices should not carry any of these properties, and bus
223 devices have SUBSYSTEM and DRIVER as a replacement.
224Who: Kay Sievers <kay.sievers@suse.de>
225
226---------------------------
227
228What: ACPI procfs interface
229When: July 2008
230Why: ACPI sysfs conversion should be finished by January 2008.
231 ACPI procfs interface will be removed in July 2008 so that
232 there is enough time for the user space to catch up.
233Who: Zhang Rui <rui.zhang@intel.com>
234
235---------------------------
236
237What: CONFIG_ACPI_PROCFS_POWER
238When: 2.6.39
239Why: sysfs I/F for ACPI power devices, including AC and Battery,
240 has been working in upstream kernel since 2.6.24, Sep 2007.
241 In 2.6.37, we make the sysfs I/F always built in and this option
242 disabled by default.
243 Remove this option and the ACPI power procfs interface in 2.6.39.
244Who: Zhang Rui <rui.zhang@intel.com>
245
246---------------------------
247
248What: /proc/acpi/event
249When: February 2008
250Why: /proc/acpi/event has been replaced by events via the input layer
251 and netlink since 2.6.23.
252Who: Len Brown <len.brown@intel.com>
253
254---------------------------
255
256What: i386/x86_64 bzImage symlinks
257When: April 2010
258
259Why: The i386/x86_64 merge provides a symlink to the old bzImage
260 location so not yet updated user space tools, e.g. package
261 scripts, do not break.
262Who: Thomas Gleixner <tglx@linutronix.de>
263
264---------------------------
265
266What: GPIO autorequest on gpio_direction_{input,output}() in gpiolib
267When: February 2010
268Why: All callers should use explicit gpio_request()/gpio_free().
269 The autorequest mechanism in gpiolib was provided mostly as a
270 migration aid for legacy GPIO interfaces (for SOC based GPIOs).
271 Those users have now largely migrated. Platforms implementing
272 the GPIO interfaces without using gpiolib will see no changes.
273Who: David Brownell <dbrownell@users.sourceforge.net>
274---------------------------
275
276What: b43 support for firmware revision < 410
277When: The schedule was July 2008, but it was decided that we are going to keep the
278 code as long as there are no major maintenance headaches.
279 So it _could_ be removed _any_ time now, if it conflicts with something new.
280Why: The support code for the old firmware hurts code readability/maintainability
281 and slightly hurts runtime performance. Bugfixes for the old firmware
282 are not provided by Broadcom anymore.
283Who: Michael Buesch <m@bues.ch>
284
285---------------------------
286
287What: Ability for non root users to shm_get hugetlb pages based on mlock
288 resource limits
289When: 2.6.31
290Why: Non root users need to be part of /proc/sys/vm/hugetlb_shm_group or
291 have CAP_IPC_LOCK to be able to allocate shm segments backed by
292 huge pages. The mlock based rlimit check to allow shm hugetlb is
293 inconsistent with mmap based allocations. Hence it is being
294 deprecated.
295Who: Ravikiran Thirumalai <kiran@scalex86.org>
296
297---------------------------
298
299What: Code that is now under CONFIG_WIRELESS_EXT_SYSFS
300 (in net/core/net-sysfs.c)
301When: After the only user (hal) has seen a release with the patches
302 for enough time, probably some time in 2010.
303Why: Over 1K .text/.data size reduction, data is available in other
304 ways (ioctls)
305Who: Johannes Berg <johannes@sipsolutions.net>
306
307---------------------------
308
309What: sysfs ui for changing p4-clockmod parameters
310When: September 2009
311Why: See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and
312 e088e4c9cdb618675874becb91b2fd581ee707e6.
313 Removal is subject to fixing any remaining bugs in ACPI which may
314 cause the thermal throttling not to happen at the right time.
315Who: Dave Jones <davej@redhat.com>, Matthew Garrett <mjg@redhat.com>
316
317-----------------------------
318
319What: fakephp and associated sysfs files in /sys/bus/pci/slots/
320When: 2011
321Why: In 2.6.27, the semantics of /sys/bus/pci/slots was redefined to
322 represent a machine's physical PCI slots. The change in semantics
323 had userspace implications, as the hotplug core no longer allowed
324 drivers to create multiple sysfs files per physical slot (required
325 for multi-function devices, e.g.). fakephp was seen as a developer's
326 tool only, and its interface changed. Too late, we learned that
327 there were some users of the fakephp interface.
328
329 In 2.6.30, the original fakephp interface was restored. At the same
330 time, the PCI core gained the ability that fakephp provided, namely
331 function-level hot-remove and hot-add.
332
333 Since the PCI core now provides the same functionality, exposed in:
334
335 /sys/bus/pci/rescan
336 /sys/bus/pci/devices/.../remove
337 /sys/bus/pci/devices/.../rescan
338
339 there is no functional reason to maintain fakephp as well.
340
341 We will keep the existing module so that 'modprobe fakephp' will
342 present the old /sys/bus/pci/slots/... interface for compatibility,
343 but users are urged to migrate their applications to the API above.
344
345 After a reasonable transition period, we will remove the legacy
346 fakephp interface.
347Who: Alex Chiang <achiang@hp.com>
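
A minimal userspace sketch of driving the PCI core interface listed
above, not part of the original entry. The helper name write_attr() is
hypothetical. Writing "1" to /sys/bus/pci/rescan requests a rescan and
normally requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the string "value" to the file at "path",
 * for example "1" to /sys/bus/pci/rescan.
 * Returns 0 on success, -1 on failure.
 */
int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;

	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}

	/* stdio buffers the write, so a failure may only show up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would use write_attr("/sys/bus/pci/rescan", "1") and check the
return value, since the write fails without sufficient privileges.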
348
349---------------------------
350
351What: CONFIG_RFKILL_INPUT
352When: 2.6.33
353Why: Should be implemented in userspace, policy daemon.
354Who: Johannes Berg <johannes@sipsolutions.net>
355
356----------------------------
357
358What: sound-slot/service-* module aliases and related clutter in
359 sound/sound_core.c
360When: August 2010
361Why: OSS sound_core grabs all legacy minors (0-255) of SOUND_MAJOR
362 (14) and requests modules using custom sound-slot/service-*
363 module aliases. The only benefit of doing this is allowing
364 use of custom module aliases which might as well be considered
365 a bug at this point. This preemptive claiming prevents
366 alternative OSS implementations.
367
368 Till the feature is removed, the kernel will be requesting
369 both sound-slot/service-* and the standard char-major-* module
370 aliases and allow turning off the pre-claiming selectively via
371 CONFIG_SOUND_OSS_CORE_PRECLAIM and soundcore.preclaim_oss
372 kernel parameter.
373
374 After the transition phase is complete, both the custom module
375 aliases and switches to disable it will go away. This removal
376 will also allow making ALSA OSS emulation independent of
377 sound_core. The dependency will be broken then too.
378Who: Tejun Heo <tj@kernel.org>
379
380----------------------------
381
382What: sysfs-class-rfkill state file
383When: Feb 2014
384Files: net/rfkill/core.c
385Why: Documented as obsolete since Feb 2010. This file is limited to 3
386 states while the rfkill drivers can have 4 states.
387Who: anybody or Florian Mickler <florian@mickler.org>
388
389----------------------------
390
391What: sysfs-class-rfkill claim file
392When: Feb 2012
393Files: net/rfkill/core.c
394Why: It has not been possible to claim an rfkill driver since 2007. This has
395 been documented as obsolete since Feb 2010.
396Who: anybody or Florian Mickler <florian@mickler.org>
397
398----------------------------
399
400What: KVM paravirt mmu host support
401When: January 2011
402Why: The paravirt mmu host support is slower than non-paravirt mmu, both
403 on newer and older hardware. It is already not exposed to the guest,
404 and kept only for live migration purposes.
405Who: Avi Kivity <avi@redhat.com>
406
407----------------------------
408
409What: iwlwifi 50XX module parameters
410When: 3.0
411Why: The "..50" module parameters were used to configure 5000 series and
412 up devices; a different set of module parameters is also available for
413 4965 with the same functionality. Consolidate both sets into a single place
414 in drivers/net/wireless/iwlwifi/iwl-agn.c
415
416Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>
417
418----------------------------
419
420What: iwl4965 alias support
421When: 3.0
422Why: Internal alias support has been present in module-init-tools for some
423 time, the MODULE_ALIAS("iwl4965") boilerplate aliases can be removed
424 with no impact.
425
426Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>
427
428---------------------------
429
430What: xt_NOTRACK
431Files: net/netfilter/xt_NOTRACK.c
432When: April 2011
433Why: Superseded by xt_CT
434Who: Netfilter developer team <netfilter-devel@vger.kernel.org>
435
436----------------------------
437
438What: IRQF_DISABLED
439When: 2.6.36
440Why: The flag is a NOOP as we run interrupt handlers with interrupts disabled
441Who: Thomas Gleixner <tglx@linutronix.de>
442
443----------------------------
444
445What: PCI DMA unmap state API
446When: August 2012
447Why: PCI DMA unmap state API (include/linux/pci-dma.h) was replaced
448 with DMA unmap state API (DMA unmap state API can be used for
449 any bus).
450Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
451
452----------------------------
453
454What: iwlwifi disable_hw_scan module parameters
455When: 3.0
456Why: Hardware scan is the preferred method for iwlwifi devices for
457 scanning operation. Remove software scan support for all the
458 iwlwifi devices.
459
460Who: Wey-Yi Guy <wey-yi.w.guy@intel.com>
461
462----------------------------
463
464What: Legacy, non-standard chassis intrusion detection interface.
465When: June 2011
466Why: The adm9240, w83792d and w83793 hardware monitoring drivers have
467 legacy interfaces for chassis intrusion detection. A standard
468 interface has been added to each driver, so the legacy interface
469 can be removed.
470Who: Jean Delvare <khali@linux-fr.org>
471
472----------------------------
473
474What: xt_connlimit rev 0
475When: 2012
476Who: Jan Engelhardt <jengelh@medozas.de>
477Files: net/netfilter/xt_connlimit.c
478
479----------------------------
480
481What: ipt_addrtype match include file
482When: 2012
483Why: superseded by xt_addrtype
484Who: Florian Westphal <fw@strlen.de>
485Files: include/linux/netfilter_ipv4/ipt_addrtype.h
486
487----------------------------
488
489What: i2c_driver.attach_adapter
490 i2c_driver.detach_adapter
491When: September 2011
492Why: These legacy callbacks should no longer be used as i2c-core offers
493 a variety of preferable alternative ways to instantiate I2C devices.
494Who: Jean Delvare <khali@linux-fr.org>
495
496----------------------------
497
498What: Support for UVCIOC_CTRL_ADD in the uvcvideo driver
499When: 3.2
500Why: The information passed to the driver by this ioctl is now queried
501 dynamically from the device.
502Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
503
504----------------------------
505
506What: Support for UVCIOC_CTRL_MAP_OLD in the uvcvideo driver
507When: 3.2
508Why: Used only by applications compiled against older driver versions.
509 Superseded by UVCIOC_CTRL_MAP which supports V4L2 menu controls.
510Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
511
512----------------------------
513
514What: Support for UVCIOC_CTRL_GET and UVCIOC_CTRL_SET in the uvcvideo driver
515When: 3.2
516Why: Superseded by the UVCIOC_CTRL_QUERY ioctl.
517Who: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
518
519----------------------------
520
521What: Support for driver specific ioctls in the pwc driver (everything
522 defined in media/pwc-ioctl.h)
523When: 3.3
524Why: This stems from the v4l1 era, with v4l2 everything can be done with
525 standardized v4l2 API calls
526Who: Hans de Goede <hdegoede@redhat.com>
527
528----------------------------
529
530What: Driver specific sysfs API in the pwc driver
531When: 3.3
532Why: Setting pan/tilt should be done with v4l2 controls, like with other
533 cams. The button is available as a standard input device
534Who: Hans de Goede <hdegoede@redhat.com>
535
536----------------------------
537
538What: Driver specific use of pixfmt.priv in the pwc driver
539When: 3.3
540Why: The .priv field was never intended for this; setting a framerate is
541 supported using the standardized S_PARM ioctl
542Who: Hans de Goede <hdegoede@redhat.com>
543
544----------------------------
545
546What: Software emulation of arbitrary resolutions in the pwc driver
547When: 3.3
548Why: The pwc driver claims to support any resolution between 160x120
549 and 640x480, but emulates this by simply drawing a black border
550 around the image. Userspace can draw its own black border if it
551 really wants one.
552Who: Hans de Goede <hdegoede@redhat.com>
553
554----------------------------
555
556What: For VIDIOC_S_FREQUENCY the type field must match the device node's type.
557 If not, return -EINVAL.
558When: 3.2
559Why: It makes no sense to switch the tuner to radio mode by calling
560 VIDIOC_S_FREQUENCY on a video node, or to switch the tuner to tv mode by
561 calling VIDIOC_S_FREQUENCY on a radio node. This is the first step of a
562 move to more consistent handling of tv and radio tuners.
563Who: Hans Verkuil <hans.verkuil@cisco.com>
564
565----------------------------
566
567What: Opening a radio device node will no longer automatically switch the
568 tuner mode from tv to radio.
569When: 3.3
570Why: Just opening a V4L device should not change the state of the hardware
571 like that. It's very unexpected and against the V4L spec. Instead, you
572 switch to radio mode by calling VIDIOC_S_FREQUENCY. This is the second
573 and last step of the move to consistent handling of tv and radio tuners.
574Who: Hans Verkuil <hans.verkuil@cisco.com>
575
576----------------------------
577
578What: g_file_storage driver
579When: 3.8
580Why: This driver has been superseded by g_mass_storage.
581Who: Alan Stern <stern@rowland.harvard.edu>
582
583----------------------------
584
585What: threeg and interface sysfs files in /sys/devices/platform/acer-wmi
586When: 2012
587Why: In 3.0, we can now autodetect the internal 3G device and already have
588 the threeg rfkill device. So, we plan to remove threeg sysfs support,
589 as it is no longer necessary.
590
591 We also plan to remove the interface sysfs file that exposed which ACPI-WMI
592 interface was used by the acer-wmi driver. It will be replaced by
593 an informational log message when acer-wmi initializes.
594Who: Lee, Chun-Yi <jlee@novell.com>
595
596----------------------------
597What: The XFS nodelaylog mount option
598When: 3.3
599Why: The delaylog mode that has been the default since 2.6.39 has proven
600 stable, and the old code is in the way of additional improvements in
601 the log code.
602Who: Christoph Hellwig <hch@lst.de>
diff --git a/Documentation/i2c/muxes/gpio-i2cmux b/Documentation/i2c/muxes/gpio-i2cmux
new file mode 100644
index 00000000000..811cd78d4cd
--- /dev/null
+++ b/Documentation/i2c/muxes/gpio-i2cmux
@@ -0,0 +1,65 @@
1Kernel driver gpio-i2cmux
2
3Author: Peter Korsgaard <peter.korsgaard@barco.com>
4
5Description
6-----------
7
8gpio-i2cmux is an i2c mux driver providing access to I2C bus segments
9from a master I2C bus and a hardware MUX controlled through GPIO pins.
10
11E.G.:
12
13 ---------- ---------- Bus segment 1 - - - - -
14 | | SCL/SDA | |-------------- | |
15 | |------------| |
16 | | | | Bus segment 2 | |
17 | Linux | GPIO 1..N | MUX |--------------- Devices
18 | |------------| | | |
19 | | | | Bus segment M
20 | | | |---------------| |
21 ---------- ---------- - - - - -
22
23SCL/SDA of the master I2C bus is multiplexed to bus segment 1..M
24according to the settings of the GPIO pins 1..N.
25
26Usage
27-----
28
29gpio-i2cmux uses the platform bus, so you need to provide a struct
30platform_device with the platform_data pointing to a struct
31gpio_i2cmux_platform_data with the I2C adapter number of the master
32bus, the number of bus segments to create and the GPIO pins used
33to control it. See include/linux/gpio-i2cmux.h for details.
34
35E.G. something like this for a MUX providing 4 bus segments
36controlled through 3 GPIO pins:
37
38#include <linux/gpio-i2cmux.h>
39#include <linux/platform_device.h>
40
41static const unsigned myboard_gpiomux_gpios[] = {
42 AT91_PIN_PC26, AT91_PIN_PC25, AT91_PIN_PC24
43};
44
45static const unsigned myboard_gpiomux_values[] = {
46 0, 1, 2, 3
47};
48
49static struct gpio_i2cmux_platform_data myboard_i2cmux_data = {
50 .parent = 1,
51 .base_nr = 2, /* optional */
52 .values = myboard_gpiomux_values,
53 .n_values = ARRAY_SIZE(myboard_gpiomux_values),
54 .gpios = myboard_gpiomux_gpios,
55 .n_gpios = ARRAY_SIZE(myboard_gpiomux_gpios),
56 .idle = 4, /* optional */
57};
58
59static struct platform_device myboard_i2cmux = {
60 .name = "gpio-i2cmux",
61 .id = 0,
62 .dev = {
63 .platform_data = &myboard_i2cmux_data,
64 },
65};
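
How a value from the values array selects a bus segment can be sketched
as follows. This is an illustrative model, not the driver's code: the
function name is hypothetical, and it assumes that bit i of the value
drives the i-th GPIO in the gpios array.

```c
#include <stddef.h>

/*
 * Illustrative model only: compute the level each control GPIO would
 * take for a given mux value, assuming bit i of the value drives the
 * i-th GPIO in the list. levels[] must have room for n_gpios entries.
 */
void mux_value_to_levels(unsigned value, size_t n_gpios, int *levels)
{
	size_t i;

	for (i = 0; i < n_gpios; i++)
		levels[i] = (value >> i) & 1;
}
```

Under that assumption, with the three GPIOs above, value 2 sets only
the second GPIO high, and the idle value 4 sets only the third GPIO
high, a combination that none of the four bus segment values uses.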
diff --git a/Documentation/mca.txt b/Documentation/mca.txt
new file mode 100644
index 00000000000..dfd130c2207
--- /dev/null
+++ b/Documentation/mca.txt
@@ -0,0 +1,313 @@
1i386 Micro Channel Architecture Support
2=======================================
3
4MCA support is enabled using the CONFIG_MCA define. A machine with an MCA
5bus will have the kernel variable MCA_bus set, assuming the BIOS feature
6bits are set properly (see arch/i386/boot/setup.S for information on
7how this detection is done).
8
9Adapter Detection
10=================
11
12The ideal MCA adapter detection is done through the use of the
13Programmable Option Select registers. Generic functions for doing
14this have been added in include/linux/mca.h and arch/x86/kernel/mca_32.c.
15Everything needed to detect adapters and read (and write) configuration
16information is there. A number of MCA-specific drivers already use
17this. The typical probe code looks like the following:
18
19 #include <linux/mca.h>
20
21 unsigned char pos2, pos3, pos4, pos5;
22 struct net_device* dev;
23 int slot;
24
25 if( MCA_bus ) {
26 slot = mca_find_adapter( ADAPTER_ID, 0 );
27 if( slot == MCA_NOTFOUND ) {
28 return -ENODEV;
29 }
30 /* optional - see below */
31 mca_set_adapter_name( slot, "adapter name & description" );
32 mca_set_adapter_procfn( slot, dev_getinfo, dev );
33
34 /* read the POS registers. Most devices only use 2 and 3 */
35 pos2 = mca_read_stored_pos( slot, 2 );
36 pos3 = mca_read_stored_pos( slot, 3 );
37 pos4 = mca_read_stored_pos( slot, 4 );
38 pos5 = mca_read_stored_pos( slot, 5 );
39 } else {
40 return -ENODEV;
41 }
42
43 /* extract configuration from pos[2345] and set everything up */
44
45Loadable modules should modify this to test that the specified IRQ and
46IO ports (plus whatever other stuff) match. See 3c523.c for example
47code (actually, smc-mca.c has a slightly more complex example that can
48handle a list of adapter ids).
49
50Keep in mind that devices should never directly access the POS registers
51(via inb(), outb(), etc). While it's generally safe, there is a small
52potential for blowing up hardware when it's done at the wrong time.
53Furthermore, accessing a POS register disables a device temporarily.
54This is usually okay during startup, but do _you_ want to rely on it?
55During initial configuration, mca_init() reads all the POS registers
56into memory. mca_read_stored_pos() accesses that data. mca_read_pos()
57and mca_write_pos() are also available for (safer) direct POS access,
58but their use is _highly_ discouraged. mca_write_pos() is particularly
59dangerous, as it is possible for adapters to be put in inconsistent
60states (e.g. sharing IO addresses) and may result in crashes, toasted
61hardware, and blindness.
62
63User level drivers (such as the AGX X server) can use /proc/mca/pos to
64find adapters (see below).
65
66Some MCA adapters can also be detected via the usual ISA-style device
67probing (many SCSI adapters, for example). This sort of thing is highly
68discouraged. Perfectly good information is available telling you what's
69there, so there's no excuse for messing with random IO ports. However,
70we MCA people still appreciate any ISA-style driver that will work with
71our hardware. You take what you can get...
72
73Level-Triggered Interrupts
74==========================
75
76Because MCA uses level-triggered interrupts, a few problems arise with
77what might best be described as the ISA mindset and its effects on
78drivers. These sorts of problems are expected to become less common as
79more people use shared IRQs on PCI machines.
80
81In general, an interrupt must be acknowledged not only at the ICU (which
82is done automagically by the kernel), but at the device level. In
83particular, IRQ 0 must be reset after a timer interrupt (now done in
84arch/x86/kernel/time.c) or the first timer interrupt hangs the system.
85There were also problems with the 1.3.x floppy drivers, but that seems
86to have been fixed.
87
88IRQs are also shareable, and most MCA-specific devices should be coded
89with shared IRQs in mind.
90
91/proc/mca
92=========
93
94/proc/mca is a directory containing various files for adapters and
95other stuff.
96
97 /proc/mca/pos Straight listing of POS registers
98 /proc/mca/slot[1-8] Information on adapter in specific slot
99 /proc/mca/video Same for integrated video
100 /proc/mca/scsi Same for integrated SCSI
101 /proc/mca/machine Machine information
102
103See Appendix A for a sample.
104
105Device drivers can easily add their own information function for
106specific slots (including integrated ones) via the
107mca_set_adapter_procfn() call. Drivers that support this are ESDI, IBM
108SCSI, and 3c523. If a device is also a module, make sure that the proc
109function is removed in the module cleanup. This will require storing
110the slot information in a private structure somewhere. See the 3c523
111driver for details.
112
113Your typical proc function will look something like this:
114
115 static int
116 dev_getinfo( char* buf, int slot, void* d ) {
117 struct net_device* dev = (struct net_device*) d;
118 int len = 0;
119
120 len += sprintf( buf+len, "Device: %s\n", dev->name );
121 len += sprintf( buf+len, "IRQ: %d\n", dev->irq );
122 len += sprintf( buf+len, "IO Port: %#lx-%#lx\n", ... );
123 ...
124
125 return len;
126 }
127
128Some of the standard MCA information will already be printed, so don't
129bother repeating it. Don't try putting in more than 3K of information.
130
131Enable this function with:
132 mca_set_adapter_procfn( slot, dev_getinfo, dev );
133
134Disable it with:
135 mca_set_adapter_procfn( slot, NULL, NULL );
136
137It is also recommended, even if you don't write a proc function, that you
138set the name of the adapter (e.g. "PS/2 ESDI Controller") via
139mca_set_adapter_name( int slot, char* name ).
140
141MCA Device Drivers
142==================
143
144Currently, there are a number of MCA-specific device drivers.
145
1461) PS/2 SCSI
147 drivers/scsi/ibmmca.c
148 drivers/scsi/ibmmca.h
149 The driver for the IBM SCSI subsystem. Includes both integrated
150 controllers and adapter cards. May require command-line arg
151 "ibmmcascsi=io_port" to force detection of an adapter. If you have a
152 machine with a front-panel display (e.g. model 95), you can use
153 "ibmmcascsi=display" to enable a drive activity indicator.
154
1552) 3c523
156 drivers/net/3c523.c
157 drivers/net/3c523.h
158 3Com 3c523 Etherlink/MC ethernet driver.
159
1603) SMC Ultra/MCA and IBM Adapter/A
161 drivers/net/smc-mca.c
162 drivers/net/smc-mca.h
163 Driver for the MCA version of the SMC Ultra and various other
164 OEM'ed and work-alike cards (Elite, Adapter/A, etc).
165
1664) NE/2
167 drivers/net/ne2.c
168 drivers/net/ne2.h
169 The NE/2 is the MCA version of the NE2000. This may not work
170 with clones that have a different adapter id than the original
171 NE/2.
172
1735) Future Domain MCS-600/700, OEM'd IBM Fast SCSI Adapter/A and
174 Reply Sound Blaster/SCSI (SCSI part)
175 Better support for these cards than the driver for ISA.
176 Supports multiple cards with IRQ sharing.
177
178A boot-time option, scsi-probe, has also been added to reorder SCSI
179host adapters. It tells the kernel the order in which SCSI adapters
180should be detected. Example:
181 scsi-probe=ibmmca,fd_mcs,adaptec1542,buslogic
182
183The serial drivers were modified to support the extended IO port range
184of the typical MCA system (also #ifdef CONFIG_MCA).
185
186The following devices work with existing drivers:
1871) Token-ring
1882) Future Domain SCSI (MCS-600, MCS-700, not MCS-350, OEM'ed IBM SCSI)
1893) Adaptec 1640 SCSI (using the aha1542 driver)
1904) Bustek/Buslogic SCSI (various)
1915) Probably all Arcnet cards.
1926) Some, possibly all, MCA IDE controllers.
1937) 3Com 3c529 (MCA version of 3c509) (patched)
194
1958) Intel EtherExpressMC (patched version)
196 You need to have CONFIG_MCA defined to have EtherExpressMC support.
1979) Reply Sound Blaster/SCSI (SB part) (patched version)
198
199Bugs & Other Weirdness
200======================
201
202NMIs tend to occur with MCA machines because of various hardware
203weirdness, bus timeouts, and many other non-critical things. Some basic
204code to handle them (inspired by the NetBSD MCA code) has been added to
205detect the guilty device, but it's pretty incomplete. If NMIs are a
206persistent problem (on some model 70 or 80s, they occur every couple of
207shell commands), the CONFIG_IGNORE_NMI flag will take care of that.
208
209Various Pentium machines have had serious problems with the FPU test in
210bugs.h. Basically, the machine hangs after the HLT test. This occurs,
211as far as we know, on the Pentium-equipped 85s, 95s, and some PC Servers.
212The PCI/MCA PC 750s are fine as far as I can tell. The ``mca-pentium''
213boot-prompt flag will disable the FPU bug check if this is a problem
214with your machine.
215
216The model 80 has a raft of problems that are just too weird and unique
217to get into here. Some people have no trouble while others have nothing
218but problems. I'd suspect some problems are related to the age of the
219average 80 and accompanying hardware deterioration, although others
220are definitely design problems with the hardware. The problems
221include SCSI controller problems, ESDI controller problems, and serious
222screw-ups in the floppy controller. Oh, and the parallel port is also
223pretty flaky. There were about 5 or 6 different model 80 motherboards
224produced to fix various obscure problems. As far as I know, it's pretty
225much impossible to tell which bugs a particular model 80 has (other than
226triggering them, that is).
227
228Drivers are required for some MCA memory adapters. If you're suddenly
229short a few megs of RAM, this might be the reason. The (I think) Enhanced
230Memory Adapter commonly found on the model 70 is one. There's a very
231alpha driver floating around, but it's pretty ugly (disassembled from
232the DOS driver, actually). See the MCA Linux web page (URL below)
233for more current memory info.
234
235The Thinkpad 700 and 720 will work, but various components are either
236non-functional, flaky, or we don't know anything about them. The
237graphics controller is supposed to be some WD, but we can't get things
238working properly. The PCMCIA slots don't seem to work. Ditto for APM.
239The serial ports work, but detection seems to be flaky.
240
241Credits
242=======
243A whole pile of people have contributed to the MCA code. I'd include
244their names here, but I don't have a list handy. Check the MCA Linux
245home page (URL below) for a perpetually out-of-date list.
246
247=====================================================================
248MCA Linux Home Page: http://www.dgmicro.com/mca/
249
250Christophe Beauregard
251chrisb@truespectra.com
252cpbeaure@calum.csclub.uwaterloo.ca
253
254=====================================================================
255Appendix A: Sample /proc/mca
256
257This is from my model 8595. Slot 1 contains the standard IBM SCSI
258adapter, slot 3 is an Adaptec AHA-1640, slot 5 is a XGA-1 video adapter,
259and slot 7 is the 3c523 Etherlink/MC.
260
261/proc/mca/machine:
262Model Id: 0xf8
263Submodel Id: 0x14
264BIOS Revision: 0x5
265
266/proc/mca/pos:
267Slot 1: ff 8e f1 fc a0 ff ff ff IBM SCSI Adapter w/Cache
268Slot 2: ff ff ff ff ff ff ff ff
269Slot 3: 1f 0f 81 3b bf b6 ff ff
270Slot 4: ff ff ff ff ff ff ff ff
271Slot 5: db 8f 1d 5e fd c0 00 00
272Slot 6: ff ff ff ff ff ff ff ff
273Slot 7: 42 60 ff 08 ff ff ff ff 3Com 3c523 Etherlink/MC
274Slot 8: ff ff ff ff ff ff ff ff
275Video : ff ff ff ff ff ff ff ff
276SCSI : ff ff ff ff ff ff ff ff
277
278/proc/mca/slot1:
279Slot: 1
280Adapter Name: IBM SCSI Adapter w/Cache
281Id: 8eff
282Enabled: Yes
283POS: ff 8e f1 fc a0 ff ff ff
284Subsystem PUN: 7
285Detected at boot: Yes
286
287/proc/mca/slot3:
288Slot: 3
289Adapter Name: Unknown
290Id: 0f1f
291Enabled: Yes
292POS: 1f 0f 81 3b bf b6 ff ff
293
294/proc/mca/slot5:
295Slot: 5
296Adapter Name: Unknown
297Id: 8fdb
298Enabled: Yes
299POS: db 8f 1d 5e fd c0 00 00
300
301/proc/mca/slot7:
302Slot: 7
303Adapter Name: 3Com 3c523 Etherlink/MC
304Id: 6042
305Enabled: Yes
306POS: 42 60 ff 08 ff ff ff ff
307Revision: 0xe
308IRQ: 9
309IO Address: 0x3300-0x3308
310Memory: 0xd8000-0xdbfff
311Transceiver: External
312Device: eth0
313Hardware Address: 02 60 8c 45 c4 2a
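
In the sample above, each Id value matches POS bytes 0 and 1 read as a
little-endian 16-bit number (slot 7: "42 60" gives 6042). A minimal Python
sketch of that relationship, assuming the /proc/mca/pos line layout shown in
the sample. The function names are illustrative and are not part of the kernel.

```python
def pos_bytes(line):
    """Return the eight POS register values from one /proc/mca/pos line."""
    # Everything after the first colon: eight hex bytes, then an optional name.
    _, _, rest = line.partition(":")
    return [int(field, 16) for field in rest.split()[:8]]


def adapter_id(line):
    """Combine POS byte 0 (low) and POS byte 1 (high) into the adapter Id."""
    pos = pos_bytes(line)
    return (pos[1] << 8) | pos[0]


sample = "Slot 7: 42 60 ff 08 ff ff ff ff 3Com 3c523 Etherlink/MC"
print(format(adapter_id(sample), "04x"))  # 6042
```

A slot whose POS bytes are all ff, like slots 2, 4, 6 and 8 in the sample,
would come out as ffff with this sketch.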
diff --git a/Documentation/memory.txt b/Documentation/memory.txt
new file mode 100644
index 00000000000..802efe58647
--- /dev/null
+++ b/Documentation/memory.txt
@@ -0,0 +1,33 @@
1There are several classic problems related to memory on Linux
2systems.
3
4 1) There are some motherboards that will not cache above
5 a certain quantity of memory. If you have one of these
6 motherboards, your system will be SLOWER, not faster
7 as you add more memory. Consider exchanging your
8 motherboard.
9
10All of these problems can be addressed with the "mem=XXXM" boot option
11(where XXX is the size of RAM to use in megabytes).
12It can also tell Linux to use less memory than is actually installed.
13If you use "mem=" on a machine with PCI, consider using "memmap=" to avoid
14physical address space collisions.
15
16See the documentation of your boot loader (LILO, grub, loadlin, etc.) about
17how to pass options to the kernel.
18
19There are other memory problems which Linux cannot deal with. Random
20corruption of memory is usually a sign of serious hardware trouble.
21Try:
22
23 * Reducing memory settings in the BIOS to the most conservative
24 timings.
25
26 * Adding a cooling fan.
27
28 * Not overclocking your CPU.
29
30 * Having the memory tested in a memory tester or exchanged
31 with the vendor. Consider testing it with memtest86 yourself.
32
33 * Exchanging your CPU, cache, or motherboard for one that works.
diff --git a/Documentation/networking/3c359.txt b/Documentation/networking/3c359.txt
new file mode 100644
index 00000000000..dadfe8147ab
--- /dev/null
+++ b/Documentation/networking/3c359.txt
@@ -0,0 +1,58 @@
1
23COM PCI TOKEN LINK VELOCITY XL TOKEN RING CARDS README
3
4Release 0.9.0 - Release
5 Jul 17th 2000 Mike Phillips
6
7 1.2.0 - Final
8 Feb 17th 2002 Mike Phillips
9 Updated for submission to the 2.4.x kernel.
10
11Thanks:
12 Terry Murphy from 3Com for tech docs and support,
13 Adam D. Ligas for testing the driver.
14
15Note:
16 This driver will NOT work with the 3C339 Token Ring cards; you need
17to use the tms380 driver instead.
18
19Options:
20
21The driver accepts three options: ringspeed, pkt_buf_sz and message_level.
22
23These options can be specified differently for each card found.
24
25ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will
26make the card autosense the ringspeed and join at the appropriate speed,
27this will be the default option for most people. 4 or 16 allow you to
28explicitly force the card to operate at a certain speed. The card will fail
29if you try to insert it at the wrong speed. (Although some hubs will allow
30this so be *very* careful). The main purpose for explicitly setting the ring
31speed is for when the card is first on the ring. In autosense mode, if the card
32cannot detect any active monitors on the ring it will open at the same speed as
33its last opening. This can be hazardous if this speed does not match the speed
34you want the ring to operate at.
35
36pkt_buf_sz: This is the initial receive buffer allocation size. This will
37default to 4096 if no value is entered. You may increase performance of the
38driver by setting this to a value larger than the network packet size, although
39the driver now re-sizes buffers based on MTU settings as well.
40
41message_level: Controls level of messages created by the driver. Defaults to 0,
42which only displays start-up and critical messages. Presently any non-zero
43value will display all soft messages as well. NB: This does not turn
44debugging messages on; that must be done by modifying the source code.
45
46Variable MTU size:
47
48The driver can handle a MTU size up to either 4500 or 18000 depending upon
49ring speed. The driver also changes the size of the receive buffers as part
50of the mtu re-sizing, so if you set mtu = 18000, you will need to be able
51to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring
52position = 296,000 bytes of memory space, plus of course anything
53necessary for the tx sk_buff's. Remember this is per card, so if you are
54building routers, gateways, etc., you could start to use a lot of memory
55real fast.
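
The arithmetic above can be checked with a short Python sketch. The 16 ring
positions and the rounded figure of 18500 bytes per position are taken from
the text. The function name is only illustrative, and transmit buffers are
not counted.

```python
RX_RING_ENTRIES = 16  # receive ring positions, as stated in the text


def rx_ring_bytes(bytes_per_position, ring_entries=RX_RING_ENTRIES):
    """Approximate memory used by the receive ring of one card."""
    return ring_entries * bytes_per_position


# mtu = 18000, rounded up to about 18500 bytes per ring position
print(rx_ring_bytes(18500))  # 296000
```

Multiply the result by the number of cards to estimate the total for a
router or gateway.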
56
572/17/02 Mike Phillips
58
diff --git a/Documentation/networking/olympic.txt b/Documentation/networking/olympic.txt
new file mode 100644
index 00000000000..b95b5bf9675
--- /dev/null
+++ b/Documentation/networking/olympic.txt
@@ -0,0 +1,79 @@
1
2IBM PCI Pit/Pit-Phy/Olympic CHIPSET BASED TOKEN RING CARDS README
3
4Release 0.2.0 - Release
5 June 8th 1999 Peter De Schrijver & Mike Phillips
6Release 0.9.C - Release
7 April 18th 2001 Mike Phillips
8
9Thanks:
10Erik De Cock, Adrian Bridgett and Frank Fiene for their
11patience and testing.
12Donald Champion for the cardbus support
13Kyle Lucke for the dma api changes.
14Jonathon Bitner for hardware support.
15Everybody on linux-tr for their continued support.
16
17Options:
18
19The driver accepts four options: ringspeed, pkt_buf_sz,
20message_level and network_monitor.
21
22These options can be specified differently for each card found.
23
24ringspeed: Has one of three settings 0 (default), 4 or 16. 0 will
25make the card autosense the ringspeed and join at the appropriate speed,
26this will be the default option for most people. 4 or 16 allow you to
27explicitly force the card to operate at a certain speed. The card will fail
28if you try to insert it at the wrong speed. (Although some hubs will allow
29this so be *very* careful). The main purpose for explicitly setting the ring
30speed is for when the card is first on the ring. In autosense mode, if the card
31cannot detect any active monitors on the ring it will not open, so you must
32re-init the card at the appropriate speed. Unfortunately at present the only
33way of doing this is rmmod and insmod which is a bit tough if it is compiled
34in the kernel.
35
36pkt_buf_sz: This is the initial receive buffer allocation size. This will
37default to 4096 if no value is entered. You may increase performance of the
38driver by setting this to a value larger than the network packet size, although
39the driver now re-sizes buffers based on MTU settings as well.
40
41message_level: Controls level of messages created by the driver. Defaults to 0,
42which only displays start-up and critical messages. Presently any non-zero
43value will display all soft messages as well. NB: This does not turn
44debugging messages on; that must be done by modifying the source code.
45
46network_monitor: Any non-zero value will provide a quasi network monitoring
47mode. All unexpected MAC frames (beaconing etc.) will be received
48by the driver and the source and destination addresses printed.
49Also an entry will be added in /proc/net called olympic_tr%d, where tr%d
50is the registered device name, i.e. tr0, tr1, etc. This displays low
51level information about the configuration of the ring and the adapter.
52This feature has been designed for network administrators to assist in
53the diagnosis of network / ring problems. (This used to be OLYMPIC_NETWORK_MONITOR,
54but has now changed to allow each adapter to be configured differently and
55to alleviate the necessity to re-compile olympic to turn the option on).
56
57Multi-card:
58
59The driver will detect multiple cards and will work with shared interrupts;
60each card is assigned the next token ring device, i.e. tr0, tr1, tr2. The
61driver should also happily reside in the system with other drivers. It has
62been tested with ibmtr.c running, and I personally have had one Olicom PCI
63card and two IBM olympic cards (all on the same interrupt), all running
64together.
65
66Variable MTU size:
67
68The driver can handle a MTU size up to either 4500 or 18000 depending upon
69ring speed. The driver also changes the size of the receive buffers as part
70of the mtu re-sizing, so if you set mtu = 18000, you will need to be able
71to allocate 16 * (sk_buff with 18000 buffer size) call it 18500 bytes per ring
72position = 296,000 bytes of memory space, plus of course anything
73necessary for the tx sk_buff's. Remember this is per card, so if you are
74building routers, gateways, etc., you could start to use a lot of memory
75real fast.
76
77
786/8/99 Peter De Schrijver and Mike Phillips
79
diff --git a/Documentation/networking/smctr.txt b/Documentation/networking/smctr.txt
new file mode 100644
index 00000000000..9af25b810c1
--- /dev/null
+++ b/Documentation/networking/smctr.txt
@@ -0,0 +1,66 @@
1Text File for the SMC TokenCard TokenRing Linux driver (smctr.c).
2 By Jay Schulist <jschlst@samba.org>
3
4The Linux SMC Token Ring driver works with the SMC TokenCard Elite (8115T)
5ISA and SMC TokenCard Elite/A (8115T/A) MCA adapters.
6
7Latest information on this driver can be obtained on the Linux-SNA WWW site.
8Please point your browser to: http://www.linux-sna.org
9
10This driver is rather simple to use. Select Y to Token Ring adapter support
11in the kernel configuration. A choice for SMC Token Ring adapters will
12appear. This driver supports all SMC ISA/MCA adapters. Choose this
13option. I personally recommend compiling the driver as a module (M), but if
14you would like to compile it statically, answer Y instead.
15
16This driver supports multiple adapters without the need to load multiple copies
17of the driver. You should be able to load up to 7 adapters without any kernel
18modifications, if you are in need of more please contact the maintainer of this
19driver.
20
21Load the driver either by lilo/loadlin or as a module. When built as a module,
22the following command will suffice for most users:
23
24# modprobe smctr
25smctr.c: v1.00 12/6/99 by jschlst@samba.org
26tr0: SMC TokenCard 8115T at Io 0x300, Irq 10, Rom 0xd8000, Ram 0xcc000.
27
28Now just set up the device via ifconfig and set any routes you may have. After
29this you are ready to start sending some tokens.
30
31Errata:
321). For anyone wondering where to pick up the SMC adapters please browse
33 to http://www.smc.com
34
352). If you are the first/only Token Ring Client on a Token Ring LAN, please
36 specify the ringspeed with the ringspeed=[4/16] module option. If no
37 ringspeed is specified the driver will attempt to autodetect the ring
38 speed and/or if the adapter is the first/only station on the ring take
39 the appropriate actions.
40
41 NOTE: Default ring speed is 16MB UTP.
42
433). PnP support for this adapter sucks. I recommend hard setting the
44 IO/MEM/IRQ by the jumpers on the adapter. If this is not possible
45 load the module with the following io=[ioaddr] mem=[mem_addr]
46 irq=[irq_num].
47
48 The following IRQ, IO, and MEM settings are supported.
49
50 IO ports:
51 0x200, 0x220, 0x240, 0x260, 0x280, 0x2A0, 0x2C0, 0x2E0, 0x300,
52 0x320, 0x340, 0x360, 0x380.
53
54 IRQs:
55 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15
56
57 Memory addresses:
58 0xA0000, 0xA4000, 0xA8000, 0xAC000, 0xB0000, 0xB4000,
59 0xB8000, 0xBC000, 0xC0000, 0xC4000, 0xC8000, 0xCC000,
60 0xD0000, 0xD4000, 0xD8000, 0xDC000, 0xE0000, 0xE4000,
61 0xE8000, 0xEC000, 0xF0000, 0xF4000, 0xF8000, 0xFC000
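
The supported values listed above can be checked before loading the module.
The following Python sketch is a hypothetical helper, not something shipped
with the driver. It assumes the module parameters accept hexadecimal values,
and it reuses the Io, Irq and Ram figures from the sample driver output
earlier in this file purely as example values.

```python
# Supported settings, transcribed from the lists above.
IO_PORTS = tuple(range(0x200, 0x380 + 1, 0x20))            # 0x200 .. 0x380
IRQS = (2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15)
MEM_ADDRESSES = tuple(range(0xA0000, 0xFC000 + 1, 0x4000))  # 0xA0000 .. 0xFC000


def smctr_modprobe_command(io, irq, mem):
    """Build a modprobe line after checking each value against the lists."""
    if io not in IO_PORTS:
        raise ValueError("unsupported I/O port: %#x" % io)
    if irq not in IRQS:
        raise ValueError("unsupported IRQ: %d" % irq)
    if mem not in MEM_ADDRESSES:
        raise ValueError("unsupported memory address: %#x" % mem)
    return "modprobe smctr io=%#x mem=%#x irq=%d" % (io, mem, irq)


print(smctr_modprobe_command(0x300, 10, 0xCC000))
```

An unsupported value such as IRQ 6 raises ValueError instead of producing a
command line.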
62
63This driver is under the GNU General Public License. Its Firmware image is
64included as an initialized C-array and is licensed by SMC to the Linux
65users of this driver. However no warranty about its fitness is expressed or
66implied by SMC.
diff --git a/Documentation/networking/tms380tr.txt b/Documentation/networking/tms380tr.txt
new file mode 100644
index 00000000000..1f73e13058d
--- /dev/null
+++ b/Documentation/networking/tms380tr.txt
@@ -0,0 +1,147 @@
1Text file for the Linux SysKonnect Token Ring ISA/PCI Adapter Driver.
2 Text file by: Jay Schulist <jschlst@samba.org>
3
4The Linux SysKonnect Token Ring driver works with the SysKonnect TR4/16(+) ISA,
5SysKonnect TR4/16(+) PCI, SysKonnect TR4/16 PCI, and older revisions of the
6SK NET TR4/16 ISA card.
7
8Latest information on this driver can be obtained on the Linux-SNA WWW site.
9Please point your browser to:
10http://www.linux-sna.org
11
12Many thanks to Christoph Goos for his excellent work on this driver and
13SysKonnect for donating the adapters to Linux-SNA for the testing and
14maintenance of this device driver.
15
16Important information to be noted:
171. Adapters can be slow to open (~20 secs) and close (~5 secs), please be
18 patient.
192. This driver works very well when autoprobing for adapters. Why even
20 think about those nasty io/int/dma settings of modprobe when the driver
21 will do it all for you!
22
23This driver is rather simple to use. Select Y to Token Ring adapter support
24in the kernel configuration. A choice for SysKonnect Token Ring adapters will
25appear. This driver supports all SysKonnect ISA and PCI adapters. Choose this
26option. I personally recommend compiling the driver as a module (M), but if
27you would like to compile it statically, answer Y instead.
28
29This driver supports multiple adapters without the need to load multiple copies
30of the driver. You should be able to load up to 7 adapters without any kernel
31modifications, if you are in need of more please contact the maintainer of this
32driver.
33
34Load the driver either by lilo/loadlin or as a module. When built as a module,
35the following command will suffice for most users:
36
37# modprobe sktr
38
39This will produce output similar to the following: (Output is user specific)
40
41sktr.c: v1.01 08/29/97 by Christoph Goos
42tr0: SK NET TR 4/16 PCI found at 0x6100, using IRQ 17.
43tr1: SK NET TR 4/16 PCI found at 0x6200, using IRQ 16.
44tr2: SK NET TR 4/16 ISA found at 0xa20, using IRQ 10 and DMA 5.
45
46Now just set up the device via ifconfig and set any routes you may have. After
47this you are ready to start sending some tokens.
48
49Errata:
50For anyone wondering where to pick up the SysKonnect adapters please browse
51to http://www.syskonnect.com
52
53This driver is under the GNU General Public License. Its Firmware image is
54included as an initialized C-array and is licensed by SysKonnect to the Linux
55users of this driver. However no warranty about its fitness is expressed or
56implied by SysKonnect.
57
58Below find attached the setting for the SK NET TR 4/16 ISA adapters
59-------------------------------------------------------------------
60
61 ***************************
62 *** C O N T E N T S ***
63 ***************************
64
65 1) Location of DIP-Switch W1
66 2) Default settings
67 3) DIP-Switch W1 description
68
69
70 ==============================================================
71 CHAPTER 1 LOCATION OF DIP-SWITCH
72 ==============================================================
73
74[Figure: outline of the adapter board showing the position of DIP switch W1 and the TMS380C26 chip]
93 ==============================================================
94 CHAPTER 2 DEFAULT SETTINGS
95 ==============================================================
96
97 W1 1 2 3 4 5 6 7 8
98 +------------------------------+
99 | ON X |
100 | OFF X X X X X X X |
101 +------------------------------+
102
103 W1.1 = ON Adapter drives address lines SA17..19
104 W1.2 - 1.5 = OFF BootROM disabled
105 W1.6 - 1.8 = OFF I/O address 0A20h
106
107 ==============================================================
108 CHAPTER 3 DIP SWITCH W1 DESCRIPTION
109 ==============================================================
110
111 +---+---+---+---+---+---+---+---+ ON
112 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
113 +---+---+---+---+---+---+---+---+ OFF
114 |AD | BootROM Addr. | I/O |
115 +-+-+-------+-------+-----+-----+
116 | | |
117 | | +------ 6 7 8
118 | | ON ON ON 1900h
119 | | ON ON OFF 0900h
120 | | ON OFF ON 1980h
121 | | ON OFF OFF 0980h
122 | | OFF ON ON 1b20h
123 | | OFF ON OFF 0b20h
124 | | OFF OFF ON 1a20h
125 | | OFF OFF OFF 0a20h (+)
126 | |
127 | |
128 | +-------- 2 3 4 5
129 | OFF x x x disabled (+)
130 | ON ON ON ON C0000
131 | ON ON ON OFF C4000
132 | ON ON OFF ON C8000
133 | ON ON OFF OFF CC000
134 | ON OFF ON ON D0000
135 | ON OFF ON OFF D4000
136 | ON OFF OFF ON D8000
137 | ON OFF OFF OFF DC000
138 |
139 |
140 +----- 1
141 OFF adapter does NOT drive SA<17..19>
142 ON adapter drives SA<17..19> (+)
143
144
145 (+) means default setting
146
147 ********************************
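
The switch tables above can be expressed as lookup tables. The following
Python sketch is only an illustration of how to read them; the function and
table names are made up here, and the values are transcribed from the W1
description.

```python
# True means the switch is ON, False means OFF.
IO_ADDRESS = {  # keyed by switches (6, 7, 8)
    (True, True, True): 0x1900,
    (True, True, False): 0x0900,
    (True, False, True): 0x1980,
    (True, False, False): 0x0980,
    (False, True, True): 0x1B20,
    (False, True, False): 0x0B20,
    (False, False, True): 0x1A20,
    (False, False, False): 0x0A20,
}
BOOTROM_ADDRESS = {  # keyed by switches (3, 4, 5); used only if switch 2 is ON
    (True, True, True): 0xC0000,
    (True, True, False): 0xC4000,
    (True, False, True): 0xC8000,
    (True, False, False): 0xCC000,
    (False, True, True): 0xD0000,
    (False, True, False): 0xD4000,
    (False, False, True): 0xD8000,
    (False, False, False): 0xDC000,
}


def decode_w1(switches):
    """Decode W1.1 .. W1.8, given as a sequence of eight booleans."""
    if len(switches) != 8:
        raise ValueError("W1 has exactly eight switches")
    sw = [bool(s) for s in switches]
    # Switch 2 OFF disables the BootROM regardless of switches 3 to 5.
    bootrom = BOOTROM_ADDRESS[tuple(sw[2:5])] if sw[1] else None
    return {
        "drives_sa17_19": sw[0],
        "bootrom": bootrom,
        "io": IO_ADDRESS[tuple(sw[5:8])],
    }


default = decode_w1([True] + [False] * 7)  # factory setting from chapter 2
print(default["drives_sa17_19"], default["bootrom"], hex(default["io"]))
```

With the default setting (W1.1 ON, all others OFF) the sketch reports that
the adapter drives SA17..19, the BootROM is disabled, and the I/O address
is 0A20h, which agrees with chapter 2.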
diff --git a/Documentation/nmi_watchdog.txt b/Documentation/nmi_watchdog.txt
new file mode 100644
index 00000000000..bf9f80a9828
--- /dev/null
+++ b/Documentation/nmi_watchdog.txt
@@ -0,0 +1,83 @@
1
2[NMI watchdog is available for x86 and x86-64 architectures]
3
4Is your system locking up unpredictably? No keyboard activity, just
5a frustrating complete hard lockup? Do you want to help us debug
6such lockups? If so, then this document is definitely for you.
7
8Many x86/x86-64 systems have a feature that enables
9us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt,
10which gets executed even if the system is otherwise locked up hard.)
11This can be used to debug hard kernel lockups. By executing periodic
12NMI interrupts, the kernel can monitor whether any CPU has locked up,
13and print out debugging messages if so.
14
15In order to use the NMI watchdog, you need to have APIC support in your
16kernel. For SMP kernels, APIC support gets compiled in automatically. For
17UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
18APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
19features -> IO-APIC support on uniprocessors) in your kernel config.
20CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
21CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
22kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
23may implicitly disable the NMI watchdog.]
24
25For x86-64, the needed APIC is always compiled in.
26
27Using local APIC (nmi_watchdog=2) needs the first performance register, so
28you can't use it for other purposes (such as high precision performance
29profiling.) However, at least oprofile and the perfctr driver disable the
30local APIC NMI watchdog automatically.
31
32To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
33parameter. Eg. the relevant lilo.conf entry:
34
35 append="nmi_watchdog=1"
36
37For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
38For UP machines without an IO-APIC use nmi_watchdog=2; this only works
39for some processor types. If in doubt, boot with nmi_watchdog=1 and
40check the NMI count in /proc/interrupts; if the count is zero then
41reboot with nmi_watchdog=2 and check the NMI count. If it is still
42zero then log a problem, you probably have a processor that needs to be
43added to the nmi code.
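
The check of the NMI count described above can be scripted. The following
Python sketch assumes the usual /proc/interrupts layout, in which the NMI
line starts with "NMI:" followed by one counter per CPU. The function names
and the sample text are illustrative only.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description, if any
                counts.append(int(field))
            return counts
    return []


def watchdog_ticking(interrupts_text):
    """True if at least one CPU has received an NMI."""
    return any(count > 0 for count in nmi_counts(interrupts_text))


sample = """\
           CPU0       CPU1
  0:         42          0   IO-APIC-edge      timer
NMI:          0          0   Non-maskable interrupts
"""
print(nmi_counts(sample), watchdog_ticking(sample))
```

On a running system the text would come from reading /proc/interrupts. A
result of False after booting with nmi_watchdog=1 corresponds to the "count
is zero" case above, where the next step is to retry with nmi_watchdog=2.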
44
45A 'lockup' is the following scenario: if any CPU in the system does not
46execute the periodic local timer interrupt for more than 5 seconds, then
47the NMI handler generates an oops and kills the process. This
48'controlled crash' (and the resulting kernel messages) can be used to
49debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
50the oops will show up automatically. If the kernel produces no messages
51then the system has crashed so hard (eg. hardware-wise) that either it
52cannot even accept NMI interrupts, or the crash has made the kernel
53unable to print messages.
54
55Be aware that when using the local APIC, the frequency of the NMI interrupts
56it generates depends on the system load. The local APIC NMI watchdog,
57lacking a better source, uses the "cycles unhalted" event. As you may
58guess it doesn't tick when the CPU is in the halted state (which happens
59when the system is idle), but if your system locks up on anything but the
60"hlt" processor instruction, the watchdog will trigger very soon as the
61"cycles unhalted" event will happen every clock tick. If it locks up on
62"hlt", then you are out of luck -- the event will not happen at all and the
63watchdog won't trigger. This is a shortcoming of the local APIC watchdog
64-- unfortunately there is no "clock ticks" event that would work all the
65time. The I/O APIC watchdog is driven externally and has no such shortcoming.
66But its NMI frequency is much higher, resulting in a more significant hit
67to the overall system performance.
68
69On x86 nmi_watchdog is disabled by default so you have to enable it with
70a boot time parameter.
71
72It's possible to disable the NMI watchdog in run-time by writing "0" to
73/proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable
74the NMI watchdog. Notice that you still need to use "nmi_watchdog=" parameter
75at boot time.
76
77NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
78on x86 SMP boxes.
79
80[ feel free to send bug reports, suggestions and patches to
81 Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
82 list at <linux-smp@vger.kernel.org> ]
83
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt
new file mode 100644
index 00000000000..ad340205d96
--- /dev/null
+++ b/Documentation/powerpc/phyp-assisted-dump.txt
@@ -0,0 +1,127 @@
1
2 Hypervisor-Assisted Dump
3 ------------------------
4 November 2007
5
6The goal of hypervisor-assisted dump is to enable the dump of
7a crashed system, and to do so from a fully-reset system, and
8to minimize the total elapsed time until the system is back
9in production use.
10
11As compared to kdump or other strategies, hypervisor-assisted
12dump offers several strong, practical advantages:
13
14-- Unlike kdump, the system has been reset, and loaded
15 with a fresh copy of the kernel. In particular,
16 PCI and I/O devices have been reinitialized and are
17 in a clean, consistent state.
18-- As the dump is performed, the dumped memory becomes
19 immediately available to the system for normal use.
20-- After the dump is completed, no further reboots are
21 required; the system will be fully usable, and running
22 in its normal, production mode on its normal kernel.
23
24The above can only be accomplished by coordination with,
25and assistance from the hypervisor. The procedure is
26as follows:
27
28-- When a system crashes, the hypervisor will save
29 the low 256MB of RAM to a previously registered
30 save region. It will also save system state, system
31 registers, and hardware PTE's.
32
33-- After the low 256MB area has been saved, the
34 hypervisor will reset PCI and other hardware state.
35 It will *not* clear RAM. It will then launch the
36 bootloader, as normal.
37
38-- The freshly booted kernel will notice that there
39 is a new node (ibm,dump-kernel) in the device tree,
40 indicating that there is crash data available from
41 a previous boot. It will boot into only 256MB of RAM,
42 reserving the rest of system memory.
43
44-- Userspace tools will parse /sys/kernel/release_region
45 and read /proc/vmcore to obtain the contents of memory,
46 which holds the previous crashed kernel. The userspace
47 tools may copy this info to disk, or network, nas, san,
48 iscsi, etc. as desired.
49
50 For example, the values in /sys/kernel/release_region
51 would look something like this (address-range pairs).
52 CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
53 DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
54
55-- As the userspace tools complete saving a portion of
56 dump, they echo an offset and size to
57 /sys/kernel/release_region to release the reserved
58 memory back to general use.
59
60 An example of this is:
61 "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
62 which will release 256MB at the 1GB boundary.
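
The two examples in the procedure above can be sketched in Python. This is a
rough illustration, not the actual userspace tool: it assumes each entry has
the form 0xADDRESS-0xRANGE exactly as in the sample, and the line-continuation
slash from the sample is left out of the test string. The function names are
made up for this sketch.

```python
import re

PAIR = re.compile(r"(0x[0-9A-Fa-f]+)-(0x[0-9A-Fa-f]+)")


def parse_pairs(text):
    """Return every address-range pair in text as a tuple of two ints."""
    return [(int(a, 16), int(b, 16)) for a, b in PAIR.findall(text)]


def release_line(offset, size):
    """Format the 'offset size' string that is echoed into release_region."""
    return "%#x %#x" % (offset, size)


example = ("CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: "
           "DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A")
print(len(parse_pairs(example)))         # 4
print(release_line(1 << 30, 256 << 20))  # 0x40000000 0x10000000
```

The second call reproduces the echo example: an offset of 1GB (1 << 30) and
a size of 256MB (256 << 20).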
63
64Please note that the hypervisor-assisted dump feature
65is only available on Power6-based systems with recent
66firmware versions.
67
68Implementation details:
69----------------------
70
71During boot, a check is made to see if firmware supports
72this feature on this particular machine. If it does, then
73we check to see if an active dump is waiting for us. If yes
74then everything but 256 MB of RAM is reserved during early
75boot. This area is released once we collect a dump from user
76land scripts that are run. If there is dump data, then
77the /sys/kernel/release_region file is created, and
78the reserved memory is held.
79
80If there is no waiting dump data, then only the highest
81256MB of the ram is reserved as a scratch area. This area
82is *not* released: this region will be kept permanently
83reserved, so that it can act as a receptacle for a copy
84of the low 256MB in the case a crash does occur. See,
85however, "open issues" below, as to whether
86such a reserved region is really needed.
87
88Currently the dump will be copied from /proc/vmcore to
89a new file upon user intervention. The starting address
90to be read and the range for each data point are provided
91in /sys/kernel/release_region.
92
93The tools to examine the dump will be the same as the ones
94used for kdump.
95
96General notes:
97--------------
98Security: please note that there are potential security issues
99with any sort of dump mechanism. In particular, plaintext
100(unencrypted) data, and possibly passwords, may be present in
101the dump data. Userspace tools must take adequate precautions to
102preserve security.
103
104Open issues/ToDo:
105------------
106 o The various code paths that tell the hypervisor that a crash
107 occurred, vs. it simply being a normal reboot, should be
108 reviewed, and possibly clarified/fixed.
109
110 o Instead of using /sys/kernel, should there be a /sys/dump
111 instead? There is a dump_subsys being created by the s390 code,
112 perhaps the pseries code should use a similar layout as well.
113
114 o Is reserving a 256MB region really required? The goal of
115 reserving a 256MB scratch area is to make sure that no
116 important crash data is clobbered when the hypervisor
117 saves low mem to the scratch area. But, if one could assure
118 that nothing important is located in some 256MB area, then
119 it would not need to be reserved. Something that can be
120 improved in subsequent versions.
121
122 o Still working with the kdump team to integrate this with kdump,
123 some work remains but this would not affect the current
124 patches.
125
126 o Still need to write a shell script, to copy the dump away.
127 Currently I am parsing it manually.
diff --git a/Documentation/prio_tree.txt b/Documentation/prio_tree.txt
new file mode 100644
index 00000000000..3aa68f9a117
--- /dev/null
+++ b/Documentation/prio_tree.txt
@@ -0,0 +1,107 @@
1The prio_tree.c code indexes vmas using 3 different indexes:
2 * heap_index = vm_pgoff + vm_size_in_pages : end_vm_pgoff
3 * radix_index = vm_pgoff : start_vm_pgoff
4 * size_index = vm_size_in_pages
5
6A regular radix-priority-search-tree indexes vmas using only heap_index and
7radix_index. The conditions for indexing are:
8 * ->heap_index >= ->left->heap_index &&
9 ->heap_index >= ->right->heap_index
10 * if (->heap_index == ->left->heap_index)
11 then ->radix_index < ->left->radix_index;
12 * if (->heap_index == ->right->heap_index)
13 then ->radix_index < ->right->radix_index;
14 * nodes are hashed to left or right subtree using radix_index
15 similar to a pure binary radix tree.
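
The heap conditions above can be tried out with a small sketch. It models a
vma as the triple [radix_index, size_index, heap_index] used in the example
tree of this document. The names Vma, make_vma, child_ok and node_ok are
made up for this illustration and are not taken from prio_tree.c. The radix
hashing to the left or right subtree is not modelled.

```python
from collections import namedtuple

# Each vma is written [radix_index, size_index, heap_index].
Vma = namedtuple("Vma", "radix size heap")


def make_vma(vm_pgoff, vm_size_in_pages):
    """Build the three indexes from the start offset and the size."""
    return Vma(vm_pgoff, vm_size_in_pages, vm_pgoff + vm_size_in_pages)


def child_ok(parent, child):
    """Check the heap condition between a node and one of its children."""
    if child is None:
        return True
    if parent.heap > child.heap:
        return True
    # Equal heap indexes: the parent must have the smaller radix index.
    return parent.heap == child.heap and parent.radix < child.radix


def node_ok(parent, left, right):
    """Check both children of a node."""
    return child_ok(parent, left) and child_ok(parent, right)


# Top of the example tree: [0,7,7] with children [1,6,7] and [4,3,7].
root = make_vma(0, 7)
left, right = make_vma(1, 6), make_vma(4, 3)
```

Swapping a parent with its child, for example placing [1,6,7] above
[0,7,7], violates the tie-break rule because the heap indexes are equal but
the parent's radix index is larger.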
16
17A regular radix-priority-search-tree helps to store and query
18intervals (vmas). However, a regular radix-priority-search-tree is only
19suitable for storing vmas with different radix indices (vm_pgoff).
20
21Therefore, prio_tree.c extends the regular radix-priority-search-tree
22to handle many vmas with the same vm_pgoff. Such vmas are handled in
232 different ways: 1) All vmas with the same radix _and_ heap indices are
24linked using vm_set.list, 2) if there are many vmas with the same radix
25index, but different heap indices and if the regular radix-priority-search
26tree cannot index them all, we build an overflow-sub-tree that indexes such
27vmas using heap and size indices instead of heap and radix indices. For
28example, in the figure below some vmas with vm_pgoff = 0 (zero) are
29indexed by regular radix-priority-search-tree whereas others are pushed
30into an overflow-subtree. Note that all vmas in an overflow-sub-tree have
31the same vm_pgoff (radix_index) and if necessary we build different
32overflow-sub-trees to handle each possible radix_index. For example,
33in the figure we have 3 overflow-sub-trees corresponding to radix indices
340, 2, and 4.
35
36In the final tree the first few (prio_tree_root->index_bits) levels
37are indexed using heap and radix indices whereas the overflow-sub-trees below
38those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are
39indexed using heap and size indices. In overflow-sub-trees the size_index
40is used for hashing the nodes to appropriate places.
41
42Now, an example prio_tree:
43
44 vmas are represented as [radix_index, size_index, heap_index]
45 i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff]
46
47level prio_tree_root->index_bits = 3
48-----
49 _
50 0 [0,7,7] |
51 / \ |
52 ------------------ ------------ | Regular
53 / \ | radix priority
54 1 [1,6,7] [4,3,7] | search tree
55 / \ / \ |
56 ------- ----- ------ ----- | heap-and-radix
57 / \ / \ | indexed
58 2 [0,6,6] [2,5,7] [5,2,7] [6,1,7] |
59 / \ / \ / \ / \ |
60 3 [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7] |
61 / / / _
62 / / / _
63 4 [0,4,4] [2,3,5] [4,1,5] |
64 / / / |
65 5 [0,3,3] [2,2,4] [4,0,4] | Overflow-sub-trees
66 / / |
67 6 [0,2,2] [2,1,3] | heap-and-size
68 / / | indexed
69 7 [0,1,1] [2,0,2] |
70 / |
71 8 [0,0,0] |
72 _
73
74Note that we use prio_tree_root->index_bits to optimize the height
75of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is
76set according to the maximum end_vm_pgoff mapped, we are sure that all
77bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore,
78we only use the first prio_tree_root->index_bits as radix_index.
79Whenever index_bits is increased in prio_tree_expand, we shuffle the tree
80to make sure that the first prio_tree_root->index_bits levels of the tree
81are indexed properly using heap and radix indices.
82
83We do not optimize the height of overflow-sub-trees using index_bits.
84The reason is: there can be many such overflow-sub-trees and all of
85them have to be shuffled whenever the index_bits increases. This may involve
86walking the whole prio_tree in prio_tree_insert->prio_tree_expand code
87path which is not desirable. Hence, we do not optimize the height of the
88heap-and-size indexed overflow-sub-trees using prio_tree->index_bits.
89Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits
90of size_index. This may lead to skewed sub-trees because most of the
91higher significant bits of the size_index are likely to be 0 (zero). In
92the example above, all 3 overflow-sub-trees are skewed. This may marginally
93affect the performance. However, processes rarely map many vmas with the
94same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally
95do not require overflow-sub-trees to index all vmas.
96
97From the above discussion it is clear that the maximum height of
98a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG.
99However, in most of the common cases we do not need overflow-sub-trees,
100so the tree height in the common cases will be prio_tree_root->index_bits.
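
The height bound can be worked through with a few lines of arithmetic. The
sketch assumes that index_bits is the smallest bit count (at least 1) able
to hold the maximum end_vm_pgoff, which matches index_bits = 3 for the
maximum heap_index of 7 in the example tree. The function names and the
64-bit value of BITS_PER_LONG are assumptions for this illustration.

```python
BITS_PER_LONG = 64  # assumption: 64-bit kernel; 32 on 32-bit builds


def index_bits_for(max_end_vm_pgoff):
    """Smallest bit count (at least 1) that can hold max_end_vm_pgoff."""
    return max(1, max_end_vm_pgoff.bit_length())


def max_height(index_bits, bits_per_long=BITS_PER_LONG):
    """Worst case: the radix levels plus a fully skewed overflow-sub-tree."""
    return index_bits + bits_per_long
```

For the example tree this gives index_bits_for(7) = 3, so the common-case
height is 3 and the worst case is 3 + BITS_PER_LONG.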
101
102It is fair to mention here that prio_tree_root->index_bits
103is increased on demand; however, index_bits is not decreased when
104vmas are removed from the prio_tree. That's tricky to do. Hence, it's
105left as a homework problem.
106
107
diff --git a/Documentation/scsi/ibmmca.txt b/Documentation/scsi/ibmmca.txt
new file mode 100644
index 00000000000..ac41a9fcac7
--- /dev/null
+++ b/Documentation/scsi/ibmmca.txt
@@ -0,0 +1,1402 @@
1
2 -=< The IBM Microchannel SCSI-Subsystem >=-
3
4 for the IBM PS/2 series
5
6 Low Level Software-Driver for Linux
7
8 Copyright (c) 1995 Strom Systems, Inc. under the terms of the GNU
9 General Public License. Originally written by Martin Kolinek, December 1995.
10 Officially modified and maintained by Michael Lang since January 1999.
11
12 Version 4.0a
13
14 Last update: January 3, 2001
15
16 Before you Start
17 ----------------
18 This is the common README.ibmmca file for all driver releases of the
19 IBM MCA SCSI driver for Linux. Please note, that driver releases 4.0
20 or newer do not work with kernel versions older than 2.4.0, while driver
21 versions older than 4.0 do not work with kernels 2.4.0 or later! If you
22 try to compile your kernel with the wrong driver source, the
23 compilation is aborted and you get a corresponding error message. This is
24 no bug in the driver; it prevents you from using the wrong source code
25 with the wrong kernel version.
26
27 Authors of this Driver
28 ----------------------
29 - Chris Beauregard (improvement of the SCSI-device mapping by the driver)
30 - Martin Kolinek (origin, first release of this driver)
31 - Klaus Kudielka (multiple SCSI-host management/detection, adaption to
32 Linux Kernel 2.1.x, module support)
33 - Michael Lang (assigning original pun/lun mapping, dynamical ldn
34 assignment, rewritten adapter detection, this file,
35 patches, official driver maintenance and subsequent
36 debugging, related with the driver)
37
38 Table of Contents
39 -----------------
40 1 Abstract
41 2 Driver Description
42 2.1 IBM SCSI-Subsystem Detection
43 2.2 Physical Units, Logical Units, and Logical Devices
44 2.3 SCSI-Device Recognition and dynamical ldn Assignment
45 2.4 SCSI-Device Order
46 2.5 Regular SCSI-Command-Processing
47 2.6 Abort & Reset Commands
48 2.7 Disk Geometry
49 2.8 Kernel Boot Option
50 2.9 Driver Module Support
51 2.10 Multiple Hostadapter Support
52 2.11 /proc/scsi-Filesystem Information
53 2.12 /proc/mca-Filesystem Information
54 2.13 Supported IBM SCSI-Subsystems
55 2.14 Linux Kernel Versions
56 3 Code History
57 4 To do
58 5 Users' Manual
59 5.1 Commandline Parameters
60 5.2 Troubleshooting
61 5.3 Bug reports
62 5.4 Support WWW-page
63 6 References
64 7 Credits to
65 7.1 People
66 7.2 Sponsors & Supporters
67 8 Trademarks
68 9 Disclaimer
69
70 * * *
71
72 1 Abstract
73 ----------
74 This README-file describes the IBM SCSI-subsystem low level driver for
75 Linux. The descriptions which were formerly kept in the source code have
76 been moved into this file to improve the code's readability. The driver
77 description has been updated, as most of the former description was already
78 quite outdated. The history of the driver development is also kept
79 here. Multiple historical developments have been summarized to shorten the
80 text a bit. At the end of this file you can find a small manual for
81 this driver and hints to get it running on your machine.
82
83 2 Driver Description
84 --------------------
85 2.1 IBM SCSI-Subsystem Detection
86 --------------------------------
87 This is done in the ibmmca_detect() function. It first checks whether the
88 Microchannel-bus support is enabled, as the IBM SCSI-subsystem needs the
89 Microchannel. In a next step, a free interrupt is chosen and the main
90 interrupt handler is connected to it to handle answers of the SCSI-
91 subsystem(s). If the F/W SCSI-adapter is forced by the BIOS to use IRQ11
92 instead of IRQ14, IRQ11 is used for the IBM SCSI-2 F/W adapter. In a
93 further step it is checked, if the adapter gets detected by force from
94 the kernel commandline, where the I/O port and the SCSI-subsystem id can
95 be specified. The next step checks if there is an integrated SCSI-subsystem
96 installed. This register area is fixed through all IBM PS/2 MCA-machines
97 and appears as something like a virtual slot 10 of the MCA-bus. On most
98 PS/2 machines, the POS registers of slot 10 are set to 0xff or 0x00 if no
99 integrated SCSI-controller is available. But on certain PS/2s, like model
100 9595, this slot 10 is used to store other information, which at an earlier
101 stage confused the driver and resulted in the detection of a ghost SCSI-subsystem.
102 If POS-register 2 and 3 are not 0x00 and not 0xff, but all other POS
103 registers are either 0xff or 0x00, there must be an integrated SCSI-
104 subsystem present and it will be registered as IBM Integrated SCSI-
105 Subsystem. The next step checks, if there is a slot-adapter installed on
106 the MCA-bus. To get this, the first two POS-registers, that represent the
107 adapter ID are checked. If they fit to one of the ids, stored in the
108 adapter list, a SCSI-subsystem is assumed to be found in a slot and will be
109 registered. This check is done through all possible MCA-bus slots to allow
110 more than one SCSI-adapter to be present in the PS/2-system and this is
111 already the first point of problems. Looking into the technical reference
112 manual for the IBM PS/2 common interfaces, the POS2 register must have
113 different interpretation of its single bits to avoid overlapping I/O
114 regions. While one can assume that the integrated subsystem has a fixed
115 I/O-address at 0x3540 - 0x3547, further installed IBM SCSI-adapters must
116 use a different I/O-address. This is expressed by bits 1 to 3 of POS2
117 (value multiplied by 8, plus 0x3540). Bits 2 and 3 are reserved for the integrated
118 subsystem, but not for the adapters! The following list shows, how the
119 bits of POS2 and POS3 should be interpreted.
120
121 The POS2-register of all PS/2 models' integrated SCSI-subsystems has the
122 following interpretation of bits:
123 Bit 7 - 4 : Chip Revision ID (Release)
124 Bit 3 - 2 : Reserved
125 Bit 1 : 8k NVRAM Disabled
126 Bit 0 : Chip Enable (EN-Signal)
127 The POS3-register is interpreted as follows (for most IBM SCSI-subsys.):
128 Bit 7 - 5 : SCSI ID
129 Bit 4 - 0 : Reserved = 0
130 The slot-adapters have different interpretation of these bits. The IBM SCSI
131 adapter (w/Cache) and the IBM SCSI-2 F/W adapter use the following
132 interpretation of the POS2 register:
133 Bit 7 - 4 : ROM Segment Address Select
134 Bit 3 - 1 : Adapter I/O Address Select (*8+0x3540)
135 Bit 0 : Adapter Enable (EN-Signal)
136 and for the POS3 register:
137 Bit 7 - 5 : SCSI ID
138 Bit 4 : Fairness Enable (SCSI ID3 f. F/W)
139 Bit 3 - 0 : Arbitration Level
140 The most modern product of the series is the IBM SCSI-2 F/W adapter. It
141 allows dual-bus SCSI and SCSI-wide addressing, which means PUNs may be
142 between 0 and 15. Here, Bit 4 is the high-order bit of the 4-bit wide
143 adapter PUN expression. In short, this means that IBM PS/2 machines
144 can only support 1 single integrated subsystem by default. Additional
145 slot-adapters get ports assigned by the automatic configuration tool.
146
147 One day I found a patch in ibmmca_detect() forcing the I/O-address to be
148 0x3540 for integrated SCSI-subsystems. A remark had been placed there that on
149 integrated IBM SCSI-subsystems of model 56, the POS2 register was showing 5.
150 This means that for these models, POS2 really has to be interpreted
151 according to the technical reference guide. In this case, bit 2 (value 4) is
152 a reserved bit and may not be interpreted. These differences between the
153 adapters and the integrated controllers are taken into account by the
154 detection routine of the driver in versions later than 3.0g.
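
The bit layouts listed above can be decoded as in the following sketch. The
function and key names are made up for this illustration and are not taken
from ibmmca.c. Only the bit positions and the 0x3540 base come from the
text.

```python
IO_BASE = 0x3540  # fixed base address of the integrated subsystem


def decode_adapter_pos2(pos2):
    """Slot-adapter interpretation of POS2."""
    io_select = (pos2 >> 1) & 0x7  # bits 3 - 1
    return {
        "rom_segment_select": (pos2 >> 4) & 0xF,  # bits 7 - 4
        "io_base": IO_BASE + io_select * 8,
        "enabled": bool(pos2 & 0x1),  # bit 0
    }


def decode_integrated_pos2(pos2):
    """Integrated-subsystem interpretation of POS2 (bits 3 - 2 reserved)."""
    return {
        "chip_revision": (pos2 >> 4) & 0xF,  # bits 7 - 4
        "nvram_disabled": bool(pos2 & 0x2),  # bit 1
        "io_base": IO_BASE,  # always fixed, reserved bits are ignored
        "enabled": bool(pos2 & 0x1),  # bit 0
    }


def decode_adapter_pos3(pos3):
    """Slot-adapter interpretation of POS3."""
    return {
        "scsi_id": (pos3 >> 5) & 0x7,  # bits 7 - 5
        "fairness": bool(pos3 & 0x10),  # bit 4
        "arbitration_level": pos3 & 0xF,  # bits 3 - 0
    }
```

With the POS2 value 5 reported for model 56, the slot-adapter
interpretation would yield I/O base 0x3550, while the integrated
interpretation keeps 0x3540. This shows why the two cases have to be
told apart.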
155
156 Every time a SCSI-subsystem is discovered, the ibmmca_register() function
157 is called. This function checks first, if the requested area for the I/O-
158 address of this SCSI-subsystem is still available and assigns this I/O-
159 area to the SCSI-subsystem. There are always 8 sequential I/O-addresses
160 taken for each individual SCSI-subsystem found, which are:
161
162 Offset  Type                          Permissions
163 0       Command Interface Register 1  Read/Write
164 1       Command Interface Register 2  Read/Write
165 2       Command Interface Register 3  Read/Write
166 3       Command Interface Register 4  Read/Write
167 4       Attention Register            Read/Write
168 5       Basic Control Register        Read/Write
169 6       Interrupt Status Register     Read
170 7       Basic Status Register         Read
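
The register layout can be written down as a small table of offsets. The
enum member names below are invented for this sketch. Only the offsets and
the read-only status of the last two registers are taken from the table.

```python
from enum import IntEnum


class Reg(IntEnum):
    """Offsets of the eight subsystem registers relative to the I/O base."""
    CMD_1 = 0
    CMD_2 = 1
    CMD_3 = 2
    CMD_4 = 3
    ATTENTION = 4
    BASIC_CONTROL = 5
    INTR_STATUS = 6
    BASIC_STATUS = 7


# The two status registers can only be read.
READ_ONLY = frozenset({Reg.INTR_STATUS, Reg.BASIC_STATUS})


def port(io_base, reg):
    """Absolute I/O port of a register."""
    return io_base + int(reg)


def io_range(io_base):
    """First and last port occupied by one subsystem."""
    return (io_base, io_base + len(Reg) - 1)
```

For the integrated subsystem at 0x3540, io_range() gives 0x3540 to 0x3547,
the range named in section 2.1.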
171
172 After the I/O-address range is assigned, the host-adapter is assigned
173 to a local structure which keeps all adapter information needed for the
174 driver itself and the mid- and higher-level SCSI-drivers. The SCSI pun/lun
175 and the adapters' ldn tables are initialized and get probed afterwards by
176 the check_devices() function. If no further adapters are found,
177 ibmmca_detect() quits.
178
179 2.2 Physical Units, Logical Units, and Logical Devices
180 ------------------------------------------------------
181 There can be up to 56 devices on the SCSI bus (besides the adapter):
182 there are up to 7 "physical units" (each identified by physical unit
183 number or pun, also called the scsi id, this is the number you select
184 with hardware jumpers), and each physical unit can have up to 8
185 "logical units" (each identified by logical unit number, or lun,
186 between 0 and 7). The IBM SCSI-2 F/W adapter offers this on up to two
187 busses and provides support for 30 logical devices at the same time, where
188 in wide-addressing mode you can have 16 puns with 32 luns on each device.
189 This section describes the handling of devices on non-F/W adapters.
190 Just imagine, that you can have 16 * 32 = 512 devices on a F/W adapter
191 which means a lot of possible devices for such a small machine.
192
193 Typically the adapter has pun=7, so puns of other physical units
194 are between 0 and 6(15). On a wide-adapter a pun higher than 7 is
195 possible, but is normally not used. Almost all physical units have only
196 one logical unit, with lun=0. A CD-ROM jukebox would be an example of a
197 physical unit with more than one logical unit.
198
199 The embedded microprocessor of the IBM SCSI-subsystem hides the complex
200 two-dimensional (pun,lun) organization from the operating system.
201 When the machine is powered-up (or rebooted), the embedded microprocessor
202 checks, on its own, all 56 possible (pun,lun) combinations, and the first
203 15 devices found are assigned into a one-dimensional array of so-called
204 "logical devices", identified by "logical device numbers" or ldn. The last
205 ldn=15 is reserved for the subsystem itself. Wide adapters may have
206 to check up to 15 * 8 = 120 pun/lun combinations.
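
The device counts quoted in this section follow from simple enumeration, as
the sketch below shows. The function name and the default of excluding the
adapter's own pun are assumptions for this illustration.

```python
def pun_lun_combinations(num_puns, num_luns, adapter_pun=None):
    """List all (pun, lun) pairs, skipping the adapter's own pun if given."""
    return [
        (pun, lun)
        for pun in range(num_puns)
        if pun != adapter_pun
        for lun in range(num_luns)
    ]


narrow = pun_lun_combinations(8, 8, adapter_pun=7)   # 7 * 8 combinations
wide = pun_lun_combinations(16, 8, adapter_pun=7)    # 15 * 8 combinations
fw_total = pun_lun_combinations(16, 32)              # 16 * 32 combinations
```

The three lists have 56, 120 and 512 entries, matching the figures given in
the text.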
207
208 2.3 SCSI-Device Recognition and Dynamical ldn Assignment
209 --------------------------------------------------------
210 One consequence of information hiding is that the real (pun,lun)
211 numbers are also hidden. The two possibilities to get around this problem
212 are to offer fake pun/lun combinations to the operating system or to
213 delete the whole mapping of the adapter and to reassign the ldns, using
214 the immediate assign command of the SCSI-subsystem for probing through
215 all possible pun/lun combinations. An ldn is a "logical device number"
216 which is used by IBM SCSI-subsystems to access some valid SCSI-device.
217 At the beginning of the development of this driver, the following approach
218 was used:
219
220 First, the driver checked the ldn's (0 to 6) to find out which ldn's
221 have devices assigned. This was done by the functions check_devices() and
222 device_exists(). The interrupt handler has a special paragraph of code
223 (see local_checking_phase_flag) to assist in the checking. Assume, for
224 example, that three logical devices were found assigned at ldn 0, 1, 2.
225 These are presented to the upper layer of Linux SCSI driver
226 as devices with bogus (pun, lun) equal to (0,0), (1,0), (2,0).
227 On the other hand, if the upper layer issues a command to device
228 say (4,0), this driver returns DID_NO_CONNECT error.
229
230 In a second step of the driver development, the following improvement has
231 been applied: The first approach limited the number of devices to 7, far
232 fewer than the 15 that it could use, then it just mapped ldn ->
233 (ldn/8,ldn%8) for pun,lun. We ended up with a real mishmash of puns
234 and luns, but it all seemed to work.
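
The second-step mapping ldn -> (ldn/8,ldn%8) is a plain integer division
with remainder. A minimal sketch, with a made-up function name:

```python
def bogus_pun_lun(ldn):
    """Map an ldn to the faked (pun, lun) pair used by the second approach."""
    return divmod(ldn, 8)
```

So ldns 0 to 7 appear as luns 0 to 7 of pun 0, and ldns 8 to 14 as luns 0
to 6 of pun 1, regardless of where the devices really sit on the bus.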
235
236 The latest development, which is implemented in driver version 3.0
237 and later, realizes the device recognition in the following way:
238 The physical SCSI-devices on the SCSI-bus are probed via immediate_assign-
239 and device_inquiry-commands, all of which is implemented in a completely
240 rewritten check_devices() subroutine. This delivers an exact map of the physical
241 SCSI-world that is now stored in the get_scsi[][]-array. This means
242 that the once hidden pun,lun assignment is now known to this driver.
243 It no longer relies on the default settings of the subsystem and maps all
244 ldns to existing pun,lun manually. This assures full control of the ldn
245 mapping and allows dynamical remapping of ldns to different pun,lun, if
246 there are more SCSI-devices installed than ldns available (n>15). The
247 ldns from 0 to 6 get 'hardwired' by this driver to puns 0 to 7 at lun=0,
248 excluding the pun of the subsystem. This assures, that at least simple
249 SCSI-installations have optimum access-speed and are not touched by
250 dynamical remapping. The ldns 7 to 14 are put to existing devices with
251 lun>0 or to non-existing devices, in order to satisfy the subsystem, if
252 there are fewer than 15 SCSI-devices connected. In the case of more than 15
253 devices, the dynamical mapping becomes active. If the get_scsi[][] reports a
254 device to be existent, but it has no ldn assigned, it gets an ldn out of 7
255 to 14. The numbers are assigned in cyclic order, therefore it takes 8
256 dynamical reassignments on the SCSI-devices until a certain device
257 loses its ldn again. This assures that dynamical remapping is avoided
258 during intense I/O between up to 15 SCSI-devices (means pun,lun
259 combinations). A further advantage of this method is that people who
260 build their kernel without probing on all luns will get what they expect,
261 because the driver just won't assign everything with lun>0 when
262 multiple lun probing is inactive.
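
The cyclic assignment of ldns 7 to 14 can be modelled as below. This is a
simplified sketch under stated assumptions, not the driver's code: the
class LdnMap and its fields are invented, the subsystem is assumed to sit
at pun 7, and the initial filling of ldns 7 to 14 during probing is left
out.

```python
FIRST_DYNAMIC_LDN = 7
LAST_DYNAMIC_LDN = 14
NUM_DYNAMIC = LAST_DYNAMIC_LDN - FIRST_DYNAMIC_LDN + 1  # 8 slots


class LdnMap:
    def __init__(self, subsystem_pun=7):
        # Hardwire ldns 0..6 to the seven remaining puns at lun 0.
        puns = [p for p in range(8) if p != subsystem_pun]
        self.ldn_to_dev = {ldn: (pun, 0) for ldn, pun in enumerate(puns)}
        self.next_dynamic = FIRST_DYNAMIC_LDN
        self.reassignments = 0

    def ldn_for(self, pun, lun):
        """Return the ldn of a device, assigning one dynamically if needed."""
        for ldn, dev in self.ldn_to_dev.items():
            if dev == (pun, lun):
                return ldn
        # No ldn yet: take the next dynamic slot in cyclic order,
        # overwriting whatever device held that slot before.
        ldn = self.next_dynamic
        self.ldn_to_dev[ldn] = (pun, lun)
        offset = (ldn + 1 - FIRST_DYNAMIC_LDN) % NUM_DYNAMIC
        self.next_dynamic = FIRST_DYNAMIC_LDN + offset
        self.reassignments += 1
        return ldn
```

In this model a device placed on ldn 7 keeps it until eight further
dynamic assignments have wrapped the counter around, which is the
behaviour described for the cyclic order.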
263
264 2.4 SCSI-Device Order
265 ---------------------
266 Because of the now correct recognition of physical pun,lun, and
267 their report to mid-level- and higher-level-drivers, the newly reported puns
268 can be different from the old, faked puns. Therefore, Linux may
269 change /dev/sdXXX assignments and prompt you for corrupted superblock
270 repair at boot time. In this case DO NOT PANIC, YOUR DISKS ARE STILL OK!!!
271 You have to reboot (CTRL-D) with an old kernel and set the /etc/fstab-file
272 entries right. After that, the system should come up as error-free as before.
273 If your boot-partition is not coming up, also edit the /etc/lilo.conf-file
274 in a Linux session booted on old kernel and run lilo before reboot. Check
275 lilo.conf anyway to get boot on other partitions with foreign OSes right
276 again. But there exists a feature of this driver that allows you to change
277 the assignment order of the SCSI-devices by flipping the PUN-assignment.
278 See the next paragraph for a description.
279
280 The problem is that Linux does not assign the SCSI-devices in the
281 way described in the ANSI-SCSI-standard. Linux assigns /dev/sda to
282 the device with the lowest id, starting at id 0. But the first drive should be at id 6,
283 because for historical reasons, the drive at id 6 has, by hardware, the highest
284 priority and a drive at id 0 the lowest. IBM was one of the rare producers
285 whose BIOS assigns drives according to the ANSI-SCSI-standard. Most
286 other producers' BIOS does not (I think even Adaptec-BIOS). The
287 IBMMCA_SCSI_ORDER_STANDARD flag, which you set while configuring the
288 kernel, lets you choose the preferred way of SCSI-device-assignment.
289 Defining this flag results in Linux determining the devices in the
290 same order as DOS and OS/2 do on your MCA-machine. This is also standard
291 on most industrial computers and OSes, like e.g. OS-9. Leaving this flag
292 undefined will get your devices ordered in the default way of Linux. See
293 also the remarks of Chris Beauregard from Dec 15, 1997 and the followups
294 in section 3.
295
296 2.5 Regular SCSI-Command-Processing
297 -----------------------------------
298 Only three functions get involved: ibmmca_queuecommand(), issue_cmd(),
299 and interrupt_handler().
300
301 The upper layer issues a scsi command by calling function
302 ibmmca_queuecommand(). This function fills a "subsystem control block"
303 (scb) and calls a local function issue_cmd(), which writes a scb
304 command into subsystem I/O ports. Once the scb command is carried out,
305 the interrupt_handler() is invoked. If a device is determined to be
306 existent and it has no ldn assigned, it gets one dynamically.
307 This is done entirely in ibmmca_queuecommand().
308
309 2.6 Abort & Reset Commands
310 --------------------------
311 These are implemented with busy waiting for an interrupt to arrive.
312 ibmmca_reset() and ibmmca_abort() do not work sufficiently well
313 up to now and still need a lot of development work. This seems
314 to be a problem with other low-level SCSI drivers too; however,
315 this should be no excuse.
316
317 2.7 Disk Geometry
318 -----------------
319 The ibmmca_biosparams() function should return the same disk geometry
320 as the bios. This is needed for fdisk, etc. The returned geometry is
321 certainly correct for disks smaller than 1 gigabyte. In the meantime,
322 it has been shown that this works fine even with disks larger than
323 1 gigabyte.
324
325 2.8 Kernel Boot Option
326 ----------------------
327 The function ibmmca_scsi_setup() is called if option ibmmcascsi=n
328 is passed to the kernel. See file linux/init/main.c for details.
329
330 2.9 Driver Module Support
331 -------------------------
332 Module support was implemented and tested by K. Kudielka. It probably
333 does not work on kernels <2.1.0.
334
335 2.10 Multiple Hostadapter Support
336 ---------------------------------
337 This driver supports up to eight interfaces of type IBM-SCSI-Subsystem.
338 Integrated-, and MCA-adapters are automatically recognized. Unrecognizable
339 IBM-SCSI-Subsystem interfaces can be specified as kernel-parameters.
340
341 2.11 /proc/scsi-Filesystem Information
342 --------------------------------------
343 Information about the driver condition is given in
344 /proc/scsi/ibmmca/<host_no>. ibmmca_proc_info() provides this information.
345
346 This table is quite informative for interested users. It shows the load
347 of commands on the subsystem and whether you are running the bypassed
348 (software) or integrated (hardware) SCSI-command set (see below). The
349 number of accesses is shown. Read, write, and modeselect are shown separately
350 in order to help debugging problems with CD-ROMs or tapedrives.
351
352 The following table shows the list of 15 logical device numbers that are
353 used by the SCSI-subsystem. The load on each ldn is shown in the table;
354 again, read and write commands are split. The last column shows the number
355 of reassignments that have been applied to the ldns, if you have more than
356 15 pun/lun combinations available on the SCSI-bus.
357
358 The last two tables show the pun/lun map and the positions of the ldns
359 on this pun/lun map. This may change during operation, when an ldn is
360 reassigned to another pun/lun combination. If the necessity for dynamical
361 assignments is set to 'no', the ldn structure stays static.
362
363 2.12 /proc/mca-Filesystem Information
364 -------------------------------------
365 The slot-file contains all default entries and in addition chip and I/O-
366 address information of the SCSI-subsystem. This information is provided
367 by ibmmca_getinfo().
368
369 2.13 Supported IBM SCSI-Subsystems
370 ----------------------------------
371 The following IBM SCSI-subsystems are supported by this driver:
372
373 - IBM Fast/Wide SCSI-2 Adapter
374 - IBM 7568 Industrial Computer SCSI Adapter w/Cache
375 - IBM Expansion Unit SCSI Controller
376 - IBM SCSI Adapter w/Cache
377 - IBM SCSI Adapter
378 - IBM Integrated SCSI Controller
379 - All clones, 100% compatible with the chipset and subsystem command
380 system of IBM SCSI-adapters (forced detection)
381
382 2.14 Linux Kernel Versions
383 --------------------------
384 The IBM SCSI-subsystem low level driver is prepared to be used with
385 all versions of Linux between 2.0.x and 2.4.x. The compatibility checks
386 are fully implemented from version 3.1e of the driver onward. This means that
387 you just need the latest ibmmca.h and ibmmca.c files and copy them into the
388 linux/drivers/scsi directory. The code is automatically adapted during
389 kernel compilation. This is different from kernel 2.4.0 on! Here version
390 4.0 or later of the driver must be used for kernel 2.4.0 or later. Version
391 4.0 or later does not work together with older kernels! Driver versions
392 older than 4.0 do not work together with kernel 2.4.0 or later. They work
393 on all older kernels.
394
395 3 Code History
396 --------------
397 Jan 15 1996: First public release.
398 - Martin Kolinek
399
400 Jan 23 1996: Scrapped code which reassigned scsi devices to logical
401 device numbers. Instead, the existing assignment (created
402 when the machine is powered-up or rebooted) is used.
403 A side effect is that the upper layer of Linux SCSI
404 device driver gets bogus scsi ids (this is benign),
405 and also the hard disks are ordered under Linux the
406 same way as they are under dos (i.e., C: disk is sda,
407 D: disk is sdb, etc.).
408 - Martin Kolinek
409
410 I think that the CD-ROM is now detected only if a CD is
411 inside CD_ROM while Linux boots. This can be fixed later,
412 once the driver works on all types of PS/2's.
413 - Martin Kolinek
414
415 Feb 7 1996: Modified biosparam function. Fixed the CD-ROM detection.
416 For now, devices other than harddisk and CD_ROM are
417 ignored. Temporarily modified abort() function
418 to behave like reset().
419 - Martin Kolinek
420
421 Mar 31 1996: The integrated scsi subsystem is correctly found
422 in PS/2 models 56,57, but not in model 76. Therefore
423 the ibmmca_scsi_setup() function has been added today.
424 This function allows the user to force detection of
425 scsi subsystem. The kernel option has format
426 ibmmcascsi=n
427 where n is the scsi_id (pun) of the subsystem. Most likely, n is 7.
428 - Martin Kolinek
429
430 Aug 21 1996: Modified the code which maps ldns to (pun,0). It was
431 insufficient for those of us with CD-ROM changers.
432 - Chris Beauregard
433
434 Dec 14 1996: More improvements to the ldn mapping. See check_devices
435 for details. Did more fiddling with the integrated SCSI detection,
436 but I think it's ultimately hopeless without actually testing the
437 model of the machine. The 56, 57, 76 and 95 (ultimedia) all have
438 different integrated SCSI register configurations. However, the 56
439 and 57 are the only ones that have problems with forced detection.
440 - Chris Beauregard
441
442 Mar 8-16 1997: Modified driver to run as a module and to support
443 multiple adapters. A structure, called ibmmca_hostdata, is now
444 present, containing all the variables, that were once only
445 available for one single adapter. The find_subsystem-routine has vanished.
446 The hardware recognition is now done in ibmmca_detect directly.
447 This routine checks for presence of MCA-bus, checks the interrupt
448 level and continues with checking the installed hardware.
449 Certain PS/2-models do not recognize a SCSI-subsystem automatically.
450 Hence, the setup defined by command-line-parameters is checked first.
451 Thereafter, the routine probes for an integrated SCSI-subsystem.
452 Finally, adapters are checked. This method has the advantage to cover all
453 possible combinations of multiple SCSI-subsystems on one MCA-board. Up to
454 eight SCSI-subsystems can be recognized and announced to the upper-level
455 drivers with this improvement. A set of defines made changes to other
456 routines as small as possible.
457 - Klaus Kudielka
458
459 May 30 1997: (v1.5b)
460 1) SCSI-command capability enlarged by the recognition of MODE_SELECT.
461 This needs the RD-Bit to be disabled on IM_OTHER_SCSI_CMD_CMD which
462 allows data to be written from the system to the device. It is a
463 necessary step to be allowed to set blocksize of SCSI-tape-drives and
464 the tape-speed, without confusing the SCSI-Subsystem.
465 2) The recognition of a tape is included in the check_devices routine.
466 This is done by checking for TYPE_TAPE, that is already defined in
467 the kernel-scsi-environment. The markup of a tape is done in the
468 global ldn_is_tape[] array. If the entry on index ldn
469 is 1, there is a tapedrive connected.
470 3) The ldn_is_tape[] array is necessary to distinguish between tape- and
471 other devices. Fixed blocklength devices should not cause a problem
472 with the SCB-command for read and write in the ibmmca_queuecommand
473 subroutine. Therefore, I only derivate the READ_XX, WRITE_XX for
474 the tape-devices, as recommended by IBM in this Technical Reference,
475 mentioned below. (IBM recommends to avoid using the read/write of the
476 subsystem, but the fact was, that read/write causes a command error from
477 the subsystem and this causes kernel-panic.)
478 4) In addition, I propose to use the ldn instead of a fixed char for the
479 display of PS2_DISK_LED_ON(). On 95, one can distinguish between the
480 devices that are accessed. It shows activity and eases debugging.
481 The tape-support has been tested with a SONY SDT-5200 and a HP DDS-2
482 (I do not know yet the type). Optimization and CD-ROM audio-support,
483 I am working on ...
484 - Michael Lang
485
486 June 19 1997: (v1.6b)
487 1) Submitting the extra-array ldn_is_tape[] -> to the local ld[]
488 device-array.
489 2) CD-ROM Audio-Play seems to work now.
490 3) When using DDS-2 (120M) DAT-Tapes, mtst shows still density-code
491 0x13 for ordinary DDS (61000 BPM) instead 0x24 for DDS-2. This appears
492 also on Adaptec 2940 adaptor in a PCI-System. Therefore, I assume that
493 the problem is independent of the low-level-driver/bus-architecture.
494 4) Hexadecimal ldn on PS/2-95 LED-display.
495 5) Fixing of the PS/2-LED on/off that it works right with tapedrives and
496 does not confuse the disk_rw_in_progress counter.
497 - Michael Lang
498
499 June 21 1997: (v1.7b)
500 1) Adding of a proc_info routine to inform in /proc/scsi/ibmmca/<host> the
501 outer-world about operational load statistics on the different ldns,
502 seen by the driver. Everybody that has more than one IBM-SCSI should
503 test this, because I only have one and cannot see what happens with more
504 than one IBM-SCSI hosts.
505 2) Definition of a driver version-number to have a better recognition of
506 the source when there are so many releases that they may confuse
507 the user, when reading about release-specific problems. Up to now,
508 I calculated the version-number to be 1.7. Because we are still in BETA-test,
509 it is today 1.7b.
510 3) Sorry for the heavy bug I programmed on June 19 1997! After that, the
511 CD-ROM did not work any more! The C7-command was a fake impression
512 I got while programming. Now, the READ and WRITE commands for CD-ROM are
513 no longer running over the subsystem, but just over
514 IM_OTHER_SCSI_CMD_CMD. On my observations (PS/2-95), now CD-ROM mounts
515 much faster(!) and hopefully all fancy multimedia-functions, like direct
516 digital recording from audio-CDs also work. (I tried it with cdda2wav
517 from the cdwtools-package and it filled up the harddisk immediately :-).)
518 To simplify the boolean logic, a further local device-type in ld[], called
519 is_cdrom has been included.
520 4) If one uses a SCSI-device of unsupported type/commands, one
521 immediately runs into a kernel-panic caused by Command Error. To better
522 understand which SCSI-command caused the problem, I extended this
523 specific panic-message slightly.
524 - Michael Lang
525
526 June 25 1997: (v1.8b)
527 1) Some cosmetic changes for the handling of SCSI-device-types.
528 Now, also CD-Burners / WORMs and SCSI-scanners should work. For
529 MO-drives I have no experience, therefore not yet supported.
530 In logical_devices I changed from different type-variables to one
531 called 'device_type' where the values, corresponding to scsi.h,
532 of a SCSI-device are stored.
533 2) There existed a small bug, that maps a device, coming after a SCSI-tape
534 wrong. Therefore, e.g. a CD-ROM changer would have been mapped wrong
535 -> problem removed.
536 3) Extension of the logical_device structure. Now it contains also device,
537 vendor and revision-level of a SCSI-device for internal usage.
538 - Michael Lang
539
540 June 26-29 1997: (v2.0b)
541 1) The release number 2.0b is necessary because of the completely new done
542 recognition and handling of SCSI-devices with the adapter. As I got
543 from Chris the hint, that the subsystem can reassign ldns dynamically,
544 I remembered this immediate_assign-command, I found once in the handbook.
545 Now, the driver first kills all ldn assignments that are set by default
546 on the SCSI-subsystem. After that, it probes on all puns and luns for
547 devices by going through all combinations with immediate_assign and
548 probing for devices, using device_inquiry. The found physical(!) pun,lun
549 structure is stored in get_scsi[][] as device types. This is followed
550 by the assignment of all ldns to existing SCSI-devices. If more ldns
551 than devices are available, they are assigned to non existing pun,lun
552 combinations to satisfy the adapter. With this, the dynamical mapping
553 was possible to implement. (For further info see the text in the
554 source code and in the description below. Read the description
555 below BEFORE installing this driver on your system!)
556 2) Changed the name IBMMCA_DRIVER_VERSION to IBMMCA_SCSI_DRIVER_VERSION.
557 3) The LED-display shows on PS/2-95 no longer the ldn, but the SCSI-ID
558 (pun) of the accessed SCSI-device. This is now senseful, because the
559 pun known within the driver is exactly the pun of the physical device
560 and no longer a fake one.
561 4) The /proc/scsi/ibmmca/<host_no> consists now of the first part, where
562 hit-statistics of ldns is shown and a second part, where the maps of
563 physical and logical SCSI-devices are displayed. This could be very
564 interesting, when one is using more than 15 SCSI-devices in order to
565 follow the dynamical remapping of ldns.
566 - Michael Lang
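The ldn assignment described in the v2.0b entry above can be sketched as follows. This is only an illustration under stated assumptions, not the driver's code: the names assign_ldns, ldn_entry and the table sizes are hypothetical, 15 assignable ldns are assumed, and skipping the adapter's own pun is an assumption of this sketch.

```c
#define MAX_PUN 8   /* physical unit numbers 0..7 (narrow bus assumed) */
#define MAX_LUN 8   /* logical unit numbers 0..7 */
#define MAX_LDN 15  /* assignable logical device numbers 0..14 (assumed) */

/* Hypothetical record of what one ldn points at. */
struct ldn_entry {
    int pun;
    int lun;
    int exists;     /* 1 if a device answered the inquiry at (pun,lun) */
};

/*
 * present[pun][lun] is nonzero where probing found a device.
 * Pass 1 gives the lowest ldns to real devices. Pass 2 points the
 * leftover ldns at empty (pun,lun) slots so that every ldn has an
 * assignment. Returns the number of real devices that got an ldn.
 */
int assign_ldns(int present[MAX_PUN][MAX_LUN], int adapter_pun,
                struct ldn_entry map[MAX_LDN])
{
    int ldn = 0, found, pass, pun, lun;

    for (pass = 1; pass <= 2; pass++) {
        for (pun = 0; pun < MAX_PUN && ldn < MAX_LDN; pun++) {
            if (pun == adapter_pun)
                continue;               /* never map the adapter itself */
            for (lun = 0; lun < MAX_LUN && ldn < MAX_LDN; lun++) {
                int here = present[pun][lun] != 0;
                /* pass 1 takes existing devices, pass 2 empty slots */
                if (here != (pass == 1))
                    continue;
                map[ldn].pun = pun;
                map[ldn].lun = lun;
                map[ldn].exists = here;
                ldn++;
            }
        }
        if (pass == 1)
            found = ldn;
    }
    return found;
}
```

With more than 15 devices present, the first pass already uses up all ldns, which is the case where the dynamic remapping mentioned above becomes necessary.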
567
568 June 26-29 1997: (v2.0b-1)
569 1) I forgot to switch the local_checking_phase_flag to 1 and back to 0
570 in the dynamical remapping part in ibmmca_queuecommand for the
571 device_exist routine. Sorry.
572 - Michael Lang
573
574 July 1-13 1997: (v3.0b,c)
575 1) Merging of the driver-developments of Klaus Kudielka and Michael Lang
576 in order to get an optimum and unified driver-release for the
577 IBM-SCSI-Subsystem-Adapter(s).
578 For people, using the Kernel-release >=2.1.0, module-support should
579 be no problem. For users, running under <2.1.0, module-support may not
580 work, because the methods have changed between 2.0.x and 2.1.x.
581 2) Added some more effective statistics for /proc-output.
582 3) Change typecasting at necessary points from (unsigned long) to
583 virt_to_bus().
584 4) Included #if... at special points to have specific adaption of the
585 driver to kernel 2.0.x and 2.1.x. It should therefore also run with
586 later releases.
587 5) Magneto-Optical drives and medium-changers are also recognized, now.
588 Therefore, we have a completely gapfree recognition of all SCSI-
589 device-types, that are known by Linux up to kernel 2.1.31.
590 6) The flag SCSI_IBMMCA_DEV_RESET has been inserted. If it is set within
591 the configuration, each connected SCSI-device will get a reset command
592 during boottime. This can be necessary for some special SCSI-devices.
593 This flag should be included in Config.in.
594 (See also the new Config.in file.)
595 Probable next improvement: bad disk handler.
596 - Michael Lang
597
598 Sept 14 1997: (v3.0c)
599 1) Some debugging and speed optimization applied.
600 - Michael Lang
601
602 Dec 15, 1997
603 - chrisb@truespectra.com
604 - made the front panel display thingy optional, specified from the
605 command-line via ibmmcascsi=display. Along the lines of the /LED
606 option for the OS/2 driver.
607 - fixed small bug in the LED display that would hang some machines.
608 - reversed ordering of the drives (using the
609 IBMMCA_SCSI_ORDER_STANDARD define). This is necessary for two main
610 reasons:
611 - users who've already installed Linux won't be screwed. Keep
612 in mind that not everyone is a kernel hacker.
613 - be consistent with the BIOS ordering of the drives. In the
614 BIOS, id 6 is C:, id 0 might be D:. With this scheme, they'd be
615 backwards. This confuses the crap out of those heathens who've
616 got an impure Linux installation (which, <wince>, I'm one of).
617 This whole problem arises because IBM is actually non-standard with
618 the id to BIOS mappings. You'll find, in fdomain.c, a similar
619 comment about a few FD BIOS revisions. The Linux (and apparently
620 industry) standard is that C: maps to scsi id (0,0). Let's stick
621 with that standard.
622 - Since this is technically a branch of my own, I changed the
623 version number to 3.0e-cpb.
624
625 Jan 17, 1998: (v3.0f)
626 1) Addition of some statistical info for /proc in proc_info.
627 2) Taking care of the SCSI-assignment problem, dealt with by Chris on Dec 15
628 1997. In fact, IBM is right, concerning the assignment of SCSI-devices
629 to driveletters. It conforms to the ANSI-definition of the SCSI-
630 standard to assign drive C: to SCSI-id 6, because it is the highest
631 hardware priority after the hostadapter (that has still today by
632 default everywhere id 7). Also realtime-operating systems that I use,
633 like LynxOS and OS9, which are quite industrial systems use top-down
634 numbering of the harddisks, that is also starting at id 6. Now, one
635 sits a bit between two chairs. On the one hand, using the define
636 IBMMCA_SCSI_ORDER_STANDARD makes Linux assign disks in conformance with
637 the IBM- and ANSI-SCSI-standard and keeps this driver downward
638 compatible to older releases; on the other hand, people are quite
639 used to believing that C: is assigned to (0,0) and many other
640 SCSI-BIOSes do so. Therefore, I moved the IBMMCA_SCSI_ORDER_STANDARD
641 define out of the driver and put it into Config.in as subitem of
642 'IBM SCSI support'. A help, added to Documentation/Configure.help
643 explains the differences between saying 'y' or 'n' to the user, when
644 IBMMCA_SCSI_ORDER_STANDARD prompts, so the ordinary user is enabled to
645 choose the way of assignment, depending on his own situation and gusto.
646 3) Adapted SCSI_IBMMCA_DEV_RESET to the local naming convention, so it is
647 now called IBMMCA_SCSI_DEV_RESET.
648 4) Optimization of proc_info and its subroutines.
649 5) Added more in-source-comments and extended the driver description by
650 some explanation about the SCSI-device-assignment problem.
651 - Michael Lang
652
653 Jan 18, 1998: (v3.0g)
654 1) Correcting names to be absolutely conform to the later 2.1.x releases.
655 This is necessary for
656 IBMMCA_SCSI_DEV_RESET -> CONFIG_IBMMCA_SCSI_DEV_RESET
657 IBMMCA_SCSI_ORDER_STANDARD -> CONFIG_IBMMCA_SCSI_ORDER_STANDARD
658 - Michael Lang
659
660 Jan 18, 1999: (v3.1 MCA-team internal)
661 1) The multiple hosts structure is accessed from every subroutine, so there
662 is no longer the address of the device structure passed from function
663 to function, but only the hostindex. A call by value, nothing more. This
664 should really be understood by the compiler and the subsystem should get
665 the right values and addresses.
666 2) The SCSI-subsystem detection was not complete and quite hugely buggy up
667 to now, compared to the technical manual. The interpretation of the pos2
668 register is not as assumed by people before, therefore, I dropped a note
669 in the ibmmca_detect function to show the registers' interpretation.
670 The pos-registers of integrated SCSI-subsystems do not contain any
671 information concerning the IO-port offset, really. Instead, they contain
672 some info about the adapter, the chip, the NVRAM .... The I/O-port is
673 fixed to 0x3540 - 0x3547. There can be more than one adapter in the
674 slots and they get an offset for the I/O area in order to get their own
675 I/O-address area. See chapter 2 for detailed description. At least, the
676 detection should now work right, even on models other than 95. The 95ers
677 came happily around the bug, as their pos2 register contains always 0
678 in the critical area. Reserved bits are not allowed to be interpreted,
679 therefore, IBM is allowed to set those bits as they like and they may
680 really vary between different PS/2 models. So, now, no interpretation
681 of reserved bits - hopefully no trouble here anymore.
682 3) The command error, which you may get on models 55, 56, 57, 70, 77 and
683 P70 may have been caused by the fact, that adapters of older design do
684 not like sending commands to non-existing SCSI-devices and will react
685 with a command error as a sign of protest. While this error is not
686 present on IBM SCSI Adapter w/cache, it appears on IBM Integrated SCSI
687 Adapters. Therefore, I implemented a workaround to forgive those
688 adapters their protests, but it is marked up in the statistics, so
689 after a successful boot, you can see in /proc/scsi/ibmmca/<host_number>
690 how often the command errors have been forgiven to the SCSI-subsystem.
691 If the number is bigger than 0, you have a SCSI subsystem of older
692 design, which should no longer matter.
693 4) ibmmca_getinfo() has been adapted very carefully, so it shows in the
694 slotn file really, what is senseful to be presented.
695 5) ibmmca_register() has been extended in its parameter list in order to
696 pass the right name of the SCSI-adapter to Linux.
697 - Michael Lang
698
699 Feb 6, 1999: (v3.1)
700 1) Finally, after some 3.1Beta-releases, the 3.1 release. Sorry, for
701 the delayed release, but it was not finished with the release of
702 Kernel 2.2.0.
703 - Michael Lang
704
705 Feb 10, 1999 (v3.1)
706 1) Added a new commandline parameter called 'bypass' in order to bypass
707 every integrated subsystem SCSI-command consequently in case of
708 troubles.
709 2) Concatenated read_capacity requests to the harddisks. It gave a lot
710 of troubles with some controllers and after I wanted to apply some
711 extensions, it jumped out in the same situation, on my w/cache, just as
712 on D. Weinehall's Model 56, having integrated SCSI. This gave me the
713 decisive hint to move the code-part out and declare it global. Now
714 it seems to work far better and more stable. Let us see what
715 the world thinks of it...
716 3) By the way, only Sony DAT-drives seem to show density code 0x13. A
717 test with a HP drive gave right results, so the problem is vendor-
718 specific and not a problem of the OS or the driver.
719 - Michael Lang
720
721 Feb 18, 1999 (v3.1d)
722 1) The abort command and the reset function have been checked for
723 inconsistencies. From the logical point of thinking, they work
724 at their optimum, now, but as the subsystem does not answer with an
725 interrupt, abort never finishes, sigh...
726 2) Everything, that is accessed by a busmaster request from the adapter
727 is now declared as global variable, even the return-buffer in the
728 local checking phase. This assures, that no accesses to undefined memory
729 areas are performed.
730 3) In ibmmca.h, the line unchecked_isa_dma is added with 1 in order to
731 avoid memory-pointers for the areas higher than 16MByte in order to
732 be sure, it also works on 16-Bit Microchannel bus systems.
733 4) A lot of small things have been found, but nothing that endangered the
734 driver operations. Just it should be more stable, now.
735 - Michael Lang
736
737 Feb 20, 1999 (v3.1e)
738 1) I took the warning from the Linux Kernel Hackers Guide seriously and
739 checked the cmd->result return value to the done-function very carefully.
740 It is obvious, that the IBM SCSI only delivers the tsb.dev_status, if
741 some error appeared, else it is undefined. Now, this is fixed. Before
742 any SCB command gets queued, the tsb.dev_status is set to 0, so the
743 cmd->result won't screw up Linux higher level drivers.
744 2) The reset-function has slightly improved. This is still planned for
745 abort. During the abort and the reset function, no interrupts are
746 allowed. This is however quite hard to cope with, so the INT-status
747 register is read. When the interrupt gets queued, one can find its
748 status immediately on that register and is enabled to continue in the
749 reset function. I had no chance to test this really, only in a bogus
750 situation, I got this function running, but the situation was too much
751 worse for Linux :-(, so tests will continue.
752 3) Buffers got now consistent. No open address mapping, as before and
753 therefore no further troubles with the unassigned memory segmentation
754 faults that scrambled probes on 95XX series and even on 85XX series,
755 when the kernel is done in a not so perfectly fitting way.
756 4) Spontaneous interrupts from the subsystem, appearing without any
757 command previously queued are answered with a DID_BAD_INTR result.
758 5) Taken into account ZP Gu's proposals to reverse the SCSI-device
759 scan order. As it does not work on Kernel 2.1.x or 2.2.x, as proposed
760 by him, I implemented it in a slightly derived way, which offers in
761 addition more flexibility.
762 - Michael Lang
763
764 Apr 23, 2000 (v3.2pre1)
765 1) During a very long time, I collected a huge amount of bug reports from
766 various people, trying really quite different things on their SCSI-
767 PS/2s. Today, all these bug reports are taken into account and should be
768 mostly solved. The major topics were:
769 - Driver crashes during boottime by no obvious reason.
770 - Driver panics while the midlevel-SCSI-driver is trying to inquire
771 the SCSI-device properties, even though hardware is in perfect state.
772 - Displayed info for the various slot-cards is interpreted wrong.
773 The main reasons for the crashes were two:
774 1) The commands to check for device information like INQUIRY,
775 TEST_UNIT_READY, REQUEST_SENSE and MODE_SENSE cause the devices
776 to deliver information of up to 255 bytes. Midlevel drivers offer
777 1024 bytes of space for the answer, but the IBM-SCSI-adapters do
778 not accept this, as they stick quite near to ANSI-SCSI and report
779 a COMMAND_ERROR message which causes the driver to panic. The main
780 problem was located around the INQUIRY command. Now, for all the
781 mentioned commands, the buffersize sent to the adapter is at
782 maximum 255 which seems to be a quite reasonable solution.
783 TEST_UNIT_READY gets a buffersize of 0 to make sure that no
784 data is transferred in order to avoid any possible command failure.
785 2) On unsuccessful TEST_UNIT_READY, the mid-level driver has to send
786 a REQUEST_SENSE in order to see where the problem is located. This
787 REQUEST_SENSE may have various length in its answer-buffer. IBM
788 SCSI-subsystems report a command failure if the returned buffersize
789 is different from the sent buffersize, but this can be suppressed by
790 a special bit, which is now done and problems seem to be solved.
791 2) Code adaption to all kernel-releases. Now, the 3.2 code compiles on
792 2.0.x, 2.1.x, 2.2.x and 2.3.x kernel releases without any code-changes.
793 3) Commandline-parameters are recognized again, even under Kernel 2.3.x or
794 higher.
795 - Michael Lang
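The buffer-size rule from the v3.2pre1 entry above can be sketched as follows. The opcode values are the standard SCSI ones; the function name clamp_transfer_length and the constant IBMMCA_MAX_INFO_LEN are hypothetical and only serve this illustration.

```c
/* Standard SCSI opcodes for the commands named in the entry above. */
#define TEST_UNIT_READY 0x00
#define REQUEST_SENSE   0x03
#define INQUIRY         0x12
#define MODE_SENSE      0x1a

#define IBMMCA_MAX_INFO_LEN 255u  /* hypothetical name for the 255-byte cap */

/*
 * Return the buffer size to hand to the adapter for a given command.
 * The cap is an upper bound only: a request shorter than 255 bytes
 * is passed through unchanged.
 */
unsigned int clamp_transfer_length(unsigned char opcode, unsigned int requested)
{
    switch (opcode) {
    case TEST_UNIT_READY:
        return 0;                       /* no data phase at all */
    case INQUIRY:
    case REQUEST_SENSE:
    case MODE_SENSE:
        return requested > IBMMCA_MAX_INFO_LEN ? IBMMCA_MAX_INFO_LEN
                                               : requested;
    default:
        return requested;               /* other commands are untouched */
    }
}
```

Treating 255 as a maximum and never as a minimum matters: rounding a shorter request up to 255 would let the adapter write past the end of the caller's buffer.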
796
797 April 27, 2000 (v3.2pre2)
798 1) Bypassed commands get read by the adapter by one cycle instead of two.
799 This increases SCSI-performance.
800 2) Synchronous datatransfer is provided for sure to be 5 MHz on older
801 SCSI and 10 MHz on internal F/W SCSI-adapter.
802 3) New commandline parameters allow to force the adapter to slow down while
803 in synchronous transfer. Could be helpful for very old devices.
804 - Michael Lang
805
806 June 2, 2000 (v3.2pre5)
807 1) Added Jim Shorney's contribution to make the activity indicator
808 flashing in addition to the LED-alphanumeric display-panel on
809 models 95A. To be enabled to choose this feature freely, a new
810 commandline parameter is added, called 'activity'.
811 2) Added the READ_CONTROL bit for test_unit_ready SCSI-command.
812 3) Added some suppress_exception bits to read_device_capacity and
813 all device_inquiry occurrences in the driver code.
814 4) Complaints about the various KERNEL_VERSION implementations are
815 taken into account. Every local_LinuxKernelVersion occurrence is
816 now replaced by KERNEL_VERSION, defined in linux/version.h.
817 Corresponding changes were applied to ibmmca.h, too. This was a
818 contribution to all kernel-parts by Philipp Hahn.
819 - Michael Lang
820
821 July 17, 2000 (v3.2pre8)
822 A long period of collecting bug reports from all corners of the world
823 has now led to the following corrections to the code:
824 1) SCSI-2 F/W support crashed with a COMMAND ERROR. The reason for this
825 was that it is possible to disable Fast-SCSI for the external bus.
826 The feature-control command, where this crash appeared regularly, tried
827 to set the maximum speed of 10MHz synchronous transfer speed and that
828 reports a COMMAND ERROR if external bus Fast-SCSI is disabled. Now,
829 the feature-command probes down from maximum speed until the adapter
830 stops to complain, which is at the same time the maximum possible
831 speed selected in the reference program. So, F/W external can run at
832 5 MHz (slow-) or 10 MHz (fast-SCSI). During feature probing, the
833 COMMAND ERROR message is used to detect if the adapter does not complain.
834 2) Up to now, only combined busmode is supported, if you use external
835 SCSI-devices, attached to the F/W-controller. If dual bus is selected,
836 only the internal SCSI-devices get accessed by Linux. For most
837 applications, this should do fine.
838 3) Wide-SCSI-addressing (16-Bit) is now possible for the internal F/W
839 bus on the F/W adapter. If F/W adapter is detected, the driver
840 automatically uses the extended PUN/LUN <-> LDN mapping tables, which
841 are now new from 3.2pre8. This allows PUNs between 0 and 15 and should
842 provide more fun with the F/W adapter.
843 4) Several machines use the SCSI: POS registers for internal/undocumented
844 storage of system relevant info. This confused the driver, mainly on
845 models 9595, as it assumed that there is no onboard SCSI only if all POS in
846 the integrated SCSI-area are set to 0x00 or 0xff. Now, the mechanism
847 to check for integrated SCSI is much more restrictive and these problems
848 should be history.
849 - Michael Lang
850
851 July 18, 2000 (v3.2pre9)
852 Things develop rather quickly at the moment. Two major things were still
853 missing in 3.2pre8:
854 1) The adapter PUN for F/W adapters has 4-bits, while all other adapters
855 have 3-bits. This is now taken into account for F/W.
856 2) When you select CONFIG_IBMMCA_SCSI_ORDER_STANDARD, you should
857 normally get the inverse probing order of your devices on the SCSI-bus.
858 The ANSI device order gets scrambled in version 3.2pre8!! Now, a new
859 and tested algorithm inverts the device-order on the SCSI-bus and
860 automatically avoids accidental access to whatever SCSI PUN the adapter
861 is set and works with SCSI- and Wide-SCSI-addressing.
862 - Michael Lang
863
864 July 23, 2000 (v3.2pre10 unpublished)
865 1) LED panel display supports wide-addressing in ibmmcascsi=display mode.
866 2) Adapter-information and autoadaption to address-space is done.
867 3) Auto-probing for maximum synchronous SCSI transfer rate is working.
868 4) Optimization to some embedded function calls is applied.
869 5) Added some comment for the user to wait for SCSI-devices being probed.
870 6) Finished version 3.2 for Kernel 2.4.0. At least, I thought it was, but...
871 - Michael Lang
872
873 July 26, 2000 (v3.2pre11)
874 1) I passed a horrible weekend getting mad with NMIs on kernel 2.2.14 and
875 a model 9595. Asking around in the community, nobody except me has
876 seen such errors. Weird, but I am trying to recompile everything on
877 the model 9595. Maybe, as I use a specially modified gcc, that could
878 cause problems. But, it was not the reason. The true background was,
879 that the kernel was compiled for i386 and the 9595 has a 486DX-2.
880 Normally, no troubles should appear, but for this special machine,
881 only the right processor support is working fine!
882 2) Previous problems with synchronous speed, slowing down from one adapter
883 to the next during probing are corrected. Now, local variables store
884 the synchronous bitmask for every single adapter found on the MCA bus.
885 3) LED alphanumeric panel support for XX95 systems is now showing some
886 alive rotator during boottime. This makes sense, when no monitor is
887 connected to the system. You can get rid of all display activity, if
888 you do not use any parameter or just ibmmcascsi=activity, for the
889 harddrive activity LED, existent on all PS/2, except models 8595-XXX.
890 If no monitor is available, please use ibmmcascsi=display, which works
891 fine together with the linuxinfo utility for the LED-panel.
892 - Michael Lang
893
894 July 29, 2000 (v3.2)
895 1) Submission of this driver for kernel 2.4test-XX and 2.2.17.
896 - Michael Lang
897
898 December 28, 2000 (v3.2d / v4.0)
899 1) The interrupt handler had some wrong statement to wait for. This
900 was done due to experimental reasons during 3.2 development but it
901 has shown that this is not stable enough. Going back to wait for the
902 adapter to be not busy is best.
903 2) Inquiry requests can be shorter than 255 bytes of return buffer. Due
904 to a bug in the ibmmca_queuecommand routine, this buffer was forced
905 to 255 at minimum. If the memory address, this return buffer is pointing
906 to does not offer more space, invalid memory accesses destabilized the
907 kernel.
908 3) version 4.0 is only valid for kernel 2.4.0 or later. This is necessary
909 to remove old kernel version dependent waste from the driver. 3.2d is
910 only distributed with older kernels but keeps compatibility with older
911 kernel versions. 4.0 and higher versions cannot be used with older
912 kernels anymore!! You must have at least kernel 2.4.0!!
913 4) The commandline argument 'bypass' and all its functionality got removed
914 in version 4.0. This was never really necessary, as all troubles were
915 based on non-command related reasons up to now, so bypassing commands
916 did not help to avoid any bugs. It is kept in 3.2X for debugging reasons.
917 5) Dynamic reassignment of ldns was again verified and analyzed to be
918 completely inoperational. This is corrected and should work now.
919 6) All commands that get sent to the SCSI adapter were verified and
920 completed in such a way, that they are now completely conform to the
921 demands in the technical description of IBM. Main candidates were the
922 DEVICE_INQUIRY, REQUEST_SENSE and DEVICE_CAPACITY commands. They must
923 be transferred by bypassing the internal command buffer of the adapter
924 or else the response can be a random result. GET_POS_INFO would be more
925 safe in usage, if one could use the SUPRESS_EXCEPTION_SHORT, but this
926 is not allowed by the technical references of IBM. (Sorry, folks, the
927 model 80 problem is still a task to be solved in a different way.)
928 7) v3.2d is still held back for some days for testing, while 4.0 is
929 released.
930 - Michael Lang
931
932 January 3, 2001 (v4.0a)
933 1) A lot of complaints came in after the 2.4.0-prerelease kernel about
934 the impossibility to compile the driver as a module. This problem is
935 solved. In combination with that problem, an imprecise declaration
936 of the function option_setup() gave some warnings during compilation.
937 This is solved, too, by a forward declaration in ibmmca.c.
938 2) #ifdef argument concerning CONFIG_SCSI_IBMMCA is no longer needed and
939 was entirely removed.
940 3) Some switch statements got optimized in code, as some minor variables
941 in internal SCSI-command handlers.
942 - Michael Lang
943
944 4 To do
945 -------
946 - IBM SCSI-2 F/W external SCSI bus support in separate mode!
947 - It seems that the handling of bad disks is really bad -
948 non-existent, in fact. However, a low-level driver cannot help
949 much, if such things happen.
950
951 5 Users' Manual
952 ---------------
953 5.1 Commandline Parameters
954 --------------------------
955 There exist several features for the IBM SCSI-subsystem driver.
956 The commandline parameter format is:
957
958 ibmmcascsi=<command1>,<command2>,<command3>,...
959
960 where commandN can be one of the following:
961
962 display Owners of a model 95 or other PS/2 systems with an
963 alphanumeric LED display may set this to have their
964 display showing the following output of the 8 digits:
965
966 ------DA
967
968 where '-' stays dark, 'D' shows the SCSI-device id
969 and 'A' shows the SCSI hostindex, being currently
970 accessed. During boottime, this will give the message
971
972 SCSIini*
973
974 on the LED-panel, where the * represents a rotator,
975 showing the activity during the probing phase of the
976 driver which can take up to two minutes per SCSI-adapter.
977 adisplay This works like display, but gives more optical overview
978 of the activities on the SCSI-bus. The display will have
979 the following output:
980
981 6543210A
982
983 where the numbers 0 to 6 light up at the shown position,
984 when the SCSI-device is accessed. 'A' shows again the SCSI
985 hostindex. If neither display nor adisplay is set, the internal
986 PS/2 harddisk LED is used for media-activities. So, if
987 you really do not have a system with a LED-display, you
988 should not set display or adisplay. Keep in mind, that
989 display and adisplay can only be used alternatively. It
990 is not recommended to use this option, if you have some
991 wide-addressed devices e.g. at the SCSI-2 F/W adapter in
992 your system. In addition, the usage of the display for
993 other tasks in parallel, like the linuxinfo-utility makes
994 no sense with this option.
995 activity This enables the PS/2 harddisk LED activity indicator.
996 Most PS/2 have no alphanumeric LED display, but some
997 indicator. So you should use this parameter to activate it.
998 If you own model 9595 (Server95), you can have both, the
999 LED panel and the activity indicator in parallel. However,
1000 some PS/2s, like the 8595 do not have any harddisk LED
1001 activity indicator, which means, that you must use the
1002 alphanumeric LED display if you want to monitor SCSI-
1003 activity.
1004 bypass This is obsolete from driver version 4.0, as the adapters
1005 got that far understood, that the selection between
1006 integrated and bypassed commands should now work completely
1007 correct! For historical reasons, the old description is
1008 kept here:
1009 This commandline parameter forces the driver never to use
1010 SCSI-subsystems' integrated SCSI-command set, except for
1011 the immediate assign, which is of vital importance for
1012 every IBM SCSI-subsystem to set its ldns right. Instead,
1013 the ordinary ANSI-SCSI-commands are used and passed by the
1014 controller to the SCSI-devices, therefore 'bypass'. The
1015 effort, done by the subsystem is quite bogus and at a
1016 minimum and therefore it should work everywhere. This
1017 could maybe solve troubles with old or integrated SCSI-
1018 controllers and nasty harddisks. Keep in mind, that using
1019 this flag will slow-down SCSI-accesses slightly, as the
1020 software generated commands are always slower than the
1021 hardware. Non-harddisk devices always get read/write-
1022 commands in bypass mode. On the most recent releases of
1023 the Linux IBM-SCSI-driver, the bypass command should be
1024 no longer a necessary thing, if you are sure about your
1025 SCSI-hardware!
1026 normal This is the parameter, introduced on the 2.0.x development
1027 rail by ZP Gu. This parameter defines the SCSI-device
1028 scan order in the new industry standard. This means, that
1029 the first SCSI-device is the one with the lowest pun.
1030 E.g. harddisk at pun=0 is scanned before harddisk at
1031 pun=6, which means, that harddisk at pun=0 gets sda
1032 and the one at pun=6 gets sdb.
1033 ansi The ANSI-standard for the right scan order, as done by
1034 IBM, Microware and Microsoft, scans SCSI-devices starting
1035 at the highest pun, which means, that e.g. harddisk at
1036 pun=6 gets sda and a harddisk at pun=0 gets sdb. If you
1037 like to have the same SCSI-device order, as in DOS, OS-9
1038 or OS/2, just use this parameter.
1039 fast SCSI-I/O in synchronous mode is done at 5 MHz for IBM-
1040 SCSI-devices. SCSI-2 Fast/Wide Adapter/A external bus
1041 should then run at 10 MHz if Fast-SCSI is enabled,
1042 and at 5 MHz if Fast-SCSI is disabled on the external
1043 bus. This is the default setting when nothing is
1044 specified here.
1045 medium Synchronous rate is at 50% approximately, which means
1046 2.5 MHz for IBM SCSI-adapters and 5.0 MHz for F/W ext.
1047 SCSI-bus (when Fast-SCSI speed enabled on external bus).
1048 slow The slowest possible synchronous transfer rate is set.
1049 This means 1.82 MHz for IBM SCSI-adapters and 2.0 MHz
1050 for F/W external bus at Fast-SCSI speed on the external
1051 bus.
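The two LED-panel layouts described for display and adisplay in the list above can be sketched as follows. This only illustrates the 8-character patterns "------DA" and "6543210A"; the function names are hypothetical, a dark cell is represented by a space, and printing the device id and host index as one hexadecimal digit each is an assumption of this sketch.

```c
/* One hexadecimal digit for values 0..15 (assumed display format). */
static char hex_digit(int v)
{
    return "0123456789ABCDEF"[v & 0xf];
}

/* "display": six dark cells, then device id 'D', then host index 'A'. */
void format_display(char out[9], int pun, int host)
{
    int i;

    for (i = 0; i < 6; i++)
        out[i] = ' ';
    out[6] = hex_digit(pun);
    out[7] = hex_digit(host);
    out[8] = '\0';
}

/*
 * "adisplay": cells show 6,5,...,0 from left to right. A digit lights
 * up only while bit <pun> of active_mask is set, otherwise the cell
 * stays dark. The last cell shows the host index.
 */
void format_adisplay(char out[9], unsigned int active_mask, int host)
{
    int pun;

    for (pun = 0; pun <= 6; pun++)
        out[6 - pun] = (active_mask & (1u << pun)) ? (char)('0' + pun) : ' ';
    out[7] = hex_digit(host);
    out[8] = '\0';
}
```

The adisplay pattern has a fixed cell for each of the ids 0 to 6 only, which is one way to see why it is not recommended with wide-addressed devices.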
1052
1053 A further option is that you can force the SCSI-driver to accept a SCSI-
1054 subsystem at a certain I/O-address with a predefined adapter PUN. This
1055 is done by entering
1056
1057 commandN = I/O-base
1058 commandN+1 = adapter PUN
1059
1060 e.g. ibmmcascsi=0x3540,7 will force the driver to detect a SCSI-subsystem
1061 at I/O-address 0x3540 with adapter PUN 7. Please use this method only if
1062 the driver really does not recognize your SCSI-adapter! With driver version
1063 3.2, the recognition of various adapters was hugely improved, so with a
1064 newer driver you should first try removing command-line arguments of this
1065 type. I bet the adapter will be recognized correctly. Even multiple and
1066 different types of IBM SCSI-adapters should be recognized correctly.
1067 Use the forced detection method only as a last resort!
1068
1069 Examples:
1070
1071 ibmmcascsi=adisplay
1072
1073 This will use the advanced display mode for the model 95 LED alphanumeric
1074 display.
1075
1076 ibmmcascsi=display,0x3558,7
1077
1078 This will activate the default display mode for the model 95 LED display
1079 and will force the driver to accept a SCSI-subsystem at I/O-base 0x3558
1080 with adapter PUN 7.
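
To check which ibmmcascsi options a kernel command line carries, a small
shell helper along the following lines may be useful. It is a sketch only:
the function name kparam is made up for this example, and it works on a
command-line string handed to it rather than on any driver interface.

```shell
#!/bin/sh
# Sketch: print the value of NAME=... found in a kernel command line string.
# kparam is a hypothetical helper, not part of the kernel or the driver.
kparam() {
    name=$1; cmdline=$2
    for word in $cmdline; do
        case $word in
            "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1    # parameter not present
}

# Example with a literal string. On a running system one could pass
# "$(cat /proc/cmdline)" instead.
kparam ibmmcascsi "root=/dev/sda1 ibmmcascsi=display,0x3558,7"
```

For the example string this prints the comma-separated option list given to
the driver.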
1081
1082 5.2 Troubleshooting
1083 -------------------
1084 The following FAQs should help you to solve some major problems with this
1085 driver.
1086
1087 Q: "Reset SCSI-devices at boottime" halts the system at boottime, why?
1088 A: This is only tested with the IBM SCSI Adapter w/cache. It is not
1089 yet proven to run on other adapters, however you may be lucky.
1090 In version 3.1d this has been hugely improved and should work better,
1091 now. Normally you really won't need to activate this flag in the
1092 kernel configuration, as all post 1989 SCSI-devices should accept
1093 the reset-signal, when the computer is switched on. The SCSI-
1094 subsystem generates this reset while being initialized. This flag
1095 is really reserved for users with very old, very strange or self-made
1096 SCSI-devices.
1097 Q: Why is the SCSI-order of my drives mirrored to the device-order
1098 seen from OS/2 or DOS ?
1099 A: It depends on whether the operating system scans the devices in
1100 ANSI-SCSI-standard order (starting from pun 6 and going down to pun 0)
1101 or just starts at pun 0 and counts up. If you want to match the order
1102 used by OS/2 and DOS, you have to activate this flag in the kernel
1103 configuration or set 'ansi' as a parameter for the kernel.
1104 The parameter 'normal' sets the new industry standard, starting
1105 from pun 0 and scanning up to pun 6. The kernel parameters let you
1106 change the scan order even after the kernel has been compiled.
1107 Q: Why can't I find IBM MCA SCSI support in the config menu?
1108 A: You have to activate MCA bus support, first.
1109 Q: Where can I find the latest info about this driver?
1110 A: See the file MAINTAINERS for the current WWW-address, which offers
1111 updates, info and Q/A lists. At this file's origin, the webaddress
1112 was: http://www.staff.uni-mainz.de/mlang/linux.html
1113 Q: My SCSI-adapter is not recognized by the driver, what can I do?
1114 A: Just force it to be recognized by kernel parameters. See section 5.1.
1115 If this really happens, please also send e-mail to the maintainer, as
1116 forced detection should never be necessary. Needing it indicates, in
1117 principle, a flaw in the driver's adapter detection and belongs in a
1118 bug report.
1119 Q: The driver screws up, if it starts to probe SCSI-devices, is there
1120 some way out of it?
1121 A: Yes, that was some recognition problem of the correct SCSI-adapter
1122 and its I/O base addresses. Upgrade your driver to the latest release
1123 and it should be fine again.
1124 Q: I get a message: panic IBM MCA SCSI: command error .... , what can
1125 I do against this?
1126 A: Previously, I handled this by ignoring command errors with
1127 ibmmcascsi=forgiveall, but this option is obsolete and no longer
1128 exists. If such a problem appears, it is caused by some segmentation
1129 fault of the driver, which maps to some unallowed area. The latest
1130 version of the driver should be ok, as most bugs have been solved.
1131 Q: There are still kernel panics, even after having set
1132 ibmmcascsi=forgiveall. Are there other possibilities to prevent
1133 such panics?
1134 A: No, just get the latest release of the driver; it should work
1135 better and better with increasing version number. Forget about
1136 ibmmcascsi=forgiveall; it is obsolete, as is ignorecmd.
1137 Q: Linux panics or stops without any comment, but it is probable, that my
1138 harddisk(s) have bad blocks.
1139 A: Sorry, the bad-block handling is still a feeble point of this driver,
1140 but is on the schedule for development in the near future.
1141 Q: Linux panics while dynamically assigning SCSI-ids or ldns.
1142 A: If you disconnect a SCSI-device from the machine, while Linux is up
1143 and the driver uses dynamical reassignment of logical device numbers
1144 (ldn), it really gets "angry" if it won't find devices, that were still
1145 present at boottime and stops Linux.
1146 Q: The system does not recover after an abort-command has been generated.
1147 A: This is regrettably true, as it is not yet understood, why the
1148 SCSI-adapter does really NOT generate any interrupt at the end of
1149 the abort-command. As no interrupt is generated, the abort command
1150 cannot get finished and the system hangs, sorry, but checks are
1151 running to hunt down this problem. If there is a real pending command,
1152 the interrupt MUST get generated after abort. In this case, it
1153 should finish well.
1154 Q: The system gets in bad shape after a SCSI-reset, is this known?
1155 A: Yes, as there are a lot of prescriptions (see the Linux Hackers'
1156 Guide) what has to be done for reset, we still share the bad shape of
1157 the reset functions with all other low level SCSI-drivers.
1158 Astonishingly, reset works in most cases quite ok, but the harddisks
1159 won't run in synchronous mode anymore after a reset, until you reboot.
1160 Q: Why does my XXX w/Cache adapter not use read-prefetch?
1161 A: Ok, that is not completely possible. If a cache is present, the
1162 adapter tries to use it internally. Explicit use of the cache through
1163 a read prefetch command may come in the future, but it requires
1164 major SCSI-command overhead that risks degrading performance more
1165 than it improves it. Tests of this are running.
1166 Q: I have a IBM SCSI-2 Fast/Wide adapter, it boots in some way and hangs.
1167 A: Yes, that is understood, as for sure, your SCSI-2 Fast/Wide adapter
1168 was in such a case recognized as integrated SCSI-adapter or something
1169 else, but not as the correct adapter. As the I/O-ports get assigned
1170 wrongly by that reason, the system should crash in most cases. You
1171 should upgrade to the latest release of the SCSI-driver. The
1172 recommended version is 3.2 or later. Here, the F/W support is in
1173 a stable and reliable condition. Wide-addressing is in addition
1174 supported.
1175 Q: I get an Oops message and something like "killing interrupt".
1176 A: The reason for this is that the IBM SCSI-subsystem only sends a
1177 termination status back, if some error appeared. In former releases
1178 of the driver, it was not checked whether the termination status block
1179 is NULL. Since version 3.2, the driver performs this check.
1180 Q: I have a F/W adapter and the driver sees my internal SCSI-devices,
1181 but ignores the external ones.
1182 A: Select combined busmode in the IBM config-program and check that
1183 no SCSI-id used by an external device is also used by an internal device.
1184 Reboot afterwards. Dual busmode is supported, but so far works only for
1185 the internal bus; the external bus is still ignored. Take care with your
1186 SCSI-ids. If combined bus-mode is activated, on some adapters,
1187 the wide-addressing is not possible, so devices with ids between 8
1188 and 15 get ignored by the driver & adapter!
1189 Q: I have a 9595 and I get a NMI during heavy SCSI I/O e.g. during fsck.
1190 A COMMAND ERROR is reported and characters on the screen are missing.
1191 Warm reboot is not possible. Things look quite weird.
1192 A: Check the processor type of your 9595. If you have an 80486 or 486DX-2
1193 processor complex on your mainboard and you compiled a kernel that
1194 supports 80386 processors, it is possible, that the kernel cannot
1195 keep track of the PS/2 interrupt handling and stops on an NMI. Just
1196 compile a kernel for the correct processor type of your PS/2 and
1197 everything should be fine. This is necessary even if one assumes,
1198 that some 80486 system should be downward compatible to 80386
1199 software.
1200 Q: Some commands hang and interrupts block the machine. After some
1201 timeout, the syslog reports that it tries to call abort, but the
1202 machine is frozen.
1203 A: This can be a busy wait bug in the interrupt handler of driver
1204 version 3.2. You should at least upgrade to 3.2c if you use
1205 kernel < 2.4.0 and driver version 4.0 if you use kernel 2.4.0 or
1206 later (including all test releases).
1207 Q: I have a PS/2 model 80 and more than 16 MBytes of RAM. The driver
1208 completely refuses to work, reports NMIs, COMMAND ERRORs or other
1209 ambiguous stuff. When reducing the RAM size down below 16 MB,
1210 everything is running smoothly.
1211 A: No real answer, yet. In any case, one should force the kernel to
1212 present SCBs only below the 16 MBytes barrier. Maybe this solves the
1213 problem. Not yet tried, but guessing that it could work. To get this,
1214 set unchecked_isa_dma argument of ibmmca.h from 0 to 1.
1215
1216 5.3 Bug reports
1217 ---------------
1218 If you really find bugs in the source code or the driver will successfully
1219 refuse to work on your machine, you should send a bug report to me. The
1220 best for this is to follow the instructions on the WWW-page for this
1221 driver. Fill out the bug-report form, placed on the WWW-page and ship it,
1222 so the bugs can be taken into account with maximum efforts. But, please
1223 do not send bug reports about this driver to Linus Torvalds or Leonard
1224 Zubkoff, as Linus is buried in E-Mail and Leonard is supervising all
1225 SCSI-drivers and won't have the time left to look inside every single
1226 driver to fix a bug and especially DO NOT send modified code to Linus
1227 Torvalds or Alan J. Cox which has not been checked here!!! They are both
1228 quite buried in E-mail (as me, sometimes, too) and one should first check
1229 for problems on my local teststand. Recently, I got a lot of
1230 bug reports for errors in the ibmmca.c code, which I could not imagine, but
1231 a look inside some Linux-distribution showed me quite often some modified
1232 code, which did no longer work on most other machines than the one of the
1233 modifier. Ok, so now that there is maintenance service available for this
1234 driver, please use this address first in order to keep the level of
1235 confusion low. Thank you!
1236
1237 When you get a SCSI-error message that panics your system, a list of
1238 register-entries of the SCSI-subsystem is shown (from Version 3.1d). With
1239 this list, it is very easy for the maintainer to localize the problem in
1240 the driver or in the configuration of the user. Please write down all the
1241 values from this report and send them to the maintainer. This would really
1242 help a lot and makes life easier concerning misunderstandings.
1243
1244 Use the bug-report form (see 5.4 for its address) to send all the bug-
1245 stuff to the maintainer or write e-mail with the values from the table.
1246
1247 5.4 Support WWW-page
1248 --------------------
1249 The address of the IBM SCSI-subsystem supporting WWW-page is:
1250
1251 http://www.staff.uni-mainz.de/mlang/linux.html
1252
1253 Here you can find info about the background of this driver, patches,
1254 troubleshooting support, news and a bugreport form. Please check that
1255 WWW-page regularly for latest hints. If ever this URL changes, please
1256 refer to the MAINTAINERS file in order to get the latest address.
1257
1258 For the bugreport, please fill out the form on the corresponding
1259 WWW-page. Read the dedicated instructions and write as much as you
1260 know about your problem. If you do not like such forms, please send
1261 some e-mail directly, but at least with the same information as required by
1262 the form.
1263
1264 If you have extensive bug reports, including Oops messages and
1265 screen-shots, please feel free to send them directly to the address
1266 of the maintainer, too. The current address of the maintainer is:
1267
1268 Michael Lang <langa2@kph.uni-mainz.de>
1269
1270 6 References
1271 ------------
1272 IBM Corp., "Update for the PS/2 Hardware Interface Technical Reference,
1273 Common Interfaces", Armonk, September 1991, PN 04G3281,
1274 (available in the U.S. for $21.75 at 1-800-IBM-PCTB or in Germany for
1275 around 40,-DM at "Hallo IBM").
1276
1277 IBM Corp., "Personal System/2 Micro Channel SCSI
1278 Adapter with Cache Technical Reference", Armonk, March 1990, PN 68X2365.
1279
1280 IBM Corp., "Personal System/2 Micro Channel SCSI
1281 Adapter Technical Reference", Armonk, March 1990, PN 68X2397.
1282
1283 IBM Corp., "SCSI-2 Fast/Wide Adapter/A Technical Reference - Dual Bus",
1284 Armonk, March 1994, PN 83G7545.
1285
1286 Friedhelm Schmidt, "SCSI-Bus und IDE-Schnittstelle - Moderne Peripherie-
1287 Schnittstellen: Hardware, Protokollbeschreibung und Anwendung", 2. Aufl.
1288 Addison Wesley, 1996.
1289
1290 Michael K. Johnson, "The Linux Kernel Hackers' Guide", Version 0.6, Chapel
1291 Hill - North Carolina, 1995
1292
1293 Andreas Kaiser, "SCSI TAPE BACKUP for OS/2 2.0", Version 2.12, Stuttgart
1294 1993
1295
1296 Helmut Rompel, "IBM Computerwelt GUIDE", What is what bei IBM., Systeme *
1297 Programme * Begriffe, IWT-Verlag GmbH - Muenchen, 1988
1298
1299 7 Credits to
1300 ------------
1301 7.1 People
1302 ----------
1303 Klaus Grimm
1304 who already a long time ago gave me the old code from the
1305 SCSI-driver in order to get it running for some old machine
1306 in our institute.
1307 Martin Kolinek
1308 who wrote the first release of the IBM SCSI-subsystem driver.
1309 Chris Beauregard
1310 who for a long time maintained MCA-Linux and the SCSI-driver
1311 in the beginning. Chris, wherever you are: Cheers to you!
1312 Klaus Kudielka
1313 with whom in the 2.1.x times, I had a quite fruitful
1314 cooperation to get the driver running as a module and to get
1315 it running with multiple SCSI-adapters.
1316 David Weinehall
1317 for his excellent maintenance of the MCA-stuff and the quite
1318 detailed bug reports and ideas for this driver (and his
1319 patience ;-)).
1320 Alan J. Cox
1321 for his bug reports and his bold activities in cross-checking
1322 the driver-code with his teststand.
1323
1324 7.2 Sponsors & Supporters
1325 -------------------------
1326 "Hallo IBM",
1327 IBM-Deutschland GmbH
1328 the service of IBM-Deutschland for customers. Their E-Mail
1329 service is unbeatable. Whatever old stuff I asked for, I
1330 always got some helpful answers.
1331 Karl-Otto Reimers,
1332 IBM Klub - Sparte IBM Geschichte, Sindelfingen
1333 for sending me a copy of the w/Cache manual from the
1334 IBM-Deutschland archives.
1335 Harald Staiger
1336 for his extensive hardware donations, which still allow me today
1337 to test the driver in various configurations.
1338 Erich Fritscher
1339 for his very kind sponsoring.
1340 Louis Ohland,
1341 Charles Lasitter
1342 for support by shipping me an IBM SCSI-2 Fast/Wide manual.
1343 In addition, the contribution of various hardware is quite
1344 decisive and will make it possible to add FWSR (RAID)
1345 adapter support to the driver in the near future! So,
1346 complaints about no RAID support won't remain forever.
1347 Yes, folks, that is no joke, RAID support is going to rise!
1348 Erik Weber
1349 for the great deal we made about a model 9595 and the nice
1350 surrounding equipment and the cool trip to Mannheim
1351 second-hand computer market. In addition, I would like
1352 to thank him for his exhaustive SCSI-driver testing on his
1353 95er PS/2 park.
1354 Anthony Hogbin
1355 for his direct shipment of a SCSI F/W adapter, which allowed
1356 me immediately on the first stage to try it on model 8557
1357 together with onboard SCSI adapter and some SCSI w/Cache.
1358 Andreas Hotz
1359 for his support by memory and an IBM SCSI-adapter. Collecting
1360 all this together now really allows me to try things with
1361 the driver at maximum load and variety on various models in
1362 a very quick and efficient way.
1363 Peter Jennewein
1364 for his model 30, which serves me as part of my teststand
1365 and his cool remark about how you make an ordinary diskette
1366 drive working and how to connect it to an IBM-diskette port.
1367 Johannes Gutenberg-Universitaet, Mainz &
1368 Institut fuer Kernphysik, Mainz Microtron (MAMI)
1369 for the offered space, the link, placed on the central
1370 homepage and the space to store and offer the driver and
1371 related material and the free working times, which allow
1372 me to answer all your e-mail.
1373
1374 8 Trademarks
1375 ------------
1376 IBM, PS/2, OS/2, Microchannel are registered trademarks of International
1377 Business Machines Corporation
1378
1379 MS-DOS is a registered trademark of Microsoft Corporation
1380
1381 Microware, OS-9 are registered trademarks of Microware Systems
1382
1383 9 Disclaimer
1384 ------------
1385 Besides the GNU General Public License and its disclaimers, and the disclaimers
1386 concerning the Linux kernel in particular, this SCSI-driver comes without any
1387 warranty. Its functionality is tested as well as possible on certain
1388 machines and combinations of computer hardware, which does not exclude
1389 the possibility of data loss or severe hardware damage when using this
1390 software on some arbitrary computer hardware or in combination
1391 with other software packages. It is highly recommended to make backup
1392 copies of your data before using this software. Furthermore, personal
1393 injuries by hardware defects that could be caused by this SCSI-driver are
1394 not excluded, and it is highly recommended to handle this driver with
1395 the utmost care.
1396
1397 This driver supports hardware, produced by International Business Machines
1398 Corporation (IBM).
1399
1400------
1401Michael Lang
1402(langa2@kph.uni-mainz.de)
diff --git a/Documentation/serial/computone.txt b/Documentation/serial/computone.txt
new file mode 100644
index 00000000000..60a6f657c37
--- /dev/null
+++ b/Documentation/serial/computone.txt
@@ -0,0 +1,522 @@
1NOTE: This is an unmaintained driver. It is not guaranteed to work due to
2changes made in the tty layer in 2.6. If you wish to take over maintenance of
3this driver, contact Michael Warfield <mhw@wittsend.com>.
4
5Changelog:
6----------
711-01-2001: Original Document
8
910-29-2004: Minor misspelling & format fix, update status of driver.
10 James Nelson <james4765@gmail.com>
11
12Computone Intelliport II/Plus Multiport Serial Driver
13-----------------------------------------------------
14
15Release Notes For Linux Kernel 2.2 and higher.
16These notes are for the drivers which have already been integrated into the
17kernel and have been tested on Linux kernels 2.0, 2.2, 2.3, and 2.4.
18
19Version: 1.2.14
20Date: 11/01/2001
21Historical Author: Andrew Manison <amanison@america.net>
22Primary Author: Doug McNash
23Support: support@computone.com
24Fixes and Updates: Mike Warfield <mhw@wittsend.com>
25
26This file assumes that you are using the Computone drivers which are
27integrated into the kernel sources. For updating the drivers or installing
28drivers into kernels which do not already have Computone drivers, please
29refer to the instructions in the README.computone file in the driver patch.
30
31
321. INTRODUCTION
33
34This driver supports the entire family of Intelliport II/Plus controllers
35with the exception of the MicroChannel controllers. It does not support
36products previous to the Intelliport II.
37
38This driver was developed on the v2.0.x Linux tree and has been tested up
39to v2.4.14; it will probably not work with earlier v1.X kernels.
40
41
422. QUICK INSTALLATION
43
44Hardware - If you have an ISA card, find a free interrupt and io port.
45 List those in use with `cat /proc/interrupts` and
46 `cat /proc/ioports`. Set the card dip switches to a free
47 address. You may need to configure your BIOS to reserve an
48 irq for an ISA card. PCI and EISA parameters are set
49 automagically. Insert card into computer with the power off
50 before or after driver installation.
51
52 Note the hardware address from the Computone ISA cards installed into
53 the system. These are required for editing ip2.c or editing
54 /etc/modprobe.conf, or for specification on the modprobe
55 command line.
56
57 Note that the /etc/modules.conf should be used for older (pre-2.6)
58 kernels.
59
60Software -
61
62Module installation:
63
64a) Determine free irq/address to use if any (configure BIOS if need be)
65b) Run "make config" or "make menuconfig" or "make xconfig"
66 Select (m) module for CONFIG_COMPUTONE under character
67 devices. CONFIG_PCI and CONFIG_MODULES also may need to be set.
68c) Set address on ISA cards then:
69 edit /usr/src/linux/drivers/char/ip2.c if needed
70 or
71 edit /etc/modprobe.conf if needed (module).
72 or both to match this setting.
73d) Run "make modules"
74e) Run "make modules_install"
75f) Run "/sbin/depmod -a"
76g) install driver using `modprobe ip2 <options>` (options listed below)
77h) run ip2mkdev (either the script below or the binary version)
78
79
80Kernel installation:
81
82a) Determine free irq/address to use if any (configure BIOS if need be)
83b) Run "make config" or "make menuconfig" or "make xconfig"
84 Select (y) kernel for CONFIG_COMPUTONE under character
85 devices. CONFIG_PCI may need to be set if you have PCI bus.
86c) Set address on ISA cards then:
87 edit /usr/src/linux/drivers/char/ip2.c
88 (Optional - may be specified on kernel command line now)
89d) Run "make zImage" or whatever target you prefer.
90e) mv /usr/src/linux/arch/x86/boot/zImage to /boot.
91f) Add new config for this kernel into /etc/lilo.conf, run "lilo"
92 or copy to a floppy disk and boot from that floppy disk.
93g) Reboot using this kernel
94h) run ip2mkdev (either the script below or the binary version)
95
96Kernel command line options:
97
98When compiling the driver into the kernel, io and irq may be
99compiled into the driver by editing ip2.c and setting the values for
100io and irq in the appropriate array. An alternative is to specify
101a command line parameter to the kernel at boot up.
102
103 ip2=io0,irq0,io1,irq1,io2,irq2,io3,irq3
104
105Note that this order is very different from the specifications for the
106modload parameters which have separate IRQ and IO specifiers.
107
108The io port also selects PCI (1) and EISA (2) boards.
109
110 io=0 No board
111 io=1 PCI board
112 io=2 EISA board
113 else ISA board io address
114
115You only need to specify the boards which are present.
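
The meaning of the io value in the table above can be restated as a small
shell function. This is a sketch for illustration only; ip2_board_type is a
hypothetical name and is not part of the driver or its utilities.

```shell
#!/bin/sh
# Sketch: interpret one "io" value from the ip2= parameter as described
# in the table above. ip2_board_type is a hypothetical helper.
ip2_board_type() {
    case $1 in
        0) echo "none" ;;          # no board in this slot
        1) echo "PCI" ;;           # PCI board, resources assigned automatically
        2) echo "EISA" ;;          # EISA board, resources assigned automatically
        *) echo "ISA at $1" ;;     # anything else is taken as an ISA io address
    esac
}

for io in 1 2 0x310 0; do
    ip2_board_type "$io"
done
```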
116
117 Examples:
118
119 2 PCI boards:
120
121 ip2=1,0,1,0
122
123 1 ISA board at 0x310 irq 5:
124
125 ip2=0x310,5
126
127This can be added to an "append" option in lilo.conf similar to this:
128
129 append="ip2=1,0,1,0"
130
131
1323. INSTALLATION
133
134Previously, the driver sources were packaged with a set of patch files
135to update the character drivers' makefile and configuration file, and other
136kernel source files. A build script (ip2build) was included which applied
137the patches if needed and built any utilities needed.
138What you receive may instead be a single patch file in conventional kernel
139patch format rather than the build script. That form can be applied by
140running patch -p1 < ThePatchFile. Otherwise run ip2build.
141
142The driver can be installed as a module (recommended) or built into the
143kernel. This is selected as for other drivers through the `make config`
144command from the root of the Linux source tree. If the driver is built
145into the kernel you will need to edit the file ip2.c to match the boards
146you are installing. See that file for instructions. If the driver is
147installed as a module the configuration can also be specified on the
148modprobe command line as follows:
149
150 modprobe ip2 irq=irq1,irq2,irq3,irq4 io=addr1,addr2,addr3,addr4
151
152where irq1-4 are valid Intelliport II interrupts (3,4,5,7,10,11,
15312,15) and addr1-4 are the base addresses for up to four controllers. If
154the irqs are not specified the driver uses the default in ip2.c (which
155selects polled mode). If no base addresses are specified the defaults in
156ip2.c are used. If you are autoloading the driver module with kerneld or
157kmod, either set the base addresses and interrupt numbers in ip2.c and
158recompile, or insert an options line in /etc/modprobe.conf, or do both.
159The options line is equivalent to the command line and takes precedence over
160what is in ip2.c.
161
162/etc/modprobe.conf sample:
163 options ip2 io=1,0x328 irq=1,10
164 alias char-major-71 ip2
165 alias char-major-72 ip2
166 alias char-major-73 ip2
167
168The equivalent in ip2.c:
169
170static int io[IP2_MAX_BOARDS]= { 1, 0x328, 0, 0 };
171static int irq[IP2_MAX_BOARDS] = { 1, 10, -1, -1 };
172
173The equivalent for the kernel command line (in lilo.conf):
174
175 append="ip2=1,1,0x328,10"
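
Because the kernel command line interleaves io and irq values while the
module options keep them in separate lists, converting between the two forms
by hand is error-prone. The following shell sketch shows one way to do the
conversion; ip2_cmdline is a hypothetical helper name, and it assumes both
lists have the same number of comma-separated entries.

```shell
#!/bin/sh
# Sketch: turn module-style lists (io=..., irq=...) into the interleaved
# ip2=io0,irq0,io1,irq1,... form. ip2_cmdline is a hypothetical helper.
ip2_cmdline() {
    io_list=$1; irq_list=$2
    out=
    old_ifs=$IFS
    IFS=,                         # split both lists on commas
    set -- $io_list               # positional parameters hold the io values
    for irq in $irq_list; do
        [ $# -gt 0 ] || break     # stop if there are more irqs than io values
        out="${out:+$out,}$1,$irq"
        shift
    done
    IFS=$old_ifs
    printf 'ip2=%s\n' "$out"
}

# Values taken from the /etc/modprobe.conf sample above.
ip2_cmdline 1,0x328 1,10    # prints ip2=1,1,0x328,10
```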
176
177
178Note: Both io and irq should be updated to reflect YOUR system. An "io"
179 address of 1 or 2 indicates a PCI or EISA card in the board table.
180 The PCI or EISA irq will be assigned automatically.
181
182Specifying an invalid or in-use irq will default the driver into
183running in polled mode for that card. If all irq entries are 0 then
184all cards will operate in polled mode.
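
As a rough pre-check for ISA cards, an irq value can be compared against the
list of valid Intelliport II interrupts given earlier. The sketch below only
tests membership in that list; it cannot tell whether an irq is already in
use, and the name ip2_irq_mode is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether an irq value for an ISA card is one of the valid
# Intelliport II interrupts. Values outside the list (including 0) would
# leave the card in polled mode. This does not detect irqs already in use.
ip2_irq_mode() {
    case $1 in
        3|4|5|7|10|11|12|15) echo interrupt ;;
        *) echo polled ;;
    esac
}

for irq in 0 5 9 10; do
    printf 'irq %s: %s\n' "$irq" "$(ip2_irq_mode "$irq")"
done
```

In-use interrupts can be listed with `cat /proc/interrupts`, as noted in the
quick installation section.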
185
186If you select the driver as part of the kernel run :
187
188 make zlilo (or whatever you do to create a bootable kernel)
189
190If you selected a module run :
191
192 make modules && make modules_install
193
194The utility ip2mkdev (see 5 and 7 below) creates all the device nodes
195required by the driver. For a device to be created it must be configured
196in the driver and the board must be installed. Only devices corresponding
197to real IntelliPort II ports are created. With multiple boards and expansion
198boxes this will leave gaps in the sequence of device names. ip2mkdev uses
199Linux tty naming conventions: ttyF0 - ttyF255 for normal devices, and
200cuf0 - cuf255 for callout devices.
201
202
2034. USING THE DRIVERS
204
205As noted above, the driver implements the ports in accordance with Linux
206conventions, and the devices should be interchangeable with the standard
207serial devices. (This is a key point for problem reporting: please make
208sure that what you are trying to do works on the ttySx/cuax ports first; then
209tell us what went wrong with the ip2 ports!)
210
211Higher speeds can be obtained using the setserial utility which remaps
21238,400 bps (extb) to 57,600 bps, 115,200 bps, or a custom speed.
213Intelliport II installations using the PowerPort expansion module can
214use the custom speed setting to select the highest speeds: 153,600 bps,
215230,400 bps, 307,200 bps, 460,800 bps and 921,600 bps. The base for
216custom baud rate configuration is fixed at 921,600 for cards/expansion
217modules with ST654's and 115200 for those with Cirrus CD1400's. This
218corresponds to the maximum bit rates those chips are capable of.
219For example if the baud base is 921600 and the baud divisor is 18 then
220the custom rate is 921600/18 = 51200 bps. See the setserial man page for
221complete details. Of course if stty accepts the higher rates now you can
222use that as well as the standard ioctls().
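
The custom rate arithmetic above is a plain integer division, which the
following shell sketch reproduces. The function names custom_rate and
divisor_for are made up for this example. The setserial invocation in the
comment is an assumption about typical usage (the spd_cust and divisor
options); check the setserial man page before relying on it.

```shell
#!/bin/sh
# Sketch: custom baud rate arithmetic. Both helpers are hypothetical.

# custom_rate BASE DIVISOR -> resulting rate in bps
custom_rate() {
    echo $(( $1 / $2 ))
}

# divisor_for BASE RATE -> divisor needed to reach RATE
divisor_for() {
    echo $(( $1 / $2 ))
}

custom_rate 921600 18        # prints 51200
divisor_for 921600 230400    # prints 4

# Assumed usage, not verified against this driver:
#   setserial /dev/ttyF0 spd_cust divisor 4
# would then make a request for 38,400 bps select 230,400 bps on a port
# whose baud base is 921600.
```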
223
224
2255. ip2mkdev and assorted utilities...
226
227Several utilities, including the source for a binary ip2mkdev utility, are
228available under .../drivers/char/ip2. These can be built by changing to
229that directory and typing "make" after the kernel has been built. If you do
230not wish to compile the binary utilities, the shell script below can be
231cut out and run as "ip2mkdev" to create the necessary device files. To
232use the ip2mkdev script, you must have procfs enabled and the proc file
233system mounted on /proc.
234
235
2366. NOTES
237
238This is a release version of the driver, but it is impossible to test it
239in all configurations of Linux. If there is any anomalous behaviour that
240does not match the standard serial port's behaviour please let us know.
241
242
2437. ip2mkdev shell script
244
245Previously, this script was simply attached here. It is now attached as a
246shar archive to make it easier to extract the script from the documentation.
247To create the ip2mkdev shell script change to a convenient directory (/tmp
248works just fine) and run the following command:
249
250 unshar Documentation/serial/computone.txt
251 (This file)
252
253You should now have a file ip2mkdev in your current working directory with
254permissions set to execute. Running that script will then create the
255necessary devices for the Computone boards, interfaces, and ports which
256are present on your system at the time it is run.
257
258
259#!/bin/sh
260# This is a shell archive (produced by GNU sharutils 4.2.1).
261# To extract the files from this archive, save it to some FILE, remove
262# everything before the `!/bin/sh' line above, then type `sh FILE'.
263#
264# Made on 2001-10-29 10:32 EST by <mhw@alcove.wittsend.com>.
265# Source directory was `/home2/src/tmp'.
266#
267# Existing files will *not* be overwritten unless `-c' is specified.
268#
269# This shar contains:
270# length mode name
271# ------ ---------- ------------------------------------------
272# 4251 -rwxr-xr-x ip2mkdev
273#
274save_IFS="${IFS}"
275IFS="${IFS}:"
276gettext_dir=FAILED
277locale_dir=FAILED
278first_param="$1"
279for dir in $PATH
280do
281 if test "$gettext_dir" = FAILED && test -f $dir/gettext \
282 && ($dir/gettext --version >/dev/null 2>&1)
283 then
284 set `$dir/gettext --version 2>&1`
285 if test "$3" = GNU
286 then
287 gettext_dir=$dir
288 fi
289 fi
290 if test "$locale_dir" = FAILED && test -f $dir/shar \
291 && ($dir/shar --print-text-domain-dir >/dev/null 2>&1)
292 then
293 locale_dir=`$dir/shar --print-text-domain-dir`
294 fi
295done
296IFS="$save_IFS"
297if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED
298then
299 echo=echo
300else
301 TEXTDOMAINDIR=$locale_dir
302 export TEXTDOMAINDIR
303 TEXTDOMAIN=sharutils
304 export TEXTDOMAIN
305 echo="$gettext_dir/gettext -s"
306fi
307if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then
308 shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"'
309elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then
310 shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"'
311elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then
312 shar_touch='touch -am $3$4$5$6$2 "$8"'
313else
314 shar_touch=:
315 echo
316 $echo 'WARNING: not restoring timestamps. Consider getting and'
317 $echo "installing GNU \`touch', distributed in GNU File Utilities..."
318 echo
319fi
320rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch
321#
322if mkdir _sh17581; then
323 $echo 'x -' 'creating lock directory'
324else
325 $echo 'failed to create lock directory'
326 exit 1
327fi
328# ============= ip2mkdev ==============
329if test -f 'ip2mkdev' && test "$first_param" != -c; then
330 $echo 'x -' SKIPPING 'ip2mkdev' '(file already exists)'
331else
332 $echo 'x -' extracting 'ip2mkdev' '(text)'
333 sed 's/^X//' << 'SHAR_EOF' > 'ip2mkdev' &&
334#!/bin/sh -
335#
336# ip2mkdev
337#
338# Make or remove devices as needed for Computone Intelliport drivers
339#
340# First rule! If the dev file exists and you need it, don't mess
341# with it. That prevents us from screwing up open ttys, ownership
342# and permissions on a running system!
343#
344# This script will NOT remove devices that no longer exist if their
345# board or interface box has been removed. If you want to get rid
346# of them, you can manually do an "rm -f /dev/ttyF* /dev/cuaf*"
347# before running this script. Running this script will then recreate
348# all the valid devices.
349#
350# Michael H. Warfield
351# /\/\|=mhw=|\/\/
352# mhw@wittsend.com
353#
354# Updated 10/29/2000 for version 1.2.13 naming convention
355# under devfs. /\/\|=mhw=|\/\/
356#
357# Updated 03/09/2000 for devfs support in ip2 drivers. /\/\|=mhw=|\/\/
358#
359X
360if test -d /dev/ip2 ; then
361# This is devfs mode... We don't do anything except create symlinks
362# from the real devices to the old names!
363X cd /dev
364X echo "Creating symbolic links to devfs devices"
365X for i in `ls ip2` ; do
366X if test ! -L ip2$i ; then
367X # Remove it incase it wasn't a symlink (old device)
368X rm -f ip2$i
369X ln -s ip2/$i ip2$i
370X fi
371X done
372X for i in `( cd tts ; ls F* )` ; do
373X if test ! -L tty$i ; then
374X # Remove it incase it wasn't a symlink (old device)
375X rm -f tty$i
376X ln -s tts/$i tty$i
377X fi
378X done
379X for i in `( cd cua ; ls F* )` ; do
380X DEVNUMBER=`expr $i : 'F\(.*\)'`
381X if test ! -L cuf$DEVNUMBER ; then
382X # Remove it incase it wasn't a symlink (old device)
383X rm -f cuf$DEVNUMBER
384X ln -s cua/$i cuf$DEVNUMBER
385X fi
386X done
387X exit 0
388fi
389X
390if test ! -f /proc/tty/drivers
391then
392X echo "\
393Unable to check driver status.
394Make sure proc file system is mounted."
395X
396X exit 255
397fi
398X
399if test ! -f /proc/tty/driver/ip2
400then
401X echo "\
402Unable to locate ip2 proc file.
403Attempting to load driver"
404X
405X if /sbin/insmod ip2
406X then
407X if test ! -f /proc/tty/driver/ip2
408X then
409X echo "\
410Unable to locate ip2 proc file after loading driver.
411Driver initialization failure or driver version error.
412"
413X exit 255
414X fi
415X else
416X echo "Unable to load ip2 driver."
417X exit 255
418X fi
419fi
420X
421# Ok... So we got the driver loaded and we can locate the procfs files.
422# Next we need our major numbers.
423X
424TTYMAJOR=`sed -e '/^ip2/!d' -e '/\/dev\/tt/!d' -e 's/.*tt[^ ]*[ ]*\([0-9]*\)[ ]*.*/\1/' < /proc/tty/drivers`
425CUAMAJOR=`sed -e '/^ip2/!d' -e '/\/dev\/cu/!d' -e 's/.*cu[^ ]*[ ]*\([0-9]*\)[ ]*.*/\1/' < /proc/tty/drivers`
426BRDMAJOR=`sed -e '/^Driver: /!d' -e 's/.*IMajor=\([0-9]*\)[ ]*.*/\1/' < /proc/tty/driver/ip2`
427X
428echo "\
429TTYMAJOR = $TTYMAJOR
430CUAMAJOR = $CUAMAJOR
431BRDMAJOR = $BRDMAJOR
432"
433X
434# Ok... Now we should know our major numbers, if appropriate...
435# Now we need our boards and start the device loops.
436X
437grep '^Board [0-9]:' /proc/tty/driver/ip2 | while read token number type alltherest
438do
439X # The test for blank "type" will catch the stats lead-in lines
440X # if they exist in the file
441X if test "$type" = "vacant" -o "$type" = "Vacant" -o "$type" = ""
442X then
443X continue
444X fi
445X
446X BOARDNO=`expr "$number" : '\([0-9]\):'`
447X PORTS=`expr "$alltherest" : '.*ports=\([0-9]*\)' | tr ',' ' '`
448X MINORS=`expr "$alltherest" : '.*minors=\([0-9,]*\)' | tr ',' ' '`
449X
450X if test "$BOARDNO" = "" -o "$PORTS" = ""
451X then
452# This may be a bug. We should at least get this much information
453X echo "Unable to process board line"
454X continue
455X fi
456X
457X if test "$MINORS" = ""
458X then
459# Silently skip this one. This board seems to have no boxes
460X continue
461X fi
462X
463X echo "board $BOARDNO: $type ports = $PORTS; port numbers = $MINORS"
464X
465X if test "$BRDMAJOR" != ""
466X then
467X BRDMINOR=`expr $BOARDNO \* 4`
468X STSMINOR=`expr $BRDMINOR + 1`
469X if test ! -c /dev/ip2ipl$BOARDNO ; then
470X mknod /dev/ip2ipl$BOARDNO c $BRDMAJOR $BRDMINOR
471X fi
472X if test ! -c /dev/ip2stat$BOARDNO ; then
473X mknod /dev/ip2stat$BOARDNO c $BRDMAJOR $STSMINOR
474X fi
475X fi
476X
477X if test "$TTYMAJOR" != ""
478X then
479X PORTNO=$BOARDBASE
480X
481X for PORTNO in $MINORS
482X do
483X if test ! -c /dev/ttyF$PORTNO ; then
484X # We got the hardware but no device - make it
485X mknod /dev/ttyF$PORTNO c $TTYMAJOR $PORTNO
486X fi
487X done
488X fi
489X
490X if test "$CUAMAJOR" != ""
491X then
492X PORTNO=$BOARDBASE
493X
494X for PORTNO in $MINORS
495X do
496X if test ! -c /dev/cuf$PORTNO ; then
497X # We got the hardware but no device - make it
498X mknod /dev/cuf$PORTNO c $CUAMAJOR $PORTNO
499X fi
500X done
501X fi
502done
503X
504Xexit 0
505SHAR_EOF
506 (set 20 01 10 29 10 32 01 'ip2mkdev'; eval "$shar_touch") &&
507 chmod 0755 'ip2mkdev' ||
508 $echo 'restore of' 'ip2mkdev' 'failed'
509 if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
510 && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
511 md5sum -c << SHAR_EOF >/dev/null 2>&1 \
512 || $echo 'ip2mkdev:' 'MD5 check failed'
513cb5717134509f38bad9fde6b1f79b4a4 ip2mkdev
514SHAR_EOF
515 else
516 shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'ip2mkdev'`"
517 test 4251 -eq "$shar_count" ||
518 $echo 'ip2mkdev:' 'original size' '4251,' 'current size' "$shar_count!"
519 fi
520fi
521rm -fr _sh17581
522exit 0
diff --git a/Documentation/sparc/README-2.5 b/Documentation/sparc/README-2.5
new file mode 100644
index 00000000000..806fe490a56
--- /dev/null
+++ b/Documentation/sparc/README-2.5
@@ -0,0 +1,46 @@
1BTFIXUP
2-------
3
4To build new kernels you have to issue "make image". The ready kernel
5in ELF format is placed in arch/sparc/boot/image. Explanation is below.
6
7BTFIXUP is a unique feature of Linux/sparc among other architectures,
8developed by Jakub Jelinek (I think... Obviously David S. Miller took
9part, too). It allows the same kernel to boot on different
10sub-architectures, such as sun4c, sun4m, sun4d, where SunOS uses
11different kernels. This feature is convenient for people who move
12disks between boxes and for distribution builders.
13
14To function, BTFIXUP must link the kernel "in the draft" first,
15analyze the result, write a special stub code based on that, and
16build the final kernel with the stub (btfix.o).
17
18Kai Germaschewski improved the build system of the kernel in the 2.5 series
19significantly. Unfortunately, the traditional way of running the draft
20linking from architecture specific Makefile before the actual linking
21by generic Makefile is nearly impossible to support properly in the
22new build system. Therefore, the way we integrate BTFIXUP with the
23build system was changed in 2.5.40. Now, generic Makefile performs
24the draft linking and stores the result in file vmlinux. Architecture
25specific post-processing invokes BTFIXUP machinery and final linking
26in the same way as other architectures do bootstraps.
27
28Implications of that change are as follows.
29
301. Hackers must type "make image" now, instead of just "make", in the same
31 way as s390 people do now. It is analogous to "make bzImage" on i386.
32 This does NOT affect sparc64, you continue to use "make" to build sparc64
33 kernels.
34
352. vmlinux is not the final kernel, so RPM builders have to adjust
36 their spec files (if they delivered vmlinux for debugging).
37 System.map generated for vmlinux is still valid.
38
393. Scripts that produce a.out images have to be changed. First, if they
40 invoke make, they have to use "make image". Second, they have to pick up
41 the new kernel in arch/sparc/boot/image instead of vmlinux.
42
434. Since we are compliant with Kai's build system now, make -j is permitted.
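
As a sketch of point 3, a packaging script for 32-bit sparc might wrap
the build as shown here. The function name build_sparc32_image and its two
arguments are made up for illustration; only "make image" and the
arch/sparc/boot/image path come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: build a 32-bit sparc kernel and copy out the
# final image. Only "make image" and arch/sparc/boot/image come from
# the README; the function name and arguments are illustrative.
build_sparc32_image() {
    tree=$1      # top of the kernel source tree
    dest=$2      # where to copy the finished kernel

    # "make image" covers the draft link, BTFIXUP and the final link.
    make -C "$tree" image || return 1

    # vmlinux is only the draft link; the bootable kernel is here.
    cp "$tree/arch/sparc/boot/image" "$dest"
}
```

A caller would run something like "build_sparc32_image /usr/src/linux
/boot/image-new", where both paths are placeholders. For sparc64 the
plain "make" invocation still applies, so no such wrapper is needed.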
44
45-- Pete Zaitcev
46zaitcev@yahoo.com
diff --git a/Documentation/telephony/00-INDEX b/Documentation/telephony/00-INDEX
new file mode 100644
index 00000000000..4ffe0ed5b6f
--- /dev/null
+++ b/Documentation/telephony/00-INDEX
@@ -0,0 +1,4 @@
100-INDEX
2 - this file.
3ixj.txt
4 - document describing the Quicknet drivers.
diff --git a/Documentation/telephony/ixj.txt b/Documentation/telephony/ixj.txt
new file mode 100644
index 00000000000..db94fb6c567
--- /dev/null
+++ b/Documentation/telephony/ixj.txt
@@ -0,0 +1,394 @@
1Linux Quicknet-Drivers-Howto
2Quicknet Technologies, Inc. (www.quicknet.net)
3Version 0.3.4 December 18, 1999
4
51.0 Introduction
6
7This document describes the first GPL release version of the Linux
8driver for the Quicknet Internet PhoneJACK and Internet LineJACK
9cards. More information about these cards is available at
10www.quicknet.net. The driver version discussed in this document is
110.3.4.
12
13These cards offer nice telco style interfaces to use your standard
14telephone/key system/PBX as the user interface for VoIP applications.
15The Internet LineJACK also offers PSTN connectivity for a single line
16Internet to PSTN gateway. Of course, you can add more than one card
17to a system to obtain multi-line functionality. At this time, the
18driver supports the POTS port on both the Internet PhoneJACK and the
19Internet LineJACK, but the PSTN port on the latter card is not yet
20supported.
21
22This document, and the drivers for the cards, are intended for a
23limited audience that includes technically capable programmers who
24would like to experiment with Quicknet cards. The drivers are
25considered in ALPHA status and are not yet considered stable enough
26for general, widespread use in an unlimited audience.
27
28That's worth saying again:
29
30THE LINUX DRIVERS FOR QUICKNET CARDS ARE PRESENTLY IN AN ALPHA STATE
31AND SHOULD NOT BE CONSIDERED AS READY FOR NORMAL WIDESPREAD USE.
32
33They are released early in the spirit of Internet development and to
34make this technology available to innovators who would benefit from
35early exposure.
36
37When we promote the device driver to "beta" level it will be
38considered ready for non-programmer, non-technical users. Until then,
39please be aware that these drivers may not be stable and may affect
40the performance of your system.
41
42
431.1 Latest Additions/Improvements
44
45The 0.3.4 version of the driver is the first GPL release. Several
46features had to be removed from the prior binary only module, mostly
47for reasons of Intellectual Property rights. We can't release
48information that is not ours - so certain aspects of the driver had to
49be removed to protect the rights of others.
50
51Specifically, very old Internet PhoneJACK cards have non-standard
52G.723.1 codecs (due to the early nature of the DSPs in those days).
53The auto-conversion code to bring those cards into compliance with
54today's standards is available as a binary only module to those people
55needing it. If you bought your card after 1997 or so, you are OK -
56it's only the very old cards that are affected.
57
58Also, the code to download G.728/G.729/G.729a codecs to the DSP is
59available as a binary only module as well. This IP is not ours to
60release.
61
62Hooks are built into the GPL driver to allow it to work with other
63companion modules that are completely separate from this module.
64
651.2 Copyright, Trademarks, Disclaimer, & Credits
66
67Copyright
68
69Copyright (c) 1999 Quicknet Technologies, Inc. Permission is granted
70to freely copy and distribute this document provided you preserve it
71in its original form. For corrections and minor changes contact the
72maintainer at linux@quicknet.net.
73
74Trademarks
75
76Internet PhoneJACK and Internet LineJACK are registered trademarks of
77Quicknet Technologies, Inc.
78
79Disclaimer
80
81Much of the info in this HOWTO is early information released by
82Quicknet Technologies, Inc. for the express purpose of allowing early
83testing and use of the Linux drivers developed for their products.
84While every attempt has been made to be thorough, complete and
85accurate, the information contained here may be unreliable and there
86are likely a number of errors in this document. Please let the
87maintainer know about them. Since this is free documentation, it
88should be obvious that neither I nor previous authors can be held
89legally responsible for any errors.
90
91Credits
92
93This HOWTO was written by:
94
95 Greg Herlein <gherlein@quicknet.net>
96 Ed Okerson <eokerson@quicknet.net>
97
981.3 Future Plans: You Can Help
99
100Please let the maintainer know of any errors in facts, opinions,
101logic, spelling, grammar, clarity, links, etc. But first, if the date
102is over a month old, check to see that you have the latest
103version. Please send any info that you think belongs in this document.
104
105You can also contribute code and/or bug-fixes for the sample
106applications.
107
108
1091.4 Where to get things
110
111Info on the latest versions of the driver is here:
112
113http://web.archive.org/web/*/http://www.quicknet.net/develop.htm
114
1151.5 Mailing List
116
117Quicknet operates a mailing list to provide a public forum on using
118these drivers.
119
120To subscribe to the linux-sdk mailing list, send an email to:
121
122 majordomo@linux.quicknet.net
123
124In the body of the email, type:
125
126 subscribe linux-sdk <your-email-address>
127
128Please delete any signature block that you would normally add to the
129bottom of your email - it tends to confuse majordomo.
130
131To send mail to the list, address your mail to
132
133 linux-sdk@linux.quicknet.net
134
135Your message will go out to everyone on the list.
136
137To unsubscribe from the linux-sdk mailing list, send an email to:
138
139 majordomo@linux.quicknet.net
140
141In the body of the email, type:
142
143 unsubscribe linux-sdk <your-email-address>
144
145
146
1472.0 Requirements
148
1492.1 Quicknet Card(s)
150
151You will need at least one Internet PhoneJACK or Internet LineJACK
152card. These are ISA or PCI bus devices that use Plug-n-Play for
153configuration, and use no IRQs. The driver will support up to 16
154cards in any one system, of any mix between the two types.
155
156Note that you will need two cards to do any useful testing alone, since
157you will need a card on both ends of the connection. Of course, if
158you are doing collaborative work, perhaps your friends or coworkers
159have cards too. If not, we'll gladly sell them some!
160
161
1622.2 ISAPNP
163
164Since the Quicknet cards are Plug-n-Play devices, you will need the
165isapnp tools package to configure the cards, or you can use the isapnp
166module to autoconfigure them. The former package probably came with
167your Linux distribution. Documentation on this package is available
168online at:
169
170http://mailer.wiwi.uni-marburg.de/linux/LDP/HOWTO/Plug-and-Play-HOWTO.html
171
172The isapnp autoconfiguration is available on the Quicknet website at:
173
174 http://www.quicknet.net/develop.htm
175
176though it may be in the kernel by the time you read this.
177
178
1793.0 Card Configuration
180
181If you did not get your drivers as part of the linux kernel, do the
182following to install them:
183
184 a. untar the distribution file. We use the following command:
185 tar -xvzf ixj-0.x.x.tgz
186
187This creates a subdirectory holding all the necessary files. Go to that
188subdirectory.
189
190 b. run the "ixj_dev_create" script to remove any stray device
191files left in the /dev directory, and to create the new officially
192designated device files. Note that the old devices were called
193/dev/ixj, and the new method uses /dev/phone.
194
195 c. type "make;make install" - this will compile and install the
196module.
197
198 d. type "depmod -av" to rebuild all your kernel version dependencies.
199
200 e. if you are using the isapnp module to configure the cards
201 automatically, then skip to step f. Otherwise, ensure that you
202 have run the isapnp configuration utility to properly configure
203 the cards.
204
205 e1. The Internet PhoneJACK has one configuration register that
206 requires 16 IO ports. The Internet LineJACK card has two
207 configuration registers and isapnp reports that IO 0
208 requires 16 IO ports and IO 1 requires 8. The Quicknet
209 driver assumes that these registers are configured to be
210 contiguous, i.e. if IO 0 is set to 0x340 then IO 1 should
211 be set to 0x350.
212
213 Make sure that none of the cards overlap if you have
214 multiple cards in the system.
215
216 If you are new to the isapnp tools, you can jumpstart
217 yourself by doing the following:
218
219 e2. go to the /etc directory and run pnpdump to get a blank
220 isapnp.conf file.
221
222 pnpdump > /etc/isapnp.conf
223
224 e3. edit the /etc/isapnp.conf file to set the IO warnings and
225 the register IO addresses. The IO warnings means that you
226 should find the line in the file that looks like this:
227
228 (CONFLICT (IO FATAL)(IRQ FATAL)(DMA FATAL)(MEM FATAL)) # or WARNING
229
230 and you should edit the line to look like this:
231
232 (CONFLICT (IO WARNING)(IRQ FATAL)(DMA FATAL)(MEM FATAL)) #
233 or WARNING
234
235 The next step is to set the IO port addresses. The issue
236 here is that isapnp does not identify all of the ports out
237 there. Specifically any device that does not have a driver
238 or module loaded by Linux will not be registered. This
239 includes older sound cards and network cards. We have
240 found that the IO port 0x300 is often used even though
241 isapnp claims that no-one is using those ports. We
242 recommend that for a single card installation that port
243 0x340 (and 0x350) be used. The IO port line should change
244 from this:
245
246 (IO 0 (SIZE 16) (BASE 0x0300) (CHECK))
247
248 to this:
249
250 (IO 0 (SIZE 16) (BASE 0x0340) )
251
252 e4. if you have multiple Quicknet cards, make sure that you do
253 not have any overlaps. Be especially careful if you are
254 mixing Internet PhoneJACK and Internet LineJACK cards in
255 the same system. In these cases we recommend moving the
256 IO port addresses to the 0x400 block. Please note that on
257 a few machines the 0x400 series are used. Feel free to
258 experiment with other addresses. Our cards have been
259 proven to work using IO addresses of up to 0xFF0.
260
261 e5. the last step is to uncomment the activation line so the
262 drivers will be associated with the port. This means the
263 line (immediately below) the IO line should go from this:
264
265 # (ACT Y)
266
267 to this:
268
269 (ACT Y)
270
271 Once you have finished editing the isapnp.conf file you
272 must submit it to the pnp driver to configure the cards.
273 This is done using the following command:
274
275 isapnp isapnp.conf
276
277 If this works you should see a line that identifies the
278 Quicknet device, the IO port(s) chosen, and a message
279 "Enabled OK".
280
281 f. if you are loading the module by hand, use insmod. An example
282of this would look like this:
283
284 insmod phonedev
285 insmod ixj dspio=0x320,0x310 xio=0,0x330
286
287Then verify the module loaded by running lsmod. If you are not using a
288module that matches your kernel version, you may need to "force" the
289load using the -f option in the insmod command.
290
291 insmod phonedev
292 insmod -f ixj dspio=0x320,0x310 xio=0,0x330
293
294
295If you are using isapnp to autoconfigure your card, then you do NOT
296need any of the above, though you need to use modprobe to load the
297driver, like this:
298
299 modprobe ixj
300
301which will result in the needed drivers getting loaded automatically.
302
303 g. if you are planning on having the kernel automatically request
304the module for you, then you need to edit /etc/conf.modules and add the
305following lines:
306
307 options ixj dspio=0x340 xio=0x330 ixjdebug=0
308
309If you do this, then when you execute an application that uses the
310module the kernel will request that it is loaded.
311
312 h. if you want non-root users to be able to read and write to the
313ixj devices (this is a good idea!) you should do the following:
314
315 - decide upon a group name to use and create that group if
316 needed. Add the user names to that group that you wish to
317 have access to the device. For example, we typically will
318 create a group named "ixj" in /etc/group and add all users
319 to that group that we want to run software that can use the
320 ixjX devices.
321
322 - change the permissions on the device files, like this:
323
324 chgrp ixj /dev/ixj*
325 chmod 660 /dev/ixj*
326
327Once this is done, then non-root users should be able to use the
328devices. If you have enabled autoloading of modules, then the user
329should be able to open the device and have the module loaded
330automatically for them.
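
The port arithmetic from steps e1 and e4 can be double-checked before
editing isapnp.conf. The following is a minimal sketch: the function
names io1_base and ranges_overlap are made up for this example, and the
24-port figure simply adds the LineJACK's 16-port and 8-port registers.

```shell
#!/bin/sh
# io1_base IO0: print the IO 1 base that directly follows a 16-port
# IO 0 block, which is the contiguous layout the driver assumes.
io1_base() {
    printf '0x%x\n' $(( $1 + 16 ))
}

# ranges_overlap BASE_A SIZE_A BASE_B SIZE_B: succeed (exit status 0)
# if the two IO port ranges share at least one port.
ranges_overlap() {
    [ $(( $1 )) -lt $(( $3 + $4 )) ] && [ $(( $3 )) -lt $(( $1 + $2 )) ]
}

io1_base 0x340    # prints 0x350

# A LineJACK at 0x340 occupies 24 ports (0x340-0x357), so a PhoneJACK
# placed at 0x350 would collide with it.
if ranges_overlap 0x340 24 0x350 16; then
    echo "0x340 and 0x350 overlap, pick another base"
fi
```

This only checks the Quicknet cards against each other. It cannot see
ports claimed by devices that isapnp does not report, such as the older
sound and network cards mentioned in step e3.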
331
332
3334.0 Driver Installation problems.
334
335We have tested these drivers on the 2.2.9, 2.2.10, 2.2.12, and 2.2.13 kernels
336and in all cases have eventually been able to get the drivers to load and
337run. We have found three types of problems that prevent this from happening.
338The problems and solutions are:
339
340 a. A step was missed in the installation. Go back and use section 3
341as a checklist. Many people miss running the ixj_dev_create script and thus
342never load the device names into the filesystem.
343
344 b. The kernel is inconsistently linked. We have found this problem in
345the Out Of the Box installation of several distributions. The symptoms
346are that neither driver will load, and that the unknown symbols include "jiffy"
347and "kmalloc". The solution is to recompile both the kernel and the
348modules. The command string for the final compile looks like this:
349
350 In the kernel directory:
351 1. cp .config /tmp
352 2. make mrproper
353 3. cp /tmp/.config .
354 4. make clean;make bzImage;make modules;make modules_install
355
356This rebuilds both the kernel and all the modules and makes sure they all
357have the same linkages. This generally solves the problem once the new
358kernel is installed and the system rebooted.
359
360 c. The kernel has been patched, then unpatched. This happens when
361someone decides to use an earlier kernel after they load a later kernel.
362The symptoms are proceeding through all three above steps and still not
363being able to load the driver. What has happened is that the generated
364header files are out of sync with the kernel itself. The solution is
365to recompile (again) using "make mrproper". This will remove and then
366regenerate all the necessary header files. Once this is done, then you
367need to install and reboot the kernel. We have not seen any problem
368loading one of our drivers after this treatment.
369
3705.0 Known Limitations
371
372We cannot currently play "dial-tone" and listen for DTMF digits at the
373same time using the ISA PhoneJACK. This is a bug in the 8020 DSP chip
374used on that product. All other Quicknet products function normally
375in this regard. We have a work-around, but it's not done yet. Until
376then, if you want dial-tone, you can always play a recorded dial-tone
377sound into the audio until you have gathered the DTMF digits.
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
diff --git a/Documentation/trace/tracedump.txt b/Documentation/trace/tracedump.txt
new file mode 100644
index 00000000000..cba0decc3fc
--- /dev/null
+++ b/Documentation/trace/tracedump.txt
@@ -0,0 +1,58 @@
1 Tracedump
2
3 Documentation written by Alon Farchy
4
51. Overview
6============
7
8The tracedump module provides additional mechanisms to retrieve tracing data.
9It can be used to retrieve traces after a kernel panic or while the system
10is running in either binary format or plaintext. The dumped data is compressed
11with zlib to conserve space.
12
132. Configuration Options
14========================
15
16CONFIG_TRACEDUMP - enable the tracedump module.
17CONFIG_TRACEDUMP_PANIC - dump to console on kernel panic
18CONFIG_TRACEDUMP_PROCFS - add file /proc/tracedump for userspace access.
19
203. Module Parameters
21====================
22
23format_ascii
24
25 If 1, data will dump in human-readable format, ordered by time.
26 If 0, data will be dumped as raw pages from the ring buffer,
27 ordered by CPU, followed by the saved cmdlines so that the
28 raw data can be decoded. Default: 0
29
30panic_size
31
32 Maximum amount of compressed data to dump during a kernel panic
33 in kilobytes. This only applies if format_ascii == 1. In this case,
34 tracedump will compress the data, check the size, and if it is too big
35 toss out some data, compress again, etc, until the size is below
36 panic_size. Default: 512KB
37
38compress_level
39
40 Determines the compression level that zlib will use. Available levels
41 are 0-9, with 0 as no compression and 9 as maximum compression.
42 Default: 9.
43
444. Usage
45========
46
47If configured with CONFIG_TRACEDUMP_PROCFS, the tracing data can be pulled
48by reading from /proc/tracedump. For example:
49
50 # cat /proc/tracedump > my_tracedump
51
52Tracedump will surround the dump with a magic word (TRACEDUMP). Between the
53magic words is the compressed data, which can be decompressed with a standard
54zlib implementation. After decompression, if format_ascii == 1, then the
55output should be readable.
56
57If format_ascii == 0, the output should be in binary form, delimited by
58CPU_END. After the last CPU should be the saved cmdlines, delimited by |.
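
The read from /proc/tracedump can be wrapped so that a missing procfs
entry is reported instead of leaving an empty file behind. This is only
a sketch: capture_tracedump is a made-up name, and its optional second
argument exists purely so the function can be tried on a regular file.

```shell
#!/bin/sh
# Hypothetical helper: copy the dump to a file, failing early when the
# procfs entry is missing (for example CONFIG_TRACEDUMP_PROCFS is not
# set) or unreadable. The source path defaults to /proc/tracedump.
capture_tracedump() {
    out=$1
    src=${2:-/proc/tracedump}

    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    cat "$src" > "$out"
}
```

The saved file still carries the TRACEDUMP magic words around the
zlib-compressed payload. Stripping and decompressing them is left out
here because the exact byte layout is not specified in this document.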
diff --git a/Documentation/trace/tracelevel.txt b/Documentation/trace/tracelevel.txt
new file mode 100644
index 00000000000..b282dd2b329
--- /dev/null
+++ b/Documentation/trace/tracelevel.txt
@@ -0,0 +1,42 @@
1 Tracelevel
2
3 Documentation by Alon Farchy
4
51. Overview
6===========
7
8Tracelevel allows subsystem authors to add trace priorities to
9their tracing events. High priority traces will be enabled
10automatically at boot time.
11
12This module is configured with CONFIG_TRACELEVEL.
13
142. Usage
15=========
16
17To give an event a priority, use the function tracelevel_register
18at any time.
19
20 tracelevel_register(my_event, level);
21
22my_event corresponds directly to the event name as defined in the
23event header file. Available levels are:
24
25 TRACELEVEL_ERR 3
26 TRACELEVEL_WARN 2
27 TRACELEVEL_INFO 1
28 TRACELEVEL_DEBUG 0
29
30Any event registered at boot time as TRACELEVEL_ERR will be enabled
31by default. The header also exposes the function tracelevel_set_level
32to change the trace level at runtime. Any trace event registered with the
33specified level or higher will be enabled with this call.
34
35A userspace handle to tracelevel_set_level is available via the module
36parameter 'level'. For example,
37
38 echo 1 > /sys/module/tracelevel/parameters/level
39
40is logically equivalent to:
41
42 tracelevel_set_level(TRACELEVEL_INFO);
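
For scripts it can be easier to use the level names than the raw
numbers. The sketch below maps the names from the table in section 2 to
their values and writes the result to the module parameter. The function
names are invented for this example, and the optional second argument
only exists so the write can be tried against an ordinary file.

```shell
#!/bin/sh
# Map a level name to the number listed in section 2.
tracelevel_number() {
    case "$1" in
        ERR)   echo 3 ;;
        WARN)  echo 2 ;;
        INFO)  echo 1 ;;
        DEBUG) echo 0 ;;
        *)     echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# Write the numeric level to the 'level' module parameter. The target
# defaults to the sysfs path shown above.
set_tracelevel() {
    num=$(tracelevel_number "$1") || return 1
    echo "$num" > "${2:-/sys/module/tracelevel/parameters/level}"
}
```

With these definitions, "set_tracelevel INFO" performs the same write as
the echo command shown above.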
diff --git a/Documentation/video/tegra_dc_ext.txt b/Documentation/video/tegra_dc_ext.txt
new file mode 100644
index 00000000000..6fc3394c665
--- /dev/null
+++ b/Documentation/video/tegra_dc_ext.txt
@@ -0,0 +1,83 @@
1The Tegra display controller (dc) driver has two frontends that implement
2different interfaces:
31. The traditional fbdev interface, implemented in drivers/video/tegra/fb.c
42. A new interface that exposes the unique capabilities of the controller,
5 implemented in drivers/video/tegra/dc/ext
6
7The Tegra fbdev capabilities are documented in fb/tegrafb.c [TODO]. This
8document will describe the new "extended" dc interface.
9
10The extended interface is only available when its frontend has been compiled
11in, i.e., CONFIG_TEGRA_DC_EXTENSIONS=y. The dc_ext frontend can coexist with
12tegrafb, but takes precedence (more on that later).
13
14The dc_ext frontend's interface to userspace is exposed through a set of
15device nodes: one for each controller (generally /dev/tegra_dc_N), and one
16"control" node (generally /dev/tegra_dc_ctrl). Communication through these
17device nodes is done with special IOCTLs. There is also an event delivery
18mechanism; userspace can wait for and receive events with read() or poll().
19
20The tegra_dc_N interface is stateful; each fresh open() of the device node
21creates a client instance. In order to prevent multiple processes from
22"fighting" for the hardware, only one client instance is permitted to control
23certain resources at a time, on a first-come, first-served basis.
24
25Overview of tegra_dc_N IOCTLs:
26SET_NVMAP_FD: This is used to associate your nvmap client with this dc_ext
27 client instance. This is necessary so that the kernel can
28 appropriately enforce permissions on nvmap buffers.
29
30GET_WINDOW: A dc_ext client must call this on each window that it wishes to
31 control. This strictly enforces a single dc_ext client on a
32 window at a time.
33
34PUT_WINDOW: A dc_ext client may call this to release a window previously
35 reserved with GET_WINDOW.
36
37FLIP: This ioctl is used to actually display an nvmap surface using one or
38 more windows. Each time a dc_ext client performs a FLIP, the request is
39 put on a flip queue and executed asynchronously (the FLIP ioctl will
40 return immediately). Various parameters are available in the
41 tegra_dc_ext_flip structure.
42 A dc_ext client may only use this on windows that it has previously
43 reserved with a successful GET_WINDOW call.
44
45GET_CURSOR: This is analogous to GET_WINDOW, but for the hardware cursor
46 instead of a window.
47
48PUT_CURSOR: This is analogous to PUT_WINDOW, but for the hardware cursor
49 instead of a window.
50
51SET_CURSOR_IMAGE: This is used to change the hardware cursor image. May only
52 be used by a client who has successfully performed a
53 GET_CURSOR call.
54
55SET_CURSOR: This is used to actually place the hardware cursor on the screen.
56 May only be used by a client who has successfully performed a
57 GET_CURSOR call.
58
59SET_CSC: This may be used to set a color space conversion matrix on a window.
60 A dc_ext client may only use this on windows that it has previously
61 reserved with a successful GET_WINDOW call.
62
63GET_STATUS: This is used to retrieve general status about the dc.
64
65GET_VBLANK_SYNCPT: This is used to retrieve the auto-incrementing vblank
66 syncpoint for the head associated with this dc.
67
68
69Overview of tegra_dc_ctrl IOCTLs:
70GET_NUM_OUTPUTS: This returns the number of available output devices on the
71 system, which may exceed the number of display controllers.
72
73GET_OUTPUT_PROPERTIES: This returns data about the given output, such as what
74 kind of output it is, whether it's currently associated
75 with a head, etc.
76
77GET_OUTPUT_EDID: This returns the binary EDID read from the device connected
78 to the given output, if any.
79
80SET_EVENT_MASK: A dc_ext client may call this ioctl with a bitmask of events
81 that it wishes to receive. These events will then be
82 available to that client on a subsequent read() on the same
83 file descriptor.
diff --git a/Documentation/virtual/lguest/Makefile b/Documentation/virtual/lguest/Makefile
new file mode 100644
index 00000000000..0ac34206f7a
--- /dev/null
+++ b/Documentation/virtual/lguest/Makefile
@@ -0,0 +1,8 @@
1# This creates the demonstration utility "lguest" which runs a Linux guest.
2# Missing headers? Add "-I../../../include -I../../../arch/x86/include"
3CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -U_FORTIFY_SOURCE
4
5all: lguest
6
7clean:
8 rm -f lguest
diff --git a/Documentation/virtual/lguest/extract b/Documentation/virtual/lguest/extract
new file mode 100644
index 00000000000..7730bb6e4b9
--- /dev/null
+++ b/Documentation/virtual/lguest/extract
@@ -0,0 +1,58 @@
1#! /bin/sh
2
3set -e
4
5PREFIX=$1
6shift
7
8trap 'rm -r $TMPDIR' 0
9TMPDIR=`mktemp -d`
10
11exec 3>/dev/null
12for f; do
13 while IFS="
14" read -r LINE; do
15 case "$LINE" in
16 *$PREFIX:[0-9]*:\**)
17 NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
18 if [ -f $TMPDIR/$NUM ]; then
19 echo "$TMPDIR/$NUM already exists prior to $f"
20 exit 1
21 fi
22 exec 3>>$TMPDIR/$NUM
23 echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
24 /bin/echo "$LINE" | sed -e "s/$PREFIX:[0-9]*//" -e "s/:\*/*/" >&3
25 ;;
26 *$PREFIX:[0-9]*)
27 NUM=`echo "$LINE" | sed "s/.*$PREFIX:\([0-9]*\).*/\1/"`
28 if [ -f $TMPDIR/$NUM ]; then
29 echo "$TMPDIR/$NUM already exists prior to $f"
30 exit 1
31 fi
32 exec 3>>$TMPDIR/$NUM
33 echo $f | sed 's,\.\./,,g' > $TMPDIR/.$NUM
34 /bin/echo "$LINE" | sed "s/$PREFIX:[0-9]*//" >&3
35 ;;
36 *:\**)
37 /bin/echo "$LINE" | sed -e "s/:\*/*/" -e "s,/\*\*/,," >&3
38 echo >&3
39 exec 3>/dev/null
40 ;;
41 *)
42 /bin/echo "$LINE" >&3
43 ;;
44 esac
45 done < $f
46 echo >&3
47 exec 3>/dev/null
48done
49
50LASTFILE=""
51for f in $TMPDIR/*; do
52 if [ "$LASTFILE" != $(cat $TMPDIR/.$(basename $f) ) ]; then
53 LASTFILE=$(cat $TMPDIR/.$(basename $f) )
54 echo "[ $LASTFILE ]"
55 fi
56 cat $f
57done
58
diff --git a/Documentation/virtual/lguest/lguest.c b/Documentation/virtual/lguest/lguest.c
new file mode 100644
index 00000000000..d928c134dee
--- /dev/null
+++ b/Documentation/virtual/lguest/lguest.c
@@ -0,0 +1,2065 @@
1/*P:100
2 * This is the Launcher code, a simple program which lays out the "physical"
3 * memory for the new Guest by mapping the kernel image and the virtual
4 * devices, then opens /dev/lguest to tell the kernel about the Guest and
5 * control it.
6:*/
7#define _LARGEFILE64_SOURCE
8#define _GNU_SOURCE
9#include <stdio.h>
10#include <string.h>
11#include <unistd.h>
12#include <err.h>
13#include <stdint.h>
14#include <stdlib.h>
15#include <elf.h>
16#include <sys/mman.h>
17#include <sys/param.h>
18#include <sys/types.h>
19#include <sys/stat.h>
20#include <sys/wait.h>
21#include <sys/eventfd.h>
22#include <fcntl.h>
23#include <stdbool.h>
24#include <errno.h>
25#include <ctype.h>
26#include <sys/socket.h>
27#include <sys/ioctl.h>
28#include <sys/time.h>
29#include <time.h>
30#include <netinet/in.h>
31#include <net/if.h>
32#include <linux/sockios.h>
33#include <linux/if_tun.h>
34#include <sys/uio.h>
35#include <termios.h>
36#include <getopt.h>
37#include <assert.h>
38#include <sched.h>
39#include <limits.h>
40#include <stddef.h>
41#include <signal.h>
42#include <pwd.h>
43#include <grp.h>
44
45#include <linux/virtio_config.h>
46#include <linux/virtio_net.h>
47#include <linux/virtio_blk.h>
48#include <linux/virtio_console.h>
49#include <linux/virtio_rng.h>
50#include <linux/virtio_ring.h>
51#include <asm/bootparam.h>
52#include "../../../include/linux/lguest_launcher.h"
53/*L:110
54 * We can ignore the 43 include files we need for this program, but I do want
55 * to draw attention to the use of kernel-style types.
56 *
57 * As Linus said, "C is a Spartan language, and so should your naming be." I
58 * like these abbreviations, so we define them here. Note that u64 is always
59 * unsigned long long, which works on all Linux systems: this means that we can
60 * use %llu in printf for any u64.
61 */
62typedef unsigned long long u64;
63typedef uint32_t u32;
64typedef uint16_t u16;
65typedef uint8_t u8;
66/*:*/
67
68#define BRIDGE_PFX "bridge:"
69#ifndef SIOCBRADDIF
70#define SIOCBRADDIF 0x89a2 /* add interface to bridge */
71#endif
72/* We can have up to 256 pages for devices. */
73#define DEVICE_PAGES 256
74/* This will occupy 3 pages: it must be a power of 2. */
75#define VIRTQUEUE_NUM 256
76
77/*L:120
78 * verbose is both a global flag and a macro. The C preprocessor allows
79 * this, and although I wouldn't recommend it, it works quite nicely here.
80 */
81static bool verbose;
82#define verbose(args...) \
83 do { if (verbose) printf(args); } while(0)
84/*:*/
85
86/* The pointer to the start of guest memory. */
87static void *guest_base;
88/* The maximum guest physical address allowed, and maximum possible. */
89static unsigned long guest_limit, guest_max;
90/* The /dev/lguest file descriptor. */
91static int lguest_fd;
92
93/* a per-cpu variable indicating whose vcpu is currently running */
94static unsigned int __thread cpu_id;
95
96/* This is our list of devices. */
97struct device_list {
98 /* Counter to assign interrupt numbers. */
99 unsigned int next_irq;
100
101 /* Counter to print out convenient device numbers. */
102 unsigned int device_num;
103
104 /* The descriptor page for the devices. */
105 u8 *descpage;
106
107 /* A single linked list of devices. */
108 struct device *dev;
109 /* And a pointer to the last device for easy append. */
110 struct device *lastdev;
111};
112
113/* The list of Guest devices, based on command line arguments. */
114static struct device_list devices;
115
116/* The device structure describes a single device. */
117struct device {
118 /* The linked-list pointer. */
119 struct device *next;
120
121 /* The device's descriptor, as mapped into the Guest. */
122 struct lguest_device_desc *desc;
123
124 /* We can't trust desc values once Guest has booted: we use these. */
125 unsigned int feature_len;
126 unsigned int num_vq;
127
128 /* The name of this device, for --verbose. */
129 const char *name;
130
131 /* Any queues attached to this device */
132 struct virtqueue *vq;
133
134 /* Is it operational */
135 bool running;
136
137 /* Device-specific data. */
138 void *priv;
139};
140
141/* The virtqueue structure describes a queue attached to a device. */
142struct virtqueue {
143 struct virtqueue *next;
144
145 /* Which device owns me. */
146 struct device *dev;
147
148 /* The configuration for this queue. */
149 struct lguest_vqconfig config;
150
151 /* The actual ring of buffers. */
152 struct vring vring;
153
154 /* Last available index we saw. */
155 u16 last_avail_idx;
156
157 /* How many are used since we sent last irq? */
158 unsigned int pending_used;
159
160 /* Eventfd where Guest notifications arrive. */
161 int eventfd;
162
163 /* Function for the thread which is servicing this virtqueue. */
164 void (*service)(struct virtqueue *vq);
165 pid_t thread;
166};
167
168/* Remember the arguments to the program so we can "reboot" */
169static char **main_args;
170
171/* The original tty settings to restore on exit. */
172static struct termios orig_term;
173
174/*
175 * We have to be careful with barriers: our devices are all run in separate
176 * threads and so we need to make sure that changes visible to the Guest happen
177 * in precise order.
178 */
179#define wmb() __asm__ __volatile__("" : : : "memory")
180#define mb() __asm__ __volatile__("" : : : "memory")
181
182/*
183 * Convert an iovec element to the given type.
184 *
185 * This is a fairly ugly trick: we need to know the size of the type and
186 * alignment requirement to check the pointer is kosher. It's also nice to
187 * have the name of the type in case we report failure.
188 *
189 * Typing those three things all the time is cumbersome and error prone, so we
190 * have a macro which sets them all up and passes to the real function.
191 */
192#define convert(iov, type) \
193 ((type *)_convert((iov), sizeof(type), __alignof__(type), #type))
194
195static void *_convert(struct iovec *iov, size_t size, size_t align,
196 const char *name)
197{
198 if (iov->iov_len != size)
199 errx(1, "Bad iovec size %zu for %s", iov->iov_len, name);
200 if ((unsigned long)iov->iov_base % align != 0)
201 errx(1, "Bad alignment %p for %s", iov->iov_base, name);
202 return iov->iov_base;
203}
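
The size-and-alignment check behind convert() can be tried on its own. The sketch below is a hypothetical variant (checked_convert() and checked_convert_impl() are illustrative names, not part of lguest.c) which returns NULL rather than calling errx(), so a caller can observe the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical variant of _convert(): same two checks, but it returns
 * NULL on failure instead of exiting the program.
 */
void *checked_convert_impl(const struct iovec *iov, size_t size, size_t align)
{
	/* The buffer must be exactly the size of the target type. */
	if (iov->iov_len != size)
		return NULL;
	/* The base pointer must satisfy the type's alignment. */
	if ((uintptr_t)iov->iov_base % align != 0)
		return NULL;
	return iov->iov_base;
}

/* The macro supplies the size and alignment so callers only name the type. */
#define checked_convert(iov, type) \
	((type *)checked_convert_impl((iov), sizeof(type), __alignof__(type)))
```

A caller would write checked_convert(&iov[0], uint32_t) and test the result against NULL before dereferencing it.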
204
205/* Wrapper for the last available index. Makes it easier to change. */
206#define lg_last_avail(vq) ((vq)->last_avail_idx)
207
208/*
209 * The virtio configuration space is defined to be little-endian. x86 is
210 * little-endian too, but it's nice to be explicit so we have these helpers.
211 */
212#define cpu_to_le16(v16) (v16)
213#define cpu_to_le32(v32) (v32)
214#define cpu_to_le64(v64) (v64)
215#define le16_to_cpu(v16) (v16)
216#define le32_to_cpu(v32) (v32)
217#define le64_to_cpu(v64) (v64)
218
219/* Is this iovec empty? */
220static bool iov_empty(const struct iovec iov[], unsigned int num_iov)
221{
222 unsigned int i;
223
224 for (i = 0; i < num_iov; i++)
225 if (iov[i].iov_len)
226 return false;
227 return true;
228}
229
230/* Take len bytes from the front of this iovec. */
231static void iov_consume(struct iovec iov[], unsigned num_iov, unsigned len)
232{
233 unsigned int i;
234
235 for (i = 0; i < num_iov; i++) {
236 unsigned int used;
237
238 used = iov[i].iov_len < len ? iov[i].iov_len : len;
239 iov[i].iov_base += used;
240 iov[i].iov_len -= used;
241 len -= used;
242 }
243 assert(len == 0);
244}
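
To see what consuming bytes from the front of an iovec does to its elements, here is a small standalone sketch. consume_front() is a hypothetical copy of the routine above which casts iov_base to char * so that it does not rely on void-pointer arithmetic.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical standalone copy of iov_consume(), for experimentation. */
void consume_front(struct iovec iov[], unsigned int num_iov, size_t len)
{
	unsigned int i;

	for (i = 0; i < num_iov; i++) {
		/* Take as much as this element holds, but no more than len. */
		size_t used = iov[i].iov_len < len ? iov[i].iov_len : len;

		iov[i].iov_base = (char *)iov[i].iov_base + used;
		iov[i].iov_len -= used;
		len -= used;
	}
	/* The caller must not consume more than the iovec contains. */
	assert(len == 0);
}
```

Consuming 7 bytes from two 5-byte elements empties the first and leaves the last 3 bytes of the second, which is the state a writev() loop needs after a partial write.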
245
246/* The device virtqueue descriptors are followed by feature bitmasks. */
247static u8 *get_feature_bits(struct device *dev)
248{
249 return (u8 *)(dev->desc + 1)
250 + dev->num_vq * sizeof(struct lguest_vqconfig);
251}
252
253/*L:100
254 * The Launcher code itself takes us out into userspace, that scary place where
255 * pointers run wild and free! Unfortunately, like most userspace programs,
256 * it's quite boring (which is why everyone likes to hack on the kernel!).
257 * Perhaps if you make up an Lguest Drinking Game at this point, it will get
258 * you through this section. Or, maybe not.
259 *
260 * The Launcher sets up a big chunk of memory to be the Guest's "physical"
261 * memory and stores it in "guest_base". In other words, Guest physical ==
262 * Launcher virtual with an offset.
263 *
264 * This can be tough to get your head around, but usually it just means that we
265 * use these trivial conversion functions when the Guest gives us its
266 * "physical" addresses:
267 */
268static void *from_guest_phys(unsigned long addr)
269{
270 return guest_base + addr;
271}
272
273static unsigned long to_guest_phys(const void *addr)
274{
275 return (addr - guest_base);
276}
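
The "Guest physical == Launcher virtual with an offset" rule is easy to check in miniature. In this sketch a static array stands in for the mmap'ed Guest memory; fake_guest_mem, demo_base and the demo_ functions are assumptions made for illustration only.

```c
#include <assert.h>

/* Stand-in for the Guest's "physical" memory (hypothetical). */
static unsigned char fake_guest_mem[4096];
static unsigned char *demo_base = fake_guest_mem;

/* Guest physical address -> Launcher pointer. */
void *demo_from_guest_phys(unsigned long addr)
{
	return demo_base + addr;
}

/* Launcher pointer -> Guest physical address. */
unsigned long demo_to_guest_phys(const void *addr)
{
	return (unsigned long)((const unsigned char *)addr - demo_base);
}
```

The two functions are inverses of each other: converting a Guest address to a pointer and back yields the original address.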
277
278/*L:130
279 * Loading the Kernel.
280 *
281 * We start with a couple of simple helper routines. open_or_die() avoids
282 * error-checking code cluttering the callers:
283 */
284static int open_or_die(const char *name, int flags)
285{
286 int fd = open(name, flags);
287 if (fd < 0)
288 err(1, "Failed to open %s", name);
289 return fd;
290}
291
292/* map_zeroed_pages() takes a number of pages. */
293static void *map_zeroed_pages(unsigned int num)
294{
295 int fd = open_or_die("/dev/zero", O_RDONLY);
296 void *addr;
297
298 /*
299 * We use a private mapping (ie. if we write to the page, it will be
300 * copied). We allocate an extra two pages PROT_NONE to act as guard
301 * pages against read/write attempts that exceed allocated space.
302 */
303 addr = mmap(NULL, getpagesize() * (num+2),
304 PROT_NONE, MAP_PRIVATE, fd, 0);
305
306 if (addr == MAP_FAILED)
307 err(1, "Mmapping %u pages of /dev/zero", num);
308
309 if (mprotect(addr + getpagesize(), getpagesize() * num,
310 PROT_READ|PROT_WRITE) == -1)
311 err(1, "mprotect rw %u pages failed", num);
312
313 /*
314 * One neat mmap feature is that you can close the fd, and it
315 * stays mapped.
316 */
317 close(fd);
318
319 /* Return address after PROT_NONE page */
320 return addr + getpagesize();
321}
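
The guard-page layout can be sketched without /dev/zero. The version below is an assumption-laden alternative, not the Launcher's code: it uses MAP_ANONYMOUS (available on Linux) and returns NULL on failure instead of calling err(). map_guarded_pages() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map num zeroed read-write pages with one PROT_NONE page on each side. */
void *map_guarded_pages(unsigned int num)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *addr;

	/* Reserve everything inaccessible first, guards included. */
	addr = mmap(NULL, page * (num + 2), PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Open up only the middle: the first and last page stay PROT_NONE. */
	if (mprotect(addr + page, page * num, PROT_READ | PROT_WRITE) == -1) {
		munmap(addr, page * (num + 2));
		return NULL;
	}

	/* Hand back the address just past the leading guard page. */
	return addr + page;
}
```

Any access one byte before the returned pointer, or one byte past the last usable page, faults immediately rather than silently corrupting a neighbouring mapping.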
322
323/* Get some more pages for a device. */
324static void *get_pages(unsigned int num)
325{
326 void *addr = from_guest_phys(guest_limit);
327
328 guest_limit += num * getpagesize();
329 if (guest_limit > guest_max)
330 errx(1, "Not enough memory for devices");
331 return addr;
332}
333
334/*
335 * This routine is used to load the kernel or initrd. It tries mmap, but if
336 * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
337 * it falls back to reading the memory in.
338 */
339static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
340{
341 ssize_t r;
342
343 /*
344 * We map writable even though some segments are marked read-only.
345 * The kernel really wants to be writable: it patches its own
346 * instructions.
347 *
348 * MAP_PRIVATE means that the page won't be copied until a write is
349 * done to it. This allows us to share untouched memory between
350 * Guests.
351 */
352 if (mmap(addr, len, PROT_READ|PROT_WRITE,
353 MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
354 return;
355
356 /* pread does a seek and a read in one shot: saves a few lines. */
357 r = pread(fd, addr, len, offset);
358 if (r != len)
359 err(1, "Reading offset %lu len %lu gave %zi", offset, len, r);
360}
361
362/*
363 * This routine takes an open vmlinux image, which is in ELF, and maps it into
364 * the Guest memory. ELF = Executable and Linkable Format, which is the format used
365 * by all modern binaries on Linux including the kernel.
366 *
367 * The ELF headers give *two* addresses: a physical address, and a virtual
368 * address. We use the physical address; the Guest will map itself to the
369 * virtual address.
370 *
371 * We return the starting address.
372 */
373static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
374{
375 Elf32_Phdr phdr[ehdr->e_phnum];
376 unsigned int i;
377
378 /*
379 * Sanity checks on the main ELF header: an x86 executable with a
380 * reasonable number of correctly-sized program headers.
381 */
382 if (ehdr->e_type != ET_EXEC
383 || ehdr->e_machine != EM_386
384 || ehdr->e_phentsize != sizeof(Elf32_Phdr)
385 || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
386 errx(1, "Malformed elf header");
387
388 /*
389 * An ELF executable contains an ELF header and a number of "program"
390 * headers which indicate which parts ("segments") of the program to
391 * load where.
392 */
393
394 /* We read in all the program headers at once: */
395 if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
396 err(1, "Seeking to program headers");
397 if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
398 err(1, "Reading program headers");
399
400 /*
401 * Try all the headers: there are usually only three. A read-only one,
402 * a read-write one, and a "note" section which we don't load.
403 */
404 for (i = 0; i < ehdr->e_phnum; i++) {
405 /* If this isn't a loadable segment, we ignore it */
406 if (phdr[i].p_type != PT_LOAD)
407 continue;
408
409 verbose("Section %i: size %i addr %p\n",
410 i, phdr[i].p_memsz, (void *)phdr[i].p_paddr);
411
412 /* We map this section of the file at its physical address. */
413 map_at(elf_fd, from_guest_phys(phdr[i].p_paddr),
414 phdr[i].p_offset, phdr[i].p_filesz);
415 }
416
417 /* The entry point is given in the ELF header. */
418 return ehdr->e_entry;
419}
420
421/*L:150
422 * A bzImage, unlike an ELF file, is not meant to be loaded. You're supposed
423 * to jump into it and it will unpack itself. We used to have to perform some
424 * hairy magic because the unpacking code scared me.
425 *
426 * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote
427 * a small patch to jump over the tricky bits in the Guest, so now we just read
428 * the funky header so we know where in the file to load, and away we go!
429 */
430static unsigned long load_bzimage(int fd)
431{
432 struct boot_params boot;
433 int r;
434 /* Modern bzImages get loaded at 1M. */
435 void *p = from_guest_phys(0x100000);
436
437 /*
438 * Go back to the start of the file and read the header. It should be
439 * a Linux boot header (see Documentation/x86/i386/boot.txt)
440 */
441 lseek(fd, 0, SEEK_SET);
442 read(fd, &boot, sizeof(boot));
443
444 /* Inside the setup_hdr, we expect the magic "HdrS" */
445 if (memcmp(&boot.hdr.header, "HdrS", 4) != 0)
446 errx(1, "This doesn't look like a bzImage to me");
447
448 /* Skip over the extra sectors of the header. */
449 lseek(fd, (boot.hdr.setup_sects+1) * 512, SEEK_SET);
450
451 /* Now read everything into memory, in nice big chunks. */
452 while ((r = read(fd, p, 65536)) > 0)
453 p += r;
454
455 /* Finally, code32_start tells us where to enter the kernel. */
456 return boot.hdr.code32_start;
457}
458
459/*L:140
460 * Loading the kernel is easy when it's a "vmlinux", but most kernels
461 * come wrapped up in the self-decompressing "bzImage" format. With a little
462 * work, we can load those, too.
463 */
464static unsigned long load_kernel(int fd)
465{
466 Elf32_Ehdr hdr;
467
468 /* Read in the first few bytes. */
469 if (read(fd, &hdr, sizeof(hdr)) != sizeof(hdr))
470 err(1, "Reading kernel");
471
472 /* If it's an ELF file, it starts with "\177ELF" */
473 if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) == 0)
474 return map_elf(fd, &hdr);
475
476 /* Otherwise we assume it's a bzImage, and try to load it. */
477 return load_bzimage(fd);
478}
479
480/*
481 * This is a trivial little helper to align pages. Andi Kleen hated it because
482 * it calls getpagesize() twice: "it's dumb code."
483 *
484 * Kernel guys get really het up about optimization, even when it's not
485 * necessary. I leave this code as a reaction against that.
486 */
487static inline unsigned long page_align(unsigned long addr)
488{
489 /* Add upwards and truncate downwards. */
490 return ((addr + getpagesize()-1) & ~(getpagesize()-1));
491}
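
The "add upwards and truncate downwards" trick is worth a worked example. This sketch takes the page size as a parameter so the arithmetic can be checked with fixed numbers; align_up() is a hypothetical name, and the mask only works when page_size is a power of two.

```c
#include <assert.h>

/* Round addr up to the next multiple of page_size (a power of two). */
unsigned long align_up(unsigned long addr, unsigned long page_size)
{
	/*
	 * Adding page_size-1 pushes any unaligned address past the next
	 * boundary; masking off the low bits then snaps it back onto it.
	 */
	return (addr + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, 1 rounds up to 4096, 4096 stays at 4096, and 4097 rounds up to 8192.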
492
493/*L:180
494 * An "initial ram disk" is a disk image loaded into memory along with the
495 * kernel which the kernel can use to boot from without needing any drivers.
496 * Most distributions now use this as standard: the initrd contains the code to
497 * load the appropriate driver modules for the current machine.
498 *
499 * Importantly, James Morris works for RedHat, and Fedora uses initrds for its
500 * kernels. He sent me this (and tells me when I break it).
501 */
502static unsigned long load_initrd(const char *name, unsigned long mem)
503{
504 int ifd;
505 struct stat st;
506 unsigned long len;
507
508 ifd = open_or_die(name, O_RDONLY);
509 /* fstat() is needed to get the file size. */
510 if (fstat(ifd, &st) < 0)
511 err(1, "fstat() on initrd '%s'", name);
512
513 /*
514 * We map the initrd at the top of memory, but mmap wants it to be
515 * page-aligned, so we round the size up for that.
516 */
517 len = page_align(st.st_size);
518 map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
519 /*
520 * Once a file is mapped, you can close the file descriptor. It's a
521 * little odd, but quite useful.
522 */
523 close(ifd);
524 verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len);
525
526 /* We return the initrd size. */
527 return len;
528}
529/*:*/
530
531/*
532 * Simple routine to roll all the commandline arguments together with spaces
533 * between them.
534 */
535static void concat(char *dst, char *args[])
536{
537 unsigned int i, len = 0;
538
539 for (i = 0; args[i]; i++) {
540 if (i) {
541 strcat(dst+len, " ");
542 len++;
543 }
544 strcpy(dst+len, args[i]);
545 len += strlen(args[i]);
546 }
547 /* In case it's empty. */
548 dst[len] = '\0';
549}
550
551/*L:185
552 * This is where we actually tell the kernel to initialize the Guest. We
553 * saw the arguments it expects when we looked at initialize() in lguest_user.c:
554 * the base of Guest "physical" memory, the top physical page to allow and the
555 * entry point for the Guest.
556 */
557static void tell_kernel(unsigned long start)
558{
559 unsigned long args[] = { LHREQ_INITIALIZE,
560 (unsigned long)guest_base,
561 guest_limit / getpagesize(), start };
562 verbose("Guest: %p - %p (%#lx)\n",
563 guest_base, guest_base + guest_limit, guest_limit);
564 lguest_fd = open_or_die("/dev/lguest", O_RDWR);
565 if (write(lguest_fd, args, sizeof(args)) < 0)
566 err(1, "Writing to /dev/lguest");
567}
568/*:*/
569
570/*L:200
571 * Device Handling.
572 *
573 * When the Guest gives us a buffer, it sends an array of addresses and sizes.
574 * We need to make sure it's not trying to reach into the Launcher itself, so
575 * we have a convenient routine which checks it and exits with an error message
576 * if something funny is going on:
577 */
578static void *_check_pointer(unsigned long addr, unsigned int size,
579 unsigned int line)
580{
581 /*
582 * Check if the requested address and size exceeds the allocated memory,
583 * or addr + size wraps around.
584 */
585 if ((addr + size) > guest_limit || (addr + size) < addr)
586 errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
587 /*
588 * We return a pointer for the caller's convenience, now we know it's
589 * safe to use.
590 */
591 return from_guest_phys(addr);
592}
593/* A macro which transparently hands the line number to the real function. */
594#define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
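
The second half of that test is the interesting one: if addr + size overflows, the sum is small and would pass a plain limit check. A minimal sketch, with the limit passed in as a parameter and a boolean result instead of errx() (range_ok() is a hypothetical name):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Is [addr, addr + size) inside [0, limit] without wrapping around? */
bool range_ok(unsigned long addr, unsigned int size, unsigned long limit)
{
	/* Unsigned overflow wraps, so a wrapped sum is less than addr. */
	if (addr + size > limit || addr + size < addr)
		return false;
	return true;
}
```

An address near ULONG_MAX with a small size wraps to a tiny sum; only the addr + size < addr comparison rejects it.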
595
596/*
597 * Each buffer in the virtqueues is actually a chain of descriptors. This
598 * function returns the next descriptor in the chain, or max (the size of the
599 * descriptor table) if we're at the end.
600 */
601static unsigned next_desc(struct vring_desc *desc,
602 unsigned int i, unsigned int max)
603{
604 unsigned int next;
605
606 /* If this descriptor says it doesn't chain, we're done. */
607 if (!(desc[i].flags & VRING_DESC_F_NEXT))
608 return max;
609
610 /* Check they're not leading us off end of descriptors. */
611 next = desc[i].next;
612 /* Make sure compiler knows to grab that: we don't want it changing! */
613 wmb();
614
615 if (next >= max)
616 errx(1, "Desc next is %u", next);
617
618 return next;
619}
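
Walking a chain with this kind of helper needs two defences: a bounds check on every "next" value and a cap on the number of links, since a hostile Guest can build a loop. The sketch below uses a simplified, hypothetical descriptor (struct demo_desc and DEMO_DESC_F_NEXT are not the real vring definitions) and reports errors with -1.

```c
#include <assert.h>

#define DEMO_DESC_F_NEXT 1	/* hypothetical "chain continues" flag */

/* Simplified descriptor: only the fields needed to follow a chain. */
struct demo_desc {
	unsigned short flags;
	unsigned short next;
};

/* Count the descriptors in the chain starting at head, or return -1. */
int chain_length(const struct demo_desc *desc, unsigned int head,
		 unsigned int max)
{
	unsigned int i = head, count = 0;

	if (head >= max)
		return -1;
	for (;;) {
		/* More links than table entries can only mean a loop. */
		if (++count > max)
			return -1;
		if (!(desc[i].flags & DEMO_DESC_F_NEXT))
			return (int)count;
		i = desc[i].next;
		/* Don't let the chain lead us off the end of the table. */
		if (i >= max)
			return -1;
	}
}
```

A chain 0 -> 2 -> 3 in a four-entry table has length 3, while a chain whose entries point at each other is rejected once the count exceeds the table size.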
620
621/*
622 * This actually sends the interrupt for this virtqueue, if we've used a
623 * buffer.
624 */
625static void trigger_irq(struct virtqueue *vq)
626{
627 unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
628
629 /* Don't inform them if nothing used. */
630 if (!vq->pending_used)
631 return;
632 vq->pending_used = 0;
633
634 /* If they don't want an interrupt, don't send one... */
635 if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) {
636 return;
637 }
638
639 /* Send the Guest an interrupt to tell them we used something up. */
640 if (write(lguest_fd, buf, sizeof(buf)) != 0)
641 err(1, "Triggering irq %i", vq->config.irq);
642}
643
644/*
645 * This looks in the virtqueue for the first available buffer, and converts
646 * it to an iovec for convenient access. Since descriptors consist of some
647 * number of output then some number of input descriptors, it's actually two
648 * iovecs, but we pack them into one and note how many of each there were.
649 *
650 * This function waits if necessary, and returns the descriptor number found.
651 */
652static unsigned wait_for_vq_desc(struct virtqueue *vq,
653 struct iovec iov[],
654 unsigned int *out_num, unsigned int *in_num)
655{
656 unsigned int i, head, max;
657 struct vring_desc *desc;
658 u16 last_avail = lg_last_avail(vq);
659
660 /* There's nothing available? */
661 while (last_avail == vq->vring.avail->idx) {
662 u64 event;
663
664 /*
665 * Since we're about to sleep, now is a good time to tell the
666 * Guest about what we've used up to now.
667 */
668 trigger_irq(vq);
669
670 /* OK, now we need to know about added descriptors. */
671 vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
672
673 /*
674 * They could have slipped one in as we were doing that: make
675 * sure it's written, then check again.
676 */
677 mb();
678 if (last_avail != vq->vring.avail->idx) {
679 vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
680 break;
681 }
682
683 /* Nothing new? Wait for eventfd to tell us they refilled. */
684 if (read(vq->eventfd, &event, sizeof(event)) != sizeof(event))
685 errx(1, "Event read failed?");
686
687 /* We don't need to be notified again. */
688 vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
689 }
690
691 /* Check it isn't doing very strange things with descriptor numbers. */
692 if ((u16)(vq->vring.avail->idx - last_avail) > vq->vring.num)
693 errx(1, "Guest moved avail index from %u to %u",
694 last_avail, vq->vring.avail->idx);
695
696 /*
697 * Grab the next descriptor number they're advertising, and increment
698 * the index we've seen.
699 */
700 head = vq->vring.avail->ring[last_avail % vq->vring.num];
701 lg_last_avail(vq)++;
702
703 /* If their number is silly, that's a fatal mistake. */
704 if (head >= vq->vring.num)
705 errx(1, "Guest says index %u is available", head);
706
707 /* When we start there are neither input nor output descriptors. */
708 *out_num = *in_num = 0;
709
710 max = vq->vring.num;
711 desc = vq->vring.desc;
712 i = head;
713
714 /*
715 * If this is an indirect entry, then this buffer contains a descriptor
716 * table which we handle as if it's any normal descriptor chain.
717 */
718 if (desc[i].flags & VRING_DESC_F_INDIRECT) {
719 if (desc[i].len % sizeof(struct vring_desc))
720 errx(1, "Invalid size for indirect buffer table");
721
722 max = desc[i].len / sizeof(struct vring_desc);
723 desc = check_pointer(desc[i].addr, desc[i].len);
724 i = 0;
725 }
726
727 do {
728 /* Grab the current descriptor, and check it's OK. */
729 iov[*out_num + *in_num].iov_len = desc[i].len;
730 iov[*out_num + *in_num].iov_base
731 = check_pointer(desc[i].addr, desc[i].len);
732 /* If this is an input descriptor, increment that count. */
733 if (desc[i].flags & VRING_DESC_F_WRITE)
734 (*in_num)++;
735 else {
736 /*
737 * If it's an output descriptor, they're all supposed
738 * to come before any input descriptors.
739 */
740 if (*in_num)
741 errx(1, "Descriptor has out after in");
742 (*out_num)++;
743 }
744
745 /* If we've got too many, that implies a descriptor loop. */
746 if (*out_num + *in_num > max)
747 errx(1, "Looped descriptor");
748 } while ((i = next_desc(desc, i, max)) != max);
749
750 return head;
751}
752
753/*
754 * After we've used one of their buffers, we tell the Guest about it. Sometime
755 * later we'll want to send them an interrupt using trigger_irq(); note that
756 * wait_for_vq_desc() does that for us if it has to wait.
757 */
758static void add_used(struct virtqueue *vq, unsigned int head, int len)
759{
760 struct vring_used_elem *used;
761
762 /*
763 * The virtqueue contains a ring of used buffers. Get a pointer to the
764 * next entry in that used ring.
765 */
766 used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num];
767 used->id = head;
768 used->len = len;
769 /* Make sure buffer is written before we update index. */
770 wmb();
771 vq->vring.used->idx++;
772 vq->pending_used++;
773}
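
The used ring is a fixed array indexed by a free-running 16-bit counter taken modulo the ring size. A reduced sketch, assuming a four-entry ring and hypothetical demo_ types rather than the real struct vring_used:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_NUM 4		/* hypothetical ring size, a power of two */

struct demo_used_elem {
	uint32_t id;	/* head of the descriptor chain we finished with */
	uint32_t len;	/* how many bytes we wrote into it */
};

struct demo_used_ring {
	uint16_t idx;	/* free-running: never reduced modulo the size */
	struct demo_used_elem ring[DEMO_RING_NUM];
};

void demo_add_used(struct demo_used_ring *used, uint32_t head, uint32_t len)
{
	struct demo_used_elem *slot = &used->ring[used->idx % DEMO_RING_NUM];

	slot->id = head;
	slot->len = len;
	/* Compiler barrier: fill in the entry before publishing the index. */
	__asm__ __volatile__("" : : : "memory");
	used->idx++;
}
```

After five additions to a four-entry ring the index reads 5 and the fifth entry has overwritten slot 0, which is why the reader must keep up with the writer.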
774
775/* And here's the combo meal deal. Supersize me! */
776static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
777{
778 add_used(vq, head, len);
779 trigger_irq(vq);
780}
781
782/*
783 * The Console
784 *
785 * We associate some data with the console for our exit hack.
786 */
787struct console_abort {
788 /* How many times have they hit ^C? */
789 int count;
790 /* When did they start? */
791 struct timeval start;
792};
793
794/* This is the routine which handles console input (ie. stdin). */
795static void console_input(struct virtqueue *vq)
796{
797 int len;
798 unsigned int head, in_num, out_num;
799 struct console_abort *abort = vq->dev->priv;
800 struct iovec iov[vq->vring.num];
801
802 /* Make sure there's a descriptor available. */
803 head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
804 if (out_num)
805 errx(1, "Output buffers in console in queue?");
806
807 /* Read into it. This is where we usually wait. */
808 len = readv(STDIN_FILENO, iov, in_num);
809 if (len <= 0) {
810 /* Ran out of input? */
811 warnx("Failed to get console input, ignoring console.");
812 /*
813 * For simplicity, dying threads kill the whole Launcher. So
814 * just nap here.
815 */
816 for (;;)
817 pause();
818 }
819
820 /* Tell the Guest we used a buffer. */
821 add_used_and_trigger(vq, head, len);
822
823 /*
824 * Three ^C within one second? Exit.
825 *
826 * This is such a hack, but works surprisingly well. Each ^C has to
827 * be in a buffer by itself, so they can't be too fast. But we check
828 * that we get three within about a second, so they can't be too
829 * slow.
830 */
831 if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
832 abort->count = 0;
833 return;
834 }
835
836 abort->count++;
837 if (abort->count == 1)
838 gettimeofday(&abort->start, NULL);
839 else if (abort->count == 3) {
840 struct timeval now;
841 gettimeofday(&now, NULL);
842 /* Kill all Launcher processes with SIGINT, like normal ^C */
843 if (now.tv_sec <= abort->start.tv_sec+1)
844 kill(0, SIGINT);
845 abort->count = 0;
846 }
847}
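
The three-^C exit hack is a tiny state machine, and it can be exercised without a terminal. In this sketch the clock is passed in as whole seconds and the decision is returned rather than acted on; struct demo_abort and demo_saw_abort() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_abort {
	int count;	/* consecutive lone ^C characters seen */
	long start_sec;	/* when the first of them arrived */
};

/* Feed one console read; returns true when the Launcher should exit. */
bool demo_saw_abort(struct demo_abort *st, int len, char first, long now_sec)
{
	/* Anything other than a single ^C (ASCII 3) resets the count. */
	if (len != 1 || first != 3) {
		st->count = 0;
		return false;
	}

	st->count++;
	if (st->count == 1) {
		st->start_sec = now_sec;
	} else if (st->count == 3) {
		st->count = 0;
		/* Only counts if the third arrived within about a second. */
		return now_sec <= st->start_sec + 1;
	}
	return false;
}
```

Three lone ^C reads at seconds 100, 100 and 101 trigger the exit; the same three spread over five seconds do not.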
848
849/* This is the routine which handles console output (ie. stdout). */
850static void console_output(struct virtqueue *vq)
851{
852 unsigned int head, out, in;
853 struct iovec iov[vq->vring.num];
854
855 /* We usually wait in here, for the Guest to give us something. */
856 head = wait_for_vq_desc(vq, iov, &out, &in);
857 if (in)
858 errx(1, "Input buffers in console output queue?");
859
860 /* writev can return a partial write, so we loop here. */
861 while (!iov_empty(iov, out)) {
862 int len = writev(STDOUT_FILENO, iov, out);
863 if (len <= 0) {
864 warn("Write to stdout gave %i (%d)", len, errno);
865 break;
866 }
867 iov_consume(iov, out, len);
868 }
869
870 /*
871 * We're finished with that buffer: if we're going to sleep,
872 * wait_for_vq_desc() will prod the Guest with an interrupt.
873 */
874 add_used(vq, head, 0);
875}
876
877/*
878 * The Network
879 *
880 * Handling output for network is also simple: we get all the output buffers
881 * and write them to /dev/net/tun.
882 */
883struct net_info {
884 int tunfd;
885};
886
887static void net_output(struct virtqueue *vq)
888{
889 struct net_info *net_info = vq->dev->priv;
890 unsigned int head, out, in;
891 struct iovec iov[vq->vring.num];
892
893 /* We usually wait in here for the Guest to give us a packet. */
894 head = wait_for_vq_desc(vq, iov, &out, &in);
895 if (in)
896 errx(1, "Input buffers in net output queue?");
897 /*
898 * Send the whole thing through to /dev/net/tun. It expects the exact
899 * same format: what a coincidence!
900 */
901 if (writev(net_info->tunfd, iov, out) < 0)
902 warnx("Write to tun failed (%d)?", errno);
903
904 /*
905 * Done with that one; wait_for_vq_desc() will send the interrupt if
906 * all packets are processed.
907 */
908 add_used(vq, head, 0);
909}
910
911/*
912 * Handling network input is a bit trickier, because I've tried to optimize it.
913 *
914 * First we have a helper routine which tells us if reading from this file
915 * descriptor (ie. the /dev/net/tun device) will block:
916 */
917static bool will_block(int fd)
918{
919 fd_set fdset;
920 struct timeval zero = { 0, 0 };
921 FD_ZERO(&fdset);
922 FD_SET(fd, &fdset);
923 return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
924}
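
A zero timeout turns select() into a poll: it returns at once and reports how many descriptors are ready. The same test can be tried on a pipe instead of the tun device, as in this sketch (read_would_block() is a hypothetical name for a copy of the helper above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Would a read() on fd block right now? */
bool read_would_block(int fd)
{
	fd_set fdset;
	struct timeval zero = { 0, 0 };

	FD_ZERO(&fdset);
	FD_SET(fd, &fdset);
	/* One readable descriptor means a read will not block. */
	return select(fd + 1, &fdset, NULL, NULL, &zero) != 1;
}
```

On a freshly created pipe the read end would block; after one byte is written to the other end it would not.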
925
926/*
927 * This handles packets coming in from the tun device to our Guest. Like all
928 * service routines, it gets called again as soon as it returns, so you don't
929 * see a while(1) loop here.
930 */
931static void net_input(struct virtqueue *vq)
932{
933 int len;
934 unsigned int head, out, in;
935 struct iovec iov[vq->vring.num];
936 struct net_info *net_info = vq->dev->priv;
937
938 /*
939 * Get a descriptor to write an incoming packet into. This will also
940 * send an interrupt if they're out of descriptors.
941 */
942 head = wait_for_vq_desc(vq, iov, &out, &in);
943 if (out)
944 errx(1, "Output buffers in net input queue?");
945
946 /*
947 * If it looks like we'll block reading from the tun device, send them
948 * an interrupt.
949 */
950 if (vq->pending_used && will_block(net_info->tunfd))
951 trigger_irq(vq);
952
953 /*
954 * Read in the packet. This is where we normally wait (when there's no
955 * incoming network traffic).
956 */
957 len = readv(net_info->tunfd, iov, in);
958 if (len <= 0)
959 warn("Failed to read from tun (%d).", errno);
960
961 /*
962 * Mark that packet buffer as used, but don't interrupt here. We want
963 * to wait until we've done as much work as we can.
964 */
965 add_used(vq, head, len);
966}
967/*:*/
968
969/* This is the helper to create threads: run the service routine in a loop. */
970static int do_thread(void *_vq)
971{
972 struct virtqueue *vq = _vq;
973
974 for (;;)
975 vq->service(vq);
976 return 0;
977}
978
979/*
980 * When a child dies, we kill our entire process group with SIGTERM. This
981 * also has the side effect that the shell restores the console for us!
982 */
983static void kill_launcher(int signal)
984{
985 kill(0, SIGTERM);
986}
987
988static void reset_device(struct device *dev)
989{
990 struct virtqueue *vq;
991
992 verbose("Resetting device %s\n", dev->name);
993
994 /* Clear any features they've acked. */
995 memset(get_feature_bits(dev) + dev->feature_len, 0, dev->feature_len);
996
997 /* We're going to be explicitly killing threads, so ignore them. */
998 signal(SIGCHLD, SIG_IGN);
999
1000 /* Zero out the virtqueues, get rid of their threads */
1001 for (vq = dev->vq; vq; vq = vq->next) {
1002 if (vq->thread != (pid_t)-1) {
1003 kill(vq->thread, SIGTERM);
1004 waitpid(vq->thread, NULL, 0);
1005 vq->thread = (pid_t)-1;
1006 }
1007 memset(vq->vring.desc, 0,
1008 vring_size(vq->config.num, LGUEST_VRING_ALIGN));
1009 lg_last_avail(vq) = 0;
1010 }
1011 dev->running = false;
1012
1013 /* Now we care if threads die. */
1014 signal(SIGCHLD, (void *)kill_launcher);
1015}
1016
1017/*L:216
1018 * This actually creates the thread which services the virtqueue for a device.
1019 */
1020static void create_thread(struct virtqueue *vq)
1021{
1022 /*
1023 * Create stack for thread. Since the stack grows downwards, we point
1024 * the stack pointer to the end of this region.
1025 */
1026 char *stack = malloc(32768);
1027 unsigned long args[] = { LHREQ_EVENTFD,
1028 vq->config.pfn*getpagesize(), 0 };
1029
1030 /* Create a zero-initialized eventfd. */
1031 vq->eventfd = eventfd(0, 0);
1032 if (vq->eventfd < 0)
1033 err(1, "Creating eventfd");
1034 args[2] = vq->eventfd;
1035
1036 /*
1037 * Attach an eventfd to this virtqueue: it will go off when the Guest
1038 * does an LHCALL_NOTIFY for this vq.
1039 */
1040 if (write(lguest_fd, &args, sizeof(args)) != 0)
1041 err(1, "Attaching eventfd");
1042
1043 /*
1044 * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so
1045 * we get a signal if it dies.
1046 */
1047 vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
1048 if (vq->thread == (pid_t)-1)
1049 err(1, "Creating clone");
1050
1051 /* We close our local copy now the child has it. */
1052 close(vq->eventfd);
1053}
1054
1055static void start_device(struct device *dev)
1056{
1057 unsigned int i;
1058 struct virtqueue *vq;
1059
1060 verbose("Device %s OK: offered", dev->name);
1061 for (i = 0; i < dev->feature_len; i++)
1062 verbose(" %02x", get_feature_bits(dev)[i]);
1063 verbose(", accepted");
1064 for (i = 0; i < dev->feature_len; i++)
1065 verbose(" %02x", get_feature_bits(dev)
1066 [dev->feature_len+i]);
1067
1068 for (vq = dev->vq; vq; vq = vq->next) {
1069 if (vq->service)
1070 create_thread(vq);
1071 }
1072 dev->running = true;
1073}
1074
1075static void cleanup_devices(void)
1076{
1077 struct device *dev;
1078
1079 for (dev = devices.dev; dev; dev = dev->next)
1080 reset_device(dev);
1081
1082 /* If we saved off the original terminal settings, restore them now. */
1083 if (orig_term.c_lflag & (ISIG|ICANON|ECHO))
1084 tcsetattr(STDIN_FILENO, TCSANOW, &orig_term);
1085}
1086
1087/* When the Guest tells us they updated the status field, we handle it. */
1088static void update_device_status(struct device *dev)
1089{
1090 /* A zero status is a reset, otherwise it's a set of flags. */
1091 if (dev->desc->status == 0)
1092 reset_device(dev);
1093 else if (dev->desc->status & VIRTIO_CONFIG_S_FAILED) {
1094 warnx("Device %s configuration FAILED", dev->name);
1095 if (dev->running)
1096 reset_device(dev);
1097 } else {
1098 if (dev->running)
1099 err(1, "Device %s features finalized twice", dev->name);
1100 start_device(dev);
1101 }
1102}
1103
1104/*L:215
1105 * This is the generic routine we call when the Guest uses LHCALL_NOTIFY. In
1106 * particular, it's used to notify us of device status changes during boot.
1107 */
1108static void handle_output(unsigned long addr)
1109{
1110 struct device *i;
1111
1112 /* Check each device. */
1113 for (i = devices.dev; i; i = i->next) {
1114 struct virtqueue *vq;
1115
1116 /*
1117 * Notifications to device descriptors mean they updated the
1118 * device status.
1119 */
1120 if (from_guest_phys(addr) == i->desc) {
1121 update_device_status(i);
1122 return;
1123 }
1124
1125 /* Devices should not be used before features are finalized. */
1126 for (vq = i->vq; vq; vq = vq->next) {
1127 if (addr != vq->config.pfn*getpagesize())
1128 continue;
1129 errx(1, "Notification on %s before setup!", i->name);
1130 }
1131 }
1132
1133 /*
1134 * Early console write is done using notify on a nul-terminated string
1135 * in Guest memory. It's also great for hacking debugging messages
1136 * into a Guest.
1137 */
1138 if (addr >= guest_limit)
1139 errx(1, "Bad NOTIFY %#lx", addr);
1140
1141 write(STDOUT_FILENO, from_guest_phys(addr),
1142 strnlen(from_guest_phys(addr), guest_limit - addr));
1143}
1144
1145/*L:190
1146 * Device Setup
1147 *
1148 * All devices need a descriptor so the Guest knows it exists, and a "struct
1149 * device" so the Launcher can keep track of it. We have common helper
1150 * routines to allocate and manage them.
1151 */
1152
1153/*
1154 * The layout of the device page is a "struct lguest_device_desc" followed by a
1155 * number of virtqueue descriptors, then two sets of feature bits, then an
1156 * array of configuration bytes. This routine returns the configuration
1157 * pointer.
1158 */
1159static u8 *device_config(const struct device *dev)
1160{
1161 return (void *)(dev->desc + 1)
1162 + dev->num_vq * sizeof(struct lguest_vqconfig)
1163 + dev->feature_len * 2;
1164}
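
The layout comment above can be turned into a small offset calculation. This is only a sketch: `demo_device_desc` and `demo_vqconfig` are hypothetical stand-ins for the real `struct lguest_device_desc` and `struct lguest_vqconfig`, and their field lists are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lguest_device_desc (fields assumed). */
struct demo_device_desc {
	uint8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
};

/* Hypothetical stand-in for struct lguest_vqconfig (fields assumed). */
struct demo_vqconfig {
	uint16_t num;
	uint16_t irq;
	uint32_t pfn;
};

/*
 * Byte offset of the configuration area from the start of the descriptor:
 * the descriptor itself, then one vqconfig per virtqueue, then two copies
 * of the feature bits (offered and accepted).
 */
static size_t demo_config_offset(unsigned int num_vq, unsigned int feature_len)
{
	return sizeof(struct demo_device_desc)
		+ num_vq * sizeof(struct demo_vqconfig)
		+ (size_t)feature_len * 2;
}
```

With no virtqueues and no feature bytes the configuration area starts immediately after the descriptor, which is why add_virtqueue() can use device_config() to find where the next virtqueue entry goes.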
1165
1166/*
1167 * This routine allocates a new "struct lguest_device_desc" from descriptor
1168 * table page just above the Guest's normal memory. It returns a pointer to
1169 * that descriptor.
1170 */
1171static struct lguest_device_desc *new_dev_desc(u16 type)
1172{
1173 struct lguest_device_desc d = { .type = type };
1174 void *p;
1175
1176 /* Figure out where the next device config is, based on the last one. */
1177 if (devices.lastdev)
1178 p = device_config(devices.lastdev)
1179 + devices.lastdev->desc->config_len;
1180 else
1181 p = devices.descpage;
1182
1183 /* We only have one page for all the descriptors. */
1184 if (p + sizeof(d) > (void *)devices.descpage + getpagesize())
1185 errx(1, "Too many devices");
1186
1187 /* p might not be aligned, so we memcpy in. */
1188 return memcpy(p, &d, sizeof(d));
1189}
1190
1191/*
1192 * Each device descriptor is followed by the description of its virtqueues. We
1193 * specify how many descriptors the virtqueue is to have.
1194 */
1195static void add_virtqueue(struct device *dev, unsigned int num_descs,
1196 void (*service)(struct virtqueue *))
1197{
1198 unsigned int pages;
1199 struct virtqueue **i, *vq = malloc(sizeof(*vq));
1200 void *p;
1201
1202 /* First we need some memory for this virtqueue. */
1203 pages = (vring_size(num_descs, LGUEST_VRING_ALIGN) + getpagesize() - 1)
1204 / getpagesize();
1205 p = get_pages(pages);
1206
1207 /* Initialize the virtqueue */
1208 vq->next = NULL;
1209 vq->last_avail_idx = 0;
1210 vq->dev = dev;
1211
1212 /*
1213 * This is the routine the service thread will run, and its Process ID
1214 * once it's running.
1215 */
1216 vq->service = service;
1217 vq->thread = (pid_t)-1;
1218
1219 /* Initialize the configuration. */
1220 vq->config.num = num_descs;
1221 vq->config.irq = devices.next_irq++;
1222 vq->config.pfn = to_guest_phys(p) / getpagesize();
1223
1224 /* Initialize the vring. */
1225 vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN);
1226
1227 /*
1228 * Append virtqueue to this device's descriptor. We use
1229 * device_config() to get the end of the device's current virtqueues;
1230 * we check that we haven't added any config or feature information
1231 * yet, otherwise we'd be overwriting them.
1232 */
1233 assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
1234 memcpy(device_config(dev), &vq->config, sizeof(vq->config));
1235 dev->num_vq++;
1236 dev->desc->num_vq++;
1237
1238 verbose("Virtqueue page %#lx\n", to_guest_phys(p));
1239
1240 /*
1241 * Add to tail of list, so dev->vq is first vq, dev->vq->next is
1242 * second.
1243 */
1244 for (i = &dev->vq; *i; i = &(*i)->next);
1245 *i = vq;
1246}
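
The page count computed at the top of add_virtqueue() is an ordinary round-up division. A minimal standalone sketch, where the helper name `demo_pages_needed` is hypothetical and not part of lguest:

```c
#include <stddef.h>

/*
 * Number of whole pages needed to hold `bytes` bytes: adding
 * (page_size - 1) before dividing rounds any partial page up.
 */
static size_t demo_pages_needed(size_t bytes, size_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

A ring that is one byte larger than a page therefore costs two pages, while an exact multiple costs no extra page.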
1247
1248/*
1249 * The first half of the feature bitmask is for us to advertise features. The
1250 * second half is for the Guest to accept features.
1251 */
1252static void add_feature(struct device *dev, unsigned bit)
1253{
1254 u8 *features = get_feature_bits(dev);
1255
1256 /* We can't extend the feature bits once we've added config bytes */
1257 if (dev->desc->feature_len <= bit / CHAR_BIT) {
1258 assert(dev->desc->config_len == 0);
1259 dev->feature_len = dev->desc->feature_len = (bit/CHAR_BIT) + 1;
1260 }
1261
1262 features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
1263}
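
The byte and bit indexing used by add_feature() can be seen in isolation in the sketch below. The helper `demo_set_feature` is hypothetical; it works on a plain caller-supplied array rather than the device page, and it assumes the array is large enough for the requested bit.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Set feature number `bit` in a byte-array bitmask and return the number
 * of bytes the mask must now span (never less than cur_len).
 */
static size_t demo_set_feature(unsigned char *features, size_t cur_len,
			       unsigned int bit)
{
	size_t needed = bit / CHAR_BIT + 1;

	/* bit / CHAR_BIT selects the byte, bit % CHAR_BIT the bit in it. */
	features[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));

	return needed > cur_len ? needed : cur_len;
}
```

Growing the length is the step that add_feature() guards with an assert: once config bytes follow the feature bits, extending the mask would overwrite them.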
1264
1265/*
1266 * This routine sets the configuration fields for an existing device's
1267 * descriptor. It only works for the last device, but that's OK because that's
1268 * how we use it.
1269 */
1270static void set_config(struct device *dev, unsigned len, const void *conf)
1271{
1272 /* Check we haven't overflowed our single page. */
1273 if (device_config(dev) + len > devices.descpage + getpagesize())
1274 errx(1, "Too many devices");
1275
1276 /* Copy in the config information, and store the length. */
1277 memcpy(device_config(dev), conf, len);
1278 dev->desc->config_len = len;
1279
1280 /* Size must fit in config_len field (8 bits)! */
1281 assert(dev->desc->config_len == len);
1282}
1283
1284/*
1285 * This routine does all the creation and setup of a new device, including
1286 * calling new_dev_desc() to allocate the descriptor and device memory. We
1287 * don't actually start the service threads until later.
1288 *
1289 * See what I mean about userspace being boring?
1290 */
1291static struct device *new_device(const char *name, u16 type)
1292{
1293 struct device *dev = malloc(sizeof(*dev));
1294
1295 /* Now we populate the fields one at a time. */
1296 dev->desc = new_dev_desc(type);
1297 dev->name = name;
1298 dev->vq = NULL;
1299 dev->feature_len = 0;
1300 dev->num_vq = 0;
1301 dev->running = false;
1302
1303 /*
1304 * Append to device list. Prepending to a single-linked list is
1305 * easier, but the user expects the devices to be arranged on the bus
1306 * in command-line order. The first network device on the command line
1307 * is eth0, the first block device /dev/vda, etc.
1308 */
1309 if (devices.lastdev)
1310 devices.lastdev->next = dev;
1311 else
1312 devices.dev = dev;
1313 devices.lastdev = dev;
1314
1315 return dev;
1316}
1317
1318/*
1319 * Our first setup routine is the console. It's a fairly simple device, but
1320 * UNIX tty handling makes it uglier than it could be.
1321 */
1322static void setup_console(void)
1323{
1324 struct device *dev;
1325
1326 /* If we can save the initial standard input settings... */
1327 if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
1328 struct termios term = orig_term;
1329 /*
1330 * Then we turn off echo, line buffering and ^C etc: We want a
1331 * raw input stream to the Guest.
1332 */
1333 term.c_lflag &= ~(ISIG|ICANON|ECHO);
1334 tcsetattr(STDIN_FILENO, TCSANOW, &term);
1335 }
1336
1337 dev = new_device("console", VIRTIO_ID_CONSOLE);
1338
1339 /* We store the console state in dev->priv, and initialize it. */
1340 dev->priv = malloc(sizeof(struct console_abort));
1341 ((struct console_abort *)dev->priv)->count = 0;
1342
1343 /*
1344 * The console needs two virtqueues: the input then the output. When
1345 * they put something in the input queue, we make sure we're listening to
1346 * stdin. When they put something in the output queue, we write it to
1347 * stdout.
1348 */
1349 add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
1350 add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
1351
1352 verbose("device %u: console\n", ++devices.device_num);
1353}
1354/*:*/
1355
1356/*M:010
1357 * Inter-guest networking is an interesting area. Simplest is to have a
1358 * --sharenet=<name> option which opens or creates a named pipe. This can be
1359 * used to send packets to another guest in a 1:1 manner.
1360 *
1361 * More sophisticated is to use one of the tools developed for projects like UML
1362 * to do networking.
1363 *
1364 * Faster is to do virtio bonding in kernel. Doing this 1:1 would be
1365 * completely generic ("here's my vring, attach to your vring") and would work
1366 * for any traffic. Of course, namespace and permissions issues need to be
1367 * dealt with. A more sophisticated "multi-channel" virtio_net.c could hide
1368 * multiple inter-guest channels behind one interface, although it would
1369 * require some manner of hotplugging new virtio channels.
1370 *
1371 * Finally, we could use a virtio network switch in the kernel, ie. vhost.
1372:*/
1373
1374static u32 str2ip(const char *ipaddr)
1375{
1376 unsigned int b[4];
1377
1378 if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
1379 errx(1, "Failed to parse IP address '%s'", ipaddr);
1380 return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
1381}
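
As a standalone illustration of the dotted-quad packing done by str2ip(), here is a sketch that returns an error code instead of exiting. The name `demo_str2ip` and the extra range check on each octet are assumptions added for the example; the original does not perform that check.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "a.b.c.d" into a host-byte-order 32-bit address.
 * Returns 0 on success, -1 if the string does not parse or an octet
 * is larger than 255.
 */
static int demo_str2ip(const char *ipaddr, uint32_t *out)
{
	unsigned int b[4];
	int i;

	if (sscanf(ipaddr, "%u.%u.%u.%u", &b[0], &b[1], &b[2], &b[3]) != 4)
		return -1;
	for (i = 0; i < 4; i++)
		if (b[i] > 255)
			return -1;

	/* First octet is the most significant byte. */
	*out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
		| ((uint32_t)b[2] << 8) | (uint32_t)b[3];
	return 0;
}
```

The result is in host byte order, which is why configure_device() passes it through htonl() before handing it to the kernel.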
1382
1383static void str2mac(const char *macaddr, unsigned char mac[6])
1384{
1385 unsigned int m[6];
1386 if (sscanf(macaddr, "%02x:%02x:%02x:%02x:%02x:%02x",
1387 &m[0], &m[1], &m[2], &m[3], &m[4], &m[5]) != 6)
1388 errx(1, "Failed to parse mac address '%s'", macaddr);
1389 mac[0] = m[0];
1390 mac[1] = m[1];
1391 mac[2] = m[2];
1392 mac[3] = m[3];
1393 mac[4] = m[4];
1394 mac[5] = m[5];
1395}
1396
1397/*
1398 * This code is "adapted" from libbridge: it attaches the Host end of the
1399 * network device to the bridge device specified by the command line.
1400 *
1401 * This is yet another James Morris contribution (I'm an IP-level guy, so I
1402 * dislike bridging), and I just try not to break it.
1403 */
1404static void add_to_bridge(int fd, const char *if_name, const char *br_name)
1405{
1406 int ifidx;
1407 struct ifreq ifr;
1408
1409 if (!*br_name)
1410 errx(1, "must specify bridge name");
1411
1412 ifidx = if_nametoindex(if_name);
1413 if (!ifidx)
1414 errx(1, "interface %s does not exist!", if_name);
1415
1416 strncpy(ifr.ifr_name, br_name, IFNAMSIZ);
1417 ifr.ifr_name[IFNAMSIZ-1] = '\0';
1418 ifr.ifr_ifindex = ifidx;
1419 if (ioctl(fd, SIOCBRADDIF, &ifr) < 0)
1420 err(1, "can't add %s to bridge %s", if_name, br_name);
1421}
1422
1423/*
1424 * This sets up the Host end of the network device with an IP address and
1425 * brings it up so packets will flow. The MAC address, if any, is parsed
1426 * separately by str2mac() in setup_tun_net().
1427 */
1428static void configure_device(int fd, const char *tapif, u32 ipaddr)
1429{
1430 struct ifreq ifr;
1431 struct sockaddr_in sin;
1432
1433 memset(&ifr, 0, sizeof(ifr));
1434 strcpy(ifr.ifr_name, tapif);
1435
1436 /* Don't read these incantations. Just cut & paste them like I did! */
1437 sin.sin_family = AF_INET;
1438 sin.sin_addr.s_addr = htonl(ipaddr);
1439 memcpy(&ifr.ifr_addr, &sin, sizeof(sin));
1440 if (ioctl(fd, SIOCSIFADDR, &ifr) != 0)
1441 err(1, "Setting %s interface address", tapif);
1442 ifr.ifr_flags = IFF_UP;
1443 if (ioctl(fd, SIOCSIFFLAGS, &ifr) != 0)
1444 err(1, "Bringing interface %s up", tapif);
1445}
1446
1447static int get_tun_device(char tapif[IFNAMSIZ])
1448{
1449 struct ifreq ifr;
1450 int netfd;
1451
1452 /* Start with this zeroed. Messy but sure. */
1453 memset(&ifr, 0, sizeof(ifr));
1454
1455 /*
1456 * We open the /dev/net/tun device and tell it we want a tap device. A
1457 * tap device is like a tun device, only somehow different. To tell
1458 * the truth, I completely blundered my way through this code, but it
1459 * works now!
1460 */
1461 netfd = open_or_die("/dev/net/tun", O_RDWR);
1462 ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
1463 strcpy(ifr.ifr_name, "tap%d");
1464 if (ioctl(netfd, TUNSETIFF, &ifr) != 0)
1465 err(1, "configuring /dev/net/tun");
1466
1467 if (ioctl(netfd, TUNSETOFFLOAD,
1468 TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
1469 err(1, "Could not set features for tun device");
1470
1471 /*
1472 * We don't need checksums calculated for packets coming in this
1473 * device: trust us!
1474 */
1475 ioctl(netfd, TUNSETNOCSUM, 1);
1476
1477 memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
1478 return netfd;
1479}
1480
1481/*L:195
1482 * Our network is a Host<->Guest network. This can either use bridging or
1483 * routing, but the principle is the same: it uses the "tun" device to inject
1484 * packets into the Host as if they came in from a normal network card. We
1485 * just shunt packets between the Guest and the tun device.
1486 */
1487static void setup_tun_net(char *arg)
1488{
1489 struct device *dev;
1490 struct net_info *net_info = malloc(sizeof(*net_info));
1491 int ipfd;
1492 u32 ip = INADDR_ANY;
1493 bool bridging = false;
1494 char tapif[IFNAMSIZ], *p;
1495 struct virtio_net_config conf;
1496
1497 net_info->tunfd = get_tun_device(tapif);
1498
1499 /* First we create a new network device. */
1500 dev = new_device("net", VIRTIO_ID_NET);
1501 dev->priv = net_info;
1502
1503 /* Network devices need a recv and a send queue, just like console. */
1504 add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
1505 add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
1506
1507 /*
1508 * We need a socket to perform the magic network ioctls to bring up the
1509 * tap interface, connect to the bridge etc. Any socket will do!
1510 */
1511 ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
1512 if (ipfd < 0)
1513 err(1, "opening IP socket");
1514
1515 /* If the command line was --tunnet=bridge:<name> do bridging. */
1516 if (!strncmp(BRIDGE_PFX, arg, strlen(BRIDGE_PFX))) {
1517 arg += strlen(BRIDGE_PFX);
1518 bridging = true;
1519 }
1520
1521 /* A mac address may follow the bridge name or IP address */
1522 p = strchr(arg, ':');
1523 if (p) {
1524 str2mac(p+1, conf.mac);
1525 add_feature(dev, VIRTIO_NET_F_MAC);
1526 *p = '\0';
1527 }
1528
1529 /* arg is now either an IP address or a bridge name */
1530 if (bridging)
1531 add_to_bridge(ipfd, tapif, arg);
1532 else
1533 ip = str2ip(arg);
1534
1535 /* Set up the tun device. */
1536 configure_device(ipfd, tapif, ip);
1537
1538 /* Expect Guest to handle everything except UFO */
1539 add_feature(dev, VIRTIO_NET_F_CSUM);
1540 add_feature(dev, VIRTIO_NET_F_GUEST_CSUM);
1541 add_feature(dev, VIRTIO_NET_F_GUEST_TSO4);
1542 add_feature(dev, VIRTIO_NET_F_GUEST_TSO6);
1543 add_feature(dev, VIRTIO_NET_F_GUEST_ECN);
1544 add_feature(dev, VIRTIO_NET_F_HOST_TSO4);
1545 add_feature(dev, VIRTIO_NET_F_HOST_TSO6);
1546 add_feature(dev, VIRTIO_NET_F_HOST_ECN);
1547 /* We handle indirect ring entries */
1548 add_feature(dev, VIRTIO_RING_F_INDIRECT_DESC);
1549 set_config(dev, sizeof(conf), &conf);
1550
1551 /* We don't need the socket any more; setup is done. */
1552 close(ipfd);
1553
1554 devices.device_num++;
1555
1556 if (bridging)
1557 verbose("device %u: tun %s attached to bridge: %s\n",
1558 devices.device_num, tapif, arg);
1559 else
1560 verbose("device %u: tun %s: %s\n",
1561 devices.device_num, tapif, arg);
1562}
1563/*:*/
1564
1565/* This hangs off device->priv. */
1566struct vblk_info {
1567 /* The size of the file. */
1568 off64_t len;
1569
1570 /* The file descriptor for the file. */
1571 int fd;
1572
1573};
1574
1575/*L:210
1576 * The Disk
1577 *
1578 * The disk only has one virtqueue, so it only has one thread. It is really
1579 * simple: the Guest asks for a block number and we read or write that position
1580 * in the file.
1581 *
1582 * Before we serviced each virtqueue in a separate thread, that was unacceptably
1583 * slow: the Guest waits until the read is finished before running anything
1584 * else, even if it could have been doing useful work.
1585 *
1586 * We could have used async I/O, except it's reputed to suck so hard that
1587 * characters actually go missing from your code when you try to use it.
1588 */
1589static void blk_request(struct virtqueue *vq)
1590{
1591 struct vblk_info *vblk = vq->dev->priv;
1592 unsigned int head, out_num, in_num, wlen;
1593 int ret;
1594 u8 *in;
1595 struct virtio_blk_outhdr *out;
1596 struct iovec iov[vq->vring.num];
1597 off64_t off;
1598
1599 /*
1600 * Get the next request, where we normally wait. It triggers the
1601 * interrupt to acknowledge previously serviced requests (if any).
1602 */
1603 head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
1604
1605 /*
1606 * Every block request should contain at least one output buffer
1607 * (detailing the location on disk and the type of request) and one
1608 * input buffer (to hold the result).
1609 */
1610 if (out_num == 0 || in_num == 0)
1611 errx(1, "Bad virtblk cmd %u out=%u in=%u",
1612 head, out_num, in_num);
1613
1614 out = convert(&iov[0], struct virtio_blk_outhdr);
1615 in = convert(&iov[out_num+in_num-1], u8);
1616 /*
1617 * For historical reasons, block operations are expressed in 512 byte
1618 * "sectors".
1619 */
1620 off = out->sector * 512;
1621
1622 /*
1623 * In general the virtio block driver is allowed to try SCSI commands.
1624 * It'd be nice if we supported eject, for example, but we don't.
1625 */
1626 if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
1627 fprintf(stderr, "Scsi commands unsupported\n");
1628 *in = VIRTIO_BLK_S_UNSUPP;
1629 wlen = sizeof(*in);
1630 } else if (out->type & VIRTIO_BLK_T_OUT) {
1631 /*
1632 * Write
1633 *
1634 * Move to the right location in the block file. This can fail
1635 * if they try to write past end.
1636 */
1637 if (lseek64(vblk->fd, off, SEEK_SET) != off)
1638 err(1, "Bad seek to sector %llu", out->sector);
1639
1640 ret = writev(vblk->fd, iov+1, out_num-1);
1641 verbose("WRITE to sector %llu: %i\n", out->sector, ret);
1642
1643 /*
1644 * Grr... Now we know how long the descriptor they sent was, we
1645 * make sure they didn't try to write over the end of the block
1646 * file (possibly extending it).
1647 */
1648 if (ret > 0 && off + ret > vblk->len) {
1649 /* Trim it back to the correct length */
1650 ftruncate64(vblk->fd, vblk->len);
1651 /* Die, bad Guest, die. */
1652 errx(1, "Write past end %llu+%u", off, ret);
1653 }
1654
1655 wlen = sizeof(*in);
1656 *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
1657 } else if (out->type & VIRTIO_BLK_T_FLUSH) {
1658 /* Flush */
1659 ret = fdatasync(vblk->fd);
1660 verbose("FLUSH fdatasync: %i\n", ret);
1661 wlen = sizeof(*in);
1662 *in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
1663 } else {
1664 /*
1665 * Read
1666 *
1667 * Move to the right location in the block file. This can fail
1668 * if they try to read past end.
1669 */
1670 if (lseek64(vblk->fd, off, SEEK_SET) != off)
1671 err(1, "Bad seek to sector %llu", out->sector);
1672
1673 ret = readv(vblk->fd, iov+1, in_num-1);
1674 verbose("READ from sector %llu: %i\n", out->sector, ret);
1675 if (ret >= 0) {
1676 wlen = sizeof(*in) + ret;
1677 *in = VIRTIO_BLK_S_OK;
1678 } else {
1679 wlen = sizeof(*in);
1680 *in = VIRTIO_BLK_S_IOERR;
1681 }
1682 }
1683
1684 /* Finished that request. */
1685 add_used(vq, head, wlen);
1686}
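
The sector arithmetic in blk_request() can be sketched on its own. Note the difference in approach: the code above writes first and then truncates the file back if the Guest ran past the end, whereas this hypothetical helper (`demo_write_fits`, not part of lguest) checks the range before any I/O happens.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512	/* block requests are expressed in 512-byte sectors */

/* Convert a sector number into a byte offset within the backing file. */
static uint64_t demo_sector_to_offset(uint64_t sector)
{
	return sector * DEMO_SECTOR_SIZE;
}

/*
 * True if a transfer of nbytes starting at `sector` stays inside a file
 * of file_len bytes. Written so the comparison itself cannot overflow.
 */
static bool demo_write_fits(uint64_t sector, uint64_t nbytes, uint64_t file_len)
{
	uint64_t off = demo_sector_to_offset(sector);

	return off <= file_len && nbytes <= file_len - off;
}
```

Checking up front would avoid temporarily extending the file, at the cost of summing the iovec lengths before calling writev().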
1687
1688/*L:198 This actually sets up a virtual block device. */
1689static void setup_block_file(const char *filename)
1690{
1691 struct device *dev;
1692 struct vblk_info *vblk;
1693 struct virtio_blk_config conf;
1694
1695 /* Create the device. */
1696 dev = new_device("block", VIRTIO_ID_BLOCK);
1697
1698 /* The device has one virtqueue, where the Guest places requests. */
1699 add_virtqueue(dev, VIRTQUEUE_NUM, blk_request);
1700
1701 /* Allocate the room for our own bookkeeping */
1702 vblk = dev->priv = malloc(sizeof(*vblk));
1703
1704 /* First we open the file and store the length. */
1705 vblk->fd = open_or_die(filename, O_RDWR|O_LARGEFILE);
1706 vblk->len = lseek64(vblk->fd, 0, SEEK_END);
1707
1708 /* We support FLUSH. */
1709 add_feature(dev, VIRTIO_BLK_F_FLUSH);
1710
1711 /* Tell Guest how many sectors this device has. */
1712 conf.capacity = cpu_to_le64(vblk->len / 512);
1713
1714 /*
1715 * Tell Guest not to put in too many descriptors at once: two are used
1716 * for the in and out elements.
1717 */
1718 add_feature(dev, VIRTIO_BLK_F_SEG_MAX);
1719 conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2);
1720
1721 /* Don't try to put whole struct: we have 8 bit limit. */
1722 set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf);
1723
1724 verbose("device %u: virtblock %llu sectors\n",
1725 ++devices.device_num, le64_to_cpu(conf.capacity));
1726}
1727
1728/*L:211
1729 * Our random number generator device reads from /dev/random into the Guest's
1730 * input buffers. The usual case is that the Guest doesn't want random numbers
1731 * and so has no buffers although /dev/random is still readable, whereas
1732 * console is the reverse.
1733 *
1734 * The same logic applies, however.
1735 */
1736struct rng_info {
1737 int rfd;
1738};
1739
1740static void rng_input(struct virtqueue *vq)
1741{
1742 int len;
1743 unsigned int head, in_num, out_num, totlen = 0;
1744 struct rng_info *rng_info = vq->dev->priv;
1745 struct iovec iov[vq->vring.num];
1746
1747 /* First we need a buffer from the Guest's virtqueue. */
1748 head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
1749 if (out_num)
1750 errx(1, "Output buffers in rng?");
1751
1752 /*
1753 * Just like the console write, we loop to cover the whole iovec.
1754 * In this case, short reads actually happen quite a bit.
1755 */
1756 while (!iov_empty(iov, in_num)) {
1757 len = readv(rng_info->rfd, iov, in_num);
1758 if (len <= 0)
1759 err(1, "Read from /dev/random gave %i", len);
1760 iov_consume(iov, in_num, len);
1761 totlen += len;
1762 }
1763
1764 /* Tell the Guest about the new input. */
1765 add_used(vq, head, totlen);
1766}
1767
1768/*L:199
1769 * This creates a "hardware" random number device for the Guest.
1770 */
1771static void setup_rng(void)
1772{
1773 struct device *dev;
1774 struct rng_info *rng_info = malloc(sizeof(*rng_info));
1775
1776 /* Our device's private info simply contains the /dev/random fd. */
1777 rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
1778
1779 /* Create the new device. */
1780 dev = new_device("rng", VIRTIO_ID_RNG);
1781 dev->priv = rng_info;
1782
1783 /* The device has one virtqueue, where the Guest places inbufs. */
1784 add_virtqueue(dev, VIRTQUEUE_NUM, rng_input);
1785
1786 verbose("device %u: rng\n", ++devices.device_num);
1787}
1788/* That's the end of device setup. */
1789
1790/*L:230 Reboot is pretty easy: clean up and exec() the Launcher afresh. */
1791static void __attribute__((noreturn)) restart_guest(void)
1792{
1793 unsigned int i;
1794
1795 /*
1796 * Since we don't track all open fds, we simply close everything beyond
1797 * stderr.
1798 */
1799 for (i = 3; i < FD_SETSIZE; i++)
1800 close(i);
1801
1802 /* Reset all the devices (kills all threads). */
1803 cleanup_devices();
1804
1805 execv(main_args[0], main_args);
1806 err(1, "Could not exec %s", main_args[0]);
1807}
1808
1809/*L:220
1810 * Finally we reach the core of the Launcher which runs the Guest, serves
1811 * its input and output, and finally, lays it to rest.
1812 */
1813static void __attribute__((noreturn)) run_guest(void)
1814{
1815 for (;;) {
1816 unsigned long notify_addr;
1817 int readval;
1818
1819 /* We read from the /dev/lguest device to run the Guest. */
1820 readval = pread(lguest_fd, &notify_addr,
1821 sizeof(notify_addr), cpu_id);
1822
1823 /* One unsigned long means the Guest did LHCALL_NOTIFY */
1824 if (readval == sizeof(notify_addr)) {
1825 verbose("Notify on address %#lx\n", notify_addr);
1826 handle_output(notify_addr);
1827 /* ENOENT means the Guest died. Reading tells us why. */
1828 } else if (errno == ENOENT) {
1829 char reason[1024] = { 0 };
1830 pread(lguest_fd, reason, sizeof(reason)-1, cpu_id);
1831 errx(1, "%s", reason);
1832 /* ERESTART means that we need to reboot the guest */
1833 } else if (errno == ERESTART) {
1834 restart_guest();
1835 /* Anything else means a bug or incompatible change. */
1836 } else
1837 err(1, "Running guest failed");
1838 }
1839}
1840/*L:240
1841 * This is the end of the Launcher. The good news: we are over halfway
1842 * through! The bad news: the most fiendish part of the code still lies ahead
1843 * of us.
1844 *
1845 * Are you ready? Take a deep breath and join me in the core of the Host, in
1846 * "make Host".
1847:*/
1848
1849static struct option opts[] = {
1850 { "verbose", 0, NULL, 'v' },
1851 { "tunnet", 1, NULL, 't' },
1852 { "block", 1, NULL, 'b' },
1853 { "rng", 0, NULL, 'r' },
1854 { "initrd", 1, NULL, 'i' },
1855 { "username", 1, NULL, 'u' },
1856 { "chroot", 1, NULL, 'c' },
1857 { NULL },
1858};
1859static void usage(void)
1860{
1861 errx(1, "Usage: lguest [--verbose] "
1862 "[--tunnet=(<ipaddr>:<macaddr>|bridge:<bridgename>:<macaddr>)\n"
1863 "|--block=<filename>|--initrd=<filename>]...\n"
1864 "<mem-in-mb> vmlinux [args...]");
1865}
1866
1867/*L:105 The main routine is where the real work begins: */
1868int main(int argc, char *argv[])
1869{
1870 /* Memory, code startpoint and size of the (optional) initrd. */
1871 unsigned long mem = 0, start, initrd_size = 0;
1872 /* Two temporaries. */
1873 int i, c;
1874 /* The boot information for the Guest. */
1875 struct boot_params *boot;
1876 /* If they specify an initrd file to load. */
1877 const char *initrd_name = NULL;
1878
1879 /* Password structure for initgroups/setres[gu]id */
1880 struct passwd *user_details = NULL;
1881
1882 /* Directory to chroot to */
1883 char *chroot_path = NULL;
1884
1885 /* Save the args: we "reboot" by execing ourselves again. */
1886 main_args = argv;
1887
1888 /*
1889 * First we initialize the device list. We keep a pointer to the last
1890 * device, and the next interrupt number to use for devices (1:
1891 * remember that 0 is used by the timer).
1892 */
1893 devices.lastdev = NULL;
1894 devices.next_irq = 1;
1895
1896 /* We're CPU 0. In fact, that's the only CPU possible right now. */
1897 cpu_id = 0;
1898
1899 /*
1900 * We need to know how much memory so we can set up the device
1901 * descriptor and memory pages for the devices as we parse the command
1902 * line. So we quickly look through the arguments to find the amount
1903 * of memory now.
1904 */
1905 for (i = 1; i < argc; i++) {
1906 if (argv[i][0] != '-') {
1907 mem = atoi(argv[i]) * 1024 * 1024;
1908 /*
1909 * We start by mapping anonymous pages over all of
1910 * guest-physical memory range. This fills it with 0,
1911 * and ensures that the Guest won't be killed when it
1912 * tries to access it.
1913 */
1914 guest_base = map_zeroed_pages(mem / getpagesize()
1915 + DEVICE_PAGES);
1916 guest_limit = mem;
1917 guest_max = mem + DEVICE_PAGES*getpagesize();
1918 devices.descpage = get_pages(1);
1919 break;
1920 }
1921 }
1922
1923 /* The options are fairly straight-forward */
1924 while ((c = getopt_long(argc, argv, "v", opts, NULL)) != EOF) {
1925 switch (c) {
1926 case 'v':
1927 verbose = true;
1928 break;
1929 case 't':
1930 setup_tun_net(optarg);
1931 break;
1932 case 'b':
1933 setup_block_file(optarg);
1934 break;
1935 case 'r':
1936 setup_rng();
1937 break;
1938 case 'i':
1939 initrd_name = optarg;
1940 break;
1941 case 'u':
1942 user_details = getpwnam(optarg);
1943 if (!user_details)
1944 err(1, "getpwnam failed, incorrect username?");
1945 break;
1946 case 'c':
1947 chroot_path = optarg;
1948 break;
1949 default:
1950 warnx("Unknown argument %s", argv[optind]);
1951 usage();
1952 }
1953 }
1954 /*
1955 * After the other arguments we expect memory and kernel image name,
1956 * followed by command line arguments for the kernel.
1957 */
1958 if (optind + 2 > argc)
1959 usage();
1960
1961 verbose("Guest base is at %p\n", guest_base);
1962
1963 /* We always have a console device */
1964 setup_console();
1965
1966 /* Now we load the kernel */
1967 start = load_kernel(open_or_die(argv[optind+1], O_RDONLY));
1968
1969 /* Boot information is stashed at physical address 0 */
1970 boot = from_guest_phys(0);
1971
1972 /* Map the initrd image if requested (at top of physical memory) */
1973 if (initrd_name) {
1974 initrd_size = load_initrd(initrd_name, mem);
1975 /*
1976 * These are the locations in the Linux boot header where the
1977 * start and size of the initrd are expected to be found.
1978 */
1979 boot->hdr.ramdisk_image = mem - initrd_size;
1980 boot->hdr.ramdisk_size = initrd_size;
1981 /* The bootloader type 0xFF means "unknown"; that's OK. */
1982 boot->hdr.type_of_loader = 0xFF;
1983 }
1984
1985 /*
1986 * The Linux boot header contains an "E820" memory map: ours is a
1987 * simple, single region.
1988 */
1989 boot->e820_entries = 1;
1990 boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM });
1991 /*
1992 * The boot header contains a command line pointer: we put the command
1993 * line after the boot header.
1994 */
1995 boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1);
1996 /* We use a simple helper to copy the arguments separated by spaces. */
1997 concat((char *)(boot + 1), argv+optind+2);
1998
1999 /* Set kernel alignment to 16M (CONFIG_PHYSICAL_ALIGN) */
2000 boot->hdr.kernel_alignment = 0x1000000;
2001
2002 /* Boot protocol version: 2.07 supports the fields for lguest. */
2003 boot->hdr.version = 0x207;
2004
2005 /* The hardware_subarch value of "1" tells the Guest it's an lguest. */
2006 boot->hdr.hardware_subarch = 1;
2007
2008 /* Tell the entry path not to try to reload segment registers. */
2009 boot->hdr.loadflags |= KEEP_SEGMENTS;
2010
2011 /* We tell the kernel to initialize the Guest. */
2012 tell_kernel(start);
2013
2014 /* Ensure that we terminate if a device-servicing child dies. */
2015 signal(SIGCHLD, kill_launcher);
2016
2017 /* If we exit via err(), this kills all the threads, restores tty. */
2018 atexit(cleanup_devices);
2019
2020 /* If requested, chroot to a directory */
2021 if (chroot_path) {
2022 if (chroot(chroot_path) != 0)
2023 err(1, "chroot(\"%s\") failed", chroot_path);
2024
2025 if (chdir("/") != 0)
2026 err(1, "chdir(\"/\") failed");
2027
2028 verbose("chroot done\n");
2029 }
2030
2031 /* If requested, drop privileges */
2032 if (user_details) {
2033 uid_t u;
2034 gid_t g;
2035
2036 u = user_details->pw_uid;
2037 g = user_details->pw_gid;
2038
2039 if (initgroups(user_details->pw_name, g) != 0)
2040 err(1, "initgroups failed");
2041
2042 if (setresgid(g, g, g) != 0)
2043 err(1, "setresgid failed");
2044
2045 if (setresuid(u, u, u) != 0)
2046 err(1, "setresuid failed");
2047
2048 verbose("Dropping privileges completed\n");
2049 }
2050
2051 /* Finally, run the Guest. This doesn't return. */
2052 run_guest();
2053}
2054/*:*/
2055
2056/*M:999
2057 * Mastery is done: you now know everything I do.
2058 *
2059 * But surely you have seen code, features and bugs in your wanderings which
2060 * you now yearn to attack? That is the real game, and I look forward to you
2061 * patching and forking lguest into the Your-Name-Here-visor.
2062 *
2063 * Farewell, and good coding!
2064 * Rusty Russell.
2065 */
diff --git a/Documentation/virtual/lguest/lguest.txt b/Documentation/virtual/lguest/lguest.txt
new file mode 100644
index 00000000000..bff0c554485
--- /dev/null
+++ b/Documentation/virtual/lguest/lguest.txt
@@ -0,0 +1,129 @@
1 __
2 (___()'`; Rusty's Remarkably Unreliable Guide to Lguest
3 /, /` - or, A Young Coder's Illustrated Hypervisor
4 \\"--\\ http://lguest.ozlabs.org
5
6Lguest is designed to be a minimal 32-bit x86 hypervisor for the Linux kernel,
7for Linux developers and users to experiment with virtualization with the
8minimum of complexity. Nonetheless, it should have sufficient features to
9make it useful for specific tasks, and, of course, you are encouraged to fork
10and enhance it (see drivers/lguest/README).
11
12Features:
13
14- Kernel module which runs in a normal kernel.
15- Simple I/O model for communication.
16- Simple program to create new guests.
17- Logo contains cute puppies: http://lguest.ozlabs.org
18
19Developer features:
20
21- Fun to hack on.
22- No ABI: being tied to a specific kernel anyway, you can change anything.
23- Many opportunities for improvement or feature implementation.
24
25Running Lguest:
26
27- The easiest way to run lguest is to use the same kernel as guest and host.
28 You can configure them differently, but usually it's easiest not to.
29
30 You will need to configure your kernel with the following options:
31
32 "General setup":
33 "Prompt for development and/or incomplete code/drivers" = Y
34 (CONFIG_EXPERIMENTAL=y)
35
36 "Processor type and features":
37 "Paravirtualized guest support" = Y
38 "Lguest guest support" = Y
39 "High Memory Support" = off/4GB
40 "Alignment value to which kernel should be aligned" = 0x100000
41 (CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
42 CONFIG_PHYSICAL_ALIGN=0x100000)
43
44 "Device Drivers":
45 "Block devices"
46 "Virtio block driver (EXPERIMENTAL)" = M/Y
47 "Network device support"
48 "Universal TUN/TAP device driver support" = M/Y
49 "Virtio network driver (EXPERIMENTAL)" = M/Y
50 (CONFIG_VIRTIO_BLK=m, CONFIG_VIRTIO_NET=m and CONFIG_TUN=m)
51
52 "Virtualization"
53 "Linux hypervisor example code" = M/Y
54 (CONFIG_LGUEST=m)
55
56- A tool called "lguest" is available in this directory: type "make"
57 to build it. If you didn't build your kernel in-tree, use "make
58 O=<builddir>".
59
60- Create or find a root disk image. There are several useful ones
61 around, such as the xm-test tiny root image at
62 http://xm-test.xensource.com/ramdisks/initrd-1.1-i386.img
63
64 For more serious work, I usually use a distribution ISO image and
65 install it under qemu, then make multiple copies:
66
67 dd if=/dev/zero of=rootfile bs=1M count=2048
68 qemu -cdrom image.iso -hda rootfile -net user -net nic -boot d
69
70 Make sure that you install a getty on /dev/hvc0 if you want to log in on the
71 console!
72
73- "modprobe lg" if you built it as a module.
74
75- Run an lguest as root:
76
77 Documentation/virtual/lguest/lguest 64 vmlinux --tunnet=192.168.19.1 \
78 --block=rootfile root=/dev/vda
79
80 Explanation:
81 64: the amount of memory to use, in MB.
82
83 vmlinux: the kernel image found in the top of your build directory. You
84 can also use a standard bzImage.
85
86 --tunnet=192.168.19.1: configures a "tap" device for networking with this
87 IP address.
88
89 --block=rootfile: a file or block device which becomes /dev/vda
90 inside the guest.
91
92 root=/dev/vda: this (and anything else on the command line) are
93 kernel boot parameters.
94
95- Configuring networking. I usually have the host masquerade, using
96 "iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE" and "echo 1 >
97 /proc/sys/net/ipv4/ip_forward". In this example, I would configure
98 eth0 inside the guest at 192.168.19.2.
99
100 Another method is to bridge the tap device to an external interface
101 using --tunnet=bridge:<bridgename>, and perhaps run dhcp on the guest
102 to obtain an IP address. The bridge needs to be configured first:
103 this option simply adds the tap interface to it.
104
105 A simple example on my system:
106
107 ifconfig eth0 0.0.0.0
108 brctl addbr lg0
109 ifconfig lg0 up
110 brctl addif lg0 eth0
111 dhclient lg0
112
113 Then use --tunnet=bridge:lg0 when launching the guest.
114
115 See:
116
117 http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
118
119 for general information on how to get bridging to work.
120
121- Random number generation. Using the --rng option will provide a
122 /dev/hwrng in the guest that will read from the host's /dev/random.
123 Use this option in conjunction with rng-tools (see ../hw_random.txt)
124 to provide entropy to the guest kernel's /dev/random.
125
126There is a helpful mailing list at http://ozlabs.org/mailman/listinfo/lguest
127
128Good luck!
129Rusty Russell rusty@rustcorp.com.au.
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile
new file mode 100644
index 00000000000..3fa4d066886
--- /dev/null
+++ b/Documentation/vm/Makefile
@@ -0,0 +1,8 @@
1# kbuild trick to avoid linker error. Can be omitted if a module is built.
2obj- := dummy.o
3
4# List of programs to build
5hostprogs-y := page-types hugepage-mmap hugepage-shm map_hugetlb
6
7# Tell kbuild to always build the programs
8always := $(hostprogs-y)
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c
new file mode 100644
index 00000000000..db0dd9a33d5
--- /dev/null
+++ b/Documentation/vm/hugepage-mmap.c
@@ -0,0 +1,91 @@
1/*
2 * hugepage-mmap:
3 *
4 * Example of using huge page memory in a user application using the mmap
5 * system call. Before running this application, make sure that the
6 * administrator has mounted the hugetlbfs filesystem (on some directory
7 * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this
8 * example, the app is requesting memory of size 256MB that is backed by
9 * huge pages.
10 *
11 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
12 * huge pages. That means that if one requires a fixed address, a huge page
13 * aligned address starting with 0x800000... will be required. If a fixed
14 * address is not required, the kernel will select an address in the proper
15 * range.
16 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
17 */
18
19#include <stdlib.h>
20#include <stdio.h>
21#include <unistd.h>
22#include <sys/mman.h>
23#include <fcntl.h>
24
25#define FILE_NAME "/mnt/hugepagefile"
26#define LENGTH (256UL*1024*1024)
27#define PROTECTION (PROT_READ | PROT_WRITE)
28
29/* Only ia64 requires this */
30#ifdef __ia64__
31#define ADDR (void *)(0x8000000000000000UL)
32#define FLAGS (MAP_SHARED | MAP_FIXED)
33#else
34#define ADDR (void *)(0x0UL)
35#define FLAGS (MAP_SHARED)
36#endif
37
38static void check_bytes(char *addr)
39{
40 printf("First hex is %x\n", *((unsigned int *)addr));
41}
42
43static void write_bytes(char *addr)
44{
45 unsigned long i;
46
47 for (i = 0; i < LENGTH; i++)
48 *(addr + i) = (char)i;
49}
50
51static void read_bytes(char *addr)
52{
53 unsigned long i;
54
55 check_bytes(addr);
56 for (i = 0; i < LENGTH; i++)
57 if (*(addr + i) != (char)i) {
58 printf("Mismatch at %lu\n", i);
59 break;
60 }
61}
62
63int main(void)
64{
65 void *addr;
66 int fd;
67
68 fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
69 if (fd < 0) {
70 perror("Open failed");
71 exit(1);
72 }
73
74 addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
75 if (addr == MAP_FAILED) {
76 perror("mmap");
77 unlink(FILE_NAME);
78 exit(1);
79 }
80
81 printf("Returned address is %p\n", addr);
82 check_bytes(addr);
83 write_bytes(addr);
84 read_bytes(addr);
85
86 munmap(addr, LENGTH);
87 close(fd);
88 unlink(FILE_NAME);
89
90 return 0;
91}
diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c
new file mode 100644
index 00000000000..07956d8592c
--- /dev/null
+++ b/Documentation/vm/hugepage-shm.c
@@ -0,0 +1,98 @@
1/*
2 * hugepage-shm:
3 *
4 * Example of using huge page memory in a user application using Sys V shared
5 * memory system calls. In this example the app is requesting 256MB of
6 * memory that is backed by huge pages. The application uses the flag
7 * SHM_HUGETLB in the shmget system call to inform the kernel that it is
8 * requesting huge pages.
9 *
10 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
11 * huge pages. That means that if one requires a fixed address, a huge page
12 * aligned address starting with 0x800000... will be required. If a fixed
13 * address is not required, the kernel will select an address in the proper
14 * range.
15 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
16 *
17 * Note: The default shared memory limit is quite low on many kernels,
18 * you may need to increase it via:
19 *
20 * echo 268435456 > /proc/sys/kernel/shmmax
21 *
22 * This will increase the maximum size per shared memory segment to 256MB.
23 * The other limit that you will hit eventually is shmall which is the
24 * total amount of shared memory in pages. To set it to 16GB on a system
25 * with a 4kB pagesize do:
26 *
27 * echo 4194304 > /proc/sys/kernel/shmall
28 */
29
30#include <stdlib.h>
31#include <stdio.h>
32#include <sys/types.h>
33#include <sys/ipc.h>
34#include <sys/shm.h>
35#include <sys/mman.h>
36
37#ifndef SHM_HUGETLB
38#define SHM_HUGETLB 04000
39#endif
40
41#define LENGTH (256UL*1024*1024)
42
43#define dprintf(x) printf(x)
44
45/* Only ia64 requires this */
46#ifdef __ia64__
47#define ADDR (void *)(0x8000000000000000UL)
48#define SHMAT_FLAGS (SHM_RND)
49#else
50#define ADDR (void *)(0x0UL)
51#define SHMAT_FLAGS (0)
52#endif
53
54int main(void)
55{
56 int shmid;
57 unsigned long i;
58 char *shmaddr;
59
60 if ((shmid = shmget(2, LENGTH,
61 SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
62 perror("shmget");
63 exit(1);
64 }
65 printf("shmid: 0x%x\n", shmid);
66
67 shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
68 if (shmaddr == (char *)-1) {
69 perror("Shared memory attach failure");
70 shmctl(shmid, IPC_RMID, NULL);
71 exit(2);
72 }
73 printf("shmaddr: %p\n", shmaddr);
74
75 dprintf("Starting the writes:\n");
76 for (i = 0; i < LENGTH; i++) {
77 shmaddr[i] = (char)(i);
78 if (!(i % (1024 * 1024)))
79 dprintf(".");
80 }
81 dprintf("\n");
82
83 dprintf("Starting the Check...");
84 for (i = 0; i < LENGTH; i++)
85 if (shmaddr[i] != (char)i)
86 printf("\nIndex %lu mismatched\n", i);
87 dprintf("Done.\n");
88
89 if (shmdt((const void *)shmaddr) != 0) {
90 perror("Detach failure");
91 shmctl(shmid, IPC_RMID, NULL);
92 exit(3);
93 }
94
95 shmctl(shmid, IPC_RMID, NULL);
96
97 return 0;
98}
diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c
new file mode 100644
index 00000000000..eda1a6d3578
--- /dev/null
+++ b/Documentation/vm/map_hugetlb.c
@@ -0,0 +1,77 @@
1/*
2 * Example of using hugepage memory in a user application using the mmap
3 * system call with MAP_HUGETLB flag. Before running this program make
4 * sure the administrator has allocated enough default sized huge pages
5 * to cover the 256 MB allocation.
6 *
7 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
8 * hugepages. That means that addresses starting with 0x800000... will need
9 * to be specified. Specifying a fixed address is not required on ppc64, i386
10 * or x86_64.
11 */
12#include <stdlib.h>
13#include <stdio.h>
14#include <unistd.h>
15#include <sys/mman.h>
16#include <fcntl.h>
17
18#define LENGTH (256UL*1024*1024)
19#define PROTECTION (PROT_READ | PROT_WRITE)
20
21#ifndef MAP_HUGETLB
22#define MAP_HUGETLB 0x40000 /* arch specific */
23#endif
24
25/* Only ia64 requires this */
26#ifdef __ia64__
27#define ADDR (void *)(0x8000000000000000UL)
28#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
29#else
30#define ADDR (void *)(0x0UL)
31#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
32#endif
33
34static void check_bytes(char *addr)
35{
36 printf("First hex is %x\n", *((unsigned int *)addr));
37}
38
39static void write_bytes(char *addr)
40{
41 unsigned long i;
42
43 for (i = 0; i < LENGTH; i++)
44 *(addr + i) = (char)i;
45}
46
47static void read_bytes(char *addr)
48{
49 unsigned long i;
50
51 check_bytes(addr);
52 for (i = 0; i < LENGTH; i++)
53 if (*(addr + i) != (char)i) {
54 printf("Mismatch at %lu\n", i);
55 break;
56 }
57}
58
59int main(void)
60{
61 void *addr;
62
63 addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, -1, 0);
64 if (addr == MAP_FAILED) {
65 perror("mmap");
66 exit(1);
67 }
68
69 printf("Returned address is %p\n", addr);
70 check_bytes(addr);
71 write_bytes(addr);
72 read_bytes(addr);
73
74 munmap(addr, LENGTH);
75
76 return 0;
77}
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
new file mode 100644
index 00000000000..7445caa26d0
--- /dev/null
+++ b/Documentation/vm/page-types.c
@@ -0,0 +1,1100 @@
1/*
2 * page-types: Tool for querying page flags
3 *
4 * This program is free software; you can redistribute it and/or modify it
5 * under the terms of the GNU General Public License as published by the Free
6 * Software Foundation; version 2.
7 *
8 * This program is distributed in the hope that it will be useful, but WITHOUT
9 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
10 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
11 * more details.
12 *
13 * You should find a copy of v2 of the GNU General Public License somewhere on
14 * your Linux system; if not, write to the Free Software Foundation, Inc., 59
15 * Temple Place, Suite 330, Boston, MA 02111-1307 USA.
16 *
17 * Copyright (C) 2009 Intel corporation
18 *
19 * Authors: Wu Fengguang <fengguang.wu@intel.com>
20 */
21
22#define _LARGEFILE64_SOURCE
23#include <stdio.h>
24#include <stdlib.h>
25#include <unistd.h>
26#include <stdint.h>
27#include <stdarg.h>
28#include <string.h>
29#include <getopt.h>
30#include <limits.h>
31#include <assert.h>
32#include <sys/types.h>
33#include <sys/errno.h>
34#include <sys/fcntl.h>
35#include <sys/mount.h>
36#include <sys/statfs.h>
37#include "../../include/linux/magic.h"
38
39
40#ifndef MAX_PATH
41# define MAX_PATH 256
42#endif
43
44#ifndef STR
45# define _STR(x) #x
46# define STR(x) _STR(x)
47#endif
48
49/*
50 * pagemap kernel ABI bits
51 */
52
53#define PM_ENTRY_BYTES sizeof(uint64_t)
54#define PM_STATUS_BITS 3
55#define PM_STATUS_OFFSET (64 - PM_STATUS_BITS)
56#define PM_STATUS_MASK (((1LL << PM_STATUS_BITS) - 1) << PM_STATUS_OFFSET)
57#define PM_STATUS(nr) (((nr) << PM_STATUS_OFFSET) & PM_STATUS_MASK)
58#define PM_PSHIFT_BITS 6
59#define PM_PSHIFT_OFFSET (PM_STATUS_OFFSET - PM_PSHIFT_BITS)
60#define PM_PSHIFT_MASK (((1LL << PM_PSHIFT_BITS) - 1) << PM_PSHIFT_OFFSET)
61#define PM_PSHIFT(x) (((u64) (x) << PM_PSHIFT_OFFSET) & PM_PSHIFT_MASK)
62#define PM_PFRAME_MASK ((1LL << PM_PSHIFT_OFFSET) - 1)
63#define PM_PFRAME(x) ((x) & PM_PFRAME_MASK)
64
65#define PM_PRESENT PM_STATUS(4LL)
66#define PM_SWAP PM_STATUS(2LL)
67
68
69/*
70 * kernel page flags
71 */
72
73#define KPF_BYTES 8
74#define PROC_KPAGEFLAGS "/proc/kpageflags"
75
76/* copied from kpageflags_read() */
77#define KPF_LOCKED 0
78#define KPF_ERROR 1
79#define KPF_REFERENCED 2
80#define KPF_UPTODATE 3
81#define KPF_DIRTY 4
82#define KPF_LRU 5
83#define KPF_ACTIVE 6
84#define KPF_SLAB 7
85#define KPF_WRITEBACK 8
86#define KPF_RECLAIM 9
87#define KPF_BUDDY 10
88
89/* [11-20] new additions in 2.6.31 */
90#define KPF_MMAP 11
91#define KPF_ANON 12
92#define KPF_SWAPCACHE 13
93#define KPF_SWAPBACKED 14
94#define KPF_COMPOUND_HEAD 15
95#define KPF_COMPOUND_TAIL 16
96#define KPF_HUGE 17
97#define KPF_UNEVICTABLE 18
98#define KPF_HWPOISON 19
99#define KPF_NOPAGE 20
100#define KPF_KSM 21
101
102 /* [32-] kernel hacking aids */
103#define KPF_RESERVED 32
104#define KPF_MLOCKED 33
105#define KPF_MAPPEDTODISK 34
106#define KPF_PRIVATE 35
107#define KPF_PRIVATE_2 36
108#define KPF_OWNER_PRIVATE 37
109#define KPF_ARCH 38
110#define KPF_UNCACHED 39
111
112/* [48-] take some arbitrary free slots for expanding overloaded flags
113 * not part of kernel API
114 */
115#define KPF_READAHEAD 48
116#define KPF_SLOB_FREE 49
117#define KPF_SLUB_FROZEN 50
118#define KPF_SLUB_DEBUG 51
119
120#define KPF_ALL_BITS ((uint64_t)~0ULL)
121#define KPF_HACKERS_BITS (0xffffULL << 32)
122#define KPF_OVERLOADED_BITS (0xffffULL << 48)
123#define BIT(name) (1ULL << KPF_##name)
124#define BITS_COMPOUND (BIT(COMPOUND_HEAD) | BIT(COMPOUND_TAIL))
125
126static const char *page_flag_names[] = {
127 [KPF_LOCKED] = "L:locked",
128 [KPF_ERROR] = "E:error",
129 [KPF_REFERENCED] = "R:referenced",
130 [KPF_UPTODATE] = "U:uptodate",
131 [KPF_DIRTY] = "D:dirty",
132 [KPF_LRU] = "l:lru",
133 [KPF_ACTIVE] = "A:active",
134 [KPF_SLAB] = "S:slab",
135 [KPF_WRITEBACK] = "W:writeback",
136 [KPF_RECLAIM] = "I:reclaim",
137 [KPF_BUDDY] = "B:buddy",
138
139 [KPF_MMAP] = "M:mmap",
140 [KPF_ANON] = "a:anonymous",
141 [KPF_SWAPCACHE] = "s:swapcache",
142 [KPF_SWAPBACKED] = "b:swapbacked",
143 [KPF_COMPOUND_HEAD] = "H:compound_head",
144 [KPF_COMPOUND_TAIL] = "T:compound_tail",
145 [KPF_HUGE] = "G:huge",
146 [KPF_UNEVICTABLE] = "u:unevictable",
147 [KPF_HWPOISON] = "X:hwpoison",
148 [KPF_NOPAGE] = "n:nopage",
149 [KPF_KSM] = "x:ksm",
150
151 [KPF_RESERVED] = "r:reserved",
152 [KPF_MLOCKED] = "m:mlocked",
153 [KPF_MAPPEDTODISK] = "d:mappedtodisk",
154 [KPF_PRIVATE] = "P:private",
155 [KPF_PRIVATE_2] = "p:private_2",
156 [KPF_OWNER_PRIVATE] = "O:owner_private",
157 [KPF_ARCH] = "h:arch",
158 [KPF_UNCACHED] = "c:uncached",
159
160 [KPF_READAHEAD] = "I:readahead",
161 [KPF_SLOB_FREE] = "P:slob_free",
162 [KPF_SLUB_FROZEN] = "A:slub_frozen",
163 [KPF_SLUB_DEBUG] = "E:slub_debug",
164};
165
166
167static const char *debugfs_known_mountpoints[] = {
168 "/sys/kernel/debug",
169 "/debug",
170 0,
171};
172
173/*
174 * data structures
175 */
176
177static int opt_raw; /* for kernel developers */
178static int opt_list; /* list pages (in ranges) */
179static int opt_no_summary; /* don't show summary */
180static pid_t opt_pid; /* process to walk */
181
182#define MAX_ADDR_RANGES 1024
183static int nr_addr_ranges;
184static unsigned long opt_offset[MAX_ADDR_RANGES];
185static unsigned long opt_size[MAX_ADDR_RANGES];
186
187#define MAX_VMAS 10240
188static int nr_vmas;
189static unsigned long pg_start[MAX_VMAS];
190static unsigned long pg_end[MAX_VMAS];
191
192#define MAX_BIT_FILTERS 64
193static int nr_bit_filters;
194static uint64_t opt_mask[MAX_BIT_FILTERS];
195static uint64_t opt_bits[MAX_BIT_FILTERS];
196
197static int page_size;
198
199static int pagemap_fd;
200static int kpageflags_fd;
201
202static int opt_hwpoison;
203static int opt_unpoison;
204
205static char hwpoison_debug_fs[MAX_PATH+1];
206static int hwpoison_inject_fd;
207static int hwpoison_forget_fd;
208
209#define HASH_SHIFT 13
210#define HASH_SIZE (1 << HASH_SHIFT)
211#define HASH_MASK (HASH_SIZE - 1)
212#define HASH_KEY(flags) (flags & HASH_MASK)
213
214static unsigned long total_pages;
215static unsigned long nr_pages[HASH_SIZE];
216static uint64_t page_flags[HASH_SIZE];
217
218
219/*
220 * helper functions
221 */
222
223#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
224
225#define min_t(type, x, y) ({ \
226 type __min1 = (x); \
227 type __min2 = (y); \
228 __min1 < __min2 ? __min1 : __min2; })
229
230#define max_t(type, x, y) ({ \
231 type __max1 = (x); \
232 type __max2 = (y); \
233 __max1 > __max2 ? __max1 : __max2; })
234
235static unsigned long pages2mb(unsigned long pages)
236{
237 return (pages * page_size) >> 20;
238}
239
240static void fatal(const char *x, ...)
241{
242 va_list ap;
243
244 va_start(ap, x);
245 vfprintf(stderr, x, ap);
246 va_end(ap);
247 exit(EXIT_FAILURE);
248}
249
250static int checked_open(const char *pathname, int flags)
251{
252 int fd = open(pathname, flags);
253
254 if (fd < 0) {
255 perror(pathname);
256 exit(EXIT_FAILURE);
257 }
258
259 return fd;
260}
261
262/*
263 * pagemap/kpageflags routines
264 */
265
266static unsigned long do_u64_read(int fd, char *name,
267 uint64_t *buf,
268 unsigned long index,
269 unsigned long count)
270{
271 long bytes;
272
273 if (index > ULONG_MAX / 8)
274 fatal("index overflow: %lu\n", index);
275
276 if (lseek(fd, index * 8, SEEK_SET) < 0) {
277 perror(name);
278 exit(EXIT_FAILURE);
279 }
280
281 bytes = read(fd, buf, count * 8);
282 if (bytes < 0) {
283 perror(name);
284 exit(EXIT_FAILURE);
285 }
286 if (bytes % 8)
287 fatal("partial read: %ld bytes\n", bytes);
288
289 return bytes / 8;
290}
291
292static unsigned long kpageflags_read(uint64_t *buf,
293 unsigned long index,
294 unsigned long pages)
295{
296 return do_u64_read(kpageflags_fd, PROC_KPAGEFLAGS, buf, index, pages);
297}
298
299static unsigned long pagemap_read(uint64_t *buf,
300 unsigned long index,
301 unsigned long pages)
302{
303 return do_u64_read(pagemap_fd, "/proc/pid/pagemap", buf, index, pages);
304}
305
306static unsigned long pagemap_pfn(uint64_t val)
307{
308 unsigned long pfn;
309
310 if (val & PM_PRESENT)
311 pfn = PM_PFRAME(val);
312 else
313 pfn = 0;
314
315 return pfn;
316}
317
318
319/*
320 * page flag names
321 */
322
323static char *page_flag_name(uint64_t flags)
324{
325 static char buf[65];
326 int present;
327 int i, j;
328
329 for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) {
330 present = (flags >> i) & 1;
331 if (!page_flag_names[i]) {
332 if (present)
333 fatal("unknown flag bit %d\n", i);
334 continue;
335 }
336 buf[j++] = present ? page_flag_names[i][0] : '_';
337 }
338
339 return buf;
340}
341
342static char *page_flag_longname(uint64_t flags)
343{
344 static char buf[1024];
345 int i, n;
346
347 for (i = 0, n = 0; i < ARRAY_SIZE(page_flag_names); i++) {
348 if (!page_flag_names[i])
349 continue;
350 if ((flags >> i) & 1)
351 n += snprintf(buf + n, sizeof(buf) - n, "%s,",
352 page_flag_names[i] + 2);
353 }
354 if (n)
355 n--;
356 buf[n] = '\0';
357
358 return buf;
359}
360
361
362/*
363 * page list and summary
364 */
365
366static void show_page_range(unsigned long voffset,
367 unsigned long offset, uint64_t flags)
368{
369 static uint64_t flags0;
370 static unsigned long voff;
371 static unsigned long index;
372 static unsigned long count;
373
374 if (flags == flags0 && offset == index + count &&
375 (!opt_pid || voffset == voff + count)) {
376 count++;
377 return;
378 }
379
380 if (count) {
381 if (opt_pid)
382 printf("%lx\t", voff);
383 printf("%lx\t%lx\t%s\n",
384 index, count, page_flag_name(flags0));
385 }
386
387 flags0 = flags;
388 index = offset;
389 voff = voffset;
390 count = 1;
391}
392
393static void show_page(unsigned long voffset,
394 unsigned long offset, uint64_t flags)
395{
396 if (opt_pid)
397 printf("%lx\t", voffset);
398 printf("%lx\t%s\n", offset, page_flag_name(flags));
399}
400
401static void show_summary(void)
402{
403 int i;
404
405 printf(" flags\tpage-count MB"
406 " symbolic-flags\t\t\tlong-symbolic-flags\n");
407
408 for (i = 0; i < ARRAY_SIZE(nr_pages); i++) {
409 if (nr_pages[i])
410 printf("0x%016llx\t%10lu %8lu %s\t%s\n",
411 (unsigned long long)page_flags[i],
412 nr_pages[i],
413 pages2mb(nr_pages[i]),
414 page_flag_name(page_flags[i]),
415 page_flag_longname(page_flags[i]));
416 }
417
418 printf(" total\t%10lu %8lu\n",
419 total_pages, pages2mb(total_pages));
420}
421
422
423/*
424 * page flag filters
425 */
426
427static int bit_mask_ok(uint64_t flags)
428{
429 int i;
430
431 for (i = 0; i < nr_bit_filters; i++) {
432 if (opt_bits[i] == KPF_ALL_BITS) {
433 if ((flags & opt_mask[i]) == 0)
434 return 0;
435 } else {
436 if ((flags & opt_mask[i]) != opt_bits[i])
437 return 0;
438 }
439 }
440
441 return 1;
442}
443
444static uint64_t expand_overloaded_flags(uint64_t flags)
445{
446 /* SLOB/SLUB overload several page flags */
447 if (flags & BIT(SLAB)) {
448 if (flags & BIT(PRIVATE))
449 flags ^= BIT(PRIVATE) | BIT(SLOB_FREE);
450 if (flags & BIT(ACTIVE))
451 flags ^= BIT(ACTIVE) | BIT(SLUB_FROZEN);
452 if (flags & BIT(ERROR))
453 flags ^= BIT(ERROR) | BIT(SLUB_DEBUG);
454 }
455
456 /* PG_reclaim is overloaded as PG_readahead in the read path */
457 if ((flags & (BIT(RECLAIM) | BIT(WRITEBACK))) == BIT(RECLAIM))
458 flags ^= BIT(RECLAIM) | BIT(READAHEAD);
459
460 return flags;
461}
462
463static uint64_t well_known_flags(uint64_t flags)
464{
465 /* hide flags intended only for kernel hackers */
466 flags &= ~KPF_HACKERS_BITS;
467
468 /* hide non-hugeTLB compound pages */
469 if ((flags & BITS_COMPOUND) && !(flags & BIT(HUGE)))
470 flags &= ~BITS_COMPOUND;
471
472 return flags;
473}
474
475static uint64_t kpageflags_flags(uint64_t flags)
476{
477 flags = expand_overloaded_flags(flags);
478
479 if (!opt_raw)
480 flags = well_known_flags(flags);
481
482 return flags;
483}
484
485/* verify that a mountpoint is actually a debugfs instance */
486static int debugfs_valid_mountpoint(const char *debugfs)
487{
488 struct statfs st_fs;
489
490 if (statfs(debugfs, &st_fs) < 0)
491 return -ENOENT;
492 else if (st_fs.f_type != (long) DEBUGFS_MAGIC)
493 return -ENOENT;
494
495 return 0;
496}
497
498/* find the path to the mounted debugfs */
499static const char *debugfs_find_mountpoint(void)
500{
501 const char **ptr;
502 char type[100] = "";
503 FILE *fp;
504
505 ptr = debugfs_known_mountpoints;
506 while (*ptr) {
507 if (debugfs_valid_mountpoint(*ptr) == 0) {
508 strcpy(hwpoison_debug_fs, *ptr);
509 return hwpoison_debug_fs;
510 }
511 ptr++;
512 }
513
514 /* give up and parse /proc/mounts */
515 fp = fopen("/proc/mounts", "r");
516 if (fp == NULL) {
517 perror("Can't open /proc/mounts for read");
 /* fp is unusable: report "not found" so the caller can try mounting */
 return NULL;
 }
518
519 while (fscanf(fp, "%*s %"
520 STR(MAX_PATH)
521 "s %99s %*s %*d %*d\n",
522 hwpoison_debug_fs, type) == 2) {
523 if (strcmp(type, "debugfs") == 0)
524 break;
525 }
526 fclose(fp);
527
528 if (strcmp(type, "debugfs") != 0)
529 return NULL;
530
531 return hwpoison_debug_fs;
532}
533
534/* mount the debugfs somewhere if it's not mounted */
535
536static void debugfs_mount(void)
537{
538 const char **ptr;
539
540 /* see if it's already mounted */
541 if (debugfs_find_mountpoint())
542 return;
543
544 ptr = debugfs_known_mountpoints;
545 while (*ptr) {
546 if (mount(NULL, *ptr, "debugfs", 0, NULL) == 0) {
547 /* save the mountpoint */
548 strcpy(hwpoison_debug_fs, *ptr);
549 break;
550 }
551 ptr++;
552 }
553
554 if (*ptr == NULL) {
555 perror("mount debugfs");
556 exit(EXIT_FAILURE);
557 }
558}
559
560/*
561 * page actions
562 */
563
564static void prepare_hwpoison_fd(void)
565{
566 char buf[MAX_PATH + 1];
567
568 debugfs_mount();
569
570 if (opt_hwpoison && !hwpoison_inject_fd) {
571 snprintf(buf, MAX_PATH, "%s/hwpoison/corrupt-pfn",
572 hwpoison_debug_fs);
573 hwpoison_inject_fd = checked_open(buf, O_WRONLY);
574 }
575
576 if (opt_unpoison && !hwpoison_forget_fd) {
577 snprintf(buf, MAX_PATH, "%s/hwpoison/unpoison-pfn",
578 hwpoison_debug_fs);
579 hwpoison_forget_fd = checked_open(buf, O_WRONLY);
580 }
581}
582
583static int hwpoison_page(unsigned long offset)
584{
585 char buf[100];
586 int len;
587
588 len = sprintf(buf, "0x%lx\n", offset);
589 len = write(hwpoison_inject_fd, buf, len);
590 if (len < 0) {
591 perror("hwpoison inject");
592 return len;
593 }
594 return 0;
595}
596
597static int unpoison_page(unsigned long offset)
598{
599 char buf[100];
600 int len;
601
602 len = sprintf(buf, "0x%lx\n", offset);
603 len = write(hwpoison_forget_fd, buf, len);
604 if (len < 0) {
605 perror("hwpoison forget");
606 return len;
607 }
608 return 0;
609}
610
611/*
612 * page frame walker
613 */
614
615static int hash_slot(uint64_t flags)
616{
617 int k = HASH_KEY(flags);
618 int i;
619
620 /* Explicitly reserve slot 0 for flags 0: the following logic
621 * cannot distinguish an unoccupied slot from slot (flags==0).
622 */
623 if (flags == 0)
624 return 0;
625
626 /* search through the remaining (HASH_SIZE-1) slots */
627 for (i = 1; i < ARRAY_SIZE(page_flags); i++, k++) {
628 if (!k || k >= ARRAY_SIZE(page_flags))
629 k = 1;
630 if (page_flags[k] == 0) {
631 page_flags[k] = flags;
632 return k;
633 }
634 if (page_flags[k] == flags)
635 return k;
636 }
637
638 fatal("hash table full: bump up HASH_SHIFT?\n");
639 exit(EXIT_FAILURE);
640}
641
642static void add_page(unsigned long voffset,
643 unsigned long offset, uint64_t flags)
644{
645 flags = kpageflags_flags(flags);
646
647 if (!bit_mask_ok(flags))
648 return;
649
650 if (opt_hwpoison)
651 hwpoison_page(offset);
652 if (opt_unpoison)
653 unpoison_page(offset);
654
655 if (opt_list == 1)
656 show_page_range(voffset, offset, flags);
657 else if (opt_list == 2)
658 show_page(voffset, offset, flags);
659
660 nr_pages[hash_slot(flags)]++;
661 total_pages++;
662}
663
664#define KPAGEFLAGS_BATCH (64 << 10) /* 64k pages */
665static void walk_pfn(unsigned long voffset,
666 unsigned long index,
667 unsigned long count)
668{
669 uint64_t buf[KPAGEFLAGS_BATCH];
670 unsigned long batch;
671 long pages;
672 unsigned long i;
673
674 while (count) {
675 batch = min_t(unsigned long, count, KPAGEFLAGS_BATCH);
676 pages = kpageflags_read(buf, index, batch);
677 if (pages == 0)
678 break;
679
680 for (i = 0; i < pages; i++)
681 add_page(voffset + i, index + i, buf[i]);
682
683 index += pages;
684 count -= pages;
685 }
686}
687
688#define PAGEMAP_BATCH (64 << 10)
689static void walk_vma(unsigned long index, unsigned long count)
690{
691 uint64_t buf[PAGEMAP_BATCH];
692 unsigned long batch;
693 unsigned long pages;
694 unsigned long pfn;
695 unsigned long i;
696
697 while (count) {
698 batch = min_t(unsigned long, count, PAGEMAP_BATCH);
699 pages = pagemap_read(buf, index, batch);
700 if (pages == 0)
701 break;
702
703 for (i = 0; i < pages; i++) {
704 pfn = pagemap_pfn(buf[i]);
705 if (pfn)
706 walk_pfn(index + i, pfn, 1);
707 }
708
709 index += pages;
710 count -= pages;
711 }
712}
713
static void walk_task(unsigned long index, unsigned long count)
{
	const unsigned long end = index + count;
	unsigned long start;
	int i = 0;

	while (index < end) {

		while (pg_end[i] <= index)
			if (++i >= nr_vmas)
				return;
		if (pg_start[i] >= end)
			return;

		start = max_t(unsigned long, pg_start[i], index);
		index = min_t(unsigned long, pg_end[i], end);

		assert(start < index);
		walk_vma(start, index - start);
	}
}

static void add_addr_range(unsigned long offset, unsigned long size)
{
	if (nr_addr_ranges >= MAX_ADDR_RANGES)
		fatal("too many addr ranges\n");

	opt_offset[nr_addr_ranges] = offset;
	opt_size[nr_addr_ranges] = min_t(unsigned long, size, ULONG_MAX-offset);
	nr_addr_ranges++;
}

static void walk_addr_ranges(void)
{
	int i;

	kpageflags_fd = checked_open(PROC_KPAGEFLAGS, O_RDONLY);

	if (!nr_addr_ranges)
		add_addr_range(0, ULONG_MAX);

	for (i = 0; i < nr_addr_ranges; i++)
		if (!opt_pid)
			walk_pfn(0, opt_offset[i], opt_size[i]);
		else
			walk_task(opt_offset[i], opt_size[i]);

	close(kpageflags_fd);
}


/*
 * user interface
 */

static const char *page_flag_type(uint64_t flag)
{
	if (flag & KPF_HACKERS_BITS)
		return "(r)";
	if (flag & KPF_OVERLOADED_BITS)
		return "(o)";
	return "   ";
}

static void usage(void)
{
	int i, j;

	printf(
"page-types [options]\n"
"            -r|--raw                   Raw mode, for kernel developers\n"
"            -d|--describe flags        Describe flags\n"
"            -a|--addr    addr-spec     Walk a range of pages\n"
"            -b|--bits    bits-spec     Walk pages with specified bits\n"
"            -p|--pid     pid           Walk process address space\n"
#if 0 /* planned features */
"            -f|--file    filename      Walk file address space\n"
#endif
"            -l|--list                  Show page details in ranges\n"
"            -L|--list-each             Show page details one by one\n"
"            -N|--no-summary            Don't show summary info\n"
"            -X|--hwpoison              hwpoison pages\n"
"            -x|--unpoison              unpoison pages\n"
"            -h|--help                  Show this usage message\n"
"flags:\n"
"            0x10                       bitfield format, e.g.\n"
"            anon                       bit-name, e.g.\n"
"            0x10,anon                  comma-separated list, e.g.\n"
"addr-spec:\n"
"            N                          one page at offset N (unit: pages)\n"
"            N+M                        pages range from N to N+M-1\n"
"            N,M                        pages range from N to M-1\n"
"            N,                         pages range from N to end\n"
"            ,M                         pages range from 0 to M-1\n"
"bits-spec:\n"
"            bit1,bit2                  (flags & (bit1|bit2)) != 0\n"
"            bit1,bit2=bit1             (flags & (bit1|bit2)) == bit1\n"
"            bit1,~bit2                 (flags & (bit1|bit2)) == bit1\n"
"            =bit1,bit2                 flags == (bit1|bit2)\n"
"bit-names:\n"
	);

	for (i = 0, j = 0; i < ARRAY_SIZE(page_flag_names); i++) {
		if (!page_flag_names[i])
			continue;
		printf("%16s%s", page_flag_names[i] + 2,
				 page_flag_type(1ULL << i));
		if (++j > 3) {
			j = 0;
			putchar('\n');
		}
	}
	printf("\n                                   "
		"(r) raw mode bits  (o) overloaded bits\n");
}

static unsigned long long parse_number(const char *str)
{
	unsigned long long n;

	/* base 0: accept decimal, octal (0...) and hex (0x...) input */
	n = strtoull(str, NULL, 0);

	/* a zero result is only valid if the string really starts with '0' */
	if (n == 0 && str[0] != '0')
		fatal("invalid name or number: %s\n", str);

	return n;
}

static void parse_pid(const char *str)
{
	FILE *file;
	char buf[5000];

	opt_pid = parse_number(str);

	sprintf(buf, "/proc/%d/pagemap", opt_pid);
	pagemap_fd = checked_open(buf, O_RDONLY);

	sprintf(buf, "/proc/%d/maps", opt_pid);
	file = fopen(buf, "r");
	if (!file) {
		perror(buf);
		exit(EXIT_FAILURE);
	}

	while (fgets(buf, sizeof(buf), file) != NULL) {
		unsigned long vm_start;
		unsigned long vm_end;
		unsigned long long pgoff;
		int major, minor;
		char r, w, x, s;
		unsigned long ino;
		int n;

		n = sscanf(buf, "%lx-%lx %c%c%c%c %llx %x:%x %lu",
			   &vm_start,
			   &vm_end,
			   &r, &w, &x, &s,
			   &pgoff,
			   &major, &minor,
			   &ino);
		if (n < 10) {
			fprintf(stderr, "unexpected line: %s\n", buf);
			continue;
		}
		pg_start[nr_vmas] = vm_start / page_size;
		pg_end[nr_vmas] = vm_end / page_size;
		if (++nr_vmas >= MAX_VMAS) {
			fprintf(stderr, "too many VMAs\n");
			break;
		}
	}
	fclose(file);
}

static void parse_file(const char *name)
{
}

static void parse_addr_range(const char *optarg)
{
	unsigned long offset;
	unsigned long size;
	char *p;

	p = strchr(optarg, ',');
	if (!p)
		p = strchr(optarg, '+');

	if (p == optarg) {
		offset = 0;
		size   = parse_number(p + 1);
	} else if (p) {
		offset = parse_number(optarg);
		if (p[1] == '\0')
			size = ULONG_MAX;
		else {
			size = parse_number(p + 1);
			if (*p == ',') {
				if (size < offset)
					fatal("invalid range: %lu,%lu\n",
							offset, size);
				size -= offset;
			}
		}
	} else {
		offset = parse_number(optarg);
		size   = 1;
	}

	add_addr_range(offset, size);
}

static void add_bits_filter(uint64_t mask, uint64_t bits)
{
	if (nr_bit_filters >= MAX_BIT_FILTERS)
		fatal("too many bit filters\n");

	opt_mask[nr_bit_filters] = mask;
	opt_bits[nr_bit_filters] = bits;
	nr_bit_filters++;
}

static uint64_t parse_flag_name(const char *str, int len)
{
	int i;

	if (!*str || !len)
		return 0;

	if (len <= 8 && !strncmp(str, "compound", len))
		return BITS_COMPOUND;

	for (i = 0; i < ARRAY_SIZE(page_flag_names); i++) {
		if (!page_flag_names[i])
			continue;
		if (!strncmp(str, page_flag_names[i] + 2, len))
			return 1ULL << i;
	}

	return parse_number(str);
}

static uint64_t parse_flag_names(const char *str, int all)
{
	const char *p    = str;
	uint64_t   flags = 0;

	while (1) {
		if (*p == ',' || *p == '=' || *p == '\0') {
			if ((*str != '~') || (*str == '~' && all && *++str))
				flags |= parse_flag_name(str, p - str);
			if (*p != ',')
				break;
			str = p + 1;
		}
		p++;
	}

	return flags;
}

static void parse_bits_mask(const char *optarg)
{
	uint64_t mask;
	uint64_t bits;
	const char *p;

	p = strchr(optarg, '=');
	if (p == optarg) {
		mask = KPF_ALL_BITS;
		bits = parse_flag_names(p + 1, 0);
	} else if (p) {
		mask = parse_flag_names(optarg, 0);
		bits = parse_flag_names(p + 1, 0);
	} else if (strchr(optarg, '~')) {
		mask = parse_flag_names(optarg, 1);
		bits = parse_flag_names(optarg, 0);
	} else {
		mask = parse_flag_names(optarg, 0);
		bits = KPF_ALL_BITS;
	}

	add_bits_filter(mask, bits);
}

static void describe_flags(const char *optarg)
{
	uint64_t flags = parse_flag_names(optarg, 0);

	printf("0x%016llx\t%s\t%s\n",
		(unsigned long long)flags,
		page_flag_name(flags),
		page_flag_longname(flags));
}

static const struct option opts[] = {
	{ "raw"       , 0, NULL, 'r' },
	{ "pid"       , 1, NULL, 'p' },
	{ "file"      , 1, NULL, 'f' },
	{ "addr"      , 1, NULL, 'a' },
	{ "bits"      , 1, NULL, 'b' },
	{ "describe"  , 1, NULL, 'd' },
	{ "list"      , 0, NULL, 'l' },
	{ "list-each" , 0, NULL, 'L' },
	{ "no-summary", 0, NULL, 'N' },
	{ "hwpoison"  , 0, NULL, 'X' },
	{ "unpoison"  , 0, NULL, 'x' },
	{ "help"      , 0, NULL, 'h' },
	{ NULL        , 0, NULL, 0 }
};

int main(int argc, char *argv[])
{
	int c;

	page_size = getpagesize();

	while ((c = getopt_long(argc, argv,
				"rp:f:a:b:d:lLNXxh", opts, NULL)) != -1) {
		switch (c) {
		case 'r':
			opt_raw = 1;
			break;
		case 'p':
			parse_pid(optarg);
			break;
		case 'f':
			parse_file(optarg);
			break;
		case 'a':
			parse_addr_range(optarg);
			break;
		case 'b':
			parse_bits_mask(optarg);
			break;
		case 'd':
			describe_flags(optarg);
			exit(0);
		case 'l':
			opt_list = 1;
			break;
		case 'L':
			opt_list = 2;
			break;
		case 'N':
			opt_no_summary = 1;
			break;
		case 'X':
			opt_hwpoison = 1;
			prepare_hwpoison_fd();
			break;
		case 'x':
			opt_unpoison = 1;
			prepare_hwpoison_fd();
			break;
		case 'h':
			usage();
			exit(0);
		default:
			usage();
			exit(1);
		}
	}

	if (opt_list && opt_pid)
		printf("voffset\t");
	if (opt_list == 1)
		printf("offset\tlen\tflags\n");
	if (opt_list == 2)
		printf("offset\tflags\n");

	walk_addr_ranges();

	if (opt_list == 1)
		show_page_range(0, 0, 0);  /* drain the buffer */

	if (opt_no_summary)
		return 0;

	if (opt_list)
		printf("\n\n");

	show_summary();

	return 0;
}
diff --git a/Documentation/watchdog/00-INDEX b/Documentation/watchdog/00-INDEX
new file mode 100644
index 00000000000..fc51128071c
--- /dev/null
+++ b/Documentation/watchdog/00-INDEX
@@ -0,0 +1,17 @@
00-INDEX
	- this file.
hpwdt.txt
	- information on the HP iLO2 NMI watchdog
pcwd-watchdog.txt
	- documentation for Berkshire Products PC Watchdog ISA cards.
src/
	- directory holding watchdog related example programs.
watchdog-api.txt
	- description of the Linux Watchdog driver API.
watchdog-kernel-api.txt
	- description of the Linux WatchDog Timer Driver Core kernel API.
watchdog-parameters.txt
	- information on driver parameters (for drivers other than
	  the ones that have driver-specific files here)
wdt.txt
	- description of the Watchdog Timer Interfaces for Linux.
diff --git a/Documentation/zh_CN/SubmitChecklist b/Documentation/zh_CN/SubmitChecklist
new file mode 100644
index 00000000000..4c741d6bc04
--- /dev/null
+++ b/Documentation/zh_CN/SubmitChecklist
@@ -0,0 +1,109 @@
Chinese translated version of Documentation/SubmitChecklist

If you have any comment or update to the content, please contact the
original document maintainer directly.  However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help.  Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.

Chinese maintainer: Harry Wei <harryxiyou@gmail.com>
---------------------------------------------------------------------
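Chinese translation of Documentation/SubmitChecklist

If you want to comment on or update the content of this document, please
contact the maintainer of the original document directly.  If you have
difficulty communicating in English, you can also ask the Chinese
maintainer for help.  If this translation is out of date or contains
errors, please contact the Chinese maintainer.

Chinese maintainer: 贾威威 Harry Wei <harryxiyou@gmail.com>
Chinese translator: 贾威威 Harry Wei <harryxiyou@gmail.com>
Chinese proofreader: 贾威威 Harry Wei <harryxiyou@gmail.com>


The main text follows
---------------------------------------------------------------------
Linux kernel submission checklist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
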
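Here are some basic things that kernel developers should do if they want
to see their kernel patch submissions accepted more quickly.

These all go above and beyond what is provided in
Documentation/SubmittingPatches and in other instructions on submitting
Linux kernel patches.
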
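1: If you use a facility, #include the file that defines/declares that
   facility.  Don't rely on other header files pulling it in indirectly.

2: Builds cleanly with the applicable or modified CONFIG options =y, =m,
   and =n.  No compiler warnings/errors, no linker warnings/errors.

2b: Passes allnoconfig, allmodconfig.

2c: Builds successfully when using O=builddir.

3: Builds on multiple CPU architectures by using local cross-compile tools
   or some other build farm.

4: ppc64 is a good architecture for cross-compilation checking because it
   tends to use `unsigned long' for 64-bit quantities.

5: Check your patch for general style as detailed in
   Documentation/CodingStyle.  Check for trivial violations with the patch
   style checker (scripts/checkpatch.pl) prior to submission.  You should
   be able to justify all violations that remain in your patch.

6: Any new or modified CONFIG options don't muck up the config menu.

7: All new Kconfig options have help text.

8: Has been carefully reviewed with respect to relevant Kconfig
   combinations.  This is very hard to get right with testing; brainpower
   pays off here.

9: Check cleanly with sparse.
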
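10: Use 'make checkstack' and 'make namespacecheck' and fix any problems
    that they find.  Note: checkstack does not point out problems
    explicitly, but any one function that uses more than 512 bytes on the
    stack is a candidate for change.

11: Include kernel-doc to document global kernel APIs.  (Not required for
    static functions, but OK there also.)  Use 'make htmldocs' or
    'make mandocs' to check the kernel-doc and fix any issues found.

12: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
    CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
    CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_ATOMIC_SLEEP all simultaneously
    enabled.

13: Has been build- and runtime tested with and without CONFIG_SMP and
    CONFIG_PREEMPT.

14: If the patch affects IO/Disk, etc.: has been tested with and without
    CONFIG_LBDAF.

15: All codepaths have been exercised with all lockdep features enabled.

16: All new /proc entries are documented under Documentation/.

17: All new kernel boot parameters are documented in
    Documentation/kernel-parameters.txt.
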
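18: All new module parameters are documented with MODULE_PARM_DESC().

19: All new userspace interfaces are documented in Documentation/ABI/.
    See Documentation/ABI/README for more information.  Patches that
    change userspace interfaces should be CCed to
    linux-api@vger.kernel.org.

20: Check that it all passes `make headers_check'.

21: Has been checked with injection of at least slab and page-allocation
    failures.  See Documentation/fault-injection/.

22: Newly-added code has been compiled with `gcc -W' (use
    "make EXTRA_CFLAGS=-W").  This will generate lots of noise, but is
    good for finding bugs like "warning: comparison between signed and
    unsigned".

23: Tested after it has been merged into the -mm patchset to make sure
    that it still works with all of the other queued patches and various
    changes in the VM, VFS, and other subsystems.

24: All memory barriers {e.g., barrier(), rmb(), wmb()} need a comment in
    the source code that explains what they are doing and why.

25: If any ioctl's are added by the patch, then also update
    Documentation/ioctl/ioctl-number.txt.

26: If your modified source code depends on or uses any of the kernel
    APIs or features that are related to the following kconfig symbols,
    then test multiple builds with the related kconfig symbols disabled
    and/or =m (if that option is available) [not all of these at the
    same time, just various/random combinations of them]:

    CONFIG_SMP, CONFIG_SYSFS, CONFIG_PROC_FS, CONFIG_INPUT, CONFIG_PCI,
    CONFIG_BLOCK, CONFIG_PM, CONFIG_HOTPLUG, CONFIG_MAGIC_SYSRQ,
    CONFIG_NET, CONFIG_INET=n (but the latter with CONFIG_NET=y)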
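
Item 22 quotes the warning "comparison between signed and unsigned". As an illustration of the kind of bug that warning tends to point at, here is a small sketch in plain userspace C. The function names fits_buggy and fits_fixed are hypothetical and do not come from the kernel tree, and the exact warning text depends on the compiler version.

```c
#include <assert.h>

/*
 * Buggy (hypothetical example): limit is converted to unsigned int for
 * the comparison, so a negative limit turns into a very large value and
 * the check passes when it should fail.  Compilers that enable
 * sign-compare checks are expected to warn on the comparison below.
 */
static int fits_buggy(unsigned int len, int limit)
{
	return len <= limit;
}

/* Fixed: reject negative limits before comparing in unsigned arithmetic. */
static int fits_fixed(unsigned int len, int limit)
{
	if (limit < 0)
		return 0;
	return len <= (unsigned int)limit;
}
```

With a negative limit such as -1, fits_buggy() reports that any length fits, while fits_fixed() rejects it. Handling the negative case explicitly and then casting makes the intent visible to reviewers and should also silence the warning.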