Diffstat (limited to 'Documentation/s390')
-rw-r--r-- Documentation/s390/3270.ChangeLog          44
-rw-r--r-- Documentation/s390/3270.txt               274
-rw-r--r-- Documentation/s390/CommonIO               109
-rw-r--r-- Documentation/s390/DASD                    73
-rw-r--r-- Documentation/s390/Debugging390.txt      2536
-rw-r--r-- Documentation/s390/TAPE                   122
-rw-r--r-- Documentation/s390/cds.txt                513
-rw-r--r-- Documentation/s390/config3270.sh           76
-rw-r--r-- Documentation/s390/crypto/crypto-API.txt   83
-rw-r--r-- Documentation/s390/driver-model.txt       265
-rw-r--r-- Documentation/s390/monreader.txt          197
-rw-r--r-- Documentation/s390/s390dbf.txt            615
12 files changed, 4907 insertions, 0 deletions
diff --git a/Documentation/s390/3270.ChangeLog b/Documentation/s390/3270.ChangeLog
new file mode 100644
index 000000000000..031c36081946
--- /dev/null
+++ b/Documentation/s390/3270.ChangeLog
@@ -0,0 +1,44 @@
ChangeLog for the UTS Global 3270-support patch

Sep 2002: Get bootup colors right on 3270 console
    * In tubttybld.c, substantially revise ESC processing so that
      ESC sequences (especially coloring ones) and the strings
      they affect render as correctly as a 3270 allows. Also, set
      the screen height to omit the two rows used for the input area,
      in tty3270_open() in tubtty.c.

Sep 2002: Dynamically get 3270 input buffer
    * Oversize 3270 screen widths may exceed GEOM_MAXINPLEN columns,
      so get the input-area buffer dynamically when sizing the device,
      in tubmakemin() in tuball.c (if it's the console) or in
      tty3270_open() in tubtty.c (if needed). Change tubp->tty_input
      to be a pointer rather than an array, in tubio.h.

Sep 2002: Fix tubfs kmalloc()s
    * Compute read and write lengths correctly in fs3270_read()
      and fs3270_write(), while never asking kmalloc()
      for more than 0x800 bytes. Affects tubfs.c and tubio.h.

Sep 2002: Recognize 3270 control unit type 3174
    * Recognize control-unit type 0x3174 as well as 0x327?.
      The IBM 2047 device emulates a 3174 control unit.
      Modularize control-unit recognition in tuball.c by
      adding and invoking the new function tub3270_is_ours().

Apr 2002: Fix 3270 console reboot loop
    * (Belated log entry) Fixed a reboot loop with a 3270 console,
      in tubtty.c:ttu3270_bh().

Feb 6, 2001:
    * This changelog is new.
    * tub3270 now supports a 3270 console:
      Specify y for CONFIG_3270 and y for CONFIG_3270_CONSOLE.
      Support for 3215 will not appear if 3270 console support
      is chosen.
      NOTE: The default is 3270 console support, NOT 3215.
    * The components are remodularized: the added source modules are
      tubttybld.c and tubttyscl.c, for screen-building code and
      scroll-timeout code.
    * The tub3270 source for this (2.4.0) version is #ifdef'd to
      build with both 2.4.0 and 2.2.16.2.
    * Color support and minimal other ESC-sequence support are added.
diff --git a/Documentation/s390/3270.txt b/Documentation/s390/3270.txt
new file mode 100644
index 000000000000..0a044e647d2d
--- /dev/null
+++ b/Documentation/s390/3270.txt
@@ -0,0 +1,274 @@
IBM 3270 Display System support

This file describes the driver that supports local channel attachment
of IBM 3270 devices. It consists of three sections:
    * Introduction
    * Installation
    * Operation


INTRODUCTION.

This paper describes installing and operating 3270 devices under
Linux/390. A 3270 device is a block-mode rows-and-columns terminal, of
which I'm sure hundreds of millions were sold by IBM and clonemakers
twenty and thirty years ago.

You may have 3270s in-house and not know it. If you're using the
VM-ESA operating system, define a 3270 to your virtual machine by using
the command "DEF GRAF <hex-address>". This paper presumes you will be
defining four 3270s with the CP/CMS commands

    DEF GRAF 620
    DEF GRAF 621
    DEF GRAF 622
    DEF GRAF 623

Your network connection from VM-ESA allows you to use x3270, tn3270, or
another 3270 emulator, started from an xterm window on your PC or
workstation. With the DEF GRAF command, an emulator such as x3270,
and this Linux/390 3270 driver, you have another way of talking to your
Linux box.

This paper covers installation of the driver and operation of a
dialed-in x3270.


INSTALLATION.

You install the driver by installing a patch, doing a kernel build, and
running the configuration script (config3270.sh, in this directory).

WARNING: If you are using 3270 console support, you must rerun the
configuration script every time you change the console's address (perhaps
by using the condev= parameter in silo's /boot/parmfile). More precisely,
you should rerun the configuration script every time your set of 3270s,
including the console 3270, changes subchannel identifiers relative to
one another. Re-IPL as soon as possible after running the configuration
script and the resulting /tmp/mkdev3270.

If you have chosen to make tub3270 a module, you add a line to
/etc/modprobe.conf. If you are working on a VM virtual machine, you
can use DEF GRAF to define virtual 3270 devices.

You may generate both 3270 and 3215 console support, or one or the
other, or neither. If you generate both, the console type under VM is
not changed. Use #CP Q TERM to see what the current console type is.
Use #CP TERM CONMODE 3270 to change it to 3270. If you generate only
3270 console support, then the driver automatically converts your console
at boot time to a 3270 if it is a 3215.

In brief, these are the steps:
    1. Install the tub3270 patch
    2. (If a module) add a line to /etc/modprobe.conf
    3. (If VM) define devices with DEF GRAF
    4. Reboot
    5. Configure

To test that everything works, assuming VM and x3270:
    1. Bring up an x3270 window.
    2. Use the DIAL command in that window.
    3. You should immediately see a Linux login screen.

Here are the installation steps in detail:

    1. The 3270 driver is a part of the official Linux kernel
       source. Build a tree with the kernel source and any necessary
       patches. Then do
            make oldconfig
            (If you wish to disable 3215 console support, edit
            .config; change CONFIG_TN3215's value to "n";
            and rerun "make oldconfig".)
            make image
            make modules
            make modules_install

    2. (Perform this step only if you have configured tub3270 as a
       module.) Add a line to /etc/modprobe.conf to automatically
       load the driver when it's needed. With this line added,
       you will see login prompts appear on your 3270s as soon as
       boot is complete (or, with emulated 3270s, as soon as you dial
       into your VM guest using the command "DIAL <vmguestname>").
       Since the line-mode major number is 227, the line to add to
       /etc/modprobe.conf should be:
            alias char-major-227 tub3270

    3. Define graphic devices to your VM guest machine, if you
       haven't already. Define them before you reboot (re-IPL):
            DEFINE GRAF 620
            DEFINE GRAF 621
            DEFINE GRAF 622
            DEFINE GRAF 623

    4. Reboot. The reboot process scans hardware devices, including
       3270s, and this enables the tub3270 driver, once loaded, to
       respond correctly to the configuration requests of the next
       step. If you have chosen 3270 console support, your console now
       behaves as a 3270, not a 3215.

    5. Run the 3270 configuration script config3270. It is
       distributed in this same directory, Documentation/s390, as
       config3270.sh. Inspect the output script it produces,
       /tmp/mkdev3270, and then run that script. This will create the
       necessary character special device files and make the necessary
       changes to /etc/inittab. If you have selected DEVFS, the driver
       itself creates the device files, and /tmp/mkdev3270 only changes
       /etc/inittab.

       Then notify /sbin/init that /etc/inittab has changed by issuing
       the telinit command with the q operand. The whole sequence is:
            cd Documentation/s390
            sh config3270.sh
            sh /tmp/mkdev3270
            telinit q

       This should be sufficient for your first time. If your 3270
       configuration has changed and you're reusing config3270, you
       should follow these steps:
            Change 3270 configuration
            Reboot
            Run config3270 and /tmp/mkdev3270
            Reboot

Here are the testing steps in detail:

    1. Bring up an x3270 window, or use an actual hardware 3278 or
       3279, or use the 3270 emulator of your choice. You would be
       running the emulator on your PC or workstation. You would use
       the command, for example,
            x3270 vm-esa-domain-name &
       if you wanted a 3278 Model 4 with 43 rows of 80 columns, the
       default model number. The driver does not take advantage of
       extended attributes.

       The screen you should now see contains a VM logo with input
       lines near the bottom. Use TAB to move to the bottom line,
       probably labeled "COMMAND ===>".

    2. Use the DIAL command instead of the LOGIN command to connect
       to one of the virtual 3270s you defined with the DEF GRAF
       commands:
            dial my-vm-guest-name

    3. You should immediately see a login prompt from your
       Linux/390 operating system. If that does not happen, you will
       see instead the line "DIALED TO my-vm-guest-name 0620".

       To troubleshoot, check the following:

       A. Is the driver loaded? Use the lsmod command (no operands)
          to find out. Probably it isn't. Try loading it manually, with
          the command "insmod tub3270". Does that command give error
          messages? Ha! There's your problem.

       B. Is the /etc/inittab file modified as in installation step 5
          above? Use the grep command to find out; for instance, issue
          "grep 3270 /etc/inittab". Nothing found? There's your
          problem!

       C. Are the device special files created, as in installation
          step 5 above? Use the ls -l command to find out; for instance,
          issue "ls -l /dev/3270/tty620". The output should start with
          the letter "c", meaning character device, and should show
          "227, 1" where the file size normally appears. No such file?
          No "c"? Wrong major number? Wrong minor number? There's your
          problem!

       D. Do you get the message
               "HCPDIA047E my-vm-guest-name 0620 does not exist"?
          If so, you must issue the command "DEF GRAF 620" from your VM
          3215 console and then reboot the system.
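
Check C can also be scripted. The sketch below assumes Linux with glibc, where major() and minor() come from <sys/sysmacros.h>; the function names are hypothetical, and the expected numbers are the ones quoted in check C (227, 1 for /dev/3270/tty620).

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* True if 'st' describes a character device with the given numbers. */
static bool is_char_node(const struct stat *st, unsigned maj, unsigned min)
{
    return S_ISCHR(st->st_mode) &&
           major(st->st_rdev) == maj && minor(st->st_rdev) == min;
}

/* Same check by path; a missing file simply fails the check. */
static bool check_node(const char *path, unsigned maj, unsigned min)
{
    struct stat st;

    return stat(path, &st) == 0 && is_char_node(&st, maj, min);
}
```

For example, check_node("/dev/3270/tty620", 227, 1) returns false for a missing file, a file that is not a character device, or a wrong major or minor number.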



OPERATION.

The driver defines three areas on the 3270 screen: the log area, the
input area, and the status area.

The log area takes up all but the bottom two lines of the screen. The
driver writes terminal output to it, starting at the top line and going
down. When it fills, the status area changes from "Linux Running" to
"Linux More...". After a scrolling timeout of (default) 5 sec, the
screen clears and more output is written, from the top down.

The input area extends from the beginning of the second-to-last screen
line to the start of the status area. You type commands in this area
and hit ENTER to execute them.

The status area initializes to "Linux Running" to give you a warm
fuzzy feeling. When the log area fills up and output awaits, it
changes to "Linux More...". At this time you can do several things or
nothing. If you do nothing, the screen will clear in (default) 5 sec
and more output will appear. You may hit ENTER with nothing typed in
the input area to toggle between "Linux More..." and "Linux Holding",
which indicates no scrolling will occur. (If you hit ENTER with "Linux
Running" and nothing typed, the application receives a newline.)

You may change the scrolling timeout value. For example, the following
command line:
    echo scrolltime=60 > /proc/tty/driver/tty3270
changes the scrolling timeout value to 60 sec. Set scrolltime to 0 if
you wish to prevent scrolling entirely.
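
The status-area rules above amount to a small state machine. The following sketch models them in C for illustration only: the type and function names are hypothetical rather than taken from the tub3270 source, it assumes tick() is called once per second, and it assumes (the text does not say) that toggling back from "Linux Holding" to "Linux More..." restarts the timeout.

```c
#include <stdbool.h>

/* Hypothetical model of the 3270 status area (not the driver's code). */
enum status_area { LINUX_RUNNING, LINUX_MORE, LINUX_HOLDING };

struct screen_model {
    enum status_area status;
    unsigned scrolltime;   /* seconds; 0 disables automatic scrolling */
    unsigned waited;       /* seconds spent in "Linux More..." so far */
    unsigned clears;       /* number of times the log area was cleared */
};

/* The log area is full and more output is waiting. */
static void log_area_full(struct screen_model *s)
{
    s->status = LINUX_MORE;
    s->waited = 0;
}

/* Clear the log area so the waiting output can be written (also PA2). */
static void clear_log_area(struct screen_model *s)
{
    s->clears++;
    s->status = LINUX_RUNNING;
}

/* ENTER with an empty input area. Returns true when the keystroke should
   reach the application as a newline ("Linux Running"); otherwise it
   toggles between "Linux More..." and "Linux Holding". */
static bool empty_enter(struct screen_model *s)
{
    switch (s->status) {
    case LINUX_MORE:
        s->status = LINUX_HOLDING;
        return false;
    case LINUX_HOLDING:
        s->status = LINUX_MORE;
        s->waited = 0;           /* assumption: the timeout restarts */
        return false;
    default:
        return true;
    }
}

/* Called once per second (an assumption of this model). */
static void tick(struct screen_model *s)
{
    if (s->status != LINUX_MORE || s->scrolltime == 0)
        return;                  /* running, holding, or scrolling disabled */
    if (++s->waited >= s->scrolltime)
        clear_log_area(s);
}
```

With scrolltime set to 5, five ticks in "Linux More..." clear the screen; in "Linux Holding", or with scrolltime set to 0, ticks have no effect.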

Other things you may do when the log area fills up are: hit PA2 to
clear the log area and write more output to it, or hit CLEAR to clear
the log area and the input area and write more output to the log area.

Some of the Program Function (PF) and Program Attention (PA) keys have
preassigned special functions. Those without one yield an alarm
when pressed.

PA1 causes a SIGINT to the currently running application. You may do
the same thing from the input area, by typing "^C" and hitting ENTER.

PA2 causes the log area to be cleared. If output awaits, it is then
written to the log area.

PF3 causes an EOF to be received as input by the application. You may
also cause an EOF by typing "^D" and hitting ENTER.

No PF key is preassigned to cause a job suspension, but you may cause a
job suspension by typing "^Z" and hitting ENTER. You may wish to
assign this function to a PF key. To make PF7 cause job suspension,
execute the command:
    echo pf7=^z > /proc/tty/driver/tty3270

If the input you type does not end with the two characters "^n", the
driver appends a newline character and sends it to the tty driver;
otherwise the driver strips the "^n" and does not append a newline.
The IBM 3215 driver behaves similarly.
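
A minimal sketch of the "^n" rule in C, assuming the input area has already been copied into a NUL-terminated string; finish_input() is a hypothetical helper, not a function of the driver.

```c
#include <stddef.h>
#include <string.h>

/* Apply the "^n" rule to one line typed into the input area.
   'out' must have room for strlen(in) + 2 bytes. Returns the number of
   bytes (excluding the terminating NUL) to hand to the tty layer. */
static size_t finish_input(const char *in, char *out)
{
    size_t len = strlen(in);

    if (len >= 2 && in[len - 2] == '^' && in[len - 1] == 'n') {
        /* Trailing "^n": strip it and do not append a newline. */
        memcpy(out, in, len - 2);
        out[len - 2] = '\0';
        return len - 2;
    }
    /* Otherwise append a newline to what was typed. */
    memcpy(out, in, len);
    out[len] = '\n';
    out[len + 1] = '\0';
    return len + 1;
}
```

So "ls -l" is passed on as "ls -l\n", while "echo hi^n" is passed on as "echo hi" with no newline.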

PF10 causes the most recent command to be retrieved from the tube's
command stack (default depth 20) and displayed in the input area. You
may hit PF10 again for the next-most-recent command, and so on. A
command is entered into the stack only when the input area is not made
invisible (such as for password entry) and it is not identical to the
current top entry. PF10 rotates backward through the command stack;
PF11 rotates forward. You may assign the backward function to any PF
key (or PA key, for that matter), say, PA3, with the command:
    echo -e pa3=\\033k > /proc/tty/driver/tty3270
This assigns the string ESC-k to PA3. Similarly, the string ESC-j
performs the forward function. (Rationale: In bash with vi-mode line
editing, ESC-k and ESC-j retrieve backward and forward history.
Suggestions welcome.)

Is a stack size of twenty commands not to your liking? Change it on
the fly. To change to saving the last 100 commands, execute the
command:
    echo recallsize=100 > /proc/tty/driver/tty3270
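
The recall behaviour can be pictured as a ring buffer. This is a sketch under stated assumptions, not the driver's code: the names are hypothetical, the depth is fixed at compile time to the default of 20 (the driver lets recallsize change it at run time), commands are truncated to 79 characters, and browsing starts again from the newest entry after each new command.

```c
#include <stdbool.h>
#include <string.h>

#define RECALL_MAX 20    /* the documented default depth */
#define CMD_LEN    80

struct recall {
    char cmd[RECALL_MAX][CMD_LEN];
    int count;   /* number of stored commands (<= RECALL_MAX) */
    int top;     /* slot of the most recent command */
    int cursor;  /* 0 = most recent, count-1 = oldest; -1 = not browsing */
};

static void recall_init(struct recall *r)
{
    r->count = 0;
    r->top = -1;
    r->cursor = -1;
}

/* Store a command unless it was typed invisibly or repeats the top entry. */
static void recall_push(struct recall *r, const char *line, bool invisible)
{
    if (invisible)
        return;
    if (r->count > 0 && strcmp(r->cmd[r->top], line) == 0)
        return;
    r->top = (r->top + 1) % RECALL_MAX;        /* overwrite the oldest when full */
    strncpy(r->cmd[r->top], line, CMD_LEN - 1);
    r->cmd[r->top][CMD_LEN - 1] = '\0';
    if (r->count < RECALL_MAX)
        r->count++;
    r->cursor = -1;
}

/* Entry 'age' steps back from the most recent one (0 = most recent). */
static const char *recall_at(const struct recall *r, int age)
{
    return r->cmd[(r->top - age + RECALL_MAX) % RECALL_MAX];
}

/* PF10: rotate backward, towards older commands, wrapping around. */
static const char *recall_back(struct recall *r)
{
    if (r->count == 0)
        return NULL;
    r->cursor = (r->cursor + 1) % r->count;
    return recall_at(r, r->cursor);
}

/* PF11: rotate forward, towards newer commands, wrapping around. */
static const char *recall_forward(struct recall *r)
{
    if (r->count == 0)
        return NULL;
    r->cursor = (r->cursor <= 0) ? r->count - 1 : r->cursor - 1;
    return recall_at(r, r->cursor);
}
```

After entering "ls", "pwd" and "date", repeated PF10 presses in this model yield "date", "pwd", "ls" and then "date" again.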

Have a command you issue frequently? Assign it to a PF or PA key! Use
the command
    echo pf24="mkdir foobar; cd foobar" > /proc/tty/driver/tty3270
to execute the commands mkdir foobar and cd foobar immediately when you
hit PF24. Want to see the command line first, before you execute it?
Use the -n option of the echo command:
    echo -n pf24="mkdir foo; cd foo" > /proc/tty/driver/tty3270



Happy testing! I welcome any and all comments about this document, the
driver, etc.

Dick Hitt <rbh00@utsglobal.com>
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO
new file mode 100644
index 000000000000..a831d9ae5a5e
--- /dev/null
+++ b/Documentation/s390/CommonIO
@@ -0,0 +1,109 @@
S/390 common I/O-Layer - command line parameters and /proc entries
==================================================================

Command line parameters
-----------------------

* cio_msg = yes | no

  Determines whether information on found devices and sensed device
  characteristics should be shown during startup, i.e. messages of the types
  "Detected device 0.0.4711 on subchannel 0.0.0042" and "SenseID: Device
  0.0.4711 reports: ...".

  Default is off.


* cio_ignore = {all} |
               {<device> | <range of devices>} |
               {!<device> | !<range of devices>}

  The given devices will be ignored by the common I/O-layer; no detection
  and device sensing will be done on any of those devices. The subchannel to
  which the device in question is attached will be treated as if no device
  were attached.

  An ignored device can be un-ignored later; see the "/proc entries" section
  for details.

  The devices must be given either as bus ids (0.0.abcd) or as hexadecimal
  device numbers (0xabcd or abcd, for 2.4 backward compatibility).
  You can use the 'all' keyword to ignore all devices.
  The '!' operator will cause the I/O-layer to _not_ ignore a device.
  The order on the command line is not important.

  For example,
        cio_ignore=0.0.0023-0.0.0042,0.0.4711
  will ignore all devices ranging from 0.0.0023 to 0.0.0042 and the device
  0.0.4711, if detected.
  As another example,
        cio_ignore=all,!0.0.4711,!0.0.fd00-0.0.fd02
  will ignore all devices but 0.0.4711, 0.0.fd00, 0.0.fd01, 0.0.fd02.

  By default, no devices are ignored.
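
To make the combination rules concrete, here is a hedged C sketch that answers "is this device ignored?" for a cio_ignore= string. It is not the kernel's parser: the function names are hypothetical, only bus ids of the form 0.0.xxxx (or bare hexadecimal device numbers) are handled, and it simply re-parses the string on every call. A device counts as ignored when a plain entry (or 'all') covers it and no '!' entry does, which makes the result independent of the order of the entries.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse one device given as a bus id ("0.0.abcd") or as a hexadecimal
   device number ("0xabcd" or "abcd"). Returns the device number and sets
   *end past it, or returns -1 if the text is malformed. */
static long parse_devno(const char *s, const char **end)
{
    char *e;
    unsigned long v;

    if (strncmp(s, "0.0.", 4) == 0)
        s += 4;
    if (!isxdigit((unsigned char)*s))
        return -1;
    v = strtoul(s, &e, 16);          /* base 16 also accepts a "0x" prefix */
    if (v > 0xffff)
        return -1;
    *end = e;
    return (long)v;
}

/* True if 'devno' is ignored under the given cio_ignore= parameter.
   A malformed parameter ignores nothing in this sketch. */
static bool cio_is_ignored(const char *param, unsigned devno)
{
    bool covered = false, excluded = false;
    const char *p = param;

    while (*p != '\0') {
        bool negate = false;
        long lo, hi;

        if (*p == '!') {
            negate = true;
            p++;
        }
        if (strncmp(p, "all", 3) == 0 && (p[3] == ',' || p[3] == '\0')) {
            lo = 0;
            hi = 0xffff;
            p += 3;
        } else {
            lo = hi = parse_devno(p, &p);
            if (lo < 0)
                return false;
            if (*p == '-') {
                hi = parse_devno(p + 1, &p);
                if (hi < lo)         /* also catches a malformed upper bound */
                    return false;
            }
        }
        if (devno >= (unsigned long)lo && devno <= (unsigned long)hi) {
            if (negate)
                excluded = true;
            else
                covered = true;
        }
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return false;
    }
    return covered && !excluded;
}
```

For instance, cio_is_ignored("all,!0.0.4711,!0.0.fd00-0.0.fd02", 0xfd01) returns false, matching the second example above, while device 0xfd03 is reported as ignored.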


/proc entries
-------------

* /proc/cio_ignore

  Lists the ranges of devices (by bus id) which are ignored by common I/O.

  You can un-ignore certain or all devices by piping to /proc/cio_ignore.
  "free all" will un-ignore all ignored devices;
  "free <device range>, <device range>, ..." will un-ignore the specified
  devices.

  For example, if devices 0.0.0023 to 0.0.0042 and 0.0.4711 are ignored,
  - echo free 0.0.0030-0.0.0032 > /proc/cio_ignore
    will un-ignore devices 0.0.0030 to 0.0.0032 and will leave devices
    0.0.0023 to 0.0.002f, 0.0.0033 to 0.0.0042 and 0.0.4711 ignored;
  - echo free 0.0.0041 > /proc/cio_ignore will furthermore un-ignore device
    0.0.0041;
  - echo free all > /proc/cio_ignore will un-ignore all remaining ignored
    devices.

  When a device is un-ignored, device recognition and sensing are performed
  and the device driver will be notified if possible, so the device will
  become available to the system.

  You can also add ranges of devices to be ignored by piping to
  /proc/cio_ignore; "add <device range>, <device range>, ..." will ignore the
  specified devices.

  Note: Already known devices cannot be ignored.

  For example, if device 0.0.abcd is already known and all other devices
  0.0.a000-0.0.afff are not known,
        "echo add 0.0.a000-0.0.accc, 0.0.af00-0.0.afff > /proc/cio_ignore"
  will add 0.0.a000-0.0.abcc, 0.0.abce-0.0.accc and 0.0.af00-0.0.afff to the
  list of ignored devices and skip 0.0.abcd.

  The devices can be specified either by bus id (0.0.abcd) or, for 2.4 backward
  compatibility, by the device number in hexadecimal (0xabcd or abcd).
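
  The effect of "add" and "free" can be modelled with one bit per device
  number. The sketch below is illustrative only: the bitmaps and function
  names are hypothetical, only subchannel set 0.0 is covered, and the device
  sensing that the real driver performs after "free" is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_DEVNOS 0x10000u          /* 16-bit device numbers in 0.0.xxxx */

static uint8_t ignored_map[NR_DEVNOS / 8];
static uint8_t known_map[NR_DEVNOS / 8];

static bool bit_test(const uint8_t *map, unsigned n)
{
    return (map[n / 8] >> (n % 8)) & 1u;
}

static void bit_set(uint8_t *map, unsigned n)
{
    map[n / 8] |= (uint8_t)(1u << (n % 8));
}

static void bit_clear(uint8_t *map, unsigned n)
{
    map[n / 8] &= (uint8_t)~(1u << (n % 8));
}

/* A device the system has already found; it can no longer be ignored. */
static void mark_known(unsigned devno) { bit_set(known_map, devno); }

static bool is_ignored(unsigned devno) { return bit_test(ignored_map, devno); }

/* "add lo-hi": ignore every device in the range that is not already known. */
static void cio_ignore_add(unsigned lo, unsigned hi)
{
    for (unsigned n = lo; n <= hi && n < NR_DEVNOS; n++)
        if (!bit_test(known_map, n))
            bit_set(ignored_map, n);
}

/* "free lo-hi" ("free all" is 0-0xffff): un-ignore every device in range. */
static void cio_ignore_free(unsigned lo, unsigned hi)
{
    for (unsigned n = lo; n <= hi && n < NR_DEVNOS; n++)
        bit_clear(ignored_map, n);
}
```

  Replaying the examples above (add 0023-0042 and 4711, then free
  0030-0032) leaves 0x002f and 0x0033 ignored and 0x0030 available, and
  adding a000-accc with 0xabcd already known skips 0xabcd.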


* /proc/s390dbf/cio_*/ (S/390 debug feature)

  Some views generated by the debug feature to hold various debug outputs.

  - /proc/s390dbf/cio_crw/sprintf
    Messages from the processing of pending channel report words (machine
    check handling), which will also show when CONFIG_DEBUG_CRW is defined.

  - /proc/s390dbf/cio_msg/sprintf
    Various debug messages from the common I/O-layer; generally, messages
    which will also show when CONFIG_DEBUG_IO is defined.

  - /proc/s390dbf/cio_trace/hex_ascii
    Logs the calling of functions in the common I/O-layer and, if applicable,
    which subchannel they were called for.

  The level of logging can be changed to be more or less verbose by piping to
  /proc/s390dbf/cio_*/level a number between 0 and 6; see the documentation on
  the S/390 debug feature (Documentation/s390/s390dbf.txt) for details.

* For some of the information present in the /proc filesystem in 2.4 (namely,
  /proc/subchannels and /proc/chpids), see driver-model.txt.
  Information formerly in /proc/irq_count is now in /proc/interrupts.
diff --git a/Documentation/s390/DASD b/Documentation/s390/DASD
new file mode 100644
index 000000000000..9963f1e9c98a
--- /dev/null
+++ b/Documentation/s390/DASD
@@ -0,0 +1,73 @@
DASD device driver

S/390's disk devices (DASDs) are managed by Linux via the DASD device
driver. It is valid for all types of DASDs and represents them to
Linux as block devices, namely "dd". Currently the DASD driver uses a
single major number (254) and 4 minor numbers per volume (1 for the
physical volume and 3 for partitions). With respect to partitions, see
below. Thus you may have up to 64 DASD devices in your system.

The kernel parameter 'dasd=from-to,...' may be given any number of times
in the kernel's parameter line, or not at all. The 'from' and 'to'
parameters are to be given in hexadecimal notation without a leading
0x.
If you supply kernel parameters, the different instances are processed
in order of appearance and a minor number is reserved for any device
covered by the supplied ranges, up to 64 volumes. Additional DASDs are
ignored. If you do not supply the 'dasd=' kernel parameter at all, the
DASD driver registers all supported DASDs of your system to a minor
number in ascending order of the subchannel number.
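
The numbers above fit together as sketched below. Assumptions: 256 minor numbers per major (8-bit minors), dasd= ranges already converted from hexadecimal into integers, and hypothetical helper names. Every device number inside a range consumes a slot whether or not the device exists, mirroring the rule that a minor number is reserved for any device covered by the ranges.

```c
#include <stddef.h>

#define MINORS_PER_VOLUME 4     /* 1 for the volume + 3 for partitions */
#define MINOR_SPACE       256   /* assumption: 8-bit minor numbers */
#define DASD_MAX_VOLUMES  (MINOR_SPACE / MINORS_PER_VOLUME)   /* = 64 */

struct devno_range { unsigned from, to; };   /* one 'from-to' entry */

/* Position of 'devno' among the devices covered by the ranges, taken in
   order of appearance; -1 if not covered or beyond the 64-volume limit. */
static int dasd_index(const struct devno_range *r, size_t nr, unsigned devno)
{
    int index = 0;

    for (size_t i = 0; i < nr; i++) {
        if (devno >= r[i].from && devno <= r[i].to) {
            index += (int)(devno - r[i].from);
            return index < DASD_MAX_VOLUMES ? index : -1;
        }
        index += (int)(r[i].to - r[i].from + 1);
    }
    return -1;
}

/* Minor number of the whole volume (part 0) or of partition 1..3. */
static int dasd_minor(int index, int part)
{
    return index * MINORS_PER_VOLUME + part;
}
```

With dasd=150-15f,200-27f the 16 devices 0x150-0x15f get indices 0-15, device 0x200 gets index 16 (minor 64), and the second range is cut off after 0x22f, the 64th volume.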

The driver currently supports ECKD devices, and there are stubs for
support of the FBA and CKD architectures. For the FBA architecture
only some smart data structures are missing to make the support
complete.
We performed our testing on 3380 and 3390 type disks of different
sizes, under VM and on the bare hardware (LPAR), using internal disks
of the Multiprise as well as a RAMAC virtual array. Disks exported by
an Enterprise Storage Server (Seascape) should work fine as well.

We currently implement one partition per volume, which is the whole
volume, skipping the first blocks up to the volume label. These are
reserved for IPL records and IBM's volume label to assure
accessibility of the DASD from other OSs. In a later stage we will
provide support of partitions, maybe VTOC oriented or using a kind of
partition table in the label record.

USAGE

-Low-level format (?CKD only)
For using an ECKD-DASD as a Linux hard disk you have to low-level
format the tracks by issuing the BLKDASDFORMAT ioctl on that
device. This will erase any data on that volume, including IBM volume
labels, VTOCs etc. The ioctl may take a 'format_data_t *' or
'NULL' as an argument:
        typedef struct {
                int start_unit;
                int stop_unit;
                int blksize;
        } format_data_t;
When a NULL argument is passed to the BLKDASDFORMAT ioctl, the whole
disk is formatted to a block size of 1024 bytes. Otherwise start_unit
and stop_unit are the first and last track to be formatted. If
stop_unit is -1, the DASD is formatted from start_unit up to the last
track. blksize can be any power of two between 512 and 4096. We
recommend no blksize lower than 1024 because ext2fs uses 1kB blocks
anyway, and you gain approx. 50% of capacity by increasing your
blksize from 512 bytes to 1kB.
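
The argument rules for BLKDASDFORMAT can be captured in a small validation routine. This is a sketch: resolve_format() and its last_track parameter are hypothetical, tracks are assumed to be numbered from 0, and the ioctl call itself is left out because BLKDASDFORMAT and its header exist only on Linux for S/390.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int start_unit;   /* first track to format */
    int stop_unit;    /* last track to format, or -1 for "up to the last" */
    int blksize;      /* block size in bytes */
} format_data_t;

/* Turn the ioctl argument into explicit values following the rules above.
   'last_track' is the number of the volume's last track (assumed known).
   Returns false if the request is invalid. */
static bool resolve_format(const format_data_t *arg, int last_track,
                           format_data_t *out)
{
    if (arg == NULL) {
        /* NULL: format the whole disk with 1024-byte blocks. */
        out->start_unit = 0;
        out->stop_unit = last_track;
        out->blksize = 1024;
        return true;
    }
    *out = *arg;
    if (out->stop_unit == -1)
        out->stop_unit = last_track;
    if (out->start_unit < 0 || out->start_unit > out->stop_unit ||
        out->stop_unit > last_track)
        return false;
    /* blksize must be a power of two between 512 and 4096. */
    if (out->blksize < 512 || out->blksize > 4096 ||
        (out->blksize & (out->blksize - 1)) != 0)
        return false;
    return true;
}
```

On an S/390 system the resolved structure (or NULL) would then be passed as the third argument of ioctl() on the opened DASD device.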

-Make a filesystem
Then you can mk??fs the filesystem of your choice on that volume or
partition. For reasons of sanity you should build your filesystem on
the partition /dev/dd?1 instead of the whole volume. You only lose 3kB
but may be sure that you can reuse your data after the introduction of a
real partition table.

BUGS:
- Performance sometimes is rather low because we don't fully exploit clustering

TODO-List:
- Add IBM's disk layout to genhd
- Enhance driver to use more than one major number
- Enable usage as a module
- Support Cache fast write and DASD fast write (ECKD)
diff --git a/Documentation/s390/Debugging390.txt b/Documentation/s390/Debugging390.txt
new file mode 100644
index 000000000000..adbfe620c061
--- /dev/null
+++ b/Documentation/s390/Debugging390.txt
@@ -0,0 +1,2536 @@

                Debugging on Linux for s/390 & z/Architecture
                                     by
        Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
    Copyright (C) 2000-2001 IBM Deutschland Entwicklung GmbH, IBM Corporation
                      Best viewed with fixed width fonts

Overview of Document:
=====================
This document is intended to give a good overview of how to debug
Linux for s/390 & z/Architecture. It isn't intended as a complete reference
or as a tutorial on the fundamentals of C & assembly, and it doesn't go into
390 IO in any detail. It is intended to complement the documents in the
reference section below & any other worthwhile references you get.

It is intended, like the Enterprise Systems Architecture/390 Reference Summary,
to be printed out & used as a quick, self-help-style cheat sheet when
problems occur.

Contents
========
Register Set
Address Spaces on Intel Linux
Address Spaces on Linux for s/390 & z/Architecture
The Linux for s/390 & z/Architecture Kernel Task Structure
Register Usage & Stackframes on Linux for s/390 & z/Architecture
A sample program with comments
Compiling programs for debugging on Linux for s/390 & z/Architecture
Figuring out gcc compile errors
Debugging Tools
objdump
strace
Performance Debugging
Debugging under VM
s/390 & z/Architecture IO Overview
Debugging IO on s/390 & z/Architecture under VM
GDB on s/390 & z/Architecture
Stack chaining in gdb by hand
Examining core dumps
ldd
Debugging modules
The proc file system
Starting points for debugging scripting languages etc.
Dumptool & Lcrash
SysRq
References
Special Thanks

Register Set
============
The current architectures have the following registers.

16 General purpose registers, 32 bit on s/390 & 64 bit on z/Architecture,
r0-r15 or gpr0-gpr15, used for arithmetic & addressing.

16 Control registers, 32 bit on s/390 & 64 bit on z/Architecture (cr0-cr15,
kernel usage only), used for memory management, interrupt control, debugging
control etc.

16 Access registers (ar0-ar15), 32 bit on s/390 & z/Architecture,
not used by normal programs but potentially could
be used as temporary storage. Their main purpose is their 1 to 1
association with general purpose registers; they are used in
the kernel for copying data between kernel & user address spaces.
Access register 0 (& access register 1 on z/Architecture, which needs a 64 bit
pointer) is currently used by the pthread library as a pointer to
the current running thread's private area.

16 64 bit floating point registers (fp0-fp15), IEEE & HFP floating
point format compliant, on G5 upwards, & a Floating point control reg (FPC).
4 64 bit registers (fp0, fp2, fp4 & fp6), HFP only, on older machines.
Note:
Linux (currently) always uses IEEE & emulates G5 IEEE format on older machines
(provided the kernel is configured for this).


The PSW is the most important register on the machine. It
is 64 bit on s/390 & 128 bit on z/Architecture & serves the roles of
a program counter (pc), condition code register and memory space designator.
In IBM standard notation I am counting bit 0 as the MSB.
It has several advantages over a normal program counter
in that you can change address translation & program counter
in a single instruction. To change address translation,
e.g. switching address translation off, requires that you
have a logical=physical mapping for the address you are
currently running at.

 Bit                   Value
s/390  z/Architecture
0      0               Reserved (must be 0), otherwise a specification exception
                       occurs.

1      1               Program Event Recording, 1=PER enabled.
                       PER is used to facilitate debugging, e.g. single stepping.

2-4    2-4             Reserved (must be 0).

5      5               Dynamic address translation, 1=DAT on.

6      6               Input/Output interrupt mask.

7      7               External interrupt mask, used primarily for
                       interprocessor signalling & clock interrupts.

8-11   8-11            PSW key, used for a complex memory protection mechanism
                       not used under Linux.

12     12              1 on s/390, 0 on z/Architecture.

13     13              Machine check mask, 1=enable machine check interrupts.

14     14              Wait state. Set this to 1 to stop the processor except
                       for interrupts & give time to other LPARs. It is used
                       in CPU idle in the kernel to increase overall usage of
                       processor resources.

15     15              Problem state (if set to 1, certain instructions are
                       disabled). All Linux user programs run with this bit
                       set to 1 (useful info for debugging under VM).

16-17  16-17           Address space control

                       00 Primary space mode when DAT on.
                          The Linux kernel currently runs in this mode; CR1 is
                          affiliated with this mode & points to the primary
                          segment table origin etc.

                       01 Access register mode. This mode is used in
                          functions to copy data between kernel & user space.

                       10 Secondary space mode. Not used in Linux; however,
                          CR7, the register affiliated with this mode, is
                          used, & normally CR13=CR7 to allow us to copy data
                          between kernel & user space. We do this as follows:
                          we set ar2 to 0 to designate that its affiliated gpr
                          (gpr2) points to primary=kernel space, we set ar4 to
                          1 to designate that its affiliated gpr (gpr4) points
                          to secondary=home=user space, & then essentially do a
                          memcpy(gpr2,gpr4,size) to copy data between the
                          address spaces. The reason we use home space for the
                          kernel & don't keep secondary space free is that
                          code will not run in secondary space.

                       11 Home space mode. All user programs run in this
                          mode. It is affiliated with CR13.

18-19  18-19           Condition code (CC)

20     20              Fixed point overflow mask; if 1, program exceptions
                       for this event occur (normally 0).

21     21              Decimal overflow mask; if 1, program exceptions for
                       this event occur (normally 0).

22     22              Exponent underflow mask; if 1, program exceptions for
                       this event occur (normally 0).

23     23              Significance mask; if 1, program exceptions for this
                       event occur (normally 0).

24-31  24-30           Reserved, must be 0.

       31              Extended addressing mode
       32              Basic addressing mode
                       Together they set the addressing mode:
                       PSW 31  PSW 32
                         0       0     24 bit
                         0       1     31 bit
                         1       1     64 bit

32                     1=31 bit addressing mode, 0=24 bit addressing mode (for
                       backward compatibility); Linux always runs with this
                       bit set to 1.

33-63                  Instruction address.
       33-63           Reserved, must be 0.
       64-127          Address
                       In 24 bit mode, bits 64-103=0 & bits 104-127 hold the
                       address.
                       In 31 bit mode, bits 64-96=0 & bits 97-127 hold the
                       address.
                       Note: unlike 31 bit mode on s/390, bit 96 must be zero
                       when loading the address with LPSWE, otherwise a
                       specification exception occurs; LPSW is fully backward
                       compatible.
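
Because the table counts bit 0 as the MSB, a field spanning bits first..last of the 64-bit s/390 PSW is obtained by shifting right by 63 - last. The helpers below are an illustrative sketch with hypothetical names. The z/Architecture addressing-mode helper treats the one combination not listed in the table (bit 31 set, bit 32 clear) as invalid.

```c
#include <stdint.h>

/* Bits first..last of a 64-bit s/390 PSW, IBM numbering (bit 0 = MSB). */
static uint64_t psw_field(uint64_t psw, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint64_t mask = (width >= 64) ? ~0ULL : (1ULL << width) - 1;

    return (psw >> (63 - last)) & mask;
}

static unsigned psw_bit(uint64_t psw, unsigned bit)
{
    return (unsigned)psw_field(psw, bit, bit);
}

/* A few fields from the table above. */
#define PSW_DAT(p)     psw_bit((p), 5)           /* 1 = DAT on */
#define PSW_WAIT(p)    psw_bit((p), 14)          /* 1 = wait state */
#define PSW_PROBLEM(p) psw_bit((p), 15)          /* 1 = problem (user) state */
#define PSW_ASC(p)     psw_field((p), 16, 17)    /* 0 primary ... 3 home */
#define PSW_CC(p)      psw_field((p), 18, 19)    /* condition code */
#define PSW_AMODE31(p) psw_bit((p), 32)          /* 1 = 31 bit, 0 = 24 bit */
#define PSW_ADDR(p)    psw_field((p), 33, 63)    /* instruction address */

/* z/Architecture: addressing mode from PSW bits 31 and 32; the
   combination 31=1, 32=0 is reported as invalid (-1) here. */
static int zarch_amode(unsigned bit31, unsigned bit32)
{
    if (!bit31 && !bit32)
        return 24;
    if (!bit31 && bit32)
        return 31;
    if (bit31 && bit32)
        return 64;
    return -1;
}
```

For example, 0x070DC00080401234 is a PSW assembled from the table (DAT, I/O, external and machine check masks on, problem state, home space mode, 31 bit addressing); decoding it gives PSW_PROBLEM() == 1, PSW_ASC() == 3 and PSW_ADDR() == 0x401234.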


Prefix Page(s)
--------------
This per cpu memory area is too intimately tied to the processor not to mention.
It exists between the real addresses 0-4096 on s/390 & 0-8192 on z/Architecture
& is exchanged with 1 page on s/390 or 2 pages on z/Architecture in absolute
storage by the set prefix instruction in Linux's startup.
This page is mapped to a different prefix for each processor in an SMP
configuration (assuming the os designer is sane of course :-) ).
Bytes 0-512 (200 hex) on s/390 & 0-512, 4096-4544, 4604-5119 currently on
z/Architecture are used by the processor itself for holding such information
as exception indications & entry points for exceptions.
Bytes after 0xc00 hex are used by Linux for per processor globals on s/390 &
z/Architecture (there is currently a gap on z/Architecture too, between 0xc00
& 0x1000, which Linux uses).
The closest thing to this on traditional architectures is the interrupt
vector table. This is a good thing & does simplify some of the kernel coding;
however, it means that we now cannot catch stray NULL pointers in the
kernel without hard coded checks.
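
Prefixing itself is a three-way rule, sketched below with a hypothetical function name. It assumes the prefix value is a multiple of the prefix area size (4096 bytes on s/390, 8192 on z/Architecture, as above).

```c
#include <stdint.h>

/* Translate a real address to an absolute address for a CPU whose
   prefix register holds 'prefix'. 'area' is 4096 (s/390) or 8192
   (z/Architecture); 'prefix' is assumed to be a multiple of 'area'. */
static uint64_t real_to_absolute(uint64_t real, uint64_t prefix, uint64_t area)
{
    if (real < area)
        return real + prefix;   /* low addresses go to this CPU's page(s) */
    if (real >= prefix && real < prefix + area)
        return real - prefix;   /* the prefix page(s) show absolute 0 */
    return real;                /* everything else is unchanged */
}
```

With a prefix of 0x10000 and a 4096-byte area, real address 0x100 maps to absolute 0x10100, and real 0x10100 maps to absolute 0x100. The first case is why a stray kernel NULL pointer reads this CPU's prefix page rather than faulting.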
196
197
198
199Address Spaces on Intel Linux
200=============================
201
202The traditional Intel Linux is approximately mapped as follows forgive
203the ascii art.
2040xFFFFFFFF 4GB Himem *****************
205 * *
206 * Kernel Space *
207 * *
208 ***************** ****************
209User Space Himem (typically 0xC0000000 3GB )* User Stack * * *
210 ***************** * *
211 * Shared Libs * * Next Process *
212 ***************** * to *
213 * * <== * Run * <==
214 * User Program * * *
215 * Data BSS * * *
216 * Text * * *
217 * Sections * * *
2180x00000000 ***************** ****************
219
220Now it is easy to see that on Intel it is quite easy to recognise a kernel address
221as being one greater than user space himem ( in this case 0xC0000000).
222& addresses of less than this are the ones in the current running program on this
223processor ( if an smp box ).
224If using the virtual machine ( VM ) as a debugger it is quite difficult to
225know which user process is running as the address space you are looking at
226could be from any process in the run queue.
227
228The limitation of Intels addressing technique is that the linux
229kernel uses a very simple real address to virtual addressing technique
230of Real Address=Virtual Address-User Space Himem.
231This means that on Intel the kernel linux can typically only address
232Himem=0xFFFFFFFF-0xC0000000=1GB & this is all the RAM these machines
233can typically use.
234They can lower User Himem to 2GB or lower & thus be
235able to use 2GB of RAM however this shrinks the maximum size
236of User Space from 3GB to 2GB they have a no win limit of 4GB unless
237they go to 64 Bit.
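The Intel rule of thumb above can be sketched in a few lines of C. This assumes the typical 3GB/1GB split with user space himem at 0xC0000000 ( the kernel itself calls this boundary PAGE_OFFSET ). The helper names are made up for illustration & are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 3GB/1GB split: user space himem (PAGE_OFFSET in the kernel). */
#define USER_SPACE_HIMEM 0xC0000000u

/* Kernel addresses are those at or above user space himem. */
static bool is_kernel_address(uint32_t vaddr)
{
	return vaddr >= USER_SPACE_HIMEM;
}

/* Real Address = Virtual Address - User Space Himem (kernel addresses only). */
static uint32_t kernel_virt_to_real(uint32_t vaddr)
{
	return vaddr - USER_SPACE_HIMEM;
}

/* Bytes of RAM the kernel window can map: 0xFFFFFFFF - 0xC0000000 + 1. */
static uint64_t kernel_window_bytes(void)
{
	return (uint64_t)UINT32_MAX - USER_SPACE_HIMEM + 1;
}
```

Lowering USER_SPACE_HIMEM to 0x80000000 makes kernel_window_bytes() return 2GB, which is the 2GB/2GB trade-off described above.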


On 390 our limitations & strengths make us slightly different.
For backward compatibility we are only allowed to use 31 bits (2GB)
of our 32 bit addresses; however, we use entirely separate address
spaces for the user & kernel.

This means we can support 2GB of non-Extended RAM on s/390, & more
with the Extended memory management swap device, &
currently 4TB of physical memory on z/Architecture.


Address Spaces on Linux for s/390 & z/Architecture
==================================================

Our addressing scheme is as follows:


Himem 0x7fffffff 2GB on s/390    *****************          ****************
currently 0x3ffffffffff (2^42)-1 *  User Stack   *          *              *
on z/Architecture.               *****************          *              *
                                 *  Shared Libs  *          *              *
                                 *****************          *              *
                                 *               *          *    Kernel    *
                                 * User Program  *          *              *
                                 *   Data BSS    *          *              *
                                 *     Text      *          *              *
                                 *   Sections    *          *              *
0x00000000                       *****************          ****************

This also means that we need to look at the PSW problem state bit
or the addressing mode to decide whether we are looking at
user or kernel space.
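A minimal C sketch of the problem state check follows. The P bit is PSW bit 15 ( IBM numbering, bit 0 being the most significant ), which gives the two masks below for the 32 bit s/390 PSW mask word & the 64 bit z/Architecture PSW mask. The kernel calls this mask PSW_MASK_PSTATE, but the macro & function names here are made up for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* P bit = PSW bit 15, counting bit 0 as the most significant bit. */
#define PSW31_MASK_PSTATE 0x00010000u                  /* s/390 PSW mask word */
#define PSW64_MASK_PSTATE UINT64_C(0x0001000000000000) /* z/Architecture PSW mask */

/* true if the PSW was running user (problem state) code */
static bool psw31_is_user(uint32_t psw_mask)
{
	return (psw_mask & PSW31_MASK_PSTATE) != 0;
}

static bool psw64_is_user(uint64_t psw_mask)
{
	return (psw_mask & PSW64_MASK_PSTATE) != 0;
}
```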

Virtual Addresses on s/390 & z/Architecture
===========================================

A virtual address on s/390 is made up of 3 parts ( bits are numbered IBM style,
so bit 0 is the most significant bit ):
The SX ( segment index, roughly corresponding to the PGD & PMD in Linux terminology )
being bits 1-11.
The PX ( page index, corresponding to the page table entry (pte) in Linux terminology )
being bits 12-19.
The remaining bits, BX ( the byte index, i.e. the offset in the page ),
being bits 20 to 31.

On z/Architecture in Linux we currently make up an address from 4 parts:
The region index (RX) being bits 0-32, of which we currently use bits 22-32
The segment index (SX) being bits 33-43
The page index (PX) being bits 44-51
The byte index (BX) being bits 52-63
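The bit ranges above turn into shifts & masks once the IBM numbering is turned around: IBM bit n of a 32 bit word is shift 31-n, & of a 64 bit word shift 63-n. Here is a sketch with made-up struct & function names:

```c
#include <stdint.h>

struct s390_vaddr {
	unsigned sx;	/* segment index, bits 1-11  (11 bits) */
	unsigned px;	/* page index,    bits 12-19 (8 bits)  */
	unsigned bx;	/* byte index,    bits 20-31 (12 bits) */
};

static struct s390_vaddr split_s390(uint32_t addr)
{
	struct s390_vaddr v = {
		.sx = (addr >> 20) & 0x7ff,	/* bit 0 is not part of SX */
		.px = (addr >> 12) & 0xff,
		.bx = addr & 0xfff,
	};
	return v;
}

struct zarch_vaddr {
	unsigned rx;	/* region index, the bits 22-32 Linux uses */
	unsigned sx;	/* segment index, bits 33-43 */
	unsigned px;	/* page index,    bits 44-51 */
	unsigned bx;	/* byte index,    bits 52-63 */
};

static struct zarch_vaddr split_zarch(uint64_t addr)
{
	struct zarch_vaddr v = {
		.rx = (unsigned)((addr >> 31) & 0x7ff),
		.sx = (unsigned)((addr >> 20) & 0x7ff),
		.px = (unsigned)((addr >> 12) & 0xff),
		.bx = (unsigned)(addr & 0xfff),
	};
	return v;
}
```

For example, the s/390 address 0x00123456 splits into SX 0x1, PX 0x23 & BX 0x456.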

Notes:
1) s/390 has no PMD so the PMD is really the PGD also.
A lot of this stuff is defined in pgtable.h.

2) Also, seeing as s/390's page tables are only 1k in size
( 256 ptes from bits 12-19 x 4 bytes per pte ), we use 1 page ( 4k )
to make the best use of memory by updating 4 segment index
entries each time we mess with a PMD & use offsets
0, 1024, 2048 & 3072 in this page for the page tables those entries point to.
On z/Architecture our page tables are 2k in size
( bits 12-19 x 8 bytes per pte ), so we do a similar trick
but only mess with 2 segment index entries each time we mess with
a PMD.

3) As z/Architecture supports up to a massive 5-level page table lookup, we
can only use 3 currently on Linux ( as this is all the generic kernel
currently supports ); however, this may change in future.
This allows us to access ( according to my sums )
4TB of virtual storage per process, i.e.
4096*512(PTES)*1024(PMDS)*2048(PGD) = 4398046511104 bytes,
enough for another 2 or 3 years I think :-).
To do this we use a region-third-table designation type in
our address space control registers.


The Linux for s/390 & z/Architecture Kernel Task Structure
==========================================================
Each process/thread under Linux for S390 has its own kernel task_struct,
defined in linux/include/linux/sched.h.
On initialisation & when resuming a process on a CPU, the S390 kernel sets
the __LC_KERNEL_STACK variable in the spare prefix area for this CPU
( which we use for per-processor globals ).

The kernel stack pointer is intimately tied with the task structure for
each process as follows.

                   s/390
             ************************
             *  1 page kernel stack *
             *        ( 4K )        *
             ************************
             *   1 page task_struct *
             *        ( 4K )        *
8K aligned   ************************

                 z/Architecture
             ************************
             *  2 page kernel stack *
             *        ( 8K )        *
             ************************
             *   2 page task_struct *
             *        ( 8K )        *
16K aligned  ************************

What this means is that we don't need to dedicate any register or global variable
to point to the current running process & can retrieve it with the following
very simple construct for s/390 & one very similar for z/Architecture.

static inline struct task_struct * get_current(void)
{
        struct task_struct *current;
        __asm__("lhi   %0,-8192\n\t"
                "nr    %0,15"
                : "=r" (current) );
        return current;
}

i.e. just ANDing the current kernel stack pointer ( r15 ) with the mask -8192.
Thankfully, because Linux doesn't have support for nested IO interrupts
& our devices have large buffers & can survive interrupts being shut off for
short amounts of time, we don't need a separate stack for interrupts.
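The masking trick is easy to try on any host. The portable sketch below works on plain integers rather than a real stack pointer, & the function & macro names are made up for illustration. It shows why any address inside the size-aligned stack + task_struct block maps back to the block base:

```c
#include <stdint.h>

/* Size of the combined kernel stack + task_struct block, which is also
 * its alignment: 8K on s/390, 16K on z/Architecture. */
#define S390_TASK_BLOCK  8192u
#define ZARCH_TASK_BLOCK 16384u

/*
 * Clear the low bits of any address inside the block (e.g. the stack
 * pointer) to get the block base, where the task_struct sits.  For a
 * power-of-two size, ~(size - 1) equals -size, i.e. the -8192 that lhi
 * loads before nr ANDs it with r15.
 */
static uintptr_t task_block_base(uintptr_t sp, uintptr_t block_size)
{
	return sp & ~(block_size - 1);
}
```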




Register Usage & Stackframes on Linux for s/390 & z/Architecture
=================================================================
Overview:
---------
This is the code that gcc produces at the top & the bottom of
each function. It usually is fairly consistent & similar from
function to function, & if you know its layout you can probably
make some headway in finding the ultimate cause of a problem
after a crash without a source level debugger.

Note: To follow stackframes requires a knowledge of C or Pascal &
limited knowledge of one assembly language.

It should be noted that there are some differences between the
s/390 & z/Architecture stack layouts, as the z/Architecture stack layout didn't have
to maintain compatibility with older linkage formats.

Glossary:
---------
alloca:
This is a built-in compiler function for runtime allocation
of extra space on the caller's stack, which is automatically freed
up on function exit ( e.g. the caller may choose to allocate nothing
or a buffer of 4k if required for temporary purposes ). It generates
very efficient code ( a few cycles ) when compared to alternatives
like malloc.

automatics: These are local variables on the stack,
i.e. they aren't in registers & they aren't static.

back-chain:
This is a pointer to the stack pointer before entering a
framed function's ( see frameless function ) prologue, got by
dereferencing the address of the current stack pointer,
i.e. got by accessing the 32 bit ( 64 bit on z/Architecture ) value at the
stack pointer's current location.

base-pointer:
This is a pointer to the back of the literal pool, which
is an area just behind each procedure used to store constants
in each function.

call-clobbered: The caller probably needs to save these registers if there
is something of value in them, on the stack or elsewhere, before making a
call to another procedure, so that it can restore them later.

epilogue:
The code generated by the compiler to return to the caller.

frameless-function:
A frameless function in Linux for s390 & z/Architecture is one which doesn't
need more than the register save area ( 96 bytes on s/390, 160 on z/Architecture )
given to it by the caller.
A frameless function never:
1) Sets up a back chain.
2) Calls alloca.
3) Calls other normal functions.
4) Has automatics.

GOT-pointer:
This is a pointer to the global-offset-table in ELF
( Executable & Linkable Format, Linux's most common executable format );
all globals & shared library objects are found using this pointer.

lazy-binding:
With lazy binding, the addresses of routines in ELF shared libraries are
only resolved when the routines are actually first called at runtime,
rather than when the program starts.

procedure-linkage-table:
This is a table found from the GOT which contains pointers to routines
in other shared libraries which can't be called by easier means.

prologue:
The code generated by the compiler to set up the stack frame.

outgoing-args:
This is an extra area allocated on the stack of the calling function if the
parameters for the callee cannot all be put in registers; the same
area can be reused by each function the caller calls.

routine-descriptor:
A COFF executable format based concept of a procedure reference
actually being 8 bytes or more, as opposed to a simple pointer to the routine.
This is typically defined as follows:
Routine Descriptor offset 0=Pointer to Function
Routine Descriptor offset 4=Pointer to Table of Contents
The table of contents/TOC is roughly equivalent to a GOT pointer,
& it means that shared libraries etc. can be shared between several
environments, each with their own TOC.


static-chain: This is used in nested functions, a concept adopted from Pascal
by gcc & not used in ANSI C or C++ ( although quite useful ); basically it
is a pointer used to reference local variables of enclosing functions.
You might come across this stuff once or twice in your lifetime.

e.g.
The function below should return 11, though gcc may get upset & toss warnings
about the unused parameter a.
int FunctionA(int a)
{
        int b;
        void FunctionC(int c)   /* nested function, a GNU C extension */
        {
                b=c+1;          /* b is reached through the static chain */
        }
        FunctionC(10);
        return(b);
}


s/390 & z/Architecture Register usage
=====================================
r0       used by syscalls/assembly                  call-clobbered
r1       used by syscalls/assembly                  call-clobbered
r2       argument 0 / return value 0                call-clobbered
r3       argument 1 / return value 1 (if long long) call-clobbered
r4       argument 2                                 call-clobbered
r5       argument 3                                 call-clobbered
r6       argument 4                                 saved
r7       pointer-to arguments 5 to ...              saved
r8       this & that                                saved
r9       this & that                                saved
r10      static-chain ( if nested function )        saved
r11      frame-pointer ( if function used alloca )  saved
r12      got-pointer                                saved
r13      base-pointer                               saved
r14      return-address                             saved
r15      stack-pointer                              saved

f0       argument 0 / return value ( float/double ) call-clobbered
f2       argument 1                                 call-clobbered
f4       z/Architecture argument 2                  saved
f6       z/Architecture argument 3                  saved
The remaining floating points
f1,f3,f5 f7-f15 are call-clobbered.

Notes:
------
1) The only requirement is that registers which are used
by the callee are saved, e.g. the compiler is perfectly
capable of using r11 for purposes other than a
frame pointer if a frame pointer is not needed.
2) In functions with variable arguments, e.g. printf, the calling procedure
is identical to one without variable arguments & the same number of
parameters. However, the prologue of this function is somewhat more
hairy owing to it having to move these parameters to the stack to
get va_start, va_arg & va_end to work.
3) Access registers are currently unused by gcc but are used in
the kernel. Possibilities exist to use them at the moment for
temporary storage but it isn't recommended.
4) Only 4 of the floating point registers are used for
parameter passing, as older machines such as G3 have only 4
& it keeps the stack frame compatible with other compilers.
However, with IEEE floating point emulation under Linux on the
older machines you are free to use the other 12.
5) A long long or double parameter cannot have the
first 4 bytes in a register & the second four bytes in the
outgoing args area. It must be purely in the outgoing args
area if crossing this boundary.
6) Floating point parameters are mixed with outgoing args
on the outgoing args area in the order they are passed in as parameters.
7) Floating point arguments 2 & 3 are saved in the outgoing args area for
z/Architecture.
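Note 5 together with the register table can be turned into a small model of how 31-bit integer arguments get registers. This is a simplified sketch & not gcc's actual code. It ignores floating point & struct arguments, & it assumes that once an argument has spilled to the stack, later integer arguments go on the stack too; the text does not say what happens to them. The macro & function names are made up.

```c
/*
 * Integer arguments use r2-r6 in order: one register for a 4 byte
 * argument, two consecutive registers for an 8 byte long long.
 * An 8 byte argument is never split between r6 & the outgoing args area.
 */
#define FIRST_ARG_GPR 2
#define LAST_ARG_GPR  6
#define ON_STACK      (-1)

/*
 * Returns the first register used for an argument of 4 or 8 bytes, or
 * ON_STACK if it goes in the outgoing args area.  *next_gpr is the next
 * free argument register; it keeps advancing even when an argument
 * spills (the assumption mentioned above).
 */
static int assign_gpr(int *next_gpr, int size)
{
	int regs_needed = (size == 8) ? 2 : 1;
	int first = *next_gpr;

	*next_gpr = first + regs_needed;
	if (first + regs_needed - 1 > LAST_ARG_GPR)
		return ON_STACK;	/* would not fit completely in registers */
	return first;
}
```

With four ints already in r2-r5, a following long long finds only r6 free, so the model puts it entirely in the outgoing args area, as note 5 requires.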


Stack Frame Layout
------------------
s/390     z/Architecture
0         0             back chain ( a 0 here signifies end of back chain )
4         8             eos ( end of stack, not used on Linux for S390 used in other linkage formats )
8         16            glue used in other s/390 linkage formats for saved routine descriptors etc.
12        24            glue used in other s/390 linkage formats for saved routine descriptors etc.
16        32            scratch area
20        40            scratch area
24        48            saved r6 of caller function
28        56            saved r7 of caller function
32        64            saved r8 of caller function
36        72            saved r9 of caller function
40        80            saved r10 of caller function
44        88            saved r11 of caller function
48        96            saved r12 of caller function
52        104           saved r13 of caller function
56        112           saved r14 of caller function
60        120           saved r15 of caller function
64        128           saved f4 of caller function
72        132           saved f6 of caller function
80                      undefined
96        160           outgoing args passed from caller to callee
96+x      160+x         possible stack alignment ( 8 bytes desirable )
96+x+y    160+x+y       alloca space of caller ( if used )
96+x+y+z  160+x+y+z     automatics of caller ( if used )
0                       back-chain
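The s/390 column of the table can be written down as a C struct, with compile-time checks that the offsets come out as listed. This is a sketch for reading dumps on any host. Fixed-width types stand in for the 4 byte general registers & 8 byte floating point registers, the field names are made up, & the z/Architecture column is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* The first 96 bytes of a 31-bit s/390 stack frame. */
struct s390_frame_header {
	uint32_t back_chain;	/*  0: previous stack pointer, 0 ends the chain */
	uint32_t eos;		/*  4: end of stack, unused on Linux */
	uint32_t glue[2];	/*  8, 12: other linkage formats only */
	uint32_t scratch[2];	/* 16, 20 */
	uint32_t gprs[10];	/* 24-63: saved r6 ... r15 */
	uint64_t f4;		/* 64: saved f4 */
	uint64_t f6;		/* 72: saved f6 */
	uint32_t undefined[4];	/* 80-95 */
};

/* Compile-time checks that the struct matches the table. */
_Static_assert(offsetof(struct s390_frame_header, gprs) == 24, "r6 slot");
_Static_assert(offsetof(struct s390_frame_header, f4) == 64, "f4 slot");
_Static_assert(offsetof(struct s390_frame_header, f6) == 72, "f6 slot");
_Static_assert(sizeof(struct s390_frame_header) == 96, "register save area");
```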

A sample program with comments.
===============================

Comments on the function test
-----------------------------
1) It didn't need to set up a pointer to the constant pool in gpr13 as it isn't used,
but it did anyway ( :-( ).
2) This is a frameless function & no stack is bought.
3) The compiler was clever enough to recognise that it could return the
value in r2 as well as use it for the passed in parameter ( :-) ).
4) The basr ( branch & save register ) trick works as follows: the instruction
has a special case where r0 as the branch address operand is understood as
"don't branch" ( r0 means the literal value 0 in some instruction operands; some
RISC architectures also do this ). So basr %r13,%r0 just falls through to the next
instruction & leaves the address of that next instruction in r13.
Now we subtract the size of the function prologue we have executed
+ the size of the literal pool to get to the top of the literal pool.
0040037c int test(int b)
{                                                         # Function prologue below
  40037c:       90 de f0 34     stm    %r13,%r14,52(%r15) # Save registers r13 & r14
  400380:       0d d0           basr   %r13,%r0           # Set up pointer to constant pool using
  400382:       a7 da ff fa     ahi    %r13,-6            # basr trick
        return(5+b);
                                                          # Huge main program
  400386:       a7 2a 00 05     ahi    %r2,5              # add 5 to r2

                                                          # Function epilogue below
  40038a:       98 de f0 34     lm     %r13,%r14,52(%r15) # restore registers r13 & r14
  40038e:       07 fe           br     %r14               # return
}
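The address arithmetic of the basr trick can be checked with a few lines of C. In test, basr sits at 0x400380, so r13 becomes 0x400382 & ahi %r13,-6 takes it back to 0x40037c, the start of the function ( 6 prologue bytes, no literal pool ). The function names are made up for this sketch.

```c
#include <stdint.h>

/* basr %r13,%r0 is 2 bytes long, does not branch, & leaves the address
 * of the next instruction in r13. */
#define BASR_LEN 2u

static uint32_t r13_after_basr(uint32_t basr_addr)
{
	return basr_addr + BASR_LEN;
}

/*
 * The following ahi adds a negative immediate: minus (prologue bytes
 * executed so far + literal pool size).  Unsigned wrap-around makes adding
 * the converted value the same as subtracting its magnitude.
 */
static uint32_t literal_pool_base(uint32_t basr_addr, int32_t ahi_imm)
{
	return r13_after_basr(basr_addr) + (uint32_t)ahi_imm;
}
```

The same arithmetic explains the ahi %r13,-16 in main: 12 prologue bytes from 0x400394 to 0x4003a0, plus the 4 byte literal pool at 0x400390.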

Comments on the function main
-----------------------------
1) The compiler did this function optimally ( 8-) )

Literal pool for main.
  400390:       ff ff ff ec     .long  0xffffffec
main(int argc,char *argv[])
{                                                         # Function prologue below
  400394:       90 bf f0 2c     stm    %r11,%r15,44(%r15) # Save necessary registers
  400398:       18 0f           lr     %r0,%r15           # copy stack pointer to r0
  40039a:       a7 fa ff a0     ahi    %r15,-96           # Make area for callee saving
  40039e:       0d d0           basr   %r13,%r0           # Set up r13 to point to
  4003a0:       a7 da ff f0     ahi    %r13,-16           # literal pool
  4003a4:       50 00 f0 00     st     %r0,0(%r15)        # Save backchain

        return(test(5));                                  # Main Program Below
  4003a8:       58 e0 d0 00     l      %r14,0(%r13)       # load relative address of test from
                                                          # literal pool
  4003ac:       a7 28 00 05     lhi    %r2,5              # Set first parameter to 5
  4003b0:       4d ee d0 00     bas    %r14,0(%r14,%r13)  # jump to test setting r14 as return
                                                          # address using branch & save instruction.

                                                          # Function Epilogue below
  4003b4:       98 bf f0 8c     lm     %r11,%r15,140(%r15) # Restore necessary registers.
  4003b8:       07 fe           br     %r14               # return to do program exit
}


Compiler updates
----------------

main(int argc,char *argv[])
{
  4004fc:       90 7f f0 1c     stm    %r7,%r15,28(%r15)
  400500:       a7 d5 00 04     bras   %r13,400508 <main+0xc>
  400504:       00 40 04 f4     .long  0x004004f4
  # the compiler now puts the constant pool in the code so that it saves an instruction
  400508:       18 0f           lr     %r0,%r15
  40050a:       a7 fa ff a0     ahi    %r15,-96
  40050e:       50 00 f0 00     st     %r0,0(%r15)
        return(test(5));
  400512:       58 10 d0 00     l      %r1,0(%r13)
  400516:       a7 28 00 05     lhi    %r2,5
  40051a:       0d e1           basr   %r14,%r1
  # the compiler adds 1 extra instruction to the epilogue; this is done to
  # avoid processor pipeline stalls owing to data dependencies on g5 &
  # above, as register 14 in the old code was needed directly after being loaded
  # by the lm %r11,%r15,140(%r15) for the br %r14.
  40051c:       58 40 f0 98     l      %r4,152(%r15)
  400520:       98 7f f0 7c     lm     %r7,%r15,124(%r15)
  400524:       07 f4           br     %r4
}


Hartmut ( our compiler developer ) also has been threatening to take out the
stack backchain in optimised code as this also causes pipeline stalls; you
have been warned.

64 bit z/Architecture code disassembly
--------------------------------------

If you understand the stuff above you'll understand the stuff
below too, so I'll avoid repeating myself & just say that
some of the instructions have g's on the end of them to indicate
they are 64 bit & the stack offsets are bigger.
The only other difference you'll find between 32 & 64 bit is that
we now use f4 & f6 for floating point arguments on 64 bit.
00000000800005b0 <test>:
int test(int b)
{
        return(5+b);
    800005b0:   a7 2a 00 05             ahi     %r2,5
    800005b4:   b9 14 00 22             lgfr    %r2,%r2  # sign-extend the 32 bit int result to 64 bits
    800005b8:   07 fe                   br      %r14
    800005ba:   07 07                   bcr     0,%r7    # padding, a branch that is never taken


}

00000000800005bc <main>:
main(int argc,char *argv[])
{
    800005bc:   eb bf f0 58 00 24       stmg    %r11,%r15,88(%r15)
    800005c2:   b9 04 00 1f             lgr     %r1,%r15
    800005c6:   a7 fb ff 60             aghi    %r15,-160
    800005ca:   e3 10 f0 00 00 24       stg     %r1,0(%r15)
        return(test(5));
    800005d0:   a7 29 00 05             lghi    %r2,5
    # brasl allows jumps > 64k & is overkill here; bras would do fine
    800005d4:   c0 e5 ff ff ff ee       brasl   %r14,800005b0 <test>
    800005da:   e3 40 f1 10 00 04       lg      %r4,272(%r15)
    800005e0:   eb bf f0 f8 00 04       lmg     %r11,%r15,248(%r15)
    800005e6:   07 f4                   br      %r4
}
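The displacements in these listings are not magic: 44 & 140, 28 & 124, 152, 88 & 248 & 272 all follow from the stack frame layout table & the frame sizes of 96 & 160 bytes. A small sketch that reproduces them ( the struct & function names are made up ):

```c
#include <stddef.h>

/*
 * The r6 save slot is at offset 24 (s/390) or 48 (z/Architecture) & r6 ...
 * r15 follow at 4 or 8 byte intervals.  After the prologue has lowered
 * r15 by the frame size (96 or 160), the caller's save area is that
 * many bytes higher.
 */
struct frame_abi {
	size_t r6_slot;		/* offset of the r6 save slot */
	size_t gpr_size;	/* bytes per saved general register */
	size_t frame_size;	/* register save area bought by the prologue */
};

static const struct frame_abi abi_s390 = { 24, 4, 96 };
static const struct frame_abi abi_zarch = { 48, 8, 160 };

/* Offset of rN's save slot from r15 on function entry (stm/stmg). */
static size_t slot_at_entry(const struct frame_abi *abi, int n)
{
	return abi->r6_slot + (size_t)(n - 6) * abi->gpr_size;
}

/* Offset of the same slot once r15 has been lowered (lm/lmg, l/lg). */
static size_t slot_after_prologue(const struct frame_abi *abi, int n)
{
	return abi->frame_size + slot_at_entry(abi, n);
}
```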



Compiling programs for debugging on Linux for s/390 & z/Architecture
====================================================================
-gdwarf-2 now works & it should be considered the default debugging
format for s/390 & z/Architecture, as it is more reliable for debugging
shared libraries. Normal -g debugging works much better now too,
thanks to the IBM Java compiler developers' bug reports.

This is typically done by adding/appending the flags -g or -gdwarf-2 to the
CFLAGS & LDFLAGS variables in the Makefile of the program concerned.

If using gdb & you would like accurate displays of registers &
stack traces, compile without optimisation, i.e. make sure
that there is no -O2 or similar on the CFLAGS line of the Makefile &
the emitted gcc commands. Obviously this will produce worse code
( not advisable for shipment ) but it is an aid to the debugging process.

This aids debugging because the compiler will copy parameters passed in
in registers onto the stack, so backtracing & looking at passed-in
parameters will work; however, some larger programs which use inline functions
will not compile without optimisation.

Debugging with optimisation has since much improved after fixing
some bugs; please make sure you are using gdb-5.0 or later, developed
after Nov 2000.

Figuring out gcc compile errors
===============================
If you are getting a lot of syntax errors compiling a program & the problem
isn't blatantly obvious from the source,
it often helps to just preprocess the file; this is done with the -E
option in gcc.
What this does is run through the very first phase of compilation
( compilation in gcc is done in several stages & gcc calls many programs to
achieve its end result ); with the -E option gcc just calls the gcc preprocessor (cpp).
The C preprocessor does the following: it joins all the files #included together
recursively ( #include files can #include other files ) & also the C file you wish to compile.
It records the fully qualified path of each #included file in a marker line starting with #,
& it does macro expansion.
This is useful for debugging because:
1) You can double check whether the files you expect to be included are the ones
that are being included ( e.g. double check that you aren't going to the i386 asm directory ).
2) You can check that macro definitions aren't clashing with typedefs.
3) You can check that definitions aren't being used before they are being included.
4) It helps put the line emitting the error under the microscope if it contains macros.

For convenience the Linux kernel's Makefile will do preprocessing automatically for you
if you suffix the file you want built with .i ( instead of .o ).

e.g.
from the linux directory type
make arch/s390/kernel/signal.i
this will build

s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer
-fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -E arch/s390/kernel/signal.c
> arch/s390/kernel/signal.i

Now look at signal.i; you should see something like:


# 1 "/home1/barrow/linux/include/asm/types.h" 1
typedef unsigned short umode_t;
typedef __signed__ char __s8;
typedef unsigned char __u8;
typedef __signed__ short __s16;
typedef unsigned short __u16;

If instead you are getting errors further down, e.g.
unknown instruction:2515 "move.l" or better still unknown instruction:2515
"Fixme not implemented yet, call Martin", you are probably attempting to compile some code
meant for another architecture, or code that is simply not implemented, with a fixme statement
stuck into the inline assembly code so that the author of the file now knows he has work to do.
To look at the assembly emitted by gcc just before it is about to call gas ( the GNU assembler )
use the -S option.
Again, for your convenience the Linux kernel's Makefile will hold your hand &
do all this donkey work for you if you build the file with the .s suffix.
e.g.
from the Linux directory type
make arch/s390/kernel/signal.s

s390-gcc -D__KERNEL__ -I/home1/barrow/linux/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer
-fno-strict-aliasing -D__SMP__ -pipe -fno-strength-reduce -S arch/s390/kernel/signal.c
-o arch/s390/kernel/signal.s


This will output something like the following ( please note the constant pool & the useful comments
in the prologue to give you a hand at interpreting it ):

.LC54:
        .string "misaligned (__u16 *) in __xchg\n"
.LC57:
        .string "misaligned (__u32 *) in __xchg\n"
.L$PG1: # Pool sys_sigsuspend
.LC192:
        .long   -262401
.LC193:
        .long   -1
.LC194:
        .long   schedule-.L$PG1
.LC195:
        .long   do_signal-.L$PG1
        .align 4
.globl sys_sigsuspend
        .type    sys_sigsuspend,@function
sys_sigsuspend:
#       leaf function           0
#       automatics              16
#       outgoing args           0
#       need frame pointer      0
#       call alloca             0
#       has varargs             0
#       incoming args (stack)   0
#       function length         168
        STM     8,15,32(15)
        LR      0,15
        AHI     15,-112
        BASR    13,0
.L$CO1: AHI     13,.L$PG1-.L$CO1
        ST      0,0(15)
        LR      8,2
        N       5,.LC192-.L$PG1(13)

Adding -g to the above command makes the output even more useful,
e.g. typing
make CC:="s390-gcc -g" kernel/sched.s

which compiles:
s390-gcc -g -D__KERNEL__ -I/home/barrow/linux-2.3/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -fno-strength-reduce -S kernel/sched.c -o kernel/sched.s

also outputs stabs ( debugger ) info; from this info you can find out the
offsets & sizes of various elements in structures.
e.g. the stab for the structure
struct rlimit {
        unsigned long   rlim_cur;
        unsigned long   rlim_max;
};
is
.stabs "rlimit:T(151,2)=s8rlim_cur:(0,5),0,32;rlim_max:(0,5),32,32;;",128,0,0,0
From this stab you can see that
rlim_cur starts at bit offset 0 & is 32 bits in size
rlim_max starts at bit offset 32 & is 32 bits in size.
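The same numbers can be cross-checked from C with offsetof & sizeof. In the stab, s8 is the structure size in bytes & each field entry reads name:type,bit offset,bit size. In this sketch uint32_t stands in for the 32 bit unsigned long of s/390 so the layout comes out the same on any host; the struct name rlimit31 & the helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 31-bit s/390 struct rlimit (unsigned long is 32 bits there). */
struct rlimit31 {
	uint32_t rlim_cur;
	uint32_t rlim_max;
};

/* Bit offsets, as a stab reports them: byte offset * 8 */
static size_t rlim_cur_bit_offset(void)
{
	return offsetof(struct rlimit31, rlim_cur) * 8;
}

static size_t rlim_max_bit_offset(void)
{
	return offsetof(struct rlimit31, rlim_max) * 8;
}
```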


Debugging Tools:
================

objdump
=======
This is a tool with many options, the most useful being ( if compiled with -g ):
objdump --source <victim program or object file> > <victim's debug listing>


The whole kernel can be compiled like this ( doing this will make a 17MB kernel
& a 200MB listing ); however, you have to strip it with the strip command
before building the image, to make it a more reasonable size to boot.

A source/assembly mixed dump of the kernel can be done with the line
objdump --source vmlinux > vmlinux.lst
Also, if the file isn't compiled -g, this will output as much debugging information
as it can ( e.g. function names ); however, this is very slow as it spends lots
of time searching for debugging info. The following self-explanatory line should be used
instead if the code isn't compiled -g, as it is much faster:
objdump --disassemble-all --syms vmlinux > vmlinux.lst

As hard drive space is valuable, most of us use the following approach.
1) Look at the emitted PSW on the console to find the crash address in the kernel.
2) Look at the file System.map ( in the linux directory ) produced when building
the kernel to find the closest address less than or equal to the current PSW, to find the
offending function.
3) Use grep or similar to search the source tree looking for the source file
with this function if you don't know where it is.
4) Rebuild this object file with -g on; as an example suppose the file was
( arch/s390/kernel/signal.o )
5) Assuming the file with the erroneous function is signal.c, move to the base of the
Linux source tree.
6) rm arch/s390/kernel/signal.o
7) make arch/s390/kernel/signal.o
8) Watch the gcc command line emitted.
9) Type it in again, or alternatively cut & paste it on the console, adding the -g option.
10) objdump --source arch/s390/kernel/signal.o > signal.lst
This will output the source & the assembly intermixed, as the snippet below shows.
Unfortunately this will output addresses which aren't the same
as the kernel ones; you should be able to get around the mental arithmetic
by playing with the --adjust-vma parameter to objdump.




extern inline void spin_lock(spinlock_t *lp)
{
      a0:       18 34           lr      %r3,%r4
      a2:       a7 3a 03 bc     ahi     %r3,956
        __asm__ __volatile("    lhi   1,-1\n"
      a6:       a7 18 ff ff     lhi     %r1,-1
      aa:       1f 00           slr     %r0,%r0
      ac:       ba 01 30 00     cs      %r0,%r1,0(%r3)
      b0:       a7 44 ff fd     jm      aa <sys_sigsuspend+0x2e>
        saveset = current->blocked;
      b4:       d2 07 f0 68     mvc     104(8,%r15),972(%r4)
      b8:       43 cc
        return (set->sig[0] & mask) != 0;
}

If debugging under VM, go down to that section in the document for more info.
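Step 2 above is a simple search, & if you do it often it can be scripted. This C sketch assumes the System.map entries have already been parsed into an array sorted by address ( the parsing is left out ). The struct & function names, & the symbol addresses in any example, are made up. It also masks off the top bit of a 31-bit PSW address, which on s/390 indicates the addressing mode rather than being part of the address.

```c
#include <stddef.h>
#include <stdint.h>

/* One System.map line: address & symbol name (the type letter is omitted). */
struct map_entry {
	uint32_t addr;
	const char *name;
};

/*
 * Return the entry with the greatest address <= psw_addr, i.e. the function
 * the crash address falls in, & store the offset into it in *offset.
 * map must be sorted by address, as System.map is.  Returns NULL if the
 * address lies below every symbol.
 */
static const struct map_entry *lookup_psw(const struct map_entry *map, size_t n,
					  uint32_t psw_addr, uint32_t *offset)
{
	size_t lo = 0, hi = n;

	psw_addr &= 0x7fffffffu;	/* drop the 31-bit addressing-mode bit */

	/* binary search for the first entry above psw_addr */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (map[mid].addr <= psw_addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NULL;
	*offset = psw_addr - map[lo - 1].addr;
	return &map[lo - 1];
}
```

With sys_sigsuspend placed at a made-up 0x12000, a PSW address of 0x8001202e comes back as sys_sigsuspend+0x2e, the same notation objdump uses in the listing above.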
890
891
892I now have a tool which takes the pain out of --adjust-vma
893& you are able to do something like
894make /arch/s390/kernel/traps.lst
895& it automatically generates the correctly relocated entries for
896the text segment in traps.lst.
897This tool is now standard in linux distro's in scripts/makelst
898
899strace:
900-------
901Q. What is it ?
902A. It is a tool for intercepting calls to the kernel & logging them
903to a file & on the screen.
904
905Q. What use is it ?
906A. You can used it to find out what files a particular program opens.
907
908
909
910Example 1
911---------
912If you wanted to know does ping work but didn't have the source
913strace ping -c 1 127.0.0.1
914& then look at the man pages for each of the syscalls below,
915( In fact this is sometimes easier than looking at some spagetti
916source which conditionally compiles for several architectures )
917Not everything that it throws out needs to make sense immeadiately
918
919Just looking quickly you can see that it is making up a RAW socket
920for the ICMP protocol.
921Doing an alarm(10) for a 10 second timeout
922& doing a gettimeofday call before & after each read to see
923how long the replies took, & writing some text to stdout so the user
924has an idea what is going on.
925
926socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3
927getuid() = 0
928setuid(0) = 0
929stat("/usr/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory)
930stat("/usr/share/locale/libc/C", 0xbffff134) = -1 ENOENT (No such file or directory)
931stat("/usr/local/share/locale/C/libc.cat", 0xbffff134) = -1 ENOENT (No such file or directory)
932getpid() = 353
933setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
934setsockopt(3, SOL_SOCKET, SO_RCVBUF, [49152], 4) = 0
935fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(3, 1), ...}) = 0
936mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40008000
937ioctl(1, TCGETS, {B9600 opost isig icanon echo ...}) = 0
938write(1, "PING 127.0.0.1 (127.0.0.1): 56 d"..., 42PING 127.0.0.1 (127.0.0.1): 56 data bytes
939) = 42
940sigaction(SIGINT, {0x8049ba0, [], SA_RESTART}, {SIG_DFL}) = 0
941sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {SIG_DFL}) = 0
942gettimeofday({948904719, 138951}, NULL) = 0
943sendto(3, "\10\0D\201a\1\0\0\17#\2178\307\36"..., 64, 0, {sin_family=AF_INET,
944sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 64
945sigaction(SIGALRM, {0x8049600, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0
946sigaction(SIGALRM, {0x8049ba0, [], SA_RESTART}, {0x8049600, [], SA_RESTART}) = 0
947alarm(10) = 0
948recvfrom(3, "E\0\0T\0005\0\0@\1|r\177\0\0\1\177"..., 192, 0,
949{sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84
950gettimeofday({948904719, 160224}, NULL) = 0
951recvfrom(3, "E\0\0T\0006\0\0\377\1\275p\177\0"..., 192, 0,
952{sin_family=AF_INET, sin_port=htons(50882), sin_addr=inet_addr("127.0.0.1")}, [16]) = 84
953gettimeofday({948904719, 166952}, NULL) = 0
954write(1, "64 bytes from 127.0.0.1: icmp_se"...,
9555764 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=28.0 ms
956
957Example 2
958---------
959strace passwd 2>&1 | grep open
960produces the following output
961open("/etc/ld.so.cache", O_RDONLY) = 3
962open("/opt/kde/lib/libc.so.5", O_RDONLY) = -1 ENOENT (No such file or directory)
963open("/lib/libc.so.5", O_RDONLY) = 3
964open("/dev", O_RDONLY) = 3
965open("/var/run/utmp", O_RDONLY) = 3
966open("/etc/passwd", O_RDONLY) = 3
967open("/etc/shadow", O_RDONLY) = 3
968open("/etc/login.defs", O_RDONLY) = 4
969open("/dev/tty", O_RDONLY) = 4
970
971The 2>&1 is done to redirect stderr to stdout & grep is then filtering this input
972through the pipe for each line containing the string open.
973
974
Example 3
---------
Getting sophisticated:
telnetd crashes & I don't know why.
Steps
-----
1) Replace the following line in /etc/inetd.conf
telnet stream tcp nowait root /usr/sbin/in.telnetd -h
with
telnet stream tcp nowait root /blah

2) Create the file /blah with the following contents to start tracing telnetd
#!/bin/bash
/usr/bin/strace -o/t1 -f /usr/sbin/in.telnetd -h
3) chmod 700 /blah to make it executable only to root
4) Make inetd reread /etc/inetd.conf with
killall -HUP inetd
or do ps aux | grep inetd
to get inetd's process id
& kill -HUP <inetd's process id>.

Important options
-----------------
-o is used to tell strace to output to a file, in our case t1 in the root directory.
-f is to follow children, e.g. in our case above telnetd will start the login process
& subsequently a shell like bash.
You will be able to tell which is which from the process ID's listed on the left hand side
of the strace output.
-p<pid> will tell strace to attach to a running process. Yup, this can be done provided
it isn't being traced or debugged already & you have enough privileges.
The reason 2 processes cannot trace or debug the same program is that strace
becomes the parent process of the one being debugged & processes ( unlike people )
can have only one parent.


However the file /t1 will get big quite quickly.
To test it, telnet 127.0.0.1

Now look at what files in.telnetd execve'd:
413 execve("/usr/sbin/in.telnetd", ["/usr/sbin/in.telnetd", "-h"], [/* 17 vars */]) = 0
414 execve("/bin/login", ["/bin/login", "-h", "localhost", "-p"], [/* 2 vars */]) = 0

Whey, it worked!


Other hints:
------------
If the program is not very interactive ( i.e. not much keyboard input )
& is crashing in one architecture but not in another, you can do
an strace of both programs under as identical a scenario as you can
on both architectures, outputting to a file, then
do a diff of the two traces using the diff program
i.e.
diff output1 output2
& maybe you'll be able to see where the call paths differed; this
is possibly near the cause of the crash.

More info
---------
Look at the man pages for strace & the various syscalls
e.g. man strace, man alarm, man socket.


Performance Debugging
=====================
gcc is capable of compiling in profiling code: just add the -pg option
to the CFLAGS ( -p does the same for the older prof tool ); this obviously
affects program size & performance.
The resulting profile can be read by the gprof gnu profiling tool.
gcov, the gnu code coverage tool, needs -fprofile-arcs -ftest-coverage
instead ( code coverage is a means of testing code quality by checking
if all the code in an executable is exercised by a tester ).


Using top to find out where processes are sleeping in the kernel
----------------------------------------------------------------
To do this copy the System.map from the root directory where
the linux kernel was built to the /boot directory on your
linux machine.
Start top.
Now type fU<return>.
You should see a new field called WCHAN which
tells you where each process is sleeping; here is a typical output.

 6:59pm up 41 min, 1 user, load average: 0.00, 0.00, 0.00
28 processes: 27 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.1% system, 0.0% nice, 99.8% idle
Mem: 254900K av, 45976K used, 208924K free, 0K shrd, 28636K buff
Swap: 0K av, 0K used, 0K free 8620K cached

  PID USER     PRI  NI  SIZE  RSS SHARE WCHAN     STAT LIB %CPU %MEM   TIME COMMAND
  750 root      12   0   848  848   700 do_select S      0  0.1  0.3   0:00 in.telnetd
  767 root      16   0  1140 1140   964           R      0  0.1  0.4   0:00 top
    1 root       8   0   212  212   180 do_select S      0  0.0  0.0   0:00 init
    2 root       9   0     0    0     0 down_inte SW     0  0.0  0.0   0:00 kmcheck
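
Kernels that provide /proc/<pid>/wchan expose the same wait channel information for a single
process without needing top. The C sketch below reads that file; read_wchan is a hypothetical
helper, & what the file contains ( a symbol name, or 0 when the process is running or the
kernel chooses to hide it ) depends on the kernel version & configuration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the WCHAN of process pid into buf.
 * Returns 0 on success, -1 if len is 0 or /proc/<pid>/wchan
 * can't be opened (no such process, no /proc, ...).
 */
int read_wchan(long pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;

    if (len == 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/wchan", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL)
        buf[0] = '\0';          /* empty file: nothing to report */
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* drop a trailing newline, if any */
    return 0;
}
```

e.g. read_wchan(750, buf, sizeof(buf)) would be expected to give something like do_select
for the in.telnetd above, provided the kernel shows symbol names there.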

The time command
----------------
Another related command is the time command which gives you an indication
of where a process is spending the majority of its time ( real is the elapsed
wall clock time, user & sys are the cpu time spent in user space & in the kernel ).
e.g.
time ping -c 5 nc
outputs
real    0m4.054s
user    0m0.010s
sys     0m0.010s

Debugging under VM
==================

Notes
-----
Addresses & values in the VM debugger are always hex, never decimal.
Address ranges are of the format <HexValue1>-<HexValue2> ( start & end address )
or <HexValue1>.<HexValue2> ( start address & length ),
e.g. the address range 0x2000 to 0x3000 can be described as
2000-3000 or 2000.1000

The VM Debugger is case insensitive.

VM's strengths are usually other debuggers' weaknesses: you can get at any resource
no matter how sensitive, e.g. memory management resources, or change address translation
in the PSW. For kernel hacking you will reap dividends if you get good at it.

The VM Debugger displays operators but not operands, probably because some
of it was written when memory was expensive & the programmer was probably proud that
it fitted into 2k of memory, & the programmers didn't want to shock hardcore VM'ers by
changing the interface :-). Also the debugger displays useful information on the same line &
the author of the code probably felt that it was a good idea not to go over
the 80 columns on the screen.

As some of you are probably in a panic now: this isn't as unintuitive as it may seem,
as the 390 instructions are easy to decode mentally & you can make a good guess at a lot
of them as all the operands are nibble ( half byte ) aligned. If you have an objdump listing
also it is quite easy to follow; if you don't have an objdump listing keep a copy of
the s/390 Reference Summary & look between pages 2 & 7, or alternatively the
s/390 principles of operation.
e.g. even I can guess that
0001AFF8' LR 180F CC 0
is a ( load register ) lr r0,r15

Also it is very easy to tell the length of a 390 instruction from the 2 most significant
bits of its first byte ( not that this info is really useful except if you are trying to
make sense of a hexdump of code ).
Here is a table
Bits   Instruction Length
-------------------------
00     2 Bytes
01     4 Bytes
10     4 Bytes
11     6 Bytes
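
The table above is easy to turn into code. The sketch below ( insn_length & split_insns are
hypothetical names, not kernel functions ) returns an instruction's length from its first byte,
then uses that to split a run of hexdump bytes into instructions; the test bytes 180F A7DAFF0E
900EF068 are the LR, AHI & STM examples from this section.

```c
#include <stddef.h>

/* Length in bytes, from the top 2 bits of the first (opcode) byte. */
int insn_length(unsigned char first_byte)
{
    switch (first_byte >> 6) {
    case 0:
        return 2;
    case 1:
    case 2:
        return 4;
    default:            /* binary 11 */
        return 6;
    }
}

/*
 * Walk a buffer of code bytes & record the offset where each
 * instruction starts.  Stops at the end of the buffer, when starts[]
 * is full, or at a truncated instruction.  Returns the count found.
 */
size_t split_insns(const unsigned char *code, size_t len,
                   size_t *starts, size_t max_starts)
{
    size_t off = 0, n = 0;

    while (off < len && n < max_starts) {
        size_t ilen = (size_t)insn_length(code[off]);

        if (off + ilen > len)
            break;
        starts[n++] = off;
        off += ilen;
    }
    return n;
}
```

For those 10 bytes split_insns finds 3 instructions, starting at offsets 0, 2 & 6.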


The debugger also displays other useful info on the same line such as the
addresses being operated on, destination addresses of branches & condition codes.
e.g.
00019736' AHI A7DAFF0E CC 1
000198BA' BRC A7840004 -> 000198C2' CC 0
000198CE' STM 900EF068 >> 0FA95E78 CC 2


Useful VM debugger commands
---------------------------

I suppose I'd better mention this before I start:
to list the current active traces do
Q TR
There can be a maximum of 255 of these per set
( more about trace sets later ).
To stop traces issue a
TR END
To delete a particular breakpoint issue
TR DEL <breakpoint number>

The PA1 key drops to CP mode so you can issue debugger commands.
Doing alt c (on my 3270 console at least) clears the screen.
Hitting b <enter> comes back to the running operating system
from cp mode ( in our case linux ).
It is typically useful to add shortcuts to your profile.exec file
if you have one ( this is roughly equivalent to autoexec.bat in DOS );
here are a few from mine.
/* this gives me command history on issuing f12 */
set pf12 retrieve
/* this continues */
set pf8 imm b
/* goes to trace set a */
set pf1 imm tr goto a
/* goes to trace set b */
set pf2 imm tr goto b
/* goes to trace set c */
set pf3 imm tr goto c


Instruction Tracing
-------------------
Setting a simple breakpoint
TR I PSWA <address>
To debug a particular function try
TR I R <function address range>
TR I on its own will single step.
TR I DATA <MNEMONIC> <OPTIONAL RANGE> will trace for particular mnemonics
e.g.
TR I DATA 4D R 0197BC.4000
will trace for BAS'es ( opcode 4D ) in the range 0197BC.4000.
If you were inclined you could add traces for all branch instructions &
suffix them with the RUN option so you would have a backtrace on screen
when a program crashes.
TR BR <INTO OR FROM> will trace branches into or out of an address.
e.g.
TR BR INTO 0 is often quite useful if a program is getting awkward & deciding
to branch to 0 & crashing, as this will stop at the address before it jumps to 0.
TR I R <address range> RUN cmd d g
single steps a range of addresses but stays running &
displays the gprs on each step.



Displaying & modifying Registers
--------------------------------
D G will display all the gprs.
Adding an extra G to all the commands is necessary to access the full 64 bit
content in VM on z/Architecture; obviously this isn't required for access registers
as these are still 32 bit,
e.g. DGG instead of DG
D X will display all the control registers
D AR will display all the access registers
D AR4-7 will display access registers 4 to 7
CPU ALL D G will display the GPRs of all CPUs in the configuration
D PSW will display the current PSW
st PSW 2000 will put the value 2000 into the PSW &
crash your machine.
D PREFIX displays the prefix offset


Displaying Memory
-----------------
To display memory mapped using the current PSW's mapping try
D <range>
To make VM display a message each time it hits a particular address & continue,
see the cmd msg & RUN tricks under Hints below.
D I<range> will disassemble/display a range of instructions.
ST <addr> <32 bit word> will store a 32 bit word at a word aligned address.
D T<range> will display the EBCDIC in an address ( if you are that way inclined )
D R<range> will display real addresses ( without DAT ) but with prefixing.
There are other complex options to display if you need to get at, say, home space
while in primary space, but the easiest thing to do is to temporarily
modify the PSW to the other addressing mode, display the stuff & then
restore it.


Hints
-----
If you want to issue a debugger command without halting your virtual machine with the
PA1 key try prefixing the command with #CP e.g.
#cp tr i pswa 2000
Also suffixing most debugger commands with RUN will cause them not
to stop, just display the mnemonic at the current instruction on the console.
If you have several breakpoints you want to put into your program &
you get fed up of cross referencing with System.map
you can do the following trick for several symbols.
grep do_signal System.map
which emits the following among other things
0001f4e0 T do_signal
Now you can do

TR I PSWA 0001f4e0 cmd msg * do_signal
This sends a message to your own console each time do_signal is entered.
( As an aside I wrote a perl script once which automatically generated a REXX
script with breakpoints on every kernel procedure; this isn't a good idea
because there are thousands of these routines & VM can only set 255 breakpoints
at a time, so you nearly had to spend as long pruning the file down as you would
entering the msg's by hand ). However, the trick might be useful for a single object file.
On linux's 3270 emulator x3270 there is a very useful option under the file menu,
Save Screens In File; this is very good for keeping a copy of traces.

From CMS, help <command name> will give you online help on a particular command,
e.g.
HELP DISPLAY

Also CMS has a file called profile.exec which automatically gets called
on startup of CMS ( like autoexec.bat ). Keeping on a DOS analogy,
CP has a feature similar to doskey; it may be useful for you to
use profile.exec to define some keystrokes,
e.g.
SET PF9 IMM B
This does a single step in VM on pressing F9.
SET PF10 ^
This sets up the ^ key,
which can be used for ^c (ctrl-c) & ^z (ctrl-z), which can't be typed directly into some 3270 consoles.
SET PF11 ^-
This types the starting keystrokes for a sysrq ( see SysRq below ).
SET PF12 RETRIEVE
This retrieves command history on pressing F12.


Sometimes in VM the display is set up to scroll automatically; this
can be very annoying if there are messages you wish to look at.
To stop this do
TERM MORE 255 255
This will nearly stop automatic screen updates, however it will
cause a denial of service if lots of messages go to the 3270 console,
so it would be foolish to use this as the default on a production machine.


Tracing particular processes
----------------------------
The kernel's text segment is intentionally at an address in memory where it will
very seldom collide with text segments of user programs ( thanks Martin );
this simplifies debugging the kernel.
However it is quite common for user processes to have addresses which collide;
this can make debugging a particular process under VM painful under normal
circumstances as the process may change when doing a
TR I R <address range>.
Thankfully after reading VM's online help I figured out how to debug
a particular process.

Your first problem is to find the STD ( segment table designation )
of the program you wish to debug.
There are several ways you can do this; here are a few.
1) objdump --syms <program to be debugged> | grep main
to get the address of main in the program.
tr i pswa <address of main>
Start the program; if VM drops to CP on what looks like the entry
point of the main function this is most likely the process you wish to debug.
Now do a D X13 or D XG13 on z/Architecture.
On 31 bit the STD is bits 1-19 ( the STO, segment table origin )
& 25-31 ( the STL, segment table length ) of CR13.
Now type
TR I R STD <CR13's value> 0.7fffffff
e.g.
TR I R STD 8F32E1FF 0.7fffffff
Another very useful variation is
TR STORE INTO STD <CR13's value> <address range>
for finding out when a particular variable changes.
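
If you want to sanity check a CR13 value before using it, the two fields can be masked out
mechanically. IBM numbers bits from 0 at the most significant end, so bits 1-19 are mask
0x7FFFF000 & bits 25-31 are mask 0x0000007F. The sketch below ( split_std is a hypothetical
helper ) does just that; any other bits in the word are control bits & are simply ignored here.

```c
#include <stdint.h>

/* The two fields of a 31 bit STD, as described above. */
struct std_fields {
    uint32_t sto;   /* segment table origin, 4K aligned */
    uint32_t stl;   /* segment table length */
};

struct std_fields split_std(uint32_t cr13)
{
    struct std_fields f;

    f.sto = cr13 & 0x7FFFF000u;   /* bits 1-19 (plus 12 zero bits) */
    f.stl = cr13 & 0x0000007Fu;   /* bits 25-31 */
    return f;
}
```

For the example value 8F32E1FF this gives an origin of 0F32E000 & a length of 7F; note that
TR I R STD still wants the whole CR13 value, not just the origin.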

An alternative way of finding the STD of a currently running process
is to do the following ( this method is more complex but
could be quite convenient if you aren't updating the kernel much &
so your kernel structures will stay constant for a reasonable period of
time ).

grep task /proc/<pid>/status
From this you should see something like
task: 0f160000 ksp: 0f161de8 pt_regs: 0f161f68
This now gives you a pointer to the task structure.
Now make CC:="s390-gcc -g" kernel/sched.s
to get the task_struct stabinfo
( task_struct is defined in include/linux/sched.h ).
Now we want to look at
task->active_mm->pgd
On my machine the active_mm in the task structure stab is
active_mm:(4,12),672,32
Stabs give member offsets & sizes in bits, so its offset is 672/8=84=0x54.
The pgd member in the mm_struct stab is
pgd:(4,6)=*(29,5),96,32
so its offset is 96/8=12=0xc

So we'll do
hexdump -s 0xf160054 /dev/mem | more
i.e. task_struct+active_mm offset
to look at the active_mm member
f160054 0fee cc60 0019 e334 0000 0000 0000 0011
so active_mm is 0feecc60. Next do
hexdump -s 0x0feecc6c /dev/mem | more
i.e. active_mm+pgd offset,
& we get something like
feecc6c 0f2c 0000 0000 0001 0000 0001 0000 0010
so the pgd is 0f2c0000. Now do
TR I R STD <pgd|0x7f> 0.7fffffff
i.e. the 0x7f is added because the pgd only
gives the segment table origin & we need to set the low bits
to the maximum possible segment table length.
TR I R STD 0f2c007f 0.7fffffff
On z/Architecture you'll probably need to do
TR I R STD <pgd|0x7> 0.ffffffffffffffff
to set the TableType to 0x1 & the Table length to 3.
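
The arithmetic in the walk above can be checked mechanically. The helpers below are
hypothetical: stab_byte_offset turns a stab bit offset into a byte offset, std_from_pgd31
builds the 31 bit STD for TR I R STD, & asce_from_pgd64 the z/Architecture variant, assuming
( as the text does ) that the low bits of the pgd are free to hold the table type & length.

```c
#include <stdint.h>

/* stabs describe a struct member as name:type,bit_offset,bit_size */
uint32_t stab_byte_offset(uint32_t bit_offset)
{
    return bit_offset / 8;
}

/* 31 bit: OR the maximum segment table length (0x7f) into the pgd. */
uint32_t std_from_pgd31(uint32_t pgd)
{
    return pgd | 0x7Fu;
}

/* z/Architecture: table type 01 in bits 60-61 & length 3 in bits 62-63. */
uint64_t asce_from_pgd64(uint64_t pgd)
{
    return pgd | 0x7u;
}
```

With the numbers from the example: 0f160000 + 672/8 = 0f160054, 0feecc60 + 96/8 = 0feecc6c
& 0f2c0000 | 0x7f = 0f2c007f.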


Tracing Program Exceptions
--------------------------
If you get a crash which says something like
illegal operation or specification exception followed by a register dump,
you can restart linux & trace these using the tr prog <range or value> trace option.

The most common ones you will normally be tracing for are
( like everything else in VM these codes are hex, so 10 is 0x10 )
1=operation exception
2=privileged operation exception
4=protection exception
5=addressing exception
6=specification exception
10=segment translation exception
11=page translation exception

The full list of these is on page 22 of the current s/390 Reference Summary.
e.g.
tr prog 10 will trace segment translation exceptions.
tr prog on its own will trace all program interruption codes.
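
When you are staring at an interruption code in a register dump it can help to have the list
above in table form. The sketch below ( pgm_int_name is a hypothetical helper ) only knows the
common codes from this section; anything else returns NULL & should be looked up in the
Reference Summary or the principles of operation.

```c
#include <stddef.h>

/* Names for the common program interruption codes (hex values). */
const char *pgm_int_name(unsigned int code)
{
    static const struct {
        unsigned int code;
        const char *name;
    } table[] = {
        { 0x01, "operation exception" },
        { 0x02, "privileged operation exception" },
        { 0x04, "protection exception" },
        { 0x05, "addressing exception" },
        { 0x06, "specification exception" },
        { 0x10, "segment translation exception" },
        { 0x11, "page translation exception" },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].code == code)
            return table[i].name;
    return NULL;    /* not one of the common codes */
}
```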

Trace Sets
----------
On starting VM you are initially in the INITIAL trace set.
You can do a Q TR to verify this.
Trace sets help if you have a complex tracing situation where you wish, for instance,
to wait till a driver is open before you start tracing IO, but know in your
heart that you are going to have to make several runs through the code till you
have a clue what's going on.

What you can do is
TR I PSWA <Driver open address>
hit b to continue till breakpoint
reach the breakpoint
now do your
TR GOTO B
TR IO 7c08-7c09 inst int run
or whatever the IO channels you wish to trace are & hit b

To go back to the initial trace set do
TR GOTO INITIAL
& the TR I PSWA <Driver open address> will be the only active breakpoint again.


Tracing linux syscalls under VM
-------------------------------
Syscalls are implemented on Linux for S390 by the Supervisor call instruction (SVC); there are 256
possibilities of these as the instruction is made up of a 0xA opcode & the second byte being
the syscall number. They are traced using the simple command
TR SVC <Optional value or range>
The syscalls are defined in linux/include/asm-s390/unistd.h,
e.g. to trace all file opens just do
TR SVC 5 ( as this is the syscall number of open )
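
Since an SVC is just the 0x0A opcode followed by a one byte number, you can read the syscall
straight off a trace line such as SVC 0A05. A minimal sketch ( svc_number is a hypothetical
helper ) that decodes the halfword VM displays:

```c
#include <stdint.h>

/* Returns the SVC (syscall) number, or -1 if this isn't an SVC. */
int svc_number(uint16_t halfword)
{
    if ((halfword >> 8) != 0x0A)    /* first byte must be the SVC opcode */
        return -1;
    return halfword & 0xFF;         /* second byte is the number */
}
```

e.g. svc_number(0x0A05) gives 5, i.e. open, while the LR halfword 180F gives -1.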


SMP Specific commands
---------------------
To find out how many cpus you have do
Q CPUS
which displays all the CPU's available to your virtual machine.
To find the cpu that the current VM debugger commands are being directed at do
Q CPU
To change the cpu that VM debugger commands are being directed at do
CPU <desired cpu no>

On an SMP guest, to issue a command to all CPUs try prefixing the command with cpu all.
To issue a command to a particular cpu try cpu <cpu number> e.g.
CPU 01 TR I R 2000.3000
If you are running on a guest with several cpus & you have an IO related problem
& cannot follow the flow of code but you know it isn't smp related,
from the bash prompt issue
shutdown -h now or halt.
Do a Q CPUS to find out how many cpus you have,
detach each one of them from cp except cpu 0
by issuing a
DETACH CPU 01-(number of cpus in configuration)
& boot linux again.
TR SIGP will trace inter processor signal processor instructions.
DEFINE CPU 01-(number in configuration)
will get your guest's cpus back.


Help for displaying ascii textstrings
-------------------------------------
On the very latest VM Nucleus'es VM can now display ascii
( thanks Neale for the hint ) by doing
D TX<lowaddr>.<len>
e.g.
D TX0.100

Alternatively
=============
Under older VM debuggers ( I love EBCDIC too ) you can use this little program I wrote, which
will convert a command line of hex digits to ascii text. It can be compiled under linux &
you can copy the hex digits from your x3270 terminal to your xterm if you are debugging
from a linuxbox.

This is quite useful when looking at a parameter passed in as a text string
under VM ( unless you are good at decoding ASCII in your head ).

e.g. consider tracing an open syscall
TR SVC 5
We have stopped at a breakpoint
000151B0' SVC 0A05 -> 0001909A' CC 0

D 20.8 to check the SVC old psw in the prefix area & see was it from userspace
( for the layout of the prefix area consult P18 of the s/390 Reference Summary
if you have it available ).
V00000020 070C2000 800151B2
The problem state bit wasn't set & it's also too early in the boot sequence
for it to be a userspace SVC. If it was, we would have to temporarily switch the
psw to user space addressing so we could get at the first parameter of the open in
gpr2.
Next do a
D G2
GPR 2 = 00014CB4
Now display what gpr2 is pointing to
D 00014CB4.20
V00014CB4 2F646576 2F636F6E 736F6C65 00001BF5
V00014CC4 FC00014C B4001001 E0001000 B8070707
Now copy the text till the first 00 hex ( which is the end of the string )
to an xterm & do hex2ascii on it.
hex2ascii 2F646576 2F636F6E 736F6C65 00
outputs
Decoded Hex:=/ d e v / c o n s o l e 0x00
We were opening the console device.
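
Reading the old PSW by eye gets easier with practice, but a few lines of C make the bit
positions explicit. The sketch below ( decode_psw & struct psw390 are hypothetical names )
assumes the ESA/390 31 bit PSW layout, with IBM bit numbers counted from 0 at the most
significant end of the first word, & picks out the fields this walkthrough relies on.

```c
#include <stdint.h>

struct psw390 {
    int dat;            /* bit 5: dynamic address translation on */
    int problem_state;  /* bit 15: 1 = user space, 0 = supervisor (kernel) */
    int asc;            /* bits 16-17: 0 primary, 1 AR, 2 secondary, 3 home */
    int amode31;        /* bit 32: 1 = 31 bit addressing mode */
    uint32_t addr;      /* bits 33-63: instruction address */
};

/* w0 & w1 are the two words shown by D 20.8, e.g. 070C2000 800151B2 */
struct psw390 decode_psw(uint32_t w0, uint32_t w1)
{
    struct psw390 p;

    p.dat = (w0 >> 26) & 1;
    p.problem_state = (w0 >> 16) & 1;
    p.asc = (w0 >> 14) & 3;
    p.amode31 = (w1 >> 31) & 1;
    p.addr = w1 & 0x7FFFFFFFu;
    return p;
}
```

For 070C2000 800151B2 this gives problem_state 0 ( kernel ), primary space, 31 bit mode &
address 000151B2, the instruction just after the SVC at 000151B0.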

You can compile the code below yourself for practice :-).
/*
 * hex2ascii.c
 * a useful little tool for converting a hexadecimal command line to ascii
 *
 * Author(s): Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
 * (C) 2000 IBM Deutschland Entwicklung GmbH, IBM Corporation.
 *
 * Usage: hex2ascii [-a] <hex digits>...
 * Without -a each byte is followed by a space & unprintable bytes are
 * shown as 0xNN; with -a the text is compact & unprintable bytes are '.'.
 */
#include <stdio.h>
#include <string.h>

int main(int argc,char *argv[])
{
	int cnt1,cnt2,len,toggle=0;
	int startcnt=1;
	unsigned char c,hex=0;

	if(argc>1&&(strcmp(argv[1],"-a")==0))
		startcnt=2;
	printf("Decoded Hex:=");
	for(cnt1=startcnt;cnt1<argc;cnt1++)
	{
		len=strlen(argv[cnt1]);
		for(cnt2=0;cnt2<len;cnt2++)
		{
			/* convert one hex digit to its 4 bit value, skip anything else */
			c=argv[cnt1][cnt2];
			if(c>='0'&&c<='9')
				c=c-'0';
			else if(c>='A'&&c<='F')
				c=c-'A'+10;
			else if(c>='a'&&c<='f')
				c=c-'a'+10;
			else
				continue;
			switch(toggle)
			{
			case 0:	/* high nibble */
				hex=c<<4;
				toggle=1;
				break;
			case 1:	/* low nibble completes a byte */
				hex+=c;
				if(hex<32||hex>126)
				{
					if(startcnt==1)
						printf("0x%02X ",(int)hex);
					else
						printf(".");
				}
				else
				{
					printf("%c",hex);
					if(startcnt==1)
						printf(" ");
				}
				toggle=0;
				break;
			}
		}
	}
	printf("\n");
	return 0;
}



Stack tracing under VM
----------------------
A basic backtrace
-----------------

Here are the tricks I use; 9 out of 10 times they work pretty well.

When your backchain reaches a dead end
--------------------------------------
This can happen when an exception happens in the kernel & the kernel is entered twice.
If you reach the NULL pointer at the end of the back chain you should be
able to sniff further back if you follow the following tricks.
1) A kernel address should be easy to recognise since it is in
primary space & the problem state bit isn't set & also
the Hi bit of the address is set.
2) Another backchain should also be easy to recognise since it is an
address pointing to another address approximately 100 bytes ( 0x60 hex or a little more )
above the current stackpointer.


Here is some practice.
Boot the kernel & hit PA1 at some random time.
d g to display the gprs; this should display something like
GPR 0 = 00000001 00156018 0014359C 00000000
GPR 4 = 00000001 001B8888 000003E0 00000000
GPR 8 = 00100080 00100084 00000000 000FE000
GPR 12 = 00010400 8001B2DC 8001B36A 000FFED8
Note that GPR14 is a return address but as we are real men we are going to
trace the stack.
Display 0x40 bytes after the stack pointer ( GPR15, so d 000FFED8.40 ).

V000FFED8 000FFF38 8001B838 80014C8E 000FFF38
V000FFEE8 00000000 00000000 000003E0 00000000
V000FFEF8 00100080 00100084 00000000 000FE000
V000FFF08 00010400 8001B2DC 8001B36A 000FFED8


Ah, now look at what's in sp+56 (sp+0x38): this is 8001B36A, our saved r14 if
you look above at our stackframe, & it also agrees with GPR14.

Now backchain
d 000FFF38.40
We are now taking the contents of SP to get our first backchain.

V000FFF38 000FFFA0 00000000 00014995 00147094
V000FFF48 00147090 001470A0 000003E0 00000000
V000FFF58 00100080 00100084 00000000 001BF1D0
V000FFF68 00010400 800149BA 80014CA6 000FFF38

This displays a 2nd return address of 80014CA6.

Now do d 000FFFA0.40 for our 3rd backchain

V000FFFA0 04B52002 0001107F 00000000 00000000
V000FFFB0 00000000 00000000 FF000000 0001107F
V000FFFC0 00000000 00000000 00000000 00000000
V000FFFD0 00010400 80010802 8001085A 000FFFA0


Our 3rd return address is 8001085A.

As the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines,
for the sake of optimisation, don't set up a backchain.
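
The manual walk above is mechanical enough to write down as code. The sketch below replays it
against the three 0x40 byte dumps from the text instead of live memory; backtrace31, read_word
& the dump table are hypothetical, & it assumes the 31 bit frame layout used above ( backchain
at sp+0, saved r14 at sp+0x38 ). It stops at a NULL, unreadable, misaligned or non-increasing
backchain, which is how it rejects the rubbish 04B52002.

```c
#include <stddef.h>
#include <stdint.h>

struct dump { uint32_t addr; uint32_t words[16]; };

/* The three d <addr>.40 displays from the text. */
static const struct dump dumps[] = {
    { 0x000FFED8, { 0x000FFF38, 0x8001B838, 0x80014C8E, 0x000FFF38,
                    0x00000000, 0x00000000, 0x000003E0, 0x00000000,
                    0x00100080, 0x00100084, 0x00000000, 0x000FE000,
                    0x00010400, 0x8001B2DC, 0x8001B36A, 0x000FFED8 } },
    { 0x000FFF38, { 0x000FFFA0, 0x00000000, 0x00014995, 0x00147094,
                    0x00147090, 0x001470A0, 0x000003E0, 0x00000000,
                    0x00100080, 0x00100084, 0x00000000, 0x001BF1D0,
                    0x00010400, 0x800149BA, 0x80014CA6, 0x000FFF38 } },
    { 0x000FFFA0, { 0x04B52002, 0x0001107F, 0x00000000, 0x00000000,
                    0x00000000, 0x00000000, 0xFF000000, 0x0001107F,
                    0x00000000, 0x00000000, 0x00000000, 0x00000000,
                    0x00010400, 0x80010802, 0x8001085A, 0x000FFFA0 } },
};

/* Fetch the word at addr from the dumps; returns 0 if it isn't there. */
static int read_word(uint32_t addr, uint32_t *val)
{
    size_t i;

    for (i = 0; i < sizeof(dumps) / sizeof(dumps[0]); i++) {
        uint32_t off = addr - dumps[i].addr;

        if (addr >= dumps[i].addr && off < 0x40 && off % 4 == 0) {
            *val = dumps[i].words[off / 4];
            return 1;
        }
    }
    return 0;
}

/* Collect up to max saved r14 values, starting from stack pointer sp. */
size_t backtrace31(uint32_t sp, uint32_t *ret, size_t max)
{
    size_t n = 0;
    uint32_t r14, next;

    while (n < max && read_word(sp + 0x38, &r14)) {
        ret[n++] = r14;
        /* the stack grows down, so the caller's frame must be higher */
        if (!read_word(sp, &next) || next == 0 || (next & 3) != 0 ||
            next <= sp)
            break;
        sp = next;
    }
    return n;
}
```

Starting from GPR15 ( 000FFED8 ) it finds the same three return addresses as the hand walk:
8001B36A, 80014CA6 & 8001085A.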

Now look at System.map to see if the addresses make any sense
( the high bit of a 31 bit return address is the addressing mode bit,
so 8001B36A is address 0001B36A ).

grep -i 0001b3 System.map
outputs among other things
0001b304 T cpu_idle
so 8001B36A
is cpu_idle+0x66 ( quiet, the cpu is asleep, don't wake it )


grep -i 00014 System.map
produces among other things
00014a78 T start_kernel
so 80014CA6 is start_kernel+0x22e ( 0x14CA6-0x14A78 ).

grep -i 00108 System.map
this produces
00010800 T _stext
so 8001085A is _stext+0x5a

Congrats, you've done your first backchain.
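
The grep & subtract steps can also be automated once the System.map lines are in an array.
The sketch below ( struct sym, find_symbol & print_symbol are hypothetical ) masks off the
addressing mode bit, then picks the symbol with the highest address not above the result;
it assumes syms[] is sorted by address, as System.map is.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One System.map line: address & name (the type letter is ignored). */
struct sym { uint32_t addr; const char *name; };

/* Returns the index of the matching symbol & stores the offset, or -1. */
long find_symbol(uint32_t ret, const struct sym *syms, size_t nsyms,
                 uint32_t *off)
{
    uint32_t addr = ret & 0x7FFFFFFFu;   /* drop the 31 bit mode bit */
    long best = -1;
    size_t i;

    for (i = 0; i < nsyms && syms[i].addr <= addr; i++)
        best = (long)i;
    if (best >= 0)
        *off = addr - syms[best].addr;
    return best;
}

/* Print ret as symbol+offset, or with a ? if no symbol is below it. */
void print_symbol(uint32_t ret, const struct sym *syms, size_t nsyms)
{
    uint32_t off = 0;
    long i = find_symbol(ret, syms, nsyms, &off);

    if (i < 0)
        printf("%08lX ?\n", (unsigned long)ret);
    else
        printf("%08lX %s+0x%lx\n", (unsigned long)ret, syms[i].name,
               (unsigned long)off);
}
```

With the three symbols found above this reproduces cpu_idle+0x66, start_kernel+0x22e &
_stext+0x5a.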


s/390 & z/Architecture IO Overview
==================================

I am not going to give a course in 390 IO architecture as this would take me quite a
while & I'm no expert. Instead I'll give a 390 IO architecture summary for Dummies; if you have
the s/390 principles of operation available read this instead. If nothing else you may find a few
useful keywords in here & be able to use them on a web search engine like altavista to find
more useful information.

Unlike other bus architectures modern 390 systems do their IO using mostly
fibre optics & devices such as tapes & disks can be shared between several mainframes.
Also S390 can support up to 65536 devices while a high end PC based system might be choking
with around 64. Here is some of the common IO terminology:

Subchannel:
This is the logical number most IO commands use to talk to an IO device. There can be up to
0x10000 (65536) of these in a configuration; typically there are a few hundred. Under VM
for simplicity they are allocated contiguously; on the native hardware they are not, but
they typically stay consistent between boots provided no new hardware is inserted or removed.
Under Linux for 390 we use these as IRQ's & also when issuing an IO command (CLEAR SUBCHANNEL,
HALT SUBCHANNEL, MODIFY SUBCHANNEL, RESUME SUBCHANNEL, START SUBCHANNEL, STORE SUBCHANNEL &
TEST SUBCHANNEL) we use this as the ID of the device we wish to talk to. The most
important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check
whether the IO completed successfully ) & HALT SUBCHANNEL ( to kill IO ). A subchannel
can have up to 8 channel paths to a device; this offers redundancy if one is not available.


Device Number:
This number remains static & is closely tied to the hardware. There are 65536 of these,
also they are made up of a CHPID ( Channel Path ID, the most significant 8 bits )
& another lsb 8 bits. These remain static even if more devices are inserted or removed
from the hardware; there is a 1 to 1 mapping between Subchannels & Device Numbers provided
devices aren't inserted or removed.

Channel Control Words:
CCWs are linked lists of instructions initially pointed to by an operation request block (ORB),
which is initially given to the Start Subchannel (SSCH) command along with the subchannel number
for the IO subsystem to process while the CPU continues executing normal code.
These come in two flavours, Format 0 ( 24 bit, for backward compatibility )
& Format 1 ( 31 bit ). These are typically used to issue read & write
( & many other ) commands; they consist of a command code, flags, a length field & an
absolute address field.
For each IO you typically get 1 or 2 interrupts: one for channel end ( primary status ) when the
channel is idle & the second for device end ( secondary status ); sometimes you get both
concurrently. You check how the IO went by issuing a TEST SUBCHANNEL at each interrupt,
from which you receive an Interruption response block (IRB). If you get channel & device end
status in the IRB without channel checks etc. your IO probably went okay. If you didn't, you
probably need a doctor to examine the IRB & extended status word etc.
If an error occurs, more sophisticated control units have a facility known as
concurrent sense: this means that if an error occurs, extended sense information will
be presented in the extended control word (ECW) in the IRB. If not, you have to issue a
subsequent SENSE CCW command after the test subchannel.
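
To make the CCW & status description more concrete, here is a sketch that only builds &
interprets bytes ( user space can't issue IO ). It assumes the Format 1 layout: command code,
flags, 16 bit count & 31 bit data address, stored big endian. The flag & status values are the
ones the Linux s390 headers name CCW_FLAG_CC, DEV_STAT_CHN_END etc., but ccw1_encode & io_status
are hypothetical & not how the kernel's common IO layer is written.

```c
#include <stdint.h>

struct ccw1 {
    uint8_t  cmd_code;  /* e.g. 0x01 write, 0x02 read */
    uint8_t  flags;
    uint16_t count;     /* byte count */
    uint32_t cda;       /* channel data address, 31 bit */
};

#define CCW_FLAG_CC   0x40  /* command chaining: run the next CCW too */
#define CCW_FLAG_SLI  0x20  /* suppress incorrect length indication */

/* Lay a Format 1 CCW out as the 8 big endian bytes the channel reads. */
void ccw1_encode(const struct ccw1 *c, unsigned char out[8])
{
    out[0] = c->cmd_code;
    out[1] = c->flags;
    out[2] = (unsigned char)(c->count >> 8);
    out[3] = (unsigned char)(c->count & 0xFF);
    out[4] = (unsigned char)((c->cda >> 24) & 0x7F);  /* top bit must be 0 */
    out[5] = (unsigned char)(c->cda >> 16);
    out[6] = (unsigned char)(c->cda >> 8);
    out[7] = (unsigned char)c->cda;
}

/* Device & subchannel status bits from the IRB's status word. */
#define DEV_STAT_CHN_END    0x08
#define DEV_STAT_DEV_END    0x04
#define DEV_STAT_UNIT_CHECK 0x02
#define SCHN_STAT_PCI       0x80  /* program controlled interrupt, not an error */

/*
 * Channel end & device end may arrive in one interrupt or two, so the
 * device status of each interrupt is ORed into *seen.  Returns -1 on
 * unit check or any subchannel status other than PCI (channel checks
 * etc.), 1 once both ends have been seen, 0 while still waiting.
 */
int io_status(unsigned char *seen, unsigned char dstat, unsigned char cstat)
{
    *seen |= dstat;
    if ((dstat & DEV_STAT_UNIT_CHECK) || (cstat & ~SCHN_STAT_PCI) != 0)
        return -1;
    if ((*seen & (DEV_STAT_CHN_END | DEV_STAT_DEV_END)) ==
        (DEV_STAT_CHN_END | DEV_STAT_DEV_END))
        return 1;
    return 0;
}
```

A unit check is the point where you would go looking for sense data, either concurrent sense
in the IRB or a separate SENSE CCW.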


TPI ( Test pending interrupt ) can also be used for polled IO, but in multitasking multiprocessor
systems it isn't recommended except for checking special cases ( i.e. non looping checks for
pending IO etc. ).

Store Subchannel & Modify Subchannel can be used to examine & modify operating characteristics
of a subchannel ( e.g. channel paths ).

Other IO related Terms:
Sysplex: S390's Clustering Technology
QDIO: S390's new high speed IO architecture to support devices such as gigabit ethernet;
this architecture is also designed to be forward compatible with up & coming 64 bit machines.


General Concepts

Input Output Processors (IOP's) are responsible for communicating between
the mainframe CPU's & the channel & relieve the mainframe CPU's from the
burden of communicating with IO devices directly; this allows the CPU's to
concentrate on data processing.

IOP's can use one or more links ( known as channel paths ) to talk to each
IO device. An IOP first checks for path availability & chooses an available one,
then starts ( & sometimes terminates ) IO.
There are two types of channel path: ESCON & the Parallel IO interface.

IO devices are attached to control units. Control units provide the
logic to interface the channel paths & channel path IO protocols to
the IO devices; they can be integrated with the devices or housed separately
& often talk to several similar devices ( typical examples would be raid
controllers or a control unit which connects to 1000 3270 terminals ).


 +---------------------------------------------------------------+
 |  +-----+ +-----+ +-----+ +-----+  +----------+  +----------+  |
 |  | CPU | | CPU | | CPU | | CPU |  |   Main   |  | Expanded |  |
 |  |     | |     | |     | |     |  |  Memory  |  | Storage  |  |
 |  +-----+ +-----+ +-----+ +-----+  +----------+  +----------+  |
 |---------------------------------------------------------------|
 |         IOP         |         IOP         |        IOP        |
 |---------------------------------------------------------------|
 | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C |
 +---------------------------------------------------------------+
           ||                                       ||
           ||  Bus & Tag Channel Path               || ESCON
           ||  ======================               || Channel
           ||                   ||                  || Path
      +----------+         +----------+        +----------+
      |          |         |          |        |          |
      |    CU    |         |    CU    |        |    CU    |
      |          |         |          |        |          |
      +----------+         +----------+        +----------+
        |      |                |                |      |
+----------+ +----------+  +----------+  +----------+ +----------+
|I/O Device| |I/O Device|  |I/O Device|  |I/O Device| |I/O Device|
+----------+ +----------+  +----------+  +----------+ +----------+
  CPU = Central Processing Unit
  C   = Channel
  IOP = IO Processor
  CU  = Control Unit

1744The 390 IO systems come in 2 flavours the current 390 machines support both
1745
1746The Older 360 & 370 Interface,sometimes called the paralell I/O interface,
1747sometimes called Bus-and Tag & sometimes Original Equipment Manufacturers
1748Interface (OEMI).
1749
1750This byte wide paralell channel path/bus has parity & data on the "Bus" cable
1751& control lines on the "Tag" cable. These can operate in byte multiplex mode for
1752sharing between several slow devices or burst mode & monopolize the channel for the
1753whole burst. Upto 256 devices can be addressed on one of these cables. These cables are
1754about one inch in diameter. The maximum unextended length supported by these cables is
1755125 Meters but this can be extended up to 2km with a fibre optic channel extended
1756such as a 3044. The maximum burst speed supported is 4.5 megabytes per second however
1757some really old processors support only transfer rates of 3.0, 2.0 & 1.0 MB/sec.
1758One of these paths can be daisy chained to up to 8 control units.
1759
1760
1761ESCON if fibre optic it is also called FICON
1762Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either leds or lasers
1763for communication at a signaling rate of upto 200 megabits/sec. As 10bits are transferred
1764for every 8 bits info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once
1765control info & CRC are added. ESCON only operates in burst mode.
1766
1767ESCON's typical maximum cable length is 3km for the LED version & 20km for the laser version,
1768known as XDF (extended distance facility). This can be further extended by using an
1769ESCON director, which triples the above-mentioned ranges. Unlike Bus & Tag, ESCON is
1770serial & uses a packet switching architecture; the standard Bus & Tag control protocol
1771is, however, carried within the packets. Up to 256 devices can be attached to each control
1772unit that uses one of these interfaces.
1773
1774Common 390 devices include:
1775Network adapters, typically OSA2s, 3172s, 2116s & OSA-E gigabit ethernet adapters.
1776Consoles: 3270 & 3215 (a teletype, emulated under linux for a line mode console).
1777DASDs: direct access storage devices (otherwise known as hard disks).
1778Tape drives.
1779CTCs (Channel to Channel Adapters):
1780ESCON or parallel cables used as a very high speed serial link
1781between 2 machines. We use 2 cables under linux to do a bi-directional serial link.
1782
1783
1784Debugging IO on s/390 & z/Architecture under VM
1785===============================================
1786
1787Now we are ready to go on with I/O tracing commands under VM.
1788
1789A few self-explanatory queries:
1790Q OSA
1791Q CTC
1792Q DISK ( This command is CMS specific )
1793Q DASD
1794
1795
1796
1797
1798
1799
1800Q OSA on my machine returns
1801OSA 7C08 ON OSA 7C08 SUBCHANNEL = 0000
1802OSA 7C09 ON OSA 7C09 SUBCHANNEL = 0001
1803OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002
1804OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003
1805
1806If your guest has certain privileges, you may be able to see devices
1807which don't belong to you; to avoid this, add the option V.
1808e.g.
1809Q V OSA
1810
1811Now, using the device numbers returned by this command, we will
1812trace the I/O starting up on the first devices, 7c08 & 7c09.
1813In the simplest case we can trace the
1814start subchannels,
1815e.g. TR SSCH 7C08-7C09
1816or the halt subchannels,
1817e.g. TR HSCH 7C08-7C09
1818For MSCHs, STSCHs etc. I think you can guess the rest.
1819
1820Ingo's favourite trick is tracing all the I/Os & CCWs & spooling them into the reader of another
1821VM guest so he can ftp the logfile back to his own machine. I'll do a small bit of this & give you
1822a look at the output.
1823
18241) Spool printer output to the VM reader:
1825SP PRT TO <another vm guest> (or * for the local vm guest)
18262) Fill the reader with the trace
1827TR IO 7c08-7c09 INST INT CCW PRT RUN
18283) Start up linux
1829i 00c
18304) Finish the trace
1831TR END
18325) Close the printer spool file
1833C PRT
18346) list reader contents
1835RDRLIST
18367) copy it to linux4's minidisk
1837RECEIVE / LOG TXT A1 ( replace
18388)
1839filel & press F11 to look at it.
1840You should see something like:
1841
184200020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08
1843 CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80
1844 CCW 000FFDF0 E4200100 00487FE8 0000 E4240100 ........
1845 IDAL 43D8AFE8
1846 IDAL 0FB76000
184700020B0A' I/O DEV 7C08 -> 000197BC' SCH 0000 PARM 00E2C9C4
184800021628' TSCH B2354000 >> 00488164 CC 0 SCH 0000 DEV 7C08
1849 CCWA 000FFDF8 DEV STS 0C SCH STS 00 CNT 00EC
1850 KEY 0 FPI C0 CC 0 CTLS 4007
185100022238' STSCH B2344000 >> 00488108 CC 0 SCH 0000 DEV 7C08
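Once the trace is back on your Linux box, a long log is easier to read if you filter out the I/O instructions. The following Python sketch pulls the instruction, condition code, subchannel & device number out of lines shaped like the sample above. The regular expression is an assumption based only on that sample, not on a documented CP output format, so expect to adjust it for other traces.

```python
import re

# Assumed line shape, taken only from the sample trace above:
#   <address>' <INSTR> <operands...> CC <n> SCH <subchannel> DEV <devno>
TRACE_LINE = re.compile(
    r"^(?P<addr>[0-9A-F]{8})'\s+(?P<instr>[A-Z]+)\s.*?"
    r"\bCC\s+(?P<cc>\d)\s+SCH\s+(?P<sch>[0-9A-F]{4})\s+DEV\s+(?P<dev>[0-9A-F]{4})"
)

def io_instructions(lines):
    """Yield (instruction, condition code, subchannel, device) per matching line.

    Continuation lines (CPA, CCW, IDAL, ...) and interrupt lines (I/O DEV ...)
    do not match and are skipped.
    """
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match:
            yield match["instr"], int(match["cc"]), match["sch"], match["dev"]
```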
1852
1853If you don't want to mess up your reader (because you possibly booted from it),
1854you can alternatively spool it to another guest's reader.
1855
1856
1857Other common VM device related commands
1858---------------------------------------------
1859These commands are listed only because they have
1860been of use to me in the past & may be of use to
1861you too. For more complete info on each of the commands,
1862type HELP <command> from CMS.
1863Detach devices:
1864DET <devno range>
1865Attach devices to a guest (* for your own guest):
1866ATT <devno range> <guest>
1867READY <devno> causes VM to issue a fake interrupt.
1868
1869The VARY command is normally only available to VM administrators.
1870VARY ON PATH <path> TO <devno range>
1871VARY OFF PATH <path> FROM <devno range>
1872This is used to switch on or off channel paths to devices.
1873
1874Q CHPID <channel path ID>
1875This displays the state of devices using this channel path.
1876D SCHIB <subchannel>
1877This displays the subchannel information block (SCHIB) for the device.
1878This, I believe, is also only available to administrators.
1879DEFINE CTC <devno>
1880defines a virtual CTC (channel to channel) connection;
18812 need to be defined on each guest for the CTC driver to use.
1882COUPLE devno userid remote devno
1883Joins a local virtual device to a remote virtual device
1884( commonly used for the CTC driver ).
1885
1886Building a VM ramdisk under CMS which linux can use:
1887def vfb-<blocksize> <subchannel> <number blocks>
1888blocksize is commonly 4096 for linux.
1889Formatting it:
1890format <subchannel> <drive letter e.g. x> (blksize <blocksize>
1891
1892Sharing a disk between multiple guests
1893LINK userid devno1 devno2 mode password
1894
1895
1896
1897GDB on S390
1898===========
1899N.B. gdb works better on programs compiled without optimisation
1900(see Compiling programs for debugging).
1901
1902invocation
1903----------
1904gdb <victim program> <optional corefile>
1905
1906Online help
1907-----------
1908help: gives help on commands
1909e.g.
1910help
1911help display
1912Note: gdb's online help is very good; use it.
1913
1914
1915Assembly
1916--------
1917info registers: displays registers other than floating point.
1918info all-registers: displays floating points as well.
1919disassemble: disassembles
1920e.g.
1921disassemble without parameters will disassemble the current function
1922disassemble $pc $pc+10
1923
1924Viewing & modifying variables
1925-----------------------------
1926print or p: displays variable or register
1927e.g. p/x $sp will display the stack pointer
1928
1929display: prints variable or register each time program stops
1930e.g.
1931display/x $pc will display the program counter
1932display argc
1933
1934undisplay: undoes displays
1935
1936info breakpoints: shows all current breakpoints
1937
1938info stack: shows a stack backtrace (if this doesn't work too well, I'll show you the
1939stack trace by hand below).
1940
1941info locals: displays local variables.
1942
1943info args: displays the current procedure's arguments.
1944
1945set args: will set argc & argv each time the victim program is invoked.
1946
1947set <variable>=value
1948set argc=100
1949set $pc=0
1950
1951
1952
1953Modifying execution
1954-------------------
1955step: steps n lines of source code
1956step steps 1 line.
1957step 100 steps 100 lines of code.
1958
1959next: like step except this will not step into subroutines
1960
1961stepi: steps a single machine code instruction.
1962e.g. stepi 100
1963
1964nexti: steps a single machine code instruction but will not step into subroutines.
1965
1966finish: will run until exit of the current routine
1967
1968run: (re)starts a program
1969
1970cont: continues a program
1971
1972quit: exits gdb.
1973
1974
1975breakpoints
1976------------
1977
1978break
1979sets a breakpoint
1980e.g.
1981
1982break main
1983
1984break *$pc
1985
1986break *0x400618
1987
1988here's a really useful one for large programs:
1989rbr
1990Set a breakpoint for all functions matching REGEXP
1991e.g.
1992rbr 390
1993will set a breakpoint on all functions with 390 in their name.
1994
1995info breakpoints
1996lists all breakpoints
1997
1998delete: delete breakpoint by number or delete them all
1999e.g.
2000delete 1 will delete the first breakpoint
2001delete will delete them all
2002
2003watch: this will set a watchpoint (usually hardware assisted),
2004which will watch a variable till it changes.
2005e.g.
2006watch cnt, will watch the variable cnt till it changes.
2007As an aside, gdb's architecture independent watchpoint code is unfortunately
2008inconsistent & not very good; watchpoints usually work but not always.
2009
2010info watchpoints: Display currently active watchpoints
2011
2012condition: ( another useful one )
2013Specify breakpoint number N to break only if COND is true.
2014Usage is `condition N COND', where N is an integer and COND is an
2015expression to be evaluated whenever breakpoint N is reached.
2016
2017
2018
2019User defined functions/macros
2020-----------------------------
2021define: (Note: this is very useful, simple & powerful.)
2022Usage: define <name> <list of commands> end
2023
2024Examples which you should consider putting into .gdbinit in your home directory:
2025define d
2026stepi
2027disassemble $pc $pc+10
2028end
2029
2030define e
2031nexti
2032disassemble $pc $pc+10
2033end
2034
2035
2036Other hard to classify stuff
2037----------------------------
2038signal n:
2039sends the victim program a signal.
2040e.g. signal 3 will send a SIGQUIT.
2041
2042info signals:
2043shows what gdb does when the victim receives certain signals.
2044
2045list:
2046e.g.
2047list lists current function source
2048list 1,10 lists the first 10 lines of the current file.
2049list test.c:1,10
2050
2051
2052directory:
2053Adds directories to be searched for source if gdb cannot find the source.
2054(note: it is a bit sensitive about slashes)
2055e.g. To add the root of the filesystem to the searchpath do
2056directory //
2057
2058
2059call <function>
2060This calls a function in the victim program; this is pretty powerful.
2061e.g.
2062(gdb) call printf("hello world")
2063outputs:
2064$1 = 11
2065
2066You might now be thinking that the line above didn't work. It did, but the text is still sitting in stdout's buffer, so something extra has to be done:
2067(gdb) call fflush(stdout)
2068hello world$2 = 0
2069As an aside the debugger also calls malloc & free under the hood
2070to make space for the "hello world" string.
2071
2072
2073
2074hints
2075-----
20761) command completion works just like bash
2077( if you are a bad typist like me this really helps )
2078e.g. hit br <TAB> & cursor up & down :-).
2079
20802) If you have a debugging problem that takes a few steps to recreate,
2081put the steps into a file called .gdbinit in your current working directory.
2082If you have defined a few extra useful user defined commands, put these in
2083.gdbinit in your home directory & they will be read each time gdb is launched.
2084
2085A typical .gdbinit file might be:
2086break main
2087run
2088break runtime_exception
2089cont
2090
2091
2092stack chaining in gdb by hand
2093-----------------------------
2094This is done using the same trick described for VM:
2095p/x (*($sp+56))&0x7fffffff gets the return address saved in the first stack frame.
2096
2097For z/Architecture,
2098replace 56 with 112 & ignore the &0x7fffffff
2099in the commands below, & do nasty casts to longs like the following,
2100as gdb unfortunately treats printed arguments as ints, which
2101messes up everything.
2102e.g. here is a 3rd backchain dereference:
2103p/x *(long *)(***(long ***)$sp+112)
2104
2105
2106On 31-bit, the first command above outputs
2107$5 = 0x528f18
2108on my machine.
2109Now you can use
2110info symbol (*($sp+56))&0x7fffffff
2111and you might see something like
2112rl_getc + 36 in section .text, telling you what is located at address 0x528f18.
2113Now do
2114p/x (*(*$sp+56))&0x7fffffff
2115This outputs
2116$6 = 0x528ed0
2117Now do
2118info symbol (*(*$sp+56))&0x7fffffff
2119rl_read_key + 180 in section .text
2120Now do
2121p/x (*(**$sp+56))&0x7fffffff
2122& so on.
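The hand chaining above is just a loop. This Python sketch walks the backchain over a memory image; read_word is a hypothetical accessor (for example a lookup into words read from a core dump), the 56 & 0x7fffffff defaults are the 31-bit values used above, & for z/Architecture you would pass ra_offset=112 & a full 64-bit mask. The max_frames limit is an added guard against a corrupted chain that loops.

```python
def walk_backchain(read_word, sp, ra_offset=56, mask=0x7FFFFFFF, max_frames=64):
    """Return the saved return addresses found by following the backchain from sp.

    read_word(addr) must return the word stored at addr.  Defaults are the 31-bit
    values used above; for z/Architecture use ra_offset=112, mask=(1 << 64) - 1.
    """
    return_addresses = []
    while sp and len(return_addresses) < max_frames:
        # Word at frame+56 (saved r14), as in p/x (*($sp+56))&0x7fffffff
        return_addresses.append(read_word(sp + ra_offset) & mask)
        # The word at offset 0 points to the caller's frame; 0 ends the chain.
        sp = read_word(sp)
    return return_addresses
```

With a core dump loaded into a dictionary of address to word, `walk_backchain(words.__getitem__, sp)` gives the same list of addresses you would otherwise collect one p/x at a time, ready to feed to info symbol.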
2123
2124Disassembling instructions without debug info
2125---------------------------------------------
2126gdb typically complains if there is a lack of debugging
2127symbols in the disassemble command, with
2128"No function contains specified address." To get around
2129this do
2130x/<number lines to disassemble>xi <address>
2131e.g.
2132x/20xi 0x400730
2133
2134
2135
2136Note: Remember gdb has history just like bash; you don't need to retype the
2137whole line just use the up & down arrows.
2138
2139
2140
2141For more info
2142-------------
2143From your linuxbox do
2144man gdb or info gdb.
2145
2146core dumps
2147----------
2148What is a core dump?
2149A core dump is a file generated by the kernel (if allowed) which contains the registers
2150& all active pages of the program which has crashed.
2151From this file gdb will allow you to look at the registers, stack trace & memory of the
2152program as if it had just crashed on your system. It is usually called core & created in the
2153current working directory.
2154This is very useful in that a customer can mail a core dump to a technical support department
2155& the technical support department can reconstruct what happened,
2156provided they have an identical copy of this program with debugging symbols compiled in &
2157the source base of this build is available.
2158In short, it is far more useful than something like a crash log could ever hope to be.
2159
2160In theory all that is missing to restart a core dumped program is a kernel patch which
2161will do the following.
21621) Make a new kernel task structure
21632) Reload all the dumped pages back into the kernel's memory management structures.
21643) Do the required clock fixups
21654) Get all files & network connections for the process back into an identical state ( really difficult ).
21665) A few more difficult things I haven't thought of.
2167
2168
2169
2170Why have I never seen one?
2171Probably because you haven't used the command
2172ulimit -c unlimited in bash
2173to allow core dumps. Now do
2174ulimit -a
2175to verify that the limit was accepted.
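ulimit -c only affects the shell & the programs it starts. Inside a program the same setting is the RLIMIT_CORE resource limit. A minimal Python sketch using the standard resource module (the function name is illustrative): it raises the soft limit to the hard limit, so it only matches ulimit -c unlimited when the hard limit is already unlimited, since an unprivileged process cannot go past its hard limit.

```python
import resource

def raise_core_limit():
    """Raise this process's soft core-file size limit to its hard limit.

    Children started afterwards inherit the new limit.  Going beyond the
    hard limit needs privileges, so this does not attempt it.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```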
2176
2177A sample core dump
2178To create this I'm going to do
2179ulimit -c unlimited
2180gdb
2181to launch gdb (my victim app). Now be bad & do the following from another
2182telnet/xterm session to the same machine:
2183ps -aux | grep gdb
2184kill -SIGSEGV <gdb's pid>
2185or alternatively use killall -SIGSEGV gdb if you have the killall command.
2186Now look at the core dump.
2187./gdb ./gdb core
2188displays the following:
2189GNU gdb 4.18
2190Copyright 1998 Free Software Foundation, Inc.
2191GDB is free software, covered by the GNU General Public License, and you are
2192welcome to change it and/or distribute copies of it under certain conditions.
2193Type "show copying" to see the conditions.
2194There is absolutely no warranty for GDB. Type "show warranty" for details.
2195This GDB was configured as "s390-ibm-linux"...
2196Core was generated by `./gdb'.
2197Program terminated with signal 11, Segmentation fault.
2198Reading symbols from /usr/lib/libncurses.so.4...done.
2199Reading symbols from /lib/libm.so.6...done.
2200Reading symbols from /lib/libc.so.6...done.
2201Reading symbols from /lib/ld-linux.so.2...done.
2202#0 0x40126d1a in read () from /lib/libc.so.6
2203Setting up the environment for debugging gdb.
2204Breakpoint 1 at 0x4dc6f8: file utils.c, line 471.
2205Breakpoint 2 at 0x4d87a4: file top.c, line 2609.
2206(top-gdb) info stack
2207#0 0x40126d1a in read () from /lib/libc.so.6
2208#1 0x528f26 in rl_getc (stream=0x7ffffde8) at input.c:402
2209#2 0x528ed0 in rl_read_key () at input.c:381
2210#3 0x5167e6 in readline_internal_char () at readline.c:454
2211#4 0x5168ee in readline_internal_charloop () at readline.c:507
2212#5 0x51692c in readline_internal () at readline.c:521
2213#6 0x5164fe in readline (prompt=0x7ffff810 "\177ÿøx\177ÿ÷Ø\177ÿøxÀ")
2214 at readline.c:349
2215#7 0x4d7a8a in command_line_input (prrompt=0x564420 "(gdb) ", repeat=1,
2216 annotation_suffix=0x4d6b44 "prompt") at top.c:2091
2217#8 0x4d6cf0 in command_loop () at top.c:1345
2218#9 0x4e25bc in main (argc=1, argv=0x7ffffdf4) at main.c:635
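The "be bad" step can also be scripted when you need to reproduce it repeatedly. This Python sketch starts a throwaway child, sets its core size limit before it runs & has it receive SIGSEGV. The victim is a Python one-liner standing in for gdb, an assumption made so the example is self-contained. The default core_limit of 0 means no core file is written; pass a larger value (up to your hard limit, otherwise setrlimit fails in the child) to get one. preexec_fn is not safe in multi-threaded programs, so treat this as a test helper only.

```python
import resource
import signal
import subprocess
import sys

# Hypothetical victim: restores the default SIGSEGV action, then sends
# SIGSEGV to itself, much like `kill -SIGSEGV <pid>` from another session.
VICTIM = ("import os, signal; "
          "signal.signal(signal.SIGSEGV, signal.SIG_DFL); "
          "os.kill(os.getpid(), signal.SIGSEGV)")

def crash_victim(core_limit=0):
    """Run the victim with soft RLIMIT_CORE = core_limit; return the killing signal."""
    def set_core_limit():  # runs in the child between fork & exec
        _, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (core_limit, hard))
    proc = subprocess.run([sys.executable, "-c", VICTIM], preexec_fn=set_core_limit)
    # subprocess reports "terminated by signal N" as returncode -N.
    return signal.Signals(-proc.returncode)
```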
2219
2220
2221LDD
2222===
2223This is a program which lists the shared libraries which a program or library needs.
2224Note that you also get the load addresses of the shared library text segments, which
2225help when using objdump --source.
2226e.g.
2227 ldd ./gdb
2228outputs
2229libncurses.so.4 => /usr/lib/libncurses.so.4 (0x40018000)
2230libm.so.6 => /lib/libm.so.6 (0x4005e000)
2231libc.so.6 => /lib/libc.so.6 (0x40084000)
2232/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
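The load addresses let you map a runtime address back into a library. For example, frame #0 of the core dump above is at 0x40126d1a in libc & ldd reports libc at 0x40084000, so the address lies 0xa2d1a bytes into the library. A small Python sketch of that subtraction (helper names are illustrative), assuming the library was loaded at the same address in the run that crashed; check /proc/<pid>/maps of the live process if unsure. For a shared library linked at address 0, as is typical, this offset is usually the address objdump shows.

```python
import re

# One ldd output line: "<name> => <path> (0x<load address>)"
LDD_LINE = re.compile(r"^\s*\S+\s+=>\s+(\S+)\s+\((0x[0-9a-fA-F]+)\)")

def load_addresses(ldd_output):
    """Map each library path reported by ldd to its load address."""
    bases = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            bases[match.group(1)] = int(match.group(2), 16)
    return bases

def offset_in_library(address, bases, path):
    """Distance of a runtime address from the library's load address."""
    return address - bases[path]
```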
2233
2234
2235Debugging shared libraries
2236==========================
2237Most programs use shared libraries; however, it can be very painful
2238when you single step into a function like printf for the
2239first time & end up in functions like _dl_runtime_resolve. This is
2240ld.so doing lazy binding. Lazy binding is a concept in ELF where
2241calls to shared library functions are not resolved until they are
2242first used: good for startup time but a pain to debug.
2243To get around this, either relink the program -static, or exit gdb, type
2244export LD_BIND_NOW=true (this stops lazy binding) & restart gdb'ing
2245the program in question.
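When gdb is started from a script, the variable can be set for that run only instead of exporting it in your shell. A minimal Python sketch (function names are illustrative); the only behaviour it relies on is that ld.so treats any non-empty LD_BIND_NOW as "resolve all symbols at startup", which is why the value true above works too. gdb passes its environment on to the program it starts.

```python
import os
import subprocess

def eager_binding_env(base=None):
    """Copy of the environment with LD_BIND_NOW set, disabling lazy binding."""
    env = dict(os.environ if base is None else base)
    env["LD_BIND_NOW"] = "1"  # any non-empty value enables it
    return env

def debug_without_lazy_binding(program):
    """Start an interactive gdb session on `program` with eager binding."""
    return subprocess.run(["gdb", program], env=eager_binding_env())
```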
2246
2247
2248
2249Debugging modules
2250=================
2251As modules are dynamically loaded into the kernel, their address can be
2252anywhere. To get around this, use the -m option with insmod to emit a load
2253map, which can be redirected into a file if required.
2254
2255The proc file system
2256====================
2257What is it?
2258It is a filesystem created by the kernel whose files are generated on demand
2259when read; some of them can also be written to modify kernel parameters.
2260It is a powerful concept.
2261
2262e.g.
2263
2264cat /proc/sys/net/ipv4/ip_forward
2265On my machine outputs
22660
2267telling me IP forwarding is not on. To switch it on I can do
2268echo 1 > /proc/sys/net/ipv4/ip_forward
2269cat it again
2270cat /proc/sys/net/ipv4/ip_forward
2271On my machine now outputs
22721
2273IP forwarding is on.
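The cat/echo pair above works for any entry under /proc/sys. A small Python sketch of the same thing; the dotted-name convention is the one the sysctl tool uses, the helper names are illustrative, & writing real entries needs root.

```python
from pathlib import Path

def sysctl_path(name, root="/proc/sys"):
    """'net.ipv4.ip_forward' -> /proc/sys/net/ipv4/ip_forward.

    Names whose components themselves contain dots need the path form instead.
    """
    return Path(root, *name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Same as: cat /proc/sys/<name with dots as slashes>"""
    return sysctl_path(name, root).read_text().strip()

def write_sysctl(name, value, root="/proc/sys"):
    """Same as: echo <value> > /proc/sys/<name with dots as slashes>"""
    sysctl_path(name, root).write_text(f"{value}\n")
```

For example, write_sysctl("net.ipv4.ip_forward", 1) followed by read_sysctl("net.ipv4.ip_forward") repeats the steps above.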
2274There is a lot of useful info in here best found by going in & having a look around,
2275so I'll take you through some entries I consider important.
2276
2277All the processes running on the machine have their own entry defined by
2278/proc/<pid>
2279So let's have a look at the init process:
2280cd /proc/1
2281
2282cat cmdline
2283emits
2284init [2]
2285
2286cd /proc/1/fd
2287This contains numerical entries for all the open files;
2288some of these you can cat, e.g. stdout (1).
2289
2290cat /proc/29/maps
2291on my machine emits
2292
229300400000-00478000 r-xp 00000000 5f:00 4103 /bin/bash
229400478000-0047e000 rw-p 00077000 5f:00 4103 /bin/bash
22950047e000-00492000 rwxp 00000000 00:00 0
229640000000-40015000 r-xp 00000000 5f:00 14382 /lib/ld-2.1.2.so
229740015000-40016000 rw-p 00014000 5f:00 14382 /lib/ld-2.1.2.so
229840016000-40017000 rwxp 00000000 00:00 0
229940017000-40018000 rw-p 00000000 00:00 0
230040018000-4001b000 r-xp 00000000 5f:00 14435 /lib/libtermcap.so.2.0.8
23014001b000-4001c000 rw-p 00002000 5f:00 14435 /lib/libtermcap.so.2.0.8
23024001c000-4010d000 r-xp 00000000 5f:00 14387 /lib/libc-2.1.2.so
23034010d000-40111000 rw-p 000f0000 5f:00 14387 /lib/libc-2.1.2.so
230440111000-40114000 rw-p 00000000 00:00 0
230540114000-4011e000 r-xp 00000000 5f:00 14408 /lib/libnss_files-2.1.2.so
23064011e000-4011f000 rw-p 00009000 5f:00 14408 /lib/libnss_files-2.1.2.so
23077fffd000-80000000 rwxp ffffe000 00:00 0
2308
2309
2310Showing us the shared libraries this process (bash, pid 29) uses, where they are in memory
2311& memory access permissions for each virtual memory area.
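To answer "which mapping does this address belong to?" (for example a crash address), the maps format can be parsed mechanically. A minimal Python sketch, assuming the column layout shown above (start-end, permissions, offset, device, inode, optional path); the helper names are illustrative.

```python
from typing import NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int        # exclusive
    perms: str
    offset: int
    dev: str
    inode: int
    path: str       # empty for anonymous memory

def parse_maps(text):
    """Parse /proc/<pid>/maps lines into Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) == 6 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16),
                                fields[3], int(fields[4]), path))
    return mappings

def find_mapping(mappings, address):
    """Return the mapping containing address, or None."""
    return next((m for m in mappings if m.start <= address < m.end), None)
```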
2312
2313/proc/1/cwd is a softlink to the current working directory.
2314/proc/1/root is the root of the filesystem for this process.
2315
2316/proc/1/mem is the running process's memory, which you
2317can read & write like a file.
2318strace sometimes uses this, as it is a bit faster than the
2319rather inefficient ptrace interface for peeking at data.
2320
2321
2322cat status
2323
2324Name: init
2325State: S (sleeping)
2326Pid: 1
2327PPid: 0
2328Uid: 0 0 0 0
2329Gid: 0 0 0 0
2330Groups:
2331VmSize: 408 kB
2332VmLck: 0 kB
2333VmRSS: 208 kB
2334VmData: 24 kB
2335VmStk: 8 kB
2336VmExe: 368 kB
2337VmLib: 0 kB
2338SigPnd: 0000000000000000
2339SigBlk: 0000000000000000
2340SigIgn: 7fffffffd7f0d8fc
2341SigCgt: 00000000280b2603
2342CapInh: 00000000fffffeff
2343CapPrm: 00000000ffffffff
2344CapEff: 00000000fffffeff
2345
2346User PSW: 070de000 80414146
2347task: 004b6000 tss: 004b62d8 ksp: 004b7ca8 pt_regs: 004b7f68
2348User GPRS:
234900000400 00000000 0000000b 7ffffa90
235000000000 00000000 00000000 0045d9f4
23510045cafc 7ffffa90 7fffff18 0045cb08
235200010400 804039e8 80403af8 7ffff8b0
2353User ACRS:
235400000000 00000000 00000000 00000000
235500000001 00000000 00000000 00000000
235600000000 00000000 00000000 00000000
235700000000 00000000 00000000 00000000
2358Kernel BackChain CallChain BackChain CallChain
2359 004b7ca8 8002bd0c 004b7d18 8002b92c
2360 004b7db8 8005cd50 004b7e38 8005d12a
2361 004b7f08 80019114
2362Showing, among other things, memory usage, the status of some signals &
2363the process's registers from the kernel task structure,
2364as well as a backchain which may be useful if a process crashes
2365in the kernel for some unknown reason.
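Most of these fields are plain "Key: value" pairs, & the Sig* lines are hex bitmasks in which bit n-1 stands for signal n. A Python sketch for decoding them (helper names are illustrative). For the init sample above, SigCgt 00000000280b2603 decodes to signals 1, 2, 10, 11, 14, 17, 18, 20, 28 & 30.

```python
def parse_status(text):
    """Turn /proc/<pid>/status 'Key: value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def signals_in_mask(hex_mask):
    """Decode a SigPnd/SigBlk/SigIgn/SigCgt mask: bit n-1 set means signal n."""
    mask = int(hex_mask, 16)
    return [n + 1 for n in range(mask.bit_length()) if (mask >> n) & 1]
```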
2366
2367Some driver debugging techniques
2368================================
2369debug feature
2370-------------
2371Some of our drivers now support a "debug feature" in
2372/proc/s390dbf; see s390dbf.txt in the linux/Documentation/s390 directory
2373for more info.
2374e.g.
2375to switch on the lcs "debug feature"
2376echo 5 > /proc/s390dbf/lcs/level
2377& then after the error occurred.
2378cat /proc/s390dbf/lcs/sprintf >/logfile
2379the logfile now contains some information which may help
2380tech support resolve a problem in the field.
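The echo/cat sequence above is easy to script, so the level can be raised when a test starts & the view saved when it fails. A minimal Python sketch; the helper names are illustrative, the root directory is a parameter in case the debug feature lives elsewhere on your kernel, & the view names (sprintf here, hex_ascii for the tape driver) are the ones used in this documentation.

```python
from pathlib import Path

DBF_ROOT = "/proc/s390dbf"  # location used in this document

def dbf_file(area, name, root=DBF_ROOT):
    """Path of one file of a debug feature area, e.g. dbf_file("lcs", "level")."""
    return Path(root, area, name)

def set_dbf_level(area, level, root=DBF_ROOT):
    """Same as: echo <level> > /proc/s390dbf/<area>/level"""
    dbf_file(area, "level", root).write_text(f"{level}\n")

def save_dbf_view(area, view, logfile, root=DBF_ROOT):
    """Same as: cat /proc/s390dbf/<area>/<view> > logfile"""
    Path(logfile).write_text(dbf_file(area, view, root).read_text())
```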
2381
2382
2383
2384high level debugging network drivers
2385------------------------------------
2386ifconfig is quite a useful command;
2387it gives the current state of network drivers.
2388
2389If you suspect your network device driver is dead
2390one way to check is type
2391ifconfig <network device>
2392e.g. tr0
2393You should see something like
2394tr0 Link encap:16/4 Mbps Token Ring (New) HWaddr 00:04:AC:20:8E:48
2395 inet addr:9.164.185.132 Bcast:9.164.191.255 Mask:255.255.224.0
2396 UP BROADCAST RUNNING MULTICAST MTU:2000 Metric:1
2397 RX packets:246134 errors:0 dropped:0 overruns:0 frame:0
2398 TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
2399 collisions:0 txqueuelen:100
2400
2401If the device doesn't say UP,
2402try
2403/etc/rc.d/init.d/network start
2404(this starts the network stack & hopefully calls ifconfig tr0 up).
2405ifconfig reads /proc/net/dev & presents it in a more readable form.
2406Now ping the device from a machine in the same subnet.
2407If the RX & TX packet counts don't increment, you probably
2408have problems.
2409Next
2410cat /proc/net/arp
2411Do you see any hardware addresses in the cache? If not, you may have problems.
2412Next try
2413ping -c 5 <broadcast_addr>, i.e. the Bcast field above in the output of
2414ifconfig. Do you see any replies from machines other than the local machine?
2415If not, you may have problems. Also, if the TX packets count in ifconfig
2416hasn't incremented, either you have serious problems in your driver
2417(e.g. the txbusy field of the network device being stuck on)
2418or you may have multiple network devices connected.
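The "do the packet counts increment?" check can be done straight from /proc/net/dev by taking one snapshot before the ping & one after. A Python sketch (helper names are illustrative), assuming the usual layout of two header lines followed by "iface: 8 receive columns, 8 transmit columns", where packets is the second column of each group.

```python
def packet_counts(proc_net_dev):
    """Parse /proc/net/dev text into {interface: (rx_packets, tx_packets)}."""
    counts = {}
    for line in proc_net_dev.splitlines()[2:]:  # skip the two header lines
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # 8 receive columns (bytes, packets, errs, ...) then 8 transmit columns.
        counts[name.strip()] = (int(fields[1]), int(fields[9]))
    return counts

def counters_moved(before, after, interface):
    """True if both the RX & TX packet counts grew between two snapshots."""
    (rx0, tx0), (rx1, tx1) = before[interface], after[interface]
    return rx1 > rx0 and tx1 > tx0
```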
2419
2420
2421chandev
2422-------
2423There is a new device layer for channel devices, some
2424drivers e.g. lcs are registered with this layer.
2425If the device uses the channel device layer you'll be
2426able to find what interrupts it uses & the current state
2427of the device.
2428See the manpage chandev.8 & type cat /proc/chandev for more info.
2429
2430
2431
2432Starting points for debugging scripting languages etc.
2433======================================================
2434
2435bash/sh
2436
2437bash -x <scriptname>
2438e.g. bash -x /usr/bin/bashbug
2439displays the following lines as it executes them.
2440+ MACHINE=i586
2441+ OS=linux-gnu
2442+ CC=gcc
2443+ CFLAGS= -DPROGRAM='bash' -DHOSTTYPE='i586' -DOSTYPE='linux-gnu' -DMACHTYPE='i586-pc-linux-gnu' -DSHELL -DHAVE_CONFIG_H -I. -I. -I./lib -O2 -pipe
2444+ RELEASE=2.01
2445+ PATCHLEVEL=1
2446+ RELSTATUS=release
2447+ MACHTYPE=i586-pc-linux-gnu
2448
2449perl -d <scriptname> runs the perl script in a fully interactive debugger
2450(like gdb).
2451Type 'h' in the debugger for help.
2452
2453For debugging java, type
2454jdb <filename> (another fully interactive gdb style debugger)
2455& type ? in the debugger for help.
2456
2457
2458
2459Dumptool & Lcrash ( lkcd )
2460==========================
2461Michael Holzheu & others here at IBM have a fairly mature port of
2462SGI's lcrash tool which allows one to look at kernel structures in a
2463running kernel.
2464
2465It also complements a tool called dumptool which dumps all the kernel's
2466memory pages & registers to either a tape or a disk.
2467This can be used by tech support or an ambitious end user to do
2468post mortem debugging of a machine, much like gdb with core dumps.
2469
2470How to use this tool is explained in detail
2471in other documentation supplied by IBM with the patches, on the
2472lcrash homepage http://oss.sgi.com/projects/lkcd/ & in the lcrash manpage.
2473
2474How they work
2475-------------
2476Lcrash is a perfectly normal program; however, it requires 2
2477additional files: Kerntypes, which is built using a patch to the
2478linux kernel sources in the linux root directory, & the System.map.
2479
2480Kerntypes is an object file whose sole purpose in life
2481is to provide stabs debug info to lcrash. To do this,
2482Kerntypes is built from kerntypes.c, which just includes the most commonly
2483referenced header files used when debugging; lcrash can then read the
2484.stabs section of this file.
2485
2486When debugging a live system it uses /dev/mem;
2487alternatively, for post mortem debugging, it uses the data
2488collected by dumptool.
2489
2490
2491
2492SysRq
2493=====
2494This is now supported by linux for s/390 & z/Architecture.
2495To enable it, compile the kernel with
2496Kernel Hacking -> Magic SysRq Key Enabled, then do
2497echo "1" > /proc/sys/kernel/sysrq
2498Also type
2499echo "8" >/proc/sys/kernel/printk
2500to make printk output go to the console.
2501On 390 all commands are prefixed with
2502^-
2503e.g.
2504^-t will show tasks.
2505^-? or some unknown command will display help.
2506The sysrq key reading is very picky (I have to type the keys in an
2507xterm session & paste them into the x3270 console),
2508& it may be wise to predefine the keys as described in the VM hints above.
2509
2510This is particularly useful for syncing disks, unmounting & rebooting
2511if the machine gets partially hung.
2512
2513Read Documentation/sysrq.txt for more info.
2514
2515References:
2516===========
2517Enterprise Systems Architecture Reference Summary
2518Enterprise Systems Architecture Principles of Operation
2519Hartmut Penner's s390 stack frame sheet.
2520IBM Mainframe Channel Attachment, a technology brief from a Cisco webpage.
2521Various bits of man & info pages of Linux.
2522Linux & GDB source.
2523Various info & man pages.
2524CMS Help on tracing commands.
2525Linux for s/390 Elf Application Binary Interface
2526Linux for z/Series Elf Application Binary Interface ( Both Highly Recommended )
2527z/Architecture Principles of Operation SA22-7832-00
2528Enterprise Systems Architecture/390 Reference Summary SA22-7209-01 & the
2529Enterprise Systems Architecture/390 Principles of Operation SA22-7201-05
2530
2531Special Thanks
2532==============
2533Special thanks to Neale Ferguson who maintains a much
2534prettier HTML version of this page at
2535http://penguinvm.princeton.edu/notes.html#Debug390
2536Bob Grainger, Stefan Bader & others for reporting bugs.
diff --git a/Documentation/s390/TAPE b/Documentation/s390/TAPE
new file mode 100644
index 000000000000..c639aa5603ff
--- /dev/null
+++ b/Documentation/s390/TAPE
@@ -0,0 +1,122 @@
1Channel attached Tape device driver
2
3-----------------------------WARNING-----------------------------------------
4This driver is considered to be EXPERIMENTAL. Do NOT use it in
5production environments. Feel free to test it and report problems back to us.
6-----------------------------------------------------------------------------
7
8The LINUX for zSeries tape device driver manages channel attached tape drives
9which are compatible with IBM 3480 or IBM 3490 magnetic tape subsystems. This
10includes various models of these devices (for example the 3490E).
11
12
13Tape driver features
14
15The device driver supports a maximum of 128 tape devices.
16No official LINUX device major number is assigned to the zSeries tape device
17driver. It allocates major numbers dynamically and reports them on system
18startup.
19Typically it will get major number 254 for both the character device front-end
20and the block device front-end.
21
22The tape device driver needs no kernel parameters. All supported devices
23present are detected on driver initialization at system startup or module load.
24The devices detected are ordered by their subchannel numbers. The device with
25the lowest subchannel number becomes device 0, the next one will be device 1
26and so on.
27
28
29Tape character device front-end
30
31The usual way to read or write to the tape device is through the character
32device front-end. The zSeries tape device driver provides two character devices
33for each physical device -- the first of these will rewind automatically when
34it is closed, the second will not rewind automatically.
35
36The character device nodes are named /dev/rtibm0 (rewinding) and /dev/ntibm0
37(non-rewinding) for the first device, /dev/rtibm1 and /dev/ntibm1 for the
38second, and so on.
39
40The character device front-end can be used like any other LINUX tape device. You
41can write to it and read from it using LINUX facilities such as GNU tar. The
42tool mt can be used to perform control operations, such as rewinding the tape
43or skipping a file.
44
45Most LINUX tape software should work with either tape character device.
46
47
48Tape block device front-end
49
50The tape device may also be accessed as a block device in read-only mode.
51This could be used for software installation in the same way as it is used with
52other operating systems on the zSeries platform (and most LINUX
53distributions are shipped on compact disk using ISO9660 filesystems).
54
55One block device node is provided for each physical device. These are named
56/dev/btibm0 for the first device, /dev/btibm1 for the second and so on.
57You should only use the ISO9660 filesystem on LINUX for zSeries tapes because
58the physical tape devices cannot perform fast seeks and the ISO9660 system is
59optimized for this situation.
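The naming rules above (devices numbered by ascending subchannel number, three nodes per physical device, at most 128 devices) can be expressed as a small Python sketch. It only computes names & does not probe any hardware; the function name is illustrative.

```python
MAX_TAPES = 128  # driver limit stated above

def tape_device_nodes(subchannels):
    """Map subchannel number -> (rewinding, non-rewinding, block) device node names.

    Devices are numbered 0, 1, ... in ascending subchannel order.
    """
    if len(subchannels) > MAX_TAPES:
        raise ValueError(f"the driver supports at most {MAX_TAPES} tape devices")
    return {
        subchannel: (f"/dev/rtibm{n}", f"/dev/ntibm{n}", f"/dev/btibm{n}")
        for n, subchannel in enumerate(sorted(subchannels))
    }
```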
60
61
62Tape block device example
63
64In this example a tape with an ISO9660 filesystem is created using the first
65tape device. ISO9660 filesystem support must be built into your system kernel
66for this.
67The mt command is used to issue tape commands and the mkisofs command to
68create an ISO9660 filesystem:
69
70- create a LINUX directory (somedir) with the contents of the filesystem
71 mkdir somedir
72 cp contents somedir
73
74- insert a tape
75
76- ensure the tape is at the beginning
77 mt -f /dev/ntibm0 rewind
78
79- set the blocksize of the character driver. A blocksize of 2048 bytes
80 is commonly used on ISO9660 CD-ROMs
81 mt -f /dev/ntibm0 setblk 2048
82
83- write the filesystem to the character device driver
84 mkisofs -o /dev/ntibm0 somedir
85
86- rewind the tape again
87 mt -f /dev/ntibm0 rewind
88
89- Now you can mount your new filesystem as a block device:
90 mount -t iso9660 -o ro,block=2048 /dev/btibm0 /mnt
91
92TODO List
93
94 - The driver still has to be stabilized
95
96BUGS
97
98This driver is considered BETA, which means some weaknesses may still
99be in it.
100If an error occurs which cannot be handled by the code you will get a
101sense-data dump. In that case, please do the following:
102
1031. set the tape driver debug level to maximum:
104 echo 6 >/proc/s390dbf/tape/level
105
1062. re-perform the actions which produced the bug. (Hopefully the bug will
107 reappear.)
108
1093. get a snapshot from the debug-feature:
110 cat /proc/s390dbf/tape/hex_ascii >somefile
111
1124. Now put the snapshot together with a detailed description of the situation
113 that led to the bug:
114 - Which tool did you use?
115 - Which hardware do you have?
116 - Was your tape unit online?
117 - Is it a shared tape unit?
118
1195. Send an email with your bug report to:
120 mailto:Linux390@de.ibm.com
121
122
diff --git a/Documentation/s390/cds.txt b/Documentation/s390/cds.txt
new file mode 100644
index 000000000000..d9397170fb36
--- /dev/null
+++ b/Documentation/s390/cds.txt
@@ -0,0 +1,513 @@
1Linux for S/390 and zSeries
2
3Common Device Support (CDS)
4Device Driver I/O Support Routines
5
6Authors : Ingo Adlung
7 Cornelia Huck
8
9Copyright, IBM Corp. 1999-2002
10
11Introduction
12
13This document describes the common device support routines for Linux/390.
14Unlike other hardware architectures, ESA/390 has defined a unified
15I/O access method. This gives relief to the device drivers as they don't
16have to deal with different bus types, polling versus interrupt
17processing, shared versus non-shared interrupt processing, DMA versus port
18I/O (PIO), and other hardware features. However, this implies that
19either every single device driver needs to implement the hardware I/O
20attachment functionality itself, or the operating system provides for a
21unified method to access the hardware, providing all the functionality that
22every single device driver would have to provide itself.
23
24The document does not intend to explain the ESA/390 hardware architecture in
25every detail. This information can be obtained from the ESA/390 Principles of
26Operation manual (IBM Form. No. SA22-7201).
27
28In order to build common device support for ESA/390 I/O interfaces, a
29functional layer was introduced that provides generic I/O access methods to
30the hardware.
31
32The common device support layer comprises the I/O support routines defined
33below. Some of them implement common Linux device driver interfaces, while
34some of them are ESA/390 platform specific.
35
36Note:
37In order to write a driver for S/390, you also need to look into the interface
38described in Documentation/s390/driver-model.txt.
39
40Note for porting drivers from 2.4:
41The major changes are:
42* The functions use a ccw_device instead of an irq (subchannel).
43* All drivers must define a ccw_driver (see driver-model.txt) and the associated
44 functions.
45* request_irq() and free_irq() are no longer done by the driver.
46* The oper_handler is (roughly) replaced by the probe() and set_online() functions
47 of the ccw_driver.
48* The not_oper_handler is (roughly) replaced by the remove() and set_offline()
49 functions of the ccw_driver.
50* The channel device layer is gone.
51* The interrupt handlers must be adapted to use a ccw_device as argument.
52 Moreover, they don't return a devstat, but an irb.
53* Before initiating an I/O, the options must be set via ccw_device_set_options().
54
55read_dev_chars()
56 read device characteristics
57
58read_conf_data()
59 read configuration data.
60
61ccw_device_get_ciw()
62 get commands from extended sense data.
63
64ccw_device_start()
65 initiate an I/O request.
66
67ccw_device_resume()
68 resume channel program execution.
69
70ccw_device_halt()
71 terminate the current I/O request processed on the device.
72
73do_IRQ()
74 generic interrupt routine. This function is called by the interrupt entry
75 routine whenever an I/O interrupt is presented to the system. The do_IRQ()
76 routine determines the interrupt status and calls the device specific
77 interrupt handler according to the rules (flags) defined during I/O request
78 initiation with ccw_device_start().
79
80The next chapters describe the functions other than do_IRQ() in more detail.
81The do_IRQ() interface is not described, as it is called from the Linux/390
82first level interrupt handler only and does not comprise a device driver
83callable interface. Instead, the functional description of ccw_device_start() also
84describes the input to the device specific interrupt handler.
85
86Note: All explanations also apply to the 64-bit architecture s390x.
87
88
89Common Device Support (CDS) for Linux/390 Device Drivers
90
91General Information
92
93The following chapters describe the I/O related interface routines the
94Linux/390 common device support (CDS) provides to allow for device specific
95driver implementations on the IBM ESA/390 hardware platform. Those interfaces
96intend to provide the functionality that every device driver
97implementation requires to drive a specific hardware device on the ESA/390
98platform. Some of the interface routines are specific to Linux/390, and some
99of them can be found in other Linux platform implementations too.
100Miscellaneous function prototypes, data declarations, and macro definitions
101can be found in the architecture specific C header file
102linux/include/asm-s390/irq.h.
103
104Overview of CDS interface concepts
105
106Unlike other hardware platforms, the ESA/390 architecture doesn't define
107interrupt lines managed by a specific interrupt controller and bus systems
108that may or may not allow for shared interrupts, DMA processing, etc. Instead,
109the ESA/390 architecture has implemented a so-called channel subsystem that
110provides a unified view of the devices physically attached to the systems.
111Though the ESA/390 hardware platform knows about a huge variety of different
112peripheral attachments like disk devices (aka. DASDs), tapes, communication
113controllers, etc., they can all be accessed by a well-defined access method and
114they all present I/O completion in a unified way: I/O interruptions. Every
115single device is uniquely identified to the system by a so-called subchannel,
116where the ESA/390 architecture allows for 64k devices to be attached.
117
118Linux, however, was first built on the Intel PC architecture, with its two
119cascaded 8259 programmable interrupt controllers (PICs), which allow for a
120maximum of 15 different interrupt lines. All devices attached to such a system
121share those 15 interrupt levels. Devices attached to the ISA bus system must
122not share interrupt levels (aka. IRQs), as the ISA bus is based on edge-triggered
123interrupts. MCA, EISA, PCI and other bus systems are based on level-triggered
124interrupts, and therefore allow for shared IRQs. However, if multiple devices
125present their hardware status by the same (shared) IRQ, the operating system
126has to call every single device driver registered on this IRQ in order to
127determine the device driver owning the device that raised the interrupt.
128
129In order not to introduce a new I/O concept to the common Linux code,
130Linux/390 preserves the IRQ concept and semantically maps the ESA/390
131subchannels to Linux as IRQs. This allows Linux/390 to support up to 64k
132different IRQs, uniquely representing a single device each.
133
134Up to kernel 2.4, Linux/390 used to provide interfaces via the IRQ (subchannel).
135For internal use of the common I/O layer, these are still there. However,
136device drivers should use the new calling interface via the ccw_device only.
137
138During its startup the Linux/390 system checks for peripheral devices. Each
139of those devices is uniquely defined by a so-called subchannel by the ESA/390
140channel subsystem. While the subchannel numbers are system generated, each
141subchannel also takes a user-defined attribute, the so-called device number.
142Both subchannel number and device number cannot exceed 65535. During driverfs
143initialisation, the information about control unit type and device types that
144imply specific I/O commands (channel command words - CCWs) in order to operate
145the device is gathered. Device drivers can retrieve this set of hardware
146information during their initialization step to recognize the devices they
147support using the information saved in the struct ccw_device given to them.
148This method implies that Linux/390 doesn't need to probe for free (not
149armed) interrupt request lines (IRQs) to drive its devices with. Where
150applicable, the device drivers can use read_dev_chars() to retrieve device
151characteristics. This can be done without having to request device ownership
152previously.
153
154In order to allow for easy I/O initiation the CDS layer provides a
155ccw_device_start() interface that takes a device specific channel program (one
156or more CCWs) as input, sets up the required architecture-specific control blocks
157and initiates an I/O request on behalf of the device driver. The
158ccw_device_start() routine allows the caller to specify whether it expects the CDS layer
159to notify the device driver for every interrupt it observes, or with final status
160only. See ccw_device_start() for more details. A device driver must never issue
161ESA/390 I/O commands itself, but must use the Linux/390 CDS interfaces instead.
162
163To cancel long-running I/O requests, the CDS layer provides the
164ccw_device_halt() function. Some devices require a HALT SUBCHANNEL (HSCH)
165command to be issued initially, without any pending I/O request. This case is
166also covered by ccw_device_halt().
167
168
169read_dev_chars() - Read Device Characteristics
170
171This routine returns the characteristics for the device specified.
172
173The function is meant to be called with an irq handler in place; that is,
174at the earliest during set_online() processing.
175
176While the request is processed synchronously, the device interrupt
177handler is called for final ending status. In case of error situations the
178interrupt handler may recover appropriately. The device irq handler can
179recognize the corresponding interrupts by the interruption parameter being
1800x00524443. The ccw_device must not be locked prior to calling read_dev_chars().
181
182The function may be called enabled or disabled.
183
184int read_dev_chars(struct ccw_device *cdev, void **buffer, int length );
185
186cdev - the ccw_device the information is requested for.
187buffer - pointer to a buffer pointer. The buffer pointer itself
188 must contain a valid buffer area.
189length - length of the buffer provided.
190
191The read_dev_chars() function returns :
192
193 0 - successful completion
194-ENODEV - cdev invalid
195-EINVAL - an invalid parameter was detected, or the function was called too early.
196-EBUSY - an irrecoverable I/O error occurred or the device is not
197 operational.
198
199
200read_conf_data() - Read Configuration Data
201
202Retrieve the device dependent configuration data. Please have a look at your
203device dependent I/O commands for the device specific layout of the node
204descriptor elements.
205
206The function is meant to be called with an irq handler in place; that is,
207at the earliest during set_online() processing.
208
209The function may be called enabled or disabled, but the device must not be
210locked.
211
212int read_conf_data(struct ccw_device *cdev, void **buffer, int *length, __u8 lpm);
213
214cdev - the ccw_device the data is requested for.
215buffer - Pointer to a buffer pointer. The read_conf_data() routine
216 will allocate a buffer and initialize the buffer pointer
217 accordingly. It's the device driver's responsibility to
218 release the kernel memory if no longer needed.
219length - Length of the buffer allocated and retrieved.
220lpm - Logical path mask to be used for retrieving the data. If
221 zero the data is retrieved on the next path available.
222
223The read_conf_data() function returns :
224 0 - Successful completion
225-ENODEV - cdev invalid.
226-EINVAL - An invalid parameter was detected, or the function was called too early.
227-EIO - An irrecoverable I/O error occurred or the device is
228 not operational.
229-ENOMEM - The read_conf_data() routine couldn't obtain storage.
230-EOPNOTSUPP - The device doesn't support the read configuration
231 data command.
232
233
234ccw_device_get_ciw() - get command information word
235
236This call enables a device driver to get information about supported commands
237from the extended SenseID data.
238
239struct ciw *
240ccw_device_get_ciw(struct ccw_device *cdev, __u32 cmd);
241
242cdev - The ccw_device for which the command is to be retrieved.
243cmd - The command type to be retrieved.
244
245ccw_device_get_ciw() returns:
246NULL - No extended data available, invalid device or command not found.
247!NULL - The command requested.
248
249
250ccw_device_start() - Initiate I/O Request
251
252The ccw_device_start() routine is the I/O request front-end processor. All
253device driver I/O requests must be issued using this routine. A device driver
254must not issue ESA/390 I/O commands itself. Instead the ccw_device_start()
255routine provides all interfaces required to drive arbitrary devices.
256
257This description also covers the status information passed to the device
258driver's interrupt handler as this is related to the rules (flags) defined
259with the associated I/O request when calling ccw_device_start().
260
261int ccw_device_start(struct ccw_device *cdev,
262 struct ccw1 *cpa,
263 unsigned long intparm,
264 __u8 lpm,
265 unsigned long flags);
266
267cdev : ccw_device the I/O is destined for
268cpa : logical start address of channel program
269intparm : user specific interrupt information; will be presented
270 back to the device driver's interrupt handler. Allows a
271 device driver to associate the interrupt with a
272 particular I/O request.
273lpm : defines the channel path to be used for a specific I/O
274 request. A value of 0 will make cio use the operational path mask (opm).
275flags : defines the action to be performed for I/O processing
276
277Possible flag values are :
278
279DOIO_ALLOW_SUSPEND - channel program may become suspended
280DOIO_DENY_PREFETCH - don't allow for CCW prefetch; usually
281 this implies the channel program might
282 become modified
283DOIO_SUPPRESS_INTER - don't call the handler on intermediate status
284
285The cpa parameter points to the first format 1 CCW of a channel program :
286
287struct ccw1 {
288 __u8 cmd_code;/* command code */
289 __u8 flags; /* flags, like IDA addressing, etc. */
290 __u16 count; /* byte count */
291 __u32 cda; /* data address */
292} __attribute__ ((packed,aligned(8)));
293
294with the following CCW flags values defined :
295
296CCW_FLAG_DC - data chaining
297CCW_FLAG_CC - command chaining
298CCW_FLAG_SLI - suppress incorrect length
299CCW_FLAG_SKIP - skip
300CCW_FLAG_PCI - PCI
301CCW_FLAG_IDA - indirect addressing
302CCW_FLAG_SUSPEND - suspend
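To make the layout concrete, here is a userspace sketch that mirrors struct ccw1 and builds a two-CCW channel program in which a write is command-chained to a read. The flag values are assumed to match the kernel's asm/cio.h of this era (please verify against your tree); DEMO_CMD_WRITE and DEMO_CMD_READ are hypothetical placeholders, since real command codes are device specific. In a driver the data address would be a real (31-bit) address such as __pa(data), not a truncated userspace pointer.

```c
#include <stdint.h>

/* Userspace mirror of struct ccw1 as shown above (__uN -> uintN_t). */
struct ccw1 {
    uint8_t  cmd_code;  /* command code */
    uint8_t  flags;     /* flags, like IDA addressing, etc. */
    uint16_t count;     /* byte count */
    uint32_t cda;       /* data address */
} __attribute__((packed, aligned(8)));

/* CCW flag values; assumed to match the kernel's asm/cio.h. */
#define CCW_FLAG_DC      0x80
#define CCW_FLAG_CC      0x40
#define CCW_FLAG_SLI     0x20
#define CCW_FLAG_SKIP    0x10
#define CCW_FLAG_PCI     0x08
#define CCW_FLAG_IDA     0x04
#define CCW_FLAG_SUSPEND 0x02

/* Hypothetical command codes; real codes are device specific. */
#define DEMO_CMD_WRITE 0x01
#define DEMO_CMD_READ  0x02

static void ccw_init(struct ccw1 *ccw, uint8_t cmd, uint8_t flags,
                     void *data, uint16_t count)
{
    ccw->cmd_code = cmd;
    ccw->flags = flags;
    ccw->count = count;
    /* Illustration only: the kernel would store a real address here. */
    ccw->cda = (uint32_t)(uintptr_t)data;
}

/* Build WRITE followed by a command-chained READ.
 * Returns the number of CCWs written to cp. */
int build_write_read(struct ccw1 cp[2], void *out, uint16_t outlen,
                     void *in, uint16_t inlen)
{
    /* CC: continue with the next CCW once this one completes. */
    ccw_init(&cp[0], DEMO_CMD_WRITE, CCW_FLAG_CC, out, outlen);
    /* No CC on the last CCW, so the program ends here; SLI tolerates
     * the device returning fewer than inlen bytes. */
    ccw_init(&cp[1], DEMO_CMD_READ, CCW_FLAG_SLI, in, inlen);
    return 2;
}
```

The address of cp[0] is what would be passed as the cpa argument of ccw_device_start().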
303
304
305Via ccw_device_set_options(), the device driver may specify the following
306options for the device:
307
308DOIO_EARLY_NOTIFICATION - allow for early interrupt notification
309DOIO_REPORT_ALL - report all interrupt conditions
310
311
312The ccw_device_start() function returns :
313
314 0 - successful completion or request successfully initiated
315-EBUSY - The device is currently processing a previous I/O request, or there is
316 a status pending at the device.
317-ENODEV - cdev is invalid, the device is not operational or the ccw_device is
318 not online.
319
320When the I/O request completes, the CDS first level interrupt handler will
321accumulate the status in a struct irb and then call the device interrupt handler.
322The intparm field will contain the value the device driver has associated with a
323particular I/O request. If a pending device status was recognized,
324intparm will be set to 0 (zero). This may happen during I/O initiation or delayed
325by an alert status notification. In any case this status is not related to the
326current (last) I/O request. In case of a delayed status notification no special
327interrupt will be presented to indicate I/O completion as the I/O request was
328never started, even though ccw_device_start() returned with successful completion.
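A device interrupt handler therefore typically looks at intparm first. This sketch classifies the value according to the rules in this document: 0 means pending status unrelated to the last request, and 0x00524443 (ASCII "RDC") is the value used by read_dev_chars(). The enum and function names are hypothetical; it also assumes that the driver never uses either of these two values as its own intparm.

```c
/* Interruption parameter used by read_dev_chars(), per this document.
 * 0x52 0x44 0x43 spells "RDC" in ASCII. */
#define RDC_INTPARM 0x00524443UL

enum intparm_kind {
    INTPARM_UNSOLICITED,    /* 0: pending status not related to the last request */
    INTPARM_READ_DEV_CHARS, /* final status of a read_dev_chars() request */
    INTPARM_DRIVER          /* a value the driver passed to ccw_device_start() */
};

enum intparm_kind classify_intparm(unsigned long intparm)
{
    if (intparm == 0)
        return INTPARM_UNSOLICITED;
    if (intparm == RDC_INTPARM)
        return INTPARM_READ_DEV_CHARS;
    return INTPARM_DRIVER;
}
```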
329
330If the concurrent sense flag in the extended status word in the irb is set, the
331field irb->scsw.count describes the number of device specific sense bytes
332available in the extended control word irb->scsw.ecw[0]. No device sensing by
333the device driver itself is required.
334
335The device interrupt handler can use the following definitions to investigate
336the primary unit check source coded in sense byte 0 :
337
338SNS0_CMD_REJECT 0x80
339SNS0_INTERVENTION_REQ 0x40
340SNS0_BUS_OUT_CHECK 0x20
341SNS0_EQUIPMENT_CHECK 0x10
342SNS0_DATA_CHECK 0x08
343SNS0_OVERRUN 0x04
344SNS0_INCOMPL_DOMAIN 0x01
345
346Depending on the device status, several of these values may be set together.
347Please refer to the device specific documentation for details.
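Because several bits can be set at once, a handler usually tests each bit separately. The following userspace sketch turns sense byte 0 into a readable list using the values above. The function name decode_sense0() and the output format are illustrative assumptions; where the sense byte is fetched from is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Sense byte 0 bits, as listed above. */
#define SNS0_CMD_REJECT       0x80
#define SNS0_INTERVENTION_REQ 0x40
#define SNS0_BUS_OUT_CHECK    0x20
#define SNS0_EQUIPMENT_CHECK  0x10
#define SNS0_DATA_CHECK       0x08
#define SNS0_OVERRUN          0x04
#define SNS0_INCOMPL_DOMAIN   0x01

static const struct {
    unsigned char bit;
    const char *name;
} sns0_names[] = {
    { SNS0_CMD_REJECT,       "CMD_REJECT" },
    { SNS0_INTERVENTION_REQ, "INTERVENTION_REQ" },
    { SNS0_BUS_OUT_CHECK,    "BUS_OUT_CHECK" },
    { SNS0_EQUIPMENT_CHECK,  "EQUIPMENT_CHECK" },
    { SNS0_DATA_CHECK,       "DATA_CHECK" },
    { SNS0_OVERRUN,          "OVERRUN" },
    { SNS0_INCOMPL_DOMAIN,   "INCOMPL_DOMAIN" },
};

/* Write a comma-separated list of the conditions set in sense byte 0 into
 * buf (always NUL terminated, truncated if len is too small).
 * Returns the number of conditions found, or -1 if len is 0. */
int decode_sense0(unsigned char sense0, char *buf, size_t len)
{
    size_t i;
    int found = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < sizeof(sns0_names) / sizeof(sns0_names[0]); i++) {
        if (!(sense0 & sns0_names[i].bit))
            continue;
        if (found++)
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, sns0_names[i].name, len - strlen(buf) - 1);
    }
    return found;
}
```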
348
349The irb->scsw.cstat field provides the (accumulated) subchannel status :
350
351SCHN_STAT_PCI - program controlled interrupt
352SCHN_STAT_INCORR_LEN - incorrect length
353SCHN_STAT_PROG_CHECK - program check
354SCHN_STAT_PROT_CHECK - protection check
355SCHN_STAT_CHN_DATA_CHK - channel data check
356SCHN_STAT_CHN_CTRL_CHK - channel control check
357SCHN_STAT_INTF_CTRL_CHK - interface control check
358SCHN_STAT_CHAIN_CHECK - chaining check
359
360The irb->scsw.dstat field provides the (accumulated) device status :
361
362DEV_STAT_ATTENTION - attention
363DEV_STAT_STAT_MOD - status modifier
364DEV_STAT_CU_END - control unit end
365DEV_STAT_BUSY - busy
366DEV_STAT_CHN_END - channel end
367DEV_STAT_DEV_END - device end
368DEV_STAT_UNIT_CHECK - unit check
369DEV_STAT_UNIT_EXCEP - unit exception
370
371Please see the ESA/390 Principles of Operation manual for details on the
372individual flag meanings.
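As an illustration of how cstat and dstat are usually combined, here is a deliberately simplified policy. The numeric bit values are assumptions taken from the kernel's asm/cio.h of this era and should be checked. The policy is hypothetical: it treats any subchannel condition other than PCI (including incorrect length, which is only reported when SLI is not set), or a unit check, as an error, and device end as final status. Real drivers interpret conditions such as unit exception according to device semantics.

```c
#include <stdint.h>

/* Device status (irb->scsw.dstat) bits; values assumed from asm/cio.h. */
#define DEV_STAT_ATTENTION   0x80
#define DEV_STAT_STAT_MOD    0x40
#define DEV_STAT_CU_END      0x20
#define DEV_STAT_BUSY        0x10
#define DEV_STAT_CHN_END     0x08
#define DEV_STAT_DEV_END     0x04
#define DEV_STAT_UNIT_CHECK  0x02
#define DEV_STAT_UNIT_EXCEP  0x01

/* Subchannel status (irb->scsw.cstat) bits; values assumed as above. */
#define SCHN_STAT_PCI           0x80
#define SCHN_STAT_INCORR_LEN    0x40
#define SCHN_STAT_PROG_CHECK    0x20
#define SCHN_STAT_PROT_CHECK    0x10
#define SCHN_STAT_CHN_DATA_CHK  0x08
#define SCHN_STAT_CHN_CTRL_CHK  0x04
#define SCHN_STAT_INTF_CTRL_CHK 0x02
#define SCHN_STAT_CHAIN_CHECK   0x01

enum io_outcome { IO_IN_PROGRESS, IO_DONE_OK, IO_DONE_ERROR };

/* Hypothetical, simplified policy for a device interrupt handler. */
enum io_outcome classify_status(uint8_t cstat, uint8_t dstat)
{
    /* Any subchannel condition other than PCI, or a unit check: error. */
    if ((cstat & ~SCHN_STAT_PCI) || (dstat & DEV_STAT_UNIT_CHECK))
        return IO_DONE_ERROR;
    /* Device end: treat as final status for the request. */
    if (dstat & DEV_STAT_DEV_END)
        return IO_DONE_OK;
    /* E.g. a PCI or channel end alone: intermediate status. */
    return IO_IN_PROGRESS;
}
```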
373
374Usage Notes :
375
376Prior to calling ccw_device_start() the device driver must ensure disabled state,
377i.e. the I/O mask value in the PSW must be disabled. This can be accomplished
378by calling local_irq_save(flags). The current PSW flags are preserved and
379can be restored by local_irq_restore(flags) at a later time.
380
381If the device driver violates this rule while running in a uni-processor
382environment, an interrupt might be presented before the ccw_device_start()
383routine returns to the device driver main path. In this case we will end up in a
384deadlock situation, as the interrupt handler will try to obtain the irq
385lock the device driver still owns (see below)!
386
387The driver must hold the device specific lock. This can be
388accomplished by
389
390(i) spin_lock(get_ccwdev_lock(cdev)), or
391(ii) spin_lock_irqsave(get_ccwdev_lock(cdev), flags)
392
393Option (i) should be used if the calling routine is running disabled for
394I/O interrupts (see above) already. Option (ii) obtains the device gate and
395puts the CPU into I/O disabled state by preserving the current PSW flags.
396
397The device driver is allowed to issue the next ccw_device_start() call from
398within its interrupt handler already. It is not required to schedule a
399bottom-half, unless a non-deterministically long-running error recovery procedure
400or similar needs to be scheduled. During I/O processing the Linux/390 generic
401I/O device driver support has already obtained the IRQ lock, i.e. the handler
402must not try to obtain it again when calling ccw_device_start(), or we end up in a
403deadlock situation!
404
405If a device driver relies on an I/O request being completed before starting the
406next it can reduce I/O processing overhead by chaining a NoOp I/O command
407CCW_CMD_NOOP to the end of the submitted CCW chain. This will force Channel-End
408and Device-End status to be presented together, with a single interrupt.
409However, this should be used with care as it implies the channel will remain
410busy, not being able to process I/O requests for other devices on the same
411channel. Therefore e.g. read commands should never use this technique, as the
412result will be presented by a single interrupt anyway.
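The NoOp technique amounts to setting command chaining on the last real CCW and appending one more CCW. This is a minimal userspace sketch; CCW_CMD_NOOP is assumed to be 0x03 (verify against your headers), and setting SLI on the zero-byte NoOp is a defensive assumption rather than a documented requirement. The helper name append_noop() is hypothetical.

```c
#include <stdint.h>

/* Userspace mirror of struct ccw1. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
} __attribute__((packed, aligned(8)));

#define CCW_FLAG_CC  0x40
#define CCW_FLAG_SLI 0x20
#define CCW_CMD_NOOP 0x03   /* assumed value of the kernel's CCW_CMD_NOOP */

/* Append a NoOp CCW to a chain of n (>= 1) CCWs. The array must have room
 * for n + 1 entries. Returns the new number of CCWs. */
int append_noop(struct ccw1 *cp, int n)
{
    cp[n - 1].flags |= CCW_FLAG_CC;  /* chain the last real CCW to the NoOp */
    cp[n].cmd_code = CCW_CMD_NOOP;
    cp[n].flags = CCW_FLAG_SLI;      /* no CC: the NoOp ends the program */
    cp[n].count = 0;                 /* a NoOp transfers no data */
    cp[n].cda = 0;
    return n + 1;
}
```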
413
414In order to minimize I/O overhead, a device driver should use the
415DOIO_REPORT_ALL option only if the device can report intermediate interrupt
416information prior to device-end that the device driver urgently relies on. In this
417case all I/O interruptions are presented to the device driver until final
418status is recognized.
419
420If a device is able to recover from asynchronously presented I/O errors, it can
421perform overlapping I/O using the DOIO_EARLY_NOTIFICATION flag. While some
422devices always report channel-end and device-end together, with a single
423interrupt, others present primary status (channel-end) when the channel is
424ready for the next I/O request and secondary status (device-end) when the data
425transmission has been completed at the device.
426
427The above flag allows exploiting this feature, e.g. for communication devices that
428can handle lost data on the network to allow for enhanced I/O processing.
429
430Unless the channel subsystem at any time presents a secondary status interrupt,
431exploiting this feature will cause only primary status interrupts to be
432presented to the device driver while overlapping I/O is performed. When a
433secondary status without error (alert status) is presented, this indicates
434successful completion for all overlapping ccw_device_start() requests that have
435been issued since the last secondary (final) status.
436
437Channel programs that intend to set the suspend flag on a channel command word
438(CCW) must start the I/O operation with the DOIO_ALLOW_SUSPEND option or the
439suspend flag will cause a channel program check. At the time the channel program
440becomes suspended an intermediate interrupt will be generated by the channel
441subsystem.
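Because a suspend flag without DOIO_ALLOW_SUSPEND causes a program check, a driver can check its channel program before starting it. This sketch scans for CCW_FLAG_SUSPEND (value assumed to be 0x02, as in asm/cio.h); the function name needs_allow_suspend() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of struct ccw1. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
} __attribute__((packed, aligned(8)));

#define CCW_FLAG_SUSPEND 0x02   /* assumed value, see asm/cio.h */

/* Return true if any of the n CCWs carries the suspend flag, i.e. the
 * program must be started with DOIO_ALLOW_SUSPEND. */
bool needs_allow_suspend(const struct ccw1 *cp, int n)
{
    for (int i = 0; i < n; i++)
        if (cp[i].flags & CCW_FLAG_SUSPEND)
            return true;
    return false;
}
```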
442
443ccw_device_resume() - Resume Channel Program Execution
444
445If a device driver chooses to suspend the current channel program execution by
446setting the CCW suspend flag on a particular CCW, the channel program execution
447is suspended. In order to resume channel program execution the CIO layer
448provides the ccw_device_resume() routine.
449
450int ccw_device_resume(struct ccw_device *cdev);
451
452cdev - ccw_device the resume operation is requested for
453
454The ccw_device_resume() function returns:
455
456 0 - suspended channel program is resumed
457-EBUSY - status pending
458-ENODEV - cdev invalid or not-operational subchannel
459-EINVAL - resume function not applicable
460-ENOTCONN - there is no I/O request pending for completion
461
462Usage Notes:
463Please have a look at the ccw_device_start() usage notes for more details on
464suspended channel programs.
465
466ccw_device_halt() - Halt I/O Request Processing
467
468Sometimes a device driver might need to stop the processing of
469a long-running channel program, or the device might require a halt
470subchannel (HSCH) I/O command to be issued initially. For those purposes the
471ccw_device_halt() command is provided.
472
473int ccw_device_halt(struct ccw_device *cdev,
474 unsigned long intparm);
475
476cdev : ccw_device the halt operation is requested for
477intparm : interruption parameter; value is only used if no I/O
478 is outstanding, otherwise the intparm associated with
479 the I/O request is returned
480
481The ccw_device_halt() function returns :
482
483 0 - successful completion or request successfully initiated
484-EBUSY - the device is currently busy, or status pending.
485-ENODEV - cdev invalid.
486-EINVAL - The device is not operational or the ccw device is not online.
487
488Usage Notes :
489
490A device driver may write a never-ending channel program by writing a channel
491program that at its end loops back to its beginning by means of a transfer in
492channel (TIC) command (CCW_CMD_TIC). Usually this is performed by network
493device drivers by setting the PCI CCW flag (CCW_FLAG_PCI). Once this CCW is
494executed a program controlled interrupt (PCI) is generated. The device driver
495can then perform an appropriate action. To interrupt an outstanding
496read to a network device (with or without PCI flag), a ccw_device_halt()
497is required to end the pending operation.
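The never-ending program described above can be sketched as n read CCWs followed by a TIC whose data address points back to the first CCW. This is a userspace illustration only. CCW_CMD_TIC is assumed to be 0x08, the read command code is passed in by the caller because it is device specific, and the helper build_read_ring() is hypothetical. A kernel driver would use real addresses instead of truncated pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct ccw1. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
} __attribute__((packed, aligned(8)));

#define CCW_FLAG_CC  0x40
#define CCW_FLAG_SLI 0x20
#define CCW_FLAG_PCI 0x08
#define CCW_CMD_TIC  0x08   /* assumed value of the kernel's CCW_CMD_TIC */

/* Build n read CCWs, each filling one bufsz-byte slice of buf, followed by a
 * TIC that branches back to cp[0]. cp must have room for n + 1 entries.
 * Returns the number of CCWs written. */
int build_read_ring(struct ccw1 *cp, int n, uint8_t read_cmd,
                    unsigned char *buf, uint16_t bufsz)
{
    for (int i = 0; i < n; i++) {
        cp[i].cmd_code = read_cmd;
        /* CC keeps the chain going, PCI requests a program controlled
         * interrupt for this CCW, SLI tolerates short reads. */
        cp[i].flags = CCW_FLAG_CC | CCW_FLAG_PCI | CCW_FLAG_SLI;
        cp[i].count = bufsz;
        cp[i].cda = (uint32_t)(uintptr_t)(buf + (size_t)i * bufsz);
    }
    /* TIC: its data address is the address of the next CCW to execute. */
    cp[n].cmd_code = CCW_CMD_TIC;
    cp[n].flags = 0;
    cp[n].count = 0;
    cp[n].cda = (uint32_t)(uintptr_t)&cp[0];
    return n + 1;
}
```

Such a program never reaches final status on its own, which is why ccw_device_halt() is needed to end it.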
498
499
500Miscellaneous Support Routines
501
502This chapter describes various routines to be used in a Linux/390 device
503driver programming environment.
504
505get_ccwdev_lock()
506
507Get the address of the device specific lock. This is then used in
508spin_lock() / spin_unlock() calls.
509
510
511__u8 ccw_device_get_path_mask(struct ccw_device *cdev);
512
513Get the mask of the paths currently available for cdev.
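The returned value is an 8-bit mask with one bit per channel path. This sketch counts the available paths and selects a single one. It assumes the usual S/390 convention that the leftmost bit (0x80) is path 0. The resulting one-bit mask could then be passed as the lpm argument of ccw_device_start() or read_conf_data() to restrict a request to that path; remember that an lpm of 0 means "use the opm". The function names are hypothetical.

```c
#include <stdint.h>

/* Count the paths present in an 8-bit path mask. */
int path_mask_count(uint8_t mask)
{
    int n = 0;

    while (mask) {
        n += mask & 1;
        mask >>= 1;
    }
    return n;
}

/* Return a mask selecting only the first available path (leftmost bit,
 * 0x80 = path 0), or 0 if no path is available. */
uint8_t path_mask_first(uint8_t mask)
{
    for (unsigned bit = 0x80; bit; bit >>= 1)
        if (mask & bit)
            return (uint8_t)bit;
    return 0;
}
```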
diff --git a/Documentation/s390/config3270.sh b/Documentation/s390/config3270.sh
new file mode 100644
index 000000000000..515e2f431487
--- /dev/null
+++ b/Documentation/s390/config3270.sh
@@ -0,0 +1,76 @@
1#!/bin/sh
2#
3# config3270 -- Autoconfigure /dev/3270/* and /etc/inittab
4#
5# Usage:
6# config3270
7#
8# Output:
9# /tmp/mkdev3270
10#
11# Operation:
12# 1. Run this script
13# 2. Run the script it produces: /tmp/mkdev3270
14# 3. Issue "telinit q" or reboot, as appropriate.
15#
16P=/proc/tty/driver/tty3270
17ROOT=
18D=$ROOT/dev
19SUBD=3270
20TTY=$SUBD/tty
21TUB=$SUBD/tub
22SCR=$ROOT/tmp/mkdev3270
23SCRTMP=$SCR.a
24GETTYLINE=:2345:respawn:/sbin/mingetty
25INITTAB=$ROOT/etc/inittab
26NINITTAB=$ROOT/etc/NEWinittab
27OINITTAB=$ROOT/etc/OLDinittab
28ADDNOTE=\\"# Additional mingettys for the 3270/tty* driver, tub3270 ---\\"
29
30if ! ls $P > /dev/null 2>&1; then
31 modprobe tub3270 > /dev/null 2>&1
32fi
33ls $P > /dev/null 2>&1 || exit 1
34
35# Initialize two files, one for /dev/3270 commands and one
36# to replace the /etc/inittab file (old one saved in OLDinittab)
37echo "#!/bin/sh" > $SCR || exit 1
38echo " " >> $SCR
39echo "# Script built by /sbin/config3270" >> $SCR
40if [ ! -d /dev/dasd ]; then
41 echo rm -rf "$D/$SUBD/*" >> $SCR
42fi
43echo "grep -v $TTY $INITTAB > $NINITTAB" > $SCRTMP || exit 1
44echo "echo $ADDNOTE >> $NINITTAB" >> $SCRTMP
45if [ ! -d /dev/dasd ]; then
46 echo mkdir -p $D/$SUBD >> $SCR
47fi
48
49# Now query the tub3270 driver for 3270 device information
50# and add appropriate mknod and mingetty lines to our files
51echo what=config > $P
52while read devno maj min;do
53 if [ $min = 0 ]; then
54 fsmaj=$maj
55 if [ ! -d /dev/dasd ]; then
56 echo mknod $D/$TUB c $fsmaj 0 >> $SCR
57 echo chmod 666 $D/$TUB >> $SCR
58 fi
59 elif [ $maj = CONSOLE ]; then
60 if [ ! -d /dev/dasd ]; then
61 echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
62 fi
63 else
64 if [ ! -d /dev/dasd ]; then
65 echo mknod $D/$TTY$devno c $maj $min >>$SCR
66 echo mknod $D/$TUB$devno c $fsmaj $min >> $SCR
67 fi
68 echo "echo t$min$GETTYLINE $TTY$devno >> $NINITTAB" >> $SCRTMP
69 fi
70done < $P
71
72echo mv $INITTAB $OINITTAB >> $SCRTMP || exit 1
73echo mv $NINITTAB $INITTAB >> $SCRTMP
74cat $SCRTMP >> $SCR
75rm $SCRTMP
76exit 0
diff --git a/Documentation/s390/crypto/crypto-API.txt b/Documentation/s390/crypto/crypto-API.txt
new file mode 100644
index 000000000000..78a77624a716
--- /dev/null
+++ b/Documentation/s390/crypto/crypto-API.txt
@@ -0,0 +1,83 @@
1crypto-API support for z990 Message Security Assist (MSA) instructions
2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3
4AUTHOR: Thomas Spatzier (tspat@de.ibm.com)
5
6
71. Introduction crypto-API
8~~~~~~~~~~~~~~~~~~~~~~~~~~
9See Documentation/crypto/api-intro.txt for an introduction/description of the
10kernel crypto API.
11According to api-intro.txt, support for z990 crypto instructions has been added
12in the algorithm api layer of the crypto API. Several files containing z990
13optimized implementations of crypto algorithms are placed in the
14arch/s390/crypto directory.
15
16
172. Probing for availability of MSA
18~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19It should be possible to use kernels with the z990 crypto implementations both
20on machines with MSA available and on those without MSA (pre-z990 or z990
21without MSA). Therefore a simple probing mechanism has been implemented:
22In the init function of each crypto module the availability of MSA and of the
23respective crypto algorithm in particular will be tested. If the algorithm is
24available the module will load and register its algorithm with the crypto API.
25
26If the respective crypto algorithm is not available, the init function will
27return -ENOSYS. In that case a fallback to the standard software implementation
28of the crypto algorithm must be taken ( -> the standard crypto modules are
29also built when compiling the kernel).
30
31
323. Ensuring z990 crypto module preference
33~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34If z990 crypto instructions are available the optimized modules should be
35preferred instead of standard modules.
36
373.1. compiled-in modules
38~~~~~~~~~~~~~~~~~~~~~~~~
39For compiled-in modules it has to be ensured that the z990 modules are linked
40before the standard crypto modules. Then, on system startup the init functions
41of z990 crypto modules will be called first and query for availability of z990
42crypto instructions. If the instruction is available, the z990 module will register
43its crypto algorithm implementation -> the load of the standard module will fail
44since the algorithm is already registered.
45If the z990 crypto instruction is not available, the load of the z990 module will
46fail -> the standard module will load and register its algorithm.
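The "first registration wins" ordering can be modeled in a few lines. This is a toy userspace model, not the crypto API: the registry, the model_* function names, and the use of -EEXIST for a duplicate registration are all assumptions made for illustration. Only the -ENOSYS return for a missing MSA comes from the text above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALGS 8

/* Toy registry: algorithm name -> module that registered it. */
static struct { const char *name; const char *module; } registry[MAX_ALGS];
static int nr_algs;

void model_reset(void)
{
    nr_algs = 0;
}

/* First registration of a name wins; later ones fail (modeled as -EEXIST). */
int model_register_alg(const char *name, const char *module)
{
    for (int i = 0; i < nr_algs; i++)
        if (strcmp(registry[i].name, name) == 0)
            return -EEXIST;
    if (nr_algs == MAX_ALGS)
        return -ENOMEM;
    registry[nr_algs].name = name;
    registry[nr_algs].module = module;
    nr_algs++;
    return 0;
}

/* z990 module init: refuse to load (-ENOSYS) when MSA is missing. */
int model_sha1_z990_init(int msa_available)
{
    if (!msa_available)
        return -ENOSYS;
    return model_register_alg("sha1", "sha1_z990");
}

/* Standard software implementation's init. */
int model_sha1_generic_init(void)
{
    return model_register_alg("sha1", "sha1");
}

/* Name of the module providing an algorithm, or NULL. */
const char *model_lookup(const char *name)
{
    for (int i = 0; i < nr_algs; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].module;
    return NULL;
}
```

Calling the z990 init before the generic init, which is what link order achieves for compiled-in modules, yields sha1_z990 on MSA machines and the standard sha1 elsewhere.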
47
483.2. dynamic modules
49~~~~~~~~~~~~~~~~~~~~
50A system administrator has to take care of giving preference to z990 crypto
51modules. If MSA is available appropriate lines have to be added to
52/etc/modprobe.conf.
53
54Example: z990 crypto instruction for SHA1 algorithm is available
55
56 add the following line to /etc/modprobe.conf (assuming the
57 z990 crypto module for SHA1 is called sha1_z990):
58
59 alias sha1 sha1_z990
60
61 -> when the sha1 algorithm is requested through the crypto API
62 (which has a module autoloader) the z990 module will be loaded.
63
64TBD: a userspace module probing mechanism
65 something like 'probe sha1 sha1_z990 sha1' in modprobe.conf
66 -> try module sha1_z990; if it fails to load, load the standard module sha1
67 the 'probe' statement is currently not supported in modprobe.conf
68
69
704. Currently implemented z990 crypto algorithms
71~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
72The following crypto algorithms with z990 MSA support are currently implemented.
73The name of each algorithm under which it is registered in crypto API and the
74name of the respective module is given in square brackets.
75
76- SHA1 Digest Algorithm [sha1 -> sha1_z990]
77- DES Encrypt/Decrypt Algorithm (64bit key) [des -> des_z990]
78- Triple DES Encrypt/Decrypt Algorithm (128bit key) [des3_ede128 -> des_z990]
79- Triple DES Encrypt/Decrypt Algorithm (192bit key) [des3_ede -> des_z990]
80
81In order to load, for example, the sha1_z990 module when the sha1 algorithm is
82requested (see 3.2.) add 'alias sha1 sha1_z990' to /etc/modprobe.conf.
83
diff --git a/Documentation/s390/driver-model.txt b/Documentation/s390/driver-model.txt
new file mode 100644
index 000000000000..19461958e2bd
--- /dev/null
+++ b/Documentation/s390/driver-model.txt
@@ -0,0 +1,265 @@
1S/390 driver model interfaces
2-----------------------------
3
41. CCW devices
5--------------
6
7All devices which can be addressed by means of ccws are called 'CCW devices' -
8even if they aren't actually driven by ccws.
9
10All ccw devices are accessed via a subchannel; this is reflected in the
11structures under root/:
12
13root/
14 - sys
15 - legacy
16 - css0/
17 - 0.0.0000/0.0.0815/
18 - 0.0.0001/0.0.4711/
19 - 0.0.0002/
20 ...
21
22In this example, device 0815 is accessed via subchannel 0, device 4711 via
23subchannel 1, and subchannel 2 is a non-I/O subchannel.
24
25You should address a ccw device via its bus id (e.g. 0.0.4711); the device can
26be found under bus/ccw/devices/.
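Tools that work with these sysfs names often need to split a bus id into its parts. This sketch parses the three hexadecimal fields of an id such as 0.0.4711. It assumes that the first two fields are the channel subsystem id and the subchannel set id (always 0 in the examples here), and that the last is the 16-bit device number. The function name parse_bus_id() is hypothetical.

```c
#include <stdio.h>

/* Parse a bus id of the form "<cssid>.<ssid>.<devno>" (all hexadecimal),
 * e.g. "0.0.4711". Returns 0 on success, -1 on malformed input. */
int parse_bus_id(const char *s, unsigned *cssid, unsigned *ssid,
                 unsigned *devno)
{
    char tail;

    /* Exactly three fields and nothing after them. */
    if (sscanf(s, "%x.%x.%x%c", cssid, ssid, devno, &tail) != 3)
        return -1;
    if (*devno > 0xffff)   /* device numbers cannot exceed 65535 */
        return -1;
    return 0;
}
```

The same parser also accepts subchannel directory names such as 0.0.0001 under css0/.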
27
28All ccw devices export some data via sysfs.
29
30cutype: The control unit type / model.
31
32devtype: The device type / model, if applicable.
33
34availability: Can be 'good' or 'boxed'; 'no path' or 'no device' for
35 disconnected devices.
36
37online: An interface to set the device online and offline.
38 In the special case of the device being disconnected (see the
39 notify function under 1.2), piping 0 to online will forcibly delete
40 the device.
41
42The device drivers can add entries to export per-device data and interfaces.
43
44There is also some data exported on a per-subchannel basis (see under
45bus/css/devices/):
46
47chpids: Via which chpids the device is connected.
48
49pimpampom: The path installed, path available and path operational masks.
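To show how the three masks relate, here is a minimal user-space sketch that derives the usable paths from a pimpampom value. It rests on three assumptions. First, the attribute prints the masks as two-digit hex values separated by blanks (e.g. "c0 c0 ff"). Second, each bit stands for one channel path, with the leftmost bit corresponding to the first entry in 'chpids'. Third, a path is usable only when its bit is set in all three masks. The function names are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse "pim pam pom" (assumed format: three
 * two-digit hex masks) and return the mask of paths that are installed,
 * available and operational, or -1 if the text cannot be parsed.
 */
int usable_path_mask(const char *pimpampom)
{
    unsigned int pim, pam, pom;

    if (sscanf(pimpampom, "%2x %2x %2x", &pim, &pam, &pom) != 3)
        return -1;
    return (int)(pim & pam & pom & 0xff);
}

/* Count the set bits in an 8-bit path mask. */
int count_paths(int mask)
{
    int n = 0;

    for (; mask > 0; mask >>= 1)
        n += mask & 1;
    return n;
}
```

For example, "c0 40 ff" would leave only the second installed path usable. Its bit is the only one set in both the installed and the available mask.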

There also might be additional data, for example for block devices.


1.1 Bringing up a ccw device
----------------------------

This is done in several steps.

a. Each driver can provide one or more parameter interfaces where parameters can
   be specified. These interfaces are also the driver's responsibility.
b. After a. has been performed, if necessary, the device is finally brought up
   via the 'online' interface.
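Step b. can also be driven from a user-space program instead of a shell. The following is a minimal C sketch that writes "1" or "0" to a device's 'online' attribute. It assumes sysfs is mounted at /sys. The helper names ccw_online_path(), write_attr() and ccw_set_online() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build "<root>/bus/ccw/devices/<busid>/online"; -1 if it does not fit. */
int ccw_online_path(char *buf, size_t len, const char *root, const char *busid)
{
    int n = snprintf(buf, len, "%s/bus/ccw/devices/%s/online", root, busid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short value such as "1\n" to an attribute file. */
int write_attr(const char *path, const char *value)
{
    size_t n = strlen(value);
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    if (fwrite(value, 1, n, f) != n) {
        fclose(f);
        return -1;
    }
    /*
     * stdio buffers the data, so the actual write() (and any error the
     * attribute rejects it with) happens here, at close time.
     */
    return fclose(f) == 0 ? 0 : -1;
}

/* Set a device online (on != 0) or offline, assuming sysfs is at /sys. */
int ccw_set_online(const char *busid, int on)
{
    char path[256];

    if (ccw_online_path(path, sizeof(path), "/sys", busid) < 0)
        return -1;
    return write_attr(path, on ? "1\n" : "0\n");
}
```

Checking the result of fclose() is deliberate. A rejected value only shows up once the buffered data is actually written to the attribute.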


1.2 Writing a driver for ccw devices
------------------------------------

The basic struct ccw_device and struct ccw_driver data structures can be found
under include/asm/ccwdev.h.

struct ccw_device {
	spinlock_t *ccwlock;
	struct ccw_device_private *private;
	struct ccw_device_id id;

	struct ccw_driver *drv;
	struct device dev;
	int online;

	void (*handler) (struct ccw_device *dev, unsigned long intparm,
			 struct irb *irb);
};

struct ccw_driver {
	struct module *owner;
	struct ccw_device_id *ids;
	int (*probe) (struct ccw_device *);
	int (*remove) (struct ccw_device *);
	int (*set_online) (struct ccw_device *);
	int (*set_offline) (struct ccw_device *);
	int (*notify) (struct ccw_device *, int);
	struct device_driver driver;
	char *name;
};

The 'private' field contains data needed for internal i/o operation only, and
is not available to the device driver.

Each driver should declare in a MODULE_DEVICE_TABLE which CU types/models
and/or device types/models it is interested in. This information can later be
found in the struct ccw_device_id fields:

struct ccw_device_id {
	__u16	match_flags;

	__u16	cu_type;
	__u16	dev_type;
	__u8	cu_model;
	__u8	dev_model;

	unsigned long driver_info;
};

The functions in ccw_driver should be used in the following way:
probe:   This function is called by the device layer for each device the driver
         is interested in. The driver should only allocate private structures
         to put in dev->driver_data and create attributes (if needed). Also,
         the interrupt handler (see below) should be set here.

int (*probe) (struct ccw_device *cdev);

Parameters: cdev - the device to be probed.


remove:  This function is called by the device layer upon removal of the driver,
         the device or the module. The driver should perform cleanups here.

int (*remove) (struct ccw_device *cdev);

Parameters: cdev - the device to be removed.


set_online: This function is called by the common I/O layer when the device is
            activated via the 'online' attribute. The driver should finally
            set up and activate the device here.

int (*set_online) (struct ccw_device *cdev);

Parameters: cdev - the device to be activated. The common layer has
                   verified that the device is not already online.


set_offline: This function is called by the common I/O layer when the device is
             de-activated via the 'online' attribute. The driver should shut
             down the device, but not de-allocate its private data.

int (*set_offline) (struct ccw_device *cdev);

Parameters: cdev - the device to be deactivated. The common layer has
                   verified that the device is online.


notify: This function is called by the common I/O layer for some state changes
        of the device.
        Signalled to the driver are:
        * In online state, device detached (CIO_GONE) or last path gone
          (CIO_NO_PATH). The driver must return !0 to keep the device; for
          return code 0, the device will be deleted as usual (also when no
          notify function is registered). If the driver wants to keep the
          device, it is moved into disconnected state.
        * In disconnected state, device operational again (CIO_OPER). The
          common I/O layer performs some sanity checks on device number and
          Device / CU to be reasonably sure if it is still the same device.
          If not, the old device is removed and a new one registered. The
          return code of the notify function signals whether the device
          driver wants the device back: !0 to keep it, 0 to have the device
          removed and re-registered.

int (*notify) (struct ccw_device *cdev, int event);

Parameters: cdev  - the device whose state changed.
            event - the event that happened. This can be one of CIO_GONE,
                    CIO_NO_PATH or CIO_OPER.

The handler field of the struct ccw_device is meant to be set to the interrupt
handler for the device. In order to accommodate drivers which use several
distinct handlers (e.g. multi subchannel devices), this is a member of ccw_device
instead of ccw_driver.
The handler is registered with the common layer during set_online() processing
before the driver is called, and is deregistered during set_offline() after the
driver has been called. Also, path grouping is performed after registering, and
the path group is disbanded before deregistering (if applicable).

void (*handler) (struct ccw_device *dev, unsigned long intparm, struct irb *irb);

Parameters: dev     - the device the handler is called for
            intparm - the intparm which allows the device driver to identify
                      the i/o the interrupt is associated with, or to recognize
                      the interrupt as unsolicited.
            irb     - interruption response block which contains the accumulated
                      status.

The device driver is called from the common ccw_device layer and can retrieve
information about the interrupt from the irb parameter.


1.3 ccwgroup devices
--------------------

The ccwgroup mechanism is designed to handle devices consisting of multiple ccw
devices, like lcs or ctc.

The ccw driver provides a 'group' attribute. Piping bus ids of ccw devices to
this attribute creates a ccwgroup device consisting of these ccw devices (if
possible). This ccwgroup device can be set online or offline just like a normal
ccw device.
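The string piped into the 'group' attribute lists the member devices. The following minimal C sketch builds it from an array of bus ids. It assumes the ids are separated by commas (e.g. "0.0.0600,0.0.0601" for a read/write channel pair); check the driver you use for its exact expectations. build_group_string() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: join ccw bus ids for a 'group' attribute, assuming
 * comma separators. Returns 0 on success, -1 if n < 1 or buf is too small
 * (in which case buf may hold a partial, unterminated result).
 */
int build_group_string(char *buf, size_t len, const char *ids[], int n)
{
    size_t used = 0;
    int i;

    if (n < 1 || len == 0)
        return -1;
    for (i = 0; i < n; i++) {
        size_t idlen = strlen(ids[i]);
        size_t sep = (i > 0) ? 1 : 0;

        if (used + sep + idlen + 1 > len)    /* +1 for the final NUL */
            return -1;
        if (sep)
            buf[used++] = ',';
        memcpy(buf + used, ids[i], idlen);
        used += idlen;
    }
    buf[used] = '\0';
    return 0;
}
```

The result is what would be written to the attribute in a single write. The group is created from the complete list at once.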

Each ccwgroup device also provides an 'ungroup' attribute to destroy the device
again (only when offline). This is a generic ccwgroup mechanism (the driver does
not need to implement anything beyond normal removal routines).

To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in
mind that most drivers will need to implement both a ccwgroup and a ccw driver
(unless you have a meta ccw driver, like cu3088 for lcs and ctc).


2. Channel paths
-----------------

Channel paths show up, like subchannels, under the channel subsystem root (css0)
and are called 'chp0.<chpid>'. They have no driver and do not belong to any bus.
Please note that, unlike /proc/chpids in 2.4, the channel path objects reflect
only the logical state and not the physical state, since we cannot track the
latter consistently due to lacking machine support (and we don't need to be
aware of it anyway).

status - Can be 'online' or 'offline'.
         Piping 'on' or 'off' sets the chpid logically online/offline.
         Piping 'on' to an online chpid triggers path reprobing for all devices
         the chpid connects to. This can be used to force the kernel to re-use
         a channel path the user knows to be online, but the machine hasn't
         created a machine check for.


3. System devices
-----------------

Note: cpus may yet be added here.

3.1 xpram
---------

xpram shows up under sys/ as 'xpram'.


4. Other devices
----------------

4.1 Netiucv
-----------

The netiucv driver creates an attribute 'connection' under
bus/iucv/drivers/netiucv. Piping to this attribute creates a new netiucv
connection to the specified host.

Netiucv connections show up under devices/iucv/ as "netiucv<ifnum>". The interface
number is assigned sequentially to the connections defined via the 'connection'
attribute.

user   - shows the connection partner.

buffer - maximum buffer size.
         Pipe to it to change buffer size.

diff --git a/Documentation/s390/monreader.txt b/Documentation/s390/monreader.txt
new file mode 100644
index 000000000000..d843bb04906e
--- /dev/null
+++ b/Documentation/s390/monreader.txt
@@ -0,0 +1,197 @@

Date : 2004-Nov-26
Author: Gerald Schaefer (geraldsc@de.ibm.com)


        Linux API for read access to z/VM Monitor Records
        =================================================


Description
===========
This item delivers a new Linux API in the form of a misc char device that is
usable from user space and allows read access to the z/VM Monitor Records
collected by the *MONITOR System Service of z/VM.


User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured in
order to allow IUCV connections to the *MONITOR service, i.e. it needs the
IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
This item will use the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.

There are two options for being able to load the monitor DCSS (examples assume
that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).

See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.

1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.

Example: DEF STOR CONFIG 0.140M 200M.200M

This defines two blocks of storage: the first is 140MB in size and begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.

2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.

Example: DEF STOR 140M

This defines a storage size of 140MB for your guest; the parameter "mem=160M"
is added to the parmfile.
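The address arithmetic behind both options can be checked mechanically. The following C sketch converts BEGPAG/ENDPAG page numbers into byte addresses and tests the two storage layouts described above. For the example DCSS from 144 MB to 152 MB, BEGPAG would be 0x9000 and ENDPAG 0x97FF (144 MB / 4K = 0x9000, and the last page before 152 MB is 0x97FF). It assumes that "ends at 152 MB" means the last byte lies just below 152 MB. All function and macro names are hypothetical.

```c
#include <stdint.h>

#define PAGE_4K  4096ULL
#define MIB      (1024ULL * 1024ULL)

/* First byte of the DCSS, from BEGPAG (a 4K page number). */
uint64_t dcss_first_byte(uint64_t begpag)
{
    return begpag * PAGE_4K;
}

/* Last byte of the DCSS: ENDPAG is the last page, so include all of it. */
uint64_t dcss_last_byte(uint64_t endpag)
{
    return (endpag + 1) * PAGE_4K - 1;
}

/*
 * 1st option (DEF STOR CONFIG 0.<blk1_size> <blk2_start>.<blk2_size>):
 * block 1 must be at least 64MB, and the DCSS must lie in the hole
 * between the end of block 1 and the start of block 2.
 */
int option1_ok(uint64_t blk1_size, uint64_t blk2_start,
               uint64_t dcss_first, uint64_t dcss_last)
{
    return blk1_size >= 64 * MIB && blk1_size <= dcss_first &&
           dcss_last < blk2_start;
}

/*
 * 2nd option (DEF STOR <stor> plus mem=<mem>): guest storage ends below
 * the DCSS, and mem= is greater than the DCSS end address.
 */
int option2_ok(uint64_t stor, uint64_t mem,
               uint64_t dcss_first, uint64_t dcss_last)
{
    return stor <= dcss_first && mem > dcss_last;
}
```

With the example values, DEF STOR CONFIG 0.140M 200M.200M and DEF STOR 140M with mem=160M both pass. A 150MB guest under option 2 would not, because its storage overlaps the DCSS.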


User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
in the parmfile.

The default name for the DCSS is "MONDCSS" if none is specified. If there are
other users already connected to the *MONITOR service (e.g. Performance
Toolkit), the monitor DCSS is already defined and you have to use the same
DCSS. The CP command Q MONITOR (Class E privileged) shows the name of the
monitor DCSS, if already defined, and the users connected to the *MONITOR
service.
Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
DCSS if your z/VM doesn't have one already; you need Class E privileges to
define and save a DCSS.

Example:
--------
modprobe monreader mondcss=MYDCSS

This loads the module and sets the DCSS name to "MYDCSS".

NOTE:
-----
This API provides no interface to control the *MONITOR service, e.g. to specify
which data should be collected. This can be done by the CP command MONITOR
(Class E privileged); see "CP Command and Utility Reference".

Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.

Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
To do so, you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.
Typing cat /sys/class/misc/monreader/dev will give an output of the form
<major>:<minor>. The device node can be created via the mknod command: enter
mknod <name> c <major> <minor>, where <name> is the name of the device node
to be created.

Example:
--------
# modprobe monreader
# cat /sys/class/misc/monreader/dev
10:63
# mknod /dev/monreader c 10 63

This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.
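A program that needs the numbers itself can read them from the same sysfs file. Here is a small C sketch that parses the "<major>:<minor>" text. The function names are hypothetical.

```c
#include <stdio.h>

/* Parse "<major>:<minor>" as printed in a sysfs 'dev' file. */
int parse_dev_numbers(const char *text, unsigned int *major,
                      unsigned int *minor)
{
    return sscanf(text, "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Read and parse a 'dev' file, e.g. /sys/class/misc/monreader/dev. */
int read_dev_numbers(const char *path, unsigned int *major,
                     unsigned int *minor)
{
    char line[32];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_dev_numbers(line, major, minor);
}
```

With the numbers in hand, a program running as root could create the node with mknod(2). It would pass S_IFCHR and makedev(major, minor), which is declared in <sys/sysmacros.h> on glibc. This is the C counterpart of the mknod command in the example above.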

File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.

Read:
-----
Reading from the device provides a 12-byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end addresses can be used
to determine the size of the record set; the end address is the address of the
last byte of data. The start address is needed to handle "end-of-frame" records
correctly (domain 1, record 13), i.e. it can be used to determine the record
start offset relative to a 4K page (frame) boundary.
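The size and frame-offset calculations above are simple arithmetic on the two addresses. The following C sketch decodes them from a 12-byte MCE. It assumes that bytes 4 to 7 hold the start address and bytes 8 to 11 the end address, in big-endian byte order as on s390. The first four bytes (type and domain information) are not decoded. This layout is an assumption here; Appendix A of the "z/VM Performance" book is the authority. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define MCE_SIZE 12

/* Read a big-endian 32-bit value (s390 byte order) from p. */
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

struct mce_info {
    uint32_t start;      /* address of the first byte of the record set */
    uint32_t end;        /* address of the last byte of the record set */
    uint32_t size;       /* end - start + 1, since end is inclusive */
    uint32_t frame_off;  /* offset of start within its 4K frame */
};

/* Decode one MCE; assumes bytes 4..7 = start and 8..11 = end address. */
int mce_decode(const unsigned char mce[MCE_SIZE], struct mce_info *info)
{
    info->start = be32(mce + 4);
    info->end = be32(mce + 8);
    if (info->end < info->start)
        return -1;
    info->size = info->end - info->start + 1;
    info->frame_off = info->start & 0xfff;
    return 0;
}
```

Because the end address points at the last byte rather than one past it, the size includes a "+ 1".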

See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html

The layout of the data stream provided by the monreader device is as follows:
...
<0 byte read>
<first MCE>              \
<first set of records>    |
...                       |- data set
<last MCE>                |
<last set of records>    /
<0 byte read>
...

There may be more than one combination of MCE and corresponding record set
within one data set, and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set was
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.

The maximum size of a data set can be as large as the size of the
monitor DCSS, so size the buffer adequately or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.

As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:

EIO:       reply failed, read data is invalid and the application
           should discard the data read since the last successful read with
           0 size.
EFAULT:    copy_to_user failed, read data is invalid and the application should
           discard the data read since the last successful read with 0 size.
EAGAIN:    occurs on a non-blocking read if there is no data available at the
           moment. There is no data missing or corrupted, just try again or
           rather use polling for non-blocking reads.
EOVERFLOW: message limit reached, the data read since the last successful
           read with 0 size is valid but subsequent records may be missing.

In the last case (EOVERFLOW) there may be missing data; in the first two cases
(EIO, EFAULT) there will be missing data. It is up to the application whether
it continues reading subsequent data or exits.
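The reading rules above can be combined into one loop. Here is a minimal user-space C sketch, assuming fd is an open monreader descriptor (blocking or non-blocking). It collects one complete data set up to the closing 0 byte read and follows the errno rules listed above. read_data_set() is a hypothetical name, and the EOVERFLOW policy (keep the data, raise a flag, go on reading) is one possible choice, not a prescribed one.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read one complete data set (MCEs plus records) into buf.
 * Returns its length, or -1 with errno set; on -1 the caller must
 * discard buf. *overflow is set to 1 if EOVERFLOW was seen: the data is
 * valid, but later records may be missing.
 */
ssize_t read_data_set(int fd, unsigned char *buf, size_t cap, int *overflow)
{
    size_t used = 0;

    *overflow = 0;
    for (;;) {
        ssize_t n;

        if (used == cap) {
            /* Cannot tell whether the set is complete: buffer too small. */
            errno = ENOBUFS;
            return -1;
        }
        n = read(fd, buf + used, cap - used);
        if (n > 0) {
            used += (size_t)n;
            continue;
        }
        if (n == 0)
            return (ssize_t)used;      /* 0 byte read closes the data set */
        if (errno == EINTR)
            continue;
        if (errno == EOVERFLOW) {
            *overflow = 1;             /* keep the data, note possible gap */
            continue;
        }
        if (errno == EAGAIN) {
            /* Non-blocking descriptor: wait until data arrives. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };

            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                     /* EIO, EFAULT, ...: discard buf */
    }
}
```

The buffer should be sized to at least the monitor DCSS size reported in syslog. Whether to call the function again after it returns -1 remains an application decision, as described above.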

Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the *MONITOR service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.

NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will count toward the message limit, i.e. opening the device without reading
from it will eventually provoke the "message limit reached" error (EOVERFLOW
error code).

diff --git a/Documentation/s390/s390dbf.txt b/Documentation/s390/s390dbf.txt
new file mode 100644
index 000000000000..2d1cd939b4df
--- /dev/null
+++ b/Documentation/s390/s390dbf.txt
@@ -0,0 +1,615 @@
S390 Debug Feature
==================

files: arch/s390/kernel/debug.c
       include/asm-s390/debug.h

Description:
------------
The goal of this feature is to provide a kernel debug logging API
in which log records can be stored efficiently in memory and each component
(e.g. device drivers) can have its own separate debug log.
One purpose of this is to inspect the debug logs after a production system crash
in order to analyze the reason for the crash.
If the system still runs but only a subcomponent which uses dbf fails,
it is possible to look at the debug logs on a live system via the Linux proc
filesystem.
The debug feature may also be very useful for kernel and driver development.

Design:
-------
Kernel components (e.g. device drivers) can register themselves at the debug
feature with the function call debug_register(). This function initializes a
debug log for the caller. Each debug log has a number of debug areas, of which
exactly one is active at a time. Each debug area consists of contiguous pages
in memory. The debug areas store debug entries (log records), which are
written by event- and exception-calls.

An event-call writes the specified debug entry to the active debug
area and updates the log pointer for the active area. If the end
of the active debug area is reached, a wrap around is done (ring buffer)
and the next debug entry will be written at the beginning of the active
debug area.

An exception-call writes the specified debug entry to the log and
switches to the next debug area. This is done in order to be sure
that the records which describe the origin of the exception are not
overwritten when a wrap around for the current area occurs.

The debug areas themselves are also ordered in the form of a ring buffer.
When an exception is thrown in the last debug area, the following debug
entries are then written again in the very first area.
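The two-level ring structure described above can be modelled in a few lines. The following user-space C sketch uses a fixed number of areas, each holding a fixed number of entries. An event writes into the active area and wraps within it. An exception writes its entry and then makes the next area active, wrapping from the last area back to the first. This is a simplified model for illustration, not the kernel's debug.c. The sizes, struct and function names are hypothetical, and each entry is reduced to a single int.

```c
/*
 * Model of the debug feature's layout (not the kernel implementation):
 * NR_AREAS areas used as a ring, each with ENTRIES_PER_AREA slots used
 * as a ring. Sizes are arbitrary values chosen for illustration.
 */
#define NR_AREAS          4
#define ENTRIES_PER_AREA  8

struct dbf_model {
    int entries[NR_AREAS][ENTRIES_PER_AREA]; /* payload: one int per entry */
    int active_area;                         /* area currently written */
    int active_entry[NR_AREAS];              /* next slot in each area */
};

/* Event: write into the active area, wrapping around inside that area. */
void model_event(struct dbf_model *m, int value)
{
    int a = m->active_area;

    m->entries[a][m->active_entry[a]] = value;
    m->active_entry[a] = (m->active_entry[a] + 1) % ENTRIES_PER_AREA;
}

/*
 * Exception: write the entry, then switch to the next area so that later
 * events cannot overwrite the records leading up to the exception.
 */
void model_exception(struct dbf_model *m, int value)
{
    model_event(m, value);
    m->active_area = (m->active_area + 1) % NR_AREAS;
}
```

With 8 slots per area, ten events leave entries 9 and 10 in the first two slots of area 0. A subsequent exception is written into area 0 and moves all further logging to area 1.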

There are three versions of the event- and exception-calls: one for
logging raw data, one for text and one for numbers.

Each debug entry contains the following data:

- Timestamp
- Cpu-Number of calling task
- Level of debug entry (0...6)
- Return Address to caller
- Flag, if entry is an exception or not

The debug logs can be inspected in a live system through entries in
the proc-filesystem. Under the path /proc/s390dbf there is
a directory for each registered component, which is named like the
corresponding component.

The directories contain files which represent different views
of the debug log. Each component can decide which views should be
used through registering them with the function debug_register_view().
Predefined views for hex/ascii, sprintf and raw binary data are provided.
It is also possible to define other views. The content of
a view can be inspected simply by reading the corresponding proc file.

All debug logs have an actual debug level (range from 0 to 6).
The default level is 3. Event and Exception functions have a 'level'
parameter. Only debug entries with a level that is lower than or equal
to the actual level are written to the log. This means, when
writing events, high priority log entries should have a low level
value whereas low priority entries should have a high one.
The actual debug level can be changed with the help of the proc-filesystem
through writing a number string "x" to the 'level' proc file which is
provided for every debug log. Debugging can be switched off completely
by using "-" on the 'level' proc file.

Example:

> echo "-" > /proc/s390dbf/dasd/level
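The level rules can be restated as code. The following user-space C sketch parses what a user writes to a 'level' file ("-" or a number from 0 to 6) and applies the "level <= actual level" filter. The value -1 for "switched off" is a stand-in chosen for this sketch; it makes every entry level fail the filter. The macro and function names are hypothetical.

```c
#include <stdlib.h>

#define DBF_MAX_LEVEL  6
#define DBF_OFF_LEVEL  (-1)   /* hypothetical stand-in for "-" (off) */

/*
 * Parse the text written to a 'level' file: "-" switches logging off,
 * "0".."6" sets the level. Returns the new level, or -2 if invalid.
 */
int parse_level(const char *s)
{
    char *end;
    long v;

    if (s[0] == '-' && (s[1] == '\0' || s[1] == '\n'))
        return DBF_OFF_LEVEL;
    v = strtol(s, &end, 10);
    if (end == s || (*end != '\0' && *end != '\n'))
        return -2;
    if (v < 0 || v > DBF_MAX_LEVEL)
        return -2;
    return (int)v;
}

/* An entry is written only if its level is <= the actual debug level. */
int level_accepts(int actual_level, int entry_level)
{
    return entry_level <= actual_level;
}
```

At the default level 3, entries of level 0 to 3 are kept and levels 4 to 6 are dropped. After "-", nothing is kept.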

It is also possible to deactivate the debug feature globally for every
debug log. You can change the behavior using 2 sysctl parameters in
/proc/sys/s390dbf:
There are currently 2 possible triggers, which stop the debug feature
globally. The first possibility is to use the "debug_active" sysctl. If
set to 1 the debug feature is running. If "debug_active" is set to 0 the
debug feature is turned off.
The second trigger which stops the debug feature is a kernel oops.
That prevents the debug feature from overwriting debug information recorded
before the oops. After an oops you can reactivate the debug feature
by piping 1 to /proc/sys/s390dbf/debug_active. Nevertheless, it's not
recommended to use an oopsed kernel in a production environment.
If you want to disallow the deactivation of the debug feature, you can use
the "debug_stoppable" sysctl. If you set "debug_stoppable" to 0 the debug
feature cannot be stopped. If the debug feature is already stopped, it
will stay deactivated.

Kernel Interfaces:
------------------

----------------------------------------------------------------------------
debug_info_t *debug_register(char *name, int pages_index, int nr_areas,
                             int buf_size);

Parameter:    name:        Name of debug log (e.g. used for proc entry)
              pages_index: 2^pages_index pages will be allocated per area
              nr_areas:    number of debug areas
              buf_size:    size of data area in each debug entry

Return Value: Handle for generated debug area
              NULL if register failed

Description:  Allocates memory for a debug log
              Must not be called within an interrupt handler
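To get a feel for the memory cost of a debug_register() call, the following C sketch does the arithmetic implied by its parameters: 2^pages_index pages per area, times nr_areas. The page size is passed in (4096 bytes on s390). The per-entry header that the kernel adds to buf_size is not modelled, so the full entry size is an explicit argument. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes in one debug area: 2^pages_index pages of page_size bytes. */
size_t area_bytes(size_t page_size, int pages_index)
{
    return page_size << pages_index;
}

/* Bytes requested for all areas of one debug log. */
size_t total_area_bytes(size_t page_size, int pages_index, int nr_areas)
{
    return area_bytes(page_size, pages_index) * (size_t)nr_areas;
}

/* How many entries of entry_size bytes (header + data) fit in one area. */
size_t entries_per_area(size_t page_size, int pages_index, size_t entry_size)
{
    return area_bytes(page_size, pages_index) / entry_size;
}
```

For instance, debug_register("test", 0, 4, 4) asks for four one-page areas, i.e. 16 KB with 4K pages.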

---------------------------------------------------------------------------
void debug_unregister (debug_info_t * id);

Parameter:    id: handle for debug log

Return Value: none

Description:  frees memory for a debug log
              Must not be called within an interrupt handler

---------------------------------------------------------------------------
void debug_set_level (debug_info_t * id, int new_level);

Parameter:    id:        handle for debug log
              new_level: new debug level

Return Value: none

Description:  Sets new actual debug level if new_level is valid.

---------------------------------------------------------------------------
void debug_stop_all(void);

Parameter:    none

Return Value: none

Description:  stops the debug feature if stopping is allowed. Currently
              used in case of a kernel oops.

---------------------------------------------------------------------------
debug_entry_t* debug_event (debug_info_t* id, int level, void* data,
                            int length);

Parameter:    id:     handle for debug log
              level:  debug level
              data:   pointer to data for debug entry
              length: length of data in bytes

Return Value: Address of written debug entry

Description:  writes debug entry to active debug area (if level <= actual
              debug level)

---------------------------------------------------------------------------
debug_entry_t* debug_int_event (debug_info_t * id, int level,
                                unsigned int data);
debug_entry_t* debug_long_event(debug_info_t * id, int level,
                                unsigned long data);

Parameter:    id:    handle for debug log
              level: debug level
              data:  integer value for debug entry

Return Value: Address of written debug entry

Description:  writes debug entry to active debug area (if level <= actual
              debug level)

---------------------------------------------------------------------------
debug_entry_t* debug_text_event (debug_info_t * id, int level,
                                 const char* data);

Parameter:    id:    handle for debug log
              level: debug level
              data:  string for debug entry

Return Value: Address of written debug entry

Description:  writes debug entry in ascii format to active debug area
              (if level <= actual debug level)

---------------------------------------------------------------------------
debug_entry_t* debug_sprintf_event (debug_info_t * id, int level,
                                    char* string,...);

Parameter:    id:     handle for debug log
              level:  debug level
              string: format string for debug entry
              ...:    varargs used as in sprintf()

Return Value: Address of written debug entry

Description:  writes debug entry with format string and varargs (longs) to
              active debug area (if level <= actual debug level).
              floats and long long datatypes cannot be used as varargs.

---------------------------------------------------------------------------

debug_entry_t* debug_exception (debug_info_t* id, int level, void* data,
                                int length);

Parameter:    id:     handle for debug log
              level:  debug level
              data:   pointer to data for debug entry
              length: length of data in bytes

Return Value: Address of written debug entry

Description:  writes debug entry to active debug area (if level <= actual
              debug level) and switches to next debug area

---------------------------------------------------------------------------
debug_entry_t* debug_int_exception (debug_info_t * id, int level,
                                    unsigned int data);
debug_entry_t* debug_long_exception(debug_info_t * id, int level,
                                    unsigned long data);

Parameter:    id:    handle for debug log
              level: debug level
              data:  integer value for debug entry

Return Value: Address of written debug entry

Description:  writes debug entry to active debug area (if level <= actual
              debug level) and switches to next debug area

---------------------------------------------------------------------------
debug_entry_t* debug_text_exception (debug_info_t * id, int level,
                                     const char* data);

Parameter:    id:    handle for debug log
              level: debug level
              data:  string for debug entry

Return Value: Address of written debug entry

Description:  writes debug entry in ascii format to active debug area
              (if level <= actual debug level) and switches to next debug
              area

---------------------------------------------------------------------------
debug_entry_t* debug_sprintf_exception (debug_info_t * id, int level,
                                        char* string,...);

Parameter:    id:     handle for debug log
              level:  debug level
              string: format string for debug entry
              ...:    varargs used as in sprintf()

Return Value: Address of written debug entry

Description:  writes debug entry with format string and varargs (longs) to
              active debug area (if level <= actual debug level) and
              switches to next debug area.
              floats and long long datatypes cannot be used as varargs.

---------------------------------------------------------------------------

int debug_register_view (debug_info_t * id, struct debug_view *view);

Parameter:    id:   handle for debug log
              view: pointer to debug view struct

Return Value: 0 : ok
              < 0: Error

Description:  registers new debug view and creates proc dir entry

---------------------------------------------------------------------------
int debug_unregister_view (debug_info_t * id, struct debug_view *view);

Parameter:    id:   handle for debug log
              view: pointer to debug view struct

Return Value: 0 : ok
              < 0: Error

Description:  unregisters debug view and removes proc dir entry


Predefined views:
-----------------

extern struct debug_view debug_hex_ascii_view;
extern struct debug_view debug_raw_view;
extern struct debug_view debug_sprintf_view;

Examples
--------

/*
 * hex_ascii- + raw-view Example
 */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <asm/debug.h>

static debug_info_t* debug_info;

static int __init init(void)
{
	/* register 4 debug areas with one page each and 4 byte data field */

	debug_info = debug_register ("test", 0, 4, 4 );
	if (!debug_info)	/* debug_register() returns NULL on failure */
		return -ENOMEM;
	debug_register_view(debug_info,&debug_hex_ascii_view);
	debug_register_view(debug_info,&debug_raw_view);

	/* the default level is 3; raise it so the level 4 entries are kept */
	debug_set_level(debug_info, 4);

	debug_text_event(debug_info, 4 , "one ");
	debug_int_exception(debug_info, 4, 4711);
	debug_event(debug_info, 3, &debug_info, 4);

	return 0;
}

static void __exit cleanup(void)
{
	debug_unregister (debug_info);
}

module_init(init);
module_exit(cleanup);

---------------------------------------------------------------------------

/*
 * sprintf-view Example
 */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <asm/debug.h>

static debug_info_t* debug_info;

static int __init init(void)
{
	/* register 4 debug areas with one page each and data field for */
	/* format string pointer + 2 varargs (= 3 * sizeof(long))        */

	debug_info = debug_register ("test", 0, 4, sizeof(long) * 3);
	if (!debug_info)	/* debug_register() returns NULL on failure */
		return -ENOMEM;
	debug_register_view(debug_info,&debug_sprintf_view);

	debug_sprintf_event(debug_info, 2 , "first event in %s:%i\n",__FILE__,__LINE__);
	debug_sprintf_exception(debug_info, 1, "pointer to debug info: %p\n",&debug_info);

	return 0;
}

static void __exit cleanup(void)
{
	debug_unregister (debug_info);
}

module_init(init);
module_exit(cleanup);



ProcFS Interface
----------------
Views to the debug logs can be investigated through reading the corresponding
proc-files:

Example:

> ls /proc/s390dbf/dasd
flush hex_ascii level raw
> cat /proc/s390dbf/dasd/hex_ascii | sort -k 2
00 00974733272:680099 2 - 02 0006ad7e 07 ea 4a 90 | ....
00 00974733272:682210 2 - 02 0006ade6 46 52 45 45 | FREE
00 00974733272:682213 2 - 02 0006adf6 07 ea 4a 90 | ....
00 00974733272:682281 1 * 02 0006ab08 41 4c 4c 43 | EXCP
01 00974733272:682284 2 - 02 0006ab16 45 43 4b 44 | ECKD
01 00974733272:682287 2 - 02 0006ab28 00 00 00 04 | ....
01 00974733272:682289 2 - 02 0006ab3e 00 00 00 20 | ...
01 00974733272:682297 2 - 02 0006ad7e 07 ea 4a 90 | ....
01 00974733272:684384 2 - 00 0006ade6 46 52 45 45 | FREE
01 00974733272:684388 2 - 00 0006adf6 07 ea 4a 90 | ....

See the section about predefined views for an explanation of the above output.
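A post-processing tool may want the fields of such lines rather than the raw text. The following C sketch parses the header part of one hex_ascii line. It assumes the columns are, in order: area number, seconds:microseconds timestamp, level, '*' for an exception or '-' for an event, CPU number, caller address in hex, and then the data bytes. This matches the entry contents listed in the Design section, but the column order is an assumption. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Header fields of one hex_ascii line, under the assumed column order. */
struct dbf_line {
    int area;
    unsigned long sec, usec;
    int level;
    int exception;          /* 1 for '*', 0 for '-' */
    int cpu;
    unsigned long caller;
};

/* Returns 0 on success, -1 if the line does not match the layout. */
int parse_dbf_line(const char *line, struct dbf_line *out)
{
    char flag;

    if (sscanf(line, "%d %lu:%lu %d %c %d %lx", &out->area, &out->sec,
               &out->usec, &out->level, &flag, &out->cpu,
               &out->caller) != 7)
        return -1;
    if (flag != '*' && flag != '-')
        return -1;
    out->exception = (flag == '*');
    return 0;
}
```

Sorting parsed lines by (sec, usec) yields the same chronological order as the sort command in the example above.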
387
388Changing the debug level
389------------------------
390
391Example:
392
393
394> cat /proc/s390dbf/dasd/level
3953
396> echo "5" > /proc/s390dbf/dasd/level
397> cat /proc/s390dbf/dasd/level
3985
399
400Flushing debug areas
401--------------------
402Debug areas can be flushed with piping the number of the desired
403area (0...n) to the proc file "flush". When using "-" all debug areas
404are flushed.
405
406Examples:
407
4081. Flush debug area 0:
409> echo "0" > /proc/s390dbf/dasd/flush
410
4112. Flush all debug areas:
412> echo "-" > /proc/s390dbf/dasd/flush

Stopping the debug feature
--------------------------
Example:

1. Check if stopping is allowed:
> cat /proc/sys/s390dbf/debug_stoppable
2. Stop the debug feature:
> echo 0 > /proc/sys/s390dbf/debug_active
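
Both steps can be combined in a program that only writes to debug_active when stopping is allowed. A minimal sketch, assuming that a nonzero value in debug_stoppable means stopping is allowed (this document does not spell out the values). dbf_stop() is a hypothetical name, and the caller passes in both paths:

```c
#include <stdio.h>

/* Hypothetical helper: check debug_stoppable, then write 0 to
 * debug_active. Returns 0 if the feature was stopped, -1 otherwise. */
static int dbf_stop(const char *stoppable_path, const char *active_path)
{
    FILE *f;
    int stoppable = 0;
    int ok;

    f = fopen(stoppable_path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &stoppable) != 1)
        stoppable = 0;
    fclose(f);

    /* assumption: 0 means stopping is not allowed */
    if (!stoppable)
        return -1;

    f = fopen(active_path, "w");
    if (!f)
        return -1;
    ok = fputs("0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```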

lcrash Interface
----------------
It is planned that the dump analysis tool lcrash will get an additional
command 's390dbf' to display all the debug logs. With this tool it will be
possible to investigate the debug logs on a live system as well as in a
memory dump taken after a system crash.

Investigating raw memory
------------------------
One last possibility to investigate the debug logs, both on a live
system and after a system crash, is to look at the raw memory under
VM or at the Service Element.
The anchor of the debug logs can be found through the
'debug_area_first' symbol in the System.map file. From there, follow
the pointers of the data structures defined in debug.h to find the
debug areas in memory.
Normally, modules which use the debug feature also have a global
variable holding the pointer to their debug log. Following this
pointer also leads to the debug log in memory.

For this method it is recommended to use '16 * x + 4' bytes (x = 0..n)
as the length of the data field in debug_register() in order to see
the debug entries well formatted.
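
The '16 * x + 4' recommendation is easiest to check as arithmetic on the size of one entry. The sketch below assumes a 12-byte entry header (an 8-byte clock/flags word plus a 4-byte caller pointer, as on 31-bit s390). That header size is an assumption not stated in this document, so check debug.h of your kernel. With 16 * x + 4 data bytes, every entry then occupies a whole multiple of 16 bytes, so consecutive entries start at the same column of a dump that shows 16 bytes per line. All names are hypothetical:

```c
#include <stddef.h>

/* Assumption: entry header size on 31-bit s390 (8-byte id word +
 * 4-byte caller pointer, packed). Verify against your debug.h. */
#define ASSUMED_HEADER_SIZE 12

/* Total bytes one entry occupies: header followed by the data field. */
static size_t dbf_entry_size(size_t header_size, size_t data_len)
{
    return header_size + data_len;
}

/* Recommended data field length for x = 0..n. */
static size_t dbf_recommended_len(size_t x)
{
    return 16 * x + 4;
}

/* Nonzero if every entry is a whole number of 16-byte dump lines,
 * so all entries start at the same column of the dump. */
static int dbf_fills_dump_lines(size_t header_size, size_t data_len)
{
    return dbf_entry_size(header_size, data_len) % 16 == 0;
}
```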

Predefined Views
----------------

There are three predefined views: hex_ascii, raw and sprintf.
The hex_ascii view shows the data field in hex and ASCII representation
(e.g. '45 43 4b 44 | ECKD').
The raw view returns a byte stream as the debug areas are stored in memory.
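
The hex_ascii representation can be sketched as follows. This is not the kernel's format function. It assumes that printable ASCII characters (0x20 to 0x7e) are shown as-is and every other byte as '.', and hex_ascii() is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

static const char hexdig[] = "0123456789abcdef";

/* Render data like "45 43 4b 44 | ECKD". Needs 4 * len + 3 bytes
 * ("xx " per byte, "| ", one char per byte, NUL). Returns the
 * string length, or -1 if out is too small. */
static int hex_ascii(const unsigned char *data, size_t len,
                     char *out, size_t size)
{
    char *p = out;

    if (size < len * 4 + 3)
        return -1;
    for (size_t i = 0; i < len; i++) {
        *p++ = hexdig[data[i] >> 4];
        *p++ = hexdig[data[i] & 0x0f];
        *p++ = ' ';
    }
    *p++ = '|';
    *p++ = ' ';
    /* assumption: only printable ASCII is shown, everything else is '.' */
    for (size_t i = 0; i < len; i++)
        *p++ = (data[i] >= 0x20 && data[i] < 0x7f) ? (char)data[i] : '.';
    *p = '\0';
    return (int)(p - out);
}
```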

The sprintf view formats the debug entries in the same way as the sprintf
function would. The sprintf event/exception functions write to the debug
entry a pointer to the format string (size = sizeof(long)) and, for each
vararg, a long value. So, e.g., for a debug entry with a format string plus
two varargs, a data area of (3 * sizeof(long)) bytes has to be requested
in the debug_register() function.

NOTE: When using the sprintf view, do NOT use any event/exception functions
other than the sprintf-event and -exception functions.
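
The sizing rule and the deferred formatting can be illustrated with a userspace mock. This is a sketch only. mock_store2() and mock_render2() are hypothetical stand-ins for what an sprintf event writes and what the sprintf view later does with it, restricted to two long arguments. Storing a pointer in a long assumes a platform such as Linux where a pointer fits in a long. Because only the pointer to the format string is stored, the string must still be valid when the view is read, which is why a string literal is used:

```c
#include <stdio.h>
#include <string.h>

/* Data length for debug_register() with one format string and
 * nargs varargs: one long per slot. */
static size_t sprintf_entry_size(unsigned int nargs)
{
    return (nargs + 1) * sizeof(long);
}

/* Hypothetical mock of an sprintf event with two arguments:
 * slot 0 = format pointer, slots 1..2 = arguments as longs. */
static void mock_store2(long *slots, const char *fmt, long a0, long a1)
{
    slots[0] = (long)fmt;   /* assumes a pointer fits in a long */
    slots[1] = a0;
    slots[2] = a1;
}

/* Hypothetical mock of the sprintf view: the format string is only
 * dereferenced now, at read time, so it must still exist. */
static int mock_render2(char *out, size_t size, const long *slots)
{
    const char *fmt = (const char *)slots[0];

    return snprintf(out, size, fmt, slots[1], slots[2]);
}
```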

The format of the hex_ascii and sprintf views is as follows:
- Number of area
- Timestamp (formatted as seconds and microseconds since 00:00:00 Coordinated
  Universal Time (UTC), January 1, 1970)
- Level of debug entry
- Exception flag (* = Exception)
- CPU number of calling task
- Return address to caller
- Data field

The format of the raw view is:
- Header as described in debug.h
- Data field

A typical line of the hex_ascii view will look like the following (the first
line is only for explanation and will not be displayed when cat'ing the view):

area  time               level exception cpu caller   data (hex + ascii)
--------------------------------------------------------------------------
00    00964419409:440690 1     -         00  88023fe
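
Because the header fields are separated by blanks, a line of the hex_ascii or sprintf view can be split into its fields with sscanf(). A minimal userspace sketch, assuming the layout of the sample lines in this document (parse_dbf_line() and struct dbf_line are hypothetical names):

```c
#include <stdio.h>
#include <string.h>

/* Header fields of one view line, in the order listed above. */
struct dbf_line {
    int area;
    unsigned long sec, usec;   /* timestamp */
    unsigned int level;
    char exception;            /* '*' = exception, '-' = event */
    int cpu;
    unsigned long caller;      /* return address, printed in hex */
    const char *data;          /* rest of the line, may be empty */
};

/* Hypothetical helper: split a view line into its fields.
 * Returns 0 on success, -1 if the header does not match. */
static int parse_dbf_line(const char *line, struct dbf_line *out)
{
    int n = 0;

    /* %n records how far the header reached, without skipping blanks */
    if (sscanf(line, "%d %lu:%lu %u %c %d %lx%n",
               &out->area, &out->sec, &out->usec, &out->level,
               &out->exception, &out->cpu, &out->caller, &n) != 7)
        return -1;
    if (out->exception != '*' && out->exception != '-')
        return -1;
    out->data = line + n + strspn(line + n, " \t");
    return 0;
}
```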

Defining views
--------------

Views are specified with the 'debug_view' structure. It defines callback
functions which are used for reading and writing the proc files:

struct debug_view {
	char name[DEBUG_MAX_PROCF_LEN];
	debug_prolog_proc_t* prolog_proc;
	debug_header_proc_t* header_proc;
	debug_format_proc_t* format_proc;
	debug_input_proc_t*  input_proc;
	void*                private_data;
};

where

typedef int (debug_header_proc_t) (debug_info_t* id,
                                   struct debug_view* view,
                                   int area,
                                   debug_entry_t* entry,
                                   char* out_buf);

typedef int (debug_format_proc_t) (debug_info_t* id,
                                   struct debug_view* view, char* out_buf,
                                   const char* in_buf);

typedef int (debug_prolog_proc_t) (debug_info_t* id,
                                   struct debug_view* view,
                                   char* out_buf);

typedef int (debug_input_proc_t) (debug_info_t* id,
                                  struct debug_view* view,
                                  struct file* file, const char* user_buf,
                                  size_t in_buf_size, loff_t* offset);

The "private_data" member can be used as a pointer to view-specific data.
It is not used by the debug feature itself.

The output when reading a debug proc file is structured like this:

"prolog_proc output"

"header_proc output 1"  "format_proc output 1"
"header_proc output 2"  "format_proc output 2"
"header_proc output 3"  "format_proc output 3"
...

When a view is read from the proc fs, the Debug Feature calls the
'prolog_proc' once for writing the prolog.
Then 'header_proc' and 'format_proc' are called for each
existing debug entry.
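
The call sequence can be mimicked in userspace to see how the outputs are concatenated. The types and callbacks below are simplified, hypothetical stand-ins (they are not debug_info_t, debug_entry_t or struct debug_view). Only the order of calls follows the description above:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct mock_entry {
    int area;
    unsigned int level;
    int value;
};

struct mock_view {
    int (*prolog_proc)(char *out, size_t size);
    int (*header_proc)(int area, const struct mock_entry *e,
                       char *out, size_t size);
    int (*format_proc)(const struct mock_entry *e, char *out, size_t size);
};

/* Append s to out (capacity size, used bytes so far), truncating
 * if needed; returns the new used length. */
static size_t append(char *out, size_t size, size_t used, const char *s)
{
    int n = snprintf(out + used, size - used, "%s", s);

    if (n < 0)
        return used;
    if ((size_t)n >= size - used)
        return size - 1;
    return used + (size_t)n;
}

/* Prolog once, then header + format for every entry. size must be > 0. */
static size_t mock_read_view(const struct mock_view *v,
                             const struct mock_entry *entries, size_t n,
                             char *out, size_t size)
{
    char line[128];
    size_t used = 0;

    out[0] = '\0';
    if (v->prolog_proc) {
        line[0] = '\0';
        v->prolog_proc(line, sizeof line);
        used = append(out, size, used, line);
    }
    for (size_t i = 0; i < n; i++) {
        if (v->header_proc) {
            line[0] = '\0';
            v->header_proc(entries[i].area, &entries[i], line, sizeof line);
            used = append(out, size, used, line);
        }
        if (v->format_proc) {
            line[0] = '\0';
            v->format_proc(&entries[i], line, sizeof line);
            used = append(out, size, used, line);
        }
    }
    return used;
}

/* Example callbacks for a toy view. */
static int toy_prolog(char *out, size_t size)
{
    return snprintf(out, size, "toy view\n");
}

static int toy_header(int area, const struct mock_entry *e,
                      char *out, size_t size)
{
    return snprintf(out, size, "%02d %u ", area, e->level);
}

static int toy_format(const struct mock_entry *e, char *out, size_t size)
{
    return snprintf(out, size, "value=%d\n", e->value);
}

static const struct mock_view toy_view = {
    toy_prolog, toy_header, toy_format
};
```

As in the real interface, the header callback only prints the per-entry prefix, and the format callback is responsible for the end of the line.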

The input_proc can be used to implement functionality for when the view is
written to (e.g. as with 'echo "0" > /proc/s390dbf/dasd/level').

For header_proc, the default function debug_dflt_header_fn(), which is
declared in debug.h, can be used. It produces the same header output as
the predefined views, e.g.:
00 00964419409:440761 2 - 00 88023ec

In order to see how to use the callback functions, check the implementation
of the default views!
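
The header fields can be derived from a raw entry roughly as follows. This sketch is modeled on the sample output, not copied from debug_dflt_header_fn(), and it rests on two assumptions. First, the 64-bit id word of an entry holds, from the most significant bit, clock:52, exception:1, level:3 and cpuid:8. Second, the clock is the s390 TOD clock, in which bit 51 counts microseconds. TOD_UNIX_EPOCH is the TOD value of January 1, 1970, 00:00 UTC, and format_header() is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* TOD clock value at 1970-01-01 00:00:00 UTC. */
#define TOD_UNIX_EPOCH 0x7d91048bca000000ULL

/* Hypothetical default-style header. Assumed id layout (MSB first):
 * clock:52, exception:1, level:3, cpuid:8. */
static int format_header(char *out, size_t size, int area,
                         unsigned long long id, unsigned long caller)
{
    /* >> 12 turns TOD units into microseconds and drops the flag bits */
    unsigned long long usecs = (id - TOD_UNIX_EPOCH) >> 12;
    unsigned int exception = (unsigned int)((id >> 11) & 0x1);
    unsigned int level = (unsigned int)((id >> 8) & 0x7);
    unsigned int cpu = (unsigned int)(id & 0xff);

    return snprintf(out, size, "%02d %011llu:%06llu %u %s %02u %08lx ",
                    area, usecs / 1000000, usecs % 1000000, level,
                    exception ? "*" : "-", cpu, caller);
}
```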

Example

#include <asm/debug.h>

#define UNKNOWNSTR "data: %08x\n"

static const char *messages[] =
{"This error...........\n",
 "That error...........\n",
 "Problem..............\n",
 "Something went wrong.\n",
 "Everything ok........\n",
 NULL
};

static int debug_test_format_fn(
	debug_info_t *id, struct debug_view *view,
	char *out_buf, const char *in_buf
)
{
	int rc = 0;

	/* the data field holds an int message number */
	if (id->buf_size >= 4) {
		int msg_nr = *((int *)in_buf);
		/* the last element of messages[] is the NULL terminator */
		if (msg_nr >= 0 &&
		    msg_nr < sizeof(messages) / sizeof(char *) - 1)
			rc += sprintf(out_buf, "%s", messages[msg_nr]);
		else
			rc += sprintf(out_buf, UNKNOWNSTR, msg_nr);
	}
	return rc;
}

static struct debug_view debug_test_view = {
	"myview",                 /* name of view */
	NULL,                     /* no prolog */
	&debug_dflt_header_fn,    /* default header for each entry */
	&debug_test_format_fn,    /* our own format function */
	NULL,                     /* no input function */
	NULL                      /* no private data */
};

=====
test:
=====
debug_info_t *debug_info;
...
debug_info = debug_register("test", 0, 4, 4);
debug_register_view(debug_info, &debug_test_view);
for (i = 0; i < 10; i++)
	debug_int_event(debug_info, 1, i);

> cat /proc/s390dbf/test/myview
00 00964419734:611402 1 - 00 88042ca This error...........
00 00964419734:611405 1 - 00 88042ca That error...........
00 00964419734:611408 1 - 00 88042ca Problem..............
00 00964419734:611411 1 - 00 88042ca Something went wrong.
00 00964419734:611414 1 - 00 88042ca Everything ok........
00 00964419734:611417 1 - 00 88042ca data: 00000005
00 00964419734:611419 1 - 00 88042ca data: 00000006
00 00964419734:611422 1 - 00 88042ca data: 00000007
00 00964419734:611425 1 - 00 88042ca data: 00000008
00 00964419734:611428 1 - 00 88042ca data: 00000009