author Linus Torvalds <torvalds@linux-foundation.org> 2017-07-04 00:13:25 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2017-07-04 00:13:25 -0400
commit 650fc870a2ef35b83397eebd35b8c8df211bff78
tree 14a293fa894d0f166aa60f1f5ca672a2bdb312c0
parent f4dd029ee0b92b77769a1ac6dce03e829e74763e
parent 1cb566ba5634d7593b8b2a0a5c83f1c9e14b2e09
Merge tag 'docs-4.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
 "There has been a fair amount of activity in the docs tree this time
  around. Highlights include:

   - Conversion of a bunch of security documentation into RST

   - The conversion of the remaining DocBook templates by The Amazing
     Mauro Machine. We can now drop the entire DocBook build chain.

   - The usual collection of fixes and minor updates"

* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
  scripts/kernel-doc: handle DECLARE_HASHTABLE
  Documentation: atomic_ops.txt is core-api/atomic_ops.rst
  Docs: clean up some DocBook loose ends
  Make the main documentation title less Geocities
  Docs: Use kernel-figure in vidioc-g-selection.rst
  Docs: fix table problems in ras.rst
  Docs: Fix breakage with Sphinx 1.5 and upper
  Docs: Include the Latex "ifthen" package
  doc/kokr/howto: Only send regression fixes after -rc1
  docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
  doc: Document suitability of IBM Verse for kernel development
  Doc: fix a markup error in coding-style.rst
  docs: driver-api: i2c: remove some outdated information
  Documentation: DMA API: fix a typo in a function name
  Docs: Insert missing space to separate link from text
  doc/ko_KR/memory-barriers: Update control-dependencies example
  Documentation, kbuild: fix typo "minimun" -> "minimum"
  docs: Fix some formatting issues in request-key.rst
  doc: ReSTify keys-trusted-encrypted.txt
  doc: ReSTify keys-request-key.txt
  ...
-rw-r--r-- Documentation/00-INDEX | 6
-rw-r--r-- Documentation/DMA-API.txt | 2
-rw-r--r-- Documentation/DocBook/.gitignore | 17
-rw-r--r-- Documentation/DocBook/Makefile | 282
-rw-r--r-- Documentation/DocBook/filesystems.tmpl | 381
-rw-r--r-- Documentation/DocBook/kernel-hacking.tmpl | 1312
-rw-r--r-- Documentation/DocBook/kernel-locking.tmpl | 2151
-rw-r--r-- Documentation/DocBook/kgdb.tmpl | 918
-rw-r--r-- Documentation/DocBook/libata.tmpl | 1625
-rw-r--r-- Documentation/DocBook/librs.tmpl | 289
-rw-r--r-- Documentation/DocBook/lsm.tmpl | 265
-rw-r--r-- Documentation/DocBook/mtdnand.tmpl | 1291
-rw-r--r-- Documentation/DocBook/networking.tmpl | 111
-rw-r--r-- Documentation/DocBook/rapidio.tmpl | 155
-rw-r--r-- Documentation/DocBook/s390-drivers.tmpl | 161
-rw-r--r-- Documentation/DocBook/scsi.tmpl | 409
-rw-r--r-- Documentation/DocBook/sh.tmpl | 105
-rw-r--r-- Documentation/DocBook/stylesheet.xsl | 11
-rw-r--r-- Documentation/DocBook/w1.tmpl | 101
-rw-r--r-- Documentation/DocBook/z8530book.tmpl | 371
-rw-r--r-- Documentation/Makefile | 125
-rw-r--r-- Documentation/Makefile.sphinx | 130
-rw-r--r-- Documentation/PCI/MSI-HOWTO.txt | 2
-rw-r--r-- Documentation/admin-guide/LSM/LoadPin.rst (renamed from Documentation/security/LoadPin.txt) | 12
-rw-r--r-- Documentation/admin-guide/LSM/SELinux.rst (renamed from Documentation/security/SELinux.txt) | 18
-rw-r--r-- Documentation/admin-guide/LSM/Smack.rst (renamed from Documentation/security/Smack.txt) | 273
-rw-r--r-- Documentation/admin-guide/LSM/Yama.rst (renamed from Documentation/security/Yama.txt) | 55
-rw-r--r-- Documentation/admin-guide/LSM/apparmor.rst (renamed from Documentation/security/apparmor.txt) | 36
-rw-r--r-- Documentation/admin-guide/LSM/index.rst (renamed from Documentation/security/LSM.txt) | 28
-rw-r--r-- Documentation/admin-guide/LSM/tomoyo.rst (renamed from Documentation/security/tomoyo.txt) | 22
-rw-r--r-- Documentation/admin-guide/README.rst | 6
-rw-r--r-- Documentation/admin-guide/index.rst | 1
-rw-r--r-- Documentation/admin-guide/kernel-parameters.txt | 6
-rw-r--r-- Documentation/admin-guide/ras.rst | 10
-rw-r--r-- Documentation/conf.py | 44
-rw-r--r-- Documentation/core-api/assoc_array.rst | 5
-rw-r--r-- Documentation/core-api/index.rst | 1
-rw-r--r-- Documentation/core-api/librs.rst | 212
-rw-r--r-- Documentation/crypto/asymmetric-keys.txt | 2
-rw-r--r-- Documentation/crypto/conf.py | 10
-rw-r--r-- Documentation/dev-tools/index.rst | 1
-rw-r--r-- Documentation/dev-tools/kgdb.rst | 907
-rw-r--r-- Documentation/doc-guide/docbook.rst | 90
-rw-r--r-- Documentation/doc-guide/index.rst | 1
-rw-r--r-- Documentation/doc-guide/kernel-doc.rst | 10
-rw-r--r-- Documentation/doc-guide/sphinx.rst | 5
-rw-r--r-- Documentation/dontdiff | 1
-rw-r--r-- Documentation/driver-api/i2c.rst | 9
-rw-r--r-- Documentation/driver-api/index.rst | 6
-rw-r--r-- Documentation/driver-api/libata.rst | 1031
-rw-r--r-- Documentation/driver-api/mtdnand.rst | 1007
-rw-r--r-- Documentation/driver-api/rapidio.rst | 107
-rw-r--r-- Documentation/driver-api/s390-drivers.rst | 111
-rw-r--r-- Documentation/driver-api/scsi.rst | 344
-rw-r--r-- Documentation/driver-api/w1.rst | 70
-rw-r--r-- Documentation/fb/api.txt | 4
-rw-r--r-- Documentation/filesystems/conf.py | 10
-rw-r--r-- Documentation/filesystems/index.rst | 317
-rw-r--r-- Documentation/filesystems/nfs/idmapper.txt | 2
-rw-r--r-- Documentation/gpu/todo.rst | 2
-rw-r--r-- Documentation/index.rst | 18
-rw-r--r-- Documentation/kbuild/makefiles.txt | 2
-rw-r--r-- Documentation/kernel-doc-nano-HOWTO.txt | 65
-rw-r--r-- Documentation/kernel-hacking/conf.py | 10
-rw-r--r-- Documentation/kernel-hacking/hacking.rst | 811
-rw-r--r-- Documentation/kernel-hacking/index.rst | 9
-rw-r--r-- Documentation/kernel-hacking/locking.rst | 1446
-rw-r--r-- Documentation/lsm.txt | 201
-rw-r--r-- Documentation/media/uapi/v4l/vidioc-g-selection.rst | 4
-rw-r--r-- Documentation/memory-barriers.txt | 10
-rw-r--r-- Documentation/networking/conf.py | 10
-rw-r--r-- Documentation/networking/dns_resolver.txt | 2
-rw-r--r-- Documentation/networking/index.rst | 18
-rw-r--r-- Documentation/networking/kapi.rst | 147
-rw-r--r-- Documentation/networking/z8530book.rst | 256
-rw-r--r-- Documentation/process/changes.rst | 26
-rw-r--r-- Documentation/process/coding-style.rst | 4
-rw-r--r-- Documentation/process/email-clients.rst | 5
-rw-r--r-- Documentation/process/howto.rst | 8
-rw-r--r-- Documentation/process/kernel-docs.rst | 34
-rw-r--r-- Documentation/security/00-INDEX | 26
-rw-r--r-- Documentation/security/IMA-templates.rst (renamed from Documentation/security/IMA-templates.txt) | 46
-rw-r--r-- Documentation/security/LSM.rst | 14
-rw-r--r-- Documentation/security/conf.py | 8
-rw-r--r-- Documentation/security/credentials.rst (renamed from Documentation/security/credentials.txt) | 275
-rw-r--r-- Documentation/security/index.rst | 8
-rw-r--r-- Documentation/security/keys/core.rst (renamed from Documentation/security/keys.txt) | 314
-rw-r--r-- Documentation/security/keys/ecryptfs.rst (renamed from Documentation/security/keys-ecryptfs.txt) | 19
-rw-r--r-- Documentation/security/keys/index.rst | 11
-rw-r--r-- Documentation/security/keys/request-key.rst (renamed from Documentation/security/keys-request-key.txt) | 73
-rw-r--r-- Documentation/security/keys/trusted-encrypted.rst (renamed from Documentation/security/keys-trusted-encrypted.txt) | 32
-rw-r--r-- Documentation/security/self-protection.rst (renamed from Documentation/security/self-protection.txt) | 99
-rw-r--r-- Documentation/sh/conf.py | 10
-rw-r--r-- Documentation/sh/index.rst | 59
-rw-r--r-- Documentation/sound/conf.py | 10
-rw-r--r-- Documentation/sphinx/convert_template.sed | 18
-rw-r--r-- Documentation/sphinx/post_convert.sed | 23
-rwxr-xr-x Documentation/sphinx/tmplcvt | 28
-rw-r--r-- Documentation/translations/ja_JP/howto.rst | 7
-rw-r--r-- Documentation/translations/ko_KR/howto.rst | 27
-rw-r--r-- Documentation/translations/ko_KR/memory-barriers.txt | 2
-rw-r--r-- Documentation/userspace-api/index.rst | 2
-rw-r--r-- Documentation/userspace-api/no_new_privs.rst (renamed from Documentation/prctl/no_new_privs.txt) | 44
-rw-r--r-- Documentation/userspace-api/seccomp_filter.rst (renamed from Documentation/prctl/seccomp_filter.txt) | 116
-rw-r--r-- Documentation/userspace-api/unshare.rst | 2
-rw-r--r-- MAINTAINERS | 21
-rw-r--r-- Makefile | 10
-rw-r--r-- arch/ia64/include/asm/io.h | 2
-rw-r--r-- arch/ia64/sn/kernel/iomv.c | 2
-rw-r--r-- drivers/ata/acard-ahci.c | 2
-rw-r--r-- drivers/ata/ahci.c | 2
-rw-r--r-- drivers/ata/ahci.h | 2
-rw-r--r-- drivers/ata/ata_piix.c | 2
-rw-r--r-- drivers/ata/libahci.c | 2
-rw-r--r-- drivers/ata/libata-core.c | 2
-rw-r--r-- drivers/ata/libata-eh.c | 2
-rw-r--r-- drivers/ata/libata-scsi.c | 9
-rw-r--r-- drivers/ata/libata-sff.c | 2
-rw-r--r-- drivers/ata/libata.h | 2
-rw-r--r-- drivers/ata/pata_pdc2027x.c | 2
-rw-r--r-- drivers/ata/pdc_adma.c | 2
-rw-r--r-- drivers/ata/sata_nv.c | 2
-rw-r--r-- drivers/ata/sata_promise.c | 2
-rw-r--r-- drivers/ata/sata_promise.h | 2
-rw-r--r-- drivers/ata/sata_qstor.c | 2
-rw-r--r-- drivers/ata/sata_sil.c | 2
-rw-r--r-- drivers/ata/sata_sis.c | 2
-rw-r--r-- drivers/ata/sata_svw.c | 2
-rw-r--r-- drivers/ata/sata_sx4.c | 2
-rw-r--r-- drivers/ata/sata_uli.c | 2
-rw-r--r-- drivers/ata/sata_via.c | 2
-rw-r--r-- drivers/ata/sata_vsc.c | 2
-rw-r--r-- drivers/mtd/nand/nand_base.c | 7
-rw-r--r-- drivers/net/phy/phy.c | 1
-rw-r--r-- drivers/scsi/qla1280.c | 2
-rw-r--r-- drivers/scsi/scsi_scan.c | 7
-rw-r--r-- drivers/scsi/scsi_transport_fc.c | 18
-rw-r--r-- drivers/scsi/scsicam.c | 4
-rw-r--r-- fs/debugfs/file.c | 2
-rw-r--r-- fs/debugfs/inode.c | 2
-rw-r--r-- fs/eventfd.c | 4
-rw-r--r-- fs/fs-writeback.c | 12
-rw-r--r-- fs/jbd2/transaction.c | 42
-rw-r--r-- fs/mpage.c | 1
-rw-r--r-- fs/namei.c | 1
-rw-r--r-- include/linux/ata.h | 2
-rw-r--r-- include/linux/cred.h | 2
-rw-r--r-- include/linux/debugfs.h | 2
-rw-r--r-- include/linux/key.h | 2
-rw-r--r-- include/linux/libata.h | 2
-rw-r--r-- include/linux/lsm_hooks.h | 25
-rw-r--r-- include/linux/mtd/nand.h | 2
-rw-r--r-- include/linux/mutex.h | 6
-rw-r--r-- include/linux/netdevice.h | 9
-rw-r--r-- include/linux/skbuff.h | 2
-rw-r--r-- include/net/sock.h | 9
-rw-r--r-- kernel/cred.c | 2
-rw-r--r-- kernel/futex.c | 40
-rw-r--r-- kernel/irq/chip.c | 2
-rw-r--r-- kernel/irq/handle.c | 2
-rw-r--r-- kernel/irq/irqdesc.c | 2
-rw-r--r-- kernel/locking/mutex.c | 6
-rw-r--r-- lib/Kconfig.debug | 2
-rw-r--r-- lib/Kconfig.kgdb | 2
-rw-r--r-- net/core/datagram.c | 2
-rw-r--r-- net/core/sock.c | 7
-rw-r--r-- scripts/.gitignore | 1
-rw-r--r-- scripts/Makefile | 10
-rw-r--r-- scripts/check-lc_ctype.c | 11
-rw-r--r-- scripts/docproc.c | 681
-rwxr-xr-x scripts/kernel-doc | 2
-rwxr-xr-x scripts/kernel-doc-xml-ref | 198
-rw-r--r-- scripts/selinux/README | 2
-rw-r--r-- security/apparmor/match.c | 2
-rw-r--r-- security/apparmor/policy_unpack.c | 2
-rw-r--r-- security/keys/encrypted-keys/encrypted.c | 2
-rw-r--r-- security/keys/encrypted-keys/masterkey_trusted.c | 2
-rw-r--r-- security/keys/request_key.c | 2
-rw-r--r-- security/keys/request_key_auth.c | 2
-rw-r--r-- security/keys/trusted.c | 2
-rw-r--r-- security/yama/Kconfig | 3
181 files changed, 8403 insertions, 12195 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index ed3e5e949fce..f35473f8c630 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -24,8 +24,6 @@ DMA-ISA-LPC.txt
 	- How to do DMA with ISA (and LPC) devices.
 DMA-attributes.txt
 	- listing of the various possible attributes a DMA region can have
-DocBook/
-	- directory with DocBook templates etc. for kernel documentation.
 EDID/
 	- directory with info on customizing EDID for broken gfx/displays.
 IPMI.txt
@@ -40,8 +38,6 @@ Intel-IOMMU.txt
 	- basic info on the Intel IOMMU virtualization support.
 Makefile
 	- It's not of interest for those who aren't touching the build system.
-Makefile.sphinx
-	- It's not of interest for those who aren't touching the build system.
 PCI/
 	- info related to PCI drivers.
 RCU/
@@ -264,6 +260,8 @@ logo.gif
 	- full colour GIF image of Linux logo (penguin - Tux).
 logo.txt
 	- info on creator of above logo & site to get additional images from.
+lsm.txt
+	- Linux Security Modules: General Security Hooks for Linux
 lzo.txt
 	- kernel LZO decompressor input formats
 m68k/
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 6b20128fab8a..71200dfa0922 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -692,7 +692,7 @@ of preallocated entries is defined per architecture. If it is too low for you
 boot with 'dma_debug_entries=<your_desired_number>' to overwrite the
 architectural default.
 
-void debug_dmap_mapping_error(struct device *dev, dma_addr_t dma_addr);
+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 
 dma-debug interface debug_dma_mapping_error() to debug drivers that fail
 to check DMA mapping errors on addresses returned by dma_map_single() and
diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore
deleted file mode 100644
index e05da3f7aa21..000000000000
--- a/Documentation/DocBook/.gitignore
+++ /dev/null
@@ -1,17 +0,0 @@
1*.xml
2*.ps
3*.pdf
4*.html
5*.9.gz
6*.9
7*.aux
8*.dvi
9*.log
10*.out
11*.png
12*.gif
13*.svg
14*.proc
15*.db
16media-indices.tmpl
17media-entities.tmpl
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
deleted file mode 100644
index 85916f13d330..000000000000
--- a/Documentation/DocBook/Makefile
+++ /dev/null
@@ -1,282 +0,0 @@
1###
2# This makefile is used to generate the kernel documentation,
3# primarily based on in-line comments in various source files.
4# See Documentation/kernel-doc-nano-HOWTO.txt for instruction in how
5# to document the SRC - and how to read it.
6# To add a new book the only step required is to add the book to the
7# list of DOCBOOKS.
8
9DOCBOOKS := z8530book.xml \
10 kernel-hacking.xml kernel-locking.xml \
11 networking.xml \
12 filesystems.xml lsm.xml kgdb.xml \
13 libata.xml mtdnand.xml librs.xml rapidio.xml \
14 s390-drivers.xml scsi.xml \
15 sh.xml w1.xml
16
17ifeq ($(DOCBOOKS),)
18
19# Skip DocBook build if the user explicitly requested no DOCBOOKS.
20.DEFAULT:
21 @echo " SKIP DocBook $@ target (DOCBOOKS=\"\" specified)."
22else
23ifneq ($(SPHINXDIRS),)
24
25# Skip DocBook build if the user explicitly requested a sphinx dir
26.DEFAULT:
27 @echo " SKIP DocBook $@ target (SPHINXDIRS specified)."
28else
29
30
31###
32# The build process is as follows (targets):
33# (xmldocs) [by docproc]
34# file.tmpl --> file.xml +--> file.ps (psdocs) [by db2ps or xmlto]
35# +--> file.pdf (pdfdocs) [by db2pdf or xmlto]
36# +--> DIR=file (htmldocs) [by xmlto]
37# +--> man/ (mandocs) [by xmlto]
38
39
40# for PDF and PS output you can choose between xmlto and docbook-utils tools
41PDF_METHOD = $(prefer-db2x)
42PS_METHOD = $(prefer-db2x)
43
44
45targets += $(DOCBOOKS)
46BOOKS := $(addprefix $(obj)/,$(DOCBOOKS))
47xmldocs: $(BOOKS)
48sgmldocs: xmldocs
49
50PS := $(patsubst %.xml, %.ps, $(BOOKS))
51psdocs: $(PS)
52
53PDF := $(patsubst %.xml, %.pdf, $(BOOKS))
54pdfdocs: $(PDF)
55
56HTML := $(sort $(patsubst %.xml, %.html, $(BOOKS)))
57htmldocs: $(HTML)
58 $(call cmd,build_main_index)
59
60MAN := $(patsubst %.xml, %.9, $(BOOKS))
61mandocs: $(MAN)
62 find $(obj)/man -name '*.9' | xargs gzip -nf
63
64# Default location for installed man pages
65export INSTALL_MAN_PATH = $(objtree)/usr
66
67installmandocs: mandocs
68 mkdir -p $(INSTALL_MAN_PATH)/man/man9/
69 find $(obj)/man -name '*.9.gz' -printf '%h %f\n' | \
70 sort -k 2 -k 1 | uniq -f 1 | sed -e 's: :/:' | \
71 xargs install -m 644 -t $(INSTALL_MAN_PATH)/man/man9/
72
73# no-op for the DocBook toolchain
74epubdocs:
75latexdocs:
76linkcheckdocs:
77
78###
79#External programs used
80KERNELDOCXMLREF = $(srctree)/scripts/kernel-doc-xml-ref
81KERNELDOC = $(srctree)/scripts/kernel-doc
82DOCPROC = $(objtree)/scripts/docproc
83CHECK_LC_CTYPE = $(objtree)/scripts/check-lc_ctype
84
85# Use a fixed encoding - UTF-8 if the C library has support built-in
86# or ASCII if not
87LC_CTYPE := $(call try-run, LC_CTYPE=C.UTF-8 $(CHECK_LC_CTYPE),C.UTF-8,C)
88export LC_CTYPE
89
90XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl
91XMLTOFLAGS += --skip-validation
92
93###
94# DOCPROC is used for two purposes:
95# 1) To generate a dependency list for a .tmpl file
96# 2) To preprocess a .tmpl file and call kernel-doc with
97# appropriate parameters.
98# The following rules are used to generate the .xml documentation
99# required to generate the final targets. (ps, pdf, html).
100quiet_cmd_docproc = DOCPROC $@
101 cmd_docproc = SRCTREE=$(srctree)/ $(DOCPROC) doc $< >$@
102define rule_docproc
103 set -e; \
104 $(if $($(quiet)cmd_$(1)),echo ' $($(quiet)cmd_$(1))';) \
105 $(cmd_$(1)); \
106 ( \
107 echo 'cmd_$@ := $(cmd_$(1))'; \
108 echo $@: `SRCTREE=$(srctree) $(DOCPROC) depend $<`; \
109 ) > $(dir $@).$(notdir $@).cmd
110endef
111
112%.xml: %.tmpl $(KERNELDOC) $(DOCPROC) $(KERNELDOCXMLREF) FORCE
113 $(call if_changed_rule,docproc)
114
115# Tell kbuild to always build the programs
116always := $(hostprogs-y)
117
118notfoundtemplate = echo "*** You have to install docbook-utils or xmlto ***"; \
119 exit 1
120db2xtemplate = db2TYPE -o $(dir $@) $<
121xmltotemplate = xmlto TYPE $(XMLTOFLAGS) -o $(dir $@) $<
122
123# determine which methods are available
124ifeq ($(shell which db2ps >/dev/null 2>&1 && echo found),found)
125 use-db2x = db2x
126 prefer-db2x = db2x
127else
128 use-db2x = notfound
129 prefer-db2x = $(use-xmlto)
130endif
131ifeq ($(shell which xmlto >/dev/null 2>&1 && echo found),found)
132 use-xmlto = xmlto
133 prefer-xmlto = xmlto
134else
135 use-xmlto = notfound
136 prefer-xmlto = $(use-db2x)
137endif
138
139# the commands, generated from the chosen template
140quiet_cmd_db2ps = PS $@
141 cmd_db2ps = $(subst TYPE,ps, $($(PS_METHOD)template))
142%.ps : %.xml
143 $(call cmd,db2ps)
144
145quiet_cmd_db2pdf = PDF $@
146 cmd_db2pdf = $(subst TYPE,pdf, $($(PDF_METHOD)template))
147%.pdf : %.xml
148 $(call cmd,db2pdf)
149
150
151index = index.html
152main_idx = $(obj)/$(index)
153quiet_cmd_build_main_index = HTML $(main_idx)
154 cmd_build_main_index = rm -rf $(main_idx); \
155 echo '<h1>Linux Kernel HTML Documentation</h1>' >> $(main_idx) && \
156 echo '<h2>Kernel Version: $(KERNELVERSION)</h2>' >> $(main_idx) && \
157 cat $(HTML) >> $(main_idx)
158
159quiet_cmd_db2html = HTML $@
160 cmd_db2html = xmlto html $(XMLTOFLAGS) -o $(patsubst %.html,%,$@) $< && \
161 echo '<a HREF="$(patsubst %.html,%,$(notdir $@))/index.html"> \
162 $(patsubst %.html,%,$(notdir $@))</a><p>' > $@
163
164###
165# Rules to create an aux XML and .db, and use them to re-process the DocBook XML
166# to fill internal hyperlinks
167 gen_aux_xml = :
168 quiet_gen_aux_xml = echo ' XMLREF $@'
169silent_gen_aux_xml = :
170%.aux.xml: %.xml
171 @$($(quiet)gen_aux_xml)
172 @rm -rf $@
173 @(cat $< | egrep "^<refentry id" | egrep -o "\".*\"" | cut -f 2 -d \" > $<.db)
174 @$(KERNELDOCXMLREF) -db $<.db $< > $@
175.PRECIOUS: %.aux.xml
176
177%.html: %.aux.xml
178 @(which xmlto > /dev/null 2>&1) || \
179 (echo "*** You need to install xmlto ***"; \
180 exit 1)
181 @rm -rf $@ $(patsubst %.html,%,$@)
182 $(call cmd,db2html)
183 @if [ ! -z "$(PNG-$(basename $(notdir $@)))" ]; then \
184 cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi
185
186quiet_cmd_db2man = MAN $@
187 cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man/$(*F) $< ; fi
188%.9 : %.xml
189 @(which xmlto > /dev/null 2>&1) || \
190 (echo "*** You need to install xmlto ***"; \
191 exit 1)
192 $(Q)mkdir -p $(obj)/man/$(*F)
193 $(call cmd,db2man)
194 @touch $@
195
196###
197# Rules to generate postscripts and PNG images from .fig format files
198quiet_cmd_fig2eps = FIG2EPS $@
199 cmd_fig2eps = fig2dev -Leps $< $@
200
201%.eps: %.fig
202 @(which fig2dev > /dev/null 2>&1) || \
203 (echo "*** You need to install transfig ***"; \
204 exit 1)
205 $(call cmd,fig2eps)
206
207quiet_cmd_fig2png = FIG2PNG $@
208 cmd_fig2png = fig2dev -Lpng $< $@
209
210%.png: %.fig
211 @(which fig2dev > /dev/null 2>&1) || \
212 (echo "*** You need to install transfig ***"; \
213 exit 1)
214 $(call cmd,fig2png)
215
216###
217# Rule to convert a .c file to inline XML documentation
218 gen_xml = :
219 quiet_gen_xml = echo ' GEN $@'
220silent_gen_xml = :
221%.xml: %.c
222 @$($(quiet)gen_xml)
223 @( \
224 echo "<programlisting>"; \
225 expand --tabs=8 < $< | \
226 sed -e "s/&/\\&amp;/g" \
227 -e "s/</\\&lt;/g" \
228 -e "s/>/\\&gt;/g"; \
229 echo "</programlisting>") > $@
230
231endif # DOCBOOKS=""
232endif # SPHINDIR=...
233
234###
235# Help targets as used by the top-level makefile
236dochelp:
237 @echo ' Linux kernel internal documentation in different formats (DocBook):'
238 @echo ' htmldocs - HTML'
239 @echo ' pdfdocs - PDF'
240 @echo ' psdocs - Postscript'
241 @echo ' xmldocs - XML DocBook'
242 @echo ' mandocs - man pages'
243 @echo ' installmandocs - install man pages generated by mandocs to INSTALL_MAN_PATH'; \
244 echo ' (default: $(INSTALL_MAN_PATH))'; \
245 echo ''
246 @echo ' cleandocs - clean all generated DocBook files'
247 @echo
248 @echo ' make DOCBOOKS="s1.xml s2.xml" [target] Generate only docs s1.xml s2.xml'
249 @echo ' valid values for DOCBOOKS are: $(DOCBOOKS)'
250 @echo
251 @echo " make DOCBOOKS=\"\" [target] Don't generate docs from Docbook"
252 @echo ' This is useful to generate only the ReST docs (Sphinx)'
253
254
255###
256# Temporary files left by various tools
257clean-files := $(DOCBOOKS) \
258 $(patsubst %.xml, %.dvi, $(DOCBOOKS)) \
259 $(patsubst %.xml, %.aux, $(DOCBOOKS)) \
260 $(patsubst %.xml, %.tex, $(DOCBOOKS)) \
261 $(patsubst %.xml, %.log, $(DOCBOOKS)) \
262 $(patsubst %.xml, %.out, $(DOCBOOKS)) \
263 $(patsubst %.xml, %.ps, $(DOCBOOKS)) \
264 $(patsubst %.xml, %.pdf, $(DOCBOOKS)) \
265 $(patsubst %.xml, %.html, $(DOCBOOKS)) \
266 $(patsubst %.xml, %.9, $(DOCBOOKS)) \
267 $(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \
268 $(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \
269 $(patsubst %.xml, %.xml, $(DOCBOOKS)) \
270 $(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \
271 $(index)
272
273clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man
274
275cleandocs:
276 $(Q)rm -f $(call objectify, $(clean-files))
277 $(Q)rm -rf $(call objectify, $(clean-dirs))
278
279# Declare the contents of the .PHONY variable as phony. We keep that
280# information in a variable so we can use it in if_changed and friends.
281
282.PHONY: $(PHONY)
diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl
deleted file mode 100644
index 6006b6358c86..000000000000
--- a/Documentation/DocBook/filesystems.tmpl
+++ /dev/null
@@ -1,381 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="Linux-filesystems-API">
6 <bookinfo>
7 <title>Linux Filesystems API</title>
8
9 <legalnotice>
10 <para>
11 This documentation is free software; you can redistribute
12 it and/or modify it under the terms of the GNU General Public
13 License as published by the Free Software Foundation; either
14 version 2 of the License, or (at your option) any later
15 version.
16 </para>
17
18 <para>
19 This program is distributed in the hope that it will be
20 useful, but WITHOUT ANY WARRANTY; without even the implied
21 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
22 See the GNU General Public License for more details.
23 </para>
24
25 <para>
26 You should have received a copy of the GNU General Public
27 License along with this program; if not, write to the Free
28 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
29 MA 02111-1307 USA
30 </para>
31
32 <para>
33 For more details see the file COPYING in the source
34 distribution of Linux.
35 </para>
36 </legalnotice>
37 </bookinfo>
38
39<toc></toc>
40
41 <chapter id="vfs">
42 <title>The Linux VFS</title>
43 <sect1 id="the_filesystem_types"><title>The Filesystem types</title>
44!Iinclude/linux/fs.h
45 </sect1>
46 <sect1 id="the_directory_cache"><title>The Directory Cache</title>
47!Efs/dcache.c
48!Iinclude/linux/dcache.h
49 </sect1>
50 <sect1 id="inode_handling"><title>Inode Handling</title>
51!Efs/inode.c
52!Efs/bad_inode.c
53 </sect1>
54 <sect1 id="registration_and_superblocks"><title>Registration and Superblocks</title>
55!Efs/super.c
56 </sect1>
57 <sect1 id="file_locks"><title>File Locks</title>
58!Efs/locks.c
59!Ifs/locks.c
60 </sect1>
61 <sect1 id="other_functions"><title>Other Functions</title>
62!Efs/mpage.c
63!Efs/namei.c
64!Efs/buffer.c
65!Eblock/bio.c
66!Efs/seq_file.c
67!Efs/filesystems.c
68!Efs/fs-writeback.c
69!Efs/block_dev.c
70 </sect1>
71 </chapter>
72
73 <chapter id="proc">
74 <title>The proc filesystem</title>
75
76 <sect1 id="sysctl_interface"><title>sysctl interface</title>
77!Ekernel/sysctl.c
78 </sect1>
79
80 <sect1 id="proc_filesystem_interface"><title>proc filesystem interface</title>
81!Ifs/proc/base.c
82 </sect1>
83 </chapter>
84
85 <chapter id="fs_events">
86 <title>Events based on file descriptors</title>
87!Efs/eventfd.c
88 </chapter>
89
90 <chapter id="sysfs">
91 <title>The Filesystem for Exporting Kernel Objects</title>
92!Efs/sysfs/file.c
93!Efs/sysfs/symlink.c
94 </chapter>
95
96 <chapter id="debugfs">
97 <title>The debugfs filesystem</title>
98
99 <sect1 id="debugfs_interface"><title>debugfs interface</title>
100!Efs/debugfs/inode.c
101!Efs/debugfs/file.c
102 </sect1>
103 </chapter>
104
105 <chapter id="LinuxJDBAPI">
106 <chapterinfo>
107 <title>The Linux Journalling API</title>
108
109 <authorgroup>
110 <author>
111 <firstname>Roger</firstname>
112 <surname>Gammans</surname>
113 <affiliation>
114 <address>
115 <email>rgammans@computer-surgery.co.uk</email>
116 </address>
117 </affiliation>
118 </author>
119 </authorgroup>
120
121 <authorgroup>
122 <author>
123 <firstname>Stephen</firstname>
124 <surname>Tweedie</surname>
125 <affiliation>
126 <address>
127 <email>sct@redhat.com</email>
128 </address>
129 </affiliation>
130 </author>
131 </authorgroup>
132
133 <copyright>
134 <year>2002</year>
135 <holder>Roger Gammans</holder>
136 </copyright>
137 </chapterinfo>
138
139 <title>The Linux Journalling API</title>
140
141 <sect1 id="journaling_overview">
142 <title>Overview</title>
143 <sect2 id="journaling_details">
144 <title>Details</title>
145<para>
146The journalling layer is easy to use. You need to
147first of all create a journal_t data structure. There are
148two calls to do this dependent on how you decide to allocate the physical
149media on which the journal resides. The jbd2_journal_init_inode() call
150is for journals stored in filesystem inodes, or the jbd2_journal_init_dev()
151call can be used for journal stored on a raw device (in a continuous range
152of blocks). A journal_t is a typedef for a struct pointer, so when
153you are finally finished make sure you call jbd2_journal_destroy() on it
154to free up any used kernel memory.
155</para>
156
157<para>
158Once you have got your journal_t object you need to 'mount' or load the journal
159file. The journalling layer expects the space for the journal was already
160allocated and initialized properly by the userspace tools. When loading the
161journal you must call jbd2_journal_load() to process journal contents. If the
162client file system detects the journal contents does not need to be processed
163(or even need not have valid contents), it may call jbd2_journal_wipe() to
164clear the journal contents before calling jbd2_journal_load().
165</para>
166
167<para>
168Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if
169it detects any outstanding transactions in the journal and similarly
170jbd2_journal_load() will call jbd2_journal_recover() if necessary. I would
171advise reading ext4_load_journal() in fs/ext4/super.c for examples on this
172stage.
173</para>
174
175<para>
176Now you can go ahead and start modifying the underlying
177filesystem. Almost.
178</para>
179
180<para>
181
182You still need to actually journal your filesystem changes, this
183is done by wrapping them into transactions. Additionally you
184also need to wrap the modification of each of the buffers
185with calls to the journal layer, so it knows what the modifications
186you are actually making are. To do this use jbd2_journal_start() which
187returns a transaction handle.
188</para>
189
190<para>
191jbd2_journal_start()
192and its counterpart jbd2_journal_stop(), which indicates the end of a
193transaction are nestable calls, so you can reenter a transaction if necessary,
194but remember you must call jbd2_journal_stop() the same number of times as
195jbd2_journal_start() before the transaction is completed (or more accurately
196leaves the update phase). Ext4/VFS makes use of this feature to simplify
197handling of inode dirtying, quota support, etc.
198</para>
199
200<para>
201Inside each transaction you need to wrap the modifications to the
202individual buffers (blocks). Before you start to modify a buffer you
203need to call jbd2_journal_get_{create,write,undo}_access() as appropriate,
204this allows the journalling layer to copy the unmodified data if it
205needs to. After all the buffer may be part of a previously uncommitted
206transaction.
207At this point you are at last ready to modify a buffer, and once
208you are have done so you need to call jbd2_journal_dirty_{meta,}data().
209Or if you've asked for access to a buffer you now know is now longer
210required to be pushed back on the device you can call jbd2_journal_forget()
211in much the same way as you might have used bforget() in the past.
212</para>
213
214<para>
215A jbd2_journal_flush() may be called at any time to commit and checkpoint
216all your transactions.
217</para>
218
219<para>
220Then at umount time , in your put_super() you can then call jbd2_journal_destroy()
221to clean up your in-core journal object.
222</para>
223
224<para>
225Unfortunately there a couple of ways the journal layer can cause a deadlock.
226The first thing to note is that each task can only have
227a single outstanding transaction at any one time, remember nothing
228commits until the outermost jbd2_journal_stop(). This means
229you must complete the transaction at the end of each file/inode/address
230etc. operation you perform, so that the journalling system isn't re-entered
231on another journal. Since transactions can't be nested/batched
232across differing journals, and another filesystem other than
233yours (say ext4) may be modified in a later syscall.
234</para>
235
236<para>
237The second case to bear in mind is that jbd2_journal_start() can
238block if there isn't enough space in the journal for your transaction
239(based on the passed nblocks param) - when it blocks it merely(!) needs to
240wait for transactions to complete and be committed from other tasks,
241so essentially we are waiting for jbd2_journal_stop(). So to avoid
242deadlocks you must treat jbd2_journal_start/stop() as if they
243were semaphores and include them in your semaphore ordering rules to prevent
244deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to
245jbd2_journal_start() so you can deadlock here just as easily as on
246jbd2_journal_start().
247</para>
248
249<para>
250Try to reserve the right number of blocks the first time. ;-). This will
251be the maximum number of blocks you are going to touch in this transaction.
252I advise having a look at at least ext4_jbd.h to see the basis on which
253ext4 uses to make these decisions.
254</para>
255
256<para>
257Another wriggle to watch out for is your on-disk block allocation strategy.
258Why? Because, if you do a delete, you need to ensure you haven't reused any
259of the freed blocks until the transaction freeing these blocks commits. If you
260reuse these blocks and a crash happens, there is no way to restore the contents
261of the reallocated blocks as they were at the end of the last fully committed transaction.
262
263One simple way of doing this is to mark blocks as free in internal in-memory
264block allocation structures only after the transaction freeing them commits.
265Ext4 uses a journal commit callback for this purpose.
266</para>
267
268<para>
269With journal commit callbacks you can ask the journalling layer to call a
270callback function when the transaction is finally committed to disk, so that
271you can do some of your own management. You ask the journalling layer to
272call the callback by simply setting the journal->j_commit_callback function
273pointer; that function is then called after each transaction commit. You can also
274use transaction->t_private_list for attaching entries to a transaction that
275need processing when the transaction commits.
276</para>
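<para>
The deferred-free idea above can be modelled in plain userspace C. This is
only a sketch under stated assumptions, not JBD2 code: every sketch_ name,
the fixed-size free map and the per-transaction pending list are hypothetical
stand-ins for your allocator state and for whatever you hang off the
transaction. It shows the ordering that matters: a freed block becomes
allocatable only once the commit callback has run.
</para>

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NBLOCKS 8

/* Hypothetical in-memory allocator state: true means "may be allocated". */
static bool sketch_free_map[SKETCH_NBLOCKS];

/* Hypothetical per-transaction list of blocks freed by that transaction. */
struct sketch_tx {
	int pending[SKETCH_NBLOCKS];
	size_t npending;
};

/* Freeing only records the block; the allocator cannot see it yet. */
void sketch_free_block(struct sketch_tx *tx, int blk)
{
	if (blk < 0 || blk >= SKETCH_NBLOCKS)
		return;
	if (tx->npending < SKETCH_NBLOCKS)
		tx->pending[tx->npending++] = blk;
}

/* Plays the role of the commit callback: now the blocks may be reused. */
void sketch_commit_callback(struct sketch_tx *tx)
{
	size_t i;

	for (i = 0; i < tx->npending; i++)
		sketch_free_map[tx->pending[i]] = true;
	tx->npending = 0;
}

/* Returns a free block number, or -1 if nothing is allocatable. */
int sketch_alloc_block(void)
{
	int blk;

	for (blk = 0; blk < SKETCH_NBLOCKS; blk++) {
		if (sketch_free_map[blk]) {
			sketch_free_map[blk] = false;
			return blk;
		}
	}
	return -1;
}
```

<para>
In this model, an allocation attempted between sketch_free_block() and
sketch_commit_callback() never hands out the freed block, which is the
property a crash-safe allocator needs.
</para>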
277
278<para>
279JBD2 also provides a way to block all transaction updates via
280jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a
281clean and stable fs for a moment. E.g.
282</para>
283
284<programlisting>
285
286 jbd2_journal_lock_updates(journal);   /* stop new stuff happening */
287 jbd2_journal_flush(journal);          /* checkpoint everything */
288 /* ... do stuff on stable fs ... */
289 jbd2_journal_unlock_updates(journal); /* carry on with filesystem use */
290</programlisting>
291
292<para>
293The opportunities for abuse and DoS attacks with this should be obvious,
294if you allow unprivileged userspace to trigger codepaths containing these
295calls.
296</para>
297
298 </sect2>
299
300 <sect2 id="jbd_summary">
301 <title>Summary</title>
302<para>
303Using the journal is a matter of wrapping the different context changes
304(each mount, each modification (transaction) and each changed buffer)
305to tell the journalling layer about them.
306</para>
307
308 </sect2>
309
310 </sect1>
311
312 <sect1 id="data_types">
313 <title>Data Types</title>
314 <para>
315 The journalling layer uses typedefs to 'hide' the concrete definitions
316 of the structures used. As a client of the JBD2 layer you can
317 just rely on using the pointer as a magic cookie of some sort.
318
319 Obviously the hiding is not enforced as this is 'C'.
320 </para>
321 <sect2 id="structures"><title>Structures</title>
322!Iinclude/linux/jbd2.h
323 </sect2>
324 </sect1>
325
326 <sect1 id="functions">
327 <title>Functions</title>
328 <para>
329 The functions here are split into two groups: those that
330 affect a journal as a whole, and those which are used to
331 manage transactions.
332 </para>
333 <sect2 id="journal_level"><title>Journal Level</title>
334!Efs/jbd2/journal.c
335!Ifs/jbd2/recovery.c
336 </sect2>
337 <sect2 id="transaction_level"><title>Transaction Level</title>
338!Efs/jbd2/transaction.c
339 </sect2>
340 </sect1>
341 <sect1 id="see_also">
342 <title>See also</title>
343 <para>
344 <citation>
345 <ulink url="http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz">
346 Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie
347 </ulink>
348 </citation>
349 </para>
350 <para>
351 <citation>
352 <ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html">
353 Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen Tweedie
354 </ulink>
355 </citation>
356 </para>
357 </sect1>
358
359 </chapter>
360
361 <chapter id="splice">
362 <title>splice API</title>
363 <para>
364 splice is a method for moving blocks of data around inside the
365 kernel, without continually transferring them between the kernel
366 and user space.
367 </para>
368!Ffs/splice.c
369 </chapter>
370
371 <chapter id="pipes">
372 <title>pipes API</title>
373 <para>
374 Pipe interfaces are all for in-kernel (builtin image) use.
375 They are not exported for use by modules.
376 </para>
377!Iinclude/linux/pipe_fs_i.h
378!Ffs/pipe.c
379 </chapter>
380
381</book>
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
deleted file mode 100644
index c3c705591532..000000000000
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ /dev/null
@@ -1,1312 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="lk-hacking-guide">
6 <bookinfo>
7 <title>Unreliable Guide To Hacking The Linux Kernel</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Rusty</firstname>
12 <surname>Russell</surname>
13 <affiliation>
14 <address>
15 <email>rusty@rustcorp.com.au</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2005</year>
23 <holder>Rusty Russell</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License as published by the Free Software Foundation; either
31 version 2 of the License, or (at your option) any later
32 version.
33 </para>
34
35 <para>
36 This program is distributed in the hope that it will be
37 useful, but WITHOUT ANY WARRANTY; without even the implied
38 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
39 See the GNU General Public License for more details.
40 </para>
41
42 <para>
43 You should have received a copy of the GNU General Public
44 License along with this program; if not, write to the Free
45 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
46 MA 02111-1307 USA
47 </para>
48
49 <para>
50 For more details see the file COPYING in the source
51 distribution of Linux.
52 </para>
53 </legalnotice>
54
55 <releaseinfo>
56 This is the first release of this document as part of the kernel tarball.
57 </releaseinfo>
58
59 </bookinfo>
60
61 <toc></toc>
62
63 <chapter id="introduction">
64 <title>Introduction</title>
65 <para>
66 Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
67 Kernel Hacking. This document describes the common routines and
68 general requirements for kernel code: its goal is to serve as a
69 primer for Linux kernel development for experienced C
70 programmers. I avoid implementation details: that's what the
71 code is for, and I ignore whole tracts of useful routines.
72 </para>
73 <para>
74 Before you read this, please understand that I never wanted to
75 write this document, being grossly under-qualified, but I always
76 wanted to read it, and this was the only way. I hope it will
77 grow into a compendium of best practice, common starting points
78 and random information.
79 </para>
80 </chapter>
81
82 <chapter id="basic-players">
83 <title>The Players</title>
84
85 <para>
86 At any time each of the CPUs in a system can be:
87 </para>
88
89 <itemizedlist>
90 <listitem>
91 <para>
92 not associated with any process, serving a hardware interrupt;
93 </para>
94 </listitem>
95
96 <listitem>
97 <para>
98 not associated with any process, serving a softirq or tasklet;
99 </para>
100 </listitem>
101
102 <listitem>
103 <para>
104 running in kernel space, associated with a process (user context);
105 </para>
106 </listitem>
107
108 <listitem>
109 <para>
110 running a process in user space.
111 </para>
112 </listitem>
113 </itemizedlist>
114
115 <para>
116 There is an ordering between these. The bottom two can preempt
117 each other, but above that is a strict hierarchy: each can only be
118 preempted by the ones above it. For example, while a softirq is
119 running on a CPU, no other softirq will preempt it, but a hardware
120 interrupt can. However, any other CPUs in the system execute
121 independently.
122 </para>
123
124 <para>
125 We'll see a number of ways that the user context can block
126 interrupts, to become truly non-preemptable.
127 </para>
128
129 <sect1 id="basics-usercontext">
130 <title>User Context</title>
131
132 <para>
133 User context is when you are coming in from a system call or other
134 trap: like userspace, you can be preempted by more important tasks
135 and by interrupts. You can sleep, by calling
136 <function>schedule()</function>.
137 </para>
138
139 <note>
140 <para>
141 You are always in user context on module load and unload,
142 and on operations on the block device layer.
143 </para>
144 </note>
145
146 <para>
147 In user context, the <varname>current</varname> pointer (indicating
148 the task we are currently executing) is valid, and
149 <function>in_interrupt()</function>
150 (<filename>include/linux/interrupt.h</filename>) is <returnvalue>false
151 </returnvalue>.
152 </para>
153
154 <caution>
155 <para>
156 Beware that if you have preemption or softirqs disabled
157 (see below), <function>in_interrupt()</function> will return a
158 false positive.
159 </para>
160 </caution>
161 </sect1>
162
163 <sect1 id="basics-hardirqs">
164 <title>Hardware Interrupts (Hard IRQs)</title>
165
166 <para>
167 Timer ticks, <hardware>network cards</hardware> and
168 <hardware>keyboard</hardware> are examples of real
169 hardware which produce interrupts at any time. The kernel runs
170 interrupt handlers, which service the hardware. The kernel
171 guarantees that this handler is never re-entered: if the same
172 interrupt arrives, it is queued (or dropped). Because it
173 disables interrupts, this handler has to be fast: frequently it
174 simply acknowledges the interrupt, marks a 'software interrupt'
175 for execution and exits.
176 </para>
177
178 <para>
179 You can tell you are in a hardware interrupt, because
180 <function>in_irq()</function> returns <returnvalue>true</returnvalue>.
181 </para>
182 <caution>
183 <para>
184 Beware that this will return a false positive if interrupts are disabled
185 (see below).
186 </para>
187 </caution>
188 </sect1>
189
190 <sect1 id="basics-softirqs">
191 <title>Software Interrupt Context: Softirqs and Tasklets</title>
192
193 <para>
194 Whenever a system call is about to return to userspace, or a
195 hardware interrupt handler exits, any 'software interrupts'
196 which are marked pending (usually by hardware interrupts) are
197 run (<filename>kernel/softirq.c</filename>).
198 </para>
199
200 <para>
201 Much of the real interrupt handling work is done here. Early in
202 the transition to <acronym>SMP</acronym>, there were only 'bottom
203 halves' (BHs), which didn't take advantage of multiple CPUs. Shortly
204 after we switched from wind-up computers made of match-sticks and snot,
205 we abandoned this limitation and switched to 'softirqs'.
206 </para>
207
208 <para>
209 <filename class="headerfile">include/linux/interrupt.h</filename> lists the
210 different softirqs. A very important softirq is the
211 timer softirq (<filename
212 class="headerfile">include/linux/timer.h</filename>): you can
213 register to have it call functions for you in a given length of
214 time.
215 </para>
216
217 <para>
218 Softirqs are often a pain to deal with, since the same softirq
219 will run simultaneously on more than one CPU. For this reason,
220 tasklets (<filename
221 class="headerfile">include/linux/interrupt.h</filename>) are more
222 often used: they are dynamically-registrable (meaning you can have
223 as many as you want), and they also guarantee that any tasklet
224 will only run on one CPU at any time, although different tasklets
225 can run simultaneously.
226 </para>
227 <caution>
228 <para>
229 The name 'tasklet' is misleading: they have nothing to do with 'tasks',
230 and probably more to do with some bad vodka Alexey Kuznetsov had at the
231 time.
232 </para>
233 </caution>
234
235 <para>
236 You can tell you are in a softirq (or tasklet)
237 using the <function>in_softirq()</function> macro
238 (<filename class="headerfile">include/linux/interrupt.h</filename>).
239 </para>
240 <caution>
241 <para>
242 Beware that this will return a false positive if a bh lock (see below)
243 is held.
244 </para>
245 </caution>
246 </sect1>
247 </chapter>
248
249 <chapter id="basic-rules">
250 <title>Some Basic Rules</title>
251
252 <variablelist>
253 <varlistentry>
254 <term>No memory protection</term>
255 <listitem>
256 <para>
257 If you corrupt memory, whether in user context or
258 interrupt context, the whole machine will crash. Are you
259 sure you can't do what you want in userspace?
260 </para>
261 </listitem>
262 </varlistentry>
263
264 <varlistentry>
265 <term>No floating point or <acronym>MMX</acronym></term>
266 <listitem>
267 <para>
268 The <acronym>FPU</acronym> context is not saved; even in user
269 context the <acronym>FPU</acronym> state probably won't
270 correspond with the current process: you would mess with some
271 user process' <acronym>FPU</acronym> state. If you really want
272 to do this, you would have to explicitly save/restore the full
273 <acronym>FPU</acronym> state (and avoid context switches). It
274 is generally a bad idea; use fixed point arithmetic first.
275 </para>
276 </listitem>
277 </varlistentry>
278
279 <varlistentry>
280 <term>A rigid stack limit</term>
281 <listitem>
282 <para>
283 Depending on configuration options the kernel stack is about 3K to 6K for most 32-bit architectures: it's
284 about 14K on most 64-bit archs, and often shared with interrupts
285 so you can't use it all. Avoid deep recursion and huge local
286 arrays on the stack (allocate them dynamically instead).
287 </para>
288 </listitem>
289 </varlistentry>
290
291 <varlistentry>
292 <term>The Linux kernel is portable</term>
293 <listitem>
294 <para>
295 Let's keep it that way. Your code should be 64-bit clean,
296 and endian-independent. You should also minimize CPU
297 specific stuff, e.g. inline assembly should be cleanly
298 encapsulated and minimized to ease porting. Generally it
299 should be restricted to the architecture-dependent part of
300 the kernel tree.
301 </para>
302 </listitem>
303 </varlistentry>
304 </variablelist>
305 </chapter>
306
307 <chapter id="ioctls">
308 <title>ioctls: Not writing a new system call</title>
309
310 <para>
311 A system call generally looks like this:
312 </para>
313
314 <programlisting>
315asmlinkage long sys_mycall(int arg)
316{
317 return 0;
318}
319 </programlisting>
320
321 <para>
322 First, in most cases you don't want to create a new system call.
323 You create a character device and implement an appropriate ioctl
324 for it. This is much more flexible than system calls, doesn't have
325 to be entered in every architecture's
326 <filename class="headerfile">include/asm/unistd.h</filename> and
327 <filename>arch/kernel/entry.S</filename> file, and is much more
328 likely to be accepted by Linus.
329 </para>
330
331 <para>
332 If all your routine does is read or write some parameter, consider
333 implementing a <function>sysfs</function> interface instead.
334 </para>
335
336 <para>
337 Inside the ioctl you're in user context to a process. When an
338 error occurs you return a negated errno (see
339 <filename class="headerfile">include/linux/errno.h</filename>),
340 otherwise you return <returnvalue>0</returnvalue>.
341 </para>
342
343 <para>
344 After you have slept you should check if a signal occurred: the
345 Unix/Linux way of handling signals is to temporarily exit the
346 system call with the <constant>-ERESTARTSYS</constant> error. The
347 system call entry code will switch back to user context, process
348 the signal handler and then your system call will be restarted
349 (unless the user disabled that). So you should be prepared to
350 process the restart, e.g. if you're in the middle of manipulating
351 some data structure.
352 </para>
353
354 <programlisting>
355if (signal_pending(current))
356 return -ERESTARTSYS;
357 </programlisting>
358
359 <para>
360 If you're doing longer computations: first think userspace. If you
361 <emphasis>really</emphasis> want to do it in kernel you should
362 regularly check if you need to give up the CPU (remember there is
363 cooperative multitasking per CPU). Idiom:
364 </para>
365
366 <programlisting>
367cond_resched(); /* Will sleep */
368 </programlisting>
369
370 <para>
371 A short note on interface design: the UNIX system call motto is
372 "Provide mechanism not policy".
373 </para>
374 </chapter>
375
376 <chapter id="deadlock-recipes">
377 <title>Recipes for Deadlock</title>
378
379 <para>
380 You cannot call any routines which may sleep, unless:
381 </para>
382 <itemizedlist>
383 <listitem>
384 <para>
385 You are in user context.
386 </para>
387 </listitem>
388
389 <listitem>
390 <para>
391 You do not own any spinlocks.
392 </para>
393 </listitem>
394
395 <listitem>
396 <para>
397 You have interrupts enabled (actually, Andi Kleen says
398 that the scheduling code will enable them for you, but
399 that's probably not what you wanted).
400 </para>
401 </listitem>
402 </itemizedlist>
403
404 <para>
405 Note that some functions may sleep implicitly: common ones are
406 the user space access functions (*_user) and memory allocation
407 functions without <symbol>GFP_ATOMIC</symbol>.
408 </para>
409
410 <para>
411 You should always compile your kernel with
412 <symbol>CONFIG_DEBUG_ATOMIC_SLEEP</symbol> on, and it will warn
413 you if you break these rules. If you <emphasis>do</emphasis> break
414 the rules, you will eventually lock up your box.
415 </para>
416
417 <para>
418 Really.
419 </para>
420 </chapter>
421
422 <chapter id="common-routines">
423 <title>Common Routines</title>
424
425 <sect1 id="routines-printk">
426 <title>
427 <function>printk()</function>
428 <filename class="headerfile">include/linux/kernel.h</filename>
429 </title>
430
431 <para>
432 <function>printk()</function> feeds kernel messages to the
433 console, dmesg, and the syslog daemon. It is useful for debugging
434 and reporting errors, and can be used inside interrupt context,
435 but use with caution: a machine which has its console flooded with
436 printk messages is unusable. It uses a format string mostly
437 compatible with ANSI C printf, and C string concatenation to give
438 it a first "priority" argument:
439 </para>
440
441 <programlisting>
442printk(KERN_INFO "i = %u\n", i);
443 </programlisting>
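<para>
To see the concatenation trick in isolation, here is a small userspace
sketch. It is not kernel code: SKETCH_KERN_INFO and sketch_format() are
hypothetical names, and the prefix string is only an illustrative encoding;
the real KERN_ definitions live in the kernel headers and may look different.
</para>

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical level prefix; the real KERN_INFO encoding may differ. */
#define SKETCH_KERN_INFO "<6>"

/*
 * Adjacent string literals are merged by the compiler, so the format
 * string that snprintf() receives is "<6>i = %u\n".
 */
int sketch_format(char *buf, size_t len, unsigned int i)
{
	return snprintf(buf, len, SKETCH_KERN_INFO "i = %u\n", i);
}
```

<para>
The "priority" is therefore not a separate argument at all: it is simply the
first few characters of the format string.
</para>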
444
445 <para>
446 See <filename class="headerfile">include/linux/kernel.h</filename>
447 for other KERN_ values; these are interpreted by syslog as the
448 level. Special case: for printing an IP address use
449 </para>
450
451 <programlisting>
452__be32 ipaddress;
453printk(KERN_INFO "my ip: %pI4\n", &amp;ipaddress);
454 </programlisting>
455
456 <para>
457 <function>printk()</function> internally uses a 1K buffer and does
458 not catch overruns. Make sure that will be enough.
459 </para>
460
461 <note>
462 <para>
463 You will know when you are a real kernel hacker
464 when you start typoing printf as printk in your user programs :)
465 </para>
466 </note>
467
468 <!--- From the Lions book reader department -->
469
470 <note>
471 <para>
472 Another sidenote: the original Unix Version 6 sources had a
473 comment on top of its printf function: "Printf should not be
474 used for chit-chat". You should follow that advice.
475 </para>
476 </note>
477 </sect1>
478
479 <sect1 id="routines-copy">
480 <title>
481 <function>copy_[to/from]_user()</function>
482 /
483 <function>get_user()</function>
484 /
485 <function>put_user()</function>
486 <filename class="headerfile">include/linux/uaccess.h</filename>
487 </title>
488
489 <para>
490 <emphasis>[SLEEPS]</emphasis>
491 </para>
492
493 <para>
494 <function>put_user()</function> and <function>get_user()</function>
495 are used to get and put single values (such as an int, char, or
496 long) from and to userspace. A pointer into userspace should
497 never be simply dereferenced: data should be copied using these
498 routines. Both return <constant>-EFAULT</constant> or 0.
499 </para>
500 <para>
501 <function>copy_to_user()</function> and
502 <function>copy_from_user()</function> are more general: they copy
503 an arbitrary amount of data to and from userspace.
504 <caution>
505 <para>
506 Unlike <function>put_user()</function> and
507 <function>get_user()</function>, they return the amount of
508 uncopied data (ie. <returnvalue>0</returnvalue> still means
509 success).
510 </para>
511 </caution>
512 [Yes, this moronic interface makes me cringe. The flamewar comes up every year or so. --RR.]
513 </para>
514 <para>
515 These functions may sleep implicitly. They should never be called
516 outside user context (it makes no sense), with interrupts
517 disabled, or with a spinlock held.
518 </para>
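<para>
The return convention is easy to get backwards, so here is a userspace model
of it. This is a sketch, not the kernel routine: sketch_copy_from_user(), its
extra readable parameter (how many source bytes are accessible) and
sketch_fetch_arg() are all hypothetical. The point is the usual caller
pattern: any non-zero "bytes not copied" result becomes a negated EFAULT.
</para>

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for copy_from_user(): only the first
 * 'readable' bytes of 'from' are accessible.  Like the real routine, it
 * returns the number of bytes it could NOT copy, so 0 means success.
 */
size_t sketch_copy_from_user(void *to, const void *from, size_t n,
			     size_t readable)
{
	size_t copied = n < readable ? n : readable;

	memcpy(to, from, copied);
	return n - copied;
}

/* Callers usually turn any shortfall into -EFAULT. */
int sketch_fetch_arg(int *dst, const void *src, size_t readable)
{
	if (sketch_copy_from_user(dst, src, sizeof(*dst), readable))
		return -EFAULT;
	return 0;
}
```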
519 </sect1>
520
521 <sect1 id="routines-kmalloc">
522 <title><function>kmalloc()</function>/<function>kfree()</function>
523 <filename class="headerfile">include/linux/slab.h</filename></title>
524
525 <para>
526 <emphasis>[MAY SLEEP: SEE BELOW]</emphasis>
527 </para>
528
529 <para>
530 These routines are used to dynamically request pointer-aligned
531 chunks of memory, like malloc and free do in userspace, but
532 <function>kmalloc()</function> takes an extra flag word.
533 Important values:
534 </para>
535
536 <variablelist>
537 <varlistentry>
538 <term>
539 <constant>
540 GFP_KERNEL
541 </constant>
542 </term>
543 <listitem>
544 <para>
545 May sleep and swap to free memory. Only allowed in user
546 context, but is the most reliable way to allocate memory.
547 </para>
548 </listitem>
549 </varlistentry>
550
551 <varlistentry>
552 <term>
553 <constant>
554 GFP_ATOMIC
555 </constant>
556 </term>
557 <listitem>
558 <para>
559 Don't sleep. Less reliable than <constant>GFP_KERNEL</constant>,
560 but may be called from interrupt context. You should
561 <emphasis>really</emphasis> have a good out-of-memory
562 error-handling strategy.
563 </para>
564 </listitem>
565 </varlistentry>
566
567 <varlistentry>
568 <term>
569 <constant>
570 GFP_DMA
571 </constant>
572 </term>
573 <listitem>
574 <para>
575 Allocate ISA DMA lower than 16MB. If you don't know what that
576 is you don't need it. Very unreliable.
577 </para>
578 </listitem>
579 </varlistentry>
580 </variablelist>
581
582 <para>
583 If you see a <errorname>sleeping function called from invalid
584 context</errorname> warning message, then maybe you called a
585 sleeping allocation function from interrupt context without
586 <constant>GFP_ATOMIC</constant>. You should really fix that.
587 Run, don't walk.
588 </para>
589
590 <para>
591 If you are allocating at least <constant>PAGE_SIZE</constant>
592 (<filename class="headerfile">include/asm/page.h</filename>) bytes,
593 consider using <function>__get_free_pages()</function>
594
595 (<filename class="headerfile">include/linux/mm.h</filename>). It
596 takes an order argument (0 for page sized, 1 for double page, 2
597 for four pages etc.) and the same memory priority flag word as
598 above.
599 </para>
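<para>
The order argument is a power-of-two exponent, not a page count. A minimal
userspace sketch of the size-to-order arithmetic follows; sketch_get_order()
and its page_size parameter are hypothetical, and in-kernel code would use
the kernel's own helpers and PAGE_SIZE instead.
</para>

```c
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size.
 * Sketch only: it does not guard against shift overflow for huge sizes.
 */
unsigned int sketch_get_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}
```

<para>
With 4096-byte pages, a request one byte over a page already needs order 1
(two pages), and one byte over two pages needs order 2 (four pages), so
rounding up can waste a lot of memory.
</para>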
600
601 <para>
602 If you are allocating more than a page worth of bytes you can use
603 <function>vmalloc()</function>. It'll allocate virtual memory in
604 the kernel map. This block is not contiguous in physical memory,
605 but the <acronym>MMU</acronym> makes it look like it is for you
606 (so it'll only look contiguous to the CPUs, not to external device
607 drivers). If you really need large physically contiguous memory
608 for some weird device, you have a problem: it is poorly supported
609 in Linux because after some time memory fragmentation in a running
610 kernel makes it hard. The best way is to allocate the block early
611 in the boot process via the <function>alloc_bootmem()</function>
612 routine.
613 </para>
614
615 <para>
616 Before inventing your own cache of often-used objects consider
617 using a slab cache in
618 <filename class="headerfile">include/linux/slab.h</filename>
619 </para>
620 </sect1>
621
622 <sect1 id="routines-current">
623 <title><function>current</function>
624 <filename class="headerfile">include/asm/current.h</filename></title>
625
626 <para>
627 This global variable (really a macro) contains a pointer to
628 the current task structure, so is only valid in user context.
629 For example, when a process makes a system call, this will
630 point to the task structure of the calling process. It is
631 <emphasis>not NULL</emphasis> in interrupt context.
632 </para>
633 </sect1>
634
635 <sect1 id="routines-udelay">
636 <title><function>mdelay()</function>/<function>udelay()</function>
637 <filename class="headerfile">include/asm/delay.h</filename>
638 <filename class="headerfile">include/linux/delay.h</filename>
639 </title>
640
641 <para>
642 The <function>udelay()</function> and <function>ndelay()</function> functions can be used for small pauses.
643 Do not use large values with them as you risk
644 overflow - the helper function <function>mdelay()</function> is useful
645 here, or consider <function>msleep()</function>.
646 </para>
647 </sect1>
648
649 <sect1 id="routines-endian">
650 <title><function>cpu_to_be32()</function>/<function>be32_to_cpu()</function>/<function>cpu_to_le32()</function>/<function>le32_to_cpu()</function>
651 <filename class="headerfile">include/asm/byteorder.h</filename>
652 </title>
653
654 <para>
655 The <function>cpu_to_be32()</function> family (where the "32" can
656 be replaced by 64 or 16, and the "be" can be replaced by "le") are
657 the general way to do endian conversions in the kernel: they
658 return the converted value. All variations supply the reverse as
659 well: <function>be32_to_cpu()</function>, etc.
660 </para>
661
662 <para>
663 There are two major variations of these functions: the pointer
664 variation, such as <function>cpu_to_be32p()</function>, which take
665 a pointer to the given type, and return the converted value. The
666 other variation is the "in-situ" family, such as
667 <function>cpu_to_be32s()</function>, which convert the value referred
668 to by the pointer, and return void.
669 </para>
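<para>
The three shapes (by value, by pointer, in-situ) can be sketched in userspace
as below. These sketch_ functions are hypothetical stand-ins written with
portable shifts, not the kernel macros, which are typically implemented with
byte-swap primitives chosen per architecture.
</para>

```c
#include <stdint.h>
#include <string.h>

/* Value form: result has big-endian byte order in memory on any host. */
uint32_t sketch_cpu_to_be32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v >> 24), (unsigned char)(v >> 16),
		(unsigned char)(v >> 8), (unsigned char)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* The reverse: interpret the in-memory bytes as big-endian. */
uint32_t sketch_be32_to_cpu(uint32_t v)
{
	unsigned char b[4];

	memcpy(b, &v, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Pointer variation: takes a pointer, returns the converted value. */
uint32_t sketch_cpu_to_be32p(const uint32_t *p)
{
	return sketch_cpu_to_be32(*p);
}

/* In-situ variation: converts the pointed-to value, returns void. */
void sketch_cpu_to_be32s(uint32_t *p)
{
	*p = sketch_cpu_to_be32(*p);
}
```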
670 </sect1>
671
672 <sect1 id="routines-local-irqs">
673 <title><function>local_irq_save()</function>/<function>local_irq_restore()</function>
674 <filename class="headerfile">include/linux/irqflags.h</filename>
675 </title>
676
677 <para>
678 These routines disable hard interrupts on the local CPU, and
679 restore them. They are reentrant; saving the previous state in
680 their one <varname>unsigned long flags</varname> argument. If you
681 know that interrupts are enabled, you can simply use
682 <function>local_irq_disable()</function> and
683 <function>local_irq_enable()</function>.
684 </para>
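<para>
Why the save/restore pair nests safely can be shown with a toy model of one
CPU's interrupt-enable bit. Everything here is hypothetical userspace code:
the sketch_ functions take flags by pointer, whereas the real
local_irq_save() is a macro that takes the flags variable by name.
</para>

```c
/* Hypothetical model of one CPU's interrupt-enable state (1 = enabled). */
static int sketch_irqs_enabled = 1;

/* Model of local_irq_save(): remember the old state, then disable. */
void sketch_irq_save(unsigned long *flags)
{
	*flags = (unsigned long)sketch_irqs_enabled;
	sketch_irqs_enabled = 0;
}

/* Model of local_irq_restore(): put back whatever was saved. */
void sketch_irq_restore(unsigned long flags)
{
	sketch_irqs_enabled = (int)flags;
}

/* Query helper for the model. */
int sketch_irqs_on(void)
{
	return sketch_irqs_enabled;
}
```

<para>
An inner save/restore pair leaves interrupts disabled when it finishes,
because it restores the "disabled" state it found. A plain disable/enable
pair in the same position would switch interrupts back on underneath the
outer caller.
</para>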
685 </sect1>
686
687 <sect1 id="routines-softirqs">
688 <title><function>local_bh_disable()</function>/<function>local_bh_enable()</function>
689 <filename class="headerfile">include/linux/interrupt.h</filename></title>
690
691 <para>
692 These routines disable soft interrupts on the local CPU, and
693 restore them. They are reentrant; if soft interrupts were
694 disabled before, they will still be disabled after this pair
695 of functions has been called. They prevent softirqs and tasklets
696 from running on the current CPU.
697 </para>
698 </sect1>
699
700 <sect1 id="routines-processorids">
701 <title><function>smp_processor_id</function>()
702 <filename class="headerfile">include/asm/smp.h</filename></title>
703
704 <para>
705 <function>get_cpu()</function> disables preemption (so you won't
706 suddenly get moved to another CPU) and returns the current
707 processor number, between 0 and <symbol>NR_CPUS</symbol>. Note
708 that the CPU numbers are not necessarily continuous. You return
709 it again with <function>put_cpu()</function> when you are done.
710 </para>
711 <para>
712 If you know you cannot be preempted by another task (ie. you are
713 in interrupt context, or have preemption disabled) you can use
714 smp_processor_id().
715 </para>
716 </sect1>
717
718 <sect1 id="routines-init">
719 <title><type>__init</type>/<type>__exit</type>/<type>__initdata</type>
720 <filename class="headerfile">include/linux/init.h</filename></title>
721
722 <para>
723 After boot, the kernel frees up a special section; functions
724 marked with <type>__init</type> and data structures marked with
725 <type>__initdata</type> are dropped after boot is complete: similarly
726 modules discard this memory after initialization. <type>__exit</type>
727 is used to declare a function which is only required on exit: the
728 function will be dropped if this file is not compiled as a module.
729 See the header file for use. Note that it makes no sense for a function
730 marked with <type>__init</type> to be exported to modules with
731 <function>EXPORT_SYMBOL()</function> - this will break.
732 </para>
733
734 </sect1>
735
736 <sect1 id="routines-init-again">
737 <title><function>__initcall()</function>/<function>module_init()</function>
738 <filename class="headerfile">include/linux/init.h</filename></title>
739 <para>
740 Many parts of the kernel are well served as a module
741 (dynamically-loadable parts of the kernel). Using the
742 <function>module_init()</function> and
743 <function>module_exit()</function> macros it is easy to write code
744 without #ifdefs which can operate both as a module or built into
745 the kernel.
746 </para>
747
748 <para>
749 The <function>module_init()</function> macro defines which
750 function is to be called at module insertion time (if the file is
751 compiled as a module), or at boot time: if the file is not
752 compiled as a module the <function>module_init()</function> macro
753 becomes equivalent to <function>__initcall()</function>, which
754 through linker magic ensures that the function is called on boot.
755 </para>
756
757 <para>
758 The function can return a negative error number to cause
759 module loading to fail (unfortunately, this has no effect if
760 the module is compiled into the kernel). This function is
761 called in user context with interrupts enabled, so it can sleep.
762 </para>
763 </sect1>
764
765 <sect1 id="routines-moduleexit">
766 <title> <function>module_exit()</function>
767 <filename class="headerfile">include/linux/init.h</filename> </title>
768
769 <para>
770 This macro defines the function to be called at module removal
771 time (or never, in the case of the file compiled into the
772 kernel). It will only be called if the module usage count has
773 reached zero. This function can also sleep, but cannot fail:
774 everything must be cleaned up by the time it returns.
775 </para>
776
777 <para>
778 Note that this macro is optional: if it is not present, your
779 module will not be removable (except for 'rmmod -f').
780 </para>
781 </sect1>
782
783 <sect1 id="routines-module-use-counters">
784 <title> <function>try_module_get()</function>/<function>module_put()</function>
785 <filename class="headerfile">include/linux/module.h</filename></title>
786
787 <para>
788 These manipulate the module usage count, to protect against
789 removal (a module also can't be removed if another module uses one
790 of its exported symbols: see below). Before calling into module
791 code, you should call <function>try_module_get()</function> on
792 that module: if it fails, then the module is being removed and you
793 should act as if it wasn't there. Otherwise, you can safely enter
794 the module, and call <function>module_put()</function> when you're
795 finished.
796 </para>
797
798 <para>
799 Most registerable structures have an
800 <structfield>owner</structfield> field, such as in the
801 <structname>file_operations</structname> structure. Set this field
802 to the macro <symbol>THIS_MODULE</symbol>.
803 </para>
804 </sect1>
805
806 <!-- add info on new-style module refcounting here -->
807 </chapter>
808
809 <chapter id="queues">
810 <title>Wait Queues
811 <filename class="headerfile">include/linux/wait.h</filename>
812 </title>
813 <para>
814 <emphasis>[SLEEPS]</emphasis>
815 </para>
816
817 <para>
818 A wait queue is used to wait for someone to wake you up when a
819 certain condition is true. They must be used carefully to ensure
820 there is no race condition. You declare a
821 <type>wait_queue_head_t</type>, and then processes which want to
822 wait for that condition declare a <type>wait_queue_entry_t</type>
823 referring to themselves, and place that in the queue.
824 </para>
825
826 <sect1 id="queue-declaring">
827 <title>Declaring</title>
828
829 <para>
830 You declare a <type>wait_queue_head_t</type> using the
831 <function>DECLARE_WAIT_QUEUE_HEAD()</function> macro, or using the
832 <function>init_waitqueue_head()</function> routine in your
833 initialization code.
834 </para>
835 </sect1>
836
837 <sect1 id="queue-waitqueue">
838 <title>Queuing</title>
839
840 <para>
841 Placing yourself in the waitqueue is fairly complex, because you
842 must put yourself in the queue before checking the condition.
843 There is a macro to do this:
844 <function>wait_event_interruptible()</function>
845
846 (<filename class="headerfile">include/linux/wait.h</filename>). The
847 first argument is the wait queue head, and the second is an
848 expression which is evaluated; the macro returns
849 <returnvalue>0</returnvalue> when this expression is true, or
850 <returnvalue>-ERESTARTSYS</returnvalue> if a signal is received.
851 The <function>wait_event()</function> version ignores signals.
852 </para>
853
854 </sect1>
855
856 <sect1 id="queue-waking">
857 <title>Waking Up Queued Tasks</title>
858
859 <para>
860 Call <function>wake_up()</function>
861
862 (<filename class="headerfile">include/linux/wait.h</filename>),
863 which will wake up every process in the queue. The exception is
864 if one has <constant>TASK_EXCLUSIVE</constant> set, in which case
865 the remainder of the queue will not be woken. There are other variants
866 of this basic function available in the same header.
867 </para>
868 </sect1>
869 </chapter>
870
871 <chapter id="atomic-ops">
872 <title>Atomic Operations</title>
873
874 <para>
875 Certain operations are guaranteed atomic on all platforms. The
876 first class of operations work on <type>atomic_t</type>
877
878 <filename class="headerfile">include/asm/atomic.h</filename>; this
879 contains a signed integer (at least 32 bits long), and you must use
880 these functions to manipulate or read atomic_t variables.
881 <function>atomic_read()</function> and
882 <function>atomic_set()</function> get and set the counter,
883 <function>atomic_add()</function>,
884 <function>atomic_sub()</function>,
885 <function>atomic_inc()</function>,
886 <function>atomic_dec()</function>, and
887 <function>atomic_dec_and_test()</function> (returns
888 <returnvalue>true</returnvalue> if it was decremented to zero).
889 </para>
890
891 <para>
892 Note the sense of the result: <function>atomic_dec_and_test()</function>
893 returns <returnvalue>true</returnvalue> (i.e. != 0) if the atomic variable is now zero.
894 </para>
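<para>
The usual customer for <function>atomic_dec_and_test()</function> is
reference counting. A minimal userspace sketch of that pattern follows.
It assumes C11 <filename class="headerfile">stdatomic.h</filename> rather
than the kernel's <type>atomic_t</type> API, and the names
<structname>obj</structname>, <function>obj_get()</function> and
<function>obj_put()</function> are made up for the example.
</para>
<programlisting><![CDATA[
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; atomic_int stands in for atomic_t. */
struct obj {
	atomic_int refcnt;
};

static void obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);		/* cf. atomic_set() */
}

static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);	/* cf. atomic_inc() */
}

/* Returns true if this call dropped the last reference. */
static bool obj_put(struct obj *o)
{
	/*
	 * atomic_fetch_sub() returns the old value, so old == 1 means
	 * the counter has just reached zero: cf. atomic_dec_and_test().
	 */
	return atomic_fetch_sub(&o->refcnt, 1) == 1;
}
]]></programlisting>
<para>
Only the caller that sees <returnvalue>true</returnvalue> may free the
object; every other caller must assume it is still in use.
</para>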
895
896 <para>
897 Note that these functions are slower than normal arithmetic, and
898 so should not be used unnecessarily.
899 </para>
900
901 <para>
902 The second class of atomic operations is atomic bit operations on an
903 <type>unsigned long</type>, defined in
904
905 <filename class="headerfile">include/linux/bitops.h</filename>. These
906 operations generally take a pointer to the bit pattern, and a bit
907 number: 0 is the least significant bit.
908 <function>set_bit()</function>, <function>clear_bit()</function>
909 and <function>change_bit()</function> set, clear, and flip the
910 given bit. <function>test_and_set_bit()</function>,
911 <function>test_and_clear_bit()</function> and
912 <function>test_and_change_bit()</function> do the same thing,
913 except return true if the bit was previously set; these are
914 particularly useful for atomically setting flags.
915 </para>
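<para>
To see why the test-and-modify variants suit flags, here is a userspace
approximation built on C11 atomics. The <function>my_</function> prefixed
helpers are stand-ins written for this example, not the kernel
implementations, and they assume the bit number is below the width of an
<type>unsigned long</type>.
</para>
<programlisting><![CDATA[
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Stand-in for test_and_set_bit(): atomically set bit nr in *addr
 * and report whether it was already set.
 */
static bool my_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Stand-in for test_and_clear_bit(). */
static bool my_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}
]]></programlisting>
<para>
If several callers race to set the same bit, exactly one of them sees
<returnvalue>false</returnvalue> come back, so that caller knows it was
the one which changed the flag.
</para>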
916
917 <para>
918 It is possible to call these operations with bit indices greater
919 than BITS_PER_LONG. The resulting behavior is strange on big-endian
920 platforms though so it is a good idea not to do this.
921 </para>
922 </chapter>
923
924 <chapter id="symbols">
925 <title>Symbols</title>
926
927 <para>
928 Within the kernel proper, the normal linking rules apply
929 (ie. unless a symbol is declared to be file scope with the
930 <type>static</type> keyword, it can be used anywhere in the
931 kernel). However, for modules, a special exported symbol table is
932 kept which limits the entry points to the kernel proper. Modules
933 can also export symbols.
934 </para>
935
936 <sect1 id="sym-exportsymbols">
937 <title><function>EXPORT_SYMBOL()</function>
938 <filename class="headerfile">include/linux/export.h</filename></title>
939
940 <para>
941 This is the classic method of exporting a symbol: dynamically
942 loaded modules will be able to use the symbol as normal.
943 </para>
944 </sect1>
945
946 <sect1 id="sym-exportsymbols-gpl">
947 <title><function>EXPORT_SYMBOL_GPL()</function>
948 <filename class="headerfile">include/linux/export.h</filename></title>
949
950 <para>
951 Similar to <function>EXPORT_SYMBOL()</function> except that the
952 symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can
953 only be seen by modules with a
954 <function>MODULE_LICENSE()</function> that specifies a GPL
955 compatible license. It implies that the function is considered
956 an internal implementation issue, and not really an interface.
957 Some maintainers and developers may however
958 require EXPORT_SYMBOL_GPL() when adding any new APIs or functionality.
959 </para>
960 </sect1>
961 </chapter>
962
963 <chapter id="conventions">
964 <title>Routines and Conventions</title>
965
966 <sect1 id="conventions-doublelinkedlist">
967 <title>Double-linked lists
968 <filename class="headerfile">include/linux/list.h</filename></title>
969
970 <para>
971 There used to be three sets of linked-list routines in the kernel
972 headers, but this one is the winner. If you don't have some
973 particular pressing need for a single list, it's a good choice.
974 </para>
975
976 <para>
977 In particular, <function>list_for_each_entry</function> is useful.
978 </para>
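<para>
The idea behind <function>list_for_each_entry</function> is that the
list node is embedded in your own structure and the macro hands you back
the containing structure. The following is a cut-down userspace
imitation, assuming GCC's <function>__typeof__</function>; the macros
are simplified relative to the kernel header, and
<structname>item</structname> and <function>sum_items()</function> are
hypothetical.
</para>
<programlisting><![CDATA[
#include <stddef.h>

/* The node is embedded in the structure that wants to be listed. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Step back from a member pointer to the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified: the kernel's own macro differs in detail. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Hypothetical payload type. */
struct item {
	int value;
	struct list_head list;
};

static int sum_items(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	list_for_each_entry(it, head, list)
		sum += it->value;
	return sum;
}
]]></programlisting>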
979 </sect1>
980
981 <sect1 id="convention-returns">
982 <title>Return Conventions</title>
983
984 <para>
985 For code called in user context, it's very common to defy C
986 convention, and return <returnvalue>0</returnvalue> for success,
987 and a negative error number
988 (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be
989 unintuitive at first, but it's fairly widespread in the kernel.
990 </para>
991
992 <para>
993 Using <function>ERR_PTR()</function>
994
995 (<filename class="headerfile">include/linux/err.h</filename>) to
996 encode a negative error number into a pointer, and
997 <function>IS_ERR()</function> and <function>PTR_ERR()</function>
998 to get it back out again, avoids a separate pointer parameter for
999 the error number. Icky, but in a good way.
1000 </para>
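<para>
A userspace sketch of the error-pointer convention may make it less
icky. It re-creates the three helpers as plain inline functions, and
assumes that <type>long</type> is as wide as a pointer and that the top
4095 addresses are never valid. The lookup function
<function>find_name()</function> is hypothetical.
</para>
<programlisting><![CDATA[
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the last MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical lookup: returns a valid pointer or an encoded errno. */
static const char *find_name(int id)
{
	static const char *names[] = { "zero", "one" };

	if (id < 0)
		return ERR_PTR(-EINVAL);
	if (id >= 2)
		return ERR_PTR(-ENOENT);
	return names[id];
}
]]></programlisting>
<para>
A caller checks <function>IS_ERR()</function> on the result first, and
only then either dereferences it or passes
<function>PTR_ERR()</function> of it back up as its own return value.
</para>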
1001 </sect1>
1002
1003 <sect1 id="conventions-borkedcompile">
1004 <title>Breaking Compilation</title>
1005
1006 <para>
1007 Linus and the other developers sometimes change function or
1008 structure names in development kernels; this is not done just to
1009 keep everyone on their toes: it reflects a fundamental change
1010 (eg. can no longer be called with interrupts on, or does extra
1011 checks, or doesn't do checks which were caught before). Usually
1012 this is accompanied by a fairly complete note to the linux-kernel
1013 mailing list; search the archive. Simply doing a global replace
1014 on the file usually makes things <emphasis>worse</emphasis>.
1015 </para>
1016 </sect1>
1017
1018 <sect1 id="conventions-initialising">
1019 <title>Initializing structure members</title>
1020
1021 <para>
1022 The preferred method of initializing structures is to use
1023 designated initialisers, as defined by ISO C99, eg:
1024 </para>
1025 <programlisting>
1026static struct block_device_operations opt_fops = {
1027 .open = opt_open,
1028 .release = opt_release,
1029 .ioctl = opt_ioctl,
1030 .check_media_change = opt_media_change,
1031};
1032 </programlisting>
1033 <para>
1034 This makes it easy to grep for, and makes it clear which
1035 structure fields are set. You should do this because it looks
1036 cool.
1037 </para>
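<para>
One further property is worth knowing: members you do not name are
zero-initialized. A small self-contained sketch, using a hypothetical
<structname>toy_ops</structname> table rather than any real kernel
structure:
</para>
<programlisting><![CDATA[
#include <stddef.h>

/* Hypothetical operations table, in the style of file_operations. */
struct toy_ops {
	int (*open)(void);
	int (*release)(void);
	int (*ioctl)(unsigned int cmd);
};

static int toy_open(void)
{
	return 0;
}

static int toy_ioctl(unsigned int cmd)
{
	return (int)cmd + 1;
}

/* Members not named here (.release) are implicitly zero, i.e. NULL. */
static const struct toy_ops toy_fops = {
	.open	= toy_open,
	.ioctl	= toy_ioctl,
};
]]></programlisting>
<para>
Because unnamed members are NULL, code that calls through such a table
can test a pointer before using it, and adding a new member to the
structure does not break existing initializers.
</para>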
1038 </sect1>
1039
1040 <sect1 id="conventions-gnu-extns">
1041 <title>GNU Extensions</title>
1042
1043 <para>
1044 GNU Extensions are explicitly allowed in the Linux kernel.
1045 Note that some of the more complex ones are not very well
1046 supported, due to lack of general use, but the following are
1047 considered standard (see the GCC info page section "C
1048 Extensions" for more details - Yes, really the info page, the
1049 man page is only a short summary of the stuff in info).
1050 </para>
1051 <itemizedlist>
1052 <listitem>
1053 <para>
1054 Inline functions
1055 </para>
1056 </listitem>
1057 <listitem>
1058 <para>
1059 Statement expressions (ie. the ({ and }) constructs).
1060 </para>
1061 </listitem>
1062 <listitem>
1063 <para>
1064 Declaring attributes of a function / variable / type
1065 (__attribute__)
1066 </para>
1067 </listitem>
1068 <listitem>
1069 <para>
1070 typeof
1071 </para>
1072 </listitem>
1073 <listitem>
1074 <para>
1075 Zero length arrays
1076 </para>
1077 </listitem>
1078 <listitem>
1079 <para>
1080 Macro varargs
1081 </para>
1082 </listitem>
1083 <listitem>
1084 <para>
1085 Arithmetic on void pointers
1086 </para>
1087 </listitem>
1088 <listitem>
1089 <para>
1090 Non-Constant initializers
1091 </para>
1092 </listitem>
1093 <listitem>
1094 <para>
1095 Assembler Instructions (not outside arch/ and include/asm/)
1096 </para>
1097 </listitem>
1098 <listitem>
1099 <para>
1100 Function names as strings (__func__).
1101 </para>
1102 </listitem>
1103 <listitem>
1104 <para>
1105 __builtin_constant_p()
1106 </para>
1107 </listitem>
1108 </itemizedlist>
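<para>
As an illustration of two of these working together, a statement
expression combined with <function>__typeof__</function> gives a macro
which evaluates each argument exactly once. This is a simplified sketch
in the spirit of the kernel's own helpers, not a copy of them;
<function>max_once()</function> and <function>next_value()</function>
are names invented for the example.
</para>
<programlisting><![CDATA[
/*
 * Evaluates each argument exactly once, unlike the naive
 * ((a) > (b) ? (a) : (b)), which evaluates the winner twice.
 */
#define max_once(a, b) ({		\
	__typeof__(a) _a = (a);		\
	__typeof__(b) _b = (b);		\
	_a > _b ? _a : _b; })

/* Counts how often it has been called, to expose double evaluation. */
static int calls;

static int next_value(void)
{
	return ++calls;
}
]]></programlisting>
<para>
With the naive macro, <function>max_once(next_value(), 0)</function>
would call <function>next_value()</function> twice and return the
second result; with the version above it is called once.
</para>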
1109
1110 <para>
1111 Be wary when using long long in the kernel: the code gcc generates for
1112 it is horrible, and worse, division and multiplication do not work
1113 on i386 because the GCC runtime functions for them are missing from
1114 the kernel environment.
1115 </para>
1116
1117 <!-- FIXME: add a note about ANSI aliasing cleanness -->
1118 </sect1>
1119
1120 <sect1 id="conventions-cplusplus">
1121 <title>C++</title>
1122
1123 <para>
1124 Using C++ in the kernel is usually a bad idea, because the
1125 kernel does not provide the necessary runtime environment
1126 and the include files are not tested for it. It is still
1127 possible, but not recommended. If you really want to do
1128 this, forget about exceptions at least.
1129 </para>
1130 </sect1>
1131
1132 <sect1 id="conventions-ifdef">
1133 <title>&num;if</title>
1134
1135 <para>
1136 It is generally considered cleaner to use macros in header files
1137 (or at the top of .c files) to abstract away functions rather than
1138 using `#if' pre-processor statements throughout the source code.
1139 </para>
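<para>
In practice this usually means the header supplies either the real
declarations or empty inline stubs, so call sites need no conditionals.
A sketch, assuming a made-up config symbol
<symbol>CONFIG_TOY_STATS</symbol> and made-up function names:
</para>
<programlisting><![CDATA[
/* In a header: CONFIG_TOY_STATS is a hypothetical config symbol. */
#ifdef CONFIG_TOY_STATS
void toy_stats_inc(void);
int toy_stats_read(void);
#else
/*
 * Feature compiled out: callers still build, and the empty inline
 * stubs cost nothing.
 */
static inline void toy_stats_inc(void) { }
static inline int toy_stats_read(void) { return 0; }
#endif

/* In a .c file: no #ifdef is needed at the call site. */
static int toy_handle_request(int req)
{
	toy_stats_inc();
	return req * 2;
}
]]></programlisting>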
1140 </sect1>
1141 </chapter>
1142
1143 <chapter id="submitting">
1144 <title>Putting Your Stuff in the Kernel</title>
1145
1146 <para>
1147 In order to get your stuff into shape for official inclusion, or
1148 even to make a neat patch, there's administrative work to be
1149 done:
1150 </para>
1151 <itemizedlist>
1152 <listitem>
1153 <para>
1154 Figure out whose pond you've been pissing in. Look at the top of
1155 the source files, inside the <filename>MAINTAINERS</filename>
1156 file, and last of all in the <filename>CREDITS</filename> file.
1157 You should coordinate with this person to make sure you're not
1158 duplicating effort, or trying something that's already been
1159 rejected.
1160 </para>
1161
1162 <para>
1163 Make sure you put your name and EMail address at the top of
1164 any files you create or mangle significantly. This is the
1165 first place people will look when they find a bug, or when
1166 <emphasis>they</emphasis> want to make a change.
1167 </para>
1168 </listitem>
1169
1170 <listitem>
1171 <para>
1172 Usually you want a configuration option for your kernel hack.
1173 Edit <filename>Kconfig</filename> in the appropriate directory.
1174 The Config language is simple to use by cut and paste, and there's
1175 complete documentation in
1176 <filename>Documentation/kbuild/kconfig-language.txt</filename>.
1177 </para>
1178
1179 <para>
1180 In your description of the option, make sure you address both the
1181 expert user and the user who knows nothing about your feature. Mention
1182 incompatibilities and issues here. <emphasis> Definitely
1183 </emphasis> end your description with <quote> if in doubt, say N
1184 </quote> (or, occasionally, `Y'); this is for people who have no
1185 idea what you are talking about.
1186 </para>
1187 </listitem>
1188
1189 <listitem>
1190 <para>
1191 Edit the <filename>Makefile</filename>: the CONFIG variables are
1192 exported here so you can usually just add a "obj-$(CONFIG_xxx) +=
1193 xxx.o" line. The syntax is documented in
1194 <filename>Documentation/kbuild/makefiles.txt</filename>.
1195 </para>
1196 </listitem>
1197
1198 <listitem>
1199 <para>
1200 Put yourself in <filename>CREDITS</filename> if you've done
1201 something noteworthy, usually beyond a single file (your name
1202 should be at the top of the source files anyway).
1203 <filename>MAINTAINERS</filename> means you want to be consulted
1204 when changes are made to a subsystem, and hear about bugs; it
1205 implies a more-than-passing commitment to some part of the code.
1206 </para>
1207 </listitem>
1208
1209 <listitem>
1210 <para>
1211 Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename>
1212 and possibly <filename>Documentation/process/submitting-drivers.rst</filename>.
1213 </para>
1214 </listitem>
1215 </itemizedlist>
1216 </chapter>
1217
1218 <chapter id="cantrips">
1219 <title>Kernel Cantrips</title>
1220
1221 <para>
1222 Some favorites from browsing the source. Feel free to add to this
1223 list.
1224 </para>
1225
1226 <para>
1227 <filename>arch/x86/include/asm/delay.h:</filename>
1228 </para>
1229 <programlisting>
1230#define ndelay(n) (__builtin_constant_p(n) ? \
1231 ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
1232 __ndelay(n))
1233 </programlisting>
1234
1235 <para>
1236 <filename>include/linux/fs.h</filename>:
1237 </para>
1238 <programlisting>
1239/*
1240 * Kernel pointers have redundant information, so we can use a
1241 * scheme where we can return either an error code or a dentry
1242 * pointer with the same return value.
1243 *
1244 * This should be a per-architecture thing, to allow different
1245 * error and pointer decisions.
1246 */
1247 #define ERR_PTR(err) ((void *)((long)(err)))
1248 #define PTR_ERR(ptr) ((long)(ptr))
1249 #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
1250</programlisting>
1251
1252 <para>
1253 <filename>arch/x86/include/asm/uaccess_32.h:</filename>
1254 </para>
1255
1256 <programlisting>
1257#define copy_to_user(to,from,n) \
1258 (__builtin_constant_p(n) ? \
1259 __constant_copy_to_user((to),(from),(n)) : \
1260 __generic_copy_to_user((to),(from),(n)))
1261 </programlisting>
1262
1263 <para>
1264 <filename>arch/sparc/kernel/head.S:</filename>
1265 </para>
1266
1267 <programlisting>
1268/*
1269 * Sun people can't spell worth damn. "compatability" indeed.
1270 * At least we *know* we can't spell, and use a spell-checker.
1271 */
1272
1273/* Uh, actually Linus it is I who cannot spell. Too much murky
1274 * Sparc assembly will do this to ya.
1275 */
1276C_LABEL(cputypvar):
1277 .asciz "compatibility"
1278
1279/* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
1280 .align 4
1281C_LABEL(cputypvar_sun4m):
1282 .asciz "compatible"
1283 </programlisting>
1284
1285 <para>
1286 <filename>arch/sparc/lib/checksum.S:</filename>
1287 </para>
1288
1289 <programlisting>
1290 /* Sun, you just can't beat me, you just can't. Stop trying,
1291 * give up. I'm serious, I am going to kick the living shit
1292 * out of you, game over, lights out.
1293 */
1294 </programlisting>
1295 </chapter>
1296
1297 <chapter id="credits">
1298 <title>Thanks</title>
1299
1300 <para>
1301 Thanks to Andi Kleen for the idea, answering my questions, fixing
1302 my mistakes, filling content, etc. Philipp Rumpf for more spelling
1303 and clarity fixes, and some excellent non-obvious points. Werner
1304 Almesberger for giving me a great summary of
1305 <function>disable_irq()</function>, and Jes Sorensen and Andrea
1306 Arcangeli added caveats. Michael Elizabeth Chastain for checking
1307 and adding to the Configure section. <!-- Rusty insisted on this
1308 bit; I didn't do it! --> Telsa Gwynne for teaching me DocBook.
1309 </para>
1310 </chapter>
1311</book>
1312
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
deleted file mode 100644
index 7c9cc4846cb6..000000000000
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ /dev/null
@@ -1,2151 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="LKLockingGuide">
6 <bookinfo>
7 <title>Unreliable Guide To Locking</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Rusty</firstname>
12 <surname>Russell</surname>
13 <affiliation>
14 <address>
15 <email>rusty@rustcorp.com.au</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2003</year>
23 <holder>Rusty Russell</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License as published by the Free Software Foundation; either
31 version 2 of the License, or (at your option) any later
32 version.
33 </para>
34
35 <para>
36 This program is distributed in the hope that it will be
37 useful, but WITHOUT ANY WARRANTY; without even the implied
38 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
39 See the GNU General Public License for more details.
40 </para>
41
42 <para>
43 You should have received a copy of the GNU General Public
44 License along with this program; if not, write to the Free
45 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
46 MA 02111-1307 USA
47 </para>
48
49 <para>
50 For more details see the file COPYING in the source
51 distribution of Linux.
52 </para>
53 </legalnotice>
54 </bookinfo>
55
56 <toc></toc>
57 <chapter id="intro">
58 <title>Introduction</title>
59 <para>
60 Welcome, to Rusty's Remarkably Unreliable Guide to Kernel
61 Locking issues. This document describes the locking systems in
62 the Linux Kernel in 2.6.
63 </para>
64 <para>
65 With the wide availability of HyperThreading, and <firstterm
66 linkend="gloss-preemption">preemption </firstterm> in the Linux
67 Kernel, everyone hacking on the kernel needs to know the
68 fundamentals of concurrency and locking for
69 <firstterm linkend="gloss-smp"><acronym>SMP</acronym></firstterm>.
70 </para>
71 </chapter>
72
73 <chapter id="races">
74 <title>The Problem With Concurrency</title>
75 <para>
76 (Skip this if you know what a Race Condition is).
77 </para>
78 <para>
79 In a normal program, you can increment a counter like so:
80 </para>
81 <programlisting>
82 very_important_count++;
83 </programlisting>
84
85 <para>
86 This is what you would expect to happen:
87 </para>
88
89 <table>
90 <title>Expected Results</title>
91
92 <tgroup cols="2" align="left">
93
94 <thead>
95 <row>
96 <entry>Instance 1</entry>
97 <entry>Instance 2</entry>
98 </row>
99 </thead>
100
101 <tbody>
102 <row>
103 <entry>read very_important_count (5)</entry>
104 <entry></entry>
105 </row>
106 <row>
107 <entry>add 1 (6)</entry>
108 <entry></entry>
109 </row>
110 <row>
111 <entry>write very_important_count (6)</entry>
112 <entry></entry>
113 </row>
114 <row>
115 <entry></entry>
116 <entry>read very_important_count (6)</entry>
117 </row>
118 <row>
119 <entry></entry>
120 <entry>add 1 (7)</entry>
121 </row>
122 <row>
123 <entry></entry>
124 <entry>write very_important_count (7)</entry>
125 </row>
126 </tbody>
127
128 </tgroup>
129 </table>
130
131 <para>
132 This is what might happen:
133 </para>
134
135 <table>
136 <title>Possible Results</title>
137
138 <tgroup cols="2" align="left">
139 <thead>
140 <row>
141 <entry>Instance 1</entry>
142 <entry>Instance 2</entry>
143 </row>
144 </thead>
145
146 <tbody>
147 <row>
148 <entry>read very_important_count (5)</entry>
149 <entry></entry>
150 </row>
151 <row>
152 <entry></entry>
153 <entry>read very_important_count (5)</entry>
154 </row>
155 <row>
156 <entry>add 1 (6)</entry>
157 <entry></entry>
158 </row>
159 <row>
160 <entry></entry>
161 <entry>add 1 (6)</entry>
162 </row>
163 <row>
164 <entry>write very_important_count (6)</entry>
165 <entry></entry>
166 </row>
167 <row>
168 <entry></entry>
169 <entry>write very_important_count (6)</entry>
170 </row>
171 </tbody>
172 </tgroup>
173 </table>
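<para>
The two tables can be replayed deterministically in a single thread by
splitting the increment into its read, add and write steps. This is only
a model of the interleaving, with hypothetical helper names; real races
are not this polite about when they happen.
</para>
<programlisting><![CDATA[
/* The shared counter from the tables above. */
static int very_important_count;

/* Each instance has its own private scratch value ("register"). */
struct instance {
	int reg;
};

static void step_read(struct instance *i)
{
	i->reg = very_important_count;
}

static void step_add(struct instance *i)
{
	i->reg += 1;
}

static void step_write(struct instance *i)
{
	very_important_count = i->reg;
}

/* "Expected Results": instance 1 finishes before instance 2 starts. */
static int run_serial(int start)
{
	struct instance a, b;

	very_important_count = start;
	step_read(&a); step_add(&a); step_write(&a);
	step_read(&b); step_add(&b); step_write(&b);
	return very_important_count;
}

/* "Possible Results": the steps of the two instances interleave. */
static int run_interleaved(int start)
{
	struct instance a, b;

	very_important_count = start;
	step_read(&a);
	step_read(&b);
	step_add(&a);
	step_add(&b);
	step_write(&a);
	step_write(&b);		/* overwrites a's update with the same value */
	return very_important_count;
}
]]></programlisting>
<para>
Starting from 5, the serial ordering ends at 7 while the interleaved
ordering ends at 6: one increment has been lost.
</para>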
174
175 <sect1 id="race-condition">
176 <title>Race Conditions and Critical Regions</title>
177 <para>
178 This overlap, where the result depends on the
179 relative timing of multiple tasks, is called a <firstterm>race condition</firstterm>.
180 The piece of code containing the concurrency issue is called a
181 <firstterm>critical region</firstterm>. And especially since Linux started running
182 on SMP machines, they became one of the major issues in kernel
183 design and implementation.
184 </para>
185 <para>
186 Preemption can have the same effect, even if there is only one
187 CPU: by preempting one task during the critical region, we have
188 exactly the same race condition. In this case the thread which
189 preempts might run the critical region itself.
190 </para>
191 <para>
192 The solution is to recognize when these simultaneous accesses
193 occur, and use locks to make sure that only one instance can
194 enter the critical region at any time. There are many
195 friendly primitives in the Linux kernel to help you do this.
196 And then there are the unfriendly primitives, but I'll pretend
197 they don't exist.
198 </para>
199 </sect1>
200 </chapter>
201
202 <chapter id="locks">
203 <title>Locking in the Linux Kernel</title>
204
205 <para>
206 If I could give you one piece of advice: never sleep with anyone
207 crazier than yourself. But if I had to give you advice on
208 locking: <emphasis>keep it simple</emphasis>.
209 </para>
210
211 <para>
212 Be reluctant to introduce new locks.
213 </para>
214
215 <para>
216 Strangely enough, this last one is the exact reverse of my advice when
217 you <emphasis>have</emphasis> slept with someone crazier than yourself.
218 And you should think about getting a big dog.
219 </para>
220
221 <sect1 id="lock-intro">
222 <title>Two Main Types of Kernel Locks: Spinlocks and Mutexes</title>
223
224 <para>
225 There are two main types of kernel locks. The fundamental type
226 is the spinlock
227 (<filename class="headerfile">include/asm/spinlock.h</filename>),
228 which is a very simple single-holder lock: if you can't get the
229 spinlock, you keep trying (spinning) until you can. Spinlocks are
230 very small and fast, and can be used anywhere.
231 </para>
232 <para>
233 The second type is a mutex
234 (<filename class="headerfile">include/linux/mutex.h</filename>): it
235 is like a spinlock, but you may block holding a mutex.
236 If you can't lock a mutex, your task will suspend itself, and be woken
237 up when the mutex is released. This means the CPU can do something
238 else while you are waiting. There are many cases when you simply
239 can't sleep (see <xref linkend="sleeping-things"/>), and so have to
240 use a spinlock instead.
241 </para>
242 <para>
243 Neither type of lock is recursive: see
244 <xref linkend="deadlock"/>.
245 </para>
246 </sect1>
247
248 <sect1 id="uniprocessor">
249 <title>Locks and Uniprocessor Kernels</title>
250
251 <para>
252 For kernels compiled without <symbol>CONFIG_SMP</symbol>, and
253 without <symbol>CONFIG_PREEMPT</symbol>, spinlocks do not exist at
254 all. This is an excellent design decision: when no-one else can
255 run at the same time, there is no reason to have a lock.
256 </para>
257
258 <para>
259 If the kernel is compiled without <symbol>CONFIG_SMP</symbol>,
260 but <symbol>CONFIG_PREEMPT</symbol> is set, then spinlocks
261 simply disable preemption, which is sufficient to prevent any
262 races. For most purposes, we can think of preemption as
263 equivalent to SMP, and not worry about it separately.
264 </para>
265
266 <para>
267 You should always test your locking code with <symbol>CONFIG_SMP</symbol>
268 and <symbol>CONFIG_PREEMPT</symbol> enabled, even if you don't have an SMP test box, because it
269 will still catch some kinds of locking bugs.
270 </para>
271
272 <para>
273 Mutexes still exist, because they are required for
274 synchronization between <firstterm linkend="gloss-usercontext">user
275 contexts</firstterm>, as we will see below.
276 </para>
277 </sect1>
278
279 <sect1 id="usercontextlocking">
280 <title>Locking Only In User Context</title>
281
282 <para>
283 If you have a data structure which is only ever accessed from
284 user context, then you can use a simple mutex
285 (<filename>include/linux/mutex.h</filename>) to protect it. This
286 is the most trivial case: you initialize the mutex. Then you can
287 call <function>mutex_lock_interruptible()</function> to grab the mutex,
288 and <function>mutex_unlock()</function> to release it. There is also a
289 <function>mutex_lock()</function>, which should be avoided, because it
290 will not return if a signal is received.
291 </para>
292
293 <para>
294 Example: <filename>net/netfilter/nf_sockopt.c</filename> allows
295 registration of new <function>setsockopt()</function> and
296 <function>getsockopt()</function> calls, with
297 <function>nf_register_sockopt()</function>. Registration and
298 de-registration are only done on module load and unload (and boot
299 time, where there is no concurrency), and the list of registrations
300 is only consulted for an unknown <function>setsockopt()</function>
301 or <function>getsockopt()</function> system call. The
302 <varname>nf_sockopt_mutex</varname> is perfect to protect this,
303 especially since the setsockopt and getsockopt calls may well
304 sleep.
305 </para>
306 </sect1>
307
308 <sect1 id="lock-user-bh">
309 <title>Locking Between User Context and Softirqs</title>
310
311 <para>
312 If a <firstterm linkend="gloss-softirq">softirq</firstterm> shares
313 data with user context, you have two problems. Firstly, the current
314 user context can be interrupted by a softirq, and secondly, the
315 critical region could be entered from another CPU. This is where
316 <function>spin_lock_bh()</function>
317 (<filename class="headerfile">include/linux/spinlock.h</filename>) is
318 used. It disables softirqs on that CPU, then grabs the lock.
319 <function>spin_unlock_bh()</function> does the reverse. (The
320 '_bh' suffix is a historical reference to "Bottom Halves", the
321 old name for software interrupts. It should really be
322 called 'spin_lock_softirq()' in a perfect world).
323 </para>
324
325 <para>
326 Note that you can also use <function>spin_lock_irq()</function>
327 or <function>spin_lock_irqsave()</function> here, which stop
328 hardware interrupts as well: see <xref linkend="hardirq-context"/>.
329 </para>
330
331 <para>
332 This works perfectly for <firstterm linkend="gloss-up"><acronym>UP
333 </acronym></firstterm> as well: the spin lock vanishes, and this macro
334 simply becomes <function>local_bh_disable()</function>
335 (<filename class="headerfile">include/linux/interrupt.h</filename>), which
336 protects you from the softirq being run.
337 </para>
338 </sect1>
339
340 <sect1 id="lock-user-tasklet">
341 <title>Locking Between User Context and Tasklets</title>
342
343 <para>
344 This is exactly the same as above, because <firstterm
345 linkend="gloss-tasklet">tasklets</firstterm> are actually run
346 from a softirq.
347 </para>
348 </sect1>
349
350 <sect1 id="lock-user-timers">
351 <title>Locking Between User Context and Timers</title>
352
353 <para>
354 This, too, is exactly the same as above, because <firstterm
355 linkend="gloss-timers">timers</firstterm> are actually run from
356 a softirq. From a locking point of view, tasklets and timers
357 are identical.
358 </para>
359 </sect1>
360
361 <sect1 id="lock-tasklets">
362 <title>Locking Between Tasklets/Timers</title>
363
364 <para>
365 Sometimes a tasklet or timer might want to share data with
366 another tasklet or timer.
367 </para>
368
369 <sect2 id="lock-tasklets-same">
370 <title>The Same Tasklet/Timer</title>
371 <para>
372 Since a tasklet is never run on two CPUs at once, you don't
373 need to worry about your tasklet being reentrant (running
374 twice at once), even on SMP.
375 </para>
376 </sect2>
377
378 <sect2 id="lock-tasklets-different">
379 <title>Different Tasklets/Timers</title>
380 <para>
381 If another tasklet/timer wants
382 to share data with your tasklet or timer, you will both need to use
383 <function>spin_lock()</function> and
384 <function>spin_unlock()</function> calls.
385 <function>spin_lock_bh()</function> is
386 unnecessary here, as you are already in a tasklet, and
387 none will be run on the same CPU.
388 </para>
389 </sect2>
390 </sect1>
391
392 <sect1 id="lock-softirqs">
393 <title>Locking Between Softirqs</title>
394
395 <para>
396 Often a softirq might
397 want to share data with itself or a tasklet/timer.
398 </para>
399
400 <sect2 id="lock-softirqs-same">
401 <title>The Same Softirq</title>
402
403 <para>
404 The same softirq can run on the other CPUs: you can use a
405 per-CPU array (see <xref linkend="per-cpu"/>) for better
406 performance. If you're going so far as to use a softirq,
407 you probably care about scalable performance enough
408 to justify the extra complexity.
409 </para>
410
411 <para>
412 You'll need to use <function>spin_lock()</function> and
413 <function>spin_unlock()</function> for shared data.
414 </para>
415 </sect2>
416
417 <sect2 id="lock-softirqs-different">
418 <title>Different Softirqs</title>
419
420 <para>
421 You'll need to use <function>spin_lock()</function> and
422 <function>spin_unlock()</function> for shared data, whether it
423 be a timer, tasklet, different softirq or the same or another
424 softirq: any of them could be running on a different CPU.
425 </para>
426 </sect2>
427 </sect1>
428 </chapter>
429
430 <chapter id="hardirq-context">
431 <title>Hard IRQ Context</title>
432
433 <para>
434 Hardware interrupts usually communicate with a
435 tasklet or softirq. Frequently this involves putting work in a
436 queue, which the softirq will take out.
437 </para>
438
439 <sect1 id="hardirq-softirq">
440 <title>Locking Between Hard IRQ and Softirqs/Tasklets</title>
441
442 <para>
443 If a hardware irq handler shares data with a softirq, you have
444 two concerns. Firstly, the softirq processing can be
445 interrupted by a hardware interrupt, and secondly, the
446 critical region could be entered by a hardware interrupt on
447 another CPU. This is where <function>spin_lock_irq()</function> is
448 used. It is defined to disable interrupts on that cpu, then grab
449 the lock. <function>spin_unlock_irq()</function> does the reverse.
450 </para>
451
452 <para>
453 The irq handler does not need to use
454 <function>spin_lock_irq()</function>, because the softirq cannot
455 run while the irq handler is running: it can use
456 <function>spin_lock()</function>, which is slightly faster. The
457 only exception would be if a different hardware irq handler uses
458 the same lock: <function>spin_lock_irq()</function> will stop
459 that from interrupting us.
460 </para>
461
462 <para>
463 This works perfectly for UP as well: the spin lock vanishes,
464 and this macro simply becomes <function>local_irq_disable()</function>
465 (<filename class="headerfile">include/asm/smp.h</filename>), which
466 protects you from the softirq/tasklet/BH being run.
467 </para>
468
469 <para>
470 <function>spin_lock_irqsave()</function>
471 (<filename>include/linux/spinlock.h</filename>) is a variant
472 which saves whether interrupts were on or off in a flags word,
473 which is passed to <function>spin_unlock_irqrestore()</function>. This
474 means that the same code can be used inside a hard irq handler (where
475 interrupts are already off) and in softirqs (where the irq
476 disabling is required).
477 </para>
478
479 <para>
480 Note that softirqs (and hence tasklets and timers) are run on
481 return from hardware interrupts, so
482 <function>spin_lock_irq()</function> also stops these. In that
483 sense, <function>spin_lock_irqsave()</function> is the most
484 general and powerful locking function.
485 </para>
486
487 </sect1>
488 <sect1 id="hardirq-hardirq">
489 <title>Locking Between Two Hard IRQ Handlers</title>
490 <para>
491 It is rare to have to share data between two IRQ handlers, but
492 if you do, <function>spin_lock_irqsave()</function> should be
493 used: it is architecture-specific whether all interrupts are
494 disabled inside irq handlers themselves.
495 </para>
496 </sect1>
497
498 </chapter>
499
500 <chapter id="cheatsheet">
501 <title>Cheat Sheet For Locking</title>
502 <para>
503 Pete Zaitcev gives the following summary:
504 </para>
505 <itemizedlist>
506 <listitem>
507 <para>
508 If you are in a process context (any syscall) and want to
509 lock other processes out, use a mutex. You can take a mutex
510 and sleep (<function>copy_from_user*()</function> or
511 <function>kmalloc(x,GFP_KERNEL)</function>).
512 </para>
513 </listitem>
514 <listitem>
515 <para>
516 Otherwise (== data can be touched in an interrupt), use
517 <function>spin_lock_irqsave()</function> and
518 <function>spin_unlock_irqrestore()</function>.
519 </para>
520 </listitem>
521 <listitem>
522 <para>
523 Avoid holding a spinlock for more than 5 lines of code and
524 across any function call (except accessors like
525 <function>readb</function>).
526 </para>
527 </listitem>
528 </itemizedlist>
529
530 <sect1 id="minimum-lock-reqirements">
531 <title>Table of Minimum Requirements</title>
532
533 <para> The following table lists the <emphasis>minimum</emphasis>
534 locking requirements between various contexts. In some cases,
535 the same context can only be running on one CPU at a time, so
536 no locking is required for that context (eg. a particular
537 thread can only run on one CPU at a time, but if it needs to
538 share data with another thread, locking is required).
539 </para>
540 <para>
541 Remember the advice above: you can always use
542 <function>spin_lock_irqsave()</function>, which is a superset
543 of all other spinlock primitives.
544 </para>
545
546 <table>
547<title>Table of Locking Requirements</title>
548<tgroup cols="11">
549<tbody>
550
551<row>
552<entry></entry>
553<entry>IRQ Handler A</entry>
554<entry>IRQ Handler B</entry>
555<entry>Softirq A</entry>
556<entry>Softirq B</entry>
557<entry>Tasklet A</entry>
558<entry>Tasklet B</entry>
559<entry>Timer A</entry>
560<entry>Timer B</entry>
561<entry>User Context A</entry>
562<entry>User Context B</entry>
563</row>
564
565<row>
566<entry>IRQ Handler A</entry>
567<entry>None</entry>
568</row>
569
570<row>
571<entry>IRQ Handler B</entry>
572<entry>SLIS</entry>
573<entry>None</entry>
574</row>
575
576<row>
577<entry>Softirq A</entry>
578<entry>SLI</entry>
579<entry>SLI</entry>
580<entry>SL</entry>
581</row>
582
583<row>
584<entry>Softirq B</entry>
585<entry>SLI</entry>
586<entry>SLI</entry>
587<entry>SL</entry>
588<entry>SL</entry>
589</row>
590
591<row>
592<entry>Tasklet A</entry>
593<entry>SLI</entry>
594<entry>SLI</entry>
595<entry>SL</entry>
596<entry>SL</entry>
597<entry>None</entry>
598</row>
599
600<row>
601<entry>Tasklet B</entry>
602<entry>SLI</entry>
603<entry>SLI</entry>
604<entry>SL</entry>
605<entry>SL</entry>
606<entry>SL</entry>
607<entry>None</entry>
608</row>
609
610<row>
611<entry>Timer A</entry>
612<entry>SLI</entry>
613<entry>SLI</entry>
614<entry>SL</entry>
615<entry>SL</entry>
616<entry>SL</entry>
617<entry>SL</entry>
618<entry>None</entry>
619</row>
620
621<row>
622<entry>Timer B</entry>
623<entry>SLI</entry>
624<entry>SLI</entry>
625<entry>SL</entry>
626<entry>SL</entry>
627<entry>SL</entry>
628<entry>SL</entry>
629<entry>SL</entry>
630<entry>None</entry>
631</row>
632
633<row>
634<entry>User Context A</entry>
635<entry>SLI</entry>
636<entry>SLI</entry>
637<entry>SLBH</entry>
638<entry>SLBH</entry>
639<entry>SLBH</entry>
640<entry>SLBH</entry>
641<entry>SLBH</entry>
642<entry>SLBH</entry>
643<entry>None</entry>
644</row>
645
646<row>
647<entry>User Context B</entry>
648<entry>SLI</entry>
649<entry>SLI</entry>
650<entry>SLBH</entry>
651<entry>SLBH</entry>
652<entry>SLBH</entry>
653<entry>SLBH</entry>
654<entry>SLBH</entry>
655<entry>SLBH</entry>
656<entry>MLI</entry>
657<entry>None</entry>
658</row>
659
660</tbody>
661</tgroup>
662</table>
663
664 <table>
665<title>Legend for Locking Requirements Table</title>
666<tgroup cols="2">
667<tbody>
668
669<row>
670<entry>SLIS</entry>
671<entry>spin_lock_irqsave</entry>
672</row>
673<row>
674<entry>SLI</entry>
675<entry>spin_lock_irq</entry>
676</row>
677<row>
678<entry>SL</entry>
679<entry>spin_lock</entry>
680</row>
681<row>
682<entry>SLBH</entry>
683<entry>spin_lock_bh</entry>
684</row>
685<row>
686<entry>MLI</entry>
687<entry>mutex_lock_interruptible</entry>
688</row>
689
690</tbody>
691</tgroup>
692</table>
693
694</sect1>
695</chapter>
696
697<chapter id="trylock-functions">
698 <title>The trylock Functions</title>
699 <para>
700 There are functions that try to acquire a lock only once and immediately
701 return a value indicating whether the lock was acquired.
702 They are useful when you can skip the access to the protected data
703 if some other thread is holding the lock. If you later find that you do
704 need the data, you must acquire the lock at that point.
705 </para>
706
707 <para>
708 <function>spin_trylock()</function> does not spin but returns non-zero if
709 it acquires the spinlock on the first try or 0 if not. This function can
710 be used in all contexts like <function>spin_lock</function>: you must have
711 disabled the contexts that might interrupt you and acquire the spin lock.
712 </para>
713
714 <para>
715 <function>mutex_trylock()</function> does not suspend your task
716 but returns non-zero if it could lock the mutex on the first try
717 or 0 if not. This function cannot be safely used in hardware or software
718 interrupt contexts despite not sleeping.
719 </para>
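 <para>
 As a rough illustration of "try once, never wait", here is a userspace
 sketch built on C11 atomics rather than the kernel API. The names
 <function>toy_spin_trylock</function>,
 <function>stats_try_update</function> and the statistics counter are
 hypothetical, invented for this example. Only the return convention
 (non-zero on success, 0 on failure) is meant to mirror
 <function>spin_trylock()</function>.
 </para>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace lock: NOT the kernel spinlock_t, just a test-and-set flag. */
struct toy_spinlock {
	atomic_flag locked;
};

/* Try exactly once. Returns true if we now hold the lock, false if busy. */
static inline bool toy_spin_trylock(struct toy_spinlock *l)
{
	/* test_and_set returns the previous value: false means it was free. */
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static inline void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical data protected by stats_lock. */
static struct toy_spinlock stats_lock = { ATOMIC_FLAG_INIT };
static unsigned long stats_updates;

/*
 * Bump the counter only if the lock is free right now. If someone else
 * holds it, leave the data alone and report failure instead of waiting.
 */
bool stats_try_update(void)
{
	if (!toy_spin_trylock(&stats_lock))
		return false;	/* busy: we must not touch stats_updates */

	stats_updates++;
	toy_spin_unlock(&stats_lock);
	return true;
}
```

 <para>
 A caller that cannot afford to wait, such as a fast path updating
 optional statistics, simply skips the update when the function
 returns false.
 </para>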
720</chapter>
721
722 <chapter id="Examples">
723 <title>Common Examples</title>
724 <para>
725Let's step through a simple example: a cache of number to name
726mappings. The cache keeps a count of how often each of the objects is
727used, and when it gets full, throws out the least used one.
728
729 </para>
730
731 <sect1 id="examples-usercontext">
732 <title>All In User Context</title>
733 <para>
734For our first example, we assume that all operations are in user
735context (ie. from system calls), so we can sleep. This means we can
736use a mutex to protect the cache and all the objects within
737it. Here's the code:
738 </para>
739
740 <programlisting>
741#include &lt;linux/list.h&gt;
742#include &lt;linux/slab.h&gt;
743#include &lt;linux/string.h&gt;
744#include &lt;linux/mutex.h&gt;
745#include &lt;asm/errno.h&gt;
746
747struct object
748{
749 struct list_head list;
750 int id;
751 char name[32];
752 int popularity;
753};
754
755/* Protects the cache, cache_num, and the objects within it */
756static DEFINE_MUTEX(cache_lock);
757static LIST_HEAD(cache);
758static unsigned int cache_num = 0;
759#define MAX_CACHE_SIZE 10
760
761/* Must be holding cache_lock */
762static struct object *__cache_find(int id)
763{
764 struct object *i;
765
766 list_for_each_entry(i, &amp;cache, list)
767 if (i-&gt;id == id) {
768 i-&gt;popularity++;
769 return i;
770 }
771 return NULL;
772}
773
774/* Must be holding cache_lock */
775static void __cache_delete(struct object *obj)
776{
777 BUG_ON(!obj);
778 list_del(&amp;obj-&gt;list);
779 kfree(obj);
780 cache_num--;
781}
782
783/* Must be holding cache_lock */
784static void __cache_add(struct object *obj)
785{
786 list_add(&amp;obj-&gt;list, &amp;cache);
787 if (++cache_num &gt; MAX_CACHE_SIZE) {
788 struct object *i, *outcast = NULL;
789 list_for_each_entry(i, &amp;cache, list) {
790 if (!outcast || i-&gt;popularity &lt; outcast-&gt;popularity)
791 outcast = i;
792 }
793 __cache_delete(outcast);
794 }
795}
796
797int cache_add(int id, const char *name)
798{
799 struct object *obj;
800
801 if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
802 return -ENOMEM;
803
804 strlcpy(obj-&gt;name, name, sizeof(obj-&gt;name));
805 obj-&gt;id = id;
806 obj-&gt;popularity = 0;
807
808 mutex_lock(&amp;cache_lock);
809 __cache_add(obj);
810 mutex_unlock(&amp;cache_lock);
811 return 0;
812}
813
814void cache_delete(int id)
815{
816 mutex_lock(&amp;cache_lock);
817 __cache_delete(__cache_find(id));
818 mutex_unlock(&amp;cache_lock);
819}
820
821int cache_find(int id, char *name)
822{
823 struct object *obj;
824 int ret = -ENOENT;
825
826 mutex_lock(&amp;cache_lock);
827 obj = __cache_find(id);
828 if (obj) {
829 ret = 0;
830 strcpy(name, obj-&gt;name);
831 }
832 mutex_unlock(&amp;cache_lock);
833 return ret;
834}
835</programlisting>
836
837 <para>
838Note that we always make sure we have the cache_lock when we add,
839delete, or look up the cache: both the cache infrastructure itself and
840the contents of the objects are protected by the lock. In this case
841it's easy, since we copy the data for the user, and never let them
842access the objects directly.
843 </para>
844 <para>
845There is a slight (and common) optimization here: in
846<function>cache_add</function> we set up the fields of the object
847before grabbing the lock. This is safe, as no-one else can access it
848until we put it in cache.
849 </para>
850 </sect1>
851
852 <sect1 id="examples-interrupt">
853 <title>Accessing From Interrupt Context</title>
854 <para>
855Now consider the case where <function>cache_find</function> can be
856called from interrupt context: either a hardware interrupt or a
857softirq. An example would be a timer which deletes objects from the
858cache.
859 </para>
860 <para>
861The change is shown below, in standard patch format: the
862<symbol>-</symbol> are lines which are taken away, and the
863<symbol>+</symbol> are lines which are added.
864 </para>
865<programlisting>
866--- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
867+++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100
868@@ -12,7 +12,7 @@
869 int popularity;
870 };
871
872-static DEFINE_MUTEX(cache_lock);
873+static DEFINE_SPINLOCK(cache_lock);
874 static LIST_HEAD(cache);
875 static unsigned int cache_num = 0;
876 #define MAX_CACHE_SIZE 10
877@@ -55,6 +55,7 @@
878 int cache_add(int id, const char *name)
879 {
880 struct object *obj;
881+ unsigned long flags;
882
883 if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
884 return -ENOMEM;
885@@ -63,30 +64,33 @@
886 obj-&gt;id = id;
887 obj-&gt;popularity = 0;
888
889- mutex_lock(&amp;cache_lock);
890+ spin_lock_irqsave(&amp;cache_lock, flags);
891 __cache_add(obj);
892- mutex_unlock(&amp;cache_lock);
893+ spin_unlock_irqrestore(&amp;cache_lock, flags);
894 return 0;
895 }
896
897 void cache_delete(int id)
898 {
899- mutex_lock(&amp;cache_lock);
900+ unsigned long flags;
901+
902+ spin_lock_irqsave(&amp;cache_lock, flags);
903 __cache_delete(__cache_find(id));
904- mutex_unlock(&amp;cache_lock);
905+ spin_unlock_irqrestore(&amp;cache_lock, flags);
906 }
907
908 int cache_find(int id, char *name)
909 {
910 struct object *obj;
911 int ret = -ENOENT;
912+ unsigned long flags;
913
914- mutex_lock(&amp;cache_lock);
915+ spin_lock_irqsave(&amp;cache_lock, flags);
916 obj = __cache_find(id);
917 if (obj) {
918 ret = 0;
919 strcpy(name, obj-&gt;name);
920 }
921- mutex_unlock(&amp;cache_lock);
922+ spin_unlock_irqrestore(&amp;cache_lock, flags);
923 return ret;
924 }
925</programlisting>
926
927 <para>
928Note that the <function>spin_lock_irqsave</function> will turn off
929interrupts if they are on, and otherwise does nothing (if we are already
930in an interrupt handler), hence these functions are safe to call from
931any context.
932 </para>
933 <para>
934Unfortunately, <function>cache_add</function> calls
935<function>kmalloc</function> with the <symbol>GFP_KERNEL</symbol>
936flag, which is only legal in user context. I have assumed that
937<function>cache_add</function> is still only called in user context,
938otherwise this should become a parameter to
939<function>cache_add</function>.
940 </para>
941 </sect1>
942 <sect1 id="examples-refcnt">
943 <title>Exposing Objects Outside This File</title>
944 <para>
945If our objects contained more information, it might not be sufficient
946to copy the information in and out: other parts of the code might want
947to keep pointers to these objects, for example, rather than looking up
948the id every time. This produces two problems.
949 </para>
950 <para>
951The first problem is that we use the <symbol>cache_lock</symbol> to
952protect objects: we'd need to make this non-static so the rest of the
953code can use it. This makes locking trickier, as it is no longer all
954in one place.
955 </para>
956 <para>
957The second problem is the lifetime problem: if another structure keeps
958a pointer to an object, it presumably expects that pointer to remain
959valid. Unfortunately, this is only guaranteed while you hold the
960lock, otherwise someone might call <function>cache_delete</function>
961and even worse, add another object, re-using the same address.
962 </para>
963 <para>
964As there is only one lock, you can't hold it forever: no-one else would
965get any work done.
966 </para>
967 <para>
968The solution to this problem is to use a reference count: everyone who
969has a pointer to the object increases it when they first get the
970object, and drops the reference count when they're finished with it.
971Whoever drops it to zero knows it is unused, and can actually delete it.
972 </para>
973 <para>
974Here is the code:
975 </para>
976
977<programlisting>
978--- cache.c.interrupt 2003-12-09 14:25:43.000000000 +1100
979+++ cache.c.refcnt 2003-12-09 14:33:05.000000000 +1100
980@@ -7,6 +7,7 @@
981 struct object
982 {
983 struct list_head list;
984+ unsigned int refcnt;
985 int id;
986 char name[32];
987 int popularity;
988@@ -17,6 +18,35 @@
989 static unsigned int cache_num = 0;
990 #define MAX_CACHE_SIZE 10
991
992+static void __object_put(struct object *obj)
993+{
994+ if (--obj-&gt;refcnt == 0)
995+ kfree(obj);
996+}
997+
998+static void __object_get(struct object *obj)
999+{
1000+ obj-&gt;refcnt++;
1001+}
1002+
1003+void object_put(struct object *obj)
1004+{
1005+ unsigned long flags;
1006+
1007+ spin_lock_irqsave(&amp;cache_lock, flags);
1008+ __object_put(obj);
1009+ spin_unlock_irqrestore(&amp;cache_lock, flags);
1010+}
1011+
1012+void object_get(struct object *obj)
1013+{
1014+ unsigned long flags;
1015+
1016+ spin_lock_irqsave(&amp;cache_lock, flags);
1017+ __object_get(obj);
1018+ spin_unlock_irqrestore(&amp;cache_lock, flags);
1019+}
1020+
1021 /* Must be holding cache_lock */
1022 static struct object *__cache_find(int id)
1023 {
1024@@ -35,6 +65,7 @@
1025 {
1026 BUG_ON(!obj);
1027 list_del(&amp;obj-&gt;list);
1028+ __object_put(obj);
1029 cache_num--;
1030 }
1031
1032@@ -63,6 +94,7 @@
1033 strlcpy(obj-&gt;name, name, sizeof(obj-&gt;name));
1034 obj-&gt;id = id;
1035 obj-&gt;popularity = 0;
1036+ obj-&gt;refcnt = 1; /* The cache holds a reference */
1037
1038 spin_lock_irqsave(&amp;cache_lock, flags);
1039 __cache_add(obj);
1040@@ -79,18 +111,15 @@
1041 spin_unlock_irqrestore(&amp;cache_lock, flags);
1042 }
1043
1044-int cache_find(int id, char *name)
1045+struct object *cache_find(int id)
1046 {
1047 struct object *obj;
1048- int ret = -ENOENT;
1049 unsigned long flags;
1050
1051 spin_lock_irqsave(&amp;cache_lock, flags);
1052 obj = __cache_find(id);
1053- if (obj) {
1054- ret = 0;
1055- strcpy(name, obj-&gt;name);
1056- }
1057+ if (obj)
1058+ __object_get(obj);
1059 spin_unlock_irqrestore(&amp;cache_lock, flags);
1060- return ret;
1061+ return obj;
1062 }
1063</programlisting>
1064
1065<para>
1066We encapsulate the reference counting in the standard 'get' and 'put'
1067functions. Now we can return the object itself from
1068<function>cache_find</function> which has the advantage that the user
1069can now sleep holding the object (eg. to
1070<function>copy_to_user</function> the name to userspace).
1071</para>
1072<para>
1073The other point to note is that I said a reference should be held for
1074every pointer to the object: thus the reference count is 1 when first
1075inserted into the cache. In some versions the framework does not hold
1076a reference count, but they are more complicated.
1077</para>
1078
1079 <sect2 id="examples-refcnt-atomic">
1080 <title>Using Atomic Operations For The Reference Count</title>
1081<para>
1082In practice, <type>atomic_t</type> would usually be used for
1083<structfield>refcnt</structfield>. There are a number of atomic
1084operations defined in
1085
1086<filename class="headerfile">include/asm/atomic.h</filename>: these are
1087guaranteed to be seen atomically from all CPUs in the system, so no
1088lock is required. In this case, it is simpler than using spinlocks,
1089although for anything non-trivial using spinlocks is clearer. The
1090<function>atomic_inc</function> and
1091<function>atomic_dec_and_test</function> are used instead of the
1092standard increment and decrement operators, and the lock is no longer
1093used to protect the reference count itself.
1094</para>
1095
1096<programlisting>
1097--- cache.c.refcnt 2003-12-09 15:00:35.000000000 +1100
1098+++ cache.c.refcnt-atomic 2003-12-11 15:49:42.000000000 +1100
1099@@ -7,7 +7,7 @@
1100 struct object
1101 {
1102 struct list_head list;
1103- unsigned int refcnt;
1104+ atomic_t refcnt;
1105 int id;
1106 char name[32];
1107 int popularity;
1108@@ -18,33 +18,15 @@
1109 static unsigned int cache_num = 0;
1110 #define MAX_CACHE_SIZE 10
1111
1112-static void __object_put(struct object *obj)
1113-{
1114- if (--obj-&gt;refcnt == 0)
1115- kfree(obj);
1116-}
1117-
1118-static void __object_get(struct object *obj)
1119-{
1120- obj-&gt;refcnt++;
1121-}
1122-
1123 void object_put(struct object *obj)
1124 {
1125- unsigned long flags;
1126-
1127- spin_lock_irqsave(&amp;cache_lock, flags);
1128- __object_put(obj);
1129- spin_unlock_irqrestore(&amp;cache_lock, flags);
1130+ if (atomic_dec_and_test(&amp;obj-&gt;refcnt))
1131+ kfree(obj);
1132 }
1133
1134 void object_get(struct object *obj)
1135 {
1136- unsigned long flags;
1137-
1138- spin_lock_irqsave(&amp;cache_lock, flags);
1139- __object_get(obj);
1140- spin_unlock_irqrestore(&amp;cache_lock, flags);
1141+ atomic_inc(&amp;obj-&gt;refcnt);
1142 }
1143
1144 /* Must be holding cache_lock */
1145@@ -65,7 +47,7 @@
1146 {
1147 BUG_ON(!obj);
1148 list_del(&amp;obj-&gt;list);
1149- __object_put(obj);
1150+ object_put(obj);
1151 cache_num--;
1152 }
1153
1154@@ -94,7 +76,7 @@
1155 strlcpy(obj-&gt;name, name, sizeof(obj-&gt;name));
1156 obj-&gt;id = id;
1157 obj-&gt;popularity = 0;
1158- obj-&gt;refcnt = 1; /* The cache holds a reference */
1159+ atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
1160
1161 spin_lock_irqsave(&amp;cache_lock, flags);
1162 __cache_add(obj);
1163@@ -119,7 +101,7 @@
1164 spin_lock_irqsave(&amp;cache_lock, flags);
1165 obj = __cache_find(id);
1166 if (obj)
1167- __object_get(obj);
1168+ object_get(obj);
1169 spin_unlock_irqrestore(&amp;cache_lock, flags);
1170 return obj;
1171 }
1172</programlisting>
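<para>
The same get/put pattern can be tried out in userspace. The sketch
below is an analogue written with C11 <type>atomic_int</type>, not
kernel code: <function>object_new</function> and the boolean return
value of <function>object_put</function> are additions made for this
example, and <function>malloc</function>/<function>free</function>
stand in for <function>kmalloc</function>/<function>kfree</function>.
</para>

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct object {
	atomic_int refcnt;
	int id;
};

/* Allocate an object holding one reference, owned by the caller. */
struct object *object_new(int id)
{
	struct object *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcnt, 1);
	obj->id = id;
	return obj;
}

/* Take an extra reference. The caller must already hold one. */
void object_get(struct object *obj)
{
	atomic_fetch_add_explicit(&obj->refcnt, 1, memory_order_relaxed);
}

/*
 * Drop a reference. Returns true if this call dropped the last one and
 * freed the object, in which case obj must not be used again.
 */
bool object_put(struct object *obj)
{
	/* fetch_sub returns the value from before the decrement. */
	if (atomic_fetch_sub_explicit(&obj->refcnt, 1,
				      memory_order_acq_rel) == 1) {
		free(obj);
		return true;
	}
	return false;
}
```

<para>
As in the kernel version, exactly one caller sees the count reach zero,
so exactly one caller frees the object, and no lock is needed around the
counter itself.
</para>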
1173</sect2>
1174</sect1>
1175
1176 <sect1 id="examples-lock-per-obj">
1177 <title>Protecting The Objects Themselves</title>
1178 <para>
1179In these examples, we assumed that the objects (except the reference
1180counts) never changed once they are created. If we wanted to allow
1181the name to change, there are three possibilities:
1182 </para>
1183 <itemizedlist>
1184 <listitem>
1185 <para>
1186You can make <symbol>cache_lock</symbol> non-static, and tell people
1187to grab that lock before changing the name in any object.
1188 </para>
1189 </listitem>
1190 <listitem>
1191 <para>
1192You can provide a <function>cache_obj_rename</function> which grabs
1193this lock and changes the name for the caller, and tell everyone to
1194use that function.
1195 </para>
1196 </listitem>
1197 <listitem>
1198 <para>
1199You can make the <symbol>cache_lock</symbol> protect only the cache
1200itself, and use another lock to protect the name.
1201 </para>
1202 </listitem>
1203 </itemizedlist>
1204
1205 <para>
1206Theoretically, you can make the locks as fine-grained as one lock for
1207every field, for every object. In practice, the most common variants
1208are:
1209</para>
1210 <itemizedlist>
1211 <listitem>
1212 <para>
1213One lock which protects the infrastructure (the <symbol>cache</symbol>
1214list in this example) and all the objects. This is what we have done
1215so far.
1216 </para>
1217 </listitem>
1218 <listitem>
1219 <para>
1220One lock which protects the infrastructure (including the list
1221pointers inside the objects), and one lock inside the object which
1222protects the rest of that object.
1223 </para>
1224 </listitem>
1225 <listitem>
1226 <para>
1227Multiple locks to protect the infrastructure (eg. one lock per hash
1228chain), possibly with a separate per-object lock.
1229 </para>
1230 </listitem>
1231 </itemizedlist>
1232
1233<para>
1234Here is the "lock-per-object" implementation:
1235</para>
1236<programlisting>
1237--- cache.c.refcnt-atomic 2003-12-11 15:50:54.000000000 +1100
1238+++ cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
1239@@ -6,11 +6,17 @@
1240
1241 struct object
1242 {
1243+ /* These two protected by cache_lock. */
1244 struct list_head list;
1245+ int popularity;
1246+
1247 atomic_t refcnt;
1248+
1249+ /* Doesn't change once created. */
1250 int id;
1251+
1252+ spinlock_t lock; /* Protects the name */
1253 char name[32];
1254- int popularity;
1255 };
1256
1257 static DEFINE_SPINLOCK(cache_lock);
1258@@ -77,6 +84,7 @@
1259 obj-&gt;id = id;
1260 obj-&gt;popularity = 0;
1261 atomic_set(&amp;obj-&gt;refcnt, 1); /* The cache holds a reference */
1262+ spin_lock_init(&amp;obj-&gt;lock);
1263
1264 spin_lock_irqsave(&amp;cache_lock, flags);
1265 __cache_add(obj);
1266</programlisting>
1267
1268<para>
1269Note that I decided that the <structfield>popularity</structfield>
1270count should be protected by the <symbol>cache_lock</symbol> rather
1271than the per-object lock: this is because it (like the
1272<structname>struct list_head</structname> inside the object) is
1273logically part of the infrastructure. This way, I don't need to grab
1274the lock of every object in <function>__cache_add</function> when
1275seeking the least popular.
1276</para>
1277
1278<para>
1279I also decided that the <structfield>id</structfield> member is
1280unchangeable, so I don't need to grab each object lock in
1281<function>__cache_find()</function> to examine the
1282<structfield>id</structfield>: the object lock is only used by a
1283caller who wants to read or write the <structfield>name</structfield>
1284field.
1285</para>
1286
1287<para>
1288Note also that I added a comment describing what data was protected by
1289which locks. This is extremely important, as it describes the runtime
1290behavior of the code, and can be hard to gain from just reading. And
1291as Alan Cox says, <quote>Lock data, not code</quote>.
1292</para>
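<para>
To make the division of labour concrete, here is a userspace sketch of
accessors for a name guarded by a per-object lock. It is an analogue
only: a <type>pthread_mutex_t</type> stands in for the
<type>spinlock_t</type>, and the functions
<function>object_init</function>,
<function>object_set_name</function> and
<function>object_get_name</function> are hypothetical names that do not
appear in the cache code above.
</para>

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct object {
	int id;			/* Doesn't change once created: no lock. */
	pthread_mutex_t lock;	/* Protects the name */
	char name[32];
};

void object_init(struct object *obj, int id, const char *name)
{
	obj->id = id;
	pthread_mutex_init(&obj->lock, NULL);
	/* No one else can see obj yet, so no locking is needed here. */
	snprintf(obj->name, sizeof(obj->name), "%s", name);
}

/* Change the name. Over-long names are truncated to fit. */
void object_set_name(struct object *obj, const char *name)
{
	pthread_mutex_lock(&obj->lock);
	snprintf(obj->name, sizeof(obj->name), "%s", name);
	pthread_mutex_unlock(&obj->lock);
}

/* Copy the name out under the lock, so a reader never sees half a rename. */
void object_get_name(struct object *obj, char *buf, size_t len)
{
	pthread_mutex_lock(&obj->lock);
	snprintf(buf, len, "%s", obj->name);
	pthread_mutex_unlock(&obj->lock);
}
```

<para>
Note that neither accessor touches the cache-wide lock: renaming one
object does not hold up lookups of any other.
</para>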
1293</sect1>
1294</chapter>
1295
1296 <chapter id="common-problems">
1297 <title>Common Problems</title>
1298 <sect1 id="deadlock">
1299 <title>Deadlock: Simple and Advanced</title>
1300
1301 <para>
1302 There is a coding bug where a piece of code tries to grab a
1303 spinlock twice: it will spin forever, waiting for the lock to
1304 be released (spinlocks, rwlocks and mutexes are not
1305 recursive in Linux). This is trivial to diagnose: not a
1306 stay-up-five-nights-talk-to-fluffy-code-bunnies kind of
1307 problem.
1308 </para>
1309
1310 <para>
1311 For a slightly more complex case, imagine you have a region
1312 shared by a softirq and user context. If you use a
1313 <function>spin_lock()</function> call to protect it, it is
1314 possible that the user context will be interrupted by the softirq
1315 while it holds the lock, and the softirq will then spin
1316 forever trying to get the same lock.
1317 </para>
1318
1319 <para>
1320 Both of these are called deadlock, and as shown above, they can
1321 occur even with a single CPU (although not on UP compiles,
1322 since spinlocks vanish on kernel compiles with
1323 <symbol>CONFIG_SMP</symbol>=n. You'll still get data corruption
1324 in the second example).
1325 </para>
1326
1327 <para>
1328 This complete lockup is easy to diagnose: on SMP boxes the
1329 watchdog timer or compiling with <symbol>DEBUG_SPINLOCK</symbol> set
1330 (<filename>include/linux/spinlock.h</filename>) will show this up
1331 immediately when it happens.
1332 </para>
1333
1334 <para>
1335 A more complex problem is the so-called 'deadly embrace',
1336 involving two or more locks. Say you have a hash table: each
1337 entry in the table is a spinlock, and a chain of hashed
1338 objects. Inside a softirq handler, you sometimes want to
1339 move an object from one place in the hash to another: you
1340 grab the spinlock of the old hash chain and the spinlock of
1341 the new hash chain, and delete the object from the old one,
1342 and insert it in the new one.
1343 </para>
1344
1345 <para>
1346 There are two problems here. First, if your code ever
1347 tries to move the object to the same chain, it will deadlock
1348 with itself as it tries to lock it twice. Secondly, if the
1349 same softirq on another CPU is trying to move another object
1350 in the reverse direction, the following could happen:
1351 </para>
1352
1353 <table>
1354 <title>Consequences</title>
1355
1356 <tgroup cols="2" align="left">
1357
1358 <thead>
1359 <row>
1360 <entry>CPU 1</entry>
1361 <entry>CPU 2</entry>
1362 </row>
1363 </thead>
1364
1365 <tbody>
1366 <row>
1367 <entry>Grab lock A -&gt; OK</entry>
1368 <entry>Grab lock B -&gt; OK</entry>
1369 </row>
1370 <row>
1371 <entry>Grab lock B -&gt; spin</entry>
1372 <entry>Grab lock A -&gt; spin</entry>
1373 </row>
1374 </tbody>
1375 </tgroup>
1376 </table>
1377
1378 <para>
1379 The two CPUs will spin forever, waiting for the other to give up
1380 their lock. It will look, smell, and feel like a crash.
1381 </para>
1382 </sect1>
1383
1384 <sect1 id="techs-deadlock-prevent">
1385 <title>Preventing Deadlock</title>
1386
1387 <para>
1388 Textbooks will tell you that if you always lock in the same
1389 order, you will never get this kind of deadlock. Practice
1390 will tell you that this approach doesn't scale: when I
1391 create a new lock, I don't understand enough of the kernel
1392 to figure out where in the 5000-lock hierarchy it will fit.
1393 </para>
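 <para>
 For the narrow case of two locks of the same kind, such as the two
 hash chain locks in the deadly embrace described earlier, a fixed
 order is easy to get: lock the chain at the lower address first, and
 take only one lock when both chains are the same. The following is a
 userspace sketch of that idea using pthread mutexes. The
 <structname>struct chain</structname> type and the
 <function>chain_*</function> functions are hypothetical, and a
 counter stands in for the list of hashed objects.
 </para>

```c
#include <pthread.h>
#include <stdint.h>

struct chain {
	pthread_mutex_t lock;
	int nr_objects;		/* stands in for the list of hashed objects */
};

void chain_init(struct chain *c, int nr_objects)
{
	pthread_mutex_init(&c->lock, NULL);
	c->nr_objects = nr_objects;
}

/*
 * Lock both chains in address order, so two CPUs moving objects in
 * opposite directions always take the locks in the same sequence.
 * Returns how many locks were taken: 1 if a == b, otherwise 2.
 */
int chain_lock_pair(struct chain *a, struct chain *b)
{
	if (a == b) {
		/* Same chain: locking it twice would deadlock with ourselves. */
		pthread_mutex_lock(&a->lock);
		return 1;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct chain *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	pthread_mutex_lock(&b->lock);
	return 2;
}

void chain_unlock_pair(struct chain *a, struct chain *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}

/* Move one object from one chain to another, if there is one to move. */
void chain_move_one(struct chain *from, struct chain *to)
{
	chain_lock_pair(from, to);
	if (from != to && from->nr_objects > 0) {
		from->nr_objects--;
		to->nr_objects++;
	}
	chain_unlock_pair(from, to);
}
```

 <para>
 This only orders locks that live in one table and are taken in one
 place. It does nothing for the wider problem of ordering unrelated
 locks across the kernel.
 </para>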
1394
1395 <para>
1396 The best locks are encapsulated: they never get exposed in
1397 headers, and are never held around calls to non-trivial
1398 functions outside the same file. You can read through this
1399 code and see that it will never deadlock, because it never
1400 tries to grab another lock while it has that one. People
1401 using your code don't even need to know you are using a
1402 lock.
1403 </para>
1404
1405 <para>
1406 A classic problem here is when you provide callbacks or
1407 hooks: if you call these with the lock held, you risk simple
1408 deadlock, or a deadly embrace (who knows what the callback
1409 will do?). Remember, the other programmers are out to get
1410 you, so don't do this.
1411 </para>
1412
1413 <sect2 id="techs-deadlock-overprevent">
1414 <title>Overzealous Prevention Of Deadlocks</title>
1415
1416 <para>
1417 Deadlocks are problematic, but not as bad as data
1418 corruption. Code which grabs a read lock, searches a list,
1419 fails to find what it wants, drops the read lock, grabs a
1420 write lock and inserts the object has a race condition.
1421 </para>
1422
1423 <para>
1424 Between dropping the read lock and grabbing the write lock, another CPU can insert the same object, so the search must be repeated with the write lock held.
1425 </para>
1426 </sect2>
1427 </sect1>
1428
1429 <sect1 id="racing-timers">
1430 <title>Racing Timers: A Kernel Pastime</title>
1431
1432 <para>
1433 Timers can produce their own special problems with races.
1434 Consider a collection of objects (list, hash, etc) where each
1435 object has a timer which is due to destroy it.
1436 </para>
1437
1438 <para>
1439 If you want to destroy the entire collection (say on module
1440 removal), you might do the following:
1441 </para>
1442
1443 <programlisting>
1444 /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
1445 HUNGARIAN NOTATION */
1446 spin_lock_bh(&amp;list_lock);
1447
1448 while (list) {
1449 struct foo *next = list-&gt;next;
1450 del_timer(&amp;list-&gt;timer);
1451 kfree(list);
1452 list = next;
1453 }
1454
1455 spin_unlock_bh(&amp;list_lock);
1456 </programlisting>
1457
1458 <para>
1459 Sooner or later, this will crash on SMP, because a timer can
1460 have just gone off before the <function>spin_lock_bh()</function>,
1461 and it will only get the lock after we
1462 <function>spin_unlock_bh()</function>, and then try to free
1463 the element (which has already been freed!).
1464 </para>
1465
1466 <para>
1467 This can be avoided by checking the result of
1468 <function>del_timer()</function>: if it returns
1469 <returnvalue>1</returnvalue>, the timer has been deleted.
1470 If <returnvalue>0</returnvalue>, it means (in this
1471 case) that it is currently running, so we can do:
1472 </para>
1473
1474 <programlisting>
1475 retry:
1476 spin_lock_bh(&amp;list_lock);
1477
1478 while (list) {
1479 struct foo *next = list-&gt;next;
1480 if (!del_timer(&amp;list-&gt;timer)) {
1481 /* Give timer a chance to delete this */
1482 spin_unlock_bh(&amp;list_lock);
1483 goto retry;
1484 }
1485 kfree(list);
1486 list = next;
1487 }
1488
1489 spin_unlock_bh(&amp;list_lock);
1490 </programlisting>
1491
1492 <para>
1493 Another common problem is deleting timers which restart
1494 themselves (by calling <function>add_timer()</function> at the end
1495 of their timer function). Because this is a fairly common case
1496 which is prone to races, you should use <function>del_timer_sync()</function>
1497 (<filename class="headerfile">include/linux/timer.h</filename>)
1498 to handle this case. It returns the number of times the timer
1499 had to be deleted before we finally stopped it from adding itself back
1500 in.
1501 </para>
1502 </sect1>
1503
1504 </chapter>
1505
1506 <chapter id="Efficiency">
1507 <title>Locking Speed</title>
1508
1509 <para>
1510There are three main things to worry about when considering speed of
1511some code which does locking. First is concurrency: how many things
1512are going to be waiting while someone else is holding a lock. Second
1513is the time taken to actually acquire and release an uncontended lock.
1514Third is using fewer, or smarter locks. I'm assuming that the lock is
1515used fairly often: otherwise, you wouldn't be concerned about
1516efficiency.
1517</para>
1518 <para>
1519Concurrency depends on how long the lock is usually held: you should
1520hold the lock for as long as needed, but no longer. In the cache
1521example, we always create the object without the lock held, and then
1522grab the lock only when we are ready to insert it in the list.
1523</para>
1524 <para>
1525Acquisition times depend on how much damage the lock operations do to
1526the pipeline (pipeline stalls) and how likely it is that this CPU was
1527the last one to grab the lock (ie. is the lock cache-hot for this
1528CPU): on a machine with more CPUs, this likelihood drops fast.
1529Consider a 700MHz Intel Pentium III: an instruction takes about 0.7ns,
1530an atomic increment takes about 58ns, a lock which is cache-hot on
1531this CPU takes 160ns, and a cacheline transfer from another CPU takes
1532an additional 170 to 360ns. (These figures are from Paul McKenney's
1533<ulink url="http://www.linuxjournal.com/article.php?sid=6993"> Linux
1534Journal RCU article</ulink>).
1535</para>
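<para>
To put those figures in proportion, it can help to express each cost as
the number of ordinary instructions that would fit in the same time.
The small sketch below does the arithmetic in integer picoseconds. The
macro and function names are made up for this example, and the inputs
are simply the approximate figures quoted above. It gives roughly 80
instructions for an atomic increment, about 230 for a cache-hot lock,
and somewhere between 470 and 740 when the cacheline must first come
from another CPU.
</para>

```c
/* Figures quoted in the text, in picoseconds so we can stay in integers. */
#define INSN_PS		   700L	/* one instruction, about 0.7ns */
#define ATOMIC_INC_PS	 58000L	/* atomic increment, about 58ns */
#define HOT_LOCK_PS	160000L	/* lock that is cache-hot on this CPU */
#define XFER_MIN_PS	170000L	/* extra cacheline transfer, low end */
#define XFER_MAX_PS	360000L	/* extra cacheline transfer, high end */

/* How many plain instructions fit in the given time, rounded down. */
static inline long insn_equiv(long picoseconds)
{
	return picoseconds / INSN_PS;
}
```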
1536 <para>
1537These two aims conflict: holding a lock for a short time might be done
1538by splitting locks into parts (such as in our final per-object-lock
1539example), but this increases the number of lock acquisitions, and the
1540results are often slower than having a single lock. This is another
1541reason to advocate locking simplicity.
1542</para>
1543 <para>
1544The third concern is addressed below: there are some methods to reduce
1545the amount of locking which needs to be done.
1546</para>
1547
1548 <sect1 id="efficiency-rwlocks">
1549 <title>Read/Write Lock Variants</title>
1550
1551 <para>
1552 Both spinlocks and mutexes have read/write variants:
1553 <type>rwlock_t</type> and <structname>struct rw_semaphore</structname>.
1554 These divide users into two classes: the readers and the writers. If
1555 you are only reading the data, you can get a read lock, but to write to
1556 the data you need the write lock. Many people can hold a read lock,
1557 but a writer must be sole holder.
1558 </para>
1559
1560 <para>
1561 If your code divides neatly along reader/writer lines (as our
1562 cache code does), and the lock is held by readers for
1563 significant lengths of time, using these locks can help. They
1564 are slightly slower than the normal locks though, so in practice
1565 <type>rwlock_t</type> is not usually worthwhile.
1566 </para>
1567 </sect1>
1568
1569 <sect1 id="efficiency-read-copy-update">
1570 <title>Avoiding Locks: Read Copy Update</title>
1571
1572 <para>
1573 There is a special method of read/write locking called Read Copy
1574 Update. Using RCU, the readers can avoid taking a lock
1575 altogether: as we expect our cache to be read more often than
1576 updated (otherwise the cache is a waste of time), it is a
1577 candidate for this optimization.
1578 </para>
1579
1580 <para>
1581 How do we get rid of read locks? Getting rid of read locks
1582 means that writers may be changing the list underneath the
1583 readers. That is actually quite simple: we can read a linked
1584 list while an element is being added if the writer adds the
1585 element very carefully. For example, adding
1586 <symbol>new</symbol> to a single linked list called
1587 <symbol>list</symbol>:
1588 </para>
1589
1590 <programlisting>
1591 new-&gt;next = list-&gt;next;
1592 wmb();
1593 list-&gt;next = new;
1594 </programlisting>
1595
1596 <para>
1597 The <function>wmb()</function> is a write memory barrier. It
1598 ensures that the first operation (setting the new element's
1599 <symbol>next</symbol> pointer) is complete and will be seen by
1600 all CPUs, before the second operation is (putting the new
1601 element into the list). This is important, since modern
1602 compilers and modern CPUs can both reorder instructions unless
1603 told otherwise: we want a reader to either not see the new
1604 element at all, or see the new element with the
1605 <symbol>next</symbol> pointer correctly pointing at the rest of
1606 the list.
1607 </para>
1608 <para>
1609 Fortunately, there is a function to do this for standard
1610 <structname>struct list_head</structname> lists:
1611 <function>list_add_rcu()</function>
1612 (<filename>include/linux/list.h</filename>).
1613 </para>
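<para>
The careful insertion described above can be sketched in userspace
with C11 atomics. This is only an analogy, assuming a hypothetical
<structname>struct node</structname> rather than the kernel's
<structname>struct list_head</structname>: the release store stands in
for the <function>wmb()</function> followed by the pointer assignment,
and the acquire loads stand in for what the RCU list iterators do on
the reader side.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for this sketch; not the kernel's list_head. */
struct node {
    int value;
    struct node *_Atomic next;
};

/* Writer: insert new_node right after pos. Writers must be serialized. */
static void publish_after(struct node *pos, struct node *new_node)
{
    assert(pos != NULL && new_node != NULL);

    /* Step 1: point the new element at the rest of the list. */
    atomic_store_explicit(&new_node->next,
                          atomic_load_explicit(&pos->next, memory_order_relaxed),
                          memory_order_relaxed);

    /*
     * Step 2: make the element reachable. The release ordering ensures
     * step 1 is visible to any reader that sees the new pointer.
     */
    atomic_store_explicit(&pos->next, new_node, memory_order_release);
}

/* Reader: walk the list without taking a lock and sum the values. */
static int sum_list(struct node *head)
{
    int total = 0;
    struct node *n = atomic_load_explicit(&head->next, memory_order_acquire);

    while (n != NULL) {
        total += n->value;
        n = atomic_load_explicit(&n->next, memory_order_acquire);
    }
    return total;
}
```

<para>
A reader running concurrently with <function>publish_after()</function>
sees either the old list or the new element with a valid
<symbol>next</symbol> pointer, never a half-initialized element.
</para>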
1614 <para>
1615 Removing an element from the list is even simpler: we replace
1616 the pointer to the old element with a pointer to its successor,
1617 and readers will either see it, or skip over it.
1618 </para>
1619 <programlisting>
1620 list-&gt;next = old-&gt;next;
1621 </programlisting>
1622 <para>
1623 There is <function>list_del_rcu()</function>
1624 (<filename>include/linux/list.h</filename>) which does this (the
1625 normal version poisons the old object, which we don't want).
1626 </para>
1627 <para>
1628 The reader must also be careful: some CPUs can look through the
1629 <symbol>next</symbol> pointer to start reading the contents of
1630 the next element early, but don't realize that the pre-fetched
1631 contents is wrong when the <symbol>next</symbol> pointer changes
1632 underneath them. Once again, there is a
1633 <function>list_for_each_entry_rcu()</function>
1634 (<filename>include/linux/list.h</filename>) to help you. Of
1635 course, writers can just use
1636 <function>list_for_each_entry()</function>, since there cannot
1637 be two simultaneous writers.
1638 </para>
1639 <para>
1640 Our final dilemma is this: when can we actually destroy the
1641 removed element? Remember, a reader might be stepping through
1642 this element in the list right now: if we free this element and
1643 the <symbol>next</symbol> pointer changes, the reader will jump
1644 off into garbage and crash. We need to wait until we know that
1645 all the readers who were traversing the list when we deleted the
1646 element are finished. We use <function>call_rcu()</function> to
1647 register a callback which will actually destroy the object once
1648 all pre-existing readers are finished. Alternatively,
1649 <function>synchronize_rcu()</function> may be used to block until
1650 all pre-existing readers are finished.
1651 </para>
1652 <para>
1653 But how does Read Copy Update know when the readers are
1654 finished? The method is this: firstly, the readers always
1655 traverse the list inside
1656 <function>rcu_read_lock()</function>/<function>rcu_read_unlock()</function>
1657 pairs: these simply disable preemption so the reader won't go to
1658 sleep while reading the list.
1659 </para>
1660 <para>
1661 RCU then waits until every other CPU has slept at least once:
1662 since readers cannot sleep, we know that any readers which were
1663 traversing the list during the deletion are finished, and the
1664 callback is triggered. The real Read Copy Update code is a
1665 little more optimized than this, but this is the fundamental
1666 idea.
1667 </para>
1668
1669<programlisting>
1670--- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
1671+++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100
1672@@ -1,15 +1,18 @@
1673 #include &lt;linux/list.h&gt;
1674 #include &lt;linux/slab.h&gt;
1675 #include &lt;linux/string.h&gt;
1676+#include &lt;linux/rcupdate.h&gt;
1677 #include &lt;linux/mutex.h&gt;
1678 #include &lt;asm/errno.h&gt;
1679
1680 struct object
1681 {
1682- /* These two protected by cache_lock. */
1683+ /* This is protected by RCU */
1684 struct list_head list;
1685 int popularity;
1686
1687+ struct rcu_head rcu;
1688+
1689 atomic_t refcnt;
1690
1691 /* Doesn't change once created. */
1692@@ -40,7 +43,7 @@
1693 {
1694 struct object *i;
1695
1696- list_for_each_entry(i, &amp;cache, list) {
1697+ list_for_each_entry_rcu(i, &amp;cache, list) {
1698 if (i-&gt;id == id) {
1699 i-&gt;popularity++;
1700 return i;
1701@@ -49,19 +52,25 @@
1702 return NULL;
1703 }
1704
1705+/* Final discard done once we know no readers are looking. */
1706+static void cache_delete_rcu(struct rcu_head *head)
1707+{
1708+ object_put(container_of(head, struct object, rcu));
1709+}
1710+
1711 /* Must be holding cache_lock */
1712 static void __cache_delete(struct object *obj)
1713 {
1714 BUG_ON(!obj);
1715- list_del(&amp;obj-&gt;list);
1716- object_put(obj);
1717+ list_del_rcu(&amp;obj-&gt;list);
1718 cache_num--;
1719+ call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
1720 }
1721
1722 /* Must be holding cache_lock */
1723 static void __cache_add(struct object *obj)
1724 {
1725- list_add(&amp;obj-&gt;list, &amp;cache);
1726+ list_add_rcu(&amp;obj-&gt;list, &amp;cache);
1727 if (++cache_num > MAX_CACHE_SIZE) {
1728 struct object *i, *outcast = NULL;
1729 list_for_each_entry(i, &amp;cache, list) {
1730@@ -104,12 +114,11 @@
1731 struct object *cache_find(int id)
1732 {
1733 struct object *obj;
1734- unsigned long flags;
1735
1736- spin_lock_irqsave(&amp;cache_lock, flags);
1737+ rcu_read_lock();
1738 obj = __cache_find(id);
1739 if (obj)
1740 object_get(obj);
1741- spin_unlock_irqrestore(&amp;cache_lock, flags);
1742+ rcu_read_unlock();
1743 return obj;
1744 }
1745</programlisting>
1746
1747<para>
1748Note that the reader will alter the
1749<structfield>popularity</structfield> member in
1750<function>__cache_find()</function>, and now it doesn't hold a lock.
1751One solution would be to make it an <type>atomic_t</type>, but for
1752this usage, we don't really care about races: an approximate result is
1753good enough, so I didn't change it.
1754</para>
1755
1756<para>
1757The result is that <function>cache_find()</function> requires no
1758synchronization with any other functions, so is almost as fast on SMP
1759as it would be on UP.
1760</para>
1761
1762<para>
1763There is a further optimization possible here: remember our original
1764cache code, where there were no reference counts and the caller simply
1765held the lock whenever using the object? This is still possible: if
1766you hold the lock, no one can delete the object, so you don't need to
1767get and put the reference count.
1768</para>
1769
1770<para>
1771Now, because the 'read lock' in RCU is simply disabling preemption, a
1772caller which always has preemption disabled between calling
1773<function>cache_find()</function> and
1774<function>object_put()</function> does not need to actually get and
1775put the reference count: we could expose
1776<function>__cache_find()</function> by making it non-static, and
1777such callers could simply call that.
1778</para>
1779<para>
1780The benefit here is that the reference count is not written to: the
1781object is not altered in any way, which is much faster on SMP
1782machines due to caching.
1783</para>
1784 </sect1>
1785
1786 <sect1 id="per-cpu">
1787 <title>Per-CPU Data</title>
1788
1789 <para>
1790 Another technique for avoiding locking which is used fairly
1791 widely is to duplicate information for each CPU. For example,
1792 if you wanted to keep a count of a common condition, you could
1793 use a spin lock and a single counter. Nice and simple.
1794 </para>
1795
1796 <para>
1797 If that was too slow (it's usually not, but if you've got a
1798 really big machine to test on and can show that it is), you
1799 could instead use a counter for each CPU, then none of them need
1800 an exclusive lock. See <function>DEFINE_PER_CPU()</function>,
1801 <function>get_cpu_var()</function> and
1802 <function>put_cpu_var()</function>
1803 (<filename class="headerfile">include/linux/percpu.h</filename>).
1804 </para>
1805
1806 <para>
1807 Of particular use for simple per-cpu counters is the
1808 <type>local_t</type> type, and the
1809 <function>cpu_local_inc()</function> and related functions,
1810 which are more efficient than simple code on some architectures
1811 (<filename class="headerfile">include/asm/local.h</filename>).
1812 </para>
1813
1814 <para>
1815 Note that there is no simple, reliable way of getting an exact
1816 value of such a counter, without introducing more locks. This
1817 is not a problem for some uses.
1818 </para>
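<para>
The idea can be illustrated with a userspace analogy in C11. This is
a sketch only: it does not use the kernel's per-CPU API, the names
<symbol>NR_SLOTS</symbol>, <function>counter_inc()</function> and
<function>counter_read_approx()</function> are invented for the
example, and a 64-byte cache line is assumed. Each slot has exactly
one writer, so no lock and no atomic read-modify-write is needed.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 4  /* hypothetical stand-in for the number of CPUs */

/* One counter per slot, aligned so each sits in its own cache line. */
struct slot_counter {
    _Alignas(64) atomic_ulong count;
};

static struct slot_counter counters[NR_SLOTS];

/* Called only by the single owner of 'slot', so a plain load/store pair
 * cannot lose an update. */
static void counter_inc(size_t slot)
{
    unsigned long v;

    assert(slot < NR_SLOTS);
    v = atomic_load_explicit(&counters[slot].count, memory_order_relaxed);
    atomic_store_explicit(&counters[slot].count, v + 1, memory_order_relaxed);
}

/* Sum every slot. The result is only approximate while owners are
 * still incrementing their slots. */
static unsigned long counter_read_approx(void)
{
    unsigned long total = 0;

    for (size_t i = 0; i < NR_SLOTS; i++)
        total += atomic_load_explicit(&counters[i].count, memory_order_relaxed);
    return total;
}
```

<para>
The read side shows why an exact value is hard to get: by the time the
last slot has been added, the first may already have changed.
</para>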
1819 </sect1>
1820
1821 <sect1 id="mostly-hardirq">
1822 <title>Data Which Is Mostly Used By An IRQ Handler</title>
1823
1824 <para>
1825 If data is always accessed from within the same IRQ handler, you
1826 don't need a lock at all: the kernel already guarantees that the
1827 irq handler will not run simultaneously on multiple CPUs.
1828 </para>
1829 <para>
1830 Manfred Spraul points out that you can still do this, even if
1831 the data is very occasionally accessed in user context or
1832 softirqs/tasklets. The irq handler doesn't use a lock, and
1833 all other accesses are done as so:
1834 </para>
1835
1836<programlisting>
1837 spin_lock(&amp;lock);
1838 disable_irq(irq);
1839 ...
1840 enable_irq(irq);
1841 spin_unlock(&amp;lock);
1842</programlisting>
1843 <para>
1844 The <function>disable_irq()</function> prevents the irq handler
1845 from running (and waits for it to finish if it's currently
1846 running on other CPUs). The spinlock prevents any other
1847 accesses happening at the same time. Naturally, this is slower
1848 than just a <function>spin_lock_irq()</function> call, so it
1849 only makes sense if this type of access happens extremely
1850 rarely.
1851 </para>
1852 </sect1>
1853 </chapter>
1854
1855 <chapter id="sleeping-things">
1856 <title>What Functions Are Safe To Call From Interrupts?</title>
1857
1858 <para>
1859 Many functions in the kernel sleep (ie. call schedule())
1860 directly or indirectly: you can never call them while holding a
1861 spinlock, or with preemption disabled. This also means you need
1862 to be in user context: calling them from an interrupt is illegal.
1863 </para>
1864
1865 <sect1 id="sleeping">
1866 <title>Some Functions Which Sleep</title>
1867
1868 <para>
1869 The most common ones are listed below, but you usually have to
1870 read the code to find out if other calls are safe. If everyone
1871 else who calls it can sleep, you probably need to be able to
1872 sleep, too. In particular, registration and deregistration
1873 functions usually expect to be called from user context, and can
1874 sleep.
1875 </para>
1876
1877 <itemizedlist>
1878 <listitem>
1879 <para>
1880 Accesses to
1881 <firstterm linkend="gloss-userspace">userspace</firstterm>:
1882 </para>
1883 <itemizedlist>
1884 <listitem>
1885 <para>
1886 <function>copy_from_user()</function>
1887 </para>
1888 </listitem>
1889 <listitem>
1890 <para>
1891 <function>copy_to_user()</function>
1892 </para>
1893 </listitem>
1894 <listitem>
1895 <para>
1896 <function>get_user()</function>
1897 </para>
1898 </listitem>
1899 <listitem>
1900 <para>
1901 <function>put_user()</function>
1902 </para>
1903 </listitem>
1904 </itemizedlist>
1905 </listitem>
1906
1907 <listitem>
1908 <para>
1909 <function>kmalloc(GFP_KERNEL)</function>
1910 </para>
1911 </listitem>
1912
1913 <listitem>
1914 <para>
1915 <function>mutex_lock_interruptible()</function> and
1916 <function>mutex_lock()</function>
1917 </para>
1918 <para>
1919 There is a <function>mutex_trylock()</function> which does not
1920 sleep. Still, it must not be used inside interrupt context since
1921 its implementation is not safe for that.
1922 <function>mutex_unlock()</function> will also never sleep.
1923 It cannot be used in interrupt context either since a mutex
1924 must be released by the same task that acquired it.
1925 </para>
1926 </listitem>
1927 </itemizedlist>
1928 </sect1>
1929
1930 <sect1 id="dont-sleep">
1931 <title>Some Functions Which Don't Sleep</title>
1932
1933 <para>
1934 Some functions are safe to call from any context, or holding
1935 almost any lock.
1936 </para>
1937
1938 <itemizedlist>
1939 <listitem>
1940 <para>
1941 <function>printk()</function>
1942 </para>
1943 </listitem>
1944 <listitem>
1945 <para>
1946 <function>kfree()</function>
1947 </para>
1948 </listitem>
1949 <listitem>
1950 <para>
1951 <function>add_timer()</function> and <function>del_timer()</function>
1952 </para>
1953 </listitem>
1954 </itemizedlist>
1955 </sect1>
1956 </chapter>
1957
1958 <chapter id="apiref-mutex">
1959 <title>Mutex API reference</title>
1960!Iinclude/linux/mutex.h
1961!Ekernel/locking/mutex.c
1962 </chapter>
1963
1964 <chapter id="apiref-futex">
1965 <title>Futex API reference</title>
1966!Ikernel/futex.c
1967 </chapter>
1968
1969 <chapter id="references">
1970 <title>Further reading</title>
1971
1972 <itemizedlist>
1973 <listitem>
1974 <para>
1975 <filename>Documentation/locking/spinlocks.txt</filename>:
1976 Linus Torvalds' spinlocking tutorial in the kernel sources.
1977 </para>
1978 </listitem>
1979
1980 <listitem>
1981 <para>
1982 Unix Systems for Modern Architectures: Symmetric
1983 Multiprocessing and Caching for Kernel Programmers:
1984 </para>
1985
1986 <para>
1987 Curt Schimmel's very good introduction to kernel level
1988 locking (not written for Linux, but nearly everything
1989 applies). The book is expensive, but really worth every
1990 penny to understand SMP locking. [ISBN: 0201633388]
1991 </para>
1992 </listitem>
1993 </itemizedlist>
1994 </chapter>
1995
1996 <chapter id="thanks">
1997 <title>Thanks</title>
1998
1999 <para>
2000 Thanks to Telsa Gwynne for DocBooking, neatening and adding
2001 style.
2002 </para>
2003
2004 <para>
2005 Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul
2006 Mackerras, Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim
2007 Waugh, Pete Zaitcev, James Morris, Robert Love, Paul McKenney,
2008 John Ashby for proofreading, correcting, flaming, commenting.
2009 </para>
2010
2011 <para>
2012 Thanks to the cabal for having no influence on this document.
2013 </para>
2014 </chapter>
2015
2016 <glossary id="glossary">
2017 <title>Glossary</title>
2018
2019 <glossentry id="gloss-preemption">
2020 <glossterm>preemption</glossterm>
2021 <glossdef>
2022 <para>
2023 Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is
2024 unset, processes in user context inside the kernel would not
2025 preempt each other (ie. you had that CPU until you gave it up,
2026 except for interrupts). With the addition of
2027 <symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when
2028 in user context, higher priority tasks can "cut in": spinlocks
2029 were changed to disable preemption, even on UP.
2030 </para>
2031 </glossdef>
2032 </glossentry>
2033
2034 <glossentry id="gloss-bh">
2035 <glossterm>bh</glossterm>
2036 <glossdef>
2037 <para>
2038 Bottom Half: for historical reasons, functions with
2039 '_bh' in them often now refer to any software interrupt, e.g.
2040 <function>spin_lock_bh()</function> blocks any software interrupt
2041 on the current CPU. Bottom halves are deprecated, and will
2042 eventually be replaced by tasklets. Only one bottom half will be
2043 running at any time.
2044 </para>
2045 </glossdef>
2046 </glossentry>
2047
2048 <glossentry id="gloss-hwinterrupt">
2049 <glossterm>Hardware Interrupt / Hardware IRQ</glossterm>
2050 <glossdef>
2051 <para>
2052 Hardware interrupt request. <function>in_irq()</function> returns
2053 <returnvalue>true</returnvalue> in a hardware interrupt handler.
2054 </para>
2055 </glossdef>
2056 </glossentry>
2057
2058 <glossentry id="gloss-interruptcontext">
2059 <glossterm>Interrupt Context</glossterm>
2060 <glossdef>
2061 <para>
2062 Not user context: processing a hardware irq or software irq.
2063 Indicated by the <function>in_interrupt()</function> macro
2064 returning <returnvalue>true</returnvalue>.
2065 </para>
2066 </glossdef>
2067 </glossentry>
2068
2069 <glossentry id="gloss-smp">
2070 <glossterm><acronym>SMP</acronym></glossterm>
2071 <glossdef>
2072 <para>
2073 Symmetric Multi-Processor: kernels compiled for multiple-CPU
2074 machines. (CONFIG_SMP=y).
2075 </para>
2076 </glossdef>
2077 </glossentry>
2078
2079 <glossentry id="gloss-softirq">
2080 <glossterm>Software Interrupt / softirq</glossterm>
2081 <glossdef>
2082 <para>
2083 Software interrupt handler. <function>in_irq()</function> returns
2084 <returnvalue>false</returnvalue>; <function>in_softirq()</function>
2085 returns <returnvalue>true</returnvalue>. Tasklets and softirqs
2086 both fall into the category of 'software interrupts'.
2087 </para>
2088 <para>
2089 Strictly speaking a softirq is one of up to 32 enumerated software
2090 interrupts which can run on multiple CPUs at once.
2091 Sometimes used to refer to tasklets as
2092 well (ie. all software interrupts).
2093 </para>
2094 </glossdef>
2095 </glossentry>
2096
2097 <glossentry id="gloss-tasklet">
2098 <glossterm>tasklet</glossterm>
2099 <glossdef>
2100 <para>
2101 A dynamically-registrable software interrupt,
2102 which is guaranteed to only run on one CPU at a time.
2103 </para>
2104 </glossdef>
2105 </glossentry>
2106
2107 <glossentry id="gloss-timers">
2108 <glossterm>timer</glossterm>
2109 <glossdef>
2110 <para>
2111 A dynamically-registrable software interrupt, which is run at
2112 (or close to) a given time. When running, it is just like a
2113 tasklet (in fact, they are called from the TIMER_SOFTIRQ).
2114 </para>
2115 </glossdef>
2116 </glossentry>
2117
2118 <glossentry id="gloss-up">
2119 <glossterm><acronym>UP</acronym></glossterm>
2120 <glossdef>
2121 <para>
2122 Uni-Processor: Non-SMP. (CONFIG_SMP=n).
2123 </para>
2124 </glossdef>
2125 </glossentry>
2126
2127 <glossentry id="gloss-usercontext">
2128 <glossterm>User Context</glossterm>
2129 <glossdef>
2130 <para>
2131 The kernel executing on behalf of a particular process (ie. a
2132 system call or trap) or kernel thread. (You can tell which
2133 process with the <symbol>current</symbol> macro.) Not to
2134 be confused with userspace. Can be interrupted by software or
2135 hardware interrupts.
2136 </para>
2137 </glossdef>
2138 </glossentry>
2139
2140 <glossentry id="gloss-userspace">
2141 <glossterm>Userspace</glossterm>
2142 <glossdef>
2143 <para>
2144 A process executing its own code outside the kernel.
2145 </para>
2146 </glossdef>
2147 </glossentry>
2148
2149 </glossary>
2150</book>
2151
diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl
deleted file mode 100644
index 856ac20bf367..000000000000
--- a/Documentation/DocBook/kgdb.tmpl
+++ /dev/null
@@ -1,918 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="kgdbOnLinux">
6 <bookinfo>
7 <title>Using kgdb, kdb and the kernel debugger internals</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Jason</firstname>
12 <surname>Wessel</surname>
13 <affiliation>
14 <address>
15 <email>jason.wessel@windriver.com</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20 <copyright>
21 <year>2008,2010</year>
22 <holder>Wind River Systems, Inc.</holder>
23 </copyright>
24 <copyright>
25 <year>2004-2005</year>
26 <holder>MontaVista Software, Inc.</holder>
27 </copyright>
28 <copyright>
29 <year>2004</year>
30 <holder>Amit S. Kale</holder>
31 </copyright>
32
33 <legalnotice>
34 <para>
35 This file is licensed under the terms of the GNU General Public License
36 version 2. This program is licensed "as is" without any warranty of any
37 kind, whether express or implied.
38 </para>
39
40 </legalnotice>
41 </bookinfo>
42
43<toc></toc>
44 <chapter id="Introduction">
45 <title>Introduction</title>
46 <para>
47 The kernel has two different debugger front ends (kdb and kgdb)
48 which interface to the debug core. It is possible to use either
49 of the debugger front ends and dynamically transition between them
50 if you configure the kernel properly at compile and runtime.
51 </para>
52 <para>
53 Kdb is a simplistic shell-style interface which you can use on a
54 system console with a keyboard or serial console. You can use it
55 to inspect memory, registers, process lists, dmesg, and even set
56 breakpoints to stop in a certain location. Kdb is not a source
57 level debugger, although you can set breakpoints and execute some
58 basic kernel run control. Kdb is mainly aimed at doing some
59 analysis to aid in development or diagnosing kernel problems. You
60 can access some symbols by name in kernel built-ins or in kernel
61 modules if the code was built
62 with <symbol>CONFIG_KALLSYMS</symbol>.
63 </para>
64 <para>
65 Kgdb is intended to be used as a source level debugger for the
66 Linux kernel. It is used along with gdb to debug a Linux kernel.
67 The expectation is that gdb can be used to "break in" to the
68 kernel to inspect memory, variables and look through call stack
69 information similar to the way an application developer would use
70 gdb to debug an application. It is possible to place breakpoints
71 in kernel code and perform some limited execution stepping.
72 </para>
73 <para>
74 Two machines are required for using kgdb. One of these machines is
75 a development machine and the other is the target machine. The
76 kernel to be debugged runs on the target machine. The development
77 machine runs an instance of gdb against the vmlinux file which
78 contains the symbols (not a boot image such as bzImage, zImage,
79 uImage...). In gdb the developer specifies the connection
80 parameters and connects to kgdb. The type of connection a
81 developer makes with gdb depends on the availability of kgdb I/O
82 modules compiled as built-ins or loadable kernel modules in the test
83 machine's kernel.
84 </para>
85 </chapter>
86 <chapter id="CompilingAKernel">
87 <title>Compiling a kernel</title>
88 <para>
89 <itemizedlist>
90 <listitem><para>In order to enable compilation of kdb, you must first enable kgdb.</para></listitem>
91 <listitem><para>The kgdb test compile options are described in the kgdb test suite chapter.</para></listitem>
92 </itemizedlist>
93 </para>
94 <sect1 id="CompileKGDB">
95 <title>Kernel config options for kgdb</title>
96 <para>
97 To enable <symbol>CONFIG_KGDB</symbol> you should look under
98 "Kernel hacking" / "Kernel debugging" and select "KGDB: kernel debugger".
99 </para>
100 <para>
101 While it is not a hard requirement that you have symbols in your
102 vmlinux file, gdb tends not to be very useful without the symbolic
103 data, so you will want to turn
104 on <symbol>CONFIG_DEBUG_INFO</symbol> which is called "Compile the
105 kernel with debug info" in the config menu.
106 </para>
107 <para>
108 It is advised, but not required, that you turn on the
109 <symbol>CONFIG_FRAME_POINTER</symbol> kernel option which is called "Compile the
110 kernel with frame pointers" in the config menu. This option
111 inserts code into the compiled executable which saves the frame
112 information in registers or on the stack at different points which
113 allows a debugger such as gdb to more accurately construct
114 stack back traces while debugging the kernel.
115 </para>
116 <para>
117 If the architecture that you are using supports the kernel option
118 CONFIG_STRICT_KERNEL_RWX, you should consider turning it off. This
119 option will prevent the use of software breakpoints because it
120 marks certain regions of the kernel's memory space as read-only.
121 If kgdb supports it for the architecture you are using, you can
122 use hardware breakpoints if you desire to run with the
123 CONFIG_STRICT_KERNEL_RWX option turned on, else you need to turn off
124 this option.
125 </para>
126 <para>
127 Next you should choose one or more I/O drivers to interconnect
128 debugging host and debugged target. Early boot debugging requires
129 a KGDB I/O driver that supports early debugging and the driver
130 must be built into the kernel directly. Kgdb I/O driver
131 configuration takes place via kernel or module parameters which
132 you can learn more about in the section that describes the
133 parameter "kgdboc".
134 </para>
135 <para>Here is an example set of .config symbols to enable or
136 disable for kgdb:
137 <itemizedlist>
138 <listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem>
139 <listitem><para>CONFIG_FRAME_POINTER=y</para></listitem>
140 <listitem><para>CONFIG_KGDB=y</para></listitem>
141 <listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem>
142 </itemizedlist>
143 </para>
144 </sect1>
145 <sect1 id="CompileKDB">
146 <title>Kernel config options for kdb</title>
147 <para>Kdb is quite a bit more complex than the simple gdbstub
148 sitting on top of the kernel's debug core. Kdb must implement a
149 shell, and also adds some helper functions in other parts of the
150 kernel, responsible for printing out interesting data such as what
151 you would see if you ran "lsmod", or "ps". In order to build kdb
152 into the kernel you follow the same steps as you would for kgdb.
153 </para>
154 <para>The main config option for kdb
155 is <symbol>CONFIG_KGDB_KDB</symbol> which is called "KGDB_KDB:
156 include kdb frontend for kgdb" in the config menu. In theory you
157 would have already also selected an I/O driver such as the
158 CONFIG_KGDB_SERIAL_CONSOLE interface if you plan on using kdb on a
159 serial port, when you were configuring kgdb.
160 </para>
161 <para>If you want to use a PS/2-style keyboard with kdb, you would
162 select CONFIG_KDB_KEYBOARD which is called "KGDB_KDB: keyboard as
163 input device" in the config menu. The CONFIG_KDB_KEYBOARD option
164 is not used for anything in the gdb interface to kgdb. The
165 CONFIG_KDB_KEYBOARD option only works with kdb.
166 </para>
167 <para>Here is an example set of .config symbols to enable/disable kdb:
168 <itemizedlist>
169 <listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem>
170 <listitem><para>CONFIG_FRAME_POINTER=y</para></listitem>
171 <listitem><para>CONFIG_KGDB=y</para></listitem>
172 <listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem>
173 <listitem><para>CONFIG_KGDB_KDB=y</para></listitem>
174 <listitem><para>CONFIG_KDB_KEYBOARD=y</para></listitem>
175 </itemizedlist>
176 </para>
177 </sect1>
178 </chapter>
179 <chapter id="kgdbKernelArgs">
180 <title>Kernel Debugger Boot Arguments</title>
181 <para>This section describes the various runtime kernel
182 parameters that affect the configuration of the kernel debugger.
183 The following chapter covers using kdb and kgdb as well as
184 providing some examples of the configuration parameters.</para>
185 <sect1 id="kgdboc">
186 <title>Kernel parameter: kgdboc</title>
187 <para>The kgdboc driver was originally an abbreviation meant to
188 stand for "kgdb over console". Today it is the primary mechanism
189 to configure how to communicate from gdb to kgdb as well as the
190 devices you want to use to interact with the kdb shell.
191 </para>
192 <para>For kgdb/gdb, kgdboc is designed to work with a single serial
193 port. It is intended to cover the circumstance where you want to
194 use a serial console as your primary console as well as using it to
195 perform kernel debugging. It is also possible to use kgdb on a
196 serial port which is not designated as a system console. Kgdboc
197 may be configured as a kernel built-in or a kernel loadable module.
198 You can only make use of <constant>kgdbwait</constant> and early
199 debugging if you build kgdboc into the kernel as a built-in.
200 </para>
201 <para>Optionally you can elect to activate kms (Kernel Mode
202 Setting) integration. When you use kms with kgdboc and you have a
203 video driver that has atomic mode setting hooks, it is possible to
204 enter the debugger on the graphics console. When the kernel
205 execution is resumed, the previous graphics mode will be restored.
206 This integration can serve as a useful tool to aid in diagnosing
207 crashes or doing analysis of memory with kdb while allowing the
208 full graphics console applications to run.
209 </para>
210 <sect2 id="kgdbocArgs">
211 <title>kgdboc arguments</title>
212 <para>Usage: <constant>kgdboc=[kms][[,]kbd][[,]serial_device][,baud]</constant></para>
213 <para>The order listed above must be observed if you use any of the
214 optional configurations together.
215 </para>
216 <para>Abbreviations:
217 <itemizedlist>
218 <listitem><para>kms = Kernel Mode Setting</para></listitem>
219 <listitem><para>kbd = Keyboard</para></listitem>
220 </itemizedlist>
221 </para>
222 <para>You can configure kgdboc to use the keyboard, and/or a serial
223 device depending on if you are using kdb and/or kgdb, in one of the
224 following scenarios. The order listed above must be observed if
225 you use any of the optional configurations together. Using kms +
226 only gdb is generally not a useful combination.</para>
227 <sect3 id="kgdbocArgs1">
228 <title>Using loadable module or built-in</title>
229 <para>
230 <orderedlist>
231 <listitem><para>As a kernel built-in:</para>
232 <para>Use the kernel boot argument: <constant>kgdboc=&lt;tty-device&gt;,[baud]</constant></para></listitem>
233 <listitem>
234 <para>As a kernel loadable module:</para>
235 <para>Use the command: <constant>modprobe kgdboc kgdboc=&lt;tty-device&gt;,[baud]</constant></para>
236 <para>Here are two examples of how you might format the kgdboc
237 string. The first is for an x86 target using the first serial port.
238 The second example is for the ARM Versatile AB using the second
239 serial port.
240 <orderedlist>
241 <listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem>
242 <listitem><para><constant>kgdboc=ttyAMA1,115200</constant></para></listitem>
243 </orderedlist>
244 </para>
245 </listitem>
246 </orderedlist></para>
247 </sect3>
248 <sect3 id="kgdbocArgs2">
249 <title>Configure kgdboc at runtime with sysfs</title>
250 <para>At run time you can enable or disable kgdboc by echoing a
251 parameter into sysfs. Here are two examples:</para>
252 <orderedlist>
253 <listitem><para>Enable kgdboc on ttyS0</para>
254 <para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
255 <listitem><para>Disable kgdboc</para>
256 <para><constant>echo "" &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
257 </orderedlist>
258 <para>NOTE: You do not need to specify the baud if you are
259 configuring the console on a tty which is already configured or
260 open.</para>
261 </sect3>
262 <sect3 id="kgdbocArgs3">
263 <title>More examples</title>
264 <para>You can configure kgdboc to use the keyboard, and/or a serial device
265 depending on if you are using kdb and/or kgdb, in one of the
266 following scenarios.
267 <orderedlist>
268 <listitem><para>kdb and kgdb over only a serial port</para>
269 <para><constant>kgdboc=&lt;serial_device&gt;[,baud]</constant></para>
270 <para>Example: <constant>kgdboc=ttyS0,115200</constant></para>
271 </listitem>
272 <listitem><para>kdb and kgdb with keyboard and a serial port</para>
273 <para><constant>kgdboc=kbd,&lt;serial_device&gt;[,baud]</constant></para>
274 <para>Example: <constant>kgdboc=kbd,ttyS0,115200</constant></para>
275 </listitem>
276 <listitem><para>kdb with a keyboard</para>
277 <para><constant>kgdboc=kbd</constant></para>
278 </listitem>
279 <listitem><para>kdb with kernel mode setting</para>
280 <para><constant>kgdboc=kms,kbd</constant></para>
281 </listitem>
282 <listitem><para>kdb with kernel mode setting and kgdb over a serial port</para>
283 <para><constant>kgdboc=kms,kbd,ttyS0,115200</constant></para>
284 </listitem>
285 </orderedlist>
286 </para>
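<para>The scenarios above all follow the ordering rule from the usage line. As a rough illustration only, the following shell sketch assembles a kgdboc string in that order. The function name build_kgdboc and its flag-style arguments are hypothetical helpers invented for this example; they are not part of the kernel or of any kgdb tool.</para>

```shell
#!/bin/sh
# Hypothetical helper (not a kernel or kgdb utility): build a kgdboc
# argument in the required order [kms][,kbd][,serial_device][,baud].
#   $1 = 1 to include kms, anything else to omit it
#   $2 = 1 to include kbd, anything else to omit it
#   $3 = serial device name (optional), e.g. ttyS0
#   $4 = baud rate (optional, only used when a serial device is given)
build_kgdboc() {
    use_kms=${1:-0}
    use_kbd=${2:-0}
    serial=${3:-}
    baud=${4:-}
    out=""

    if [ "$use_kms" = 1 ]; then
        out="kms"
    fi
    if [ "$use_kbd" = 1 ]; then
        out="${out:+$out,}kbd"
    fi
    if [ -n "$serial" ]; then
        out="${out:+$out,}$serial"
        if [ -n "$baud" ]; then
            out="$out,$baud"
        fi
    fi
    printf 'kgdboc=%s\n' "$out"
}

# kdb with kernel mode setting and kgdb over a serial port
build_kgdboc 1 1 ttyS0 115200    # prints kgdboc=kms,kbd,ttyS0,115200
```

<para>The printed string matches the form used on the kernel command line in the examples above.</para>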
287 <para>NOTE: Kgdboc does not support interrupting the target via the
288 gdb remote protocol. You must manually send a sysrq-g unless you
289 have a proxy that splits console output to a terminal program.
290 A console proxy has a separate TCP port for the debugger and a separate
291 TCP port for the "human" console. The proxy can take care of sending
292 the sysrq-g for you.
293 </para>
294 <para>When using kgdboc with no debugger proxy, you can end up
295 connecting the debugger at one of two entry points. If an
296 exception occurs after you have loaded kgdboc, a message should
297 print on the console stating it is waiting for the debugger. In
298 this case you disconnect your terminal program and then connect the
299 debugger in its place. If you want to interrupt the target system
300 and forcibly enter a debug session you have to issue a Sysrq
301 sequence and then type the letter <constant>g</constant>. Then
302 you disconnect the terminal session and connect gdb. Your options
303 if you don't like this are to hack gdb to send the sysrq-g for you
304 as well as on the initial connect, or to use a debugger proxy that
305 allows an unmodified gdb to do the debugging.
306 </para>
307 </sect3>
308 </sect2>
309 </sect1>
310 <sect1 id="kgdbwait">
311 <title>Kernel parameter: kgdbwait</title>
312 <para>
313 The Kernel command line option <constant>kgdbwait</constant> makes
314 kgdb wait for a debugger connection during booting of a kernel. You
315 can only use this option if you compiled a kgdb I/O driver into the
316 kernel and you specified the I/O driver configuration as a kernel
317 command line option. The kgdbwait parameter should always follow the
318 configuration parameter for the kgdb I/O driver in the kernel
319 command line; otherwise the I/O driver will not be configured prior to
320 asking the kernel to use it to wait.
321 </para>
322 <para>
323 The kernel will stop and wait as early as the I/O driver and
324 architecture allow when you use this option. If you build the
325 kgdb I/O driver as a loadable kernel module, kgdbwait will not do
326 anything.
327 </para>
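<para>The ordering requirement can be checked mechanically before booting. The sketch below is an assumption-laden illustration: it treats kgdboc as the only kgdb I/O driver of interest, and the function name kgdbwait_order_ok is made up for this example.</para>

```shell
#!/bin/sh
# Hypothetical check (not part of the kernel): succeed if every
# "kgdbwait" on a kernel command line appears after a "kgdboc=..."
# argument, so that the I/O driver is configured before the wait.
# Assumption: kgdboc is the kgdb I/O driver in use.
kgdbwait_order_ok() {
    seen_io=0
    # Intentionally unquoted: split the command line on whitespace.
    for arg in $1; do
        case $arg in
            kgdboc=*)
                seen_io=1
                ;;
            kgdbwait)
                # kgdbwait before any I/O driver configuration is an error
                [ "$seen_io" -eq 1 ] || return 1
                ;;
        esac
    done
    return 0
}

if kgdbwait_order_ok "console=ttyS0,115200 kgdboc=ttyS0,115200 kgdbwait"; then
    echo "order ok"
fi
```

<para>A command line such as "kgdbwait kgdboc=ttyS0,115200" would fail this check, because kgdbwait precedes the I/O driver configuration.</para>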
328 </sect1>
329 <sect1 id="kgdbcon">
330 <title>Kernel parameter: kgdbcon</title>
331 <para> The kgdbcon feature allows you to see printk() messages
332 inside gdb while gdb is connected to the kernel. Kdb does not make
333 use of the kgdbcon feature.
334 </para>
335 <para>Kgdb supports using the gdb serial protocol to send console
336 messages to the debugger when the debugger is connected and running.
337 There are two ways to activate this feature.
338 <orderedlist>
339 <listitem><para>Activate with the kernel command line option:</para>
340 <para><constant>kgdbcon</constant></para>
341 </listitem>
342 <listitem><para>Use sysfs before configuring an I/O driver</para>
343 <para>
344 <constant>echo 1 &gt; /sys/module/kgdb/parameters/kgdb_use_con</constant>
345 </para>
346 <para>
347 NOTE: If you do this after you configure the kgdb I/O driver, the
348 setting will not take effect until the next point the I/O is
349 reconfigured.
350 </para>
351 </listitem>
352 </orderedlist>
353 </para>
354 <para>IMPORTANT NOTE: You cannot use kgdboc + kgdbcon on a tty that is an
355 active system console. An example of incorrect usage is <constant>console=ttyS0,115200 kgdboc=ttyS0 kgdbcon</constant>
356 </para>
357 <para>It is possible to use this option with kgdboc on a tty that is not a system console.
358 </para>
359 </sect1>
360 <sect1 id="kgdbreboot">
361 <title>Run time parameter: kgdbreboot</title>
362 <para> The kgdbreboot feature allows you to change how the debugger
363 deals with the reboot notification. You have 3 choices for the
364 behavior. The default behavior is always set to 0.</para>
365 <orderedlist>
366 <listitem><para><constant>echo -1 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
367 <para>Ignore the reboot notification entirely.</para>
368 </listitem>
369 <listitem><para><constant>echo 0 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
370 <para>Send the detach message to any attached debugger client.</para>
371 </listitem>
372 <listitem><para><constant>echo 1 &gt; /sys/module/debug_core/parameters/kgdbreboot</constant></para>
373 <para>Enter the debugger on reboot notify.</para>
374 </listitem>
375 </orderedlist>
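<para>Since only the three values above are meaningful, a small wrapper can reject anything else before it reaches sysfs. This is a sketch, not an existing tool: the function name set_kgdbreboot and its optional second argument (an alternative file to write to, useful for trying the function without root) are assumptions made for the example.</para>

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with the kernel): validate the
# requested kgdbreboot behavior and write it to the parameter file.
#   $1 = -1 (ignore reboot notification), 0 (send detach), 1 (enter debugger)
#   $2 = file to write to; defaults to the sysfs path from the text
set_kgdbreboot() {
    value=$1
    param=${2:-/sys/module/debug_core/parameters/kgdbreboot}

    case $value in
        -1|0|1)
            printf '%s\n' "$value" > "$param"
            ;;
        *)
            echo "kgdbreboot: expected -1, 0 or 1, got '$value'" >&2
            return 1
            ;;
    esac
}

# Example (as root): enter the debugger on reboot notify
#   set_kgdbreboot 1
```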
376 </sect1>
377 </chapter>
378 <chapter id="usingKDB">
379 <title>Using kdb</title>
380 <para>
381 </para>
382 <sect1 id="quickKDBserial">
383 <title>Quick start for kdb on a serial port</title>
384 <para>This is a quick example of how to use kdb.</para>
385 <para><orderedlist>
386 <listitem><para>Configure kgdboc at boot using kernel parameters:
387 <itemizedlist>
388 <listitem><para><constant>console=ttyS0,115200 kgdboc=ttyS0,115200</constant></para></listitem>
389 </itemizedlist></para>
390 <para>OR</para>
391 <para>Configure kgdboc after the kernel has booted; assuming you are using a serial port console:
392 <itemizedlist>
393 <listitem><para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
394 </itemizedlist>
395 </para>
396 </listitem>
397 <listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using the sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para>
398 <itemizedlist>
399 <listitem><para>When logged in as root or with a super user session you can run:</para>
400 <para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
401 <listitem><para>Example using minicom 2.2</para>
402 <para>Press: <constant>Control-a</constant></para>
403 <para>Press: <constant>f</constant></para>
404 <para>Press: <constant>g</constant></para>
405 </listitem>
406 <listitem><para>When you have telneted to a terminal server that supports sending a remote break</para>
407 <para>Press: <constant>Control-]</constant></para>
408 <para>Type in: <constant>send break</constant></para>
409 <para>Press: <constant>Enter</constant></para>
410 <para>Press: <constant>g</constant></para>
411 </listitem>
412 </itemizedlist>
413 </listitem>
414 <listitem><para>From the kdb prompt you can run the "help" command to see a complete list of the commands that are available.</para>
415 <para>Some useful commands in kdb include:
416 <itemizedlist>
417 <listitem><para>lsmod -- Shows where kernel modules are loaded</para></listitem>
418 <listitem><para>ps -- Displays only the active processes</para></listitem>
419 <listitem><para>ps A -- Shows all the processes</para></listitem>
420 <listitem><para>summary -- Shows kernel version info and memory usage</para></listitem>
421 <listitem><para>bt -- Get a backtrace of the current process using dump_stack()</para></listitem>
422 <listitem><para>dmesg -- View the kernel syslog buffer</para></listitem>
423 <listitem><para>go -- Continue the system</para></listitem>
424 </itemizedlist>
425 </para>
426 </listitem>
427 <listitem>
428 <para>When you are done using kdb you need to consider rebooting the
429 system or using the "go" command to resume normal kernel
430 execution. If you have paused the kernel for a lengthy period of
431 time, applications that rely on timely networking or anything to do
432 with real wall clock time could be adversely affected, so you
433 should take this into consideration when using the kernel
434 debugger.</para>
435 </listitem>
436 </orderedlist></para>
437 </sect1>
438 <sect1 id="quickKDBkeyboard">
439 <title>Quick start for kdb using a keyboard connected console</title>
440 <para>This is a quick example of how to use kdb with a keyboard.</para>
441 <para><orderedlist>
442 <listitem><para>Configure kgdboc at boot using kernel parameters:
443 <itemizedlist>
444 <listitem><para><constant>kgdboc=kbd</constant></para></listitem>
445 </itemizedlist></para>
446 <para>OR</para>
447 <para>Configure kgdboc after the kernel has booted:
448 <itemizedlist>
449 <listitem><para><constant>echo kbd &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
450 </itemizedlist>
451 </para>
452 </listitem>
453 <listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using the sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para>
454 <itemizedlist>
455 <listitem><para>When logged in as root or with a super user session you can run:</para>
456 <para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
457 <listitem><para>Example using a laptop keyboard</para>
458 <para>Press and hold down: <constant>Alt</constant></para>
459 <para>Press and hold down: <constant>Fn</constant></para>
460 <para>Press and release the key with the label: <constant>SysRq</constant></para>
461 <para>Release: <constant>Fn</constant></para>
462 <para>Press and release: <constant>g</constant></para>
463 <para>Release: <constant>Alt</constant></para>
464 </listitem>
465 <listitem><para>Example using a PS/2 101-key keyboard</para>
466 <para>Press and hold down: <constant>Alt</constant></para>
467 <para>Press and release the key with the label: <constant>SysRq</constant></para>
468 <para>Press and release: <constant>g</constant></para>
469 <para>Release: <constant>Alt</constant></para>
470 </listitem>
471 </itemizedlist>
472 </listitem>
473 <listitem>
474 <para>Now type in a kdb command such as "help", "dmesg" or "bt", or type "go" to continue kernel execution.</para>
475 </listitem>
476 </orderedlist></para>
477 </sect1>
478 </chapter>
479 <chapter id="EnableKGDB">
480 <title>Using kgdb / gdb</title>
481 <para>In order to use kgdb you must activate it by passing
482 configuration information to one of the kgdb I/O drivers. If you
483 do not pass any configuration information kgdb will not do anything
484 at all. Kgdb will only actively hook up to the kernel trap hooks
485 if a kgdb I/O driver is loaded and configured. If you unconfigure
486 a kgdb I/O driver, kgdb will unregister all the kernel hook points.
487 </para>
488 <para> All kgdb I/O drivers can be reconfigured at run time, if
489 <symbol>CONFIG_SYSFS</symbol> and <symbol>CONFIG_MODULES</symbol>
490 are enabled, by echo'ing a new config string to
491 <constant>/sys/module/&lt;driver&gt;/parameters/&lt;option&gt;</constant>.
492 The driver can be unconfigured by passing an empty string. You cannot
493 change the configuration while the debugger is attached. Make sure
494 to detach the debugger with the <constant>detach</constant> command
495 prior to trying to unconfigure a kgdb I/O driver.
496 </para>
497 <sect1 id="ConnectingGDB">
498 <title>Connecting with gdb to a serial port</title>
499 <orderedlist>
500 <listitem><para>Configure kgdboc</para>
501 <para>Configure kgdboc at boot using kernel parameters:
502 <itemizedlist>
503 <listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem>
504 </itemizedlist></para>
505 <para>OR</para>
506 <para>Configure kgdboc after the kernel has booted:
507 <itemizedlist>
508 <listitem><para><constant>echo ttyS0 &gt; /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem>
509 </itemizedlist></para>
510 </listitem>
511 <listitem>
512 <para>Stop kernel execution (break into the debugger)</para>
513 <para>In order to connect to gdb via kgdboc, the kernel must
514 first be stopped. There are several ways to stop the kernel which
515 include using kgdbwait as a boot argument, via a sysrq-g, or running
516 the kernel until it takes an exception where it waits for the
517 debugger to attach.
518 <itemizedlist>
519 <listitem><para>When logged in as root or with a super user session you can run:</para>
520 <para><constant>echo g &gt; /proc/sysrq-trigger</constant></para></listitem>
521 <listitem><para>Example using minicom 2.2</para>
522 <para>Press: <constant>Control-a</constant></para>
523 <para>Press: <constant>f</constant></para>
524 <para>Press: <constant>g</constant></para>
525 </listitem>
526 <listitem><para>When you have telneted to a terminal server that supports sending a remote break</para>
527 <para>Press: <constant>Control-]</constant></para>
528 <para>Type in: <constant>send break</constant></para>
529 <para>Press: <constant>Enter</constant></para>
530 <para>Press: <constant>g</constant></para>
531 </listitem>
532 </itemizedlist>
533 </para>
534 </listitem>
535 <listitem>
536 <para>Connect from gdb</para>
537 <para>
538 Example (using a directly connected port):
539 </para>
540 <programlisting>
541 % gdb ./vmlinux
542 (gdb) set remotebaud 115200
543 (gdb) target remote /dev/ttyS0
544 </programlisting>
545 <para>
546 Example (kgdb to a terminal server on TCP port 2012):
547 </para>
548 <programlisting>
549 % gdb ./vmlinux
550 (gdb) target remote 192.168.2.2:2012
551 </programlisting>
552 <para>
553 Once connected, you can debug a kernel the way you would debug an
554 application program.
555 </para>
556 <para>
557 If you are having problems connecting or something is going
558 seriously wrong while debugging, it will most often be the case
559 that you want to enable gdb to be verbose about its target
560 communications. You do this prior to issuing the <constant>target
561 remote</constant> command by typing in: <constant>set debug remote 1</constant>
562 </para>
563 </listitem>
564 </orderedlist>
565 <para>Remember if you continue in gdb, and need to "break in" again,
566 you need to issue another sysrq-g. It is easy to create a simple
567 entry point by putting a breakpoint at <constant>sys_sync</constant>
568 and then you can run "sync" from a shell or script to break into the
569 debugger.</para>
570 </sect1>
571 </chapter>
572 <chapter id="switchKdbKgdb">
573 <title>kgdb and kdb interoperability</title>
574 <para>It is possible to transition between kdb and kgdb dynamically.
575 The debug core will remember which you used the last time and
576 automatically start in the same mode.</para>
577 <sect1>
578 <title>Switching between kdb and kgdb</title>
579 <sect2>
580 <title>Switching from kgdb to kdb</title>
581 <para>
582 There are two ways to switch from kgdb to kdb: you can use gdb to
583 issue a maintenance packet, or you can blindly type the command $3#33.
584 Whenever the kernel debugger stops in kgdb mode it will print the
585 message <constant>KGDB or $3#33 for KDB</constant>. It is important
586 to note that you have to type the sequence correctly in one pass.
587 You cannot type a backspace or delete because kgdb will interpret
588 that as part of the debug stream.
589 <orderedlist>
590 <listitem><para>Change from kgdb to kdb by blindly typing:</para>
591 <para><constant>$3#33</constant></para></listitem>
592 <listitem><para>Change from kgdb to kdb with gdb</para>
593 <para><constant>maintenance packet 3</constant></para>
594 <para>NOTE: Now you must kill gdb. Typically you press control-z and
595 issue the command: kill -9 %</para></listitem>
596 </orderedlist>
597 </para>
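<para>The sequence $3#33 is an ordinary packet in the gdb remote serial protocol: a dollar sign, the payload, a hash sign, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The payload here is the single character 3, whose ASCII code is 0x33, hence the checksum 33. The shell sketch below frames an arbitrary payload the same way; the function name gdb_packet is invented for this illustration and does not exist in gdb or the kernel.</para>

```shell
#!/bin/sh
# Hypothetical helper: frame a payload as a gdb remote serial protocol
# packet, "$<payload>#<checksum>", where the checksum is the sum of the
# payload bytes modulo 256, printed as two lowercase hex digits.
# Assumes a plain ASCII payload with no characters needing escaping.
gdb_packet() {
    payload=$1
    rest=$payload
    sum=0

    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}            # first character of $rest
        rest=${rest#?}                    # drop that character
        code=$(printf '%d' "'$ch")        # numeric value of the character
        sum=$(( (sum + code) % 256 ))
    done
    printf '$%s#%02x\n' "$payload" "$sum"
}

gdb_packet 3    # prints $3#33
```

<para>Typing the printed sequence by hand has the same effect as described above, which is why it must be entered correctly in one pass.</para>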
598 </sect2>
599 <sect2>
600 <title>Change from kdb to kgdb</title>
601 <para>There are two ways you can change from kdb to kgdb. You can
602 manually enter kgdb mode by issuing the kgdb command from the kdb
603 shell prompt, or you can connect gdb while the kdb shell prompt is
604 active. The kdb shell looks for the typical first commands that gdb
605 would issue with the gdb remote protocol and if it sees one of those
606 commands it automatically changes into kgdb mode.</para>
607 <orderedlist>
608 <listitem><para>From kdb issue the command:</para>
609 <para><constant>kgdb</constant></para>
610 <para>Now disconnect your terminal program and connect gdb in its place</para></listitem>
611 <listitem><para>At the kdb prompt, disconnect the terminal program and connect gdb in its place.</para></listitem>
612 </orderedlist>
613 </sect2>
614 </sect1>
615 <sect1>
616 <title>Running kdb commands from gdb</title>
617 <para>It is possible to run a limited set of kdb commands from gdb,
618 using the gdb monitor command. You don't want to execute any of the
619 run control or breakpoint operations, because it can disrupt the
620 state of the kernel debugger. You should be using gdb for
621 breakpoints and run control operations if you have gdb connected.
622 The more useful commands to run are things like lsmod, dmesg, ps or
623 possibly some of the memory information commands. To see all the kdb
624 commands you can run <constant>monitor help</constant>.</para>
625 <para>Example:
626 <informalexample><programlisting>
627(gdb) monitor ps
6281 idle process (state I) and
62927 sleeping system daemon (state M) processes suppressed,
630use 'ps A' to see all.
631Task Addr Pid Parent [*] cpu State Thread Command
632
6330xc78291d0 1 0 0 0 S 0xc7829404 init
6340xc7954150 942 1 0 0 S 0xc7954384 dropbear
6350xc78789c0 944 1 0 0 S 0xc7878bf4 sh
636(gdb)
637 </programlisting></informalexample>
638 </para>
639 </sect1>
640 </chapter>
641 <chapter id="KGDBTestSuite">
642 <title>kgdb Test Suite</title>
643 <para>
644 When kgdb is enabled in the kernel config you can also elect to
645 enable the config parameter KGDB_TESTS. Turning this on will
646 enable a special kgdb I/O module which is designed to test the
647 kgdb internal functions.
648 </para>
649 <para>
650 The kgdb tests are mainly intended for developers to test the kgdb
651 internals as well as a tool for developing a new kgdb architecture
652 specific implementation. These tests are not really for end users
653 of the Linux kernel. The primary source of documentation would be
654 to look in the drivers/misc/kgdbts.c file.
655 </para>
656 <para>
657 The kgdb test suite can also be configured at compile time to run
658 the core set of tests by setting the kernel config parameter
659 KGDB_TESTS_ON_BOOT. This particular option is aimed at automated
660 regression testing and does not require modifying the kernel boot
661 config arguments. If this is turned on, the kgdb test suite can
662 be disabled by specifying "kgdbts=" as a kernel boot argument.
663 </para>
664 </chapter>
665 <chapter id="CommonBackEndReq">
666 <title>Kernel Debugger Internals</title>
667 <sect1 id="kgdbArchitecture">
668 <title>Architecture Specifics</title>
669 <para>
670 The kernel debugger is organized into a number of components:
671 <orderedlist>
672 <listitem><para>The debug core</para>
673 <para>
674 The debug core is found in kernel/debug/debug_core.c. It contains:
675 <itemizedlist>
676 <listitem><para>A generic OS exception handler which includes
677 sync'ing the processors into a stopped state on a multi-CPU
678 system.</para></listitem>
679 <listitem><para>The API to talk to the kgdb I/O drivers</para></listitem>
680 <listitem><para>The API to make calls to the arch-specific kgdb implementation</para></listitem>
681 <listitem><para>The logic to perform safe reads and writes to memory while using the debugger</para></listitem>
682 <listitem><para>A full implementation for software breakpoints unless overridden by the arch</para></listitem>
683 <listitem><para>The API to invoke either the kdb or kgdb frontend to the debug core.</para></listitem>
684 <listitem><para>The structures and callback API for atomic kernel mode setting.</para>
685 <para>NOTE: kgdboc is where the kms callbacks are invoked.</para></listitem>
686 </itemizedlist>
687 </para>
688 </listitem>
689 <listitem><para>kgdb arch-specific implementation</para>
690 <para>
691 This implementation is generally found in arch/*/kernel/kgdb.c.
692 As an example, arch/x86/kernel/kgdb.c contains the specifics to
693 implement HW breakpoint as well as the initialization to
694 dynamically register and unregister for the trap handlers on
695 this architecture. The arch-specific portion implements:
696 <itemizedlist>
697 <listitem><para>An arch-specific trap catcher which
698 invokes kgdb_handle_exception() to start kgdb doing its
699 work</para></listitem>
700 <listitem><para>Translation between the gdb-specific packet format and pt_regs</para></listitem>
701 <listitem><para>Registration and unregistration of architecture specific trap hooks</para></listitem>
702 <listitem><para>Any special exception handling and cleanup</para></listitem>
703 <listitem><para>NMI exception handling and cleanup</para></listitem>
704 <listitem><para>(optional) HW breakpoints</para></listitem>
705 </itemizedlist>
706 </para>
707 </listitem>
708 <listitem><para>gdbstub frontend (aka kgdb)</para>
709 <para>The gdbstub is located in kernel/debug/gdbstub.c. It contains:</para>
710 <itemizedlist>
711 <listitem><para>All the logic to implement the gdb serial protocol</para></listitem>
712 </itemizedlist>
713 </listitem>
714 <listitem><para>kdb frontend</para>
715 <para>The kdb debugger shell is broken down into a number of
716 components. The kdb core is located in kernel/debug/kdb. There
717 are a number of helper functions in some of the other kernel
718 components to make it possible for kdb to examine and report
719 information about the kernel without taking locks that could
720 cause a kernel deadlock. The kdb core implements the following functionality:</para>
721 <itemizedlist>
722 <listitem><para>A simple shell</para></listitem>
723 <listitem><para>The kdb core command set</para></listitem>
724 <listitem><para>A registration API to register additional kdb shell commands.</para>
725 <itemizedlist>
726 <listitem><para>A good example of a self-contained kdb module
727 is the "ftdump" command for dumping the ftrace buffer. See:
728 kernel/trace/trace_kdb.c</para></listitem>
729 <listitem><para>For an example of how to dynamically register
730 a new kdb command you can build the kdb_hello.ko kernel module
731 from samples/kdb/kdb_hello.c. To build this example you can
732 set CONFIG_SAMPLES=y and CONFIG_SAMPLE_KDB=m in your kernel
733 config. Later run "modprobe kdb_hello" and the next time you
734 enter the kdb shell, you can run the "hello"
735 command.</para></listitem>
736 </itemizedlist></listitem>
737 <listitem><para>The implementation for kdb_printf() which
738 emits messages directly to I/O drivers, bypassing the kernel
739 log.</para></listitem>
740 <listitem><para>SW / HW breakpoint management for the kdb shell</para></listitem>
741 </itemizedlist>
742 </listitem>
743 <listitem><para>kgdb I/O driver</para>
744 <para>
745 Each kgdb I/O driver has to provide an implementation for the following:
746 <itemizedlist>
747 <listitem><para>configuration via built-in or module</para></listitem>
748 <listitem><para>dynamic configuration and kgdb hook registration calls</para></listitem>
749 <listitem><para>read and write character interface</para></listitem>
750 <listitem><para>A cleanup handler for unconfiguring from the kgdb core</para></listitem>
751 <listitem><para>(optional) Early debug methodology</para></listitem>
752 </itemizedlist>
753 Any given kgdb I/O driver has to operate very closely with the
754 hardware and must do it in such a way that does not enable
755 interrupts or change other parts of the system context without
756 completely restoring them. The kgdb core will repeatedly "poll"
757 a kgdb I/O driver for characters when it needs input. The I/O
758 driver is expected to return immediately if there is no data
759 available. Doing so allows for the future possibility to touch
760 watchdog hardware in such a way as to have a target system not
761 reset when these are enabled.
762 </para>
763 </listitem>
764 </orderedlist>
765 </para>
766 <para>
767 If you are intent on adding kgdb architecture specific support
768 for a new architecture, the architecture should define
769 <constant>HAVE_ARCH_KGDB</constant> in the architecture specific
770 Kconfig file. This will enable kgdb for the architecture, and
771 at that point you must create an architecture specific kgdb
772 implementation.
773 </para>
774 <para>
775 There are a few flags which must be set on every architecture in
776 their &lt;asm/kgdb.h&gt; file. These are:
777 <itemizedlist>
778 <listitem>
779 <para>
780 NUMREGBYTES: The size in bytes of all of the registers, so
781 that we can ensure they will all fit into a packet.
782 </para>
783 </listitem>
784 <listitem>
785 <para>
786 BUFMAX: The size in bytes of the buffer GDB will read into.
787 This must be larger than NUMREGBYTES.
788 </para>
789 </listitem>
790 <listitem>
791 <para>
792 CACHE_FLUSH_IS_SAFE: Set to 1 if it is always safe to call
793 flush_cache_range or flush_icache_range. On some architectures,
794 these functions may not be safe to call on SMP since we keep other
795 CPUs in a holding pattern.
796 </para>
797 </listitem>
798 </itemizedlist>
799 </para>
800 <para>
801 There are also the following functions for the common backend,
802 found in kernel/debug/debug_core.c, that must be supplied by the
803 architecture-specific backend unless marked as (optional), in
804 which case a default function may be used if the architecture
805 does not need to provide a specific implementation.
806 </para>
807!Iinclude/linux/kgdb.h
808 </sect1>
809 <sect1 id="kgdbocDesign">
810 <title>kgdboc internals</title>
811 <sect2>
812 <title>kgdboc and uarts</title>
813 <para>
814 The kgdboc driver is actually a very thin driver that relies on the
815 underlying low-level hardware driver having "polling hooks"
816 to which the tty driver is attached. In the initial
817 implementation of kgdboc the serial_core was changed to expose a
818 low level UART hook for doing polled mode reading and writing of a
819 single character while in an atomic context. When kgdb makes an I/O
820 request to the debugger, kgdboc invokes a callback in the serial
821 core which in turn uses the callback in the UART driver.</para>
822 <para>
823 When using kgdboc with a UART, the UART driver must implement two callbacks in the <constant>struct uart_ops</constant>. Example from drivers/8250.c:<programlisting>
824#ifdef CONFIG_CONSOLE_POLL
825 .poll_get_char = serial8250_get_poll_char,
826 .poll_put_char = serial8250_put_poll_char,
827#endif
828 </programlisting>
829 Any implementation specifics around creating a polling driver use the
830 <constant>#ifdef CONFIG_CONSOLE_POLL</constant>, as shown above.
831 Keep in mind that polling hooks have to be implemented in such a way
832 that they can be called from an atomic context and have to restore
833 the state of the UART chip on return such that the system can return
834 to normal when the debugger detaches. You need to be very careful
835 with any kind of lock you consider, because failing here is most likely
836 going to mean pressing the reset button.
837 </para>
838 </sect2>
839 <sect2 id="kgdbocKbd">
840 <title>kgdboc and keyboards</title>
841 <para>The kgdboc driver contains logic to configure communications
842 with an attached keyboard. The keyboard infrastructure is only
843 compiled into the kernel when CONFIG_KDB_KEYBOARD=y is set in the
844 kernel configuration.</para>
845 <para>The core polled keyboard driver for PS/2 type keyboards
846 is in drivers/char/kdb_keyboard.c. This driver is hooked into the
847 debug core when kgdboc populates the callback in the array
848 called <constant>kdb_poll_funcs[]</constant>. The
849 kdb_get_kbd_char() is the top-level function which polls hardware
850 for single character input.
851 </para>
852 </sect2>
853 <sect2 id="kgdbocKms">
854 <title>kgdboc and kms</title>
855 <para>The kgdboc driver contains logic to request the graphics
856 display to switch to a text context when you are using
857 "kgdboc=kms,kbd", provided that you have a video driver which has a
858 frame buffer console and atomic kernel mode setting support.</para>
859 <para>
860 Every time the kernel
861 debugger is entered it calls kgdboc_pre_exp_handler() which in turn
862 calls con_debug_enter() in the virtual console layer. On resuming kernel
863 execution, the kernel debugger calls kgdboc_post_exp_handler() which
864 in turn calls con_debug_leave().</para>
865 <para>Any video driver that wants to be compatible with the kernel
866 debugger and the atomic kms callbacks must implement the
867 mode_set_base_atomic, fb_debug_enter and fb_debug_leave operations.
868 For the fb_debug_enter and fb_debug_leave the option exists to use
869 the generic drm fb helper functions or implement something custom for
870 the hardware. The following example shows the initialization of the
871 .mode_set_base_atomic operation in
872 drivers/gpu/drm/i915/intel_display.c:
873 <informalexample>
874 <programlisting>
875static const struct drm_crtc_helper_funcs intel_helper_funcs = {
876[...]
877 .mode_set_base_atomic = intel_pipe_set_base_atomic,
878[...]
879};
880 </programlisting>
881 </informalexample>
882 </para>
883 <para>Here is an example of how the i915 driver initializes the fb_debug_enter and fb_debug_leave functions to use the generic drm helpers in
884 drivers/gpu/drm/i915/intel_fb.c:
885 <informalexample>
886 <programlisting>
887static struct fb_ops intelfb_ops = {
888[...]
889 .fb_debug_enter = drm_fb_helper_debug_enter,
890 .fb_debug_leave = drm_fb_helper_debug_leave,
891[...]
892};
893 </programlisting>
894 </informalexample>
895 </para>
896 </sect2>
897 </sect1>
898 </chapter>
899 <chapter id="credits">
900 <title>Credits</title>
901 <para>
902 The following people have contributed to this document:
903 <orderedlist>
904 <listitem><para>Amit Kale <email>amitkale@linsyssoft.com</email></para></listitem>
905 <listitem><para>Tom Rini <email>trini@kernel.crashing.org</email></para></listitem>
906 </orderedlist>
907 In March 2008 this document was completely rewritten by:
908 <itemizedlist>
909 <listitem><para>Jason Wessel <email>jason.wessel@windriver.com</email></para></listitem>
910 </itemizedlist>
911 In Jan 2010 this document was updated to include kdb.
912 <itemizedlist>
913 <listitem><para>Jason Wessel <email>jason.wessel@windriver.com</email></para></listitem>
914 </itemizedlist>
915 </para>
916 </chapter>
917</book>
918
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
deleted file mode 100644
index 0320910b866d..000000000000
--- a/Documentation/DocBook/libata.tmpl
+++ /dev/null
@@ -1,1625 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="libataDevGuide">
6 <bookinfo>
7 <title>libATA Developer's Guide</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Jeff</firstname>
12 <surname>Garzik</surname>
13 </author>
14 </authorgroup>
15
16 <copyright>
17 <year>2003-2006</year>
18 <holder>Jeff Garzik</holder>
19 </copyright>
20
21 <legalnotice>
22 <para>
23 The contents of this file are subject to the Open
24 Software License version 1.1 that can be found at
25 <ulink url="http://fedoraproject.org/wiki/Licensing:OSL1.1">http://fedoraproject.org/wiki/Licensing:OSL1.1</ulink>
26 and is included herein by reference.
27 </para>
28
29 <para>
30 Alternatively, the contents of this file may be used under the terms
31 of the GNU General Public License version 2 (the "GPL") as distributed
32 in the kernel source COPYING file, in which case the provisions of
33 the GPL are applicable instead of the above. If you wish to allow
34 the use of your version of this file only under the terms of the
35 GPL and not to allow others to use your version of this file under
36 the OSL, indicate your decision by deleting the provisions above and
37 replace them with the notice and other provisions required by the GPL.
38 If you do not delete the provisions above, a recipient may use your
39 version of this file under either the OSL or the GPL.
40 </para>
41
42 </legalnotice>
43 </bookinfo>
44
45<toc></toc>
46
47 <chapter id="libataIntroduction">
48 <title>Introduction</title>
49 <para>
50 libATA is a library used inside the Linux kernel to support ATA host
51 controllers and devices. libATA provides an ATA driver API, class
52 transports for ATA and ATAPI devices, and SCSI&lt;-&gt;ATA translation
53 for ATA devices according to the T10 SAT specification.
54 </para>
55 <para>
56 This Guide documents the libATA driver API, library functions, library
57 internals, and a couple of sample ATA low-level drivers.
58 </para>
59 </chapter>
60
61 <chapter id="libataDriverApi">
62 <title>libata Driver API</title>
63 <para>
64 struct ata_port_operations is defined for every low-level libata
65 hardware driver, and it controls how the low-level driver
66 interfaces with the ATA and SCSI layers.
67 </para>
68 <para>
69 FIS-based drivers will hook into the system with ->qc_prep() and
70 ->qc_issue() high-level hooks. Hardware which behaves in a manner
71 similar to PCI IDE hardware may utilize several generic helpers,
72 defining at a bare minimum the bus I/O addresses of the ATA shadow
73 register blocks.
74 </para>
75 <sect1>
76 <title>struct ata_port_operations</title>
77
78 <sect2><title>Disable ATA port</title>
79 <programlisting>
80void (*port_disable) (struct ata_port *);
81 </programlisting>
82
83 <para>
84 Called from ata_bus_probe() error path, as well as when
85 unregistering from the SCSI module (rmmod, hot unplug).
86 This function should do whatever needs to be done to take the
87 port out of use. In most cases, ata_port_disable() can be used
88 as this hook.
89 </para>
90 <para>
91 Called from ata_bus_probe() on a failed probe.
92 Called from ata_scsi_release().
93 </para>
94
95 </sect2>
96
97 <sect2><title>Post-IDENTIFY device configuration</title>
98 <programlisting>
99void (*dev_config) (struct ata_port *, struct ata_device *);
100 </programlisting>
101
102 <para>
103 Called after IDENTIFY [PACKET] DEVICE is issued to each device
104 found. Typically used to apply device-specific fixups prior to
105 issue of SET FEATURES - XFER MODE, and prior to operation.
106 </para>
107 <para>
108 This entry may be specified as NULL in ata_port_operations.
109 </para>
110
111 </sect2>
112
113 <sect2><title>Set PIO/DMA mode</title>
114 <programlisting>
115void (*set_piomode) (struct ata_port *, struct ata_device *);
116void (*set_dmamode) (struct ata_port *, struct ata_device *);
117void (*post_set_mode) (struct ata_port *);
118unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int);
119 </programlisting>
120
121 <para>
122 Hooks called prior to the issue of SET FEATURES - XFER MODE
123 command. The optional ->mode_filter() hook is called when libata
124 has built a mask of the possible modes. This is passed to the
125 ->mode_filter() function which should return a mask of valid modes
126 after filtering those unsuitable due to hardware limits. It is not
127 valid to use this interface to add modes.
128 </para>
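 <para>
 The filtering rule above can be sketched in isolation. This is a
 minimal illustration, not libata code: the DEMO_* mode bits and
 demo_mode_filter() are hypothetical names, and the real transfer
 mask layout is different. It only shows that a filter may clear
 bits from the mask it receives and never adds any.
 </para>

```c
/* Hypothetical mode bits, for illustration only. */
#define DEMO_PIO4   (1u << 4)
#define DEMO_MWDMA2 (1u << 10)
#define DEMO_UDMA5  (1u << 20)
#define DEMO_UDMA6  (1u << 21)

/* Modes this imaginary controller cannot drive. */
#define DEMO_UNSUPPORTED (DEMO_UDMA6)

/*
 * Return the subset of 'mask' the hardware can handle.
 * Only clears bits; the result is always a subset of the input.
 */
unsigned int demo_mode_filter(unsigned int mask)
{
	return mask & ~DEMO_UNSUPPORTED;
}
```

 <para>
 Because the result is computed with a bitwise AND against the
 incoming mask, the function cannot introduce a mode that the
 library did not already consider possible.
 </para>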
129 <para>
130 dev->pio_mode and dev->dma_mode are guaranteed to be valid when
131 ->set_piomode() and ->set_dmamode() are called. The timings for
132 any other drive sharing the cable will also be valid at this point.
133 That is, the library records the decisions for the modes of each
134 drive on a channel before it attempts to set any of them.
135 </para>
136 <para>
137 ->post_set_mode() is
138 called unconditionally, after the SET FEATURES - XFER MODE
139 command completes successfully.
140 </para>
141
142 <para>
143 ->set_piomode() is always called (if present), but
144 ->set_dmamode() is only called if DMA is possible.
145 </para>
146
147 </sect2>
148
149 <sect2><title>Taskfile read/write</title>
150 <programlisting>
151void (*sff_tf_load) (struct ata_port *ap, struct ata_taskfile *tf);
152void (*sff_tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
153 </programlisting>
154
155 <para>
156 ->sff_tf_load() is called to load the given taskfile into hardware
157 registers / DMA buffers. ->sff_tf_read() is called to read the
158 hardware registers / DMA buffers, to obtain the current set of
159 taskfile register values.
160 Most drivers for taskfile-based hardware (PIO or MMIO) use
161 ata_sff_tf_load() and ata_sff_tf_read() for these hooks.
162 </para>
163
164 </sect2>
165
166 <sect2><title>PIO data read/write</title>
167 <programlisting>
168void (*sff_data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
169 </programlisting>
170
171 <para>
172All bmdma-style drivers must implement this hook. This is the low-level
173operation that actually copies the data bytes during a PIO data
174transfer.
175Typically the driver will choose one of ata_sff_data_xfer_noirq(),
176ata_sff_data_xfer(), or ata_sff_data_xfer32().
177 </para>
178
179 </sect2>
180
181 <sect2><title>ATA command execute</title>
182 <programlisting>
183void (*sff_exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
184 </programlisting>
185
186 <para>
187 Causes an ATA command, previously loaded with
188 ->sff_tf_load(), to be initiated in hardware.
189 Most drivers for taskfile-based hardware use ata_sff_exec_command()
190 for this hook.
191 </para>
192
193 </sect2>
194
195 <sect2><title>Per-cmd ATAPI DMA capabilities filter</title>
196 <programlisting>
197int (*check_atapi_dma) (struct ata_queued_cmd *qc);
198 </programlisting>
199
200 <para>
201Allow low-level driver to filter ATA PACKET commands, returning a status
202indicating whether or not it is OK to use DMA for the supplied PACKET
203command.
204 </para>
205 <para>
206 This hook may be specified as NULL, in which case libata will
207 assume that ATAPI DMA can be supported.
208 </para>
209
210 </sect2>
211
212 <sect2><title>Read specific ATA shadow registers</title>
213 <programlisting>
214u8 (*sff_check_status)(struct ata_port *ap);
215u8 (*sff_check_altstatus)(struct ata_port *ap);
216 </programlisting>
217
218 <para>
219 Reads the Status/AltStatus ATA shadow register from
220 hardware. On some hardware, reading the Status register has
221 the side effect of clearing the interrupt condition.
222 Most drivers for taskfile-based hardware use
223 ata_sff_check_status() for this hook.
224 </para>
225
226 </sect2>
227
228 <sect2><title>Write specific ATA shadow register</title>
229 <programlisting>
230void (*sff_set_devctl)(struct ata_port *ap, u8 ctl);
231 </programlisting>
232
233 <para>
234 Write the device control ATA shadow register to the hardware.
235 Most drivers don't need to define this.
236 </para>
237
238 </sect2>
239
240 <sect2><title>Select ATA device on bus</title>
241 <programlisting>
242void (*sff_dev_select)(struct ata_port *ap, unsigned int device);
243 </programlisting>
244
245 <para>
246 Issues the low-level hardware command(s) that cause one of N
247 hardware devices to be considered 'selected' (active and
248 available for use) on the ATA bus. This generally has no
249 meaning on FIS-based devices.
250 </para>
251 <para>
252 Most drivers for taskfile-based hardware use
253 ata_sff_dev_select() for this hook.
254 </para>
255
256 </sect2>
257
258 <sect2><title>Private tuning method</title>
259 <programlisting>
260void (*set_mode) (struct ata_port *ap);
261 </programlisting>
262
263 <para>
264 By default libata performs drive and controller tuning in
265 accordance with the ATA timing rules and also applies blacklists
266 and cable limits. Some controllers need special handling and have
267 custom tuning rules, typically RAID controllers that use ATA
268 commands but do not actually do drive timing.
269 </para>
270
271 <warning>
272 <para>
273 This hook should not be used to replace the standard controller
274 tuning logic when a controller has quirks. Replacing the default
275 tuning logic in that case would bypass handling for drive and
276 bridge quirks that may be important to data reliability. If a
277 controller needs to filter the mode selection it should use the
278 mode_filter hook instead.
279 </para>
280 </warning>
281
282 </sect2>
283
284 <sect2><title>Control PCI IDE BMDMA engine</title>
285 <programlisting>
286void (*bmdma_setup) (struct ata_queued_cmd *qc);
287void (*bmdma_start) (struct ata_queued_cmd *qc);
288void (*bmdma_stop) (struct ata_port *ap);
289u8 (*bmdma_status) (struct ata_port *ap);
290 </programlisting>
291
292 <para>
293When setting up an IDE BMDMA transaction, these hooks arm
294(->bmdma_setup), fire (->bmdma_start), and halt (->bmdma_stop)
295the hardware's DMA engine. ->bmdma_status is used to read the standard
296PCI IDE DMA Status register.
297 </para>
298
299 <para>
300These hooks are typically either no-ops, or simply not implemented, in
301FIS-based drivers.
302 </para>
303 <para>
304Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup()
305hook. ata_bmdma_setup() will write the pointer to the PRD table to
306the IDE PRD Table Address register, enable DMA in the DMA Command
307register, and call exec_command() to begin the transfer.
308 </para>
309 <para>
310Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start()
311hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA
312Command register.
313 </para>
314 <para>
315Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop()
316hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA
317command register.
318 </para>
319 <para>
320Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook.
321 </para>
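 <para>
 The arm, fire, and halt sequence can be modelled with a mock
 register block. This is a sketch under stated assumptions: struct
 demo_bmdma, DEMO_DMA_START and the demo_* functions are
 hypothetical stand-ins for the hardware registers and helpers, and
 transfer direction and command issue are left out.
 </para>

```c
#include <stdint.h>

/* Local stand-in for the start/stop bit of the DMA Command register. */
#define DEMO_DMA_START (1u << 0)

/* Mock of the two registers touched in this sketch. */
struct demo_bmdma {
	uint8_t  cmd;       /* DMA Command register */
	uint32_t prd_addr;  /* PRD Table Address register */
};

/* Arm: point the engine at the PRD table, leaving it stopped. */
void demo_bmdma_setup(struct demo_bmdma *bm, uint32_t prd_table)
{
	bm->prd_addr = prd_table;
	bm->cmd &= (uint8_t)~DEMO_DMA_START;
}

/* Fire: set the start bit. */
void demo_bmdma_start(struct demo_bmdma *bm)
{
	bm->cmd |= DEMO_DMA_START;
}

/* Halt: clear the start bit. */
void demo_bmdma_stop(struct demo_bmdma *bm)
{
	bm->cmd &= (uint8_t)~DEMO_DMA_START;
}

/* Nonzero while the mock engine is running. */
int demo_bmdma_running(const struct demo_bmdma *bm)
{
	return (bm->cmd & DEMO_DMA_START) != 0;
}
```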
322
323 </sect2>
324
325 <sect2><title>High-level taskfile hooks</title>
326 <programlisting>
327void (*qc_prep) (struct ata_queued_cmd *qc);
328int (*qc_issue) (struct ata_queued_cmd *qc);
329 </programlisting>
330
331 <para>
332 Higher-level hooks, these two hooks can potentially supersede
333 several of the above taskfile/DMA engine hooks. ->qc_prep is
334 called after the buffers have been DMA-mapped, and is typically
335 used to populate the hardware's DMA scatter-gather table.
336 Most drivers use the standard ata_qc_prep() helper function, but
337 more advanced drivers roll their own.
338 </para>
339 <para>
340 ->qc_issue is used to make a command active, once the hardware
341 and S/G tables have been prepared. IDE BMDMA drivers use the
342 helper function ata_qc_issue_prot() for taskfile protocol-based
343 dispatch. More advanced drivers implement their own ->qc_issue.
344 </para>
345 <para>
346 ata_qc_issue_prot() calls ->tf_load(), ->bmdma_setup(), and
347 ->bmdma_start() as necessary to initiate a transfer.
348 </para>
349
350 </sect2>
351
352 <sect2><title>Exception and probe handling (EH)</title>
353 <programlisting>
354void (*eng_timeout) (struct ata_port *ap);
355void (*phy_reset) (struct ata_port *ap);
356 </programlisting>
357
358 <para>
359Deprecated. Use ->error_handler() instead.
360 </para>
361
362 <programlisting>
363void (*freeze) (struct ata_port *ap);
364void (*thaw) (struct ata_port *ap);
365 </programlisting>
366
367 <para>
368ata_port_freeze() is called when HSM violations or some other
369condition disrupts normal operation of the port. A frozen port
370is not allowed to perform any operation until the port is
371thawed, which usually follows a successful reset.
372 </para>
373
374 <para>
375The optional ->freeze() callback can be used for freezing the port
376hardware-wise (e.g. mask interrupt and stop DMA engine). If a
377port cannot be frozen hardware-wise, the interrupt handler
378must ack and clear interrupts unconditionally while the port
379is frozen.
380 </para>
381 <para>
382The optional ->thaw() callback is called to perform the opposite of ->freeze():
383prepare the port for normal operation once again. Unmask interrupts,
384start DMA engine, etc.
385 </para>
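 <para>
 The rule for ports that cannot be frozen in hardware can be
 sketched as follows. Everything here is hypothetical (struct
 demo_port and the demo_* functions are not libata symbols); the
 sketch only shows an interrupt handler that always acknowledges
 the interrupt but does no command processing while the port is
 frozen.
 </para>

```c
/* Hypothetical software-only port state. */
struct demo_port {
	int frozen;       /* set between freeze and thaw */
	int irq_pending;  /* stands in for a hardware interrupt flag */
	int processed;    /* number of interrupts actually processed */
};

void demo_freeze(struct demo_port *ap)
{
	ap->frozen = 1;
}

void demo_thaw(struct demo_port *ap)
{
	ap->frozen = 0;
}

/* Returns 1 if an interrupt was pending, 0 otherwise. */
int demo_interrupt(struct demo_port *ap)
{
	if (!ap->irq_pending)
		return 0;

	/* Ack and clear unconditionally, frozen or not. */
	ap->irq_pending = 0;

	/* A frozen port must not perform any operation. */
	if (ap->frozen)
		return 1;

	ap->processed++;
	return 1;
}
```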
386
387 <programlisting>
388void (*error_handler) (struct ata_port *ap);
389 </programlisting>
390
391 <para>
392->error_handler() is a driver's hook into probe, hotplug, and recovery
393and other exceptional conditions. The primary responsibility of an
394implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set
395of EH hooks as arguments:
396 </para>
397
398 <para>
399'prereset' hook (may be NULL) is called during an EH reset, before any other actions
400are taken.
401 </para>
402
403 <para>
404'postreset' hook (may be NULL) is called after the EH reset is performed.
405 </para>
406
407 <para>
408Based on existing conditions, severity of the problem, and hardware capabilities,
409either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be
410called to perform the low-level EH reset.
411 </para>
412
413 <programlisting>
414void (*post_internal_cmd) (struct ata_queued_cmd *qc);
415 </programlisting>
416
417 <para>
418Perform any hardware-specific actions necessary to finish processing
419after executing a probe-time or EH-time command via ata_exec_internal().
420 </para>
421
422 </sect2>
423
424 <sect2><title>Hardware interrupt handling</title>
425 <programlisting>
426irqreturn_t (*irq_handler)(int, void *, struct pt_regs *);
427void (*irq_clear) (struct ata_port *);
428 </programlisting>
429
430 <para>
431 ->irq_handler is the interrupt handling routine registered with
432 the system, by libata. ->irq_clear is called during probe just
433 before the interrupt handler is registered, to be sure hardware
434 is quiet.
435 </para>
436 <para>
437 The second argument, dev_instance, should be cast to a pointer
438 to struct ata_host_set.
439 </para>
440 <para>
441 Most legacy IDE drivers use ata_sff_interrupt() for the
442 irq_handler hook, which scans all ports in the host_set,
443 determines which queued command was active (if any), and calls
444 ata_sff_host_intr(ap,qc).
445 </para>
446 <para>
447 Most legacy IDE drivers use ata_sff_irq_clear() for the
448 irq_clear() hook, which simply clears the interrupt and error
449 flags in the DMA status register.
450 </para>
451
452 </sect2>
453
454 <sect2><title>SATA phy read/write</title>
455 <programlisting>
456int (*scr_read) (struct ata_port *ap, unsigned int sc_reg,
457 u32 *val);
458int (*scr_write) (struct ata_port *ap, unsigned int sc_reg,
459 u32 val);
460 </programlisting>
461
462 <para>
463 Read and write standard SATA phy registers. Currently only used
464 if ->phy_reset hook called the sata_phy_reset() helper function.
465 sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE.
466 </para>
467
468 </sect2>
469
470 <sect2><title>Init and shutdown</title>
471 <programlisting>
472int (*port_start) (struct ata_port *ap);
473void (*port_stop) (struct ata_port *ap);
474void (*host_stop) (struct ata_host_set *host_set);
475 </programlisting>
476
477 <para>
478 ->port_start() is called just after the data structures for each
479 port are initialized. Typically this is used to alloc per-port
480 DMA buffers / tables / rings, enable DMA engines, and similar
481 tasks. Some drivers also use this entry point as a chance to
482 allocate driver-private memory for ap->private_data.
483 </para>
484 <para>
485 Many drivers use ata_port_start() as this hook or call
486 it from their own port_start() hooks. ata_port_start()
487 allocates space for a legacy IDE PRD table and returns.
488 </para>
489 <para>
490 ->port_stop() is called before ->host_stop(). Its sole function
491 is to release DMA/memory resources, now that they are no longer
492 actively being used. Many drivers also free driver-private
493 data from port at this time.
494 </para>
495 <para>
496 ->host_stop() is called after all ->port_stop() calls
497have completed. The hook must finalize hardware shutdown, release DMA
498and other resources, etc.
499 This hook may be specified as NULL, in which case it is not called.
500 </para>
501
502 </sect2>
503
504 </sect1>
505 </chapter>
506
507 <chapter id="libataEH">
508 <title>Error handling</title>
509
510 <para>
511 This chapter describes how errors are handled under libata.
512 Readers are advised to read SCSI EH
513 (Documentation/scsi/scsi_eh.txt) and ATA exceptions doc first.
514 </para>
515
516 <sect1><title>Origins of commands</title>
517 <para>
518 In libata, a command is represented with struct ata_queued_cmd
519 or qc. qc's are preallocated during port initialization and
520 reused for command execution. Currently only one
521 qc is allocated per port, but the yet-to-be-merged NCQ branch
522 allocates one for each tag and maps each qc to NCQ tag 1-to-1.
523 </para>
524 <para>
525 libata commands can originate from two sources - libata itself
526 and SCSI midlayer. libata internal commands are used for
527 initialization and error handling. All normal blk requests
528 and commands for SCSI emulation are passed as SCSI commands
529 through queuecommand callback of SCSI host template.
530 </para>
531 </sect1>
532
533 <sect1><title>How commands are issued</title>
534
535 <variablelist>
536
537 <varlistentry><term>Internal commands</term>
538 <listitem>
539 <para>
540 First, qc is allocated and initialized using
541 ata_qc_new_init(). Although ata_qc_new_init() doesn't
542 implement any wait or retry mechanism when qc is not
543 available, internal commands are currently issued only during
544 initialization and error recovery, so no other command is
545 active and allocation is guaranteed to succeed.
546 </para>
547 <para>
548 Once allocated, the qc's taskfile is initialized for the command to
549 be executed. qc currently has two mechanisms to notify
550 completion. One is via qc->complete_fn() callback and the
551 other is completion qc->waiting. qc->complete_fn() callback
552 is the asynchronous path used by normal SCSI translated
553 commands and qc->waiting is the synchronous (issuer sleeps in
554 process context) path used by internal commands.
555 </para>
556 <para>
557 Once initialization is complete, host_set lock is acquired
558 and the qc is issued.
559 </para>
560 </listitem>
561 </varlistentry>
562
563 <varlistentry><term>SCSI commands</term>
564 <listitem>
565 <para>
566 All libata drivers use ata_scsi_queuecmd() as
567 hostt->queuecommand callback. scmds can either be simulated
568 or translated. No qc is involved in processing a simulated
569 scmd. The result is computed right away and the scmd is
570 completed.
571 </para>
572 <para>
573 For a translated scmd, ata_qc_new_init() is invoked to
574 allocate a qc and the scmd is translated into the qc. SCSI
575 midlayer's completion notification function pointer is stored
576 into qc->scsidone.
577 </para>
578 <para>
579 qc->complete_fn() callback is used for completion
580 notification. ATA commands use ata_scsi_qc_complete() while
581 ATAPI commands use atapi_qc_complete(). Both functions end up
582 calling qc->scsidone to notify upper layer when the qc is
583 finished. After translation is completed, the qc is issued
584 with ata_qc_issue().
585 </para>
586 <para>
587 Note that SCSI midlayer invokes hostt->queuecommand while
588 holding host_set lock, so all above occur while holding
589 host_set lock.
590 </para>
591 </listitem>
592 </varlistentry>
593
594 </variablelist>
595 </sect1>
596
597 <sect1><title>How commands are processed</title>
598 <para>
599 Depending on which protocol and which controller are used,
600 commands are processed differently. For the purpose of
601 discussion, a controller which uses taskfile interface and all
602 standard callbacks is assumed.
603 </para>
604 <para>
605 Currently 6 ATA command protocols are used. They can be
606 sorted into the following four categories according to how
607 they are processed.
608 </para>
609
610 <variablelist>
611 <varlistentry><term>ATA NO DATA or DMA</term>
612 <listitem>
613 <para>
614 ATA_PROT_NODATA and ATA_PROT_DMA fall into this category.
615 These types of commands don't require any software
616 intervention once issued. Device will raise interrupt on
617 completion.
618 </para>
619 </listitem>
620 </varlistentry>
621
622 <varlistentry><term>ATA PIO</term>
623 <listitem>
624 <para>
625 ATA_PROT_PIO is in this category. libata currently
626 implements PIO with polling. ATA_NIEN bit is set to turn
627 off interrupt and pio_task on ata_wq performs polling and
628 IO.
629 </para>
630 </listitem>
631 </varlistentry>
632
633 <varlistentry><term>ATAPI NODATA or DMA</term>
634 <listitem>
635 <para>
636 ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this
637 category. packet_task is used to poll BSY bit after
638 issuing PACKET command. Once BSY is turned off by the
639 device, packet_task transfers CDB and hands off processing
640 to interrupt handler.
641 </para>
642 </listitem>
643 </varlistentry>
644
645 <varlistentry><term>ATAPI PIO</term>
646 <listitem>
647 <para>
648 ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set
649 and, as in ATAPI NODATA or DMA, packet_task submits cdb.
650 However, after submitting cdb, further processing (data
651 transfer) is handed off to pio_task.
652 </para>
653 </listitem>
654 </varlistentry>
655 </variablelist>
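 <para>
 The four categories above amount to a mapping from protocol to
 processing path. The sketch below restates that mapping; the
 DEMO_* enumerators and demo_processing_path() are hypothetical
 names chosen for illustration and do not match libata's own
 constants.
 </para>

```c
/* Hypothetical stand-ins for the six command protocols. */
enum demo_prot {
	DEMO_PROT_NODATA,
	DEMO_PROT_DMA,
	DEMO_PROT_PIO,
	DEMO_PROT_ATAPI_NODATA,
	DEMO_PROT_ATAPI_DMA,
	DEMO_PROT_ATAPI
};

/* Who drives the command after it has been issued. */
enum demo_path {
	DEMO_PATH_IRQ,               /* interrupt on completion */
	DEMO_PATH_PIO_TASK,          /* polled by pio_task */
	DEMO_PATH_PACKET_THEN_IRQ,   /* packet_task sends CDB, then IRQ */
	DEMO_PATH_PACKET_THEN_PIO    /* packet_task sends CDB, then pio_task */
};

enum demo_path demo_processing_path(enum demo_prot prot)
{
	switch (prot) {
	case DEMO_PROT_NODATA:
	case DEMO_PROT_DMA:
		return DEMO_PATH_IRQ;
	case DEMO_PROT_PIO:
		return DEMO_PATH_PIO_TASK;
	case DEMO_PROT_ATAPI_NODATA:
	case DEMO_PROT_ATAPI_DMA:
		return DEMO_PATH_PACKET_THEN_IRQ;
	case DEMO_PROT_ATAPI:
		return DEMO_PATH_PACKET_THEN_PIO;
	}
	/* Not reached for valid input; keeps the compiler satisfied. */
	return DEMO_PATH_IRQ;
}
```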
656 </sect1>
657
658 <sect1><title>How commands are completed</title>
659 <para>
660 Once issued, all qc's are either completed with
661 ata_qc_complete() or time out. For commands which are handled
662 by interrupts, ata_host_intr() invokes ata_qc_complete(), and,
663 for PIO tasks, pio_task invokes ata_qc_complete(). In error
664 cases, packet_task may also complete commands.
665 </para>
666 <para>
667 ata_qc_complete() does the following.
668 </para>
669
670 <orderedlist>
671
672 <listitem>
673 <para>
674 DMA memory is unmapped.
675 </para>
676 </listitem>
677
678 <listitem>
679 <para>
680 ATA_QCFLAG_ACTIVE is cleared from qc->flags.
681 </para>
682 </listitem>
683
684 <listitem>
685 <para>
686 qc->complete_fn() callback is invoked. If the return value of
687 the callback is not zero, completion is short-circuited and
688 ata_qc_complete() returns.
689 </para>
690 </listitem>
691
692 <listitem>
693 <para>
694 __ata_qc_complete() is called, which does
695 <orderedlist>
696
697 <listitem>
698 <para>
699 qc->flags is cleared to zero.
700 </para>
701 </listitem>
702
703 <listitem>
704 <para>
705 ap->active_tag and qc->tag are poisoned.
706 </para>
707 </listitem>
708
709 <listitem>
710 <para>
711 qc->waiting is cleared &amp; completed (in that order).
712 </para>
713 </listitem>
714
715 <listitem>
716 <para>
717 qc is deallocated by clearing appropriate bit in ap->qactive.
718 </para>
719 </listitem>
720
721 </orderedlist>
722 </para>
723 </listitem>
724
725 </orderedlist>
726
727 <para>
728 So, it basically notifies upper layer and deallocates qc. One
729 exception is short-circuit path in #3 which is used by
730 atapi_qc_complete().
731 </para>
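 <para>
 The completion steps, including the short-circuit in step 3, can
 be modelled with a small mock. This is only a sketch: struct
 demo_qc and the demo_* functions are hypothetical, the fields are
 reduced to plain integers, and the poisoning and waiter steps of
 the inner function are omitted.
 </para>

```c
#define DEMO_QCFLAG_ACTIVE (1u << 0)

/* Hypothetical, heavily reduced queued command. */
struct demo_qc {
	unsigned int flags;
	int allocated;   /* stands in for the bit in ap->qactive */
	int dma_mapped;
	int (*complete_fn)(struct demo_qc *qc);
};

/* Inner completion: clear flags and release the qc. */
void demo_qc_free(struct demo_qc *qc)
{
	qc->flags = 0;
	qc->allocated = 0;
}

void demo_qc_complete(struct demo_qc *qc)
{
	qc->dma_mapped = 0;                    /* 1. unmap DMA memory */
	qc->flags &= ~DEMO_QCFLAG_ACTIVE;      /* 2. no longer active */

	/* 3. notify; a nonzero return short-circuits completion */
	if (qc->complete_fn && qc->complete_fn(qc))
		return;

	demo_qc_free(qc);                      /* 4. deallocate */
}

/* Callback for a normal completion. */
int demo_complete_normal(struct demo_qc *qc)
{
	(void)qc;
	return 0;
}

/* Callback that keeps the qc alive, as a failed ATAPI command would. */
int demo_complete_keep(struct demo_qc *qc)
{
	(void)qc;
	return 1;
}
```

 <para>
 With the second callback the qc is left allocated but no longer
 active, which is the partially completed state that error handling
 later picks up.
 </para>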
732 <para>
733 For all non-ATAPI commands, whether they fail or not, almost
734 the same code path is taken and very little error handling
735 takes place. A qc is completed with success status if it
736 succeeded, with failed status otherwise.
737 </para>
738 <para>
739 However, failed ATAPI commands require more handling as
740 REQUEST SENSE is needed to acquire sense data. If an ATAPI
741 command fails, ata_qc_complete() is invoked with error status,
742 which in turn invokes atapi_qc_complete() via
743 qc->complete_fn() callback.
744 </para>
745 <para>
746 This makes atapi_qc_complete() set scmd->result to
747 SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As
748 the sense data is empty but scmd->result is CHECK CONDITION,
749 SCSI midlayer will invoke EH for the scmd, and returning 1
750 makes ata_qc_complete() return without deallocating the qc.
751 This leads us to ata_scsi_error() with a partially completed qc.
752 </para>
753
754 </sect1>
755
756 <sect1><title>ata_scsi_error()</title>
757 <para>
758 ata_scsi_error() is the current transportt->eh_strategy_handler()
759 for libata. As discussed above, this will be entered in two
760 cases - timeout and ATAPI error completion. This function
761 calls low level libata driver's eng_timeout() callback, the
762 standard callback for which is ata_eng_timeout(). It checks
763 if a qc is active and calls ata_qc_timeout() on the qc if so.
764 Actual error handling occurs in ata_qc_timeout().
765 </para>
766 <para>
767 If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and
768 completes the qc. Note that as we're currently in EH, we
769 cannot call scsi_done. As described in SCSI EH doc, a
770 recovered scmd should be either retried with
771 scsi_queue_insert() or finished with scsi_finish_command().
772 Here, we override qc->scsidone with scsi_finish_command() and
773 call ata_qc_complete().
774 </para>
775 <para>
776 If EH is invoked due to a failed ATAPI qc, the qc here is
777 completed but not deallocated. The purpose of this
778 half-completion is to use the qc as a placeholder to make EH
779 code reach this place. This is a bit hackish, but it works.
780 </para>
781 <para>
782 Once control reaches here, the qc is deallocated by invoking
783 __ata_qc_complete() explicitly. Then, internal qc for REQUEST
784 SENSE is issued. Once sense data is acquired, scmd is
785 finished by directly invoking scsi_finish_command() on the
786 scmd. Note that as we already have completed and deallocated
787 the qc which was associated with the scmd, we don't need
788 to/cannot call ata_qc_complete() again.
789 </para>
790
791 </sect1>
792
793 <sect1><title>Problems with the current EH</title>
794
795 <itemizedlist>
796
797 <listitem>
798 <para>
799 Error representation is too crude. Currently any and all
800 error conditions are represented with ATA STATUS and ERROR
801 registers. Errors which aren't ATA device errors are treated
802 as ATA device errors by setting ATA_ERR bit. Better error
803 descriptor which can properly represent ATA and other
804 errors/exceptions is needed.
805 </para>
806 </listitem>
807
808 <listitem>
809 <para>
810 When handling timeouts, no action is taken to make device
811 forget about the timed out command and ready for new commands.
812 </para>
813 </listitem>
814
815 <listitem>
816 <para>
817 EH handling via ata_scsi_error() is not properly protected
818 from usual command processing. On EH entrance, the device is
819 not in quiescent state. Timed out commands may succeed or
820 fail any time. pio_task and atapi_task may still be running.
821 </para>
822 </listitem>
823
824 <listitem>
825 <para>
826 Too weak error recovery. Devices / controllers causing HSM
827 mismatch errors and other errors quite often require reset to
828 return to known state. Also, advanced error handling is
829 necessary to support features like NCQ and hotplug.
830 </para>
831 </listitem>
832
833 <listitem>
834 <para>
835 ATA errors are directly handled in the interrupt handler and
836 PIO errors in pio_task. This is problematic for advanced
837 error handling for the following reasons.
838 </para>
839 <para>
840 First, advanced error handling often requires context and
841 internal qc execution.
842 </para>
843 <para>
844 Second, even a simple failure (say, CRC error) needs
845 information gathering and could trigger complex error handling
846 (say, resetting &amp; reconfiguring). Having multiple code
847 paths to gather information, enter EH and trigger actions
848 makes life painful.
849 </para>
850 <para>
851 Third, scattered EH code makes implementing low level drivers
852 difficult. Low level drivers override libata callbacks. If
853 EH is scattered over several places, each affected callback
854 should perform its part of error handling. This can be error
855 prone and painful.
856 </para>
857 </listitem>
858
859 </itemizedlist>
860 </sect1>
861 </chapter>
862
863 <chapter id="libataExt">
864 <title>libata Library</title>
865!Edrivers/ata/libata-core.c
866 </chapter>
867
868 <chapter id="libataInt">
869 <title>libata Core Internals</title>
870!Idrivers/ata/libata-core.c
871 </chapter>
872
873 <chapter id="libataScsiInt">
874 <title>libata SCSI translation/emulation</title>
875!Edrivers/ata/libata-scsi.c
876!Idrivers/ata/libata-scsi.c
877 </chapter>
878
879 <chapter id="ataExceptions">
880 <title>ATA errors and exceptions</title>
881
882 <para>
883 This chapter tries to identify what error/exception conditions exist
884 for ATA/ATAPI devices and describe how they should be handled in
885 an implementation-neutral way.
886 </para>
887
888 <para>
889 The term 'error' is used to describe conditions where either an
890 explicit error condition is reported from device or a command has
891 timed out.
892 </para>
893
894 <para>
895 The term 'exception' is either used to describe exceptional
896 conditions which are not errors (say, power or hotplug events), or
897 to describe both errors and non-error exceptional conditions. Where
898 explicit distinction between error and exception is necessary, the
899 term 'non-error exception' is used.
900 </para>
901
902 <sect1 id="excat">
903 <title>Exception categories</title>
904 <para>
905 Exceptions are described primarily with respect to legacy
906 taskfile + bus master IDE interface. If a controller provides
907 other better mechanism for error reporting, mapping those into
908 categories described below shouldn't be difficult.
909 </para>
910
911 <para>
912 In the following sections, two recovery actions - reset and
913 reconfiguring transport - are mentioned. These are described
914 further in <xref linkend="exrec"/>.
915 </para>
916
917 <sect2 id="excatHSMviolation">
918 <title>HSM violation</title>
919 <para>
920 This error is indicated when STATUS value doesn't match HSM
921 requirement while issuing or executing any ATA/ATAPI command.
922 </para>
923
924 <itemizedlist>
925 <title>Examples</title>
926
927 <listitem>
928 <para>
929 ATA_STATUS doesn't contain !BSY &amp;&amp; DRDY &amp;&amp; !DRQ while trying
930 to issue a command.
931 </para>
932 </listitem>
933
934 <listitem>
935 <para>
936 !BSY &amp;&amp; !DRQ during PIO data transfer.
937 </para>
938 </listitem>
939
940 <listitem>
941 <para>
942 DRQ on command completion.
943 </para>
944 </listitem>
945
946 <listitem>
947 <para>
948 !BSY &amp;&amp; ERR after CDB transfer starts but before the
949 last byte of CDB is transferred. ATA/ATAPI standard states
950 that &quot;The device shall not terminate the PACKET command
951 with an error before the last byte of the command packet has
952 been written&quot; in the error outputs description of PACKET
953 command and the state diagram doesn't include such
954 transitions.
955 </para>
956 </listitem>
957
958 </itemizedlist>
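 <para>
 The first three examples reduce to bit tests on the STATUS value.
 The sketch below uses locally defined DEMO_* masks with the
 standard ATA bit positions; the function names are hypothetical,
 and each check assumes the caller already knows which phase the
 command is in.
 </para>

```c
#include <stdint.h>

/* STATUS register bits (standard ATA positions, local names). */
#define DEMO_BSY  0x80
#define DEMO_DRDY 0x40
#define DEMO_DRQ  0x08

/* Nonzero when STATUS is !BSY, DRDY and !DRQ, as needed to issue. */
int demo_ready_for_command(uint8_t status)
{
	return (status & (DEMO_BSY | DEMO_DRDY | DEMO_DRQ)) == DEMO_DRDY;
}

/* Nonzero when STATUS is !BSY and !DRQ in the middle of a PIO transfer. */
int demo_pio_violation(uint8_t status)
{
	return (status & (DEMO_BSY | DEMO_DRQ)) == 0;
}

/* Nonzero when DRQ is still set at command completion. */
int demo_completion_violation(uint8_t status)
{
	return (status & DEMO_DRQ) != 0;
}
```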
959
960 <para>
961 In these cases, HSM is violated and not much information
962 regarding the error can be acquired from STATUS or ERROR
963 register. IOW, this error can be anything - driver bug,
964 faulty device, controller and/or cable.
965 </para>
966
967 <para>
968 As HSM is violated, reset is necessary to restore known state.
969 Reconfiguring transport for lower speed might be helpful too
970 as transmission errors sometimes cause this kind of error.
971 </para>
972 </sect2>
973
974 <sect2 id="excatDevErr">
975 <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title>
976
977 <para>
978 These are errors detected and reported by ATA/ATAPI devices
979 indicating device problems. For this type of error, STATUS
980 and ERROR register values are valid and describe the error
981 condition. Note that some ATA bus errors are detected by
982 ATA/ATAPI devices and reported using the same mechanism as
983 device errors. Those cases are described later in this
984 section.
985 </para>
986
987 <para>
988 For ATA commands, this type of error is indicated by !BSY
989 &amp;&amp; ERR during command execution and on completion.
990 </para>
991
992 <para>For ATAPI commands,</para>
993
994 <itemizedlist>
995
996 <listitem>
997 <para>
998 !BSY &amp;&amp; ERR &amp;&amp; ABRT right after issuing PACKET
999 indicates that PACKET command is not supported and falls in
1000 this category.
1001 </para>
1002 </listitem>
1003
1004 <listitem>
1005 <para>
1006 !BSY &amp;&amp; ERR(==CHK) &amp;&amp; !ABRT after the last
1007 byte of CDB is transferred indicates CHECK CONDITION and
1008 doesn't fall in this category.
1009 </para>
1010 </listitem>
1011
1012 <listitem>
1013 <para>
1014 !BSY &amp;&amp; ERR(==CHK) &amp;&amp; ABRT after the last byte
1015 of CDB is transferred *probably* indicates CHECK CONDITION and
1016 doesn't fall in this category.
1017 </para>
1018 </listitem>
1019
1020 </itemizedlist>
1021
1022 <para>
1023 Of errors detected as above, the following are not ATA/ATAPI
1024 device errors but ATA bus errors and should be handled
1025 according to <xref linkend="excatATAbusErr"/>.
1026 </para>
1027
1028 <variablelist>
1029
1030 <varlistentry>
1031 <term>CRC error during data transfer</term>
1032 <listitem>
1033 <para>
1034 This is indicated by ICRC bit in the ERROR register and
1035 means that corruption occurred during data transfer. Up to
1036 ATA/ATAPI-7, the standard specifies that this bit is only
1037 applicable to UDMA transfers but ATA/ATAPI-8 draft revision
1038 1f says that the bit may be applicable to multiword DMA and
1039 PIO.
1040 </para>
1041 </listitem>
1042 </varlistentry>
1043
1044 <varlistentry>
1045 <term>ABRT error during data transfer or on completion</term>
1046 <listitem>
1047 <para>
1048 Up to ATA/ATAPI-7, the standard specifies that ABRT could be
1049 set on ICRC errors and on cases where a device is not able
1050 to complete a command. Combined with the fact that MWDMA
1051 and PIO transfer errors aren't allowed to use ICRC bit up to
1052 ATA/ATAPI-7, it seems to imply that ABRT bit alone could
1053 indicate transfer errors.
1054 </para>
1055 <para>
1056 However, ATA/ATAPI-8 draft revision 1f removes the part
1057 that ICRC errors can turn on ABRT. So, this is kind of
1058 gray area. Some heuristics are needed here.
1059 </para>
1060 </listitem>
1061 </varlistentry>
1062
1063 </variablelist>
1064
1065 <para>
1066 ATA/ATAPI device errors can be further categorized as follows.
1067 </para>
1068
1069 <variablelist>
1070
1071 <varlistentry>
1072 <term>Media errors</term>
1073 <listitem>
1074 <para>
1075 This is indicated by UNC bit in the ERROR register. ATA
1076 devices report a UNC error only after a certain number of
1077 retries fail to recover the data, so there's not much
1078 else to do other than notifying the upper layer.
1079 </para>
1080 <para>
1081 READ and WRITE commands report CHS or LBA of the first
1082 failed sector but ATA/ATAPI standard specifies that the
1083 amount of transferred data on error completion is
1084 indeterminate, so we cannot assume that sectors preceding
1085 the failed sector have been transferred and thus cannot
1086 complete those sectors successfully as SCSI does.
1087 </para>
1088 </listitem>
1089 </varlistentry>
1090
1091 <varlistentry>
1092 <term>Media changed / media change requested error</term>
1093 <listitem>
1094 <para>
1095 &lt;&lt;TODO: fill here&gt;&gt;
1096 </para>
1097 </listitem>
1098 </varlistentry>
1099
1100 <varlistentry><term>Address error</term>
1101 <listitem>
1102 <para>
1103 This is indicated by IDNF bit in the ERROR register.
1104 Report to upper layer.
1105 </para>
1106 </listitem>
1107 </varlistentry>
1108
1109 <varlistentry><term>Other errors</term>
1110 <listitem>
1111 <para>
1112 This can be invalid command or parameter indicated by ABRT
1113 ERROR bit or some other error condition. Note that ABRT
1114 bit can indicate a lot of things including ICRC and Address
1115 errors. Heuristics needed.
1116 </para>
1117 </listitem>
1118 </varlistentry>
1119
1120 </variablelist>
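<para>
The categories above can be sketched as a first-pass classification of
the ERROR register. This is a minimal illustration under stated
assumptions: the enum and function names are hypothetical, the bit
values are the standard ATA ERROR register bit positions, and the
order of the checks is an assumption rather than something the text
prescribes. ABRT alone lands in the "other" class here, so the
heuristics mentioned above would still have to be applied on top.
</para>

```c
#include <stdint.h>

/* ERROR register bits (standard ATA bit positions). */
#define ER_ICRC 0x80
#define ER_UNC  0x40
#define ER_IDNF 0x10
#define ER_ABRT 0x04

/* Hypothetical error classes, following the categories in the text. */
enum err_class {
	ERR_CLASS_BUS,      /* ICRC: treat as ATA bus error */
	ERR_CLASS_MEDIA,    /* UNC: media error */
	ERR_CLASS_ADDRESS,  /* IDNF: address error */
	ERR_CLASS_OTHER     /* ABRT or anything else; heuristics needed */
};

/* Hypothetical helper: classify an ERROR value read after !BSY && ERR. */
static enum err_class classify_dev_error(uint8_t error)
{
	if (error & ER_ICRC)
		return ERR_CLASS_BUS;
	if (error & ER_UNC)
		return ERR_CLASS_MEDIA;
	if (error & ER_IDNF)
		return ERR_CLASS_ADDRESS;
	return ERR_CLASS_OTHER;
}
```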
1121
1122 <para>
1123 Depending on commands, not all STATUS/ERROR bits are
1124 applicable. These non-applicable bits are marked with
1125 &quot;na&quot; in the output descriptions but up to ATA/ATAPI-7
1126 no definition of &quot;na&quot; can be found. However,
1127 ATA/ATAPI-8 draft revision 1f describes &quot;N/A&quot; as
1128 follows.
1129 </para>
1130
1131 <blockquote>
1132 <variablelist>
1133 <varlistentry><term>3.2.3.3a N/A</term>
1134 <listitem>
1135 <para>
1136 A keyword the indicates a field has no defined value in
1137 this standard and should not be checked by the host or
1138 device. N/A fields should be cleared to zero.
1139 </para>
1140 </listitem>
1141 </varlistentry>
1142 </variablelist>
1143 </blockquote>
1144
1145 <para>
1146 So, it seems reasonable to assume that &quot;na&quot; bits are
1147 cleared to zero by devices and thus need no explicit masking.
1148 </para>
1149
1150 </sect2>
1151
1152 <sect2 id="excatATAPIcc">
1153 <title>ATAPI device CHECK CONDITION</title>
1154
1155 <para>
1156 ATAPI device CHECK CONDITION error is indicated by set CHK bit
1157 (ERR bit) in the STATUS register after the last byte of CDB is
1158 transferred for a PACKET command. For this kind of error,
1159 sense data should be acquired to gather information regarding
1160 the errors. REQUEST SENSE packet command should be used to
1161 acquire sense data.
1162 </para>
1163
1164 <para>
1165 Once sense data is acquired, this type of error can be
1166 handled similarly to other SCSI errors. Note that sense data
1167 may indicate ATA bus error (e.g. Sense Key 04h HARDWARE ERROR
1168 &amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such
1169 cases, the error should be considered as an ATA bus error and
1170 handled according to <xref linkend="excatATAbusErr"/>.
1171 </para>
1172
1173 </sect2>
1174
1175 <sect2 id="excatNCQerr">
1176 <title>ATA device error (NCQ)</title>
1177
1178 <para>
1179 NCQ command error is indicated by cleared BSY and set ERR bit
1180 during NCQ command phase (one or more NCQ commands
1181 outstanding). Although STATUS and ERROR registers will
1182 contain valid values describing the error, READ LOG EXT is
1183 required to clear the error condition, determine which command
1184 has failed and acquire more information.
1185 </para>
1186
1187 <para>
1188 READ LOG EXT Log Page 10h reports which tag has failed and
1189 taskfile register values describing the error. With this
1190 information the failed command can be handled as a normal ATA
1191 command error as in <xref linkend="excatDevErr"/> and all
1192 other in-flight commands must be retried. Note that this
1193 retry should not be counted - it's likely that commands
1194 retried this way would have completed normally if it were not
1195 for the failed command.
1196 </para>
1197
1198 <para>
1199 Note that ATA bus errors can be reported as ATA device NCQ
1200 errors. This should be handled as described in <xref
1201 linkend="excatATAbusErr"/>.
1202 </para>
1203
1204 <para>
1205 If READ LOG EXT Log Page 10h fails or reports NQ, we're
1206 thoroughly screwed. This condition should be treated
1207 according to <xref linkend="excatHSMviolation"/>.
1208 </para>
1209
1210 </sect2>
1211
1212 <sect2 id="excatATAbusErr">
1213 <title>ATA bus error</title>
1214
1215 <para>
1216 ATA bus error means that data corruption occurred during
1217 transmission over ATA bus (SATA or PATA). This type of error
1218 can be indicated by
1219 </para>
1220
1221 <itemizedlist>
1222
1223 <listitem>
1224 <para>
1225 ICRC or ABRT error as described in <xref linkend="excatDevErr"/>.
1226 </para>
1227 </listitem>
1228
1229 <listitem>
1230 <para>
1231 Controller-specific error completion with error information
1232 indicating transmission error.
1233 </para>
1234 </listitem>
1235
1236 <listitem>
1237 <para>
1238 On some controllers, command timeout. In this case, there may
1239 be a mechanism to determine that the timeout is due to
1240 transmission error.
1241 </para>
1242 </listitem>
1243
1244 <listitem>
1245 <para>
1246 Unknown/random errors, timeouts and all sorts of weirdities.
1247 </para>
1248 </listitem>
1249
1250 </itemizedlist>
1251
1252 <para>
1253 As described above, transmission errors can cause a wide variety
1254 of symptoms ranging from device ICRC error to random device
1255 lockup, and, in many cases, there is no way to tell if an
1256 error condition is due to transmission error or not;
1257 therefore, it's necessary to employ some kind of heuristic
1258 when dealing with errors and timeouts. For example,
1259 encountering repetitive ABRT errors for a known supported
1260 command is likely to indicate an ATA bus error.
1261 </para>
1262
1263 <para>
1264 Once it's determined that ATA bus errors have possibly
1265 occurred, lowering ATA bus transmission speed is one of
1266 actions which may alleviate the problem. See <xref
1267 linkend="exrecReconf"/> for more information.
1268 </para>
1269
1270 </sect2>
1271
1272 <sect2 id="excatPCIbusErr">
1273 <title>PCI bus error</title>
1274
1275 <para>
1276 Data corruption or other failures during transmission over PCI
1277 (or other system bus). For standard BMDMA, this is indicated
1278 by Error bit in the BMDMA Status register. This type of
1279 error must be logged as it indicates something is very wrong
1280 with the system. Resetting host controller is recommended.
1281 </para>
1282
1283 </sect2>
1284
1285 <sect2 id="excatLateCompletion">
1286 <title>Late completion</title>
1287
1288 <para>
1289 This occurs when timeout occurs and the timeout handler finds
1290 out that the timed out command has completed successfully or
1291 with error. This is usually caused by lost interrupts. This
1292 type of error must be logged. Resetting host controller is
1293 recommended.
1294 </para>
1295
1296 </sect2>
1297
1298 <sect2 id="excatUnknown">
1299 <title>Unknown error (timeout)</title>
1300
1301 <para>
1302 This is when timeout occurs and the command is still
1303 processing or the host and device are in unknown state. When
1304 this occurs, HSM could be in any valid or invalid state. To
1305 bring the device to known state and make it forget about the
1306 timed out command, resetting is necessary. The timed out
1307 command may be retried.
1308 </para>
1309
1310 <para>
1311 Timeouts can also be caused by transmission errors. Refer to
1312 <xref linkend="excatATAbusErr"/> for more details.
1313 </para>
1314
1315 </sect2>
1316
1317 <sect2 id="excatHoplugPM">
1318 <title>Hotplug and power management exceptions</title>
1319
1320 <para>
1321 &lt;&lt;TODO: fill here&gt;&gt;
1322 </para>
1323
1324 </sect2>
1325
1326 </sect1>
1327
1328 <sect1 id="exrec">
1329 <title>EH recovery actions</title>
1330
1331 <para>
1332 This section discusses several important recovery actions.
1333 </para>
1334
1335 <sect2 id="exrecClr">
1336 <title>Clearing error condition</title>
1337
1338 <para>
1339 Many controllers require their error registers to be cleared by
1340 the error handler. Different controllers may have different
1341 requirements.
1342 </para>
1343
1344 <para>
1345 For SATA, it's strongly recommended to clear at least SError
1346 register during error handling.
1347 </para>
1348 </sect2>
1349
1350 <sect2 id="exrecRst">
1351 <title>Reset</title>
1352
1353 <para>
1354 During EH, resetting is necessary in the following cases.
1355 </para>
1356
1357 <itemizedlist>
1358
1359 <listitem>
1360 <para>
1361 HSM is in unknown or invalid state
1362 </para>
1363 </listitem>
1364
1365 <listitem>
1366 <para>
1367 HBA is in unknown or invalid state
1368 </para>
1369 </listitem>
1370
1371 <listitem>
1372 <para>
1373 EH needs to make HBA/device forget about in-flight commands
1374 </para>
1375 </listitem>
1376
1377 <listitem>
1378 <para>
1379 HBA/device behaves weirdly
1380 </para>
1381 </listitem>
1382
1383 </itemizedlist>
1384
1385 <para>
1386 Resetting during EH might be a good idea regardless of error
1387 condition to improve EH robustness. Whether to reset both or
1388 either one of HBA and device depends on situation but the
1389 following scheme is recommended.
1390 </para>
1391
1392 <itemizedlist>
1393
1394 <listitem>
1395 <para>
1396 When it's known that HBA is in ready state but ATA/ATAPI
1397 device is in unknown state, reset only device.
1398 </para>
1399 </listitem>
1400
1401 <listitem>
1402 <para>
1403 If HBA is in unknown state, reset both HBA and device.
1404 </para>
1405 </listitem>
1406
1407 </itemizedlist>
1408
1409 <para>
1410 HBA resetting is implementation specific. For a controller
1411 complying to taskfile/BMDMA PCI IDE, stopping active DMA
1412 transaction may be sufficient iff BMDMA state is the only HBA
1413 context. But even mostly taskfile/BMDMA PCI IDE complying
1414 controllers may have implementation specific requirements and
1415 mechanism to reset themselves. This must be addressed by
1416 specific drivers.
1417 </para>
1418
1419 <para>
1420 OTOH, ATA/ATAPI standard describes in detail ways to reset
1421 ATA/ATAPI devices.
1422 </para>
1423
1424 <variablelist>
1425
1426 <varlistentry><term>PATA hardware reset</term>
1427 <listitem>
1428 <para>
1429 This is hardware initiated device reset signalled with
1430 asserted PATA RESET- signal. There is no standard way to
1431 initiate hardware reset from software although some
1432 hardware provides registers that allow driver to directly
1433 tweak the RESET- signal.
1434 </para>
1435 </listitem>
1436 </varlistentry>
1437
1438 <varlistentry><term>Software reset</term>
1439 <listitem>
1440 <para>
1441 This is achieved by turning CONTROL SRST bit on for at
1442 least 5us. Both PATA and SATA support it but, in case of
1443 SATA, this may require controller-specific support as the
1444 second Register FIS to clear SRST should be transmitted
1445 while BSY bit is still set. Note that on PATA, this resets
1446 both master and slave devices on a channel.
1447 </para>
1448 </listitem>
1449 </varlistentry>
1450
1451 <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term>
1452 <listitem>
1453 <para>
1454 Although the ATA/ATAPI standard doesn't describe it exactly, EDD
1455 implies some level of resetting, possibly similar to that of a
1456 software reset. Host-side EDD protocol can be handled
1457 with normal command processing and most SATA controllers
1458 should be able to handle EDD's just like other commands.
1459 As in software reset, EDD affects both devices on a PATA
1460 bus.
1461 </para>
1462 <para>
1463 Although EDD does reset devices, this doesn't suit error
1464 handling as EDD cannot be issued while BSY is set and it's
1465 unclear how it will act when device is in unknown/weird
1466 state.
1467 </para>
1468 </listitem>
1469 </varlistentry>
1470
1471 <varlistentry><term>ATAPI DEVICE RESET command</term>
1472 <listitem>
1473 <para>
1474 This is very similar to software reset except that reset
1475 can be restricted to the selected device without affecting
1476 the other device sharing the cable.
1477 </para>
1478 </listitem>
1479 </varlistentry>
1480
1481 <varlistentry><term>SATA phy reset</term>
1482 <listitem>
1483 <para>
1484 This is the preferred way of resetting a SATA device. In
1485 effect, it's identical to PATA hardware reset. Note that
1486 this can be done with the standard SCR Control register.
1487 As such, it's usually easier to implement than software
1488 reset.
1489 </para>
1490 </listitem>
1491 </varlistentry>
1492
1493 </variablelist>
1494
1495 <para>
1496 One more thing to consider when resetting devices is that
1497 resetting clears certain configuration parameters and they
1498 need to be set to their previous or newly adjusted values
1499 after reset.
1500 </para>
1501
1502 <para>
1503 The affected parameters are:
1504 </para>
1505
1506 <itemizedlist>
1507
1508 <listitem>
1509 <para>
1510 CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used)
1511 </para>
1512 </listitem>
1513
1514 <listitem>
1515 <para>
1516 Parameters set with SET FEATURES including transfer mode setting
1517 </para>
1518 </listitem>
1519
1520 <listitem>
1521 <para>
1522 Block count set with SET MULTIPLE MODE
1523 </para>
1524 </listitem>
1525
1526 <listitem>
1527 <para>
1528 Other parameters (SET MAX, MEDIA LOCK...)
1529 </para>
1530 </listitem>
1531
1532 </itemizedlist>
1533
1534 <para>
1535 ATA/ATAPI standard specifies that some parameters must be
1536 maintained across hardware or software reset, but doesn't
1537 strictly specify all of them. Always reconfiguring needed
1538 parameters after reset is required for robustness. Note that
1539 this also applies when resuming from deep sleep (power-off).
1540 </para>
1541
1542 <para>
1543 Also, ATA/ATAPI standard requires that IDENTIFY DEVICE /
1544 IDENTIFY PACKET DEVICE is issued after any configuration
1545 parameter is updated or a hardware reset and the result used
1546 for further operation. OS driver is required to implement
1547 revalidation mechanism to support this.
1548 </para>
1549
1550 </sect2>
1551
1552 <sect2 id="exrecReconf">
1553 <title>Reconfigure transport</title>
1554
1555 <para>
1556 For both PATA and SATA, a lot of corners are cut for cheap
1557 connectors, cables or controllers and it's quite common to see
1558 high transmission error rate. This can be mitigated by
1559 lowering transmission speed.
1560 </para>
1561
1562 <para>
1563 The following is a possible scheme Jeff Garzik suggested.
1564 </para>
1565
1566 <blockquote>
1567 <para>
1568 If more than $N (3?) transmission errors happen in 15 minutes,
1569 </para>
1570 <itemizedlist>
1571 <listitem>
1572 <para>
1573 if SATA, decrease SATA PHY speed. if speed cannot be decreased,
1574 </para>
1575 </listitem>
1576 <listitem>
1577 <para>
1578 decrease UDMA xfer speed. if at UDMA0, switch to PIO4,
1579 </para>
1580 </listitem>
1581 <listitem>
1582 <para>
1583 decrease PIO xfer speed. if at PIO3, complain, but continue
1584 </para>
1585 </listitem>
1586 </itemizedlist>
1587 </blockquote>
1588
1589 </sect2>
1590
1591 </sect1>
1592
1593 </chapter>
1594
1595 <chapter id="PiixInt">
1596 <title>ata_piix Internals</title>
1597!Idrivers/ata/ata_piix.c
1598 </chapter>
1599
1600 <chapter id="SILInt">
1601 <title>sata_sil Internals</title>
1602!Idrivers/ata/sata_sil.c
1603 </chapter>
1604
1605 <chapter id="libataThanks">
1606 <title>Thanks</title>
1607 <para>
1608 The bulk of the ATA knowledge comes thanks to long conversations with
1609 Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA
1610 and SCSI specifications.
1611 </para>
1612 <para>
1613 Thanks to Alan Cox for pointing out similarities
1614 between SATA and SCSI, and in general for motivation to hack on
1615 libata.
1616 </para>
1617 <para>
1618 libata's device detection
1619 method, ata_pio_devchk, and in general all the early probing was
1620 based on extensive study of Hale Landis's probe/reset code in his
1621 ATADRVR driver (www.ata-atapi.com).
1622 </para>
1623 </chapter>
1624
1625</book>
diff --git a/Documentation/DocBook/librs.tmpl b/Documentation/DocBook/librs.tmpl
deleted file mode 100644
index 94f21361e0ed..000000000000
--- a/Documentation/DocBook/librs.tmpl
+++ /dev/null
@@ -1,289 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="Reed-Solomon-Library-Guide">
6 <bookinfo>
7 <title>Reed-Solomon Library Programming Interface</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Thomas</firstname>
12 <surname>Gleixner</surname>
13 <affiliation>
14 <address>
15 <email>tglx@linutronix.de</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2004</year>
23 <holder>Thomas Gleixner</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License version 2 as published by the Free Software Foundation.
31 </para>
32
33 <para>
34 This program is distributed in the hope that it will be
35 useful, but WITHOUT ANY WARRANTY; without even the implied
36 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
37 See the GNU General Public License for more details.
38 </para>
39
40 <para>
41 You should have received a copy of the GNU General Public
42 License along with this program; if not, write to the Free
43 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
44 MA 02111-1307 USA
45 </para>
46
47 <para>
48 For more details see the file COPYING in the source
49 distribution of Linux.
50 </para>
51 </legalnotice>
52 </bookinfo>
53
54<toc></toc>
55
56 <chapter id="intro">
57 <title>Introduction</title>
58 <para>
59 The generic Reed-Solomon Library provides encoding, decoding
60 and error correction functions.
61 </para>
62 <para>
63 Reed-Solomon codes are used in communication and storage
64 applications to ensure data integrity.
65 </para>
66 <para>
67 This documentation is provided for developers who want to utilize
68 the functions provided by the library.
69 </para>
70 </chapter>
71
72 <chapter id="bugs">
73 <title>Known Bugs And Assumptions</title>
74 <para>
75 None.
76 </para>
77 </chapter>
78
79 <chapter id="usage">
80 <title>Usage</title>
81 <para>
82 This chapter provides examples of how to use the library.
83 </para>
84 <sect1>
85 <title>Initializing</title>
86 <para>
87 The init function init_rs returns a pointer to an
88 rs decoder structure, which holds the necessary
89 information for encoding, decoding and error correction
90 with the given polynomial. It either uses an existing
91 matching decoder or creates a new one. On creation all
92 the lookup tables for fast en/decoding are created.
93 The function may take a while, so make sure not to
94 call it in critical code paths.
95 </para>
96 <programlisting>
97/* the Reed Solomon control structure */
98static struct rs_control *rs_decoder;
99
100/* Symbolsize is 10 (bits)
101 * Primitive polynomial is x^10+x^3+1
102 * first consecutive root is 0
103 * primitive element to generate roots = 1
104 * generator polynomial degree (number of roots) = 6
105 */
106rs_decoder = init_rs (10, 0x409, 0, 1, 6);
107 </programlisting>
108 </sect1>
109 <sect1>
110 <title>Encoding</title>
111 <para>
112 The encoder calculates the Reed-Solomon code over
113 the given data length and stores the result in
114 the parity buffer. Note that the parity buffer must
115 be initialized before calling the encoder.
116 </para>
117 <para>
118 The expanded data can be inverted on the fly by
119 providing a non-zero inversion mask. The expanded data is
120 XOR'ed with the mask. This is used e.g. for FLASH
121 ECC, where the all 0xFF is inverted to an all 0x00.
122 The Reed-Solomon code for all 0x00 is all 0x00. The
123 code is inverted before storing to FLASH so it is 0xFF
124 too. This prevents ECC errors when reading from an
125 erased FLASH.
126 </para>
127 <para>
128 The databytes are expanded to the given symbol size
129 on the fly. There is no support for encoding continuous
130 bitstreams with a symbol size != 8 at the moment. If
131 it is necessary, it should not be a big deal to implement
132 such functionality.
133 </para>
134 <programlisting>
135/* Parity buffer. Size = number of roots */
136uint16_t par[6];
137/* Initialize the parity buffer */
138memset(par, 0, sizeof(par));
139/* Encode 512 byte in data8. Store parity in buffer par */
140encode_rs8 (rs_decoder, data8, 512, par, 0);
141 </programlisting>
142 </sect1>
143 <sect1>
144 <title>Decoding</title>
145 <para>
146 The decoder calculates the syndrome over
147 the given data length and the received parity symbols
148 and corrects errors in the data.
149 </para>
150 <para>
151 If a syndrome is available from a hardware decoder
152 then the syndrome calculation is skipped.
153 </para>
154 <para>
155 The correction of the data buffer can be suppressed
156 by providing a correction pattern buffer and an error
157 location buffer to the decoder. The decoder stores the
158 calculated error location and the correction bitmask
159 in the given buffers. This is useful for hardware
160 decoders which use a weird bit ordering scheme.
161 </para>
162 <para>
163 The databytes are expanded to the given symbol size
164 on the fly. There is no support for decoding continuous
165 bitstreams with a symbolsize != 8 at the moment. If
166 it is necessary, it should not be a big deal to implement
167 such functionality.
168 </para>
169
170 <sect2>
171 <title>
172 Decoding with syndrome calculation, direct data correction
173 </title>
174 <programlisting>
175/* Parity buffer. Size = number of roots */
176uint16_t par[6];
177uint8_t data8[512];
178int numerr;
179/* Receive data */
180.....
181/* Receive parity */
182.....
183/* Decode 512 byte in data8.*/
184numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL);
185 </programlisting>
186 </sect2>
187
188 <sect2>
189 <title>
190 Decoding with syndrome given by hardware decoder, direct data correction
191 </title>
192 <programlisting>
193/* Parity buffer. Size = number of roots */
194uint16_t par[6], syn[6];
195uint8_t data8[512];
196int numerr;
197/* Receive data */
198.....
199/* Receive parity */
200.....
201/* Get syndrome from hardware decoder */
202.....
203/* Decode 512 byte in data8.*/
204numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL);
205 </programlisting>
206 </sect2>
207
208 <sect2>
209 <title>
210 Decoding with syndrome given by hardware decoder, no direct data correction.
211 </title>
212 <para>
213 Note: It's not necessary to give data and received parity to the decoder.
214 </para>
215 <programlisting>
216/* Parity buffer. Size = number of roots */
217uint16_t par[6], syn[6], corr[8];
218uint8_t data8[512];
219int i, numerr, errpos[8];
220/* Receive data */
221.....
222/* Receive parity */
223.....
224/* Get syndrome from hardware decoder */
225.....
226/* Decode 512 byte in data8.*/
227numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr);
228for (i = 0; i &lt; numerr; i++) {
229 do_error_correction_in_your_buffer(errpos[i], corr[i]);
230}
231 </programlisting>
232 </sect2>
233 </sect1>
234 <sect1>
235 <title>Cleanup</title>
236 <para>
237 The function free_rs frees the allocated resources,
238 if the caller is the last user of the decoder.
239 </para>
240 <programlisting>
241/* Release resources */
242free_rs(rs_decoder);
243 </programlisting>
244 </sect1>
245
246 </chapter>
247
248 <chapter id="structs">
249 <title>Structures</title>
250 <para>
251 This chapter contains the autogenerated documentation of the structures which are
252 used in the Reed-Solomon Library and are relevant for a developer.
253 </para>
254!Iinclude/linux/rslib.h
255 </chapter>
256
257 <chapter id="pubfunctions">
258 <title>Public Functions Provided</title>
259 <para>
260 This chapter contains the autogenerated documentation of the Reed-Solomon functions
261 which are exported.
262 </para>
263!Elib/reed_solomon/reed_solomon.c
264 </chapter>
265
266 <chapter id="credits">
267 <title>Credits</title>
268 <para>
269 The library code for encoding and decoding was written by Phil Karn.
270 </para>
271 <programlisting>
272 Copyright 2002, Phil Karn, KA9Q
273 May be used under the terms of the GNU General Public License (GPL)
274 </programlisting>
275 <para>
276 The wrapper functions and interfaces are written by Thomas Gleixner.
277 </para>
278 <para>
279 Many users have provided bugfixes, improvements and helping hands for testing.
280 Thanks a lot.
281 </para>
282 <para>
283 The following people have contributed to this document:
284 </para>
285 <para>
286 Thomas Gleixner<email>tglx@linutronix.de</email>
287 </para>
288 </chapter>
289</book>
diff --git a/Documentation/DocBook/lsm.tmpl b/Documentation/DocBook/lsm.tmpl
deleted file mode 100644
index fe7664ce9667..000000000000
--- a/Documentation/DocBook/lsm.tmpl
+++ /dev/null
@@ -1,265 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<article class="whitepaper" id="LinuxSecurityModule" lang="en">
6 <articleinfo>
7 <title>Linux Security Modules: General Security Hooks for Linux</title>
8 <authorgroup>
9 <author>
10 <firstname>Stephen</firstname>
11 <surname>Smalley</surname>
12 <affiliation>
13 <orgname>NAI Labs</orgname>
14 <address><email>ssmalley@nai.com</email></address>
15 </affiliation>
16 </author>
17 <author>
18 <firstname>Timothy</firstname>
19 <surname>Fraser</surname>
20 <affiliation>
21 <orgname>NAI Labs</orgname>
22 <address><email>tfraser@nai.com</email></address>
23 </affiliation>
24 </author>
25 <author>
26 <firstname>Chris</firstname>
27 <surname>Vance</surname>
28 <affiliation>
29 <orgname>NAI Labs</orgname>
30 <address><email>cvance@nai.com</email></address>
31 </affiliation>
32 </author>
33 </authorgroup>
34 </articleinfo>
35
36<sect1 id="Introduction"><title>Introduction</title>
37
38<para>
39In March 2001, the National Security Agency (NSA) gave a presentation
40about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel
41Summit. SELinux is an implementation of flexible and fine-grained
42nondiscretionary access controls in the Linux kernel, originally
43implemented as its own particular kernel patch. Several other
44security projects (e.g. RSBAC, Medusa) have also developed flexible
45access control architectures for the Linux kernel, and various
46projects have developed particular access control models for Linux
47(e.g. LIDS, DTE, SubDomain). Each project has developed and
48maintained its own kernel patch to support its security needs.
49</para>
50
51<para>
52In response to the NSA presentation, Linus Torvalds made a set of
53remarks that described a security framework he would be willing to
54consider for inclusion in the mainstream Linux kernel. He described a
55general framework that would provide a set of security hooks to
56control operations on kernel objects and a set of opaque security
57fields in kernel data structures for maintaining security attributes.
58This framework could then be used by loadable kernel modules to
59implement any desired model of security. Linus also suggested the
60possibility of migrating the Linux capabilities code into such a
61module.
62</para>
63
64<para>
65The Linux Security Modules (LSM) project was started by WireX to
66develop such a framework. LSM is a joint development effort by
67several security projects, including Immunix, SELinux, SGI and Janus,
68and several individuals, including Greg Kroah-Hartman and James
69Morris, to develop a Linux kernel patch that implements this
70framework. The patch is currently tracking the 2.4 series and is
71targeted for integration into the 2.5 development series. This
72technical report provides an overview of the framework and the example
73capabilities security module provided by the LSM kernel patch.
74</para>
75
76</sect1>
77
78<sect1 id="framework"><title>LSM Framework</title>
79
80<para>
81The LSM kernel patch provides a general kernel framework to support
82security modules. In particular, the LSM framework is primarily
83focused on supporting access control modules, although future
84development is likely to address other security needs such as
85auditing. By itself, the framework does not provide any additional
86security; it merely provides the infrastructure to support security
87modules. The LSM kernel patch also moves most of the capabilities
88logic into an optional security module, with the system defaulting
89to the traditional superuser logic. This capabilities module
90is discussed further in <xref linkend="cap"/>.
91</para>
92
93<para>
94The LSM kernel patch adds security fields to kernel data structures
95and inserts calls to hook functions at critical points in the kernel
96code to manage the security fields and to perform access control. It
97also adds functions for registering and unregistering security
98modules, and adds a general <function>security</function> system call
99to support new system calls for security-aware applications.
100</para>
101
102<para>
103The LSM security fields are simply <type>void*</type> pointers. For
104process and program execution security information, security fields
105were added to <structname>struct task_struct</structname> and
106<structname>struct linux_binprm</structname>. For filesystem security
107information, a security field was added to
108<structname>struct super_block</structname>. For pipe, file, and socket
109security information, security fields were added to
110<structname>struct inode</structname> and
111<structname>struct file</structname>. For packet and network device security
112information, security fields were added to
113<structname>struct sk_buff</structname> and
114<structname>struct net_device</structname>. For System V IPC security
115information, security fields were added to
116<structname>struct kern_ipc_perm</structname> and
117<structname>struct msg_msg</structname>; additionally, the definitions
118for <structname>struct msg_msg</structname>, <structname>struct
119msg_queue</structname>, and <structname>struct
120shmid_kernel</structname> were moved to header files
121(<filename>include/linux/msg.h</filename> and
122<filename>include/linux/shm.h</filename> as appropriate) to allow
123the security modules to use these definitions.
124</para>
125
126<para>
127Each LSM hook is a function pointer in a global table,
128security_ops. This table is a
129<structname>security_operations</structname> structure as defined by
130<filename>include/linux/security.h</filename>. Detailed documentation
131for each hook is included in this header file. At present, this
132structure consists of a collection of substructures that group related
133hooks based on the kernel object (e.g. task, inode, file, sk_buff)
134as well as some top-level hook function pointers for system
135operations. This structure is likely to be flattened in the future
136for performance. The placement of the hook calls in the kernel code
137is described by the "called:" lines in the per-hook documentation in
138the header file. The hook calls can also be easily found in the
139kernel code by looking for the string "security_ops->".
140
141</para>
142
143<para>
144Linus mentioned per-process security hooks in his original remarks as a
145possible alternative to global security hooks. However, if LSM were
146to start from the perspective of per-process hooks, then the base
147framework would have to deal with how to handle operations that
148involve multiple processes (e.g. kill), since each process might have
149its own hook for controlling the operation. This would require a
150general mechanism for composing hooks in the base framework.
151Additionally, LSM would still need global hooks for operations that
152have no process context (e.g. network input operations).
153Consequently, LSM provides global security hooks, but a security
154module is free to implement per-process hooks (where that makes sense)
155by storing a security_ops table in each process' security field and
156then invoking these per-process hooks from the global hooks.
157The problem of composition is thus deferred to the module.
158</para>
159
160<para>
161The global security_ops table is initialized to a set of hook
162functions provided by a dummy security module that provides
163traditional superuser logic. A <function>register_security</function>
164function (in <filename>security/security.c</filename>) is provided to
165allow a security module to set security_ops to refer to its own hook
166functions, and an <function>unregister_security</function> function is
167provided to revert security_ops to the dummy module hooks. This
168mechanism is used to set the primary security module, which is
169responsible for making the final decision for each hook.
170</para>
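<para>
The registration scheme described above can be sketched in user space.
This is a toy model, not the kernel code: the one-hook
<structname>security_operations</structname> layout, the
<function>inode_permission</function> signature, the -1 error values
and the read-only example module are all assumptions made for
illustration.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct security_operations. */
struct security_operations {
	int (*inode_permission)(int uid, int mask);
};

/* Dummy module: toy superuser logic (uid 0 may do anything). */
static int dummy_inode_permission(int uid, int mask)
{
	(void)mask;
	return uid == 0 ? 0 : -1;
}

static struct security_operations dummy_security_ops = {
	.inode_permission = dummy_inode_permission,
};

/* Global hook table, initialized to the dummy module's hooks. */
static struct security_operations *security_ops = &dummy_security_ops;

/* Install ops as the primary module; fail if one is already installed. */
static int register_security(struct security_operations *ops)
{
	if (ops == NULL || ops->inode_permission == NULL)
		return -1;	/* incomplete hook table */
	if (security_ops != &dummy_security_ops)
		return -1;	/* a primary module is already registered */
	security_ops = ops;
	return 0;
}

/* Revert the global table to the dummy module's hooks. */
static int unregister_security(struct security_operations *ops)
{
	if (ops != security_ops)
		return -1;	/* not the currently registered module */
	security_ops = &dummy_security_ops;
	return 0;
}

/* Hypothetical example module: deny every write (mask bit 2). */
static int ro_inode_permission(int uid, int mask)
{
	(void)uid;
	return (mask & 2) ? -1 : 0;
}

static struct security_operations ro_ops = {
	.inode_permission = ro_inode_permission,
};
```
]]></programlisting>
<para>
Callers always go through <varname>security_ops</varname>, so swapping
the pointer changes the policy for every hook call site at once.
</para>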
171
172<para>
173LSM also provides a simple mechanism for stacking additional security
174modules with the primary security module. It defines
175<function>register_security</function> and
176<function>unregister_security</function> hooks in the
177<structname>security_operations</structname> structure and provides
178<function>mod_reg_security</function> and
179<function>mod_unreg_security</function> functions that invoke these
180hooks after performing some sanity checking. A security module can
181call these functions in order to stack with other modules. However,
182the actual details of how this stacking is handled are deferred to the
183module, which can implement these hooks in any way it wishes
184(including always returning an error if it does not wish to support
185stacking). In this manner, LSM again defers the problem of
186composition to the module.
187</para>
188
189<para>
190Although the LSM hooks are organized into substructures based on
191kernel object, all of the hooks can be viewed as falling into two
192major categories: hooks that are used to manage the security fields
193and hooks that are used to perform access control. Examples of the
194first category of hooks include the
195<function>alloc_security</function> and
196<function>free_security</function> hooks defined for each kernel data
197structure that has a security field. These hooks are used to allocate
198and free security structures for kernel objects. The first category
199of hooks also includes hooks that set information in the security
200field after allocation, such as the <function>post_lookup</function>
201hook in <structname>struct inode_security_ops</structname>. This hook
202is used to set security information for inodes after successful lookup
203operations. An example of the second category of hooks is the
204<function>permission</function> hook in
205<structname>struct inode_security_ops</structname>. This hook checks
206permission when accessing an inode.
207</para>
208
209</sect1>
210
211<sect1 id="cap"><title>LSM Capabilities Module</title>
212
213<para>
214The LSM kernel patch moves most of the existing POSIX.1e capabilities
215logic into an optional security module stored in the file
216<filename>security/capability.c</filename>. This change allows
217users who do not want to use capabilities to omit this code entirely
218from their kernel, instead using the dummy module for traditional
219superuser logic or any other module that they desire. This change
220also allows the developers of the capabilities logic to maintain and
221enhance their code more freely, without needing to integrate patches
222back into the base kernel.
223</para>
224
225<para>
226In addition to moving the capabilities logic, the LSM kernel patch
227could move the capability-related fields from the kernel data
228structures into the new security fields managed by the security
229modules. However, at present, the LSM kernel patch leaves the
230capability fields in the kernel data structures. In his original
231remarks, Linus suggested that this might be preferable so that other
232security modules can be easily stacked with the capabilities module
233without needing to chain multiple security structures on the security field.
234It also avoids imposing extra overhead on the capabilities module
235to manage the security fields. However, the LSM framework could
236certainly support such a move if it is determined to be desirable,
237with only a few additional changes described below.
238</para>
239
240<para>
241At present, the capabilities logic for computing process capabilities
242on <function>execve</function> and <function>set*uid</function>,
243checking capabilities for a particular process, saving and checking
244capabilities for netlink messages, and handling the
245<function>capget</function> and <function>capset</function> system
246calls have been moved into the capabilities module. There are still a
247few locations in the base kernel where capability-related fields are
248directly examined or modified, but the current version of the LSM
249patch does allow a security module to completely replace the
250assignment and testing of capabilities. These few locations would
251need to be changed if the capability-related fields were moved into
252the security field. The following is a list of known locations that
253still perform such direct examination or modification of
254capability-related fields:
255<itemizedlist>
256<listitem><para><filename>fs/open.c</filename>:<function>sys_access</function></para></listitem>
257<listitem><para><filename>fs/lockd/host.c</filename>:<function>nlm_bind_host</function></para></listitem>
258<listitem><para><filename>fs/nfsd/auth.c</filename>:<function>nfsd_setuser</function></para></listitem>
259<listitem><para><filename>fs/proc/array.c</filename>:<function>task_cap</function></para></listitem>
260</itemizedlist>
261</para>
262
263</sect1>
264
265</article>
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
deleted file mode 100644
index b442921bca54..000000000000
--- a/Documentation/DocBook/mtdnand.tmpl
+++ /dev/null
@@ -1,1291 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="MTD-NAND-Guide">
6 <bookinfo>
7 <title>MTD NAND Driver Programming Interface</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Thomas</firstname>
12 <surname>Gleixner</surname>
13 <affiliation>
14 <address>
15 <email>tglx@linutronix.de</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2004</year>
23 <holder>Thomas Gleixner</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License version 2 as published by the Free Software Foundation.
31 </para>
32
33 <para>
34 This program is distributed in the hope that it will be
35 useful, but WITHOUT ANY WARRANTY; without even the implied
36 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
37 See the GNU General Public License for more details.
38 </para>
39
40 <para>
41 You should have received a copy of the GNU General Public
42 License along with this program; if not, write to the Free
43 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
44 MA 02111-1307 USA
45 </para>
46
47 <para>
48 For more details see the file COPYING in the source
49 distribution of Linux.
50 </para>
51 </legalnotice>
52 </bookinfo>
53
54<toc></toc>
55
56 <chapter id="intro">
57 <title>Introduction</title>
58 <para>
59 The generic NAND driver supports almost all NAND and AG-AND based
60 chips and connects them to the Memory Technology Devices (MTD)
61 subsystem of the Linux Kernel.
62 </para>
63 <para>
64 This documentation is provided for developers who want to implement
65 board drivers or filesystem drivers suitable for NAND devices.
66 </para>
67 </chapter>
68
69 <chapter id="bugs">
70 <title>Known Bugs And Assumptions</title>
71 <para>
72 None.
73 </para>
74 </chapter>
75
76 <chapter id="dochints">
77 <title>Documentation hints</title>
78 <para>
79 The function and structure docs are autogenerated. Each function and
80 struct member has a short description which is marked with an [XXX] identifier.
81 The following chapters explain the meaning of those identifiers.
82 </para>
83 <sect1 id="Function_identifiers_XXX">
84 <title>Function identifiers [XXX]</title>
85 <para>
86 The functions are marked with [XXX] identifiers in the short
87 comment. The identifiers explain the usage and scope of the
88 functions. The following identifiers are used:
89 </para>
90 <itemizedlist>
91 <listitem><para>
92 [MTD Interface]</para><para>
93 These functions provide the interface to the MTD kernel API.
94 They are not replaceable and provide functionality
95 which is completely hardware independent.
96 </para></listitem>
97 <listitem><para>
98 [NAND Interface]</para><para>
99 These functions are exported and provide the interface to the NAND kernel API.
100 </para></listitem>
101 <listitem><para>
102 [GENERIC]</para><para>
103 Generic functions are not replaceable and provide functionality
104 which is completely hardware independent.
105 </para></listitem>
106 <listitem><para>
107 [DEFAULT]</para><para>
108 Default functions provide hardware related functionality which is suitable
109 for most of the implementations. These functions can be replaced by the
110 board driver if necessary. Those functions are called via pointers in the
111 NAND chip description structure. The board driver can set the functions which
112 should be replaced by board dependent functions before calling nand_scan().
113 If the function pointer is NULL on entry to nand_scan() then the pointer
114 is set to the default function which is suitable for the detected chip type.
115 </para></listitem>
116 </itemizedlist>
117 </sect1>
118 <sect1 id="Struct_member_identifiers_XXX">
119 <title>Struct member identifiers [XXX]</title>
120 <para>
121 The struct members are marked with [XXX] identifiers in the
122 comment. The identifiers explain the usage and scope of the
123 members. The following identifiers are used:
124 </para>
125 <itemizedlist>
126 <listitem><para>
127 [INTERN]</para><para>
128 These members are for NAND driver internal use only and must not be
129 modified. Most of these values are calculated from the chip geometry
130 information which is evaluated during nand_scan().
131 </para></listitem>
132 <listitem><para>
133 [REPLACEABLE]</para><para>
134 Replaceable members hold hardware related functions which can be
135 provided by the board driver. The board driver can set the functions which
136 should be replaced by board dependent functions before calling nand_scan().
137 If the function pointer is NULL on entry to nand_scan() then the pointer
138 is set to the default function which is suitable for the detected chip type.
139 </para></listitem>
140 <listitem><para>
141 [BOARDSPECIFIC]</para><para>
142 Board specific members hold hardware related information which must
143 be provided by the board driver. The board driver must set the function
144 pointers and datafields before calling nand_scan().
145 </para></listitem>
146 <listitem><para>
147 [OPTIONAL]</para><para>
148 Optional members can hold information relevant for the board driver. The
149 generic NAND driver code does not use this information.
150 </para></listitem>
151 </itemizedlist>
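<para>
The rule for replaceable members (a NULL pointer is filled in with a
default during the scan, a non-NULL pointer is left alone) can be
sketched as follows. Everything here is a hypothetical user-space
model: <structname>model_chip</structname>,
<function>model_scan</function> and the two
<function>read_byte</function> helpers are illustrative names, not
part of the NAND driver.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced model of a chip description structure. */
struct model_chip {
	int (*read_byte)(struct model_chip *chip);	/* [REPLACEABLE] */
	int chip_delay;					/* [BOARDSPECIFIC] */
};

/* Default implementation, suitable for "most" boards in this model. */
static int default_read_byte(struct model_chip *chip)
{
	(void)chip;
	return 0xff;
}

/* Board dependent replacement supplied by a board driver. */
static int board_read_byte(struct model_chip *chip)
{
	(void)chip;
	return 0x42;
}

/* Stand-in for the part of the scan that installs default functions. */
static void model_scan(struct model_chip *chip)
{
	if (chip->read_byte == NULL)
		chip->read_byte = default_read_byte;
}
```
]]></programlisting>
<para>
A board driver that is happy with the default simply leaves the member
zeroed; one that needs special handling assigns its own function before
the scan.
</para>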
152 </sect1>
153 </chapter>
154
155 <chapter id="basicboarddriver">
156 <title>Basic board driver</title>
157 <para>
158 For most boards it will be sufficient to provide just the
159 basic functions and fill out the board dependent
160 members in the nand chip description structure.
161 </para>
162 <sect1 id="Basic_defines">
163 <title>Basic defines</title>
164 <para>
165 At a minimum you have to provide a nand_chip structure
166 and storage for the ioremap'ed chip address.
167 You can allocate the nand_chip structure using
168 kmalloc or you can allocate it statically.
169 The NAND chip structure embeds an mtd structure
170 which will be registered to the MTD subsystem.
171 You can extract a pointer to the mtd structure
172 from a nand_chip pointer using the nand_to_mtd()
173 helper.
174 </para>
175 <para>
176 Kmalloc based example
177 </para>
178 <programlisting>
static struct mtd_info *board_mtd;
static void __iomem *baseaddr;
181 </programlisting>
182 <para>
183 Static example
184 </para>
185 <programlisting>
static struct nand_chip board_chip;
static void __iomem *baseaddr;
188 </programlisting>
189 </sect1>
190 <sect1 id="Partition_defines">
191 <title>Partition defines</title>
192 <para>
193 If you want to divide your device into partitions, then
194 define a partitioning scheme suitable to your board.
195 </para>
196 <programlisting>
#define NUM_PARTITIONS 2
static struct mtd_partition partition_info[] = {
	{ .name = "Flash partition 1",
	  .offset = 0,
	  .size = 8 * 1024 * 1024 },
	{ .name = "Flash partition 2",
	  .offset = MTDPART_OFS_NEXT,
	  .size = MTDPART_SIZ_FULL },
};
206 </programlisting>
207 </sect1>
208 <sect1 id="Hardware_control_functions">
209 <title>Hardware control function</title>
210 <para>
211 The hardware control function provides access to the
212 control pins of the NAND chip(s).
213 The access can be done by GPIO pins or by address lines.
214 If you use address lines, make sure that the timing
215 requirements are met.
216 </para>
217 <para>
218 <emphasis>GPIO based example</emphasis>
219 </para>
220 <programlisting>
static void board_hwcontrol(struct mtd_info *mtd, int cmd)
{
	switch (cmd) {
	case NAND_CTL_SETCLE: /* Set CLE pin high */ break;
	case NAND_CTL_CLRCLE: /* Set CLE pin low */ break;
	case NAND_CTL_SETALE: /* Set ALE pin high */ break;
	case NAND_CTL_CLRALE: /* Set ALE pin low */ break;
	case NAND_CTL_SETNCE: /* Set nCE pin low */ break;
	case NAND_CTL_CLRNCE: /* Set nCE pin high */ break;
	}
}
232 </programlisting>
233 <para>
234 <emphasis>Address lines based example.</emphasis> It's assumed that the
235 nCE pin is driven by a chip select decoder.
236 </para>
237 <programlisting>
static void board_hwcontrol(struct mtd_info *mtd, int cmd)
{
	struct nand_chip *this = mtd_to_nand(mtd);

	switch (cmd) {
	case NAND_CTL_SETCLE: this->IO_ADDR_W |= CLE_ADRR_BIT; break;
	case NAND_CTL_CLRCLE: this->IO_ADDR_W &amp;= ~CLE_ADRR_BIT; break;
	case NAND_CTL_SETALE: this->IO_ADDR_W |= ALE_ADRR_BIT; break;
	case NAND_CTL_CLRALE: this->IO_ADDR_W &amp;= ~ALE_ADRR_BIT; break;
	}
}
248 </programlisting>
249 </sect1>
250 <sect1 id="Device_ready_function">
251 <title>Device ready function</title>
252 <para>
253 If the hardware interface has the ready busy pin of the NAND chip connected to a
254 GPIO or other accessible I/O pin, this function is used to read back the state of the
255 pin. The function has no arguments and should return 0 if the device is busy (R/B pin
256 is low) and 1 if the device is ready (R/B pin is high).
257 If the hardware interface does not give access to the ready busy pin, then
258 the function must not be defined and the function pointer this->dev_ready must be left NULL.
259 </para>
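<para>
A device ready function following the description above might look
like the sketch below. It is a user-space model: the variable
<varname>fake_gpio_in</varname> stands in for a memory-mapped GPIO
input register and <constant>BOARD_NAND_RB_BIT</constant> is a made-up
pin number. The argument-less signature follows the text of this
section; the exact callback prototype should be checked against the
nand_chip structure documentation.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: stand-in for a memory-mapped GPIO input register. */
static volatile uint32_t fake_gpio_in;

/* Hypothetical: bit position of the R/B pin on this board. */
#define BOARD_NAND_RB_BIT 5u

/* Return 1 if the device is ready (R/B high), 0 if busy (R/B low). */
static int board_dev_ready(void)
{
	return (int)((fake_gpio_in >> BOARD_NAND_RB_BIT) & 1u);
}
```
]]></programlisting>
<para>
On real hardware the read of <varname>fake_gpio_in</varname> would be
replaced by the platform's register accessor.
</para>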
260 </sect1>
261 <sect1 id="Init_function">
262 <title>Init function</title>
263 <para>
264 The init function allocates memory and sets up all the board
265 specific parameters and function pointers. When everything
266 is set up, nand_scan() is called. This function tries to
267 detect and identify the chip. If a chip is found, all the
268 internal data fields are initialized accordingly.
269 The structure(s) have to be zeroed out first and then filled with the necessary
270 information about the device.
271 </para>
272 <programlisting>
static int __init board_init(void)
{
	struct nand_chip *this;
	int err = 0;

	/* Allocate memory for MTD device structure and private data */
	this = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
	if (!this) {
		printk("Unable to allocate NAND MTD device structure.\n");
		err = -ENOMEM;
		goto out;
	}

	board_mtd = nand_to_mtd(this);

	/* map physical address */
	baseaddr = ioremap(CHIP_PHYSICAL_ADDRESS, 1024);
	if (!baseaddr) {
		printk("Ioremap to access NAND chip failed\n");
		err = -EIO;
		goto out_mtd;
	}

	/* Set address of NAND IO lines */
	this->IO_ADDR_R = baseaddr;
	this->IO_ADDR_W = baseaddr;
	/* Reference hardware control function */
	this->hwcontrol = board_hwcontrol;
	/* Set command delay time, see datasheet for correct value */
	this->chip_delay = CHIP_DEPENDEND_COMMAND_DELAY;
	/* Assign the device ready function, if available */
	this->dev_ready = board_dev_ready;
	this->eccmode = NAND_ECC_SOFT;

	/* Scan to find existence of the device */
	if (nand_scan(board_mtd, 1)) {
		err = -ENXIO;
		goto out_ior;
	}

	add_mtd_partitions(board_mtd, partition_info, NUM_PARTITIONS);
	goto out;

out_ior:
	iounmap(baseaddr);
out_mtd:
	kfree(this);
out:
	return err;
}
module_init(board_init);
324 </programlisting>
325 </sect1>
326 <sect1 id="Exit_function">
327 <title>Exit function</title>
328 <para>
329 The exit function is only necessary if the driver is
330 compiled as a module. It releases all resources which
331 are held by the chip driver and unregisters the partitions
332 in the MTD layer.
333 </para>
334 <programlisting>
#ifdef MODULE
static void __exit board_cleanup(void)
{
	/* Release resources, unregister device */
	nand_release(board_mtd);

	/* unmap physical address */
	iounmap(baseaddr);

	/* Free the MTD device structure */
	kfree(mtd_to_nand(board_mtd));
}
module_exit(board_cleanup);
#endif
349 </programlisting>
350 </sect1>
351 </chapter>
352
353 <chapter id="boarddriversadvanced">
354 <title>Advanced board driver functions</title>
355 <para>
356 This chapter describes the advanced functionality of the NAND
357 driver. For a list of functions which can be overridden by the board
358 driver see the documentation of the nand_chip structure.
359 </para>
360 <sect1 id="Multiple_chip_control">
361 <title>Multiple chip control</title>
362 <para>
363 The nand driver can control chip arrays. To support this, the
364 board driver must provide its own select_chip function. This
365 function must (de)select the requested chip.
366 The function pointer in the nand_chip structure must
367 be set before calling nand_scan(). The maxchip parameter
368 of nand_scan() defines the maximum number of chips to
369 scan for. Make sure that the select_chip function can
370 handle the requested number of chips.
371 </para>
372 <para>
373 The nand driver concatenates the chips to one virtual
374 chip and provides this virtual chip to the MTD layer.
375 </para>
376 <para>
377 <emphasis>Note: The driver can only handle linear chip arrays
378 of equally sized chips. There is no support for
379 parallel arrays which extend the buswidth.</emphasis>
380 </para>
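<para>
For a linear array of equally sized chips, an offset in the virtual
chip maps to a chip number and an offset inside that chip. The sketch
below shows the arithmetic only; <structname>chip_addr</structname>
and <function>split_offset</function> are hypothetical names, and the
driver's own implementation may differ (for example by using shifts
instead of division).
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: chip index plus offset inside that chip. */
struct chip_addr {
	int chip;		/* -1 if the offset is outside the array */
	uint64_t offset;
};

/* Split a linear offset across num_chips chips of chip_size bytes each. */
static struct chip_addr split_offset(uint64_t linear, uint64_t chip_size,
				     int num_chips)
{
	struct chip_addr a = { -1, 0 };

	if (chip_size == 0 || num_chips <= 0)
		return a;
	if (linear / chip_size >= (uint64_t)num_chips)
		return a;	/* beyond the end of the virtual chip */

	a.chip = (int)(linear / chip_size);
	a.offset = linear % chip_size;
	return a;
}
```
]]></programlisting>
<para>
The chip number obtained this way is what a select_chip function would
be asked to select before the access.
</para>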
381 <para>
382 <emphasis>GPIO based example</emphasis>
383 </para>
384 <programlisting>
static void board_select_chip(struct mtd_info *mtd, int chip)
{
	/* Deselect all chips, set all nCE pins high */
	GPIO(BOARD_NAND_NCE) |= 0xff;
	if (chip >= 0)
		GPIO(BOARD_NAND_NCE) &amp;= ~(1 &lt;&lt; chip);
}
392 </programlisting>
393 <para>
394 <emphasis>Address lines based example.</emphasis>
395 It's assumed that the nCE pins are connected to an
396 address decoder.
397 </para>
398 <programlisting>
static void board_select_chip(struct mtd_info *mtd, int chip)
{
	struct nand_chip *this = mtd_to_nand(mtd);

	/* Deselect all chips */
	this->IO_ADDR_R &amp;= ~BOARD_NAND_ADDR_MASK;
	this->IO_ADDR_W &amp;= ~BOARD_NAND_ADDR_MASK;
	switch (chip) {
	case 0:
		this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0;
		this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0;
		break;
	....
	case n:
		this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn;
		this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn;
		break;
	}
}
418 </programlisting>
419 </sect1>
420 <sect1 id="Hardware_ECC_support">
421 <title>Hardware ECC support</title>
422 <sect2 id="Functions_and_constants">
423 <title>Functions and constants</title>
424 <para>
425 The nand driver supports four different types of
426 hardware ECC.
427 <itemizedlist>
428 <listitem><para>NAND_ECC_HW3_256</para><para>
429 Hardware ECC generator providing 3 bytes ECC per
430 256 byte.
431 </para> </listitem>
432 <listitem><para>NAND_ECC_HW3_512</para><para>
433 Hardware ECC generator providing 3 bytes ECC per
434 512 byte.
435 </para> </listitem>
436 <listitem><para>NAND_ECC_HW6_512</para><para>
437 Hardware ECC generator providing 6 bytes ECC per
438 512 byte.
439 </para> </listitem>
440 <listitem><para>NAND_ECC_HW8_512</para><para>
441 Hardware ECC generator providing 8 bytes ECC per
442 512 byte.
443 </para> </listitem>
444 </itemizedlist>
445 If your hardware generator has a different functionality,
446 add it at the appropriate place in nand_base.c.
447 </para>
448 <para>
449 The board driver must provide the following functions:
450 <itemizedlist>
451 <listitem><para>enable_hwecc</para><para>
452 This function is called before reading / writing to
453 the chip. Reset or initialize the hardware generator
454 in this function. The function is called with an
455 argument which lets you distinguish between read
456 and write operations.
457 </para> </listitem>
458 <listitem><para>calculate_ecc</para><para>
459 This function is called after read / write from / to
460 the chip. Transfer the ECC from the hardware to
461 the buffer. If the option NAND_HWECC_SYNDROME is set
462 then the function is only called on write. See below.
463 </para> </listitem>
464 <listitem><para>correct_data</para><para>
465 In case of an ECC error this function is called for
466 error detection and correction. Return 1 or 2
467 if the error can be corrected. If the error is
468 not correctable return -1. If your hardware generator
469 matches the default algorithm of the nand_ecc software
470 generator then use the correction function provided
471 by nand_ecc instead of implementing duplicated code.
472 </para> </listitem>
473 </itemizedlist>
474 </para>
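<para>
The hardware ECC types differ in how many ECC bytes they produce per
block of data, which determines how much of the spare area a page
needs for ECC. The sketch below works through that arithmetic. The
<structname>ecc_geometry</structname> table and
<function>ecc_bytes_per_page</function> are hypothetical helpers, and
the table assumes, as its name suggests, that NAND_ECC_HW8_512 yields
8 bytes of ECC per 512 bytes.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>

/* Hypothetical descriptor: one row per hardware ECC type. */
struct ecc_geometry {
	const char *name;
	unsigned int step_size;	/* data bytes covered by one ECC step */
	unsigned int ecc_bytes;	/* ECC bytes produced per step */
};

static const struct ecc_geometry ecc_types[] = {
	{ "NAND_ECC_HW3_256", 256, 3 },
	{ "NAND_ECC_HW3_512", 512, 3 },
	{ "NAND_ECC_HW6_512", 512, 6 },
	{ "NAND_ECC_HW8_512", 512, 8 },	/* assumed from the name */
};

/*
 * ECC bytes needed for one page, or 0 if the page size is not a
 * whole multiple of the ECC step size.
 */
static unsigned int ecc_bytes_per_page(const struct ecc_geometry *g,
				       unsigned int page_size)
{
	if (g->step_size == 0 || page_size % g->step_size != 0)
		return 0;
	return (page_size / g->step_size) * g->ecc_bytes;
}
```
]]></programlisting>
<para>
For a 2048 byte page this gives 24, 12, 24 and 32 ECC bytes for the
four types in table order.
</para>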
475 </sect2>
476 <sect2 id="Hardware_ECC_with_syndrome_calculation">
477 <title>Hardware ECC with syndrome calculation</title>
478 <para>
479 Many hardware ECC implementations provide Reed-Solomon
480 codes and calculate an error syndrome on read. The syndrome
481 must be converted to a standard Reed-Solomon syndrome
482 before calling the error correction code in the generic
483 Reed-Solomon library.
484 </para>
485 <para>
486 The ECC bytes must be placed immediately after the data
487 bytes in order to make the syndrome generator work. This
488 is contrary to the usual layout used by software ECC. The
489 separation of data and out of band area is no longer
490 possible. The nand driver code handles this layout and
491 the remaining free bytes in the oob area are managed by
492 the autoplacement code. Provide a matching oob-layout
493 in this case. See rts_from4.c and diskonchip.c for
494 implementation reference. In those cases we must also
495 use bad block tables on FLASH, because the ECC layout is
496 interfering with the bad block marker positions.
497 See bad block table support for details.
498 </para>
499 </sect2>
500 </sect1>
501 <sect1 id="Bad_Block_table_support">
502 <title>Bad block table support</title>
503 <para>
504 Most NAND chips mark the bad blocks at a defined
505 position in the spare area. Those blocks must
506 not be erased under any circumstances as the bad
507 block information would be lost.
508 It is possible to check the bad block mark each
509 time when the blocks are accessed by reading the
510 spare area of the first page in the block. This
511 is time consuming so a bad block table is used.
512 </para>
513 <para>
514 The nand driver supports various types of bad block
515 tables.
516 <itemizedlist>
517 <listitem><para>Per device</para><para>
518 The bad block table contains all bad block information
519 of the device which can consist of multiple chips.
520 </para> </listitem>
521 <listitem><para>Per chip</para><para>
522 A bad block table is used per chip and contains the
523 bad block information for this particular chip.
524 </para> </listitem>
525 <listitem><para>Fixed offset</para><para>
526 The bad block table is located at a fixed offset
527 in the chip (device). This applies to various
528 DiskOnChip devices.
529 </para> </listitem>
530 <listitem><para>Automatically placed</para><para>
531 The bad block table is automatically placed and
532 detected either at the end or at the beginning
533 of a chip (device).
534 </para> </listitem>
535 <listitem><para>Mirrored tables</para><para>
536 The bad block table is mirrored on the chip (device) to
537 allow updates of the bad block table without data loss.
538 </para> </listitem>
539 </itemizedlist>
540 </para>
541 <para>
542 nand_scan() calls the function nand_default_bbt().
543 nand_default_bbt() selects appropriate default
544 bad block table descriptors depending on the chip information
545 which was retrieved by nand_scan().
546 </para>
547 <para>
548 The standard policy is to scan the device for bad
549 blocks and build a RAM based bad block table, which
550 allows faster access than always checking the
551 bad block information on the flash chip itself.
552 </para>
553 <sect2 id="Flash_based_tables">
554 <title>Flash based tables</title>
555 <para>
556 It may be desired or necessary to keep a bad block table in FLASH.
557 For AG-AND chips this is mandatory, as they have no factory marked
558 bad blocks. They have factory marked good blocks. The marker pattern
559 is erased when the block is erased to be reused. So in case of
560 power loss before writing the pattern back to the chip this block
561 would be lost and added to the bad blocks. Therefore we scan the
562 chip(s) when we detect them the first time for good blocks and
563 store this information in a bad block table before erasing any
564 of the blocks.
565 </para>
566 <para>
567 The blocks in which the tables are stored are protected against
568 accidental access by marking them bad in the memory bad block
569 table. The bad block table management functions are allowed
570 to circumvent this protection.
571 </para>
572 <para>
573 The simplest way to activate the FLASH based bad block table support
574 is to set the option NAND_BBT_USE_FLASH in the bbt_option field of
575 the nand chip structure before calling nand_scan(). For AG-AND
576 chips this is done by default.
577 This activates the default FLASH based bad block table functionality
578 of the NAND driver. The default bad block table options are
579 <itemizedlist>
580 <listitem><para>Store bad block table per chip</para></listitem>
581 <listitem><para>Use 2 bits per block</para></listitem>
582 <listitem><para>Automatic placement at the end of the chip</para></listitem>
583 <listitem><para>Use mirrored tables with version numbers</para></listitem>
584 <listitem><para>Reserve 4 blocks at the end of the chip</para></listitem>
585 </itemizedlist>
586 </para>
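<para>
Storing 2 bits per block means four blocks share one byte of the
table. The sketch below shows one way to pack and read such entries.
It is an illustration only: the helper names and the bit order within
a byte are assumptions, and the meaning of the individual 2 bit values
is left open rather than taken from the driver.
</para>
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BBT_BITS_PER_BLOCK	2u
#define BBT_BLOCKS_PER_BYTE	(8u / BBT_BITS_PER_BLOCK)
#define BBT_ENTRY_MASK		0x3u

/* Number of table bytes needed to cover num_blocks blocks. */
static size_t bbt_size_bytes(unsigned int num_blocks)
{
	return (num_blocks + BBT_BLOCKS_PER_BYTE - 1) / BBT_BLOCKS_PER_BYTE;
}

/* Read the 2 bit entry for one block. */
static unsigned int bbt_get(const uint8_t *bbt, unsigned int block)
{
	unsigned int shift = (block % BBT_BLOCKS_PER_BYTE) * BBT_BITS_PER_BLOCK;

	return (bbt[block / BBT_BLOCKS_PER_BYTE] >> shift) & BBT_ENTRY_MASK;
}

/* Overwrite the 2 bit entry for one block, leaving its neighbours alone. */
static void bbt_set(uint8_t *bbt, unsigned int block, unsigned int val)
{
	unsigned int shift = (block % BBT_BLOCKS_PER_BYTE) * BBT_BITS_PER_BLOCK;
	uint8_t *byte = &bbt[block / BBT_BLOCKS_PER_BYTE];

	*byte = (uint8_t)((*byte & ~(BBT_ENTRY_MASK << shift)) |
			  ((val & BBT_ENTRY_MASK) << shift));
}
```
]]></programlisting>
<para>
With this packing, a chip with 1024 blocks needs a 256 byte table.
</para>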
587 </sect2>
588 <sect2 id="User_defined_tables">
589 <title>User defined tables</title>
590 <para>
591 User defined tables are created by filling out a
592 nand_bbt_descr structure and storing the pointer in the
593 nand_chip structure member bbt_td before calling nand_scan().
594 If a mirror table is necessary a second structure must be
595 created and a pointer to this structure must be stored
596 in bbt_md inside the nand_chip structure. If the bbt_md
597 member is set to NULL then only the main table is used
598 and no scan for the mirrored table is performed.
599 </para>
600 <para>
601 The most important field in the nand_bbt_descr structure
602 is the options field. The options define most of the
603 table properties. Use the predefined constants from
604 nand.h to define the options.
605 <itemizedlist>
606 <listitem><para>Number of bits per block</para>
607 <para>The supported number of bits is 1, 2, 4, 8.</para></listitem>
608 <listitem><para>Table per chip</para>
609 <para>Setting the constant NAND_BBT_PERCHIP selects that
610 a bad block table is managed for each chip in a chip array.
611 If this option is not set then a per device bad block table
612 is used.</para></listitem>
613 <listitem><para>Table location is absolute</para>
614 <para>Use the option constant NAND_BBT_ABSPAGE and
615 define the absolute page number where the bad block
616 table starts in the field pages. If you have selected bad block
617 tables per chip and you have a multi chip array then the start page
618 must be given for each chip in the chip array. Note: there is no scan
619 for a table ident pattern performed, so the fields
620 pattern, veroffs, offs, len can be left uninitialized.</para></listitem>
621 <listitem><para>Table location is automatically detected</para>
622 <para>The table can be located in either the first or the last good
623 blocks of the chip (device). Set NAND_BBT_LASTBLOCK to place
624 the bad block table at the end of the chip (device). The
625 bad block tables are marked and identified by a pattern which
626 is stored in the spare area of the first page in the block which
627 holds the bad block table. Store a pointer to the pattern
628 in the pattern field. Further the length of the pattern has to be
629 stored in len and the offset in the spare area must be given
630 in the offs member of the nand_bbt_descr structure. For mirrored
631 bad block tables different patterns are mandatory.</para></listitem>
632 <listitem><para>Table creation</para>
633 <para>Set the option NAND_BBT_CREATE to enable the table creation
634 if no table can be found during the scan. Usually this is done only
635 once if a new chip is found. </para></listitem>
636 <listitem><para>Table write support</para>
637 <para>Set the option NAND_BBT_WRITE to enable the table write support.
638 This allows the update of the bad block table(s) in case a block has
639 to be marked bad due to wear. The MTD interface function block_markbad
640 calls the update function of the bad block table. If the write
641 support is enabled then the table is updated on FLASH.</para>
642 <para>
643 Note: Write support should only be enabled for mirrored tables with
644 version control.
645 </para></listitem>
646 <listitem><para>Table version control</para>
647 <para>Set the option NAND_BBT_VERSION to enable the table version control.
648 It's highly recommended to enable this for mirrored tables with write
649 support. It makes sure that the risk of losing the bad block
650 table information is reduced to the loss of the information about the
651 one worn out block which should be marked bad. The version is stored in
652 4 consecutive bytes in the spare area of the device. The position of
653 the version number is defined by the member veroffs in the bad block table
654 descriptor.</para></listitem>
655 <listitem><para>Save block contents on write</para>
656 <para>
657 If the block which holds the bad block table also contains
658 other useful information, set the option NAND_BBT_SAVECONTENT. When
659 the bad block table is written, the whole block is read, the bad
660 block table is updated, the block is erased and everything is
661 written back. If this option is not set, only the bad block table
662 is written and everything else in the block is ignored and erased.
663 </para></listitem>
664 <listitem><para>Number of reserved blocks</para>
665 <para>
666 For automatic placement some blocks must be reserved for
667 bad block table storage. The number of reserved blocks is defined
668 in the maxblocks member of the bad block table description structure.
669 Reserving 4 blocks for mirrored tables should be a reasonable number.
670 This also limits the number of blocks which are scanned for the bad
671 block table ident pattern.
672 </para></listitem>
673 </itemizedlist>
674 </para>
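 <para>
 The options above can be combined as in the following minimal sketch.
 It is not the driver's code: struct bbt_descr_sketch is a hypothetical,
 simplified stand-in for nand_bbt_descr that holds only the fields named
 in this section, the pattern bytes and offsets are illustrative values,
 and bbt_pair_is_sane() is a helper invented here to restate two rules
 from the text (write support only for mirrored tables with version
 control, and different patterns for mirrored tables).
 </para>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Option values as listed in the "Bad block table related constants" section. */
#define NAND_BBT_2BIT      0x00000002
#define NAND_BBT_LASTBLOCK 0x00000010
#define NAND_BBT_PERCHIP   0x00000080
#define NAND_BBT_VERSION   0x00000100
#define NAND_BBT_CREATE    0x00000200
#define NAND_BBT_WRITE     0x00001000

/* Hypothetical, simplified stand-in for struct nand_bbt_descr. */
struct bbt_descr_sketch {
	unsigned int options;   /* OR'ed NAND_BBT_* constants */
	int offs;               /* offset of the ident pattern in the spare area */
	int len;                /* length of the ident pattern */
	int veroffs;            /* offset of the version number in the spare area */
	int maxblocks;          /* blocks reserved and scanned for the table */
	const uint8_t *pattern; /* ident pattern */
};

/* Illustrative patterns; mirrored tables need different ones. */
static const uint8_t sketch_main_pattern[]   = { 'B', 'b', 't', '0' };
static const uint8_t sketch_mirror_pattern[] = { '1', 't', 'b', 'B' };

#define SKETCH_BBT_OPTIONS (NAND_BBT_LASTBLOCK | NAND_BBT_CREATE | \
			    NAND_BBT_WRITE | NAND_BBT_2BIT | \
			    NAND_BBT_VERSION | NAND_BBT_PERCHIP)

static const struct bbt_descr_sketch sketch_main = {
	.options = SKETCH_BBT_OPTIONS,
	.offs = 8, .len = 4, .veroffs = 12, .maxblocks = 4,
	.pattern = sketch_main_pattern,
};

static const struct bbt_descr_sketch sketch_mirror = {
	.options = SKETCH_BBT_OPTIONS,
	.offs = 8, .len = 4, .veroffs = 12, .maxblocks = 4,
	.pattern = sketch_mirror_pattern,
};

/* Return 1 if the main (td) and optional mirror (md) descriptors follow
 * the two rules restated in the lead-in, 0 otherwise. */
static int bbt_pair_is_sane(const struct bbt_descr_sketch *td,
			    const struct bbt_descr_sketch *md)
{
	if (td == NULL)
		return 0;
	if (td->options & NAND_BBT_WRITE) {
		/* Write support wants a mirror and version control on both. */
		if (md == NULL)
			return 0;
		if (!(td->options & NAND_BBT_VERSION) ||
		    !(md->options & NAND_BBT_VERSION))
			return 0;
	}
	if (md != NULL && td->len == md->len &&
	    memcmp(td->pattern, md->pattern, (size_t)td->len) == 0)
		return 0; /* identical patterns cannot be told apart */
	return 1;
}
```

 <para>
 In a real board driver the two descriptors would be stored in the
 bbt_td and bbt_md members of the nand_chip structure before calling
 nand_scan(), as described at the start of this section.
 </para>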
675 </sect2>
676 </sect1>
677 <sect1 id="Spare_area_placement">
678 <title>Spare area (auto)placement</title>
679 <para>
680 The nand driver implements different possibilities for
681 placement of filesystem data in the spare area:
682 <itemizedlist>
683 <listitem><para>Placement defined by fs driver</para></listitem>
684 <listitem><para>Automatic placement</para></listitem>
685 </itemizedlist>
686 The default placement function is automatic placement. The
687 nand driver has built in default placement schemes for the
688 various chiptypes. If due to hardware ECC functionality the
689 default placement does not fit then the board driver can
690 provide its own placement scheme.
691 </para>
692 <para>
693 File system drivers can provide their own placement scheme which
694 is used instead of the default placement scheme.
695 </para>
696 <para>
697 Placement schemes are defined by a nand_oobinfo structure:
698 <programlisting>
699struct nand_oobinfo {
700 int useecc;
701 int eccbytes;
702 int eccpos[24];
703 int oobfree[8][2];
704};
705 </programlisting>
706 <itemizedlist>
707 <listitem><para>useecc</para><para>
708 The useecc member controls the ecc and placement function. The header
709 file include/mtd/mtd-abi.h contains constants to select ecc and
710 placement. MTD_NANDECC_OFF switches off the ecc completely. This is
711 not recommended and is available for testing and diagnosis only.
712 MTD_NANDECC_PLACE selects caller defined placement, MTD_NANDECC_AUTOPLACE
713 selects automatic placement.
714 </para></listitem>
715 <listitem><para>eccbytes</para><para>
716 The eccbytes member defines the number of ecc bytes per page.
717 </para></listitem>
718 <listitem><para>eccpos</para><para>
719 The eccpos array holds the byte offsets in the spare area where
720 the ecc codes are placed.
721 </para></listitem>
722 <listitem><para>oobfree</para><para>
723 The oobfree array defines the areas in the spare area which can be
724 used for automatic placement. The information is given in the format
725 {offset, size}. offset defines the start of the usable area, size the
726 length in bytes. More than one area can be defined. The list is terminated
727 by a {0, 0} entry.
728 </para></listitem>
729 </itemizedlist>
730 </para>
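 <para>
 As a rough illustration of the oobfree format, the sketch below walks
 the {offset, size} pairs up to the {0, 0} terminator and sums the sizes.
 The structure layout is copied from the listing above; the helper
 oob_free_bytes() and the example layout sketch_oob_512 are assumptions
 made for this example, and the useecc value 2 is only assumed to
 correspond to MTD_NANDECC_AUTOPLACE.
 </para>

```c
/* Layout copied from the listing above. */
struct nand_oobinfo {
	int useecc;
	int eccbytes;
	int eccpos[24];
	int oobfree[8][2];
};

/* Sum the size fields of oobfree until the {0, 0} terminator
 * (or the end of the array) is reached. */
static int oob_free_bytes(const struct nand_oobinfo *oi)
{
	int total = 0;
	int i;

	for (i = 0; i < 8; i++) {
		if (oi->oobfree[i][0] == 0 && oi->oobfree[i][1] == 0)
			break;
		total += oi->oobfree[i][1];
	}
	return total;
}

/* Illustrative scheme following the 512 byte pagesize table:
 * ECC at offsets 0-3, 6 and 7, free area at 0x08-0x0F. */
static const struct nand_oobinfo sketch_oob_512 = {
	.useecc = 2, /* assumed value of MTD_NANDECC_AUTOPLACE */
	.eccbytes = 6,
	.eccpos = { 0, 1, 2, 3, 6, 7 },
	.oobfree = { { 8, 8 } },
};
```

 <para>
 For sketch_oob_512 the helper reports 8 free bytes, which is the size
 of the per-page buffer a caller would supply for automatic placement.
 </para>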
731 <sect2 id="Placement_defined_by_fs_driver">
732 <title>Placement defined by fs driver</title>
733 <para>
734 The calling function provides a pointer to a nand_oobinfo
735 structure which defines the ecc placement. For writes the
736 caller must provide a spare area buffer along with the
737 data buffer. The spare area buffer size is (number of pages) *
738 (size of spare area). For reads the buffer size is
739 (number of pages) * ((size of spare area) + (number of ecc
740 steps per page) * sizeof (int)). The driver stores the
741 result of the ecc check for each tuple in the spare buffer.
742 The storage sequence is
743 </para>
744 <para>
745 &lt;spare data page 0&gt;&lt;ecc result 0&gt;...&lt;ecc result n&gt;
746 </para>
747 <para>
748 ...
749 </para>
750 <para>
751 &lt;spare data page n&gt;&lt;ecc result 0&gt;...&lt;ecc result n&gt;
752 </para>
753 <para>
754 This is a legacy mode used by YAFFS1.
755 </para>
756 <para>
757 If the spare area buffer is NULL then only the ECC placement is
758 done according to the given scheme in the nand_oobinfo structure.
759 </para>
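 <para>
 The two buffer size formulas given above can be written down directly.
 The function and parameter names below are chosen for this sketch only
 and are not part of the driver.
 </para>

```c
#include <stddef.h>

/* Spare buffer size for writes: (number of pages) * (size of spare area). */
static size_t oob_write_buf_size(size_t pages, size_t oobsize)
{
	return pages * oobsize;
}

/* Spare buffer size for reads: each page additionally carries one int
 * per ecc step, holding the result of the ecc check. */
static size_t oob_read_buf_size(size_t pages, size_t oobsize, size_t eccsteps)
{
	return pages * (oobsize + eccsteps * sizeof(int));
}
```

 <para>
 For example, reading 4 pages with a 16 byte spare area and 2 ecc steps
 per page needs 4 * (16 + 2 * sizeof (int)) bytes.
 </para>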
760 </sect2>
761 <sect2 id="Automatic_placement">
762 <title>Automatic placement</title>
763 <para>
764 Automatic placement uses the built in defaults to place the
765 ecc bytes in the spare area. If filesystem data have to be stored /
766 read into the spare area then the calling function must provide a
767 buffer. The buffer size per page is determined by the oobfree array in
768 the nand_oobinfo structure.
769 </para>
770 <para>
771 If the spare area buffer is NULL then only the ECC placement is
772 done according to the default builtin scheme.
773 </para>
774 </sect2>
775 </sect1>
776 <sect1 id="Spare_area_autoplacement_default">
777 <title>Spare area autoplacement default schemes</title>
778 <sect2 id="pagesize_256">
779 <title>256 byte pagesize</title>
780<informaltable><tgroup cols="3"><tbody>
781<row>
782<entry>Offset</entry>
783<entry>Content</entry>
784<entry>Comment</entry>
785</row>
786<row>
787<entry>0x00</entry>
788<entry>ECC byte 0</entry>
789<entry>Error correction code byte 0</entry>
790</row>
791<row>
792<entry>0x01</entry>
793<entry>ECC byte 1</entry>
794<entry>Error correction code byte 1</entry>
795</row>
796<row>
797<entry>0x02</entry>
798<entry>ECC byte 2</entry>
799<entry>Error correction code byte 2</entry>
800</row>
801<row>
802<entry>0x03</entry>
803<entry>Autoplace 0</entry>
804<entry></entry>
805</row>
806<row>
807<entry>0x04</entry>
808<entry>Autoplace 1</entry>
809<entry></entry>
810</row>
811<row>
812<entry>0x05</entry>
813<entry>Bad block marker</entry>
814<entry>If any bit in this byte is zero, then this block is bad.
815This applies only to the first page in a block. In the remaining
816pages this byte is reserved</entry>
817</row>
818<row>
819<entry>0x06</entry>
820<entry>Autoplace 2</entry>
821<entry></entry>
822</row>
823<row>
824<entry>0x07</entry>
825<entry>Autoplace 3</entry>
826<entry></entry>
827</row>
828</tbody></tgroup></informaltable>
829 </sect2>
830 <sect2 id="pagesize_512">
831 <title>512 byte pagesize</title>
832<informaltable><tgroup cols="3"><tbody>
833<row>
834<entry>Offset</entry>
835<entry>Content</entry>
836<entry>Comment</entry>
837</row>
838<row>
839<entry>0x00</entry>
840<entry>ECC byte 0</entry>
841<entry>Error correction code byte 0 of the lower 256 Bytes of data in
842this page</entry>
843</row>
844<row>
845<entry>0x01</entry>
846<entry>ECC byte 1</entry>
847<entry>Error correction code byte 1 of the lower 256 Bytes of data
848in this page</entry>
849</row>
850<row>
851<entry>0x02</entry>
852<entry>ECC byte 2</entry>
853<entry>Error correction code byte 2 of the lower 256 Bytes of data
854in this page</entry>
855</row>
856<row>
857<entry>0x03</entry>
858<entry>ECC byte 3</entry>
859<entry>Error correction code byte 0 of the upper 256 Bytes of data
860in this page</entry>
861</row>
862<row>
863<entry>0x04</entry>
864<entry>reserved</entry>
865<entry>reserved</entry>
866</row>
867<row>
868<entry>0x05</entry>
869<entry>Bad block marker</entry>
870<entry>If any bit in this byte is zero, then this block is bad.
871This applies only to the first page in a block. In the remaining
872pages this byte is reserved</entry>
873</row>
874<row>
875<entry>0x06</entry>
876<entry>ECC byte 4</entry>
877<entry>Error correction code byte 1 of the upper 256 Bytes of data
878in this page</entry>
879</row>
880<row>
881<entry>0x07</entry>
882<entry>ECC byte 5</entry>
883<entry>Error correction code byte 2 of the upper 256 Bytes of data
884in this page</entry>
885</row>
886<row>
887<entry>0x08 - 0x0F</entry>
888<entry>Autoplace 0 - 7</entry>
889<entry></entry>
890</row>
891</tbody></tgroup></informaltable>
892 </sect2>
893 <sect2 id="pagesize_2048">
894 <title>2048 byte pagesize</title>
895<informaltable><tgroup cols="3"><tbody>
896<row>
897<entry>Offset</entry>
898<entry>Content</entry>
899<entry>Comment</entry>
900</row>
901<row>
902<entry>0x00</entry>
903<entry>Bad block marker</entry>
904<entry>If any bit in this byte is zero, then this block is bad.
905This applies only to the first page in a block. In the remaining
906pages this byte is reserved</entry>
907</row>
908<row>
909<entry>0x01</entry>
910<entry>Reserved</entry>
911<entry>Reserved</entry>
912</row>
913<row>
914<entry>0x02-0x27</entry>
915<entry>Autoplace 0 - 37</entry>
916<entry></entry>
917</row>
918<row>
919<entry>0x28</entry>
920<entry>ECC byte 0</entry>
921<entry>Error correction code byte 0 of the first 256 Bytes of data in
922this page</entry>
923</row>
924<row>
925<entry>0x29</entry>
926<entry>ECC byte 1</entry>
927<entry>Error correction code byte 1 of the first 256 Bytes of data
928in this page</entry>
929</row>
930<row>
931<entry>0x2A</entry>
932<entry>ECC byte 2</entry>
933<entry>Error correction code byte 2 of the first 256 Bytes of data in
934this page</entry>
935</row>
936<row>
937<entry>0x2B</entry>
938<entry>ECC byte 3</entry>
939<entry>Error correction code byte 0 of the second 256 Bytes of data
940in this page</entry>
941</row>
942<row>
943<entry>0x2C</entry>
944<entry>ECC byte 4</entry>
945<entry>Error correction code byte 1 of the second 256 Bytes of data
946in this page</entry>
947</row>
948<row>
949<entry>0x2D</entry>
950<entry>ECC byte 5</entry>
951<entry>Error correction code byte 2 of the second 256 Bytes of data
952in this page</entry>
953</row>
954<row>
955<entry>0x2E</entry>
956<entry>ECC byte 6</entry>
957<entry>Error correction code byte 0 of the third 256 Bytes of data
958in this page</entry>
959</row>
960<row>
961<entry>0x2F</entry>
962<entry>ECC byte 7</entry>
963<entry>Error correction code byte 1 of the third 256 Bytes of data
964in this page</entry>
965</row>
966<row>
967<entry>0x30</entry>
968<entry>ECC byte 8</entry>
969<entry>Error correction code byte 2 of the third 256 Bytes of data
970in this page</entry>
971</row>
972<row>
973<entry>0x31</entry>
974<entry>ECC byte 9</entry>
975<entry>Error correction code byte 0 of the fourth 256 Bytes of data
976in this page</entry>
977</row>
978<row>
979<entry>0x32</entry>
980<entry>ECC byte 10</entry>
981<entry>Error correction code byte 1 of the fourth 256 Bytes of data
982in this page</entry>
983</row>
984<row>
985<entry>0x33</entry>
986<entry>ECC byte 11</entry>
987<entry>Error correction code byte 2 of the fourth 256 Bytes of data
988in this page</entry>
989</row>
990<row>
991<entry>0x34</entry>
992<entry>ECC byte 12</entry>
993<entry>Error correction code byte 0 of the fifth 256 Bytes of data
994in this page</entry>
995</row>
996<row>
997<entry>0x35</entry>
998<entry>ECC byte 13</entry>
999<entry>Error correction code byte 1 of the fifth 256 Bytes of data
1000in this page</entry>
1001</row>
1002<row>
1003<entry>0x36</entry>
1004<entry>ECC byte 14</entry>
1005<entry>Error correction code byte 2 of the fifth 256 Bytes of data
1006in this page</entry>
1007</row>
1008<row>
1009<entry>0x37</entry>
1010<entry>ECC byte 15</entry>
1011<entry>Error correction code byte 0 of the sixth 256 Bytes of data
1012in this page</entry>
1013</row>
1014<row>
1015<entry>0x38</entry>
1016<entry>ECC byte 16</entry>
1017<entry>Error correction code byte 1 of the sixth 256 Bytes of data
1018in this page</entry>
1019</row>
1020<row>
1021<entry>0x39</entry>
1022<entry>ECC byte 17</entry>
1023<entry>Error correction code byte 2 of the sixth 256 Bytes of data
1024in this page</entry>
1025</row>
1026<row>
1027<entry>0x3A</entry>
1028<entry>ECC byte 18</entry>
1029<entry>Error correction code byte 0 of the seventh 256 Bytes of
1030data in this page</entry>
1031</row>
1032<row>
1033<entry>0x3B</entry>
1034<entry>ECC byte 19</entry>
1035<entry>Error correction code byte 1 of the seventh 256 Bytes of
1036data in this page</entry>
1037</row>
1038<row>
1039<entry>0x3C</entry>
1040<entry>ECC byte 20</entry>
1041<entry>Error correction code byte 2 of the seventh 256 Bytes of
1042data in this page</entry>
1043</row>
1044<row>
1045<entry>0x3D</entry>
1046<entry>ECC byte 21</entry>
1047<entry>Error correction code byte 0 of the eighth 256 Bytes of data
1048in this page</entry>
1049</row>
1050<row>
1051<entry>0x3E</entry>
1052<entry>ECC byte 22</entry>
1053<entry>Error correction code byte 1 of the eighth 256 Bytes of data
1054in this page</entry>
1055</row>
1056<row>
1057<entry>0x3F</entry>
1058<entry>ECC byte 23</entry>
1059<entry>Error correction code byte 2 of the eighth 256 Bytes of data
1060in this page</entry>
1061</row>
1062</tbody></tgroup></informaltable>
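 <para>
 The three tables above can be summarised in a few helper functions.
 This is only a sketch derived from the tables; the function names are
 invented for illustration and do not exist in the NAND driver.
 </para>

```c
#include <stdint.h>

/* Offset of the bad block marker in the spare area of the first page
 * of a block, per the tables above. Returns -1 for other page sizes. */
static int bad_block_marker_offset(int pagesize)
{
	switch (pagesize) {
	case 256:
	case 512:
		return 5;
	case 2048:
		return 0;
	default:
		return -1;
	}
}

/* "If any bit in this byte is zero, then this block is bad",
 * so every value other than 0xFF marks a bad block. */
static int marker_says_bad(uint8_t marker)
{
	return marker != 0xFF;
}

/* 2048 byte pagesize: ECC byte `byte` (0..2) of 256 byte chunk
 * `chunk` (0..7) is stored at 0x28 + 3 * chunk + byte. */
static int ecc_offset_2048(int chunk, int byte)
{
	return 0x28 + 3 * chunk + byte;
}
```

 <para>
 As a cross-check against the last table, ecc_offset_2048(5, 0) gives
 0x37 (byte 0 of the sixth chunk) and ecc_offset_2048(7, 2) gives 0x3F.
 </para>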
1063 </sect2>
1064 </sect1>
1065 </chapter>
1066
1067 <chapter id="filesystems">
1068 <title>Filesystem support</title>
1069 <para>
1070 The NAND driver provides all necessary functions for a
1071 filesystem via the MTD interface.
1072 </para>
1073 <para>
1074 Filesystems must be aware of the NAND peculiarities and
1075 restrictions. One major restriction of NAND Flash is that you cannot
1076 write as often as you want to a page. The consecutive writes to a page,
1077 before erasing it again, are restricted to 1-3 writes, depending on the
1078 manufacturer's specifications. The same applies to the spare area.
1079 </para>
1080 <para>
1081 Therefore NAND aware filesystems must either write in page size chunks
1082 or hold a writebuffer to collect smaller writes until they sum up to
1083 pagesize. Available NAND aware filesystems: JFFS2, YAFFS.
1084 </para>
1085 <para>
1086 The spare area usage to store filesystem data is controlled by
1087 the spare area placement functionality which is described in one
1088 of the earlier chapters.
1089 </para>
1090 </chapter>
1091 <chapter id="tools">
1092 <title>Tools</title>
1093 <para>
1094 The MTD project provides a couple of helpful tools to handle NAND Flash.
1095 <itemizedlist>
1096 <listitem><para>flasherase, flasheraseall: Erase and format FLASH partitions</para></listitem>
1097 <listitem><para>nandwrite: write filesystem images to NAND FLASH</para></listitem>
1098 <listitem><para>nanddump: dump the contents of a NAND FLASH partition</para></listitem>
1099 </itemizedlist>
1100 </para>
1101 <para>
1102 These tools are aware of the NAND restrictions. Please use those tools
1103 instead of complaining about errors which are caused by non NAND aware
1104 access methods.
1105 </para>
1106 </chapter>
1107
1108 <chapter id="defines">
1109 <title>Constants</title>
1110 <para>
1111 This chapter describes the constants which might be relevant for a driver developer.
1112 </para>
1113 <sect1 id="Chip_option_constants">
1114 <title>Chip option constants</title>
1115 <sect2 id="Constants_for_chip_id_table">
1116 <title>Constants for chip id table</title>
1117 <para>
1118 These constants are defined in nand.h. They are ored together to describe
1119 the chip functionality.
1120 <programlisting>
1121/* Buswidth is 16 bit */
1122#define NAND_BUSWIDTH_16 0x00000002
1123/* Device supports partial programming without padding */
1124#define NAND_NO_PADDING 0x00000004
1125/* Chip has cache program function */
1126#define NAND_CACHEPRG 0x00000008
1127/* Chip has copy back function */
1128#define NAND_COPYBACK 0x00000010
1129/* AND Chip which has 4 banks and a confusing page / block
1130 * assignment. See Renesas datasheet for further information */
1131#define NAND_IS_AND 0x00000020
1132/* Chip has an array of 4 pages which can be read without
1133 * additional ready/busy waits */
1134#define NAND_4PAGE_ARRAY 0x00000040
1135 </programlisting>
1136 </para>
1137 </sect2>
1138 <sect2 id="Constants_for_runtime_options">
1139 <title>Constants for runtime options</title>
1140 <para>
1141 These constants are defined in nand.h. They are ored together to describe
1142 the functionality.
1143 <programlisting>
1144/* The hw ecc generator provides a syndrome instead of an ecc value on read.
1145 * This can only work if we have the ecc bytes directly behind the
1146 * data bytes. Applies for DOC and AG-AND Renesas HW Reed Solomon generators */
1147#define NAND_HWECC_SYNDROME 0x00020000
1148 </programlisting>
1149 </para>
1150 </sect2>
1151 </sect1>
1152
1153 <sect1 id="EEC_selection_constants">
1154 <title>ECC selection constants</title>
1155 <para>
1156 Use these constants to select the ECC algorithm.
1157 <programlisting>
1158/* No ECC. Usage is not recommended ! */
1159#define NAND_ECC_NONE 0
1160/* Software ECC 3 byte ECC per 256 Byte data */
1161#define NAND_ECC_SOFT 1
1162/* Hardware ECC 3 byte ECC per 256 Byte data */
1163#define NAND_ECC_HW3_256 2
1164/* Hardware ECC 3 byte ECC per 512 Byte data */
1165#define NAND_ECC_HW3_512 3
1166/* Hardware ECC 6 byte ECC per 512 Byte data */
1167#define NAND_ECC_HW6_512 4
1168/* Hardware ECC 8 byte ECC per 512 Byte data */
1169#define NAND_ECC_HW8_512 6
1170 </programlisting>
1171 </para>
1172 </sect1>
1173
1174 <sect1 id="Hardware_control_related_constants">
1175 <title>Hardware control related constants</title>
1176 <para>
1177 These constants describe the requested hardware access function when
1178 the board-specific hardware control function is called:
1179 <programlisting>
1180/* Select the chip by setting nCE to low */
1181#define NAND_CTL_SETNCE 1
1182/* Deselect the chip by setting nCE to high */
1183#define NAND_CTL_CLRNCE 2
1184/* Select the command latch by setting CLE to high */
1185#define NAND_CTL_SETCLE 3
1186/* Deselect the command latch by setting CLE to low */
1187#define NAND_CTL_CLRCLE 4
1188/* Select the address latch by setting ALE to high */
1189#define NAND_CTL_SETALE 5
1190/* Deselect the address latch by setting ALE to low */
1191#define NAND_CTL_CLRALE 6
1192/* Set write protection by setting WP to high. Not used! */
1193#define NAND_CTL_SETWP 7
1194/* Clear write protection by setting WP to low. Not used! */
1195#define NAND_CTL_CLRWP 8
1196 </programlisting>
1197 </para>
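 <para>
 To show how these constants map to pin levels, the following sketch
 models the three control lines in a plain structure instead of touching
 real hardware. The structure nand_pins_sketch and the function
 sketch_hwcontrol() are hypothetical; a real board driver would drive
 GPIOs or controller registers in the same switch statement.
 </para>

```c
/* Values as listed above. */
#define NAND_CTL_SETNCE 1
#define NAND_CTL_CLRNCE 2
#define NAND_CTL_SETCLE 3
#define NAND_CTL_CLRCLE 4
#define NAND_CTL_SETALE 5
#define NAND_CTL_CLRALE 6
#define NAND_CTL_SETWP  7
#define NAND_CTL_CLRWP  8

/* Hypothetical model of the control lines: 1 = high, 0 = low. */
struct nand_pins_sketch {
	int nce;
	int cle;
	int ale;
};

/* Apply one hardware control request to the modelled pins. */
static void sketch_hwcontrol(struct nand_pins_sketch *pins, int cmd)
{
	switch (cmd) {
	case NAND_CTL_SETNCE: pins->nce = 0; break; /* select chip: nCE low */
	case NAND_CTL_CLRNCE: pins->nce = 1; break; /* deselect chip: nCE high */
	case NAND_CTL_SETCLE: pins->cle = 1; break; /* command latch on */
	case NAND_CTL_CLRCLE: pins->cle = 0; break; /* command latch off */
	case NAND_CTL_SETALE: pins->ale = 1; break; /* address latch on */
	case NAND_CTL_CLRALE: pins->ale = 0; break; /* address latch off */
	default:
		break; /* SETWP / CLRWP are documented as not used */
	}
}
```

 <para>
 Note that "set" and "clear" refer to the logical function, not the pin
 level: NAND_CTL_SETNCE drives the active-low nCE line low.
 </para>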
1198 </sect1>
1199
1200 <sect1 id="Bad_block_table_constants">
1201 <title>Bad block table related constants</title>
1202 <para>
1203 These constants describe the options used for bad block
1204 table descriptors.
1205 <programlisting>
1206/* Options for the bad block table descriptors */
1207
1208/* The number of bits used per block in the bbt on the device */
1209#define NAND_BBT_NRBITS_MSK 0x0000000F
1210#define NAND_BBT_1BIT 0x00000001
1211#define NAND_BBT_2BIT 0x00000002
1212#define NAND_BBT_4BIT 0x00000004
1213#define NAND_BBT_8BIT 0x00000008
1214/* The bad block table is in the last good block of the device */
1215#define NAND_BBT_LASTBLOCK 0x00000010
1216/* The bbt is at the given page, else we must scan for the bbt */
1217#define NAND_BBT_ABSPAGE 0x00000020
1218/* bbt is stored per chip on multichip devices */
1219#define NAND_BBT_PERCHIP 0x00000080
1220/* bbt has a version counter at offset veroffs */
1221#define NAND_BBT_VERSION 0x00000100
1222/* Create a bbt if none exists */
1223#define NAND_BBT_CREATE 0x00000200
1224/* Write bbt if necessary */
1225#define NAND_BBT_WRITE 0x00001000
1226/* Read and write back block contents when writing bbt */
1227#define NAND_BBT_SAVECONTENT 0x00002000
1228 </programlisting>
1229 </para>
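 <para>
 Because the bits-per-block values occupy the low four bits of the
 options word, they can be extracted with NAND_BBT_NRBITS_MSK. The
 sketch below does that and estimates the table size, assuming the
 entries are packed back to back; that packing and both helper names
 are assumptions of this example, not statements about the driver.
 </para>

```c
#include <stddef.h>

/* Values as listed above. */
#define NAND_BBT_NRBITS_MSK 0x0000000F
#define NAND_BBT_1BIT       0x00000001
#define NAND_BBT_2BIT       0x00000002
#define NAND_BBT_4BIT       0x00000004
#define NAND_BBT_8BIT       0x00000008

/* Extract the number of bits per block from a descriptor's options. */
static unsigned int bbt_bits_per_block(unsigned int options)
{
	return options & NAND_BBT_NRBITS_MSK;
}

/* Bytes needed to hold one entry per block, rounded up to whole bytes.
 * Returns 0 if the options do not select 1, 2, 4 or 8 bits. */
static size_t bbt_table_bytes(unsigned int options, size_t nblocks)
{
	unsigned int bits = bbt_bits_per_block(options);

	if (bits != 1 && bits != 2 && bits != 4 && bits != 8)
		return 0;
	return (nblocks * bits + 7) / 8;
}
```

 <para>
 With the default of 2 bits per block, a chip with 1024 blocks would
 need 256 bytes under this assumption.
 </para>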
1230 </sect1>
1231
1232 </chapter>
1233
1234 <chapter id="structs">
1235 <title>Structures</title>
1236 <para>
1237 This chapter contains the autogenerated documentation of the structures which are
1238 used in the NAND driver and might be relevant for a driver developer. Each
1239 struct member has a short description which is marked with an [XXX] identifier.
1240 See the chapter "Documentation hints" for an explanation.
1241 </para>
1242!Iinclude/linux/mtd/nand.h
1243 </chapter>
1244
1245 <chapter id="pubfunctions">
1246 <title>Public Functions Provided</title>
1247 <para>
1248 This chapter contains the autogenerated documentation of the NAND kernel API functions
1249 which are exported. Each function has a short description which is marked with an [XXX] identifier.
1250 See the chapter "Documentation hints" for an explanation.
1251 </para>
1252!Edrivers/mtd/nand/nand_base.c
1253!Edrivers/mtd/nand/nand_bbt.c
1254!Edrivers/mtd/nand/nand_ecc.c
1255 </chapter>
1256
1257 <chapter id="intfunctions">
1258 <title>Internal Functions Provided</title>
1259 <para>
1260 This chapter contains the autogenerated documentation of the NAND driver internal functions.
1261 Each function has a short description which is marked with an [XXX] identifier.
1262 See the chapter "Documentation hints" for an explanation.
1263 The functions marked with [DEFAULT] might be relevant for a board driver developer.
1264 </para>
1265!Idrivers/mtd/nand/nand_base.c
1266!Idrivers/mtd/nand/nand_bbt.c
1267<!-- No internal functions for kernel-doc:
1268X!Idrivers/mtd/nand/nand_ecc.c
1269-->
1270 </chapter>
1271
1272 <chapter id="credits">
1273 <title>Credits</title>
1274 <para>
1275 The following people have contributed to the NAND driver:
1276 <orderedlist>
1277 <listitem><para>Steven J. Hill<email>sjhill@realitydiluted.com</email></para></listitem>
1278 <listitem><para>David Woodhouse<email>dwmw2@infradead.org</email></para></listitem>
1279 <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem>
1280 </orderedlist>
1281 A lot of users have provided bugfixes, improvements and helping hands for testing.
1282 Thanks a lot.
1283 </para>
1284 <para>
1285 The following people have contributed to this document:
1286 <orderedlist>
1287 <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem>
1288 </orderedlist>
1289 </para>
1290 </chapter>
1291</book>
diff --git a/Documentation/DocBook/networking.tmpl b/Documentation/DocBook/networking.tmpl
deleted file mode 100644
index 29df25016c7c..000000000000
--- a/Documentation/DocBook/networking.tmpl
+++ /dev/null
@@ -1,111 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="LinuxNetworking">
6 <bookinfo>
7 <title>Linux Networking and Network Devices APIs</title>
8
9 <legalnotice>
10 <para>
11 This documentation is free software; you can redistribute
12 it and/or modify it under the terms of the GNU General Public
13 License as published by the Free Software Foundation; either
14 version 2 of the License, or (at your option) any later
15 version.
16 </para>
17
18 <para>
19 This program is distributed in the hope that it will be
20 useful, but WITHOUT ANY WARRANTY; without even the implied
21 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
22 See the GNU General Public License for more details.
23 </para>
24
25 <para>
26 You should have received a copy of the GNU General Public
27 License along with this program; if not, write to the Free
28 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
29 MA 02111-1307 USA
30 </para>
31
32 <para>
33 For more details see the file COPYING in the source
34 distribution of Linux.
35 </para>
36 </legalnotice>
37 </bookinfo>
38
39<toc></toc>
40
41 <chapter id="netcore">
42 <title>Linux Networking</title>
43 <sect1><title>Networking Base Types</title>
44!Iinclude/linux/net.h
45 </sect1>
46 <sect1><title>Socket Buffer Functions</title>
47!Iinclude/linux/skbuff.h
48!Iinclude/net/sock.h
49!Enet/socket.c
50!Enet/core/skbuff.c
51!Enet/core/sock.c
52!Enet/core/datagram.c
53!Enet/core/stream.c
54 </sect1>
55 <sect1><title>Socket Filter</title>
56!Enet/core/filter.c
57 </sect1>
58 <sect1><title>Generic Network Statistics</title>
59!Iinclude/uapi/linux/gen_stats.h
60!Enet/core/gen_stats.c
61!Enet/core/gen_estimator.c
62 </sect1>
63 <sect1><title>SUN RPC subsystem</title>
64<!-- The !D functionality is not perfect, garbage has to be protected by comments
65!Dnet/sunrpc/sunrpc_syms.c
66-->
67!Enet/sunrpc/xdr.c
68!Enet/sunrpc/svc_xprt.c
69!Enet/sunrpc/xprt.c
70!Enet/sunrpc/sched.c
71!Enet/sunrpc/socklib.c
72!Enet/sunrpc/stats.c
73!Enet/sunrpc/rpc_pipe.c
74!Enet/sunrpc/rpcb_clnt.c
75!Enet/sunrpc/clnt.c
76 </sect1>
77 <sect1><title>WiMAX</title>
78!Enet/wimax/op-msg.c
79!Enet/wimax/op-reset.c
80!Enet/wimax/op-rfkill.c
81!Enet/wimax/stack.c
82!Iinclude/net/wimax.h
83!Iinclude/uapi/linux/wimax.h
84 </sect1>
85 </chapter>
86
87 <chapter id="netdev">
88 <title>Network device support</title>
89 <sect1><title>Driver Support</title>
90!Enet/core/dev.c
91!Enet/ethernet/eth.c
92!Enet/sched/sch_generic.c
93!Iinclude/linux/etherdevice.h
94!Iinclude/linux/netdevice.h
95 </sect1>
96 <sect1><title>PHY Support</title>
97!Edrivers/net/phy/phy.c
98!Idrivers/net/phy/phy.c
99!Edrivers/net/phy/phy_device.c
100!Idrivers/net/phy/phy_device.c
101!Edrivers/net/phy/mdio_bus.c
102!Idrivers/net/phy/mdio_bus.c
103 </sect1>
104<!-- FIXME: Removed for now since no structured comments in source
105 <sect1><title>Wireless</title>
106X!Enet/core/wireless.c
107 </sect1>
108-->
109 </chapter>
110
111</book>
diff --git a/Documentation/DocBook/rapidio.tmpl b/Documentation/DocBook/rapidio.tmpl
deleted file mode 100644
index ac3cca3399a1..000000000000
--- a/Documentation/DocBook/rapidio.tmpl
+++ /dev/null
@@ -1,155 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
4 <!ENTITY rapidio SYSTEM "rapidio.xml">
5 ]>
6
7<book id="RapidIO-Guide">
8 <bookinfo>
9 <title>RapidIO Subsystem Guide</title>
10
11 <authorgroup>
12 <author>
13 <firstname>Matt</firstname>
14 <surname>Porter</surname>
15 <affiliation>
16 <address>
17 <email>mporter@kernel.crashing.org</email>
18 <email>mporter@mvista.com</email>
19 </address>
20 </affiliation>
21 </author>
22 </authorgroup>
23
24 <copyright>
25 <year>2005</year>
26 <holder>MontaVista Software, Inc.</holder>
27 </copyright>
28
29 <legalnotice>
30 <para>
31 This documentation is free software; you can redistribute
32 it and/or modify it under the terms of the GNU General Public
33 License version 2 as published by the Free Software Foundation.
34 </para>
35
36 <para>
37 This program is distributed in the hope that it will be
38 useful, but WITHOUT ANY WARRANTY; without even the implied
39 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
40 See the GNU General Public License for more details.
41 </para>
42
43 <para>
44 You should have received a copy of the GNU General Public
45 License along with this program; if not, write to the Free
46 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
47 MA 02111-1307 USA
48 </para>
49
50 <para>
51 For more details see the file COPYING in the source
52 distribution of Linux.
53 </para>
54 </legalnotice>
55 </bookinfo>
56
57<toc></toc>
58
59 <chapter id="intro">
60 <title>Introduction</title>
61 <para>
62 RapidIO is a high speed switched fabric interconnect with
63 features aimed at the embedded market. RapidIO provides
64 support for memory-mapped I/O as well as message-based
65 transactions over the switched fabric network. RapidIO has
66 a standardized discovery mechanism not unlike the PCI bus
67 standard that allows simple detection of devices in a
68 network.
69 </para>
70 <para>
71 This documentation is provided for developers intending
72 to support RapidIO on new architectures, write new drivers,
73 or to understand the subsystem internals.
74 </para>
75 </chapter>
76
77 <chapter id="bugs">
78 <title>Known Bugs and Limitations</title>
79
80 <sect1 id="known_bugs">
81 <title>Bugs</title>
82 <para>None. ;)</para>
83 </sect1>
84 <sect1 id="Limitations">
85 <title>Limitations</title>
86 <para>
87 <orderedlist>
88 <listitem><para>Access/management of RapidIO memory regions is not supported</para></listitem>
89 <listitem><para>Multiple host enumeration is not supported</para></listitem>
90 </orderedlist>
91 </para>
92 </sect1>
93 </chapter>
94
95 <chapter id="drivers">
96 <title>RapidIO driver interface</title>
97 <para>
98 Drivers are provided a set of calls in order
99 to interface with the subsystem to gather info
100 on devices, request/map memory region resources,
101 and manage mailboxes/doorbells.
102 </para>
103 <sect1 id="Functions">
104 <title>Functions</title>
105!Iinclude/linux/rio_drv.h
106!Edrivers/rapidio/rio-driver.c
107!Edrivers/rapidio/rio.c
108 </sect1>
109 </chapter>
110
111 <chapter id="internals">
112 <title>Internals</title>
113
114 <para>
115 This chapter contains the autogenerated documentation of the RapidIO
116 subsystem.
117 </para>
118
119 <sect1 id="Structures"><title>Structures</title>
120!Iinclude/linux/rio.h
121 </sect1>
122 <sect1 id="Enumeration_and_Discovery"><title>Enumeration and Discovery</title>
123!Idrivers/rapidio/rio-scan.c
124 </sect1>
125 <sect1 id="Driver_functionality"><title>Driver functionality</title>
126!Idrivers/rapidio/rio.c
127!Idrivers/rapidio/rio-access.c
128 </sect1>
129 <sect1 id="Device_model_support"><title>Device model support</title>
130!Idrivers/rapidio/rio-driver.c
131 </sect1>
132 <sect1 id="PPC32_support"><title>PPC32 support</title>
133!Iarch/powerpc/sysdev/fsl_rio.c
134 </sect1>
135 </chapter>
136
137 <chapter id="credits">
138 <title>Credits</title>
139 <para>
140 The following people have contributed to the RapidIO
141 subsystem directly or indirectly:
142 <orderedlist>
143 <listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem>
144 <listitem><para>Randy Vinson<email>rvinson@mvista.com</email></para></listitem>
145 <listitem><para>Dan Malek<email>dan@embeddedalley.com</email></para></listitem>
146 </orderedlist>
147 </para>
148 <para>
149 The following people have contributed to this document:
150 <orderedlist>
151 <listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem>
152 </orderedlist>
153 </para>
154 </chapter>
155</book>
diff --git a/Documentation/DocBook/s390-drivers.tmpl b/Documentation/DocBook/s390-drivers.tmpl
deleted file mode 100644
index 95bfc12e5439..000000000000
--- a/Documentation/DocBook/s390-drivers.tmpl
+++ /dev/null
@@ -1,161 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="s390drivers">
6 <bookinfo>
7 <title>Writing s390 channel device drivers</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Cornelia</firstname>
12 <surname>Huck</surname>
13 <affiliation>
14 <address>
15 <email>cornelia.huck@de.ibm.com</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2007</year>
23 <holder>IBM Corp.</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License as published by the Free Software Foundation; either
31 version 2 of the License, or (at your option) any later
32 version.
33 </para>
34
35 <para>
36 This program is distributed in the hope that it will be
37 useful, but WITHOUT ANY WARRANTY; without even the implied
38 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
39 See the GNU General Public License for more details.
40 </para>
41
42 <para>
43 You should have received a copy of the GNU General Public
44 License along with this program; if not, write to the Free
45 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
46 MA 02111-1307 USA
47 </para>
48
49 <para>
50 For more details see the file COPYING in the source
51 distribution of Linux.
52 </para>
53 </legalnotice>
54 </bookinfo>
55
56<toc></toc>
57
58 <chapter id="intro">
59 <title>Introduction</title>
60 <para>
61 This document describes the interfaces available for device drivers that
62 drive s390 based channel attached I/O devices. This includes interfaces for
63 interaction with the hardware and interfaces for interacting with the
64 common driver core. Those interfaces are provided by the s390 common I/O
65 layer.
66 </para>
67 <para>
68 The document assumes a familiarity with the technical terms associated
69 with the s390 channel I/O architecture. For a description of this
70 architecture, please refer to the "z/Architecture: Principles of
71 Operation", IBM publication no. SA22-7832.
72 </para>
73 <para>
74 While most I/O devices on an s390 system are typically driven through the
75 channel I/O mechanism described here, there are various other methods
76 (like the diag interface). These are out of the scope of this document.
77 </para>
78 <para>
79 Some additional information can also be found in the kernel source
80 under Documentation/s390/driver-model.txt.
81 </para>
82 </chapter>
83 <chapter id="ccw">
84 <title>The ccw bus</title>
85 <para>
86 The ccw bus typically contains the majority of devices available to
87 an s390 system. Named after the channel command word (ccw), the basic
88 command structure used to address its devices, the ccw bus contains
89 so-called channel attached devices. They are addressed via I/O
90 subchannels, visible on the css bus. A device driver for
91 channel-attached devices, however, will never interact with the
92 subchannel directly, but only via the I/O device on the ccw bus,
93 the ccw device.
94 </para>
95 <sect1 id="channelIO">
96 <title>I/O functions for channel-attached devices</title>
97 <para>
98 Some hardware structures have been translated into C structures for use
99 by the common I/O layer and device drivers. For more information on
100 the hardware structures represented here, please consult the Principles
101 of Operation.
102 </para>
103!Iarch/s390/include/asm/cio.h
104 </sect1>
105 <sect1 id="ccwdev">
106 <title>ccw devices</title>
107 <para>
108 Devices that want to initiate channel I/O need to attach to the ccw bus.
109 Interaction with the driver core is done via the common I/O layer, which
110 provides the abstractions of ccw devices and ccw device drivers.
111 </para>
112 <para>
113 The functions that initiate or terminate channel I/O all act upon a
114 ccw device structure. Device drivers must not bypass those functions
115 or strange side effects may happen.
116 </para>
117!Iarch/s390/include/asm/ccwdev.h
118!Edrivers/s390/cio/device.c
119!Edrivers/s390/cio/device_ops.c
120 </sect1>
121 <sect1 id="cmf">
122 <title>The channel-measurement facility</title>
123 <para>
124 The channel-measurement facility provides a means to collect
125 measurement data which is made available by the channel subsystem
126 for each channel attached device.
127 </para>
128!Iarch/s390/include/asm/cmb.h
129!Edrivers/s390/cio/cmf.c
130 </sect1>
131 </chapter>
132
133 <chapter id="ccwgroup">
134 <title>The ccwgroup bus</title>
135 <para>
136 The ccwgroup bus only contains artificial devices, created by the user.
137 Many networking devices (e.g. qeth) are in fact composed of several
138 ccw devices (like read, write and data channel for qeth). The
139 ccwgroup bus provides a mechanism to create a meta-device which
140 contains those ccw devices as slave devices and can be associated
141 with the netdevice.
142 </para>
143 <sect1 id="ccwgroupdevices">
144 <title>ccw group devices</title>
145!Iarch/s390/include/asm/ccwgroup.h
146!Edrivers/s390/cio/ccwgroup.c
147 </sect1>
148 </chapter>
149
150 <chapter id="genericinterfaces">
151 <title>Generic interfaces</title>
152 <para>
153 Some interfaces are available to other drivers that do not necessarily
154 have anything to do with the busses described above, but still are
155 indirectly using basic infrastructure in the common I/O layer.
156 One example is the support for adapter interrupts.
157 </para>
158!Edrivers/s390/cio/airq.c
159 </chapter>
160
161</book>
diff --git a/Documentation/DocBook/scsi.tmpl b/Documentation/DocBook/scsi.tmpl
deleted file mode 100644
index 4b9b9b286cea..000000000000
--- a/Documentation/DocBook/scsi.tmpl
+++ /dev/null
@@ -1,409 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="scsimid">
6 <bookinfo>
7 <title>SCSI Interfaces Guide</title>
8
9 <authorgroup>
10 <author>
11 <firstname>James</firstname>
12 <surname>Bottomley</surname>
13 <affiliation>
14 <address>
15 <email>James.Bottomley@hansenpartnership.com</email>
16 </address>
17 </affiliation>
18 </author>
19
20 <author>
21 <firstname>Rob</firstname>
22 <surname>Landley</surname>
23 <affiliation>
24 <address>
25 <email>rob@landley.net</email>
26 </address>
27 </affiliation>
28 </author>
29
30 </authorgroup>
31
32 <copyright>
33 <year>2007</year>
34 <holder>Linux Foundation</holder>
35 </copyright>
36
37 <legalnotice>
38 <para>
39 This documentation is free software; you can redistribute
40 it and/or modify it under the terms of the GNU General Public
41 License version 2.
42 </para>
43
44 <para>
45 This program is distributed in the hope that it will be
46 useful, but WITHOUT ANY WARRANTY; without even the implied
47 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
48 For more details see the file COPYING in the source
49 distribution of Linux.
50 </para>
51 </legalnotice>
52 </bookinfo>
53
54 <toc></toc>
55
56 <chapter id="intro">
57 <title>Introduction</title>
58 <sect1 id="protocol_vs_bus">
59 <title>Protocol vs bus</title>
60 <para>
61 Once upon a time, the Small Computer Systems Interface defined both
62 a parallel I/O bus and a data protocol to connect a wide variety of
63 peripherals (disk drives, tape drives, modems, printers, scanners,
64 optical drives, test equipment, and medical devices) to a host
65 computer.
66 </para>
67 <para>
68 Although the old parallel (fast/wide/ultra) SCSI bus has largely
69 fallen out of use, the SCSI command set is more widely used than ever
70 to communicate with devices over a number of different busses.
71 </para>
72 <para>
73 The <ulink url='http://www.t10.org/scsi-3.htm'>SCSI protocol</ulink>
74 is a big-endian peer-to-peer packet-based protocol. SCSI commands
75 are 6, 10, 12, or 16 bytes long, often followed by an associated data
76 payload.
77 </para>
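<para>
As a rough illustration of the two properties above (fixed command lengths and big-endian fields), the sketch below derives the length of a command block from the group code in the top three bits of the opcode and fills in a READ(10) command. The helper names cdb_length and build_read10 are hypothetical and are not part of the kernel; groups without a fixed standard length are reported as 0 in this sketch.
</para>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Command length by group code (top three bits of the opcode).
 * Hypothetical helper: 0 stands for "reserved, variable-length or
 * vendor-specific" groups, which this sketch does not handle.
 */
static size_t cdb_length(uint8_t opcode)
{
    static const size_t len_by_group[8] = { 6, 10, 10, 0, 16, 12, 0, 0 };

    return len_by_group[(opcode >> 5) & 0x7];
}

#define READ_10 0x28

/* Fill a 10-byte READ(10) command; multi-byte fields are big-endian. */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = READ_10;
    cdb[2] = (uint8_t)(lba >> 24);     /* logical block address, MSB first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);  /* transfer length, MSB first */
    cdb[8] = (uint8_t)nblocks;
}
```

<para>
A caller would pass the finished buffer, together with cdb_length(cdb[0]), to whatever transport carries the command; the data payload travels separately.
</para>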
78 <para>
79 SCSI commands can be transported over just about any kind of bus, and
80 are the default protocol for storage devices attached to USB, SATA,
81 SAS, Fibre Channel, FireWire, and ATAPI devices. SCSI packets are
82 also commonly exchanged over Infiniband,
83 <ulink url='http://i2o.shadowconnect.com/faq.php'>I2O</ulink>, TCP/IP
84 (<ulink url='https://en.wikipedia.org/wiki/ISCSI'>iSCSI</ulink>), even
85 <ulink url='http://cyberelk.net/tim/parport/parscsi.html'>Parallel
86 ports</ulink>.
87 </para>
88 </sect1>
89 <sect1 id="subsystem_design">
90 <title>Design of the Linux SCSI subsystem</title>
91 <para>
92 The SCSI subsystem uses a three layer design, with upper, mid, and low
93 layers. Every operation involving the SCSI subsystem (such as reading
94 a sector from a disk) uses one driver at each of the 3 levels: one
95 upper layer driver, one lower layer driver, and the SCSI midlayer.
96 </para>
97 <para>
98 The SCSI upper layer provides the interface between userspace and the
99 kernel, in the form of block and char device nodes for I/O and
100 ioctl(). The SCSI lower layer contains drivers for specific hardware
101 devices.
102 </para>
103 <para>
104 In between is the SCSI mid-layer, analogous to a network routing
105 layer such as the IPv4 stack. The SCSI mid-layer routes a packet
106 based data protocol between the upper layer's /dev nodes and the
107 corresponding devices in the lower layer. It manages command queues,
108 provides error handling and power management functions, and responds
109 to ioctl() requests.
110 </para>
111 </sect1>
112 </chapter>
113
114 <chapter id="upper_layer">
115 <title>SCSI upper layer</title>
116 <para>
117 The upper layer supports the user-kernel interface by providing
118 device nodes.
119 </para>
120 <sect1 id="sd">
121 <title>sd (SCSI Disk)</title>
122 <para>sd (sd_mod.o)</para>
123<!-- !Idrivers/scsi/sd.c -->
124 </sect1>
125 <sect1 id="sr">
126 <title>sr (SCSI CD-ROM)</title>
127 <para>sr (sr_mod.o)</para>
128 </sect1>
129 <sect1 id="st">
130 <title>st (SCSI Tape)</title>
131 <para>st (st.o)</para>
132 </sect1>
133 <sect1 id="sg">
134 <title>sg (SCSI Generic)</title>
135 <para>sg (sg.o)</para>
136 </sect1>
137 <sect1 id="ch">
138 <title>ch (SCSI Media Changer)</title>
139 <para>ch (ch.c)</para>
140 </sect1>
141 </chapter>
142
143 <chapter id="mid_layer">
144 <title>SCSI mid layer</title>
145
146 <sect1 id="midlayer_implementation">
147 <title>SCSI midlayer implementation</title>
148 <sect2 id="scsi_device.h">
149 <title>include/scsi/scsi_device.h</title>
150 <para>
151 </para>
152!Iinclude/scsi/scsi_device.h
153 </sect2>
154
155 <sect2 id="scsi.c">
156 <title>drivers/scsi/scsi.c</title>
157 <para>Main file for the SCSI midlayer.</para>
158!Edrivers/scsi/scsi.c
159 </sect2>
160 <sect2 id="scsicam.c">
161 <title>drivers/scsi/scsicam.c</title>
162 <para>
163 <ulink url='http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf'>SCSI
164 Common Access Method</ulink> support functions, for use with
165 HDIO_GETGEO, etc.
166 </para>
167!Edrivers/scsi/scsicam.c
168 </sect2>
169 <sect2 id="scsi_error.c">
170 <title>drivers/scsi/scsi_error.c</title>
171 <para>Common SCSI error/timeout handling routines.</para>
172!Edrivers/scsi/scsi_error.c
173 </sect2>
174 <sect2 id="scsi_devinfo.c">
175 <title>drivers/scsi/scsi_devinfo.c</title>
176 <para>
177 Manage scsi_dev_info_list, which tracks blacklisted and whitelisted
178 devices.
179 </para>
180!Idrivers/scsi/scsi_devinfo.c
181 </sect2>
182 <sect2 id="scsi_ioctl.c">
183 <title>drivers/scsi/scsi_ioctl.c</title>
184 <para>
185 Handle ioctl() calls for SCSI devices.
186 </para>
187!Edrivers/scsi/scsi_ioctl.c
188 </sect2>
189 <sect2 id="scsi_lib.c">
190 <title>drivers/scsi/scsi_lib.c</title>
191 <para>
192 SCSI queuing library.
193 </para>
194!Edrivers/scsi/scsi_lib.c
195 </sect2>
196 <sect2 id="scsi_lib_dma.c">
197 <title>drivers/scsi/scsi_lib_dma.c</title>
198 <para>
199 SCSI library functions depending on DMA
200 (map and unmap scatter-gather lists).
201 </para>
202!Edrivers/scsi/scsi_lib_dma.c
203 </sect2>
204 <sect2 id="scsi_module.c">
205 <title>drivers/scsi/scsi_module.c</title>
206 <para>
207 The file drivers/scsi/scsi_module.c contains legacy support for
208 old-style host templates. It should never be used by any new driver.
209 </para>
210 </sect2>
211 <sect2 id="scsi_proc.c">
212 <title>drivers/scsi/scsi_proc.c</title>
213 <para>
214 The functions in this file provide an interface between
215 the PROC file system and the SCSI device drivers.
216 It is mainly used for debugging, statistics and to pass
217 information directly to the lowlevel driver.
218
219 That is, plumbing to manage /proc/scsi/*
220 </para>
221!Idrivers/scsi/scsi_proc.c
222 </sect2>
223 <sect2 id="scsi_netlink.c">
224 <title>drivers/scsi/scsi_netlink.c</title>
225 <para>
226 Infrastructure to provide async events from transports to userspace
227 via netlink, using a single NETLINK_SCSITRANSPORT protocol for all
228 transports.
229
230 See <ulink url='http://marc.info/?l=linux-scsi&amp;m=115507374832500&amp;w=2'>the
231 original patch submission</ulink> for more details.
232 </para>
233!Idrivers/scsi/scsi_netlink.c
234 </sect2>
235 <sect2 id="scsi_scan.c">
236 <title>drivers/scsi/scsi_scan.c</title>
237 <para>
238 Scan a host to determine which (if any) devices are attached.
239
240 The general scanning/probing algorithm is as follows; exceptions are
241 made to it depending on device specific flags, compilation options,
242 and global variable (boot or module load time) settings.
243
244 A specific LUN is scanned via an INQUIRY command; if the LUN has a
245 device attached, a scsi_device is allocated and set up for it.
246
247 For every id of every channel on the given host, start by scanning
248 LUN 0. Skip hosts that don't respond at all to a scan of LUN 0.
249 Otherwise, if LUN 0 has a device attached, allocate and set up a
250 scsi_device for it. If the target is SCSI-3 or later, issue a REPORT LUNS
251 command and scan all of the LUNs it returns; otherwise,
252 sequentially scan LUNs up until some maximum is reached, or a LUN is
253 seen that cannot have a device attached to it.
254 </para>
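<para>
The per-target part of this algorithm can be sketched as a small self-contained simulation. This is not the code in scsi_scan.c: struct sim_target, probe_lun, scan_target and MAX_LUNS are hypothetical names invented for illustration, and the device flags, compile options and module parameters mentioned above are ignored.
</para>

```c
#include <stdbool.h>

#define MAX_LUNS 8  /* illustrative upper bound, not a kernel constant */

/* Hypothetical model of one target id on one channel. */
struct sim_target {
    bool responds;              /* false: nothing answers at this id */
    int scsi_level;             /* 2 = SCSI-2, 3 = SCSI-3, ... */
    bool lun_present[MAX_LUNS]; /* true if a device sits on that LUN */
};

/* Stand-in for sending INQUIRY to a single LUN. */
static bool probe_lun(const struct sim_target *t, int lun)
{
    return t->responds && lun >= 0 && lun < MAX_LUNS && t->lun_present[lun];
}

/* Scan one target; store the LUNs that have devices and return the count. */
static int scan_target(const struct sim_target *t, int found[MAX_LUNS])
{
    int n = 0;

    if (!t->responds)
        return 0;               /* no answer to the LUN 0 scan: skip */
    if (probe_lun(t, 0))
        found[n++] = 0;

    if (t->scsi_level >= 3) {
        /* REPORT LUNS path: the target lists its LUNs, each is probed. */
        for (int lun = 1; lun < MAX_LUNS; lun++)
            if (t->lun_present[lun] && probe_lun(t, lun))
                found[n++] = lun;
    } else {
        /* Sequential path: stop at the first LUN without a device. */
        for (int lun = 1; lun < MAX_LUNS; lun++) {
            if (!probe_lun(t, lun))
                break;
            found[n++] = lun;
        }
    }
    return n;
}
```

<para>
The difference between the two paths shows up with sparse LUNs: a SCSI-2 style target with devices on LUNs 0, 1 and 3 yields only LUNs 0 and 1 in this model, because the sequential scan stops at the empty LUN 2, while the REPORT LUNS path finds all three.
</para>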
255!Idrivers/scsi/scsi_scan.c
256 </sect2>
257 <sect2 id="scsi_sysctl.c">
258 <title>drivers/scsi/scsi_sysctl.c</title>
259 <para>
260 Set up the sysctl entry: "/dev/scsi/logging_level"
261 (DEV_SCSI_LOGGING_LEVEL) which sets/returns scsi_logging_level.
262 </para>
263 </sect2>
264 <sect2 id="scsi_sysfs.c">
265 <title>drivers/scsi/scsi_sysfs.c</title>
266 <para>
267 SCSI sysfs interface routines.
268 </para>
269!Edrivers/scsi/scsi_sysfs.c
270 </sect2>
271 <sect2 id="hosts.c">
272 <title>drivers/scsi/hosts.c</title>
273 <para>
274 mid to lowlevel SCSI driver interface
275 </para>
276!Edrivers/scsi/hosts.c
277 </sect2>
278 <sect2 id="constants.c">
279 <title>drivers/scsi/constants.c</title>
280 <para>
281 mid to lowlevel SCSI driver interface
282 </para>
283!Edrivers/scsi/constants.c
284 </sect2>
285 </sect1>
286
287 <sect1 id="Transport_classes">
288 <title>Transport classes</title>
289 <para>
290 Transport classes are service libraries for drivers in the SCSI
291 lower layer, which expose transport attributes in sysfs.
292 </para>
293 <sect2 id="Fibre_Channel_transport">
294 <title>Fibre Channel transport</title>
295 <para>
296 The file drivers/scsi/scsi_transport_fc.c defines transport attributes
297 for Fibre Channel.
298 </para>
299!Edrivers/scsi/scsi_transport_fc.c
300 </sect2>
301 <sect2 id="iSCSI_transport">
302 <title>iSCSI transport class</title>
303 <para>
304 The file drivers/scsi/scsi_transport_iscsi.c defines transport
305 attributes for the iSCSI class, which sends SCSI packets over TCP/IP
306 connections.
307 </para>
308!Edrivers/scsi/scsi_transport_iscsi.c
309 </sect2>
310 <sect2 id="SAS_transport">
311 <title>Serial Attached SCSI (SAS) transport class</title>
312 <para>
313 The file drivers/scsi/scsi_transport_sas.c defines transport
314 attributes for Serial Attached SCSI, a variant of SATA aimed at
315 large high-end systems.
316 </para>
317 <para>
318 The SAS transport class contains common code to deal with SAS HBAs,
319 an approximate representation of SAS topologies in the driver model,
320 and various sysfs attributes to expose these topologies and management
321 interfaces to userspace.
322 </para>
323 <para>
324 In addition to the basic SCSI core objects this transport class
325 introduces two additional intermediate objects: The SAS PHY
326 as represented by struct sas_phy defines an "outgoing" PHY on
327 a SAS HBA or Expander, and the SAS remote PHY represented by
328 struct sas_rphy defines an "incoming" PHY on a SAS Expander or
329 end device. Note that this is purely a software concept; the
330 underlying hardware for a PHY and a remote PHY is exactly
331 the same.
332 </para>
333 <para>
334 There is no concept of a SAS port in this code; users can see
335 what PHYs form a wide port based on the port_identifier attribute,
336 which is the same for all PHYs in a port.
337 </para>
338!Edrivers/scsi/scsi_transport_sas.c
339 </sect2>
340 <sect2 id="SATA_transport">
341 <title>SATA transport class</title>
342 <para>
343 The SATA transport is handled by libata, which has its own book of
344 documentation in this directory.
345 </para>
346 </sect2>
347 <sect2 id="SPI_transport">
348 <title>Parallel SCSI (SPI) transport class</title>
349 <para>
350 The file drivers/scsi/scsi_transport_spi.c defines transport
351 attributes for traditional (fast/wide/ultra) SCSI busses.
352 </para>
353!Edrivers/scsi/scsi_transport_spi.c
354 </sect2>
355 <sect2 id="SRP_transport">
356 <title>SCSI RDMA (SRP) transport class</title>
357 <para>
358 The file drivers/scsi/scsi_transport_srp.c defines transport
359 attributes for SCSI over Remote Direct Memory Access.
360 </para>
361!Edrivers/scsi/scsi_transport_srp.c
362 </sect2>
363 </sect1>
364
365 </chapter>
366
367 <chapter id="lower_layer">
368 <title>SCSI lower layer</title>
369 <sect1 id="hba_drivers">
370 <title>Host Bus Adapter transport types</title>
371 <para>
372 Many modern device controllers use the SCSI command set as a protocol to
373 communicate with their devices through many different types of physical
374 connections.
375 </para>
376 <para>
377 In SCSI language a bus capable of carrying SCSI commands is
378 called a "transport", and a controller connecting to such a bus is
379 called a "host bus adapter" (HBA).
380 </para>
381 <sect2 id="scsi_debug.c">
382 <title>Debug transport</title>
383 <para>
384 The file drivers/scsi/scsi_debug.c simulates a host adapter with a
385 variable number of disks (or disk like devices) attached, sharing a
386 common amount of RAM. It does a lot of checking to make sure that we are
387 not getting blocks mixed up, and panics the kernel if anything out of
388 the ordinary is seen.
389 </para>
390 <para>
391 To be more realistic, the simulated devices have the transport
392 attributes of SAS disks.
393 </para>
394 <para>
395 For documentation see
396 <ulink url='http://sg.danny.cz/sg/sdebug26.html'>http://sg.danny.cz/sg/sdebug26.html</ulink>
397 </para>
398<!-- !Edrivers/scsi/scsi_debug.c -->
399 </sect2>
400 <sect2 id="todo">
401 <title>todo</title>
402 <para>Parallel (fast/wide/ultra) SCSI, USB, SATA,
403 SAS, Fibre Channel, FireWire, ATAPI devices, Infiniband,
404 I2O, iSCSI, Parallel ports, netlink...
405 </para>
406 </sect2>
407 </sect1>
408 </chapter>
409</book>
diff --git a/Documentation/DocBook/sh.tmpl b/Documentation/DocBook/sh.tmpl
deleted file mode 100644
index 4a38f604fa66..000000000000
--- a/Documentation/DocBook/sh.tmpl
+++ /dev/null
@@ -1,105 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="sh-drivers">
6 <bookinfo>
7 <title>SuperH Interfaces Guide</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Paul</firstname>
12 <surname>Mundt</surname>
13 <affiliation>
14 <address>
15 <email>lethal@linux-sh.org</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2008-2010</year>
23 <holder>Paul Mundt</holder>
24 </copyright>
25 <copyright>
26 <year>2008-2010</year>
27 <holder>Renesas Technology Corp.</holder>
28 </copyright>
29 <copyright>
30 <year>2010</year>
31 <holder>Renesas Electronics Corp.</holder>
32 </copyright>
33
34 <legalnotice>
35 <para>
36 This documentation is free software; you can redistribute
37 it and/or modify it under the terms of the GNU General Public
38 License version 2 as published by the Free Software Foundation.
39 </para>
40
41 <para>
42 This program is distributed in the hope that it will be
43 useful, but WITHOUT ANY WARRANTY; without even the implied
44 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
45 See the GNU General Public License for more details.
46 </para>
47
48 <para>
49 You should have received a copy of the GNU General Public
50 License along with this program; if not, write to the Free
51 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
52 MA 02111-1307 USA
53 </para>
54
55 <para>
56 For more details see the file COPYING in the source
57 distribution of Linux.
58 </para>
59 </legalnotice>
60 </bookinfo>
61
62<toc></toc>
63
64 <chapter id="mm">
65 <title>Memory Management</title>
66 <sect1 id="sh4">
67 <title>SH-4</title>
68 <sect2 id="sq">
69 <title>Store Queue API</title>
70!Earch/sh/kernel/cpu/sh4/sq.c
71 </sect2>
72 </sect1>
73 <sect1 id="sh5">
74 <title>SH-5</title>
75 <sect2 id="tlb">
76 <title>TLB Interfaces</title>
77!Iarch/sh/mm/tlb-sh5.c
78!Iarch/sh/include/asm/tlb_64.h
79 </sect2>
80 </sect1>
81 </chapter>
82 <chapter id="mach">
83 <title>Machine Specific Interfaces</title>
84 <sect1 id="dreamcast">
85 <title>mach-dreamcast</title>
86!Iarch/sh/boards/mach-dreamcast/rtc.c
87 </sect1>
88 <sect1 id="x3proto">
89 <title>mach-x3proto</title>
90!Earch/sh/boards/mach-x3proto/ilsel.c
91 </sect1>
92 </chapter>
93 <chapter id="busses">
94 <title>Busses</title>
95 <sect1 id="superhyway">
96 <title>SuperHyway</title>
97!Edrivers/sh/superhyway/superhyway.c
98 </sect1>
99
100 <sect1 id="maple">
101 <title>Maple</title>
102!Edrivers/sh/maple/maple.c
103 </sect1>
104 </chapter>
105</book>
diff --git a/Documentation/DocBook/stylesheet.xsl b/Documentation/DocBook/stylesheet.xsl
deleted file mode 100644
index 3bf4ecf3d760..000000000000
--- a/Documentation/DocBook/stylesheet.xsl
+++ /dev/null
@@ -1,11 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
3<param name="chunk.quietly">1</param>
4<param name="funcsynopsis.style">ansi</param>
5<param name="funcsynopsis.tabular.threshold">80</param>
6<param name="callout.graphics">0</param>
7<!-- <param name="paper.type">A4</param> -->
8<param name="generate.consistent.ids">1</param>
9<param name="generate.section.toc.level">2</param>
10<param name="use.id.as.filename">1</param>
11</stylesheet>
diff --git a/Documentation/DocBook/w1.tmpl b/Documentation/DocBook/w1.tmpl
deleted file mode 100644
index c65cb27abef9..000000000000
--- a/Documentation/DocBook/w1.tmpl
+++ /dev/null
@@ -1,101 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="w1id">
6 <bookinfo>
7 <title>W1: Dallas' 1-wire bus</title>
8
9 <authorgroup>
10 <author>
11 <firstname>David</firstname>
12 <surname>Fries</surname>
13 <affiliation>
14 <address>
15 <email>David@Fries.net</email>
16 </address>
17 </affiliation>
18 </author>
19
20 </authorgroup>
21
22 <copyright>
23 <year>2013</year>
24 <!--
25 <holder></holder>
26 -->
27 </copyright>
28
29 <legalnotice>
30 <para>
31 This documentation is free software; you can redistribute
32 it and/or modify it under the terms of the GNU General Public
33 License version 2.
34 </para>
35
36 <para>
37 This program is distributed in the hope that it will be
38 useful, but WITHOUT ANY WARRANTY; without even the implied
39 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
40 For more details see the file COPYING in the source
41 distribution of Linux.
42 </para>
43 </legalnotice>
44 </bookinfo>
45
46 <toc></toc>
47
48 <chapter id="w1_internal">
49 <title>W1 API internal to the kernel</title>
50
51 <sect1 id="w1_internal_api">
52 <title>W1 API internal to the kernel</title>
53 <sect2 id="w1.h">
54 <title>include/linux/w1.h</title>
55 <para>W1 kernel API functions.</para>
56!Iinclude/linux/w1.h
57 </sect2>
58
59 <sect2 id="w1.c">
60 <title>drivers/w1/w1.c</title>
61 <para>W1 core functions.</para>
62!Idrivers/w1/w1.c
63 </sect2>
64
65 <sect2 id="w1_family.c">
66 <title>drivers/w1/w1_family.c</title>
67 <para>Allows registering device family operations.</para>
68!Edrivers/w1/w1_family.c
69 </sect2>
70
71 <sect2 id="w1_internal.h">
72 <title>drivers/w1/w1_internal.h</title>
73 <para>W1 internal initialization for master devices.</para>
74!Idrivers/w1/w1_internal.h
75 </sect2>
76
77 <sect2 id="w1_int.c">
78 <title>drivers/w1/w1_int.c</title>
79 <para>W1 internal initialization for master devices.</para>
80!Edrivers/w1/w1_int.c
81 </sect2>
82
83 <sect2 id="w1_netlink.h">
84 <title>drivers/w1/w1_netlink.h</title>
85 <para>W1 external netlink API structures and commands.</para>
86!Idrivers/w1/w1_netlink.h
87 </sect2>
88
89 <sect2 id="w1_io.c">
90 <title>drivers/w1/w1_io.c</title>
91 <para>W1 input/output.</para>
92!Edrivers/w1/w1_io.c
93!Idrivers/w1/w1_io.c
94 </sect2>
95
96 </sect1>
97
98
99 </chapter>
100
101</book>
diff --git a/Documentation/DocBook/z8530book.tmpl b/Documentation/DocBook/z8530book.tmpl
deleted file mode 100644
index 6f3883be877e..000000000000
--- a/Documentation/DocBook/z8530book.tmpl
+++ /dev/null
@@ -1,371 +0,0 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="Z85230Guide">
6 <bookinfo>
7 <title>Z8530 Programming Guide</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Alan</firstname>
12 <surname>Cox</surname>
13 <affiliation>
14 <address>
15 <email>alan@lxorguk.ukuu.org.uk</email>
16 </address>
17 </affiliation>
18 </author>
19 </authorgroup>
20
21 <copyright>
22 <year>2000</year>
23 <holder>Alan Cox</holder>
24 </copyright>
25
26 <legalnotice>
27 <para>
28 This documentation is free software; you can redistribute
29 it and/or modify it under the terms of the GNU General Public
30 License as published by the Free Software Foundation; either
31 version 2 of the License, or (at your option) any later
32 version.
33 </para>
34
35 <para>
36 This program is distributed in the hope that it will be
37 useful, but WITHOUT ANY WARRANTY; without even the implied
38 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
39 See the GNU General Public License for more details.
40 </para>
41
42 <para>
43 You should have received a copy of the GNU General Public
44 License along with this program; if not, write to the Free
45 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
46 MA 02111-1307 USA
47 </para>
48
49 <para>
50 For more details see the file COPYING in the source
51 distribution of Linux.
52 </para>
53 </legalnotice>
54 </bookinfo>
55
56<toc></toc>
57
58 <chapter id="intro">
59 <title>Introduction</title>
60 <para>
61 The Z85x30 family synchronous/asynchronous controller chips are
62 used on a large number of cheap network interface cards. The
63 kernel provides a core interface layer that is designed to make
64 it easy to provide WAN services using this chip.
65 </para>
66 <para>
67 The current driver only supports synchronous operation. Merging the
68 asynchronous driver support into this code to allow any Z85x30
69 device to be used as both a tty interface and as a synchronous
70 controller is a project for Linux post the 2.4 release.
71 </para>
72 </chapter>
73
74 <chapter id="Driver_Modes">
75 <title>Driver Modes</title>
76 <para>
77 The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices
78 in three different modes. Each mode can be applied to an individual
79 channel on the chip (each chip has two channels).
80 </para>
81 <para>
82 The PIO synchronous mode supports the most common Z8530 wiring. Here
83 the chip is interfaced to the I/O and interrupt facilities of the
84 host machine but not to the DMA subsystem. When running PIO the
85 Z8530 has extremely tight timing requirements. Running at high speeds,
86 even with a Z85230, will be tricky. Typically you should expect to
87 achieve at best 9600 baud with a Z8C530 and 64Kbits with a Z85230.
88 </para>
89 <para>
90 The DMA mode supports the chip when it is configured to use dual DMA
91 channels on an ISA bus. The better cards tend to support this mode
92 of operation for a single channel. With DMA running the Z85230 tops
93 out when it starts to hit ISA DMA constraints at about 512Kbits. It
94 is worth noting here that many PC machines hang or crash when the
95 chip is driven fast enough to hold the ISA bus solid.
96 </para>
97 <para>
98 Transmit DMA mode uses a single DMA channel. The DMA channel is used
99 for transmission as the transmit FIFO is smaller than the receive
100 FIFO. It gives better performance than pure PIO mode but is nowhere
101 near as ideal as pure DMA mode.
102 </para>
103 </chapter>
104
105 <chapter id="Using_the_Z85230_driver">
106 <title>Using the Z85230 driver</title>
107 <para>
108 The Z85230 driver provides the back end interface to your board. To
109 configure a Z8530 interface you need to detect the board and to
110 identify its ports and interrupt resources. It is also your problem
111 to verify the resources are available.
112 </para>
113 <para>
114 Having identified the chip you need to fill in a struct z8530_dev,
115 which describes each chip. This object must exist until you finally
116 shut down the board. Firstly zero the active field. This ensures
117 nothing goes off without you intending it. The irq field should
118 be set to the interrupt number of the chip. (Each chip has a single
119 interrupt source rather than each channel). You are responsible
120 for allocating the interrupt line. The interrupt handler should be
121 set to <function>z8530_interrupt</function>. The device id should
122 be set to the z8530_dev structure pointer. Whether the interrupt can
123 be shared or not is board dependent, and up to you to initialise.
124 </para>
125 <para>
126 The structure holds two channel structures.
127 Initialise chanA.ctrlio and chanA.dataio with the address of the
128 control and data ports. You can OR this with Z8530_PORT_SLEEP to
129 indicate your interface needs the 5uS delay for chip settling done
130 in software. The PORT_SLEEP option is architecture specific. Other
131 flags may become available on future platforms, e.g. for MMIO.
132 Initialise the chanA.irqs to &amp;z8530_nop to start the chip up
133 as disabled and discarding interrupt events. This ensures that
134 stray interrupts will be mopped up and not hang the bus. Set
135 chanA.dev to point to the device structure itself. The
136 private and name fields you may use as you wish. The private field
137 is unused by the Z85230 layer. The name is used for error reporting
138 and it may thus make sense to make it match the network name.
139 </para>
140 <para>
141 Repeat the same operation with the B channel if your chip has
142 both channels wired to something useful. This isn't always the
143 case. If it is not wired then the I/O values do not matter, but
144 you must initialise chanB.dev.
145 </para>
146 <para>
147 If your board has DMA facilities then initialise the txdma and
148 rxdma fields for the relevant channels. You must also allocate the
149 ISA DMA channels and do any necessary board level initialisation
150 to configure them. The low level driver will do the Z8530 and
151 DMA controller programming but not board specific magic.
152 </para>
153 <para>
154 Having initialised the device you can then call
155 <function>z8530_init</function>. This will probe the chip and
156 reset it into a known state. An identification sequence is then
157 run to identify the chip type. If the checks fail to pass the
158 function returns a non zero error code. Typically this indicates
159 that the port given is not valid. After this call the
160 type field of the z8530_dev structure is initialised to either
161 Z8530, Z85C30 or Z85230 according to the chip found.
162 </para>
163 <para>
164 Once you have called z8530_init you can also make use of the utility
165 function <function>z8530_describe</function>. This provides a
166 consistent reporting format for the Z8530 devices, and allows all
167 the drivers to provide consistent reporting.
168 </para>
169 </chapter>
170
171 <chapter id="Attaching_Network_Interfaces">
172 <title>Attaching Network Interfaces</title>
173 <para>
174 If you wish to use the network interface facilities of the driver,
175 then you need to attach a network device to each channel that is
176 present and in use. In addition, to use the generic HDLC layer
177 you need to follow some additional plumbing rules. They may seem
178 complex but a look at the example hostess_sv11 driver should
179 reassure you.
180 </para>
181 <para>
182 The network device used for each channel should be pointed to by
183 the netdevice field of each channel. The hdlc-&gt;priv field of the
184 network device points to your private data - you will need to be
185 able to find your private data from this.
186 </para>
187 <para>
188 The way most drivers approach this particular problem is to
189 create a structure holding the Z8530 device definition and
190 put that into the private field of the network device. The
191 network device fields of the channels then point back to the
192 network devices.
193 </para>
194 <para>
195 If you wish to use the generic HDLC then you need to register
196 the HDLC device.
197 </para>
198 <para>
199 Before you register your network device you will also need to
200 provide suitable handlers for most of the network device callbacks.
201 See the network device documentation for more details on this.
202 </para>
203 </chapter>
204
205 <chapter id="Configuring_And_Activating_The_Port">
206 <title>Configuring And Activating The Port</title>
207 <para>
208 The Z85230 driver provides helper functions and tables to load the
209 port registers on the Z8530 chips. When programming the register
210 settings for a channel be aware that the documentation recommends
211 initialisation orders. Strange things happen when these are not
212 followed.
213 </para>
214 <para>
215 <function>z8530_channel_load</function> takes an array of
216 pairs of initialisation values in an array of u8 type. The first
217 value is the Z8530 register number. Add 16 to indicate the alternate
218 register bank on the later chips. The array is terminated by a 255.
219 </para>
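<para>
The table layout described above can be sketched in plain userspace C.
This is only an illustration of the format (register/value pairs, 16
added for the alternate bank, 255 as terminator). The names
decode_reg_table, struct reg_write, TABLE_END and ALT_BANK_BASE are
hypothetical and do not come from z85230.c, and the sketch assumes each
bank holds registers 0-15.
</para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not names taken from the driver. */
#define TABLE_END     255 /* terminator value described in the text */
#define ALT_BANK_BASE 16  /* register numbers >= 16 mean the alternate bank */

struct reg_write {
	uint8_t reg;      /* register number within its bank (assumed 0-15) */
	int     alt_bank; /* non-zero if the entry addressed the alternate bank */
	uint8_t value;    /* value to load into the register */
};

/*
 * Decode at most max entries from a table of (register, value) pairs
 * terminated by 255. Returns the number of entries decoded.
 */
size_t decode_reg_table(const uint8_t *table, struct reg_write *out, size_t max)
{
	size_t n = 0;

	assert(table != NULL && out != NULL);
	while (n < max && table[0] != TABLE_END) {
		out[n].alt_bank = table[0] >= ALT_BANK_BASE;
		out[n].reg = out[n].alt_bank ? table[0] - ALT_BANK_BASE : table[0];
		out[n].value = table[1];
		table += 2; /* step over one (register, value) pair */
		n++;
	}
	return n;
}
```

<para>
A table such as { 1, 0x12, 16 + 7, 0x5A, 255 } would decode to two
entries, the second one addressing register 7 of the alternate bank.
</para>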
220 <para>
221 The driver provides a pair of public tables. The
222 z8530_hdlc_kilostream table is for the UK 'Kilostream' service and
223 also happens to cover most other end host configurations. The
224 z8530_hdlc_kilostream_85230 table is the same configuration using
225 the enhancements of the 85230 chip. The configuration loaded is
226 standard NRZ encoded synchronous data with HDLC bitstuffing. All
227 of the timing is taken from the other end of the link.
228 </para>
229 <para>
230 When writing your own tables be aware that the driver internally
231 tracks register values. It may need to reload values. You should
232 therefore be sure to set registers 1-7, 9-11, 14 and 15 in all
233 configurations. Where the register settings depend on DMA selection
234 the driver will update the bits itself when you open or close.
235 Loading a new table with the interface open is not recommended.
236 </para>
237 <para>
238 There are three standard configurations supported by the core
239 code. In PIO mode the interface is programmed up to use
240 interrupt driven PIO. This places high demands on the host processor
241 to avoid latency. The driver is written to take account of latency
242 issues but it cannot avoid latencies caused by other drivers,
243 notably IDE in PIO mode. Because the drivers allocate buffers you
244 must also prevent MTU changes while the port is open.
245 </para>
246 <para>
247 Once the port is open it will call the rx_function of each channel
248 whenever a completed packet arrives. This is invoked from
249 interrupt context and passes you the channel and a network
250 buffer (struct sk_buff) holding the data. The data includes
251 the CRC bytes so most users will want to trim the last two
252 bytes before processing the data. This function is very timing
253 critical. When you wish to simply discard data the support
254 code provides the function <function>z8530_null_rx</function>
255 to discard the data.
256 </para>
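<para>
The CRC trimming step mentioned above amounts to simple length
arithmetic. The following userspace sketch shows it in isolation. The
helper name strip_crc and the constant HDLC_CRC_LEN are hypothetical,
and treating a frame of two bytes or fewer as empty is an assumption
made here, not something the text specifies.
</para>

```c
#include <assert.h>
#include <stddef.h>

#define HDLC_CRC_LEN 2 /* the text says the last two bytes are the CRC */

/*
 * Return the payload length of a received frame once the trailing CRC
 * bytes are dropped. Frames too short to hold a CRC yield 0.
 */
size_t strip_crc(const unsigned char *frame, size_t frame_len)
{
	assert(frame != NULL);
	return frame_len > HDLC_CRC_LEN ? frame_len - HDLC_CRC_LEN : 0;
}
```

<para>
In a real rx_function working on a struct sk_buff, the equivalent step
would typically be a call along the lines of
skb_trim(skb, skb->len - 2) before the buffer is passed up the stack.
</para>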
257 <para>
258 To activate PIO mode sending and receiving, the <function>
259 z8530_sync_open</function> function is called. This expects to be passed
260 the network device and the channel. Typically this is called from
261 your network device open callback. On a failure a non zero error
262 status is returned. The <function>z8530_sync_close</function>
263 function shuts down a PIO channel. This must be done before the
264 channel is opened again and before the driver shuts down
265 and unloads.
266 </para>
267 <para>
268 The ideal mode of operation is dual channel DMA mode. Here the
269 kernel driver will configure the board for DMA in both directions.
270 The driver also handles ISA DMA issues such as controller
271 programming and the memory range limit for you. This mode is
272 activated by calling the <function>z8530_sync_dma_open</function>
273 function. On failure a non zero error value is returned.
274 Once this mode is activated it can be shut down by calling the
275 <function>z8530_sync_dma_close</function>. You must call the close
276 function matching the open mode you used.
277 </para>
278 <para>
279 The final supported mode uses a single DMA channel to drive the
280 transmit side. As the Z85C30 has a larger FIFO on the receive
281 channel this tends to increase the maximum speed a little.
282 This is activated by calling the <function>z8530_sync_txdma_open
283 </function>. This returns a non zero error code on failure. The
284 <function>z8530_sync_txdma_close</function> function closes down
285 the Z8530 interface from this mode.
286 </para>
287 </chapter>
288
289 <chapter id="Network_Layer_Functions">
290 <title>Network Layer Functions</title>
291 <para>
292 The Z8530 layer provides functions to queue packets for
293 transmission. The driver internally buffers the frame currently
294 being transmitted and one further frame (in order to keep back
295 to back transmission running). Any further buffering is up to
296 the caller.
297 </para>
298 <para>
299 The function <function>z8530_queue_xmit</function> takes a network
300 buffer in sk_buff format and queues it for transmission. The
301 caller must provide the entire packet with the exception of the
302 bitstuffing and CRC. This is normally done by the caller via
303 the generic HDLC interface layer. It returns 0 if the buffer has been
304 queued and non zero values for queue full. If the function accepts
305 the buffer it becomes property of the Z8530 layer and the caller
306 should not free it.
307 </para>
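<para>
The buffering contract described above (one frame in flight, one more
queued behind it, zero returned on acceptance and non zero when full)
can be modelled in a few lines of userspace C. This is a toy model
written for illustration only. struct tx_model, model_queue_xmit and
model_tx_done are hypothetical names and do not reflect how z85230.c
is actually implemented.
</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the frame being sent plus one frame waiting. */
struct tx_model {
	const void *in_flight; /* frame currently being transmitted */
	const void *next;      /* frame queued for back to back sending */
};

/* Returns 0 if the frame was accepted, non-zero if both slots are full. */
int model_queue_xmit(struct tx_model *q, const void *frame)
{
	assert(q != NULL && frame != NULL);
	if (q->in_flight == NULL) {
		q->in_flight = frame;
		return 0;
	}
	if (q->next == NULL) {
		q->next = frame;
		return 0;
	}
	return 1; /* queue full: the caller keeps ownership of the frame */
}

/* Transmission finished: promote the waiting frame, if any. */
void model_tx_done(struct tx_model *q)
{
	assert(q != NULL);
	q->in_flight = q->next;
	q->next = NULL;
}
```

<para>
With this model a third frame is refused until model_tx_done has run
once, which mirrors the point that any buffering beyond two frames is
left to the caller.
</para>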
308 <para>
309 The function <function>z8530_get_stats</function> returns a pointer
310 to an internally maintained per interface statistics block. This
311 provides most of the interface code needed to implement the network
312 layer get_stats callback.
313 </para>
314 </chapter>
315
316 <chapter id="Porting_The_Z8530_Driver">
317 <title>Porting The Z8530 Driver</title>
318 <para>
319 The Z8530 driver is written to be portable. In DMA mode it makes
320 assumptions about the use of ISA DMA. These are probably warranted
321 in most cases as the Z85230 in particular was designed to glue to PC
322 type machines. The PIO mode makes no real assumptions.
323 </para>
324 <para>
325 Should you need to retarget the Z8530 driver to another architecture
326 the only code that should need changing is the port I/O functions.
327 At the moment these assume PC I/O port accesses. This may not be
328 appropriate for all platforms. Replacing
329 <function>z8530_read_port</function> and <function>z8530_write_port
330 </function> is intended to be all that is required to port this
331 driver layer.
332 </para>
333 </chapter>
334
335 <chapter id="bugs">
336 <title>Known Bugs And Assumptions</title>
337 <para>
338 <variablelist>
339 <varlistentry><term>Interrupt Locking</term>
340 <listitem>
341 <para>
342 The locking in the driver is done via the global cli/sti lock. This
343 makes for relatively poor SMP performance. Switching this to use a
344 per device spin lock would probably materially improve performance.
345 </para>
346 </listitem></varlistentry>
347
348 <varlistentry><term>Occasional Failures</term>
349 <listitem>
350 <para>
351 We have reports of occasional failures when run for very long
352 periods of time and the driver starts to receive junk frames. At
353 the moment the cause of this is not clear.
354 </para>
355 </listitem></varlistentry>
356 </variablelist>
357
358 </para>
359 </chapter>
360
361 <chapter id="pubfunctions">
362 <title>Public Functions Provided</title>
363!Edrivers/net/wan/z85230.c
364 </chapter>
365
366 <chapter id="intfunctions">
367 <title>Internal Functions</title>
368!Idrivers/net/wan/z85230.c
369 </chapter>
370
371</book>
diff --git a/Documentation/Makefile b/Documentation/Makefile
index c2a469112c37..a42320385df3 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1 +1,126 @@
1# -*- makefile -*-
2# Makefile for Sphinx documentation
3#
4
1subdir-y := 5subdir-y :=
6
7# You can set these variables from the command line.
8SPHINXBUILD = sphinx-build
9SPHINXOPTS =
10SPHINXDIRS = .
11_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py))
12SPHINX_CONF = conf.py
13PAPER =
14BUILDDIR = $(obj)/output
15PDFLATEX = xelatex
16LATEXOPTS = -interaction=batchmode
17
18# User-friendly check for sphinx-build
19HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
20
21ifeq ($(HAVE_SPHINX),0)
22
23.DEFAULT:
24 $(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
25 @echo " SKIP Sphinx $@ target."
26
27else # HAVE_SPHINX
28
29# User-friendly check for pdflatex
30HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
31
32# Internal variables.
33PAPEROPT_a4 = -D latex_paper_size=a4
34PAPEROPT_letter = -D latex_paper_size=letter
35KERNELDOC = $(srctree)/scripts/kernel-doc
36KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
37ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
38# the i18n builder cannot share the environment and doctrees with the others
39I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
40
41# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
42loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
43
44# $2 sphinx builder e.g. "html"
45# $3 name of the build subfolder / e.g. "media", used as:
46# * dest folder relative to $(BUILDDIR) and
47# * cache folder relative to $(BUILDDIR)/.doctrees
48# $4 dest subfolder e.g. "man" for man pages at media/man
49# $5 reST source folder relative to $(srctree)/$(src),
50# e.g. "media" for the linux-tv book-set at ./Documentation/media
51
52quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
53 cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \
54 PYTHONDONTWRITEBYTECODE=1 \
55 BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
56 $(SPHINXBUILD) \
57 -b $2 \
58 -c $(abspath $(srctree)/$(src)) \
59 -d $(abspath $(BUILDDIR)/.doctrees/$3) \
60 -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
61 $(ALLSPHINXOPTS) \
62 $(abspath $(srctree)/$(src)/$5) \
63 $(abspath $(BUILDDIR)/$3/$4)
64
65htmldocs:
66 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
67
68linkcheckdocs:
69 @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
70
71latexdocs:
72 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
73
74ifeq ($(HAVE_PDFLATEX),0)
75
76pdfdocs:
77 $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
78 @echo " SKIP Sphinx $@ target."
79
80else # HAVE_PDFLATEX
81
82pdfdocs: latexdocs
83 $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
84
85endif # HAVE_PDFLATEX
86
87epubdocs:
88 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
89
90xmldocs:
91 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
92
93endif # HAVE_SPHINX
94
95# The following targets are independent of HAVE_SPHINX, and the rules should
96# work or silently pass without Sphinx.
97
98# no-ops for the Sphinx toolchain
99sgmldocs:
100 @:
101psdocs:
102 @:
103mandocs:
104 @:
105installmandocs:
106 @:
107
108cleandocs:
109 $(Q)rm -rf $(BUILDDIR)
110 $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean
111
112dochelp:
113 @echo ' Linux kernel internal documentation in different formats from ReST:'
114 @echo ' htmldocs - HTML'
115 @echo ' latexdocs - LaTeX'
116 @echo ' pdfdocs - PDF'
117 @echo ' epubdocs - EPUB'
118 @echo ' xmldocs - XML'
119 @echo ' linkcheckdocs - check for broken external links (will connect to external hosts)'
120 @echo ' cleandocs - clean all generated files'
121 @echo
122 @echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
123 @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
124 @echo
125 @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
126 @echo ' configuration. This is e.g. useful to build with nit-picking config.'
diff --git a/Documentation/Makefile.sphinx b/Documentation/Makefile.sphinx
deleted file mode 100644
index bcf529f6cf9b..000000000000
--- a/Documentation/Makefile.sphinx
+++ /dev/null
@@ -1,130 +0,0 @@
1# -*- makefile -*-
2# Makefile for Sphinx documentation
3#
4
5# You can set these variables from the command line.
6SPHINXBUILD = sphinx-build
7SPHINXOPTS =
8SPHINXDIRS = .
9_SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py))
10SPHINX_CONF = conf.py
11PAPER =
12BUILDDIR = $(obj)/output
13PDFLATEX = xelatex
14LATEXOPTS = -interaction=batchmode
15
16# User-friendly check for sphinx-build
17HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi)
18
19ifeq ($(HAVE_SPHINX),0)
20
21.DEFAULT:
22 $(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.)
23 @echo " SKIP Sphinx $@ target."
24
25else ifneq ($(DOCBOOKS),)
26
27# Skip Sphinx build if the user explicitly requested DOCBOOKS.
28.DEFAULT:
29 @echo " SKIP Sphinx $@ target (DOCBOOKS specified)."
30
31else # HAVE_SPHINX
32
33# User-friendly check for pdflatex
34HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi)
35
36# Internal variables.
37PAPEROPT_a4 = -D latex_paper_size=a4
38PAPEROPT_letter = -D latex_paper_size=letter
39KERNELDOC = $(srctree)/scripts/kernel-doc
40KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC)
41ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS)
42# the i18n builder cannot share the environment and doctrees with the others
43I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
44
45# commands; the 'cmd' from scripts/Kbuild.include is not *loopable*
46loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit;
47
48# $2 sphinx builder e.g. "html"
49# $3 name of the build subfolder / e.g. "media", used as:
50# * dest folder relative to $(BUILDDIR) and
51# * cache folder relative to $(BUILDDIR)/.doctrees
52# $4 dest subfolder e.g. "man" for man pages at media/man
53# $5 reST source folder relative to $(srctree)/$(src),
54# e.g. "media" for the linux-tv book-set at ./Documentation/media
55
56quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
57 cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \
58 PYTHONDONTWRITEBYTECODE=1 \
59 BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \
60 $(SPHINXBUILD) \
61 -b $2 \
62 -c $(abspath $(srctree)/$(src)) \
63 -d $(abspath $(BUILDDIR)/.doctrees/$3) \
64 -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
65 $(ALLSPHINXOPTS) \
66 $(abspath $(srctree)/$(src)/$5) \
67 $(abspath $(BUILDDIR)/$3/$4)
68
69htmldocs:
70 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var)))
71
72linkcheckdocs:
73 @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var)))
74
75latexdocs:
76 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var)))
77
78ifeq ($(HAVE_PDFLATEX),0)
79
80pdfdocs:
81 $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.)
82 @echo " SKIP Sphinx $@ target."
83
84else # HAVE_PDFLATEX
85
86pdfdocs: latexdocs
87 $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;)
88
89endif # HAVE_PDFLATEX
90
91epubdocs:
92 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var)))
93
94xmldocs:
95 @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var)))
96
97endif # HAVE_SPHINX
98
99# The following targets are independent of HAVE_SPHINX, and the rules should
100# work or silently pass without Sphinx.
101
102# no-ops for the Sphinx toolchain
103sgmldocs:
104 @:
105psdocs:
106 @:
107mandocs:
108 @:
109installmandocs:
110 @:
111
112cleandocs:
113 $(Q)rm -rf $(BUILDDIR)
114 $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean
115
116dochelp:
117 @echo ' Linux kernel internal documentation in different formats (Sphinx):'
118 @echo ' htmldocs - HTML'
119 @echo ' latexdocs - LaTeX'
120 @echo ' pdfdocs - PDF'
121 @echo ' epubdocs - EPUB'
122 @echo ' xmldocs - XML'
123 @echo ' linkcheckdocs - check for broken external links (will connect to external hosts)'
124 @echo ' cleandocs - clean all generated files'
125 @echo
126 @echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2'
127 @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)'
128 @echo
129 @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
130 @echo ' configuration. This is e.g. useful to build with nit-picking config.'
diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 1e37138027a3..618e13d5e276 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -186,7 +186,7 @@ must disable interrupts while the lock is held. If the device sends
186a different interrupt, the driver will deadlock trying to recursively 186a different interrupt, the driver will deadlock trying to recursively
187acquire the spinlock. Such deadlocks can be avoided by using 187acquire the spinlock. Such deadlocks can be avoided by using
188spin_lock_irqsave() or spin_lock_irq() which disable local interrupts 188spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
189and acquire the lock (see Documentation/DocBook/kernel-locking). 189and acquire the lock (see Documentation/kernel-hacking/locking.rst).
190 190
1914.5 How to tell whether MSI/MSI-X is enabled on a device 1914.5 How to tell whether MSI/MSI-X is enabled on a device
192 192
diff --git a/Documentation/security/LoadPin.txt b/Documentation/admin-guide/LSM/LoadPin.rst
index e11877f5d3d4..32070762d24c 100644
--- a/Documentation/security/LoadPin.txt
+++ b/Documentation/admin-guide/LSM/LoadPin.rst
@@ -1,3 +1,7 @@
1=======
2LoadPin
3=======
4
1LoadPin is a Linux Security Module that ensures all kernel-loaded files 5LoadPin is a Linux Security Module that ensures all kernel-loaded files
2(modules, firmware, etc) all originate from the same filesystem, with 6(modules, firmware, etc) all originate from the same filesystem, with
3the expectation that such a filesystem is backed by a read-only device 7the expectation that such a filesystem is backed by a read-only device
@@ -5,13 +9,13 @@ such as dm-verity or CDROM. This allows systems that have a verified
5and/or unchangeable filesystem to enforce module and firmware loading 9and/or unchangeable filesystem to enforce module and firmware loading
6restrictions without needing to sign the files individually. 10restrictions without needing to sign the files individually.
7 11
8The LSM is selectable at build-time with CONFIG_SECURITY_LOADPIN, and 12The LSM is selectable at build-time with ``CONFIG_SECURITY_LOADPIN``, and
9can be controlled at boot-time with the kernel command line option 13can be controlled at boot-time with the kernel command line option
10"loadpin.enabled". By default, it is enabled, but can be disabled at 14"``loadpin.enabled``". By default, it is enabled, but can be disabled at
11boot ("loadpin.enabled=0"). 15boot ("``loadpin.enabled=0``").
12 16
13LoadPin starts pinning when it sees the first file loaded. If the 17LoadPin starts pinning when it sees the first file loaded. If the
14block device backing the filesystem is not read-only, a sysctl is 18block device backing the filesystem is not read-only, a sysctl is
15created to toggle pinning: /proc/sys/kernel/loadpin/enabled. (Having 19created to toggle pinning: ``/proc/sys/kernel/loadpin/enabled``. (Having
16a mutable filesystem means pinning is mutable too, but having the 20a mutable filesystem means pinning is mutable too, but having the
17sysctl allows for easy testing on systems with a mutable filesystem.) 21sysctl allows for easy testing on systems with a mutable filesystem.)
diff --git a/Documentation/security/SELinux.txt b/Documentation/admin-guide/LSM/SELinux.rst
index 07eae00f3314..f722c9b4173a 100644
--- a/Documentation/security/SELinux.txt
+++ b/Documentation/admin-guide/LSM/SELinux.rst
@@ -1,27 +1,33 @@
1=======
2SELinux
3=======
4
1If you want to use SELinux, chances are you will want 5If you want to use SELinux, chances are you will want
2to use the distro-provided policies, or install the 6to use the distro-provided policies, or install the
3latest reference policy release from 7latest reference policy release from
8
4 http://oss.tresys.com/projects/refpolicy 9 http://oss.tresys.com/projects/refpolicy
5 10
6However, if you want to install a dummy policy for 11However, if you want to install a dummy policy for
7testing, you can do using 'mdp' provided under 12testing, you can do using ``mdp`` provided under
8scripts/selinux. Note that this requires the selinux 13scripts/selinux. Note that this requires the selinux
9userspace to be installed - in particular you will 14userspace to be installed - in particular you will
10need checkpolicy to compile a kernel, and setfiles and 15need checkpolicy to compile a kernel, and setfiles and
11fixfiles to label the filesystem. 16fixfiles to label the filesystem.
12 17
13 1. Compile the kernel with selinux enabled. 18 1. Compile the kernel with selinux enabled.
14 2. Type 'make' to compile mdp. 19 2. Type ``make`` to compile ``mdp``.
15 3. Make sure that you are not running with 20 3. Make sure that you are not running with
16 SELinux enabled and a real policy. If 21 SELinux enabled and a real policy. If
17 you are, reboot with selinux disabled 22 you are, reboot with selinux disabled
18 before continuing. 23 before continuing.
19 4. Run install_policy.sh: 24 4. Run install_policy.sh::
25
20 cd scripts/selinux 26 cd scripts/selinux
21 sh install_policy.sh 27 sh install_policy.sh
22 28
23Step 4 will create a new dummy policy valid for your 29Step 4 will create a new dummy policy valid for your
24kernel, with a single selinux user, role, and type. 30kernel, with a single selinux user, role, and type.
25It will compile the policy, will set your SELINUXTYPE to 31It will compile the policy, will set your ``SELINUXTYPE`` to
26dummy in /etc/selinux/config, install the compiled policy 32``dummy`` in ``/etc/selinux/config``, install the compiled policy
27as 'dummy', and relabel your filesystem. 33as ``dummy``, and relabel your filesystem.
diff --git a/Documentation/security/Smack.txt b/Documentation/admin-guide/LSM/Smack.rst
index 945cc633d883..6a5826a13aea 100644
--- a/Documentation/security/Smack.txt
+++ b/Documentation/admin-guide/LSM/Smack.rst
@@ -1,3 +1,6 @@
1=====
2Smack
3=====
1 4
2 5
3 "Good for you, you've decided to clean the elevator!" 6 "Good for you, you've decided to clean the elevator!"
@@ -14,6 +17,7 @@ available to determine which is best suited to the problem
14at hand. 17at hand.
15 18
16Smack consists of three major components: 19Smack consists of three major components:
20
17 - The kernel 21 - The kernel
18 - Basic utilities, which are helpful but not required 22 - Basic utilities, which are helpful but not required
19 - Configuration data 23 - Configuration data
@@ -39,16 +43,24 @@ The current git repository for Smack user space is:
39This should make and install on most modern distributions. 43This should make and install on most modern distributions.
40There are five commands included in smackutil: 44There are five commands included in smackutil:
41 45
42chsmack - display or set Smack extended attribute values 46chsmack:
43smackctl - load the Smack access rules 47 display or set Smack extended attribute values
44smackaccess - report if a process with one label has access 48
45 to an object with another 49smackctl:
50 load the Smack access rules
51
52smackaccess:
53 report if a process with one label has access
54 to an object with another
46 55
47These two commands are obsolete with the introduction of 56These two commands are obsolete with the introduction of
48the smackfs/load2 and smackfs/cipso2 interfaces. 57the smackfs/load2 and smackfs/cipso2 interfaces.
49 58
50smackload - properly formats data for writing to smackfs/load 59smackload:
51smackcipso - properly formats data for writing to smackfs/cipso 60 properly formats data for writing to smackfs/load
61
62smackcipso:
63 properly formats data for writing to smackfs/cipso
52 64
53In keeping with the intent of Smack, configuration data is 65In keeping with the intent of Smack, configuration data is
54minimal and not strictly required. The most important 66minimal and not strictly required. The most important
@@ -56,15 +68,15 @@ configuration step is mounting the smackfs pseudo filesystem.
56If smackutil is installed the startup script will take care 68If smackutil is installed the startup script will take care
57of this, but it can be manually as well. 69of this, but it can be manually as well.
58 70
59Add this line to /etc/fstab: 71Add this line to ``/etc/fstab``::
60 72
61 smackfs /sys/fs/smackfs smackfs defaults 0 0 73 smackfs /sys/fs/smackfs smackfs defaults 0 0
62 74
63The /sys/fs/smackfs directory is created by the kernel. 75The ``/sys/fs/smackfs`` directory is created by the kernel.
64 76
65Smack uses extended attributes (xattrs) to store labels on filesystem 77Smack uses extended attributes (xattrs) to store labels on filesystem
66objects. The attributes are stored in the extended attribute security 78objects. The attributes are stored in the extended attribute security
67name space. A process must have CAP_MAC_ADMIN to change any of these 79name space. A process must have ``CAP_MAC_ADMIN`` to change any of these
68attributes. 80attributes.
69 81
70The extended attributes that Smack uses are: 82The extended attributes that Smack uses are:
@@ -73,14 +85,17 @@ SMACK64
73 Used to make access control decisions. In almost all cases 85 Used to make access control decisions. In almost all cases
74 the label given to a new filesystem object will be the label 86 the label given to a new filesystem object will be the label
75 of the process that created it. 87 of the process that created it.
88
76SMACK64EXEC 89SMACK64EXEC
77 The Smack label of a process that execs a program file with 90 The Smack label of a process that execs a program file with
78 this attribute set will run with this attribute's value. 91 this attribute set will run with this attribute's value.
92
79SMACK64MMAP 93SMACK64MMAP
80 Don't allow the file to be mmapped by a process whose Smack 94 Don't allow the file to be mmapped by a process whose Smack
81 label does not allow all of the access permitted to a process 95 label does not allow all of the access permitted to a process
82 with the label contained in this attribute. This is a very 96 with the label contained in this attribute. This is a very
83 specific use case for shared libraries. 97 specific use case for shared libraries.
98
84SMACK64TRANSMUTE 99SMACK64TRANSMUTE
85 Can only have the value "TRUE". If this attribute is present 100 Can only have the value "TRUE". If this attribute is present
86 on a directory when an object is created in the directory and 101 on a directory when an object is created in the directory and
@@ -89,27 +104,29 @@ SMACK64TRANSMUTE
89 gets the label of the directory instead of the label of the 104 gets the label of the directory instead of the label of the
90 creating process. If the object being created is a directory 105 creating process. If the object being created is a directory
91 the SMACK64TRANSMUTE attribute is set as well. 106 the SMACK64TRANSMUTE attribute is set as well.
107
92SMACK64IPIN 108SMACK64IPIN
93 This attribute is only available on file descriptors for sockets. 109 This attribute is only available on file descriptors for sockets.
94 Use the Smack label in this attribute for access control 110 Use the Smack label in this attribute for access control
95 decisions on packets being delivered to this socket. 111 decisions on packets being delivered to this socket.
112
96SMACK64IPOUT 113SMACK64IPOUT
97 This attribute is only available on file descriptors for sockets. 114 This attribute is only available on file descriptors for sockets.
98 Use the Smack label in this attribute for access control 115 Use the Smack label in this attribute for access control
99 decisions on packets coming from this socket. 116 decisions on packets coming from this socket.
100 117
101There are multiple ways to set a Smack label on a file: 118There are multiple ways to set a Smack label on a file::
102 119
103 # attr -S -s SMACK64 -V "value" path 120 # attr -S -s SMACK64 -V "value" path
104 # chsmack -a value path 121 # chsmack -a value path
105 122
106A process can see the Smack label it is running with by 123A process can see the Smack label it is running with by
107reading /proc/self/attr/current. A process with CAP_MAC_ADMIN 124reading ``/proc/self/attr/current``. A process with ``CAP_MAC_ADMIN``
108can set the process Smack label by writing there. 125can set the process Smack label by writing there.
109 126
110Most Smack configuration is accomplished by writing to files 127Most Smack configuration is accomplished by writing to files
111in the smackfs filesystem. This pseudo-filesystem is mounted 128in the smackfs filesystem. This pseudo-filesystem is mounted
112on /sys/fs/smackfs. 129on ``/sys/fs/smackfs``.
113 130
114access 131access
115 Provided for backward compatibility. The access2 interface 132 Provided for backward compatibility. The access2 interface
@@ -120,6 +137,7 @@ access
120 this file. The next read will indicate whether the access 137 this file. The next read will indicate whether the access
121 would be permitted. The text will be either "1" indicating 138 would be permitted. The text will be either "1" indicating
122 access, or "0" indicating denial. 139 access, or "0" indicating denial.
140
123access2 141access2
124 This interface reports whether a subject with the specified 142 This interface reports whether a subject with the specified
125 Smack label has a particular access to an object with a 143 Smack label has a particular access to an object with a
@@ -127,13 +145,17 @@ access2
127 this file. The next read will indicate whether the access 145 this file. The next read will indicate whether the access
128 would be permitted. The text will be either "1" indicating 146 would be permitted. The text will be either "1" indicating
129 access, or "0" indicating denial. 147 access, or "0" indicating denial.
148
130ambient 149ambient
131 This contains the Smack label applied to unlabeled network 150 This contains the Smack label applied to unlabeled network
132 packets. 151 packets.
152
133change-rule 153change-rule
134 This interface allows modification of existing access control rules. 154 This interface allows modification of existing access control rules.
135 The format accepted on write is: 155 The format accepted on write is::
156
136 "%s %s %s %s" 157 "%s %s %s %s"
158
137 where the first string is the subject label, the second the 159 where the first string is the subject label, the second the
138 object label, the third the access to allow and the fourth the 160 object label, the third the access to allow and the fourth the
139 access to deny. The access strings may contain only the characters 161 access to deny. The access strings may contain only the characters
@@ -141,47 +163,63 @@ change-rule
141 modified by enabling the permissions in the third string and disabling 163 modified by enabling the permissions in the third string and disabling
142 those in the fourth string. If there is no such rule it will be 164 those in the fourth string. If there is no such rule it will be
143 created using the access specified in the third and the fourth strings. 165 created using the access specified in the third and the fourth strings.
166
144cipso 167cipso
145 Provided for backward compatibility. The cipso2 interface 168 Provided for backward compatibility. The cipso2 interface
146 is preferred and should be used instead. 169 is preferred and should be used instead.
147 This interface allows a specific CIPSO header to be assigned 170 This interface allows a specific CIPSO header to be assigned
148 to a Smack label. The format accepted on write is: 171 to a Smack label. The format accepted on write is::
172
149 "%24s%4d%4d"["%4d"]... 173 "%24s%4d%4d"["%4d"]...
174
150 The first string is a fixed Smack label. The first number is 175 The first string is a fixed Smack label. The first number is
151 the level to use. The second number is the number of categories. 176 the level to use. The second number is the number of categories.
152 The following numbers are the categories. 177 The following numbers are the categories::
153 "level-3-cats-5-19 3 2 5 19" 178
179 "level-3-cats-5-19 3 2 5 19"
180
154cipso2 181cipso2
155 This interface allows a specific CIPSO header to be assigned 182 This interface allows a specific CIPSO header to be assigned
156 to a Smack label. The format accepted on write is: 183 to a Smack label. The format accepted on write is::
157 "%s%4d%4d"["%4d"]... 184
185 "%s%4d%4d"["%4d"]...
186
158 The first string is a long Smack label. The first number is 187 The first string is a long Smack label. The first number is
159 the level to use. The second number is the number of categories. 188 the level to use. The second number is the number of categories.
160 The following numbers are the categories. 189 The following numbers are the categories::
161 "level-3-cats-5-19 3 2 5 19" 190
191 "level-3-cats-5-19 3 2 5 19"
192
162direct 193direct
163 This contains the CIPSO level used for Smack direct label 194 This contains the CIPSO level used for Smack direct label
164 representation in network packets. 195 representation in network packets.
196
165doi 197doi
166 This contains the CIPSO domain of interpretation used in 198 This contains the CIPSO domain of interpretation used in
167 network packets. 199 network packets.
200
168ipv6host 201ipv6host
169 This interface allows specific IPv6 internet addresses to be 202 This interface allows specific IPv6 internet addresses to be
170 treated as single label hosts. Packets are sent to single 203 treated as single label hosts. Packets are sent to single
171 label hosts only from processes that have Smack write access 204 label hosts only from processes that have Smack write access
172 to the host label. All packets received from single label hosts 205 to the host label. All packets received from single label hosts
173 are given the specified label. The format accepted on write is: 206 are given the specified label. The format accepted on write is::
207
174 "%h:%h:%h:%h:%h:%h:%h:%h label" or 208 "%h:%h:%h:%h:%h:%h:%h:%h label" or
175 "%h:%h:%h:%h:%h:%h:%h:%h/%d label". 209 "%h:%h:%h:%h:%h:%h:%h:%h/%d label".
210
176 The "::" address shortcut is not supported. 211 The "::" address shortcut is not supported.
177 If label is "-DELETE" a matched entry will be deleted. 212 If label is "-DELETE" a matched entry will be deleted.
213
178load 214load
179 Provided for backward compatibility. The load2 interface 215 Provided for backward compatibility. The load2 interface
180 is preferred and should be used instead. 216 is preferred and should be used instead.
181 This interface allows access control rules in addition to 217 This interface allows access control rules in addition to
182 the system defined rules to be specified. The format accepted 218 the system defined rules to be specified. The format accepted
183 on write is: 219 on write is::
220
184 "%24s%24s%5s" 221 "%24s%24s%5s"
222
185 where the first string is the subject label, the second the 223 where the first string is the subject label, the second the
186 object label, and the third the requested access. The access 224 object label, and the third the requested access. The access
187 string may contain only the characters "rwxat-", and specifies 225 string may contain only the characters "rwxat-", and specifies
@@ -189,17 +227,21 @@ load
189 permissions that are not allowed. The string "r-x--" would 227 permissions that are not allowed. The string "r-x--" would
190 specify read and execute access. Labels are limited to 23 228 specify read and execute access. Labels are limited to 23
191 characters in length. 229 characters in length.
230
192load2 231load2
193 This interface allows access control rules in addition to 232 This interface allows access control rules in addition to
194 the system defined rules to be specified. The format accepted 233 the system defined rules to be specified. The format accepted
195 on write is: 234 on write is::
235
196 "%s %s %s" 236 "%s %s %s"
237
197 where the first string is the subject label, the second the 238 where the first string is the subject label, the second the
198 object label, and the third the requested access. The access 239 object label, and the third the requested access. The access
199 string may contain only the characters "rwxat-", and specifies 240 string may contain only the characters "rwxat-", and specifies
200 which sort of access is allowed. The "-" is a placeholder for 241 which sort of access is allowed. The "-" is a placeholder for
201 permissions that are not allowed. The string "r-x--" would 242 permissions that are not allowed. The string "r-x--" would
202 specify read and execute access. 243 specify read and execute access.
244
203load-self 245load-self
204 Provided for backward compatibility. The load-self2 interface 246 Provided for backward compatibility. The load-self2 interface
205 is preferred and should be used instead. 247 is preferred and should be used instead.
@@ -208,66 +250,83 @@ load-self
208 otherwise be permitted, and are intended to provide additional 250 otherwise be permitted, and are intended to provide additional
209 restrictions on the process. The format is the same as for 251 restrictions on the process. The format is the same as for
210 the load interface. 252 the load interface.
253
211load-self2 254load-self2
212 This interface allows process specific access rules to be 255 This interface allows process specific access rules to be
213 defined. These rules are only consulted if access would 256 defined. These rules are only consulted if access would
214 otherwise be permitted, and are intended to provide additional 257 otherwise be permitted, and are intended to provide additional
215 restrictions on the process. The format is the same as for 258 restrictions on the process. The format is the same as for
216 the load2 interface. 259 the load2 interface.
260
217logging 261logging
218 This contains the Smack logging state. 262 This contains the Smack logging state.
263
219mapped 264mapped
220 This contains the CIPSO level used for Smack mapped label 265 This contains the CIPSO level used for Smack mapped label
221 representation in network packets. 266 representation in network packets.
267
222netlabel 268netlabel
223 This interface allows specific internet addresses to be 269 This interface allows specific internet addresses to be
224 treated as single label hosts. Packets are sent to single 270 treated as single label hosts. Packets are sent to single
225 label hosts without CIPSO headers, but only from processes 271 label hosts without CIPSO headers, but only from processes
226 that have Smack write access to the host label. All packets 272 that have Smack write access to the host label. All packets
227 received from single label hosts are given the specified 273 received from single label hosts are given the specified
228 label. The format accepted on write is: 274 label. The format accepted on write is::
275
229 "%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label". 276 "%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
277
230 If the label specified is "-CIPSO" the address is treated 278 If the label specified is "-CIPSO" the address is treated
231 as a host that supports CIPSO headers. 279 as a host that supports CIPSO headers.
280
232onlycap 281onlycap
233 This contains labels processes must have for CAP_MAC_ADMIN 282 This contains labels processes must have for CAP_MAC_ADMIN
234 and CAP_MAC_OVERRIDE to be effective. If this file is empty 283 and ``CAP_MAC_OVERRIDE`` to be effective. If this file is empty
235 these capabilities are effective for processes with any 284 these capabilities are effective for processes with any
236 label. The values are set by writing the desired labels, separated 285 label. The values are set by writing the desired labels, separated
237 by spaces, to the file or cleared by writing "-" to the file. 286 by spaces, to the file or cleared by writing "-" to the file.
287
238ptrace 288ptrace
239 This is used to define the current ptrace policy 289 This is used to define the current ptrace policy
240 0 - default: this is the policy that relies on Smack access rules. 290
241 For the PTRACE_READ a subject needs to have a read access on 291 0 - default:
242 object. For the PTRACE_ATTACH a read-write access is required. 292 this is the policy that relies on Smack access rules.
243 1 - exact: this is the policy that limits PTRACE_ATTACH. Attach is 293 For the ``PTRACE_READ`` a subject needs to have a read access on
294 object. For the ``PTRACE_ATTACH`` a read-write access is required.
295
296 1 - exact:
297 this is the policy that limits ``PTRACE_ATTACH``. Attach is
244 only allowed when subject's and object's labels are equal. 298 only allowed when subject's and object's labels are equal.
245 PTRACE_READ is not affected. Can be overridden with CAP_SYS_PTRACE. 299 ``PTRACE_READ`` is not affected. Can be overridden with ``CAP_SYS_PTRACE``.
246 2 - draconian: this policy behaves like the 'exact' above with an 300
247 exception that it can't be overridden with CAP_SYS_PTRACE. 301 2 - draconian:
302 this policy behaves like the 'exact' above with an
303 exception that it can't be overridden with ``CAP_SYS_PTRACE``.
304
248revoke-subject 305revoke-subject
249 Writing a Smack label here sets the access to '-' for all access 306 Writing a Smack label here sets the access to '-' for all access
250 rules with that subject label. 307 rules with that subject label.
308
251unconfined 309unconfined
252 If the kernel is configured with CONFIG_SECURITY_SMACK_BRINGUP 310 If the kernel is configured with ``CONFIG_SECURITY_SMACK_BRINGUP``
253 a process with CAP_MAC_ADMIN can write a label into this interface. 311 a process with ``CAP_MAC_ADMIN`` can write a label into this interface.
254 Thereafter, accesses that involve that label will be logged and 312 Thereafter, accesses that involve that label will be logged and
255 the access permitted if it wouldn't be otherwise. Note that this 313 the access permitted if it wouldn't be otherwise. Note that this
256 is dangerous and can ruin the proper labeling of your system. 314 is dangerous and can ruin the proper labeling of your system.
257 It should never be used in production. 315 It should never be used in production.
316
258relabel-self 317relabel-self
259 This interface contains a list of labels to which the process can 318 This interface contains a list of labels to which the process can
260 transition, by writing to /proc/self/attr/current. 319 transition, by writing to ``/proc/self/attr/current``.
261 Normally a process can change its own label to any legal value, but only 320 Normally a process can change its own label to any legal value, but only
262 if it has CAP_MAC_ADMIN. This interface allows a process without 321 if it has ``CAP_MAC_ADMIN``. This interface allows a process without
263 CAP_MAC_ADMIN to relabel itself to one of labels from predefined list. 322 ``CAP_MAC_ADMIN`` to relabel itself to one of labels from predefined list.
264 A process without CAP_MAC_ADMIN can change its label only once. When it 323 A process without ``CAP_MAC_ADMIN`` can change its label only once. When it
265 does, this list will be cleared. 324 does, this list will be cleared.
266 The values are set by writing the desired labels, separated 325 The values are set by writing the desired labels, separated
267 by spaces, to the file or cleared by writing "-" to the file. 326 by spaces, to the file or cleared by writing "-" to the file.
268 327
269If you are using the smackload utility 328If you are using the smackload utility
270you can add access rules in /etc/smack/accesses. They take the form: 329you can add access rules in ``/etc/smack/accesses``. They take the form::
271 330
272 subjectlabel objectlabel access 331 subjectlabel objectlabel access
273 332
@@ -277,14 +336,14 @@ object with objectlabel. If there is no rule no access is allowed.
277 336
278Look for additional programs on http://schaufler-ca.com 337Look for additional programs on http://schaufler-ca.com
279 338
280From the Smack Whitepaper: 339The Simplified Mandatory Access Control Kernel (Whitepaper)
281 340===========================================================
282The Simplified Mandatory Access Control Kernel
283 341
284Casey Schaufler 342Casey Schaufler
285casey@schaufler-ca.com 343casey@schaufler-ca.com
286 344
287Mandatory Access Control 345Mandatory Access Control
346------------------------
288 347
289Computer systems employ a variety of schemes to constrain how information is 348Computer systems employ a variety of schemes to constrain how information is
290shared among the people and services using the machine. Some of these schemes 349shared among the people and services using the machine. Some of these schemes
@@ -297,6 +356,7 @@ access control mechanisms because you don't have a choice regarding the users
297or programs that have access to pieces of data. 356or programs that have access to pieces of data.
298 357
299Bell & LaPadula 358Bell & LaPadula
359---------------
300 360
301From the middle of the 1980's until the turn of the century Mandatory Access 361From the middle of the 1980's until the turn of the century Mandatory Access
302Control (MAC) was very closely associated with the Bell & LaPadula security 362Control (MAC) was very closely associated with the Bell & LaPadula security
@@ -306,6 +366,7 @@ within the Capital Beltway and Scandinavian supercomputer centers but was
306often cited as failing to address general needs. 366often cited as failing to address general needs.
307 367
308Domain Type Enforcement 368Domain Type Enforcement
369-----------------------
309 370
310Around the turn of the century Domain Type Enforcement (DTE) became popular. 371Around the turn of the century Domain Type Enforcement (DTE) became popular.
311This scheme organizes users, programs, and data into domains that are 372This scheme organizes users, programs, and data into domains that are
@@ -316,6 +377,7 @@ necessary to provide a secure domain mapping leads to the scheme being
316disabled or used in limited ways in the majority of cases. 377disabled or used in limited ways in the majority of cases.
317 378
318Smack 379Smack
380-----
319 381
320Smack is a Mandatory Access Control mechanism designed to provide useful MAC 382Smack is a Mandatory Access Control mechanism designed to provide useful MAC
321while avoiding the pitfalls of its predecessors. The limitations of Bell & 383while avoiding the pitfalls of its predecessors. The limitations of Bell &
@@ -326,46 +388,55 @@ Enforcement and avoided by defining access controls in terms of the access
326modes already in use. 388modes already in use.
327 389
328Smack Terminology 390Smack Terminology
391-----------------
329 392
330The jargon used to talk about Smack will be familiar to those who have dealt 393The jargon used to talk about Smack will be familiar to those who have dealt
331with other MAC systems and shouldn't be too difficult for the uninitiated to 394with other MAC systems and shouldn't be too difficult for the uninitiated to
332pick up. There are four terms that are used in a specific way and that are 395pick up. There are four terms that are used in a specific way and that are
333especially important: 396especially important:
334 397
335 Subject: A subject is an active entity on the computer system. 398 Subject:
399 A subject is an active entity on the computer system.
336 On Smack a subject is a task, which is in turn the basic unit 400 On Smack a subject is a task, which is in turn the basic unit
337 of execution. 401 of execution.
338 402
339 Object: An object is a passive entity on the computer system. 403 Object:
404 An object is a passive entity on the computer system.
340 On Smack files of all types, IPC, and tasks can be objects. 405 On Smack files of all types, IPC, and tasks can be objects.
341 406
342 Access: Any attempt by a subject to put information into or get 407 Access:
408 Any attempt by a subject to put information into or get
343 information from an object is an access. 409 information from an object is an access.
344 410
345 Label: Data that identifies the Mandatory Access Control 411 Label:
412 Data that identifies the Mandatory Access Control
346 characteristics of a subject or an object. 413 characteristics of a subject or an object.
347 414
348These definitions are consistent with the traditional use in the security 415These definitions are consistent with the traditional use in the security
349community. There are also some terms from Linux that are likely to crop up: 416community. There are also some terms from Linux that are likely to crop up:
350 417
351 Capability: A task that possesses a capability has permission to 418 Capability:
419 A task that possesses a capability has permission to
352 violate an aspect of the system security policy, as identified by 420 violate an aspect of the system security policy, as identified by
353 the specific capability. A task that possesses one or more 421 the specific capability. A task that possesses one or more
354 capabilities is a privileged task, whereas a task with no 422 capabilities is a privileged task, whereas a task with no
355 capabilities is an unprivileged task. 423 capabilities is an unprivileged task.
356 424
357 Privilege: A task that is allowed to violate the system security 425 Privilege:
426 A task that is allowed to violate the system security
358 policy is said to have privilege. As of this writing a task can 427 policy is said to have privilege. As of this writing a task can
359 have privilege either by possessing capabilities or by having an 428 have privilege either by possessing capabilities or by having an
360 effective user of root. 429 effective user of root.
361 430
362Smack Basics 431Smack Basics
432------------
363 433
364Smack is an extension to a Linux system. It enforces additional restrictions 434Smack is an extension to a Linux system. It enforces additional restrictions
365on what subjects can access which objects, based on the labels attached to 435on what subjects can access which objects, based on the labels attached to
366each of the subject and the object. 436each of the subject and the object.
367 437
368Labels 438Labels
439~~~~~~
369 440
370Smack labels are ASCII character strings. They can be up to 255 characters 441Smack labels are ASCII character strings. They can be up to 255 characters
371long, but keeping them to twenty-three characters is recommended. 442long, but keeping them to twenty-three characters is recommended.
@@ -377,7 +448,7 @@ contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
377(quote) and '"' (double-quote) characters. 448(quote) and '"' (double-quote) characters.
378Smack labels cannot begin with a '-'. This is reserved for special options. 449Smack labels cannot begin with a '-'. This is reserved for special options.
379 450
380There are some predefined labels: 451There are some predefined labels::
381 452
382 _ Pronounced "floor", a single underscore character. 453 _ Pronounced "floor", a single underscore character.
383 ^ Pronounced "hat", a single circumflex character. 454 ^ Pronounced "hat", a single circumflex character.
@@ -390,14 +461,18 @@ of a process will usually be assigned by the system initialization
390mechanism. 461mechanism.
391 462
392Access Rules 463Access Rules
464~~~~~~~~~~~~
393 465
394Smack uses the traditional access modes of Linux. These modes are read, 466Smack uses the traditional access modes of Linux. These modes are read,
395execute, write, and occasionally append. There are a few cases where the 467execute, write, and occasionally append. There are a few cases where the
396access mode may not be obvious. These include: 468access mode may not be obvious. These include:
397 469
398 Signals: A signal is a write operation from the subject task to 470 Signals:
471 A signal is a write operation from the subject task to
399 the object task. 472 the object task.
400 Internet Domain IPC: Transmission of a packet is considered a 473
474 Internet Domain IPC:
475 Transmission of a packet is considered a
401 write operation from the source task to the destination task. 476 write operation from the source task to the destination task.
402 477
403Smack restricts access based on the label attached to a subject and the label 478Smack restricts access based on the label attached to a subject and the label
@@ -417,6 +492,7 @@ order:
417 7. Any other access is denied. 492 7. Any other access is denied.
418 493
419Smack Access Rules 494Smack Access Rules
495~~~~~~~~~~~~~~~~~~
420 496
421With the isolation provided by Smack access separation is simple. There are 497With the isolation provided by Smack access separation is simple. There are
422many interesting cases where limited access by subjects to objects with 498many interesting cases where limited access by subjects to objects with
@@ -427,8 +503,9 @@ be "born" highly classified. To accommodate such schemes Smack includes a
427mechanism for specifying rules allowing access between labels. 503mechanism for specifying rules allowing access between labels.
428 504
429Access Rule Format 505Access Rule Format
506~~~~~~~~~~~~~~~~~~
430 507
431The format of an access rule is: 508The format of an access rule is::
432 509
433 subject-label object-label access 510 subject-label object-label access
434 511
@@ -446,7 +523,7 @@ describe access modes:
446 523
447Uppercase values for the specification letters are allowed as well. 524Uppercase values for the specification letters are allowed as well.
448Access mode specifications can be in any order. Examples of acceptable rules 525Access mode specifications can be in any order. Examples of acceptable rules
449are: 526are::
450 527
451 TopSecret Secret rx 528 TopSecret Secret rx
452 Secret Unclass R 529 Secret Unclass R
@@ -456,7 +533,7 @@ are:
456 New Old rRrRr 533 New Old rRrRr
457 Closed Off - 534 Closed Off -
458 535
459Examples of unacceptable rules are: 536Examples of unacceptable rules are::
460 537
461 Top Secret Secret rx 538 Top Secret Secret rx
462 Ace Ace r 539 Ace Ace r
@@ -469,6 +546,7 @@ access specifications. The dash is a placeholder, so "a-r" is the same
469as "ar". A lone dash is used to specify that no access should be allowed. 546as "ar". A lone dash is used to specify that no access should be allowed.
470 547
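The rule format above can be sketched as a small user-space checker. This is an illustration only, not the kernel's parser: the function name `parse_rule` is hypothetical, the accepted letters are taken from the "rwxat-" set named for the load and load2 interfaces, and the sketch checks only the field count and the access letters, so it does not cover every reason a rule might be rejected.

```python
# Hypothetical helper, not part of Smack or its tools.
ACCESS_LETTERS = set("rwxat")  # letters named for the load/load2 interfaces


def parse_rule(line):
    """Split 'subject-label object-label access' into its three parts.

    Returns (subject, object, modes), where modes is a frozenset of
    lowercase access letters. Raises ValueError on a malformed line.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected exactly three fields: %r" % line)
    subject, obj, access = fields
    # Uppercase letters are allowed; "-" is only a placeholder.
    modes = set(access.lower()) - {"-"}
    unknown = modes - ACCESS_LETTERS
    if unknown:
        raise ValueError("unknown access letters: " + "".join(sorted(unknown)))
    return subject, obj, frozenset(modes)
```

With this sketch, "a-r" and "ar" yield the same mode set, a lone dash yields an empty set, and a line such as "Top Secret Secret rx" is rejected because it has four fields.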
471Applying Access Rules 548Applying Access Rules
549~~~~~~~~~~~~~~~~~~~~~
472 550
473The developers of Linux rarely define new sorts of things, usually importing 551The developers of Linux rarely define new sorts of things, usually importing
474schemes and concepts from other systems. Most often, the other systems are 552schemes and concepts from other systems. Most often, the other systems are
@@ -511,6 +589,7 @@ one process to another requires that the sender have write access to the
511receiver. The receiver is not required to have read access to the sender. 589receiver. The receiver is not required to have read access to the sender.
512 590
513Setting Access Rules 591Setting Access Rules
592~~~~~~~~~~~~~~~~~~~~
514 593
515The configuration file /etc/smack/accesses contains the rules to be set at 594The configuration file /etc/smack/accesses contains the rules to be set at
516system startup. The contents are written to the special file 595system startup. The contents are written to the special file
@@ -520,6 +599,7 @@ one rule, with the most recently specified overriding any earlier
520specification. 599specification.
521 600
522Task Attribute 601Task Attribute
602~~~~~~~~~~~~~~
523 603
524The Smack label of a process can be read from /proc/<pid>/attr/current. A 604The Smack label of a process can be read from /proc/<pid>/attr/current. A
525process can read its own Smack label from /proc/self/attr/current. A 605process can read its own Smack label from /proc/self/attr/current. A
@@ -527,12 +607,14 @@ privileged process can change its own Smack label by writing to
527/proc/self/attr/current but not the label of another process. 607/proc/self/attr/current but not the label of another process.
528 608
529File Attribute 609File Attribute
610~~~~~~~~~~~~~~
530 611
531The Smack label of a filesystem object is stored as an extended attribute 612The Smack label of a filesystem object is stored as an extended attribute
532named SMACK64 on the file. This attribute is in the security namespace. It can 613named SMACK64 on the file. This attribute is in the security namespace. It can
533only be changed by a process with privilege. 614only be changed by a process with privilege.
534 615
535Privilege 616Privilege
617~~~~~~~~~
536 618
537A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged. 619A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged.
538CAP_MAC_OVERRIDE allows the process access to objects it would 620CAP_MAC_OVERRIDE allows the process access to objects it would
@@ -540,6 +622,7 @@ be denied otherwise. CAP_MAC_ADMIN allows a process to change
540Smack data, including rules and attributes. 622Smack data, including rules and attributes.
541 623
542Smack Networking 624Smack Networking
625~~~~~~~~~~~~~~~~
543 626
544As mentioned before, Smack enforces access control on network protocol 627As mentioned before, Smack enforces access control on network protocol
545transmissions. Every packet sent by a Smack process is tagged with its Smack 628transmissions. Every packet sent by a Smack process is tagged with its Smack
@@ -551,6 +634,7 @@ packet has write access to the receiving process and if that is not the case
551the packet is dropped. 634the packet is dropped.
552 635
553CIPSO Configuration 636CIPSO Configuration
637~~~~~~~~~~~~~~~~~~~
554 638
555It is normally unnecessary to specify the CIPSO configuration. The default 639It is normally unnecessary to specify the CIPSO configuration. The default
556values used by the system handle all internal cases. Smack will compose CIPSO 640values used by the system handle all internal cases. Smack will compose CIPSO
@@ -571,13 +655,13 @@ discarded. The DOI is 3 by default. The value can be read from
571The label and category set are mapped to a Smack label as defined in 655The label and category set are mapped to a Smack label as defined in
572/etc/smack/cipso. 656/etc/smack/cipso.
573 657
574A Smack/CIPSO mapping has the form: 658A Smack/CIPSO mapping has the form::
575 659
576 smack level [category [category]*] 660 smack level [category [category]*]
577 661
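A minimal sketch of reading one mapping line in the form above, assuming whitespace-separated fields and decimal numbers. The function name `parse_cipso_mapping` is hypothetical and is not part of any Smack utility; the sketch does no range checking on the level or categories.

```python
# Hypothetical helper for illustration; not a Smack tool.
def parse_cipso_mapping(line):
    """Parse 'smack level [category [category]*]'.

    Returns (label, level, categories) with level as an int and
    categories as a list of ints (possibly empty).
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a label and a level: %r" % line)
    label = fields[0]
    level = int(fields[1])                      # the CIPSO level
    categories = [int(c) for c in fields[2:]]   # zero or more categories
    return label, level, categories
```

For example, the line "TS:A,B 7 1 2" would be read as label "TS:A,B", level 7, and categories 1 and 2.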
578Smack does not expect the level or category sets to be related in any 662Smack does not expect the level or category sets to be related in any
579particular way and does not assume or assign accesses based on them. Some 663particular way and does not assume or assign accesses based on them. Some
580examples of mappings: 664examples of mappings::
581 665
582 TopSecret 7 666 TopSecret 7
583 TS:A,B 7 1 2 667 TS:A,B 7 1 2
@@ -597,25 +681,30 @@ value can be read from /sys/fs/smackfs/direct and changed by writing to
597/sys/fs/smackfs/direct. 681/sys/fs/smackfs/direct.
598 682
599Socket Attributes 683Socket Attributes
684~~~~~~~~~~~~~~~~~
600 685
601There are two attributes that are associated with sockets. These attributes 686There are two attributes that are associated with sockets. These attributes
602can only be set by privileged tasks, but any task can read them for their own 687can only be set by privileged tasks, but any task can read them for their own
603sockets. 688sockets.
604 689
605 SMACK64IPIN: The Smack label of the task object. A privileged 690 SMACK64IPIN:
691 The Smack label of the task object. A privileged
606 program that will enforce policy may set this to the star label. 692 program that will enforce policy may set this to the star label.
607 693
608 SMACK64IPOUT: The Smack label transmitted with outgoing packets. 694 SMACK64IPOUT:
695 The Smack label transmitted with outgoing packets.
609 A privileged program may set this to match the label of another 696 A privileged program may set this to match the label of another
610 task with which it hopes to communicate. 697 task with which it hopes to communicate.
611 698
612Smack Netlabel Exceptions 699Smack Netlabel Exceptions
700~~~~~~~~~~~~~~~~~~~~~~~~~
613 701
614You will often find that your labeled application has to talk to the outside, 702You will often find that your labeled application has to talk to the outside,
615unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel 703unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel
616where you can add some exceptions in the form of : 704where you can add some exceptions in the form of::
617@IP1 LABEL1 or 705
618@IP2/MASK LABEL2 706 @IP1 LABEL1 or
707 @IP2/MASK LABEL2
619 708
620It means that your application will have unlabeled access to @IP1 if it has 709It means that your application will have unlabeled access to @IP1 if it has
621write access on LABEL1, and access to the subnet @IP2/MASK if it has write 710write access on LABEL1, and access to the subnet @IP2/MASK if it has write
@@ -624,28 +713,32 @@ access on LABEL2.
624Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask 713Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask
625first, like in classless IPv4 routing. 714first, like in classless IPv4 routing.
626 715
627A special label '@' and an option '-CIPSO' can be used there : 716A special label '@' and an option '-CIPSO' can be used there::
628@ means Internet, any application with any label has access to it
629-CIPSO means standard CIPSO networking
630 717
631If you don't know what CIPSO is and don't plan to use it, you can just do : 718 @ means Internet, any application with any label has access to it
632echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel 719 -CIPSO means standard CIPSO networking
633echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel 720
721If you don't know what CIPSO is and don't plan to use it, you can just do::
722
723 echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
724 echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
634 725
635If you use CIPSO on your 192.168.0.0/16 local network and also need unlabeled 726If you use CIPSO on your 192.168.0.0/16 local network and also need unlabeled
636Internet access, you can have : 727Internet access, you can have::
637echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
638echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel
639echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
640 728
729 echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel
730 echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel
731 echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel
641 732
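The longest-mask-first matching described above can be modelled in user space. This is only a sketch of the matching rule, not the kernel's implementation; the helper names `parse_netlabel_entry` and `match_label` are made up for illustration, and the entries are the ones from the CIPSO example.

```python
import ipaddress


def parse_netlabel_entry(line):
    """Split an entry such as '192.168.0.0/16 -CIPSO' into (network, label)."""
    address, label = line.split()
    # A bare address like 127.0.0.1 becomes a /32 network.
    return ipaddress.ip_network(address, strict=False), label


def match_label(entries, host):
    """Return the label of the matching entry with the longest mask, or None."""
    addr = ipaddress.ip_address(host)
    candidates = [(net, label) for net, label in entries if addr in net]
    if not candidates:
        return None
    # Longest mask wins, as in classless IPv4 routing.
    return max(candidates, key=lambda item: item[0].prefixlen)[1]


# Entries taken from the example above (hypothetical in-memory copy).
ENTRIES = [
    parse_netlabel_entry(line)
    for line in (
        "127.0.0.1 -CIPSO",
        "192.168.0.0/16 -CIPSO",
        "0.0.0.0/0 @",
    )
]
```

With these entries, a host on the 192.168.0.0/16 network resolves to `-CIPSO`, while any other non-loopback address falls through to the `0.0.0.0/0 @` entry.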
642Writing Applications for Smack 733Writing Applications for Smack
734------------------------------
643 735
644There are three sorts of applications that will run on a Smack system. How an 736There are three sorts of applications that will run on a Smack system. How an
645application interacts with Smack will determine what it will have to do to 737application interacts with Smack will determine what it will have to do to
646work properly under Smack. 738work properly under Smack.
647 739
648Smack Ignorant Applications 740Smack Ignorant Applications
741---------------------------
649 742
650By far the majority of applications have no reason whatever to care about the 743By far the majority of applications have no reason whatever to care about the
651unique properties of Smack. Since invoking a program has no impact on the 744unique properties of Smack. Since invoking a program has no impact on the
@@ -653,12 +746,14 @@ Smack label associated with the process the only concern likely to arise is
653whether the process has execute access to the program. 746whether the process has execute access to the program.
654 747
655Smack Relevant Applications 748Smack Relevant Applications
749---------------------------
656 750
657Some programs can be improved by teaching them about Smack, but do not make 751Some programs can be improved by teaching them about Smack, but do not make
658any security decisions themselves. The utility ls(1) is one example of such a 752any security decisions themselves. The utility ls(1) is one example of such a
659program. 753program.
660 754
661Smack Enforcing Applications 755Smack Enforcing Applications
756----------------------------
662 757
663These are special programs that not only know about Smack, but participate in 758These are special programs that not only know about Smack, but participate in
664the enforcement of system policy. In most cases these are the programs that 759the enforcement of system policy. In most cases these are the programs that
@@ -666,15 +761,16 @@ set up user sessions. There are also network services that provide information
666to processes running with various labels. 761to processes running with various labels.
667 762
668File System Interfaces 763File System Interfaces
764----------------------
669 765
670Smack maintains labels on file system objects using extended attributes. The 766Smack maintains labels on file system objects using extended attributes. The
671Smack label of a file, directory, or other file system object can be obtained 767Smack label of a file, directory, or other file system object can be obtained
672using getxattr(2). 768using getxattr(2)::
673 769
674 len = getxattr("/", "security.SMACK64", value, sizeof (value)); 770 len = getxattr("/", "security.SMACK64", value, sizeof (value));
675 771
676will put the Smack label of the root directory into value. A privileged 772will put the Smack label of the root directory into value. A privileged
677process can set the Smack label of a file system object with setxattr(2). 773process can set the Smack label of a file system object with setxattr(2)::
678 774
679 len = strlen("Rubble"); 775 len = strlen("Rubble");
680 rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0); 776 rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0);
@@ -683,17 +779,18 @@ will set the Smack label of /foo to "Rubble" if the program has appropriate
683privilege. 779privilege.
684 780
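For scripts, the same attribute can be read without writing C. The sketch below assumes Python's `os.getxattr`, which wraps getxattr(2) and exists only on Linux; the helper name `read_smack_label` is hypothetical. It returns None when the attribute is missing, when the file system does not support it, or when the path does not exist.

```python
import os

SMACK_XATTR = "security.SMACK64"


def read_smack_label(path):
    """Return the Smack label stored on path, or None if it cannot be read."""
    getxattr = getattr(os, "getxattr", None)  # absent on non-Linux platforms
    if getxattr is None:
        return None
    try:
        raw = getxattr(path, SMACK_XATTR)
    except OSError:
        # No such attribute, no xattr support, or no such file.
        return None
    return raw.decode("utf-8", errors="replace").rstrip("\0")
```

On a system without Smack, `read_smack_label("/")` is expected to return None rather than raise.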
685Socket Interfaces 781Socket Interfaces
782-----------------
686 783
687The socket attributes can be read using fgetxattr(2). 784The socket attributes can be read using fgetxattr(2).
688 785
689A privileged process can set the Smack label of outgoing packets with 786A privileged process can set the Smack label of outgoing packets with
690fsetxattr(2). 787fsetxattr(2)::
691 788
692 len = strlen("Rubble"); 789 len = strlen("Rubble");
693 rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0); 790 rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0);
694 791
695will set the Smack label "Rubble" on packets going out from the socket if the 792will set the Smack label "Rubble" on packets going out from the socket if the
696program has appropriate privilege. 793program has appropriate privilege::
697 794
698 rc = fsetxattr(fd, "security.SMACK64IPIN", "*", strlen("*"), 0); 795 rc = fsetxattr(fd, "security.SMACK64IPIN", "*", strlen("*"), 0);
699 796
@@ -701,33 +798,40 @@ will set the Smack label "*" as the object label against which incoming
701packets will be checked if the program has appropriate privilege. 798packets will be checked if the program has appropriate privilege.
702 799
703Administration 800Administration
801--------------
704 802
705Smack supports some mount options: 803Smack supports some mount options:
706 804
707 smackfsdef=label: specifies the label to give files that lack 805 smackfsdef=label:
806 specifies the label to give files that lack
708 the Smack label extended attribute. 807 the Smack label extended attribute.
709 808
710 smackfsroot=label: specifies the label to assign the root of the 809 smackfsroot=label:
810 specifies the label to assign the root of the
711 file system if it lacks the Smack extended attribute. 811 file system if it lacks the Smack extended attribute.
712 812
713 smackfshat=label: specifies a label that must have read access to 813 smackfshat=label:
814 specifies a label that must have read access to
714 all labels set on the filesystem. Not yet enforced. 815 all labels set on the filesystem. Not yet enforced.
715 816
716 smackfsfloor=label: specifies a label to which all labels set on the 817 smackfsfloor=label:
818 specifies a label to which all labels set on the
717 filesystem must have read access. Not yet enforced. 819 filesystem must have read access. Not yet enforced.
718 820
719These mount options apply to all file system types. 821These mount options apply to all file system types.
720 822
721Smack auditing 823Smack auditing
824--------------
722 825
723If you want Smack auditing of security events, you need to set CONFIG_AUDIT 826If you want Smack auditing of security events, you need to set CONFIG_AUDIT
724in your kernel configuration. 827in your kernel configuration.
725By default, all denied events will be audited. You can change this behavior by 828By default, all denied events will be audited. You can change this behavior by
726writing a single character to the /sys/fs/smackfs/logging file : 829writing a single character to the /sys/fs/smackfs/logging file::
7270 : no logging 830
7281 : log denied (default) 831 0 : no logging
7292 : log accepted 832 1 : log denied (default)
7303 : log denied & accepted 833 2 : log accepted
834 3 : log denied & accepted
731 835
732Events are logged as 'key=value' pairs, for each event you at least will get 836Events are logged as 'key=value' pairs, for each event you at least will get
733the subject, the object, the rights requested, the action, the kernel function 837the subject, the object, the rights requested, the action, the kernel function
@@ -735,6 +839,7 @@ that triggered the event, plus other pairs depending on the type of event
735audited. 839audited.
736 840
737Bringup Mode 841Bringup Mode
842------------
738 843
739Bringup mode provides logging features that can make application 844Bringup mode provides logging features that can make application
740configuration and system bringup easier. Configure the kernel with 845configuration and system bringup easier. Configure the kernel with
diff --git a/Documentation/security/Yama.txt b/Documentation/admin-guide/LSM/Yama.rst
index d9ee7d7a6c7f..13468ea696b7 100644
--- a/Documentation/security/Yama.txt
+++ b/Documentation/admin-guide/LSM/Yama.rst
@@ -1,13 +1,14 @@
1====
2Yama
3====
4
1Yama is a Linux Security Module that collects system-wide DAC security 5Yama is a Linux Security Module that collects system-wide DAC security
2protections that are not handled by the core kernel itself. This is 6protections that are not handled by the core kernel itself. This is
3selectable at build-time with CONFIG_SECURITY_YAMA, and can be controlled 7selectable at build-time with ``CONFIG_SECURITY_YAMA``, and can be controlled
4at run-time through sysctls in /proc/sys/kernel/yama: 8at run-time through sysctls in ``/proc/sys/kernel/yama``:
5
6- ptrace_scope
7 9
8============================================================== 10ptrace_scope
9 11============
10ptrace_scope:
11 12
12As Linux grows in popularity, it will become a larger target for 13As Linux grows in popularity, it will become a larger target for
13malware. One particularly troubling weakness of the Linux process 14malware. One particularly troubling weakness of the Linux process
@@ -25,47 +26,49 @@ exist and remain possible if ptrace is allowed to operate as before.
25Since ptrace is not commonly used by non-developers and non-admins, system 26Since ptrace is not commonly used by non-developers and non-admins, system
26builders should be allowed the option to disable this debugging system. 27builders should be allowed the option to disable this debugging system.
27 28
28For a solution, some applications use prctl(PR_SET_DUMPABLE, ...) to 29For a solution, some applications use ``prctl(PR_SET_DUMPABLE, ...)`` to
29specifically disallow such ptrace attachment (e.g. ssh-agent), but many 30specifically disallow such ptrace attachment (e.g. ssh-agent), but many
30do not. A more general solution is to only allow ptrace directly from a 31do not. A more general solution is to only allow ptrace directly from a
31parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still 32parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
32work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID" 33work), or with ``CAP_SYS_PTRACE`` (i.e. "gdb --pid=PID", and "strace -p PID"
33still work as root). 34still work as root).
34 35
35In mode 1, for software that has defined application-specific relationships 36In mode 1, for software that has defined application-specific relationships
36between a debugging process and its inferior (crash handlers, etc), 37between a debugging process and its inferior (crash handlers, etc),
37prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which 38``prctl(PR_SET_PTRACER, pid, ...)`` can be used. An inferior can declare which
38other process (and its descendants) are allowed to call PTRACE_ATTACH 39other process (and its descendants) are allowed to call ``PTRACE_ATTACH``
39against it. Only one such declared debugging process can exist for 40against it. Only one such declared debugging process can exist for
40each inferior at a time. For example, this is used by KDE, Chromium, and 41each inferior at a time. For example, this is used by KDE, Chromium, and
41Firefox's crash handlers, and by Wine for allowing only Wine processes 42Firefox's crash handlers, and by Wine for allowing only Wine processes
42to ptrace each other. If a process wishes to entirely disable these ptrace 43to ptrace each other. If a process wishes to entirely disable these ptrace
43restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...) 44restrictions, it can call ``prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)``
44so that any otherwise allowed process (even those in external pid namespaces) 45so that any otherwise allowed process (even those in external pid namespaces)
45may attach. 46may attach.
46 47
47The sysctl settings (writable only with CAP_SYS_PTRACE) are: 48The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are:
48 49
490 - classic ptrace permissions: a process can PTRACE_ATTACH to any other 500 - classic ptrace permissions:
51 a process can ``PTRACE_ATTACH`` to any other
50 process running under the same uid, as long as it is dumpable (i.e. 52 process running under the same uid, as long as it is dumpable (i.e.
51 did not transition uids, start privileged, or have called 53 did not transition uids, start privileged, or have called
52 prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is 54 ``prctl(PR_SET_DUMPABLE...)`` already). Similarly, ``PTRACE_TRACEME`` is
53 unchanged. 55 unchanged.
54 56
551 - restricted ptrace: a process must have a predefined relationship 571 - restricted ptrace:
56 with the inferior it wants to call PTRACE_ATTACH on. By default, 58 a process must have a predefined relationship
59 with the inferior it wants to call ``PTRACE_ATTACH`` on. By default,
57 this relationship is that of only its descendants when the above 60 this relationship is that of only its descendants when the above
58 classic criteria is also met. To change the relationship, an 61 classic criteria is also met. To change the relationship, an
59 inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare 62 inferior can call ``prctl(PR_SET_PTRACER, debugger, ...)`` to declare
60 an allowed debugger PID to call PTRACE_ATTACH on the inferior. 63 an allowed debugger PID to call ``PTRACE_ATTACH`` on the inferior.
61 Using PTRACE_TRACEME is unchanged. 64 Using ``PTRACE_TRACEME`` is unchanged.
62 65
632 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace 662 - admin-only attach:
64 with PTRACE_ATTACH, or through children calling PTRACE_TRACEME. 67 only processes with ``CAP_SYS_PTRACE`` may use ptrace
68 with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``.
65 69
663 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via 703 - no attach:
67 PTRACE_TRACEME. Once set, this sysctl value cannot be changed. 71 no processes may use ptrace with ``PTRACE_ATTACH`` nor via
72 ``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed.
68 73
69The original children-only logic was based on the restrictions in grsecurity. 74The original children-only logic was based on the restrictions in grsecurity.
70
71==============================================================
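The four ptrace_scope values listed above can be checked from a script. This is a minimal sketch: the path is the sysctl location named in the text, the mode names are copied from the list, and the function names are hypothetical. Reading returns None when Yama is not built in or the file is unreadable.

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Mode names as given in the sysctl description above.
PTRACE_SCOPE_MODES = {
    0: "classic ptrace permissions",
    1: "restricted ptrace",
    2: "admin-only attach",
    3: "no attach",
}


def describe_ptrace_scope(raw):
    """Map the raw sysctl text (e.g. '1\\n') to its mode name."""
    value = int(raw.strip())
    return PTRACE_SCOPE_MODES.get(value, "unknown mode %d" % value)


def current_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Describe the running system's setting, or None if unavailable."""
    try:
        return describe_ptrace_scope(path.read_text())
    except (OSError, ValueError):
        return None
```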
diff --git a/Documentation/security/apparmor.txt b/Documentation/admin-guide/LSM/apparmor.rst
index 93c1fd7d0635..3e9734bd0e05 100644
--- a/Documentation/security/apparmor.txt
+++ b/Documentation/admin-guide/LSM/apparmor.rst
@@ -1,4 +1,9 @@
1--- What is AppArmor? --- 1========
2AppArmor
3========
4
5What is AppArmor?
6=================
2 7
3AppArmor is a MAC-style security extension for the Linux kernel. It implements 8AppArmor is a MAC-style security extension for the Linux kernel. It implements
4a task centered policy, with task "profiles" being created and loaded 9a task centered policy, with task "profiles" being created and loaded
@@ -6,34 +11,41 @@ from user space. Tasks on the system that do not have a profile defined for
6them run in an unconfined state which is equivalent to standard Linux DAC 11them run in an unconfined state which is equivalent to standard Linux DAC
7permissions. 12permissions.
8 13
9--- How to enable/disable --- 14How to enable/disable
15=====================
16
17set ``CONFIG_SECURITY_APPARMOR=y``
10 18
11set CONFIG_SECURITY_APPARMOR=y 19If AppArmor should be selected as the default security module then set::
12 20
13If AppArmor should be selected as the default security module then 21 CONFIG_DEFAULT_SECURITY="apparmor"
14 set CONFIG_DEFAULT_SECURITY="apparmor" 22 CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
15 and CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
16 23
17Build the kernel 24Build the kernel
18 25
19If AppArmor is not the default security module it can be enabled by passing 26If AppArmor is not the default security module it can be enabled by passing
20security=apparmor on the kernel's command line. 27``security=apparmor`` on the kernel's command line.
21 28
22If AppArmor is the default security module it can be disabled by passing 29If AppArmor is the default security module it can be disabled by passing
23apparmor=0, security=XXXX (where XXX is valid security module), on the 30``apparmor=0, security=XXXX`` (where ``XXXX`` is valid security module), on the
24kernel's command line 31kernel's command line.
25 32
26For AppArmor to enforce any restrictions beyond standard Linux DAC permissions 33For AppArmor to enforce any restrictions beyond standard Linux DAC permissions
27policy must be loaded into the kernel from user space (see the Documentation 34policy must be loaded into the kernel from user space (see the Documentation
28and tools links). 35and tools links).
29 36
30--- Documentation --- 37Documentation
38=============
31 39
32Documentation can be found on the wiki. 40Documentation can be found on the wiki, linked below.
33 41
34--- Links --- 42Links
43=====
35 44
36Mailing List - apparmor@lists.ubuntu.com 45Mailing List - apparmor@lists.ubuntu.com
46
37Wiki - http://apparmor.wiki.kernel.org/ 47Wiki - http://apparmor.wiki.kernel.org/
48
38User space tools - https://launchpad.net/apparmor 49User space tools - https://launchpad.net/apparmor
50
39Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git 51Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git
diff --git a/Documentation/security/LSM.txt b/Documentation/admin-guide/LSM/index.rst
index c2683f28ed36..c980dfe9abf1 100644
--- a/Documentation/security/LSM.txt
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -1,12 +1,13 @@
1Linux Security Module framework 1===========================
2------------------------------- 2Linux Security Module Usage
3===========================
3 4
4The Linux Security Module (LSM) framework provides a mechanism for 5The Linux Security Module (LSM) framework provides a mechanism for
5various security checks to be hooked by new kernel extensions. The name 6various security checks to be hooked by new kernel extensions. The name
6"module" is a bit of a misnomer since these extensions are not actually 7"module" is a bit of a misnomer since these extensions are not actually
7loadable kernel modules. Instead, they are selectable at build-time via 8loadable kernel modules. Instead, they are selectable at build-time via
8CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the 9CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the
9"security=..." kernel command line argument, in the case where multiple 10``"security=..."`` kernel command line argument, in the case where multiple
10LSMs were built into a given kernel. 11LSMs were built into a given kernel.
11 12
12The primary users of the LSM interface are Mandatory Access Control 13The primary users of the LSM interface are Mandatory Access Control
@@ -19,23 +20,22 @@ in the core functionality of Linux itself.
19Without a specific LSM built into the kernel, the default LSM will be the 20Without a specific LSM built into the kernel, the default LSM will be the
20Linux capabilities system. Most LSMs choose to extend the capabilities 21Linux capabilities system. Most LSMs choose to extend the capabilities
21system, building their checks on top of the defined capability hooks. 22system, building their checks on top of the defined capability hooks.
22For more details on capabilities, see capabilities(7) in the Linux 23For more details on capabilities, see ``capabilities(7)`` in the Linux
23man-pages project. 24man-pages project.
24 25
25A list of the active security modules can be found by reading 26A list of the active security modules can be found by reading
26/sys/kernel/security/lsm. This is a comma separated list, and 27``/sys/kernel/security/lsm``. This is a comma separated list, and
27will always include the capability module. The list reflects the 28will always include the capability module. The list reflects the
28order in which checks are made. The capability module will always 29order in which checks are made. The capability module will always
29be first, followed by any "minor" modules (e.g. Yama) and then 30be first, followed by any "minor" modules (e.g. Yama) and then
30the one "major" module (e.g. SELinux) if there is one configured. 31the one "major" module (e.g. SELinux) if there is one configured.
31 32
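Since the list is comma separated and ordered, it is easy to inspect from a script. The sketch below assumes only what the paragraph states about the file format; the function names and the sample string are made up for illustration.

```python
def parse_lsm_list(text):
    """Split the contents of /sys/kernel/security/lsm into module names."""
    return [name for name in text.strip().split(",") if name]


def read_active_lsms(path="/sys/kernel/security/lsm"):
    """Return the active modules in check order, or None if unreadable."""
    try:
        with open(path) as handle:
            return parse_lsm_list(handle.read())
    except OSError:
        return None
```

For an illustrative input of `capability,yama,selinux`, the result keeps the order in which checks are made, with the capability module first.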
32Based on https://lkml.org/lkml/2007/10/26/215, 33.. toctree::
33a new LSM is accepted into the kernel when its intent (a description of 34 :maxdepth: 1
34what it tries to protect against and in what cases one would expect to
35use it) has been appropriately documented in Documentation/security/.
36This allows an LSM's code to be easily compared to its goals, and so
37that end users and distros can make a more informed decision about which
38LSMs suit their requirements.
39 35
40For extensive documentation on the available LSM hook interfaces, please 36 apparmor
41see include/linux/security.h. 37 LoadPin
38 SELinux
39 Smack
40 tomoyo
41 Yama
diff --git a/Documentation/security/tomoyo.txt b/Documentation/admin-guide/LSM/tomoyo.rst
index 200a2d37cbc8..a5947218fa64 100644
--- a/Documentation/security/tomoyo.txt
+++ b/Documentation/admin-guide/LSM/tomoyo.rst
@@ -1,21 +1,30 @@
1--- What is TOMOYO? --- 1======
2TOMOYO
3======
4
5What is TOMOYO?
6===============
2 7
3TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel. 8TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel.
4 9
5LiveCD-based tutorials are available at 10LiveCD-based tutorials are available at
11
6http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/ 12http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/
7http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ . 13http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/
14
8Though these tutorials use a non-LSM version of TOMOYO, they are useful for you 15Though these tutorials use a non-LSM version of TOMOYO, they are useful for you
9to know what TOMOYO is. 16to know what TOMOYO is.
10 17
11--- How to enable TOMOYO? --- 18How to enable TOMOYO?
19=====================
12 20
13Build the kernel with CONFIG_SECURITY_TOMOYO=y and pass "security=tomoyo" on 21Build the kernel with ``CONFIG_SECURITY_TOMOYO=y`` and pass ``security=tomoyo`` on
14kernel's command line. 22kernel's command line.
15 23
16Please see http://tomoyo.sourceforge.jp/2.3/ for details. 24Please see http://tomoyo.sourceforge.jp/2.3/ for details.
17 25
18--- Where is documentation? --- 26Where is documentation?
27=======================
19 28
20User <-> Kernel interface documentation is available at 29User <-> Kernel interface documentation is available at
21http://tomoyo.sourceforge.jp/2.3/policy-reference.html . 30http://tomoyo.sourceforge.jp/2.3/policy-reference.html .
@@ -42,7 +51,8 @@ History of TOMOYO?
42 Realities of Mainlining 51 Realities of Mainlining
43 http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf 52 http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf
44 53
45--- What is future plan? --- 54What is future plan?
55====================
46 56
47We believe that inode based security and name based security are complementary 57We believe that inode based security and name based security are complementary
48and both should be used together. But unfortunately, so far, we cannot enable 58and both should be used together. But unfortunately, so far, we cannot enable
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index b96e80f79e85..b5343c5aa224 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -55,12 +55,6 @@ Documentation
55 contains information about the problems, which may result by upgrading 55 contains information about the problems, which may result by upgrading
56 your kernel. 56 your kernel.
57 57
58 - The Documentation/DocBook/ subdirectory contains several guides for
59 kernel developers and users. These guides can be rendered in a
60 number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others.
61 After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``,
62 or ``make mandocs`` will render the documentation in the requested format.
63
64Installing the kernel source 58Installing the kernel source
65---------------------------- 59----------------------------
66 60
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 6d99a7ce6e21..5bb9161dbe6a 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -62,6 +62,7 @@ configure specific aspects of kernel behavior to your liking.
62 ras 62 ras
63 pm/index 63 pm/index
64 thunderbolt 64 thunderbolt
65 LSM/index
65 66
66.. only:: subproject and html 67.. only:: subproject and html
67 68
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 783010e95f51..3b335c1f8441 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -727,7 +727,8 @@
727 See also Documentation/input/joystick-parport.txt 727 See also Documentation/input/joystick-parport.txt
728 728
729 ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot 729 ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
730 time. See Documentation/dynamic-debug-howto.txt for 730 time. See
731 Documentation/admin-guide/dynamic-debug-howto.rst for
731 details. Deprecated, see dyndbg. 732 details. Deprecated, see dyndbg.
732 733
733 debug [KNL] Enable kernel debugging (events log level). 734 debug [KNL] Enable kernel debugging (events log level).
@@ -890,7 +891,8 @@
890 dyndbg[="val"] [KNL,DYNAMIC_DEBUG] 891 dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
891 module.dyndbg[="val"] 892 module.dyndbg[="val"]
892 Enable debug messages at boot time. See 893 Enable debug messages at boot time. See
893 Documentation/dynamic-debug-howto.txt for details. 894 Documentation/admin-guide/dynamic-debug-howto.rst
895 for details.
894 896
895 nompx [X86] Disables Intel Memory Protection Extensions. 897 nompx [X86] Disables Intel Memory Protection Extensions.
896 See Documentation/x86/intel_mpx.txt for more 898 See Documentation/x86/intel_mpx.txt for more
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 8c7bbf2c88d2..197896718f81 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -344,9 +344,9 @@ for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory
344controllers. The following example will assume 2 channels: 344controllers. The following example will assume 2 channels:
345 345
346 +------------+-----------------------+ 346 +------------+-----------------------+
347 | Chip | Channels | 347 | CS Rows | Channels |
348 | Select +-----------+-----------+ 348 +------------+-----------+-----------+
349 | rows | ``ch0`` | ``ch1`` | 349 | | ``ch0`` | ``ch1`` |
350 +============+===========+===========+ 350 +============+===========+===========+
351 | ``csrow0`` | DIMM_A0 | DIMM_B0 | 351 | ``csrow0`` | DIMM_A0 | DIMM_B0 |
352 +------------+ | | 352 +------------+ | |
@@ -698,7 +698,7 @@ information indicating that errors have been detected::
698The structure of the message is: 698The structure of the message is:
699 699
700 +---------------------------------------+-------------+ 700 +---------------------------------------+-------------+
701 | Content + Example | 701 | Content | Example |
702 +=======================================+=============+ 702 +=======================================+=============+
703 | The memory controller | MC0 | 703 | The memory controller | MC0 |
704 +---------------------------------------+-------------+ 704 +---------------------------------------+-------------+
@@ -713,7 +713,7 @@ The structure of the message is:
713 +---------------------------------------+-------------+ 713 +---------------------------------------+-------------+
714 | The error syndrome | 0xb741 | 714 | The error syndrome | 0xb741 |
715 +---------------------------------------+-------------+ 715 +---------------------------------------+-------------+
716 | Memory row | row 0 + 716 | Memory row | row 0 |
717 +---------------------------------------+-------------+ 717 +---------------------------------------+-------------+
718 | Memory channel | channel 1 | 718 | Memory channel | channel 1 |
719 +---------------------------------------+-------------+ 719 +---------------------------------------+-------------+
diff --git a/Documentation/conf.py b/Documentation/conf.py
index bacf9d337c89..71b032bb44fd 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -271,8 +271,7 @@ latex_elements = {
271 271
272# Additional stuff for the LaTeX preamble. 272# Additional stuff for the LaTeX preamble.
273 'preamble': ''' 273 'preamble': '''
274 % Adjust margins 274 \\usepackage{ifthen}
275 \\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}
276 275
277 % Allow generate some pages in landscape 276 % Allow generate some pages in landscape
278 \\usepackage{lscape} 277 \\usepackage{lscape}
@@ -281,6 +280,7 @@ latex_elements = {
281 \\definecolor{NoteColor}{RGB}{204,255,255} 280 \\definecolor{NoteColor}{RGB}{204,255,255}
282 \\definecolor{WarningColor}{RGB}{255,204,204} 281 \\definecolor{WarningColor}{RGB}{255,204,204}
283 \\definecolor{AttentionColor}{RGB}{255,255,204} 282 \\definecolor{AttentionColor}{RGB}{255,255,204}
283 \\definecolor{ImportantColor}{RGB}{192,255,204}
284 \\definecolor{OtherColor}{RGB}{204,204,204} 284 \\definecolor{OtherColor}{RGB}{204,204,204}
285 \\newlength{\\mynoticelength} 285 \\newlength{\\mynoticelength}
286 \\makeatletter\\newenvironment{coloredbox}[1]{% 286 \\makeatletter\\newenvironment{coloredbox}[1]{%
@@ -301,7 +301,12 @@ latex_elements = {
301 \\ifthenelse% 301 \\ifthenelse%
302 {\\equal{\\py@noticetype}{attention}}% 302 {\\equal{\\py@noticetype}{attention}}%
303 {\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}% 303 {\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}%
304 {\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}% 304 {%
305 \\ifthenelse%
306 {\\equal{\\py@noticetype}{important}}%
307 {\\colorbox{ImportantColor}{\\usebox{\\@tempboxa}}}%
308 {\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}%
309 }%
305 }% 310 }%
306 }% 311 }%
307 }\\makeatother 312 }\\makeatother
@@ -336,30 +341,51 @@ latex_elements = {
336if major == 1 and minor > 3: 341if major == 1 and minor > 3:
337 latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n' 342 latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n'
338 343
344if major == 1 and minor <= 4:
345 latex_elements['preamble'] += '\\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}'
346elif major == 1 and (minor > 5 or (minor == 5 and patch >= 3)):
347 latex_elements['sphinxsetup'] = 'hmargin=0.5in, vmargin=0.5in'
348
349
339# Grouping the document tree into LaTeX files. List of tuples 350# Grouping the document tree into LaTeX files. List of tuples
340# (source start file, target name, title, 351# (source start file, target name, title,
341# author, documentclass [howto, manual, or own class]). 352# author, documentclass [howto, manual, or own class]).
353# Sorted in alphabetical order
342latex_documents = [ 354latex_documents = [
343 ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
344 'The kernel development community', 'manual'),
345 ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation', 355 ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation',
346 'The kernel development community', 'manual'), 356 'The kernel development community', 'manual'),
347 ('core-api/index', 'core-api.tex', 'The kernel core API manual', 357 ('core-api/index', 'core-api.tex', 'The kernel core API manual',
348 'The kernel development community', 'manual'), 358 'The kernel development community', 'manual'),
349 ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual', 359 ('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
350 'The kernel development community', 'manual'), 360 'The kernel development community', 'manual'),
351 ('input/index', 'linux-input.tex', 'The Linux input driver subsystem', 361 ('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel',
352 'The kernel development community', 'manual'), 362 'The kernel development community', 'manual'),
353 ('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation', 363 ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide',
354 'The kernel development community', 'manual'), 364 'The kernel development community', 'manual'),
355 ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation', 365 ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual',
366 'The kernel development community', 'manual'),
367 ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API',
356 'The kernel development community', 'manual'), 368 'The kernel development community', 'manual'),
357 ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', 369 ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide',
358 'The kernel development community', 'manual'), 370 'The kernel development community', 'manual'),
371 ('input/index', 'linux-input.tex', 'The Linux input driver subsystem',
372 'The kernel development community', 'manual'),
373 ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel',
374 'The kernel development community', 'manual'),
359 ('media/index', 'media.tex', 'Linux Media Subsystem Documentation', 375 ('media/index', 'media.tex', 'Linux Media Subsystem Documentation',
360 'The kernel development community', 'manual'), 376 'The kernel development community', 'manual'),
377 ('networking/index', 'networking.tex', 'Linux Networking Documentation',
378 'The kernel development community', 'manual'),
379 ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation',
380 'The kernel development community', 'manual'),
361 ('security/index', 'security.tex', 'The kernel security subsystem manual', 381 ('security/index', 'security.tex', 'The kernel security subsystem manual',
362 'The kernel development community', 'manual'), 382 'The kernel development community', 'manual'),
383 ('sh/index', 'sh.tex', 'SuperH architecture implementation manual',
384 'The kernel development community', 'manual'),
385 ('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation',
386 'The kernel development community', 'manual'),
387 ('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide',
388 'The kernel development community', 'manual'),
363] 389]
364 390
365# The name of an image file (relative to this directory) to place at the top of 391# The name of an image file (relative to this directory) to place at the top of
diff --git a/Documentation/core-api/assoc_array.rst b/Documentation/core-api/assoc_array.rst
index d83cfff9ea43..8231b915c939 100644
--- a/Documentation/core-api/assoc_array.rst
+++ b/Documentation/core-api/assoc_array.rst
@@ -10,7 +10,10 @@ properties:
10 10
111. Objects are opaque pointers. The implementation does not care where they 111. Objects are opaque pointers. The implementation does not care where they
12 point (if anywhere) or what they point to (if anything). 12 point (if anywhere) or what they point to (if anything).
13.. note:: Pointers to objects _must_ be zero in the least significant bit. 13
14 .. note::
15
16 Pointers to objects _must_ be zero in the least significant bit.
14 17
152. Objects do not need to contain linkage blocks for use by the array. This 182. Objects do not need to contain linkage blocks for use by the array. This
16 permits an object to be located in multiple arrays simultaneously. 19 permits an object to be located in multiple arrays simultaneously.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 62abd36bfffb..0606be3a3111 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -19,6 +19,7 @@ Core utilities
19 workqueue 19 workqueue
20 genericirq 20 genericirq
21 flexible-arrays 21 flexible-arrays
22 librs
22 23
23Interfaces for kernel debugging 24Interfaces for kernel debugging
24=============================== 25===============================
diff --git a/Documentation/core-api/librs.rst b/Documentation/core-api/librs.rst
new file mode 100644
index 000000000000..6010f5bc5bf9
--- /dev/null
+++ b/Documentation/core-api/librs.rst
@@ -0,0 +1,212 @@
1==========================================
2Reed-Solomon Library Programming Interface
3==========================================
4
5:Author: Thomas Gleixner
6
7Introduction
8============
9
10The generic Reed-Solomon Library provides encoding, decoding and error
11correction functions.
12
13Reed-Solomon codes are used in communication and storage applications to
14ensure data integrity.
15
16This documentation is provided for developers who want to utilize the
17functions provided by the library.
18
19Known Bugs And Assumptions
20==========================
21
22None.
23
24Usage
25=====
26
27This chapter provides examples of how to use the library.
28
29Initializing
30------------
31
32The init function init_rs returns a pointer to an rs decoder structure,
33which holds the necessary information for encoding, decoding and error
34correction with the given polynomial. It either uses an existing
35matching decoder or creates a new one. On creation all the lookup tables
36for fast en/decoding are created. The function may take a while, so make
37sure not to call it in critical code paths.
38
39::
40
41 /* the Reed Solomon control structure */
42 static struct rs_control *rs_decoder;
43
44 /* Symbolsize is 10 (bits)
45 * Primitive polynomial is x^10+x^3+1
46 * first consecutive root is 0
47 * primitive element to generate roots = 1
48 * generator polynomial degree (number of roots) = 6
49 */
50 rs_decoder = init_rs (10, 0x409, 0, 1, 6);
51
52
53Encoding
54--------
55
56The encoder calculates the Reed-Solomon code over the given data length
57and stores the result in the parity buffer. Note that the parity buffer
58must be initialized before calling the encoder.
59
60The expanded data can be inverted on the fly by providing a non-zero
61inversion mask. The expanded data is XOR'ed with the mask. This is used
62e.g. for FLASH ECC, where an all-0xFF block is inverted to all 0x00. The
63Reed-Solomon code for all 0x00 is all 0x00. The code is inverted before
64storing to FLASH so it is 0xFF too. This prevents reads from an
65erased FLASH from resulting in ECC errors.
66
67The databytes are expanded to the given symbol size on the fly. There is
68no support for encoding continuous bitstreams with a symbol size != 8 at
69the moment. If necessary, it should not be a big deal to implement
70such functionality.
71
72::
73
74 /* Parity buffer. Size = number of roots */
75 uint16_t par[6];
76 /* Initialize the parity buffer */
77 memset(par, 0, sizeof(par));
78 /* Encode 512 byte in data8. Store parity in buffer par */
79 encode_rs8 (rs_decoder, data8, 512, par, 0);
80
81
82Decoding
83--------
84
85The decoder calculates the syndrome over the given data length and the
86received parity symbols and corrects errors in the data.
87
88If a syndrome is available from a hardware decoder then the syndrome
89calculation is skipped.
90
91The correction of the data buffer can be suppressed by providing a
92correction pattern buffer and an error location buffer to the decoder.
93The decoder stores the calculated error location and the correction
94bitmask in the given buffers. This is useful for hardware decoders which
95use a weird bit ordering scheme.
96
97The databytes are expanded to the given symbol size on the fly. There is
98no support for decoding continuous bitstreams with a symbolsize != 8 at
99the moment. If necessary, it should not be a big deal to implement
100such functionality.
101
102Decoding with syndrome calculation, direct data correction
103~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
104
105::
106
107 /* Parity buffer. Size = number of roots */
108 uint16_t par[6];
109 uint8_t data8[512];
110 int numerr;
111 /* Receive data */
112 .....
113 /* Receive parity */
114 .....
115 /* Decode 512 byte in data8.*/
116 numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL);
117
118
119Decoding with syndrome given by hardware decoder, direct data correction
120~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
121
122::
123
124 /* Parity buffer. Size = number of roots */
125 uint16_t par[6], syn[6];
126 uint8_t data8[512];
127 int numerr;
128 /* Receive data */
129 .....
130 /* Receive parity */
131 .....
132 /* Get syndrome from hardware decoder */
133 .....
134 /* Decode 512 byte in data8.*/
135 numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL);
136
137
138Decoding with syndrome given by hardware decoder, no direct data correction
139~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
140
141Note: It's not necessary to give data and received parity to the
142decoder.
143
144::
145
146 /* Parity buffer. Size = number of roots */
147 uint16_t par[6], syn[6], corr[8];
148 uint8_t data8[512];
149 int i, numerr, errpos[8];
150 /* Receive data */
151 .....
152 /* Receive parity */
153 .....
154 /* Get syndrome from hardware decoder */
155 .....
156 /* Decode 512 byte in data8.*/
157 numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr);
158 for (i = 0; i < numerr; i++) {
159 do_error_correction_in_your_buffer(errpos[i], corr[i]);
160 }
161
162
163Cleanup
164-------
165
166The function free_rs frees the allocated resources, if the caller is
167the last user of the decoder.
168
169::
170
171 /* Release resources */
172 free_rs(rs_decoder);
173
174
175Structures
176==========
177
178This chapter contains the autogenerated documentation of the structures
179which are used in the Reed-Solomon Library and are relevant for a
180developer.
181
182.. kernel-doc:: include/linux/rslib.h
183 :internal:
184
185Public Functions Provided
186=========================
187
188This chapter contains the autogenerated documentation of the
189Reed-Solomon functions which are exported.
190
191.. kernel-doc:: lib/reed_solomon/reed_solomon.c
192 :export:
193
194Credits
195=======
196
197The library code for encoding and decoding was written by Phil Karn.
198
199::
200
201 Copyright 2002, Phil Karn, KA9Q
202 May be used under the terms of the GNU General Public License (GPL)
203
204
205The wrapper functions and interfaces were written by Thomas Gleixner.
206
207Many users have provided bugfixes, improvements and helping hands for
208testing. Thanks a lot.
209
210The following people have contributed to this document:
211
212Thomas Gleixner\ tglx@linutronix.de
diff --git a/Documentation/crypto/asymmetric-keys.txt b/Documentation/crypto/asymmetric-keys.txt
index 5ad6480e3fb9..b82b6ad48488 100644
--- a/Documentation/crypto/asymmetric-keys.txt
+++ b/Documentation/crypto/asymmetric-keys.txt
@@ -265,7 +265,7 @@ mandatory:
265 265
266 The caller passes a pointer to the following struct with all of the fields 266 The caller passes a pointer to the following struct with all of the fields
267 cleared, except for data, datalen and quotalen [see 267 cleared, except for data, datalen and quotalen [see
268 Documentation/security/keys.txt]. 268 Documentation/security/keys/core.rst].
269 269
270 struct key_preparsed_payload { 270 struct key_preparsed_payload {
271 char *description; 271 char *description;
diff --git a/Documentation/crypto/conf.py b/Documentation/crypto/conf.py
new file mode 100644
index 000000000000..4335d251ddf3
--- /dev/null
+++ b/Documentation/crypto/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = 'Linux Kernel Crypto API'
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'crypto-api.tex', 'Linux Kernel Crypto API manual',
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 07d881147ef3..4ac991dbddb7 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -23,6 +23,7 @@ whole; patches welcome!
23 kmemleak 23 kmemleak
24 kmemcheck 24 kmemcheck
25 gdb-kernel-debugging 25 gdb-kernel-debugging
26 kgdb
26 27
27 28
28.. only:: subproject and html 29.. only:: subproject and html
diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst
new file mode 100644
index 000000000000..75273203a35a
--- /dev/null
+++ b/Documentation/dev-tools/kgdb.rst
@@ -0,0 +1,907 @@
1=================================================
2Using kgdb, kdb and the kernel debugger internals
3=================================================
4
5:Author: Jason Wessel
6
7Introduction
8============
9
10The kernel has two different debugger front ends (kdb and kgdb) which
11interface to the debug core. It is possible to use either of the
12debugger front ends and dynamically transition between them if you
13configure the kernel properly at compile and runtime.
14
15Kdb is a simplistic shell-style interface which you can use on a system
16console with a keyboard or serial console. You can use it to inspect
17memory, registers, process lists, dmesg, and even set breakpoints to
18stop in a certain location. Kdb is not a source level debugger, although
19you can set breakpoints and execute some basic kernel run control. Kdb
20is mainly aimed at doing some analysis to aid in development or
21diagnosing kernel problems. You can access some symbols by name in
22kernel built-ins or in kernel modules if the code was built with
23``CONFIG_KALLSYMS``.
24
25Kgdb is intended to be used as a source level debugger for the Linux
26kernel. It is used along with gdb to debug a Linux kernel. The
27expectation is that gdb can be used to "break in" to the kernel to
28inspect memory, variables and look through call stack information
29similar to the way an application developer would use gdb to debug an
30application. It is possible to place breakpoints in kernel code and
31perform some limited execution stepping.
32
33Two machines are required for using kgdb. One of these machines is a
34development machine and the other is the target machine. The kernel to
35be debugged runs on the target machine. The development machine runs an
36instance of gdb against the vmlinux file which contains the symbols (not
37a boot image such as bzImage, zImage, uImage...). In gdb the developer
38specifies the connection parameters and connects to kgdb. The type of
39connection a developer makes with gdb depends on the availability of
40kgdb I/O modules compiled as built-ins or loadable kernel modules in the
41target machine's kernel.
42
43Compiling a kernel
44==================
45
46- In order to enable compilation of kdb, you must first enable kgdb.
47
48- The kgdb test compile options are described in the kgdb test suite
49 chapter.
50
51Kernel config options for kgdb
52------------------------------
53
54To enable ``CONFIG_KGDB`` you should look under
55:menuselection:`Kernel hacking --> Kernel debugging` and select
56:menuselection:`KGDB: kernel debugger`.
57
58While it is not a hard requirement that you have symbols in your vmlinux
59file, gdb tends not to be very useful without the symbolic data, so you
60will want to turn on ``CONFIG_DEBUG_INFO`` which is called
61:menuselection:`Compile the kernel with debug info` in the config menu.
62
63It is advised, but not required, that you turn on the
64``CONFIG_FRAME_POINTER`` kernel option which is called :menuselection:`Compile
65the kernel with frame pointers` in the config menu. This option inserts code
66into the compiled executable which saves the frame information in
67registers or on the stack at different points which allows a debugger
68such as gdb to more accurately construct stack back traces while
69debugging the kernel.
70
71If the architecture that you are using supports the kernel option
72``CONFIG_STRICT_KERNEL_RWX``, you should consider turning it off. This
73option will prevent the use of software breakpoints because it marks
74certain regions of the kernel's memory space as read-only. If kgdb
75supports it for the architecture you are using, you can use hardware
76breakpoints if you desire to run with the ``CONFIG_STRICT_KERNEL_RWX``
77option turned on, else you need to turn off this option.
78
79Next you should choose one or more I/O drivers to interconnect the debugging
80host and the debugged target. Early boot debugging requires a KGDB I/O
81driver that supports early debugging and the driver must be built into
82the kernel directly. Kgdb I/O driver configuration takes place via
83kernel or module parameters which you can learn more about in the
84section that describes the parameter kgdboc.
85
86Here is an example set of ``.config`` symbols to enable or disable for kgdb::
87
88 # CONFIG_STRICT_KERNEL_RWX is not set
89 CONFIG_FRAME_POINTER=y
90 CONFIG_KGDB=y
91 CONFIG_KGDB_SERIAL_CONSOLE=y
92
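As a quick sanity check, the symbols above can be looked up in the ``.config``
of the kernel you built. The following is a minimal sketch and not part of the
kernel tree: the function name ``check_kgdb_config`` and its output format are
assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the kernel): report which of the
# kgdb-related symbols from the example above are set to "y" in a .config file.
check_kgdb_config() {
    config_file="$1"
    for sym in CONFIG_KGDB CONFIG_KGDB_SERIAL_CONSOLE CONFIG_FRAME_POINTER; do
        if grep -q "^${sym}=y\$" "$config_file"; then
            echo "${sym}: enabled"
        else
            echo "${sym}: not enabled"
        fi
    done
    # CONFIG_STRICT_KERNEL_RWX should be off if software breakpoints are wanted.
    if grep -q "^CONFIG_STRICT_KERNEL_RWX=y\$" "$config_file"; then
        echo "CONFIG_STRICT_KERNEL_RWX: enabled (software breakpoints will not work)"
    else
        echo "CONFIG_STRICT_KERNEL_RWX: not enabled"
    fi
}
```

It would typically be run from the top of the build directory as
``check_kgdb_config .config``.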
93Kernel config options for kdb
94-----------------------------
95
96Kdb is quite a bit more complex than the simple gdbstub sitting on top
97of the kernel's debug core. Kdb must implement a shell, and also adds
98some helper functions in other parts of the kernel, responsible for
99printing out interesting data such as what you would see if you ran
100``lsmod``, or ``ps``. In order to build kdb into the kernel you follow the
101same steps as you would for kgdb.
102
103The main config option for kdb is ``CONFIG_KGDB_KDB`` which is called
104:menuselection:`KGDB_KDB: include kdb frontend for kgdb` in the config menu.
105In theory you would have already also selected an I/O driver such as the
106``CONFIG_KGDB_SERIAL_CONSOLE`` interface if you plan on using kdb on a
107serial port, when you were configuring kgdb.
108
109If you want to use a PS/2-style keyboard with kdb, you would select
110``CONFIG_KDB_KEYBOARD`` which is called :menuselection:`KGDB_KDB: keyboard as
111input device` in the config menu. The ``CONFIG_KDB_KEYBOARD`` option is not
112used for anything in the gdb interface to kgdb. The ``CONFIG_KDB_KEYBOARD``
113option only works with kdb.
114
115Here is an example set of ``.config`` symbols to enable/disable kdb::
116
117 # CONFIG_STRICT_KERNEL_RWX is not set
118 CONFIG_FRAME_POINTER=y
119 CONFIG_KGDB=y
120 CONFIG_KGDB_SERIAL_CONSOLE=y
121 CONFIG_KGDB_KDB=y
122 CONFIG_KDB_KEYBOARD=y
123
124Kernel Debugger Boot Arguments
125==============================
126
127This section describes the various runtime kernel parameters that affect
128the configuration of the kernel debugger. The following chapter covers
129using kdb and kgdb as well as providing some examples of the
130configuration parameters.
131
132Kernel parameter: kgdboc
133------------------------
134
135The kgdboc driver was originally an abbreviation meant to stand for
136"kgdb over console". Today it is the primary mechanism to configure how
137to communicate from gdb to kgdb as well as the devices you want to use
138to interact with the kdb shell.
139
140For kgdb/gdb, kgdboc is designed to work with a single serial port. It
141is intended to cover the circumstance where you want to use a serial
142console as your primary console as well as using it to perform kernel
143debugging. It is also possible to use kgdb on a serial port which is not
144designated as a system console. Kgdboc may be configured as a kernel
145built-in or a kernel loadable module. You can only make use of
146``kgdbwait`` and early debugging if you build kgdboc into the kernel as
147a built-in.
148
149Optionally you can elect to activate kms (Kernel Mode Setting)
150integration. When you use kms with kgdboc and you have a video driver
151that has atomic mode setting hooks, it is possible to enter the debugger
152on the graphics console. When the kernel execution is resumed, the
153previous graphics mode will be restored. This integration can serve as a
154useful tool to aid in diagnosing crashes or doing analysis of memory
155with kdb while allowing the full graphics console applications to run.
156
157kgdboc arguments
158~~~~~~~~~~~~~~~~
159
160Usage::
161
162 kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
163
164The order listed above must be observed if you use any of the optional
165configurations together.
166
167Abbreviations:
168
169- kms = Kernel Mode Setting
170
171- kbd = Keyboard
172
173You can configure kgdboc to use the keyboard, and/or a serial device
174depending on whether you are using kdb and/or kgdb, in one of the following
175scenarios. Using kms + only gdb is generally not a useful combination.
178
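The ordering rule (kms, then kbd, then the serial device, then the baud rate)
can be captured in a small helper. This is only a sketch: ``build_kgdboc`` and
its ``y``/``n`` arguments are hypothetical names introduced here, not an
existing tool.

```shell
#!/bin/sh
# Hypothetical helpers: assemble a kgdboc= argument in the required order
# (kms, then kbd, then serial device, then baud).

# Append one field to $value, inserting a comma between fields.
_kgdboc_append() {
    if [ -n "$value" ]; then
        value="${value},$1"
    else
        value="$1"
    fi
}

# Usage: build_kgdboc <use_kms: y|n> <use_kbd: y|n> [serial_device] [baud]
build_kgdboc() {
    use_kms="$1"; use_kbd="$2"; serial="$3"; baud="$4"
    value=""
    if [ "$use_kms" = "y" ]; then _kgdboc_append kms; fi
    if [ "$use_kbd" = "y" ]; then _kgdboc_append kbd; fi
    if [ -n "$serial" ]; then
        _kgdboc_append "$serial"
        # A baud rate is only emitted together with a serial device.
        if [ -n "$baud" ]; then _kgdboc_append "$baud"; fi
    fi
    echo "kgdboc=${value}"
}
```

For example, ``build_kgdboc n y ttyS0 115200`` produces a string suitable for
the kernel command line or for ``modprobe kgdboc``.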
179Using loadable module or built-in
180^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
181
1821. As a kernel built-in:
183
184 Use the kernel boot argument::
185
186 kgdboc=<tty-device>,[baud]
187
1882. As a kernel loadable module:
189
190 Use the command::
191
192 modprobe kgdboc kgdboc=<tty-device>,[baud]
193
194 Here are two examples of how you might format the kgdboc string. The
195 first is for an x86 target using the first serial port. The second
196 example is for the ARM Versatile AB using the second serial port.
197
198 1. ``kgdboc=ttyS0,115200``
199
200 2. ``kgdboc=ttyAMA1,115200``
201
202Configure kgdboc at runtime with sysfs
203^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
204
205At run time you can enable or disable kgdboc by echoing a parameter
206into sysfs. Here are two examples:
207
2081. Enable kgdboc on ttyS0::
209
210 echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
211
2122. Disable kgdboc::
213
214 echo "" > /sys/module/kgdboc/parameters/kgdboc
215
216.. note::
217
218 You do not need to specify the baud if you are configuring the
219 console on a tty which is already configured or open.
220
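The two sysfs writes can be wrapped so that a missing or read-only parameter
file is reported instead of failing silently. This is a minimal sketch: the
function ``kgdboc_set`` and the ``KGDBOC_PARAM`` override variable are
assumptions added for illustration; only the default path comes from the
examples above.

```shell
#!/bin/sh
# Sketch of a wrapper around the sysfs writes shown above. KGDBOC_PARAM is a
# hypothetical override so that the path can be redirected, e.g. for testing.
KGDBOC_PARAM="${KGDBOC_PARAM:-/sys/module/kgdboc/parameters/kgdboc}"

# Usage: kgdboc_set <config-string>   (an empty string disables kgdboc)
kgdboc_set() {
    if [ ! -w "$KGDBOC_PARAM" ]; then
        echo "cannot write $KGDBOC_PARAM (is kgdboc available, and are you root?)" >&2
        return 1
    fi
    echo "$1" > "$KGDBOC_PARAM"
}
```

``kgdboc_set ttyS0`` then corresponds to the first example and
``kgdboc_set ""`` to the second.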
221More examples
222^^^^^^^^^^^^^
223
224You can configure kgdboc to use the keyboard, and/or a serial device
225depending on if you are using kdb and/or kgdb, in one of the following
226scenarios.
227
2281. kdb and kgdb over only a serial port::
229
230 kgdboc=<serial_device>[,baud]
231
232 Example::
233
234 kgdboc=ttyS0,115200
235
2362. kdb and kgdb with keyboard and a serial port::
237
238 kgdboc=kbd,<serial_device>[,baud]
239
240 Example::
241
242 kgdboc=kbd,ttyS0,115200
243
2443. kdb with a keyboard::
245
246 kgdboc=kbd
247
2484. kdb with kernel mode setting::
249
250 kgdboc=kms,kbd
251
2525. kdb with kernel mode setting and kgdb over a serial port::
253
254 kgdboc=kms,kbd,ttyS0,115200
255
256.. note::
257
258 Kgdboc does not support interrupting the target via the gdb remote
259 protocol. You must manually send a :kbd:`SysRq-G` unless you have a proxy
260 that splits console output to a terminal program. A console proxy has a
261 separate TCP port for the debugger and a separate TCP port for the
262 "human" console. The proxy can take care of sending the :kbd:`SysRq-G`
263 for you.
264
265When using kgdboc with no debugger proxy, you can end up connecting the
266debugger at one of two entry points. If an exception occurs after you
267have loaded kgdboc, a message should print on the console stating it is
268waiting for the debugger. In this case you disconnect your terminal
269program and then connect the debugger in its place. If you want to
270interrupt the target system and forcibly enter a debug session you have
271to issue a :kbd:`Sysrq` sequence and then type the letter :kbd:`g`. Then you
272disconnect the terminal session and connect gdb. If you don't like this,
273your options are to hack gdb to send the :kbd:`SysRq-G` for you, including
274on the initial connect, or to use a debugger proxy that allows an
275unmodified gdb to do the debugging.
276
277Kernel parameter: ``kgdbwait``
278------------------------------
279
280The kernel command line option ``kgdbwait`` makes kgdb wait for a
281debugger connection during booting of a kernel. You can only use this
282option if you compiled a kgdb I/O driver into the kernel and you
283specified the I/O driver configuration as a kernel command line option.
284The kgdbwait parameter should always follow the configuration parameter
285for the kgdb I/O driver in the kernel command line; otherwise the I/O driver
286will not be configured prior to asking the kernel to use it to wait.
287
288The kernel will stop and wait as early as the I/O driver and
289architecture allow when you use this option. If you build the kgdb I/O
290driver as a loadable kernel module, kgdbwait will not do anything.
291
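The ordering requirement is easy to get wrong when editing a boot loader
entry, so it can be worth checking mechanically. The sketch below assumes that
kgdboc is the I/O driver in use and only looks for a ``kgdboc=`` argument; the
function name ``kgdbwait_order_ok`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if "kgdbwait" appears after a "kgdboc=..."
# argument in the given kernel command line string (passed as one argument).
kgdbwait_order_ok() {
    seen_kgdboc=n
    for arg in $1; do
        case "$arg" in
            kgdboc=*)
                seen_kgdboc=y
                ;;
            kgdbwait)
                # kgdbwait is only useful once the I/O driver is configured.
                if [ "$seen_kgdboc" = "y" ]; then return 0; fi
                return 1
                ;;
        esac
    done
    # No kgdbwait on the command line: nothing to check.
    return 0
}
```

On a running system the current command line could be checked with
``kgdbwait_order_ok "$(cat /proc/cmdline)"``.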
292Kernel parameter: ``kgdbcon``
293-----------------------------
294
295The ``kgdbcon`` feature allows you to see :c:func:`printk` messages inside gdb
296while gdb is connected to the kernel. Kdb does not make use of the kgdbcon
297feature.
298
299Kgdb supports using the gdb serial protocol to send console messages to
300the debugger when the debugger is connected and running. There are two
301ways to activate this feature.
302
3031. Activate with the kernel command line option::
304
305 kgdbcon
306
3072. Use sysfs before configuring an I/O driver::
308
309 echo 1 > /sys/module/kgdb/parameters/kgdb_use_con
310
311.. note::
312
313 If you do this after you configure the kgdb I/O driver, the
314 setting will not take effect until the next point the I/O is
315 reconfigured.
316
317.. important::
318
319 You cannot use kgdboc + kgdbcon on a tty that is an
320 active system console. An example of incorrect usage is::
321
322 console=ttyS0,115200 kgdboc=ttyS0 kgdbcon
323
324It is possible to use this option with kgdboc on a tty that is not a
325system console.
326
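The incorrect combination shown above can be detected from the command line
string. This is a rough sketch under a simplifying assumption: a tty counts as
an active system console if it is named in a ``console=`` argument. The
function name ``kgdbcon_conflict`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: return 0 (conflict found) if kgdbcon is requested while
# kgdboc uses a tty that is also named in a console= argument.
kgdbcon_conflict() {
    consoles=""; kgdboc_tty=""; has_kgdbcon=n
    for arg in $1; do
        case "$arg" in
            console=*)
                tty="${arg#console=}"
                # Keep only the device name, dropping ",115200" and similar.
                consoles="$consoles ${tty%%,*}"
                ;;
            kgdboc=*)
                # The serial device is the first field that is not kms or kbd.
                old_ifs="$IFS"; IFS=,
                for field in ${arg#kgdboc=}; do
                    case "$field" in
                        kms|kbd) ;;
                        *) kgdboc_tty="$field"; break ;;
                    esac
                done
                IFS="$old_ifs"
                ;;
            kgdbcon)
                has_kgdbcon=y
                ;;
        esac
    done
    if [ "$has_kgdbcon" != "y" ] || [ -z "$kgdboc_tty" ]; then return 1; fi
    for c in $consoles; do
        if [ "$c" = "$kgdboc_tty" ]; then return 0; fi
    done
    return 1
}
```

With the incorrect example from the box above, the function reports a
conflict; moving kgdboc to a different tty clears it.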
327Run time parameter: ``kgdbreboot``
328----------------------------------
329
330The kgdbreboot feature allows you to change how the debugger deals with
331the reboot notification. You have 3 choices for the behavior. The
332default behavior is always set to 0.
333
334.. tabularcolumns:: |p{0.4cm}|p{11.5cm}|p{5.6cm}|
335
336.. flat-table::
337 :widths: 1 10 8
338
339 * - 1
340 - ``echo -1 > /sys/module/debug_core/parameters/kgdbreboot``
341 - Ignore the reboot notification entirely.
342
343 * - 2
344 - ``echo 0 > /sys/module/debug_core/parameters/kgdbreboot``
345 - Send the detach message to any attached debugger client.
346
347 * - 3
348 - ``echo 1 > /sys/module/debug_core/parameters/kgdbreboot``
349 - Enter the debugger on reboot notify.
350
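Since the three values are easy to mix up, a script can refer to them by name.
The mode names ``ignore``, ``detach`` and ``enter`` below are invented for this
sketch; only the numeric values and the sysfs path come from the table above.

```shell
#!/bin/sh
# Hypothetical mapping from a descriptive mode name to the kgdbreboot value.
kgdbreboot_value() {
    case "$1" in
        ignore) printf '%s\n' -1 ;;  # ignore the reboot notification entirely
        detach) printf '%s\n' 0 ;;   # send the detach message (default)
        enter)  printf '%s\n' 1 ;;   # enter the debugger on reboot notify
        *)
            echo "unknown kgdbreboot mode: $1" >&2
            return 1
            ;;
    esac
}
```

The result would then be written to the parameter file, for example
``kgdbreboot_value enter > /sys/module/debug_core/parameters/kgdbreboot``.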
351Using kdb
352=========
353
354Quick start for kdb on a serial port
355------------------------------------
356
357This is a quick example of how to use kdb.
358
3591. Configure kgdboc at boot using kernel parameters::
360
361 console=ttyS0,115200 kgdboc=ttyS0,115200
362
363 OR
364
365 Configure kgdboc after the kernel has booted; assuming you are using
366 a serial port console::
367
368 echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
369
3702. Enter the kernel debugger manually or by waiting for an oops or
371 fault. There are several ways you can enter the kernel debugger
372 manually; all involve using the :kbd:`SysRq-G`, which means you must have
373 enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
374
375 - When logged in as root or with a super user session you can run::
376
377 echo g > /proc/sysrq-trigger
378
379 - Example using minicom 2.2
380
381 Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`
382
383 - When you have telneted to a terminal server that supports sending
384 a remote break
385
386 Press: :kbd:`CTRL-]`
387
388 Type in: ``send break``
389
390 Press: :kbd:`Enter` :kbd:`g`
391
3923. From the kdb prompt you can run the ``help`` command to see a complete
393 list of the commands that are available.
394
395 Some useful commands in kdb include:
396
397 =========== =================================================================
398 ``lsmod`` Shows where kernel modules are loaded
399 ``ps`` Displays only the active processes
400 ``ps A`` Shows all the processes
401 ``summary`` Shows kernel version info and memory usage
402 ``bt`` Get a backtrace of the current process using :c:func:`dump_stack`
403 ``dmesg`` View the kernel syslog buffer
404 ``go`` Continue the system
405 =========== =================================================================
406
4074. When you are done using kdb you need to consider rebooting the system
408 or using the ``go`` command to resume normal kernel execution. If you
409 have paused the kernel for a lengthy period of time, applications
410 that rely on timely networking or anything to do with real wall clock
411 time could be adversely affected, so you should take this into
412 consideration when using the kernel debugger.
413
414Quick start for kdb using a keyboard connected console
415------------------------------------------------------
416
417This is a quick example of how to use kdb with a keyboard.
418
4191. Configure kgdboc at boot using kernel parameters::
420
421 kgdboc=kbd
422
423 OR
424
425 Configure kgdboc after the kernel has booted::
426
427 echo kbd > /sys/module/kgdboc/parameters/kgdboc
428
4292. Enter the kernel debugger manually or by waiting for an oops or
430 fault. There are several ways you can enter the kernel debugger
431 manually; all involve using the :kbd:`SysRq-G`, which means you must have
432 enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
433
434 - When logged in as root or with a super user session you can run::
435
436 echo g > /proc/sysrq-trigger
437
438 - Example using a laptop keyboard:
439
440 Press and hold down: :kbd:`Alt`
441
442 Press and hold down: :kbd:`Fn`
443
444 Press and release the key with the label: :kbd:`SysRq`
445
446 Release: :kbd:`Fn`
447
448 Press and release: :kbd:`g`
449
450 Release: :kbd:`Alt`
451
452 - Example using a PS/2 101-key keyboard
453
454 Press and hold down: :kbd:`Alt`
455
456 Press and release the key with the label: :kbd:`SysRq`
457
458 Press and release: :kbd:`g`
459
460 Release: :kbd:`Alt`
461
4623. Now type in a kdb command such as ``help``, ``dmesg``, ``bt`` or ``go`` to
463 continue kernel execution.
464
465Using kgdb / gdb
466================
467
468In order to use kgdb you must activate it by passing configuration
469information to one of the kgdb I/O drivers. If you do not pass any
470configuration information kgdb will not do anything at all. Kgdb will
471only actively hook up to the kernel trap hooks if a kgdb I/O driver is
472loaded and configured. If you unconfigure a kgdb I/O driver, kgdb will
473unregister all the kernel hook points.
474
475All kgdb I/O drivers can be reconfigured at run time, if
476``CONFIG_SYSFS`` and ``CONFIG_MODULES`` are enabled, by echo'ing a new
477config string to ``/sys/module/<driver>/parameters/<option>``. The driver
478can be unconfigured by passing an empty string. You cannot change the
479configuration while the debugger is attached. Make sure to detach the
480debugger with the ``detach`` command prior to trying to unconfigure a
481kgdb I/O driver.
482
483Connecting with gdb to a serial port
484------------------------------------
485
4861. Configure kgdboc
487
488 Configure kgdboc at boot using kernel parameters::
489
490 kgdboc=ttyS0,115200
491
492 OR
493
494 Configure kgdboc after the kernel has booted::
495
496 echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
497
4982. Stop kernel execution (break into the debugger)
499
500 In order to connect to gdb via kgdboc, the kernel must first be
501 stopped. There are several ways to stop the kernel which include
502 using kgdbwait as a boot argument, via a :kbd:`SysRq-G`, or running the
503 kernel until it takes an exception where it waits for the debugger to
504 attach.
505
506 - When logged in as root or with a super user session you can run::
507
508 echo g > /proc/sysrq-trigger
509
510 - Example using minicom 2.2
511
512 Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`
513
514 - When you have telneted to a terminal server that supports sending
515 a remote break
516
517 Press: :kbd:`CTRL-]`
518
519 Type in: ``send break``
520
521 Press: :kbd:`Enter` :kbd:`g`
522
5233. Connect from gdb
524
525 Example (using a directly connected port)::
526
527 % gdb ./vmlinux
528 (gdb) set remotebaud 115200
529 (gdb) target remote /dev/ttyS0
530
531
532 Example (kgdb to a terminal server on TCP port 2012)::
533
534 % gdb ./vmlinux
535 (gdb) target remote 192.168.2.2:2012
536
537
538 Once connected, you can debug a kernel the way you would debug an
539 application program.
540
541 If you are having problems connecting or something is going seriously
542 wrong while debugging, it will most often be the case that you want
543 to enable gdb to be verbose about its target communications. You do
544 this prior to issuing the ``target remote`` command by typing in::
545
546 set debug remote 1
547
548Remember if you continue in gdb, and need to "break in" again, you need
549to issue another :kbd:`SysRq-G`. It is easy to create a simple entry point by
550putting a breakpoint at ``sys_sync`` and then you can run ``sync`` from a
551shell or script to break into the debugger.
552
553kgdb and kdb interoperability
554=============================
555
556It is possible to transition between kdb and kgdb dynamically. The debug
557core will remember which you used the last time and automatically start
558in the same mode.
559
560Switching between kdb and kgdb
561------------------------------
562
563Switching from kgdb to kdb
564~~~~~~~~~~~~~~~~~~~~~~~~~~
565
566There are two ways to switch from kgdb to kdb: you can use gdb to issue
567a maintenance packet, or you can blindly type the command ``$3#33``.
568Whenever the kernel debugger stops in kgdb mode it will print the
569message ``KGDB or $3#33 for KDB``. It is important to note that you have
570to type the sequence correctly in one pass. You cannot type a backspace
571or delete because kgdb will interpret that as part of the debug stream.
572
5731. Change from kgdb to kdb by blindly typing::
574
575 $3#33
576
5772. Change from kgdb to kdb with gdb::
578
579 maintenance packet 3
580
581 .. note::
582
583 Now you must kill gdb. Typically you press :kbd:`CTRL-Z` and issue
584 the command::
585
586 kill -9 %
587
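The ``$3#33`` sequence is an ordinary gdb remote protocol packet: a ``$``, the payload, a ``#``, and a two-digit hexadecimal checksum that is the sum of the payload bytes modulo 256. The following is a minimal userspace sketch of that framing, not code from the kernel; the names ``rsp_checksum`` and ``rsp_frame`` are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Sum of the payload bytes modulo 256, as used by the gdb remote protocol. */
static unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;

    while (*payload)
        sum += (unsigned char)*payload++;
    return sum;
}

/*
 * Frame a payload as "$<payload>#<two hex digits>".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the output buffer is too small.
 */
static int rsp_frame(const char *payload, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "$%s#%02x", payload, rsp_checksum(payload));

    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

Framing the one-character payload ``3`` (ASCII 0x33) this way yields ``$3#33``, which is why the sequence must be typed exactly: any extra character changes the payload and no longer matches the checksum.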
588Change from kdb to kgdb
589~~~~~~~~~~~~~~~~~~~~~~~
590
591There are two ways you can change from kdb to kgdb. You can manually
592enter kgdb mode by issuing the kgdb command from the kdb shell prompt,
593or you can connect gdb while the kdb shell prompt is active. The kdb
594shell looks for the typical first commands that gdb would issue with the
595gdb remote protocol and if it sees one of those commands it
596automatically changes into kgdb mode.
597
5981. From kdb issue the command::
599
600 kgdb
601
602 Now disconnect your terminal program and connect gdb in its place
603
6042. At the kdb prompt, disconnect the terminal program and connect gdb in
605 its place.
606
607Running kdb commands from gdb
608-----------------------------
609
610It is possible to run a limited set of kdb commands from gdb, using the
611gdb monitor command. You don't want to execute any of the run control or
612breakpoint operations, because it can disrupt the state of the kernel
613debugger. You should be using gdb for breakpoints and run control
614operations if you have gdb connected. The more useful commands to run
615are things like lsmod, dmesg, ps or possibly some of the memory
616information commands. To see all the kdb commands you can run
617``monitor help``.
618
619Example::
620
621 (gdb) monitor ps
622 1 idle process (state I) and
623 27 sleeping system daemon (state M) processes suppressed,
624 use 'ps A' to see all.
625 Task Addr Pid Parent [*] cpu State Thread Command
626
627 0xc78291d0 1 0 0 0 S 0xc7829404 init
628 0xc7954150 942 1 0 0 S 0xc7954384 dropbear
629 0xc78789c0 944 1 0 0 S 0xc7878bf4 sh
630 (gdb)
631
632kgdb Test Suite
633===============
634
635When kgdb is enabled in the kernel config you can also elect to enable
636the config parameter ``KGDB_TESTS``. Turning this on will enable a special
637kgdb I/O module which is designed to test the kgdb internal functions.
638
639The kgdb tests are mainly intended for developers to test the kgdb
640internals as well as a tool for developing a new kgdb architecture
641specific implementation. These tests are not really for end users of the
642Linux kernel. The primary source of documentation would be to look in
643the ``drivers/misc/kgdbts.c`` file.
644
645The kgdb test suite can also be configured at compile time to run the
646core set of tests by setting the kernel config parameter
647``KGDB_TESTS_ON_BOOT``. This particular option is aimed at automated
648regression testing and does not require modifying the kernel boot config
649arguments. If this is turned on, the kgdb test suite can be disabled by
650specifying ``kgdbts=`` as a kernel boot argument.
651
652Kernel Debugger Internals
653=========================
654
655Architecture Specifics
656----------------------
657
658The kernel debugger is organized into a number of components:
659
6601. The debug core
661
662 The debug core is found in ``kernel/debug/debug_core.c``. It
663 contains:
664
665 - A generic OS exception handler which includes sync'ing the
666 processors into a stopped state on a multi-CPU system.
667
668 - The API to talk to the kgdb I/O drivers
669
670 - The API to make calls to the arch-specific kgdb implementation
671
672 - The logic to perform safe memory reads and writes while
673 using the debugger
674
675 - A full implementation for software breakpoints unless overridden
676 by the arch
677
678 - The API to invoke either the kdb or kgdb frontend to the debug
679 core.
680
681 - The structures and callback API for atomic kernel mode setting.
682
683 .. note:: kgdboc is where the kms callbacks are invoked.
684
6852. kgdb arch-specific implementation
686
687 This implementation is generally found in ``arch/*/kernel/kgdb.c``. As
688 an example, ``arch/x86/kernel/kgdb.c`` contains the specifics to
689 implement HW breakpoint as well as the initialization to dynamically
690 register and unregister for the trap handlers on this architecture.
691 The arch-specific portion implements:
692
693 - An arch-specific trap catcher which invokes
694 :c:func:`kgdb_handle_exception` to start kgdb doing its work
695
696 - Translation between the gdb-specific packet format and :c:type:`pt_regs`
697
698 - Registration and unregistration of architecture specific trap
699 hooks
700
701 - Any special exception handling and cleanup
702
703 - NMI exception handling and cleanup
704
705 - (optional) HW breakpoints
706
7073. gdbstub frontend (aka kgdb)
708
709 The gdbstub is located in ``kernel/debug/gdbstub.c``. It contains:
710
711 - All the logic to implement the gdb serial protocol
712
7134. kdb frontend
714
715 The kdb debugger shell is broken down into a number of components.
716 The kdb core is located in ``kernel/debug/kdb``. There are a number of
717 helper functions in some of the other kernel components to make it
718 possible for kdb to examine and report information about the kernel
719 without taking locks that could cause a kernel deadlock. The kdb core
720 implements the following functionality:
721
722 - A simple shell
723
724 - The kdb core command set
725
726 - A registration API to register additional kdb shell commands.
727
728 - A good example of a self-contained kdb module is the ``ftdump``
729 command for dumping the ftrace buffer. See:
730 ``kernel/trace/trace_kdb.c``
731
732 - For an example of how to dynamically register a new kdb command
733 you can build the kdb_hello.ko kernel module from
734 ``samples/kdb/kdb_hello.c``. To build this example you can set
735 ``CONFIG_SAMPLES=y`` and ``CONFIG_SAMPLE_KDB=m`` in your kernel
736 config. Later run ``modprobe kdb_hello`` and the next time you
737 enter the kdb shell, you can run the ``hello`` command.
738
739 - The implementation for :c:func:`kdb_printf` which emits messages directly
740 to I/O drivers, bypassing the kernel log.
741
742 - SW / HW breakpoint management for the kdb shell
743
7445. kgdb I/O driver
745
746 Each kgdb I/O driver has to provide an implementation for the
747 following:
748
749 - configuration via built-in or module
750
751 - dynamic configuration and kgdb hook registration calls
752
753 - read and write character interface
754
755 - A cleanup handler for unconfiguring from the kgdb core
756
757 - (optional) Early debug methodology
758
759 Any given kgdb I/O driver has to operate very closely with the
760 hardware and must do it in such a way that does not enable interrupts
761 or change other parts of the system context without completely
762 restoring them. The kgdb core will repeatedly "poll" a kgdb I/O
763 driver for characters when it needs input. The I/O driver is expected
764 to return immediately if there is no data available. Doing so allows
765 for the future possibility to touch watchdog hardware in such a way
766 as to have a target system not reset when these are enabled.
767
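The polling contract described for kgdb I/O drivers (return at once when no data is available, and let the caller decide whether to retry) can be sketched in plain userspace C. This is only an illustration under stated assumptions: ``struct sketch_fifo``, ``SKETCH_NO_CHAR`` and both functions are hypothetical stand-ins, not the kernel's I/O driver interface.

```c
#include <stddef.h>

#define SKETCH_NO_CHAR (-1)   /* hypothetical "nothing available" value */

/* Hypothetical stand-in for a device receive FIFO. */
struct sketch_fifo {
    const char *data;
    size_t len;
    size_t pos;
};

/* Non-blocking read: return the next character, or SKETCH_NO_CHAR at once. */
static int sketch_read_char(struct sketch_fifo *f)
{
    if (f->pos >= f->len)
        return SKETCH_NO_CHAR;
    return (unsigned char)f->data[f->pos++];
}

/*
 * Caller-side poll loop: keep asking until a character shows up or the
 * retry budget runs out. The gap between polls is where a caller could
 * service a watchdog (represented here only by a counter).
 */
static int sketch_poll_char(struct sketch_fifo *f, unsigned int max_polls,
                            unsigned int *idle_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        int c = sketch_read_char(f);

        if (c != SKETCH_NO_CHAR)
            return c;
        (*idle_polls)++;
    }
    return SKETCH_NO_CHAR;
}
```

Because the read side never blocks, the loop stays in control between attempts, which is the property the text relies on for future watchdog handling.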
768If you are intent on adding kgdb architecture specific support for a new
769architecture, the architecture should define ``HAVE_ARCH_KGDB`` in the
770architecture specific Kconfig file. This will enable kgdb for the
771architecture, and at that point you must create an architecture specific
772kgdb implementation.
773
774There are a few flags which must be set on every architecture in their
775``asm/kgdb.h`` file. These are:
776
777- ``NUMREGBYTES``:
778 The size in bytes of all of the registers, so that we
779 can ensure they will all fit into a packet.
780
781- ``BUFMAX``:
782 The size in bytes of the buffer GDB will read into. This must
783 be larger than NUMREGBYTES.
784
785- ``CACHE_FLUSH_IS_SAFE``:
786 Set to 1 if it is always safe to call
787 flush_cache_range or flush_icache_range. On some architectures,
788 these functions may not be safe to call on SMP since we keep other
789 CPUs in a holding pattern.
790
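As a rough sizing aid for ``BUFMAX``, the sketch below assumes the register dump travels hex-encoded (two characters per raw byte) inside a ``$...#xx`` frame, as in the gdb remote protocol. The helper names are hypothetical and the arithmetic is an illustration, not a rule taken from any architecture's ``asm/kgdb.h``.

```c
#include <stddef.h>

/* Characters in a framed packet whose payload is hex-encoded raw bytes. */
static size_t sketch_hex_packet_len(size_t raw_bytes)
{
    return 2 * raw_bytes + 4;   /* '$' + payload + '#' + 2 checksum digits */
}

/* Nonzero if a buffer of bufmax chars holds the packet plus a NUL. */
static int sketch_bufmax_ok(size_t bufmax, size_t numregbytes)
{
    return bufmax >= sketch_hex_packet_len(numregbytes) + 1;
}
```

Under these assumptions a buffer that is only slightly larger than ``NUMREGBYTES`` would be too small, so it may be worth budgeting at least twice ``NUMREGBYTES`` plus framing overhead.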
791There are also the following functions for the common backend, found in
792``kernel/kgdb.c``, that must be supplied by the architecture-specific
793backend unless marked as (optional), in which case a default function
794may be used if the architecture does not need to provide a specific
795implementation.
796
797.. kernel-doc:: include/linux/kgdb.h
798 :internal:
799
800kgdboc internals
801----------------
802
803kgdboc and uarts
804~~~~~~~~~~~~~~~~
805
806The kgdboc driver is actually a very thin driver that relies on the
807underlying low-level hardware driver having "polling hooks" to
808which the tty driver is attached. In the initial implementation of
809kgdboc the serial_core was changed to expose a low level UART hook for
810doing polled mode reading and writing of a single character while in an
811atomic context. When kgdb makes an I/O request to the debugger, kgdboc
812invokes a callback in the serial core which in turn uses the callback in
813the UART driver.
814
815When using kgdboc with a UART, the UART driver must implement two
816callbacks in the :c:type:`struct uart_ops <uart_ops>`.
817Example from ``drivers/8250.c``::
818
819
820 #ifdef CONFIG_CONSOLE_POLL
821 .poll_get_char = serial8250_get_poll_char,
822 .poll_put_char = serial8250_put_poll_char,
823 #endif
824
825
826Any implementation specifics around creating a polling driver use the
827``#ifdef CONFIG_CONSOLE_POLL``, as shown above. Keep in mind that
828polling hooks have to be implemented in such a way that they can be
829called from an atomic context and have to restore the state of the UART
830chip on return such that the system can return to normal when the
831debugger detaches. You need to be very careful with any kind of lock you
832consider, because failing here is most likely going to mean pressing the
833reset button.
834
835kgdboc and keyboards
836~~~~~~~~~~~~~~~~~~~~~~~~
837
838The kgdboc driver contains logic to configure communications with an
839attached keyboard. The keyboard infrastructure is only compiled into the
840kernel when ``CONFIG_KDB_KEYBOARD=y`` is set in the kernel configuration.
841
842The core polled keyboard driver for PS/2 type keyboards is in
843``drivers/char/kdb_keyboard.c``. This driver is hooked into the debug core
844when kgdboc populates the callback in the array called
845:c:type:`kdb_poll_funcs[]`. The :c:func:`kdb_get_kbd_char` is the top-level
846function which polls hardware for single character input.
847
848kgdboc and kms
849~~~~~~~~~~~~~~~~~~
850
851The kgdboc driver contains logic to request the graphics display to
852switch to a text context when you are using ``kgdboc=kms,kbd``, provided
853that you have a video driver which has a frame buffer console and atomic
854kernel mode setting support.
855
856Every time the kernel debugger is entered it calls
857:c:func:`kgdboc_pre_exp_handler` which in turn calls :c:func:`con_debug_enter`
858in the virtual console layer. On resuming kernel execution, the kernel
859debugger calls :c:func:`kgdboc_post_exp_handler` which in turn calls
860:c:func:`con_debug_leave`.
861
862Any video driver that wants to be compatible with the kernel debugger
863and the atomic kms callbacks must implement the ``mode_set_base_atomic``,
864``fb_debug_enter`` and ``fb_debug_leave`` operations. For the
865``fb_debug_enter`` and ``fb_debug_leave`` the option exists to use the
866generic drm fb helper functions or implement something custom for the
867hardware. The following example shows the initialization of the
868``.mode_set_base_atomic`` operation in
869``drivers/gpu/drm/i915/intel_display.c``::
870
871
872 static const struct drm_crtc_helper_funcs intel_helper_funcs = {
873 [...]
874 .mode_set_base_atomic = intel_pipe_set_base_atomic,
875 [...]
876 };
877
878
879Here is an example of how the i915 driver initializes the
880fb_debug_enter and fb_debug_leave functions to use the generic drm
881helpers in ``drivers/gpu/drm/i915/intel_fb.c``::
882
883
884 static struct fb_ops intelfb_ops = {
885 [...]
886 .fb_debug_enter = drm_fb_helper_debug_enter,
887 .fb_debug_leave = drm_fb_helper_debug_leave,
888 [...]
889 };
890
891
892Credits
893=======
894
895The following people have contributed to this document:
896
8971. Amit Kale <amitkale@linsyssoft.com>
898
8992. Tom Rini <trini@kernel.crashing.org>
900
901In March 2008 this document was completely rewritten by:
902
903- Jason Wessel <jason.wessel@windriver.com>
904
905In Jan 2010 this document was updated to include kdb.
906
907- Jason Wessel <jason.wessel@windriver.com>
diff --git a/Documentation/doc-guide/docbook.rst b/Documentation/doc-guide/docbook.rst
deleted file mode 100644
index d8bf04308b43..000000000000
--- a/Documentation/doc-guide/docbook.rst
+++ /dev/null
@@ -1,90 +0,0 @@
1DocBook XML [DEPRECATED]
2========================
3
4.. attention::
5
6 This section describes the deprecated DocBook XML toolchain. Please do not
7 create new DocBook XML template files. Please consider converting existing
8 DocBook XML templates files to Sphinx/reStructuredText.
9
10Converting DocBook to Sphinx
11----------------------------
12
13Over time, we expect all of the documents under ``Documentation/DocBook`` to be
14converted to Sphinx and reStructuredText. For most DocBook XML documents, a good
15enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script,
16which uses ``pandoc`` under the hood. For example::
17
18 $ cd Documentation/sphinx
19 $ ./tmplcvt ../DocBook/in.tmpl ../out.rst
20
21Then edit the resulting rst files to fix any remaining issues, and add the
22document in the ``toctree`` in ``Documentation/index.rst``.
23
24Components of the kernel-doc system
25-----------------------------------
26
27Many places in the source tree have extractable documentation in the form of
28block comments above functions. The components of this system are:
29
30- ``scripts/kernel-doc``
31
32 This is a perl script that hunts for the block comments and can mark them up
33 directly into reStructuredText, DocBook, man, text, and HTML. (No, not
34 texinfo.)
35
36- ``Documentation/DocBook/*.tmpl``
37
38 These are XML template files, which are normal XML files with special
39 place-holders for where the extracted documentation should go.
40
41- ``scripts/docproc.c``
42
43 This is a program for converting XML template files into XML files. When a
44 file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be
45 able to distinguish between internal and external functions.
46
47 It invokes kernel-doc, giving it the list of functions that are to be
48 documented.
49
50 Additionally it is used to scan the XML template files to locate all the files
51 referenced herein. This is used to generate dependency information as used by
52 make.
53
54- ``Makefile``
55
56 The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build
57 DocBook XML files, PostScript files, PDF files, and html files in
58 Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'.
59
60- ``Documentation/DocBook/Makefile``
61
62 This is where C files are associated with SGML templates.
63
64How to use kernel-doc comments in DocBook XML template files
65------------------------------------------------------------
66
67DocBook XML template files (\*.tmpl) are like normal XML files, except that they
68can contain escape sequences where extracted documentation should be inserted.
69
70``!E<filename>`` is replaced by the documentation, in ``<filename>``, for
71functions that are exported using ``EXPORT_SYMBOL``: the function list is
72collected from files listed in ``Documentation/DocBook/Makefile``.
73
74``!I<filename>`` is replaced by the documentation for functions that are **not**
75exported using ``EXPORT_SYMBOL``.
76
77``!D<filename>`` is used to name additional files to search for functions
78exported using ``EXPORT_SYMBOL``.
79
80``!F<filename> <function [functions...]>`` is replaced by the documentation, in
81``<filename>``, for the functions listed.
82
83``!P<filename> <section title>`` is replaced by the contents of the ``DOC:``
84section titled ``<section title>`` from ``<filename>``. Spaces are allowed in
85``<section title>``; do not quote the ``<section title>``.
86
87``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC:
88sections and documented functions, symbols, etc. are used. This makes sense to
89use when you use ``!F`` or ``!P`` only and want to verify that all documentation
90is included.
diff --git a/Documentation/doc-guide/index.rst b/Documentation/doc-guide/index.rst
index 6fff4024606e..a7f95d7d3a63 100644
--- a/Documentation/doc-guide/index.rst
+++ b/Documentation/doc-guide/index.rst
@@ -10,7 +10,6 @@ How to write kernel documentation
10 sphinx.rst 10 sphinx.rst
11 kernel-doc.rst 11 kernel-doc.rst
12 parse-headers.rst 12 parse-headers.rst
13 docbook.rst
14 13
15.. only:: subproject and html 14.. only:: subproject and html
16 15
diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst
index b32e4813ff6f..b24854b5d6be 100644
--- a/Documentation/doc-guide/kernel-doc.rst
+++ b/Documentation/doc-guide/kernel-doc.rst
@@ -149,6 +149,16 @@ Domain`_ references.
149``%CONST`` 149``%CONST``
150 Name of a constant. (No cross-referencing, just formatting.) 150 Name of a constant. (No cross-referencing, just formatting.)
151 151
152````literal````
153 A literal block that should be handled as-is. The output will use a
154 ``monospaced font``.
155
156 Useful if you need to use special characters that would otherwise have some
157 meaning either to the kernel-doc script or to reStructuredText.
158
159 This is particularly useful if you need to use things like ``%ph`` inside
160 a function description.
161
152``$ENVVAR`` 162``$ENVVAR``
153 Name of an environment variable. (No cross-referencing, just formatting.) 163 Name of an environment variable. (No cross-referencing, just formatting.)
154 164
diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst
index 731334de3efd..84e8e8a9cbdb 100644
--- a/Documentation/doc-guide/sphinx.rst
+++ b/Documentation/doc-guide/sphinx.rst
@@ -15,11 +15,6 @@ are used to describe the functions and types and design of the code. The
15kernel-doc comments have some special structure and formatting, but beyond that 15kernel-doc comments have some special structure and formatting, but beyond that
16they are also treated as reStructuredText. 16they are also treated as reStructuredText.
17 17
18There is also the deprecated DocBook toolchain to generate documentation from
19DocBook XML template files under ``Documentation/DocBook``. The DocBook files
20are to be converted to reStructuredText, and the toolchain is slated to be
21removed.
22
23Finally, there are thousands of plain text documentation files scattered around 18Finally, there are thousands of plain text documentation files scattered around
24``Documentation``. Some of these will likely be converted to reStructuredText 19``Documentation``. Some of these will likely be converted to reStructuredText
25over time, but the bulk of them will remain in plain text. 20over time, but the bulk of them will remain in plain text.
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 77b92221f951..f64a63b233c3 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -118,7 +118,6 @@ defkeymap.c
118devlist.h* 118devlist.h*
119devicetable-offsets.h 119devicetable-offsets.h
120dnotify_test 120dnotify_test
121docproc
122dslm 121dslm
123dtc 122dtc
124elf2ecoff 123elf2ecoff
diff --git a/Documentation/driver-api/i2c.rst b/Documentation/driver-api/i2c.rst
index f3939f7852bd..0bf86a445d01 100644
--- a/Documentation/driver-api/i2c.rst
+++ b/Documentation/driver-api/i2c.rst
@@ -13,8 +13,8 @@ I2C is a multi-master bus; open drain signaling is used to arbitrate
13between masters, as well as to handshake and to synchronize clocks from 13between masters, as well as to handshake and to synchronize clocks from
14slower clients. 14slower clients.
15 15
16The Linux I2C programming interfaces support only the master side of bus 16The Linux I2C programming interfaces support the master side of bus
17interactions, not the slave side. The programming interface is 17interactions and the slave side. The programming interface is
18structured around two kinds of driver, and two kinds of device. An I2C 18structured around two kinds of driver, and two kinds of device. An I2C
19"Adapter Driver" abstracts the controller hardware; it binds to a 19"Adapter Driver" abstracts the controller hardware; it binds to a
20physical device (perhaps a PCI device or platform_device) and exposes a 20physical device (perhaps a PCI device or platform_device) and exposes a
@@ -22,9 +22,8 @@ physical device (perhaps a PCI device or platform_device) and exposes a
22I2C bus segment it manages. On each I2C bus segment will be I2C devices 22I2C bus segment it manages. On each I2C bus segment will be I2C devices
23represented by a :c:type:`struct i2c_client <i2c_client>`. 23represented by a :c:type:`struct i2c_client <i2c_client>`.
24Those devices will be bound to a :c:type:`struct i2c_driver 24Those devices will be bound to a :c:type:`struct i2c_driver
25<i2c_driver>`, which should follow the standard Linux driver 25<i2c_driver>`, which should follow the standard Linux driver model. There
26model. (At this writing, a legacy model is more widely used.) There are 26are functions to perform various I2C protocol operations; at this writing
27functions to perform various I2C protocol operations; at this writing
28all such functions are usable only from task context. 27all such functions are usable only from task context.
29 28
30The System Management Bus (SMBus) is a sibling protocol. Most SMBus 29The System Management Bus (SMBus) is a sibling protocol. Most SMBus
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 8058a87c1c74..3cf1acebc4ee 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -32,7 +32,13 @@ available subsections can be seen below.
32 i2c 32 i2c
33 hsi 33 hsi
34 edac 34 edac
35 scsi
36 libata
37 mtdnand
35 miscellaneous 38 miscellaneous
39 w1
40 rapidio
41 s390-drivers
36 vme 42 vme
37 80211/index 43 80211/index
38 uio-howto 44 uio-howto
diff --git a/Documentation/driver-api/libata.rst b/Documentation/driver-api/libata.rst
new file mode 100644
index 000000000000..4adc056f7635
--- /dev/null
+++ b/Documentation/driver-api/libata.rst
@@ -0,0 +1,1031 @@
1========================
2libATA Developer's Guide
3========================
4
5:Author: Jeff Garzik
6
7Introduction
8============
9
10libATA is a library used inside the Linux kernel to support ATA host
11controllers and devices. libATA provides an ATA driver API, class
12transports for ATA and ATAPI devices, and SCSI<->ATA translation for ATA
13devices according to the T10 SAT specification.
14
15This Guide documents the libATA driver API, library functions, library
16internals, and a couple of sample ATA low-level drivers.
17
18libata Driver API
19=================
20
21:c:type:`struct ata_port_operations <ata_port_operations>`
22is defined for every low-level libata
23hardware driver, and it controls how the low-level driver interfaces
24with the ATA and SCSI layers.
25
26FIS-based drivers will hook into the system with ``->qc_prep()`` and
27``->qc_issue()`` high-level hooks. Hardware which behaves in a manner
28similar to PCI IDE hardware may utilize several generic helpers,
29defining at a bare minimum the bus I/O addresses of the ATA shadow
30register blocks.
31
32:c:type:`struct ata_port_operations <ata_port_operations>`
33----------------------------------------------------------
34
35Disable ATA port
36~~~~~~~~~~~~~~~~
37
38::
39
40 void (*port_disable) (struct ata_port *);
41
42
43Called from :c:func:`ata_bus_probe` error path, as well as when unregistering
44from the SCSI module (rmmod, hot unplug). This function should do
45whatever needs to be done to take the port out of use. In most cases,
46:c:func:`ata_port_disable` can be used as this hook.
47
48Called from :c:func:`ata_bus_probe` on a failed probe. Called from
49:c:func:`ata_scsi_release`.
50
51Post-IDENTIFY device configuration
52~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
53
54::
55
56 void (*dev_config) (struct ata_port *, struct ata_device *);
57
58
59Called after IDENTIFY [PACKET] DEVICE is issued to each device found.
60Typically used to apply device-specific fixups prior to issue of SET
61FEATURES - XFER MODE, and prior to operation.
62
63This entry may be specified as NULL in ata_port_operations.
64
65Set PIO/DMA mode
66~~~~~~~~~~~~~~~~
67
68::
69
70 void (*set_piomode) (struct ata_port *, struct ata_device *);
71 void (*set_dmamode) (struct ata_port *, struct ata_device *);
72 void (*post_set_mode) (struct ata_port *);
73 unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int);
74
75
76Hooks called prior to the issue of SET FEATURES - XFER MODE command. The
77optional ``->mode_filter()`` hook is called when libata has built a mask of
78the possible modes. This is passed to the ``->mode_filter()`` function
79which should return a mask of valid modes after filtering those
80unsuitable due to hardware limits. It is not valid to use this interface
81to add modes.
82
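The rule that ``->mode_filter()`` may remove modes but never add them amounts to requiring that the returned mask be a subset of the mask passed in. Here is a minimal userspace sketch of that idea; the ``SK_*`` bit values and both functions are hypothetical, and the sketch drops the ``struct ata_port`` and ``struct ata_device`` arguments that the real hook receives.

```c
/* Hypothetical mode bits, for illustration only (not libata's layout). */
#define SK_PIO4   (1u << 0)
#define SK_UDMA2  (1u << 1)
#define SK_UDMA5  (1u << 2)
#define SK_UDMA6  (1u << 3)

/* Hypothetical controller limit: nothing faster than SK_UDMA5. */
static unsigned int sketch_mode_filter(unsigned int mask)
{
    return mask & ~SK_UDMA6;
}

/* A filter is well-behaved if every bit it returns was in the input. */
static int sketch_filter_is_valid(unsigned int in, unsigned int out)
{
    return (out & ~in) == 0;
}
```

Clearing bits with ``&`` can only shrink the mask, so a filter written this way cannot accidentally introduce a mode the library did not offer.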
83``dev->pio_mode`` and ``dev->dma_mode`` are guaranteed to be valid when
84``->set_piomode()`` and ``->set_dmamode()`` are called. The timings for
85any other drive sharing the cable will also be valid at this point. That
86is, the library records the decisions for the modes of each drive on a
87channel before it attempts to set any of them.
88
89``->post_set_mode()`` is called unconditionally, after the SET FEATURES -
90XFER MODE command completes successfully.
91
92``->set_piomode()`` is always called (if present), but ``->set_dmamode()``
93is only called if DMA is possible.
94
95Taskfile read/write
96~~~~~~~~~~~~~~~~~~~
97
98::
99
100 void (*sff_tf_load) (struct ata_port *ap, struct ata_taskfile *tf);
101 void (*sff_tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
102
103
104``->sff_tf_load()`` is called to load the given taskfile into hardware
105registers / DMA buffers. ``->sff_tf_read()`` is called to read the hardware
106registers / DMA buffers, to obtain the current set of taskfile register
107values. Most drivers for taskfile-based hardware (PIO or MMIO) use
108:c:func:`ata_sff_tf_load` and :c:func:`ata_sff_tf_read` for these hooks.
109
110PIO data read/write
111~~~~~~~~~~~~~~~~~~~
112
113::
114
115 void (*sff_data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
116
117
118All bmdma-style drivers must implement this hook. This is the low-level
119operation that actually copies the data bytes during a PIO data
120transfer. Typically the driver will choose one of
121:c:func:`ata_sff_data_xfer_noirq`, :c:func:`ata_sff_data_xfer`, or
122:c:func:`ata_sff_data_xfer32`.
123
124ATA command execute
125~~~~~~~~~~~~~~~~~~~
126
127::
128
129 void (*sff_exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
130
131
132Causes an ATA command, previously loaded with ``->sff_tf_load()``, to be
133initiated in hardware. Most drivers for taskfile-based hardware use
134:c:func:`ata_sff_exec_command` for this hook.
135
136Per-cmd ATAPI DMA capabilities filter
137~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
138
139::
140
141 int (*check_atapi_dma) (struct ata_queued_cmd *qc);
142
143
144Allow low-level driver to filter ATA PACKET commands, returning a status
145indicating whether or not it is OK to use DMA for the supplied PACKET
146command.
147
148This hook may be specified as NULL, in which case libata will assume
149that ATAPI DMA can be supported.
150
151Read specific ATA shadow registers
152~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153
154::
155
156 u8 (*sff_check_status)(struct ata_port *ap);
157 u8 (*sff_check_altstatus)(struct ata_port *ap);
158
159
160Reads the Status/AltStatus ATA shadow register from hardware. On some
161hardware, reading the Status register has the side effect of clearing
162the interrupt condition. Most drivers for taskfile-based hardware use
163:c:func:`ata_sff_check_status` for this hook.
164
165Write specific ATA shadow register
166~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
167
168::
169
170 void (*sff_set_devctl)(struct ata_port *ap, u8 ctl);
171
172
173Write the device control ATA shadow register to the hardware. Most
174drivers don't need to define this.
175
176Select ATA device on bus
177~~~~~~~~~~~~~~~~~~~~~~~~
178
179::
180
181 void (*sff_dev_select)(struct ata_port *ap, unsigned int device);
182
183
Issues the low-level hardware command(s) that cause one of N hardware
devices to be considered 'selected' (active and available for use) on
the ATA bus. This generally has no meaning on FIS-based devices.
187
188Most drivers for taskfile-based hardware use :c:func:`ata_sff_dev_select` for
189this hook.
190
191Private tuning method
192~~~~~~~~~~~~~~~~~~~~~
193
194::
195
196 void (*set_mode) (struct ata_port *ap);
197
198
199By default libata performs drive and controller tuning in accordance
200with the ATA timing rules and also applies blacklists and cable limits.
Some controllers need special handling and have custom tuning rules,
typically RAID controllers that use ATA commands but do not actually do
drive timing.
204
205 **Warning**
206
207 This hook should not be used to replace the standard controller
208 tuning logic when a controller has quirks. Replacing the default
209 tuning logic in that case would bypass handling for drive and bridge
210 quirks that may be important to data reliability. If a controller
211 needs to filter the mode selection it should use the mode_filter
212 hook instead.
213
214Control PCI IDE BMDMA engine
215~~~~~~~~~~~~~~~~~~~~~~~~~~~~
216
217::
218
219 void (*bmdma_setup) (struct ata_queued_cmd *qc);
220 void (*bmdma_start) (struct ata_queued_cmd *qc);
221 void (*bmdma_stop) (struct ata_port *ap);
222 u8 (*bmdma_status) (struct ata_port *ap);
223
224
225When setting up an IDE BMDMA transaction, these hooks arm
226(``->bmdma_setup``), fire (``->bmdma_start``), and halt (``->bmdma_stop``) the
227hardware's DMA engine. ``->bmdma_status`` is used to read the standard PCI
228IDE DMA Status register.
229
230These hooks are typically either no-ops, or simply not implemented, in
231FIS-based drivers.
232
233Most legacy IDE drivers use :c:func:`ata_bmdma_setup` for the
234:c:func:`bmdma_setup` hook. :c:func:`ata_bmdma_setup` will write the pointer
235to the PRD table to the IDE PRD Table Address register, enable DMA in the DMA
236Command register, and call :c:func:`exec_command` to begin the transfer.
237
238Most legacy IDE drivers use :c:func:`ata_bmdma_start` for the
239:c:func:`bmdma_start` hook. :c:func:`ata_bmdma_start` will write the
240ATA_DMA_START flag to the DMA Command register.
241
242Many legacy IDE drivers use :c:func:`ata_bmdma_stop` for the
243:c:func:`bmdma_stop` hook. :c:func:`ata_bmdma_stop` clears the ATA_DMA_START
244flag in the DMA command register.
245
246Many legacy IDE drivers use :c:func:`ata_bmdma_status` as the
247:c:func:`bmdma_status` hook.
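
The arm/fire/halt sequence can be modelled in user space as below. This is
only a sketch: ``model_bmdma`` is a hypothetical in-memory stand-in for one
channel's bus-master registers, and the bit values are the ones the PCI IDE
bus-master interface is assumed to use (bit 0 start, bit 3 direction).

```c
#include <stdint.h>

/* Assumed PCI IDE bus-master Command register bits. */
#define DMA_CMD_START  (1u << 0)   /* called ATA_DMA_START in the text */
#define DMA_CMD_WRITE  (1u << 3)   /* engine writes to memory, i.e. a device read */

/* Hypothetical in-memory model of one channel's BMDMA registers. */
struct model_bmdma {
    uint8_t  command;     /* DMA Command register */
    uint8_t  status;      /* DMA Status register */
    uint32_t prd_table;   /* PRD Table Address register */
};

/* Arm: point the engine at the PRD table and set the direction; START stays off. */
void model_bmdma_setup(struct model_bmdma *r, uint32_t prd_addr, int device_read)
{
    r->prd_table = prd_addr;
    r->command &= (uint8_t)~(DMA_CMD_START | DMA_CMD_WRITE);
    if (device_read)
        r->command |= DMA_CMD_WRITE;
    /* A real driver would now issue the ATA command to the device. */
}

/* Fire: set the START bit so the engine begins walking the PRD table. */
void model_bmdma_start(struct model_bmdma *r)
{
    r->command |= DMA_CMD_START;
}

/* Halt: clear the START bit. */
void model_bmdma_stop(struct model_bmdma *r)
{
    r->command &= (uint8_t)~DMA_CMD_START;
}

/* Read back the DMA Status register. */
uint8_t model_bmdma_status(const struct model_bmdma *r)
{
    return r->status;
}
```

Keeping setup and start separate lets the ATA command be issued to the
device after the engine is armed but before it is fired.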
248
249High-level taskfile hooks
250~~~~~~~~~~~~~~~~~~~~~~~~~
251
252::
253
254 void (*qc_prep) (struct ata_queued_cmd *qc);
255 int (*qc_issue) (struct ata_queued_cmd *qc);
256
257
These two higher-level hooks can potentially supersede several of
the above taskfile/DMA engine hooks. ``->qc_prep`` is called after the
buffers have been DMA-mapped, and is typically used to populate the
hardware's DMA scatter-gather table. Most drivers use the standard
:c:func:`ata_qc_prep` helper function, but more advanced drivers roll their
own.
264
265``->qc_issue`` is used to make a command active, once the hardware and S/G
266tables have been prepared. IDE BMDMA drivers use the helper function
267:c:func:`ata_qc_issue_prot` for taskfile protocol-based dispatch. More
268advanced drivers implement their own ``->qc_issue``.
269
270:c:func:`ata_qc_issue_prot` calls ``->tf_load()``, ``->bmdma_setup()``, and
271``->bmdma_start()`` as necessary to initiate a transfer.
272
273Exception and probe handling (EH)
274~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
275
276::
277
278 void (*eng_timeout) (struct ata_port *ap);
279 void (*phy_reset) (struct ata_port *ap);
280
281
282Deprecated. Use ``->error_handler()`` instead.
283
284::
285
286 void (*freeze) (struct ata_port *ap);
287 void (*thaw) (struct ata_port *ap);
288
289
290:c:func:`ata_port_freeze` is called when HSM violations or some other
291condition disrupts normal operation of the port. A frozen port is not
292allowed to perform any operation until the port is thawed, which usually
293follows a successful reset.
294
295The optional ``->freeze()`` callback can be used for freezing the port
296hardware-wise (e.g. mask interrupt and stop DMA engine). If a port
297cannot be frozen hardware-wise, the interrupt handler must ack and clear
298interrupts unconditionally while the port is frozen.
299
300The optional ``->thaw()`` callback is called to perform the opposite of
301``->freeze()``: prepare the port for normal operation once again. Unmask
302interrupts, start DMA engine, etc.
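
The freeze/thaw contract can be sketched with a small user-space model. The
``model_port`` structure and its fields are hypothetical; the point is only
that a frozen port still acknowledges and clears interrupts but does no
further work.

```c
#include <stdbool.h>

/* Hypothetical port model; none of these fields are libata's. */
struct model_port {
    bool frozen;        /* set by EH, cleared after a successful reset */
    bool irq_masked;    /* hardware interrupt mask, if the controller has one */
    bool irq_pending;   /* latched interrupt condition */
    int  handled;       /* number of interrupts processed normally */
};

void model_freeze(struct model_port *ap)
{
    ap->irq_masked = true;   /* mask interrupts; a real driver would also stop DMA */
    ap->frozen = true;
}

void model_thaw(struct model_port *ap)
{
    ap->irq_pending = false; /* drop anything latched while frozen */
    ap->irq_masked = false;
    ap->frozen = false;
}

/* Returns true if the interrupt was processed as a normal completion. */
bool model_interrupt(struct model_port *ap)
{
    if (!ap->irq_pending)
        return false;
    ap->irq_pending = false; /* always ack and clear, frozen or not */
    if (ap->frozen)
        return false;        /* frozen: acknowledge only, do no work */
    ap->handled++;
    return true;
}
```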
303
304::
305
306 void (*error_handler) (struct ata_port *ap);
307
308
``->error_handler()`` is a driver's hook into probe, hotplug, recovery,
and other exceptional conditions. The primary responsibility of an
implementation is to call :c:func:`ata_do_eh` or :c:func:`ata_bmdma_drive_eh`
with a set of EH hooks as arguments:

'prereset' hook (may be NULL) is called during an EH reset, before any
other actions are taken.

'postreset' hook (may be NULL) is called after the EH reset is
performed.

Based on existing conditions, severity of the problem, and hardware
capabilities, either 'softreset' (may be NULL) or 'hardreset' (may be NULL)
will be called to perform the low-level EH reset.
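
The ordering of the four reset hooks can be sketched as follows. This is a
user-space model, not the libata implementation: ``model_eh_reset``, the
``want_hardreset`` flag, the fallback policy, and the ``demo_*`` hooks are
all hypothetical and exist only to show the call order.

```c
#include <stddef.h>

/* Hypothetical port that records which hooks ran, in order. */
struct model_port {
    char trace[8];
    int  n;
};

typedef int  (*model_prereset_fn)(struct model_port *ap);
typedef int  (*model_reset_fn)(struct model_port *ap);
typedef void (*model_postreset_fn)(struct model_port *ap);

static void note(struct model_port *ap, char c)
{
    if (ap->n < 7)
        ap->trace[ap->n++] = c;
}

/* Demo hooks: each one only logs a letter. */
int  demo_prereset(struct model_port *ap)  { note(ap, 'p'); return 0; }
int  demo_softreset(struct model_port *ap) { note(ap, 's'); return 0; }
int  demo_hardreset(struct model_port *ap) { note(ap, 'h'); return 0; }
void demo_postreset(struct model_port *ap) { note(ap, 'P'); }

/*
 * Run one EH reset: optional prereset, then exactly one of
 * softreset/hardreset, then optional postreset. Returns 0 on success,
 * -1 if no reset method is available or a step fails.
 */
int model_eh_reset(struct model_port *ap, int want_hardreset,
                   model_prereset_fn prereset, model_reset_fn softreset,
                   model_reset_fn hardreset, model_postreset_fn postreset)
{
    model_reset_fn reset;

    if (prereset && prereset(ap) != 0)
        return -1;

    /* Assumed policy: honor the preference, fall back to whichever exists. */
    if (want_hardreset)
        reset = hardreset ? hardreset : softreset;
    else
        reset = softreset ? softreset : hardreset;
    if (!reset || reset(ap) != 0)
        return -1;

    if (postreset)
        postreset(ap);
    return 0;
}
```

With all four demo hooks supplied and no hardreset preference, the trace
reads prereset, softreset, postreset.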
323
324::
325
326 void (*post_internal_cmd) (struct ata_queued_cmd *qc);
327
328
329Perform any hardware-specific actions necessary to finish processing
330after executing a probe-time or EH-time command via
331:c:func:`ata_exec_internal`.
332
333Hardware interrupt handling
334~~~~~~~~~~~~~~~~~~~~~~~~~~~
335
336::
337
338 irqreturn_t (*irq_handler)(int, void *, struct pt_regs *);
339 void (*irq_clear) (struct ata_port *);
340
341
342``->irq_handler`` is the interrupt handling routine registered with the
343system, by libata. ``->irq_clear`` is called during probe just before the
344interrupt handler is registered, to be sure hardware is quiet.
345
346The second argument, dev_instance, should be cast to a pointer to
347:c:type:`struct ata_host_set <ata_host_set>`.
348
349Most legacy IDE drivers use :c:func:`ata_sff_interrupt` for the irq_handler
350hook, which scans all ports in the host_set, determines which queued
351command was active (if any), and calls ata_sff_host_intr(ap,qc).
352
353Most legacy IDE drivers use :c:func:`ata_sff_irq_clear` for the
354:c:func:`irq_clear` hook, which simply clears the interrupt and error flags
355in the DMA status register.
356
357SATA phy read/write
358~~~~~~~~~~~~~~~~~~~
359
360::
361
362 int (*scr_read) (struct ata_port *ap, unsigned int sc_reg,
363 u32 *val);
364 int (*scr_write) (struct ata_port *ap, unsigned int sc_reg,
365 u32 val);
366
367
368Read and write standard SATA phy registers. Currently only used if
369``->phy_reset`` hook called the :c:func:`sata_phy_reset` helper function.
370sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE.
371
372Init and shutdown
373~~~~~~~~~~~~~~~~~
374
375::
376
377 int (*port_start) (struct ata_port *ap);
378 void (*port_stop) (struct ata_port *ap);
379 void (*host_stop) (struct ata_host_set *host_set);
380
381
382``->port_start()`` is called just after the data structures for each port
383are initialized. Typically this is used to alloc per-port DMA buffers /
384tables / rings, enable DMA engines, and similar tasks. Some drivers also
385use this entry point as a chance to allocate driver-private memory for
386``ap->private_data``.
387
388Many drivers use :c:func:`ata_port_start` as this hook or call it from their
389own :c:func:`port_start` hooks. :c:func:`ata_port_start` allocates space for
390a legacy IDE PRD table and returns.
391
``->port_stop()`` is called before ``->host_stop()``. Its sole function is to
release DMA/memory resources, now that they are no longer actively being
used. Many drivers also free driver-private data from the port at this time.
395
396``->host_stop()`` is called after all ``->port_stop()`` calls have completed.
397The hook must finalize hardware shutdown, release DMA and other
398resources, etc. This hook may be specified as NULL, in which case it is
399not called.
400
401Error handling
402==============
403
404This chapter describes how errors are handled under libata. Readers are
405advised to read SCSI EH (Documentation/scsi/scsi_eh.txt) and ATA
406exceptions doc first.
407
408Origins of commands
409-------------------
410
In libata, a command is represented with
:c:type:`struct ata_queued_cmd <ata_queued_cmd>` or qc.
qc's are preallocated during port initialization and reused for each
command execution. Currently only one qc is allocated per port, but the
yet-to-be-merged NCQ branch allocates one for each tag and maps each qc
to an NCQ tag 1-to-1.
417
418libata commands can originate from two sources - libata itself and SCSI
419midlayer. libata internal commands are used for initialization and error
420handling. All normal blk requests and commands for SCSI emulation are
421passed as SCSI commands through queuecommand callback of SCSI host
422template.
423
424How commands are issued
425-----------------------
426
427Internal commands
428 First, qc is allocated and initialized using :c:func:`ata_qc_new_init`.
429 Although :c:func:`ata_qc_new_init` doesn't implement any wait or retry
430 mechanism when qc is not available, internal commands are currently
431 issued only during initialization and error recovery, so no other
432 command is active and allocation is guaranteed to succeed.
433
 Once allocated, the qc's taskfile is initialized for the command to be
 executed. A qc currently has two mechanisms to notify completion. One
 is the ``qc->complete_fn()`` callback and the other is the completion
 ``qc->waiting``. The ``qc->complete_fn()`` callback is the asynchronous
 path used by normal SCSI translated commands, and ``qc->waiting`` is the
 synchronous (issuer sleeps in process context) path used by internal
 commands.
441
442 Once initialization is complete, host_set lock is acquired and the
443 qc is issued.
444
445SCSI commands
446 All libata drivers use :c:func:`ata_scsi_queuecmd` as
447 ``hostt->queuecommand`` callback. scmds can either be simulated or
448 translated. No qc is involved in processing a simulated scmd. The
449 result is computed right away and the scmd is completed.
450
451 For a translated scmd, :c:func:`ata_qc_new_init` is invoked to allocate a
452 qc and the scmd is translated into the qc. SCSI midlayer's
453 completion notification function pointer is stored into
454 ``qc->scsidone``.
455
456 ``qc->complete_fn()`` callback is used for completion notification. ATA
457 commands use :c:func:`ata_scsi_qc_complete` while ATAPI commands use
458 :c:func:`atapi_qc_complete`. Both functions end up calling ``qc->scsidone``
459 to notify upper layer when the qc is finished. After translation is
460 completed, the qc is issued with :c:func:`ata_qc_issue`.
461
 Note that the SCSI midlayer invokes ``hostt->queuecommand`` while holding
 the host_set lock, so all of the above occurs while holding the host_set lock.
464
465How commands are processed
466--------------------------
467
468Depending on which protocol and which controller are used, commands are
469processed differently. For the purpose of discussion, a controller which
470uses taskfile interface and all standard callbacks is assumed.
471
472Currently 6 ATA command protocols are used. They can be sorted into the
473following four categories according to how they are processed.
474
475ATA NO DATA or DMA
476 ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. These
477 types of commands don't require any software intervention once
478 issued. Device will raise interrupt on completion.
479
480ATA PIO
481 ATA_PROT_PIO is in this category. libata currently implements PIO
482 with polling. ATA_NIEN bit is set to turn off interrupt and
483 pio_task on ata_wq performs polling and IO.
484
485ATAPI NODATA or DMA
486 ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this
487 category. packet_task is used to poll BSY bit after issuing PACKET
488 command. Once BSY is turned off by the device, packet_task
489 transfers CDB and hands off processing to interrupt handler.
490
491ATAPI PIO
492 ATA_PROT_ATAPI is in this category. ATA_NIEN bit is set and, as
493 in ATAPI NODATA or DMA, packet_task submits cdb. However, after
494 submitting cdb, further processing (data transfer) is handed off to
495 pio_task.
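
The mapping from the six protocols to the four categories above can be
written down as a small table-like function. The enum below reuses the
protocol names from the text but is redeclared locally with arbitrary
values; the ``model_*`` names are hypothetical.

```c
/* Protocol names follow the text; the numeric values are arbitrary here. */
enum model_prot {
    ATA_PROT_NODATA,
    ATA_PROT_DMA,
    ATA_PROT_PIO,
    ATA_PROT_ATAPI_NODATA,
    ATA_PROT_ATAPI_DMA,
    ATA_PROT_ATAPI,
};

enum model_category {
    CAT_ATA_NODATA_OR_DMA,    /* interrupt-driven, no software intervention */
    CAT_ATA_PIO,              /* polled by pio_task */
    CAT_ATAPI_NODATA_OR_DMA,  /* packet_task sends the CDB, then interrupt */
    CAT_ATAPI_PIO,            /* packet_task sends the CDB, then pio_task */
};

enum model_category model_classify(enum model_prot prot)
{
    switch (prot) {
    case ATA_PROT_NODATA:
    case ATA_PROT_DMA:
        return CAT_ATA_NODATA_OR_DMA;
    case ATA_PROT_PIO:
        return CAT_ATA_PIO;
    case ATA_PROT_ATAPI_NODATA:
    case ATA_PROT_ATAPI_DMA:
        return CAT_ATAPI_NODATA_OR_DMA;
    case ATA_PROT_ATAPI:
        return CAT_ATAPI_PIO;
    }
    return CAT_ATA_NODATA_OR_DMA;   /* not reached for valid input */
}

/* Nonzero if the category runs with ATA_NIEN set (device interrupt off). */
int model_uses_polling(enum model_category cat)
{
    return cat == CAT_ATA_PIO || cat == CAT_ATAPI_PIO;
}
```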
496
497How commands are completed
498--------------------------
499
500Once issued, all qc's are either completed with :c:func:`ata_qc_complete` or
501time out. For commands which are handled by interrupts,
502:c:func:`ata_host_intr` invokes :c:func:`ata_qc_complete`, and, for PIO tasks,
503pio_task invokes :c:func:`ata_qc_complete`. In error cases, packet_task may
504also complete commands.
505
:c:func:`ata_qc_complete` does the following.

1. DMA memory is unmapped.

2. ATA_QCFLAG_ACTIVE is cleared from ``qc->flags``.

3. The ``qc->complete_fn()`` callback is invoked. If the return value of
   the callback is not zero, completion is short-circuited and
   :c:func:`ata_qc_complete` returns.

4. :c:func:`__ata_qc_complete` is called, which does the following:

   1. ``qc->flags`` is cleared to zero.

   2. ``ap->active_tag`` and ``qc->tag`` are poisoned.

   3. ``qc->waiting`` is cleared & completed (in that order).

   4. qc is deallocated by clearing the appropriate bit in ``ap->qactive``.
525
526So, it basically notifies upper layer and deallocates qc. One exception
527is short-circuit path in #3 which is used by :c:func:`atapi_qc_complete`.
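
The completion steps, including the short-circuit path, can be modelled in
user space as below. All types, field names, the poison value, and the
``demo_*`` callbacks are hypothetical; only the order of operations is
taken from the list above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QCFLAG_ACTIVE  (1u << 0)
#define MODEL_TAG_POISON     0xfafbfcfdu   /* arbitrary marker value */

struct model_qc;
typedef int (*model_complete_fn)(struct model_qc *qc);

struct model_port {
    unsigned int qactive;      /* one bit per allocated qc */
    unsigned int active_tag;
};

struct model_qc {
    struct model_port *ap;
    unsigned int tag;
    unsigned int flags;
    bool dma_mapped;
    bool waiting_done;         /* stands in for the qc->waiting completion */
    model_complete_fn complete_fn;
};

/* Demo callbacks: 0 lets completion proceed, nonzero short-circuits it. */
int demo_complete_ok(struct model_qc *qc)    { (void)qc; return 0; }
int demo_complete_defer(struct model_qc *qc) { (void)qc; return 1; }

/* Step 4: poison the tags, wake the waiter, free the qc. */
void model_qc_release(struct model_qc *qc)
{
    unsigned int tag = qc->tag;

    qc->flags = 0;
    qc->ap->active_tag = MODEL_TAG_POISON;
    qc->tag = MODEL_TAG_POISON;
    qc->waiting_done = true;
    qc->ap->qactive &= ~(1u << tag);
}

void model_qc_complete(struct model_qc *qc)
{
    qc->dma_mapped = false;                 /* 1. unmap DMA memory */
    qc->flags &= ~MODEL_QCFLAG_ACTIVE;      /* 2. clear the ACTIVE flag */
    if (qc->complete_fn && qc->complete_fn(qc) != 0)
        return;                             /* 3. short circuit: qc stays allocated */
    model_qc_release(qc);                   /* 4. release the qc */
}
```

In the short-circuit case the qc keeps its bit in ``qactive``, which is what
allows a later stage to find and finish it.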
528
529For all non-ATAPI commands, whether it fails or not, almost the same
530code path is taken and very little error handling takes place. A qc is
531completed with success status if it succeeded, with failed status
532otherwise.
533
534However, failed ATAPI commands require more handling as REQUEST SENSE is
535needed to acquire sense data. If an ATAPI command fails,
536:c:func:`ata_qc_complete` is invoked with error status, which in turn invokes
537:c:func:`atapi_qc_complete` via ``qc->complete_fn()`` callback.
538
This makes :c:func:`atapi_qc_complete` set ``scmd->result`` to
SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As the
sense data is empty but ``scmd->result`` is CHECK CONDITION, the SCSI
midlayer will invoke EH for the scmd, and returning 1 makes
:c:func:`ata_qc_complete` return without deallocating the qc. This leads
us to :c:func:`ata_scsi_error` with a partially completed qc.
545
546:c:func:`ata_scsi_error`
547------------------------
548
549:c:func:`ata_scsi_error` is the current ``transportt->eh_strategy_handler()``
550for libata. As discussed above, this will be entered in two cases -
551timeout and ATAPI error completion. This function calls low level libata
552driver's :c:func:`eng_timeout` callback, the standard callback for which is
553:c:func:`ata_eng_timeout`. It checks if a qc is active and calls
554:c:func:`ata_qc_timeout` on the qc if so. Actual error handling occurs in
555:c:func:`ata_qc_timeout`.
556
If EH is invoked for a timeout, :c:func:`ata_qc_timeout` stops BMDMA and
completes the qc. Note that as we're currently in EH, we cannot call
scsi_done. As described in the SCSI EH doc, a recovered scmd should be
either retried with :c:func:`scsi_queue_insert` or finished with
:c:func:`scsi_finish_command`. Here, we override ``qc->scsidone`` with
:c:func:`scsi_finish_command` and call :c:func:`ata_qc_complete`.
563
If EH is invoked due to a failed ATAPI qc, the qc here is completed but
not deallocated. The purpose of this half-completion is to use the qc as
a placeholder to make EH code reach this place. This is a bit hackish,
but it works.
568
569Once control reaches here, the qc is deallocated by invoking
570:c:func:`__ata_qc_complete` explicitly. Then, internal qc for REQUEST SENSE
571is issued. Once sense data is acquired, scmd is finished by directly
572invoking :c:func:`scsi_finish_command` on the scmd. Note that as we already
573have completed and deallocated the qc which was associated with the
574scmd, we don't need to/cannot call :c:func:`ata_qc_complete` again.
575
576Problems with the current EH
577----------------------------
578
579- Error representation is too crude. Currently any and all error
580 conditions are represented with ATA STATUS and ERROR registers.
581 Errors which aren't ATA device errors are treated as ATA device
582 errors by setting ATA_ERR bit. Better error descriptor which can
583 properly represent ATA and other errors/exceptions is needed.
584
585- When handling timeouts, no action is taken to make device forget
586 about the timed out command and ready for new commands.
587
588- EH handling via :c:func:`ata_scsi_error` is not properly protected from
589 usual command processing. On EH entrance, the device is not in
590 quiescent state. Timed out commands may succeed or fail any time.
591 pio_task and atapi_task may still be running.
592
593- Too weak error recovery. Devices / controllers causing HSM mismatch
594 errors and other errors quite often require reset to return to known
595 state. Also, advanced error handling is necessary to support features
596 like NCQ and hotplug.
597
598- ATA errors are directly handled in the interrupt handler and PIO
599 errors in pio_task. This is problematic for advanced error handling
600 for the following reasons.
601
602 First, advanced error handling often requires context and internal qc
603 execution.
604
605 Second, even a simple failure (say, CRC error) needs information
606 gathering and could trigger complex error handling (say, resetting &
607 reconfiguring). Having multiple code paths to gather information,
608 enter EH and trigger actions makes life painful.
609
 Third, scattered EH code makes implementing low level drivers
 difficult. Low level drivers override libata callbacks. If EH is
 scattered over several places, each affected callback should perform
 its part of error handling. This can be error prone and painful.
614
615libata Library
616==============
617
618.. kernel-doc:: drivers/ata/libata-core.c
619 :export:
620
621libata Core Internals
622=====================
623
624.. kernel-doc:: drivers/ata/libata-core.c
625 :internal:
626
627.. kernel-doc:: drivers/ata/libata-eh.c
628
629libata SCSI translation/emulation
630=================================
631
632.. kernel-doc:: drivers/ata/libata-scsi.c
633 :export:
634
635.. kernel-doc:: drivers/ata/libata-scsi.c
636 :internal:
637
638ATA errors and exceptions
639=========================
640
This chapter tries to identify what error/exception conditions exist for
ATA/ATAPI devices and describe how they should be handled in an
implementation-neutral way.
644
645The term 'error' is used to describe conditions where either an explicit
646error condition is reported from device or a command has timed out.
647
648The term 'exception' is either used to describe exceptional conditions
649which are not errors (say, power or hotplug events), or to describe both
650errors and non-error exceptional conditions. Where explicit distinction
651between error and exception is necessary, the term 'non-error exception'
652is used.
653
654Exception categories
655--------------------
656
657Exceptions are described primarily with respect to legacy taskfile + bus
658master IDE interface. If a controller provides other better mechanism
659for error reporting, mapping those into categories described below
660shouldn't be difficult.
661
662In the following sections, two recovery actions - reset and
663reconfiguring transport - are mentioned. These are described further in
664`EH recovery actions <#exrec>`__.
665
666HSM violation
667~~~~~~~~~~~~~
668
This error is indicated when the STATUS value doesn't match the HSM
requirement while issuing or executing any ATA/ATAPI command.
671
672- ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying to
673 issue a command.
674
675- !BSY && !DRQ during PIO data transfer.
676
677- DRQ on command completion.
678
679- !BSY && ERR after CDB transfer starts but before the last byte of CDB
680 is transferred. ATA/ATAPI standard states that "The device shall not
681 terminate the PACKET command with an error before the last byte of
682 the command packet has been written" in the error outputs description
683 of PACKET command and the state diagram doesn't include such
684 transitions.
685
686In these cases, HSM is violated and not much information regarding the
687error can be acquired from STATUS or ERROR register. IOW, this error can
688be anything - driver bug, faulty device, controller and/or cable.
689
As HSM is violated, reset is necessary to restore known state.
Reconfiguring transport for lower speed might be helpful too, as
transmission errors sometimes cause this kind of error.
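
Three of the STATUS checks listed in this section can be expressed as
predicates. This is a sketch: the ``model_*`` function names are
hypothetical, the bit values are the Status register bits as assumed from
the ATA/ATAPI standard, and the PIO check assumes that more data is still
expected when it is called.

```c
#include <stdint.h>
#include <stdbool.h>

/* ATA Status register bits (assumed values from the ATA/ATAPI standard). */
#define ST_BSY   0x80
#define ST_DRDY  0x40
#define ST_DRQ   0x08

/* A command may only be issued when !BSY && DRDY && !DRQ. */
bool model_ok_to_issue(uint8_t status)
{
    return (status & (ST_BSY | ST_DRDY | ST_DRQ)) == ST_DRDY;
}

/* Mid-transfer, !BSY && !DRQ means the device stopped asking for data. */
bool model_pio_hsm_violation(uint8_t status)
{
    return (status & (ST_BSY | ST_DRQ)) == 0;
}

/* DRQ still set once the command has completed (BSY clear) is a violation. */
bool model_completion_hsm_violation(uint8_t status)
{
    return !(status & ST_BSY) && (status & ST_DRQ);
}
```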
693
694ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)
695~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
696
These are errors detected and reported by ATA/ATAPI devices indicating
device problems. For this type of error, STATUS and ERROR register
values are valid and describe the error condition. Note that some ATA bus
errors are detected by ATA/ATAPI devices and reported using the same
mechanism as device errors. Those cases are described later in this
section.

For ATA commands, this type of error is indicated by !BSY && ERR
during command execution and on completion.
706
707For ATAPI commands,
708
709- !BSY && ERR && ABRT right after issuing PACKET indicates that PACKET
710 command is not supported and falls in this category.
711
712- !BSY && ERR(==CHK) && !ABRT after the last byte of CDB is transferred
713 indicates CHECK CONDITION and doesn't fall in this category.
714
715- !BSY && ERR(==CHK) && ABRT after the last byte of CDB is transferred
716 \*probably\* indicates CHECK CONDITION and doesn't fall in this
717 category.
718
719Of errors detected as above, the following are not ATA/ATAPI device
720errors but ATA bus errors and should be handled according to
721`ATA bus error <#excatATAbusErr>`__.
722
723CRC error during data transfer
724 This is indicated by ICRC bit in the ERROR register and means that
725 corruption occurred during data transfer. Up to ATA/ATAPI-7, the
726 standard specifies that this bit is only applicable to UDMA
727 transfers but ATA/ATAPI-8 draft revision 1f says that the bit may be
728 applicable to multiword DMA and PIO.
729
730ABRT error during data transfer or on completion
731 Up to ATA/ATAPI-7, the standard specifies that ABRT could be set on
732 ICRC errors and on cases where a device is not able to complete a
733 command. Combined with the fact that MWDMA and PIO transfer errors
734 aren't allowed to use ICRC bit up to ATA/ATAPI-7, it seems to imply
735 that ABRT bit alone could indicate transfer errors.
736
737 However, ATA/ATAPI-8 draft revision 1f removes the part that ICRC
738 errors can turn on ABRT. So, this is kind of gray area. Some
739 heuristics are needed here.
740
741ATA/ATAPI device errors can be further categorized as follows.
742
743Media errors
 This is indicated by the UNC bit in the ERROR register. ATA devices
 report a UNC error only after a certain number of retries fail to
 recover the data, so there's not much else to do other than
 notifying the upper layer.
748
749 READ and WRITE commands report CHS or LBA of the first failed sector
750 but ATA/ATAPI standard specifies that the amount of transferred data
751 on error completion is indeterminate, so we cannot assume that
752 sectors preceding the failed sector have been transferred and thus
753 cannot complete those sectors successfully as SCSI does.
754
755Media changed / media change requested error
756 <<TODO: fill here>>
757
758Address error
759 This is indicated by IDNF bit in the ERROR register. Report to upper
760 layer.
761
762Other errors
763 This can be invalid command or parameter indicated by ABRT ERROR bit
764 or some other error condition. Note that ABRT bit can indicate a lot
765 of things including ICRC and Address errors. Heuristics needed.
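
A first-pass classification of the ERROR register along the lines above
might look like the following sketch. The enum, the function name, and the
priority order between bits are assumptions; the bit values are the ERROR
register bits as assumed from the ATA/ATAPI standard. ABRT on its own is
deliberately reported as ambiguous, since the text calls for heuristics.

```c
#include <stdint.h>

/* ATA Error register bits (assumed values from the ATA/ATAPI standard). */
#define ERR_ICRC  0x80
#define ERR_UNC   0x40
#define ERR_IDNF  0x10
#define ERR_ABRT  0x04

enum model_err_class {
    ERRC_BUS,       /* ICRC: handle as an ATA bus error */
    ERRC_MEDIA,     /* UNC: report to the upper layer */
    ERRC_ADDRESS,   /* IDNF: report to the upper layer */
    ERRC_ABORT,     /* ABRT alone: ambiguous, needs heuristics */
    ERRC_OTHER,
};

/* Assumed priority: the most specific bit wins over a bare ABRT. */
enum model_err_class model_classify_error(uint8_t error)
{
    if (error & ERR_ICRC)
        return ERRC_BUS;
    if (error & ERR_UNC)
        return ERRC_MEDIA;
    if (error & ERR_IDNF)
        return ERRC_ADDRESS;
    if (error & ERR_ABRT)
        return ERRC_ABORT;
    return ERRC_OTHER;
}
```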
766
767Depending on commands, not all STATUS/ERROR bits are applicable. These
768non-applicable bits are marked with "na" in the output descriptions but
769up to ATA/ATAPI-7 no definition of "na" can be found. However,
770ATA/ATAPI-8 draft revision 1f describes "N/A" as follows.
771
772 3.2.3.3a N/A
773 A keyword the indicates a field has no defined value in this
774 standard and should not be checked by the host or device. N/A
775 fields should be cleared to zero.
776
777So, it seems reasonable to assume that "na" bits are cleared to zero by
778devices and thus need no explicit masking.
779
780ATAPI device CHECK CONDITION
781~~~~~~~~~~~~~~~~~~~~~~~~~~~~
782
783ATAPI device CHECK CONDITION error is indicated by set CHK bit (ERR bit)
784in the STATUS register after the last byte of CDB is transferred for a
785PACKET command. For this kind of errors, sense data should be acquired
786to gather information regarding the errors. REQUEST SENSE packet command
787should be used to acquire sense data.
788
789Once sense data is acquired, this type of errors can be handled
790similarly to other SCSI errors. Note that sense data may indicate ATA
791bus error (e.g. Sense Key 04h HARDWARE ERROR && ASC/ASCQ 47h/00h SCSI
792PARITY ERROR). In such cases, the error should be considered as an ATA
793bus error and handled according to `ATA bus error <#excatATAbusErr>`__.
794
795ATA device error (NCQ)
796~~~~~~~~~~~~~~~~~~~~~~
797
798NCQ command error is indicated by cleared BSY and set ERR bit during NCQ
799command phase (one or more NCQ commands outstanding). Although STATUS
800and ERROR registers will contain valid values describing the error, READ
801LOG EXT is required to clear the error condition, determine which
802command has failed and acquire more information.
803
804READ LOG EXT Log Page 10h reports which tag has failed and taskfile
805register values describing the error. With this information the failed
806command can be handled as a normal ATA command error as in
807`ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION) <#excatDevErr>`__
808and all other in-flight commands must be retried. Note that this retry
809should not be counted - it's likely that commands retried this way would
810have completed normally if it were not for the failed command.
811
812Note that ATA bus errors can be reported as ATA device NCQ errors. This
813should be handled as described in `ATA bus error <#excatATAbusErr>`__.
814
815If READ LOG EXT Log Page 10h fails or reports NQ, we're thoroughly
816screwed. This condition should be treated according to
817`HSM violation <#excatHSMviolation>`__.
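
Decoding the log page could be sketched as below. The byte layout used here
(NQ in bit 7 of byte 0, the tag in its low five bits, Status and Error in
bytes 2 and 3, and a checksum that makes all 512 bytes sum to zero) is an
assumption that should be checked against the standard; the structure and
function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define LOG10H_SIZE 512

struct model_ncq_error {
    unsigned int tag;   /* failed NCQ tag, 0-31 */
    uint8_t status;     /* Status register value for the failed command */
    uint8_t error;      /* Error register value for the failed command */
};

/*
 * Returns 0 and fills *out on success. Returns -1 if the checksum is bad
 * or the NQ bit says the error was not caused by a queued command; the
 * caller would then treat the situation as an HSM violation.
 */
int model_parse_log_10h(const uint8_t *buf, struct model_ncq_error *out)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < LOG10H_SIZE; i++)
        sum += buf[i];          /* wraps modulo 256 */
    if (sum != 0)
        return -1;

    if (buf[0] & 0x80)          /* NQ: no queued command failed */
        return -1;

    out->tag = buf[0] & 0x1f;
    out->status = buf[2];
    out->error = buf[3];
    return 0;
}
```

Once the tag is known, the failed command is handled like a non-NCQ device
error and every other in-flight command is retried without counting the
retry against it.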
818
819ATA bus error
820~~~~~~~~~~~~~
821
ATA bus error means that data corruption occurred during transmission
over the ATA bus (SATA or PATA). This type of error can be indicated by:
824
825- ICRC or ABRT error as described in
826 `ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION) <#excatDevErr>`__.
827
828- Controller-specific error completion with error information
829 indicating transmission error.
830
831- On some controllers, command timeout. In this case, there may be a
832 mechanism to determine that the timeout is due to transmission error.
833
834- Unknown/random errors, timeouts and all sorts of weirdities.
835
836As described above, transmission errors can cause wide variety of
837symptoms ranging from device ICRC error to random device lockup, and,
838for many cases, there is no way to tell if an error condition is due to
839transmission error or not; therefore, it's necessary to employ some kind
840of heuristic when dealing with errors and timeouts. For example,
841encountering repetitive ABRT errors for known supported command is
842likely to indicate ATA bus error.
843
844Once it's determined that ATA bus errors have possibly occurred,
845lowering ATA bus transmission speed is one of actions which may
846alleviate the problem. See `Reconfigure transport <#exrecReconf>`__ for
847more information.
848
849PCI bus error
850~~~~~~~~~~~~~
851
852Data corruption or other failures during transmission over PCI (or other
853system bus). For standard BMDMA, this is indicated by Error bit in the
854BMDMA Status register. This type of errors must be logged as it
855indicates something is very wrong with the system. Resetting host
856controller is recommended.
857
858Late completion
859~~~~~~~~~~~~~~~
860
This occurs when a timeout fires and the timeout handler finds out that
the timed out command has completed successfully or with error. This is
usually caused by lost interrupts. This type of error must be logged.
Resetting the host controller is recommended.
865
866Unknown error (timeout)
867~~~~~~~~~~~~~~~~~~~~~~~
868
869This is when timeout occurs and the command is still processing or the
870host and device are in unknown state. When this occurs, HSM could be in
871any valid or invalid state. To bring the device to known state and make
872it forget about the timed out command, resetting is necessary. The timed
873out command may be retried.
874
875Timeouts can also be caused by transmission errors. Refer to
876`ATA bus error <#excatATAbusErr>`__ for more details.
877
878Hotplug and power management exceptions
879~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
880
881<<TODO: fill here>>
882
883EH recovery actions
884-------------------
885
886This section discusses several important recovery actions.
887
888Clearing error condition
889~~~~~~~~~~~~~~~~~~~~~~~~
890
Many controllers require their error registers to be cleared by the error
handler. Different controllers may have different requirements.
893
894For SATA, it's strongly recommended to clear at least SError register
895during error handling.
896
897Reset
898~~~~~
899
900During EH, resetting is necessary in the following cases.
901
902- HSM is in unknown or invalid state
903
904- HBA is in unknown or invalid state
905
906- EH needs to make HBA/device forget about in-flight commands
907
908- HBA/device behaves weirdly
909
910Resetting during EH might be a good idea regardless of error condition
911to improve EH robustness. Whether to reset both or either one of HBA and
912device depends on situation but the following scheme is recommended.
913
914- When it's known that HBA is in ready state but ATA/ATAPI device is in
915 unknown state, reset only device.
916
917- If HBA is in unknown state, reset both HBA and device.
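
The recommended scheme reduces to a small decision function. The names are
hypothetical, and returning ``RESET_NONE`` when both sides are known good is
an assumption of this sketch; as noted above, an implementation may prefer
to reset anyway for robustness.

```c
#include <stdbool.h>

enum model_reset_scope {
    RESET_NONE,
    RESET_DEVICE,
    RESET_HBA_AND_DEVICE,
};

/* Pick what to reset from what is known about the HBA and the device. */
enum model_reset_scope model_choose_reset(bool hba_known_good,
                                          bool device_known_good)
{
    if (!hba_known_good)
        return RESET_HBA_AND_DEVICE;   /* HBA state unknown: reset both */
    if (!device_known_good)
        return RESET_DEVICE;           /* HBA ready, device unknown */
    return RESET_NONE;
}
```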
918
HBA resetting is implementation specific. For a controller complying with
taskfile/BMDMA PCI IDE, stopping the active DMA transaction may be
sufficient iff BMDMA state is the only HBA context. But even controllers
that mostly comply with taskfile/BMDMA PCI IDE may have
implementation-specific requirements and mechanisms to reset themselves.
This must be addressed by specific drivers.
925
926OTOH, ATA/ATAPI standard describes in detail ways to reset ATA/ATAPI
927devices.
928
929PATA hardware reset
930 This is hardware initiated device reset signalled with asserted PATA
931 RESET- signal. There is no standard way to initiate hardware reset
932 from software although some hardware provides registers that allow
933 driver to directly tweak the RESET- signal.
934
935Software reset
936 This is achieved by turning CONTROL SRST bit on for at least 5us.
937 Both PATA and SATA support it but, in case of SATA, this may require
938 controller-specific support as the second Register FIS to clear SRST
939 should be transmitted while BSY bit is still set. Note that on PATA,
940 this resets both master and slave devices on a channel.
941
EXECUTE DEVICE DIAGNOSTIC command
  Although the ATA/ATAPI standard doesn't describe it exactly, EDD implies
  some level of resetting, possibly similar to that of a software reset.
  The host-side EDD protocol can be handled with normal command processing
  and most SATA controllers should be able to handle EDDs just like
  other commands. As with software reset, EDD affects both devices on a
  PATA bus.

  Although EDD does reset devices, it doesn't suit error handling, as
  EDD cannot be issued while BSY is set and it's unclear how it will
  act when the device is in an unknown/weird state.
953
954ATAPI DEVICE RESET command
955 This is very similar to software reset except that reset can be
956 restricted to the selected device without affecting the other device
957 sharing the cable.
958
959SATA phy reset
960 This is the preferred way of resetting a SATA device. In effect,
961 it's identical to PATA hardware reset. Note that this can be done
962 with the standard SCR Control register. As such, it's usually easier
963 to implement than software reset.
964
965One more thing to consider when resetting devices is that resetting
966clears certain configuration parameters and they need to be set to their
967previous or newly adjusted values after reset.
968
The affected parameters are:
970
971- CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used)
972
973- Parameters set with SET FEATURES including transfer mode setting
974
975- Block count set with SET MULTIPLE MODE
976
977- Other parameters (SET MAX, MEDIA LOCK...)
978
979ATA/ATAPI standard specifies that some parameters must be maintained
980across hardware or software reset, but doesn't strictly specify all of
981them. Always reconfiguring needed parameters after reset is required for
982robustness. Note that this also applies when resuming from deep sleep
983(power-off).
984
Also, the ATA/ATAPI standard requires that IDENTIFY DEVICE / IDENTIFY PACKET
DEVICE be issued after any configuration parameter is updated or after a
hardware reset, and that the result be used for further operation. The OS
driver is required to implement a revalidation mechanism to support this.
989
990Reconfigure transport
991~~~~~~~~~~~~~~~~~~~~~
992
For both PATA and SATA, a lot of corners are cut for cheap connectors,
cables or controllers and it's quite common to see a high transmission
error rate. This can be mitigated by lowering the transmission speed.
996
997The following is a possible scheme Jeff Garzik suggested.
998
999 If more than $N (3?) transmission errors happen in 15 minutes,
1000
1001 - if SATA, decrease SATA PHY speed. if speed cannot be decreased,
1002
1003 - decrease UDMA xfer speed. if at UDMA0, switch to PIO4,
1004
1005 - decrease PIO xfer speed. if at PIO3, complain, but continue
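
The fallback ladder in the scheme above can be sketched as a single step
function. This is only an illustration under stated assumptions: the types
and names below (``struct link_speed``, ``speed_down()``, the ``ACT_*``
values) are hypothetical and are not libata definitions, and counting errors
within the 15 minute window is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; these are not libata definitions. */
enum xfer_class { XFER_CLASS_UDMA, XFER_CLASS_PIO };

struct link_speed {
	bool is_sata;
	int sata_gen;		/* SATA PHY generation, 1 is the slowest */
	enum xfer_class xfer;	/* current transfer class */
	int mode;		/* UDMA or PIO mode number within the class */
};

enum speed_action {
	ACT_LOWER_PHY,
	ACT_LOWER_UDMA,
	ACT_SWITCH_TO_PIO4,
	ACT_LOWER_PIO,
	ACT_COMPLAIN,
};

/* Apply one step of the fallback ladder and report what was done. */
static enum speed_action speed_down(struct link_speed *s)
{
	/* SATA: lower the PHY speed first, as long as that is possible. */
	if (s->is_sata && s->sata_gen > 1) {
		s->sata_gen--;
		return ACT_LOWER_PHY;
	}

	/* Next, step down through the UDMA modes; below UDMA0 go to PIO4. */
	if (s->xfer == XFER_CLASS_UDMA) {
		if (s->mode > 0) {
			s->mode--;
			return ACT_LOWER_UDMA;
		}
		s->xfer = XFER_CLASS_PIO;
		s->mode = 4;
		return ACT_SWITCH_TO_PIO4;
	}

	/* Finally, lower the PIO mode; PIO3 is the floor. */
	if (s->mode > 3) {
		s->mode--;
		return ACT_LOWER_PIO;
	}

	/* Already at PIO3: nothing left to lower, complain but continue. */
	return ACT_COMPLAIN;
}
```

A caller would invoke ``speed_down()`` each time the error threshold is
exceeded and then reprogram the link or transfer mode according to the
returned action.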
1006
1007ata_piix Internals
1008===================
1009
1010.. kernel-doc:: drivers/ata/ata_piix.c
1011 :internal:
1012
1013sata_sil Internals
1014===================
1015
1016.. kernel-doc:: drivers/ata/sata_sil.c
1017 :internal:
1018
1019Thanks
1020======
1021
1022The bulk of the ATA knowledge comes thanks to long conversations with
1023Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA and
1024SCSI specifications.
1025
1026Thanks to Alan Cox for pointing out similarities between SATA and SCSI,
1027and in general for motivation to hack on libata.
1028
1029libata's device detection method, ata_pio_devchk, and in general all
1030the early probing was based on extensive study of Hale Landis's
1031probe/reset code in his ATADRVR driver (www.ata-atapi.com).
diff --git a/Documentation/driver-api/mtdnand.rst b/Documentation/driver-api/mtdnand.rst
new file mode 100644
index 000000000000..e9afa586d15e
--- /dev/null
+++ b/Documentation/driver-api/mtdnand.rst
@@ -0,0 +1,1007 @@
1=====================================
2MTD NAND Driver Programming Interface
3=====================================
4
5:Author: Thomas Gleixner
6
7Introduction
8============
9
10The generic NAND driver supports almost all NAND and AG-AND based chips
11and connects them to the Memory Technology Devices (MTD) subsystem of
12the Linux Kernel.
13
14This documentation is provided for developers who want to implement
15board drivers or filesystem drivers suitable for NAND devices.
16
17Known Bugs And Assumptions
18==========================
19
20None.
21
22Documentation hints
23===================
24
25The function and structure docs are autogenerated. Each function and
26struct member has a short description which is marked with an [XXX]
27identifier. The following chapters explain the meaning of those
28identifiers.
29
30Function identifiers [XXX]
31--------------------------
32
The functions are marked with [XXX] identifiers in the short comment.
The identifiers explain the usage and scope of the functions. The following
identifiers are used:
36
37- [MTD Interface]
38
  These functions provide the interface to the MTD kernel API. They are
  not replaceable and provide functionality which is completely hardware
  independent.
42
43- [NAND Interface]
44
45 These functions are exported and provide the interface to the NAND
46 kernel API.
47
48- [GENERIC]
49
  Generic functions are not replaceable and provide functionality which
  is completely hardware independent.
52
53- [DEFAULT]
54
55 Default functions provide hardware related functionality which is
56 suitable for most of the implementations. These functions can be
57 replaced by the board driver if necessary. Those functions are called
58 via pointers in the NAND chip description structure. The board driver
59 can set the functions which should be replaced by board dependent
60 functions before calling nand_scan(). If the function pointer is
61 NULL on entry to nand_scan() then the pointer is set to the default
62 function which is suitable for the detected chip type.
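
The "NULL means use the default" convention described for [DEFAULT]
functions can be modelled in a few lines. This is a userspace sketch, not
the code in nand_base.c: ``struct chip_model`` is a deliberately reduced
stand-in for the chip description structure, and all function names here
are hypothetical.

```c
#include <stddef.h>

/* Reduced stand-in for the chip description; not the kernel's nand_chip. */
struct chip_model {
	unsigned char (*read_byte)(void);
	void (*select_chip)(int chip);
};

/* Defaults that the generic code would provide (hypothetical bodies). */
static unsigned char default_read_byte(void) { return 0xff; }
static void default_select_chip(int chip) { (void)chip; }

/* A board specific replacement supplied by the board driver. */
static void board_select_chip(int chip) { (void)chip; }

/* Fill in defaults only for the pointers the board driver left NULL. */
static void set_defaults(struct chip_model *chip)
{
	if (!chip->read_byte)
		chip->read_byte = default_read_byte;
	if (!chip->select_chip)
		chip->select_chip = default_select_chip;
}
```

A board driver that sets only ``select_chip`` before the scan keeps its own
function, while ``read_byte`` ends up pointing at the default.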
63
64Struct member identifiers [XXX]
65-------------------------------
66
The struct members are marked with [XXX] identifiers in the comment. The
identifiers explain the usage and scope of the members. The following
identifiers are used:
70
71- [INTERN]
72
73 These members are for NAND driver internal use only and must not be
74 modified. Most of these values are calculated from the chip geometry
75 information which is evaluated during nand_scan().
76
77- [REPLACEABLE]
78
79 Replaceable members hold hardware related functions which can be
80 provided by the board driver. The board driver can set the functions
81 which should be replaced by board dependent functions before calling
82 nand_scan(). If the function pointer is NULL on entry to
83 nand_scan() then the pointer is set to the default function which is
84 suitable for the detected chip type.
85
86- [BOARDSPECIFIC]
87
88 Board specific members hold hardware related information which must
89 be provided by the board driver. The board driver must set the
90 function pointers and datafields before calling nand_scan().
91
92- [OPTIONAL]
93
94 Optional members can hold information relevant for the board driver.
95 The generic NAND driver code does not use this information.
96
97Basic board driver
98==================
99
100For most boards it will be sufficient to provide just the basic
101functions and fill out some really board dependent members in the nand
102chip description structure.
103
104Basic defines
105-------------
106
At a minimum, you have to provide a nand_chip structure and storage for
the ioremap'ed chip address. You can allocate the nand_chip structure
using kmalloc or you can allocate it statically. The NAND chip structure
embeds an mtd structure which will be registered with the MTD subsystem.
You can extract a pointer to the mtd structure from a nand_chip pointer
using the nand_to_mtd() helper.
113
114Kmalloc based example
115
116::
117
    static struct mtd_info *board_mtd;
    static void __iomem *baseaddr;
120
121
122Static example
123
124::
125
    static struct nand_chip board_chip;
    static void __iomem *baseaddr;
128
129
130Partition defines
131-----------------
132
133If you want to divide your device into partitions, then define a
134partitioning scheme suitable to your board.
135
136::
137
    #define NUM_PARTITIONS 2
    static struct mtd_partition partition_info[] = {
        { .name = "Flash partition 1",
          .offset = 0,
          .size = 8 * 1024 * 1024 },
        { .name = "Flash partition 2",
          .offset = MTDPART_OFS_NEXT,
          .size = MTDPART_SIZ_FULL },
    };
147
148
149Hardware control function
150-------------------------
151
152The hardware control function provides access to the control pins of the
153NAND chip(s). The access can be done by GPIO pins or by address lines.
154If you use address lines, make sure that the timing requirements are
155met.
156
157*GPIO based example*
158
159::
160
    static void board_hwcontrol(struct mtd_info *mtd, int cmd)
    {
        switch (cmd) {
        case NAND_CTL_SETCLE: /* Set CLE pin high */ break;
        case NAND_CTL_CLRCLE: /* Set CLE pin low */ break;
        case NAND_CTL_SETALE: /* Set ALE pin high */ break;
        case NAND_CTL_CLRALE: /* Set ALE pin low */ break;
        case NAND_CTL_SETNCE: /* Set nCE pin low */ break;
        case NAND_CTL_CLRNCE: /* Set nCE pin high */ break;
        }
    }
172
173
174*Address lines based example.* It's assumed that the nCE pin is driven
175by a chip select decoder.
176
177::
178
    static void board_hwcontrol(struct mtd_info *mtd, int cmd)
    {
        struct nand_chip *this = mtd_to_nand(mtd);

        switch (cmd) {
        case NAND_CTL_SETCLE: this->IO_ADDR_W |= CLE_ADRR_BIT; break;
        case NAND_CTL_CLRCLE: this->IO_ADDR_W &= ~CLE_ADRR_BIT; break;
        case NAND_CTL_SETALE: this->IO_ADDR_W |= ALE_ADRR_BIT; break;
        case NAND_CTL_CLRALE: this->IO_ADDR_W &= ~ALE_ADRR_BIT; break;
        }
    }
189
190
191Device ready function
192---------------------
193
If the hardware interface has the ready/busy pin of the NAND chip
connected to a GPIO or other accessible I/O pin, this function is used
to read back the state of the pin. The function has no arguments and
should return 0 if the device is busy (R/B pin is low) and 1 if the
device is ready (R/B pin is high). If the hardware interface does not
give access to the ready/busy pin, then the function must not be defined
and the function pointer this->dev_ready is set to NULL.
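
A device ready function following the description above might look like the
sketch below. The GPIO input register is modelled as a plain variable so
that the example stays self-contained; ``board_gpio_in`` and
``BOARD_NAND_RB_BIT`` are hypothetical names, and a real board driver would
read its own memory-mapped register instead.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO input register (hypothetical). */
static volatile uint32_t board_gpio_in;

/* Hypothetical bit position of the R/B pin in that register. */
#define BOARD_NAND_RB_BIT (1u << 3)

/* Return 1 if the chip is ready (R/B high), 0 if it is busy (R/B low). */
static int board_dev_ready(void)
{
	return (board_gpio_in & BOARD_NAND_RB_BIT) ? 1 : 0;
}
```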
201
202Init function
203-------------
204
The init function allocates memory and sets up all the board specific
parameters and function pointers. When everything is set up, nand_scan()
is called. This function tries to detect and identify the chip. If a
chip is found, all the internal data fields are initialized accordingly.
The structure(s) have to be zeroed out first and then filled with the
necessary information about the device.
211
212::
213
    static int __init board_init(void)
    {
        struct nand_chip *this;
        int err = 0;

        /* Allocate memory for MTD device structure and private data */
        this = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
        if (!this) {
            printk("Unable to allocate NAND MTD device structure.\n");
            err = -ENOMEM;
            goto out;
        }

        board_mtd = nand_to_mtd(this);

        /* map physical address */
        baseaddr = ioremap(CHIP_PHYSICAL_ADDRESS, 1024);
        if (!baseaddr) {
            printk("Ioremap to access NAND chip failed\n");
            err = -EIO;
            goto out_mtd;
        }

        /* Set address of NAND IO lines */
        this->IO_ADDR_R = baseaddr;
        this->IO_ADDR_W = baseaddr;
        /* Reference hardware control function */
        this->hwcontrol = board_hwcontrol;
        /* Set command delay time, see datasheet for correct value */
        this->chip_delay = CHIP_DEPENDEND_COMMAND_DELAY;
        /* Assign the device ready function, if available */
        this->dev_ready = board_dev_ready;
        this->eccmode = NAND_ECC_SOFT;

        /* Scan to find existence of the device */
        if (nand_scan(board_mtd, 1)) {
            err = -ENXIO;
            goto out_ior;
        }

        add_mtd_partitions(board_mtd, partition_info, NUM_PARTITIONS);
        goto out;

    out_ior:
        iounmap(baseaddr);
    out_mtd:
        kfree(this);
    out:
        return err;
    }
    module_init(board_init);
265
266
267Exit function
268-------------
269
270The exit function is only necessary if the driver is compiled as a
271module. It releases all resources which are held by the chip driver and
272unregisters the partitions in the MTD layer.
273
274::
275
    #ifdef MODULE
    static void __exit board_cleanup(void)
    {
        /* Release resources, unregister device */
        nand_release(board_mtd);

        /* unmap physical address */
        iounmap(baseaddr);

        /* Free the MTD device structure */
        kfree(mtd_to_nand(board_mtd));
    }
    module_exit(board_cleanup);
    #endif
290
291
292Advanced board driver functions
293===============================
294
295This chapter describes the advanced functionality of the NAND driver.
296For a list of functions which can be overridden by the board driver see
297the documentation of the nand_chip structure.
298
299Multiple chip control
300---------------------
301
The nand driver can control chip arrays. Therefore the board driver must
provide its own select_chip function. This function must (de)select the
requested chip. The function pointer in the nand_chip structure must be
set before calling nand_scan(). The maxchip parameter of nand_scan()
defines the maximum number of chips to scan for. Make sure that the
select_chip function can handle the requested number of chips.
308
309The nand driver concatenates the chips to one virtual chip and provides
310this virtual chip to the MTD layer.
311
312*Note: The driver can only handle linear chip arrays of equally sized
313chips. There is no support for parallel arrays which extend the
314buswidth.*
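
Because the array is linear and all chips have the same size, an offset
into the virtual chip maps to a chip number and an offset within that chip
by simple division. The following is a minimal sketch of that arithmetic
only; ``struct chip_addr`` and ``virt_to_chip()`` are hypothetical names
and do not reflect how the driver itself performs the lookup.

```c
#include <stdint.h>

/* Position inside a linear array of equally sized chips (hypothetical). */
struct chip_addr {
	unsigned int chip;	/* index of the chip to select */
	uint64_t offset;	/* byte offset within that chip */
};

/* Split an offset into the virtual chip into chip number and chip offset. */
static struct chip_addr virt_to_chip(uint64_t virt_offset, uint64_t chip_size)
{
	struct chip_addr addr;

	addr.chip = (unsigned int)(virt_offset / chip_size);
	addr.offset = virt_offset % chip_size;
	return addr;
}
```

The resulting chip index is the kind of value a select_chip function would
receive as its ``chip`` argument.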
315
316*GPIO based example*
317
318::
319
    static void board_select_chip(struct mtd_info *mtd, int chip)
    {
        /* Deselect all chips, set all nCE pins high */
        GPIO(BOARD_NAND_NCE) |= 0xff;
        if (chip >= 0)
            GPIO(BOARD_NAND_NCE) &= ~(1 << chip);
    }
327
328
*Address lines based example.* It's assumed that the nCE pins are
connected to an address decoder.
331
332::
333
    static void board_select_chip(struct mtd_info *mtd, int chip)
    {
        struct nand_chip *this = mtd_to_nand(mtd);

        /* Deselect all chips */
        this->IO_ADDR_R &= ~BOARD_NAND_ADDR_MASK;
        this->IO_ADDR_W &= ~BOARD_NAND_ADDR_MASK;
        switch (chip) {
        case 0:
            this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIP0;
            this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIP0;
            break;
        ....
        case n:
            this->IO_ADDR_R |= BOARD_NAND_ADDR_CHIPn;
            this->IO_ADDR_W |= BOARD_NAND_ADDR_CHIPn;
            break;
        }
    }
353
354
355Hardware ECC support
356--------------------
357
358Functions and constants
359~~~~~~~~~~~~~~~~~~~~~~~
360
The nand driver supports four different types of hardware ECC.

- NAND_ECC_HW3_256

  Hardware ECC generator providing 3 bytes ECC per 256 bytes.

- NAND_ECC_HW3_512

  Hardware ECC generator providing 3 bytes ECC per 512 bytes.

- NAND_ECC_HW6_512

  Hardware ECC generator providing 6 bytes ECC per 512 bytes.

- NAND_ECC_HW8_512

  Hardware ECC generator providing 8 bytes ECC per 512 bytes.

If your hardware generator has different functionality, add it at the
appropriate place in nand_base.c.
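
The ECC mode names encode the number of ECC bytes per data step, so the
number of ECC bytes needed for a whole page follows directly from the page
size. The helper below is a small arithmetic sketch;
``ecc_bytes_per_page()`` is a hypothetical name and not a driver function.

```c
/* Number of ECC bytes needed for one page, given the ECC step geometry. */
static unsigned int ecc_bytes_per_page(unsigned int page_size,
				       unsigned int step_size,
				       unsigned int bytes_per_step)
{
	/* One ECC group is generated for every step_size bytes of data. */
	return (page_size / step_size) * bytes_per_step;
}
```

For example, 3 bytes of ECC per 256 bytes of data gives 6 ECC bytes for a
512 byte page and 24 ECC bytes for a 2048 byte page, which matches the
default spare area layouts listed later in this document.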
381
The board driver must provide the following functions:

- enable_hwecc

  This function is called before reading / writing to the chip. Reset
  or initialize the hardware generator in this function. The function
  is called with an argument which lets you distinguish between read and
  write operations.

- calculate_ecc

  This function is called after read / write from / to the chip.
  Transfer the ECC from the hardware to the buffer. If the option
  NAND_HWECC_SYNDROME is set then the function is only called on
  write. See below.

- correct_data

  In case of an ECC error this function is called for error detection
  and correction. Return 1 or 2 if the error can be corrected. If the
  error is not correctable return -1. If your hardware generator matches
  the default algorithm of the nand_ecc software generator then use the
  correction function provided by nand_ecc instead of implementing
  duplicated code.
406
407Hardware ECC with syndrome calculation
408~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
409
410Many hardware ECC implementations provide Reed-Solomon codes and
411calculate an error syndrome on read. The syndrome must be converted to a
412standard Reed-Solomon syndrome before calling the error correction code
413in the generic Reed-Solomon library.
414
The ECC bytes must be placed immediately after the data bytes in order
to make the syndrome generator work. This is contrary to the usual
layout used by software ECC. The separation of data and out of band area
is no longer possible. The nand driver code handles this layout and the
remaining free bytes in the oob area are managed by the autoplacement
code. Provide a matching oob-layout in this case. See rtc_from4.c and
diskonchip.c for implementation reference. In those cases we must also
use bad block tables on FLASH, because the ECC layout interferes
with the bad block marker positions. See bad block table support for
details.
425
426Bad block table support
427-----------------------
428
429Most NAND chips mark the bad blocks at a defined position in the spare
430area. Those blocks must not be erased under any circumstances as the bad
431block information would be lost. It is possible to check the bad block
432mark each time when the blocks are accessed by reading the spare area of
433the first page in the block. This is time consuming so a bad block table
434is used.
435
436The nand driver supports various types of bad block tables.
437
438- Per device
439
440 The bad block table contains all bad block information of the device
441 which can consist of multiple chips.
442
443- Per chip
444
445 A bad block table is used per chip and contains the bad block
446 information for this particular chip.
447
448- Fixed offset
449
450 The bad block table is located at a fixed offset in the chip
451 (device). This applies to various DiskOnChip devices.
452
- Automatically placed

  The bad block table is automatically placed and detected either at
  the end or at the beginning of a chip (device).
457
458- Mirrored tables
459
460 The bad block table is mirrored on the chip (device) to allow updates
461 of the bad block table without data loss.
462
463nand_scan() calls the function nand_default_bbt().
464nand_default_bbt() selects appropriate default bad block table
465descriptors depending on the chip information which was retrieved by
466nand_scan().
467
The standard policy is to scan the device for bad blocks and build a
RAM based bad block table, which allows faster access than always
checking the bad block information on the flash chip itself.
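
The idea of a RAM based table can be sketched with two bits per block. The
code below is an illustrative userspace model, not the driver's
implementation: the ``bbt_*`` names and the bit encoding are assumptions,
and the marker bytes are passed in as an array instead of being read from
the spare area of each block's first page. A block counts as bad when any
bit of its marker byte is zero, as described in the spare area layouts in
this document.

```c
#include <stdint.h>
#include <string.h>

#define BBT_BLOCK_GOOD 0x0	/* hypothetical encoding: 2 bits per block */
#define BBT_BLOCK_BAD  0x3

/* Mark one block as bad; four blocks share each table byte. */
static void bbt_mark_bad(uint8_t *table, unsigned int block)
{
	table[block >> 2] |= (uint8_t)(BBT_BLOCK_BAD << ((block & 3) * 2));
}

/* Look up a block in the RAM table without touching the flash. */
static int bbt_is_bad(const uint8_t *table, unsigned int block)
{
	return ((table[block >> 2] >> ((block & 3) * 2)) & 0x3) != BBT_BLOCK_GOOD;
}

/*
 * Build the table from the marker bytes of all blocks. markers[i] stands
 * for the bad block marker byte of block i; table needs (nblocks + 3) / 4
 * bytes.
 */
static void bbt_build(uint8_t *table, const uint8_t *markers,
		      unsigned int nblocks)
{
	unsigned int i;

	memset(table, 0, (nblocks + 3) / 4);
	for (i = 0; i < nblocks; i++) {
		/* Any zero bit in the marker byte means the block is bad. */
		if (markers[i] != 0xff)
			bbt_mark_bad(table, i);
	}
}
```

After the one-time scan, every later bad block check is a table lookup
rather than a read from the flash chip.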
471
472Flash based tables
473~~~~~~~~~~~~~~~~~~
474
It may be desired or necessary to keep a bad block table in FLASH. For
AG-AND chips this is mandatory, as they have no factory marked bad
blocks. They have factory marked good blocks. The marker pattern is
erased when the block is erased to be reused. So in case of a power loss
before the pattern is written back to the chip, this block would be lost
and added to the bad blocks. Therefore we scan the chip(s) for good blocks
when we detect them the first time and store this information in a bad
block table before erasing any of the blocks.
483
484The blocks in which the tables are stored are protected against
485accidental access by marking them bad in the memory bad block table. The
486bad block table management functions are allowed to circumvent this
487protection.
488
The simplest way to activate the FLASH based bad block table support is
to set the option NAND_BBT_USE_FLASH in the bbt_option field of the
nand chip structure before calling nand_scan(). For AG-AND chips this
is done by default. This activates the default FLASH based bad block
table functionality of the NAND driver. The default bad block table
options are:
495
496- Store bad block table per chip
497
498- Use 2 bits per block
499
500- Automatic placement at the end of the chip
501
502- Use mirrored tables with version numbers
503
504- Reserve 4 blocks at the end of the chip
505
506User defined tables
507~~~~~~~~~~~~~~~~~~~
508
509User defined tables are created by filling out a nand_bbt_descr
510structure and storing the pointer in the nand_chip structure member
511bbt_td before calling nand_scan(). If a mirror table is necessary a
512second structure must be created and a pointer to this structure must be
513stored in bbt_md inside the nand_chip structure. If the bbt_md member
514is set to NULL then only the main table is used and no scan for the
515mirrored table is performed.
516
517The most important field in the nand_bbt_descr structure is the
518options field. The options define most of the table properties. Use the
519predefined constants from nand.h to define the options.
520
521- Number of bits per block
522
523 The supported number of bits is 1, 2, 4, 8.
524
525- Table per chip
526
527 Setting the constant NAND_BBT_PERCHIP selects that a bad block
528 table is managed for each chip in a chip array. If this option is not
529 set then a per device bad block table is used.
530
531- Table location is absolute
532
  Use the option constant NAND_BBT_ABSPAGE and define the absolute
  page number where the bad block table starts in the field pages. If
  you have selected bad block tables per chip and you have a multi chip
  array then the start page must be given for each chip in the chip
  array. Note: no scan for a table ident pattern is performed, so
  the fields pattern, veroffs, offs, len can be left uninitialized.
539
540- Table location is automatically detected
541
542 The table can either be located in the first or the last good blocks
543 of the chip (device). Set NAND_BBT_LASTBLOCK to place the bad block
544 table at the end of the chip (device). The bad block tables are
545 marked and identified by a pattern which is stored in the spare area
546 of the first page in the block which holds the bad block table. Store
547 a pointer to the pattern in the pattern field. Further the length of
548 the pattern has to be stored in len and the offset in the spare area
549 must be given in the offs member of the nand_bbt_descr structure.
550 For mirrored bad block tables different patterns are mandatory.
551
552- Table creation
553
554 Set the option NAND_BBT_CREATE to enable the table creation if no
555 table can be found during the scan. Usually this is done only once if
556 a new chip is found.
557
558- Table write support
559
560 Set the option NAND_BBT_WRITE to enable the table write support.
561 This allows the update of the bad block table(s) in case a block has
562 to be marked bad due to wear. The MTD interface function
563 block_markbad is calling the update function of the bad block table.
564 If the write support is enabled then the table is updated on FLASH.
565
566 Note: Write support should only be enabled for mirrored tables with
567 version control.
568
569- Table version control
570
571 Set the option NAND_BBT_VERSION to enable the table version
572 control. It's highly recommended to enable this for mirrored tables
573 with write support. It makes sure that the risk of losing the bad
574 block table information is reduced to the loss of the information
575 about the one worn out block which should be marked bad. The version
576 is stored in 4 consecutive bytes in the spare area of the device. The
577 position of the version number is defined by the member veroffs in
578 the bad block table descriptor.
579
580- Save block contents on write
581
  In case the block which holds the bad block table contains
  other useful information, set the option NAND_BBT_SAVECONTENT. When
  the bad block table is written, the whole block is read, the bad
  block table is updated, the block is erased, and everything is
  written back. If this option is not set, only the bad block table is
  written and everything else in the block is ignored and erased.
588
589- Number of reserved blocks
590
591 For automatic placement some blocks must be reserved for bad block
592 table storage. The number of reserved blocks is defined in the
593 maxblocks member of the bad block table description structure.
594 Reserving 4 blocks for mirrored tables should be a reasonable number.
595 This also limits the number of blocks which are scanned for the bad
596 block table ident pattern.
597
598Spare area (auto)placement
599--------------------------
600
The nand driver implements different possibilities for placement of
filesystem data in the spare area:
603
604- Placement defined by fs driver
605
606- Automatic placement
607
The default placement function is automatic placement. The nand driver
has built in default placement schemes for the various chip types. If,
due to hardware ECC functionality, the default placement does not fit,
then the board driver can provide its own placement scheme.

File system drivers can provide their own placement scheme which is used
instead of the default placement scheme.

Placement schemes are defined by a nand_oobinfo structure:
617
618::
619
    struct nand_oobinfo {
        int useecc;
        int eccbytes;
        int eccpos[24];
        int oobfree[8][2];
    };
626
627
628- useecc
629
  The useecc member controls the ecc and placement function. The header
  file include/mtd/mtd-abi.h contains constants to select ecc and
  placement. MTD_NANDECC_OFF switches off the ecc completely. This is
  not recommended and is available for testing and diagnosis only.
  MTD_NANDECC_PLACE selects caller defined placement,
  MTD_NANDECC_AUTOPLACE selects automatic placement.
636
637- eccbytes
638
639 The eccbytes member defines the number of ecc bytes per page.
640
641- eccpos
642
643 The eccpos array holds the byte offsets in the spare area where the
644 ecc codes are placed.
645
646- oobfree
647
  The oobfree array defines the areas in the spare area which can be
  used for automatic placement. The information is given in the format
  {offset, size}. offset defines the start of the usable area, size the
  length in bytes. More than one area can be defined. The list is
  terminated by a {0, 0} entry.
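
To illustrate how the oobfree list is meant to be read, the sketch below
walks the {offset, size} pairs up to the {0, 0} terminator and adds up the
bytes available for automatic placement. ``oob_free_bytes()`` and the
example variable ``oob_512`` are hypothetical; the layout values are taken
from the 512 byte pagesize table in this document, and useecc is left at
zero here, whereas a real user would set one of the MTD_NANDECC constants.

```c
struct nand_oobinfo {
	int useecc;
	int eccbytes;
	int eccpos[24];
	int oobfree[8][2];
};

/* Example layout for a 512 byte page: ECC at 0-3, 6, 7; free at 8-15. */
static const struct nand_oobinfo oob_512 = {
	.eccbytes = 6,
	.eccpos = { 0, 1, 2, 3, 6, 7 },
	.oobfree = { { 8, 8 } },	/* remaining entries are {0, 0} */
};

/* Sum the sizes of all {offset, size} entries up to the terminator. */
static int oob_free_bytes(const struct nand_oobinfo *oob)
{
	int total = 0;
	int i;

	for (i = 0; i < 8; i++) {
		if (oob->oobfree[i][0] == 0 && oob->oobfree[i][1] == 0)
			break;
		total += oob->oobfree[i][1];
	}
	return total;
}
```

For ``oob_512`` this yields the 8 autoplace bytes at offsets 0x08 to 0x0F.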
653
654Placement defined by fs driver
655~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
656
657The calling function provides a pointer to a nand_oobinfo structure
658which defines the ecc placement. For writes the caller must provide a
659spare area buffer along with the data buffer. The spare area buffer size
660is (number of pages) \* (size of spare area). For reads the buffer size
661is (number of pages) \* ((size of spare area) + (number of ecc steps per
662page) \* sizeof (int)). The driver stores the result of the ecc check
663for each tuple in the spare buffer. The storage sequence is::
664
665 <spare data page 0><ecc result 0>...<ecc result n>
666
667 ...
668
669 <spare data page n><ecc result 0>...<ecc result n>
670
671This is a legacy mode used by YAFFS1.
672
673If the spare area buffer is NULL then only the ECC placement is done
674according to the given scheme in the nand_oobinfo structure.
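
The two buffer size formulas given above for caller defined placement can
be written out as follows. This is only a sketch of the arithmetic; the
function names are hypothetical.

```c
#include <stddef.h>

/* Spare buffer size for writes: one spare area per page. */
static size_t spare_buf_size_write(size_t pages, size_t spare_size)
{
	return pages * spare_size;
}

/*
 * Spare buffer size for reads: each page additionally carries one int
 * per ECC step, which receives the result of the ECC check.
 */
static size_t spare_buf_size_read(size_t pages, size_t spare_size,
				  size_t ecc_steps_per_page)
{
	return pages * (spare_size + ecc_steps_per_page * sizeof(int));
}
```

For instance, reading 4 pages with a 16 byte spare area and 2 ECC steps per
page needs ``4 * (16 + 2 * sizeof(int))`` bytes, while writing the same
pages needs 64 bytes.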
675
676Automatic placement
677~~~~~~~~~~~~~~~~~~~
678
679Automatic placement uses the built in defaults to place the ecc bytes in
680the spare area. If filesystem data have to be stored / read into the
681spare area then the calling function must provide a buffer. The buffer
682size per page is determined by the oobfree array in the nand_oobinfo
683structure.
684
685If the spare area buffer is NULL then only the ECC placement is done
686according to the default builtin scheme.
687
688Spare area autoplacement default schemes
689----------------------------------------
690
691256 byte pagesize
692~~~~~~~~~~~~~~~~~
693
======== ================== ===================================================
Offset   Content            Comment
======== ================== ===================================================
0x00     ECC byte 0         Error correction code byte 0
0x01     ECC byte 1         Error correction code byte 1
0x02     ECC byte 2         Error correction code byte 2
0x03     Autoplace 0
0x04     Autoplace 1
0x05     Bad block marker   If any bit in this byte is zero, then this
                            block is bad. This applies only to the first
                            page in a block. In the remaining pages this
                            byte is reserved
0x06     Autoplace 2
0x07     Autoplace 3
======== ================== ===================================================
709
710512 byte pagesize
711~~~~~~~~~~~~~~~~~
712
713
============= ================== ==============================================
Offset        Content            Comment
============= ================== ==============================================
0x00          ECC byte 0         Error correction code byte 0 of the lower
                                 256 Bytes of data in this page
0x01          ECC byte 1         Error correction code byte 1 of the lower
                                 256 Bytes of data in this page
0x02          ECC byte 2         Error correction code byte 2 of the lower
                                 256 Bytes of data in this page
0x03          ECC byte 3         Error correction code byte 0 of the upper
                                 256 Bytes of data in this page
0x04          reserved           reserved
0x05          Bad block marker   If any bit in this byte is zero, then this
                                 block is bad. This applies only to the first
                                 page in a block. In the remaining pages this
                                 byte is reserved
0x06          ECC byte 4         Error correction code byte 1 of the upper
                                 256 Bytes of data in this page
0x07          ECC byte 5         Error correction code byte 2 of the upper
                                 256 Bytes of data in this page
0x08 - 0x0F   Autoplace 0 - 7
============= ================== ==============================================
736
7372048 byte pagesize
738~~~~~~~~~~~~~~~~~~
739
=========== ================== ================================================
Offset      Content            Comment
=========== ================== ================================================
0x00        Bad block marker   If any bit in this byte is zero, then this block
                               is bad. This applies only to the first page in a
                               block. In the remaining pages this byte is
                               reserved
0x01        Reserved           Reserved
0x02-0x27   Autoplace 0 - 37
0x28        ECC byte 0         Error correction code byte 0 of the first
                               256 Bytes of data in this page
0x29        ECC byte 1         Error correction code byte 1 of the first
                               256 Bytes of data in this page
0x2A        ECC byte 2         Error correction code byte 2 of the first
                               256 Bytes of data in this page
0x2B        ECC byte 3         Error correction code byte 0 of the second
                               256 Bytes of data in this page
0x2C        ECC byte 4         Error correction code byte 1 of the second
                               256 Bytes of data in this page
0x2D        ECC byte 5         Error correction code byte 2 of the second
                               256 Bytes of data in this page
0x2E        ECC byte 6         Error correction code byte 0 of the third
                               256 Bytes of data in this page
0x2F        ECC byte 7         Error correction code byte 1 of the third
                               256 Bytes of data in this page
0x30        ECC byte 8         Error correction code byte 2 of the third
                               256 Bytes of data in this page
0x31        ECC byte 9         Error correction code byte 0 of the fourth
                               256 Bytes of data in this page
0x32        ECC byte 10        Error correction code byte 1 of the fourth
                               256 Bytes of data in this page
0x33        ECC byte 11        Error correction code byte 2 of the fourth
                               256 Bytes of data in this page
0x34        ECC byte 12        Error correction code byte 0 of the fifth
                               256 Bytes of data in this page
0x35        ECC byte 13        Error correction code byte 1 of the fifth
                               256 Bytes of data in this page
0x36        ECC byte 14        Error correction code byte 2 of the fifth
                               256 Bytes of data in this page
0x37        ECC byte 15        Error correction code byte 0 of the sixth
                               256 Bytes of data in this page
0x38        ECC byte 16        Error correction code byte 1 of the sixth
                               256 Bytes of data in this page
0x39        ECC byte 17        Error correction code byte 2 of the sixth
                               256 Bytes of data in this page
0x3A        ECC byte 18        Error correction code byte 0 of the seventh
                               256 Bytes of data in this page
0x3B        ECC byte 19        Error correction code byte 1 of the seventh
                               256 Bytes of data in this page
0x3C        ECC byte 20        Error correction code byte 2 of the seventh
                               256 Bytes of data in this page
0x3D        ECC byte 21        Error correction code byte 0 of the eighth
                               256 Bytes of data in this page
0x3E        ECC byte 22        Error correction code byte 1 of the eighth
                               256 Bytes of data in this page
0x3F        ECC byte 23        Error correction code byte 2 of the eighth
                               256 Bytes of data in this page
=========== ================== ================================================
798
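The ECC rows of the 2048 byte table follow a regular pattern: three ECC bytes per 256 bytes of data, packed back to back starting at offset 0x28. A minimal sketch of that arithmetic follows. The helper name ``oob_ecc_offset`` and the macro names are hypothetical and are not part of the NAND driver; only the numeric values are taken from the table.

```c
#include <assert.h>

/* Values read off the 2048 byte pagesize table (names are illustrative). */
#define OOB_ECC_START       0x28 /* offset of "ECC byte 0" */
#define ECC_BYTES_PER_STEP  3    /* 3 ECC bytes per 256 bytes of data */
#define ECC_STEPS           8    /* 2048 / 256 data chunks per page */

/*
 * Hypothetical helper: return the spare area offset of ECC byte `byte`
 * (0..2) for the 256 byte data chunk `step` (0 = first, 7 = eighth).
 * Returns -1 if either index is out of range.
 */
static int oob_ecc_offset(unsigned int step, unsigned int byte)
{
    if (step >= ECC_STEPS || byte >= ECC_BYTES_PER_STEP)
        return -1;
    return (int)(OOB_ECC_START + step * ECC_BYTES_PER_STEP + byte);
}
```

For example, ``oob_ecc_offset(3, 1)`` gives 0x32, which matches the row "ECC byte 10: byte 1 of the fourth 256 Bytes".
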
799Filesystem support
800==================
801
802The NAND driver provides all necessary functions for a filesystem via
803the MTD interface.
804
805Filesystems must be aware of the NAND peculiarities and restrictions.
806One major restriction of NAND Flash is that you cannot write to a page
807as often as you want. The number of consecutive writes to a page, before
808erasing it again, is restricted to 1-3 writes, depending on the
809manufacturer's specifications. The same applies to the spare area.
810
811Therefore NAND aware filesystems must either write in page size chunks
812or hold a writebuffer to collect smaller writes until they sum up to
813pagesize. Available NAND aware filesystems: JFFS2, YAFFS.
814
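The write buffer approach can be sketched as follows. This is an illustration only, not code from JFFS2 or YAFFS: the structure, the function names and the 2048 byte page size are assumptions, and ``flush_page`` merely counts pages where a real filesystem would program the page through the MTD interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_BYTES 2048 /* assumed NAND page size for this sketch */

/* Hypothetical write buffer that collects small writes into one page. */
struct write_buffer {
    unsigned char data[PAGE_SIZE_BYTES];
    size_t fill;          /* bytes collected so far */
    unsigned int flushed; /* number of full pages handed to the flash */
};

/* Stand-in for the real page program operation. */
static void flush_page(struct write_buffer *wb)
{
    wb->flushed++;
    wb->fill = 0;
}

/* Append len bytes; program a page only when a full page is collected. */
static void buffered_write(struct write_buffer *wb, const void *src, size_t len)
{
    const unsigned char *p = src;

    while (len > 0) {
        size_t room = PAGE_SIZE_BYTES - wb->fill;
        size_t n = len < room ? len : room;

        memcpy(wb->data + wb->fill, p, n);
        wb->fill += n;
        p += n;
        len -= n;

        if (wb->fill == PAGE_SIZE_BYTES)
            flush_page(wb);
    }
}
```

With this scheme each page is written exactly once per erase cycle, regardless of how small the individual writes are.
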
815The spare area usage to store filesystem data is controlled by the spare
816area placement functionality which is described in one of the earlier
817chapters.
818
819Tools
820=====
821
822The MTD project provides a couple of helpful tools to handle NAND Flash.
823
824- flasherase, flasheraseall: Erase and format FLASH partitions
825
826- nandwrite: write filesystem images to NAND FLASH
827
828- nanddump: dump the contents of NAND FLASH partitions
829
830These tools are aware of the NAND restrictions. Please use those tools
831instead of complaining about errors which are caused by non NAND aware
832access methods.
833
834Constants
835=========
836
837This chapter describes the constants which might be relevant for a
838driver developer.
839
840Chip option constants
841---------------------
842
843Constants for chip id table
844~~~~~~~~~~~~~~~~~~~~~~~~~~~
845
846These constants are defined in nand.h. They are OR-ed together to
847describe the chip functionality::
848
849 /* Buswidth is 16 bit */
850 #define NAND_BUSWIDTH_16 0x00000002
851 /* Device supports partial programming without padding */
852 #define NAND_NO_PADDING 0x00000004
853 /* Chip has cache program function */
854 #define NAND_CACHEPRG 0x00000008
855 /* Chip has copy back function */
856 #define NAND_COPYBACK 0x00000010
857 /* AND Chip which has 4 banks and a confusing page / block
858 * assignment. See Renesas datasheet for further information */
859 #define NAND_IS_AND 0x00000020
860 /* Chip has an array of 4 pages which can be read without
861 * additional ready/busy waits */
862 #define NAND_4PAGE_ARRAY 0x00000040
863
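As a small illustration of how these flags combine, the sketch below ORs two of them together and tests individual bits. The constant values are copied from the listing above; the variable ``example_chip_options`` and the helper ``chip_has_option`` are hypothetical and do not exist in nand.h.

```c
#include <assert.h>

/* Values copied from the chip id table constants listed above. */
#define NAND_BUSWIDTH_16 0x00000002
#define NAND_NO_PADDING  0x00000004
#define NAND_CACHEPRG    0x00000008
#define NAND_COPYBACK    0x00000010

/* Hypothetical chip: 16 bit bus with cache program support. */
static const unsigned int example_chip_options =
    NAND_BUSWIDTH_16 | NAND_CACHEPRG;

/* Hypothetical helper: non-zero if `flag` is set in `options`. */
static int chip_has_option(unsigned int options, unsigned int flag)
{
    return (options & flag) != 0;
}
```
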
864
865Constants for runtime options
866~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
867
868These constants are defined in nand.h. They are OR-ed together to
869describe the functionality::
870
871 /* The hw ecc generator provides a syndrome instead of an ecc value on read
872 * This can only work if we have the ecc bytes directly behind the
873 * data bytes. Applies for DOC and AG-AND Renesas HW Reed Solomon generators */
874 #define NAND_HWECC_SYNDROME 0x00020000
875
876
877ECC selection constants
878-----------------------
879
880Use these constants to select the ECC algorithm::
881
882 /* No ECC. Usage is not recommended ! */
883 #define NAND_ECC_NONE 0
884 /* Software ECC 3 byte ECC per 256 Byte data */
885 #define NAND_ECC_SOFT 1
886 /* Hardware ECC 3 byte ECC per 256 Byte data */
887 #define NAND_ECC_HW3_256 2
888 /* Hardware ECC 3 byte ECC per 512 Byte data */
889 #define NAND_ECC_HW3_512 3
890 /* Hardware ECC 6 byte ECC per 512 Byte data */
891 #define NAND_ECC_HW6_512 4
892 /* Hardware ECC 8 byte ECC per 512 Byte data */
893 #define NAND_ECC_HW8_512 6
894
895
896Hardware control related constants
897----------------------------------
898
899These constants describe the requested hardware access function when the
900boardspecific hardware control function is called::
901
902 /* Select the chip by setting nCE to low */
903 #define NAND_CTL_SETNCE 1
904 /* Deselect the chip by setting nCE to high */
905 #define NAND_CTL_CLRNCE 2
906 /* Select the command latch by setting CLE to high */
907 #define NAND_CTL_SETCLE 3
908 /* Deselect the command latch by setting CLE to low */
909 #define NAND_CTL_CLRCLE 4
910 /* Select the address latch by setting ALE to high */
911 #define NAND_CTL_SETALE 5
912 /* Deselect the address latch by setting ALE to low */
913 #define NAND_CTL_CLRALE 6
914 /* Set write protection by setting WP to high. Not used! */
915 #define NAND_CTL_SETWP 7
916 /* Clear write protection by setting WP to low. Not used! */
917 #define NAND_CTL_CLRWP 8
918
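A board specific hardware control function typically reduces to a switch over these constants. The sketch below models the three control lines as fields of a plain structure so that it can run anywhere. The ``board_pins`` structure and the ``board_hwcontrol`` signature are assumptions made for illustration; a real board driver would toggle GPIOs or controller registers instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the hardware control constants listed above. */
#define NAND_CTL_SETNCE 1
#define NAND_CTL_CLRNCE 2
#define NAND_CTL_SETCLE 3
#define NAND_CTL_CLRCLE 4
#define NAND_CTL_SETALE 5
#define NAND_CTL_CLRALE 6

/* Hypothetical model of the control lines; true means "high". */
struct board_pins {
    bool nce; /* chip enable, active low */
    bool cle; /* command latch enable */
    bool ale; /* address latch enable */
};

/* Hypothetical board function: drive the line requested by cmd. */
static void board_hwcontrol(struct board_pins *pins, int cmd)
{
    switch (cmd) {
    case NAND_CTL_SETNCE: pins->nce = false; break; /* select: nCE low */
    case NAND_CTL_CLRNCE: pins->nce = true;  break; /* deselect: nCE high */
    case NAND_CTL_SETCLE: pins->cle = true;  break;
    case NAND_CTL_CLRCLE: pins->cle = false; break;
    case NAND_CTL_SETALE: pins->ale = true;  break;
    case NAND_CTL_CLRALE: pins->ale = false; break;
    default: break; /* the WP requests are documented as not used */
    }
}
```
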
919
920Bad block table related constants
921---------------------------------
922
923These constants describe the options used for bad block table
924descriptors::
925
926 /* Options for the bad block table descriptors */
927
928 /* The number of bits used per block in the bbt on the device */
929 #define NAND_BBT_NRBITS_MSK 0x0000000F
930 #define NAND_BBT_1BIT 0x00000001
931 #define NAND_BBT_2BIT 0x00000002
932 #define NAND_BBT_4BIT 0x00000004
933 #define NAND_BBT_8BIT 0x00000008
934 /* The bad block table is in the last good block of the device */
935 #define NAND_BBT_LASTBLOCK 0x00000010
936 /* The bbt is at the given page, else we must scan for the bbt */
937 #define NAND_BBT_ABSPAGE 0x00000020
938 /* bbt is stored per chip on multichip devices */
939 #define NAND_BBT_PERCHIP 0x00000080
940 /* bbt has a version counter at offset veroffs */
941 #define NAND_BBT_VERSION 0x00000100
942 /* Create a bbt if none exists */
943 #define NAND_BBT_CREATE 0x00000200
944 /* Write bbt if necessary */
945 #define NAND_BBT_WRITE 0x00001000
946 /* Read and write back block contents when writing bbt */
947 #define NAND_BBT_SAVECONTENT 0x00002000
948
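The low four bits of the option word encode how many bits the table uses per block, so they can be extracted with ``NAND_BBT_NRBITS_MSK``. The sketch below shows this and the resulting table size. The constant values are copied from the listing above, while the two helper functions are hypothetical. It assumes that the masked value is used directly as the bit count, which is what the values 1, 2, 4 and 8 suggest.

```c
#include <assert.h>

/* Values copied from the bad block table constants listed above. */
#define NAND_BBT_NRBITS_MSK 0x0000000F
#define NAND_BBT_2BIT       0x00000002
#define NAND_BBT_LASTBLOCK  0x00000010
#define NAND_BBT_VERSION    0x00000100
#define NAND_BBT_CREATE     0x00000200
#define NAND_BBT_WRITE      0x00001000

/* Hypothetical helper: bits stored per block for these options. */
static unsigned int bbt_bits_per_block(unsigned int options)
{
    return options & NAND_BBT_NRBITS_MSK;
}

/* Hypothetical helper: bytes needed for a table covering nblocks. */
static unsigned int bbt_table_bytes(unsigned int options, unsigned int nblocks)
{
    return (nblocks * bbt_bits_per_block(options) + 7) / 8;
}
```

A descriptor with 2 bits per block therefore needs 256 bytes to cover a device with 1024 blocks.
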
949
950Structures
951==========
952
953This chapter contains the autogenerated documentation of the structures
954which are used in the NAND driver and might be relevant for a driver
955developer. Each struct member has a short description which is marked
956with an [XXX] identifier. See the chapter "Documentation hints" for an
957explanation.
958
959.. kernel-doc:: include/linux/mtd/nand.h
960 :internal:
961
962Public Functions Provided
963=========================
964
965This chapter contains the autogenerated documentation of the NAND kernel
966API functions which are exported. Each function has a short description
967which is marked with an [XXX] identifier. See the chapter "Documentation
968hints" for an explanation.
969
970.. kernel-doc:: drivers/mtd/nand/nand_base.c
971 :export:
972
973.. kernel-doc:: drivers/mtd/nand/nand_ecc.c
974 :export:
975
976Internal Functions Provided
977===========================
978
979This chapter contains the autogenerated documentation of the NAND driver
980internal functions. Each function has a short description which is
981marked with an [XXX] identifier. See the chapter "Documentation hints"
982for an explanation. The functions marked with [DEFAULT] might be
983relevant for a board driver developer.
984
985.. kernel-doc:: drivers/mtd/nand/nand_base.c
986 :internal:
987
988.. kernel-doc:: drivers/mtd/nand/nand_bbt.c
989 :internal:
990
991Credits
992=======
993
994The following people have contributed to the NAND driver:
995
9961. Steven J. Hill\ sjhill@realitydiluted.com
997
9982. David Woodhouse\ dwmw2@infradead.org
999
10003. Thomas Gleixner\ tglx@linutronix.de
1001
1002A lot of users have provided bugfixes, improvements and helping hands
1003for testing. Thanks a lot.
1004
1005The following people have contributed to this document:
1006
10071. Thomas Gleixner\ tglx@linutronix.de
diff --git a/Documentation/driver-api/rapidio.rst b/Documentation/driver-api/rapidio.rst
new file mode 100644
index 000000000000..71ff658ab78e
--- /dev/null
+++ b/Documentation/driver-api/rapidio.rst
@@ -0,0 +1,107 @@
1=======================
2RapidIO Subsystem Guide
3=======================
4
5:Author: Matt Porter
6
7Introduction
8============
9
10RapidIO is a high speed switched fabric interconnect with features aimed
11at the embedded market. RapidIO provides support for memory-mapped I/O
12as well as message-based transactions over the switched fabric network.
13RapidIO has a standardized discovery mechanism not unlike the PCI bus
14standard that allows simple detection of devices in a network.
15
16This documentation is provided for developers intending to support
17RapidIO on new architectures, write new drivers, or to understand the
18subsystem internals.
19
20Known Bugs and Limitations
21==========================
22
23Bugs
24----
25
26None. ;)
27
28Limitations
29-----------
30
311. Access/management of RapidIO memory regions is not supported
32
332. Multiple host enumeration is not supported
34
35RapidIO driver interface
36========================
37
38Drivers are provided a set of calls in order to interface with the
39subsystem to gather info on devices, request/map memory region
40resources, and manage mailboxes/doorbells.
41
42Functions
43---------
44
45.. kernel-doc:: include/linux/rio_drv.h
46 :internal:
47
48.. kernel-doc:: drivers/rapidio/rio-driver.c
49 :export:
50
51.. kernel-doc:: drivers/rapidio/rio.c
52 :export:
53
54Internals
55=========
56
57This chapter contains the autogenerated documentation of the RapidIO
58subsystem.
59
60Structures
61----------
62
63.. kernel-doc:: include/linux/rio.h
64 :internal:
65
66Enumeration and Discovery
67-------------------------
68
69.. kernel-doc:: drivers/rapidio/rio-scan.c
70 :internal:
71
72Driver functionality
73--------------------
74
75.. kernel-doc:: drivers/rapidio/rio.c
76 :internal:
77
78.. kernel-doc:: drivers/rapidio/rio-access.c
79 :internal:
80
81Device model support
82--------------------
83
84.. kernel-doc:: drivers/rapidio/rio-driver.c
85 :internal:
86
87PPC32 support
88-------------
89
90.. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c
91 :internal:
92
93Credits
94=======
95
96The following people have contributed to the RapidIO subsystem directly
97or indirectly:
98
991. Matt Porter\ mporter@kernel.crashing.org
100
1012. Randy Vinson\ rvinson@mvista.com
102
1033. Dan Malek\ dan@embeddedalley.com
104
105The following people have contributed to this document:
106
1071. Matt Porter\ mporter@kernel.crashing.org
diff --git a/Documentation/driver-api/s390-drivers.rst b/Documentation/driver-api/s390-drivers.rst
new file mode 100644
index 000000000000..7060da136095
--- /dev/null
+++ b/Documentation/driver-api/s390-drivers.rst
@@ -0,0 +1,111 @@
1===================================
2Writing s390 channel device drivers
3===================================
4
5:Author: Cornelia Huck
6
7Introduction
8============
9
10This document describes the interfaces available for device drivers that
11drive s390 based channel attached I/O devices. This includes interfaces
12for interaction with the hardware and interfaces for interacting with
13the common driver core. Those interfaces are provided by the s390 common
14I/O layer.
15
16The document assumes a familiarity with the technical terms associated
17with the s390 channel I/O architecture. For a description of this
18architecture, please refer to the "z/Architecture: Principles of
19Operation", IBM publication no. SA22-7832.
20
21While most I/O devices on a s390 system are typically driven through the
22channel I/O mechanism described here, there are various other methods
23(like the diag interface). These are out of the scope of this document.
24
25Some additional information can also be found in the kernel source under
26Documentation/s390/driver-model.txt.
27
28The ccw bus
29===========
30
31The ccw bus typically contains the majority of devices available to a
32s390 system. Named after the channel command word (ccw), the basic
33command structure used to address its devices, the ccw bus contains
34so-called channel attached devices. They are addressed via I/O
35subchannels, visible on the css bus. A device driver for
36channel-attached devices, however, will never interact with the
37subchannel directly, but only via the I/O device on the ccw bus, the ccw
38device.
39
40I/O functions for channel-attached devices
41------------------------------------------
42
43Some hardware structures have been translated into C structures for use
44by the common I/O layer and device drivers. For more information on the
45hardware structures represented here, please consult the Principles of
46Operation.
47
48.. kernel-doc:: arch/s390/include/asm/cio.h
49 :internal:
50
51ccw devices
52-----------
53
54Devices that want to initiate channel I/O need to attach to the ccw bus.
55Interaction with the driver core is done via the common I/O layer, which
56provides the abstractions of ccw devices and ccw device drivers.
57
58The functions that initiate or terminate channel I/O all act upon a ccw
59device structure. Device drivers must not bypass those functions or
60strange side effects may happen.
61
62.. kernel-doc:: arch/s390/include/asm/ccwdev.h
63 :internal:
64
65.. kernel-doc:: drivers/s390/cio/device.c
66 :export:
67
68.. kernel-doc:: drivers/s390/cio/device_ops.c
69 :export:
70
71The channel-measurement facility
72--------------------------------
73
74The channel-measurement facility provides a means to collect measurement
75data which is made available by the channel subsystem for each channel
76attached device.
77
78.. kernel-doc:: arch/s390/include/asm/cmb.h
79 :internal:
80
81.. kernel-doc:: drivers/s390/cio/cmf.c
82 :export:
83
84The ccwgroup bus
85================
86
87The ccwgroup bus only contains artificial devices, created by the user.
88Many networking devices (e.g. qeth) are in fact composed of several ccw
89devices (like read, write and data channel for qeth). The ccwgroup bus
90provides a mechanism to create a meta-device which contains those ccw
91devices as slave devices and can be associated with the netdevice.
92
93ccw group devices
94-----------------
95
96.. kernel-doc:: arch/s390/include/asm/ccwgroup.h
97 :internal:
98
99.. kernel-doc:: drivers/s390/cio/ccwgroup.c
100 :export:
101
102Generic interfaces
103==================
104
105Some interfaces are available to other drivers that do not necessarily
106have anything to do with the busses described above, but still are
107indirectly using basic infrastructure in the common I/O layer. One
108example is the support for adapter interrupts.
109
110.. kernel-doc:: drivers/s390/cio/airq.c
111 :export:
diff --git a/Documentation/driver-api/scsi.rst b/Documentation/driver-api/scsi.rst
new file mode 100644
index 000000000000..859fb672319f
--- /dev/null
+++ b/Documentation/driver-api/scsi.rst
@@ -0,0 +1,344 @@
1=====================
2SCSI Interfaces Guide
3=====================
4
5:Author: James Bottomley
6:Author: Rob Landley
7
8Introduction
9============
10
11Protocol vs bus
12---------------
13
14Once upon a time, the Small Computer Systems Interface defined both a
15parallel I/O bus and a data protocol to connect a wide variety of
16peripherals (disk drives, tape drives, modems, printers, scanners,
17optical drives, test equipment, and medical devices) to a host computer.
18
19Although the old parallel (fast/wide/ultra) SCSI bus has largely fallen
20out of use, the SCSI command set is more widely used than ever to
21communicate with devices over a number of different busses.
22
23The `SCSI protocol <http://www.t10.org/scsi-3.htm>`__ is a big-endian
24peer-to-peer packet based protocol. SCSI commands are 6, 10, 12, or 16
25bytes long, often followed by an associated data payload.
26
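To make the big-endian layout concrete, here is a minimal sketch that fills in a 10 byte READ(10) command descriptor block. The function name ``build_read10`` is hypothetical and is not a kernel API; the byte layout (operation code 0x28, logical block address in bytes 2-5, transfer length in bytes 7-8) follows the SCSI block command set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define READ_10 0x28 /* READ(10) operation code */

/*
 * Hypothetical helper: build a READ(10) CDB. Multi-byte fields are
 * stored most significant byte first, as the SCSI protocol requires.
 */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = READ_10;
    cdb[2] = (uint8_t)(lba >> 24); /* logical block address, MSB first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8); /* transfer length, MSB first */
    cdb[8] = (uint8_t)nblocks;
}
```
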
27SCSI commands can be transported over just about any kind of bus, and
28are the default protocol for storage devices attached via USB, SATA, SAS,
29Fibre Channel, FireWire, and ATAPI. SCSI packets are also
30commonly exchanged over Infiniband,
31`I2O <http://i2o.shadowconnect.com/faq.php>`__, TCP/IP
32(`iSCSI <https://en.wikipedia.org/wiki/ISCSI>`__), and even `parallel
33ports <http://cyberelk.net/tim/parport/parscsi.html>`__.
34
35Design of the Linux SCSI subsystem
36----------------------------------
37
38The SCSI subsystem uses a three layer design, with upper, mid, and low
39layers. Every operation involving the SCSI subsystem (such as reading a
40sector from a disk) uses one driver at each of the 3 levels: one upper
41layer driver, one lower layer driver, and the SCSI midlayer.
42
43The SCSI upper layer provides the interface between userspace and the
44kernel, in the form of block and char device nodes for I/O and ioctl().
45The SCSI lower layer contains drivers for specific hardware devices.
46
47In between is the SCSI mid-layer, analogous to a network routing layer
48such as the IPv4 stack. The SCSI mid-layer routes a packet based data
49protocol between the upper layer's /dev nodes and the corresponding
50devices in the lower layer. It manages command queues, provides error
51handling and power management functions, and responds to ioctl()
52requests.
53
54SCSI upper layer
55================
56
57The upper layer supports the user-kernel interface by providing device
58nodes.
59
60sd (SCSI Disk)
61--------------
62
63sd (sd_mod.o)
64
65sr (SCSI CD-ROM)
66----------------
67
68sr (sr_mod.o)
69
70st (SCSI Tape)
71--------------
72
73st (st.o)
74
75sg (SCSI Generic)
76-----------------
77
78sg (sg.o)
79
80ch (SCSI Media Changer)
81-----------------------
82
83ch (ch.c)
84
85SCSI mid layer
86==============
87
88SCSI midlayer implementation
89----------------------------
90
91include/scsi/scsi_device.h
92~~~~~~~~~~~~~~~~~~~~~~~~~~~
93
94.. kernel-doc:: include/scsi/scsi_device.h
95 :internal:
96
97drivers/scsi/scsi.c
98~~~~~~~~~~~~~~~~~~~
99
100Main file for the SCSI midlayer.
101
102.. kernel-doc:: drivers/scsi/scsi.c
103 :export:
104
105drivers/scsi/scsicam.c
106~~~~~~~~~~~~~~~~~~~~~~
107
108`SCSI Common Access
109Method <http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf>`__ support
110functions, for use with HDIO_GETGEO, etc.
111
112.. kernel-doc:: drivers/scsi/scsicam.c
113 :export:
114
115drivers/scsi/scsi_error.c
116~~~~~~~~~~~~~~~~~~~~~~~~~~
117
118Common SCSI error/timeout handling routines.
119
120.. kernel-doc:: drivers/scsi/scsi_error.c
121 :export:
122
123drivers/scsi/scsi_devinfo.c
124~~~~~~~~~~~~~~~~~~~~~~~~~~~~
125
126Manage scsi_dev_info_list, which tracks blacklisted and whitelisted
127devices.
128
129.. kernel-doc:: drivers/scsi/scsi_devinfo.c
130 :internal:
131
132drivers/scsi/scsi_ioctl.c
133~~~~~~~~~~~~~~~~~~~~~~~~~~
134
135Handle ioctl() calls for SCSI devices.
136
137.. kernel-doc:: drivers/scsi/scsi_ioctl.c
138 :export:
139
140drivers/scsi/scsi_lib.c
141~~~~~~~~~~~~~~~~~~~~~~~~
142
143SCSI queuing library.
144
145.. kernel-doc:: drivers/scsi/scsi_lib.c
146 :export:
147
148drivers/scsi/scsi_lib_dma.c
149~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
150
151SCSI library functions depending on DMA (map and unmap scatter-gather
152lists).
153
154.. kernel-doc:: drivers/scsi/scsi_lib_dma.c
155 :export:
156
157drivers/scsi/scsi_module.c
158~~~~~~~~~~~~~~~~~~~~~~~~~~~
159
160The file drivers/scsi/scsi_module.c contains legacy support for
161old-style host templates. It should never be used by any new driver.
162
163drivers/scsi/scsi_proc.c
164~~~~~~~~~~~~~~~~~~~~~~~~~
165
166The functions in this file provide an interface between the PROC file
167system and the SCSI device drivers. It is mainly used for debugging,
168statistics, and to pass information directly to the lowlevel driver, i.e.
169plumbing to manage /proc/scsi/\*
170
171.. kernel-doc:: drivers/scsi/scsi_proc.c
172 :internal:
173
174drivers/scsi/scsi_netlink.c
175~~~~~~~~~~~~~~~~~~~~~~~~~~~~
176
177Infrastructure to provide async events from transports to userspace via
178netlink, using a single NETLINK_SCSITRANSPORT protocol for all
179transports. See `the original patch
180submission <http://marc.info/?l=linux-scsi&m=115507374832500&w=2>`__ for
181more details.
182
183.. kernel-doc:: drivers/scsi/scsi_netlink.c
184 :internal:
185
186drivers/scsi/scsi_scan.c
187~~~~~~~~~~~~~~~~~~~~~~~~~
188
189Scan a host to determine which (if any) devices are attached. The
190general scanning/probing algorithm is as follows; exceptions are made to
191it depending on device specific flags, compilation options, and global
192variable (boot or module load time) settings. A specific LUN is scanned
193via an INQUIRY command; if the LUN has a device attached, a scsi_device
194is allocated and set up for it. For every id of every channel on the
195given host, start by scanning LUN 0. Skip hosts that don't respond at
196all to a scan of LUN 0. Otherwise, if LUN 0 has a device attached,
197allocate and set up a scsi_device for it. If the target is SCSI-3 or up,
198issue a REPORT LUNS, and scan all of the LUNs it returns;
199else, sequentially scan LUNs up until some maximum is reached, or a LUN
200is seen that cannot have a device attached to it.
201
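The per-target part of this algorithm can be sketched with a toy model. Nothing below is taken from scsi_scan.c: the ``fake_target`` structure, the ``scan_target`` function and the limit of 8 LUNs are assumptions, and the device specific exceptions mentioned above are ignored. The sketch only contrasts the two strategies, a target-supplied LUN list versus a sequential probe that stops at the first gap.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LUNS 8 /* assumed scan limit for this sketch */

/* Hypothetical model of one target (one id on one channel). */
struct fake_target {
    bool responds;   /* answers a scan of LUN 0 at all */
    int scsi_level;  /* 2 = SCSI-2, 3 = SCSI-3, ... */
    bool lun_present[MAX_LUNS];
};

/* Return how many LUNs would get a device allocated. */
static int scan_target(const struct fake_target *t)
{
    int found = 0;
    int lun;

    if (!t->responds)
        return 0; /* nothing answers on LUN 0: skip this target */
    if (t->lun_present[0])
        found++;

    if (t->scsi_level >= 3) {
        /* Target reports its LUN list, so gaps do not end the scan. */
        for (lun = 1; lun < MAX_LUNS; lun++)
            if (t->lun_present[lun])
                found++;
    } else {
        /* Sequential probe: stop at the first LUN without a device. */
        for (lun = 1; lun < MAX_LUNS && t->lun_present[lun]; lun++)
            found++;
    }
    return found;
}
```

With LUNs 0, 1 and 3 populated, the sequential probe finds two devices, while the reported list finds all three.
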
202.. kernel-doc:: drivers/scsi/scsi_scan.c
203 :internal:
204
205drivers/scsi/scsi_sysctl.c
206~~~~~~~~~~~~~~~~~~~~~~~~~~~
207
208Set up the sysctl entry: "/dev/scsi/logging_level"
209(DEV_SCSI_LOGGING_LEVEL) which sets/returns scsi_logging_level.
210
211drivers/scsi/scsi_sysfs.c
212~~~~~~~~~~~~~~~~~~~~~~~~~~
213
214SCSI sysfs interface routines.
215
216.. kernel-doc:: drivers/scsi/scsi_sysfs.c
217 :export:
218
219drivers/scsi/hosts.c
220~~~~~~~~~~~~~~~~~~~~
221
222mid to lowlevel SCSI driver interface
223
224.. kernel-doc:: drivers/scsi/hosts.c
225 :export:
226
227drivers/scsi/constants.c
228~~~~~~~~~~~~~~~~~~~~~~~~
229
230mid to lowlevel SCSI driver interface
231
232.. kernel-doc:: drivers/scsi/constants.c
233 :export:
234
235Transport classes
236-----------------
237
238Transport classes are service libraries for drivers in the SCSI lower
239layer, which expose transport attributes in sysfs.
240
241Fibre Channel transport
242~~~~~~~~~~~~~~~~~~~~~~~
243
244The file drivers/scsi/scsi_transport_fc.c defines transport attributes
245for Fibre Channel.
246
247.. kernel-doc:: drivers/scsi/scsi_transport_fc.c
248 :export:
249
250iSCSI transport class
251~~~~~~~~~~~~~~~~~~~~~
252
253The file drivers/scsi/scsi_transport_iscsi.c defines transport
254attributes for the iSCSI class, which sends SCSI packets over TCP/IP
255connections.
256
257.. kernel-doc:: drivers/scsi/scsi_transport_iscsi.c
258 :export:
259
260Serial Attached SCSI (SAS) transport class
261~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
262
263The file drivers/scsi/scsi_transport_sas.c defines transport
264attributes for Serial Attached SCSI, a variant of SATA aimed at large
265high-end systems.
266
267The SAS transport class contains common code to deal with SAS HBAs, an
268approximate representation of SAS topologies in the driver model, and
269various sysfs attributes to expose these topologies and management
270interfaces to userspace.
271
272In addition to the basic SCSI core objects this transport class
273introduces two additional intermediate objects: The SAS PHY as
274represented by struct sas_phy defines an "outgoing" PHY on a SAS HBA or
275Expander, and the SAS remote PHY represented by struct sas_rphy defines
276an "incoming" PHY on a SAS Expander or end device. Note that this is
277purely a software concept; the underlying hardware for a PHY and a
278remote PHY is exactly the same.
279
280There is no concept of a SAS port in this code; users can see what PHYs
281form a wide port based on the port_identifier attribute, which is the
282same for all PHYs in a port.
283
284.. kernel-doc:: drivers/scsi/scsi_transport_sas.c
285 :export:
286
287SATA transport class
288~~~~~~~~~~~~~~~~~~~~
289
290The SATA transport is handled by libata, which has its own book of
291documentation in this directory.
292
293Parallel SCSI (SPI) transport class
294~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
295
296The file drivers/scsi/scsi_transport_spi.c defines transport
297attributes for traditional (fast/wide/ultra) SCSI busses.
298
299.. kernel-doc:: drivers/scsi/scsi_transport_spi.c
300 :export:
301
302SCSI RDMA (SRP) transport class
303~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
304
305The file drivers/scsi/scsi_transport_srp.c defines transport
306attributes for SCSI over Remote Direct Memory Access.
307
308.. kernel-doc:: drivers/scsi/scsi_transport_srp.c
309 :export:
310
311SCSI lower layer
312================
313
314Host Bus Adapter transport types
315--------------------------------
316
317Many modern device controllers use the SCSI command set as a protocol to
318communicate with their devices through many different types of physical
319connections.
320
321In SCSI language a bus capable of carrying SCSI commands is called a
322"transport", and a controller connecting to such a bus is called a "host
323bus adapter" (HBA).
324
325Debug transport
326~~~~~~~~~~~~~~~
327
328The file drivers/scsi/scsi_debug.c simulates a host adapter with a
329variable number of disks (or disk like devices) attached, sharing a
330common amount of RAM. It does a lot of checking to make sure that we are
331not getting blocks mixed up, and panics the kernel if anything out of
332the ordinary is seen.
333
334To be more realistic, the simulated devices have the transport
335attributes of SAS disks.
336
337For documentation see http://sg.danny.cz/sg/sdebug26.html
338
339todo
340~~~~
341
342Parallel (fast/wide/ultra) SCSI, USB, SATA, SAS, Fibre Channel,
343FireWire, ATAPI devices, Infiniband, I2O, iSCSI, Parallel ports,
344netlink...
diff --git a/Documentation/driver-api/w1.rst b/Documentation/driver-api/w1.rst
new file mode 100644
index 000000000000..9963cca788a1
--- /dev/null
+++ b/Documentation/driver-api/w1.rst
@@ -0,0 +1,70 @@
1======================
2W1: Dallas' 1-wire bus
3======================
4
5:Author: David Fries
6
7W1 API internal to the kernel
8=============================
9
10W1 API internal to the kernel
11-----------------------------
12
13include/linux/w1.h
14~~~~~~~~~~~~~~~~~~
15
16W1 kernel API functions.
17
18.. kernel-doc:: include/linux/w1.h
19 :internal:
20
21drivers/w1/w1.c
22~~~~~~~~~~~~~~~
23
24W1 core functions.
25
26.. kernel-doc:: drivers/w1/w1.c
27 :internal:
28
29drivers/w1/w1_family.c
30~~~~~~~~~~~~~~~~~~~~~~~
31
32Allows registering device family operations.
33
34.. kernel-doc:: drivers/w1/w1_family.c
35 :export:
36
37drivers/w1/w1_internal.h
38~~~~~~~~~~~~~~~~~~~~~~~~
39
40W1 internal initialization for master devices.
41
42.. kernel-doc:: drivers/w1/w1_internal.h
43 :internal:
44
45drivers/w1/w1_int.c
46~~~~~~~~~~~~~~~~~~~~
47
48W1 internal initialization for master devices.
49
50.. kernel-doc:: drivers/w1/w1_int.c
51 :export:
52
53drivers/w1/w1_netlink.h
54~~~~~~~~~~~~~~~~~~~~~~~~
55
56W1 external netlink API structures and commands.
57
58.. kernel-doc:: drivers/w1/w1_netlink.h
59 :internal:
60
61drivers/w1/w1_io.c
62~~~~~~~~~~~~~~~~~~~
63
64W1 input/output.
65
66.. kernel-doc:: drivers/w1/w1_io.c
67 :export:
68
69.. kernel-doc:: drivers/w1/w1_io.c
70 :internal:
diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.txt
index d4ff7de85700..d52cf1e3b975 100644
--- a/Documentation/fb/api.txt
+++ b/Documentation/fb/api.txt
@@ -289,12 +289,12 @@ the FB_CAP_FOURCC bit in the fb_fix_screeninfo capabilities field.
 FOURCC definitions are located in the linux/videodev2.h header. However, and
 despite starting with the V4L2_PIX_FMT_prefix, they are not restricted to V4L2
 and don't require usage of the V4L2 subsystem. FOURCC documentation is
-available in Documentation/DocBook/v4l/pixfmt.xml.
+available in Documentation/media/uapi/v4l/pixfmt.rst.
 
 To select a format, applications set the grayscale field to the desired FOURCC.
 For YUV formats, they should also select the appropriate colorspace by setting
 the colorspace field to one of the colorspaces listed in linux/videodev2.h and
-documented in Documentation/DocBook/v4l/colorspaces.xml.
+documented in Documentation/media/uapi/v4l/colorspaces.rst.
 
 The red, green, blue and transp fields are not used with the FOURCC-based API.
 For forward compatibility reasons applications must zero those fields, and
diff --git a/Documentation/filesystems/conf.py b/Documentation/filesystems/conf.py
new file mode 100644
index 000000000000..ea44172af5c4
--- /dev/null
+++ b/Documentation/filesystems/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = "Linux Filesystems API"
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'filesystems.tex', project,
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
new file mode 100644
index 000000000000..256e10eedba4
--- /dev/null
+++ b/Documentation/filesystems/index.rst
@@ -0,0 +1,317 @@
1=====================
2Linux Filesystems API
3=====================
4
5The Linux VFS
6=============
7
8The Filesystem types
9--------------------
10
11.. kernel-doc:: include/linux/fs.h
12 :internal:
13
14The Directory Cache
15-------------------
16
17.. kernel-doc:: fs/dcache.c
18 :export:
19
20.. kernel-doc:: include/linux/dcache.h
21 :internal:
22
23Inode Handling
24--------------
25
26.. kernel-doc:: fs/inode.c
27 :export:
28
29.. kernel-doc:: fs/bad_inode.c
30 :export:
31
32Registration and Superblocks
33----------------------------
34
35.. kernel-doc:: fs/super.c
36 :export:
37
38File Locks
39----------
40
41.. kernel-doc:: fs/locks.c
42 :export:
43
44.. kernel-doc:: fs/locks.c
45 :internal:
46
47Other Functions
48---------------
49
50.. kernel-doc:: fs/mpage.c
51 :export:
52
53.. kernel-doc:: fs/namei.c
54 :export:
55
56.. kernel-doc:: fs/buffer.c
57 :export:
58
59.. kernel-doc:: block/bio.c
60 :export:
61
62.. kernel-doc:: fs/seq_file.c
63 :export:
64
65.. kernel-doc:: fs/filesystems.c
66 :export:
67
68.. kernel-doc:: fs/fs-writeback.c
69 :export:
70
71.. kernel-doc:: fs/block_dev.c
72 :export:
73
74The proc filesystem
75===================
76
77sysctl interface
78----------------
79
80.. kernel-doc:: kernel/sysctl.c
81 :export:
82
83proc filesystem interface
84-------------------------
85
86.. kernel-doc:: fs/proc/base.c
87 :internal:
88
89Events based on file descriptors
90================================
91
92.. kernel-doc:: fs/eventfd.c
93 :export:
94
95The Filesystem for Exporting Kernel Objects
96===========================================
97
98.. kernel-doc:: fs/sysfs/file.c
99 :export:
100
101.. kernel-doc:: fs/sysfs/symlink.c
102 :export:
103
104The debugfs filesystem
105======================
106
107debugfs interface
108-----------------
109
110.. kernel-doc:: fs/debugfs/inode.c
111 :export:
112
113.. kernel-doc:: fs/debugfs/file.c
114 :export:
115
116The Linux Journalling API
117=========================
118
119Overview
120--------
121
122Details
123~~~~~~~
124
125The journalling layer is easy to use. You need to first of all create a
126journal_t data structure. There are two calls to do this dependent on
127how you decide to allocate the physical media on which the journal
128resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in
129filesystem inodes, or the :c:func:`jbd2_journal_init_dev` call can be used
130for journals stored on a raw device (in a contiguous range of blocks). A
131journal_t is a typedef for a struct pointer, so when you are finally
132finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up
133any used kernel memory.
134
135Once you have got your journal_t object you need to 'mount' or load the
136journal file. The journalling layer expects the space for the journal
137was already allocated and initialized properly by the userspace tools.
138When loading the journal you must call :c:func:`jbd2_journal_load` to process
139journal contents. If the client file system detects the journal contents
140do not need to be processed (or even need not have valid contents), it
141may call :c:func:`jbd2_journal_wipe` to clear the journal contents before
142calling :c:func:`jbd2_journal_load`.
143
144Note that jbd2_journal_wipe(..,0) calls
145:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
146transactions in the journal and similarly :c:func:`jbd2_journal_load` will
147call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
148:c:func:`ext4_load_journal` in fs/ext4/super.c for examples on this stage.
149
150Now you can go ahead and start modifying the underlying filesystem.
151Almost.
152
153You still need to actually journal your filesystem changes; this is done
154by wrapping them into transactions. Additionally you also need to wrap
155the modification of each of the buffers with calls to the journal layer,
156so it knows what the modifications you are actually making are. To do
157this use :c:func:`jbd2_journal_start` which returns a transaction handle.
158
159:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
160which indicates the end of a transaction, are nestable calls, so you can
161reenter a transaction if necessary, but remember you must call
162:c:func:`jbd2_journal_stop` the same number of times as
163:c:func:`jbd2_journal_start` before the transaction is completed (or more
164accurately leaves the update phase). Ext4/VFS makes use of this feature to
165simplify handling of inode dirtying, quota support, etc.
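
The nesting rule can be pictured with a toy userspace model. This is only a
sketch: ``toy_handle``, ``toy_start`` and ``toy_stop`` are invented names, not
JBD2 API, and the only behaviour modelled is the assumption that a handle
carries a reference count and that the update phase ends when the outermost
stop drops that count to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-task transaction handle. */
struct toy_handle {
	int refcount;   /* number of unmatched toy_start() calls */
	bool in_update; /* true while the transaction accepts changes */
};

/* Enter (or re-enter) the transaction. */
static void toy_start(struct toy_handle *h)
{
	assert(h->refcount >= 0);
	if (h->refcount++ == 0)
		h->in_update = true;
}

/* Leave one nesting level; only the outermost call ends the update phase. */
static void toy_stop(struct toy_handle *h)
{
	assert(h->refcount > 0);
	if (--h->refcount == 0)
		h->in_update = false;
}
```

Calling ``toy_start`` twice and ``toy_stop`` once leaves ``in_update`` set; it
is cleared only by the second, matching ``toy_stop``.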
166
167Inside each transaction you need to wrap the modifications to the
168individual buffers (blocks). Before you start to modify a buffer you
169need to call :c:func:`jbd2_journal_get_create_access()` /
170:c:func:`jbd2_journal_get_write_access()` /
171:c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the
172journalling layer to copy the unmodified
173data if it needs to. After all, the buffer may be part of a previously
174uncommitted transaction. At this point you are at last ready to modify a
175buffer, and once you have done so you need to call
176:c:func:`jbd2_journal_dirty_metadata`. Or if you've asked for access to a
177buffer you now know is no longer required to be pushed back on the
178device you can call :c:func:`jbd2_journal_forget` in much the same way as you
179might have used :c:func:`bforget` in the past.
180
181A :c:func:`jbd2_journal_flush` may be called at any time to commit and
182checkpoint all your transactions.
183
184Then at umount time, in your :c:func:`put_super`, you can call
185:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.
186
187Unfortunately there are a couple of ways the journal layer can cause a
188deadlock. The first thing to note is that each task can only have a
189single outstanding transaction at any one time; remember nothing commits
190until the outermost :c:func:`jbd2_journal_stop`. This means you must complete
191the transaction at the end of each file/inode/address etc. operation you
192perform, so that the journalling system isn't re-entered on another
193journal. This matters because transactions can't be nested/batched across
194differing journals, and another filesystem other than yours (say ext4) may
195be modified in a later syscall.
196
197The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
198if there isn't enough space in the journal for your transaction (based
199on the passed nblocks param) - when it blocks it merely(!) needs to wait
200for transactions to complete and be committed from other tasks, so
201essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
202deadlocks you must treat :c:func:`jbd2_journal_start` /
203:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
204your semaphore ordering rules to prevent
205deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking
206behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as
207easily as on :c:func:`jbd2_journal_start`.
208
209Try to reserve the right number of blocks the first time. ;-). This will
210be the maximum number of blocks you are going to touch in this
211transaction. I advise having a look at at least ext4_jbd.h to see the
212basis on which ext4 makes these decisions.
213
214Another wriggle to watch out for is your on-disk block allocation
215strategy. Why? Because, if you do a delete, you need to ensure you
216haven't reused any of the freed blocks until the transaction freeing
217these blocks commits. If you reused these blocks and a crash happens,
218there is no way to restore the contents of the reallocated blocks at the
219end of the last fully committed transaction. One simple way of doing
220this is to mark blocks as free in internal in-memory block allocation
221structures only after the transaction freeing them commits. Ext4 uses
222a journal commit callback for this purpose.
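
The "free only after commit" idea can be sketched as a small userspace model.
Everything here is hypothetical (``toy_allocator``, ``toy_free_block``,
``toy_commit`` and friends are not ext4 or JBD2 names); the sketch only assumes
that a freed block stays unavailable until some commit hook runs.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NBLOCKS 8

/* Hypothetical in-memory allocator state; not an ext4 or JBD2 structure. */
struct toy_allocator {
	bool in_use[TOY_NBLOCKS];         /* block may not be handed out */
	bool free_on_commit[TOY_NBLOCKS]; /* freed by the running transaction */
};

/* Return the first free block, or -1 if none is available. */
static int toy_alloc_block(struct toy_allocator *a)
{
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		if (!a->in_use[i]) {
			a->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Record the free, but keep the block unavailable until commit. */
static void toy_free_block(struct toy_allocator *a, int block)
{
	assert(block >= 0 && block < TOY_NBLOCKS);
	assert(a->in_use[block]);
	a->free_on_commit[block] = true;
}

/* Stand-in for a commit callback: now the freed blocks may be reused. */
static void toy_commit(struct toy_allocator *a)
{
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		if (a->free_on_commit[i]) {
			a->free_on_commit[i] = false;
			a->in_use[i] = false;
		}
	}
}
```

With every block allocated, freeing one block does not make it allocatable
again until ``toy_commit`` has run, which is the property the paragraph above
asks for.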
223
224With journal commit callbacks you can ask the journalling layer to call
225a callback function when the transaction is finally committed to disk,
226so that you can do some of your own management. You ask the journalling
227layer to call the callback by simply setting the
228``journal->j_commit_callback`` function pointer, and that function is
229called after each transaction commit. You can also use
230``transaction->t_private_list`` for attaching entries to a transaction
231that need processing when the transaction commits.
232
233JBD2 also provides a way to block all transaction updates via
234:c:func:`jbd2_journal_lock_updates()` /
235:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
236window with a clean and stable fs for a moment. E.g.
237
238::
239
240
241 jbd2_journal_lock_updates(journal);   /* stop new stuff happening */
242 jbd2_journal_flush(journal);          /* checkpoint everything */
243 /* ... do stuff on stable fs ... */
244 jbd2_journal_unlock_updates(journal); /* carry on with filesystem use */
245
246The opportunities for abuse and DOS attacks with this should be obvious,
247if you allow unprivileged userspace to trigger codepaths containing
248these calls.
249
250Summary
251~~~~~~~
252
253Using the journal is a matter of wrapping the different context changes
254(each mount, each modification (transaction) and each changed buffer) to
255tell the journalling layer about them.
256
257Data Types
258----------
259
260The journalling layer uses typedefs to 'hide' the concrete definitions
261of the structures used. As a client of the JBD2 layer you can just rely
262on using the pointer as a magic cookie of some sort. Obviously the
263hiding is not enforced as this is 'C'.
264
265Structures
266~~~~~~~~~~
267
268.. kernel-doc:: include/linux/jbd2.h
269 :internal:
270
271Functions
272---------
273
274The functions here are split into two groups: those that affect a journal
275as a whole, and those which are used to manage transactions.
276
277Journal Level
278~~~~~~~~~~~~~
279
280.. kernel-doc:: fs/jbd2/journal.c
281 :export:
282
283.. kernel-doc:: fs/jbd2/recovery.c
284 :internal:
285
286Transaction Level
287~~~~~~~~~~~~~~~~~~
288
289.. kernel-doc:: fs/jbd2/transaction.c
290
291See also
292--------
293
294`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
295Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__
296
297`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
298Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__
299
300splice API
301==========
302
303splice is a method for moving blocks of data around inside the kernel,
304without continually transferring them between the kernel and user space.
305
306.. kernel-doc:: fs/splice.c
307
308pipes API
309=========
310
311Pipe interfaces are all for in-kernel (builtin image) use. They are not
312exported for use by modules.
313
314.. kernel-doc:: include/linux/pipe_fs_i.h
315 :internal:
316
317.. kernel-doc:: fs/pipe.c
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt
index fe03d10bb79a..b86831acd583 100644
--- a/Documentation/filesystems/nfs/idmapper.txt
+++ b/Documentation/filesystems/nfs/idmapper.txt
@@ -55,7 +55,7 @@ request-key will find the first matching line and corresponding program. In
55this case, /some/other/program will handle all uid lookups and 55this case, /some/other/program will handle all uid lookups and
56/usr/sbin/nfs.idmap will handle gid, user, and group lookups. 56/usr/sbin/nfs.idmap will handle gid, user, and group lookups.
57 57
58See <file:Documentation/security/keys-request-key.txt> for more information 58See <file:Documentation/security/keys/request-key.rst> for more information
59about the request-key function. 59about the request-key function.
60 60
61 61
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst
index 1bdb7356a310..6162d0e9dc28 100644
--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -228,7 +228,7 @@ The DRM reference documentation is still lacking kerneldoc in a few areas. The
228task would be to clean up interfaces like moving functions around between 228task would be to clean up interfaces like moving functions around between
229files to better group them and improving the interfaces like dropping return 229files to better group them and improving the interfaces like dropping return
230values for functions that never fail. Then write kerneldoc for all exported 230values for functions that never fail. Then write kerneldoc for all exported
231functions and an overview section and integrate it all into the drm DocBook. 231functions and an overview section and integrate it all into the drm book.
232 232
233See https://dri.freedesktop.org/docs/drm/ for what's there already. 233See https://dri.freedesktop.org/docs/drm/ for what's there already.
234 234
diff --git a/Documentation/index.rst b/Documentation/index.rst
index bc67dbf76eb0..cb7f1ba5b3b1 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -3,8 +3,8 @@
3 You can adapt this file completely to your liking, but it should at least 3 You can adapt this file completely to your liking, but it should at least
4 contain the root `toctree` directive. 4 contain the root `toctree` directive.
5 5
6Welcome to The Linux Kernel's documentation 6The Linux Kernel documentation
7=========================================== 7==============================
8 8
9This is the top level of the kernel's documentation tree. Kernel 9This is the top level of the kernel's documentation tree. Kernel
10documentation, like the kernel itself, is very much a work in progress; 10documentation, like the kernel itself, is very much a work in progress;
@@ -51,6 +51,7 @@ merged much easier.
51 process/index 51 process/index
52 dev-tools/index 52 dev-tools/index
53 doc-guide/index 53 doc-guide/index
54 kernel-hacking/index
54 55
55Kernel API documentation 56Kernel API documentation
56------------------------ 57------------------------
@@ -67,11 +68,24 @@ needed).
67 driver-api/index 68 driver-api/index
68 core-api/index 69 core-api/index
69 media/index 70 media/index
71 networking/index
70 input/index 72 input/index
71 gpu/index 73 gpu/index
72 security/index 74 security/index
73 sound/index 75 sound/index
74 crypto/index 76 crypto/index
77 filesystems/index
78
79Architecture-specific documentation
80-----------------------------------
81
82These books provide programming details about architecture-specific
83implementation.
84
85.. toctree::
86 :maxdepth: 2
87
88 sh/index
75 89
76Korean translations 90Korean translations
77------------------- 91-------------------
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index e18daca65ccd..659afd56ecdb 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -1331,7 +1331,7 @@ See subsequent chapter for the syntax of the Kbuild file.
1331 --- 7.5 mandatory-y 1331 --- 7.5 mandatory-y
1332 1332
1333 mandatory-y is essentially used by include/uapi/asm-generic/Kbuild.asm 1333 mandatory-y is essentially used by include/uapi/asm-generic/Kbuild.asm
1334 to define the minimun set of headers that must be exported in 1334 to define the minimum set of headers that must be exported in
1335 include/asm. 1335 include/asm.
1336 1336
1337 The convention is to list one subdir per line and 1337 The convention is to list one subdir per line and
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index 104740ea0041..c23e2c5ab80d 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -17,8 +17,8 @@ The format for this documentation is called the kernel-doc format.
17It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file. 17It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file.
18 18
19This style embeds the documentation within the source files, using 19This style embeds the documentation within the source files, using
20a few simple conventions. The scripts/kernel-doc perl script, some 20a few simple conventions. The scripts/kernel-doc perl script, the
21SGML templates in Documentation/DocBook, and other tools understand 21Documentation/sphinx/kerneldoc.py Sphinx extension and other tools understand
22these conventions, and are used to extract this embedded documentation 22these conventions, and are used to extract this embedded documentation
23into various documents. 23into various documents.
24 24
@@ -122,15 +122,9 @@ are:
122- scripts/kernel-doc 122- scripts/kernel-doc
123 123
124 This is a perl script that hunts for the block comments and can mark 124 This is a perl script that hunts for the block comments and can mark
125 them up directly into DocBook, man, text, and HTML. (No, not 125 them up directly into DocBook, ReST, man, text, and HTML. (No, not
126 texinfo.) 126 texinfo.)
127 127
128- Documentation/DocBook/*.tmpl
129
130 These are SGML template files, which are normal SGML files with
131 special place-holders for where the extracted documentation should
132 go.
133
134- scripts/docproc.c 128- scripts/docproc.c
135 129
136 This is a program for converting SGML template files into SGML 130 This is a program for converting SGML template files into SGML
@@ -145,25 +139,18 @@ are:
145 139
146- Makefile 140- Makefile
147 141
148 The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used 142 The targets 'xmldocs', 'latexdocs', 'pdfdocs', 'epubdocs'and 'htmldocs'
149 to build XML DocBook files, PostScript files, PDF files, and html files 143 are used to build XML DocBook files, LaTeX files, PDF files,
150 in Documentation/DocBook. The older target 'sgmldocs' is equivalent 144 ePub files and html files in Documentation/.
151 to 'xmldocs'.
152
153- Documentation/DocBook/Makefile
154
155 This is where C files are associated with SGML templates.
156
157 145
158How to extract the documentation 146How to extract the documentation
159-------------------------------- 147--------------------------------
160 148
161If you just want to read the ready-made books on the various 149If you just want to read the ready-made books on the various
162subsystems (see Documentation/DocBook/*.tmpl), just type 'make 150subsystems, just type 'make epubdocs', or 'make pdfdocs', or 'make htmldocs',
163psdocs', or 'make pdfdocs', or 'make htmldocs', depending on your 151depending on your preference. If you would rather read a different format,
164preference. If you would rather read a different format, you can type 152you can type 'make xmldocs' and then use DocBook tools to convert
165'make xmldocs' and then use DocBook tools to convert 153Documentation/output/*.xml to a format of your choice (for example,
166Documentation/DocBook/*.xml to a format of your choice (for example,
167'db2html ...' if 'make htmldocs' was not defined). 154'db2html ...' if 'make htmldocs' was not defined).
168 155
169If you want to see man pages instead, you can do this: 156If you want to see man pages instead, you can do this:
@@ -329,37 +316,7 @@ This is done by using a DOC: section keyword with a section title. E.g.:
329 * hardware, software, or its subject(s). 316 * hardware, software, or its subject(s).
330 */ 317 */
331 318
332DOC: sections are used in SGML templates files as indicated below. 319DOC: sections are used in ReST files.
333
334
335How to make new SGML template files
336-----------------------------------
337
338SGML template files (*.tmpl) are like normal SGML files, except that
339they can contain escape sequences where extracted documentation should
340be inserted.
341
342!E<filename> is replaced by the documentation, in <filename>, for
343functions that are exported using EXPORT_SYMBOL: the function list is
344collected from files listed in Documentation/DocBook/Makefile.
345
346!I<filename> is replaced by the documentation for functions that are
347_not_ exported using EXPORT_SYMBOL.
348
349!D<filename> is used to name additional files to search for functions
350exported using EXPORT_SYMBOL.
351
352!F<filename> <function [functions...]> is replaced by the
353documentation, in <filename>, for the functions listed.
354
355!P<filename> <section title> is replaced by the contents of the DOC:
356section titled <section title> from <filename>.
357Spaces are allowed in <section title>; do not quote the <section title>.
358
359!C<filename> is replaced by nothing, but makes the tools check that
360all DOC: sections and documented functions, symbols, etc. are used.
361This makes sense to use when you use !F/!P only and want to verify
362that all documentation is included.
363 320
364Tim. 321Tim.
365*/ <twaugh@redhat.com> 322*/ <twaugh@redhat.com>
diff --git a/Documentation/kernel-hacking/conf.py b/Documentation/kernel-hacking/conf.py
new file mode 100644
index 000000000000..3d8acf0f33ad
--- /dev/null
+++ b/Documentation/kernel-hacking/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = "Kernel Hacking Guides"
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'kernel-hacking.tex', project,
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
new file mode 100644
index 000000000000..daf3883b2694
--- /dev/null
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -0,0 +1,811 @@
1============================================
2Unreliable Guide To Hacking The Linux Kernel
3============================================
4
5:Author: Rusty Russell
6
7Introduction
8============
9
10Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
11Kernel Hacking. This document describes the common routines and general
12requirements for kernel code: its goal is to serve as a primer for Linux
13kernel development for experienced C programmers. I avoid implementation
14details: that's what the code is for, and I ignore whole tracts of
15useful routines.
16
17Before you read this, please understand that I never wanted to write
18this document, being grossly under-qualified, but I always wanted to
19read it, and this was the only way. I hope it will grow into a
20compendium of best practice, common starting points and random
21information.
22
23The Players
24===========
25
26At any time each of the CPUs in a system can be:
27
28- not associated with any process, serving a hardware interrupt;
29
30- not associated with any process, serving a softirq or tasklet;
31
32- running in kernel space, associated with a process (user context);
33
34- running a process in user space.
35
36There is an ordering between these. The bottom two can preempt each
37other, but above that is a strict hierarchy: each can only be preempted
38by the ones above it. For example, while a softirq is running on a CPU,
39no other softirq will preempt it, but a hardware interrupt can. However,
40any other CPUs in the system execute independently.
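
The ordering can be written down as a small predicate. This is a userspace
sketch of the rule as stated here, not kernel code: the ``toy_ctx`` enum and
``toy_can_preempt`` are invented names, and the model ignores the ways of
blocking interrupts that come later in this guide.

```c
#include <assert.h>
#include <stdbool.h>

/* Listed from highest to lowest priority, as in the bullet list above. */
enum toy_ctx {
	TOY_HARDIRQ,      /* serving a hardware interrupt */
	TOY_SOFTIRQ,      /* serving a softirq or tasklet */
	TOY_USER_CONTEXT, /* kernel space, associated with a process */
	TOY_USERSPACE,    /* process running in user space */
};

/* May `incoming` preempt `running` on the same CPU? */
static bool toy_can_preempt(enum toy_ctx running, enum toy_ctx incoming)
{
	assert(running <= TOY_USERSPACE && incoming <= TOY_USERSPACE);

	/* The bottom two can preempt each other. */
	if (running >= TOY_USER_CONTEXT && incoming >= TOY_USER_CONTEXT)
		return true;

	/* Above that: only something strictly higher may preempt. */
	return incoming < running;
}
```

So a hardware interrupt may preempt a running softirq, while a second softirq
may not.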
41
42We'll see a number of ways that the user context can block interrupts,
43to become truly non-preemptable.
44
45User Context
46------------
47
48User context is when you are coming in from a system call or other trap:
49like userspace, you can be preempted by more important tasks and by
50interrupts. You can sleep, by calling :c:func:`schedule()`.
51
52.. note::
53
54 You are always in user context on module load and unload, and on
55 operations on the block device layer.
56
57In user context, the ``current`` pointer (indicating the task we are
58currently executing) is valid, and :c:func:`in_interrupt()`
59(``include/linux/preempt.h``) is false.
60
61.. warning::
62
63 Beware that if you have preemption or softirqs disabled (see below),
64 :c:func:`in_interrupt()` will return a false positive.
65
66Hardware Interrupts (Hard IRQs)
67-------------------------------
68
69Timer ticks, network cards and keyboard are examples of real hardware
70which produce interrupts at any time. The kernel runs interrupt
71handlers, which service the hardware. The kernel guarantees that this
72handler is never re-entered: if the same interrupt arrives, it is queued
73(or dropped). Because it disables interrupts, this handler has to be
74fast: frequently it simply acknowledges the interrupt, marks a 'software
75interrupt' for execution and exits.
76
77You can tell you are in a hardware interrupt, because
78:c:func:`in_irq()` returns true.
79
80.. warning::
81
82 Beware that this will return a false positive if interrupts are
83 disabled (see below).
84
85Software Interrupt Context: Softirqs and Tasklets
86-------------------------------------------------
87
88Whenever a system call is about to return to userspace, or a hardware
89interrupt handler exits, any 'software interrupts' which are marked
90pending (usually by hardware interrupts) are run (``kernel/softirq.c``).
91
92Much of the real interrupt handling work is done here. Early in the
93transition to SMP, there were only 'bottom halves' (BHs), which didn't
94take advantage of multiple CPUs. Shortly after we switched from wind-up
95computers made of match-sticks and snot, we abandoned this limitation
96and switched to 'softirqs'.
97
98``include/linux/interrupt.h`` lists the different softirqs. A very
99important softirq is the timer softirq (``include/linux/timer.h``): you
100can register to have it call functions for you in a given length of
101time.
102
103Softirqs are often a pain to deal with, since the same softirq will run
104simultaneously on more than one CPU. For this reason, tasklets
105(``include/linux/interrupt.h``) are more often used: they are
106dynamically-registrable (meaning you can have as many as you want), and
107they also guarantee that any tasklet will only run on one CPU at any
108time, although different tasklets can run simultaneously.
109
110.. warning::
111
112 The name 'tasklet' is misleading: they have nothing to do with
113 'tasks', and probably more to do with some bad vodka Alexey
114 Kuznetsov had at the time.
115
116You can tell you are in a softirq (or tasklet) using the
117:c:func:`in_softirq()` macro (``include/linux/preempt.h``).
118
119.. warning::
120
121 Beware that this will return a false positive if a
122 :ref:`bottom half lock <local_bh_disable>` is held.
123
124Some Basic Rules
125================
126
127No memory protection
128 If you corrupt memory, whether in user context or interrupt context,
129 the whole machine will crash. Are you sure you can't do what you
130 want in userspace?
131
132No floating point or MMX
133 The FPU context is not saved; even in user context the FPU state
134 probably won't correspond with the current process: you would mess
135 with some user process' FPU state. If you really want to do this,
136 you would have to explicitly save/restore the full FPU state (and
137 avoid context switches). It is generally a bad idea; use fixed point
138 arithmetic first.
139
140A rigid stack limit
141 Depending on configuration options the kernel stack is about 3K to
142 6K for most 32-bit architectures: it's about 14K on most 64-bit
143 archs, and often shared with interrupts so you can't use it all.
144 Avoid deep recursion and huge local arrays on the stack (allocate
145 them dynamically instead).
146
147The Linux kernel is portable
148 Let's keep it that way. Your code should be 64-bit clean, and
149 endian-independent. You should also minimize CPU specific stuff,
150 e.g. inline assembly should be cleanly encapsulated and minimized to
151 ease porting. Generally it should be restricted to the
152 architecture-dependent part of the kernel tree.
153
154ioctls: Not writing a new system call
155=====================================
156
157A system call generally looks like this::
158
159 asmlinkage long sys_mycall(int arg)
160 {
161 return 0;
162 }
163
164
165First, in most cases you don't want to create a new system call. You
166create a character device and implement an appropriate ioctl for it.
167This is much more flexible than system calls, doesn't have to be entered
168in every architecture's ``include/asm/unistd.h`` and
169``arch/kernel/entry.S`` file, and is much more likely to be accepted by
170Linus.
171
172If all your routine does is read or write some parameter, consider
173implementing a :c:func:`sysfs()` interface instead.
174
175Inside the ioctl you're in user context to a process. When an error
176occurs you return a negated errno (see
177``include/uapi/asm-generic/errno-base.h``,
178``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
179otherwise you return 0.
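
The return convention can be tried out in plain userspace C. The handler below
is a sketch, not a real driver: ``toy_ioctl`` and the ``TOY_*`` command numbers
are made up for illustration, and a real ioctl would take a ``struct file``
and fetch its argument with the user-access helpers rather than dereference a
pointer directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, invented for this sketch. */
#define TOY_GET_VALUE 1
#define TOY_SET_VALUE 2

static long toy_value;

/* Returns 0 on success or a negated errno, like an ioctl handler would. */
static long toy_ioctl(unsigned int cmd, long *arg)
{
	assert(toy_value >= 0); /* invariant kept by TOY_SET_VALUE */

	if (arg == NULL)
		return -EINVAL;

	switch (cmd) {
	case TOY_GET_VALUE:
		*arg = toy_value;
		return 0;
	case TOY_SET_VALUE:
		if (*arg < 0)
			return -EINVAL; /* reject nonsensical input */
		toy_value = *arg;
		return 0;
	default:
		return -ENOTTY; /* unknown command */
	}
}
```

Callers only ever see 0 or a negative errno value, so success can be tested
with a simple ``== 0`` and failures can be passed up unchanged.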
180
181After you have slept you should check if a signal occurred: the Unix/Linux
182way of handling signals is to temporarily exit the system call with the
183``-ERESTARTSYS`` error. The system call entry code will switch back to
184user context, process the signal handler and then your system call will
185be restarted (unless the user disabled that). So you should be prepared
186to process the restart, e.g. if you're in the middle of manipulating
187some data structure.
188
189::
190
191 if (signal_pending(current))
192 return -ERESTARTSYS;
193
194
195If you're doing longer computations: first think userspace. If you
196**really** want to do it in kernel you should regularly check if you need
197to give up the CPU (remember there is cooperative multitasking per CPU).
198Idiom::
199
200 cond_resched(); /* Will sleep */
201
202
203A short note on interface design: the UNIX system call motto is "Provide
204mechanism not policy".
205
206Recipes for Deadlock
207====================
208
209You cannot call any routines which may sleep, unless:
210
211- You are in user context.
212
213- You do not own any spinlocks.
214
215- You have interrupts enabled (actually, Andi Kleen says that the
216 scheduling code will enable them for you, but that's probably not
217 what you wanted).
218
219Note that some functions may sleep implicitly: common ones are the user
220space access functions (\*_user) and memory allocation functions
221without ``GFP_ATOMIC``.
222
223You should always compile your kernel with ``CONFIG_DEBUG_ATOMIC_SLEEP`` on,
224and it will warn you if you break these rules. If you **do** break the
225rules, you will eventually lock up your box.
226
227Really.
228
229Common Routines
230===============
231
232:c:func:`printk()`
233------------------
234
235Defined in ``include/linux/printk.h``
236
237:c:func:`printk()` feeds kernel messages to the console, dmesg, and
238the syslog daemon. It is useful for debugging and reporting errors, and
239can be used inside interrupt context, but use with caution: a machine
240which has its console flooded with printk messages is unusable. It uses
241a format string mostly compatible with ANSI C printf, and C string
242concatenation to give it a first "priority" argument::
243
244 printk(KERN_INFO "i = %u\n", i);
245
246
247See ``include/linux/kern_levels.h`` for other ``KERN_`` values; these are
248interpreted by syslog as the level. Special case: for printing an IP
249address use::
250
251 __be32 ipaddress;
252 printk(KERN_INFO "my ip: %pI4\n", &ipaddress);
253
254
255:c:func:`printk()` internally uses a 1K buffer and does not catch
256overruns. Make sure that will be enough.
257
258.. note::
259
260 You will know when you are a real kernel hacker when you start
261 typoing printf as printk in your user programs :)
262
263.. note::
264
265 Another sidenote: the original Unix Version 6 sources had a comment
266 on top of its printf function: "Printf should not be used for
267 chit-chat". You should follow that advice.
268
269:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
270---------------------------------------------------------------------------------------------------
271
272Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``
273
274**[SLEEPS]**
275
276:c:func:`put_user()` and :c:func:`get_user()` are used to get
277and put single values (such as an int, char, or long) from and to
278userspace. A pointer into userspace should never be simply dereferenced:
279data should be copied using these routines. Both return ``-EFAULT`` or
2800.
281
282:c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
283more general: they copy an arbitrary amount of data to and from
284userspace.
285
286.. warning::
287
288 Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
289 return the amount of uncopied data (ie. 0 still means success).
290
291[Yes, this moronic interface makes me cringe. The flamewar comes up
292every year or so. --RR.]
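
Because of that difference, callers usually turn any non-zero remainder into
``-EFAULT`` themselves. The userspace sketch below imitates the convention
only: ``toy_copy_to_user`` is a made-up stand-in that pretends just the first
``accessible`` bytes of the destination can be written, and ``toy_read`` is a
hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for copy_to_user(): copies at most
 * `accessible` bytes and returns how many bytes were NOT copied.
 */
static size_t toy_copy_to_user(void *dst, const void *src, size_t n,
			       size_t accessible)
{
	size_t done = n < accessible ? n : accessible;

	memcpy(dst, src, done);
	return n - done;
}

/* Caller-side idiom: any uncopied remainder becomes -EFAULT. */
static long toy_read(void *ubuf, size_t accessible)
{
	static const char msg[] = "hello";

	assert(ubuf != NULL);
	if (toy_copy_to_user(ubuf, msg, sizeof(msg), accessible))
		return -EFAULT;
	return 0;
}
```

Note that the helper's return value is a byte count, never an errno; the
translation to ``-EFAULT`` happens in the caller.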
293
294The functions may sleep implicitly. They should never be called outside
295user context (it makes no sense), with interrupts disabled, or a
296spinlock held.
297
298:c:func:`kmalloc()`/:c:func:`kfree()`
299-------------------------------------
300
301Defined in ``include/linux/slab.h``
302
303**[MAY SLEEP: SEE BELOW]**
304
305These routines are used to dynamically request pointer-aligned chunks of
306memory, like malloc and free do in userspace, but
307:c:func:`kmalloc()` takes an extra flag word. Important values:
308
309``GFP_KERNEL``
310 May sleep and swap to free memory. Only allowed in user context, but
311 is the most reliable way to allocate memory.
312
313``GFP_ATOMIC``
314 Don't sleep. Less reliable than ``GFP_KERNEL``, but may be called
315 from interrupt context. You should **really** have a good
316 out-of-memory error-handling strategy.
317
318``GFP_DMA``
319 Allocate ISA DMA lower than 16MB. If you don't know what that is you
320 don't need it. Very unreliable.
321
322If you see a "sleeping function called from invalid context" warning
323message, then maybe you called a sleeping allocation function from
324interrupt context without ``GFP_ATOMIC``. You should really fix that.
325Run, don't walk.
326
327If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
328``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
329(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
3301 for double page, 2 for four pages etc.) and the same memory priority
331flag word as above.
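
The order argument is simply a power-of-two page count. Here is a minimal
userspace sketch of that arithmetic, assuming 4096-byte pages (the real
``PAGE_SIZE`` is per-architecture) and a made-up helper name,
``sketch_order_for_size()``. In real kernel code you would reach for the
kernel's own ``get_order()`` rather than rolling your own.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size. Order 0 is one page,
 * order 1 is two pages, order 2 is four pages, and so on.
 */
static unsigned int sketch_order_for_size(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Note that a request for three pages rounds up to order 2 (four pages), so
odd sizes waste memory.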
332
333If you are allocating more than a page worth of bytes you can use
334:c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
335map. This block is not contiguous in physical memory, but the MMU makes
336it look like it is for you (so it'll only look contiguous to the CPUs,
337not to external device drivers). If you really need large physically
338contiguous memory for some weird device, you have a problem: it is
339poorly supported in Linux because after some time memory fragmentation
340in a running kernel makes it hard. The best way is to allocate the block
341early in the boot process via the :c:func:`alloc_bootmem()`
342routine.
343
344Before inventing your own cache of often-used objects consider using a
345slab cache in ``include/linux/slab.h``.
346
347:c:func:`current()`
348-------------------
349
350Defined in ``include/asm/current.h``
351
352This global variable (really a macro) contains a pointer to the current
353task structure, so is only valid in user context. For example, when a
354process makes a system call, this will point to the task structure of
355the calling process. It is **not NULL** in interrupt context.
356
357:c:func:`mdelay()`/:c:func:`udelay()`
358-------------------------------------
359
360Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``
361
362The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
363used for small pauses. Do not use large values with them as you risk
364overflow - the helper function :c:func:`mdelay()` is useful here, or
365consider :c:func:`msleep()`.
366
367:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
368-----------------------------------------------------------------------------------------------
369
370Defined in ``include/asm/byteorder.h``
371
372The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
373by 64 or 16, and the "be" can be replaced by "le") are the general way
374to do endian conversions in the kernel: they return the converted value.
375All variations supply the reverse as well:
376:c:func:`be32_to_cpu()`, etc.
377
378There are two major variations of these functions. The pointer
379variation, such as :c:func:`cpu_to_be32p()`, takes a pointer
380to the given type and returns the converted value. The other variation
381is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
382converts the value referred to by the pointer and returns void.
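
The three calling conventions are easier to see side by side. The following
is a userspace sketch only: the ``sketch_`` names are invented stand-ins,
not the kernel's implementations, and the byte shuffling is written out by
hand so it behaves the same on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value variant, in the style of cpu_to_be32(): returns the converted value. */
static uint32_t sketch_cpu_to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v,
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));	/* memory now holds big-endian order */
	return out;
}

/* Pointer variant, in the style of cpu_to_be32p(): reads through the pointer. */
static uint32_t sketch_cpu_to_be32p(const uint32_t *p)
{
	assert(p != NULL);
	return sketch_cpu_to_be32(*p);
}

/* In-situ variant, in the style of cpu_to_be32s(): converts in place. */
static void sketch_cpu_to_be32s(uint32_t *p)
{
	assert(p != NULL);
	*p = sketch_cpu_to_be32(*p);
}

/* The reverse direction, in the style of be32_to_cpu(). */
static uint32_t sketch_be32_to_cpu(uint32_t be)
{
	uint8_t b[4];

	memcpy(b, &be, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```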
383
384:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
385--------------------------------------------------------
386
387Defined in ``include/linux/irqflags.h``
388
389These routines disable hard interrupts on the local CPU, and restore
390them. They are reentrant, saving the previous state in their one
391``unsigned long flags`` argument. If you know that interrupts are
392enabled, you can simply use :c:func:`local_irq_disable()` and
393:c:func:`local_irq_enable()`.
394
395.. _local_bh_disable:
396
397:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
398--------------------------------------------------------
399
400Defined in ``include/linux/bottom_half.h``
401
402
403These routines disable soft interrupts on the local CPU, and restore
404them. They are reentrant; if soft interrupts were disabled before, they
405will still be disabled after this pair of functions has been called.
406They prevent softirqs and tasklets from running on the current CPU.
407
408:c:func:`smp_processor_id()`
409----------------------------
410
411Defined in ``include/linux/smp.h``
412
413:c:func:`get_cpu()` disables preemption (so you won't suddenly get
414moved to another CPU) and returns the current processor number, between
4150 and ``NR_CPUS``. Note that the CPU numbers are not necessarily
416continuous. You return it again with :c:func:`put_cpu()` when you
417are done.
418
419If you know you cannot be preempted by another task (ie. you are in
420interrupt context, or have preemption disabled) you can use
421smp_processor_id().
422
423``__init``/``__exit``/``__initdata``
424------------------------------------
425
426Defined in ``include/linux/init.h``
427
428After boot, the kernel frees up a special section; functions marked with
429``__init`` and data structures marked with ``__initdata`` are dropped
430after boot is complete: similarly modules discard this memory after
431initialization. ``__exit`` is used to declare a function which is only
432required on exit: the function will be dropped if this file is not
433compiled as a module. See the header file for use. Note that it makes no
434sense for a function marked with ``__init`` to be exported to modules
435with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()` - this
436will break.
437
438:c:func:`__initcall()`/:c:func:`module_init()`
439----------------------------------------------
440
441Defined in ``include/linux/init.h`` / ``include/linux/module.h``
442
443Many parts of the kernel are well served as a module
444(dynamically-loadable parts of the kernel). Using the
445:c:func:`module_init()` and :c:func:`module_exit()` macros it
446is easy to write code without #ifdefs which can operate either as a module
447or built into the kernel.
448
449The :c:func:`module_init()` macro defines which function is to be
450called at module insertion time (if the file is compiled as a module),
451or at boot time: if the file is not compiled as a module the
452:c:func:`module_init()` macro becomes equivalent to
453:c:func:`__initcall()`, which through linker magic ensures that
454the function is called on boot.
455
456The function can return a negative error number to cause module loading
457to fail (unfortunately, this has no effect if the module is compiled
458into the kernel). This function is called in user context with
459interrupts enabled, so it can sleep.
460
461:c:func:`module_exit()`
462-----------------------
463
464
465Defined in ``include/linux/module.h``
466
467This macro defines the function to be called at module removal time (or
468never, in the case of the file compiled into the kernel). It will only
469be called if the module usage count has reached zero. This function can
470also sleep, but cannot fail: everything must be cleaned up by the time
471it returns.
472
473Note that this macro is optional: if it is not present, your module will
474not be removable (except for 'rmmod -f').
475
476:c:func:`try_module_get()`/:c:func:`module_put()`
477-------------------------------------------------
478
479Defined in ``include/linux/module.h``
480
481These manipulate the module usage count, to protect against removal (a
482module also can't be removed if another module uses one of its exported
483symbols: see below). Before calling into module code, you should call
484:c:func:`try_module_get()` on that module: if it fails, then the
485module is being removed and you should act as if it wasn't there.
486Otherwise, you can safely enter the module, and call
487:c:func:`module_put()` when you're finished.
488
489Most registerable structures have an owner field, such as in the
490:c:type:`struct file_operations <file_operations>` structure.
491Set this field to the macro ``THIS_MODULE``.
492
493Wait Queues ``include/linux/wait.h``
494====================================
495
496**[SLEEPS]**
497
498A wait queue is used to wait for someone to wake you up when a certain
499condition is true. They must be used carefully to ensure there is no
500race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
501which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
502referring to themselves, and place that in the queue.
503
504Declaring
505---------
506
507You declare a ``wait_queue_head_t`` using the
508:c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
509:c:func:`init_waitqueue_head()` routine in your initialization
510code.
511
512Queuing
513-------
514
515Placing yourself in the waitqueue is fairly complex, because you must
516put yourself in the queue before checking the condition. There is a
517macro to do this: :c:func:`wait_event_interruptible()`
518(``include/linux/wait.h``). The first argument is the wait queue head, and
519the second is an expression which is evaluated; the macro returns 0 when
520this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
521:c:func:`wait_event()` version ignores signals.
522
523Waking Up Queued Tasks
524----------------------
525
526Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
527up every process in the queue. The exception is if one has
528``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
529not be woken. There are other variants of this basic function available
530in the same header.
531
532Atomic Operations
533=================
534
535Certain operations are guaranteed atomic on all platforms. The first
536class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
537this contains a signed integer (at least 32 bits long), and you must use
538these functions to manipulate or read :c:type:`atomic_t` variables.
539:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
540the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
541:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
542:c:func:`atomic_dec_and_test()` (returns true if it was
543decremented to zero).
544
545Yes. It returns true (i.e. != 0) if the atomic variable is zero.
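
The usual customer for this is a reference count: whoever drops the last
reference gets ``true`` and does the cleanup. A userspace sketch using C11
``<stdatomic.h>`` in place of ``atomic_t``; the struct and the ``sketch_``
functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object. */
struct sketch_obj {
	atomic_int refcount;
	bool released;
};

/* Userspace analogue of atomic_dec_and_test(): true if the result is zero. */
static bool sketch_dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(v, 1) == 1;
}

static void sketch_put(struct sketch_obj *obj)
{
	assert(atomic_load(&obj->refcount) > 0);
	if (sketch_dec_and_test(&obj->refcount))
		obj->released = true;	/* last reference gone: clean up here */
}
```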
546
547Note that these functions are slower than normal arithmetic, and so
548should not be used unnecessarily.
549
550The second class of atomic operations is atomic bit operations on an
551``unsigned long``, defined in ``include/linux/bitops.h``. These
552operations generally take a pointer to the bit pattern, and a bit
553number: 0 is the least significant bit. :c:func:`set_bit()`,
554:c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
555and flip the given bit. :c:func:`test_and_set_bit()`,
556:c:func:`test_and_clear_bit()` and
557:c:func:`test_and_change_bit()` do the same thing, except return
558true if the bit was previously set; these are particularly useful for
559atomically setting flags.
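
To see why the test-and-modify variants are handy for flags, here is a
userspace sketch built on C11 atomics. The ``sketch_`` names are invented
and this is not how the kernel implements its bitops; it only mirrors the
"return the old bit" behaviour described above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Analogue of test_and_set_bit(); bit 0 is the least significant bit. */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < SKETCH_BITS_PER_LONG);
	mask = 1UL << nr;
	/* atomic_fetch_or() returns the old word, so test the old bit. */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Analogue of test_and_clear_bit(). */
static bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < SKETCH_BITS_PER_LONG);
	mask = 1UL << nr;
	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}
```

If several contexts race to set the same bit, exactly one of them sees
``false`` come back, and that one owns the flag.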
560
561It is possible to call these operations with bit indices greater than
562``BITS_PER_LONG``. The resulting behavior is strange on big-endian
563platforms though so it is a good idea not to do this.
564
565Symbols
566=======
567
568Within the kernel proper, the normal linking rules apply (ie. unless a
569symbol is declared to be file scope with the ``static`` keyword, it can
570be used anywhere in the kernel). However, for modules, a special
571exported symbol table is kept which limits the entry points to the
572kernel proper. Modules can also export symbols.
573
574:c:func:`EXPORT_SYMBOL()`
575-------------------------
576
577Defined in ``include/linux/export.h``
578
579This is the classic method of exporting a symbol: dynamically loaded
580modules will be able to use the symbol as normal.
581
582:c:func:`EXPORT_SYMBOL_GPL()`
583-----------------------------
584
585Defined in ``include/linux/export.h``
586
587Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
588exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
589modules with a :c:func:`MODULE_LICENSE()` that specifies a GPL
590compatible license. It implies that the function is considered an
591internal implementation issue, and not really an interface. Some
592maintainers and developers may however require EXPORT_SYMBOL_GPL()
593when adding any new APIs or functionality.
594
595Routines and Conventions
596========================
597
598Double-linked lists ``include/linux/list.h``
599--------------------------------------------
600
601There used to be three sets of linked-list routines in the kernel
602headers, but this one is the winner. If you don't have some particular
603pressing need for a singly linked list, it's a good choice.
604
605In particular, :c:func:`list_for_each_entry()` is useful.
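
The trick is that the list head lives inside your structure, and the
iterator works back from the head to the enclosing entry. A cut-down
userspace sketch of the idea follows; every ``sketch_`` name is a stand-in
written for this example (the real macros are in ``include/linux/list.h``),
and it relies on the GNU ``typeof`` extension.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Visit every entry whose embedded list head is called 'member'. */
#define sketch_list_for_each_entry(pos, head, member) \
	for (pos = sketch_container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = sketch_container_of(pos->member.next, __typeof__(*pos), member))

static void sketch_list_init(struct sketch_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list_head *item,
				 struct sketch_list_head *head)
{
	assert(item != NULL && head != NULL);
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical payload with the list head embedded in it. */
struct sketch_item {
	int value;
	struct sketch_list_head node;
};

static int sketch_sum(struct sketch_list_head *head)
{
	struct sketch_item *it;
	int sum = 0;

	sketch_list_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```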
606
607Return Conventions
608------------------
609
610For code called in user context, it's very common to defy C convention,
611and return 0 for success, and a negative error number (eg. ``-EFAULT``) for
612failure. This can be unintuitive at first, but it's fairly widespread in
613the kernel.
614
615Using :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
616negative error number into a pointer, and :c:func:`IS_ERR()` and
617:c:func:`PTR_ERR()` to get it back out again, avoids a separate
618pointer parameter for the error number. Icky, but in a good way.
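
Here is roughly what that looks like from the caller's side, as a userspace
sketch. The ``sketch_`` helpers are stand-ins modelled on the kernel macros,
not copies of them; the 4095 limit mirrors the kernel's ``MAX_ERRNO``, and
the lookup function and its table are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative error number as a pointer value. */
static void *sketch_err_ptr(intptr_t err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)err;
}

/* Get the error number back out of the pointer. */
static intptr_t sketch_ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}

/* The top SKETCH_MAX_ERRNO addresses are treated as error codes. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int sketch_table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup: a valid pointer on success, encoded -ENOENT if not. */
static int *sketch_lookup(int idx)
{
	if (idx < 0 || idx >= 4)
		return sketch_err_ptr(-ENOENT);
	return &sketch_table[idx];
}
```

The caller checks the result with the ``IS_ERR()`` analogue before
dereferencing it, and only then decides whether to pull out the error code.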
619
620Breaking Compilation
621--------------------
622
623Linus and the other developers sometimes change function or structure
624names in development kernels; this is not done just to keep everyone on
625their toes: it reflects a fundamental change (eg. can no longer be
626called with interrupts on, or does extra checks, or doesn't do checks
627which were caught before). Usually this is accompanied by a fairly
628complete note to the linux-kernel mailing list; search the archive.
629Simply doing a global replace on the file usually makes things **worse**.
630
631Initializing structure members
632------------------------------
633
634The preferred method of initializing structures is to use designated
635initialisers, as defined by ISO C99, eg::
636
637 static struct block_device_operations opt_fops = {
638 .open = opt_open,
639 .release = opt_release,
640 .ioctl = opt_ioctl,
641 .check_media_change = opt_media_change,
642 };
643
644
645This makes it easy to grep for, and makes it clear which structure
646fields are set. You should do this because it looks cool.
647
648GNU Extensions
649--------------
650
651GNU Extensions are explicitly allowed in the Linux kernel. Note that
652some of the more complex ones are not very well supported, due to lack
653of general use, but the following are considered standard (see the GCC
654info page section "C Extensions" for more details - Yes, really the info
655page, the man page is only a short summary of the stuff in info).
656
657- Inline functions
658
659- Statement expressions (ie. the ({ and }) constructs).
660
661- Declaring attributes of a function / variable / type
662 (__attribute__)
663
664- typeof
665
666- Zero length arrays
667
668- Macro varargs
669
670- Arithmetic on void pointers
671
672- Non-Constant initializers
673
674- Assembler Instructions (not outside arch/ and include/asm/)
675
676- Function names as strings (__func__).
677
678- __builtin_constant_p()
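
As a taste of why the first few matter, here is a small sketch combining a
statement expression with typeof so that a macro evaluates each argument
exactly once. It needs GCC or a compatible compiler; ``sketch_max()`` and
the helper functions are hypothetical names, not kernel macros.

```c
#include <assert.h>

/* Each argument is copied into a local, so side effects happen once. */
#define sketch_max(a, b) ({ \
	__typeof__(a) _a = (a); \
	__typeof__(b) _b = (b); \
	_a > _b ? _a : _b; \
})

static int sketch_calls;

/* The side effect makes any double evaluation visible. */
static int sketch_next(void)
{
	return ++sketch_calls;
}

static int sketch_demo(void)
{
	int before = sketch_calls;
	int m = sketch_max(sketch_next(), 0);

	assert(sketch_calls == before + 1);	/* evaluated once, not twice */
	return m;
}
```

A plain ``((a) > (b) ? (a) : (b))`` macro would have called
``sketch_next()`` twice.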
679
680Be wary when using long long in the kernel: the code gcc generates for
681it is horrible and, worse, division and multiplication do not work on
682i386 because the GCC runtime functions for them are missing from the
683kernel environment.
684
685C++
686---
687
688Using C++ in the kernel is usually a bad idea, because the kernel does
689not provide the necessary runtime environment and the include files are
690not tested for it. It is still possible, but not recommended. If you
691really want to do this, forget about exceptions at least.
692
693#if
694---
695
696It is generally considered cleaner to use macros in header files (or at
697the top of .c files) to abstract away functions rather than using ``#if``
698pre-processor statements throughout the source code.
699
700Putting Your Stuff in the Kernel
701================================
702
703In order to get your stuff into shape for official inclusion, or even to
704make a neat patch, there's administrative work to be done:
705
706- Figure out whose pond you've been pissing in. Look at the top of the
707 source files, inside the ``MAINTAINERS`` file, and last of all in the
708 ``CREDITS`` file. You should coordinate with this person to make sure
709 you're not duplicating effort, or trying something that's already
710 been rejected.
711
712 Make sure you put your name and email address at the top of any files
713 you create or mangle significantly. This is the first place people
714 will look when they find a bug, or when **they** want to make a change.
715
716- Usually you want a configuration option for your kernel hack. Edit
717 ``Kconfig`` in the appropriate directory. The Config language is
718 simple to use by cut and paste, and there's complete documentation in
719 ``Documentation/kbuild/kconfig-language.txt``.
720
721 In your description of the option, make sure you address both the
722 expert user and the user who knows nothing about your feature.
723 Mention incompatibilities and issues here. **Definitely** end your
724 description with “if in doubt, say N” (or, occasionally, “Y”); this
725 is for people who have no idea what you are talking about.
726
727- Edit the ``Makefile``: the CONFIG variables are exported here so you
728 can usually just add a "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
729 is documented in ``Documentation/kbuild/makefiles.txt``.
730
731- Put yourself in ``CREDITS`` if you've done something noteworthy,
732 usually beyond a single file (your name should be at the top of the
733 source files anyway). ``MAINTAINERS`` means you want to be consulted
734 when changes are made to a subsystem, and hear about bugs; it implies
735 a more-than-passing commitment to some part of the code.
736
737- Finally, don't forget to read
738 ``Documentation/process/submitting-patches.rst`` and possibly
739 ``Documentation/process/submitting-drivers.rst``.
740
741Kernel Cantrips
742===============
743
744Some favorites from browsing the source. Feel free to add to this list.
745
746``arch/x86/include/asm/delay.h``::
747
748 #define ndelay(n) (__builtin_constant_p(n) ? \
749 ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
750 __ndelay(n))
751
752
753``include/linux/fs.h``::
754
755 /*
756 * Kernel pointers have redundant information, so we can use a
757 * scheme where we can return either an error code or a dentry
758 * pointer with the same return value.
759 *
760 * This should be a per-architecture thing, to allow different
761 * error and pointer decisions.
762 */
763 #define ERR_PTR(err) ((void *)((long)(err)))
764 #define PTR_ERR(ptr) ((long)(ptr))
765 #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
766
767``arch/x86/include/asm/uaccess_32.h:``::
768
769 #define copy_to_user(to,from,n) \
770 (__builtin_constant_p(n) ? \
771 __constant_copy_to_user((to),(from),(n)) : \
772 __generic_copy_to_user((to),(from),(n)))
773
774
775``arch/sparc/kernel/head.S:``::
776
777 /*
778 * Sun people can't spell worth damn. "compatability" indeed.
779 * At least we *know* we can't spell, and use a spell-checker.
780 */
781
782 /* Uh, actually Linus it is I who cannot spell. Too much murky
783 * Sparc assembly will do this to ya.
784 */
785 C_LABEL(cputypvar):
786 .asciz "compatibility"
787
788 /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
789 .align 4
790 C_LABEL(cputypvar_sun4m):
791 .asciz "compatible"
792
793
794``arch/sparc/lib/checksum.S:``::
795
796 /* Sun, you just can't beat me, you just can't. Stop trying,
797 * give up. I'm serious, I am going to kick the living shit
798 * out of you, game over, lights out.
799 */
800
801
802Thanks
803======
804
805Thanks to Andi Kleen for the idea, answering my questions, fixing my
806mistakes, filling content, etc. Philipp Rumpf for more spelling and
807clarity fixes, and some excellent non-obvious points. Werner Almesberger
808for giving me a great summary of :c:func:`disable_irq()`, and Jes
809Sorensen and Andrea Arcangeli added caveats. Michael Elizabeth Chastain
810for checking and adding to the Configure section. Telsa Gwynne for
811teaching me DocBook.
diff --git a/Documentation/kernel-hacking/index.rst b/Documentation/kernel-hacking/index.rst
new file mode 100644
index 000000000000..fcb0eda3cca3
--- /dev/null
+++ b/Documentation/kernel-hacking/index.rst
@@ -0,0 +1,9 @@
1=====================
2Kernel Hacking Guides
3=====================
4
5.. toctree::
6 :maxdepth: 2
7
8 hacking
9 locking
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
new file mode 100644
index 000000000000..f937c0fd11aa
--- /dev/null
+++ b/Documentation/kernel-hacking/locking.rst
@@ -0,0 +1,1446 @@
1===========================
2Unreliable Guide To Locking
3===========================
4
5:Author: Rusty Russell
6
7Introduction
8============
9
10Welcome, to Rusty's Remarkably Unreliable Guide to Kernel Locking
11issues. This document describes the locking systems in the Linux Kernel
12in 2.6.
13
14With the wide availability of HyperThreading, and preemption in the
15Linux Kernel, everyone hacking on the kernel needs to know the
16fundamentals of concurrency and locking for SMP.
17
18The Problem With Concurrency
19============================
20
21(Skip this if you know what a Race Condition is).
22
23In a normal program, you can increment a counter like so:
24
25::
26
27 very_important_count++;
28
29
30This is what you would expect to happen:
31
32
33.. table:: Expected Results
34
35 +------------------------------------+------------------------------------+
36 | Instance 1 | Instance 2 |
37 +====================================+====================================+
38 | read very_important_count (5) | |
39 +------------------------------------+------------------------------------+
40 | add 1 (6) | |
41 +------------------------------------+------------------------------------+
42 | write very_important_count (6) | |
43 +------------------------------------+------------------------------------+
44 | | read very_important_count (6) |
45 +------------------------------------+------------------------------------+
46 | | add 1 (7) |
47 +------------------------------------+------------------------------------+
48 | | write very_important_count (7) |
49 +------------------------------------+------------------------------------+
50
51This is what might happen:
52
53.. table:: Possible Results
54
55 +------------------------------------+------------------------------------+
56 | Instance 1 | Instance 2 |
57 +====================================+====================================+
58 | read very_important_count (5) | |
59 +------------------------------------+------------------------------------+
60 | | read very_important_count (5) |
61 +------------------------------------+------------------------------------+
62 | add 1 (6) | |
63 +------------------------------------+------------------------------------+
64 | | add 1 (6) |
65 +------------------------------------+------------------------------------+
66 | write very_important_count (6) | |
67 +------------------------------------+------------------------------------+
68 | | write very_important_count (6) |
69 +------------------------------------+------------------------------------+
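
The two schedules in the tables above can be replayed deterministically,
without real threads, by splitting the increment into its read, add and
write steps. This is an illustrative userspace sketch, and all of the names
in it are invented: ``who[i]`` says which instance performs its next step at
time ``i``.

```c
#include <assert.h>
#include <stddef.h>

/* One "instance" doing count++ as three separate steps. */
struct sketch_instance {
	int reg;	/* private register holding the working value */
};

enum sketch_step { STEP_READ, STEP_ADD, STEP_WRITE };

static void sketch_run(struct sketch_instance *inst, enum sketch_step step,
		       int *shared)
{
	switch (step) {
	case STEP_READ:
		inst->reg = *shared;
		break;
	case STEP_ADD:
		inst->reg += 1;
		break;
	case STEP_WRITE:
		*shared = inst->reg;
		break;
	}
}

/* Replay a schedule and return the final value of the shared counter. */
static int sketch_replay(const int *who, size_t n, int start)
{
	struct sketch_instance inst[2] = { { 0 }, { 0 } };
	int next[2] = { STEP_READ, STEP_READ };
	int shared = start;
	size_t i;

	for (i = 0; i < n; i++) {
		int w = who[i];

		assert(w == 0 || w == 1);
		assert(next[w] <= STEP_WRITE);
		sketch_run(&inst[w], (enum sketch_step)next[w], &shared);
		next[w]++;
	}
	return shared;
}
```

Starting from 5, the schedule 0,0,0,1,1,1 (the first table) ends at 7, while
the interleaved schedule 0,1,0,1,0,1 (the second table) ends at 6: one
increment is lost.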
70
71
72Race Conditions and Critical Regions
73------------------------------------
74
75This overlap, where the result depends on the relative timing of
76multiple tasks, is called a race condition. The piece of code containing
77the concurrency issue is called a critical region. And especially since
78Linux started running on SMP machines, they became one of the major
79issues in kernel design and implementation.
80
81Preemption can have the same effect, even if there is only one CPU: by
82preempting one task during the critical region, we have exactly the same
83race condition. In this case the thread which preempts might run the
84critical region itself.
85
86The solution is to recognize when these simultaneous accesses occur, and
87use locks to make sure that only one instance can enter the critical
88region at any time. There are many friendly primitives in the Linux
89kernel to help you do this. And then there are the unfriendly
90primitives, but I'll pretend they don't exist.
91
92Locking in the Linux Kernel
93===========================
94
95If I could give you one piece of advice: never sleep with anyone crazier
96than yourself. But if I had to give you advice on locking: **keep it
97simple**.
98
99Be reluctant to introduce new locks.
100
101Strangely enough, this last one is the exact reverse of my advice when
102you **have** slept with someone crazier than yourself. And you should
103think about getting a big dog.
104
105Two Main Types of Kernel Locks: Spinlocks and Mutexes
106-----------------------------------------------------
107
108There are two main types of kernel locks. The fundamental type is the
109spinlock (``include/asm/spinlock.h``), which is a very simple
110single-holder lock: if you can't get the spinlock, you keep trying
111(spinning) until you can. Spinlocks are very small and fast, and can be
112used anywhere.
113
114The second type is a mutex (``include/linux/mutex.h``): it is like a
115spinlock, but you may block holding a mutex. If you can't lock a mutex,
116your task will suspend itself, and be woken up when the mutex is
117released. This means the CPU can do something else while you are
118waiting. There are many cases when you simply can't sleep (see
119`What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__),
120and so have to use a spinlock instead.
121
122Neither type of lock is recursive: see
123`Deadlock: Simple and Advanced <#deadlock>`__.
124
125Locks and Uniprocessor Kernels
126------------------------------
127
128For kernels compiled without ``CONFIG_SMP``, and without
129``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
130design decision: when no-one else can run at the same time, there is no
131reason to have a lock.
132
133If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT``
134is set, then spinlocks simply disable preemption, which is sufficient to
135prevent any races. For most purposes, we can think of preemption as
136equivalent to SMP, and not worry about it separately.
137
138You should always test your locking code with ``CONFIG_SMP`` and
139``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box,
140because it will still catch some kinds of locking bugs.
141
142Mutexes still exist, because they are required for synchronization
143between user contexts, as we will see below.
144
145Locking Only In User Context
146----------------------------
147
148If you have a data structure which is only ever accessed from user
149context, then you can use a simple mutex (``include/linux/mutex.h``) to
150protect it. This is the most trivial case: you initialize the mutex.
151Then you can call :c:func:`mutex_lock_interruptible()` to grab the
152mutex, and :c:func:`mutex_unlock()` to release it. There is also a
153:c:func:`mutex_lock()`, which should be avoided, because it will
154not return if a signal is received.
155
156Example: ``net/netfilter/nf_sockopt.c`` allows registration of new
157:c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with
158:c:func:`nf_register_sockopt()`. Registration and de-registration
159are only done on module load and unload (and boot time, where there is
160no concurrency), and the list of registrations is only consulted for an
161unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system
162call. The ``nf_sockopt_mutex`` is perfect to protect this, especially
163since the setsockopt and getsockopt calls may well sleep.
164
165Locking Between User Context and Softirqs
166-----------------------------------------
167
168If a softirq shares data with user context, you have two problems.
169Firstly, the current user context can be interrupted by a softirq, and
170secondly, the critical region could be entered from another CPU. This is
171where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is
172used. It disables softirqs on that CPU, then grabs the lock.
173:c:func:`spin_unlock_bh()` does the reverse. (The '_bh' suffix is
174a historical reference to "Bottom Halves", the old name for software
175interrupts. It should really be called 'spin_lock_softirq()' in a
176perfect world).
177
178Note that you can also use :c:func:`spin_lock_irq()` or
179:c:func:`spin_lock_irqsave()` here, which stop hardware interrupts
180as well: see `Hard IRQ Context <#hardirq-context>`__.
181
182This works perfectly for UP as well: the spin lock vanishes, and this
183macro simply becomes :c:func:`local_bh_disable()`
184(``include/linux/interrupt.h``), which protects you from the softirq
185being run.
186
187Locking Between User Context and Tasklets
188-----------------------------------------
189
190This is exactly the same as above, because tasklets are actually run
191from a softirq.
192
193Locking Between User Context and Timers
194---------------------------------------
195
196This, too, is exactly the same as above, because timers are actually run
197from a softirq. From a locking point of view, tasklets and timers are
198identical.
199
200Locking Between Tasklets/Timers
201-------------------------------
202
203Sometimes a tasklet or timer might want to share data with another
204tasklet or timer.
205
206The Same Tasklet/Timer
207~~~~~~~~~~~~~~~~~~~~~~
208
209Since a tasklet is never run on two CPUs at once, you don't need to
210worry about your tasklet being reentrant (running twice at once), even
211on SMP.
212
213Different Tasklets/Timers
214~~~~~~~~~~~~~~~~~~~~~~~~~
215
216If another tasklet/timer wants to share data with your tasklet or timer,
217you will both need to use :c:func:`spin_lock()` and
218:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
219unnecessary here, as you are already in a tasklet, and none will be run
220on the same CPU.
221
222Locking Between Softirqs
223------------------------
224
225Often a softirq might want to share data with itself or a tasklet/timer.
226
227The Same Softirq
228~~~~~~~~~~~~~~~~
229
230The same softirq can run on the other CPUs: you can use a per-CPU array
231(see `Per-CPU Data <#per-cpu>`__) for better performance. If you're
232going so far as to use a softirq, you probably care about scalable
233performance enough to justify the extra complexity.
234
235You'll need to use :c:func:`spin_lock()` and
236:c:func:`spin_unlock()` for shared data.
237
238Different Softirqs
239~~~~~~~~~~~~~~~~~~
240
241You'll need to use :c:func:`spin_lock()` and
242:c:func:`spin_unlock()` for shared data, whether it be a timer,
243tasklet, different softirq or the same or another softirq: any of them
244could be running on a different CPU.
245
246Hard IRQ Context
247================
248
249Hardware interrupts usually communicate with a tasklet or softirq.
250Frequently this involves putting work in a queue, which the softirq will
251take out.
252
253Locking Between Hard IRQ and Softirqs/Tasklets
254----------------------------------------------
255
256If a hardware irq handler shares data with a softirq, you have two
257concerns. Firstly, the softirq processing can be interrupted by a
258hardware interrupt, and secondly, the critical region could be entered
259by a hardware interrupt on another CPU. This is where
260:c:func:`spin_lock_irq()` is used. It is defined to disable
261interrupts on that cpu, then grab the lock.
262:c:func:`spin_unlock_irq()` does the reverse.
263
264The irq handler does not need to use :c:func:`spin_lock_irq()`, because
265the softirq cannot run while the irq handler is running: it can use
266:c:func:`spin_lock()`, which is slightly faster. The only exception
267would be if a different hardware irq handler uses the same lock:
268:c:func:`spin_lock_irq()` will stop that from interrupting us.
269
270This works perfectly for UP as well: the spin lock vanishes, and this
271macro simply becomes :c:func:`local_irq_disable()`
272(``include/asm/smp.h``), which protects you from the softirq/tasklet/BH
273being run.
274
:c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a
variant which saves whether interrupts were on or off in a flags word,
which is passed to :c:func:`spin_unlock_irqrestore()`. This means
that the same code can be used inside a hard irq handler (where
interrupts are already off) and in softirqs (where the irq disabling is
required).
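
To see why saving the flags word matters, here is a minimal userspace
sketch. It is not kernel code: a single boolean stands in for "interrupts
enabled on this CPU", and every ``model_*`` and ``helper_*`` name is
invented for illustration. It only shows the difference between
unconditionally re-enabling and restoring the saved state.

```c
#include <stdbool.h>

/* Userspace model only: one flag stands in for "interrupts enabled here". */
static bool model_irqs_enabled = true;

/* Models the unconditional disable/enable done by the _irq variants. */
static void model_irq_disable(void) { model_irqs_enabled = false; }
static void model_irq_enable(void)  { model_irqs_enabled = true; }

/* Models the irqsave half: remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	return flags;
}

/* Models the irqrestore half: put back whatever was saved. */
static void model_irq_restore(unsigned long flags)
{
	model_irqs_enabled = (flags != 0);
}

/* Helper in the spin_lock_irq() style; returns the state it leaves behind. */
static bool helper_irq_style(void)
{
	model_irq_disable();
	/* ... critical section ... */
	model_irq_enable();
	return model_irqs_enabled;
}

/* Helper in the spin_lock_irqsave() style; returns the state it leaves behind. */
static bool helper_irqsave_style(void)
{
	unsigned long flags = model_irq_save();

	/* ... critical section ... */
	model_irq_restore(flags);
	return model_irqs_enabled;
}
```

Called with interrupts already off (as inside a hard irq handler), the
irqsave-style helper leaves them off, while the irq-style helper turns
them back on behind the caller's back.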
281
282Note that softirqs (and hence tasklets and timers) are run on return
283from hardware interrupts, so :c:func:`spin_lock_irq()` also stops
284these. In that sense, :c:func:`spin_lock_irqsave()` is the most
285general and powerful locking function.
286
287Locking Between Two Hard IRQ Handlers
288-------------------------------------
289
290It is rare to have to share data between two IRQ handlers, but if you
291do, :c:func:`spin_lock_irqsave()` should be used: it is
292architecture-specific whether all interrupts are disabled inside irq
293handlers themselves.
294
295Cheat Sheet For Locking
296=======================
297
298Pete Zaitcev gives the following summary:
299
- If you are in a process context (any syscall) and want to lock other
  processes out, use a mutex. You can take a mutex and sleep
  (``copy_from_user*()`` or ``kmalloc(x, GFP_KERNEL)``).

- Otherwise (== data can be touched in an interrupt), use
  :c:func:`spin_lock_irqsave()` and
  :c:func:`spin_unlock_irqrestore()`.

- Avoid holding a spinlock for more than 5 lines of code and across any
  function call (except accessors like :c:func:`readb()`).
310
311Table of Minimum Requirements
312-----------------------------
313
The following table lists the **minimum** locking requirements between
various contexts. In some cases, the same context can only be running on
one CPU at a time, so no locking is required for that context (eg. a
particular thread can only run on one CPU at a time, but if it needs
to share data with another thread, locking is required).
319
320Remember the advice above: you can always use
321:c:func:`spin_lock_irqsave()`, which is a superset of all other
322spinlock primitives.
323
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
.              IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
IRQ Handler A  None
IRQ Handler B  SLIS          None
Softirq A      SLI           SLI           SL
Softirq B      SLI           SLI           SL        SL
Tasklet A      SLI           SLI           SL        SL        None
Tasklet B      SLI           SLI           SL        SL        SL        None
Timer A        SLI           SLI           SL        SL        SL        SL        None
Timer B        SLI           SLI           SL        SL        SL        SL        SL      None
User Context A SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    None
User Context B SLI           SLI           SLBH      SLBH      SLBH      SLBH      SLBH    SLBH    MLI            None
============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ==============
338
339Table: Table of Locking Requirements
340
+--------+----------------------------+
| SLIS   | spin_lock_irqsave          |
+--------+----------------------------+
| SLI    | spin_lock_irq              |
+--------+----------------------------+
| SL     | spin_lock                  |
+--------+----------------------------+
| SLBH   | spin_lock_bh               |
+--------+----------------------------+
| MLI    | mutex_lock_interruptible   |
+--------+----------------------------+
352
353Table: Legend for Locking Requirements Table
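
If you would rather look the answer up than read it off the table, the
table can be encoded directly. The following is only a sketch: the enum
names, ``min_lock()`` and ``lock_name()`` are made up for this example,
and the entries are copied from the table above.

```c
enum ctx {
	IRQ_A, IRQ_B, SOFTIRQ_A, SOFTIRQ_B, TASKLET_A, TASKLET_B,
	TIMER_A, TIMER_B, USER_A, USER_B, NR_CTX
};

enum lock_kind { LK_NONE, LK_SL, LK_SLBH, LK_SLI, LK_SLIS, LK_MLI };

/* Lower triangle of the table: row index >= column index. */
static const enum lock_kind min_lock_table[NR_CTX][NR_CTX] = {
	[IRQ_A]     = { LK_NONE },
	[IRQ_B]     = { LK_SLIS, LK_NONE },
	[SOFTIRQ_A] = { LK_SLI, LK_SLI, LK_SL },
	[SOFTIRQ_B] = { LK_SLI, LK_SLI, LK_SL, LK_SL },
	[TASKLET_A] = { LK_SLI, LK_SLI, LK_SL, LK_SL, LK_NONE },
	[TASKLET_B] = { LK_SLI, LK_SLI, LK_SL, LK_SL, LK_SL, LK_NONE },
	[TIMER_A]   = { LK_SLI, LK_SLI, LK_SL, LK_SL, LK_SL, LK_SL, LK_NONE },
	[TIMER_B]   = { LK_SLI, LK_SLI, LK_SL, LK_SL, LK_SL, LK_SL, LK_SL,
			LK_NONE },
	[USER_A]    = { LK_SLI, LK_SLI, LK_SLBH, LK_SLBH, LK_SLBH, LK_SLBH,
			LK_SLBH, LK_SLBH, LK_NONE },
	[USER_B]    = { LK_SLI, LK_SLI, LK_SLBH, LK_SLBH, LK_SLBH, LK_SLBH,
			LK_SLBH, LK_SLBH, LK_MLI, LK_NONE },
};

/* The table is symmetric, so always index the lower triangle. */
static enum lock_kind min_lock(enum ctx a, enum ctx b)
{
	return a >= b ? min_lock_table[a][b] : min_lock_table[b][a];
}

/* Expand an abbreviation from the legend into the primitive's name. */
static const char *lock_name(enum lock_kind kind)
{
	switch (kind) {
	case LK_SL:   return "spin_lock";
	case LK_SLBH: return "spin_lock_bh";
	case LK_SLI:  return "spin_lock_irq";
	case LK_SLIS: return "spin_lock_irqsave";
	case LK_MLI:  return "mutex_lock_interruptible";
	default:      return "None";
	}
}
```

For example, ``lock_name(min_lock(TIMER_B, USER_A))`` gives
``spin_lock_bh``, whichever way round the two contexts are passed.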
354
355The trylock Functions
356=====================
357
358There are functions that try to acquire a lock only once and immediately
359return a value telling about success or failure to acquire the lock.
360They can be used if you need no access to the data protected with the
361lock when some other thread is holding the lock. You should acquire the
362lock later if you then need access to the data protected with the lock.
363
364:c:func:`spin_trylock()` does not spin but returns non-zero if it
365acquires the spinlock on the first try or 0 if not. This function can be
366used in all contexts like :c:func:`spin_lock()`: you must have
367disabled the contexts that might interrupt you and acquire the spin
368lock.
369
370:c:func:`mutex_trylock()` does not suspend your task but returns
371non-zero if it could lock the mutex on the first try or 0 if not. This
372function cannot be safely used in hardware or software interrupt
373contexts despite not sleeping.
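
The try-once behaviour can be sketched in userspace with a C11
``atomic_flag``. This is a model, not the kernel implementation:
``model_trylock()``, ``model_unlock()`` and the ``stats_*`` names are
assumptions made for the example. Like :c:func:`spin_trylock()`, the
model returns non-zero (true) on success and never spins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a spinlock. */
struct model_lock {
	atomic_flag held;
};

/* Try once: true if we now hold the lock, false if someone else does. */
static bool model_trylock(struct model_lock *lock)
{
	/* test_and_set returns the previous value: false means it was free. */
	return !atomic_flag_test_and_set_explicit(&lock->held,
						  memory_order_acquire);
}

static void model_unlock(struct model_lock *lock)
{
	atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

static struct model_lock stats_lock = { ATOMIC_FLAG_INIT };
static int stats_counter;

/*
 * Opportunistic update: if the lock is busy, skip the work rather than
 * wait for it. Returns true if the counter was updated.
 */
static bool stats_try_bump(void)
{
	if (!model_trylock(&stats_lock))
		return false;
	stats_counter++;
	model_unlock(&stats_lock);
	return true;
}
```

The caller has to cope with the ``false`` return: here the update is
simply dropped, which is only acceptable because the data is not needed
when somebody else holds the lock.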
374
375Common Examples
376===============
377
378Let's step through a simple example: a cache of number to name mappings.
379The cache keeps a count of how often each of the objects is used, and
380when it gets full, throws out the least used one.
381
382All In User Context
383-------------------
384
385For our first example, we assume that all operations are in user context
386(ie. from system calls), so we can sleep. This means we can use a mutex
387to protect the cache and all the objects within it. Here's the code::
388
    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/mutex.h>
    #include <asm/errno.h>

    struct object
    {
            struct list_head list;
            int id;
            char name[32];
            int popularity;
    };

    /* Protects the cache, cache_num, and the objects within it */
    static DEFINE_MUTEX(cache_lock);
    static LIST_HEAD(cache);
    static unsigned int cache_num = 0;
    #define MAX_CACHE_SIZE 10

    /* Must be holding cache_lock */
    static struct object *__cache_find(int id)
    {
            struct object *i;

            list_for_each_entry(i, &cache, list)
                    if (i->id == id) {
                            i->popularity++;
                            return i;
                    }
            return NULL;
    }

    /* Must be holding cache_lock */
    static void __cache_delete(struct object *obj)
    {
            BUG_ON(!obj);
            list_del(&obj->list);
            kfree(obj);
            cache_num--;
    }

    /* Must be holding cache_lock */
    static void __cache_add(struct object *obj)
    {
            list_add(&obj->list, &cache);
            if (++cache_num > MAX_CACHE_SIZE) {
                    struct object *i, *outcast = NULL;
                    list_for_each_entry(i, &cache, list) {
                            if (!outcast || i->popularity < outcast->popularity)
                                    outcast = i;
                    }
                    __cache_delete(outcast);
            }
    }

    int cache_add(int id, const char *name)
    {
            struct object *obj;

            if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                    return -ENOMEM;

            strlcpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            mutex_lock(&cache_lock);
            __cache_add(obj);
            mutex_unlock(&cache_lock);
            return 0;
    }

    void cache_delete(int id)
    {
            mutex_lock(&cache_lock);
            __cache_delete(__cache_find(id));
            mutex_unlock(&cache_lock);
    }

    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            mutex_lock(&cache_lock);
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            mutex_unlock(&cache_lock);
            return ret;
    }
483
484Note that we always make sure we have the cache_lock when we add,
485delete, or look up the cache: both the cache infrastructure itself and
486the contents of the objects are protected by the lock. In this case it's
487easy, since we copy the data for the user, and never let them access the
488objects directly.
489
490There is a slight (and common) optimization here: in
491:c:func:`cache_add()` we set up the fields of the object before
492grabbing the lock. This is safe, as no-one else can access it until we
493put it in cache.
494
495Accessing From Interrupt Context
496--------------------------------
497
Now consider the case where :c:func:`cache_find()` can be called
from interrupt context: either a hardware interrupt or a softirq. An
example would be a timer which deletes objects from the cache.
501
502The change is shown below, in standard patch format: the ``-`` are lines
503which are taken away, and the ``+`` are lines which are added.
504
505::
506
    --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
    +++ cache.c.interrupt   2003-12-09 14:07:49.000000000 +1100
    @@ -12,7 +12,7 @@
             int popularity;
     };

    -static DEFINE_MUTEX(cache_lock);
    +static DEFINE_SPINLOCK(cache_lock);
     static LIST_HEAD(cache);
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10
    @@ -55,6 +55,7 @@
     int cache_add(int id, const char *name)
     {
             struct object *obj;
    +        unsigned long flags;

             if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL)
                     return -ENOMEM;
    @@ -63,30 +64,33 @@
             obj->id = id;
             obj->popularity = 0;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return 0;
     }

     void cache_delete(int id)
     {
    -        mutex_lock(&cache_lock);
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
             __cache_delete(__cache_find(id));
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
     }

     int cache_find(int id, char *name)
     {
             struct object *obj;
             int ret = -ENOENT;
    +        unsigned long flags;

    -        mutex_lock(&cache_lock);
    +        spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj) {
                     ret = 0;
                     strcpy(name, obj->name);
             }
    -        mutex_unlock(&cache_lock);
    +        spin_unlock_irqrestore(&cache_lock, flags);
             return ret;
     }
566
567Note that the :c:func:`spin_lock_irqsave()` will turn off
568interrupts if they are on, otherwise does nothing (if we are already in
569an interrupt handler), hence these functions are safe to call from any
570context.
571
572Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
573with the ``GFP_KERNEL`` flag, which is only legal in user context. I
574have assumed that :c:func:`cache_add()` is still only called in
575user context, otherwise this should become a parameter to
576:c:func:`cache_add()`.
577
578Exposing Objects Outside This File
579----------------------------------
580
581If our objects contained more information, it might not be sufficient to
582copy the information in and out: other parts of the code might want to
583keep pointers to these objects, for example, rather than looking up the
584id every time. This produces two problems.
585
586The first problem is that we use the ``cache_lock`` to protect objects:
587we'd need to make this non-static so the rest of the code can use it.
588This makes locking trickier, as it is no longer all in one place.
589
590The second problem is the lifetime problem: if another structure keeps a
591pointer to an object, it presumably expects that pointer to remain
592valid. Unfortunately, this is only guaranteed while you hold the lock,
593otherwise someone might call :c:func:`cache_delete()` and even
594worse, add another object, re-using the same address.
595
596As there is only one lock, you can't hold it forever: no-one else would
597get any work done.
598
599The solution to this problem is to use a reference count: everyone who
600has a pointer to the object increases it when they first get the object,
601and drops the reference count when they're finished with it. Whoever
602drops it to zero knows it is unused, and can actually delete it.
603
604Here is the code::
605
    --- cache.c.interrupt   2003-12-09 14:25:43.000000000 +1100
    +++ cache.c.refcnt      2003-12-09 14:33:05.000000000 +1100
    @@ -7,6 +7,7 @@
     struct object
     {
             struct list_head list;
    +        unsigned int refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -17,6 +18,35 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    +static void __object_put(struct object *obj)
    +{
    +        if (--obj->refcnt == 0)
    +                kfree(obj);
    +}
    +
    +static void __object_get(struct object *obj)
    +{
    +        obj->refcnt++;
    +}
    +
    +void object_put(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_put(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
    +void object_get(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_get(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
     /* Must be holding cache_lock */
     static struct object *__cache_find(int id)
     {
    @@ -35,6 +65,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    +        __object_put(obj);
             cache_num--;
     }

    @@ -63,6 +94,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    +        obj->refcnt = 1; /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -79,18 +111,15 @@
             spin_unlock_irqrestore(&cache_lock, flags);
     }

    -int cache_find(int id, char *name)
    +struct object *cache_find(int id)
     {
             struct object *obj;
    -        int ret = -ENOENT;
             unsigned long flags;

             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
    -        if (obj) {
    -                ret = 0;
    -                strcpy(name, obj->name);
    -        }
    +        if (obj)
    +                __object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
    -        return ret;
    +        return obj;
     }
691
We encapsulate the reference counting in the standard 'get' and 'put'
functions. Now we can return the object itself from
:c:func:`cache_find()` which has the advantage that the user can
now sleep holding the object (eg. to :c:func:`copy_to_user()` the
name to userspace).
697
698The other point to note is that I said a reference should be held for
699every pointer to the object: thus the reference count is 1 when first
700inserted into the cache. In some versions the framework does not hold a
701reference count, but they are more complicated.
702
703Using Atomic Operations For The Reference Count
704~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
705
706In practice, :c:type:`atomic_t` would usually be used for refcnt. There are a
707number of atomic operations defined in ``include/asm/atomic.h``: these
708are guaranteed to be seen atomically from all CPUs in the system, so no
709lock is required. In this case, it is simpler than using spinlocks,
710although for anything non-trivial using spinlocks is clearer. The
711:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()`
712are used instead of the standard increment and decrement operators, and
713the lock is no longer used to protect the reference count itself.
714
715::
716
    --- cache.c.refcnt  2003-12-09 15:00:35.000000000 +1100
    +++ cache.c.refcnt-atomic   2003-12-11 15:49:42.000000000 +1100
    @@ -7,7 +7,7 @@
     struct object
     {
             struct list_head list;
    -        unsigned int refcnt;
    +        atomic_t refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -18,33 +18,15 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    -static void __object_put(struct object *obj)
    -{
    -        if (--obj->refcnt == 0)
    -                kfree(obj);
    -}
    -
    -static void __object_get(struct object *obj)
    -{
    -        obj->refcnt++;
    -}
    -
     void object_put(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_put(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        if (atomic_dec_and_test(&obj->refcnt))
    +                kfree(obj);
     }

     void object_get(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        atomic_inc(&obj->refcnt);
     }

     /* Must be holding cache_lock */
    @@ -65,7 +47,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    -        __object_put(obj);
    +        object_put(obj);
             cache_num--;
     }

    @@ -94,7 +76,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    -        obj->refcnt = 1; /* The cache holds a reference */
    +        atomic_set(&obj->refcnt, 1); /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -119,7 +101,7 @@
             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj)
    -                __object_get(obj);
    +                object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
             return obj;
     }
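
The get/put life cycle with an atomic count can be tried out in
userspace. The sketch below uses C11 ``<stdatomic.h>`` in place of
``atomic_t``; the ``model_*`` names and the ``model_objects_freed``
counter are inventions for the example, not kernel interfaces.
``atomic_fetch_sub()`` returning 1 plays the role of
:c:func:`atomic_dec_and_test()` returning true.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_object {
	atomic_int refcnt;
	int id;
};

/* Instrumentation for the example only. */
static int model_objects_freed;

/* The creator starts out holding one reference. */
static struct model_object *model_object_new(int id)
{
	struct model_object *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	atomic_init(&obj->refcnt, 1);
	obj->id = id;
	return obj;
}

static void model_object_get(struct model_object *obj)
{
	atomic_fetch_add(&obj->refcnt, 1);
}

/* Returns true if this call dropped the last reference and freed obj. */
static bool model_object_put(struct model_object *obj)
{
	/* fetch_sub returns the old value: 1 means the count is now zero. */
	if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
		free(obj);
		model_objects_freed++;
		return true;
	}
	return false;
}
```

Note that the count alone does not make lookups safe: something (the
``cache_lock`` in the code above) still has to guarantee the object
cannot be freed between finding it and taking the reference.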
792
793Protecting The Objects Themselves
794---------------------------------
795
796In these examples, we assumed that the objects (except the reference
797counts) never changed once they are created. If we wanted to allow the
798name to change, there are three possibilities:
799
800- You can make ``cache_lock`` non-static, and tell people to grab that
801 lock before changing the name in any object.
802
803- You can provide a :c:func:`cache_obj_rename()` which grabs this
804 lock and changes the name for the caller, and tell everyone to use
805 that function.
806
807- You can make the ``cache_lock`` protect only the cache itself, and
808 use another lock to protect the name.
809
810Theoretically, you can make the locks as fine-grained as one lock for
811every field, for every object. In practice, the most common variants
812are:
813
814- One lock which protects the infrastructure (the ``cache`` list in
815 this example) and all the objects. This is what we have done so far.
816
817- One lock which protects the infrastructure (including the list
818 pointers inside the objects), and one lock inside the object which
819 protects the rest of that object.
820
821- Multiple locks to protect the infrastructure (eg. one lock per hash
822 chain), possibly with a separate per-object lock.
823
824Here is the "lock-per-object" implementation:
825
826::
827
    --- cache.c.refcnt-atomic   2003-12-11 15:50:54.000000000 +1100
    +++ cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    @@ -6,11 +6,17 @@

     struct object
     {
    +        /* These two protected by cache_lock. */
             struct list_head list;
    +        int popularity;
    +
             atomic_t refcnt;
    +
    +        /* Doesn't change once created. */
             int id;
    +
    +        spinlock_t lock; /* Protects the name */
             char name[32];
    -        int popularity;
     };

     static DEFINE_SPINLOCK(cache_lock);
    @@ -77,6 +84,7 @@
             obj->id = id;
             obj->popularity = 0;
             atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
    +        spin_lock_init(&obj->lock);

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
857
Note that I decided that the popularity count should be protected by the
``cache_lock`` rather than the per-object lock: this is because it (like
the :c:type:`struct list_head <list_head>` inside the object)
is logically part of the infrastructure. This way, I don't need to grab
the lock of every object in :c:func:`__cache_add()` when seeking
the least popular.
864
865I also decided that the id member is unchangeable, so I don't need to
866grab each object lock in :c:func:`__cache_find()` to examine the
867id: the object lock is only used by a caller who wants to read or write
868the name field.
869
Note also that I added a comment describing what data was protected by
which locks. This is extremely important, as it describes the runtime
behavior of the code, and can be hard to gain from just reading. And as
Alan Cox says, “Lock data, not code”.
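
A caller that wants to read or change the name then takes only the
per-object lock. Here is a rough userspace model of that, with a C11
``atomic_flag`` standing in for ``spinlock_t``. The ``model_obj_*``
functions are hypothetical; the kernel example above does not define a
rename helper.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

struct model_obj {
	atomic_flag lock;	/* Protects the name */
	char name[32];
};

static void model_obj_lock(struct model_obj *obj)
{
	while (atomic_flag_test_and_set_explicit(&obj->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void model_obj_unlock(struct model_obj *obj)
{
	atomic_flag_clear_explicit(&obj->lock, memory_order_release);
}

/* Nobody else can see obj yet, so no locking is needed here. */
static void model_obj_init(struct model_obj *obj, const char *name)
{
	atomic_flag_clear(&obj->lock);
	snprintf(obj->name, sizeof(obj->name), "%s", name);
}

static void model_obj_rename(struct model_obj *obj, const char *name)
{
	model_obj_lock(obj);
	snprintf(obj->name, sizeof(obj->name), "%s", name);
	model_obj_unlock(obj);
}

/* Copy the name out under the lock so the caller never sees half a rename. */
static void model_obj_get_name(struct model_obj *obj, char *buf, size_t len)
{
	model_obj_lock(obj);
	snprintf(buf, len, "%s", obj->name);
	model_obj_unlock(obj);
}
```

Neither function touches the list or the popularity count, so neither
needs the cache-wide lock.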
874
875Common Problems
876===============
877
878Deadlock: Simple and Advanced
879-----------------------------
880
881There is a coding bug where a piece of code tries to grab a spinlock
882twice: it will spin forever, waiting for the lock to be released
883(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
884trivial to diagnose: not a
885stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.
886
887For a slightly more complex case, imagine you have a region shared by a
888softirq and user context. If you use a :c:func:`spin_lock()` call
889to protect it, it is possible that the user context will be interrupted
890by the softirq while it holds the lock, and the softirq will then spin
891forever trying to get the same lock.
892
893Both of these are called deadlock, and as shown above, it can occur even
894with a single CPU (although not on UP compiles, since spinlocks vanish
895on kernel compiles with ``CONFIG_SMP``\ =n. You'll still get data
896corruption in the second example).
897
898This complete lockup is easy to diagnose: on SMP boxes the watchdog
899timer or compiling with ``DEBUG_SPINLOCK`` set
900(``include/linux/spinlock.h``) will show this up immediately when it
901happens.
902
903A more complex problem is the so-called 'deadly embrace', involving two
904or more locks. Say you have a hash table: each entry in the table is a
905spinlock, and a chain of hashed objects. Inside a softirq handler, you
906sometimes want to alter an object from one place in the hash to another:
907you grab the spinlock of the old hash chain and the spinlock of the new
908hash chain, and delete the object from the old one, and insert it in the
909new one.
910
911There are two problems here. First, if your code ever tries to move the
912object to the same chain, it will deadlock with itself as it tries to
913lock it twice. Secondly, if the same softirq on another CPU is trying to
914move another object in the reverse direction, the following could
915happen:
916
917+-----------------------+-----------------------+
918| CPU 1 | CPU 2 |
919+=======================+=======================+
920| Grab lock A -> OK | Grab lock B -> OK |
921+-----------------------+-----------------------+
922| Grab lock B -> spin | Grab lock A -> spin |
923+-----------------------+-----------------------+
924
925Table: Consequences
926
927The two CPUs will spin forever, waiting for the other to give up their
928lock. It will look, smell, and feel like a crash.
929
930Preventing Deadlock
931-------------------
932
933Textbooks will tell you that if you always lock in the same order, you
934will never get this kind of deadlock. Practice will tell you that this
935approach doesn't scale: when I create a new lock, I don't understand
936enough of the kernel to figure out where in the 5000 lock hierarchy it
937will fit.
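
For a small, closed set of locks of the same kind, such as the two hash
chains in the example above, a fixed order is still cheap to get: take
the lock at the lower address first, and take it only once if both
chains are the same. The following userspace sketch shows the idea. The
``chain_*`` names are made up, and an ``atomic_flag`` stands in for the
per-chain spinlock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct model_chain {
	atomic_flag lock;
	int nr_objects;
};

static void chain_init(struct model_chain *chain, int nr_objects)
{
	atomic_flag_clear(&chain->lock);
	chain->nr_objects = nr_objects;
}

static void chain_lock(struct model_chain *chain)
{
	while (atomic_flag_test_and_set_explicit(&chain->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void chain_unlock(struct model_chain *chain)
{
	atomic_flag_clear_explicit(&chain->lock, memory_order_release);
}

/* Lock both chains in address order; lock only once if they are the same. */
static void chain_lock_two(struct model_chain *a, struct model_chain *b)
{
	if (a == b) {
		chain_lock(a);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct model_chain *tmp = a;

		a = b;
		b = tmp;
	}
	chain_lock(a);
	chain_lock(b);
}

static void chain_unlock_two(struct model_chain *a, struct model_chain *b)
{
	chain_unlock(a);
	if (a != b)
		chain_unlock(b);
}

/* Move one object between chains; returns false if 'from' was empty. */
static bool chain_move(struct model_chain *from, struct model_chain *to)
{
	bool moved = false;

	chain_lock_two(from, to);
	if (from->nr_objects > 0) {
		from->nr_objects--;
		to->nr_objects++;
		moved = true;
	}
	chain_unlock_two(from, to);
	return moved;
}
```

Because every caller ends up taking the two locks in the same order, a
move in the reverse direction on another CPU cannot produce the A/B,
B/A pattern from the table above.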
938
939The best locks are encapsulated: they never get exposed in headers, and
940are never held around calls to non-trivial functions outside the same
941file. You can read through this code and see that it will never
942deadlock, because it never tries to grab another lock while it has that
943one. People using your code don't even need to know you are using a
944lock.
945
946A classic problem here is when you provide callbacks or hooks: if you
947call these with the lock held, you risk simple deadlock, or a deadly
948embrace (who knows what the callback will do?). Remember, the other
949programmers are out to get you, so don't do this.
950
951Overzealous Prevention Of Deadlocks
952~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
953
954Deadlocks are problematic, but not as bad as data corruption. Code which
955grabs a read lock, searches a list, fails to find what it wants, drops
956the read lock, grabs a write lock and inserts the object has a race
957condition.
958
959If you don't see why, please stay the fuck away from my code.
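
For the record, the race is that whatever you learned under the read
lock is stale the moment you drop it: another CPU can run the same
search, miss as well, and insert first. The usual fix is to search
again once you hold the write lock. The sketch below is a
single-threaded userspace model with the locks left out (comments mark
where they would be), and every ``set_*`` name is invented for the
example.

```c
#include <stdbool.h>

#define SET_MAX 8

static int set_ids[SET_MAX];
static int set_len;

/* In the real thing: caller holds the read lock or the write lock. */
static bool set_contains(int id)
{
	int i;

	for (i = 0; i < set_len; i++)
		if (set_ids[i] == id)
			return true;
	return false;
}

/*
 * In the real thing: caller holds the write lock. This trusts an
 * earlier search made under the read lock, which is the bug.
 */
static void set_insert_unchecked(int id)
{
	if (set_len < SET_MAX)
		set_ids[set_len++] = id;
}

/*
 * In the real thing: caller holds the write lock. Search again before
 * inserting; returns false if someone else got there first.
 */
static bool set_insert_checked(int id)
{
	if (set_contains(id))
		return false;
	set_insert_unchecked(id);
	return true;
}
```

If two CPUs both miss under the read lock and then both call
``set_insert_unchecked()``, the set ends up with a duplicate; with
``set_insert_checked()`` the second caller backs off.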
960
961Racing Timers: A Kernel Pastime
962-------------------------------
963
964Timers can produce their own special problems with races. Consider a
965collection of objects (list, hash, etc) where each object has a timer
966which is due to destroy it.
967
968If you want to destroy the entire collection (say on module removal),
969you might do the following::
970
            /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
               HUNGARIAN NOTATION */
            spin_lock_bh(&list_lock);

            while (list) {
                    struct foo *next = list->next;
                    del_timer(&list->timer);
                    kfree(list);
                    list = next;
            }

            spin_unlock_bh(&list_lock);
983
984
985Sooner or later, this will crash on SMP, because a timer can have just
986gone off before the :c:func:`spin_lock_bh()`, and it will only get
987the lock after we :c:func:`spin_unlock_bh()`, and then try to free
988the element (which has already been freed!).
989
990This can be avoided by checking the result of
991:c:func:`del_timer()`: if it returns 1, the timer has been deleted.
992If 0, it means (in this case) that it is currently running, so we can
993do::
994
            retry:
                    spin_lock_bh(&list_lock);

                    while (list) {
                            struct foo *next = list->next;
                            if (!del_timer(&list->timer)) {
                                    /* Give timer a chance to delete this */
                                    spin_unlock_bh(&list_lock);
                                    goto retry;
                            }
                            kfree(list);
                            list = next;
                    }

                    spin_unlock_bh(&list_lock);
1010
1011
Another common problem is deleting timers which restart themselves (by
calling :c:func:`add_timer()` at the end of their timer function).
Because this is a fairly common case which is prone to races, you should
use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
handle this case. It waits for a running timer function to finish
before returning, and its return value tells you whether it deactivated
a pending timer.
1018
1019Locking Speed
1020=============
1021
1022There are three main things to worry about when considering speed of
1023some code which does locking. First is concurrency: how many things are
1024going to be waiting while someone else is holding a lock. Second is the
1025time taken to actually acquire and release an uncontended lock. Third is
1026using fewer, or smarter locks. I'm assuming that the lock is used fairly
1027often: otherwise, you wouldn't be concerned about efficiency.
1028
1029Concurrency depends on how long the lock is usually held: you should
1030hold the lock for as long as needed, but no longer. In the cache
1031example, we always create the object without the lock held, and then
1032grab the lock only when we are ready to insert it in the list.
1033
1034Acquisition times depend on how much damage the lock operations do to
1035the pipeline (pipeline stalls) and how likely it is that this CPU was
1036the last one to grab the lock (ie. is the lock cache-hot for this CPU):
1037on a machine with more CPUs, this likelihood drops fast. Consider a
1038700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
1039increment takes about 58ns, a lock which is cache-hot on this CPU takes
1040160ns, and a cacheline transfer from another CPU takes an additional 170
1041to 360ns. (These figures from Paul McKenney's `Linux Journal RCU
1042article <http://www.linuxjournal.com/article.php?sid=6993>`__).
1043
1044These two aims conflict: holding a lock for a short time might be done
1045by splitting locks into parts (such as in our final per-object-lock
1046example), but this increases the number of lock acquisitions, and the
1047results are often slower than having a single lock. This is another
1048reason to advocate locking simplicity.
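
To put the quoted figures in proportion, it helps to convert them into
"plain instructions you could have run instead". The sketch below does
that arithmetic in integer picoseconds. The constants are the Pentium
III numbers quoted above; the function names are made up for the
example.

```c
/* Figures quoted above for a 700MHz Pentium III, in picoseconds. */
enum {
	PS_PER_INSN      = 700,		/* about 0.7ns per instruction */
	PS_ATOMIC_INC    = 58000,	/* atomic increment */
	PS_LOCK_HOT      = 160000,	/* lock, cache-hot on this CPU */
	PS_CACHELINE_MIN = 170000,	/* extra cost of a cacheline transfer */
	PS_CACHELINE_MAX = 360000
};

/* How many plain instructions fit in the given time (rounds down). */
static long insns_equivalent(long picoseconds)
{
	return picoseconds / PS_PER_INSN;
}

/* Cost of taking nr_locks uncontended, cache-hot locks one after another. */
static long lock_cost_ps(int nr_locks)
{
	return (long)nr_locks * PS_LOCK_HOT;
}
```

On these numbers a cache-hot lock costs about 228 instructions, and up
to about 742 if the cacheline has to come from another CPU. Splitting
one lock into three that are all taken on the same path triples the
160ns to 480ns, which is why finer-grained locking is often slower.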
1049
1050The third concern is addressed below: there are some methods to reduce
1051the amount of locking which needs to be done.
1052
1053Read/Write Lock Variants
1054------------------------
1055
1056Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
1057:c:type:`struct rw_semaphore <rw_semaphore>`. These divide
1058users into two classes: the readers and the writers. If you are only
1059reading the data, you can get a read lock, but to write to the data you
1060need the write lock. Many people can hold a read lock, but a writer must
1061be sole holder.
1062
1063If your code divides neatly along reader/writer lines (as our cache code
1064does), and the lock is held by readers for significant lengths of time,
1065using these locks can help. They are slightly slower than the normal
1066locks though, so in practice ``rwlock_t`` is not usually worthwhile.
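
The reader/writer rule itself is small enough to model with one atomic
counter. This is a userspace sketch of the semantics only, using
try-once operations. It is not how ``rwlock_t`` is implemented, and the
``model_*`` names are assumptions for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* state: -1 = held by a writer, 0 = free, N > 0 = held by N readers. */
struct model_rwlock {
	atomic_int state;
};

static void model_rwlock_init(struct model_rwlock *lock)
{
	atomic_init(&lock->state, 0);
}

/* Many readers may hold the lock together, but never alongside a writer. */
static bool model_read_trylock(struct model_rwlock *lock)
{
	int seen = atomic_load(&lock->state);

	while (seen >= 0) {
		/* On failure 'seen' is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&lock->state, &seen, seen + 1))
			return true;
	}
	return false;
}

static void model_read_unlock(struct model_rwlock *lock)
{
	atomic_fetch_sub(&lock->state, 1);
}

/* A writer must be the sole holder: only a free lock can be taken. */
static bool model_write_trylock(struct model_rwlock *lock)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&lock->state, &expected, -1);
}

static void model_write_unlock(struct model_rwlock *lock)
{
	atomic_store(&lock->state, 0);
}
```

Even in this toy version the read path is a compare-and-swap loop rather
than a single test-and-set, which gives some feel for why these locks
cost a little more than the plain ones.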
1067
1068Avoiding Locks: Read Copy Update
1069--------------------------------
1070
1071There is a special method of read/write locking called Read Copy Update.
1072Using RCU, the readers can avoid taking a lock altogether: as we expect
1073our cache to be read more often than updated (otherwise the cache is a
1074waste of time), it is a candidate for this optimization.
1075
How do we get rid of read locks? Getting rid of read locks means that
writers may be changing the list underneath the readers. That is
actually quite simple: we can read a linked list while an element is
being added if the writer adds the element very carefully. For example,
adding ``new`` to a singly linked list called ``list``::

            new->next = list->next;
            wmb();
            list->next = new;
1085
1086
The :c:func:`wmb()` is a write memory barrier. It ensures that the
first operation (setting the new element's ``next`` pointer) is complete
and will be seen by all CPUs before the second operation (putting
the new element into the list) is. This is important, since modern
compilers and modern CPUs can both reorder instructions unless told
otherwise: we want a reader to either not see the new element at all, or
see the new element with the ``next`` pointer correctly pointing at the
rest of the list.
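
The same initialise-then-publish ordering can be written in userspace
with C11 atomics, where a release store takes the place of
:c:func:`wmb()` followed by the assignment, and the reader's acquire
loads pair with it. This is an analogy rather than the kernel's
implementation, and the ``model_*`` names are invented for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

struct model_node {
	int value;
	struct model_node *_Atomic next;
};

static void model_node_init(struct model_node *node, int value)
{
	node->value = value;
	atomic_init(&node->next, NULL);
}

/* Insert 'node' after 'pos'. Writers are assumed to be serialised. */
static void model_insert_after(struct model_node *pos,
			       struct model_node *node)
{
	struct model_node *rest =
		atomic_load_explicit(&pos->next, memory_order_relaxed);

	/* Step 1: finish setting up the node while nobody can see it. */
	atomic_store_explicit(&node->next, rest, memory_order_relaxed);

	/* Step 2: publish. Release orders step 1 before this store. */
	atomic_store_explicit(&pos->next, node, memory_order_release);
}

/* Lockless reader: sums every value after the list head. */
static int model_sum(struct model_node *head)
{
	struct model_node *node;
	int sum = 0;

	for (node = atomic_load_explicit(&head->next, memory_order_acquire);
	     node;
	     node = atomic_load_explicit(&node->next, memory_order_acquire))
		sum += node->value;
	return sum;
}
```

A reader racing with ``model_insert_after()`` sees either the old list
or the new node with a valid ``next`` pointer, never a half-built node.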
1095
1096Fortunately, there is a function to do this for standard
1097:c:type:`struct list_head <list_head>` lists:
1098:c:func:`list_add_rcu()` (``include/linux/list.h``).
1099
1100Removing an element from the list is even simpler: we replace the
1101pointer to the old element with a pointer to its successor, and readers
1102will either see it, or skip over it.
1103
::

            list->next = old->next;
1107
1108
1109There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
1110does this (the normal version poisons the old object, which we don't
1111want).
1112
1113The reader must also be careful: some CPUs can look through the ``next``
1114pointer to start reading the contents of the next element early, but
1115don't realize that the pre-fetched contents is wrong when the ``next``
1116pointer changes underneath them. Once again, there is a
1117:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
1118to help you. Of course, writers can just use
1119:c:func:`list_for_each_entry()`, since there cannot be two
1120simultaneous writers.
1121
Our final dilemma is this: when can we actually destroy the removed
element? Remember, a reader might be stepping through this element in
the list right now: if we free this element and the ``next`` pointer
changes, the reader will jump off into garbage and crash. We need to
wait until we know that all the readers who were traversing the list
when we deleted the element are finished. We use
:c:func:`call_rcu()` to register a callback which will actually
destroy the object once all pre-existing readers are finished.
Alternatively, :c:func:`synchronize_rcu()` may be used to block
until all pre-existing readers are finished.
1132
1133But how does Read Copy Update know when the readers are finished? The
1134method is this: firstly, the readers always traverse the list inside
1135:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
1136these simply disable preemption so the reader won't go to sleep while
1137reading the list.
1138
1139RCU then waits until every other CPU has slept at least once: since
1140readers cannot sleep, we know that any readers which were traversing the
1141list during the deletion are finished, and the callback is triggered.
1142The real Read Copy Update code is a little more optimized than this, but
1143this is the fundamental idea.
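
The deferred-destruction idea can be modelled in a few lines of
userspace C. The model is deliberately crude: it counts active readers
and runs the queued callbacks only when that count is zero, so it waits
for all readers rather than just the pre-existing ones, and it is
single-threaded. Every ``model_*`` name is invented here. The only thing
borrowed from the kernel is the shape of the callback, which receives a
pointer to the embedded head and recovers the enclosing object from it.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_rcu_head {
	struct model_rcu_head *next;
	void (*func)(struct model_rcu_head *head);
};

static int model_readers;			/* readers currently inside */
static struct model_rcu_head *model_pending;	/* callbacks not yet run */

static void model_rcu_read_lock(void)   { model_readers++; }
static void model_rcu_read_unlock(void) { model_readers--; }

/* Queue 'func' to run once no reader can still be looking at the object. */
static void model_call_rcu(struct model_rcu_head *head,
			   void (*func)(struct model_rcu_head *head))
{
	head->func = func;
	head->next = model_pending;
	model_pending = head;
}

/* Run the queued callbacks if it is safe; returns how many were run. */
static int model_rcu_poll(void)
{
	int ran = 0;

	if (model_readers > 0)
		return 0;
	while (model_pending) {
		struct model_rcu_head *head = model_pending;

		model_pending = head->next;
		head->func(head);
		ran++;
	}
	return ran;
}

struct model_item {
	int id;
	bool freed;
	struct model_rcu_head rcu;
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: recover the item from its embedded head, then "free" it. */
static void model_item_free_rcu(struct model_rcu_head *head)
{
	struct model_item *item =
		model_container_of(head, struct model_item, rcu);

	item->freed = true;	/* a real callback would free the memory */
}
```

The writer unlinks the item first and only then calls
``model_call_rcu()``; the memory stays valid for as long as a reader
that might have found the item is still running.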
1144
1145::
1146
1147 --- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
1148 +++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100
1149 @@ -1,15 +1,18 @@
1150 #include <linux/list.h>
1151 #include <linux/slab.h>
1152 #include <linux/string.h>
1153 +#include <linux/rcupdate.h>
1154 #include <linux/mutex.h>
1155 #include <asm/errno.h>
1156
1157 struct object
1158 {
1159 - /* These two protected by cache_lock. */
1160 + /* This is protected by RCU */
1161 struct list_head list;
1162 int popularity;
1163
1164 + struct rcu_head rcu;
1165 +
1166 atomic_t refcnt;
1167
1168 /* Doesn't change once created. */
1169 @@ -40,7 +43,7 @@
1170 {
1171 struct object *i;
1172
1173 - list_for_each_entry(i, &cache, list) {
1174 + list_for_each_entry_rcu(i, &cache, list) {
1175 if (i->id == id) {
1176 i->popularity++;
1177 return i;
1178 @@ -49,19 +52,25 @@
1179 return NULL;
1180 }
1181
1182 +/* Final discard done once we know no readers are looking. */
1183 +static void cache_delete_rcu(struct rcu_head *head)
1184 +{
1185 + object_put(container_of(head, struct object, rcu));
1186 +}
1187 +
1188 /* Must be holding cache_lock */
1189 static void __cache_delete(struct object *obj)
1190 {
1191 BUG_ON(!obj);
1192 - list_del(&obj->list);
1193 - object_put(obj);
1194 + list_del_rcu(&obj->list);
1195 cache_num--;
1196 + call_rcu(&obj->rcu, cache_delete_rcu);
1197 }
1198
1199 /* Must be holding cache_lock */
1200 static void __cache_add(struct object *obj)
1201 {
1202 - list_add(&obj->list, &cache);
1203 + list_add_rcu(&obj->list, &cache);
1204 if (++cache_num > MAX_CACHE_SIZE) {
1205 struct object *i, *outcast = NULL;
1206 list_for_each_entry(i, &cache, list) {
1207 @@ -104,12 +114,11 @@
1208 struct object *cache_find(int id)
1209 {
1210 struct object *obj;
1211 - unsigned long flags;
1212
1213 - spin_lock_irqsave(&cache_lock, flags);
1214 + rcu_read_lock();
1215 obj = __cache_find(id);
1216 if (obj)
1217 object_get(obj);
1218 - spin_unlock_irqrestore(&cache_lock, flags);
1219 + rcu_read_unlock();
1220 return obj;
1221 }
1222
1223Note that the reader will alter the popularity member in
1224:c:func:`__cache_find()`, and now it doesn't hold a lock. One
1225solution would be to make it an ``atomic_t``, but for this usage, we
1226don't really care about races: an approximate result is good enough, so
1227I didn't change it.
1228
1229The result is that :c:func:`cache_find()` requires no
1230synchronization with any other functions, so is almost as fast on SMP as
1231it would be on UP.
1232
1233There is a further optimization possible here: remember our original
1234cache code, where there were no reference counts and the caller simply
1235held the lock whenever using the object? This is still possible: if you
1236hold the lock, no one can delete the object, so you don't need to get
1237and put the reference count.
1238
1239Now, because the 'read lock' in RCU is simply disabling preemption, a
1240caller which always has preemption disabled between calling
1241:c:func:`cache_find()` and :c:func:`object_put()` does not
1242need to actually get and put the reference count: we could expose
1243:c:func:`__cache_find()` by making it non-static, and such
1244callers could simply call that.
1245
1246The benefit here is that the reference count is not written to: the
1247object is not altered in any way, which is much faster on SMP machines
1248due to caching.
1249
1250Per-CPU Data
1251------------
1252
1253Another technique for avoiding locking which is used fairly widely is to
1254duplicate information for each CPU. For example, if you wanted to keep a
1255count of a common condition, you could use a spin lock and a single
1256counter. Nice and simple.
1257
1258If that was too slow (it's usually not, but if you've got a really big
1259machine to test on and can show that it is), you could instead use a
1260counter for each CPU, then none of them need an exclusive lock. See
1261:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
1262:c:func:`put_cpu_var()` (``include/linux/percpu.h``).
1263
1264Of particular use for simple per-cpu counters is the ``local_t`` type,
1265and the :c:func:`cpu_local_inc()` and related functions, which are
1266more efficient than simple code on some architectures
1267(``include/asm/local.h``).
1268
1269Note that there is no simple, reliable way of getting an exact value of
1270such a counter, without introducing more locks. This is not a problem
1271for some uses.
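
The per-CPU counter idea can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the kernel's per-CPU machinery: ``NCPUS``, the 64-byte cache line size and the ``count_hit()``/``read_hits()`` helpers are made up for illustration, and the caller is assumed to stay on the CPU whose slot it updates (which is what :c:func:`get_cpu_var()` arranges in the kernel).

```c
#include <assert.h>

#define NCPUS 4	/* illustrative CPU count */

/*
 * One slot per CPU. The padding keeps each slot on its own cache line
 * (64 bytes is an assumption) so CPUs do not bounce a shared line.
 */
struct cpu_counter {
	unsigned long count;
	char pad[64 - sizeof(unsigned long)];
};

static struct cpu_counter hits[NCPUS];

/* Each CPU only ever writes its own slot, so no lock is taken. */
static void count_hit(int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	hits[cpu].count++;
}

/* Sum all slots. The result is only approximate while CPUs keep counting. */
static unsigned long read_hits(void)
{
	unsigned long total = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		total += hits[cpu].count;
	return total;
}
```

The read side shows why an exact value is hard to obtain: the slots are summed one after another without any lock, so a slot that was already added may change before the loop finishes.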
1272
1273Data Which Is Mostly Used By An IRQ Handler
1274-------------------------------------------
1275
1276If data is always accessed from within the same IRQ handler, you don't
1277need a lock at all: the kernel already guarantees that the irq handler
1278will not run simultaneously on multiple CPUs.
1279
1280Manfred Spraul points out that you can still do this, even if the data
1281is very occasionally accessed in user context or softirqs/tasklets. The
1282irq handler doesn't use a lock, and all other accesses are done as so::
1283
1284 spin_lock(&lock);
1285 disable_irq(irq);
1286 ...
1287 enable_irq(irq);
1288 spin_unlock(&lock);
1289
1290The :c:func:`disable_irq()` prevents the irq handler from running
1291(and waits for it to finish if it's currently running on other CPUs).
1292The spinlock prevents any other accesses happening at the same time.
1293Naturally, this is slower than just a :c:func:`spin_lock_irq()`
1294call, so it only makes sense if this type of access happens extremely
1295rarely.
1296
1297What Functions Are Safe To Call From Interrupts?
1298================================================
1299
1300Many functions in the kernel sleep (ie. call schedule()) directly or
1301indirectly: you can never call them while holding a spinlock, or with
1302preemption disabled. This also means you need to be in user context:
1303calling them from an interrupt is illegal.
1304
1305Some Functions Which Sleep
1306--------------------------
1307
1308The most common ones are listed below, but you usually have to read the
1309code to find out if other calls are safe. If everyone else who calls it
1310can sleep, you probably need to be able to sleep, too. In particular,
1311registration and deregistration functions usually expect to be called
1312from user context, and can sleep.
1313
1314- Accesses to userspace:
1315
1316 - :c:func:`copy_from_user()`
1317
1318 - :c:func:`copy_to_user()`
1319
1320 - :c:func:`get_user()`
1321
1322 - :c:func:`put_user()`
1323
1324- :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`
1325
1326- :c:func:`mutex_lock_interruptible()` and
1327 :c:func:`mutex_lock()`
1328
1329 There is a :c:func:`mutex_trylock()` which does not sleep.
1330 Still, it must not be used inside interrupt context since its
1331 implementation is not safe for that. :c:func:`mutex_unlock()`
1332 will also never sleep. It cannot be used in interrupt context either
1333 since a mutex must be released by the same task that acquired it.
1334
1335Some Functions Which Don't Sleep
1336--------------------------------
1337
1338Some functions are safe to call from any context, or holding almost any
1339lock.
1340
1341- :c:func:`printk()`
1342
1343- :c:func:`kfree()`
1344
1345- :c:func:`add_timer()` and :c:func:`del_timer()`
1346
1347Mutex API reference
1348===================
1349
1350.. kernel-doc:: include/linux/mutex.h
1351 :internal:
1352
1353.. kernel-doc:: kernel/locking/mutex.c
1354 :export:
1355
1356Futex API reference
1357===================
1358
1359.. kernel-doc:: kernel/futex.c
1360 :internal:
1361
1362Further reading
1363===============
1364
1365- ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking
1366 tutorial in the kernel sources.
1367
1368- Unix Systems for Modern Architectures: Symmetric Multiprocessing and
1369 Caching for Kernel Programmers:
1370
1371 Curt Schimmel's very good introduction to kernel level locking (not
1372 written for Linux, but nearly everything applies). The book is
1373 expensive, but really worth every penny to understand SMP locking.
1374 [ISBN: 0201633388]
1375
1376Thanks
1377======
1378
1379Thanks to Telsa Gwynne for DocBooking, neatening and adding style.
1380
1381Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
1382Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
1383James Morris, Robert Love, Paul McKenney, John Ashby for proofreading,
1384correcting, flaming, commenting.
1385
1386Thanks to the cabal for having no influence on this document.
1387
1388Glossary
1389========
1390
1391preemption
1392 Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
1393 context inside the kernel would not preempt each other (ie. you had that
1394 CPU until you gave it up, except for interrupts). With the addition of
1395 ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user context, higher
1396 priority tasks can "cut in": spinlocks were changed to disable
1397 preemption, even on UP.
1398
1399bh
1400 Bottom Half: for historical reasons, functions with '_bh' in them often
1401 now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()`
1402 blocks any software interrupt on the current CPU. Bottom halves are
1403 deprecated, and will eventually be replaced by tasklets. Only one bottom
1404 half will be running at any time.
1405
1406Hardware Interrupt / Hardware IRQ
1407 Hardware interrupt request. :c:func:`in_irq()` returns true in a
1408 hardware interrupt handler.
1409
1410Interrupt Context
1411 Not user context: processing a hardware irq or software irq. Indicated
1412 by the :c:func:`in_interrupt()` macro returning true.
1413
1414SMP
1415 Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
1416 (``CONFIG_SMP=y``).
1417
1418Software Interrupt / softirq
1419 Software interrupt handler. :c:func:`in_irq()` returns false;
1420 :c:func:`in_softirq()` returns true. Tasklets and softirqs both
1421 fall into the category of 'software interrupts'.
1422
1423 Strictly speaking a softirq is one of up to 32 enumerated software
1424 interrupts which can run on multiple CPUs at once. Sometimes used to
1425 refer to tasklets as well (ie. all software interrupts).
1426
1427tasklet
1428 A dynamically-registrable software interrupt, which is guaranteed to
1429 only run on one CPU at a time.
1430
1431timer
1432 A dynamically-registrable software interrupt, which is run at (or close
1433 to) a given time. When running, it is just like a tasklet (in fact, they
1434 are called from the ``TIMER_SOFTIRQ``).
1435
1436UP
1437 Uni-Processor: Non-SMP. (``CONFIG_SMP=n``).
1438
1439User Context
1440 The kernel executing on behalf of a particular process (ie. a system
1441 call or trap) or kernel thread. (You can tell which process with the
1442 ``current`` macro.) Not to be confused with userspace. Can be
1443 interrupted by software or hardware interrupts.
1444
1445Userspace
1446 A process executing its own code outside the kernel.
diff --git a/Documentation/lsm.txt b/Documentation/lsm.txt
new file mode 100644
index 000000000000..ad4dfd020e0d
--- /dev/null
+++ b/Documentation/lsm.txt
@@ -0,0 +1,201 @@
1========================================================
2Linux Security Modules: General Security Hooks for Linux
3========================================================
4
5:Author: Stephen Smalley
6:Author: Timothy Fraser
7:Author: Chris Vance
8
9.. note::
10
11 The APIs described in this book are outdated.
12
13Introduction
14============
15
16In March 2001, the National Security Agency (NSA) gave a presentation
17about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit.
18SELinux is an implementation of flexible and fine-grained
19nondiscretionary access controls in the Linux kernel, originally
20implemented as its own particular kernel patch. Several other security
21projects (e.g. RSBAC, Medusa) have also developed flexible access
22control architectures for the Linux kernel, and various projects have
23developed particular access control models for Linux (e.g. LIDS, DTE,
24SubDomain). Each project has developed and maintained its own kernel
25patch to support its security needs.
26
27In response to the NSA presentation, Linus Torvalds made a set of
28remarks that described a security framework he would be willing to
29consider for inclusion in the mainstream Linux kernel. He described a
30general framework that would provide a set of security hooks to control
31operations on kernel objects and a set of opaque security fields in
32kernel data structures for maintaining security attributes. This
33framework could then be used by loadable kernel modules to implement any
34desired model of security. Linus also suggested the possibility of
35migrating the Linux capabilities code into such a module.
36
37The Linux Security Modules (LSM) project was started by WireX to develop
38such a framework. LSM is a joint development effort by several security
39projects, including Immunix, SELinux, SGI and Janus, and several
40individuals, including Greg Kroah-Hartman and James Morris, to develop a
41Linux kernel patch that implements this framework. The patch is
42currently tracking the 2.4 series and is targeted for integration into
43the 2.5 development series. This technical report provides an overview
44of the framework and the example capabilities security module provided
45by the LSM kernel patch.
46
47LSM Framework
48=============
49
50The LSM kernel patch provides a general kernel framework to support
51security modules. In particular, the LSM framework is primarily focused
52on supporting access control modules, although future development is
53likely to address other security needs such as auditing. By itself, the
54framework does not provide any additional security; it merely provides
55the infrastructure to support security modules. The LSM kernel patch
56also moves most of the capabilities logic into an optional security
57module, with the system defaulting to the traditional superuser logic.
58This capabilities module is discussed further in
59`LSM Capabilities Module <#cap>`__.
60
61The LSM kernel patch adds security fields to kernel data structures and
62inserts calls to hook functions at critical points in the kernel code to
63manage the security fields and to perform access control. It also adds
64functions for registering and unregistering security modules, and adds a
65general :c:func:`security()` system call to support new system calls
66for security-aware applications.
67
68The LSM security fields are simply ``void*`` pointers. For process and
69program execution security information, security fields were added to
70:c:type:`struct task_struct <task_struct>` and
71:c:type:`struct linux_binprm <linux_binprm>`. For filesystem
72security information, a security field was added to :c:type:`struct
73super_block <super_block>`. For pipe, file, and socket security
74information, security fields were added to :c:type:`struct inode
75<inode>` and :c:type:`struct file <file>`. For packet and
76network device security information, security fields were added to
77:c:type:`struct sk_buff <sk_buff>` and :c:type:`struct
78net_device <net_device>`. For System V IPC security information,
79security fields were added to :c:type:`struct kern_ipc_perm
80<kern_ipc_perm>` and :c:type:`struct msg_msg
81<msg_msg>`; additionally, the definitions for :c:type:`struct
82msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel
83were moved to header files (``include/linux/msg.h`` and
84``include/linux/shm.h`` as appropriate) to allow the security modules to
85use these definitions.
86
87Each LSM hook is a function pointer in a global table, security_ops.
88This table is a :c:type:`struct security_operations
89<security_operations>` structure as defined by
90``include/linux/security.h``. Detailed documentation for each hook is
91included in this header file. At present, this structure consists of a
92collection of substructures that group related hooks based on the kernel
93object (e.g. task, inode, file, sk_buff, etc) as well as some top-level
94hook function pointers for system operations. This structure is likely
95to be flattened in the future for performance. The placement of the hook
96calls in the kernel code is described by the "called:" lines in the
97per-hook documentation in the header file. The hook calls can also be
98easily found in the kernel code by looking for the string
99"security_ops->".
100
101Linus mentioned per-process security hooks in his original remarks as a
102possible alternative to global security hooks. However, if LSM were to
103start from the perspective of per-process hooks, then the base framework
104would have to deal with how to handle operations that involve multiple
105processes (e.g. kill), since each process might have its own hook for
106controlling the operation. This would require a general mechanism for
107composing hooks in the base framework. Additionally, LSM would still
108need global hooks for operations that have no process context (e.g.
109network input operations). Consequently, LSM provides global security
110hooks, but a security module is free to implement per-process hooks
111(where that makes sense) by storing a security_ops table in each
112process' security field and then invoking these per-process hooks from
113the global hooks. The problem of composition is thus deferred to the
114module.
115
116The global security_ops table is initialized to a set of hook functions
117provided by a dummy security module that provides traditional superuser
118logic. A :c:func:`register_security()` function (in
119``security/security.c``) is provided to allow a security module to set
120security_ops to refer to its own hook functions, and an
121:c:func:`unregister_security()` function is provided to revert
122security_ops to the dummy module hooks. This mechanism is used to set
123the primary security module, which is responsible for making the final
124decision for each hook.
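
The registration mechanism described above can be sketched in userspace C. This is a much-reduced model, not the LSM patch itself: the single ``may_do_privileged`` hook, the ``strict_ops`` example module and the rule that registration fails when a module is already installed are assumptions made for illustration; the real table holds many more hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced hook table with a single hook. */
struct security_operations {
	int (*may_do_privileged)(int uid);	/* 0 = allow, non-zero = deny */
};

/* Dummy module: traditional superuser logic, only uid 0 is privileged. */
static int dummy_may_do_privileged(int uid)
{
	return uid == 0 ? 0 : -1;
}

static struct security_operations dummy_security_ops = {
	.may_do_privileged = dummy_may_do_privileged,
};

/* The global table starts out pointing at the dummy module. */
static struct security_operations *security_ops = &dummy_security_ops;

/* Install a primary module. Refusing a second module is an assumption. */
static int register_security(struct security_operations *ops)
{
	if (ops == NULL || security_ops != &dummy_security_ops)
		return -1;
	security_ops = ops;
	return 0;
}

/* Revert to the dummy hooks; only the installed module may do this. */
static int unregister_security(struct security_operations *ops)
{
	if (ops != security_ops || ops == &dummy_security_ops)
		return -1;
	security_ops = &dummy_security_ops;
	return 0;
}

/* How a call site in the base kernel would consult the current module. */
static int security_may_do_privileged(int uid)
{
	assert(security_ops != NULL);
	return security_ops->may_do_privileged(uid);
}

/* Example module that denies the operation to everyone, root included. */
static int strict_may_do_privileged(int uid)
{
	(void)uid;
	return -1;
}

static struct security_operations strict_ops = {
	.may_do_privileged = strict_may_do_privileged,
};
```

Because every call site goes through the ``security_ops`` pointer, swapping the table is enough to change the decision for all hooks at once, which is what makes the registered module the "primary" one.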
125
126LSM also provides a simple mechanism for stacking additional security
127modules with the primary security module. It defines
128:c:func:`register_security()` and
129:c:func:`unregister_security()` hooks in the :c:type:`struct
130security_operations <security_operations>` structure and
131provides :c:func:`mod_reg_security()` and
132:c:func:`mod_unreg_security()` functions that invoke these hooks
133after performing some sanity checking. A security module can call these
134functions in order to stack with other modules. However, the actual
135details of how this stacking is handled are deferred to the module,
136which can implement these hooks in any way it wishes (including always
137returning an error if it does not wish to support stacking). In this
138manner, LSM again defers the problem of composition to the module.
139
140Although the LSM hooks are organized into substructures based on kernel
141object, all of the hooks can be viewed as falling into two major
142categories: hooks that are used to manage the security fields and hooks
143that are used to perform access control. Examples of the first category
144of hooks include the :c:func:`alloc_security()` and
145:c:func:`free_security()` hooks defined for each kernel data
146structure that has a security field. These hooks are used to allocate
147and free security structures for kernel objects. The first category of
148hooks also includes hooks that set information in the security field
149after allocation, such as the :c:func:`post_lookup()` hook in
150:c:type:`struct inode_security_ops <inode_security_ops>`.
151This hook is used to set security information for inodes after
152successful lookup operations. An example of the second category of hooks
153is the :c:func:`permission()` hook in :c:type:`struct
154inode_security_ops <inode_security_ops>`. This hook checks
155permission when accessing an inode.
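
The two hook categories can be illustrated with a small userspace C sketch. Everything here is a stand-in: ``struct inode`` is reduced to its security field, ``struct inode_label`` and the "level" comparison are a hypothetical policy, and the function signatures are simplified rather than copied from ``include/linux/security.h``.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel object: only the opaque security field matters. */
struct inode {
	int owner_uid;
	void *security;
};

/* Hypothetical module-private data hung off the security field. */
struct inode_label {
	int level;
};

/* Category 1: allocate the security structure for a new object. */
static int label_alloc_security(struct inode *inode)
{
	struct inode_label *label = calloc(1, sizeof(*label));

	if (label == NULL)
		return -1;
	inode->security = label;
	return 0;
}

/* Category 1: release the security structure when the object goes away. */
static void label_free_security(struct inode *inode)
{
	free(inode->security);
	inode->security = NULL;
}

/* Category 1: fill in security information after a successful lookup. */
static void label_post_lookup(struct inode *inode, int level)
{
	struct inode_label *label = inode->security;

	assert(label != NULL);
	label->level = level;
}

/* Category 2: access control. Returns 0 to allow, non-zero to deny. */
static int label_permission(const struct inode *inode, int subject_level)
{
	const struct inode_label *label = inode->security;

	assert(label != NULL);
	return subject_level >= label->level ? 0 : -1;
}
```

The base kernel never looks inside ``inode->security``; only the module's own hooks interpret it, which is why the field can be a plain ``void*``.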
156
157LSM Capabilities Module
158=======================
159
160The LSM kernel patch moves most of the existing POSIX.1e capabilities
161logic into an optional security module stored in the file
162``security/capability.c``. This change allows users who do not want to
163use capabilities to omit this code entirely from their kernel, instead
164using the dummy module for traditional superuser logic or any other
165module that they desire. This change also allows the developers of the
166capabilities logic to maintain and enhance their code more freely,
167without needing to integrate patches back into the base kernel.
168
169In addition to moving the capabilities logic, the LSM kernel patch could
170move the capability-related fields from the kernel data structures into
171the new security fields managed by the security modules. However, at
172present, the LSM kernel patch leaves the capability fields in the kernel
173data structures. In his original remarks, Linus suggested that this
174might be preferable so that other security modules can be easily stacked
175with the capabilities module without needing to chain multiple security
176structures on the security field. It also avoids imposing extra overhead
177on the capabilities module to manage the security fields. However, the
178LSM framework could certainly support such a move if it is determined to
179be desirable, with only a few additional changes described below.
180
181At present, the capabilities logic for computing process capabilities on
182:c:func:`execve()` and :c:func:`set\*uid()`, checking
183capabilities for a particular process, saving and checking capabilities
184for netlink messages, and handling the :c:func:`capget()` and
185:c:func:`capset()` system calls have been moved into the
186capabilities module. There are still a few locations in the base kernel
187where capability-related fields are directly examined or modified, but
188the current version of the LSM patch does allow a security module to
189completely replace the assignment and testing of capabilities. These few
190locations would need to be changed if the capability-related fields were
191moved into the security field. The following is a list of known
192locations that still perform such direct examination or modification of
193capability-related fields:
194
195- ``fs/open.c``::c:func:`sys_access()`
196
197- ``fs/lockd/host.c``::c:func:`nlm_bind_host()`
198
199- ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()`
200
201- ``fs/proc/array.c``::c:func:`task_cap()`
diff --git a/Documentation/media/uapi/v4l/vidioc-g-selection.rst b/Documentation/media/uapi/v4l/vidioc-g-selection.rst
index deb1f6fb473b..b80d85cb8891 100644
--- a/Documentation/media/uapi/v4l/vidioc-g-selection.rst
+++ b/Documentation/media/uapi/v4l/vidioc-g-selection.rst
@@ -129,8 +129,8 @@ Selection targets and flags are documented in
 
 .. _sel-const-adjust:
 
-.. figure:: constraints.*
-    :alt: constraints.pdf / constraints.svg
+.. kernel-figure:: constraints.svg
+    :alt: constraints.svg
     :align: center
 
     Size adjustments with constraint flags.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 9d5e0f853f08..c239a0cf4b1a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -498,11 +498,11 @@ And a couple of implicit varieties:
     This means that ACQUIRE acts as a minimal "acquire" operation and
     RELEASE acts as a minimal "release" operation.
 
-A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
-and RELEASE variants in addition to fully-ordered and relaxed (no barrier
-semantics) definitions. For compound atomics performing both a load and a
-store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
-only to the store portion of the operation.
+A subset of the atomic operations described in core-api/atomic_ops.rst have
+ACQUIRE and RELEASE variants in addition to fully-ordered and relaxed (no
+barrier semantics) definitions. For compound atomics performing both a load
+and a store, ACQUIRE semantics apply only to the load and RELEASE semantics
+apply only to the store portion of the operation.
 
 Memory barriers are only required where there's a possibility of interaction
 between two CPUs or between a CPU and a device. If it can be guaranteed that
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py
new file mode 100644
index 000000000000..40f69e67a883
--- /dev/null
+++ b/Documentation/networking/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = "Linux Networking Documentation"
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'networking.tex', project,
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.txt
index d86adcdae420..eaa8f9a6fd5d 100644
--- a/Documentation/networking/dns_resolver.txt
+++ b/Documentation/networking/dns_resolver.txt
@@ -143,7 +143,7 @@ the key will be discarded and recreated when the data it holds has expired.
 dns_query() returns a copy of the value attached to the key, or an error if
 that is indicated instead.
 
-See <file:Documentation/security/keys-request-key.txt> for further
+See <file:Documentation/security/keys/request-key.rst> for further
 information about request-key function.
 
 
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
new file mode 100644
index 000000000000..b5bd87e01f52
--- /dev/null
+++ b/Documentation/networking/index.rst
@@ -0,0 +1,18 @@
1Linux Networking Documentation
2==============================
3
4Contents:
5
6.. toctree::
7 :maxdepth: 2
8
9 kapi
10 z8530book
11
12.. only:: subproject
13
14 Indices
15 =======
16
17 * :ref:`genindex`
18
diff --git a/Documentation/networking/kapi.rst b/Documentation/networking/kapi.rst
new file mode 100644
index 000000000000..580289f345da
--- /dev/null
+++ b/Documentation/networking/kapi.rst
@@ -0,0 +1,147 @@
1=========================================
2Linux Networking and Network Devices APIs
3=========================================
4
5Linux Networking
6================
7
8Networking Base Types
9---------------------
10
11.. kernel-doc:: include/linux/net.h
12 :internal:
13
14Socket Buffer Functions
15-----------------------
16
17.. kernel-doc:: include/linux/skbuff.h
18 :internal:
19
20.. kernel-doc:: include/net/sock.h
21 :internal:
22
23.. kernel-doc:: net/socket.c
24 :export:
25
26.. kernel-doc:: net/core/skbuff.c
27 :export:
28
29.. kernel-doc:: net/core/sock.c
30 :export:
31
32.. kernel-doc:: net/core/datagram.c
33 :export:
34
35.. kernel-doc:: net/core/stream.c
36 :export:
37
38Socket Filter
39-------------
40
41.. kernel-doc:: net/core/filter.c
42 :export:
43
44Generic Network Statistics
45--------------------------
46
47.. kernel-doc:: include/uapi/linux/gen_stats.h
48 :internal:
49
50.. kernel-doc:: net/core/gen_stats.c
51 :export:
52
53.. kernel-doc:: net/core/gen_estimator.c
54 :export:
55
56SUN RPC subsystem
57-----------------
58
59.. kernel-doc:: net/sunrpc/xdr.c
60 :export:
61
62.. kernel-doc:: net/sunrpc/svc_xprt.c
63 :export:
64
65.. kernel-doc:: net/sunrpc/xprt.c
66 :export:
67
68.. kernel-doc:: net/sunrpc/sched.c
69 :export:
70
71.. kernel-doc:: net/sunrpc/socklib.c
72 :export:
73
74.. kernel-doc:: net/sunrpc/stats.c
75 :export:
76
77.. kernel-doc:: net/sunrpc/rpc_pipe.c
78 :export:
79
80.. kernel-doc:: net/sunrpc/rpcb_clnt.c
81 :export:
82
83.. kernel-doc:: net/sunrpc/clnt.c
84 :export:
85
86WiMAX
87-----
88
89.. kernel-doc:: net/wimax/op-msg.c
90 :export:
91
92.. kernel-doc:: net/wimax/op-reset.c
93 :export:
94
95.. kernel-doc:: net/wimax/op-rfkill.c
96 :export:
97
98.. kernel-doc:: net/wimax/stack.c
99 :export:
100
101.. kernel-doc:: include/net/wimax.h
102 :internal:
103
104.. kernel-doc:: include/uapi/linux/wimax.h
105 :internal:
106
107Network device support
108======================
109
110Driver Support
111--------------
112
113.. kernel-doc:: net/core/dev.c
114 :export:
115
116.. kernel-doc:: net/ethernet/eth.c
117 :export:
118
119.. kernel-doc:: net/sched/sch_generic.c
120 :export:
121
122.. kernel-doc:: include/linux/etherdevice.h
123 :internal:
124
125.. kernel-doc:: include/linux/netdevice.h
126 :internal:
127
128PHY Support
129-----------
130
131.. kernel-doc:: drivers/net/phy/phy.c
132 :export:
133
134.. kernel-doc:: drivers/net/phy/phy.c
135 :internal:
136
137.. kernel-doc:: drivers/net/phy/phy_device.c
138 :export:
139
140.. kernel-doc:: drivers/net/phy/phy_device.c
141 :internal:
142
143.. kernel-doc:: drivers/net/phy/mdio_bus.c
144 :export:
145
146.. kernel-doc:: drivers/net/phy/mdio_bus.c
147 :internal:
diff --git a/Documentation/networking/z8530book.rst b/Documentation/networking/z8530book.rst
new file mode 100644
index 000000000000..fea2c40e7973
--- /dev/null
+++ b/Documentation/networking/z8530book.rst
@@ -0,0 +1,256 @@
1=======================
2Z8530 Programming Guide
3=======================
4
5:Author: Alan Cox
6
7Introduction
8============
9
10The Z85x30 family synchronous/asynchronous controller chips are used on
11a large number of cheap network interface cards. The kernel provides a
12core interface layer that is designed to make it easy to provide WAN
13services using this chip.
14
15The current driver only supports synchronous operation. Merging the
16asynchronous driver support into this code to allow any Z85x30 device to
17be used as both a tty interface and as a synchronous controller is a
18project for Linux post the 2.4 release.
19
20Driver Modes
21============
22
23The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices in
24three different modes. Each mode can be applied to an individual channel
25on the chip (each chip has two channels).
26
27The PIO synchronous mode supports the most common Z8530 wiring. Here the
28chip is interfaced to the I/O and interrupt facilities of the host
29machine but not to the DMA subsystem. When running PIO the Z8530 has
30extremely tight timing requirements. Doing high speeds, even with a
31Z85230 will be tricky. Typically you should expect to achieve at best
329600 baud with a Z8C530 and 64Kbits with a Z85230.
33
34The DMA mode supports the chip when it is configured to use dual DMA
35channels on an ISA bus. The better cards tend to support this mode of
36operation for a single channel. With DMA running the Z85230 tops out
37when it starts to hit ISA DMA constraints at about 512Kbits. It is worth
38noting here that many PC machines hang or crash when the chip is driven
39fast enough to hold the ISA bus solid.
40
41Transmit DMA mode uses a single DMA channel. The DMA channel is used for
42transmission as the transmit FIFO is smaller than the receive FIFO. It
43gives better performance than pure PIO mode but is nowhere near as ideal
44as pure DMA mode.
45
46Using the Z85230 driver
47=======================
48
49The Z85230 driver provides the back end interface to your board. To
50configure a Z8530 interface you need to detect the board and to identify
51its ports and interrupt resources. It is also your problem to verify the
52resources are available.
53
54Having identified the chip you need to fill in a struct z8530_dev,
55which describes each chip. This object must exist until you finally
56shut down the board. Firstly zero the active field. This ensures nothing
57goes off without you intending it. The irq field should be set to the
58interrupt number of the chip. (Each chip has a single interrupt source
59rather than each channel). You are responsible for allocating the
60interrupt line. The interrupt handler should be set to
61:c:func:`z8530_interrupt()`. The device id should be set to the
62z8530_dev structure pointer. Whether the interrupt can be shared or not
63is board dependent, and up to you to initialise.
64
65The structure holds two channel structures. Initialise chanA.ctrlio and
66chanA.dataio with the address of the control and data ports. You can OR
67this with Z8530_PORT_SLEEP to indicate your interface needs the 5uS
68delay for chip settling done in software. The PORT_SLEEP option is
69architecture specific. Other flags may become available on future
70platforms, e.g. for MMIO. Initialise chanA.irqs to &z8530_nop to
71start the chip up as disabled and discarding interrupt events. This
72ensures that stray interrupts will be mopped up and not hang the bus.
73Set chanA.dev to point to the device structure itself. The private and
74name field you may use as you wish. The private field is unused by the
75Z85230 layer. The name is used for error reporting and it may thus make
76sense to make it match the network name.
77
78Repeat the same operation with the B channel if your chip has both
79channels wired to something useful. This isn't always the case. If it is
80not wired then the I/O values do not matter, but you must initialise
81chanB.dev.
82
83If your board has DMA facilities then initialise the txdma and rxdma
84fields for the relevant channels. You must also allocate the ISA DMA
85channels and do any necessary board level initialisation to configure
86them. The low level driver will do the Z8530 and DMA controller
87programming but not board specific magic.
88
89Having initialised the device you can then call
90:c:func:`z8530_init()`. This will probe the chip and reset it into
91a known state. An identification sequence is then run to identify the
92chip type. If the checks fail to pass the function returns a non zero
93error code. Typically this indicates that the port given is not valid.
94After this call the type field of the z8530_dev structure is
95initialised to either Z8530, Z85C30 or Z85230 according to the chip
96found.
97
98Once you have called z8530_init you can also make use of the utility
99function :c:func:`z8530_describe()`. This provides a consistent
100reporting format for the Z8530 devices, and allows all the drivers to
101provide consistent reporting.
102
103Attaching Network Interfaces
104============================
105
106If you wish to use the network interface facilities of the driver, then
107you need to attach a network device to each channel that is present and
108in use. In addition, to use the generic HDLC you need to follow some
109additional plumbing rules. They may seem complex but a look at the
110example hostess_sv11 driver should reassure you.
111
112The network device used for each channel should be pointed to by the
113netdevice field of each channel. The hdlc->priv field of the network
114device points to your private data - you will need to be able to find
115your private data from this.
116
117The way most drivers approach this particular problem is to create a
118structure holding the Z8530 device definition and put that into the
119private field of the network device. The network device fields of the
120channels then point back to the network devices.
121
122If you wish to use the generic HDLC then you need to register the HDLC
123device.
124
125Before you register your network device you will also need to provide
126suitable handlers for most of the network device callbacks. See the
127network device documentation for more details on this.
128
129Configuring And Activating The Port
130===================================
131
132The Z85230 driver provides helper functions and tables to load the port
133registers on the Z8530 chips. When programming the register settings for
134a channel be aware that the documentation recommends initialisation
135orders. Strange things happen when these are not followed.
136
137:c:func:`z8530_channel_load()` takes an array of u8 values holding pairs
138of initialisation values. The first value of each pair is the
139Z8530 register number. Add 16 to indicate the alternate register bank on
140the later chips. The array is terminated by a 255.
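
The table layout described above can be illustrated with a small user-space sketch. This is not the driver's code: ``struct shadow_regs`` and ``load_table()`` are hypothetical names invented for the example, and the sketch only records values in a shadow array instead of touching hardware. It assumes 32 shadow slots (16 registers plus the alternate bank selected by adding 16).

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_END  255  /* terminator value, as described in the text */
#define ALT_BANK   16   /* added to a register number for the alternate bank */
#define NUM_SHADOW 32   /* assumption: 16 registers plus 16 alternate-bank slots */

/* Hypothetical shadow copy of the register file; not a driver structure. */
struct shadow_regs {
    uint8_t value[NUM_SHADOW];
    uint8_t written[NUM_SHADOW];
};

/*
 * Walk a table of (register, value) pairs terminated by TABLE_END.
 * Returns the number of pairs consumed, or -1 if a register number
 * falls outside the shadow array.
 */
static int load_table(struct shadow_regs *s, const uint8_t *table)
{
    int pairs = 0;

    while (*table != TABLE_END) {
        uint8_t reg = *table++;   /* first byte of the pair: register number */
        uint8_t val = *table++;   /* second byte of the pair: value to load */

        if (reg >= NUM_SHADOW)
            return -1;
        s->value[reg] = val;
        s->written[reg] = 1;
        pairs++;
    }
    return pairs;
}
```

Only the register position is compared against the terminator, so a value byte of 255 is still legal in this sketch.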
141
142The driver provides a pair of public tables. The z8530_hdlc_kilostream
143table is for the UK 'Kilostream' service and also happens to cover most
144other end host configurations. The z8530_hdlc_kilostream_85230 table
145is the same configuration using the enhancements of the 85230 chip. The
146configuration loaded is standard NRZ encoded synchronous data with HDLC
147bitstuffing. All of the timing is taken from the other end of the link.
148
149When writing your own tables be aware that the driver internally tracks
150register values. It may need to reload values. You should therefore be
151sure to set registers 1-7, 9-11, 14 and 15 in all configurations. Where
152the register settings depend on DMA selection the driver will update the
153bits itself when you open or close. Loading a new table with the
154interface open is not recommended.
155
156There are three standard configurations supported by the core code. In
157PIO mode the interface is programmed up to use interrupt driven PIO.
158This places high demands on the host processor to avoid latency. The
159driver is written to take account of latency issues but it cannot avoid
160latencies caused by other drivers, notably IDE in PIO mode. Because the
161drivers allocate buffers you must also prevent MTU changes while the
162port is open.
163
164Once the port is open it will call the rx_function of each channel
165whenever a completed packet arrives. This is invoked from interrupt
166context and passes you the channel and a network buffer (struct
167sk_buff) holding the data. The data includes the CRC bytes so most
168users will want to trim the last two bytes before processing the data.
169This function is very timing critical. When you wish to simply discard
170data the support code provides the function
171:c:func:`z8530_null_rx()` to discard the data.
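
As a rough illustration of the CRC trimming mentioned above, the length arithmetic can be sketched in plain C. ``payload_len()`` and ``HDLC_CRC_LEN`` are hypothetical names used only here. In a real receive handler the equivalent step would shorten the sk_buff itself, for instance with ``skb_trim()``, rather than compute a bare length.

```c
#include <stddef.h>

#define HDLC_CRC_LEN 2  /* trailing CRC bytes delivered with each frame */

/*
 * Return the payload length once the trailing CRC is removed.
 * Frames too short to hold any payload yield 0 in this sketch.
 */
static size_t payload_len(size_t frame_len)
{
    return frame_len > HDLC_CRC_LEN ? frame_len - HDLC_CRC_LEN : 0;
}
```

Guarding against short frames avoids an unsigned underflow if a runt frame ever reaches the handler.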
172
173To activate PIO mode sending and receiving, ``z8530_sync_open`` is called.
174This expects to be passed the network device and the channel. Typically
175this is called from your network device open callback. On a failure a
176non zero error status is returned.
177The :c:func:`z8530_sync_close()` function shuts down a PIO
178channel. This must be done before the channel is opened again and before
179the driver shuts down and unloads.
180
181The ideal mode of operation is dual channel DMA mode. Here the kernel
182driver will configure the board for DMA in both directions. The driver
183also handles ISA DMA issues such as controller programming and the
184memory range limit for you. This mode is activated by calling the
185:c:func:`z8530_sync_dma_open()` function. On failure a non zero
186error value is returned. Once this mode is activated it can be shut down
187by calling the :c:func:`z8530_sync_dma_close()`. You must call
188the close function matching the open mode you used.
189
190The final supported mode uses a single DMA channel to drive the transmit
191side. As the Z85C30 has a larger FIFO on the receive channel this tends
192to increase the maximum speed a little. This is activated by calling the
193``z8530_sync_txdma_open``. This returns a non zero error code on failure. The
194:c:func:`z8530_sync_txdma_close()` function closes down the Z8530
195interface from this mode.
196
197Network Layer Functions
198=======================
199
200The Z8530 layer provides functions to queue packets for transmission.
201The driver internally buffers the frame currently being transmitted and
202one further frame (in order to keep back to back transmission running).
203Any further buffering is up to the caller.
204
205The function :c:func:`z8530_queue_xmit()` takes a network buffer
206in sk_buff format and queues it for transmission. The caller must
207provide the entire packet with the exception of the bitstuffing and CRC.
208This is normally done by the caller via the generic HDLC interface
209layer. It returns 0 if the buffer has been queued and non zero values
210for queue full. If the function accepts the buffer it becomes property
211of the Z8530 layer and the caller should not free it.
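
The buffering rule described above (one frame in flight plus one further frame, with a non zero return when full) can be modelled in a few lines of user-space C. This is a sketch of the behaviour as documented, not the driver's implementation; ``struct tx_model``, ``model_queue_xmit()`` and ``model_tx_done()`` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical model of the two transmit slots described in the text. */
struct tx_model {
    const void *current;  /* frame currently being transmitted */
    const void *next;     /* one further frame queued behind it */
};

/*
 * Offer a frame to the model. Returns 0 if the frame was accepted
 * (the model now owns it) and 1 if both slots are occupied, in which
 * case the caller keeps ownership and must buffer or retry.
 */
static int model_queue_xmit(struct tx_model *q, const void *frame)
{
    if (q->next != NULL)
        return 1;
    if (q->current == NULL)
        q->current = frame;
    else
        q->next = frame;
    return 0;
}

/* Called when the current frame has gone out: promote the queued one. */
static void model_tx_done(struct tx_model *q)
{
    q->current = q->next;
    q->next = NULL;
}
```

A caller that receives the non zero return would typically stop its own queue until a slot frees up, since any buffering beyond these two frames is its responsibility.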
212
213The function :c:func:`z8530_get_stats()` returns a pointer to an
214internally maintained per interface statistics block. This provides most
215of the interface code needed to implement the network layer get_stats
216callback.
217
218Porting The Z8530 Driver
219========================
220
221The Z8530 driver is written to be portable. In DMA mode it makes
222assumptions about the use of ISA DMA. These are probably warranted in
223most cases as the Z85230 in particular was designed to glue to PC type
224machines. The PIO mode makes no real assumptions.
225
226Should you need to retarget the Z8530 driver to another architecture the
227only code that should need changing are the port I/O functions. At the
228moment these assume PC I/O port accesses. This may not be appropriate
229for all platforms. Replacing :c:func:`z8530_read_port()` and
230``z8530_write_port`` is intended to be all that is required to port
231this driver layer.
232
233Known Bugs And Assumptions
234==========================
235
236Interrupt Locking
237 The locking in the driver is done via the global cli/sti lock. This
238 makes for relatively poor SMP performance. Switching this to use a
239 per device spin lock would probably materially improve performance.
240
241Occasional Failures
242 We have reports of occasional failures when run for very long
243 periods of time and the driver starts to receive junk frames. At the
244 moment the cause of this is not clear.
245
246Public Functions Provided
247=========================
248
249.. kernel-doc:: drivers/net/wan/z85230.c
250 :export:
251
252Internal Functions
253==================
254
255.. kernel-doc:: drivers/net/wan/z85230.c
256 :internal:
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index e25d63f8c0da..3aed751e0cb5 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -116,12 +116,11 @@ DevFS has been obsoleted in favour of udev
116 116
117Linux documentation for functions is transitioning to inline 117Linux documentation for functions is transitioning to inline
118documentation via specially-formatted comments near their 118documentation via specially-formatted comments near their
119definitions in the source. These comments can be combined with the 119definitions in the source. These comments can be combined with ReST
120SGML templates in the Documentation/DocBook directory to make DocBook 120files the Documentation/ directory to make enriched documentation, which can
121files, which can then be converted by DocBook stylesheets to PostScript, 121then be converted to PostScript, HTML, LaTex, ePUB and PDF files.
122HTML, PDF files, and several other formats. In order to convert from 122In order to convert from ReST format to a format of your choice, you'll need
123DocBook format to a format of your choice, you'll need to install Jade as 123Sphinx.
124well as the desired DocBook stylesheets.
125 124
126Util-linux 125Util-linux
127---------- 126----------
@@ -323,12 +322,6 @@ PDF outputs, it is recommended to use version 1.4.6.
323 functionalities required for ``XeLaTex`` to work. For PDF output you'll also 322 functionalities required for ``XeLaTex`` to work. For PDF output you'll also
324 need ``convert(1)`` from ImageMagick (https://www.imagemagick.org). 323 need ``convert(1)`` from ImageMagick (https://www.imagemagick.org).
325 324
326Other tools
327-----------
328
329In order to produce documentation from DocBook, you'll also need ``xmlto``.
330Please notice, however, that we're currently migrating all documents to use
331``Sphinx``.
332 325
333Getting updated software 326Getting updated software
334======================== 327========================
@@ -409,15 +402,6 @@ Quota-tools
409 402
410- <http://sourceforge.net/projects/linuxquota/> 403- <http://sourceforge.net/projects/linuxquota/>
411 404
412DocBook Stylesheets
413-------------------
414
415- <http://sourceforge.net/projects/docbook/files/docbook-dsssl/>
416
417XMLTO XSLT Frontend
418-------------------
419
420- <http://cyberelk.net/tim/xmlto/>
421 405
422Intel P6 microcode 406Intel P6 microcode
423------------------ 407------------------
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index d20d52a4d812..a20b44a40ec4 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -980,8 +980,8 @@ do so, though, and doing so unnecessarily can limit optimization.
980 980
981When writing a single inline assembly statement containing multiple 981When writing a single inline assembly statement containing multiple
982instructions, put each instruction on a separate line in a separate quoted 982instructions, put each instruction on a separate line in a separate quoted
983string, and end each string except the last with \n\t to properly indent the 983string, and end each string except the last with ``\n\t`` to properly indent
984next instruction in the assembly output: 984the next instruction in the assembly output:
985 985
986.. code-block:: c 986.. code-block:: c
987 987
diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst
index ac892b30815e..07faa5457bcb 100644
--- a/Documentation/process/email-clients.rst
+++ b/Documentation/process/email-clients.rst
@@ -167,6 +167,11 @@ Lotus Notes (GUI)
167 167
168Run away from it. 168Run away from it.
169 169
170IBM Verse (Web GUI)
171*******************
172
173See Lotus Notes.
174
170Mutt (TUI) 175Mutt (TUI)
171********** 176**********
172 177
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst
index 1260f60d4cb9..c6875b1db56f 100644
--- a/Documentation/process/howto.rst
+++ b/Documentation/process/howto.rst
@@ -180,14 +180,6 @@ They can also be generated on LaTeX and ePub formats with::
180 make latexdocs 180 make latexdocs
181 make epubdocs 181 make epubdocs
182 182
183Currently, there are some documents written on DocBook that are in
184the process of conversion to ReST. Such documents will be created in the
185Documentation/DocBook/ directory and can be generated also as
186Postscript or man pages by running::
187
188 make psdocs
189 make mandocs
190
191Becoming A Kernel Developer 183Becoming A Kernel Developer
192--------------------------- 184---------------------------
193 185
diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst
index 05a7857a4a83..b8cac85a4001 100644
--- a/Documentation/process/kernel-docs.rst
+++ b/Documentation/process/kernel-docs.rst
@@ -40,50 +40,18 @@ Enjoy!
40Docs at the Linux Kernel tree 40Docs at the Linux Kernel tree
41----------------------------- 41-----------------------------
42 42
43The DocBook books should be built with ``make {htmldocs | psdocs | pdfdocs}``.
44The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``. 43The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``.
45 44
46 * Name: **linux/Documentation** 45 * Name: **linux/Documentation**
47 46
48 :Author: Many. 47 :Author: Many.
49 :Location: Documentation/ 48 :Location: Documentation/
50 :Keywords: text files, Sphinx, DocBook. 49 :Keywords: text files, Sphinx.
51 :Description: Documentation that comes with the kernel sources, 50 :Description: Documentation that comes with the kernel sources,
52 inside the Documentation directory. Some pages from this document 51 inside the Documentation directory. Some pages from this document
53 (including this document itself) have been moved there, and might 52 (including this document itself) have been moved there, and might
54 be more up to date than the web version. 53 be more up to date than the web version.
55 54
56 * Title: **The Kernel Hacking HOWTO**
57
58 :Author: Various Talented People, and Rusty.
59 :Location: Documentation/DocBook/kernel-hacking.tmpl
60 :Keywords: HOWTO, kernel contexts, deadlock, locking, modules,
61 symbols, return conventions.
62 :Description: From the Introduction: "Please understand that I
63 never wanted to write this document, being grossly underqualified,
64 but I always wanted to read it, and this was the only way. I
65 simply explain some best practices, and give reading entry-points
66 into the kernel sources. I avoid implementation details: that's
67 what the code is for, and I ignore whole tracts of useful
68 routines. This document assumes familiarity with C, and an
69 understanding of what the kernel is, and how it is used. It was
70 originally written for the 2.3 kernels, but nearly all of it
71 applies to 2.2 too; 2.0 is slightly different".
72
73 * Title: **Linux Kernel Locking HOWTO**
74
75 :Author: Various Talented People, and Rusty.
76 :Location: Documentation/DocBook/kernel-locking.tmpl
77 :Keywords: locks, locking, spinlock, semaphore, atomic, race
78 condition, bottom halves, tasklets, softirqs.
79 :Description: The title says it all: document describing the
80 locking system in the Linux Kernel either in uniprocessor or SMP
81 systems.
82 :Notes: "It was originally written for the later (>2.3.47) 2.3
83 kernels, but most of it applies to 2.2 too; 2.0 is slightly
84 different". Freely redistributable under the conditions of the GNU
85 General Public License.
86
87On-line docs 55On-line docs
88------------ 56------------
89 57
diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX
deleted file mode 100644
index 45c82fd3e9d3..000000000000
--- a/Documentation/security/00-INDEX
+++ /dev/null
@@ -1,26 +0,0 @@
100-INDEX
2 - this file.
3LSM.txt
4 - description of the Linux Security Module framework.
5SELinux.txt
6 - how to get started with the SELinux security enhancement.
7Smack.txt
8 - documentation on the Smack Linux Security Module.
9Yama.txt
10 - documentation on the Yama Linux Security Module.
11apparmor.txt
12 - documentation on the AppArmor security extension.
13credentials.txt
14 - documentation about credentials in Linux.
15keys-ecryptfs.txt
16 - description of the encryption keys for the ecryptfs filesystem.
17keys-request-key.txt
18 - description of the kernel key request service.
19keys-trusted-encrypted.txt
20 - info on the Trusted and Encrypted keys in the kernel key ring service.
21keys.txt
22 - description of the kernel key retention service.
23tomoyo.txt
24 - documentation on the TOMOYO Linux Security Module.
25IMA-templates.txt
26 - documentation on the template management mechanism for IMA.
diff --git a/Documentation/security/IMA-templates.txt b/Documentation/security/IMA-templates.rst
index 839b5dad9226..2cd0e273cc9a 100644
--- a/Documentation/security/IMA-templates.txt
+++ b/Documentation/security/IMA-templates.rst
@@ -1,9 +1,12 @@
1 IMA Template Management Mechanism 1=================================
2IMA Template Management Mechanism
3=================================
2 4
3 5
4==== INTRODUCTION ==== 6Introduction
7============
5 8
6The original 'ima' template is fixed length, containing the filedata hash 9The original ``ima`` template is fixed length, containing the filedata hash
7and pathname. The filedata hash is limited to 20 bytes (md5/sha1). 10and pathname. The filedata hash is limited to 20 bytes (md5/sha1).
8The pathname is a null terminated string, limited to 255 characters. 11The pathname is a null terminated string, limited to 255 characters.
9To overcome these limitations and to add additional file metadata, it is 12To overcome these limitations and to add additional file metadata, it is
@@ -28,61 +31,64 @@ a new data type, developers define the field identifier and implement
28two functions, init() and show(), respectively to generate and display 31two functions, init() and show(), respectively to generate and display
29measurement entries. Defining a new template descriptor requires 32measurement entries. Defining a new template descriptor requires
30specifying the template format (a string of field identifiers separated 33specifying the template format (a string of field identifiers separated
31by the '|' character) through the 'ima_template_fmt' kernel command line 34by the ``|`` character) through the ``ima_template_fmt`` kernel command line
32parameter. At boot time, IMA initializes the chosen template descriptor 35parameter. At boot time, IMA initializes the chosen template descriptor
33by translating the format into an array of template fields structures taken 36by translating the format into an array of template fields structures taken
34from the set of the supported ones. 37from the set of the supported ones.
35 38
36After the initialization step, IMA will call ima_alloc_init_template() 39After the initialization step, IMA will call ``ima_alloc_init_template()``
37(new function defined within the patches for the new template management 40(new function defined within the patches for the new template management
38mechanism) to generate a new measurement entry by using the template 41mechanism) to generate a new measurement entry by using the template
39descriptor chosen through the kernel configuration or through the newly 42descriptor chosen through the kernel configuration or through the newly
40introduced 'ima_template' and 'ima_template_fmt' kernel command line parameters. 43introduced ``ima_template`` and ``ima_template_fmt`` kernel command line parameters.
41It is during this phase that the advantages of the new architecture are 44It is during this phase that the advantages of the new architecture are
42clearly shown: the latter function will not contain specific code to handle 45clearly shown: the latter function will not contain specific code to handle
43a given template but, instead, it simply calls the init() method of the template 46a given template but, instead, it simply calls the ``init()`` method of the template
44fields associated to the chosen template descriptor and store the result 47fields associated to the chosen template descriptor and store the result
45(pointer to allocated data and data length) in the measurement entry structure. 48(pointer to allocated data and data length) in the measurement entry structure.
46 49
47The same mechanism is employed to display measurements entries. 50The same mechanism is employed to display measurements entries.
48The functions ima[_ascii]_measurements_show() retrieve, for each entry, 51The functions ``ima[_ascii]_measurements_show()`` retrieve, for each entry,
49the template descriptor used to produce that entry and call the show() 52the template descriptor used to produce that entry and call the show()
50method for each item of the array of template fields structures. 53method for each item of the array of template fields structures.
51 54
52 55
53 56
54==== SUPPORTED TEMPLATE FIELDS AND DESCRIPTORS ==== 57Supported Template Fields and Descriptors
58=========================================
55 59
56In the following, there is the list of supported template fields 60In the following, there is the list of supported template fields
57('<identifier>': description), that can be used to define new template 61``('<identifier>': description)``, that can be used to define new template
58descriptors by adding their identifier to the format string 62descriptors by adding their identifier to the format string
59(support for more data types will be added later): 63(support for more data types will be added later):
60 64
61 - 'd': the digest of the event (i.e. the digest of a measured file), 65 - 'd': the digest of the event (i.e. the digest of a measured file),
62 calculated with the SHA1 or MD5 hash algorithm; 66 calculated with the SHA1 or MD5 hash algorithm;
63 - 'n': the name of the event (i.e. the file name), with size up to 255 bytes; 67 - 'n': the name of the event (i.e. the file name), with size up to 255 bytes;
64 - 'd-ng': the digest of the event, calculated with an arbitrary hash 68 - 'd-ng': the digest of the event, calculated with an arbitrary hash
65 algorithm (field format: [<hash algo>:]digest, where the digest 69 algorithm (field format: [<hash algo>:]digest, where the digest
66 prefix is shown only if the hash algorithm is not SHA1 or MD5); 70 prefix is shown only if the hash algorithm is not SHA1 or MD5);
67 - 'n-ng': the name of the event, without size limitations; 71 - 'n-ng': the name of the event, without size limitations;
68 - 'sig': the file signature. 72 - 'sig': the file signature.
69 73
70 74
71Below, there is the list of defined template descriptors: 75Below, there is the list of defined template descriptors:
72 - "ima": its format is 'd|n';
73 - "ima-ng" (default): its format is 'd-ng|n-ng';
74 - "ima-sig": its format is 'd-ng|n-ng|sig'.
75 76
77 - "ima": its format is ``d|n``;
78 - "ima-ng" (default): its format is ``d-ng|n-ng``;
79 - "ima-sig": its format is ``d-ng|n-ng|sig``.
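
To make the format strings above concrete, here is a minimal user-space sketch that splits a format on the ``|`` character and checks each identifier against the field list given earlier. It is not the IMA code: ``count_template_fields()`` and ``supported_fields`` are hypothetical names, and the real kernel parser may behave differently.

```c
#include <stddef.h>
#include <string.h>

/* Field identifiers listed in the text above. */
static const char *const supported_fields[] = {
    "d", "n", "d-ng", "n-ng", "sig"
};

/*
 * Count the fields in a '|'-separated template format string.
 * Returns the number of fields, or -1 if any field is empty or
 * is not one of the supported identifiers.
 */
static int count_template_fields(const char *fmt)
{
    const size_t n = sizeof(supported_fields) / sizeof(supported_fields[0]);
    const char *p = fmt;
    int count = 0;

    for (;;) {
        size_t len = strcspn(p, "|");  /* length of the current field */
        size_t i;
        int known = 0;

        for (i = 0; i < n; i++) {
            if (strlen(supported_fields[i]) == len &&
                strncmp(p, supported_fields[i], len) == 0) {
                known = 1;
                break;
            }
        }
        if (!known)
            return -1;
        count++;
        if (p[len] == '\0')
            break;
        p += len + 1;                  /* skip past the '|' separator */
    }
    return count;
}
```

With this sketch the ``ima-sig`` format ``d-ng|n-ng|sig`` yields three fields, while a format containing an unknown identifier is rejected.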
76 80
77 81
78==== USE ==== 82
83Use
84===
79 85
80To specify the template descriptor to be used to generate measurement entries, 86To specify the template descriptor to be used to generate measurement entries,
81currently the following methods are supported: 87currently the following methods are supported:
82 88
83 - select a template descriptor among those supported in the kernel 89 - select a template descriptor among those supported in the kernel
84 configuration ('ima-ng' is the default choice); 90 configuration (``ima-ng`` is the default choice);
85 - specify a template descriptor name from the kernel command line through 91 - specify a template descriptor name from the kernel command line through
86 the 'ima_template=' parameter; 92 the ``ima_template=`` parameter;
87 - register a new template descriptor with custom format through the kernel 93 - register a new template descriptor with custom format through the kernel
88 command line parameter 'ima_template_fmt='. 94 command line parameter ``ima_template_fmt=``.
diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst
new file mode 100644
index 000000000000..d75778b0fa10
--- /dev/null
+++ b/Documentation/security/LSM.rst
@@ -0,0 +1,14 @@
1=================================
2Linux Security Module Development
3=================================
4
5Based on https://lkml.org/lkml/2007/10/26/215,
6a new LSM is accepted into the kernel when its intent (a description of
7what it tries to protect against and in what cases one would expect to
8use it) has been appropriately documented in ``Documentation/security/LSM``.
9This allows an LSM's code to be easily compared to its goals, and so
10that end users and distros can make a more informed decision about which
11LSMs suit their requirements.
12
13For extensive documentation on the available LSM hook interfaces, please
14see ``include/linux/lsm_hooks.h``.
diff --git a/Documentation/security/conf.py b/Documentation/security/conf.py
deleted file mode 100644
index 472fc9a8eb67..000000000000
--- a/Documentation/security/conf.py
+++ /dev/null
@@ -1,8 +0,0 @@
1project = "The kernel security subsystem manual"
2
3tags.add("subproject")
4
5latex_documents = [
6 ('index', 'security.tex', project,
7 'The kernel development community', 'manual'),
8]
diff --git a/Documentation/security/credentials.txt b/Documentation/security/credentials.rst
index 86257052e31a..038a7e19eff9 100644
--- a/Documentation/security/credentials.txt
+++ b/Documentation/security/credentials.rst
@@ -1,38 +1,18 @@
1 ==================== 1====================
2 CREDENTIALS IN LINUX 2Credentials in Linux
3 ==================== 3====================
4 4
5By: David Howells <dhowells@redhat.com> 5By: David Howells <dhowells@redhat.com>
6 6
7Contents: 7.. contents:: :local:
8
9 (*) Overview.
10
11 (*) Types of credentials.
12
13 (*) File markings.
14
15 (*) Task credentials.
16 8
17 - Immutable credentials. 9Overview
18 - Accessing task credentials.
19 - Accessing another task's credentials.
20 - Altering credentials.
21 - Managing credentials.
22
23 (*) Open file credentials.
24
25 (*) Overriding the VFS's use of credentials.
26
27
28========
29OVERVIEW
30======== 10========
31 11
32There are several parts to the security check performed by Linux when one 12There are several parts to the security check performed by Linux when one
33object acts upon another: 13object acts upon another:
34 14
35 (1) Objects. 15 1. Objects.
36 16
37 Objects are things in the system that may be acted upon directly by 17 Objects are things in the system that may be acted upon directly by
38 userspace programs. Linux has a variety of actionable objects, including: 18 userspace programs. Linux has a variety of actionable objects, including:
@@ -48,7 +28,7 @@ object acts upon another:
48 As a part of the description of all these objects there is a set of 28 As a part of the description of all these objects there is a set of
49 credentials. What's in the set depends on the type of object. 29 credentials. What's in the set depends on the type of object.
50 30
51 (2) Object ownership. 31 2. Object ownership.
52 32
53 Amongst the credentials of most objects, there will be a subset that 33 Amongst the credentials of most objects, there will be a subset that
54 indicates the ownership of that object. This is used for resource 34 indicates the ownership of that object. This is used for resource
@@ -57,7 +37,7 @@ object acts upon another:
57 In a standard UNIX filesystem, for instance, this will be defined by the 37 In a standard UNIX filesystem, for instance, this will be defined by the
58 UID marked on the inode. 38 UID marked on the inode.
59 39
60 (3) The objective context. 40 3. The objective context.
61 41
62 Also amongst the credentials of those objects, there will be a subset that 42 Also amongst the credentials of those objects, there will be a subset that
63 indicates the 'objective context' of that object. This may or may not be 43 indicates the 'objective context' of that object. This may or may not be
@@ -67,7 +47,7 @@ object acts upon another:
67 The objective context is used as part of the security calculation that is 47 The objective context is used as part of the security calculation that is
68 carried out when an object is acted upon. 48 carried out when an object is acted upon.
69 49
70 (4) Subjects. 50 4. Subjects.
71 51
72 A subject is an object that is acting upon another object. 52 A subject is an object that is acting upon another object.
73 53
@@ -77,10 +57,10 @@ object acts upon another:
77 57
78 Objects other than tasks may under some circumstances also be subjects. 58 Objects other than tasks may under some circumstances also be subjects.
79 For instance an open file may send SIGIO to a task using the UID and EUID 59 For instance an open file may send SIGIO to a task using the UID and EUID
80 given to it by a task that called fcntl(F_SETOWN) upon it. In this case, 60 given to it by a task that called ``fcntl(F_SETOWN)`` upon it. In this case,
81 the file struct will have a subjective context too. 61 the file struct will have a subjective context too.
82 62
83 (5) The subjective context. 63 5. The subjective context.
84 64
85 A subject has an additional interpretation of its credentials. A subset 65 A subject has an additional interpretation of its credentials. A subset
86 of its credentials forms the 'subjective context'. The subjective context 66 of its credentials forms the 'subjective context'. The subjective context
@@ -92,7 +72,7 @@ object acts upon another:
92 from the real UID and GID that normally form the objective context of the 72 from the real UID and GID that normally form the objective context of the
93 task. 73 task.
94 74
95 (6) Actions. 75 6. Actions.
96 76
97 Linux has a number of actions available that a subject may perform upon an 77 Linux has a number of actions available that a subject may perform upon an
98 object. The set of actions available depends on the nature of the subject 78 object. The set of actions available depends on the nature of the subject
@@ -101,7 +81,7 @@ object acts upon another:
101 Actions include reading, writing, creating and deleting files; forking or 81 Actions include reading, writing, creating and deleting files; forking or
102 signalling and tracing tasks. 82 signalling and tracing tasks.
103 83
104 (7) Rules, access control lists and security calculations. 84 7. Rules, access control lists and security calculations.
105 85
106 When a subject acts upon an object, a security calculation is made. This 86 When a subject acts upon an object, a security calculation is made. This
107 involves taking the subjective context, the objective context and the 87 involves taking the subjective context, the objective context and the
@@ -111,7 +91,7 @@ object acts upon another:
111 91
112 There are two main sources of rules: 92 There are two main sources of rules:
113 93
114 (a) Discretionary access control (DAC): 94 a. Discretionary access control (DAC):
115 95
116 Sometimes the object will include sets of rules as part of its 96 Sometimes the object will include sets of rules as part of its
117 description. This is an 'Access Control List' or 'ACL'. A Linux 97 description. This is an 'Access Control List' or 'ACL'. A Linux
@@ -127,7 +107,7 @@ object acts upon another:
127 A Linux file might also sport a POSIX ACL. This is a list of rules 107 A Linux file might also sport a POSIX ACL. This is a list of rules
128 that grants various permissions to arbitrary subjects. 108 that grants various permissions to arbitrary subjects.
129 109
130 (b) Mandatory access control (MAC): 110 b. Mandatory access control (MAC):
131 111
132 The system as a whole may have one or more sets of rules that get 112 The system as a whole may have one or more sets of rules that get
133 applied to all subjects and objects, regardless of their source. 113 applied to all subjects and objects, regardless of their source.
@@ -139,65 +119,65 @@ object acts upon another:
139 that says that this action is either granted or denied. 119 that says that this action is either granted or denied.
140 120
141 121
142==================== 122Types of Credentials
143TYPES OF CREDENTIALS
144==================== 123====================
145 124
146The Linux kernel supports the following types of credentials: 125The Linux kernel supports the following types of credentials:
147 126
148 (1) Traditional UNIX credentials. 127 1. Traditional UNIX credentials.
149 128
150 Real User ID 129 - Real User ID
151 Real Group ID 130 - Real Group ID
152 131
153 The UID and GID are carried by most, if not all, Linux objects, even if in 132 The UID and GID are carried by most, if not all, Linux objects, even if in
154 some cases it has to be invented (FAT or CIFS files for example, which are 133 some cases it has to be invented (FAT or CIFS files for example, which are
155 derived from Windows). These (mostly) define the objective context of 134 derived from Windows). These (mostly) define the objective context of
156 that object, with tasks being slightly different in some cases. 135 that object, with tasks being slightly different in some cases.
157 136
158 Effective, Saved and FS User ID 137 - Effective, Saved and FS User ID
159 Effective, Saved and FS Group ID 138 - Effective, Saved and FS Group ID
160 Supplementary groups 139 - Supplementary groups
161 140
162 These are additional credentials used by tasks only. Usually, an 141 These are additional credentials used by tasks only. Usually, an
163 EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID 142 EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
164 will be used as the objective. For tasks, it should be noted that this is 143 will be used as the objective. For tasks, it should be noted that this is
165 not always true. 144 not always true.
166 145
167 (2) Capabilities. 146 2. Capabilities.
168 147
169 Set of permitted capabilities 148 - Set of permitted capabilities
170 Set of inheritable capabilities 149 - Set of inheritable capabilities
171 Set of effective capabilities 150 - Set of effective capabilities
172 Capability bounding set 151 - Capability bounding set
173 152
174 These are only carried by tasks. They indicate superior capabilities 153 These are only carried by tasks. They indicate superior capabilities
175 granted piecemeal to a task that an ordinary task wouldn't otherwise have. 154 granted piecemeal to a task that an ordinary task wouldn't otherwise have.
176 These are manipulated implicitly by changes to the traditional UNIX 155 These are manipulated implicitly by changes to the traditional UNIX
177 credentials, but can also be manipulated directly by the capset() system 156 credentials, but can also be manipulated directly by the ``capset()``
178 call. 157 system call.
179 158
180 The permitted capabilities are those caps that the process might grant 159 The permitted capabilities are those caps that the process might grant
181 itself to its effective or permitted sets through capset(). This 160 itself to its effective or permitted sets through ``capset()``. This
182 inheritable set might also be so constrained. 161 inheritable set might also be so constrained.
183 162
184 The effective capabilities are the ones that a task is actually allowed to 163 The effective capabilities are the ones that a task is actually allowed to
185 make use of itself. 164 make use of itself.
186 165
187 The inheritable capabilities are the ones that may get passed across 166 The inheritable capabilities are the ones that may get passed across
188 execve(). 167 ``execve()``.
189 168
190 The bounding set limits the capabilities that may be inherited across 169 The bounding set limits the capabilities that may be inherited across
191 execve(), especially when a binary is executed that will execute as UID 0. 170 ``execve()``, especially when a binary is executed that will execute as
171 UID 0.
192 172
193 (3) Secure management flags (securebits). 173 3. Secure management flags (securebits).
194 174
195 These are only carried by tasks. These govern the way the above 175 These are only carried by tasks. These govern the way the above
196 credentials are manipulated and inherited over certain operations such as 176 credentials are manipulated and inherited over certain operations such as
197 execve(). They aren't used directly as objective or subjective 177 execve(). They aren't used directly as objective or subjective
198 credentials. 178 credentials.
199 179
200 (4) Keys and keyrings. 180 4. Keys and keyrings.
201 181
202 These are only carried by tasks. They carry and cache security tokens 182 These are only carried by tasks. They carry and cache security tokens
203 that don't fit into the other standard UNIX credentials. They are for 183 that don't fit into the other standard UNIX credentials. They are for
@@ -218,7 +198,7 @@ The Linux kernel supports the following types of credentials:
218 198
219 For more information on using keys, see Documentation/security/keys.txt. 199 For more information on using keys, see Documentation/security/keys.txt.
220 200
221 (5) LSM 201 5. LSM
222 202
223 The Linux Security Module allows extra controls to be placed over the 203 The Linux Security Module allows extra controls to be placed over the
224 operations that a task may do. Currently Linux supports several LSM 204 operations that a task may do. Currently Linux supports several LSM
@@ -228,7 +208,7 @@ The Linux kernel supports the following types of credentials:
228 rules (policies) that say what operations a task with one label may do to 208 rules (policies) that say what operations a task with one label may do to
229 an object with another label. 209 an object with another label.
230 210
231 (6) AF_KEY 211 6. AF_KEY
232 212
233 This is a socket-based approach to credential management for networking 213 This is a socket-based approach to credential management for networking
234 stacks [RFC 2367]. It isn't discussed by this document as it doesn't 214 stacks [RFC 2367]. It isn't discussed by this document as it doesn't
@@ -244,25 +224,19 @@ network filesystem where the credentials of the opened file should be presented
244to the server, regardless of who is actually doing a read or a write upon it. 224to the server, regardless of who is actually doing a read or a write upon it.
245 225
246 226
247============= 227File Markings
248FILE MARKINGS
249============= 228=============
250 229
251Files on disk or obtained over the network may have annotations that form the 230Files on disk or obtained over the network may have annotations that form the
252objective security context of that file. Depending on the type of filesystem, 231objective security context of that file. Depending on the type of filesystem,
253this may include one or more of the following: 232this may include one or more of the following:
254 233
255 (*) UNIX UID, GID, mode; 234 * UNIX UID, GID, mode;
256 235 * Windows user ID;
257 (*) Windows user ID; 236 * Access control list;
258 237 * LSM security label;
259 (*) Access control list; 238 * UNIX exec privilege escalation bits (SUID/SGID);
260 239 * File capabilities exec privilege escalation bits.
261 (*) LSM security label;
262
263 (*) UNIX exec privilege escalation bits (SUID/SGID);
264
265 (*) File capabilities exec privilege escalation bits.
266 240
267These are compared to the task's subjective security context, and certain 241These are compared to the task's subjective security context, and certain
268operations allowed or disallowed as a result. In the case of execve(), the 242operations allowed or disallowed as a result. In the case of execve(), the
@@ -270,8 +244,7 @@ privilege escalation bits come into play, and may allow the resulting process
270extra privileges, based on the annotations on the executable file. 244extra privileges, based on the annotations on the executable file.
271 245
272 246
273================ 247Task Credentials
274TASK CREDENTIALS
275================ 248================
276 249
277In Linux, all of a task's credentials are held in (uid, gid) or through 250In Linux, all of a task's credentials are held in (uid, gid) or through
@@ -282,20 +255,20 @@ task_struct.
282Once a set of credentials has been prepared and committed, it may not be 255Once a set of credentials has been prepared and committed, it may not be
283changed, barring the following exceptions: 256changed, barring the following exceptions:
284 257
285 (1) its reference count may be changed; 258 1. its reference count may be changed;
286 259
287 (2) the reference count on the group_info struct it points to may be changed; 260 2. the reference count on the group_info struct it points to may be changed;
288 261
289 (3) the reference count on the security data it points to may be changed; 262 3. the reference count on the security data it points to may be changed;
290 263
291 (4) the reference count on any keyrings it points to may be changed; 264 4. the reference count on any keyrings it points to may be changed;
292 265
293 (5) any keyrings it points to may be revoked, expired or have their security 266 5. any keyrings it points to may be revoked, expired or have their security
294 attributes changed; and 267 attributes changed; and
295 268
296 (6) the contents of any keyrings to which it points may be changed (the whole 269 6. the contents of any keyrings to which it points may be changed (the whole
297 point of keyrings being a shared set of credentials, modifiable by anyone 270 point of keyrings being a shared set of credentials, modifiable by anyone
298 with appropriate access). 271 with appropriate access).
299 272
300To alter anything in the cred struct, the copy-and-replace principle must be 273To alter anything in the cred struct, the copy-and-replace principle must be
301adhered to. First take a copy, then alter the copy and then use RCU to change 274adhered to. First take a copy, then alter the copy and then use RCU to change
@@ -303,37 +276,37 @@ the task pointer to make it point to the new copy. There are wrappers to aid
303with this (see below). 276with this (see below).
304 277
305A task may only alter its _own_ credentials; it is no longer permitted for a 278A task may only alter its _own_ credentials; it is no longer permitted for a
306task to alter another's credentials. This means the capset() system call is no 279task to alter another's credentials. This means the ``capset()`` system call
307longer permitted to take any PID other than the one of the current process. 280is no longer permitted to take any PID other than the one of the current
308Also keyctl_instantiate() and keyctl_negate() functions no longer permit 281process. Also ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no
309attachment to process-specific keyrings in the requesting process as the 282longer permit attachment to process-specific keyrings in the requesting
310instantiating process may need to create them. 283process as the instantiating process may need to create them.
311 284
312 285
313IMMUTABLE CREDENTIALS 286Immutable Credentials
314--------------------- 287---------------------
315 288
316Once a set of credentials has been made public (by calling commit_creds() for 289Once a set of credentials has been made public (by calling ``commit_creds()``
317example), it must be considered immutable, barring two exceptions: 290for example), it must be considered immutable, barring two exceptions:
318 291
319 (1) The reference count may be altered. 292 1. The reference count may be altered.
320 293
321 (2) Whilst the keyring subscriptions of a set of credentials may not be 294 2. Whilst the keyring subscriptions of a set of credentials may not be
322 changed, the keyrings subscribed to may have their contents altered. 295 changed, the keyrings subscribed to may have their contents altered.
323 296
324To catch accidental credential alteration at compile time, struct task_struct 297To catch accidental credential alteration at compile time, struct task_struct
325has _const_ pointers to its credential sets, as does struct file. Furthermore, 298has _const_ pointers to its credential sets, as does struct file. Furthermore,
326certain functions such as get_cred() and put_cred() operate on const pointers, 299certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
327thus rendering casts unnecessary, but require to temporarily ditch the const 300pointers, thus rendering casts unnecessary, but require to temporarily ditch
328qualification to be able to alter the reference count. 301the const qualification to be able to alter the reference count.
329 302
330 303
331ACCESSING TASK CREDENTIALS 304Accessing Task Credentials
332-------------------------- 305--------------------------
333 306
334A task being able to alter only its own credentials permits the current process 307A task being able to alter only its own credentials permits the current process
335to read or replace its own credentials without the need for any form of locking 308to read or replace its own credentials without the need for any form of locking
336- which simplifies things greatly. It can just call: 309-- which simplifies things greatly. It can just call::
337 310
338 const struct cred *current_cred() 311 const struct cred *current_cred()
339 312
@@ -341,7 +314,7 @@ to get a pointer to its credentials structure, and it doesn't have to release
341it afterwards. 314it afterwards.
342 315
343There are convenience wrappers for retrieving specific aspects of a task's 316There are convenience wrappers for retrieving specific aspects of a task's
344credentials (the value is simply returned in each case): 317credentials (the value is simply returned in each case)::
345 318
346 uid_t current_uid(void) Current's real UID 319 uid_t current_uid(void) Current's real UID
347 gid_t current_gid(void) Current's real GID 320 gid_t current_gid(void) Current's real GID
@@ -354,7 +327,7 @@ credentials (the value is simply returned in each case):
354 struct user_struct *current_user(void) Current's user account 327 struct user_struct *current_user(void) Current's user account
355 328
356There are also convenience wrappers for retrieving specific associated pairs of 329There are also convenience wrappers for retrieving specific associated pairs of
357a task's credentials: 330a task's credentials::
358 331
359 void current_uid_gid(uid_t *, gid_t *); 332 void current_uid_gid(uid_t *, gid_t *);
360 void current_euid_egid(uid_t *, gid_t *); 333 void current_euid_egid(uid_t *, gid_t *);
@@ -365,12 +338,12 @@ them from the current task's credentials.
365 338
366 339
367In addition, there is a function for obtaining a reference on the current 340In addition, there is a function for obtaining a reference on the current
368process's current set of credentials: 341process's current set of credentials::
369 342
370 const struct cred *get_current_cred(void); 343 const struct cred *get_current_cred(void);
371 344
372and functions for getting references to one of the credentials that don't 345and functions for getting references to one of the credentials that don't
373actually live in struct cred: 346actually live in struct cred::
374 347
375 struct user_struct *get_current_user(void); 348 struct user_struct *get_current_user(void);
376 struct group_info *get_current_groups(void); 349 struct group_info *get_current_groups(void);
@@ -378,22 +351,22 @@ actually live in struct cred:
378which get references to the current process's user accounting structure and 351which get references to the current process's user accounting structure and
379supplementary groups list respectively. 352supplementary groups list respectively.
380 353
381Once a reference has been obtained, it must be released with put_cred(), 354Once a reference has been obtained, it must be released with ``put_cred()``,
382free_uid() or put_group_info() as appropriate. 355``free_uid()`` or ``put_group_info()`` as appropriate.
383 356
384 357
385ACCESSING ANOTHER TASK'S CREDENTIALS 358Accessing Another Task's Credentials
386------------------------------------ 359------------------------------------
387 360
388Whilst a task may access its own credentials without the need for locking, the 361Whilst a task may access its own credentials without the need for locking, the
389same is not true of a task wanting to access another task's credentials. It 362same is not true of a task wanting to access another task's credentials. It
390must use the RCU read lock and rcu_dereference(). 363must use the RCU read lock and ``rcu_dereference()``.
391 364
392The rcu_dereference() is wrapped by: 365The ``rcu_dereference()`` is wrapped by::
393 366
394 const struct cred *__task_cred(struct task_struct *task); 367 const struct cred *__task_cred(struct task_struct *task);
395 368
396This should be used inside the RCU read lock, as in the following example: 369This should be used inside the RCU read lock, as in the following example::
397 370
398 void foo(struct task_struct *t, struct foo_data *f) 371 void foo(struct task_struct *t, struct foo_data *f)
399 { 372 {
@@ -410,39 +383,40 @@ This should be used inside the RCU read lock, as in the following example:
410 383
411Should it be necessary to hold another task's credentials for a long period of 384Should it be necessary to hold another task's credentials for a long period of
412time, and possibly to sleep whilst doing so, then the caller should get a 385time, and possibly to sleep whilst doing so, then the caller should get a
413reference on them using: 386reference on them using::
414 387
415 const struct cred *get_task_cred(struct task_struct *task); 388 const struct cred *get_task_cred(struct task_struct *task);
416 389
417This does all the RCU magic inside of it. The caller must call put_cred() on 390This does all the RCU magic inside of it. The caller must call put_cred() on
418the credentials so obtained when they're finished with. 391the credentials so obtained when they're finished with.
419 392
420 [*] Note: The result of __task_cred() should not be passed directly to 393.. note::
421 get_cred() as this may race with commit_cred(). 394 The result of ``__task_cred()`` should not be passed directly to
395 ``get_cred()`` as this may race with ``commit_cred()``.
422 396
423There are a couple of convenience functions to access bits of another task's 397There are a couple of convenience functions to access bits of another task's
424credentials, hiding the RCU magic from the caller: 398credentials, hiding the RCU magic from the caller::
425 399
426 uid_t task_uid(task) Task's real UID 400 uid_t task_uid(task) Task's real UID
427 uid_t task_euid(task) Task's effective UID 401 uid_t task_euid(task) Task's effective UID
428 402
429If the caller is holding the RCU read lock at the time anyway, then: 403If the caller is holding the RCU read lock at the time anyway, then::
430 404
431 __task_cred(task)->uid 405 __task_cred(task)->uid
432 __task_cred(task)->euid 406 __task_cred(task)->euid
433 407
434should be used instead. Similarly, if multiple aspects of a task's credentials 408should be used instead. Similarly, if multiple aspects of a task's credentials
435need to be accessed, RCU read lock should be used, __task_cred() called, the 409need to be accessed, RCU read lock should be used, ``__task_cred()`` called,
436result stored in a temporary pointer and then the credential aspects called 410the result stored in a temporary pointer and then the credential aspects called
437from that before dropping the lock. This prevents the potentially expensive 411from that before dropping the lock. This prevents the potentially expensive
438RCU magic from being invoked multiple times. 412RCU magic from being invoked multiple times.
439 413
440Should some other single aspect of another task's credentials need to be 414Should some other single aspect of another task's credentials need to be
441accessed, then this can be used: 415accessed, then this can be used::
442 416
443 task_cred_xxx(task, member) 417 task_cred_xxx(task, member)
444 418
445where 'member' is a non-pointer member of the cred struct. For instance: 419where 'member' is a non-pointer member of the cred struct. For instance::
446 420
447 uid_t task_cred_xxx(task, suid); 421 uid_t task_cred_xxx(task, suid);
448 422
@@ -451,7 +425,7 @@ magic. This may not be used for pointer members as what they point to may
451disappear the moment the RCU read lock is dropped. 425disappear the moment the RCU read lock is dropped.
452 426
453 427
454ALTERING CREDENTIALS 428Altering Credentials
455-------------------- 429--------------------
456 430
457As previously mentioned, a task may only alter its own credentials, and may not 431As previously mentioned, a task may only alter its own credentials, and may not
@@ -459,7 +433,7 @@ alter those of another task. This means that it doesn't need to use any
459locking to alter its own credentials. 433locking to alter its own credentials.
460 434
461To alter the current process's credentials, a function should first prepare a 435To alter the current process's credentials, a function should first prepare a
462new set of credentials by calling: 436new set of credentials by calling::
463 437
464 struct cred *prepare_creds(void); 438 struct cred *prepare_creds(void);
465 439
@@ -467,9 +441,10 @@ this locks current->cred_replace_mutex and then allocates and constructs a
467duplicate of the current process's credentials, returning with the mutex still 441duplicate of the current process's credentials, returning with the mutex still
468held if successful. It returns NULL if not successful (out of memory). 442held if successful. It returns NULL if not successful (out of memory).
469 443
470The mutex prevents ptrace() from altering the ptrace state of a process whilst 444The mutex prevents ``ptrace()`` from altering the ptrace state of a process
471security checks on credentials construction and changing is taking place as 445whilst security checks on credentials construction and changing is taking place
472the ptrace state may alter the outcome, particularly in the case of execve(). 446as the ptrace state may alter the outcome, particularly in the case of
447``execve()``.
473 448
474The new credentials set should be altered appropriately, and any security 449The new credentials set should be altered appropriately, and any security
475checks and hooks done. Both the current and the proposed sets of credentials 450checks and hooks done. Both the current and the proposed sets of credentials
@@ -478,36 +453,37 @@ still at this point.
478 453
479 454
480When the credential set is ready, it should be committed to the current process 455When the credential set is ready, it should be committed to the current process
481by calling: 456by calling::
482 457
483 int commit_creds(struct cred *new); 458 int commit_creds(struct cred *new);
484 459
485This will alter various aspects of the credentials and the process, giving the 460This will alter various aspects of the credentials and the process, giving the
486LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually 461LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
487commit the new credentials to current->cred, it will release 462actually commit the new credentials to ``current->cred``, it will release
488current->cred_replace_mutex to allow ptrace() to take place, and it will notify 463``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
489the scheduler and others of the changes. 464will notify the scheduler and others of the changes.
490 465
491This function is guaranteed to return 0, so that it can be tail-called at the 466This function is guaranteed to return 0, so that it can be tail-called at the
492end of such functions as sys_setresuid(). 467end of such functions as ``sys_setresuid()``.
493 468
494Note that this function consumes the caller's reference to the new credentials. 469Note that this function consumes the caller's reference to the new credentials.
495The caller should _not_ call put_cred() on the new credentials afterwards. 470The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
496 471
497Furthermore, once this function has been called on a new set of credentials, 472Furthermore, once this function has been called on a new set of credentials,
498those credentials may _not_ be changed further. 473those credentials may _not_ be changed further.
499 474
500 475
501Should the security checks fail or some other error occur after prepare_creds() 476Should the security checks fail or some other error occur after
502has been called, then the following function should be invoked: 477``prepare_creds()`` has been called, then the following function should be
478invoked::
503 479
504 void abort_creds(struct cred *new); 480 void abort_creds(struct cred *new);
505 481
506This releases the lock on current->cred_replace_mutex that prepare_creds() got 482This releases the lock on ``current->cred_replace_mutex`` that
507and then releases the new credentials. 483``prepare_creds()`` got and then releases the new credentials.
508 484
509 485
510A typical credentials alteration function would look something like this: 486A typical credentials alteration function would look something like this::
511 487
512 int alter_suid(uid_t suid) 488 int alter_suid(uid_t suid)
513 { 489 {
@@ -529,53 +505,50 @@ A typical credentials alteration function would look something like this:
529 } 505 }
530 506
531 507
532MANAGING CREDENTIALS 508Managing Credentials
533-------------------- 509--------------------
534 510
535There are some functions to help manage credentials: 511There are some functions to help manage credentials:
536 512
537 (*) void put_cred(const struct cred *cred); 513 - ``void put_cred(const struct cred *cred);``
538 514
539 This releases a reference to the given set of credentials. If the 515 This releases a reference to the given set of credentials. If the
540 reference count reaches zero, the credentials will be scheduled for 516 reference count reaches zero, the credentials will be scheduled for
541 destruction by the RCU system. 517 destruction by the RCU system.
542 518
543 (*) const struct cred *get_cred(const struct cred *cred); 519 - ``const struct cred *get_cred(const struct cred *cred);``
544 520
545 This gets a reference on a live set of credentials, returning a pointer to 521 This gets a reference on a live set of credentials, returning a pointer to
546 that set of credentials. 522 that set of credentials.
547 523
548 (*) struct cred *get_new_cred(struct cred *cred); 524 - ``struct cred *get_new_cred(struct cred *cred);``
549 525
550 This gets a reference on a set of credentials that is under construction 526 This gets a reference on a set of credentials that is under construction
551 and is thus still mutable, returning a pointer to that set of credentials. 527 and is thus still mutable, returning a pointer to that set of credentials.
552 528
553 529
554===================== 530Open File Credentials
555OPEN FILE CREDENTIALS
556===================== 531=====================
557 532
558When a new file is opened, a reference is obtained on the opening task's 533When a new file is opened, a reference is obtained on the opening task's
559credentials and this is attached to the file struct as 'f_cred' in place of 534credentials and this is attached to the file struct as ``f_cred`` in place of
560'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid 535``f_uid`` and ``f_gid``. Code that used to access ``file->f_uid`` and
561should now access file->f_cred->fsuid and file->f_cred->fsgid. 536``file->f_gid`` should now access ``file->f_cred->fsuid`` and
537``file->f_cred->fsgid``.
562 538
563It is safe to access f_cred without the use of RCU or locking because the 539It is safe to access ``f_cred`` without the use of RCU or locking because the
564pointer will not change over the lifetime of the file struct, and nor will the 540pointer will not change over the lifetime of the file struct, and nor will the
565contents of the cred struct pointed to, barring the exceptions listed above 541contents of the cred struct pointed to, barring the exceptions listed above
566(see the Task Credentials section). 542(see the Task Credentials section).
567 543
568 544
569======================================= 545Overriding the VFS's Use of Credentials
570OVERRIDING THE VFS'S USE OF CREDENTIALS
571======================================= 546=======================================
572 547
573Under some circumstances it is desirable to override the credentials used by 548Under some circumstances it is desirable to override the credentials used by
574the VFS, and that can be done by calling into such as vfs_mkdir() with a 549the VFS, and that can be done by calling into such as ``vfs_mkdir()`` with a
575different set of credentials. This is done in the following places: 550different set of credentials. This is done in the following places:
576 551
577 (*) sys_faccessat(). 552 * ``sys_faccessat()``.
578 553 * ``do_coredump()``.
579 (*) do_coredump(). 554 * nfs4recover.c.
580
581 (*) nfs4recover.c.
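Note on the credentials.rst changes above: the copy-and-replace rule that the document describes (take a copy, alter the copy, run the checks, then either commit or abort) can be modelled in plain userspace C. This is only a sketch under stated assumptions, not kernel code. `toy_cred`, `toy_current`, `toy_init()`, `toy_prepare()`, `toy_put()`, `toy_commit()`, `toy_abort()` and `toy_alter_suid()` are hypothetical names invented for illustration; the `allowed` flag stands in for whatever security check a real caller would perform, and neither RCU nor `cred_replace_mutex` is modelled.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct cred: a reference count and two IDs. */
struct toy_cred {
	int usage;		/* reference count */
	unsigned int uid;
	unsigned int suid;
};

/* Published credentials; const so that accidental writes fail to compile. */
static const struct toy_cred *toy_current;

/* Install an initial credential set (illustration only). */
static int toy_init(unsigned int uid)
{
	struct toy_cred *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->usage = 1;
	c->uid = uid;
	c->suid = uid;
	toy_current = c;
	return 0;
}

/* Step 1: take a private, still-mutable copy of the current set. */
static struct toy_cred *toy_prepare(void)
{
	struct toy_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *toy_current;
	new->usage = 1;		/* the caller owns the only reference */
	return new;
}

/* Drop a reference; the const is cast away only to touch the count. */
static void toy_put(const struct toy_cred *cred)
{
	struct toy_cred *c = (struct toy_cred *)cred;

	if (--c->usage == 0)
		free(c);
}

/* Step 3a: publish the copy; this consumes the caller's reference. */
static int toy_commit(struct toy_cred *new)
{
	const struct toy_cred *old = toy_current;

	toy_current = new;
	toy_put(old);
	return 0;
}

/* Step 3b: discard a copy that was never published. */
static void toy_abort(struct toy_cred *new)
{
	toy_put(new);
}

/* Step 2 in context: alter the copy, then commit or abort. */
static int toy_alter_suid(unsigned int suid, int allowed)
{
	struct toy_cred *new = toy_prepare();

	if (!new)
		return -1;
	new->suid = suid;
	if (!allowed) {
		toy_abort(new);
		return -1;
	}
	return toy_commit(new);
}
```

In this model, a caller would run `toy_init(1000)` once and then `toy_alter_suid(42, 1)`. A denied change leaves `toy_current` pointing at the untouched original, which is the property the copy-and-replace rule is meant to provide.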
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 9bae6bb20e7f..298a94a33f05 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -1,7 +1,13 @@
1====================== 1======================
2Security documentation 2Security Documentation
3====================== 3======================
4 4
5.. toctree:: 5.. toctree::
6 :maxdepth: 1
6 7
8 credentials
9 IMA-templates
10 keys/index
11 LSM
12 self-protection
7 tpm/index 13 tpm/index
diff --git a/Documentation/security/keys.txt b/Documentation/security/keys/core.rst
index cd5019934d7f..0d831a7afe4f 100644
--- a/Documentation/security/keys.txt
+++ b/Documentation/security/keys/core.rst
@@ -1,6 +1,6 @@
1 ============================ 1============================
2 KERNEL KEY RETENTION SERVICE 2Kernel Key Retention Service
3 ============================ 3============================
4 4
5This service allows cryptographic keys, authentication tokens, cross-domain 5This service allows cryptographic keys, authentication tokens, cross-domain
6user mappings, and similar to be cached in the kernel for the use of 6user mappings, and similar to be cached in the kernel for the use of
@@ -29,8 +29,7 @@ This document has the following sections:
29 - Garbage collection 29 - Garbage collection
30 30
31 31
32============ 32Key Overview
33KEY OVERVIEW
34============ 33============
35 34
36In this context, keys represent units of cryptographic data, authentication 35In this context, keys represent units of cryptographic data, authentication
@@ -47,14 +46,14 @@ Each key has a number of attributes:
47 - State. 46 - State.
48 47
49 48
50 (*) Each key is issued a serial number of type key_serial_t that is unique for 49 * Each key is issued a serial number of type key_serial_t that is unique for
51 the lifetime of that key. All serial numbers are positive non-zero 32-bit 50 the lifetime of that key. All serial numbers are positive non-zero 32-bit
52 integers. 51 integers.
53 52
54 Userspace programs can use a key's serial number as a way to gain access 53 Userspace programs can use a key's serial number as a way to gain access
55 to it, subject to permission checking. 54 to it, subject to permission checking.
56 55
57 (*) Each key is of a defined "type". Types must be registered inside the 56 * Each key is of a defined "type". Types must be registered inside the
58 kernel by a kernel service (such as a filesystem) before keys of that type 57 kernel by a kernel service (such as a filesystem) before keys of that type
59 can be added or used. Userspace programs cannot define new types directly. 58 can be added or used. Userspace programs cannot define new types directly.
60 59
@@ -64,18 +63,18 @@ Each key has a number of attributes:
64 Should a type be removed from the system, all the keys of that type will 63 Should a type be removed from the system, all the keys of that type will
65 be invalidated. 64 be invalidated.
66 65
67 (*) Each key has a description. This should be a printable string. The key 66 * Each key has a description. This should be a printable string. The key
68 type provides an operation to perform a match between the description on a 67 type provides an operation to perform a match between the description on a
69 key and a criterion string. 68 key and a criterion string.
70 69
71 (*) Each key has an owner user ID, a group ID and a permissions mask. These 70 * Each key has an owner user ID, a group ID and a permissions mask. These
72 are used to control what a process may do to a key from userspace, and 71 are used to control what a process may do to a key from userspace, and
73 whether a kernel service will be able to find the key. 72 whether a kernel service will be able to find the key.
74 73
75 (*) Each key can be set to expire at a specific time by the key type's 74 * Each key can be set to expire at a specific time by the key type's
76 instantiation function. Keys can also be immortal. 75 instantiation function. Keys can also be immortal.
77 76
78 (*) Each key can have a payload. This is a quantity of data that represents the 77 * Each key can have a payload. This is a quantity of data that represents the
79 actual "key". In the case of a keyring, this is a list of keys to which 78 actual "key". In the case of a keyring, this is a list of keys to which
80 the keyring links; in the case of a user-defined key, it's an arbitrary 79 the keyring links; in the case of a user-defined key, it's an arbitrary
81 blob of data. 80 blob of data.
@@ -91,39 +90,38 @@ Each key has a number of attributes:
91 permitted, another key type operation will be called to convert the key's 90 permitted, another key type operation will be called to convert the key's
92 attached payload back into a blob of data. 91 attached payload back into a blob of data.
93 92
94 (*) Each key can be in one of a number of basic states: 93 * Each key can be in one of a number of basic states:
95 94
96 (*) Uninstantiated. The key exists, but does not have any data attached. 95 * Uninstantiated. The key exists, but does not have any data attached.
97 Keys being requested from userspace will be in this state. 96 Keys being requested from userspace will be in this state.
98 97
99 (*) Instantiated. This is the normal state. The key is fully formed, and 98 * Instantiated. This is the normal state. The key is fully formed, and
100 has data attached. 99 has data attached.
101 100
102 (*) Negative. This is a relatively short-lived state. The key acts as a 101 * Negative. This is a relatively short-lived state. The key acts as a
103 note saying that a previous call out to userspace failed, and acts as 102 note saying that a previous call out to userspace failed, and acts as
104 a throttle on key lookups. A negative key can be updated to a normal 103 a throttle on key lookups. A negative key can be updated to a normal
105 state. 104 state.
106 105
107 (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, 106 * Expired. Keys can have lifetimes set. If their lifetime is exceeded,
108 they traverse to this state. An expired key can be updated back to a 107 they traverse to this state. An expired key can be updated back to a
109 normal state. 108 normal state.
110 109
111 (*) Revoked. A key is put in this state by userspace action. It can't be 110 * Revoked. A key is put in this state by userspace action. It can't be
112 found or operated upon (apart from by unlinking it). 111 found or operated upon (apart from by unlinking it).
113 112
114 (*) Dead. The key's type was unregistered, and so the key is now useless. 113 * Dead. The key's type was unregistered, and so the key is now useless.
115 114
116Keys in the last three states are subject to garbage collection. See the 115Keys in the last three states are subject to garbage collection. See the
117section on "Garbage collection". 116section on "Garbage collection".
118 117
119 118
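The state list above can be summarised in a small C sketch. It models the six basic states and the rule that keys in the last three (expired, revoked, dead) are subject to garbage collection. The enum and function names are hypothetical, chosen for illustration only; they are not the kernel's internal representation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the six basic states described in the text. */
enum key_state_sketch {
	KEY_SKETCH_UNINSTANTIATED,
	KEY_SKETCH_INSTANTIATED,
	KEY_SKETCH_NEGATIVE,
	KEY_SKETCH_EXPIRED,
	KEY_SKETCH_REVOKED,
	KEY_SKETCH_DEAD,
};

/* Keys in the last three states are subject to garbage collection. */
static bool key_sketch_is_collectable(enum key_state_sketch state)
{
	assert((int)state >= KEY_SKETCH_UNINSTANTIATED &&
	       (int)state <= KEY_SKETCH_DEAD);
	return state == KEY_SKETCH_EXPIRED ||
	       state == KEY_SKETCH_REVOKED ||
	       state == KEY_SKETCH_DEAD;
}
```
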
120==================== 119Key Service Overview
121KEY SERVICE OVERVIEW
122==================== 120====================
123 121
124The key service provides a number of features besides keys: 122The key service provides a number of features besides keys:
125 123
126 (*) The key service defines three special key types: 124 * The key service defines three special key types:
127 125
128 (+) "keyring" 126 (+) "keyring"
129 127
@@ -149,7 +147,7 @@ The key service provides a number of features besides keys:
149 be created and updated from userspace, but the payload is only 147 be created and updated from userspace, but the payload is only
150 readable from kernel space. 148 readable from kernel space.
151 149
152 (*) Each process subscribes to three keyrings: a thread-specific keyring, a 150 * Each process subscribes to three keyrings: a thread-specific keyring, a
153 process-specific keyring, and a session-specific keyring. 151 process-specific keyring, and a session-specific keyring.
154 152
155 The thread-specific keyring is discarded from the child when any sort of 153 The thread-specific keyring is discarded from the child when any sort of
@@ -170,7 +168,7 @@ The key service provides a number of features besides keys:
170 The ownership of the thread keyring changes when the real UID and GID of 168 The ownership of the thread keyring changes when the real UID and GID of
171 the thread changes. 169 the thread changes.
172 170
173 (*) Each user ID resident in the system holds two special keyrings: a user 171 * Each user ID resident in the system holds two special keyrings: a user
174 specific keyring and a default user session keyring. The default session 172 specific keyring and a default user session keyring. The default session
175 keyring is initialised with a link to the user-specific keyring. 173 keyring is initialised with a link to the user-specific keyring.
176 174
@@ -180,7 +178,7 @@ The key service provides a number of features besides keys:
180 If a process attempts to access its session key when it doesn't have one, 178 If a process attempts to access its session key when it doesn't have one,
181 it will be subscribed to the default for its current UID. 179 it will be subscribed to the default for its current UID.
182 180
183 (*) Each user has two quotas against which the keys they own are tracked. One 181 * Each user has two quotas against which the keys they own are tracked. One
184 limits the total number of keys and keyrings, the other limits the total 182 limits the total number of keys and keyrings, the other limits the total
185 amount of description and payload space that can be consumed. 183 amount of description and payload space that can be consumed.
186 184
@@ -194,54 +192,53 @@ The key service provides a number of features besides keys:
194 If a system call that modifies a key or keyring in some way would put the 192 If a system call that modifies a key or keyring in some way would put the
195 user over quota, the operation is refused and error EDQUOT is returned. 193 user over quota, the operation is refused and error EDQUOT is returned.
196 194
197 (*) There's a system call interface by which userspace programs can create and 195 * There's a system call interface by which userspace programs can create and
198 manipulate keys and keyrings. 196 manipulate keys and keyrings.
199 197
200 (*) There's a kernel interface by which services can register types and search 198 * There's a kernel interface by which services can register types and search
201 for keys. 199 for keys.
202 200
203 (*) There's a way for a search done from the kernel to call back to 201 * There's a way for a search done from the kernel to call back to
204 userspace to request a key that can't be found in a process's keyrings. 202 userspace to request a key that can't be found in a process's keyrings.
205 203
206 (*) An optional filesystem is available through which the key database can be 204 * An optional filesystem is available through which the key database can be
207 viewed and manipulated. 205 viewed and manipulated.
208 206
209 207
210====================== 208Key Access Permissions
211KEY ACCESS PERMISSIONS
212====================== 209======================
213 210
214Keys have an owner user ID, a group access ID, and a permissions mask. The mask 211Keys have an owner user ID, a group access ID, and a permissions mask. The mask
215has up to eight bits each for possessor, user, group and other access. Only 212has up to eight bits each for possessor, user, group and other access. Only
216six of each set of eight bits are defined. The permissions granted are: 213six of each set of eight bits are defined. The permissions granted are:
217 214
218 (*) View 215 * View
219 216
220 This permits a key or keyring's attributes to be viewed - including key 217 This permits a key or keyring's attributes to be viewed - including key
221 type and description. 218 type and description.
222 219
223 (*) Read 220 * Read
224 221
225 This permits a key's payload to be viewed or a keyring's list of linked 222 This permits a key's payload to be viewed or a keyring's list of linked
226 keys. 223 keys.
227 224
228 (*) Write 225 * Write
229 226
230 This permits a key's payload to be instantiated or updated, or it allows a 227 This permits a key's payload to be instantiated or updated, or it allows a
231 link to be added to or removed from a keyring. 228 link to be added to or removed from a keyring.
232 229
233 (*) Search 230 * Search
234 231
235 This permits keyrings to be searched and keys to be found. Searches can 232 This permits keyrings to be searched and keys to be found. Searches can
236 only recurse into nested keyrings that have search permission set. 233 only recurse into nested keyrings that have search permission set.
237 234
238 (*) Link 235 * Link
239 236
240 This permits a key or keyring to be linked to. To create a link from a 237 This permits a key or keyring to be linked to. To create a link from a
241 keyring to a key, a process must have Write permission on the keyring and 238 keyring to a key, a process must have Write permission on the keyring and
242 Link permission on the key. 239 Link permission on the key.
243 240
244 (*) Set Attribute 241 * Set Attribute
245 242
246 This permits a key's UID, GID and permissions mask to be changed. 243 This permits a key's UID, GID and permissions mask to be changed.
247 244
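As an illustration of the permissions mask layout described above, the following C sketch splits a 32-bit mask into its four eight-bit sets and renders the six defined bits of one set. The bit values (View 0x01 through Set Attribute 0x20) and the byte order (possessor in the top byte, then user, group, other) are assumed here to match include/linux/key.h and should be checked against the tree in use. The helper names and the flag letters are this sketch's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-set permission bits; assumed to match include/linux/key.h. */
#define PERM_SKETCH_VIEW	0x01u
#define PERM_SKETCH_READ	0x02u
#define PERM_SKETCH_WRITE	0x04u
#define PERM_SKETCH_SEARCH	0x08u
#define PERM_SKETCH_LINK	0x10u
#define PERM_SKETCH_SETATTR	0x20u
#define PERM_SKETCH_ALL		0x3fu

/* Shift of each eight-bit set within the 32-bit mask (assumed layout). */
enum perm_subject {
	SUBJ_OTHER = 0,
	SUBJ_GROUP = 8,
	SUBJ_USER = 16,
	SUBJ_POSSESSOR = 24,
};

/* Extract the six defined bits for one subject. */
static unsigned int perm_for(uint32_t mask, enum perm_subject subject)
{
	return (mask >> subject) & PERM_SKETCH_ALL;
}

/* Render one set as six characters plus NUL; letters are illustrative. */
static void perm_to_string(unsigned int bits, char out[7])
{
	static const char letters[] = "vrwsla";
	int i;

	assert(out != NULL);
	for (i = 0; i < 6; i++)
		out[i] = (bits & (1u << i)) ? letters[i] : '-';
	out[6] = '\0';
}
```

For a mask of 0x1f3f0000, this gives the possessor every permission except Set Attribute, the user all six, and group and other none.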
@@ -249,8 +246,7 @@ For changing the ownership, group ID or permissions mask, being the owner of
249the key or having the sysadmin capability is sufficient. 246the key or having the sysadmin capability is sufficient.
250 247
251 248
252=============== 249SELinux Support
253SELINUX SUPPORT
254=============== 250===============
255 251
256The security class "key" has been added to SELinux so that mandatory access 252The security class "key" has been added to SELinux so that mandatory access
@@ -282,14 +278,13 @@ their associated thread, and both session and process keyrings are handled
282similarly. 278similarly.
283 279
284 280
285================ 281New ProcFS Files
286NEW PROCFS FILES
287================ 282================
288 283
289Two files have been added to procfs by which an administrator can find out 284Two files have been added to procfs by which an administrator can find out
290about the status of the key service: 285about the status of the key service:
291 286
292 (*) /proc/keys 287 * /proc/keys
293 288
294 This lists the keys that are currently viewable by the task reading the 289 This lists the keys that are currently viewable by the task reading the
295 file, giving information about their type, description and permissions. 290 file, giving information about their type, description and permissions.
@@ -301,7 +296,7 @@ about the status of the key service:
301 security checks are still performed, and may further filter out keys that 296 security checks are still performed, and may further filter out keys that
302 the current process is not authorised to view. 297 the current process is not authorised to view.
303 298
304 The contents of the file look like this: 299 The contents of the file look like this::
305 300
306 SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY 301 SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
307 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 302 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4
@@ -314,7 +309,7 @@ about the status of the key service:
314 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 309 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0
315 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 310 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0
316 311
317 The flags are: 312 The flags are::
318 313
319 I Instantiated 314 I Instantiated
320 R Revoked 315 R Revoked
@@ -324,10 +319,10 @@ about the status of the key service:
324 N Negative key 319 N Negative key
325 320
326 321
327 (*) /proc/key-users 322 * /proc/key-users
328 323
329 This file lists the tracking data for each user that has at least one key 324 This file lists the tracking data for each user that has at least one key
330 on the system. Such data includes quota information and statistics: 325 on the system. Such data includes quota information and statistics::
331 326
332 [root@andromeda root]# cat /proc/key-users 327 [root@andromeda root]# cat /proc/key-users
333 0: 46 45/45 1/100 13/10000 328 0: 46 45/45 1/100 13/10000
@@ -335,7 +330,8 @@ about the status of the key service:
335 32: 2 2/2 2/100 40/10000 330 32: 2 2/2 2/100 40/10000
336 38: 2 2/2 2/100 40/10000 331 38: 2 2/2 2/100 40/10000
337 332
338 The format of each line is 333 The format of each line is::
334
339 <UID>: User ID to which this applies 335 <UID>: User ID to which this applies
340 <usage> Structure refcount 336 <usage> Structure refcount
341 <inst>/<keys> Total number of keys and number instantiated 337 <inst>/<keys> Total number of keys and number instantiated
@@ -346,14 +342,14 @@ about the status of the key service:
346Four new sysctl files have been added also for the purpose of controlling the 342Four new sysctl files have been added also for the purpose of controlling the
347quota limits on keys: 343quota limits on keys:
348 344
349 (*) /proc/sys/kernel/keys/root_maxkeys 345 * /proc/sys/kernel/keys/root_maxkeys
350 /proc/sys/kernel/keys/root_maxbytes 346 /proc/sys/kernel/keys/root_maxbytes
351 347
352 These files hold the maximum number of keys that root may have and the 348 These files hold the maximum number of keys that root may have and the
353 maximum total number of bytes of data that root may have stored in those 349 maximum total number of bytes of data that root may have stored in those
354 keys. 350 keys.
355 351
356 (*) /proc/sys/kernel/keys/maxkeys 352 * /proc/sys/kernel/keys/maxkeys
357 /proc/sys/kernel/keys/maxbytes 353 /proc/sys/kernel/keys/maxbytes
358 354
359 These files hold the maximum number of keys that each non-root user may 355 These files hold the maximum number of keys that each non-root user may
@@ -364,8 +360,7 @@ Root may alter these by writing each new limit as a decimal number string to
364the appropriate file. 360the appropriate file.
365 361
366 362
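A line of /proc/keys in the column layout shown earlier (SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY) could be split into fields along the following lines. This is a minimal userspace sketch: the struct and function names are hypothetical, the serial and permission columns are assumed to be hexadecimal, and the field widths are arbitrary bounds chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for one /proc/keys line. */
struct proc_keys_entry {
	unsigned int serial;	/* SERIAL, assumed hexadecimal */
	char flags[8];		/* FLAGS, e.g. "I--Q-N" */
	unsigned int usage;	/* USAGE */
	char expiry[16];	/* EXPY, e.g. "perm" or "35s" */
	unsigned int perm;	/* PERM, assumed hexadecimal */
	int uid;
	int gid;
	char type[32];		/* TYPE */
	char rest[128];		/* "DESCRIPTION: SUMMARY", left unsplit */
};

/* Returns 0 if all nine columns were read, -1 otherwise. */
static int parse_proc_keys_line(const char *line, struct proc_keys_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	n = sscanf(line, "%x %7s %u %15s %x %d %d %31s %127[^\n]",
		   &e->serial, e->flags, &e->usage, e->expiry,
		   &e->perm, &e->uid, &e->gid, e->type, e->rest);
	return n == 9 ? 0 : -1;
}
```

The description and summary are kept together in one field because a description may itself contain colons, as in the "metal:silver" example.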
367=============================== 363Userspace System Call Interface
368USERSPACE SYSTEM CALL INTERFACE
369=============================== 364===============================
370 365
371Userspace can manipulate keys directly through three new syscalls: add_key, 366Userspace can manipulate keys directly through three new syscalls: add_key,
@@ -375,7 +370,7 @@ manipulating keys.
375When referring to a key directly, userspace programs should use the key's 370When referring to a key directly, userspace programs should use the key's
376serial number (a positive 32-bit integer). However, there are some special 371serial number (a positive 32-bit integer). However, there are some special
377values available for referring to special keys and keyrings that relate to the 372values available for referring to special keys and keyrings that relate to the
378process making the call: 373process making the call::
379 374
380 CONSTANT VALUE KEY REFERENCED 375 CONSTANT VALUE KEY REFERENCED
381 ============================== ====== =========================== 376 ============================== ====== ===========================
@@ -391,8 +386,8 @@ process making the call:
391 386
392The main syscalls are: 387The main syscalls are:
393 388
394 (*) Create a new key of given type, description and payload and add it to the 389 * Create a new key of given type, description and payload and add it to the
395 nominated keyring: 390 nominated keyring::
396 391
397 key_serial_t add_key(const char *type, const char *desc, 392 key_serial_t add_key(const char *type, const char *desc,
398 const void *payload, size_t plen, 393 const void *payload, size_t plen,
@@ -432,8 +427,8 @@ The main syscalls are:
432 The ID of the new or updated key is returned if successful. 427 The ID of the new or updated key is returned if successful.
433 428
434 429
435 (*) Search the process's keyrings for a key, potentially calling out to 430 * Search the process's keyrings for a key, potentially calling out to
436 userspace to create it. 431 userspace to create it::
437 432
438 key_serial_t request_key(const char *type, const char *description, 433 key_serial_t request_key(const char *type, const char *description,
439 const char *callout_info, 434 const char *callout_info,
@@ -453,7 +448,7 @@ The main syscalls are:
453 448
454The keyctl syscall functions are: 449The keyctl syscall functions are:
455 450
456 (*) Map a special key ID to a real key ID for this process: 451 * Map a special key ID to a real key ID for this process::
457 452
458 key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, 453 key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id,
459 int create); 454 int create);
@@ -466,7 +461,7 @@ The keyctl syscall functions are:
466 non-zero; and the error ENOKEY will be returned if "create" is zero. 461 non-zero; and the error ENOKEY will be returned if "create" is zero.
467 462
468 463
469 (*) Replace the session keyring this process subscribes to with a new one: 464 * Replace the session keyring this process subscribes to with a new one::
470 465
471 key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); 466 key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name);
472 467
@@ -484,7 +479,7 @@ The keyctl syscall functions are:
484 The ID of the new session keyring is returned if successful. 479 The ID of the new session keyring is returned if successful.
485 480
486 481
487 (*) Update the specified key: 482 * Update the specified key::
488 483
489 long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, 484 long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload,
490 size_t plen); 485 size_t plen);
@@ -498,7 +493,7 @@ The keyctl syscall functions are:
498 add_key(). 493 add_key().
499 494
500 495
501 (*) Revoke a key: 496 * Revoke a key::
502 497
503 long keyctl(KEYCTL_REVOKE, key_serial_t key); 498 long keyctl(KEYCTL_REVOKE, key_serial_t key);
504 499
@@ -507,7 +502,7 @@ The keyctl syscall functions are:
507 be findable. 502 be findable.
508 503
509 504
510 (*) Change the ownership of a key: 505 * Change the ownership of a key::
511 506
512 long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); 507 long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid);
513 508
@@ -520,7 +515,7 @@ The keyctl syscall functions are:
520 its group list members. 515 its group list members.
521 516
522 517
523 (*) Change the permissions mask on a key: 518 * Change the permissions mask on a key::
524 519
525 long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); 520 long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm);
526 521
@@ -531,7 +526,7 @@ The keyctl syscall functions are:
531 error EINVAL will be returned. 526 error EINVAL will be returned.
532 527
533 528
534 (*) Describe a key: 529 * Describe a key::
535 530
536 long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, 531 long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer,
537 size_t buflen); 532 size_t buflen);
@@ -547,7 +542,7 @@ The keyctl syscall functions are:
547 A process must have view permission on the key for this function to be 542 A process must have view permission on the key for this function to be
548 successful. 543 successful.
549 544
550 If successful, a string is placed in the buffer in the following format: 545 If successful, a string is placed in the buffer in the following format::
551 546
552 <type>;<uid>;<gid>;<perm>;<description> 547 <type>;<uid>;<gid>;<perm>;<description>
553 548
@@ -555,12 +550,12 @@ The keyctl syscall functions are:
555 is hexadecimal. A NUL character is included at the end of the string if 550 is hexadecimal. A NUL character is included at the end of the string if
556 the buffer is sufficiently big. 551 the buffer is sufficiently big.
557 552
558 This can be parsed with 553 This can be parsed with::
559 554
560 sscanf(buffer, "%[^;];%d;%d;%x;%s", type, &uid, &gid, &mode, desc); 555 sscanf(buffer, "%[^;];%d;%d;%x;%s", type, &uid, &gid, &mode, desc);
561 556
562 557
563 (*) Clear out a keyring: 558 * Clear out a keyring::
564 559
565 long keyctl(KEYCTL_CLEAR, key_serial_t keyring); 560 long keyctl(KEYCTL_CLEAR, key_serial_t keyring);
566 561
@@ -573,7 +568,7 @@ The keyctl syscall functions are:
573 DNS resolver cache keyring is an example of this. 568 DNS resolver cache keyring is an example of this.
574 569
575 570
576 (*) Link a key into a keyring: 571 * Link a key into a keyring::
577 572
578 long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); 573 long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key);
579 574
@@ -592,7 +587,7 @@ The keyctl syscall functions are:
592 added. 587 added.
593 588
594 589
595 (*) Unlink a key or keyring from another keyring: 590 * Unlink a key or keyring from another keyring::
596 591
597 long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); 592 long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key);
598 593
@@ -604,7 +599,7 @@ The keyctl syscall functions are:
604 is not present, error ENOENT will be the result. 599 is not present, error ENOENT will be the result.
605 600
606 601
607 (*) Search a keyring tree for a key: 602 * Search a keyring tree for a key::
608 603
609 key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, 604 key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring,
610 const char *type, const char *description, 605 const char *type, const char *description,
@@ -628,7 +623,7 @@ The keyctl syscall functions are:
628 fails. On success, the resulting key ID will be returned. 623 fails. On success, the resulting key ID will be returned.
629 624
630 625
631 (*) Read the payload data from a key: 626 * Read the payload data from a key::
632 627
633 long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, 628 long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer,
634 size_t buflen); 629 size_t buflen);
@@ -650,7 +645,7 @@ The keyctl syscall functions are:
650 available rather than the amount copied. 645 available rather than the amount copied.
651 646
652 647
653 (*) Instantiate a partially constructed key. 648 * Instantiate a partially constructed key::
654 649
655 long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, 650 long keyctl(KEYCTL_INSTANTIATE, key_serial_t key,
656 const void *payload, size_t plen, 651 const void *payload, size_t plen,
@@ -677,7 +672,7 @@ The keyctl syscall functions are:
677 array instead of a single buffer. 672 array instead of a single buffer.
678 673
679 674
680 (*) Negatively instantiate a partially constructed key. 675 * Negatively instantiate a partially constructed key::
681 676
682 long keyctl(KEYCTL_NEGATE, key_serial_t key, 677 long keyctl(KEYCTL_NEGATE, key_serial_t key,
683 unsigned timeout, key_serial_t keyring); 678 unsigned timeout, key_serial_t keyring);
@@ -700,12 +695,12 @@ The keyctl syscall functions are:
700 as rejecting the key with ENOKEY as the error code. 695 as rejecting the key with ENOKEY as the error code.
701 696
702 697
703 (*) Set the default request-key destination keyring. 698 * Set the default request-key destination keyring::
704 699
705 long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); 700 long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl);
706 701
707 This sets the default keyring to which implicitly requested keys will be 702 This sets the default keyring to which implicitly requested keys will be
708 attached for this thread. reqkey_defl should be one of these constants: 703 attached for this thread. reqkey_defl should be one of these constants::
709 704
710 CONSTANT VALUE NEW DEFAULT KEYRING 705 CONSTANT VALUE NEW DEFAULT KEYRING
711 ====================================== ====== ======================= 706 ====================================== ====== =======================
@@ -731,7 +726,7 @@ The keyctl syscall functions are:
731 there is one, otherwise the user default session keyring. 726 there is one, otherwise the user default session keyring.
732 727
733 728
734 (*) Set the timeout on a key. 729 * Set the timeout on a key::
735 730
736 long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); 731 long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout);
737 732
@@ -744,7 +739,7 @@ The keyctl syscall functions are:
744 or expired keys. 739 or expired keys.
745 740
746 741
747 (*) Assume the authority granted to instantiate a key 742 * Assume the authority granted to instantiate a key::
748 743
749 long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); 744 long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key);
750 745
@@ -766,7 +761,7 @@ The keyctl syscall functions are:
766 The assumed authoritative key is inherited across fork and exec. 761 The assumed authoritative key is inherited across fork and exec.
767 762
768 763
769 (*) Get the LSM security context attached to a key. 764 * Get the LSM security context attached to a key::
770 765
771 long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, 766 long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer,
772 size_t buflen); 767 size_t buflen);
@@ -787,7 +782,7 @@ The keyctl syscall functions are:
787 successful. 782 successful.
788 783
789 784
790 (*) Install the calling process's session keyring on its parent. 785 * Install the calling process's session keyring on its parent::
791 786
792 long keyctl(KEYCTL_SESSION_TO_PARENT); 787 long keyctl(KEYCTL_SESSION_TO_PARENT);
793 788
@@ -807,7 +802,7 @@ The keyctl syscall functions are:
807 kernel and resumes executing userspace. 802 kernel and resumes executing userspace.
808 803
809 804
810 (*) Invalidate a key. 805 * Invalidate a key::
811 806
812 long keyctl(KEYCTL_INVALIDATE, key_serial_t key); 807 long keyctl(KEYCTL_INVALIDATE, key_serial_t key);
813 808
@@ -823,20 +818,19 @@ The keyctl syscall functions are:
823 A process must have search permission on the key for this function to be 818 A process must have search permission on the key for this function to be
824 successful. 819 successful.
825 820
826 (*) Compute a Diffie-Hellman shared secret or public key 821 * Compute a Diffie-Hellman shared secret or public key::
827 822
828 long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, 823 long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params,
829 char *buffer, size_t buflen, 824 char *buffer, size_t buflen, struct keyctl_kdf_params *kdf);
830 struct keyctl_kdf_params *kdf);
831 825
832 The params struct contains serial numbers for three keys: 826 The params struct contains serial numbers for three keys::
833 827
834 - The prime, p, known to both parties 828 - The prime, p, known to both parties
835 - The local private key 829 - The local private key
836 - The base integer, which is either a shared generator or the 830 - The base integer, which is either a shared generator or the
837 remote public key 831 remote public key
838 832
839 The value computed is: 833 The value computed is::
840 834
841 result = base ^ private (mod prime) 835 result = base ^ private (mod prime)
842 836
@@ -858,12 +852,12 @@ The keyctl syscall functions are:
858 of the KDF is returned to the caller. The KDF is characterized with 852 of the KDF is returned to the caller. The KDF is characterized with
859 struct keyctl_kdf_params as follows: 853 struct keyctl_kdf_params as follows:
860 854
861 - char *hashname specifies the NUL terminated string identifying 855 - ``char *hashname`` specifies the NUL terminated string identifying
862 the hash used from the kernel crypto API and applied for the KDF 856 the hash used from the kernel crypto API and applied for the KDF
863 operation. The KDF implementation complies with SP800-56A as well 857 operation. The KDF implementation complies with SP800-56A as well
864 as with SP800-108 (the counter KDF). 858 as with SP800-108 (the counter KDF).
865 859
866 - char *otherinfo specifies the OtherInfo data as documented in 860 - ``char *otherinfo`` specifies the OtherInfo data as documented in
867 SP800-56A section 5.8.1.2. The length of the buffer is given with 861 SP800-56A section 5.8.1.2. The length of the buffer is given with
868 otherinfolen. The format of OtherInfo is defined by the caller. 862 otherinfolen. The format of OtherInfo is defined by the caller.
869 The otherinfo pointer may be NULL if no OtherInfo shall be used. 863 The otherinfo pointer may be NULL if no OtherInfo shall be used.
@@ -875,10 +869,10 @@ The keyctl syscall functions are:
875 and either the buffer length or the OtherInfo length exceeds the 869 and either the buffer length or the OtherInfo length exceeds the
876 allowed length. 870 allowed length.
877 871
878 (*) Restrict keyring linkage 872 * Restrict keyring linkage::
879 873
880 long keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring, 874 long keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring,
881 const char *type, const char *restriction); 875 const char *type, const char *restriction);
882 876
883 An existing keyring can restrict linkage of additional keys by evaluating 877 An existing keyring can restrict linkage of additional keys by evaluating
884 the contents of the key according to a restriction scheme. 878 the contents of the key according to a restriction scheme.
@@ -900,8 +894,7 @@ The keyctl syscall functions are:
900 To apply a keyring restriction the process must have Set Attribute 894 To apply a keyring restriction the process must have Set Attribute
901 permission and the keyring must not be previously restricted. 895 permission and the keyring must not be previously restricted.
902 896
903=============== 897Kernel Services
904KERNEL SERVICES
905=============== 898===============
906 899
907The kernel services for key management are fairly simple to deal with. They can 900The kernel services for key management are fairly simple to deal with. They can
@@ -915,29 +908,29 @@ call, and the key released upon close. How to deal with conflicting keys due to
915two different users opening the same file is left to the filesystem author to 908two different users opening the same file is left to the filesystem author to
916solve. 909solve.
917 910
918To access the key manager, the following header must be #included: 911To access the key manager, the following header must be #included::
919 912
920 <linux/key.h> 913 <linux/key.h>
921 914
922Specific key types should have a header file under include/keys/ that should be 915Specific key types should have a header file under include/keys/ that should be
923used to access that type. For keys of type "user", for example, that would be: 916used to access that type. For keys of type "user", for example, that would be::
924 917
925 <keys/user-type.h> 918 <keys/user-type.h>
926 919
927Note that there are two different types of pointers to keys that may be 920Note that there are two different types of pointers to keys that may be
928encountered: 921encountered:
929 922
930 (*) struct key * 923 * struct key *
931 924
932 This simply points to the key structure itself. Key structures will be at 925 This simply points to the key structure itself. Key structures will be at
933 least four-byte aligned. 926 least four-byte aligned.
934 927
935 (*) key_ref_t 928 * key_ref_t
936 929
937 This is equivalent to a struct key *, but the least significant bit is set 930 This is equivalent to a ``struct key *``, but the least significant bit is set
938 if the caller "possesses" the key. By "possession" it is meant that the 931 if the caller "possesses" the key. By "possession" it is meant that the
939 calling process has a searchable link to the key from one of its 932 calling process has a searchable link to the key from one of its
940 keyrings. There are three functions for dealing with these: 933 keyrings. There are three functions for dealing with these::
941 934
942 key_ref_t make_key_ref(const struct key *key, bool possession); 935 key_ref_t make_key_ref(const struct key *key, bool possession);
943 936
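The possession flag described above relies on the pointer's alignment: because key structures are at least four-byte aligned, bit 0 of the address is always clear and can carry a boolean. The following is a minimal userspace sketch of that idea, assuming a stand-in structure; `struct demo_key`, `demo_key_ref_t` and the `demo_*` helpers are hypothetical names chosen for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a key; four-byte alignment keeps bit 0 clear. */
struct demo_key {
	_Alignas(4) int serial;
};

/* A reference is the key's address with the possession flag in bit 0. */
typedef uintptr_t demo_key_ref_t;

static inline demo_key_ref_t demo_make_key_ref(const struct demo_key *key,
					       bool possession)
{
	return (uintptr_t)key | (possession ? (uintptr_t)1 : (uintptr_t)0);
}

/* Mask off the flag to recover the plain pointer. */
static inline struct demo_key *demo_key_ref_to_ptr(demo_key_ref_t ref)
{
	return (struct demo_key *)(ref & ~(uintptr_t)1);
}

/* Test the flag without touching the pointer bits. */
static inline bool demo_is_key_possessed(demo_key_ref_t ref)
{
	return (ref & (uintptr_t)1) != 0;
}
```

The design keeps a reference the same size as a pointer, so the flag travels with the key at no extra storage cost; the price is that a reference must never be dereferenced without first masking off the low bit.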
@@ -955,7 +948,7 @@ When accessing a key's payload contents, certain precautions must be taken to
955prevent access vs modification races. See the section "Notes on accessing 948prevent access vs modification races. See the section "Notes on accessing
956payload contents" for more information. 949payload contents" for more information.
957 950
958(*) To search for a key, call: 951 * To search for a key, call::
959 952
960 struct key *request_key(const struct key_type *type, 953 struct key *request_key(const struct key_type *type,
961 const char *description, 954 const char *description,
@@ -977,7 +970,7 @@ payload contents" for more information.
977 See also Documentation/security/keys-request-key.txt. 970 See also Documentation/security/keys-request-key.txt.
978 971
979 972
980(*) To search for a key, passing auxiliary data to the upcaller, call: 973 * To search for a key, passing auxiliary data to the upcaller, call::
981 974
982 struct key *request_key_with_auxdata(const struct key_type *type, 975 struct key *request_key_with_auxdata(const struct key_type *type,
983 const char *description, 976 const char *description,
@@ -990,14 +983,14 @@ payload contents" for more information.
990 is a blob of length callout_len, if given (the length may be 0). 983 is a blob of length callout_len, if given (the length may be 0).
991 984
992 985
993(*) A key can be requested asynchronously by calling one of: 986 * A key can be requested asynchronously by calling one of::
994 987
995 struct key *request_key_async(const struct key_type *type, 988 struct key *request_key_async(const struct key_type *type,
996 const char *description, 989 const char *description,
997 const void *callout_info, 990 const void *callout_info,
998 size_t callout_len); 991 size_t callout_len);
999 992
1000 or: 993 or::
1001 994
1002 struct key *request_key_async_with_auxdata(const struct key_type *type, 995 struct key *request_key_async_with_auxdata(const struct key_type *type,
1003 const char *description, 996 const char *description,
@@ -1010,7 +1003,7 @@ payload contents" for more information.
1010 1003
1011 These two functions return with the key potentially still under 1004 These two functions return with the key potentially still under
1012 construction. To wait for construction completion, the following should be 1005 construction. To wait for construction completion, the following should be
1013 called: 1006 called::
1014 1007
1015 int wait_for_key_construction(struct key *key, bool intr); 1008 int wait_for_key_construction(struct key *key, bool intr);
1016 1009
@@ -1022,11 +1015,11 @@ payload contents" for more information.
1022 case error ERESTARTSYS will be returned. 1015 case error ERESTARTSYS will be returned.
1023 1016
1024 1017
1025(*) When it is no longer required, the key should be released using: 1018 * When it is no longer required, the key should be released using::
1026 1019
1027 void key_put(struct key *key); 1020 void key_put(struct key *key);
1028 1021
1029 Or: 1022 Or::
1030 1023
1031 void key_ref_put(key_ref_t key_ref); 1024 void key_ref_put(key_ref_t key_ref);
1032 1025
@@ -1034,8 +1027,8 @@ payload contents" for more information.
1034 the argument will not be parsed. 1027 the argument will not be parsed.
1035 1028
1036 1029
1037(*) Extra references can be made to a key by calling one of the following 1030 * Extra references can be made to a key by calling one of the following
1038 functions: 1031 functions::
1039 1032
1040 struct key *__key_get(struct key *key); 1033 struct key *__key_get(struct key *key);
1041 struct key *key_get(struct key *key); 1034 struct key *key_get(struct key *key);
@@ -1047,7 +1040,7 @@ payload contents" for more information.
1047 then the key will not be dereferenced and no increment will take place. 1040 then the key will not be dereferenced and no increment will take place.
1048 1041
1049 1042
1050(*) A key's serial number can be obtained by calling: 1043 * A key's serial number can be obtained by calling::
1051 1044
1052 key_serial_t key_serial(struct key *key); 1045 key_serial_t key_serial(struct key *key);
1053 1046
@@ -1055,7 +1048,7 @@ payload contents" for more information.
1055 latter case without parsing the argument). 1048 latter case without parsing the argument).
1056 1049
1057 1050
1058(*) If a keyring was found in the search, this can be further searched by: 1051 * If a keyring was found in the search, this can be further searched by::
1059 1052
1060 key_ref_t keyring_search(key_ref_t keyring_ref, 1053 key_ref_t keyring_search(key_ref_t keyring_ref,
1061 const struct key_type *type, 1054 const struct key_type *type,
@@ -1070,7 +1063,7 @@ payload contents" for more information.
1070 reference pointer if successful. 1063 reference pointer if successful.
1071 1064
1072 1065
1073(*) A keyring can be created by: 1066 * A keyring can be created by::
1074 1067
1075 struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid, 1068 struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid,
1076 const struct cred *cred, 1069 const struct cred *cred,
@@ -1109,7 +1102,7 @@ payload contents" for more information.
1109 -EPERM to in this case. 1102 -EPERM to in this case.
1110 1103
1111 1104
1112(*) To check the validity of a key, this function can be called: 1105 * To check the validity of a key, this function can be called::
1113 1106
1114 int validate_key(struct key *key); 1107 int validate_key(struct key *key);
1115 1108
@@ -1119,7 +1112,7 @@ payload contents" for more information.
1119 returned (in the latter case without parsing the argument). 1112 returned (in the latter case without parsing the argument).
1120 1113
1121 1114
1122(*) To register a key type, the following function should be called: 1115 * To register a key type, the following function should be called::
1123 1116
1124 int register_key_type(struct key_type *type); 1117 int register_key_type(struct key_type *type);
1125 1118
@@ -1127,13 +1120,13 @@ payload contents" for more information.
1127 present. 1120 present.
1128 1121
1129 1122
1130(*) To unregister a key type, call: 1123 * To unregister a key type, call::
1131 1124
1132 void unregister_key_type(struct key_type *type); 1125 void unregister_key_type(struct key_type *type);
1133 1126
1134 1127
1135Under some circumstances, it may be desirable to deal with a bundle of keys. 1128Under some circumstances, it may be desirable to deal with a bundle of keys.
1136The facility provides access to the keyring type for managing such a bundle: 1129The facility provides access to the keyring type for managing such a bundle::
1137 1130
1138 struct key_type key_type_keyring; 1131 struct key_type key_type_keyring;
1139 1132
@@ -1143,8 +1136,7 @@ with keyring_search(). Note that it is not possible to use request_key() to
1143search a specific keyring, so using keyrings in this way is of limited utility. 1136search a specific keyring, so using keyrings in this way is of limited utility.
1144 1137
1145 1138
1146=================================== 1139Notes On Accessing Payload Contents
1147NOTES ON ACCESSING PAYLOAD CONTENTS
1148=================================== 1140===================================
1149 1141
1150The simplest payload is just data stored in key->payload directly. In this 1142The simplest payload is just data stored in key->payload directly. In this
@@ -1154,31 +1146,31 @@ More complex payload contents must be allocated and pointers to them set in the
1154key->payload.data[] array. One of the following ways must be selected to 1146key->payload.data[] array. One of the following ways must be selected to
1155access the data: 1147access the data:
1156 1148
1157 (1) Unmodifiable key type. 1149 1) Unmodifiable key type.
1158 1150
1159 If the key type does not have a modify method, then the key's payload can 1151 If the key type does not have a modify method, then the key's payload can
1160 be accessed without any form of locking, provided that it's known to be 1152 be accessed without any form of locking, provided that it's known to be
1161 instantiated (uninstantiated keys cannot be "found"). 1153 instantiated (uninstantiated keys cannot be "found").
1162 1154
1163 (2) The key's semaphore. 1155 2) The key's semaphore.
1164 1156
1165 The semaphore could be used to govern access to the payload and to control 1157 The semaphore could be used to govern access to the payload and to control
1166 the payload pointer. It must be write-locked for modifications and would 1158 the payload pointer. It must be write-locked for modifications and would
1167 have to be read-locked for general access. The disadvantage of doing this 1159 have to be read-locked for general access. The disadvantage of doing this
1168 is that the accessor may be required to sleep. 1160 is that the accessor may be required to sleep.
1169 1161
1170 (3) RCU. 1162 3) RCU.
1171 1163
1172 RCU must be used when the semaphore isn't already held; if the semaphore 1164 RCU must be used when the semaphore isn't already held; if the semaphore
1173 is held then the contents can't change under you unexpectedly as the 1165 is held then the contents can't change under you unexpectedly as the
1174 semaphore must still be used to serialise modifications to the key. The 1166 semaphore must still be used to serialise modifications to the key. The
1175 key management code takes care of this for the key type. 1167 key management code takes care of this for the key type.
1176 1168
1177 However, this means using: 1169 However, this means using::
1178 1170
1179 rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() 1171 rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock()
1180 1172
1181 to read the pointer, and: 1173 to read the pointer, and::
1182 1174
1183 rcu_dereference() ... rcu_assign_pointer() ... call_rcu() 1175 rcu_dereference() ... rcu_assign_pointer() ... call_rcu()
1184 1176
@@ -1194,11 +1186,11 @@ access the data:
1194 usage. This is called key->payload.rcu_data0. The following accessors 1186 usage. This is called key->payload.rcu_data0. The following accessors
1195 wrap the RCU calls to this element: 1187 wrap the RCU calls to this element:
1196 1188
1197 (a) Set or change the first payload pointer: 1189 a) Set or change the first payload pointer::
1198 1190
1199 rcu_assign_keypointer(struct key *key, void *data); 1191 rcu_assign_keypointer(struct key *key, void *data);
1200 1192
1201 (b) Read the first payload pointer with the key semaphore held: 1193 b) Read the first payload pointer with the key semaphore held::
1202 1194
1203 [const] void *dereference_key_locked([const] struct key *key); 1195 [const] void *dereference_key_locked([const] struct key *key);
1204 1196
@@ -1206,39 +1198,38 @@ access the data:
1206 parameter. Static analysis will give an error if it thinks the lock 1198 parameter. Static analysis will give an error if it thinks the lock
1207 isn't held. 1199 isn't held.
1208 1200
1209 (c) Read the first payload pointer with the RCU read lock held: 1201 c) Read the first payload pointer with the RCU read lock held::
1210 1202
1211 const void *dereference_key_rcu(const struct key *key); 1203 const void *dereference_key_rcu(const struct key *key);
1212 1204
1213 1205
1214=================== 1206Defining a Key Type
1215DEFINING A KEY TYPE
1216=================== 1207===================
1217 1208
1218A kernel service may want to define its own key type. For instance, an AFS 1209A kernel service may want to define its own key type. For instance, an AFS
1219filesystem might want to define a Kerberos 5 ticket key type. To do this, its 1210filesystem might want to define a Kerberos 5 ticket key type. To do this, its
1220author fills in a key_type struct and registers it with the system. 1211author fills in a key_type struct and registers it with the system.
1221 1212
1222Source files that implement key types should include the following header file: 1213Source files that implement key types should include the following header file::
1223 1214
1224 <linux/key-type.h> 1215 <linux/key-type.h>
1225 1216
1226The structure has a number of fields, some of which are mandatory: 1217The structure has a number of fields, some of which are mandatory:
1227 1218
1228 (*) const char *name 1219 * ``const char *name``
1229 1220
1230 The name of the key type. This is used to translate a key type name 1221 The name of the key type. This is used to translate a key type name
1231 supplied by userspace into a pointer to the structure. 1222 supplied by userspace into a pointer to the structure.
1232 1223
1233 1224
1234 (*) size_t def_datalen 1225 * ``size_t def_datalen``
1235 1226
1236 This is optional - it supplies the default payload data length as 1227 This is optional - it supplies the default payload data length as
1237 contributed to the quota. If the key type's payload is always or almost 1228 contributed to the quota. If the key type's payload is always or almost
1238 always the same size, then this is a more efficient way to do things. 1229 always the same size, then this is a more efficient way to do things.
1239 1230
1240 The data length (and quota) on a particular key can always be changed 1231 The data length (and quota) on a particular key can always be changed
1241 during instantiation or update by calling: 1232 during instantiation or update by calling::
1242 1233
1243 int key_payload_reserve(struct key *key, size_t datalen); 1234 int key_payload_reserve(struct key *key, size_t datalen);
1244 1235
@@ -1246,18 +1237,18 @@ The structure has a number of fields, some of which are mandatory:
1246 viable. 1237 viable.
1247 1238
1248 1239
1249 (*) int (*vet_description)(const char *description); 1240 * ``int (*vet_description)(const char *description);``
1250 1241
1251 This optional method is called to vet a key description. If the key type 1242 This optional method is called to vet a key description. If the key type
1252 doesn't approve of the key description, it may return an error, otherwise 1243 doesn't approve of the key description, it may return an error, otherwise
1253 it should return 0. 1244 it should return 0.
1254 1245
1255 1246
1256 (*) int (*preparse)(struct key_preparsed_payload *prep); 1247 * ``int (*preparse)(struct key_preparsed_payload *prep);``
1257 1248
1258 This optional method permits the key type to attempt to parse payload 1249 This optional method permits the key type to attempt to parse payload
1259 before a key is created (add key) or the key semaphore is taken (update or 1250 before a key is created (add key) or the key semaphore is taken (update or
1260 instantiate key). The structure pointed to by prep looks like: 1251 instantiate key). The structure pointed to by prep looks like::
1261 1252
1262 struct key_preparsed_payload { 1253 struct key_preparsed_payload {
1263 char *description; 1254 char *description;
@@ -1285,7 +1276,7 @@ The structure has a number of fields, some of which are mandatory:
1285 otherwise. 1276 otherwise.
1286 1277
1287 1278
1288 (*) void (*free_preparse)(struct key_preparsed_payload *prep); 1279 * ``void (*free_preparse)(struct key_preparsed_payload *prep);``
1289 1280
1290 This method is only required if the preparse() method is provided, 1281 This method is only required if the preparse() method is provided,
1291 otherwise it is unused. It cleans up anything attached to the description 1282 otherwise it is unused. It cleans up anything attached to the description
@@ -1294,7 +1285,7 @@ The structure has a number of fields, some of which are mandatory:
1294 successfully, even if instantiate() or update() succeed. 1285 successfully, even if instantiate() or update() succeed.
1295 1286
1296 1287
1297 (*) int (*instantiate)(struct key *key, struct key_preparsed_payload *prep); 1288 * ``int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);``
1298 1289
1299 This method is called to attach a payload to a key during construction. 1290 This method is called to attach a payload to a key during construction.
1300 The payload attached need not bear any relation to the data passed to this 1291 The payload attached need not bear any relation to the data passed to this
@@ -1318,7 +1309,7 @@ The structure has a number of fields, some of which are mandatory:
1318 free_preparse method doesn't release the data. 1309 free_preparse method doesn't release the data.
1319 1310
1320 1311
1321 (*) int (*update)(struct key *key, const void *data, size_t datalen); 1312 * ``int (*update)(struct key *key, const void *data, size_t datalen);``
1322 1313
1323 If this type of key can be updated, then this method should be provided. 1314 If this type of key can be updated, then this method should be provided.
1324 It is called to update a key's payload from the blob of data provided. 1315 It is called to update a key's payload from the blob of data provided.
@@ -1343,10 +1334,10 @@ The structure has a number of fields, some of which are mandatory:
1343 It is safe to sleep in this method. 1334 It is safe to sleep in this method.
1344 1335
1345 1336
1346 (*) int (*match_preparse)(struct key_match_data *match_data); 1337 * ``int (*match_preparse)(struct key_match_data *match_data);``
1347 1338
1348 This method is optional. It is called when a key search is about to be 1339 This method is optional. It is called when a key search is about to be
1349 performed. It is given the following structure: 1340 performed. It is given the following structure::
1350 1341
1351 struct key_match_data { 1342 struct key_match_data {
1352 bool (*cmp)(const struct key *key, 1343 bool (*cmp)(const struct key *key,
@@ -1357,23 +1348,23 @@ The structure has a number of fields, some of which are mandatory:
1357 }; 1348 };
1358 1349
1359 On entry, raw_data will be pointing to the criteria to be used in matching 1350 On entry, raw_data will be pointing to the criteria to be used in matching
1360 a key by the caller and should not be modified. (*cmp)() will be pointing 1351 a key by the caller and should not be modified. ``(*cmp)()`` will be pointing
1361 to the default matcher function (which does an exact description match 1352 to the default matcher function (which does an exact description match
1362 against raw_data) and lookup_type will be set to indicate a direct lookup. 1353 against raw_data) and lookup_type will be set to indicate a direct lookup.
1363 1354
1364 The following lookup_type values are available: 1355 The following lookup_type values are available:
1365 1356
1366 [*] KEYRING_SEARCH_LOOKUP_DIRECT - A direct lookup hashes the type and 1357 * KEYRING_SEARCH_LOOKUP_DIRECT - A direct lookup hashes the type and
1367 description to narrow down the search to a small number of keys. 1358 description to narrow down the search to a small number of keys.
1368 1359
1369 [*] KEYRING_SEARCH_LOOKUP_ITERATE - An iterative lookup walks all the 1360 * KEYRING_SEARCH_LOOKUP_ITERATE - An iterative lookup walks all the
1370 keys in the keyring until one is matched. This must be used for any 1361 keys in the keyring until one is matched. This must be used for any
1371 search that's not doing a simple direct match on the key description. 1362 search that's not doing a simple direct match on the key description.
1372 1363
1373 The method may set cmp to point to a function of its choice that does some 1364 The method may set cmp to point to a function of its choice that does some
1374 other form of match, may set lookup_type to KEYRING_SEARCH_LOOKUP_ITERATE 1365 other form of match, may set lookup_type to KEYRING_SEARCH_LOOKUP_ITERATE
1375 and may attach something to the preparsed pointer for use by (*cmp)(). 1366 and may attach something to the preparsed pointer for use by ``(*cmp)()``.
1376 (*cmp)() should return true if a key matches and false otherwise. 1367 ``(*cmp)()`` should return true if a key matches and false otherwise.
1377 1368
1378 If preparsed is set, it may be necessary to use the match_free() method to 1369 If preparsed is set, it may be necessary to use the match_free() method to
1379 clean it up. 1370 clean it up.
@@ -1381,20 +1372,20 @@ The structure has a number of fields, some of which are mandatory:
1381 The method should return 0 if successful or a negative error code 1372 The method should return 0 if successful or a negative error code
1382 otherwise. 1373 otherwise.
1383 1374
1384 It is permitted to sleep in this method, but (*cmp)() may not sleep as 1375 It is permitted to sleep in this method, but ``(*cmp)()`` may not sleep as
1385 locks will be held over it. 1376 locks will be held over it.
1386 1377
1387 If match_preparse() is not provided, keys of this type will be matched 1378 If match_preparse() is not provided, keys of this type will be matched
1388 exactly by their description. 1379 exactly by their description.
1389 1380
1390 1381
1391 (*) void (*match_free)(struct key_match_data *match_data); 1382 * ``void (*match_free)(struct key_match_data *match_data);``
1392 1383
1393 This method is optional. If given, it is called to clean up 1384 This method is optional. If given, it is called to clean up
1394 match_data->preparsed after a successful call to match_preparse(). 1385 match_data->preparsed after a successful call to match_preparse().
1395 1386
1396 1387
1397 (*) void (*revoke)(struct key *key); 1388 * ``void (*revoke)(struct key *key);``
1398 1389
1399 This method is optional. It is called to discard part of the payload 1390 This method is optional. It is called to discard part of the payload
1400 data upon a key being revoked. The caller will have the key semaphore 1391 data upon a key being revoked. The caller will have the key semaphore
@@ -1404,7 +1395,7 @@ The structure has a number of fields, some of which are mandatory:
1404 a deadlock against the key semaphore. 1395 a deadlock against the key semaphore.
1405 1396
1406 1397
1407 (*) void (*destroy)(struct key *key); 1398 * ``void (*destroy)(struct key *key);``
1408 1399
1409 This method is optional. It is called to discard the payload data on a key 1400 This method is optional. It is called to discard the payload data on a key
1410 when it is being destroyed. 1401 when it is being destroyed.
@@ -1416,7 +1407,7 @@ The structure has a number of fields, some of which are mandatory:
1416 It is not safe to sleep in this method; the caller may hold spinlocks. 1407 It is not safe to sleep in this method; the caller may hold spinlocks.
1417 1408
1418 1409
1419 (*) void (*describe)(const struct key *key, struct seq_file *p); 1410 * ``void (*describe)(const struct key *key, struct seq_file *p);``
1420 1411
1421 This method is optional. It is called during /proc/keys reading to 1412 This method is optional. It is called during /proc/keys reading to
1422 summarise a key's description and payload in text form. 1413 summarise a key's description and payload in text form.
@@ -1432,7 +1423,7 @@ The structure has a number of fields, some of which are mandatory:
1432 caller. 1423 caller.
1433 1424
1434 1425
1435 (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); 1426 * ``long (*read)(const struct key *key, char __user *buffer, size_t buflen);``
1436 1427
1437 This method is optional. It is called by KEYCTL_READ to translate the 1428 This method is optional. It is called by KEYCTL_READ to translate the
1438 key's payload into a blob of data for userspace to deal with. 1429 key's payload into a blob of data for userspace to deal with.
@@ -1448,8 +1439,7 @@ The structure has a number of fields, some of which are mandatory:
1448 as might happen when the userspace buffer is accessed. 1439 as might happen when the userspace buffer is accessed.
1449 1440
1450 1441
1451 (*) int (*request_key)(struct key_construction *cons, const char *op, 1442 * ``int (*request_key)(struct key_construction *cons, const char *op, void *aux);``
1452 void *aux);
1453 1443
1454 This method is optional. If provided, request_key() and friends will 1444 This method is optional. If provided, request_key() and friends will
1455 invoke this function rather than upcalling to /sbin/request-key to operate 1445 invoke this function rather than upcalling to /sbin/request-key to operate
@@ -1463,7 +1453,7 @@ The structure has a number of fields, some of which are mandatory:
1463 This method is permitted to return before the upcall is complete, but the 1453 This method is permitted to return before the upcall is complete, but the
1464 following function must be called under all circumstances to complete the 1454 following function must be called under all circumstances to complete the
1465 instantiation process, whether or not it succeeds, whether or not there's 1455 instantiation process, whether or not it succeeds, whether or not there's
1466 an error: 1456 an error::
1467 1457
1468 void complete_request_key(struct key_construction *cons, int error); 1458 void complete_request_key(struct key_construction *cons, int error);
1469 1459
@@ -1479,16 +1469,16 @@ The structure has a number of fields, some of which are mandatory:
1479 The key under construction and the authorisation key can be found in the 1469 The key under construction and the authorisation key can be found in the
1480 key_construction struct pointed to by cons: 1470 key_construction struct pointed to by cons:
1481 1471
1482 (*) struct key *key; 1472 * ``struct key *key;``
1483 1473
1484 The key under construction. 1474 The key under construction.
1485 1475
1486 (*) struct key *authkey; 1476 * ``struct key *authkey;``
1487 1477
1488 The authorisation key. 1478 The authorisation key.
1489 1479
1490 1480
1491 (*) struct key_restriction *(*lookup_restriction)(const char *params); 1481 * ``struct key_restriction *(*lookup_restriction)(const char *params);``
1492 1482
1493 This optional method is used to enable userspace configuration of keyring 1483 This optional method is used to enable userspace configuration of keyring
1494 restrictions. The restriction parameter string (not including the key type 1484 restrictions. The restriction parameter string (not including the key type
@@ -1497,12 +1487,11 @@ The structure has a number of fields, some of which are mandatory:
1497 attempted key link operation. If there is no match, -EINVAL is returned. 1487 attempted key link operation. If there is no match, -EINVAL is returned.
1498 1488
1499 1489
1500============================ 1490Request-Key Callback Service
1501REQUEST-KEY CALLBACK SERVICE
1502============================ 1491============================
1503 1492
1504To create a new key, the kernel will attempt to execute the following command 1493To create a new key, the kernel will attempt to execute the following command
1505line: 1494line::
1506 1495
1507 /sbin/request-key create <key> <uid> <gid> \ 1496 /sbin/request-key create <key> <uid> <gid> \
1508 <threadring> <processring> <sessionring> <callout_info> 1497 <threadring> <processring> <sessionring> <callout_info>
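A userspace handler invoked with the command line above has to pick the fields apart before it can act. The sketch below shows one way that might look. It is an assumption-laden illustration: `struct rk_args`, `rk_parse_long` and `rk_parse_args` are hypothetical names, and the numeric fields are assumed to arrive as decimal strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the fields of the command line. */
struct rk_args {
	const char *op;			/* "create" or "update" */
	long key, uid, gid;
	long threadring, processring, sessionring;
	const char *callout_info;	/* NULL when not supplied */
};

/* Parse one decimal field; returns 0 on success, -1 if malformed. */
static int rk_parse_long(const char *s, long *out)
{
	char *end;
	long value;

	if (s == NULL || *s == '\0')
		return -1;
	value = strtol(s, &end, 10);
	if (*end != '\0')
		return -1;
	*out = value;
	return 0;
}

/* Fill *out from argv; returns 0 on success, -1 on a bad command line. */
int rk_parse_args(int argc, char *const argv[], struct rk_args *out)
{
	long *fields[6];
	int i;

	assert(argv != NULL && out != NULL);
	if (argc < 8)
		return -1;

	out->op = argv[1];
	fields[0] = &out->key;
	fields[1] = &out->uid;
	fields[2] = &out->gid;
	fields[3] = &out->threadring;
	fields[4] = &out->processring;
	fields[5] = &out->sessionring;
	for (i = 0; i < 6; i++)
		if (rk_parse_long(argv[2 + i], fields[i]) != 0)
			return -1;

	/* Only the "create" form carries callout_info as a final argument. */
	out->callout_info = NULL;
	if (strcmp(out->op, "create") == 0) {
		if (argc < 9)
			return -1;
		out->callout_info = argv[8];
	}
	return 0;
}
```

Rejecting malformed numbers up front keeps the handler from acting on a half-parsed request; what it then does with the key and keyring IDs is outside the scope of this sketch.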
@@ -1511,10 +1500,10 @@ line:
1511keyrings from the process that caused the search to be issued. These are 1500keyrings from the process that caused the search to be issued. These are
1512included for two reasons: 1501included for two reasons:
1513 1502
1514 (1) There may be an authentication token in one of the keyrings that is 1503 1 There may be an authentication token in one of the keyrings that is
1515 required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. 1504 required to obtain the key, eg: a Kerberos Ticket-Granting Ticket.
1516 1505
1517 (2) The new key should probably be cached in one of these rings. 1506 2 The new key should probably be cached in one of these rings.
1518 1507
1519This program should set its UID and GID to those specified before attempting to 1509This program should set its UID and GID to those specified before attempting to
1520access any more keys. It may then look around for a user specific process to 1509access any more keys. It may then look around for a user specific process to
@@ -1539,7 +1528,7 @@ instead.
1539 1528
1540 1529
1541Similarly, the kernel may attempt to update an expired or a soon to expire key 1530Similarly, the kernel may attempt to update an expired or a soon to expire key
1542by executing: 1531by executing::
1543 1532
1544 /sbin/request-key update <key> <uid> <gid> \ 1533 /sbin/request-key update <key> <uid> <gid> \
1545 <threadring> <processring> <sessionring> 1534 <threadring> <processring> <sessionring>
@@ -1548,8 +1537,7 @@ In this case, the program isn't required to actually attach the key to a ring;
1548the rings are provided for reference. 1537the rings are provided for reference.
1549 1538
1550 1539
1551================== 1540Garbage Collection
1552GARBAGE COLLECTION
1553================== 1541==================
1554 1542
1555Dead keys (for which the type has been removed) will be automatically unlinked 1543Dead keys (for which the type has been removed) will be automatically unlinked
@@ -1557,6 +1545,6 @@ from those keyrings that point to them and deleted as soon as possible by a
1557background garbage collector. 1545background garbage collector.
1558 1546
1559Similarly, revoked and expired keys will be garbage collected, but only after a 1547Similarly, revoked and expired keys will be garbage collected, but only after a
1560certain amount of time has passed. This time is set as a number of seconds in: 1548certain amount of time has passed. This time is set as a number of seconds in::
1561 1549
1562 /proc/sys/kernel/keys/gc_delay 1550 /proc/sys/kernel/keys/gc_delay
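Reviewer's note on the garbage-collection text above: the delay is exposed as a plain number of seconds in a procfs file, so it can be read like any other text file. The following is a minimal sketch, assuming a Linux system with the keys sysctl directory present; the helper name `read_gc_delay` is hypothetical and not part of any kernel or keyutils interface.

```python
from pathlib import Path

# Path quoted in the documentation above.
GC_DELAY_PATH = "/proc/sys/kernel/keys/gc_delay"


def read_gc_delay(path: str = GC_DELAY_PATH) -> int:
    """Return the key garbage-collection delay in seconds.

    Hypothetical helper: it only parses the single integer that the
    documentation says is stored in the file. The path is a parameter
    so the parsing can be exercised against an ordinary file.
    """
    return int(Path(path).read_text().strip())
```

On a live system, `read_gc_delay()` with no arguments reads the real sysctl; changing the value would require writing to the same file with sufficient privileges, which this sketch deliberately does not attempt.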
diff --git a/Documentation/security/keys-ecryptfs.txt b/Documentation/security/keys/ecryptfs.rst
index c3bbeba63562..4920f3a8ea75 100644
--- a/Documentation/security/keys-ecryptfs.txt
+++ b/Documentation/security/keys/ecryptfs.rst
@@ -1,4 +1,6 @@
1 Encrypted keys for the eCryptfs filesystem 1==========================================
2Encrypted keys for the eCryptfs filesystem
3==========================================
2 4
3ECryptfs is a stacked filesystem which transparently encrypts and decrypts each 5ECryptfs is a stacked filesystem which transparently encrypts and decrypts each
4file using a randomly generated File Encryption Key (FEK). 6file using a randomly generated File Encryption Key (FEK).
@@ -35,20 +37,23 @@ controlled environment. Another advantage is that the key is not exposed to
35threats of malicious software, because it is available in clear form only at 37threats of malicious software, because it is available in clear form only at
36kernel level. 38kernel level.
37 39
38Usage: 40Usage::
41
39 keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring 42 keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring
40 keyctl add encrypted name "load hex_blob" ring 43 keyctl add encrypted name "load hex_blob" ring
41 keyctl update keyid "update key-type:master-key-name" 44 keyctl update keyid "update key-type:master-key-name"
42 45
43name:= '<16 hexadecimal characters>' 46Where::
44key-type:= 'trusted' | 'user' 47
45keylen:= 64 48 name:= '<16 hexadecimal characters>'
49 key-type:= 'trusted' | 'user'
50 keylen:= 64
46 51
47 52
48Example of encrypted key usage with the eCryptfs filesystem: 53Example of encrypted key usage with the eCryptfs filesystem:
49 54
50Create an encrypted key "1000100010001000" of length 64 bytes with format 55Create an encrypted key "1000100010001000" of length 64 bytes with format
51'ecryptfs' and save it using a previously loaded user key "test": 56'ecryptfs' and save it using a previously loaded user key "test"::
52 57
53 $ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u 58 $ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u
54 19184530 59 19184530
@@ -62,7 +67,7 @@ Create an encrypted key "1000100010001000" of length 64 bytes with format
62 $ keyctl pipe 19184530 > ecryptfs.blob 67 $ keyctl pipe 19184530 > ecryptfs.blob
63 68
64Mount an eCryptfs filesystem using the created encrypted key "1000100010001000" 69Mount an eCryptfs filesystem using the created encrypted key "1000100010001000"
65into the '/secret' directory: 70into the '/secret' directory::
66 71
67 $ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\ 72 $ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\
68 ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret 73 ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret
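Reviewer's note on the usage grammar in the ecryptfs diff above: the constraints (`name` is 16 hexadecimal characters, `key-type` is `trusted` or `user`, `keylen` is 64) are easy to check before invoking `keyctl`. The sketch below only builds the argument list; it does not run anything. The function name `ecryptfs_key_args` is hypothetical, and the `@u` target keyring is taken from the example in the document rather than being a requirement.

```python
import re

KEY_TYPES = ("trusted", "user")  # key-type := 'trusted' | 'user'
ECRYPTFS_KEYLEN = 64             # keylen := 64


def ecryptfs_key_args(name, key_type, master_key_name, ring="@u"):
    """Build the argv for 'keyctl add encrypted ... "new ecryptfs ..."'.

    Hypothetical helper that validates its inputs against the grammar
    given in the document and returns the command as a list of strings.
    """
    if not re.fullmatch(r"[0-9a-fA-F]{16}", name):
        raise ValueError("name must be exactly 16 hexadecimal characters")
    if key_type not in KEY_TYPES:
        raise ValueError("key-type must be 'trusted' or 'user'")
    payload = f"new ecryptfs {key_type}:{master_key_name} {ECRYPTFS_KEYLEN}"
    return ["keyctl", "add", "encrypted", name, payload, ring]
```

Calling `ecryptfs_key_args("1000100010001000", "user", "test")` reproduces the command line shown in the document's example; the resulting list could be handed to `subprocess.run` on a system where keyutils is installed.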
diff --git a/Documentation/security/keys/index.rst b/Documentation/security/keys/index.rst
new file mode 100644
index 000000000000..647d58f2588e
--- /dev/null
+++ b/Documentation/security/keys/index.rst
@@ -0,0 +1,11 @@
1===========
2Kernel Keys
3===========
4
5.. toctree::
6 :maxdepth: 1
7
8 core
9 ecryptfs
10 request-key
11 trusted-encrypted
diff --git a/Documentation/security/keys-request-key.txt b/Documentation/security/keys/request-key.rst
index 51987bfecfed..aba32784174c 100644
--- a/Documentation/security/keys-request-key.txt
+++ b/Documentation/security/keys/request-key.rst
@@ -1,19 +1,19 @@
1 =================== 1===================
2 KEY REQUEST SERVICE 2Key Request Service
3 =================== 3===================
4 4
5The key request service is part of the key retention service (refer to 5The key request service is part of the key retention service (refer to
6Documentation/security/keys.txt). This document explains more fully how 6Documentation/security/keys.txt). This document explains more fully how
7the requesting algorithm works. 7the requesting algorithm works.
8 8
9The process starts by either the kernel requesting a service by calling 9The process starts by either the kernel requesting a service by calling
10request_key*(): 10``request_key*()``::
11 11
12 struct key *request_key(const struct key_type *type, 12 struct key *request_key(const struct key_type *type,
13 const char *description, 13 const char *description,
14 const char *callout_info); 14 const char *callout_info);
15 15
16or: 16or::
17 17
18 struct key *request_key_with_auxdata(const struct key_type *type, 18 struct key *request_key_with_auxdata(const struct key_type *type,
19 const char *description, 19 const char *description,
@@ -21,14 +21,14 @@ or:
21 size_t callout_len, 21 size_t callout_len,
22 void *aux); 22 void *aux);
23 23
24or: 24or::
25 25
26 struct key *request_key_async(const struct key_type *type, 26 struct key *request_key_async(const struct key_type *type,
27 const char *description, 27 const char *description,
28 const char *callout_info, 28 const char *callout_info,
29 size_t callout_len); 29 size_t callout_len);
30 30
31or: 31or::
32 32
33 struct key *request_key_async_with_auxdata(const struct key_type *type, 33 struct key *request_key_async_with_auxdata(const struct key_type *type,
34 const char *description, 34 const char *description,
@@ -36,7 +36,7 @@ or:
36 size_t callout_len, 36 size_t callout_len,
37 void *aux); 37 void *aux);
38 38
39Or by userspace invoking the request_key system call: 39Or by userspace invoking the request_key system call::
40 40
41 key_serial_t request_key(const char *type, 41 key_serial_t request_key(const char *type,
42 const char *description, 42 const char *description,
@@ -67,38 +67,37 @@ own upcall mechanisms. If they do, then those should be substituted for the
67forking and execution of /sbin/request-key. 67forking and execution of /sbin/request-key.
68 68
69 69
70=========== 70The Process
71THE PROCESS
72=========== 71===========
73 72
74A request proceeds in the following manner: 73A request proceeds in the following manner:
75 74
76 (1) Process A calls request_key() [the userspace syscall calls the kernel 75 1) Process A calls request_key() [the userspace syscall calls the kernel
77 interface]. 76 interface].
78 77
79 (2) request_key() searches the process's subscribed keyrings to see if there's 78 2) request_key() searches the process's subscribed keyrings to see if there's
80 a suitable key there. If there is, it returns the key. If there isn't, 79 a suitable key there. If there is, it returns the key. If there isn't,
81 and callout_info is not set, an error is returned. Otherwise the process 80 and callout_info is not set, an error is returned. Otherwise the process
82 proceeds to the next step. 81 proceeds to the next step.
83 82
84 (3) request_key() sees that A doesn't have the desired key yet, so it creates 83 3) request_key() sees that A doesn't have the desired key yet, so it creates
85 two things: 84 two things:
86 85
87 (a) An uninstantiated key U of requested type and description. 86 a) An uninstantiated key U of requested type and description.
88 87
89 (b) An authorisation key V that refers to key U and notes that process A 88 b) An authorisation key V that refers to key U and notes that process A
90 is the context in which key U should be instantiated and secured, and 89 is the context in which key U should be instantiated and secured, and
91 from which associated key requests may be satisfied. 90 from which associated key requests may be satisfied.
92 91
93 (4) request_key() then forks and executes /sbin/request-key with a new session 92 4) request_key() then forks and executes /sbin/request-key with a new session
94 keyring that contains a link to auth key V. 93 keyring that contains a link to auth key V.
95 94
96 (5) /sbin/request-key assumes the authority associated with key U. 95 5) /sbin/request-key assumes the authority associated with key U.
97 96
98 (6) /sbin/request-key execs an appropriate program to perform the actual 97 6) /sbin/request-key execs an appropriate program to perform the actual
99 instantiation. 98 instantiation.
100 99
101 (7) The program may want to access another key from A's context (say a 100 7) The program may want to access another key from A's context (say a
102 Kerberos TGT key). It just requests the appropriate key, and the keyring 101 Kerberos TGT key). It just requests the appropriate key, and the keyring
103 search notes that the session keyring has auth key V in its bottom level. 102 search notes that the session keyring has auth key V in its bottom level.
104 103
@@ -106,15 +105,15 @@ A request proceeds in the following manner:
106 UID, GID, groups and security info of process A as if it was process A, 105 UID, GID, groups and security info of process A as if it was process A,
107 and come up with key W. 106 and come up with key W.
108 107
109 (8) The program then does what it must to get the data with which to 108 8) The program then does what it must to get the data with which to
110 instantiate key U, using key W as a reference (perhaps it contacts a 109 instantiate key U, using key W as a reference (perhaps it contacts a
111 Kerberos server using the TGT) and then instantiates key U. 110 Kerberos server using the TGT) and then instantiates key U.
112 111
113 (9) Upon instantiating key U, auth key V is automatically revoked so that it 112 9) Upon instantiating key U, auth key V is automatically revoked so that it
114 may not be used again. 113 may not be used again.
115 114
116(10) The program then exits 0 and request_key() deletes key V and returns key 115 10) The program then exits 0 and request_key() deletes key V and returns key
117 U to the caller. 116 U to the caller.
118 117
119This also extends further. If key W (step 7 above) didn't exist, key W would 118This also extends further. If key W (step 7 above) didn't exist, key W would
120be created uninstantiated, another auth key (X) would be created (as per step 119be created uninstantiated, another auth key (X) would be created (as per step
@@ -127,8 +126,7 @@ This is because process A's keyrings can't simply be attached to
127of them, and (b) it requires the same UID/GID/Groups all the way through. 126of them, and (b) it requires the same UID/GID/Groups all the way through.
128 127
129 128
130==================================== 129Negative Instantiation And Rejection
131NEGATIVE INSTANTIATION AND REJECTION
132==================================== 130====================================
133 131
134Rather than instantiating a key, it is possible for the possessor of an 132Rather than instantiating a key, it is possible for the possessor of an
@@ -145,23 +143,22 @@ signal, the key under construction will be automatically negatively
145instantiated for a short amount of time. 143instantiated for a short amount of time.
146 144
147 145
148==================== 146The Search Algorithm
149THE SEARCH ALGORITHM
150==================== 147====================
151 148
152A search of any particular keyring proceeds in the following fashion: 149A search of any particular keyring proceeds in the following fashion:
153 150
154 (1) When the key management code searches for a key (keyring_search_aux) it 151 1) When the key management code searches for a key (keyring_search_aux) it
155 firstly calls key_permission(SEARCH) on the keyring it's starting with, 152 firstly calls key_permission(SEARCH) on the keyring it's starting with,
156 if this denies permission, it doesn't search further. 153 if this denies permission, it doesn't search further.
157 154
158 (2) It considers all the non-keyring keys within that keyring and, if any key 155 2) It considers all the non-keyring keys within that keyring and, if any key
159 matches the criteria specified, calls key_permission(SEARCH) on it to see 156 matches the criteria specified, calls key_permission(SEARCH) on it to see
160 if the key is allowed to be found. If it is, that key is returned; if 157 if the key is allowed to be found. If it is, that key is returned; if
161 not, the search continues, and the error code is retained if of higher 158 not, the search continues, and the error code is retained if of higher
162 priority than the one currently set. 159 priority than the one currently set.
163 160
164 (3) It then considers all the keyring-type keys in the keyring it's currently 161 3) It then considers all the keyring-type keys in the keyring it's currently
165 searching. It calls key_permission(SEARCH) on each keyring, and if this 162 searching. It calls key_permission(SEARCH) on each keyring, and if this
166 grants permission, it recurses, executing steps (2) and (3) on that 163 grants permission, it recurses, executing steps (2) and (3) on that
167 keyring. 164 keyring.
@@ -173,20 +170,20 @@ returned.
173When search_process_keyrings() is invoked, it performs the following searches 170When search_process_keyrings() is invoked, it performs the following searches
174until one succeeds: 171until one succeeds:
175 172
176 (1) If extant, the process's thread keyring is searched. 173 1) If extant, the process's thread keyring is searched.
177 174
178 (2) If extant, the process's process keyring is searched. 175 2) If extant, the process's process keyring is searched.
179 176
180 (3) The process's session keyring is searched. 177 3) The process's session keyring is searched.
181 178
182 (4) If the process has assumed the authority associated with a request_key() 179 4) If the process has assumed the authority associated with a request_key()
183 authorisation key then: 180 authorisation key then:
184 181
185 (a) If extant, the calling process's thread keyring is searched. 182 a) If extant, the calling process's thread keyring is searched.
186 183
187 (b) If extant, the calling process's process keyring is searched. 184 b) If extant, the calling process's process keyring is searched.
188 185
189 (c) The calling process's session keyring is searched. 186 c) The calling process's session keyring is searched.
190 187
191The moment one succeeds, all pending errors are discarded and the found key is 188The moment one succeeds, all pending errors are discarded and the found key is
192returned. 189returned.
@@ -194,7 +191,7 @@ returned.
194Only if all these fail does the whole thing fail with the highest priority 191Only if all these fail does the whole thing fail with the highest priority
195error. Note that several errors may have come from LSM. 192error. Note that several errors may have come from LSM.
196 193
197The error priority is: 194The error priority is::
198 195
199 EKEYREVOKED > EKEYEXPIRED > ENOKEY 196 EKEYREVOKED > EKEYEXPIRED > ENOKEY
200 197
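Reviewer's note on the search algorithm in the request-key diff above: the interaction between the recursive keyring walk and the error priority can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the `Key` and `Keyring` classes, the `searchable` flag standing in for `key_permission(SEARCH)`, and the choice to treat a denied keyring as simply yielding no match are all simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Error priority from the document: EKEYREVOKED > EKEYEXPIRED > ENOKEY.
PRIORITY = {"ENOKEY": 0, "EKEYEXPIRED": 1, "EKEYREVOKED": 2}


@dataclass
class Key:
    description: str
    error: Optional[str] = None  # None means the key may be found and used


@dataclass
class Keyring:
    name: str
    keys: List[Key] = field(default_factory=list)
    subrings: List["Keyring"] = field(default_factory=list)
    searchable: bool = True  # stand-in for key_permission(SEARCH)


def worse(a: str, b: str) -> str:
    """Return whichever error has the higher priority."""
    return a if PRIORITY[a] >= PRIORITY[b] else b


def search_keyring(ring: Keyring, description: str) -> Tuple[Optional[Key], Optional[str]]:
    """Model of steps (1) to (3) for a single keyring."""
    if not ring.searchable:          # step 1: permission denied, stop here
        return None, "ENOKEY"        # simplification: reported as "no key"
    err = "ENOKEY"
    for key in ring.keys:            # step 2: non-keyring keys first
        if key.description != description:
            continue
        if key.error is None:
            return key, None         # success discards pending errors
        err = worse(err, key.error)  # retain the higher-priority error
    for sub in ring.subrings:        # step 3: recurse into nested keyrings
        found, sub_err = search_keyring(sub, description)
        if found is not None:
            return found, None
        err = worse(err, sub_err)
    return None, err


def search_process_keyrings(rings: List[Optional[Keyring]], description: str):
    """Search thread, process and session keyrings in order; None = not extant."""
    err = "ENOKEY"
    for ring in rings:
        if ring is None:
            continue
        found, ring_err = search_keyring(ring, description)
        if found is not None:
            return found, None
        err = worse(err, ring_err)
    return None, err
```

In this model a revoked key in the thread keyring does not prevent a valid key nested inside the session keyring from being returned, and when every candidate fails, the caller sees the highest-priority error rather than the most recent one.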
diff --git a/Documentation/security/keys-trusted-encrypted.txt b/Documentation/security/keys/trusted-encrypted.rst
index b20a993a32af..7b503831bdea 100644
--- a/Documentation/security/keys-trusted-encrypted.txt
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -1,4 +1,6 @@
1 Trusted and Encrypted Keys 1==========================
2Trusted and Encrypted Keys
3==========================
2 4
3Trusted and Encrypted Keys are two new key types added to the existing kernel 5Trusted and Encrypted Keys are two new key types added to the existing kernel
4key ring service. Both of these new types are variable length symmetric keys, 6key ring service. Both of these new types are variable length symmetric keys,
@@ -20,7 +22,8 @@ By default, trusted keys are sealed under the SRK, which has the default
20authorization value (20 zeros). This can be set at takeownership time with the 22authorization value (20 zeros). This can be set at takeownership time with the
21trouser's utility: "tpm_takeownership -u -z". 23trouser's utility: "tpm_takeownership -u -z".
22 24
23Usage: 25Usage::
26
24 keyctl add trusted name "new keylen [options]" ring 27 keyctl add trusted name "new keylen [options]" ring
25 keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring 28 keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring
26 keyctl update key "update [options]" 29 keyctl update key "update [options]"
@@ -64,19 +67,22 @@ The decrypted portion of encrypted keys can contain either a simple symmetric
64key or a more complex structure. The format of the more complex structure is 67key or a more complex structure. The format of the more complex structure is
65application specific, which is identified by 'format'. 68application specific, which is identified by 'format'.
66 69
67Usage: 70Usage::
71
68 keyctl add encrypted name "new [format] key-type:master-key-name keylen" 72 keyctl add encrypted name "new [format] key-type:master-key-name keylen"
69 ring 73 ring
70 keyctl add encrypted name "load hex_blob" ring 74 keyctl add encrypted name "load hex_blob" ring
71 keyctl update keyid "update key-type:master-key-name" 75 keyctl update keyid "update key-type:master-key-name"
72 76
73format:= 'default | ecryptfs' 77Where::
74key-type:= 'trusted' | 'user' 78
79 format:= 'default | ecryptfs'
80 key-type:= 'trusted' | 'user'
75 81
76 82
77Examples of trusted and encrypted key usage: 83Examples of trusted and encrypted key usage:
78 84
79Create and save a trusted key named "kmk" of length 32 bytes: 85Create and save a trusted key named "kmk" of length 32 bytes::
80 86
81 $ keyctl add trusted kmk "new 32" @u 87 $ keyctl add trusted kmk "new 32" @u
82 440502848 88 440502848
@@ -99,7 +105,7 @@ Create and save a trusted key named "kmk" of length 32 bytes:
99 105
100 $ keyctl pipe 440502848 > kmk.blob 106 $ keyctl pipe 440502848 > kmk.blob
101 107
102Load a trusted key from the saved blob: 108Load a trusted key from the saved blob::
103 109
104 $ keyctl add trusted kmk "load `cat kmk.blob`" @u 110 $ keyctl add trusted kmk "load `cat kmk.blob`" @u
105 268728824 111 268728824
@@ -114,7 +120,7 @@ Load a trusted key from the saved blob:
114 f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b 120 f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b
115 e4a8aea2b607ec96931e6f4d4fe563ba 121 e4a8aea2b607ec96931e6f4d4fe563ba
116 122
117Reseal a trusted key under new pcr values: 123Reseal a trusted key under new pcr values::
118 124
119 $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`" 125 $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`"
120 $ keyctl print 268728824 126 $ keyctl print 268728824
@@ -135,11 +141,13 @@ compromised by a user level problem, and when sealed to specific boot PCR
135values, protects against boot and offline attacks. Create and save an 141values, protects against boot and offline attacks. Create and save an
136encrypted key "evm" using the above trusted key "kmk": 142encrypted key "evm" using the above trusted key "kmk":
137 143
138option 1: omitting 'format' 144option 1: omitting 'format'::
145
139 $ keyctl add encrypted evm "new trusted:kmk 32" @u 146 $ keyctl add encrypted evm "new trusted:kmk 32" @u
140 159771175 147 159771175
141 148
142option 2: explicitly defining 'format' as 'default' 149option 2: explicitly defining 'format' as 'default'::
150
143 $ keyctl add encrypted evm "new default trusted:kmk 32" @u 151 $ keyctl add encrypted evm "new default trusted:kmk 32" @u
144 159771175 152 159771175
145 153
@@ -150,7 +158,7 @@ option 2: explicitly defining 'format' as 'default'
150 158
151 $ keyctl pipe 159771175 > evm.blob 159 $ keyctl pipe 159771175 > evm.blob
152 160
153Load an encrypted key "evm" from saved blob: 161Load an encrypted key "evm" from saved blob::
154 162
155 $ keyctl add encrypted evm "load `cat evm.blob`" @u 163 $ keyctl add encrypted evm "load `cat evm.blob`" @u
156 831684262 164 831684262
@@ -164,4 +172,4 @@ Other uses for trusted and encrypted keys, such as for disk and file encryption
164are anticipated. In particular the new format 'ecryptfs' has been defined in 172are anticipated. In particular the new format 'ecryptfs' has been defined in
165in order to use encrypted keys to mount an eCryptfs filesystem. More details 173in order to use encrypted keys to mount an eCryptfs filesystem. More details
166about the usage can be found in the file 174about the usage can be found in the file
167'Documentation/security/keys-ecryptfs.txt'. 175``Documentation/security/keys-ecryptfs.txt``.
diff --git a/Documentation/security/self-protection.txt b/Documentation/security/self-protection.rst
index 141acfebe6ef..60c8bd8b77bf 100644
--- a/Documentation/security/self-protection.txt
+++ b/Documentation/security/self-protection.rst
@@ -1,4 +1,6 @@
1# Kernel Self-Protection 1======================
2Kernel Self-Protection
3======================
2 4
3Kernel self-protection is the design and implementation of systems and 5Kernel self-protection is the design and implementation of systems and
4structures within the Linux kernel to protect against security flaws in 6structures within the Linux kernel to protect against security flaws in
@@ -26,7 +28,8 @@ mentioning them, since these aspects need to be explored, dealt with,
26and/or accepted. 28and/or accepted.
27 29
28 30
29## Attack Surface Reduction 31Attack Surface Reduction
32========================
30 33
31The most fundamental defense against security exploits is to reduce the 34The most fundamental defense against security exploits is to reduce the
32areas of the kernel that can be used to redirect execution. This ranges 35areas of the kernel that can be used to redirect execution. This ranges
@@ -34,13 +37,15 @@ from limiting the exposed APIs available to userspace, making in-kernel
34APIs hard to use incorrectly, minimizing the areas of writable kernel 37APIs hard to use incorrectly, minimizing the areas of writable kernel
35memory, etc. 38memory, etc.
36 39
37### Strict kernel memory permissions 40Strict kernel memory permissions
41--------------------------------
38 42
39When all of kernel memory is writable, it becomes trivial for attacks 43When all of kernel memory is writable, it becomes trivial for attacks
40to redirect execution flow. To reduce the availability of these targets 44to redirect execution flow. To reduce the availability of these targets
41the kernel needs to protect its memory with a tight set of permissions. 45the kernel needs to protect its memory with a tight set of permissions.
42 46
43#### Executable code and read-only data must not be writable 47Executable code and read-only data must not be writable
48~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
44 49
45Any areas of the kernel with executable memory must not be writable. 50Any areas of the kernel with executable memory must not be writable.
46While this obviously includes the kernel text itself, we must consider 51While this obviously includes the kernel text itself, we must consider
@@ -51,18 +56,19 @@ kernel, they are implemented in a way where the memory is temporarily
51made writable during the update, and then returned to the original 56made writable during the update, and then returned to the original
52permissions.) 57permissions.)
53 58
54In support of this are CONFIG_STRICT_KERNEL_RWX and 59In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and
55CONFIG_STRICT_MODULE_RWX, which seek to make sure that code is not 60``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not
56writable, data is not executable, and read-only data is neither writable 61writable, data is not executable, and read-only data is neither writable
57nor executable. 62nor executable.
58 63
59Most architectures have these options on by default and not user selectable. 64Most architectures have these options on by default and not user selectable.
60For some architectures like arm that wish to have these be selectable, 65For some architectures like arm that wish to have these be selectable,
61the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable 66the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable
62a Kconfig prompt. CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT determines 67a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT`` determines
63the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled. 68the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled.
64 69
65#### Function pointers and sensitive variables must not be writable 70Function pointers and sensitive variables must not be writable
71~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
66 72
67Vast areas of kernel memory contain function pointers that are looked 73Vast areas of kernel memory contain function pointers that are looked
68up by the kernel and used to continue execution (e.g. descriptor/vector 74up by the kernel and used to continue execution (e.g. descriptor/vector
@@ -74,8 +80,8 @@ so that they live in the .rodata section instead of the .data section
74of the kernel, gaining the protection of the kernel's strict memory 80of the kernel, gaining the protection of the kernel's strict memory
75permissions as described above. 81permissions as described above.
76 82
77For variables that are initialized once at __init time, these can 83For variables that are initialized once at ``__init`` time, these can
78be marked with the (new and under development) __ro_after_init 84be marked with the (new and under development) ``__ro_after_init``
79attribute. 85attribute.
80 86
81What remains are variables that are updated rarely (e.g. GDT). These 87What remains are variables that are updated rarely (e.g. GDT). These
@@ -85,7 +91,8 @@ of their lifetime read-only. (For example, when being updated, only the
85CPU thread performing the update would be given uninterruptible write 91CPU thread performing the update would be given uninterruptible write
86access to the memory.) 92access to the memory.)
87 93
88#### Segregation of kernel memory from userspace memory 94Segregation of kernel memory from userspace memory
95~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
89 96
90The kernel must never execute userspace memory. The kernel must also never 97The kernel must never execute userspace memory. The kernel must also never
91access userspace memory without explicit expectation to do so. These 98access userspace memory without explicit expectation to do so. These
@@ -95,10 +102,11 @@ By blocking userspace memory in this way, execution and data parsing
95cannot be passed to trivially-controlled userspace memory, forcing 102cannot be passed to trivially-controlled userspace memory, forcing
96attacks to operate entirely in kernel memory. 103attacks to operate entirely in kernel memory.
97 104
98### Reduced access to syscalls 105Reduced access to syscalls
106--------------------------
99 107
100One trivial way to eliminate many syscalls for 64-bit systems is building 108One trivial way to eliminate many syscalls for 64-bit systems is building
101without CONFIG_COMPAT. However, this is rarely a feasible scenario. 109without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario.
102 110
103The "seccomp" system provides an opt-in feature made available to 111The "seccomp" system provides an opt-in feature made available to
104userspace, which provides a way to reduce the number of kernel entry 112userspace, which provides a way to reduce the number of kernel entry
@@ -112,7 +120,8 @@ to trusted processes. This would keep the scope of kernel entry points
112restricted to the more regular set of normally available to unprivileged 120restricted to the more regular set of normally available to unprivileged
113userspace. 121userspace.
114 122
115### Restricting access to kernel modules 123Restricting access to kernel modules
124------------------------------------
116 125
117The kernel should never allow an unprivileged user the ability to 126The kernel should never allow an unprivileged user the ability to
118load specific kernel modules, since that would provide a facility to 127load specific kernel modules, since that would provide a facility to
@@ -127,11 +136,12 @@ for debate in some scenarios.)
127To protect against even privileged users, systems may need to either 136To protect against even privileged users, systems may need to either
128disable module loading entirely (e.g. monolithic kernel builds or 137disable module loading entirely (e.g. monolithic kernel builds or
129modules_disabled sysctl), or provide signed modules (e.g. 138modules_disabled sysctl), or provide signed modules (e.g.
130CONFIG_MODULE_SIG_FORCE, or dm-crypt with LoadPin), to keep from having 139``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having
131root load arbitrary kernel code via the module loader interface. 140root load arbitrary kernel code via the module loader interface.
132 141
133 142
134## Memory integrity 143Memory integrity
144================
135 145
136There are many memory structures in the kernel that are regularly abused 146There are many memory structures in the kernel that are regularly abused
137to gain execution control during an attack. By far the most commonly 147to gain execution control during an attack. By far the most commonly
@@ -139,16 +149,18 @@ understood is that of the stack buffer overflow in which the return
139address stored on the stack is overwritten. Many other examples of this 149address stored on the stack is overwritten. Many other examples of this
140kind of attack exist, and protections exist to defend against them. 150kind of attack exist, and protections exist to defend against them.
141 151
142### Stack buffer overflow 152Stack buffer overflow
153---------------------
143 154
144The classic stack buffer overflow involves writing past the expected end 155The classic stack buffer overflow involves writing past the expected end
145of a variable stored on the stack, ultimately writing a controlled value 156of a variable stored on the stack, ultimately writing a controlled value
146to the stack frame's stored return address. The most widely used defense 157to the stack frame's stored return address. The most widely used defense
147is the presence of a stack canary between the stack variables and the 158is the presence of a stack canary between the stack variables and the
148return address (CONFIG_CC_STACKPROTECTOR), which is verified just before 159return address (``CONFIG_CC_STACKPROTECTOR``), which is verified just before
149the function returns. Other defenses include things like shadow stacks. 160the function returns. Other defenses include things like shadow stacks.
150 161
151### Stack depth overflow 162Stack depth overflow
163--------------------
152 164
153A less well understood attack is using a bug that triggers the 165A less well understood attack is using a bug that triggers the
154kernel to consume stack memory with deep function calls or large stack 166kernel to consume stack memory with deep function calls or large stack
@@ -158,27 +170,31 @@ important changes need to be made for better protections: moving the
158sensitive thread_info structure elsewhere, and adding a faulting memory 170sensitive thread_info structure elsewhere, and adding a faulting memory
159hole at the bottom of the stack to catch these overflows. 171hole at the bottom of the stack to catch these overflows.
160 172
161### Heap memory integrity 173Heap memory integrity
174---------------------
162 175
163The structures used to track heap free lists can be sanity-checked during 176The structures used to track heap free lists can be sanity-checked during
164allocation and freeing to make sure they aren't being used to manipulate 177allocation and freeing to make sure they aren't being used to manipulate
165other memory areas. 178other memory areas.
166 179
167### Counter integrity 180Counter integrity
181-----------------
168 182
169Many places in the kernel use atomic counters to track object references 183Many places in the kernel use atomic counters to track object references
170or perform similar lifetime management. When these counters can be made 184or perform similar lifetime management. When these counters can be made
171to wrap (over or under) this traditionally exposes a use-after-free 185to wrap (over or under) this traditionally exposes a use-after-free
172flaw. By trapping atomic wrapping, this class of bug vanishes. 186flaw. By trapping atomic wrapping, this class of bug vanishes.
173 187
174### Size calculation overflow detection 188Size calculation overflow detection
189-----------------------------------
175 190
176Similar to counter overflow, integer overflows (usually size calculations) 191Similar to counter overflow, integer overflows (usually size calculations)
177need to be detected at runtime to kill this class of bug, which 192need to be detected at runtime to kill this class of bug, which
178traditionally leads to being able to write past the end of kernel buffers. 193traditionally leads to being able to write past the end of kernel buffers.
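
As a rough userspace illustration of the runtime detection described above, the sketch below rejects a size calculation that would wrap. The helper names `checked_array_size()` and `alloc_array()` are hypothetical and are not kernel APIs; `__builtin_mul_overflow()` is a GCC/Clang builtin, assumed to be available.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: compute count * elem_size and report overflow
 * instead of silently wrapping. __builtin_mul_overflow() returns true
 * when the mathematical product does not fit in *out.
 */
static bool checked_array_size(size_t count, size_t elem_size, size_t *out)
{
	return !__builtin_mul_overflow(count, elem_size, out);
}

/* Hypothetical allocator wrapper that refuses wrapped sizes. */
static void *alloc_array(size_t count, size_t elem_size)
{
	size_t bytes;

	if (!checked_array_size(count, elem_size, &bytes))
		return NULL;	/* would have wrapped: fail rather than under-allocate */
	return malloc(bytes);
}
```

Failing the allocation keeps a wrapped size from producing a buffer that is smaller than the caller believes it is.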
179 194
180 195
181## Statistical defenses 196Probabilistic defenses
197======================
182 198
183While many protections can be considered deterministic (e.g. read-only 199While many protections can be considered deterministic (e.g. read-only
184memory cannot be written to), some protections provide only statistical 200memory cannot be written to), some protections provide only statistical
@@ -186,7 +202,8 @@ defense, in that an attack must gather enough information about a
186running system to overcome the defense. While not perfect, these do 202running system to overcome the defense. While not perfect, these do
187provide meaningful defenses. 203provide meaningful defenses.
188 204
189### Canaries, blinding, and other secrets 205Canaries, blinding, and other secrets
206-------------------------------------
190 207
191It should be noted that things like the stack canary discussed earlier 208It should be noted that things like the stack canary discussed earlier
192are technically statistical defenses, since they rely on a secret value, 209are technically statistical defenses, since they rely on a secret value,
@@ -201,7 +218,8 @@ It is critical that the secret values used must be separate (e.g.
201different canary per stack) and high entropy (e.g. is the RNG actually 218different canary per stack) and high entropy (e.g. is the RNG actually
202working?) in order to maximize their success. 219working?) in order to maximize their success.
203 220
204### Kernel Address Space Layout Randomization (KASLR) 221Kernel Address Space Layout Randomization (KASLR)
222-------------------------------------------------
205 223
206Since the location of kernel memory is almost always instrumental in 224Since the location of kernel memory is almost always instrumental in
207mounting a successful attack, making the location non-deterministic 225mounting a successful attack, making the location non-deterministic
@@ -209,22 +227,25 @@ raises the difficulty of an exploit. (Note that this in turn makes
209the value of information exposures higher, since they may be used to 227the value of information exposures higher, since they may be used to
210discover desired memory locations.) 228discover desired memory locations.)
211 229
212#### Text and module base 230Text and module base
231~~~~~~~~~~~~~~~~~~~~
213 232
214By relocating the physical and virtual base address of the kernel at 233By relocating the physical and virtual base address of the kernel at
215boot-time (CONFIG_RANDOMIZE_BASE), attacks needing kernel code will be 234boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be
216frustrated. Additionally, offsetting the module loading base address 235frustrated. Additionally, offsetting the module loading base address
217means that even systems that load the same set of modules in the same 236means that even systems that load the same set of modules in the same
218order every boot will not share a common base address with the rest of 237order every boot will not share a common base address with the rest of
219the kernel text. 238the kernel text.
220 239
221#### Stack base 240Stack base
241~~~~~~~~~~
222 242
223If the base address of the kernel stack is not the same between processes, 243If the base address of the kernel stack is not the same between processes,
224or even not the same between syscalls, targets on or beyond the stack 244or even not the same between syscalls, targets on or beyond the stack
225become more difficult to locate. 245become more difficult to locate.
226 246
227#### Dynamic memory base 247Dynamic memory base
248~~~~~~~~~~~~~~~~~~~
228 249
229Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up 250Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up
230being relatively deterministic in layout due to the order of early-boot 251being relatively deterministic in layout due to the order of early-boot
@@ -232,7 +253,8 @@ initializations. If the base address of these areas is not the same
232between boots, targeting them is frustrated, requiring an information 253between boots, targeting them is frustrated, requiring an information
233exposure specific to the region. 254exposure specific to the region.
234 255
235#### Structure layout 256Structure layout
257~~~~~~~~~~~~~~~~
236 258
237By performing a per-build randomization of the layout of sensitive 259By performing a per-build randomization of the layout of sensitive
238structures, attacks must either be tuned to known kernel builds or expose 260structures, attacks must either be tuned to known kernel builds or expose
@@ -240,26 +262,30 @@ enough kernel memory to determine structure layouts before manipulating
240them. 262them.
241 263
242 264
243## Preventing Information Exposures 265Preventing Information Exposures
266================================
244 267
245Since the locations of sensitive structures are the primary target for 268Since the locations of sensitive structures are the primary target for
246attacks, it is important to defend against exposure of both kernel memory 269attacks, it is important to defend against exposure of both kernel memory
247addresses and kernel memory contents (since they may contain kernel 270addresses and kernel memory contents (since they may contain kernel
248addresses or other sensitive things like canary values). 271addresses or other sensitive things like canary values).
249 272
250### Unique identifiers 273Unique identifiers
274------------------
251 275
252Kernel memory addresses must never be used as identifiers exposed to 276Kernel memory addresses must never be used as identifiers exposed to
253userspace. Instead, use an atomic counter, an idr, or similar unique 277userspace. Instead, use an atomic counter, an idr, or similar unique
254identifier. 278identifier.
255 279
256### Memory initialization 280Memory initialization
281---------------------
257 282
258Memory copied to userspace must always be fully initialized. If not 283Memory copied to userspace must always be fully initialized. If not
259explicitly memset(), this will require changes to the compiler to make 284explicitly memset(), this will require changes to the compiler to make
260sure structure holes are cleared. 285sure structure holes are cleared.
261 286
262### Memory poisoning 287Memory poisoning
288----------------
263 289
264When releasing memory, it is best to poison the contents (clear stack on 290When releasing memory, it is best to poison the contents (clear stack on
265syscall return, wipe heap memory on a free), to avoid reuse attacks that 291syscall return, wipe heap memory on a free), to avoid reuse attacks that
@@ -267,9 +293,10 @@ rely on the old contents of memory. This frustrates many uninitialized
267variable attacks, stack content exposures, heap content exposures, and 293variable attacks, stack content exposures, heap content exposures, and
268use-after-free attacks. 294use-after-free attacks.
269 295
270### Destination tracking 296Destination tracking
297--------------------
271 298
272To help kill classes of bugs that result in kernel addresses being 299To help kill classes of bugs that result in kernel addresses being
273written to userspace, the destination of writes needs to be tracked. If 300written to userspace, the destination of writes needs to be tracked. If
274the buffer is destined for userspace (e.g. seq_file backed /proc files), 301the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files),
275it should automatically censor sensitive values. 302it should automatically censor sensitive values.
diff --git a/Documentation/sh/conf.py b/Documentation/sh/conf.py
new file mode 100644
index 000000000000..1eb684a13ac8
--- /dev/null
+++ b/Documentation/sh/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = "SuperH architecture implementation manual"
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'sh.tex', project,
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/sh/index.rst b/Documentation/sh/index.rst
new file mode 100644
index 000000000000..bc8db7ba894a
--- /dev/null
+++ b/Documentation/sh/index.rst
@@ -0,0 +1,59 @@
1=======================
2SuperH Interfaces Guide
3=======================
4
5:Author: Paul Mundt
6
7Memory Management
8=================
9
10SH-4
11----
12
13Store Queue API
14~~~~~~~~~~~~~~~
15
16.. kernel-doc:: arch/sh/kernel/cpu/sh4/sq.c
17 :export:
18
19SH-5
20----
21
22TLB Interfaces
23~~~~~~~~~~~~~~
24
25.. kernel-doc:: arch/sh/mm/tlb-sh5.c
26 :internal:
27
28.. kernel-doc:: arch/sh/include/asm/tlb_64.h
29 :internal:
30
31Machine Specific Interfaces
32===========================
33
34mach-dreamcast
35--------------
36
37.. kernel-doc:: arch/sh/boards/mach-dreamcast/rtc.c
38 :internal:
39
40mach-x3proto
41------------
42
43.. kernel-doc:: arch/sh/boards/mach-x3proto/ilsel.c
44 :export:
45
46Busses
47======
48
49SuperHyway
50----------
51
52.. kernel-doc:: drivers/sh/superhyway/superhyway.c
53 :export:
54
55Maple
56-----
57
58.. kernel-doc:: drivers/sh/maple/maple.c
59 :export:
diff --git a/Documentation/sound/conf.py b/Documentation/sound/conf.py
new file mode 100644
index 000000000000..3f1fc5e74e7b
--- /dev/null
+++ b/Documentation/sound/conf.py
@@ -0,0 +1,10 @@
1# -*- coding: utf-8; mode: python -*-
2
3project = "Linux Sound Subsystem Documentation"
4
5tags.add("subproject")
6
7latex_documents = [
8 ('index', 'sound.tex', project,
9 'The kernel development community', 'manual'),
10]
diff --git a/Documentation/sphinx/convert_template.sed b/Documentation/sphinx/convert_template.sed
deleted file mode 100644
index c1503fcca4ec..000000000000
--- a/Documentation/sphinx/convert_template.sed
+++ /dev/null
@@ -1,18 +0,0 @@
1#
2# Pandoc doesn't grok <function> or <structname>, so convert them
3# ahead of time.
4#
5# Use the following escapes to pass through pandoc:
6# $bq = "`"
7# $lt = "<"
8# $gt = ">"
9#
10s%<function>\([^<(]\+\)()</function>%:c:func:$bq\1()$bq%g
11s%<function>\([^<(]\+\)</function>%:c:func:$bq\1()$bq%g
12s%<structname>struct *\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
13s%struct <structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
14s%<structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g
15#
16# Wrap docproc directives in para and code blocks.
17#
18s%^\(!.*\)$%<para><code>DOCPROC: \1</code></para>%
diff --git a/Documentation/sphinx/post_convert.sed b/Documentation/sphinx/post_convert.sed
deleted file mode 100644
index 392770bac53b..000000000000
--- a/Documentation/sphinx/post_convert.sed
+++ /dev/null
@@ -1,23 +0,0 @@
1#
2# Unescape.
3#
4s/$bq/`/g
5s/$lt/</g
6s/$gt/>/g
7#
8# pandoc thinks that both "_" needs to be escaped. Remove the extra
9# backslashes.
10#
11s/\\_/_/g
12#
13# Unwrap docproc directives.
14#
15s/^``DOCPROC: !E\(.*\)``$/.. kernel-doc:: \1\n :export:/
16s/^``DOCPROC: !I\(.*\)``$/.. kernel-doc:: \1\n :internal:/
17s/^``DOCPROC: !F\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :functions: \2/
18s/^``DOCPROC: !P\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :doc: \2/
19s/^``DOCPROC: \(!.*\)``$/.. WARNING: DOCPROC directive not supported: \1/
20#
21# Trim trailing whitespace.
22#
23s/[[:space:]]*$//
diff --git a/Documentation/sphinx/tmplcvt b/Documentation/sphinx/tmplcvt
deleted file mode 100755
index 6848f0a26fa5..000000000000
--- a/Documentation/sphinx/tmplcvt
+++ /dev/null
@@ -1,28 +0,0 @@
1#!/bin/bash
2#
3# Convert a template file into something like RST
4#
5# fix <function>
6# feed to pandoc
7# fix \_
8# title line?
9#
10set -eu
11
12if [ "$#" != "2" ]; then
13 echo "$0 <docbook file> <rst file>"
14 exit
15fi
16
17DIR=$(dirname $0)
18
19in=$1
20rst=$2
21tmp=$rst.tmp
22
23cp $in $tmp
24sed --in-place -f $DIR/convert_template.sed $tmp
25pandoc -s -S -f docbook -t rst -o $rst $tmp
26sed --in-place -f $DIR/post_convert.sed $rst
27rm $tmp
28echo "book writen to $rst"
diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst
index 4511eed0fabb..8d7ed0cbbf5f 100644
--- a/Documentation/translations/ja_JP/howto.rst
+++ b/Documentation/translations/ja_JP/howto.rst
@@ -197,13 +197,6 @@ ReSTマークアップを使ったドキュメントは Documentation/outputに
197 make latexdocs 197 make latexdocs
198 make epubdocs 198 make epubdocs
199 199
200現在、幾つかの DocBook形式で書かれたドキュメントは ReST形式に転換中で
201す。それらのドキュメントはDocumentation/DocBook ディレクトリに生成され、
202Postscript または man ページの形式を生成するには以下のようにします - ::
203
204 make psdocs
205 make mandocs
206
207カーネル開発者になるには 200カーネル開発者になるには
208------------------------ 201------------------------
209 202
diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst
index 2333697251dd..624654bdcd8a 100644
--- a/Documentation/translations/ko_KR/howto.rst
+++ b/Documentation/translations/ko_KR/howto.rst
@@ -191,13 +191,6 @@ ReST 마크업을 사용하는 문서들은 Documentation/output 에 생성된
191 make latexdocs 191 make latexdocs
192 make epubdocs 192 make epubdocs
193 193
194현재, ReST 로의 변환이 진행중인, DocBook 으로 쓰인 문서들이 존재한다. 그런
195문서들은 Documentation/DocBook/ 디렉토리 안에 생성될 것이고 다음 커맨드를 통해
196Postscript 나 man page 로도 만들어질 수 있다::
197
198 make psdocs
199 make mandocs
200
201커널 개발자가 되는 것 194커널 개발자가 되는 것
202--------------------- 195---------------------
203 196
@@ -270,15 +263,17 @@ pub/linux/kernel/v4.x/ 디렉토리에서 참조될 수 있다.개발 프로세
270 선호되는 방법은 git(커널의 소스 관리 툴, 더 많은 정보들은 263 선호되는 방법은 git(커널의 소스 관리 툴, 더 많은 정보들은
271 https://git-scm.com/ 에서 참조할 수 있다)를 사용하는 것이지만 순수한 264 https://git-scm.com/ 에서 참조할 수 있다)를 사용하는 것이지만 순수한
272 패치파일의 형식으로 보내는 것도 무관하다. 265 패치파일의 형식으로 보내는 것도 무관하다.
273 - 2주 후에 -rc1 커널이 배포되며 지금부터는 전체 커널의 안정성에 영향을 266 - 2주 후에 -rc1 커널이 릴리즈되며 여기서부터의 주안점은 새로운 커널을
274 미칠수 있는 새로운 기능들을 포함하지 않는 패치들만이 추가될 수 있다. 267 가능한한 안정되게 하는 것이다. 이 시점에서의 대부분의 패치들은
275 완전히 새로운 드라이버(혹은 파일시스템)는 -rc1 이후에만 받아들여진다는
276 것을 기억해라. 왜냐하면 변경이 자체내에서만 발생하고 추가된 코드가
277 드라이버 외부의 다른 부분에는 영향을 주지 않으므로 그런 변경은
278 회귀(역자주: 이전에는 존재하지 않았지만 새로운 기능추가나 변경으로 인해 268 회귀(역자주: 이전에는 존재하지 않았지만 새로운 기능추가나 변경으로 인해
279 생겨난 버그)를 일으킬 만한 위험을 가지고 있지 않기 때문이다. -rc1이 269 생겨난 버그)를 고쳐야 한다. 이전부터 존재한 버그는 회귀가 아니므로, 그런
280 배포된 이후에 git를 사용하여 패치들을 Linus에게 보낼수 있지만 패치들은 270 버그에 대한 수정사항은 중요한 경우에만 보내져야 한다. 완전히 새로운
281 공식적인 메일링 리스트로 보내서 검토를 받을 필요가 있다. 271 드라이버(혹은 파일시스템)는 -rc1 이후에만 받아들여진다는 것을 기억해라.
272 왜냐하면 변경이 자체내에서만 발생하고 추가된 코드가 드라이버 외부의 다른
273 부분에는 영향을 주지 않으므로 그런 변경은 회귀를 일으킬 만한 위험을 가지고
274 있지 않기 때문이다. -rc1이 배포된 이후에 git를 사용하여 패치들을 Linus에게
275 보낼수 있지만 패치들은 공식적인 메일링 리스트로 보내서 검토를 받을 필요가
276 있다.
282 - 새로운 -rc는 Linus가 현재 git tree가 테스트 하기에 충분히 안정된 상태에 277 - 새로운 -rc는 Linus가 현재 git tree가 테스트 하기에 충분히 안정된 상태에
283 있다고 판단될 때마다 배포된다. 목표는 새로운 -rc 커널을 매주 배포하는 278 있다고 판단될 때마다 배포된다. 목표는 새로운 -rc 커널을 매주 배포하는
284 것이다. 279 것이다.
@@ -359,7 +354,7 @@ http://patchwork.ozlabs.org/ 에 나열되어 있다.
359버그 보고 354버그 보고
360--------- 355---------
361 356
362https://bugzilla.kernel.org는 리눅스 커널 개발자들이 커널의 버그를 추적하는 357https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버그를 추적하는
363곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을 358곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을
364권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라. 359권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라.
365 360
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt
index d05d4c54e8f7..c6f4ead76ce7 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -786,7 +786,7 @@ CPU 는 b 로부터의 로드 오퍼레이션이 a 로부터의 로드 오퍼레
786위의 코드를 아래와 같이 바꿔버릴 수 있습니다: 786위의 코드를 아래와 같이 바꿔버릴 수 있습니다:
787 787
788 q = READ_ONCE(a); 788 q = READ_ONCE(a);
789 WRITE_ONCE(b, 1); 789 WRITE_ONCE(b, 2);
790 do_something_else(); 790 do_something_else();
791 791
792이렇게 되면, CPU 는 변수 'a' 로부터의 로드와 변수 'b' 로의 스토어 사이의 순서를 792이렇게 되면, CPU 는 변수 'a' 로부터의 로드와 변수 'b' 로의 스토어 사이의 순서를
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a9d01b44a659..7b2eb1b7d4ca 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -16,6 +16,8 @@ place where this information is gathered.
16.. toctree:: 16.. toctree::
17 :maxdepth: 2 17 :maxdepth: 2
18 18
19 no_new_privs
20 seccomp_filter
19 unshare 21 unshare
20 22
21.. only:: subproject and html 23.. only:: subproject and html
diff --git a/Documentation/prctl/no_new_privs.txt b/Documentation/userspace-api/no_new_privs.rst
index f7be84fba910..d060ea217ea1 100644
--- a/Documentation/prctl/no_new_privs.txt
+++ b/Documentation/userspace-api/no_new_privs.rst
@@ -1,3 +1,7 @@
1======================
2No New Privileges Flag
3======================
4
1The execve system call can grant a newly-started program privileges that 5The execve system call can grant a newly-started program privileges that
2its parent did not have. The most obvious examples are setuid/setgid 6its parent did not have. The most obvious examples are setuid/setgid
3programs and file capabilities. To prevent the parent program from 7programs and file capabilities. To prevent the parent program from
@@ -5,53 +9,55 @@ gaining these privileges as well, the kernel and user code must be
5careful to prevent the parent from doing anything that could subvert the 9careful to prevent the parent from doing anything that could subvert the
6child. For example: 10child. For example:
7 11
8 - The dynamic loader handles LD_* environment variables differently if 12 - The dynamic loader handles ``LD_*`` environment variables differently if
9 a program is setuid. 13 a program is setuid.
10 14
11 - chroot is disallowed to unprivileged processes, since it would allow 15 - chroot is disallowed to unprivileged processes, since it would allow
12 /etc/passwd to be replaced from the point of view of a process that 16 ``/etc/passwd`` to be replaced from the point of view of a process that
13 inherited chroot. 17 inherited chroot.
14 18
15 - The exec code has special handling for ptrace. 19 - The exec code has special handling for ptrace.
16 20
17These are all ad-hoc fixes. The no_new_privs bit (since Linux 3.5) is a 21These are all ad-hoc fixes. The ``no_new_privs`` bit (since Linux 3.5) is a
18new, generic mechanism to make it safe for a process to modify its 22new, generic mechanism to make it safe for a process to modify its
19execution environment in a manner that persists across execve. Any task 23execution environment in a manner that persists across execve. Any task
20can set no_new_privs. Once the bit is set, it is inherited across fork, 24can set ``no_new_privs``. Once the bit is set, it is inherited across fork,
21clone, and execve and cannot be unset. With no_new_privs set, execve 25clone, and execve and cannot be unset. With ``no_new_privs`` set, ``execve()``
22promises not to grant the privilege to do anything that could not have 26promises not to grant the privilege to do anything that could not have
23been done without the execve call. For example, the setuid and setgid 27been done without the execve call. For example, the setuid and setgid
24bits will no longer change the uid or gid; file capabilities will not 28bits will no longer change the uid or gid; file capabilities will not
25add to the permitted set, and LSMs will not relax constraints after 29add to the permitted set, and LSMs will not relax constraints after
26execve. 30execve.
27 31
28To set no_new_privs, use prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0). 32To set ``no_new_privs``, use::
33
34 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
29 35
30Be careful, though: LSMs might also not tighten constraints on exec 36Be careful, though: LSMs might also not tighten constraints on exec
31in no_new_privs mode. (This means that setting up a general-purpose 37in ``no_new_privs`` mode. (This means that setting up a general-purpose
32service launcher to set no_new_privs before execing daemons may 38service launcher to set ``no_new_privs`` before execing daemons may
33interfere with LSM-based sandboxing.) 39interfere with LSM-based sandboxing.)
34 40
35Note that no_new_privs does not prevent privilege changes that do not 41Note that ``no_new_privs`` does not prevent privilege changes that do not
36involve execve. An appropriately privileged task can still call 42involve ``execve()``. An appropriately privileged task can still call
37setuid(2) and receive SCM_RIGHTS datagrams. 43``setuid(2)`` and receive SCM_RIGHTS datagrams.
38 44
39There are two main use cases for no_new_privs so far: 45There are two main use cases for ``no_new_privs`` so far:
40 46
41 - Filters installed for the seccomp mode 2 sandbox persist across 47 - Filters installed for the seccomp mode 2 sandbox persist across
42 execve and can change the behavior of newly-executed programs. 48 execve and can change the behavior of newly-executed programs.
43 Unprivileged users are therefore only allowed to install such filters 49 Unprivileged users are therefore only allowed to install such filters
44 if no_new_privs is set. 50 if ``no_new_privs`` is set.
45 51
46 - By itself, no_new_privs can be used to reduce the attack surface 52 - By itself, ``no_new_privs`` can be used to reduce the attack surface
47 available to an unprivileged user. If everything running with a 53 available to an unprivileged user. If everything running with a
48 given uid has no_new_privs set, then that uid will be unable to 54 given uid has ``no_new_privs`` set, then that uid will be unable to
49 escalate its privileges by directly attacking setuid, setgid, and 55 escalate its privileges by directly attacking setuid, setgid, and
50 fcap-using binaries; it will need to compromise something without the 56 fcap-using binaries; it will need to compromise something without the
51 no_new_privs bit set first. 57 ``no_new_privs`` bit set first.
52 58
53In the future, other potentially dangerous kernel features could become 59In the future, other potentially dangerous kernel features could become
54available to unprivileged tasks if no_new_privs is set. In principle, 60available to unprivileged tasks if ``no_new_privs`` is set. In principle,
55several options to unshare(2) and clone(2) would be safe when 61several options to ``unshare(2)`` and ``clone(2)`` would be safe when
56no_new_privs is set, and no_new_privs + chroot is considerably less 62``no_new_privs`` is set, and ``no_new_privs`` + ``chroot`` is considerably less
57dangerous than chroot by itself. 63dangerous than chroot by itself.
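
A minimal sketch of setting and then querying the bit from userspace, assuming a Linux 3.5 or later kernel and glibc's `<sys/prctl.h>`. The wrapper names `enable_no_new_privs()` and `no_new_privs_is_set()` are hypothetical; `PR_SET_NO_NEW_PRIVS` and `PR_GET_NO_NEW_PRIVS` are the real prctl operations.

```c
#include <sys/prctl.h>

/*
 * Set no_new_privs for the calling thread. Returns 0 on success and
 * -1 with errno set on failure. The bit cannot be cleared afterwards
 * and is inherited across fork, clone, and execve.
 */
static int enable_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the bit is set, 0 if it is not, and -1 on error. */
static int no_new_privs_is_set(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

A launcher would typically call `enable_no_new_privs()` immediately before `execve()`, keeping in mind the LSM caveat described above.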
diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/userspace-api/seccomp_filter.rst
index 1e469ef75778..f71eb5ef1f2d 100644
--- a/Documentation/prctl/seccomp_filter.txt
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -1,8 +1,9 @@
1 SECure COMPuting with filters 1===========================================
2 ============================= 2Seccomp BPF (SECure COMPuting with filters)
3===========================================
3 4
4Introduction 5Introduction
5------------ 6============
6 7
7A large number of system calls are exposed to every userland process 8A large number of system calls are exposed to every userland process
8with many of them going unused for the entire lifetime of the process. 9with many of them going unused for the entire lifetime of the process.
@@ -27,7 +28,7 @@ pointers which constrains all filters to solely evaluating the system
27call arguments directly. 28call arguments directly.
28 29
29What it isn't 30What it isn't
30------------- 31=============
31 32
32System call filtering isn't a sandbox. It provides a clearly defined 33System call filtering isn't a sandbox. It provides a clearly defined
33mechanism for minimizing the exposed kernel surface. It is meant to be 34mechanism for minimizing the exposed kernel surface. It is meant to be
@@ -40,13 +41,13 @@ system calls in socketcall() is allowed, for instance) which could be
40construed, incorrectly, as a more complete sandboxing solution. 41construed, incorrectly, as a more complete sandboxing solution.
41 42
42Usage 43Usage
43----- 44=====
44 45
45An additional seccomp mode is added and is enabled using the same 46An additional seccomp mode is added and is enabled using the same
46prctl(2) call as the strict seccomp. If the architecture has 47prctl(2) call as the strict seccomp. If the architecture has
47CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below: 48``CONFIG_HAVE_ARCH_SECCOMP_FILTER``, then filters may be added as below:
48 49
49PR_SET_SECCOMP: 50``PR_SET_SECCOMP``:
50 Now takes an additional argument which specifies a new filter 51 Now takes an additional argument which specifies a new filter
51 using a BPF program. 52 using a BPF program.
52 The BPF program will be executed over struct seccomp_data 53 The BPF program will be executed over struct seccomp_data
@@ -55,24 +56,25 @@ PR_SET_SECCOMP:
55 acceptable values to inform the kernel which action should be 56 acceptable values to inform the kernel which action should be
56 taken. 57 taken.
57 58
58 Usage: 59 Usage::
60
59 prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog); 61 prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
60 62
61 The 'prog' argument is a pointer to a struct sock_fprog which 63 The 'prog' argument is a pointer to a struct sock_fprog which
62 will contain the filter program. If the program is invalid, the 64 will contain the filter program. If the program is invalid, the
63 call will return -1 and set errno to EINVAL. 65 call will return -1 and set errno to ``EINVAL``.
64 66
65 If fork/clone and execve are allowed by @prog, any child 67 If ``fork``/``clone`` and ``execve`` are allowed by @prog, any child
66 processes will be constrained to the same filters and system 68 processes will be constrained to the same filters and system
67 call ABI as the parent. 69 call ABI as the parent.
68 70
69 Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or 71 Prior to use, the task must call ``prctl(PR_SET_NO_NEW_PRIVS, 1)`` or
70 run with CAP_SYS_ADMIN privileges in its namespace. If these are not 72 run with ``CAP_SYS_ADMIN`` privileges in its namespace. If these are not
71 true, -EACCES will be returned. This requirement ensures that filter 73 true, ``-EACCES`` will be returned. This requirement ensures that filter
72 programs cannot be applied to child processes with greater privileges 74 programs cannot be applied to child processes with greater privileges
73 than the task that installed them. 75 than the task that installed them.
74 76
75 Additionally, if prctl(2) is allowed by the attached filter, 77 Additionally, if ``prctl(2)`` is allowed by the attached filter,
76 additional filters may be layered on which will increase evaluation 78 additional filters may be layered on which will increase evaluation
77 time, but allow for further decreasing the attack surface during 79 time, but allow for further decreasing the attack surface during
78 execution of a process. 80 execution of a process.
@@ -80,51 +82,52 @@ PR_SET_SECCOMP:
80The above call returns 0 on success and non-zero on error. 82The above call returns 0 on success and non-zero on error.
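
The usage described above can be sketched as a small filter that makes `getppid` fail with `EPERM` and allows every other system call. This is an illustration under stated assumptions, not a recommended policy: the function name `install_demo_filter()` is hypothetical, the choice of `getppid` is arbitrary, and the filter omits the check of `seccomp_data.arch` that a real filter should perform before trusting the syscall number.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical demo: install a seccomp filter for the calling thread.
 * Returns 0 on success and -1 with errno set on failure.
 */
static int install_demo_filter(void)
{
	struct sock_filter insns[] = {
		/* A = seccomp_data.nr (the system call number) */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* if (A == __NR_getppid) run the next insn, else skip it */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
		/* fail the call; the low 16 bits carry the errno value */
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* everything else runs normally */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
		.filter = insns,
	};

	/* Required unless the task has CAP_SYS_ADMIN in its namespace. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After a successful call, `syscall(__NR_getppid)` in the same thread is expected to return -1 with `errno` set to `EPERM`, while unrelated calls such as `getpid()` continue to work.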
81 83
82Return values 84Return values
83------------- 85=============
86
84A seccomp filter may return any of the following values. If multiple 87A seccomp filter may return any of the following values. If multiple
85filters exist, the return value for the evaluation of a given system 88filters exist, the return value for the evaluation of a given system
86call will always use the highest precedent value. (For example, 89call will always use the highest precedent value. (For example,
87SECCOMP_RET_KILL will always take precedence.) 90``SECCOMP_RET_KILL`` will always take precedence.)
88 91
89In precedence order, they are: 92In precedence order, they are:
90 93
91SECCOMP_RET_KILL: 94``SECCOMP_RET_KILL``:
92 Results in the task exiting immediately without executing the 95 Results in the task exiting immediately without executing the
93 system call. The exit status of the task (status & 0x7f) will 96 system call. The exit status of the task (``status & 0x7f``) will
94 be SIGSYS, not SIGKILL. 97 be ``SIGSYS``, not ``SIGKILL``.
95 98
96SECCOMP_RET_TRAP: 99``SECCOMP_RET_TRAP``:
97 Results in the kernel sending a SIGSYS signal to the triggering 100 Results in the kernel sending a ``SIGSYS`` signal to the triggering
98 task without executing the system call. siginfo->si_call_addr 101 task without executing the system call. ``siginfo->si_call_addr``
99 will show the address of the system call instruction, and 102 will show the address of the system call instruction, and
100 siginfo->si_syscall and siginfo->si_arch will indicate which 103 ``siginfo->si_syscall`` and ``siginfo->si_arch`` will indicate which
101 syscall was attempted. The program counter will be as though 104 syscall was attempted. The program counter will be as though
102 the syscall happened (i.e. it will not point to the syscall 105 the syscall happened (i.e. it will not point to the syscall
103 instruction). The return value register will contain an arch- 106 instruction). The return value register will contain an arch-
104 dependent value -- if resuming execution, set it to something 107 dependent value -- if resuming execution, set it to something
105 sensible. (The architecture dependency is because replacing 108 sensible. (The architecture dependency is because replacing
106 it with -ENOSYS could overwrite some useful information.) 109 it with ``-ENOSYS`` could overwrite some useful information.)
107 110
108 The SECCOMP_RET_DATA portion of the return value will be passed 111 The ``SECCOMP_RET_DATA`` portion of the return value will be passed
109 as si_errno. 112 as ``si_errno``.
110 113
111 SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP. 114 ``SIGSYS`` triggered by seccomp will have a si_code of ``SYS_SECCOMP``.
112 115
113SECCOMP_RET_ERRNO: 116``SECCOMP_RET_ERRNO``:
114 Results in the lower 16-bits of the return value being passed 117 Results in the lower 16-bits of the return value being passed
115 to userland as the errno without executing the system call. 118 to userland as the errno without executing the system call.
116 119
117SECCOMP_RET_TRACE: 120``SECCOMP_RET_TRACE``:
118 When returned, this value will cause the kernel to attempt to 121 When returned, this value will cause the kernel to attempt to
119 notify a ptrace()-based tracer prior to executing the system 122 notify a ``ptrace()``-based tracer prior to executing the system
120 call. If there is no tracer present, -ENOSYS is returned to 123 call. If there is no tracer present, ``-ENOSYS`` is returned to
121 userland and the system call is not executed. 124 userland and the system call is not executed.
122 125
123 A tracer will be notified if it requests PTRACE_O_TRACESECCOMP 126 A tracer will be notified if it requests ``PTRACE_O_TRACESECCOMP``
124 using ptrace(PTRACE_SETOPTIONS). The tracer will be notified 127 using ``ptrace(PTRACE_SETOPTIONS)``. The tracer will be notified
125 of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of 128 of a ``PTRACE_EVENT_SECCOMP`` and the ``SECCOMP_RET_DATA`` portion of
126 the BPF program return value will be available to the tracer 129 the BPF program return value will be available to the tracer
127 via PTRACE_GETEVENTMSG. 130 via ``PTRACE_GETEVENTMSG``.
128 131
129 The tracer can skip the system call by changing the syscall number 132 The tracer can skip the system call by changing the syscall number
130 to -1. Alternatively, the tracer can change the system call 133 to -1. Alternatively, the tracer can change the system call
@@ -138,19 +141,19 @@ SECCOMP_RET_TRACE:
138 allow use of ptrace, even of other sandboxed processes, without 141 allow use of ptrace, even of other sandboxed processes, without
139 extreme care; ptracers can use this mechanism to escape.) 142 extreme care; ptracers can use this mechanism to escape.)
140 143
141SECCOMP_RET_ALLOW: 144``SECCOMP_RET_ALLOW``:
142 Results in the system call being executed. 145 Results in the system call being executed.
143 146
144If multiple filters exist, the return value for the evaluation of a 147If multiple filters exist, the return value for the evaluation of a
145given system call will always use the highest-precedence value. 148given system call will always use the highest-precedence value.
146 149
147Precedence is only determined using the SECCOMP_RET_ACTION mask. When 150Precedence is only determined using the ``SECCOMP_RET_ACTION`` mask. When
148multiple filters return values of the same precedence, only the 151multiple filters return values of the same precedence, only the
149SECCOMP_RET_DATA from the most recently installed filter will be 152``SECCOMP_RET_DATA`` from the most recently installed filter will be
150returned. 153returned.
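
The precedence rule above can be sketched in C as follows. This is a model for illustration, not the kernel's code: the ``EX_``-prefixed constants are local copies of the values in ``include/uapi/linux/seccomp.h`` for this kernel generation, ``ex_combine()`` is a hypothetical helper, and the sketch assumes that a numerically lower action has higher precedence (the KILL, TRAP, ERRNO, TRACE, ALLOW ordering).

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the uapi values, prefixed to avoid clashing with headers. */
#define EX_SECCOMP_RET_KILL   0x00000000U
#define EX_SECCOMP_RET_TRAP   0x00030000U
#define EX_SECCOMP_RET_ERRNO  0x00050000U
#define EX_SECCOMP_RET_TRACE  0x7ff00000U
#define EX_SECCOMP_RET_ALLOW  0x7fff0000U
#define EX_SECCOMP_RET_ACTION 0x7fff0000U
#define EX_SECCOMP_RET_DATA   0x0000ffffU

/*
 * Combine the return values of several filters, given newest first.
 * Only the action bits decide precedence; on a tie the earlier entry
 * (the more recently installed filter) is kept, so its data survives.
 */
uint32_t ex_combine(const uint32_t *rets_newest_first, size_t n)
{
	uint32_t ret = EX_SECCOMP_RET_ALLOW;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t cur = rets_newest_first[i];

		if ((cur & EX_SECCOMP_RET_ACTION) < (ret & EX_SECCOMP_RET_ACTION))
			ret = cur;
	}
	return ret;
}
```

With two filters that both return ``SECCOMP_RET_ERRNO``, the combined value carries the errno chosen by the filter installed last, which matches the tie-breaking rule stated in the text.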
151 154
152Pitfalls 155Pitfalls
153-------- 156========
154 157
155The biggest pitfall to avoid during use is filtering on system call 158The biggest pitfall to avoid during use is filtering on system call
156number without checking the architecture value. Why? On any 159number without checking the architecture value. Why? On any
@@ -160,39 +163,40 @@ the numbers in the different calling conventions overlap, then checks in
160the filters may be abused. Always check the arch value! 163the filters may be abused. Always check the arch value!
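
A minimal sketch of a filter that follows this advice, assuming an x86-64 target: it loads ``seccomp_data.arch`` first and kills the task on a mismatch before it ever looks at the syscall number. The array ``ex_filter``, the function ``ex_install_filter()`` and the choice of ``getpid`` as the denied call are illustrative assumptions; the macros and structures come from the kernel's uapi headers.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

struct sock_filter ex_filter[] = {
	/* [0] Load the architecture token of the calling convention. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* [1] Expected arch: skip the kill at [2]; otherwise fall through. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	/* [2] Unexpected arch: syscall numbers cannot be trusted. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* [3] Only now load the syscall number. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [4] Example policy: getpid goes to [5], everything else to [6]. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
	/* [5] Fail the call with EPERM without executing it. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [6] Allow everything else. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling task; returns 0 on success, -1 on error. */
int ex_install_filter(void)
{
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(ex_filter) / sizeof(ex_filter[0])),
		.filter = ex_filter,
	};

	/* Needed so that an unprivileged task may install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A filter meant to support several calling conventions would branch on each expected ``arch`` value and apply a separate syscall table per branch, rather than sharing one set of number checks.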
161 164
162Example 165Example
163------- 166=======
164 167
165The samples/seccomp/ directory contains both an x86-specific example 168The ``samples/seccomp/`` directory contains both an x86-specific example
166and a more generic example of a higher level macro interface for BPF 169and a more generic example of a higher level macro interface for BPF
167program generation. 170program generation.
168 171
169 172
170 173
171Adding architecture support 174Adding architecture support
172----------------------- 175===========================
173 176
174See arch/Kconfig for the authoritative requirements. In general, if an 177See ``arch/Kconfig`` for the authoritative requirements. In general, if an
175architecture supports both ptrace_event and seccomp, it will be able to 178architecture supports both ptrace_event and seccomp, it will be able to
176support seccomp filter with minor fixup: SIGSYS support and seccomp return 179support seccomp filter with minor fixup: ``SIGSYS`` support and seccomp return
177value checking. Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER 180value checking. Then it must just add ``CONFIG_HAVE_ARCH_SECCOMP_FILTER``
178to its arch-specific Kconfig. 181to its arch-specific Kconfig.
179 182
180 183
181 184
182Caveats 185Caveats
183------- 186=======
184 187
185The vDSO can cause some system calls to run entirely in userspace, 188The vDSO can cause some system calls to run entirely in userspace,
186leading to surprises when you run programs on different machines that 189leading to surprises when you run programs on different machines that
187fall back to real syscalls. To minimize these surprises on x86, make 190fall back to real syscalls. To minimize these surprises on x86, make
188sure you test with 191sure you test with
189/sys/devices/system/clocksource/clocksource0/current_clocksource set to 192``/sys/devices/system/clocksource/clocksource0/current_clocksource`` set to
190something like acpi_pm. 193something like ``acpi_pm``.
191 194
192On x86-64, vsyscall emulation is enabled by default. (vsyscalls are 195On x86-64, vsyscall emulation is enabled by default. (vsyscalls are
193legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccomp, with a few oddities: 196legacy variants on vDSO calls.) Currently, emulated vsyscalls will
197honor seccomp, with a few oddities:
194 198
195- A return value of SECCOMP_RET_TRAP will set a si_call_addr pointing to 199- A return value of ``SECCOMP_RET_TRAP`` will set a ``si_call_addr`` pointing to
196 the vsyscall entry for the given call and not the address after the 200 the vsyscall entry for the given call and not the address after the
197 'syscall' instruction. Any code which wants to restart the call 201 'syscall' instruction. Any code which wants to restart the call
198 should be aware that (a) a ret instruction has been emulated and (b) 202 should be aware that (a) a ret instruction has been emulated and (b)
@@ -200,7 +204,7 @@ legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccom
200 emulation security checks, making resuming the syscall mostly 204 emulation security checks, making resuming the syscall mostly
201 pointless. 205 pointless.
202 206
203- A return value of SECCOMP_RET_TRACE will signal the tracer as usual, 207- A return value of ``SECCOMP_RET_TRACE`` will signal the tracer as usual,
204 but the syscall may not be changed to another system call using the 208 but the syscall may not be changed to another system call using the
205 orig_rax register. It may only be changed to -1 in order to skip the 209 orig_rax register. It may only be changed to -1 in order to skip the
206 currently emulated call. Any other change MAY terminate the process. 210 currently emulated call. Any other change MAY terminate the process.
@@ -209,14 +213,14 @@ legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccom
209 rip or rsp. (Do not rely on other changes terminating the process. 213 rip or rsp. (Do not rely on other changes terminating the process.
210 They might work. For example, on some kernels, choosing a syscall 214 They might work. For example, on some kernels, choosing a syscall
211 that only exists in future kernels will be correctly emulated (by 215 that only exists in future kernels will be correctly emulated (by
212 returning -ENOSYS).) 216 returning ``-ENOSYS``).)
213 217
214To detect this quirky behavior, check for addr & ~0x0C00 == 218To detect this quirky behavior, check for ``addr & ~0x0C00 ==
2150xFFFFFFFFFF600000. (For SECCOMP_RET_TRACE, use rip. For 2190xFFFFFFFFFF600000``. (For ``SECCOMP_RET_TRACE``, use rip. For
216SECCOMP_RET_TRAP, use siginfo->si_call_addr.) Do not check any other 220``SECCOMP_RET_TRAP``, use ``siginfo->si_call_addr``.) Do not check any other
217condition: future kernels may improve vsyscall emulation and current 221condition: future kernels may improve vsyscall emulation and current
218kernels in vsyscall=native mode will behave differently, but the 222kernels in vsyscall=native mode will behave differently, but the
219instructions at 0xF...F600{0,4,8,C}00 will not be system calls in these 223instructions at ``0xF...F600{0,4,8,C}00`` will not be system calls in these
220cases. 224cases.
221 225
222Note that modern systems are unlikely to use vsyscalls at all -- they 226Note that modern systems are unlikely to use vsyscalls at all -- they
diff --git a/Documentation/userspace-api/unshare.rst b/Documentation/userspace-api/unshare.rst
index 737c192cf4e7..877e90a35238 100644
--- a/Documentation/userspace-api/unshare.rst
+++ b/Documentation/userspace-api/unshare.rst
@@ -107,7 +107,7 @@ the benefits of this new feature can exceed its cost.
107 107
108unshare() reverses sharing that was done using clone(2) system call, 108unshare() reverses sharing that was done using clone(2) system call,
109so unshare() should have a similar interface as clone(2). That is, 109so unshare() should have a similar interface as clone(2). That is,
110since flags in clone(int flags, void *stack) specifies what should 110since flags in clone(int flags, void \*stack) specifies what should
111be shared, similar flags in unshare(int flags) should specify 111be shared, similar flags in unshare(int flags) should specify
112what should be unshared. Unfortunately, this may appear to invert 112what should be unshared. Unfortunately, this may appear to invert
113the meaning of the flags from the way they are used in clone(2). 113the meaning of the flags from the way they are used in clone(2).
diff --git a/MAINTAINERS b/MAINTAINERS
index ba64d98e7897..867366bb67f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3597,7 +3597,6 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git
3597S: Maintained 3597S: Maintained
3598F: Documentation/crypto/ 3598F: Documentation/crypto/
3599F: Documentation/devicetree/bindings/crypto/ 3599F: Documentation/devicetree/bindings/crypto/
3600F: Documentation/DocBook/crypto-API.tmpl
3601F: arch/*/crypto/ 3600F: arch/*/crypto/
3602F: crypto/ 3601F: crypto/
3603F: drivers/crypto/ 3602F: drivers/crypto/
@@ -4154,8 +4153,7 @@ M: Jonathan Corbet <corbet@lwn.net>
4154L: linux-doc@vger.kernel.org 4153L: linux-doc@vger.kernel.org
4155S: Maintained 4154S: Maintained
4156F: Documentation/ 4155F: Documentation/
4157F: scripts/docproc.c 4156F: scripts/kernel-doc
4158F: scripts/kernel-doc*
4159X: Documentation/ABI/ 4157X: Documentation/ABI/
4160X: Documentation/devicetree/ 4158X: Documentation/devicetree/
4161X: Documentation/acpi 4159X: Documentation/acpi
@@ -7366,7 +7364,7 @@ KEYS/KEYRINGS:
7366M: David Howells <dhowells@redhat.com> 7364M: David Howells <dhowells@redhat.com>
7367L: keyrings@vger.kernel.org 7365L: keyrings@vger.kernel.org
7368S: Maintained 7366S: Maintained
7369F: Documentation/security/keys.txt 7367F: Documentation/security/keys/core.rst
7370F: include/linux/key.h 7368F: include/linux/key.h
7371F: include/linux/key-type.h 7369F: include/linux/key-type.h
7372F: include/linux/keyctl.h 7370F: include/linux/keyctl.h
@@ -7380,7 +7378,7 @@ M: Mimi Zohar <zohar@linux.vnet.ibm.com>
7380L: linux-security-module@vger.kernel.org 7378L: linux-security-module@vger.kernel.org
7381L: keyrings@vger.kernel.org 7379L: keyrings@vger.kernel.org
7382S: Supported 7380S: Supported
7383F: Documentation/security/keys-trusted-encrypted.txt 7381F: Documentation/security/keys/trusted-encrypted.rst
7384F: include/keys/trusted-type.h 7382F: include/keys/trusted-type.h
7385F: security/keys/trusted.c 7383F: security/keys/trusted.c
7386F: security/keys/trusted.h 7384F: security/keys/trusted.h
@@ -7391,7 +7389,7 @@ M: David Safford <safford@us.ibm.com>
7391L: linux-security-module@vger.kernel.org 7389L: linux-security-module@vger.kernel.org
7392L: keyrings@vger.kernel.org 7390L: keyrings@vger.kernel.org
7393S: Supported 7391S: Supported
7394F: Documentation/security/keys-trusted-encrypted.txt 7392F: Documentation/security/keys/trusted-encrypted.rst
7395F: include/keys/encrypted-type.h 7393F: include/keys/encrypted-type.h
7396F: security/keys/encrypted-keys/ 7394F: security/keys/encrypted-keys/
7397 7395
@@ -7401,7 +7399,7 @@ W: http://kgdb.wiki.kernel.org/
7401L: kgdb-bugreport@lists.sourceforge.net 7399L: kgdb-bugreport@lists.sourceforge.net
7402T: git git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git 7400T: git git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git
7403S: Maintained 7401S: Maintained
7404F: Documentation/DocBook/kgdb.tmpl 7402F: Documentation/dev-tools/kgdb.rst
7405F: drivers/misc/kgdbts.c 7403F: drivers/misc/kgdbts.c
7406F: drivers/tty/serial/kgdboc.c 7404F: drivers/tty/serial/kgdboc.c
7407F: include/linux/kdb.h 7405F: include/linux/kdb.h
@@ -11020,7 +11018,7 @@ S: Supported
11020F: arch/s390/ 11018F: arch/s390/
11021F: drivers/s390/ 11019F: drivers/s390/
11022F: Documentation/s390/ 11020F: Documentation/s390/
11023F: Documentation/DocBook/s390* 11021F: Documentation/driver-api/s390-drivers.rst
11024 11022
11025S390 COMMON I/O LAYER 11023S390 COMMON I/O LAYER
11026M: Sebastian Ott <sebott@linux.vnet.ibm.com> 11024M: Sebastian Ott <sebott@linux.vnet.ibm.com>
@@ -11524,6 +11522,7 @@ F: kernel/seccomp.c
11524F: include/uapi/linux/seccomp.h 11522F: include/uapi/linux/seccomp.h
11525F: include/linux/seccomp.h 11523F: include/linux/seccomp.h
11526F: tools/testing/selftests/seccomp/* 11524F: tools/testing/selftests/seccomp/*
11525F: Documentation/userspace-api/seccomp_filter.rst
11527K: \bsecure_computing 11526K: \bsecure_computing
11528K: \bTIF_SECCOMP\b 11527K: \bTIF_SECCOMP\b
11529 11528
@@ -11582,6 +11581,7 @@ S: Supported
11582F: include/linux/selinux* 11581F: include/linux/selinux*
11583F: security/selinux/ 11582F: security/selinux/
11584F: scripts/selinux/ 11583F: scripts/selinux/
11584F: Documentation/admin-guide/LSM/SELinux.rst
11585 11585
11586APPARMOR SECURITY MODULE 11586APPARMOR SECURITY MODULE
11587M: John Johansen <john.johansen@canonical.com> 11587M: John Johansen <john.johansen@canonical.com>
@@ -11590,18 +11590,21 @@ W: apparmor.wiki.kernel.org
11590T: git git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git 11590T: git git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git
11591S: Supported 11591S: Supported
11592F: security/apparmor/ 11592F: security/apparmor/
11593F: Documentation/admin-guide/LSM/apparmor.rst
11593 11594
11594LOADPIN SECURITY MODULE 11595LOADPIN SECURITY MODULE
11595M: Kees Cook <keescook@chromium.org> 11596M: Kees Cook <keescook@chromium.org>
11596T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git lsm/loadpin 11597T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git lsm/loadpin
11597S: Supported 11598S: Supported
11598F: security/loadpin/ 11599F: security/loadpin/
11600F: Documentation/admin-guide/LSM/LoadPin.rst
11599 11601
11600YAMA SECURITY MODULE 11602YAMA SECURITY MODULE
11601M: Kees Cook <keescook@chromium.org> 11603M: Kees Cook <keescook@chromium.org>
11602T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip 11604T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip
11603S: Supported 11605S: Supported
11604F: security/yama/ 11606F: security/yama/
11607F: Documentation/admin-guide/LSM/Yama.rst
11605 11608
11606SENSABLE PHANTOM 11609SENSABLE PHANTOM
11607M: Jiri Slaby <jirislaby@gmail.com> 11610M: Jiri Slaby <jirislaby@gmail.com>
@@ -11904,7 +11907,7 @@ L: linux-security-module@vger.kernel.org
11904W: http://schaufler-ca.com 11907W: http://schaufler-ca.com
11905T: git git://github.com/cschaufler/smack-next 11908T: git git://github.com/cschaufler/smack-next
11906S: Maintained 11909S: Maintained
11907F: Documentation/security/Smack.txt 11910F: Documentation/admin-guide/LSM/Smack.rst
11908F: security/smack/ 11911F: security/smack/
11909 11912
11910DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS) 11913DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS)
diff --git a/Makefile b/Makefile
index 283c6236438e..d7cb0372bed9 100644
--- a/Makefile
+++ b/Makefile
@@ -1312,7 +1312,7 @@ clean: archclean vmlinuxclean
1312# 1312#
1313mrproper: rm-dirs := $(wildcard $(MRPROPER_DIRS)) 1313mrproper: rm-dirs := $(wildcard $(MRPROPER_DIRS))
1314mrproper: rm-files := $(wildcard $(MRPROPER_FILES)) 1314mrproper: rm-files := $(wildcard $(MRPROPER_FILES))
1315mrproper-dirs := $(addprefix _mrproper_,Documentation/DocBook scripts) 1315mrproper-dirs := $(addprefix _mrproper_,scripts)
1316 1316
1317PHONY += $(mrproper-dirs) mrproper archmrproper 1317PHONY += $(mrproper-dirs) mrproper archmrproper
1318$(mrproper-dirs): 1318$(mrproper-dirs):
@@ -1416,9 +1416,7 @@ help:
1416 @$(MAKE) $(build)=$(package-dir) help 1416 @$(MAKE) $(build)=$(package-dir) help
1417 @echo '' 1417 @echo ''
1418 @echo 'Documentation targets:' 1418 @echo 'Documentation targets:'
1419 @$(MAKE) -f $(srctree)/Documentation/Makefile.sphinx dochelp 1419 @$(MAKE) -f $(srctree)/Documentation/Makefile dochelp
1420 @echo ''
1421 @$(MAKE) -f $(srctree)/Documentation/DocBook/Makefile dochelp
1422 @echo '' 1420 @echo ''
1423 @echo 'Architecture specific targets ($(SRCARCH)):' 1421 @echo 'Architecture specific targets ($(SRCARCH)):'
1424 @$(if $(archhelp),$(archhelp),\ 1422 @$(if $(archhelp),$(archhelp),\
@@ -1469,9 +1467,7 @@ $(help-board-dirs): help-%:
1469DOC_TARGETS := xmldocs sgmldocs psdocs latexdocs pdfdocs htmldocs mandocs installmandocs epubdocs cleandocs linkcheckdocs 1467DOC_TARGETS := xmldocs sgmldocs psdocs latexdocs pdfdocs htmldocs mandocs installmandocs epubdocs cleandocs linkcheckdocs
1470PHONY += $(DOC_TARGETS) 1468PHONY += $(DOC_TARGETS)
1471$(DOC_TARGETS): scripts_basic FORCE 1469$(DOC_TARGETS): scripts_basic FORCE
1472 $(Q)$(MAKE) $(build)=scripts build_docproc build_check-lc_ctype 1470 $(Q)$(MAKE) $(build)=Documentation $@
1473 $(Q)$(MAKE) $(build)=Documentation -f $(srctree)/Documentation/Makefile.sphinx $@
1474 $(Q)$(MAKE) $(build)=Documentation/DocBook $@
1475 1471
1476else # KBUILD_EXTMOD 1472else # KBUILD_EXTMOD
1477 1473
diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h
index 5de673ac9cb1..a2540e21f919 100644
--- a/arch/ia64/include/asm/io.h
+++ b/arch/ia64/include/asm/io.h
@@ -117,7 +117,7 @@ extern int valid_mmap_phys_addr_range (unsigned long pfn, size_t count);
117 * following the barrier will arrive after all previous writes. For most 117 * following the barrier will arrive after all previous writes. For most
118 * ia64 platforms, this is a simple 'mf.a' instruction. 118 * ia64 platforms, this is a simple 'mf.a' instruction.
119 * 119 *
120 * See Documentation/DocBook/deviceiobook.tmpl for more information. 120 * See Documentation/driver-api/device-io.rst for more information.
121 */ 121 */
122static inline void ___ia64_mmiowb(void) 122static inline void ___ia64_mmiowb(void)
123{ 123{
diff --git a/arch/ia64/sn/kernel/iomv.c b/arch/ia64/sn/kernel/iomv.c
index c77ebdf98119..2b22a71663c1 100644
--- a/arch/ia64/sn/kernel/iomv.c
+++ b/arch/ia64/sn/kernel/iomv.c
@@ -63,7 +63,7 @@ EXPORT_SYMBOL(sn_io_addr);
63/** 63/**
64 * __sn_mmiowb - I/O space memory barrier 64 * __sn_mmiowb - I/O space memory barrier
65 * 65 *
66 * See arch/ia64/include/asm/io.h and Documentation/DocBook/deviceiobook.tmpl 66 * See arch/ia64/include/asm/io.h and Documentation/driver-api/device-io.rst
67 * for details. 67 * for details.
68 * 68 *
69 * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear. 69 * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
diff --git a/drivers/ata/acard-ahci.c b/drivers/ata/acard-ahci.c
index ed6a30cd681a..940ddbc59aa7 100644
--- a/drivers/ata/acard-ahci.c
+++ b/drivers/ata/acard-ahci.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * AHCI hardware documentation: 30 * AHCI hardware documentation:
31 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf 31 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index c69954023c2e..1e1c355121e4 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -24,7 +24,7 @@
24 * 24 *
25 * 25 *
26 * libata documentation is available via 'make {ps|pdf}docs', 26 * libata documentation is available via 'make {ps|pdf}docs',
27 * as Documentation/DocBook/libata.* 27 * as Documentation/driver-api/libata.rst
28 * 28 *
29 * AHCI hardware documentation: 29 * AHCI hardware documentation:
30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf 30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 5db6ab261643..30f67a1a4f54 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -24,7 +24,7 @@
24 * 24 *
25 * 25 *
26 * libata documentation is available via 'make {ps|pdf}docs', 26 * libata documentation is available via 'make {ps|pdf}docs',
27 * as Documentation/DocBook/libata.* 27 * as Documentation/driver-api/libata.rst
28 * 28 *
29 * AHCI hardware documentation: 29 * AHCI hardware documentation:
30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf 30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index ffbe625e6fd2..8401c3b5be92 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -33,7 +33,7 @@
33 * 33 *
34 * 34 *
35 * libata documentation is available via 'make {ps|pdf}docs', 35 * libata documentation is available via 'make {ps|pdf}docs',
36 * as Documentation/DocBook/libata.* 36 * as Documentation/driver-api/libata.rst
37 * 37 *
38 * Hardware documentation available at http://developer.intel.com/ 38 * Hardware documentation available at http://developer.intel.com/
39 * 39 *
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 3159f9e66d8f..6154f0e2b81a 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -24,7 +24,7 @@
24 * 24 *
25 * 25 *
26 * libata documentation is available via 'make {ps|pdf}docs', 26 * libata documentation is available via 'make {ps|pdf}docs',
27 * as Documentation/DocBook/libata.* 27 * as Documentation/driver-api/libata.rst
28 * 28 *
29 * AHCI hardware documentation: 29 * AHCI hardware documentation:
30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf 30 * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index e157a0e44419..b82d6bb88d27 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware documentation available from http://www.t13.org/ and 30 * Hardware documentation available from http://www.t13.org/ and
31 * http://www.sata-io.org/ 31 * http://www.sata-io.org/
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ef68232b5222..7e33e200aae5 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware documentation available from http://www.t13.org/ and 30 * Hardware documentation available from http://www.t13.org/ and
31 * http://www.sata-io.org/ 31 * http://www.sata-io.org/
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 49ba9834c715..b0866f040d1f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware documentation available from 30 * Hardware documentation available from
31 * - http://www.t10.org/ 31 * - http://www.t10.org/
@@ -3398,9 +3398,10 @@ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
3398 * 3398 *
3399 * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or 3399 * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
3400 * an SCT Write Same command. 3400 * an SCT Write Same command.
3401 * Based on WRITE SAME has the UNMAP flag 3401 * Based on WRITE SAME has the UNMAP flag:
3402 * When set translate to DSM TRIM 3402 *
3403 * When clear translate to SCT Write Same 3403 * - When set translate to DSM TRIM
3404 * - When clear translate to SCT Write Same
3404 */ 3405 */
3405static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) 3406static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
3406{ 3407{
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index 274d6d7193d7..052921352f31 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware documentation available from http://www.t13.org/ and 30 * Hardware documentation available from http://www.t13.org/ and
31 * http://www.sata-io.org/ 31 * http://www.sata-io.org/
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 120fce0befd3..5afe35baf61b 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -21,7 +21,7 @@
21 * 21 *
22 * 22 *
23 * libata documentation is available via 'make {ps|pdf}docs', 23 * libata documentation is available via 'make {ps|pdf}docs',
24 * as Documentation/DocBook/libata.* 24 * as Documentation/driver-api/libata.rst
25 * 25 *
26 */ 26 */
27 27
diff --git a/drivers/ata/pata_pdc2027x.c b/drivers/ata/pata_pdc2027x.c
index d9ef9e276225..82bfd51692f3 100644
--- a/drivers/ata/pata_pdc2027x.c
+++ b/drivers/ata/pata_pdc2027x.c
@@ -17,7 +17,7 @@
17 * 17 *
18 * 18 *
19 * libata documentation is available via 'make {ps|pdf}docs', 19 * libata documentation is available via 'make {ps|pdf}docs',
20 * as Documentation/DocBook/libata.* 20 * as Documentation/driver-api/libata.rst
21 * 21 *
22 * Hardware information only available under NDA. 22 * Hardware information only available under NDA.
23 * 23 *
diff --git a/drivers/ata/pdc_adma.c b/drivers/ata/pdc_adma.c
index 64d682c6ee57..f1e873a37465 100644
--- a/drivers/ata/pdc_adma.c
+++ b/drivers/ata/pdc_adma.c
@@ -21,7 +21,7 @@
21 * 21 *
22 * 22 *
23 * libata documentation is available via 'make {ps|pdf}docs', 23 * libata documentation is available via 'make {ps|pdf}docs',
24 * as Documentation/DocBook/libata.* 24 * as Documentation/driver-api/libata.rst
25 * 25 *
26 * 26 *
27 * Supports ATA disks in single-packet ADMA mode. 27 * Supports ATA disks in single-packet ADMA mode.
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 734f563b8d37..8c683ddd0f58 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -21,7 +21,7 @@
21 * 21 *
22 * 22 *
23 * libata documentation is available via 'make {ps|pdf}docs', 23 * libata documentation is available via 'make {ps|pdf}docs',
24 * as Documentation/DocBook/libata.* 24 * as Documentation/driver-api/libata.rst
25 * 25 *
26 * No hardware documentation available outside of NVIDIA. 26 * No hardware documentation available outside of NVIDIA.
27 * This driver programs the NVIDIA SATA controller in a similar 27 * This driver programs the NVIDIA SATA controller in a similar
diff --git a/drivers/ata/sata_promise.c b/drivers/ata/sata_promise.c
index 0fa211e2831c..d032bf657f70 100644
--- a/drivers/ata/sata_promise.c
+++ b/drivers/ata/sata_promise.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware information only available under NDA. 30 * Hardware information only available under NDA.
31 * 31 *
diff --git a/drivers/ata/sata_promise.h b/drivers/ata/sata_promise.h
index 00d6000e546f..61633ef5ed72 100644
--- a/drivers/ata/sata_promise.h
+++ b/drivers/ata/sata_promise.h
@@ -20,7 +20,7 @@
20 * 20 *
21 * 21 *
22 * libata documentation is available via 'make {ps|pdf}docs', 22 * libata documentation is available via 'make {ps|pdf}docs',
23 * as Documentation/DocBook/libata.* 23 * as Documentation/driver-api/libata.rst
24 * 24 *
25 */ 25 */
26 26
diff --git a/drivers/ata/sata_qstor.c b/drivers/ata/sata_qstor.c
index af987a4f33d1..1fe941688e95 100644
--- a/drivers/ata/sata_qstor.c
+++ b/drivers/ata/sata_qstor.c
@@ -23,7 +23,7 @@
23 * 23 *
24 * 24 *
25 * libata documentation is available via 'make {ps|pdf}docs', 25 * libata documentation is available via 'make {ps|pdf}docs',
26 * as Documentation/DocBook/libata.* 26 * as Documentation/driver-api/libata.rst
27 * 27 *
28 */ 28 */
29 29
diff --git a/drivers/ata/sata_sil.c b/drivers/ata/sata_sil.c
index 29bcff086bce..ed76f070d21e 100644
--- a/drivers/ata/sata_sil.c
+++ b/drivers/ata/sata_sil.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Documentation for SiI 3112: 30 * Documentation for SiI 3112:
31 * http://gkernel.sourceforge.net/specs/sii/3112A_SiI-DS-0095-B2.pdf.bz2 31 * http://gkernel.sourceforge.net/specs/sii/3112A_SiI-DS-0095-B2.pdf.bz2
diff --git a/drivers/ata/sata_sis.c b/drivers/ata/sata_sis.c
index d1637ac40a73..30f4f35f36d4 100644
--- a/drivers/ata/sata_sis.c
+++ b/drivers/ata/sata_sis.c
@@ -24,7 +24,7 @@
24 * 24 *
25 * 25 *
26 * libata documentation is available via 'make {ps|pdf}docs', 26 * libata documentation is available via 'make {ps|pdf}docs',
27 * as Documentation/DocBook/libata.* 27 * as Documentation/driver-api/libata.rst
28 * 28 *
29 * Hardware documentation available under NDA. 29 * Hardware documentation available under NDA.
30 * 30 *
diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c
index ff614be55d0f..0fd6ac7e57ba 100644
--- a/drivers/ata/sata_svw.c
+++ b/drivers/ata/sata_svw.c
@@ -30,7 +30,7 @@
30 * 30 *
31 * 31 *
32 * libata documentation is available via 'make {ps|pdf}docs', 32 * libata documentation is available via 'make {ps|pdf}docs',
33 * as Documentation/DocBook/libata.* 33 * as Documentation/driver-api/libata.rst
34 * 34 *
35 * Hardware documentation available under NDA. 35 * Hardware documentation available under NDA.
36 * 36 *
diff --git a/drivers/ata/sata_sx4.c b/drivers/ata/sata_sx4.c
index 48301cb3a316..405e606a234d 100644
--- a/drivers/ata/sata_sx4.c
+++ b/drivers/ata/sata_sx4.c
@@ -24,7 +24,7 @@
24 * 24 *
25 * 25 *
26 * libata documentation is available via 'make {ps|pdf}docs', 26 * libata documentation is available via 'make {ps|pdf}docs',
27 * as Documentation/DocBook/libata.* 27 * as Documentation/driver-api/libata.rst
28 * 28 *
29 * Hardware documentation available under NDA. 29 * Hardware documentation available under NDA.
30 * 30 *
diff --git a/drivers/ata/sata_uli.c b/drivers/ata/sata_uli.c
index 08f98c3ed5c8..4f6e8d8156de 100644
--- a/drivers/ata/sata_uli.c
+++ b/drivers/ata/sata_uli.c
@@ -18,7 +18,7 @@
18 * 18 *
19 * 19 *
20 * libata documentation is available via 'make {ps|pdf}docs', 20 * libata documentation is available via 'make {ps|pdf}docs',
21 * as Documentation/DocBook/libata.* 21 * as Documentation/driver-api/libata.rst
22 * 22 *
23 * Hardware documentation available under NDA. 23 * Hardware documentation available under NDA.
24 * 24 *
diff --git a/drivers/ata/sata_via.c b/drivers/ata/sata_via.c
index f3f538eec7b3..22e96fc77d09 100644
--- a/drivers/ata/sata_via.c
+++ b/drivers/ata/sata_via.c
@@ -25,7 +25,7 @@
25 * 25 *
26 * 26 *
27 * libata documentation is available via 'make {ps|pdf}docs', 27 * libata documentation is available via 'make {ps|pdf}docs',
28 * as Documentation/DocBook/libata.* 28 * as Documentation/driver-api/libata.rst
29 * 29 *
30 * Hardware documentation available under NDA. 30 * Hardware documentation available under NDA.
31 * 31 *
diff --git a/drivers/ata/sata_vsc.c b/drivers/ata/sata_vsc.c
index 183eb52085df..9648127cca70 100644
--- a/drivers/ata/sata_vsc.c
+++ b/drivers/ata/sata_vsc.c
@@ -26,7 +26,7 @@
26 * 26 *
27 * 27 *
28 * libata documentation is available via 'make {ps|pdf}docs', 28 * libata documentation is available via 'make {ps|pdf}docs',
29 * as Documentation/DocBook/libata.* 29 * as Documentation/driver-api/libata.rst
30 * 30 *
31 * Vitesse hardware documentation presumably available under NDA. 31 * Vitesse hardware documentation presumably available under NDA.
32 * Intel 31244 (same hardware interface) documentation presumably 32 * Intel 31244 (same hardware interface) documentation presumably
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index b1dd12729f19..bf8486c406d3 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -502,10 +502,12 @@ static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs)
502 * specify how to write bad block markers to OOB (chip->block_markbad). 502 * specify how to write bad block markers to OOB (chip->block_markbad).
503 * 503 *
504 * We try operations in the following order: 504 * We try operations in the following order:
505 *
505 * (1) erase the affected block, to allow OOB marker to be written cleanly 506 * (1) erase the affected block, to allow OOB marker to be written cleanly
506 * (2) write bad block marker to OOB area of affected block (unless flag 507 * (2) write bad block marker to OOB area of affected block (unless flag
507 * NAND_BBT_NO_OOB_BBM is present) 508 * NAND_BBT_NO_OOB_BBM is present)
508 * (3) update the BBT 509 * (3) update the BBT
510 *
509 * Note that we retain the first error encountered in (2) or (3), finish the 511 * Note that we retain the first error encountered in (2) or (3), finish the
510 * procedures, and dump the error in the end. 512 * procedures, and dump the error in the end.
511*/ 513*/
@@ -1219,9 +1221,10 @@ int nand_reset(struct nand_chip *chip, int chipnr)
1219 * @mtd: mtd info 1221 * @mtd: mtd info
1220 * @ofs: offset to start unlock from 1222 * @ofs: offset to start unlock from
1221 * @len: length to unlock 1223 * @len: length to unlock
1222 * @invert: when = 0, unlock the range of blocks within the lower and 1224 * @invert:
1225 * - when = 0, unlock the range of blocks within the lower and
1223 * upper boundary address 1226 * upper boundary address
1224 * when = 1, unlock the range of blocks outside the boundaries 1227 * - when = 1, unlock the range of blocks outside the boundaries
1225 * of the lower and upper boundary address 1228 * of the lower and upper boundary address
1226 * 1229 *
1227 * Returns unlock status. 1230 * Returns unlock status.
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index eebb0e1c70ff..3e231a54476e 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -379,6 +379,7 @@ static void phy_sanitize_settings(struct phy_device *phydev)
379 * @cmd: ethtool_cmd 379 * @cmd: ethtool_cmd
380 * 380 *
381 * A few notes about parameter checking: 381 * A few notes about parameter checking:
382 *
382 * - We don't set port or transceiver, so we don't care what they 383 * - We don't set port or transceiver, so we don't care what they
383 * were set to. 384 * were set to.
384 * - phy_start_aneg() will make sure forced settings are sane, and 385 * - phy_start_aneg() will make sure forced settings are sane, and
diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c
index 634254a52301..8a29fb09db14 100644
--- a/drivers/scsi/qla1280.c
+++ b/drivers/scsi/qla1280.c
@@ -3390,7 +3390,7 @@ qla1280_isp_cmd(struct scsi_qla_host *ha)
3390 * On PCI bus, order reverses and write of 6 posts, then index 5, 3390 * On PCI bus, order reverses and write of 6 posts, then index 5,
3391 * causing chip to issue full queue of stale commands 3391 * causing chip to issue full queue of stale commands
3392 * The mmiowb() prevents future writes from crossing the barrier. 3392 * The mmiowb() prevents future writes from crossing the barrier.
3393 * See Documentation/DocBook/deviceiobook.tmpl for more information. 3393 * See Documentation/driver-api/device-io.rst for more information.
3394 */ 3394 */
3395 WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index); 3395 WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
3396 mmiowb(); 3396 mmiowb();
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 6f7128f49c30..69979574004f 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1051,10 +1051,11 @@ static unsigned char *scsi_inq_str(unsigned char *buf, unsigned char *inq,
1051 * allocate and set it up by calling scsi_add_lun. 1051 * allocate and set it up by calling scsi_add_lun.
1052 * 1052 *
1053 * Return: 1053 * Return:
1054 * SCSI_SCAN_NO_RESPONSE: could not allocate or setup a scsi_device 1054 *
1055 * SCSI_SCAN_TARGET_PRESENT: target responded, but no device is 1055 * - SCSI_SCAN_NO_RESPONSE: could not allocate or setup a scsi_device
1056 * - SCSI_SCAN_TARGET_PRESENT: target responded, but no device is
1056 * attached at the LUN 1057 * attached at the LUN
1057 * SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized 1058 * - SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized
1058 **/ 1059 **/
1059static int scsi_probe_and_add_lun(struct scsi_target *starget, 1060static int scsi_probe_and_add_lun(struct scsi_target *starget,
1060 u64 lun, int *bflagsp, 1061 u64 lun, int *bflagsp,
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index d4cf32d55546..1df77453f6b6 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2914,16 +2914,18 @@ EXPORT_SYMBOL(fc_remote_port_add);
2914 * port is no longer part of the topology. Note: Although a port 2914 * port is no longer part of the topology. Note: Although a port
2915 * may no longer be part of the topology, it may persist in the remote 2915 * may no longer be part of the topology, it may persist in the remote
2916 * ports displayed by the fc_host. We do this under 2 conditions: 2916 * ports displayed by the fc_host. We do this under 2 conditions:
2917 *
2917 * 1) If the port was a scsi target, we delay its deletion by "blocking" it. 2918 * 1) If the port was a scsi target, we delay its deletion by "blocking" it.
2918 * This allows the port to temporarily disappear, then reappear without 2919 * This allows the port to temporarily disappear, then reappear without
2919 * disrupting the SCSI device tree attached to it. During the "blocked" 2920 * disrupting the SCSI device tree attached to it. During the "blocked"
2920 * period the port will still exist. 2921 * period the port will still exist.
2922 *
2921 * 2) If the port was a scsi target and disappears for longer than we 2923 * 2) If the port was a scsi target and disappears for longer than we
2922 * expect, we'll delete the port and tear down the SCSI device tree 2924 * expect, we'll delete the port and tear down the SCSI device tree
2923 * attached to it. However, we want to semi-persist the target id assigned 2925 * attached to it. However, we want to semi-persist the target id assigned
2924 * to that port if it eventually does exist. The port structure will 2926 * to that port if it eventually does exist. The port structure will
2925 * remain (although with minimal information) so that the target id 2927 * remain (although with minimal information) so that the target id
2926 * bindings remain. 2928 * bindings remain.
2927 * 2929 *
2928 * If the remote port is not an FCP Target, it will be fully torn down 2930 * If the remote port is not an FCP Target, it will be fully torn down
2929 * and deallocated, including the fc_remote_port class device. 2931 * and deallocated, including the fc_remote_port class device.
diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index 910f4a7a3924..31273468589c 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -116,8 +116,8 @@ EXPORT_SYMBOL(scsicam_bios_param);
116 * @hds: put heads here 116 * @hds: put heads here
117 * @secs: put sectors here 117 * @secs: put sectors here
118 * 118 *
119 * Description: determine the BIOS mapping/geometry used to create the partition 119 * Determine the BIOS mapping/geometry used to create the partition
120 * table, storing the results in *cyls, *hds, and *secs 120 * table, storing the results in @cyls, @hds, and @secs
121 * 121 *
122 * Returns: -1 on failure, 0 on success. 122 * Returns: -1 on failure, 0 on success.
123 */ 123 */
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 354e2ab62031..6dabc4a10396 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -9,7 +9,7 @@
9 * 2 as published by the Free Software Foundation. 9 * 2 as published by the Free Software Foundation.
10 * 10 *
11 * debugfs is for people to use instead of /proc or /sys. 11 * debugfs is for people to use instead of /proc or /sys.
12 * See Documentation/DocBook/filesystems for more details. 12 * See Documentation/filesystems/ for more details.
13 * 13 *
14 */ 14 */
15 15
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index e892ae7d89f8..77440e4aa9d4 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -9,7 +9,7 @@
9 * 2 as published by the Free Software Foundation. 9 * 2 as published by the Free Software Foundation.
10 * 10 *
11 * debugfs is for people to use instead of /proc or /sys. 11 * debugfs is for people to use instead of /proc or /sys.
12 * See Documentation/DocBook/kernel-api for more details. 12 * See ./Documentation/core-api/kernel-api.rst for more details.
13 * 13 *
14 */ 14 */
15 15
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 9736df2ce89d..2fb4eadaa118 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -215,8 +215,8 @@ EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue);
215 * 215 *
216 * Returns %0 if successful, or the following error codes: 216 * Returns %0 if successful, or the following error codes:
217 * 217 *
218 * -EAGAIN : The operation would have blocked but @no_wait was non-zero. 218 * - -EAGAIN : The operation would have blocked but @no_wait was non-zero.
219 * -ERESTARTSYS : A signal interrupted the wait operation. 219 * - -ERESTARTSYS : A signal interrupted the wait operation.
220 * 220 *
221 * If @no_wait is zero, the function might sleep until the eventfd internal 221 * If @no_wait is zero, the function might sleep until the eventfd internal
222 * counter becomes greater than zero. 222 * counter becomes greater than zero.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 63ee2940775c..8b426f83909f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2052,11 +2052,13 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode)
2052} 2052}
2053 2053
2054/** 2054/**
2055 * __mark_inode_dirty - internal function 2055 * __mark_inode_dirty - internal function
2056 * @inode: inode to mark 2056 *
2057 * @flags: what kind of dirty (i.e. I_DIRTY_SYNC) 2057 * @inode: inode to mark
2058 * Mark an inode as dirty. Callers should use mark_inode_dirty or 2058 * @flags: what kind of dirty (i.e. I_DIRTY_SYNC)
2059 * mark_inode_dirty_sync. 2059 *
2060 * Mark an inode as dirty. Callers should use mark_inode_dirty or
2061 * mark_inode_dirty_sync.
2060 * 2062 *
2061 * Put the inode on the super block's dirty list. 2063 * Put the inode on the super block's dirty list.
2062 * 2064 *
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 2d30a6da7013..8b08044b3120 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -409,25 +409,6 @@ static handle_t *new_handle(int nblocks)
409 return handle; 409 return handle;
410} 410}
411 411
412/**
413 * handle_t *jbd2_journal_start() - Obtain a new handle.
414 * @journal: Journal to start transaction on.
415 * @nblocks: number of block buffer we might modify
416 *
417 * We make sure that the transaction can guarantee at least nblocks of
418 * modified buffers in the log. We block until the log can guarantee
419 * that much space. Additionally, if rsv_blocks > 0, we also create another
420 * handle with rsv_blocks reserved blocks in the journal. This handle is
421 * is stored in h_rsv_handle. It is not attached to any particular transaction
422 * and thus doesn't block transaction commit. If the caller uses this reserved
423 * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
424 * on the parent handle will dispose the reserved one. Reserved handle has to
425 * be converted to a normal handle using jbd2_journal_start_reserved() before
426 * it can be used.
427 *
428 * Return a pointer to a newly allocated handle, or an ERR_PTR() value
429 * on failure.
430 */
431handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, 412handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
432 gfp_t gfp_mask, unsigned int type, 413 gfp_t gfp_mask, unsigned int type,
433 unsigned int line_no) 414 unsigned int line_no)
@@ -478,6 +459,25 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks,
478EXPORT_SYMBOL(jbd2__journal_start); 459EXPORT_SYMBOL(jbd2__journal_start);
479 460
480 461
462/**
463 * handle_t *jbd2_journal_start() - Obtain a new handle.
464 * @journal: Journal to start transaction on.
465 * @nblocks: number of block buffer we might modify
466 *
467 * We make sure that the transaction can guarantee at least nblocks of
468 * modified buffers in the log. We block until the log can guarantee
469 * that much space. Additionally, if rsv_blocks > 0, we also create another
470 * handle with rsv_blocks reserved blocks in the journal. This handle is
471 * is stored in h_rsv_handle. It is not attached to any particular transaction
472 * and thus doesn't block transaction commit. If the caller uses this reserved
473 * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop()
474 * on the parent handle will dispose the reserved one. Reserved handle has to
475 * be converted to a normal handle using jbd2_journal_start_reserved() before
476 * it can be used.
477 *
478 * Return a pointer to a newly allocated handle, or an ERR_PTR() value
479 * on failure.
480 */
481handle_t *jbd2_journal_start(journal_t *journal, int nblocks) 481handle_t *jbd2_journal_start(journal_t *journal, int nblocks)
482{ 482{
483 return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0); 483 return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0);
@@ -1072,10 +1072,10 @@ out:
1072 * @handle: transaction to add buffer modifications to 1072 * @handle: transaction to add buffer modifications to
1073 * @bh: bh to be used for metadata writes 1073 * @bh: bh to be used for metadata writes
1074 * 1074 *
1075 * Returns an error code or 0 on success. 1075 * Returns: error code or 0 on success.
1076 * 1076 *
1077 * In full data journalling mode the buffer may be of type BJ_AsyncData, 1077 * In full data journalling mode the buffer may be of type BJ_AsyncData,
1078 * because we're write()ing a buffer which is also part of a shared mapping. 1078 * because we're ``write()ing`` a buffer which is also part of a shared mapping.
1079 */ 1079 */
1080 1080
1081int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh) 1081int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh)
diff --git a/fs/mpage.c b/fs/mpage.c
index d6d1486d6f99..2e4c41ccb5c9 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -345,6 +345,7 @@ confused:
345 * 345 *
346 * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be 346 * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be
347 * submitted in the following order: 347 * submitted in the following order:
348 *
348 * 12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 349 * 12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
349 * 350 *
350 * because the indirect block has to be read to get the mappings of blocks 351 * because the indirect block has to be read to get the mappings of blocks
diff --git a/fs/namei.c b/fs/namei.c
index 6571a5f5112e..8bacc390c51e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4332,6 +4332,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
4332 * The worst of all namespace operations - renaming directory. "Perverted" 4332 * The worst of all namespace operations - renaming directory. "Perverted"
4333 * doesn't even start to describe it. Somebody in UCB had a heck of a trip... 4333 * doesn't even start to describe it. Somebody in UCB had a heck of a trip...
4334 * Problems: 4334 * Problems:
4335 *
4335 * a) we can get into loop creation. 4336 * a) we can get into loop creation.
4336 * b) race potential - two innocent renames can create a loop together. 4337 * b) race potential - two innocent renames can create a loop together.
4337 * That's where 4.4 screws up. Current fix: serialization on 4338 * That's where 4.4 screws up. Current fix: serialization on
diff --git a/include/linux/ata.h b/include/linux/ata.h
index ad7d9ee89ff0..73fe18edfdaf 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -20,7 +20,7 @@
20 * 20 *
21 * 21 *
22 * libata documentation is available via 'make {ps|pdf}docs', 22 * libata documentation is available via 'make {ps|pdf}docs',
23 * as Documentation/DocBook/libata.* 23 * as Documentation/driver-api/libata.rst
24 * 24 *
25 * Hardware documentation available from http://www.t13.org/ 25 * Hardware documentation available from http://www.t13.org/
26 * 26 *
diff --git a/include/linux/cred.h b/include/linux/cred.h
index b03e7d049a64..c728d515e5e2 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -1,4 +1,4 @@
1/* Credentials management - see Documentation/security/credentials.txt 1/* Credentials management - see Documentation/security/credentials.rst
2 * 2 *
3 * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. 3 * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
4 * Written by David Howells (dhowells@redhat.com) 4 * Written by David Howells (dhowells@redhat.com)
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 9174b0d28582..aa86e6d8c1aa 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -9,7 +9,7 @@
9 * 2 as published by the Free Software Foundation. 9 * 2 as published by the Free Software Foundation.
10 * 10 *
11 * debugfs is for people to use instead of /proc or /sys. 11 * debugfs is for people to use instead of /proc or /sys.
12 * See Documentation/DocBook/filesystems for more details. 12 * See Documentation/filesystems/ for more details.
13 */ 13 */
14 14
15#ifndef _DEBUGFS_H_ 15#ifndef _DEBUGFS_H_
diff --git a/include/linux/key.h b/include/linux/key.h
index 78e25aabedaf..044114185120 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -9,7 +9,7 @@
9 * 2 of the License, or (at your option) any later version. 9 * 2 of the License, or (at your option) any later version.
10 * 10 *
11 * 11 *
12 * See Documentation/security/keys.txt for information on keys/keyrings. 12 * See Documentation/security/keys/core.rst for information on keys/keyrings.
13 */ 13 */
14 14
15#ifndef _LINUX_KEY_H 15#ifndef _LINUX_KEY_H
diff --git a/include/linux/libata.h b/include/linux/libata.h
index c9a69fc8821e..9e6633235ad7 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -19,7 +19,7 @@
19 * 19 *
20 * 20 *
21 * libata documentation is available via 'make {ps|pdf}docs', 21 * libata documentation is available via 'make {ps|pdf}docs',
22 * as Documentation/DocBook/libata.* 22 * as Documentation/driver-api/libata.rst
23 * 23 *
24 */ 24 */
25 25
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 080f34e66017..a1eeaf603d2f 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -29,6 +29,8 @@
29#include <linux/rculist.h> 29#include <linux/rculist.h>
30 30
31/** 31/**
32 * union security_list_options - Linux Security Module hook function list
33 *
32 * Security hooks for program execution operations. 34 * Security hooks for program execution operations.
33 * 35 *
34 * @bprm_set_creds: 36 * @bprm_set_creds:
@@ -193,8 +195,8 @@
193 * @value will be set to the allocated attribute value. 195 * @value will be set to the allocated attribute value.
194 * @len will be set to the length of the value. 196 * @len will be set to the length of the value.
195 * Returns 0 if @name and @value have been successfully set, 197 * Returns 0 if @name and @value have been successfully set,
196 * -EOPNOTSUPP if no security attribute is needed, or 198 * -EOPNOTSUPP if no security attribute is needed, or
197 * -ENOMEM on memory allocation failure. 199 * -ENOMEM on memory allocation failure.
198 * @inode_create: 200 * @inode_create:
199 * Check permission to create a regular file. 201 * Check permission to create a regular file.
200 * @dir contains inode structure of the parent of the new file. 202 * @dir contains inode structure of the parent of the new file.
@@ -510,8 +512,7 @@
510 * process @tsk. Note that this hook is sometimes called from interrupt. 512 * process @tsk. Note that this hook is sometimes called from interrupt.
511 * Note that the fown_struct, @fown, is never outside the context of a 513 * Note that the fown_struct, @fown, is never outside the context of a
512 * struct file, so the file structure (and associated security information) 514 * struct file, so the file structure (and associated security information)
513 * can always be obtained: 515 * can always be obtained: container_of(fown, struct file, f_owner)
514 * container_of(fown, struct file, f_owner)
515 * @tsk contains the structure of task receiving signal. 516 * @tsk contains the structure of task receiving signal.
516 * @fown contains the file owner information. 517 * @fown contains the file owner information.
517 * @sig is the signal that will be sent. When 0, kernel sends SIGIO. 518 * @sig is the signal that will be sent. When 0, kernel sends SIGIO.
@@ -521,7 +522,7 @@
521 * to receive an open file descriptor via socket IPC. 522 * to receive an open file descriptor via socket IPC.
522 * @file contains the file structure being received. 523 * @file contains the file structure being received.
523 * Return 0 if permission is granted. 524 * Return 0 if permission is granted.
524 * @file_open 525 * @file_open:
525 * Save open-time permission checking state for later use upon 526 * Save open-time permission checking state for later use upon
526 * file_permission, and recheck access if anything has changed 527 * file_permission, and recheck access if anything has changed
527 * since inode_permission. 528 * since inode_permission.
@@ -1143,7 +1144,7 @@
1143 * @sma contains the semaphore structure. May be NULL. 1144 * @sma contains the semaphore structure. May be NULL.
1144 * @cmd contains the operation to be performed. 1145 * @cmd contains the operation to be performed.
1145 * Return 0 if permission is granted. 1146 * Return 0 if permission is granted.
1146 * @sem_semop 1147 * @sem_semop:
1147 * Check permissions before performing operations on members of the 1148 * Check permissions before performing operations on members of the
1148 * semaphore set @sma. If the @alter flag is nonzero, the semaphore set 1149 * semaphore set @sma. If the @alter flag is nonzero, the semaphore set
1149 * may be modified. 1150 * may be modified.
@@ -1153,20 +1154,20 @@
1153 * @alter contains the flag indicating whether changes are to be made. 1154 * @alter contains the flag indicating whether changes are to be made.
1154 * Return 0 if permission is granted. 1155 * Return 0 if permission is granted.
1155 * 1156 *
1156 * @binder_set_context_mgr 1157 * @binder_set_context_mgr:
1157 * Check whether @mgr is allowed to be the binder context manager. 1158 * Check whether @mgr is allowed to be the binder context manager.
1158 * @mgr contains the task_struct for the task being registered. 1159 * @mgr contains the task_struct for the task being registered.
1159 * Return 0 if permission is granted. 1160 * Return 0 if permission is granted.
1160 * @binder_transaction 1161 * @binder_transaction:
1161 * Check whether @from is allowed to invoke a binder transaction call 1162 * Check whether @from is allowed to invoke a binder transaction call
1162 * to @to. 1163 * to @to.
1163 * @from contains the task_struct for the sending task. 1164 * @from contains the task_struct for the sending task.
1164 * @to contains the task_struct for the receiving task. 1165 * @to contains the task_struct for the receiving task.
1165 * @binder_transfer_binder 1166 * @binder_transfer_binder:
1166 * Check whether @from is allowed to transfer a binder reference to @to. 1167 * Check whether @from is allowed to transfer a binder reference to @to.
1167 * @from contains the task_struct for the sending task. 1168 * @from contains the task_struct for the sending task.
1168 * @to contains the task_struct for the receiving task. 1169 * @to contains the task_struct for the receiving task.
1169 * @binder_transfer_file 1170 * @binder_transfer_file:
1170 * Check whether @from is allowed to transfer @file to @to. 1171 * Check whether @from is allowed to transfer @file to @to.
1171 * @from contains the task_struct for the sending task. 1172 * @from contains the task_struct for the sending task.
1172 * @file contains the struct file being transferred. 1173 * @file contains the struct file being transferred.
@@ -1214,7 +1215,7 @@
1214 * @cred contains the credentials to use. 1215 * @cred contains the credentials to use.
1215 * @ns contains the user namespace we want the capability in 1216 * @ns contains the user namespace we want the capability in
1216 * @cap contains the capability <include/linux/capability.h>. 1217 * @cap contains the capability <include/linux/capability.h>.
1217 * @audit: Whether to write an audit message or not 1218 * @audit contains whether to write an audit message or not
1218 * Return 0 if the capability is granted for @tsk. 1219 * Return 0 if the capability is granted for @tsk.
1219 * @syslog: 1220 * @syslog:
1220 * Check permission before accessing the kernel message ring or changing 1221 * Check permission before accessing the kernel message ring or changing
@@ -1336,9 +1337,7 @@
1336 * @inode we wish to get the security context of. 1337 * @inode we wish to get the security context of.
1337 * @ctx is a pointer in which to place the allocated security context. 1338 * @ctx is a pointer in which to place the allocated security context.
1338 * @ctxlen points to the place to put the length of @ctx. 1339 * @ctxlen points to the place to put the length of @ctx.
1339 * This is the main security structure.
1340 */ 1340 */
1341
1342union security_list_options { 1341union security_list_options {
1343 int (*binder_set_context_mgr)(struct task_struct *mgr); 1342 int (*binder_set_context_mgr)(struct task_struct *mgr);
1344 int (*binder_transaction)(struct task_struct *from, 1343 int (*binder_transaction)(struct task_struct *from,
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 8f67b1581683..de0d889e4fe1 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -785,7 +785,7 @@ struct nand_manufacturer_ops {
785 * Minimum amount of bit errors per @ecc_step_ds guaranteed 785 * Minimum amount of bit errors per @ecc_step_ds guaranteed
786 * to be correctable. If unknown, set to zero. 786 * to be correctable. If unknown, set to zero.
787 * @ecc_step_ds: [INTERN] ECC step required by the @ecc_strength_ds, 787 * @ecc_step_ds: [INTERN] ECC step required by the @ecc_strength_ds,
788 * also from the datasheet. It is the recommended ECC step 788 * also from the datasheet. It is the recommended ECC step
789 * size, if known; if unknown, set to zero. 789 * size, if known; if unknown, set to zero.
790 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is 790 * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is
791 * set to the actually used ONFI mode if the chip is 791 * set to the actually used ONFI mode if the chip is
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 1127fe31645d..ffcba1f337da 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -214,9 +214,9 @@ enum mutex_trylock_recursive_enum {
214 * raisins, and once those are gone this will be removed. 214 * raisins, and once those are gone this will be removed.
215 * 215 *
216 * Returns: 216 * Returns:
217 * MUTEX_TRYLOCK_FAILED - trylock failed, 217 * - MUTEX_TRYLOCK_FAILED - trylock failed,
218 * MUTEX_TRYLOCK_SUCCESS - lock acquired, 218 * - MUTEX_TRYLOCK_SUCCESS - lock acquired,
219 * MUTEX_TRYLOCK_RECURSIVE - we already owned the lock. 219 * - MUTEX_TRYLOCK_RECURSIVE - we already owned the lock.
220 */ 220 */
221static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum 221static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum
222mutex_trylock_recursive(struct mutex *lock) 222mutex_trylock_recursive(struct mutex *lock)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ed952c17fc7..24e88b33a06c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1432,13 +1432,14 @@ enum netdev_priv_flags {
1432 1432
1433/** 1433/**
1434 * struct net_device - The DEVICE structure. 1434 * struct net_device - The DEVICE structure.
1435 * Actually, this whole structure is a big mistake. It mixes I/O 1435 *
1436 * data with strictly "high-level" data, and it has to know about 1436 * Actually, this whole structure is a big mistake. It mixes I/O
1437 * almost every data structure used in the INET module. 1437 * data with strictly "high-level" data, and it has to know about
1438 * almost every data structure used in the INET module.
1438 * 1439 *
1439 * @name: This is the first field of the "visible" part of this structure 1440 * @name: This is the first field of the "visible" part of this structure
1440 * (i.e. as seen by users in the "Space.c" file). It is the name 1441 * (i.e. as seen by users in the "Space.c" file). It is the name
1441 * of the interface. 1442 * of the interface.
1442 * 1443 *
1443 * @name_hlist: Device name hash chain, please keep it close to name[] 1444 * @name_hlist: Device name hash chain, please keep it close to name[]
1444 * @ifalias: SNMP alias 1445 * @ifalias: SNMP alias
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a098d95b3d84..25b1659c832a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2691,7 +2691,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio);
2691 * @offset: the offset within the fragment (starting at the 2691 * @offset: the offset within the fragment (starting at the
2692 * fragment's own offset) 2692 * fragment's own offset)
2693 * @size: the number of bytes to map 2693 * @size: the number of bytes to map
2694 * @dir: the direction of the mapping (%PCI_DMA_*) 2694 * @dir: the direction of the mapping (``PCI_DMA_*``)
2695 * 2695 *
2696 * Maps the page associated with @frag to @device. 2696 * Maps the page associated with @frag to @device.
2697 */ 2697 */
diff --git a/include/net/sock.h b/include/net/sock.h
index f33e3d134e0b..f97da141d920 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1953,11 +1953,10 @@ static inline bool sk_has_allocations(const struct sock *sk)
1953 * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory 1953 * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory
1954 * barrier call. They were added due to the race found within the tcp code. 1954 * barrier call. They were added due to the race found within the tcp code.
1955 * 1955 *
1956 * Consider following tcp code paths: 1956 * Consider following tcp code paths::
1957 * 1957 *
1958 * CPU1 CPU2 1958 * CPU1 CPU2
1959 * 1959 * sys_select receive packet
1960 * sys_select receive packet
1961 * ... ... 1960 * ... ...
1962 * __add_wait_queue update tp->rcv_nxt 1961 * __add_wait_queue update tp->rcv_nxt
1963 * ... ... 1962 * ... ...
@@ -2264,7 +2263,7 @@ void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags);
2264 * @tsflags: timestamping flags to use 2263 * @tsflags: timestamping flags to use
2265 * @tx_flags: completed with instructions for time stamping 2264 * @tx_flags: completed with instructions for time stamping
2266 * 2265 *
2267 * Note : callers should take care of initial *tx_flags value (usually 0) 2266 * Note: callers should take care of initial ``*tx_flags`` value (usually 0)
2268 */ 2267 */
2269static inline void sock_tx_timestamp(const struct sock *sk, __u16 tsflags, 2268static inline void sock_tx_timestamp(const struct sock *sk, __u16 tsflags,
2270 __u8 *tx_flags) 2269 __u8 *tx_flags)
diff --git a/kernel/cred.c b/kernel/cred.c
index 2bc66075740f..ecf03657e71c 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -1,4 +1,4 @@
1/* Task credentials management - see Documentation/security/credentials.txt 1/* Task credentials management - see Documentation/security/credentials.rst
2 * 2 *
3 * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. 3 * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
4 * Written by David Howells (dhowells@redhat.com) 4 * Written by David Howells (dhowells@redhat.com)
diff --git a/kernel/futex.c b/kernel/futex.c
index d6cf71d08f21..c934689043b2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -488,7 +488,7 @@ static void drop_futex_key_refs(union futex_key *key)
488 * 488 *
489 * Return: a negative error code or 0 489 * Return: a negative error code or 0
490 * 490 *
491 * The key words are stored in *key on success. 491 * The key words are stored in @key on success.
492 * 492 *
493 * For shared mappings, it's (page->index, file_inode(vma->vm_file), 493 * For shared mappings, it's (page->index, file_inode(vma->vm_file),
494 * offset_within_page). For private mappings, it's (uaddr, current->mm). 494 * offset_within_page). For private mappings, it's (uaddr, current->mm).
@@ -1259,9 +1259,9 @@ static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
1259 * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0) 1259 * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
1260 * 1260 *
1261 * Return: 1261 * Return:
1262 * 0 - ready to wait; 1262 * - 0 - ready to wait;
1263 * 1 - acquired the lock; 1263 * - 1 - acquired the lock;
1264 * <0 - error 1264 * - <0 - error
1265 * 1265 *
1266 * The hb->lock and futex_key refs shall be held by the caller. 1266 * The hb->lock and futex_key refs shall be held by the caller.
1267 */ 1267 */
@@ -1717,9 +1717,9 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
1717 * hb1 and hb2 must be held by the caller. 1717 * hb1 and hb2 must be held by the caller.
1718 * 1718 *
1719 * Return: 1719 * Return:
1720 * 0 - failed to acquire the lock atomically; 1720 * - 0 - failed to acquire the lock atomically;
1721 * >0 - acquired the lock, return value is vpid of the top_waiter 1721 * - >0 - acquired the lock, return value is vpid of the top_waiter
1722 * <0 - error 1722 * - <0 - error
1723 */ 1723 */
1724static int futex_proxy_trylock_atomic(u32 __user *pifutex, 1724static int futex_proxy_trylock_atomic(u32 __user *pifutex,
1725 struct futex_hash_bucket *hb1, 1725 struct futex_hash_bucket *hb1,
@@ -1785,8 +1785,8 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
1785 * uaddr2 atomically on behalf of the top waiter. 1785 * uaddr2 atomically on behalf of the top waiter.
1786 * 1786 *
1787 * Return: 1787 * Return:
1788 * >=0 - on success, the number of tasks requeued or woken; 1788 * - >=0 - on success, the number of tasks requeued or woken;
1789 * <0 - on error 1789 * - <0 - on error
1790 */ 1790 */
1791static int futex_requeue(u32 __user *uaddr1, unsigned int flags, 1791static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
1792 u32 __user *uaddr2, int nr_wake, int nr_requeue, 1792 u32 __user *uaddr2, int nr_wake, int nr_requeue,
@@ -2142,8 +2142,8 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
2142 * be paired with exactly one earlier call to queue_me(). 2142 * be paired with exactly one earlier call to queue_me().
2143 * 2143 *
2144 * Return: 2144 * Return:
2145 * 1 - if the futex_q was still queued (and we removed unqueued it); 2145 * - 1 - if the futex_q was still queued (and we removed unqueued it);
2146 * 0 - if the futex_q was already removed by the waking thread 2146 * - 0 - if the futex_q was already removed by the waking thread
2147 */ 2147 */
2148static int unqueue_me(struct futex_q *q) 2148static int unqueue_me(struct futex_q *q)
2149{ 2149{
@@ -2333,9 +2333,9 @@ static long futex_wait_restart(struct restart_block *restart);
2333 * acquire the lock. Must be called with the hb lock held. 2333 * acquire the lock. Must be called with the hb lock held.
2334 * 2334 *
2335 * Return: 2335 * Return:
2336 * 1 - success, lock taken; 2336 * - 1 - success, lock taken;
2337 * 0 - success, lock not taken; 2337 * - 0 - success, lock not taken;
2338 * <0 - on error (-EFAULT) 2338 * - <0 - on error (-EFAULT)
2339 */ 2339 */
2340static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) 2340static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
2341{ 2341{
@@ -2422,8 +2422,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
2422 * with no q.key reference on failure. 2422 * with no q.key reference on failure.
2423 * 2423 *
2424 * Return: 2424 * Return:
2425 * 0 - uaddr contains val and hb has been locked; 2425 * - 0 - uaddr contains val and hb has been locked;
2426 * <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked 2426 * - <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
2427 */ 2427 */
2428static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, 2428static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
2429 struct futex_q *q, struct futex_hash_bucket **hb) 2429 struct futex_q *q, struct futex_hash_bucket **hb)
@@ -2895,8 +2895,8 @@ pi_faulted:
2895 * called with the hb lock held. 2895 * called with the hb lock held.
2896 * 2896 *
2897 * Return: 2897 * Return:
2898 * 0 = no early wakeup detected; 2898 * - 0 = no early wakeup detected;
2899 * <0 = -ETIMEDOUT or -ERESTARTNOINTR 2899 * - <0 = -ETIMEDOUT or -ERESTARTNOINTR
2900 */ 2900 */
2901static inline 2901static inline
2902int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, 2902int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
@@ -2968,8 +2968,8 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
2968 * If 4 or 7, we cleanup and return with -ETIMEDOUT. 2968 * If 4 or 7, we cleanup and return with -ETIMEDOUT.
2969 * 2969 *
2970 * Return: 2970 * Return:
2971 * 0 - On success; 2971 * - 0 - On success;
2972 * <0 - On error 2972 * - <0 - On error
2973 */ 2973 */
2974static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, 2974static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
2975 u32 val, ktime_t *abs_time, u32 bitset, 2975 u32 val, ktime_t *abs_time, u32 bitset,
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2e30d925a40d..ad43468e89f0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -7,7 +7,7 @@
7 * This file contains the core interrupt handling code, for irq-chip 7 * This file contains the core interrupt handling code, for irq-chip
8 * based architectures. 8 * based architectures.
9 * 9 *
10 * Detailed information is available in Documentation/DocBook/genericirq 10 * Detailed information is available in Documentation/core-api/genericirq.rst
11 */ 11 */
12 12
13#include <linux/irq.h> 13#include <linux/irq.h>
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index eb4d3e8945b8..79f987b942b8 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -6,7 +6,7 @@
6 * 6 *
7 * This file contains the core interrupt handling code. 7 * This file contains the core interrupt handling code.
8 * 8 *
9 * Detailed information is available in Documentation/DocBook/genericirq 9 * Detailed information is available in Documentation/core-api/genericirq.rst
10 * 10 *
11 */ 11 */
12 12
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 948b50e78549..8bbd06405e60 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -4,7 +4,7 @@
4 * 4 *
5 * This file contains the interrupt descriptor management code 5 * This file contains the interrupt descriptor management code
6 * 6 *
7 * Detailed information is available in Documentation/DocBook/genericirq 7 * Detailed information is available in Documentation/core-api/genericirq.rst
8 * 8 *
9 */ 9 */
10#include <linux/irq.h> 10#include <linux/irq.h>
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 198527a62149..858a07590e39 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -227,9 +227,9 @@ static void __sched __mutex_lock_slowpath(struct mutex *lock);
227 * (or statically defined) before it can be locked. memset()-ing 227 * (or statically defined) before it can be locked. memset()-ing
228 * the mutex to 0 is not allowed. 228 * the mutex to 0 is not allowed.
229 * 229 *
230 * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging 230 * (The CONFIG_DEBUG_MUTEXES .config option turns on debugging
231 * checks that will enforce the restrictions and will also do 231 * checks that will enforce the restrictions and will also do
232 * deadlock debugging. ) 232 * deadlock debugging)
233 * 233 *
234 * This function is similar to (but not equivalent to) down(). 234 * This function is similar to (but not equivalent to) down().
235 */ 235 */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9c5d40a50930..ca9460f049b8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -286,7 +286,7 @@ config DEBUG_FS
286 write to these files. 286 write to these files.
287 287
288 For detailed documentation on the debugfs API, see 288 For detailed documentation on the debugfs API, see
289 Documentation/DocBook/filesystems. 289 Documentation/filesystems/.
290 290
291 If unsure, say N. 291 If unsure, say N.
292 292
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 533f912638ed..ab4ff0eea776 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -13,7 +13,7 @@ menuconfig KGDB
13 CONFIG_FRAME_POINTER to aid in producing more reliable stack 13 CONFIG_FRAME_POINTER to aid in producing more reliable stack
14 backtraces in the external debugger. Documentation of 14 backtraces in the external debugger. Documentation of
15 kernel debugger is available at http://kgdb.sourceforge.net 15 kernel debugger is available at http://kgdb.sourceforge.net
16 as well as in DocBook form in Documentation/DocBook/. If 16 as well as in Documentation/dev-tools/kgdb.rst. If
17 unsure, say N. 17 unsure, say N.
18 18
19if KGDB 19if KGDB
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 34678828e2bb..f9653987c0f9 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -181,7 +181,7 @@ done:
181 * 181 *
182 * This function will lock the socket if a skb is returned, so 182 * This function will lock the socket if a skb is returned, so
183 * the caller needs to unlock the socket in that case (usually by 183 * the caller needs to unlock the socket in that case (usually by
184 * calling skb_free_datagram). Returns NULL with *err set to 184 * calling skb_free_datagram). Returns NULL with @err set to
185 * -EAGAIN if no data was available or to some other value if an 185 * -EAGAIN if no data was available or to some other value if an
186 * error was detected. 186 * error was detected.
187 * 187 *
diff --git a/net/core/sock.c b/net/core/sock.c
index 727f924b7f91..0c3fc16223f9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2675,9 +2675,12 @@ EXPORT_SYMBOL(release_sock);
2675 * @sk: socket 2675 * @sk: socket
2676 * 2676 *
2677 * This version should be used for very small section, where process wont block 2677 * This version should be used for very small section, where process wont block
2678 * return false if fast path is taken 2678 * return false if fast path is taken:
2679 *
2679 * sk_lock.slock locked, owned = 0, BH disabled 2680 * sk_lock.slock locked, owned = 0, BH disabled
2680 * return true if slow path is taken 2681 *
2682 * return true if slow path is taken:
2683 *
2681 * sk_lock.slock unlocked, owned = 1, BH enabled 2684 * sk_lock.slock unlocked, owned = 1, BH enabled
2682 */ 2685 */
2683bool lock_sock_fast(struct sock *sk) 2686bool lock_sock_fast(struct sock *sk)
diff --git a/scripts/.gitignore b/scripts/.gitignore
index e063daa3ec4a..0442c06eefcb 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -7,7 +7,6 @@ pnmtologo
7unifdef 7unifdef
8ihex2fw 8ihex2fw
9recordmcount 9recordmcount
10docproc
11check-lc_ctype 10check-lc_ctype
12sortextable 11sortextable
13asn1_compiler 12asn1_compiler
diff --git a/scripts/Makefile b/scripts/Makefile
index 1d80897a9644..c06f4997d700 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -6,8 +6,6 @@
6# pnmttologo: Convert pnm files to logo files 6# pnmttologo: Convert pnm files to logo files
7# conmakehash: Create chartable 7# conmakehash: Create chartable
8# conmakehash: Create arrays for initializing the kernel console tables 8# conmakehash: Create arrays for initializing the kernel console tables
9# docproc: Used in Documentation/DocBook
10# check-lc_ctype: Used in Documentation/DocBook
11 9
12HOST_EXTRACFLAGS += -I$(srctree)/tools/include 10HOST_EXTRACFLAGS += -I$(srctree)/tools/include
13 11
@@ -29,16 +27,12 @@ HOSTLOADLIBES_extract-cert = -lcrypto
29always := $(hostprogs-y) $(hostprogs-m) 27always := $(hostprogs-y) $(hostprogs-m)
30 28
31# The following hostprogs-y programs are only build on demand 29# The following hostprogs-y programs are only build on demand
32hostprogs-y += unifdef docproc check-lc_ctype 30hostprogs-y += unifdef
33 31
34# These targets are used internally to avoid "is up to date" messages 32# These targets are used internally to avoid "is up to date" messages
35PHONY += build_unifdef build_docproc build_check-lc_ctype 33PHONY += build_unifdef
36build_unifdef: $(obj)/unifdef 34build_unifdef: $(obj)/unifdef
37 @: 35 @:
38build_docproc: $(obj)/docproc
39 @:
40build_check-lc_ctype: $(obj)/check-lc_ctype
41 @:
42 36
43subdir-$(CONFIG_MODVERSIONS) += genksyms 37subdir-$(CONFIG_MODVERSIONS) += genksyms
44subdir-y += mod 38subdir-y += mod
diff --git a/scripts/check-lc_ctype.c b/scripts/check-lc_ctype.c
deleted file mode 100644
index 9097ff5449fb..000000000000
--- a/scripts/check-lc_ctype.c
+++ /dev/null
@@ -1,11 +0,0 @@
1/*
2 * Check that a specified locale works as LC_CTYPE. Used by the
3 * DocBook build system to probe for C.UTF-8 support.
4 */
5
6#include <locale.h>
7
8int main(void)
9{
10 return !setlocale(LC_CTYPE, "");
11}
diff --git a/scripts/docproc.c b/scripts/docproc.c
deleted file mode 100644
index 0a12593b9041..000000000000
--- a/scripts/docproc.c
+++ /dev/null
@@ -1,681 +0,0 @@
1/*
2 * docproc is a simple preprocessor for the template files
3 * used as placeholders for the kernel internal documentation.
4 * docproc is used for documentation-frontend and
5 * dependency-generator.
6 * The two usages have in common that they require
7 * some knowledge of the .tmpl syntax, therefore they
8 * are kept together.
9 *
10 * documentation-frontend
11 * Scans the template file and call kernel-doc for
12 * all occurrences of ![EIF]file
13 * Beforehand each referenced file is scanned for
14 * any symbols that are exported via these macros:
15 * EXPORT_SYMBOL(), EXPORT_SYMBOL_GPL(), &
16 * EXPORT_SYMBOL_GPL_FUTURE()
17 * This is used to create proper -function and
18 * -nofunction arguments in calls to kernel-doc.
19 * Usage: docproc doc file.tmpl
20 *
21 * dependency-generator:
22 * Scans the template file and list all files
23 * referenced in a format recognized by make.
24 * Usage: docproc depend file.tmpl
25 * Writes dependency information to stdout
26 * in the following format:
27 * file.tmpl src.c src2.c
28 * The filenames are obtained from the following constructs:
29 * !Efilename
30 * !Ifilename
31 * !Dfilename
32 * !Ffilename
33 * !Pfilename
34 *
35 */
36
37#define _GNU_SOURCE
38#include <stdio.h>
39#include <stdlib.h>
40#include <string.h>
41#include <ctype.h>
42#include <unistd.h>
43#include <limits.h>
44#include <errno.h>
45#include <getopt.h>
46#include <sys/types.h>
47#include <sys/wait.h>
48#include <time.h>
49
50/* exitstatus is used to keep track of any failing calls to kernel-doc,
51 * but execution continues. */
52int exitstatus = 0;
53
54typedef void DFL(char *);
55DFL *defaultline;
56
57typedef void FILEONLY(char * file);
58FILEONLY *internalfunctions;
59FILEONLY *externalfunctions;
60FILEONLY *symbolsonly;
61FILEONLY *findall;
62
63typedef void FILELINE(char * file, char * line);
64FILELINE * singlefunctions;
65FILELINE * entity_system;
66FILELINE * docsection;
67
68#define MAXLINESZ 2048
69#define MAXFILES 250
70#define KERNELDOCPATH "scripts/"
71#define KERNELDOC "kernel-doc"
72#define DOCBOOK "-docbook"
73#define RST "-rst"
74#define LIST "-list"
75#define FUNCTION "-function"
76#define NOFUNCTION "-nofunction"
77#define NODOCSECTIONS "-no-doc-sections"
78#define SHOWNOTFOUND "-show-not-found"
79
80enum file_format {
81 FORMAT_AUTO,
82 FORMAT_DOCBOOK,
83 FORMAT_RST,
84};
85
86static enum file_format file_format = FORMAT_AUTO;
87
88#define KERNELDOC_FORMAT (file_format == FORMAT_RST ? RST : DOCBOOK)
89
90static char *srctree, *kernsrctree;
91
92static char **all_list = NULL;
93static int all_list_len = 0;
94
95static void consume_symbol(const char *sym)
96{
97 int i;
98
99 for (i = 0; i < all_list_len; i++) {
100 if (!all_list[i])
101 continue;
102 if (strcmp(sym, all_list[i]))
103 continue;
104 all_list[i] = NULL;
105 break;
106 }
107}
108
109static void usage (void)
110{
111 fprintf(stderr, "Usage: docproc [{--docbook|--rst}] {doc|depend} file\n");
112 fprintf(stderr, "Input is read from file.tmpl. Output is sent to stdout\n");
113 fprintf(stderr, "doc: frontend when generating kernel documentation\n");
114 fprintf(stderr, "depend: generate list of files referenced within file\n");
115 fprintf(stderr, "Environment variable SRCTREE: absolute path to sources.\n");
116 fprintf(stderr, " KBUILD_SRC: absolute path to kernel source tree.\n");
117}
118
119/*
120 * Execute kernel-doc with parameters given in svec
121 */
122static void exec_kernel_doc(char **svec)
123{
124 pid_t pid;
125 int ret;
126 char real_filename[PATH_MAX + 1];
127 /* Make sure output generated so far are flushed */
128 fflush(stdout);
129 switch (pid=fork()) {
130 case -1:
131 perror("fork");
132 exit(1);
133 case 0:
134 memset(real_filename, 0, sizeof(real_filename));
135 strncat(real_filename, kernsrctree, PATH_MAX);
136 strncat(real_filename, "/" KERNELDOCPATH KERNELDOC,
137 PATH_MAX - strlen(real_filename));
138 execvp(real_filename, svec);
139 fprintf(stderr, "exec ");
140 perror(real_filename);
141 exit(1);
142 default:
143 waitpid(pid, &ret ,0);
144 }
145 if (WIFEXITED(ret))
146 exitstatus |= WEXITSTATUS(ret);
147 else
148 exitstatus = 0xff;
149}
150
151/* Types used to create list of all exported symbols in a number of files */
152struct symbols
153{
154 char *name;
155};
156
157struct symfile
158{
159 char *filename;
160 struct symbols *symbollist;
161 int symbolcnt;
162};
163
164struct symfile symfilelist[MAXFILES];
165int symfilecnt = 0;
166
167static void add_new_symbol(struct symfile *sym, char * symname)
168{
169 sym->symbollist =
170 realloc(sym->symbollist, (sym->symbolcnt + 1) * sizeof(char *));
171 sym->symbollist[sym->symbolcnt++].name = strdup(symname);
172}
173
174/* Add a filename to the list */
175static struct symfile * add_new_file(char * filename)
176{
177 symfilelist[symfilecnt++].filename = strdup(filename);
178 return &symfilelist[symfilecnt - 1];
179}
180
181/* Check if file already are present in the list */
182static struct symfile * filename_exist(char * filename)
183{
184 int i;
185 for (i=0; i < symfilecnt; i++)
186 if (strcmp(symfilelist[i].filename, filename) == 0)
187 return &symfilelist[i];
188 return NULL;
189}
190
191/*
192 * List all files referenced within the template file.
193 * Files are separated by tabs.
194 */
195static void adddep(char * file) { printf("\t%s", file); }
196static void adddep2(char * file, char * line) { line = line; adddep(file); }
197static void noaction(char * line) { line = line; }
198static void noaction2(char * file, char * line) { file = file; line = line; }
199
200/* Echo the line without further action */
201static void printline(char * line) { printf("%s", line); }
202
203/*
204 * Find all symbols in filename that are exported with EXPORT_SYMBOL &
205 * EXPORT_SYMBOL_GPL (& EXPORT_SYMBOL_GPL_FUTURE implicitly).
206 * All symbols located are stored in symfilelist.
207 */
208static void find_export_symbols(char * filename)
209{
210 FILE * fp;
211 struct symfile *sym;
212 char line[MAXLINESZ];
213 if (filename_exist(filename) == NULL) {
214 char real_filename[PATH_MAX + 1];
215 memset(real_filename, 0, sizeof(real_filename));
216 strncat(real_filename, srctree, PATH_MAX);
217 strncat(real_filename, "/", PATH_MAX - strlen(real_filename));
218 strncat(real_filename, filename,
219 PATH_MAX - strlen(real_filename));
220 sym = add_new_file(filename);
221 fp = fopen(real_filename, "r");
222 if (fp == NULL) {
223 fprintf(stderr, "docproc: ");
224 perror(real_filename);
225 exit(1);
226 }
227 while (fgets(line, MAXLINESZ, fp)) {
228 char *p;
229 char *e;
230 if (((p = strstr(line, "EXPORT_SYMBOL_GPL")) != NULL) ||
231 ((p = strstr(line, "EXPORT_SYMBOL")) != NULL)) {
232 /* Skip EXPORT_SYMBOL{_GPL} */
233 while (isalnum(*p) || *p == '_')
234 p++;
235 /* Remove parentheses & additional whitespace */
236 while (isspace(*p))
237 p++;
238 if (*p != '(')
239 continue; /* Syntax error? */
240 else
241 p++;
242 while (isspace(*p))
243 p++;
244 e = p;
245 while (isalnum(*e) || *e == '_')
246 e++;
247 *e = '\0';
248 add_new_symbol(sym, p);
249 }
250 }
251 fclose(fp);
252 }
253}
254
255/*
256 * Document all external or internal functions in a file.
257 * Call kernel-doc with following parameters:
258 * kernel-doc [-docbook|-rst] -nofunction function_name1 filename
259 * Function names are obtained from all the src files
260 * by find_export_symbols.
261 * intfunc uses -nofunction
262 * extfunc uses -function
263 */
264static void docfunctions(char * filename, char * type)
265{
266 int i,j;
267 int symcnt = 0;
268 int idx = 0;
269 char **vec;
270
271 for (i=0; i <= symfilecnt; i++)
272 symcnt += symfilelist[i].symbolcnt;
273 vec = malloc((2 + 2 * symcnt + 3) * sizeof(char *));
274 if (vec == NULL) {
275 perror("docproc: ");
276 exit(1);
277 }
278 vec[idx++] = KERNELDOC;
279 vec[idx++] = KERNELDOC_FORMAT;
280 vec[idx++] = NODOCSECTIONS;
281 for (i=0; i < symfilecnt; i++) {
282 struct symfile * sym = &symfilelist[i];
283 for (j=0; j < sym->symbolcnt; j++) {
284 vec[idx++] = type;
285 consume_symbol(sym->symbollist[j].name);
286 vec[idx++] = sym->symbollist[j].name;
287 }
288 }
289 vec[idx++] = filename;
290 vec[idx] = NULL;
291 if (file_format == FORMAT_RST)
292 printf(".. %s\n", filename);
293 else
294 printf("<!-- %s -->\n", filename);
295 exec_kernel_doc(vec);
296 fflush(stdout);
297 free(vec);
298}
299static void intfunc(char * filename) { docfunctions(filename, NOFUNCTION); }
300static void extfunc(char * filename) { docfunctions(filename, FUNCTION); }
301
302/*
303 * Document specific function(s) in a file.
304 * Call kernel-doc with the following parameters:
305 * kernel-doc -docbook -function function1 [-function function2]
306 */
307static void singfunc(char * filename, char * line)
308{
309 char *vec[200]; /* Enough for specific functions */
310 int i, idx = 0;
311 int startofsym = 1;
312 vec[idx++] = KERNELDOC;
313 vec[idx++] = KERNELDOC_FORMAT;
314 vec[idx++] = SHOWNOTFOUND;
315
316 /* Split line up in individual parameters preceded by FUNCTION */
317 for (i=0; line[i]; i++) {
318 if (isspace(line[i])) {
319 line[i] = '\0';
320 startofsym = 1;
321 continue;
322 }
323 if (startofsym) {
324 startofsym = 0;
325 vec[idx++] = FUNCTION;
326 vec[idx++] = &line[i];
327 }
328 }
329 for (i = 0; i < idx; i++) {
330 if (strcmp(vec[i], FUNCTION))
331 continue;
332 consume_symbol(vec[i + 1]);
333 }
334 vec[idx++] = filename;
335 vec[idx] = NULL;
336 exec_kernel_doc(vec);
337}
338
339/*
340 * Insert specific documentation section from a file.
341 * Call kernel-doc with the following parameters:
342 * kernel-doc -docbook -function "doc section" filename
343 */
344static void docsect(char *filename, char *line)
345{
346 /* kerneldoc -docbook -show-not-found -function "section" file NULL */
347 char *vec[7];
348 char *s;
349
350 for (s = line; *s; s++)
351 if (*s == '\n')
352 *s = '\0';
353
354 if (asprintf(&s, "DOC: %s", line) < 0) {
355 perror("asprintf");
356 exit(1);
357 }
358 consume_symbol(s);
359 free(s);
360
361 vec[0] = KERNELDOC;
362 vec[1] = KERNELDOC_FORMAT;
363 vec[2] = SHOWNOTFOUND;
364 vec[3] = FUNCTION;
365 vec[4] = line;
366 vec[5] = filename;
367 vec[6] = NULL;
368 exec_kernel_doc(vec);
369}
370
371static void find_all_symbols(char *filename)
372{
373 char *vec[4]; /* kerneldoc -list file NULL */
374 pid_t pid;
375 int ret, i, count, start;
376 char real_filename[PATH_MAX + 1];
377 int pipefd[2];
378 char *data, *str;
379 size_t data_len = 0;
380
381 vec[0] = KERNELDOC;
382 vec[1] = LIST;
383 vec[2] = filename;
384 vec[3] = NULL;
385
386 if (pipe(pipefd)) {
387 perror("pipe");
388 exit(1);
389 }
390
391 switch (pid=fork()) {
392 case -1:
393 perror("fork");
394 exit(1);
395 case 0:
396 close(pipefd[0]);
397 dup2(pipefd[1], 1);
398 memset(real_filename, 0, sizeof(real_filename));
399 strncat(real_filename, kernsrctree, PATH_MAX);
400 strncat(real_filename, "/" KERNELDOCPATH KERNELDOC,
401 PATH_MAX - strlen(real_filename));
402 execvp(real_filename, vec);
403 fprintf(stderr, "exec ");
404 perror(real_filename);
405 exit(1);
406 default:
407 close(pipefd[1]);
408 data = malloc(4096);
409 do {
410 while ((ret = read(pipefd[0],
411 data + data_len,
412 4096)) > 0) {
413 data_len += ret;
414 data = realloc(data, data_len + 4096);
415 }
416 } while (ret == -EAGAIN);
417 if (ret != 0) {
418 perror("read");
419 exit(1);
420 }
421 waitpid(pid, &ret ,0);
422 }
423 if (WIFEXITED(ret))
424 exitstatus |= WEXITSTATUS(ret);
425 else
426 exitstatus = 0xff;
427
428 count = 0;
429 /* poor man's strtok, but with counting */
430 for (i = 0; i < data_len; i++) {
431 if (data[i] == '\n') {
432 count++;
433 data[i] = '\0';
434 }
435 }
436 start = all_list_len;
437 all_list_len += count;
438 all_list = realloc(all_list, sizeof(char *) * all_list_len);
439 str = data;
440 for (i = 0; i < data_len && start != all_list_len; i++) {
441 if (data[i] == '\0') {
442 all_list[start] = str;
443 str = data + i + 1;
444 start++;
445 }
446 }
447}
448
449/*
450 * Terminate s at first space, if any. If there was a space, return pointer to
451 * the character after that. Otherwise, return pointer to the terminating NUL.
452 */
453static char *chomp(char *s)
454{
455 while (*s && !isspace(*s))
456 s++;
457
458 if (*s)
459 *s++ = '\0';
460
461 return s;
462}
463
464/* Return pointer to directive content, or NULL if not a directive. */
465static char *is_directive(char *line)
466{
467 if (file_format == FORMAT_DOCBOOK && line[0] == '!')
468 return line + 1;
469 else if (file_format == FORMAT_RST && !strncmp(line, ".. !", 4))
470 return line + 4;
471
472 return NULL;
473}
474
475/*
476 * Parse file, calling action specific functions for:
477 * 1) Lines containing !E
478 * 2) Lines containing !I
479 * 3) Lines containing !D
480 * 4) Lines containing !F
481 * 5) Lines containing !P
482 * 6) Lines containing !C
483 * 7) Default lines - lines not matching the above
484 */
485static void parse_file(FILE *infile)
486{
487 char line[MAXLINESZ];
488 char *p, *s;
489 while (fgets(line, MAXLINESZ, infile)) {
490 p = is_directive(line);
491 if (!p) {
492 defaultline(line);
493 continue;
494 }
495
496 switch (*p++) {
497 case 'E':
498 chomp(p);
499 externalfunctions(p);
500 break;
501 case 'I':
502 chomp(p);
503 internalfunctions(p);
504 break;
505 case 'D':
506 chomp(p);
507 symbolsonly(p);
508 break;
509 case 'F':
510 /* filename */
511 s = chomp(p);
512 /* function names */
513 while (isspace(*s))
514 s++;
515 singlefunctions(p, s);
516 break;
517 case 'P':
518 /* filename */
519 s = chomp(p);
520 /* DOC: section name */
521 while (isspace(*s))
522 s++;
523 docsection(p, s);
524 break;
525 case 'C':
526 chomp(p);
527 if (findall)
528 findall(p);
529 break;
530 default:
531 defaultline(line);
532 }
533 }
534 fflush(stdout);
535}
536
537/*
538 * Is this a RestructuredText template? Answer the question by seeing if its
539 * name ends in ".rst".
540 */
541static int is_rst(const char *file)
542{
543 char *dot = strrchr(file, '.');
544
545 return dot && !strcmp(dot + 1, "rst");
546}
547
548enum opts {
549 OPT_DOCBOOK,
550 OPT_RST,
551 OPT_HELP,
552};
553
554int main(int argc, char *argv[])
555{
556 const char *subcommand, *filename;
557 FILE * infile;
558 int i;
559
560 srctree = getenv("SRCTREE");
561 if (!srctree)
562 srctree = getcwd(NULL, 0);
563 kernsrctree = getenv("KBUILD_SRC");
564 if (!kernsrctree || !*kernsrctree)
565 kernsrctree = srctree;
566
567 for (;;) {
568 int c;
569 struct option opts[] = {
570 { "docbook", no_argument, NULL, OPT_DOCBOOK },
571 { "rst", no_argument, NULL, OPT_RST },
572 { "help", no_argument, NULL, OPT_HELP },
573 {}
574 };
575
576 c = getopt_long_only(argc, argv, "", opts, NULL);
577 if (c == -1)
578 break;
579
580 switch (c) {
581 case OPT_DOCBOOK:
582 file_format = FORMAT_DOCBOOK;
583 break;
584 case OPT_RST:
585 file_format = FORMAT_RST;
586 break;
587 case OPT_HELP:
588 usage();
589 return 0;
590 default:
591 case '?':
592 usage();
593 return 1;
594 }
595 }
596
597 argc -= optind;
598 argv += optind;
599
600 if (argc != 2) {
601 usage();
602 exit(1);
603 }
604
605 subcommand = argv[0];
606 filename = argv[1];
607
608 if (file_format == FORMAT_AUTO)
609 file_format = is_rst(filename) ? FORMAT_RST : FORMAT_DOCBOOK;
610
611 /* Open file, exit on error */
612 infile = fopen(filename, "r");
613 if (infile == NULL) {
614 fprintf(stderr, "docproc: ");
615 perror(filename);
616 exit(2);
617 }
618
619 if (strcmp("doc", subcommand) == 0) {
620 if (file_format == FORMAT_RST) {
621 time_t t = time(NULL);
622 printf(".. generated from %s by docproc %s\n",
623 filename, ctime(&t));
624 }
625
626 /* Need to do this in two passes.
627 * First pass is used to collect all symbols exported
628 * in the various files;
629 * Second pass generate the documentation.
630 * This is required because some functions are declared
631 * and exported in different files :-((
632 */
633 /* Collect symbols */
634 defaultline = noaction;
635 internalfunctions = find_export_symbols;
636 externalfunctions = find_export_symbols;
637 symbolsonly = find_export_symbols;
638 singlefunctions = noaction2;
639 docsection = noaction2;
640 findall = find_all_symbols;
641 parse_file(infile);
642
643 /* Rewind to start from beginning of file again */
644 fseek(infile, 0, SEEK_SET);
645 defaultline = printline;
646 internalfunctions = intfunc;
647 externalfunctions = extfunc;
648 symbolsonly = printline;
649 singlefunctions = singfunc;
650 docsection = docsect;
651 findall = NULL;
652
653 parse_file(infile);
654
655 for (i = 0; i < all_list_len; i++) {
656 if (!all_list[i])
657 continue;
658 fprintf(stderr, "Warning: didn't use docs for %s\n",
659 all_list[i]);
660 }
661 } else if (strcmp("depend", subcommand) == 0) {
662 /* Create first part of dependency chain
663 * file.tmpl */
664 printf("%s\t", filename);
665 defaultline = noaction;
666 internalfunctions = adddep;
667 externalfunctions = adddep;
668 symbolsonly = adddep;
669 singlefunctions = adddep2;
670 docsection = adddep2;
671 findall = adddep;
672 parse_file(infile);
673 printf("\n");
674 } else {
675 fprintf(stderr, "Unknown option: %s\n", subcommand);
676 exit(1);
677 }
678 fclose(infile);
679 fflush(stdout);
680 return exitstatus;
681}
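The two-pass scheme described in the comment above (collect exported symbols first, rewind, then generate output) can be sketched in isolation. This is a minimal illustration, not docproc's code: the `export NAME` / `use NAME` line format, the handler type, and every function name below are hypothetical stand-ins for the template directives and hook pointers that docproc swaps between passes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-line hook; docproc installs several such pointers. */
typedef void (*line_handler)(const char *line);

#define MAX_SYMS 16
static char symbols[MAX_SYMS][64];	/* names collected by pass one */
static int nsymbols;
static int nresolved;			/* uses resolved by pass two */

/* Pass one: remember every "export NAME" line (invented syntax). */
static void collect_symbol(const char *line)
{
	char name[64];

	if (sscanf(line, "export %63s", name) == 1 && nsymbols < MAX_SYMS)
		strcpy(symbols[nsymbols++], name);
}

/* Pass two: count "use NAME" lines whose NAME was exported anywhere. */
static void resolve_use(const char *line)
{
	char name[64];
	int i;

	if (sscanf(line, "use %63s", name) != 1)
		return;
	for (i = 0; i < nsymbols; i++) {
		if (strcmp(symbols[i], name) == 0) {
			nresolved++;
			return;
		}
	}
}

/* Feed each line of the stream to the currently installed handler. */
static void parse_stream(FILE *in, line_handler handler)
{
	char line[256];

	while (fgets(line, sizeof(line), in))
		handler(line);
}

/* Run both passes over one stream; returns the number of resolved uses. */
int two_pass(FILE *in)
{
	assert(in != NULL);
	nsymbols = 0;
	nresolved = 0;

	parse_stream(in, collect_symbol);	/* pass 1: collect only */
	fseek(in, 0, SEEK_SET);			/* rewind to the start */
	parse_stream(in, resolve_use);		/* pass 2: act on what was found */
	return nresolved;
}
```

A use that appears before its matching export is still resolved, because the second pass starts only after the whole stream has been scanned once. That is the reason the comment gives for needing two passes.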
diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index a26a5f2dce39..c1ffd31ff423 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -2189,6 +2189,8 @@ sub dump_struct($$) {
 	$members =~ s/\s*CRYPTO_MINALIGN_ATTR//gos;
 	# replace DECLARE_BITMAP
 	$members =~ s/DECLARE_BITMAP\s*\(([^,)]+), ([^,)]+)\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos;
+	# replace DECLARE_HASHTABLE
+	$members =~ s/DECLARE_HASHTABLE\s*\(([^,)]+), ([^,)]+)\)/unsigned long $1\[1 << (($2) - 1)\]/gos;
 
 	create_parameterlist($members, ';', $file);
 	check_sections($file, $declaration_name, "struct", $sectcheck, $struct_actual, $nested);
diff --git a/scripts/kernel-doc-xml-ref b/scripts/kernel-doc-xml-ref
deleted file mode 100755
index 104a5a5ba2c8..000000000000
--- a/scripts/kernel-doc-xml-ref
+++ /dev/null
@@ -1,198 +0,0 @@
1#!/usr/bin/perl -w
2
3use strict;
4
5## Copyright (C) 2015 Intel Corporation ##
6# ##
7## This software falls under the GNU General Public License. ##
8## Please read the COPYING file for more information ##
9#
10#
11# This software reads a XML file and a list of valid interal
12# references to replace Docbook tags with links.
13#
14# The list of "valid internal references" must be one-per-line in the following format:
15# API-struct-foo
16# API-enum-bar
17# API-my-function
18#
19# The software walks over the XML file looking for xml tags representing possible references
20# to the Document. Each reference will be cross checked against the "Valid Internal Reference" list. If
21# the referece is found it replaces its content by a <link> tag.
22#
23# usage:
24# kernel-doc-xml-ref -db filename
25# xml filename > outputfile
26
27# read arguments
28if ($#ARGV != 2) {
29 usage();
30}
31
32#Holds the database filename
33my $databasefile;
34my @database;
35
36#holds the inputfile
37my $inputfile;
38my $errors = 0;
39
40my %highlights = (
41 "<function>(.*?)</function>",
42 "\"<function>\" . convert_function(\$1, \$line) . \"</function>\"",
43 "<structname>(.*?)</structname>",
44 "\"<structname>\" . convert_struct(\$1) . \"</structname>\"",
45 "<funcdef>(.*?)<function>(.*?)</function></funcdef>",
46 "\"<funcdef>\" . convert_param(\$1) . \"<function>\$2</function></funcdef>\"",
47 "<paramdef>(.*?)<parameter>(.*?)</parameter></paramdef>",
48 "\"<paramdef>\" . convert_param(\$1) . \"<parameter>\$2</parameter></paramdef>\"");
49
50while($ARGV[0] =~ m/^-(.*)/) {
51 my $cmd = shift @ARGV;
52 if ($cmd eq "-db") {
53 $databasefile = shift @ARGV
54 } else {
55 usage();
56 }
57}
58$inputfile = shift @ARGV;
59
60sub open_database {
61 open (my $handle, '<', $databasefile) or die "Cannot open $databasefile";
62 chomp(my @lines = <$handle>);
63 close $handle;
64
65 @database = @lines;
66}
67
68sub process_file {
69 open_database();
70
71 my $dohighlight;
72 foreach my $pattern (keys %highlights) {
73 $dohighlight .= "\$line =~ s:$pattern:$highlights{$pattern}:eg;\n";
74 }
75
76 open(FILE, $inputfile) or die("Could not open $inputfile") or die ("Cannot open $inputfile");
77 foreach my $line (<FILE>) {
78 eval $dohighlight;
79 print $line;
80 }
81}
82
83sub trim($_)
84{
85 my $str = $_[0];
86 $str =~ s/^\s+|\s+$//g;
87 return $str
88}
89
90sub has_key_defined($_)
91{
92 if ( grep( /^$_[0]$/, @database)) {
93 return 1;
94 }
95 return 0;
96}
97
98# Gets a <function> content and add it a hyperlink if possible.
99sub convert_function($_)
100{
101 my $arg = $_[0];
102 my $key = $_[0];
103
104 my $line = $_[1];
105
106 $key = trim($key);
107
108 $key =~ s/[^A-Za-z0-9]/-/g;
109 $key = "API-" . $key;
110
111 # We shouldn't add links to <funcdef> prototype
112 if (!has_key_defined($key) || $line =~ m/\s+<funcdef/i) {
113 return $arg;
114 }
115
116 my $head = $arg;
117 my $tail = "";
118 if ($arg =~ /(.*?)( ?)$/) {
119 $head = $1;
120 $tail = $2;
121 }
122 return "<link linkend=\"$key\">$head</link>$tail";
123}
124
125# Converting a struct text to link
126sub convert_struct($_)
127{
128 my $arg = $_[0];
129 my $key = $_[0];
130 $key =~ s/(struct )?(\w)/$2/g;
131 $key =~ s/[^A-Za-z0-9]/-/g;
132 $key = "API-struct-" . $key;
133
134 if (!has_key_defined($key)) {
135 return $arg;
136 }
137
138 my ($head, $tail) = split_pointer($arg);
139 return "<link linkend=\"$key\">$head</link>$tail";
140}
141
142# Identify "object *" elements
143sub split_pointer($_)
144{
145 my $arg = $_[0];
146 if ($arg =~ /(.*?)( ?\* ?)/) {
147 return ($1, $2);
148 }
149 return ($arg, "");
150}
151
152sub convert_param($_)
153{
154 my $type = $_[0];
155 my $keyname = convert_key_name($type);
156
157 if (!has_key_defined($keyname)) {
158 return $type;
159 }
160
161 my ($head, $tail) = split_pointer($type);
162 return "<link linkend=\"$keyname\">$head</link>$tail";
163
164}
165
166# DocBook links are in the API-<TYPE>-<STRUCT-NAME> format
167# This method gets an element and returns a valid DocBook reference for it.
168sub convert_key_name($_)
169{
170 #Pattern $2 is optional and might be uninitialized
171 no warnings 'uninitialized';
172
173 my $str = $_[0];
174 $str =~ s/(const|static)? ?(struct)? ?([a-zA-Z0-9_]+) ?(\*|&)?/$2 $3/g ;
175
176 # trim
177 $str =~ s/^\s+|\s+$//g;
178
179 # spaces and _ to -
180 $str =~ s/[^A-Za-z0-9]/-/g;
181
182 return "API-" . $str;
183}
184
185sub usage {
186 print "Usage: $0 -db database filename\n";
187 print " xml source file(s) > outputfile\n";
188 exit 1;
189}
190
191# starting point
192process_file();
193
194if ($errors) {
195 print STDERR "$errors errors\n";
196}
197
198exit($errors);
diff --git a/scripts/selinux/README b/scripts/selinux/README
index 4d020ecb7524..5ba679c5be18 100644
--- a/scripts/selinux/README
+++ b/scripts/selinux/README
@@ -1,2 +1,2 @@
-Please see Documentation/security/SELinux.txt for information on
+Please see Documentation/admin-guide/LSM/SELinux.rst for information on
 installing a dummy SELinux policy.
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index 960c913381e2..72c604350e80 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -226,7 +226,7 @@ void aa_dfa_free_kref(struct kref *kref)
  * @flags: flags controlling what type of accept tables are acceptable
  *
  * Unpack a dfa that has been serialized. To find information on the dfa
- * format look in Documentation/security/apparmor.txt
+ * format look in Documentation/admin-guide/LSM/apparmor.rst
  * Assumes the dfa @blob stream has been aligned on a 8 byte boundary
  *
  * Returns: an unpacked dfa ready for matching or ERR_PTR on failure
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index f3422a91353c..981d570eebba 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -13,7 +13,7 @@
  * License.
  *
  * AppArmor uses a serialized binary format for loading policy. To find
- * policy format documentation look in Documentation/security/apparmor.txt
+ * policy format documentation see Documentation/admin-guide/LSM/apparmor.rst
  * All policy is validated before it is used.
  */
 
diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index bb6324d1ccec..69855ba0d3b3 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation, version 2 of the License.
  *
- * See Documentation/security/keys-trusted-encrypted.txt
+ * See Documentation/security/keys/trusted-encrypted.rst
  */
 
 #include <linux/uaccess.h>
diff --git a/security/keys/encrypted-keys/masterkey_trusted.c b/security/keys/encrypted-keys/masterkey_trusted.c
index b5b4812dbc87..cbf0bc127a73 100644
--- a/security/keys/encrypted-keys/masterkey_trusted.c
+++ b/security/keys/encrypted-keys/masterkey_trusted.c
@@ -11,7 +11,7 @@
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation, version 2 of the License.
  *
- * See Documentation/security/keys-trusted-encrypted.txt
+ * See Documentation/security/keys/trusted-encrypted.rst
  */
 
 #include <linux/uaccess.h>
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 9822e500d50d..63e63a42db3c 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -8,7 +8,7 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  *
- * See Documentation/security/keys-request-key.txt
+ * See Documentation/security/keys/request-key.rst
  */
 
 #include <linux/module.h>
diff --git a/security/keys/request_key_auth.c b/security/keys/request_key_auth.c
index 0f062156dfb2..afe9d22ab361 100644
--- a/security/keys/request_key_auth.c
+++ b/security/keys/request_key_auth.c
@@ -8,7 +8,7 @@
  * as published by the Free Software Foundation; either version
  * 2 of the License, or (at your option) any later version.
  *
- * See Documentation/security/keys-request-key.txt
+ * See Documentation/security/keys/request-key.rst
  */
 
 #include <linux/module.h>
diff --git a/security/keys/trusted.c b/security/keys/trusted.c
index 435e86e13879..ddfaebf60fc8 100644
--- a/security/keys/trusted.c
+++ b/security/keys/trusted.c
@@ -8,7 +8,7 @@
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation, version 2 of the License.
  *
- * See Documentation/security/keys-trusted-encrypted.txt
+ * See Documentation/security/keys/trusted-encrypted.rst
  */
 
 #include <crypto/hash_info.h>
diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 90c605eea892..96b27405558a 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -7,6 +7,7 @@ config SECURITY_YAMA
 	  system-wide security settings beyond regular Linux discretionary
 	  access controls. Currently available is ptrace scope restriction.
 	  Like capabilities, this security module stacks with other LSMs.
-	  Further information can be found in Documentation/security/Yama.txt.
+	  Further information can be found in
+	  Documentation/admin-guide/LSM/Yama.rst.
 
 	  If you are unsure how to answer this question, answer N.