author    | Linus Torvalds <torvalds@linux-foundation.org> | 2017-07-04 00:13:25 -0400
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2017-07-04 00:13:25 -0400
commit    | 650fc870a2ef35b83397eebd35b8c8df211bff78
tree      | 14a293fa894d0f166aa60f1f5ca672a2bdb312c0
parent    | f4dd029ee0b92b77769a1ac6dce03e829e74763e
parent    | 1cb566ba5634d7593b8b2a0a5c83f1c9e14b2e09
Merge tag 'docs-4.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates"
* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...
181 files changed, 8403 insertions, 12195 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index ed3e5e949fce..f35473f8c630 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -24,8 +24,6 @@ DMA-ISA-LPC.txt
24 | - How to do DMA with ISA (and LPC) devices. | 24 | - How to do DMA with ISA (and LPC) devices. |
25 | DMA-attributes.txt | 25 | DMA-attributes.txt |
26 | - listing of the various possible attributes a DMA region can have | 26 | - listing of the various possible attributes a DMA region can have |
27 | DocBook/ | ||
28 | - directory with DocBook templates etc. for kernel documentation. | ||
29 | EDID/ | 27 | EDID/ |
30 | - directory with info on customizing EDID for broken gfx/displays. | 28 | - directory with info on customizing EDID for broken gfx/displays. |
31 | IPMI.txt | 29 | IPMI.txt |
@@ -40,8 +38,6 @@ Intel-IOMMU.txt
40 | - basic info on the Intel IOMMU virtualization support. | 38 | - basic info on the Intel IOMMU virtualization support. |
41 | Makefile | 39 | Makefile |
42 | - It's not of interest for those who aren't touching the build system. | 40 | - It's not of interest for those who aren't touching the build system. |
43 | Makefile.sphinx | ||
44 | - It's not of interest for those who aren't touching the build system. | ||
45 | PCI/ | 41 | PCI/ |
46 | - info related to PCI drivers. | 42 | - info related to PCI drivers. |
47 | RCU/ | 43 | RCU/ |
@@ -264,6 +260,8 @@ logo.gif
264 | - full colour GIF image of Linux logo (penguin - Tux). | 260 | - full colour GIF image of Linux logo (penguin - Tux). |
265 | logo.txt | 261 | logo.txt |
266 | - info on creator of above logo & site to get additional images from. | 262 | - info on creator of above logo & site to get additional images from. |
263 | lsm.txt | ||
264 | - Linux Security Modules: General Security Hooks for Linux | ||
267 | lzo.txt | 265 | lzo.txt |
268 | - kernel LZO decompressor input formats | 266 | - kernel LZO decompressor input formats |
269 | m68k/ | 267 | m68k/ |
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 6b20128fab8a..71200dfa0922 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -692,7 +692,7 @@ of preallocated entries is defined per architecture. If it is too low for you
692 | boot with 'dma_debug_entries=<your_desired_number>' to overwrite the | 692 | boot with 'dma_debug_entries=<your_desired_number>' to overwrite the |
693 | architectural default. | 693 | architectural default. |
694 | 694 | ||
695 | void debug_dmap_mapping_error(struct device *dev, dma_addr_t dma_addr); | 695 | void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); |
696 | 696 | ||
697 | dma-debug interface debug_dma_mapping_error() to debug drivers that fail | 697 | dma-debug interface debug_dma_mapping_error() to debug drivers that fail |
698 | to check DMA mapping errors on addresses returned by dma_map_single() and | 698 | to check DMA mapping errors on addresses returned by dma_map_single() and |
diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore
deleted file mode 100644
index e05da3f7aa21..000000000000
--- a/Documentation/DocBook/.gitignore
+++ /dev/null
@@ -1,17 +0,0 @@
1 | *.xml | ||
2 | *.ps | ||
3 | |||
4 | *.html | ||
5 | *.9.gz | ||
6 | *.9 | ||
7 | *.aux | ||
8 | *.dvi | ||
9 | *.log | ||
10 | *.out | ||
11 | *.png | ||
12 | *.gif | ||
13 | *.svg | ||
14 | *.proc | ||
15 | *.db | ||
16 | media-indices.tmpl | ||
17 | media-entities.tmpl | ||
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
deleted file mode 100644
index 85916f13d330..000000000000
--- a/Documentation/DocBook/Makefile
+++ /dev/null
@@ -1,282 +0,0 @@
1 | ### | ||
2 | # This makefile is used to generate the kernel documentation, | ||
3 | # primarily based on in-line comments in various source files. | ||
4 | # See Documentation/kernel-doc-nano-HOWTO.txt for instruction in how | ||
5 | # to document the SRC - and how to read it. | ||
6 | # To add a new book the only step required is to add the book to the | ||
7 | # list of DOCBOOKS. | ||
8 | |||
9 | DOCBOOKS := z8530book.xml \ | ||
10 | kernel-hacking.xml kernel-locking.xml \ | ||
11 | networking.xml \ | ||
12 | filesystems.xml lsm.xml kgdb.xml \ | ||
13 | libata.xml mtdnand.xml librs.xml rapidio.xml \ | ||
14 | s390-drivers.xml scsi.xml \ | ||
15 | sh.xml w1.xml | ||
16 | |||
17 | ifeq ($(DOCBOOKS),) | ||
18 | |||
19 | # Skip DocBook build if the user explicitly requested no DOCBOOKS. | ||
20 | .DEFAULT: | ||
21 | @echo " SKIP DocBook $@ target (DOCBOOKS=\"\" specified)." | ||
22 | else | ||
23 | ifneq ($(SPHINXDIRS),) | ||
24 | |||
25 | # Skip DocBook build if the user explicitly requested a sphinx dir | ||
26 | .DEFAULT: | ||
27 | @echo " SKIP DocBook $@ target (SPHINXDIRS specified)." | ||
28 | else | ||
29 | |||
30 | |||
31 | ### | ||
32 | # The build process is as follows (targets): | ||
33 | # (xmldocs) [by docproc] | ||
34 | # file.tmpl --> file.xml +--> file.ps (psdocs) [by db2ps or xmlto] | ||
35 | # +--> file.pdf (pdfdocs) [by db2pdf or xmlto] | ||
36 | # +--> DIR=file (htmldocs) [by xmlto] | ||
37 | # +--> man/ (mandocs) [by xmlto] | ||
38 | |||
39 | |||
40 | # for PDF and PS output you can choose between xmlto and docbook-utils tools | ||
41 | PDF_METHOD = $(prefer-db2x) | ||
42 | PS_METHOD = $(prefer-db2x) | ||
43 | |||
44 | |||
45 | targets += $(DOCBOOKS) | ||
46 | BOOKS := $(addprefix $(obj)/,$(DOCBOOKS)) | ||
47 | xmldocs: $(BOOKS) | ||
48 | sgmldocs: xmldocs | ||
49 | |||
50 | PS := $(patsubst %.xml, %.ps, $(BOOKS)) | ||
51 | psdocs: $(PS) | ||
52 | |||
53 | PDF := $(patsubst %.xml, %.pdf, $(BOOKS)) | ||
54 | pdfdocs: $(PDF) | ||
55 | |||
56 | HTML := $(sort $(patsubst %.xml, %.html, $(BOOKS))) | ||
57 | htmldocs: $(HTML) | ||
58 | $(call cmd,build_main_index) | ||
59 | |||
60 | MAN := $(patsubst %.xml, %.9, $(BOOKS)) | ||
61 | mandocs: $(MAN) | ||
62 | find $(obj)/man -name '*.9' | xargs gzip -nf | ||
63 | |||
64 | # Default location for installed man pages | ||
65 | export INSTALL_MAN_PATH = $(objtree)/usr | ||
66 | |||
67 | installmandocs: mandocs | ||
68 | mkdir -p $(INSTALL_MAN_PATH)/man/man9/ | ||
69 | find $(obj)/man -name '*.9.gz' -printf '%h %f\n' | \ | ||
70 | sort -k 2 -k 1 | uniq -f 1 | sed -e 's: :/:' | \ | ||
71 | xargs install -m 644 -t $(INSTALL_MAN_PATH)/man/man9/ | ||
72 | |||
73 | # no-op for the DocBook toolchain | ||
74 | epubdocs: | ||
75 | latexdocs: | ||
76 | linkcheckdocs: | ||
77 | |||
78 | ### | ||
79 | #External programs used | ||
80 | KERNELDOCXMLREF = $(srctree)/scripts/kernel-doc-xml-ref | ||
81 | KERNELDOC = $(srctree)/scripts/kernel-doc | ||
82 | DOCPROC = $(objtree)/scripts/docproc | ||
83 | CHECK_LC_CTYPE = $(objtree)/scripts/check-lc_ctype | ||
84 | |||
85 | # Use a fixed encoding - UTF-8 if the C library has support built-in | ||
86 | # or ASCII if not | ||
87 | LC_CTYPE := $(call try-run, LC_CTYPE=C.UTF-8 $(CHECK_LC_CTYPE),C.UTF-8,C) | ||
88 | export LC_CTYPE | ||
89 | |||
90 | XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl | ||
91 | XMLTOFLAGS += --skip-validation | ||
92 | |||
93 | ### | ||
94 | # DOCPROC is used for two purposes: | ||
95 | # 1) To generate a dependency list for a .tmpl file | ||
96 | # 2) To preprocess a .tmpl file and call kernel-doc with | ||
97 | # appropriate parameters. | ||
98 | # The following rules are used to generate the .xml documentation | ||
99 | # required to generate the final targets. (ps, pdf, html). | ||
100 | quiet_cmd_docproc = DOCPROC $@ | ||
101 | cmd_docproc = SRCTREE=$(srctree)/ $(DOCPROC) doc $< >$@ | ||
102 | define rule_docproc | ||
103 | set -e; \ | ||
104 | $(if $($(quiet)cmd_$(1)),echo ' $($(quiet)cmd_$(1))';) \ | ||
105 | $(cmd_$(1)); \ | ||
106 | ( \ | ||
107 | echo 'cmd_$@ := $(cmd_$(1))'; \ | ||
108 | echo $@: `SRCTREE=$(srctree) $(DOCPROC) depend $<`; \ | ||
109 | ) > $(dir $@).$(notdir $@).cmd | ||
110 | endef | ||
111 | |||
112 | %.xml: %.tmpl $(KERNELDOC) $(DOCPROC) $(KERNELDOCXMLREF) FORCE | ||
113 | $(call if_changed_rule,docproc) | ||
114 | |||
115 | # Tell kbuild to always build the programs | ||
116 | always := $(hostprogs-y) | ||
117 | |||
118 | notfoundtemplate = echo "*** You have to install docbook-utils or xmlto ***"; \ | ||
119 | exit 1 | ||
120 | db2xtemplate = db2TYPE -o $(dir $@) $< | ||
121 | xmltotemplate = xmlto TYPE $(XMLTOFLAGS) -o $(dir $@) $< | ||
122 | |||
123 | # determine which methods are available | ||
124 | ifeq ($(shell which db2ps >/dev/null 2>&1 && echo found),found) | ||
125 | use-db2x = db2x | ||
126 | prefer-db2x = db2x | ||
127 | else | ||
128 | use-db2x = notfound | ||
129 | prefer-db2x = $(use-xmlto) | ||
130 | endif | ||
131 | ifeq ($(shell which xmlto >/dev/null 2>&1 && echo found),found) | ||
132 | use-xmlto = xmlto | ||
133 | prefer-xmlto = xmlto | ||
134 | else | ||
135 | use-xmlto = notfound | ||
136 | prefer-xmlto = $(use-db2x) | ||
137 | endif | ||
138 | |||
139 | # the commands, generated from the chosen template | ||
140 | quiet_cmd_db2ps = PS $@ | ||
141 | cmd_db2ps = $(subst TYPE,ps, $($(PS_METHOD)template)) | ||
142 | %.ps : %.xml | ||
143 | $(call cmd,db2ps) | ||
144 | |||
145 | quiet_cmd_db2pdf = PDF $@ | ||
146 | cmd_db2pdf = $(subst TYPE,pdf, $($(PDF_METHOD)template)) | ||
147 | %.pdf : %.xml | ||
148 | $(call cmd,db2pdf) | ||
149 | |||
150 | |||
151 | index = index.html | ||
152 | main_idx = $(obj)/$(index) | ||
153 | quiet_cmd_build_main_index = HTML $(main_idx) | ||
154 | cmd_build_main_index = rm -rf $(main_idx); \ | ||
155 | echo '<h1>Linux Kernel HTML Documentation</h1>' >> $(main_idx) && \ | ||
156 | echo '<h2>Kernel Version: $(KERNELVERSION)</h2>' >> $(main_idx) && \ | ||
157 | cat $(HTML) >> $(main_idx) | ||
158 | |||
159 | quiet_cmd_db2html = HTML $@ | ||
160 | cmd_db2html = xmlto html $(XMLTOFLAGS) -o $(patsubst %.html,%,$@) $< && \ | ||
161 | echo '<a HREF="$(patsubst %.html,%,$(notdir $@))/index.html"> \ | ||
162 | $(patsubst %.html,%,$(notdir $@))</a><p>' > $@ | ||
163 | |||
164 | ### | ||
165 | # Rules to create an aux XML and .db, and use them to re-process the DocBook XML | ||
166 | # to fill internal hyperlinks | ||
167 | gen_aux_xml = : | ||
168 | quiet_gen_aux_xml = echo ' XMLREF $@' | ||
169 | silent_gen_aux_xml = : | ||
170 | %.aux.xml: %.xml | ||
171 | @$($(quiet)gen_aux_xml) | ||
172 | @rm -rf $@ | ||
173 | @(cat $< | egrep "^<refentry id" | egrep -o "\".*\"" | cut -f 2 -d \" > $<.db) | ||
174 | @$(KERNELDOCXMLREF) -db $<.db $< > $@ | ||
175 | .PRECIOUS: %.aux.xml | ||
176 | |||
177 | %.html: %.aux.xml | ||
178 | @(which xmlto > /dev/null 2>&1) || \ | ||
179 | (echo "*** You need to install xmlto ***"; \ | ||
180 | exit 1) | ||
181 | @rm -rf $@ $(patsubst %.html,%,$@) | ||
182 | $(call cmd,db2html) | ||
183 | @if [ ! -z "$(PNG-$(basename $(notdir $@)))" ]; then \ | ||
184 | cp $(PNG-$(basename $(notdir $@))) $(patsubst %.html,%,$@); fi | ||
185 | |||
186 | quiet_cmd_db2man = MAN $@ | ||
187 | cmd_db2man = if grep -q refentry $<; then xmlto man $(XMLTOFLAGS) -o $(obj)/man/$(*F) $< ; fi | ||
188 | %.9 : %.xml | ||
189 | @(which xmlto > /dev/null 2>&1) || \ | ||
190 | (echo "*** You need to install xmlto ***"; \ | ||
191 | exit 1) | ||
192 | $(Q)mkdir -p $(obj)/man/$(*F) | ||
193 | $(call cmd,db2man) | ||
194 | @touch $@ | ||
195 | |||
196 | ### | ||
197 | # Rules to generate postscripts and PNG images from .fig format files | ||
198 | quiet_cmd_fig2eps = FIG2EPS $@ | ||
199 | cmd_fig2eps = fig2dev -Leps $< $@ | ||
200 | |||
201 | %.eps: %.fig | ||
202 | @(which fig2dev > /dev/null 2>&1) || \ | ||
203 | (echo "*** You need to install transfig ***"; \ | ||
204 | exit 1) | ||
205 | $(call cmd,fig2eps) | ||
206 | |||
207 | quiet_cmd_fig2png = FIG2PNG $@ | ||
208 | cmd_fig2png = fig2dev -Lpng $< $@ | ||
209 | |||
210 | %.png: %.fig | ||
211 | @(which fig2dev > /dev/null 2>&1) || \ | ||
212 | (echo "*** You need to install transfig ***"; \ | ||
213 | exit 1) | ||
214 | $(call cmd,fig2png) | ||
215 | |||
216 | ### | ||
217 | # Rule to convert a .c file to inline XML documentation | ||
218 | gen_xml = : | ||
219 | quiet_gen_xml = echo ' GEN $@' | ||
220 | silent_gen_xml = : | ||
221 | %.xml: %.c | ||
222 | @$($(quiet)gen_xml) | ||
223 | @( \ | ||
224 | echo "<programlisting>"; \ | ||
225 | expand --tabs=8 < $< | \ | ||
226 | sed -e "s/&/\\&amp;/g" \ | ||
227 | -e "s/</\\&lt;/g" \ | ||
228 | -e "s/>/\\&gt;/g"; \ | ||
229 | echo "</programlisting>") > $@ | ||
230 | |||
231 | endif # DOCBOOKS="" | ||
232 | endif # SPHINDIR=... | ||
233 | |||
234 | ### | ||
235 | # Help targets as used by the top-level makefile | ||
236 | dochelp: | ||
237 | @echo ' Linux kernel internal documentation in different formats (DocBook):' | ||
238 | @echo ' htmldocs - HTML' | ||
239 | @echo ' pdfdocs - PDF' | ||
240 | @echo ' psdocs - Postscript' | ||
241 | @echo ' xmldocs - XML DocBook' | ||
242 | @echo ' mandocs - man pages' | ||
243 | @echo ' installmandocs - install man pages generated by mandocs to INSTALL_MAN_PATH'; \ | ||
244 | echo ' (default: $(INSTALL_MAN_PATH))'; \ | ||
245 | echo '' | ||
246 | @echo ' cleandocs - clean all generated DocBook files' | ||
247 | @echo | ||
248 | @echo ' make DOCBOOKS="s1.xml s2.xml" [target] Generate only docs s1.xml s2.xml' | ||
249 | @echo ' valid values for DOCBOOKS are: $(DOCBOOKS)' | ||
250 | @echo | ||
251 | @echo " make DOCBOOKS=\"\" [target] Don't generate docs from Docbook" | ||
252 | @echo ' This is useful to generate only the ReST docs (Sphinx)' | ||
253 | |||
254 | |||
255 | ### | ||
256 | # Temporary files left by various tools | ||
257 | clean-files := $(DOCBOOKS) \ | ||
258 | $(patsubst %.xml, %.dvi, $(DOCBOOKS)) \ | ||
259 | $(patsubst %.xml, %.aux, $(DOCBOOKS)) \ | ||
260 | $(patsubst %.xml, %.tex, $(DOCBOOKS)) \ | ||
261 | $(patsubst %.xml, %.log, $(DOCBOOKS)) \ | ||
262 | $(patsubst %.xml, %.out, $(DOCBOOKS)) \ | ||
263 | $(patsubst %.xml, %.ps, $(DOCBOOKS)) \ | ||
264 | $(patsubst %.xml, %.pdf, $(DOCBOOKS)) \ | ||
265 | $(patsubst %.xml, %.html, $(DOCBOOKS)) \ | ||
266 | $(patsubst %.xml, %.9, $(DOCBOOKS)) \ | ||
267 | $(patsubst %.xml, %.aux.xml, $(DOCBOOKS)) \ | ||
268 | $(patsubst %.xml, %.xml.db, $(DOCBOOKS)) \ | ||
269 | $(patsubst %.xml, %.xml, $(DOCBOOKS)) \ | ||
270 | $(patsubst %.xml, .%.xml.cmd, $(DOCBOOKS)) \ | ||
271 | $(index) | ||
272 | |||
273 | clean-dirs := $(patsubst %.xml,%,$(DOCBOOKS)) man | ||
274 | |||
275 | cleandocs: | ||
276 | $(Q)rm -f $(call objectify, $(clean-files)) | ||
277 | $(Q)rm -rf $(call objectify, $(clean-dirs)) | ||
278 | |||
279 | # Declare the contents of the .PHONY variable as phony. We keep that | ||
280 | # information in a variable so we can use it in if_changed and friends. | ||
281 | |||
282 | .PHONY: $(PHONY) | ||
diff --git a/Documentation/DocBook/filesystems.tmpl b/Documentation/DocBook/filesystems.tmpl
deleted file mode 100644
index 6006b6358c86..000000000000
--- a/Documentation/DocBook/filesystems.tmpl
+++ /dev/null
@@ -1,381 +0,0 @@
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="Linux-filesystems-API"> | ||
6 | <bookinfo> | ||
7 | <title>Linux Filesystems API</title> | ||
8 | |||
9 | <legalnotice> | ||
10 | <para> | ||
11 | This documentation is free software; you can redistribute | ||
12 | it and/or modify it under the terms of the GNU General Public | ||
13 | License as published by the Free Software Foundation; either | ||
14 | version 2 of the License, or (at your option) any later | ||
15 | version. | ||
16 | </para> | ||
17 | |||
18 | <para> | ||
19 | This program is distributed in the hope that it will be | ||
20 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
21 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
22 | See the GNU General Public License for more details. | ||
23 | </para> | ||
24 | |||
25 | <para> | ||
26 | You should have received a copy of the GNU General Public | ||
27 | License along with this program; if not, write to the Free | ||
28 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
29 | MA 02111-1307 USA | ||
30 | </para> | ||
31 | |||
32 | <para> | ||
33 | For more details see the file COPYING in the source | ||
34 | distribution of Linux. | ||
35 | </para> | ||
36 | </legalnotice> | ||
37 | </bookinfo> | ||
38 | |||
39 | <toc></toc> | ||
40 | |||
41 | <chapter id="vfs"> | ||
42 | <title>The Linux VFS</title> | ||
43 | <sect1 id="the_filesystem_types"><title>The Filesystem types</title> | ||
44 | !Iinclude/linux/fs.h | ||
45 | </sect1> | ||
46 | <sect1 id="the_directory_cache"><title>The Directory Cache</title> | ||
47 | !Efs/dcache.c | ||
48 | !Iinclude/linux/dcache.h | ||
49 | </sect1> | ||
50 | <sect1 id="inode_handling"><title>Inode Handling</title> | ||
51 | !Efs/inode.c | ||
52 | !Efs/bad_inode.c | ||
53 | </sect1> | ||
54 | <sect1 id="registration_and_superblocks"><title>Registration and Superblocks</title> | ||
55 | !Efs/super.c | ||
56 | </sect1> | ||
57 | <sect1 id="file_locks"><title>File Locks</title> | ||
58 | !Efs/locks.c | ||
59 | !Ifs/locks.c | ||
60 | </sect1> | ||
61 | <sect1 id="other_functions"><title>Other Functions</title> | ||
62 | !Efs/mpage.c | ||
63 | !Efs/namei.c | ||
64 | !Efs/buffer.c | ||
65 | !Eblock/bio.c | ||
66 | !Efs/seq_file.c | ||
67 | !Efs/filesystems.c | ||
68 | !Efs/fs-writeback.c | ||
69 | !Efs/block_dev.c | ||
70 | </sect1> | ||
71 | </chapter> | ||
72 | |||
73 | <chapter id="proc"> | ||
74 | <title>The proc filesystem</title> | ||
75 | |||
76 | <sect1 id="sysctl_interface"><title>sysctl interface</title> | ||
77 | !Ekernel/sysctl.c | ||
78 | </sect1> | ||
79 | |||
80 | <sect1 id="proc_filesystem_interface"><title>proc filesystem interface</title> | ||
81 | !Ifs/proc/base.c | ||
82 | </sect1> | ||
83 | </chapter> | ||
84 | |||
85 | <chapter id="fs_events"> | ||
86 | <title>Events based on file descriptors</title> | ||
87 | !Efs/eventfd.c | ||
88 | </chapter> | ||
89 | |||
90 | <chapter id="sysfs"> | ||
91 | <title>The Filesystem for Exporting Kernel Objects</title> | ||
92 | !Efs/sysfs/file.c | ||
93 | !Efs/sysfs/symlink.c | ||
94 | </chapter> | ||
95 | |||
96 | <chapter id="debugfs"> | ||
97 | <title>The debugfs filesystem</title> | ||
98 | |||
99 | <sect1 id="debugfs_interface"><title>debugfs interface</title> | ||
100 | !Efs/debugfs/inode.c | ||
101 | !Efs/debugfs/file.c | ||
102 | </sect1> | ||
103 | </chapter> | ||
104 | |||
105 | <chapter id="LinuxJDBAPI"> | ||
106 | <chapterinfo> | ||
107 | <title>The Linux Journalling API</title> | ||
108 | |||
109 | <authorgroup> | ||
110 | <author> | ||
111 | <firstname>Roger</firstname> | ||
112 | <surname>Gammans</surname> | ||
113 | <affiliation> | ||
114 | <address> | ||
115 | <email>rgammans@computer-surgery.co.uk</email> | ||
116 | </address> | ||
117 | </affiliation> | ||
118 | </author> | ||
119 | </authorgroup> | ||
120 | |||
121 | <authorgroup> | ||
122 | <author> | ||
123 | <firstname>Stephen</firstname> | ||
124 | <surname>Tweedie</surname> | ||
125 | <affiliation> | ||
126 | <address> | ||
127 | <email>sct@redhat.com</email> | ||
128 | </address> | ||
129 | </affiliation> | ||
130 | </author> | ||
131 | </authorgroup> | ||
132 | |||
133 | <copyright> | ||
134 | <year>2002</year> | ||
135 | <holder>Roger Gammans</holder> | ||
136 | </copyright> | ||
137 | </chapterinfo> | ||
138 | |||
139 | <title>The Linux Journalling API</title> | ||
140 | |||
141 | <sect1 id="journaling_overview"> | ||
142 | <title>Overview</title> | ||
143 | <sect2 id="journaling_details"> | ||
144 | <title>Details</title> | ||
145 | <para> | ||
146 | The journalling layer is easy to use. You need to | ||
147 | first of all create a journal_t data structure. There are | ||
148 | two calls to do this dependent on how you decide to allocate the physical | ||
149 | media on which the journal resides. The jbd2_journal_init_inode() call | ||
150 | is for journals stored in filesystem inodes, or the jbd2_journal_init_dev() | ||
151 | call can be used for journal stored on a raw device (in a continuous range | ||
152 | of blocks). A journal_t is a typedef for a struct pointer, so when | ||
153 | you are finally finished make sure you call jbd2_journal_destroy() on it | ||
154 | to free up any used kernel memory. | ||
155 | </para> | ||
156 | |||
157 | <para> | ||
158 | Once you have got your journal_t object you need to 'mount' or load the journal | ||
159 | file. The journalling layer expects the space for the journal was already | ||
160 | allocated and initialized properly by the userspace tools. When loading the | ||
161 | journal you must call jbd2_journal_load() to process journal contents. If the | ||
162 | client file system detects the journal contents does not need to be processed | ||
163 | (or even need not have valid contents), it may call jbd2_journal_wipe() to | ||
164 | clear the journal contents before calling jbd2_journal_load(). | ||
165 | </para> | ||
166 | |||
167 | <para> | ||
168 | Note that jbd2_journal_wipe(..,0) calls jbd2_journal_skip_recovery() for you if | ||
169 | it detects any outstanding transactions in the journal and similarly | ||
170 | jbd2_journal_load() will call jbd2_journal_recover() if necessary. I would | ||
171 | advise reading ext4_load_journal() in fs/ext4/super.c for examples on this | ||
172 | stage. | ||
173 | </para> | ||
174 | |||
175 | <para> | ||
176 | Now you can go ahead and start modifying the underlying | ||
177 | filesystem. Almost. | ||
178 | </para> | ||
179 | |||
180 | <para> | ||
181 | |||
182 | You still need to actually journal your filesystem changes, this | ||
183 | is done by wrapping them into transactions. Additionally you | ||
184 | also need to wrap the modification of each of the buffers | ||
185 | with calls to the journal layer, so it knows what the modifications | ||
186 | you are actually making are. To do this use jbd2_journal_start() which | ||
187 | returns a transaction handle. | ||
188 | </para> | ||
189 | |||
190 | <para> | ||
191 | jbd2_journal_start() | ||
192 | and its counterpart jbd2_journal_stop(), which indicates the end of a | ||
193 | transaction are nestable calls, so you can reenter a transaction if necessary, | ||
194 | but remember you must call jbd2_journal_stop() the same number of times as | ||
195 | jbd2_journal_start() before the transaction is completed (or more accurately | ||
196 | leaves the update phase). Ext4/VFS makes use of this feature to simplify | ||
197 | handling of inode dirtying, quota support, etc. | ||
198 | </para> | ||
199 | |||
200 | <para> | ||
201 | Inside each transaction you need to wrap the modifications to the | ||
202 | individual buffers (blocks). Before you start to modify a buffer you | ||
203 | need to call jbd2_journal_get_{create,write,undo}_access() as appropriate, | ||
204 | this allows the journalling layer to copy the unmodified data if it | ||
205 | needs to. After all the buffer may be part of a previously uncommitted | ||
206 | transaction. | ||
207 | At this point you are at last ready to modify a buffer, and once | ||
208 | you are have done so you need to call jbd2_journal_dirty_{meta,}data(). | ||
209 | Or if you've asked for access to a buffer you now know is now longer | ||
210 | required to be pushed back on the device you can call jbd2_journal_forget() | ||
211 | in much the same way as you might have used bforget() in the past. | ||
212 | </para> | ||
213 | |||
214 | <para> | ||
215 | A jbd2_journal_flush() may be called at any time to commit and checkpoint | ||
216 | all your transactions. | ||
217 | </para> | ||
218 | |||
219 | <para> | ||
220 | Then at umount time , in your put_super() you can then call jbd2_journal_destroy() | ||
221 | to clean up your in-core journal object. | ||
222 | </para> | ||
223 | |||
224 | <para> | ||
225 | Unfortunately there a couple of ways the journal layer can cause a deadlock. | ||
226 | The first thing to note is that each task can only have | ||
227 | a single outstanding transaction at any one time, remember nothing | ||
228 | commits until the outermost jbd2_journal_stop(). This means | ||
229 | you must complete the transaction at the end of each file/inode/address | ||
230 | etc. operation you perform, so that the journalling system isn't re-entered | ||
231 | on another journal. Since transactions can't be nested/batched | ||
232 | across differing journals, and another filesystem other than | ||
233 | yours (say ext4) may be modified in a later syscall. | ||
234 | </para> | ||
235 | |||
236 | <para> | ||
237 | The second case to bear in mind is that jbd2_journal_start() can | ||
238 | block if there isn't enough space in the journal for your transaction | ||
239 | (based on the passed nblocks param) - when it blocks it merely(!) needs to | ||
240 | wait for transactions to complete and be committed from other tasks, | ||
241 | so essentially we are waiting for jbd2_journal_stop(). So to avoid | ||
242 | deadlocks you must treat jbd2_journal_start/stop() as if they | ||
243 | were semaphores and include them in your semaphore ordering rules to prevent | ||
244 | deadlocks. Note that jbd2_journal_extend() has similar blocking behaviour to | ||
245 | jbd2_journal_start() so you can deadlock here just as easily as on | ||
246 | jbd2_journal_start(). | ||
247 | </para> | ||
248 | |||
249 | <para> | ||
250 | Try to reserve the right number of blocks the first time. ;-). This will | ||
251 | be the maximum number of blocks you are going to touch in this transaction. | ||
252 | I advise having a look at at least ext4_jbd.h to see the basis on which | ||
253 | ext4 uses to make these decisions. | ||
254 | </para> | ||
255 | |||
256 | <para> | ||
257 | Another wriggle to watch out for is your on-disk block allocation strategy. | ||
258 | Why? Because, if you do a delete, you need to ensure you haven't reused any | ||
259 | of the freed blocks until the transaction freeing these blocks commits. If you | ||
260 | reused these blocks and crash happens, there is no way to restore the contents | ||
261 | of the reallocated blocks at the end of the last fully committed transaction. | ||
262 | |||
263 | One simple way of doing this is to mark blocks as free in internal in-memory | ||
264 | block allocation structures only after the transaction freeing them commits. | ||
265 | Ext4 uses journal commit callback for this purpose. | ||
266 | </para> | ||
267 | |||
268 | <para> | ||
269 | With journal commit callbacks you can ask the journalling layer to call a | ||
270 | callback function when the transaction is finally committed to disk, so that | ||
271 | you can do some of your own management. You ask the journalling layer for | ||
272 | calling the callback by simply setting journal->j_commit_callback function | ||
273 | pointer and that function is called after each transaction commit. You can also | ||
274 | use transaction->t_private_list for attaching entries to a transaction that | ||
275 | need processing when the transaction commits. | ||
276 | </para> | ||
277 | |||
278 | <para> | ||
279 | JBD2 also provides a way to block all transaction updates via | ||
280 | jbd2_journal_{un,}lock_updates(). Ext4 uses this when it wants a window with a | ||
281 | clean and stable fs for a moment. E.g. | ||
282 | </para> | ||
283 | |||
284 | <programlisting> | ||
285 | |||
286 | jbd2_journal_lock_updates() //stop new stuff happening.. | ||
287 | jbd2_journal_flush() // checkpoint everything. | ||
288 | ..do stuff on stable fs | ||
289 | jbd2_journal_unlock_updates() // carry on with filesystem use. | ||
290 | </programlisting> | ||
291 | |||
292 | <para> | ||
293 | The opportunities for abuse and DOS attacks with this should be obvious, | ||
294 | if you allow unprivileged userspace to trigger codepaths containing these | ||
295 | calls. | ||
296 | </para> | ||
297 | |||
298 | </sect2> | ||
299 | |||
300 | <sect2 id="jbd_summary"> | ||
301 | <title>Summary</title> | ||
302 | <para> | ||
303 | Using the journal is a matter of wrapping the different context changes, | ||
304 | being each mount, each modification (transaction) and each changed buffer | ||
305 | to tell the journalling layer about them. | ||
306 | </para> | ||
307 | |||
308 | </sect2> | ||
309 | |||
310 | </sect1> | ||
311 | |||
312 | <sect1 id="data_types"> | ||
313 | <title>Data Types</title> | ||
314 | <para> | ||
315 | The journalling layer uses typedefs to 'hide' the concrete definitions | ||
316 | of the structures used. As a client of the JBD2 layer you can | ||
317 | just rely on the using the pointer as a magic cookie of some sort. | ||
318 | |||
319 | Obviously the hiding is not enforced as this is 'C'. | ||
320 | </para> | ||
321 | <sect2 id="structures"><title>Structures</title> | ||
322 | !Iinclude/linux/jbd2.h | ||
323 | </sect2> | ||
324 | </sect1> | ||
325 | |||
326 | <sect1 id="functions"> | ||
327 | <title>Functions</title> | ||
328 | <para> | ||
329 | The functions here are split into two groups those that | ||
330 | affect a journal as a whole, and those which are used to | ||
331 | manage transactions | ||
332 | </para> | ||
333 | <sect2 id="journal_level"><title>Journal Level</title> | ||
334 | !Efs/jbd2/journal.c | ||
335 | !Ifs/jbd2/recovery.c | ||
336 | </sect2> | ||
337 | <sect2 id="transaction_level"><title>Transasction Level</title> | ||
338 | !Efs/jbd2/transaction.c | ||
339 | </sect2> | ||
340 | </sect1> | ||
341 | <sect1 id="see_also"> | ||
342 | <title>See also</title> | ||
343 | <para> | ||
344 | <citation> | ||
345 | <ulink url="http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz"> | ||
346 | Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen Tweedie | ||
347 | </ulink> | ||
348 | </citation> | ||
349 | </para> | ||
350 | <para> | ||
351 | <citation> | ||
352 | <ulink url="http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html"> | ||
353 | Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen Tweedie | ||
354 | </ulink> | ||
355 | </citation> | ||
356 | </para> | ||
357 | </sect1> | ||
358 | |||
359 | </chapter> | ||
360 | |||
361 | <chapter id="splice"> | ||
362 | <title>splice API</title> | ||
363 | <para> | ||
364 | splice is a method for moving blocks of data around inside the | ||
365 | kernel, without continually transferring them between the kernel | ||
366 | and user space. | ||
367 | </para> | ||
368 | !Ffs/splice.c | ||
369 | </chapter> | ||
370 | |||
371 | <chapter id="pipes"> | ||
372 | <title>pipes API</title> | ||
373 | <para> | ||
374 | Pipe interfaces are all for in-kernel (builtin image) use. | ||
375 | They are not exported for use by modules. | ||
376 | </para> | ||
377 | !Iinclude/linux/pipe_fs_i.h | ||
378 | !Ffs/pipe.c | ||
379 | </chapter> | ||
380 | |||
381 | </book> | ||
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl deleted file mode 100644 index c3c705591532..000000000000 --- a/Documentation/DocBook/kernel-hacking.tmpl +++ /dev/null | |||
@@ -1,1312 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="lk-hacking-guide"> | ||
6 | <bookinfo> | ||
7 | <title>Unreliable Guide To Hacking The Linux Kernel</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Rusty</firstname> | ||
12 | <surname>Russell</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>rusty@rustcorp.com.au</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2005</year> | ||
23 | <holder>Rusty Russell</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License as published by the Free Software Foundation; either | ||
31 | version 2 of the License, or (at your option) any later | ||
32 | version. | ||
33 | </para> | ||
34 | |||
35 | <para> | ||
36 | This program is distributed in the hope that it will be | ||
37 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
38 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
39 | See the GNU General Public License for more details. | ||
40 | </para> | ||
41 | |||
42 | <para> | ||
43 | You should have received a copy of the GNU General Public | ||
44 | License along with this program; if not, write to the Free | ||
45 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
46 | MA 02111-1307 USA | ||
47 | </para> | ||
48 | |||
49 | <para> | ||
50 | For more details see the file COPYING in the source | ||
51 | distribution of Linux. | ||
52 | </para> | ||
53 | </legalnotice> | ||
54 | |||
55 | <releaseinfo> | ||
56 | This is the first release of this document as part of the kernel tarball. | ||
57 | </releaseinfo> | ||
58 | |||
59 | </bookinfo> | ||
60 | |||
61 | <toc></toc> | ||
62 | |||
63 | <chapter id="introduction"> | ||
64 | <title>Introduction</title> | ||
65 | <para> | ||
66 | Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux | ||
67 | Kernel Hacking. This document describes the common routines and | ||
68 | general requirements for kernel code: its goal is to serve as a | ||
69 | primer for Linux kernel development for experienced C | ||
70 | programmers. I avoid implementation details: that's what the | ||
71 | code is for, and I ignore whole tracts of useful routines. | ||
72 | </para> | ||
73 | <para> | ||
74 | Before you read this, please understand that I never wanted to | ||
75 | write this document, being grossly under-qualified, but I always | ||
76 | wanted to read it, and this was the only way. I hope it will | ||
77 | grow into a compendium of best practice, common starting points | ||
78 | and random information. | ||
79 | </para> | ||
80 | </chapter> | ||
81 | |||
82 | <chapter id="basic-players"> | ||
83 | <title>The Players</title> | ||
84 | |||
85 | <para> | ||
86 | At any time each of the CPUs in a system can be: | ||
87 | </para> | ||
88 | |||
89 | <itemizedlist> | ||
90 | <listitem> | ||
91 | <para> | ||
92 | not associated with any process, serving a hardware interrupt; | ||
93 | </para> | ||
94 | </listitem> | ||
95 | |||
96 | <listitem> | ||
97 | <para> | ||
98 | not associated with any process, serving a softirq or tasklet; | ||
99 | </para> | ||
100 | </listitem> | ||
101 | |||
102 | <listitem> | ||
103 | <para> | ||
104 | running in kernel space, associated with a process (user context); | ||
105 | </para> | ||
106 | </listitem> | ||
107 | |||
108 | <listitem> | ||
109 | <para> | ||
110 | running a process in user space. | ||
111 | </para> | ||
112 | </listitem> | ||
113 | </itemizedlist> | ||
114 | |||
115 | <para> | ||
116 | There is an ordering between these. The bottom two can preempt | ||
117 | each other, but above that is a strict hierarchy: each can only be | ||
118 | preempted by the ones above it. For example, while a softirq is | ||
119 | running on a CPU, no other softirq will preempt it, but a hardware | ||
120 | interrupt can. However, any other CPUs in the system execute | ||
121 | independently. | ||
122 | </para> | ||
123 | |||
124 | <para> | ||
125 | We'll see a number of ways that the user context can block | ||
126 | interrupts, to become truly non-preemptible. | ||
127 | </para> | ||
128 | |||
129 | <sect1 id="basics-usercontext"> | ||
130 | <title>User Context</title> | ||
131 | |||
132 | <para> | ||
133 | User context is when you are coming in from a system call or other | ||
134 | trap: like userspace, you can be preempted by more important tasks | ||
135 | and by interrupts. You can sleep, by calling | ||
136 | <function>schedule()</function>. | ||
137 | </para> | ||
138 | |||
139 | <note> | ||
140 | <para> | ||
141 | You are always in user context on module load and unload, | ||
142 | and on operations on the block device layer. | ||
143 | </para> | ||
144 | </note> | ||
145 | |||
146 | <para> | ||
147 | In user context, the <varname>current</varname> pointer (indicating | ||
148 | the task we are currently executing) is valid, and | ||
149 | <function>in_interrupt()</function> | ||
150 | (<filename>include/linux/interrupt.h</filename>) is <returnvalue>false | ||
151 | </returnvalue>. | ||
152 | </para> | ||
153 | |||
154 | <caution> | ||
155 | <para> | ||
156 | Beware that if you have preemption or softirqs disabled | ||
157 | (see below), <function>in_interrupt()</function> will return a | ||
158 | false positive. | ||
159 | </para> | ||
160 | </caution> | ||
161 | </sect1> | ||
162 | |||
163 | <sect1 id="basics-hardirqs"> | ||
164 | <title>Hardware Interrupts (Hard IRQs)</title> | ||
165 | |||
166 | <para> | ||
167 | Timer ticks, <hardware>network cards</hardware> and | ||
168 | <hardware>keyboard</hardware> are examples of real | ||
169 | hardware which produce interrupts at any time. The kernel runs | ||
170 | interrupt handlers, which service the hardware. The kernel | ||
171 | guarantees that this handler is never re-entered: if the same | ||
172 | interrupt arrives, it is queued (or dropped). Because it | ||
173 | disables interrupts, this handler has to be fast: frequently it | ||
174 | simply acknowledges the interrupt, marks a 'software interrupt' | ||
175 | for execution and exits. | ||
176 | </para> | ||
177 | |||
178 | <para> | ||
179 | You can tell you are in a hardware interrupt, because | ||
180 | <function>in_irq()</function> returns <returnvalue>true</returnvalue>. | ||
181 | </para> | ||
182 | <caution> | ||
183 | <para> | ||
184 | Beware that this will return a false positive if interrupts are disabled | ||
185 | (see below). | ||
186 | </para> | ||
187 | </caution> | ||
188 | </sect1> | ||
189 | |||
190 | <sect1 id="basics-softirqs"> | ||
191 | <title>Software Interrupt Context: Softirqs and Tasklets</title> | ||
192 | |||
193 | <para> | ||
194 | Whenever a system call is about to return to userspace, or a | ||
195 | hardware interrupt handler exits, any 'software interrupts' | ||
196 | which are marked pending (usually by hardware interrupts) are | ||
197 | run (<filename>kernel/softirq.c</filename>). | ||
198 | </para> | ||
199 | |||
200 | <para> | ||
201 | Much of the real interrupt handling work is done here. Early in | ||
202 | the transition to <acronym>SMP</acronym>, there were only 'bottom | ||
203 | halves' (BHs), which didn't take advantage of multiple CPUs. Shortly | ||
204 | after we switched from wind-up computers made of match-sticks and snot, | ||
205 | we abandoned this limitation and switched to 'softirqs'. | ||
206 | </para> | ||
207 | |||
208 | <para> | ||
209 | <filename class="headerfile">include/linux/interrupt.h</filename> lists the | ||
210 | different softirqs. A very important softirq is the | ||
211 | timer softirq (<filename | ||
212 | class="headerfile">include/linux/timer.h</filename>): you can | ||
213 | register to have it call functions for you in a given length of | ||
214 | time. | ||
215 | </para> | ||
216 | |||
217 | <para> | ||
218 | Softirqs are often a pain to deal with, since the same softirq | ||
219 | can run simultaneously on more than one CPU. For this reason, | ||
220 | tasklets (<filename | ||
221 | class="headerfile">include/linux/interrupt.h</filename>) are more | ||
222 | often used: they are dynamically-registrable (meaning you can have | ||
223 | as many as you want), and they also guarantee that any tasklet | ||
224 | will only run on one CPU at any time, although different tasklets | ||
225 | can run simultaneously. | ||
226 | </para> | ||
227 | <caution> | ||
228 | <para> | ||
229 | The name 'tasklet' is misleading: they have nothing to do with 'tasks', | ||
230 | and probably more to do with some bad vodka Alexey Kuznetsov had at the | ||
231 | time. | ||
232 | </para> | ||
233 | </caution> | ||
234 | |||
235 | <para> | ||
236 | You can tell you are in a softirq (or tasklet) | ||
237 | using the <function>in_softirq()</function> macro | ||
238 | (<filename class="headerfile">include/linux/interrupt.h</filename>). | ||
239 | </para> | ||
240 | <caution> | ||
241 | <para> | ||
242 | Beware that this will return a false positive if a bh lock (see below) | ||
243 | is held. | ||
244 | </para> | ||
245 | </caution> | ||
246 | </sect1> | ||
247 | </chapter> | ||
248 | |||
249 | <chapter id="basic-rules"> | ||
250 | <title>Some Basic Rules</title> | ||
251 | |||
252 | <variablelist> | ||
253 | <varlistentry> | ||
254 | <term>No memory protection</term> | ||
255 | <listitem> | ||
256 | <para> | ||
257 | If you corrupt memory, whether in user context or | ||
258 | interrupt context, the whole machine will crash. Are you | ||
259 | sure you can't do what you want in userspace? | ||
260 | </para> | ||
261 | </listitem> | ||
262 | </varlistentry> | ||
263 | |||
264 | <varlistentry> | ||
265 | <term>No floating point or <acronym>MMX</acronym></term> | ||
266 | <listitem> | ||
267 | <para> | ||
268 | The <acronym>FPU</acronym> context is not saved; even in user | ||
269 | context the <acronym>FPU</acronym> state probably won't | ||
270 | correspond with the current process: you would mess with some | ||
271 | user process' <acronym>FPU</acronym> state. If you really want | ||
272 | to do this, you would have to explicitly save/restore the full | ||
273 | <acronym>FPU</acronym> state (and avoid context switches). It | ||
274 | is generally a bad idea; use fixed point arithmetic first. | ||
275 | </para> | ||
276 | </listitem> | ||
277 | </varlistentry> | ||
278 | |||
279 | <varlistentry> | ||
280 | <term>A rigid stack limit</term> | ||
281 | <listitem> | ||
282 | <para> | ||
283 | Depending on configuration options, the kernel stack is about 3K to 6K for most 32-bit architectures and | ||
284 | about 14K on most 64-bit archs. It is often shared with interrupts, | ||
285 | so you can't use it all. Avoid deep recursion and huge local | ||
286 | arrays on the stack (allocate them dynamically instead). | ||
287 | </para> | ||
288 | </listitem> | ||
289 | </varlistentry> | ||
290 | |||
291 | <varlistentry> | ||
292 | <term>The Linux kernel is portable</term> | ||
293 | <listitem> | ||
294 | <para> | ||
295 | Let's keep it that way. Your code should be 64-bit clean, | ||
296 | and endian-independent. You should also minimize | ||
297 | CPU-specific stuff, e.g. inline assembly should be cleanly | ||
298 | encapsulated and minimized to ease porting. Generally it | ||
299 | should be restricted to the architecture-dependent part of | ||
300 | the kernel tree. | ||
301 | </para> | ||
302 | </listitem> | ||
303 | </varlistentry> | ||
304 | </variablelist> | ||
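Since the FPU is off limits, a calculation that naturally involves fractions can usually be done in fixed point instead. The sketch below assumes a hypothetical Q16.16 format (16 integer bits, 16 fractional bits) and is written in plain C with <stdint.h>. The type and helper names are illustrative and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: real value = raw / 65536. */
typedef int32_t q16_16;

#define Q16_SHIFT 16
#define Q16_ONE   (1 << Q16_SHIFT)

/* Convert a whole number (|n| < 32768) to Q16.16. */
static q16_16 q16_from_int(int32_t n)
{
	return (q16_16)(n * Q16_ONE);
}

/* Multiply two Q16.16 values. The product is widened to 64 bits so
 * the intermediate result cannot overflow, then scaled back down. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
	return (q16_16)(((int64_t)a * b) >> Q16_SHIFT);
}

/* Divide two Q16.16 values; the caller must ensure b != 0. */
static q16_16 q16_div(q16_16 a, q16_16 b)
{
	return (q16_16)(((int64_t)a * Q16_ONE) / b);
}

/* Integer part (truncates for non-negative values). */
static int32_t q16_to_int(q16_16 x)
{
	return x >> Q16_SHIFT;
}
```

Inside the kernel you would normally use s32/s64 from <linux/types.h> instead. On 32-bit architectures, a plain 64-bit division like the one in q16_div() will not link, so use div_s64() from <linux/math64.h> there.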
305 | </chapter> | ||
306 | |||
307 | <chapter id="ioctls"> | ||
308 | <title>ioctls: Not writing a new system call</title> | ||
309 | |||
310 | <para> | ||
311 | A system call generally looks like this: | ||
312 | </para> | ||
313 | |||
314 | <programlisting> | ||
315 | asmlinkage long sys_mycall(int arg) | ||
316 | { | ||
317 | return 0; | ||
318 | } | ||
319 | </programlisting> | ||
320 | |||
321 | <para> | ||
322 | First, in most cases you don't want to create a new system call. | ||
323 | You create a character device and implement an appropriate ioctl | ||
324 | for it. This is much more flexible than system calls, doesn't have | ||
325 | to be entered in every architecture's | ||
326 | <filename class="headerfile">include/asm/unistd.h</filename> and | ||
327 | <filename>arch/kernel/entry.S</filename> file, and is much more | ||
328 | likely to be accepted by Linus. | ||
329 | </para> | ||
330 | |||
331 | <para> | ||
332 | If all your routine does is read or write some parameter, consider | ||
333 | implementing a <function>sysfs</function> interface instead. | ||
334 | </para> | ||
335 | |||
336 | <para> | ||
337 | Inside the ioctl you're in user context to a process. When an | ||
338 | error occurs you return a negated errno (see | ||
339 | <filename class="headerfile">include/linux/errno.h</filename>), | ||
340 | otherwise you return <returnvalue>0</returnvalue>. | ||
341 | </para> | ||
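The return convention can be modelled in ordinary C. This is a minimal sketch: the command numbers, struct mydev and mydev_ioctl() are all hypothetical. A real driver receives a struct file * and defines its command numbers with the _IO/_IOR/_IOW macros. By convention, an unrecognized command returns -ENOTTY.

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
#define MYDEV_RESET     1u
#define MYDEV_SET_SPEED 2u

struct mydev {
	unsigned long speed;
};

/* Returns 0 on success or a negated errno value on failure,
 * mirroring the kernel ioctl convention. */
static long mydev_ioctl(struct mydev *dev, unsigned int cmd,
			unsigned long arg)
{
	switch (cmd) {
	case MYDEV_RESET:
		dev->speed = 0;
		return 0;
	case MYDEV_SET_SPEED:
		if (arg == 0 || arg > 1000)
			return -EINVAL;	/* argument out of range */
		dev->speed = arg;
		return 0;
	default:
		return -ENOTTY;		/* command not recognized */
	}
}
```

In userspace, the C library wrapper turns such a negative return into -1 and sets errno to the positive value.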
342 | |||
343 | <para> | ||
344 | After you have slept, you should check whether a signal occurred: the | ||
345 | Unix/Linux way of handling signals is to temporarily exit the | ||
346 | system call with the <constant>-ERESTARTSYS</constant> error. The | ||
347 | system call entry code will switch back to user context, process | ||
348 | the signal handler and then your system call will be restarted | ||
349 | (unless the user disabled that). So you should be prepared to | ||
350 | process the restart, e.g. if you're in the middle of manipulating | ||
351 | some data structure. | ||
352 | </para> | ||
353 | |||
354 | <programlisting> | ||
355 | if (signal_pending(current)) | ||
356 | return -ERESTARTSYS; | ||
357 | </programlisting> | ||
358 | |||
359 | <para> | ||
360 | If you're doing longer computations: first think userspace. If you | ||
361 | <emphasis>really</emphasis> want to do it in kernel you should | ||
362 | regularly check if you need to give up the CPU (remember there is | ||
363 | cooperative multitasking per CPU). Idiom: | ||
364 | </para> | ||
365 | |||
366 | <programlisting> | ||
367 | cond_resched(); /* Will sleep */ | ||
368 | </programlisting> | ||
369 | |||
370 | <para> | ||
371 | A short note on interface design: the UNIX system call motto is | ||
372 | "Provide mechanism not policy". | ||
373 | </para> | ||
374 | </chapter> | ||
375 | |||
376 | <chapter id="deadlock-recipes"> | ||
377 | <title>Recipes for Deadlock</title> | ||
378 | |||
379 | <para> | ||
380 | You cannot call any routines which may sleep, unless: | ||
381 | </para> | ||
382 | <itemizedlist> | ||
383 | <listitem> | ||
384 | <para> | ||
385 | You are in user context. | ||
386 | </para> | ||
387 | </listitem> | ||
388 | |||
389 | <listitem> | ||
390 | <para> | ||
391 | You do not own any spinlocks. | ||
392 | </para> | ||
393 | </listitem> | ||
394 | |||
395 | <listitem> | ||
396 | <para> | ||
397 | You have interrupts enabled (actually, Andi Kleen says | ||
398 | that the scheduling code will enable them for you, but | ||
399 | that's probably not what you wanted). | ||
400 | </para> | ||
401 | </listitem> | ||
402 | </itemizedlist> | ||
403 | |||
404 | <para> | ||
405 | Note that some functions may sleep implicitly: common ones are | ||
406 | the user space access functions (*_user) and memory allocation | ||
407 | functions without <symbol>GFP_ATOMIC</symbol>. | ||
408 | </para> | ||
409 | |||
410 | <para> | ||
411 | You should always compile your kernel with | ||
412 | <symbol>CONFIG_DEBUG_ATOMIC_SLEEP</symbol> on, and it will warn | ||
413 | you if you break these rules. If you <emphasis>do</emphasis> break | ||
414 | the rules, you will eventually lock up your box. | ||
415 | </para> | ||
416 | |||
417 | <para> | ||
418 | Really. | ||
419 | </para> | ||
420 | </chapter> | ||
421 | |||
422 | <chapter id="common-routines"> | ||
423 | <title>Common Routines</title> | ||
424 | |||
425 | <sect1 id="routines-printk"> | ||
426 | <title> | ||
427 | <function>printk()</function> | ||
428 | <filename class="headerfile">include/linux/kernel.h</filename> | ||
429 | </title> | ||
430 | |||
431 | <para> | ||
432 | <function>printk()</function> feeds kernel messages to the | ||
433 | console, dmesg, and the syslog daemon. It is useful for debugging | ||
434 | and reporting errors, and can be used inside interrupt context, | ||
435 | but use with caution: a machine which has its console flooded with | ||
436 | printk messages is unusable. It uses a format string mostly | ||
437 | compatible with ANSI C printf, and C string concatenation to give | ||
438 | it a first "priority" argument: | ||
439 | </para> | ||
440 | |||
441 | <programlisting> | ||
442 | printk(KERN_INFO "i = %u\n", i); | ||
443 | </programlisting> | ||
444 | |||
445 | <para> | ||
446 | See <filename class="headerfile">include/linux/kernel.h</filename> | ||
447 | for other KERN_ values; these are interpreted by syslog as the | ||
448 | level. Special case: for printing an IP address use | ||
449 | </para> | ||
450 | |||
451 | <programlisting> | ||
452 | __be32 ipaddress; | ||
453 | printk(KERN_INFO "my ip: %pI4\n", &ipaddress); | ||
454 | </programlisting> | ||
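%pI4 takes a pointer to an address stored in network (big-endian) byte order and prints it as a dotted quad. To show why byte order matters, the userspace sketch below does the same formatting. format_ipv4() is a hypothetical helper, not the kernel's vsprintf code.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an IPv4 address held in network byte order. Reading the
 * bytes in memory order yields the most significant octet first,
 * whatever the host endianness. buf must hold at least 16 bytes
 * ("255.255.255.255" plus the terminating NUL). */
static void format_ipv4(const uint32_t *be_addr, char buf[16])
{
	const unsigned char *b = (const unsigned char *)be_addr;

	snprintf(buf, 16, "%u.%u.%u.%u",
		 (unsigned)b[0], (unsigned)b[1],
		 (unsigned)b[2], (unsigned)b[3]);
}
```

This is why the printk() call passes &ipaddress rather than ipaddress: the specifier needs the bytes as they are laid out in memory.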
455 | |||
456 | <para> | ||
457 | <function>printk()</function> internally uses a 1K buffer and does | ||
458 | not catch overruns. Make sure that will be enough. | ||
459 | </para> | ||
460 | |||
461 | <note> | ||
462 | <para> | ||
463 | You will know when you are a real kernel hacker | ||
464 | when you start typoing printf as printk in your user programs :) | ||
465 | </para> | ||
466 | </note> | ||
467 | |||
468 | <!--- From the Lions book reader department --> | ||
469 | |||
470 | <note> | ||
471 | <para> | ||
472 | Another sidenote: the original Unix Version 6 sources had a | ||
473 | comment on top of its printf function: "Printf should not be | ||
474 | used for chit-chat". You should follow that advice. | ||
475 | </para> | ||
476 | </note> | ||
477 | </sect1> | ||
478 | |||
479 | <sect1 id="routines-copy"> | ||
480 | <title> | ||
481 | <function>copy_[to/from]_user()</function> | ||
482 | / | ||
483 | <function>get_user()</function> | ||
484 | / | ||
485 | <function>put_user()</function> | ||
486 | <filename class="headerfile">include/linux/uaccess.h</filename> | ||
487 | </title> | ||
488 | |||
489 | <para> | ||
490 | <emphasis>[SLEEPS]</emphasis> | ||
491 | </para> | ||
492 | |||
493 | <para> | ||
494 | <function>put_user()</function> and <function>get_user()</function> | ||
495 | are used to get and put single values (such as an int, char, or | ||
496 | long) from and to userspace. A pointer into userspace should | ||
497 | never be simply dereferenced: data should be copied using these | ||
498 | routines. Both return <constant>-EFAULT</constant> or 0. | ||
499 | </para> | ||
500 | <para> | ||
501 | <function>copy_to_user()</function> and | ||
502 | <function>copy_from_user()</function> are more general: they copy | ||
503 | an arbitrary amount of data to and from userspace. | ||
504 | <caution> | ||
505 | <para> | ||
506 | Unlike <function>put_user()</function> and | ||
507 | <function>get_user()</function>, they return the amount of | ||
508 | uncopied data (i.e. <returnvalue>0</returnvalue> still means | ||
509 | success). | ||
510 | </para> | ||
511 | </caution> | ||
512 | [Yes, this moronic interface makes me cringe. The flamewar comes up every year or so. --RR.] | ||
513 | </para> | ||
514 | <para> | ||
515 | The functions may sleep implicitly. They should never be called | ||
516 | outside user context (it makes no sense), with interrupts | ||
517 | disabled, or a spinlock held. | ||
518 | </para> | ||
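The two return conventions are easy to mix up. The userspace model below uses fake_copy_to_user(), a hypothetical stand-in that returns the number of bytes it could not copy, just as copy_to_user() does. Its caller turns any nonzero result into -EFAULT. In real kernel code, the destination is a void __user * pointer and the check reads if (copy_to_user(ubuf, &amp;value, sizeof(value))) return -EFAULT;.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): pretend that only the
 * first 'writable' bytes of the destination are accessible, and
 * return how many bytes were NOT copied (0 means success). */
static size_t fake_copy_to_user(void *to, const void *from, size_t n,
				size_t writable)
{
	size_t done = n < writable ? n : writable;

	memcpy(to, from, done);
	return n - done;
}

/* Typical caller: any uncopied bytes become -EFAULT. */
static long read_value(void *ubuf, size_t writable)
{
	int value = 42;

	if (fake_copy_to_user(ubuf, &value, sizeof(value), writable))
		return -EFAULT;
	return 0;
}
```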
519 | </sect1> | ||
520 | |||
521 | <sect1 id="routines-kmalloc"> | ||
522 | <title><function>kmalloc()</function>/<function>kfree()</function> | ||
523 | <filename class="headerfile">include/linux/slab.h</filename></title> | ||
524 | |||
525 | <para> | ||
526 | <emphasis>[MAY SLEEP: SEE BELOW]</emphasis> | ||
527 | </para> | ||
528 | |||
529 | <para> | ||
530 | These routines are used to dynamically request pointer-aligned | ||
531 | chunks of memory, like malloc and free do in userspace, but | ||
532 | <function>kmalloc()</function> takes an extra flag word. | ||
533 | Important values: | ||
534 | </para> | ||
535 | |||
536 | <variablelist> | ||
537 | <varlistentry> | ||
538 | <term> | ||
539 | <constant> | ||
540 | GFP_KERNEL | ||
541 | </constant> | ||
542 | </term> | ||
543 | <listitem> | ||
544 | <para> | ||
545 | May sleep and swap to free memory. Only allowed in user | ||
546 | context, but is the most reliable way to allocate memory. | ||
547 | </para> | ||
548 | </listitem> | ||
549 | </varlistentry> | ||
550 | |||
551 | <varlistentry> | ||
552 | <term> | ||
553 | <constant> | ||
554 | GFP_ATOMIC | ||
555 | </constant> | ||
556 | </term> | ||
557 | <listitem> | ||
558 | <para> | ||
559 | Don't sleep. Less reliable than <constant>GFP_KERNEL</constant>, | ||
560 | but may be called from interrupt context. You should | ||
561 | <emphasis>really</emphasis> have a good out-of-memory | ||
562 | error-handling strategy. | ||
563 | </para> | ||
564 | </listitem> | ||
565 | </varlistentry> | ||
566 | |||
567 | <varlistentry> | ||
568 | <term> | ||
569 | <constant> | ||
570 | GFP_DMA | ||
571 | </constant> | ||
572 | </term> | ||
573 | <listitem> | ||
574 | <para> | ||
575 | Allocate ISA DMA memory below 16MB. If you don't know what that | ||
576 | is you don't need it. Very unreliable. | ||
577 | </para> | ||
578 | </listitem> | ||
579 | </varlistentry> | ||
580 | </variablelist> | ||
581 | |||
582 | <para> | ||
583 | If you see a <errorname>sleeping function called from invalid | ||
584 | context</errorname> warning message, then maybe you called a | ||
585 | sleeping allocation function from interrupt context without | ||
586 | <constant>GFP_ATOMIC</constant>. You should really fix that. | ||
587 | Run, don't walk. | ||
588 | </para> | ||
589 | |||
590 | <para> | ||
591 | If you are allocating at least <constant>PAGE_SIZE</constant> | ||
592 | (<filename class="headerfile">include/asm/page.h</filename>) bytes, | ||
593 | consider using <function>__get_free_pages()</function> | ||
594 | |||
595 | (<filename class="headerfile">include/linux/mm.h</filename>). It | ||
596 | takes an order argument (0 for page sized, 1 for double page, 2 | ||
597 | for four pages etc.) and the same memory priority flag word as | ||
598 | above. | ||
599 | </para> | ||
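The order argument is the base-2 logarithm of the number of pages, rounded up. The kernel has get_order() for this calculation. The sketch below is a hypothetical userspace equivalent, size_to_order(), and assumes a 4096-byte page.

```c
#include <stddef.h>

/* Assumed page size for this sketch; kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size:
 * order 0 is one page, 1 is two pages, 2 is four pages, etc. */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Rounding up to a power of two means that a 12288-byte request takes order 2, which is four pages (16384 bytes).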
600 | |||
601 | <para> | ||
602 | If you are allocating more than a page worth of bytes you can use | ||
603 | <function>vmalloc()</function>. It'll allocate virtual memory in | ||
604 | the kernel map. This block is not contiguous in physical memory, | ||
605 | but the <acronym>MMU</acronym> makes it look like it is for you | ||
606 | (so it'll only look contiguous to the CPUs, not to external device | ||
607 | drivers). If you really need large physically contiguous memory | ||
608 | for some weird device, you have a problem: it is poorly supported | ||
609 | in Linux because after some time memory fragmentation in a running | ||
610 | kernel makes it hard. The best way is to allocate the block early | ||
611 | in the boot process via the <function>alloc_bootmem()</function> | ||
612 | routine. | ||
613 | </para> | ||
614 | |||
615 | <para> | ||
616 | Before inventing your own cache of often-used objects consider | ||
617 | using a slab cache in | ||
618 | <filename class="headerfile">include/linux/slab.h</filename>. | ||
619 | </para> | ||
620 | </sect1> | ||
621 | |||
622 | <sect1 id="routines-current"> | ||
623 | <title><function>current</function> | ||
624 | <filename class="headerfile">include/asm/current.h</filename></title> | ||
625 | |||
626 | <para> | ||
627 | This global variable (really a macro) contains a pointer to | ||
628 | the current task structure, so is only valid in user context. | ||
629 | For example, when a process makes a system call, this will | ||
630 | point to the task structure of the calling process. It is | ||
631 | <emphasis>not NULL</emphasis> in interrupt context. | ||
632 | </para> | ||
633 | </sect1> | ||
634 | |||
635 | <sect1 id="routines-udelay"> | ||
636 | <title><function>mdelay()</function>/<function>udelay()</function> | ||
637 | <filename class="headerfile">include/asm/delay.h</filename> | ||
638 | <filename class="headerfile">include/linux/delay.h</filename> | ||
639 | </title> | ||
640 | |||
641 | <para> | ||
642 | The <function>udelay()</function> and <function>ndelay()</function> functions can be used for small pauses. | ||
643 | Do not use large values with them as you risk | ||
644 | overflow - the helper function <function>mdelay()</function> is useful | ||
645 | here, or consider <function>msleep()</function>. | ||
646 | </para> | ||
647 | </sect1> | ||
648 | |||
649 | <sect1 id="routines-endian"> | ||
650 | <title><function>cpu_to_be32()</function>/<function>be32_to_cpu()</function>/<function>cpu_to_le32()</function>/<function>le32_to_cpu()</function> | ||
651 | <filename class="headerfile">include/asm/byteorder.h</filename> | ||
652 | </title> | ||
653 | |||
654 | <para> | ||
655 | The <function>cpu_to_be32()</function> family (where the "32" can | ||
656 | be replaced by 64 or 16, and the "be" can be replaced by "le") are | ||
657 | the general way to do endian conversions in the kernel: they | ||
658 | return the converted value. All variations supply the reverse as | ||
659 | well: <function>be32_to_cpu()</function>, etc. | ||
660 | </para> | ||
661 | |||
662 | <para> | ||
663 | There are two major variations of these functions: the pointer | ||
664 | variation, such as <function>cpu_to_be32p()</function>, which takes | ||
665 | a pointer to the given type, and returns the converted value. The | ||
666 | other variation is the "in-situ" family, such as | ||
667 | <function>cpu_to_be32s()</function>, which converts the value referred | ||
668 | to by the pointer, and returns void. | ||
669 | </para> | ||
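The three forms can be modelled in userspace with POSIX htonl(), which converts a 32-bit value from host order to network (big-endian) order. The sketch_cpu_to_be32* names are hypothetical. In the kernel you would call cpu_to_be32(), cpu_to_be32p() and cpu_to_be32s() directly.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Value form: returns the converted value. */
static uint32_t sketch_cpu_to_be32(uint32_t x)
{
	return htonl(x);
}

/* Pointer form: reads through the pointer, returns the result. */
static uint32_t sketch_cpu_to_be32p(const uint32_t *p)
{
	return htonl(*p);
}

/* In-situ form: converts the value in place, returns nothing. */
static void sketch_cpu_to_be32s(uint32_t *p)
{
	*p = htonl(*p);
}
```

In the kernel, the big-endian result has type __be32. This lets the sparse checker flag a value that is used without being converted back.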
670 | </sect1> | ||
671 | |||
672 | <sect1 id="routines-local-irqs"> | ||
673 | <title><function>local_irq_save()</function>/<function>local_irq_restore()</function> | ||
674 | <filename class="headerfile">include/linux/irqflags.h</filename> | ||
675 | </title> | ||
676 | |||
677 | <para> | ||
678 | These routines disable hard interrupts on the local CPU, and | ||
679 | restore them. They are reentrant: they save the previous state in | ||
680 | their one <varname>unsigned long flags</varname> argument. If you | ||
681 | know that interrupts are enabled, you can simply use | ||
682 | <function>local_irq_disable()</function> and | ||
683 | <function>local_irq_enable()</function>. | ||
684 | </para> | ||
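A toy model shows why save/restore nests safely while disable/enable does not. The irqs_on flag and the sketch_* functions below are hypothetical userspace stand-ins, not the kernel implementation. Real code reads unsigned long flags; local_irq_save(flags); ... local_irq_restore(flags);.

```c
#include <stdbool.h>

/* Toy model of the local CPU's interrupt-enable state. */
static bool irqs_on = true;

/* Like local_irq_save(): remember the old state, then disable. */
static unsigned long sketch_irq_save(void)
{
	unsigned long flags = irqs_on;

	irqs_on = false;
	return flags;
}

/* Like local_irq_restore(): put back whatever state was saved. */
static void sketch_irq_restore(unsigned long flags)
{
	irqs_on = flags != 0;
}

/* Like local_irq_enable(): enable unconditionally. */
static void sketch_irq_enable(void)
{
	irqs_on = true;
}

/* Safe whether or not the caller already had interrupts off. */
static void helper_save_restore(void)
{
	unsigned long flags = sketch_irq_save();
	/* ... critical section ... */
	sketch_irq_restore(flags);
}

/* Only correct if interrupts are known to be on at entry. */
static void helper_disable_enable(void)
{
	irqs_on = false;
	/* ... critical section ... */
	sketch_irq_enable();	/* clobbers a caller's "off" state */
}
```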
685 | </sect1> | ||
686 | |||
687 | <sect1 id="routines-softirqs"> | ||
688 | <title><function>local_bh_disable()</function>/<function>local_bh_enable()</function> | ||
689 | <filename class="headerfile">include/linux/interrupt.h</filename></title> | ||
690 | |||
691 | <para> | ||
692 | These routines disable soft interrupts on the local CPU, and | ||
693 | restore them. They are reentrant; if soft interrupts were | ||
694 | disabled before, they will still be disabled after this pair | ||
695 | of functions has been called. They prevent softirqs and tasklets | ||
696 | from running on the current CPU. | ||
697 | </para> | ||
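Conceptually, this reentrancy works by counting rather than by saving a flag. Each disable increments a per-CPU nesting count, each enable decrements it, and pending softirqs may only run once the outermost enable brings the count back to zero. The toy model below is a sketch with hypothetical names: it covers a single CPU and only counts softirqs instead of running any.

```c
#include <stdbool.h>

static unsigned int bh_disable_count;	/* toy per-CPU nesting count */
static bool softirq_pending;
static unsigned int softirqs_run;

static void sketch_bh_disable(void)
{
	bh_disable_count++;
}

/* Softirqs may only run when the outermost enable drops the count
 * back to zero. */
static void sketch_bh_enable(void)
{
	if (--bh_disable_count == 0 && softirq_pending) {
		softirq_pending = false;
		softirqs_run++;		/* stands in for running them */
	}
}
```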
698 | </sect1> | ||
699 | |||
700 | <sect1 id="routines-processorids"> | ||
701 | <title><function>smp_processor_id()</function> | ||
702 | <filename class="headerfile">include/asm/smp.h</filename></title> | ||
703 | |||
704 | <para> | ||
705 | <function>get_cpu()</function> disables preemption (so you won't | ||
706 | suddenly get moved to another CPU) and returns the current | ||
707 | processor number, between 0 and <symbol>NR_CPUS</symbol> - 1. Note | ||
708 | that the CPU numbers are not necessarily contiguous. You return | ||
709 | it again with <function>put_cpu()</function> when you are done. | ||
710 | </para> | ||
711 | <para> | ||
712 | If you know you cannot be preempted by another task (i.e. you are | ||
713 | in interrupt context, or have preemption disabled) you can use | ||
714 | <function>smp_processor_id()</function>. | ||
715 | </para> | ||
716 | </sect1> | ||
717 | |||
718 | <sect1 id="routines-init"> | ||
719 | <title><type>__init</type>/<type>__exit</type>/<type>__initdata</type> | ||
720 | <filename class="headerfile">include/linux/init.h</filename></title> | ||
721 | |||
722 | <para> | ||
723 | After boot, the kernel frees up a special section; functions | ||
724 | marked with <type>__init</type> and data structures marked with | ||
725 | <type>__initdata</type> are dropped after boot is complete: similarly | ||
726 | modules discard this memory after initialization. <type>__exit</type> | ||
727 | is used to declare a function which is only required on exit: the | ||
728 | function will be dropped if this file is not compiled as a module. | ||
729 | See the header file for use. Note that it makes no sense for a function | ||
730 | marked with <type>__init</type> to be exported to modules with | ||
731 | <function>EXPORT_SYMBOL()</function> - this will break. | ||
732 | </para> | ||
733 | |||
734 | </sect1> | ||
735 | |||
736 | <sect1 id="routines-init-again"> | ||
737 | <title><function>__initcall()</function>/<function>module_init()</function> | ||
738 | <filename class="headerfile">include/linux/init.h</filename></title> | ||
739 | <para> | ||
740 | Many parts of the kernel are well served as a module | ||
741 | (dynamically-loadable parts of the kernel). Using the | ||
742 | <function>module_init()</function> and | ||
743 | <function>module_exit()</function> macros it is easy to write code | ||
744 | without #ifdefs which can operate either as a module or built into | ||
745 | the kernel. | ||
746 | </para> | ||
747 | |||
748 | <para> | ||
749 | The <function>module_init()</function> macro defines which | ||
750 | function is to be called at module insertion time (if the file is | ||
751 | compiled as a module), or at boot time: if the file is not | ||
752 | compiled as a module the <function>module_init()</function> macro | ||
753 | becomes equivalent to <function>__initcall()</function>, which | ||
754 | through linker magic ensures that the function is called on boot. | ||
755 | </para> | ||
756 | |||
757 | <para> | ||
758 | The function can return a negative error number to cause | ||
759 | module loading to fail (unfortunately, this has no effect if | ||
760 | the module is compiled into the kernel). This function is | ||
761 | called in user context with interrupts enabled, so it can sleep. | ||
762 | </para> | ||
763 | </sect1> | ||
764 | |||
765 | <sect1 id="routines-moduleexit"> | ||
766 | <title> <function>module_exit()</function> | ||
767 | <filename class="headerfile">include/linux/init.h</filename> </title> | ||
768 | |||
769 | <para> | ||
770 | This macro defines the function to be called at module removal | ||
771 | time (or never, in the case of the file compiled into the | ||
772 | kernel). It will only be called if the module usage count has | ||
773 | reached zero. This function can also sleep, but cannot fail: | ||
774 | everything must be cleaned up by the time it returns. | ||
775 | </para> | ||
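The init/exit contract described above can be sketched in plain userspace C. The init function returns 0 or a negative errno and undoes its own partial work on failure. The exit function cannot fail and must release everything. This is a hedged illustration only: `example_init()`, `example_exit()`, the `acquire_*`/`release_*` helpers and the `fail_step` knob are hypothetical stand-ins for real registration calls, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resources standing in for real registrations. */
static bool have_buffer, have_device;
static int fail_step; /* 0 = no failure; N = step N fails (simulation only) */

static int acquire_buffer(void) { if (fail_step == 1) return -ENOMEM; have_buffer = true; return 0; }
static void release_buffer(void) { have_buffer = false; }
static int acquire_device(void) { if (fail_step == 2) return -EBUSY; have_device = true; return 0; }
static void release_device(void) { have_device = false; }

/* Like a module_init() function: 0 on success, negative errno on
 * failure, with any partial work undone before returning the error. */
static int example_init(void)
{
	int err;

	err = acquire_buffer();
	if (err)
		goto out;
	err = acquire_device();
	if (err)
		goto out_buffer;
	return 0;

out_buffer:
	release_buffer();
out:
	return err;
}

/* Like a module_exit() function: cannot fail, and releases everything
 * in the reverse order of acquisition. */
static void example_exit(void)
{
	release_device();
	release_buffer();
}
```

The `goto`-based unwinding is a common kernel idiom: each failure point jumps to the label that undoes exactly the steps that already succeeded.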
776 | |||
777 | <para> | ||
778 | Note that this macro is optional: if it is not present, your | ||
779 | module will not be removable (except for 'rmmod -f'). | ||
780 | </para> | ||
781 | </sect1> | ||
782 | |||
783 | <sect1 id="routines-module-use-counters"> | ||
784 | <title> <function>try_module_get()</function>/<function>module_put()</function> | ||
785 | <filename class="headerfile">include/linux/module.h</filename></title> | ||
786 | |||
787 | <para> | ||
788 | These manipulate the module usage count, to protect against | ||
789 | removal (a module also can't be removed if another module uses one | ||
790 | of its exported symbols: see below). Before calling into module | ||
791 | code, you should call <function>try_module_get()</function> on | ||
792 | that module: if it fails, then the module is being removed and you | ||
793 | should act as if it wasn't there. Otherwise, you can safely enter | ||
794 | the module, and call <function>module_put()</function> when you're | ||
795 | finished. | ||
796 | </para> | ||
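A userspace analogue of the try-get/put protocol, assuming C11 atomics, is sketched below. This is not how the kernel implements module reference counting. `struct fake_module`, `fake_try_get()`, `fake_put()` and `fake_begin_removal()` are hypothetical names. The property it illustrates is that a caller either fails to get a reference (and must act as if the module were gone) or holds one that the remover is guaranteed to see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_module {
	atomic_int refcount;  /* callers currently inside the module */
	atomic_bool going;    /* set once removal has started */
};

/* Analogue of try_module_get(): fails if removal has begun. */
static bool fake_try_get(struct fake_module *m)
{
	atomic_fetch_add(&m->refcount, 1);
	if (atomic_load(&m->going)) {
		/* Lost the race with removal: back out. */
		atomic_fetch_sub(&m->refcount, 1);
		return false;
	}
	return true;
}

/* Analogue of module_put(): drop the reference when finished. */
static void fake_put(struct fake_module *m)
{
	atomic_fetch_sub(&m->refcount, 1);
}

/* Remover side: mark the module as going, then report whether it is
 * safe to tear down now (no references left). */
static bool fake_begin_removal(struct fake_module *m)
{
	atomic_store(&m->going, true);
	return atomic_load(&m->refcount) == 0;
}
```

With the default sequentially consistent atomics, a getter either sees `going` and backs out, or its increment is visible to the remover's check. A real remover would wait and re-check rather than give up when `fake_begin_removal()` returns false.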
797 | |||
798 | <para> | ||
799 | Most registerable structures have an | ||
800 | <structfield>owner</structfield> field, such as in the | ||
801 | <structname>file_operations</structname> structure. Set this field | ||
802 | to the macro <symbol>THIS_MODULE</symbol>. | ||
803 | </para> | ||
804 | </sect1> | ||
805 | |||
806 | <!-- add info on new-style module refcounting here --> | ||
807 | </chapter> | ||
808 | |||
809 | <chapter id="queues"> | ||
810 | <title>Wait Queues | ||
811 | <filename class="headerfile">include/linux/wait.h</filename> | ||
812 | </title> | ||
813 | <para> | ||
814 | <emphasis>[SLEEPS]</emphasis> | ||
815 | </para> | ||
816 | |||
817 | <para> | ||
818 | A wait queue is used to wait for someone to wake you up when a | ||
819 | certain condition is true. They must be used carefully to ensure | ||
820 | there is no race condition. You declare a | ||
821 | <type>wait_queue_head_t</type>, and then processes which want to | ||
822 | wait for that condition declare a <type>wait_queue_entry_t</type> | ||
823 | referring to themselves, and place that in the queue. | ||
824 | </para> | ||
825 | |||
826 | <sect1 id="queue-declaring"> | ||
827 | <title>Declaring</title> | ||
828 | |||
829 | <para> | ||
830 | You declare a <type>wait_queue_head_t</type> using the | ||
831 | <function>DECLARE_WAIT_QUEUE_HEAD()</function> macro, or using the | ||
832 | <function>init_waitqueue_head()</function> routine in your | ||
833 | initialization code. | ||
834 | </para> | ||
835 | </sect1> | ||
836 | |||
837 | <sect1 id="queue-waitqueue"> | ||
838 | <title>Queuing</title> | ||
839 | |||
840 | <para> | ||
841 | Placing yourself in the waitqueue is fairly complex, because you | ||
842 | must put yourself in the queue before checking the condition. | ||
843 | There is a macro to do this: | ||
844 | <function>wait_event_interruptible()</function> | ||
845 | |||
846 | (<filename class="headerfile">include/linux/wait.h</filename>). The | ||
847 | first argument is the wait queue head, and the second is an | ||
848 | expression which is evaluated; the macro returns | ||
849 | <returnvalue>0</returnvalue> when this expression is true, or | ||
850 | <returnvalue>-ERESTARTSYS</returnvalue> if a signal is received. | ||
851 | The <function>wait_event()</function> version ignores signals. | ||
852 | </para> | ||
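Ordering matters here because a wake-up that arrives between "check the condition" and "go to sleep" would otherwise be lost. A userspace analogue using POSIX threads shows the same discipline. This is an assumption for illustration; these are not the kernel's wait-queue API. The condition is re-checked in a loop under the lock that the waker also takes, so no wake-up can slip in unnoticed. `struct event`, `EVENT_INIT`, `event_wait()` and `event_signal()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;   /* plays the role of the wait_queue_head_t */
	bool done;             /* the condition being waited for */
};

#define EVENT_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Roughly like wait_event(): sleep until ev->done is true. */
static void event_wait(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->done)                 /* re-check after every wake-up */
		pthread_cond_wait(&ev->cond, &ev->lock);
	pthread_mutex_unlock(&ev->lock);
}

/* Roughly like setting the condition and then calling wake_up():
 * change the condition first, then wake every waiter. */
static void event_signal(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->done = true;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}
```

This sketch has no counterpart to the signal handling that makes <function>wait_event_interruptible()</function> return <returnvalue>-ERESTARTSYS</returnvalue>.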
853 | |||
854 | </sect1> | ||
855 | |||
856 | <sect1 id="queue-waking"> | ||
857 | <title>Waking Up Queued Tasks</title> | ||
858 | |||
859 | <para> | ||
860 | Call <function>wake_up()</function> | ||
861 | |||
862 | (<filename class="headerfile">include/linux/wait.h</filename>), | ||
863 | which will wake up every process in the queue. The exception is | ||
864 | if one has <constant>WQ_FLAG_EXCLUSIVE</constant> set, in which case | ||
865 | the remainder of the queue will not be woken. There are other variants | ||
866 | of this basic function available in the same header. | ||
867 | </para> | ||
868 | </sect1> | ||
869 | </chapter> | ||
870 | |||
871 | <chapter id="atomic-ops"> | ||
872 | <title>Atomic Operations</title> | ||
873 | |||
874 | <para> | ||
875 | Certain operations are guaranteed atomic on all platforms. The | ||
876 | first class of operations work on <type>atomic_t</type> | ||
877 | |||
878 | (<filename class="headerfile">include/asm/atomic.h</filename>); this | ||
879 | contains a signed integer (at least 32 bits long), and you must use | ||
880 | these functions to manipulate or read atomic_t variables. | ||
881 | <function>atomic_read()</function> and | ||
882 | <function>atomic_set()</function> get and set the counter, while | ||
883 | <function>atomic_add()</function>, | ||
884 | <function>atomic_sub()</function>, | ||
885 | <function>atomic_inc()</function>, | ||
886 | <function>atomic_dec()</function> and | ||
887 | <function>atomic_dec_and_test()</function> modify it (the last | ||
888 | returns <returnvalue>true</returnvalue> if it was decremented to zero). | ||
889 | </para> | ||
890 | |||
891 | <para> | ||
892 | Yes, really: <function>atomic_dec_and_test()</function> returns <returnvalue>true</returnvalue> (i.e. != 0) if the | ||
893 | atomic variable is zero after the decrement. | ||
894 | </para> | ||
895 | |||
896 | <para> | ||
897 | Note that these functions are slower than normal arithmetic, and | ||
898 | so should not be used unnecessarily. | ||
899 | </para> | ||
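For readers more familiar with userspace C, the C11 `<stdatomic.h>` operations are a rough analogue of the atomic_t API. This is an assumption made for illustration, not the kernel's implementation, and the helper names are hypothetical. `counter_dec_and_test()` mirrors <function>atomic_dec_and_test()</function> by returning true only when the decrement reaches zero, the usual pattern for freeing an object when its last reference is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Rough analogue of atomic_inc(). */
static void counter_inc(atomic_int *v)
{
	atomic_fetch_add(v, 1);
}

/* Rough analogue of atomic_dec_and_test(): atomic_fetch_sub() returns
 * the value *before* the decrement, so 1 means we just reached zero. */
static bool counter_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}
```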
900 | |||
901 | <para> | ||
902 | The second class of atomic operations is atomic bit operations on an | ||
903 | <type>unsigned long</type>, defined in | ||
904 | |||
905 | <filename class="headerfile">include/linux/bitops.h</filename>. These | ||
906 | operations generally take a pointer to the bit pattern, and a bit | ||
907 | number: 0 is the least significant bit. | ||
908 | <function>set_bit()</function>, <function>clear_bit()</function> | ||
909 | and <function>change_bit()</function> set, clear, and flip the | ||
910 | given bit. <function>test_and_set_bit()</function>, | ||
911 | <function>test_and_clear_bit()</function> and | ||
912 | <function>test_and_change_bit()</function> do the same thing, | ||
913 | except that they return true if the bit was previously set; these are | ||
914 | particularly useful for atomically setting flags. | ||
915 | </para> | ||
916 | |||
917 | <para> | ||
918 | It is possible to call these operations with bit indices greater | ||
919 | than BITS_PER_LONG. The resulting behavior is strange on big-endian | ||
920 | platforms, though, so it is a good idea not to do this. | ||
921 | </para> | ||
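The bit-numbering convention can be sketched with C11 atomics: bit 0 is the least significant bit, and an index of BITS_PER_LONG or more selects a later word of the array. The sketch below is a hypothetical userspace analogue of <function>test_and_set_bit()</function> and <function>clear_bit()</function>, not the kernel's per-architecture implementation. All `MY_`/`my_` names are invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Which word holds bit nr, and which bit within that word. */
#define MY_BIT_WORD(nr) ((nr) / MY_BITS_PER_LONG)
#define MY_BIT_MASK(nr) (1UL << ((nr) % MY_BITS_PER_LONG))

/* Analogue of test_and_set_bit(): set bit nr and report whether it
 * was already set. atomic_fetch_or() returns the previous word. */
static bool my_test_and_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long old = atomic_fetch_or(&addr[MY_BIT_WORD(nr)], MY_BIT_MASK(nr));
	return (old & MY_BIT_MASK(nr)) != 0;
}

/* Analogue of clear_bit(). */
static void my_clear_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	atomic_fetch_and(&addr[MY_BIT_WORD(nr)], ~MY_BIT_MASK(nr));
}
```

Because the read-modify-write is a single atomic operation, two callers racing on `my_test_and_set_bit()` for the same bit cannot both see "was clear", which is what makes it suitable for claiming a flag.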
922 | </chapter> | ||
923 | |||
924 | <chapter id="symbols"> | ||
925 | <title>Symbols</title> | ||
926 | |||
927 | <para> | ||
928 | Within the kernel proper, the normal linking rules apply | ||
929 | (i.e. unless a symbol is declared to be file scope with the | ||
930 | <type>static</type> keyword, it can be used anywhere in the | ||
931 | kernel). However, for modules, a special exported symbol table is | ||
932 | kept which limits the entry points to the kernel proper. Modules | ||
933 | can also export symbols. | ||
934 | </para> | ||
935 | |||
936 | <sect1 id="sym-exportsymbols"> | ||
937 | <title><function>EXPORT_SYMBOL()</function> | ||
938 | <filename class="headerfile">include/linux/export.h</filename></title> | ||
939 | |||
940 | <para> | ||
941 | This is the classic method of exporting a symbol: dynamically | ||
942 | loaded modules will be able to use the symbol as normal. | ||
943 | </para> | ||
944 | </sect1> | ||
945 | |||
946 | <sect1 id="sym-exportsymbols-gpl"> | ||
947 | <title><function>EXPORT_SYMBOL_GPL()</function> | ||
948 | <filename class="headerfile">include/linux/export.h</filename></title> | ||
949 | |||
950 | <para> | ||
951 | Similar to <function>EXPORT_SYMBOL()</function> except that the | ||
952 | symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can | ||
953 | only be seen by modules with a | ||
954 | <function>MODULE_LICENSE()</function> that specifies a GPL | ||
955 | compatible license. It implies that the function is considered | ||
956 | an internal implementation issue, and not really an interface. | ||
957 | Some maintainers and developers may however | ||
958 | require EXPORT_SYMBOL_GPL() when adding any new APIs or functionality. | ||
959 | </para> | ||
960 | </sect1> | ||
961 | </chapter> | ||
962 | |||
963 | <chapter id="conventions"> | ||
964 | <title>Routines and Conventions</title> | ||
965 | |||
966 | <sect1 id="conventions-doublelinkedlist"> | ||
967 | <title>Double-linked lists | ||
968 | <filename class="headerfile">include/linux/list.h</filename></title> | ||
969 | |||
970 | <para> | ||
971 | There used to be three sets of linked-list routines in the kernel | ||
972 | headers, but this one is the winner. If you don't have some | ||
973 | particular pressing need for a single list, it's a good choice. | ||
974 | </para> | ||
975 | |||
976 | <para> | ||
977 | In particular, <function>list_for_each_entry</function> is useful. | ||
978 | </para> | ||
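The following cut-down userspace re-creation shows why list_for_each_entry is convenient. The list node is embedded in your own structure, and container_of recovers the enclosing structure from it. The `my_`-prefixed names, `struct item` and `sum_items()` are hypothetical simplifications written for this sketch. The real <filename class="headerfile">include/linux/list.h</filename> has many more helpers and debugging checks. The sketch uses the GNU `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

struct my_list_head {
	struct my_list_head *next, *prev;
};

/* Circular list: an empty head points at itself. */
#define MY_LIST_HEAD_INIT(name) { &(name), &(name) }

/* Recover the enclosing structure from a pointer to its member. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void my_list_add_tail(struct my_list_head *new, struct my_list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Iterate over the structures that contain the list nodes. */
#define my_list_for_each_entry(pos, head, member)                          \
	for (pos = my_container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                       \
	     pos = my_container_of(pos->member.next, __typeof__(*pos), member))

struct item {
	int value;
	struct my_list_head node;   /* embedded list linkage */
};

static int sum_items(struct my_list_head *head)
{
	struct item *it;
	int sum = 0;

	my_list_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```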
979 | </sect1> | ||
980 | |||
981 | <sect1 id="convention-returns"> | ||
982 | <title>Return Conventions</title> | ||
983 | |||
984 | <para> | ||
985 | For code called in user context, it's very common to defy C | ||
986 | convention, and return <returnvalue>0</returnvalue> for success, | ||
987 | and a negative error number | ||
988 | (e.g. <returnvalue>-EFAULT</returnvalue>) for failure. This can be | ||
989 | unintuitive at first, but it's fairly widespread in the kernel. | ||
990 | </para> | ||
991 | |||
992 | <para> | ||
993 | Using <function>ERR_PTR()</function> | ||
994 | |||
995 | (<filename class="headerfile">include/linux/err.h</filename>) to | ||
996 | encode a negative error number into a pointer, and | ||
997 | <function>IS_ERR()</function> and <function>PTR_ERR()</function> | ||
998 | to get it back out again, avoids a separate pointer parameter for | ||
999 | the error number. Icky, but in a good way. | ||
1000 | </para> | ||
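A userspace re-creation of the ERR_PTR trick, written for illustration. The topmost few thousand addresses are never valid pointers, so a small negative errno cast to a pointer can be told apart from a real one. The `my_`-prefixed helpers and the constant are modelled on <filename class="headerfile">include/linux/err.h</filename> (whose MAX_ERRNO is 4095) rather than copied from it. `lookup_name()` is a hypothetical function showing the calling convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MY_MAX_ERRNO 4095

static inline void *my_err_ptr(long error)      /* like ERR_PTR() */
{
	return (void *)error;
}

static inline long my_ptr_err(const void *ptr)  /* like PTR_ERR() */
{
	return (long)ptr;
}

static inline bool my_is_err(const void *ptr)   /* like IS_ERR() */
{
	return (unsigned long)ptr >= (unsigned long)-MY_MAX_ERRNO;
}

/* Hypothetical lookup: returns a valid pointer or an encoded -errno,
 * so no separate error out-parameter is needed. */
static const char *lookup_name(const char *key)
{
	static const char answer[] = "found";

	if (key == NULL)
		return my_err_ptr(-EINVAL);
	if (strcmp(key, "known") != 0)
		return my_err_ptr(-ENOENT);
	return answer;
}
```

Callers test with `my_is_err()` before dereferencing, and only then extract the error with `my_ptr_err()`.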
1001 | </sect1> | ||
1002 | |||
1003 | <sect1 id="conventions-borkedcompile"> | ||
1004 | <title>Breaking Compilation</title> | ||
1005 | |||
1006 | <para> | ||
1007 | Linus and the other developers sometimes change function or | ||
1008 | structure names in development kernels; this is not done just to | ||
1009 | keep everyone on their toes: it reflects a fundamental change | ||
1010 | (e.g. can no longer be called with interrupts on, or does extra | ||
1011 | checks, or doesn't do checks which were caught before). Usually | ||
1012 | this is accompanied by a fairly complete note to the linux-kernel | ||
1013 | mailing list; search the archive. Simply doing a global replace | ||
1014 | on the file usually makes things <emphasis>worse</emphasis>. | ||
1015 | </para> | ||
1016 | </sect1> | ||
1017 | |||
1018 | <sect1 id="conventions-initialising"> | ||
1019 | <title>Initializing structure members</title> | ||
1020 | |||
1021 | <para> | ||
1022 | The preferred method of initializing structures is to use | ||
1023 | designated initialisers, as defined by ISO C99, e.g.: | ||
1024 | </para> | ||
1025 | <programlisting> | ||
1026 | static struct block_device_operations opt_fops = { | ||
1027 | .open = opt_open, | ||
1028 | .release = opt_release, | ||
1029 | .ioctl = opt_ioctl, | ||
1030 | .check_media_change = opt_media_change, | ||
1031 | }; | ||
1032 | </programlisting> | ||
1033 | <para> | ||
1034 | This makes it easy to grep for, and makes it clear which | ||
1035 | structure fields are set. You should do this because it looks | ||
1036 | cool. | ||
1037 | </para> | ||
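Here is a self-contained variant that compiles in userspace. It uses a hypothetical `struct demo_ops` in place of a real kernel operations table. Members that are not named in a designated initializer are set to zero (here, NULL), which is what lets callers test whether an optional hook is present.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table, standing in for e.g. file_operations. */
struct demo_ops {
	int (*open)(int minor);
	int (*release)(int minor);
	long (*ioctl)(int minor, unsigned int cmd, unsigned long arg);
};

static int demo_open(int minor) { (void)minor; return 0; }
static int demo_release(int minor) { (void)minor; return 0; }

/* Only the hooks we implement are named; .ioctl is implicitly NULL. */
static const struct demo_ops demo_fops = {
	.open    = demo_open,
	.release = demo_release,
};

/* Callers check optional hooks before using them. */
static long demo_call_ioctl(const struct demo_ops *ops, int minor,
			    unsigned int cmd, unsigned long arg)
{
	if (!ops->ioctl)
		return -ENOTTY;
	return ops->ioctl(minor, cmd, arg);
}
```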
1038 | </sect1> | ||
1039 | |||
1040 | <sect1 id="conventions-gnu-extns"> | ||
1041 | <title>GNU Extensions</title> | ||
1042 | |||
1043 | <para> | ||
1044 | GNU Extensions are explicitly allowed in the Linux kernel. | ||
1045 | Note that some of the more complex ones are not very well | ||
1046 | supported, due to lack of general use, but the following are | ||
1047 | considered standard (see the GCC info page section "C | ||
1048 | Extensions" for more details; yes, really the info page: the | ||
1049 | man page is only a short summary of the stuff in info). | ||
1050 | </para> | ||
1051 | <itemizedlist> | ||
1052 | <listitem> | ||
1053 | <para> | ||
1054 | Inline functions | ||
1055 | </para> | ||
1056 | </listitem> | ||
1057 | <listitem> | ||
1058 | <para> | ||
1059 | Statement expressions (i.e. the ({ and }) constructs). | ||
1060 | </para> | ||
1061 | </listitem> | ||
1062 | <listitem> | ||
1063 | <para> | ||
1064 | Declaring attributes of a function / variable / type | ||
1065 | (__attribute__) | ||
1066 | </para> | ||
1067 | </listitem> | ||
1068 | <listitem> | ||
1069 | <para> | ||
1070 | typeof | ||
1071 | </para> | ||
1072 | </listitem> | ||
1073 | <listitem> | ||
1074 | <para> | ||
1075 | Zero length arrays | ||
1076 | </para> | ||
1077 | </listitem> | ||
1078 | <listitem> | ||
1079 | <para> | ||
1080 | Macro varargs | ||
1081 | </para> | ||
1082 | </listitem> | ||
1083 | <listitem> | ||
1084 | <para> | ||
1085 | Arithmetic on void pointers | ||
1086 | </para> | ||
1087 | </listitem> | ||
1088 | <listitem> | ||
1089 | <para> | ||
1090 | Non-Constant initializers | ||
1091 | </para> | ||
1092 | </listitem> | ||
1093 | <listitem> | ||
1094 | <para> | ||
1095 | Assembler Instructions (not outside arch/ and include/asm/) | ||
1096 | </para> | ||
1097 | </listitem> | ||
1098 | <listitem> | ||
1099 | <para> | ||
1100 | Function names as strings (__func__). | ||
1101 | </para> | ||
1102 | </listitem> | ||
1103 | <listitem> | ||
1104 | <para> | ||
1105 | __builtin_constant_p() | ||
1106 | </para> | ||
1107 | </listitem> | ||
1108 | </itemizedlist> | ||
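Several of these extensions usually appear together. The sketch below combines typeof and a statement expression so that each argument is evaluated exactly once, and uses __func__ for a diagnostic string. `max_safe()` and `whoami()` are hypothetical; the kernel's own min()/max() macros follow the same idea but add type checking. It needs GCC or Clang.

```c
#include <assert.h>
#include <string.h>

/* typeof + statement expression: each argument is evaluated once,
 * and the block's last expression is the macro's value. */
#define max_safe(a, b) ({            \
	__typeof__(a) _a = (a);      \
	__typeof__(b) _b = (b);      \
	_a > _b ? _a : _b;           \
})

/* __func__ expands to the name of the enclosing function. */
static const char *whoami(void)
{
	return __func__;
}
```

A naive `#define MAX(a, b) ((a) > (b) ? (a) : (b))` would evaluate `i++` twice in `MAX(i++, -1)`; `max_safe()` evaluates it once.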
1109 | |||
1110 | <para> | ||
1111 | Be wary when using long long in the kernel: the code gcc generates for | ||
1112 | it is horrible, and worse, division and multiplication do not work | ||
1113 | on i386 because the GCC runtime functions for them are missing from | ||
1114 | the kernel environment. | ||
1115 | </para> | ||
1116 | |||
1117 | <!-- FIXME: add a note about ANSI aliasing cleanness --> | ||
1118 | </sect1> | ||
1119 | |||
1120 | <sect1 id="conventions-cplusplus"> | ||
1121 | <title>C++</title> | ||
1122 | |||
1123 | <para> | ||
1124 | Using C++ in the kernel is usually a bad idea, because the | ||
1125 | kernel does not provide the necessary runtime environment | ||
1126 | and the include files are not tested for it. It is still | ||
1127 | possible, but not recommended. If you really want to do | ||
1128 | this, forget about exceptions at least. | ||
1129 | </para> | ||
1130 | </sect1> | ||
1131 | |||
1132 | <sect1 id="conventions-ifdef"> | ||
1133 | <title>#if</title> | ||
1134 | |||
1135 | <para> | ||
1136 | It is generally considered cleaner to use macros in header files | ||
1137 | (or at the top of .c files) to abstract away functions rather than | ||
1138 | using `#if' pre-processor statements throughout the source code. | ||
1139 | </para> | ||
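One common shape of this advice is sketched below with a hypothetical option `CONFIG_DEMO_STATS` and a hypothetical function `demo_stats_record()`. The header decides once whether the real function or an empty static inline stub is used, and callers are written without any #if at all.

```c
#include <assert.h>

/* In a header: */
#ifdef CONFIG_DEMO_STATS
void demo_stats_record(int bytes);          /* real version in a .c file */
#else
static inline void demo_stats_record(int bytes)
{
	(void)bytes;                         /* compiled away when disabled */
}
#endif

/* In a caller: no #if needed. */
static int demo_send(int bytes)
{
	demo_stats_record(bytes);
	return bytes;
}
```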
1140 | </sect1> | ||
1141 | </chapter> | ||
1142 | |||
1143 | <chapter id="submitting"> | ||
1144 | <title>Putting Your Stuff in the Kernel</title> | ||
1145 | |||
1146 | <para> | ||
1147 | In order to get your stuff into shape for official inclusion, or | ||
1148 | even to make a neat patch, there's administrative work to be | ||
1149 | done: | ||
1150 | </para> | ||
1151 | <itemizedlist> | ||
1152 | <listitem> | ||
1153 | <para> | ||
1154 | Figure out whose pond you've been pissing in. Look at the top of | ||
1155 | the source files, inside the <filename>MAINTAINERS</filename> | ||
1156 | file, and last of all in the <filename>CREDITS</filename> file. | ||
1157 | You should coordinate with this person to make sure you're not | ||
1158 | duplicating effort, or trying something that's already been | ||
1159 | rejected. | ||
1160 | </para> | ||
1161 | |||
1162 | <para> | ||
1163 | Make sure you put your name and email address at the top of | ||
1164 | any files you create or mangle significantly. This is the | ||
1165 | first place people will look when they find a bug, or when | ||
1166 | <emphasis>they</emphasis> want to make a change. | ||
1167 | </para> | ||
1168 | </listitem> | ||
1169 | |||
1170 | <listitem> | ||
1171 | <para> | ||
1172 | Usually you want a configuration option for your kernel hack. | ||
1173 | Edit <filename>Kconfig</filename> in the appropriate directory. | ||
1174 | The Config language is simple to use by cut and paste, and there's | ||
1175 | complete documentation in | ||
1176 | <filename>Documentation/kbuild/kconfig-language.txt</filename>. | ||
1177 | </para> | ||
1178 | |||
1179 | <para> | ||
1180 | In your description of the option, make sure you address both the | ||
1181 | expert user and the user who knows nothing about your feature. Mention | ||
1182 | incompatibilities and issues here. <emphasis> Definitely | ||
1183 | </emphasis> end your description with <quote> if in doubt, say N | ||
1184 | </quote> (or, occasionally, `Y'); this is for people who have no | ||
1185 | idea what you are talking about. | ||
1186 | </para> | ||
1187 | </listitem> | ||
1188 | |||
1189 | <listitem> | ||
1190 | <para> | ||
1191 | Edit the <filename>Makefile</filename>: the CONFIG variables are | ||
1192 | exported here so you can usually just add an "obj-$(CONFIG_xxx) += | ||
1193 | xxx.o" line. The syntax is documented in | ||
1194 | <filename>Documentation/kbuild/makefiles.txt</filename>. | ||
1195 | </para> | ||
1196 | </listitem> | ||
1197 | |||
1198 | <listitem> | ||
1199 | <para> | ||
1200 | Put yourself in <filename>CREDITS</filename> if you've done | ||
1201 | something noteworthy, usually beyond a single file (your name | ||
1202 | should be at the top of the source files anyway). | ||
1203 | <filename>MAINTAINERS</filename> means you want to be consulted | ||
1204 | when changes are made to a subsystem, and hear about bugs; it | ||
1205 | implies a more-than-passing commitment to some part of the code. | ||
1206 | </para> | ||
1207 | </listitem> | ||
1208 | |||
1209 | <listitem> | ||
1210 | <para> | ||
1211 | Finally, don't forget to read <filename>Documentation/process/submitting-patches.rst</filename> | ||
1212 | and possibly <filename>Documentation/process/submitting-drivers.rst</filename>. | ||
1213 | </para> | ||
1214 | </listitem> | ||
1215 | </itemizedlist> | ||
1216 | </chapter> | ||
1217 | |||
1218 | <chapter id="cantrips"> | ||
1219 | <title>Kernel Cantrips</title> | ||
1220 | |||
1221 | <para> | ||
1222 | Some favorites from browsing the source. Feel free to add to this | ||
1223 | list. | ||
1224 | </para> | ||
1225 | |||
1226 | <para> | ||
1227 | <filename>arch/x86/include/asm/delay.h</filename>: | ||
1228 | </para> | ||
1229 | <programlisting> | ||
1230 | #define ndelay(n) (__builtin_constant_p(n) ? \ | ||
1231 | ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \ | ||
1232 | __ndelay(n)) | ||
1233 | </programlisting> | ||
1234 | |||
1235 | <para> | ||
1236 | <filename>include/linux/fs.h</filename>: | ||
1237 | </para> | ||
1238 | <programlisting> | ||
1239 | /* | ||
1240 | * Kernel pointers have redundant information, so we can use a | ||
1241 | * scheme where we can return either an error code or a dentry | ||
1242 | * pointer with the same return value. | ||
1243 | * | ||
1244 | * This should be a per-architecture thing, to allow different | ||
1245 | * error and pointer decisions. | ||
1246 | */ | ||
1247 | #define ERR_PTR(err) ((void *)((long)(err))) | ||
1248 | #define PTR_ERR(ptr) ((long)(ptr)) | ||
1249 | #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000)) | ||
1250 | </programlisting> | ||
1251 | |||
1252 | <para> | ||
1253 | <filename>arch/x86/include/asm/uaccess_32.h</filename>: | ||
1254 | </para> | ||
1255 | |||
1256 | <programlisting> | ||
1257 | #define copy_to_user(to,from,n) \ | ||
1258 | (__builtin_constant_p(n) ? \ | ||
1259 | __constant_copy_to_user((to),(from),(n)) : \ | ||
1260 | __generic_copy_to_user((to),(from),(n))) | ||
1261 | </programlisting> | ||
1262 | |||
1263 | <para> | ||
1264 | <filename>arch/sparc/kernel/head.S</filename>: | ||
1265 | </para> | ||
1266 | |||
1267 | <programlisting> | ||
1268 | /* | ||
1269 | * Sun people can't spell worth damn. "compatability" indeed. | ||
1270 | * At least we *know* we can't spell, and use a spell-checker. | ||
1271 | */ | ||
1272 | |||
1273 | /* Uh, actually Linus it is I who cannot spell. Too much murky | ||
1274 | * Sparc assembly will do this to ya. | ||
1275 | */ | ||
1276 | C_LABEL(cputypvar): | ||
1277 | .asciz "compatibility" | ||
1278 | |||
1279 | /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */ | ||
1280 | .align 4 | ||
1281 | C_LABEL(cputypvar_sun4m): | ||
1282 | .asciz "compatible" | ||
1283 | </programlisting> | ||
1284 | |||
1285 | <para> | ||
1286 | <filename>arch/sparc/lib/checksum.S</filename>: | ||
1287 | </para> | ||
1288 | |||
1289 | <programlisting> | ||
1290 | /* Sun, you just can't beat me, you just can't. Stop trying, | ||
1291 | * give up. I'm serious, I am going to kick the living shit | ||
1292 | * out of you, game over, lights out. | ||
1293 | */ | ||
1294 | </programlisting> | ||
1295 | </chapter> | ||
1296 | |||
1297 | <chapter id="credits"> | ||
1298 | <title>Thanks</title> | ||
1299 | |||
1300 | <para> | ||
1301 | Thanks to Andi Kleen for the idea, answering my questions, fixing | ||
1302 | my mistakes, filling content, etc. Philipp Rumpf for more spelling | ||
1303 | and clarity fixes, and some excellent non-obvious points. Werner | ||
1304 | Almesberger for giving me a great summary of | ||
1305 | <function>disable_irq()</function>, and Jes Sorensen and Andrea | ||
1306 | Arcangeli added caveats. Michael Elizabeth Chastain for checking | ||
1307 | and adding to the Configure section. <!-- Rusty insisted on this | ||
1308 | bit; I didn't do it! --> Telsa Gwynne for teaching me DocBook. | ||
1309 | </para> | ||
1310 | </chapter> | ||
1311 | </book> | ||
1312 | |||
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl deleted file mode 100644 index 7c9cc4846cb6..000000000000 --- a/Documentation/DocBook/kernel-locking.tmpl +++ /dev/null | |||
@@ -1,2151 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="LKLockingGuide"> | ||
6 | <bookinfo> | ||
7 | <title>Unreliable Guide To Locking</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Rusty</firstname> | ||
12 | <surname>Russell</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>rusty@rustcorp.com.au</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2003</year> | ||
23 | <holder>Rusty Russell</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License as published by the Free Software Foundation; either | ||
31 | version 2 of the License, or (at your option) any later | ||
32 | version. | ||
33 | </para> | ||
34 | |||
35 | <para> | ||
36 | This program is distributed in the hope that it will be | ||
37 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
38 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
39 | See the GNU General Public License for more details. | ||
40 | </para> | ||
41 | |||
42 | <para> | ||
43 | You should have received a copy of the GNU General Public | ||
44 | License along with this program; if not, write to the Free | ||
45 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
46 | MA 02111-1307 USA | ||
47 | </para> | ||
48 | |||
49 | <para> | ||
50 | For more details see the file COPYING in the source | ||
51 | distribution of Linux. | ||
52 | </para> | ||
53 | </legalnotice> | ||
54 | </bookinfo> | ||
55 | |||
56 | <toc></toc> | ||
57 | <chapter id="intro"> | ||
58 | <title>Introduction</title> | ||
59 | <para> | ||
60 | Welcome to Rusty's Remarkably Unreliable Guide to Kernel | ||
61 | Locking issues. This document describes the locking systems in | ||
62 | the Linux Kernel in 2.6. | ||
63 | </para> | ||
64 | <para> | ||
65 | With the wide availability of HyperThreading, and <firstterm | ||
66 | linkend="gloss-preemption">preemption </firstterm> in the Linux | ||
67 | Kernel, everyone hacking on the kernel needs to know the | ||
68 | fundamentals of concurrency and locking for | ||
69 | <firstterm linkend="gloss-smp"><acronym>SMP</acronym></firstterm>. | ||
70 | </para> | ||
71 | </chapter> | ||
72 | |||
73 | <chapter id="races"> | ||
74 | <title>The Problem With Concurrency</title> | ||
75 | <para> | ||
76 | (Skip this if you know what a Race Condition is). | ||
77 | </para> | ||
78 | <para> | ||
79 | In a normal program, you can increment a counter like so: | ||
80 | </para> | ||
81 | <programlisting> | ||
82 | very_important_count++; | ||
83 | </programlisting> | ||
84 | |||
85 | <para> | ||
86 | This is what you would expect to happen: | ||
87 | </para> | ||
88 | |||
89 | <table> | ||
90 | <title>Expected Results</title> | ||
91 | |||
92 | <tgroup cols="2" align="left"> | ||
93 | |||
94 | <thead> | ||
95 | <row> | ||
96 | <entry>Instance 1</entry> | ||
97 | <entry>Instance 2</entry> | ||
98 | </row> | ||
99 | </thead> | ||
100 | |||
101 | <tbody> | ||
102 | <row> | ||
103 | <entry>read very_important_count (5)</entry> | ||
104 | <entry></entry> | ||
105 | </row> | ||
106 | <row> | ||
107 | <entry>add 1 (6)</entry> | ||
108 | <entry></entry> | ||
109 | </row> | ||
110 | <row> | ||
111 | <entry>write very_important_count (6)</entry> | ||
112 | <entry></entry> | ||
113 | </row> | ||
114 | <row> | ||
115 | <entry></entry> | ||
116 | <entry>read very_important_count (6)</entry> | ||
117 | </row> | ||
118 | <row> | ||
119 | <entry></entry> | ||
120 | <entry>add 1 (7)</entry> | ||
121 | </row> | ||
122 | <row> | ||
123 | <entry></entry> | ||
124 | <entry>write very_important_count (7)</entry> | ||
125 | </row> | ||
126 | </tbody> | ||
127 | |||
128 | </tgroup> | ||
129 | </table> | ||
130 | |||
131 | <para> | ||
132 | This is what might happen: | ||
133 | </para> | ||
134 | |||
135 | <table> | ||
136 | <title>Possible Results</title> | ||
137 | |||
138 | <tgroup cols="2" align="left"> | ||
139 | <thead> | ||
140 | <row> | ||
141 | <entry>Instance 1</entry> | ||
142 | <entry>Instance 2</entry> | ||
143 | </row> | ||
144 | </thead> | ||
145 | |||
146 | <tbody> | ||
147 | <row> | ||
148 | <entry>read very_important_count (5)</entry> | ||
149 | <entry></entry> | ||
150 | </row> | ||
151 | <row> | ||
152 | <entry></entry> | ||
153 | <entry>read very_important_count (5)</entry> | ||
154 | </row> | ||
155 | <row> | ||
156 | <entry>add 1 (6)</entry> | ||
157 | <entry></entry> | ||
158 | </row> | ||
159 | <row> | ||
160 | <entry></entry> | ||
161 | <entry>add 1 (6)</entry> | ||
162 | </row> | ||
163 | <row> | ||
164 | <entry>write very_important_count (6)</entry> | ||
165 | <entry></entry> | ||
166 | </row> | ||
167 | <row> | ||
168 | <entry></entry> | ||
169 | <entry>write very_important_count (6)</entry> | ||
170 | </row> | ||
171 | </tbody> | ||
172 | </tgroup> | ||
173 | </table> | ||
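The two tables can be reproduced deterministically by splitting the increment into its read, add and write steps and replaying them in each order. This is a single-threaded simulation written for illustration, and the helper names are hypothetical. Real races depend on timing and are rarely this reproducible.

```c
#include <assert.h>

/* One instance's private copy of the value it read (a register). */
struct instance {
	int reg;
};

static void step_read(struct instance *in, const int *counter) { in->reg = *counter; }
static void step_add(struct instance *in)                      { in->reg += 1; }
static void step_write(const struct instance *in, int *counter) { *counter = in->reg; }

/* "Expected Results": instance 1 finishes before instance 2 starts. */
static int run_serial(int start)
{
	int counter = start;
	struct instance a, b;

	step_read(&a, &counter); step_add(&a); step_write(&a, &counter);
	step_read(&b, &counter); step_add(&b); step_write(&b, &counter);
	return counter;
}

/* "Possible Results": both read before either writes; one update is lost. */
static int run_interleaved(int start)
{
	int counter = start;
	struct instance a, b;

	step_read(&a, &counter);
	step_read(&b, &counter);
	step_add(&a);
	step_add(&b);
	step_write(&a, &counter);
	step_write(&b, &counter);
	return counter;
}
```

Starting from 5, the serial order ends at 7 and the interleaved order ends at 6, matching the two tables.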
174 | |||
175 | <sect1 id="race-condition"> | ||
176 | <title>Race Conditions and Critical Regions</title> | ||
177 | <para> | ||
178 | This overlap, where the result depends on the | ||
179 | relative timing of multiple tasks, is called a <firstterm>race condition</firstterm>. | ||
180 | The piece of code containing the concurrency issue is called a | ||
181 | <firstterm>critical region</firstterm>. Especially since Linux started running | ||
182 | on SMP machines, race conditions have become one of the major issues in kernel | ||
183 | design and implementation. | ||
184 | </para> | ||
185 | <para> | ||
186 | Preemption can have the same effect, even if there is only one | ||
187 | CPU: by preempting one task during the critical region, we have | ||
188 | exactly the same race condition. In this case the thread which | ||
189 | preempts might run the critical region itself. | ||
190 | </para> | ||
191 | <para> | ||
192 | The solution is to recognize when these simultaneous accesses | ||
193 | occur, and use locks to make sure that only one instance can | ||
194 | enter the critical region at any time. There are many | ||
195 | friendly primitives in the Linux kernel to help you do this. | ||
196 | And then there are the unfriendly primitives, but I'll pretend | ||
197 | they don't exist. | ||
198 | </para> | ||
199 | </sect1> | ||
200 | </chapter> | ||
201 | |||
202 | <chapter id="locks"> | ||
203 | <title>Locking in the Linux Kernel</title> | ||
204 | |||
205 | <para> | ||
206 | If I could give you one piece of advice: never sleep with anyone | ||
207 | crazier than yourself. But if I had to give you advice on | ||
208 | locking: <emphasis>keep it simple</emphasis>. | ||
209 | </para> | ||
210 | |||
211 | <para> | ||
212 | Be reluctant to introduce new locks. | ||
213 | </para> | ||
214 | |||
215 | <para> | ||
216 | Strangely enough, this last one is the exact reverse of my advice when | ||
217 | you <emphasis>have</emphasis> slept with someone crazier than yourself. | ||
218 | And you should think about getting a big dog. | ||
219 | </para> | ||
220 | |||
221 | <sect1 id="lock-intro"> | ||
222 | <title>Two Main Types of Kernel Locks: Spinlocks and Mutexes</title> | ||
223 | |||
224 | <para> | ||
225 | There are two main types of kernel locks. The fundamental type | ||
226 | is the spinlock | ||
227 | (<filename class="headerfile">include/asm/spinlock.h</filename>), | ||
228 | which is a very simple single-holder lock: if you can't get the | ||
229 | spinlock, you keep trying (spinning) until you can. Spinlocks are | ||
230 | very small and fast, and can be used anywhere. | ||
231 | </para> | ||
232 | <para> | ||
233 | The second type is a mutex | ||
234 | (<filename class="headerfile">include/linux/mutex.h</filename>): it | ||
235 | is like a spinlock, but you may block holding a mutex. | ||
236 | If you can't lock a mutex, your task will suspend itself, and be woken | ||
237 | up when the mutex is released. This means the CPU can do something | ||
238 | else while you are waiting. There are many cases when you simply | ||
239 | can't sleep (see <xref linkend="sleeping-things"/>), and so have to | ||
240 | use a spinlock instead. | ||
241 | </para> | ||
242 | <para> | ||
243 | Neither type of lock is recursive: see | ||
244 | <xref linkend="deadlock"/>. | ||
245 | </para> | ||
246 | </sect1> | ||
247 | |||
248 | <sect1 id="uniprocessor"> | ||
249 | <title>Locks and Uniprocessor Kernels</title> | ||
250 | |||
251 | <para> | ||
252 | For kernels compiled without <symbol>CONFIG_SMP</symbol>, and | ||
253 | without <symbol>CONFIG_PREEMPT</symbol>, spinlocks do not exist at | ||
254 | all. This is an excellent design decision: when no-one else can | ||
255 | run at the same time, there is no reason to have a lock. | ||
256 | </para> | ||
257 | |||
258 | <para> | ||
259 | If the kernel is compiled without <symbol>CONFIG_SMP</symbol>, | ||
260 | but <symbol>CONFIG_PREEMPT</symbol> is set, then spinlocks | ||
261 | simply disable preemption, which is sufficient to prevent any | ||
262 | races. For most purposes, we can think of preemption as | ||
263 | equivalent to SMP, and not worry about it separately. | ||
264 | </para> | ||
265 | |||
266 | <para> | ||
You should always test your locking code with <symbol>CONFIG_SMP</symbol>
and <symbol>CONFIG_PREEMPT</symbol> enabled, even if you don't have an SMP
test box, because doing so will still catch some kinds of locking bugs.
270 | </para> | ||
271 | |||
272 | <para> | ||
273 | Mutexes still exist, because they are required for | ||
274 | synchronization between <firstterm linkend="gloss-usercontext">user | ||
275 | contexts</firstterm>, as we will see below. | ||
276 | </para> | ||
277 | </sect1> | ||
278 | |||
279 | <sect1 id="usercontextlocking"> | ||
280 | <title>Locking Only In User Context</title> | ||
281 | |||
282 | <para> | ||
283 | If you have a data structure which is only ever accessed from | ||
284 | user context, then you can use a simple mutex | ||
285 | (<filename>include/linux/mutex.h</filename>) to protect it. This | ||
286 | is the most trivial case: you initialize the mutex. Then you can | ||
287 | call <function>mutex_lock_interruptible()</function> to grab the mutex, | ||
288 | and <function>mutex_unlock()</function> to release it. There is also a | ||
289 | <function>mutex_lock()</function>, which should be avoided, because it | ||
290 | will not return if a signal is received. | ||
291 | </para> | ||
292 | |||
293 | <para> | ||
294 | Example: <filename>net/netfilter/nf_sockopt.c</filename> allows | ||
295 | registration of new <function>setsockopt()</function> and | ||
296 | <function>getsockopt()</function> calls, with | ||
297 | <function>nf_register_sockopt()</function>. Registration and | ||
298 | de-registration are only done on module load and unload (and boot | ||
299 | time, where there is no concurrency), and the list of registrations | ||
300 | is only consulted for an unknown <function>setsockopt()</function> | ||
301 | or <function>getsockopt()</function> system call. The | ||
302 | <varname>nf_sockopt_mutex</varname> is perfect to protect this, | ||
303 | especially since the setsockopt and getsockopt calls may well | ||
304 | sleep. | ||
305 | </para> | ||
306 | </sect1> | ||
307 | |||
308 | <sect1 id="lock-user-bh"> | ||
309 | <title>Locking Between User Context and Softirqs</title> | ||
310 | |||
311 | <para> | ||
312 | If a <firstterm linkend="gloss-softirq">softirq</firstterm> shares | ||
313 | data with user context, you have two problems. Firstly, the current | ||
314 | user context can be interrupted by a softirq, and secondly, the | ||
315 | critical region could be entered from another CPU. This is where | ||
316 | <function>spin_lock_bh()</function> | ||
317 | (<filename class="headerfile">include/linux/spinlock.h</filename>) is | ||
318 | used. It disables softirqs on that CPU, then grabs the lock. | ||
<function>spin_unlock_bh()</function> does the reverse. (The
'_bh' suffix is a historical reference to "Bottom Halves", the
old name for software interrupts. In a perfect world it would be
called <function>spin_lock_softirq()</function>.)
323 | </para> | ||
324 | |||
325 | <para> | ||
326 | Note that you can also use <function>spin_lock_irq()</function> | ||
327 | or <function>spin_lock_irqsave()</function> here, which stop | ||
328 | hardware interrupts as well: see <xref linkend="hardirq-context"/>. | ||
329 | </para> | ||
330 | |||
331 | <para> | ||
332 | This works perfectly for <firstterm linkend="gloss-up"><acronym>UP | ||
333 | </acronym></firstterm> as well: the spin lock vanishes, and this macro | ||
334 | simply becomes <function>local_bh_disable()</function> | ||
335 | (<filename class="headerfile">include/linux/interrupt.h</filename>), which | ||
336 | protects you from the softirq being run. | ||
337 | </para> | ||
338 | </sect1> | ||
339 | |||
340 | <sect1 id="lock-user-tasklet"> | ||
341 | <title>Locking Between User Context and Tasklets</title> | ||
342 | |||
343 | <para> | ||
344 | This is exactly the same as above, because <firstterm | ||
345 | linkend="gloss-tasklet">tasklets</firstterm> are actually run | ||
346 | from a softirq. | ||
347 | </para> | ||
348 | </sect1> | ||
349 | |||
350 | <sect1 id="lock-user-timers"> | ||
351 | <title>Locking Between User Context and Timers</title> | ||
352 | |||
353 | <para> | ||
354 | This, too, is exactly the same as above, because <firstterm | ||
355 | linkend="gloss-timers">timers</firstterm> are actually run from | ||
356 | a softirq. From a locking point of view, tasklets and timers | ||
357 | are identical. | ||
358 | </para> | ||
359 | </sect1> | ||
360 | |||
361 | <sect1 id="lock-tasklets"> | ||
362 | <title>Locking Between Tasklets/Timers</title> | ||
363 | |||
364 | <para> | ||
365 | Sometimes a tasklet or timer might want to share data with | ||
366 | another tasklet or timer. | ||
367 | </para> | ||
368 | |||
369 | <sect2 id="lock-tasklets-same"> | ||
370 | <title>The Same Tasklet/Timer</title> | ||
371 | <para> | ||
372 | Since a tasklet is never run on two CPUs at once, you don't | ||
373 | need to worry about your tasklet being reentrant (running | ||
374 | twice at once), even on SMP. | ||
375 | </para> | ||
376 | </sect2> | ||
377 | |||
378 | <sect2 id="lock-tasklets-different"> | ||
379 | <title>Different Tasklets/Timers</title> | ||
380 | <para> | ||
If another tasklet or timer wants
to share data with your tasklet or timer, you will both need to use
383 | <function>spin_lock()</function> and | ||
384 | <function>spin_unlock()</function> calls. | ||
385 | <function>spin_lock_bh()</function> is | ||
386 | unnecessary here, as you are already in a tasklet, and | ||
387 | none will be run on the same CPU. | ||
388 | </para> | ||
389 | </sect2> | ||
390 | </sect1> | ||
391 | |||
392 | <sect1 id="lock-softirqs"> | ||
393 | <title>Locking Between Softirqs</title> | ||
394 | |||
395 | <para> | ||
396 | Often a softirq might | ||
397 | want to share data with itself or a tasklet/timer. | ||
398 | </para> | ||
399 | |||
400 | <sect2 id="lock-softirqs-same"> | ||
401 | <title>The Same Softirq</title> | ||
402 | |||
403 | <para> | ||
404 | The same softirq can run on the other CPUs: you can use a | ||
405 | per-CPU array (see <xref linkend="per-cpu"/>) for better | ||
406 | performance. If you're going so far as to use a softirq, | ||
407 | you probably care about scalable performance enough | ||
408 | to justify the extra complexity. | ||
409 | </para> | ||
410 | |||
411 | <para> | ||
412 | You'll need to use <function>spin_lock()</function> and | ||
413 | <function>spin_unlock()</function> for shared data. | ||
414 | </para> | ||
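<para>
The per-CPU suggestion above can be sketched as a userspace model. It assumes a fixed, hypothetical count of four CPUs and passes the CPU number explicitly. Each CPU only ever updates its own slot, so the update path needs no shared lock. Only a reader that sums all the slots looks at other CPUs' data. In the kernel you would use the real per-CPU primitives, such as <function>DEFINE_PER_CPU()</function> and <function>this_cpu_inc()</function>, rather than a plain array. All names below are invented for illustration.
</para>
<programlisting>
#define NR_TOY_CPUS 4	/* hypothetical CPU count for this sketch */

/* One counter slot per CPU. */
struct toy_percpu_counter {
	long slot[NR_TOY_CPUS];
};

/* Called by code running on 'cpu'; it touches only that CPU's slot,
 * so two CPUs never write the same memory and no lock is taken. */
void toy_count_on(struct toy_percpu_counter *c, int cpu)
{
	c->slot[cpu]++;
}

/* The reader walks every slot; with real concurrency the total is only
 * a snapshot, since other CPUs may still be counting. */
long toy_count_sum(const struct toy_percpu_counter *c)
{
	long sum = 0;

	for (int i = 0; i < NR_TOY_CPUS; i++)
		sum += c->slot[i];
	return sum;
}
</programlisting>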
415 | </sect2> | ||
416 | |||
417 | <sect2 id="lock-softirqs-different"> | ||
418 | <title>Different Softirqs</title> | ||
419 | |||
420 | <para> | ||
421 | You'll need to use <function>spin_lock()</function> and | ||
422 | <function>spin_unlock()</function> for shared data, whether it | ||
423 | be a timer, tasklet, different softirq or the same or another | ||
424 | softirq: any of them could be running on a different CPU. | ||
425 | </para> | ||
426 | </sect2> | ||
427 | </sect1> | ||
428 | </chapter> | ||
429 | |||
430 | <chapter id="hardirq-context"> | ||
431 | <title>Hard IRQ Context</title> | ||
432 | |||
433 | <para> | ||
434 | Hardware interrupts usually communicate with a | ||
435 | tasklet or softirq. Frequently this involves putting work in a | ||
436 | queue, which the softirq will take out. | ||
437 | </para> | ||
438 | |||
439 | <sect1 id="hardirq-softirq"> | ||
440 | <title>Locking Between Hard IRQ and Softirqs/Tasklets</title> | ||
441 | |||
442 | <para> | ||
443 | If a hardware irq handler shares data with a softirq, you have | ||
444 | two concerns. Firstly, the softirq processing can be | ||
445 | interrupted by a hardware interrupt, and secondly, the | ||
446 | critical region could be entered by a hardware interrupt on | ||
447 | another CPU. This is where <function>spin_lock_irq()</function> is | ||
used. It is defined to disable interrupts on that CPU, then grab
449 | the lock. <function>spin_unlock_irq()</function> does the reverse. | ||
450 | </para> | ||
451 | |||
452 | <para> | ||
The irq handler does not need to use
454 | <function>spin_lock_irq()</function>, because the softirq cannot | ||
455 | run while the irq handler is running: it can use | ||
456 | <function>spin_lock()</function>, which is slightly faster. The | ||
457 | only exception would be if a different hardware irq handler uses | ||
458 | the same lock: <function>spin_lock_irq()</function> will stop | ||
459 | that from interrupting us. | ||
460 | </para> | ||
461 | |||
462 | <para> | ||
463 | This works perfectly for UP as well: the spin lock vanishes, | ||
464 | and this macro simply becomes <function>local_irq_disable()</function> | ||
(<filename class="headerfile">include/linux/irqflags.h</filename>), which
466 | protects you from the softirq/tasklet/BH being run. | ||
467 | </para> | ||
468 | |||
469 | <para> | ||
470 | <function>spin_lock_irqsave()</function> | ||
471 | (<filename>include/linux/spinlock.h</filename>) is a variant | ||
472 | which saves whether interrupts were on or off in a flags word, | ||
473 | which is passed to <function>spin_unlock_irqrestore()</function>. This | ||
means that the same code can be used inside a hard irq handler (where
475 | interrupts are already off) and in softirqs (where the irq | ||
476 | disabling is required). | ||
477 | </para> | ||
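<para>
Why the saved flags matter can be shown with a small userspace simulation. A single hypothetical boolean, <varname>sim_irqs_on</varname>, stands in for the CPU's interrupt-enable state. The invented helpers mimic the unconditional disable/enable of <function>spin_lock_irq()</function>/<function>spin_unlock_irq()</function> versus the save/restore of <function>spin_lock_irqsave()</function>/<function>spin_unlock_irqrestore()</function>. Note that the real <function>spin_lock_irqsave()</function> is a macro that stores into its <parameter>flags</parameter> argument, whereas this sketch returns the value.
</para>
<programlisting>
#include <stdbool.h>

/* Simulated interrupt-enable state of one CPU (hypothetical). */
bool sim_irqs_on = true;

/* Like spin_lock_irq(): unconditionally disable. */
void sim_irq_disable(void) { sim_irqs_on = false; }

/* Like spin_unlock_irq(): unconditionally re-enable. */
void sim_irq_enable(void) { sim_irqs_on = true; }

/* Like spin_lock_irqsave(): remember the old state, then disable. */
unsigned long sim_irq_save(void)
{
	unsigned long flags = sim_irqs_on;

	sim_irqs_on = false;
	return flags;
}

/* Like spin_unlock_irqrestore(): put back exactly what was saved. */
void sim_irq_restore(unsigned long flags)
{
	sim_irqs_on = flags != 0;
}

/* Safe from any context: the caller's state is left as it was. */
void helper_irqsave(void)
{
	unsigned long flags = sim_irq_save();
	/* ... critical section ... */
	sim_irq_restore(flags);
}

/* Only correct if interrupts were on: otherwise it re-enables them. */
void helper_irq(void)
{
	sim_irq_disable();
	/* ... critical section ... */
	sim_irq_enable();
}
</programlisting>
<para>
Called with interrupts already off, as in a hard irq handler, <function>helper_irqsave()</function> leaves them off, while <function>helper_irq()</function> switches them back on behind the caller's back.
</para>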
478 | |||
479 | <para> | ||
480 | Note that softirqs (and hence tasklets and timers) are run on | ||
481 | return from hardware interrupts, so | ||
482 | <function>spin_lock_irq()</function> also stops these. In that | ||
483 | sense, <function>spin_lock_irqsave()</function> is the most | ||
484 | general and powerful locking function. | ||
485 | </para> | ||
486 | |||
487 | </sect1> | ||
488 | <sect1 id="hardirq-hardirq"> | ||
489 | <title>Locking Between Two Hard IRQ Handlers</title> | ||
490 | <para> | ||
491 | It is rare to have to share data between two IRQ handlers, but | ||
492 | if you do, <function>spin_lock_irqsave()</function> should be | ||
493 | used: it is architecture-specific whether all interrupts are | ||
494 | disabled inside irq handlers themselves. | ||
495 | </para> | ||
496 | </sect1> | ||
497 | |||
498 | </chapter> | ||
499 | |||
500 | <chapter id="cheatsheet"> | ||
501 | <title>Cheat Sheet For Locking</title> | ||
502 | <para> | ||
503 | Pete Zaitcev gives the following summary: | ||
504 | </para> | ||
505 | <itemizedlist> | ||
506 | <listitem> | ||
507 | <para> | ||
If you are in a process context (any syscall) and want to
lock other processes out, use a mutex. You can take a mutex
and sleep (<function>copy_from_user*()</function> or
<function>kmalloc(x,GFP_KERNEL)</function>).
512 | </para> | ||
513 | </listitem> | ||
514 | <listitem> | ||
515 | <para> | ||
516 | Otherwise (== data can be touched in an interrupt), use | ||
517 | <function>spin_lock_irqsave()</function> and | ||
518 | <function>spin_unlock_irqrestore()</function>. | ||
519 | </para> | ||
520 | </listitem> | ||
521 | <listitem> | ||
522 | <para> | ||
Avoid holding a spinlock for more than 5 lines of code and
524 | across any function call (except accessors like | ||
525 | <function>readb</function>). | ||
526 | </para> | ||
527 | </listitem> | ||
528 | </itemizedlist> | ||
529 | |||
530 | <sect1 id="minimum-lock-reqirements"> | ||
531 | <title>Table of Minimum Requirements</title> | ||
532 | |||
533 | <para> The following table lists the <emphasis>minimum</emphasis> | ||
534 | locking requirements between various contexts. In some cases, | ||
535 | the same context can only be running on one CPU at a time, so | ||
no locking is required for that context (e.g. a particular
thread can only run on one CPU at a time, but if it needs to
share data with another thread, locking is required).
539 | </para> | ||
540 | <para> | ||
541 | Remember the advice above: you can always use | ||
542 | <function>spin_lock_irqsave()</function>, which is a superset | ||
543 | of all other spinlock primitives. | ||
544 | </para> | ||
545 | |||
546 | <table> | ||
547 | <title>Table of Locking Requirements</title> | ||
548 | <tgroup cols="11"> | ||
549 | <tbody> | ||
550 | |||
551 | <row> | ||
552 | <entry></entry> | ||
553 | <entry>IRQ Handler A</entry> | ||
554 | <entry>IRQ Handler B</entry> | ||
555 | <entry>Softirq A</entry> | ||
556 | <entry>Softirq B</entry> | ||
557 | <entry>Tasklet A</entry> | ||
558 | <entry>Tasklet B</entry> | ||
559 | <entry>Timer A</entry> | ||
560 | <entry>Timer B</entry> | ||
561 | <entry>User Context A</entry> | ||
562 | <entry>User Context B</entry> | ||
563 | </row> | ||
564 | |||
565 | <row> | ||
566 | <entry>IRQ Handler A</entry> | ||
567 | <entry>None</entry> | ||
568 | </row> | ||
569 | |||
570 | <row> | ||
571 | <entry>IRQ Handler B</entry> | ||
572 | <entry>SLIS</entry> | ||
573 | <entry>None</entry> | ||
574 | </row> | ||
575 | |||
576 | <row> | ||
577 | <entry>Softirq A</entry> | ||
578 | <entry>SLI</entry> | ||
579 | <entry>SLI</entry> | ||
580 | <entry>SL</entry> | ||
581 | </row> | ||
582 | |||
583 | <row> | ||
584 | <entry>Softirq B</entry> | ||
585 | <entry>SLI</entry> | ||
586 | <entry>SLI</entry> | ||
587 | <entry>SL</entry> | ||
588 | <entry>SL</entry> | ||
589 | </row> | ||
590 | |||
591 | <row> | ||
592 | <entry>Tasklet A</entry> | ||
593 | <entry>SLI</entry> | ||
594 | <entry>SLI</entry> | ||
595 | <entry>SL</entry> | ||
596 | <entry>SL</entry> | ||
597 | <entry>None</entry> | ||
598 | </row> | ||
599 | |||
600 | <row> | ||
601 | <entry>Tasklet B</entry> | ||
602 | <entry>SLI</entry> | ||
603 | <entry>SLI</entry> | ||
604 | <entry>SL</entry> | ||
605 | <entry>SL</entry> | ||
606 | <entry>SL</entry> | ||
607 | <entry>None</entry> | ||
608 | </row> | ||
609 | |||
610 | <row> | ||
611 | <entry>Timer A</entry> | ||
612 | <entry>SLI</entry> | ||
613 | <entry>SLI</entry> | ||
614 | <entry>SL</entry> | ||
615 | <entry>SL</entry> | ||
616 | <entry>SL</entry> | ||
617 | <entry>SL</entry> | ||
618 | <entry>None</entry> | ||
619 | </row> | ||
620 | |||
621 | <row> | ||
622 | <entry>Timer B</entry> | ||
623 | <entry>SLI</entry> | ||
624 | <entry>SLI</entry> | ||
625 | <entry>SL</entry> | ||
626 | <entry>SL</entry> | ||
627 | <entry>SL</entry> | ||
628 | <entry>SL</entry> | ||
629 | <entry>SL</entry> | ||
630 | <entry>None</entry> | ||
631 | </row> | ||
632 | |||
633 | <row> | ||
634 | <entry>User Context A</entry> | ||
635 | <entry>SLI</entry> | ||
636 | <entry>SLI</entry> | ||
637 | <entry>SLBH</entry> | ||
638 | <entry>SLBH</entry> | ||
639 | <entry>SLBH</entry> | ||
640 | <entry>SLBH</entry> | ||
641 | <entry>SLBH</entry> | ||
642 | <entry>SLBH</entry> | ||
643 | <entry>None</entry> | ||
644 | </row> | ||
645 | |||
646 | <row> | ||
647 | <entry>User Context B</entry> | ||
648 | <entry>SLI</entry> | ||
649 | <entry>SLI</entry> | ||
650 | <entry>SLBH</entry> | ||
651 | <entry>SLBH</entry> | ||
652 | <entry>SLBH</entry> | ||
653 | <entry>SLBH</entry> | ||
654 | <entry>SLBH</entry> | ||
655 | <entry>SLBH</entry> | ||
656 | <entry>MLI</entry> | ||
657 | <entry>None</entry> | ||
658 | </row> | ||
659 | |||
660 | </tbody> | ||
661 | </tgroup> | ||
662 | </table> | ||
663 | |||
664 | <table> | ||
665 | <title>Legend for Locking Requirements Table</title> | ||
666 | <tgroup cols="2"> | ||
667 | <tbody> | ||
668 | |||
669 | <row> | ||
670 | <entry>SLIS</entry> | ||
671 | <entry>spin_lock_irqsave</entry> | ||
672 | </row> | ||
673 | <row> | ||
674 | <entry>SLI</entry> | ||
675 | <entry>spin_lock_irq</entry> | ||
676 | </row> | ||
677 | <row> | ||
678 | <entry>SL</entry> | ||
679 | <entry>spin_lock</entry> | ||
680 | </row> | ||
681 | <row> | ||
682 | <entry>SLBH</entry> | ||
683 | <entry>spin_lock_bh</entry> | ||
684 | </row> | ||
685 | <row> | ||
686 | <entry>MLI</entry> | ||
687 | <entry>mutex_lock_interruptible</entry> | ||
688 | </row> | ||
689 | |||
690 | </tbody> | ||
691 | </tgroup> | ||
692 | </table> | ||
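<para>
Read row by row, the table reduces to a handful of rules. The sketch below encodes them in a hypothetical helper. <function>min_lock()</function> and the enum names are invented for illustration, and the helper only restates the table. Where the table says SLI between a hardware irq handler and another context, the Hard IRQ chapter explains that the irq handler's own side can use plain <function>spin_lock()</function>. When in doubt, <function>spin_lock_irqsave()</function> remains a safe superset.
</para>
<programlisting>
/* Hypothetical encoding of the Table of Locking Requirements. */
enum ctx_kind { CTX_IRQ, CTX_SOFTIRQ, CTX_TASKLET, CTX_TIMER, CTX_USER };

enum lock_req {
	REQ_NONE,	/* None */
	REQ_SL,		/* spin_lock */
	REQ_SLBH,	/* spin_lock_bh */
	REQ_SLI,	/* spin_lock_irq */
	REQ_SLIS,	/* spin_lock_irqsave */
	REQ_MLI		/* mutex_lock_interruptible */
};

/* Minimum requirement when contexts a and b share data.  same_instance
 * is non-zero when both are the very same handler/softirq/tasklet/
 * timer/thread (the diagonal of the table). */
enum lock_req min_lock(enum ctx_kind a, enum ctx_kind b, int same_instance)
{
	if (same_instance)
		/* Only a softirq can run concurrently with itself, on another CPU. */
		return a == CTX_SOFTIRQ ? REQ_SL : REQ_NONE;

	if (a == CTX_IRQ && b == CTX_IRQ)
		return REQ_SLIS;	/* two different irq handlers */
	if (a == CTX_IRQ || b == CTX_IRQ)
		return REQ_SLI;		/* irq handler vs anything else */
	if (a == CTX_USER && b == CTX_USER)
		return REQ_MLI;		/* two user contexts */
	if (a == CTX_USER || b == CTX_USER)
		return REQ_SLBH;	/* user vs softirq/tasklet/timer */
	return REQ_SL;			/* among softirqs, tasklets, timers */
}
</programlisting>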
693 | |||
694 | </sect1> | ||
695 | </chapter> | ||
696 | |||
697 | <chapter id="trylock-functions"> | ||
698 | <title>The trylock Functions</title> | ||
699 | <para> | ||
There are functions that try to acquire a lock only once and immediately
return a value indicating whether they succeeded. They are useful when you
do not need the protected data if some other thread is already holding the
lock. If you later do need access to that data, you must still acquire the
lock.
705 | </para> | ||
706 | |||
707 | <para> | ||
<function>spin_trylock()</function> does not spin but returns non-zero if
it acquires the spinlock on the first try or 0 if not. Like
<function>spin_lock()</function>, it can be used in all contexts, but you
must have disabled any contexts that might interrupt you and try to take
the same lock.
712 | </para> | ||
713 | |||
714 | <para> | ||
715 | <function>mutex_trylock()</function> does not suspend your task | ||
716 | but returns non-zero if it could lock the mutex on the first try | ||
717 | or 0 if not. This function cannot be safely used in hardware or software | ||
718 | interrupt contexts despite not sleeping. | ||
719 | </para> | ||
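<para>
A typical use is an opportunistic update that is simply skipped when the lock is busy. The sketch below is a hypothetical userspace analogue built on C11 atomics. <structname>toy_lock</structname>, <function>toy_trylock()</function> and <function>try_update_stats()</function> are invented names, with <function>toy_trylock()</function> following the same return convention as <function>spin_trylock()</function>.
</para>
<programlisting>
#include <stdatomic.h>

/* Hypothetical toy lock, not kernel API: 0 = free, 1 = held. */
struct toy_lock {
	atomic_int locked;
};

/* Non-zero if we took the lock on the first try, 0 if it was held. */
int toy_trylock(struct toy_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

void toy_unlock(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

struct toy_lock stats_lock;	/* static storage: starts zeroed, i.e. free */
long stats_updates;		/* protected by stats_lock */

/* Update the statistics if the lock is free; if someone else holds it,
 * skip this round instead of waiting.  Returns 1 if updated, else 0. */
int try_update_stats(void)
{
	if (!toy_trylock(&stats_lock))
		return 0;
	stats_updates++;
	toy_unlock(&stats_lock);
	return 1;
}
</programlisting>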
720 | </chapter> | ||
721 | |||
722 | <chapter id="Examples"> | ||
723 | <title>Common Examples</title> | ||
724 | <para> | ||
725 | Let's step through a simple example: a cache of number to name | ||
726 | mappings. The cache keeps a count of how often each of the objects is | ||
used, and when it gets full, throws out the least used one.
729 | </para> | ||
730 | |||
731 | <sect1 id="examples-usercontext"> | ||
732 | <title>All In User Context</title> | ||
733 | <para> | ||
734 | For our first example, we assume that all operations are in user | ||
735 | context (ie. from system calls), so we can sleep. This means we can | ||
736 | use a mutex to protect the cache and all the objects within | ||
737 | it. Here's the code: | ||
738 | </para> | ||
739 | |||
740 | <programlisting> | ||
741 | #include <linux/list.h> | ||
742 | #include <linux/slab.h> | ||
743 | #include <linux/string.h> | ||
744 | #include <linux/mutex.h> | ||
745 | #include <asm/errno.h> | ||
746 | |||
747 | struct object | ||
748 | { | ||
749 | struct list_head list; | ||
750 | int id; | ||
751 | char name[32]; | ||
752 | int popularity; | ||
753 | }; | ||
754 | |||
755 | /* Protects the cache, cache_num, and the objects within it */ | ||
756 | static DEFINE_MUTEX(cache_lock); | ||
757 | static LIST_HEAD(cache); | ||
758 | static unsigned int cache_num = 0; | ||
759 | #define MAX_CACHE_SIZE 10 | ||
760 | |||
761 | /* Must be holding cache_lock */ | ||
762 | static struct object *__cache_find(int id) | ||
763 | { | ||
764 | struct object *i; | ||
765 | |||
766 | list_for_each_entry(i, &cache, list) | ||
767 | if (i->id == id) { | ||
768 | i->popularity++; | ||
769 | return i; | ||
770 | } | ||
771 | return NULL; | ||
772 | } | ||
773 | |||
774 | /* Must be holding cache_lock */ | ||
775 | static void __cache_delete(struct object *obj) | ||
776 | { | ||
777 | BUG_ON(!obj); | ||
778 | list_del(&obj->list); | ||
779 | kfree(obj); | ||
780 | cache_num--; | ||
781 | } | ||
782 | |||
783 | /* Must be holding cache_lock */ | ||
784 | static void __cache_add(struct object *obj) | ||
785 | { | ||
786 | list_add(&obj->list, &cache); | ||
787 | if (++cache_num > MAX_CACHE_SIZE) { | ||
788 | struct object *i, *outcast = NULL; | ||
789 | list_for_each_entry(i, &cache, list) { | ||
790 | if (!outcast || i->popularity < outcast->popularity) | ||
791 | outcast = i; | ||
792 | } | ||
793 | __cache_delete(outcast); | ||
794 | } | ||
795 | } | ||
796 | |||
797 | int cache_add(int id, const char *name) | ||
798 | { | ||
799 | struct object *obj; | ||
800 | |||
801 | if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL) | ||
802 | return -ENOMEM; | ||
803 | |||
804 | strlcpy(obj->name, name, sizeof(obj->name)); | ||
805 | obj->id = id; | ||
806 | obj->popularity = 0; | ||
807 | |||
808 | mutex_lock(&cache_lock); | ||
809 | __cache_add(obj); | ||
810 | mutex_unlock(&cache_lock); | ||
811 | return 0; | ||
812 | } | ||
813 | |||
814 | void cache_delete(int id) | ||
815 | { | ||
816 | mutex_lock(&cache_lock); | ||
817 | __cache_delete(__cache_find(id)); | ||
818 | mutex_unlock(&cache_lock); | ||
819 | } | ||
820 | |||
821 | int cache_find(int id, char *name) | ||
822 | { | ||
823 | struct object *obj; | ||
824 | int ret = -ENOENT; | ||
825 | |||
826 | mutex_lock(&cache_lock); | ||
827 | obj = __cache_find(id); | ||
828 | if (obj) { | ||
829 | ret = 0; | ||
830 | strcpy(name, obj->name); | ||
831 | } | ||
832 | mutex_unlock(&cache_lock); | ||
833 | return ret; | ||
834 | } | ||
835 | </programlisting> | ||
836 | |||
837 | <para> | ||
838 | Note that we always make sure we have the cache_lock when we add, | ||
839 | delete, or look up the cache: both the cache infrastructure itself and | ||
840 | the contents of the objects are protected by the lock. In this case | ||
841 | it's easy, since we copy the data for the user, and never let them | ||
842 | access the objects directly. | ||
843 | </para> | ||
844 | <para> | ||
845 | There is a slight (and common) optimization here: in | ||
846 | <function>cache_add</function> we set up the fields of the object | ||
847 | before grabbing the lock. This is safe, as no-one else can access it | ||
848 | until we put it in cache. | ||
849 | </para> | ||
850 | </sect1> | ||
851 | |||
852 | <sect1 id="examples-interrupt"> | ||
853 | <title>Accessing From Interrupt Context</title> | ||
854 | <para> | ||
855 | Now consider the case where <function>cache_find</function> can be | ||
856 | called from interrupt context: either a hardware interrupt or a | ||
softirq. An example would be a timer which deletes objects from the
cache.
859 | </para> | ||
860 | <para> | ||
861 | The change is shown below, in standard patch format: the | ||
862 | <symbol>-</symbol> are lines which are taken away, and the | ||
863 | <symbol>+</symbol> are lines which are added. | ||
864 | </para> | ||
865 | <programlisting> | ||
866 | --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100 | ||
867 | +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100 | ||
868 | @@ -12,7 +12,7 @@ | ||
869 | int popularity; | ||
870 | }; | ||
871 | |||
872 | -static DEFINE_MUTEX(cache_lock); | ||
873 | +static DEFINE_SPINLOCK(cache_lock); | ||
874 | static LIST_HEAD(cache); | ||
875 | static unsigned int cache_num = 0; | ||
876 | #define MAX_CACHE_SIZE 10 | ||
877 | @@ -55,6 +55,7 @@ | ||
878 | int cache_add(int id, const char *name) | ||
879 | { | ||
880 | struct object *obj; | ||
881 | + unsigned long flags; | ||
882 | |||
883 | if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL) | ||
884 | return -ENOMEM; | ||
885 | @@ -63,30 +64,33 @@ | ||
886 | obj->id = id; | ||
887 | obj->popularity = 0; | ||
888 | |||
889 | - mutex_lock(&cache_lock); | ||
890 | + spin_lock_irqsave(&cache_lock, flags); | ||
891 | __cache_add(obj); | ||
892 | - mutex_unlock(&cache_lock); | ||
893 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
894 | return 0; | ||
895 | } | ||
896 | |||
897 | void cache_delete(int id) | ||
898 | { | ||
899 | - mutex_lock(&cache_lock); | ||
900 | + unsigned long flags; | ||
901 | + | ||
902 | + spin_lock_irqsave(&cache_lock, flags); | ||
903 | __cache_delete(__cache_find(id)); | ||
904 | - mutex_unlock(&cache_lock); | ||
905 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
906 | } | ||
907 | |||
908 | int cache_find(int id, char *name) | ||
909 | { | ||
910 | struct object *obj; | ||
911 | int ret = -ENOENT; | ||
912 | + unsigned long flags; | ||
913 | |||
914 | - mutex_lock(&cache_lock); | ||
915 | + spin_lock_irqsave(&cache_lock, flags); | ||
916 | obj = __cache_find(id); | ||
917 | if (obj) { | ||
918 | ret = 0; | ||
919 | strcpy(name, obj->name); | ||
920 | } | ||
921 | - mutex_unlock(&cache_lock); | ||
922 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
923 | return ret; | ||
924 | } | ||
925 | </programlisting> | ||
926 | |||
927 | <para> | ||
Note that <function>spin_lock_irqsave</function> will turn off
interrupts if they are on, and otherwise does nothing (if we are already
in an interrupt handler), hence these functions are safe to call from
any context.
932 | </para> | ||
933 | <para> | ||
934 | Unfortunately, <function>cache_add</function> calls | ||
935 | <function>kmalloc</function> with the <symbol>GFP_KERNEL</symbol> | ||
936 | flag, which is only legal in user context. I have assumed that | ||
937 | <function>cache_add</function> is still only called in user context, | ||
938 | otherwise this should become a parameter to | ||
939 | <function>cache_add</function>. | ||
940 | </para> | ||
941 | </sect1> | ||
942 | <sect1 id="examples-refcnt"> | ||
943 | <title>Exposing Objects Outside This File</title> | ||
944 | <para> | ||
945 | If our objects contained more information, it might not be sufficient | ||
946 | to copy the information in and out: other parts of the code might want | ||
947 | to keep pointers to these objects, for example, rather than looking up | ||
948 | the id every time. This produces two problems. | ||
949 | </para> | ||
950 | <para> | ||
951 | The first problem is that we use the <symbol>cache_lock</symbol> to | ||
952 | protect objects: we'd need to make this non-static so the rest of the | ||
953 | code can use it. This makes locking trickier, as it is no longer all | ||
954 | in one place. | ||
955 | </para> | ||
956 | <para> | ||
957 | The second problem is the lifetime problem: if another structure keeps | ||
958 | a pointer to an object, it presumably expects that pointer to remain | ||
959 | valid. Unfortunately, this is only guaranteed while you hold the | ||
960 | lock, otherwise someone might call <function>cache_delete</function> | ||
961 | and even worse, add another object, re-using the same address. | ||
962 | </para> | ||
963 | <para> | ||
964 | As there is only one lock, you can't hold it forever: no-one else would | ||
965 | get any work done. | ||
966 | </para> | ||
967 | <para> | ||
968 | The solution to this problem is to use a reference count: everyone who | ||
969 | has a pointer to the object increases it when they first get the | ||
970 | object, and drops the reference count when they're finished with it. | ||
971 | Whoever drops it to zero knows it is unused, and can actually delete it. | ||
972 | </para> | ||
973 | <para> | ||
974 | Here is the code: | ||
975 | </para> | ||
976 | |||
977 | <programlisting> | ||
978 | --- cache.c.interrupt 2003-12-09 14:25:43.000000000 +1100 | ||
979 | +++ cache.c.refcnt 2003-12-09 14:33:05.000000000 +1100 | ||
980 | @@ -7,6 +7,7 @@ | ||
981 | struct object | ||
982 | { | ||
983 | struct list_head list; | ||
984 | + unsigned int refcnt; | ||
985 | int id; | ||
986 | char name[32]; | ||
987 | int popularity; | ||
988 | @@ -17,6 +18,35 @@ | ||
989 | static unsigned int cache_num = 0; | ||
990 | #define MAX_CACHE_SIZE 10 | ||
991 | |||
992 | +static void __object_put(struct object *obj) | ||
993 | +{ | ||
994 | + if (--obj->refcnt == 0) | ||
995 | + kfree(obj); | ||
996 | +} | ||
997 | + | ||
998 | +static void __object_get(struct object *obj) | ||
999 | +{ | ||
1000 | + obj->refcnt++; | ||
1001 | +} | ||
1002 | + | ||
1003 | +void object_put(struct object *obj) | ||
1004 | +{ | ||
1005 | + unsigned long flags; | ||
1006 | + | ||
1007 | + spin_lock_irqsave(&cache_lock, flags); | ||
1008 | + __object_put(obj); | ||
1009 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
1010 | +} | ||
1011 | + | ||
1012 | +void object_get(struct object *obj) | ||
1013 | +{ | ||
1014 | + unsigned long flags; | ||
1015 | + | ||
1016 | + spin_lock_irqsave(&cache_lock, flags); | ||
1017 | + __object_get(obj); | ||
1018 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
1019 | +} | ||
1020 | + | ||
1021 | /* Must be holding cache_lock */ | ||
1022 | static struct object *__cache_find(int id) | ||
1023 | { | ||
1024 | @@ -35,6 +65,7 @@ | ||
1025 | { | ||
1026 | BUG_ON(!obj); | ||
1027 | list_del(&obj->list); | ||
1028 | + __object_put(obj); | ||
1029 | cache_num--; | ||
1030 | } | ||
1031 | |||
1032 | @@ -63,6 +94,7 @@ | ||
1033 | strlcpy(obj->name, name, sizeof(obj->name)); | ||
1034 | obj->id = id; | ||
1035 | obj->popularity = 0; | ||
1036 | + obj->refcnt = 1; /* The cache holds a reference */ | ||
1037 | |||
1038 | spin_lock_irqsave(&cache_lock, flags); | ||
1039 | __cache_add(obj); | ||
1040 | @@ -79,18 +111,15 @@ | ||
1041 | spin_unlock_irqrestore(&cache_lock, flags); | ||
1042 | } | ||
1043 | |||
1044 | -int cache_find(int id, char *name) | ||
1045 | +struct object *cache_find(int id) | ||
1046 | { | ||
1047 | struct object *obj; | ||
1048 | - int ret = -ENOENT; | ||
1049 | unsigned long flags; | ||
1050 | |||
1051 | spin_lock_irqsave(&cache_lock, flags); | ||
1052 | obj = __cache_find(id); | ||
1053 | - if (obj) { | ||
1054 | - ret = 0; | ||
1055 | - strcpy(name, obj->name); | ||
1056 | - } | ||
1057 | + if (obj) | ||
1058 | + __object_get(obj); | ||
1059 | spin_unlock_irqrestore(&cache_lock, flags); | ||
1060 | - return ret; | ||
1061 | + return obj; | ||
1062 | } | ||
1063 | </programlisting> | ||
1064 | |||
1065 | <para> | ||
1066 | We encapsulate the reference counting in the standard 'get' and 'put' | ||
1067 | functions. Now we can return the object itself from | ||
1068 | <function>cache_find</function> which has the advantage that the user | ||
1069 | can now sleep holding the object (eg. to | ||
1070 | <function>copy_to_user</function> to name to userspace). | ||
1071 | </para> | ||
1072 | <para> | ||
1073 | The other point to note is that I said a reference should be held for | ||
1074 | every pointer to the object: thus the reference count is 1 when first | ||
1075 | inserted into the cache. In some versions the framework does not hold | ||
1076 | a reference count, but they are more complicated. | ||
1077 | </para> | ||
1078 | |||
1079 | <sect2 id="examples-refcnt-atomic"> | ||
1080 | <title>Using Atomic Operations For The Reference Count</title> | ||
1081 | <para> | ||
1082 | In practice, <type>atomic_t</type> would usually be used for | ||
1083 | <structfield>refcnt</structfield>. There are a number of atomic | ||
1084 | operations defined in | ||
1085 | |||
1086 | <filename class="headerfile">include/asm/atomic.h</filename>: these are | ||
1087 | guaranteed to be seen atomically from all CPUs in the system, so no | ||
1088 | lock is required. In this case, it is simpler than using spinlocks, | ||
1089 | although for anything non-trivial using spinlocks is clearer. The | ||
1090 | <function>atomic_inc</function> and | ||
1091 | <function>atomic_dec_and_test</function> are used instead of the | ||
1092 | standard increment and decrement operators, and the lock is no longer | ||
1093 | used to protect the reference count itself. | ||
1094 | </para> | ||
1095 | |||
1096 | <programlisting> | ||
1097 | --- cache.c.refcnt 2003-12-09 15:00:35.000000000 +1100 | ||
1098 | +++ cache.c.refcnt-atomic 2003-12-11 15:49:42.000000000 +1100 | ||
1099 | @@ -7,7 +7,7 @@ | ||
1100 | struct object | ||
1101 | { | ||
1102 | struct list_head list; | ||
1103 | - unsigned int refcnt; | ||
1104 | + atomic_t refcnt; | ||
1105 | int id; | ||
1106 | char name[32]; | ||
1107 | int popularity; | ||
1108 | @@ -18,33 +18,15 @@ | ||
1109 | static unsigned int cache_num = 0; | ||
1110 | #define MAX_CACHE_SIZE 10 | ||
1111 | |||
1112 | -static void __object_put(struct object *obj) | ||
1113 | -{ | ||
1114 | - if (--obj->refcnt == 0) | ||
1115 | - kfree(obj); | ||
1116 | -} | ||
1117 | - | ||
1118 | -static void __object_get(struct object *obj) | ||
1119 | -{ | ||
1120 | - obj->refcnt++; | ||
1121 | -} | ||
1122 | - | ||
1123 | void object_put(struct object *obj) | ||
1124 | { | ||
1125 | - unsigned long flags; | ||
1126 | - | ||
1127 | - spin_lock_irqsave(&cache_lock, flags); | ||
1128 | - __object_put(obj); | ||
1129 | - spin_unlock_irqrestore(&cache_lock, flags); | ||
1130 | + if (atomic_dec_and_test(&obj->refcnt)) | ||
1131 | + kfree(obj); | ||
1132 | } | ||
1133 | |||
1134 | void object_get(struct object *obj) | ||
1135 | { | ||
1136 | - unsigned long flags; | ||
1137 | - | ||
1138 | - spin_lock_irqsave(&cache_lock, flags); | ||
1139 | - __object_get(obj); | ||
1140 | - spin_unlock_irqrestore(&cache_lock, flags); | ||
1141 | + atomic_inc(&obj->refcnt); | ||
1142 | } | ||
1143 | |||
1144 | /* Must be holding cache_lock */ | ||
1145 | @@ -65,7 +47,7 @@ | ||
1146 | { | ||
1147 | BUG_ON(!obj); | ||
1148 | list_del(&obj->list); | ||
1149 | - __object_put(obj); | ||
1150 | + object_put(obj); | ||
1151 | cache_num--; | ||
1152 | } | ||
1153 | |||
1154 | @@ -94,7 +76,7 @@ | ||
1155 | strlcpy(obj->name, name, sizeof(obj->name)); | ||
1156 | obj->id = id; | ||
1157 | obj->popularity = 0; | ||
1158 | - obj->refcnt = 1; /* The cache holds a reference */ | ||
1159 | + atomic_set(&obj->refcnt, 1); /* The cache holds a reference */ | ||
1160 | |||
1161 | spin_lock_irqsave(&cache_lock, flags); | ||
1162 | __cache_add(obj); | ||
1163 | @@ -119,7 +101,7 @@ | ||
1164 | spin_lock_irqsave(&cache_lock, flags); | ||
1165 | obj = __cache_find(id); | ||
1166 | if (obj) | ||
1167 | - __object_get(obj); | ||
1168 | + object_get(obj); | ||
1169 | spin_unlock_irqrestore(&cache_lock, flags); | ||
1170 | return obj; | ||
1171 | } | ||
1172 | </programlisting> | ||
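<para>
The same get/put logic can be tried outside the kernel. The sketch below is a userspace analogue using C11 atomics, not kernel code. It assumes the following substitutions:
<function>atomic_fetch_add()</function> and <function>atomic_fetch_sub()</function> stand in for <function>atomic_inc()</function> and <function>atomic_dec_and_test()</function>, and <function>free()</function> stands in for <function>kfree()</function>.
<structname>struct toy_object</structname>, <function>toy_alloc()</function>, <function>toy_get()</function>, <function>toy_put()</function> and the <symbol>toy_freed</symbol> counter are hypothetical names.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the guide's struct object. */
struct toy_object {
	atomic_int refcnt;
	int id;
};

/* Counts frees so the behaviour can be observed; not part of the pattern. */
static int toy_freed;

static struct toy_object *toy_alloc(int id)
{
	struct toy_object *obj = malloc(sizeof(*obj));

	assert(obj != NULL);
	obj->id = id;
	atomic_init(&obj->refcnt, 1);	/* like atomic_set(): the cache's reference */
	return obj;
}

/* Like object_get(): a lock-free increment. */
static void toy_get(struct toy_object *obj)
{
	atomic_fetch_add(&obj->refcnt, 1);
}

/*
 * Like object_put(): atomic_fetch_sub() returns the old value, so an old
 * value of 1 means this call dropped the last reference (the same test
 * atomic_dec_and_test() makes), and only that caller frees the object.
 */
static void toy_put(struct toy_object *obj)
{
	if (atomic_fetch_sub(&obj->refcnt, 1) == 1) {
		free(obj);
		toy_freed++;
	}
}
```

<para>
The decrement and the zero test form a single atomic operation. So when several callers of <function>toy_put()</function> race, exactly one sees the old value 1, and the object is freed once and only once.
</para>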
1173 | </sect2> | ||
1174 | </sect1> | ||
1175 | |||
1176 | <sect1 id="examples-lock-per-obj"> | ||
1177 | <title>Protecting The Objects Themselves</title> | ||
1178 | <para> | ||
1179 | In these examples, we assumed that the objects (except the reference | ||
1180 | counts) never change once they are created. If we wanted to allow | ||
1181 | the name to change, there are three possibilities: | ||
1182 | </para> | ||
1183 | <itemizedlist> | ||
1184 | <listitem> | ||
1185 | <para> | ||
1186 | You can make <symbol>cache_lock</symbol> non-static, and tell people | ||
1187 | to grab that lock before changing the name in any object. | ||
1188 | </para> | ||
1189 | </listitem> | ||
1190 | <listitem> | ||
1191 | <para> | ||
1192 | You can provide a <function>cache_obj_rename()</function> which grabs | ||
1193 | this lock and changes the name for the caller, and tell everyone to | ||
1194 | use that function. | ||
1195 | </para> | ||
1196 | </listitem> | ||
1197 | <listitem> | ||
1198 | <para> | ||
1199 | You can make the <symbol>cache_lock</symbol> protect only the cache | ||
1200 | itself, and use another lock to protect the name. | ||
1201 | </para> | ||
1202 | </listitem> | ||
1203 | </itemizedlist> | ||
1204 | |||
1205 | <para> | ||
1206 | Theoretically, you can make the locks as fine-grained as one lock for | ||
1207 | every field, for every object. In practice, the most common variants | ||
1208 | are: | ||
1209 | </para> | ||
1210 | <itemizedlist> | ||
1211 | <listitem> | ||
1212 | <para> | ||
1213 | One lock which protects the infrastructure (the <symbol>cache</symbol> | ||
1214 | list in this example) and all the objects. This is what we have done | ||
1215 | so far. | ||
1216 | </para> | ||
1217 | </listitem> | ||
1218 | <listitem> | ||
1219 | <para> | ||
1220 | One lock which protects the infrastructure (including the list | ||
1221 | pointers inside the objects), and one lock inside the object which | ||
1222 | protects the rest of that object. | ||
1223 | </para> | ||
1224 | </listitem> | ||
1225 | <listitem> | ||
1226 | <para> | ||
1227 | Multiple locks to protect the infrastructure (e.g. one lock per hash | ||
1228 | chain), possibly with a separate per-object lock. | ||
1229 | </para> | ||
1230 | </listitem> | ||
1231 | </itemizedlist> | ||
1232 | |||
1233 | <para> | ||
1234 | Here is the "lock-per-object" implementation: | ||
1235 | </para> | ||
1236 | <programlisting> | ||
1237 | --- cache.c.refcnt-atomic 2003-12-11 15:50:54.000000000 +1100 | ||
1238 | +++ cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100 | ||
1239 | @@ -6,11 +6,17 @@ | ||
1240 | |||
1241 | struct object | ||
1242 | { | ||
1243 | + /* These two protected by cache_lock. */ | ||
1244 | struct list_head list; | ||
1245 | + int popularity; | ||
1246 | + | ||
1247 | atomic_t refcnt; | ||
1248 | + | ||
1249 | + /* Doesn't change once created. */ | ||
1250 | int id; | ||
1251 | + | ||
1252 | + spinlock_t lock; /* Protects the name */ | ||
1253 | char name[32]; | ||
1254 | - int popularity; | ||
1255 | }; | ||
1256 | |||
1257 | static DEFINE_SPINLOCK(cache_lock); | ||
1258 | @@ -77,6 +84,7 @@ | ||
1259 | obj->id = id; | ||
1260 | obj->popularity = 0; | ||
1261 | atomic_set(&obj->refcnt, 1); /* The cache holds a reference */ | ||
1262 | + spin_lock_init(&obj->lock); | ||
1263 | |||
1264 | spin_lock_irqsave(&cache_lock, flags); | ||
1265 | __cache_add(obj); | ||
1266 | </programlisting> | ||
1267 | |||
1268 | <para> | ||
1269 | Note that I decided that the <structfield>popularity</structfield> | ||
1270 | count should be protected by the <symbol>cache_lock</symbol> rather | ||
1271 | than the per-object lock: this is because it (like the | ||
1272 | <structname>struct list_head</structname> inside the object) is | ||
1273 | logically part of the infrastructure. This way, I don't need to grab | ||
1274 | the lock of every object in <function>__cache_add()</function> when | ||
1275 | seeking the least popular one. | ||
1276 | </para> | ||
1277 | |||
1278 | <para> | ||
1279 | I also decided that the <structfield>id</structfield> member is | ||
1280 | unchangeable, so I don't need to grab each object lock in | ||
1281 | <function>__cache_find()</function> to examine the | ||
1282 | <structfield>id</structfield>: the object lock is only used by a | ||
1283 | caller who wants to read or write the <structfield>name</structfield> | ||
1284 | field. | ||
1285 | </para> | ||
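<para>
As a rough illustration of how callers might use the per-object lock, here is a userspace sketch of the <function>cache_obj_rename()</function> idea together with a matching reader. It assumes a tiny C11 test-and-set spinlock, the hypothetical <type>toy_spinlock</type>, in place of <type>spinlock_t</type>.
<function>toy_obj_rename()</function>, <function>toy_obj_get_name()</function> and <function>copy_name()</function> are illustrative names.
The <structfield>id</structfield> needs no lock because it never changes, while <structfield>name</structfield> is only touched under <structfield>lock</structfield>.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Minimal test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_int held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct toy_object {
	int id;			/* doesn't change once created: no lock needed */
	toy_spinlock lock;	/* protects the name */
	char name[32];
};

/* Copy into a fixed buffer, always NUL-terminating (like strlcpy). */
static void copy_name(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	assert(size > 0);
	if (len >= size)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Writer: change the name under the object's own lock. */
static void toy_obj_rename(struct toy_object *obj, const char *name)
{
	toy_spin_lock(&obj->lock);
	copy_name(obj->name, name, sizeof(obj->name));
	toy_spin_unlock(&obj->lock);
}

/* Reader: copy the name out under the same lock so it is never half-written. */
static void toy_obj_get_name(struct toy_object *obj, char *buf, size_t size)
{
	toy_spin_lock(&obj->lock);
	copy_name(buf, obj->name, size);
	toy_spin_unlock(&obj->lock);
}
```

<para>
The reader copies the name out rather than returning a pointer into the object. A returned pointer would let the caller keep reading the buffer after the lock is dropped, while a concurrent rename overwrites it.
</para>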
1286 | |||
1287 | <para> | ||
1288 | Note also that I added a comment describing what data was protected by | ||
1289 | which locks. This is extremely important, as it describes the runtime | ||
1290 | behavior of the code, which can be hard to infer from just reading it. And | ||
1291 | as Alan Cox says, <quote>Lock data, not code</quote>. | ||
1292 | </para> | ||
1293 | </sect1> | ||
1294 | </chapter> | ||
1295 | |||
1296 | <chapter id="common-problems"> | ||
1297 | <title>Common Problems</title> | ||
1298 | <sect1 id="deadlock"> | ||
1299 | <title>Deadlock: Simple and Advanced</title> | ||
1300 | |||
1301 | <para> | ||
1302 | There is a coding bug where a piece of code tries to grab a | ||
1303 | spinlock twice: it will spin forever, waiting for the lock to | ||
1304 | be released (spinlocks, rwlocks and mutexes are not | ||
1305 | recursive in Linux). This is trivial to diagnose: not a | ||
1306 | stay-up-five-nights-talk-to-fluffy-code-bunnies kind of | ||
1307 | problem. | ||
1308 | </para> | ||
1309 | |||
1310 | <para> | ||
1311 | For a slightly more complex case, imagine you have a region | ||
1312 | shared by a softirq and user context. If you use a | ||
1313 | <function>spin_lock()</function> call to protect it, it is | ||
1314 | possible that the user context will be interrupted by the softirq | ||
1315 | while it holds the lock, and the softirq will then spin | ||
1316 | forever trying to get the same lock. | ||
1317 | </para> | ||
1318 | |||
1319 | <para> | ||
1320 | Both of these are called deadlock, and as shown above, it can | ||
1321 | occur even with a single CPU (although not on UP compiles, | ||
1322 | since spinlocks vanish on kernel compiles with | ||
1323 | <symbol>CONFIG_SMP</symbol>=n; you'll still get data corruption | ||
1324 | in the second example). | ||
1325 | </para> | ||
1326 | |||
1327 | <para> | ||
1328 | This complete lockup is easy to diagnose: on SMP boxes the | ||
1329 | watchdog timer or compiling with <symbol>CONFIG_DEBUG_SPINLOCK</symbol> set | ||
1330 | (<filename>include/linux/spinlock.h</filename>) will show this up | ||
1331 | immediately when it happens. | ||
1332 | </para> | ||
1333 | |||
1334 | <para> | ||
1335 | A more complex problem is the so-called 'deadly embrace', | ||
1336 | involving two or more locks. Say you have a hash table: each | ||
1337 | entry in the table is a spinlock, and a chain of hashed | ||
1338 | objects. Inside a softirq handler, you sometimes want to | ||
1339 | alter an object from one place in the hash to another: you | ||
1340 | grab the spinlock of the old hash chain and the spinlock of | ||
1341 | the new hash chain, and delete the object from the old one, | ||
1342 | and insert it in the new one. | ||
1343 | </para> | ||
1344 | |||
1345 | <para> | ||
1346 | There are two problems here. First, if your code ever | ||
1347 | tries to move the object to the same chain, it will deadlock | ||
1348 | with itself as it tries to lock it twice. Second, if the | ||
1349 | same softirq on another CPU is trying to move another object | ||
1350 | in the reverse direction, the following could happen: | ||
1351 | </para> | ||
1352 | |||
1353 | <table> | ||
1354 | <title>Consequences</title> | ||
1355 | |||
1356 | <tgroup cols="2" align="left"> | ||
1357 | |||
1358 | <thead> | ||
1359 | <row> | ||
1360 | <entry>CPU 1</entry> | ||
1361 | <entry>CPU 2</entry> | ||
1362 | </row> | ||
1363 | </thead> | ||
1364 | |||
1365 | <tbody> | ||
1366 | <row> | ||
1367 | <entry>Grab lock A -> OK</entry> | ||
1368 | <entry>Grab lock B -> OK</entry> | ||
1369 | </row> | ||
1370 | <row> | ||
1371 | <entry>Grab lock B -> spin</entry> | ||
1372 | <entry>Grab lock A -> spin</entry> | ||
1373 | </row> | ||
1374 | </tbody> | ||
1375 | </tgroup> | ||
1376 | </table> | ||
1377 | |||
1378 | <para> | ||
1379 | The two CPUs will spin forever, waiting for the other to give up | ||
1380 | their lock. It will look, smell, and feel like a crash. | ||
1381 | </para> | ||
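<para>
For this particular hash-table case, one common way to avoid both problems has two parts:
skip the move (and the second lock) when the source and destination chains are the same, and otherwise take the two chain locks in a fixed order, here lowest chain index first.
The userspace sketch below assumes a hypothetical fixed-size table of singly linked chains, each guarded by a small C11 spinlock standing in for <type>spinlock_t</type>.
<function>toy_move()</function>, <function>chain_unlink()</function> and the other names are illustrative only.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_int held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

#define NCHAINS 8

struct toy_item {
	struct toy_item *next;
	int key;
};

/* Hypothetical hash table: one lock per chain. */
struct toy_chain {
	toy_spinlock lock;
	struct toy_item *head;
};

static struct toy_chain table[NCHAINS];

/* Unlink @item from @c; caller holds c->lock. Returns 1 if it was found. */
static int chain_unlink(struct toy_chain *c, struct toy_item *item)
{
	struct toy_item **pp;

	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if (*pp == item) {
			*pp = item->next;
			return 1;
		}
	}
	return 0;
}

/* Move @item from chain @from to chain @to without risking deadlock. */
static void toy_move(struct toy_item *item, unsigned int from, unsigned int to)
{
	struct toy_chain *first, *second;

	assert(from < NCHAINS && to < NCHAINS);

	/* Same chain: nothing to do, and locking it twice would spin forever. */
	if (from == to)
		return;

	/*
	 * Always take the lower-numbered chain first, so two CPUs moving items
	 * in opposite directions cannot each hold one lock and wait for the other.
	 */
	first = &table[from < to ? from : to];
	second = &table[from < to ? to : from];
	toy_spin_lock(&first->lock);
	toy_spin_lock(&second->lock);

	if (chain_unlink(&table[from], item)) {
		item->next = table[to].head;
		table[to].head = item;
	}

	toy_spin_unlock(&second->lock);
	toy_spin_unlock(&first->lock);
}
```

<para>
Every path that takes two chain locks has to follow the same ordering rule. A single path that locks in the other order is enough to bring the deadly embrace back.
</para>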
1382 | </sect1> | ||
1383 | |||
1384 | <sect1 id="techs-deadlock-prevent"> | ||
1385 | <title>Preventing Deadlock</title> | ||
1386 | |||
1387 | <para> | ||
1388 | Textbooks will tell you that if you always lock in the same | ||
1389 | order, you will never get this kind of deadlock. Practice | ||
1390 | will tell you that this approach doesn't scale: when I | ||
1391 | create a new lock, I don't understand enough of the kernel | ||
1392 | to figure out where in the 5000-lock hierarchy it will fit. | ||
1393 | </para> | ||
1394 | |||
1395 | <para> | ||
1396 | The best locks are encapsulated: they never get exposed in | ||
1397 | headers, and are never held around calls to non-trivial | ||
1398 | functions outside the same file. You can read through this | ||
1399 | code and see that it will never deadlock, because it never | ||
1400 | tries to grab another lock while it has that one. People | ||
1401 | using your code don't even need to know you are using a | ||
1402 | lock. | ||
1403 | </para> | ||
1404 | |||
1405 | <para> | ||
1406 | A classic problem here is when you provide callbacks or | ||
1407 | hooks: if you call these with the lock held, you risk simple | ||
1408 | deadlock, or a deadly embrace (who knows what the callback | ||
1409 | will do?). Remember, the other programmers are out to get | ||
1410 | you, so don't do this. | ||
1411 | </para> | ||
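<para>
One way to follow this advice is to copy the callback and its argument while holding the lock, drop the lock, and only then make the call. The callback is then free to call back into the same code.
The userspace sketch below assumes a small C11 spinlock in place of <type>spinlock_t</type>.
<function>toy_set_hook()</function>, <function>toy_fire_hook()</function> and <function>one_shot()</function> are hypothetical names.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spinlock standing in for spinlock_t. */
typedef struct { atomic_int held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

typedef void (*toy_hook_fn)(void *arg);

static toy_spinlock hook_lock;		/* protects hook and hook_arg */
static toy_hook_fn hook;
static void *hook_arg;

static void toy_set_hook(toy_hook_fn fn, void *arg)
{
	toy_spin_lock(&hook_lock);
	hook = fn;
	hook_arg = arg;
	toy_spin_unlock(&hook_lock);
}

/* Snapshot the hook under the lock, then call it without the lock held. */
static void toy_fire_hook(void)
{
	toy_hook_fn fn;
	void *arg;

	toy_spin_lock(&hook_lock);
	fn = hook;
	arg = hook_arg;
	toy_spin_unlock(&hook_lock);

	if (fn)
		fn(arg);	/* the callback may take hook_lock itself */
}

/* Example hook that re-enters the code above by unregistering itself. */
static void one_shot(void *arg)
{
	int *calls = arg;

	(*calls)++;
	toy_set_hook(NULL, NULL);	/* would spin forever if hook_lock were held */
}
```

<para>
The price is that a hook replaced concurrently may still be called once with its old value after <function>toy_set_hook()</function> returns. The old callback and its argument must stay valid until any such call has finished, for example by holding a reference count on the argument.
</para>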
1412 | |||
1413 | <sect2 id="techs-deadlock-overprevent"> | ||
1414 | <title>Overzealous Prevention Of Deadlocks</title> | ||
1415 | |||
1416 | <para> | ||
1417 | Deadlocks are problematic, but not as bad as data | ||
1418 | corruption. Code which grabs a read lock, searches a list, | ||
1419 | fails to find what it wants, drops the read lock, grabs a | ||
1420 | write lock and inserts the object has a race condition. | ||
1421 | </para> | ||
1422 | |||
1423 | <para> | ||
1424 | If you don't see why, please stay the fuck away from my code. | ||
1425 | </para> | ||
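<para>
The race lives in the window between dropping the read lock and taking the write lock. Another CPU can insert the same object in that window, so the inserting code must search again once it holds the write lock.
The userspace sketch below shows that recheck. For brevity it uses one small C11 spinlock (a hypothetical <type>toy_spinlock</type>) instead of a read/write lock pair. It drops that lock around the allocation, which is the usual reason for dropping the lock in the first place.
<function>toy_find_or_add()</function>, <function>toy_find_locked()</function> and the list layout are also hypothetical.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Minimal test-and-set spinlock standing in for the kernel's locks. */
typedef struct { atomic_int held; } toy_spinlock;

static void toy_spin_lock(toy_spinlock *l)
{
	while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void toy_spin_unlock(toy_spinlock *l)
{
	atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct toy_node {
	struct toy_node *next;
	int key;
};

static toy_spinlock list_lock;		/* protects list_head and every ->next */
static struct toy_node *list_head;

/* Caller holds list_lock. */
static struct toy_node *toy_find_locked(int key)
{
	struct toy_node *n;

	for (n = list_head; n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

/*
 * Return the node for @key, adding one if needed. Nodes are never removed
 * in this sketch, so using @n after unlocking is safe here; real code would
 * also need a reference count.
 */
static struct toy_node *toy_find_or_add(int key)
{
	struct toy_node *n, *new;

	toy_spin_lock(&list_lock);
	n = toy_find_locked(key);
	toy_spin_unlock(&list_lock);
	if (n)
		return n;

	new = malloc(sizeof(*new));	/* allocate without the lock held */
	if (!new)
		return NULL;
	new->key = key;

	toy_spin_lock(&list_lock);
	n = toy_find_locked(key);	/* search again: another CPU may have added it */
	if (!n) {
		new->next = list_head;
		list_head = new;
		n = new;
		new = NULL;		/* now owned by the list */
	}
	toy_spin_unlock(&list_lock);

	free(new);			/* lost the race: discard our copy (free(NULL) is a no-op) */
	return n;
}
```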
1426 | </sect2> | ||
1427 | </sect1> | ||
1428 | |||
1429 | <sect1 id="racing-timers"> | ||
1430 | <title>Racing Timers: A Kernel Pastime</title> | ||
1431 | |||
1432 | <para> | ||
1433 | Timers can produce their own special problems with races. | ||
1434 | Consider a collection of objects (list, hash, etc) where each | ||
1435 | object has a timer which is due to destroy it. | ||
1436 | </para> | ||
1437 | |||
1438 | <para> | ||
1439 | If you want to destroy the entire collection (say on module | ||
1440 | removal), you might do the following: | ||
1441 | </para> | ||
1442 | |||
1443 | <programlisting> | ||
1444 | /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE | ||
1445 | HUNGARIAN NOTATION */ | ||
1446 | spin_lock_bh(&list_lock); | ||
1447 | |||
1448 | while (list) { | ||
1449 | struct foo *next = list->next; | ||
1450 | del_timer(&list->timer); | ||
1451 | kfree(list); | ||
1452 | list = next; | ||
1453 | } | ||
1454 | |||
1455 | spin_unlock_bh(&list_lock); | ||
1456 | </programlisting> | ||
1457 | |||
1458 | <para> | ||
1459 | Sooner or later, this will crash on SMP, because a timer can | ||
1460 | have just gone off before the <function>spin_lock_bh()</function>, | ||
1461 | and it will only get the lock after we | ||
1462 | <function>spin_unlock_bh()</function>, and then try to free | ||
1463 | the element (which has already been freed!). | ||
1464 | </para> | ||
1465 | |||
1466 | <para> | ||
1467 | This can be avoided by checking the result of | ||
1468 | <function>del_timer()</function>: if it returns | ||
1469 | <returnvalue>1</returnvalue>, the timer has been deleted. | ||
1470 | If <returnvalue>0</returnvalue>, it means (in this | ||
1471 | case) that it is currently running, so we can do: | ||
1472 | </para> | ||
1473 | |||
1474 | <programlisting> | ||
1475 | retry: | ||
1476 | spin_lock_bh(&list_lock); | ||
1477 | |||
1478 | while (list) { | ||
1479 | struct foo *next = list->next; | ||
1480 | if (!del_timer(&list->timer)) { | ||
1481 | /* Give timer a chance to delete this */ | ||
1482 | spin_unlock_bh(&list_lock); | ||
1483 | goto retry; | ||
1484 | } | ||
1485 | kfree(list); | ||
1486 | list = next; | ||
1487 | } | ||
1488 | |||
1489 | spin_unlock_bh(&list_lock); | ||
1490 | </programlisting> | ||
1491 | |||
1492 | <para> | ||
1493 | Another common problem is deleting timers which restart | ||
1494 | themselves (by calling <function>add_timer()</function> at the end | ||
1495 | of their timer function). Because this is a fairly common case | ||
1496 | which is prone to races, you should use <function>del_timer_sync()</function> | ||
1497 | (<filename class="headerfile">include/linux/timer.h</filename>) | ||
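1498 | to handle this case. Like <function>del_timer()</function>, it returns | ||
1499 | <returnvalue>1</returnvalue> if it deactivated a pending timer and <returnvalue>0</returnvalue> otherwise, but it also | ||
1500 | waits for a running timer function to finish on other CPUs before returning. | ||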
1501 | </para> | ||
1502 | </sect1> | ||
1503 | |||
1504 | </chapter> | ||
1505 | |||
1506 | <chapter id="Efficiency"> | ||
1507 | <title>Locking Speed</title> | ||
1508 | |||
1509 | <para> | ||
1510 | There are three main things to worry about when considering speed of | ||
1511 | some code which does locking. First is concurrency: how many things | ||
1512 | are going to be waiting while someone else is holding a lock. Second | ||
1513 | is the time taken to actually acquire and release an uncontended lock. | ||
1514 | Third is using fewer or smarter locks. I'm assuming that the lock is | ||
1515 | used fairly often: otherwise, you wouldn't be concerned about | ||
1516 | efficiency. | ||
1517 | </para> | ||
1518 | <para> | ||
1519 | Concurrency depends on how long the lock is usually held: you should | ||
1520 | hold the lock for as long as needed, but no longer. In the cache | ||
1521 | example, we always create the object without the lock held, and then | ||
1522 | grab the lock only when we are ready to insert it in the list. | ||
1523 | </para> | ||
1524 | <para> | ||
1525 | Acquisition times depend on how much damage the lock operations do to | ||
1526 | the pipeline (pipeline stalls) and how likely it is that this CPU was | ||
1527 | the last one to grab the lock (i.e. whether the lock is cache-hot for this | ||
1528 | CPU): on a machine with more CPUs, this likelihood drops fast. | ||
1529 | Consider a 700MHz Intel Pentium III: an instruction takes about 0.7ns, | ||
1530 | an atomic increment takes about 58ns, a lock which is cache-hot on | ||
1531 | this CPU takes 160ns, and a cacheline transfer from another CPU takes | ||
1532 | an additional 170 to 360ns. (These figures are from Paul McKenney's | ||
1533 | <ulink url="http://www.linuxjournal.com/article.php?sid=6993"> Linux | ||
1534 | Journal RCU article</ulink>). | ||
1535 | </para> | ||
1536 | <para> | ||
1537 | These two aims conflict: holding a lock for a short time might be done | ||
1538 | by splitting locks into parts (such as in our final per-object-lock | ||
1539 | example), but this increases the number of lock acquisitions, and the | ||
1540 | results are often slower than having a single lock. This is another | ||
1541 | reason to advocate locking simplicity. | ||
1542 | </para> | ||
1543 | <para> | ||
1544 | The third concern is addressed below: there are some methods to reduce | ||
1545 | the amount of locking which needs to be done. | ||
1546 | </para> | ||
1547 | |||
1548 | <sect1 id="efficiency-rwlocks"> | ||
1549 | <title>Read/Write Lock Variants</title> | ||
1550 | |||
1551 | <para> | ||
1552 | Both spinlocks and mutexes have read/write variants: | ||
1553 | <type>rwlock_t</type> and <structname>struct rw_semaphore</structname>. | ||
1554 | These divide users into two classes: the readers and the writers. If | ||
1555 | you are only reading the data, you can get a read lock, but to write to | ||
1556 | the data you need the write lock. Many people can hold a read lock, | ||
1557 | but a writer must be sole holder. | ||
1558 | </para> | ||
1559 | |||
1560 | <para> | ||
1561 | If your code divides neatly along reader/writer lines (as our | ||
1562 | cache code does), and the lock is held by readers for | ||
1563 | significant lengths of time, using these locks can help. They | ||
1564 | are slightly slower than the normal locks though, so in practice | ||
1565 | <type>rwlock_t</type> is not usually worthwhile. | ||
1566 | </para> | ||
1567 | </sect1> | ||
1568 | |||
1569 | <sect1 id="efficiency-read-copy-update"> | ||
1570 | <title>Avoiding Locks: Read Copy Update</title> | ||
1571 | |||
1572 | <para> | ||
1573 | There is a special method of read/write locking called Read Copy | ||
1574 | Update. Using RCU, the readers can avoid taking a lock | ||
1575 | altogether: as we expect our cache to be read more often than | ||
1576 | updated (otherwise the cache is a waste of time), it is a | ||
1577 | candidate for this optimization. | ||
1578 | </para> | ||
1579 | |||
1580 | <para> | ||
1581 | How do we get rid of read locks? Getting rid of read locks | ||
1582 | means that writers may be changing the list underneath the | ||
1583 | readers. That is actually quite simple: we can read a linked | ||
1584 | list while an element is being added if the writer adds the | ||
1585 | element very carefully. For example, adding | ||
1586 | <symbol>new</symbol> to a singly linked list called | ||
1587 | <symbol>list</symbol>: | ||
1588 | </para> | ||
1589 | |||
1590 | <programlisting> | ||
1591 | new->next = list->next; | ||
1592 | wmb(); | ||
1593 | list->next = new; | ||
1594 | </programlisting> | ||
1595 | |||
1596 | <para> | ||
1597 | The <function>wmb()</function> is a write memory barrier. It | ||
1598 | ensures that the first operation (setting the new element's | ||
1599 | <symbol>next</symbol> pointer) is complete and will be seen by | ||
1600 | all CPUs before the second operation (putting the new | ||
1601 | element into the list) takes effect. This is important, since modern | ||
1602 | compilers and modern CPUs can both reorder instructions unless | ||
1603 | told otherwise: we want a reader to either not see the new | ||
1604 | element at all, or see the new element with the | ||
1605 | <symbol>next</symbol> pointer correctly pointing at the rest of | ||
1606 | the list. | ||
1607 | </para> | ||
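<para>
In userspace C11, roughly the same publication step can be expressed with a release store on the list pointer, paired with acquire loads in the reader. A reader that sees the new element then also sees its initialized <symbol>next</symbol> pointer.
This is an analogue under those assumptions, not how the kernel's own list primitives are written. <function>toy_publish()</function>, <function>toy_count()</function> and <structname>struct toy_node</structname> are hypothetical, and updaters are assumed to be serialized by a lock that is not shown.
</para>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *_Atomic next;
	int value;
};

/*
 * Insert @new after @list. Updaters are assumed to be serialized by a lock
 * (not shown); readers take no lock at all.
 */
static void toy_publish(struct toy_node *list, struct toy_node *new)
{
	/* Readers cannot see @new yet, so a relaxed initialization is enough. */
	atomic_init(&new->next,
		    atomic_load_explicit(&list->next, memory_order_relaxed));
	/* Release: all writes to @new above become visible before the link. */
	atomic_store_explicit(&list->next, new, memory_order_release);
}

/* Lock-free reader: walks the list, acquire-loading each link. */
static int toy_count(struct toy_node *list)
{
	int n = 0;
	struct toy_node *p;

	for (p = atomic_load_explicit(&list->next, memory_order_acquire); p;
	     p = atomic_load_explicit(&p->next, memory_order_acquire))
		n++;
	return n;
}
```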
1608 | <para> | ||
1609 | Fortunately, there is a function to do this for standard | ||
1610 | <structname>struct list_head</structname> lists: | ||
1611 | <function>list_add_rcu()</function> | ||
1612 | (<filename>include/linux/list.h</filename>). | ||
1613 | </para> | ||
1614 | <para> | ||
1615 | Removing an element from the list is even simpler: we replace | ||
1616 | the pointer to the old element with a pointer to its successor, | ||
1617 | and readers will either see it, or skip over it. | ||
1618 | </para> | ||
1619 | <programlisting> | ||
1620 | list->next = old->next; | ||
1621 | </programlisting> | ||
1622 | <para> | ||
1623 | There is <function>list_del_rcu()</function> | ||
1624 | (<filename>include/linux/list.h</filename>) which does this (the | ||
1625 | normal version poisons the old object, which we don't want). | ||
1626 | </para> | ||
1627 | <para> | ||
1628 | The reader must also be careful: some CPUs can look through the | ||
1629 | <symbol>next</symbol> pointer to start reading the contents of | ||
1630 | the next element early, but don't realize that the pre-fetched | ||
1631 | contents are wrong when the <symbol>next</symbol> pointer changes | ||
1632 | underneath them. Once again, there is a | ||
1633 | <function>list_for_each_entry_rcu()</function> | ||
1634 | (<filename>include/linux/list.h</filename>) to help you. Of | ||
1635 | course, writers can just use | ||
1636 | <function>list_for_each_entry()</function>, since there cannot | ||
1637 | be two simultaneous writers. | ||
1638 | </para> | ||
1639 | <para> | ||
1640 | Our final dilemma is this: when can we actually destroy the | ||
1641 | removed element? Remember, a reader might be stepping through | ||
1642 | this element in the list right now: if we free this element and | ||
1643 | the <symbol>next</symbol> pointer changes, the reader will jump | ||
1644 | off into garbage and crash. We need to wait until we know that | ||
1645 | all the readers who were traversing the list when we deleted the | ||
1646 | element are finished. We use <function>call_rcu()</function> to | ||
1647 | register a callback which will actually destroy the object once | ||
1648 | all pre-existing readers are finished. Alternatively, | ||
1649 | <function>synchronize_rcu()</function> may be used to block until | ||
1650 | all pre-existing readers are finished. | ||
1651 | </para> | ||
1652 | <para> | ||
1653 | But how does Read Copy Update know when the readers are | ||
1654 | finished? The method is this: firstly, the readers always | ||
1655 | traverse the list inside | ||
1656 | <function>rcu_read_lock()</function>/<function>rcu_read_unlock()</function> | ||
1657 | pairs: these simply disable preemption so the reader won't go to | ||
1658 | sleep while reading the list. | ||
1659 | </para> | ||
1660 | <para> | ||
1661 | RCU then waits until every other CPU has slept at least once: | ||
1662 | since readers cannot sleep, we know that any readers which were | ||
1663 | traversing the list during the deletion are finished, and the | ||
1664 | callback is triggered. The real Read Copy Update code is a | ||
1665 | little more optimized than this, but this is the fundamental | ||
1666 | idea. | ||
1667 | </para> | ||
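<para>
The "wait until every CPU has slept" idea can be modelled in a few lines. The single-threaded simulation below is only a teaching toy, built on these assumptions:
<symbol>NR_TOY_CPUS</symbol> simulated CPUs each keep a counter that <function>toy_quiescent()</function> bumps whenever that CPU passes a point where it cannot be inside a read-side critical section.
<function>toy_call_rcu()</function> snapshots the counters.
<function>toy_poll()</function> runs the deferred callback only once every CPU's counter has moved past its snapshot.
None of these names are kernel APIs.
</para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TOY_CPUS 4

/* Per-CPU count of quiescent states ("this CPU has slept"). */
static unsigned long qs_count[NR_TOY_CPUS];

/* Toy limitation: a single pending callback. */
static void (*pending_fn)(void *);
static void *pending_arg;
static unsigned long snapshot[NR_TOY_CPUS];

/* Called when @cpu passes a quiescent state (e.g. it schedules). */
static void toy_quiescent(int cpu)
{
	qs_count[cpu]++;
}

/* Defer fn(arg) until every CPU has passed a quiescent state. */
static void toy_call_rcu(void (*fn)(void *), void *arg)
{
	int cpu;

	assert(pending_fn == NULL);
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		snapshot[cpu] = qs_count[cpu];
	pending_fn = fn;
	pending_arg = arg;
}

/* Run the callback if its grace period has ended; returns true if it ran. */
static bool toy_poll(void)
{
	int cpu;
	void (*fn)(void *) = pending_fn;

	if (!fn)
		return false;
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		if (qs_count[cpu] == snapshot[cpu])
			return false;	/* this CPU may still be reading */
	pending_fn = NULL;
	fn(pending_arg);
	return true;
}

/* Example callback standing in for the deferred object_put(). */
static void mark_freed(void *arg)
{
	*(bool *)arg = true;
}
```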
1668 | |||
1669 | <programlisting> | ||
1670 | --- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100 | ||
1671 | +++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100 | ||
1672 | @@ -1,15 +1,18 @@ | ||
1673 | #include <linux/list.h> | ||
1674 | #include <linux/slab.h> | ||
1675 | #include <linux/string.h> | ||
1676 | +#include <linux/rcupdate.h> | ||
1677 | #include <linux/mutex.h> | ||
1678 | #include <asm/errno.h> | ||
1679 | |||
1680 | struct object | ||
1681 | { | ||
1682 | - /* These two protected by cache_lock. */ | ||
1683 | + /* This is protected by RCU */ | ||
1684 | struct list_head list; | ||
1685 | int popularity; | ||
1686 | |||
1687 | + struct rcu_head rcu; | ||
1688 | + | ||
1689 | atomic_t refcnt; | ||
1690 | |||
1691 | /* Doesn't change once created. */ | ||
1692 | @@ -40,7 +43,7 @@ | ||
1693 | { | ||
1694 | struct object *i; | ||
1695 | |||
1696 | - list_for_each_entry(i, &cache, list) { | ||
1697 | + list_for_each_entry_rcu(i, &cache, list) { | ||
1698 | if (i->id == id) { | ||
1699 | i->popularity++; | ||
1700 | return i; | ||
1701 | @@ -49,19 +52,25 @@ | ||
1702 | return NULL; | ||
1703 | } | ||
1704 | |||
1705 | +/* Final discard done once we know no readers are looking. */ | ||
1706 | +static void cache_delete_rcu(struct rcu_head *head) | ||
1707 | +{ | ||
1708 | + object_put(container_of(head, struct object, rcu)); | ||
1709 | +} | ||
1710 | + | ||
1711 | /* Must be holding cache_lock */ | ||
1712 | static void __cache_delete(struct object *obj) | ||
1713 | { | ||
1714 | BUG_ON(!obj); | ||
1715 | - list_del(&obj->list); | ||
1716 | - object_put(obj); | ||
1717 | + list_del_rcu(&obj->list); | ||
1718 | cache_num--; | ||
1719 | + call_rcu(&obj->rcu, cache_delete_rcu); | ||
1720 | } | ||
1721 | |||
1722 | /* Must be holding cache_lock */ | ||
1723 | static void __cache_add(struct object *obj) | ||
1724 | { | ||
1725 | - list_add(&obj->list, &cache); | ||
1726 | + list_add_rcu(&obj->list, &cache); | ||
1727 | if (++cache_num > MAX_CACHE_SIZE) { | ||
1728 | struct object *i, *outcast = NULL; | ||
1729 | list_for_each_entry(i, &cache, list) { | ||
1730 | @@ -104,12 +114,11 @@ | ||
1731 | struct object *cache_find(int id) | ||
1732 | { | ||
1733 | struct object *obj; | ||
1734 | - unsigned long flags; | ||
1735 | |||
1736 | - spin_lock_irqsave(&cache_lock, flags); | ||
1737 | + rcu_read_lock(); | ||
1738 | obj = __cache_find(id); | ||
1739 | if (obj) | ||
1740 | object_get(obj); | ||
1741 | - spin_unlock_irqrestore(&cache_lock, flags); | ||
1742 | + rcu_read_unlock(); | ||
1743 | return obj; | ||
1744 | } | ||
1745 | </programlisting> | ||
1746 | |||
1747 | <para> | ||
1748 | Note that the reader will alter the | ||
1749 | <structfield>popularity</structfield> member in | ||
1750 | <function>__cache_find()</function>, and now it doesn't hold a lock. | ||
1751 | One solution would be to make it an <type>atomic_t</type>, but for | ||
1752 | this usage, we don't really care about races: an approximate result is | ||
1753 | good enough, so I didn't change it. | ||
1754 | </para> | ||
1755 | |||
1756 | <para> | ||
1757 | The result is that <function>cache_find()</function> requires no | ||
1758 | synchronization with any other functions, so is almost as fast on SMP | ||
1759 | as it would be on UP. | ||
1760 | </para> | ||
1761 | |||
1762 | <para> | ||
1763 | There is a further optimization possible here: remember our original | ||
1764 | cache code, where there were no reference counts and the caller simply | ||
1765 | held the lock whenever using the object? This is still possible: if | ||
1766 | you hold the lock, no one can delete the object, so you don't need to | ||
1767 | get and put the reference count. | ||
1768 | </para> | ||
1769 | |||
1770 | <para> | ||
1771 | Now, because the 'read lock' in RCU is simply disabling preemption, a | ||
1772 | caller which always has preemption disabled between calling | ||
1773 | <function>cache_find()</function> and | ||
1774 | <function>object_put()</function> does not need to actually get and | ||
1775 | put the reference count: we could expose | ||
1776 | <function>__cache_find()</function> by making it non-static, and | ||
1777 | such callers could simply call that. | ||
1778 | </para> | ||
1779 | <para> | ||
1780 | The benefit here is that the reference count is not written to: the | ||
1781 | object is not altered in any way, which is much faster on SMP | ||
1782 | machines due to caching. | ||
1783 | </para> | ||
1784 | </sect1> | ||
1785 | |||
1786 | <sect1 id="per-cpu"> | ||
1787 | <title>Per-CPU Data</title> | ||
1788 | |||
1789 | <para> | ||
1790 | Another technique for avoiding locking which is used fairly | ||
1791 | widely is to duplicate information for each CPU. For example, | ||
1792 | if you wanted to keep a count of a common condition, you could | ||
1793 | use a spin lock and a single counter. Nice and simple. | ||
1794 | </para> | ||
1795 | |||
1796 | <para> | ||
1797 | If that was too slow (it's usually not, but if you've got a | ||
1798 | really big machine to test on and can show that it is), you | ||
1799 | could instead use a counter for each CPU, so none of them needs | ||
1800 | an exclusive lock. See <function>DEFINE_PER_CPU()</function>, | ||
1801 | <function>get_cpu_var()</function> and | ||
1802 | <function>put_cpu_var()</function> | ||
1803 | (<filename class="headerfile">include/linux/percpu.h</filename>). | ||
1804 | </para> | ||
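<para>
A userspace approximation of this idea is an array of counters, one per CPU, each padded to its own cache line so that increments on different CPUs do not bounce a shared line between them.
The sketch assumes a fixed <symbol>NR_TOY_CPUS</symbol>, an assumed 64-byte cache line, and a caller that already knows which CPU it is running on. The kernel's <function>get_cpu_var()</function> additionally disables preemption so the task cannot migrate mid-update, which this toy does not model.
The names are hypothetical. Summing the slots gives a total that is only approximate while other CPUs keep counting.
</para>

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_TOY_CPUS 4
#define TOY_CACHELINE 64	/* assumed cache-line size */

/* Each CPU's counter lives on its own cache line. */
struct toy_pcpu_counter {
	_Alignas(TOY_CACHELINE) atomic_ulong count;
};

static struct toy_pcpu_counter hits[NR_TOY_CPUS];

/*
 * Each CPU only bumps its own slot, so no shared lock is needed; the relaxed
 * atomic add keeps a concurrent toy_read_total() from seeing torn values.
 */
static void toy_count_hit(int cpu)
{
	atomic_fetch_add_explicit(&hits[cpu].count, 1, memory_order_relaxed);
}

/* Approximate total: other CPUs may increment while we are summing. */
static unsigned long toy_read_total(void)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		sum += atomic_load_explicit(&hits[cpu].count, memory_order_relaxed);
	return sum;
}
```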
1805 | |||
1806 | <para> | ||
1807 | Of particular use for simple per-cpu counters is the | ||
1808 | <type>local_t</type> type, and the | ||
1809 | <function>local_inc()</function> and related functions, | ||
1810 | which are more efficient than simple code on some architectures | ||
1811 | (<filename class="headerfile">include/asm/local.h</filename>). | ||
1812 | </para> | ||
1813 | |||
1814 | <para> | ||
1815 | Note that there is no simple, reliable way of getting an exact | ||
1816 | value of such a counter, without introducing more locks. This | ||
1817 | is not a problem for some uses. | ||
1818 | </para> | ||
1819 | </sect1> | ||
1820 | |||
1821 | <sect1 id="mostly-hardirq"> | ||
1822 | <title>Data Which Is Mostly Used By An IRQ Handler</title> | ||
1823 | |||
1824 | <para> | ||
1825 | If data is always accessed from within the same IRQ handler, you | ||
1826 | don't need a lock at all: the kernel already guarantees that the | ||
1827 | irq handler will not run simultaneously on multiple CPUs. | ||
1828 | </para> | ||
1829 | <para> | ||
1830 | Manfred Spraul points out that you can still do this, even if | ||
1831 | the data is very occasionally accessed in user context or | ||
1832 | softirqs/tasklets. The irq handler doesn't use a lock, and | ||
1833 | all other accesses are done as follows: | ||
1834 | </para> | ||
1835 | |||
1836 | <programlisting> | ||
1837 | spin_lock(&lock); | ||
1838 | disable_irq(irq); | ||
1839 | ... | ||
1840 | enable_irq(irq); | ||
1841 | spin_unlock(&lock); | ||
1842 | </programlisting> | ||
1843 | <para> | ||
1844 | The <function>disable_irq()</function> prevents the irq handler | ||
1845 | from running (and waits for it to finish if it's currently | ||
1846 | running on other CPUs). The spinlock prevents any other | ||
1847 | accesses happening at the same time. Naturally, this is slower | ||
1848 | than just a <function>spin_lock_irq()</function> call, so it | ||
1849 | only makes sense if this type of access happens extremely | ||
1850 | rarely. | ||
1851 | </para> | ||
1852 | </sect1> | ||
1853 | </chapter> | ||
1854 | |||
1855 | <chapter id="sleeping-things"> | ||
1856 | <title>What Functions Are Safe To Call From Interrupts?</title> | ||
1857 | |||
1858 | <para> | ||
1859 | Many functions in the kernel sleep (i.e. call schedule()) | ||
1860 | directly or indirectly: you can never call them while holding a | ||
1861 | spinlock, or with preemption disabled. This also means you need | ||
1862 | to be in user context: calling them from an interrupt is illegal. | ||
1863 | </para> | ||
1864 | |||
1865 | <sect1 id="sleeping"> | ||
1866 | <title>Some Functions Which Sleep</title> | ||
1867 | |||
1868 | <para> | ||
1869 | The most common ones are listed below, but you usually have to | ||
1870 | read the code to find out if other calls are safe. If everyone | ||
1871 | else who calls it can sleep, you probably need to be able to | ||
1872 | sleep, too. In particular, registration and deregistration | ||
1873 | functions usually expect to be called from user context, and can | ||
1874 | sleep. | ||
1875 | </para> | ||
1876 | |||
1877 | <itemizedlist> | ||
1878 | <listitem> | ||
1879 | <para> | ||
1880 | Accesses to | ||
1881 | <firstterm linkend="gloss-userspace">userspace</firstterm>: | ||
1882 | </para> | ||
1883 | <itemizedlist> | ||
1884 | <listitem> | ||
1885 | <para> | ||
1886 | <function>copy_from_user()</function> | ||
1887 | </para> | ||
1888 | </listitem> | ||
1889 | <listitem> | ||
1890 | <para> | ||
1891 | <function>copy_to_user()</function> | ||
1892 | </para> | ||
1893 | </listitem> | ||
1894 | <listitem> | ||
1895 | <para> | ||
1896 | <function>get_user()</function> | ||
1897 | </para> | ||
1898 | </listitem> | ||
1899 | <listitem> | ||
1900 | <para> | ||
1901 | <function>put_user()</function> | ||
1902 | </para> | ||
1903 | </listitem> | ||
1904 | </itemizedlist> | ||
1905 | </listitem> | ||
1906 | |||
1907 | <listitem> | ||
1908 | <para> | ||
1909 | <function>kmalloc(GFP_KERNEL)</function> | ||
1910 | </para> | ||
1911 | </listitem> | ||
1912 | |||
1913 | <listitem> | ||
1914 | <para> | ||
1915 | <function>mutex_lock_interruptible()</function> and | ||
1916 | <function>mutex_lock()</function> | ||
1917 | </para> | ||
1918 | <para> | ||
1919 | There is a <function>mutex_trylock()</function> which does not | ||
1920 | sleep. Still, it must not be used inside interrupt context since | ||
1921 | its implementation is not safe for that. | ||
1922 | <function>mutex_unlock()</function> will also never sleep. | ||
1923 | It cannot be used in interrupt context either since a mutex | ||
1924 | must be released by the same task that acquired it. | ||
1925 | </para> | ||
1926 | </listitem> | ||
1927 | </itemizedlist> | ||
1928 | </sect1> | ||
1929 | |||
1930 | <sect1 id="dont-sleep"> | ||
1931 | <title>Some Functions Which Don't Sleep</title> | ||
1932 | |||
1933 | <para> | ||
1934 | Some functions are safe to call from any context, or holding | ||
1935 | almost any lock. | ||
1936 | </para> | ||
1937 | |||
1938 | <itemizedlist> | ||
1939 | <listitem> | ||
1940 | <para> | ||
1941 | <function>printk()</function> | ||
1942 | </para> | ||
1943 | </listitem> | ||
1944 | <listitem> | ||
1945 | <para> | ||
1946 | <function>kfree()</function> | ||
1947 | </para> | ||
1948 | </listitem> | ||
1949 | <listitem> | ||
1950 | <para> | ||
1951 | <function>add_timer()</function> and <function>del_timer()</function> | ||
1952 | </para> | ||
1953 | </listitem> | ||
1954 | </itemizedlist> | ||
1955 | </sect1> | ||
1956 | </chapter> | ||
1957 | |||
1958 | <chapter id="apiref-mutex"> | ||
1959 | <title>Mutex API reference</title> | ||
1960 | !Iinclude/linux/mutex.h | ||
1961 | !Ekernel/locking/mutex.c | ||
1962 | </chapter> | ||
1963 | |||
1964 | <chapter id="apiref-futex"> | ||
1965 | <title>Futex API reference</title> | ||
1966 | !Ikernel/futex.c | ||
1967 | </chapter> | ||
1968 | |||
1969 | <chapter id="references"> | ||
1970 | <title>Further reading</title> | ||
1971 | |||
1972 | <itemizedlist> | ||
1973 | <listitem> | ||
1974 | <para> | ||
1975 | <filename>Documentation/locking/spinlocks.txt</filename>: | ||
1976 | Linus Torvalds' spinlocking tutorial in the kernel sources. | ||
1977 | </para> | ||
1978 | </listitem> | ||
1979 | |||
1980 | <listitem> | ||
1981 | <para> | ||
1982 | Unix Systems for Modern Architectures: Symmetric | ||
1983 | Multiprocessing and Caching for Kernel Programmers: | ||
1984 | </para> | ||
1985 | |||
1986 | <para> | ||
1987 | Curt Schimmel's very good introduction to kernel level | ||
1988 | locking (not written for Linux, but nearly everything | ||
1989 | applies). The book is expensive, but really worth every | ||
1990 | penny to understand SMP locking. [ISBN: 0201633388] | ||
1991 | </para> | ||
1992 | </listitem> | ||
1993 | </itemizedlist> | ||
1994 | </chapter> | ||
1995 | |||
1996 | <chapter id="thanks"> | ||
1997 | <title>Thanks</title> | ||
1998 | |||
1999 | <para> | ||
2000 | Thanks to Telsa Gwynne for DocBooking, neatening and adding | ||
2001 | style. | ||
2002 | </para> | ||
2003 | |||
2004 | <para> | ||
2005 | Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul | ||
2006 | Mackerras, Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim | ||
2007 | Waugh, Pete Zaitcev, James Morris, Robert Love, Paul McKenney, | ||
2008 | John Ashby for proofreading, correcting, flaming, commenting. | ||
2009 | </para> | ||
2010 | |||
2011 | <para> | ||
2012 | Thanks to the cabal for having no influence on this document. | ||
2013 | </para> | ||
2014 | </chapter> | ||
2015 | |||
2016 | <glossary id="glossary"> | ||
2017 | <title>Glossary</title> | ||
2018 | |||
2019 | <glossentry id="gloss-preemption"> | ||
2020 | <glossterm>preemption</glossterm> | ||
2021 | <glossdef> | ||
2022 | <para> | ||
2023 | Prior to 2.5, or when <symbol>CONFIG_PREEMPT</symbol> is | ||
2024 | unset, processes in user context inside the kernel would not | ||
2025 | preempt each other (ie. you had that CPU until you gave it up, | ||
2026 | except for interrupts). With the addition of | ||
2027 | <symbol>CONFIG_PREEMPT</symbol> in 2.5.4, this changed: when | ||
2028 | in user context, higher priority tasks can "cut in": spinlocks | ||
2029 | were changed to disable preemption, even on UP. | ||
2030 | </para> | ||
2031 | </glossdef> | ||
2032 | </glossentry> | ||
2033 | |||
2034 | <glossentry id="gloss-bh"> | ||
2035 | <glossterm>bh</glossterm> | ||
2036 | <glossdef> | ||
2037 | <para> | ||
2038 | Bottom Half: for historical reasons, functions with | ||
2039 | '_bh' in them often now refer to any software interrupt, e.g. | ||
2040 | <function>spin_lock_bh()</function> blocks any software interrupt | ||
2041 | on the current CPU. Bottom halves are deprecated, and will | ||
2042 | eventually be replaced by tasklets. Only one bottom half will be | ||
2043 | running at any time. | ||
2044 | </para> | ||
2045 | </glossdef> | ||
2046 | </glossentry> | ||
2047 | |||
2048 | <glossentry id="gloss-hwinterrupt"> | ||
2049 | <glossterm>Hardware Interrupt / Hardware IRQ</glossterm> | ||
2050 | <glossdef> | ||
2051 | <para> | ||
2052 | Hardware interrupt request. <function>in_irq()</function> returns | ||
2053 | <returnvalue>true</returnvalue> in a hardware interrupt handler. | ||
2054 | </para> | ||
2055 | </glossdef> | ||
2056 | </glossentry> | ||
2057 | |||
2058 | <glossentry id="gloss-interruptcontext"> | ||
2059 | <glossterm>Interrupt Context</glossterm> | ||
2060 | <glossdef> | ||
2061 | <para> | ||
2062 | Not user context: processing a hardware irq or software irq. | ||
2063 | Indicated by the <function>in_interrupt()</function> macro | ||
2064 | returning <returnvalue>true</returnvalue>. | ||
2065 | </para> | ||
2066 | </glossdef> | ||
2067 | </glossentry> | ||
2068 | |||
2069 | <glossentry id="gloss-smp"> | ||
2070 | <glossterm><acronym>SMP</acronym></glossterm> | ||
2071 | <glossdef> | ||
2072 | <para> | ||
2073 | Symmetric Multi-Processor: kernels compiled for multiple-CPU | ||
2074 | machines. (CONFIG_SMP=y). | ||
2075 | </para> | ||
2076 | </glossdef> | ||
2077 | </glossentry> | ||
2078 | |||
2079 | <glossentry id="gloss-softirq"> | ||
2080 | <glossterm>Software Interrupt / softirq</glossterm> | ||
2081 | <glossdef> | ||
2082 | <para> | ||
2083 | Software interrupt handler. <function>in_irq()</function> returns | ||
2084 | <returnvalue>false</returnvalue>; <function>in_softirq()</function> | ||
2085 | returns <returnvalue>true</returnvalue>. Tasklets and softirqs | ||
2086 | both fall into the category of 'software interrupts'. | ||
2087 | </para> | ||
2088 | <para> | ||
2089 | Strictly speaking a softirq is one of up to 32 enumerated software | ||
2090 | interrupts which can run on multiple CPUs at once. | ||
2091 | Sometimes used to refer to tasklets as | ||
2092 | well (ie. all software interrupts). | ||
2093 | </para> | ||
2094 | </glossdef> | ||
2095 | </glossentry> | ||
2096 | |||
2097 | <glossentry id="gloss-tasklet"> | ||
2098 | <glossterm>tasklet</glossterm> | ||
2099 | <glossdef> | ||
2100 | <para> | ||
2101 | A dynamically-registrable software interrupt, | ||
2102 | which is guaranteed to only run on one CPU at a time. | ||
2103 | </para> | ||
2104 | </glossdef> | ||
2105 | </glossentry> | ||
2106 | |||
2107 | <glossentry id="gloss-timers"> | ||
2108 | <glossterm>timer</glossterm> | ||
2109 | <glossdef> | ||
2110 | <para> | ||
2111 | A dynamically-registrable software interrupt, which is run at | ||
2112 | (or close to) a given time. When running, it is just like a | ||
2113 | tasklet (in fact, they are called from the TIMER_SOFTIRQ). | ||
2114 | </para> | ||
2115 | </glossdef> | ||
2116 | </glossentry> | ||
2117 | |||
2118 | <glossentry id="gloss-up"> | ||
2119 | <glossterm><acronym>UP</acronym></glossterm> | ||
2120 | <glossdef> | ||
2121 | <para> | ||
2122 | Uni-Processor: Non-SMP. (CONFIG_SMP=n). | ||
2123 | </para> | ||
2124 | </glossdef> | ||
2125 | </glossentry> | ||
2126 | |||
2127 | <glossentry id="gloss-usercontext"> | ||
2128 | <glossterm>User Context</glossterm> | ||
2129 | <glossdef> | ||
2130 | <para> | ||
2131 | The kernel executing on behalf of a particular process (ie. a | ||
2132 | system call or trap) or kernel thread. You can tell which | ||
2133 | process with the <symbol>current</symbol> macro. Not to | ||
2134 | be confused with userspace. Can be interrupted by software or | ||
2135 | hardware interrupts. | ||
2136 | </para> | ||
2137 | </glossdef> | ||
2138 | </glossentry> | ||
2139 | |||
2140 | <glossentry id="gloss-userspace"> | ||
2141 | <glossterm>Userspace</glossterm> | ||
2142 | <glossdef> | ||
2143 | <para> | ||
2144 | A process executing its own code outside the kernel. | ||
2145 | </para> | ||
2146 | </glossdef> | ||
2147 | </glossentry> | ||
2148 | |||
2149 | </glossary> | ||
2150 | </book> | ||
2151 | |||
diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl deleted file mode 100644 index 856ac20bf367..000000000000 --- a/Documentation/DocBook/kgdb.tmpl +++ /dev/null | |||
@@ -1,918 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="kgdbOnLinux"> | ||
6 | <bookinfo> | ||
7 | <title>Using kgdb, kdb and the kernel debugger internals</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Jason</firstname> | ||
12 | <surname>Wessel</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>jason.wessel@windriver.com</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | <copyright> | ||
21 | <year>2008,2010</year> | ||
22 | <holder>Wind River Systems, Inc.</holder> | ||
23 | </copyright> | ||
24 | <copyright> | ||
25 | <year>2004-2005</year> | ||
26 | <holder>MontaVista Software, Inc.</holder> | ||
27 | </copyright> | ||
28 | <copyright> | ||
29 | <year>2004</year> | ||
30 | <holder>Amit S. Kale</holder> | ||
31 | </copyright> | ||
32 | |||
33 | <legalnotice> | ||
34 | <para> | ||
35 | This file is licensed under the terms of the GNU General Public License | ||
36 | version 2. This program is licensed "as is" without any warranty of any | ||
37 | kind, whether express or implied. | ||
38 | </para> | ||
39 | |||
40 | </legalnotice> | ||
41 | </bookinfo> | ||
42 | |||
43 | <toc></toc> | ||
44 | <chapter id="Introduction"> | ||
45 | <title>Introduction</title> | ||
46 | <para> | ||
47 | The kernel has two different debugger front ends (kdb and kgdb) | ||
48 | which interface to the debug core. It is possible to use either | ||
49 | of the debugger front ends and dynamically transition between them | ||
50 | if you configure the kernel properly at compile and runtime. | ||
51 | </para> | ||
52 | <para> | ||
53 | Kdb is a simplistic shell-style interface which you can use on a | ||
54 | system console with a keyboard or serial console. You can use it | ||
55 | to inspect memory, registers, process lists, dmesg, and even set | ||
56 | breakpoints to stop in a certain location. Kdb is not a source | ||
57 | level debugger, although you can set breakpoints and execute some | ||
58 | basic kernel run control. Kdb is mainly aimed at doing some | ||
59 | analysis to aid in development or diagnosing kernel problems. You | ||
60 | can access some symbols by name in kernel built-ins or in kernel | ||
61 | modules if the code was built | ||
62 | with <symbol>CONFIG_KALLSYMS</symbol>. | ||
63 | </para> | ||
64 | <para> | ||
65 | Kgdb is intended to be used as a source level debugger for the | ||
66 | Linux kernel. It is used along with gdb to debug a Linux kernel. | ||
67 | The expectation is that gdb can be used to "break in" to the | ||
68 | kernel to inspect memory, variables and look through call stack | ||
69 | information similar to the way an application developer would use | ||
70 | gdb to debug an application. It is possible to place breakpoints | ||
71 | in kernel code and perform some limited execution stepping. | ||
72 | </para> | ||
73 | <para> | ||
74 | Two machines are required for using kgdb. One of these machines is | ||
75 | a development machine and the other is the target machine. The | ||
76 | kernel to be debugged runs on the target machine. The development | ||
77 | machine runs an instance of gdb against the vmlinux file which | ||
78 | contains the symbols (not a boot image such as bzImage, zImage, | ||
79 | uImage...). In gdb the developer specifies the connection | ||
80 | parameters and connects to kgdb. The type of connection a | ||
81 | developer makes with gdb depends on the availability of kgdb I/O | ||
82 | modules compiled as built-ins or loadable kernel modules in the test | ||
83 | machine's kernel. | ||
84 | </para> | ||
85 | </chapter> | ||
86 | <chapter id="CompilingAKernel"> | ||
87 | <title>Compiling a kernel</title> | ||
88 | <para> | ||
89 | <itemizedlist> | ||
90 | <listitem><para>In order to enable compilation of kdb, you must first enable kgdb.</para></listitem> | ||
91 | <listitem><para>The kgdb test compile options are described in the kgdb test suite chapter.</para></listitem> | ||
92 | </itemizedlist> | ||
93 | </para> | ||
94 | <sect1 id="CompileKGDB"> | ||
95 | <title>Kernel config options for kgdb</title> | ||
96 | <para> | ||
97 | To enable <symbol>CONFIG_KGDB</symbol> you should look under | ||
98 | "Kernel hacking" / "Kernel debugging" and select "KGDB: kernel debugger". | ||
99 | </para> | ||
100 | <para> | ||
101 | While it is not a hard requirement that you have symbols in your | ||
102 | vmlinux file, gdb tends not to be very useful without the symbolic | ||
103 | data, so you will want to turn | ||
104 | on <symbol>CONFIG_DEBUG_INFO</symbol> which is called "Compile the | ||
105 | kernel with debug info" in the config menu. | ||
106 | </para> | ||
107 | <para> | ||
108 | It is advised, but not required, that you turn on the | ||
109 | <symbol>CONFIG_FRAME_POINTER</symbol> kernel option which is called "Compile the | ||
110 | kernel with frame pointers" in the config menu. This option | ||
111 | inserts code into the compiled executable which saves the frame | ||
112 | information in registers or on the stack at different points which | ||
113 | allows a debugger such as gdb to more accurately construct | ||
114 | stack back traces while debugging the kernel. | ||
115 | </para> | ||
116 | <para> | ||
117 | If the architecture that you are using supports the kernel option | ||
118 | CONFIG_STRICT_KERNEL_RWX, you should consider turning it off. This | ||
119 | option will prevent the use of software breakpoints because it | ||
120 | marks certain regions of the kernel's memory space as read-only. | ||
121 | If kgdb supports it for the architecture you are using, you can | ||
122 | use hardware breakpoints if you desire to run with the | ||
123 | CONFIG_STRICT_KERNEL_RWX option turned on, else you need to turn off | ||
124 | this option. | ||
125 | </para> | ||
126 | <para> | ||
127 | Next you should choose one or more I/O drivers to connect the | ||
128 | debugging host and the debugged target. Early boot debugging requires | ||
129 | a KGDB I/O driver that supports early debugging and the driver | ||
130 | must be built into the kernel directly. Kgdb I/O driver | ||
131 | configuration takes place via kernel or module parameters which | ||
132 | you can learn more about in the section that describes the | ||
133 | parameter "kgdboc". | ||
134 | </para> | ||
135 | <para>Here is an example set of .config symbols to enable or | ||
136 | disable for kgdb: | ||
137 | <itemizedlist> | ||
138 | <listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem> | ||
139 | <listitem><para>CONFIG_FRAME_POINTER=y</para></listitem> | ||
140 | <listitem><para>CONFIG_KGDB=y</para></listitem> | ||
141 | <listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem> | ||
142 | </itemizedlist> | ||
143 | </para> | ||
144 | </sect1> | ||
145 | <sect1 id="CompileKDB"> | ||
146 | <title>Kernel config options for kdb</title> | ||
147 | <para>Kdb is quite a bit more complex than the simple gdbstub | ||
148 | sitting on top of the kernel's debug core. Kdb must implement a | ||
149 | shell, and also adds some helper functions in other parts of the | ||
150 | kernel, responsible for printing out interesting data such as what | ||
151 | you would see if you ran "lsmod", or "ps". In order to build kdb | ||
152 | into the kernel you follow the same steps as you would for kgdb. | ||
153 | </para> | ||
154 | <para>The main config option for kdb | ||
155 | is <symbol>CONFIG_KGDB_KDB</symbol> which is called "KGDB_KDB: | ||
156 | include kdb frontend for kgdb" in the config menu. In theory you | ||
157 | would have already also selected an I/O driver such as the | ||
158 | CONFIG_KGDB_SERIAL_CONSOLE interface if you plan on using kdb on a | ||
159 | serial port, when you were configuring kgdb. | ||
160 | </para> | ||
161 | <para>If you want to use a PS/2-style keyboard with kdb, you would | ||
162 | select CONFIG_KDB_KEYBOARD which is called "KGDB_KDB: keyboard as | ||
163 | input device" in the config menu. The CONFIG_KDB_KEYBOARD option | ||
164 | is not used for anything in the gdb interface to kgdb. The | ||
165 | CONFIG_KDB_KEYBOARD option only works with kdb. | ||
166 | </para> | ||
167 | <para>Here is an example set of .config symbols to enable/disable kdb: | ||
168 | <itemizedlist> | ||
169 | <listitem><para># CONFIG_STRICT_KERNEL_RWX is not set</para></listitem> | ||
170 | <listitem><para>CONFIG_FRAME_POINTER=y</para></listitem> | ||
171 | <listitem><para>CONFIG_KGDB=y</para></listitem> | ||
172 | <listitem><para>CONFIG_KGDB_SERIAL_CONSOLE=y</para></listitem> | ||
173 | <listitem><para>CONFIG_KGDB_KDB=y</para></listitem> | ||
174 | <listitem><para>CONFIG_KDB_KEYBOARD=y</para></listitem> | ||
175 | </itemizedlist> | ||
176 | </para> | ||
177 | </sect1> | ||
178 | </chapter> | ||
179 | <chapter id="kgdbKernelArgs"> | ||
180 | <title>Kernel Debugger Boot Arguments</title> | ||
181 | <para>This section describes the various runtime kernel | ||
182 | parameters that affect the configuration of the kernel debugger. | ||
183 | The following chapter covers using kdb and kgdb as well as | ||
184 | providing some examples of the configuration parameters.</para> | ||
185 | <sect1 id="kgdboc"> | ||
186 | <title>Kernel parameter: kgdboc</title> | ||
187 | <para>The name kgdboc was originally an abbreviation meant to | ||
188 | stand for "kgdb over console". Today it is the primary mechanism | ||
189 | to configure how to communicate from gdb to kgdb as well as the | ||
190 | devices you want to use to interact with the kdb shell. | ||
191 | </para> | ||
192 | <para>For kgdb/gdb, kgdboc is designed to work with a single serial | ||
193 | port. It is intended to cover the circumstance where you want to | ||
194 | use a serial console as your primary console as well as using it to | ||
195 | perform kernel debugging. It is also possible to use kgdb on a | ||
196 | serial port which is not designated as a system console. Kgdboc | ||
197 | may be configured as a kernel built-in or a kernel loadable module. | ||
198 | You can only make use of <constant>kgdbwait</constant> and early | ||
199 | debugging if you build kgdboc into the kernel as a built-in. | ||
200 | </para> | ||
201 | <para>Optionally you can elect to activate kms (Kernel Mode | ||
202 | Setting) integration. When you use kms with kgdboc and you have a | ||
203 | video driver that has atomic mode setting hooks, it is possible to | ||
204 | enter the debugger on the graphics console. When the kernel | ||
205 | execution is resumed, the previous graphics mode will be restored. | ||
206 | This integration can serve as a useful tool to aid in diagnosing | ||
207 | crashes or doing analysis of memory with kdb while allowing the | ||
208 | full graphics console applications to run. | ||
209 | </para> | ||
210 | <sect2 id="kgdbocArgs"> | ||
211 | <title>kgdboc arguments</title> | ||
212 | <para>Usage: <constant>kgdboc=[kms][[,]kbd][[,]serial_device][,baud]</constant></para> | ||
213 | <para>The order listed above must be observed if you use any of the | ||
214 | optional configurations together. | ||
215 | </para> | ||
216 | <para>Abbreviations: | ||
217 | <itemizedlist> | ||
218 | <listitem><para>kms = Kernel Mode Setting</para></listitem> | ||
219 | <listitem><para>kbd = Keyboard</para></listitem> | ||
220 | </itemizedlist> | ||
221 | </para> | ||
222 | <para>You can configure kgdboc to use the keyboard, and/or a serial | ||
223 | device depending on whether you are using kdb and/or kgdb, in one of the | ||
224 | following scenarios. The order listed above must be observed if | ||
225 | you use any of the optional configurations together. Using kms + | ||
226 | only gdb is generally not a useful combination.</para> | ||
227 | <sect3 id="kgdbocArgs1"> | ||
228 | <title>Using loadable module or built-in</title> | ||
229 | <para> | ||
230 | <orderedlist> | ||
231 | <listitem><para>As a kernel built-in:</para> | ||
232 | <para>Use the kernel boot argument: <constant>kgdboc=<tty-device>,[baud]</constant></para></listitem> | ||
233 | <listitem> | ||
234 | <para>As a kernel loadable module:</para> | ||
235 | <para>Use the command: <constant>modprobe kgdboc kgdboc=<tty-device>,[baud]</constant></para> | ||
236 | <para>Here are two examples of how you might format the kgdboc | ||
237 | string. The first is for an x86 target using the first serial port. | ||
238 | The second example is for the ARM Versatile AB using the second | ||
239 | serial port. | ||
240 | <orderedlist> | ||
241 | <listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem> | ||
242 | <listitem><para><constant>kgdboc=ttyAMA1,115200</constant></para></listitem> | ||
243 | </orderedlist> | ||
244 | </para> | ||
245 | </listitem> | ||
246 | </orderedlist></para> | ||
247 | </sect3> | ||
248 | <sect3 id="kgdbocArgs2"> | ||
249 | <title>Configure kgdboc at runtime with sysfs</title> | ||
250 | <para>At run time you can enable or disable kgdboc by echoing | ||
251 | parameters into sysfs. Here are two examples:</para> | ||
252 | <orderedlist> | ||
253 | <listitem><para>Enable kgdboc on ttyS0</para> | ||
254 | <para><constant>echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem> | ||
255 | <listitem><para>Disable kgdboc</para> | ||
256 | <para><constant>echo "" > /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem> | ||
257 | </orderedlist> | ||
258 | <para>NOTE: You do not need to specify the baud if you are | ||
259 | configuring the console on a tty which is already configured or | ||
260 | open.</para> | ||
261 | </sect3> | ||
262 | <sect3 id="kgdbocArgs3"> | ||
263 | <title>More examples</title> | ||
264 | <para>You can configure kgdboc to use the keyboard, and/or a serial device | ||
265 | depending on whether you are using kdb and/or kgdb, in one of the | ||
266 | following scenarios. | ||
267 | <orderedlist> | ||
268 | <listitem><para>kdb and kgdb over only a serial port</para> | ||
269 | <para><constant>kgdboc=<serial_device>[,baud]</constant></para> | ||
270 | <para>Example: <constant>kgdboc=ttyS0,115200</constant></para> | ||
271 | </listitem> | ||
272 | <listitem><para>kdb and kgdb with keyboard and a serial port</para> | ||
273 | <para><constant>kgdboc=kbd,<serial_device>[,baud]</constant></para> | ||
274 | <para>Example: <constant>kgdboc=kbd,ttyS0,115200</constant></para> | ||
275 | </listitem> | ||
276 | <listitem><para>kdb with a keyboard</para> | ||
277 | <para><constant>kgdboc=kbd</constant></para> | ||
278 | </listitem> | ||
279 | <listitem><para>kdb with kernel mode setting</para> | ||
280 | <para><constant>kgdboc=kms,kbd</constant></para> | ||
281 | </listitem> | ||
282 | <listitem><para>kdb with kernel mode setting and kgdb over a serial port</para> | ||
283 | <para><constant>kgdboc=kms,kbd,ttyS0,115200</constant></para> | ||
284 | </listitem> | ||
285 | </orderedlist> | ||
286 | </para> | ||
287 | <para>NOTE: Kgdboc does not support interrupting the target via the | ||
288 | gdb remote protocol. You must manually send a sysrq-g unless you | ||
289 | have a proxy that splits console output to a terminal program. | ||
290 | A console proxy has a separate TCP port for the debugger and a separate | ||
291 | TCP port for the "human" console. The proxy can take care of sending | ||
292 | the sysrq-g for you. | ||
293 | </para> | ||
294 | <para>When using kgdboc with no debugger proxy, you can end up | ||
295 | connecting the debugger at one of two entry points. If an | ||
296 | exception occurs after you have loaded kgdboc, a message should | ||
297 | print on the console stating it is waiting for the debugger. In | ||
298 | this case you disconnect your terminal program and then connect the | ||
299 | debugger in its place. If you want to interrupt the target system | ||
300 | and forcibly enter a debug session you have to issue a Sysrq | ||
301 | sequence and then type the letter <constant>g</constant>. Then | ||
302 | you disconnect the terminal session and connect gdb. Your options | ||
303 | if you don't like this are to hack gdb to send the sysrq-g for you | ||
304 | as well as on the initial connect, or to use a debugger proxy that | ||
305 | allows an unmodified gdb to do the debugging. | ||
306 | </para> | ||
307 | </sect3> | ||
308 | </sect2> | ||
309 | </sect1> | ||
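The documented ordering for the kgdboc argument, kgdboc=[kms][[,]kbd][[,]serial_device][,baud], can be made concrete with a small userspace C sketch. The struct kgdboc_opts and the function parse_kgdboc() are hypothetical names invented for illustration; this is not the parser used by the kernel's kgdboc driver, only a model of how the optional kms, kbd, serial device and baud fields are consumed in the documented order.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type, for illustration only (not a kernel struct). */
struct kgdboc_opts {
    bool kms;       /* "kms" prefix was present */
    bool kbd;       /* "kbd" prefix was present */
    char tty[32];   /* serial device such as "ttyS0"; empty if none given */
    long baud;      /* 0 if no baud rate was given */
};

/*
 * Parse "[kms][[,]kbd][[,]serial_device][,baud]" in the documented order.
 * Returns 0 on success, -1 on a malformed string.
 */
static int parse_kgdboc(const char *arg, struct kgdboc_opts *o)
{
    memset(o, 0, sizeof(*o));
    const char *p = arg;

    /* Optional "kms", optionally followed by a comma. */
    if (strncmp(p, "kms", 3) == 0) {
        o->kms = true;
        p += 3;
        if (*p == ',')
            p++;
    }
    /* Optional "kbd", optionally followed by a comma. */
    if (strncmp(p, "kbd", 3) == 0) {
        o->kbd = true;
        p += 3;
        if (*p == ',')
            p++;
    }
    if (*p == '\0')
        return 0;   /* keyboard/kms only, no serial device */

    /* Whatever remains is "serial_device[,baud]". */
    const char *comma = strchr(p, ',');
    size_t len = comma ? (size_t)(comma - p) : strlen(p);
    if (len == 0 || len >= sizeof(o->tty))
        return -1;
    memcpy(o->tty, p, len);
    o->tty[len] = '\0';

    if (comma) {
        char *end;
        o->baud = strtol(comma + 1, &end, 10);
        if (end == comma + 1 || *end != '\0')
            return -1;  /* baud must be a plain decimal number */
    }
    return 0;
}
```

With this sketch, "kms,kbd,ttyS0,115200" sets both flags, the device ttyS0 and a baud of 115200, while "kbd" alone leaves the device empty and the baud at 0.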
310 | <sect1 id="kgdbwait"> | ||
311 | <title>Kernel parameter: kgdbwait</title> | ||
312 | <para> | ||
313 | The kernel command line option <constant>kgdbwait</constant> makes | ||
314 | kgdb wait for a debugger connection during booting of a kernel. You | ||
315 | can only use this option if you compiled a kgdb I/O driver into the | ||
316 | kernel and you specified the I/O driver configuration as a kernel | ||
317 | command line option. The kgdbwait parameter should always follow the | ||
318 | configuration parameter for the kgdb I/O driver in the kernel | ||
319 | command line; otherwise the I/O driver will not be configured prior to | ||
320 | asking the kernel to use it to wait. | ||
321 | </para> | ||
322 | <para> | ||
323 | The kernel will stop and wait as early as the I/O driver and | ||
324 | architecture allows when you use this option. If you build the | ||
325 | kgdb I/O driver as a loadable kernel module, kgdbwait will not do | ||
326 | anything. | ||
327 | </para> | ||
328 | </sect1> | ||
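The rule that kgdbwait must follow the I/O driver's configuration parameter on the command line can be checked mechanically. The following C sketch assumes kgdboc is the I/O driver in use; kgdbwait_order_ok() is a hypothetical helper, not a kernel function, and it cannot tell whether kgdboc was actually built into the kernel, which kgdbwait also requires.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel): returns true when a
 * kernel command line has no "kgdbwait" token, or when its "kgdbwait"
 * token comes after a non-empty "kgdboc=..." token.
 */
static bool kgdbwait_order_ok(const char *cmdline)
{
    bool seen_kgdboc = false;
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip the spaces separating parameters. */
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;

        /* Measure the current space-delimited token. */
        const char *end = strchr(p, ' ');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > 7 && strncmp(p, "kgdboc=", 7) == 0)
            seen_kgdboc = true;
        else if (len == 8 && strncmp(p, "kgdbwait", 8) == 0)
            return seen_kgdboc;   /* kgdbwait valid only after kgdboc= */

        p += len;
    }
    return true;
}
```

For instance, "console=ttyS0,115200 kgdboc=ttyS0,115200 kgdbwait" passes, while "kgdbwait kgdboc=ttyS0,115200" is rejected because the I/O driver would not yet be configured when the kernel is asked to wait.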
329 | <sect1 id="kgdbcon"> | ||
330 | <title>Kernel parameter: kgdbcon</title> | ||
331 | <para> The kgdbcon feature allows you to see printk() messages | ||
332 | inside gdb while gdb is connected to the kernel. Kdb does not make | ||
333 | use of the kgdbcon feature. | ||
334 | </para> | ||
335 | <para>Kgdb supports using the gdb serial protocol to send console | ||
336 | messages to the debugger when the debugger is connected and running. | ||
337 | There are two ways to activate this feature. | ||
338 | <orderedlist> | ||
339 | <listitem><para>Activate with the kernel command line option:</para> | ||
340 | <para><constant>kgdbcon</constant></para> | ||
341 | </listitem> | ||
342 | <listitem><para>Use sysfs before configuring an I/O driver</para> | ||
343 | <para> | ||
344 | <constant>echo 1 > /sys/module/kgdb/parameters/kgdb_use_con</constant> | ||
345 | </para> | ||
346 | <para> | ||
347 | NOTE: If you do this after you configure the kgdb I/O driver, the | ||
348 | setting will not take effect until the next time the I/O driver is | ||
349 | reconfigured. | ||
350 | </para> | ||
351 | </listitem> | ||
352 | </orderedlist> | ||
353 | </para> | ||
354 | <para>IMPORTANT NOTE: You cannot use kgdboc + kgdbcon on a tty that is an | ||
355 | active system console. An example of incorrect usage is <constant>console=ttyS0,115200 kgdboc=ttyS0 kgdbcon</constant> | ||
356 | </para> | ||
357 | <para>It is possible to use this option with kgdboc on a tty that is not a system console. | ||
358 | </para> | ||
359 | </sect1> | ||
360 | <sect1 id="kgdbreboot"> | ||
361 | <title>Run time parameter: kgdbreboot</title> | ||
362 | <para> The kgdbreboot feature allows you to change how the debugger | ||
363 | deals with the reboot notification. You have 3 choices for the | ||
364 | behavior. The default value is 0.</para> | ||
365 | <orderedlist> | ||
366 | <listitem><para>echo -1 > /sys/module/debug_core/parameters/kgdbreboot</para> | ||
367 | <para>Ignore the reboot notification entirely.</para> | ||
368 | </listitem> | ||
369 | <listitem><para>echo 0 > /sys/module/debug_core/parameters/kgdbreboot</para> | ||
370 | <para>Send the detach message to any attached debugger client.</para> | ||
371 | </listitem> | ||
372 | <listitem><para>echo 1 > /sys/module/debug_core/parameters/kgdbreboot</para> | ||
373 | <para>Enter the debugger on reboot notify.</para> | ||
374 | </listitem> | ||
375 | </orderedlist> | ||
376 | </sect1> | ||
377 | </chapter> | ||
378 | <chapter id="usingKDB"> | ||
379 | <title>Using kdb</title> | ||
380 | <para> | ||
381 | </para> | ||
382 | <sect1 id="quickKDBserial"> | ||
383 | <title>Quick start for kdb on a serial port</title> | ||
384 | <para>This is a quick example of how to use kdb.</para> | ||
385 | <para><orderedlist> | ||
386 | <listitem><para>Configure kgdboc at boot using kernel parameters: | ||
387 | <itemizedlist> | ||
388 | <listitem><para><constant>console=ttyS0,115200 kgdboc=ttyS0,115200</constant></para></listitem> | ||
389 | </itemizedlist></para> | ||
390 | <para>OR</para> | ||
391 | <para>Configure kgdboc after the kernel has booted, assuming you are using a serial port console: | ||
392 | <itemizedlist> | ||
393 | <listitem><para><constant>echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem> | ||
394 | </itemizedlist> | ||
395 | </para> | ||
396 | </listitem> | ||
397 | <listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para> | ||
398 | <itemizedlist> | ||
399 | <listitem><para>When logged in as root or with a super user session you can run:</para> | ||
400 | <para><constant>echo g > /proc/sysrq-trigger</constant></para></listitem> | ||
401 | <listitem><para>Example using minicom 2.2</para> | ||
402 | <para>Press: <constant>Control-a</constant></para> | ||
403 | <para>Press: <constant>f</constant></para> | ||
404 | <para>Press: <constant>g</constant></para> | ||
405 | </listitem> | ||
406 | <listitem><para>When you have telneted to a terminal server that supports sending a remote break</para> | ||
407 | <para>Press: <constant>Control-]</constant></para> | ||
408 | <para>Type in: <constant>send break</constant></para> | ||
409 | <para>Press: <constant>Enter</constant></para> | ||
410 | <para>Press: <constant>g</constant></para> | ||
411 | </listitem> | ||
412 | </itemizedlist> | ||
413 | </listitem> | ||
414 | <listitem><para>From the kdb prompt you can run the "help" command to see a complete list of the commands that are available.</para> | ||
415 | <para>Some useful commands in kdb include: | ||
416 | <itemizedlist> | ||
417 | <listitem><para>lsmod -- Shows where kernel modules are loaded</para></listitem> | ||
418 | <listitem><para>ps -- Displays only the active processes</para></listitem> | ||
419 | <listitem><para>ps A -- Shows all the processes</para></listitem> | ||
420 | <listitem><para>summary -- Shows kernel version info and memory usage</para></listitem> | ||
421 | <listitem><para>bt -- Get a backtrace of the current process using dump_stack()</para></listitem> | ||
422 | <listitem><para>dmesg -- View the kernel syslog buffer</para></listitem> | ||
423 | <listitem><para>go -- Continue the system</para></listitem> | ||
424 | </itemizedlist> | ||
425 | </para> | ||
426 | </listitem> | ||
427 | <listitem> | ||
428 | <para>When you are done using kdb you need to consider rebooting the | ||
429 | system or using the "go" command to resume normal kernel | ||
430 | execution. If you have paused the kernel for a lengthy period of | ||
431 | time, applications that rely on timely networking or anything to do | ||
432 | with real wall clock time could be adversely affected, so you | ||
433 | should take this into consideration when using the kernel | ||
434 | debugger.</para> | ||
435 | </listitem> | ||
436 | </orderedlist></para> | ||
437 | </sect1> | ||
438 | <sect1 id="quickKDBkeyboard"> | ||
439 | <title>Quick start for kdb using a keyboard connected console</title> | ||
440 | <para>This is a quick example of how to use kdb with a keyboard.</para> | ||
441 | <para><orderedlist> | ||
442 | <listitem><para>Configure kgdboc at boot using kernel parameters: | ||
443 | <itemizedlist> | ||
444 | <listitem><para><constant>kgdboc=kbd</constant></para></listitem> | ||
445 | </itemizedlist></para> | ||
446 | <para>OR</para> | ||
447 | <para>Configure kgdboc after the kernel has booted: | ||
448 | <itemizedlist> | ||
449 | <listitem><para><constant>echo kbd > /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem> | ||
450 | </itemizedlist> | ||
451 | </para> | ||
452 | </listitem> | ||
453 | <listitem><para>Enter the kernel debugger manually or by waiting for an oops or fault. There are several ways you can enter the kernel debugger manually; all involve using sysrq-g, which means you must have enabled CONFIG_MAGIC_SYSRQ=y in your kernel config.</para> | ||
454 | <itemizedlist> | ||
455 | <listitem><para>When logged in as root or in a superuser session, you can run:</para> | ||
456 | <para><constant>echo g > /proc/sysrq-trigger</constant></para></listitem> | ||
457 | <listitem><para>Example using a laptop keyboard</para> | ||
458 | <para>Press and hold down: <constant>Alt</constant></para> | ||
459 | <para>Press and hold down: <constant>Fn</constant></para> | ||
460 | <para>Press and release the key with the label: <constant>SysRq</constant></para> | ||
461 | <para>Release: <constant>Fn</constant></para> | ||
462 | <para>Press and release: <constant>g</constant></para> | ||
463 | <para>Release: <constant>Alt</constant></para> | ||
464 | </listitem> | ||
465 | <listitem><para>Example using a PS/2 101-key keyboard</para> | ||
466 | <para>Press and hold down: <constant>Alt</constant></para> | ||
467 | <para>Press and release the key with the label: <constant>SysRq</constant></para> | ||
468 | <para>Press and release: <constant>g</constant></para> | ||
469 | <para>Release: <constant>Alt</constant></para> | ||
470 | </listitem> | ||
471 | </itemizedlist> | ||
472 | </listitem> | ||
473 | <listitem> | ||
474 | <para>Now type in a kdb command such as "help", "dmesg", "bt" or "go" to continue kernel execution.</para> | ||
475 | </listitem> | ||
476 | </orderedlist></para> | ||
477 | </sect1> | ||
478 | </chapter> | ||
479 | <chapter id="EnableKGDB"> | ||
480 | <title>Using kgdb / gdb</title> | ||
481 | <para>In order to use kgdb you must activate it by passing | ||
482 | configuration information to one of the kgdb I/O drivers. If you | ||
483 | do not pass any configuration information kgdb will not do anything | ||
484 | at all. Kgdb will only actively hook up to the kernel trap hooks | ||
485 | if a kgdb I/O driver is loaded and configured. If you unconfigure | ||
486 | a kgdb I/O driver, kgdb will unregister all the kernel hook points. | ||
487 | </para> | ||
488 | <para> All kgdb I/O drivers can be reconfigured at run time, if | ||
489 | <symbol>CONFIG_SYSFS</symbol> and <symbol>CONFIG_MODULES</symbol> | ||
490 | are enabled, by echo'ing a new config string to | ||
491 | <constant>/sys/module/<driver>/parameters/<option></constant>. | ||
492 | The driver can be unconfigured by passing an empty string. You cannot | ||
493 | change the configuration while the debugger is attached. Make sure | ||
494 | to detach the debugger with the <constant>detach</constant> command | ||
495 | prior to trying to unconfigure a kgdb I/O driver. | ||
496 | </para> | ||
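Reconfiguration is normally done with echo from a root shell. As a minimal sketch, assuming only the sysfs layout described above, the same step looks like this from a C program; kgdb_io_param_path() and kgdb_io_configure() are hypothetical helpers, not kernel or libc APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/module/<driver>/parameters/<option>" into buf.
 * Returns 0 on success, -1 if the path would be truncated. */
int kgdb_io_param_path(char *buf, size_t len, const char *driver,
		       const char *option)
{
	assert(buf && driver && option);
	int n = snprintf(buf, len, "/sys/module/%s/parameters/%s", driver, option);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a config string the way "echo" would (with a trailing newline).
 * Passing "" unconfigures the driver. Returns 0 on success, -1 on error.
 * Write errors from sysfs usually surface when the buffer is flushed,
 * so the result of fclose() is checked as well. */
int kgdb_io_configure(const char *param_file, const char *config)
{
	FILE *f = fopen(param_file, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%s\n", config) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For instance, kgdb_io_configure("/sys/module/kgdboc/parameters/kgdboc", "ttyS0,115200") has the same effect as the echo command, and an empty string unconfigures the driver. As noted above, detach gdb before doing this.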
497 | <sect1 id="ConnectingGDB"> | ||
498 | <title>Connecting with gdb to a serial port</title> | ||
499 | <orderedlist> | ||
500 | <listitem><para>Configure kgdboc</para> | ||
501 | <para>Configure kgdboc at boot using kernel parameters: | ||
502 | <itemizedlist> | ||
503 | <listitem><para><constant>kgdboc=ttyS0,115200</constant></para></listitem> | ||
504 | </itemizedlist></para> | ||
505 | <para>OR</para> | ||
506 | <para>Configure kgdboc after the kernel has booted: | ||
507 | <itemizedlist> | ||
508 | <listitem><para><constant>echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc</constant></para></listitem> | ||
509 | </itemizedlist></para> | ||
510 | </listitem> | ||
511 | <listitem> | ||
512 | <para>Stop kernel execution (break into the debugger)</para> | ||
513 | <para>In order to connect to gdb via kgdboc, the kernel must | ||
514 | first be stopped. There are several ways to stop the kernel which | ||
515 | include using kgdbwait as a boot argument, via a sysrq-g, or running | ||
516 | the kernel until it takes an exception where it waits for the | ||
517 | debugger to attach. | ||
518 | <itemizedlist> | ||
519 | <listitem><para>When logged in as root or in a superuser session, you can run:</para> | ||
520 | <para><constant>echo g > /proc/sysrq-trigger</constant></para></listitem> | ||
521 | <listitem><para>Example using minicom 2.2</para> | ||
522 | <para>Press: <constant>Control-a</constant></para> | ||
523 | <para>Press: <constant>f</constant></para> | ||
524 | <para>Press: <constant>g</constant></para> | ||
525 | </listitem> | ||
526 | <listitem><para>When you have telneted to a terminal server that supports sending a remote break</para> | ||
527 | <para>Press: <constant>Control-]</constant></para> | ||
528 | <para>Type in: <constant>send break</constant></para> | ||
529 | <para>Press: <constant>Enter</constant></para> | ||
530 | <para>Press: <constant>g</constant></para> | ||
531 | </listitem> | ||
532 | </itemizedlist> | ||
533 | </para> | ||
534 | </listitem> | ||
535 | <listitem> | ||
536 | <para>Connect from gdb</para> | ||
537 | <para> | ||
538 | Example (using a directly connected port): | ||
539 | </para> | ||
540 | <programlisting> | ||
541 | % gdb ./vmlinux | ||
542 | (gdb) set serial baud 115200 | ||
543 | (gdb) target remote /dev/ttyS0 | ||
544 | </programlisting> | ||
545 | <para> | ||
546 | Example (kgdb to a terminal server on TCP port 2012): | ||
547 | </para> | ||
548 | <programlisting> | ||
549 | % gdb ./vmlinux | ||
550 | (gdb) target remote 192.168.2.2:2012 | ||
551 | </programlisting> | ||
552 | <para> | ||
553 | Once connected, you can debug a kernel the way you would debug an | ||
554 | application program. | ||
555 | </para> | ||
556 | <para> | ||
557 | If you are having problems connecting or something is going | ||
558 | seriously wrong while debugging, it will most often be the case | ||
559 | that you want to enable gdb to be verbose about its target | ||
560 | communications. You do this prior to issuing the <constant>target | ||
561 | remote</constant> command by typing in: <constant>set debug remote 1</constant> | ||
562 | </para> | ||
563 | </listitem> | ||
564 | </orderedlist> | ||
565 | <para>Remember if you continue in gdb, and need to "break in" again, | ||
566 | you need to issue another sysrq-g. It is easy to create a simple | ||
567 | entry point by putting a breakpoint at <constant>sys_sync</constant> | ||
568 | and then you can run "sync" from a shell or script to break into the | ||
569 | debugger.</para> | ||
570 | </sect1> | ||
571 | </chapter> | ||
572 | <chapter id="switchKdbKgdb"> | ||
573 | <title>kgdb and kdb interoperability</title> | ||
574 | <para>It is possible to transition between kdb and kgdb dynamically. | ||
575 | The debug core will remember which you used the last time and | ||
576 | automatically start in the same mode.</para> | ||
577 | <sect1> | ||
578 | <title>Switching between kdb and kgdb</title> | ||
579 | <sect2> | ||
580 | <title>Switching from kgdb to kdb</title> | ||
581 | <para> | ||
582 | There are two ways to switch from kgdb to kdb: you can use gdb to | ||
583 | issue a maintenance packet, or you can blindly type the command $3#33. | ||
584 | Whenever the kernel debugger stops in kgdb mode it will print the | ||
585 | message <constant>KGDB or $3#33 for KDB</constant>. It is important | ||
586 | to note that you have to type the sequence correctly in one pass. | ||
587 | You cannot type a backspace or delete because kgdb will interpret | ||
588 | that as part of the debug stream. | ||
589 | <orderedlist> | ||
590 | <listitem><para>Change from kgdb to kdb by blindly typing:</para> | ||
591 | <para><constant>$3#33</constant></para></listitem> | ||
592 | <listitem><para>Change from kgdb to kdb with gdb</para> | ||
593 | <para><constant>maintenance packet 3</constant></para> | ||
594 | <para>NOTE: Now you must kill gdb. Typically you press control-z and | ||
595 | issue the command: kill -9 %</para></listitem> | ||
596 | </orderedlist> | ||
597 | </para> | ||
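The odd-looking $3#33 string is an ordinary gdb remote serial protocol packet: a $, the payload "3", a #, and a checksum equal to the sum of the payload bytes modulo 256 written in hex ('3' is 0x33). The hypothetical rsp_frame() below is a minimal sketch of that framing, based on the public gdb protocol description rather than on kgdb's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Frame a gdb remote serial protocol packet as $<payload>#<checksum>.
 * The checksum is the sum of the payload bytes modulo 256, printed as
 * two hex digits. Returns the framed length, or -1 if out is too small. */
int rsp_frame(char *out, size_t len, const char *payload)
{
	unsigned int sum = 0;

	assert(out && payload);
	for (const char *p = payload; *p; p++)
		sum += (unsigned char)*p;
	int n = snprintf(out, len, "$%s#%02x", payload, sum & 0xffu);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

A stray backspace would become part of the payload and no longer match the checksum, which is presumably why the sequence has to be typed correctly in one pass.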
598 | </sect2> | ||
599 | <sect2> | ||
600 | <title>Change from kdb to kgdb</title> | ||
601 | <para>There are two ways you can change from kdb to kgdb. You can | ||
602 | manually enter kgdb mode by issuing the kgdb command from the kdb | ||
603 | shell prompt, or you can connect gdb while the kdb shell prompt is | ||
604 | active. The kdb shell looks for the typical first commands that gdb | ||
605 | would issue with the gdb remote protocol and if it sees one of those | ||
606 | commands it automatically changes into kgdb mode.</para> | ||
607 | <orderedlist> | ||
608 | <listitem><para>From kdb issue the command:</para> | ||
609 | <para><constant>kgdb</constant></para> | ||
610 | <para>Now disconnect your terminal program and connect gdb in its place</para></listitem> | ||
611 | <listitem><para>At the kdb prompt, disconnect the terminal program and connect gdb in its place.</para></listitem> | ||
612 | </orderedlist> | ||
613 | </sect2> | ||
614 | </sect1> | ||
615 | <sect1> | ||
616 | <title>Running kdb commands from gdb</title> | ||
617 | <para>It is possible to run a limited set of kdb commands from gdb, | ||
618 | using the gdb monitor command. You don't want to execute any of the | ||
619 | run control or breakpoint operations, because they can disrupt the | ||
620 | state of the kernel debugger. You should be using gdb for | ||
621 | breakpoints and run control operations if you have gdb connected. | ||
622 | The more useful commands to run are things like lsmod, dmesg, ps or | ||
623 | possibly some of the memory information commands. To see all the kdb | ||
624 | commands you can run <constant>monitor help</constant>.</para> | ||
625 | <para>Example: | ||
626 | <informalexample><programlisting> | ||
627 | (gdb) monitor ps | ||
628 | 1 idle process (state I) and | ||
629 | 27 sleeping system daemon (state M) processes suppressed, | ||
630 | use 'ps A' to see all. | ||
631 | Task Addr Pid Parent [*] cpu State Thread Command | ||
632 | |||
633 | 0xc78291d0 1 0 0 0 S 0xc7829404 init | ||
634 | 0xc7954150 942 1 0 0 S 0xc7954384 dropbear | ||
635 | 0xc78789c0 944 1 0 0 S 0xc7878bf4 sh | ||
636 | (gdb) | ||
637 | </programlisting></informalexample> | ||
638 | </para> | ||
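In the gdb remote protocol, monitor &lt;command&gt; is carried by a qRcmd packet whose argument is the command text hex-encoded byte by byte; kgdb presumably decodes it and hands it to kdb. The sketch below, using the hypothetical name rsp_monitor_payload(), builds that payload; it still needs the usual $...#checksum framing before it goes on the wire.

```c
#include <assert.h>
#include <string.h>

/* Build the payload gdb sends for "monitor <cmd>": "qRcmd," followed by
 * the command text hex-encoded byte by byte. Returns the payload length,
 * or -1 if out is too small. */
int rsp_monitor_payload(char *out, size_t len, const char *cmd)
{
	static const char hex[] = "0123456789abcdef";
	static const char prefix[] = "qRcmd,";
	char *o = out;

	assert(out && cmd);
	if ((sizeof prefix - 1) + 2 * strlen(cmd) + 1 > len)
		return -1;
	memcpy(o, prefix, sizeof prefix - 1);
	o += sizeof prefix - 1;
	for (const unsigned char *p = (const unsigned char *)cmd; *p; p++) {
		*o++ = hex[*p >> 4];   /* high nibble */
		*o++ = hex[*p & 0x0f]; /* low nibble */
	}
	*o = '\0';
	return (int)(o - out);
}
```

So monitor ps becomes the payload qRcmd,7073.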
639 | </sect1> | ||
640 | </chapter> | ||
641 | <chapter id="KGDBTestSuite"> | ||
642 | <title>kgdb Test Suite</title> | ||
643 | <para> | ||
644 | When kgdb is enabled in the kernel config you can also elect to | ||
645 | enable the config parameter KGDB_TESTS. Turning this on will | ||
646 | enable a special kgdb I/O module which is designed to test the | ||
647 | kgdb internal functions. | ||
648 | </para> | ||
649 | <para> | ||
650 | The kgdb tests are mainly intended for developers to test the kgdb | ||
651 | internals as well as a tool for developing a new kgdb architecture | ||
652 | specific implementation. These tests are not really for end users | ||
653 | of the Linux kernel. The primary source of documentation is the | ||
654 | drivers/misc/kgdbts.c file itself. | ||
655 | </para> | ||
656 | <para> | ||
657 | The kgdb test suite can also be configured at compile time to run | ||
658 | the core set of tests by setting the kernel config parameter | ||
659 | KGDB_TESTS_ON_BOOT. This particular option is aimed at automated | ||
660 | regression testing and does not require modifying the kernel boot | ||
661 | config arguments. If this is turned on, the kgdb test suite can | ||
662 | be disabled by specifying "kgdbts=" as a kernel boot argument. | ||
663 | </para> | ||
664 | </chapter> | ||
665 | <chapter id="CommonBackEndReq"> | ||
666 | <title>Kernel Debugger Internals</title> | ||
667 | <sect1 id="kgdbArchitecture"> | ||
668 | <title>Architecture Specifics</title> | ||
669 | <para> | ||
670 | The kernel debugger is organized into a number of components: | ||
671 | <orderedlist> | ||
672 | <listitem><para>The debug core</para> | ||
673 | <para> | ||
674 | The debug core is found in kernel/debug/debug_core.c. It contains: | ||
675 | <itemizedlist> | ||
676 | <listitem><para>A generic OS exception handler which includes | ||
677 | sync'ing the processors into a stopped state on a multi-CPU | ||
678 | system.</para></listitem> | ||
679 | <listitem><para>The API to talk to the kgdb I/O drivers</para></listitem> | ||
680 | <listitem><para>The API to make calls to the arch-specific kgdb implementation</para></listitem> | ||
681 | <listitem><para>The logic to perform safe memory reads and writes to memory while using the debugger</para></listitem> | ||
682 | <listitem><para>A full implementation for software breakpoints unless overridden by the arch</para></listitem> | ||
683 | <listitem><para>The API to invoke either the kdb or kgdb frontend to the debug core.</para></listitem> | ||
684 | <listitem><para>The structures and callback API for atomic kernel mode setting.</para> | ||
685 | <para>NOTE: kgdboc is where the kms callbacks are invoked.</para></listitem> | ||
686 | </itemizedlist> | ||
687 | </para> | ||
688 | </listitem> | ||
689 | <listitem><para>kgdb arch-specific implementation</para> | ||
690 | <para> | ||
691 | This implementation is generally found in arch/*/kernel/kgdb.c. | ||
692 | As an example, arch/x86/kernel/kgdb.c contains the specifics to | ||
693 | implement HW breakpoints as well as the initialization to | ||
694 | dynamically register and unregister for the trap handlers on | ||
695 | this architecture. The arch-specific portion implements: | ||
696 | <itemizedlist> | ||
697 | <listitem><para>An arch-specific trap catcher which | ||
698 | invokes kgdb_handle_exception() so that kgdb can start its | ||
699 | work</para></listitem> | ||
700 | <listitem><para>Translation between the gdb-specific packet format and pt_regs</para></listitem> | ||
701 | <listitem><para>Registration and unregistration of architecture specific trap hooks</para></listitem> | ||
702 | <listitem><para>Any special exception handling and cleanup</para></listitem> | ||
703 | <listitem><para>NMI exception handling and cleanup</para></listitem> | ||
704 | <listitem><para>(optional) HW breakpoints</para></listitem> | ||
705 | </itemizedlist> | ||
706 | </para> | ||
707 | </listitem> | ||
708 | <listitem><para>gdbstub frontend (aka kgdb)</para> | ||
709 | <para>The gdbstub is located in kernel/debug/gdbstub.c. It contains:</para> | ||
710 | <itemizedlist> | ||
711 | <listitem><para>All the logic to implement the gdb serial protocol</para></listitem> | ||
712 | </itemizedlist> | ||
713 | </listitem> | ||
714 | <listitem><para>kdb frontend</para> | ||
715 | <para>The kdb debugger shell is broken down into a number of | ||
716 | components. The kdb core is located in kernel/debug/kdb. There | ||
717 | are a number of helper functions in some of the other kernel | ||
718 | components to make it possible for kdb to examine and report | ||
719 | information about the kernel without taking locks that could | ||
720 | cause a kernel deadlock. The kdb core implements the following functionality:</para> | ||
721 | <itemizedlist> | ||
722 | <listitem><para>A simple shell</para></listitem> | ||
723 | <listitem><para>The kdb core command set</para></listitem> | ||
724 | <listitem><para>A registration API to register additional kdb shell commands.</para> | ||
725 | <itemizedlist> | ||
726 | <listitem><para>A good example of a self-contained kdb module | ||
727 | is the "ftdump" command for dumping the ftrace buffer. See: | ||
728 | kernel/trace/trace_kdb.c</para></listitem> | ||
729 | <listitem><para>For an example of how to dynamically register | ||
730 | a new kdb command you can build the kdb_hello.ko kernel module | ||
731 | from samples/kdb/kdb_hello.c. To build this example you can | ||
732 | set CONFIG_SAMPLES=y and CONFIG_SAMPLE_KDB=m in your kernel | ||
733 | config. Then run "modprobe kdb_hello" and the next time you | ||
734 | enter the kdb shell, you can run the "hello" | ||
735 | command.</para></listitem> | ||
736 | </itemizedlist></listitem> | ||
737 | <listitem><para>The implementation for kdb_printf() which | ||
738 | emits messages directly to I/O drivers, bypassing the kernel | ||
739 | log.</para></listitem> | ||
740 | <listitem><para>SW / HW breakpoint management for the kdb shell</para></listitem> | ||
741 | </itemizedlist> | ||
742 | </listitem> | ||
743 | <listitem><para>kgdb I/O driver</para> | ||
744 | <para> | ||
745 | Each kgdb I/O driver has to provide an implementation for the following: | ||
746 | <itemizedlist> | ||
747 | <listitem><para>configuration via built-in or module</para></listitem> | ||
748 | <listitem><para>dynamic configuration and kgdb hook registration calls</para></listitem> | ||
749 | <listitem><para>read and write character interface</para></listitem> | ||
750 | <listitem><para>A cleanup handler for unconfiguring from the kgdb core</para></listitem> | ||
751 | <listitem><para>(optional) Early debug methodology</para></listitem> | ||
752 | </itemizedlist> | ||
753 | Any given kgdb I/O driver has to operate very closely with the | ||
754 | hardware and must do it in such a way that does not enable | ||
755 | interrupts or change other parts of the system context without | ||
756 | completely restoring them. The kgdb core will repeatedly "poll" | ||
757 | a kgdb I/O driver for characters when it needs input. The I/O | ||
758 | driver is expected to return immediately if there is no data | ||
759 | available. Doing so leaves open the future possibility of touching | ||
760 | watchdog hardware between polls, so that a target system does not | ||
761 | reset while a watchdog is enabled. | ||
762 | </para> | ||
763 | </listitem> | ||
764 | </orderedlist> | ||
765 | </para> | ||
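The polling contract described above can be illustrated with a small user-space simulation. Everything here is hypothetical (struct fake_io, fake_read_char(), core_read_line() and the IO_NO_CHAR value are not kgdb interfaces): the driver hook returns at once when there is no data, and the core loop keeps polling, which gives it a natural place to touch a watchdog between empty polls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IO_NO_CHAR (-1) /* hypothetical "no data available" value */

/* Hypothetical polled I/O driver: pending input plus a number of empty
 * polls to report first, simulating a slow line. */
struct fake_io {
	const char *rx;
	size_t pos;
	unsigned int empty_polls;
};

/* Driver read hook: returns a character or IO_NO_CHAR, never blocks. */
static int fake_read_char(struct fake_io *io)
{
	if (io->empty_polls) {
		io->empty_polls--;
		return IO_NO_CHAR;
	}
	if (io->rx[io->pos] == '\0')
		return IO_NO_CHAR;
	return (unsigned char)io->rx[io->pos++];
}

/* Core side: keep polling until end of line or max_polls attempts.
 * Each empty poll is where a watchdog could be touched; here they are
 * only counted in *idle_polls. Returns the length of the line read. */
size_t core_read_line(struct fake_io *io, char *buf, size_t len,
		      unsigned long max_polls, unsigned long *idle_polls)
{
	size_t n = 0;

	assert(len > 0);
	*idle_polls = 0;
	while (max_polls--) {
		int c = fake_read_char(io);
		if (c == IO_NO_CHAR) {
			(*idle_polls)++;
			continue;
		}
		if (c == '\n' || c == '\r')
			break;
		if (n + 1 < len)
			buf[n++] = (char)c;
	}
	buf[n] = '\0';
	return n;
}
```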
766 | <para> | ||
767 | If you are intent on adding kgdb architecture specific support | ||
768 | for a new architecture, the architecture should define | ||
769 | <constant>HAVE_ARCH_KGDB</constant> in the architecture specific | ||
770 | Kconfig file. This will enable kgdb for the architecture, and | ||
771 | at that point you must create an architecture specific kgdb | ||
772 | implementation. | ||
773 | </para> | ||
774 | <para> | ||
775 | There are a few flags which must be set on every architecture in | ||
776 | their <asm/kgdb.h> file. These are: | ||
777 | <itemizedlist> | ||
778 | <listitem> | ||
779 | <para> | ||
780 | NUMREGBYTES: The size in bytes of all of the registers, so | ||
781 | that we can ensure they will all fit into a packet. | ||
782 | </para> | ||
783 | </listitem> | ||
784 | <listitem> | ||
785 | <para> | ||
786 | BUFMAX: The size in bytes of the buffer GDB will read into. | ||
787 | This must be larger than NUMREGBYTES. | ||
788 | </para> | ||
789 | </listitem> | ||
790 | <listitem> | ||
791 | <para> | ||
792 | CACHE_FLUSH_IS_SAFE: Set to 1 if it is always safe to call | ||
793 | flush_cache_range or flush_icache_range. On some architectures, | ||
794 | these functions may not be safe to call on SMP since we keep other | ||
795 | CPUs in a holding pattern. | ||
796 | </para> | ||
797 | </listitem> | ||
798 | </itemizedlist> | ||
799 | </para> | ||
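As a sketch only, here is what these definitions might look like for an imaginary 32-bit little-endian architecture with 34 four-byte registers; all values and names (NUMREGS, struct fake_regs, regs_to_hex()) are made up. It also shows the kind of translation an arch backend does when answering gdb's register request: each register byte is sent as two hex digits in target byte order, which is why the second static assertion, a conservative addition of our own, checks that twice NUMREGBYTES fits in BUFMAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical <asm/kgdb.h> values: 32 general registers plus PC and
 * status, 4 bytes each, on an imaginary little-endian architecture. */
#define NUMREGS             34
#define NUMREGBYTES         (NUMREGS * 4)
#define BUFMAX              400
#define CACHE_FLUSH_IS_SAFE 1

/* The rule stated in the text. */
_Static_assert(BUFMAX > NUMREGBYTES, "BUFMAX must be larger than NUMREGBYTES");
/* Stricter check of our own: hex encoding doubles the register dump. */
_Static_assert(BUFMAX > 2 * NUMREGBYTES, "hex-encoded registers must fit");

/* Hypothetical register image standing in for struct pt_regs. */
struct fake_regs {
	uint32_t r[NUMREGS];
};

/* Hex-encode all registers in little-endian byte order, two lowercase
 * digits per byte. Returns characters written, or 0 if out is too small. */
size_t regs_to_hex(const struct fake_regs *regs, char *out, size_t len)
{
	static const char hex[] = "0123456789abcdef";
	size_t n = 0;

	if (len < 2 * NUMREGBYTES + 1)
		return 0;
	for (size_t i = 0; i < NUMREGS; i++) {
		for (unsigned int b = 0; b < 4; b++) {
			uint8_t byte = (uint8_t)(regs->r[i] >> (8 * b));
			out[n++] = hex[byte >> 4];
			out[n++] = hex[byte & 0x0f];
		}
	}
	out[n] = '\0';
	return n;
}
```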
800 | <para> | ||
801 | There are also the following functions for the common backend, | ||
802 | declared in include/linux/kgdb.h, that must be supplied by the | ||
803 | architecture-specific backend unless marked as (optional), in | ||
804 | which case a default function may be used if the architecture | ||
805 | does not need to provide a specific implementation. | ||
806 | </para> | ||
807 | !Iinclude/linux/kgdb.h | ||
808 | </sect1> | ||
809 | <sect1 id="kgdbocDesign"> | ||
810 | <title>kgdboc internals</title> | ||
811 | <sect2> | ||
812 | <title>kgdboc and uarts</title> | ||
813 | <para> | ||
814 | The kgdboc driver is actually a very thin driver that relies on the | ||
815 | underlying low-level hardware driver, to which the tty driver is | ||
816 | attached, providing "polling hooks". In the initial | ||
817 | implementation of kgdboc the serial_core was changed to expose a | ||
818 | low level UART hook for doing polled mode reading and writing of a | ||
819 | single character while in an atomic context. When kgdb makes an I/O | ||
820 | request to the debugger, kgdboc invokes a callback in the serial | ||
821 | core which in turn uses the callback in the UART driver.</para> | ||
822 | <para> | ||
823 | When using kgdboc with a UART, the UART driver must implement two callbacks in the <constant>struct uart_ops</constant>. Example from drivers/tty/serial/8250/8250_port.c:<programlisting> | ||
824 | #ifdef CONFIG_CONSOLE_POLL | ||
825 | .poll_get_char = serial8250_get_poll_char, | ||
826 | .poll_put_char = serial8250_put_poll_char, | ||
827 | #endif | ||
828 | </programlisting> | ||
829 | Any implementation specifics around creating a polling driver use the | ||
830 | <constant>#ifdef CONFIG_CONSOLE_POLL</constant>, as shown above. | ||
831 | Keep in mind that polling hooks have to be implemented in such a way | ||
832 | that they can be called from an atomic context and have to restore | ||
833 | the state of the UART chip on return such that the system can return | ||
834 | to normal when the debugger detaches. You need to be very careful | ||
835 | with any kind of lock you consider, because failing here is most likely | ||
836 | going to mean pressing the reset button. | ||
837 | </para> | ||
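To make the "restore the state of the UART chip" requirement concrete, here is a user-space simulation of a pair of polling hooks on a 16550-style register file. The LSR bit values are the standard 16550 ones; struct sim_uart, the function names and POLL_NO_CHAR are hypothetical. The put hook saves the interrupt enable register, masks it, waits a bounded time for the transmitter, writes the byte, and restores the saved value. As far as we know the 8250 hooks follow a similar pattern, but check the driver source before relying on that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LSR_DR       0x01 /* 16550 LSR bit: receive data ready */
#define LSR_THRE     0x20 /* 16550 LSR bit: transmit holding register empty */
#define POLL_NO_CHAR (-1) /* hypothetical "no character" value */

/* Simulated register file; a real driver would use port/MMIO accessors. */
struct sim_uart {
	uint8_t ier;        /* interrupt enable register */
	uint8_t lsr;        /* line status register */
	uint8_t rbr;        /* receive buffer register */
	char tx[32];        /* bytes written to THR, for inspection */
	size_t ntx;
	int ier_was_masked; /* records that IER was 0 during the write */
};

/* poll_get_char: return at once when nothing is waiting. */
int sim_poll_get_char(struct sim_uart *u)
{
	if (!(u->lsr & LSR_DR))
		return POLL_NO_CHAR;
	u->lsr &= (uint8_t)~LSR_DR; /* reading RBR typically clears DR */
	return u->rbr;
}

/* poll_put_char: mask interrupts, wait (bounded) for the transmitter,
 * send, then restore IER so the system can return to normal. */
void sim_poll_put_char(struct sim_uart *u, char c)
{
	uint8_t saved_ier = u->ier;

	u->ier = 0;
	for (unsigned int spins = 0; !(u->lsr & LSR_THRE) && spins < 10000; spins++)
		; /* in this simulation THRE is set by the caller */
	if (u->ntx < sizeof u->tx)
		u->tx[u->ntx++] = c;
	u->ier_was_masked = (u->ier == 0);
	u->ier = saved_ier;
}
```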
838 | </sect2> | ||
839 | <sect2 id="kgdbocKbd"> | ||
840 | <title>kgdboc and keyboards</title> | ||
841 | <para>The kgdboc driver contains logic to configure communications | ||
842 | with an attached keyboard. The keyboard infrastructure is only | ||
843 | compiled into the kernel when CONFIG_KDB_KEYBOARD=y is set in the | ||
844 | kernel configuration.</para> | ||
845 | <para>The core polled keyboard driver for PS/2 type keyboards | ||
846 | is in kernel/debug/kdb/kdb_keyboard.c. This driver is hooked into the | ||
847 | debug core when kgdboc populates the callback in the array | ||
848 | called <constant>kdb_poll_funcs[]</constant>. The | ||
849 | kdb_get_kbd_char() is the top-level function which polls hardware | ||
850 | for single character input. | ||
851 | </para> | ||
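The kdb_poll_funcs[] arrangement amounts to a table of character-polling callbacks that kdb walks in order. The sketch below models that idea in user space; the table, the two input sources and register_poll_func() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*poll_char_fn)(void); /* returns a character or -1 */

/* Hypothetical input sources standing in for a serial port and a keyboard. */
const char *serial_in = "";
const char *kbd_in = "";

int serial_poll(void) { return *serial_in ? (unsigned char)*serial_in++ : -1; }
int kbd_poll(void)    { return *kbd_in ? (unsigned char)*kbd_in++ : -1; }

/* NULL-terminated table loosely modeled on kdb_poll_funcs[]: slot 0 holds
 * the serial hook; spare slots stay NULL until a driver registers. The
 * extra element guarantees a terminating NULL. */
#define MAX_POLL_FUNCS 4
static poll_char_fn poll_funcs[MAX_POLL_FUNCS + 1] = { serial_poll };

/* Add a poll hook, as kgdboc does for the keyboard. Returns 0, or -1 if full. */
int register_poll_func(poll_char_fn fn)
{
	for (size_t i = 0; i < MAX_POLL_FUNCS; i++) {
		if (!poll_funcs[i]) {
			poll_funcs[i] = fn;
			return 0;
		}
	}
	return -1;
}

/* One polling pass: ask each hook in order, return the first character. */
int poll_all_once(void)
{
	for (size_t i = 0; poll_funcs[i]; i++) {
		int c = poll_funcs[i]();
		if (c != -1)
			return c;
	}
	return -1;
}
```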
852 | </sect2> | ||
853 | <sect2 id="kgdbocKms"> | ||
854 | <title>kgdboc and kms</title> | ||
855 | <para>The kgdboc driver contains logic to request the graphics | ||
856 | display to switch to a text context when you are using | ||
857 | "kgdboc=kms,kbd", provided that you have a video driver which has a | ||
858 | frame buffer console and atomic kernel mode setting support.</para> | ||
859 | <para> | ||
860 | Every time the kernel | ||
861 | debugger is entered it calls kgdboc_pre_exp_handler() which in turn | ||
862 | calls con_debug_enter() in the virtual console layer. On resuming kernel | ||
863 | execution, the kernel debugger calls kgdboc_post_exp_handler() which | ||
864 | in turn calls con_debug_leave().</para> | ||
865 | <para>Any video driver that wants to be compatible with the kernel | ||
866 | debugger and the atomic kms callbacks must implement the | ||
867 | mode_set_base_atomic, fb_debug_enter and fb_debug_leave operations. | ||
868 | For the fb_debug_enter and fb_debug_leave the option exists to use | ||
869 | the generic drm fb helper functions or implement something custom for | ||
870 | the hardware. The following example shows the initialization of the | ||
871 | .mode_set_base_atomic operation in | ||
872 | drivers/gpu/drm/i915/intel_display.c: | ||
873 | <informalexample> | ||
874 | <programlisting> | ||
875 | static const struct drm_crtc_helper_funcs intel_helper_funcs = { | ||
876 | [...] | ||
877 | .mode_set_base_atomic = intel_pipe_set_base_atomic, | ||
878 | [...] | ||
879 | }; | ||
880 | </programlisting> | ||
881 | </informalexample> | ||
882 | </para> | ||
883 | <para>Here is an example of how the i915 driver initializes the fb_debug_enter and fb_debug_leave functions to use the generic drm helpers in | ||
884 | drivers/gpu/drm/i915/intel_fb.c: | ||
885 | <informalexample> | ||
886 | <programlisting> | ||
887 | static struct fb_ops intelfb_ops = { | ||
888 | [...] | ||
889 | .fb_debug_enter = drm_fb_helper_debug_enter, | ||
890 | .fb_debug_leave = drm_fb_helper_debug_leave, | ||
891 | [...] | ||
892 | }; | ||
893 | </programlisting> | ||
894 | </informalexample> | ||
895 | </para> | ||
896 | </sect2> | ||
897 | </sect1> | ||
898 | </chapter> | ||
899 | <chapter id="credits"> | ||
900 | <title>Credits</title> | ||
901 | <para> | ||
902 | The following people have contributed to this document: | ||
903 | <orderedlist> | ||
904 | <listitem><para>Amit Kale<email>amitkale@linsyssoft.com</email></para></listitem> | ||
905 | <listitem><para>Tom Rini<email>trini@kernel.crashing.org</email></para></listitem> | ||
906 | </orderedlist> | ||
907 | In March 2008 this document was completely rewritten by: | ||
908 | <itemizedlist> | ||
909 | <listitem><para>Jason Wessel<email>jason.wessel@windriver.com</email></para></listitem> | ||
910 | </itemizedlist> | ||
911 | In Jan 2010 this document was updated to include kdb. | ||
912 | <itemizedlist> | ||
913 | <listitem><para>Jason Wessel<email>jason.wessel@windriver.com</email></para></listitem> | ||
914 | </itemizedlist> | ||
915 | </para> | ||
916 | </chapter> | ||
917 | </book> | ||
918 | |||
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl deleted file mode 100644 index 0320910b866d..000000000000 --- a/Documentation/DocBook/libata.tmpl +++ /dev/null | |||
@@ -1,1625 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="libataDevGuide"> | ||
6 | <bookinfo> | ||
7 | <title>libATA Developer's Guide</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Jeff</firstname> | ||
12 | <surname>Garzik</surname> | ||
13 | </author> | ||
14 | </authorgroup> | ||
15 | |||
16 | <copyright> | ||
17 | <year>2003-2006</year> | ||
18 | <holder>Jeff Garzik</holder> | ||
19 | </copyright> | ||
20 | |||
21 | <legalnotice> | ||
22 | <para> | ||
23 | The contents of this file are subject to the Open | ||
24 | Software License version 1.1 that can be found at | ||
25 | <ulink url="http://fedoraproject.org/wiki/Licensing:OSL1.1">http://fedoraproject.org/wiki/Licensing:OSL1.1</ulink> | ||
26 | and is included herein by reference. | ||
27 | </para> | ||
28 | |||
29 | <para> | ||
30 | Alternatively, the contents of this file may be used under the terms | ||
31 | of the GNU General Public License version 2 (the "GPL") as distributed | ||
32 | in the kernel source COPYING file, in which case the provisions of | ||
33 | the GPL are applicable instead of the above. If you wish to allow | ||
34 | the use of your version of this file only under the terms of the | ||
35 | GPL and not to allow others to use your version of this file under | ||
36 | the OSL, indicate your decision by deleting the provisions above and | ||
37 | replace them with the notice and other provisions required by the GPL. | ||
38 | If you do not delete the provisions above, a recipient may use your | ||
39 | version of this file under either the OSL or the GPL. | ||
40 | </para> | ||
41 | |||
42 | </legalnotice> | ||
43 | </bookinfo> | ||
44 | |||
45 | <toc></toc> | ||
46 | |||
47 | <chapter id="libataIntroduction"> | ||
48 | <title>Introduction</title> | ||
49 | <para> | ||
50 | libATA is a library used inside the Linux kernel to support ATA host | ||
51 | controllers and devices. libATA provides an ATA driver API, class | ||
52 | transports for ATA and ATAPI devices, and SCSI<->ATA translation | ||
53 | for ATA devices according to the T10 SAT specification. | ||
54 | </para> | ||
55 | <para> | ||
56 | This Guide documents the libATA driver API, library functions, library | ||
57 | internals, and a couple sample ATA low-level drivers. | ||
58 | </para> | ||
59 | </chapter> | ||
60 | |||
61 | <chapter id="libataDriverApi"> | ||
62 | <title>libata Driver API</title> | ||
63 | <para> | ||
64 | struct ata_port_operations is defined for every low-level libata | ||
65 | hardware driver, and it controls how the low-level driver | ||
66 | interfaces with the ATA and SCSI layers. | ||
67 | </para> | ||
68 | <para> | ||
69 | FIS-based drivers will hook into the system with ->qc_prep() and | ||
70 | ->qc_issue() high-level hooks. Hardware which behaves in a manner | ||
71 | similar to PCI IDE hardware may utilize several generic helpers, | ||
72 | defining at a bare minimum the bus I/O addresses of the ATA shadow | ||
73 | register blocks. | ||
74 | </para> | ||
75 | <sect1> | ||
76 | <title>struct ata_port_operations</title> | ||
77 | |||
78 | <sect2><title>Disable ATA port</title> | ||
79 | <programlisting> | ||
80 | void (*port_disable) (struct ata_port *); | ||
81 | </programlisting> | ||
82 | |||
83 | <para> | ||
84 | Called from ata_bus_probe() error path, as well as when | ||
85 | unregistering from the SCSI module (rmmod, hot unplug). | ||
86 | This function should do whatever needs to be done to take the | ||
87 | port out of use. In most cases, ata_port_disable() can be used | ||
88 | as this hook. | ||
89 | </para> | ||
90 | <para> | ||
91 | Called from ata_bus_probe() on a failed probe. | ||
92 | Called from ata_scsi_release(). | ||
93 | </para> | ||
94 | |||
95 | </sect2> | ||
96 | |||
97 | <sect2><title>Post-IDENTIFY device configuration</title> | ||
98 | <programlisting> | ||
99 | void (*dev_config) (struct ata_port *, struct ata_device *); | ||
100 | </programlisting> | ||
101 | |||
102 | <para> | ||
103 | Called after IDENTIFY [PACKET] DEVICE is issued to each device | ||
104 | found. Typically used to apply device-specific fixups prior to | ||
105 | issue of SET FEATURES - XFER MODE, and prior to operation. | ||
106 | </para> | ||
107 | <para> | ||
108 | This entry may be specified as NULL in ata_port_operations. | ||
109 | </para> | ||
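A minimal sketch of how an optional hook like ->dev_config() is typically used: the caller checks for NULL, and a driver that needs a fixup supplies one. The trimmed structures, the one-argument hook signature and the drive model string are all hypothetical; the real hook takes the libata port and device structures shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-ins for the libata structures. */
struct hyp_device {
	char model[41];
	int max_udma;
};

struct hyp_port_ops {
	void (*dev_config)(struct hyp_device *dev);
};

/* Example fixup: cap UDMA for one made-up model before the transfer
 * mode is programmed. */
void quirky_dev_config(struct hyp_device *dev)
{
	if (strcmp(dev->model, "EXAMPLE DRIVE 1000") == 0 && dev->max_udma > 2)
		dev->max_udma = 2;
}

/* Caller side: the hook is optional, so a NULL entry is simply skipped. */
void post_identify(const struct hyp_port_ops *ops, struct hyp_device *dev)
{
	assert(ops && dev);
	if (ops->dev_config)
		ops->dev_config(dev);
}
```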
110 | |||
111 | </sect2> | ||
112 | |||
113 | <sect2><title>Set PIO/DMA mode</title> | ||
114 | <programlisting> | ||
115 | void (*set_piomode) (struct ata_port *, struct ata_device *); | ||
116 | void (*set_dmamode) (struct ata_port *, struct ata_device *); | ||
117 | void (*post_set_mode) (struct ata_port *); | ||
118 | unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int); | ||
119 | </programlisting> | ||
120 | |||
121 | <para> | ||
122 | Hooks called prior to the issue of SET FEATURES - XFER MODE | ||
123 | command. The optional ->mode_filter() hook is called when libata | ||
124 | has built a mask of the possible modes. This mask is passed to the | ||
125 | ->mode_filter() function, which should return a mask of valid modes | ||
126 | after filtering those unsuitable due to hardware limits. It is not | ||
127 | valid to use this interface to add modes. | ||
128 | </para> | ||
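Below is a minimal sketch of a ->mode_filter()-style callback. It assumes a simplified mask in which bit n means UDMA mode n. libata's real mask also carries PIO and MWDMA bits at other offsets. The port structure recording the cable type is hypothetical too. The property taken from the text is that the result is ANDed with the input, so the filter can only remove modes, never add them.

```c
#include <assert.h>

/* Hypothetical layout for this sketch: bit n set means UDMA mode n is allowed. */
#define UDMA_MASK_UPTO(n) ((1u << ((n) + 1)) - 1u)

/* Hypothetical per-port state a driver might consult. */
struct hyp_port {
	int cable_80wire;	/* nonzero if an 80-wire cable was detected */
	int max_udma;		/* controller limit, e.g. 5 for UDMA/100 */
};

/* ->mode_filter()-style function: only clears bits from the mask it is given. */
static unsigned int hyp_mode_filter(const struct hyp_port *ap, unsigned int mask)
{
	unsigned int allowed = UDMA_MASK_UPTO(ap->max_udma);

	/* 40-wire cables are limited to UDMA/33, i.e. UDMA mode 2 */
	if (!ap->cable_80wire)
		allowed &= UDMA_MASK_UPTO(2);

	return mask & allowed;	/* ANDing guarantees no mode is added */
}
```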
129 | <para> | ||
130 | dev->pio_mode and dev->dma_mode are guaranteed to be valid when | ||
131 | ->set_piomode() and ->set_dmamode() are called. The timings for | ||
132 | any other drive sharing the cable will also be valid at this point. | ||
133 | That is, the library records the decisions for the modes of each | ||
134 | drive on a channel before it attempts to set any of them. | ||
135 | </para> | ||
136 | <para> | ||
137 | ->post_set_mode() is | ||
138 | called unconditionally, after the SET FEATURES - XFER MODE | ||
139 | command completes successfully. | ||
140 | </para> | ||
141 | |||
142 | <para> | ||
143 | ->set_piomode() is always called (if present), but | ||
144 | ->set_dmamode() is only called if DMA is possible. | ||
145 | </para> | ||
146 | |||
147 | </sect2> | ||
148 | |||
149 | <sect2><title>Taskfile read/write</title> | ||
150 | <programlisting> | ||
151 | void (*sff_tf_load) (struct ata_port *ap, struct ata_taskfile *tf); | ||
152 | void (*sff_tf_read) (struct ata_port *ap, struct ata_taskfile *tf); | ||
153 | </programlisting> | ||
154 | |||
155 | <para> | ||
156 | ->sff_tf_load() is called to load the given taskfile into hardware | ||
157 | registers / DMA buffers. ->sff_tf_read() is called to read the | ||
158 | hardware registers / DMA buffers, to obtain the current set of | ||
159 | taskfile register values. | ||
160 | Most drivers for taskfile-based hardware (PIO or MMIO) use | ||
161 | ata_sff_tf_load() and ata_sff_tf_read() for these hooks. | ||
162 | </para> | ||
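The sketch below shows the split between loading the taskfile and starting the command using a simulated register block. It uses the legacy command-block register order, with Data at offset 0 through Command at offset 7. struct hyp_tf is a hypothetical, trimmed-down taskfile. The real struct ata_taskfile also carries HOB registers, flags and the protocol. On real hardware, reads of offsets 1 and 7 return the Error and Status registers rather than what was written. This simulation ignores that.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy IDE command block register offsets. */
enum { REG_DATA, REG_FEATURE, REG_NSECT, REG_LBAL, REG_LBAM, REG_LBAH,
       REG_DEVICE, REG_COMMAND };

/* Hypothetical, simplified taskfile. */
struct hyp_tf {
	uint8_t feature, nsect, lbal, lbam, lbah, device, command;
};

/* Simulated shadow registers standing in for ioread8()/iowrite8() targets. */
struct hyp_regs { uint8_t r[8]; };

/* Like ->sff_tf_load(): write every register except Command. */
static void hyp_tf_load(struct hyp_regs *io, const struct hyp_tf *tf)
{
	io->r[REG_FEATURE] = tf->feature;
	io->r[REG_NSECT] = tf->nsect;
	io->r[REG_LBAL] = tf->lbal;
	io->r[REG_LBAM] = tf->lbam;
	io->r[REG_LBAH] = tf->lbah;
	io->r[REG_DEVICE] = tf->device;
}

/* Like ->sff_exec_command(): writing Command starts execution, so it comes last. */
static void hyp_exec_command(struct hyp_regs *io, const struct hyp_tf *tf)
{
	io->r[REG_COMMAND] = tf->command;
}

/* Like ->sff_tf_read(): snapshot the current register values. */
static void hyp_tf_read(const struct hyp_regs *io, struct hyp_tf *tf)
{
	tf->feature = io->r[REG_FEATURE];	/* Error register on real hardware */
	tf->nsect = io->r[REG_NSECT];
	tf->lbal = io->r[REG_LBAL];
	tf->lbam = io->r[REG_LBAM];
	tf->lbah = io->r[REG_LBAH];
	tf->device = io->r[REG_DEVICE];
	tf->command = io->r[REG_COMMAND];	/* Status register on real hardware */
}
```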
163 | |||
164 | </sect2> | ||
165 | |||
166 | <sect2><title>PIO data read/write</title> | ||
167 | <programlisting> | ||
168 | void (*sff_data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); | ||
169 | </programlisting> | ||
170 | |||
171 | <para> | ||
172 | All bmdma-style drivers must implement this hook. This is the low-level | ||
173 | operation that actually copies the data bytes during a PIO data | ||
174 | transfer. | ||
175 | Typically the driver will choose one of ata_sff_data_xfer_noirq(), | ||
176 | ata_sff_data_xfer(), or ata_sff_data_xfer32(). | ||
177 | </para> | ||
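The only subtle part of a 16-bit PIO copy is the trailing byte. The sketch below moves whole words first, then pads the last byte of an odd-length buffer into a full word, in the spirit of ata_sff_data_xfer(). The data port is simulated by a FIFO, all hyp_* names are hypothetical, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated 16-bit data port backed by a FIFO of words (hypothetical). */
struct hyp_data_port {
	uint16_t fifo[256];
	unsigned int head, tail;
};

static void port_write16(struct hyp_data_port *p, uint16_t w)
{
	p->fifo[p->tail++] = w;
}

static uint16_t port_read16(struct hyp_data_port *p)
{
	return p->fifo[p->head++];
}

/*
 * Copy buflen bytes in 16-bit units: whole words first, then an odd
 * trailing byte through a zero-padded word. rw != 0 writes to the device.
 * Returns the number of bytes moved on the bus (always even).
 */
static unsigned int hyp_data_xfer(struct hyp_data_port *p, unsigned char *buf,
				  unsigned int buflen, int rw)
{
	unsigned int words = buflen >> 1;
	unsigned int i;
	uint16_t w;

	for (i = 0; i < words; i++) {
		if (rw) {
			memcpy(&w, buf + 2 * i, sizeof(w));
			port_write16(p, w);
		} else {
			w = port_read16(p);
			memcpy(buf + 2 * i, &w, sizeof(w));
		}
	}

	if (buflen & 1) {		/* trailing odd byte */
		unsigned char pad[2] = { 0, 0 };

		if (rw) {
			pad[0] = buf[buflen - 1];
			memcpy(&w, pad, sizeof(w));
			port_write16(p, w);
		} else {
			w = port_read16(p);
			memcpy(pad, &w, sizeof(w));
			buf[buflen - 1] = pad[0];
		}
		words++;
	}
	return words << 1;
}
```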
178 | |||
179 | </sect2> | ||
180 | |||
181 | <sect2><title>ATA command execute</title> | ||
182 | <programlisting> | ||
183 | void (*sff_exec_command)(struct ata_port *ap, struct ata_taskfile *tf); | ||
184 | </programlisting> | ||
185 | |||
186 | <para> | ||
187 | Causes an ATA command, previously loaded with | ||
188 | ->sff_tf_load(), to be initiated in hardware. | ||
189 | Most drivers for taskfile-based hardware use ata_sff_exec_command() | ||
190 | for this hook. | ||
191 | </para> | ||
192 | |||
193 | </sect2> | ||
194 | |||
195 | <sect2><title>Per-cmd ATAPI DMA capabilities filter</title> | ||
196 | <programlisting> | ||
197 | int (*check_atapi_dma) (struct ata_queued_cmd *qc); | ||
198 | </programlisting> | ||
199 | |||
200 | <para> | ||
201 | Allows the low-level driver to filter ATA PACKET commands, returning a status | ||
202 | indicating whether or not it is OK to use DMA for the supplied PACKET | ||
203 | command. | ||
204 | </para> | ||
205 | <para> | ||
206 | This hook may be specified as NULL, in which case libata will | ||
207 | assume that ATAPI DMA can be supported. | ||
208 | </para> | ||
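The sketch below illustrates the return convention. In libata, a return of 0 lets DMA proceed, and a nonzero return makes libata fall back to PIO for that command. The controller limitation used here, DMA only for READ(10) and WRITE(10), is invented for illustration. struct hyp_qc is a hypothetical stand-in for struct ata_queued_cmd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a queued ATAPI command. */
struct hyp_qc {
	uint8_t cdb[16];	/* SCSI CDB carried by the PACKET command */
};

/*
 * ->check_atapi_dma()-style filter for an imaginary controller that can
 * only DMA READ(10)/WRITE(10). Return 0 to allow DMA, nonzero for PIO.
 */
static int hyp_check_atapi_dma(const struct hyp_qc *qc)
{
	switch (qc->cdb[0]) {
	case 0x28:	/* READ(10) */
	case 0x2a:	/* WRITE(10) */
		return 0;
	default:
		return 1;
	}
}
```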
209 | |||
210 | </sect2> | ||
211 | |||
212 | <sect2><title>Read specific ATA shadow registers</title> | ||
213 | <programlisting> | ||
214 | u8 (*sff_check_status)(struct ata_port *ap); | ||
215 | u8 (*sff_check_altstatus)(struct ata_port *ap); | ||
216 | </programlisting> | ||
217 | |||
218 | <para> | ||
219 | Reads the Status/AltStatus ATA shadow register from | ||
220 | hardware. On some hardware, reading the Status register has | ||
221 | the side effect of clearing the interrupt condition. | ||
222 | Most drivers for taskfile-based hardware use | ||
223 | ata_sff_check_status() for this hook. | ||
224 | </para> | ||
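A simulated device makes the difference between the two hooks visible. In this model, reading Status acknowledges a pending interrupt, while reading AltStatus does not, which is why AltStatus is the one to poll. The hyp_* names and the device model are hypothetical. On real hardware, both are byte reads from the shadow register block.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_ATA_DRDY 0x40	/* DRDY bit of the Status register */

/* Simulated device: INTRQ stays asserted until Status is read. */
struct hyp_dev {
	uint8_t status;
	int intrq_pending;
};

/* Like ->sff_check_status(): reading Status clears the interrupt condition. */
static uint8_t hyp_check_status(struct hyp_dev *d)
{
	d->intrq_pending = 0;
	return d->status;
}

/* Like ->sff_check_altstatus(): same bits, no side effect. */
static uint8_t hyp_check_altstatus(const struct hyp_dev *d)
{
	return d->status;
}
```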
225 | |||
226 | </sect2> | ||
227 | |||
228 | <sect2><title>Write specific ATA shadow register</title> | ||
229 | <programlisting> | ||
230 | void (*sff_set_devctl)(struct ata_port *ap, u8 ctl); | ||
231 | </programlisting> | ||
232 | |||
233 | <para> | ||
234 | Write the device control ATA shadow register to the hardware. | ||
235 | Most drivers don't need to define this. | ||
236 | </para> | ||
237 | |||
238 | </sect2> | ||
239 | |||
240 | <sect2><title>Select ATA device on bus</title> | ||
241 | <programlisting> | ||
242 | void (*sff_dev_select)(struct ata_port *ap, unsigned int device); | ||
243 | </programlisting> | ||
244 | |||
245 | <para> | ||
246 | Issues the low-level hardware command(s) that cause one of N | ||
247 | hardware devices to be considered 'selected' (active and | ||
248 | available for use) on the ATA bus. This generally has no | ||
249 | meaning on FIS-based devices. | ||
250 | </para> | ||
251 | <para> | ||
252 | Most drivers for taskfile-based hardware use | ||
253 | ata_sff_dev_select() for this hook. | ||
254 | </para> | ||
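For taskfile-style hardware, selecting a device amounts to writing the Device register, with bit 4 chosen by the device number. The sketch computes that value. The constants 0xA0 and 0x10 match ATA_DEVICE_OBS and ATA_DEV1 in include/linux/ata.h, while the function name is hypothetical. libata's helper also pauses briefly after the write, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror ATA_DEVICE_OBS and ATA_DEV1 in include/linux/ata.h. */
#define HYP_DEVICE_OBS 0xa0	/* obsolete bits 7 and 5, traditionally set */
#define HYP_DEV1 0x10		/* bit 4 selects device 1 (slave) */

/* Device register value that selects device 0 or 1 on the channel. */
static uint8_t hyp_dev_select_value(unsigned int device)
{
	return device ? (HYP_DEVICE_OBS | HYP_DEV1) : HYP_DEVICE_OBS;
}
```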
255 | |||
256 | </sect2> | ||
257 | |||
258 | <sect2><title>Private tuning method</title> | ||
259 | <programlisting> | ||
260 | void (*set_mode) (struct ata_port *ap); | ||
261 | </programlisting> | ||
262 | |||
263 | <para> | ||
264 | By default libata performs drive and controller tuning in | ||
265 | accordance with the ATA timing rules and also applies blacklists | ||
266 | and cable limits. Some controllers need special handling and have | ||
267 | custom tuning rules, typically RAID controllers that use ATA | ||
268 | commands but do not actually do drive timing. | ||
269 | </para> | ||
270 | |||
271 | <warning> | ||
272 | <para> | ||
273 | This hook should not be used to replace the standard controller | ||
274 | tuning logic when a controller has quirks. Replacing the default | ||
275 | tuning logic in that case would bypass handling for drive and | ||
276 | bridge quirks that may be important to data reliability. If a | ||
277 | controller needs to filter the mode selection, it should use the | ||
278 | mode_filter hook instead. | ||
279 | </para> | ||
280 | </warning> | ||
281 | |||
282 | </sect2> | ||
283 | |||
284 | <sect2><title>Control PCI IDE BMDMA engine</title> | ||
285 | <programlisting> | ||
286 | void (*bmdma_setup) (struct ata_queued_cmd *qc); | ||
287 | void (*bmdma_start) (struct ata_queued_cmd *qc); | ||
288 | void (*bmdma_stop) (struct ata_port *ap); | ||
289 | u8 (*bmdma_status) (struct ata_port *ap); | ||
290 | </programlisting> | ||
291 | |||
292 | <para> | ||
293 | When setting up an IDE BMDMA transaction, these hooks arm | ||
294 | (->bmdma_setup), fire (->bmdma_start), and halt (->bmdma_stop) | ||
295 | the hardware's DMA engine. ->bmdma_status is used to read the standard | ||
296 | PCI IDE DMA Status register. | ||
297 | </para> | ||
298 | |||
299 | <para> | ||
300 | These hooks are typically either no-ops, or simply not implemented, in | ||
301 | FIS-based drivers. | ||
302 | </para> | ||
303 | <para> | ||
304 | Most legacy IDE drivers use ata_bmdma_setup() for the bmdma_setup() | ||
305 | hook. ata_bmdma_setup() will write the pointer to the PRD table to | ||
306 | the IDE PRD Table Address register, set the transfer direction in the DMA Command | ||
307 | register, and call ->sff_exec_command() to issue the ATA command. | ||
308 | </para> | ||
309 | <para> | ||
310 | Most legacy IDE drivers use ata_bmdma_start() for the bmdma_start() | ||
311 | hook. ata_bmdma_start() will write the ATA_DMA_START flag to the DMA | ||
312 | Command register. | ||
313 | </para> | ||
314 | <para> | ||
315 | Many legacy IDE drivers use ata_bmdma_stop() for the bmdma_stop() | ||
316 | hook. ata_bmdma_stop() clears the ATA_DMA_START flag in the DMA | ||
317 | Command register. | ||
318 | </para> | ||
319 | <para> | ||
320 | Many legacy IDE drivers use ata_bmdma_status() as the bmdma_status() hook. | ||
321 | </para> | ||
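The arm/fire/halt split can be made concrete by modelling the bus-master DMA Command register as a plain byte. The bit values match ATA_DMA_START (bit 0) and ATA_DMA_WR (bit 3). Setting WR means the engine writes to memory, which corresponds to a read from the device. The hyp_* names and the struct are hypothetical, and ->sff_exec_command() is reduced to a flag. Setup leaves START clear, so the engine runs only after start sets it.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_DMA_START 0x01	/* as ATA_DMA_START */
#define HYP_DMA_WR    0x08	/* as ATA_DMA_WR: engine writes to memory */

struct hyp_bmdma {
	uint32_t prd_addr;	/* PRD Table Address register */
	uint8_t cmd;		/* DMA Command register */
	int cmd_issued;		/* stands in for ->sff_exec_command() */
};

/* Arm: program the PRD table, set direction with START clear, issue the command. */
static void hyp_bmdma_setup(struct hyp_bmdma *b, uint32_t prd, int is_write)
{
	b->prd_addr = prd;
	b->cmd &= (uint8_t)~(HYP_DMA_WR | HYP_DMA_START);
	if (!is_write)
		b->cmd |= HYP_DMA_WR;
	b->cmd_issued = 1;
}

/* Fire: set START. */
static void hyp_bmdma_start(struct hyp_bmdma *b)
{
	b->cmd |= HYP_DMA_START;
}

/* Halt: clear START, leaving the direction bit alone. */
static void hyp_bmdma_stop(struct hyp_bmdma *b)
{
	b->cmd &= (uint8_t)~HYP_DMA_START;
}
```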
322 | |||
323 | </sect2> | ||
324 | |||
325 | <sect2><title>High-level taskfile hooks</title> | ||
326 | <programlisting> | ||
327 | void (*qc_prep) (struct ata_queued_cmd *qc); | ||
328 | int (*qc_issue) (struct ata_queued_cmd *qc); | ||
329 | </programlisting> | ||
330 | |||
331 | <para> | ||
332 | These two higher-level hooks can potentially supersede | ||
333 | several of the above taskfile/DMA engine hooks. ->qc_prep is | ||
334 | called after the buffers have been DMA-mapped, and is typically | ||
335 | used to populate the hardware's DMA scatter-gather table. | ||
336 | Most drivers use the standard ata_qc_prep() helper function, but | ||
337 | more advanced drivers roll their own. | ||
338 | </para> | ||
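For a legacy IDE BMDMA controller, the S/G table that ->qc_prep fills is the PRD table. Each entry holds a 32-bit bus address and a 32-bit word. The low 16 bits of that word are the byte count (0 meaning 64 KiB), and bit 31 marks the last entry. An entry must not cross a 64 KiB boundary, so a segment that straddles one is split. The sketch below uses hypothetical structs in place of the kernel's scatterlist and PRD types, and it ignores endianness conversion.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_PRD_EOT (1u << 31)	/* end-of-table flag */

struct hyp_sg { uint32_t addr; uint32_t len; };		/* hypothetical S/G entry */
struct hyp_prd { uint32_t addr; uint32_t flags_len; };	/* hypothetical PRD entry */

/*
 * Fill a PRD table from S/G segments, splitting at 64 KiB boundaries.
 * Returns the number of entries used, or -1 if the table is too small.
 */
static int hyp_fill_prd(const struct hyp_sg *sg, int nseg,
			struct hyp_prd *prd, int max)
{
	int n = 0;

	for (int i = 0; i < nseg; i++) {
		uint32_t addr = sg[i].addr, len = sg[i].len;

		while (len) {
			uint32_t offset = addr & 0xffff;
			uint32_t chunk = len;

			if (offset + len > 0x10000)	/* would cross 64 KiB */
				chunk = 0x10000 - offset;
			if (n == max)
				return -1;
			prd[n].addr = addr;
			prd[n].flags_len = chunk & 0xffff;	/* 0 encodes 64 KiB */
			n++;
			addr += chunk;
			len -= chunk;
		}
	}
	if (n > 0)
		prd[n - 1].flags_len |= HYP_PRD_EOT;
	return n;
}
```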
339 | <para> | ||
340 | ->qc_issue is used to make a command active, once the hardware | ||
341 | and S/G tables have been prepared. IDE BMDMA drivers use the | ||
342 | helper function ata_qc_issue_prot() for taskfile protocol-based | ||
343 | dispatch. More advanced drivers implement their own ->qc_issue. | ||
344 | </para> | ||
345 | <para> | ||
346 | ata_qc_issue_prot() calls ->sff_tf_load(), ->bmdma_setup(), and | ||
347 | ->bmdma_start() as necessary to initiate a transfer. | ||
348 | </para> | ||
349 | |||
350 | </sect2> | ||
351 | |||
352 | <sect2><title>Exception and probe handling (EH)</title> | ||
353 | <programlisting> | ||
354 | void (*eng_timeout) (struct ata_port *ap); | ||
355 | void (*phy_reset) (struct ata_port *ap); | ||
356 | </programlisting> | ||
357 | |||
358 | <para> | ||
359 | Deprecated. Use ->error_handler() instead. | ||
360 | </para> | ||
361 | |||
362 | <programlisting> | ||
363 | void (*freeze) (struct ata_port *ap); | ||
364 | void (*thaw) (struct ata_port *ap); | ||
365 | </programlisting> | ||
366 | |||
367 | <para> | ||
368 | ata_port_freeze() is called when HSM violations or some other | ||
369 | condition disrupts normal operation of the port. A frozen port | ||
370 | is not allowed to perform any operation until the port is | ||
371 | thawed, which usually follows a successful reset. | ||
372 | </para> | ||
373 | |||
374 | <para> | ||
375 | The optional ->freeze() callback can be used for freezing the port | ||
376 | hardware-wise (e.g. mask interrupt and stop DMA engine). If a | ||
377 | port cannot be frozen hardware-wise, the interrupt handler | ||
378 | must ack and clear interrupts unconditionally while the port | ||
379 | is frozen. | ||
380 | </para> | ||
381 | <para> | ||
382 | The optional ->thaw() callback is called to perform the opposite of ->freeze(): | ||
383 | prepare the port for normal operation once again, e.g. unmask interrupts | ||
384 | and start the DMA engine. | ||
385 | </para> | ||
386 | |||
387 | <programlisting> | ||
388 | void (*error_handler) (struct ata_port *ap); | ||
389 | </programlisting> | ||
390 | |||
391 | <para> | ||
392 | ->error_handler() is a driver's hook into probe, hotplug, recovery, | ||
393 | and other exceptional conditions. The primary responsibility of an | ||
394 | implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set | ||
395 | of EH hooks as arguments: | ||
396 | </para> | ||
397 | |||
398 | <para> | ||
399 | The 'prereset' hook (may be NULL) is called during an EH reset, before any other | ||
400 | actions are taken. | ||
401 | </para> | ||
402 | |||
403 | <para> | ||
404 | Based on existing conditions, severity of the problem, and hardware capabilities, | ||
405 | either 'softreset' (may be NULL) or 'hardreset' (may be NULL) is then called | ||
406 | to perform the low-level EH reset. | ||
407 | </para> | ||
408 | |||
409 | <para> | ||
410 | The 'postreset' hook (may be NULL) is called after the EH reset is performed. | ||
411 | </para> | ||
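The hook ordering described above can be sketched as follows: prereset, then one low-level reset, then postreset. The hyp_* types, the trace-string signature, and the rule for choosing hardreset over softreset are assumptions made for illustration. libata's actual reset path weighs more factors and uses different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature: each hook appends a tag to a trace string. */
typedef int (*hyp_reset_fn)(char *trace);

/* Hypothetical bundle of EH reset hooks; any member may be NULL. */
struct hyp_reset_ops {
	hyp_reset_fn prereset;
	hyp_reset_fn softreset;
	hyp_reset_fn hardreset;
	hyp_reset_fn postreset;
};

/*
 * prereset, then exactly one low-level reset, then postreset.
 * want_hard picks hardreset when available; falling back to whichever
 * reset exists is an assumption of this sketch.
 */
static int hyp_do_reset(const struct hyp_reset_ops *ops, int want_hard, char *trace)
{
	hyp_reset_fn reset;
	int rc;

	if (ops->prereset) {
		rc = ops->prereset(trace);
		if (rc)
			return rc;	/* abort the sequence */
	}

	reset = (want_hard && ops->hardreset) ? ops->hardreset : ops->softreset;
	if (!reset)
		reset = ops->hardreset;
	if (reset) {
		rc = reset(trace);
		if (rc)
			return rc;
	}

	if (ops->postreset)
		ops->postreset(trace);	/* return value ignored in this sketch */
	return 0;
}

/* Demo hooks that record the order in which they run. */
static int demo_pre(char *t)  { strcat(t, "pre,");  return 0; }
static int demo_soft(char *t) { strcat(t, "soft,"); return 0; }
static int demo_hard(char *t) { strcat(t, "hard,"); return 0; }
static int demo_post(char *t) { strcat(t, "post");  return 0; }
```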
412 | |||
413 | <programlisting> | ||
414 | void (*post_internal_cmd) (struct ata_queued_cmd *qc); | ||
415 | </programlisting> | ||
416 | |||
417 | <para> | ||
418 | Perform any hardware-specific actions necessary to finish processing | ||
419 | after executing a probe-time or EH-time command via ata_exec_internal(). | ||
420 | </para> | ||
421 | |||
422 | </sect2> | ||
423 | |||
424 | <sect2><title>Hardware interrupt handling</title> | ||
425 | <programlisting> | ||
426 | irqreturn_t (*irq_handler)(int, void *, struct pt_regs *); | ||
427 | void (*irq_clear) (struct ata_port *); | ||
428 | </programlisting> | ||
429 | |||
430 | <para> | ||
431 | ->irq_handler is the interrupt handling routine registered with | ||
432 | the system by libata. ->irq_clear is called during probe just | ||
433 | before the interrupt handler is registered, to be sure hardware | ||
434 | is quiet. | ||
435 | </para> | ||
436 | <para> | ||
437 | The second argument, dev_instance, should be cast to a pointer | ||
438 | to struct ata_host_set. | ||
439 | </para> | ||
440 | <para> | ||
441 | Most legacy IDE drivers use ata_sff_interrupt() for the | ||
442 | irq_handler hook, which scans all ports in the host_set, | ||
443 | determines which queued command was active (if any), and calls | ||
444 | ata_sff_host_intr(ap, qc). | ||
445 | </para> | ||
446 | <para> | ||
447 | Most legacy IDE drivers use ata_sff_irq_clear() for the | ||
448 | irq_clear() hook, which simply clears the interrupt and error | ||
449 | flags in the DMA status register. | ||
450 | </para> | ||
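Clearing the interrupt and error flags, as described above, works because the INTR and ERR bits of the bus-master DMA Status register are write-1-to-clear. Writing back the value just read therefore clears exactly the flags that were set. The sketch below uses a simulated register. The hyp_* names are hypothetical, and the bit values follow the standard BMDMA layout: ACTIVE is bit 0, ERR is bit 1, and INTR is bit 2.

```c
#include <assert.h>
#include <stdint.h>

#define HYP_DMA_ACTIVE 0x01
#define HYP_DMA_ERR    0x02	/* write 1 to clear */
#define HYP_DMA_INTR   0x04	/* write 1 to clear */

/* Simulated register write: only the W1C bits react; others are unchanged here. */
static void hyp_status_write(uint8_t *reg, uint8_t val)
{
	*reg = (uint8_t)(*reg & ~(val & (HYP_DMA_ERR | HYP_DMA_INTR)));
}

/* Read Status and write the same value back, clearing whichever flags are set. */
static void hyp_irq_clear(uint8_t *reg)
{
	hyp_status_write(reg, *reg);
}
```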
451 | |||
452 | </sect2> | ||
453 | |||
454 | <sect2><title>SATA phy read/write</title> | ||
455 | <programlisting> | ||
456 | int (*scr_read) (struct ata_port *ap, unsigned int sc_reg, | ||
457 | u32 *val); | ||
458 | int (*scr_write) (struct ata_port *ap, unsigned int sc_reg, | ||
459 | u32 val); | ||
460 | </programlisting> | ||
461 | |||
462 | <para> | ||
463 | Read and write standard SATA phy registers. Currently only used | ||
464 | if the ->phy_reset hook called the sata_phy_reset() helper function. | ||
465 | sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. | ||
466 | </para> | ||
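The value returned by ->scr_read(ap, SCR_STATUS, &val) is the SStatus register. The sketch below decodes its three fields. DET occupies bits 3:0, SPD bits 7:4, and IPM bits 11:8. A DET value of 3 means a device is present and PHY communication is established. The hyp_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded SStatus fields. */
struct hyp_sstatus {
	unsigned int det;	/* bits 3:0, device detection / PHY state */
	unsigned int spd;	/* bits 7:4, negotiated speed generation */
	unsigned int ipm;	/* bits 11:8, interface power management state */
};

static struct hyp_sstatus hyp_decode_sstatus(uint32_t val)
{
	struct hyp_sstatus s;

	s.det = val & 0xf;
	s.spd = (val >> 4) & 0xf;
	s.ipm = (val >> 8) & 0xf;
	return s;
}

/* DET == 3: device present and PHY communication established. */
static int hyp_link_online(uint32_t sstatus)
{
	return hyp_decode_sstatus(sstatus).det == 3;
}
```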
467 | |||
468 | </sect2> | ||
469 | |||
470 | <sect2><title>Init and shutdown</title> | ||
471 | <programlisting> | ||
472 | int (*port_start) (struct ata_port *ap); | ||
473 | void (*port_stop) (struct ata_port *ap); | ||
474 | void (*host_stop) (struct ata_host_set *host_set); | ||
475 | </programlisting> | ||
476 | |||
477 | <para> | ||
478 | ->port_start() is called just after the data structures for each | ||
479 | port are initialized. Typically this is used to allocate per-port | ||
480 | DMA buffers / tables / rings, enable DMA engines, and similar | ||
481 | tasks. Some drivers also use this entry point as a chance to | ||
482 | allocate driver-private memory for ap->private_data. | ||
483 | </para> | ||
484 | <para> | ||
485 | Many drivers use ata_port_start() as this hook or call | ||
486 | it from their own port_start() hooks. ata_port_start() | ||
487 | allocates space for a legacy IDE PRD table and returns. | ||
488 | </para> | ||
489 | <para> | ||
490 | ->port_stop() is called before ->host_stop(). Its sole function | ||
491 | is to release DMA/memory resources, now that they are no longer | ||
492 | actively being used. Many drivers also free driver-private | ||
493 | data from the port at this time. | ||
494 | </para> | ||
495 | <para> | ||
496 | ->host_stop() is called after all ->port_stop() calls | ||
497 | have completed. The hook must finalize hardware shutdown, release DMA | ||
498 | and other resources, etc. | ||
499 | This hook may be specified as NULL, in which case it is not called. | ||
500 | </para> | ||
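The shutdown order described in the ->host_stop() paragraph is every ->port_stop() first, then ->host_stop() once. It can be sketched with a hypothetical host structure that records the call order. Neither the structure nor its fields correspond to libata's struct ata_host.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical host with per-port and host-wide stop hooks (either may be NULL). */
struct hyp_host {
	int n_ports;
	void (*port_stop)(struct hyp_host *host, int port);
	void (*host_stop)(struct hyp_host *host);
	char trace[64];		/* records call order for the demo */
};

/* Every ->port_stop() first, then ->host_stop() once. */
static void hyp_host_teardown(struct hyp_host *host)
{
	for (int i = 0; i < host->n_ports; i++)
		if (host->port_stop)
			host->port_stop(host, i);
	if (host->host_stop)
		host->host_stop(host);
}

/* Demo hooks (single-digit port numbers only). */
static void demo_port_stop(struct hyp_host *h, int port)
{
	char tag[4] = { 'p', (char)('0' + port), ',', '\0' };

	strcat(h->trace, tag);
}

static void demo_host_stop(struct hyp_host *h)
{
	strcat(h->trace, "host");
}
```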
501 | |||
502 | </sect2> | ||
503 | |||
504 | </sect1> | ||
505 | </chapter> | ||
506 | |||
507 | <chapter id="libataEH"> | ||
508 | <title>Error handling</title> | ||
509 | |||
510 | <para> | ||
511 | This chapter describes how errors are handled under libata. | ||
512 | Readers are advised to read the SCSI EH document | ||
513 | (Documentation/scsi/scsi_eh.txt) and the ATA exceptions chapter first. | ||
514 | </para> | ||
515 | |||
516 | <sect1><title>Origins of commands</title> | ||
517 | <para> | ||
518 | In libata, a command is represented with struct ata_queued_cmd | ||
519 | or qc. qc's are preallocated during port initialization and | ||
520 | repeatedly used for command executions. Currently only one | ||
521 | qc is allocated per port, but the yet-to-be-merged NCQ branch | ||
522 | allocates one for each tag and maps each qc to an NCQ tag 1-to-1. | ||
523 | </para> | ||
524 | <para> | ||
525 | libata commands can originate from two sources: libata itself | ||
526 | and the SCSI midlayer. libata internal commands are used for | ||
527 | initialization and error handling. All normal blk requests | ||
528 | and commands for SCSI emulation are passed as SCSI commands | ||
529 | through the queuecommand callback of the SCSI host template. | ||
530 | </para> | ||
531 | </sect1> | ||
532 | |||
533 | <sect1><title>How commands are issued</title> | ||
534 | |||
535 | <variablelist> | ||
536 | |||
537 | <varlistentry><term>Internal commands</term> | ||
538 | <listitem> | ||
539 | <para> | ||
540 | First, a qc is allocated and initialized using | ||
541 | ata_qc_new_init(). Although ata_qc_new_init() doesn't | ||
542 | implement any wait or retry mechanism when no qc is | ||
543 | available, internal commands are currently issued only during | ||
544 | initialization and error recovery, so no other command is | ||
545 | active and allocation is guaranteed to succeed. | ||
546 | </para> | ||
547 | <para> | ||
548 | Once allocated, the qc's taskfile is initialized for the command to | ||
549 | be executed. qc currently has two mechanisms to notify | ||
550 | completion. One is via the qc->complete_fn() callback and the | ||
551 | other is the completion qc->waiting. The qc->complete_fn() callback | ||
552 | is the asynchronous path used by normal SCSI translated | ||
553 | commands and qc->waiting is the synchronous (issuer sleeps in | ||
554 | process context) path used by internal commands. | ||
555 | </para> | ||
556 | <para> | ||
557 | Once initialization is complete, the host_set lock is acquired | ||
558 | and the qc is issued. | ||
559 | </para> | ||
560 | </listitem> | ||
561 | </varlistentry> | ||
562 | |||
563 | <varlistentry><term>SCSI commands</term> | ||
564 | <listitem> | ||
565 | <para> | ||
566 | All libata drivers use ata_scsi_queuecmd() as the | ||
567 | hostt->queuecommand callback. scmds can either be simulated | ||
568 | or translated. No qc is involved in processing a simulated | ||
569 | scmd. The result is computed right away and the scmd is | ||
570 | completed. | ||
571 | </para> | ||
572 | <para> | ||
573 | For a translated scmd, ata_qc_new_init() is invoked to | ||
574 | allocate a qc and the scmd is translated into the qc. The SCSI | ||
575 | midlayer's completion notification function pointer is stored | ||
576 | into qc->scsidone. | ||
577 | </para> | ||
578 | <para> | ||
579 | qc->complete_fn() callback is used for completion | ||
580 | notification. ATA commands use ata_scsi_qc_complete() while | ||
581 | ATAPI commands use atapi_qc_complete(). Both functions end up | ||
582 | calling qc->scsidone to notify the upper layer when the qc is | ||
583 | finished. After translation is completed, the qc is issued | ||
584 | with ata_qc_issue(). | ||
585 | </para> | ||
586 | <para> | ||
587 | Note that the SCSI midlayer invokes hostt->queuecommand while | ||
588 | holding the host_set lock, so all of the above occurs while | ||
589 | the host_set lock is held. | ||
590 | </para> | ||
591 | </listitem> | ||
592 | </varlistentry> | ||
593 | |||
594 | </variablelist> | ||
595 | </sect1> | ||
596 | |||
597 | <sect1><title>How commands are processed</title> | ||
598 | <para> | ||
599 | Depending on which protocol and which controller are used, | ||
600 | commands are processed differently. For the purpose of | ||
601 | discussion, a controller which uses the taskfile interface and all | ||
602 | standard callbacks is assumed. | ||
603 | </para> | ||
604 | <para> | ||
605 | Currently 6 ATA command protocols are used. They can be | ||
606 | sorted into the following four categories according to how | ||
607 | they are processed. | ||
608 | </para> | ||
609 | |||
610 | <variablelist> | ||
611 | <varlistentry><term>ATA NO DATA or DMA</term> | ||
612 | <listitem> | ||
613 | <para> | ||
614 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. | ||
615 | These types of commands don't require any software | ||
616 | intervention once issued. The device will raise an interrupt on | ||
617 | completion. | ||
618 | </para> | ||
619 | </listitem> | ||
620 | </varlistentry> | ||
621 | |||
622 | <varlistentry><term>ATA PIO</term> | ||
623 | <listitem> | ||
624 | <para> | ||
625 | ATA_PROT_PIO is in this category. libata currently | ||
626 | implements PIO with polling. The ATA_NIEN bit is set to turn | ||
627 | off interrupts, and pio_task on ata_wq performs polling and | ||
628 | I/O. | ||
629 | </para> | ||
630 | </listitem> | ||
631 | </varlistentry> | ||
632 | |||
633 | <varlistentry><term>ATAPI NODATA or DMA</term> | ||
634 | <listitem> | ||
635 | <para> | ||
636 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this | ||
637 | category. packet_task is used to poll BSY bit after | ||
638 | issuing PACKET command. Once BSY is turned off by the | ||
639 | device, packet_task transfers the CDB and hands off processing | ||
640 | to the interrupt handler. | ||
641 | </para> | ||
642 | </listitem> | ||
643 | </varlistentry> | ||
644 | |||
645 | <varlistentry><term>ATAPI PIO</term> | ||
646 | <listitem> | ||
647 | <para> | ||
648 | ATA_PROT_ATAPI is in this category. The ATA_NIEN bit is set | ||
649 | and, as in ATAPI NODATA or DMA, packet_task submits the CDB. | ||
650 | However, after submitting the CDB, further processing (data | ||
651 | transfer) is handed off to pio_task. | ||
652 | </para> | ||
653 | </listitem> | ||
654 | </varlistentry> | ||
655 | </variablelist> | ||
656 | </sect1> | ||
657 | |||
658 | <sect1><title>How commands are completed</title> | ||
659 | <para> | ||
660 | Once issued, all qc's are either completed with | ||
661 | ata_qc_complete() or time out. For commands which are handled | ||
662 | by interrupts, ata_host_intr() invokes ata_qc_complete(), and, | ||
663 | for PIO tasks, pio_task invokes ata_qc_complete(). In error | ||
664 | cases, packet_task may also complete commands. | ||
665 | </para> | ||
666 | <para> | ||
667 | ata_qc_complete() does the following. | ||
668 | </para> | ||
669 | |||
670 | <orderedlist> | ||
671 | |||
672 | <listitem> | ||
673 | <para> | ||
674 | DMA memory is unmapped. | ||
675 | </para> | ||
676 | </listitem> | ||
677 | |||
678 | <listitem> | ||
679 | <para> | ||
680 | ATA_QCFLAG_ACTIVE is cleared from qc->flags. | ||
681 | </para> | ||
682 | </listitem> | ||
683 | |||
684 | <listitem> | ||
685 | <para> | ||
686 | The qc->complete_fn() callback is invoked. If the return value of | ||
687 | the callback is not zero, completion is short-circuited and | ||
688 | ata_qc_complete() returns. | ||
689 | </para> | ||
690 | </listitem> | ||
691 | |||
692 | <listitem> | ||
693 | <para> | ||
694 | __ata_qc_complete() is called, which does | ||
695 | <orderedlist> | ||
696 | |||
697 | <listitem> | ||
698 | <para> | ||
699 | qc->flags is cleared to zero. | ||
700 | </para> | ||
701 | </listitem> | ||
702 | |||
703 | <listitem> | ||
704 | <para> | ||
705 | ap->active_tag and qc->tag are poisoned. | ||
706 | </para> | ||
707 | </listitem> | ||
708 | |||
709 | <listitem> | ||
710 | <para> | ||
711 | qc->waiting is cleared & completed (in that order). | ||
712 | </para> | ||
713 | </listitem> | ||
714 | |||
715 | <listitem> | ||
716 | <para> | ||
717 | The qc is deallocated by clearing the appropriate bit in ap->qactive. | ||
718 | </para> | ||
719 | </listitem> | ||
720 | |||
721 | </orderedlist> | ||
722 | </para> | ||
723 | </listitem> | ||
724 | |||
725 | </orderedlist> | ||
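The four steps listed above can be modelled in a few lines, including the short circuit taken when qc->complete_fn() returns nonzero. All types, field names, and the poison value are hypothetical stand-ins, not libata's definitions.

```c
#include <assert.h>

#define HYP_QCFLAG_ACTIVE 0x1u
#define HYP_TAG_POISON 0xfafbfcfdu	/* hypothetical poison value */

struct hyp_port {
	unsigned int active_tag;
	unsigned long qactive;	/* bit n set: qc with tag n is allocated */
};

struct hyp_qc {
	struct hyp_port *ap;
	unsigned int tag;
	unsigned int flags;
	int dma_mapped;
	int waiting_done;	/* stands in for completing qc->waiting */
	int (*complete_fn)(struct hyp_qc *qc);
};

/* Step 4, like __ata_qc_complete(): reset and release the qc. */
static void hyp_qc_free(struct hyp_qc *qc)
{
	unsigned int tag = qc->tag;

	qc->flags = 0;
	qc->ap->active_tag = HYP_TAG_POISON;
	qc->tag = HYP_TAG_POISON;
	qc->waiting_done = 1;
	qc->ap->qactive &= ~(1ul << tag);
}

static void hyp_qc_complete(struct hyp_qc *qc)
{
	qc->dma_mapped = 0;			/* 1: unmap DMA memory */
	qc->flags &= ~HYP_QCFLAG_ACTIVE;	/* 2: clear the ACTIVE flag */
	if (qc->complete_fn && qc->complete_fn(qc))
		return;				/* 3: nonzero short-circuits */
	hyp_qc_free(qc);			/* 4: deallocate */
}

static int demo_done(struct hyp_qc *qc) { (void)qc; return 0; }
static int demo_hold(struct hyp_qc *qc) { (void)qc; return 1; }	/* like the ATAPI error path */
```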
726 | |||
727 | <para> | ||
728 | So, it basically notifies the upper layer and deallocates the qc. One | ||
729 | exception is the short-circuit path in #3, which is used by | ||
730 | atapi_qc_complete(). | ||
731 | </para> | ||
732 | <para> | ||
733 | For all non-ATAPI commands, whether they fail or not, almost | ||
734 | the same code path is taken and very little error handling | ||
735 | takes place. A qc is completed with success status if it | ||
736 | succeeded, with failed status otherwise. | ||
737 | </para> | ||
738 | <para> | ||
739 | However, failed ATAPI commands require more handling as | ||
740 | REQUEST SENSE is needed to acquire sense data. If an ATAPI | ||
741 | command fails, ata_qc_complete() is invoked with error status, | ||
742 | which in turn invokes atapi_qc_complete() via | ||
743 | qc->complete_fn() callback. | ||
744 | </para> | ||
745 | <para> | ||
746 | This makes atapi_qc_complete() set scmd->result to | ||
747 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As | ||
748 | the sense data is empty but scmd->result is CHECK CONDITION, | ||
749 | the SCSI midlayer will invoke EH for the scmd, and returning 1 | ||
750 | makes ata_qc_complete() return without deallocating the qc. | ||
751 | This leads us to ata_scsi_error() with partially completed qc. | ||
752 | </para> | ||
753 | |||
754 | </sect1> | ||
755 | |||
756 | <sect1><title>ata_scsi_error()</title> | ||
757 | <para> | ||
758 | ata_scsi_error() is the current transportt->eh_strategy_handler() | ||
759 | for libata. As discussed above, this will be entered in two | ||
760 | cases: timeout and ATAPI error completion. This function | ||
761 | calls the low-level libata driver's eng_timeout() callback, the | ||
762 | standard callback for which is ata_eng_timeout(). It checks | ||
763 | if a qc is active and calls ata_qc_timeout() on the qc if so. | ||
764 | Actual error handling occurs in ata_qc_timeout(). | ||
765 | </para> | ||
766 | <para> | ||
767 | If EH is invoked for timeout, ata_qc_timeout() stops BMDMA and | ||
768 | completes the qc. Note that as we're currently in EH, we | ||
769 | cannot call scsi_done. As described in SCSI EH doc, a | ||
770 | recovered scmd should be either retried with | ||
771 | scsi_queue_insert() or finished with scsi_finish_command(). | ||
772 | Here, we override qc->scsidone with scsi_finish_command() and | ||
773 | call ata_qc_complete(). | ||
774 | </para> | ||
775 | <para> | ||
776 | If EH is invoked due to a failed ATAPI qc, the qc here is | ||
777 | completed but not deallocated. The purpose of this | ||
778 | half-completion is to use the qc as a placeholder to make the EH | ||
779 | code reach this place. This is a bit hackish, but it works. | ||
780 | </para> | ||
781 | <para> | ||
782 | Once control reaches here, the qc is deallocated by invoking | ||
783 | __ata_qc_complete() explicitly. Then, internal qc for REQUEST | ||
784 | SENSE is issued. Once sense data is acquired, scmd is | ||
785 | finished by directly invoking scsi_finish_command() on the | ||
786 | scmd. Note that as we have already completed and deallocated | ||
787 | the qc which was associated with the scmd, we don't need | ||
788 | to/cannot call ata_qc_complete() again. | ||
789 | </para> | ||
790 | |||
791 | </sect1> | ||
792 | |||
793 | <sect1><title>Problems with the current EH</title> | ||
794 | |||
795 | <itemizedlist> | ||
796 | |||
797 | <listitem> | ||
798 | <para> | ||
799 | Error representation is too crude. Currently any and all | ||
800 | error conditions are represented with ATA STATUS and ERROR | ||
801 | registers. Errors which aren't ATA device errors are treated | ||
802 | as ATA device errors by setting the ATA_ERR bit. A better error | ||
803 | descriptor which can properly represent ATA and other | ||
804 | errors/exceptions is needed. | ||
805 | </para> | ||
806 | </listitem> | ||
807 | |||
808 | <listitem> | ||
809 | <para> | ||
810 | When handling timeouts, no action is taken to make the device | ||
811 | forget about the timed-out command and become ready for new commands. | ||
812 | </para> | ||
813 | </listitem> | ||
814 | |||
815 | <listitem> | ||
816 | <para> | ||
817 | EH handling via ata_scsi_error() is not properly protected | ||
818 | from usual command processing. On EH entrance, the device is | ||
819 | not in a quiescent state. Timed-out commands may succeed or | ||
820 | fail any time. pio_task and atapi_task may still be running. | ||
821 | </para> | ||
822 | </listitem> | ||
823 | |||
824 | <listitem> | ||
825 | <para> | ||
826 | Error recovery is too weak. Devices / controllers causing HSM | ||
827 | mismatch errors and other errors quite often require a reset to | ||
828 | return to a known state. Also, advanced error handling is | ||
829 | necessary to support features like NCQ and hotplug. | ||
830 | </para> | ||
831 | </listitem> | ||
832 | |||
833 | <listitem> | ||
834 | <para> | ||
835 | ATA errors are directly handled in the interrupt handler and | ||
836 | PIO errors in pio_task. This is problematic for advanced | ||
837 | error handling for the following reasons. | ||
838 | </para> | ||
839 | <para> | ||
840 | First, advanced error handling often requires process context and | ||
841 | internal qc execution. | ||
842 | </para> | ||
843 | <para> | ||
844 | Second, even a simple failure (say, CRC error) needs | ||
845 | information gathering and could trigger complex error handling | ||
846 | (say, resetting & reconfiguring). Having multiple code | ||
847 | paths to gather information, enter EH and trigger actions | ||
848 | makes life painful. | ||
849 | </para> | ||
850 | <para> | ||
851 | Third, scattered EH code makes implementing low-level drivers | ||
852 | difficult. Low-level drivers override libata callbacks. If | ||
853 | EH is scattered over several places, each affected callback | ||
854 | should perform its part of error handling. This can be error | ||
855 | prone and painful. | ||
856 | </para> | ||
857 | </listitem> | ||
858 | |||
859 | </itemizedlist> | ||
860 | </sect1> | ||
861 | </chapter> | ||
862 | |||
863 | <chapter id="libataExt"> | ||
864 | <title>libata Library</title> | ||
865 | !Edrivers/ata/libata-core.c | ||
866 | </chapter> | ||
867 | |||
868 | <chapter id="libataInt"> | ||
869 | <title>libata Core Internals</title> | ||
870 | !Idrivers/ata/libata-core.c | ||
871 | </chapter> | ||
872 | |||
873 | <chapter id="libataScsiInt"> | ||
874 | <title>libata SCSI translation/emulation</title> | ||
875 | !Edrivers/ata/libata-scsi.c | ||
876 | !Idrivers/ata/libata-scsi.c | ||
877 | </chapter> | ||
878 | |||
879 | <chapter id="ataExceptions"> | ||
880 | <title>ATA errors and exceptions</title> | ||
881 | |||
882 | <para> | ||
883 | This chapter tries to identify what error/exception conditions exist | ||
884 | for ATA/ATAPI devices and describe how they should be handled in | ||
885 | an implementation-neutral way. | ||
886 | </para> | ||
887 | |||
888 | <para> | ||
889 | The term 'error' is used to describe conditions where either an | ||
890 | explicit error condition is reported by the device or a command has | ||
891 | timed out. | ||
892 | </para> | ||
893 | |||
894 | <para> | ||
895 | The term 'exception' is either used to describe exceptional | ||
896 | conditions which are not errors (say, power or hotplug events), or | ||
897 | to describe both errors and non-error exceptional conditions. Where | ||
898 | explicit distinction between error and exception is necessary, the | ||
899 | term 'non-error exception' is used. | ||
900 | </para> | ||
901 | |||
902 | <sect1 id="excat"> | ||
903 | <title>Exception categories</title> | ||
904 | <para> | ||
905 | Exceptions are described primarily with respect to legacy | ||
906 | taskfile + bus master IDE interface. If a controller provides | ||
907 | a better mechanism for error reporting, mapping those into | ||
908 | categories described below shouldn't be difficult. | ||
909 | </para> | ||
910 | |||
911 | <para> | ||
912 | In the following sections, two recovery actions, resetting and | ||
913 | reconfiguring the transport, are mentioned. These are described | ||
914 | further in <xref linkend="exrec"/>. | ||
915 | </para> | ||
916 | |||
917 | <sect2 id="excatHSMviolation"> | ||
918 | <title>HSM violation</title> | ||
919 | <para> | ||
920 | This error is indicated when the STATUS value doesn't match HSM | ||
921 | requirements while issuing or executing any ATA/ATAPI command. | ||
922 | </para> | ||
923 | |||
924 | <itemizedlist> | ||
925 | <title>Examples</title> | ||
926 | |||
927 | <listitem> | ||
928 | <para> | ||
929 | ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying | ||
930 | to issue a command. | ||
931 | </para> | ||
932 | </listitem> | ||
933 | |||
934 | <listitem> | ||
935 | <para> | ||
936 | !BSY && !DRQ during PIO data transfer. | ||
937 | </para> | ||
938 | </listitem> | ||
939 | |||
940 | <listitem> | ||
941 | <para> | ||
942 | DRQ on command completion. | ||
943 | </para> | ||
944 | </listitem> | ||
945 | |||
946 | <listitem> | ||
947 | <para> | ||
948 | !BSY && ERR after the CDB transfer starts but before the | ||
949 | last byte of the CDB is transferred. The ATA/ATAPI standard states | ||
950 | that "The device shall not terminate the PACKET command | ||
951 | with an error before the last byte of the command packet has | ||
952 | been written" in the error outputs description of the PACKET | ||
953 | command, and the state diagram doesn't include such | ||
954 | transitions. | ||
955 | </para> | ||
956 | </listitem> | ||
957 | |||
958 | </itemizedlist> | ||
959 | |||
960 | <para> | ||
961 | In these cases, HSM is violated and not much information | ||
962 | regarding the error can be acquired from the STATUS or ERROR | ||
963 | register. In other words, this error can be anything: a driver bug, | ||
964 | a faulty device, controller and/or cable. | ||
965 | </para> | ||
966 | |||
967 | <para> | ||
968 | As HSM is violated, a reset is necessary to restore a known state. | ||
969 | Reconfiguring the transport for a lower speed might help too, | ||
970 | as transmission errors sometimes cause this kind of error. | ||
971 | </para> | ||
972 | </sect2> | ||
973 | |||
974 | <sect2 id="excatDevErr"> | ||
975 | <title>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</title> | ||
976 | |||
977 | <para> | ||
978 | These are errors detected and reported by ATA/ATAPI devices | ||
979 | indicating device problems. For this type of error, the STATUS | ||
980 | and ERROR register values are valid and describe the error | ||
981 | condition. Note that some ATA bus errors are detected by | ||
982 | ATA/ATAPI devices and reported using the same mechanism as | ||
983 | device errors. Those cases are described later in this | ||
984 | section. | ||
985 | </para> | ||
986 | |||
987 | <para> | ||
988 | For ATA commands, this type of error is indicated by !BSY | ||
989 | && ERR during command execution and on completion. | ||
990 | </para> | ||
991 | |||
992 | <para>For ATAPI commands,</para> | ||
993 | |||
994 | <itemizedlist> | ||
995 | |||
996 | <listitem> | ||
997 | <para> | ||
998 | !BSY && ERR && ABRT right after issuing PACKET | ||
999 | indicates that the PACKET command is not supported and falls in | ||
1000 | this category. | ||
1001 | </para> | ||
1002 | </listitem> | ||
1003 | |||
1004 | <listitem> | ||
1005 | <para> | ||
1006 | !BSY && ERR(==CHK) && !ABRT after the last | ||
1007 | byte of CDB is transferred indicates CHECK CONDITION and | ||
1008 | doesn't fall in this category. | ||
1009 | </para> | ||
1010 | </listitem> | ||
1011 | |||
1012 | <listitem> | ||
1013 | <para> | ||
1014 | !BSY && ERR(==CHK) && ABRT after the last byte | ||
1015 | of CDB is transferred *probably* indicates CHECK CONDITION and | ||
1016 | doesn't fall in this category. | ||
1017 | </para> | ||
1018 | </listitem> | ||
1019 | |||
1020 | </itemizedlist> | ||
1021 | |||
1022 | <para> | ||
1023 | Of the errors detected as above, the following are not ATA/ATAPI | ||
1024 | device errors but ATA bus errors, and should be handled | ||
1025 | according to <xref linkend="excatATAbusErr"/>. | ||
1026 | </para> | ||
1027 | |||
1028 | <variablelist> | ||
1029 | |||
1030 | <varlistentry> | ||
1031 | <term>CRC error during data transfer</term> | ||
1032 | <listitem> | ||
1033 | <para> | ||
1034 | This is indicated by the ICRC bit in the ERROR register and | ||
1035 | means that corruption occurred during data transfer. Up to | ||
1036 | ATA/ATAPI-7, the standard specifies that this bit is only | ||
1037 | applicable to UDMA transfers, but ATA/ATAPI-8 draft revision | ||
1038 | 1f says that the bit may also be applicable to multiword DMA and | ||
1039 | PIO. | ||
1040 | </para> | ||
1041 | </listitem> | ||
1042 | </varlistentry> | ||
1043 | |||
1044 | <varlistentry> | ||
1045 | <term>ABRT error during data transfer or on completion</term> | ||
1046 | <listitem> | ||
1047 | <para> | ||
1048 | Up to ATA/ATAPI-7, the standard specifies that ABRT could be | ||
1049 | set on ICRC errors and in cases where a device is not able | ||
1050 | to complete a command. Combined with the fact that MWDMA | ||
1051 | and PIO transfer errors aren't allowed to use the ICRC bit up to | ||
1052 | ATA/ATAPI-7, this seems to imply that the ABRT bit alone could | ||
1053 | indicate transfer errors. | ||
1054 | </para> | ||
1055 | <para> | ||
1056 | However, ATA/ATAPI-8 draft revision 1f removes the statement | ||
1057 | that ICRC errors can turn on ABRT, so this is a | ||
1058 | gray area and some heuristics are needed here. | ||
1059 | </para> | ||
1060 | </listitem> | ||
1061 | </varlistentry> | ||
1062 | |||
1063 | </variablelist> | ||
1064 | |||
1065 | <para> | ||
1066 | ATA/ATAPI device errors can be further categorized as follows. | ||
1067 | </para> | ||
1068 | |||
1069 | <variablelist> | ||
1070 | |||
1071 | <varlistentry> | ||
1072 | <term>Media errors</term> | ||
1073 | <listitem> | ||
1074 | <para> | ||
1075 | This is indicated by the UNC bit in the ERROR register. ATA | ||
1076 | devices report a UNC error only after a certain number of | ||
1077 | retries have failed to recover the data, so there's not much | ||
1078 | else to do other than notify the upper layer. | ||
1079 | </para> | ||
1080 | <para> | ||
1081 | READ and WRITE commands report the CHS or LBA of the first | ||
1082 | failed sector, but the ATA/ATAPI standard specifies that the | ||
1083 | amount of data transferred on error completion is | ||
1084 | indeterminate, so we cannot assume that sectors preceding | ||
1085 | the failed sector have been transferred, and thus cannot | ||
1086 | complete those sectors successfully as SCSI does. | ||
1087 | </para> | ||
1088 | </listitem> | ||
1089 | </varlistentry> | ||
1090 | |||
1091 | <varlistentry> | ||
1092 | <term>Media changed / media change requested error</term> | ||
1093 | <listitem> | ||
1094 | <para> | ||
1095 | <<TODO: fill here>> | ||
1096 | </para> | ||
1097 | </listitem> | ||
1098 | </varlistentry> | ||
1099 | |||
1100 | <varlistentry><term>Address error</term> | ||
1101 | <listitem> | ||
1102 | <para> | ||
1103 | This is indicated by the IDNF bit in the ERROR register. | ||
1104 | Report it to the upper layer. | ||
1105 | </para> | ||
1106 | </listitem> | ||
1107 | </varlistentry> | ||
1108 | |||
1109 | <varlistentry><term>Other errors</term> | ||
1110 | <listitem> | ||
1111 | <para> | ||
1112 | This can be an invalid command or parameter, indicated by the ABRT | ||
1113 | ERROR bit, or some other error condition. Note that the ABRT | ||
1114 | bit can indicate a lot of things, including ICRC and Address | ||
1115 | errors. Heuristics are needed. | ||
1116 | </para> | ||
1117 | </listitem> | ||
1118 | </varlistentry> | ||
1119 | |||
1120 | </variablelist> | ||
1121 | |||
1122 | <para> | ||
1123 | Depending on the command, not all STATUS/ERROR bits are | ||
1124 | applicable. These non-applicable bits are marked with | ||
1125 | "na" in the output descriptions, but up to ATA/ATAPI-7 | ||
1126 | no definition of "na" can be found. However, | ||
1127 | ATA/ATAPI-8 draft revision 1f describes "N/A" as | ||
1128 | follows. | ||
1129 | </para> | ||
1130 | |||
1131 | <blockquote> | ||
1132 | <variablelist> | ||
1133 | <varlistentry><term>3.2.3.3a N/A</term> | ||
1134 | <listitem> | ||
1135 | <para> | ||
1136 | A keyword that indicates a field has no defined value in | ||
1137 | this standard and should not be checked by the host or | ||
1138 | device. N/A fields should be cleared to zero. | ||
1139 | </para> | ||
1140 | </listitem> | ||
1141 | </varlistentry> | ||
1142 | </variablelist> | ||
1143 | </blockquote> | ||
1144 | |||
1145 | <para> | ||
1146 | So, it seems reasonable to assume that "na" bits are | ||
1147 | cleared to zero by devices and thus need no explicit masking. | ||
1148 | </para> | ||
1149 | |||
1150 | </sect2> | ||
1151 | |||
1152 | <sect2 id="excatATAPIcc"> | ||
1153 | <title>ATAPI device CHECK CONDITION</title> | ||
1154 | |||
1155 | <para> | ||
1156 | An ATAPI device CHECK CONDITION error is indicated by a set CHK bit | ||
1157 | (ERR bit) in the STATUS register after the last byte of the CDB is | ||
1158 | transferred for a PACKET command. For this kind of error, | ||
1159 | sense data should be acquired to gather information regarding | ||
1160 | the error. The REQUEST SENSE packet command should be used to | ||
1161 | acquire sense data. | ||
1162 | </para> | ||
1163 | |||
1164 | <para> | ||
1165 | Once sense data is acquired, this type of error can be | ||
1166 | handled similarly to other SCSI errors. Note that sense data | ||
1167 | may indicate an ATA bus error (e.g. Sense Key 04h HARDWARE ERROR | ||
1168 | && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such | ||
1169 | cases, the error should be considered an ATA bus error and | ||
1170 | handled according to <xref linkend="excatATAbusErr"/>. | ||
1171 | </para> | ||
1172 | |||
1173 | </sect2> | ||
1174 | |||
1175 | <sect2 id="excatNCQerr"> | ||
1176 | <title>ATA device error (NCQ)</title> | ||
1177 | |||
1178 | <para> | ||
1179 | An NCQ command error is indicated by a cleared BSY and set ERR bit | ||
1180 | during the NCQ command phase (one or more NCQ commands | ||
1181 | outstanding). Although the STATUS and ERROR registers will | ||
1182 | contain valid values describing the error, READ LOG EXT is | ||
1183 | required to clear the error condition, determine which command | ||
1184 | has failed and acquire more information. | ||
1185 | </para> | ||
1186 | |||
1187 | <para> | ||
1188 | READ LOG EXT Log Page 10h reports which tag has failed and the | ||
1189 | taskfile register values describing the error. With this | ||
1190 | information, the failed command can be handled as a normal ATA | ||
1191 | command error as in <xref linkend="excatDevErr"/>, and all | ||
1192 | other in-flight commands must be retried. Note that this | ||
1193 | retry should not be counted: it's likely that commands | ||
1194 | retried this way would have completed normally if it were not | ||
1195 | for the failed command. | ||
1196 | </para> | ||
1197 | |||
1198 | <para> | ||
1199 | Note that ATA bus errors can be reported as ATA device NCQ | ||
1200 | errors. These should be handled as described in <xref | ||
1201 | linkend="excatATAbusErr"/>. | ||
1202 | </para> | ||
1203 | |||
1204 | <para> | ||
1205 | If READ LOG EXT Log Page 10h fails or reports NQ, we're | ||
1206 | thoroughly screwed. This condition should be treated | ||
1207 | according to <xref linkend="excatHSMviolation"/>. | ||
1208 | </para> | ||
1209 | |||
1210 | </sect2> | ||
1211 | |||
1212 | <sect2 id="excatATAbusErr"> | ||
1213 | <title>ATA bus error</title> | ||
1214 | |||
1215 | <para> | ||
1216 | An ATA bus error means that data corruption occurred during | ||
1217 | transmission over the ATA bus (SATA or PATA). This type of error | ||
1218 | can be indicated by: | ||
1219 | </para> | ||
1220 | |||
1221 | <itemizedlist> | ||
1222 | |||
1223 | <listitem> | ||
1224 | <para> | ||
1225 | ICRC or ABRT error as described in <xref linkend="excatDevErr"/>. | ||
1226 | </para> | ||
1227 | </listitem> | ||
1228 | |||
1229 | <listitem> | ||
1230 | <para> | ||
1231 | Controller-specific error completion with error information | ||
1232 | indicating a transmission error. | ||
1233 | </para> | ||
1234 | </listitem> | ||
1235 | |||
1236 | <listitem> | ||
1237 | <para> | ||
1238 | On some controllers, a command timeout. In this case, there may | ||
1239 | be a mechanism to determine that the timeout is due to a | ||
1240 | transmission error. | ||
1241 | </para> | ||
1242 | </listitem> | ||
1243 | |||
1244 | <listitem> | ||
1245 | <para> | ||
1246 | Unknown/random errors, timeouts and all sorts of other oddities. | ||
1247 | </para> | ||
1248 | </listitem> | ||
1249 | |||
1250 | </itemizedlist> | ||
1251 | |||
1252 | <para> | ||
1253 | As described above, transmission errors can cause a wide variety | ||
1254 | of symptoms, ranging from device ICRC errors to random device | ||
1255 | lockups, and in many cases there is no way to tell whether an | ||
1256 | error condition is due to a transmission error or not; | ||
1257 | therefore, it's necessary to employ some kind of heuristic | ||
1258 | when dealing with errors and timeouts. For example, | ||
1259 | repeated ABRT errors for a command known to be supported | ||
1260 | are likely to indicate an ATA bus error. | ||
1261 | </para> | ||
1262 | |||
1263 | <para> | ||
1264 | Once it's determined that ATA bus errors have possibly | ||
1265 | occurred, lowering the ATA bus transmission speed is one | ||
1266 | action which may alleviate the problem. See <xref | ||
1267 | linkend="exrecReconf"/> for more information. | ||
1268 | </para> | ||
1269 | |||
1270 | </sect2> | ||
1271 | |||
1272 | <sect2 id="excatPCIbusErr"> | ||
1273 | <title>PCI bus error</title> | ||
1274 | |||
1275 | <para> | ||
1276 | Data corruption or other failures during transmission over PCI | ||
1277 | (or another system bus). For standard BMDMA, this is indicated | ||
1278 | by the Error bit in the BMDMA Status register. This type of | ||
1279 | error must be logged, as it indicates something is very wrong | ||
1280 | with the system. Resetting the host controller is recommended. | ||
1281 | </para> | ||
1282 | |||
1283 | </sect2> | ||
1284 | |||
1285 | <sect2 id="excatLateCompletion"> | ||
1286 | <title>Late completion</title> | ||
1287 | |||
1288 | <para> | ||
1289 | This occurs when a timeout occurs and the timeout handler finds | ||
1290 | that the timed-out command has actually completed, successfully or | ||
1291 | with an error. This is usually caused by lost interrupts. This | ||
1292 | type of error must be logged. Resetting the host controller is | ||
1293 | recommended. | ||
1294 | </para> | ||
1295 | |||
1296 | </sect2> | ||
1297 | |||
1298 | <sect2 id="excatUnknown"> | ||
1299 | <title>Unknown error (timeout)</title> | ||
1300 | |||
1301 | <para> | ||
1302 | This is when a timeout occurs and the command is still | ||
1303 | processing, or the host and device are in an unknown state. When | ||
1304 | this occurs, HSM could be in any valid or invalid state. To | ||
1305 | bring the device to a known state and make it forget about the | ||
1306 | timed-out command, resetting is necessary. The timed-out | ||
1307 | command may be retried. | ||
1308 | </para> | ||
1309 | |||
1310 | <para> | ||
1311 | Timeouts can also be caused by transmission errors. Refer to | ||
1312 | <xref linkend="excatATAbusErr"/> for more details. | ||
1313 | </para> | ||
1314 | |||
1315 | </sect2> | ||
1316 | |||
1317 | <sect2 id="excatHoplugPM"> | ||
1318 | <title>Hotplug and power management exceptions</title> | ||
1319 | |||
1320 | <para> | ||
1321 | <<TODO: fill here>> | ||
1322 | </para> | ||
1323 | |||
1324 | </sect2> | ||
1325 | |||
1326 | </sect1> | ||
1327 | |||
1328 | <sect1 id="exrec"> | ||
1329 | <title>EH recovery actions</title> | ||
1330 | |||
1331 | <para> | ||
1332 | This section discusses several important recovery actions. | ||
1333 | </para> | ||
1334 | |||
1335 | <sect2 id="exrecClr"> | ||
1336 | <title>Clearing error condition</title> | ||
1337 | |||
1338 | <para> | ||
1339 | Many controllers require their error registers to be cleared by | ||
1340 | the error handler. Different controllers may have different | ||
1341 | requirements. | ||
1342 | </para> | ||
1343 | |||
1344 | <para> | ||
1345 | For SATA, it's strongly recommended to clear at least the SError | ||
1346 | register during error handling. | ||
1347 | </para> | ||
1348 | </sect2> | ||
1349 | |||
1350 | <sect2 id="exrecRst"> | ||
1351 | <title>Reset</title> | ||
1352 | |||
1353 | <para> | ||
1354 | During EH, resetting is necessary in the following cases. | ||
1355 | </para> | ||
1356 | |||
1357 | <itemizedlist> | ||
1358 | |||
1359 | <listitem> | ||
1360 | <para> | ||
1361 | HSM is in unknown or invalid state | ||
1362 | </para> | ||
1363 | </listitem> | ||
1364 | |||
1365 | <listitem> | ||
1366 | <para> | ||
1367 | HBA is in unknown or invalid state | ||
1368 | </para> | ||
1369 | </listitem> | ||
1370 | |||
1371 | <listitem> | ||
1372 | <para> | ||
1373 | EH needs to make HBA/device forget about in-flight commands | ||
1374 | </para> | ||
1375 | </listitem> | ||
1376 | |||
1377 | <listitem> | ||
1378 | <para> | ||
1379 | HBA/device behaves weirdly | ||
1380 | </para> | ||
1381 | </listitem> | ||
1382 | |||
1383 | </itemizedlist> | ||
1384 | |||
1385 | <para> | ||
1386 | Resetting during EH might be a good idea regardless of the error | ||
1387 | condition, to improve EH robustness. Whether to reset both the HBA | ||
1388 | and the device, or only one of them, depends on the situation, but the | ||
1389 | following scheme is recommended. | ||
1390 | </para> | ||
1391 | |||
1392 | <itemizedlist> | ||
1393 | |||
1394 | <listitem> | ||
1395 | <para> | ||
1396 | When it's known that the HBA is in a ready state but the ATA/ATAPI | ||
1397 | device is in an unknown state, reset only the device. | ||
1398 | </para> | ||
1399 | </listitem> | ||
1400 | |||
1401 | <listitem> | ||
1402 | <para> | ||
1403 | If the HBA is in an unknown state, reset both the HBA and the device. | ||
1404 | </para> | ||
1405 | </listitem> | ||
1406 | |||
1407 | </itemizedlist> | ||
1408 | |||
1409 | <para> | ||
1410 | HBA resetting is implementation-specific. For a controller | ||
1411 | complying with taskfile/BMDMA PCI IDE, stopping the active DMA | ||
1412 | transaction may be sufficient if and only if the BMDMA state is the only HBA | ||
1413 | context. But even controllers that mostly comply with taskfile/BMDMA | ||
1414 | PCI IDE may have implementation-specific requirements and | ||
1415 | mechanisms to reset themselves. This must be addressed by | ||
1416 | specific drivers. | ||
1417 | </para> | ||
1418 | |||
1419 | <para> | ||
1420 | On the other hand, the ATA/ATAPI standard describes in detail the ways to reset | ||
1421 | ATA/ATAPI devices. | ||
1422 | </para> | ||
1423 | |||
1424 | <variablelist> | ||
1425 | |||
1426 | <varlistentry><term>PATA hardware reset</term> | ||
1427 | <listitem> | ||
1428 | <para> | ||
1429 | This is a hardware-initiated device reset, signalled by an | ||
1430 | asserted PATA RESET- signal. There is no standard way to | ||
1431 | initiate a hardware reset from software, although some | ||
1432 | hardware provides registers that allow the driver to directly | ||
1433 | tweak the RESET- signal. | ||
1434 | </para> | ||
1435 | </listitem> | ||
1436 | </varlistentry> | ||
1437 | |||
1438 | <varlistentry><term>Software reset</term> | ||
1439 | <listitem> | ||
1440 | <para> | ||
1441 | This is achieved by turning the CONTROL SRST bit on for at | ||
1442 | least 5us. Both PATA and SATA support it, but in the case of | ||
1443 | SATA, this may require controller-specific support, as the | ||
1444 | second Register FIS to clear SRST should be transmitted | ||
1445 | while the BSY bit is still set. Note that on PATA, this resets | ||
1446 | both master and slave devices on a channel. | ||
1447 | </para> | ||
1448 | </listitem> | ||
1449 | </varlistentry> | ||
1450 | |||
1451 | <varlistentry><term>EXECUTE DEVICE DIAGNOSTIC command</term> | ||
1452 | <listitem> | ||
1453 | <para> | ||
1454 | Although the ATA/ATAPI standard doesn't describe it exactly, EDD | ||
1455 | implies some level of resetting, possibly at a level similar | ||
1456 | to software reset. The host-side EDD protocol can be handled | ||
1457 | with normal command processing, and most SATA controllers | ||
1458 | should be able to handle EDDs just like other commands. | ||
1459 | As with software reset, EDD affects both devices on a PATA | ||
1460 | bus. | ||
1461 | </para> | ||
1462 | <para> | ||
1463 | Although EDD does reset devices, it isn't suited to error | ||
1464 | handling, as EDD cannot be issued while BSY is set and it's | ||
1465 | unclear how it will act when the device is in an unknown/weird | ||
1466 | state. | ||
1467 | </para> | ||
1468 | </listitem> | ||
1469 | </varlistentry> | ||
1470 | |||
1471 | <varlistentry><term>ATAPI DEVICE RESET command</term> | ||
1472 | <listitem> | ||
1473 | <para> | ||
1474 | This is very similar to software reset except that reset | ||
1475 | can be restricted to the selected device without affecting | ||
1476 | the other device sharing the cable. | ||
1477 | </para> | ||
1478 | </listitem> | ||
1479 | </varlistentry> | ||
1480 | |||
1481 | <varlistentry><term>SATA phy reset</term> | ||
1482 | <listitem> | ||
1483 | <para> | ||
1484 | This is the preferred way of resetting a SATA device. In | ||
1485 | effect, it's identical to a PATA hardware reset. Note that | ||
1486 | this can be done with the standard SCR Control register. | ||
1487 | As such, it's usually easier to implement than software | ||
1488 | reset. | ||
1489 | </para> | ||
1490 | </listitem> | ||
1491 | </varlistentry> | ||
1492 | |||
1493 | </variablelist> | ||
1494 | |||
1495 | <para> | ||
1496 | One more thing to consider when resetting devices is that | ||
1497 | resetting clears certain configuration parameters and they | ||
1498 | need to be set to their previous or newly adjusted values | ||
1499 | after reset. | ||
1500 | </para> | ||
1501 | |||
1502 | <para> | ||
1503 | The affected parameters are: | ||
1504 | </para> | ||
1505 | |||
1506 | <itemizedlist> | ||
1507 | |||
1508 | <listitem> | ||
1509 | <para> | ||
1510 | CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used) | ||
1511 | </para> | ||
1512 | </listitem> | ||
1513 | |||
1514 | <listitem> | ||
1515 | <para> | ||
1516 | Parameters set with SET FEATURES including transfer mode setting | ||
1517 | </para> | ||
1518 | </listitem> | ||
1519 | |||
1520 | <listitem> | ||
1521 | <para> | ||
1522 | Block count set with SET MULTIPLE MODE | ||
1523 | </para> | ||
1524 | </listitem> | ||
1525 | |||
1526 | <listitem> | ||
1527 | <para> | ||
1528 | Other parameters (SET MAX, MEDIA LOCK...) | ||
1529 | </para> | ||
1530 | </listitem> | ||
1531 | |||
1532 | </itemizedlist> | ||
1533 | |||
1534 | <para> | ||
1535 | The ATA/ATAPI standard specifies that some parameters must be | ||
1536 | maintained across hardware or software reset, but doesn't | ||
1537 | strictly specify all of them. Always reconfiguring the needed | ||
1538 | parameters after reset is required for robustness. Note that | ||
1539 | this also applies when resuming from deep sleep (power-off). | ||
1540 | </para> | ||
1541 | |||
1542 | <para> | ||
1543 | Also, the ATA/ATAPI standard requires that IDENTIFY DEVICE / | ||
1544 | IDENTIFY PACKET DEVICE be issued after any configuration | ||
1545 | parameter is updated or after a hardware reset, and that the result be used | ||
1546 | for further operation. The OS driver is required to implement a | ||
1547 | revalidation mechanism to support this. | ||
1548 | </para> | ||
1549 | |||
1550 | </sect2> | ||
1551 | |||
1552 | <sect2 id="exrecReconf"> | ||
1553 | <title>Reconfigure transport</title> | ||
1554 | |||
1555 | <para> | ||
1556 | For both PATA and SATA, a lot of corners are cut for cheap | ||
1557 | connectors, cables or controllers, and it's quite common to see | ||
1558 | a high transmission error rate. This can be mitigated by | ||
1559 | lowering the transmission speed. | ||
1560 | </para> | ||
1561 | |||
1562 | <para> | ||
1563 | The following is a possible scheme Jeff Garzik suggested. | ||
1564 | </para> | ||
1565 | |||
1566 | <blockquote> | ||
1567 | <para> | ||
1568 | If more than $N (3?) transmission errors happen in 15 minutes, | ||
1569 | </para> | ||
1570 | <itemizedlist> | ||
1571 | <listitem> | ||
1572 | <para> | ||
1573 | if SATA, decrease SATA PHY speed. if speed cannot be decreased, | ||
1574 | </para> | ||
1575 | </listitem> | ||
1576 | <listitem> | ||
1577 | <para> | ||
1578 | decrease UDMA xfer speed. if at UDMA0, switch to PIO4, | ||
1579 | </para> | ||
1580 | </listitem> | ||
1581 | <listitem> | ||
1582 | <para> | ||
1583 | decrease PIO xfer speed. if at PIO3, complain, but continue | ||
1584 | </para> | ||
1585 | </listitem> | ||
1586 | </itemizedlist> | ||
1587 | </blockquote> | ||
1588 | |||
1589 | </sect2> | ||
1590 | |||
1591 | </sect1> | ||
1592 | |||
1593 | </chapter> | ||
1594 | |||
1595 | <chapter id="PiixInt"> | ||
1596 | <title>ata_piix Internals</title> | ||
1597 | !Idrivers/ata/ata_piix.c | ||
1598 | </chapter> | ||
1599 | |||
1600 | <chapter id="SILInt"> | ||
1601 | <title>sata_sil Internals</title> | ||
1602 | !Idrivers/ata/sata_sil.c | ||
1603 | </chapter> | ||
1604 | |||
1605 | <chapter id="libataThanks"> | ||
1606 | <title>Thanks</title> | ||
1607 | <para> | ||
1608 | The bulk of the ATA knowledge comes thanks to long conversations with | ||
1609 | Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA | ||
1610 | and SCSI specifications. | ||
1611 | </para> | ||
1612 | <para> | ||
1613 | Thanks to Alan Cox for pointing out similarities | ||
1614 | between SATA and SCSI, and in general for motivation to hack on | ||
1615 | libata. | ||
1616 | </para> | ||
1617 | <para> | ||
1618 | libata's device detection | ||
1619 | method, ata_pio_devchk, and in general all the early probing was | ||
1620 | based on extensive study of Hale Landis's probe/reset code in his | ||
1621 | ATADRVR driver (www.ata-atapi.com). | ||
1622 | </para> | ||
1623 | </chapter> | ||
1624 | |||
1625 | </book> | ||
diff --git a/Documentation/DocBook/librs.tmpl b/Documentation/DocBook/librs.tmpl deleted file mode 100644 index 94f21361e0ed..000000000000 --- a/Documentation/DocBook/librs.tmpl +++ /dev/null | |||
@@ -1,289 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="Reed-Solomon-Library-Guide"> | ||
6 | <bookinfo> | ||
7 | <title>Reed-Solomon Library Programming Interface</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Thomas</firstname> | ||
12 | <surname>Gleixner</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>tglx@linutronix.de</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2004</year> | ||
23 | <holder>Thomas Gleixner</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License version 2 as published by the Free Software Foundation. | ||
31 | </para> | ||
32 | |||
33 | <para> | ||
34 | This program is distributed in the hope that it will be | ||
35 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
36 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
37 | See the GNU General Public License for more details. | ||
38 | </para> | ||
39 | |||
40 | <para> | ||
41 | You should have received a copy of the GNU General Public | ||
42 | License along with this program; if not, write to the Free | ||
43 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
44 | MA 02111-1307 USA | ||
45 | </para> | ||
46 | |||
47 | <para> | ||
48 | For more details see the file COPYING in the source | ||
49 | distribution of Linux. | ||
50 | </para> | ||
51 | </legalnotice> | ||
52 | </bookinfo> | ||
53 | |||
54 | <toc></toc> | ||
55 | |||
56 | <chapter id="intro"> | ||
57 | <title>Introduction</title> | ||
58 | <para> | ||
59 | The generic Reed-Solomon Library provides encoding, decoding | ||
60 | and error correction functions. | ||
61 | </para> | ||
62 | <para> | ||
63 | Reed-Solomon codes are used in communication and storage | ||
64 | applications to ensure data integrity. | ||
65 | </para> | ||
66 | <para> | ||
67 | This documentation is provided for developers who want to utilize | ||
68 | the functions provided by the library. | ||
69 | </para> | ||
70 | </chapter> | ||
71 | |||
72 | <chapter id="bugs"> | ||
73 | <title>Known Bugs And Assumptions</title> | ||
74 | <para> | ||
75 | None. | ||
76 | </para> | ||
77 | </chapter> | ||
78 | |||
79 | <chapter id="usage"> | ||
80 | <title>Usage</title> | ||
81 | <para> | ||
82 | This chapter provides examples of how to use the library. | ||
83 | </para> | ||
84 | <sect1> | ||
85 | <title>Initializing</title> | ||
86 | <para> | ||
87 | The init function init_rs returns a pointer to an | ||
88 | rs decoder structure, which holds the necessary | ||
89 | information for encoding, decoding and error correction | ||
90 | with the given polynomial. It either uses an existing | ||
91 | matching decoder or creates a new one. On creation all | ||
92 | the lookup tables for fast en/decoding are created. | ||
93 | The function may take a while, so make sure not to | ||
94 | call it in critical code paths. | ||
95 | </para> | ||
96 | <programlisting> | ||
97 | /* the Reed Solomon control structure */ | ||
98 | static struct rs_control *rs_decoder; | ||
99 | |||
100 | /* Symbolsize is 10 (bits) | ||
101 | * Primitive polynomial is x^10+x^3+1 | ||
102 | * first consecutive root is 0 | ||
103 | * primitive element to generate roots = 1 | ||
104 | * generator polynomial degree (number of roots) = 6 | ||
105 | */ | ||
106 | rs_decoder = init_rs (10, 0x409, 0, 1, 6); | ||
107 | </programlisting> | ||
108 | </sect1> | ||
109 | <sect1> | ||
110 | <title>Encoding</title> | ||
111 | <para> | ||
112 | The encoder calculates the Reed-Solomon code over | ||
113 | the given data length and stores the result in | ||
114 | the parity buffer. Note that the parity buffer must | ||
115 | be initialized before calling the encoder. | ||
116 | </para> | ||
117 | <para> | ||
118 | The expanded data can be inverted on the fly by | ||
119 | providing a non-zero inversion mask: the expanded data is | ||
120 | XOR'ed with the mask. This is used e.g. for FLASH | ||
121 | ECC, where all-0xFF data (an erased page) is inverted to all 0x00. | ||
122 | The Reed-Solomon code for all 0x00 is all 0x00. The | ||
123 | code is inverted before being stored to FLASH, so it is 0xFF | ||
124 | too. This prevents reads from an erased FLASH page from | ||
125 | resulting in ECC errors. | ||
126 | </para> | ||
127 | <para> | ||
128 | The databytes are expanded to the given symbol size | ||
129 | on the fly. There is no support for encoding continuous | ||
130 | bitstreams with a symbol size != 8 at the moment. If | ||
131 | it is necessary, it should not be a big deal to implement | ||
132 | such functionality. | ||
133 | </para> | ||
134 | <programlisting> | ||
135 | /* Parity buffer. Size = number of roots */ | ||
136 | uint16_t par[6]; | ||
137 | /* Initialize the parity buffer */ | ||
138 | memset(par, 0, sizeof(par)); | ||
139 | /* Encode 512 byte in data8. Store parity in buffer par */ | ||
140 | encode_rs8 (rs_decoder, data8, 512, par, 0); | ||
141 | </programlisting> | ||
142 | </sect1> | ||
143 | <sect1> | ||
144 | <title>Decoding</title> | ||
145 | <para> | ||
146 | The decoder calculates the syndrome over | ||
147 | the given data length and the received parity symbols | ||
148 | and corrects errors in the data. | ||
149 | </para> | ||
150 | <para> | ||
151 | If a syndrome is available from a hardware decoder | ||
152 | then the syndrome calculation is skipped. | ||
153 | </para> | ||
154 | <para> | ||
155 | The correction of the data buffer can be suppressed | ||
156 | by providing a correction pattern buffer and an error | ||
157 | location buffer to the decoder. The decoder stores the | ||
158 | calculated error location and the correction bitmask | ||
159 | in the given buffers. This is useful for hardware | ||
160 | decoders which use an unusual bit ordering scheme. | ||
161 | </para> | ||
162 | <para> | ||
163 | The data bytes are expanded to the given symbol size | ||
164 | on the fly. There is currently no support for decoding continuous | ||
165 | bitstreams with a symbol size != 8. If | ||
166 | it is necessary, it should not be a big deal to implement | ||
167 | such functionality. | ||
168 | </para> | ||
169 | |||
170 | <sect2> | ||
171 | <title> | ||
172 | Decoding with syndrome calculation, direct data correction | ||
173 | </title> | ||
174 | <programlisting> | ||
175 | /* Parity buffer. Size = number of roots */ | ||
176 | uint16_t par[6]; | ||
177 | uint8_t data8[512]; | ||
178 | int numerr; | ||
179 | /* Receive data */ | ||
180 | ..... | ||
181 | /* Receive parity */ | ||
182 | ..... | ||
183 | /* Decode 512 bytes in data8. */ | ||
184 | numerr = decode_rs8 (rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL); | ||
185 | </programlisting> | ||
186 | </sect2> | ||
187 | |||
188 | <sect2> | ||
189 | <title> | ||
190 | Decoding with syndrome given by hardware decoder, direct data correction | ||
191 | </title> | ||
192 | <programlisting> | ||
193 | /* Parity buffer. Size = number of roots */ | ||
194 | uint16_t par[6], syn[6]; | ||
195 | uint8_t data8[512]; | ||
196 | int numerr; | ||
197 | /* Receive data */ | ||
198 | ..... | ||
199 | /* Receive parity */ | ||
200 | ..... | ||
201 | /* Get syndrome from hardware decoder */ | ||
202 | ..... | ||
203 | /* Decode 512 bytes in data8. */ | ||
204 | numerr = decode_rs8 (rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL); | ||
205 | </programlisting> | ||
206 | </sect2> | ||
207 | |||
208 | <sect2> | ||
209 | <title> | ||
210 | Decoding with syndrome given by hardware decoder, no direct data correction. | ||
211 | </title> | ||
212 | <para> | ||
213 | Note: It's not necessary to give data and received parity to the decoder. | ||
214 | </para> | ||
215 | <programlisting> | ||
216 | /* Parity buffer. Size = number of roots */ | ||
217 | uint16_t par[6], syn[6], corr[8]; | ||
218 | uint8_t data8[512]; | ||
219 | int i, numerr, errpos[8]; | ||
220 | /* Receive data */ | ||
221 | ..... | ||
222 | /* Receive parity */ | ||
223 | ..... | ||
224 | /* Get syndrome from hardware decoder */ | ||
225 | ..... | ||
226 | /* Decode 512 bytes; data8 is not modified, corrections are returned */ | ||
227 | numerr = decode_rs8 (rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr); | ||
228 | for (i = 0; i < numerr; i++) { | ||
229 | do_error_correction_in_your_buffer(errpos[i], corr[i]); | ||
230 | } | ||
231 | </programlisting> | ||
232 | </sect2> | ||
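do_error_correction_in_your_buffer() in the example above is only a placeholder. A minimal sketch of such a helper for a plain byte buffer could look like the following. The name apply_corrections(), the choice to skip positions outside the data area (assumed here to lie in the parity region), and treating a negative numerr as "nothing to apply" are all assumptions of the sketch; hardware with an unusual bit ordering would need its own mapping of errpos and corr.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for do_error_correction_in_your_buffer():
 * XOR each reported correction pattern into the buffer at its error
 * position. Positions outside [0, len) are ignored. A negative numerr
 * (decoding failed) makes the loop apply nothing.
 */
static void apply_corrections(uint8_t *buf, size_t len, const int *errpos,
			      const uint16_t *corr, int numerr)
{
	for (int i = 0; i < numerr; i++) {
		if (errpos[i] < 0 || (size_t)errpos[i] >= len)
			continue;
		/* data symbols are bytes, so the pattern fits in 8 bits */
		buf[errpos[i]] ^= (uint8_t)corr[i];
	}
}
```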
233 | </sect1> | ||
234 | <sect1> | ||
235 | <title>Cleanup</title> | ||
236 | <para> | ||
237 | The function free_rs frees the allocated resources | ||
238 | if the caller is the last user of the decoder. | ||
239 | </para> | ||
240 | <programlisting> | ||
241 | /* Release resources */ | ||
242 | free_rs(rs_decoder); | ||
243 | </programlisting> | ||
244 | </sect1> | ||
245 | |||
246 | </chapter> | ||
247 | |||
248 | <chapter id="structs"> | ||
249 | <title>Structures</title> | ||
250 | <para> | ||
251 | This chapter contains the autogenerated documentation of the structures which are | ||
252 | used in the Reed-Solomon Library and are relevant for a developer. | ||
253 | </para> | ||
254 | !Iinclude/linux/rslib.h | ||
255 | </chapter> | ||
256 | |||
257 | <chapter id="pubfunctions"> | ||
258 | <title>Public Functions Provided</title> | ||
259 | <para> | ||
260 | This chapter contains the autogenerated documentation of the Reed-Solomon functions | ||
261 | which are exported. | ||
262 | </para> | ||
263 | !Elib/reed_solomon/reed_solomon.c | ||
264 | </chapter> | ||
265 | |||
266 | <chapter id="credits"> | ||
267 | <title>Credits</title> | ||
268 | <para> | ||
269 | The library code for encoding and decoding was written by Phil Karn. | ||
270 | </para> | ||
271 | <programlisting> | ||
272 | Copyright 2002, Phil Karn, KA9Q | ||
273 | May be used under the terms of the GNU General Public License (GPL) | ||
274 | </programlisting> | ||
275 | <para> | ||
276 | The wrapper functions and interfaces were written by Thomas Gleixner. | ||
277 | </para> | ||
278 | <para> | ||
279 | Many users have provided bugfixes, improvements and helping hands for testing. | ||
280 | Thanks a lot. | ||
281 | </para> | ||
282 | <para> | ||
283 | The following people have contributed to this document: | ||
284 | </para> | ||
285 | <para> | ||
286 | Thomas Gleixner <email>tglx@linutronix.de</email> | ||
287 | </para> | ||
288 | </chapter> | ||
289 | </book> | ||
diff --git a/Documentation/DocBook/lsm.tmpl b/Documentation/DocBook/lsm.tmpl deleted file mode 100644 index fe7664ce9667..000000000000 --- a/Documentation/DocBook/lsm.tmpl +++ /dev/null | |||
@@ -1,265 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <article class="whitepaper" id="LinuxSecurityModule" lang="en"> | ||
6 | <articleinfo> | ||
7 | <title>Linux Security Modules: General Security Hooks for Linux</title> | ||
8 | <authorgroup> | ||
9 | <author> | ||
10 | <firstname>Stephen</firstname> | ||
11 | <surname>Smalley</surname> | ||
12 | <affiliation> | ||
13 | <orgname>NAI Labs</orgname> | ||
14 | <address><email>ssmalley@nai.com</email></address> | ||
15 | </affiliation> | ||
16 | </author> | ||
17 | <author> | ||
18 | <firstname>Timothy</firstname> | ||
19 | <surname>Fraser</surname> | ||
20 | <affiliation> | ||
21 | <orgname>NAI Labs</orgname> | ||
22 | <address><email>tfraser@nai.com</email></address> | ||
23 | </affiliation> | ||
24 | </author> | ||
25 | <author> | ||
26 | <firstname>Chris</firstname> | ||
27 | <surname>Vance</surname> | ||
28 | <affiliation> | ||
29 | <orgname>NAI Labs</orgname> | ||
30 | <address><email>cvance@nai.com</email></address> | ||
31 | </affiliation> | ||
32 | </author> | ||
33 | </authorgroup> | ||
34 | </articleinfo> | ||
35 | |||
36 | <sect1 id="Introduction"><title>Introduction</title> | ||
37 | |||
38 | <para> | ||
39 | In March 2001, the National Security Agency (NSA) gave a presentation | ||
40 | about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel | ||
41 | Summit. SELinux is an implementation of flexible and fine-grained | ||
42 | nondiscretionary access controls in the Linux kernel, originally | ||
43 | implemented as its own particular kernel patch. Several other | ||
44 | security projects (e.g. RSBAC, Medusa) have also developed flexible | ||
45 | access control architectures for the Linux kernel, and various | ||
46 | projects have developed particular access control models for Linux | ||
47 | (e.g. LIDS, DTE, SubDomain). Each project has developed and | ||
48 | maintained its own kernel patch to support its security needs. | ||
49 | </para> | ||
50 | |||
51 | <para> | ||
52 | In response to the NSA presentation, Linus Torvalds made a set of | ||
53 | remarks that described a security framework he would be willing to | ||
54 | consider for inclusion in the mainstream Linux kernel. He described a | ||
55 | general framework that would provide a set of security hooks to | ||
56 | control operations on kernel objects and a set of opaque security | ||
57 | fields in kernel data structures for maintaining security attributes. | ||
58 | This framework could then be used by loadable kernel modules to | ||
59 | implement any desired model of security. Linus also suggested the | ||
60 | possibility of migrating the Linux capabilities code into such a | ||
61 | module. | ||
62 | </para> | ||
63 | |||
64 | <para> | ||
65 | The Linux Security Modules (LSM) project was started by WireX to | ||
66 | develop such a framework. LSM is a joint development effort by | ||
67 | several security projects, including Immunix, SELinux, SGI and Janus, | ||
68 | and several individuals, including Greg Kroah-Hartman and James | ||
69 | Morris, to develop a Linux kernel patch that implements this | ||
70 | framework. The patch is currently tracking the 2.4 series and is | ||
71 | targeted for integration into the 2.5 development series. This | ||
72 | technical report provides an overview of the framework and the example | ||
73 | capabilities security module provided by the LSM kernel patch. | ||
74 | </para> | ||
75 | |||
76 | </sect1> | ||
77 | |||
78 | <sect1 id="framework"><title>LSM Framework</title> | ||
79 | |||
80 | <para> | ||
81 | The LSM kernel patch provides a general kernel framework to support | ||
82 | security modules. In particular, the LSM framework is primarily | ||
83 | focused on supporting access control modules, although future | ||
84 | development is likely to address other security needs such as | ||
85 | auditing. By itself, the framework does not provide any additional | ||
86 | security; it merely provides the infrastructure to support security | ||
87 | modules. The LSM kernel patch also moves most of the capabilities | ||
88 | logic into an optional security module, with the system defaulting | ||
89 | to the traditional superuser logic. This capabilities module | ||
90 | is discussed further in <xref linkend="cap"/>. | ||
91 | </para> | ||
92 | |||
93 | <para> | ||
94 | The LSM kernel patch adds security fields to kernel data structures | ||
95 | and inserts calls to hook functions at critical points in the kernel | ||
96 | code to manage the security fields and to perform access control. It | ||
97 | also adds functions for registering and unregistering security | ||
98 | modules, and adds a general <function>security</function> system call | ||
99 | to support new system calls for security-aware applications. | ||
100 | </para> | ||
101 | |||
102 | <para> | ||
103 | The LSM security fields are simply <type>void*</type> pointers. For | ||
104 | process and program execution security information, security fields | ||
105 | were added to <structname>struct task_struct</structname> and | ||
106 | <structname>struct linux_binprm</structname>. For filesystem security | ||
107 | information, a security field was added to | ||
108 | <structname>struct super_block</structname>. For pipe, file, and socket | ||
109 | security information, security fields were added to | ||
110 | <structname>struct inode</structname> and | ||
111 | <structname>struct file</structname>. For packet and network device security | ||
112 | information, security fields were added to | ||
113 | <structname>struct sk_buff</structname> and | ||
114 | <structname>struct net_device</structname>. For System V IPC security | ||
115 | information, security fields were added to | ||
116 | <structname>struct kern_ipc_perm</structname> and | ||
117 | <structname>struct msg_msg</structname>; additionally, the definitions | ||
118 | for <structname>struct msg_msg</structname>, <structname>struct | ||
119 | msg_queue</structname>, and <structname>struct | ||
120 | shmid_kernel</structname> were moved to header files | ||
121 | (<filename>include/linux/msg.h</filename> and | ||
122 | <filename>include/linux/shm.h</filename> as appropriate) to allow | ||
123 | the security modules to use these definitions. | ||
124 | </para> | ||
125 | |||
126 | <para> | ||
127 | Each LSM hook is a function pointer in a global table, | ||
128 | security_ops. This table is a | ||
129 | <structname>security_operations</structname> structure as defined by | ||
130 | <filename>include/linux/security.h</filename>. Detailed documentation | ||
131 | for each hook is included in this header file. At present, this | ||
132 | structure consists of a collection of substructures that group related | ||
133 | hooks based on the kernel object (e.g. task, inode, file, sk_buff, | ||
134 | etc) as well as some top-level hook function pointers for system | ||
135 | operations. This structure is likely to be flattened in the future | ||
136 | for performance. The placement of the hook calls in the kernel code | ||
137 | is described by the "called:" lines in the per-hook documentation in | ||
138 | the header file. The hook calls can also be easily found in the | ||
139 | kernel code by looking for the string "security_ops->". | ||
140 | |||
141 | </para> | ||
142 | |||
143 | <para> | ||
144 | Linus mentioned per-process security hooks in his original remarks as a | ||
145 | possible alternative to global security hooks. However, if LSM were | ||
146 | to start from the perspective of per-process hooks, then the base | ||
147 | framework would have to deal with how to handle operations that | ||
148 | involve multiple processes (e.g. kill), since each process might have | ||
149 | its own hook for controlling the operation. This would require a | ||
150 | general mechanism for composing hooks in the base framework. | ||
151 | Additionally, LSM would still need global hooks for operations that | ||
152 | have no process context (e.g. network input operations). | ||
153 | Consequently, LSM provides global security hooks, but a security | ||
154 | module is free to implement per-process hooks (where that makes sense) | ||
155 | by storing a security_ops table in each process' security field and | ||
156 | then invoking these per-process hooks from the global hooks. | ||
157 | The problem of composition is thus deferred to the module. | ||
158 | </para> | ||
159 | |||
160 | <para> | ||
161 | The global security_ops table is initialized to a set of hook | ||
162 | functions provided by a dummy security module that provides | ||
163 | traditional superuser logic. A <function>register_security</function> | ||
164 | function (in <filename>security/security.c</filename>) is provided to | ||
165 | allow a security module to set security_ops to refer to its own hook | ||
166 | functions, and an <function>unregister_security</function> function is | ||
167 | provided to revert security_ops to the dummy module hooks. This | ||
168 | mechanism is used to set the primary security module, which is | ||
169 | responsible for making the final decision for each hook. | ||
170 | </para> | ||
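The dispatch scheme in the two paragraphs above (a global table of function pointers, initialized to a dummy module and swapped by register/unregister) can be modeled in a few lines. This is a simplified, hypothetical sketch: the names mirror the text, but the one-hook structure, the "allow everything" dummy, the -EINVAL return values and EXAMPLE_MAY_WRITE are illustration choices, not the kernel's actual security_operations definition or behavior.

```c
#include <errno.h>
#include <stddef.h>

struct inode;	/* opaque in this sketch */

struct security_operations {
	int (*inode_permission)(struct inode *inode, int mask);
};

/* Dummy module: "traditional superuser logic" reduced to "allow". */
static int dummy_inode_permission(struct inode *inode, int mask)
{
	(void)inode;
	(void)mask;
	return 0;
}

static struct security_operations dummy_security_ops = {
	.inode_permission = dummy_inode_permission,
};

/* The global table, initialized to the dummy module's hooks. */
static struct security_operations *security_ops = &dummy_security_ops;

/* Install a primary module; only one at a time in this sketch. */
static int register_security(struct security_operations *ops)
{
	if (!ops || security_ops != &dummy_security_ops)
		return -EINVAL;
	security_ops = ops;
	return 0;
}

/* Revert to the dummy hooks, only for the currently installed module. */
static int unregister_security(struct security_operations *ops)
{
	if (ops != security_ops || ops == &dummy_security_ops)
		return -EINVAL;
	security_ops = &dummy_security_ops;
	return 0;
}

/* A hook call site: dispatch through whatever table is installed. */
static int check_inode_permission(struct inode *inode, int mask)
{
	return security_ops->inode_permission(inode, mask);
}

/* Example primary module that denies writes (hypothetical mask bit). */
#define EXAMPLE_MAY_WRITE 0x2
static int example_inode_permission(struct inode *inode, int mask)
{
	(void)inode;
	return (mask & EXAMPLE_MAY_WRITE) ? -EACCES : 0;
}

static struct security_operations example_security_ops = {
	.inode_permission = example_inode_permission,
};
```

The call site never changes; only the table pointer does, which is what lets a module take over every hook at once.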
171 | |||
172 | <para> | ||
173 | LSM also provides a simple mechanism for stacking additional security | ||
174 | modules with the primary security module. It defines | ||
175 | <function>register_security</function> and | ||
176 | <function>unregister_security</function> hooks in the | ||
177 | <structname>security_operations</structname> structure and provides | ||
178 | <function>mod_reg_security</function> and | ||
179 | <function>mod_unreg_security</function> functions that invoke these | ||
180 | hooks after performing some sanity checking. A security module can | ||
181 | call these functions in order to stack with other modules. However, | ||
182 | the actual details of how this stacking is handled are deferred to the | ||
183 | module, which can implement these hooks in any way it wishes | ||
184 | (including always returning an error if it does not wish to support | ||
185 | stacking). In this manner, LSM again defers the problem of | ||
186 | composition to the module. | ||
187 | </para> | ||
188 | |||
189 | <para> | ||
190 | Although the LSM hooks are organized into substructures based on | ||
191 | kernel object, all of the hooks can be viewed as falling into two | ||
192 | major categories: hooks that are used to manage the security fields | ||
193 | and hooks that are used to perform access control. Examples of the | ||
194 | first category of hooks include the | ||
195 | <function>alloc_security</function> and | ||
196 | <function>free_security</function> hooks defined for each kernel data | ||
197 | structure that has a security field. These hooks are used to allocate | ||
198 | and free security structures for kernel objects. The first category | ||
199 | of hooks also includes hooks that set information in the security | ||
200 | field after allocation, such as the <function>post_lookup</function> | ||
201 | hook in <structname>struct inode_security_ops</structname>. This hook | ||
202 | is used to set security information for inodes after successful lookup | ||
203 | operations. An example of the second category of hooks is the | ||
204 | <function>permission</function> hook in | ||
205 | <structname>struct inode_security_ops</structname>. This hook checks | ||
206 | permission when accessing an inode. | ||
207 | </para> | ||
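The two hook categories above can be illustrated with a toy object carrying an opaque void * security field. This sketch is hypothetical throughout: struct toy_inode, the label/clearance policy and all function names are invented for illustration, and it uses user-space calloc()/free() where kernel code would use the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Toy kernel object with an opaque (void *) security field. */
struct toy_inode {
	unsigned long ino;
	void *security;
};

/* Module-private layout; nothing outside the module looks inside. */
struct toy_inode_sec {
	int label;	/* hypothetical sensitivity level */
};

/* Category 1: allocate and free the security structure. */
static int toy_inode_alloc_security(struct toy_inode *inode)
{
	struct toy_inode_sec *sec = calloc(1, sizeof(*sec));

	if (!sec)
		return -ENOMEM;
	inode->security = sec;
	return 0;
}

static void toy_inode_free_security(struct toy_inode *inode)
{
	free(inode->security);
	inode->security = NULL;
}

/* Category 1: set information after a successful lookup. */
static void toy_inode_post_lookup(struct toy_inode *inode, int label)
{
	struct toy_inode_sec *sec = inode->security;

	if (sec)
		sec->label = label;
}

/* Category 2: access control with a made-up "clearance >= label" rule. */
static int toy_inode_permission(const struct toy_inode *inode, int clearance)
{
	const struct toy_inode_sec *sec = inode->security;

	if (!sec)
		return -EACCES;	/* unlabeled: deny in this sketch */
	return clearance >= sec->label ? 0 : -EACCES;
}
```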
208 | |||
209 | </sect1> | ||
210 | |||
211 | <sect1 id="cap"><title>LSM Capabilities Module</title> | ||
212 | |||
213 | <para> | ||
214 | The LSM kernel patch moves most of the existing POSIX.1e capabilities | ||
215 | logic into an optional security module stored in the file | ||
216 | <filename>security/capability.c</filename>. This change allows | ||
217 | users who do not want to use capabilities to omit this code entirely | ||
218 | from their kernel, instead using the dummy module for traditional | ||
219 | superuser logic or any other module that they desire. This change | ||
220 | also allows the developers of the capabilities logic to maintain and | ||
221 | enhance their code more freely, without needing to integrate patches | ||
222 | back into the base kernel. | ||
223 | </para> | ||
224 | |||
225 | <para> | ||
226 | In addition to moving the capabilities logic, the LSM kernel patch | ||
227 | could move the capability-related fields from the kernel data | ||
228 | structures into the new security fields managed by the security | ||
229 | modules. However, at present, the LSM kernel patch leaves the | ||
230 | capability fields in the kernel data structures. In his original | ||
231 | remarks, Linus suggested that this might be preferable so that other | ||
232 | security modules can be easily stacked with the capabilities module | ||
233 | without needing to chain multiple security structures on the security field. | ||
234 | It also avoids imposing extra overhead on the capabilities module | ||
235 | to manage the security fields. However, the LSM framework could | ||
236 | certainly support such a move if it is determined to be desirable, | ||
237 | with only a few additional changes described below. | ||
238 | </para> | ||
239 | |||
240 | <para> | ||
241 | At present, the capabilities logic for computing process capabilities | ||
242 | on <function>execve</function> and <function>set*uid</function>, | ||
243 | checking capabilities for a particular process, saving and checking | ||
244 | capabilities for netlink messages, and handling the | ||
245 | <function>capget</function> and <function>capset</function> system | ||
246 | calls have been moved into the capabilities module. There are still a | ||
247 | few locations in the base kernel where capability-related fields are | ||
248 | directly examined or modified, but the current version of the LSM | ||
249 | patch does allow a security module to completely replace the | ||
250 | assignment and testing of capabilities. These few locations would | ||
251 | need to be changed if the capability-related fields were moved into | ||
252 | the security field. The following is a list of known locations that | ||
253 | still perform such direct examination or modification of | ||
254 | capability-related fields: | ||
255 | <itemizedlist> | ||
256 | <listitem><para><filename>fs/open.c</filename>:<function>sys_access</function></para></listitem> | ||
257 | <listitem><para><filename>fs/lockd/host.c</filename>:<function>nlm_bind_host</function></para></listitem> | ||
258 | <listitem><para><filename>fs/nfsd/auth.c</filename>:<function>nfsd_setuser</function></para></listitem> | ||
259 | <listitem><para><filename>fs/proc/array.c</filename>:<function>task_cap</function></para></listitem> | ||
260 | </itemizedlist> | ||
261 | </para> | ||
262 | |||
263 | </sect1> | ||
264 | |||
265 | </article> | ||
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl deleted file mode 100644 index b442921bca54..000000000000 --- a/Documentation/DocBook/mtdnand.tmpl +++ /dev/null | |||
@@ -1,1291 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="MTD-NAND-Guide"> | ||
6 | <bookinfo> | ||
7 | <title>MTD NAND Driver Programming Interface</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Thomas</firstname> | ||
12 | <surname>Gleixner</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>tglx@linutronix.de</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2004</year> | ||
23 | <holder>Thomas Gleixner</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License version 2 as published by the Free Software Foundation. | ||
31 | </para> | ||
32 | |||
33 | <para> | ||
34 | This program is distributed in the hope that it will be | ||
35 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
36 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
37 | See the GNU General Public License for more details. | ||
38 | </para> | ||
39 | |||
40 | <para> | ||
41 | You should have received a copy of the GNU General Public | ||
42 | License along with this program; if not, write to the Free | ||
43 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
44 | MA 02111-1307 USA | ||
45 | </para> | ||
46 | |||
47 | <para> | ||
48 | For more details see the file COPYING in the source | ||
49 | distribution of Linux. | ||
50 | </para> | ||
51 | </legalnotice> | ||
52 | </bookinfo> | ||
53 | |||
54 | <toc></toc> | ||
55 | |||
56 | <chapter id="intro"> | ||
57 | <title>Introduction</title> | ||
58 | <para> | ||
59 | The generic NAND driver supports almost all NAND and AG-AND based | ||
60 | chips and connects them to the Memory Technology Devices (MTD) | ||
61 | subsystem of the Linux Kernel. | ||
62 | </para> | ||
63 | <para> | ||
64 | This documentation is provided for developers who want to implement | ||
65 | board drivers or filesystem drivers suitable for NAND devices. | ||
66 | </para> | ||
67 | </chapter> | ||
68 | |||
69 | <chapter id="bugs"> | ||
70 | <title>Known Bugs And Assumptions</title> | ||
71 | <para> | ||
72 | None. | ||
73 | </para> | ||
74 | </chapter> | ||
75 | |||
76 | <chapter id="dochints"> | ||
77 | <title>Documentation hints</title> | ||
78 | <para> | ||
79 | The function and structure docs are autogenerated. Each function and | ||
80 | struct member has a short description which is marked with an [XXX] identifier. | ||
81 | The following chapters explain the meaning of those identifiers. | ||
82 | </para> | ||
83 | <sect1 id="Function_identifiers_XXX"> | ||
84 | <title>Function identifiers [XXX]</title> | ||
85 | <para> | ||
86 | The functions are marked with [XXX] identifiers in the short | ||
87 | comment. The identifiers explain the usage and scope of the | ||
88 | functions. The following identifiers are used: | ||
89 | </para> | ||
90 | <itemizedlist> | ||
91 | <listitem><para> | ||
92 | [MTD Interface]</para><para> | ||
93 | These functions provide the interface to the MTD kernel API. | ||
94 | They are not replaceable and provide functionality | ||
95 | which is completely hardware independent. | ||
96 | </para></listitem> | ||
97 | <listitem><para> | ||
98 | [NAND Interface]</para><para> | ||
99 | These functions are exported and provide the interface to the NAND kernel API. | ||
100 | </para></listitem> | ||
101 | <listitem><para> | ||
102 | [GENERIC]</para><para> | ||
103 | Generic functions are not replaceable and provide functionality | ||
104 | which is completely hardware independent. | ||
105 | </para></listitem> | ||
106 | <listitem><para> | ||
107 | [DEFAULT]</para><para> | ||
108 | Default functions provide hardware related functionality which is suitable | ||
109 | for most of the implementations. These functions can be replaced by the | ||
110 | board driver if necessary. Those functions are called via pointers in the | ||
111 | NAND chip description structure. The board driver can set the functions which | ||
112 | should be replaced by board dependent functions before calling nand_scan(). | ||
113 | If the function pointer is NULL on entry to nand_scan() then the pointer | ||
114 | is set to the default function which is suitable for the detected chip type. | ||
115 | </para></listitem> | ||
116 | </itemizedlist> | ||
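The [DEFAULT] rule above (a pointer left NULL before nand_scan() receives the default, a pointer set by the board driver is kept) can be sketched with a cut-down structure. struct toy_chip, its single read_byte hook, board_read_byte() and toy_scan_set_defaults() are hypothetical stand-ins, not the real nand_chip layout or nand_scan() code.

```c
#include <stdint.h>

struct toy_chip {
	/* [REPLACEABLE] hook: read one byte from the data register */
	uint8_t (*read_byte)(struct toy_chip *chip);
	/* stands in for the memory-mapped data register */
	uint8_t data_reg;
};

/* [DEFAULT] implementation, used when the board driver supplies none. */
static uint8_t default_read_byte(struct toy_chip *chip)
{
	return chip->data_reg;
}

/* Board replacement, e.g. for hardware with inverted data lines. */
static uint8_t board_read_byte(struct toy_chip *chip)
{
	return (uint8_t)~chip->data_reg;
}

/* NULL on entry gets the default; a board-supplied pointer is kept. */
static void toy_scan_set_defaults(struct toy_chip *chip)
{
	if (!chip->read_byte)
		chip->read_byte = default_read_byte;
}
```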
117 | </sect1> | ||
118 | <sect1 id="Struct_member_identifiers_XXX"> | ||
119 | <title>Struct member identifiers [XXX]</title> | ||
120 | <para> | ||
121 | The struct members are marked with [XXX] identifiers in the | ||
122 | comment. The identifiers explain the usage and scope of the | ||
123 | members. The following identifiers are used: | ||
124 | </para> | ||
125 | <itemizedlist> | ||
126 | <listitem><para> | ||
127 | [INTERN]</para><para> | ||
128 | These members are for NAND driver internal use only and must not be | ||
129 | modified. Most of these values are calculated from the chip geometry | ||
130 | information which is evaluated during nand_scan(). | ||
131 | </para></listitem> | ||
132 | <listitem><para> | ||
133 | [REPLACEABLE]</para><para> | ||
134 | Replaceable members hold hardware related functions which can be | ||
135 | provided by the board driver. The board driver can set the functions which | ||
136 | should be replaced by board dependent functions before calling nand_scan(). | ||
137 | If the function pointer is NULL on entry to nand_scan() then the pointer | ||
138 | is set to the default function which is suitable for the detected chip type. | ||
139 | </para></listitem> | ||
140 | <listitem><para> | ||
141 | [BOARDSPECIFIC]</para><para> | ||
142 | Board specific members hold hardware related information which must | ||
143 | be provided by the board driver. The board driver must set the function | ||
144 | pointers and data fields before calling nand_scan(). | ||
145 | </para></listitem> | ||
146 | <listitem><para> | ||
147 | [OPTIONAL]</para><para> | ||
148 | Optional members can hold information relevant for the board driver. The | ||
149 | generic NAND driver code does not use this information. | ||
150 | </para></listitem> | ||
151 | </itemizedlist> | ||
152 | </sect1> | ||
153 | </chapter> | ||
154 | |||
155 | <chapter id="basicboarddriver"> | ||
156 | <title>Basic board driver</title> | ||
157 | <para> | ||
158 | For most boards it will be sufficient to provide just the | ||
159 | basic functions and fill in a few board-dependent | ||
160 | members of the NAND chip description structure. | ||
161 | </para> | ||
162 | <sect1 id="Basic_defines"> | ||
163 | <title>Basic defines</title> | ||
164 | <para> | ||
165 | At a minimum, you have to provide a nand_chip structure | ||
166 | and storage for the ioremap'ed chip address. | ||
167 | You can allocate the nand_chip structure using | ||
168 | kmalloc or you can allocate it statically. | ||
169 | The NAND chip structure embeds an mtd structure | ||
170 | which will be registered to the MTD subsystem. | ||
171 | You can extract a pointer to the mtd structure | ||
172 | from a nand_chip pointer using the nand_to_mtd() | ||
173 | helper. | ||
174 | </para> | ||
175 | <para> | ||
176 | Kmalloc based example | ||
177 | </para> | ||
178 | <programlisting> | ||
179 | static struct mtd_info *board_mtd; | ||
180 | static void __iomem *baseaddr; | ||
181 | </programlisting> | ||
182 | <para> | ||
183 | Static example | ||
184 | </para> | ||
185 | <programlisting> | ||
186 | static struct nand_chip board_chip; | ||
187 | static void __iomem *baseaddr; | ||
188 | </programlisting> | ||
189 | </sect1> | ||
190 | <sect1 id="Partition_defines"> | ||
191 | <title>Partition defines</title> | ||
192 | <para> | ||
193 | If you want to divide your device into partitions, | ||
194 | define a partitioning scheme suitable for your board. | ||
195 | </para> | ||
196 | <programlisting> | ||
197 | #define NUM_PARTITIONS 2 | ||
198 | static struct mtd_partition partition_info[] = { | ||
199 | { .name = "Flash partition 1", | ||
200 | .offset = 0, | ||
201 | .size = 8 * 1024 * 1024 }, | ||
202 | { .name = "Flash partition 2", | ||
203 | .offset = MTDPART_OFS_NEXT, | ||
204 | .size = MTDPART_SIZ_FULL }, | ||
205 | }; | ||
206 | </programlisting> | ||
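In the table above, MTDPART_OFS_NEXT places a partition right after the previous one and MTDPART_SIZ_FULL extends it to the end of the device. The sketch below resolves such placeholders against a device size to show the resulting layout. SKETCH_OFS_NEXT, SKETCH_SIZ_FULL, struct sketch_partition and resolve_partitions() are hypothetical names; the real macros' values are not relied on, and this is not the MTD core's parsing code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for MTDPART_OFS_NEXT / MTDPART_SIZ_FULL. */
#define SKETCH_OFS_NEXT UINT64_MAX
#define SKETCH_SIZ_FULL 0

struct sketch_partition {
	const char *name;
	uint64_t offset;
	uint64_t size;
};

/* Resolve placeholders in order; return -1 if a partition won't fit. */
static int resolve_partitions(struct sketch_partition *p, int n,
			      uint64_t dev_size)
{
	uint64_t cur = 0;

	for (int i = 0; i < n; i++) {
		if (p[i].offset == SKETCH_OFS_NEXT)
			p[i].offset = cur;	/* follow the previous one */
		if (p[i].offset > dev_size)
			return -1;
		if (p[i].size == SKETCH_SIZ_FULL)
			p[i].size = dev_size - p[i].offset;	/* up to the end */
		if (p[i].size > dev_size - p[i].offset)
			return -1;
		cur = p[i].offset + p[i].size;
	}
	return 0;
}
```

On a hypothetical 32 MiB device, the second partition of the table above would start at 8 MiB and span the remaining 24 MiB.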
207 | </sect1> | ||
208 | <sect1 id="Hardware_control_functions"> | ||
209 | <title>Hardware control function</title> | ||
210 | <para> | ||
211 | The hardware control function provides access to the | ||
212 | control pins of the NAND chip(s). | ||
213 | The access can be done by GPIO pins or by address lines. | ||
214 | If you use address lines, make sure that the timing | ||
215 | requirements are met. | ||
216 | </para> | ||
217 | <para> | ||
218 | <emphasis>GPIO based example</emphasis> | ||
219 | </para> | ||
220 | <programlisting> | ||
221 | static void board_hwcontrol(struct mtd_info *mtd, int cmd) | ||
222 | { | ||
223 | switch(cmd){ | ||
224 | case NAND_CTL_SETCLE: /* Set CLE pin high */ break; | ||
225 | case NAND_CTL_CLRCLE: /* Set CLE pin low */ break; | ||
226 | case NAND_CTL_SETALE: /* Set ALE pin high */ break; | ||
227 | case NAND_CTL_CLRALE: /* Set ALE pin low */ break; | ||
228 | case NAND_CTL_SETNCE: /* Set nCE pin low */ break; | ||
229 | case NAND_CTL_CLRNCE: /* Set nCE pin high */ break; | ||
230 | } | ||
231 | } | ||
232 | </programlisting> | ||
233 | <para> | ||
234 | <emphasis>Address lines based example.</emphasis> It's assumed that the | ||
235 | nCE pin is driven by a chip select decoder. | ||
236 | </para> | ||
237 | <programlisting> | ||
238 | static void board_hwcontrol(struct mtd_info *mtd, int cmd) | ||
239 | { | ||
240 | struct nand_chip *this = mtd_to_nand(mtd); | ||
241 | switch(cmd){ | ||
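242 | case NAND_CTL_SETCLE: this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W | CLE_ADDR_BIT); break; | ||
243 | case NAND_CTL_CLRCLE: this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W & ~CLE_ADDR_BIT); break; | ||
244 | case NAND_CTL_SETALE: this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W | ALE_ADDR_BIT); break; | ||
245 | case NAND_CTL_CLRALE: this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W & ~ALE_ADDR_BIT); break; | ||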
246 | } | ||
247 | } | ||
248 | </programlisting> | ||
249 | </sect1> | ||
250 | <sect1 id="Device_ready_function"> | ||
251 | <title>Device ready function</title> | ||
252 | <para> | ||
253 | If the hardware interface has the ready/busy pin of the NAND chip connected to a | ||
254 | GPIO or other accessible I/O pin, this function is used to read back the state of the | ||
255 | pin. The function takes the mtd_info pointer as its only argument and should return 0 if the device is busy (R/B pin | ||
256 | is low) and 1 if the device is ready (R/B pin is high). | ||
257 | If the hardware interface does not give access to the ready/busy pin, then | ||
258 | no such function should be provided, and the function pointer this->dev_ready must be set to NULL. | ||
259 | </para> | ||
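The return convention above (0 while R/B is low, 1 once it is high) lends itself to a simple polling loop. The sketch simulates a board whose R/B line appears as a bit in a status register; struct toy_board, RB_BIT, the busy_polls simulation counter and toy_wait_ready() are all hypothetical, and the callback takes a toy board pointer rather than the real mtd_info pointer. A driver without an R/B pin would instead wait a fixed delay, such as the chip_delay value in the init example, which the sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

#define RB_BIT 0x01u	/* hypothetical R/B bit in a status register */

struct toy_board {
	volatile unsigned int status_reg;	/* would be a mapped GPIO register */
	int busy_polls;		/* simulation only: polls until R/B goes high */
};

/* dev_ready-style callback: 1 if ready (R/B high), 0 if busy (R/B low). */
static int toy_dev_ready(struct toy_board *b)
{
	/* Simulate the chip finishing its operation after a few polls. */
	if (b->busy_polls > 0 && --b->busy_polls == 0)
		b->status_reg |= RB_BIT;
	return (b->status_reg & RB_BIT) ? 1 : 0;
}

/* Poll up to max_polls times; a NULL callback (no R/B pin) returns at
 * once because the fixed-delay fallback is not modeled here. */
static bool toy_wait_ready(int (*dev_ready)(struct toy_board *),
			   struct toy_board *b, int max_polls)
{
	if (!dev_ready)
		return true;
	for (int i = 0; i < max_polls; i++)
		if (dev_ready(b))
			return true;
	return false;
}
```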
260 | </sect1> | ||
261 | <sect1 id="Init_function"> | ||
262 | <title>Init function</title> | ||
263 | <para> | ||
264 | The init function allocates memory and sets up all the board | ||
265 | specific parameters and function pointers. When everything | ||
166 | is set up, nand_scan() is called. This function tries to | ||
167 | detect and identify the chip. If a chip is found, all the | ||
268 | internal data fields are initialized accordingly. | ||
269 | The structure(s) have to be zeroed out first and then filled with the necessary | ||
270 | information about the device. | ||
271 | </para> | ||
272 | <programlisting> | ||
273 | static int __init board_init (void) | ||
274 | { | ||
275 | struct nand_chip *this; | ||
276 | int err = 0; | ||
277 | |||
278 | /* Allocate memory for MTD device structure and private data */ | ||
279 | this = kzalloc(sizeof(struct nand_chip), GFP_KERNEL); | ||
280 | if (!this) { | ||
281 | printk ("Unable to allocate NAND MTD device structure.\n"); | ||
282 | err = -ENOMEM; | ||
283 | goto out; | ||
284 | } | ||
285 | |||
286 | board_mtd = nand_to_mtd(this); | ||
287 | |||
288 | /* map physical address */ | ||
289 | baseaddr = ioremap(CHIP_PHYSICAL_ADDRESS, 1024); | ||
290 | if (!baseaddr) { | ||
291 | printk("Ioremap to access NAND chip failed\n"); | ||
292 | err = -EIO; | ||
293 | goto out_mtd; | ||
294 | } | ||
295 | |||
296 | /* Set address of NAND IO lines */ | ||
297 | this->IO_ADDR_R = baseaddr; | ||
298 | this->IO_ADDR_W = baseaddr; | ||
299 | /* Reference hardware control function */ | ||
300 | this->hwcontrol = board_hwcontrol; | ||
301 | /* Set command delay time, see datasheet for correct value */ | ||
302 | this->chip_delay = CHIP_DEPENDEND_COMMAND_DELAY; | ||
303 | /* Assign the device ready function, if available */ | ||
304 | this->dev_ready = board_dev_ready; | ||
305 | this->eccmode = NAND_ECC_SOFT; | ||
306 | |||
307 | /* Scan to find existence of the device */ | ||
308 | if (nand_scan (board_mtd, 1)) { | ||
309 | err = -ENXIO; | ||
310 | goto out_ior; | ||
311 | } | ||
312 | |||
313 | add_mtd_partitions(board_mtd, partition_info, NUM_PARTITIONS); | ||
314 | goto out; | ||
315 | |||
316 | out_ior: | ||
317 | iounmap(baseaddr); | ||
318 | out_mtd: | ||
319 | kfree (this); | ||
320 | out: | ||
321 | return err; | ||
322 | } | ||
323 | module_init(board_init); | ||
324 | </programlisting> | ||
325 | </sect1> | ||
326 | <sect1 id="Exit_function"> | ||
327 | <title>Exit function</title> | ||
328 | <para> | ||
329 | The exit function is only necessary if the driver is | ||
330 | compiled as a module. It releases all resources which | ||
331 | are held by the chip driver and unregisters the partitions | ||
332 | in the MTD layer. | ||
333 | </para> | ||
334 | <programlisting> | ||
335 | #ifdef MODULE | ||
336 | static void __exit board_cleanup (void) | ||
337 | { | ||
338 | /* Release resources, unregister device */ | ||
339 | nand_release (board_mtd); | ||
340 | |||
341 | /* unmap physical address */ | ||
342 | iounmap(baseaddr); | ||
343 | |||
344 | /* Free the MTD device structure */ | ||
345 | kfree (mtd_to_nand(board_mtd)); | ||
346 | } | ||
347 | module_exit(board_cleanup); | ||
348 | #endif | ||
349 | </programlisting> | ||
350 | </sect1> | ||
351 | </chapter> | ||
352 | |||
353 | <chapter id="boarddriversadvanced"> | ||
354 | <title>Advanced board driver functions</title> | ||
355 | <para> | ||
356 | This chapter describes the advanced functionality of the NAND | ||
357 | driver. For a list of functions which can be overridden by the board | ||
358 | driver see the documentation of the nand_chip structure. | ||
359 | </para> | ||
360 | <sect1 id="Multiple_chip_control"> | ||
361 | <title>Multiple chip control</title> | ||
362 | <para> | ||
363 | The nand driver can control chip arrays. To do so, the | ||
364 | board driver must provide its own select_chip function. This | ||
365 | function must select or deselect the requested chip. | ||
366 | The function pointer in the nand_chip structure must | ||
367 | be set before calling nand_scan(). The maxchip parameter | ||
368 | of nand_scan() defines the maximum number of chips to | ||
369 | scan for. Make sure that the select_chip function can | ||
370 | handle the requested number of chips. | ||
371 | </para> | ||
372 | <para> | ||
373 | The nand driver concatenates the chips into one virtual | ||
374 | chip and provides this virtual chip to the MTD layer. | ||
375 | </para> | ||
376 | <para> | ||
377 | <emphasis>Note: The driver can only handle linear chip arrays | ||
378 | of equally sized chips. There is no support for | ||
379 | parallel arrays which extend the bus width.</emphasis> | ||
380 | </para> | ||
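<para>
Because the chips are concatenated linearly and are all the same size, a device-wide offset can be split into a chip number and an offset inside that chip with a shift and a mask. The following sketch assumes a power-of-two chip size given as chip_shift (log2 of the chip size in bytes); struct chip_pos and split_offset() are hypothetical names, not part of the NAND driver API.
</para>

```c
#include <stdint.h>

/* Result of splitting an offset in the concatenated "virtual chip". */
struct chip_pos {
	int chip;        /* index of the physical chip in the array */
	uint64_t offset; /* byte offset inside that chip */
};

/*
 * Split a device-wide offset into (chip, offset-in-chip). This relies on
 * all chips having the same power-of-two size: the chip index is just the
 * high bits of the offset. chip_shift = log2(chip size in bytes).
 */
static struct chip_pos split_offset(uint64_t ofs, unsigned int chip_shift)
{
	struct chip_pos pos;

	pos.chip = (int)(ofs >> chip_shift);
	pos.offset = ofs & ((UINT64_C(1) << chip_shift) - 1);
	return pos;
}
```

For example, with 64 MiB chips (chip_shift = 26), offset 0x5000000 lands at offset 0x1000000 in chip 1.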
381 | <para> | ||
382 | <emphasis>GPIO based example</emphasis> | ||
383 | </para> | ||
384 | <programlisting> | ||
385 | static void board_select_chip (struct mtd_info *mtd, int chip) | ||
386 | { | ||
387 | /* Deselect all chips, set all nCE pins high */ | ||
388 | GPIO(BOARD_NAND_NCE) |= 0xff; | ||
389 | if (chip >= 0) | ||
390 | GPIO(BOARD_NAND_NCE) &= ~ (1 << chip); | ||
391 | } | ||
392 | </programlisting> | ||
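<para>
The GPIO example can be exercised outside the kernel by replacing the GPIO register with a variable. This sketch assumes at most 8 chips with the active-low nCE of chip n on bit n, as the example above implies; board_nce_reg and board_select_chip_sim() are hypothetical stand-ins. It assigns the whole register instead of using |= because the modelled register holds nothing but the nCE lines.
</para>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: up to 8 chips, nCE of chip n on bit n, active low. */
#define BOARD_MAX_CHIPS 8

/* Stand-in for the memory-mapped GPIO output register. */
static uint8_t board_nce_reg;

/* chip == -1 deselects every chip; chip >= 0 selects exactly that chip. */
static void board_select_chip_sim(int chip)
{
	assert(chip >= -1 && chip < BOARD_MAX_CHIPS);

	/* Deselect all chips: drive every nCE line high. */
	board_nce_reg = 0xff;
	/* Select the requested chip by pulling its nCE line low. */
	if (chip >= 0)
		board_nce_reg &= (uint8_t)~(1u << chip);
}
```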
393 | <para> | ||
394 | <emphasis>Address lines based example.</emphasis> | ||
395 | It is assumed that the nCE pins are connected to an | ||
396 | address decoder. | ||
397 | </para> | ||
398 | <programlisting> | ||
399 | static void board_select_chip (struct mtd_info *mtd, int chip) | ||
400 | { | ||
401 | struct nand_chip *this = mtd_to_nand(mtd); | ||
402 | |||
403 | /* Deselect all chips */ | ||
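404 | this->IO_ADDR_R = (void __iomem *)((unsigned long)this->IO_ADDR_R & ~BOARD_NAND_ADDR_MASK); | ||
405 | this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W & ~BOARD_NAND_ADDR_MASK); | ||
406 | switch (chip) { | ||
407 | case 0: | ||
408 | this->IO_ADDR_R = (void __iomem *)((unsigned long)this->IO_ADDR_R | BOARD_NAND_ADDR_CHIP0); | ||
409 | this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W | BOARD_NAND_ADDR_CHIP0); | ||
410 | break; | ||
411 | .... | ||
412 | case n: | ||
413 | this->IO_ADDR_R = (void __iomem *)((unsigned long)this->IO_ADDR_R | BOARD_NAND_ADDR_CHIPn); | ||
414 | this->IO_ADDR_W = (void __iomem *)((unsigned long)this->IO_ADDR_W | BOARD_NAND_ADDR_CHIPn); | ||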
415 | break; | ||
416 | } | ||
417 | } | ||
418 | </programlisting> | ||
419 | </sect1> | ||
420 | <sect1 id="Hardware_ECC_support"> | ||
421 | <title>Hardware ECC support</title> | ||
422 | <sect2 id="Functions_and_constants"> | ||
423 | <title>Functions and constants</title> | ||
424 | <para> | ||
425 | The nand driver supports four different types of | ||
426 | hardware ECC. | ||
427 | <itemizedlist> | ||
428 | <listitem><para>NAND_ECC_HW3_256</para><para> | ||
429 | Hardware ECC generator providing 3 bytes of ECC per | ||
430 | 256 bytes of data. | ||
431 | </para> </listitem> | ||
432 | <listitem><para>NAND_ECC_HW3_512</para><para> | ||
433 | Hardware ECC generator providing 3 bytes of ECC per | ||
434 | 512 bytes of data. | ||
435 | </para> </listitem> | ||
436 | <listitem><para>NAND_ECC_HW6_512</para><para> | ||
437 | Hardware ECC generator providing 6 bytes of ECC per | ||
438 | 512 bytes of data. | ||
439 | </para> </listitem> | ||
440 | <listitem><para>NAND_ECC_HW8_512</para><para> | ||
441 | Hardware ECC generator providing 8 bytes of ECC per | ||
442 | 512 bytes of data. | ||
443 | </para> </listitem> | ||
444 | </itemizedlist> | ||
445 | If your hardware generator has different functionality, | ||
446 | add it at the appropriate place in nand_base.c. | ||
447 | </para> | ||
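<para>
The number of ECC bytes per page follows directly from the step size and the bytes per step of the selected hardware ECC type. The helper below is a hypothetical illustration; with 3 bytes per 256 byte step (NAND_ECC_HW3_256) it gives 3 bytes for a 256 byte page, 6 for a 512 byte page and 24 for a 2048 byte page, which matches the number of ECC bytes in the default spare area layouts described later in this document.
</para>

```c
/*
 * ECC bytes produced for one page: each ECC "step" covers step_size data
 * bytes and yields ecc_bytes_per_step bytes of ECC. Assumes page_size is
 * a multiple of step_size, as it is for all types listed above.
 */
static int ecc_bytes_per_page(int page_size, int step_size,
			      int ecc_bytes_per_step)
{
	int steps = page_size / step_size;

	return steps * ecc_bytes_per_step;
}
```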
448 | <para> | ||
449 | The board driver must provide the following functions: | ||
450 | <itemizedlist> | ||
451 | <listitem><para>enable_hwecc</para><para> | ||
452 | This function is called before reading / writing to | ||
453 | the chip. Reset or initialize the hardware generator | ||
454 | in this function. The function is called with an | ||
455 | argument which lets you distinguish between read | ||
456 | and write operations. | ||
457 | </para> </listitem> | ||
458 | <listitem><para>calculate_ecc</para><para> | ||
459 | This function is called after read / write from / to | ||
460 | the chip. Transfer the ECC from the hardware to | ||
461 | the buffer. If the option NAND_HWECC_SYNDROME is set | ||
462 | then the function is only called on write. See below. | ||
463 | </para> </listitem> | ||
464 | <listitem><para>correct_data</para><para> | ||
465 | In case of an ECC error this function is called for | ||
466 | error detection and correction. Return 1 or 2 if the | ||
467 | error could be corrected. If the error is | ||
468 | not correctable, return -1. If your hardware generator | ||
469 | matches the default algorithm of the nand_ecc software | ||
470 | generator then use the correction function provided | ||
471 | by nand_ecc instead of implementing duplicated code. | ||
472 | </para> </listitem> | ||
473 | </itemizedlist> | ||
474 | </para> | ||
475 | </sect2> | ||
476 | <sect2 id="Hardware_ECC_with_syndrome_calculation"> | ||
477 | <title>Hardware ECC with syndrome calculation</title> | ||
478 | <para> | ||
479 | Many hardware ECC implementations provide Reed-Solomon | ||
480 | codes and calculate an error syndrome on read. The syndrome | ||
481 | must be converted to a standard Reed-Solomon syndrome | ||
482 | before calling the error correction code in the generic | ||
483 | Reed-Solomon library. | ||
484 | </para> | ||
485 | <para> | ||
486 | The ECC bytes must be placed immediately after the data | ||
487 | bytes in order to make the syndrome generator work. This | ||
488 | is contrary to the usual layout used by software ECC. The | ||
489 | separation of data and out-of-band area is no longer | ||
490 | possible. The nand driver code handles this layout and | ||
491 | the remaining free bytes in the oob area are managed by | ||
492 | the autoplacement code. Provide a matching oob-layout | ||
493 | in this case. See rts_from4.c and diskonchip.c for | ||
494 | implementation reference. In those cases we must also | ||
495 | use bad block tables on FLASH, because the ECC layout | ||
496 | interferes with the bad block marker positions. | ||
497 | See bad block table support for details. | ||
498 | </para> | ||
499 | </sect2> | ||
500 | </sect1> | ||
501 | <sect1 id="Bad_Block_table_support"> | ||
502 | <title>Bad block table support</title> | ||
503 | <para> | ||
504 | Most NAND chips mark the bad blocks at a defined | ||
505 | position in the spare area. Those blocks must | ||
506 | not be erased under any circumstances as the bad | ||
507 | block information would be lost. | ||
508 | It is possible to check the bad block mark each | ||
509 | time a block is accessed by reading the | ||
510 | spare area of the first page in the block. Because this | ||
511 | is time-consuming, a bad block table is used instead. | ||
512 | </para> | ||
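<para>
A minimal sketch of the per-access check described above, assuming the default marker positions from the spare area tables in this document (byte 0x05 for 256 and 512 byte pages, byte 0x00 for 2048 byte pages) and the rule that a block is bad if any bit of the marker byte is zero. The function names are hypothetical, and real chips may place the marker elsewhere, so consult the datasheet.
</para>

```c
#include <stdint.h>

/*
 * Offset of the bad block marker in the spare area, per the default
 * layouts in this document: 0x05 for 256/512 byte pages, 0x00 for
 * 2048 byte pages.
 */
static int bbm_offset(int page_size)
{
	return page_size > 512 ? 0x00 : 0x05;
}

/*
 * oob points to the spare area of the FIRST page of the block.
 * The block is bad if any bit of the marker byte is zero.
 */
static int block_is_bad(const uint8_t *oob, int page_size)
{
	return oob[bbm_offset(page_size)] != 0xff;
}
```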
513 | <para> | ||
514 | The nand driver supports various types of bad block | ||
515 | tables. | ||
516 | <itemizedlist> | ||
517 | <listitem><para>Per device</para><para> | ||
518 | The bad block table contains all bad block information | ||
519 | of the device which can consist of multiple chips. | ||
520 | </para> </listitem> | ||
521 | <listitem><para>Per chip</para><para> | ||
522 | A bad block table is used per chip and contains the | ||
523 | bad block information for this particular chip. | ||
524 | </para> </listitem> | ||
525 | <listitem><para>Fixed offset</para><para> | ||
526 | The bad block table is located at a fixed offset | ||
527 | in the chip (device). This applies to various | ||
528 | DiskOnChip devices. | ||
529 | </para> </listitem> | ||
530 | <listitem><para>Automatically placed</para><para> | ||
531 | The bad block table is automatically placed and | ||
532 | detected either at the end or at the beginning | ||
533 | of a chip (device). | ||
534 | </para> </listitem> | ||
535 | <listitem><para>Mirrored tables</para><para> | ||
536 | The bad block table is mirrored on the chip (device) to | ||
537 | allow updates of the bad block table without data loss. | ||
538 | </para> </listitem> | ||
539 | </itemizedlist> | ||
540 | </para> | ||
541 | <para> | ||
542 | nand_scan() calls the function nand_default_bbt(). | ||
543 | nand_default_bbt() selects appropriate default | ||
544 | bad block table descriptors depending on the chip information | ||
545 | which was retrieved by nand_scan(). | ||
546 | </para> | ||
547 | <para> | ||
548 | The standard policy is to scan the device for bad | ||
549 | blocks and build a RAM-based bad block table, which | ||
550 | allows faster access than always checking the | ||
551 | bad block information on the flash chip itself. | ||
552 | </para> | ||
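<para>
A RAM-based table can be as compact as 2 bits per block, packed four blocks to a byte. The sketch below shows only the packing; the state names and values are this example's own choice and are not taken from the NAND driver.
</para>

```c
#include <stdint.h>

/* Illustrative 2-bit block states; 0 means "good". */
enum bbt_state {
	BBT_GOOD = 0,
	BBT_WORN = 1,
	BBT_RESERVED = 2,
	BBT_FACTORY_BAD = 3,
};

/* Block n lives in bits 2*(n%4) and 2*(n%4)+1 of byte n/4. */
static enum bbt_state bbt_get(const uint8_t *bbt, unsigned int block)
{
	unsigned int shift = (block & 3) * 2;

	return (enum bbt_state)((bbt[block >> 2] >> shift) & 0x3);
}

static void bbt_set(uint8_t *bbt, unsigned int block, enum bbt_state state)
{
	unsigned int shift = (block & 3) * 2;

	bbt[block >> 2] &= (uint8_t)~(0x3u << shift);             /* clear old state */
	bbt[block >> 2] |= (uint8_t)((unsigned int)state << shift); /* store new one */
}
```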
553 | <sect2 id="Flash_based_tables"> | ||
554 | <title>Flash based tables</title> | ||
555 | <para> | ||
556 | It may be desirable or necessary to keep a bad block table in FLASH. | ||
557 | For AG-AND chips this is mandatory, as they have no factory-marked | ||
558 | bad blocks. Instead, they have factory-marked good blocks. The marker pattern | ||
559 | is erased when the block is erased for reuse. So, in case of a | ||
560 | power loss before the pattern is written back to the chip, this block | ||
561 | would be lost and added to the bad blocks. Therefore, when the | ||
562 | chip(s) are detected for the first time, we scan them for good blocks and | ||
563 | store this information in a bad block table before erasing any | ||
564 | of the blocks. | ||
565 | </para> | ||
566 | <para> | ||
567 | The blocks in which the tables are stored are protected against | ||
568 | accidental access by marking them bad in the memory bad block | ||
569 | table. The bad block table management functions are allowed | ||
570 | to circumvent this protection. | ||
571 | </para> | ||
572 | <para> | ||
573 | The simplest way to activate the FLASH based bad block table support | ||
574 | is to set the option NAND_BBT_USE_FLASH in the bbt_options field of | ||
575 | the nand chip structure before calling nand_scan(). For AG-AND | ||
576 | chips this is done by default. | ||
577 | This activates the default FLASH based bad block table functionality | ||
578 | of the NAND driver. The default bad block table options are: | ||
579 | <itemizedlist> | ||
580 | <listitem><para>Store bad block table per chip</para></listitem> | ||
581 | <listitem><para>Use 2 bits per block</para></listitem> | ||
582 | <listitem><para>Automatic placement at the end of the chip</para></listitem> | ||
583 | <listitem><para>Use mirrored tables with version numbers</para></listitem> | ||
584 | <listitem><para>Reserve 4 blocks at the end of the chip</para></listitem> | ||
585 | </itemizedlist> | ||
586 | </para> | ||
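<para>
With 2 bits per block, the on-flash table is small. This hypothetical helper computes the bytes and pages the raw table needs; it ignores any pattern or version bytes the driver stores in the spare area. For a chip with 4096 blocks (an example value) it needs 1024 bytes, i.e. two 512 byte pages or a single 2048 byte page.
</para>

```c
/* Bytes needed for bits_per_block bits per block, rounded up. */
static unsigned int bbt_bytes(unsigned int nblocks, unsigned int bits_per_block)
{
	return (nblocks * bits_per_block + 7) / 8;
}

/* Pages needed to hold those bytes, rounded up to whole pages. */
static unsigned int bbt_pages(unsigned int nblocks, unsigned int bits_per_block,
			      unsigned int page_size)
{
	return (bbt_bytes(nblocks, bits_per_block) + page_size - 1) / page_size;
}
```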
587 | </sect2> | ||
588 | <sect2 id="User_defined_tables"> | ||
589 | <title>User defined tables</title> | ||
590 | <para> | ||
591 | User defined tables are created by filling out a | ||
592 | nand_bbt_descr structure and storing the pointer in the | ||
593 | nand_chip structure member bbt_td before calling nand_scan(). | ||
594 | If a mirror table is necessary, a second structure must be | ||
595 | created and a pointer to this structure must be stored | ||
596 | in bbt_md inside the nand_chip structure. If the bbt_md | ||
597 | member is set to NULL then only the main table is used | ||
598 | and no scan for the mirrored table is performed. | ||
599 | </para> | ||
600 | <para> | ||
601 | The most important field in the nand_bbt_descr structure | ||
602 | is the options field. The options define most of the | ||
603 | table properties. Use the predefined constants from | ||
604 | nand.h to define the options. | ||
605 | <itemizedlist> | ||
606 | <listitem><para>Number of bits per block</para> | ||
607 | <para>The supported number of bits is 1, 2, 4, 8.</para></listitem> | ||
608 | <listitem><para>Table per chip</para> | ||
609 | <para>Setting the constant NAND_BBT_PERCHIP selects that | ||
610 | a bad block table is managed for each chip in a chip array. | ||
611 | If this option is not set then a per device bad block table | ||
612 | is used.</para></listitem> | ||
613 | <listitem><para>Table location is absolute</para> | ||
614 | <para>Use the option constant NAND_BBT_ABSPAGE and | ||
615 | define the absolute page number where the bad block | ||
616 | table starts in the field pages. If you have selected bad block | ||
617 | tables per chip and you have a multi chip array then the start page | ||
618 | must be given for each chip in the chip array. Note: no scan | ||
619 | for a table ident pattern is performed, so the fields | ||
620 | pattern, veroffs, offs and len can be left uninitialized.</para></listitem> | ||
621 | <listitem><para>Table location is automatically detected</para> | ||
622 | <para>The table can either be located in the first or the last good | ||
623 | blocks of the chip (device). Set NAND_BBT_LASTBLOCK to place | ||
624 | the bad block table at the end of the chip (device). The | ||
625 | bad block tables are marked and identified by a pattern which | ||
626 | is stored in the spare area of the first page in the block which | ||
627 | holds the bad block table. Store a pointer to the pattern | ||
628 | in the pattern field. Furthermore, the length of the pattern has to be | ||
629 | stored in len and the offset in the spare area must be given | ||
630 | in the offs member of the nand_bbt_descr structure. For mirrored | ||
631 | bad block tables different patterns are mandatory.</para></listitem> | ||
632 | <listitem><para>Table creation</para> | ||
633 | <para>Set the option NAND_BBT_CREATE to enable the table creation | ||
634 | if no table can be found during the scan. Usually this is done only | ||
635 | once if a new chip is found. </para></listitem> | ||
636 | <listitem><para>Table write support</para> | ||
637 | <para>Set the option NAND_BBT_WRITE to enable the table write support. | ||
638 | This allows the update of the bad block table(s) in case a block has | ||
639 | to be marked bad due to wear. The MTD interface function block_markbad | ||
640 | calls the update function of the bad block table. If the write | ||
641 | support is enabled then the table is updated on FLASH.</para> | ||
642 | <para> | ||
643 | Note: Write support should only be enabled for mirrored tables with | ||
644 | version control. | ||
645 | </para></listitem> | ||
646 | <listitem><para>Table version control</para> | ||
647 | <para>Set the option NAND_BBT_VERSION to enable the table version control. | ||
648 | It's highly recommended to enable this for mirrored tables with write | ||
649 | support. It makes sure that the risk of losing the bad block | ||
650 | table information is reduced to the loss of the information about the | ||
651 | one worn out block which should be marked bad. The version is stored in | ||
652 | 4 consecutive bytes in the spare area of the device. The position of | ||
653 | the version number is defined by the member veroffs in the bad block table | ||
654 | descriptor.</para></listitem> | ||
655 | <listitem><para>Save block contents on write</para> | ||
656 | <para> | ||
657 | If the block which holds the bad block table also contains | ||
658 | other useful information, set the option NAND_BBT_SAVECONTENT. When | ||
659 | the bad block table is written, the whole block is read, the bad | ||
660 | block table is updated, the block is erased and everything is | ||
661 | written back. If this option is not set, only the bad block table | ||
662 | is written and everything else in the block is ignored and erased. | ||
663 | </para></listitem> | ||
664 | <listitem><para>Number of reserved blocks</para> | ||
665 | <para> | ||
666 | For automatic placement some blocks must be reserved for | ||
667 | bad block table storage. The number of reserved blocks is defined | ||
668 | in the maxblocks member of the bad block table description structure. | ||
669 | Reserving 4 blocks for mirrored tables should be a reasonable number. | ||
670 | This also limits the number of blocks which are scanned for the bad | ||
671 | block table ident pattern. | ||
672 | </para></listitem> | ||
673 | </itemizedlist> | ||
674 | </para> | ||
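<para>
The automatic detection and version control described above can be sketched as follows, assuming an in-memory copy of the spare area of the first page of each block. bbt_find_from_end() scans at most maxblocks blocks from the end of the chip (as with NAND_BBT_LASTBLOCK) for the ident pattern at offs/len; bbt_pick_main() prefers the copy with the higher version. Both functions are hypothetical, and the plain version comparison ignores counter wraparound.
</para>

```c
#include <stdint.h>
#include <string.h>

/*
 * oob[b] is the spare area of the first page of block b. Scan at most
 * maxblocks blocks, starting from the last block, for the ident pattern
 * stored at offset offs with length len. Returns the block or -1.
 */
static int bbt_find_from_end(const uint8_t *const *oob, int nblocks,
			     int maxblocks, const uint8_t *pattern,
			     int offs, int len)
{
	for (int i = 0; i < maxblocks && i < nblocks; i++) {
		int block = nblocks - 1 - i;

		if (memcmp(oob[block] + offs, pattern, (size_t)len) == 0)
			return block;
	}
	return -1;
}

/* Version control: returns 1 if the main table should be used. */
static int bbt_pick_main(uint8_t main_version, uint8_t mirror_version)
{
	return main_version >= mirror_version;
}
```

For mirrored tables the main and mirror copies use different patterns, so the scan is simply run once per pattern.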
675 | </sect2> | ||
676 | </sect1> | ||
677 | <sect1 id="Spare_area_placement"> | ||
678 | <title>Spare area (auto)placement</title> | ||
679 | <para> | ||
680 | The nand driver implements different possibilities for | ||
681 | placement of filesystem data in the spare area: | ||
682 | <itemizedlist> | ||
683 | <listitem><para>Placement defined by fs driver</para></listitem> | ||
684 | <listitem><para>Automatic placement</para></listitem> | ||
685 | </itemizedlist> | ||
686 | The default placement function is automatic placement. The | ||
687 | nand driver has built in default placement schemes for the | ||
688 | various chip types. If, due to hardware ECC functionality, the | ||
689 | default placement does not fit, then the board driver can | ||
690 | provide its own placement scheme. | ||
691 | </para> | ||
692 | <para> | ||
693 | File system drivers can provide their own placement scheme, which | ||
694 | is used instead of the default placement scheme. | ||
695 | </para> | ||
696 | <para> | ||
697 | Placement schemes are defined by a nand_oobinfo structure: | ||
698 | <programlisting> | ||
699 | struct nand_oobinfo { | ||
700 | int useecc; | ||
701 | int eccbytes; | ||
702 | int eccpos[24]; | ||
703 | int oobfree[8][2]; | ||
704 | }; | ||
705 | </programlisting> | ||
706 | <itemizedlist> | ||
707 | <listitem><para>useecc</para><para> | ||
708 | The useecc member controls the ecc and placement function. The header | ||
709 | file include/mtd/mtd-abi.h contains constants to select ecc and | ||
710 | placement. MTD_NANDECC_OFF switches ECC off completely. This is | ||
711 | not recommended and is intended for testing and diagnosis only. | ||
712 | MTD_NANDECC_PLACE selects caller defined placement, MTD_NANDECC_AUTOPLACE | ||
713 | selects automatic placement. | ||
714 | </para></listitem> | ||
715 | <listitem><para>eccbytes</para><para> | ||
716 | The eccbytes member defines the number of ecc bytes per page. | ||
717 | </para></listitem> | ||
718 | <listitem><para>eccpos</para><para> | ||
719 | The eccpos array holds the byte offsets in the spare area where | ||
720 | the ecc codes are placed. | ||
721 | </para></listitem> | ||
722 | <listitem><para>oobfree</para><para> | ||
723 | The oobfree array defines the areas in the spare area which can be | ||
724 | used for automatic placement. The information is given in the format | ||
725 | {offset, size}. offset defines the start of the usable area, size the | ||
726 | length in bytes. More than one area can be defined. The list is terminated | ||
727 | by an {0, 0} entry. | ||
728 | </para></listitem> | ||
729 | </itemizedlist> | ||
730 | </para> | ||
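<para>
The oobfree list is what makes automatic placement work: filesystem data is copied into the free regions in order, skipping the ECC bytes and the bad block marker. A minimal sketch, reusing the nand_oobinfo layout shown above; oob_autoplace() is a hypothetical helper, not the driver's implementation.
</para>

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the nand_oobinfo structure described above. */
struct nand_oobinfo {
	int useecc;
	int eccbytes;
	int eccpos[24];
	int oobfree[8][2];
};

/*
 * Scatter len bytes of filesystem data into the free regions of a spare
 * area buffer, walking oobfree until the {0, 0} terminator. Returns the
 * number of bytes placed (less than len if free space runs out).
 */
static size_t oob_autoplace(uint8_t *oob, const struct nand_oobinfo *info,
			    const uint8_t *data, size_t len)
{
	size_t done = 0;

	for (int i = 0; i < 8 && info->oobfree[i][1] != 0 && done < len; i++) {
		int offs = info->oobfree[i][0];
		size_t n = (size_t)info->oobfree[i][1];

		if (n > len - done)
			n = len - done;
		for (size_t j = 0; j < n; j++)
			oob[offs + j] = data[done + j];
		done += n;
	}
	return done;
}
```

With the 256 byte page default layout (oobfree {3, 2}, {6, 2}), four bytes of filesystem data land at offsets 0x03, 0x04, 0x06 and 0x07, leaving the bad block marker at 0x05 untouched.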
731 | <sect2 id="Placement_defined_by_fs_driver"> | ||
732 | <title>Placement defined by fs driver</title> | ||
733 | <para> | ||
734 | The calling function provides a pointer to a nand_oobinfo | ||
735 | structure which defines the ecc placement. For writes the | ||
736 | caller must provide a spare area buffer along with the | ||
737 | data buffer. The spare area buffer size is (number of pages) * | ||
738 | (size of spare area). For reads the buffer size is | ||
739 | (number of pages) * ((size of spare area) + (number of ecc | ||
740 | steps per page) * sizeof (int)). The driver stores the | ||
741 | result of the ecc check for each tuple in the spare buffer. | ||
742 | The storage sequence is | ||
743 | </para> | ||
744 | <para> | ||
745 | <spare data page 0><ecc result 0>...<ecc result n> | ||
746 | </para> | ||
747 | <para> | ||
748 | ... | ||
749 | </para> | ||
750 | <para> | ||
751 | <spare data page n><ecc result 0>...<ecc result n> | ||
752 | </para> | ||
753 | <para> | ||
754 | This is a legacy mode used by YAFFS1. | ||
755 | </para> | ||
756 | <para> | ||
757 | If the spare area buffer is NULL then only the ECC placement is | ||
758 | done according to the given scheme in the nand_oobinfo structure. | ||
759 | </para> | ||
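<para>
The buffer size rules above are plain arithmetic. For example, reading 4 pages of a chip with a 16 byte spare area and 2 ECC steps per page (512 byte pages with 256 byte steps) needs 4 * (16 + 2 * sizeof(int)) bytes, which is 96 bytes where int is 4 bytes. The helpers below are hypothetical names for these two formulas.
</para>

```c
#include <stddef.h>

/* Write: one full spare area per page. */
static size_t fs_oob_write_size(size_t pages, size_t oobsize)
{
	return pages * oobsize;
}

/* Read: each page's spare data is followed by one int of ECC result
 * per ECC step. */
static size_t fs_oob_read_size(size_t pages, size_t oobsize, size_t eccsteps)
{
	return pages * (oobsize + eccsteps * sizeof(int));
}
```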
760 | </sect2> | ||
761 | <sect2 id="Automatic_placement"> | ||
762 | <title>Automatic placement</title> | ||
763 | <para> | ||
764 | Automatic placement uses the built-in defaults to place the | ||
765 | ecc bytes in the spare area. If filesystem data have to be stored / | ||
766 | read into the spare area then the calling function must provide a | ||
767 | buffer. The buffer size per page is determined by the oobfree array in | ||
768 | the nand_oobinfo structure. | ||
769 | </para> | ||
770 | <para> | ||
771 | If the spare area buffer is NULL then only the ECC placement is | ||
772 | done according to the default built-in scheme. | ||
773 | </para> | ||
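<para>
The per-page buffer size for automatic placement is the sum of the sizes in the oobfree array up to the {0, 0} terminator. A minimal sketch, repeating the nand_oobinfo structure from above so it compiles on its own; oob_free_bytes() is a hypothetical helper. Applied to the default layouts listed below, it gives 4 bytes for 256 byte pages, 8 for 512 byte pages and 38 for 2048 byte pages.
</para>

```c
/* Same layout as the nand_oobinfo structure described above. */
struct nand_oobinfo {
	int useecc;
	int eccbytes;
	int eccpos[24];
	int oobfree[8][2];
};

/* Sum the sizes of the free regions until the {0, 0} terminator. */
static int oob_free_bytes(const struct nand_oobinfo *info)
{
	int total = 0;

	for (int i = 0; i < 8 && info->oobfree[i][1] != 0; i++)
		total += info->oobfree[i][1];
	return total;
}
```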
774 | </sect2> | ||
775 | </sect1> | ||
776 | <sect1 id="Spare_area_autoplacement_default"> | ||
777 | <title>Spare area autoplacement default schemes</title> | ||
778 | <sect2 id="pagesize_256"> | ||
779 | <title>256 byte pagesize</title> | ||
780 | <informaltable><tgroup cols="3"><tbody> | ||
781 | <row> | ||
782 | <entry>Offset</entry> | ||
783 | <entry>Content</entry> | ||
784 | <entry>Comment</entry> | ||
785 | </row> | ||
786 | <row> | ||
787 | <entry>0x00</entry> | ||
788 | <entry>ECC byte 0</entry> | ||
789 | <entry>Error correction code byte 0</entry> | ||
790 | </row> | ||
791 | <row> | ||
792 | <entry>0x01</entry> | ||
793 | <entry>ECC byte 1</entry> | ||
794 | <entry>Error correction code byte 1</entry> | ||
795 | </row> | ||
796 | <row> | ||
797 | <entry>0x02</entry> | ||
798 | <entry>ECC byte 2</entry> | ||
799 | <entry>Error correction code byte 2</entry> | ||
800 | </row> | ||
801 | <row> | ||
802 | <entry>0x03</entry> | ||
803 | <entry>Autoplace 0</entry> | ||
804 | <entry></entry> | ||
805 | </row> | ||
806 | <row> | ||
807 | <entry>0x04</entry> | ||
808 | <entry>Autoplace 1</entry> | ||
809 | <entry></entry> | ||
810 | </row> | ||
811 | <row> | ||
812 | <entry>0x05</entry> | ||
813 | <entry>Bad block marker</entry> | ||
814 | <entry>If any bit in this byte is zero, then this block is bad. | ||
815 | This applies only to the first page in a block. In the remaining | ||
816 | pages this byte is reserved.</entry> | ||
817 | </row> | ||
818 | <row> | ||
819 | <entry>0x06</entry> | ||
820 | <entry>Autoplace 2</entry> | ||
821 | <entry></entry> | ||
822 | </row> | ||
823 | <row> | ||
824 | <entry>0x07</entry> | ||
825 | <entry>Autoplace 3</entry> | ||
826 | <entry></entry> | ||
827 | </row> | ||
828 | </tbody></tgroup></informaltable> | ||
829 | </sect2> | ||
830 | <sect2 id="pagesize_512"> | ||
831 | <title>512 byte pagesize</title> | ||
832 | <informaltable><tgroup cols="3"><tbody> | ||
833 | <row> | ||
834 | <entry>Offset</entry> | ||
835 | <entry>Content</entry> | ||
836 | <entry>Comment</entry> | ||
837 | </row> | ||
838 | <row> | ||
839 | <entry>0x00</entry> | ||
840 | <entry>ECC byte 0</entry> | ||
841 | <entry>Error correction code byte 0 of the lower 256 Bytes of data in | ||
842 | this page</entry> | ||
843 | </row> | ||
844 | <row> | ||
845 | <entry>0x01</entry> | ||
846 | <entry>ECC byte 1</entry> | ||
847 | <entry>Error correction code byte 1 of the lower 256 Bytes of data | ||
848 | in this page</entry> | ||
849 | </row> | ||
850 | <row> | ||
851 | <entry>0x02</entry> | ||
852 | <entry>ECC byte 2</entry> | ||
853 | <entry>Error correction code byte 2 of the lower 256 Bytes of data | ||
854 | in this page</entry> | ||
855 | </row> | ||
856 | <row> | ||
857 | <entry>0x03</entry> | ||
858 | <entry>ECC byte 3</entry> | ||
859 | <entry>Error correction code byte 0 of the upper 256 Bytes of data | ||
860 | in this page</entry> | ||
861 | </row> | ||
862 | <row> | ||
863 | <entry>0x04</entry> | ||
864 | <entry>reserved</entry> | ||
865 | <entry>reserved</entry> | ||
866 | </row> | ||
867 | <row> | ||
868 | <entry>0x05</entry> | ||
869 | <entry>Bad block marker</entry> | ||
870 | <entry>If any bit in this byte is zero, then this block is bad. | ||
871 | This applies only to the first page in a block. In the remaining | ||
872 | pages this byte is reserved.</entry> | ||
873 | </row> | ||
874 | <row> | ||
875 | <entry>0x06</entry> | ||
876 | <entry>ECC byte 4</entry> | ||
877 | <entry>Error correction code byte 1 of the upper 256 Bytes of data | ||
878 | in this page</entry> | ||
879 | </row> | ||
880 | <row> | ||
881 | <entry>0x07</entry> | ||
882 | <entry>ECC byte 5</entry> | ||
883 | <entry>Error correction code byte 2 of the upper 256 Bytes of data | ||
884 | in this page</entry> | ||
885 | </row> | ||
886 | <row> | ||
887 | <entry>0x08 - 0x0F</entry> | ||
888 | <entry>Autoplace 0 - 7</entry> | ||
889 | <entry></entry> | ||
890 | </row> | ||
891 | </tbody></tgroup></informaltable> | ||
892 | </sect2> | ||
893 | <sect2 id="pagesize_2048"> | ||
894 | <title>2048 byte pagesize</title> | ||
895 | <informaltable><tgroup cols="3"><tbody> | ||
896 | <row> | ||
897 | <entry>Offset</entry> | ||
898 | <entry>Content</entry> | ||
899 | <entry>Comment</entry> | ||
900 | </row> | ||
901 | <row> | ||
902 | <entry>0x00</entry> | ||
903 | <entry>Bad block marker</entry> | ||
904 | <entry>If any bit in this byte is zero, then this block is bad. | ||
905 | This applies only to the first page in a block. In the remaining | ||
906 | pages this byte is reserved.</entry> | ||
907 | </row> | ||
908 | <row> | ||
909 | <entry>0x01</entry> | ||
910 | <entry>Reserved</entry> | ||
911 | <entry>Reserved</entry> | ||
912 | </row> | ||
913 | <row> | ||
914 | <entry>0x02-0x27</entry> | ||
915 | <entry>Autoplace 0 - 37</entry> | ||
916 | <entry></entry> | ||
917 | </row> | ||
918 | <row> | ||
919 | <entry>0x28</entry> | ||
920 | <entry>ECC byte 0</entry> | ||
921 | <entry>Error correction code byte 0 of the first 256 Bytes of data in | ||
922 | this page</entry> | ||
923 | </row> | ||
924 | <row> | ||
925 | <entry>0x29</entry> | ||
926 | <entry>ECC byte 1</entry> | ||
927 | <entry>Error correction code byte 1 of the first 256 Bytes of data | ||
928 | in this page</entry> | ||
929 | </row> | ||
930 | <row> | ||
931 | <entry>0x2A</entry> | ||
932 | <entry>ECC byte 2</entry> | ||
933 | <entry>Error correction code byte 2 of the first 256 Bytes of data in | ||
934 | this page</entry> | ||
935 | </row> | ||
936 | <row> | ||
937 | <entry>0x2B</entry> | ||
938 | <entry>ECC byte 3</entry> | ||
939 | <entry>Error correction code byte 0 of the second 256 Bytes of data | ||
940 | in this page</entry> | ||
941 | </row> | ||
942 | <row> | ||
943 | <entry>0x2C</entry> | ||
944 | <entry>ECC byte 4</entry> | ||
945 | <entry>Error correction code byte 1 of the second 256 Bytes of data | ||
946 | in this page</entry> | ||
947 | </row> | ||
948 | <row> | ||
949 | <entry>0x2D</entry> | ||
950 | <entry>ECC byte 5</entry> | ||
951 | <entry>Error correction code byte 2 of the second 256 Bytes of data | ||
952 | in this page</entry> | ||
953 | </row> | ||
954 | <row> | ||
955 | <entry>0x2E</entry> | ||
956 | <entry>ECC byte 6</entry> | ||
957 | <entry>Error correction code byte 0 of the third 256 Bytes of data | ||
958 | in this page</entry> | ||
959 | </row> | ||
960 | <row> | ||
961 | <entry>0x2F</entry> | ||
962 | <entry>ECC byte 7</entry> | ||
963 | <entry>Error correction code byte 1 of the third 256 Bytes of data | ||
964 | in this page</entry> | ||
965 | </row> | ||
966 | <row> | ||
967 | <entry>0x30</entry> | ||
968 | <entry>ECC byte 8</entry> | ||
969 | <entry>Error correction code byte 2 of the third 256 Bytes of data | ||
970 | in this page</entry> | ||
971 | </row> | ||
972 | <row> | ||
973 | <entry>0x31</entry> | ||
974 | <entry>ECC byte 9</entry> | ||
975 | <entry>Error correction code byte 0 of the fourth 256 Bytes of data | ||
976 | in this page</entry> | ||
977 | </row> | ||
978 | <row> | ||
979 | <entry>0x32</entry> | ||
980 | <entry>ECC byte 10</entry> | ||
981 | <entry>Error correction code byte 1 of the fourth 256 Bytes of data | ||
982 | in this page</entry> | ||
983 | </row> | ||
984 | <row> | ||
985 | <entry>0x33</entry> | ||
986 | <entry>ECC byte 11</entry> | ||
987 | <entry>Error correction code byte 2 of the fourth 256 Bytes of data | ||
988 | in this page</entry> | ||
989 | </row> | ||
990 | <row> | ||
991 | <entry>0x34</entry> | ||
992 | <entry>ECC byte 12</entry> | ||
993 | <entry>Error correction code byte 0 of the fifth 256 Bytes of data | ||
994 | in this page</entry> | ||
995 | </row> | ||
996 | <row> | ||
997 | <entry>0x35</entry> | ||
998 | <entry>ECC byte 13</entry> | ||
999 | <entry>Error correction code byte 1 of the fifth 256 Bytes of data | ||
1000 | in this page</entry> | ||
1001 | </row> | ||
1002 | <row> | ||
1003 | <entry>0x36</entry> | ||
1004 | <entry>ECC byte 14</entry> | ||
1005 | <entry>Error correction code byte 2 of the fifth 256 Bytes of data | ||
1006 | in this page</entry> | ||
1007 | </row> | ||
1008 | <row> | ||
1009 | <entry>0x37</entry> | ||
1010 | <entry>ECC byte 15</entry> | ||
1011 | <entry>Error correction code byte 0 of the sixth 256 Bytes of data | ||
1012 | in this page</entry> | ||
1013 | </row> | ||
1014 | <row> | ||
1015 | <entry>0x38</entry> | ||
1016 | <entry>ECC byte 16</entry> | ||
1017 | <entry>Error correction code byte 1 of the sixth 256 Bytes of data | ||
1018 | in this page</entry> | ||
1019 | </row> | ||
1020 | <row> | ||
1021 | <entry>0x39</entry> | ||
1022 | <entry>ECC byte 17</entry> | ||
1023 | <entry>Error correction code byte 2 of the sixth 256 Bytes of data | ||
1024 | in this page</entry> | ||
1025 | </row> | ||
1026 | <row> | ||
1027 | <entry>0x3A</entry> | ||
1028 | <entry>ECC byte 18</entry> | ||
1029 | <entry>Error correction code byte 0 of the seventh 256 Bytes of | ||
1030 | data in this page</entry> | ||
1031 | </row> | ||
1032 | <row> | ||
1033 | <entry>0x3B</entry> | ||
1034 | <entry>ECC byte 19</entry> | ||
1035 | <entry>Error correction code byte 1 of the seventh 256 Bytes of | ||
1036 | data in this page</entry> | ||
1037 | </row> | ||
1038 | <row> | ||
1039 | <entry>0x3C</entry> | ||
1040 | <entry>ECC byte 20</entry> | ||
1041 | <entry>Error correction code byte 2 of the seventh 256 Bytes of | ||
1042 | data in this page</entry> | ||
1043 | </row> | ||
1044 | <row> | ||
1045 | <entry>0x3D</entry> | ||
1046 | <entry>ECC byte 21</entry> | ||
1047 | <entry>Error correction code byte 0 of the eighth 256 Bytes of data | ||
1048 | in this page</entry> | ||
1049 | </row> | ||
1050 | <row> | ||
1051 | <entry>0x3E</entry> | ||
1052 | <entry>ECC byte 22</entry> | ||
1053 | <entry>Error correction code byte 1 of the eighth 256 Bytes of data | ||
1054 | in this page</entry> | ||
1055 | </row> | ||
1056 | <row> | ||
1057 | <entry>0x3F</entry> | ||
1058 | <entry>ECC byte 23</entry> | ||
1059 | <entry>Error correction code byte 2 of the eighth 256 Bytes of data | ||
1060 | in this page</entry> | ||
1061 | </row> | ||
1062 | </tbody></tgroup></informaltable> | ||
1063 | </sect2> | ||
1064 | </sect1> | ||
1065 | </chapter> | ||
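The ECC rows above follow a regular pattern. There are three ECC bytes per 256 bytes of data, and they are stored consecutively in the spare area. The C helper below is a minimal sketch of that mapping, not code from the NAND driver. It takes an ECC byte index and returns its spare-area offset and the data chunk it protects.

It rests on these assumptions:

- A 2048-byte page holds eight 256-byte chunks, which gives 24 ECC bytes.
- ECC byte 0 sits at offset 0x28. This is extrapolated from the "0x30 = ECC byte 8" row.
- The names `ecc_byte_pos`, `struct ecc_pos` and the `ECC_*` macros are hypothetical.

```c
/*
 * Sketch: map an ECC byte index (0..23) of the 64-byte spare area layout
 * to its spare-area offset and to the 256-byte chunk it protects.
 * Assumption: ECC byte 0 is at offset 0x28 (0x30 - 8), extending the
 * pattern of the table rows 0x30 (ECC byte 8) .. 0x3F (ECC byte 23).
 */
#define ECC_BASE_OFFSET   0x28  /* hypothetical, extrapolated from the table */
#define ECC_BYTES_PER_256 3     /* 3 ECC bytes per 256 bytes of data */
#define ECC_TOTAL_BYTES   24    /* 8 chunks * 3 bytes for a 2048-byte page */

struct ecc_pos {
	unsigned int oob_offset;    /* byte offset within the spare area */
	unsigned int chunk;         /* 0-based index of the 256-byte chunk */
	unsigned int byte_in_chunk; /* 0, 1 or 2 */
};

/* Returns 0 on success, -1 if eccbyte is out of range. */
static int ecc_byte_pos(unsigned int eccbyte, struct ecc_pos *pos)
{
	if (eccbyte >= ECC_TOTAL_BYTES)
		return -1;
	pos->oob_offset = ECC_BASE_OFFSET + eccbyte;
	pos->chunk = eccbyte / ECC_BYTES_PER_256;
	pos->byte_in_chunk = eccbyte % ECC_BYTES_PER_256;
	return 0;
}
```

For example, index 9 yields offset 0x31, chunk 3 (the fourth 256 bytes) and byte 0, which matches the 0x31 row of the table.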
1066 | |||
1067 | <chapter id="filesystems"> | ||
1068 | <title>Filesystem support</title> | ||
1069 | <para> | ||
1070 | The NAND driver provides all necessary functions for a | ||
1071 | filesystem via the MTD interface. | ||
1072 | </para> | ||
1073 | <para> | ||
1074 | Filesystems must be aware of the NAND peculiarities and | ||
1075 | restrictions. One major restriction of NAND Flash is that you cannot | ||
1076 | write to a page as often as you want. The number of consecutive writes | ||
1077 | to a page before erasing it again is limited to 1-3, depending on the | ||
1078 | manufacturer's specifications. The same applies to the spare area. | ||
1079 | </para> | ||
1080 | <para> | ||
1081 | Therefore NAND aware filesystems must either write in page size chunks | ||
1082 | or hold a write buffer to collect smaller writes until they sum up to | ||
1083 | the page size. Available NAND aware filesystems: JFFS2, YAFFS. | ||
1084 | </para> | ||
1085 | <para> | ||
1086 | The spare area usage to store filesystem data is controlled by | ||
1087 | the spare area placement functionality which is described in one | ||
1088 | of the earlier chapters. | ||
1089 | </para> | ||
1090 | </chapter> | ||
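The write-buffer approach described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not JFFS2 or YAFFS code. The 512-byte page size, `struct wbuf` and the optional `flush` callback are all hypothetical.

Small writes are copied into the buffer, and only complete pages are handed on. As a result, each page is programmed once between erases. A real filesystem would also need a way to pad and flush a partial page, for example on sync; that step is omitted here.

```c
#include <stddef.h>
#include <string.h>

#define WBUF_PAGE_SIZE 512  /* hypothetical page size */

struct wbuf {
	unsigned char data[WBUF_PAGE_SIZE];
	size_t fill;                 /* bytes currently buffered */
	unsigned int pages_written;  /* full pages handed on so far */
	/* optional hook that programs one full page; may be NULL */
	void (*flush)(const unsigned char *page, void *ctx);
	void *ctx;
};

/* Append len bytes; each time the buffer fills, emit exactly one page. */
static void wbuf_write(struct wbuf *wb, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len > 0) {
		size_t room = WBUF_PAGE_SIZE - wb->fill;
		size_t n = len < room ? len : room;

		memcpy(wb->data + wb->fill, p, n);
		wb->fill += n;
		p += n;
		len -= n;

		if (wb->fill == WBUF_PAGE_SIZE) {
			if (wb->flush)
				wb->flush(wb->data, wb->ctx);
			wb->pages_written++;
			wb->fill = 0;
		}
	}
}
```

For example, three 200-byte writes produce one full page and leave 88 bytes buffered.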
1091 | <chapter id="tools"> | ||
1092 | <title>Tools</title> | ||
1093 | <para> | ||
1094 | The MTD project provides a couple of helpful tools to handle NAND Flash. | ||
1095 | <itemizedlist> | ||
1096 | <listitem><para>flasherase, flasheraseall: Erase and format FLASH partitions</para></listitem> | ||
1097 | <listitem><para>nandwrite: write filesystem images to NAND FLASH</para></listitem> | ||
1098 | <listitem><para>nanddump: dump the contents of a NAND FLASH partition</para></listitem> | ||
1099 | </itemizedlist> | ||
1100 | </para> | ||
1101 | <para> | ||
1102 | These tools are aware of the NAND restrictions. Please use those tools | ||
1103 | instead of complaining about errors which are caused by non-NAND-aware | ||
1104 | access methods. | ||
1105 | </para> | ||
1106 | </chapter> | ||
1107 | |||
1108 | <chapter id="defines"> | ||
1109 | <title>Constants</title> | ||
1110 | <para> | ||
1111 | This chapter describes the constants which might be relevant for a driver developer. | ||
1112 | </para> | ||
1113 | <sect1 id="Chip_option_constants"> | ||
1114 | <title>Chip option constants</title> | ||
1115 | <sect2 id="Constants_for_chip_id_table"> | ||
1116 | <title>Constants for chip id table</title> | ||
1117 | <para> | ||
1118 | These constants are defined in nand.h. They are OR-ed together to describe | ||
1119 | the chip functionality. | ||
1120 | <programlisting> | ||
1121 | /* Buswidth is 16 bit */ | ||
1122 | #define NAND_BUSWIDTH_16 0x00000002 | ||
1123 | /* Device supports partial programming without padding */ | ||
1124 | #define NAND_NO_PADDING 0x00000004 | ||
1125 | /* Chip has cache program function */ | ||
1126 | #define NAND_CACHEPRG 0x00000008 | ||
1127 | /* Chip has copy back function */ | ||
1128 | #define NAND_COPYBACK 0x00000010 | ||
1129 | /* AND Chip which has 4 banks and a confusing page / block | ||
1130 | * assignment. See Renesas datasheet for further information */ | ||
1131 | #define NAND_IS_AND 0x00000020 | ||
1132 | /* Chip has an array of 4 pages which can be read without | ||
1133 | * additional ready/busy waits */ | ||
1134 | #define NAND_4PAGE_ARRAY 0x00000040 | ||
1135 | </programlisting> | ||
1136 | </para> | ||
1137 | </sect2> | ||
1138 | <sect2 id="Constants_for_runtime_options"> | ||
1139 | <title>Constants for runtime options</title> | ||
1140 | <para> | ||
1141 | These constants are defined in nand.h. They are OR-ed together to describe | ||
1142 | the functionality. | ||
1143 | <programlisting> | ||
1144 | /* The hw ecc generator provides a syndrome instead of an ecc value on read. | ||
1145 | * This can only work if we have the ecc bytes directly behind the | ||
1146 | * data bytes. Applies for DOC and AG-AND Renesas HW Reed Solomon generators */ | ||
1147 | #define NAND_HWECC_SYNDROME 0x00020000 | ||
1148 | </programlisting> | ||
1149 | </para> | ||
1150 | </sect2> | ||
1151 | </sect1> | ||
1152 | |||
1153 | <sect1 id="EEC_selection_constants"> | ||
1154 | <title>ECC selection constants</title> | ||
1155 | <para> | ||
1156 | Use these constants to select the ECC algorithm. | ||
1157 | <programlisting> | ||
1158 | /* No ECC. Usage is not recommended ! */ | ||
1159 | #define NAND_ECC_NONE 0 | ||
1160 | /* Software ECC 3 byte ECC per 256 Byte data */ | ||
1161 | #define NAND_ECC_SOFT 1 | ||
1162 | /* Hardware ECC 3 byte ECC per 256 Byte data */ | ||
1163 | #define NAND_ECC_HW3_256 2 | ||
1164 | /* Hardware ECC 3 byte ECC per 512 Byte data */ | ||
1165 | #define NAND_ECC_HW3_512 3 | ||
1166 | /* Hardware ECC 6 byte ECC per 512 Byte data */ | ||
1167 | #define NAND_ECC_HW6_512 4 | ||
1168 | /* Hardware ECC 8 byte ECC per 512 Byte data */ | ||
1169 | #define NAND_ECC_HW8_512 6 | ||
1170 | </programlisting> | ||
1171 | </para> | ||
1172 | </sect1> | ||
1173 | |||
1174 | <sect1 id="Hardware_control_related_constants"> | ||
1175 | <title>Hardware control related constants</title> | ||
1176 | <para> | ||
1177 | These constants describe the requested hardware access function when | ||
1178 | the board-specific hardware control function is called. | ||
1179 | <programlisting> | ||
1180 | /* Select the chip by setting nCE to low */ | ||
1181 | #define NAND_CTL_SETNCE 1 | ||
1182 | /* Deselect the chip by setting nCE to high */ | ||
1183 | #define NAND_CTL_CLRNCE 2 | ||
1184 | /* Select the command latch by setting CLE to high */ | ||
1185 | #define NAND_CTL_SETCLE 3 | ||
1186 | /* Deselect the command latch by setting CLE to low */ | ||
1187 | #define NAND_CTL_CLRCLE 4 | ||
1188 | /* Select the address latch by setting ALE to high */ | ||
1189 | #define NAND_CTL_SETALE 5 | ||
1190 | /* Deselect the address latch by setting ALE to low */ | ||
1191 | #define NAND_CTL_CLRALE 6 | ||
1192 | /* Set write protection by setting WP to high. Not used! */ | ||
1193 | #define NAND_CTL_SETWP 7 | ||
1194 | /* Clear write protection by setting WP to low. Not used! */ | ||
1195 | #define NAND_CTL_CLRWP 8 | ||
1196 | </programlisting> | ||
1197 | </para> | ||
1198 | </sect1> | ||
1199 | |||
1200 | <sect1 id="Bad_block_table_constants"> | ||
1201 | <title>Bad block table related constants</title> | ||
1202 | <para> | ||
1203 | These constants describe the options used for bad block | ||
1204 | table descriptors. | ||
1205 | <programlisting> | ||
1206 | /* Options for the bad block table descriptors */ | ||
1207 | |||
1208 | /* The number of bits used per block in the bbt on the device */ | ||
1209 | #define NAND_BBT_NRBITS_MSK 0x0000000F | ||
1210 | #define NAND_BBT_1BIT 0x00000001 | ||
1211 | #define NAND_BBT_2BIT 0x00000002 | ||
1212 | #define NAND_BBT_4BIT 0x00000004 | ||
1213 | #define NAND_BBT_8BIT 0x00000008 | ||
1214 | /* The bad block table is in the last good block of the device */ | ||
1215 | #define NAND_BBT_LASTBLOCK 0x00000010 | ||
1216 | /* The bbt is at the given page, else we must scan for the bbt */ | ||
1217 | #define NAND_BBT_ABSPAGE 0x00000020 | ||
1218 | /* bbt is stored per chip on multichip devices */ | ||
1219 | #define NAND_BBT_PERCHIP 0x00000080 | ||
1220 | /* bbt has a version counter at offset veroffs */ | ||
1221 | #define NAND_BBT_VERSION 0x00000100 | ||
1222 | /* Create a bbt if none exists */ | ||
1223 | #define NAND_BBT_CREATE 0x00000200 | ||
1224 | /* Write bbt if necessary */ | ||
1225 | #define NAND_BBT_WRITE 0x00001000 | ||
1226 | /* Read and write back block contents when writing bbt */ | ||
1227 | #define NAND_BBT_SAVECONTENT 0x00002000 | ||
1228 | </programlisting> | ||
1229 | </para> | ||
1230 | </sect1> | ||
1231 | |||
1232 | </chapter> | ||
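The bad block table options are OR-ed into a single word, so the number of bits used per block can be recovered with `NAND_BBT_NRBITS_MSK`. The sketch below reuses the flag values from the listing above to show how the flags compose and how the table size follows from the bit width. The `EXAMPLE_BBT_OPTIONS` combination and the two helper functions are hypothetical.

```c
/* Flag values copied from the bad block table listing above. */
#define NAND_BBT_NRBITS_MSK 0x0000000F
#define NAND_BBT_2BIT       0x00000002
#define NAND_BBT_LASTBLOCK  0x00000010
#define NAND_BBT_PERCHIP    0x00000080
#define NAND_BBT_VERSION    0x00000100
#define NAND_BBT_CREATE     0x00000200
#define NAND_BBT_WRITE      0x00001000

/*
 * Hypothetical descriptor options:
 *  - 2 bits per block
 *  - table kept in the last good block
 *  - one table per chip
 *  - versioned
 *  - created and written if missing
 */
#define EXAMPLE_BBT_OPTIONS (NAND_BBT_2BIT | NAND_BBT_LASTBLOCK | \
			     NAND_BBT_PERCHIP | NAND_BBT_VERSION | \
			     NAND_BBT_CREATE | NAND_BBT_WRITE)

/* Bits stored per erase block, taken from the low nibble of options. */
static unsigned int bbt_bits_per_block(unsigned int options)
{
	return options & NAND_BBT_NRBITS_MSK;
}

/* Bytes needed to store the table for nblocks erase blocks. */
static unsigned long bbt_table_bytes(unsigned int options,
				     unsigned long nblocks)
{
	unsigned long bits = (unsigned long)bbt_bits_per_block(options) * nblocks;

	return (bits + 7) / 8;  /* round up to whole bytes */
}
```

With 2 bits per block, a device with 4096 erase blocks needs a 1024-byte table.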
1233 | |||
1234 | <chapter id="structs"> | ||
1235 | <title>Structures</title> | ||
1236 | <para> | ||
1237 | This chapter contains the autogenerated documentation of the structures which are | ||
1238 | used in the NAND driver and might be relevant for a driver developer. Each | ||
1239 | struct member has a short description which is marked with an [XXX] identifier. | ||
1240 | See the chapter "Documentation hints" for an explanation. | ||
1241 | </para> | ||
1242 | !Iinclude/linux/mtd/nand.h | ||
1243 | </chapter> | ||
1244 | |||
1245 | <chapter id="pubfunctions"> | ||
1246 | <title>Public Functions Provided</title> | ||
1247 | <para> | ||
1248 | This chapter contains the autogenerated documentation of the NAND kernel API functions | ||
1249 | which are exported. Each function has a short description which is marked with an [XXX] identifier. | ||
1250 | See the chapter "Documentation hints" for an explanation. | ||
1251 | </para> | ||
1252 | !Edrivers/mtd/nand/nand_base.c | ||
1253 | !Edrivers/mtd/nand/nand_bbt.c | ||
1254 | !Edrivers/mtd/nand/nand_ecc.c | ||
1255 | </chapter> | ||
1256 | |||
1257 | <chapter id="intfunctions"> | ||
1258 | <title>Internal Functions Provided</title> | ||
1259 | <para> | ||
1260 | This chapter contains the autogenerated documentation of the NAND driver internal functions. | ||
1261 | Each function has a short description which is marked with an [XXX] identifier. | ||
1262 | See the chapter "Documentation hints" for an explanation. | ||
1263 | The functions marked with [DEFAULT] might be relevant for a board driver developer. | ||
1264 | </para> | ||
1265 | !Idrivers/mtd/nand/nand_base.c | ||
1266 | !Idrivers/mtd/nand/nand_bbt.c | ||
1267 | <!-- No internal functions for kernel-doc: | ||
1268 | X!Idrivers/mtd/nand/nand_ecc.c | ||
1269 | --> | ||
1270 | </chapter> | ||
1271 | |||
1272 | <chapter id="credits"> | ||
1273 | <title>Credits</title> | ||
1274 | <para> | ||
1275 | The following people have contributed to the NAND driver: | ||
1276 | <orderedlist> | ||
1277 | <listitem><para>Steven J. Hill<email>sjhill@realitydiluted.com</email></para></listitem> | ||
1278 | <listitem><para>David Woodhouse<email>dwmw2@infradead.org</email></para></listitem> | ||
1279 | <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem> | ||
1280 | </orderedlist> | ||
1281 | A lot of users have provided bugfixes, improvements and helping hands for testing. | ||
1282 | Thanks a lot. | ||
1283 | </para> | ||
1284 | <para> | ||
1285 | The following people have contributed to this document: | ||
1286 | <orderedlist> | ||
1287 | <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem> | ||
1288 | </orderedlist> | ||
1289 | </para> | ||
1290 | </chapter> | ||
1291 | </book> | ||
diff --git a/Documentation/DocBook/networking.tmpl b/Documentation/DocBook/networking.tmpl deleted file mode 100644 index 29df25016c7c..000000000000 --- a/Documentation/DocBook/networking.tmpl +++ /dev/null | |||
@@ -1,111 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="LinuxNetworking"> | ||
6 | <bookinfo> | ||
7 | <title>Linux Networking and Network Devices APIs</title> | ||
8 | |||
9 | <legalnotice> | ||
10 | <para> | ||
11 | This documentation is free software; you can redistribute | ||
12 | it and/or modify it under the terms of the GNU General Public | ||
13 | License as published by the Free Software Foundation; either | ||
14 | version 2 of the License, or (at your option) any later | ||
15 | version. | ||
16 | </para> | ||
17 | |||
18 | <para> | ||
19 | This program is distributed in the hope that it will be | ||
20 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
21 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
22 | See the GNU General Public License for more details. | ||
23 | </para> | ||
24 | |||
25 | <para> | ||
26 | You should have received a copy of the GNU General Public | ||
27 | License along with this program; if not, write to the Free | ||
28 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
29 | MA 02111-1307 USA | ||
30 | </para> | ||
31 | |||
32 | <para> | ||
33 | For more details see the file COPYING in the source | ||
34 | distribution of Linux. | ||
35 | </para> | ||
36 | </legalnotice> | ||
37 | </bookinfo> | ||
38 | |||
39 | <toc></toc> | ||
40 | |||
41 | <chapter id="netcore"> | ||
42 | <title>Linux Networking</title> | ||
43 | <sect1><title>Networking Base Types</title> | ||
44 | !Iinclude/linux/net.h | ||
45 | </sect1> | ||
46 | <sect1><title>Socket Buffer Functions</title> | ||
47 | !Iinclude/linux/skbuff.h | ||
48 | !Iinclude/net/sock.h | ||
49 | !Enet/socket.c | ||
50 | !Enet/core/skbuff.c | ||
51 | !Enet/core/sock.c | ||
52 | !Enet/core/datagram.c | ||
53 | !Enet/core/stream.c | ||
54 | </sect1> | ||
55 | <sect1><title>Socket Filter</title> | ||
56 | !Enet/core/filter.c | ||
57 | </sect1> | ||
58 | <sect1><title>Generic Network Statistics</title> | ||
59 | !Iinclude/uapi/linux/gen_stats.h | ||
60 | !Enet/core/gen_stats.c | ||
61 | !Enet/core/gen_estimator.c | ||
62 | </sect1> | ||
63 | <sect1><title>SUN RPC subsystem</title> | ||
64 | <!-- The !D functionality is not perfect, garbage has to be protected by comments | ||
65 | !Dnet/sunrpc/sunrpc_syms.c | ||
66 | --> | ||
67 | !Enet/sunrpc/xdr.c | ||
68 | !Enet/sunrpc/svc_xprt.c | ||
69 | !Enet/sunrpc/xprt.c | ||
70 | !Enet/sunrpc/sched.c | ||
71 | !Enet/sunrpc/socklib.c | ||
72 | !Enet/sunrpc/stats.c | ||
73 | !Enet/sunrpc/rpc_pipe.c | ||
74 | !Enet/sunrpc/rpcb_clnt.c | ||
75 | !Enet/sunrpc/clnt.c | ||
76 | </sect1> | ||
77 | <sect1><title>WiMAX</title> | ||
78 | !Enet/wimax/op-msg.c | ||
79 | !Enet/wimax/op-reset.c | ||
80 | !Enet/wimax/op-rfkill.c | ||
81 | !Enet/wimax/stack.c | ||
82 | !Iinclude/net/wimax.h | ||
83 | !Iinclude/uapi/linux/wimax.h | ||
84 | </sect1> | ||
85 | </chapter> | ||
86 | |||
87 | <chapter id="netdev"> | ||
88 | <title>Network device support</title> | ||
89 | <sect1><title>Driver Support</title> | ||
90 | !Enet/core/dev.c | ||
91 | !Enet/ethernet/eth.c | ||
92 | !Enet/sched/sch_generic.c | ||
93 | !Iinclude/linux/etherdevice.h | ||
94 | !Iinclude/linux/netdevice.h | ||
95 | </sect1> | ||
96 | <sect1><title>PHY Support</title> | ||
97 | !Edrivers/net/phy/phy.c | ||
98 | !Idrivers/net/phy/phy.c | ||
99 | !Edrivers/net/phy/phy_device.c | ||
100 | !Idrivers/net/phy/phy_device.c | ||
101 | !Edrivers/net/phy/mdio_bus.c | ||
102 | !Idrivers/net/phy/mdio_bus.c | ||
103 | </sect1> | ||
104 | <!-- FIXME: Removed for now since no structured comments in source | ||
105 | <sect1><title>Wireless</title> | ||
106 | X!Enet/core/wireless.c | ||
107 | </sect1> | ||
108 | --> | ||
109 | </chapter> | ||
110 | |||
111 | </book> | ||
diff --git a/Documentation/DocBook/rapidio.tmpl b/Documentation/DocBook/rapidio.tmpl deleted file mode 100644 index ac3cca3399a1..000000000000 --- a/Documentation/DocBook/rapidio.tmpl +++ /dev/null | |||
@@ -1,155 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [ | ||
4 | <!ENTITY rapidio SYSTEM "rapidio.xml"> | ||
5 | ]> | ||
6 | |||
7 | <book id="RapidIO-Guide"> | ||
8 | <bookinfo> | ||
9 | <title>RapidIO Subsystem Guide</title> | ||
10 | |||
11 | <authorgroup> | ||
12 | <author> | ||
13 | <firstname>Matt</firstname> | ||
14 | <surname>Porter</surname> | ||
15 | <affiliation> | ||
16 | <address> | ||
17 | <email>mporter@kernel.crashing.org</email> | ||
18 | <email>mporter@mvista.com</email> | ||
19 | </address> | ||
20 | </affiliation> | ||
21 | </author> | ||
22 | </authorgroup> | ||
23 | |||
24 | <copyright> | ||
25 | <year>2005</year> | ||
26 | <holder>MontaVista Software, Inc.</holder> | ||
27 | </copyright> | ||
28 | |||
29 | <legalnotice> | ||
30 | <para> | ||
31 | This documentation is free software; you can redistribute | ||
32 | it and/or modify it under the terms of the GNU General Public | ||
33 | License version 2 as published by the Free Software Foundation. | ||
34 | </para> | ||
35 | |||
36 | <para> | ||
37 | This program is distributed in the hope that it will be | ||
38 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
39 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
40 | See the GNU General Public License for more details. | ||
41 | </para> | ||
42 | |||
43 | <para> | ||
44 | You should have received a copy of the GNU General Public | ||
45 | License along with this program; if not, write to the Free | ||
46 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
47 | MA 02111-1307 USA | ||
48 | </para> | ||
49 | |||
50 | <para> | ||
51 | For more details see the file COPYING in the source | ||
52 | distribution of Linux. | ||
53 | </para> | ||
54 | </legalnotice> | ||
55 | </bookinfo> | ||
56 | |||
57 | <toc></toc> | ||
58 | |||
59 | <chapter id="intro"> | ||
60 | <title>Introduction</title> | ||
61 | <para> | ||
62 | RapidIO is a high speed switched fabric interconnect with | ||
63 | features aimed at the embedded market. RapidIO provides | ||
64 | support for memory-mapped I/O as well as message-based | ||
65 | transactions over the switched fabric network. RapidIO has | ||
66 | a standardized discovery mechanism not unlike the PCI bus | ||
67 | standard that allows simple detection of devices in a | ||
68 | network. | ||
69 | </para> | ||
70 | <para> | ||
71 | This documentation is provided for developers intending | ||
72 | to support RapidIO on new architectures, write new drivers, | ||
73 | or to understand the subsystem internals. | ||
74 | </para> | ||
75 | </chapter> | ||
76 | |||
77 | <chapter id="bugs"> | ||
78 | <title>Known Bugs and Limitations</title> | ||
79 | |||
80 | <sect1 id="known_bugs"> | ||
81 | <title>Bugs</title> | ||
82 | <para>None. ;)</para> | ||
83 | </sect1> | ||
84 | <sect1 id="Limitations"> | ||
85 | <title>Limitations</title> | ||
86 | <para> | ||
87 | <orderedlist> | ||
88 | <listitem><para>Access/management of RapidIO memory regions is not supported</para></listitem> | ||
89 | <listitem><para>Multiple host enumeration is not supported</para></listitem> | ||
90 | </orderedlist> | ||
91 | </para> | ||
92 | </sect1> | ||
93 | </chapter> | ||
94 | |||
95 | <chapter id="drivers"> | ||
96 | <title>RapidIO driver interface</title> | ||
97 | <para> | ||
98 | Drivers are provided a set of calls in order | ||
99 | to interface with the subsystem to gather info | ||
100 | on devices, request/map memory region resources, | ||
101 | and manage mailboxes/doorbells. | ||
102 | </para> | ||
103 | <sect1 id="Functions"> | ||
104 | <title>Functions</title> | ||
105 | !Iinclude/linux/rio_drv.h | ||
106 | !Edrivers/rapidio/rio-driver.c | ||
107 | !Edrivers/rapidio/rio.c | ||
108 | </sect1> | ||
109 | </chapter> | ||
110 | |||
111 | <chapter id="internals"> | ||
112 | <title>Internals</title> | ||
113 | |||
114 | <para> | ||
115 | This chapter contains the autogenerated documentation of the RapidIO | ||
116 | subsystem. | ||
117 | </para> | ||
118 | |||
119 | <sect1 id="Structures"><title>Structures</title> | ||
120 | !Iinclude/linux/rio.h | ||
121 | </sect1> | ||
122 | <sect1 id="Enumeration_and_Discovery"><title>Enumeration and Discovery</title> | ||
123 | !Idrivers/rapidio/rio-scan.c | ||
124 | </sect1> | ||
125 | <sect1 id="Driver_functionality"><title>Driver functionality</title> | ||
126 | !Idrivers/rapidio/rio.c | ||
127 | !Idrivers/rapidio/rio-access.c | ||
128 | </sect1> | ||
129 | <sect1 id="Device_model_support"><title>Device model support</title> | ||
130 | !Idrivers/rapidio/rio-driver.c | ||
131 | </sect1> | ||
132 | <sect1 id="PPC32_support"><title>PPC32 support</title> | ||
133 | !Iarch/powerpc/sysdev/fsl_rio.c | ||
134 | </sect1> | ||
135 | </chapter> | ||
136 | |||
137 | <chapter id="credits"> | ||
138 | <title>Credits</title> | ||
139 | <para> | ||
140 | The following people have contributed to the RapidIO | ||
141 | subsystem directly or indirectly: | ||
142 | <orderedlist> | ||
143 | <listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem> | ||
144 | <listitem><para>Randy Vinson<email>rvinson@mvista.com</email></para></listitem> | ||
145 | <listitem><para>Dan Malek<email>dan@embeddedalley.com</email></para></listitem> | ||
146 | </orderedlist> | ||
147 | </para> | ||
148 | <para> | ||
149 | The following people have contributed to this document: | ||
150 | <orderedlist> | ||
151 | <listitem><para>Matt Porter<email>mporter@kernel.crashing.org</email></para></listitem> | ||
152 | </orderedlist> | ||
153 | </para> | ||
154 | </chapter> | ||
155 | </book> | ||
diff --git a/Documentation/DocBook/s390-drivers.tmpl b/Documentation/DocBook/s390-drivers.tmpl deleted file mode 100644 index 95bfc12e5439..000000000000 --- a/Documentation/DocBook/s390-drivers.tmpl +++ /dev/null | |||
@@ -1,161 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="s390drivers"> | ||
6 | <bookinfo> | ||
7 | <title>Writing s390 channel device drivers</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Cornelia</firstname> | ||
12 | <surname>Huck</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>cornelia.huck@de.ibm.com</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2007</year> | ||
23 | <holder>IBM Corp.</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License as published by the Free Software Foundation; either | ||
31 | version 2 of the License, or (at your option) any later | ||
32 | version. | ||
33 | </para> | ||
34 | |||
35 | <para> | ||
36 | This program is distributed in the hope that it will be | ||
37 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
38 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
39 | See the GNU General Public License for more details. | ||
40 | </para> | ||
41 | |||
42 | <para> | ||
43 | You should have received a copy of the GNU General Public | ||
44 | License along with this program; if not, write to the Free | ||
45 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
46 | MA 02111-1307 USA | ||
47 | </para> | ||
48 | |||
49 | <para> | ||
50 | For more details see the file COPYING in the source | ||
51 | distribution of Linux. | ||
52 | </para> | ||
53 | </legalnotice> | ||
54 | </bookinfo> | ||
55 | |||
56 | <toc></toc> | ||
57 | |||
58 | <chapter id="intro"> | ||
59 | <title>Introduction</title> | ||
60 | <para> | ||
61 | This document describes the interfaces available for device drivers that | ||
62 | drive s390 based channel attached I/O devices. This includes interfaces for | ||
63 | interaction with the hardware and interfaces for interacting with the | ||
64 | common driver core. Those interfaces are provided by the s390 common I/O | ||
65 | layer. | ||
66 | </para> | ||
67 | <para> | ||
68 | The document assumes familiarity with the technical terms associated | ||
69 | with the s390 channel I/O architecture. For a description of this | ||
70 | architecture, please refer to the "z/Architecture: Principles of | ||
71 | Operation", IBM publication no. SA22-7832. | ||
72 | </para> | ||
73 | <para> | ||
74 | While most I/O devices on an s390 system are typically driven through the | ||
75 | channel I/O mechanism described here, there are various other methods | ||
76 | (like the diag interface). These are out of the scope of this document. | ||
77 | </para> | ||
78 | <para> | ||
79 | Some additional information can also be found in the kernel source | ||
80 | under Documentation/s390/driver-model.txt. | ||
81 | </para> | ||
82 | </chapter> | ||
83 | <chapter id="ccw"> | ||
84 | <title>The ccw bus</title> | ||
85 | <para> | ||
86 | The ccw bus typically contains the majority of devices available to | ||
87 | an s390 system. Named after the channel command word (ccw), the basic | ||
88 | command structure used to address its devices, the ccw bus contains | ||
89 | so-called channel attached devices. They are addressed via I/O | ||
90 | subchannels, visible on the css bus. A device driver for | ||
91 | channel-attached devices, however, will never interact with the | ||
92 | subchannel directly, but only via the I/O device on the ccw bus, | ||
93 | the ccw device. | ||
94 | </para> | ||
95 | <sect1 id="channelIO"> | ||
96 | <title>I/O functions for channel-attached devices</title> | ||
97 | <para> | ||
98 | Some hardware structures have been translated into C structures for use | ||
99 | by the common I/O layer and device drivers. For more information on | ||
100 | the hardware structures represented here, please consult the Principles | ||
101 | of Operation. | ||
102 | </para> | ||
103 | !Iarch/s390/include/asm/cio.h | ||
104 | </sect1> | ||
105 | <sect1 id="ccwdev"> | ||
106 | <title>ccw devices</title> | ||
107 | <para> | ||
108 | Devices that want to initiate channel I/O need to attach to the ccw bus. | ||
109 | Interaction with the driver core is done via the common I/O layer, which | ||
110 | provides the abstractions of ccw devices and ccw device drivers. | ||
111 | </para> | ||
112 | <para> | ||
113 | The functions that initiate or terminate channel I/O all act upon a | ||
114 | ccw device structure. Device drivers must not bypass those functions | ||
115 | or strange side effects may happen. | ||
116 | </para> | ||
117 | !Iarch/s390/include/asm/ccwdev.h | ||
118 | !Edrivers/s390/cio/device.c | ||
119 | !Edrivers/s390/cio/device_ops.c | ||
120 | </sect1> | ||
121 | <sect1 id="cmf"> | ||
122 | <title>The channel-measurement facility</title> | ||
123 | <para> | ||
124 | The channel-measurement facility provides a means to collect | ||
125 | measurement data which is made available by the channel subsystem | ||
126 | for each channel attached device. | ||
127 | </para> | ||
128 | !Iarch/s390/include/asm/cmb.h | ||
129 | !Edrivers/s390/cio/cmf.c | ||
130 | </sect1> | ||
131 | </chapter> | ||
132 | |||
133 | <chapter id="ccwgroup"> | ||
134 | <title>The ccwgroup bus</title> | ||
135 | <para> | ||
136 | The ccwgroup bus only contains artificial devices, created by the user. | ||
137 | Many networking devices (e.g. qeth) are in fact composed of several | ||
138 | ccw devices (like read, write and data channel for qeth). The | ||
139 | ccwgroup bus provides a mechanism to create a meta-device which | ||
140 | contains those ccw devices as slave devices and can be associated | ||
141 | with the netdevice. | ||
142 | </para> | ||
143 | <sect1 id="ccwgroupdevices"> | ||
144 | <title>ccw group devices</title> | ||
145 | !Iarch/s390/include/asm/ccwgroup.h | ||
146 | !Edrivers/s390/cio/ccwgroup.c | ||
147 | </sect1> | ||
148 | </chapter> | ||
149 | |||
150 | <chapter id="genericinterfaces"> | ||
151 | <title>Generic interfaces</title> | ||
152 | <para> | ||
153 | Some interfaces are available to other drivers that do not necessarily | ||
154 | have anything to do with the busses described above, but still are | ||
155 | indirectly using basic infrastructure in the common I/O layer. | ||
156 | One example is the support for adapter interrupts. | ||
157 | </para> | ||
158 | !Edrivers/s390/cio/airq.c | ||
159 | </chapter> | ||
160 | |||
161 | </book> | ||
diff --git a/Documentation/DocBook/scsi.tmpl b/Documentation/DocBook/scsi.tmpl deleted file mode 100644 index 4b9b9b286cea..000000000000 --- a/Documentation/DocBook/scsi.tmpl +++ /dev/null | |||
@@ -1,409 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="scsimid"> | ||
6 | <bookinfo> | ||
7 | <title>SCSI Interfaces Guide</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>James</firstname> | ||
12 | <surname>Bottomley</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>James.Bottomley@hansenpartnership.com</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | |||
20 | <author> | ||
21 | <firstname>Rob</firstname> | ||
22 | <surname>Landley</surname> | ||
23 | <affiliation> | ||
24 | <address> | ||
25 | <email>rob@landley.net</email> | ||
26 | </address> | ||
27 | </affiliation> | ||
28 | </author> | ||
29 | |||
30 | </authorgroup> | ||
31 | |||
32 | <copyright> | ||
33 | <year>2007</year> | ||
34 | <holder>Linux Foundation</holder> | ||
35 | </copyright> | ||
36 | |||
37 | <legalnotice> | ||
38 | <para> | ||
39 | This documentation is free software; you can redistribute | ||
40 | it and/or modify it under the terms of the GNU General Public | ||
41 | License version 2. | ||
42 | </para> | ||
43 | |||
44 | <para> | ||
45 | This program is distributed in the hope that it will be | ||
46 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
47 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
48 | For more details see the file COPYING in the source | ||
49 | distribution of Linux. | ||
50 | </para> | ||
51 | </legalnotice> | ||
52 | </bookinfo> | ||
53 | |||
54 | <toc></toc> | ||
55 | |||
56 | <chapter id="intro"> | ||
57 | <title>Introduction</title> | ||
58 | <sect1 id="protocol_vs_bus"> | ||
59 | <title>Protocol vs bus</title> | ||
60 | <para> | ||
61 | Once upon a time, the Small Computer Systems Interface defined both | ||
62 | a parallel I/O bus and a data protocol to connect a wide variety of | ||
63 | peripherals (disk drives, tape drives, modems, printers, scanners, | ||
64 | optical drives, test equipment, and medical devices) to a host | ||
65 | computer. | ||
66 | </para> | ||
67 | <para> | ||
68 | Although the old parallel (fast/wide/ultra) SCSI bus has largely | ||
69 | fallen out of use, the SCSI command set is more widely used than ever | ||
70 | to communicate with devices over a number of different busses. | ||
71 | </para> | ||
72 | <para> | ||
73 | The <ulink url='http://www.t10.org/scsi-3.htm'>SCSI protocol</ulink> | ||
74 | is a big-endian peer-to-peer packet based protocol. SCSI commands | ||
75 | are 6, 10, 12, or 16 bytes long, often followed by an associated data | ||
76 | payload. | ||
77 | </para> | ||
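For fixed-length commands, the length of the command packet (CDB) follows from the opcode's group code, which is the top three bits of the first byte. Multi-byte fields inside the CDB are big-endian. The following plain C sketch illustrates both points. The length table is believed to match the lookup behind the kernel's COMMAND_SIZE() macro, but the helper names cdb_length() and build_read10() are hypothetical. Variable-length CDBs (opcode 0x7F) are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CDB length indexed by the opcode's group code (its top three bits).
 * Assumed to match the kernel's COMMAND_SIZE() table; groups 3, 6 and 7
 * are reserved or vendor specific, so those entries are placeholders.
 */
static const uint8_t cdb_len_by_group[8] = { 6, 10, 10, 12, 16, 12, 10, 10 };

/* Hypothetical helper: length in bytes of a fixed-length CDB. */
static size_t cdb_length(uint8_t opcode)
{
	return cdb_len_by_group[(opcode >> 5) & 7];
}

/* Hypothetical helper: fill in a READ(10) CDB, multi-byte fields MSB first. */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
	cdb[0] = 0x28;                    /* READ(10) opcode */
	cdb[1] = 0;                       /* flags: no FUA, DPO, etc. */
	cdb[2] = (uint8_t)(lba >> 24);    /* logical block address, big-endian */
	cdb[3] = (uint8_t)(lba >> 16);
	cdb[4] = (uint8_t)(lba >> 8);
	cdb[5] = (uint8_t)lba;
	cdb[6] = 0;                       /* group number */
	cdb[7] = (uint8_t)(nblocks >> 8); /* transfer length in blocks, big-endian */
	cdb[8] = (uint8_t)nblocks;
	cdb[9] = 0;                       /* control byte */
}
```

For example, cdb_length(0x12) (INQUIRY) gives 6 and cdb_length(0x88) (READ(16)) gives 16.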
78 | <para> | ||
79 | SCSI commands can be transported over just about any kind of bus, and | ||
80 | are the default protocol for storage devices attached to USB, SATA, | ||
81 | SAS, Fibre Channel, FireWire, and ATAPI devices. SCSI packets are | ||
82 | also commonly exchanged over InfiniBand, | ||
83 | <ulink url='http://i2o.shadowconnect.com/faq.php'>I2O</ulink>, TCP/IP | ||
84 | (<ulink url='https://en.wikipedia.org/wiki/ISCSI'>iSCSI</ulink>), and even | ||
85 | <ulink url='http://cyberelk.net/tim/parport/parscsi.html'>parallel | ||
86 | ports</ulink>. | ||
87 | </para> | ||
88 | </sect1> | ||
89 | <sect1 id="subsystem_design"> | ||
90 | <title>Design of the Linux SCSI subsystem</title> | ||
91 | <para> | ||
92 | The SCSI subsystem uses a three-layer design, with upper, mid, and low | ||
93 | layers. Every operation involving the SCSI subsystem (such as reading | ||
94 | a sector from a disk) uses one driver at each of the three levels: one | ||
95 | upper layer driver, one lower layer driver, and the SCSI midlayer. | ||
96 | </para> | ||
97 | <para> | ||
98 | The SCSI upper layer provides the interface between userspace and the | ||
99 | kernel, in the form of block and char device nodes for I/O and | ||
100 | ioctl(). The SCSI lower layer contains drivers for specific hardware | ||
101 | devices. | ||
102 | </para> | ||
103 | <para> | ||
104 | In between is the SCSI mid-layer, analogous to a network routing | ||
105 | layer such as the IPv4 stack. The SCSI mid-layer routes a | ||
106 | packet-based data protocol between the upper layer's /dev nodes and the | ||
107 | corresponding devices in the lower layer. It manages command queues, | ||
108 | provides error handling and power management functions, and responds | ||
109 | to ioctl() requests. | ||
110 | </para> | ||
111 | </sect1> | ||
112 | </chapter> | ||
113 | |||
114 | <chapter id="upper_layer"> | ||
115 | <title>SCSI upper layer</title> | ||
116 | <para> | ||
117 | The upper layer supports the user-kernel interface by providing | ||
118 | device nodes. | ||
119 | </para> | ||
120 | <sect1 id="sd"> | ||
121 | <title>sd (SCSI Disk)</title> | ||
122 | <para>sd (sd_mod.o)</para> | ||
123 | <!-- !Idrivers/scsi/sd.c --> | ||
124 | </sect1> | ||
125 | <sect1 id="sr"> | ||
126 | <title>sr (SCSI CD-ROM)</title> | ||
127 | <para>sr (sr_mod.o)</para> | ||
128 | </sect1> | ||
129 | <sect1 id="st"> | ||
130 | <title>st (SCSI Tape)</title> | ||
131 | <para>st (st.o)</para> | ||
132 | </sect1> | ||
133 | <sect1 id="sg"> | ||
134 | <title>sg (SCSI Generic)</title> | ||
135 | <para>sg (sg.o)</para> | ||
136 | </sect1> | ||
137 | <sect1 id="ch"> | ||
138 | <title>ch (SCSI Media Changer)</title> | ||
139 | <para>ch (ch.c)</para> | ||
140 | </sect1> | ||
141 | </chapter> | ||
142 | |||
143 | <chapter id="mid_layer"> | ||
144 | <title>SCSI mid layer</title> | ||
145 | |||
146 | <sect1 id="midlayer_implementation"> | ||
147 | <title>SCSI midlayer implementation</title> | ||
148 | <sect2 id="scsi_device.h"> | ||
149 | <title>include/scsi/scsi_device.h</title> | ||
150 | <para> | ||
151 | </para> | ||
152 | !Iinclude/scsi/scsi_device.h | ||
153 | </sect2> | ||
154 | |||
155 | <sect2 id="scsi.c"> | ||
156 | <title>drivers/scsi/scsi.c</title> | ||
157 | <para>Main file for the SCSI midlayer.</para> | ||
158 | !Edrivers/scsi/scsi.c | ||
159 | </sect2> | ||
160 | <sect2 id="scsicam.c"> | ||
161 | <title>drivers/scsi/scsicam.c</title> | ||
162 | <para> | ||
163 | <ulink url='http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf'>SCSI | ||
164 | Common Access Method</ulink> support functions, for use with | ||
165 | HDIO_GETGEO, etc. | ||
166 | </para> | ||
167 | !Edrivers/scsi/scsicam.c | ||
168 | </sect2> | ||
169 | <sect2 id="scsi_error.c"> | ||
170 | <title>drivers/scsi/scsi_error.c</title> | ||
171 | <para>Common SCSI error/timeout handling routines.</para> | ||
172 | !Edrivers/scsi/scsi_error.c | ||
173 | </sect2> | ||
174 | <sect2 id="scsi_devinfo.c"> | ||
175 | <title>drivers/scsi/scsi_devinfo.c</title> | ||
176 | <para> | ||
177 | Manage scsi_dev_info_list, which tracks blacklisted and whitelisted | ||
178 | devices. | ||
179 | </para> | ||
180 | !Idrivers/scsi/scsi_devinfo.c | ||
181 | </sect2> | ||
182 | <sect2 id="scsi_ioctl.c"> | ||
183 | <title>drivers/scsi/scsi_ioctl.c</title> | ||
184 | <para> | ||
185 | Handle ioctl() calls for SCSI devices. | ||
186 | </para> | ||
187 | !Edrivers/scsi/scsi_ioctl.c | ||
188 | </sect2> | ||
189 | <sect2 id="scsi_lib.c"> | ||
190 | <title>drivers/scsi/scsi_lib.c</title> | ||
191 | <para> | ||
192 | SCSI queuing library. | ||
193 | </para> | ||
194 | !Edrivers/scsi/scsi_lib.c | ||
195 | </sect2> | ||
196 | <sect2 id="scsi_lib_dma.c"> | ||
197 | <title>drivers/scsi/scsi_lib_dma.c</title> | ||
198 | <para> | ||
199 | SCSI library functions depending on DMA | ||
200 | (map and unmap scatter-gather lists). | ||
201 | </para> | ||
202 | !Edrivers/scsi/scsi_lib_dma.c | ||
203 | </sect2> | ||
204 | <sect2 id="scsi_module.c"> | ||
205 | <title>drivers/scsi/scsi_module.c</title> | ||
206 | <para> | ||
207 | The file drivers/scsi/scsi_module.c contains legacy support for | ||
208 | old-style host templates. It should never be used by any new driver. | ||
209 | </para> | ||
210 | </sect2> | ||
211 | <sect2 id="scsi_proc.c"> | ||
212 | <title>drivers/scsi/scsi_proc.c</title> | ||
213 | <para> | ||
214 | The functions in this file provide an interface between | ||
215 | the PROC file system and the SCSI device drivers. | ||
216 | It is mainly used for debugging, for statistics, and for passing | ||
217 | information directly to the low-level driver. | ||
218 | |||
219 | In other words, this is the plumbing that manages /proc/scsi/*. | ||
220 | </para> | ||
221 | !Idrivers/scsi/scsi_proc.c | ||
222 | </sect2> | ||
223 | <sect2 id="scsi_netlink.c"> | ||
224 | <title>drivers/scsi/scsi_netlink.c</title> | ||
225 | <para> | ||
226 | Infrastructure to provide async events from transports to userspace | ||
227 | via netlink, using a single NETLINK_SCSITRANSPORT protocol for all | ||
228 | transports. | ||
229 | |||
230 | See <ulink url='http://marc.info/?l=linux-scsi&m=115507374832500&w=2'>the | ||
231 | original patch submission</ulink> for more details. | ||
232 | </para> | ||
233 | !Idrivers/scsi/scsi_netlink.c | ||
234 | </sect2> | ||
235 | <sect2 id="scsi_scan.c"> | ||
236 | <title>drivers/scsi/scsi_scan.c</title> | ||
237 | <para> | ||
238 | Scan a host to determine which (if any) devices are attached. | ||
239 | |||
240 | The general scanning/probing algorithm is as follows; exceptions are | ||
241 | made to it depending on device-specific flags, compilation options, | ||
242 | and global variable (boot or module load time) settings. | ||
243 | |||
244 | A specific LUN is scanned via an INQUIRY command; if the LUN has a | ||
245 | device attached, a scsi_device is allocated and set up for it. | ||
246 | |||
247 | For every id of every channel on the given host, start by scanning | ||
248 | LUN 0. Skip targets that don't respond at all to a scan of LUN 0. | ||
249 | Otherwise, if LUN 0 has a device attached, allocate and set up a | ||
250 | scsi_device for it. If the target is SCSI-3 or later, issue a REPORT LUNS | ||
251 | command and scan all of the LUNs it returns; otherwise, | ||
252 | sequentially scan LUNs until some maximum is reached or a LUN is | ||
253 | seen that cannot have a device attached to it. | ||
254 | </para> | ||
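The scanning algorithm above can be modelled in a few lines of C. This is a simplified simulation, not the code in scsi_scan.c. The names struct mock_target, lun_has_device(), scan_target() and MAX_SEQ_LUNS are hypothetical stand-ins for real INQUIRY and REPORT LUNS exchanges. The sketch also ignores device-specific flags and boot-time options.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on sequential LUN probing for pre-SCSI-3 targets. */
#define MAX_SEQ_LUNS 8

/* Hypothetical model of a target, standing in for real INQUIRY and
 * REPORT LUNS replies. */
struct mock_target {
	bool responds;            /* does the target answer a scan of LUN 0? */
	int scsi_level;           /* 3 or higher: REPORT LUNS is supported */
	const bool *lun_present;  /* would INQUIRY find a device at this LUN? */
	size_t nluns;             /* number of entries in lun_present */
	const unsigned *report;   /* LUN list a REPORT LUNS would return */
	size_t nreport;           /* number of entries in report */
};

static bool lun_has_device(const struct mock_target *t, unsigned lun)
{
	return lun < t->nluns && t->lun_present[lun];
}

/* Store in found[] the LUNs that would get a scsi_device; return the count. */
static size_t scan_target(const struct mock_target *t, unsigned *found,
			  size_t cap)
{
	size_t n = 0;

	if (!t->responds)               /* no answer at LUN 0: skip the target */
		return 0;
	if (lun_has_device(t, 0) && n < cap)
		found[n++] = 0;

	if (t->scsi_level >= 3) {
		/* SCSI-3 or later: scan only the LUNs that REPORT LUNS lists */
		for (size_t i = 0; i < t->nreport; i++) {
			unsigned lun = t->report[i];

			if (lun != 0 && lun_has_device(t, lun) && n < cap)
				found[n++] = lun;
		}
	} else {
		/* Older targets: probe LUN 1, 2, ... until a gap or the cap */
		for (unsigned lun = 1; lun < MAX_SEQ_LUNS; lun++) {
			if (!lun_has_device(t, lun))
				break;
			if (n < cap)
				found[n++] = lun;
		}
	}
	return n;
}
```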
255 | !Idrivers/scsi/scsi_scan.c | ||
256 | </sect2> | ||
257 | <sect2 id="scsi_sysctl.c"> | ||
258 | <title>drivers/scsi/scsi_sysctl.c</title> | ||
259 | <para> | ||
260 | Set up the sysctl entry /proc/sys/dev/scsi/logging_level | ||
261 | (DEV_SCSI_LOGGING_LEVEL), which sets and returns scsi_logging_level. | ||
262 | </para> | ||
263 | </sect2> | ||
264 | <sect2 id="scsi_sysfs.c"> | ||
265 | <title>drivers/scsi/scsi_sysfs.c</title> | ||
266 | <para> | ||
267 | SCSI sysfs interface routines. | ||
268 | </para> | ||
269 | !Edrivers/scsi/scsi_sysfs.c | ||
270 | </sect2> | ||
271 | <sect2 id="hosts.c"> | ||
272 | <title>drivers/scsi/hosts.c</title> | ||
273 | <para> | ||
274 | Mid-layer to low-level SCSI driver interface. | ||
275 | </para> | ||
276 | !Edrivers/scsi/hosts.c | ||
277 | </sect2> | ||
278 | <sect2 id="constants.c"> | ||
279 | <title>drivers/scsi/constants.c</title> | ||
280 | <para> | ||
281 | Tables and helpers for translating SCSI constants (command opcodes, sense keys, additional sense codes) into human-readable strings. | ||
282 | </para> | ||
283 | !Edrivers/scsi/constants.c | ||
284 | </sect2> | ||
285 | </sect1> | ||
286 | |||
287 | <sect1 id="Transport_classes"> | ||
288 | <title>Transport classes</title> | ||
289 | <para> | ||
290 | Transport classes are service libraries for drivers in the SCSI | ||
291 | lower layer, which expose transport attributes in sysfs. | ||
292 | </para> | ||
293 | <sect2 id="Fibre_Channel_transport"> | ||
294 | <title>Fibre Channel transport</title> | ||
295 | <para> | ||
296 | The file drivers/scsi/scsi_transport_fc.c defines transport attributes | ||
297 | for Fibre Channel. | ||
298 | </para> | ||
299 | !Edrivers/scsi/scsi_transport_fc.c | ||
300 | </sect2> | ||
301 | <sect2 id="iSCSI_transport"> | ||
302 | <title>iSCSI transport class</title> | ||
303 | <para> | ||
304 | The file drivers/scsi/scsi_transport_iscsi.c defines transport | ||
305 | attributes for the iSCSI class, which sends SCSI packets over TCP/IP | ||
306 | connections. | ||
307 | </para> | ||
308 | !Edrivers/scsi/scsi_transport_iscsi.c | ||
309 | </sect2> | ||
310 | <sect2 id="SAS_transport"> | ||
311 | <title>Serial Attached SCSI (SAS) transport class</title> | ||
312 | <para> | ||
313 | The file drivers/scsi/scsi_transport_sas.c defines transport | ||
314 | attributes for Serial Attached SCSI, a serial SCSI transport that shares | ||
315 | much of its physical layer with SATA and is aimed at large high-end systems. | ||
316 | </para> | ||
317 | <para> | ||
318 | The SAS transport class contains common code to deal with SAS HBAs, | ||
319 | an approximate representation of SAS topologies in the driver model, | ||
320 | and various sysfs attributes to expose these topologies and management | ||
321 | interfaces to userspace. | ||
322 | </para> | ||
323 | <para> | ||
324 | In addition to the basic SCSI core objects, this transport class | ||
325 | introduces two additional intermediate objects: the SAS PHY, | ||
326 | represented by struct sas_phy, defines an "outgoing" PHY on | ||
327 | a SAS HBA or Expander, and the SAS remote PHY, represented by | ||
328 | struct sas_rphy, defines an "incoming" PHY on a SAS Expander or | ||
329 | end device. Note that this is purely a software concept; the | ||
330 | underlying hardware for a PHY and a remote PHY is exactly | ||
331 | the same. | ||
332 | </para> | ||
333 | <para> | ||
334 | There is no concept of a SAS port in this code; users can see | ||
335 | which PHYs form a wide port from the port_identifier attribute, | ||
336 | which is the same for all PHYs in a port. | ||
337 | </para> | ||
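Because a wide port is only implied by a shared port_identifier, userspace has to group PHYs itself. The minimal sketch below assumes that the PHY numbers and their port_identifier values have already been read from sysfs into an array of a hypothetical struct phy_info. The sysfs reading itself is omitted, and port_width() is not a kernel or library function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of sysfs data: a PHY number and the value of that
 * PHY's port_identifier attribute. */
struct phy_info {
	int phy;
	unsigned long long port_identifier;
};

/* Number of PHYs in the same port as `phy` (1 for a narrow port, more for
 * a wide port), or 0 if `phy` is not in the list. */
static size_t port_width(const struct phy_info *phys, size_t n, int phy)
{
	unsigned long long id = 0;
	bool found = false;
	size_t width = 0;

	for (size_t i = 0; i < n && !found; i++) {
		if (phys[i].phy == phy) {
			id = phys[i].port_identifier;
			found = true;
		}
	}
	if (!found)
		return 0;

	/* Every PHY that shares the identifier belongs to the same port. */
	for (size_t i = 0; i < n; i++)
		if (phys[i].port_identifier == id)
			width++;
	return width;
}
```

With PHYs 0-3 reporting one identifier and PHY 4 another, port_width() returns 4 for PHY 2 (a x4 wide port) and 1 for PHY 4.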
338 | !Edrivers/scsi/scsi_transport_sas.c | ||
339 | </sect2> | ||
340 | <sect2 id="SATA_transport"> | ||
341 | <title>SATA transport class</title> | ||
342 | <para> | ||
343 | The SATA transport is handled by libata, which has its own book of | ||
344 | documentation in this directory. | ||
345 | </para> | ||
346 | </sect2> | ||
347 | <sect2 id="SPI_transport"> | ||
348 | <title>Parallel SCSI (SPI) transport class</title> | ||
349 | <para> | ||
350 | The file drivers/scsi/scsi_transport_spi.c defines transport | ||
351 | attributes for traditional (fast/wide/ultra) SCSI busses. | ||
352 | </para> | ||
353 | !Edrivers/scsi/scsi_transport_spi.c | ||
354 | </sect2> | ||
355 | <sect2 id="SRP_transport"> | ||
356 | <title>SCSI RDMA (SRP) transport class</title> | ||
357 | <para> | ||
358 | The file drivers/scsi/scsi_transport_srp.c defines transport | ||
359 | attributes for SCSI over Remote Direct Memory Access. | ||
360 | </para> | ||
361 | !Edrivers/scsi/scsi_transport_srp.c | ||
362 | </sect2> | ||
363 | </sect1> | ||
364 | |||
365 | </chapter> | ||
366 | |||
367 | <chapter id="lower_layer"> | ||
368 | <title>SCSI lower layer</title> | ||
369 | <sect1 id="hba_drivers"> | ||
370 | <title>Host Bus Adapter transport types</title> | ||
371 | <para> | ||
372 | Many modern device controllers use the SCSI command set as a protocol to | ||
373 | communicate with their devices through many different types of physical | ||
374 | connections. | ||
375 | </para> | ||
376 | <para> | ||
377 | In SCSI language a bus capable of carrying SCSI commands is | ||
378 | called a "transport", and a controller connecting to such a bus is | ||
379 | called a "host bus adapter" (HBA). | ||
380 | </para> | ||
381 | <sect2 id="scsi_debug.c"> | ||
382 | <title>Debug transport</title> | ||
383 | <para> | ||
384 | The file drivers/scsi/scsi_debug.c simulates a host adapter with a | ||
385 | variable number of disks (or disk-like devices) attached, sharing a | ||
386 | common amount of RAM. It does a lot of checking to make sure that | ||
387 | blocks are not getting mixed up, and panics the kernel if anything out of | ||
388 | the ordinary is seen. | ||
389 | </para> | ||
390 | <para> | ||
391 | To be more realistic, the simulated devices have the transport | ||
392 | attributes of SAS disks. | ||
393 | </para> | ||
394 | <para> | ||
395 | For documentation see | ||
396 | <ulink url='http://sg.danny.cz/sg/sdebug26.html'>http://sg.danny.cz/sg/sdebug26.html</ulink> | ||
397 | </para> | ||
398 | <!-- !Edrivers/scsi/scsi_debug.c --> | ||
399 | </sect2> | ||
400 | <sect2 id="todo"> | ||
401 | <title>todo</title> | ||
402 | <para>Parallel (fast/wide/ultra) SCSI, USB, SATA, | ||
403 | SAS, Fibre Channel, FireWire, ATAPI devices, InfiniBand, | ||
404 | I2O, iSCSI, parallel ports, netlink... | ||
405 | </para> | ||
406 | </sect2> | ||
407 | </sect1> | ||
408 | </chapter> | ||
409 | </book> | ||
diff --git a/Documentation/DocBook/sh.tmpl b/Documentation/DocBook/sh.tmpl deleted file mode 100644 index 4a38f604fa66..000000000000 --- a/Documentation/DocBook/sh.tmpl +++ /dev/null | |||
@@ -1,105 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="sh-drivers"> | ||
6 | <bookinfo> | ||
7 | <title>SuperH Interfaces Guide</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Paul</firstname> | ||
12 | <surname>Mundt</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>lethal@linux-sh.org</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2008-2010</year> | ||
23 | <holder>Paul Mundt</holder> | ||
24 | </copyright> | ||
25 | <copyright> | ||
26 | <year>2008-2010</year> | ||
27 | <holder>Renesas Technology Corp.</holder> | ||
28 | </copyright> | ||
29 | <copyright> | ||
30 | <year>2010</year> | ||
31 | <holder>Renesas Electronics Corp.</holder> | ||
32 | </copyright> | ||
33 | |||
34 | <legalnotice> | ||
35 | <para> | ||
36 | This documentation is free software; you can redistribute | ||
37 | it and/or modify it under the terms of the GNU General Public | ||
38 | License version 2 as published by the Free Software Foundation. | ||
39 | </para> | ||
40 | |||
41 | <para> | ||
42 | This program is distributed in the hope that it will be | ||
43 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
44 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
45 | See the GNU General Public License for more details. | ||
46 | </para> | ||
47 | |||
48 | <para> | ||
49 | You should have received a copy of the GNU General Public | ||
50 | License along with this program; if not, write to the Free | ||
51 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
52 | MA 02111-1307 USA | ||
53 | </para> | ||
54 | |||
55 | <para> | ||
56 | For more details see the file COPYING in the source | ||
57 | distribution of Linux. | ||
58 | </para> | ||
59 | </legalnotice> | ||
60 | </bookinfo> | ||
61 | |||
62 | <toc></toc> | ||
63 | |||
64 | <chapter id="mm"> | ||
65 | <title>Memory Management</title> | ||
66 | <sect1 id="sh4"> | ||
67 | <title>SH-4</title> | ||
68 | <sect2 id="sq"> | ||
69 | <title>Store Queue API</title> | ||
70 | !Earch/sh/kernel/cpu/sh4/sq.c | ||
71 | </sect2> | ||
72 | </sect1> | ||
73 | <sect1 id="sh5"> | ||
74 | <title>SH-5</title> | ||
75 | <sect2 id="tlb"> | ||
76 | <title>TLB Interfaces</title> | ||
77 | !Iarch/sh/mm/tlb-sh5.c | ||
78 | !Iarch/sh/include/asm/tlb_64.h | ||
79 | </sect2> | ||
80 | </sect1> | ||
81 | </chapter> | ||
82 | <chapter id="mach"> | ||
83 | <title>Machine Specific Interfaces</title> | ||
84 | <sect1 id="dreamcast"> | ||
85 | <title>mach-dreamcast</title> | ||
86 | !Iarch/sh/boards/mach-dreamcast/rtc.c | ||
87 | </sect1> | ||
88 | <sect1 id="x3proto"> | ||
89 | <title>mach-x3proto</title> | ||
90 | !Earch/sh/boards/mach-x3proto/ilsel.c | ||
91 | </sect1> | ||
92 | </chapter> | ||
93 | <chapter id="busses"> | ||
94 | <title>Busses</title> | ||
95 | <sect1 id="superhyway"> | ||
96 | <title>SuperHyway</title> | ||
97 | !Edrivers/sh/superhyway/superhyway.c | ||
98 | </sect1> | ||
99 | |||
100 | <sect1 id="maple"> | ||
101 | <title>Maple</title> | ||
102 | !Edrivers/sh/maple/maple.c | ||
103 | </sect1> | ||
104 | </chapter> | ||
105 | </book> | ||
diff --git a/Documentation/DocBook/stylesheet.xsl b/Documentation/DocBook/stylesheet.xsl deleted file mode 100644 index 3bf4ecf3d760..000000000000 --- a/Documentation/DocBook/stylesheet.xsl +++ /dev/null | |||
@@ -1,11 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0"> | ||
3 | <param name="chunk.quietly">1</param> | ||
4 | <param name="funcsynopsis.style">ansi</param> | ||
5 | <param name="funcsynopsis.tabular.threshold">80</param> | ||
6 | <param name="callout.graphics">0</param> | ||
7 | <!-- <param name="paper.type">A4</param> --> | ||
8 | <param name="generate.consistent.ids">1</param> | ||
9 | <param name="generate.section.toc.level">2</param> | ||
10 | <param name="use.id.as.filename">1</param> | ||
11 | </stylesheet> | ||
diff --git a/Documentation/DocBook/w1.tmpl b/Documentation/DocBook/w1.tmpl deleted file mode 100644 index c65cb27abef9..000000000000 --- a/Documentation/DocBook/w1.tmpl +++ /dev/null | |||
@@ -1,101 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="w1id"> | ||
6 | <bookinfo> | ||
7 | <title>W1: Dallas' 1-wire bus</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>David</firstname> | ||
12 | <surname>Fries</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>David@Fries.net</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | |||
20 | </authorgroup> | ||
21 | |||
22 | <copyright> | ||
23 | <year>2013</year> | ||
24 | <!-- | ||
25 | <holder></holder> | ||
26 | --> | ||
27 | </copyright> | ||
28 | |||
29 | <legalnotice> | ||
30 | <para> | ||
31 | This documentation is free software; you can redistribute | ||
32 | it and/or modify it under the terms of the GNU General Public | ||
33 | License version 2. | ||
34 | </para> | ||
35 | |||
36 | <para> | ||
37 | This program is distributed in the hope that it will be | ||
38 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
39 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
40 | For more details see the file COPYING in the source | ||
41 | distribution of Linux. | ||
42 | </para> | ||
43 | </legalnotice> | ||
44 | </bookinfo> | ||
45 | |||
46 | <toc></toc> | ||
47 | |||
48 | <chapter id="w1_internal"> | ||
49 | <title>W1 API internal to the kernel</title> | ||
50 | |||
51 | <sect1 id="w1_internal_api"> | ||
52 | <title>W1 API internal to the kernel</title> | ||
53 | <sect2 id="w1.h"> | ||
54 | <title>include/linux/w1.h</title> | ||
55 | <para>W1 kernel API functions.</para> | ||
56 | !Iinclude/linux/w1.h | ||
57 | </sect2> | ||
58 | |||
59 | <sect2 id="w1.c"> | ||
60 | <title>drivers/w1/w1.c</title> | ||
61 | <para>W1 core functions.</para> | ||
62 | !Idrivers/w1/w1.c | ||
63 | </sect2> | ||
64 | |||
65 | <sect2 id="w1_family.c"> | ||
66 | <title>drivers/w1/w1_family.c</title> | ||
67 | <para>Allows registering device family operations.</para> | ||
68 | !Edrivers/w1/w1_family.c | ||
69 | </sect2> | ||
70 | |||
71 | <sect2 id="w1_internal.h"> | ||
72 | <title>drivers/w1/w1_internal.h</title> | ||
73 | <para>W1 internal initialization for master devices.</para> | ||
74 | !Idrivers/w1/w1_internal.h | ||
75 | </sect2> | ||
76 | |||
77 | <sect2 id="w1_int.c"> | ||
78 | <title>drivers/w1/w1_int.c</title> | ||
79 | <para>W1 internal initialization for master devices.</para> | ||
80 | !Edrivers/w1/w1_int.c | ||
81 | </sect2> | ||
82 | |||
83 | <sect2 id="w1_netlink.h"> | ||
84 | <title>drivers/w1/w1_netlink.h</title> | ||
85 | <para>W1 external netlink API structures and commands.</para> | ||
86 | !Idrivers/w1/w1_netlink.h | ||
87 | </sect2> | ||
88 | |||
89 | <sect2 id="w1_io.c"> | ||
90 | <title>drivers/w1/w1_io.c</title> | ||
91 | <para>W1 input/output.</para> | ||
92 | !Edrivers/w1/w1_io.c | ||
93 | !Idrivers/w1/w1_io.c | ||
94 | </sect2> | ||
95 | |||
96 | </sect1> | ||
97 | |||
98 | |||
99 | </chapter> | ||
100 | |||
101 | </book> | ||
diff --git a/Documentation/DocBook/z8530book.tmpl b/Documentation/DocBook/z8530book.tmpl deleted file mode 100644 index 6f3883be877e..000000000000 --- a/Documentation/DocBook/z8530book.tmpl +++ /dev/null | |||
@@ -1,371 +0,0 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="Z85230Guide"> | ||
6 | <bookinfo> | ||
7 | <title>Z8530 Programming Guide</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Alan</firstname> | ||
12 | <surname>Cox</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>alan@lxorguk.ukuu.org.uk</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | </authorgroup> | ||
20 | |||
21 | <copyright> | ||
22 | <year>2000</year> | ||
23 | <holder>Alan Cox</holder> | ||
24 | </copyright> | ||
25 | |||
26 | <legalnotice> | ||
27 | <para> | ||
28 | This documentation is free software; you can redistribute | ||
29 | it and/or modify it under the terms of the GNU General Public | ||
30 | License as published by the Free Software Foundation; either | ||
31 | version 2 of the License, or (at your option) any later | ||
32 | version. | ||
33 | </para> | ||
34 | |||
35 | <para> | ||
36 | This program is distributed in the hope that it will be | ||
37 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
38 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
39 | See the GNU General Public License for more details. | ||
40 | </para> | ||
41 | |||
42 | <para> | ||
43 | You should have received a copy of the GNU General Public | ||
44 | License along with this program; if not, write to the Free | ||
45 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
46 | MA 02111-1307 USA | ||
47 | </para> | ||
48 | |||
49 | <para> | ||
50 | For more details see the file COPYING in the source | ||
51 | distribution of Linux. | ||
52 | </para> | ||
53 | </legalnotice> | ||
54 | </bookinfo> | ||
55 | |||
56 | <toc></toc> | ||
57 | |||
58 | <chapter id="intro"> | ||
59 | <title>Introduction</title> | ||
60 | <para> | ||
61 | The Z85x30 family synchronous/asynchronous controller chips are | ||
62 | used on a large number of cheap network interface cards. The | ||
63 | kernel provides a core interface layer that is designed to make | ||
64 | it easy to provide WAN services using this chip. | ||
65 | </para> | ||
66 | <para> | ||
67 | The current driver only supports synchronous operation. Merging the | ||
68 | asynchronous driver support into this code to allow any Z85x30 | ||
69 | device to be used as both a tty interface and as a synchronous | ||
70 | controller is a project for Linux after the 2.4 release. | ||
71 | </para> | ||
72 | </chapter> | ||
73 | |||
74 | <chapter id="Driver_Modes"> | ||
75 | <title>Driver Modes</title> | ||
76 | <para> | ||
77 | The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices | ||
78 | in three different modes. Each mode can be applied to an individual | ||
79 | channel on the chip (each chip has two channels). | ||
80 | </para> | ||
81 | <para> | ||
82 | The PIO synchronous mode supports the most common Z8530 wiring. Here | ||
83 | the chip is interfaced to the I/O and interrupt facilities of the | ||
84 | host machine but not to the DMA subsystem. When running PIO, the | ||
85 | Z8530 has extremely tight timing requirements. Running at high speeds, | ||
86 | even with a Z85230, will be tricky. Typically you should expect to | ||
87 | achieve at best 9600 baud with a Z85C30 and 64Kbit/s with a Z85230. | ||
88 | </para> | ||
89 | <para> | ||
90 | The DMA mode supports the chip when it is configured to use dual DMA | ||
91 | channels on an ISA bus. The better cards tend to support this mode | ||
92 | of operation for a single channel. With DMA running, the Z85230 tops | ||
93 | out when it starts to hit ISA DMA constraints at about 512Kbit/s. It | ||
94 | is worth noting here that many PC machines hang or crash when the | ||
95 | chip is driven fast enough to hold the ISA bus solid. | ||
96 | </para> | ||
97 | <para> | ||
98 | Transmit DMA mode uses a single DMA channel. The DMA channel is used | ||
99 | for transmission as the transmit FIFO is smaller than the receive | ||
100 | FIFO. It gives better performance than pure PIO mode but is nowhere | ||
101 | near as good as pure DMA mode. | ||
102 | </para> | ||
103 | </chapter> | ||
104 | |||
105 | <chapter id="Using_the_Z85230_driver"> | ||
106 | <title>Using the Z85230 driver</title> | ||
107 | <para> | ||
108 | The Z85230 driver provides the back end interface to your board. To | ||
109 | configure a Z8530 interface you need to detect the board and to | ||
110 | identify its ports and interrupt resources. It is also your | ||
111 | responsibility to verify that the resources are available. | ||
112 | </para> | ||
113 | <para> | ||
114 | Having identified the chip you need to fill in a struct z8530_dev, | ||
115 | which describes each chip. This object must exist until you finally | ||
116 | shut down the board. First, zero the active field. This ensures | ||
117 | nothing goes off without you intending it. The irq field should | ||
118 | be set to the interrupt number of the chip. (Each chip has a single | ||
119 | interrupt source, not one per channel.) You are responsible | ||
120 | for allocating the interrupt line. The interrupt handler should be | ||
121 | set to <function>z8530_interrupt</function>. The device id should | ||
122 | be set to the z8530_dev structure pointer. Whether the interrupt can | ||
123 | be shared or not is board dependent, and up to you to initialise. | ||
124 | </para> | ||
125 | <para> | ||
126 | The structure holds two channel structures. | ||
127 | Initialise chanA.ctrlio and chanA.dataio with the address of the | ||
128 | control and data ports. You can OR this with Z8530_PORT_SLEEP to | ||
129 | indicate that your interface needs the 5us chip-settling delay done | ||
130 | in software. The PORT_SLEEP option is architecture-specific. Other | ||
131 | flags may become available on future platforms, e.g. for MMIO. | ||
132 | Initialise the chanA.irqs to &z8530_nop to start the chip up | ||
133 | as disabled and discarding interrupt events. This ensures that | ||
134 | stray interrupts will be mopped up and not hang the bus. Set | ||
135 | chanA.dev to point to the device structure itself. You may use the | ||
136 | private and name fields as you wish. The private field | ||
137 | is unused by the Z85230 layer. The name is used for error reporting, | ||
138 | so it may make sense to make it match the network interface name. | ||
139 | </para> | ||
140 | <para> | ||
141 | Repeat the same operation with the B channel if your chip has | ||
142 | both channels wired to something useful. This isn't always the | ||
143 | case. If it is not wired then the I/O values do not matter, but | ||
144 | you must initialise chanB.dev. | ||
145 | </para> | ||
146 | <para> | ||
147 | If your board has DMA facilities then initialise the txdma and | ||
148 | rxdma fields for the relevant channels. You must also allocate the | ||
149 | ISA DMA channels and do any necessary board level initialisation | ||
150 | to configure them. The low level driver will do the Z8530 and | ||
151 | DMA controller programming but not board specific magic. | ||
152 | </para> | ||
153 | <para> | ||
154 | Having initialised the device you can then call | ||
155 | <function>z8530_init</function>. This will probe the chip and | ||
156 | reset it into a known state. An identification sequence is then | ||
157 | run to identify the chip type. If the checks fail to pass the | ||
158 | function returns a non zero error code. Typically this indicates | ||
159 | that the port given is not valid. After this call the | ||
160 | type field of the z8530_dev structure is initialised to either | ||
161 | Z8530, Z85C30 or Z85230 according to the chip found. | ||
162 | </para> | ||
163 | <para> | ||
164 | Once you have called z8530_init you can also make use of the utility | ||
165 | function <function>z8530_describe</function>. This provides a | ||
166 | consistent reporting format for the Z8530 devices, and allows all | ||
167 | the drivers to provide consistent reporting. | ||
168 | </para> | ||
169 | </chapter> | ||
170 | |||
171 | <chapter id="Attaching_Network_Interfaces"> | ||
172 | <title>Attaching Network Interfaces</title> | ||
173 | <para> | ||
174 | If you wish to use the network interface facilities of the driver, | ||
175 | then you need to attach a network device to each channel that is | ||
176 | present and in use. In addition, to use the generic HDLC layer, | ||
177 | you need to follow some additional plumbing rules. They may seem | ||
178 | complex but a look at the example hostess_sv11 driver should | ||
179 | reassure you. | ||
180 | </para> | ||
181 | <para> | ||
182 | The network device used for each channel should be pointed to by | ||
183 | the netdevice field of each channel. The hdlc->priv field of the | ||
184 | network device points to your private data - you will need to be | ||
185 | able to find your private data from this. | ||
186 | </para> | ||
187 | <para> | ||
188 | The way most drivers approach this particular problem is to | ||
189 | create a structure holding the Z8530 device definition and | ||
190 | put that into the private field of the network device. The | ||
191 | network device fields of the channels then point back to the | ||
192 | network devices. | ||
193 | </para> | ||
194 | <para> | ||
195 | If you wish to use the generic HDLC then you need to register | ||
196 | the HDLC device. | ||
197 | </para> | ||
198 | <para> | ||
199 | Before you register your network device you will also need to | ||
200 | provide suitable handlers for most of the network device callbacks. | ||
201 | See the network device documentation for more details on this. | ||
202 | </para> | ||
203 | </chapter> | ||
204 | |||
205 | <chapter id="Configuring_And_Activating_The_Port"> | ||
206 | <title>Configuring And Activating The Port</title> | ||
207 | <para> | ||
208 | The Z85230 driver provides helper functions and tables to load the | ||
209 | port registers on the Z8530 chips. When programming the register | ||
210 | settings for a channel be aware that the documentation recommends | ||
211 | initialisation orders. Strange things happen when these are not | ||
212 | followed. | ||
213 | </para> | ||
214 | <para> | ||
215 | <function>z8530_channel_load</function> takes an array of | ||
216 | pairs of initialisation values in an array of u8 type. The first | ||
217 | value is the Z8530 register number. Add 16 to indicate the alternate | ||
218 | register bank on the later chips. The array is terminated by a 255. | ||
219 | </para> | ||
220 | <para> | ||
221 | The driver provides a pair of public tables. The | ||
222 | z8530_hdlc_kilostream table is for the UK 'Kilostream' service and | ||
223 | also happens to cover most other end host configurations. The | ||
224 | z8530_hdlc_kilostream_85230 table is the same configuration using | ||
225 | the enhancements of the 85230 chip. The configuration loaded is | ||
226 | standard NRZ encoded synchronous data with HDLC bitstuffing. All | ||
227 | of the timing is taken from the other end of the link. | ||
228 | </para> | ||
229 | <para> | ||
230 | When writing your own tables be aware that the driver internally | ||
231 | tracks register values. It may need to reload values. You should | ||
232 | therefore be sure to set registers 1-7, 9-11, 14 and 15 in all | ||
233 | configurations. Where the register settings depend on DMA selection | ||
234 | the driver will update the bits itself when you open or close. | ||
235 | Loading a new table with the interface open is not recommended. | ||
236 | </para> | ||
237 | <para> | ||
238 | There are three standard configurations supported by the core | ||
239 | code. In PIO mode the interface is programmed up to use | ||
240 | interrupt driven PIO. This places high demands on the host processor | ||
241 | to avoid latency. The driver is written to take account of latency | ||
242 | issues but it cannot avoid latencies caused by other drivers, | ||
243 | notably IDE in PIO mode. Because the drivers allocate buffers you | ||
244 | must also prevent MTU changes while the port is open. | ||
245 | </para> | ||
246 | <para> | ||
247 | Once the port is open it will call the rx_function of each channel | ||
248 | whenever a completed packet arrived. This is invoked from | ||
249 | interrupt context and passes you the channel and a network | ||
250 | buffer (struct sk_buff) holding the data. The data includes | ||
251 | the CRC bytes so most users will want to trim the last two | ||
252 | bytes before processing the data. This function is very timing | ||
253 | critical. When you wish to simply discard data the support | ||
254 | code provides the function <function>z8530_null_rx</function> | ||
255 | to discard the data. | ||
256 | </para> | ||
257 | <para> | ||
258 | To active PIO mode sending and receiving the <function> | ||
259 | z8530_sync_open</function> is called. This expects to be passed | ||
260 | the network device and the channel. Typically this is called from | ||
261 | your network device open callback. On a failure a non zero error | ||
262 | status is returned. The <function>z8530_sync_close</function> | ||
263 | function shuts down a PIO channel. This must be done before the | ||
264 | channel is opened again and before the driver shuts down | ||
265 | and unloads. | ||
266 | </para> | ||
267 | <para> | ||
268 | The ideal mode of operation is dual channel DMA mode. Here the | ||
269 | kernel driver will configure the board for DMA in both directions. | ||
270 | The driver also handles ISA DMA issues such as controller | ||
271 | programming and the memory range limit for you. This mode is | ||
272 | activated by calling the <function>z8530_sync_dma_open</function> | ||
273 | function. On failure a non zero error value is returned. | ||
274 | Once this mode is activated it can be shut down by calling the | ||
275 | <function>z8530_sync_dma_close</function>. You must call the close | ||
276 | function matching the open mode you used. | ||
277 | </para> | ||
278 | <para> | ||
279 | The final supported mode uses a single DMA channel to drive the | ||
280 | transmit side. As the Z85C30 has a larger FIFO on the receive | ||
281 | channel this tends to increase the maximum speed a little. | ||
282 | This is activated by calling the <function>z8530_sync_txdma_open | ||
283 | </function>. This returns a non zero error code on failure. The | ||
284 | <function>z8530_sync_txdma_close</function> function closes down | ||
285 | the Z8530 interface from this mode. | ||
286 | </para> | ||
287 | </chapter> | ||
288 | |||
289 | <chapter id="Network_Layer_Functions"> | ||
290 | <title>Network Layer Functions</title> | ||
291 | <para> | ||
292 | The Z8530 layer provides functions to queue packets for | ||
293 | transmission. The driver internally buffers the frame currently | ||
294 | being transmitted and one further frame (in order to keep back | ||
295 | to back transmission running). Any further buffering is up to | ||
296 | the caller. | ||
297 | </para> | ||
298 | <para> | ||
299 | The function <function>z8530_queue_xmit</function> takes a network | ||
300 | buffer in sk_buff format and queues it for transmission. The | ||
301 | caller must provide the entire packet with the exception of the | ||
302 | bitstuffing and CRC. This is normally done by the caller via | ||
303 | the generic HDLC interface layer. It returns 0 if the buffer has been | ||
304 | queued and non zero values for queue full. If the function accepts | ||
305 | the buffer it becomes property of the Z8530 layer and the caller | ||
306 | should not free it. | ||
307 | </para> | ||
308 | <para> | ||
309 | The function <function>z8530_get_stats</function> returns a pointer | ||
310 | to an internally maintained per interface statistics block. This | ||
311 | provides most of the interface code needed to implement the network | ||
312 | layer get_stats callback. | ||
313 | </para> | ||
314 | </chapter> | ||
315 | |||
316 | <chapter id="Porting_The_Z8530_Driver"> | ||
317 | <title>Porting The Z8530 Driver</title> | ||
318 | <para> | ||
319 | The Z8530 driver is written to be portable. In DMA mode it makes | ||
320 | assumptions about the use of ISA DMA. These are probably warranted | ||
321 | in most cases as the Z85230 in particular was designed to glue to PC | ||
322 | type machines. The PIO mode makes no real assumptions. | ||
323 | </para> | ||
324 | <para> | ||
325 | Should you need to retarget the Z8530 driver to another architecture | ||
326 | the only code that should need changing are the port I/O functions. | ||
327 | At the moment these assume PC I/O port accesses. This may not be | ||
328 | appropriate for all platforms. Replacing | ||
329 | <function>z8530_read_port</function> and <function>z8530_write_port | ||
330 | </function> is intended to be all that is required to port this | ||
331 | driver layer. | ||
332 | </para> | ||
333 | </chapter> | ||
334 | |||
335 | <chapter id="bugs"> | ||
336 | <title>Known Bugs And Assumptions</title> | ||
337 | <para> | ||
338 | <variablelist> | ||
339 | <varlistentry><term>Interrupt Locking</term> | ||
340 | <listitem> | ||
341 | <para> | ||
342 | The locking in the driver is done via the global cli/sti lock. This | ||
343 | makes for relatively poor SMP performance. Switching this to use a | ||
344 | per device spin lock would probably materially improve performance. | ||
345 | </para> | ||
346 | </listitem></varlistentry> | ||
347 | |||
348 | <varlistentry><term>Occasional Failures</term> | ||
349 | <listitem> | ||
350 | <para> | ||
351 | We have reports of occasional failures when run for very long | ||
352 | periods of time and the driver starts to receive junk frames. At | ||
353 | the moment the cause of this is not clear. | ||
354 | </para> | ||
355 | </listitem></varlistentry> | ||
356 | </variablelist> | ||
357 | |||
358 | </para> | ||
359 | </chapter> | ||
360 | |||
361 | <chapter id="pubfunctions"> | ||
362 | <title>Public Functions Provided</title> | ||
363 | !Edrivers/net/wan/z85230.c | ||
364 | </chapter> | ||
365 | |||
366 | <chapter id="intfunctions"> | ||
367 | <title>Internal Functions</title> | ||
368 | !Idrivers/net/wan/z85230.c | ||
369 | </chapter> | ||
370 | |||
371 | </book> | ||
diff --git a/Documentation/Makefile b/Documentation/Makefile index c2a469112c37..a42320385df3 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile | |||
@@ -1 +1,126 @@ | |||
1 | # -*- makefile -*- | ||
2 | # Makefile for Sphinx documentation | ||
3 | # | ||
4 | |||
1 | subdir-y := | 5 | subdir-y := |
6 | |||
7 | # You can set these variables from the command line. | ||
8 | SPHINXBUILD = sphinx-build | ||
9 | SPHINXOPTS = | ||
10 | SPHINXDIRS = . | ||
11 | _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py)) | ||
12 | SPHINX_CONF = conf.py | ||
13 | PAPER = | ||
14 | BUILDDIR = $(obj)/output | ||
15 | PDFLATEX = xelatex | ||
16 | LATEXOPTS = -interaction=batchmode | ||
17 | |||
18 | # User-friendly check for sphinx-build | ||
19 | HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi) | ||
20 | |||
21 | ifeq ($(HAVE_SPHINX),0) | ||
22 | |||
23 | .DEFAULT: | ||
24 | $(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.) | ||
25 | @echo " SKIP Sphinx $@ target." | ||
26 | |||
27 | else # HAVE_SPHINX | ||
28 | |||
29 | # User-friendly check for pdflatex | ||
30 | HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi) | ||
31 | |||
32 | # Internal variables. | ||
33 | PAPEROPT_a4 = -D latex_paper_size=a4 | ||
34 | PAPEROPT_letter = -D latex_paper_size=letter | ||
35 | KERNELDOC = $(srctree)/scripts/kernel-doc | ||
36 | KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC) | ||
37 | ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) | ||
38 | # the i18n builder cannot share the environment and doctrees with the others | ||
39 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . | ||
40 | |||
41 | # commands; the 'cmd' from scripts/Kbuild.include is not *loopable* | ||
42 | loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit; | ||
43 | |||
44 | # $2 sphinx builder e.g. "html" | ||
45 | # $3 name of the build subfolder / e.g. "media", used as: | ||
46 | # * dest folder relative to $(BUILDDIR) and | ||
47 | # * cache folder relative to $(BUILDDIR)/.doctrees | ||
48 | # $4 dest subfolder e.g. "man" for man pages at media/man | ||
49 | # $5 reST source folder relative to $(srctree)/$(src), | ||
50 | # e.g. "media" for the linux-tv book-set at ./Documentation/media | ||
51 | |||
52 | quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) | ||
53 | cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \ | ||
54 | PYTHONDONTWRITEBYTECODE=1 \ | ||
55 | BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ | ||
56 | $(SPHINXBUILD) \ | ||
57 | -b $2 \ | ||
58 | -c $(abspath $(srctree)/$(src)) \ | ||
59 | -d $(abspath $(BUILDDIR)/.doctrees/$3) \ | ||
60 | -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \ | ||
61 | $(ALLSPHINXOPTS) \ | ||
62 | $(abspath $(srctree)/$(src)/$5) \ | ||
63 | $(abspath $(BUILDDIR)/$3/$4) | ||
64 | |||
65 | htmldocs: | ||
66 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) | ||
67 | |||
68 | linkcheckdocs: | ||
69 | @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var))) | ||
70 | |||
71 | latexdocs: | ||
72 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) | ||
73 | |||
74 | ifeq ($(HAVE_PDFLATEX),0) | ||
75 | |||
76 | pdfdocs: | ||
77 | $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.) | ||
78 | @echo " SKIP Sphinx $@ target." | ||
79 | |||
80 | else # HAVE_PDFLATEX | ||
81 | |||
82 | pdfdocs: latexdocs | ||
83 | $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;) | ||
84 | |||
85 | endif # HAVE_PDFLATEX | ||
86 | |||
87 | epubdocs: | ||
88 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var))) | ||
89 | |||
90 | xmldocs: | ||
91 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var))) | ||
92 | |||
93 | endif # HAVE_SPHINX | ||
94 | |||
95 | # The following targets are independent of HAVE_SPHINX, and the rules should | ||
96 | # work or silently pass without Sphinx. | ||
97 | |||
98 | # no-ops for the Sphinx toolchain | ||
99 | sgmldocs: | ||
100 | @: | ||
101 | psdocs: | ||
102 | @: | ||
103 | mandocs: | ||
104 | @: | ||
105 | installmandocs: | ||
106 | @: | ||
107 | |||
108 | cleandocs: | ||
109 | $(Q)rm -rf $(BUILDDIR) | ||
110 | $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean | ||
111 | |||
112 | dochelp: | ||
113 | @echo ' Linux kernel internal documentation in different formats from ReST:' | ||
114 | @echo ' htmldocs - HTML' | ||
115 | @echo ' latexdocs - LaTeX' | ||
116 | @echo ' pdfdocs - PDF' | ||
117 | @echo ' epubdocs - EPUB' | ||
118 | @echo ' xmldocs - XML' | ||
119 | @echo ' linkcheckdocs - check for broken external links (will connect to external hosts)' | ||
120 | @echo ' cleandocs - clean all generated files' | ||
121 | @echo | ||
122 | @echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2' | ||
123 | @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)' | ||
124 | @echo | ||
125 | @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build' | ||
126 | @echo ' configuration. This is e.g. useful to build with nit-picking config.' | ||
diff --git a/Documentation/Makefile.sphinx b/Documentation/Makefile.sphinx deleted file mode 100644 index bcf529f6cf9b..000000000000 --- a/Documentation/Makefile.sphinx +++ /dev/null | |||
@@ -1,130 +0,0 @@ | |||
1 | # -*- makefile -*- | ||
2 | # Makefile for Sphinx documentation | ||
3 | # | ||
4 | |||
5 | # You can set these variables from the command line. | ||
6 | SPHINXBUILD = sphinx-build | ||
7 | SPHINXOPTS = | ||
8 | SPHINXDIRS = . | ||
9 | _SPHINXDIRS = $(patsubst $(srctree)/Documentation/%/conf.py,%,$(wildcard $(srctree)/Documentation/*/conf.py)) | ||
10 | SPHINX_CONF = conf.py | ||
11 | PAPER = | ||
12 | BUILDDIR = $(obj)/output | ||
13 | PDFLATEX = xelatex | ||
14 | LATEXOPTS = -interaction=batchmode | ||
15 | |||
16 | # User-friendly check for sphinx-build | ||
17 | HAVE_SPHINX := $(shell if which $(SPHINXBUILD) >/dev/null 2>&1; then echo 1; else echo 0; fi) | ||
18 | |||
19 | ifeq ($(HAVE_SPHINX),0) | ||
20 | |||
21 | .DEFAULT: | ||
22 | $(warning The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed and in PATH, or set the SPHINXBUILD make variable to point to the full path of the '$(SPHINXBUILD)' executable.) | ||
23 | @echo " SKIP Sphinx $@ target." | ||
24 | |||
25 | else ifneq ($(DOCBOOKS),) | ||
26 | |||
27 | # Skip Sphinx build if the user explicitly requested DOCBOOKS. | ||
28 | .DEFAULT: | ||
29 | @echo " SKIP Sphinx $@ target (DOCBOOKS specified)." | ||
30 | |||
31 | else # HAVE_SPHINX | ||
32 | |||
33 | # User-friendly check for pdflatex | ||
34 | HAVE_PDFLATEX := $(shell if which $(PDFLATEX) >/dev/null 2>&1; then echo 1; else echo 0; fi) | ||
35 | |||
36 | # Internal variables. | ||
37 | PAPEROPT_a4 = -D latex_paper_size=a4 | ||
38 | PAPEROPT_letter = -D latex_paper_size=letter | ||
39 | KERNELDOC = $(srctree)/scripts/kernel-doc | ||
40 | KERNELDOC_CONF = -D kerneldoc_srctree=$(srctree) -D kerneldoc_bin=$(KERNELDOC) | ||
41 | ALLSPHINXOPTS = $(KERNELDOC_CONF) $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) | ||
42 | # the i18n builder cannot share the environment and doctrees with the others | ||
43 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . | ||
44 | |||
45 | # commands; the 'cmd' from scripts/Kbuild.include is not *loopable* | ||
46 | loop_cmd = $(echo-cmd) $(cmd_$(1)) || exit; | ||
47 | |||
48 | # $2 sphinx builder e.g. "html" | ||
49 | # $3 name of the build subfolder / e.g. "media", used as: | ||
50 | # * dest folder relative to $(BUILDDIR) and | ||
51 | # * cache folder relative to $(BUILDDIR)/.doctrees | ||
52 | # $4 dest subfolder e.g. "man" for man pages at media/man | ||
53 | # $5 reST source folder relative to $(srctree)/$(src), | ||
54 | # e.g. "media" for the linux-tv book-set at ./Documentation/media | ||
55 | |||
56 | quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4) | ||
57 | cmd_sphinx = $(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media $2 && \ | ||
58 | PYTHONDONTWRITEBYTECODE=1 \ | ||
59 | BUILDDIR=$(abspath $(BUILDDIR)) SPHINX_CONF=$(abspath $(srctree)/$(src)/$5/$(SPHINX_CONF)) \ | ||
60 | $(SPHINXBUILD) \ | ||
61 | -b $2 \ | ||
62 | -c $(abspath $(srctree)/$(src)) \ | ||
63 | -d $(abspath $(BUILDDIR)/.doctrees/$3) \ | ||
64 | -D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \ | ||
65 | $(ALLSPHINXOPTS) \ | ||
66 | $(abspath $(srctree)/$(src)/$5) \ | ||
67 | $(abspath $(BUILDDIR)/$3/$4) | ||
68 | |||
69 | htmldocs: | ||
70 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,html,$(var),,$(var))) | ||
71 | |||
72 | linkcheckdocs: | ||
73 | @$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,linkcheck,$(var),,$(var))) | ||
74 | |||
75 | latexdocs: | ||
76 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,latex,$(var),latex,$(var))) | ||
77 | |||
78 | ifeq ($(HAVE_PDFLATEX),0) | ||
79 | |||
80 | pdfdocs: | ||
81 | $(warning The '$(PDFLATEX)' command was not found. Make sure you have it installed and in PATH to produce PDF output.) | ||
82 | @echo " SKIP Sphinx $@ target." | ||
83 | |||
84 | else # HAVE_PDFLATEX | ||
85 | |||
86 | pdfdocs: latexdocs | ||
87 | $(foreach var,$(SPHINXDIRS), $(MAKE) PDFLATEX=$(PDFLATEX) LATEXOPTS="$(LATEXOPTS)" -C $(BUILDDIR)/$(var)/latex || exit;) | ||
88 | |||
89 | endif # HAVE_PDFLATEX | ||
90 | |||
91 | epubdocs: | ||
92 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,epub,$(var),epub,$(var))) | ||
93 | |||
94 | xmldocs: | ||
95 | @+$(foreach var,$(SPHINXDIRS),$(call loop_cmd,sphinx,xml,$(var),xml,$(var))) | ||
96 | |||
97 | endif # HAVE_SPHINX | ||
98 | |||
99 | # The following targets are independent of HAVE_SPHINX, and the rules should | ||
100 | # work or silently pass without Sphinx. | ||
101 | |||
102 | # no-ops for the Sphinx toolchain | ||
103 | sgmldocs: | ||
104 | @: | ||
105 | psdocs: | ||
106 | @: | ||
107 | mandocs: | ||
108 | @: | ||
109 | installmandocs: | ||
110 | @: | ||
111 | |||
112 | cleandocs: | ||
113 | $(Q)rm -rf $(BUILDDIR) | ||
114 | $(Q)$(MAKE) BUILDDIR=$(abspath $(BUILDDIR)) $(build)=Documentation/media clean | ||
115 | |||
116 | dochelp: | ||
117 | @echo ' Linux kernel internal documentation in different formats (Sphinx):' | ||
118 | @echo ' htmldocs - HTML' | ||
119 | @echo ' latexdocs - LaTeX' | ||
120 | @echo ' pdfdocs - PDF' | ||
121 | @echo ' epubdocs - EPUB' | ||
122 | @echo ' xmldocs - XML' | ||
123 | @echo ' linkcheckdocs - check for broken external links (will connect to external hosts)' | ||
124 | @echo ' cleandocs - clean all generated files' | ||
125 | @echo | ||
126 | @echo ' make SPHINXDIRS="s1 s2" [target] Generate only docs of folder s1, s2' | ||
127 | @echo ' valid values for SPHINXDIRS are: $(_SPHINXDIRS)' | ||
128 | @echo | ||
129 | @echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build' | ||
130 | @echo ' configuration. This is e.g. useful to build with nit-picking config.' | ||
diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt index 1e37138027a3..618e13d5e276 100644 --- a/Documentation/PCI/MSI-HOWTO.txt +++ b/Documentation/PCI/MSI-HOWTO.txt | |||
@@ -186,7 +186,7 @@ must disable interrupts while the lock is held. If the device sends | |||
186 | a different interrupt, the driver will deadlock trying to recursively | 186 | a different interrupt, the driver will deadlock trying to recursively |
187 | acquire the spinlock. Such deadlocks can be avoided by using | 187 | acquire the spinlock. Such deadlocks can be avoided by using |
188 | spin_lock_irqsave() or spin_lock_irq() which disable local interrupts | 188 | spin_lock_irqsave() or spin_lock_irq() which disable local interrupts |
189 | and acquire the lock (see Documentation/DocBook/kernel-locking). | 189 | and acquire the lock (see Documentation/kernel-hacking/locking.rst). |
190 | 190 | ||
191 | 4.5 How to tell whether MSI/MSI-X is enabled on a device | 191 | 4.5 How to tell whether MSI/MSI-X is enabled on a device |
192 | 192 | ||
diff --git a/Documentation/security/LoadPin.txt b/Documentation/admin-guide/LSM/LoadPin.rst index e11877f5d3d4..32070762d24c 100644 --- a/Documentation/security/LoadPin.txt +++ b/Documentation/admin-guide/LSM/LoadPin.rst | |||
@@ -1,3 +1,7 @@ | |||
1 | ======= | ||
2 | LoadPin | ||
3 | ======= | ||
4 | |||
1 | LoadPin is a Linux Security Module that ensures all kernel-loaded files | 5 | LoadPin is a Linux Security Module that ensures all kernel-loaded files |
2 | (modules, firmware, etc) all originate from the same filesystem, with | 6 | (modules, firmware, etc) all originate from the same filesystem, with |
3 | the expectation that such a filesystem is backed by a read-only device | 7 | the expectation that such a filesystem is backed by a read-only device |
@@ -5,13 +9,13 @@ such as dm-verity or CDROM. This allows systems that have a verified | |||
5 | and/or unchangeable filesystem to enforce module and firmware loading | 9 | and/or unchangeable filesystem to enforce module and firmware loading |
6 | restrictions without needing to sign the files individually. | 10 | restrictions without needing to sign the files individually. |
7 | 11 | ||
8 | The LSM is selectable at build-time with CONFIG_SECURITY_LOADPIN, and | 12 | The LSM is selectable at build-time with ``CONFIG_SECURITY_LOADPIN``, and |
9 | can be controlled at boot-time with the kernel command line option | 13 | can be controlled at boot-time with the kernel command line option |
10 | "loadpin.enabled". By default, it is enabled, but can be disabled at | 14 | "``loadpin.enabled``". By default, it is enabled, but can be disabled at |
11 | boot ("loadpin.enabled=0"). | 15 | boot ("``loadpin.enabled=0``"). |
12 | 16 | ||
13 | LoadPin starts pinning when it sees the first file loaded. If the | 17 | LoadPin starts pinning when it sees the first file loaded. If the |
14 | block device backing the filesystem is not read-only, a sysctl is | 18 | block device backing the filesystem is not read-only, a sysctl is |
15 | created to toggle pinning: /proc/sys/kernel/loadpin/enabled. (Having | 19 | created to toggle pinning: ``/proc/sys/kernel/loadpin/enabled``. (Having |
16 | a mutable filesystem means pinning is mutable too, but having the | 20 | a mutable filesystem means pinning is mutable too, but having the |
17 | sysctl allows for easy testing on systems with a mutable filesystem.) | 21 | sysctl allows for easy testing on systems with a mutable filesystem.) |
diff --git a/Documentation/security/SELinux.txt b/Documentation/admin-guide/LSM/SELinux.rst index 07eae00f3314..f722c9b4173a 100644 --- a/Documentation/security/SELinux.txt +++ b/Documentation/admin-guide/LSM/SELinux.rst | |||
@@ -1,27 +1,33 @@ | |||
1 | ======= | ||
2 | SELinux | ||
3 | ======= | ||
4 | |||
1 | If you want to use SELinux, chances are you will want | 5 | If you want to use SELinux, chances are you will want |
2 | to use the distro-provided policies, or install the | 6 | to use the distro-provided policies, or install the |
3 | latest reference policy release from | 7 | latest reference policy release from |
8 | |||
4 | http://oss.tresys.com/projects/refpolicy | 9 | http://oss.tresys.com/projects/refpolicy |
5 | 10 | ||
6 | However, if you want to install a dummy policy for | 11 | However, if you want to install a dummy policy for |
7 | testing, you can do using 'mdp' provided under | 12 | testing, you can do using ``mdp`` provided under |
8 | scripts/selinux. Note that this requires the selinux | 13 | scripts/selinux. Note that this requires the selinux |
9 | userspace to be installed - in particular you will | 14 | userspace to be installed - in particular you will |
10 | need checkpolicy to compile a kernel, and setfiles and | 15 | need checkpolicy to compile a kernel, and setfiles and |
11 | fixfiles to label the filesystem. | 16 | fixfiles to label the filesystem. |
12 | 17 | ||
13 | 1. Compile the kernel with selinux enabled. | 18 | 1. Compile the kernel with selinux enabled. |
14 | 2. Type 'make' to compile mdp. | 19 | 2. Type ``make`` to compile ``mdp``. |
15 | 3. Make sure that you are not running with | 20 | 3. Make sure that you are not running with |
16 | SELinux enabled and a real policy. If | 21 | SELinux enabled and a real policy. If |
17 | you are, reboot with selinux disabled | 22 | you are, reboot with selinux disabled |
18 | before continuing. | 23 | before continuing. |
19 | 4. Run install_policy.sh: | 24 | 4. Run install_policy.sh:: |
25 | |||
20 | cd scripts/selinux | 26 | cd scripts/selinux |
21 | sh install_policy.sh | 27 | sh install_policy.sh |
22 | 28 | ||
23 | Step 4 will create a new dummy policy valid for your | 29 | Step 4 will create a new dummy policy valid for your |
24 | kernel, with a single selinux user, role, and type. | 30 | kernel, with a single selinux user, role, and type. |
25 | It will compile the policy, will set your SELINUXTYPE to | 31 | It will compile the policy, will set your ``SELINUXTYPE`` to |
26 | dummy in /etc/selinux/config, install the compiled policy | 32 | ``dummy`` in ``/etc/selinux/config``, install the compiled policy |
27 | as 'dummy', and relabel your filesystem. | 33 | as ``dummy``, and relabel your filesystem. |
diff --git a/Documentation/security/Smack.txt b/Documentation/admin-guide/LSM/Smack.rst index 945cc633d883..6a5826a13aea 100644 --- a/Documentation/security/Smack.txt +++ b/Documentation/admin-guide/LSM/Smack.rst | |||
@@ -1,3 +1,6 @@ | |||
1 | ===== | ||
2 | Smack | ||
3 | ===== | ||
1 | 4 | ||
2 | 5 | ||
3 | "Good for you, you've decided to clean the elevator!" | 6 | "Good for you, you've decided to clean the elevator!" |
@@ -14,6 +17,7 @@ available to determine which is best suited to the problem | |||
14 | at hand. | 17 | at hand. |
15 | 18 | ||
16 | Smack consists of three major components: | 19 | Smack consists of three major components: |
20 | |||
17 | - The kernel | 21 | - The kernel |
18 | - Basic utilities, which are helpful but not required | 22 | - Basic utilities, which are helpful but not required |
19 | - Configuration data | 23 | - Configuration data |
@@ -39,16 +43,24 @@ The current git repository for Smack user space is: | |||
39 | This should make and install on most modern distributions. | 43 | This should make and install on most modern distributions. |
40 | There are five commands included in smackutil: | 44 | There are five commands included in smackutil: |
41 | 45 | ||
42 | chsmack - display or set Smack extended attribute values | 46 | chsmack: |
43 | smackctl - load the Smack access rules | 47 | display or set Smack extended attribute values |
44 | smackaccess - report if a process with one label has access | 48 | |
45 | to an object with another | 49 | smackctl: |
50 | load the Smack access rules | ||
51 | |||
52 | smackaccess: | ||
53 | report if a process with one label has access | ||
54 | to an object with another | ||
46 | 55 | ||
47 | These two commands are obsolete with the introduction of | 56 | These two commands are obsolete with the introduction of |
48 | the smackfs/load2 and smackfs/cipso2 interfaces. | 57 | the smackfs/load2 and smackfs/cipso2 interfaces. |
49 | 58 | ||
50 | smackload - properly formats data for writing to smackfs/load | 59 | smackload: |
51 | smackcipso - properly formats data for writing to smackfs/cipso | 60 | properly formats data for writing to smackfs/load |
61 | |||
62 | smackcipso: | ||
63 | properly formats data for writing to smackfs/cipso | ||
52 | 64 | ||
53 | In keeping with the intent of Smack, configuration data is | 65 | In keeping with the intent of Smack, configuration data is |
54 | minimal and not strictly required. The most important | 66 | minimal and not strictly required. The most important |
@@ -56,15 +68,15 @@ configuration step is mounting the smackfs pseudo filesystem. | |||
56 | If smackutil is installed the startup script will take care | 68 | If smackutil is installed the startup script will take care |
57 | of this, but it can be manually as well. | 69 | of this, but it can be manually as well. |
58 | 70 | ||
59 | Add this line to /etc/fstab: | 71 | Add this line to ``/etc/fstab``:: |
60 | 72 | ||
61 | smackfs /sys/fs/smackfs smackfs defaults 0 0 | 73 | smackfs /sys/fs/smackfs smackfs defaults 0 0 |
62 | 74 | ||
63 | The /sys/fs/smackfs directory is created by the kernel. | 75 | The ``/sys/fs/smackfs`` directory is created by the kernel. |
64 | 76 | ||
65 | Smack uses extended attributes (xattrs) to store labels on filesystem | 77 | Smack uses extended attributes (xattrs) to store labels on filesystem |
66 | objects. The attributes are stored in the extended attribute security | 78 | objects. The attributes are stored in the extended attribute security |
67 | name space. A process must have CAP_MAC_ADMIN to change any of these | 79 | name space. A process must have ``CAP_MAC_ADMIN`` to change any of these |
68 | attributes. | 80 | attributes. |
69 | 81 | ||
70 | The extended attributes that Smack uses are: | 82 | The extended attributes that Smack uses are: |
@@ -73,14 +85,17 @@ SMACK64 | |||
73 | Used to make access control decisions. In almost all cases | 85 | Used to make access control decisions. In almost all cases |
74 | the label given to a new filesystem object will be the label | 86 | the label given to a new filesystem object will be the label |
75 | of the process that created it. | 87 | of the process that created it. |
88 | |||
76 | SMACK64EXEC | 89 | SMACK64EXEC |
77 | The Smack label of a process that execs a program file with | 90 | The Smack label of a process that execs a program file with |
78 | this attribute set will run with this attribute's value. | 91 | this attribute set will run with this attribute's value. |
92 | |||
79 | SMACK64MMAP | 93 | SMACK64MMAP |
80 | Don't allow the file to be mmapped by a process whose Smack | 94 | Don't allow the file to be mmapped by a process whose Smack |
81 | label does not allow all of the access permitted to a process | 95 | label does not allow all of the access permitted to a process |
82 | with the label contained in this attribute. This is a very | 96 | with the label contained in this attribute. This is a very |
83 | specific use case for shared libraries. | 97 | specific use case for shared libraries. |
98 | |||
84 | SMACK64TRANSMUTE | 99 | SMACK64TRANSMUTE |
85 | Can only have the value "TRUE". If this attribute is present | 100 | Can only have the value "TRUE". If this attribute is present |
86 | on a directory when an object is created in the directory and | 101 | on a directory when an object is created in the directory and |
@@ -89,27 +104,29 @@ SMACK64TRANSMUTE | |||
89 | gets the label of the directory instead of the label of the | 104 | gets the label of the directory instead of the label of the |
90 | creating process. If the object being created is a directory | 105 | creating process. If the object being created is a directory |
91 | the SMACK64TRANSMUTE attribute is set as well. | 106 | the SMACK64TRANSMUTE attribute is set as well. |
107 | |||
92 | SMACK64IPIN | 108 | SMACK64IPIN |
93 | This attribute is only available on file descriptors for sockets. | 109 | This attribute is only available on file descriptors for sockets. |
94 | Use the Smack label in this attribute for access control | 110 | Use the Smack label in this attribute for access control |
95 | decisions on packets being delivered to this socket. | 111 | decisions on packets being delivered to this socket. |
112 | |||
96 | SMACK64IPOUT | 113 | SMACK64IPOUT |
97 | This attribute is only available on file descriptors for sockets. | 114 | This attribute is only available on file descriptors for sockets. |
98 | Use the Smack label in this attribute for access control | 115 | Use the Smack label in this attribute for access control |
99 | decisions on packets coming from this socket. | 116 | decisions on packets coming from this socket. |
100 | 117 | ||
101 | There are multiple ways to set a Smack label on a file: | 118 | There are multiple ways to set a Smack label on a file:: |
102 | 119 | ||
103 | # attr -S -s SMACK64 -V "value" path | 120 | # attr -S -s SMACK64 -V "value" path |
104 | # chsmack -a value path | 121 | # chsmack -a value path |
105 | 122 | ||
106 | A process can see the Smack label it is running with by | 123 | A process can see the Smack label it is running with by |
107 | reading /proc/self/attr/current. A process with CAP_MAC_ADMIN | 124 | reading ``/proc/self/attr/current``. A process with ``CAP_MAC_ADMIN`` |
108 | can set the process Smack label by writing there. | 125 | can set the process Smack label by writing there. |
109 | 126 | ||
110 | Most Smack configuration is accomplished by writing to files | 127 | Most Smack configuration is accomplished by writing to files |
111 | in the smackfs filesystem. This pseudo-filesystem is mounted | 128 | in the smackfs filesystem. This pseudo-filesystem is mounted |
112 | on /sys/fs/smackfs. | 129 | on ``/sys/fs/smackfs``. |
113 | 130 | ||
114 | access | 131 | access |
115 | Provided for backward compatibility. The access2 interface | 132 | Provided for backward compatibility. The access2 interface |
@@ -120,6 +137,7 @@ access | |||
120 | this file. The next read will indicate whether the access | 137 | this file. The next read will indicate whether the access |
121 | would be permitted. The text will be either "1" indicating | 138 | would be permitted. The text will be either "1" indicating |
122 | access, or "0" indicating denial. | 139 | access, or "0" indicating denial. |
140 | |||
123 | access2 | 141 | access2 |
124 | This interface reports whether a subject with the specified | 142 | This interface reports whether a subject with the specified |
125 | Smack label has a particular access to an object with a | 143 | Smack label has a particular access to an object with a |
@@ -127,13 +145,17 @@ access2 | |||
127 | this file. The next read will indicate whether the access | 145 | this file. The next read will indicate whether the access |
128 | would be permitted. The text will be either "1" indicating | 146 | would be permitted. The text will be either "1" indicating |
129 | access, or "0" indicating denial. | 147 | access, or "0" indicating denial. |
148 | |||
130 | ambient | 149 | ambient |
131 | This contains the Smack label applied to unlabeled network | 150 | This contains the Smack label applied to unlabeled network |
132 | packets. | 151 | packets. |
152 | |||
133 | change-rule | 153 | change-rule |
134 | This interface allows modification of existing access control rules. | 154 | This interface allows modification of existing access control rules. |
135 | The format accepted on write is: | 155 | The format accepted on write is:: |
156 | |||
136 | "%s %s %s %s" | 157 | "%s %s %s %s" |
158 | |||
137 | where the first string is the subject label, the second the | 159 | where the first string is the subject label, the second the |
138 | object label, the third the access to allow and the fourth the | 160 | object label, the third the access to allow and the fourth the |
139 | access to deny. The access strings may contain only the characters | 161 | access to deny. The access strings may contain only the characters |
@@ -141,47 +163,63 @@ change-rule | |||
141 | modified by enabling the permissions in the third string and disabling | 163 | modified by enabling the permissions in the third string and disabling |
142 | those in the fourth string. If there is no such rule it will be | 164 | those in the fourth string. If there is no such rule it will be |
143 | created using the access specified in the third and the fourth strings. | 165 | created using the access specified in the third and the fourth strings. |
166 | |||
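The merge behaviour of change-rule can be modelled on an in-memory rule table. The following is a hypothetical Python model, not the kernel's code. It treats each access string as a set of letters and drops the "-" placeholder. It also applies the deny string after the allow string, so a letter named in both ends up disabled; that ordering is an assumption of this sketch.

```python
def apply_change_rule(rules, line):
    """Apply one "%s %s %s %s" change-rule line to an in-memory rule table.

    rules maps (subject, object) -> set of access letters. Returns the new
    access set for that pair.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 space-separated fields, got {len(fields)}")
    subject, obj, allow, deny = fields
    allow_set = set(allow) - {"-"}
    deny_set = set(deny) - {"-"}
    # A missing rule starts out empty, so it is created from allow/deny alone.
    current = rules.get((subject, obj), set())
    rules[(subject, obj)] = (current | allow_set) - deny_set
    return rules[(subject, obj)]
```

For example, with an existing rule granting `rx`, the line `Subj Obj w x` leaves `rw`.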
144 | cipso | 167 | cipso |
145 | Provided for backward compatibility. The cipso2 interface | 168 | Provided for backward compatibility. The cipso2 interface |
146 | is preferred and should be used instead. | 169 | is preferred and should be used instead. |
147 | This interface allows a specific CIPSO header to be assigned | 170 | This interface allows a specific CIPSO header to be assigned |
148 | to a Smack label. The format accepted on write is: | 171 | to a Smack label. The format accepted on write is:: |
172 | |||
149 | "%24s%4d%4d"["%4d"]... | 173 | "%24s%4d%4d"["%4d"]... |
174 | |||
150 | The first string is a fixed Smack label. The first number is | 175 | The first string is a fixed Smack label. The first number is |
151 | the level to use. The second number is the number of categories. | 176 | the level to use. The second number is the number of categories. |
152 | The following numbers are the categories. | 177 | The following numbers are the categories:: |
153 | "level-3-cats-5-19 3 2 5 19" | 178 | |
179 | "level-3-cats-5-19 3 2 5 19" | ||
180 | |||
154 | cipso2 | 181 | cipso2 |
155 | This interface allows a specific CIPSO header to be assigned | 182 | This interface allows a specific CIPSO header to be assigned |
156 | to a Smack label. The format accepted on write is: | 183 | to a Smack label. The format accepted on write is:: |
157 | "%s%4d%4d"["%4d"]... | 184 | |
185 | "%s%4d%4d"["%4d"]... | ||
186 | |||
158 | The first string is a long Smack label. The first number is | 187 | The first string is a long Smack label. The first number is |
159 | the level to use. The second number is the number of categories. | 188 | the level to use. The second number is the number of categories. |
160 | The following numbers are the categories. | 189 | The following numbers are the categories:: |
161 | "level-3-cats-5-19 3 2 5 19" | 190 | |
191 | "level-3-cats-5-19 3 2 5 19" | ||
192 | |||
162 | direct | 193 | direct |
163 | This contains the CIPSO level used for Smack direct label | 194 | This contains the CIPSO level used for Smack direct label |
164 | representation in network packets. | 195 | representation in network packets. |
196 | |||
165 | doi | 197 | doi |
166 | This contains the CIPSO domain of interpretation used in | 198 | This contains the CIPSO domain of interpretation used in |
167 | network packets. | 199 | network packets. |
200 | |||
168 | ipv6host | 201 | ipv6host |
169 | This interface allows specific IPv6 internet addresses to be | 202 | This interface allows specific IPv6 internet addresses to be |
170 | treated as single label hosts. Packets are sent to single | 203 | treated as single label hosts. Packets are sent to single |
171 | label hosts only from processes that have Smack write access | 204 | label hosts only from processes that have Smack write access |
172 | to the host label. All packets received from single label hosts | 205 | to the host label. All packets received from single label hosts |
173 | are given the specified label. The format accepted on write is: | 206 | are given the specified label. The format accepted on write is:: |
207 | |||
174 | "%h:%h:%h:%h:%h:%h:%h:%h label" or | 208 | "%h:%h:%h:%h:%h:%h:%h:%h label" or |
175 | "%h:%h:%h:%h:%h:%h:%h:%h/%d label". | 209 | "%h:%h:%h:%h:%h:%h:%h:%h/%d label". |
210 | |||
176 | The "::" address shortcut is not supported. | 211 | The "::" address shortcut is not supported. |
177 | If label is "-DELETE" a matched entry will be deleted. | 212 | If label is "-DELETE" a matched entry will be deleted. |
213 | |||
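Because the "::" shortcut is not accepted, a writer for ipv6host has to spell out all eight groups. The following is a minimal sketch using Python's standard `ipaddress` module. The function names are hypothetical. It prints each group without leading zeros to match the `%h` fields.

```python
import ipaddress


def _eight_groups(addr):
    # exploded always yields eight 4-digit groups; re-print each one without
    # leading zeros so the "::" shortcut is never emitted.
    return ":".join(format(int(g, 16), "x") for g in addr.exploded.split(":"))


def ipv6host_line(address, label):
    """Build an ipv6host write: "addr label" or "addr/prefix label"."""
    net = ipaddress.IPv6Network(address)  # strict: host bits must be zero
    text = _eight_groups(net.network_address)
    if "/" in address:
        text += f"/{net.prefixlen}"
    return f"{text} {label}"
```

Passing `-DELETE` as the label produces the line that removes a matching entry.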
178 | load | 214 | load |
179 | Provided for backward compatibility. The load2 interface | 215 | Provided for backward compatibility. The load2 interface |
180 | is preferred and should be used instead. | 216 | is preferred and should be used instead. |
181 | This interface allows access control rules in addition to | 217 | This interface allows access control rules in addition to |
182 | the system defined rules to be specified. The format accepted | 218 | the system defined rules to be specified. The format accepted |
183 | on write is: | 219 | on write is:: |
220 | |||
184 | "%24s%24s%5s" | 221 | "%24s%24s%5s" |
222 | |||
185 | where the first string is the subject label, the second the | 223 | where the first string is the subject label, the second the |
186 | object label, and the third the requested access. The access | 224 | object label, and the third the requested access. The access |
187 | string may contain only the characters "rwxat-", and specifies | 225 | string may contain only the characters "rwxat-", and specifies |
@@ -189,17 +227,21 @@ load | |||
189 | permissions that are not allowed. The string "r-x--" would | 227 | permissions that are not allowed. The string "r-x--" would |
190 | specify read and execute access. Labels are limited to 23 | 228 | specify read and execute access. Labels are limited to 23 |
191 | characters in length. | 229 | characters in length. |
230 | |||
192 | load2 | 231 | load2 |
193 | This interface allows access control rules in addition to | 232 | This interface allows access control rules in addition to |
194 | the system defined rules to be specified. The format accepted | 233 | the system defined rules to be specified. The format accepted |
195 | on write is: | 234 | on write is:: |
235 | |||
196 | "%s %s %s" | 236 | "%s %s %s" |
237 | |||
197 | where the first string is the subject label, the second the | 238 | where the first string is the subject label, the second the |
198 | object label, and the third the requested access. The access | 239 | object label, and the third the requested access. The access |
199 | string may contain only the characters "rwxat-", and specifies | 240 | string may contain only the characters "rwxat-", and specifies |
200 | which sort of access is allowed. The "-" is a placeholder for | 241 | which sort of access is allowed. The "-" is a placeholder for |
201 | permissions that are not allowed. The string "r-x--" would | 242 | permissions that are not allowed. The string "r-x--" would |
202 | specify read and execute access. | 243 | specify read and execute access. |
244 | |||
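A load2 rule is three space-separated fields, so a writer mainly has to keep whitespace out of the labels and restrict the access string to "rwxat-". The following is a hypothetical Python sketch. It assumes the smackfs mount point given earlier, which places the interface at `/sys/fs/smackfs/load2`. Actually writing there requires `CAP_MAC_ADMIN` on a Smack system.

```python
ACCESS_CHARS = set("rwxat-")  # characters the load2 text allows


def load2_rule(subject, obj, access):
    """Format one "%s %s %s" rule for load2, rejecting obviously bad input."""
    for label in (subject, obj):
        # Fields are space-separated, so a label must not contain whitespace.
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"bad label: {label!r}")
    if not access or not set(access) <= ACCESS_CHARS:
        raise ValueError(f"bad access string: {access!r}")
    return f"{subject} {obj} {access}"


def write_rule(line, path="/sys/fs/smackfs/load2"):
    """Write one rule to smackfs; requires CAP_MAC_ADMIN on a Smack system."""
    with open(path, "w") as f:
        f.write(line)
```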
203 | load-self | 245 | load-self |
204 | Provided for backward compatibility. The load-self2 interface | 246 | Provided for backward compatibility. The load-self2 interface |
205 | is preferred and should be used instead. | 247 | is preferred and should be used instead. |
@@ -208,66 +250,83 @@ load-self | |||
208 | otherwise be permitted, and are intended to provide additional | 250 | otherwise be permitted, and are intended to provide additional |
209 | restrictions on the process. The format is the same as for | 251 | restrictions on the process. The format is the same as for |
210 | the load interface. | 252 | the load interface. |
253 | |||
211 | load-self2 | 254 | load-self2 |
212 | This interface allows process specific access rules to be | 255 | This interface allows process specific access rules to be |
213 | defined. These rules are only consulted if access would | 256 | defined. These rules are only consulted if access would |
214 | otherwise be permitted, and are intended to provide additional | 257 | otherwise be permitted, and are intended to provide additional |
215 | restrictions on the process. The format is the same as for | 258 | restrictions on the process. The format is the same as for |
216 | the load2 interface. | 259 | the load2 interface. |
260 | |||
217 | logging | 261 | logging |
218 | This contains the Smack logging state. | 262 | This contains the Smack logging state. |
263 | |||
219 | mapped | 264 | mapped |
220 | This contains the CIPSO level used for Smack mapped label | 265 | This contains the CIPSO level used for Smack mapped label |
221 | representation in network packets. | 266 | representation in network packets. |
267 | |||
222 | netlabel | 268 | netlabel |
223 | This interface allows specific internet addresses to be | 269 | This interface allows specific internet addresses to be |
224 | treated as single label hosts. Packets are sent to single | 270 | treated as single label hosts. Packets are sent to single |
225 | label hosts without CIPSO headers, but only from processes | 271 | label hosts without CIPSO headers, but only from processes |
226 | that have Smack write access to the host label. All packets | 272 | that have Smack write access to the host label. All packets |
227 | received from single label hosts are given the specified | 273 | received from single label hosts are given the specified |
228 | label. The format accepted on write is: | 274 | label. The format accepted on write is:: |
275 | |||
229 | "%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label". | 276 | "%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label". |
277 | |||
230 | If the label specified is "-CIPSO" the address is treated | 278 | If the label specified is "-CIPSO" the address is treated |
231 | as a host that supports CIPSO headers. | 279 | as a host that supports CIPSO headers. |
280 | |||
232 | onlycap | 281 | onlycap |
233 | This contains labels processes must have for CAP_MAC_ADMIN | 282 | This contains labels processes must have for CAP_MAC_ADMIN |
234 | and CAP_MAC_OVERRIDE to be effective. If this file is empty | 283 | and ``CAP_MAC_OVERRIDE`` to be effective. If this file is empty |
235 | these capabilities are effective for processes with any | 284 | these capabilities are effective for processes with any |
236 | label. The values are set by writing the desired labels, separated | 285 | label. The values are set by writing the desired labels, separated |
237 | by spaces, to the file or cleared by writing "-" to the file. | 286 | by spaces, to the file or cleared by writing "-" to the file. |
287 | |||
238 | ptrace | 288 | ptrace |
239 | This is used to define the current ptrace policy | 289 | This is used to define the current ptrace policy |
240 | 0 - default: this is the policy that relies on Smack access rules. | 290 | |
241 | For the PTRACE_READ a subject needs to have a read access on | 291 | 0 - default: |
242 | object. For the PTRACE_ATTACH a read-write access is required. | 292 | this is the policy that relies on Smack access rules. |
243 | 1 - exact: this is the policy that limits PTRACE_ATTACH. Attach is | 293 | For the ``PTRACE_READ`` a subject needs to have a read access on |
294 | object. For the ``PTRACE_ATTACH`` a read-write access is required. | ||
295 | |||
296 | 1 - exact: | ||
297 | this is the policy that limits ``PTRACE_ATTACH``. Attach is | ||
244 | only allowed when subject's and object's labels are equal. | 298 | only allowed when subject's and object's labels are equal. |
245 | PTRACE_READ is not affected. Can be overridden with CAP_SYS_PTRACE. | 299 | ``PTRACE_READ`` is not affected. Can be overridden with ``CAP_SYS_PTRACE``. |
246 | 2 - draconian: this policy behaves like the 'exact' above with an | 300 | |
247 | exception that it can't be overridden with CAP_SYS_PTRACE. | 301 | 2 - draconian: |
302 | this policy behaves like the 'exact' above with an | ||
303 | exception that it can't be overridden with ``CAP_SYS_PTRACE``. | ||
304 | |||
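The three ptrace policies can be summarised as a decision function. This is a hypothetical Python model of the rules described above, not the kernel's implementation. `granted` stands for whatever access the Smack rules give the subject on the object. The model does not try to describe how `CAP_SYS_PTRACE` interacts with the default policy, because the text does not say.

```python
PTRACE_DEFAULT, PTRACE_EXACT, PTRACE_DRACONIAN = 0, 1, 2


def ptrace_allowed(policy, request, subject, obj, granted, cap_sys_ptrace=False):
    """Model of the three ptrace policies described above.

    request is "read" or "attach"; granted is the set of access letters the
    Smack rules give subject on obj.
    """
    if request == "read":
        # PTRACE_READ follows the Smack rules under every policy.
        return "r" in granted
    if request != "attach":
        raise ValueError(f"unknown request: {request!r}")
    if policy == PTRACE_DEFAULT:
        return {"r", "w"} <= granted
    if policy == PTRACE_EXACT:
        return subject == obj or cap_sys_ptrace
    if policy == PTRACE_DRACONIAN:
        return subject == obj  # CAP_SYS_PTRACE does not help here
    raise ValueError(f"unknown policy: {policy!r}")
```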
248 | revoke-subject | 305 | revoke-subject |
249 | Writing a Smack label here sets the access to '-' for all access | 306 | Writing a Smack label here sets the access to '-' for all access |
250 | rules with that subject label. | 307 | rules with that subject label. |
308 | |||
251 | unconfined | 309 | unconfined |
252 | If the kernel is configured with CONFIG_SECURITY_SMACK_BRINGUP | 310 | If the kernel is configured with ``CONFIG_SECURITY_SMACK_BRINGUP`` |
253 | a process with CAP_MAC_ADMIN can write a label into this interface. | 311 | a process with ``CAP_MAC_ADMIN`` can write a label into this interface. |
254 | Thereafter, accesses that involve that label will be logged and | 312 | Thereafter, accesses that involve that label will be logged and |
255 | the access permitted if it wouldn't be otherwise. Note that this | 313 | the access permitted if it wouldn't be otherwise. Note that this |
256 | is dangerous and can ruin the proper labeling of your system. | 314 | is dangerous and can ruin the proper labeling of your system. |
257 | It should never be used in production. | 315 | It should never be used in production. |
316 | |||
258 | relabel-self | 317 | relabel-self |
259 | This interface contains a list of labels to which the process can | 318 | This interface contains a list of labels to which the process can |
260 | transition by writing to /proc/self/attr/current. | 319 | transition by writing to ``/proc/self/attr/current``. |
261 | Normally a process can change its own label to any legal value, but only | 320 | Normally a process can change its own label to any legal value, but only |
262 | if it has CAP_MAC_ADMIN. This interface allows a process without | 321 | if it has ``CAP_MAC_ADMIN``. This interface allows a process without |
263 | CAP_MAC_ADMIN to relabel itself to one of the labels from the predefined list. | 322 | ``CAP_MAC_ADMIN`` to relabel itself to one of the labels from the predefined list. |
264 | A process without CAP_MAC_ADMIN can change its label only once. When it | 323 | A process without ``CAP_MAC_ADMIN`` can change its label only once. When it |
265 | does, this list will be cleared. | 324 | does, this list will be cleared. |
266 | The values are set by writing the desired labels, separated | 325 | The values are set by writing the desired labels, separated |
267 | by spaces, to the file or cleared by writing "-" to the file. | 326 | by spaces, to the file or cleared by writing "-" to the file. |
268 | 327 | ||
269 | If you are using the smackload utility | 328 | If you are using the smackload utility |
270 | you can add access rules in /etc/smack/accesses. They take the form: | 329 | you can add access rules in ``/etc/smack/accesses``. They take the form:: |
271 | 330 | ||
272 | subjectlabel objectlabel access | 331 | subjectlabel objectlabel access |
273 | 332 | ||
@@ -277,14 +336,14 @@ object with objectlabel. If there is no rule no access is allowed. | |||
277 | 336 | ||
278 | Look for additional programs on http://schaufler-ca.com | 337 | Look for additional programs on http://schaufler-ca.com |
279 | 338 | ||
280 | From the Smack Whitepaper: | 339 | The Simplified Mandatory Access Control Kernel (Whitepaper) |
281 | 340 | =========================================================== | |
282 | The Simplified Mandatory Access Control Kernel | ||
283 | 341 | ||
284 | Casey Schaufler | 342 | Casey Schaufler |
285 | casey@schaufler-ca.com | 343 | casey@schaufler-ca.com |
286 | 344 | ||
287 | Mandatory Access Control | 345 | Mandatory Access Control |
346 | ------------------------ | ||
288 | 347 | ||
289 | Computer systems employ a variety of schemes to constrain how information is | 348 | Computer systems employ a variety of schemes to constrain how information is |
290 | shared among the people and services using the machine. Some of these schemes | 349 | shared among the people and services using the machine. Some of these schemes |
@@ -297,6 +356,7 @@ access control mechanisms because you don't have a choice regarding the users | |||
297 | or programs that have access to pieces of data. | 356 | or programs that have access to pieces of data. |
298 | 357 | ||
299 | Bell & LaPadula | 358 | Bell & LaPadula |
359 | --------------- | ||
300 | 360 | ||
301 | From the middle of the 1980's until the turn of the century Mandatory Access | 361 | From the middle of the 1980's until the turn of the century Mandatory Access |
302 | Control (MAC) was very closely associated with the Bell & LaPadula security | 362 | Control (MAC) was very closely associated with the Bell & LaPadula security |
@@ -306,6 +366,7 @@ within the Capital Beltway and Scandinavian supercomputer centers but was | |||
306 | often cited as failing to address general needs. | 366 | often cited as failing to address general needs. |
307 | 367 | ||
308 | Domain Type Enforcement | 368 | Domain Type Enforcement |
369 | ----------------------- | ||
309 | 370 | ||
310 | Around the turn of the century Domain Type Enforcement (DTE) became popular. | 371 | Around the turn of the century Domain Type Enforcement (DTE) became popular. |
311 | This scheme organizes users, programs, and data into domains that are | 372 | This scheme organizes users, programs, and data into domains that are |
@@ -316,6 +377,7 @@ necessary to provide a secure domain mapping leads to the scheme being | |||
316 | disabled or used in limited ways in the majority of cases. | 377 | disabled or used in limited ways in the majority of cases. |
317 | 378 | ||
318 | Smack | 379 | Smack |
380 | ----- | ||
319 | 381 | ||
320 | Smack is a Mandatory Access Control mechanism designed to provide useful MAC | 382 | Smack is a Mandatory Access Control mechanism designed to provide useful MAC |
321 | while avoiding the pitfalls of its predecessors. The limitations of Bell & | 383 | while avoiding the pitfalls of its predecessors. The limitations of Bell & |
@@ -326,46 +388,55 @@ Enforcement and avoided by defining access controls in terms of the access | |||
326 | modes already in use. | 388 | modes already in use. |
327 | 389 | ||
328 | Smack Terminology | 390 | Smack Terminology |
391 | ----------------- | ||
329 | 392 | ||
330 | The jargon used to talk about Smack will be familiar to those who have dealt | 393 | The jargon used to talk about Smack will be familiar to those who have dealt |
331 | with other MAC systems and shouldn't be too difficult for the uninitiated to | 394 | with other MAC systems and shouldn't be too difficult for the uninitiated to |
332 | pick up. There are four terms that are used in a specific way and that are | 395 | pick up. There are four terms that are used in a specific way and that are |
333 | especially important: | 396 | especially important: |
334 | 397 | ||
335 | Subject: A subject is an active entity on the computer system. | 398 | Subject: |
399 | A subject is an active entity on the computer system. | ||
336 | On Smack a subject is a task, which is in turn the basic unit | 400 | On Smack a subject is a task, which is in turn the basic unit |
337 | of execution. | 401 | of execution. |
338 | 402 | ||
339 | Object: An object is a passive entity on the computer system. | 403 | Object: |
404 | An object is a passive entity on the computer system. | ||
340 | On Smack, files of all types, IPC, and tasks can be objects. | 405 | On Smack, files of all types, IPC, and tasks can be objects. |
341 | 406 | ||
342 | Access: Any attempt by a subject to put information into or get | 407 | Access: |
408 | Any attempt by a subject to put information into or get | ||
343 | information from an object is an access. | 409 | information from an object is an access. |
344 | 410 | ||
345 | Label: Data that identifies the Mandatory Access Control | 411 | Label: |
412 | Data that identifies the Mandatory Access Control | ||
346 | characteristics of a subject or an object. | 413 | characteristics of a subject or an object. |
347 | 414 | ||
348 | These definitions are consistent with the traditional use in the security | 415 | These definitions are consistent with the traditional use in the security |
349 | community. There are also some terms from Linux that are likely to crop up: | 416 | community. There are also some terms from Linux that are likely to crop up: |
350 | 417 | ||
351 | Capability: A task that possesses a capability has permission to | 418 | Capability: |
419 | A task that possesses a capability has permission to | ||
352 | violate an aspect of the system security policy, as identified by | 420 | violate an aspect of the system security policy, as identified by |
353 | the specific capability. A task that possesses one or more | 421 | the specific capability. A task that possesses one or more |
354 | capabilities is a privileged task, whereas a task with no | 422 | capabilities is a privileged task, whereas a task with no |
355 | capabilities is an unprivileged task. | 423 | capabilities is an unprivileged task. |
356 | 424 | ||
357 | Privilege: A task that is allowed to violate the system security | 425 | Privilege: |
426 | A task that is allowed to violate the system security | ||
358 | policy is said to have privilege. As of this writing a task can | 427 | policy is said to have privilege. As of this writing a task can |
359 | have privilege either by possessing capabilities or by having an | 428 | have privilege either by possessing capabilities or by having an |
360 | effective user of root. | 429 | effective user of root. |
361 | 430 | ||
362 | Smack Basics | 431 | Smack Basics |
432 | ------------ | ||
363 | 433 | ||
364 | Smack is an extension to a Linux system. It enforces additional restrictions | 434 | Smack is an extension to a Linux system. It enforces additional restrictions |
365 | on what subjects can access which objects, based on the labels attached to | 435 | on what subjects can access which objects, based on the labels attached to |
366 | each of the subject and the object. | 436 | each of the subject and the object. |
367 | 437 | ||
368 | Labels | 438 | Labels |
439 | ~~~~~~ | ||
369 | 440 | ||
370 | Smack labels are ASCII character strings. They can be up to 255 characters | 441 | Smack labels are ASCII character strings. They can be up to 255 characters |
371 | long, but keeping them to twenty-three characters is recommended. | 442 | long, but keeping them to twenty-three characters is recommended. |
@@ -377,7 +448,7 @@ contain unprintable characters, the "/" (slash), the "\" (backslash), the "'" | |||
377 | (quote) and '"' (double-quote) characters. | 448 | (quote) and '"' (double-quote) characters. |
378 | Smack labels cannot begin with a '-'. This is reserved for special options. | 449 | Smack labels cannot begin with a '-'. This is reserved for special options. |
379 | 450 | ||
380 | There are some predefined labels: | 451 | There are some predefined labels:: |
381 | 452 | ||
382 | _ Pronounced "floor", a single underscore character. | 453 | _ Pronounced "floor", a single underscore character. |
383 | ^ Pronounced "hat", a single circumflex character. | 454 | ^ Pronounced "hat", a single circumflex character. |
@@ -390,14 +461,18 @@ of a process will usually be assigned by the system initialization | |||
390 | mechanism. | 461 | mechanism. |
391 | 462 | ||
392 | Access Rules | 463 | Access Rules |
464 | ~~~~~~~~~~~~ | ||
393 | 465 | ||
394 | Smack uses the traditional access modes of Linux. These modes are read, | 466 | Smack uses the traditional access modes of Linux. These modes are read, |
395 | execute, write, and occasionally append. There are a few cases where the | 467 | execute, write, and occasionally append. There are a few cases where the |
396 | access mode may not be obvious. These include: | 468 | access mode may not be obvious. These include: |
397 | 469 | ||
398 | Signals: A signal is a write operation from the subject task to | 470 | Signals: |
471 | A signal is a write operation from the subject task to | ||
399 | the object task. | 472 | the object task. |
400 | Internet Domain IPC: Transmission of a packet is considered a | 473 | |
474 | Internet Domain IPC: | ||
475 | Transmission of a packet is considered a | ||
401 | write operation from the source task to the destination task. | 476 | write operation from the source task to the destination task. |
402 | 477 | ||
403 | Smack restricts access based on the label attached to a subject and the label | 478 | Smack restricts access based on the label attached to a subject and the label |
@@ -417,6 +492,7 @@ order: | |||
417 | 7. Any other access is denied. | 492 | 7. Any other access is denied. |
418 | 493 | ||
419 | Smack Access Rules | 494 | Smack Access Rules |
495 | ~~~~~~~~~~~~~~~~~~ | ||
420 | 496 | ||
421 | With the isolation provided by Smack, access separation is simple. There are | 497 | With the isolation provided by Smack, access separation is simple. There are |
422 | many interesting cases where limited access by subjects to objects with | 498 | many interesting cases where limited access by subjects to objects with |
@@ -427,8 +503,9 @@ be "born" highly classified. To accommodate such schemes Smack includes a | |||
427 | mechanism for specifying rules allowing access between labels. | 503 | mechanism for specifying rules allowing access between labels. |
428 | 504 | ||
429 | Access Rule Format | 505 | Access Rule Format |
506 | ~~~~~~~~~~~~~~~~~~ | ||
430 | 507 | ||
431 | The format of an access rule is: | 508 | The format of an access rule is:: |
432 | 509 | ||
433 | subject-label object-label access | 510 | subject-label object-label access |
434 | 511 | ||
@@ -446,7 +523,7 @@ describe access modes: | |||
446 | 523 | ||
447 | Uppercase values for the specification letters are allowed as well. | 524 | Uppercase values for the specification letters are allowed as well. |
448 | Access mode specifications can be in any order. Examples of acceptable rules | 525 | Access mode specifications can be in any order. Examples of acceptable rules |
449 | are: | 526 | are:: |
450 | 527 | ||
451 | TopSecret Secret rx | 528 | TopSecret Secret rx |
452 | Secret Unclass R | 529 | Secret Unclass R |
@@ -456,7 +533,7 @@ are: | |||
456 | New Old rRrRr | 533 | New Old rRrRr |
457 | Closed Off - | 534 | Closed Off - |
458 | 535 | ||
459 | Examples of unacceptable rules are: | 536 | Examples of unacceptable rules are:: |
460 | 537 | ||
461 | Top Secret Secret rx | 538 | Top Secret Secret rx |
462 | Ace Ace r | 539 | Ace Ace r |
@@ -469,6 +546,7 @@ access specifications. The dash is a placeholder, so "a-r" is the same | |||
469 | as "ar". A lone dash is used to specify that no access should be allowed. | 546 | as "ar". A lone dash is used to specify that no access should be allowed. |
470 | 547 | ||
471 | Applying Access Rules | 548 | Applying Access Rules |
549 | ~~~~~~~~~~~~~~~~~~~~~ | ||
472 | 550 | ||
473 | The developers of Linux rarely define new sorts of things, usually importing | 551 | The developers of Linux rarely define new sorts of things, usually importing |
474 | schemes and concepts from other systems. Most often, the other systems are | 552 | schemes and concepts from other systems. Most often, the other systems are |
@@ -511,6 +589,7 @@ one process to another requires that the sender have write access to the | |||
511 | receiver. The receiver is not required to have read access to the sender. | 589 | receiver. The receiver is not required to have read access to the sender. |
512 | 590 | ||
513 | Setting Access Rules | 591 | Setting Access Rules |
592 | ~~~~~~~~~~~~~~~~~~~~ | ||
514 | 593 | ||
515 | The configuration file /etc/smack/accesses contains the rules to be set at | 594 | The configuration file /etc/smack/accesses contains the rules to be set at |
516 | system startup. The contents are written to the special file | 595 | system startup. The contents are written to the special file |
@@ -520,6 +599,7 @@ one rule, with the most recently specified overriding any earlier | |||
520 | specification. | 599 | specification. |
521 | 600 | ||
522 | Task Attribute | 601 | Task Attribute |
602 | ~~~~~~~~~~~~~~ | ||
523 | 603 | ||
524 | The Smack label of a process can be read from /proc/<pid>/attr/current. A | 604 | The Smack label of a process can be read from /proc/<pid>/attr/current. A |
525 | process can read its own Smack label from /proc/self/attr/current. A | 605 | process can read its own Smack label from /proc/self/attr/current. A |
@@ -527,12 +607,14 @@ privileged process can change its own Smack label by writing to | |||
527 | /proc/self/attr/current but not the label of another process. | 607 | /proc/self/attr/current but not the label of another process. |
528 | 608 | ||
529 | File Attribute | 609 | File Attribute |
610 | ~~~~~~~~~~~~~~ | ||
530 | 611 | ||
531 | The Smack label of a filesystem object is stored as an extended attribute | 612 | The Smack label of a filesystem object is stored as an extended attribute |
532 | named SMACK64 on the file. This attribute is in the security namespace. It can | 613 | named SMACK64 on the file. This attribute is in the security namespace. It can |
533 | only be changed by a process with privilege. | 614 | only be changed by a process with privilege. |
534 | 615 | ||
535 | Privilege | 616 | Privilege |
617 | ~~~~~~~~~ | ||
536 | 618 | ||
537 | A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged. | 619 | A process with CAP_MAC_OVERRIDE or CAP_MAC_ADMIN is privileged. |
538 | CAP_MAC_OVERRIDE allows the process access to objects it would | 620 | CAP_MAC_OVERRIDE allows the process access to objects it would |
@@ -540,6 +622,7 @@ be denied otherwise. CAP_MAC_ADMIN allows a process to change | |||
540 | Smack data, including rules and attributes. | 622 | Smack data, including rules and attributes. |
541 | 623 | ||
542 | Smack Networking | 624 | Smack Networking |
625 | ~~~~~~~~~~~~~~~~ | ||
543 | 626 | ||
544 | As mentioned before, Smack enforces access control on network protocol | 627 | As mentioned before, Smack enforces access control on network protocol |
545 | transmissions. Every packet sent by a Smack process is tagged with its Smack | 628 | transmissions. Every packet sent by a Smack process is tagged with its Smack |
@@ -551,6 +634,7 @@ packet has write access to the receiving process and if that is not the case | |||
551 | the packet is dropped. | 634 | the packet is dropped. |
552 | 635 | ||
553 | CIPSO Configuration | 636 | CIPSO Configuration |
637 | ~~~~~~~~~~~~~~~~~~~ | ||
554 | 638 | ||
555 | It is normally unnecessary to specify the CIPSO configuration. The default | 639 | It is normally unnecessary to specify the CIPSO configuration. The default |
556 | values used by the system handle all internal cases. Smack will compose CIPSO | 640 | values used by the system handle all internal cases. Smack will compose CIPSO |
@@ -571,13 +655,13 @@ discarded. The DOI is 3 by default. The value can be read from | |||
571 | The label and category set are mapped to a Smack label as defined in | 655 | The label and category set are mapped to a Smack label as defined in |
572 | /etc/smack/cipso. | 656 | /etc/smack/cipso. |
573 | 657 | ||
574 | A Smack/CIPSO mapping has the form: | 658 | A Smack/CIPSO mapping has the form:: |
575 | 659 | ||
576 | smack level [category [category]*] | 660 | smack level [category [category]*] |
577 | 661 | ||
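A short C sketch like the following can parse one line of the Smack/CIPSO mapping form shown above. The struct layout, the 32-category limit and the function name are assumptions made for illustration. The sketch checks syntax only and does not validate level or category ranges.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CATS 32   /* arbitrary limit chosen for this sketch */

struct cipso_map {
    char label[256];
    long level;
    int  ncats;
    long cats[MAX_CATS];
};

/*
 * Parse one "smack level [category [category]*]" line.
 * Returns 0 on success, -1 if the label or level is missing, a field is
 * not a number, or there are more than MAX_CATS categories.
 */
int parse_cipso_map(const char *line, struct cipso_map *m)
{
    const char *p = line;
    char *end;
    int used = 0;

    memset(m, 0, sizeof(*m));

    /* Smack label: the first whitespace-delimited word. */
    if (sscanf(p, "%255s%n", m->label, &used) != 1)
        return -1;
    p += used;

    /* CIPSO level: required. strtol skips leading whitespace. */
    m->level = strtol(p, &end, 10);
    if (end == p)
        return -1;
    p = end;

    /* Zero or more category numbers. */
    for (;;) {
        long v = strtol(p, &end, 10);
        if (end == p)
            break;
        if (m->ncats == MAX_CATS)
            return -1;
        m->cats[m->ncats++] = v;
        p = end;
    }

    /* Anything other than trailing whitespace is an error. */
    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? 0 : -1;
}
```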
578 | Smack does not expect the level or category sets to be related in any | 662 | Smack does not expect the level or category sets to be related in any |
579 | particular way and does not assume or assign accesses based on them. Some | 663 | particular way and does not assume or assign accesses based on them. Some |
580 | examples of mappings: | 664 | examples of mappings:: |
581 | 665 | ||
582 | TopSecret 7 | 666 | TopSecret 7 |
583 | TS:A,B 7 1 2 | 667 | TS:A,B 7 1 2 |
@@ -597,25 +681,30 @@ value can be read from /sys/fs/smackfs/direct and changed by writing to | |||
597 | /sys/fs/smackfs/direct. | 681 | /sys/fs/smackfs/direct. |
598 | 682 | ||
599 | Socket Attributes | 683 | Socket Attributes |
684 | ~~~~~~~~~~~~~~~~~ | ||
600 | 685 | ||
601 | There are two attributes that are associated with sockets. These attributes | 686 | There are two attributes that are associated with sockets. These attributes |
602 | can only be set by privileged tasks, but any task can read them for their own | 687 | can only be set by privileged tasks, but any task can read them for their own |
603 | sockets. | 688 | sockets. |
604 | 689 | ||
605 | SMACK64IPIN: The Smack label of the task object. A privileged | 690 | SMACK64IPIN: |
691 | The Smack label of the task object. A privileged | ||
606 | program that will enforce policy may set this to the star label. | 692 | program that will enforce policy may set this to the star label. |
607 | 693 | ||
608 | SMACK64IPOUT: The Smack label transmitted with outgoing packets. | 694 | SMACK64IPOUT: |
695 | The Smack label transmitted with outgoing packets. | ||
609 | A privileged program may set this to match the label of another | 696 | A privileged program may set this to match the label of another |
610 | task with which it hopes to communicate. | 697 | task with which it hopes to communicate. |
611 | 698 | ||
612 | Smack Netlabel Exceptions | 699 | Smack Netlabel Exceptions |
700 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
613 | 701 | ||
614 | You will often find that your labeled application has to talk to the outside, | 702 | You will often find that your labeled application has to talk to the outside, |
615 | unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel | 703 | unlabeled world. To do this there's a special file /sys/fs/smackfs/netlabel |
616 | where you can add some exceptions in the form of : | 704 | where you can add some exceptions in the form of:: |
617 | @IP1 LABEL1 or | 705 | |
618 | @IP2/MASK LABEL2 | 706 | @IP1 LABEL1 or |
707 | @IP2/MASK LABEL2 | ||
619 | 708 | ||
620 | It means that your application will have unlabeled access to @IP1 if it has | 709 | It means that your application will have unlabeled access to @IP1 if it has |
621 | write access on LABEL1, and access to the subnet @IP2/MASK if it has write | 710 | write access on LABEL1, and access to the subnet @IP2/MASK if it has write |
@@ -624,28 +713,32 @@ access on LABEL2. | |||
624 | Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask | 713 | Entries in the /sys/fs/smackfs/netlabel file are matched by longest mask |
625 | first, like in classless IPv4 routing. | 714 | first, like in classless IPv4 routing. |
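The longest-mask-first rule can be modelled in user space as a longest-prefix match over the exception table. This is a C sketch, not the kernel's implementation. The netlbl_entry struct and both function names are hypothetical, and only IPv4 is handled.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* One exception line: an IPv4 host or subnet and its label ("@", "-CIPSO", ...). */
struct netlbl_entry {
    uint32_t    addr;    /* host byte order */
    int         prefix;  /* mask length 0..32; a bare address means 32 */
    const char *label;
};

static uint32_t prefix_mask(int prefix)
{
    /* Shifting a 32-bit value by 32 is undefined, so handle /0 separately. */
    return prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
}

/* Parse "a.b.c.d" or "a.b.c.d/len" into e. Returns 0 on success, -1 on error. */
int netlbl_parse(struct netlbl_entry *e, const char *cidr, const char *label)
{
    char buf[INET_ADDRSTRLEN];
    const char *slash = strchr(cidr, '/');
    size_t len = slash ? (size_t)(slash - cidr) : strlen(cidr);
    struct in_addr a;

    if (len >= sizeof(buf))
        return -1;
    memcpy(buf, cidr, len);
    buf[len] = '\0';
    if (inet_pton(AF_INET, buf, &a) != 1)
        return -1;
    if (slash != NULL) {
        char *end;
        long p = strtol(slash + 1, &end, 10);

        if (end == slash + 1 || *end != '\0' || p < 0 || p > 32)
            return -1;
        e->prefix = (int)p;
    } else {
        e->prefix = 32;
    }
    e->addr = ntohl(a.s_addr);
    e->label = label;
    return 0;
}

/* Return the label of the matching entry with the longest mask, or NULL. */
const char *netlbl_lookup(const struct netlbl_entry *tbl, size_t n, const char *ip)
{
    struct in_addr a;
    uint32_t host;
    const char *best = NULL;
    int best_prefix = -1;
    size_t i;

    if (inet_pton(AF_INET, ip, &a) != 1)
        return NULL;
    host = ntohl(a.s_addr);
    for (i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].prefix);
        if ((host & m) == (tbl[i].addr & m) && tbl[i].prefix > best_prefix) {
            best = tbl[i].label;
            best_prefix = tbl[i].prefix;
        }
    }
    return best;
}
```

Consider the three entries from the CIPSO example in this section: 127.0.0.1 -CIPSO, 192.168.0.0/16 -CIPSO, and 0.0.0.0/0 @. With these loaded, 192.168.4.7 resolves to -CIPSO because the /16 entry is longer than /0. By contrast, 8.8.8.8 falls through to @.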
626 | 715 | ||
627 | A special label '@' and an option '-CIPSO' can be used there : | 716 | A special label '@' and an option '-CIPSO' can be used there:: |
628 | @ means Internet, any application with any label has access to it | ||
629 | -CIPSO means standard CIPSO networking | ||
630 | 717 | ||
631 | If you don't know what CIPSO is and don't plan to use it, you can just do : | 718 | @ means Internet, any application with any label has access to it |
632 | echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel | 719 | -CIPSO means standard CIPSO networking |
633 | echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel | 720 | |
721 | If you don't know what CIPSO is and don't plan to use it, you can just do:: | ||
722 | |||
723 | echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel | ||
724 | echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel | ||
634 | 725 | ||
635 | If you use CIPSO on your 192.168.0.0/16 local network and also need unlabeled | 726 | If you use CIPSO on your 192.168.0.0/16 local network and also need unlabeled |
636 | Internet access, you can have : | 727 | Internet access, you can have:: |
637 | echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel | ||
638 | echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel | ||
639 | echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel | ||
640 | 728 | ||
729 | echo 127.0.0.1 -CIPSO > /sys/fs/smackfs/netlabel | ||
730 | echo 192.168.0.0/16 -CIPSO > /sys/fs/smackfs/netlabel | ||
731 | echo 0.0.0.0/0 @ > /sys/fs/smackfs/netlabel | ||
641 | 732 | ||
642 | Writing Applications for Smack | 733 | Writing Applications for Smack |
734 | ------------------------------ | ||
643 | 735 | ||
644 | There are three sorts of applications that will run on a Smack system. How an | 736 | There are three sorts of applications that will run on a Smack system. How an |
645 | application interacts with Smack will determine what it will have to do to | 737 | application interacts with Smack will determine what it will have to do to |
646 | work properly under Smack. | 738 | work properly under Smack. |
647 | 739 | ||
648 | Smack Ignorant Applications | 740 | Smack Ignorant Applications |
741 | --------------------------- | ||
649 | 742 | ||
650 | By far the majority of applications have no reason whatever to care about the | 743 | By far the majority of applications have no reason whatever to care about the |
651 | unique properties of Smack. Since invoking a program has no impact on the | 744 | unique properties of Smack. Since invoking a program has no impact on the |
@@ -653,12 +746,14 @@ Smack label associated with the process the only concern likely to arise is | |||
653 | whether the process has execute access to the program. | 746 | whether the process has execute access to the program. |
654 | 747 | ||
655 | Smack Relevant Applications | 748 | Smack Relevant Applications |
749 | --------------------------- | ||
656 | 750 | ||
657 | Some programs can be improved by teaching them about Smack, but do not make | 751 | Some programs can be improved by teaching them about Smack, but do not make |
658 | any security decisions themselves. The utility ls(1) is one example of such a | 752 | any security decisions themselves. The utility ls(1) is one example of such a |
659 | program. | 753 | program. |
660 | 754 | ||
661 | Smack Enforcing Applications | 755 | Smack Enforcing Applications |
756 | ---------------------------- | ||
662 | 757 | ||
663 | These are special programs that not only know about Smack, but participate in | 758 | These are special programs that not only know about Smack, but participate in |
664 | the enforcement of system policy. In most cases these are the programs that | 759 | the enforcement of system policy. In most cases these are the programs that |
@@ -666,15 +761,16 @@ set up user sessions. There are also network services that provide information | |||
666 | to processes running with various labels. | 761 | to processes running with various labels. |
667 | 762 | ||
668 | File System Interfaces | 763 | File System Interfaces |
764 | ---------------------- | ||
669 | 765 | ||
670 | Smack maintains labels on file system objects using extended attributes. The | 766 | Smack maintains labels on file system objects using extended attributes. The |
671 | Smack label of a file, directory, or other file system object can be obtained | 767 | Smack label of a file, directory, or other file system object can be obtained |
672 | using getxattr(2). | 768 | using getxattr(2):: |
673 | 769 | ||
674 | len = getxattr("/", "security.SMACK64", value, sizeof (value)); | 770 | len = getxattr("/", "security.SMACK64", value, sizeof (value)); |
675 | 771 | ||
676 | will put the Smack label of the root directory into value. A privileged | 772 | will put the Smack label of the root directory into value. A privileged |
677 | process can set the Smack label of a file system object with setxattr(2). | 773 | process can set the Smack label of a file system object with setxattr(2):: |
678 | 774 | ||
679 | len = strlen("Rubble"); | 775 | len = strlen("Rubble"); |
680 | rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0); | 776 | rc = setxattr("/foo", "security.SMACK64", "Rubble", len, 0); |
@@ -683,17 +779,18 @@ will set the Smack label of /foo to "Rubble" if the program has appropriate | |||
683 | privilege. | 779 | privilege. |
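The two snippets above can be wrapped into small helpers. This sketch assumes glibc's <sys/xattr.h>, and the helper names are hypothetical. getxattr(2) returns the raw attribute bytes without a terminating NUL, so the helper reserves one byte of the buffer for it.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Fetch the Smack label (security.SMACK64) of a file system object.
 * One byte of buf is reserved for the NUL terminator. Returns the label
 * length, or -1 with errno set (for instance ENODATA when the object has
 * no such attribute).
 */
ssize_t get_smack_label(const char *path, char *buf, size_t len)
{
    ssize_t n;

    if (len == 0) {
        errno = EINVAL;
        return -1;
    }
    n = getxattr(path, "security.SMACK64", buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Set the Smack label of path; this fails unless the caller is privileged. */
int set_smack_label(const char *path, const char *label)
{
    return setxattr(path, "security.SMACK64", label, strlen(label), 0);
}
```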
684 | 780 | ||
685 | Socket Interfaces | 781 | Socket Interfaces |
782 | ----------------- | ||
686 | 783 | ||
687 | The socket attributes can be read using fgetxattr(2). | 784 | The socket attributes can be read using fgetxattr(2). |
688 | 785 | ||
689 | A privileged process can set the Smack label of outgoing packets with | 786 | A privileged process can set the Smack label of outgoing packets with |
690 | fsetxattr(2). | 787 | fsetxattr(2):: |
691 | 788 | ||
692 | len = strlen("Rubble"); | 789 | len = strlen("Rubble"); |
693 | rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0); | 790 | rc = fsetxattr(fd, "security.SMACK64IPOUT", "Rubble", len, 0); |
694 | 791 | ||
695 | will set the Smack label "Rubble" on packets going out from the socket if the | 792 | will set the Smack label "Rubble" on packets going out from the socket if the |
696 | program has appropriate privilege. | 793 | program has appropriate privilege:: |
697 | 794 | ||
698 | rc = fsetxattr(fd, "security.SMACK64IPIN", "*", strlen("*"), 0); | 795 | rc = fsetxattr(fd, "security.SMACK64IPIN", "*", strlen("*"), 0); |
699 | 796 | ||
@@ -701,33 +798,40 @@ will set the Smack label "*" as the object label against which incoming | |||
701 | packets will be checked if the program has appropriate privilege. | 798 | packets will be checked if the program has appropriate privilege. |
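A single helper can set both socket attributes. In this minimal sketch, the name smack_label_socket is hypothetical. Passing NULL for either label leaves that attribute untouched. The calls are expected to fail when the process lacks the privilege described above.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Label a socket's traffic: "out" is sent with outgoing packets
 * (SMACK64IPOUT) and "in" is the object label that incoming packets are
 * checked against (SMACK64IPIN). Either may be NULL to leave it unchanged.
 * Returns 0 on success or -1 with errno set by fsetxattr(2).
 */
int smack_label_socket(int fd, const char *in, const char *out)
{
    if (out != NULL &&
        fsetxattr(fd, "security.SMACK64IPOUT", out, strlen(out), 0) < 0)
        return -1;
    if (in != NULL &&
        fsetxattr(fd, "security.SMACK64IPIN", in, strlen(in), 0) < 0)
        return -1;
    return 0;
}
```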
702 | 799 | ||
703 | Administration | 800 | Administration |
801 | -------------- | ||
704 | 802 | ||
705 | Smack supports some mount options: | 803 | Smack supports some mount options: |
706 | 804 | ||
707 | smackfsdef=label: specifies the label to give files that lack | 805 | smackfsdef=label: |
806 | specifies the label to give files that lack | ||
708 | the Smack label extended attribute. | 807 | the Smack label extended attribute. |
709 | 808 | ||
710 | smackfsroot=label: specifies the label to assign the root of the | 809 | smackfsroot=label: |
810 | specifies the label to assign the root of the | ||
711 | file system if it lacks the Smack extended attribute. | 811 | file system if it lacks the Smack extended attribute. |
712 | 812 | ||
713 | smackfshat=label: specifies a label that must have read access to | 813 | smackfshat=label: |
814 | specifies a label that must have read access to | ||
714 | all labels set on the filesystem. Not yet enforced. | 815 | all labels set on the filesystem. Not yet enforced. |
715 | 816 | ||
716 | smackfsfloor=label: specifies a label to which all labels set on the | 817 | smackfsfloor=label: |
818 | specifies a label to which all labels set on the | ||
717 | filesystem must have read access. Not yet enforced. | 819 | filesystem must have read access. Not yet enforced. |
718 | 820 | ||
719 | These mount options apply to all file system types. | 821 | These mount options apply to all file system types. |
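These options can also be passed programmatically through the data argument of mount(2). The following is a minimal sketch, assuming the options use the same comma-separated form as on a mount command line. The helper names are hypothetical, and mounting requires privilege.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Build a mount(2) data string setting smackfsdef and smackfsroot.
 * Labels containing ',' are rejected because ',' separates options.
 * Returns 0 on success, -1 on a bad label or if buf is too small.
 */
int smack_mount_opts(char *buf, size_t len, const char *def, const char *root)
{
    int n;

    if (strchr(def, ',') != NULL || strchr(root, ',') != NULL)
        return -1;
    n = snprintf(buf, len, "smackfsdef=%s,smackfsroot=%s", def, root);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount src on target with those options; needs mount privilege. */
int mount_with_smack(const char *src, const char *target, const char *fstype,
                     const char *def, const char *root)
{
    char opts[512];

    if (smack_mount_opts(opts, sizeof(opts), def, root) < 0)
        return -1;
    return mount(src, target, fstype, 0, opts);
}
```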
720 | 822 | ||
721 | Smack auditing | 823 | Smack auditing |
824 | -------------- | ||
722 | 825 | ||
723 | If you want Smack auditing of security events, you need to set CONFIG_AUDIT | 826 | If you want Smack auditing of security events, you need to set CONFIG_AUDIT |
724 | in your kernel configuration. | 827 | in your kernel configuration. |
725 | By default, all denied events will be audited. You can change this behavior by | 828 | By default, all denied events will be audited. You can change this behavior by |
726 | writing a single character to the /sys/fs/smackfs/logging file : | 829 | writing a single character to the /sys/fs/smackfs/logging file:: |
727 | 0 : no logging | 830 | |
728 | 1 : log denied (default) | 831 | 0 : no logging |
729 | 2 : log accepted | 832 | 1 : log denied (default) |
730 | 3 : log denied & accepted | 833 | 2 : log accepted |
834 | 3 : log denied & accepted | ||
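The four logging values can be read as a two-bit mask. Bit 0 selects denied events and bit 1 selects accepted ones, so 1 | 2 = 3 logs both. The sketch below computes the character from two flags and writes it to the logging file. The function names are hypothetical, the kernel's own constant names may differ, and the write requires privilege.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Map (log denied, log accepted) to the single character '0'..'3'. */
char smack_logging_value(int log_denied, int log_accepted)
{
    return (char)('0' + ((log_denied ? 1 : 0) | (log_accepted ? 2 : 0)));
}

/* Write the setting, e.g. to /sys/fs/smackfs/logging. Returns 0 or -1. */
int smack_set_logging(const char *path, int log_denied, int log_accepted)
{
    char c = smack_logging_value(log_denied, log_accepted);
    int fd = open(path, O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, &c, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```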
731 | 835 | ||
732 | Events are logged as 'key=value' pairs; for each event you will get at least | 836 | Events are logged as 'key=value' pairs; for each event you will get at least |
733 | the subject, the object, the rights requested, the action, the kernel function | 837 | the subject, the object, the rights requested, the action, the kernel function |
@@ -735,6 +839,7 @@ that triggered the event, plus other pairs depending on the type of event | |||
735 | audited. | 839 | audited. |
736 | 840 | ||
737 | Bringup Mode | 841 | Bringup Mode |
842 | ------------ | ||
738 | 843 | ||
739 | Bringup mode provides logging features that can make application | 844 | Bringup mode provides logging features that can make application |
740 | configuration and system bringup easier. Configure the kernel with | 845 | configuration and system bringup easier. Configure the kernel with |
diff --git a/Documentation/security/Yama.txt b/Documentation/admin-guide/LSM/Yama.rst index d9ee7d7a6c7f..13468ea696b7 100644 --- a/Documentation/security/Yama.txt +++ b/Documentation/admin-guide/LSM/Yama.rst | |||
@@ -1,13 +1,14 @@ | |||
1 | ==== | ||
2 | Yama | ||
3 | ==== | ||
4 | |||
1 | Yama is a Linux Security Module that collects system-wide DAC security | 5 | Yama is a Linux Security Module that collects system-wide DAC security |
2 | protections that are not handled by the core kernel itself. This is | 6 | protections that are not handled by the core kernel itself. This is |
3 | selectable at build-time with CONFIG_SECURITY_YAMA, and can be controlled | 7 | selectable at build-time with ``CONFIG_SECURITY_YAMA``, and can be controlled |
4 | at run-time through sysctls in /proc/sys/kernel/yama: | 8 | at run-time through sysctls in ``/proc/sys/kernel/yama``: |
5 | |||
6 | - ptrace_scope | ||
7 | 9 | ||
8 | ============================================================== | 10 | ptrace_scope |
9 | 11 | ============ | |
10 | ptrace_scope: | ||
11 | 12 | ||
12 | As Linux grows in popularity, it will become a larger target for | 13 | As Linux grows in popularity, it will become a larger target for |
13 | malware. One particularly troubling weakness of the Linux process | 14 | malware. One particularly troubling weakness of the Linux process |
@@ -25,47 +26,49 @@ exist and remain possible if ptrace is allowed to operate as before. | |||
25 | Since ptrace is not commonly used by non-developers and non-admins, system | 26 | Since ptrace is not commonly used by non-developers and non-admins, system |
26 | builders should be allowed the option to disable this debugging system. | 27 | builders should be allowed the option to disable this debugging system. |
27 | 28 | ||
28 | For a solution, some applications use prctl(PR_SET_DUMPABLE, ...) to | 29 | For a solution, some applications use ``prctl(PR_SET_DUMPABLE, ...)`` to |
29 | specifically disallow such ptrace attachment (e.g. ssh-agent), but many | 30 | specifically disallow such ptrace attachment (e.g. ssh-agent), but many |
30 | do not. A more general solution is to only allow ptrace directly from a | 31 | do not. A more general solution is to only allow ptrace directly from a |
31 | parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still | 32 | parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still |
32 | work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID" | 33 | work), or with ``CAP_SYS_PTRACE`` (i.e. "gdb --pid=PID", and "strace -p PID" |
33 | still work as root). | 34 | still work as root). |
34 | 35 | ||
35 | In mode 1, software that has defined application-specific relationships | 36 | In mode 1, software that has defined application-specific relationships |
36 | between a debugging process and its inferior (crash handlers, etc.) | 37 | between a debugging process and its inferior (crash handlers, etc.) |
37 | can use prctl(PR_SET_PTRACER, pid, ...). An inferior can declare which | 38 | can use ``prctl(PR_SET_PTRACER, pid, ...)``. An inferior can declare which |
38 | other process (and its descendants) are allowed to call PTRACE_ATTACH | 39 | other process (and its descendants) are allowed to call ``PTRACE_ATTACH`` |
39 | against it. Only one such declared debugging process can exist for | 40 | against it. Only one such declared debugging process can exist for |
40 | each inferior at a time. For example, this is used by KDE, Chromium, and | 41 | each inferior at a time. For example, this is used by KDE, Chromium, and |
41 | Firefox's crash handlers, and by Wine for allowing only Wine processes | 42 | Firefox's crash handlers, and by Wine for allowing only Wine processes |
42 | to ptrace each other. If a process wishes to entirely disable these ptrace | 43 | to ptrace each other. If a process wishes to entirely disable these ptrace |
43 | restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...) | 44 | restrictions, it can call ``prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)`` |
44 | so that any otherwise allowed process (even those in external pid namespaces) | 45 | so that any otherwise allowed process (even those in external pid namespaces) |
45 | may attach. | 46 | may attach. |
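The following is a minimal C sketch of the two prctl(2) calls described above. It assumes a libc whose <sys/prctl.h> provides PR_SET_PTRACER and PR_SET_PTRACER_ANY; glibc takes them from <linux/prctl.h>. Passing 0 as the debugger is assumed to clear an earlier declaration. Without Yama, the kernel is expected to reject the option with an error. The helper names are hypothetical.

```c
#include <sys/prctl.h>
#include <sys/types.h>

/*
 * Declare which process (and its descendants) may PTRACE_ATTACH to the
 * caller when ptrace_scope is 1. Passing 0 clears any earlier declaration.
 * Returns 0 on success or -1 with errno set.
 */
int allow_ptracer(pid_t debugger)
{
    return prctl(PR_SET_PTRACER, (unsigned long)debugger, 0, 0, 0);
}

/* Opt out of the mode 1 restriction for any otherwise allowed process. */
int allow_any_ptracer(void)
{
    return prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, 0, 0, 0);
}
```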
46 | 47 | ||
47 | The sysctl settings (writable only with CAP_SYS_PTRACE) are: | 48 | The sysctl settings (writable only with ``CAP_SYS_PTRACE``) are: |
48 | 49 | ||
49 | 0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other | 50 | 0 - classic ptrace permissions: |
51 | a process can ``PTRACE_ATTACH`` to any other | ||
50 | process running under the same uid, as long as it is dumpable (i.e. | 52 | process running under the same uid, as long as it is dumpable (i.e. |
51 | did not transition uids, start privileged, or have called | 53 | did not transition uids, start privileged, or have called |
52 | prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is | 54 | ``prctl(PR_SET_DUMPABLE...)`` already). Similarly, ``PTRACE_TRACEME`` is |
53 | unchanged. | 55 | unchanged. |
54 | 56 | ||
55 | 1 - restricted ptrace: a process must have a predefined relationship | 57 | 1 - restricted ptrace: |
56 | with the inferior it wants to call PTRACE_ATTACH on. By default, | 58 | a process must have a predefined relationship |
59 | with the inferior it wants to call ``PTRACE_ATTACH`` on. By default, | ||
57 | this relationship is that of only its descendants when the above | 60 | this relationship is that of only its descendants when the above |
58 | classic criteria are also met. To change the relationship, an | 61 | classic criteria are also met. To change the relationship, an |
59 | inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare | 62 | inferior can call ``prctl(PR_SET_PTRACER, debugger, ...)`` to declare |
60 | an allowed debugger PID to call PTRACE_ATTACH on the inferior. | 63 | an allowed debugger PID to call ``PTRACE_ATTACH`` on the inferior. |
61 | Using PTRACE_TRACEME is unchanged. | 64 | Using ``PTRACE_TRACEME`` is unchanged. |
62 | 65 | ||
63 | 2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace | 66 | 2 - admin-only attach: |
64 | with PTRACE_ATTACH, or through children calling PTRACE_TRACEME. | 67 | only processes with ``CAP_SYS_PTRACE`` may use ptrace |
68 | with ``PTRACE_ATTACH``, or through children calling ``PTRACE_TRACEME``. | ||
65 | 69 | ||
66 | 3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via | 70 | 3 - no attach: |
67 | PTRACE_TRACEME. Once set, this sysctl value cannot be changed. | 71 | no processes may use ptrace with ``PTRACE_ATTACH`` nor via |
72 | ``PTRACE_TRACEME``. Once set, this sysctl value cannot be changed. | ||
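A program can read the sysctl to check which of these modes is in effect. The sketch below parses the value and maps it to the names used in this list. The helper names are hypothetical. A missing file is taken to mean that Yama is not built in or that /proc is not mounted.

```c
#include <stdio.h>

/* Names of the documented ptrace_scope modes, or NULL for other values. */
const char *yama_scope_name(int scope)
{
    static const char *const names[] = {
        "classic ptrace permissions",
        "restricted ptrace",
        "admin-only attach",
        "no attach",
    };

    return (scope >= 0 && scope <= 3) ? names[scope] : NULL;
}

/*
 * Read the current mode, normally from /proc/sys/kernel/yama/ptrace_scope.
 * Returns -1 if the file is missing or does not contain a number.
 */
int yama_read_scope(const char *path)
{
    FILE *f = fopen(path, "r");
    int scope = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}
```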
68 | 73 | ||
69 | The original children-only logic was based on the restrictions in grsecurity. | 74 | The original children-only logic was based on the restrictions in grsecurity. |
70 | |||
71 | ============================================================== | ||
diff --git a/Documentation/security/apparmor.txt b/Documentation/admin-guide/LSM/apparmor.rst index 93c1fd7d0635..3e9734bd0e05 100644 --- a/Documentation/security/apparmor.txt +++ b/Documentation/admin-guide/LSM/apparmor.rst | |||
@@ -1,4 +1,9 @@ | |||
1 | --- What is AppArmor? --- | 1 | ======== |
2 | AppArmor | ||
3 | ======== | ||
4 | |||
5 | What is AppArmor? | ||
6 | ================= | ||
2 | 7 | ||
3 | AppArmor is a MAC-style security extension for the Linux kernel. It implements | 8 | AppArmor is a MAC-style security extension for the Linux kernel. It implements |
4 | a task centered policy, with task "profiles" being created and loaded | 9 | a task centered policy, with task "profiles" being created and loaded |
@@ -6,34 +11,41 @@ from user space. Tasks on the system that do not have a profile defined for | |||
6 | them run in an unconfined state which is equivalent to standard Linux DAC | 11 | them run in an unconfined state which is equivalent to standard Linux DAC |
7 | permissions. | 12 | permissions. |
8 | 13 | ||
9 | --- How to enable/disable --- | 14 | How to enable/disable |
15 | ===================== | ||
16 | |||
17 | set ``CONFIG_SECURITY_APPARMOR=y`` | ||
10 | 18 | ||
11 | set CONFIG_SECURITY_APPARMOR=y | 19 | If AppArmor should be selected as the default security module then set:: |
12 | 20 | ||
13 | If AppArmor should be selected as the default security module then | 21 | CONFIG_DEFAULT_SECURITY="apparmor" |
14 | set CONFIG_DEFAULT_SECURITY="apparmor" | 22 | CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1 |
15 | and CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1 | ||
16 | 23 | ||
17 | Build the kernel | 24 | Build the kernel |
18 | 25 | ||
19 | If AppArmor is not the default security module it can be enabled by passing | 26 | If AppArmor is not the default security module it can be enabled by passing |
20 | security=apparmor on the kernel's command line. | 27 | ``security=apparmor`` on the kernel's command line. |
21 | 28 | ||
22 | If AppArmor is the default security module it can be disabled by passing | 29 | If AppArmor is the default security module it can be disabled by passing |
23 | apparmor=0, security=XXXX (where XXX is a valid security module), on the | 30 | ``apparmor=0, security=XXXX`` (where ``XXXX`` is a valid security module), on the |
24 | kernel's command line | 31 | kernel's command line. |
25 | 32 | ||
26 | For AppArmor to enforce any restrictions beyond standard Linux DAC permissions | 33 | For AppArmor to enforce any restrictions beyond standard Linux DAC permissions |
27 | policy must be loaded into the kernel from user space (see the Documentation | 34 | policy must be loaded into the kernel from user space (see the Documentation |
28 | and tools links). | 35 | and tools links). |
29 | 36 | ||
30 | --- Documentation --- | 37 | Documentation |
38 | ============= | ||
31 | 39 | ||
32 | Documentation can be found on the wiki. | 40 | Documentation can be found on the wiki, linked below. |
33 | 41 | ||
34 | --- Links --- | 42 | Links |
43 | ===== | ||
35 | 44 | ||
36 | Mailing List - apparmor@lists.ubuntu.com | 45 | Mailing List - apparmor@lists.ubuntu.com |
46 | |||
37 | Wiki - http://apparmor.wiki.kernel.org/ | 47 | Wiki - http://apparmor.wiki.kernel.org/ |
48 | |||
38 | User space tools - https://launchpad.net/apparmor | 49 | User space tools - https://launchpad.net/apparmor |
50 | |||
39 | Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git | 51 | Kernel module - git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git |
diff --git a/Documentation/security/LSM.txt b/Documentation/admin-guide/LSM/index.rst index c2683f28ed36..c980dfe9abf1 100644 --- a/Documentation/security/LSM.txt +++ b/Documentation/admin-guide/LSM/index.rst | |||
@@ -1,12 +1,13 @@ | |||
1 | Linux Security Module framework | 1 | =========================== |
2 | ------------------------------- | 2 | Linux Security Module Usage |
3 | =========================== | ||
3 | 4 | ||
4 | The Linux Security Module (LSM) framework provides a mechanism for | 5 | The Linux Security Module (LSM) framework provides a mechanism for |
5 | various security checks to be hooked by new kernel extensions. The name | 6 | various security checks to be hooked by new kernel extensions. The name |
6 | "module" is a bit of a misnomer since these extensions are not actually | 7 | "module" is a bit of a misnomer since these extensions are not actually |
7 | loadable kernel modules. Instead, they are selectable at build-time via | 8 | loadable kernel modules. Instead, they are selectable at build-time via |
8 | CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the | 9 | CONFIG_DEFAULT_SECURITY and can be overridden at boot-time via the |
9 | "security=..." kernel command line argument, in the case where multiple | 10 | ``"security=..."`` kernel command line argument, in the case where multiple |
10 | LSMs were built into a given kernel. | 11 | LSMs were built into a given kernel. |
11 | 12 | ||
12 | The primary users of the LSM interface are Mandatory Access Control | 13 | The primary users of the LSM interface are Mandatory Access Control |
@@ -19,23 +20,22 @@ in the core functionality of Linux itself. | |||
19 | Without a specific LSM built into the kernel, the default LSM will be the | 20 | Without a specific LSM built into the kernel, the default LSM will be the |
20 | Linux capabilities system. Most LSMs choose to extend the capabilities | 21 | Linux capabilities system. Most LSMs choose to extend the capabilities |
21 | system, building their checks on top of the defined capability hooks. | 22 | system, building their checks on top of the defined capability hooks. |
22 | For more details on capabilities, see capabilities(7) in the Linux | 23 | For more details on capabilities, see ``capabilities(7)`` in the Linux |
23 | man-pages project. | 24 | man-pages project. |
24 | 25 | ||
25 | A list of the active security modules can be found by reading | 26 | A list of the active security modules can be found by reading |
26 | /sys/kernel/security/lsm. This is a comma separated list, and | 27 | ``/sys/kernel/security/lsm``. This is a comma separated list, and |
27 | will always include the capability module. The list reflects the | 28 | will always include the capability module. The list reflects the |
28 | order in which checks are made. The capability module will always | 29 | order in which checks are made. The capability module will always |
29 | be first, followed by any "minor" modules (e.g. Yama) and then | 30 | be first, followed by any "minor" modules (e.g. Yama) and then |
30 | the one "major" module (e.g. SELinux) if there is one configured. | 31 | the one "major" module (e.g. SELinux) if there is one configured. |
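/sys/kernel/security/lsm is a plain comma-separated list, so testing whether a given module is active comes down to exact token matching. A plain substring search is not enough, because a name must not match a longer entry that merely starts with it. The following is a minimal sketch; the function name and the example list are illustrative.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if name appears as a whole entry in a comma-separated
 * list such as the contents of /sys/kernel/security/lsm
 * (e.g. "capability,yama,apparmor"). A trailing newline is ignored.
 */
bool lsm_list_contains(const char *list, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ",\n");   /* length of the current entry */

        if (len == nlen && strncmp(p, name, nlen) == 0)
            return true;
        p += len;
        if (*p != '\0')
            p++;   /* skip the separator */
    }
    return false;
}
```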
31 | 32 | ||
32 | Based on https://lkml.org/lkml/2007/10/26/215, | 33 | .. toctree:: |
33 | a new LSM is accepted into the kernel when its intent (a description of | 34 | :maxdepth: 1 |
34 | what it tries to protect against and in what cases one would expect to | ||
35 | use it) has been appropriately documented in Documentation/security/. | ||
36 | This allows an LSM's code to be easily compared to its goals, and so | ||
37 | that end users and distros can make a more informed decision about which | ||
38 | LSMs suit their requirements. | ||
39 | 35 | ||
40 | For extensive documentation on the available LSM hook interfaces, please | 36 | apparmor |
41 | see include/linux/security.h. | 37 | LoadPin |
38 | SELinux | ||
39 | Smack | ||
40 | tomoyo | ||
41 | Yama | ||
diff --git a/Documentation/security/tomoyo.txt b/Documentation/admin-guide/LSM/tomoyo.rst index 200a2d37cbc8..a5947218fa64 100644 --- a/Documentation/security/tomoyo.txt +++ b/Documentation/admin-guide/LSM/tomoyo.rst | |||
@@ -1,21 +1,30 @@ | |||
1 | --- What is TOMOYO? --- | 1 | ====== |
2 | TOMOYO | ||
3 | ====== | ||
4 | |||
5 | What is TOMOYO? | ||
6 | =============== | ||
2 | 7 | ||
3 | TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel. | 8 | TOMOYO is a name-based MAC extension (LSM module) for the Linux kernel. |
4 | 9 | ||
5 | LiveCD-based tutorials are available at | 10 | LiveCD-based tutorials are available at |
11 | |||
6 | http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/ | 12 | http://tomoyo.sourceforge.jp/1.7/1st-step/ubuntu10.04-live/ |
7 | http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ . | 13 | http://tomoyo.sourceforge.jp/1.7/1st-step/centos5-live/ |
14 | |||
8 | Though these tutorials use a non-LSM version of TOMOYO, they are useful for you | 15 | Though these tutorials use a non-LSM version of TOMOYO, they are useful for you |
9 | to know what TOMOYO is. | 16 | to know what TOMOYO is. |
10 | 17 | ||
11 | --- How to enable TOMOYO? --- | 18 | How to enable TOMOYO? |
19 | ===================== | ||
12 | 20 | ||
13 | Build the kernel with CONFIG_SECURITY_TOMOYO=y and pass "security=tomoyo" on | 21 | Build the kernel with ``CONFIG_SECURITY_TOMOYO=y`` and pass ``security=tomoyo`` on |
14 | the kernel's command line. | 22 | the kernel's command line. |
15 | 23 | ||
16 | Please see http://tomoyo.sourceforge.jp/2.3/ for details. | 24 | Please see http://tomoyo.sourceforge.jp/2.3/ for details. |
17 | 25 | ||
18 | --- Where is documentation? --- | 26 | Where is documentation? |
27 | ======================= | ||
19 | 28 | ||
20 | User <-> Kernel interface documentation is available at | 29 | User <-> Kernel interface documentation is available at |
21 | http://tomoyo.sourceforge.jp/2.3/policy-reference.html . | 30 | http://tomoyo.sourceforge.jp/2.3/policy-reference.html . |
@@ -42,7 +51,8 @@ History of TOMOYO? | |||
42 | Realities of Mainlining | 51 | Realities of Mainlining |
43 | http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf | 52 | http://sourceforge.jp/projects/tomoyo/docs/lfj2008.pdf |
44 | 53 | ||
45 | --- What is future plan? --- | 54 | What is future plan? |
55 | ==================== | ||
46 | 56 | ||
47 | We believe that inode based security and name based security are complementary | 57 | We believe that inode based security and name based security are complementary |
48 | and both should be used together. But unfortunately, so far, we cannot enable | 58 | and both should be used together. But unfortunately, so far, we cannot enable |
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index b96e80f79e85..b5343c5aa224 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst | |||
@@ -55,12 +55,6 @@ Documentation | |||
55 | contains information about the problems, which may result by upgrading | 55 | contains information about the problems, which may result by upgrading |
56 | your kernel. | 56 | your kernel. |
57 | 57 | ||
58 | - The Documentation/DocBook/ subdirectory contains several guides for | ||
59 | kernel developers and users. These guides can be rendered in a | ||
60 | number of formats: PostScript (.ps), PDF, HTML, & man-pages, among others. | ||
61 | After installation, ``make psdocs``, ``make pdfdocs``, ``make htmldocs``, | ||
62 | or ``make mandocs`` will render the documentation in the requested format. | ||
63 | |||
64 | Installing the kernel source | 58 | Installing the kernel source |
65 | ---------------------------- | 59 | ---------------------------- |
66 | 60 | ||
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst index 6d99a7ce6e21..5bb9161dbe6a 100644 --- a/Documentation/admin-guide/index.rst +++ b/Documentation/admin-guide/index.rst | |||
@@ -62,6 +62,7 @@ configure specific aspects of kernel behavior to your liking. | |||
62 | ras | 62 | ras |
63 | pm/index | 63 | pm/index |
64 | thunderbolt | 64 | thunderbolt |
65 | LSM/index | ||
65 | 66 | ||
66 | .. only:: subproject and html | 67 | .. only:: subproject and html |
67 | 68 | ||
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 783010e95f51..3b335c1f8441 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt | |||
@@ -727,7 +727,8 @@ | |||
727 | See also Documentation/input/joystick-parport.txt | 727 | See also Documentation/input/joystick-parport.txt |
728 | 728 | ||
729 | ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot | 729 | ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot |
730 | time. See Documentation/dynamic-debug-howto.txt for | 730 | time. See |
731 | Documentation/admin-guide/dynamic-debug-howto.rst for | ||
731 | details. Deprecated, see dyndbg. | 732 | details. Deprecated, see dyndbg. |
732 | 733 | ||
733 | debug [KNL] Enable kernel debugging (events log level). | 734 | debug [KNL] Enable kernel debugging (events log level). |
@@ -890,7 +891,8 @@ | |||
890 | dyndbg[="val"] [KNL,DYNAMIC_DEBUG] | 891 | dyndbg[="val"] [KNL,DYNAMIC_DEBUG] |
891 | module.dyndbg[="val"] | 892 | module.dyndbg[="val"] |
892 | Enable debug messages at boot time. See | 893 | Enable debug messages at boot time. See |
893 | Documentation/dynamic-debug-howto.txt for details. | 894 | Documentation/admin-guide/dynamic-debug-howto.rst |
895 | for details. | ||
894 | 896 | ||
895 | nompx [X86] Disables Intel Memory Protection Extensions. | 897 | nompx [X86] Disables Intel Memory Protection Extensions. |
896 | See Documentation/x86/intel_mpx.txt for more | 898 | See Documentation/x86/intel_mpx.txt for more |
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst index 8c7bbf2c88d2..197896718f81 100644 --- a/Documentation/admin-guide/ras.rst +++ b/Documentation/admin-guide/ras.rst | |||
@@ -344,9 +344,9 @@ for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory | |||
344 | controllers. The following example will assume 2 channels: | 344 | controllers. The following example will assume 2 channels: |
345 | 345 | ||
346 | +------------+-----------------------+ | 346 | +------------+-----------------------+ |
347 | | Chip | Channels | | 347 | | CS Rows | Channels | |
348 | | Select +-----------+-----------+ | 348 | +------------+-----------+-----------+ |
349 | | rows | ``ch0`` | ``ch1`` | | 349 | | | ``ch0`` | ``ch1`` | |
350 | +============+===========+===========+ | 350 | +============+===========+===========+ |
351 | | ``csrow0`` | DIMM_A0 | DIMM_B0 | | 351 | | ``csrow0`` | DIMM_A0 | DIMM_B0 | |
352 | +------------+ | | | 352 | +------------+ | | |
@@ -698,7 +698,7 @@ information indicating that errors have been detected:: | |||
698 | The structure of the message is: | 698 | The structure of the message is: |
699 | 699 | ||
700 | +---------------------------------------+-------------+ | 700 | +---------------------------------------+-------------+ |
701 | | Content + Example | | 701 | | Content | Example | |
702 | +=======================================+=============+ | 702 | +=======================================+=============+ |
703 | | The memory controller | MC0 | | 703 | | The memory controller | MC0 | |
704 | +---------------------------------------+-------------+ | 704 | +---------------------------------------+-------------+ |
@@ -713,7 +713,7 @@ The structure of the message is: | |||
713 | +---------------------------------------+-------------+ | 713 | +---------------------------------------+-------------+ |
714 | | The error syndrome | 0xb741 | | 714 | | The error syndrome | 0xb741 | |
715 | +---------------------------------------+-------------+ | 715 | +---------------------------------------+-------------+ |
716 | | Memory row | row 0 + | 716 | | Memory row | row 0 | |
717 | +---------------------------------------+-------------+ | 717 | +---------------------------------------+-------------+ |
718 | | Memory channel | channel 1 | | 718 | | Memory channel | channel 1 | |
719 | +---------------------------------------+-------------+ | 719 | +---------------------------------------+-------------+ |
diff --git a/Documentation/conf.py b/Documentation/conf.py index bacf9d337c89..71b032bb44fd 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py | |||
@@ -271,8 +271,7 @@ latex_elements = { | |||
271 | 271 | ||
272 | # Additional stuff for the LaTeX preamble. | 272 | # Additional stuff for the LaTeX preamble. |
273 | 'preamble': ''' | 273 | 'preamble': ''' |
274 | % Adjust margins | 274 | \\usepackage{ifthen} |
275 | \\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry} | ||
276 | 275 | ||
277 | % Allow generate some pages in landscape | 276 | % Allow generate some pages in landscape |
278 | \\usepackage{lscape} | 277 | \\usepackage{lscape} |
@@ -281,6 +280,7 @@ latex_elements = { | |||
281 | \\definecolor{NoteColor}{RGB}{204,255,255} | 280 | \\definecolor{NoteColor}{RGB}{204,255,255} |
282 | \\definecolor{WarningColor}{RGB}{255,204,204} | 281 | \\definecolor{WarningColor}{RGB}{255,204,204} |
283 | \\definecolor{AttentionColor}{RGB}{255,255,204} | 282 | \\definecolor{AttentionColor}{RGB}{255,255,204} |
283 | \\definecolor{ImportantColor}{RGB}{192,255,204} | ||
284 | \\definecolor{OtherColor}{RGB}{204,204,204} | 284 | \\definecolor{OtherColor}{RGB}{204,204,204} |
285 | \\newlength{\\mynoticelength} | 285 | \\newlength{\\mynoticelength} |
286 | \\makeatletter\\newenvironment{coloredbox}[1]{% | 286 | \\makeatletter\\newenvironment{coloredbox}[1]{% |
@@ -301,7 +301,12 @@ latex_elements = { | |||
301 | \\ifthenelse% | 301 | \\ifthenelse% |
302 | {\\equal{\\py@noticetype}{attention}}% | 302 | {\\equal{\\py@noticetype}{attention}}% |
303 | {\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}% | 303 | {\\colorbox{AttentionColor}{\\usebox{\\@tempboxa}}}% |
304 | {\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}% | 304 | {% |
305 | \\ifthenelse% | ||
306 | {\\equal{\\py@noticetype}{important}}% | ||
307 | {\\colorbox{ImportantColor}{\\usebox{\\@tempboxa}}}% | ||
308 | {\\colorbox{OtherColor}{\\usebox{\\@tempboxa}}}% | ||
309 | }% | ||
305 | }% | 310 | }% |
306 | }% | 311 | }% |
307 | }\\makeatother | 312 | }\\makeatother |
@@ -336,30 +341,51 @@ latex_elements = { | |||
336 | if major == 1 and minor > 3: | 341 | if major == 1 and minor > 3: |
337 | latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n' | 342 | latex_elements['preamble'] += '\\renewcommand*{\\DUrole}[2]{ #2 }\n' |
338 | 343 | ||
344 | if major == 1 and minor <= 4: | ||
345 | latex_elements['preamble'] += '\\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}' | ||
346 | elif major == 1 and (minor > 5 or (minor == 5 and patch >= 3)): | ||
347 | latex_elements['sphinxsetup'] = 'hmargin=0.5in, vmargin=0.5in' | ||
348 | |||
349 | |||
339 | # Grouping the document tree into LaTeX files. List of tuples | 350 | # Grouping the document tree into LaTeX files. List of tuples |
340 | # (source start file, target name, title, | 351 | # (source start file, target name, title, |
341 | # author, documentclass [howto, manual, or own class]). | 352 | # author, documentclass [howto, manual, or own class]). |
353 | # Sorted in alphabetical order | ||
342 | latex_documents = [ | 354 | latex_documents = [ |
343 | ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide', | ||
344 | 'The kernel development community', 'manual'), | ||
345 | ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation', | 355 | ('admin-guide/index', 'linux-user.tex', 'Linux Kernel User Documentation', |
346 | 'The kernel development community', 'manual'), | 356 | 'The kernel development community', 'manual'), |
347 | ('core-api/index', 'core-api.tex', 'The kernel core API manual', | 357 | ('core-api/index', 'core-api.tex', 'The kernel core API manual', |
348 | 'The kernel development community', 'manual'), | 358 | 'The kernel development community', 'manual'), |
349 | ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual', | 359 | ('crypto/index', 'crypto-api.tex', 'Linux Kernel Crypto API manual', |
350 | 'The kernel development community', 'manual'), | 360 | 'The kernel development community', 'manual'), |
351 | ('input/index', 'linux-input.tex', 'The Linux input driver subsystem', | 361 | ('dev-tools/index', 'dev-tools.tex', 'Development tools for the Kernel', |
352 | 'The kernel development community', 'manual'), | 362 | 'The kernel development community', 'manual'), |
353 | ('kernel-documentation', 'kernel-documentation.tex', 'The Linux Kernel Documentation', | 363 | ('doc-guide/index', 'kernel-doc-guide.tex', 'Linux Kernel Documentation Guide', |
354 | 'The kernel development community', 'manual'), | 364 | 'The kernel development community', 'manual'), |
355 | ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation', | 365 | ('driver-api/index', 'driver-api.tex', 'The kernel driver API manual', |
366 | 'The kernel development community', 'manual'), | ||
367 | ('filesystems/index', 'filesystems.tex', 'Linux Filesystems API', | ||
356 | 'The kernel development community', 'manual'), | 368 | 'The kernel development community', 'manual'), |
357 | ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', | 369 | ('gpu/index', 'gpu.tex', 'Linux GPU Driver Developer\'s Guide', |
358 | 'The kernel development community', 'manual'), | 370 | 'The kernel development community', 'manual'), |
371 | ('input/index', 'linux-input.tex', 'The Linux input driver subsystem', | ||
372 | 'The kernel development community', 'manual'), | ||
373 | ('kernel-hacking/index', 'kernel-hacking.tex', 'Unreliable Guide To Hacking The Linux Kernel', | ||
374 | 'The kernel development community', 'manual'), | ||
359 | ('media/index', 'media.tex', 'Linux Media Subsystem Documentation', | 375 | ('media/index', 'media.tex', 'Linux Media Subsystem Documentation', |
360 | 'The kernel development community', 'manual'), | 376 | 'The kernel development community', 'manual'), |
377 | ('networking/index', 'networking.tex', 'Linux Networking Documentation', | ||
378 | 'The kernel development community', 'manual'), | ||
379 | ('process/index', 'development-process.tex', 'Linux Kernel Development Documentation', | ||
380 | 'The kernel development community', 'manual'), | ||
361 | ('security/index', 'security.tex', 'The kernel security subsystem manual', | 381 | ('security/index', 'security.tex', 'The kernel security subsystem manual', |
362 | 'The kernel development community', 'manual'), | 382 | 'The kernel development community', 'manual'), |
383 | ('sh/index', 'sh.tex', 'SuperH architecture implementation manual', | ||
384 | 'The kernel development community', 'manual'), | ||
385 | ('sound/index', 'sound.tex', 'Linux Sound Subsystem Documentation', | ||
386 | 'The kernel development community', 'manual'), | ||
387 | ('userspace-api/index', 'userspace-api.tex', 'The Linux kernel user-space API guide', | ||
388 | 'The kernel development community', 'manual'), | ||
363 | ] | 389 | ] |
364 | 390 | ||
365 | # The name of an image file (relative to this directory) to place at the top of | 391 | # The name of an image file (relative to this directory) to place at the top of |
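The version checks added to `conf.py` choose between two ways of setting the LaTeX page margins. The branch logic can be restated as a small pure function. The sketch below is illustrative: the function name, the constants and the tuple it returns are hypothetical, and this hunk does not show how `conf.py` obtains `major`, `minor` and `patch`. As written, Sphinx 1.5.0 to 1.5.2 gets neither margin adjustment, and neither does any release whose major version is not 1.

```python
GEOMETRY = '\\usepackage[margin=0.5in, top=1in, bottom=1in]{geometry}'
SPHINXSETUP = 'hmargin=0.5in, vmargin=0.5in'


def margin_settings(major, minor, patch):
    """Mirror the conf.py branches: return (preamble_extra, sphinxsetup)."""
    if major == 1 and minor <= 4:
        # Older Sphinx: load the geometry package from the LaTeX preamble.
        return GEOMETRY, None
    elif major == 1 and (minor > 5 or (minor == 5 and patch >= 3)):
        # Sphinx 1.5.3 and later 1.x releases: use the 'sphinxsetup' key.
        return '', SPHINXSETUP
    # Anything else (e.g. 1.5.0-1.5.2, or major != 1): no margin override.
    return '', None


for version in [(1, 4, 9), (1, 5, 2), (1, 5, 3), (1, 6, 1)]:
    print(version, margin_settings(*version))
```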
diff --git a/Documentation/core-api/assoc_array.rst b/Documentation/core-api/assoc_array.rst index d83cfff9ea43..8231b915c939 100644 --- a/Documentation/core-api/assoc_array.rst +++ b/Documentation/core-api/assoc_array.rst | |||
@@ -10,7 +10,10 @@ properties: | |||
10 | 10 | ||
11 | 1. Objects are opaque pointers. The implementation does not care where they | 11 | 1. Objects are opaque pointers. The implementation does not care where they |
12 | point (if anywhere) or what they point to (if anything). | 12 | point (if anywhere) or what they point to (if anything). |
13 | .. note:: Pointers to objects _must_ be zero in the least significant bit. | 13 | |
14 | .. note:: | ||
15 | |||
16 | Pointers to objects _must_ be zero in the least significant bit. | ||
14 | 17 | ||
15 | 2. Objects do not need to contain linkage blocks for use by the array. This | 18 | 2. Objects do not need to contain linkage blocks for use by the array. This |
16 | permits an object to be located in multiple arrays simultaneously. | 19 | permits an object to be located in multiple arrays simultaneously. |
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 62abd36bfffb..0606be3a3111 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst | |||
@@ -19,6 +19,7 @@ Core utilities | |||
19 | workqueue | 19 | workqueue |
20 | genericirq | 20 | genericirq |
21 | flexible-arrays | 21 | flexible-arrays |
22 | librs | ||
22 | 23 | ||
23 | Interfaces for kernel debugging | 24 | Interfaces for kernel debugging |
24 | =============================== | 25 | =============================== |
diff --git a/Documentation/core-api/librs.rst b/Documentation/core-api/librs.rst new file mode 100644 index 000000000000..6010f5bc5bf9 --- /dev/null +++ b/Documentation/core-api/librs.rst | |||
@@ -0,0 +1,212 @@ | |||
1 | ========================================== | ||
2 | Reed-Solomon Library Programming Interface | ||
3 | ========================================== | ||
4 | |||
5 | :Author: Thomas Gleixner | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | The generic Reed-Solomon Library provides encoding, decoding and error | ||
11 | correction functions. | ||
12 | |||
13 | Reed-Solomon codes are used in communication and storage applications to | ||
14 | ensure data integrity. | ||
15 | |||
16 | This documentation is provided for developers who want to utilize the | ||
17 | functions provided by the library. | ||
18 | |||
19 | Known Bugs And Assumptions | ||
20 | ========================== | ||
21 | |||
22 | None. | ||
23 | |||
24 | Usage | ||
25 | ===== | ||
26 | |||
27 | This chapter provides examples of how to use the library. | ||
28 | |||
29 | Initializing | ||
30 | ------------ | ||
31 | |||
32 | The init function init_rs returns a pointer to an rs decoder structure, | ||
33 | which holds the necessary information for encoding, decoding and error | ||
34 | correction with the given polynomial. It either uses an existing | ||
35 | matching decoder or creates a new one. On creation all the lookup tables | ||
36 | for fast en/decoding are created. The function may take a while, so make | ||
37 | sure not to call it in critical code paths. | ||
38 | |||
39 | :: | ||
40 | |||
41 | /* the Reed Solomon control structure */ | ||
42 | static struct rs_control *rs_decoder; | ||
43 | |||
44 | /* Symbolsize is 10 (bits) | ||
45 | * Primitive polynomial is x^10+x^3+1 | ||
46 | * first consecutive root is 0 | ||
47 | * primitive element to generate roots = 1 | ||
48 | * generator polynomial degree (number of roots) = 6 | ||
49 | */ | ||
50 | rs_decoder = init_rs(10, 0x409, 0, 1, 6); | ||
51 | |||
52 | |||
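The lookup tables that `init_rs` is described as creating can be modelled in a few lines. The following is a Python sketch, not the kernel's C implementation. It assumes the conventional log/antilog construction for GF(2^symsize): successive powers of the primitive element alpha are generated by shifting and reducing modulo the primitive polynomial. Here 0x409 encodes x^10 + x^3 + 1, as in the example above. The names `build_gf_tables`, `gf_mul`, `alpha_to` and `index_of` are illustrative names for this sketch only.

```python
def build_gf_tables(symsize, gfpoly):
    """Build antilog (alpha_to) and log (index_of) tables for GF(2^symsize).

    gfpoly is the primitive polynomial including the x^symsize term,
    e.g. 0x409 for x^10 + x^3 + 1.
    """
    nn = (1 << symsize) - 1        # number of non-zero field elements
    alpha_to = [0] * (nn + 1)      # alpha_to[i] = alpha^i; alpha_to[nn] stays 0
    index_of = [0] * (nn + 1)      # index_of[x] = log_alpha(x)
    index_of[0] = nn               # log(0) is undefined; nn marks "zero"
    sr = 1
    for i in range(nn):
        alpha_to[i] = sr
        index_of[sr] = i
        sr <<= 1                   # multiply by alpha (i.e. by x)
        if sr & (1 << symsize):    # reduce modulo the primitive polynomial
            sr ^= gfpoly
    # For a primitive polynomial the powers of alpha visit every non-zero
    # element exactly once and alpha^nn wraps back to 1.
    if sr != 1 or len(set(alpha_to[:nn])) != nn:
        raise ValueError("gfpoly is not primitive for this symbol size")
    return alpha_to, index_of


def gf_mul(a, b, alpha_to, index_of):
    """Multiply two field elements by adding their logarithms."""
    if a == 0 or b == 0:
        return 0
    nn = len(alpha_to) - 1
    return alpha_to[(index_of[a] + index_of[b]) % nn]


# Tables for the init_rs(10, 0x409, 0, 1, 6) example: 10-bit symbols.
alpha_to, index_of = build_gf_tables(10, 0x409)
print(alpha_to[10])        # alpha^10 = alpha^3 + 1, prints 9
```

Every non-zero element has a logarithm, so in this model a multiplication costs two lookups, an addition modulo `nn` and one more lookup. Precomputing the tables once is what keeps later encoding and decoding cheap.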
53 | Encoding | ||
54 | -------- | ||
55 | |||
56 | The encoder calculates the Reed-Solomon code over the given data length | ||
57 | and stores the result in the parity buffer. Note that the parity buffer | ||
58 | must be initialized before calling the encoder. | ||
59 | |||
60 | The expanded data can be inverted on the fly by providing a non-zero | ||
61 | inversion mask. The expanded data is XOR'ed with the mask. This is used | ||
62 | e.g. for FLASH ECC, where all-0xFF data is inverted to all 0x00. The | ||
63 | Reed-Solomon code for all 0x00 is all 0x00. The code is inverted before | ||
64 | storing to FLASH so it is all 0xFF too. This prevents reading an | ||
65 | erased FLASH from producing ECC errors. | ||
66 | |||
67 | The data bytes are expanded to the given symbol size on the fly. There is | ||
68 | no support for encoding continuous bitstreams with a symbol size != 8 at | ||
69 | the moment. If it is necessary, it should not be a big deal to implement | ||
70 | such functionality. | ||
71 | |||
72 | :: | ||
73 | |||
74 | /* Parity buffer. Size = number of roots */ | ||
75 | uint16_t par[6]; | ||
76 | /* Initialize the parity buffer */ | ||
77 | memset(par, 0, sizeof(par)); | ||
78 | /* Encode 512 bytes in data8. Store parity in buffer par */ | ||
79 | encode_rs8(rs_decoder, data8, 512, par, 0); | ||
80 | |||
81 | |||
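The encoder's job can be modelled as the textbook systematic construction. Shift the data polynomial up by `nroots` symbols and take the remainder modulo a generator polynomial whose roots are alpha^((fcr+i)*prim). The Python sketch below assumes that construction, 8-bit symbols, and the primitive polynomial 0x11d. That polynomial is chosen for illustration and is not taken from the text above. The helper names are hypothetical, and this is not `encode_rs8()` itself. The sketch also shows why the inversion mask helps with erased FLASH: all-0xFF data XORed with a 0xFF mask encodes to all-zero parity.

```python
def make_field(symsize, gfpoly):
    """Return (nn, alpha_to, mul) for GF(2^symsize); gfpoly assumed primitive."""
    nn = (1 << symsize) - 1
    alpha_to, index_of = [0] * (nn + 1), [0] * (nn + 1)
    sr = 1
    for i in range(nn):
        alpha_to[i], index_of[sr] = sr, i
        sr <<= 1
        if sr & (1 << symsize):
            sr ^= gfpoly

    def mul(a, b):
        # Field multiplication via log/antilog lookups; zero has no log.
        if a == 0 or b == 0:
            return 0
        return alpha_to[(index_of[a] + index_of[b]) % nn]

    return nn, alpha_to, mul


def rs_generator(nroots, fcr, prim, nn, alpha_to, mul):
    """Generator polynomial (highest degree first), roots alpha^((fcr+i)*prim)."""
    gen = [1]
    for i in range(nroots):
        root = alpha_to[((fcr + i) * prim) % nn]
        # gen(x) * (x + root); addition and subtraction are both XOR.
        gen = [hi ^ mul(lo, root) for hi, lo in zip(gen + [0], [0] + gen)]
    return gen


def rs_encode(data, gen, mul, invmsk=0):
    """Return the nroots parity symbols for data (systematic encoding).

    Each data symbol is XORed with invmsk before it enters the LFSR.
    """
    nroots = len(gen) - 1
    par = [0] * nroots               # parity register starts zeroed
    for d in data:
        fb = (d ^ invmsk) ^ par[0]   # feedback symbol
        par = par[1:] + [0]          # shift the register by one symbol
        if fb:
            for j in range(nroots):
                par[j] ^= mul(fb, gen[j + 1])
    return par


def poly_eval(coeffs, x, mul):
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = mul(acc, x) ^ c
    return acc


nn, alpha_to, mul = make_field(8, 0x11d)
gen = rs_generator(6, 0, 1, nn, alpha_to, mul)
data = [0x12, 0x34, 0x56, 0x78]
par = rs_encode(data, gen, mul)
# A valid codeword vanishes at every root of the generator polynomial.
print(all(poly_eval(data + par, alpha_to[i], mul) == 0 for i in range(6)))  # True

# Erased FLASH: all-0xFF data with a 0xFF mask encodes to all-zero parity.
erased_par = rs_encode([0xFF] * 16, gen, mul, invmsk=0xFF)
print(erased_par == [0] * 6)   # True
```

The register update is the standard LFSR form of polynomial division. The kernel's C code may organise its tables and parity differently, so treat this only as a model of the arithmetic.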
82 | Decoding | ||
83 | -------- | ||
84 | |||
85 | The decoder calculates the syndrome over the given data length and the | ||
86 | received parity symbols and corrects errors in the data. | ||
87 | |||
88 | If a syndrome is available from a hardware decoder then the syndrome | ||
89 | calculation is skipped. | ||
90 | |||
91 | The correction of the data buffer can be suppressed by providing a | ||
92 | correction pattern buffer and an error location buffer to the decoder. | ||
93 | The decoder stores the calculated error location and the correction | ||
94 | bitmask in the given buffers. This is useful for hardware decoders which | ||
95 | use a weird bit ordering scheme. | ||
96 | |||
97 | The data bytes are expanded to the given symbol size on the fly. There is | ||
98 | no support for decoding continuous bitstreams with a symbol size != 8 at | ||
99 | the moment. If it is necessary, it should not be a big deal to implement | ||
100 | such functionality. | ||
101 | |||
102 | Decoding with syndrome calculation, direct data correction | ||
103 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
104 | |||
105 | :: | ||
106 | |||
107 | /* Parity buffer. Size = number of roots */ | ||
108 | uint16_t par[6]; | ||
109 | uint8_t data8[512]; | ||
110 | int numerr; | ||
111 | /* Receive data */ | ||
112 | ..... | ||
113 | /* Receive parity */ | ||
114 | ..... | ||
115 | /* Decode 512 bytes in data8 */ | ||
116 | numerr = decode_rs8(rs_decoder, data8, par, 512, NULL, 0, NULL, 0, NULL); | ||
117 | |||
118 | |||
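In the usual formulation, the syndrome is the received word evaluated at each root of the generator polynomial. All-zero syndromes mean no error was detected. The Python sketch below works under that assumption. It uses 8-bit symbols over 0x11d, chosen for illustration, and a hypothetical helper `rs_syndromes()`. The generator polynomial itself serves as a known-valid codeword.

```python
def make_field(symsize, gfpoly):
    """Return (nn, alpha_to, mul) for GF(2^symsize); gfpoly assumed primitive."""
    nn = (1 << symsize) - 1
    alpha_to, index_of = [0] * (nn + 1), [0] * (nn + 1)
    sr = 1
    for i in range(nn):
        alpha_to[i], index_of[sr] = sr, i
        sr <<= 1
        if sr & (1 << symsize):
            sr ^= gfpoly

    def mul(a, b):
        # Field multiplication via log/antilog lookups; zero has no log.
        if a == 0 or b == 0:
            return 0
        return alpha_to[(index_of[a] + index_of[b]) % nn]

    return nn, alpha_to, mul


def rs_syndromes(received, nroots, fcr, prim, nn, alpha_to, mul):
    """Evaluate the received word (highest degree first) at each generator root."""
    syn = []
    for i in range(nroots):
        root = alpha_to[((fcr + i) * prim) % nn]
        acc = 0
        for sym in received:          # Horner's rule
            acc = mul(acc, root) ^ sym
        syn.append(acc)
    return syn


nn, alpha_to, mul = make_field(8, 0x11d)

# Build the degree-6 generator polynomial; it is itself a valid codeword.
codeword = [1]
for i in range(6):
    root = alpha_to[i]
    codeword = [h ^ mul(l, root) for h, l in zip(codeword + [0], [0] + codeword)]

print(rs_syndromes(codeword, 6, 0, 1, nn, alpha_to, mul))   # [0, 0, 0, 0, 0, 0]

corrupted = list(codeword)
corrupted[2] ^= 0x10                  # flip bits in one symbol
print(any(rs_syndromes(corrupted, 6, 0, 1, nn, alpha_to, mul)))  # True
```

A decoder that is handed such syndromes, for example by hardware, can skip this evaluation step.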
119 | Decoding with syndrome given by hardware decoder, direct data correction | ||
120 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
121 | |||
122 | :: | ||
123 | |||
124 | /* Parity and syndrome buffers. Size = number of roots */ | ||
125 | uint16_t par[6], syn[6]; | ||
126 | uint8_t data8[512]; | ||
127 | int numerr; | ||
128 | /* Receive data */ | ||
129 | ..... | ||
130 | /* Receive parity */ | ||
131 | ..... | ||
132 | /* Get syndrome from hardware decoder */ | ||
133 | ..... | ||
134 | /* Decode 512 bytes in data8 */ | ||
135 | numerr = decode_rs8(rs_decoder, data8, par, 512, syn, 0, NULL, 0, NULL); | ||
136 | |||
137 | |||
138 | Decoding with syndrome given by hardware decoder, no direct data correction | ||
139 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
140 | |||
141 | Note: It's not necessary to give data and received parity to the | ||
142 | decoder. | ||
143 | |||
144 | :: | ||
145 | |||
146 | /* Parity, syndrome and correction buffers */ | ||
147 | uint16_t par[6], syn[6], corr[8]; | ||
148 | uint8_t data[512]; | ||
149 | int numerr, errpos[8], i; | ||
150 | /* Receive data */ | ||
151 | ..... | ||
152 | /* Receive parity */ | ||
153 | ..... | ||
154 | /* Get syndrome from hardware decoder */ | ||
155 | ..... | ||
156 | /* Decode using the syndrome only; data and parity are not needed */ | ||
157 | numerr = decode_rs8(rs_decoder, NULL, NULL, 512, syn, 0, errpos, 0, corr); | ||
158 | for (i = 0; i < numerr; i++) { | ||
159 | do_error_correction_in_your_buffer(errpos[i], corr[i]); | ||
160 | } | ||
161 | |||
162 | |||
163 | Cleanup | ||
164 | ------- | ||
165 | |||
166 | The function free_rs frees the allocated resources, if the caller is | ||
167 | the last user of the decoder. | ||
168 | |||
169 | :: | ||
170 | |||
171 | /* Release resources */ | ||
172 | free_rs(rs_decoder); | ||
173 | |||
174 | |||
175 | Structures | ||
176 | ========== | ||
177 | |||
178 | This chapter contains the autogenerated documentation of the structures | ||
179 | which are used in the Reed-Solomon Library and are relevant for a | ||
180 | developer. | ||
181 | |||
182 | .. kernel-doc:: include/linux/rslib.h | ||
183 | :internal: | ||
184 | |||
185 | Public Functions Provided | ||
186 | ========================= | ||
187 | |||
188 | This chapter contains the autogenerated documentation of the | ||
189 | Reed-Solomon functions which are exported. | ||
190 | |||
191 | .. kernel-doc:: lib/reed_solomon/reed_solomon.c | ||
192 | :export: | ||
193 | |||
194 | Credits | ||
195 | ======= | ||
196 | |||
197 | The library code for encoding and decoding was written by Phil Karn. | ||
198 | |||
199 | :: | ||
200 | |||
201 | Copyright 2002, Phil Karn, KA9Q | ||
202 | May be used under the terms of the GNU General Public License (GPL) | ||
203 | |||
204 | |||
205 | The wrapper functions and interfaces are written by Thomas Gleixner. | ||
206 | |||
207 | Many users have provided bugfixes, improvements and helping hands for | ||
208 | testing. Thanks a lot. | ||
209 | |||
210 | The following people have contributed to this document: | ||
211 | |||
212 | Thomas Gleixner <tglx@linutronix.de> | ||
diff --git a/Documentation/crypto/asymmetric-keys.txt b/Documentation/crypto/asymmetric-keys.txt index 5ad6480e3fb9..b82b6ad48488 100644 --- a/Documentation/crypto/asymmetric-keys.txt +++ b/Documentation/crypto/asymmetric-keys.txt | |||
@@ -265,7 +265,7 @@ mandatory: | |||
265 | 265 | ||
266 | The caller passes a pointer to the following struct with all of the fields | 266 | The caller passes a pointer to the following struct with all of the fields |
267 | cleared, except for data, datalen and quotalen [see | 267 | cleared, except for data, datalen and quotalen [see |
268 | Documentation/security/keys.txt]. | 268 | Documentation/security/keys/core.rst]. |
269 | 269 | ||
270 | struct key_preparsed_payload { | 270 | struct key_preparsed_payload { |
271 | char *description; | 271 | char *description; |
diff --git a/Documentation/crypto/conf.py b/Documentation/crypto/conf.py new file mode 100644 index 000000000000..4335d251ddf3 --- /dev/null +++ b/Documentation/crypto/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = 'Linux Kernel Crypto API' | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'crypto-api.tex', 'Linux Kernel Crypto API manual', | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst index 07d881147ef3..4ac991dbddb7 100644 --- a/Documentation/dev-tools/index.rst +++ b/Documentation/dev-tools/index.rst | |||
@@ -23,6 +23,7 @@ whole; patches welcome! | |||
23 | kmemleak | 23 | kmemleak |
24 | kmemcheck | 24 | kmemcheck |
25 | gdb-kernel-debugging | 25 | gdb-kernel-debugging |
26 | kgdb | ||
26 | 27 | ||
27 | 28 | ||
28 | .. only:: subproject and html | 29 | .. only:: subproject and html |
diff --git a/Documentation/dev-tools/kgdb.rst b/Documentation/dev-tools/kgdb.rst new file mode 100644 index 000000000000..75273203a35a --- /dev/null +++ b/Documentation/dev-tools/kgdb.rst | |||
@@ -0,0 +1,907 @@ | |||
1 | ================================================= | ||
2 | Using kgdb, kdb and the kernel debugger internals | ||
3 | ================================================= | ||
4 | |||
5 | :Author: Jason Wessel | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | The kernel has two different debugger front ends (kdb and kgdb) which | ||
11 | interface to the debug core. It is possible to use either of the | ||
12 | debugger front ends and dynamically transition between them if you | ||
13 | configure the kernel properly at compile and runtime. | ||
14 | |||
15 | Kdb is a simplistic shell-style interface which you can use on a system | ||
16 | console with a keyboard or serial console. You can use it to inspect | ||
17 | memory, registers, process lists, dmesg, and even set breakpoints to | ||
18 | stop in a certain location. Kdb is not a source level debugger, although | ||
19 | you can set breakpoints and execute some basic kernel run control. Kdb | ||
20 | is mainly aimed at doing some analysis to aid in development or | ||
21 | diagnosing kernel problems. You can access some symbols by name in | ||
22 | kernel built-ins or in kernel modules if the code was built with | ||
23 | ``CONFIG_KALLSYMS``. | ||
24 | |||
25 | Kgdb is intended to be used as a source level debugger for the Linux | ||
26 | kernel. It is used along with gdb to debug a Linux kernel. The | ||
27 | expectation is that gdb can be used to "break in" to the kernel to | ||
28 | inspect memory, variables and look through call stack information | ||
29 | similar to the way an application developer would use gdb to debug an | ||
30 | application. It is possible to place breakpoints in kernel code and | ||
31 | perform some limited execution stepping. | ||
32 | |||
33 | Two machines are required for using kgdb. One of these machines is a | ||
34 | development machine and the other is the target machine. The kernel to | ||
35 | be debugged runs on the target machine. The development machine runs an | ||
36 | instance of gdb against the vmlinux file which contains the symbols (not | ||
37 | a boot image such as bzImage, zImage, uImage...). In gdb the developer | ||
38 | specifies the connection parameters and connects to kgdb. The type of | ||
39 | connection a developer makes with gdb depends on the availability of | ||
40 | kgdb I/O modules compiled as built-ins or loadable kernel modules in the | ||
41 | target machine's kernel. | ||
42 | |||
43 | Compiling a kernel | ||
44 | ================== | ||
45 | |||
46 | - In order to enable compilation of kdb, you must first enable kgdb. | ||
47 | |||
48 | - The kgdb test compile options are described in the kgdb test suite | ||
49 | chapter. | ||
50 | |||
51 | Kernel config options for kgdb | ||
52 | ------------------------------ | ||
53 | |||
54 | To enable ``CONFIG_KGDB`` you should look under | ||
55 | :menuselection:`Kernel hacking --> Kernel debugging` and select | ||
56 | :menuselection:`KGDB: kernel debugger`. | ||
57 | |||
58 | While it is not a hard requirement that you have symbols in your vmlinux | ||
59 | file, gdb tends not to be very useful without the symbolic data, so you | ||
60 | will want to turn on ``CONFIG_DEBUG_INFO`` which is called | ||
61 | :menuselection:`Compile the kernel with debug info` in the config menu. | ||
62 | |||
63 | It is advised, but not required, that you turn on the | ||
64 | ``CONFIG_FRAME_POINTER`` kernel option which is called :menuselection:`Compile | ||
65 | the kernel with frame pointers` in the config menu. This option inserts code | ||
66 | into the compiled executable which saves the frame information in | ||
67 | registers or on the stack at different points which allows a debugger | ||
68 | such as gdb to more accurately construct stack back traces while | ||
69 | debugging the kernel. | ||
70 | |||
71 | If the architecture that you are using supports the kernel option | ||
72 | ``CONFIG_STRICT_KERNEL_RWX``, you should consider turning it off. This | ||
73 | option will prevent the use of software breakpoints because it marks | ||
74 | certain regions of the kernel's memory space as read-only. If you want to | ||
75 | keep ``CONFIG_STRICT_KERNEL_RWX`` turned on, you can use hardware | ||
76 | breakpoints instead, provided kgdb supports them on your architecture; | ||
77 | otherwise, you need to turn this option off. | ||
78 | |||
79 | Next you should choose one or more I/O drivers to interconnect the debugging | ||
80 | host and debugged target. Early boot debugging requires a KGDB I/O | ||
81 | driver that supports early debugging and the driver must be built into | ||
82 | the kernel directly. Kgdb I/O driver configuration takes place via | ||
83 | kernel or module parameters which you can learn more about in the | ||
84 | section that describes the parameter kgdboc. | ||
85 | |||
86 | Here is an example set of ``.config`` symbols to enable or disable for kgdb:: | ||
87 | |||
88 | # CONFIG_STRICT_KERNEL_RWX is not set | ||
89 | CONFIG_FRAME_POINTER=y | ||
90 | CONFIG_KGDB=y | ||
91 | CONFIG_KGDB_SERIAL_CONSOLE=y | ||
92 | |||
93 | Kernel config options for kdb | ||
94 | ----------------------------- | ||
95 | |||
96 | Kdb is quite a bit more complex than the simple gdbstub sitting on top | ||
97 | of the kernel's debug core. Kdb must implement a shell, and also adds | ||
98 | some helper functions in other parts of the kernel, responsible for | ||
99 | printing out interesting data such as what you would see if you ran | ||
100 | ``lsmod``, or ``ps``. In order to build kdb into the kernel you follow the | ||
101 | same steps as you would for kgdb. | ||
102 | |||
103 | The main config option for kdb is ``CONFIG_KGDB_KDB`` which is called | ||
104 | :menuselection:`KGDB_KDB: include kdb frontend for kgdb` in the config menu. | ||
105 | If you plan on using kdb on a serial port, you would normally have already | ||
106 | selected an I/O driver such as the ``CONFIG_KGDB_SERIAL_CONSOLE`` interface | ||
107 | when you were configuring kgdb. | ||
108 | |||
109 | If you want to use a PS/2-style keyboard with kdb, you would select | ||
110 | ``CONFIG_KDB_KEYBOARD`` which is called :menuselection:`KGDB_KDB: keyboard as | ||
111 | input device` in the config menu. The ``CONFIG_KDB_KEYBOARD`` option is not | ||
112 | used for anything in the gdb interface to kgdb. The ``CONFIG_KDB_KEYBOARD`` | ||
113 | option only works with kdb. | ||
114 | |||
115 | Here is an example set of ``.config`` symbols to enable/disable kdb:: | ||
116 | |||
117 | # CONFIG_STRICT_KERNEL_RWX is not set | ||
118 | CONFIG_FRAME_POINTER=y | ||
119 | CONFIG_KGDB=y | ||
120 | CONFIG_KGDB_SERIAL_CONSOLE=y | ||
121 | CONFIG_KGDB_KDB=y | ||
122 | CONFIG_KDB_KEYBOARD=y | ||

Kernel Debugger Boot Arguments
==============================

This section describes the various runtime kernel parameters that affect
the configuration of the kernel debugger. The following chapter covers
using kdb and kgdb and provides some examples of the
configuration parameters.

Kernel parameter: kgdboc
------------------------

The name kgdboc was originally an abbreviation for "kgdb over
console". Today kgdboc is the primary mechanism for configuring how
gdb communicates with kgdb, as well as which devices you want to use
to interact with the kdb shell.

For kgdb/gdb, kgdboc is designed to work with a single serial port. It
is intended to cover the circumstance where you want to use a serial
console as your primary console as well as using it to perform kernel
debugging. It is also possible to use kgdb on a serial port which is not
designated as a system console. Kgdboc may be configured as a kernel
built-in or a kernel loadable module. You can only make use of
``kgdbwait`` and early debugging if you build kgdboc into the kernel as
a built-in.

Optionally you can elect to activate kms (Kernel Mode Setting)
integration. When you use kms with kgdboc and you have a video driver
that has atomic mode setting hooks, it is possible to enter the debugger
on the graphics console. When kernel execution is resumed, the
previous graphics mode will be restored. This integration can serve as a
useful tool to aid in diagnosing crashes or doing analysis of memory
with kdb while allowing full graphics console applications to run.

kgdboc arguments
~~~~~~~~~~~~~~~~

Usage::

  kgdboc=[kms][[,]kbd][[,]serial_device][,baud]

The order listed above must be observed if you use any of the optional
configurations together.

Abbreviations:

- kms = Kernel Mode Setting

- kbd = Keyboard

You can configure kgdboc to use the keyboard and/or a serial device,
depending on whether you are using kdb and/or kgdb. Using kms with only
gdb is generally not a useful combination.
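
The ordering rule can be made concrete with a small parser. The following is a minimal userspace sketch, not the kernel's own kgdboc parsing code; ``struct kgdboc_opts`` and ``parse_kgdboc()`` are hypothetical names. It accepts ``kms``, then ``kbd``, then a serial device, then a numeric baud rate, and rejects tokens that appear out of that order.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a kgdboc option string (hypothetical layout). */
struct kgdboc_opts {
	bool kms;      /* "kms" token seen */
	bool kbd;      /* "kbd" token seen */
	char tty[32];  /* serial device, empty if none */
	long baud;     /* 0 if not given */
};

/*
 * Parse "[kms][,kbd][,serial_device][,baud]" in the documented order.
 * Returns 0 on success, -1 on an out-of-order or malformed token.
 */
int parse_kgdboc(const char *arg, struct kgdboc_opts *o)
{
	char buf[128];
	char *save = NULL;
	int stage = 0; /* 0: kms, 1: kbd, 2: device, 3: baud, 4: done */

	memset(o, 0, sizeof(*o));
	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	for (char *tok = strtok_r(buf, ",", &save); tok;
	     tok = strtok_r(NULL, ",", &save)) {
		if (stage == 0 && strcmp(tok, "kms") == 0) {
			o->kms = true;
			stage = 1;
		} else if (stage <= 1 && strcmp(tok, "kbd") == 0) {
			o->kbd = true;
			stage = 2;
		} else if (stage <= 2) {
			/* kms/kbd showing up here means they were out of order. */
			if (strcmp(tok, "kms") == 0 || strcmp(tok, "kbd") == 0 ||
			    strlen(tok) >= sizeof(o->tty))
				return -1;
			strcpy(o->tty, tok);
			stage = 3;
		} else if (stage == 3) {
			char *end;
			long baud = strtol(tok, &end, 10);

			if (*end != '\0' || baud <= 0)
				return -1;
			o->baud = baud;
			stage = 4;
		} else {
			return -1; /* extra tokens after the baud rate */
		}
	}
	return 0;
}
```

For example, ``parse_kgdboc("kms,kbd,ttyS0,115200", &o)`` succeeds, while ``kbd,kms`` is rejected because ``kms`` must come first. Note that ``strtok_r()`` silently skips empty fields, so a doubled comma is not treated as an error in this sketch.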

Using loadable module or built-in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. As a kernel built-in:

   Use the kernel boot argument::

      kgdboc=<tty-device>,[baud]

2. As a kernel loadable module:

   Use the command::

      modprobe kgdboc kgdboc=<tty-device>,[baud]

Here are two examples of how you might format the kgdboc string. The
first is for an x86 target using the first serial port. The second
example is for the ARM Versatile AB using the second serial port.

1. ``kgdboc=ttyS0,115200``

2. ``kgdboc=ttyAMA1,115200``

Configure kgdboc at runtime with sysfs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At run time you can enable or disable kgdboc by echoing a parameter
into sysfs. Here are two examples:

1. Enable kgdboc on ttyS0::

      echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc

2. Disable kgdboc::

      echo "" > /sys/module/kgdboc/parameters/kgdboc

.. note::

   You do not need to specify the baud rate if you are configuring the
   console on a tty that is already configured or open.
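
The ``echo`` commands above just write a string to a module parameter file. A rough C equivalent, assuming only standard C I/O and the ``/sys/module/<module>/parameters/<name>`` layout used in this document, might look like the following; ``kgdb_param_path()`` and ``write_param_file()`` are hypothetical helper names. Like ``echo``, it appends a newline to the value.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/module/<module>/parameters/<param>" into buf.
 * Returns 0 on success, -1 if a name is empty or buf is too small.
 */
int kgdb_param_path(char *buf, size_t len, const char *module,
		    const char *param)
{
	int n;

	if (strlen(module) == 0 || strlen(param) == 0)
		return -1;
	n = snprintf(buf, len, "/sys/module/%s/parameters/%s", module, param);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write value plus a trailing newline to path, as
 * "echo value > path" would. An empty value writes just "\n".
 */
int write_param_file(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%s\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Enabling kgdboc on ttyS0 would then be ``write_param_file(p, "ttyS0")`` after ``kgdb_param_path(p, sizeof(p), "kgdboc", "kgdboc")``, and disabling it would pass an empty string, which writes only the newline, as ``echo ""`` does. Writing the real sysfs file requires root.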

More examples
^^^^^^^^^^^^^

You can configure kgdboc to use the keyboard and/or a serial device,
depending on whether you are using kdb and/or kgdb, in one of the following
scenarios.

1. kdb and kgdb over only a serial port::

      kgdboc=<serial_device>[,baud]

   Example::

      kgdboc=ttyS0,115200

2. kdb and kgdb with keyboard and a serial port::

      kgdboc=kbd,<serial_device>[,baud]

   Example::

      kgdboc=kbd,ttyS0,115200

3. kdb with a keyboard::

      kgdboc=kbd

4. kdb with kernel mode setting::

      kgdboc=kms,kbd

5. kdb with kernel mode setting and kgdb over a serial port::

      kgdboc=kms,kbd,ttyS0,115200

.. note::

   Kgdboc does not support interrupting the target via the gdb remote
   protocol. You must manually send a :kbd:`SysRq-G` unless you have a proxy
   that splits console output to a terminal program. A console proxy has a
   separate TCP port for the debugger and a separate TCP port for the
   "human" console. The proxy can take care of sending the :kbd:`SysRq-G`
   for you.

When using kgdboc with no debugger proxy, you can end up connecting the
debugger at one of two entry points. If an exception occurs after you
have loaded kgdboc, a message should print on the console stating it is
waiting for the debugger. In this case you disconnect your terminal
program and then connect the debugger in its place. If you want to
interrupt the target system and forcibly enter a debug session, you have
to issue a :kbd:`SysRq` sequence and then type the letter :kbd:`g`. Then you
disconnect the terminal session and connect gdb. If you don't like this,
your options are to hack gdb to send the :kbd:`SysRq-G` for you as well as
on the initial connect, or to use a debugger proxy that allows an
unmodified gdb to do the debugging.

Kernel parameter: ``kgdbwait``
------------------------------

The kernel command line option ``kgdbwait`` makes kgdb wait for a
debugger connection during booting of a kernel. You can only use this
option if you compiled a kgdb I/O driver into the kernel and you
specified the I/O driver configuration as a kernel command line option.
The ``kgdbwait`` parameter should always follow the configuration parameter
for the kgdb I/O driver on the kernel command line; otherwise the I/O driver
will not be configured prior to asking the kernel to use it to wait.
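
Because ordering matters, a quick sanity check of a boot command line can catch a misplaced ``kgdbwait``. The sketch below uses a hypothetical helper, ``kgdbwait_order_ok()``, that is not part of any kernel tooling. It splits the line on spaces, does not handle quoting, and only recognizes ``kgdboc=`` as an I/O driver option, so a command line that uses a different kgdb I/O driver would need that driver's option added.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return false if "kgdbwait" appears on the command line before any
 * "kgdboc=<value>" option; true otherwise, including when kgdbwait
 * is absent. Options are assumed to be separated by spaces.
 */
bool kgdbwait_order_ok(const char *cmdline)
{
	bool seen_io_driver = false;
	const char *p = cmdline;

	for (;;) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len == 0)
			break; /* end of string */
		if (len > 7 && strncmp(p, "kgdboc=", 7) == 0)
			seen_io_driver = true;
		else if (len == 8 && strncmp(p, "kgdbwait", 8) == 0)
			return seen_io_driver;
		p += len;
	}
	return true;
}
```

With this helper, ``kgdbwait_order_ok("kgdboc=ttyS0,115200 kgdbwait")`` returns true, while ``kgdbwait_order_ok("kgdbwait kgdboc=ttyS0,115200")`` returns false.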

The kernel will stop and wait as early as the I/O driver and
architecture allow when you use this option. If you build the kgdb I/O
driver as a loadable kernel module, ``kgdbwait`` will not do anything.

Kernel parameter: ``kgdbcon``
-----------------------------

The ``kgdbcon`` feature allows you to see :c:func:`printk` messages inside gdb
while gdb is connected to the kernel. Kdb does not make use of the kgdbcon
feature.

Kgdb supports using the gdb serial protocol to send console messages to
the debugger when the debugger is connected and running. There are two
ways to activate this feature.

1. Activate with the kernel command line option::

      kgdbcon

2. Use sysfs before configuring an I/O driver::

      echo 1 > /sys/module/kgdb/parameters/kgdb_use_con

.. note::

   If you do this after you configure the kgdb I/O driver, the
   setting will not take effect until the next time the I/O driver is
   reconfigured.

.. important::

   You cannot use kgdboc + kgdbcon on a tty that is an
   active system console. An example of incorrect usage is::

      console=ttyS0,115200 kgdboc=ttyS0 kgdbcon

   It is possible to use this option with kgdboc on a tty that is not a
   system console.
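
The incorrect usage above can also be detected mechanically. The following hypothetical checker, ``kgdbcon_conflict()``, is a sketch rather than anything the kernel provides. It collects the device names from every ``console=`` option, takes the first field of ``kgdboc=`` that is not ``kms`` or ``kbd``, and reports a conflict when ``kgdbcon`` is present and that device is also a console. It only inspects the command line; it cannot see a ``kgdb_use_con`` value set through sysfs.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CONSOLES 8
#define NAME_LEN 32

/*
 * Copy the first comma-separated field of s[0..len) that is not
 * "kms" or "kbd" into out; out is left empty if there is none.
 */
static void first_device(const char *s, size_t len, char *out)
{
	size_t i = 0;

	out[0] = '\0';
	while (i < len) {
		size_t flen = 0;

		while (i + flen < len && s[i + flen] != ',')
			flen++;
		bool flag = flen == 3 && (strncmp(s + i, "kms", 3) == 0 ||
					  strncmp(s + i, "kbd", 3) == 0);
		if (flen > 0 && !flag) {
			if (flen >= NAME_LEN)
				flen = NAME_LEN - 1; /* truncate long names */
			memcpy(out, s + i, flen);
			out[flen] = '\0';
			return;
		}
		i += flen + 1; /* skip the field and its comma */
	}
}

/*
 * Return true if the command line enables kgdbcon while kgdboc uses
 * a tty that is also named by a console= option.
 */
bool kgdbcon_conflict(const char *cmdline)
{
	char consoles[MAX_CONSOLES][NAME_LEN];
	char kgdb_tty[NAME_LEN] = "";
	int ncons = 0;
	bool kgdbcon = false;
	const char *p = cmdline;

	for (;;) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len == 0)
			break;
		if (len > 8 && strncmp(p, "console=", 8) == 0) {
			if (ncons < MAX_CONSOLES)
				first_device(p + 8, len - 8, consoles[ncons++]);
		} else if (len > 7 && strncmp(p, "kgdboc=", 7) == 0) {
			first_device(p + 7, len - 7, kgdb_tty);
		} else if (len == 7 && strncmp(p, "kgdbcon", 7) == 0) {
			kgdbcon = true;
		}
		p += len;
	}
	if (!kgdbcon || kgdb_tty[0] == '\0')
		return false;
	for (int i = 0; i < ncons; i++)
		if (strcmp(consoles[i], kgdb_tty) == 0)
			return true;
	return false;
}
```

For instance, ``console=ttyS0,115200 kgdboc=ttyS0 kgdbcon`` is flagged, while ``console=tty0 kgdboc=ttyS0,115200 kgdbcon`` is not.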

Run time parameter: ``kgdbreboot``
----------------------------------

The kgdbreboot feature allows you to change how the debugger deals with
the reboot notification. You have 3 choices for the behavior. The
default value is 0.

.. tabularcolumns:: |p{0.4cm}|p{11.5cm}|p{5.6cm}|

.. flat-table::
   :widths: 1 10 8

   * - 1
     - ``echo -1 > /sys/module/debug_core/parameters/kgdbreboot``
     - Ignore the reboot notification entirely.

   * - 2
     - ``echo 0 > /sys/module/debug_core/parameters/kgdbreboot``
     - Send the detach message to any attached debugger client.

   * - 3
     - ``echo 1 > /sys/module/debug_core/parameters/kgdbreboot``
     - Enter the debugger on reboot notify.

Using kdb
=========

Quick start for kdb on a serial port
------------------------------------

This is a quick example of how to use kdb.

1. Configure kgdboc at boot using kernel parameters::

      console=ttyS0,115200 kgdboc=ttyS0,115200

   OR

   Configure kgdboc after the kernel has booted; assuming you are using
   a serial port console::

      echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc

2. Enter the kernel debugger manually or by waiting for an oops or
   fault. There are several ways you can enter the kernel debugger
   manually; all involve using the :kbd:`SysRq-G`, which means you must have
   enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.

   - When logged in as root or with a super user session you can run::

        echo g > /proc/sysrq-trigger

   - Example using minicom 2.2

     Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`

   - When you have telneted to a terminal server that supports sending
     a remote break

     Press: :kbd:`CTRL-]`

     Type in: ``send break``

     Press: :kbd:`Enter` :kbd:`g`

3. From the kdb prompt you can run the ``help`` command to see a complete
   list of the commands that are available.

   Some useful commands in kdb include:

   =========== =================================================================
   ``lsmod``   Shows where kernel modules are loaded
   ``ps``      Displays only the active processes
   ``ps A``    Shows all the processes
   ``summary`` Shows kernel version info and memory usage
   ``bt``      Get a backtrace of the current process using :c:func:`dump_stack`
   ``dmesg``   View the kernel syslog buffer
   ``go``      Continue the system
   =========== =================================================================

4. When you are done using kdb you need to consider rebooting the system
   or using the ``go`` command to resume normal kernel execution. If you
   have paused the kernel for a lengthy period of time, applications
   that rely on timely networking or anything to do with real wall clock
   time could be adversely affected, so you should take this into
   consideration when using the kernel debugger.

Quick start for kdb using a keyboard connected console
------------------------------------------------------

This is a quick example of how to use kdb with a keyboard.

1. Configure kgdboc at boot using kernel parameters::

      kgdboc=kbd

   OR

   Configure kgdboc after the kernel has booted::

      echo kbd > /sys/module/kgdboc/parameters/kgdboc

2. Enter the kernel debugger manually or by waiting for an oops or
   fault. There are several ways you can enter the kernel debugger
   manually; all involve using the :kbd:`SysRq-G`, which means you must have
   enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.

   - When logged in as root or with a super user session you can run::

        echo g > /proc/sysrq-trigger

   - Example using a laptop keyboard:

     Press and hold down: :kbd:`Alt`

     Press and hold down: :kbd:`Fn`

     Press and release the key with the label: :kbd:`SysRq`

     Release: :kbd:`Fn`

     Press and release: :kbd:`g`

     Release: :kbd:`Alt`

   - Example using a PS/2 101-key keyboard

     Press and hold down: :kbd:`Alt`

     Press and release the key with the label: :kbd:`SysRq`

     Press and release: :kbd:`g`

     Release: :kbd:`Alt`

3. Now type in a kdb command such as ``help``, ``dmesg`` or ``bt``, or type
   ``go`` to continue kernel execution.

Using kgdb / gdb
================

In order to use kgdb you must activate it by passing configuration
information to one of the kgdb I/O drivers. If you do not pass any
configuration information, kgdb will not do anything at all. Kgdb will
only actively hook up to the kernel trap hooks if a kgdb I/O driver is
loaded and configured. If you unconfigure a kgdb I/O driver, kgdb will
unregister all the kernel hook points.

All kgdb I/O drivers can be reconfigured at run time, if
``CONFIG_SYSFS`` and ``CONFIG_MODULES`` are enabled, by echoing a new
config string to ``/sys/module/<driver>/parameters/<option>``. The driver
can be unconfigured by passing an empty string. You cannot change the
configuration while the debugger is attached. Make sure to detach the
debugger with the ``detach`` command prior to trying to unconfigure a
kgdb I/O driver.

Connecting with gdb to a serial port
------------------------------------

1. Configure kgdboc

   Configure kgdboc at boot using kernel parameters::

      kgdboc=ttyS0,115200

   OR

   Configure kgdboc after the kernel has booted::

      echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc

2. Stop kernel execution (break into the debugger)

   In order to connect to gdb via kgdboc, the kernel must first be
   stopped. There are several ways to stop the kernel, which include
   using kgdbwait as a boot argument, via a :kbd:`SysRq-G`, or running the
   kernel until it takes an exception where it waits for the debugger to
   attach.

   - When logged in as root or with a super user session you can run::

        echo g > /proc/sysrq-trigger

   - Example using minicom 2.2

     Press: :kbd:`CTRL-A` :kbd:`f` :kbd:`g`

   - When you have telneted to a terminal server that supports sending
     a remote break

     Press: :kbd:`CTRL-]`

     Type in: ``send break``

     Press: :kbd:`Enter` :kbd:`g`

3. Connect from gdb

   Example (using a directly connected port)::

      % gdb ./vmlinux
      (gdb) set serial baud 115200
      (gdb) target remote /dev/ttyS0

   Example (kgdb to a terminal server on TCP port 2012)::

      % gdb ./vmlinux
      (gdb) target remote 192.168.2.2:2012

   Once connected, you can debug a kernel the way you would debug an
   application program.

   If you are having problems connecting or something is going seriously
   wrong while debugging, it will most often be the case that you want
   to enable gdb to be verbose about its target communications. You do
   this prior to issuing the ``target remote`` command by typing in::

      set debug remote 1

Remember that if you continue in gdb and need to "break in" again, you need
to issue another :kbd:`SysRq-G`. It is easy to create a simple entry point by
putting a breakpoint at ``sys_sync``; you can then run ``sync`` from a
shell or script to break into the debugger.

kgdb and kdb interoperability
=============================

It is possible to transition between kdb and kgdb dynamically. The debug
core will remember which you used the last time and automatically start
in the same mode.

Switching between kdb and kgdb
------------------------------

Switching from kgdb to kdb
~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to switch from kgdb to kdb: you can use gdb to issue
a maintenance packet, or you can blindly type the command ``$3#33``.
Whenever the kernel debugger stops in kgdb mode it will print the
message ``KGDB or $3#33 for KDB``. It is important to note that you have
to type the sequence correctly in one pass. You cannot type a backspace
or delete because kgdb will interpret that as part of the debug stream.
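
The odd-looking ``$3#33`` is simply a gdb remote serial protocol packet: a ``$``, the payload, a ``#``, and two hex digits holding the sum of the payload bytes modulo 256. The payload ``3`` is the request kgdb treats as "switch to kdb", and since the character ``3`` is 0x33, the checksum is also ``33``. This is also why a backspace breaks the sequence: it would become part of the packet. The hypothetical helper ``rsp_frame()`` below is a sketch of that framing, not code taken from gdb or the kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame payload as a gdb remote protocol packet:
 * '$' payload '#' followed by two hex digits of the byte sum mod 256.
 * Returns the packet length, or -1 if out cannot hold it plus a NUL.
 */
int rsp_frame(const char *payload, char *out, size_t outlen)
{
	unsigned int sum = 0;
	size_t len = strlen(payload);

	for (size_t i = 0; i < len; i++)
		sum += (unsigned char)payload[i];
	if (len + 5 > outlen) /* '$', '#', 2 checksum digits, NUL */
		return -1;
	return snprintf(out, outlen, "$%s#%02x", payload, sum & 0xffu);
}
```

``rsp_frame("3", buf, sizeof(buf))`` produces ``$3#33``; framing the payload ``g`` (read registers) gives ``$g#67``.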

1. Change from kgdb to kdb by blindly typing::

      $3#33

2. Change from kgdb to kdb with gdb::

      maintenance packet 3

   .. note::

      Now you must kill gdb. Typically you press :kbd:`CTRL-Z` and issue
      the command::

         kill -9 %

Change from kdb to kgdb
~~~~~~~~~~~~~~~~~~~~~~~

There are two ways you can change from kdb to kgdb. You can manually
enter kgdb mode by issuing the ``kgdb`` command from the kdb shell prompt,
or you can connect gdb while the kdb shell prompt is active. The kdb
shell looks for the typical first commands that gdb would issue with the
gdb remote protocol and if it sees one of those commands it
automatically changes into kgdb mode.

1. From kdb issue the command::

      kgdb

   Now disconnect your terminal program and connect gdb in its place.

2. At the kdb prompt, disconnect the terminal program and connect gdb in
   its place.

Running kdb commands from gdb
-----------------------------

It is possible to run a limited set of kdb commands from gdb, using the
gdb monitor command. You don't want to execute any of the run control or
breakpoint operations, because they can disrupt the state of the kernel
debugger. You should be using gdb for breakpoints and run control
operations if you have gdb connected. The more useful commands to run
are things like ``lsmod``, ``dmesg``, ``ps`` or possibly some of the memory
information commands. To see all the kdb commands you can run
``monitor help``.

Example::

  (gdb) monitor ps
  1 idle process (state I) and
  27 sleeping system daemon (state M) processes suppressed,
  use 'ps A' to see all.
  Task Addr       Pid   Parent [*] cpu State Thread      Command

  0xc78291d0        1        0  0   0    S   0xc7829404  init
  0xc7954150      942        1  0   0    S   0xc7954384  dropbear
  0xc78789c0      944        1  0   0    S   0xc7878bf4  sh
  (gdb)
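
Under the hood, gdb sends a ``monitor`` command to the target as a ``qRcmd,`` packet whose argument is the command text encoded as hex, two digits per byte; kgdb decodes the text and hands it to kdb. The sketch below builds that payload; ``qrcmd_payload()`` is a hypothetical name, and the result would still need the ``$...#xx`` framing before going on the wire.

```c
#include <string.h>

/*
 * Build the payload gdb uses for "monitor <cmd>": "qRcmd," followed by
 * cmd hex-encoded, two lowercase digits per byte. Returns the payload
 * length, or -1 if out is too small (including the NUL).
 */
int qrcmd_payload(const char *cmd, char *out, size_t outlen)
{
	static const char hex[] = "0123456789abcdef";
	size_t len = strlen(cmd);
	size_t need = 6 + 2 * len + 1;
	char *p = out;

	if (need > outlen)
		return -1;
	memcpy(p, "qRcmd,", 6);
	p += 6;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)cmd[i];

		*p++ = hex[c >> 4];
		*p++ = hex[c & 0x0f];
	}
	*p = '\0';
	return (int)(need - 1);
}
```

``monitor ps`` therefore travels as ``qRcmd,7073`` and ``monitor ps A`` as ``qRcmd,70732041``.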

kgdb Test Suite
===============

When kgdb is enabled in the kernel config you can also elect to enable
the config parameter ``KGDB_TESTS``. Turning this on will enable a special
kgdb I/O module which is designed to test the kgdb internal functions.

The kgdb tests are mainly intended for developers to test the kgdb
internals as well as a tool for developing a new kgdb architecture
specific implementation. These tests are not really for end users of the
Linux kernel. The primary source of documentation is the
``drivers/misc/kgdbts.c`` file.

The kgdb test suite can also be configured at compile time to run the
core set of tests by setting the kernel config parameter
``KGDB_TESTS_ON_BOOT``. This particular option is aimed at automated
regression testing and does not require modifying the kernel boot config
arguments. If this is turned on, the kgdb test suite can be disabled by
specifying ``kgdbts=`` as a kernel boot argument.

Kernel Debugger Internals
=========================

Architecture Specifics
----------------------

The kernel debugger is organized into a number of components:

1. The debug core

   The debug core is found in ``kernel/debug/debug_core.c``. It
   contains:

   - A generic OS exception handler which includes sync'ing the
     processors into a stopped state on a multi-CPU system.

   - The API to talk to the kgdb I/O drivers

   - The API to make calls to the arch-specific kgdb implementation

   - The logic to perform safe memory reads and writes to memory while
     using the debugger

   - A full implementation for software breakpoints unless overridden
     by the arch

   - The API to invoke either the kdb or kgdb frontend to the debug
     core.

   - The structures and callback API for atomic kernel mode setting.

     .. note:: kgdboc is where the kms callbacks are invoked.

2. kgdb arch-specific implementation

   This implementation is generally found in ``arch/*/kernel/kgdb.c``. As
   an example, ``arch/x86/kernel/kgdb.c`` contains the specifics to
   implement HW breakpoints as well as the initialization to dynamically
   register and unregister for the trap handlers on this architecture.
   The arch-specific portion implements:

   - An arch-specific trap catcher which invokes
     :c:func:`kgdb_handle_exception` to start kgdb doing its work

   - Translation to and from gdb specific packet format to :c:type:`pt_regs`

   - Registration and unregistration of architecture specific trap
     hooks

   - Any special exception handling and cleanup

   - NMI exception handling and cleanup

   - (optional) HW breakpoints

3. gdbstub frontend (aka kgdb)

   The gdbstub is located in ``kernel/debug/gdbstub.c``. It contains:

   - All the logic to implement the gdb serial protocol

4. kdb frontend

   The kdb debugger shell is broken down into a number of components.
   The kdb core is located in ``kernel/debug/kdb``. There are a number of
   helper functions in some of the other kernel components to make it
   possible for kdb to examine and report information about the kernel
   without taking locks that could cause a kernel deadlock. The kdb core
   implements the following functionality.

   - A simple shell

   - The kdb core command set

   - A registration API to register additional kdb shell commands.

     - A good example of a self-contained kdb module is the ``ftdump``
       command for dumping the ftrace buffer. See:
       ``kernel/trace/trace_kdb.c``

     - For an example of how to dynamically register a new kdb command
       you can build the kdb_hello.ko kernel module from
       ``samples/kdb/kdb_hello.c``. To build this example you can set
       ``CONFIG_SAMPLES=y`` and ``CONFIG_SAMPLE_KDB=m`` in your kernel
       config. Later run ``modprobe kdb_hello`` and the next time you
       enter the kdb shell, you can run the ``hello`` command.

   - The implementation for :c:func:`kdb_printf` which emits messages directly
     to I/O drivers, bypassing the kernel log.

   - SW / HW breakpoint management for the kdb shell

5. kgdb I/O driver

   Each kgdb I/O driver has to provide an implementation for the
   following:

   - configuration via built-in or module

   - dynamic configuration and kgdb hook registration calls

   - read and write character interface

   - A cleanup handler for unconfiguring from the kgdb core

   - (optional) Early debug methodology

   Any given kgdb I/O driver has to operate very closely with the
   hardware and must do so in a way that does not enable interrupts
   or change other parts of the system context without completely
   restoring them. The kgdb core will repeatedly "poll" a kgdb I/O
   driver for characters when it needs input. The I/O driver is expected
   to return immediately if there is no data available. Doing so allows
   for the future possibility to touch watchdog hardware in such a way
   as to have a target system not reset when these are enabled.
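
The registration API listed under the kdb frontend above boils down to a table of named handlers that the shell searches when you type a command. The userspace model below is only a sketch of that idea: ``shell_register()``, ``shell_dispatch()`` and ``struct cmd_entry`` are hypothetical, and the real interface is the one used by ``samples/kdb/kdb_hello.c``. The handler convention, where ``argc`` counts the arguments after the command name and ``argv[0]`` is the command itself, is assumed here to mirror kdb's.

```c
#include <stdio.h>
#include <string.h>

/* Handler shape assumed for this model: argc counts the arguments
 * after the command name, and argv[0] is the command itself. */
typedef int (*cmd_func_t)(int argc, const char **argv);

struct cmd_entry {
	const char *name;
	cmd_func_t func;
	const char *usage; /* kept for a "help" listing */
	const char *help;
};

#define MAX_CMDS 16
static struct cmd_entry cmd_table[MAX_CMDS];
static int ncmds;

/* Add a command; fails if the table is full or the name is taken. */
int shell_register(const char *name, cmd_func_t func,
		   const char *usage, const char *help)
{
	if (ncmds == MAX_CMDS)
		return -1;
	for (int i = 0; i < ncmds; i++)
		if (strcmp(cmd_table[i].name, name) == 0)
			return -1;
	cmd_table[ncmds++] = (struct cmd_entry){ name, func, usage, help };
	return 0;
}

/* Run the handler registered for argv[0]; -1 if there is none. */
int shell_dispatch(int argc, const char **argv)
{
	for (int i = 0; i < ncmds; i++)
		if (strcmp(cmd_table[i].name, argv[0]) == 0)
			return cmd_table[i].func(argc, argv);
	return -1;
}

/* A "hello" command in the spirit of samples/kdb/kdb_hello.c. */
static int hello_cmd(int argc, const char **argv)
{
	if (argc > 1)
		return -2; /* too many arguments */
	if (argc == 1)
		printf("Hello %s.\n", argv[1]);
	else
		printf("Hello world!\n");
	return 0;
}
```

After ``shell_register("hello", hello_cmd, "[string]", "Say Hello World or Hello [string]")``, dispatching ``{ "hello", "kdb" }`` with ``argc`` of 1 prints ``Hello kdb.``, and a second registration of ``hello`` is refused.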

If you intend to add kgdb support for a new
architecture, the architecture should define ``HAVE_ARCH_KGDB`` in the
architecture specific Kconfig file. This will enable kgdb for the
architecture, and at that point you must create an architecture specific
kgdb implementation.

There are a few flags which must be set on every architecture in their
``asm/kgdb.h`` file. These are:

- ``NUMREGBYTES``:
  The size in bytes of all of the registers, so that we
  can ensure they will all fit into a packet.

- ``BUFMAX``:
  The size in bytes of the buffer GDB will read into. This must
  be larger than ``NUMREGBYTES``.

- ``CACHE_FLUSH_IS_SAFE``:
  Set to 1 if it is always safe to call
  ``flush_cache_range`` or ``flush_icache_range``. On some architectures,
  these functions may not be safe to call on SMP since we keep other
  CPUs in a holding pattern.
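
The ``BUFMAX`` rule can be enforced at compile time. In the sketch below, the register count and both sizes are made-up values for an imaginary 32-bit architecture, not taken from any real ``asm/kgdb.h``. The second check, ``regs_fit_hex_packet()``, is an additional and conservative assumption: a gdb ``g`` reply carries every register byte as two hex digits, so the hex form plus ``$``, ``#``, a two-digit checksum and a terminating NUL is compared against the buffer size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Made-up numbers for an imaginary 32-bit architecture:
 * 16 general registers plus PC and a status register, 4 bytes each. */
#define SKETCH_NUM_REGS 18
#define NUMREGBYTES (SKETCH_NUM_REGS * 4)
#define BUFMAX 400

/* The documented rule, checked when the file is compiled (C11). */
_Static_assert(BUFMAX > NUMREGBYTES, "BUFMAX must be larger than NUMREGBYTES");

/*
 * Conservative extra check (an assumption, not a kernel rule):
 * hex-encoded registers (2 chars per byte) plus '$', '#', two
 * checksum digits and a terminating NUL must fit in bufmax.
 */
bool regs_fit_hex_packet(size_t numregbytes, size_t bufmax)
{
	return 2 * numregbytes + 5 <= bufmax;
}
```

With these numbers, 72 register bytes need 149 bytes as a framed hex packet, comfortably under 400; a hypothetical 300-byte register file would satisfy ``BUFMAX > NUMREGBYTES`` yet fail the hex check.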

There are also the following functions for the common backend, found in
``kernel/debug/debug_core.c``, that must be supplied by the
architecture-specific backend unless marked as (optional), in which case
a default function may be used if the architecture does not need to
provide a specific implementation.

.. kernel-doc:: include/linux/kgdb.h
   :internal:

kgdboc internals
----------------

kgdboc and uarts
~~~~~~~~~~~~~~~~

The kgdboc driver is actually a very thin driver that relies on the
underlying low-level hardware driver, to which the tty driver is attached,
providing "polling hooks". In the initial implementation of
kgdboc the serial_core was changed to expose a low level UART hook for
doing polled mode reading and writing of a single character while in an
atomic context. When kgdb makes an I/O request to the debugger, kgdboc
invokes a callback in the serial core which in turn uses the callback in
the UART driver.

When using kgdboc with a UART, the UART driver must implement two
callbacks in the :c:type:`struct uart_ops <uart_ops>`.
Example from ``drivers/tty/serial/8250/8250_port.c``::

  #ifdef CONFIG_CONSOLE_POLL
      .poll_get_char = serial8250_get_poll_char,
      .poll_put_char = serial8250_put_poll_char,
  #endif

Any implementation specifics around creating a polling driver should be
wrapped in ``#ifdef CONFIG_CONSOLE_POLL``, as shown above. Keep in mind that
polling hooks have to be implemented in such a way that they can be
called from an atomic context and have to restore the state of the UART
chip on return such that the system can return to normal when the
debugger detaches. You need to be very careful with any kind of lock you
consider, because failing here is most likely going to mean pressing the
reset button.
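
To make the "restore the state of the UART" requirement concrete, here is a toy model of an 8250-style chip in plain C. The register layout and ``struct toy_uart`` are simplified assumptions for illustration; only the two hook shapes follow ``uart_ops`` (``poll_get_char`` returns a character or a "no character" value immediately, ``poll_put_char`` takes one byte), and ``SKETCH_NO_POLL_CHAR`` stands in for the kernel's own ``NO_POLL_CHAR``. The put hook masks interrupts while it works and then puts the caller's interrupt-enable value back.

```c
#include <stdint.h>

#define SKETCH_NO_POLL_CHAR (-1) /* stands in for the kernel's NO_POLL_CHAR */
#define LSR_DATA_READY 0x01      /* a received byte is waiting */
#define LSR_TX_EMPTY 0x20        /* transmitter can accept a byte */
#define TX_WAIT_LIMIT 10000      /* bounded wait, never spin forever */

/* Toy model of a UART's registers (simplified 8250-style layout). */
struct toy_uart {
	uint8_t ier; /* interrupt enable register */
	uint8_t lsr; /* line status register */
	uint8_t rx;  /* receive data register */
	uint8_t tx;  /* transmit data register */
};

/* Return a waiting byte, or SKETCH_NO_POLL_CHAR at once if none. */
int toy_poll_get_char(struct toy_uart *u)
{
	if (!(u->lsr & LSR_DATA_READY))
		return SKETCH_NO_POLL_CHAR;
	/* Model the hardware clearing "data ready" once the byte is read. */
	u->lsr &= (uint8_t)~LSR_DATA_READY;
	return u->rx;
}

/* Send one byte with interrupts masked, then restore the saved IER. */
void toy_poll_put_char(struct toy_uart *u, unsigned char c)
{
	uint8_t saved_ier = u->ier;

	u->ier = 0;
	for (int i = 0; i < TX_WAIT_LIMIT && !(u->lsr & LSR_TX_EMPTY); i++)
		; /* real hardware would update lsr underneath us */
	u->tx = c; /* written even on timeout, as a best effort */
	u->ier = saved_ier;
}
```

The bounded wait loop is a deliberate choice: a polling hook runs in atomic context, so spinning forever on a wedged transmitter would hang the debugger.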

kgdboc and keyboards
~~~~~~~~~~~~~~~~~~~~

The kgdboc driver contains logic to configure communications with an
attached keyboard. The keyboard infrastructure is only compiled into the
kernel when ``CONFIG_KDB_KEYBOARD=y`` is set in the kernel configuration.

The core polled keyboard driver for PS/2 type keyboards is in
``kernel/debug/kdb/kdb_keyboard.c``. This driver is hooked into the debug core
when kgdboc populates the callback in the array called
:c:type:`kdb_poll_funcs[]`. The :c:func:`kdb_get_kbd_char` is the top-level
function which polls hardware for single character input.
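
The ``kdb_poll_funcs[]`` mechanism is essentially an array of "get one character or report none" callbacks that kdb walks in turn. The sketch below models that idea in userspace; the names ``sketch_add_poll_func()`` and ``sketch_poll_once()`` and the two fake input sources are hypothetical, and returning -1 for "no input" is an assumption of this model.

```c
#include <stddef.h>

/* A poll callback returns one character, or -1 when it has no input. */
typedef int (*get_char_func)(void);

#define SKETCH_MAX_POLL 5
static get_char_func poll_funcs[SKETCH_MAX_POLL];

/* Put f into the first free slot; -1 if the array is full. */
int sketch_add_poll_func(get_char_func f)
{
	for (int i = 0; i < SKETCH_MAX_POLL; i++) {
		if (poll_funcs[i] == NULL) {
			poll_funcs[i] = f;
			return 0;
		}
	}
	return -1;
}

/* Ask each source once, in order; return the first character found. */
int sketch_poll_once(void)
{
	for (int i = 0; i < SKETCH_MAX_POLL && poll_funcs[i]; i++) {
		int c = poll_funcs[i]();

		if (c != -1)
			return c;
	}
	return -1;
}

/* Two fake sources: an idle serial port and a keyboard with one key. */
static int serial_idle(void)
{
	return -1;
}

static int pending_key = 'g';

static int keyboard_once(void)
{
	int c = pending_key;

	pending_key = -1; /* the key is consumed */
	return c;
}
```

With the idle serial source registered first and the keyboard source second, the first poll returns the keyboard's ``g`` and the next returns -1 because neither source has data.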
847 | |||
848 | kgdboc and kms | ||
849 | ~~~~~~~~~~~~~~~~~~ | ||
850 | |||
851 | The kgdboc driver contains logic to request the graphics display to | ||
852 | switch to a text context when you are using ``kgdboc=kms,kbd``, provided | ||
853 | that you have a video driver which has a frame buffer console and atomic | ||
854 | kernel mode setting support. | ||
855 | |||
856 | Every time the kernel debugger is entered it calls | ||
857 | :c:func:`kgdboc_pre_exp_handler` which in turn calls :c:func:`con_debug_enter` | ||
858 | in the virtual console layer. On resuming kernel execution, the kernel | ||
859 | debugger calls :c:func:`kgdboc_post_exp_handler` which in turn calls | ||
860 | :c:func:`con_debug_leave`. | ||
861 | |||
862 | Any video driver that wants to be compatible with the kernel debugger | ||
863 | and the atomic kms callbacks must implement the ``mode_set_base_atomic``, | ||
864 | ``fb_debug_enter`` and ``fb_debug_leave`` operations. For | ||
865 | ``fb_debug_enter`` and ``fb_debug_leave``, a driver can either use the | ||
866 | generic drm fb helper functions or implement something custom for the | ||
867 | hardware. The following example shows the initialization of the | ||
868 | ``.mode_set_base_atomic`` operation in | ||
869 | ``drivers/gpu/drm/i915/intel_display.c``:: | ||
870 | |||
871 | |||
872 | static const struct drm_crtc_helper_funcs intel_helper_funcs = { | ||
873 | [...] | ||
874 | .mode_set_base_atomic = intel_pipe_set_base_atomic, | ||
875 | [...] | ||
876 | }; | ||
877 | |||
878 | |||
879 | Here is an example of how the i915 driver initializes the | ||
880 | ``fb_debug_enter`` and ``fb_debug_leave`` functions to use the generic drm | ||
881 | helpers in ``drivers/gpu/drm/i915/intel_fb.c``:: | ||
882 | |||
883 | |||
884 | static struct fb_ops intelfb_ops = { | ||
885 | [...] | ||
886 | .fb_debug_enter = drm_fb_helper_debug_enter, | ||
887 | .fb_debug_leave = drm_fb_helper_debug_leave, | ||
888 | [...] | ||
889 | }; | ||
890 | |||
891 | |||
892 | Credits | ||
893 | ======= | ||
894 | |||
895 | The following people have contributed to this document: | ||
896 | |||
897 | 1. Amit Kale <amitkale@linsyssoft.com> | ||
898 | |||
899 | 2. Tom Rini <trini@kernel.crashing.org> | ||
900 | |||
901 | In March 2008 this document was completely rewritten by: | ||
902 | |||
903 | - Jason Wessel <jason.wessel@windriver.com> | ||
904 | |||
905 | In Jan 2010 this document was updated to include kdb. | ||
906 | |||
907 | - Jason Wessel <jason.wessel@windriver.com> | ||
diff --git a/Documentation/doc-guide/docbook.rst b/Documentation/doc-guide/docbook.rst deleted file mode 100644 index d8bf04308b43..000000000000 --- a/Documentation/doc-guide/docbook.rst +++ /dev/null | |||
@@ -1,90 +0,0 @@ | |||
1 | DocBook XML [DEPRECATED] | ||
2 | ======================== | ||
3 | |||
4 | .. attention:: | ||
5 | |||
6 | This section describes the deprecated DocBook XML toolchain. Please do not | ||
7 | create new DocBook XML template files. Please consider converting existing | ||
8 | DocBook XML template files to Sphinx/reStructuredText. | ||
9 | |||
10 | Converting DocBook to Sphinx | ||
11 | ---------------------------- | ||
12 | |||
13 | Over time, we expect all of the documents under ``Documentation/DocBook`` to be | ||
14 | converted to Sphinx and reStructuredText. For most DocBook XML documents, a good | ||
15 | enough solution is to use the simple ``Documentation/sphinx/tmplcvt`` script, | ||
16 | which uses ``pandoc`` under the hood. For example:: | ||
17 | |||
18 | $ cd Documentation/sphinx | ||
19 | $ ./tmplcvt ../DocBook/in.tmpl ../out.rst | ||
20 | |||
21 | Then edit the resulting rst files to fix any remaining issues, and add the | ||
22 | document in the ``toctree`` in ``Documentation/index.rst``. | ||
23 | |||
24 | Components of the kernel-doc system | ||
25 | ----------------------------------- | ||
26 | |||
27 | Many places in the source tree have extractable documentation in the form of | ||
28 | block comments above functions. The components of this system are: | ||
29 | |||
30 | - ``scripts/kernel-doc`` | ||
31 | |||
32 | This is a perl script that hunts for the block comments and can mark them up | ||
33 | directly into reStructuredText, DocBook, man, text, and HTML. (No, not | ||
34 | texinfo.) | ||
35 | |||
36 | - ``Documentation/DocBook/*.tmpl`` | ||
37 | |||
38 | These are XML template files, which are normal XML files with special | ||
39 | place-holders for where the extracted documentation should go. | ||
40 | |||
41 | - ``scripts/docproc.c`` | ||
42 | |||
43 | This is a program for converting XML template files into XML files. When a | ||
44 | file is referenced it is searched for symbols exported (EXPORT_SYMBOL), to be | ||
45 | able to distinguish between internal and external functions. | ||
46 | |||
47 | It invokes kernel-doc, giving it the list of functions that are to be | ||
48 | documented. | ||
49 | |||
50 | Additionally it is used to scan the XML template files to locate all the files | ||
51 | referenced herein. This is used to generate dependency information as used by | ||
52 | make. | ||
53 | |||
54 | - ``Makefile`` | ||
55 | |||
56 | The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used to build | ||
57 | DocBook XML files, PostScript files, PDF files, and html files in | ||
58 | Documentation/DocBook. The older target 'sgmldocs' is equivalent to 'xmldocs'. | ||
59 | |||
60 | - ``Documentation/DocBook/Makefile`` | ||
61 | |||
62 | This is where C files are associated with SGML templates. | ||
63 | |||
64 | How to use kernel-doc comments in DocBook XML template files | ||
65 | ------------------------------------------------------------ | ||
66 | |||
67 | DocBook XML template files (\*.tmpl) are like normal XML files, except that they | ||
68 | can contain escape sequences where extracted documentation should be inserted. | ||
69 | |||
70 | ``!E<filename>`` is replaced by the documentation, in ``<filename>``, for | ||
71 | functions that are exported using ``EXPORT_SYMBOL``: the function list is | ||
72 | collected from files listed in ``Documentation/DocBook/Makefile``. | ||
73 | |||
74 | ``!I<filename>`` is replaced by the documentation for functions that are **not** | ||
75 | exported using ``EXPORT_SYMBOL``. | ||
76 | |||
77 | ``!D<filename>`` is used to name additional files to search for functions | ||
78 | exported using ``EXPORT_SYMBOL``. | ||
79 | |||
80 | ``!F<filename> <function [functions...]>`` is replaced by the documentation, in | ||
81 | ``<filename>``, for the functions listed. | ||
82 | |||
83 | ``!P<filename> <section title>`` is replaced by the contents of the ``DOC:`` | ||
84 | section titled ``<section title>`` from ``<filename>``. Spaces are allowed in | ||
85 | ``<section title>``; do not quote the ``<section title>``. | ||
86 | |||
87 | ``!C<filename>`` is replaced by nothing, but makes the tools check that all DOC: | ||
88 | sections and documented functions, symbols, etc. are used. This makes sense to | ||
89 | use when you use ``!F`` or ``!P`` only and want to verify that all documentation | ||
90 | is included. | ||
diff --git a/Documentation/doc-guide/index.rst b/Documentation/doc-guide/index.rst index 6fff4024606e..a7f95d7d3a63 100644 --- a/Documentation/doc-guide/index.rst +++ b/Documentation/doc-guide/index.rst | |||
@@ -10,7 +10,6 @@ How to write kernel documentation | |||
10 | sphinx.rst | 10 | sphinx.rst |
11 | kernel-doc.rst | 11 | kernel-doc.rst |
12 | parse-headers.rst | 12 | parse-headers.rst |
13 | docbook.rst | ||
14 | 13 | ||
15 | .. only:: subproject and html | 14 | .. only:: subproject and html |
16 | 15 | ||
diff --git a/Documentation/doc-guide/kernel-doc.rst b/Documentation/doc-guide/kernel-doc.rst index b32e4813ff6f..b24854b5d6be 100644 --- a/Documentation/doc-guide/kernel-doc.rst +++ b/Documentation/doc-guide/kernel-doc.rst | |||
@@ -149,6 +149,16 @@ Domain`_ references. | |||
149 | ``%CONST`` | 149 | ``%CONST`` |
150 | Name of a constant. (No cross-referencing, just formatting.) | 150 | Name of a constant. (No cross-referencing, just formatting.) |
151 | 151 | ||
152 | ````literal```` | ||
153 | A literal block that should be handled as-is. The output will use a | ||
154 | ``monospaced font``. | ||
155 | |||
156 | Useful if you need to use special characters that would otherwise have some | ||
157 | meaning to either the kernel-doc script or reStructuredText. | ||
158 | |||
159 | This is particularly useful if you need to use things like ``%ph`` inside | ||
160 | a function description. | ||
161 | |||
152 | ``$ENVVAR`` | 162 | ``$ENVVAR`` |
153 | Name of an environment variable. (No cross-referencing, just formatting.) | 163 | Name of an environment variable. (No cross-referencing, just formatting.) |
154 | 164 | ||
diff --git a/Documentation/doc-guide/sphinx.rst b/Documentation/doc-guide/sphinx.rst index 731334de3efd..84e8e8a9cbdb 100644 --- a/Documentation/doc-guide/sphinx.rst +++ b/Documentation/doc-guide/sphinx.rst | |||
@@ -15,11 +15,6 @@ are used to describe the functions and types and design of the code. The | |||
15 | kernel-doc comments have some special structure and formatting, but beyond that | 15 | kernel-doc comments have some special structure and formatting, but beyond that |
16 | they are also treated as reStructuredText. | 16 | they are also treated as reStructuredText. |
17 | 17 | ||
18 | There is also the deprecated DocBook toolchain to generate documentation from | ||
19 | DocBook XML template files under ``Documentation/DocBook``. The DocBook files | ||
20 | are to be converted to reStructuredText, and the toolchain is slated to be | ||
21 | removed. | ||
22 | |||
23 | Finally, there are thousands of plain text documentation files scattered around | 18 | Finally, there are thousands of plain text documentation files scattered around |
24 | ``Documentation``. Some of these will likely be converted to reStructuredText | 19 | ``Documentation``. Some of these will likely be converted to reStructuredText |
25 | over time, but the bulk of them will remain in plain text. | 20 | over time, but the bulk of them will remain in plain text. |
diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 77b92221f951..f64a63b233c3 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff | |||
@@ -118,7 +118,6 @@ defkeymap.c | |||
118 | devlist.h* | 118 | devlist.h* |
119 | devicetable-offsets.h | 119 | devicetable-offsets.h |
120 | dnotify_test | 120 | dnotify_test |
121 | docproc | ||
122 | dslm | 121 | dslm |
123 | dtc | 122 | dtc |
124 | elf2ecoff | 123 | elf2ecoff |
diff --git a/Documentation/driver-api/i2c.rst b/Documentation/driver-api/i2c.rst index f3939f7852bd..0bf86a445d01 100644 --- a/Documentation/driver-api/i2c.rst +++ b/Documentation/driver-api/i2c.rst | |||
@@ -13,8 +13,8 @@ I2C is a multi-master bus; open drain signaling is used to arbitrate | |||
13 | between masters, as well as to handshake and to synchronize clocks from | 13 | between masters, as well as to handshake and to synchronize clocks from |
14 | slower clients. | 14 | slower clients. |
15 | 15 | ||
16 | The Linux I2C programming interfaces support only the master side of bus | 16 | The Linux I2C programming interfaces support the master side of bus |
17 | interactions, not the slave side. The programming interface is | 17 | interactions and the slave side. The programming interface is |
18 | structured around two kinds of driver, and two kinds of device. An I2C | 18 | structured around two kinds of driver, and two kinds of device. An I2C |
19 | "Adapter Driver" abstracts the controller hardware; it binds to a | 19 | "Adapter Driver" abstracts the controller hardware; it binds to a |
20 | physical device (perhaps a PCI device or platform_device) and exposes a | 20 | physical device (perhaps a PCI device or platform_device) and exposes a |
@@ -22,9 +22,8 @@ physical device (perhaps a PCI device or platform_device) and exposes a | |||
22 | I2C bus segment it manages. On each I2C bus segment will be I2C devices | 22 | I2C bus segment it manages. On each I2C bus segment will be I2C devices |
23 | represented by a :c:type:`struct i2c_client <i2c_client>`. | 23 | represented by a :c:type:`struct i2c_client <i2c_client>`. |
24 | Those devices will be bound to a :c:type:`struct i2c_driver | 24 | Those devices will be bound to a :c:type:`struct i2c_driver |
25 | <i2c_driver>`, which should follow the standard Linux driver | 25 | <i2c_driver>`, which should follow the standard Linux driver model. There |
26 | model. (At this writing, a legacy model is more widely used.) There are | 26 | are functions to perform various I2C protocol operations; at this writing |
27 | functions to perform various I2C protocol operations; at this writing | ||
28 | all such functions are usable only from task context. | 27 | all such functions are usable only from task context. |
29 | 28 | ||
30 | The System Management Bus (SMBus) is a sibling protocol. Most SMBus | 29 | The System Management Bus (SMBus) is a sibling protocol. Most SMBus |
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 8058a87c1c74..3cf1acebc4ee 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst | |||
@@ -32,7 +32,13 @@ available subsections can be seen below. | |||
32 | i2c | 32 | i2c |
33 | hsi | 33 | hsi |
34 | edac | 34 | edac |
35 | scsi | ||
36 | libata | ||
37 | mtdnand | ||
35 | miscellaneous | 38 | miscellaneous |
39 | w1 | ||
40 | rapidio | ||
41 | s390-drivers | ||
36 | vme | 42 | vme |
37 | 80211/index | 43 | 80211/index |
38 | uio-howto | 44 | uio-howto |
diff --git a/Documentation/driver-api/libata.rst b/Documentation/driver-api/libata.rst new file mode 100644 index 000000000000..4adc056f7635 --- /dev/null +++ b/Documentation/driver-api/libata.rst | |||
@@ -0,0 +1,1031 @@ | |||
1 | ======================== | ||
2 | libATA Developer's Guide | ||
3 | ======================== | ||
4 | |||
5 | :Author: Jeff Garzik | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | libATA is a library used inside the Linux kernel to support ATA host | ||
11 | controllers and devices. libATA provides an ATA driver API, class | ||
12 | transports for ATA and ATAPI devices, and SCSI<->ATA translation for ATA | ||
13 | devices according to the T10 SAT specification. | ||
14 | |||
15 | This Guide documents the libATA driver API, library functions, library | ||
16 | internals, and a couple of sample ATA low-level drivers. | ||
17 | |||
18 | libata Driver API | ||
19 | ================= | ||
20 | |||
21 | :c:type:`struct ata_port_operations <ata_port_operations>` | ||
22 | is defined for every low-level libata | ||
23 | hardware driver, and it controls how the low-level driver interfaces | ||
24 | with the ATA and SCSI layers. | ||
25 | |||
26 | FIS-based drivers will hook into the system with ``->qc_prep()`` and | ||
27 | ``->qc_issue()`` high-level hooks. Hardware which behaves in a manner | ||
28 | similar to PCI IDE hardware may utilize several generic helpers, | ||
29 | defining at a bare minimum the bus I/O addresses of the ATA shadow | ||
30 | register blocks. | ||
31 | |||
32 | :c:type:`struct ata_port_operations <ata_port_operations>` | ||
33 | ---------------------------------------------------------- | ||
34 | |||
35 | Disable ATA port | ||
36 | ~~~~~~~~~~~~~~~~ | ||
37 | |||
38 | :: | ||
39 | |||
40 | void (*port_disable) (struct ata_port *); | ||
41 | |||
42 | |||
43 | Called from :c:func:`ata_bus_probe` error path, as well as when unregistering | ||
44 | from the SCSI module (rmmod, hot unplug). This function should do | ||
45 | whatever needs to be done to take the port out of use. In most cases, | ||
46 | :c:func:`ata_port_disable` can be used as this hook. | ||
47 | |||
48 | Called from :c:func:`ata_bus_probe` on a failed probe. Called from | ||
49 | :c:func:`ata_scsi_release`. | ||
50 | |||
51 | Post-IDENTIFY device configuration | ||
52 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
53 | |||
54 | :: | ||
55 | |||
56 | void (*dev_config) (struct ata_port *, struct ata_device *); | ||
57 | |||
58 | |||
59 | Called after IDENTIFY [PACKET] DEVICE is issued to each device found. | ||
60 | Typically used to apply device-specific fixups prior to issuing SET | ||
61 | FEATURES - XFER MODE, and prior to operation. | ||
62 | |||
63 | This entry may be specified as NULL in ata_port_operations. | ||
64 | |||
65 | Set PIO/DMA mode | ||
66 | ~~~~~~~~~~~~~~~~ | ||
67 | |||
68 | :: | ||
69 | |||
70 | void (*set_piomode) (struct ata_port *, struct ata_device *); | ||
71 | void (*set_dmamode) (struct ata_port *, struct ata_device *); | ||
72 | void (*post_set_mode) (struct ata_port *); | ||
73 | unsigned int (*mode_filter) (struct ata_port *, struct ata_device *, unsigned int); | ||
74 | |||
75 | |||
76 | Hooks called prior to the issue of SET FEATURES - XFER MODE command. The | ||
77 | optional ``->mode_filter()`` hook is called when libata has built a mask of | ||
78 | the possible modes. This is passed to the ``->mode_filter()`` function | ||
79 | which should return a mask of valid modes after filtering those | ||
80 | unsuitable due to hardware limits. It is not valid to use this interface | ||
81 | to add modes. | ||
82 | |||
83 | ``dev->pio_mode`` and ``dev->dma_mode`` are guaranteed to be valid when | ||
84 | ``->set_piomode()`` and ``->set_dmamode()`` are called. The timings for | ||
85 | any other drive sharing the cable will also be valid at this point. That | ||
86 | is, the library records the decisions for the modes of each drive on a | ||
87 | channel before it attempts to set any of them. | ||
88 | |||
89 | ``->post_set_mode()`` is called unconditionally, after the SET FEATURES - | ||
90 | XFER MODE command completes successfully. | ||
91 | |||
92 | ``->set_piomode()`` is always called (if present), but ``->set_dmamode()`` | ||
93 | is only called if DMA is possible. | ||
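The "filter only, never add" rule for ``->mode_filter()`` can be shown with a small sketch. The bit layout of the mask (PIO0-4 in bits 0-4, MWDMA0-2 in bits 5-7, UDMA0-6 in bits 8-14) and the "no UDMA5/UDMA6" controller limit are invented for illustration; libata uses its own shift constants.

```c
#include <stddef.h>

/* Hypothetical mask layout: bits 0-4 PIO0-4, 5-7 MWDMA0-2, 8-14 UDMA0-6. */
#define DEMO_SHIFT_UDMA 8u

struct demo_port;    /* opaque: this filter does not need them */
struct demo_device;

/* Drop UDMA5 and UDMA6 for a made-up controller limit.
 * The filter may only clear bits; it must never add modes. */
static unsigned int demo_mode_filter(struct demo_port *ap,
				     struct demo_device *adev,
				     unsigned int mask)
{
	unsigned int too_fast = (1u << (DEMO_SHIFT_UDMA + 5)) |
				(1u << (DEMO_SHIFT_UDMA + 6));

	(void)ap;
	(void)adev;
	return mask & ~too_fast;
}
```

Because the result is computed as `mask & ~something`, it can never contain a mode that was not already in the input.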
94 | |||
95 | Taskfile read/write | ||
96 | ~~~~~~~~~~~~~~~~~~~ | ||
97 | |||
98 | :: | ||
99 | |||
100 | void (*sff_tf_load) (struct ata_port *ap, struct ata_taskfile *tf); | ||
101 | void (*sff_tf_read) (struct ata_port *ap, struct ata_taskfile *tf); | ||
102 | |||
103 | |||
104 | ``->sff_tf_load()`` is called to load the given taskfile into hardware | ||
105 | registers / DMA buffers. ``->sff_tf_read()`` is called to read the hardware | ||
106 | registers / DMA buffers, to obtain the current set of taskfile register | ||
107 | values. Most drivers for taskfile-based hardware (PIO or MMIO) use | ||
108 | :c:func:`ata_sff_tf_load` and :c:func:`ata_sff_tf_read` for these hooks. | ||
109 | |||
110 | PIO data read/write | ||
111 | ~~~~~~~~~~~~~~~~~~~ | ||
112 | |||
113 | :: | ||
114 | |||
115 | void (*sff_data_xfer) (struct ata_device *, unsigned char *, unsigned int, int); | ||
116 | |||
117 | |||
118 | All bmdma-style drivers must implement this hook. This is the low-level | ||
119 | operation that actually copies the data bytes during a PIO data | ||
120 | transfer. Typically the driver will choose one of | ||
121 | :c:func:`ata_sff_data_xfer_noirq`, :c:func:`ata_sff_data_xfer`, or | ||
122 | :c:func:`ata_sff_data_xfer32`. | ||
123 | |||
124 | ATA command execute | ||
125 | ~~~~~~~~~~~~~~~~~~~ | ||
126 | |||
127 | :: | ||
128 | |||
129 | void (*sff_exec_command)(struct ata_port *ap, struct ata_taskfile *tf); | ||
130 | |||
131 | |||
132 | Causes an ATA command, previously loaded with ``->sff_tf_load()``, to be | ||
133 | initiated in hardware. Most drivers for taskfile-based hardware use | ||
134 | :c:func:`ata_sff_exec_command` for this hook. | ||
135 | |||
136 | Per-cmd ATAPI DMA capabilities filter | ||
137 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
138 | |||
139 | :: | ||
140 | |||
141 | int (*check_atapi_dma) (struct ata_queued_cmd *qc); | ||
142 | |||
143 | |||
144 | Allows the low-level driver to filter ATA PACKET commands, returning a status | ||
145 | indicating whether or not it is OK to use DMA for the supplied PACKET | ||
146 | command. | ||
147 | |||
148 | This hook may be specified as NULL, in which case libata will assume | ||
149 | that ATAPI DMA can be supported. | ||
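A minimal sketch of such a filter, assuming the convention used by libata's own ATAPI DMA check of returning 0 when DMA may be used and nonzero to fall back to PIO. `struct demo_qc` and the 16-byte alignment rule are hypothetical; a real driver would inspect the fields of `struct ata_queued_cmd` relevant to its hardware erratum.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct ata_queued_cmd: only what we test. */
struct demo_qc {
	uint32_t nbytes;   /* transfer length in bytes */
};

/* Return 0 if DMA may be used for this PACKET command, nonzero to force
 * PIO. The 16-byte alignment requirement is an invented restriction. */
static int demo_check_atapi_dma(struct demo_qc *qc)
{
	return (qc->nbytes % 16) != 0;
}
```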
150 | |||
151 | Read specific ATA shadow registers | ||
152 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
153 | |||
154 | :: | ||
155 | |||
156 | u8 (*sff_check_status)(struct ata_port *ap); | ||
157 | u8 (*sff_check_altstatus)(struct ata_port *ap); | ||
158 | |||
159 | |||
160 | Reads the Status/AltStatus ATA shadow register from hardware. On some | ||
161 | hardware, reading the Status register has the side effect of clearing | ||
162 | the interrupt condition. Most drivers for taskfile-based hardware use | ||
163 | :c:func:`ata_sff_check_status` for this hook. | ||
164 | |||
165 | Write specific ATA shadow register | ||
166 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
167 | |||
168 | :: | ||
169 | |||
170 | void (*sff_set_devctl)(struct ata_port *ap, u8 ctl); | ||
171 | |||
172 | |||
173 | Write the device control ATA shadow register to the hardware. Most | ||
174 | drivers don't need to define this. | ||
175 | |||
176 | Select ATA device on bus | ||
177 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
178 | |||
179 | :: | ||
180 | |||
181 | void (*sff_dev_select)(struct ata_port *ap, unsigned int device); | ||
182 | |||
183 | |||
184 | Issues the low-level hardware command(s) that cause one of N hardware | ||
185 | devices to be considered 'selected' (active and available for use) on | ||
186 | the ATA bus. This generally has no meaning on FIS-based devices. | ||
187 | |||
188 | Most drivers for taskfile-based hardware use :c:func:`ata_sff_dev_select` for | ||
189 | this hook. | ||
190 | |||
191 | Private tuning method | ||
192 | ~~~~~~~~~~~~~~~~~~~~~ | ||
193 | |||
194 | :: | ||
195 | |||
196 | void (*set_mode) (struct ata_port *ap); | ||
197 | |||
198 | |||
199 | By default libata performs drive and controller tuning in accordance | ||
200 | with the ATA timing rules and also applies blacklists and cable limits. | ||
201 | Some controllers need special handling and have custom tuning rules, | ||
202 | typically RAID controllers that use ATA commands but do not actually do | ||
203 | drive timing. | ||
204 | |||
205 | **Warning** | ||
206 | |||
207 | This hook should not be used to replace the standard controller | ||
208 | tuning logic when a controller has quirks. Replacing the default | ||
209 | tuning logic in that case would bypass handling for drive and bridge | ||
210 | quirks that may be important to data reliability. If a controller | ||
211 | needs to filter the mode selection, it should use the ``->mode_filter()`` | ||
212 | hook instead. | ||
213 | |||
214 | Control PCI IDE BMDMA engine | ||
215 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
216 | |||
217 | :: | ||
218 | |||
219 | void (*bmdma_setup) (struct ata_queued_cmd *qc); | ||
220 | void (*bmdma_start) (struct ata_queued_cmd *qc); | ||
221 | void (*bmdma_stop) (struct ata_port *ap); | ||
222 | u8 (*bmdma_status) (struct ata_port *ap); | ||
223 | |||
224 | |||
225 | When setting up an IDE BMDMA transaction, these hooks arm | ||
226 | (``->bmdma_setup``), fire (``->bmdma_start``), and halt (``->bmdma_stop``) the | ||
227 | hardware's DMA engine. ``->bmdma_status`` is used to read the standard PCI | ||
228 | IDE DMA Status register. | ||
229 | |||
230 | These hooks are typically either no-ops, or simply not implemented, in | ||
231 | FIS-based drivers. | ||
232 | |||
233 | Most legacy IDE drivers use :c:func:`ata_bmdma_setup` for the | ||
234 | :c:func:`bmdma_setup` hook. :c:func:`ata_bmdma_setup` will write the pointer | ||
235 | to the PRD table to the IDE PRD Table Address register, enable DMA in the DMA | ||
236 | Command register, and call :c:func:`exec_command` to begin the transfer. | ||
237 | |||
238 | Most legacy IDE drivers use :c:func:`ata_bmdma_start` for the | ||
239 | :c:func:`bmdma_start` hook. :c:func:`ata_bmdma_start` will write the | ||
240 | ATA_DMA_START flag to the DMA Command register. | ||
241 | |||
242 | Many legacy IDE drivers use :c:func:`ata_bmdma_stop` for the | ||
243 | :c:func:`bmdma_stop` hook. :c:func:`ata_bmdma_stop` clears the ATA_DMA_START | ||
244 | flag in the DMA command register. | ||
245 | |||
246 | Many legacy IDE drivers use :c:func:`ata_bmdma_status` as the | ||
247 | :c:func:`bmdma_status` hook. | ||
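The register-level effect of the four BMDMA hooks described above can be modelled in user space. `struct demo_bmdma_regs`, the `demo_*` functions and the bit values are assumptions standing in for the PCI IDE bus-master register block; the direction handling (setting the write-to-memory bit for a device-to-host read) follows the usual bus-master IDE convention but should be checked against the controller's documentation. Unlike the real setup helper, this sketch does not issue the ATA command.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DMA_START 0x01u  /* command register: start/stop bit */
#define DEMO_DMA_WR    0x08u  /* command register: bus master writes memory */

/* Hypothetical model of the PCI IDE bus-master registers. */
struct demo_bmdma_regs {
	uint8_t  cmd;       /* DMA Command register */
	uint8_t  status;    /* DMA Status register */
	uint32_t prd_addr;  /* PRD Table Address register */
};

/* ->bmdma_setup: program PRD address and direction, but do not start. */
static void demo_bmdma_setup(struct demo_bmdma_regs *r, uint32_t prd, bool write)
{
	r->prd_addr = prd;
	r->cmd &= (uint8_t)~(DEMO_DMA_WR | DEMO_DMA_START);
	if (!write)
		r->cmd |= DEMO_DMA_WR;  /* device-to-host: engine writes memory */
}

/* ->bmdma_start: set the start bit. */
static void demo_bmdma_start(struct demo_bmdma_regs *r)
{
	r->cmd |= DEMO_DMA_START;
}

/* ->bmdma_stop: clear the start bit, halting the engine. */
static void demo_bmdma_stop(struct demo_bmdma_regs *r)
{
	r->cmd &= (uint8_t)~DEMO_DMA_START;
}

/* ->bmdma_status: read back the DMA Status register. */
static uint8_t demo_bmdma_status(const struct demo_bmdma_regs *r)
{
	return r->status;
}
```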
248 | |||
249 | High-level taskfile hooks | ||
250 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
251 | |||
252 | :: | ||
253 | |||
254 | void (*qc_prep) (struct ata_queued_cmd *qc); | ||
255 | int (*qc_issue) (struct ata_queued_cmd *qc); | ||
256 | |||
257 | |||
258 | These two higher-level hooks can potentially supersede several of | ||
259 | the above taskfile/DMA engine hooks. ``->qc_prep`` is called after the | ||
260 | buffers have been DMA-mapped, and is typically used to populate the | ||
261 | hardware's DMA scatter-gather table. Most drivers use the standard | ||
262 | :c:func:`ata_qc_prep` helper function, but more advanced drivers roll their | ||
263 | own. | ||
264 | |||
265 | ``->qc_issue`` is used to make a command active, once the hardware and S/G | ||
266 | tables have been prepared. IDE BMDMA drivers use the helper function | ||
267 | :c:func:`ata_qc_issue_prot` for taskfile protocol-based dispatch. More | ||
268 | advanced drivers implement their own ``->qc_issue``. | ||
269 | |||
270 | :c:func:`ata_qc_issue_prot` calls ``->tf_load()``, ``->bmdma_setup()``, and | ||
271 | ``->bmdma_start()`` as necessary to initiate a transfer. | ||
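A simplified, hypothetical model of protocol-based dispatch makes the call order visible. It is not the libata code: `struct demo_qc`, `struct demo_ops` and the trace letters are invented, the DMA path runs the taskfile load, BMDMA setup and BMDMA start in that order, and the PIO path loads the taskfile and then issues the command directly.

```c
#include <stdbool.h>
#include <string.h>

struct demo_qc;

/* Hypothetical subset of the low-level hooks. */
struct demo_ops {
	void (*tf_load)(struct demo_qc *qc);
	void (*bmdma_setup)(struct demo_qc *qc);
	void (*bmdma_start)(struct demo_qc *qc);
	void (*exec_command)(struct demo_qc *qc);
};

struct demo_qc {
	bool dma;                    /* does the protocol use DMA? */
	const struct demo_ops *ops;
	char trace[8];               /* records which hooks ran, in order */
};

static void log_step(struct demo_qc *qc, char step)
{
	size_t n = strlen(qc->trace);

	if (n + 1 < sizeof(qc->trace)) {
		qc->trace[n] = step;
		qc->trace[n + 1] = '\0';
	}
}

static void demo_tf_load(struct demo_qc *qc)      { log_step(qc, 'L'); }
static void demo_bmdma_setup(struct demo_qc *qc)  { log_step(qc, 'S'); }
static void demo_bmdma_start(struct demo_qc *qc)  { log_step(qc, 'G'); }
static void demo_exec_command(struct demo_qc *qc) { log_step(qc, 'X'); }

static const struct demo_ops demo_ops = {
	.tf_load      = demo_tf_load,
	.bmdma_setup  = demo_bmdma_setup,
	.bmdma_start  = demo_bmdma_start,
	.exec_command = demo_exec_command,
};

/* Dispatch on protocol, in the spirit of ata_qc_issue_prot(). */
static int demo_qc_issue_prot(struct demo_qc *qc)
{
	qc->ops->tf_load(qc);
	if (qc->dma) {
		qc->ops->bmdma_setup(qc);
		qc->ops->bmdma_start(qc);
	} else {
		qc->ops->exec_command(qc);
	}
	return 0;
}
```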
272 | |||
273 | Exception and probe handling (EH) | ||
274 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
275 | |||
276 | :: | ||
277 | |||
278 | void (*eng_timeout) (struct ata_port *ap); | ||
279 | void (*phy_reset) (struct ata_port *ap); | ||
280 | |||
281 | |||
282 | Deprecated. Use ``->error_handler()`` instead. | ||
283 | |||
284 | :: | ||
285 | |||
286 | void (*freeze) (struct ata_port *ap); | ||
287 | void (*thaw) (struct ata_port *ap); | ||
288 | |||
289 | |||
290 | :c:func:`ata_port_freeze` is called when HSM violations or some other | ||
291 | condition disrupts normal operation of the port. A frozen port is not | ||
292 | allowed to perform any operation until the port is thawed, which usually | ||
293 | follows a successful reset. | ||
294 | |||
295 | The optional ``->freeze()`` callback can be used for freezing the port | ||
296 | hardware-wise (e.g. mask interrupt and stop DMA engine). If a port | ||
297 | cannot be frozen hardware-wise, the interrupt handler must ack and clear | ||
298 | interrupts unconditionally while the port is frozen. | ||
299 | |||
300 | The optional ``->thaw()`` callback is called to perform the opposite of | ||
301 | ``->freeze()``: prepare the port for normal operation once again. Unmask | ||
302 | interrupts, start DMA engine, etc. | ||
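The freeze/thaw contract can be sketched against a toy port model. `struct demo_port`, its fields and the `demo_*` functions are hypothetical. The interrupt helper shows the rule stated above: while frozen, pending interrupts are acked and cleared unconditionally but no completion work is done.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port model: interrupt mask plus a DMA-engine flag. */
struct demo_port {
	bool     frozen;
	uint32_t irq_mask;    /* set bits = interrupts enabled */
	bool     dma_running;
	uint32_t saved_mask;  /* remembered across freeze/thaw */
};

/* ->freeze(): quiesce the hardware so it raises no further events. */
static void demo_freeze(struct demo_port *ap)
{
	ap->saved_mask = ap->irq_mask;
	ap->irq_mask = 0;
	ap->dma_running = false;
	ap->frozen = true;
}

/* ->thaw(): undo freeze, typically after a successful reset. */
static void demo_thaw(struct demo_port *ap)
{
	ap->irq_mask = ap->saved_mask;
	ap->dma_running = true;
	ap->frozen = false;
}

/* Interrupt path: always ack; only do completion work when not frozen. */
static int demo_handle_irq(struct demo_port *ap, uint32_t *pending)
{
	int handled = (*pending != 0) && !ap->frozen;

	*pending = 0;    /* ack and clear so the line does not stay asserted */
	return handled;  /* 1 if completion work would run */
}
```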
303 | |||
304 | :: | ||
305 | |||
306 | void (*error_handler) (struct ata_port *ap); | ||
307 | |||
308 | |||
309 | ``->error_handler()`` is a driver's hook into probe, hotplug, recovery, | ||
310 | and other exceptional conditions. The primary responsibility of an | ||
311 | implementation is to call :c:func:`ata_do_eh` or :c:func:`ata_bmdma_drive_eh` | ||
312 | with a set of EH hooks as arguments: | ||
313 | |||
314 | 'prereset' hook (may be NULL) is called during an EH reset, before any | ||
315 | other actions are taken. | ||
316 | |||
317 | Based on existing conditions, severity of the problem, and hardware | ||
318 | capabilities, either 'softreset' (may be NULL) or 'hardreset' (may be | ||
319 | NULL) is then called to perform the low-level EH reset. | ||
320 | |||
321 | 'postreset' hook (may be NULL) is called after the EH reset is | ||
322 | performed. | ||
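The reset sequence driven through these hooks can be sketched as a small dispatcher. `demo_do_eh()` is hypothetical and far simpler than `ata_do_eh()`; in particular the rule "use hardreset when the problem is severe, otherwise softreset, falling back to whichever exists" is an invented policy standing in for libata's real decision logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port state; 'log' records which hooks ran. */
struct demo_port {
	char log[8];
};

typedef int  (*demo_reset_fn)(struct demo_port *ap);  /* 0 on success */
typedef void (*demo_postreset_fn)(struct demo_port *ap);

static void note(struct demo_port *ap, char c)
{
	size_t n = strlen(ap->log);

	if (n + 1 < sizeof(ap->log)) {
		ap->log[n] = c;
		ap->log[n + 1] = '\0';
	}
}

/* Optional prereset, then one of softreset/hardreset, then optional
 * postreset. Any hook may be NULL. */
static int demo_do_eh(struct demo_port *ap, bool severe,
		      demo_reset_fn prereset, demo_reset_fn softreset,
		      demo_reset_fn hardreset, demo_postreset_fn postreset)
{
	demo_reset_fn reset = (severe && hardreset) ? hardreset : softreset;
	int rc;

	if (!reset)
		reset = hardreset;  /* fall back to whichever reset exists */
	if (prereset && (rc = prereset(ap)) != 0)
		return rc;
	if (reset && (rc = reset(ap)) != 0)
		return rc;
	if (postreset)
		postreset(ap);
	return 0;
}

static int  demo_prereset(struct demo_port *ap)  { note(ap, 'P'); return 0; }
static int  demo_softreset(struct demo_port *ap) { note(ap, 'S'); return 0; }
static int  demo_hardreset(struct demo_port *ap) { note(ap, 'H'); return 0; }
static void demo_postreset(struct demo_port *ap) { note(ap, 'Q'); }
```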
323 | |||
324 | :: | ||
325 | |||
326 | void (*post_internal_cmd) (struct ata_queued_cmd *qc); | ||
327 | |||
328 | |||
329 | Perform any hardware-specific actions necessary to finish processing | ||
330 | after executing a probe-time or EH-time command via | ||
331 | :c:func:`ata_exec_internal`. | ||
332 | |||
333 | Hardware interrupt handling | ||
334 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
335 | |||
336 | :: | ||
337 | |||
338 | irqreturn_t (*irq_handler)(int, void *, struct pt_regs *); | ||
339 | void (*irq_clear) (struct ata_port *); | ||
340 | |||
341 | |||
342 | ``->irq_handler`` is the interrupt handling routine registered with the | ||
343 | system, by libata. ``->irq_clear`` is called during probe just before the | ||
344 | interrupt handler is registered, to be sure hardware is quiet. | ||
345 | |||
346 | The second argument, dev_instance, should be cast to a pointer to | ||
347 | :c:type:`struct ata_host_set <ata_host_set>`. | ||
348 | |||
349 | Most legacy IDE drivers use :c:func:`ata_sff_interrupt` for the irq_handler | ||
350 | hook, which scans all ports in the host_set, determines which queued | ||
351 | command was active (if any), and calls ata_sff_host_intr(ap,qc). | ||
352 | |||
353 | Most legacy IDE drivers use :c:func:`ata_sff_irq_clear` for the | ||
354 | :c:func:`irq_clear` hook, which simply clears the interrupt and error flags | ||
355 | in the DMA status register. | ||
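The "scan every port, complete the active command" pattern of a shared handler can be sketched in user space. All `demo_*` names are hypothetical, the `pt_regs` argument of the old signature is omitted, and `irq_pending` stands in for the interrupt bit of the DMA status register.

```c
#include <stddef.h>

enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

struct demo_qc { int done; };

struct demo_port {
	struct demo_qc *active_qc;  /* NULL if nothing is in flight */
	int irq_pending;            /* models the DMA status INTR bit */
};

struct demo_host_set {
	size_t n_ports;
	struct demo_port *ports;
};

/* Per-port completion, in the spirit of ata_sff_host_intr(ap, qc). */
static int demo_host_intr(struct demo_port *ap, struct demo_qc *qc)
{
	if (!ap->irq_pending)
		return 0;
	ap->irq_pending = 0;        /* ack the interrupt */
	qc->done = 1;
	ap->active_qc = NULL;
	return 1;
}

/* Shared handler: dev_instance points to the host_set. */
static enum demo_irqreturn demo_interrupt(int irq, void *dev_instance)
{
	struct demo_host_set *host_set = dev_instance;
	int handled = 0;

	(void)irq;
	for (size_t i = 0; i < host_set->n_ports; i++) {
		struct demo_port *ap = &host_set->ports[i];

		if (ap->active_qc)
			handled |= demo_host_intr(ap, ap->active_qc);
	}
	return handled ? DEMO_IRQ_HANDLED : DEMO_IRQ_NONE;
}
```

Returning "none" when no port claimed the interrupt matters on shared IRQ lines, where another device may be the source.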
356 | |||
357 | SATA phy read/write | ||
358 | ~~~~~~~~~~~~~~~~~~~ | ||
359 | |||
360 | :: | ||
361 | |||
362 | int (*scr_read) (struct ata_port *ap, unsigned int sc_reg, | ||
363 | u32 *val); | ||
364 | int (*scr_write) (struct ata_port *ap, unsigned int sc_reg, | ||
365 | u32 val); | ||
366 | |||
367 | |||
368 | Read and write standard SATA phy registers. Currently only used if | ||
369 | ``->phy_reset`` hook called the :c:func:`sata_phy_reset` helper function. | ||
370 | sc_reg is one of SCR_STATUS, SCR_CONTROL, SCR_ERROR, or SCR_ACTIVE. | ||
371 | |||
372 | Init and shutdown | ||
373 | ~~~~~~~~~~~~~~~~~ | ||
374 | |||
375 | :: | ||
376 | |||
377 | int (*port_start) (struct ata_port *ap); | ||
378 | void (*port_stop) (struct ata_port *ap); | ||
379 | void (*host_stop) (struct ata_host_set *host_set); | ||
380 | |||
381 | |||
382 | ``->port_start()`` is called just after the data structures for each port | ||
383 | are initialized. Typically this is used to allocate per-port DMA buffers / | ||
384 | tables / rings, enable DMA engines, and similar tasks. Some drivers also | ||
385 | use this entry point as a chance to allocate driver-private memory for | ||
386 | ``ap->private_data``. | ||
387 | |||
388 | Many drivers use :c:func:`ata_port_start` as this hook or call it from their | ||
389 | own :c:func:`port_start` hooks. :c:func:`ata_port_start` allocates space for | ||
390 | a legacy IDE PRD table and returns. | ||
391 | |||
392 | ``->port_stop()`` is called before ``->host_stop()``. Its sole function is to | ||
393 | release DMA/memory resources, now that they are no longer actively being | ||
394 | used. Many drivers also free driver-private data from port at this time. | ||
395 | |||
396 | ``->host_stop()`` is called after all ``->port_stop()`` calls have completed. | ||
397 | The hook must finalize hardware shutdown, release DMA and other | ||
398 | resources, etc. This hook may be specified as NULL, in which case it is | ||
399 | not called. | ||
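A sketch of the teardown ordering, with every ``->port_stop()`` running before the optional ``->host_stop()`` (matching the statement that ``->host_stop()`` runs after all ``->port_stop()`` calls have completed). The `demo_*` types and functions are hypothetical, and `calloc()` stands in for the per-port PRD table a real ``->port_start()`` would allocate.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct demo_host;

struct demo_port {
	struct demo_host *host;
	void *private_data;   /* per-port buffers from ->port_start() */
};

struct demo_ops {
	int  (*port_start)(struct demo_port *ap);
	void (*port_stop)(struct demo_port *ap);
	void (*host_stop)(struct demo_host *host);  /* may be NULL */
};

struct demo_host {
	const struct demo_ops *ops;
	size_t n_ports;
	struct demo_port *ports;
	char log[16];         /* records teardown order */
};

static void note(struct demo_host *h, char c)
{
	size_t n = strlen(h->log);

	if (n + 1 < sizeof(h->log)) {
		h->log[n] = c;
		h->log[n + 1] = '\0';
	}
}

static int demo_port_start(struct demo_port *ap)
{
	ap->private_data = calloc(1, 64);  /* stand-in for a PRD table */
	return ap->private_data ? 0 : -1;
}

static void demo_port_stop(struct demo_port *ap)
{
	free(ap->private_data);            /* release per-port resources */
	ap->private_data = NULL;
	note(ap->host, 'p');
}

static void demo_host_stop(struct demo_host *host)
{
	note(host, 'H');                   /* finalize hardware shutdown */
}

/* Teardown: every ->port_stop() first, then ->host_stop() if present. */
static void demo_host_teardown(struct demo_host *host)
{
	for (size_t i = 0; i < host->n_ports; i++)
		if (host->ops->port_stop)
			host->ops->port_stop(&host->ports[i]);
	if (host->ops->host_stop)
		host->ops->host_stop(host);
}

static const struct demo_ops demo_ops = {
	.port_start = demo_port_start,
	.port_stop  = demo_port_stop,
	.host_stop  = demo_host_stop,
};
```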
400 | |||
401 | Error handling | ||
402 | ============== | ||
403 | |||
404 | This chapter describes how errors are handled under libata. Readers are | ||
405 | advised to read SCSI EH (Documentation/scsi/scsi_eh.txt) and ATA | ||
406 | exceptions doc first. | ||
407 | |||
408 | Origins of commands | ||
409 | ------------------- | ||
410 | |||
411 | In libata, a command is represented with | ||
412 | :c:type:`struct ata_queued_cmd <ata_queued_cmd>` or qc. | ||
413 | qc's are preallocated during port initialization and repeatedly used | ||
414 | for command executions. Currently only one qc is allocated per port but | ||
415 | yet-to-be-merged NCQ branch allocates one for each tag and maps each qc | ||
416 | to NCQ tag 1-to-1. | ||
417 | |||
418 | libata commands can originate from two sources - libata itself and SCSI | ||
419 | midlayer. libata internal commands are used for initialization and error | ||
420 | handling. All normal blk requests and commands for SCSI emulation are | ||
421 | passed as SCSI commands through queuecommand callback of SCSI host | ||
422 | template. | ||
423 | |||
424 | How commands are issued | ||
425 | ----------------------- | ||
426 | |||
427 | Internal commands | ||
428 | First, qc is allocated and initialized using :c:func:`ata_qc_new_init`. | ||
429 | Although :c:func:`ata_qc_new_init` doesn't implement any wait or retry | ||
430 | mechanism when qc is not available, internal commands are currently | ||
431 | issued only during initialization and error recovery, so no other | ||
432 | command is active and allocation is guaranteed to succeed. | ||
433 | |||
434 | Once allocated, the qc's taskfile is initialized for the command to be | ||
435 | executed. qc currently has two mechanisms to notify completion. One | ||
436 | is via ``qc->complete_fn()`` callback and the other is completion | ||
437 | ``qc->waiting``. ``qc->complete_fn()`` callback is the asynchronous path | ||
438 | used by normal SCSI translated commands and ``qc->waiting`` is the | ||
439 | synchronous (issuer sleeps in process context) path used by internal | ||
440 | commands. | ||
441 | |||
442 | Once initialization is complete, host_set lock is acquired and the | ||
443 | qc is issued. | ||
444 | |||
445 | SCSI commands | ||
446 | All libata drivers use :c:func:`ata_scsi_queuecmd` as the | ||
447 | ``hostt->queuecommand`` callback. scmds can be either simulated or | ||
448 | translated. No qc is involved in processing a simulated scmd. The | ||
449 | result is computed right away and the scmd is completed. | ||
450 | |||
451 | For a translated scmd, :c:func:`ata_qc_new_init` is invoked to allocate a | ||
452 | qc and the scmd is translated into the qc. The SCSI midlayer's | ||
453 | completion notification function pointer is stored into | ||
454 | ``qc->scsidone``. | ||
455 | |||
456 | ``qc->complete_fn()`` callback is used for completion notification. ATA | ||
457 | commands use :c:func:`ata_scsi_qc_complete` while ATAPI commands use | ||
458 | :c:func:`atapi_qc_complete`. Both functions end up calling ``qc->scsidone`` | ||
459 | to notify the upper layer when the qc is finished. After translation is | ||
460 | completed, the qc is issued with :c:func:`ata_qc_issue`. | ||
461 | |||
462 | Note that the SCSI midlayer invokes ``hostt->queuecommand`` while holding the | ||
463 | host_set lock, so all of the above occurs while holding the host_set lock. | ||
464 | |||
465 | How commands are processed | ||
466 | -------------------------- | ||
467 | |||
468 | Depending on which protocol and which controller are used, commands are | ||
469 | processed differently. For the purpose of discussion, a controller which | ||
470 | uses the taskfile interface and all standard callbacks is assumed. | ||
471 | |||
472 | Currently, six ATA command protocols are used. They can be sorted into the | ||
473 | following four categories according to how they are processed. | ||
474 | |||
475 | ATA NO DATA or DMA | ||
476 | ATA_PROT_NODATA and ATA_PROT_DMA fall into this category. These | ||
477 | types of commands don't require any software intervention once | ||
478 | issued. The device raises an interrupt on completion. | ||
479 | |||
480 | ATA PIO | ||
481 | ATA_PROT_PIO is in this category. libata currently implements PIO | ||
482 | with polling. The ATA_NIEN bit is set to turn off interrupts and | ||
483 | pio_task on ata_wq performs polling and I/O. | ||
484 | |||
485 | ATAPI NODATA or DMA | ||
486 | ATA_PROT_ATAPI_NODATA and ATA_PROT_ATAPI_DMA are in this | ||
487 | category. packet_task is used to poll the BSY bit after issuing the PACKET | ||
488 | command. Once BSY is turned off by the device, packet_task | ||
489 | transfers the CDB and hands off processing to the interrupt handler. | ||
490 | |||
491 | ATAPI PIO | ||
492 | ATA_PROT_ATAPI is in this category. The ATA_NIEN bit is set and, as | ||
493 | in ATAPI NODATA or DMA, packet_task submits the CDB. However, after | ||
494 | submitting the CDB, further processing (data transfer) is handed off to | ||
495 | pio_task. | ||
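The four categories can be read as a mapping from protocol to processing path. A small sketch of that mapping follows; the enum names are hypothetical stand-ins for the six ``ATA_PROT_*`` values listed above, and the two predicates only restate which categories use packet_task and pio_task according to the text.

```c
#include <stdbool.h>

/* Hypothetical enums mirroring the six protocols named in the text. */
enum sketch_prot {
	SKETCH_PROT_NODATA,
	SKETCH_PROT_DMA,
	SKETCH_PROT_PIO,
	SKETCH_PROT_ATAPI_NODATA,
	SKETCH_PROT_ATAPI_DMA,
	SKETCH_PROT_ATAPI,          /* ATAPI PIO */
};

enum sketch_category {
	SKETCH_CAT_ATA_NODATA_DMA,  /* interrupt-driven, no software help */
	SKETCH_CAT_ATA_PIO,         /* polled by pio_task */
	SKETCH_CAT_ATAPI_NODATA_DMA,/* packet_task sends CDB, then IRQ */
	SKETCH_CAT_ATAPI_PIO,       /* packet_task sends CDB, then pio_task */
};

static enum sketch_category sketch_categorize(enum sketch_prot prot)
{
	switch (prot) {
	case SKETCH_PROT_NODATA:
	case SKETCH_PROT_DMA:
		return SKETCH_CAT_ATA_NODATA_DMA;
	case SKETCH_PROT_PIO:
		return SKETCH_CAT_ATA_PIO;
	case SKETCH_PROT_ATAPI_NODATA:
	case SKETCH_PROT_ATAPI_DMA:
		return SKETCH_CAT_ATAPI_NODATA_DMA;
	case SKETCH_PROT_ATAPI:
	default:
		return SKETCH_CAT_ATAPI_PIO;
	}
}

/* ATAPI categories use packet_task to poll BSY and submit the CDB. */
static bool sketch_uses_packet_task(enum sketch_category cat)
{
	return cat == SKETCH_CAT_ATAPI_NODATA_DMA || cat == SKETCH_CAT_ATAPI_PIO;
}

/* PIO categories run with ATA_NIEN set and move data in pio_task. */
static bool sketch_uses_pio_task(enum sketch_category cat)
{
	return cat == SKETCH_CAT_ATA_PIO || cat == SKETCH_CAT_ATAPI_PIO;
}
```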
496 | |||
497 | How commands are completed | ||
498 | -------------------------- | ||
499 | |||
500 | Once issued, all qc's are either completed with :c:func:`ata_qc_complete` or | ||
501 | time out. For commands which are handled by interrupts, | ||
502 | :c:func:`ata_host_intr` invokes :c:func:`ata_qc_complete`, and, for PIO tasks, | ||
503 | pio_task invokes :c:func:`ata_qc_complete`. In error cases, packet_task may | ||
504 | also complete commands. | ||
505 | |||
506 | :c:func:`ata_qc_complete` does the following: | ||
507 | |||
508 | 1. DMA memory is unmapped. | ||
509 | |||
510 | 2. ATA_QCFLAG_ACTIVE is cleared from ``qc->flags``. | ||
511 | |||
512 | 3. The ``qc->complete_fn()`` callback is invoked. If the return value of the | ||
513 | callback is not zero, completion is short-circuited and | ||
514 | :c:func:`ata_qc_complete` returns. | ||
515 | |||
516 | 4. :c:func:`__ata_qc_complete` is called, which does the following: | ||
517 | |||
518 | 1. ``qc->flags`` is cleared to zero. | ||
519 | |||
520 | 2. ``ap->active_tag`` and ``qc->tag`` are poisoned. | ||
521 | |||
522 | 3. ``qc->waiting`` is cleared & completed (in that order). | ||
523 | |||
524 | 4. qc is deallocated by clearing appropriate bit in ``ap->qactive``. | ||
525 | |||
526 | So, it basically notifies the upper layer and deallocates the qc. One | ||
527 | exception is the short-circuit path in step 3, which is used by :c:func:`atapi_qc_complete`. | ||
528 | |||
529 | For all non-ATAPI commands, whether they fail or not, almost the same | ||
530 | code path is taken and very little error handling takes place. A qc is | ||
531 | completed with success status if it succeeded, with failed status | ||
532 | otherwise. | ||
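The numbered completion steps, including the short circuit, can be modelled in a few lines. This is a hypothetical userspace sketch, not libata's code: a ``bool`` flag stands in for the completion ``qc->waiting``, "unmapping DMA" is just a flag, and the poison value is arbitrary. ``sk_atapi_failed_complete()`` mimics the failed-ATAPI case in which the callback returns 1 so the qc stays allocated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the completion steps; not libata's code. */
#define SK_QCFLAG_ACTIVE (1UL << 0)
#define SK_TAG_POISON    0xfafbfcfdU /* arbitrary poison value */

struct sk_port {
	unsigned int active_tag;
	uint32_t qactive;          /* bit n set: qc with tag n allocated */
};

struct sk_qc {
	struct sk_port *ap;
	unsigned int tag;
	unsigned long flags;
	bool dma_mapped;
	bool *waiting;             /* stands in for the completion qc->waiting */
	int (*complete_fn)(struct sk_qc *qc);
};

/* Step 4: the __ata_qc_complete() part. */
static void sk__qc_complete(struct sk_qc *qc)
{
	struct sk_port *ap = qc->ap;
	unsigned int tag = qc->tag;       /* saved before poisoning */
	bool *waiting = qc->waiting;

	qc->flags = 0;                    /* 4.1 clear flags */
	ap->active_tag = SK_TAG_POISON;   /* 4.2 poison tags */
	qc->tag = SK_TAG_POISON;
	qc->waiting = NULL;               /* 4.3 clear, then complete */
	if (waiting)
		*waiting = true;
	ap->qactive &= ~(UINT32_C(1) << tag); /* 4.4 deallocate */
}

static void sk_qc_complete(struct sk_qc *qc)
{
	qc->dma_mapped = false;           /* 1. unmap DMA */
	qc->flags &= ~SK_QCFLAG_ACTIVE;   /* 2. clear ACTIVE */
	if (qc->complete_fn && qc->complete_fn(qc))
		return;                       /* 3. short circuit: qc stays allocated */
	sk__qc_complete(qc);              /* 4. */
}

/* Mimics atapi_qc_complete() on a failed command: keep the qc for EH. */
static int sk_atapi_failed_complete(struct sk_qc *qc)
{
	(void)qc;
	return 1;
}
```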
533 | |||
534 | However, failed ATAPI commands require more handling as REQUEST SENSE is | ||
535 | needed to acquire sense data. If an ATAPI command fails, | ||
536 | :c:func:`ata_qc_complete` is invoked with error status, which in turn invokes | ||
537 | :c:func:`atapi_qc_complete` via ``qc->complete_fn()`` callback. | ||
538 | |||
539 | This makes :c:func:`atapi_qc_complete` set ``scmd->result`` to | ||
540 | SAM_STAT_CHECK_CONDITION, complete the scmd and return 1. As the | ||
541 | sense data is empty but ``scmd->result`` is CHECK CONDITION, the SCSI midlayer | ||
542 | will invoke EH for the scmd, and returning 1 makes :c:func:`ata_qc_complete` | ||
543 | return without deallocating the qc. This leads us to | ||
544 | :c:func:`ata_scsi_error` with a partially completed qc. | ||
545 | |||
546 | :c:func:`ata_scsi_error` | ||
547 | ------------------------ | ||
548 | |||
549 | :c:func:`ata_scsi_error` is the current ``transportt->eh_strategy_handler()`` | ||
550 | for libata. As discussed above, this will be entered in two cases: | ||
551 | timeout and ATAPI error completion. This function calls the low-level libata | ||
552 | driver's :c:func:`eng_timeout` callback, the standard callback for which is | ||
553 | :c:func:`ata_eng_timeout`. It checks if a qc is active and calls | ||
554 | :c:func:`ata_qc_timeout` on the qc if so. Actual error handling occurs in | ||
555 | :c:func:`ata_qc_timeout`. | ||
556 | |||
557 | If EH is invoked for timeout, :c:func:`ata_qc_timeout` stops BMDMA and | ||
558 | completes the qc. Note that as we're currently in EH, we cannot call | ||
559 | scsi_done. As described in the SCSI EH doc, a recovered scmd should be | ||
560 | either retried with :c:func:`scsi_queue_insert` or finished with | ||
561 | :c:func:`scsi_finish_command`. Here, we override ``qc->scsidone`` with | ||
562 | :c:func:`scsi_finish_command` and call :c:func:`ata_qc_complete`. | ||
563 | |||
564 | If EH is invoked due to a failed ATAPI qc, the qc here is completed but | ||
565 | not deallocated. The purpose of this half-completion is to use the qc as a | ||
566 | placeholder to make EH code reach this place. This is a bit hackish, | ||
567 | but it works. | ||
568 | |||
569 | Once control reaches here, the qc is deallocated by invoking | ||
570 | :c:func:`__ata_qc_complete` explicitly. Then, an internal qc for REQUEST SENSE | ||
571 | is issued. Once sense data is acquired, the scmd is finished by directly | ||
572 | invoking :c:func:`scsi_finish_command` on the scmd. Note that as we have already | ||
573 | completed and deallocated the qc which was associated with the | ||
574 | scmd, we neither need to nor can call :c:func:`ata_qc_complete` again. | ||
575 | |||
576 | Problems with the current EH | ||
577 | ---------------------------- | ||
578 | |||
579 | - Error representation is too crude. Currently any and all error | ||
580 | conditions are represented with ATA STATUS and ERROR registers. | ||
581 | Errors which aren't ATA device errors are treated as ATA device | ||
582 | errors by setting the ATA_ERR bit. A better error descriptor which can | ||
583 | properly represent ATA and other errors/exceptions is needed. | ||
584 | |||
585 | - When handling timeouts, no action is taken to make the device forget | ||
586 | about the timed-out command and become ready for new commands. | ||
587 | |||
588 | - EH handling via :c:func:`ata_scsi_error` is not properly protected from | ||
589 | usual command processing. On EH entrance, the device is not in a | ||
590 | quiescent state. Timed-out commands may succeed or fail at any time. | ||
591 | pio_task and atapi_task may still be running. | ||
592 | |||
593 | - Error recovery is too weak. Devices / controllers causing HSM mismatch | ||
594 | errors and other errors quite often require a reset to return to a | ||
595 | known state. Also, advanced error handling is necessary to support | ||
596 | features like NCQ and hotplug. | ||
597 | |||
598 | - ATA errors are directly handled in the interrupt handler and PIO | ||
599 | errors in pio_task. This is problematic for advanced error handling | ||
600 | for the following reasons. | ||
601 | |||
602 | First, advanced error handling often requires process context and | ||
603 | internal qc execution. | ||
604 | |||
605 | Second, even a simple failure (say, CRC error) needs information | ||
606 | gathering and could trigger complex error handling (say, resetting & | ||
607 | reconfiguring). Having multiple code paths to gather information, | ||
608 | enter EH and trigger actions makes life painful. | ||
609 | |||
610 | Third, scattered EH code makes implementing low-level drivers | ||
611 | difficult. Low-level drivers override libata callbacks. If EH is | ||
612 | scattered over several places, each affected callback must perform | ||
613 | its part of error handling. This can be error-prone and painful. | ||
614 | |||
615 | libata Library | ||
616 | ============== | ||
617 | |||
618 | .. kernel-doc:: drivers/ata/libata-core.c | ||
619 | :export: | ||
620 | |||
621 | libata Core Internals | ||
622 | ===================== | ||
623 | |||
624 | .. kernel-doc:: drivers/ata/libata-core.c | ||
625 | :internal: | ||
626 | |||
627 | .. kernel-doc:: drivers/ata/libata-eh.c | ||
628 | |||
629 | libata SCSI translation/emulation | ||
630 | ================================= | ||
631 | |||
632 | .. kernel-doc:: drivers/ata/libata-scsi.c | ||
633 | :export: | ||
634 | |||
635 | .. kernel-doc:: drivers/ata/libata-scsi.c | ||
636 | :internal: | ||
637 | |||
638 | ATA errors and exceptions | ||
639 | ========================= | ||
640 | |||
641 | This chapter tries to identify what error/exception conditions exist for | ||
642 | ATA/ATAPI devices and describe how they should be handled in an | ||
643 | implementation-neutral way. | ||
644 | |||
645 | The term 'error' is used to describe conditions where either an explicit | ||
646 | error condition is reported by the device or a command has timed out. | ||
647 | |||
648 | The term 'exception' is either used to describe exceptional conditions | ||
649 | which are not errors (say, power or hotplug events), or to describe both | ||
650 | errors and non-error exceptional conditions. Where an explicit distinction | ||
651 | between error and exception is necessary, the term 'non-error exception' | ||
652 | is used. | ||
653 | |||
654 | Exception categories | ||
655 | -------------------- | ||
656 | |||
657 | Exceptions are described primarily with respect to the legacy taskfile + bus | ||
658 | master IDE interface. If a controller provides a better mechanism | ||
659 | for error reporting, mapping it into the categories described below | ||
660 | shouldn't be difficult. | ||
661 | |||
662 | In the following sections, two recovery actions, reset and | ||
663 | reconfiguring transport, are mentioned. These are described further in | ||
664 | `EH recovery actions`_. | ||
665 | |||
666 | HSM violation | ||
667 | ~~~~~~~~~~~~~ | ||
668 | |||
669 | This error is indicated when the STATUS value doesn't match HSM requirements | ||
670 | while issuing or executing any ATA/ATAPI command. | ||
671 | |||
672 | - ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying to | ||
673 | issue a command. | ||
674 | |||
675 | - !BSY && !DRQ during PIO data transfer. | ||
676 | |||
677 | - DRQ on command completion. | ||
678 | |||
679 | - !BSY && ERR after CDB transfer starts but before the last byte of CDB | ||
680 | is transferred. ATA/ATAPI standard states that "The device shall not | ||
681 | terminate the PACKET command with an error before the last byte of | ||
682 | the command packet has been written" in the error outputs description | ||
683 | of the PACKET command, and the state diagram doesn't include such | ||
684 | transitions. | ||
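The four HSM-violation conditions above reduce to STATUS-register bit tests. The following is a minimal sketch, assuming the standard STATUS bit positions (BSY 0x80, DRDY 0x40, DRQ 0x08, ERR 0x01, which libata calls ATA_BUSY, ATA_DRDY, ATA_DRQ and ATA_ERR); the phase enum and ``hsm_violated()`` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ATA STATUS register bits. */
#define ST_BSY  0x80
#define ST_DRDY 0x40
#define ST_DRQ  0x08
#define ST_ERR  0x01

/* Hypothetical phases matching the bullet list. */
enum hsm_phase {
	HSM_ISSUE,      /* about to issue a command */
	HSM_PIO_XFER,   /* during PIO data transfer */
	HSM_COMPLETION, /* on command completion */
	HSM_CDB_XFER,   /* CDB transfer started, last byte not yet sent */
};

/* Return true if @status violates the HSM expectation for @phase. */
static bool hsm_violated(enum hsm_phase phase, uint8_t status)
{
	bool bsy = status & ST_BSY;
	bool drq = status & ST_DRQ;
	bool err = status & ST_ERR;

	switch (phase) {
	case HSM_ISSUE:
		/* Must see !BSY && DRDY && !DRQ. */
		return (status & (ST_BSY | ST_DRDY | ST_DRQ)) != ST_DRDY;
	case HSM_PIO_XFER:
		return !bsy && !drq;
	case HSM_COMPLETION:
		return drq;
	case HSM_CDB_XFER:
		return !bsy && err;
	}
	return true; /* unknown phase: treat as a violation */
}
```

Whatever the phase, a violation leaves little diagnostic information, which is why the recovery described next is a reset, possibly followed by a transport slowdown.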
685 | |||
686 | In these cases, HSM is violated and not much information regarding the | ||
687 | error can be acquired from the STATUS or ERROR register. In other words, this | ||
688 | error can be anything: driver bug, faulty device, controller and/or cable. | ||
689 | |||
690 | As HSM is violated, a reset is necessary to restore a known state. | ||
691 | Reconfiguring the transport for a lower speed might be helpful too, as | ||
692 | transmission errors sometimes cause this kind of error. | ||
693 | |||
694 | ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION) | ||
695 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
696 | |||
697 | These are errors detected and reported by ATA/ATAPI devices indicating | ||
698 | device problems. For this type of error, the STATUS and ERROR register | ||
699 | values are valid and describe the error condition. Note that some ATA bus | ||
700 | errors are detected by ATA/ATAPI devices and reported using the same | ||
701 | mechanism as device errors. Those cases are described later in this | ||
702 | section. | ||
703 | |||
704 | For ATA commands, this type of error is indicated by !BSY && ERR | ||
705 | during command execution and on completion. | ||
706 | |||
707 | For ATAPI commands, | ||
708 | |||
709 | - !BSY && ERR && ABRT right after issuing PACKET indicates that the PACKET | ||
710 | command is not supported and falls in this category. | ||
711 | |||
712 | - !BSY && ERR(==CHK) && !ABRT after the last byte of CDB is transferred | ||
713 | indicates CHECK CONDITION and doesn't fall in this category. | ||
714 | |||
715 | - !BSY && ERR(==CHK) && ABRT after the last byte of CDB is transferred | ||
716 | \*probably\* indicates CHECK CONDITION and doesn't fall in this | ||
717 | category. | ||
718 | |||
719 | Of errors detected as above, the following are not ATA/ATAPI device | ||
720 | errors but ATA bus errors and should be handled according to | ||
721 | `ATA bus error`_. | ||
722 | |||
723 | CRC error during data transfer | ||
724 | This is indicated by the ICRC bit in the ERROR register and means that | ||
725 | corruption occurred during data transfer. Up to ATA/ATAPI-7, the | ||
726 | standard specifies that this bit is only applicable to UDMA | ||
727 | transfers but ATA/ATAPI-8 draft revision 1f says that the bit may be | ||
728 | applicable to multiword DMA and PIO. | ||
729 | |||
730 | ABRT error during data transfer or on completion | ||
731 | Up to ATA/ATAPI-7, the standard specifies that ABRT could be set on | ||
732 | ICRC errors and in cases where a device is not able to complete a | ||
733 | command. Combined with the fact that MWDMA and PIO transfer errors | ||
734 | aren't allowed to use the ICRC bit up to ATA/ATAPI-7, it seems to imply | ||
735 | that the ABRT bit alone could indicate transfer errors. | ||
736 | |||
737 | However, ATA/ATAPI-8 draft revision 1f removes the part that ICRC | ||
738 | errors can turn on ABRT. So, this is a gray area. Some | ||
739 | heuristics are needed here. | ||
740 | |||
741 | ATA/ATAPI device errors can be further categorized as follows. | ||
742 | |||
743 | Media errors | ||
744 | This is indicated by the UNC bit in the ERROR register. ATA devices | ||
745 | report UNC errors only after a certain number of retries cannot | ||
746 | recover the data, so there's not much else to do other than | ||
747 | notifying the upper layer. | ||
748 | |||
749 | READ and WRITE commands report CHS or LBA of the first failed sector | ||
750 | but ATA/ATAPI standard specifies that the amount of transferred data | ||
751 | on error completion is indeterminate, so we cannot assume that | ||
752 | sectors preceding the failed sector have been transferred and thus | ||
753 | cannot complete those sectors successfully as SCSI does. | ||
754 | |||
755 | Media changed / media change requested error | ||
756 | <<TODO: fill here>> | ||
757 | |||
758 | Address error | ||
759 | This is indicated by the IDNF bit in the ERROR register. Report it to the | ||
760 | upper layer. | ||
761 | |||
762 | Other errors | ||
763 | This can be an invalid command or parameter, indicated by the ABRT ERROR bit, | ||
764 | or some other error condition. Note that the ABRT bit can indicate a lot | ||
765 | of things including ICRC and Address errors. Heuristics are needed. | ||
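As a first cut, the categories above can be derived from the ERROR register read after !BSY && ERR. The sketch below assumes the standard ERROR bit positions (ICRC 0x80, UNC 0x40, IDNF 0x10, ABRT 0x04, named ATA_ICRC, ATA_UNC, ATA_IDNF and ATA_ABORTED in libata). The precedence order (ICRC first) and the class names are illustrative choices, and an ``ERR_ABORTED`` result still needs the heuristics the text calls for.

```c
#include <stdint.h>

/* Standard ATA ERROR register bits. */
#define ER_ICRC 0x80
#define ER_UNC  0x40
#define ER_IDNF 0x10
#define ER_ABRT 0x04

enum dev_err_class {
	DEV_ERR_BUS,     /* ICRC: handle as an ATA bus error */
	DEV_ERR_MEDIA,   /* UNC: report to upper layer */
	DEV_ERR_ADDRESS, /* IDNF: report to upper layer */
	DEV_ERR_ABORTED, /* ABRT alone: bad command/parameter or transfer error? */
	DEV_ERR_OTHER,   /* none of the above */
};

/* Classify a non-NCQ ATA device error; precedence is an assumption. */
static enum dev_err_class classify_dev_error(uint8_t error)
{
	if (error & ER_ICRC)
		return DEV_ERR_BUS;
	if (error & ER_UNC)
		return DEV_ERR_MEDIA;
	if (error & ER_IDNF)
		return DEV_ERR_ADDRESS;
	if (error & ER_ABRT)
		return DEV_ERR_ABORTED;
	return DEV_ERR_OTHER;
}
```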
766 | |||
767 | Depending on the command, not all STATUS/ERROR bits are applicable. These | ||
768 | non-applicable bits are marked with "na" in the output descriptions but | ||
769 | up to ATA/ATAPI-7 no definition of "na" can be found. However, | ||
770 | ATA/ATAPI-8 draft revision 1f describes "N/A" as follows. | ||
771 | |||
772 | 3.2.3.3a N/A | ||
773 | A keyword that indicates a field has no defined value in this | ||
774 | standard and should not be checked by the host or device. N/A | ||
775 | fields should be cleared to zero. | ||
776 | |||
777 | So, it seems reasonable to assume that "na" bits are cleared to zero by | ||
778 | devices and thus need no explicit masking. | ||
779 | |||
780 | ATAPI device CHECK CONDITION | ||
781 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
782 | |||
783 | An ATAPI device CHECK CONDITION error is indicated by a set CHK bit (ERR bit) | ||
784 | in the STATUS register after the last byte of the CDB is transferred for a | ||
785 | PACKET command. For this kind of error, sense data should be acquired | ||
786 | to gather information regarding the error. The REQUEST SENSE packet command | ||
787 | should be used to acquire sense data. | ||
788 | |||
789 | Once sense data is acquired, this type of errors can be handled | ||
790 | similarly to other SCSI errors. Note that sense data may indicate ATA | ||
791 | bus error (e.g. Sense Key 04h HARDWARE ERROR && ASC/ASCQ 47h/00h SCSI | ||
792 | PARITY ERROR). In such cases, the error should be considered an ATA | ||
793 | bus error and handled according to `ATA bus error`_. | ||
794 | |||
795 | ATA device error (NCQ) | ||
796 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
797 | |||
798 | An NCQ command error is indicated by a cleared BSY bit and a set ERR bit during the NCQ | ||
799 | command phase (one or more NCQ commands outstanding). Although the STATUS | ||
800 | and ERROR registers will contain valid values describing the error, READ | ||
801 | LOG EXT is required to clear the error condition, determine which | ||
802 | command has failed and acquire more information. | ||
803 | |||
804 | READ LOG EXT Log Page 10h reports which tag has failed and taskfile | ||
805 | register values describing the error. With this information the failed | ||
806 | command can be handled as a normal ATA command error as in | ||
807 | `ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)`_ | ||
808 | and all other in-flight commands must be retried. Note that this retry | ||
809 | should not be counted - it's likely that commands retried this way would | ||
810 | have completed normally if it were not for the failed command. | ||
811 | |||
812 | Note that ATA bus errors can be reported as ATA device NCQ errors. This | ||
813 | should be handled as described in `ATA bus error`_. | ||
814 | |||
815 | If READ LOG EXT Log Page 10h fails or reports NQ, we're thoroughly | ||
816 | screwed. This condition should be treated according to | ||
817 | `HSM violation`_. | ||
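Decoding READ LOG EXT Log Page 10h is a small parsing step. The sketch below assumes the page layout used by the NCQ Command Error log as I understand it (byte 0: bit 7 = NQ, bits 4:0 = failed tag; byte 2 = STATUS; byte 3 = ERROR); check the ATA command set specification before relying on it. The struct and function names are hypothetical. ``tags_to_retry_uncounted()`` restates the rule that every other in-flight command is retried without counting the retry.

```c
#include <errno.h>
#include <stdint.h>

struct ncq_err_info {
	unsigned int tag; /* failed NCQ tag */
	uint8_t status;   /* STATUS value for the failed command */
	uint8_t error;    /* ERROR value for the failed command */
};

/*
 * Parse one 512-byte sector of log page 10h (assumed layout, see above).
 * Returns 0, or -ENOENT when NQ is set: no failed NCQ command is reported,
 * which the text says to treat like an HSM violation.
 */
static int parse_ncq_log_10h(const uint8_t buf[512], struct ncq_err_info *info)
{
	if (buf[0] & 0x80)
		return -ENOENT;
	info->tag = buf[0] & 0x1f;
	info->status = buf[2];
	info->error = buf[3];
	return 0;
}

/* Every in-flight tag except the failed one is retried, uncounted. */
static uint32_t tags_to_retry_uncounted(uint32_t inflight, unsigned int failed_tag)
{
	return inflight & ~(UINT32_C(1) << failed_tag);
}
```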
818 | |||
819 | ATA bus error | ||
820 | ~~~~~~~~~~~~~ | ||
821 | |||
822 | An ATA bus error means that data corruption occurred during transmission | ||
823 | over the ATA bus (SATA or PATA). This type of error can be indicated by: | ||
824 | |||
825 | - ICRC or ABRT error as described in | ||
826 | `ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)`_. | ||
827 | |||
828 | - Controller-specific error completion with error information | ||
829 | indicating transmission error. | ||
830 | |||
831 | - On some controllers, command timeout. In this case, there may be a | ||
832 | mechanism to determine that the timeout is due to transmission error. | ||
833 | |||
834 | - Unknown/random errors, timeouts and all sorts of weirdness. | ||
835 | |||
836 | As described above, transmission errors can cause a wide variety of | ||
837 | symptoms ranging from device ICRC error to random device lockup, and, | ||
838 | for many cases, there is no way to tell if an error condition is due to | ||
839 | transmission error or not; therefore, it's necessary to employ some kind | ||
840 | of heuristic when dealing with errors and timeouts. For example, | ||
841 | encountering repeated ABRT errors for a command known to be supported is | ||
842 | likely to indicate an ATA bus error. | ||
843 | |||
844 | Once it's determined that ATA bus errors have possibly occurred, | ||
845 | lowering the ATA bus transmission speed is one of the actions which may | ||
846 | alleviate the problem. See `Reconfigure transport`_ for | ||
847 | more information. | ||
848 | |||
849 | PCI bus error | ||
850 | ~~~~~~~~~~~~~ | ||
851 | |||
852 | Data corruption or other failures during transmission over PCI (or another | ||
853 | system bus). For standard BMDMA, this is indicated by the Error bit in the | ||
854 | BMDMA Status register. This type of error must be logged as it | ||
855 | indicates something is very wrong with the system. Resetting the host | ||
856 | controller is recommended. | ||
857 | |||
858 | Late completion | ||
859 | ~~~~~~~~~~~~~~~ | ||
860 | |||
861 | This occurs when a timeout occurs and the timeout handler finds out that | ||
862 | the timed-out command has completed successfully or with an error. This is | ||
863 | usually caused by lost interrupts. This type of error must be logged. | ||
864 | Resetting the host controller is recommended. | ||
865 | |||
866 | Unknown error (timeout) | ||
867 | ~~~~~~~~~~~~~~~~~~~~~~~ | ||
868 | |||
869 | This is when a timeout occurs and the command is still being processed or the | ||
870 | host and device are in an unknown state. When this occurs, HSM could be in | ||
871 | any valid or invalid state. To bring the device to a known state and make | ||
872 | it forget about the timed-out command, a reset is necessary. The | ||
873 | timed-out command may be retried. | ||
874 | |||
875 | Timeouts can also be caused by transmission errors. Refer to | ||
876 | `ATA bus error`_ for more details. | ||
877 | |||
878 | Hotplug and power management exceptions | ||
879 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
880 | |||
881 | <<TODO: fill here>> | ||
882 | |||
883 | EH recovery actions | ||
884 | ------------------- | ||
885 | |||
886 | This section discusses several important recovery actions. | ||
887 | |||
888 | Clearing error condition | ||
889 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
890 | |||
891 | Many controllers require their error registers to be cleared by the error | ||
892 | handler. Different controllers may have different requirements. | ||
893 | |||
894 | For SATA, it's strongly recommended to clear at least the SError register | ||
895 | during error handling. | ||
896 | |||
897 | Reset | ||
898 | ~~~~~ | ||
899 | |||
900 | During EH, resetting is necessary in the following cases. | ||
901 | |||
902 | - HSM is in unknown or invalid state | ||
903 | |||
904 | - HBA is in unknown or invalid state | ||
905 | |||
906 | - EH needs to make HBA/device forget about in-flight commands | ||
907 | |||
908 | - HBA/device behaves weirdly | ||
909 | |||
910 | Resetting during EH might be a good idea regardless of the error condition | ||
911 | to improve EH robustness. Whether to reset both or either one of HBA and | ||
912 | device depends on the situation, but the following scheme is recommended. | ||
913 | |||
914 | - When it's known that the HBA is in a ready state but the ATA/ATAPI device is in | ||
915 | an unknown state, reset only the device. | ||
916 | |||
917 | - If the HBA is in an unknown state, reset both the HBA and the device. | ||
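The reset triggers and the recommended scope can be folded into one small decision function. This is a hypothetical sketch: ``struct eh_view`` summarizes what EH is assumed to know, and an HBA that "behaves weirdly" should be reported by the caller as ``hba_unknown``, since its state can no longer be trusted. The ``always_reset`` flag models the optional reset-regardless policy mentioned above.

```c
#include <stdbool.h>

/* Hypothetical summary of what EH knows after gathering information. */
struct eh_view {
	bool hsm_unknown;     /* HSM in unknown or invalid state */
	bool hba_unknown;     /* HBA in unknown or invalid state */
	bool forget_inflight; /* HBA/device must forget in-flight commands */
	bool weird;           /* device behaves weirdly */
};

enum reset_scope { RESET_NONE, RESET_DEVICE, RESET_HBA_AND_DEVICE };

static enum reset_scope choose_reset(const struct eh_view *v, bool always_reset)
{
	bool need = always_reset || v->hsm_unknown || v->hba_unknown ||
		    v->forget_inflight || v->weird;

	if (!need)
		return RESET_NONE;
	/* HBA state unknown: reset both; otherwise the HBA is known ready. */
	return v->hba_unknown ? RESET_HBA_AND_DEVICE : RESET_DEVICE;
}
```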
918 | |||
919 | HBA resetting is implementation specific. For a controller complying with | ||
920 | taskfile/BMDMA PCI IDE, stopping the active DMA transaction may be | ||
921 | sufficient if and only if BMDMA state is the only HBA context. But even | ||
922 | controllers that mostly comply with taskfile/BMDMA PCI IDE may have | ||
923 | implementation-specific requirements and mechanisms to reset themselves. This must be | ||
924 | addressed by specific drivers. | ||
925 | |||
926 | On the other hand, the ATA/ATAPI standard describes in detail ways to reset ATA/ATAPI | ||
927 | devices. | ||
928 | |||
929 | PATA hardware reset | ||
930 | This is a hardware-initiated device reset signalled with an asserted PATA | ||
931 | RESET- signal. There is no standard way to initiate hardware reset | ||
932 | from software, although some hardware provides registers that allow the | ||
933 | driver to directly tweak the RESET- signal. | ||
934 | |||
935 | Software reset | ||
936 | This is achieved by turning on the SRST bit of the Device Control register for at least 5us. | ||
937 | Both PATA and SATA support it but, in case of SATA, this may require | ||
938 | controller-specific support as the second Register FIS to clear SRST | ||
939 | should be transmitted while the BSY bit is still set. Note that on PATA, | ||
940 | this resets both master and slave devices on a channel. | ||
941 | |||
942 | EXECUTE DEVICE DIAGNOSTIC command | ||
943 | Although the ATA/ATAPI standard doesn't describe it exactly, EDD implies | ||
944 | some level of resetting, possibly at a level similar to software reset. | ||
945 | Host-side EDD protocol can be handled with normal command processing | ||
946 | and most SATA controllers should be able to handle EDDs just like | ||
947 | other commands. As with software reset, EDD affects both devices on a | ||
948 | PATA bus. | ||
949 | |||
950 | Although EDD does reset devices, this doesn't suit error handling as | ||
951 | EDD cannot be issued while BSY is set and it's unclear how it will | ||
952 | act when the device is in an unknown/weird state. | ||
953 | |||
954 | ATAPI DEVICE RESET command | ||
955 | This is very similar to software reset except that reset can be | ||
956 | restricted to the selected device without affecting the other device | ||
957 | sharing the cable. | ||
958 | |||
959 | SATA phy reset | ||
960 | This is the preferred way of resetting a SATA device. In effect, | ||
961 | it's identical to PATA hardware reset. Note that this can be done | ||
962 | with the standard SCR Control register. As such, it's usually easier | ||
963 | to implement than software reset. | ||
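The software reset sequence is short enough to sketch. The example assumes the standard Device Control bits (nIEN 0x02, SRST 0x04, ATA_NIEN and ATA_SRST in libata) and reaches the hardware through hypothetical ``write_ctl``/``delay_us`` hooks, so the sequence can be checked off-hardware with the small recorder included. The 10us delays are an arbitrary choice above the 5us minimum. As noted above, SATA may need controller-specific handling for the SRST release, which this PATA-style sketch ignores.

```c
#include <stdint.h>

#define CTL_NIEN 0x02 /* Device Control: disable INTRQ */
#define CTL_SRST 0x04 /* Device Control: software reset */

/* Hypothetical hooks so the sequence can be exercised off-hardware. */
struct ctl_ops {
	void (*write_ctl)(void *ctx, uint8_t val); /* write Device Control */
	void (*delay_us)(void *ctx, unsigned int us);
	void *ctx;
};

/* PATA-style software reset: hold SRST longer than 5us, then release. */
static void soft_reset(const struct ctl_ops *ops)
{
	ops->write_ctl(ops->ctx, CTL_NIEN);
	ops->delay_us(ops->ctx, 10);
	ops->write_ctl(ops->ctx, CTL_NIEN | CTL_SRST);
	ops->delay_us(ops->ctx, 10);          /* >= 5us with SRST set */
	ops->write_ctl(ops->ctx, CTL_NIEN);
	/* A real driver would now wait for BSY to clear before proceeding. */
}

/* Simple recorder implementing ctl_ops for testing. */
struct ctl_recorder {
	uint8_t writes[8];
	unsigned int nwrites;
	unsigned int total_us;
};

static void rec_write(void *ctx, uint8_t val)
{
	struct ctl_recorder *r = ctx;

	if (r->nwrites < 8)
		r->writes[r->nwrites++] = val;
}

static void rec_delay(void *ctx, unsigned int us)
{
	struct ctl_recorder *r = ctx;

	r->total_us += us;
}
```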
964 | |||
965 | One more thing to consider when resetting devices is that resetting | ||
966 | clears certain configuration parameters and they need to be set to their | ||
967 | previous or newly adjusted values after reset. | ||
968 | |||
969 | The affected parameters are: | ||
970 | |||
971 | - CHS set up with INITIALIZE DEVICE PARAMETERS (seldom used) | ||
972 | |||
973 | - Parameters set with SET FEATURES including transfer mode setting | ||
974 | |||
975 | - Block count set with SET MULTIPLE MODE | ||
976 | |||
977 | - Other parameters (SET MAX, MEDIA LOCK...) | ||
978 | |||
979 | The ATA/ATAPI standard specifies that some parameters must be maintained | ||
980 | across hardware or software reset, but doesn't strictly specify all of | ||
981 | them. Always reconfiguring the needed parameters after a reset is required for | ||
982 | robustness. Note that this also applies when resuming from deep sleep | ||
983 | (power-off). | ||
984 | |||
985 | Also, the ATA/ATAPI standard requires that IDENTIFY DEVICE / IDENTIFY PACKET | ||
986 | DEVICE is issued after any configuration parameter is updated or a | ||
987 | hardware reset and the result used for further operation. The OS driver is | ||
988 | required to implement a revalidation mechanism to support this. | ||
989 | |||
990 | Reconfigure transport | ||
991 | ~~~~~~~~~~~~~~~~~~~~~ | ||
992 | |||
993 | For both PATA and SATA, a lot of corners are cut for cheap connectors, | ||
994 | cables or controllers and it's quite common to see a high transmission | ||
995 | error rate. This can be mitigated by lowering the transmission speed. | ||
996 | |||
997 | The following is a possible scheme Jeff Garzik suggested. | ||
998 | |||
999 | If more than $N (3?) transmission errors happen in 15 minutes, | ||
1000 | |||
1001 | - If SATA, decrease the SATA PHY speed. If the speed cannot be decreased, | ||
1002 | |||
1003 | - decrease the UDMA transfer speed. If at UDMA0, switch to PIO4, | ||
1004 | |||
1005 | - decrease the PIO transfer speed. If at PIO3, complain, but continue. | ||
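The suggested scheme has two parts: counting transmission errors in a sliding 15-minute window, and stepping down one speed level when the count exceeds $N. A minimal sketch follows, taking N = 3 from the "(3?)" above and reading "if at PIO3, complain, but continue" as "PIO3 is the floor". Both choices, the 8-entry timestamp ring, the reset of the window after each slowdown, and all names are assumptions, not an existing implementation.

```c
#include <stdbool.h>

#define XFER_ERR_LIMIT  3u          /* "$N (3?)" from the text */
#define XFER_WINDOW_SEC (15u * 60u) /* 15 minutes */
#define XFER_RING       8u          /* must exceed XFER_ERR_LIMIT */

struct err_window {
	unsigned long t[XFER_RING]; /* error timestamps, seconds */
	unsigned int n;
	unsigned int head;
};

/* Record an error at @now; true once more than the limit fall in the window. */
static bool record_xfer_error(struct err_window *w, unsigned long now)
{
	unsigned int i, recent = 0;

	w->t[w->head] = now;
	w->head = (w->head + 1) % XFER_RING;
	if (w->n < XFER_RING)
		w->n++;
	for (i = 0; i < w->n; i++)
		if (now - w->t[i] < XFER_WINDOW_SEC)
			recent++;
	if (recent <= XFER_ERR_LIMIT)
		return false;
	w->n = 0;    /* start a fresh window after acting */
	w->head = 0;
	return true;
}

struct link_speed {
	bool is_sata;
	unsigned int sata_gen; /* 1 is the slowest PHY generation */
	int udma;              /* current UDMA mode, or -1 when using PIO */
	int pio;               /* current PIO mode when udma < 0 */
};

enum speed_action { SLOWED_PHY, SLOWED_UDMA, SWITCHED_TO_PIO4, SLOWED_PIO, AT_FLOOR };

/* Apply one step of the slowdown ladder. */
static enum speed_action speed_down(struct link_speed *l)
{
	if (l->is_sata && l->sata_gen > 1) {
		l->sata_gen--;
		return SLOWED_PHY;
	}
	if (l->udma > 0) {
		l->udma--;
		return SLOWED_UDMA;
	}
	if (l->udma == 0) {
		l->udma = -1;
		l->pio = 4;
		return SWITCHED_TO_PIO4;
	}
	if (l->pio > 3) {
		l->pio--;
		return SLOWED_PIO;
	}
	return AT_FLOOR; /* PIO3: complain, but continue */
}
```

Clearing the window after each slowdown means the next step down requires a fresh burst of errors at the new speed.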
1006 | |||
1007 | ata_piix Internals | ||
1008 | =================== | ||
1009 | |||
1010 | .. kernel-doc:: drivers/ata/ata_piix.c | ||
1011 | :internal: | ||
1012 | |||
1013 | sata_sil Internals | ||
1014 | =================== | ||
1015 | |||
1016 | .. kernel-doc:: drivers/ata/sata_sil.c | ||
1017 | :internal: | ||
1018 | |||
1019 | Thanks | ||
1020 | ====== | ||
1021 | |||
1022 | The bulk of the ATA knowledge comes thanks to long conversations with | ||
1023 | Andre Hedrick (www.linux-ide.org), and long hours pondering the ATA and | ||
1024 | SCSI specifications. | ||
1025 | |||
1026 | Thanks to Alan Cox for pointing out similarities between SATA and SCSI, | ||
1027 | and in general for motivation to hack on libata. | ||
1028 | |||
1029 | libata's device detection method, ata_pio_devchk, and in general all | ||
1030 | the early probing were based on extensive study of Hale Landis's | ||
1031 | probe/reset code in his ATADRVR driver (www.ata-atapi.com). | ||
diff --git a/Documentation/driver-api/mtdnand.rst b/Documentation/driver-api/mtdnand.rst new file mode 100644 index 000000000000..e9afa586d15e --- /dev/null +++ b/Documentation/driver-api/mtdnand.rst | |||
@@ -0,0 +1,1007 @@ | |||
1 | ===================================== | ||
2 | MTD NAND Driver Programming Interface | ||
3 | ===================================== | ||
4 | |||
5 | :Author: Thomas Gleixner | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | The generic NAND driver supports almost all NAND and AG-AND based chips | ||
11 | and connects them to the Memory Technology Devices (MTD) subsystem of | ||
12 | the Linux Kernel. | ||
13 | |||
14 | This documentation is provided for developers who want to implement | ||
15 | board drivers or filesystem drivers suitable for NAND devices. | ||
16 | |||
17 | Known Bugs And Assumptions | ||
18 | ========================== | ||
19 | |||
20 | None. | ||
21 | |||
22 | Documentation hints | ||
23 | =================== | ||
24 | |||
25 | The function and structure docs are autogenerated. Each function and | ||
26 | struct member has a short description which is marked with an [XXX] | ||
27 | identifier. The following chapters explain the meaning of those | ||
28 | identifiers. | ||
29 | |||
30 | Function identifiers [XXX] | ||
31 | -------------------------- | ||
32 | |||
33 | The functions are marked with [XXX] identifiers in the short comment. | ||
34 | The identifiers explain the usage and scope of the functions. The following | ||
35 | identifiers are used: | ||
36 | |||
37 | - [MTD Interface] | ||
38 | |||
39 | These functions provide the interface to the MTD kernel API. They are | ||
40 | not replaceable and provide functionality which is completely | ||
41 | hardware-independent. | ||
42 | |||
43 | - [NAND Interface] | ||
44 | |||
45 | These functions are exported and provide the interface to the NAND | ||
46 | kernel API. | ||
47 | |||
48 | - [GENERIC] | ||
49 | |||
50 | Generic functions are not replaceable and provide functionality which | ||
51 | is complete hardware independent. | ||
52 | |||
53 | - [DEFAULT] | ||
54 | |||
55 | Default functions provide hardware related functionality which is | ||
56 | suitable for most of the implementations. These functions can be | ||
57 | replaced by the board driver if necessary. Those functions are called | ||
58 | via pointers in the NAND chip description structure. The board driver | ||
59 | can set the functions which should be replaced by board dependent | ||
60 | functions before calling nand_scan(). If the function pointer is | ||
61 | NULL on entry to nand_scan() then the pointer is set to the default | ||
62 | function which is suitable for the detected chip type. | ||
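
The "NULL means use the default" rule can be modeled in a few lines of user-space C. This is a sketch under stated assumptions. `struct toy_chip`, `default_read_byte()`, `board_read_byte()` and `toy_scan()` are hypothetical stand-ins for the nand_chip structure, a default function, a board override and nand_scan(). Only the rule itself comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct nand_chip. */
struct toy_chip {
	uint8_t (*read_byte)(struct toy_chip *chip);
};

/* Hypothetical default implementation supplied by the "core". */
static uint8_t default_read_byte(struct toy_chip *chip)
{
	(void)chip;
	return 0xff;
}

/* Hypothetical board specific override. */
static uint8_t board_read_byte(struct toy_chip *chip)
{
	(void)chip;
	return 0x42;
}

/*
 * Models the rule described above: only pointers the board driver left
 * NULL are replaced by defaults; board-provided pointers are kept.
 */
static void toy_scan(struct toy_chip *chip)
{
	if (!chip->read_byte)
		chip->read_byte = default_read_byte;
}
```

In this model, as in the text above, an override only takes effect if it is installed before the scan runs.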

Struct member identifiers [XXX]
-------------------------------

The struct members are marked with [XXX] identifiers in the comment. The
identifiers explain the usage and scope of the members. The following
identifiers are used:

- [INTERN]

  These members are for NAND driver internal use only and must not be
  modified. Most of these values are calculated from the chip geometry
  information which is evaluated during nand_scan().

- [REPLACEABLE]

  Replaceable members hold hardware related functions which can be
  provided by the board driver. The board driver can set the functions
  which should be replaced by board dependent functions before calling
  nand_scan(). If a function pointer is NULL on entry to nand_scan(),
  the pointer is set to the default function which is suitable for the
  detected chip type.

- [BOARDSPECIFIC]

  Board specific members hold hardware related information which must
  be provided by the board driver. The board driver must set the
  function pointers and data fields before calling nand_scan().

- [OPTIONAL]

  Optional members can hold information relevant for the board driver.
  The generic NAND driver code does not use this information.

Basic board driver
==================

For most boards it will be sufficient to provide just the basic
functions and fill in the few truly board dependent members of the NAND
chip description structure.

Basic defines
-------------

At a minimum you have to provide a nand_chip structure and storage for
the ioremap'ed chip address. You can allocate the nand_chip structure
using kmalloc or you can allocate it statically. The NAND chip structure
embeds an mtd structure which will be registered with the MTD subsystem.
You can extract a pointer to the mtd structure from a nand_chip pointer
using the nand_to_mtd() helper.

Kmalloc based example

::

    static struct mtd_info *board_mtd;
    static void __iomem *baseaddr;

Static example

::

    static struct nand_chip board_chip;
    static void __iomem *baseaddr;
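
The embedding described above can be reproduced in plain C. In the kernel, nand_to_mtd() returns the address of the mtd member embedded in struct nand_chip, and mtd_to_nand() goes back with container_of(). The reduced structures and the local container_of macro below are user-space stand-ins for illustration only. They are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() (simplified). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins; the real structures have many more members. */
struct mtd_info {
	const char *name;
};

struct nand_chip {
	void *IO_ADDR_R;
	struct mtd_info mtd;	/* embedded, not a pointer */
	int chip_delay;
};

static struct mtd_info *nand_to_mtd(struct nand_chip *chip)
{
	return &chip->mtd;
}

static struct nand_chip *mtd_to_nand(struct mtd_info *mtd)
{
	/* Step back from the member to the start of the containing struct. */
	return container_of(mtd, struct nand_chip, mtd);
}
```

Because the mtd_info lives inside the nand_chip allocation, a single kfree() of the nand_chip releases both. This is why the exit function frees mtd_to_nand(board_mtd) rather than board_mtd.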

Partition defines
-----------------

If you want to divide your device into partitions, define a
partitioning scheme suitable for your board.

::

    #define NUM_PARTITIONS 2
    static struct mtd_partition partition_info[] = {
        { .name = "Flash partition 1",
          .offset = 0,
          .size = 8 * 1024 * 1024 },
        { .name = "Flash partition 2",
          .offset = MTDPART_OFS_APPEND,     /* directly after partition 1 */
          .size = MTDPART_SIZ_FULL },       /* up to the end of the device */
    };
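
The following sketch shows how the second entry ends up starting at 8 MiB and extending to the end of the device. It is a simplified user-space model, not the MTD core's partition code. `OFS_APPEND_SKETCH` and `SIZ_FULL_SKETCH` are hypothetical local stand-ins for the "start right after the previous partition" offset and for MTDPART_SIZ_FULL. Erase block alignment checks are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MTD "append" offset and "full" size. */
#define OFS_APPEND_SKETCH UINT64_MAX
#define SIZ_FULL_SKETCH   0

struct part_sketch {
	uint64_t offset;
	uint64_t size;
};

/*
 * Resolve symbolic offsets/sizes in place for a device of dev_size bytes.
 * Returns 0 on success, -1 if a partition does not fit.
 */
static int resolve_parts(struct part_sketch *p, int n, uint64_t dev_size)
{
	uint64_t cur = 0;	/* end of the previous partition */
	int i;

	for (i = 0; i < n; i++) {
		if (p[i].offset == OFS_APPEND_SKETCH)
			p[i].offset = cur;
		if (p[i].offset > dev_size)
			return -1;
		if (p[i].size == SIZ_FULL_SKETCH)
			p[i].size = dev_size - p[i].offset;
		if (p[i].size > dev_size - p[i].offset)
			return -1;
		cur = p[i].offset + p[i].size;
	}
	return 0;
}
```

On a 32 MiB device, the table above would resolve to partition 1 at 0 (8 MiB) and partition 2 at 8 MiB (24 MiB).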

Hardware control function
-------------------------

The hardware control function provides access to the control pins of
the NAND chip(s). The access can be done via GPIO pins or via address
lines. If you use address lines, make sure that the timing requirements
are met.

*GPIO based example*

::

    static void board_hwcontrol(struct mtd_info *mtd, int cmd)
    {
        switch (cmd) {
        case NAND_CTL_SETCLE: /* Set CLE pin high */ break;
        case NAND_CTL_CLRCLE: /* Set CLE pin low */ break;
        case NAND_CTL_SETALE: /* Set ALE pin high */ break;
        case NAND_CTL_CLRALE: /* Set ALE pin low */ break;
        case NAND_CTL_SETNCE: /* Set nCE pin low */ break;
        case NAND_CTL_CLRNCE: /* Set nCE pin high */ break;
        }
    }

*Address lines based example.* It's assumed that the nCE pin is driven
by a chip select decoder.

::

    static void board_hwcontrol(struct mtd_info *mtd, int cmd)
    {
        struct nand_chip *this = mtd_to_nand(mtd);
        /* Bit operations are done on an integer copy of the address */
        unsigned long addr = (unsigned long)this->IO_ADDR_W;

        /* CLE_ADDR_BIT and ALE_ADDR_BIT are board specific address bits */
        switch (cmd) {
        case NAND_CTL_SETCLE: addr |= CLE_ADDR_BIT; break;
        case NAND_CTL_CLRCLE: addr &= ~CLE_ADDR_BIT; break;
        case NAND_CTL_SETALE: addr |= ALE_ADDR_BIT; break;
        case NAND_CTL_CLRALE: addr &= ~ALE_ADDR_BIT; break;
        }
        this->IO_ADDR_W = (void __iomem *)addr;
    }

Device ready function
---------------------

If the hardware interface has the ready/busy pin of the NAND chip
connected to a GPIO or other accessible I/O pin, this function is used
to read back the state of the pin. The function takes a pointer to the
mtd_info structure. It should return 0 if the device is busy (R/B pin is
low) and 1 if the device is ready (R/B pin is high). If the hardware
interface does not give access to the ready/busy pin, do not define the
function and leave the function pointer this->dev_ready set to NULL.
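
A ready/busy read-back and a bounded wait on it might look like the user-space sketch below. The GPIO input register pointer, `RB_BIT` and `wait_ready_sketch()` are hypothetical stand-ins for however a board reaches its R/B pin. The real driver's wait logic lives in nand_base.c and differs in detail.

```c
#include <assert.h>
#include <stdint.h>

#define RB_BIT (1u << 3)	/* hypothetical: R/B pin wired to GPIO bit 3 */

/* Returns 1 if the chip is ready (R/B high), 0 if it is busy (R/B low). */
static int board_dev_ready(const volatile uint32_t *gpio_in)
{
	return (*gpio_in & RB_BIT) ? 1 : 0;
}

/*
 * Poll the ready function up to max_polls times. Returns 0 once the chip
 * reports ready, -1 on timeout. A real driver would delay between polls.
 */
static int wait_ready_sketch(const volatile uint32_t *gpio_in, int max_polls)
{
	while (max_polls-- > 0)
		if (board_dev_ready(gpio_in))
			return 0;
	return -1;
}
```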

Init function
-------------

The init function allocates memory and sets up all the board specific
parameters and function pointers. When everything is set up,
nand_scan() is called. This function tries to detect and identify the
chip. If a chip is found, all the internal data fields are initialized
accordingly. The structure(s) have to be zeroed out first and then
filled with the necessary information about the device.

::

    static int __init board_init(void)
    {
        struct nand_chip *this;
        int err = 0;

        /* Allocate zeroed memory for the nand_chip (it embeds the mtd_info) */
        this = kzalloc(sizeof(struct nand_chip), GFP_KERNEL);
        if (!this) {
            pr_err("Unable to allocate NAND MTD device structure.\n");
            err = -ENOMEM;
            goto out;
        }

        board_mtd = nand_to_mtd(this);

        /* map physical address */
        baseaddr = ioremap(CHIP_PHYSICAL_ADDRESS, 1024);
        if (!baseaddr) {
            pr_err("Ioremap to access NAND chip failed\n");
            err = -EIO;
            goto out_mtd;
        }

        /* Set address of NAND IO lines */
        this->IO_ADDR_R = baseaddr;
        this->IO_ADDR_W = baseaddr;
        /* Reference hardware control function */
        this->hwcontrol = board_hwcontrol;
        /* Set command delay time, see datasheet for correct value */
        this->chip_delay = CHIP_DEPENDENT_COMMAND_DELAY;
        /* Assign the device ready function, if available */
        this->dev_ready = board_dev_ready;
        /* Use the software Hamming ECC implementation */
        this->ecc.mode = NAND_ECC_SOFT;
        this->ecc.algo = NAND_ECC_HAMMING;

        /* Scan to find existence of the device */
        if (nand_scan(board_mtd, 1)) {
            err = -ENXIO;
            goto out_ior;
        }

        /* Register the MTD device together with its partitions */
        err = mtd_device_register(board_mtd, partition_info, NUM_PARTITIONS);
        if (err)
            goto out_nand;
        goto out;

    out_nand:
        /* Release what nand_scan() set up */
        nand_cleanup(this);
    out_ior:
        iounmap(baseaddr);
    out_mtd:
        kfree(this);
    out:
        return err;
    }
    module_init(board_init);

Exit function
-------------

The exit function is only necessary if the driver is compiled as a
module. It releases all resources which are held by the chip driver and
unregisters the partitions in the MTD layer.

::

    #ifdef MODULE
    static void __exit board_cleanup(void)
    {
        /* Unregister the MTD device and release NAND resources */
        nand_release(board_mtd);

        /* unmap physical address */
        iounmap(baseaddr);

        /* Free the nand_chip, which also holds the MTD device structure */
        kfree(mtd_to_nand(board_mtd));
    }
    module_exit(board_cleanup);
    #endif

Advanced board driver functions
===============================

This chapter describes the advanced functionality of the NAND driver.
For a list of functions which can be overridden by the board driver, see
the documentation of the nand_chip structure.

Multiple chip control
---------------------

The NAND driver can control chip arrays. For this, the board driver must
provide its own select_chip function. This function must select or
deselect the requested chip. A chip number of -1 means that all chips
are deselected. The function pointer in the nand_chip structure must be
set before calling nand_scan(). The maxchips parameter of nand_scan()
defines the maximum number of chips to scan for. Make sure that the
select_chip function can handle the requested number of chips.

The NAND driver concatenates the chips into one virtual chip and
provides this virtual chip to the MTD layer.

*Note: The driver can only handle linear chip arrays of equally sized
chips. There is no support for parallel arrays which extend the
bus width.*
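
Because all chips in the array have the same size, mapping an offset in the virtual chip to a physical chip is a division. When the chip size is a power of two, it is a shift. The sketch below assumes a power-of-two chip size. The `chip_shift` parameter echoes the idea behind nand_chip's chip_shift member, but `split_offset()` itself is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdint.h>

struct chip_addr {
	int chipnr;		/* which chip in the linear array, or -1 */
	uint64_t offs;		/* offset inside that chip */
};

/*
 * Split an offset in the concatenated "virtual chip" into a chip number
 * and an offset inside that chip. chip_shift is log2(chip size).
 * chipnr is -1 if the offset lies beyond the last chip.
 */
static struct chip_addr split_offset(uint64_t ofs, int chip_shift, int numchips)
{
	struct chip_addr a;

	a.chipnr = (int)(ofs >> chip_shift);
	a.offs = ofs & ((UINT64_C(1) << chip_shift) - 1);
	if (a.chipnr >= numchips)
		a.chipnr = -1;
	return a;
}
```

With four 64 MiB chips (chip_shift = 26), virtual offset 80 MiB lands 16 MiB into chip 1.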

*GPIO based example*

::

    static void board_select_chip(struct mtd_info *mtd, int chip)
    {
        /* Deselect all chips, set all nCE pins high */
        GPIO(BOARD_NAND_NCE) |= 0xff;
        /* chip == -1 only deselects; otherwise pull that chip's nCE low */
        if (chip >= 0)
            GPIO(BOARD_NAND_NCE) &= ~(1 << chip);
    }

*Address lines based example.* It's assumed that the nCE pins are
connected to an address decoder.

::

    static void board_select_chip(struct mtd_info *mtd, int chip)
    {
        struct nand_chip *this = mtd_to_nand(mtd);
        unsigned long raddr = (unsigned long)this->IO_ADDR_R;
        unsigned long waddr = (unsigned long)this->IO_ADDR_W;

        /* Deselect all chips */
        raddr &= ~BOARD_NAND_ADDR_MASK;
        waddr &= ~BOARD_NAND_ADDR_MASK;
        switch (chip) {
        case 0:
            raddr |= BOARD_NAND_ADDR_CHIP0;
            waddr |= BOARD_NAND_ADDR_CHIP0;
            break;
        ....
        case n:
            raddr |= BOARD_NAND_ADDR_CHIPn;
            waddr |= BOARD_NAND_ADDR_CHIPn;
            break;
        }
        this->IO_ADDR_R = (void __iomem *)raddr;
        this->IO_ADDR_W = (void __iomem *)waddr;
    }

Hardware ECC support
--------------------

Functions and constants
~~~~~~~~~~~~~~~~~~~~~~~

The NAND driver supports four different types of hardware ECC.

- NAND_ECC_HW3_256

  Hardware ECC generator providing 3 bytes ECC per 256 bytes.

- NAND_ECC_HW3_512

  Hardware ECC generator providing 3 bytes ECC per 512 bytes.

- NAND_ECC_HW6_512

  Hardware ECC generator providing 6 bytes ECC per 512 bytes.

- NAND_ECC_HW8_512

  Hardware ECC generator providing 8 bytes ECC per 512 bytes.

If your hardware generator has a different functionality, add it at the
appropriate place in nand_base.c.
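
The total ECC size per page follows from the generator's step size and its bytes per step. The helper below is a hypothetical illustration of that arithmetic. For a 2048 byte page with a generator that produces 3 bytes per 256 bytes, it gives 24 bytes. That matches the 24 ECC bytes at offsets 0x28-0x3F in the 2048 byte default layout later in this document.

```c
#include <assert.h>

/*
 * ECC bytes needed for one page: one ECC chunk per step.
 * Returns -1 if the page size is not a multiple of the step size.
 */
static int ecc_bytes_per_page(int pagesize, int stepsize, int bytes_per_step)
{
	if (stepsize <= 0 || pagesize % stepsize)
		return -1;
	return (pagesize / stepsize) * bytes_per_step;
}
```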

The board driver must provide the following functions:

- enable_hwecc

  This function is called before reading from or writing to the chip.
  Reset or initialize the hardware generator in this function. The
  function is called with an argument which lets you distinguish
  between read and write operations.

- calculate_ecc

  This function is called after reading from or writing to the chip.
  Transfer the ECC from the hardware to the buffer. If the option
  NAND_HWECC_SYNDROME is set, the function is only called on write.
  See below.

- correct_data

  In case of an ECC error, this function is called for error detection
  and correction. Return 1 or 2, depending on the kind of error, if the
  error can be corrected. If the error is not correctable, return -1.
  If your hardware generator matches the default algorithm of the
  nand_ecc software generator, use the correction function provided by
  nand_ecc instead of implementing duplicated code.
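
To illustrate the calculate/correct contract, here is a self-contained single-error-correcting Hamming code over one 256 byte step, in the spirit of a software generator. It is a toy. It packs its 22 parity bits into a uint32_t instead of the 3 byte format nand_ecc uses, so it is not interchangeable with nand_ecc. The return codes are this sketch's own reading of the convention above: 0 for no error, 1 for a corrected data bit, 2 when only the stored ECC was damaged, and -1 for an uncorrectable error.

```c
#include <assert.h>
#include <stdint.h>

#define STEP 256	/* data bytes covered by one ECC code */

/* 1 if an odd number of bits is set in v. */
static unsigned parity8(uint8_t v)
{
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1;
}

/*
 * Bits 2k / 2k+1 (k = 0..7): parity of all bytes whose index has bit k
 * clear / set. Bits 16+2k / 17+2k (k = 0..2): parity of all data bits
 * whose bit number has bit k clear / set. 22 bits in total.
 */
static uint32_t toy_calculate_ecc(const uint8_t *dat)
{
	uint32_t ecc = 0;
	uint8_t col = 0;	/* XOR of all bytes = per-column parity */
	int i, k;

	for (i = 0; i < STEP; i++) {
		col ^= dat[i];
		if (parity8(dat[i]))
			for (k = 0; k < 8; k++)
				ecc ^= 1u << (2 * k + ((i >> k) & 1));
	}
	for (i = 0; i < 8; i++)
		if ((col >> i) & 1)
			for (k = 0; k < 3; k++)
				ecc ^= 1u << (16 + 2 * k + ((i >> k) & 1));
	return ecc;
}

static int toy_correct_data(uint8_t *dat, uint32_t read_ecc, uint32_t calc_ecc)
{
	uint32_t diff = (read_ecc ^ calc_ecc) & 0x3fffff;
	unsigned byte = 0, bit = 0;
	int k;

	if (!diff)
		return 0;
	/* One flipped data bit changes exactly one bit of every pair. */
	if ((((diff >> 1) ^ diff) & 0x155555) == 0x155555) {
		for (k = 0; k < 8; k++)
			byte |= ((diff >> (2 * k + 1)) & 1) << k;
		for (k = 0; k < 3; k++)
			bit |= ((diff >> (17 + 2 * k)) & 1) << k;
		dat[byte] ^= (uint8_t)(1u << bit);
		return 1;
	}
	/* A single differing bit means the stored ECC itself was hit. */
	if (!(diff & (diff - 1)))
		return 2;
	return -1;
}
```

The "odd" bit of each pair spells out the byte index and bit number of a single flipped data bit. This is why one XOR of the stored and recomputed codes is enough to locate and fix it.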

Hardware ECC with syndrome calculation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many hardware ECC implementations provide Reed-Solomon codes and
calculate an error syndrome on read. The syndrome must be converted to a
standard Reed-Solomon syndrome before calling the error correction code
in the generic Reed-Solomon library.

The ECC bytes must be placed immediately after the data bytes in order
to make the syndrome generator work. This is contrary to the usual
layout used by software ECC, and the separation of data and out of band
area is no longer possible. The NAND driver code handles this layout,
and the remaining free bytes in the OOB area are managed by the
autoplacement code. Provide a matching OOB layout in this case. See
rts_from4.c and diskonchip.c for implementation reference. In those
cases we must also use bad block tables on FLASH, because the ECC layout
interferes with the bad block marker positions. See bad block table
support for details.
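
The difference between the two layouts can be made concrete by computing where each data chunk and its ECC bytes land in the raw page stream (data area followed by spare area). This sketch rests on simplifying assumptions. The conventional case assumes one contiguous ECC block in the spare area, as in the 2048 byte default layout. The syndrome case ignores any padding or free OOB bytes a real layout may interleave. All function names are hypothetical.

```c
#include <assert.h>

/*
 * Conventional layout: all data first, then the spare area, where the ECC
 * bytes of all steps sit contiguously starting at spare offset eccpos0.
 */
static int ecc_offset_conventional(int step, int stepsize, int eccbytes,
				   int steps, int eccpos0)
{
	return steps * stepsize + eccpos0 + step * eccbytes;
}

/* Syndrome layout: each data chunk is immediately followed by its ECC. */
static int data_offset_syndrome(int step, int stepsize, int eccbytes)
{
	return step * (stepsize + eccbytes);
}

static int ecc_offset_syndrome(int step, int stepsize, int eccbytes)
{
	return data_offset_syndrome(step, stepsize, eccbytes) + stepsize;
}
```

In the syndrome case, the second data chunk no longer starts at byte 256 of the page. That shift is why data and out of band area can no longer be treated separately.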

Bad block table support
-----------------------

Most NAND chips mark the bad blocks at a defined position in the spare
area. Those blocks must not be erased under any circumstances, as the
bad block information would be lost. It is possible to check the bad
block mark each time the blocks are accessed by reading the spare area
of the first page in the block. This is time consuming, so a bad block
table is used.

The NAND driver supports various types of bad block tables.

- Per device

  The bad block table contains all bad block information of the device,
  which can consist of multiple chips.

- Per chip

  A bad block table is used per chip and contains the bad block
  information for this particular chip.

- Fixed offset

  The bad block table is located at a fixed offset in the chip
  (device). This applies to various DiskOnChip devices.

- Automatically placed

  The bad block table is automatically placed and detected either at
  the end or at the beginning of a chip (device).

- Mirrored tables

  The bad block table is mirrored on the chip (device) to allow updates
  of the bad block table without data loss.

nand_scan() calls the function nand_default_bbt().
nand_default_bbt() selects appropriate default bad block table
descriptors depending on the chip information which was retrieved by
nand_scan().

The standard policy is to scan the device for bad blocks and build a
RAM based bad block table, which allows faster access than always
checking the bad block information on the flash chip itself.
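
The sketch below builds a RAM based table with two bits per block by checking the marker byte in the first page of each block. It rests on several assumptions. The spare areas are simulated as an in-memory array. The marker sits at a fixed offset `BBM_OFFS` (hypothetical; per the layout tables later in this document it is 5 for 256 and 512 byte pages and 0 for 2048 byte pages). The 2-bit codes are illustrative and are not copied from nand_bbt.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BBM_OFFS	5	/* hypothetical: small page marker position */
#define OOBSIZE		16
#define BBT_GOOD	0x0	/* illustrative 2-bit codes */
#define BBT_BAD		0x3

/* Store a 2-bit state for block nr; four blocks per byte. */
static void bbt_set(uint8_t *bbt, int nr, uint8_t state)
{
	int shift = (nr & 3) * 2;

	bbt[nr >> 2] &= (uint8_t)~(0x3 << shift);
	bbt[nr >> 2] |= (uint8_t)((state & 0x3) << shift);
}

static uint8_t bbt_get(const uint8_t *bbt, int nr)
{
	return (bbt[nr >> 2] >> ((nr & 3) * 2)) & 0x3;
}

/*
 * first_page_oob[i] is the spare area of the first page of block i.
 * A block is bad if any bit of its marker byte is zero.
 */
static void bbt_scan(uint8_t *bbt, uint8_t (*first_page_oob)[OOBSIZE],
		     int nblocks)
{
	int i;

	memset(bbt, 0, (size_t)(nblocks + 3) / 4);
	for (i = 0; i < nblocks; i++)
		bbt_set(bbt, i, first_page_oob[i][BBM_OFFS] == 0xff ?
			BBT_GOOD : BBT_BAD);
}
```

After one scan, later "is this block bad?" questions cost a shift and a mask on RAM instead of a flash read.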

Flash based tables
~~~~~~~~~~~~~~~~~~

It may be desired or necessary to keep a bad block table in FLASH. For
AG-AND chips this is mandatory, as they have no factory marked bad
blocks. Instead, they have factory marked good blocks. The marker
pattern is erased when the block is erased to be reused. So in case of
a power loss before the pattern is written back to the chip, this block
would be lost and added to the bad blocks. Therefore we scan the chip(s)
for good blocks when we detect them for the first time, and store this
information in a bad block table before erasing any of the blocks.

The blocks in which the tables are stored are protected against
accidental access by marking them bad in the memory bad block table. The
bad block table management functions are allowed to circumvent this
protection.

The simplest way to activate the FLASH based bad block table support is
to set the option NAND_BBT_USE_FLASH in the bbt_options field of the
nand_chip structure before calling nand_scan(). For AG-AND chips this is
done by default. This activates the default FLASH based bad block table
functionality of the NAND driver. The default bad block table options
are:

- Store bad block table per chip

- Use 2 bits per block

- Automatic placement at the end of the chip

- Use mirrored tables with version numbers

- Reserve 4 blocks at the end of the chip

User defined tables
~~~~~~~~~~~~~~~~~~~

User defined tables are created by filling out a nand_bbt_descr
structure and storing the pointer in the nand_chip structure member
bbt_td before calling nand_scan(). If a mirror table is necessary, a
second structure must be created and a pointer to this structure must
be stored in bbt_md inside the nand_chip structure. If the bbt_md member
is set to NULL, only the main table is used and no scan for the mirrored
table is performed.

The most important field in the nand_bbt_descr structure is the
options field. The options define most of the table properties. Use the
predefined constants from nand.h to define the options.

- Number of bits per block

  The supported numbers of bits per block are 1, 2, 4 and 8.

- Table per chip

  Setting the constant NAND_BBT_PERCHIP selects that a bad block
  table is managed for each chip in a chip array. If this option is not
  set, a per device bad block table is used.

- Table location is absolute

  Use the option constant NAND_BBT_ABSPAGE and define the absolute
  page number where the bad block table starts in the field pages. If
  you have selected bad block tables per chip and you have a multi chip
  array, the start page must be given for each chip in the chip array.
  Note: no scan for a table ident pattern is performed, so the fields
  pattern, veroffs, offs and len can be left uninitialized.

- Table location is automatically detected

  The table can either be located in the first or the last good blocks
  of the chip (device). Set NAND_BBT_LASTBLOCK to place the bad block
  table at the end of the chip (device). The bad block tables are
  marked and identified by a pattern which is stored in the spare area
  of the first page in the block which holds the bad block table. Store
  a pointer to the pattern in the pattern field. Further, the length of
  the pattern has to be stored in len, and the offset in the spare area
  must be given in the offs member of the nand_bbt_descr structure.
  For mirrored bad block tables, different patterns are mandatory.

- Table creation

  Set the option NAND_BBT_CREATE to enable table creation if no table
  can be found during the scan. Usually this is done only once, when a
  new chip is found.

- Table write support

  Set the option NAND_BBT_WRITE to enable table write support. This
  allows the bad block table(s) to be updated when a block has to be
  marked bad due to wear. The MTD interface function block_markbad
  calls the update function of the bad block table. If write support is
  enabled, the table is updated on FLASH.

  Note: Write support should only be enabled for mirrored tables with
  version control.

- Table version control

  Set the option NAND_BBT_VERSION to enable table version control. It's
  highly recommended to enable this for mirrored tables with write
  support. It ensures that the risk of losing the bad block table
  information is reduced to the loss of the information about the one
  worn out block which should be marked bad. The version is stored in
  4 consecutive bytes in the spare area of the device. The position of
  the version number is defined by the member veroffs in the bad block
  table descriptor.

- Save block contents on write

  If the block which holds the bad block table contains other useful
  information, set the option NAND_BBT_SAVECONTENT. When the bad block
  table is written, the whole block is read, the bad block table is
  updated, the block is erased and everything is written back. If this
  option is not set, only the bad block table is written and everything
  else in the block is ignored and erased.

- Number of reserved blocks

  For automatic placement some blocks must be reserved for bad block
  table storage. The number of reserved blocks is defined in the
  maxblocks member of the bad block table description structure.
  Reserving 4 blocks for mirrored tables should be a reasonable number.
  This also limits the number of blocks which are scanned for the bad
  block table ident pattern.
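
With NAND_BBT_LASTBLOCK, automatic detection amounts to checking, from the last block downwards and for at most maxblocks blocks, whether the spare area of the first page carries the ident pattern at offs. The sketch below models only that search over a simulated spare-area array. The function name is hypothetical. The real scan in nand_bbt.c also deals with version bytes, per-chip tables and mirrors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OOBSIZE 64

/*
 * Search the last maxblocks blocks (highest first) for a bad block table
 * ident pattern of length len at offset offs in the spare area of each
 * block's first page. Returns the block number or -1 if not found.
 */
static int find_bbt_lastblock(uint8_t (*first_page_oob)[OOBSIZE],
			      int nblocks, int maxblocks,
			      const uint8_t *pattern, int offs, int len)
{
	int i, block;

	for (i = 0; i < maxblocks && i < nblocks; i++) {
		block = nblocks - 1 - i;
		if (!memcmp(&first_page_oob[block][offs], pattern, (size_t)len))
			return block;
	}
	return -1;
}
```

Giving the main and mirror tables different patterns, as the text requires, lets the same search tell them apart. Limiting the search to maxblocks keeps it from walking the whole device.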

Spare area (auto)placement
--------------------------

The NAND driver implements different possibilities for placement of
filesystem data in the spare area:

- Placement defined by fs driver

- Automatic placement

The default placement function is automatic placement. The NAND driver
has built-in default placement schemes for the various chip types. If,
due to hardware ECC functionality, the default placement does not fit,
the board driver can provide its own placement scheme.

File system drivers can provide their own placement scheme, which is
used instead of the default placement scheme.

Placement schemes are defined by a nand_oobinfo structure:

::

    struct nand_oobinfo {
        int useecc;
        int eccbytes;
        int eccpos[24];
        int oobfree[8][2];
    };

- useecc

  The useecc member controls the ECC and placement function. The header
  file include/mtd/mtd-abi.h contains constants to select ECC and
  placement. MTD_NANDECC_OFF switches off the ECC completely. This is
  not recommended and available for testing and diagnosis only.
  MTD_NANDECC_PLACE selects caller defined placement,
  MTD_NANDECC_AUTOPLACE selects automatic placement.

- eccbytes

  The eccbytes member defines the number of ECC bytes per page.

- eccpos

  The eccpos array holds the byte offsets in the spare area where the
  ECC codes are placed.

- oobfree

  The oobfree array defines the areas in the spare area which can be
  used for automatic placement. The information is given in the format
  {offset, size}: offset defines the start of the usable area, size the
  length in bytes. More than one area can be defined. The list is
  terminated by a {0, 0} entry.
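
Automatic placement scatters the filesystem's spare-area bytes across the oobfree regions in order and gathers them back on read. The sketch below uses the 256 byte page default layout from the table later in this document: ECC at 0-2, free bytes at 3-4 and 6-7, and the bad block marker at 5. The helper names are hypothetical, and the code is a user-space model, not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* oobfree regions as {offset, size}, terminated by {0, 0} */
static const int oobfree_256[8][2] = { { 3, 2 }, { 6, 2 }, { 0, 0 } };

/* Copy up to len bytes of fs data into the free regions of one spare area. */
static int autoplace_write(uint8_t *oob, const uint8_t *fsdata, int len,
			   const int oobfree[8][2])
{
	int i, n, done = 0;

	for (i = 0; i < 8 && oobfree[i][1] && done < len; i++) {
		n = oobfree[i][1];
		if (n > len - done)
			n = len - done;
		memcpy(oob + oobfree[i][0], fsdata + done, (size_t)n);
		done += n;
	}
	return done;	/* bytes actually placed */
}

/* Gather up to len bytes of fs data back out of the free regions. */
static int autoplace_read(uint8_t *fsdata, const uint8_t *oob, int len,
			  const int oobfree[8][2])
{
	int i, n, done = 0;

	for (i = 0; i < 8 && oobfree[i][1] && done < len; i++) {
		n = oobfree[i][1];
		if (n > len - done)
			n = len - done;
		memcpy(fsdata + done, oob + oobfree[i][0], (size_t)n);
		done += n;
	}
	return done;
}
```

The ECC bytes and the bad block marker are never touched, because only the listed free regions are written.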

Placement defined by fs driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The calling function provides a pointer to a nand_oobinfo structure
which defines the ECC placement. For writes, the caller must provide a
spare area buffer along with the data buffer. The spare area buffer size
is (number of pages) \* (size of spare area). For reads, the buffer size
is (number of pages) \* ((size of spare area) + (number of ECC steps per
page) \* sizeof (int)). The driver stores the result of the ECC check
for each tuple in the spare buffer. The storage sequence is::

    <spare data page 0><ecc result 0>...<ecc result n>

    ...

    <spare data page n><ecc result 0>...<ecc result n>

This is a legacy mode used by YAFFS1.
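
As a worked example of the two size formulas, take 4 pages of 512 bytes with a 16 byte spare area and two 256 byte ECC steps per page. The write buffer needs 4 \* 16 = 64 bytes. The read buffer needs 4 \* (16 + 2 \* sizeof (int)) bytes, which is 96 bytes where int is 4 bytes wide. The hypothetical helpers below just encode those formulas.

```c
#include <assert.h>
#include <stddef.h>

/* Spare buffer size for a write: (number of pages) * (size of spare area). */
static size_t fs_oob_write_size(size_t pages, size_t oobsize)
{
	return pages * oobsize;
}

/*
 * Spare buffer size for a read: each page's spare data is followed by one
 * int-sized ECC result per ECC step.
 */
static size_t fs_oob_read_size(size_t pages, size_t oobsize, size_t eccsteps)
{
	return pages * (oobsize + eccsteps * sizeof(int));
}
```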
672 | |||
673 | If the spare area buffer is NULL then only the ECC placement is done | ||
674 | according to the given scheme in the nand_oobinfo structure. | ||
675 | |||
676 | Automatic placement | ||
677 | ~~~~~~~~~~~~~~~~~~~ | ||
678 | |||
679 | Automatic placement uses the built in defaults to place the ecc bytes in | ||
680 | the spare area. If filesystem data have to be stored / read into the | ||
681 | spare area then the calling function must provide a buffer. The buffer | ||
682 | size per page is determined by the oobfree array in the nand_oobinfo | ||
683 | structure. | ||
684 | |||
685 | If the spare area buffer is NULL then only the ECC placement is done | ||
686 | according to the default builtin scheme. | ||
687 | |||
688 | Spare area autoplacement default schemes | ||
689 | ---------------------------------------- | ||
690 | |||
691 | 256 byte pagesize | ||
692 | ~~~~~~~~~~~~~~~~~ | ||
693 | |||
694 | ======== ================== =================================================== | ||
695 | Offset Content Comment | ||
696 | ======== ================== =================================================== | ||
697 | 0x00 ECC byte 0 Error correction code byte 0 | ||
698 | 0x01 ECC byte 1 Error correction code byte 1 | ||
699 | 0x02 ECC byte 2 Error correction code byte 2 | ||
700 | 0x03 Autoplace 0 | ||
701 | 0x04 Autoplace 1 | ||
702 | 0x05 Bad block marker If any bit in this byte is zero, then this | ||
703 | block is bad. This applies only to the first | ||
704 | page in a block. In the remaining pages this | ||
705 | byte is reserved | ||
706 | 0x06 Autoplace 2 | ||
707 | 0x07 Autoplace 3 | ||
708 | ======== ================== =================================================== | ||
709 | |||
710 | 512 byte pagesize | ||
711 | ~~~~~~~~~~~~~~~~~ | ||
712 | |||
713 | |||
714 | ============= ================== ============================================== | ||
715 | Offset Content Comment | ||
716 | ============= ================== ============================================== | ||
717 | 0x00 ECC byte 0 Error correction code byte 0 of the lower | ||
718 | 256 Byte data in this page | ||
719 | 0x01 ECC byte 1 Error correction code byte 1 of the lower | ||
720 | 256 Bytes of data in this page | ||
721 | 0x02 ECC byte 2 Error correction code byte 2 of the lower | ||
722 | 256 Bytes of data in this page | ||
723 | 0x03 ECC byte 3 Error correction code byte 0 of the upper | ||
724 | 256 Bytes of data in this page | ||
725 | 0x04 reserved reserved | ||
726 | 0x05 Bad block marker If any bit in this byte is zero, then this | ||
727 | block is bad. This applies only to the first | ||
728 | page in a block. In the remaining pages this | ||
729 | byte is reserved | ||
730 | 0x06 ECC byte 4 Error correction code byte 1 of the upper | ||
731 | 256 Bytes of data in this page | ||
732 | 0x07 ECC byte 5 Error correction code byte 2 of the upper | ||
733 | 256 Bytes of data in this page | ||
734 | 0x08 - 0x0F Autoplace 0 - 7 | ||
735 | ============= ================== ============================================== | ||
736 | |||
737 | 2048 byte pagesize | ||
738 | ~~~~~~~~~~~~~~~~~~ | ||
739 | |||
740 | =========== ================== ================================================ | ||
741 | Offset Content Comment | ||
742 | =========== ================== ================================================ | ||
743 | 0x00 Bad block marker If any bit in this byte is zero, then this block | ||
744 | is bad. This applies only to the first page in a | ||
745 | block. In the remaining pages this byte is | ||
746 | reserved | ||
747 | 0x01 Reserved Reserved | ||
748 | 0x02-0x27 Autoplace 0 - 37 | ||
749 | 0x28 ECC byte 0 Error correction code byte 0 of the first | ||
750 | 256 Byte data in this page | ||
751 | 0x29 ECC byte 1 Error correction code byte 1 of the first | ||
752 | 256 Bytes of data in this page | ||
753 | 0x2A ECC byte 2 Error correction code byte 2 of the first | ||
754 | 256 Bytes data in this page | ||
755 | 0x2B ECC byte 3 Error correction code byte 0 of the second | ||
756 | 256 Bytes of data in this page | ||
757 | 0x2C ECC byte 4 Error correction code byte 1 of the second | ||
758 | 256 Bytes of data in this page | ||
759 | 0x2D ECC byte 5 Error correction code byte 2 of the second | ||
760 | 256 Bytes of data in this page | ||
761 | 0x2E ECC byte 6 Error correction code byte 0 of the third | ||
762 | 256 Bytes of data in this page | ||
763 | 0x2F ECC byte 7 Error correction code byte 1 of the third | ||
764 | 256 Bytes of data in this page | ||
765 | 0x30 ECC byte 8 Error correction code byte 2 of the third | ||
766 | 256 Bytes of data in this page | ||
767 | 0x31 ECC byte 9 Error correction code byte 0 of the fourth | ||
768 | 256 Bytes of data in this page | ||
769 | 0x32 ECC byte 10 Error correction code byte 1 of the fourth | ||
770 | 256 Bytes of data in this page | ||
771 | 0x33 ECC byte 11 Error correction code byte 2 of the fourth | ||
772 | 256 Bytes of data in this page | ||
773 | 0x34 ECC byte 12 Error correction code byte 0 of the fifth | ||
774 | 256 Bytes of data in this page | ||
775 | 0x35 ECC byte 13 Error correction code byte 1 of the fifth | ||
776 | 256 Bytes of data in this page | ||
777 | 0x36 ECC byte 14 Error correction code byte 2 of the fifth | ||
778 | 256 Bytes of data in this page | ||
779 | 0x37 ECC byte 15 Error correction code byte 0 of the sixth | ||
780 | 256 Bytes of data in this page | ||
781 | 0x38 ECC byte 16 Error correction code byte 1 of the sixth | ||
782 | 256 Bytes of data in this page | ||
783 | 0x39 ECC byte 17 Error correction code byte 2 of the sixth | ||
784 | 256 Bytes of data in this page | ||
785 | 0x3A ECC byte 18 Error correction code byte 0 of the seventh | ||
786 | 256 Bytes of data in this page | ||
787 | 0x3B ECC byte 19 Error correction code byte 1 of the seventh | ||
788 | 256 Bytes of data in this page | ||
789 | 0x3C ECC byte 20 Error correction code byte 2 of the seventh | ||
790 | 256 Bytes of data in this page | ||
791 | 0x3D ECC byte 21 Error correction code byte 0 of the eighth | ||
792 | 256 Bytes of data in this page | ||
793 | 0x3E ECC byte 22 Error correction code byte 1 of the eighth | ||
794 | 256 Bytes of data in this page | ||
795 | 0x3F ECC byte 23 Error correction code byte 2 of the eighth | ||
796 | 256 Bytes of data in this page | ||
797 | =========== ================== ================================================ | ||
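The table places the 24 ECC bytes at 0x28-0x3F, three bytes for each 256-byte chunk. The following minimal sketch computes that offset. It assumes a 2048-byte page (eight 256-byte chunks, as the table implies), and the helper name `oob_ecc_offset` is hypothetical, not a kernel function.

```c
#include <stddef.h>

/* First ECC byte in the 64-byte spare area, as listed in the table. */
#define OOB_ECC_START      0x28
/* 3 ECC bytes protect each 256-byte chunk of data. */
#define ECC_BYTES_PER_STEP 3
#define ECC_STEP_SIZE      256
/* Assumption: 2048-byte page, i.e. eight 256-byte chunks. */
#define SKETCH_PAGE_SIZE   2048

/*
 * Hypothetical helper: return the spare-area offset of ECC byte `k`
 * (0..2) for the chunk that contains page offset `data_offset`,
 * or -1 if an argument is out of range.
 */
int oob_ecc_offset(size_t data_offset, int k)
{
    if (data_offset >= SKETCH_PAGE_SIZE || k < 0 || k >= ECC_BYTES_PER_STEP)
        return -1;

    size_t step = data_offset / ECC_STEP_SIZE;  /* 0 = first chunk, 7 = eighth */
    return (int)(OOB_ECC_START + step * ECC_BYTES_PER_STEP + (size_t)k);
}
```

For example, byte 0 of the second chunk (data offset 256) lands at 0x2B, matching the "ECC byte 3" row.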
798 | |||
799 | Filesystem support | ||
800 | ================== | ||
801 | |||
802 | The NAND driver provides all necessary functions for a filesystem via | ||
803 | the MTD interface. | ||
804 | |||
805 | Filesystems must be aware of the NAND peculiarities and restrictions. | ||
806 | One major restriction of NAND Flash is that you cannot write to a page | ||
807 | as often as you want. The number of consecutive writes to a page before | ||
808 | erasing it again is restricted to 1-3, depending on the manufacturer's | ||
809 | specifications. The same applies to the spare area. | ||
810 | |||
811 | Therefore, NAND-aware filesystems must either write in page-size chunks | ||
812 | or hold a write buffer that collects smaller writes until they add up to | ||
813 | a full page. Available NAND-aware filesystems: JFFS2, YAFFS. | ||
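A write buffer of the kind described above can be sketched as follows: small writes are copied into a page-sized buffer, and a page is programmed only once the buffer is full. This is an illustrative model, not the JFFS2 or YAFFS implementation; the 512-byte page size, the `wbuf_*` names and the `page_write_fn` callback are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define WBUF_PAGE_SIZE 512  /* hypothetical page size for the sketch */

/* Hypothetical callback that programs one full page to flash. */
typedef void (*page_write_fn)(const unsigned char *page, void *ctx);

struct wbuf {
    unsigned char data[WBUF_PAGE_SIZE];
    size_t used;               /* bytes collected so far */
    page_write_fn write_page;
    void *ctx;
};

void wbuf_init(struct wbuf *wb, page_write_fn fn, void *ctx)
{
    wb->used = 0;
    wb->write_page = fn;
    wb->ctx = ctx;
}

/* Append len bytes; each time the buffer fills, program exactly one page. */
void wbuf_append(struct wbuf *wb, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        size_t room = WBUF_PAGE_SIZE - wb->used;
        size_t n = len < room ? len : room;

        memcpy(wb->data + wb->used, p, n);
        wb->used += n;
        p += n;
        len -= n;

        if (wb->used == WBUF_PAGE_SIZE) {
            wb->write_page(wb->data, wb->ctx);  /* one program operation per page */
            wb->used = 0;
        }
    }
}

/* Demo callback: counts how many pages would have been programmed. */
void count_pages(const unsigned char *page, void *ctx)
{
    (void)page;
    ++*(int *)ctx;
}
```

With this design every page is programmed once, which stays within the 1-3 partial-write limit. A real filesystem also has to flush a partially filled buffer, for example on sync, which the sketch leaves out.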
814 | |||
815 | The use of the spare area for filesystem data is controlled by the spare | ||
816 | area placement functionality, which is described in an earlier | ||
817 | chapter. | ||
818 | |||
819 | Tools | ||
820 | ===== | ||
821 | |||
822 | The MTD project provides a couple of helpful tools to handle NAND Flash. | ||
823 | |||
824 | - flasherase, flasheraseall: Erase and format FLASH partitions | ||
825 | |||
826 | - nandwrite: write filesystem images to NAND FLASH | ||
827 | |||
828 | - nanddump: dump the contents of a NAND FLASH partition | ||
829 | |||
830 | These tools are aware of the NAND restrictions. Please use these tools | ||
831 | instead of complaining about errors which are caused by non-NAND-aware | ||
832 | access methods. | ||
833 | |||
834 | Constants | ||
835 | ========= | ||
836 | |||
837 | This chapter describes the constants which might be relevant for a | ||
838 | driver developer. | ||
839 | |||
840 | Chip option constants | ||
841 | --------------------- | ||
842 | |||
843 | Constants for chip id table | ||
844 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
845 | |||
846 | These constants are defined in nand.h. They are OR-ed together to | ||
847 | describe the chip functionality:: | ||
848 | |||
849 | /* Buswidth is 16 bit */ | ||
850 | #define NAND_BUSWIDTH_16 0x00000002 | ||
851 | /* Device supports partial programming without padding */ | ||
852 | #define NAND_NO_PADDING 0x00000004 | ||
853 | /* Chip has cache program function */ | ||
854 | #define NAND_CACHEPRG 0x00000008 | ||
855 | /* Chip has copy back function */ | ||
856 | #define NAND_COPYBACK 0x00000010 | ||
857 | /* AND Chip which has 4 banks and a confusing page / block | ||
858 | * assignment. See Renesas datasheet for further information */ | ||
859 | #define NAND_IS_AND 0x00000020 | ||
860 | /* Chip has an array of 4 pages which can be read without | ||
861 | * additional ready/busy waits */ | ||
862 | #define NAND_4PAGE_ARRAY 0x00000040 | ||
863 | |||
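Because these constants are single-bit flags, a chip description combines them with bitwise OR, and a driver tests them with bitwise AND. The sketch below copies four values from the list above. The chip and the helper `nand_has_option` are hypothetical.

```c
#include <stdbool.h>

/* Values copied from the chip id table constants above. */
#define NAND_BUSWIDTH_16 0x00000002
#define NAND_NO_PADDING  0x00000004
#define NAND_CACHEPRG    0x00000008
#define NAND_COPYBACK    0x00000010

/* A hypothetical chip: 16-bit bus with cache program support. */
static const unsigned int example_chip_options = NAND_BUSWIDTH_16 | NAND_CACHEPRG;

/* True if every bit in `flag` is set in `options`. */
bool nand_has_option(unsigned int options, unsigned int flag)
{
    return (options & flag) == flag;
}
```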
864 | |||
865 | Constants for runtime options | ||
866 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
867 | |||
868 | These constants are defined in nand.h. They are OR-ed together to | ||
869 | describe the functionality:: | ||
870 | |||
871 | /* The hw ECC generator provides a syndrome instead of an ECC value on read. | ||
872 | * This can only work if the ECC bytes are located directly behind the | ||
873 | * data bytes. Applies to DOC and AG-AND Renesas HW Reed-Solomon generators */ | ||
874 | #define NAND_HWECC_SYNDROME 0x00020000 | ||
875 | |||
876 | |||
877 | ECC selection constants | ||
878 | ----------------------- | ||
879 | |||
880 | Use these constants to select the ECC algorithm:: | ||
881 | |||
882 | /* No ECC. Usage is not recommended ! */ | ||
883 | #define NAND_ECC_NONE 0 | ||
884 | /* Software ECC 3 byte ECC per 256 Byte data */ | ||
885 | #define NAND_ECC_SOFT 1 | ||
886 | /* Hardware ECC 3 byte ECC per 256 Byte data */ | ||
887 | #define NAND_ECC_HW3_256 2 | ||
888 | /* Hardware ECC 3 byte ECC per 512 Byte data */ | ||
889 | #define NAND_ECC_HW3_512 3 | ||
890 | /* Hardware ECC 6 byte ECC per 512 Byte data */ | ||
891 | #define NAND_ECC_HW6_512 4 | ||
892 | /* Hardware ECC 6 byte ECC per 512 Byte data */ | ||
893 | #define NAND_ECC_HW8_512 6 | ||
894 | |||
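Each mode fixes how many ECC bytes are produced per data step, so the spare-area space a page needs follows directly from the comments above. The helper `ecc_bytes_per_page` below is a hypothetical sketch of that calculation, not a kernel function. For a 2048-byte page with NAND_ECC_SOFT, it yields the 24 bytes shown in the layout table.

```c
/* Mode values copied from the ECC selection constants above. */
#define NAND_ECC_NONE    0
#define NAND_ECC_SOFT    1
#define NAND_ECC_HW3_256 2
#define NAND_ECC_HW3_512 3
#define NAND_ECC_HW6_512 4
#define NAND_ECC_HW8_512 6

/*
 * Total ECC bytes needed for one page of `page_size` bytes, or -1 for an
 * unknown mode or a page size that is not a multiple of the ECC step.
 */
int ecc_bytes_per_page(int mode, int page_size)
{
    int step, bytes;

    switch (mode) {
    case NAND_ECC_NONE:    return 0;
    case NAND_ECC_SOFT:    /* software ECC uses the same 3-per-256 layout */
    case NAND_ECC_HW3_256: step = 256; bytes = 3; break;
    case NAND_ECC_HW3_512: step = 512; bytes = 3; break;
    case NAND_ECC_HW6_512: step = 512; bytes = 6; break;
    case NAND_ECC_HW8_512: step = 512; bytes = 8; break;
    default:               return -1;
    }
    if (page_size <= 0 || page_size % step != 0)
        return -1;
    return (page_size / step) * bytes;
}
```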
895 | |||
896 | Hardware control related constants | ||
897 | ---------------------------------- | ||
898 | |||
899 | These constants describe the requested hardware access function when the | ||
900 | board-specific hardware control function is called:: | ||
901 | |||
902 | /* Select the chip by setting nCE to low */ | ||
903 | #define NAND_CTL_SETNCE 1 | ||
904 | /* Deselect the chip by setting nCE to high */ | ||
905 | #define NAND_CTL_CLRNCE 2 | ||
906 | /* Select the command latch by setting CLE to high */ | ||
907 | #define NAND_CTL_SETCLE 3 | ||
908 | /* Deselect the command latch by setting CLE to low */ | ||
909 | #define NAND_CTL_CLRCLE 4 | ||
910 | /* Select the address latch by setting ALE to high */ | ||
911 | #define NAND_CTL_SETALE 5 | ||
912 | /* Deselect the address latch by setting ALE to low */ | ||
913 | #define NAND_CTL_CLRALE 6 | ||
914 | /* Set write protection by setting WP to high. Not used! */ | ||
915 | #define NAND_CTL_SETWP 7 | ||
916 | /* Clear write protection by setting WP to low. Not used! */ | ||
917 | #define NAND_CTL_CLRWP 8 | ||
918 | |||
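A board-specific hardware control function typically dispatches on these constants and drives the corresponding control line. The sketch below models the pin levels in a struct instead of writing to real board registers or GPIOs. `struct board_pins` and `board_hwcontrol` are hypothetical names, and only the constants are taken from the list above.

```c
#include <stdbool.h>

/* Command values copied from the hardware control constants above. */
#define NAND_CTL_SETNCE 1
#define NAND_CTL_CLRNCE 2
#define NAND_CTL_SETCLE 3
#define NAND_CTL_CLRCLE 4
#define NAND_CTL_SETALE 5
#define NAND_CTL_CLRALE 6

/* Hypothetical model of the board's control pins (true = high). */
struct board_pins {
    bool nce;  /* chip enable, active low */
    bool cle;  /* command latch enable */
    bool ale;  /* address latch enable */
};

/* Sketch of a board-specific hardware control function. */
void board_hwcontrol(struct board_pins *pins, int cmd)
{
    switch (cmd) {
    case NAND_CTL_SETNCE: pins->nce = false; break;  /* select: nCE low */
    case NAND_CTL_CLRNCE: pins->nce = true;  break;  /* deselect: nCE high */
    case NAND_CTL_SETCLE: pins->cle = true;  break;
    case NAND_CTL_CLRCLE: pins->cle = false; break;
    case NAND_CTL_SETALE: pins->ale = true;  break;
    case NAND_CTL_CLRALE: pins->ale = false; break;
    default: break;  /* NAND_CTL_SETWP/CLRWP are documented as unused */
    }
}
```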
919 | |||
920 | Bad block table related constants | ||
921 | --------------------------------- | ||
922 | |||
923 | These constants describe the options used for bad block table | ||
924 | descriptors:: | ||
925 | |||
926 | /* Options for the bad block table descriptors */ | ||
927 | |||
928 | /* The number of bits used per block in the bbt on the device */ | ||
929 | #define NAND_BBT_NRBITS_MSK 0x0000000F | ||
930 | #define NAND_BBT_1BIT 0x00000001 | ||
931 | #define NAND_BBT_2BIT 0x00000002 | ||
932 | #define NAND_BBT_4BIT 0x00000004 | ||
933 | #define NAND_BBT_8BIT 0x00000008 | ||
934 | /* The bad block table is in the last good block of the device */ | ||
935 | #define NAND_BBT_LASTBLOCK 0x00000010 | ||
936 | /* The bbt is at the given page, else we must scan for the bbt */ | ||
937 | #define NAND_BBT_ABSPAGE 0x00000020 | ||
938 | /* bbt is stored per chip on multichip devices */ | ||
939 | #define NAND_BBT_PERCHIP 0x00000080 | ||
940 | /* bbt has a version counter at offset veroffs */ | ||
941 | #define NAND_BBT_VERSION 0x00000100 | ||
942 | /* Create a bbt if none exists */ | ||
943 | #define NAND_BBT_CREATE 0x00000200 | ||
944 | /* Write bbt if necessary */ | ||
945 | #define NAND_BBT_WRITE 0x00001000 | ||
946 | /* Read and write back block contents when writing bbt */ | ||
947 | #define NAND_BBT_SAVECONTENT 0x00002000 | ||
948 | |||
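The low nibble of a descriptor's options (NAND_BBT_NRBITS_MSK) gives the number of bits stored per erase block, while the remaining bits select placement and update behaviour. The sketch below extracts that field and estimates the raw table size. The helper names are hypothetical, and the on-flash layout (pattern, offsets, version byte) is deliberately not modelled.

```c
#include <stdint.h>

/* Values copied from the bad block table constants above. */
#define NAND_BBT_NRBITS_MSK 0x0000000F
#define NAND_BBT_2BIT       0x00000002
#define NAND_BBT_LASTBLOCK  0x00000010
#define NAND_BBT_VERSION    0x00000100
#define NAND_BBT_CREATE     0x00000200
#define NAND_BBT_WRITE      0x00001000

/* Bits used per erase block, taken from the low nibble of the options. */
unsigned int bbt_bits_per_block(uint32_t options)
{
    return options & NAND_BBT_NRBITS_MSK;
}

/* Bytes needed to store the table for `nblocks` erase blocks, rounded up. */
unsigned long bbt_table_bytes(uint32_t options, unsigned long nblocks)
{
    unsigned long bits = (unsigned long)bbt_bits_per_block(options) * nblocks;
    return (bits + 7) / 8;
}
```

For example, a hypothetical descriptor with NAND_BBT_2BIT on a 1024-block device needs 256 bytes of table data.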
949 | |||
950 | Structures | ||
951 | ========== | ||
952 | |||
953 | This chapter contains the autogenerated documentation of the structures | ||
954 | which are used in the NAND driver and might be relevant for a driver | ||
955 | developer. Each struct member has a short description which is marked | ||
956 | with an [XXX] identifier. See the chapter "Documentation hints" for an | ||
957 | explanation. | ||
958 | |||
959 | .. kernel-doc:: include/linux/mtd/nand.h | ||
960 | :internal: | ||
961 | |||
962 | Public Functions Provided | ||
963 | ========================= | ||
964 | |||
965 | This chapter contains the autogenerated documentation of the NAND kernel | ||
966 | API functions which are exported. Each function has a short description | ||
967 | which is marked with an [XXX] identifier. See the chapter "Documentation | ||
968 | hints" for an explanation. | ||
969 | |||
970 | .. kernel-doc:: drivers/mtd/nand/nand_base.c | ||
971 | :export: | ||
972 | |||
973 | .. kernel-doc:: drivers/mtd/nand/nand_ecc.c | ||
974 | :export: | ||
975 | |||
976 | Internal Functions Provided | ||
977 | =========================== | ||
978 | |||
979 | This chapter contains the autogenerated documentation of the NAND driver | ||
980 | internal functions. Each function has a short description which is | ||
981 | marked with an [XXX] identifier. See the chapter "Documentation hints" | ||
982 | for an explanation. The functions marked with [DEFAULT] might be | ||
983 | relevant for a board driver developer. | ||
984 | |||
985 | .. kernel-doc:: drivers/mtd/nand/nand_base.c | ||
986 | :internal: | ||
987 | |||
988 | .. kernel-doc:: drivers/mtd/nand/nand_bbt.c | ||
989 | :internal: | ||
990 | |||
991 | Credits | ||
992 | ======= | ||
993 | |||
994 | The following people have contributed to the NAND driver: | ||
995 | |||
996 | 1. Steven J. Hill\ sjhill@realitydiluted.com | ||
997 | |||
998 | 2. David Woodhouse\ dwmw2@infradead.org | ||
999 | |||
1000 | 3. Thomas Gleixner\ tglx@linutronix.de | ||
1001 | |||
1002 | A lot of users have provided bugfixes, improvements and helping hands | ||
1003 | for testing. Thanks a lot. | ||
1004 | |||
1005 | The following people have contributed to this document: | ||
1006 | |||
1007 | 1. Thomas Gleixner\ tglx@linutronix.de | ||
diff --git a/Documentation/driver-api/rapidio.rst b/Documentation/driver-api/rapidio.rst new file mode 100644 index 000000000000..71ff658ab78e --- /dev/null +++ b/Documentation/driver-api/rapidio.rst | |||
@@ -0,0 +1,107 @@ | |||
1 | ======================= | ||
2 | RapidIO Subsystem Guide | ||
3 | ======================= | ||
4 | |||
5 | :Author: Matt Porter | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | RapidIO is a high-speed switched-fabric interconnect with features aimed | ||
11 | at the embedded market. RapidIO provides support for memory-mapped I/O | ||
12 | as well as message-based transactions over the switched fabric network. | ||
13 | RapidIO has a standardized discovery mechanism not unlike the PCI bus | ||
14 | standard that allows simple detection of devices in a network. | ||
15 | |||
16 | This documentation is provided for developers intending to support | ||
17 | RapidIO on new architectures, write new drivers, or understand the | ||
18 | subsystem internals. | ||
19 | |||
20 | Known Bugs and Limitations | ||
21 | ========================== | ||
22 | |||
23 | Bugs | ||
24 | ---- | ||
25 | |||
26 | None. ;) | ||
27 | |||
28 | Limitations | ||
29 | ----------- | ||
30 | |||
31 | 1. Access/management of RapidIO memory regions is not supported | ||
32 | |||
33 | 2. Multiple host enumeration is not supported | ||
34 | |||
35 | RapidIO driver interface | ||
36 | ======================== | ||
37 | |||
38 | The subsystem provides drivers with a set of calls to gather | ||
39 | information on devices, request/map memory region resources, and | ||
40 | manage mailboxes/doorbells. | ||
41 | |||
42 | Functions | ||
43 | --------- | ||
44 | |||
45 | .. kernel-doc:: include/linux/rio_drv.h | ||
46 | :internal: | ||
47 | |||
48 | .. kernel-doc:: drivers/rapidio/rio-driver.c | ||
49 | :export: | ||
50 | |||
51 | .. kernel-doc:: drivers/rapidio/rio.c | ||
52 | :export: | ||
53 | |||
54 | Internals | ||
55 | ========= | ||
56 | |||
57 | This chapter contains the autogenerated documentation of the RapidIO | ||
58 | subsystem. | ||
59 | |||
60 | Structures | ||
61 | ---------- | ||
62 | |||
63 | .. kernel-doc:: include/linux/rio.h | ||
64 | :internal: | ||
65 | |||
66 | Enumeration and Discovery | ||
67 | ------------------------- | ||
68 | |||
69 | .. kernel-doc:: drivers/rapidio/rio-scan.c | ||
70 | :internal: | ||
71 | |||
72 | Driver functionality | ||
73 | -------------------- | ||
74 | |||
75 | .. kernel-doc:: drivers/rapidio/rio.c | ||
76 | :internal: | ||
77 | |||
78 | .. kernel-doc:: drivers/rapidio/rio-access.c | ||
79 | :internal: | ||
80 | |||
81 | Device model support | ||
82 | -------------------- | ||
83 | |||
84 | .. kernel-doc:: drivers/rapidio/rio-driver.c | ||
85 | :internal: | ||
86 | |||
87 | PPC32 support | ||
88 | ------------- | ||
89 | |||
90 | .. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c | ||
91 | :internal: | ||
92 | |||
93 | Credits | ||
94 | ======= | ||
95 | |||
96 | The following people have contributed to the RapidIO subsystem directly | ||
97 | or indirectly: | ||
98 | |||
99 | 1. Matt Porter\ mporter@kernel.crashing.org | ||
100 | |||
101 | 2. Randy Vinson\ rvinson@mvista.com | ||
102 | |||
103 | 3. Dan Malek\ dan@embeddedalley.com | ||
104 | |||
105 | The following people have contributed to this document: | ||
106 | |||
107 | 1. Matt Porter\ mporter@kernel.crashing.org | ||
diff --git a/Documentation/driver-api/s390-drivers.rst b/Documentation/driver-api/s390-drivers.rst new file mode 100644 index 000000000000..7060da136095 --- /dev/null +++ b/Documentation/driver-api/s390-drivers.rst | |||
@@ -0,0 +1,111 @@ | |||
1 | =================================== | ||
2 | Writing s390 channel device drivers | ||
3 | =================================== | ||
4 | |||
5 | :Author: Cornelia Huck | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | This document describes the interfaces available for device drivers that | ||
11 | drive s390 based channel attached I/O devices. This includes interfaces | ||
12 | for interaction with the hardware and interfaces for interacting with | ||
13 | the common driver core. Those interfaces are provided by the s390 common | ||
14 | I/O layer. | ||
15 | |||
16 | The document assumes familiarity with the technical terms associated | ||
17 | with the s390 channel I/O architecture. For a description of this | ||
18 | architecture, please refer to the "z/Architecture: Principles of | ||
19 | Operation", IBM publication no. SA22-7832. | ||
20 | |||
21 | While most I/O devices on an s390 system are typically driven through the | ||
22 | channel I/O mechanism described here, there are various other methods | ||
23 | (like the diag interface). These are out of the scope of this document. | ||
24 | |||
25 | Some additional information can also be found in the kernel source under | ||
26 | Documentation/s390/driver-model.txt. | ||
27 | |||
28 | The ccw bus | ||
29 | =========== | ||
30 | |||
31 | The ccw bus typically contains the majority of devices available to an | ||
32 | s390 system. Named after the channel command word (ccw), the basic | ||
33 | command structure used to address its devices, the ccw bus contains | ||
34 | so-called channel attached devices. They are addressed via I/O | ||
35 | subchannels, visible on the css bus. A device driver for | ||
36 | channel-attached devices, however, will never interact with the | ||
37 | subchannel directly, but only via the I/O device on the ccw bus, the ccw | ||
38 | device. | ||
39 | |||
40 | I/O functions for channel-attached devices | ||
41 | ------------------------------------------ | ||
42 | |||
43 | Some hardware structures have been translated into C structures for use | ||
44 | by the common I/O layer and device drivers. For more information on the | ||
45 | hardware structures represented here, please consult the Principles of | ||
46 | Operation. | ||
47 | |||
48 | .. kernel-doc:: arch/s390/include/asm/cio.h | ||
49 | :internal: | ||
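To illustrate what a channel command word looks like, the following userspace sketch models an 8-byte format-1 CCW and sets command chaining on every CCW except the last one, so that a channel program continues through the chain. The field layout and the CCW_FLAG_CC value are recalled from arch/s390/include/asm/cio.h and should be checked there. `struct ccw1_model` and `chain_ccws` are hypothetical. Real drivers build `struct ccw1` chains and start them through the common I/O layer functions, such as ccw_device_start(), rather than touching hardware directly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of a format-1 channel command word, assumed to mirror
 * struct ccw1 in arch/s390/include/asm/cio.h (8 bytes, 31-bit address).
 */
struct ccw1_model {
    uint8_t  cmd_code;  /* command code, e.g. read or write */
    uint8_t  flags;     /* chaining and control flags */
    uint16_t count;     /* byte count of the data area */
    uint32_t cda;       /* channel data address */
};

#define CCW_FLAG_CC 0x40  /* command chaining: continue with the next CCW */

/* Set command chaining on every CCW except the last one of the program. */
void chain_ccws(struct ccw1_model *ccws, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            ccws[i].flags |= CCW_FLAG_CC;
        else
            ccws[i].flags &= (uint8_t)~CCW_FLAG_CC;
    }
}
```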
50 | |||
51 | ccw devices | ||
52 | ----------- | ||
53 | |||
54 | Devices that want to initiate channel I/O need to attach to the ccw bus. | ||
55 | Interaction with the driver core is done via the common I/O layer, which | ||
56 | provides the abstractions of ccw devices and ccw device drivers. | ||
57 | |||
58 | The functions that initiate or terminate channel I/O all act upon a ccw | ||
59 | device structure. Device drivers must not bypass those functions or | ||
60 | strange side effects may happen. | ||
61 | |||
62 | .. kernel-doc:: arch/s390/include/asm/ccwdev.h | ||
63 | :internal: | ||
64 | |||
65 | .. kernel-doc:: drivers/s390/cio/device.c | ||
66 | :export: | ||
67 | |||
68 | .. kernel-doc:: drivers/s390/cio/device_ops.c | ||
69 | :export: | ||
70 | |||
71 | The channel-measurement facility | ||
72 | -------------------------------- | ||
73 | |||
74 | The channel-measurement facility provides a means to collect measurement | ||
75 | data which is made available by the channel subsystem for each channel | ||
76 | attached device. | ||
77 | |||
78 | .. kernel-doc:: arch/s390/include/asm/cmb.h | ||
79 | :internal: | ||
80 | |||
81 | .. kernel-doc:: drivers/s390/cio/cmf.c | ||
82 | :export: | ||
83 | |||
84 | The ccwgroup bus | ||
85 | ================ | ||
86 | |||
87 | The ccwgroup bus only contains artificial devices, created by the user. | ||
88 | Many networking devices (e.g. qeth) are in fact composed of several ccw | ||
89 | devices (like read, write and data channel for qeth). The ccwgroup bus | ||
90 | provides a mechanism to create a meta-device which contains those ccw | ||
91 | devices as slave devices and can be associated with the netdevice. | ||
92 | |||
93 | ccw group devices | ||
94 | ----------------- | ||
95 | |||
96 | .. kernel-doc:: arch/s390/include/asm/ccwgroup.h | ||
97 | :internal: | ||
98 | |||
99 | .. kernel-doc:: drivers/s390/cio/ccwgroup.c | ||
100 | :export: | ||
101 | |||
102 | Generic interfaces | ||
103 | ================== | ||
104 | |||
105 | Some interfaces are available to other drivers that do not necessarily | ||
106 | have anything to do with the busses described above, but still are | ||
107 | indirectly using basic infrastructure in the common I/O layer. One | ||
108 | example is the support for adapter interrupts. | ||
109 | |||
110 | .. kernel-doc:: drivers/s390/cio/airq.c | ||
111 | :export: | ||
diff --git a/Documentation/driver-api/scsi.rst b/Documentation/driver-api/scsi.rst new file mode 100644 index 000000000000..859fb672319f --- /dev/null +++ b/Documentation/driver-api/scsi.rst | |||
@@ -0,0 +1,344 @@ | |||
1 | ===================== | ||
2 | SCSI Interfaces Guide | ||
3 | ===================== | ||
4 | |||
5 | :Author: James Bottomley | ||
6 | :Author: Rob Landley | ||
7 | |||
8 | Introduction | ||
9 | ============ | ||
10 | |||
11 | Protocol vs bus | ||
12 | --------------- | ||
13 | |||
14 | Once upon a time, the Small Computer System Interface defined both a | ||
15 | parallel I/O bus and a data protocol to connect a wide variety of | ||
16 | peripherals (disk drives, tape drives, modems, printers, scanners, | ||
17 | optical drives, test equipment, and medical devices) to a host computer. | ||
18 | |||
19 | Although the old parallel (fast/wide/ultra) SCSI bus has largely fallen | ||
20 | out of use, the SCSI command set is more widely used than ever to | ||
21 | communicate with devices over a number of different busses. | ||
22 | |||
23 | The `SCSI protocol <http://www.t10.org/scsi-3.htm>`__ is a big-endian | ||
24 | peer-to-peer packet-based protocol. SCSI commands are 6, 10, 12, or 16 | ||
25 | bytes long, often followed by an associated data payload. | ||
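As a concrete example of a fixed-size, big-endian command, the sketch below fills in a 10-byte READ(10) command descriptor block (opcode 0x28). The 32-bit logical block address occupies bytes 2-5 and the 16-bit transfer length, in blocks, occupies bytes 7-8, both most significant byte first. The function name `build_read10` is hypothetical. Flags, group number and control bytes are left at zero for simplicity.

```c
#include <stdint.h>
#include <string.h>

#define READ_10 0x28  /* opcode of the 10-byte READ command */

/* Build a READ(10) CDB; multi-byte fields are big-endian on the wire. */
void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = READ_10;
    cdb[2] = (uint8_t)(lba >> 24);     /* logical block address, MSB first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);  /* transfer length in blocks */
    cdb[8] = (uint8_t)nblocks;
    /* bytes 1, 6 and 9 (flags, group number, control) stay zero */
}
```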
26 | |||
27 | SCSI commands can be transported over just about any kind of bus, and | ||
28 | are the default protocol for storage devices attached to USB, SATA, SAS, | ||
29 | Fibre Channel, FireWire, and ATAPI devices. SCSI packets are also | ||
30 | commonly exchanged over InfiniBand, | ||
31 | `I2O <http://i2o.shadowconnect.com/faq.php>`__, TCP/IP | ||
32 | (`iSCSI <https://en.wikipedia.org/wiki/ISCSI>`__), even `Parallel | ||
33 | ports <http://cyberelk.net/tim/parport/parscsi.html>`__. | ||
34 | |||
35 | Design of the Linux SCSI subsystem | ||
36 | ---------------------------------- | ||
37 | |||
38 | The SCSI subsystem uses a three-layer design, with upper, mid, and low | ||
39 | layers. Every operation involving the SCSI subsystem (such as reading a | ||
40 | sector from a disk) uses one driver at each of the 3 levels: one upper | ||
41 | layer driver, one lower layer driver, and the SCSI midlayer. | ||
42 | |||
43 | The SCSI upper layer provides the interface between userspace and the | ||
44 | kernel, in the form of block and char device nodes for I/O and ioctl(). | ||
45 | The SCSI lower layer contains drivers for specific hardware devices. | ||
46 | |||
47 | In between is the SCSI mid-layer, analogous to a network routing layer | ||
48 | such as the IPv4 stack. The SCSI mid-layer routes a packet-based data | ||
49 | protocol between the upper layer's /dev nodes and the corresponding | ||
50 | devices in the lower layer. It manages command queues, provides error | ||
51 | handling and power management functions, and responds to ioctl() | ||
52 | requests. | ||
53 | |||
54 | SCSI upper layer | ||
55 | ================ | ||
56 | |||
57 | The upper layer supports the user-kernel interface by providing device | ||
58 | nodes. | ||
59 | |||
60 | sd (SCSI Disk) | ||
61 | -------------- | ||
62 | |||
63 | sd (sd_mod.o) | ||
64 | |||
65 | sr (SCSI CD-ROM) | ||
66 | ---------------- | ||
67 | |||
68 | sr (sr_mod.o) | ||
69 | |||
70 | st (SCSI Tape) | ||
71 | -------------- | ||
72 | |||
73 | st (st.o) | ||
74 | |||
75 | sg (SCSI Generic) | ||
76 | ----------------- | ||
77 | |||
78 | sg (sg.o) | ||
79 | |||
80 | ch (SCSI Media Changer) | ||
81 | ----------------------- | ||
82 | |||
83 | ch (ch.c) | ||
84 | |||
85 | SCSI mid layer | ||
86 | ============== | ||
87 | |||
88 | SCSI midlayer implementation | ||
89 | ---------------------------- | ||
90 | |||
91 | include/scsi/scsi_device.h | ||
92 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
93 | |||
94 | .. kernel-doc:: include/scsi/scsi_device.h | ||
95 | :internal: | ||
96 | |||
97 | drivers/scsi/scsi.c | ||
98 | ~~~~~~~~~~~~~~~~~~~ | ||
99 | |||
100 | Main file for the SCSI midlayer. | ||
101 | |||
102 | .. kernel-doc:: drivers/scsi/scsi.c | ||
103 | :export: | ||
104 | |||
105 | drivers/scsi/scsicam.c | ||
106 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
107 | |||
108 | `SCSI Common Access | ||
109 | Method <http://www.t10.org/ftp/t10/drafts/cam/cam-r12b.pdf>`__ support | ||
110 | functions, for use with HDIO_GETGEO, etc. | ||
111 | |||
112 | .. kernel-doc:: drivers/scsi/scsicam.c | ||
113 | :export: | ||
114 | |||
115 | drivers/scsi/scsi_error.c | ||
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
117 | |||
118 | Common SCSI error/timeout handling routines. | ||
119 | |||
120 | .. kernel-doc:: drivers/scsi/scsi_error.c | ||
121 | :export: | ||
122 | |||
123 | drivers/scsi/scsi_devinfo.c | ||
124 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
125 | |||
126 | Manage scsi_dev_info_list, which tracks blacklisted and whitelisted | ||
127 | devices. | ||
128 | |||
129 | .. kernel-doc:: drivers/scsi/scsi_devinfo.c | ||
130 | :internal: | ||
131 | |||
132 | drivers/scsi/scsi_ioctl.c | ||
133 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
134 | |||
135 | Handle ioctl() calls for SCSI devices. | ||
136 | |||
137 | .. kernel-doc:: drivers/scsi/scsi_ioctl.c | ||
138 | :export: | ||
139 | |||
140 | drivers/scsi/scsi_lib.c | ||
141 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
142 | |||
143 | SCSI queuing library. | ||
144 | |||
145 | .. kernel-doc:: drivers/scsi/scsi_lib.c | ||
146 | :export: | ||
147 | |||
148 | drivers/scsi/scsi_lib_dma.c | ||
149 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
150 | |||
151 | SCSI library functions depending on DMA (map and unmap scatter-gather | ||
152 | lists). | ||
153 | |||
154 | .. kernel-doc:: drivers/scsi/scsi_lib_dma.c | ||
155 | :export: | ||
156 | |||
157 | drivers/scsi/scsi_module.c | ||
158 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
159 | |||
160 | The file drivers/scsi/scsi_module.c contains legacy support for | ||
161 | old-style host templates. It should never be used by any new driver. | ||
162 | |||
163 | drivers/scsi/scsi_proc.c | ||
164 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
165 | |||
166 | The functions in this file provide an interface between the PROC file | ||
167 | system and the SCSI device drivers. It is mainly used for debugging, | ||
168 | statistics and to pass information directly to the low-level driver, i.e. | ||
169 | plumbing to manage /proc/scsi/\*. | ||
170 | |||
171 | .. kernel-doc:: drivers/scsi/scsi_proc.c | ||
172 | :internal: | ||
173 | |||
174 | drivers/scsi/scsi_netlink.c | ||
175 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
176 | |||
177 | Infrastructure to provide async events from transports to userspace via | ||
178 | netlink, using a single NETLINK_SCSITRANSPORT protocol for all | ||
179 | transports. See `the original patch | ||
180 | submission <http://marc.info/?l=linux-scsi&m=115507374832500&w=2>`__ for | ||
181 | more details. | ||
182 | |||
183 | .. kernel-doc:: drivers/scsi/scsi_netlink.c | ||
184 | :internal: | ||
185 | |||
186 | drivers/scsi/scsi_scan.c | ||
187 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
188 | |||
189 | Scan a host to determine which (if any) devices are attached. The | ||
190 | general scanning/probing algorithm is as follows; exceptions are made to | ||
191 | it depending on device-specific flags, compilation options, and global | ||
192 | variable (boot or module load time) settings. A specific LUN is scanned | ||
193 | via an INQUIRY command; if the LUN has a device attached, a scsi_device | ||
194 | is allocated and set up for it. For every id of every channel on the | ||
195 | given host, start by scanning LUN 0. Skip hosts that don't respond at | ||
196 | all to a scan of LUN 0. Otherwise, if LUN 0 has a device attached, | ||
197 | allocate and set up a scsi_device for it. If the target is SCSI-3 or up, | ||
198 | issue a REPORT LUNS, and scan all of the LUNs returned by the REPORT LUNS; | ||
199 | else, sequentially scan LUNs up until some maximum is reached, or a LUN | ||
200 | is seen that cannot have a device attached to it. | ||
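The per-target part of the algorithm above can be sketched against a simulated target. This is a simplified model, not the code in scsi_scan.c. `struct sim_target`, `scan_target` and MAX_SEQ_LUNS are hypothetical. INQUIRY results are reduced to presence flags, and the sequential scan simply stops at the first LUN without a device.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEQ_LUNS 8  /* hypothetical limit for the sequential scan */

/* Hypothetical, simplified description of one target (channel/id). */
struct sim_target {
    bool lun0_present;        /* does INQUIRY on LUN 0 find a device? */
    int scsi_level;           /* 3 = SCSI-3 or later */
    const int *report_luns;   /* LUNs a REPORT LUNS would return */
    size_t n_report;
    const bool *seq_present;  /* MAX_SEQ_LUNS entries; [lun] = device present */
};

/*
 * Fill `found` with the LUNs that would get a scsi_device and return how
 * many were found (at most `cap`).
 */
size_t scan_target(const struct sim_target *t, int *found, size_t cap)
{
    size_t n = 0;

    if (!t->lun0_present)
        return 0;                  /* nothing at LUN 0: skip this target */
    if (n < cap)
        found[n++] = 0;            /* allocate a device for LUN 0 */

    if (t->scsi_level >= 3) {
        /* SCSI-3 or later: scan exactly the LUNs REPORT LUNS lists */
        for (size_t i = 0; i < t->n_report && n < cap; i++)
            if (t->report_luns[i] != 0)
                found[n++] = t->report_luns[i];
    } else {
        /* older targets: probe LUNs in order until one has no device */
        for (int lun = 1; lun < MAX_SEQ_LUNS && n < cap; lun++) {
            if (!t->seq_present[lun])
                break;
            found[n++] = lun;
        }
    }
    return n;
}
```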
201 | |||
202 | .. kernel-doc:: drivers/scsi/scsi_scan.c | ||
203 | :internal: | ||
204 | |||
205 | drivers/scsi/scsi_sysctl.c | ||
206 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
207 | |||
208 | Set up the sysctl entry "/proc/sys/dev/scsi/logging_level" | ||
209 | (DEV_SCSI_LOGGING_LEVEL), which sets/returns scsi_logging_level. | ||
210 | |||
211 | drivers/scsi/scsi_sysfs.c | ||
212 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
213 | |||
214 | SCSI sysfs interface routines. | ||
215 | |||
216 | .. kernel-doc:: drivers/scsi/scsi_sysfs.c | ||
217 | :export: | ||
218 | |||
219 | drivers/scsi/hosts.c | ||
220 | ~~~~~~~~~~~~~~~~~~~~ | ||
221 | |||
222 | mid to lowlevel SCSI driver interface | ||
223 | |||
224 | .. kernel-doc:: drivers/scsi/hosts.c | ||
225 | :export: | ||
226 | |||
227 | drivers/scsi/constants.c | ||
228 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
229 | |||
230 | mid to lowlevel SCSI driver interface | ||
231 | |||
232 | .. kernel-doc:: drivers/scsi/constants.c | ||
233 | :export: | ||
234 | |||
235 | Transport classes | ||
236 | ----------------- | ||
237 | |||
238 | Transport classes are service libraries for drivers in the SCSI lower | ||
239 | layer, which expose transport attributes in sysfs. | ||
240 | |||
241 | Fibre Channel transport | ||
242 | ~~~~~~~~~~~~~~~~~~~~~~~ | ||
243 | |||
244 | The file drivers/scsi/scsi_transport_fc.c defines transport attributes | ||
245 | for Fibre Channel. | ||
246 | |||
247 | .. kernel-doc:: drivers/scsi/scsi_transport_fc.c | ||
248 | :export: | ||
249 | |||
250 | iSCSI transport class | ||
251 | ~~~~~~~~~~~~~~~~~~~~~ | ||
252 | |||
253 | The file drivers/scsi/scsi_transport_iscsi.c defines transport | ||
254 | attributes for the iSCSI class, which sends SCSI packets over TCP/IP | ||
255 | connections. | ||
256 | |||
257 | .. kernel-doc:: drivers/scsi/scsi_transport_iscsi.c | ||
258 | :export: | ||
259 | |||
260 | Serial Attached SCSI (SAS) transport class | ||
261 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
262 | |||
263 | The file drivers/scsi/scsi_transport_sas.c defines transport | ||
264 | attributes for Serial Attached SCSI, a serial SCSI transport related to | ||
265 | SATA and aimed at large high-end systems. | ||
266 | |||
267 | The SAS transport class contains common code to deal with SAS HBAs, an | ||
268 | approximated representation of SAS topologies in the driver model, and | ||
269 | various sysfs attributes to expose these topologies and management | ||
270 | interfaces to userspace. | ||
271 | |||
272 | In addition to the basic SCSI core objects, this transport class | ||
273 | introduces two additional intermediate objects: the SAS PHY, as | ||
274 | represented by struct sas_phy, defines an "outgoing" PHY on a SAS HBA or | ||
275 | Expander, and the SAS remote PHY, represented by struct sas_rphy, defines | ||
276 | an "incoming" PHY on a SAS Expander or end device. Note that this is | ||
277 | purely a software concept; the underlying hardware for a PHY and a | ||
278 | remote PHY is exactly the same. | ||
279 | |||
280 | There is no concept of a SAS port in this code; users can see which PHYs | ||
281 | form a wide port based on the port_identifier attribute, which is the | ||
282 | same for all PHYs in a port. | ||
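Since ports are implicit, a tool reading the PHYs' port_identifier values has to group them itself. The sketch below works on a hypothetical in-memory snapshot of those values rather than on sysfs. It counts the distinct ports and reports the width of a given port, where a width greater than one indicates a wide port. `struct phy_info`, `port_width` and `count_ports` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical snapshot of the port_identifier attribute of each PHY. */
struct phy_info {
    int phy_id;
    int port_identifier;
};

/* Number of PHYs that share `port_id`; more than one means a wide port. */
size_t port_width(const struct phy_info *phys, size_t n, int port_id)
{
    size_t width = 0;

    for (size_t i = 0; i < n; i++)
        if (phys[i].port_identifier == port_id)
            width++;
    return width;
}

/* Number of distinct ports formed by the PHYs. */
size_t count_ports(const struct phy_info *phys, size_t n)
{
    size_t ports = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* count phys[i] only if no earlier PHY had the same identifier */
        for (j = 0; j < i; j++)
            if (phys[j].port_identifier == phys[i].port_identifier)
                break;
        if (j == i)
            ports++;
    }
    return ports;
}
```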
283 | |||
284 | .. kernel-doc:: drivers/scsi/scsi_transport_sas.c | ||
285 | :export: | ||
286 | |||
287 | SATA transport class | ||
288 | ~~~~~~~~~~~~~~~~~~~~ | ||
289 | |||
290 | The SATA transport is handled by libata, which has its own book of | ||
291 | documentation in this directory. | ||
292 | |||
293 | Parallel SCSI (SPI) transport class | ||
294 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
295 | |||
296 | The file drivers/scsi/scsi_transport_spi.c defines transport | ||
297 | attributes for traditional (fast/wide/ultra) SCSI busses. | ||
298 | |||
299 | .. kernel-doc:: drivers/scsi/scsi_transport_spi.c | ||
300 | :export: | ||
301 | |||
302 | SCSI RDMA (SRP) transport class | ||
303 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
304 | |||
305 | The file drivers/scsi/scsi_transport_srp.c defines transport | ||
306 | attributes for SCSI over Remote Direct Memory Access. | ||
307 | |||
308 | .. kernel-doc:: drivers/scsi/scsi_transport_srp.c | ||
309 | :export: | ||
310 | |||
311 | SCSI lower layer | ||
312 | ================ | ||
313 | |||
314 | Host Bus Adapter transport types | ||
315 | -------------------------------- | ||
316 | |||
317 | Many modern device controllers use the SCSI command set as a protocol to | ||
318 | communicate with their devices through many different types of physical | ||
319 | connections. | ||
320 | |||
321 | In SCSI language a bus capable of carrying SCSI commands is called a | ||
322 | "transport", and a controller connecting to such a bus is called a "host | ||
323 | bus adapter" (HBA). | ||
324 | |||
325 | Debug transport | ||
326 | ~~~~~~~~~~~~~~~ | ||
327 | |||
328 | The file drivers/scsi/scsi_debug.c simulates a host adapter with a | ||
329 | variable number of disks (or disk-like devices) attached, sharing a | ||
330 | common amount of RAM. It does a lot of checking to make sure that blocks | ||
331 | are not getting mixed up, and panics the kernel if anything out of | ||
332 | the ordinary is seen. | ||
333 | |||
334 | To be more realistic, the simulated devices have the transport | ||
335 | attributes of SAS disks. | ||
336 | |||
337 | For documentation see http://sg.danny.cz/sg/sdebug26.html | ||
338 | |||
339 | todo | ||
340 | ~~~~ | ||
341 | |||
342 | Parallel (fast/wide/ultra) SCSI, USB, SATA, SAS, Fibre Channel, | ||
343 | FireWire, ATAPI devices, InfiniBand, I2O, iSCSI, Parallel ports, | ||
344 | netlink... | ||
diff --git a/Documentation/driver-api/w1.rst b/Documentation/driver-api/w1.rst new file mode 100644 index 000000000000..9963cca788a1 --- /dev/null +++ b/Documentation/driver-api/w1.rst | |||
@@ -0,0 +1,70 @@ | |||
1 | ====================== | ||
2 | W1: Dallas' 1-wire bus | ||
3 | ====================== | ||
4 | |||
5 | :Author: David Fries | ||
6 | |||
7 | W1 API internal to the kernel | ||
8 | ============================= | ||
9 | |||
10 | W1 API internal to the kernel | ||
11 | ----------------------------- | ||
12 | |||
13 | include/linux/w1.h | ||
14 | ~~~~~~~~~~~~~~~~~~ | ||
15 | |||
16 | W1 kernel API functions. | ||
17 | |||
18 | .. kernel-doc:: include/linux/w1.h | ||
19 | :internal: | ||
20 | |||
21 | drivers/w1/w1.c | ||
22 | ~~~~~~~~~~~~~~~ | ||
23 | |||
24 | W1 core functions. | ||
25 | |||
26 | .. kernel-doc:: drivers/w1/w1.c | ||
27 | :internal: | ||
28 | |||
29 | drivers/w1/w1_family.c | ||
30 | ~~~~~~~~~~~~~~~~~~~~~~~ | ||
31 | |||
32 | Allows registering device family operations. | ||
33 | |||
34 | .. kernel-doc:: drivers/w1/w1_family.c | ||
35 | :export: | ||
36 | |||
37 | drivers/w1/w1_internal.h | ||
38 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
39 | |||
40 | W1 internal initialization for master devices. | ||
41 | |||
42 | .. kernel-doc:: drivers/w1/w1_internal.h | ||
43 | :internal: | ||
44 | |||
45 | drivers/w1/w1_int.c | ||
46 | ~~~~~~~~~~~~~~~~~~~~ | ||
47 | |||
48 | W1 internal initialization for master devices. | ||
49 | |||
50 | .. kernel-doc:: drivers/w1/w1_int.c | ||
51 | :export: | ||
52 | |||
53 | drivers/w1/w1_netlink.h | ||
54 | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||
55 | |||
56 | W1 external netlink API structures and commands. | ||
57 | |||
58 | .. kernel-doc:: drivers/w1/w1_netlink.h | ||
59 | :internal: | ||
60 | |||
61 | drivers/w1/w1_io.c | ||
62 | ~~~~~~~~~~~~~~~~~~~ | ||
63 | |||
64 | W1 input/output. | ||
65 | |||
66 | .. kernel-doc:: drivers/w1/w1_io.c | ||
67 | :export: | ||
68 | |||
69 | .. kernel-doc:: drivers/w1/w1_io.c | ||
70 | :internal: | ||
diff --git a/Documentation/fb/api.txt b/Documentation/fb/api.txt index d4ff7de85700..d52cf1e3b975 100644 --- a/Documentation/fb/api.txt +++ b/Documentation/fb/api.txt | |||
@@ -289,12 +289,12 @@ the FB_CAP_FOURCC bit in the fb_fix_screeninfo capabilities field. | |||
289 | FOURCC definitions are located in the linux/videodev2.h header. However, and | 289 | FOURCC definitions are located in the linux/videodev2.h header. However, and |
290 | despite starting with the V4L2_PIX_FMT_prefix, they are not restricted to V4L2 | 290 | despite starting with the V4L2_PIX_FMT_prefix, they are not restricted to V4L2 |
291 | and don't require usage of the V4L2 subsystem. FOURCC documentation is | 291 | and don't require usage of the V4L2 subsystem. FOURCC documentation is |
292 | available in Documentation/DocBook/v4l/pixfmt.xml. | 292 | available in Documentation/media/uapi/v4l/pixfmt.rst. |
293 | 293 | ||
294 | To select a format, applications set the grayscale field to the desired FOURCC. | 294 | To select a format, applications set the grayscale field to the desired FOURCC. |
295 | For YUV formats, they should also select the appropriate colorspace by setting | 295 | For YUV formats, they should also select the appropriate colorspace by setting |
296 | the colorspace field to one of the colorspaces listed in linux/videodev2.h and | 296 | the colorspace field to one of the colorspaces listed in linux/videodev2.h and |
297 | documented in Documentation/DocBook/v4l/colorspaces.xml. | 297 | documented in Documentation/media/uapi/v4l/colorspaces.rst. |
298 | 298 | ||
299 | The red, green, blue and transp fields are not used with the FOURCC-based API. | 299 | The red, green, blue and transp fields are not used with the FOURCC-based API. |
300 | For forward compatibility reasons applications must zero those fields, and | 300 | For forward compatibility reasons applications must zero those fields, and |
diff --git a/Documentation/filesystems/conf.py b/Documentation/filesystems/conf.py new file mode 100644 index 000000000000..ea44172af5c4 --- /dev/null +++ b/Documentation/filesystems/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = "Linux Filesystems API" | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'filesystems.tex', project, | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst new file mode 100644 index 000000000000..256e10eedba4 --- /dev/null +++ b/Documentation/filesystems/index.rst | |||
@@ -0,0 +1,317 @@ | |||
1 | ===================== | ||
2 | Linux Filesystems API | ||
3 | ===================== | ||
4 | |||
5 | The Linux VFS | ||
6 | ============= | ||
7 | |||
8 | The Filesystem types | ||
9 | -------------------- | ||
10 | |||
11 | .. kernel-doc:: include/linux/fs.h | ||
12 | :internal: | ||
13 | |||
14 | The Directory Cache | ||
15 | ------------------- | ||
16 | |||
17 | .. kernel-doc:: fs/dcache.c | ||
18 | :export: | ||
19 | |||
20 | .. kernel-doc:: include/linux/dcache.h | ||
21 | :internal: | ||
22 | |||
23 | Inode Handling | ||
24 | -------------- | ||
25 | |||
26 | .. kernel-doc:: fs/inode.c | ||
27 | :export: | ||
28 | |||
29 | .. kernel-doc:: fs/bad_inode.c | ||
30 | :export: | ||
31 | |||
32 | Registration and Superblocks | ||
33 | ---------------------------- | ||
34 | |||
35 | .. kernel-doc:: fs/super.c | ||
36 | :export: | ||
37 | |||
38 | File Locks | ||
39 | ---------- | ||
40 | |||
41 | .. kernel-doc:: fs/locks.c | ||
42 | :export: | ||
43 | |||
44 | .. kernel-doc:: fs/locks.c | ||
45 | :internal: | ||
46 | |||
47 | Other Functions | ||
48 | --------------- | ||
49 | |||
50 | .. kernel-doc:: fs/mpage.c | ||
51 | :export: | ||
52 | |||
53 | .. kernel-doc:: fs/namei.c | ||
54 | :export: | ||
55 | |||
56 | .. kernel-doc:: fs/buffer.c | ||
57 | :export: | ||
58 | |||
59 | .. kernel-doc:: block/bio.c | ||
60 | :export: | ||
61 | |||
62 | .. kernel-doc:: fs/seq_file.c | ||
63 | :export: | ||
64 | |||
65 | .. kernel-doc:: fs/filesystems.c | ||
66 | :export: | ||
67 | |||
68 | .. kernel-doc:: fs/fs-writeback.c | ||
69 | :export: | ||
70 | |||
71 | .. kernel-doc:: fs/block_dev.c | ||
72 | :export: | ||
73 | |||
74 | The proc filesystem | ||
75 | =================== | ||
76 | |||
77 | sysctl interface | ||
78 | ---------------- | ||
79 | |||
80 | .. kernel-doc:: kernel/sysctl.c | ||
81 | :export: | ||
82 | |||
83 | proc filesystem interface | ||
84 | ------------------------- | ||
85 | |||
86 | .. kernel-doc:: fs/proc/base.c | ||
87 | :internal: | ||
88 | |||
89 | Events based on file descriptors | ||
90 | ================================ | ||
91 | |||
92 | .. kernel-doc:: fs/eventfd.c | ||
93 | :export: | ||
94 | |||
95 | The Filesystem for Exporting Kernel Objects | ||
96 | =========================================== | ||
97 | |||
98 | .. kernel-doc:: fs/sysfs/file.c | ||
99 | :export: | ||
100 | |||
101 | .. kernel-doc:: fs/sysfs/symlink.c | ||
102 | :export: | ||
103 | |||
104 | The debugfs filesystem | ||
105 | ====================== | ||
106 | |||
107 | debugfs interface | ||
108 | ----------------- | ||
109 | |||
110 | .. kernel-doc:: fs/debugfs/inode.c | ||
111 | :export: | ||
112 | |||
113 | .. kernel-doc:: fs/debugfs/file.c | ||
114 | :export: | ||
115 | |||
116 | The Linux Journalling API | ||
117 | ========================= | ||
118 | |||
119 | Overview | ||
120 | -------- | ||
121 | |||
122 | Details | ||
123 | ~~~~~~~ | ||
124 | |||
125 | The journalling layer is easy to use. You first need to create a | ||
126 | journal_t data structure. There are two calls to do this, depending on | ||
127 | how you decide to allocate the physical media on which the journal | ||
128 | resides. The :c:func:`jbd2_journal_init_inode` call is for journals stored in | ||
129 | filesystem inodes, and the :c:func:`jbd2_journal_init_dev` call can be used | ||
130 | for journals stored on a raw device (in a contiguous range of blocks). Both | ||
131 | return a pointer to a dynamically allocated journal_t, so when you are finally | ||
132 | finished make sure you call :c:func:`jbd2_journal_destroy` on it to free up | ||
133 | any used kernel memory. | ||
134 | |||
135 | Once you have got your journal_t object you need to 'mount' or load the | ||
136 | journal file. The journalling layer expects the space for the journal | ||
137 | to have already been allocated and initialized properly by the userspace tools. | ||
138 | When loading the journal you must call :c:func:`jbd2_journal_load` to process | ||
139 | the journal contents. If the client file system detects that the journal contents | ||
140 | do not need to be processed (or need not even be valid), it | ||
141 | may call :c:func:`jbd2_journal_wipe` to clear the journal contents before | ||
142 | calling :c:func:`jbd2_journal_load`. | ||
143 | |||
144 | Note that jbd2_journal_wipe(..,0) calls | ||
145 | :c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding | ||
146 | transactions in the journal, and similarly :c:func:`jbd2_journal_load` will | ||
147 | call :c:func:`jbd2_journal_recover` if necessary. I would advise reading | ||
148 | :c:func:`ext4_load_journal` in fs/ext4/super.c for an example of this stage. | ||
149 | |||
150 | Now you can go ahead and start modifying the underlying filesystem. | ||
151 | Almost. | ||
152 | |||
153 | You still need to actually journal your filesystem changes; this is done | ||
154 | by wrapping them in transactions. Additionally, you need to wrap | ||
155 | the modification of each of the buffers with calls to the journal layer, | ||
156 | so it knows which modifications you are actually making. To do | ||
157 | this, use :c:func:`jbd2_journal_start`, which returns a transaction handle. | ||
158 | |||
159 | :c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`, | ||
160 | which indicates the end of a transaction, are nestable calls, so you can | ||
161 | re-enter a transaction if necessary, but remember you must call | ||
162 | :c:func:`jbd2_journal_stop` the same number of times as | ||
163 | :c:func:`jbd2_journal_start` before the transaction is completed (or, more | ||
164 | accurately, leaves the update phase). Ext4/VFS makes use of this feature to | ||
165 | simplify handling of inode dirtying, quota support, etc. | ||
166 | |||
167 | Inside each transaction you need to wrap the modifications to the | ||
168 | individual buffers (blocks). Before you start to modify a buffer you | ||
169 | need to call :c:func:`jbd2_journal_get_create_access()` / | ||
170 | :c:func:`jbd2_journal_get_write_access()` / | ||
171 | :c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the | ||
172 | journalling layer to copy the unmodified | ||
173 | data if it needs to. After all, the buffer may be part of a previously | ||
174 | uncommitted transaction. At this point you are at last ready to modify a | ||
175 | buffer, and once you have done so you need to call | ||
176 | :c:func:`jbd2_journal_dirty_metadata`. Or, if you've asked for access to a | ||
177 | buffer you now know is no longer required to be pushed back to the | ||
178 | device, you can call :c:func:`jbd2_journal_forget` in much the same way as you | ||
179 | might have used :c:func:`bforget` in the past. | ||
180 | |||
181 | A :c:func:`jbd2_journal_flush` may be called at any time to commit and | ||
182 | checkpoint all your transactions. | ||
183 | |||
184 | Then at umount time, in your :c:func:`put_super`, you can call | ||
185 | :c:func:`jbd2_journal_destroy` to clean up your in-core journal object. | ||
186 | |||
187 | Unfortunately there are a couple of ways the journal layer can cause a | ||
188 | deadlock. The first thing to note is that each task can only have a | ||
189 | single outstanding transaction at any one time; remember, nothing commits | ||
190 | until the outermost :c:func:`jbd2_journal_stop`. This means you must complete | ||
191 | the transaction at the end of each file/inode/address etc. operation you | ||
192 | perform, so that the journalling system isn't re-entered on another | ||
193 | journal. This matters because transactions can't be nested/batched across differing | ||
194 | journals, and a filesystem other than yours (say ext4) may be | ||
195 | modified in a later syscall. | ||
196 | |||
197 | The second case to bear in mind is that :c:func:`jbd2_journal_start` can block | ||
198 | if there isn't enough space in the journal for your transaction (based | ||
199 | on the passed nblocks param) - when it blocks it merely(!) needs to wait | ||
200 | for transactions to complete and be committed from other tasks, so | ||
201 | essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid | ||
202 | deadlocks you must treat :c:func:`jbd2_journal_start` / | ||
203 | :c:func:`jbd2_journal_stop` as if they were semaphores and include them in | ||
204 | your semaphore ordering rules to prevent | ||
205 | deadlocks. Note that :c:func:`jbd2_journal_extend` has similar blocking | ||
206 | behaviour to :c:func:`jbd2_journal_start` so you can deadlock here just as | ||
207 | easily as on :c:func:`jbd2_journal_start`. | ||
208 | |||
209 | Try to reserve the right number of blocks the first time. ;-). This will | ||
210 | be the maximum number of blocks you are going to touch in this | ||
211 | transaction. I advise having a look at least at fs/ext4/ext4_jbd2.h to see the | ||
212 | basis ext4 uses to make these decisions. | ||
213 | |||
214 | Another wriggle to watch out for is your on-disk block allocation | ||
215 | strategy. Why? Because, if you do a delete, you need to ensure you | ||
216 | haven't reused any of the freed blocks until the transaction freeing | ||
217 | these blocks commits. If you reused these blocks and a crash happens, | ||
218 | there is no way to restore the contents of the reallocated blocks at the | ||
219 | end of the last fully committed transaction. One simple way of doing | ||
220 | this is to mark blocks as free in internal in-memory block allocation | ||
221 | structures only after the transaction freeing them commits. Ext4 uses | ||
222 | a journal commit callback for this purpose. | ||
223 | |||
224 | With journal commit callbacks you can ask the journalling layer to call | ||
225 | a callback function when the transaction is finally committed to disk, | ||
226 | so that you can do some of your own management. You ask the journalling | ||
227 | layer to call the callback by simply setting the | ||
228 | ``journal->j_commit_callback`` function pointer, and that function is | ||
229 | called after each transaction commit. You can also use | ||
230 | ``transaction->t_private_list`` for attaching entries to a transaction | ||
231 | that need processing when the transaction commits. | ||
232 | |||
233 | JBD2 also provides a way to block all transaction updates via | ||
234 | :c:func:`jbd2_journal_lock_updates()` / | ||
235 | :c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a | ||
236 | window with a clean and stable fs for a moment. E.g. | ||
237 | |||
238 | :: | ||
239 | |||
240 | |||
241 | jbd2_journal_lock_updates(journal);    /* stop new updates from starting */ | ||
242 | jbd2_journal_flush(journal);           /* checkpoint everything */ | ||
243 | /* ... do stuff on the stable fs ... */ | ||
244 | jbd2_journal_unlock_updates(journal);  /* carry on with filesystem use */ | ||
245 | |||
246 | The opportunities for abuse and DOS attacks with this should be obvious, | ||
247 | if you allow unprivileged userspace to trigger codepaths containing | ||
248 | these calls. | ||
249 | |||
250 | Summary | ||
251 | ~~~~~~~ | ||
252 | |||
253 | Using the journal is a matter of wrapping the different context changes, | ||
254 | namely each mount, each modification (transaction) and each changed | ||
255 | buffer to tell the journalling layer about them. | ||
256 | |||
257 | Data Types | ||
258 | ---------- | ||
259 | |||
260 | The journalling layer uses typedefs to 'hide' the concrete definitions | ||
261 | of the structures used. As a client of the JBD2 layer you can just rely | ||
262 | on using the pointer as a magic cookie of some sort. Obviously the | ||
263 | hiding is not enforced as this is 'C'. | ||
264 | |||
265 | Structures | ||
266 | ~~~~~~~~~~ | ||
267 | |||
268 | .. kernel-doc:: include/linux/jbd2.h | ||
269 | :internal: | ||
270 | |||
271 | Functions | ||
272 | --------- | ||
273 | |||
274 | The functions here are split into two groups: those that affect a journal | ||
275 | as a whole, and those which are used to manage transactions. | ||
276 | |||
277 | Journal Level | ||
278 | ~~~~~~~~~~~~~ | ||
279 | |||
280 | .. kernel-doc:: fs/jbd2/journal.c | ||
281 | :export: | ||
282 | |||
283 | .. kernel-doc:: fs/jbd2/recovery.c | ||
284 | :internal: | ||
285 | |||
286 | Transaction Level | ||
287 | ~~~~~~~~~~~~~~~~~~ | ||
288 | |||
289 | .. kernel-doc:: fs/jbd2/transaction.c | ||
290 | |||
291 | See also | ||
292 | -------- | ||
293 | |||
294 | `Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen | ||
295 | Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__ | ||
296 | |||
297 | `Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen | ||
298 | Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__ | ||
299 | |||
300 | splice API | ||
301 | ========== | ||
302 | |||
303 | splice is a method for moving blocks of data around inside the kernel, | ||
304 | without continually transferring them between the kernel and user space. | ||
305 | |||
306 | .. kernel-doc:: fs/splice.c | ||
307 | |||
308 | pipes API | ||
309 | ========= | ||
310 | |||
311 | Pipe interfaces are all for in-kernel (builtin image) use. They are not | ||
312 | exported for use by modules. | ||
313 | |||
314 | .. kernel-doc:: include/linux/pipe_fs_i.h | ||
315 | :internal: | ||
316 | |||
317 | .. kernel-doc:: fs/pipe.c | ||
diff --git a/Documentation/filesystems/nfs/idmapper.txt b/Documentation/filesystems/nfs/idmapper.txt index fe03d10bb79a..b86831acd583 100644 --- a/Documentation/filesystems/nfs/idmapper.txt +++ b/Documentation/filesystems/nfs/idmapper.txt | |||
@@ -55,7 +55,7 @@ request-key will find the first matching line and corresponding program. In | |||
55 | this case, /some/other/program will handle all uid lookups and | 55 | this case, /some/other/program will handle all uid lookups and |
56 | /usr/sbin/nfs.idmap will handle gid, user, and group lookups. | 56 | /usr/sbin/nfs.idmap will handle gid, user, and group lookups. |
57 | 57 | ||
58 | See <file:Documentation/security/keys-request-key.txt> for more information | 58 | See <file:Documentation/security/keys/request-key.rst> for more information |
59 | about the request-key function. | 59 | about the request-key function. |
60 | 60 | ||
61 | 61 | ||
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst index 1bdb7356a310..6162d0e9dc28 100644 --- a/Documentation/gpu/todo.rst +++ b/Documentation/gpu/todo.rst | |||
@@ -228,7 +228,7 @@ The DRM reference documentation is still lacking kerneldoc in a few areas. The | |||
228 | task would be to clean up interfaces like moving functions around between | 228 | task would be to clean up interfaces like moving functions around between |
229 | files to better group them and improving the interfaces like dropping return | 229 | files to better group them and improving the interfaces like dropping return |
230 | values for functions that never fail. Then write kerneldoc for all exported | 230 | values for functions that never fail. Then write kerneldoc for all exported |
231 | functions and an overview section and integrate it all into the drm DocBook. | 231 | functions and an overview section and integrate it all into the drm book. |
232 | 232 | ||
233 | See https://dri.freedesktop.org/docs/drm/ for what's there already. | 233 | See https://dri.freedesktop.org/docs/drm/ for what's there already. |
234 | 234 | ||
diff --git a/Documentation/index.rst b/Documentation/index.rst index bc67dbf76eb0..cb7f1ba5b3b1 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst | |||
@@ -3,8 +3,8 @@ | |||
3 | You can adapt this file completely to your liking, but it should at least | 3 | You can adapt this file completely to your liking, but it should at least |
4 | contain the root `toctree` directive. | 4 | contain the root `toctree` directive. |
5 | 5 | ||
6 | Welcome to The Linux Kernel's documentation | 6 | The Linux Kernel documentation |
7 | =========================================== | 7 | ============================== |
8 | 8 | ||
9 | This is the top level of the kernel's documentation tree. Kernel | 9 | This is the top level of the kernel's documentation tree. Kernel |
10 | documentation, like the kernel itself, is very much a work in progress; | 10 | documentation, like the kernel itself, is very much a work in progress; |
@@ -51,6 +51,7 @@ merged much easier. | |||
51 | process/index | 51 | process/index |
52 | dev-tools/index | 52 | dev-tools/index |
53 | doc-guide/index | 53 | doc-guide/index |
54 | kernel-hacking/index | ||
54 | 55 | ||
55 | Kernel API documentation | 56 | Kernel API documentation |
56 | ------------------------ | 57 | ------------------------ |
@@ -67,11 +68,24 @@ needed). | |||
67 | driver-api/index | 68 | driver-api/index |
68 | core-api/index | 69 | core-api/index |
69 | media/index | 70 | media/index |
71 | networking/index | ||
70 | input/index | 72 | input/index |
71 | gpu/index | 73 | gpu/index |
72 | security/index | 74 | security/index |
73 | sound/index | 75 | sound/index |
74 | crypto/index | 76 | crypto/index |
77 | filesystems/index | ||
78 | |||
79 | Architecture-specific documentation | ||
80 | ----------------------------------- | ||
81 | |||
82 | These books provide programming details about architecture-specific | ||
83 | implementation. | ||
84 | |||
85 | .. toctree:: | ||
86 | :maxdepth: 2 | ||
87 | |||
88 | sh/index | ||
75 | 89 | ||
76 | Korean translations | 90 | Korean translations |
77 | ------------------- | 91 | ------------------- |
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index e18daca65ccd..659afd56ecdb 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt | |||
@@ -1331,7 +1331,7 @@ See subsequent chapter for the syntax of the Kbuild file. | |||
1331 | --- 7.5 mandatory-y | 1331 | --- 7.5 mandatory-y |
1332 | 1332 | ||
1333 | mandatory-y is essentially used by include/uapi/asm-generic/Kbuild.asm | 1333 | mandatory-y is essentially used by include/uapi/asm-generic/Kbuild.asm |
1334 | to define the minimun set of headers that must be exported in | 1334 | to define the minimum set of headers that must be exported in |
1335 | include/asm. | 1335 | include/asm. |
1336 | 1336 | ||
1337 | The convention is to list one subdir per line and | 1337 | The convention is to list one subdir per line and |
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt index 104740ea0041..c23e2c5ab80d 100644 --- a/Documentation/kernel-doc-nano-HOWTO.txt +++ b/Documentation/kernel-doc-nano-HOWTO.txt | |||
@@ -17,8 +17,8 @@ The format for this documentation is called the kernel-doc format. | |||
17 | It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file. | 17 | It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file. |
18 | 18 | ||
19 | This style embeds the documentation within the source files, using | 19 | This style embeds the documentation within the source files, using |
20 | a few simple conventions. The scripts/kernel-doc perl script, some | 20 | a few simple conventions. The scripts/kernel-doc perl script, the |
21 | SGML templates in Documentation/DocBook, and other tools understand | 21 | Documentation/sphinx/kerneldoc.py Sphinx extension and other tools understand |
22 | these conventions, and are used to extract this embedded documentation | 22 | these conventions, and are used to extract this embedded documentation |
23 | into various documents. | 23 | into various documents. |
24 | 24 | ||
@@ -122,15 +122,9 @@ are: | |||
122 | - scripts/kernel-doc | 122 | - scripts/kernel-doc |
123 | 123 | ||
124 | This is a perl script that hunts for the block comments and can mark | 124 | This is a perl script that hunts for the block comments and can mark |
125 | them up directly into DocBook, man, text, and HTML. (No, not | 125 | them up directly into DocBook, ReST, man, text, and HTML. (No, not |
126 | texinfo.) | 126 | texinfo.) |
127 | 127 | ||
128 | - Documentation/DocBook/*.tmpl | ||
129 | |||
130 | These are SGML template files, which are normal SGML files with | ||
131 | special place-holders for where the extracted documentation should | ||
132 | go. | ||
133 | |||
134 | - scripts/docproc.c | 128 | - scripts/docproc.c |
135 | 129 | ||
136 | This is a program for converting SGML template files into SGML | 130 | This is a program for converting SGML template files into SGML |
@@ -145,25 +139,18 @@ are: | |||
145 | 139 | ||
146 | - Makefile | 140 | - Makefile |
147 | 141 | ||
148 | The targets 'xmldocs', 'psdocs', 'pdfdocs', and 'htmldocs' are used | 142 | The targets 'xmldocs', 'latexdocs', 'pdfdocs', 'epubdocs' and 'htmldocs' |
149 | to build XML DocBook files, PostScript files, PDF files, and html files | 143 | are used to build XML DocBook files, LaTeX files, PDF files, |
150 | in Documentation/DocBook. The older target 'sgmldocs' is equivalent | 144 | ePub files and html files in Documentation/. |
151 | to 'xmldocs'. | ||
152 | |||
153 | - Documentation/DocBook/Makefile | ||
154 | |||
155 | This is where C files are associated with SGML templates. | ||
156 | |||
157 | 145 | ||
158 | How to extract the documentation | 146 | How to extract the documentation |
159 | -------------------------------- | 147 | -------------------------------- |
160 | 148 | ||
161 | If you just want to read the ready-made books on the various | 149 | If you just want to read the ready-made books on the various |
162 | subsystems (see Documentation/DocBook/*.tmpl), just type 'make | 150 | subsystems, just type 'make epubdocs', or 'make pdfdocs', or 'make htmldocs', |
163 | psdocs', or 'make pdfdocs', or 'make htmldocs', depending on your | 151 | depending on your preference. If you would rather read a different format, |
164 | preference. If you would rather read a different format, you can type | 152 | you can type 'make xmldocs' and then use DocBook tools to convert |
165 | 'make xmldocs' and then use DocBook tools to convert | 153 | Documentation/output/*.xml to a format of your choice (for example, |
166 | Documentation/DocBook/*.xml to a format of your choice (for example, | ||
167 | 'db2html ...' if 'make htmldocs' was not defined). | 154 | 'db2html ...' if 'make htmldocs' was not defined). |
168 | 155 | ||
169 | If you want to see man pages instead, you can do this: | 156 | If you want to see man pages instead, you can do this: |
@@ -329,37 +316,7 @@ This is done by using a DOC: section keyword with a section title. E.g.: | |||
329 | * hardware, software, or its subject(s). | 316 | * hardware, software, or its subject(s). |
330 | */ | 317 | */ |
331 | 318 | ||
332 | DOC: sections are used in SGML templates files as indicated below. | 319 | DOC: sections are used in ReST files. |
333 | |||
334 | |||
335 | How to make new SGML template files | ||
336 | ----------------------------------- | ||
337 | |||
338 | SGML template files (*.tmpl) are like normal SGML files, except that | ||
339 | they can contain escape sequences where extracted documentation should | ||
340 | be inserted. | ||
341 | |||
342 | !E<filename> is replaced by the documentation, in <filename>, for | ||
343 | functions that are exported using EXPORT_SYMBOL: the function list is | ||
344 | collected from files listed in Documentation/DocBook/Makefile. | ||
345 | |||
346 | !I<filename> is replaced by the documentation for functions that are | ||
347 | _not_ exported using EXPORT_SYMBOL. | ||
348 | |||
349 | !D<filename> is used to name additional files to search for functions | ||
350 | exported using EXPORT_SYMBOL. | ||
351 | |||
352 | !F<filename> <function [functions...]> is replaced by the | ||
353 | documentation, in <filename>, for the functions listed. | ||
354 | |||
355 | !P<filename> <section title> is replaced by the contents of the DOC: | ||
356 | section titled <section title> from <filename>. | ||
357 | Spaces are allowed in <section title>; do not quote the <section title>. | ||
358 | |||
359 | !C<filename> is replaced by nothing, but makes the tools check that | ||
360 | all DOC: sections and documented functions, symbols, etc. are used. | ||
361 | This makes sense to use when you use !F/!P only and want to verify | ||
362 | that all documentation is included. | ||
363 | 320 | ||
364 | Tim. | 321 | Tim. |
365 | */ <twaugh@redhat.com> | 322 | */ <twaugh@redhat.com> |
diff --git a/Documentation/kernel-hacking/conf.py b/Documentation/kernel-hacking/conf.py new file mode 100644 index 000000000000..3d8acf0f33ad --- /dev/null +++ b/Documentation/kernel-hacking/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = "Kernel Hacking Guides" | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'kernel-hacking.tex', project, | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst new file mode 100644 index 000000000000..daf3883b2694 --- /dev/null +++ b/Documentation/kernel-hacking/hacking.rst | |||
@@ -0,0 +1,811 @@ | |||
1 | ============================================ | ||
2 | Unreliable Guide To Hacking The Linux Kernel | ||
3 | ============================================ | ||
4 | |||
5 | :Author: Rusty Russell | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux | ||
11 | Kernel Hacking. This document describes the common routines and general | ||
12 | requirements for kernel code: its goal is to serve as a primer for Linux | ||
13 | kernel development for experienced C programmers. I avoid implementation | ||
14 | details: that's what the code is for, and I ignore whole tracts of | ||
15 | useful routines. | ||
16 | |||
17 | Before you read this, please understand that I never wanted to write | ||
18 | this document, being grossly under-qualified, but I always wanted to | ||
19 | read it, and this was the only way. I hope it will grow into a | ||
20 | compendium of best practice, common starting points and random | ||
21 | information. | ||
22 | |||
23 | The Players | ||
24 | =========== | ||
25 | |||
26 | At any time each of the CPUs in a system can be: | ||
27 | |||
28 | - not associated with any process, serving a hardware interrupt; | ||
29 | |||
30 | - not associated with any process, serving a softirq or tasklet; | ||
31 | |||
32 | - running in kernel space, associated with a process (user context); | ||
33 | |||
34 | - running a process in user space. | ||
35 | |||
36 | There is an ordering between these. The bottom two can preempt each | ||
37 | other, but above that is a strict hierarchy: each can only be preempted | ||
38 | by the ones above it. For example, while a softirq is running on a CPU, | ||
39 | no other softirq will preempt it, but a hardware interrupt can. However, | ||
40 | any other CPUs in the system execute independently. | ||
41 | |||
42 | We'll see a number of ways that the user context can block interrupts, | ||
43 | to become truly non-preemptable. | ||
44 | |||
User Context
------------

User context is when you are coming in from a system call or other trap:
like userspace, you can be preempted by more important tasks and by
interrupts. You can sleep by calling :c:func:`schedule()`.

.. note::

    You are always in user context on module load and unload, and on
    operations on the block device layer.

In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
(``include/linux/preempt.h``) is false.

.. warning::

    Beware that if you have preemption or softirqs disabled (see below),
    :c:func:`in_interrupt()` will return a false positive.

Hardware Interrupts (Hard IRQs)
-------------------------------

Timer ticks, network cards and keyboards are examples of real hardware
which produce interrupts at any time. The kernel runs interrupt
handlers, which service the hardware. The kernel guarantees that a
handler is never re-entered: if the same interrupt arrives, it is queued
(or dropped). Because it disables interrupts, this handler has to be
fast: frequently it simply acknowledges the interrupt, marks a 'software
interrupt' for execution and exits.

You can tell you are in a hardware interrupt, because
:c:func:`in_irq()` returns true.

.. warning::

    Beware that this will return a false positive if interrupts are
    disabled (see below).

Software Interrupt Context: Softirqs and Tasklets
-------------------------------------------------

Whenever a system call is about to return to userspace, or a hardware
interrupt handler exits, any 'software interrupts' which are marked
pending (usually by hardware interrupts) are run (``kernel/softirq.c``).

Much of the real interrupt handling work is done here. Early in the
transition to SMP, there were only 'bottom halves' (BHs), which didn't
take advantage of multiple CPUs. Shortly after we switched from wind-up
computers made of match-sticks and snot, we abandoned this limitation
and switched to 'softirqs'.

``include/linux/interrupt.h`` lists the different softirqs. A very
important softirq is the timer softirq (``include/linux/timer.h``): you
can register to have it call functions for you in a given length of
time.

Softirqs are often a pain to deal with, since the same softirq can run
simultaneously on more than one CPU. For this reason, tasklets
(``include/linux/interrupt.h``) are more often used: they are
dynamically registrable (meaning you can have as many as you want), and
they also guarantee that any tasklet will only run on one CPU at any
time, although different tasklets can run simultaneously.

.. warning::

    The name 'tasklet' is misleading: they have nothing to do with
    'tasks', and probably more to do with some bad vodka Alexey
    Kuznetsov had at the time.

You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/preempt.h``).

.. warning::

    Beware that this will return a false positive if a
    :ref:`bottom half lock <local_bh_disable>` is held.

Some Basic Rules
================

No memory protection
    If you corrupt memory, whether in user context or interrupt context,
    the whole machine will crash. Are you sure you can't do what you
    want in userspace?

No floating point or MMX
    The FPU context is not saved; even in user context the FPU state
    probably won't correspond with the current process: you would mess
    with some user process' FPU state. If you really want to do this,
    you would have to explicitly save/restore the full FPU state (and
    avoid context switches). It is generally a bad idea; use fixed point
    arithmetic first.

A rigid stack limit
    Depending on configuration options, the kernel stack is about 3K to
    6K for most 32-bit architectures: it's about 14K on most 64-bit
    archs, and often shared with interrupts so you can't use it all.
    Avoid deep recursion and huge local arrays on the stack (allocate
    them dynamically instead).

The Linux kernel is portable
    Let's keep it that way. Your code should be 64-bit clean, and
    endian-independent. You should also minimize CPU-specific stuff,
    e.g. inline assembly should be cleanly encapsulated and minimized to
    ease porting. Generally it should be restricted to the
    architecture-dependent part of the kernel tree.

ioctls: Not writing a new system call
=====================================

A system call generally looks like this::

    asmlinkage long sys_mycall(int arg)
    {
            return 0;
    }

First, in most cases you don't want to create a new system call. You
create a character device and implement an appropriate ioctl for it.
This is much more flexible than system calls, doesn't have to be entered
in every architecture's ``include/asm/unistd.h`` and
``arch/kernel/entry.S`` file, and is much more likely to be accepted by
Linus.

If all your routine does is read or write some parameter, consider
implementing a ``sysfs`` interface instead.

Inside the ioctl you're in user context to a process. When an error
occurs you return a negated errno (see
``include/uapi/asm-generic/errno-base.h``,
``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.

After you have slept, you should check if a signal occurred: the
Unix/Linux way of handling signals is to temporarily exit the system
call with the ``-ERESTARTSYS`` error. The system call entry code will
switch back to user context, process the signal handler and then your
system call will be restarted (unless the user disabled that). So you
should be prepared to process the restart, e.g. if you're in the middle
of manipulating some data structure.

::

    if (signal_pending(current))
            return -ERESTARTSYS;

If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
Idiom::

    cond_resched(); /* Will sleep */

A short note on interface design: the UNIX system call motto is "Provide
mechanism not policy".

Recipes for Deadlock
====================

You cannot call any routines which may sleep, unless:

- You are in user context.

- You do not own any spinlocks.

- You have interrupts enabled (actually, Andi Kleen says that the
  scheduling code will enable them for you, but that's probably not
  what you wanted).

Note that some functions may sleep implicitly: common ones are the user
space access functions (\*_user) and memory allocation functions
without ``GFP_ATOMIC``.

You should always compile your kernel with ``CONFIG_DEBUG_ATOMIC_SLEEP``
on, and it will warn you if you break these rules. If you **do** break
the rules, you will eventually lock up your box.

Really.

Common Routines
===============

:c:func:`printk()`
------------------

Defined in ``include/linux/printk.h``

:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
concatenation to give it a first "priority" argument::

    printk(KERN_INFO "i = %u\n", i);

See ``include/linux/kern_levels.h`` for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
address use::

    __be32 ipaddress;
    printk(KERN_INFO "my ip: %pI4\n", &ipaddress);

:c:func:`printk()` internally uses a 1K buffer and does not catch
overruns. Make sure that will be enough.

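The priority "argument" is really just string-literal concatenation: the level macro expands to a short prefix that the compiler merges into the format string. The userspace sketch below mirrors the definitions in ``include/linux/kern_levels.h`` (``KERN_SOH`` is ``"\001"`` and ``KERN_INFO`` is ``KERN_SOH "6"``); the ``DEMO_`` macros and ``demo_level()`` are hypothetical, for illustration only.

```c
#include <assert.h>
#include <string.h>

/*
 * Local stand-ins for the kernel's level markers (hypothetical names).
 * Escapes are processed before concatenation, so "\001" "6" is two
 * bytes: 0x01 followed by '6'.
 */
#define DEMO_KERN_SOH  "\001"
#define DEMO_KERN_INFO DEMO_KERN_SOH "6"

/* Adjacent string literals are merged by the compiler into one. */
static const char demo_fmt[] = DEMO_KERN_INFO "i = %u\n";

/* Return the log level digit encoded at the start of a format, or -1. */
static int demo_level(const char *fmt)
{
    if (fmt[0] == DEMO_KERN_SOH[0] && fmt[1] >= '0' && fmt[1] <= '7')
        return fmt[1] - '0';
    return -1;
}
```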
.. note::

    You will know when you are a real kernel hacker when you start
    typoing printf as printk in your user programs :)

.. note::

    Another sidenote: the original Unix Version 6 sources had a comment
    on top of its printf function: "Printf should not be used for
    chit-chat". You should follow that advice.

:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
---------------------------------------------------------------------------------------------------

Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``

**[SLEEPS]**

:c:func:`put_user()` and :c:func:`get_user()` are used to get
and put single values (such as an int, char, or long) from and to
userspace. A pointer into userspace should never be simply dereferenced:
data should be copied using these routines. Both return ``-EFAULT`` or
0.

:c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
more general: they copy an arbitrary amount of data to and from
userspace.

.. warning::

    Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
    return the amount of uncopied data (i.e. 0 still means success).

[Yes, this moronic interface makes me cringe. The flamewar comes up
every year or so. --RR.]

These functions may sleep implicitly. They should never be called
outside user context (it makes no sense), with interrupts disabled, or
with a spinlock held.

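Because the two families disagree about what a nonzero return means, callers usually turn the "bytes left" count into ``-EFAULT`` straight away (in kernel code, typically ``if (copy_from_user(...)) return -EFAULT;``). The sketch below models that convention in plain userspace C; ``model_copy_from_user()`` and ``model_read_request()`` are hypothetical stand-ins, and the ``accessible`` limit merely simulates a faulting user address.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical model of the copy_from_user() return convention: copy up
 * to n bytes, pretending only the first 'accessible' source bytes are
 * readable. Returns the number of bytes NOT copied (0 == success).
 */
static unsigned long model_copy_from_user(void *to, const void *from,
                                          unsigned long n,
                                          unsigned long accessible)
{
    unsigned long done = n < accessible ? n : accessible;

    memcpy(to, from, done);
    return n - done;
}

/*
 * The usual caller idiom: any nonzero "bytes left" result becomes
 * -EFAULT, so this function follows the 0 / -errno convention.
 */
static int model_read_request(char *dst, const char *src,
                              unsigned long len, unsigned long accessible)
{
    if (model_copy_from_user(dst, src, len, accessible))
        return -EFAULT;
    return 0;
}
```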
:c:func:`kmalloc()`/:c:func:`kfree()`
-------------------------------------

Defined in ``include/linux/slab.h``

**[MAY SLEEP: SEE BELOW]**

These routines are used to dynamically request pointer-aligned chunks of
memory, like malloc and free do in userspace, but
:c:func:`kmalloc()` takes an extra flag word. Important values:

``GFP_KERNEL``
    May sleep and swap to free memory. Only allowed in user context, but
    is the most reliable way to allocate memory.

``GFP_ATOMIC``
    Don't sleep. Less reliable than ``GFP_KERNEL``, but may be called
    from interrupt context. You should **really** have a good
    out-of-memory error-handling strategy.

``GFP_DMA``
    Allocate ISA DMA lower than 16MB. If you don't know what that is you
    don't need it. Very unreliable.

If you see a "sleeping function called from invalid context" warning
message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.

If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.

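The order is a power-of-two exponent: an allocation of order *n* covers 2^n contiguous pages. The kernel provides :c:func:`get_order()` for this calculation; the sketch below is a hypothetical userspace equivalent (``order_for_size()`` is not a kernel function), assuming the page size is a power of two, with a 4096-byte page in the checks.

```c
#include <assert.h>

/*
 * Hypothetical helper: smallest order such that (page_size << order)
 * >= size, i.e. how many 2^order pages are needed for 'size' bytes.
 * Assumes size >= 1 and page_size is a power of two.
 */
static unsigned int order_for_size(unsigned long size, unsigned long page_size)
{
    unsigned int order = 0;

    while ((page_size << order) < size)
        order++;
    return order;
}
```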
If you are allocating more than a page's worth of bytes you can use
:c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
map. This block is not contiguous in physical memory, but the MMU makes
it look like it is for you (so it'll only look contiguous to the CPUs,
not to external device drivers). If you really need large physically
contiguous memory for some weird device, you have a problem: it is
poorly supported in Linux because after some time memory fragmentation
in a running kernel makes it hard. The best way is to allocate the block
early in the boot process via the :c:func:`alloc_bootmem()`
routine.

Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``.

:c:func:`current()`
-------------------

Defined in ``include/asm/current.h``

This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context (it points
to whichever task happened to be interrupted).

:c:func:`mdelay()`/:c:func:`udelay()`
-------------------------------------

Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``

The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow: the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.

:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
-----------------------------------------------------------------------------------------------

Defined in ``include/asm/byteorder.h``

The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") is the general way
to do endian conversions in the kernel: they return the converted value.
All variations supply the reverse as well:
:c:func:`be32_to_cpu()`, etc.

There are two major variations of these functions: the pointer
variation, such as :c:func:`cpu_to_be32p()`, which takes a pointer
to the given type and returns the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
converts the value referred to by the pointer and returns void.

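A minimal userspace sketch of the three flavours (value, pointer and in-situ), written so that it assumes nothing about the host byte order. The ``model_`` functions are hypothetical stand-ins, not the kernel macros; kernel code would use the real helpers together with annotated types such as ``__be32``, which lets sparse flag values used in the wrong byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the big-endian byte sequence explicitly, on any host. */
static uint32_t model_cpu_to_be32(uint32_t x)
{
    unsigned char b[4] = {
        (unsigned char)(x >> 24), (unsigned char)(x >> 16),
        (unsigned char)(x >> 8),  (unsigned char)x,
    };
    uint32_t r;

    memcpy(&r, b, sizeof(r));   /* r's bytes in memory are now big-endian */
    return r;
}

/* Reverse direction: read the big-endian bytes back into a CPU value. */
static uint32_t model_be32_to_cpu(uint32_t be)
{
    unsigned char b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Pointer variant: reads through the pointer, returns the result. */
static uint32_t model_cpu_to_be32p(const uint32_t *p)
{
    return model_cpu_to_be32(*p);
}

/* In-situ variant: converts the pointed-to value in place. */
static void model_cpu_to_be32s(uint32_t *p)
{
    *p = model_cpu_to_be32(*p);
}
```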
:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
--------------------------------------------------------

Defined in ``include/linux/irqflags.h``

These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant: they save the previous state in their single
``unsigned long flags`` argument. If you know that interrupts are
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.

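Reentrant here means a function can use the save/restore pair without knowing whether its caller already disabled interrupts, because restore puts back the saved state rather than unconditionally enabling. The hypothetical userspace model below shows this with a plain flag; it touches no real interrupts, and unlike the real :c:func:`local_irq_save()` (a macro that writes into its ``flags`` argument) it returns the saved value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's interrupt-enable state. */
static bool model_irqs_enabled = true;

/* Like local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
    unsigned long flags = model_irqs_enabled;

    model_irqs_enabled = false;
    return flags;
}

/* Like local_irq_restore(): put back exactly what was saved. */
static void model_irq_restore(unsigned long flags)
{
    model_irqs_enabled = flags != 0;
}

/* A helper that may be called with interrupts on or off. */
static void model_helper(void)
{
    unsigned long flags = model_irq_save();
    /* ... critical section ... */
    model_irq_restore(flags);
}
```

Had ``model_helper()`` used an unconditional enable instead, it would have silently reopened its caller's critical section.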
.. _local_bh_disable:

:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
--------------------------------------------------------

Defined in ``include/linux/bottom_half.h``

These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.

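One way to picture why this pair nests safely is as a counter rather than an on/off switch. The sketch below is a hypothetical userspace model (the ``model_`` names are not kernel API); in the kernel the count is kept as part of the preemption counter, and :c:func:`local_bh_enable()` may also run pending softirqs once it drops to zero.

```c
#include <assert.h>

/* Hypothetical nesting count; softirqs may run only when it is zero. */
static unsigned int model_bh_count;

static void model_bh_disable(void)
{
    model_bh_count++;
}

static void model_bh_enable(void)
{
    assert(model_bh_count > 0);     /* an unbalanced enable is a bug */
    model_bh_count--;
}

static int model_softirqs_may_run(void)
{
    return model_bh_count == 0;
}
```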
:c:func:`smp_processor_id()`
----------------------------

Defined in ``include/linux/smp.h``

:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
0 and ``NR_CPUS - 1``. Note that the CPU numbers are not necessarily
continuous. You give it back with :c:func:`put_cpu()` when you
are done.

If you know you cannot be preempted by another task (i.e. you are in
interrupt context, or have preemption disabled) you can use
:c:func:`smp_processor_id()`.

``__init``/``__exit``/``__initdata``
------------------------------------

Defined in ``include/linux/init.h``

After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
after boot is complete: similarly modules discard this memory after
initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`; this
will break.

:c:func:`__initcall()`/:c:func:`module_init()`
----------------------------------------------

Defined in ``include/linux/init.h`` / ``include/linux/module.h``

Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the
:c:func:`module_init()` and :c:func:`module_exit()` macros it
is easy to write code without #ifdefs which can operate either as a
module or built into the kernel.

The :c:func:`module_init()` macro defines which function is to be
called at module insertion time (if the file is compiled as a module),
or at boot time: if the file is not compiled as a module the
:c:func:`module_init()` macro becomes equivalent to
:c:func:`__initcall()`, which through linker magic ensures that
the function is called on boot.

The function can return a negative error number to cause module loading
to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.

:c:func:`module_exit()`
-----------------------

Defined in ``include/linux/module.h``

This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only
be called if the module usage count has reached zero. This function can
also sleep, but cannot fail: everything must be cleaned up by the time
it returns.

Note that this macro is optional: if it is not present, your module will
not be removable (except for ``rmmod -f``).

:c:func:`try_module_get()`/:c:func:`module_put()`
-------------------------------------------------

Defined in ``include/linux/module.h``

These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
symbols: see below). Before calling into module code, you should call
:c:func:`try_module_get()` on that module: if it fails, then the
module is being removed and you should act as if it wasn't there.
Otherwise, you can safely enter the module, and call
:c:func:`module_put()` when you're finished.

Most registerable structures have an owner field, such as in the
:c:type:`struct file_operations <file_operations>` structure.
Set this field to the macro ``THIS_MODULE``.

Wait Queues ``include/linux/wait.h``
====================================

**[SLEEPS]**

A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no
race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
referring to themselves, and place that in the queue.

Declaring
---------

You declare a ``wait_queue_head_t`` using the
:c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
:c:func:`init_waitqueue_head()` routine in your initialization
code.

Queuing
-------

Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
(``include/linux/wait.h``). The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.

Waking Up Queued Tasks
----------------------

Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
up every process in the queue. The exception is if one has
``WQ_FLAG_EXCLUSIVE`` set on its wait queue entry, in which case the
remainder of the queue will not be woken. There are other variants of
this basic function available in the same header.

Atomic Operations
=================

Certain operations are guaranteed atomic on all platforms. The first
class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
this contains a signed integer (at least 32 bits long), and you must use
these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter; :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()` and :c:func:`atomic_dec()` modify it; and
:c:func:`atomic_dec_and_test()` decrements it and returns true if it
was decremented to zero.

Yes. It returns true (i.e. != 0) if the atomic variable is zero.

Note that these functions are slower than normal arithmetic, and so
should not be used unnecessarily.

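The classic use of :c:func:`atomic_dec_and_test()` is reference counting: whoever drops the count to zero frees the object. This sketch models that pattern in userspace with C11 ``<stdatomic.h>`` rather than the kernel's ``atomic_t`` API; the ``model_`` names are hypothetical, and setting a ``freed`` flag stands in for the actual free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * atomic_fetch_sub() returns the value *before* the subtraction, so an
 * old value of 1 means this call brought the counter to zero.
 */
static bool model_dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;
}

struct model_obj {
    atomic_int refcount;
    bool freed;
};

/* Only the caller dropping the last reference releases the object. */
static void model_put(struct model_obj *obj)
{
    if (model_dec_and_test(&obj->refcount))
        obj->freed = true;      /* stand-in for kfree(obj) */
}
```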
The second class of atomic operations is atomic bit operations on an
``unsigned long``, defined in ``include/linux/bitops.h``. These
operations generally take a pointer to the bit pattern, and a bit
number: 0 is the least significant bit. :c:func:`set_bit()`,
:c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
and flip the given bit. :c:func:`test_and_set_bit()`,
:c:func:`test_and_clear_bit()` and
:c:func:`test_and_change_bit()` do the same thing, except they return
true if the bit was previously set; these are particularly useful for
atomically setting flags.

It is possible to call these operations with bit indices greater than
``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though, so it is a good idea not to do this.

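Because :c:func:`test_and_set_bit()` reports the old value atomically, it works as a "claim" operation: of several racing callers, exactly one sees the bit previously clear. The userspace sketch below models it with C11 ``atomic_fetch_or()`` and shows how a bit number maps to a word and a mask; ``model_test_and_set_bit()`` and ``MODEL_BITS_PER_LONG`` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Bit 0 is the least significant bit of word 0; bit nr lives in word
 * nr / BITS_PER_LONG at position nr % BITS_PER_LONG.
 */
static bool model_test_and_set_bit(unsigned long nr,
                                   _Atomic unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % MODEL_BITS_PER_LONG);
    unsigned long old = atomic_fetch_or(&addr[nr / MODEL_BITS_PER_LONG],
                                        mask);

    return (old & mask) != 0;   /* true if the bit was already set */
}
```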
Symbols
=======

Within the kernel proper, the normal linking rules apply (i.e. unless a
symbol is declared to be file scope with the ``static`` keyword, it can
be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.

:c:func:`EXPORT_SYMBOL()`
-------------------------

Defined in ``include/linux/export.h``

This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.

:c:func:`EXPORT_SYMBOL_GPL()`
-----------------------------

Defined in ``include/linux/export.h``

Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
modules with a :c:func:`MODULE_LICENSE()` that specifies a
GPL-compatible license. It implies that the function is considered an
internal implementation issue, and not really an interface. Some
maintainers and developers may however require
:c:func:`EXPORT_SYMBOL_GPL()` when adding any new APIs or functionality.

Routines and Conventions
========================

Double-linked lists ``include/linux/list.h``
--------------------------------------------

There used to be three sets of linked-list routines in the kernel
headers, but this one is the winner. If you don't have some particular
pressing need for a singly-linked list, it's a good choice.

In particular, :c:func:`list_for_each_entry()` is useful.

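The key idea is that the list node is embedded in your own structure, and :c:func:`list_for_each_entry()` uses ``container_of()`` to get from each node back to the containing object. The following is a hypothetical userspace re-creation of just that core under ``model_`` names; it is a sketch of the technique, not a copy of ``include/linux/list.h``, whose real macros differ in detail. It relies on ``__typeof__``, a GNU extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

struct model_list_head {
    struct model_list_head *next, *prev;
};

/* Step back from an embedded member to the structure that contains it. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk a circular list; stops when we arrive back at the head. */
#define model_list_for_each_entry(pos, head, member)                         \
    for (pos = model_container_of((head)->next, __typeof__(*pos), member);   \
         &pos->member != (head);                                             \
         pos = model_container_of(pos->member.next, __typeof__(*pos), member))

static void model_init_list_head(struct model_list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert 'item' just before 'head', i.e. at the tail of the list. */
static void model_list_add_tail(struct model_list_head *item,
                                struct model_list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct model_item {
    int value;
    struct model_list_head node;   /* embedded list node */
};

static int model_sum_items(struct model_list_head *head)
{
    struct model_item *it;
    int sum = 0;

    model_list_for_each_entry(it, head, node)
        sum += it->value;
    return sum;
}
```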
Return Conventions
------------------

For code called in user context, it's very common to defy C convention,
and return 0 for success, and a negative error number (e.g. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.

Use :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: this avoids a separate
pointer parameter for the error number. Icky, but in a good way.

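A userspace sketch of the same trick, modeled on ``include/linux/err.h``, which reserves the top ``MAX_ERRNO`` (4095) pointer values for error codes; the older version quoted under Kernel Cantrips used a cutoff of 1000 instead. All ``model_`` names, including the allocator, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* The top MODEL_MAX_ERRNO addresses are never valid object pointers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
    return (void *)error;
}

static inline long model_ptr_err(const void *ptr)
{
    return (long)ptr;
}

/* Values in [-4095, -1], viewed as unsigned, sit at the top of memory. */
static inline bool model_is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MODEL_MAX_ERRNO;
}

/* Hypothetical allocator returning either an object or an encoded error. */
static int *model_alloc_counter(int fail)
{
    int *p;

    if (fail)
        return model_err_ptr(-ENOMEM);
    p = calloc(1, sizeof(*p));
    return p ? p : model_err_ptr(-ENOMEM);
}
```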
Breaking Compilation
--------------------

Linus and the other developers sometimes change function or structure
names in development kernels; this is not done just to keep everyone on
their toes: it reflects a fundamental change (e.g. it can no longer be
called with interrupts on, or it does extra checks, or it doesn't do
checks which were caught before). Usually this is accompanied by a fairly
complete note to the linux-kernel mailing list; search the archive.
Simply doing a global replace on the file usually makes things **worse**.

Initializing structure members
------------------------------

The preferred method of initializing structures is to use designated
initializers, as defined by ISO C99, e.g.::

    static struct block_device_operations opt_fops = {
            .open               = opt_open,
            .release            = opt_release,
            .ioctl              = opt_ioctl,
            .check_media_change = opt_media_change,
    };

This makes it easy to grep for, and makes it clear which structure
fields are set. You should do this because it looks cool.

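Besides grep-ability, designated initializers make omitted members well defined: anything not named is zero-initialized, so an unset callback is simply ``NULL``, and reordering the structure's fields does not break the initializer. The small userspace sketch below uses a hypothetical ``struct model_ops`` in place of a real kernel operations structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table, standing in for a kernel struct. */
struct model_ops {
    int (*open)(void);
    int (*release)(void);
    long (*ioctl)(unsigned int cmd, unsigned long arg);
};

static int model_open(void)    { return 0; }
static int model_release(void) { return 0; }

/* .ioctl is not mentioned, so it is zero-initialized (NULL). */
static const struct model_ops model_fops = {
    .open    = model_open,
    .release = model_release,
};
```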
GNU Extensions
--------------

GNU Extensions are explicitly allowed in the Linux kernel. Note that
some of the more complex ones are not very well supported, due to lack
of general use, but the following are considered standard (see the GCC
info page section "C Extensions" for more details; yes, really the info
page, the man page is only a short summary of the stuff in info).

- Inline functions

- Statement expressions (i.e. the ``({`` and ``})`` constructs).

- Declaring attributes of a function / variable / type
  (``__attribute__``)

- ``typeof``

- Zero length arrays

- Macro varargs

- Arithmetic on void pointers

- Non-constant initializers

- Assembler instructions (not outside ``arch/`` and ``include/asm/``)

- Function names as strings (``__func__``).

- ``__builtin_constant_p()``

Be wary when using ``long long`` in the kernel: the code gcc generates
for it is horrible, and worse, 64-bit division does not work on i386
because the GCC runtime functions for it are missing from the kernel
environment. Use the helpers in ``include/linux/math64.h``, such as
:c:func:`div_u64()`, instead.

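Statement expressions and ``typeof`` from the list above are what make type-generic helper macros safe to use with side effects. The sketch below is a simplified userspace ``max()`` in that style (it compiles with GCC or Clang); ``MODEL_MAX`` and ``NAIVE_MAX`` are hypothetical names, and the kernel's real ``min()``/``max()`` additionally check that both arguments have the same type.

```c
#include <assert.h>

/*
 * The statement expression's value is its last expression; __typeof__
 * declares temporaries of the argument types, so each argument is
 * evaluated exactly once.
 */
#define MODEL_MAX(a, b) ({          \
    __typeof__(a) _a = (a);         \
    __typeof__(b) _b = (b);         \
    _a > _b ? _a : _b;              \
})

/* The naive macro evaluates the winning argument twice. */
#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))
```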
C++
---

Using C++ in the kernel is usually a bad idea, because the kernel does
not provide the necessary runtime environment and the include files are
not tested for it. It is still possible, but not recommended. If you
really want to do this, forget about exceptions at least.

#if
---

It is generally considered cleaner to use macros in header files (or at
the top of .c files) to abstract away functions rather than using ``#if``
pre-processor statements throughout the source code.

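A common way to follow this advice is to put the ``#ifdef`` once in a header and provide ``static inline`` stubs for the disabled case, so callers need no conditionals at all. The option ``CONFIG_MODEL_FEATURE`` and all ``model_`` functions below are invented for illustration; the option is left undefined here, so the stubs are what get compiled.

```c
#include <assert.h>

/* Hypothetical header fragment: the only place the option is tested. */
#ifdef CONFIG_MODEL_FEATURE
int model_feature_init(void);
void model_feature_poke(int value);
#else
static inline int model_feature_init(void) { return 0; }
static inline void model_feature_poke(int value) { (void)value; }
#endif

/* Caller code compiles unchanged whether or not the feature exists. */
static int model_driver_probe(void)
{
    int err = model_feature_init();

    if (err)
        return err;
    model_feature_poke(42);
    return 0;
}
```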
Putting Your Stuff in the Kernel
================================

In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:

- Figure out whose pond you've been pissing in. Look at the top of the
  source files, inside the ``MAINTAINERS`` file, and last of all in the
  ``CREDITS`` file. You should coordinate with this person to make sure
  you're not duplicating effort, or trying something that's already
  been rejected.

  Make sure you put your name and email address at the top of any files
  you create or mangle significantly. This is the first place people
  will look when they find a bug, or when **they** want to make a change.

- Usually you want a configuration option for your kernel hack. Edit
  ``Kconfig`` in the appropriate directory. The Kconfig language is
  simple to use by cut and paste, and there's complete documentation in
  ``Documentation/kbuild/kconfig-language.txt``.

  In your description of the option, make sure you address both the
  expert user and the user who knows nothing about your feature.
  Mention incompatibilities and issues here. **Definitely** end your
  description with "if in doubt, say N" (or, occasionally, "Y"); this
  is for people who have no idea what you are talking about.

- Edit the ``Makefile``: the CONFIG variables are exported here so you
  can usually just add an ``obj-$(CONFIG_xxx) += xxx.o`` line. The syntax
  is documented in ``Documentation/kbuild/makefiles.txt``.

- Put yourself in ``CREDITS`` if you've done something noteworthy,
  usually beyond a single file (your name should be at the top of the
  source files anyway). ``MAINTAINERS`` means you want to be consulted
  when changes are made to a subsystem, and hear about bugs; it implies
  a more-than-passing commitment to some part of the code.

- Finally, don't forget to read
  ``Documentation/process/submitting-patches.rst`` and possibly
  ``Documentation/process/submitting-drivers.rst``.

741 | Kernel Cantrips | ||
742 | =============== | ||
743 | |||
744 | Some favorites from browsing the source. Feel free to add to this list. | ||
745 | |||
746 | ``arch/x86/include/asm/delay.h``:: | ||
747 | |||
748 | #define ndelay(n) (__builtin_constant_p(n) ? \ | ||
749 | ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \ | ||
750 | __ndelay(n)) | ||
751 | |||
752 | |||
753 | ``include/linux/fs.h``:: | ||
754 | |||
755 | /* | ||
756 | * Kernel pointers have redundant information, so we can use a | ||
757 | * scheme where we can return either an error code or a dentry | ||
758 | * pointer with the same return value. | ||
759 | * | ||
760 | * This should be a per-architecture thing, to allow different | ||
761 | * error and pointer decisions. | ||
762 | */ | ||
763 | #define ERR_PTR(err) ((void *)((long)(err))) | ||
764 | #define PTR_ERR(ptr) ((long)(ptr)) | ||
765 | #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000)) | ||
766 | |||
767 | ``arch/x86/include/asm/uaccess_32.h:``:: | ||
768 | |||
769 | #define copy_to_user(to,from,n) \ | ||
770 | (__builtin_constant_p(n) ? \ | ||
771 | __constant_copy_to_user((to),(from),(n)) : \ | ||
772 | __generic_copy_to_user((to),(from),(n))) | ||
773 | |||
774 | |||
775 | ``arch/sparc/kernel/head.S:``:: | ||
776 | |||
777 | /* | ||
778 | * Sun people can't spell worth damn. "compatability" indeed. | ||
779 | * At least we *know* we can't spell, and use a spell-checker. | ||
780 | */ | ||
781 | |||
782 | /* Uh, actually Linus it is I who cannot spell. Too much murky | ||
783 | * Sparc assembly will do this to ya. | ||
784 | */ | ||
785 | C_LABEL(cputypvar): | ||
786 | .asciz "compatibility" | ||
787 | |||
788 | /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */ | ||
789 | .align 4 | ||
790 | C_LABEL(cputypvar_sun4m): | ||
791 | .asciz "compatible" | ||
792 | |||
793 | |||
794 | ``arch/sparc/lib/checksum.S:``:: | ||
795 | |||
796 | /* Sun, you just can't beat me, you just can't. Stop trying, | ||
797 | * give up. I'm serious, I am going to kick the living shit | ||
798 | * out of you, game over, lights out. | ||
799 | */ | ||
800 | |||
801 | |||
802 | Thanks | ||
803 | ====== | ||
804 | |||
805 | Thanks to Andi Kleen for the idea, answering my questions, fixing my | ||
806 | mistakes, filling content, etc. Philipp Rumpf for more spelling and | ||
807 | clarity fixes, and some excellent non-obvious points. Werner Almesberger | ||
808 | for giving me a great summary of :c:func:`disable_irq()`, and Jes | ||
809 | Sorensen and Andrea Arcangeli added caveats. Michael Elizabeth Chastain | ||
810 | for checking and adding to the Configure section. Telsa Gwynne for | ||
811 | teaching me DocBook. | ||
diff --git a/Documentation/kernel-hacking/index.rst b/Documentation/kernel-hacking/index.rst new file mode 100644 index 000000000000..fcb0eda3cca3 --- /dev/null +++ b/Documentation/kernel-hacking/index.rst | |||
@@ -0,0 +1,9 @@ | |||
1 | ===================== | ||
2 | Kernel Hacking Guides | ||
3 | ===================== | ||
4 | |||
5 | .. toctree:: | ||
6 | :maxdepth: 2 | ||
7 | |||
8 | hacking | ||
9 | locking | ||
diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst new file mode 100644 index 000000000000..f937c0fd11aa --- /dev/null +++ b/Documentation/kernel-hacking/locking.rst | |||
@@ -0,0 +1,1446 @@ | |||
1 | =========================== | ||
2 | Unreliable Guide To Locking | ||
3 | =========================== | ||
4 | |||
5 | :Author: Rusty Russell | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
Welcome to Rusty's Remarkably Unreliable Guide to Kernel Locking
issues. This document describes the locking systems in the Linux Kernel
in 2.6.
13 | |||
14 | With the wide availability of HyperThreading, and preemption in the | ||
15 | Linux Kernel, everyone hacking on the kernel needs to know the | ||
16 | fundamentals of concurrency and locking for SMP. | ||
17 | |||
18 | The Problem With Concurrency | ||
19 | ============================ | ||
20 | |||
21 | (Skip this if you know what a Race Condition is). | ||
22 | |||
23 | In a normal program, you can increment a counter like so: | ||
24 | |||
25 | :: | ||
26 | |||
27 | very_important_count++; | ||
28 | |||
29 | |||
If two instances run this code at the same time, this is what you would
expect to happen:
31 | |||
32 | |||
33 | .. table:: Expected Results | ||
34 | |||
35 | +------------------------------------+------------------------------------+ | ||
36 | | Instance 1 | Instance 2 | | ||
37 | +====================================+====================================+ | ||
38 | | read very_important_count (5) | | | ||
39 | +------------------------------------+------------------------------------+ | ||
40 | | add 1 (6) | | | ||
41 | +------------------------------------+------------------------------------+ | ||
42 | | write very_important_count (6) | | | ||
43 | +------------------------------------+------------------------------------+ | ||
44 | | | read very_important_count (6) | | ||
45 | +------------------------------------+------------------------------------+ | ||
46 | | | add 1 (7) | | ||
47 | +------------------------------------+------------------------------------+ | ||
48 | | | write very_important_count (7) | | ||
49 | +------------------------------------+------------------------------------+ | ||
50 | |||
51 | This is what might happen: | ||
52 | |||
53 | .. table:: Possible Results | ||
54 | |||
55 | +------------------------------------+------------------------------------+ | ||
56 | | Instance 1 | Instance 2 | | ||
57 | +====================================+====================================+ | ||
58 | | read very_important_count (5) | | | ||
59 | +------------------------------------+------------------------------------+ | ||
60 | | | read very_important_count (5) | | ||
61 | +------------------------------------+------------------------------------+ | ||
62 | | add 1 (6) | | | ||
63 | +------------------------------------+------------------------------------+ | ||
64 | | | add 1 (6) | | ||
65 | +------------------------------------+------------------------------------+ | ||
66 | | write very_important_count (6) | | | ||
67 | +------------------------------------+------------------------------------+ | ||
68 | | | write very_important_count (6) | | ||
69 | +------------------------------------+------------------------------------+ | ||
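
Both tables can be replayed deterministically. The following is a
user-space sketch, not kernel code: each instance's private copy of the
value is modelled as a local variable (standing in for a CPU register),
and the steps run in exactly the order shown in each table. The
function names ``run_expected()`` and ``run_interleaved()`` are
hypothetical.

```c
#include <assert.h>

/* The shared counter from the example above. */
static int very_important_count;

/* Replay "Expected Results": instance 1 finishes before instance 2. */
static int run_expected(int start)
{
    int reg1, reg2;               /* each instance's private copy */

    very_important_count = start;
    reg1 = very_important_count;  /* instance 1: read */
    reg1 = reg1 + 1;              /* instance 1: add 1 */
    very_important_count = reg1;  /* instance 1: write */
    reg2 = very_important_count;  /* instance 2: read */
    reg2 = reg2 + 1;              /* instance 2: add 1 */
    very_important_count = reg2;  /* instance 2: write */
    return very_important_count;
}

/* Replay "Possible Results": both instances read the old value before
 * either one writes, so one increment is lost. */
static int run_interleaved(int start)
{
    int reg1, reg2;

    very_important_count = start;
    reg1 = very_important_count;  /* instance 1: read */
    reg2 = very_important_count;  /* instance 2: read the same value */
    reg1 = reg1 + 1;              /* instance 1: add 1 */
    reg2 = reg2 + 1;              /* instance 2: add 1 */
    very_important_count = reg1;  /* instance 1: write */
    very_important_count = reg2;  /* instance 2: overwrite with same */
    return very_important_count;
}
```

Starting from 5, ``run_expected(5)`` returns 7 while
``run_interleaved(5)`` returns 6: the same three steps per instance,
only reordered, give a different answer.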
70 | |||
71 | |||
72 | Race Conditions and Critical Regions | ||
73 | ------------------------------------ | ||
74 | |||
This overlap, where the result depends on the relative timing of
multiple tasks, is called a race condition. The piece of code containing
the concurrency issue is called a critical region. Especially since
Linux started running on SMP machines, race conditions have become one
of the major issues in kernel design and implementation.
80 | |||
81 | Preemption can have the same effect, even if there is only one CPU: by | ||
82 | preempting one task during the critical region, we have exactly the same | ||
83 | race condition. In this case the thread which preempts might run the | ||
84 | critical region itself. | ||
85 | |||
86 | The solution is to recognize when these simultaneous accesses occur, and | ||
87 | use locks to make sure that only one instance can enter the critical | ||
88 | region at any time. There are many friendly primitives in the Linux | ||
89 | kernel to help you do this. And then there are the unfriendly | ||
90 | primitives, but I'll pretend they don't exist. | ||
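To make the idea concrete outside the kernel, here is a minimal
user-space sketch that uses POSIX threads as a stand-in for concurrent
kernel contexts and a ``pthread_mutex_t`` as a stand-in for a kernel
lock. It is not the kernel API; the helpers ``worker()`` and
``run_locked()`` are hypothetical. The read-add-write of
``very_important_count`` is the critical region, and the lock ensures
only one thread is inside it at any time.

```c
#include <assert.h>
#include <pthread.h>

/* Shared state and the lock that protects it. */
static long very_important_count;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

#define ITERATIONS 100000

/* Each thread increments the counter ITERATIONS times, holding the
 * lock across the whole read-modify-write sequence. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&count_lock);
        very_important_count++;       /* critical region */
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Run up to 16 workers concurrently and return the final count. */
static long run_locked(int nthreads)
{
    pthread_t tids[16];

    if (nthreads > 16)
        nthreads = 16;
    very_important_count = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return very_important_count;
}
```

With the lock, ``run_locked(4)`` always returns ``4 * ITERATIONS``;
removing the lock and unlock calls will typically produce a smaller,
run-dependent total. Older toolchains may need ``-pthread`` to link.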
91 | |||
92 | Locking in the Linux Kernel | ||
93 | =========================== | ||
94 | |||
95 | If I could give you one piece of advice: never sleep with anyone crazier | ||
96 | than yourself. But if I had to give you advice on locking: **keep it | ||
97 | simple**. | ||
98 | |||
99 | Be reluctant to introduce new locks. | ||
100 | |||
101 | Strangely enough, this last one is the exact reverse of my advice when | ||
102 | you **have** slept with someone crazier than yourself. And you should | ||
103 | think about getting a big dog. | ||
104 | |||
105 | Two Main Types of Kernel Locks: Spinlocks and Mutexes | ||
106 | ----------------------------------------------------- | ||
107 | |||
108 | There are two main types of kernel locks. The fundamental type is the | ||
109 | spinlock (``include/asm/spinlock.h``), which is a very simple | ||
110 | single-holder lock: if you can't get the spinlock, you keep trying | ||
111 | (spinning) until you can. Spinlocks are very small and fast, and can be | ||
112 | used anywhere. | ||
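The "keep trying until you can" behaviour can be sketched in user space
with a C11 ``atomic_flag``. This is only an illustration of the
principle, under the assumption that a single test-and-set flag is
enough to show it: real kernel spinlocks are implemented differently
(and also disable preemption), and ``sketch_spinlock_t``,
``sketch_spin_lock()`` and ``sketch_spin_unlock()`` are hypothetical
names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* A minimal test-and-set spinlock: one flag, held or not held. */
typedef struct {
    atomic_flag held;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void sketch_spin_lock(sketch_spinlock_t *lock)
{
    /* test_and_set returns the previous value: keep trying (spinning)
     * until we are the one who changed it from clear to set. */
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire))
        ; /* spin */
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

static sketch_spinlock_t counter_lock = SKETCH_SPINLOCK_INIT;
static long counter;

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        sketch_spin_lock(&counter_lock);
        counter++;               /* critical region: kept very short */
        sketch_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Run up to 8 threads contending for the spinlock. */
static long run_spin_workers(int nthreads)
{
    pthread_t tids[8];

    if (nthreads > 8)
        nthreads = 8;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, spin_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

A waiting thread here burns CPU until the holder releases the flag,
which is why spinlock critical regions should be short. In user space
the holder can also be preempted while waiters spin, one reason the
kernel's own spinlocks disable preemption.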
113 | |||
114 | The second type is a mutex (``include/linux/mutex.h``): it is like a | ||
115 | spinlock, but you may block holding a mutex. If you can't lock a mutex, | ||
116 | your task will suspend itself, and be woken up when the mutex is | ||
117 | released. This means the CPU can do something else while you are | ||
118 | waiting. There are many cases when you simply can't sleep (see | ||
119 | `What Functions Are Safe To Call From Interrupts? <#sleeping-things>`__), | ||
120 | and so have to use a spinlock instead. | ||
121 | |||
122 | Neither type of lock is recursive: see | ||
123 | `Deadlock: Simple and Advanced <#deadlock>`__. | ||
124 | |||
125 | Locks and Uniprocessor Kernels | ||
126 | ------------------------------ | ||
127 | |||
For kernels compiled without ``CONFIG_SMP``, and without
``CONFIG_PREEMPT``, spinlocks do not exist at all. This is an excellent
design decision: when no-one else can run at the same time, there is no
reason to have a lock.
132 | |||
133 | If the kernel is compiled without ``CONFIG_SMP``, but ``CONFIG_PREEMPT`` | ||
134 | is set, then spinlocks simply disable preemption, which is sufficient to | ||
135 | prevent any races. For most purposes, we can think of preemption as | ||
136 | equivalent to SMP, and not worry about it separately. | ||
137 | |||
138 | You should always test your locking code with ``CONFIG_SMP`` and | ||
139 | ``CONFIG_PREEMPT`` enabled, even if you don't have an SMP test box, | ||
140 | because it will still catch some kinds of locking bugs. | ||
141 | |||
142 | Mutexes still exist, because they are required for synchronization | ||
143 | between user contexts, as we will see below. | ||
144 | |||
145 | Locking Only In User Context | ||
146 | ---------------------------- | ||
147 | |||
148 | If you have a data structure which is only ever accessed from user | ||
149 | context, then you can use a simple mutex (``include/linux/mutex.h``) to | ||
150 | protect it. This is the most trivial case: you initialize the mutex. | ||
151 | Then you can call :c:func:`mutex_lock_interruptible()` to grab the | ||
152 | mutex, and :c:func:`mutex_unlock()` to release it. There is also a | ||
153 | :c:func:`mutex_lock()`, which should be avoided, because it will | ||
154 | not return if a signal is received. | ||
155 | |||
156 | Example: ``net/netfilter/nf_sockopt.c`` allows registration of new | ||
157 | :c:func:`setsockopt()` and :c:func:`getsockopt()` calls, with | ||
158 | :c:func:`nf_register_sockopt()`. Registration and de-registration | ||
159 | are only done on module load and unload (and boot time, where there is | ||
160 | no concurrency), and the list of registrations is only consulted for an | ||
161 | unknown :c:func:`setsockopt()` or :c:func:`getsockopt()` system | ||
162 | call. The ``nf_sockopt_mutex`` is perfect to protect this, especially | ||
163 | since the setsockopt and getsockopt calls may well sleep. | ||
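The shape of this pattern can be sketched in user space: a list of
registrations that is only ever touched from ordinary threads (standing
in for user context) and protected by a sleeping lock. This is loosely
modelled on the ``nf_sockopt`` example and is not the real
``net/netfilter/nf_sockopt.c`` code; ``struct sketch_sockopt``,
``sketch_register()``, ``sketch_unregister()`` and ``sketch_call()`` are
hypothetical. POSIX has no direct equivalent of
:c:func:`mutex_lock_interruptible()`, so ``pthread_mutex_lock()`` here
behaves more like :c:func:`mutex_lock()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A hypothetical registration: a range of option numbers and the
 * handler that deals with them. */
struct sketch_sockopt {
    int opt_min, opt_max;
    int (*handler)(int optval);
    struct sketch_sockopt *next;
};

/* Protects sockopt_list; only taken from thread ("user") context. */
static pthread_mutex_t sockopt_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_sockopt *sockopt_list;

static void sketch_register(struct sketch_sockopt *reg)
{
    pthread_mutex_lock(&sockopt_mutex);
    reg->next = sockopt_list;
    sockopt_list = reg;
    pthread_mutex_unlock(&sockopt_mutex);
}

static void sketch_unregister(struct sketch_sockopt *reg)
{
    struct sketch_sockopt **p;

    pthread_mutex_lock(&sockopt_mutex);
    for (p = &sockopt_list; *p; p = &(*p)->next) {
        if (*p == reg) {
            *p = reg->next;
            break;
        }
    }
    pthread_mutex_unlock(&sockopt_mutex);
}

/* Find a handler for an option and call it.  For simplicity this
 * sketch calls the handler with the mutex held; since it is a sleeping
 * lock, the handler is allowed to block.  Returns -1 if none matches. */
static int sketch_call(int opt, int optval)
{
    struct sketch_sockopt *r;
    int ret = -1;

    pthread_mutex_lock(&sockopt_mutex);
    for (r = sockopt_list; r; r = r->next) {
        if (opt >= r->opt_min && opt <= r->opt_max) {
            ret = r->handler(optval);
            break;
        }
    }
    pthread_mutex_unlock(&sockopt_mutex);
    return ret;
}

static int double_it(int v) { return 2 * v; }
```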
164 | |||
165 | Locking Between User Context and Softirqs | ||
166 | ----------------------------------------- | ||
167 | |||
168 | If a softirq shares data with user context, you have two problems. | ||
169 | Firstly, the current user context can be interrupted by a softirq, and | ||
170 | secondly, the critical region could be entered from another CPU. This is | ||
171 | where :c:func:`spin_lock_bh()` (``include/linux/spinlock.h``) is | ||
172 | used. It disables softirqs on that CPU, then grabs the lock. | ||
:c:func:`spin_unlock_bh()` does the reverse. (The ``_bh`` suffix is
a historical reference to "Bottom Halves", the old name for software
interrupts. It should really be called ``spin_lock_softirq()`` in a
perfect world.)
177 | |||
178 | Note that you can also use :c:func:`spin_lock_irq()` or | ||
179 | :c:func:`spin_lock_irqsave()` here, which stop hardware interrupts | ||
180 | as well: see `Hard IRQ Context <#hardirq-context>`__. | ||
181 | |||
182 | This works perfectly for UP as well: the spin lock vanishes, and this | ||
183 | macro simply becomes :c:func:`local_bh_disable()` | ||
184 | (``include/linux/interrupt.h``), which protects you from the softirq | ||
185 | being run. | ||
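The uniprocessor case can be simulated in a few lines to show why
disabling softirqs is enough. This is a simplified user-space model,
not kernel code: the ``_sim`` functions are hypothetical, and the
simulated softirq runs as soon as it is raised unless softirqs are
disabled, in which case it stays pending until the matching enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated softirq state of the only CPU. */
static int bh_disable_count;    /* nesting depth of bh-disabled code */
static bool softirq_pending;    /* a softirq has been raised */
static int softirq_runs;        /* how many times the softirq ran */
static int shared_data;         /* data shared with the softirq */

static void run_softirq(void)
{
    softirq_pending = false;
    softirq_runs++;
    shared_data++;              /* the softirq touches shared data */
}

/* An interrupt raises the softirq: it runs now unless disabled. */
static void raise_softirq_sim(void)
{
    softirq_pending = true;
    if (bh_disable_count == 0)
        run_softirq();
}

static void local_bh_disable_sim(void)
{
    bh_disable_count++;
}

/* Re-enabling runs anything that was deferred in the meantime. */
static void local_bh_enable_sim(void)
{
    if (--bh_disable_count == 0 && softirq_pending)
        run_softirq();
}

/* User context on UP: "spin_lock_bh" reduces to disabling softirqs,
 * so the softirq cannot run in the middle of this update. */
static int user_context_update(void)
{
    int seen;

    local_bh_disable_sim();
    seen = shared_data;
    raise_softirq_sim();        /* interrupt arrives mid-update */
    shared_data = seen + 10;    /* softirq deferred: no lost update */
    local_bh_enable_sim();      /* deferred softirq runs here */
    return shared_data;
}
```

Starting from zero, the update writes 10 and the deferred softirq then
adds 1, so both changes survive.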
186 | |||
187 | Locking Between User Context and Tasklets | ||
188 | ----------------------------------------- | ||
189 | |||
190 | This is exactly the same as above, because tasklets are actually run | ||
191 | from a softirq. | ||
192 | |||
193 | Locking Between User Context and Timers | ||
194 | --------------------------------------- | ||
195 | |||
196 | This, too, is exactly the same as above, because timers are actually run | ||
197 | from a softirq. From a locking point of view, tasklets and timers are | ||
198 | identical. | ||
199 | |||
200 | Locking Between Tasklets/Timers | ||
201 | ------------------------------- | ||
202 | |||
203 | Sometimes a tasklet or timer might want to share data with another | ||
204 | tasklet or timer. | ||
205 | |||
206 | The Same Tasklet/Timer | ||
207 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
208 | |||
209 | Since a tasklet is never run on two CPUs at once, you don't need to | ||
210 | worry about your tasklet being reentrant (running twice at once), even | ||
211 | on SMP. | ||
212 | |||
213 | Different Tasklets/Timers | ||
214 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
215 | |||
If another tasklet/timer wants to share data with your tasklet or timer,
you will both need to use :c:func:`spin_lock()` and
:c:func:`spin_unlock()` calls. :c:func:`spin_lock_bh()` is
unnecessary here, as you are already in a tasklet, and none will be run
on the same CPU.
221 | |||
222 | Locking Between Softirqs | ||
223 | ------------------------ | ||
224 | |||
225 | Often a softirq might want to share data with itself or a tasklet/timer. | ||
226 | |||
227 | The Same Softirq | ||
228 | ~~~~~~~~~~~~~~~~ | ||
229 | |||
230 | The same softirq can run on the other CPUs: you can use a per-CPU array | ||
231 | (see `Per-CPU Data <#per-cpu>`__) for better performance. If you're | ||
232 | going so far as to use a softirq, you probably care about scalable | ||
233 | performance enough to justify the extra complexity. | ||
234 | |||
235 | You'll need to use :c:func:`spin_lock()` and | ||
236 | :c:func:`spin_unlock()` for shared data. | ||
237 | |||
238 | Different Softirqs | ||
239 | ~~~~~~~~~~~~~~~~~~ | ||
240 | |||
You'll need to use :c:func:`spin_lock()` and
:c:func:`spin_unlock()` for shared data, whether it is shared with a
timer, a tasklet, a different softirq, or the same softirq: any of them
could be running on a different CPU.
245 | |||
246 | Hard IRQ Context | ||
247 | ================ | ||
248 | |||
249 | Hardware interrupts usually communicate with a tasklet or softirq. | ||
250 | Frequently this involves putting work in a queue, which the softirq will | ||
251 | take out. | ||
252 | |||
253 | Locking Between Hard IRQ and Softirqs/Tasklets | ||
254 | ---------------------------------------------- | ||
255 | |||
256 | If a hardware irq handler shares data with a softirq, you have two | ||
257 | concerns. Firstly, the softirq processing can be interrupted by a | ||
258 | hardware interrupt, and secondly, the critical region could be entered | ||
259 | by a hardware interrupt on another CPU. This is where | ||
260 | :c:func:`spin_lock_irq()` is used. It is defined to disable | ||
261 | interrupts on that cpu, then grab the lock. | ||
262 | :c:func:`spin_unlock_irq()` does the reverse. | ||
263 | |||
The irq handler does not need to use :c:func:`spin_lock_irq()`, because
265 | the softirq cannot run while the irq handler is running: it can use | ||
266 | :c:func:`spin_lock()`, which is slightly faster. The only exception | ||
267 | would be if a different hardware irq handler uses the same lock: | ||
268 | :c:func:`spin_lock_irq()` will stop that from interrupting us. | ||
269 | |||
270 | This works perfectly for UP as well: the spin lock vanishes, and this | ||
271 | macro simply becomes :c:func:`local_irq_disable()` | ||
272 | (``include/asm/smp.h``), which protects you from the softirq/tasklet/BH | ||
273 | being run. | ||
274 | |||
275 | :c:func:`spin_lock_irqsave()` (``include/linux/spinlock.h``) is a | ||
276 | variant which saves whether interrupts were on or off in a flags word, | ||
277 | which is passed to :c:func:`spin_unlock_irqrestore()`. This means | ||
that the same code can be used inside a hard irq handler (where
279 | interrupts are already off) and in softirqs (where the irq disabling is | ||
280 | required). | ||
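Why saving the flags matters can be shown with a small simulation. The
sketch below models the local CPU's interrupt-enable state as a
``bool``; ``irq_save_sim()``, ``irq_restore_sim()`` and the other names
are hypothetical, user-space stand-ins rather than the kernel's
:c:func:`spin_lock_irqsave()` / :c:func:`spin_unlock_irqrestore()`.
The point is that restore puts back whatever state was saved instead
of unconditionally re-enabling interrupts.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated "interrupts enabled" state of the local CPU. */
static bool irqs_on = true;

/* Save the current state in *flags, then disable interrupts. */
static void irq_save_sim(unsigned long *flags)
{
    *flags = irqs_on;
    irqs_on = false;
}

/* Put back the saved state rather than blindly enabling. */
static void irq_restore_sim(unsigned long flags)
{
    irqs_on = flags != 0;
}

/* Code callable with interrupts on (softirq or user context) or
 * already off (hard irq handler). */
static void touch_shared_data(void)
{
    unsigned long flags;

    irq_save_sim(&flags);
    assert(!irqs_on);           /* critical region runs with irqs off */
    irq_restore_sim(flags);
}

/* A simulated hard irq handler: interrupts are off on entry. */
static void hard_irq_handler_sim(void)
{
    bool was_on = irqs_on;

    irqs_on = false;            /* hardware disabled interrupts */
    touch_shared_data();
    assert(!irqs_on);           /* restore did NOT re-enable them */
    irqs_on = was_on;           /* return from the interrupt */
}
```

An unconditional enable at the end of ``touch_shared_data()`` (which is
what :c:func:`spin_unlock_irq()` does) would turn interrupts back on
inside the handler, too early.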
281 | |||
282 | Note that softirqs (and hence tasklets and timers) are run on return | ||
283 | from hardware interrupts, so :c:func:`spin_lock_irq()` also stops | ||
284 | these. In that sense, :c:func:`spin_lock_irqsave()` is the most | ||
285 | general and powerful locking function. | ||
286 | |||
287 | Locking Between Two Hard IRQ Handlers | ||
288 | ------------------------------------- | ||
289 | |||
290 | It is rare to have to share data between two IRQ handlers, but if you | ||
291 | do, :c:func:`spin_lock_irqsave()` should be used: it is | ||
292 | architecture-specific whether all interrupts are disabled inside irq | ||
293 | handlers themselves. | ||
294 | |||
295 | Cheat Sheet For Locking | ||
296 | ======================= | ||
297 | |||
298 | Pete Zaitcev gives the following summary: | ||
299 | |||
- If you are in a process context (any syscall) and want to lock other
  processes out, use a mutex. You can take a mutex and sleep
  (``copy_from_user*()`` or ``kmalloc(x,GFP_KERNEL)``).
303 | |||
304 | - Otherwise (== data can be touched in an interrupt), use | ||
305 | :c:func:`spin_lock_irqsave()` and | ||
306 | :c:func:`spin_unlock_irqrestore()`. | ||
307 | |||
- Avoid holding a spinlock for more than 5 lines of code and across any
  function call (except accessors like :c:func:`readb()`).
310 | |||
311 | Table of Minimum Requirements | ||
312 | ----------------------------- | ||
313 | |||
314 | The following table lists the **minimum** locking requirements between | ||
various contexts. In some cases, the same context can only be running on
one CPU at a time, so no locking is required for that context (e.g. a
particular thread can only run on one CPU at a time, but if it needs to
share data with another thread, locking is required).
319 | |||
320 | Remember the advice above: you can always use | ||
321 | :c:func:`spin_lock_irqsave()`, which is a superset of all other | ||
322 | spinlock primitives. | ||
323 | |||
324 | ============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ============== | ||
325 | . IRQ Handler A IRQ Handler B Softirq A Softirq B Tasklet A Tasklet B Timer A Timer B User Context A User Context B | ||
326 | ============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ============== | ||
327 | IRQ Handler A None | ||
328 | IRQ Handler B SLIS None | ||
329 | Softirq A SLI SLI SL | ||
330 | Softirq B SLI SLI SL SL | ||
331 | Tasklet A SLI SLI SL SL None | ||
332 | Tasklet B SLI SLI SL SL SL None | ||
333 | Timer A SLI SLI SL SL SL SL None | ||
334 | Timer B SLI SLI SL SL SL SL SL None | ||
335 | User Context A SLI SLI SLBH SLBH SLBH SLBH SLBH SLBH None | ||
336 | User Context B SLI SLI SLBH SLBH SLBH SLBH SLBH SLBH MLI None | ||
337 | ============== ============= ============= ========= ========= ========= ========= ======= ======= ============== ============== | ||
338 | |||
339 | Table: Table of Locking Requirements | ||
340 | |||
341 | +--------+----------------------------+ | ||
342 | | SLIS | spin_lock_irqsave | | ||
343 | +--------+----------------------------+ | ||
344 | | SLI | spin_lock_irq | | ||
345 | +--------+----------------------------+ | ||
346 | | SL | spin_lock | | ||
347 | +--------+----------------------------+ | ||
348 | | SLBH | spin_lock_bh | | ||
349 | +--------+----------------------------+ | ||
350 | | MLI | mutex_lock_interruptible | | ||
351 | +--------+----------------------------+ | ||
352 | |||
353 | Table: Legend for Locking Requirements Table | ||
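The table follows a few simple rules, which can be written down as a
lookup function. This is a hypothetical encoding of the table above for
illustration, not something that exists in the kernel: a context is a
kind plus an instance (so "Tasklet A" and "Tasklet B" differ only in
instance), and the function returns the name of the minimum primitive.

```c
#include <assert.h>
#include <string.h>

enum ctx_kind { CTX_IRQ, CTX_SOFTIRQ, CTX_TASKLET, CTX_TIMER, CTX_USER };

/* "Tasklet A" is { CTX_TASKLET, 0 }, "Tasklet B" is { CTX_TASKLET, 1 }. */
struct ctx {
    enum ctx_kind kind;
    int instance;
};

/* Minimum locking between contexts a and b, per the table above. */
static const char *min_lock(struct ctx a, struct ctx b)
{
    int same = a.kind == b.kind && a.instance == b.instance;

    /* The same softirq can run on several CPUs at once; every other
     * context only ever runs on one CPU at a time. */
    if (same)
        return a.kind == CTX_SOFTIRQ ? "spin_lock" : "none";
    if (a.kind == CTX_IRQ && b.kind == CTX_IRQ)
        return "spin_lock_irqsave";
    if (a.kind == CTX_IRQ || b.kind == CTX_IRQ)
        return "spin_lock_irq";
    if (a.kind == CTX_USER && b.kind == CTX_USER)
        return "mutex_lock_interruptible";
    if (a.kind == CTX_USER || b.kind == CTX_USER)
        return "spin_lock_bh";
    return "spin_lock";         /* any softirq/tasklet/timer mix */
}
```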
354 | |||
355 | The trylock Functions | ||
356 | ===================== | ||
357 | |||
There are functions that try to acquire a lock only once and immediately
return a value indicating success or failure. They can be used when you
do not need the data protected by the lock while some other thread is
holding it. If you then do need access to the protected data, you must
acquire the lock properly later.
363 | |||
:c:func:`spin_trylock()` does not spin but returns non-zero if it
acquires the spinlock on the first try or 0 if not. Like
:c:func:`spin_lock()`, it can be used in all contexts, provided you
have disabled the contexts that might interrupt you and try to acquire
the same spinlock.
369 | |||
370 | :c:func:`mutex_trylock()` does not suspend your task but returns | ||
371 | non-zero if it could lock the mutex on the first try or 0 if not. This | ||
372 | function cannot be safely used in hardware or software interrupt | ||
373 | contexts despite not sleeping. | ||
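A typical use of a trylock is optional work that can simply be skipped
when the lock is busy. The sketch below shows the idea in user space
with a C11 ``atomic_flag``; ``sketch_spin_trylock()`` and
``maybe_update_stats()`` are hypothetical names, and the kernel's
:c:func:`spin_trylock()` is implemented differently, though it follows
the same "non-zero on success" convention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;
} sketch_spinlock_t;

/* Try exactly once: true if we took the lock, false if it was held. */
static bool sketch_spin_trylock(sketch_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

static void sketch_spin_unlock(sketch_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

static sketch_spinlock_t stats_lock = { ATOMIC_FLAG_INIT };
static int stats_updates, stats_skipped;

/* Update statistics if the lock is free; otherwise skip this update
 * rather than waiting for whoever holds the lock. */
static void maybe_update_stats(void)
{
    if (sketch_spin_trylock(&stats_lock)) {
        stats_updates++;
        sketch_spin_unlock(&stats_lock);
    } else {
        stats_skipped++;
    }
}
```

If the data were actually required, the caller would instead have to
take the lock normally, as described above.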
374 | |||
375 | Common Examples | ||
376 | =============== | ||
377 | |||
378 | Let's step through a simple example: a cache of number to name mappings. | ||
379 | The cache keeps a count of how often each of the objects is used, and | ||
380 | when it gets full, throws out the least used one. | ||
381 | |||
382 | All In User Context | ||
383 | ------------------- | ||
384 | |||
385 | For our first example, we assume that all operations are in user context | ||
386 | (ie. from system calls), so we can sleep. This means we can use a mutex | ||
387 | to protect the cache and all the objects within it. Here's the code:: | ||
388 | |||
389 | #include <linux/list.h> | ||
390 | #include <linux/slab.h> | ||
391 | #include <linux/string.h> | ||
392 | #include <linux/mutex.h> | ||
393 | #include <asm/errno.h> | ||
394 | |||
395 | struct object | ||
396 | { | ||
397 | struct list_head list; | ||
398 | int id; | ||
399 | char name[32]; | ||
400 | int popularity; | ||
401 | }; | ||
402 | |||
403 | /* Protects the cache, cache_num, and the objects within it */ | ||
404 | static DEFINE_MUTEX(cache_lock); | ||
405 | static LIST_HEAD(cache); | ||
406 | static unsigned int cache_num = 0; | ||
407 | #define MAX_CACHE_SIZE 10 | ||
408 | |||
409 | /* Must be holding cache_lock */ | ||
410 | static struct object *__cache_find(int id) | ||
411 | { | ||
412 | struct object *i; | ||
413 | |||
414 | list_for_each_entry(i, &cache, list) | ||
415 | if (i->id == id) { | ||
416 | i->popularity++; | ||
417 | return i; | ||
418 | } | ||
419 | return NULL; | ||
420 | } | ||
421 | |||
422 | /* Must be holding cache_lock */ | ||
423 | static void __cache_delete(struct object *obj) | ||
424 | { | ||
425 | BUG_ON(!obj); | ||
426 | list_del(&obj->list); | ||
427 | kfree(obj); | ||
428 | cache_num--; | ||
429 | } | ||
430 | |||
431 | /* Must be holding cache_lock */ | ||
432 | static void __cache_add(struct object *obj) | ||
433 | { | ||
434 | list_add(&obj->list, &cache); | ||
435 | if (++cache_num > MAX_CACHE_SIZE) { | ||
436 | struct object *i, *outcast = NULL; | ||
437 | list_for_each_entry(i, &cache, list) { | ||
438 | if (!outcast || i->popularity < outcast->popularity) | ||
439 | outcast = i; | ||
440 | } | ||
441 | __cache_delete(outcast); | ||
442 | } | ||
443 | } | ||
444 | |||
445 | int cache_add(int id, const char *name) | ||
446 | { | ||
447 | struct object *obj; | ||
448 | |||
449 | if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL) | ||
450 | return -ENOMEM; | ||
451 | |||
452 | strlcpy(obj->name, name, sizeof(obj->name)); | ||
453 | obj->id = id; | ||
454 | obj->popularity = 0; | ||
455 | |||
456 | mutex_lock(&cache_lock); | ||
457 | __cache_add(obj); | ||
458 | mutex_unlock(&cache_lock); | ||
459 | return 0; | ||
460 | } | ||
461 | |||
462 | void cache_delete(int id) | ||
463 | { | ||
464 | mutex_lock(&cache_lock); | ||
465 | __cache_delete(__cache_find(id)); | ||
466 | mutex_unlock(&cache_lock); | ||
467 | } | ||
468 | |||
469 | int cache_find(int id, char *name) | ||
470 | { | ||
471 | struct object *obj; | ||
472 | int ret = -ENOENT; | ||
473 | |||
474 | mutex_lock(&cache_lock); | ||
475 | obj = __cache_find(id); | ||
476 | if (obj) { | ||
477 | ret = 0; | ||
478 | strcpy(name, obj->name); | ||
479 | } | ||
480 | mutex_unlock(&cache_lock); | ||
481 | return ret; | ||
482 | } | ||
483 | |||
484 | Note that we always make sure we have the cache_lock when we add, | ||
485 | delete, or look up the cache: both the cache infrastructure itself and | ||
486 | the contents of the objects are protected by the lock. In this case it's | ||
487 | easy, since we copy the data for the user, and never let them access the | ||
488 | objects directly. | ||
489 | |||
490 | There is a slight (and common) optimization here: in | ||
491 | :c:func:`cache_add()` we set up the fields of the object before | ||
492 | grabbing the lock. This is safe, as no-one else can access it until we | ||
493 | put it in cache. | ||
494 | |||
495 | Accessing From Interrupt Context | ||
496 | -------------------------------- | ||
497 | |||
498 | Now consider the case where :c:func:`cache_find()` can be called | ||
499 | from interrupt context: either a hardware interrupt or a softirq. An | ||
example would be a timer which deletes objects from the cache.
501 | |||
502 | The change is shown below, in standard patch format: the ``-`` are lines | ||
503 | which are taken away, and the ``+`` are lines which are added. | ||
504 | |||
505 | :: | ||
506 | |||
507 | --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100 | ||
508 | +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100 | ||
509 | @@ -12,7 +12,7 @@ | ||
510 | int popularity; | ||
511 | }; | ||
512 | |||
513 | -static DEFINE_MUTEX(cache_lock); | ||
514 | +static DEFINE_SPINLOCK(cache_lock); | ||
515 | static LIST_HEAD(cache); | ||
516 | static unsigned int cache_num = 0; | ||
517 | #define MAX_CACHE_SIZE 10 | ||
518 | @@ -55,6 +55,7 @@ | ||
519 | int cache_add(int id, const char *name) | ||
520 | { | ||
521 | struct object *obj; | ||
522 | + unsigned long flags; | ||
523 | |||
524 | if ((obj = kmalloc(sizeof(*obj), GFP_KERNEL)) == NULL) | ||
525 | return -ENOMEM; | ||
526 | @@ -63,30 +64,33 @@ | ||
527 | obj->id = id; | ||
528 | obj->popularity = 0; | ||
529 | |||
530 | - mutex_lock(&cache_lock); | ||
531 | + spin_lock_irqsave(&cache_lock, flags); | ||
532 | __cache_add(obj); | ||
533 | - mutex_unlock(&cache_lock); | ||
534 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
535 | return 0; | ||
536 | } | ||
537 | |||
538 | void cache_delete(int id) | ||
539 | { | ||
540 | - mutex_lock(&cache_lock); | ||
541 | + unsigned long flags; | ||
542 | + | ||
543 | + spin_lock_irqsave(&cache_lock, flags); | ||
544 | __cache_delete(__cache_find(id)); | ||
545 | - mutex_unlock(&cache_lock); | ||
546 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
547 | } | ||
548 | |||
549 | int cache_find(int id, char *name) | ||
550 | { | ||
551 | struct object *obj; | ||
552 | int ret = -ENOENT; | ||
553 | + unsigned long flags; | ||
554 | |||
555 | - mutex_lock(&cache_lock); | ||
556 | + spin_lock_irqsave(&cache_lock, flags); | ||
557 | obj = __cache_find(id); | ||
558 | if (obj) { | ||
559 | ret = 0; | ||
560 | strcpy(name, obj->name); | ||
561 | } | ||
562 | - mutex_unlock(&cache_lock); | ||
563 | + spin_unlock_irqrestore(&cache_lock, flags); | ||
564 | return ret; | ||
565 | } | ||
566 | |||
Note that :c:func:`spin_lock_irqsave()` turns off interrupts if they
are on, and otherwise does nothing (if we are already in an interrupt
handler); hence these functions are safe to call from any context.
571 | |||
Unfortunately, :c:func:`cache_add()` calls :c:func:`kmalloc()`
with the ``GFP_KERNEL`` flag, which is only legal in user context. I
have assumed that :c:func:`cache_add()` is still only called in
user context; otherwise the allocation flags should become a parameter
to :c:func:`cache_add()`.
577 | |||
578 | Exposing Objects Outside This File | ||
579 | ---------------------------------- | ||
580 | |||
581 | If our objects contained more information, it might not be sufficient to | ||
582 | copy the information in and out: other parts of the code might want to | ||
583 | keep pointers to these objects, for example, rather than looking up the | ||
584 | id every time. This produces two problems. | ||
585 | |||
586 | The first problem is that we use the ``cache_lock`` to protect objects: | ||
587 | we'd need to make this non-static so the rest of the code can use it. | ||
588 | This makes locking trickier, as it is no longer all in one place. | ||
589 | |||
590 | The second problem is the lifetime problem: if another structure keeps a | ||
591 | pointer to an object, it presumably expects that pointer to remain | ||
592 | valid. Unfortunately, this is only guaranteed while you hold the lock, | ||
593 | otherwise someone might call :c:func:`cache_delete()` and even | ||
594 | worse, add another object, re-using the same address. | ||
595 | |||
596 | As there is only one lock, you can't hold it forever: no-one else would | ||
597 | get any work done. | ||
598 | |||
599 | The solution to this problem is to use a reference count: everyone who | ||
600 | has a pointer to the object increases it when they first get the object, | ||
601 | and drops the reference count when they're finished with it. Whoever | ||
602 | drops it to zero knows it is unused, and can actually delete it. | ||
603 | |||
Here is the code::

    --- cache.c.interrupt   2003-12-09 14:25:43.000000000 +1100
    +++ cache.c.refcnt  2003-12-09 14:33:05.000000000 +1100
    @@ -7,6 +7,7 @@
     struct object
     {
             struct list_head list;
    +        unsigned int refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -17,6 +18,35 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    +static void __object_put(struct object *obj)
    +{
    +        if (--obj->refcnt == 0)
    +                kfree(obj);
    +}
    +
    +static void __object_get(struct object *obj)
    +{
    +        obj->refcnt++;
    +}
    +
    +void object_put(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_put(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
    +void object_get(struct object *obj)
    +{
    +        unsigned long flags;
    +
    +        spin_lock_irqsave(&cache_lock, flags);
    +        __object_get(obj);
    +        spin_unlock_irqrestore(&cache_lock, flags);
    +}
    +
     /* Must be holding cache_lock */
     static struct object *__cache_find(int id)
     {
    @@ -35,6 +65,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    +        __object_put(obj);
             cache_num--;
     }

    @@ -63,6 +94,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    +        obj->refcnt = 1; /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -79,18 +111,15 @@
             spin_unlock_irqrestore(&cache_lock, flags);
     }

    -int cache_find(int id, char *name)
    +struct object *cache_find(int id)
     {
             struct object *obj;
    -        int ret = -ENOENT;
             unsigned long flags;

             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
    -        if (obj) {
    -                ret = 0;
    -                strcpy(name, obj->name);
    -        }
    +        if (obj)
    +                __object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
    -        return ret;
    +        return obj;
     }

We encapsulate the reference counting in the standard 'get' and 'put'
functions. Now we can return the object itself from
:c:func:`cache_find()`, which has the advantage that the user can
now sleep holding the object (eg. to :c:func:`copy_to_user()` the
name to userspace).

The other point to note is that I said a reference should be held for
every pointer to the object: thus the reference count is 1 when first
inserted into the cache. In some versions the framework does not hold a
reference count, but they are more complicated.

Using Atomic Operations For The Reference Count
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In practice, :c:type:`atomic_t` would usually be used for refcnt. There are a
number of atomic operations defined in ``include/asm/atomic.h``: these
are guaranteed to be seen atomically from all CPUs in the system, so no
lock is required. In this case, it is simpler than using spinlocks,
although for anything non-trivial using spinlocks is clearer. The
:c:func:`atomic_inc()` and :c:func:`atomic_dec_and_test()` functions
are used instead of the standard increment and decrement operators, and
the lock is no longer used to protect the reference count itself.

::

    --- cache.c.refcnt  2003-12-09 15:00:35.000000000 +1100
    +++ cache.c.refcnt-atomic   2003-12-11 15:49:42.000000000 +1100
    @@ -7,7 +7,7 @@
     struct object
     {
             struct list_head list;
    -        unsigned int refcnt;
    +        atomic_t refcnt;
             int id;
             char name[32];
             int popularity;
    @@ -18,33 +18,15 @@
     static unsigned int cache_num = 0;
     #define MAX_CACHE_SIZE 10

    -static void __object_put(struct object *obj)
    -{
    -        if (--obj->refcnt == 0)
    -                kfree(obj);
    -}
    -
    -static void __object_get(struct object *obj)
    -{
    -        obj->refcnt++;
    -}
    -
     void object_put(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_put(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        if (atomic_dec_and_test(&obj->refcnt))
    +                kfree(obj);
     }

     void object_get(struct object *obj)
     {
    -        unsigned long flags;
    -
    -        spin_lock_irqsave(&cache_lock, flags);
    -        __object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        atomic_inc(&obj->refcnt);
     }

     /* Must be holding cache_lock */
    @@ -65,7 +47,7 @@
     {
             BUG_ON(!obj);
             list_del(&obj->list);
    -        __object_put(obj);
    +        object_put(obj);
             cache_num--;
     }

    @@ -94,7 +76,7 @@
             strlcpy(obj->name, name, sizeof(obj->name));
             obj->id = id;
             obj->popularity = 0;
    -        obj->refcnt = 1; /* The cache holds a reference */
    +        atomic_set(&obj->refcnt, 1); /* The cache holds a reference */

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);
    @@ -119,7 +101,7 @@
             spin_lock_irqsave(&cache_lock, flags);
             obj = __cache_find(id);
             if (obj)
    -                __object_get(obj);
    +                object_get(obj);
             spin_unlock_irqrestore(&cache_lock, flags);
             return obj;
     }

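
The get/put pattern from the patch above can also be tried outside the
kernel. The following is only a userspace analogy under stated
assumptions: C11 ``atomic_int`` stands in for ``atomic_t``,
``atomic_fetch_add()``/``atomic_fetch_sub()`` stand in for
:c:func:`atomic_inc()`/:c:func:`atomic_dec_and_test()`, ``malloc()`` and
``free()`` replace ``kmalloc()`` and ``kfree()``, and ``object_new()`` and
the ``objects_freed`` counter are hypothetical additions for illustration.

.. code-block:: c

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdlib.h>
    #include <string.h>

    /* Userspace stand-in for the cache object: atomic_int plays atomic_t. */
    struct object {
            atomic_int refcnt;
            int id;
            char name[32];
    };

    /* Demonstration only: counts frees (this counter is not thread-safe). */
    static int objects_freed;

    /* Hypothetical constructor: the creator starts out holding one
     * reference, as the cache does after atomic_set(&obj->refcnt, 1). */
    struct object *object_new(int id, const char *name)
    {
            struct object *obj = malloc(sizeof(*obj));

            if (!obj)
                    return NULL;
            obj->id = id;
            strncpy(obj->name, name, sizeof(obj->name) - 1);
            obj->name[sizeof(obj->name) - 1] = '\0';
            atomic_init(&obj->refcnt, 1);
            return obj;
    }

    /* Like atomic_inc(): no lock is needed to take another reference. */
    void object_get(struct object *obj)
    {
            atomic_fetch_add(&obj->refcnt, 1);
    }

    /* Like atomic_dec_and_test(): atomic_fetch_sub() returns the old value,
     * so an old value of 1 means we just dropped the last reference. */
    void object_put(struct object *obj)
    {
            int old = atomic_fetch_sub(&obj->refcnt, 1);

            assert(old > 0);
            if (old == 1) {
                    free(obj);
                    objects_freed++;
            }
    }

As with the kernel version, every successful lookup must be balanced by
exactly one ``object_put()``; whichever put drops the count to zero does
the free, so no lock is needed around the count itself.
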
Protecting The Objects Themselves
---------------------------------

In these examples, we assumed that the objects (except the reference
counts) never change once they are created. If we wanted to allow the
name to change, there are three possibilities:

- You can make ``cache_lock`` non-static, and tell people to grab that
  lock before changing the name in any object.

- You can provide a :c:func:`cache_obj_rename()` which grabs this
  lock and changes the name for the caller, and tell everyone to use
  that function.

- You can make the ``cache_lock`` protect only the cache itself, and
  use another lock to protect the name.

Theoretically, you can make the locks as fine-grained as one lock for
every field, for every object. In practice, the most common variants
are:

- One lock which protects the infrastructure (the ``cache`` list in
  this example) and all the objects. This is what we have done so far.

- One lock which protects the infrastructure (including the list
  pointers inside the objects), and one lock inside the object which
  protects the rest of that object.

- Multiple locks to protect the infrastructure (eg. one lock per hash
  chain), possibly with a separate per-object lock.

Here is the "lock-per-object" implementation:

::

    --- cache.c.refcnt-atomic   2003-12-11 15:50:54.000000000 +1100
    +++ cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    @@ -6,11 +6,17 @@

     struct object
     {
    +        /* These two protected by cache_lock. */
             struct list_head list;
    +        int popularity;
    +
             atomic_t refcnt;
    +
    +        /* Doesn't change once created. */
             int id;
    +
    +        spinlock_t lock; /* Protects the name */
             char name[32];
    -        int popularity;
     };

     static DEFINE_SPINLOCK(cache_lock);
    @@ -77,6 +84,7 @@
             obj->id = id;
             obj->popularity = 0;
             atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
    +        spin_lock_init(&obj->lock);

             spin_lock_irqsave(&cache_lock, flags);
             __cache_add(obj);

Note that I decided that the popularity count should be protected by the
``cache_lock`` rather than the per-object lock: this is because it (like
the :c:type:`struct list_head <list_head>` inside the object)
is logically part of the infrastructure. This way, I don't need to grab
the lock of every object in :c:func:`__cache_add()` when seeking
the least popular.

I also decided that the id member is unchangeable, so I don't need to
grab each object lock in :c:func:`__cache_find()` to examine the
id: the object lock is only used by a caller who wants to read or write
the name field.

Note also that I added a comment describing what data was protected by
which locks. This is extremely important, as it describes the runtime
behavior of the code, which can be hard to infer from just reading it.
And as Alan Cox says, “Lock data, not code”.
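
A userspace sketch of the per-object-lock layout may help, with its
locking rules written down as comments in the way recommended above. It
assumes pthread mutexes in place of spinlocks; ``object_rename()`` and
``object_copy_name()`` are hypothetical helpers in the spirit of
:c:func:`cache_obj_rename()`, and the global cache lock is not shown.

.. code-block:: c

    #include <pthread.h>
    #include <stdio.h>

    /* Userspace stand-in for the per-object-lock struct object;
     * pthread mutexes play the role of the kernel spinlocks. */
    struct object {
            /* Protected by the global cache lock (not shown). */
            int popularity;

            /* Doesn't change once created: readable without any lock. */
            int id;

            pthread_mutex_t lock; /* Protects name[] only */
            char name[32];
    };

    /* Hypothetical writer, in the spirit of cache_obj_rename(). */
    void object_rename(struct object *obj, const char *name)
    {
            pthread_mutex_lock(&obj->lock);
            snprintf(obj->name, sizeof(obj->name), "%s", name);
            pthread_mutex_unlock(&obj->lock);
    }

    /* Hypothetical reader: copy the name out under the same lock, so it
     * never observes a half-written name. */
    void object_copy_name(struct object *obj, char *buf, size_t len)
    {
            pthread_mutex_lock(&obj->lock);
            snprintf(buf, len, "%s", obj->name);
            pthread_mutex_unlock(&obj->lock);
    }

Because ``id`` never changes, a lookup can compare it without touching
``obj->lock`` at all; only code that reads or writes ``name`` pays for
the per-object lock.
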

Common Problems
===============

Deadlock: Simple and Advanced
-----------------------------

There is a coding bug where a piece of code tries to grab a spinlock
twice: it will spin forever, waiting for the lock to be released
(spinlocks, rwlocks and mutexes are not recursive in Linux). This is
trivial to diagnose: not a
stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.

For a slightly more complex case, imagine you have a region shared by a
softirq and user context. If you use a :c:func:`spin_lock()` call
to protect it, it is possible that the user context will be interrupted
by the softirq while it holds the lock, and the softirq will then spin
forever trying to get the same lock.

Both of these are called deadlock, and as shown above, it can occur even
with a single CPU (although not on UP compiles, since spinlocks vanish
on kernel compiles with ``CONFIG_SMP``\ =n; you'll still get data
corruption in the second example).

This complete lockup is easy to diagnose: on SMP boxes the watchdog
timer or compiling with ``CONFIG_DEBUG_SPINLOCK`` set
(``include/linux/spinlock.h``) will show this up immediately when it
happens.

A more complex problem is the so-called 'deadly embrace', involving two
or more locks. Say you have a hash table: each entry in the table is a
spinlock and a chain of hashed objects. Inside a softirq handler, you
sometimes want to alter an object from one place in the hash to another:
you grab the spinlock of the old hash chain and the spinlock of the new
hash chain, delete the object from the old one, and insert it in the
new one.

There are two problems here. First, if your code ever tries to move the
object to the same chain, it will deadlock with itself as it tries to
lock it twice. Secondly, if the same softirq on another CPU is trying to
move another object in the reverse direction, the following could
happen:

+-----------------------+-----------------------+
| CPU 1                 | CPU 2                 |
+=======================+=======================+
| Grab lock A -> OK     | Grab lock B -> OK     |
+-----------------------+-----------------------+
| Grab lock B -> spin   | Grab lock A -> spin   |
+-----------------------+-----------------------+

Table: Consequences

The two CPUs will spin forever, waiting for the other to give up their
lock. It will look, smell, and feel like a crash.

Preventing Deadlock
-------------------

Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000-lock hierarchy it
will fit.
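
Where all the locks involved live in one file, as in the hash-table
example above, the textbook rule is easy to apply. The following is a
userspace sketch under stated assumptions: pthread mutexes stand in for
spinlocks, and ``table``, ``chain_unlink()`` and ``move_node()`` are
hypothetical names. It avoids both problems described earlier: it never
locks the same chain twice, and it always takes the lower-numbered chain
lock first.

.. code-block:: c

    #include <pthread.h>
    #include <stddef.h>

    #define NR_CHAINS 16

    /* Hypothetical hash table: a singly linked chain and a lock per bucket. */
    struct node {
            struct node *next;
            int key;
    };

    struct chain {
            pthread_mutex_t lock;
            struct node *head;
    };

    static struct chain table[NR_CHAINS];

    void table_init(void)
    {
            for (size_t i = 0; i < NR_CHAINS; i++) {
                    pthread_mutex_init(&table[i].lock, NULL);
                    table[i].head = NULL;
            }
    }

    /* Caller holds c->lock.  Returns 1 if n was found and unlinked. */
    static int chain_unlink(struct chain *c, struct node *n)
    {
            for (struct node **pp = &c->head; *pp; pp = &(*pp)->next) {
                    if (*pp == n) {
                            *pp = n->next;
                            return 1;
                    }
            }
            return 0;
    }

    /* Move n from chain 'from' to chain 'to'.  Returns 0 if n was not on
     * 'from'; a same-chain move is a no-op and returns 1 unchecked. */
    int move_node(struct node *n, size_t from, size_t to)
    {
            size_t first = from < to ? from : to;
            size_t second = from < to ? to : from;
            int found;

            /* Problem 1: locking the same chain twice would self-deadlock,
             * and there is nothing to move anyway. */
            if (from == to)
                    return 1;

            /* Problem 2: always take the lower-numbered lock first, so two
             * threads moving objects in opposite directions cannot each hold
             * one lock while spinning on the other. */
            pthread_mutex_lock(&table[first].lock);
            pthread_mutex_lock(&table[second].lock);
            found = chain_unlink(&table[from], n);
            if (found) {
                    n->next = table[to].head;
                    table[to].head = n;
            }
            pthread_mutex_unlock(&table[second].lock);
            pthread_mutex_unlock(&table[first].lock);
            return found;
    }

The ordering rule only holds because every path that takes two chain
locks lives in this one file and follows it; that is the encapsulation
argument made in the paragraphs that follow.
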

The best locks are encapsulated: they never get exposed in headers, and
are never held around calls to non-trivial functions outside the same
file. You can read through this code and see that it will never
deadlock, because it never tries to grab another lock while it has that
one. People using your code don't even need to know you are using a
lock.

A classic problem here is when you provide callbacks or hooks: if you
call these with the lock held, you risk simple deadlock, or a deadly
embrace (who knows what the callback will do?). Remember, the other
programmers are out to get you, so don't do this.

Overzealous Prevention Of Deadlocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Deadlocks are problematic, but not as bad as data corruption. Code which
grabs a read lock, searches a list, fails to find what it wants, drops
the read lock, grabs a write lock and inserts the object has a race
condition.

If you don't see why, please stay the fuck away from my code.
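
The race is in the window between dropping the read lock and acquiring
the write lock: another thread can insert the same object there. One
common fix is to search again once the write lock is held. This is a
userspace sketch assuming POSIX ``pthread_rwlock_t`` in place of the
kernel's ``rwlock_t``; ``find_locked()`` and ``find_or_insert()`` are
hypothetical names, and items are never freed so returned pointers stay
valid.

.. code-block:: c

    #include <pthread.h>
    #include <stdlib.h>

    /* Hypothetical list protected by a reader/writer lock. */
    struct item {
            struct item *next;
            int id;
    };

    static pthread_rwlock_t list_lock = PTHREAD_RWLOCK_INITIALIZER;
    static struct item *list_head;

    /* Caller must hold list_lock, for reading or for writing. */
    static struct item *find_locked(int id)
    {
            for (struct item *i = list_head; i; i = i->next)
                    if (i->id == id)
                            return i;
            return NULL;
    }

    struct item *find_or_insert(int id)
    {
            struct item *it;

            pthread_rwlock_rdlock(&list_lock);
            it = find_locked(id);
            pthread_rwlock_unlock(&list_lock);
            if (it)
                    return it;

            pthread_rwlock_wrlock(&list_lock);
            /* Another thread may have inserted the same id after our
             * unlock: inserting blindly here is the race.  Search again
             * now that we hold the write lock. */
            it = find_locked(id);
            if (!it) {
                    it = malloc(sizeof(*it));
                    if (it) {
                            it->id = id;
                            it->next = list_head;
                            list_head = it;
                    }
            }
            pthread_rwlock_unlock(&list_lock);
            return it;
    }
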

Racing Timers: A Kernel Pastime
-------------------------------

Timers can produce their own special problems with races. Consider a
collection of objects (list, hash, etc.) where each object has a timer
which is due to destroy it.

If you want to destroy the entire collection (say on module removal),
you might do the following::

    /* THIS CODE BAD BAD BAD BAD: IF IT WAS ANY WORSE IT WOULD USE
       HUNGARIAN NOTATION */
    spin_lock_bh(&list_lock);

    while (list) {
            struct foo *next = list->next;
            del_timer(&list->timer);
            kfree(list);
            list = next;
    }

    spin_unlock_bh(&list_lock);

Sooner or later, this will crash on SMP, because a timer can have just
gone off before the :c:func:`spin_lock_bh()`, and it will only get
the lock after we :c:func:`spin_unlock_bh()`, and then try to free
the element (which has already been freed!).

This can be avoided by checking the result of
:c:func:`del_timer()`: if it returns 1, the timer has been deleted.
If 0, it means (in this case) that it is currently running, so we can
do::

    retry:
            spin_lock_bh(&list_lock);

            while (list) {
                    struct foo *next = list->next;
                    if (!del_timer(&list->timer)) {
                            /* Give timer a chance to delete this */
                            spin_unlock_bh(&list_lock);
                            goto retry;
                    }
                    kfree(list);
                    list = next;
            }

            spin_unlock_bh(&list_lock);

Another common problem is deleting timers which restart themselves (by
calling :c:func:`add_timer()` at the end of their timer function).
Because this is a fairly common case which is prone to races, you should
use :c:func:`del_timer_sync()` (``include/linux/timer.h``) to
handle this case. It waits for a running timer function to finish, and
returns nonzero if it deactivated a pending timer.

Locking Speed
=============

There are three main things to worry about when considering speed of
some code which does locking. First is concurrency: how many things are
going to be waiting while someone else is holding a lock. Second is the
time taken to actually acquire and release an uncontended lock. Third is
using fewer or smarter locks. I'm assuming that the lock is used fairly
often: otherwise, you wouldn't be concerned about efficiency.

Concurrency depends on how long the lock is usually held: you should
hold the lock for as long as needed, but no longer. In the cache
example, we always create the object without the lock held, and then
grab the lock only when we are ready to insert it in the list.

Acquisition times depend on how much damage the lock operations do to
the pipeline (pipeline stalls) and how likely it is that this CPU was
the last one to grab the lock (ie. is the lock cache-hot for this CPU):
on a machine with more CPUs, this likelihood drops fast. Consider a
700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
increment takes about 58ns, a lock which is cache-hot on this CPU takes
160ns, and a cacheline transfer from another CPU takes an additional 170
to 360ns. (These figures are from Paul McKenney's `Linux Journal RCU
article <http://www.linuxjournal.com/article.php?sid=6993>`__).

These two aims conflict: holding a lock for a short time might be done
by splitting locks into parts (such as in our final per-object-lock
example), but this increases the number of lock acquisitions, and the
results are often slower than having a single lock. This is another
reason to advocate locking simplicity.

The third concern is addressed below: there are some methods to reduce
the amount of locking which needs to be done.

Read/Write Lock Variants
------------------------

Both spinlocks and mutexes have read/write variants: ``rwlock_t`` and
:c:type:`struct rw_semaphore <rw_semaphore>`. These divide
users into two classes: the readers and the writers. If you are only
reading the data, you can get a read lock, but to write to the data you
need the write lock. Many people can hold a read lock, but a writer must
be the sole holder.

If your code divides neatly along reader/writer lines (as our cache code
does), and the lock is held by readers for significant lengths of time,
using these locks can help. They are slightly slower than the normal
locks though, so in practice ``rwlock_t`` is not usually worthwhile.

Avoiding Locks: Read Copy Update
--------------------------------

There is a special method of read/write locking called Read Copy Update.
Using RCU, the readers can avoid taking a lock altogether: as we expect
our cache to be read more often than updated (otherwise the cache is a
waste of time), it is a candidate for this optimization.

How do we get rid of read locks? Getting rid of read locks means that
writers may be changing the list underneath the readers. That is
actually quite simple: we can read a linked list while an element is
being added if the writer adds the element very carefully. For example,
adding ``new`` to a singly linked list called ``list``::

    new->next = list->next;
    wmb();
    list->next = new;

The :c:func:`wmb()` is a write memory barrier. It ensures that the
first operation (setting the new element's ``next`` pointer) is complete
and will be seen by all CPUs, before the second operation is (putting
the new element into the list). This is important, since modern
compilers and modern CPUs can both reorder instructions unless told
otherwise: we want a reader to either not see the new element at all, or
see the new element with the ``next`` pointer correctly pointing at the
rest of the list.
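
The same publish-after-initialise ordering can be expressed in userspace
with C11 atomics: a release store plays the part of :c:func:`wmb()`
followed by the pointer update, and each reader pairs it with an acquire
load. This is an analogy under those assumptions, not the kernel API;
``struct elem``, ``publish_after()`` and ``sum_after()`` are
hypothetical, and the sketch covers insertion only, not the question of
when a removed element may be freed.

.. code-block:: c

    #include <stdatomic.h>
    #include <stddef.h>

    struct elem {
            int value;
            struct elem *_Atomic next;
    };

    /* Single writer: link new_elem in after pos.  The caller must have
     * filled in new_elem->value before calling. */
    void publish_after(struct elem *pos, struct elem *new_elem)
    {
            struct elem *succ = atomic_load_explicit(&pos->next,
                                                     memory_order_relaxed);

            /* Step 1: point the new element at the rest of the list. */
            atomic_store_explicit(&new_elem->next, succ, memory_order_relaxed);
            /* Step 2: make it reachable.  Release ordering guarantees a
             * reader who sees new_elem also sees its value and next. */
            atomic_store_explicit(&pos->next, new_elem, memory_order_release);
    }

    /* Lock-free reader: sums the values of every element after head. */
    int sum_after(struct elem *head)
    {
            int sum = 0;

            for (struct elem *e = atomic_load_explicit(&head->next,
                                                       memory_order_acquire);
                 e != NULL;
                 e = atomic_load_explicit(&e->next, memory_order_acquire))
                    sum += e->value;
            return sum;
    }

A concurrent reader either misses the new element entirely or reaches
it with a valid ``next`` pointer, which is the property the text asks of
:c:func:`wmb()`.
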

Fortunately, there is a function to do this for standard
:c:type:`struct list_head <list_head>` lists:
:c:func:`list_add_rcu()` (``include/linux/list.h``).

Removing an element from the list is even simpler: we replace the
pointer to the old element with a pointer to its successor, and readers
will either see it, or skip over it.

::

    list->next = old->next;

There is :c:func:`list_del_rcu()` (``include/linux/list.h``) which
does this (the normal version poisons the old object, which we don't
want).

The reader must also be careful: some CPUs can look through the ``next``
pointer to start reading the contents of the next element early, but
don't realize that the pre-fetched contents are wrong when the ``next``
pointer changes underneath them. Once again, there is a
:c:func:`list_for_each_entry_rcu()` (``include/linux/list.h``)
to help you. Of course, writers can just use
:c:func:`list_for_each_entry()`, since there cannot be two
simultaneous writers.

Our final dilemma is this: when can we actually destroy the removed
element? Remember, a reader might be stepping through this element in
the list right now: if we free this element and the ``next`` pointer
changes, the reader will jump off into garbage and crash. We need to
wait until we know that all the readers who were traversing the list
when we deleted the element are finished. We use
:c:func:`call_rcu()` to register a callback which will actually
destroy the object once all pre-existing readers are finished.
Alternatively, :c:func:`synchronize_rcu()` may be used to block
until all pre-existing readers are finished.

But how does Read Copy Update know when the readers are finished? The
method is this: firstly, the readers always traverse the list inside
:c:func:`rcu_read_lock()`/:c:func:`rcu_read_unlock()` pairs:
these simply disable preemption (in the classic, non-preemptible RCU
implementation) so the reader won't go to sleep while reading the list.

RCU then waits until every other CPU has slept at least once: since
readers cannot sleep, we know that any readers which were traversing the
list during the deletion are finished, and the callback is triggered.
The real Read Copy Update code is a little more optimized than this, but
this is the fundamental idea.

::

    --- cache.c.perobjectlock   2003-12-11 17:15:03.000000000 +1100
    +++ cache.c.rcupdate    2003-12-11 17:55:14.000000000 +1100
    @@ -1,15 +1,18 @@
     #include <linux/list.h>
     #include <linux/slab.h>
     #include <linux/string.h>
    +#include <linux/rcupdate.h>
     #include <linux/mutex.h>
     #include <asm/errno.h>

     struct object
     {
    -        /* These two protected by cache_lock. */
    +        /* This is protected by RCU */
             struct list_head list;
             int popularity;

    +        struct rcu_head rcu;
    +
             atomic_t refcnt;

             /* Doesn't change once created. */
    @@ -40,7 +43,7 @@
     {
             struct object *i;

    -        list_for_each_entry(i, &cache, list) {
    +        list_for_each_entry_rcu(i, &cache, list) {
                     if (i->id == id) {
                             i->popularity++;
                             return i;
    @@ -49,19 +52,25 @@
             return NULL;
     }

    +/* Final discard done once we know no readers are looking. */
    +static void cache_delete_rcu(struct rcu_head *rcu)
    +{
    +        object_put(container_of(rcu, struct object, rcu));
    +}
    +
     /* Must be holding cache_lock */
     static void __cache_delete(struct object *obj)
     {
             BUG_ON(!obj);
    -        list_del(&obj->list);
    -        object_put(obj);
    +        list_del_rcu(&obj->list);
             cache_num--;
    +        call_rcu(&obj->rcu, cache_delete_rcu);
     }

     /* Must be holding cache_lock */
     static void __cache_add(struct object *obj)
     {
    -        list_add(&obj->list, &cache);
    +        list_add_rcu(&obj->list, &cache);
             if (++cache_num > MAX_CACHE_SIZE) {
                     struct object *i, *outcast = NULL;
                     list_for_each_entry(i, &cache, list) {
    @@ -104,12 +114,11 @@
     struct object *cache_find(int id)
     {
             struct object *obj;
    -        unsigned long flags;

    -        spin_lock_irqsave(&cache_lock, flags);
    +        rcu_read_lock();
             obj = __cache_find(id);
             if (obj)
                     object_get(obj);
    -        spin_unlock_irqrestore(&cache_lock, flags);
    +        rcu_read_unlock();
             return obj;
     }

Note that the reader will alter the popularity member in
:c:func:`__cache_find()`, and now it doesn't hold a lock. One
solution would be to make it an ``atomic_t``, but for this usage, we
don't really care about races: an approximate result is good enough, so
I didn't change it.

The result is that :c:func:`cache_find()` requires no
synchronization with any other functions, so it is almost as fast on SMP
as it would be on UP.

There is a further optimization possible here: remember our original
cache code, where there were no reference counts and the caller simply
held the lock whenever using the object? This is still possible: if you
hold the lock, no one can delete the object, so you don't need to get
and put the reference count.

Now, because the 'read lock' in RCU is simply disabling preemption, a
caller which always has preemption disabled between calling
:c:func:`cache_find()` and :c:func:`object_put()` does not
need to actually get and put the reference count: we could expose
:c:func:`__cache_find()` by making it non-static, and such
callers could simply call that.

The benefit here is that the reference count is not written to: the
object is not altered in any way, which is much faster on SMP machines
due to caching.

Per-CPU Data
------------

Another technique for avoiding locking which is used fairly widely is to
duplicate information for each CPU. For example, if you wanted to keep a
count of a common condition, you could use a spin lock and a single
counter. Nice and simple.

If that was too slow (it's usually not, but if you've got a really big
machine to test on and can show that it is), you could instead use a
counter for each CPU, so that none of them needs an exclusive lock. See
:c:func:`DEFINE_PER_CPU()`, :c:func:`get_cpu_var()` and
:c:func:`put_cpu_var()` (``include/linux/percpu.h``).

Of particular use for simple per-cpu counters is the ``local_t`` type,
and the :c:func:`local_inc()` and related functions, which are
more efficient than simple code on some architectures
(``include/asm/local.h``).

Note that there is no simple, reliable way of getting an exact value of
such a counter, without introducing more locks. This is not a problem
for some uses.
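
Userspace has no :c:func:`DEFINE_PER_CPU()`, but the idea can be sketched
as an array of counters, one per slot, each padded to its own cache
line. The slot count (``NR_SLOTS``), the 64-byte line size and the
``counter_inc()``/``counter_read()`` names are assumptions for
illustration only; in the kernel each CPU would update its own copy
through the per-CPU API instead.

.. code-block:: c

    #include <stdatomic.h>
    #include <stddef.h>

    #define NR_SLOTS 8            /* assumption: stands in for the CPU count */
    #define CACHELINE_SIZE 64     /* assumption: typical cache-line size */

    /* One counter per slot, each on its own cache line, so increments
     * from different threads do not bounce a shared line between CPUs. */
    struct slot_counter {
            _Alignas(CACHELINE_SIZE) atomic_ulong count;
    };

    static struct slot_counter counters[NR_SLOTS];

    /* Fast path: no lock.  Relaxed atomics only avoid a C11 data race if
     * a slot is shared or read concurrently; they impose no ordering. */
    void counter_inc(unsigned int slot)
    {
            atomic_fetch_add_explicit(&counters[slot % NR_SLOTS].count, 1,
                                      memory_order_relaxed);
    }

    /* Slow path: sum every slot.  The result is only approximate while
     * other threads are still incrementing. */
    unsigned long counter_read(void)
    {
            unsigned long sum = 0;

            for (size_t i = 0; i < NR_SLOTS; i++)
                    sum += atomic_load_explicit(&counters[i].count,
                                                memory_order_relaxed);
            return sum;
    }

The trade-off is the one described above: increments stay cheap and
local, while reading an exact total would need extra locking.
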

Data Which Is Mostly Used By An IRQ Handler
-------------------------------------------

If data is always accessed from within the same IRQ handler, you don't
need a lock at all: the kernel already guarantees that the irq handler
will not run simultaneously on multiple CPUs.

Manfred Spraul points out that you can still do this, even if the data
is very occasionally accessed in user context or softirqs/tasklets. The
irq handler doesn't use a lock, and all other accesses are done like so::

    spin_lock(&lock);
    disable_irq(irq);
    ...
    enable_irq(irq);
    spin_unlock(&lock);

The :c:func:`disable_irq()` prevents the irq handler from running
(and waits for it to finish if it's currently running on other CPUs).
The spinlock prevents any other accesses happening at the same time.
Naturally, this is slower than just a :c:func:`spin_lock_irq()`
call, so it only makes sense if this type of access happens extremely
rarely.

What Functions Are Safe To Call From Interrupts?
================================================

Many functions in the kernel sleep (ie. call schedule()) directly or
indirectly: you can never call them while holding a spinlock, or with
preemption disabled. This also means you need to be in user context:
calling them from an interrupt is illegal.

Some Functions Which Sleep
--------------------------

The most common ones are listed below, but you usually have to read the
code to find out if other calls are safe. If everyone else who calls it
can sleep, you probably need to be able to sleep, too. In particular,
registration and deregistration functions usually expect to be called
from user context, and can sleep.

- Accesses to userspace:

  - :c:func:`copy_from_user()`

  - :c:func:`copy_to_user()`

  - :c:func:`get_user()`

  - :c:func:`put_user()`

- :c:func:`kmalloc(GFP_KERNEL) <kmalloc>`

- :c:func:`mutex_lock_interruptible()` and
  :c:func:`mutex_lock()`

  There is a :c:func:`mutex_trylock()` which does not sleep.
  Still, it must not be used inside interrupt context since its
  implementation is not safe for that. :c:func:`mutex_unlock()`
  will also never sleep. It cannot be used in interrupt context either
  since a mutex must be released by the same task that acquired it.

Some Functions Which Don't Sleep
--------------------------------

Some functions are safe to call from any context, or holding almost any
lock.

- :c:func:`printk()`

- :c:func:`kfree()`

- :c:func:`add_timer()` and :c:func:`del_timer()`

Mutex API reference
===================

.. kernel-doc:: include/linux/mutex.h
   :internal:

.. kernel-doc:: kernel/locking/mutex.c
   :export:

Futex API reference
===================

.. kernel-doc:: kernel/futex.c
   :internal:

Further reading
===============

- ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking
  tutorial in the kernel sources.

- Unix Systems for Modern Architectures: Symmetric Multiprocessing and
  Caching for Kernel Programmers:

  Curt Schimmel's very good introduction to kernel level locking (not
  written for Linux, but nearly everything applies). The book is
  expensive, but really worth every penny to understand SMP locking.
  [ISBN: 0201633388]

Thanks
======

Thanks to Telsa Gwynne for DocBooking, neatening and adding style.

Thanks to Martin Pool, Philipp Rumpf, Stephen Rothwell, Paul Mackerras,
Ruedi Aschwanden, Alan Cox, Manfred Spraul, Tim Waugh, Pete Zaitcev,
James Morris, Robert Love, Paul McKenney, John Ashby for proofreading,
correcting, flaming, commenting.

Thanks to the cabal for having no influence on this document.

Glossary
========

preemption
    Prior to 2.5, or when ``CONFIG_PREEMPT`` is unset, processes in user
    context inside the kernel would not preempt each other (ie. you had that
1394 | CPU until you gave it up, except for interrupts). With the addition of | ||
1395 | ``CONFIG_PREEMPT`` in 2.5.4, this changed: when in user context, higher | ||
1396 | priority tasks can "cut in": spinlocks were changed to disable | ||
1397 | preemption, even on UP. | ||
1398 | |||
1399 | bh | ||
1400 | Bottom Half: for historical reasons, functions with '_bh' in them often | ||
1401 | now refer to any software interrupt, e.g. :c:func:`spin_lock_bh()` | ||
1402 | blocks any software interrupt on the current CPU. Bottom halves are | ||
1403 | deprecated, and will eventually be replaced by tasklets. Only one bottom | ||
1404 | half will be running at any time. | ||
1405 | |||
1406 | Hardware Interrupt / Hardware IRQ | ||
1407 | Hardware interrupt request. :c:func:`in_irq()` returns true in a | ||
1408 | hardware interrupt handler. | ||
1409 | |||
1410 | Interrupt Context | ||
1411 | Not user context: processing a hardware irq or software irq. Indicated | ||
1412 | by the :c:func:`in_interrupt()` macro returning true. | ||
1413 | |||
1414 | SMP | ||
1415 | Symmetric Multi-Processor: kernels compiled for multiple-CPU machines. | ||
1416 | (``CONFIG_SMP=y``). | ||
1417 | |||
1418 | Software Interrupt / softirq | ||
1419 | Software interrupt handler. :c:func:`in_irq()` returns false; | ||
1420 | :c:func:`in_softirq()` returns true. Tasklets and softirqs both | ||
1421 | fall into the category of 'software interrupts'. | ||
1422 | |||
1423 | Strictly speaking a softirq is one of up to 32 enumerated software | ||
1424 | interrupts which can run on multiple CPUs at once. Sometimes used to | ||
1425 | refer to tasklets as well (ie. all software interrupts). | ||
1426 | |||
1427 | tasklet | ||
1428 | A dynamically-registrable software interrupt, which is guaranteed to | ||
1429 | only run on one CPU at a time. | ||
1430 | |||
1431 | timer | ||
1432 | A dynamically-registrable software interrupt, which is run at (or close | ||
1433 | to) a given time. When running, it is just like a tasklet (in fact, they | ||
1434 | are called from the ``TIMER_SOFTIRQ``). | ||
1435 | |||
1436 | UP | ||
1437 | Uni-Processor: Non-SMP. (``CONFIG_SMP=n``). | ||
1438 | |||
1439 | User Context | ||
1440 | The kernel executing on behalf of a particular process (ie. a system | ||
1441 | call or trap) or kernel thread. You can tell which process with the | ||
1442 | ``current`` macro. Not to be confused with userspace. Can be | ||
1443 | interrupted by software or hardware interrupts. | ||
1444 | |||
1445 | Userspace | ||
1446 | A process executing its own code outside the kernel. | ||
diff --git a/Documentation/lsm.txt b/Documentation/lsm.txt new file mode 100644 index 000000000000..ad4dfd020e0d --- /dev/null +++ b/Documentation/lsm.txt | |||
@@ -0,0 +1,201 @@ | |||
1 | ======================================================== | ||
2 | Linux Security Modules: General Security Hooks for Linux | ||
3 | ======================================================== | ||
4 | |||
5 | :Author: Stephen Smalley | ||
6 | :Author: Timothy Fraser | ||
7 | :Author: Chris Vance | ||
8 | |||
9 | .. note:: | ||
10 | |||
11 | The APIs described in this document are outdated. | ||
12 | |||
13 | Introduction | ||
14 | ============ | ||
15 | |||
16 | In March 2001, the National Security Agency (NSA) gave a presentation | ||
17 | about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit. | ||
18 | SELinux is an implementation of flexible and fine-grained | ||
19 | nondiscretionary access controls in the Linux kernel, originally | ||
20 | implemented as its own particular kernel patch. Several other security | ||
21 | projects (e.g. RSBAC, Medusa) have also developed flexible access | ||
22 | control architectures for the Linux kernel, and various projects have | ||
23 | developed particular access control models for Linux (e.g. LIDS, DTE, | ||
24 | SubDomain). Each project has developed and maintained its own kernel | ||
25 | patch to support its security needs. | ||
26 | |||
27 | In response to the NSA presentation, Linus Torvalds made a set of | ||
28 | remarks that described a security framework he would be willing to | ||
29 | consider for inclusion in the mainstream Linux kernel. He described a | ||
30 | general framework that would provide a set of security hooks to control | ||
31 | operations on kernel objects and a set of opaque security fields in | ||
32 | kernel data structures for maintaining security attributes. This | ||
33 | framework could then be used by loadable kernel modules to implement any | ||
34 | desired model of security. Linus also suggested the possibility of | ||
35 | migrating the Linux capabilities code into such a module. | ||
36 | |||
37 | The Linux Security Modules (LSM) project was started by WireX to develop | ||
38 | such a framework. LSM is a joint development effort by several security | ||
39 | projects, including Immunix, SELinux, SGI and Janus, and several | ||
40 | individuals, including Greg Kroah-Hartman and James Morris, to develop a | ||
41 | Linux kernel patch that implements this framework. The patch is | ||
42 | currently tracking the 2.4 series and is targeted for integration into | ||
43 | the 2.5 development series. This technical report provides an overview | ||
44 | of the framework and the example capabilities security module provided | ||
45 | by the LSM kernel patch. | ||
46 | |||
47 | LSM Framework | ||
48 | ============= | ||
49 | |||
50 | The LSM kernel patch provides a general kernel framework to support | ||
51 | security modules. In particular, the LSM framework is primarily focused | ||
52 | on supporting access control modules, although future development is | ||
53 | likely to address other security needs such as auditing. By itself, the | ||
54 | framework does not provide any additional security; it merely provides | ||
55 | the infrastructure to support security modules. The LSM kernel patch | ||
56 | also moves most of the capabilities logic into an optional security | ||
57 | module, with the system defaulting to the traditional superuser logic. | ||
58 | This capabilities module is discussed further in | ||
59 | `LSM Capabilities Module`_. | ||
60 | |||
61 | The LSM kernel patch adds security fields to kernel data structures and | ||
62 | inserts calls to hook functions at critical points in the kernel code to | ||
63 | manage the security fields and to perform access control. It also adds | ||
64 | functions for registering and unregistering security modules, and adds a | ||
65 | general :c:func:`security()` system call to support new system calls | ||
66 | for security-aware applications. | ||
67 | |||
68 | The LSM security fields are simply ``void*`` pointers. For process and | ||
69 | program execution security information, security fields were added to | ||
70 | :c:type:`struct task_struct <task_struct>` and | ||
71 | :c:type:`struct linux_binprm <linux_binprm>`. For filesystem | ||
72 | security information, a security field was added to :c:type:`struct | ||
73 | super_block <super_block>`. For pipe, file, and socket security | ||
74 | information, security fields were added to :c:type:`struct inode | ||
75 | <inode>` and :c:type:`struct file <file>`. For packet and | ||
76 | network device security information, security fields were added to | ||
77 | :c:type:`struct sk_buff <sk_buff>` and :c:type:`struct | ||
78 | net_device <net_device>`. For System V IPC security information, | ||
79 | security fields were added to :c:type:`struct kern_ipc_perm | ||
80 | <kern_ipc_perm>` and :c:type:`struct msg_msg | ||
81 | <msg_msg>`; additionally, the definitions for :c:type:`struct | ||
82 | msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel | ||
83 | were moved to header files (``include/linux/msg.h`` and | ||
84 | ``include/linux/shm.h`` as appropriate) to allow the security modules to | ||
85 | use these definitions. | ||
86 | |||
87 | Each LSM hook is a function pointer in a global table, security_ops. | ||
88 | This table is a :c:type:`struct security_operations | ||
89 | <security_operations>` structure as defined by | ||
90 | ``include/linux/security.h``. Detailed documentation for each hook is | ||
91 | included in this header file. At present, this structure consists of a | ||
92 | collection of substructures that group related hooks based on the kernel | ||
93 | object (e.g. task, inode, file, sk_buff, etc.) as well as some top-level | ||
94 | hook function pointers for system operations. This structure is likely | ||
95 | to be flattened in the future for performance. The placement of the hook | ||
96 | calls in the kernel code is described by the "called:" lines in the | ||
97 | per-hook documentation in the header file. The hook calls can also be | ||
98 | easily found in the kernel code by looking for the string | ||
99 | "security_ops->". | ||
100 | |||
101 | Linus mentioned per-process security hooks in his original remarks as a | ||
102 | possible alternative to global security hooks. However, if LSM were to | ||
103 | start from the perspective of per-process hooks, then the base framework | ||
104 | would have to deal with how to handle operations that involve multiple | ||
105 | processes (e.g. kill), since each process might have its own hook for | ||
106 | controlling the operation. This would require a general mechanism for | ||
107 | composing hooks in the base framework. Additionally, LSM would still | ||
108 | need global hooks for operations that have no process context (e.g. | ||
109 | network input operations). Consequently, LSM provides global security | ||
110 | hooks, but a security module is free to implement per-process hooks | ||
111 | (where that makes sense) by storing a security_ops table in each | ||
112 | process' security field and then invoking these per-process hooks from | ||
113 | the global hooks. The problem of composition is thus deferred to the | ||
114 | module. | ||
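Here is a minimal model of the approach described above. It assumes a module that keeps a per-process hook table in each task's opaque security field. All names and signatures are hypothetical. The point is that the module's global hook, not the framework, decides how the sender's and target's hooks compose. This sketch requires both of them to allow the operation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct task;

/* Hypothetical per-process hook table stored in task->security. */
struct per_task_ops {
	int (*kill)(const struct task *sender, const struct task *target);
};

struct task {
	int pid;
	void *security;		/* opaque field owned by the security module */
};

static int allow_all(const struct task *s, const struct task *t)
{
	(void)s;
	(void)t;
	return 0;
}

/* A guarded task may only send signals to, or receive them from, itself. */
static int protect_self(const struct task *s, const struct task *t)
{
	return s == t ? 0 : -EPERM;
}

static const struct per_task_ops open_ops = { .kill = allow_all };
static const struct per_task_ops guarded_ops = { .kill = protect_self };

/*
 * Global hook: composition policy lives in the module.
 * Here both the sender's and the target's hooks must allow the signal.
 */
static int global_task_kill(const struct task *sender, const struct task *target)
{
	const struct per_task_ops *s_ops = sender->security;
	const struct per_task_ops *t_ops = target->security;
	int rc;

	if (s_ops && (rc = s_ops->kill(sender, target)) != 0)
		return rc;
	if (t_ops && (rc = t_ops->kill(sender, target)) != 0)
		return rc;
	return 0;
}
```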
115 | |||
116 | The global security_ops table is initialized to a set of hook functions | ||
117 | provided by a dummy security module that provides traditional superuser | ||
118 | logic. A :c:func:`register_security()` function (in | ||
119 | ``security/security.c``) is provided to allow a security module to set | ||
120 | security_ops to refer to its own hook functions, and an | ||
121 | :c:func:`unregister_security()` function is provided to revert | ||
122 | security_ops to the dummy module hooks. This mechanism is used to set | ||
123 | the primary security module, which is responsible for making the final | ||
124 | decision for each hook. | ||
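The dummy/primary arrangement can be sketched in userspace C as follows. This is a simplified model of the scheme described here, not the kernel's code. The table has a single hypothetical hook. `register_security()` refuses a second primary module, and `unregister_security()` only accepts the module currently installed. The error codes are illustrative. `MAY_WRITE` uses the kernel's value, but is defined locally for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAY_WRITE 2	/* local copy for this sketch */

/* Simplified table: the real one held many hooks. */
struct security_operations {
	const char *name;
	int (*inode_permission)(int uid, int mask);
};

/* Dummy module: defer to the traditional Unix checks (modelled as "allow"). */
static int dummy_inode_permission(int uid, int mask)
{
	(void)uid;
	(void)mask;
	return 0;
}

static struct security_operations dummy_security_ops = {
	.name = "dummy",
	.inode_permission = dummy_inode_permission,
};

/* The global table every hook call goes through. */
static struct security_operations *security_ops = &dummy_security_ops;

static int register_security(struct security_operations *ops)
{
	if (ops == NULL || ops->inode_permission == NULL)
		return -EINVAL;		/* basic sanity check */
	if (security_ops != &dummy_security_ops)
		return -EAGAIN;		/* a primary module is already installed */
	security_ops = ops;
	return 0;
}

static int unregister_security(struct security_operations *ops)
{
	if (ops != security_ops)
		return -EINVAL;		/* only the current primary may leave */
	security_ops = &dummy_security_ops;
	return 0;
}

/* A hook call site in "kernel code". */
static int security_inode_permission(int uid, int mask)
{
	return security_ops->inode_permission(uid, mask);
}

/* Hypothetical primary module: only root may write. */
static int ro_inode_permission(int uid, int mask)
{
	return (uid != 0 && (mask & MAY_WRITE)) ? -EACCES : 0;
}

static struct security_operations ro_ops = {
	.name = "readonly",
	.inode_permission = ro_inode_permission,
};
```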
125 | |||
126 | LSM also provides a simple mechanism for stacking additional security | ||
127 | modules with the primary security module. It defines | ||
128 | :c:func:`register_security()` and | ||
129 | :c:func:`unregister_security()` hooks in the :c:type:`struct | ||
130 | security_operations <security_operations>` structure and | ||
131 | provides :c:func:`mod_reg_security()` and | ||
132 | :c:func:`mod_unreg_security()` functions that invoke these hooks | ||
133 | after performing some sanity checking. A security module can call these | ||
134 | functions in order to stack with other modules. However, the actual | ||
135 | details of how this stacking is handled are deferred to the module, | ||
136 | which can implement these hooks in any way it wishes (including always | ||
137 | returning an error if it does not wish to support stacking). In this | ||
138 | manner, LSM again defers the problem of composition to the module. | ||
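The stacking hand-off can be modelled the same way. In this hypothetical sketch, the framework's `mod_reg_security()` only performs sanity checks and then calls the primary module's `register_security` hook. The primary accepts exactly one secondary module and consults it after its own check. That is just one composition policy a module might choose. The `MAY_*` values mirror the kernel's but are defined locally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAY_EXEC  1	/* local copies for this sketch */
#define MAY_WRITE 2
#define MAY_READ  4

/* Simplified table; the real hooks and checks differed. */
struct security_operations {
	int (*file_permission)(int mask);
	int (*register_security)(const char *name, struct security_operations *ops);
};

/* Primary module: denies writes, supports one stacked module. */
static struct security_operations *secondary_ops;

static int primary_file_permission(int mask)
{
	if (mask & MAY_WRITE)
		return -EACCES;
	/* Composition policy chosen by the primary: ask the secondary too. */
	return secondary_ops ? secondary_ops->file_permission(mask) : 0;
}

static int primary_register_security(const char *name,
				      struct security_operations *ops)
{
	(void)name;
	if (secondary_ops)
		return -EINVAL;		/* room for only one stacked module */
	secondary_ops = ops;
	return 0;
}

static struct security_operations primary_ops = {
	.file_permission = primary_file_permission,
	.register_security = primary_register_security,
};

static struct security_operations *security_ops = &primary_ops;

/* Framework side: sanity-check, then let the primary module decide. */
static int mod_reg_security(const char *name, struct security_operations *ops)
{
	if (name == NULL || name[0] == '\0' || ops == NULL ||
	    ops == security_ops || ops->file_permission == NULL)
		return -EINVAL;
	return security_ops->register_security(name, ops);
}

/* Secondary module: denies reads. */
static int secondary_file_permission(int mask)
{
	return (mask & MAY_READ) ? -EACCES : 0;
}

static struct security_operations secondary = {
	.file_permission = secondary_file_permission,
};
```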
139 | |||
140 | Although the LSM hooks are organized into substructures based on kernel | ||
141 | object, all of the hooks can be viewed as falling into two major | ||
142 | categories: hooks that are used to manage the security fields and hooks | ||
143 | that are used to perform access control. Examples of the first category | ||
144 | of hooks include the :c:func:`alloc_security()` and | ||
145 | :c:func:`free_security()` hooks defined for each kernel data | ||
146 | structure that has a security field. These hooks are used to allocate | ||
147 | and free security structures for kernel objects. The first category of | ||
148 | hooks also includes hooks that set information in the security field | ||
149 | after allocation, such as the :c:func:`post_lookup()` hook in | ||
150 | :c:type:`struct inode_security_ops <inode_security_ops>`. | ||
151 | This hook is used to set security information for inodes after | ||
152 | successful lookup operations. An example of the second category of hooks | ||
153 | is the :c:func:`permission()` hook in :c:type:`struct | ||
154 | inode_security_ops <inode_security_ops>`. This hook checks | ||
155 | permission when accessing an inode. | ||
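To make the two categories concrete, here is a small userspace model of one kernel object with an opaque `i_security` field. The allocate, free and post-lookup hooks manage the field. The permission hook uses that field for access control. The `struct label` policy (names starting with "secret" need clearance) and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct inode {
	unsigned long i_ino;
	void *i_security;	/* opaque to the core, owned by the module */
};

/* Hypothetical per-inode security blob. */
struct label {
	int secret;		/* 1 if the inode is labelled secret */
};

/* Category 1: hooks that manage the security field. */
static int inode_alloc_security(struct inode *inode)
{
	struct label *l = calloc(1, sizeof(*l));

	if (!l)
		return -ENOMEM;
	inode->i_security = l;
	return 0;
}

static void inode_free_security(struct inode *inode)
{
	free(inode->i_security);
	inode->i_security = NULL;
}

/* Set the label after a successful lookup, based on the name found. */
static void inode_post_lookup(struct inode *inode, const char *name)
{
	struct label *l = inode->i_security;

	l->secret = strncmp(name, "secret", 6) == 0;
}

/* Category 2: a hook that performs access control. */
static int inode_permission(const struct inode *inode, int cleared)
{
	const struct label *l = inode->i_security;

	return (l->secret && !cleared) ? -EACCES : 0;
}
```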
156 | |||
157 | LSM Capabilities Module | ||
158 | ======================= | ||
159 | |||
160 | The LSM kernel patch moves most of the existing POSIX.1e capabilities | ||
161 | logic into an optional security module stored in the file | ||
162 | ``security/capability.c``. This change allows users who do not want to | ||
163 | use capabilities to omit this code entirely from their kernel, instead | ||
164 | using the dummy module for traditional superuser logic or any other | ||
165 | module that they desire. This change also allows the developers of the | ||
166 | capabilities logic to maintain and enhance their code more freely, | ||
167 | without needing to integrate patches back into the base kernel. | ||
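At its core, a capability check in such a module tests a bit in the process's effective capability set. The sketch below is a userspace model with hypothetical names. The two capability numbers match the kernel's `linux/capability.h`. Returning 0 or `-EPERM` follows the usual convention of the kernel's capability hooks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in linux/capability.h. */
#define CAP_NET_ADMIN 12
#define CAP_SYS_ADMIN 21

/* Hypothetical stand-in for a process's credentials. */
struct cred_model {
	uint64_t cap_effective;	/* one bit per capability */
};

static bool cap_raised_model(uint64_t set, int cap)
{
	return (set >> cap) & 1;
}

/* What a capabilities module's "capable" hook boils down to. */
static int cap_capable_model(const struct cred_model *cred, int cap)
{
	return cap_raised_model(cred->cap_effective, cap) ? 0 : -EPERM;
}
```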
168 | |||
169 | In addition to moving the capabilities logic, the LSM kernel patch could | ||
170 | move the capability-related fields from the kernel data structures into | ||
171 | the new security fields managed by the security modules. However, at | ||
172 | present, the LSM kernel patch leaves the capability fields in the kernel | ||
173 | data structures. In his original remarks, Linus suggested that this | ||
174 | might be preferable so that other security modules can be easily stacked | ||
175 | with the capabilities module without needing to chain multiple security | ||
176 | structures on the security field. It also avoids imposing extra overhead | ||
177 | on the capabilities module to manage the security fields. However, the | ||
178 | LSM framework could certainly support such a move if it is determined to | ||
179 | be desirable, with only a few additional changes described below. | ||
180 | |||
181 | At present, the capabilities logic for computing process capabilities on | ||
182 | :c:func:`execve()` and :c:func:`set\*uid()`, checking | ||
183 | capabilities for a particular process, saving and checking capabilities | ||
184 | for netlink messages, and handling the :c:func:`capget()` and | ||
185 | :c:func:`capset()` system calls has been moved into the | ||
186 | capabilities module. There are still a few locations in the base kernel | ||
187 | where capability-related fields are directly examined or modified, but | ||
188 | the current version of the LSM patch does allow a security module to | ||
189 | completely replace the assignment and testing of capabilities. These few | ||
190 | locations would need to be changed if the capability-related fields were | ||
191 | moved into the security field. The following is a list of known | ||
192 | locations that still perform such direct examination or modification of | ||
193 | capability-related fields: | ||
194 | |||
195 | - ``fs/open.c``::c:func:`sys_access()` | ||
196 | |||
197 | - ``fs/lockd/host.c``::c:func:`nlm_bind_host()` | ||
198 | |||
199 | - ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()` | ||
200 | |||
201 | - ``fs/proc/array.c``::c:func:`task_cap()` | ||
diff --git a/Documentation/media/uapi/v4l/vidioc-g-selection.rst b/Documentation/media/uapi/v4l/vidioc-g-selection.rst index deb1f6fb473b..b80d85cb8891 100644 --- a/Documentation/media/uapi/v4l/vidioc-g-selection.rst +++ b/Documentation/media/uapi/v4l/vidioc-g-selection.rst | |||
@@ -129,8 +129,8 @@ Selection targets and flags are documented in | |||
129 | 129 | ||
130 | .. _sel-const-adjust: | 130 | .. _sel-const-adjust: |
131 | 131 | ||
132 | .. figure:: constraints.* | 132 | .. kernel-figure:: constraints.svg |
133 | :alt: constraints.pdf / constraints.svg | 133 | :alt: constraints.svg |
134 | :align: center | 134 | :align: center |
135 | 135 | ||
136 | Size adjustments with constraint flags. | 136 | Size adjustments with constraint flags. |
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 9d5e0f853f08..c239a0cf4b1a 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt | |||
@@ -498,11 +498,11 @@ And a couple of implicit varieties: | |||
498 | This means that ACQUIRE acts as a minimal "acquire" operation and | 498 | This means that ACQUIRE acts as a minimal "acquire" operation and |
499 | RELEASE acts as a minimal "release" operation. | 499 | RELEASE acts as a minimal "release" operation. |
500 | 500 | ||
501 | A subset of the atomic operations described in atomic_ops.txt have ACQUIRE | 501 | A subset of the atomic operations described in core-api/atomic_ops.rst have |
502 | and RELEASE variants in addition to fully-ordered and relaxed (no barrier | 502 | ACQUIRE and RELEASE variants in addition to fully-ordered and relaxed (no |
503 | semantics) definitions. For compound atomics performing both a load and a | 503 | barrier semantics) definitions. For compound atomics performing both a load |
504 | store, ACQUIRE semantics apply only to the load and RELEASE semantics apply | 504 | and a store, ACQUIRE semantics apply only to the load and RELEASE semantics |
505 | only to the store portion of the operation. | 505 | apply only to the store portion of the operation. |
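Here is an analogy in portable C11 terms, not the kernel's own atomic API. A compound read-modify-write with `memory_order_acquire` orders only its load half, and that is enough for the reader in a message-passing pattern. The sketch below pairs a release store with an acquire `atomic_fetch_add_explicit()` of zero. The function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;		/* plain data published by the writer */
static atomic_int ready;	/* flag that carries the ordering */

static void *writer(void *arg)
{
	(void)arg;
	payload = 42;
	/* RELEASE: the store to payload cannot move after this store. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static int reader(void)
{
	/* ACQUIRE on a compound op: only its load half is ordered. */
	while (atomic_fetch_add_explicit(&ready, 0, memory_order_acquire) == 0)
		;
	return payload;		/* must observe the writer's store */
}

static int run_message_passing(void)
{
	pthread_t t;
	int seen;

	payload = 0;
	atomic_store(&ready, 0);
	if (pthread_create(&t, NULL, writer, NULL) != 0)
		return -1;
	seen = reader();
	pthread_join(t, NULL);
	return seen;
}
```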
506 | 506 | ||
507 | Memory barriers are only required where there's a possibility of interaction | 507 | Memory barriers are only required where there's a possibility of interaction |
508 | between two CPUs or between a CPU and a device. If it can be guaranteed that | 508 | between two CPUs or between a CPU and a device. If it can be guaranteed that |
diff --git a/Documentation/networking/conf.py b/Documentation/networking/conf.py new file mode 100644 index 000000000000..40f69e67a883 --- /dev/null +++ b/Documentation/networking/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = "Linux Networking Documentation" | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'networking.tex', project, | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/networking/dns_resolver.txt b/Documentation/networking/dns_resolver.txt index d86adcdae420..eaa8f9a6fd5d 100644 --- a/Documentation/networking/dns_resolver.txt +++ b/Documentation/networking/dns_resolver.txt | |||
@@ -143,7 +143,7 @@ the key will be discarded and recreated when the data it holds has expired. | |||
143 | dns_query() returns a copy of the value attached to the key, or an error if | 143 | dns_query() returns a copy of the value attached to the key, or an error if |
144 | that is indicated instead. | 144 | that is indicated instead. |
145 | 145 | ||
146 | See <file:Documentation/security/keys-request-key.txt> for further | 146 | See <file:Documentation/security/keys/request-key.rst> for further |
147 | information about request-key function. | 147 | information about request-key function. |
148 | 148 | ||
149 | 149 | ||
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst new file mode 100644 index 000000000000..b5bd87e01f52 --- /dev/null +++ b/Documentation/networking/index.rst | |||
@@ -0,0 +1,18 @@ | |||
1 | Linux Networking Documentation | ||
2 | ============================== | ||
3 | |||
4 | Contents: | ||
5 | |||
6 | .. toctree:: | ||
7 | :maxdepth: 2 | ||
8 | |||
9 | kapi | ||
10 | z8530book | ||
11 | |||
12 | .. only:: subproject | ||
13 | |||
14 | Indices | ||
15 | ======= | ||
16 | |||
17 | * :ref:`genindex` | ||
18 | |||
diff --git a/Documentation/networking/kapi.rst b/Documentation/networking/kapi.rst new file mode 100644 index 000000000000..580289f345da --- /dev/null +++ b/Documentation/networking/kapi.rst | |||
@@ -0,0 +1,147 @@ | |||
1 | ========================================= | ||
2 | Linux Networking and Network Devices APIs | ||
3 | ========================================= | ||
4 | |||
5 | Linux Networking | ||
6 | ================ | ||
7 | |||
8 | Networking Base Types | ||
9 | --------------------- | ||
10 | |||
11 | .. kernel-doc:: include/linux/net.h | ||
12 | :internal: | ||
13 | |||
14 | Socket Buffer Functions | ||
15 | ----------------------- | ||
16 | |||
17 | .. kernel-doc:: include/linux/skbuff.h | ||
18 | :internal: | ||
19 | |||
20 | .. kernel-doc:: include/net/sock.h | ||
21 | :internal: | ||
22 | |||
23 | .. kernel-doc:: net/socket.c | ||
24 | :export: | ||
25 | |||
26 | .. kernel-doc:: net/core/skbuff.c | ||
27 | :export: | ||
28 | |||
29 | .. kernel-doc:: net/core/sock.c | ||
30 | :export: | ||
31 | |||
32 | .. kernel-doc:: net/core/datagram.c | ||
33 | :export: | ||
34 | |||
35 | .. kernel-doc:: net/core/stream.c | ||
36 | :export: | ||
37 | |||
38 | Socket Filter | ||
39 | ------------- | ||
40 | |||
41 | .. kernel-doc:: net/core/filter.c | ||
42 | :export: | ||
43 | |||
44 | Generic Network Statistics | ||
45 | -------------------------- | ||
46 | |||
47 | .. kernel-doc:: include/uapi/linux/gen_stats.h | ||
48 | :internal: | ||
49 | |||
50 | .. kernel-doc:: net/core/gen_stats.c | ||
51 | :export: | ||
52 | |||
53 | .. kernel-doc:: net/core/gen_estimator.c | ||
54 | :export: | ||
55 | |||
56 | SUN RPC subsystem | ||
57 | ----------------- | ||
58 | |||
59 | .. kernel-doc:: net/sunrpc/xdr.c | ||
60 | :export: | ||
61 | |||
62 | .. kernel-doc:: net/sunrpc/svc_xprt.c | ||
63 | :export: | ||
64 | |||
65 | .. kernel-doc:: net/sunrpc/xprt.c | ||
66 | :export: | ||
67 | |||
68 | .. kernel-doc:: net/sunrpc/sched.c | ||
69 | :export: | ||
70 | |||
71 | .. kernel-doc:: net/sunrpc/socklib.c | ||
72 | :export: | ||
73 | |||
74 | .. kernel-doc:: net/sunrpc/stats.c | ||
75 | :export: | ||
76 | |||
77 | .. kernel-doc:: net/sunrpc/rpc_pipe.c | ||
78 | :export: | ||
79 | |||
80 | .. kernel-doc:: net/sunrpc/rpcb_clnt.c | ||
81 | :export: | ||
82 | |||
83 | .. kernel-doc:: net/sunrpc/clnt.c | ||
84 | :export: | ||
85 | |||
86 | WiMAX | ||
87 | ----- | ||
88 | |||
89 | .. kernel-doc:: net/wimax/op-msg.c | ||
90 | :export: | ||
91 | |||
92 | .. kernel-doc:: net/wimax/op-reset.c | ||
93 | :export: | ||
94 | |||
95 | .. kernel-doc:: net/wimax/op-rfkill.c | ||
96 | :export: | ||
97 | |||
98 | .. kernel-doc:: net/wimax/stack.c | ||
99 | :export: | ||
100 | |||
101 | .. kernel-doc:: include/net/wimax.h | ||
102 | :internal: | ||
103 | |||
104 | .. kernel-doc:: include/uapi/linux/wimax.h | ||
105 | :internal: | ||
106 | |||
107 | Network device support | ||
108 | ====================== | ||
109 | |||
110 | Driver Support | ||
111 | -------------- | ||
112 | |||
113 | .. kernel-doc:: net/core/dev.c | ||
114 | :export: | ||
115 | |||
116 | .. kernel-doc:: net/ethernet/eth.c | ||
117 | :export: | ||
118 | |||
119 | .. kernel-doc:: net/sched/sch_generic.c | ||
120 | :export: | ||
121 | |||
122 | .. kernel-doc:: include/linux/etherdevice.h | ||
123 | :internal: | ||
124 | |||
125 | .. kernel-doc:: include/linux/netdevice.h | ||
126 | :internal: | ||
127 | |||
128 | PHY Support | ||
129 | ----------- | ||
130 | |||
131 | .. kernel-doc:: drivers/net/phy/phy.c | ||
132 | :export: | ||
133 | |||
134 | .. kernel-doc:: drivers/net/phy/phy.c | ||
135 | :internal: | ||
136 | |||
137 | .. kernel-doc:: drivers/net/phy/phy_device.c | ||
138 | :export: | ||
139 | |||
140 | .. kernel-doc:: drivers/net/phy/phy_device.c | ||
141 | :internal: | ||
142 | |||
143 | .. kernel-doc:: drivers/net/phy/mdio_bus.c | ||
144 | :export: | ||
145 | |||
146 | .. kernel-doc:: drivers/net/phy/mdio_bus.c | ||
147 | :internal: | ||
diff --git a/Documentation/networking/z8530book.rst b/Documentation/networking/z8530book.rst new file mode 100644 index 000000000000..fea2c40e7973 --- /dev/null +++ b/Documentation/networking/z8530book.rst | |||
@@ -0,0 +1,256 @@ | |||
1 | ======================= | ||
2 | Z8530 Programming Guide | ||
3 | ======================= | ||
4 | |||
5 | :Author: Alan Cox | ||
6 | |||
7 | Introduction | ||
8 | ============ | ||
9 | |||
10 | The Z85x30 family synchronous/asynchronous controller chips are used on | ||
11 | a large number of cheap network interface cards. The kernel provides a | ||
12 | core interface layer that is designed to make it easy to provide WAN | ||
13 | services using this chip. | ||
14 | |||
15 | The current driver only supports synchronous operation. Merging the | ||
16 | asynchronous driver support into this code to allow any Z85x30 device to | ||
17 | be used as both a tty interface and as a synchronous controller is a | ||
18 | project for Linux after the 2.4 release. | ||
19 | |||
20 | Driver Modes | ||
21 | ============ | ||
22 | |||
23 | The Z85230 driver layer can drive Z8530, Z85C30 and Z85230 devices in | ||
24 | three different modes. Each mode can be applied to an individual channel | ||
25 | on the chip (each chip has two channels). | ||
26 | |||
27 | The PIO synchronous mode supports the most common Z8530 wiring. Here the | ||
28 | chip is interfaced to the I/O and interrupt facilities of the host | ||
29 | machine but not to the DMA subsystem. When running PIO the Z8530 has | ||
30 | extremely tight timing requirements. Running at high speeds, even with a | ||
31 | Z85230, will be tricky. Typically you should expect to achieve at best | ||
32 | 9600 baud with a Z8C530 and 64Kbits with a Z85230. | ||
33 | |||
34 | The DMA mode supports the chip when it is configured to use dual DMA | ||
35 | channels on an ISA bus. The better cards tend to support this mode of | ||
36 | operation for a single channel. With DMA running the Z85230 tops out | ||
37 | when it starts to hit ISA DMA constraints at about 512Kbits. It is worth | ||
38 | noting here that many PC machines hang or crash when the chip is driven | ||
39 | fast enough to hold the ISA bus solid. | ||
40 | |||
41 | Transmit DMA mode uses a single DMA channel. The DMA channel is used for | ||
42 | transmission as the transmit FIFO is smaller than the receive FIFO. It | ||
43 | gives better performance than pure PIO mode but is nowhere near as ideal | ||
44 | as pure DMA mode. | ||
45 | |||
46 | Using the Z85230 driver | ||
47 | ======================= | ||
48 | |||
49 | The Z85230 driver provides the back end interface to your board. To | ||
50 | configure a Z8530 interface you need to detect the board and to identify | ||
51 | its ports and interrupt resources. It is also your responsibility to verify | ||
52 | that the resources are available. | ||
53 | |||
54 | Having identified the chip you need to fill in a struct z8530_dev, | ||
55 | which describes each chip. This object must exist until you finally | ||
56 | shut down the board. First, zero the active field. This ensures nothing | ||
57 | goes off without you intending it. The irq field should be set to the | ||
58 | interrupt number of the chip. (Each chip has a single interrupt source, | ||
59 | rather than one per channel.) You are responsible for allocating the | ||
60 | interrupt line. The interrupt handler should be set to | ||
61 | :c:func:`z8530_interrupt()`. The device id should be set to the | ||
62 | z8530_dev structure pointer. Whether the interrupt can be shared or not | ||
63 | is board dependent, and up to you to initialise. | ||
64 | |||
65 | The structure holds two channel structures. Initialise chanA.ctrlio and | ||
66 | chanA.dataio with the address of the control and data ports. You can OR | ||
67 | this with Z8530_PORT_SLEEP to indicate your interface needs the 5 µs | ||
68 | delay for chip settling done in software. The PORT_SLEEP option is | ||
69 | architecture specific. Other flags may become available on future | ||
70 | platforms, e.g. for MMIO. Initialise the chanA.irqs to &z8530_nop to | ||
71 | start the chip up as disabled and discarding interrupt events. This | ||
72 | ensures that stray interrupts will be mopped up and not hang the bus. | ||
73 | Set chanA.dev to point to the device structure itself. You may use the | ||
74 | private and name fields as you wish. The private field is unused by the | ||
75 | Z85230 layer. The name is used for error reporting and it may thus make | ||
76 | sense to make it match the network name. | ||
77 | |||
78 | Repeat the same operation with the B channel if your chip has both | ||
79 | channels wired to something useful. This isn't always the case. If it is | ||
80 | not wired then the I/O values do not matter, but you must initialise | ||
81 | chanB.dev. | ||
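The initialisation steps above might look roughly like this. The structures are simplified stand-ins that model only the fields this text mentions. Placing `name` in the device structure is an assumption, and so is the value of `Z8530_PORT_SLEEP`. Use the real definitions from the driver's own header in actual code. Allocating the interrupt line with `z8530_interrupt()` as the handler is left out. `board_setup_chip()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define Z8530_PORT_SLEEP 0x80000000UL	/* assumed value for this sketch */

struct z8530_dev;

/* Stand-in for the driver's IRQ handler table type. */
struct z8530_irqhandler { int unused; };
static struct z8530_irqhandler z8530_nop;	/* discards interrupt events */

struct z8530_channel {
	unsigned long ctrlio, dataio;
	struct z8530_irqhandler *irqs;
	struct z8530_dev *dev;
	void *private;		/* free for the board driver's use */
};

struct z8530_dev {
	int active;
	int irq;
	const char *name;	/* used in error reports */
	struct z8530_channel chanA, chanB;
};

static void board_setup_chip(struct z8530_dev *dev, int irq,
			     unsigned long ctrl, unsigned long data,
			     int need_settle_delay, const char *name)
{
	memset(dev, 0, sizeof(*dev));	/* active = 0: nothing starts by accident */
	dev->irq = irq;			/* one interrupt per chip */
	dev->name = name;

	dev->chanA.ctrlio = ctrl;
	dev->chanA.dataio = data;
	if (need_settle_delay) {	/* request the software settle delay */
		dev->chanA.ctrlio |= Z8530_PORT_SLEEP;
		dev->chanA.dataio |= Z8530_PORT_SLEEP;
	}
	dev->chanA.irqs = &z8530_nop;	/* start disabled, mop up stray IRQs */
	dev->chanA.dev = dev;

	/* Channel B unwired: its I/O values don't matter, but dev must be set. */
	dev->chanB.dev = dev;
}
```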
82 | |||
83 | If your board has DMA facilities then initialise the txdma and rxdma | ||
84 | fields for the relevant channels. You must also allocate the ISA DMA | ||
85 | channels and do any necessary board level initialisation to configure | ||
86 | them. The low level driver will do the Z8530 and DMA controller | ||
87 | programming but not board specific magic. | ||
88 | |||
89 | Having initialised the device you can then call | ||
90 | :c:func:`z8530_init()`. This will probe the chip and reset it into | ||
91 | a known state. An identification sequence is then run to identify the | ||
92 | chip type. If the checks fail, the function returns a non-zero | ||
93 | error code. Typically this indicates that the port given is not valid. | ||
94 | After this call the type field of the z8530_dev structure is | ||
95 | initialised to either Z8530, Z85C30 or Z85230 according to the chip | ||
96 | found. | ||
97 | |||
98 | Once you have called z8530_init you can also make use of the utility | ||
99 | function :c:func:`z8530_describe()`. This provides a consistent | ||
100 | reporting format for the Z8530 devices, and allows all the drivers to | ||
101 | provide consistent reporting. | ||
102 | |||
103 | Attaching Network Interfaces | ||
104 | ============================ | ||
105 | |||
106 | If you wish to use the network interface facilities of the driver, then | ||
107 | you need to attach a network device to each channel that is present and | ||
108 | in use. In addition, to use the generic HDLC you need to follow some | ||
109 | additional plumbing rules. They may seem complex but a look at the | ||
110 | example hostess_sv11 driver should reassure you. | ||
111 | |||
112 | The network device used for each channel should be pointed to by the | ||
113 | netdevice field of each channel. The hdlc->priv field of the network | ||
114 | device points to your private data - you will need to be able to find | ||
115 | your private data from this. | ||
116 | |||
117 | The way most drivers approach this particular problem is to create a | ||
118 | structure holding the Z8530 device definition and put that into the | ||
119 | private field of the network device. The network device fields of the | ||
120 | channels then point back to the network devices. | ||
121 | |||
122 | If you wish to use the generic HDLC then you need to register the HDLC | ||
123 | device. | ||
124 | |||
125 | Before you register your network device you will also need to provide | ||
126 | suitable handlers for most of the network device callbacks. See the | ||
127 | network device documentation for more details on this. | ||
128 | |||
129 | Configuring And Activating The Port | ||
130 | =================================== | ||
131 | |||
132 | The Z85230 driver provides helper functions and tables to load the port | ||
133 | registers on the Z8530 chips. When programming the register settings for | ||
134 | a channel be aware that the documentation recommends initialisation | ||
135 | orders. Strange things happen when these are not followed. | ||
136 | |||
137 | :c:func:`z8530_channel_load()` takes an array of pairs of | ||
138 | initialisation values in an array of u8 type. The first value is the | ||
139 | Z8530 register number. Add 16 to indicate the alternate register bank on | ||
140 | the later chips. The array is terminated by a 255. | ||
141 | |||
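The table layout can be illustrated with a small, self-contained sketch. It is not the driver's z8530_channel_load(), which writes the values to the chip; the shadow_regs structure, walk_register_table() and the register values below are hypothetical. The sketch only shows how a 255-terminated array of (register, value) pairs, with 16 added to select the alternate bank, might be walked while keeping a shadow copy of what was written.

```c
#include <stdint.h>

#define TABLE_END 255 /* terminator value described for the tables */
#define ALT_BANK  16  /* offset that selects the alternate register bank */

/* Hypothetical shadow copy of the registers, standing in for the
 * values the driver tracks internally so it can reload them later. */
struct shadow_regs {
	uint8_t primary[16];
	uint8_t alternate[16];
};

/* Illustrative table only: the register numbers and values are
 * placeholders, not a working HDLC configuration. */
static const uint8_t example_table[] = {
	4, 0x20,            /* register 4, primary bank */
	7, 0x7E,            /* register 7, primary bank */
	ALT_BANK + 7, 0x01, /* register 7, alternate bank (later chips) */
	TABLE_END
};

/* Walk (register, value) pairs until the 255 terminator, recording each
 * value in the shadow copy. Returns the number of pairs processed, or
 * -1 if a register number falls outside both banks. */
static int walk_register_table(const uint8_t *table, struct shadow_regs *s)
{
	int pairs = 0;

	while (table[0] != TABLE_END) {
		uint8_t reg = table[0];
		uint8_t val = table[1];

		if (reg >= 2 * ALT_BANK)
			return -1;
		if (reg >= ALT_BANK)
			s->alternate[reg - ALT_BANK] = val;
		else
			s->primary[reg] = val;
		table += 2;
		pairs++;
	}
	return pairs;
}
```

Keeping a shadow copy mirrors the point made below: because registers may need to be reloaded, a table should set every register the driver relies on.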
142 | The driver provides a pair of public tables. The z8530_hdlc_kilostream | ||
143 | table is for the UK 'Kilostream' service and also happens to cover most | ||
144 | other end host configurations. The z8530_hdlc_kilostream_85230 table | ||
145 | is the same configuration using the enhancements of the 85230 chip. The | ||
146 | configuration loaded is standard NRZ encoded synchronous data with HDLC | ||
147 | bitstuffing. All of the timing is taken from the other end of the link. | ||
148 | |||
149 | When writing your own tables be aware that the driver internally tracks | ||
150 | register values. It may need to reload values. You should therefore be | ||
151 | sure to set registers 1-7, 9-11, 14 and 15 in all configurations. Where | ||
152 | the register settings depend on DMA selection the driver will update the | ||
153 | bits itself when you open or close. Loading a new table with the | ||
154 | interface open is not recommended. | ||
155 | |||
156 | There are three standard configurations supported by the core code. In | ||
157 | PIO mode the interface is programmed to use interrupt-driven PIO. | ||
158 | This places high demands on the host processor to avoid latency. The | ||
159 | driver is written to take account of latency issues but it cannot avoid | ||
160 | latencies caused by other drivers, notably IDE in PIO mode. Because the | ||
161 | drivers allocate buffers you must also prevent MTU changes while the | ||
162 | port is open. | ||
163 | |||
164 | Once the port is open it will call the rx_function of each channel | ||
165 | whenever a completed packet arrives. This is invoked from interrupt | ||
166 | context and passes you the channel and a network buffer (struct | ||
167 | sk_buff) holding the data. The data includes the CRC bytes so most | ||
168 | users will want to trim the last two bytes before processing the data. | ||
169 | This function is very timing critical. When you wish to simply discard | ||
170 | data the support code provides the function | ||
171 | :c:func:`z8530_null_rx()` to discard the data. | ||
172 | |||
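In kernel code the CRC trim in a receive handler might be done with skb_trim(skb, skb->len - 2). The sketch below models only the length arithmetic in plain C so it can run outside the kernel; the rx_frame structure and payload_len_without_crc() are hypothetical stand-ins, not part of the Z85230 layer.

```c
#include <stddef.h>
#include <stdint.h>

#define HDLC_CRC_LEN 2 /* CRC bytes left on the end of each received frame */

/* Hypothetical stand-in for the data/len part of a struct sk_buff. */
struct rx_frame {
	const uint8_t *data;
	size_t len;
};

/* Return the payload length once the trailing CRC is dropped, or 0 if
 * the frame is too short to hold any payload (a caller would discard it). */
static size_t payload_len_without_crc(const struct rx_frame *f)
{
	if (f->len <= HDLC_CRC_LEN)
		return 0;
	return f->len - HDLC_CRC_LEN;
}
```

Because the handler runs in interrupt context and is timing critical, a check this cheap is about as much work as it should do before handing the data on.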
173 | To activate PIO mode sending and receiving, call ``z8530_sync_open``. | ||
174 | This expects to be passed the network device and the channel. Typically | ||
175 | this is called from your network device open callback. On a failure a | ||
176 | non zero error status is returned. | ||
177 | The :c:func:`z8530_sync_close()` function shuts down a PIO | ||
178 | channel. This must be done before the channel is opened again and before | ||
179 | the driver shuts down and unloads. | ||
180 | |||
181 | The ideal mode of operation is dual channel DMA mode. Here the kernel | ||
182 | driver will configure the board for DMA in both directions. The driver | ||
183 | also handles ISA DMA issues such as controller programming and the | ||
184 | memory range limit for you. This mode is activated by calling the | ||
185 | :c:func:`z8530_sync_dma_open()` function. On failure a non zero | ||
186 | error value is returned. Once this mode is activated it can be shut down | ||
187 | by calling :c:func:`z8530_sync_dma_close()`. You must call | ||
188 | the close function matching the open mode you used. | ||
189 | |||
190 | The final supported mode uses a single DMA channel to drive the transmit | ||
191 | side. As the Z85C30 has a larger FIFO on the receive channel this tends | ||
192 | to increase the maximum speed a little. This mode is activated by calling | ||
193 | ``z8530_sync_txdma_open``, which returns a non zero error code on failure. The | ||
194 | :c:func:`z8530_sync_txdma_close()` function closes down the Z8530 | ||
195 | interface from this mode. | ||
196 | |||
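Since each open mode has its own close function, one way for a board driver to avoid mismatches is to record the mode at open time and dispatch through a table. The sketch below assumes that design; the stub functions, mode_table, struct port, port_open() and port_close() are all hypothetical, and the stubs take no arguments, unlike the real functions, which are passed the network device and the channel.

```c
/* Hypothetical stubs standing in for the three open/close pairs:
 * z8530_sync_open()/z8530_sync_close(),
 * z8530_sync_dma_open()/z8530_sync_dma_close() and
 * z8530_sync_txdma_open()/z8530_sync_txdma_close(). */
enum port_mode { MODE_PIO, MODE_DMA, MODE_TXDMA };

static int last_closed = -1; /* records which close stub ran, for checking */

static int pio_open(void)     { return 0; }
static void pio_close(void)   { last_closed = MODE_PIO; }
static int dma_open(void)     { return 0; }
static void dma_close(void)   { last_closed = MODE_DMA; }
static int txdma_open(void)   { return 0; }
static void txdma_close(void) { last_closed = MODE_TXDMA; }

struct mode_ops {
	int (*open)(void);   /* returns 0 on success, non zero on failure */
	void (*close)(void);
};

/* Each mode's open is paired with its own close in one table entry. */
static const struct mode_ops mode_table[] = {
	[MODE_PIO]   = { pio_open,   pio_close },
	[MODE_DMA]   = { dma_open,   dma_close },
	[MODE_TXDMA] = { txdma_open, txdma_close },
};

/* Hypothetical per-port state: remember how the port was opened. */
struct port {
	enum port_mode mode;
	int is_open;
};

static int port_open(struct port *p, enum port_mode mode)
{
	int err = mode_table[mode].open();

	if (err)
		return err;
	p->mode = mode;
	p->is_open = 1;
	return 0;
}

static void port_close(struct port *p)
{
	if (!p->is_open)
		return;
	/* Always use the close that matches the recorded open mode. */
	mode_table[p->mode].close();
	p->is_open = 0;
}
```

Recording the mode also gives a natural place to refuse a second open before the matching close has run.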
197 | Network Layer Functions | ||
198 | ======================= | ||
199 | |||
200 | The Z8530 layer provides functions to queue packets for transmission. | ||
201 | The driver internally buffers the frame currently being transmitted and | ||
202 | one further frame (in order to keep back to back transmission running). | ||
203 | Any further buffering is up to the caller. | ||
204 | |||
205 | The function :c:func:`z8530_queue_xmit()` takes a network buffer | ||
206 | in sk_buff format and queues it for transmission. The caller must | ||
207 | provide the entire packet with the exception of the bitstuffing and CRC. | ||
208 | This is normally done by the caller via the generic HDLC interface | ||
209 | layer. It returns 0 if the buffer has been queued and a non zero value | ||
210 | if the queue is full. If the function accepts the buffer, it becomes the property | ||
211 | of the Z8530 layer and the caller should not free it. | ||
212 | |||
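The buffering and ownership rules can be modelled in a few lines: one slot for the frame being sent, one for the frame waiting behind it, and a return code that tells the caller whether ownership passed. This is a hypothetical userspace model (tx_state, model_queue_xmit() and model_tx_done() are invented for illustration), not the Z8530 layer's implementation.

```c
#include <stddef.h>

/* Hypothetical model of the transmit buffering described above: one
 * frame being sent and one more waiting behind it. */
struct tx_state {
	const void *tx_current;
	const void *tx_next;
};

/* Mirrors the documented contract: 0 means the frame was queued and now
 * belongs to the transmit layer; non zero means the queue is full and
 * the caller still owns the frame. */
static int model_queue_xmit(struct tx_state *t, const void *frame)
{
	if (t->tx_current == NULL) {
		t->tx_current = frame;
		return 0;
	}
	if (t->tx_next == NULL) {
		t->tx_next = frame;
		return 0;
	}
	return 1;
}

/* Called when the current frame has gone out: promote the waiting frame
 * and hand back the finished one. */
static const void *model_tx_done(struct tx_state *t)
{
	const void *done = t->tx_current;

	t->tx_current = t->tx_next;
	t->tx_next = NULL;
	return done;
}
```

With only two slots, any deeper queueing has to live in the caller, which is what the text means by further buffering being up to the caller.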
213 | The function :c:func:`z8530_get_stats()` returns a pointer to an | ||
214 | internally maintained per interface statistics block. This provides most | ||
215 | of the interface code needed to implement the network layer get_stats | ||
216 | callback. | ||
217 | |||
218 | Porting The Z8530 Driver | ||
219 | ======================== | ||
220 | |||
221 | The Z8530 driver is written to be portable. In DMA mode it makes | ||
222 | assumptions about the use of ISA DMA. These are probably warranted in | ||
223 | most cases as the Z85230 in particular was designed to glue to PC type | ||
224 | machines. The PIO mode makes no real assumptions. | ||
225 | |||
226 | Should you need to retarget the Z8530 driver to another architecture the | ||
227 | only code that should need changing is the port I/O functions. At the | ||
228 | moment these assume PC I/O port accesses. This may not be appropriate | ||
229 | for all platforms. Replacing :c:func:`z8530_read_port()` and | ||
230 | ``z8530_write_port`` is intended to be all that is required to port | ||
231 | this driver layer. | ||
232 | |||
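In the driver the port accessors are plain functions built around PC port I/O. One way to picture a retarget is to route all register traffic through a pair of hooks and swap in a different backend. The sketch below assumes that design; zs_port_ops, the in-memory fake_mmio backend and set_bits() are hypothetical and only demonstrate that code above the accessors does not need to change.

```c
#include <stdint.h>

/* Hypothetical accessor table: all chip register traffic goes through
 * these two hooks, so retargeting means supplying a different pair. */
struct zs_port_ops {
	uint8_t (*read)(void *ctx, unsigned long port);
	void (*write)(void *ctx, unsigned long port, uint8_t val);
};

/* Example backend: a fake register file in memory, standing in for
 * memory-mapped registers on a non-PC platform. */
struct fake_mmio {
	uint8_t regs[8];
};

static uint8_t fake_read(void *ctx, unsigned long port)
{
	struct fake_mmio *m = ctx;

	return m->regs[port & 7]; /* wrap to 8 slots for the sketch */
}

static void fake_write(void *ctx, unsigned long port, uint8_t val)
{
	struct fake_mmio *m = ctx;

	m->regs[port & 7] = val;
}

static const struct zs_port_ops fake_ops = { fake_read, fake_write };

/* Code above the accessor layer never touches the bus directly. */
static void set_bits(const struct zs_port_ops *ops, void *ctx,
		     unsigned long port, uint8_t bits)
{
	ops->write(ctx, port, ops->read(ctx, port) | bits);
}
```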
233 | Known Bugs And Assumptions | ||
234 | ========================== | ||
235 | |||
236 | Interrupt Locking | ||
237 | The locking in the driver is done via the global cli/sti lock. This | ||
238 | makes for relatively poor SMP performance. Switching this to use a | ||
239 | per device spin lock would probably materially improve performance. | ||
240 | |||
241 | Occasional Failures | ||
242 | We have reports of occasional failures when run for very long | ||
243 | periods of time and the driver starts to receive junk frames. At the | ||
244 | moment the cause of this is not clear. | ||
245 | |||
246 | Public Functions Provided | ||
247 | ========================= | ||
248 | |||
249 | .. kernel-doc:: drivers/net/wan/z85230.c | ||
250 | :export: | ||
251 | |||
252 | Internal Functions | ||
253 | ================== | ||
254 | |||
255 | .. kernel-doc:: drivers/net/wan/z85230.c | ||
256 | :internal: | ||
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst index e25d63f8c0da..3aed751e0cb5 100644 --- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst | |||
@@ -116,12 +116,11 @@ DevFS has been obsoleted in favour of udev | |||
116 | 116 | ||
117 | Linux documentation for functions is transitioning to inline | 117 | Linux documentation for functions is transitioning to inline |
118 | documentation via specially-formatted comments near their | 118 | documentation via specially-formatted comments near their |
119 | definitions in the source. These comments can be combined with the | 119 | definitions in the source. These comments can be combined with ReST |
120 | SGML templates in the Documentation/DocBook directory to make DocBook | 120 | files in the Documentation/ directory to make enriched documentation, which can |
121 | files, which can then be converted by DocBook stylesheets to PostScript, | 121 | then be converted to PostScript, HTML, LaTeX, ePUB and PDF files. |
122 | HTML, PDF files, and several other formats. In order to convert from | 122 | In order to convert from ReST format to a format of your choice, you'll need |
123 | DocBook format to a format of your choice, you'll need to install Jade as | 123 | Sphinx. |
124 | well as the desired DocBook stylesheets. | ||
125 | 124 | ||
126 | Util-linux | 125 | Util-linux |
127 | ---------- | 126 | ---------- |
@@ -323,12 +322,6 @@ PDF outputs, it is recommended to use version 1.4.6. | |||
323 | functionalities required for ``XeLaTex`` to work. For PDF output you'll also | 322 | functionalities required for ``XeLaTex`` to work. For PDF output you'll also |
324 | need ``convert(1)`` from ImageMagick (https://www.imagemagick.org). | 323 | need ``convert(1)`` from ImageMagick (https://www.imagemagick.org). |
325 | 324 | ||
326 | Other tools | ||
327 | ----------- | ||
328 | |||
329 | In order to produce documentation from DocBook, you'll also need ``xmlto``. | ||
330 | Please notice, however, that we're currently migrating all documents to use | ||
331 | ``Sphinx``. | ||
332 | 325 | ||
333 | Getting updated software | 326 | Getting updated software |
334 | ======================== | 327 | ======================== |
@@ -409,15 +402,6 @@ Quota-tools | |||
409 | 402 | ||
410 | - <http://sourceforge.net/projects/linuxquota/> | 403 | - <http://sourceforge.net/projects/linuxquota/> |
411 | 404 | ||
412 | DocBook Stylesheets | ||
413 | ------------------- | ||
414 | |||
415 | - <http://sourceforge.net/projects/docbook/files/docbook-dsssl/> | ||
416 | |||
417 | XMLTO XSLT Frontend | ||
418 | ------------------- | ||
419 | |||
420 | - <http://cyberelk.net/tim/xmlto/> | ||
421 | 405 | ||
422 | Intel P6 microcode | 406 | Intel P6 microcode |
423 | ------------------ | 407 | ------------------ |
diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst index d20d52a4d812..a20b44a40ec4 100644 --- a/Documentation/process/coding-style.rst +++ b/Documentation/process/coding-style.rst | |||
@@ -980,8 +980,8 @@ do so, though, and doing so unnecessarily can limit optimization. | |||
980 | 980 | ||
981 | When writing a single inline assembly statement containing multiple | 981 | When writing a single inline assembly statement containing multiple |
982 | instructions, put each instruction on a separate line in a separate quoted | 982 | instructions, put each instruction on a separate line in a separate quoted |
983 | string, and end each string except the last with \n\t to properly indent the | 983 | string, and end each string except the last with ``\n\t`` to properly indent |
984 | next instruction in the assembly output: | 984 | the next instruction in the assembly output: |
985 | 985 | ||
986 | .. code-block:: c | 986 | .. code-block:: c |
987 | 987 | ||
diff --git a/Documentation/process/email-clients.rst b/Documentation/process/email-clients.rst index ac892b30815e..07faa5457bcb 100644 --- a/Documentation/process/email-clients.rst +++ b/Documentation/process/email-clients.rst | |||
@@ -167,6 +167,11 @@ Lotus Notes (GUI) | |||
167 | 167 | ||
168 | Run away from it. | 168 | Run away from it. |
169 | 169 | ||
170 | IBM Verse (Web GUI) | ||
171 | ******************* | ||
172 | |||
173 | See Lotus Notes. | ||
174 | |||
170 | Mutt (TUI) | 175 | Mutt (TUI) |
171 | ********** | 176 | ********** |
172 | 177 | ||
diff --git a/Documentation/process/howto.rst b/Documentation/process/howto.rst index 1260f60d4cb9..c6875b1db56f 100644 --- a/Documentation/process/howto.rst +++ b/Documentation/process/howto.rst | |||
@@ -180,14 +180,6 @@ They can also be generated on LaTeX and ePub formats with:: | |||
180 | make latexdocs | 180 | make latexdocs |
181 | make epubdocs | 181 | make epubdocs |
182 | 182 | ||
183 | Currently, there are some documents written on DocBook that are in | ||
184 | the process of conversion to ReST. Such documents will be created in the | ||
185 | Documentation/DocBook/ directory and can be generated also as | ||
186 | Postscript or man pages by running:: | ||
187 | |||
188 | make psdocs | ||
189 | make mandocs | ||
190 | |||
191 | Becoming A Kernel Developer | 183 | Becoming A Kernel Developer |
192 | --------------------------- | 184 | --------------------------- |
193 | 185 | ||
diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst index 05a7857a4a83..b8cac85a4001 100644 --- a/Documentation/process/kernel-docs.rst +++ b/Documentation/process/kernel-docs.rst | |||
@@ -40,50 +40,18 @@ Enjoy! | |||
40 | Docs at the Linux Kernel tree | 40 | Docs at the Linux Kernel tree |
41 | ----------------------------- | 41 | ----------------------------- |
42 | 42 | ||
43 | The DocBook books should be built with ``make {htmldocs | psdocs | pdfdocs}``. | ||
44 | The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``. | 43 | The Sphinx books should be built with ``make {htmldocs | pdfdocs | epubdocs}``. |
45 | 44 | ||
46 | * Name: **linux/Documentation** | 45 | * Name: **linux/Documentation** |
47 | 46 | ||
48 | :Author: Many. | 47 | :Author: Many. |
49 | :Location: Documentation/ | 48 | :Location: Documentation/ |
50 | :Keywords: text files, Sphinx, DocBook. | 49 | :Keywords: text files, Sphinx. |
51 | :Description: Documentation that comes with the kernel sources, | 50 | :Description: Documentation that comes with the kernel sources, |
52 | inside the Documentation directory. Some pages from this document | 51 | inside the Documentation directory. Some pages from this document |
53 | (including this document itself) have been moved there, and might | 52 | (including this document itself) have been moved there, and might |
54 | be more up to date than the web version. | 53 | be more up to date than the web version. |
55 | 54 | ||
56 | * Title: **The Kernel Hacking HOWTO** | ||
57 | |||
58 | :Author: Various Talented People, and Rusty. | ||
59 | :Location: Documentation/DocBook/kernel-hacking.tmpl | ||
60 | :Keywords: HOWTO, kernel contexts, deadlock, locking, modules, | ||
61 | symbols, return conventions. | ||
62 | :Description: From the Introduction: "Please understand that I | ||
63 | never wanted to write this document, being grossly underqualified, | ||
64 | but I always wanted to read it, and this was the only way. I | ||
65 | simply explain some best practices, and give reading entry-points | ||
66 | into the kernel sources. I avoid implementation details: that's | ||
67 | what the code is for, and I ignore whole tracts of useful | ||
68 | routines. This document assumes familiarity with C, and an | ||
69 | understanding of what the kernel is, and how it is used. It was | ||
70 | originally written for the 2.3 kernels, but nearly all of it | ||
71 | applies to 2.2 too; 2.0 is slightly different". | ||
72 | |||
73 | * Title: **Linux Kernel Locking HOWTO** | ||
74 | |||
75 | :Author: Various Talented People, and Rusty. | ||
76 | :Location: Documentation/DocBook/kernel-locking.tmpl | ||
77 | :Keywords: locks, locking, spinlock, semaphore, atomic, race | ||
78 | condition, bottom halves, tasklets, softirqs. | ||
79 | :Description: The title says it all: document describing the | ||
80 | locking system in the Linux Kernel either in uniprocessor or SMP | ||
81 | systems. | ||
82 | :Notes: "It was originally written for the later (>2.3.47) 2.3 | ||
83 | kernels, but most of it applies to 2.2 too; 2.0 is slightly | ||
84 | different". Freely redistributable under the conditions of the GNU | ||
85 | General Public License. | ||
86 | |||
87 | On-line docs | 55 | On-line docs |
88 | ------------ | 56 | ------------ |
89 | 57 | ||
diff --git a/Documentation/security/00-INDEX b/Documentation/security/00-INDEX deleted file mode 100644 index 45c82fd3e9d3..000000000000 --- a/Documentation/security/00-INDEX +++ /dev/null | |||
@@ -1,26 +0,0 @@ | |||
1 | 00-INDEX | ||
2 | - this file. | ||
3 | LSM.txt | ||
4 | - description of the Linux Security Module framework. | ||
5 | SELinux.txt | ||
6 | - how to get started with the SELinux security enhancement. | ||
7 | Smack.txt | ||
8 | - documentation on the Smack Linux Security Module. | ||
9 | Yama.txt | ||
10 | - documentation on the Yama Linux Security Module. | ||
11 | apparmor.txt | ||
12 | - documentation on the AppArmor security extension. | ||
13 | credentials.txt | ||
14 | - documentation about credentials in Linux. | ||
15 | keys-ecryptfs.txt | ||
16 | - description of the encryption keys for the ecryptfs filesystem. | ||
17 | keys-request-key.txt | ||
18 | - description of the kernel key request service. | ||
19 | keys-trusted-encrypted.txt | ||
20 | - info on the Trusted and Encrypted keys in the kernel key ring service. | ||
21 | keys.txt | ||
22 | - description of the kernel key retention service. | ||
23 | tomoyo.txt | ||
24 | - documentation on the TOMOYO Linux Security Module. | ||
25 | IMA-templates.txt | ||
26 | - documentation on the template management mechanism for IMA. | ||
diff --git a/Documentation/security/IMA-templates.txt b/Documentation/security/IMA-templates.rst index 839b5dad9226..2cd0e273cc9a 100644 --- a/Documentation/security/IMA-templates.txt +++ b/Documentation/security/IMA-templates.rst | |||
@@ -1,9 +1,12 @@ | |||
1 | IMA Template Management Mechanism | 1 | ================================= |
2 | IMA Template Management Mechanism | ||
3 | ================================= | ||
2 | 4 | ||
3 | 5 | ||
4 | ==== INTRODUCTION ==== | 6 | Introduction |
7 | ============ | ||
5 | 8 | ||
6 | The original 'ima' template is fixed length, containing the filedata hash | 9 | The original ``ima`` template is fixed length, containing the filedata hash |
7 | and pathname. The filedata hash is limited to 20 bytes (md5/sha1). | 10 | and pathname. The filedata hash is limited to 20 bytes (md5/sha1). |
8 | The pathname is a null terminated string, limited to 255 characters. | 11 | The pathname is a null terminated string, limited to 255 characters. |
9 | To overcome these limitations and to add additional file metadata, it is | 12 | To overcome these limitations and to add additional file metadata, it is |
@@ -28,61 +31,64 @@ a new data type, developers define the field identifier and implement | |||
28 | two functions, init() and show(), respectively to generate and display | 31 | two functions, init() and show(), respectively to generate and display |
29 | measurement entries. Defining a new template descriptor requires | 32 | measurement entries. Defining a new template descriptor requires |
30 | specifying the template format (a string of field identifiers separated | 33 | specifying the template format (a string of field identifiers separated |
31 | by the '|' character) through the 'ima_template_fmt' kernel command line | 34 | by the ``|`` character) through the ``ima_template_fmt`` kernel command line |
32 | parameter. At boot time, IMA initializes the chosen template descriptor | 35 | parameter. At boot time, IMA initializes the chosen template descriptor |
33 | by translating the format into an array of template fields structures taken | 36 | by translating the format into an array of template fields structures taken |
34 | from the set of the supported ones. | 37 | from the set of the supported ones. |
35 | 38 | ||
36 | After the initialization step, IMA will call ima_alloc_init_template() | 39 | After the initialization step, IMA will call ``ima_alloc_init_template()`` |
37 | (new function defined within the patches for the new template management | 40 | (new function defined within the patches for the new template management |
38 | mechanism) to generate a new measurement entry by using the template | 41 | mechanism) to generate a new measurement entry by using the template |
39 | descriptor chosen through the kernel configuration or through the newly | 42 | descriptor chosen through the kernel configuration or through the newly |
40 | introduced 'ima_template' and 'ima_template_fmt' kernel command line parameters. | 43 | introduced ``ima_template`` and ``ima_template_fmt`` kernel command line parameters. |
41 | It is during this phase that the advantages of the new architecture are | 44 | It is during this phase that the advantages of the new architecture are |
42 | clearly shown: the latter function will not contain specific code to handle | 45 | clearly shown: the latter function will not contain specific code to handle |
43 | a given template but, instead, it simply calls the init() method of the template | 46 | a given template but, instead, it simply calls the ``init()`` method of the template |
44 | fields associated to the chosen template descriptor and store the result | 47 | fields associated to the chosen template descriptor and store the result |
45 | (pointer to allocated data and data length) in the measurement entry structure. | 48 | (pointer to allocated data and data length) in the measurement entry structure. |
46 | 49 | ||
47 | The same mechanism is employed to display measurements entries. | 50 | The same mechanism is employed to display measurements entries. |
48 | The functions ima[_ascii]_measurements_show() retrieve, for each entry, | 51 | The functions ``ima[_ascii]_measurements_show()`` retrieve, for each entry, |
49 | the template descriptor used to produce that entry and call the show() | 52 | the template descriptor used to produce that entry and call the show() |
50 | method for each item of the array of template fields structures. | 53 | method for each item of the array of template fields structures. |
51 | 54 | ||
52 | 55 | ||
53 | 56 | ||
54 | ==== SUPPORTED TEMPLATE FIELDS AND DESCRIPTORS ==== | 57 | Supported Template Fields and Descriptors |
58 | ========================================= | ||
55 | 59 | ||
56 | In the following, there is the list of supported template fields | 60 | In the following, there is the list of supported template fields |
57 | ('<identifier>': description), that can be used to define new template | 61 | ``('<identifier>': description)``, that can be used to define new template |
58 | descriptors by adding their identifier to the format string | 62 | descriptors by adding their identifier to the format string |
59 | (support for more data types will be added later): | 63 | (support for more data types will be added later): |
60 | 64 | ||
61 | - 'd': the digest of the event (i.e. the digest of a measured file), | 65 | - 'd': the digest of the event (i.e. the digest of a measured file), |
62 | calculated with the SHA1 or MD5 hash algorithm; | 66 | calculated with the SHA1 or MD5 hash algorithm; |
63 | - 'n': the name of the event (i.e. the file name), with size up to 255 bytes; | 67 | - 'n': the name of the event (i.e. the file name), with size up to 255 bytes; |
64 | - 'd-ng': the digest of the event, calculated with an arbitrary hash | 68 | - 'd-ng': the digest of the event, calculated with an arbitrary hash |
65 | algorithm (field format: [<hash algo>:]digest, where the digest | 69 | algorithm (field format: [<hash algo>:]digest, where the digest |
66 | prefix is shown only if the hash algorithm is not SHA1 or MD5); | 70 | prefix is shown only if the hash algorithm is not SHA1 or MD5); |
67 | - 'n-ng': the name of the event, without size limitations; | 71 | - 'n-ng': the name of the event, without size limitations; |
68 | - 'sig': the file signature. | 72 | - 'sig': the file signature. |
69 | 73 | ||
70 | 74 | ||
71 | Below, there is the list of defined template descriptors: | 75 | Below, there is the list of defined template descriptors: |
72 | - "ima": its format is 'd|n'; | ||
73 | - "ima-ng" (default): its format is 'd-ng|n-ng'; | ||
74 | - "ima-sig": its format is 'd-ng|n-ng|sig'. | ||
75 | 76 | ||
77 | - "ima": its format is ``d|n``; | ||
78 | - "ima-ng" (default): its format is ``d-ng|n-ng``; | ||
79 | - "ima-sig": its format is ``d-ng|n-ng|sig``. | ||
76 | 80 | ||
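A template format is a ``|``-separated list of field identifiers. A userspace sketch that splits such a string and checks each identifier against the fields listed above might look like the following. It is not IMA's own parser; count_template_fields() is a hypothetical helper, and its field list covers only the identifiers documented here.

```c
#include <stdbool.h>
#include <string.h>

/* Field identifiers documented in this file. */
static const char *const known_fields[] = { "d", "n", "d-ng", "n-ng", "sig" };

static bool is_known_field(const char *id, size_t len)
{
	for (size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++)
		if (strlen(known_fields[i]) == len &&
		    strncmp(known_fields[i], id, len) == 0)
			return true;
	return false;
}

/* Count the fields in a format such as "d-ng|n-ng|sig". Returns -1 if
 * any identifier is empty or not in the known list. */
static int count_template_fields(const char *fmt)
{
	int n = 0;
	const char *start = fmt;

	for (;;) {
		const char *bar = strchr(start, '|');
		size_t len = bar ? (size_t)(bar - start) : strlen(start);

		if (len == 0 || !is_known_field(start, len))
			return -1;
		n++;
		if (!bar)
			return n;
		start = bar + 1;
	}
}
```

Under this sketch, the three descriptors above parse to 2, 2 and 3 fields respectively.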
77 | 81 | ||
78 | ==== USE ==== | 82 | |
83 | Use | ||
84 | === | ||
79 | 85 | ||
80 | To specify the template descriptor to be used to generate measurement entries, | 86 | To specify the template descriptor to be used to generate measurement entries, |
81 | currently the following methods are supported: | 87 | currently the following methods are supported: |
82 | 88 | ||
83 | - select a template descriptor among those supported in the kernel | 89 | - select a template descriptor among those supported in the kernel |
84 | configuration ('ima-ng' is the default choice); | 90 | configuration (``ima-ng`` is the default choice); |
85 | - specify a template descriptor name from the kernel command line through | 91 | - specify a template descriptor name from the kernel command line through |
86 | the 'ima_template=' parameter; | 92 | the ``ima_template=`` parameter; |
87 | - register a new template descriptor with custom format through the kernel | 93 | - register a new template descriptor with custom format through the kernel |
88 | command line parameter 'ima_template_fmt='. | 94 | command line parameter ``ima_template_fmt=``. |
diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst new file mode 100644 index 000000000000..d75778b0fa10 --- /dev/null +++ b/Documentation/security/LSM.rst | |||
@@ -0,0 +1,14 @@ | |||
1 | ================================= | ||
2 | Linux Security Module Development | ||
3 | ================================= | ||
4 | |||
5 | Based on https://lkml.org/lkml/2007/10/26/215, | ||
6 | a new LSM is accepted into the kernel when its intent (a description of | ||
7 | what it tries to protect against and in what cases one would expect to | ||
8 | use it) has been appropriately documented in ``Documentation/security/LSM``. | ||
9 | This allows an LSM's code to be easily compared to its goals, and lets | ||
10 | end users and distros make a more informed decision about which | ||
11 | LSMs suit their requirements. | ||
12 | |||
13 | For extensive documentation on the available LSM hook interfaces, please | ||
14 | see ``include/linux/lsm_hooks.h``. | ||
diff --git a/Documentation/security/conf.py b/Documentation/security/conf.py deleted file mode 100644 index 472fc9a8eb67..000000000000 --- a/Documentation/security/conf.py +++ /dev/null | |||
@@ -1,8 +0,0 @@ | |||
1 | project = "The kernel security subsystem manual" | ||
2 | |||
3 | tags.add("subproject") | ||
4 | |||
5 | latex_documents = [ | ||
6 | ('index', 'security.tex', project, | ||
7 | 'The kernel development community', 'manual'), | ||
8 | ] | ||
diff --git a/Documentation/security/credentials.txt b/Documentation/security/credentials.rst index 86257052e31a..038a7e19eff9 100644 --- a/Documentation/security/credentials.txt +++ b/Documentation/security/credentials.rst | |||
@@ -1,38 +1,18 @@ | |||
1 | ==================== | 1 | ==================== |
2 | CREDENTIALS IN LINUX | 2 | Credentials in Linux |
3 | ==================== | 3 | ==================== |
4 | 4 | ||
5 | By: David Howells <dhowells@redhat.com> | 5 | By: David Howells <dhowells@redhat.com> |
6 | 6 | ||
7 | Contents: | 7 | .. contents:: :local: |
8 | |||
9 | (*) Overview. | ||
10 | |||
11 | (*) Types of credentials. | ||
12 | |||
13 | (*) File markings. | ||
14 | |||
15 | (*) Task credentials. | ||
16 | 8 | ||
17 | - Immutable credentials. | 9 | Overview |
18 | - Accessing task credentials. | ||
19 | - Accessing another task's credentials. | ||
20 | - Altering credentials. | ||
21 | - Managing credentials. | ||
22 | |||
23 | (*) Open file credentials. | ||
24 | |||
25 | (*) Overriding the VFS's use of credentials. | ||
26 | |||
27 | |||
28 | ======== | ||
29 | OVERVIEW | ||
30 | ======== | 10 | ======== |
31 | 11 | ||
32 | There are several parts to the security check performed by Linux when one | 12 | There are several parts to the security check performed by Linux when one |
33 | object acts upon another: | 13 | object acts upon another: |
34 | 14 | ||
35 | (1) Objects. | 15 | 1. Objects. |
36 | 16 | ||
37 | Objects are things in the system that may be acted upon directly by | 17 | Objects are things in the system that may be acted upon directly by |
38 | userspace programs. Linux has a variety of actionable objects, including: | 18 | userspace programs. Linux has a variety of actionable objects, including: |
@@ -48,7 +28,7 @@ object acts upon another: | |||
48 | As a part of the description of all these objects there is a set of | 28 | As a part of the description of all these objects there is a set of |
49 | credentials. What's in the set depends on the type of object. | 29 | credentials. What's in the set depends on the type of object. |
50 | 30 | ||
51 | (2) Object ownership. | 31 | 2. Object ownership. |
52 | 32 | ||
53 | Amongst the credentials of most objects, there will be a subset that | 33 | Amongst the credentials of most objects, there will be a subset that |
54 | indicates the ownership of that object. This is used for resource | 34 | indicates the ownership of that object. This is used for resource |
@@ -57,7 +37,7 @@ object acts upon another: | |||
57 | In a standard UNIX filesystem, for instance, this will be defined by the | 37 | In a standard UNIX filesystem, for instance, this will be defined by the |
58 | UID marked on the inode. | 38 | UID marked on the inode. |
59 | 39 | ||
60 | (3) The objective context. | 40 | 3. The objective context. |
61 | 41 | ||
62 | Also amongst the credentials of those objects, there will be a subset that | 42 | Also amongst the credentials of those objects, there will be a subset that |
63 | indicates the 'objective context' of that object. This may or may not be | 43 | indicates the 'objective context' of that object. This may or may not be |
@@ -67,7 +47,7 @@ object acts upon another: | |||
67 | The objective context is used as part of the security calculation that is | 47 | The objective context is used as part of the security calculation that is |
68 | carried out when an object is acted upon. | 48 | carried out when an object is acted upon. |
69 | 49 | ||
70 | (4) Subjects. | 50 | 4. Subjects. |
71 | 51 | ||
72 | A subject is an object that is acting upon another object. | 52 | A subject is an object that is acting upon another object. |
73 | 53 | ||
@@ -77,10 +57,10 @@ object acts upon another: | |||
77 | 57 | ||
78 | Objects other than tasks may under some circumstances also be subjects. | 58 | Objects other than tasks may under some circumstances also be subjects. |
79 | For instance an open file may send SIGIO to a task using the UID and EUID | 59 | For instance an open file may send SIGIO to a task using the UID and EUID |
80 | given to it by a task that called fcntl(F_SETOWN) upon it. In this case, | 60 | given to it by a task that called ``fcntl(F_SETOWN)`` upon it. In this case, |
81 | the file struct will have a subjective context too. | 61 | the file struct will have a subjective context too. |
82 | 62 | ||
83 | (5) The subjective context. | 63 | 5. The subjective context. |
84 | 64 | ||
85 | A subject has an additional interpretation of its credentials. A subset | 65 | A subject has an additional interpretation of its credentials. A subset |
86 | of its credentials forms the 'subjective context'. The subjective context | 66 | of its credentials forms the 'subjective context'. The subjective context |
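The text above notes that an open file can act as a subject once a task has called ``fcntl(F_SETOWN)`` on it. Userspace only sees the owner PID that this call records; the UID and EUID that the kernel keeps alongside it are not exposed. The sketch below is minimal and covers only the userspace side. `claim_sigio_owner()` and `sigio_owner()` are hypothetical helper names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make the calling process the SIGIO/SIGURG owner of fd.
 * Returns 0 on success, -1 with errno set on failure. */
int claim_sigio_owner(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_SETOWN, getpid());
}

/* Return the recorded owner of fd: a PID if positive,
 * a negated process group ID if negative, 0 if none is set. */
int sigio_owner(int fd)
{
    return fcntl(fd, F_GETOWN);
}
```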
@@ -92,7 +72,7 @@ object acts upon another: | |||
92 | from the real UID and GID that normally form the objective context of the | 72 | from the real UID and GID that normally form the objective context of the |
93 | task. | 73 | task. |
94 | 74 | ||
95 | (6) Actions. | 75 | 6. Actions. |
96 | 76 | ||
97 | Linux has a number of actions available that a subject may perform upon an | 77 | Linux has a number of actions available that a subject may perform upon an |
98 | object. The set of actions available depends on the nature of the subject | 78 | object. The set of actions available depends on the nature of the subject |
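From userspace, you can observe the split between the objective context (real IDs) and the subjective context (effective IDs) by calling `getresuid(2)`, which also returns the saved UID. The sketch below is minimal. `struct uid_triple` and its helpers are illustrative names and belong to neither the kernel nor libc:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

struct uid_triple {
    uid_t real;      /* normally part of the objective context */
    uid_t effective; /* normally part of the subjective context */
    uid_t saved;     /* lets a set-UID program switch back and forth */
};

/* Fill *t from getresuid(); returns 0 on success, -1 on error. */
int read_uids(struct uid_triple *t)
{
    assert(t != NULL);
    return getresuid(&t->real, &t->effective, &t->saved);
}

/* True when the subjective and objective UIDs differ, as in a
 * set-UID program that has not yet dropped its privileges. */
bool uids_diverge(const struct uid_triple *t)
{
    return t->real != t->effective;
}
```

In an ordinary process all three values match. Suppose an unprivileged user runs a set-UID-root binary: `uids_diverge()` should return true until the program drops its privileges.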
@@ -101,7 +81,7 @@ object acts upon another: | |||
101 | Actions include reading, writing, creating and deleting files; forking or | 81 | Actions include reading, writing, creating and deleting files; forking or |
102 | signalling and tracing tasks. | 82 | signalling and tracing tasks. |
103 | 83 | ||
104 | (7) Rules, access control lists and security calculations. | 84 | 7. Rules, access control lists and security calculations. |
105 | 85 | ||
106 | When a subject acts upon an object, a security calculation is made. This | 86 | When a subject acts upon an object, a security calculation is made. This |
107 | involves taking the subjective context, the objective context and the | 87 | involves taking the subjective context, the objective context and the |
@@ -111,7 +91,7 @@ object acts upon another: | |||
111 | 91 | ||
112 | There are two main sources of rules: | 92 | There are two main sources of rules: |
113 | 93 | ||
114 | (a) Discretionary access control (DAC): | 94 | a. Discretionary access control (DAC): |
115 | 95 | ||
116 | Sometimes the object will include sets of rules as part of its | 96 | Sometimes the object will include sets of rules as part of its |
117 | description. This is an 'Access Control List' or 'ACL'. A Linux | 97 | description. This is an 'Access Control List' or 'ACL'. A Linux |
@@ -127,7 +107,7 @@ object acts upon another: | |||
127 | A Linux file might also sport a POSIX ACL. This is a list of rules | 107 | A Linux file might also sport a POSIX ACL. This is a list of rules |
128 | that grants various permissions to arbitrary subjects. | 108 | that grants various permissions to arbitrary subjects. |
129 | 109 | ||
130 | (b) Mandatory access control (MAC): | 110 | b. Mandatory access control (MAC): |
131 | 111 | ||
132 | The system as a whole may have one or more sets of rules that get | 112 | The system as a whole may have one or more sets of rules that get |
133 | applied to all subjects and objects, regardless of their source. | 113 | applied to all subjects and objects, regardless of their source. |
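The simplest DAC rule set is the UNIX owner/group/other mode bits. It can be sketched as a pure function of two inputs: the subjective context (fsuid, fsgid, supplementary groups) and the objective context (the file's UID, GID and mode). This is a deliberately simplified model. It ignores capabilities such as CAP_DAC_OVERRIDE, POSIX ACLs and any LSM, although the kernel takes all of these into account as well. The type and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical subjective context used for a DAC decision. */
struct subject {
    uid_t fsuid;
    gid_t fsgid;
    const gid_t *groups;   /* supplementary groups */
    size_t ngroups;
};

/* Objective context as recorded on the inode. */
struct object {
    uid_t uid;
    gid_t gid;
    mode_t mode;
};

static bool in_group(const struct subject *s, gid_t gid)
{
    if (s->fsgid == gid)
        return true;
    for (size_t i = 0; i < s->ngroups; i++)
        if (s->groups[i] == gid)
            return true;
    return false;
}

/* want is a mask of 4 (read), 2 (write) and 1 (execute). Returns true
 * if the mode bits alone grant every requested access. */
bool dac_permits(const struct subject *s, const struct object *o,
                 unsigned want)
{
    unsigned bits;

    assert((want & ~7u) == 0);
    if (s->fsuid == o->uid)
        bits = (o->mode >> 6) & 7;  /* owner class only, even if stricter */
    else if (in_group(s, o->gid))
        bits = (o->mode >> 3) & 7;  /* group class */
    else
        bits = o->mode & 7;         /* other class */
    return (bits & want) == want;
}
```

Note the first branch. If the subject owns the file, only the owner bits count. A mode of `0047` therefore denies the owner read access, even though "other" may read.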
@@ -139,65 +119,65 @@ object acts upon another: | |||
139 | that says that this action is either granted or denied. | 119 | that says that this action is either granted or denied. |
140 | 120 | ||
141 | 121 | ||
142 | ==================== | 122 | Types of Credentials |
143 | TYPES OF CREDENTIALS | ||
144 | ==================== | 123 | ==================== |
145 | 124 | ||
146 | The Linux kernel supports the following types of credentials: | 125 | The Linux kernel supports the following types of credentials: |
147 | 126 | ||
148 | (1) Traditional UNIX credentials. | 127 | 1. Traditional UNIX credentials. |
149 | 128 | ||
150 | Real User ID | 129 | - Real User ID |
151 | Real Group ID | 130 | - Real Group ID |
152 | 131 | ||
153 | The UID and GID are carried by most, if not all, Linux objects, even if in | 132 | The UID and GID are carried by most, if not all, Linux objects, even if in |
154 | some cases it has to be invented (FAT or CIFS files for example, which are | 133 | some cases it has to be invented (FAT or CIFS files for example, which are |
155 | derived from Windows). These (mostly) define the objective context of | 134 | derived from Windows). These (mostly) define the objective context of |
156 | that object, with tasks being slightly different in some cases. | 135 | that object, with tasks being slightly different in some cases. |
157 | 136 | ||
158 | Effective, Saved and FS User ID | 137 | - Effective, Saved and FS User ID |
159 | Effective, Saved and FS Group ID | 138 | - Effective, Saved and FS Group ID |
160 | Supplementary groups | 139 | - Supplementary groups |
161 | 140 | ||
162 | These are additional credentials used by tasks only. Usually, an | 141 | These are additional credentials used by tasks only. Usually, an |
163 | EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID | 142 | EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID |
164 | will be used as the objective. For tasks, it should be noted that this is | 143 | will be used as the objective. For tasks, it should be noted that this is |
165 | not always true. | 144 | not always true. |
166 | 145 | ||
167 | (2) Capabilities. | 146 | 2. Capabilities. |
168 | 147 | ||
169 | Set of permitted capabilities | 148 | - Set of permitted capabilities |
170 | Set of inheritable capabilities | 149 | - Set of inheritable capabilities |
171 | Set of effective capabilities | 150 | - Set of effective capabilities |
172 | Capability bounding set | 151 | - Capability bounding set |
173 | 152 | ||
174 | These are only carried by tasks. They indicate superior capabilities | 153 | These are only carried by tasks. They indicate superior capabilities |
175 | granted piecemeal to a task that an ordinary task wouldn't otherwise have. | 154 | granted piecemeal to a task that an ordinary task wouldn't otherwise have. |
176 | These are manipulated implicitly by changes to the traditional UNIX | 155 | These are manipulated implicitly by changes to the traditional UNIX |
177 | credentials, but can also be manipulated directly by the capset() system | 156 | credentials, but can also be manipulated directly by the ``capset()`` |
178 | call. | 157 | system call. |
179 | 158 | ||
180 | The permitted capabilities are those caps that the process might grant | 159 | The permitted capabilities are those caps that the process might grant |
181 | itself to its effective or permitted sets through capset(). This | 160 | itself to its effective or permitted sets through ``capset()``. This |
182 | inheritable set might also be so constrained. | 161 | inheritable set might also be so constrained. |
183 | 162 | ||
184 | The effective capabilities are the ones that a task is actually allowed to | 163 | The effective capabilities are the ones that a task is actually allowed to |
185 | make use of itself. | 164 | make use of itself. |
186 | 165 | ||
187 | The inheritable capabilities are the ones that may get passed across | 166 | The inheritable capabilities are the ones that may get passed across |
188 | execve(). | 167 | ``execve()``. |
189 | 168 | ||
190 | The bounding set limits the capabilities that may be inherited across | 169 | The bounding set limits the capabilities that may be inherited across |
191 | execve(), especially when a binary is executed that will execute as UID 0. | 170 | ``execve()``, especially when a binary is executed that will execute as |
171 | UID 0. | ||
192 | 172 | ||
193 | (3) Secure management flags (securebits). | 173 | 3. Secure management flags (securebits). |
194 | 174 | ||
195 | These are only carried by tasks. These govern the way the above | 175 | These are only carried by tasks. These govern the way the above |
196 | credentials are manipulated and inherited over certain operations such as | 176 | credentials are manipulated and inherited over certain operations such as |
197 | execve(). They aren't used directly as objective or subjective | 177 | execve(). They aren't used directly as objective or subjective |
198 | credentials. | 178 | credentials. |
199 | 179 | ||
200 | (4) Keys and keyrings. | 180 | 4. Keys and keyrings. |
201 | 181 | ||
202 | These are only carried by tasks. They carry and cache security tokens | 182 | These are only carried by tasks. They carry and cache security tokens |
203 | that don't fit into the other standard UNIX credentials. They are for | 183 | that don't fit into the other standard UNIX credentials. They are for |
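You can inspect the capability sets listed above from userspace. The sketch below is minimal and assumes a Linux system with the `<linux/capability.h>` UAPI header. It calls the raw `capget` system call with the version 3 header, which carries two 32-bit words per set. Because `capget` does not report the bounding set, the sketch reads that set one capability at a time with `prctl(PR_CAPBSET_READ)`. The struct and helper names are illustrative:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/capability.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The three sets that capget() reports, widened to 64-bit masks. */
struct cap_sets {
    uint64_t permitted, inheritable, effective;
};

/* Read the calling thread's sets via the raw capget() system call
 * (no libcap). Returns 0 on success, -1 on error. */
int read_cap_sets(struct cap_sets *out)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                    /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    assert(out != NULL);
    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;
    out->permitted   = data[0].permitted   | (uint64_t)data[1].permitted   << 32;
    out->inheritable = data[0].inheritable | (uint64_t)data[1].inheritable << 32;
    out->effective   = data[0].effective   | (uint64_t)data[1].effective   << 32;
    return 0;
}

/* 1 if cap is in the bounding set, 0 if not, -1 on error. */
int in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}
```

The kernel rejects any `capset()` that would leave effective capabilities outside the permitted set. So `(effective & ~permitted) == 0` should hold for any task.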
@@ -218,7 +198,7 @@ The Linux kernel supports the following types of credentials: | |||
218 | 198 | ||
219 | For more information on using keys, see Documentation/security/keys.txt. | 199 | For more information on using keys, see Documentation/security/keys.txt. |
220 | 200 | ||
221 | (5) LSM | 201 | 5. LSM |
222 | 202 | ||
223 | The Linux Security Module allows extra controls to be placed over the | 203 | The Linux Security Module allows extra controls to be placed over the |
224 | operations that a task may do. Currently Linux supports several LSM | 204 | operations that a task may do. Currently Linux supports several LSM |
@@ -228,7 +208,7 @@ The Linux kernel supports the following types of credentials: | |||
228 | rules (policies) that say what operations a task with one label may do to | 208 | rules (policies) that say what operations a task with one label may do to |
229 | an object with another label. | 209 | an object with another label. |
230 | 210 | ||
231 | (6) AF_KEY | 211 | 6. AF_KEY |
232 | 212 | ||
233 | This is a socket-based approach to credential management for networking | 213 | This is a socket-based approach to credential management for networking |
234 | stacks [RFC 2367]. It isn't discussed by this document as it doesn't | 214 | stacks [RFC 2367]. It isn't discussed by this document as it doesn't |
@@ -244,25 +224,19 @@ network filesystem where the credentials of the opened file should be presented | |||
244 | to the server, regardless of who is actually doing a read or a write upon it. | 224 | to the server, regardless of who is actually doing a read or a write upon it. |
245 | 225 | ||
246 | 226 | ||
247 | ============= | 227 | File Markings |
248 | FILE MARKINGS | ||
249 | ============= | 228 | ============= |
250 | 229 | ||
251 | Files on disk or obtained over the network may have annotations that form the | 230 | Files on disk or obtained over the network may have annotations that form the |
252 | objective security context of that file. Depending on the type of filesystem, | 231 | objective security context of that file. Depending on the type of filesystem, |
253 | this may include one or more of the following: | 232 | this may include one or more of the following: |
254 | 233 | ||
255 | (*) UNIX UID, GID, mode; | 234 | * UNIX UID, GID, mode; |
256 | 235 | * Windows user ID; | |
257 | (*) Windows user ID; | 236 | * Access control list; |
258 | 237 | * LSM security label; | |
259 | (*) Access control list; | 238 | * UNIX exec privilege escalation bits (SUID/SGID); |
260 | 239 | * File capabilities exec privilege escalation bits. | |
261 | (*) LSM security label; | ||
262 | |||
263 | (*) UNIX exec privilege escalation bits (SUID/SGID); | ||
264 | |||
265 | (*) File capabilities exec privilege escalation bits. | ||
266 | 240 | ||
267 | These are compared to the task's subjective security context, and certain | 241 | These are compared to the task's subjective security context, and certain |
268 | operations allowed or disallowed as a result. In the case of execve(), the | 242 | operations allowed or disallowed as a result. In the case of execve(), the |
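You can read the UNIX subset of the file markings above (UID, GID, mode and the SUID/SGID bits) with `fstat(2)` on an open descriptor, or with `stat(2)` on a path. The sketch below is minimal and uses illustrative names. It does not cover ACLs, LSM labels or file capabilities, which are stored in extended attributes:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* The UNIX part of a file's objective security context. */
struct unix_marks {
    uid_t uid;
    gid_t gid;
    mode_t perms;   /* rwx bits for owner, group and other */
    bool suid;      /* S_ISUID: exec runs with the file's UID */
    bool sgid;      /* S_ISGID: exec runs with the file's GID (if group-executable) */
};

/* Returns 0 on success, -1 with errno from fstat() on error. */
int read_unix_marks(int fd, struct unix_marks *m)
{
    struct stat st;

    assert(m != NULL);
    if (fstat(fd, &st) != 0)
        return -1;
    m->uid = st.st_uid;
    m->gid = st.st_gid;
    m->perms = st.st_mode & 0777;
    m->suid = (st.st_mode & S_ISUID) != 0;
    m->sgid = (st.st_mode & S_ISGID) != 0;
    return 0;
}

/* New files are owned by the creator's fsuid, which normally equals
 * its euid; this compares against the euid as an approximation. */
bool owned_by_caller(const struct unix_marks *m)
{
    return m->uid == geteuid();
}
```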
@@ -270,8 +244,7 @@ privilege escalation bits come into play, and may allow the resulting process | |||
270 | extra privileges, based on the annotations on the executable file. | 244 | extra privileges, based on the annotations on the executable file. |
271 | 245 | ||
272 | 246 | ||
273 | ================ | 247 | Task Credentials |
274 | TASK CREDENTIALS | ||
275 | ================ | 248 | ================ |
276 | 249 | ||
277 | In Linux, all of a task's credentials are held in (uid, gid) or through | 250 | In Linux, all of a task's credentials are held in (uid, gid) or through |
@@ -282,20 +255,20 @@ task_struct. | |||
282 | Once a set of credentials has been prepared and committed, it may not be | 255 | Once a set of credentials has been prepared and committed, it may not be |
283 | changed, barring the following exceptions: | 256 | changed, barring the following exceptions: |
284 | 257 | ||
285 | (1) its reference count may be changed; | 258 | 1. its reference count may be changed; |
286 | 259 | ||
287 | (2) the reference count on the group_info struct it points to may be changed; | 260 | 2. the reference count on the group_info struct it points to may be changed; |
288 | 261 | ||
289 | (3) the reference count on the security data it points to may be changed; | 262 | 3. the reference count on the security data it points to may be changed; |
290 | 263 | ||
291 | (4) the reference count on any keyrings it points to may be changed; | 264 | 4. the reference count on any keyrings it points to may be changed; |
292 | 265 | ||
293 | (5) any keyrings it points to may be revoked, expired or have their security | 266 | 5. any keyrings it points to may be revoked, expired or have their security |
294 | attributes changed; and | 267 | attributes changed; and |
295 | 268 | ||
296 | (6) the contents of any keyrings to which it points may be changed (the whole | 269 | 6. the contents of any keyrings to which it points may be changed (the whole |
297 | point of keyrings being a shared set of credentials, modifiable by anyone | 270 | point of keyrings being a shared set of credentials, modifiable by anyone |
298 | with appropriate access). | 271 | with appropriate access). |
299 | 272 | ||
300 | To alter anything in the cred struct, the copy-and-replace principle must be | 273 | To alter anything in the cred struct, the copy-and-replace principle must be |
301 | adhered to. First take a copy, then alter the copy and then use RCU to change | 274 | adhered to. First take a copy, then alter the copy and then use RCU to change |
@@ -303,37 +276,37 @@ the task pointer to make it point to the new copy. There are wrappers to aid | |||
303 | with this (see below). | 276 | with this (see below). |
304 | 277 | ||
305 | A task may only alter its _own_ credentials; it is no longer permitted for a | 278 | A task may only alter its _own_ credentials; it is no longer permitted for a |
306 | task to alter another's credentials. This means the capset() system call is no | 279 | task to alter another's credentials. This means the ``capset()`` system call |
307 | longer permitted to take any PID other than the one of the current process. | 280 | is no longer permitted to take any PID other than the one of the current |
308 | Also keyctl_instantiate() and keyctl_negate() functions no longer permit | 281 | process. Also ``keyctl_instantiate()`` and ``keyctl_negate()`` functions no |
309 | attachment to process-specific keyrings in the requesting process as the | 282 | longer permit attachment to process-specific keyrings in the requesting |
310 | instantiating process may need to create them. | 283 | process as the instantiating process may need to create them. |
311 | 284 | ||
312 | 285 | ||
313 | IMMUTABLE CREDENTIALS | 286 | Immutable Credentials |
314 | --------------------- | 287 | --------------------- |
315 | 288 | ||
316 | Once a set of credentials has been made public (by calling commit_creds() for | 289 | Once a set of credentials has been made public (by calling ``commit_creds()`` |
317 | example), it must be considered immutable, barring two exceptions: | 290 | for example), it must be considered immutable, barring two exceptions: |
318 | 291 | ||
319 | (1) The reference count may be altered. | 292 | 1. The reference count may be altered. |
320 | 293 | ||
321 | (2) Whilst the keyring subscriptions of a set of credentials may not be | 294 | 2. Whilst the keyring subscriptions of a set of credentials may not be |
322 | changed, the keyrings subscribed to may have their contents altered. | 295 | changed, the keyrings subscribed to may have their contents altered. |
323 | 296 | ||
324 | To catch accidental credential alteration at compile time, struct task_struct | 297 | To catch accidental credential alteration at compile time, struct task_struct |
325 | has _const_ pointers to its credential sets, as does struct file. Furthermore, | 298 | has _const_ pointers to its credential sets, as does struct file. Furthermore, |
326 | certain functions such as get_cred() and put_cred() operate on const pointers, | 299 | certain functions such as ``get_cred()`` and ``put_cred()`` operate on const |
327 | thus rendering casts unnecessary, but require to temporarily ditch the const | 300 | pointers, thus rendering casts unnecessary, but require to temporarily ditch |
328 | qualification to be able to alter the reference count. | 301 | the const qualification to be able to alter the reference count. |
329 | 302 | ||
330 | 303 | ||
331 | ACCESSING TASK CREDENTIALS | 304 | Accessing Task Credentials |
332 | -------------------------- | 305 | -------------------------- |
333 | 306 | ||
334 | A task being able to alter only its own credentials permits the current process | 307 | A task being able to alter only its own credentials permits the current process |
335 | to read or replace its own credentials without the need for any form of locking | 308 | to read or replace its own credentials without the need for any form of locking |
336 | - which simplifies things greatly. It can just call: | 309 | -- which simplifies things greatly. It can just call:: |
337 | 310 | ||
338 | const struct cred *current_cred() | 311 | const struct cred *current_cred() |
339 | 312 | ||
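Under the immutability rule above, the reference count is the one field that may still change after publication. This rule can be modelled in userspace with C11 atomics. The sketch below is hypothetical and is not kernel code. `struct model_cred` and its helpers are invented for illustration. Also, the model frees the last reference immediately, whereas the kernel's `put_cred()` defers destruction to RCU:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical immutable, refcounted credential set: after creation,
 * only the usage count is ever modified. */
struct model_cred {
    atomic_int usage;
    uid_t uid, euid;
};

struct model_cred *model_cred_new(uid_t uid, uid_t euid)
{
    struct model_cred *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    atomic_init(&c->usage, 1);
    c->uid = uid;
    c->euid = euid;
    return c;
}

/* Take a reference. Callers hold const pointers; casting away const
 * is confined to the refcount, the one permitted mutation. */
const struct model_cred *model_get_cred(const struct model_cred *c)
{
    atomic_fetch_add(&((struct model_cred *)c)->usage, 1);
    return c;
}

/* Drop a reference and free the set when the last one goes. */
void model_put_cred(const struct model_cred *c)
{
    struct model_cred *m = (struct model_cred *)c;
    int old = atomic_fetch_sub(&m->usage, 1);

    assert(old > 0);
    if (old == 1)
        free(m);
}
```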
@@ -341,7 +314,7 @@ to get a pointer to its credentials structure, and it doesn't have to release | |||
341 | it afterwards. | 314 | it afterwards. |
342 | 315 | ||
343 | There are convenience wrappers for retrieving specific aspects of a task's | 316 | There are convenience wrappers for retrieving specific aspects of a task's |
344 | credentials (the value is simply returned in each case): | 317 | credentials (the value is simply returned in each case):: |
345 | 318 | ||
346 | uid_t current_uid(void) Current's real UID | 319 | uid_t current_uid(void) Current's real UID |
347 | gid_t current_gid(void) Current's real GID | 320 | gid_t current_gid(void) Current's real GID |
@@ -354,7 +327,7 @@ credentials (the value is simply returned in each case): | |||
354 | struct user_struct *current_user(void) Current's user account | 327 | struct user_struct *current_user(void) Current's user account |
355 | 328 | ||
356 | There are also convenience wrappers for retrieving specific associated pairs of | 329 | There are also convenience wrappers for retrieving specific associated pairs of |
357 | a task's credentials: | 330 | a task's credentials:: |
358 | 331 | ||
359 | void current_uid_gid(uid_t *, gid_t *); | 332 | void current_uid_gid(uid_t *, gid_t *); |
360 | void current_euid_egid(uid_t *, gid_t *); | 333 | void current_euid_egid(uid_t *, gid_t *); |
@@ -365,12 +338,12 @@ them from the current task's credentials. | |||
365 | 338 | ||
366 | 339 | ||
367 | In addition, there is a function for obtaining a reference on the current | 340 | In addition, there is a function for obtaining a reference on the current |
368 | process's current set of credentials: | 341 | process's current set of credentials:: |
369 | 342 | ||
370 | const struct cred *get_current_cred(void); | 343 | const struct cred *get_current_cred(void); |
371 | 344 | ||
372 | and functions for getting references to one of the credentials that don't | 345 | and functions for getting references to one of the credentials that don't |
373 | actually live in struct cred: | 346 | actually live in struct cred:: |
374 | 347 | ||
375 | struct user_struct *get_current_user(void); | 348 | struct user_struct *get_current_user(void); |
376 | struct group_info *get_current_groups(void); | 349 | struct group_info *get_current_groups(void); |
@@ -378,22 +351,22 @@ actually live in struct cred: | |||
378 | which get references to the current process's user accounting structure and | 351 | which get references to the current process's user accounting structure and |
379 | supplementary groups list respectively. | 352 | supplementary groups list respectively. |
380 | 353 | ||
381 | Once a reference has been obtained, it must be released with put_cred(), | 354 | Once a reference has been obtained, it must be released with ``put_cred()``, |
382 | free_uid() or put_group_info() as appropriate. | 355 | ``free_uid()`` or ``put_group_info()`` as appropriate. |
383 | 356 | ||
384 | 357 | ||
385 | ACCESSING ANOTHER TASK'S CREDENTIALS | 358 | Accessing Another Task's Credentials |
386 | ------------------------------------ | 359 | ------------------------------------ |
387 | 360 | ||
388 | Whilst a task may access its own credentials without the need for locking, the | 361 | Whilst a task may access its own credentials without the need for locking, the |
389 | same is not true of a task wanting to access another task's credentials. It | 362 | same is not true of a task wanting to access another task's credentials. It |
390 | must use the RCU read lock and rcu_dereference(). | 363 | must use the RCU read lock and ``rcu_dereference()``. |
391 | 364 | ||
392 | The rcu_dereference() is wrapped by: | 365 | The ``rcu_dereference()`` is wrapped by:: |
393 | 366 | ||
394 | const struct cred *__task_cred(struct task_struct *task); | 367 | const struct cred *__task_cred(struct task_struct *task); |
395 | 368 | ||
396 | This should be used inside the RCU read lock, as in the following example: | 369 | This should be used inside the RCU read lock, as in the following example:: |
397 | 370 | ||
398 | void foo(struct task_struct *t, struct foo_data *f) | 371 | void foo(struct task_struct *t, struct foo_data *f) |
399 | { | 372 | { |
@@ -410,39 +383,40 @@ This should be used inside the RCU read lock, as in the following example: | |||
410 | 383 | ||
411 | Should it be necessary to hold another task's credentials for a long period of | 384 | Should it be necessary to hold another task's credentials for a long period of |
412 | time, and possibly to sleep whilst doing so, then the caller should get a | 385 | time, and possibly to sleep whilst doing so, then the caller should get a |
413 | reference on them using: | 386 | reference on them using:: |
414 | 387 | ||
415 | const struct cred *get_task_cred(struct task_struct *task); | 388 | const struct cred *get_task_cred(struct task_struct *task); |
416 | 389 | ||
417 | This does all the RCU magic inside of it. The caller must call put_cred() on | 390 | This does all the RCU magic inside of it. The caller must call put_cred() on |
418 | the credentials so obtained when they're finished with. | 391 | the credentials so obtained when they're finished with. |
419 | 392 | ||
420 | [*] Note: The result of __task_cred() should not be passed directly to | 393 | .. note:: |
421 | get_cred() as this may race with commit_cred(). | 394 | The result of ``__task_cred()`` should not be passed directly to |
395 | ``get_cred()`` as this may race with ``commit_cred()``. | ||
422 | 396 | ||
423 | There are a couple of convenience functions to access bits of another task's | 397 | There are a couple of convenience functions to access bits of another task's |
424 | credentials, hiding the RCU magic from the caller: | 398 | credentials, hiding the RCU magic from the caller:: |
425 | 399 | ||
426 | uid_t task_uid(task) Task's real UID | 400 | uid_t task_uid(task) Task's real UID |
427 | uid_t task_euid(task) Task's effective UID | 401 | uid_t task_euid(task) Task's effective UID |
428 | 402 | ||
429 | If the caller is holding the RCU read lock at the time anyway, then: | 403 | If the caller is holding the RCU read lock at the time anyway, then:: |
430 | 404 | ||
431 | __task_cred(task)->uid | 405 | __task_cred(task)->uid |
432 | __task_cred(task)->euid | 406 | __task_cred(task)->euid |
433 | 407 | ||
434 | should be used instead. Similarly, if multiple aspects of a task's credentials | 408 | should be used instead. Similarly, if multiple aspects of a task's credentials |
435 | need to be accessed, RCU read lock should be used, __task_cred() called, the | 409 | need to be accessed, RCU read lock should be used, ``__task_cred()`` called, |
436 | result stored in a temporary pointer and then the credential aspects called | 410 | the result stored in a temporary pointer and then the credential aspects called |
437 | from that before dropping the lock. This prevents the potentially expensive | 411 | from that before dropping the lock. This prevents the potentially expensive |
438 | RCU magic from being invoked multiple times. | 412 | RCU magic from being invoked multiple times. |
439 | 413 | ||
440 | Should some other single aspect of another task's credentials need to be | 414 | Should some other single aspect of another task's credentials need to be |
441 | accessed, then this can be used: | 415 | accessed, then this can be used:: |
442 | 416 | ||
443 | task_cred_xxx(task, member) | 417 | task_cred_xxx(task, member) |
444 | 418 | ||
445 | where 'member' is a non-pointer member of the cred struct. For instance: | 419 | where 'member' is a non-pointer member of the cred struct. For instance:: |
446 | 420 | ||
447 | uid_t task_cred_xxx(task, suid); | 421 | uid_t task_cred_xxx(task, suid); |
448 | 422 | ||
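Userspace cannot call `task_uid()` or `task_cred_xxx()`. However, procfs reports another task's UIDs on the `Uid:` line of `/proc/<pid>/status`, in the order real, effective, saved, filesystem. The sketch below is minimal and assumes procfs is mounted at `/proc`. The function and struct names are illustrative:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* UIDs in the order /proc/<pid>/status prints them on "Uid:". */
struct proc_uids {
    unsigned long real, effective, saved, fs;
};

/* Read a task's UIDs from procfs; pid <= 0 means the caller itself.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_proc_uids(int pid, struct proc_uids *u)
{
    char path[64], line[256];
    FILE *f;
    int ret = -1;

    assert(u != NULL);
    if (pid > 0)
        snprintf(path, sizeof(path), "/proc/%d/status", pid);
    else
        snprintf(path, sizeof(path), "/proc/self/status");
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "Uid:", 4) == 0) {
            if (sscanf(line + 4, "%lu %lu %lu %lu", &u->real,
                       &u->effective, &u->saved, &u->fs) == 4)
                ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Reading all four fields at once mirrors the advice above. The values come from a single read of the target's credentials, not from several separate lookups.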
@@ -451,7 +425,7 @@ magic. This may not be used for pointer members as what they point to may | |||
451 | disappear the moment the RCU read lock is dropped. | 425 | disappear the moment the RCU read lock is dropped. |
452 | 426 | ||
453 | 427 | ||
454 | ALTERING CREDENTIALS | 428 | Altering Credentials |
455 | -------------------- | 429 | -------------------- |
456 | 430 | ||
457 | As previously mentioned, a task may only alter its own credentials, and may not | 431 | As previously mentioned, a task may only alter its own credentials, and may not |
@@ -459,7 +433,7 @@ alter those of another task. This means that it doesn't need to use any | |||
459 | locking to alter its own credentials. | 433 | locking to alter its own credentials. |
460 | 434 | ||
461 | To alter the current process's credentials, a function should first prepare a | 435 | To alter the current process's credentials, a function should first prepare a |
462 | new set of credentials by calling: | 436 | new set of credentials by calling:: |
463 | 437 | ||
464 | struct cred *prepare_creds(void); | 438 | struct cred *prepare_creds(void); |
465 | 439 | ||
@@ -467,9 +441,10 @@ this locks current->cred_replace_mutex and then allocates and constructs a | |||
467 | duplicate of the current process's credentials, returning with the mutex still | 441 | duplicate of the current process's credentials, returning with the mutex still |
468 | held if successful. It returns NULL if not successful (out of memory). | 442 | held if successful. It returns NULL if not successful (out of memory). |
469 | 443 | ||
470 | The mutex prevents ptrace() from altering the ptrace state of a process whilst | 444 | The mutex prevents ``ptrace()`` from altering the ptrace state of a process |
471 | security checks on credentials construction and changing is taking place as | 445 | whilst security checks on credentials construction and changing is taking place |
472 | the ptrace state may alter the outcome, particularly in the case of execve(). | 446 | as the ptrace state may alter the outcome, particularly in the case of |
447 | ``execve()``. | ||
473 | 448 | ||
474 | The new credentials set should be altered appropriately, and any security | 449 | The new credentials set should be altered appropriately, and any security |
475 | checks and hooks done. Both the current and the proposed sets of credentials | 450 | checks and hooks done. Both the current and the proposed sets of credentials |
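This section describes a prepare, alter, then commit-or-abort sequence, which can be modelled in userspace. The sketch below is hypothetical and is not the kernel implementation:

* A C11 atomic pointer exchange stands in for `rcu_assign_pointer()`.
* There is no `cred_replace_mutex`.
* The old set is freed immediately instead of after an RCU grace period, so the model is only safe when no concurrent readers exist.

All `model_` names are invented for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical credential set; immutable once published. */
struct model_cred {
    uid_t uid, euid, suid;
};

/* The "task's" published credentials (zero-initialized to NULL). */
static _Atomic(const struct model_cred *) current_cred_ptr;

/* Step 1: duplicate the published set so it can be edited privately. */
struct model_cred *model_prepare_creds(void)
{
    const struct model_cred *old = atomic_load(&current_cred_ptr);
    struct model_cred *new = calloc(1, sizeof(*new));

    if (new && old)
        *new = *old;
    return new;
}

/* Step 3a: publish the edited copy with a single atomic pointer swap. */
int model_commit_creds(struct model_cred *new)
{
    const struct model_cred *old = atomic_exchange(&current_cred_ptr, new);

    free((void *)old); /* the kernel would defer this past an RCU grace period */
    return 0;
}

/* Step 3b: on error, discard the copy; the published set is untouched. */
void model_abort_creds(struct model_cred *new)
{
    free(new);
}

/* Example caller in the spirit of the alter_suid() sketch below. */
int model_alter_suid(uid_t suid)
{
    struct model_cred *new = model_prepare_creds();

    if (!new)
        return -1;
    new->suid = suid;   /* Step 2: modify only the private copy */
    return model_commit_creds(new);
}
```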
@@ -478,36 +453,37 @@ still at this point. | |||
478 | 453 | ||
479 | 454 | ||
480 | When the credential set is ready, it should be committed to the current process | 455 | When the credential set is ready, it should be committed to the current process |
481 | by calling: | 456 | by calling:: |
482 | 457 | ||
483 | int commit_creds(struct cred *new); | 458 | int commit_creds(struct cred *new); |
484 | 459 | ||
485 | This will alter various aspects of the credentials and the process, giving the | 460 | This will alter various aspects of the credentials and the process, giving the |
486 | LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually | 461 | LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to |
487 | commit the new credentials to current->cred, it will release | 462 | actually commit the new credentials to ``current->cred``, it will release |
488 | current->cred_replace_mutex to allow ptrace() to take place, and it will notify | 463 | ``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it |
489 | the scheduler and others of the changes. | 464 | will notify the scheduler and others of the changes. |
490 | 465 | ||
491 | This function is guaranteed to return 0, so that it can be tail-called at the | 466 | This function is guaranteed to return 0, so that it can be tail-called at the |
492 | end of such functions as sys_setresuid(). | 467 | end of such functions as ``sys_setresuid()``. |
493 | 468 | ||
494 | Note that this function consumes the caller's reference to the new credentials. | 469 | Note that this function consumes the caller's reference to the new credentials. |
495 | The caller should _not_ call put_cred() on the new credentials afterwards. | 470 | The caller should _not_ call ``put_cred()`` on the new credentials afterwards. |
496 | 471 | ||
497 | Furthermore, once this function has been called on a new set of credentials, | 472 | Furthermore, once this function has been called on a new set of credentials, |
498 | those credentials may _not_ be changed further. | 473 | those credentials may _not_ be changed further. |
499 | 474 | ||
500 | 475 | ||
501 | Should the security checks fail or some other error occur after prepare_creds() | 476 | Should the security checks fail or some other error occur after |
502 | has been called, then the following function should be invoked: | 477 | ``prepare_creds()`` has been called, then the following function should be |
478 | invoked:: | ||
503 | 479 | ||
504 | void abort_creds(struct cred *new); | 480 | void abort_creds(struct cred *new); |
505 | 481 | ||
506 | This releases the lock on current->cred_replace_mutex that prepare_creds() got | 482 | This releases the lock on ``current->cred_replace_mutex`` that |
507 | and then releases the new credentials. | 483 | ``prepare_creds()`` got and then releases the new credentials. |
508 | 484 | ||
509 | 485 | ||
510 | A typical credentials alteration function would look something like this: | 486 | A typical credentials alteration function would look something like this:: |
511 | 487 | ||
512 | int alter_suid(uid_t suid) | 488 | int alter_suid(uid_t suid) |
513 | { | 489 | { |
@@ -529,53 +505,50 @@ A typical credentials alteration function would look something like this: | |||
529 | } | 505 | } |
530 | 506 | ||
531 | 507 | ||
532 | MANAGING CREDENTIALS | 508 | Managing Credentials |
533 | -------------------- | 509 | -------------------- |
534 | 510 | ||
535 | There are some functions to help manage credentials: | 511 | There are some functions to help manage credentials: |
536 | 512 | ||
537 | (*) void put_cred(const struct cred *cred); | 513 | - ``void put_cred(const struct cred *cred);`` |
538 | 514 | ||
539 | This releases a reference to the given set of credentials. If the | 515 | This releases a reference to the given set of credentials. If the |
540 | reference count reaches zero, the credentials will be scheduled for | 516 | reference count reaches zero, the credentials will be scheduled for |
541 | destruction by the RCU system. | 517 | destruction by the RCU system. |
542 | 518 | ||
543 | (*) const struct cred *get_cred(const struct cred *cred); | 519 | - ``const struct cred *get_cred(const struct cred *cred);`` |
544 | 520 | ||
545 | This gets a reference on a live set of credentials, returning a pointer to | 521 | This gets a reference on a live set of credentials, returning a pointer to |
546 | that set of credentials. | 522 | that set of credentials. |
547 | 523 | ||
548 | (*) struct cred *get_new_cred(struct cred *cred); | 524 | - ``struct cred *get_new_cred(struct cred *cred);`` |
549 | 525 | ||
550 | This gets a reference on a set of credentials that is under construction | 526 | This gets a reference on a set of credentials that is under construction |
551 | and is thus still mutable, returning a pointer to that set of credentials. | 527 | and is thus still mutable, returning a pointer to that set of credentials. |
552 | 528 | ||
553 | 529 | ||
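The get_cred()/put_cred() rules in the credentials diff above are ordinary reference counting. The following is a minimal userspace sketch of that counting only. struct model_cred, model_get_cred() and model_put_cred() are hypothetical stand-ins, not kernel API. Where the real put_cred() schedules destruction through RCU, this model only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a credentials object; the real
 * struct cred lives in <linux/cred.h> and carries far more state. */
struct model_cred {
	int usage;       /* reference count */
	bool destroyed;  /* set when the last reference is dropped */
};

/* Take an extra reference on a live set, as get_cred() does, and
 * return a pointer to that same set. */
static struct model_cred *model_get_cred(struct model_cred *cred)
{
	assert(!cred->destroyed && cred->usage > 0);
	cred->usage++;
	return cred;
}

/* Drop a reference, as put_cred() does.  The kernel hands the final
 * release to RCU; this model just marks the object as destroyed. */
static void model_put_cred(struct model_cred *cred)
{
	assert(cred->usage > 0);
	if (--cred->usage == 0)
		cred->destroyed = true;
}
```

Code that stores a credentials pointer beyond the current call takes its own reference first and drops it when finished; only the final put makes the object eligible for destruction.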
554 | ===================== | 530 | Open File Credentials |
555 | OPEN FILE CREDENTIALS | ||
556 | ===================== | 531 | ===================== |
557 | 532 | ||
558 | When a new file is opened, a reference is obtained on the opening task's | 533 | When a new file is opened, a reference is obtained on the opening task's |
559 | credentials and this is attached to the file struct as 'f_cred' in place of | 534 | credentials and this is attached to the file struct as ``f_cred`` in place of |
560 | 'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid | 535 | ``f_uid`` and ``f_gid``. Code that used to access ``file->f_uid`` and |
561 | should now access file->f_cred->fsuid and file->f_cred->fsgid. | 536 | ``file->f_gid`` should now access ``file->f_cred->fsuid`` and |
537 | ``file->f_cred->fsgid``. | ||
562 | 538 | ||
563 | It is safe to access f_cred without the use of RCU or locking because the | 539 | It is safe to access ``f_cred`` without the use of RCU or locking because the |
564 | pointer will not change over the lifetime of the file struct, and nor will the | 540 | pointer will not change over the lifetime of the file struct, and nor will the |
565 | contents of the cred struct pointed to, barring the exceptions listed above | 541 | contents of the cred struct pointed to, barring the exceptions listed above |
566 | (see the Task Credentials section). | 542 | (see the Task Credentials section). |
567 | 543 | ||
568 | 544 | ||
569 | ======================================= | 545 | Overriding the VFS's Use of Credentials |
570 | OVERRIDING THE VFS'S USE OF CREDENTIALS | ||
571 | ======================================= | 546 | ======================================= |
572 | 547 | ||
573 | Under some circumstances it is desirable to override the credentials used by | 548 | Under some circumstances it is desirable to override the credentials used by |
574 | the VFS, and that can be done by calling into functions such as vfs_mkdir() with a | 549 | the VFS, and that can be done by calling into functions such as ``vfs_mkdir()`` with a |
575 | different set of credentials. This is done in the following places: | 550 | different set of credentials. This is done in the following places: |
576 | 551 | ||
577 | (*) sys_faccessat(). | 552 | * ``sys_faccessat()``. |
578 | 553 | * ``do_coredump()``. | |
579 | (*) do_coredump(). | 554 | * nfs4recover.c. |
580 | |||
581 | (*) nfs4recover.c. | ||
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index 9bae6bb20e7f..298a94a33f05 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst | |||
@@ -1,7 +1,13 @@ | |||
1 | ====================== | 1 | ====================== |
2 | Security documentation | 2 | Security Documentation |
3 | ====================== | 3 | ====================== |
4 | 4 | ||
5 | .. toctree:: | 5 | .. toctree:: |
6 | :maxdepth: 1 | ||
6 | 7 | ||
8 | credentials | ||
9 | IMA-templates | ||
10 | keys/index | ||
11 | LSM | ||
12 | self-protection | ||
7 | tpm/index | 13 | tpm/index |
diff --git a/Documentation/security/keys.txt b/Documentation/security/keys/core.rst index cd5019934d7f..0d831a7afe4f 100644 --- a/Documentation/security/keys.txt +++ b/Documentation/security/keys/core.rst | |||
@@ -1,6 +1,6 @@ | |||
1 | ============================ | 1 | ============================ |
2 | KERNEL KEY RETENTION SERVICE | 2 | Kernel Key Retention Service |
3 | ============================ | 3 | ============================ |
4 | 4 | ||
5 | This service allows cryptographic keys, authentication tokens, cross-domain | 5 | This service allows cryptographic keys, authentication tokens, cross-domain |
6 | user mappings, and similar to be cached in the kernel for the use of | 6 | user mappings, and similar to be cached in the kernel for the use of |
@@ -29,8 +29,7 @@ This document has the following sections: | |||
29 | - Garbage collection | 29 | - Garbage collection |
30 | 30 | ||
31 | 31 | ||
32 | ============ | 32 | Key Overview |
33 | KEY OVERVIEW | ||
34 | ============ | 33 | ============ |
35 | 34 | ||
36 | In this context, keys represent units of cryptographic data, authentication | 35 | In this context, keys represent units of cryptographic data, authentication |
@@ -47,14 +46,14 @@ Each key has a number of attributes: | |||
47 | - State. | 46 | - State. |
48 | 47 | ||
49 | 48 | ||
50 | (*) Each key is issued a serial number of type key_serial_t that is unique for | 49 | * Each key is issued a serial number of type key_serial_t that is unique for |
51 | the lifetime of that key. All serial numbers are positive non-zero 32-bit | 50 | the lifetime of that key. All serial numbers are positive non-zero 32-bit |
52 | integers. | 51 | integers. |
53 | 52 | ||
54 | Userspace programs can use a key's serial numbers as a way to gain access | 53 | Userspace programs can use a key's serial numbers as a way to gain access |
55 | to it, subject to permission checking. | 54 | to it, subject to permission checking. |
56 | 55 | ||
57 | (*) Each key is of a defined "type". Types must be registered inside the | 56 | * Each key is of a defined "type". Types must be registered inside the |
58 | kernel by a kernel service (such as a filesystem) before keys of that type | 57 | kernel by a kernel service (such as a filesystem) before keys of that type |
59 | can be added or used. Userspace programs cannot define new types directly. | 58 | can be added or used. Userspace programs cannot define new types directly. |
60 | 59 | ||
@@ -64,18 +63,18 @@ Each key has a number of attributes: | |||
64 | Should a type be removed from the system, all the keys of that type will | 63 | Should a type be removed from the system, all the keys of that type will |
65 | be invalidated. | 64 | be invalidated. |
66 | 65 | ||
67 | (*) Each key has a description. This should be a printable string. The key | 66 | * Each key has a description. This should be a printable string. The key |
68 | type provides an operation to perform a match between the description on a | 67 | type provides an operation to perform a match between the description on a |
69 | key and a criterion string. | 68 | key and a criterion string. |
70 | 69 | ||
71 | (*) Each key has an owner user ID, a group ID and a permissions mask. These | 70 | * Each key has an owner user ID, a group ID and a permissions mask. These |
72 | are used to control what a process may do to a key from userspace, and | 71 | are used to control what a process may do to a key from userspace, and |
73 | whether a kernel service will be able to find the key. | 72 | whether a kernel service will be able to find the key. |
74 | 73 | ||
75 | (*) Each key can be set to expire at a specific time by the key type's | 74 | * Each key can be set to expire at a specific time by the key type's |
76 | instantiation function. Keys can also be immortal. | 75 | instantiation function. Keys can also be immortal. |
77 | 76 | ||
78 | (*) Each key can have a payload. This is a quantity of data that represents the | 77 | * Each key can have a payload. This is a quantity of data that represents the |
79 | actual "key". In the case of a keyring, this is a list of keys to which | 78 | actual "key". In the case of a keyring, this is a list of keys to which |
80 | the keyring links; in the case of a user-defined key, it's an arbitrary | 79 | the keyring links; in the case of a user-defined key, it's an arbitrary |
81 | blob of data. | 80 | blob of data. |
@@ -91,39 +90,38 @@ Each key has a number of attributes: | |||
91 | permitted, another key type operation will be called to convert the key's | 90 | permitted, another key type operation will be called to convert the key's |
92 | attached payload back into a blob of data. | 91 | attached payload back into a blob of data. |
93 | 92 | ||
94 | (*) Each key can be in one of a number of basic states: | 93 | * Each key can be in one of a number of basic states: |
95 | 94 | ||
96 | (*) Uninstantiated. The key exists, but does not have any data attached. | 95 | * Uninstantiated. The key exists, but does not have any data attached. |
97 | Keys being requested from userspace will be in this state. | 96 | Keys being requested from userspace will be in this state. |
98 | 97 | ||
99 | (*) Instantiated. This is the normal state. The key is fully formed, and | 98 | * Instantiated. This is the normal state. The key is fully formed, and |
100 | has data attached. | 99 | has data attached. |
101 | 100 | ||
102 | (*) Negative. This is a relatively short-lived state. The key acts as a | 101 | * Negative. This is a relatively short-lived state. The key acts as a |
103 | note saying that a previous call out to userspace failed, and acts as | 102 | note saying that a previous call out to userspace failed, and acts as |
104 | a throttle on key lookups. A negative key can be updated to a normal | 103 | a throttle on key lookups. A negative key can be updated to a normal |
105 | state. | 104 | state. |
106 | 105 | ||
107 | (*) Expired. Keys can have lifetimes set. If their lifetime is exceeded, | 106 | * Expired. Keys can have lifetimes set. If their lifetime is exceeded, |
108 | they transition to this state. An expired key can be updated back to a | 107 | they transition to this state. An expired key can be updated back to a |
109 | normal state. | 108 | normal state. |
110 | 109 | ||
111 | (*) Revoked. A key is put in this state by userspace action. It can't be | 110 | * Revoked. A key is put in this state by userspace action. It can't be |
112 | found or operated upon (apart from by unlinking it). | 111 | found or operated upon (apart from by unlinking it). |
113 | 112 | ||
114 | (*) Dead. The key's type was unregistered, and so the key is now useless. | 113 | * Dead. The key's type was unregistered, and so the key is now useless. |
115 | 114 | ||
116 | Keys in the last three states are subject to garbage collection. See the | 115 | Keys in the last three states are subject to garbage collection. See the |
117 | section on "Garbage collection". | 116 | section on "Garbage collection". |
118 | 117 | ||
119 | 118 | ||
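The six basic key states and the garbage-collection rule from the Key Overview diff above can be modelled as a small enum. This is illustrative only. The enum and model_key_is_gc_candidate() are hypothetical, and the kernel derives a key's state from data held in struct key rather than from a single enum value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of the basic key states listed above. */
enum model_key_state {
	MODEL_KEY_UNINSTANTIATED,  /* exists, no data attached yet */
	MODEL_KEY_INSTANTIATED,    /* normal, fully formed */
	MODEL_KEY_NEGATIVE,        /* records a failed userspace call-out */
	MODEL_KEY_EXPIRED,         /* lifetime exceeded */
	MODEL_KEY_REVOKED,         /* revoked by userspace action */
	MODEL_KEY_DEAD,            /* key type was unregistered */
};

/* Keys in the last three states are subject to garbage collection. */
static bool model_key_is_gc_candidate(enum model_key_state state)
{
	switch (state) {
	case MODEL_KEY_EXPIRED:
	case MODEL_KEY_REVOKED:
	case MODEL_KEY_DEAD:
		return true;
	default:
		return false;
	}
}
```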
120 | ==================== | 119 | Key Service Overview |
121 | KEY SERVICE OVERVIEW | ||
122 | ==================== | 120 | ==================== |
123 | 121 | ||
124 | The key service provides a number of features besides keys: | 122 | The key service provides a number of features besides keys: |
125 | 123 | ||
126 | (*) The key service defines three special key types: | 124 | * The key service defines three special key types: |
127 | 125 | ||
128 | (+) "keyring" | 126 | (+) "keyring" |
129 | 127 | ||
@@ -149,7 +147,7 @@ The key service provides a number of features besides keys: | |||
149 | be created and updated from userspace, but the payload is only | 147 | be created and updated from userspace, but the payload is only |
150 | readable from kernel space. | 148 | readable from kernel space. |
151 | 149 | ||
152 | (*) Each process subscribes to three keyrings: a thread-specific keyring, a | 150 | * Each process subscribes to three keyrings: a thread-specific keyring, a |
153 | process-specific keyring, and a session-specific keyring. | 151 | process-specific keyring, and a session-specific keyring. |
154 | 152 | ||
155 | The thread-specific keyring is discarded from the child when any sort of | 153 | The thread-specific keyring is discarded from the child when any sort of |
@@ -170,7 +168,7 @@ The key service provides a number of features besides keys: | |||
170 | The ownership of the thread keyring changes when the real UID and GID of | 168 | The ownership of the thread keyring changes when the real UID and GID of |
171 | the thread changes. | 169 | the thread changes. |
172 | 170 | ||
173 | (*) Each user ID resident in the system holds two special keyrings: a user | 171 | * Each user ID resident in the system holds two special keyrings: a user |
174 | specific keyring and a default user session keyring. The default session | 172 | specific keyring and a default user session keyring. The default session |
175 | keyring is initialised with a link to the user-specific keyring. | 173 | keyring is initialised with a link to the user-specific keyring. |
176 | 174 | ||
@@ -180,7 +178,7 @@ The key service provides a number of features besides keys: | |||
180 | If a process attempts to access its session keyring when it doesn't have one, | 178 | If a process attempts to access its session keyring when it doesn't have one, |
181 | it will be subscribed to the default for its current UID. | 179 | it will be subscribed to the default for its current UID. |
182 | 180 | ||
183 | (*) Each user has two quotas against which the keys they own are tracked. One | 181 | * Each user has two quotas against which the keys they own are tracked. One |
184 | limits the total number of keys and keyrings, the other limits the total | 182 | limits the total number of keys and keyrings, the other limits the total |
185 | amount of description and payload space that can be consumed. | 183 | amount of description and payload space that can be consumed. |
186 | 184 | ||
@@ -194,54 +192,53 @@ The key service provides a number of features besides keys: | |||
194 | If a system call that modifies a key or keyring in some way would put the | 192 | If a system call that modifies a key or keyring in some way would put the |
195 | user over quota, the operation is refused and error EDQUOT is returned. | 193 | user over quota, the operation is refused and error EDQUOT is returned. |
196 | 194 | ||
197 | (*) There's a system call interface by which userspace programs can create and | 195 | * There's a system call interface by which userspace programs can create and |
198 | manipulate keys and keyrings. | 196 | manipulate keys and keyrings. |
199 | 197 | ||
200 | (*) There's a kernel interface by which services can register types and search | 198 | * There's a kernel interface by which services can register types and search |
201 | for keys. | 199 | for keys. |
202 | 200 | ||
203 | (*) There's a way for a search done from the kernel to call back to | 201 | * There's a way for a search done from the kernel to call back to |
204 | userspace to request a key that can't be found in a process's keyrings. | 202 | userspace to request a key that can't be found in a process's keyrings. |
205 | 203 | ||
206 | (*) An optional filesystem is available through which the key database can be | 204 | * An optional filesystem is available through which the key database can be |
207 | viewed and manipulated. | 205 | viewed and manipulated. |
208 | 206 | ||
209 | 207 | ||
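The two per-user quotas described above amount to a check made before a new key is charged to its owner. This is a minimal sketch with hypothetical names (struct model_key_quota, model_quota_charge()). Both limits are supplied by the caller rather than read from the sysctls. It follows the documented rule that an operation which would put the user over quota is refused with EDQUOT.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-user quota record. */
struct model_key_quota {
	unsigned int nkeys;     /* keys and keyrings currently owned */
	unsigned int maxkeys;   /* limit on that count */
	unsigned int nbytes;    /* description + payload bytes charged */
	unsigned int maxbytes;  /* limit on those bytes */
};

/* Charge one new key of 'bytes' size against both quotas.  Returns 0
 * on success, or -EDQUOT (leaving the counters untouched) if either
 * limit would be exceeded. */
static int model_quota_charge(struct model_key_quota *q, unsigned int bytes)
{
	assert(q->nbytes <= q->maxbytes);  /* invariant kept by this function */

	if (q->nkeys >= q->maxkeys || bytes > q->maxbytes - q->nbytes)
		return -EDQUOT;

	q->nkeys++;
	q->nbytes += bytes;
	return 0;
}
```

Returning a negative errno follows kernel convention. A userspace caller of add_key() would instead see -1 with errno set to EDQUOT.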
210 | ====================== | 208 | Key Access Permissions |
211 | KEY ACCESS PERMISSIONS | ||
212 | ====================== | 209 | ====================== |
213 | 210 | ||
214 | Keys have an owner user ID, a group access ID, and a permissions mask. The mask | 211 | Keys have an owner user ID, a group access ID, and a permissions mask. The mask |
215 | has up to eight bits each for possessor, user, group and other access. Only | 212 | has up to eight bits each for possessor, user, group and other access. Only |
216 | six of each set of eight bits are defined. The permissions granted are: | 213 | six of each set of eight bits are defined. The permissions granted are: |
217 | 214 | ||
218 | (*) View | 215 | * View |
219 | 216 | ||
220 | This permits a key or keyring's attributes to be viewed - including key | 217 | This permits a key or keyring's attributes to be viewed - including key |
221 | type and description. | 218 | type and description. |
222 | 219 | ||
223 | (*) Read | 220 | * Read |
224 | 221 | ||
225 | This permits a key's payload to be viewed or a keyring's list of linked | 222 | This permits a key's payload to be viewed or a keyring's list of linked |
226 | keys. | 223 | keys. |
227 | 224 | ||
228 | (*) Write | 225 | * Write |
229 | 226 | ||
230 | This permits a key's payload to be instantiated or updated, or it allows a | 227 | This permits a key's payload to be instantiated or updated, or it allows a |
231 | link to be added to or removed from a keyring. | 228 | link to be added to or removed from a keyring. |
232 | 229 | ||
233 | (*) Search | 230 | * Search |
234 | 231 | ||
235 | This permits keyrings to be searched and keys to be found. Searches can | 232 | This permits keyrings to be searched and keys to be found. Searches can |
236 | only recurse into nested keyrings that have search permission set. | 233 | only recurse into nested keyrings that have search permission set. |
237 | 234 | ||
238 | (*) Link | 235 | * Link |
239 | 236 | ||
240 | This permits a key or keyring to be linked to. To create a link from a | 237 | This permits a key or keyring to be linked to. To create a link from a |
241 | keyring to a key, a process must have Write permission on the keyring and | 238 | keyring to a key, a process must have Write permission on the keyring and |
242 | Link permission on the key. | 239 | Link permission on the key. |
243 | 240 | ||
244 | (*) Set Attribute | 241 | * Set Attribute |
245 | 242 | ||
246 | This permits a key's UID, GID and permissions mask to be changed. | 243 | This permits a key's UID, GID and permissions mask to be changed. |
247 | 244 | ||
@@ -249,8 +246,7 @@ For changing the ownership, group ID or permissions mask, being the owner of | |||
249 | the key or having the sysadmin capability is sufficient. | 246 | the key or having the sysadmin capability is sufficient. |
250 | 247 | ||
251 | 248 | ||
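Each of the four access classes occupies one byte of the 32-bit mask, with the possessor in the most significant byte. That is how a mask such as 1f3f0000 in the /proc/keys listing later in this file should be read. The sketch below decodes and builds such masks. The bit order follows the list above (View, Read, Write, Search, Link, Set Attribute), and the values match the kernel's KEY_POS_*, KEY_USR_*, KEY_GRP_* and KEY_OTH_* constants once shifted into their byte. The KPERM_* and model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-class permission bits, in the order listed above.  Shifted left
 * by 24 they equal KEY_POS_VIEW (0x01000000) and friends. */
enum {
	KPERM_VIEW    = 0x01,
	KPERM_READ    = 0x02,
	KPERM_WRITE   = 0x04,
	KPERM_SEARCH  = 0x08,
	KPERM_LINK    = 0x10,
	KPERM_SETATTR = 0x20,
};

/* Access classes, from the most significant byte downwards. */
enum model_key_class {
	MODEL_POSSESSOR, MODEL_USER, MODEL_GROUP, MODEL_OTHER
};

/* Extract the 8-bit permission set for one class from a full mask. */
static unsigned int model_perm_for(uint32_t mask, enum model_key_class cls)
{
	return (mask >> (8 * (3 - (int)cls))) & 0xff;
}

/* Build a full mask from the four per-class permission bytes. */
static uint32_t model_perm_make(unsigned int pos, unsigned int usr,
				unsigned int grp, unsigned int oth)
{
	return (uint32_t)(pos & 0xff) << 24 | (uint32_t)(usr & 0xff) << 16 |
	       (uint32_t)(grp & 0xff) << 8 | (uint32_t)(oth & 0xff);
}
```

Read this way, the sample mask 1f3f0000 gives the possessor 0x1f (View, Read, Write, Search and Link, but not Set Attribute) and the owning user 0x3f (all six), while group and other get nothing.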
252 | =============== | 249 | SELinux Support |
253 | SELINUX SUPPORT | ||
254 | =============== | 250 | =============== |
255 | 251 | ||
256 | The security class "key" has been added to SELinux so that mandatory access | 252 | The security class "key" has been added to SELinux so that mandatory access |
@@ -282,14 +278,13 @@ their associated thread, and both session and process keyrings are handled | |||
282 | similarly. | 278 | similarly. |
283 | 279 | ||
284 | 280 | ||
285 | ================ | 281 | New ProcFS Files |
286 | NEW PROCFS FILES | ||
287 | ================ | 282 | ================ |
288 | 283 | ||
289 | Two files have been added to procfs by which an administrator can find out | 284 | Two files have been added to procfs by which an administrator can find out |
290 | about the status of the key service: | 285 | about the status of the key service: |
291 | 286 | ||
292 | (*) /proc/keys | 287 | * /proc/keys |
293 | 288 | ||
294 | This lists the keys that are currently viewable by the task reading the | 289 | This lists the keys that are currently viewable by the task reading the |
295 | file, giving information about their type, description and permissions. | 290 | file, giving information about their type, description and permissions. |
@@ -301,7 +296,7 @@ about the status of the key service: | |||
301 | security checks are still performed, and may further filter out keys that | 296 | security checks are still performed, and may further filter out keys that |
302 | the current process is not authorised to view. | 297 | the current process is not authorised to view. |
303 | 298 | ||
304 | The contents of the file look like this: | 299 | The contents of the file look like this:: |
305 | 300 | ||
306 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY | 301 | SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY |
307 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 | 302 | 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 |
@@ -314,7 +309,7 @@ about the status of the key service: | |||
314 | 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 | 309 | 00000893 I--Q-N 1 35s 1f3f0000 0 0 user metal:silver: 0 |
315 | 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 | 310 | 00000894 I--Q-- 1 10h 003f0000 0 0 user metal:gold: 0 |
316 | 311 | ||
317 | The flags are: | 312 | The flags are:: |
318 | 313 | ||
319 | I Instantiated | 314 | I Instantiated |
320 | R Revoked | 315 | R Revoked |
@@ -324,10 +319,10 @@ about the status of the key service: | |||
324 | N Negative key | 319 | N Negative key |
325 | 320 | ||
326 | 321 | ||
327 | (*) /proc/key-users | 322 | * /proc/key-users |
328 | 323 | ||
329 | This file lists the tracking data for each user that has at least one key | 324 | This file lists the tracking data for each user that has at least one key |
330 | on the system. Such data includes quota information and statistics: | 325 | on the system. Such data includes quota information and statistics:: |
331 | 326 | ||
332 | [root@andromeda root]# cat /proc/key-users | 327 | [root@andromeda root]# cat /proc/key-users |
333 | 0: 46 45/45 1/100 13/10000 | 328 | 0: 46 45/45 1/100 13/10000 |
@@ -335,7 +330,8 @@ about the status of the key service: | |||
335 | 32: 2 2/2 2/100 40/10000 | 330 | 32: 2 2/2 2/100 40/10000 |
336 | 38: 2 2/2 2/100 40/10000 | 331 | 38: 2 2/2 2/100 40/10000 |
337 | 332 | ||
338 | The format of each line is | 333 | The format of each line is:: |
334 | |||
339 | <UID>: User ID to which this applies | 335 | <UID>: User ID to which this applies |
340 | <usage> Structure refcount | 336 | <usage> Structure refcount |
341 | <inst>/<keys> Total number of keys and number instantiated | 337 | <inst>/<keys> Total number of keys and number instantiated |
@@ -346,14 +342,14 @@ about the status of the key service: | |||
346 | Four new sysctl files have been added also for the purpose of controlling the | 342 | Four new sysctl files have been added also for the purpose of controlling the |
347 | quota limits on keys: | 343 | quota limits on keys: |
348 | 344 | ||
349 | (*) /proc/sys/kernel/keys/root_maxkeys | 345 | * /proc/sys/kernel/keys/root_maxkeys |
350 | /proc/sys/kernel/keys/root_maxbytes | 346 | /proc/sys/kernel/keys/root_maxbytes |
351 | 347 | ||
352 | These files hold the maximum number of keys that root may have and the | 348 | These files hold the maximum number of keys that root may have and the |
353 | maximum total number of bytes of data that root may have stored in those | 349 | maximum total number of bytes of data that root may have stored in those |
354 | keys. | 350 | keys. |
355 | 351 | ||
356 | (*) /proc/sys/kernel/keys/maxkeys | 352 | * /proc/sys/kernel/keys/maxkeys |
357 | /proc/sys/kernel/keys/maxbytes | 353 | /proc/sys/kernel/keys/maxbytes |
358 | 354 | ||
359 | These files hold the maximum number of keys that each non-root user may | 355 | These files hold the maximum number of keys that each non-root user may |
@@ -364,8 +360,7 @@ Root may alter these by writing each new limit as a decimal number string to | |||
364 | the appropriate file. | 360 | the appropriate file. |
365 | 361 | ||
366 | 362 | ||
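Because /proc/keys is plain text laid out as in its header line, a monitoring tool can split each entry with sscanf(). This sketch assumes SERIAL and PERM are hexadecimal, as the kernel prints them, and keeps everything after the TYPE column as free-form text. struct proc_keys_entry, parse_proc_keys_line() and proc_keys_has_flag() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One /proc/keys entry: SERIAL FLAGS USAGE EXPY PERM UID GID TYPE
 * followed by "DESCRIPTION: SUMMARY". */
struct proc_keys_entry {
	unsigned int serial;   /* hexadecimal */
	char flags[8];         /* e.g. "I--Q-N" */
	int usage;
	char expiry[16];       /* "perm", or a time such as "35s" or "10h" */
	unsigned int perm;     /* hexadecimal permissions mask */
	unsigned int uid, gid;
	char type[32];
	char rest[128];        /* description and type-specific summary */
};

/* Parse one line; returns 0 on success, -1 if it is malformed. */
static int parse_proc_keys_line(const char *line, struct proc_keys_entry *e)
{
	int off = 0;

	if (sscanf(line, "%x %7s %d %15s %x %u %u %31s %n",
		   &e->serial, e->flags, &e->usage, e->expiry,
		   &e->perm, &e->uid, &e->gid, e->type, &off) != 8 || off == 0)
		return -1;

	/* Everything after the TYPE column is free-form text. */
	snprintf(e->rest, sizeof(e->rest), "%s", line + off);
	e->rest[strcspn(e->rest, "\n")] = '\0';
	return 0;
}

/* True if the flag letter 'c' (such as 'I', 'R' or 'N') is present. */
static int proc_keys_has_flag(const struct proc_keys_entry *e, char c)
{
	return c != '\0' && strchr(e->flags, c) != NULL;
}
```

Fed the sample metal:silver line, this yields serial 0x893, flags I--Q-N (so both I, instantiated, and N, negative key, are set), perm 0x1f3f0000 and type "user".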
367 | =============================== | 363 | Userspace System Call Interface |
368 | USERSPACE SYSTEM CALL INTERFACE | ||
369 | =============================== | 364 | =============================== |
370 | 365 | ||
371 | Userspace can manipulate keys directly through three new syscalls: add_key, | 366 | Userspace can manipulate keys directly through three new syscalls: add_key, |
@@ -375,7 +370,7 @@ manipulating keys. | |||
375 | When referring to a key directly, userspace programs should use the key's | 370 | When referring to a key directly, userspace programs should use the key's |
376 | serial number (a positive 32-bit integer). However, there are some special | 371 | serial number (a positive 32-bit integer). However, there are some special |
377 | values available for referring to special keys and keyrings that relate to the | 372 | values available for referring to special keys and keyrings that relate to the |
378 | process making the call: | 373 | process making the call:: |
379 | 374 | ||
380 | CONSTANT VALUE KEY REFERENCED | 375 | CONSTANT VALUE KEY REFERENCED |
381 | ============================== ====== =========================== | 376 | ============================== ====== =========================== |
@@ -391,8 +386,8 @@ process making the call: | |||
391 | 386 | ||
392 | The main syscalls are: | 387 | The main syscalls are: |
393 | 388 | ||
394 | (*) Create a new key of given type, description and payload and add it to the | 389 | * Create a new key of given type, description and payload and add it to the |
395 | nominated keyring: | 390 | nominated keyring:: |
396 | 391 | ||
397 | key_serial_t add_key(const char *type, const char *desc, | 392 | key_serial_t add_key(const char *type, const char *desc, |
398 | const void *payload, size_t plen, | 393 | const void *payload, size_t plen, |
@@ -432,8 +427,8 @@ The main syscalls are: | |||
432 | The ID of the new or updated key is returned if successful. | 427 | The ID of the new or updated key is returned if successful. |
433 | 428 | ||
434 | 429 | ||
435 | (*) Search the process's keyrings for a key, potentially calling out to | 430 | * Search the process's keyrings for a key, potentially calling out to |
436 | userspace to create it. | 431 | userspace to create it:: |
437 | 432 | ||
438 | key_serial_t request_key(const char *type, const char *description, | 433 | key_serial_t request_key(const char *type, const char *description, |
439 | const char *callout_info, | 434 | const char *callout_info, |
@@ -453,7 +448,7 @@ The main syscalls are: | |||
453 | 448 | ||
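Most programs reach the add_key() syscall above, and the KEYCTL_READ operation documented further down, through libkeyutils wrappers. Both can also be called directly with syscall(2). This minimal sketch assumes a Linux toolchain whose <sys/syscall.h> defines SYS_add_key and SYS_keyctl. It adds a "user" key to the process keyring and reads the payload back. The two numeric constants are copied from <linux/keyctl.h> under hypothetical DEMO_ names so that no kernel headers are needed. Container seccomp profiles often block these syscalls; in that case the function returns -1 with errno set.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from <linux/keyctl.h>, repeated so the sketch builds without
 * kernel headers or libkeyutils. */
#define DEMO_KEY_SPEC_PROCESS_KEYRING (-2)
#define DEMO_KEYCTL_READ 11

/* Add a "user" key to the process keyring, then read its payload back
 * into buf (NUL-terminated).  Returns the key serial on success, or -1
 * with errno set. */
static long demo_add_and_read(const char *desc, const char *payload,
			      char *buf, size_t buflen)
{
	long id, n;

	assert(buflen > 0);

	id = syscall(SYS_add_key, "user", desc, payload, strlen(payload),
		     (long)DEMO_KEY_SPEC_PROCESS_KEYRING);
	if (id < 0)
		return -1;

	/* Leave room for a terminating NUL after the payload. */
	n = syscall(SYS_keyctl, (long)DEMO_KEYCTL_READ, id, buf, buflen - 1);
	if (n < 0)
		return -1;

	/* KEYCTL_READ reports the full payload size, which may exceed buf. */
	buf[(size_t)n < buflen - 1 ? (size_t)n : buflen - 1] = '\0';
	return id;
}
```

The process keyring belongs to the calling process, so a key linked only there is released once the process exits.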
454 | The keyctl syscall functions are: | 449 | The keyctl syscall functions are: |
455 | 450 | ||
456 | (*) Map a special key ID to a real key ID for this process: | 451 | * Map a special key ID to a real key ID for this process:: |
457 | 452 | ||
458 | key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, | 453 | key_serial_t keyctl(KEYCTL_GET_KEYRING_ID, key_serial_t id, |
459 | int create); | 454 | int create); |
@@ -466,7 +461,7 @@ The keyctl syscall functions are: | |||
466 | non-zero; and the error ENOKEY will be returned if "create" is zero. | 461 | non-zero; and the error ENOKEY will be returned if "create" is zero. |
467 | 462 | ||
468 | 463 | ||
469 | (*) Replace the session keyring this process subscribes to with a new one: | 464 | * Replace the session keyring this process subscribes to with a new one:: |
470 | 465 | ||
471 | key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); | 466 | key_serial_t keyctl(KEYCTL_JOIN_SESSION_KEYRING, const char *name); |
472 | 467 | ||
@@ -484,7 +479,7 @@ The keyctl syscall functions are: | |||
484 | The ID of the new session keyring is returned if successful. | 479 | The ID of the new session keyring is returned if successful. |
485 | 480 | ||
486 | 481 | ||
487 | (*) Update the specified key: | 482 | * Update the specified key:: |
488 | 483 | ||
489 | long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, | 484 | long keyctl(KEYCTL_UPDATE, key_serial_t key, const void *payload, |
490 | size_t plen); | 485 | size_t plen); |
@@ -498,7 +493,7 @@ The keyctl syscall functions are: | |||
498 | add_key(). | 493 | add_key(). |
499 | 494 | ||
500 | 495 | ||
501 | (*) Revoke a key: | 496 | * Revoke a key:: |
502 | 497 | ||
503 | long keyctl(KEYCTL_REVOKE, key_serial_t key); | 498 | long keyctl(KEYCTL_REVOKE, key_serial_t key); |
504 | 499 | ||
@@ -507,7 +502,7 @@ The keyctl syscall functions are: | |||
507 | be findable. | 502 | be findable. |
508 | 503 | ||
509 | 504 | ||
510 | (*) Change the ownership of a key: | 505 | * Change the ownership of a key:: |
511 | 506 | ||
512 | long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); | 507 | long keyctl(KEYCTL_CHOWN, key_serial_t key, uid_t uid, gid_t gid); |
513 | 508 | ||
@@ -520,7 +515,7 @@ The keyctl syscall functions are: | |||
520 | its group list members. | 515 | its group list members. |
521 | 516 | ||
522 | 517 | ||
523 | (*) Change the permissions mask on a key: | 518 | * Change the permissions mask on a key:: |
524 | 519 | ||
525 | long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); | 520 | long keyctl(KEYCTL_SETPERM, key_serial_t key, key_perm_t perm); |
526 | 521 | ||
@@ -531,7 +526,7 @@ The keyctl syscall functions are: | |||
531 | error EINVAL will be returned. | 526 | error EINVAL will be returned. |
532 | 527 | ||
533 | 528 | ||
534 | (*) Describe a key: | 529 | * Describe a key:: |
535 | 530 | ||
536 | long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, | 531 | long keyctl(KEYCTL_DESCRIBE, key_serial_t key, char *buffer, |
537 | size_t buflen); | 532 | size_t buflen); |
@@ -547,7 +542,7 @@ The keyctl syscall functions are: | |||
547 | A process must have view permission on the key for this function to be | 542 | A process must have view permission on the key for this function to be |
548 | successful. | 543 | successful. |
549 | 544 | ||
550 | If successful, a string is placed in the buffer in the following format: | 545 | If successful, a string is placed in the buffer in the following format:: |
551 | 546 | ||
552 | <type>;<uid>;<gid>;<perm>;<description> | 547 | <type>;<uid>;<gid>;<perm>;<description> |
553 | 548 | ||
@@ -555,12 +550,12 @@ The keyctl syscall functions are: | |||
555 | is hexadecimal. A NUL character is included at the end of the string if | 550 | is hexadecimal. A NUL character is included at the end of the string if |
556 | the buffer is sufficiently big. | 551 | the buffer is sufficiently big. |
557 | 552 | ||
558 | This can be parsed with | 553 | This can be parsed with:: |
559 | 554 | ||
560 | sscanf(buffer, "%[^;];%d;%d;%o;%s", type, &uid, &gid, &mode, desc); | 555 | sscanf(buffer, "%[^;];%d;%d;%o;%s", type, &uid, &gid, &mode, desc); |
561 | 556 | ||
562 | 557 | ||
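The describe bullet states that the permissions field is hexadecimal, and the kernel formats it with %08x. The quoted sscanf() call, however, reads it with %o (octal), and its trailing %s stops at the first blank in the description. Below is a stricter parser, assuming only the five-field <type>;<uid>;<gid>;<perm>;<description> layout shown above. It splits on the first four semicolons, reads <perm> in base 16 and keeps the rest of the buffer as the description. struct key_description and parse_key_description() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fields of a KEYCTL_DESCRIBE result. */
struct key_description {
	char type[32];
	long uid, gid;          /* decimal in the buffer */
	unsigned long perm;     /* hexadecimal in the buffer */
	char desc[256];
};

/* Returns 0 on success, -1 on a malformed or oversized field. */
static int parse_key_description(const char *buf, struct key_description *kd)
{
	const char *f[5];   /* start of each of the five fields */
	char *end;
	size_t tlen;

	f[0] = buf;
	for (int i = 1; i < 5; i++) {
		const char *semi = strchr(f[i - 1], ';');
		if (!semi)
			return -1;
		f[i] = semi + 1;
	}

	/* <type> runs up to the first ';'. */
	tlen = (size_t)(f[1] - f[0]) - 1;
	if (tlen == 0 || tlen >= sizeof(kd->type))
		return -1;
	memcpy(kd->type, f[0], tlen);
	kd->type[tlen] = '\0';

	/* Each number must be non-empty and end exactly at its ';'. */
	kd->uid = strtol(f[1], &end, 10);
	if (end == f[1] || end != f[2] - 1)
		return -1;
	kd->gid = strtol(f[2], &end, 10);
	if (end == f[2] || end != f[3] - 1)
		return -1;
	kd->perm = strtoul(f[3], &end, 16);
	if (end == f[3] || end != f[4] - 1)
		return -1;

	/* <description> is everything after the fourth ';'. */
	if (strlen(f[4]) >= sizeof(kd->desc))
		return -1;
	strcpy(kd->desc, f[4]);
	return 0;
}
```

Given "user;0;0;3f010000;metal:silver", it yields type "user", uid and gid 0, perm 0x3f010000 and the description unchanged.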
563 | (*) Clear out a keyring: | 558 | * Clear out a keyring:: |
564 | 559 | ||
565 | long keyctl(KEYCTL_CLEAR, key_serial_t keyring); | 560 | long keyctl(KEYCTL_CLEAR, key_serial_t keyring); |
566 | 561 | ||
@@ -573,7 +568,7 @@ The keyctl syscall functions are: | |||
573 | DNS resolver cache keyring is an example of this. | 568 | DNS resolver cache keyring is an example of this. |
574 | 569 | ||
575 | 570 | ||
576 | (*) Link a key into a keyring: | 571 | * Link a key into a keyring:: |
577 | 572 | ||
578 | long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); | 573 | long keyctl(KEYCTL_LINK, key_serial_t keyring, key_serial_t key); |
579 | 574 | ||
@@ -592,7 +587,7 @@ The keyctl syscall functions are: | |||
592 | added. | 587 | added. |
593 | 588 | ||
594 | 589 | ||
595 | (*) Unlink a key or keyring from another keyring: | 590 | * Unlink a key or keyring from another keyring:: |
596 | 591 | ||
597 | long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); | 592 | long keyctl(KEYCTL_UNLINK, key_serial_t keyring, key_serial_t key); |
598 | 593 | ||
@@ -604,7 +599,7 @@ The keyctl syscall functions are: | |||
604 | is not present, error ENOENT will be the result. | 599 | is not present, error ENOENT will be the result. |
605 | 600 | ||
606 | 601 | ||
607 | (*) Search a keyring tree for a key: | 602 | * Search a keyring tree for a key:: |
608 | 603 | ||
609 | key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, | 604 | key_serial_t keyctl(KEYCTL_SEARCH, key_serial_t keyring, |
610 | const char *type, const char *description, | 605 | const char *type, const char *description, |
@@ -628,7 +623,7 @@ The keyctl syscall functions are: | |||
628 | fails. On success, the resulting key ID will be returned. | 623 | fails. On success, the resulting key ID will be returned. |
629 | 624 | ||
630 | 625 | ||
631 | (*) Read the payload data from a key: | 626 | * Read the payload data from a key:: |
632 | 627 | ||
633 | long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, | 628 | long keyctl(KEYCTL_READ, key_serial_t keyring, char *buffer, |
634 | size_t buflen); | 629 | size_t buflen); |
@@ -650,7 +645,7 @@ The keyctl syscall functions are: | |||
650 | available rather than the amount copied. | 645 | available rather than the amount copied. |
651 | 646 | ||
652 | 647 | ||
653 | (*) Instantiate a partially constructed key. | 648 | * Instantiate a partially constructed key:: |
654 | 649 | ||
655 | long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, | 650 | long keyctl(KEYCTL_INSTANTIATE, key_serial_t key, |
656 | const void *payload, size_t plen, | 651 | const void *payload, size_t plen, |
@@ -677,7 +672,7 @@ The keyctl syscall functions are: | |||
677 | array instead of a single buffer. | 672 | array instead of a single buffer. |
678 | 673 | ||
679 | 674 | ||
680 | (*) Negatively instantiate a partially constructed key. | 675 | * Negatively instantiate a partially constructed key:: |
681 | 676 | ||
682 | long keyctl(KEYCTL_NEGATE, key_serial_t key, | 677 | long keyctl(KEYCTL_NEGATE, key_serial_t key, |
683 | unsigned timeout, key_serial_t keyring); | 678 | unsigned timeout, key_serial_t keyring); |
@@ -700,12 +695,12 @@ The keyctl syscall functions are: | |||
700 | as rejecting the key with ENOKEY as the error code. | 695 | as rejecting the key with ENOKEY as the error code. |
701 | 696 | ||
702 | 697 | ||
703 | (*) Set the default request-key destination keyring. | 698 | * Set the default request-key destination keyring:: |
704 | 699 | ||
705 | long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); | 700 | long keyctl(KEYCTL_SET_REQKEY_KEYRING, int reqkey_defl); |
706 | 701 | ||
707 | This sets the default keyring to which implicitly requested keys will be | 702 | This sets the default keyring to which implicitly requested keys will be |
708 | attached for this thread. reqkey_defl should be one of these constants: | 703 | attached for this thread. reqkey_defl should be one of these constants:: |
709 | 704 | ||
710 | CONSTANT VALUE NEW DEFAULT KEYRING | 705 | CONSTANT VALUE NEW DEFAULT KEYRING |
711 | ====================================== ====== ======================= | 706 | ====================================== ====== ======================= |
@@ -731,7 +726,7 @@ The keyctl syscall functions are: | |||
731 | there is one, otherwise the user default session keyring. | 726 | there is one, otherwise the user default session keyring. |
732 | 727 | ||
733 | 728 | ||
734 | (*) Set the timeout on a key. | 729 | * Set the timeout on a key:: |
735 | 730 | ||
736 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); | 731 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); |
737 | 732 | ||
@@ -744,7 +739,7 @@ The keyctl syscall functions are: | |||
744 | or expired keys. | 739 | or expired keys. |
745 | 740 | ||
746 | 741 | ||
747 | (*) Assume the authority granted to instantiate a key | 742 | * Assume the authority granted to instantiate a key:: |
748 | 743 | ||
749 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); | 744 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); |
750 | 745 | ||
@@ -766,7 +761,7 @@ The keyctl syscall functions are: | |||
766 | The assumed authoritative key is inherited across fork and exec. | 761 | The assumed authoritative key is inherited across fork and exec. |
767 | 762 | ||
768 | 763 | ||
769 | (*) Get the LSM security context attached to a key. | 764 | * Get the LSM security context attached to a key:: |
770 | 765 | ||
771 | long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, | 766 | long keyctl(KEYCTL_GET_SECURITY, key_serial_t key, char *buffer, |
772 | size_t buflen) | 767 | size_t buflen) |
@@ -787,7 +782,7 @@ The keyctl syscall functions are: | |||
787 | successful. | 782 | successful. |
788 | 783 | ||
789 | 784 | ||
790 | (*) Install the calling process's session keyring on its parent. | 785 | * Install the calling process's session keyring on its parent:: |
791 | 786 | ||
792 | long keyctl(KEYCTL_SESSION_TO_PARENT); | 787 | long keyctl(KEYCTL_SESSION_TO_PARENT); |
793 | 788 | ||
@@ -807,7 +802,7 @@ The keyctl syscall functions are: | |||
807 | kernel and resumes executing userspace. | 802 | kernel and resumes executing userspace. |
808 | 803 | ||
809 | 804 | ||
810 | (*) Invalidate a key. | 805 | * Invalidate a key:: |
811 | 806 | ||
812 | long keyctl(KEYCTL_INVALIDATE, key_serial_t key); | 807 | long keyctl(KEYCTL_INVALIDATE, key_serial_t key); |
813 | 808 | ||
@@ -823,20 +818,19 @@ The keyctl syscall functions are: | |||
823 | A process must have search permission on the key for this function to be | 818 | A process must have search permission on the key for this function to be |
824 | successful. | 819 | successful. |
825 | 820 | ||
826 | (*) Compute a Diffie-Hellman shared secret or public key | 821 | * Compute a Diffie-Hellman shared secret or public key:: |
827 | 822 | ||
828 | long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, | 823 | long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, |
829 | char *buffer, size_t buflen, | 824 | char *buffer, size_t buflen, struct keyctl_kdf_params *kdf); |
830 | struct keyctl_kdf_params *kdf); | ||
831 | 825 | ||
832 | The params struct contains serial numbers for three keys: | 826 | The params struct contains serial numbers for three keys:: |
833 | 827 | ||
834 | - The prime, p, known to both parties | 828 | - The prime, p, known to both parties |
835 | - The local private key | 829 | - The local private key |
836 | - The base integer, which is either a shared generator or the | 830 | - The base integer, which is either a shared generator or the |
837 | remote public key | 831 | remote public key |
838 | 832 | ||
839 | The value computed is: | 833 | The value computed is:: |
840 | 834 | ||
841 | result = base ^ private (mod prime) | 835 | result = base ^ private (mod prime) |
842 | 836 | ||
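The formula "result = base ^ private (mod prime)" can be made concrete with a toy square-and-multiply routine, using the textbook parameters prime 23 and generator 5. This sketch only illustrates the arithmetic. The kernel operates on multi-precision values held in keys, whereas this code assumes a modulus below 2^32 so that every product of two residues fits in 64 bits. The modexp() name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy modular exponentiation: base^exp mod m by square-and-multiply.
 * Restricting m to 32 bits keeps (m - 1)^2 within uint64_t.
 */
static uint64_t modexp(uint64_t base, uint64_t exp, uint64_t m)
{
	uint64_t result = 1 % m;	/* handles m == 1 */

	assert(m > 0 && m <= UINT32_MAX);
	base %= m;
	while (exp > 0) {
		if (exp & 1)		/* multiply in the current bit */
			result = (result * base) % m;
		base = (base * base) % m;	/* square for the next bit */
		exp >>= 1;
	}
	return result;
}
```

With private values 4 and 3, the public values are modexp(5, 4, 23) = 4 and modexp(5, 3, 23) = 10. Each side raises the other's public value to its own private exponent, and both arrive at the same shared secret, 18.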
@@ -858,12 +852,12 @@ The keyctl syscall functions are: | |||
858 | of the KDF is returned to the caller. The KDF is characterized with | 852 | of the KDF is returned to the caller. The KDF is characterized with |
859 | struct keyctl_kdf_params as follows: | 853 | struct keyctl_kdf_params as follows: |
860 | 854 | ||
861 | - char *hashname specifies the NUL terminated string identifying | 855 | - ``char *hashname`` specifies the NUL terminated string identifying |
862 | the hash used from the kernel crypto API and applied for the KDF | 856 | the hash used from the kernel crypto API and applied for the KDF |
863 | operation. The KDF implemenation complies with SP800-56A as well | 857 | operation. The KDF implemenation complies with SP800-56A as well |
864 | as with SP800-108 (the counter KDF). | 858 | as with SP800-108 (the counter KDF). |
865 | 859 | ||
866 | - char *otherinfo specifies the OtherInfo data as documented in | 860 | - ``char *otherinfo`` specifies the OtherInfo data as documented in |
867 | SP800-56A section 5.8.1.2. The length of the buffer is given with | 861 | SP800-56A section 5.8.1.2. The length of the buffer is given with |
868 | otherinfolen. The format of OtherInfo is defined by the caller. | 862 | otherinfolen. The format of OtherInfo is defined by the caller. |
869 | The otherinfo pointer may be NULL if no OtherInfo shall be used. | 863 | The otherinfo pointer may be NULL if no OtherInfo shall be used. |
@@ -875,10 +869,10 @@ The keyctl syscall functions are: | |||
875 | and either the buffer length or the OtherInfo length exceeds the | 869 | and either the buffer length or the OtherInfo length exceeds the |
876 | allowed length. | 870 | allowed length. |
877 | 871 | ||
878 | (*) Restrict keyring linkage | 872 | * Restrict keyring linkage:: |
879 | 873 | ||
880 | long keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring, | 874 | long keyctl(KEYCTL_RESTRICT_KEYRING, key_serial_t keyring, |
881 | const char *type, const char *restriction); | 875 | const char *type, const char *restriction); |
882 | 876 | ||
883 | An existing keyring can restrict linkage of additional keys by evaluating | 877 | An existing keyring can restrict linkage of additional keys by evaluating |
884 | the contents of the key according to a restriction scheme. | 878 | the contents of the key according to a restriction scheme. |
@@ -900,8 +894,7 @@ The keyctl syscall functions are: | |||
900 | To apply a keyring restriction the process must have Set Attribute | 894 | To apply a keyring restriction the process must have Set Attribute |
901 | permission and the keyring must not be previously restricted. | 895 | permission and the keyring must not be previously restricted. |
902 | 896 | ||
903 | =============== | 897 | Kernel Services |
904 | KERNEL SERVICES | ||
905 | =============== | 898 | =============== |
906 | 899 | ||
907 | The kernel services for key management are fairly simple to deal with. They can | 900 | The kernel services for key management are fairly simple to deal with. They can |
@@ -915,29 +908,29 @@ call, and the key released upon close. How to deal with conflicting keys due to | |||
915 | two different users opening the same file is left to the filesystem author to | 908 | two different users opening the same file is left to the filesystem author to |
916 | solve. | 909 | solve. |
917 | 910 | ||
918 | To access the key manager, the following header must be #included: | 911 | To access the key manager, the following header must be #included:: |
919 | 912 | ||
920 | <linux/key.h> | 913 | <linux/key.h> |
921 | 914 | ||
922 | Specific key types should have a header file under include/keys/ that should be | 915 | Specific key types should have a header file under include/keys/ that should be |
923 | used to access that type. For keys of type "user", for example, that would be: | 916 | used to access that type. For keys of type "user", for example, that would be:: |
924 | 917 | ||
925 | <keys/user-type.h> | 918 | <keys/user-type.h> |
926 | 919 | ||
927 | Note that there are two different types of pointers to keys that may be | 920 | Note that there are two different types of pointers to keys that may be |
928 | encountered: | 921 | encountered: |
929 | 922 | ||
930 | (*) struct key * | 923 | * struct key * |
931 | 924 | ||
932 | This simply points to the key structure itself. Key structures will be at | 925 | This simply points to the key structure itself. Key structures will be at |
933 | least four-byte aligned. | 926 | least four-byte aligned. |
934 | 927 | ||
935 | (*) key_ref_t | 928 | * key_ref_t |
936 | 929 | ||
937 | This is equivalent to a struct key *, but the least significant bit is set | 930 | This is equivalent to a ``struct key *``, but the least significant bit is set |
938 | if the caller "possesses" the key. By "possession" it is meant that the | 931 | if the caller "possesses" the key. By "possession" it is meant that the |
939 | calling processes has a searchable link to the key from one of its | 932 | calling processes has a searchable link to the key from one of its |
940 | keyrings. There are three functions for dealing with these: | 933 | keyrings. There are three functions for dealing with these:: |
941 | 934 | ||
942 | key_ref_t make_key_ref(const struct key *key, bool possession); | 935 | key_ref_t make_key_ref(const struct key *key, bool possession); |
943 | 936 | ||
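The key_ref_t trick is pointer tagging. Because a key structure is aligned, bit 0 of its address is always clear and can carry the "possessed" flag. In <linux/key.h>, make_key_ref() packs the flag in, and key_ref_to_ptr() and is_key_possessed() take the value apart again. The following is a userspace model of that idea, not the kernel code. The model_* names and struct model_key are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct key. */
struct model_key {
	int serial;
};

/* Deliberately incomplete type, so a tagged ref can't be dereferenced. */
typedef struct model_key_ref *model_key_ref_t;

/* Only bit 0 is borrowed, so an alignment of at least 2 is enough. */
_Static_assert(_Alignof(struct model_key) >= 2, "bit 0 must be free");

static model_key_ref_t model_make_key_ref(const struct model_key *key,
					  bool possession)
{
	return (model_key_ref_t)((uintptr_t)key | (possession ? 1u : 0u));
}

static struct model_key *model_key_ref_to_ptr(model_key_ref_t ref)
{
	return (struct model_key *)((uintptr_t)ref & ~(uintptr_t)1);
}

static bool model_is_key_possessed(model_key_ref_t ref)
{
	return (uintptr_t)ref & 1;
}
```

The incomplete typedef is the design choice that matters. Code that forgets to strip the tag gets a compile error instead of silently dereferencing a misaligned pointer.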
@@ -955,7 +948,7 @@ When accessing a key's payload contents, certain precautions must be taken to | |||
955 | prevent access vs modification races. See the section "Notes on accessing | 948 | prevent access vs modification races. See the section "Notes on accessing |
956 | payload contents" for more information. | 949 | payload contents" for more information. |
957 | 950 | ||
958 | (*) To search for a key, call: | 951 | * To search for a key, call:: |
959 | 952 | ||
960 | struct key *request_key(const struct key_type *type, | 953 | struct key *request_key(const struct key_type *type, |
961 | const char *description, | 954 | const char *description, |
@@ -977,7 +970,7 @@ payload contents" for more information. | |||
977 | See also Documentation/security/keys-request-key.txt. | 970 | See also Documentation/security/keys-request-key.txt. |
978 | 971 | ||
979 | 972 | ||
980 | (*) To search for a key, passing auxiliary data to the upcaller, call: | 973 | * To search for a key, passing auxiliary data to the upcaller, call:: |
981 | 974 | ||
982 | struct key *request_key_with_auxdata(const struct key_type *type, | 975 | struct key *request_key_with_auxdata(const struct key_type *type, |
983 | const char *description, | 976 | const char *description, |
@@ -990,14 +983,14 @@ payload contents" for more information. | |||
990 | is a blob of length callout_len, if given (the length may be 0). | 983 | is a blob of length callout_len, if given (the length may be 0). |
991 | 984 | ||
992 | 985 | ||
993 | (*) A key can be requested asynchronously by calling one of: | 986 | * A key can be requested asynchronously by calling one of:: |
994 | 987 | ||
995 | struct key *request_key_async(const struct key_type *type, | 988 | struct key *request_key_async(const struct key_type *type, |
996 | const char *description, | 989 | const char *description, |
997 | const void *callout_info, | 990 | const void *callout_info, |
998 | size_t callout_len); | 991 | size_t callout_len); |
999 | 992 | ||
1000 | or: | 993 | or:: |
1001 | 994 | ||
1002 | struct key *request_key_async_with_auxdata(const struct key_type *type, | 995 | struct key *request_key_async_with_auxdata(const struct key_type *type, |
1003 | const char *description, | 996 | const char *description, |
@@ -1010,7 +1003,7 @@ payload contents" for more information. | |||
1010 | 1003 | ||
1011 | These two functions return with the key potentially still under | 1004 | These two functions return with the key potentially still under |
1012 | construction. To wait for construction completion, the following should be | 1005 | construction. To wait for construction completion, the following should be |
1013 | called: | 1006 | called:: |
1014 | 1007 | ||
1015 | int wait_for_key_construction(struct key *key, bool intr); | 1008 | int wait_for_key_construction(struct key *key, bool intr); |
1016 | 1009 | ||
@@ -1022,11 +1015,11 @@ payload contents" for more information. | |||
1022 | case error ERESTARTSYS will be returned. | 1015 | case error ERESTARTSYS will be returned. |
1023 | 1016 | ||
1024 | 1017 | ||
1025 | (*) When it is no longer required, the key should be released using: | 1018 | * When it is no longer required, the key should be released using:: |
1026 | 1019 | ||
1027 | void key_put(struct key *key); | 1020 | void key_put(struct key *key); |
1028 | 1021 | ||
1029 | Or: | 1022 | Or:: |
1030 | 1023 | ||
1031 | void key_ref_put(key_ref_t key_ref); | 1024 | void key_ref_put(key_ref_t key_ref); |
1032 | 1025 | ||
@@ -1034,8 +1027,8 @@ payload contents" for more information. | |||
1034 | the argument will not be parsed. | 1027 | the argument will not be parsed. |
1035 | 1028 | ||
1036 | 1029 | ||
1037 | (*) Extra references can be made to a key by calling one of the following | 1030 | * Extra references can be made to a key by calling one of the following |
1038 | functions: | 1031 | functions:: |
1039 | 1032 | ||
1040 | struct key *__key_get(struct key *key); | 1033 | struct key *__key_get(struct key *key); |
1041 | struct key *key_get(struct key *key); | 1034 | struct key *key_get(struct key *key); |
@@ -1047,7 +1040,7 @@ payload contents" for more information. | |||
1047 | then the key will not be dereferenced and no increment will take place. | 1040 | then the key will not be dereferenced and no increment will take place. |
1048 | 1041 | ||
1049 | 1042 | ||
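The text describes how key_get() and key_put() behave when given NULL: nothing is dereferenced and the count is left alone. That contract can be modelled in a few lines. The sketch below is a hypothetical userspace model, not the kernel implementation. The kernel uses an atomic refcount and passes keys whose count reaches zero to its garbage collector. Here struct model_key has a plain int counter and simply records when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct key: usage count plus a flag set
 * when the last reference goes away. */
struct model_key {
	int usage;
	bool released;
};

/* Like key_get(): NULL is passed straight through, otherwise +1. */
static struct model_key *model_key_get(struct model_key *key)
{
	if (key)
		key->usage++;
	return key;
}

/* Like key_put(): NULL is ignored, otherwise -1; at zero the real
 * kernel would hand the key to garbage collection. */
static void model_key_put(struct model_key *key)
{
	if (!key)
		return;
	assert(key->usage > 0);
	if (--key->usage == 0)
		key->released = true;
}
```

Because NULL is accepted, cleanup paths can call the put function unconditionally on a pointer that may never have been filled in.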
1050 | (*) A key's serial number can be obtained by calling: | 1043 | * A key's serial number can be obtained by calling:: |
1051 | 1044 | ||
1052 | key_serial_t key_serial(struct key *key); | 1045 | key_serial_t key_serial(struct key *key); |
1053 | 1046 | ||
@@ -1055,7 +1048,7 @@ payload contents" for more information. | |||
1055 | latter case without parsing the argument). | 1048 | latter case without parsing the argument). |
1056 | 1049 | ||
1057 | 1050 | ||
1058 | (*) If a keyring was found in the search, this can be further searched by: | 1051 | * If a keyring was found in the search, this can be further searched by:: |
1059 | 1052 | ||
1060 | key_ref_t keyring_search(key_ref_t keyring_ref, | 1053 | key_ref_t keyring_search(key_ref_t keyring_ref, |
1061 | const struct key_type *type, | 1054 | const struct key_type *type, |
@@ -1070,7 +1063,7 @@ payload contents" for more information. | |||
1070 | reference pointer if successful. | 1063 | reference pointer if successful. |
1071 | 1064 | ||
1072 | 1065 | ||
1073 | (*) A keyring can be created by: | 1066 | * A keyring can be created by:: |
1074 | 1067 | ||
1075 | struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid, | 1068 | struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid, |
1076 | const struct cred *cred, | 1069 | const struct cred *cred, |
@@ -1109,7 +1102,7 @@ payload contents" for more information. | |||
1109 | -EPERM to in this case. | 1102 | -EPERM to in this case. |
1110 | 1103 | ||
1111 | 1104 | ||
1112 | (*) To check the validity of a key, this function can be called: | 1105 | * To check the validity of a key, this function can be called:: |
1113 | 1106 | ||
1114 | int validate_key(struct key *key); | 1107 | int validate_key(struct key *key); |
1115 | 1108 | ||
@@ -1119,7 +1112,7 @@ payload contents" for more information. | |||
1119 | returned (in the latter case without parsing the argument). | 1112 | returned (in the latter case without parsing the argument). |
1120 | 1113 | ||
1121 | 1114 | ||
1122 | (*) To register a key type, the following function should be called: | 1115 | * To register a key type, the following function should be called:: |
1123 | 1116 | ||
1124 | int register_key_type(struct key_type *type); | 1117 | int register_key_type(struct key_type *type); |
1125 | 1118 | ||
@@ -1127,13 +1120,13 @@ payload contents" for more information. | |||
1127 | present. | 1120 | present. |
1128 | 1121 | ||
1129 | 1122 | ||
1130 | (*) To unregister a key type, call: | 1123 | * To unregister a key type, call:: |
1131 | 1124 | ||
1132 | void unregister_key_type(struct key_type *type); | 1125 | void unregister_key_type(struct key_type *type); |
1133 | 1126 | ||
1134 | 1127 | ||
1135 | Under some circumstances, it may be desirable to deal with a bundle of keys. | 1128 | Under some circumstances, it may be desirable to deal with a bundle of keys. |
1136 | The facility provides access to the keyring type for managing such a bundle: | 1129 | The facility provides access to the keyring type for managing such a bundle:: |
1137 | 1130 | ||
1138 | struct key_type key_type_keyring; | 1131 | struct key_type key_type_keyring; |
1139 | 1132 | ||
@@ -1143,8 +1136,7 @@ with keyring_search(). Note that it is not possible to use request_key() to | |||
1143 | search a specific keyring, so using keyrings in this way is of limited utility. | 1136 | search a specific keyring, so using keyrings in this way is of limited utility. |
1144 | 1137 | ||
1145 | 1138 | ||
1146 | =================================== | 1139 | Notes On Accessing Payload Contents |
1147 | NOTES ON ACCESSING PAYLOAD CONTENTS | ||
1148 | =================================== | 1140 | =================================== |
1149 | 1141 | ||
1150 | The simplest payload is just data stored in key->payload directly. In this | 1142 | The simplest payload is just data stored in key->payload directly. In this |
@@ -1154,31 +1146,31 @@ More complex payload contents must be allocated and pointers to them set in the | |||
1154 | key->payload.data[] array. One of the following ways must be selected to | 1146 | key->payload.data[] array. One of the following ways must be selected to |
1155 | access the data: | 1147 | access the data: |
1156 | 1148 | ||
1157 | (1) Unmodifiable key type. | 1149 | 1) Unmodifiable key type. |
1158 | 1150 | ||
1159 | If the key type does not have a modify method, then the key's payload can | 1151 | If the key type does not have a modify method, then the key's payload can |
1160 | be accessed without any form of locking, provided that it's known to be | 1152 | be accessed without any form of locking, provided that it's known to be |
1161 | instantiated (uninstantiated keys cannot be "found"). | 1153 | instantiated (uninstantiated keys cannot be "found"). |
1162 | 1154 | ||
1163 | (2) The key's semaphore. | 1155 | 2) The key's semaphore. |
1164 | 1156 | ||
1165 | The semaphore could be used to govern access to the payload and to control | 1157 | The semaphore could be used to govern access to the payload and to control |
1166 | the payload pointer. It must be write-locked for modifications and would | 1158 | the payload pointer. It must be write-locked for modifications and would |
1167 | have to be read-locked for general access. The disadvantage of doing this | 1159 | have to be read-locked for general access. The disadvantage of doing this |
1168 | is that the accessor may be required to sleep. | 1160 | is that the accessor may be required to sleep. |
1169 | 1161 | ||
1170 | (3) RCU. | 1162 | 3) RCU. |
1171 | 1163 | ||
1172 | RCU must be used when the semaphore isn't already held; if the semaphore | 1164 | RCU must be used when the semaphore isn't already held; if the semaphore |
1173 | is held then the contents can't change under you unexpectedly as the | 1165 | is held then the contents can't change under you unexpectedly as the |
1174 | semaphore must still be used to serialise modifications to the key. The | 1166 | semaphore must still be used to serialise modifications to the key. The |
1175 | key management code takes care of this for the key type. | 1167 | key management code takes care of this for the key type. |
1176 | 1168 | ||
1177 | However, this means using: | 1169 | However, this means using:: |
1178 | 1170 | ||
1179 | rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() | 1171 | rcu_read_lock() ... rcu_dereference() ... rcu_read_unlock() |
1180 | 1172 | ||
1181 | to read the pointer, and: | 1173 | to read the pointer, and:: |
1182 | 1174 | ||
1183 | rcu_dereference() ... rcu_assign_pointer() ... call_rcu() | 1175 | rcu_dereference() ... rcu_assign_pointer() ... call_rcu() |
1184 | 1176 | ||
@@ -1194,11 +1186,11 @@ access the data: | |||
1194 | usage. This is called key->payload.rcu_data0. The following accessors | 1186 | usage. This is called key->payload.rcu_data0. The following accessors |
1195 | wrap the RCU calls to this element: | 1187 | wrap the RCU calls to this element: |
1196 | 1188 | ||
1197 | (a) Set or change the first payload pointer: | 1189 | a) Set or change the first payload pointer:: |
1198 | 1190 | ||
1199 | rcu_assign_keypointer(struct key *key, void *data); | 1191 | rcu_assign_keypointer(struct key *key, void *data); |
1200 | 1192 | ||
1201 | (b) Read the first payload pointer with the key semaphore held: | 1193 | b) Read the first payload pointer with the key semaphore held:: |
1202 | 1194 | ||
1203 | [const] void *dereference_key_locked([const] struct key *key); | 1195 | [const] void *dereference_key_locked([const] struct key *key); |
1204 | 1196 | ||
@@ -1206,39 +1198,38 @@ access the data: | |||
1206 | parameter. Static analysis will give an error if it things the lock | 1198 | parameter. Static analysis will give an error if it things the lock |
1207 | isn't held. | 1199 | isn't held. |
1208 | 1200 | ||
1209 | (c) Read the first payload pointer with the RCU read lock held: | 1201 | c) Read the first payload pointer with the RCU read lock held:: |
1210 | 1202 | ||
1211 | const void *dereference_key_rcu(const struct key *key); | 1203 | const void *dereference_key_rcu(const struct key *key); |
1212 | 1204 | ||
1213 | 1205 | ||
1214 | =================== | 1206 | Defining a Key Type |
1215 | DEFINING A KEY TYPE | ||
1216 | =================== | 1207 | =================== |
1217 | 1208 | ||
1218 | A kernel service may want to define its own key type. For instance, an AFS | 1209 | A kernel service may want to define its own key type. For instance, an AFS |
1219 | filesystem might want to define a Kerberos 5 ticket key type. To do this, it | 1210 | filesystem might want to define a Kerberos 5 ticket key type. To do this, it |
1220 | author fills in a key_type struct and registers it with the system. | 1211 | author fills in a key_type struct and registers it with the system. |
1221 | 1212 | ||
1222 | Source files that implement key types should include the following header file: | 1213 | Source files that implement key types should include the following header file:: |
1223 | 1214 | ||
1224 | <linux/key-type.h> | 1215 | <linux/key-type.h> |
1225 | 1216 | ||
1226 | The structure has a number of fields, some of which are mandatory: | 1217 | The structure has a number of fields, some of which are mandatory: |
1227 | 1218 | ||
1228 | (*) const char *name | 1219 | * ``const char *name`` |
1229 | 1220 | ||
1230 | The name of the key type. This is used to translate a key type name | 1221 | The name of the key type. This is used to translate a key type name |
1231 | supplied by userspace into a pointer to the structure. | 1222 | supplied by userspace into a pointer to the structure. |
1232 | 1223 | ||
1233 | 1224 | ||
1234 | (*) size_t def_datalen | 1225 | * ``size_t def_datalen`` |
1235 | 1226 | ||
1236 | This is optional - it supplies the default payload data length as | 1227 | This is optional - it supplies the default payload data length as |
1237 | contributed to the quota. If the key type's payload is always or almost | 1228 | contributed to the quota. If the key type's payload is always or almost |
1238 | always the same size, then this is a more efficient way to do things. | 1229 | always the same size, then this is a more efficient way to do things. |
1239 | 1230 | ||
1240 | The data length (and quota) on a particular key can always be changed | 1231 | The data length (and quota) on a particular key can always be changed |
1241 | during instantiation or update by calling: | 1232 | during instantiation or update by calling:: |
1242 | 1233 | ||
1243 | int key_payload_reserve(struct key *key, size_t datalen); | 1234 | int key_payload_reserve(struct key *key, size_t datalen); |
1244 | 1235 | ||
@@ -1246,18 +1237,18 @@ The structure has a number of fields, some of which are mandatory: | |||
1246 | viable. | 1237 | viable. |
1247 | 1238 | ||
1248 | 1239 | ||
1249 | (*) int (*vet_description)(const char *description); | 1240 | * ``int (*vet_description)(const char *description);`` |
1250 | 1241 | ||
1251 | This optional method is called to vet a key description. If the key type | 1242 | This optional method is called to vet a key description. If the key type |
1252 | doesn't approve of the key description, it may return an error, otherwise | 1243 | doesn't approve of the key description, it may return an error, otherwise |
1253 | it should return 0. | 1244 | it should return 0. |
1254 | 1245 | ||
1255 | 1246 | ||
1256 | (*) int (*preparse)(struct key_preparsed_payload *prep); | 1247 | * ``int (*preparse)(struct key_preparsed_payload *prep);`` |
1257 | 1248 | ||
1258 | This optional method permits the key type to attempt to parse payload | 1249 | This optional method permits the key type to attempt to parse payload |
1259 | before a key is created (add key) or the key semaphore is taken (update or | 1250 | before a key is created (add key) or the key semaphore is taken (update or |
1260 | instantiate key). The structure pointed to by prep looks like: | 1251 | instantiate key). The structure pointed to by prep looks like:: |
1261 | 1252 | ||
1262 | struct key_preparsed_payload { | 1253 | struct key_preparsed_payload { |
1263 | char *description; | 1254 | char *description; |
@@ -1285,7 +1276,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1285 | otherwise. | 1276 | otherwise. |
1286 | 1277 | ||
1287 | 1278 | ||
1288 | (*) void (*free_preparse)(struct key_preparsed_payload *prep); | 1279 | * ``void (*free_preparse)(struct key_preparsed_payload *prep);`` |
1289 | 1280 | ||
1290 | This method is only required if the preparse() method is provided, | 1281 | This method is only required if the preparse() method is provided, |
1291 | otherwise it is unused. It cleans up anything attached to the description | 1282 | otherwise it is unused. It cleans up anything attached to the description |
@@ -1294,7 +1285,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1294 | successfully, even if instantiate() or update() succeed. | 1285 | successfully, even if instantiate() or update() succeed. |
1295 | 1286 | ||
1296 | 1287 | ||
1297 | (*) int (*instantiate)(struct key *key, struct key_preparsed_payload *prep); | 1288 | * ``int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);`` |
1298 | 1289 | ||
1299 | This method is called to attach a payload to a key during construction. | 1290 | This method is called to attach a payload to a key during construction. |
1300 | The payload attached need not bear any relation to the data passed to this | 1291 | The payload attached need not bear any relation to the data passed to this |
@@ -1318,7 +1309,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1318 | free_preparse method doesn't release the data. | 1309 | free_preparse method doesn't release the data. |
1319 | 1310 | ||
1320 | 1311 | ||
1321 | (*) int (*update)(struct key *key, const void *data, size_t datalen); | 1312 | * ``int (*update)(struct key *key, const void *data, size_t datalen);`` |
1322 | 1313 | ||
1323 | If this type of key can be updated, then this method should be provided. | 1314 | If this type of key can be updated, then this method should be provided. |
1324 | It is called to update a key's payload from the blob of data provided. | 1315 | It is called to update a key's payload from the blob of data provided. |
@@ -1343,10 +1334,10 @@ The structure has a number of fields, some of which are mandatory: | |||
1343 | It is safe to sleep in this method. | 1334 | It is safe to sleep in this method. |
1344 | 1335 | ||
1345 | 1336 | ||
1346 | (*) int (*match_preparse)(struct key_match_data *match_data); | 1337 | * ``int (*match_preparse)(struct key_match_data *match_data);`` |
1347 | 1338 | ||
1348 | This method is optional. It is called when a key search is about to be | 1339 | This method is optional. It is called when a key search is about to be |
1349 | performed. It is given the following structure: | 1340 | performed. It is given the following structure:: |
1350 | 1341 | ||
1351 | struct key_match_data { | 1342 | struct key_match_data { |
1352 | bool (*cmp)(const struct key *key, | 1343 | bool (*cmp)(const struct key *key, |
@@ -1357,23 +1348,23 @@ The structure has a number of fields, some of which are mandatory: | |||
1357 | }; | 1348 | }; |
1358 | 1349 | ||
1359 | On entry, raw_data will be pointing to the criteria to be used in matching | 1350 | On entry, raw_data will be pointing to the criteria to be used in matching |
1360 | a key by the caller and should not be modified. (*cmp)() will be pointing | 1351 | a key by the caller and should not be modified. ``(*cmp)()`` will be pointing |
1361 | to the default matcher function (which does an exact description match | 1352 | to the default matcher function (which does an exact description match |
1362 | against raw_data) and lookup_type will be set to indicate a direct lookup. | 1353 | against raw_data) and lookup_type will be set to indicate a direct lookup. |
1363 | 1354 | ||
1364 | The following lookup_type values are available: | 1355 | The following lookup_type values are available: |
1365 | 1356 | ||
1366 | [*] KEYRING_SEARCH_LOOKUP_DIRECT - A direct lookup hashes the type and | 1357 | * KEYRING_SEARCH_LOOKUP_DIRECT - A direct lookup hashes the type and |
1367 | description to narrow down the search to a small number of keys. | 1358 | description to narrow down the search to a small number of keys. |
1368 | 1359 | ||
1369 | [*] KEYRING_SEARCH_LOOKUP_ITERATE - An iterative lookup walks all the | 1360 | * KEYRING_SEARCH_LOOKUP_ITERATE - An iterative lookup walks all the |
1370 | keys in the keyring until one is matched. This must be used for any | 1361 | keys in the keyring until one is matched. This must be used for any |
1371 | search that's not doing a simple direct match on the key description. | 1362 | search that's not doing a simple direct match on the key description. |
1372 | 1363 | ||
1373 | The method may set cmp to point to a function of its choice that does some | 1364 | The method may set cmp to point to a function of its choice that does some |
1374 | other form of match, may set lookup_type to KEYRING_SEARCH_LOOKUP_ITERATE | 1365 | other form of match, may set lookup_type to KEYRING_SEARCH_LOOKUP_ITERATE |
1375 | and may attach something to the preparsed pointer for use by (*cmp)(). | 1366 | and may attach something to the preparsed pointer for use by ``(*cmp)()``. |
1376 | (*cmp)() should return true if a key matches and false otherwise. | 1367 | ``(*cmp)()`` should return true if a key matches and false otherwise. |
1377 | 1368 | ||
1378 | If preparsed is set, it may be necessary to use the match_free() method to | 1369 | If preparsed is set, it may be necessary to use the match_free() method to |
1379 | clean it up. | 1370 | clean it up. |
@@ -1381,20 +1372,20 @@ The structure has a number of fields, some of which are mandatory: | |||
1381 | The method should return 0 if successful or a negative error code | 1372 | The method should return 0 if successful or a negative error code |
1382 | otherwise. | 1373 | otherwise. |
1383 | 1374 | ||
1384 | It is permitted to sleep in this method, but (*cmp)() may not sleep as | 1375 | It is permitted to sleep in this method, but ``(*cmp)()`` may not sleep as |
1385 | locks will be held over it. | 1376 | locks will be held over it. |
1386 | 1377 | ||
1387 | If match_preparse() is not provided, keys of this type will be matched | 1378 | If match_preparse() is not provided, keys of this type will be matched |
1388 | exactly by their description. | 1379 | exactly by their description. |
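The following is a minimal userspace model, not kernel code, showing how match_preparse() can swap in its own matcher.

- `struct key` and `struct match_data` are simplified stand-ins for the kernel's `struct key` and `struct key_match_data`.
- The "prefix:" criteria syntax, `sketch_match_preparse()` and `find_key()` are hypothetical.
- A prefix cannot be hashed against the description, so the sketch switches to an iterative lookup. The text above requires this for any search that is not a simple direct match.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct key {
	const char *description;
};

/* Mirrors KEYRING_SEARCH_LOOKUP_DIRECT / KEYRING_SEARCH_LOOKUP_ITERATE. */
enum lookup_type { LOOKUP_DIRECT, LOOKUP_ITERATE };

struct match_data {
	bool (*cmp)(const struct key *key, const struct match_data *md);
	const void *raw_data;   /* caller's criteria; never modified */
	const void *preparsed;  /* private to the key type */
	enum lookup_type lookup_type;
};

/* Default matcher: exact description match against raw_data. */
static bool cmp_exact(const struct key *key, const struct match_data *md)
{
	return strcmp(key->description, md->raw_data) == 0;
}

/* Alternative matcher using the prefix stashed in preparsed. Must not sleep. */
static bool cmp_prefix(const struct key *key, const struct match_data *md)
{
	const char *prefix = md->preparsed;

	return strncmp(key->description, prefix, strlen(prefix)) == 0;
}

/* Sketch of match_preparse(): "prefix:<text>" selects the prefix matcher and
 * forces an iterative lookup. preparsed points into raw_data, so there is
 * nothing for a match_free() to release. */
static int sketch_match_preparse(struct match_data *md)
{
	const char *criteria = md->raw_data;

	if (!criteria)
		return -EINVAL;
	if (strncmp(criteria, "prefix:", 7) == 0) {
		md->preparsed = criteria + 7;
		md->cmp = cmp_prefix;
		md->lookup_type = LOOKUP_ITERATE;
	}
	return 0;
}

/* Iterative walk over a flat array of keys, calling cmp on each one. */
static const struct key *find_key(const struct key *keys, size_t n,
				  const struct match_data *md)
{
	for (size_t i = 0; i < n; i++)
		if (md->cmp(&keys[i], md))
			return &keys[i];
	return NULL;
}
```

Plain criteria leave the default exact matcher and direct lookup in place. Only the prefixed form changes both.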
1389 | 1380 | ||
1390 | 1381 | ||
1391 | (*) void (*match_free)(struct key_match_data *match_data); | 1382 | * ``void (*match_free)(struct key_match_data *match_data);`` |
1392 | 1383 | ||
1393 | This method is optional. If given, it called to clean up | 1384 | This method is optional. If given, it called to clean up |
1394 | match_data->preparsed after a successful call to match_preparse(). | 1385 | match_data->preparsed after a successful call to match_preparse(). |
1395 | 1386 | ||
1396 | 1387 | ||
1397 | (*) void (*revoke)(struct key *key); | 1388 | * ``void (*revoke)(struct key *key);`` |
1398 | 1389 | ||
1399 | This method is optional. It is called to discard part of the payload | 1390 | This method is optional. It is called to discard part of the payload |
1400 | data upon a key being revoked. The caller will have the key semaphore | 1391 | data upon a key being revoked. The caller will have the key semaphore |
@@ -1404,7 +1395,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1404 | a deadlock against the key semaphore. | 1395 | a deadlock against the key semaphore. |
1405 | 1396 | ||
1406 | 1397 | ||
1407 | (*) void (*destroy)(struct key *key); | 1398 | * ``void (*destroy)(struct key *key);`` |
1408 | 1399 | ||
1409 | This method is optional. It is called to discard the payload data on a key | 1400 | This method is optional. It is called to discard the payload data on a key |
1410 | when it is being destroyed. | 1401 | when it is being destroyed. |
@@ -1416,7 +1407,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1416 | It is not safe to sleep in this method; the caller may hold spinlocks. | 1407 | It is not safe to sleep in this method; the caller may hold spinlocks. |
1417 | 1408 | ||
1418 | 1409 | ||
1419 | (*) void (*describe)(const struct key *key, struct seq_file *p); | 1410 | * ``void (*describe)(const struct key *key, struct seq_file *p);`` |
1420 | 1411 | ||
1421 | This method is optional. It is called during /proc/keys reading to | 1412 | This method is optional. It is called during /proc/keys reading to |
1422 | summarise a key's description and payload in text form. | 1413 | summarise a key's description and payload in text form. |
@@ -1432,7 +1423,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1432 | caller. | 1423 | caller. |
1433 | 1424 | ||
1434 | 1425 | ||
1435 | (*) long (*read)(const struct key *key, char __user *buffer, size_t buflen); | 1426 | * ``long (*read)(const struct key *key, char __user *buffer, size_t buflen);`` |
1436 | 1427 | ||
1437 | This method is optional. It is called by KEYCTL_READ to translate the | 1428 | This method is optional. It is called by KEYCTL_READ to translate the |
1438 | key's payload into something a blob of data for userspace to deal with. | 1429 | key's payload into something a blob of data for userspace to deal with. |
@@ -1448,8 +1439,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1448 | as might happen when the userspace buffer is accessed. | 1439 | as might happen when the userspace buffer is accessed. |
1449 | 1440 | ||
1450 | 1441 | ||
1451 | (*) int (*request_key)(struct key_construction *cons, const char *op, | 1442 | * ``int (*request_key)(struct key_construction *cons, const char *op, void *aux);`` |
1452 | void *aux); | ||
1453 | 1443 | ||
1454 | This method is optional. If provided, request_key() and friends will | 1444 | This method is optional. If provided, request_key() and friends will |
1455 | invoke this function rather than upcalling to /sbin/request-key to operate | 1445 | invoke this function rather than upcalling to /sbin/request-key to operate |
@@ -1463,7 +1453,7 @@ The structure has a number of fields, some of which are mandatory: | |||
1463 | This method is permitted to return before the upcall is complete, but the | 1453 | This method is permitted to return before the upcall is complete, but the |
1464 | following function must be called under all circumstances to complete the | 1454 | following function must be called under all circumstances to complete the |
1465 | instantiation process, whether or not it succeeds, whether or not there's | 1455 | instantiation process, whether or not it succeeds, whether or not there's |
1466 | an error: | 1456 | an error:: |
1467 | 1457 | ||
1468 | void complete_request_key(struct key_construction *cons, int error); | 1458 | void complete_request_key(struct key_construction *cons, int error); |
1469 | 1459 | ||
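The rule that the completion function must run on every path, success or failure, can be sketched as a single exit point. This is a simplified synchronous model.

- `struct key`, `complete_request_key_stub()` and `fetch_payload()` are hypothetical stand-ins, not the kernel's definitions.
- A real method may instead return early and complete the request later, once the upcall finishes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types. */
struct key {
	int serial;
	int instantiated;
	int error;
};

struct key_construction {
	struct key *key;      /* the key under construction */
	struct key *authkey;  /* the authorisation key (unused in this sketch) */
};

static int completions;  /* counts completion calls so the sketch can be checked */

/* Stand-in for complete_request_key(): records the outcome on the key. */
static void complete_request_key_stub(struct key_construction *cons, int error)
{
	cons->key->error = error;
	cons->key->instantiated = (error == 0);
	completions++;
}

/* Hypothetical data source; only the "create" operation is understood. */
static int fetch_payload(const char *op, const void *aux)
{
	(void)aux;
	return strcmp(op, "create") == 0 ? 0 : -EOPNOTSUPP;
}

/* Sketch of a request_key() method: every path funnels through one
 * completion call, whether or not it succeeds. */
static int sketch_request_key(struct key_construction *cons, const char *op,
			      void *aux)
{
	int ret;

	if (!aux) {
		ret = -EINVAL;
		goto out;
	}
	ret = fetch_payload(op, aux);
out:
	complete_request_key_stub(cons, ret);
	return ret;
}
```

The `goto out` pattern keeps the completion call in one place, so adding new error paths later cannot skip it.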
@@ -1479,16 +1469,16 @@ The structure has a number of fields, some of which are mandatory: | |||
1479 | The key under construction and the authorisation key can be found in the | 1469 | The key under construction and the authorisation key can be found in the |
1480 | key_construction struct pointed to by cons: | 1470 | key_construction struct pointed to by cons: |
1481 | 1471 | ||
1482 | (*) struct key *key; | 1472 | * ``struct key *key;`` |
1483 | 1473 | ||
1484 | The key under construction. | 1474 | The key under construction. |
1485 | 1475 | ||
1486 | (*) struct key *authkey; | 1476 | * ``struct key *authkey;`` |
1487 | 1477 | ||
1488 | The authorisation key. | 1478 | The authorisation key. |
1489 | 1479 | ||
1490 | 1480 | ||
1491 | (*) struct key_restriction *(*lookup_restriction)(const char *params); | 1481 | * ``struct key_restriction *(*lookup_restriction)(const char *params);`` |
1492 | 1482 | ||
1493 | This optional method is used to enable userspace configuration of keyring | 1483 | This optional method is used to enable userspace configuration of keyring |
1494 | restrictions. The restriction parameter string (not including the key type | 1484 | restrictions. The restriction parameter string (not including the key type |
@@ -1497,12 +1487,11 @@ The structure has a number of fields, some of which are mandatory: | |||
1497 | attempted key link operation. If there is no match, -EINVAL is returned. | 1487 | attempted key link operation. If there is no match, -EINVAL is returned. |
1498 | 1488 | ||
1499 | 1489 | ||
1500 | ============================ | 1490 | Request-Key Callback Service |
1501 | REQUEST-KEY CALLBACK SERVICE | ||
1502 | ============================ | 1491 | ============================ |
1503 | 1492 | ||
1504 | To create a new key, the kernel will attempt to execute the following command | 1493 | To create a new key, the kernel will attempt to execute the following command |
1505 | line: | 1494 | line:: |
1506 | 1495 | ||
1507 | /sbin/request-key create <key> <uid> <gid> \ | 1496 | /sbin/request-key create <key> <uid> <gid> \ |
1508 | <threadring> <processring> <sessionring> <callout_info> | 1497 | <threadring> <processring> <sessionring> <callout_info> |
@@ -1511,10 +1500,10 @@ line: | |||
1511 | keyrings from the process that caused the search to be issued. These are | 1500 | keyrings from the process that caused the search to be issued. These are |
1512 | included for two reasons: | 1501 | included for two reasons: |
1513 | 1502 | ||
1514 | (1) There may be an authentication token in one of the keyrings that is | 1503 | 1 There may be an authentication token in one of the keyrings that is |
1515 | required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. | 1504 | required to obtain the key, eg: a Kerberos Ticket-Granting Ticket. |
1516 | 1505 | ||
1517 | (2) The new key should probably be cached in one of these rings. | 1506 | 2 The new key should probably be cached in one of these rings. |
1518 | 1507 | ||
1519 | This program should set it UID and GID to those specified before attempting to | 1508 | This program should set it UID and GID to those specified before attempting to |
1520 | access any more keys. It may then look around for a user specific process to | 1509 | access any more keys. It may then look around for a user specific process to |
@@ -1539,7 +1528,7 @@ instead. | |||
1539 | 1528 | ||
1540 | 1529 | ||
1541 | Similarly, the kernel may attempt to update an expired or a soon to expire key | 1530 | Similarly, the kernel may attempt to update an expired or a soon to expire key |
1542 | by executing: | 1531 | by executing:: |
1543 | 1532 | ||
1544 | /sbin/request-key update <key> <uid> <gid> \ | 1533 | /sbin/request-key update <key> <uid> <gid> \ |
1545 | <threadring> <processring> <sessionring> | 1534 | <threadring> <processring> <sessionring> |
@@ -1548,8 +1537,7 @@ In this case, the program isn't required to actually attach the key to a ring; | |||
1548 | the rings are provided for reference. | 1537 | the rings are provided for reference. |
1549 | 1538 | ||
1550 | 1539 | ||
1551 | ================== | 1540 | Garbage Collection |
1552 | GARBAGE COLLECTION | ||
1553 | ================== | 1541 | ================== |
1554 | 1542 | ||
1555 | Dead keys (for which the type has been removed) will be automatically unlinked | 1543 | Dead keys (for which the type has been removed) will be automatically unlinked |
@@ -1557,6 +1545,6 @@ from those keyrings that point to them and deleted as soon as possible by a | |||
1557 | background garbage collector. | 1545 | background garbage collector. |
1558 | 1546 | ||
1559 | Similarly, revoked and expired keys will be garbage collected, but only after a | 1547 | Similarly, revoked and expired keys will be garbage collected, but only after a |
1560 | certain amount of time has passed. This time is set as a number of seconds in: | 1548 | certain amount of time has passed. This time is set as a number of seconds in:: |
1561 | 1549 | ||
1562 | /proc/sys/kernel/keys/gc_delay | 1550 | /proc/sys/kernel/keys/gc_delay |
diff --git a/Documentation/security/keys-ecryptfs.txt b/Documentation/security/keys/ecryptfs.rst
index c3bbeba63562..4920f3a8ea75 100644
--- a/Documentation/security/keys-ecryptfs.txt
+++ b/Documentation/security/keys/ecryptfs.rst
@@ -1,4 +1,6 @@ | |||
1 | Encrypted keys for the eCryptfs filesystem | 1 | ========================================== |
2 | Encrypted keys for the eCryptfs filesystem | ||
3 | ========================================== | ||
2 | 4 | ||
3 | ECryptfs is a stacked filesystem which transparently encrypts and decrypts each | 5 | ECryptfs is a stacked filesystem which transparently encrypts and decrypts each |
4 | file using a randomly generated File Encryption Key (FEK). | 6 | file using a randomly generated File Encryption Key (FEK). |
@@ -35,20 +37,23 @@ controlled environment. Another advantage is that the key is not exposed to | |||
35 | threats of malicious software, because it is available in clear form only at | 37 | threats of malicious software, because it is available in clear form only at |
36 | kernel level. | 38 | kernel level. |
37 | 39 | ||
38 | Usage: | 40 | Usage:: |
41 | |||
39 | keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring | 42 | keyctl add encrypted name "new ecryptfs key-type:master-key-name keylen" ring |
40 | keyctl add encrypted name "load hex_blob" ring | 43 | keyctl add encrypted name "load hex_blob" ring |
41 | keyctl update keyid "update key-type:master-key-name" | 44 | keyctl update keyid "update key-type:master-key-name" |
42 | 45 | ||
43 | name:= '<16 hexadecimal characters>' | 46 | Where:: |
44 | key-type:= 'trusted' | 'user' | 47 | |
45 | keylen:= 64 | 48 | name:= '<16 hexadecimal characters>' |
49 | key-type:= 'trusted' | 'user' | ||
50 | keylen:= 64 | ||
46 | 51 | ||
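The grammar above can be expressed as a few checks. This sketch mirrors only the rules listed here:

- a 16-character hexadecimal name;
- a trusted or user master key;
- a key length of 64.

The function names are hypothetical, and the kernel's own parser may accept or reject inputs differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* name := '<16 hexadecimal characters>' */
static bool valid_ecryptfs_name(const char *name)
{
	if (strlen(name) != 16)
		return false;
	for (size_t i = 0; i < 16; i++)
		if (!isxdigit((unsigned char)name[i]))
			return false;
	return true;
}

/* key-type:master-key-name, where key-type := 'trusted' | 'user' */
static bool valid_master_desc(const char *desc)
{
	const char *colon = strchr(desc, ':');
	size_t n;

	if (!colon || colon[1] == '\0')
		return false;  /* a non-empty master key name is required */
	n = (size_t)(colon - desc);
	return (n == 7 && strncmp(desc, "trusted", 7) == 0) ||
	       (n == 4 && strncmp(desc, "user", 4) == 0);
}

/* keylen := 64 */
static bool valid_ecryptfs_keylen(unsigned int keylen)
{
	return keylen == 64;
}
```

With these checks, the example that follows ("1000100010001000", "user:test", 64) passes all three.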
47 | 52 | ||
48 | Example of encrypted key usage with the eCryptfs filesystem: | 53 | Example of encrypted key usage with the eCryptfs filesystem: |
49 | 54 | ||
50 | Create an encrypted key "1000100010001000" of length 64 bytes with format | 55 | Create an encrypted key "1000100010001000" of length 64 bytes with format |
51 | 'ecryptfs' and save it using a previously loaded user key "test": | 56 | 'ecryptfs' and save it using a previously loaded user key "test":: |
52 | 57 | ||
53 | $ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u | 58 | $ keyctl add encrypted 1000100010001000 "new ecryptfs user:test 64" @u |
54 | 19184530 | 59 | 19184530 |
@@ -62,7 +67,7 @@ Create an encrypted key "1000100010001000" of length 64 bytes with format | |||
62 | $ keyctl pipe 19184530 > ecryptfs.blob | 67 | $ keyctl pipe 19184530 > ecryptfs.blob |
63 | 68 | ||
64 | Mount an eCryptfs filesystem using the created encrypted key "1000100010001000" | 69 | Mount an eCryptfs filesystem using the created encrypted key "1000100010001000" |
65 | into the '/secret' directory: | 70 | into the '/secret' directory:: |
66 | 71 | ||
67 | $ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\ | 72 | $ mount -i -t ecryptfs -oecryptfs_sig=1000100010001000,\ |
68 | ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret | 73 | ecryptfs_cipher=aes,ecryptfs_key_bytes=32 /secret /secret |
diff --git a/Documentation/security/keys/index.rst b/Documentation/security/keys/index.rst
new file mode 100644
index 000000000000..647d58f2588e
--- /dev/null
+++ b/Documentation/security/keys/index.rst
@@ -0,0 +1,11 @@ | |||
1 | =========== | ||
2 | Kernel Keys | ||
3 | =========== | ||
4 | |||
5 | .. toctree:: | ||
6 | :maxdepth: 1 | ||
7 | |||
8 | core | ||
9 | ecryptfs | ||
10 | request-key | ||
11 | trusted-encrypted | ||
diff --git a/Documentation/security/keys-request-key.txt b/Documentation/security/keys/request-key.rst
index 51987bfecfed..aba32784174c 100644
--- a/Documentation/security/keys-request-key.txt
+++ b/Documentation/security/keys/request-key.rst
@@ -1,19 +1,19 @@ | |||
1 | =================== | 1 | =================== |
2 | KEY REQUEST SERVICE | 2 | Key Request Service |
3 | =================== | 3 | =================== |
4 | 4 | ||
5 | The key request service is part of the key retention service (refer to | 5 | The key request service is part of the key retention service (refer to |
6 | Documentation/security/keys.txt). This document explains more fully how | 6 | Documentation/security/keys.txt). This document explains more fully how |
7 | the requesting algorithm works. | 7 | the requesting algorithm works. |
8 | 8 | ||
9 | The process starts by either the kernel requesting a service by calling | 9 | The process starts by either the kernel requesting a service by calling |
10 | request_key*(): | 10 | ``request_key*()``:: |
11 | 11 | ||
12 | struct key *request_key(const struct key_type *type, | 12 | struct key *request_key(const struct key_type *type, |
13 | const char *description, | 13 | const char *description, |
14 | const char *callout_info); | 14 | const char *callout_info); |
15 | 15 | ||
16 | or: | 16 | or:: |
17 | 17 | ||
18 | struct key *request_key_with_auxdata(const struct key_type *type, | 18 | struct key *request_key_with_auxdata(const struct key_type *type, |
19 | const char *description, | 19 | const char *description, |
@@ -21,14 +21,14 @@ or: | |||
21 | size_t callout_len, | 21 | size_t callout_len, |
22 | void *aux); | 22 | void *aux); |
23 | 23 | ||
24 | or: | 24 | or:: |
25 | 25 | ||
26 | struct key *request_key_async(const struct key_type *type, | 26 | struct key *request_key_async(const struct key_type *type, |
27 | const char *description, | 27 | const char *description, |
28 | const char *callout_info, | 28 | const char *callout_info, |
29 | size_t callout_len); | 29 | size_t callout_len); |
30 | 30 | ||
31 | or: | 31 | or:: |
32 | 32 | ||
33 | struct key *request_key_async_with_auxdata(const struct key_type *type, | 33 | struct key *request_key_async_with_auxdata(const struct key_type *type, |
34 | const char *description, | 34 | const char *description, |
@@ -36,7 +36,7 @@ or: | |||
36 | size_t callout_len, | 36 | size_t callout_len, |
37 | void *aux); | 37 | void *aux); |
38 | 38 | ||
39 | Or by userspace invoking the request_key system call: | 39 | Or by userspace invoking the request_key system call:: |
40 | 40 | ||
41 | key_serial_t request_key(const char *type, | 41 | key_serial_t request_key(const char *type, |
42 | const char *description, | 42 | const char *description, |
@@ -67,38 +67,37 @@ own upcall mechanisms. If they do, then those should be substituted for the | |||
67 | forking and execution of /sbin/request-key. | 67 | forking and execution of /sbin/request-key. |
68 | 68 | ||
69 | 69 | ||
70 | =========== | 70 | The Process |
71 | THE PROCESS | ||
72 | =========== | 71 | =========== |
73 | 72 | ||
74 | A request proceeds in the following manner: | 73 | A request proceeds in the following manner: |
75 | 74 | ||
76 | (1) Process A calls request_key() [the userspace syscall calls the kernel | 75 | 1) Process A calls request_key() [the userspace syscall calls the kernel |
77 | interface]. | 76 | interface]. |
78 | 77 | ||
79 | (2) request_key() searches the process's subscribed keyrings to see if there's | 78 | 2) request_key() searches the process's subscribed keyrings to see if there's |
80 | a suitable key there. If there is, it returns the key. If there isn't, | 79 | a suitable key there. If there is, it returns the key. If there isn't, |
81 | and callout_info is not set, an error is returned. Otherwise the process | 80 | and callout_info is not set, an error is returned. Otherwise the process |
82 | proceeds to the next step. | 81 | proceeds to the next step. |
83 | 82 | ||
84 | (3) request_key() sees that A doesn't have the desired key yet, so it creates | 83 | 3) request_key() sees that A doesn't have the desired key yet, so it creates |
85 | two things: | 84 | two things: |
86 | 85 | ||
87 | (a) An uninstantiated key U of requested type and description. | 86 | a) An uninstantiated key U of requested type and description. |
88 | 87 | ||
89 | (b) An authorisation key V that refers to key U and notes that process A | 88 | b) An authorisation key V that refers to key U and notes that process A |
90 | is the context in which key U should be instantiated and secured, and | 89 | is the context in which key U should be instantiated and secured, and |
91 | from which associated key requests may be satisfied. | 90 | from which associated key requests may be satisfied. |
92 | 91 | ||
93 | (4) request_key() then forks and executes /sbin/request-key with a new session | 92 | 4) request_key() then forks and executes /sbin/request-key with a new session |
94 | keyring that contains a link to auth key V. | 93 | keyring that contains a link to auth key V. |
95 | 94 | ||
96 | (5) /sbin/request-key assumes the authority associated with key U. | 95 | 5) /sbin/request-key assumes the authority associated with key U. |
97 | 96 | ||
98 | (6) /sbin/request-key execs an appropriate program to perform the actual | 97 | 6) /sbin/request-key execs an appropriate program to perform the actual |
99 | instantiation. | 98 | instantiation. |
100 | 99 | ||
101 | (7) The program may want to access another key from A's context (say a | 100 | 7) The program may want to access another key from A's context (say a |
102 | Kerberos TGT key). It just requests the appropriate key, and the keyring | 101 | Kerberos TGT key). It just requests the appropriate key, and the keyring |
103 | search notes that the session keyring has auth key V in its bottom level. | 102 | search notes that the session keyring has auth key V in its bottom level. |
104 | 103 | ||
@@ -106,15 +105,15 @@ A request proceeds in the following manner: | |||
106 | UID, GID, groups and security info of process A as if it was process A, | 105 | UID, GID, groups and security info of process A as if it was process A, |
107 | and come up with key W. | 106 | and come up with key W. |
108 | 107 | ||
109 | (8) The program then does what it must to get the data with which to | 108 | 8) The program then does what it must to get the data with which to |
110 | instantiate key U, using key W as a reference (perhaps it contacts a | 109 | instantiate key U, using key W as a reference (perhaps it contacts a |
111 | Kerberos server using the TGT) and then instantiates key U. | 110 | Kerberos server using the TGT) and then instantiates key U. |
112 | 111 | ||
113 | (9) Upon instantiating key U, auth key V is automatically revoked so that it | 112 | 9) Upon instantiating key U, auth key V is automatically revoked so that it |
114 | may not be used again. | 113 | may not be used again. |
115 | 114 | ||
116 | (10) The program then exits 0 and request_key() deletes key V and returns key | 115 | 10) The program then exits 0 and request_key() deletes key V and returns key |
117 | U to the caller. | 116 | U to the caller. |
118 | 117 | ||
119 | This also extends further. If key W (step 7 above) didn't exist, key W would | 118 | This also extends further. If key W (step 7 above) didn't exist, key W would |
120 | be created uninstantiated, another auth key (X) would be created (as per step | 119 | be created uninstantiated, another auth key (X) would be created (as per step |
@@ -127,8 +126,7 @@ This is because process A's keyrings can't simply be attached to | |||
127 | of them, and (b) it requires the same UID/GID/Groups all the way through. | 126 | of them, and (b) it requires the same UID/GID/Groups all the way through. |
128 | 127 | ||
129 | 128 | ||
130 | ==================================== | 129 | Negative Instantiation And Rejection |
131 | NEGATIVE INSTANTIATION AND REJECTION | ||
132 | ==================================== | 130 | ==================================== |
133 | 131 | ||
134 | Rather than instantiating a key, it is possible for the possessor of an | 132 | Rather than instantiating a key, it is possible for the possessor of an |
@@ -145,23 +143,22 @@ signal, the key under construction will be automatically negatively | |||
145 | instantiated for a short amount of time. | 143 | instantiated for a short amount of time. |
146 | 144 | ||
147 | 145 | ||
148 | ==================== | 146 | The Search Algorithm |
149 | THE SEARCH ALGORITHM | ||
150 | ==================== | 147 | ==================== |
151 | 148 | ||
152 | A search of any particular keyring proceeds in the following fashion: | 149 | A search of any particular keyring proceeds in the following fashion: |
153 | 150 | ||
154 | (1) When the key management code searches for a key (keyring_search_aux) it | 151 | 1) When the key management code searches for a key (keyring_search_aux) it |
155 | firstly calls key_permission(SEARCH) on the keyring it's starting with, | 152 | firstly calls key_permission(SEARCH) on the keyring it's starting with, |
156 | if this denies permission, it doesn't search further. | 153 | if this denies permission, it doesn't search further. |
157 | 154 | ||
158 | (2) It considers all the non-keyring keys within that keyring and, if any key | 155 | 2) It considers all the non-keyring keys within that keyring and, if any key |
159 | matches the criteria specified, calls key_permission(SEARCH) on it to see | 156 | matches the criteria specified, calls key_permission(SEARCH) on it to see |
160 | if the key is allowed to be found. If it is, that key is returned; if | 157 | if the key is allowed to be found. If it is, that key is returned; if |
161 | not, the search continues, and the error code is retained if of higher | 158 | not, the search continues, and the error code is retained if of higher |
162 | priority than the one currently set. | 159 | priority than the one currently set. |
163 | 160 | ||
164 | (3) It then considers all the keyring-type keys in the keyring it's currently | 161 | 3) It then considers all the keyring-type keys in the keyring it's currently |
165 | searching. It calls key_permission(SEARCH) on each keyring, and if this | 162 | searching. It calls key_permission(SEARCH) on each keyring, and if this |
166 | grants permission, it recurses, executing steps (2) and (3) on that | 163 | grants permission, it recurses, executing steps (2) and (3) on that |
167 | keyring. | 164 | keyring. |
@@ -173,20 +170,20 @@ returned. | |||
173 | When search_process_keyrings() is invoked, it performs the following searches | 170 | When search_process_keyrings() is invoked, it performs the following searches |
174 | until one succeeds: | 171 | until one succeeds: |
175 | 172 | ||
176 | (1) If extant, the process's thread keyring is searched. | 173 | 1) If extant, the process's thread keyring is searched. |
177 | 174 | ||
178 | (2) If extant, the process's process keyring is searched. | 175 | 2) If extant, the process's process keyring is searched. |
179 | 176 | ||
180 | (3) The process's session keyring is searched. | 177 | 3) The process's session keyring is searched. |
181 | 178 | ||
182 | (4) If the process has assumed the authority associated with a request_key() | 179 | 4) If the process has assumed the authority associated with a request_key() |
183 | authorisation key then: | 180 | authorisation key then: |
184 | 181 | ||
185 | (a) If extant, the calling process's thread keyring is searched. | 182 | a) If extant, the calling process's thread keyring is searched. |
186 | 183 | ||
187 | (b) If extant, the calling process's process keyring is searched. | 184 | b) If extant, the calling process's process keyring is searched. |
188 | 185 | ||
189 | (c) The calling process's session keyring is searched. | 186 | c) The calling process's session keyring is searched. |
190 | 187 | ||
191 | The moment one succeeds, all pending errors are discarded and the found key is | 188 | The moment one succeeds, all pending errors are discarded and the found key is |
192 | returned. | 189 | returned. |
@@ -194,7 +191,7 @@ returned. | |||
194 | Only if all these fail does the whole thing fail with the highest priority | 191 | Only if all these fail does the whole thing fail with the highest priority |
195 | error. Note that several errors may have come from LSM. | 192 | error. Note that several errors may have come from LSM. |
196 | 193 | ||
197 | The error priority is: | 194 | The error priority is:: |
198 | 195 | ||
199 | EKEYREVOKED > EKEYEXPIRED > ENOKEY | 196 | EKEYREVOKED > EKEYEXPIRED > ENOKEY |
200 | 197 | ||
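The search order and error priority described above can be combined into a short model.

- Each keyring search is reduced to a precomputed outcome.
- `search_process_keyrings_model()` is a hypothetical name.
- This sketches only the selection logic, not how the kernel obtains those outcomes.

```c
#include <assert.h>
#include <errno.h>

#ifndef ENOKEY
#define ENOKEY 126
#endif
#ifndef EKEYEXPIRED
#define EKEYEXPIRED 127
#endif
#ifndef EKEYREVOKED
#define EKEYREVOKED 128
#endif

/* Higher rank wins: EKEYREVOKED > EKEYEXPIRED > ENOKEY. */
static int err_rank(int err)
{
	switch (err) {
	case -EKEYREVOKED: return 3;
	case -EKEYEXPIRED: return 2;
	case -ENOKEY:      return 1;
	default:           return 0;
	}
}

/* Each slot holds the outcome of searching one keyring, in the order listed
 * above: a key serial (> 0), a -errno, or 0 if that keyring does not exist.
 * Slots 3-5 are the calling (requesting) process's keyrings and are only
 * consulted when authority has been assumed. */
static int search_process_keyrings_model(const int results[6],
					 int assumed_authority)
{
	int err = -ENOKEY;
	int n = assumed_authority ? 6 : 3;

	for (int i = 0; i < n; i++) {
		if (results[i] == 0)
			continue;           /* keyring not extant */
		if (results[i] > 0)
			return results[i];  /* success: pending errors discarded */
		if (err_rank(results[i]) > err_rank(err))
			err = results[i];
	}
	return err;                     /* highest-priority error */
}
```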
diff --git a/Documentation/security/keys-trusted-encrypted.txt b/Documentation/security/keys/trusted-encrypted.rst
index b20a993a32af..7b503831bdea 100644
--- a/Documentation/security/keys-trusted-encrypted.txt
+++ b/Documentation/security/keys/trusted-encrypted.rst
@@ -1,4 +1,6 @@ | |||
1 | Trusted and Encrypted Keys | 1 | ========================== |
2 | Trusted and Encrypted Keys | ||
3 | ========================== | ||
2 | 4 | ||
3 | Trusted and Encrypted Keys are two new key types added to the existing kernel | 5 | Trusted and Encrypted Keys are two new key types added to the existing kernel |
4 | key ring service. Both of these new types are variable length symmetric keys, | 6 | key ring service. Both of these new types are variable length symmetric keys, |
@@ -20,7 +22,8 @@ By default, trusted keys are sealed under the SRK, which has the default | |||
20 | authorization value (20 zeros). This can be set at takeownership time with the | 22 | authorization value (20 zeros). This can be set at takeownership time with the |
21 | trouser's utility: "tpm_takeownership -u -z". | 23 | trouser's utility: "tpm_takeownership -u -z". |
22 | 24 | ||
23 | Usage: | 25 | Usage:: |
26 | |||
24 | keyctl add trusted name "new keylen [options]" ring | 27 | keyctl add trusted name "new keylen [options]" ring |
25 | keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring | 28 | keyctl add trusted name "load hex_blob [pcrlock=pcrnum]" ring |
26 | keyctl update key "update [options]" | 29 | keyctl update key "update [options]" |
@@ -64,19 +67,22 @@ The decrypted portion of encrypted keys can contain either a simple symmetric | |||
64 | key or a more complex structure. The format of the more complex structure is | 67 | key or a more complex structure. The format of the more complex structure is |
65 | application specific, which is identified by 'format'. | 68 | application specific, which is identified by 'format'. |
66 | 69 | ||
67 | Usage: | 70 | Usage:: |
71 | |||
68 | keyctl add encrypted name "new [format] key-type:master-key-name keylen" | 72 | keyctl add encrypted name "new [format] key-type:master-key-name keylen" |
69 | ring | 73 | ring |
70 | keyctl add encrypted name "load hex_blob" ring | 74 | keyctl add encrypted name "load hex_blob" ring |
71 | keyctl update keyid "update key-type:master-key-name" | 75 | keyctl update keyid "update key-type:master-key-name" |
72 | 76 | ||
73 | format:= 'default | ecryptfs' | 77 | Where:: |
74 | key-type:= 'trusted' | 'user' | 78 | |
79 | format:= 'default | ecryptfs' | ||
80 | key-type:= 'trusted' | 'user' | ||
75 | 81 | ||
76 | 82 | ||
77 | Examples of trusted and encrypted key usage: | 83 | Examples of trusted and encrypted key usage: |
78 | 84 | ||
79 | Create and save a trusted key named "kmk" of length 32 bytes: | 85 | Create and save a trusted key named "kmk" of length 32 bytes:: |
80 | 86 | ||
81 | $ keyctl add trusted kmk "new 32" @u | 87 | $ keyctl add trusted kmk "new 32" @u |
82 | 440502848 | 88 | 440502848 |
@@ -99,7 +105,7 @@ Create and save a trusted key named "kmk" of length 32 bytes: | |||
99 | 105 | ||
100 | $ keyctl pipe 440502848 > kmk.blob | 106 | $ keyctl pipe 440502848 > kmk.blob |
101 | 107 | ||
102 | Load a trusted key from the saved blob: | 108 | Load a trusted key from the saved blob:: |
103 | 109 | ||
104 | $ keyctl add trusted kmk "load `cat kmk.blob`" @u | 110 | $ keyctl add trusted kmk "load `cat kmk.blob`" @u |
105 | 268728824 | 111 | 268728824 |
@@ -114,7 +120,7 @@ Load a trusted key from the saved blob: | |||
114 | f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b | 120 | f1f8fff03ad0acb083725535636addb08d73dedb9832da198081e5deae84bfaf0409c22b |
115 | e4a8aea2b607ec96931e6f4d4fe563ba | 121 | e4a8aea2b607ec96931e6f4d4fe563ba |
116 | 122 | ||
117 | Reseal a trusted key under new pcr values: | 123 | Reseal a trusted key under new pcr values:: |
118 | 124 | ||
119 | $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`" | 125 | $ keyctl update 268728824 "update pcrinfo=`cat pcr.blob`" |
120 | $ keyctl print 268728824 | 126 | $ keyctl print 268728824 |
@@ -135,11 +141,13 @@ compromised by a user level problem, and when sealed to specific boot PCR | |||
135 | values, protects against boot and offline attacks. Create and save an | 141 | values, protects against boot and offline attacks. Create and save an |
136 | encrypted key "evm" using the above trusted key "kmk": | 142 | encrypted key "evm" using the above trusted key "kmk": |
137 | 143 | ||
138 | option 1: omitting 'format' | 144 | option 1: omitting 'format':: |
145 | |||
139 | $ keyctl add encrypted evm "new trusted:kmk 32" @u | 146 | $ keyctl add encrypted evm "new trusted:kmk 32" @u |
140 | 159771175 | 147 | 159771175 |
141 | 148 | ||
142 | option 2: explicitly defining 'format' as 'default' | 149 | option 2: explicitly defining 'format' as 'default':: |
150 | |||
143 | $ keyctl add encrypted evm "new default trusted:kmk 32" @u | 151 | $ keyctl add encrypted evm "new default trusted:kmk 32" @u |
144 | 159771175 | 152 | 159771175 |
145 | 153 | ||
@@ -150,7 +158,7 @@ option 2: explicitly defining 'format' as 'default' | |||
150 | 158 | ||
151 | $ keyctl pipe 159771175 > evm.blob | 159 | $ keyctl pipe 159771175 > evm.blob |
152 | 160 | ||
153 | Load an encrypted key "evm" from saved blob: | 161 | Load an encrypted key "evm" from saved blob:: |
154 | 162 | ||
155 | $ keyctl add encrypted evm "load `cat evm.blob`" @u | 163 | $ keyctl add encrypted evm "load `cat evm.blob`" @u |
156 | 831684262 | 164 | 831684262 |
@@ -164,4 +172,4 @@ Other uses for trusted and encrypted keys, such as for disk and file encryption | |||
164 | are anticipated. In particular the new format 'ecryptfs' has been defined | 172 | are anticipated. In particular the new format 'ecryptfs' has been defined |
165 | in order to use encrypted keys to mount an eCryptfs filesystem. More details | 173 | in order to use encrypted keys to mount an eCryptfs filesystem. More details |
166 | about the usage can be found in the file | 174 | about the usage can be found in the file |
167 | 'Documentation/security/keys-ecryptfs.txt'. | 175 | ``Documentation/security/keys-ecryptfs.txt``. |
diff --git a/Documentation/security/self-protection.txt b/Documentation/security/self-protection.rst index 141acfebe6ef..60c8bd8b77bf 100644 --- a/Documentation/security/self-protection.txt +++ b/Documentation/security/self-protection.rst | |||
@@ -1,4 +1,6 @@ | |||
1 | # Kernel Self-Protection | 1 | ====================== |
2 | Kernel Self-Protection | ||
3 | ====================== | ||
2 | 4 | ||
3 | Kernel self-protection is the design and implementation of systems and | 5 | Kernel self-protection is the design and implementation of systems and |
4 | structures within the Linux kernel to protect against security flaws in | 6 | structures within the Linux kernel to protect against security flaws in |
@@ -26,7 +28,8 @@ mentioning them, since these aspects need to be explored, dealt with, | |||
26 | and/or accepted. | 28 | and/or accepted. |
27 | 29 | ||
28 | 30 | ||
29 | ## Attack Surface Reduction | 31 | Attack Surface Reduction |
32 | ======================== | ||
30 | 33 | ||
31 | The most fundamental defense against security exploits is to reduce the | 34 | The most fundamental defense against security exploits is to reduce the |
32 | areas of the kernel that can be used to redirect execution. This ranges | 35 | areas of the kernel that can be used to redirect execution. This ranges |
@@ -34,13 +37,15 @@ from limiting the exposed APIs available to userspace, making in-kernel | |||
34 | APIs hard to use incorrectly, minimizing the areas of writable kernel | 37 | APIs hard to use incorrectly, minimizing the areas of writable kernel |
35 | memory, etc. | 38 | memory, etc. |
36 | 39 | ||
37 | ### Strict kernel memory permissions | 40 | Strict kernel memory permissions |
41 | -------------------------------- | ||
38 | 42 | ||
39 | When all of kernel memory is writable, it becomes trivial for attacks | 43 | When all of kernel memory is writable, it becomes trivial for attacks |
40 | to redirect execution flow. To reduce the availability of these targets | 44 | to redirect execution flow. To reduce the availability of these targets |
41 | the kernel needs to protect its memory with a tight set of permissions. | 45 | the kernel needs to protect its memory with a tight set of permissions. |
42 | 46 | ||
43 | #### Executable code and read-only data must not be writable | 47 | Executable code and read-only data must not be writable |
48 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
44 | 49 | ||
45 | Any areas of the kernel with executable memory must not be writable. | 50 | Any areas of the kernel with executable memory must not be writable. |
46 | While this obviously includes the kernel text itself, we must consider | 51 | While this obviously includes the kernel text itself, we must consider |
@@ -51,18 +56,19 @@ kernel, they are implemented in a way where the memory is temporarily | |||
51 | made writable during the update, and then returned to the original | 56 | made writable during the update, and then returned to the original |
52 | permissions.) | 57 | permissions.) |
53 | 58 | ||
54 | In support of this are CONFIG_STRICT_KERNEL_RWX and | 59 | In support of this are ``CONFIG_STRICT_KERNEL_RWX`` and |
55 | CONFIG_STRICT_MODULE_RWX, which seek to make sure that code is not | 60 | ``CONFIG_STRICT_MODULE_RWX``, which seek to make sure that code is not |
56 | writable, data is not executable, and read-only data is neither writable | 61 | writable, data is not executable, and read-only data is neither writable |
57 | nor executable. | 62 | nor executable. |
58 | 63 | ||
59 | Most architectures have these options on by default and not user selectable. | 64 | Most architectures have these options on by default and not user selectable. |
60 | For some architectures like arm that wish to have these be selectable, | 65 | For some architectures like arm that wish to have these be selectable, |
61 | the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable | 66 | the architecture Kconfig can select ARCH_OPTIONAL_KERNEL_RWX to enable |
62 | a Kconfig prompt. CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT determines | 67 | a Kconfig prompt. ``CONFIG_ARCH_OPTIONAL_KERNEL_RWX_DEFAULT`` determines |
63 | the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled. | 68 | the default setting when ARCH_OPTIONAL_KERNEL_RWX is enabled. |
64 | 69 | ||
65 | #### Function pointers and sensitive variables must not be writable | 70 | Function pointers and sensitive variables must not be writable |
71 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
66 | 72 | ||
67 | Vast areas of kernel memory contain function pointers that are looked | 73 | Vast areas of kernel memory contain function pointers that are looked |
68 | up by the kernel and used to continue execution (e.g. descriptor/vector | 74 | up by the kernel and used to continue execution (e.g. descriptor/vector |
@@ -74,8 +80,8 @@ so that they live in the .rodata section instead of the .data section | |||
74 | of the kernel, gaining the protection of the kernel's strict memory | 80 | of the kernel, gaining the protection of the kernel's strict memory |
75 | permissions as described above. | 81 | permissions as described above. |
76 | 82 | ||
77 | For variables that are initialized once at __init time, these can | 83 | For variables that are initialized once at ``__init`` time, these can |
78 | be marked with the (new and under development) __ro_after_init | 84 | be marked with the (new and under development) ``__ro_after_init`` |
79 | attribute. | 85 | attribute. |
80 | 86 | ||
81 | What remains are variables that are updated rarely (e.g. GDT). These | 87 | What remains are variables that are updated rarely (e.g. GDT). These |
@@ -85,7 +91,8 @@ of their lifetime read-only. (For example, when being updated, only the | |||
85 | CPU thread performing the update would be given uninterruptible write | 91 | CPU thread performing the update would be given uninterruptible write |
86 | access to the memory.) | 92 | access to the memory.) |
87 | 93 | ||
88 | #### Segregation of kernel memory from userspace memory | 94 | Segregation of kernel memory from userspace memory |
95 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
89 | 96 | ||
90 | The kernel must never execute userspace memory. The kernel must also never | 97 | The kernel must never execute userspace memory. The kernel must also never |
91 | access userspace memory without explicit expectation to do so. These | 98 | access userspace memory without explicit expectation to do so. These |
@@ -95,10 +102,11 @@ By blocking userspace memory in this way, execution and data parsing | |||
95 | cannot be passed to trivially-controlled userspace memory, forcing | 102 | cannot be passed to trivially-controlled userspace memory, forcing |
96 | attacks to operate entirely in kernel memory. | 103 | attacks to operate entirely in kernel memory. |
97 | 104 | ||
98 | ### Reduced access to syscalls | 105 | Reduced access to syscalls |
106 | -------------------------- | ||
99 | 107 | ||
100 | One trivial way to eliminate many syscalls for 64-bit systems is building | 108 | One trivial way to eliminate many syscalls for 64-bit systems is building |
101 | without CONFIG_COMPAT. However, this is rarely a feasible scenario. | 109 | without ``CONFIG_COMPAT``. However, this is rarely a feasible scenario. |
102 | 110 | ||
103 | The "seccomp" system provides an opt-in feature made available to | 111 | The "seccomp" system provides an opt-in feature made available to |
104 | userspace, which provides a way to reduce the number of kernel entry | 112 | userspace, which provides a way to reduce the number of kernel entry |
@@ -112,7 +120,8 @@ to trusted processes. This would keep the scope of kernel entry points | |||
112 | restricted to the more regular set normally available to unprivileged | 120 | restricted to the more regular set normally available to unprivileged |
113 | userspace. | 121 | userspace. |
114 | 122 | ||
115 | ### Restricting access to kernel modules | 123 | Restricting access to kernel modules |
124 | ------------------------------------ | ||
116 | 125 | ||
117 | The kernel should never allow an unprivileged user the ability to | 126 | The kernel should never allow an unprivileged user the ability to |
118 | load specific kernel modules, since that would provide a facility to | 127 | load specific kernel modules, since that would provide a facility to |
@@ -127,11 +136,12 @@ for debate in some scenarios.) | |||
127 | To protect against even privileged users, systems may need to either | 136 | To protect against even privileged users, systems may need to either |
128 | disable module loading entirely (e.g. monolithic kernel builds or | 137 | disable module loading entirely (e.g. monolithic kernel builds or |
129 | modules_disabled sysctl), or provide signed modules (e.g. | 138 | modules_disabled sysctl), or provide signed modules (e.g. |
130 | CONFIG_MODULE_SIG_FORCE, or dm-crypt with LoadPin), to keep from having | 139 | ``CONFIG_MODULE_SIG_FORCE``, or dm-crypt with LoadPin), to keep from having |
131 | root load arbitrary kernel code via the module loader interface. | 140 | root load arbitrary kernel code via the module loader interface. |
132 | 141 | ||
133 | 142 | ||
134 | ## Memory integrity | 143 | Memory integrity |
144 | ================ | ||
135 | 145 | ||
136 | There are many memory structures in the kernel that are regularly abused | 146 | There are many memory structures in the kernel that are regularly abused |
137 | to gain execution control during an attack. By far the most commonly | 147 | to gain execution control during an attack. By far the most commonly |
@@ -139,16 +149,18 @@ understood is that of the stack buffer overflow in which the return | |||
139 | address stored on the stack is overwritten. Many other examples of this | 149 | address stored on the stack is overwritten. Many other examples of this |
140 | kind of attack exist, and protections exist to defend against them. | 150 | kind of attack exist, and protections exist to defend against them. |
141 | 151 | ||
142 | ### Stack buffer overflow | 152 | Stack buffer overflow |
153 | --------------------- | ||
143 | 154 | ||
144 | The classic stack buffer overflow involves writing past the expected end | 155 | The classic stack buffer overflow involves writing past the expected end |
145 | of a variable stored on the stack, ultimately writing a controlled value | 156 | of a variable stored on the stack, ultimately writing a controlled value |
146 | to the stack frame's stored return address. The most widely used defense | 157 | to the stack frame's stored return address. The most widely used defense |
147 | is the presence of a stack canary between the stack variables and the | 158 | is the presence of a stack canary between the stack variables and the |
148 | return address (CONFIG_CC_STACKPROTECTOR), which is verified just before | 159 | return address (``CONFIG_CC_STACKPROTECTOR``), which is verified just before |
149 | the function returns. Other defenses include things like shadow stacks. | 160 | the function returns. Other defenses include things like shadow stacks. |
150 | 161 | ||
151 | ### Stack depth overflow | 162 | Stack depth overflow |
163 | -------------------- | ||
152 | 164 | ||
153 | A less well understood attack is using a bug that triggers the | 165 | A less well understood attack is using a bug that triggers the |
154 | kernel to consume stack memory with deep function calls or large stack | 166 | kernel to consume stack memory with deep function calls or large stack |
@@ -158,27 +170,31 @@ important changes need to be made for better protections: moving the | |||
158 | sensitive thread_info structure elsewhere, and adding a faulting memory | 170 | sensitive thread_info structure elsewhere, and adding a faulting memory |
159 | hole at the bottom of the stack to catch these overflows. | 171 | hole at the bottom of the stack to catch these overflows. |
160 | 172 | ||
161 | ### Heap memory integrity | 173 | Heap memory integrity |
174 | --------------------- | ||
162 | 175 | ||
163 | The structures used to track heap free lists can be sanity-checked during | 176 | The structures used to track heap free lists can be sanity-checked during |
164 | allocation and freeing to make sure they aren't being used to manipulate | 177 | allocation and freeing to make sure they aren't being used to manipulate |
165 | other memory areas. | 178 | other memory areas. |
166 | 179 | ||
167 | ### Counter integrity | 180 | Counter integrity |
181 | ----------------- | ||
168 | 182 | ||
169 | Many places in the kernel use atomic counters to track object references | 183 | Many places in the kernel use atomic counters to track object references |
170 | or perform similar lifetime management. When these counters can be made | 184 | or perform similar lifetime management. When these counters can be made |
171 | to wrap (over or under), this traditionally exposes a use-after-free | 185 | to wrap (over or under), this traditionally exposes a use-after-free |
172 | flaw. By trapping atomic wrapping, this class of bug vanishes. | 186 | flaw. By trapping atomic wrapping, this class of bug vanishes. |
173 | 187 | ||
174 | ### Size calculation overflow detection | 188 | Size calculation overflow detection |
189 | ----------------------------------- | ||
175 | 190 | ||
176 | Similar to counter overflow, integer overflows (usually size calculations) | 191 | Similar to counter overflow, integer overflows (usually size calculations) |
177 | need to be detected at runtime to kill this class of bug, which | 192 | need to be detected at runtime to kill this class of bug, which |
178 | traditionally leads to being able to write past the end of kernel buffers. | 193 | traditionally leads to being able to write past the end of kernel buffers. |
179 | 194 | ||
180 | 195 | ||
181 | ## Statistical defenses | 196 | Probabilistic defenses |
197 | ====================== | ||
182 | 198 | ||
183 | While many protections can be considered deterministic (e.g. read-only | 199 | While many protections can be considered deterministic (e.g. read-only |
184 | memory cannot be written to), some protections provide only statistical | 200 | memory cannot be written to), some protections provide only statistical |
@@ -186,7 +202,8 @@ defense, in that an attack must gather enough information about a | |||
186 | running system to overcome the defense. While not perfect, these do | 202 | running system to overcome the defense. While not perfect, these do |
187 | provide meaningful defenses. | 203 | provide meaningful defenses. |
188 | 204 | ||
189 | ### Canaries, blinding, and other secrets | 205 | Canaries, blinding, and other secrets |
206 | ------------------------------------- | ||
190 | 207 | ||
191 | It should be noted that things like the stack canary discussed earlier | 208 | It should be noted that things like the stack canary discussed earlier |
192 | are technically statistical defenses, since they rely on a secret value, | 209 | are technically statistical defenses, since they rely on a secret value, |
@@ -201,7 +218,8 @@ It is critical that the secret values used must be separate (e.g. | |||
201 | different canary per stack) and high entropy (e.g. is the RNG actually | 218 | different canary per stack) and high entropy (e.g. is the RNG actually |
202 | working?) in order to maximize their success. | 219 | working?) in order to maximize their success. |
203 | 220 | ||
204 | ### Kernel Address Space Layout Randomization (KASLR) | 221 | Kernel Address Space Layout Randomization (KASLR) |
222 | ------------------------------------------------- | ||
205 | 223 | ||
206 | Since the location of kernel memory is almost always instrumental in | 224 | Since the location of kernel memory is almost always instrumental in |
207 | mounting a successful attack, making the location non-deterministic | 225 | mounting a successful attack, making the location non-deterministic |
@@ -209,22 +227,25 @@ raises the difficulty of an exploit. (Note that this in turn makes | |||
209 | the value of information exposures higher, since they may be used to | 227 | the value of information exposures higher, since they may be used to |
210 | discover desired memory locations.) | 228 | discover desired memory locations.) |
211 | 229 | ||
212 | #### Text and module base | 230 | Text and module base |
231 | ~~~~~~~~~~~~~~~~~~~~ | ||
213 | 232 | ||
214 | By relocating the physical and virtual base address of the kernel at | 233 | By relocating the physical and virtual base address of the kernel at |
215 | boot-time (CONFIG_RANDOMIZE_BASE), attacks needing kernel code will be | 234 | boot-time (``CONFIG_RANDOMIZE_BASE``), attacks needing kernel code will be |
216 | frustrated. Additionally, offsetting the module loading base address | 235 | frustrated. Additionally, offsetting the module loading base address |
217 | means that even systems that load the same set of modules in the same | 236 | means that even systems that load the same set of modules in the same |
218 | order every boot will not share a common base address with the rest of | 237 | order every boot will not share a common base address with the rest of |
219 | the kernel text. | 238 | the kernel text. |
220 | 239 | ||
221 | #### Stack base | 240 | Stack base |
241 | ~~~~~~~~~~ | ||
222 | 242 | ||
223 | If the base address of the kernel stack is not the same between processes, | 243 | If the base address of the kernel stack is not the same between processes, |
224 | or even not the same between syscalls, targets on or beyond the stack | 244 | or even not the same between syscalls, targets on or beyond the stack |
225 | become more difficult to locate. | 245 | become more difficult to locate. |
226 | 246 | ||
227 | #### Dynamic memory base | 247 | Dynamic memory base |
248 | ~~~~~~~~~~~~~~~~~~~ | ||
228 | 249 | ||
229 | Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up | 250 | Much of the kernel's dynamic memory (e.g. kmalloc, vmalloc, etc) ends up |
230 | being relatively deterministic in layout due to the order of early-boot | 251 | being relatively deterministic in layout due to the order of early-boot |
@@ -232,7 +253,8 @@ initializations. If the base address of these areas is not the same | |||
232 | between boots, targeting them is frustrated, requiring an information | 253 | between boots, targeting them is frustrated, requiring an information |
233 | exposure specific to the region. | 254 | exposure specific to the region. |
234 | 255 | ||
235 | #### Structure layout | 256 | Structure layout |
257 | ~~~~~~~~~~~~~~~~ | ||
236 | 258 | ||
237 | By performing a per-build randomization of the layout of sensitive | 259 | By performing a per-build randomization of the layout of sensitive |
238 | structures, attacks must either be tuned to known kernel builds or expose | 260 | structures, attacks must either be tuned to known kernel builds or expose |
@@ -240,26 +262,30 @@ enough kernel memory to determine structure layouts before manipulating | |||
240 | them. | 262 | them. |
241 | 263 | ||
242 | 264 | ||
243 | ## Preventing Information Exposures | 265 | Preventing Information Exposures |
266 | ================================ | ||
244 | 267 | ||
245 | Since the locations of sensitive structures are the primary target for | 268 | Since the locations of sensitive structures are the primary target for |
246 | attacks, it is important to defend against exposure of both kernel memory | 269 | attacks, it is important to defend against exposure of both kernel memory |
247 | addresses and kernel memory contents (since they may contain kernel | 270 | addresses and kernel memory contents (since they may contain kernel |
248 | addresses or other sensitive things like canary values). | 271 | addresses or other sensitive things like canary values). |
249 | 272 | ||
250 | ### Unique identifiers | 273 | Unique identifiers |
274 | ------------------ | ||
251 | 275 | ||
252 | Kernel memory addresses must never be used as identifiers exposed to | 276 | Kernel memory addresses must never be used as identifiers exposed to |
253 | userspace. Instead, use an atomic counter, an idr, or similar unique | 277 | userspace. Instead, use an atomic counter, an idr, or similar unique |
254 | identifier. | 278 | identifier. |
255 | 279 | ||
256 | ### Memory initialization | 280 | Memory initialization |
281 | --------------------- | ||
257 | 282 | ||
258 | Memory copied to userspace must always be fully initialized. If not | 283 | Memory copied to userspace must always be fully initialized. If not |
259 | explicitly memset(), this will require changes to the compiler to make | 284 | explicitly memset(), this will require changes to the compiler to make |
260 | sure structure holes are cleared. | 285 | sure structure holes are cleared. |
261 | 286 | ||
262 | ### Memory poisoning | 287 | Memory poisoning |
288 | ---------------- | ||
263 | 289 | ||
264 | When releasing memory, it is best to poison the contents (clear stack on | 290 | When releasing memory, it is best to poison the contents (clear stack on |
265 | syscall return, wipe heap memory on a free), to avoid reuse attacks that | 291 | syscall return, wipe heap memory on a free), to avoid reuse attacks that |
@@ -267,9 +293,10 @@ rely on the old contents of memory. This frustrates many uninitialized | |||
267 | variable attacks, stack content exposures, heap content exposures, and | 293 | variable attacks, stack content exposures, heap content exposures, and |
268 | use-after-free attacks. | 294 | use-after-free attacks. |
269 | 295 | ||
270 | ### Destination tracking | 296 | Destination tracking |
297 | -------------------- | ||
271 | 298 | ||
272 | To help kill classes of bugs that result in kernel addresses being | 299 | To help kill classes of bugs that result in kernel addresses being |
273 | written to userspace, the destination of writes needs to be tracked. If | 300 | written to userspace, the destination of writes needs to be tracked. If |
274 | the buffer is destined for userspace (e.g. seq_file backed /proc files), | 301 | the buffer is destined for userspace (e.g. seq_file backed ``/proc`` files), |
275 | it should automatically censor sensitive values. | 302 | it should automatically censor sensitive values. |
diff --git a/Documentation/sh/conf.py b/Documentation/sh/conf.py new file mode 100644 index 000000000000..1eb684a13ac8 --- /dev/null +++ b/Documentation/sh/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = "SuperH architecture implementation manual" | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'sh.tex', project, | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/sh/index.rst b/Documentation/sh/index.rst new file mode 100644 index 000000000000..bc8db7ba894a --- /dev/null +++ b/Documentation/sh/index.rst | |||
@@ -0,0 +1,59 @@ | |||
1 | ======================= | ||
2 | SuperH Interfaces Guide | ||
3 | ======================= | ||
4 | |||
5 | :Author: Paul Mundt | ||
6 | |||
7 | Memory Management | ||
8 | ================= | ||
9 | |||
10 | SH-4 | ||
11 | ---- | ||
12 | |||
13 | Store Queue API | ||
14 | ~~~~~~~~~~~~~~~ | ||
15 | |||
16 | .. kernel-doc:: arch/sh/kernel/cpu/sh4/sq.c | ||
17 | :export: | ||
18 | |||
19 | SH-5 | ||
20 | ---- | ||
21 | |||
22 | TLB Interfaces | ||
23 | ~~~~~~~~~~~~~~ | ||
24 | |||
25 | .. kernel-doc:: arch/sh/mm/tlb-sh5.c | ||
26 | :internal: | ||
27 | |||
28 | .. kernel-doc:: arch/sh/include/asm/tlb_64.h | ||
29 | :internal: | ||
30 | |||
31 | Machine Specific Interfaces | ||
32 | =========================== | ||
33 | |||
34 | mach-dreamcast | ||
35 | -------------- | ||
36 | |||
37 | .. kernel-doc:: arch/sh/boards/mach-dreamcast/rtc.c | ||
38 | :internal: | ||
39 | |||
40 | mach-x3proto | ||
41 | ------------ | ||
42 | |||
43 | .. kernel-doc:: arch/sh/boards/mach-x3proto/ilsel.c | ||
44 | :export: | ||
45 | |||
46 | Busses | ||
47 | ====== | ||
48 | |||
49 | SuperHyway | ||
50 | ---------- | ||
51 | |||
52 | .. kernel-doc:: drivers/sh/superhyway/superhyway.c | ||
53 | :export: | ||
54 | |||
55 | Maple | ||
56 | ----- | ||
57 | |||
58 | .. kernel-doc:: drivers/sh/maple/maple.c | ||
59 | :export: | ||
diff --git a/Documentation/sound/conf.py b/Documentation/sound/conf.py new file mode 100644 index 000000000000..3f1fc5e74e7b --- /dev/null +++ b/Documentation/sound/conf.py | |||
@@ -0,0 +1,10 @@ | |||
1 | # -*- coding: utf-8; mode: python -*- | ||
2 | |||
3 | project = "Linux Sound Subsystem Documentation" | ||
4 | |||
5 | tags.add("subproject") | ||
6 | |||
7 | latex_documents = [ | ||
8 | ('index', 'sound.tex', project, | ||
9 | 'The kernel development community', 'manual'), | ||
10 | ] | ||
diff --git a/Documentation/sphinx/convert_template.sed b/Documentation/sphinx/convert_template.sed deleted file mode 100644 index c1503fcca4ec..000000000000 --- a/Documentation/sphinx/convert_template.sed +++ /dev/null | |||
@@ -1,18 +0,0 @@ | |||
1 | # | ||
2 | # Pandoc doesn't grok <function> or <structname>, so convert them | ||
3 | # ahead of time. | ||
4 | # | ||
5 | # Use the following escapes to pass through pandoc: | ||
6 | # $bq = "`" | ||
7 | # $lt = "<" | ||
8 | # $gt = ">" | ||
9 | # | ||
10 | s%<function>\([^<(]\+\)()</function>%:c:func:$bq\1()$bq%g | ||
11 | s%<function>\([^<(]\+\)</function>%:c:func:$bq\1()$bq%g | ||
12 | s%<structname>struct *\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g | ||
13 | s%struct <structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g | ||
14 | s%<structname>\([^<]\+\)</structname>%:c:type:$bqstruct \1 $lt\1$gt$bq%g | ||
15 | # | ||
16 | # Wrap docproc directives in para and code blocks. | ||
17 | # | ||
18 | s%^\(!.*\)$%<para><code>DOCPROC: \1</code></para>% | ||
diff --git a/Documentation/sphinx/post_convert.sed b/Documentation/sphinx/post_convert.sed deleted file mode 100644 index 392770bac53b..000000000000 --- a/Documentation/sphinx/post_convert.sed +++ /dev/null | |||
@@ -1,23 +0,0 @@ | |||
1 | # | ||
2 | # Unescape. | ||
3 | # | ||
4 | s/$bq/`/g | ||
5 | s/$lt/</g | ||
6 | s/$gt/>/g | ||
7 | # | ||
8 | # pandoc thinks that "_" needs to be escaped. Remove the extra | ||
9 | # backslashes. | ||
10 | # | ||
11 | s/\\_/_/g | ||
12 | # | ||
13 | # Unwrap docproc directives. | ||
14 | # | ||
15 | s/^``DOCPROC: !E\(.*\)``$/.. kernel-doc:: \1\n :export:/ | ||
16 | s/^``DOCPROC: !I\(.*\)``$/.. kernel-doc:: \1\n :internal:/ | ||
17 | s/^``DOCPROC: !F\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :functions: \2/ | ||
18 | s/^``DOCPROC: !P\([^ ]*\) \(.*\)``$/.. kernel-doc:: \1\n :doc: \2/ | ||
19 | s/^``DOCPROC: \(!.*\)``$/.. WARNING: DOCPROC directive not supported: \1/ | ||
20 | # | ||
21 | # Trim trailing whitespace. | ||
22 | # | ||
23 | s/[[:space:]]*$// | ||
diff --git a/Documentation/sphinx/tmplcvt b/Documentation/sphinx/tmplcvt deleted file mode 100755 index 6848f0a26fa5..000000000000 --- a/Documentation/sphinx/tmplcvt +++ /dev/null | |||
@@ -1,28 +0,0 @@ | |||
1 | #!/bin/bash | ||
2 | # | ||
3 | # Convert a template file into something like RST | ||
4 | # | ||
5 | # fix <function> | ||
6 | # feed to pandoc | ||
7 | # fix \_ | ||
8 | # title line? | ||
9 | # | ||
10 | set -eu | ||
11 | |||
12 | if [ "$#" != "2" ]; then | ||
13 | echo "$0 <docbook file> <rst file>" | ||
14 | exit 1 | ||
15 | fi | ||
16 | |||
17 | DIR=$(dirname $0) | ||
18 | |||
19 | in=$1 | ||
20 | rst=$2 | ||
21 | tmp=$rst.tmp | ||
22 | |||
23 | cp $in $tmp | ||
24 | sed --in-place -f $DIR/convert_template.sed $tmp | ||
25 | pandoc -s -S -f docbook -t rst -o $rst $tmp | ||
26 | sed --in-place -f $DIR/post_convert.sed $rst | ||
27 | rm $tmp | ||
28 | echo "book written to $rst" | ||
diff --git a/Documentation/translations/ja_JP/howto.rst b/Documentation/translations/ja_JP/howto.rst index 4511eed0fabb..8d7ed0cbbf5f 100644 --- a/Documentation/translations/ja_JP/howto.rst +++ b/Documentation/translations/ja_JP/howto.rst | |||
@@ -197,13 +197,6 @@ ReSTマークアップを使ったドキュメントは Documentation/outputに | |||
197 | make latexdocs | 197 | make latexdocs |
198 | make epubdocs | 198 | make epubdocs |
199 | 199 | ||
200 | Currently, some documents written in DocBook format are being converted to | ||
201 | ReST format. These documents are generated in the Documentation/DocBook directory; | ||
202 | to generate them in Postscript or man page format, do the following - :: | ||
203 | |||
204 | make psdocs | ||
205 | make mandocs | ||
206 | |||
207 | How to become a kernel developer | 200 | How to become a kernel developer |
208 | -------------------------------- | 201 | -------------------------------- |
209 | 202 | ||
diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 2333697251dd..624654bdcd8a 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst | |||
@@ -191,13 +191,6 @@ Documents that use ReST markup are generated in Documentation/output | |||
191 | make latexdocs | 191 | make latexdocs |
192 | make epubdocs | 192 | make epubdocs |
193 | 193 | ||
194 | Currently, there are documents written in DocBook that are being converted to ReST. Such | ||
195 | documents will be generated in the Documentation/DocBook/ directory and can also be | ||
196 | built as Postscript or man pages with the following commands:: | ||
197 | |||
198 | make psdocs | ||
199 | make mandocs | ||
200 | |||
201 | Becoming a kernel developer | 194 | Becoming a kernel developer |
202 | --------------------------- | 195 | --------------------------- |
203 | 196 | ||
@@ -270,15 +263,17 @@ can be found in the pub/linux/kernel/v4.x/ directory. The development process | |||
270 | The preferred method is to use git (the kernel's source management tool; more information | 263 | The preferred method is to use git (the kernel's source management tool; more information |
271 | is available at https://git-scm.com/), but sending plain | 264 | is available at https://git-scm.com/), but sending plain |
272 | patch files is also fine. | 265 | patch files is also fine. |
273 | - After two weeks, the -rc1 kernel is released, and from then on only patches that | 266 | - After two weeks, the -rc1 kernel is released, and from here on the focus is on making |
274 | do not add new features that could affect the stability of the whole kernel may be merged. | 267 | the new kernel as stable as possible. Most patches at this point should fix |
275 | Remember that an entirely new driver (or filesystem) may be accepted after -rc1, | ||
276 | because such a change is self-contained and the added code does not affect | ||
277 | other parts outside the driver, so it carries no risk of causing | ||
278 | regressions (translator's note: bugs that did not exist before but were introduced by new | 268 | regressions (translator's note: bugs that did not exist before but were introduced by new |
279 | features or changes). After -rc1 is released, patches can be sent to | 269 | features or changes). Bugs that have always existed are not regressions, so fixes for such |
280 | Linus using git, but the patches also need to be sent to the official | 270 | bugs should only be sent if they are important. Remember that an entirely new |
281 | mailing list for review. | 271 | driver (or filesystem) may be accepted after -rc1, |
272 | because such a change is self-contained and the added code does not affect other | ||
273 | parts outside the driver, so it carries no risk of causing regressions. | ||
274 | After -rc1 is released, patches can be sent to Linus using git, but the | ||
275 | patches also need to be sent to the official mailing list for | ||
276 | review. | ||
282 | - A new -rc is released whenever Linus judges that the current git tree is in a | 277 | - A new -rc is released whenever Linus judges that the current git tree is in a |
283 | sufficiently stable state for testing. The goal is to release a new -rc kernel | 278 | sufficiently stable state for testing. The goal is to release a new -rc kernel |
284 | every week. | 279 | every week. |
@@ -359,7 +354,7 @@ are listed at http://patchwork.ozlabs.org/. | |||
359 | Reporting bugs | 354 | Reporting bugs |
360 | -------------- | 355 | -------------- |
361 | 356 | ||
362 | https://bugzilla.kernel.org is where Linux kernel developers track kernel | 357 | https://bugzilla.kernel.org is where Linux kernel developers track kernel |
363 | bugs. Users are encouraged to use this tool to report all bugs they | 358 | bugs. Users are encouraged to use this tool to report all bugs they |
364 | find. For details on how to use the kernel bugzilla, see the following. | 359 | find. For details on how to use the kernel bugzilla, see the following. |
365 | 360 | ||
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt b/Documentation/translations/ko_KR/memory-barriers.txt index d05d4c54e8f7..c6f4ead76ce7 100644 --- a/Documentation/translations/ko_KR/memory-barriers.txt +++ b/Documentation/translations/ko_KR/memory-barriers.txt | |||
@@ -786,7 +786,7 @@ CPUs may see the load from b as having happened before the load from a | |||
786 | can transform the code above into the following: | 786 | can transform the code above into the following: |
787 | 787 | ||
788 | q = READ_ONCE(a); | 788 | q = READ_ONCE(a); |
789 | WRITE_ONCE(b, 1); | 789 | WRITE_ONCE(b, 2); |
790 | do_something_else(); | 790 | do_something_else(); |
791 | 791 | ||
792 | In that case, the CPU may change the order between the load from variable 'a' and the store to variable 'b' | 792 | In that case, the CPU may change the order between the load from variable 'a' and the store to variable 'b' |
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index a9d01b44a659..7b2eb1b7d4ca 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst | |||
@@ -16,6 +16,8 @@ place where this information is gathered. | |||
16 | .. toctree:: | 16 | .. toctree:: |
17 | :maxdepth: 2 | 17 | :maxdepth: 2 |
18 | 18 | ||
19 | no_new_privs | ||
20 | seccomp_filter | ||
19 | unshare | 21 | unshare |
20 | 22 | ||
21 | .. only:: subproject and html | 23 | .. only:: subproject and html |
diff --git a/Documentation/prctl/no_new_privs.txt b/Documentation/userspace-api/no_new_privs.rst index f7be84fba910..d060ea217ea1 100644 --- a/Documentation/prctl/no_new_privs.txt +++ b/Documentation/userspace-api/no_new_privs.rst | |||
@@ -1,3 +1,7 @@ | |||
1 | ====================== | ||
2 | No New Privileges Flag | ||
3 | ====================== | ||
4 | |||
1 | The execve system call can grant a newly-started program privileges that | 5 | The execve system call can grant a newly-started program privileges that |
2 | its parent did not have. The most obvious examples are setuid/setgid | 6 | its parent did not have. The most obvious examples are setuid/setgid |
3 | programs and file capabilities. To prevent the parent program from | 7 | programs and file capabilities. To prevent the parent program from |
@@ -5,53 +9,55 @@ gaining these privileges as well, the kernel and user code must be | |||
5 | careful to prevent the parent from doing anything that could subvert the | 9 | careful to prevent the parent from doing anything that could subvert the |
6 | child. For example: | 10 | child. For example: |
7 | 11 | ||
8 | - The dynamic loader handles LD_* environment variables differently if | 12 | - The dynamic loader handles ``LD_*`` environment variables differently if |
9 | a program is setuid. | 13 | a program is setuid. |
10 | 14 | ||
11 | - chroot is disallowed to unprivileged processes, since it would allow | 15 | - chroot is disallowed to unprivileged processes, since it would allow |
12 | /etc/passwd to be replaced from the point of view of a process that | 16 | ``/etc/passwd`` to be replaced from the point of view of a process that |
13 | inherited chroot. | 17 | inherited chroot. |
14 | 18 | ||
15 | - The exec code has special handling for ptrace. | 19 | - The exec code has special handling for ptrace. |
16 | 20 | ||
17 | These are all ad-hoc fixes. The no_new_privs bit (since Linux 3.5) is a | 21 | These are all ad-hoc fixes. The ``no_new_privs`` bit (since Linux 3.5) is a |
18 | new, generic mechanism to make it safe for a process to modify its | 22 | new, generic mechanism to make it safe for a process to modify its |
19 | execution environment in a manner that persists across execve. Any task | 23 | execution environment in a manner that persists across execve. Any task |
20 | can set no_new_privs. Once the bit is set, it is inherited across fork, | 24 | can set ``no_new_privs``. Once the bit is set, it is inherited across fork, |
21 | clone, and execve and cannot be unset. With no_new_privs set, execve | 25 | clone, and execve and cannot be unset. With ``no_new_privs`` set, ``execve()`` |
22 | promises not to grant the privilege to do anything that could not have | 26 | promises not to grant the privilege to do anything that could not have |
23 | been done without the execve call. For example, the setuid and setgid | 27 | been done without the execve call. For example, the setuid and setgid |
24 | bits will no longer change the uid or gid; file capabilities will not | 28 | bits will no longer change the uid or gid; file capabilities will not |
25 | add to the permitted set, and LSMs will not relax constraints after | 29 | add to the permitted set, and LSMs will not relax constraints after |
26 | execve. | 30 | execve. |
27 | 31 | ||
28 | To set no_new_privs, use prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0). | 32 | To set ``no_new_privs``, use:: |
33 | |||
34 | prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); | ||
29 | 35 | ||
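A minimal sketch of the flag's life cycle, assuming Linux 3.5 or later and glibc: a child sets `no_new_privs`, checks that `PR_SET_NO_NEW_PRIVS` with any argument other than 1 is rejected (so the bit cannot be cleared), and reads it back with `PR_GET_NO_NEW_PRIVS` in a grandchild to show that it survives `fork()`. The helpers `nnp_get`, `run_in_child`, `nnp_child` and `nnp_demo` are hypothetical names; the work runs in a forked child so the calling process keeps its own state.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: 1 if no_new_privs is set for the caller,
 * 0 if not, -1 on error. */
static int nnp_get(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/* Runs fn() in a forked child and returns its exit code, or -1. */
static int run_in_child(int (*fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn());

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Child side: set the bit, confirm it cannot be cleared, then report
 * what a grandchild created by fork() sees. */
static int nnp_child(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return 10;
    /* Only arg2 == 1 is accepted, so there is no way to unset the bit. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) != -1 || errno != EINVAL)
        return 11;
    return run_in_child(nnp_get);   /* 1 if the bit was inherited */
}

/* Returns 1 if the flag was set, could not be cleared, and was inherited. */
static int nnp_demo(void)
{
    return run_in_child(nnp_child);
}
```

`nnp_demo()` is expected to return 1; any other value points at the step that behaved differently.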
30 | Be careful, though: LSMs might also not tighten constraints on exec | 36 | Be careful, though: LSMs might also not tighten constraints on exec |
31 | in no_new_privs mode. (This means that setting up a general-purpose | 37 | in ``no_new_privs`` mode. (This means that setting up a general-purpose |
32 | service launcher to set no_new_privs before execing daemons may | 38 | service launcher to set ``no_new_privs`` before execing daemons may |
33 | interfere with LSM-based sandboxing.) | 39 | interfere with LSM-based sandboxing.) |
34 | 40 | ||
35 | Note that no_new_privs does not prevent privilege changes that do not | 41 | Note that ``no_new_privs`` does not prevent privilege changes that do not |
36 | involve execve. An appropriately privileged task can still call | 42 | involve ``execve()``. An appropriately privileged task can still call |
37 | setuid(2) and receive SCM_RIGHTS datagrams. | 43 | ``setuid(2)`` and receive SCM_RIGHTS datagrams. |
38 | 44 | ||
39 | There are two main use cases for no_new_privs so far: | 45 | There are two main use cases for ``no_new_privs`` so far: |
40 | 46 | ||
41 | - Filters installed for the seccomp mode 2 sandbox persist across | 47 | - Filters installed for the seccomp mode 2 sandbox persist across |
42 | execve and can change the behavior of newly-executed programs. | 48 | execve and can change the behavior of newly-executed programs. |
43 | Unprivileged users are therefore only allowed to install such filters | 49 | Unprivileged users are therefore only allowed to install such filters |
44 | if no_new_privs is set. | 50 | if ``no_new_privs`` is set. |
45 | 51 | ||
46 | - By itself, no_new_privs can be used to reduce the attack surface | 52 | - By itself, ``no_new_privs`` can be used to reduce the attack surface |
47 | available to an unprivileged user. If everything running with a | 53 | available to an unprivileged user. If everything running with a |
48 | given uid has no_new_privs set, then that uid will be unable to | 54 | given uid has ``no_new_privs`` set, then that uid will be unable to |
49 | escalate its privileges by directly attacking setuid, setgid, and | 55 | escalate its privileges by directly attacking setuid, setgid, and |
50 | fcap-using binaries; it will need to compromise something without the | 56 | fcap-using binaries; it will need to compromise something without the |
51 | no_new_privs bit set first. | 57 | ``no_new_privs`` bit set first. |
52 | 58 | ||
53 | In the future, other potentially dangerous kernel features could become | 59 | In the future, other potentially dangerous kernel features could become |
54 | available to unprivileged tasks if no_new_privs is set. In principle, | 60 | available to unprivileged tasks if ``no_new_privs`` is set. In principle, |
55 | several options to unshare(2) and clone(2) would be safe when | 61 | several options to ``unshare(2)`` and ``clone(2)`` would be safe when |
56 | no_new_privs is set, and no_new_privs + chroot is considerably less | 62 | ``no_new_privs`` is set, and ``no_new_privs`` + ``chroot`` is considerably less |
57 | dangerous than chroot by itself. | 63 | dangerous than chroot by itself. |
diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/userspace-api/seccomp_filter.rst index 1e469ef75778..f71eb5ef1f2d 100644 --- a/Documentation/prctl/seccomp_filter.txt +++ b/Documentation/userspace-api/seccomp_filter.rst | |||
@@ -1,8 +1,9 @@ | |||
1 | SECure COMPuting with filters | 1 | =========================================== |
2 | ============================= | 2 | Seccomp BPF (SECure COMPuting with filters) |
3 | =========================================== | ||
3 | 4 | ||
4 | Introduction | 5 | Introduction |
5 | ------------ | 6 | ============ |
6 | 7 | ||
7 | A large number of system calls are exposed to every userland process | 8 | A large number of system calls are exposed to every userland process |
8 | with many of them going unused for the entire lifetime of the process. | 9 | with many of them going unused for the entire lifetime of the process. |
@@ -27,7 +28,7 @@ pointers which constrains all filters to solely evaluating the system | |||
27 | call arguments directly. | 28 | call arguments directly. |
28 | 29 | ||
29 | What it isn't | 30 | What it isn't |
30 | ------------- | 31 | ============= |
31 | 32 | ||
32 | System call filtering isn't a sandbox. It provides a clearly defined | 33 | System call filtering isn't a sandbox. It provides a clearly defined |
33 | mechanism for minimizing the exposed kernel surface. It is meant to be | 34 | mechanism for minimizing the exposed kernel surface. It is meant to be |
@@ -40,13 +41,13 @@ system calls in socketcall() is allowed, for instance) which could be | |||
40 | construed, incorrectly, as a more complete sandboxing solution. | 41 | construed, incorrectly, as a more complete sandboxing solution. |
41 | 42 | ||
42 | Usage | 43 | Usage |
43 | ----- | 44 | ===== |
44 | 45 | ||
45 | An additional seccomp mode is added and is enabled using the same | 46 | An additional seccomp mode is added and is enabled using the same |
46 | prctl(2) call as the strict seccomp. If the architecture has | 47 | prctl(2) call as the strict seccomp. If the architecture has |
47 | CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below: | 48 | ``CONFIG_HAVE_ARCH_SECCOMP_FILTER``, then filters may be added as below: |
48 | 49 | ||
49 | PR_SET_SECCOMP: | 50 | ``PR_SET_SECCOMP``: |
50 | Now takes an additional argument which specifies a new filter | 51 | Now takes an additional argument which specifies a new filter |
51 | using a BPF program. | 52 | using a BPF program. |
52 | The BPF program will be executed over struct seccomp_data | 53 | The BPF program will be executed over struct seccomp_data |
@@ -55,24 +56,25 @@ PR_SET_SECCOMP: | |||
55 | acceptable values to inform the kernel which action should be | 56 | acceptable values to inform the kernel which action should be |
56 | taken. | 57 | taken. |
57 | 58 | ||
58 | Usage: | 59 | Usage:: |
60 | |||
59 | prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog); | 61 | prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog); |
60 | 62 | ||
61 | The 'prog' argument is a pointer to a struct sock_fprog which | 63 | The 'prog' argument is a pointer to a struct sock_fprog which |
62 | will contain the filter program. If the program is invalid, the | 64 | will contain the filter program. If the program is invalid, the |
63 | call will return -1 and set errno to EINVAL. | 65 | call will return -1 and set errno to ``EINVAL``. |
64 | 66 | ||
65 | If fork/clone and execve are allowed by @prog, any child | 67 | If ``fork``/``clone`` and ``execve`` are allowed by @prog, any child |
66 | processes will be constrained to the same filters and system | 68 | processes will be constrained to the same filters and system |
67 | call ABI as the parent. | 69 | call ABI as the parent. |
68 | 70 | ||
69 | Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or | 71 | Prior to use, the task must call ``prctl(PR_SET_NO_NEW_PRIVS, 1)`` or |
70 | run with CAP_SYS_ADMIN privileges in its namespace. If these are not | 72 | run with ``CAP_SYS_ADMIN`` privileges in its namespace. If these are not |
71 | true, -EACCES will be returned. This requirement ensures that filter | 73 | true, ``-EACCES`` will be returned. This requirement ensures that filter |
72 | programs cannot be applied to child processes with greater privileges | 74 | programs cannot be applied to child processes with greater privileges |
73 | than the task that installed them. | 75 | than the task that installed them. |
74 | 76 | ||
75 | Additionally, if prctl(2) is allowed by the attached filter, | 77 | Additionally, if ``prctl(2)`` is allowed by the attached filter, |
76 | additional filters may be layered on which will increase evaluation | 78 | additional filters may be layered on which will increase evaluation |
77 | time, but allow for further decreasing the attack surface during | 79 | time, but allow for further decreasing the attack surface during |
78 | execution of a process. | 80 | execution of a process. |
@@ -80,51 +82,52 @@ PR_SET_SECCOMP: | |||
80 | The above call returns 0 on success and non-zero on error. | 82 | The above call returns 0 on success and non-zero on error. |
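To make the `PR_SET_SECCOMP` usage above concrete, here is a sketch assuming x86-64 or arm64 Linux with the kernel uapi headers installed. It installs an allow-everything filter that first checks `seccomp_data.arch` (as the Pitfalls section recommends), and it also shows that an empty program is rejected with `EINVAL`. `ARCH_NR`, `install_allow_all`, `usage_child` and `seccomp_usage_demo` are hypothetical names; the steps run in a forked child because a filter cannot be removed once installed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: this sketch only covers x86-64 and arm64. */
#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Installs a filter that kills the task on an unexpected ABI and
 * otherwise allows every system call. */
static int install_allow_all(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Child side; the exit code identifies the first step that misbehaved. */
static int usage_child(void)
{
    struct sock_fprog empty = { .len = 0, .filter = NULL };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return 1;
    /* An empty program is invalid: expect -1 with errno set to EINVAL. */
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &empty) != -1 || errno != EINVAL)
        return 2;
    if (install_allow_all() != 0)
        return 3;
    return 0;
}

/* Returns 0 when every step behaved as described, else 1..3 or -1. */
static int seccomp_usage_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(usage_child());

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```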
81 | 83 | ||
82 | Return values | 84 | Return values |
83 | ------------- | 85 | ============= |
86 | |||
84 | A seccomp filter may return any of the following values. If multiple | 87 | A seccomp filter may return any of the following values. If multiple |
85 | filters exist, the return value for the evaluation of a given system | 88 | filters exist, the return value for the evaluation of a given system |
86 | call will always use the highest precedence value. (For example, | 89 | call will always use the highest precedence value. (For example, |
87 | SECCOMP_RET_KILL will always take precedence.) | 90 | ``SECCOMP_RET_KILL`` will always take precedence.) |
88 | 91 | ||
89 | In precedence order, they are: | 92 | In precedence order, they are: |
90 | 93 | ||
91 | SECCOMP_RET_KILL: | 94 | ``SECCOMP_RET_KILL``: |
92 | Results in the task exiting immediately without executing the | 95 | Results in the task exiting immediately without executing the |
93 | system call. The exit status of the task (status & 0x7f) will | 96 | system call. The exit status of the task (``status & 0x7f``) will |
94 | be SIGSYS, not SIGKILL. | 97 | be ``SIGSYS``, not ``SIGKILL``. |
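A sketch of the exit-status behavior, under the same assumptions (x86-64 or arm64, kernel uapi headers, hypothetical helper names): a forked child installs a filter that returns `SECCOMP_RET_KILL` for `getppid(2)` and then calls it, and the parent inspects the wait status. `WTERMSIG()` extracts `status & 0x7f`, which should be `SIGSYS` rather than `SIGKILL`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Returns the signal that terminated the filtered child, or -1 if the
 * child was not killed by a signal. */
static int kill_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        struct rlimit no_core = { 0, 0 };

        setrlimit(RLIMIT_CORE, &no_core);   /* best effort: no core file */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        syscall(__NR_getppid);  /* the filter kills the child here */
        _exit(3);               /* only reached if the filter did not fire */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```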
95 | 98 | ||
96 | SECCOMP_RET_TRAP: | 99 | ``SECCOMP_RET_TRAP``: |
97 | Results in the kernel sending a SIGSYS signal to the triggering | 100 | Results in the kernel sending a ``SIGSYS`` signal to the triggering |
98 | task without executing the system call. siginfo->si_call_addr | 101 | task without executing the system call. ``siginfo->si_call_addr`` |
99 | will show the address of the system call instruction, and | 102 | will show the address of the system call instruction, and |
100 | siginfo->si_syscall and siginfo->si_arch will indicate which | 103 | ``siginfo->si_syscall`` and ``siginfo->si_arch`` will indicate which |
101 | syscall was attempted. The program counter will be as though | 104 | syscall was attempted. The program counter will be as though |
102 | the syscall happened (i.e. it will not point to the syscall | 105 | the syscall happened (i.e. it will not point to the syscall |
103 | instruction). The return value register will contain an arch- | 106 | instruction). The return value register will contain an arch- |
104 | dependent value -- if resuming execution, set it to something | 107 | dependent value -- if resuming execution, set it to something |
105 | sensible. (The architecture dependency is because replacing | 108 | sensible. (The architecture dependency is because replacing |
106 | it with -ENOSYS could overwrite some useful information.) | 109 | it with ``-ENOSYS`` could overwrite some useful information.) |
107 | 110 | ||
108 | The SECCOMP_RET_DATA portion of the return value will be passed | 111 | The ``SECCOMP_RET_DATA`` portion of the return value will be passed |
109 | as si_errno. | 112 | as ``si_errno``. |
110 | 113 | ||
111 | SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP. | 114 | ``SIGSYS`` triggered by seccomp will have a si_code of ``SYS_SECCOMP``. |
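The `SECCOMP_RET_TRAP` path can be observed with a `SIGSYS` handler installed with `SA_SIGINFO`. This sketch assumes x86-64 or arm64 and a libc that exposes the `si_syscall` accessor (glibc does). It checks `si_code` and `si_syscall` inside the handler and reports the result through `_exit()`, which is async-signal-safe, so it never tries to resume after the trap. The fallback `SYS_SECCOMP` value of 1 mirrors the kernel's uapi header and is only used if libc does not define it; `on_sigsys` and `trap_demo` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1   /* si_code value from the kernel's uapi siginfo header */
#endif

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Exit 0 only if seccomp raised SIGSYS for getppid(2). */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    _exit(info->si_code == SYS_SECCOMP && info->si_syscall == __NR_getppid ? 0 : 1);
}

/* Returns 0 if the trap was delivered as described, else a step code or -1. */
static int trap_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSYS, &sa, NULL) != 0)
            _exit(2);

        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(3);
        syscall(__NR_getppid);  /* raises SIGSYS instead of running */
        _exit(4);               /* reached only if no trap happened */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```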
112 | 115 | ||
113 | SECCOMP_RET_ERRNO: | 116 | ``SECCOMP_RET_ERRNO``: |
114 | Results in the lower 16-bits of the return value being passed | 117 | Results in the lower 16-bits of the return value being passed |
115 | to userland as the errno without executing the system call. | 118 | to userland as the errno without executing the system call. |
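A sketch of `SECCOMP_RET_ERRNO`, under the same assumptions (x86-64 or arm64, kernel uapi headers): the filter answers `getppid(2)` with `SECCOMP_RET_ERRNO | err`, masking `err` with `SECCOMP_RET_DATA` (the lower 16 bits), and a forked child reports the `errno` it sees through its exit status. `errno_demo` is a hypothetical helper; because exit statuses are 8 bits wide, it is only meaningful for `errno` values up to 254.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Returns the errno a filtered child saw from getppid(2): 0 if the call
 * succeeded, 255 on setup failure, -1 on fork/wait failure. */
static int errno_demo(int err)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
            /* Only the SECCOMP_RET_DATA bits reach userland as errno. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(255);
        long r = syscall(__NR_getppid);  /* not executed; fails with err */
        _exit(r == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```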
116 | 119 | ||
117 | SECCOMP_RET_TRACE: | 120 | ``SECCOMP_RET_TRACE``: |
118 | When returned, this value will cause the kernel to attempt to | 121 | When returned, this value will cause the kernel to attempt to |
119 | notify a ptrace()-based tracer prior to executing the system | 122 | notify a ``ptrace()``-based tracer prior to executing the system |
120 | call. If there is no tracer present, -ENOSYS is returned to | 123 | call. If there is no tracer present, ``-ENOSYS`` is returned to |
121 | userland and the system call is not executed. | 124 | userland and the system call is not executed. |
122 | 125 | ||
123 | A tracer will be notified if it requests PTRACE_O_TRACESECCOMP | 126 | A tracer will be notified if it requests ``PTRACE_O_TRACESECCOMP`` |
124 | using ptrace(PTRACE_SETOPTIONS). The tracer will be notified | 127 | using ``ptrace(PTRACE_SETOPTIONS)``. The tracer will be notified |
125 | of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of | 128 | of a ``PTRACE_EVENT_SECCOMP`` and the ``SECCOMP_RET_DATA`` portion of |
126 | the BPF program return value will be available to the tracer | 129 | the BPF program return value will be available to the tracer |
127 | via PTRACE_GETEVENTMSG. | 130 | via ``PTRACE_GETEVENTMSG``. |
128 | 131 | ||
129 | The tracer can skip the system call by changing the syscall number | 132 | The tracer can skip the system call by changing the syscall number |
130 | to -1. Alternatively, the tracer can change the system call | 133 | to -1. Alternatively, the tracer can change the system call |
@@ -138,19 +141,19 @@ SECCOMP_RET_TRACE: | |||
138 | allow use of ptrace, even of other sandboxed processes, without | 141 | allow use of ptrace, even of other sandboxed processes, without |
139 | extreme care; ptracers can use this mechanism to escape.) | 142 | extreme care; ptracers can use this mechanism to escape.) |
140 | 143 | ||
141 | SECCOMP_RET_ALLOW: | 144 | ``SECCOMP_RET_ALLOW``: |
142 | Results in the system call being executed. | 145 | Results in the system call being executed. |
143 | 146 | ||
144 | If multiple filters exist, the return value for the evaluation of a | 147 | If multiple filters exist, the return value for the evaluation of a |
145 | given system call will always use the highest precedence value. | 148 | given system call will always use the highest precedence value. |
146 | 149 | ||
147 | Precedence is only determined using the SECCOMP_RET_ACTION mask. When | 150 | Precedence is only determined using the ``SECCOMP_RET_ACTION`` mask. When |
148 | multiple filters return values of the same precedence, only the | 151 | multiple filters return values of the same precedence, only the |
149 | SECCOMP_RET_DATA from the most recently installed filter will be | 152 | ``SECCOMP_RET_DATA`` from the most recently installed filter will be |
150 | returned. | 153 | returned. |
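The precedence rule can be checked by stacking two filters that return the same action with different data. In this sketch (x86-64 or arm64 assumed; `install_errno_filter` and `stacked_demo` are hypothetical names), both filters return `SECCOMP_RET_ERRNO` for `getppid(2)`, so the `errno` value should come from the filter installed last. The first filter must still allow `prctl(2)`, otherwise the second install would itself be filtered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define ARCH_NR AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Adds a filter that fails getppid(2) with `err` and allows the rest. */
static int install_errno_filter(int err)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Installs `first`, then `second`, in a child; returns the errno that
 * getppid(2) reported there (255 on setup failure, -1 on fork/wait error). */
static int stacked_demo(int first, int second)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            install_errno_filter(first) != 0 ||
            install_errno_filter(second) != 0)
            _exit(255);
        long r = syscall(__NR_getppid);
        _exit(r == -1 ? errno : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With equal actions, `stacked_demo(EPERM, EACCES)` should yield `EACCES`, the data of the most recently installed filter.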
151 | 154 | ||
152 | Pitfalls | 155 | Pitfalls |
153 | -------- | 156 | ======== |
154 | 157 | ||
155 | The biggest pitfall to avoid during use is filtering on system call | 158 | The biggest pitfall to avoid during use is filtering on system call |
156 | number without checking the architecture value. Why? On any | 159 | number without checking the architecture value. Why? On any |
@@ -160,39 +163,40 @@ the numbers in the different calling conventions overlap, then checks in | |||
160 | the filters may be abused. Always check the arch value! | 163 | the filters may be abused. Always check the arch value! |
161 | 164 | ||
162 | Example | 165 | Example |
163 | ------- | 166 | ======= |
164 | 167 | ||
165 | The samples/seccomp/ directory contains both an x86-specific example | 168 | The ``samples/seccomp/`` directory contains both an x86-specific example |
166 | and a more generic example of a higher level macro interface for BPF | 169 | and a more generic example of a higher level macro interface for BPF |
167 | program generation. | 170 | program generation. |
168 | 171 | ||
169 | 172 | ||
170 | 173 | ||
171 | Adding architecture support | 174 | Adding architecture support |
172 | ----------------------- | 175 | =========================== |
173 | 176 | ||
174 | See arch/Kconfig for the authoritative requirements. In general, if an | 177 | See ``arch/Kconfig`` for the authoritative requirements. In general, if an |
175 | architecture supports both ptrace_event and seccomp, it will be able to | 178 | architecture supports both ptrace_event and seccomp, it will be able to |
176 | support seccomp filter with minor fixup: SIGSYS support and seccomp return | 179 | support seccomp filter with minor fixup: ``SIGSYS`` support and seccomp return |
177 | value checking. Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER | 180 | value checking. Then it must just add ``CONFIG_HAVE_ARCH_SECCOMP_FILTER`` |
178 | to its arch-specific Kconfig. | 181 | to its arch-specific Kconfig. |
179 | 182 | ||
180 | 183 | ||
181 | 184 | ||
182 | Caveats | 185 | Caveats |
183 | ------- | 186 | ======= |
184 | 187 | ||
185 | The vDSO can cause some system calls to run entirely in userspace, | 188 | The vDSO can cause some system calls to run entirely in userspace, |
186 | leading to surprises when you run programs on different machines that | 189 | leading to surprises when you run programs on different machines that |
187 | fall back to real syscalls. To minimize these surprises on x86, make | 190 | fall back to real syscalls. To minimize these surprises on x86, make |
188 | sure you test with | 191 | sure you test with |
189 | /sys/devices/system/clocksource/clocksource0/current_clocksource set to | 192 | ``/sys/devices/system/clocksource/clocksource0/current_clocksource`` set to |
190 | something like acpi_pm. | 193 | something like ``acpi_pm``. |
191 | 194 | ||
192 | On x86-64, vsyscall emulation is enabled by default. (vsyscalls are | 195 | On x86-64, vsyscall emulation is enabled by default. (vsyscalls are |
193 | legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccomp, with a few oddities: | 196 | legacy variants on vDSO calls.) Currently, emulated vsyscalls will |
197 | honor seccomp, with a few oddities: | ||
194 | 198 | ||
195 | - A return value of SECCOMP_RET_TRAP will set a si_call_addr pointing to | 199 | - A return value of ``SECCOMP_RET_TRAP`` will set a ``si_call_addr`` pointing to |
196 | the vsyscall entry for the given call and not the address after the | 200 | the vsyscall entry for the given call and not the address after the |
197 | 'syscall' instruction. Any code which wants to restart the call | 201 | 'syscall' instruction. Any code which wants to restart the call |
198 | should be aware that (a) a ret instruction has been emulated and (b) | 202 | should be aware that (a) a ret instruction has been emulated and (b) |
@@ -200,7 +204,7 @@ legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccom | |||
200 | emulation security checks, making resuming the syscall mostly | 204 | emulation security checks, making resuming the syscall mostly |
201 | pointless. | 205 | pointless. |
202 | 206 | ||
203 | - A return value of SECCOMP_RET_TRACE will signal the tracer as usual, | 207 | - A return value of ``SECCOMP_RET_TRACE`` will signal the tracer as usual, |
204 | but the syscall may not be changed to another system call using the | 208 | but the syscall may not be changed to another system call using the |
205 | orig_rax register. It may only be changed to -1 in order to skip the | 209 | orig_rax register. It may only be changed to -1 in order to skip the |
206 | currently emulated call. Any other change MAY terminate the process. | 210 | currently emulated call. Any other change MAY terminate the process. |
@@ -209,14 +213,14 @@ legacy variants on vDSO calls.) Currently, emulated vsyscalls will honor seccom | |||
209 | rip or rsp. (Do not rely on other changes terminating the process. | 213 | rip or rsp. (Do not rely on other changes terminating the process. |
210 | They might work. For example, on some kernels, choosing a syscall | 214 | They might work. For example, on some kernels, choosing a syscall |
211 | that only exists in future kernels will be correctly emulated (by | 215 | that only exists in future kernels will be correctly emulated (by |
212 | returning -ENOSYS). | 216 | returning ``-ENOSYS``). |
213 | 217 | ||
214 | To detect this quirky behavior, check for addr & ~0x0C00 == | 218 | To detect this quirky behavior, check for ``addr & ~0x0C00 == |
215 | 0xFFFFFFFFFF600000. (For SECCOMP_RET_TRACE, use rip. For | 219 | 0xFFFFFFFFFF600000``. (For ``SECCOMP_RET_TRACE``, use rip. For |
216 | SECCOMP_RET_TRAP, use siginfo->si_call_addr.) Do not check any other | 220 | ``SECCOMP_RET_TRAP``, use ``siginfo->si_call_addr``.) Do not check any other |
217 | condition: future kernels may improve vsyscall emulation and current | 221 | condition: future kernels may improve vsyscall emulation and current |
218 | kernels in vsyscall=native mode will behave differently, but the | 222 | kernels in vsyscall=native mode will behave differently, but the |
219 | instructions at 0xF...F600{0,4,8,C}00 will not be system calls in these | 223 | instructions at ``0xF...F600{0,4,8,C}00`` will not be system calls in these |
220 | cases. | 224 | cases. |
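The address test above is easy to get wrong in C, because `==` binds more tightly than `&`: written without parentheses, `addr & ~0x0C00 == 0xFFFFFFFFFF600000` compares first and masks afterwards. A minimal sketch of the check with explicit parentheses follows; `is_vsyscall_addr` is a hypothetical name, and the caller is assumed to pass `rip` (for `SECCOMP_RET_TRACE`) or `siginfo->si_call_addr` (for `SECCOMP_RET_TRAP`) as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base of the legacy x86-64 vsyscall page. */
#define VSYSCALL_BASE UINT64_C(0xFFFFFFFFFF600000)

/* True if addr is one of the entry points 0xF...F600{0,4,8,C}00.
 * Masking with ~0x0C00 clears bits 10 and 11, the only bits that
 * differ between those entries. */
static bool is_vsyscall_addr(uint64_t addr)
{
    return (addr & ~UINT64_C(0x0C00)) == VSYSCALL_BASE;
}
```

For example, `0xFFFFFFFFFF600400` matches, while `0xFFFFFFFFFF600100` and any ordinary user-space address do not.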
221 | 225 | ||
222 | Note that modern systems are unlikely to use vsyscalls at all -- they | 226 | Note that modern systems are unlikely to use vsyscalls at all -- they |
diff --git a/Documentation/userspace-api/unshare.rst b/Documentation/userspace-api/unshare.rst index 737c192cf4e7..877e90a35238 100644 --- a/Documentation/userspace-api/unshare.rst +++ b/Documentation/userspace-api/unshare.rst | |||
@@ -107,7 +107,7 @@ the benefits of this new feature can exceed its cost. | |||
107 | 107 | ||
108 | unshare() reverses sharing that was done using clone(2) system call, | 108 | unshare() reverses sharing that was done using clone(2) system call, |
109 | so unshare() should have a similar interface to clone(2). That is, | 109 | so unshare() should have a similar interface to clone(2). That is, |
110 | since flags in clone(int flags, void *stack) specify what should | 110 | since flags in clone(int flags, void \*stack) specify what should |
111 | be shared, similar flags in unshare(int flags) should specify | 111 | be shared, similar flags in unshare(int flags) should specify |
112 | what should be unshared. Unfortunately, this may appear to invert | 112 | what should be unshared. Unfortunately, this may appear to invert |
113 | the meaning of the flags from the way they are used in clone(2). | 113 | the meaning of the flags from the way they are used in clone(2). |
diff --git a/MAINTAINERS b/MAINTAINERS index ba64d98e7897..867366bb67f1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS | |||
@@ -3597,7 +3597,6 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git | |||
3597 | S: Maintained | 3597 | S: Maintained |
3598 | F: Documentation/crypto/ | 3598 | F: Documentation/crypto/ |
3599 | F: Documentation/devicetree/bindings/crypto/ | 3599 | F: Documentation/devicetree/bindings/crypto/ |
3600 | F: Documentation/DocBook/crypto-API.tmpl | ||
3601 | F: arch/*/crypto/ | 3600 | F: arch/*/crypto/ |
3602 | F: crypto/ | 3601 | F: crypto/ |
3603 | F: drivers/crypto/ | 3602 | F: drivers/crypto/ |
@@ -4154,8 +4153,7 @@ M: Jonathan Corbet <corbet@lwn.net> | |||
4154 | L: linux-doc@vger.kernel.org | 4153 | L: linux-doc@vger.kernel.org |
4155 | S: Maintained | 4154 | S: Maintained |
4156 | F: Documentation/ | 4155 | F: Documentation/ |
4157 | F: scripts/docproc.c | 4156 | F: scripts/kernel-doc |
4158 | F: scripts/kernel-doc* | ||
4159 | X: Documentation/ABI/ | 4157 | X: Documentation/ABI/ |
4160 | X: Documentation/devicetree/ | 4158 | X: Documentation/devicetree/ |
4161 | X: Documentation/acpi | 4159 | X: Documentation/acpi |
@@ -7366,7 +7364,7 @@ KEYS/KEYRINGS: | |||
7366 | M: David Howells <dhowells@redhat.com> | 7364 | M: David Howells <dhowells@redhat.com> |
7367 | L: keyrings@vger.kernel.org | 7365 | L: keyrings@vger.kernel.org |
7368 | S: Maintained | 7366 | S: Maintained |
7369 | F: Documentation/security/keys.txt | 7367 | F: Documentation/security/keys/core.rst |
7370 | F: include/linux/key.h | 7368 | F: include/linux/key.h |
7371 | F: include/linux/key-type.h | 7369 | F: include/linux/key-type.h |
7372 | F: include/linux/keyctl.h | 7370 | F: include/linux/keyctl.h |
@@ -7380,7 +7378,7 @@ M: Mimi Zohar <zohar@linux.vnet.ibm.com> | |||
7380 | L: linux-security-module@vger.kernel.org | 7378 | L: linux-security-module@vger.kernel.org |
7381 | L: keyrings@vger.kernel.org | 7379 | L: keyrings@vger.kernel.org |
7382 | S: Supported | 7380 | S: Supported |
7383 | F: Documentation/security/keys-trusted-encrypted.txt | 7381 | F: Documentation/security/keys/trusted-encrypted.rst |
7384 | F: include/keys/trusted-type.h | 7382 | F: include/keys/trusted-type.h |
7385 | F: security/keys/trusted.c | 7383 | F: security/keys/trusted.c |
7386 | F: security/keys/trusted.h | 7384 | F: security/keys/trusted.h |
@@ -7391,7 +7389,7 @@ M: David Safford <safford@us.ibm.com> | |||
7391 | L: linux-security-module@vger.kernel.org | 7389 | L: linux-security-module@vger.kernel.org |
7392 | L: keyrings@vger.kernel.org | 7390 | L: keyrings@vger.kernel.org |
7393 | S: Supported | 7391 | S: Supported |
7394 | F: Documentation/security/keys-trusted-encrypted.txt | 7392 | F: Documentation/security/keys/trusted-encrypted.rst |
7395 | F: include/keys/encrypted-type.h | 7393 | F: include/keys/encrypted-type.h |
7396 | F: security/keys/encrypted-keys/ | 7394 | F: security/keys/encrypted-keys/ |
7397 | 7395 | ||
@@ -7401,7 +7399,7 @@ W: http://kgdb.wiki.kernel.org/ | |||
7401 | L: kgdb-bugreport@lists.sourceforge.net | 7399 | L: kgdb-bugreport@lists.sourceforge.net |
7402 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git | 7400 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git |
7403 | S: Maintained | 7401 | S: Maintained |
7404 | F: Documentation/DocBook/kgdb.tmpl | 7402 | F: Documentation/dev-tools/kgdb.rst |
7405 | F: drivers/misc/kgdbts.c | 7403 | F: drivers/misc/kgdbts.c |
7406 | F: drivers/tty/serial/kgdboc.c | 7404 | F: drivers/tty/serial/kgdboc.c |
7407 | F: include/linux/kdb.h | 7405 | F: include/linux/kdb.h |
@@ -11020,7 +11018,7 @@ S: Supported | |||
11020 | F: arch/s390/ | 11018 | F: arch/s390/ |
11021 | F: drivers/s390/ | 11019 | F: drivers/s390/ |
11022 | F: Documentation/s390/ | 11020 | F: Documentation/s390/ |
11023 | F: Documentation/DocBook/s390* | 11021 | F: Documentation/driver-api/s390-drivers.rst |
11024 | 11022 | ||
11025 | S390 COMMON I/O LAYER | 11023 | S390 COMMON I/O LAYER |
11026 | M: Sebastian Ott <sebott@linux.vnet.ibm.com> | 11024 | M: Sebastian Ott <sebott@linux.vnet.ibm.com> |
@@ -11524,6 +11522,7 @@ F: kernel/seccomp.c | |||
11524 | F: include/uapi/linux/seccomp.h | 11522 | F: include/uapi/linux/seccomp.h |
11525 | F: include/linux/seccomp.h | 11523 | F: include/linux/seccomp.h |
11526 | F: tools/testing/selftests/seccomp/* | 11524 | F: tools/testing/selftests/seccomp/* |
11525 | F: Documentation/userspace-api/seccomp_filter.rst | ||
11527 | K: \bsecure_computing | 11526 | K: \bsecure_computing |
11528 | K: \bTIF_SECCOMP\b | 11527 | K: \bTIF_SECCOMP\b |
11529 | 11528 | ||
@@ -11582,6 +11581,7 @@ S: Supported | |||
11582 | F: include/linux/selinux* | 11581 | F: include/linux/selinux* |
11583 | F: security/selinux/ | 11582 | F: security/selinux/ |
11584 | F: scripts/selinux/ | 11583 | F: scripts/selinux/ |
11584 | F: Documentation/admin-guide/LSM/SELinux.rst | ||
11585 | 11585 | ||
11586 | APPARMOR SECURITY MODULE | 11586 | APPARMOR SECURITY MODULE |
11587 | M: John Johansen <john.johansen@canonical.com> | 11587 | M: John Johansen <john.johansen@canonical.com> |
@@ -11590,18 +11590,21 @@ W: apparmor.wiki.kernel.org | |||
11590 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git | 11590 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/jj/apparmor-dev.git |
11591 | S: Supported | 11591 | S: Supported |
11592 | F: security/apparmor/ | 11592 | F: security/apparmor/ |
11593 | F: Documentation/admin-guide/LSM/apparmor.rst | ||
11593 | 11594 | ||
11594 | LOADPIN SECURITY MODULE | 11595 | LOADPIN SECURITY MODULE |
11595 | M: Kees Cook <keescook@chromium.org> | 11596 | M: Kees Cook <keescook@chromium.org> |
11596 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git lsm/loadpin | 11597 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git lsm/loadpin |
11597 | S: Supported | 11598 | S: Supported |
11598 | F: security/loadpin/ | 11599 | F: security/loadpin/ |
11600 | F: Documentation/admin-guide/LSM/LoadPin.rst | ||
11599 | 11601 | ||
11600 | YAMA SECURITY MODULE | 11602 | YAMA SECURITY MODULE |
11601 | M: Kees Cook <keescook@chromium.org> | 11603 | M: Kees Cook <keescook@chromium.org> |
11602 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip | 11604 | T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git yama/tip |
11603 | S: Supported | 11605 | S: Supported |
11604 | F: security/yama/ | 11606 | F: security/yama/ |
11607 | F: Documentation/admin-guide/LSM/Yama.rst | ||
11605 | 11608 | ||
11606 | SENSABLE PHANTOM | 11609 | SENSABLE PHANTOM |
11607 | M: Jiri Slaby <jirislaby@gmail.com> | 11610 | M: Jiri Slaby <jirislaby@gmail.com> |
@@ -11904,7 +11907,7 @@ L: linux-security-module@vger.kernel.org | |||
11904 | W: http://schaufler-ca.com | 11907 | W: http://schaufler-ca.com |
11905 | T: git git://github.com/cschaufler/smack-next | 11908 | T: git git://github.com/cschaufler/smack-next |
11906 | S: Maintained | 11909 | S: Maintained |
11907 | F: Documentation/security/Smack.txt | 11910 | F: Documentation/admin-guide/LSM/Smack.rst |
11908 | F: security/smack/ | 11911 | F: security/smack/ |
11909 | 11912 | ||
11910 | DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS) | 11913 | DRIVERS FOR ADAPTIVE VOLTAGE SCALING (AVS) |
@@ -1312,7 +1312,7 @@ clean: archclean vmlinuxclean | |||
1312 | # | 1312 | # |
1313 | mrproper: rm-dirs := $(wildcard $(MRPROPER_DIRS)) | 1313 | mrproper: rm-dirs := $(wildcard $(MRPROPER_DIRS)) |
1314 | mrproper: rm-files := $(wildcard $(MRPROPER_FILES)) | 1314 | mrproper: rm-files := $(wildcard $(MRPROPER_FILES)) |
1315 | mrproper-dirs := $(addprefix _mrproper_,Documentation/DocBook scripts) | 1315 | mrproper-dirs := $(addprefix _mrproper_,scripts) |
1316 | 1316 | ||
1317 | PHONY += $(mrproper-dirs) mrproper archmrproper | 1317 | PHONY += $(mrproper-dirs) mrproper archmrproper |
1318 | $(mrproper-dirs): | 1318 | $(mrproper-dirs): |
@@ -1416,9 +1416,7 @@ help: | |||
1416 | @$(MAKE) $(build)=$(package-dir) help | 1416 | @$(MAKE) $(build)=$(package-dir) help |
1417 | @echo '' | 1417 | @echo '' |
1418 | @echo 'Documentation targets:' | 1418 | @echo 'Documentation targets:' |
1419 | @$(MAKE) -f $(srctree)/Documentation/Makefile.sphinx dochelp | 1419 | @$(MAKE) -f $(srctree)/Documentation/Makefile dochelp |
1420 | @echo '' | ||
1421 | @$(MAKE) -f $(srctree)/Documentation/DocBook/Makefile dochelp | ||
1422 | @echo '' | 1420 | @echo '' |
1423 | @echo 'Architecture specific targets ($(SRCARCH)):' | 1421 | @echo 'Architecture specific targets ($(SRCARCH)):' |
1424 | @$(if $(archhelp),$(archhelp),\ | 1422 | @$(if $(archhelp),$(archhelp),\ |
@@ -1469,9 +1467,7 @@ $(help-board-dirs): help-%: | |||
1469 | DOC_TARGETS := xmldocs sgmldocs psdocs latexdocs pdfdocs htmldocs mandocs installmandocs epubdocs cleandocs linkcheckdocs | 1467 | DOC_TARGETS := xmldocs sgmldocs psdocs latexdocs pdfdocs htmldocs mandocs installmandocs epubdocs cleandocs linkcheckdocs |
1470 | PHONY += $(DOC_TARGETS) | 1468 | PHONY += $(DOC_TARGETS) |
1471 | $(DOC_TARGETS): scripts_basic FORCE | 1469 | $(DOC_TARGETS): scripts_basic FORCE |
1472 | $(Q)$(MAKE) $(build)=scripts build_docproc build_check-lc_ctype | 1470 | $(Q)$(MAKE) $(build)=Documentation $@ |
1473 | $(Q)$(MAKE) $(build)=Documentation -f $(srctree)/Documentation/Makefile.sphinx $@ | ||
1474 | $(Q)$(MAKE) $(build)=Documentation/DocBook $@ | ||
1475 | 1471 | ||
1476 | else # KBUILD_EXTMOD | 1472 | else # KBUILD_EXTMOD |
1477 | 1473 | ||
diff --git a/arch/ia64/include/asm/io.h b/arch/ia64/include/asm/io.h index 5de673ac9cb1..a2540e21f919 100644 --- a/arch/ia64/include/asm/io.h +++ b/arch/ia64/include/asm/io.h | |||
@@ -117,7 +117,7 @@ extern int valid_mmap_phys_addr_range (unsigned long pfn, size_t count); | |||
117 | * following the barrier will arrive after all previous writes. For most | 117 | * following the barrier will arrive after all previous writes. For most |
118 | * ia64 platforms, this is a simple 'mf.a' instruction. | 118 | * ia64 platforms, this is a simple 'mf.a' instruction. |
119 | * | 119 | * |
120 | * See Documentation/DocBook/deviceiobook.tmpl for more information. | 120 | * See Documentation/driver-api/device-io.rst for more information. |
121 | */ | 121 | */ |
122 | static inline void ___ia64_mmiowb(void) | 122 | static inline void ___ia64_mmiowb(void) |
123 | { | 123 | { |
diff --git a/arch/ia64/sn/kernel/iomv.c b/arch/ia64/sn/kernel/iomv.c index c77ebdf98119..2b22a71663c1 100644 --- a/arch/ia64/sn/kernel/iomv.c +++ b/arch/ia64/sn/kernel/iomv.c | |||
@@ -63,7 +63,7 @@ EXPORT_SYMBOL(sn_io_addr); | |||
63 | /** | 63 | /** |
64 | * __sn_mmiowb - I/O space memory barrier | 64 | * __sn_mmiowb - I/O space memory barrier |
65 | * | 65 | * |
66 | * See arch/ia64/include/asm/io.h and Documentation/DocBook/deviceiobook.tmpl | 66 | * See arch/ia64/include/asm/io.h and Documentation/driver-api/device-io.rst |
67 | * for details. | 67 | * for details. |
68 | * | 68 | * |
69 | * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear. | 69 | * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear. |
diff --git a/drivers/ata/acard-ahci.c b/drivers/ata/acard-ahci.c index ed6a30cd681a..940ddbc59aa7 100644 --- a/drivers/ata/acard-ahci.c +++ b/drivers/ata/acard-ahci.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * AHCI hardware documentation: | 30 | * AHCI hardware documentation: |
31 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf | 31 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf |
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index c69954023c2e..1e1c355121e4 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c | |||
@@ -24,7 +24,7 @@ | |||
24 | * | 24 | * |
25 | * | 25 | * |
26 | * libata documentation is available via 'make {ps|pdf}docs', | 26 | * libata documentation is available via 'make {ps|pdf}docs', |
27 | * as Documentation/DocBook/libata.* | 27 | * as Documentation/driver-api/libata.rst |
28 | * | 28 | * |
29 | * AHCI hardware documentation: | 29 | * AHCI hardware documentation: |
30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf | 30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf |
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h index 5db6ab261643..30f67a1a4f54 100644 --- a/drivers/ata/ahci.h +++ b/drivers/ata/ahci.h | |||
@@ -24,7 +24,7 @@ | |||
24 | * | 24 | * |
25 | * | 25 | * |
26 | * libata documentation is available via 'make {ps|pdf}docs', | 26 | * libata documentation is available via 'make {ps|pdf}docs', |
27 | * as Documentation/DocBook/libata.* | 27 | * as Documentation/driver-api/libata.rst |
28 | * | 28 | * |
29 | * AHCI hardware documentation: | 29 | * AHCI hardware documentation: |
30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf | 30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf |
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c index ffbe625e6fd2..8401c3b5be92 100644 --- a/drivers/ata/ata_piix.c +++ b/drivers/ata/ata_piix.c | |||
@@ -33,7 +33,7 @@ | |||
33 | * | 33 | * |
34 | * | 34 | * |
35 | * libata documentation is available via 'make {ps|pdf}docs', | 35 | * libata documentation is available via 'make {ps|pdf}docs', |
36 | * as Documentation/DocBook/libata.* | 36 | * as Documentation/driver-api/libata.rst |
37 | * | 37 | * |
38 | * Hardware documentation available at http://developer.intel.com/ | 38 | * Hardware documentation available at http://developer.intel.com/ |
39 | * | 39 | * |
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c index 3159f9e66d8f..6154f0e2b81a 100644 --- a/drivers/ata/libahci.c +++ b/drivers/ata/libahci.c | |||
@@ -24,7 +24,7 @@ | |||
24 | * | 24 | * |
25 | * | 25 | * |
26 | * libata documentation is available via 'make {ps|pdf}docs', | 26 | * libata documentation is available via 'make {ps|pdf}docs', |
27 | * as Documentation/DocBook/libata.* | 27 | * as Documentation/driver-api/libata.rst |
28 | * | 28 | * |
29 | * AHCI hardware documentation: | 29 | * AHCI hardware documentation: |
30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf | 30 | * http://www.intel.com/technology/serialata/pdf/rev1_0.pdf |
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index e157a0e44419..b82d6bb88d27 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware documentation available from http://www.t13.org/ and | 30 | * Hardware documentation available from http://www.t13.org/ and |
31 | * http://www.sata-io.org/ | 31 | * http://www.sata-io.org/ |
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index ef68232b5222..7e33e200aae5 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware documentation available from http://www.t13.org/ and | 30 | * Hardware documentation available from http://www.t13.org/ and |
31 | * http://www.sata-io.org/ | 31 | * http://www.sata-io.org/ |
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 49ba9834c715..b0866f040d1f 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware documentation available from | 30 | * Hardware documentation available from |
31 | * - http://www.t10.org/ | 31 | * - http://www.t10.org/ |
@@ -3398,9 +3398,10 @@ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax, | |||
3398 | * | 3398 | * |
3399 | * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or | 3399 | * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or |
3400 | * an SCT Write Same command. | 3400 | * an SCT Write Same command. |
3401 | * Based on WRITE SAME has the UNMAP flag | 3401 | * Based on WRITE SAME has the UNMAP flag: |
3402 | * When set translate to DSM TRIM | 3402 | * |
3403 | * When clear translate to SCT Write Same | 3403 | * - When set translate to DSM TRIM |
3404 | * - When clear translate to SCT Write Same | ||
3404 | */ | 3405 | */ |
3405 | static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) | 3406 | static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) |
3406 | { | 3407 | { |
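The comment above describes a two-way choice that depends on a single CDB bit. The following is a minimal sketch of that decision only, not of libata's actual translation code. The enum and function names are hypothetical. It assumes the SBC layout of WRITE SAME(10)/(16), in which UNMAP is bit 3 of CDB byte 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; libata's real code differs. */
enum ws_xlat { XLAT_DSM_TRIM, XLAT_SCT_WRITE_SAME };

/* UNMAP bit: bit 3 of byte 1 in the WRITE SAME(10)/(16) CDB. */
#define WS_CDB1_UNMAP 0x08

/* UNMAP set -> DSM TRIM; UNMAP clear -> SCT Write Same. */
static enum ws_xlat write_same_target(const uint8_t *cdb)
{
    return (cdb[1] & WS_CDB1_UNMAP) ? XLAT_DSM_TRIM : XLAT_SCT_WRITE_SAME;
}
```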
diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 274d6d7193d7..052921352f31 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware documentation available from http://www.t13.org/ and | 30 | * Hardware documentation available from http://www.t13.org/ and |
31 | * http://www.sata-io.org/ | 31 | * http://www.sata-io.org/ |
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h index 120fce0befd3..5afe35baf61b 100644 --- a/drivers/ata/libata.h +++ b/drivers/ata/libata.h | |||
@@ -21,7 +21,7 @@ | |||
21 | * | 21 | * |
22 | * | 22 | * |
23 | * libata documentation is available via 'make {ps|pdf}docs', | 23 | * libata documentation is available via 'make {ps|pdf}docs', |
24 | * as Documentation/DocBook/libata.* | 24 | * as Documentation/driver-api/libata.rst |
25 | * | 25 | * |
26 | */ | 26 | */ |
27 | 27 | ||
diff --git a/drivers/ata/pata_pdc2027x.c b/drivers/ata/pata_pdc2027x.c index d9ef9e276225..82bfd51692f3 100644 --- a/drivers/ata/pata_pdc2027x.c +++ b/drivers/ata/pata_pdc2027x.c | |||
@@ -17,7 +17,7 @@ | |||
17 | * | 17 | * |
18 | * | 18 | * |
19 | * libata documentation is available via 'make {ps|pdf}docs', | 19 | * libata documentation is available via 'make {ps|pdf}docs', |
20 | * as Documentation/DocBook/libata.* | 20 | * as Documentation/driver-api/libata.rst |
21 | * | 21 | * |
22 | * Hardware information only available under NDA. | 22 | * Hardware information only available under NDA. |
23 | * | 23 | * |
diff --git a/drivers/ata/pdc_adma.c b/drivers/ata/pdc_adma.c index 64d682c6ee57..f1e873a37465 100644 --- a/drivers/ata/pdc_adma.c +++ b/drivers/ata/pdc_adma.c | |||
@@ -21,7 +21,7 @@ | |||
21 | * | 21 | * |
22 | * | 22 | * |
23 | * libata documentation is available via 'make {ps|pdf}docs', | 23 | * libata documentation is available via 'make {ps|pdf}docs', |
24 | * as Documentation/DocBook/libata.* | 24 | * as Documentation/driver-api/libata.rst |
25 | * | 25 | * |
26 | * | 26 | * |
27 | * Supports ATA disks in single-packet ADMA mode. | 27 | * Supports ATA disks in single-packet ADMA mode. |
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c index 734f563b8d37..8c683ddd0f58 100644 --- a/drivers/ata/sata_nv.c +++ b/drivers/ata/sata_nv.c | |||
@@ -21,7 +21,7 @@ | |||
21 | * | 21 | * |
22 | * | 22 | * |
23 | * libata documentation is available via 'make {ps|pdf}docs', | 23 | * libata documentation is available via 'make {ps|pdf}docs', |
24 | * as Documentation/DocBook/libata.* | 24 | * as Documentation/driver-api/libata.rst |
25 | * | 25 | * |
26 | * No hardware documentation available outside of NVIDIA. | 26 | * No hardware documentation available outside of NVIDIA. |
27 | * This driver programs the NVIDIA SATA controller in a similar | 27 | * This driver programs the NVIDIA SATA controller in a similar |
diff --git a/drivers/ata/sata_promise.c b/drivers/ata/sata_promise.c index 0fa211e2831c..d032bf657f70 100644 --- a/drivers/ata/sata_promise.c +++ b/drivers/ata/sata_promise.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware information only available under NDA. | 30 | * Hardware information only available under NDA. |
31 | * | 31 | * |
diff --git a/drivers/ata/sata_promise.h b/drivers/ata/sata_promise.h index 00d6000e546f..61633ef5ed72 100644 --- a/drivers/ata/sata_promise.h +++ b/drivers/ata/sata_promise.h | |||
@@ -20,7 +20,7 @@ | |||
20 | * | 20 | * |
21 | * | 21 | * |
22 | * libata documentation is available via 'make {ps|pdf}docs', | 22 | * libata documentation is available via 'make {ps|pdf}docs', |
23 | * as Documentation/DocBook/libata.* | 23 | * as Documentation/driver-api/libata.rst |
24 | * | 24 | * |
25 | */ | 25 | */ |
26 | 26 | ||
diff --git a/drivers/ata/sata_qstor.c b/drivers/ata/sata_qstor.c index af987a4f33d1..1fe941688e95 100644 --- a/drivers/ata/sata_qstor.c +++ b/drivers/ata/sata_qstor.c | |||
@@ -23,7 +23,7 @@ | |||
23 | * | 23 | * |
24 | * | 24 | * |
25 | * libata documentation is available via 'make {ps|pdf}docs', | 25 | * libata documentation is available via 'make {ps|pdf}docs', |
26 | * as Documentation/DocBook/libata.* | 26 | * as Documentation/driver-api/libata.rst |
27 | * | 27 | * |
28 | */ | 28 | */ |
29 | 29 | ||
diff --git a/drivers/ata/sata_sil.c b/drivers/ata/sata_sil.c index 29bcff086bce..ed76f070d21e 100644 --- a/drivers/ata/sata_sil.c +++ b/drivers/ata/sata_sil.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Documentation for SiI 3112: | 30 | * Documentation for SiI 3112: |
31 | * http://gkernel.sourceforge.net/specs/sii/3112A_SiI-DS-0095-B2.pdf.bz2 | 31 | * http://gkernel.sourceforge.net/specs/sii/3112A_SiI-DS-0095-B2.pdf.bz2 |
diff --git a/drivers/ata/sata_sis.c b/drivers/ata/sata_sis.c index d1637ac40a73..30f4f35f36d4 100644 --- a/drivers/ata/sata_sis.c +++ b/drivers/ata/sata_sis.c | |||
@@ -24,7 +24,7 @@ | |||
24 | * | 24 | * |
25 | * | 25 | * |
26 | * libata documentation is available via 'make {ps|pdf}docs', | 26 | * libata documentation is available via 'make {ps|pdf}docs', |
27 | * as Documentation/DocBook/libata.* | 27 | * as Documentation/driver-api/libata.rst |
28 | * | 28 | * |
29 | * Hardware documentation available under NDA. | 29 | * Hardware documentation available under NDA. |
30 | * | 30 | * |
diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c index ff614be55d0f..0fd6ac7e57ba 100644 --- a/drivers/ata/sata_svw.c +++ b/drivers/ata/sata_svw.c | |||
@@ -30,7 +30,7 @@ | |||
30 | * | 30 | * |
31 | * | 31 | * |
32 | * libata documentation is available via 'make {ps|pdf}docs', | 32 | * libata documentation is available via 'make {ps|pdf}docs', |
33 | * as Documentation/DocBook/libata.* | 33 | * as Documentation/driver-api/libata.rst |
34 | * | 34 | * |
35 | * Hardware documentation available under NDA. | 35 | * Hardware documentation available under NDA. |
36 | * | 36 | * |
diff --git a/drivers/ata/sata_sx4.c b/drivers/ata/sata_sx4.c index 48301cb3a316..405e606a234d 100644 --- a/drivers/ata/sata_sx4.c +++ b/drivers/ata/sata_sx4.c | |||
@@ -24,7 +24,7 @@ | |||
24 | * | 24 | * |
25 | * | 25 | * |
26 | * libata documentation is available via 'make {ps|pdf}docs', | 26 | * libata documentation is available via 'make {ps|pdf}docs', |
27 | * as Documentation/DocBook/libata.* | 27 | * as Documentation/driver-api/libata.rst |
28 | * | 28 | * |
29 | * Hardware documentation available under NDA. | 29 | * Hardware documentation available under NDA. |
30 | * | 30 | * |
diff --git a/drivers/ata/sata_uli.c b/drivers/ata/sata_uli.c index 08f98c3ed5c8..4f6e8d8156de 100644 --- a/drivers/ata/sata_uli.c +++ b/drivers/ata/sata_uli.c | |||
@@ -18,7 +18,7 @@ | |||
18 | * | 18 | * |
19 | * | 19 | * |
20 | * libata documentation is available via 'make {ps|pdf}docs', | 20 | * libata documentation is available via 'make {ps|pdf}docs', |
21 | * as Documentation/DocBook/libata.* | 21 | * as Documentation/driver-api/libata.rst |
22 | * | 22 | * |
23 | * Hardware documentation available under NDA. | 23 | * Hardware documentation available under NDA. |
24 | * | 24 | * |
diff --git a/drivers/ata/sata_via.c b/drivers/ata/sata_via.c index f3f538eec7b3..22e96fc77d09 100644 --- a/drivers/ata/sata_via.c +++ b/drivers/ata/sata_via.c | |||
@@ -25,7 +25,7 @@ | |||
25 | * | 25 | * |
26 | * | 26 | * |
27 | * libata documentation is available via 'make {ps|pdf}docs', | 27 | * libata documentation is available via 'make {ps|pdf}docs', |
28 | * as Documentation/DocBook/libata.* | 28 | * as Documentation/driver-api/libata.rst |
29 | * | 29 | * |
30 | * Hardware documentation available under NDA. | 30 | * Hardware documentation available under NDA. |
31 | * | 31 | * |
diff --git a/drivers/ata/sata_vsc.c b/drivers/ata/sata_vsc.c index 183eb52085df..9648127cca70 100644 --- a/drivers/ata/sata_vsc.c +++ b/drivers/ata/sata_vsc.c | |||
@@ -26,7 +26,7 @@ | |||
26 | * | 26 | * |
27 | * | 27 | * |
28 | * libata documentation is available via 'make {ps|pdf}docs', | 28 | * libata documentation is available via 'make {ps|pdf}docs', |
29 | * as Documentation/DocBook/libata.* | 29 | * as Documentation/driver-api/libata.rst |
30 | * | 30 | * |
31 | * Vitesse hardware documentation presumably available under NDA. | 31 | * Vitesse hardware documentation presumably available under NDA. |
32 | * Intel 31244 (same hardware interface) documentation presumably | 32 | * Intel 31244 (same hardware interface) documentation presumably |
diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index b1dd12729f19..bf8486c406d3 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c | |||
@@ -502,10 +502,12 @@ static int nand_default_block_markbad(struct mtd_info *mtd, loff_t ofs) | |||
502 | * specify how to write bad block markers to OOB (chip->block_markbad). | 502 | * specify how to write bad block markers to OOB (chip->block_markbad). |
503 | * | 503 | * |
504 | * We try operations in the following order: | 504 | * We try operations in the following order: |
505 | * | ||
505 | * (1) erase the affected block, to allow OOB marker to be written cleanly | 506 | * (1) erase the affected block, to allow OOB marker to be written cleanly |
506 | * (2) write bad block marker to OOB area of affected block (unless flag | 507 | * (2) write bad block marker to OOB area of affected block (unless flag |
507 | * NAND_BBT_NO_OOB_BBM is present) | 508 | * NAND_BBT_NO_OOB_BBM is present) |
508 | * (3) update the BBT | 509 | * (3) update the BBT |
510 | * | ||
509 | * Note that we retain the first error encountered in (2) or (3), finish the | 511 | * Note that we retain the first error encountered in (2) or (3), finish the |
510 | * procedures, and dump the error in the end. | 512 | * procedures, and dump the error in the end. |
511 | */ | 513 | */ |
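The ordering rule in this comment is to run every step, keep the first error from (2) or (3), and report that error at the end. The following is a minimal sketch of that rule under stated assumptions. The ops structure, the callbacks and the `use_oob_marker` switch are hypothetical stand-ins for the erase, the OOB marker write, the BBT update and the NAND_BBT_NO_OOB_BBM check. The erase runs only together with the marker write, because the comment gives its purpose as letting the marker be written cleanly. The erase result is not kept, because the comment retains errors only from (2) and (3).

```c
#include <assert.h>

/* Hypothetical step callbacks; each returns 0 or a negative errno. */
struct markbad_ops {
    int (*erase)(void *ctx);
    int (*write_oob_marker)(void *ctx);
    int (*update_bbt)(void *ctx);
};

static int markbad_sequence(const struct markbad_ops *ops, void *ctx,
                            int use_oob_marker)
{
    int ret = 0, res;

    if (use_oob_marker) {
        ops->erase(ctx);                 /* (1) result deliberately ignored */
        ret = ops->write_oob_marker(ctx); /* (2) */
    }

    res = ops->update_bbt(ctx);          /* (3) always attempted */
    if (!ret)
        ret = res;                       /* keep only the first error */

    return ret;
}

/* Demo steps: ctx points to three canned return codes (erase, OOB, BBT). */
static int demo_erase(void *ctx) { return ((int *)ctx)[0]; }
static int demo_oob(void *ctx)   { return ((int *)ctx)[1]; }
static int demo_bbt(void *ctx)   { return ((int *)ctx)[2]; }
```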
@@ -1219,9 +1221,10 @@ int nand_reset(struct nand_chip *chip, int chipnr) | |||
1219 | * @mtd: mtd info | 1221 | * @mtd: mtd info |
1220 | * @ofs: offset to start unlock from | 1222 | * @ofs: offset to start unlock from |
1221 | * @len: length to unlock | 1223 | * @len: length to unlock |
1222 | * @invert: when = 0, unlock the range of blocks within the lower and | 1224 | * @invert: |
1225 | * - when = 0, unlock the range of blocks within the lower and | ||
1223 | * upper boundary address | 1226 | * upper boundary address |
1224 | * when = 1, unlock the range of blocks outside the boundaries | 1227 | * - when = 1, unlock the range of blocks outside the boundaries |
1225 | * of the lower and upper boundary address | 1228 | * of the lower and upper boundary address |
1226 | * | 1229 | * |
1227 | * Returs unlock status. | 1230 | * Returs unlock status. |
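The `@invert` semantics can be stated as a simple predicate on each block. The following is a minimal sketch. `block_is_unlocked()` is a hypothetical helper, and treating the boundaries as the half-open range [ofs, ofs + len) is an assumption, since the comment does not say whether the upper boundary is inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is the block starting at blk_ofs unlocked by a
 * request covering [ofs, ofs + len)? invert == 0 unlocks blocks inside
 * the range; invert == 1 unlocks blocks outside it. */
static bool block_is_unlocked(uint64_t blk_ofs, uint64_t ofs, uint64_t len,
                              int invert)
{
    bool inside = blk_ofs >= ofs && blk_ofs < ofs + len;

    return invert ? !inside : inside;
}
```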
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index eebb0e1c70ff..3e231a54476e 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c | |||
@@ -379,6 +379,7 @@ static void phy_sanitize_settings(struct phy_device *phydev) | |||
379 | * @cmd: ethtool_cmd | 379 | * @cmd: ethtool_cmd |
380 | * | 380 | * |
381 | * A few notes about parameter checking: | 381 | * A few notes about parameter checking: |
382 | * | ||
382 | * - We don't set port or transceiver, so we don't care what they | 383 | * - We don't set port or transceiver, so we don't care what they |
383 | * were set to. | 384 | * were set to. |
384 | * - phy_start_aneg() will make sure forced settings are sane, and | 385 | * - phy_start_aneg() will make sure forced settings are sane, and |
diff --git a/drivers/scsi/qla1280.c b/drivers/scsi/qla1280.c index 634254a52301..8a29fb09db14 100644 --- a/drivers/scsi/qla1280.c +++ b/drivers/scsi/qla1280.c | |||
@@ -3390,7 +3390,7 @@ qla1280_isp_cmd(struct scsi_qla_host *ha) | |||
3390 | * On PCI bus, order reverses and write of 6 posts, then index 5, | 3390 | * On PCI bus, order reverses and write of 6 posts, then index 5, |
3391 | * causing chip to issue full queue of stale commands | 3391 | * causing chip to issue full queue of stale commands |
3392 | * The mmiowb() prevents future writes from crossing the barrier. | 3392 | * The mmiowb() prevents future writes from crossing the barrier. |
3393 | * See Documentation/DocBook/deviceiobook.tmpl for more information. | 3393 | * See Documentation/driver-api/device-io.rst for more information. |
3394 | */ | 3394 | */ |
3395 | WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index); | 3395 | WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index); |
3396 | mmiowb(); | 3396 | mmiowb(); |
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c index 6f7128f49c30..69979574004f 100644 --- a/drivers/scsi/scsi_scan.c +++ b/drivers/scsi/scsi_scan.c | |||
@@ -1051,10 +1051,11 @@ static unsigned char *scsi_inq_str(unsigned char *buf, unsigned char *inq, | |||
1051 | * allocate and set it up by calling scsi_add_lun. | 1051 | * allocate and set it up by calling scsi_add_lun. |
1052 | * | 1052 | * |
1053 | * Return: | 1053 | * Return: |
1054 | * SCSI_SCAN_NO_RESPONSE: could not allocate or setup a scsi_device | 1054 | * |
1055 | * SCSI_SCAN_TARGET_PRESENT: target responded, but no device is | 1055 | * - SCSI_SCAN_NO_RESPONSE: could not allocate or setup a scsi_device |
1056 | * - SCSI_SCAN_TARGET_PRESENT: target responded, but no device is | ||
1056 | * attached at the LUN | 1057 | * attached at the LUN |
1057 | * SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized | 1058 | * - SCSI_SCAN_LUN_PRESENT: a new scsi_device was allocated and initialized |
1058 | **/ | 1059 | **/ |
1059 | static int scsi_probe_and_add_lun(struct scsi_target *starget, | 1060 | static int scsi_probe_and_add_lun(struct scsi_target *starget, |
1060 | u64 lun, int *bflagsp, | 1061 | u64 lun, int *bflagsp, |
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index d4cf32d55546..1df77453f6b6 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c | |||
@@ -2914,16 +2914,18 @@ EXPORT_SYMBOL(fc_remote_port_add); | |||
2914 | * port is no longer part of the topology. Note: Although a port | 2914 | * port is no longer part of the topology. Note: Although a port |
2915 | * may no longer be part of the topology, it may persist in the remote | 2915 | * may no longer be part of the topology, it may persist in the remote |
2916 | * ports displayed by the fc_host. We do this under 2 conditions: | 2916 | * ports displayed by the fc_host. We do this under 2 conditions: |
2917 | * | ||
2917 | * 1) If the port was a scsi target, we delay its deletion by "blocking" it. | 2918 | * 1) If the port was a scsi target, we delay its deletion by "blocking" it. |
2918 | * This allows the port to temporarily disappear, then reappear without | 2919 | * This allows the port to temporarily disappear, then reappear without |
2919 | * disrupting the SCSI device tree attached to it. During the "blocked" | 2920 | * disrupting the SCSI device tree attached to it. During the "blocked" |
2920 | * period the port will still exist. | 2921 | * period the port will still exist. |
2922 | * | ||
2921 | * 2) If the port was a scsi target and disappears for longer than we | 2923 | * 2) If the port was a scsi target and disappears for longer than we |
2922 | * expect, we'll delete the port and the tear down the SCSI device tree | 2924 | * expect, we'll delete the port and the tear down the SCSI device tree |
2923 | * attached to it. However, we want to semi-persist the target id assigned | 2925 | * attached to it. However, we want to semi-persist the target id assigned |
2924 | * to that port if it eventually does exist. The port structure will | 2926 | * to that port if it eventually does exist. The port structure will |
2925 | * remain (although with minimal information) so that the target id | 2927 | * remain (although with minimal information) so that the target id |
2926 | * bindings remails. | 2928 | * bindings remails. |
2927 | * | 2929 | * |
2928 | * If the remote port is not an FCP Target, it will be fully torn down | 2930 | * If the remote port is not an FCP Target, it will be fully torn down |
2929 | * and deallocated, including the fc_remote_port class device. | 2931 | * and deallocated, including the fc_remote_port class device. |
diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c index 910f4a7a3924..31273468589c 100644 --- a/drivers/scsi/scsicam.c +++ b/drivers/scsi/scsicam.c | |||
@@ -116,8 +116,8 @@ EXPORT_SYMBOL(scsicam_bios_param); | |||
116 | * @hds: put heads here | 116 | * @hds: put heads here |
117 | * @secs: put sectors here | 117 | * @secs: put sectors here |
118 | * | 118 | * |
119 | * Description: determine the BIOS mapping/geometry used to create the partition | 119 | * Determine the BIOS mapping/geometry used to create the partition |
120 | * table, storing the results in *cyls, *hds, and *secs | 120 | * table, storing the results in @cyls, @hds, and @secs |
121 | * | 121 | * |
122 | * Returns: -1 on failure, 0 on success. | 122 | * Returns: -1 on failure, 0 on success. |
123 | */ | 123 | */ |
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index 354e2ab62031..6dabc4a10396 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c | |||
@@ -9,7 +9,7 @@ | |||
9 | * 2 as published by the Free Software Foundation. | 9 | * 2 as published by the Free Software Foundation. |
10 | * | 10 | * |
11 | * debugfs is for people to use instead of /proc or /sys. | 11 | * debugfs is for people to use instead of /proc or /sys. |
12 | * See Documentation/DocBook/filesystems for more details. | 12 | * See Documentation/filesystems/ for more details. |
13 | * | 13 | * |
14 | */ | 14 | */ |
15 | 15 | ||
diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c index e892ae7d89f8..77440e4aa9d4 100644 --- a/fs/debugfs/inode.c +++ b/fs/debugfs/inode.c | |||
@@ -9,7 +9,7 @@ | |||
9 | * 2 as published by the Free Software Foundation. | 9 | * 2 as published by the Free Software Foundation. |
10 | * | 10 | * |
11 | * debugfs is for people to use instead of /proc or /sys. | 11 | * debugfs is for people to use instead of /proc or /sys. |
12 | * See Documentation/DocBook/kernel-api for more details. | 12 | * See ./Documentation/core-api/kernel-api.rst for more details. |
13 | * | 13 | * |
14 | */ | 14 | */ |
15 | 15 | ||
diff --git a/fs/eventfd.c b/fs/eventfd.c index 9736df2ce89d..2fb4eadaa118 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c | |||
@@ -215,8 +215,8 @@ EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue); | |||
215 | * | 215 | * |
216 | * Returns %0 if successful, or the following error codes: | 216 | * Returns %0 if successful, or the following error codes: |
217 | * | 217 | * |
218 | * -EAGAIN : The operation would have blocked but @no_wait was non-zero. | 218 | * - -EAGAIN : The operation would have blocked but @no_wait was non-zero. |
219 | * -ERESTARTSYS : A signal interrupted the wait operation. | 219 | * - -ERESTARTSYS : A signal interrupted the wait operation. |
220 | * | 220 | * |
221 | * If @no_wait is zero, the function might sleep until the eventfd internal | 221 | * If @no_wait is zero, the function might sleep until the eventfd internal |
222 | * counter becomes greater than zero. | 222 | * counter becomes greater than zero. |
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 63ee2940775c..8b426f83909f 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c | |||
@@ -2052,11 +2052,13 @@ static noinline void block_dump___mark_inode_dirty(struct inode *inode) | |||
2052 | } | 2052 | } |
2053 | 2053 | ||
2054 | /** | 2054 | /** |
2055 | * __mark_inode_dirty - internal function | 2055 | * __mark_inode_dirty - internal function |
2056 | * @inode: inode to mark | 2056 | * |
2057 | * @flags: what kind of dirty (i.e. I_DIRTY_SYNC) | 2057 | * @inode: inode to mark |
2058 | * Mark an inode as dirty. Callers should use mark_inode_dirty or | 2058 | * @flags: what kind of dirty (i.e. I_DIRTY_SYNC) |
2059 | * mark_inode_dirty_sync. | 2059 | * |
2060 | * Mark an inode as dirty. Callers should use mark_inode_dirty or | ||
2061 | * mark_inode_dirty_sync. | ||
2060 | * | 2062 | * |
2061 | * Put the inode on the super block's dirty list. | 2063 | * Put the inode on the super block's dirty list. |
2062 | * | 2064 | * |
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 2d30a6da7013..8b08044b3120 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c | |||
@@ -409,25 +409,6 @@ static handle_t *new_handle(int nblocks) | |||
409 | return handle; | 409 | return handle; |
410 | } | 410 | } |
411 | 411 | ||
412 | /** | ||
413 | * handle_t *jbd2_journal_start() - Obtain a new handle. | ||
414 | * @journal: Journal to start transaction on. | ||
415 | * @nblocks: number of block buffer we might modify | ||
416 | * | ||
417 | * We make sure that the transaction can guarantee at least nblocks of | ||
418 | * modified buffers in the log. We block until the log can guarantee | ||
419 | * that much space. Additionally, if rsv_blocks > 0, we also create another | ||
420 | * handle with rsv_blocks reserved blocks in the journal. This handle is | ||
421 | * is stored in h_rsv_handle. It is not attached to any particular transaction | ||
422 | * and thus doesn't block transaction commit. If the caller uses this reserved | ||
423 | * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop() | ||
424 | * on the parent handle will dispose the reserved one. Reserved handle has to | ||
425 | * be converted to a normal handle using jbd2_journal_start_reserved() before | ||
426 | * it can be used. | ||
427 | * | ||
428 | * Return a pointer to a newly allocated handle, or an ERR_PTR() value | ||
429 | * on failure. | ||
430 | */ | ||
431 | handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, | 412 | handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, |
432 | gfp_t gfp_mask, unsigned int type, | 413 | gfp_t gfp_mask, unsigned int type, |
433 | unsigned int line_no) | 414 | unsigned int line_no) |
@@ -478,6 +459,25 @@ handle_t *jbd2__journal_start(journal_t *journal, int nblocks, int rsv_blocks, | |||
478 | EXPORT_SYMBOL(jbd2__journal_start); | 459 | EXPORT_SYMBOL(jbd2__journal_start); |
479 | 460 | ||
480 | 461 | ||
462 | /** | ||
463 | * handle_t *jbd2_journal_start() - Obtain a new handle. | ||
464 | * @journal: Journal to start transaction on. | ||
465 | * @nblocks: number of block buffer we might modify | ||
466 | * | ||
467 | * We make sure that the transaction can guarantee at least nblocks of | ||
468 | * modified buffers in the log. We block until the log can guarantee | ||
469 | * that much space. Additionally, if rsv_blocks > 0, we also create another | ||
470 | * handle with rsv_blocks reserved blocks in the journal. This handle is | ||
471 | * is stored in h_rsv_handle. It is not attached to any particular transaction | ||
472 | * and thus doesn't block transaction commit. If the caller uses this reserved | ||
473 | * handle, it has to set h_rsv_handle to NULL as otherwise jbd2_journal_stop() | ||
474 | * on the parent handle will dispose the reserved one. Reserved handle has to | ||
475 | * be converted to a normal handle using jbd2_journal_start_reserved() before | ||
476 | * it can be used. | ||
477 | * | ||
478 | * Return a pointer to a newly allocated handle, or an ERR_PTR() value | ||
479 | * on failure. | ||
480 | */ | ||
481 | handle_t *jbd2_journal_start(journal_t *journal, int nblocks) | 481 | handle_t *jbd2_journal_start(journal_t *journal, int nblocks) |
482 | { | 482 | { |
483 | return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0); | 483 | return jbd2__journal_start(journal, nblocks, 0, GFP_NOFS, 0, 0); |
@@ -1072,10 +1072,10 @@ out: | |||
1072 | * @handle: transaction to add buffer modifications to | 1072 | * @handle: transaction to add buffer modifications to |
1073 | * @bh: bh to be used for metadata writes | 1073 | * @bh: bh to be used for metadata writes |
1074 | * | 1074 | * |
1075 | * Returns an error code or 0 on success. | 1075 | * Returns: error code or 0 on success. |
1076 | * | 1076 | * |
1077 | * In full data journalling mode the buffer may be of type BJ_AsyncData, | 1077 | * In full data journalling mode the buffer may be of type BJ_AsyncData, |
1078 | * because we're write()ing a buffer which is also part of a shared mapping. | 1078 | * because we're ``write()ing`` a buffer which is also part of a shared mapping. |
1079 | */ | 1079 | */ |
1080 | 1080 | ||
1081 | int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh) | 1081 | int jbd2_journal_get_write_access(handle_t *handle, struct buffer_head *bh) |
diff --git a/fs/mpage.c b/fs/mpage.c index d6d1486d6f99..2e4c41ccb5c9 100644 --- a/fs/mpage.c +++ b/fs/mpage.c | |||
@@ -345,6 +345,7 @@ confused: | |||
345 | * | 345 | * |
346 | * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be | 346 | * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be |
347 | * submitted in the following order: | 347 | * submitted in the following order: |
348 | * | ||
348 | * 12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 | 349 | * 12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 |
349 | * | 350 | * |
350 | * because the indirect block has to be read to get the mappings of blocks | 351 | * because the indirect block has to be read to get the mappings of blocks |
diff --git a/fs/namei.c b/fs/namei.c index 6571a5f5112e..8bacc390c51e 100644 --- a/fs/namei.c +++ b/fs/namei.c | |||
@@ -4332,6 +4332,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname | |||
4332 | * The worst of all namespace operations - renaming directory. "Perverted" | 4332 | * The worst of all namespace operations - renaming directory. "Perverted" |
4333 | * doesn't even start to describe it. Somebody in UCB had a heck of a trip... | 4333 | * doesn't even start to describe it. Somebody in UCB had a heck of a trip... |
4334 | * Problems: | 4334 | * Problems: |
4335 | * | ||
4335 | * a) we can get into loop creation. | 4336 | * a) we can get into loop creation. |
4336 | * b) race potential - two innocent renames can create a loop together. | 4337 | * b) race potential - two innocent renames can create a loop together. |
4337 | * That's where 4.4 screws up. Current fix: serialization on | 4338 | * That's where 4.4 screws up. Current fix: serialization on |
diff --git a/include/linux/ata.h b/include/linux/ata.h index ad7d9ee89ff0..73fe18edfdaf 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h | |||
@@ -20,7 +20,7 @@ | |||
20 | * | 20 | * |
21 | * | 21 | * |
22 | * libata documentation is available via 'make {ps|pdf}docs', | 22 | * libata documentation is available via 'make {ps|pdf}docs', |
23 | * as Documentation/DocBook/libata.* | 23 | * as Documentation/driver-api/libata.rst |
24 | * | 24 | * |
25 | * Hardware documentation available from http://www.t13.org/ | 25 | * Hardware documentation available from http://www.t13.org/ |
26 | * | 26 | * |
diff --git a/include/linux/cred.h b/include/linux/cred.h index b03e7d049a64..c728d515e5e2 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h | |||
@@ -1,4 +1,4 @@ | |||
1 | /* Credentials management - see Documentation/security/credentials.txt | 1 | /* Credentials management - see Documentation/security/credentials.rst |
2 | * | 2 | * |
3 | * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. | 3 | * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. |
4 | * Written by David Howells (dhowells@redhat.com) | 4 | * Written by David Howells (dhowells@redhat.com) |
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index 9174b0d28582..aa86e6d8c1aa 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h | |||
@@ -9,7 +9,7 @@ | |||
9 | * 2 as published by the Free Software Foundation. | 9 | * 2 as published by the Free Software Foundation. |
10 | * | 10 | * |
11 | * debugfs is for people to use instead of /proc or /sys. | 11 | * debugfs is for people to use instead of /proc or /sys. |
12 | * See Documentation/DocBook/filesystems for more details. | 12 | * See Documentation/filesystems/ for more details. |
13 | */ | 13 | */ |
14 | 14 | ||
15 | #ifndef _DEBUGFS_H_ | 15 | #ifndef _DEBUGFS_H_ |
diff --git a/include/linux/key.h b/include/linux/key.h index 78e25aabedaf..044114185120 100644 --- a/include/linux/key.h +++ b/include/linux/key.h | |||
@@ -9,7 +9,7 @@ | |||
9 | * 2 of the License, or (at your option) any later version. | 9 | * 2 of the License, or (at your option) any later version. |
10 | * | 10 | * |
11 | * | 11 | * |
12 | * See Documentation/security/keys.txt for information on keys/keyrings. | 12 | * See Documentation/security/keys/core.rst for information on keys/keyrings. |
13 | */ | 13 | */ |
14 | 14 | ||
15 | #ifndef _LINUX_KEY_H | 15 | #ifndef _LINUX_KEY_H |
diff --git a/include/linux/libata.h b/include/linux/libata.h index c9a69fc8821e..9e6633235ad7 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h | |||
@@ -19,7 +19,7 @@ | |||
19 | * | 19 | * |
20 | * | 20 | * |
21 | * libata documentation is available via 'make {ps|pdf}docs', | 21 | * libata documentation is available via 'make {ps|pdf}docs', |
22 | * as Documentation/DocBook/libata.* | 22 | * as Documentation/driver-api/libata.rst |
23 | * | 23 | * |
24 | */ | 24 | */ |
25 | 25 | ||
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 080f34e66017..a1eeaf603d2f 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h | |||
@@ -29,6 +29,8 @@ | |||
29 | #include <linux/rculist.h> | 29 | #include <linux/rculist.h> |
30 | 30 | ||
31 | /** | 31 | /** |
32 | * union security_list_options - Linux Security Module hook function list | ||
33 | * | ||
32 | * Security hooks for program execution operations. | 34 | * Security hooks for program execution operations. |
33 | * | 35 | * |
34 | * @bprm_set_creds: | 36 | * @bprm_set_creds: |
@@ -193,8 +195,8 @@ | |||
193 | * @value will be set to the allocated attribute value. | 195 | * @value will be set to the allocated attribute value. |
194 | * @len will be set to the length of the value. | 196 | * @len will be set to the length of the value. |
195 | * Returns 0 if @name and @value have been successfully set, | 197 | * Returns 0 if @name and @value have been successfully set, |
196 | * -EOPNOTSUPP if no security attribute is needed, or | 198 | * -EOPNOTSUPP if no security attribute is needed, or |
197 | * -ENOMEM on memory allocation failure. | 199 | * -ENOMEM on memory allocation failure. |
198 | * @inode_create: | 200 | * @inode_create: |
199 | * Check permission to create a regular file. | 201 | * Check permission to create a regular file. |
200 | * @dir contains inode structure of the parent of the new file. | 202 | * @dir contains inode structure of the parent of the new file. |
@@ -510,8 +512,7 @@ | |||
510 | * process @tsk. Note that this hook is sometimes called from interrupt. | 512 | * process @tsk. Note that this hook is sometimes called from interrupt. |
511 | * Note that the fown_struct, @fown, is never outside the context of a | 513 | * Note that the fown_struct, @fown, is never outside the context of a |
512 | * struct file, so the file structure (and associated security information) | 514 | * struct file, so the file structure (and associated security information) |
513 | * can always be obtained: | 515 | * can always be obtained: container_of(fown, struct file, f_owner) |
514 | * container_of(fown, struct file, f_owner) | ||
515 | * @tsk contains the structure of task receiving signal. | 516 | * @tsk contains the structure of task receiving signal. |
516 | * @fown contains the file owner information. | 517 | * @fown contains the file owner information. |
517 | * @sig is the signal that will be sent. When 0, kernel sends SIGIO. | 518 | * @sig is the signal that will be sent. When 0, kernel sends SIGIO. |
@@ -521,7 +522,7 @@ | |||
521 | * to receive an open file descriptor via socket IPC. | 522 | * to receive an open file descriptor via socket IPC. |
522 | * @file contains the file structure being received. | 523 | * @file contains the file structure being received. |
523 | * Return 0 if permission is granted. | 524 | * Return 0 if permission is granted. |
524 | * @file_open | 525 | * @file_open: |
525 | * Save open-time permission checking state for later use upon | 526 | * Save open-time permission checking state for later use upon |
526 | * file_permission, and recheck access if anything has changed | 527 | * file_permission, and recheck access if anything has changed |
527 | * since inode_permission. | 528 | * since inode_permission. |
@@ -1143,7 +1144,7 @@ | |||
1143 | * @sma contains the semaphore structure. May be NULL. | 1144 | * @sma contains the semaphore structure. May be NULL. |
1144 | * @cmd contains the operation to be performed. | 1145 | * @cmd contains the operation to be performed. |
1145 | * Return 0 if permission is granted. | 1146 | * Return 0 if permission is granted. |
1146 | * @sem_semop | 1147 | * @sem_semop: |
1147 | * Check permissions before performing operations on members of the | 1148 | * Check permissions before performing operations on members of the |
1148 | * semaphore set @sma. If the @alter flag is nonzero, the semaphore set | 1149 | * semaphore set @sma. If the @alter flag is nonzero, the semaphore set |
1149 | * may be modified. | 1150 | * may be modified. |
@@ -1153,20 +1154,20 @@ | |||
1153 | * @alter contains the flag indicating whether changes are to be made. | 1154 | * @alter contains the flag indicating whether changes are to be made. |
1154 | * Return 0 if permission is granted. | 1155 | * Return 0 if permission is granted. |
1155 | * | 1156 | * |
1156 | * @binder_set_context_mgr | 1157 | * @binder_set_context_mgr: |
1157 | * Check whether @mgr is allowed to be the binder context manager. | 1158 | * Check whether @mgr is allowed to be the binder context manager. |
1158 | * @mgr contains the task_struct for the task being registered. | 1159 | * @mgr contains the task_struct for the task being registered. |
1159 | * Return 0 if permission is granted. | 1160 | * Return 0 if permission is granted. |
1160 | * @binder_transaction | 1161 | * @binder_transaction: |
1161 | * Check whether @from is allowed to invoke a binder transaction call | 1162 | * Check whether @from is allowed to invoke a binder transaction call |
1162 | * to @to. | 1163 | * to @to. |
1163 | * @from contains the task_struct for the sending task. | 1164 | * @from contains the task_struct for the sending task. |
1164 | * @to contains the task_struct for the receiving task. | 1165 | * @to contains the task_struct for the receiving task. |
1165 | * @binder_transfer_binder | 1166 | * @binder_transfer_binder: |
1166 | * Check whether @from is allowed to transfer a binder reference to @to. | 1167 | * Check whether @from is allowed to transfer a binder reference to @to. |
1167 | * @from contains the task_struct for the sending task. | 1168 | * @from contains the task_struct for the sending task. |
1168 | * @to contains the task_struct for the receiving task. | 1169 | * @to contains the task_struct for the receiving task. |
1169 | * @binder_transfer_file | 1170 | * @binder_transfer_file: |
1170 | * Check whether @from is allowed to transfer @file to @to. | 1171 | * Check whether @from is allowed to transfer @file to @to. |
1171 | * @from contains the task_struct for the sending task. | 1172 | * @from contains the task_struct for the sending task. |
1172 | * @file contains the struct file being transferred. | 1173 | * @file contains the struct file being transferred. |
@@ -1214,7 +1215,7 @@ | |||
1214 | * @cred contains the credentials to use. | 1215 | * @cred contains the credentials to use. |
1215 | * @ns contains the user namespace we want the capability in | 1216 | * @ns contains the user namespace we want the capability in |
1216 | * @cap contains the capability <include/linux/capability.h>. | 1217 | * @cap contains the capability <include/linux/capability.h>. |
1217 | * @audit: Whether to write an audit message or not | 1218 | * @audit contains whether to write an audit message or not |
1218 | * Return 0 if the capability is granted for @tsk. | 1219 | * Return 0 if the capability is granted for @tsk. |
1219 | * @syslog: | 1220 | * @syslog: |
1220 | * Check permission before accessing the kernel message ring or changing | 1221 | * Check permission before accessing the kernel message ring or changing |
@@ -1336,9 +1337,7 @@ | |||
1336 | * @inode we wish to get the security context of. | 1337 | * @inode we wish to get the security context of. |
1337 | * @ctx is a pointer in which to place the allocated security context. | 1338 | * @ctx is a pointer in which to place the allocated security context. |
1338 | * @ctxlen points to the place to put the length of @ctx. | 1339 | * @ctxlen points to the place to put the length of @ctx. |
1339 | * This is the main security structure. | ||
1340 | */ | 1340 | */ |
1341 | |||
1342 | union security_list_options { | 1341 | union security_list_options { |
1343 | int (*binder_set_context_mgr)(struct task_struct *mgr); | 1342 | int (*binder_set_context_mgr)(struct task_struct *mgr); |
1344 | int (*binder_transaction)(struct task_struct *from, | 1343 | int (*binder_transaction)(struct task_struct *from, |
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index 8f67b1581683..de0d889e4fe1 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h | |||
@@ -785,7 +785,7 @@ struct nand_manufacturer_ops { | |||
785 | * Minimum amount of bit errors per @ecc_step_ds guaranteed | 785 | * Minimum amount of bit errors per @ecc_step_ds guaranteed |
786 | * to be correctable. If unknown, set to zero. | 786 | * to be correctable. If unknown, set to zero. |
787 | * @ecc_step_ds: [INTERN] ECC step required by the @ecc_strength_ds, | 787 | * @ecc_step_ds: [INTERN] ECC step required by the @ecc_strength_ds, |
788 | * also from the datasheet. It is the recommended ECC step | 788 | * also from the datasheet. It is the recommended ECC step |
789 | * size, if known; if unknown, set to zero. | 789 | * size, if known; if unknown, set to zero. |
790 | * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is | 790 | * @onfi_timing_mode_default: [INTERN] default ONFI timing mode. This field is |
791 | * set to the actually used ONFI mode if the chip is | 791 | * set to the actually used ONFI mode if the chip is |
diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 1127fe31645d..ffcba1f337da 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h | |||
@@ -214,9 +214,9 @@ enum mutex_trylock_recursive_enum { | |||
214 | * raisins, and once those are gone this will be removed. | 214 | * raisins, and once those are gone this will be removed. |
215 | * | 215 | * |
216 | * Returns: | 216 | * Returns: |
217 | * MUTEX_TRYLOCK_FAILED - trylock failed, | 217 | * - MUTEX_TRYLOCK_FAILED - trylock failed, |
218 | * MUTEX_TRYLOCK_SUCCESS - lock acquired, | 218 | * - MUTEX_TRYLOCK_SUCCESS - lock acquired, |
219 | * MUTEX_TRYLOCK_RECURSIVE - we already owned the lock. | 219 | * - MUTEX_TRYLOCK_RECURSIVE - we already owned the lock. |
220 | */ | 220 | */ |
221 | static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum | 221 | static inline /* __deprecated */ __must_check enum mutex_trylock_recursive_enum |
222 | mutex_trylock_recursive(struct mutex *lock) | 222 | mutex_trylock_recursive(struct mutex *lock) |
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4ed952c17fc7..24e88b33a06c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h | |||
@@ -1432,13 +1432,14 @@ enum netdev_priv_flags { | |||
1432 | 1432 | ||
1433 | /** | 1433 | /** |
1434 | * struct net_device - The DEVICE structure. | 1434 | * struct net_device - The DEVICE structure. |
1435 | * Actually, this whole structure is a big mistake. It mixes I/O | 1435 | * |
1436 | * data with strictly "high-level" data, and it has to know about | 1436 | * Actually, this whole structure is a big mistake. It mixes I/O |
1437 | * almost every data structure used in the INET module. | 1437 | * data with strictly "high-level" data, and it has to know about |
1438 | * almost every data structure used in the INET module. | ||
1438 | * | 1439 | * |
1439 | * @name: This is the first field of the "visible" part of this structure | 1440 | * @name: This is the first field of the "visible" part of this structure |
1440 | * (i.e. as seen by users in the "Space.c" file). It is the name | 1441 | * (i.e. as seen by users in the "Space.c" file). It is the name |
1441 | * of the interface. | 1442 | * of the interface. |
1442 | * | 1443 | * |
1443 | * @name_hlist: Device name hash chain, please keep it close to name[] | 1444 | * @name_hlist: Device name hash chain, please keep it close to name[] |
1444 | * @ifalias: SNMP alias | 1445 | * @ifalias: SNMP alias |
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a098d95b3d84..25b1659c832a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h | |||
@@ -2691,7 +2691,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio); | |||
2691 | * @offset: the offset within the fragment (starting at the | 2691 | * @offset: the offset within the fragment (starting at the |
2692 | * fragment's own offset) | 2692 | * fragment's own offset) |
2693 | * @size: the number of bytes to map | 2693 | * @size: the number of bytes to map |
2694 | * @dir: the direction of the mapping (%PCI_DMA_*) | 2694 | * @dir: the direction of the mapping (``PCI_DMA_*``) |
2695 | * | 2695 | * |
2696 | * Maps the page associated with @frag to @device. | 2696 | * Maps the page associated with @frag to @device. |
2697 | */ | 2697 | */ |
diff --git a/include/net/sock.h b/include/net/sock.h index f33e3d134e0b..f97da141d920 100644 --- a/include/net/sock.h +++ b/include/net/sock.h | |||
@@ -1953,11 +1953,10 @@ static inline bool sk_has_allocations(const struct sock *sk) | |||
1953 | * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory | 1953 | * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory |
1954 | * barrier call. They were added due to the race found within the tcp code. | 1954 | * barrier call. They were added due to the race found within the tcp code. |
1955 | * | 1955 | * |
1956 | * Consider following tcp code paths: | 1956 | * Consider following tcp code paths:: |
1957 | * | 1957 | * |
1958 | * CPU1 CPU2 | 1958 | * CPU1 CPU2 |
1959 | * | 1959 | * sys_select receive packet |
1960 | * sys_select receive packet | ||
1961 | * ... ... | 1960 | * ... ... |
1962 | * __add_wait_queue update tp->rcv_nxt | 1961 | * __add_wait_queue update tp->rcv_nxt |
1963 | * ... ... | 1962 | * ... ... |
@@ -2264,7 +2263,7 @@ void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags); | |||
2264 | * @tsflags: timestamping flags to use | 2263 | * @tsflags: timestamping flags to use |
2265 | * @tx_flags: completed with instructions for time stamping | 2264 | * @tx_flags: completed with instructions for time stamping |
2266 | * | 2265 | * |
2267 | * Note : callers should take care of initial *tx_flags value (usually 0) | 2266 | * Note: callers should take care of initial ``*tx_flags`` value (usually 0) |
2268 | */ | 2267 | */ |
2269 | static inline void sock_tx_timestamp(const struct sock *sk, __u16 tsflags, | 2268 | static inline void sock_tx_timestamp(const struct sock *sk, __u16 tsflags, |
2270 | __u8 *tx_flags) | 2269 | __u8 *tx_flags) |
diff --git a/kernel/cred.c b/kernel/cred.c index 2bc66075740f..ecf03657e71c 100644 --- a/kernel/cred.c +++ b/kernel/cred.c | |||
@@ -1,4 +1,4 @@ | |||
1 | /* Task credentials management - see Documentation/security/credentials.txt | 1 | /* Task credentials management - see Documentation/security/credentials.rst |
2 | * | 2 | * |
3 | * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. | 3 | * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved. |
4 | * Written by David Howells (dhowells@redhat.com) | 4 | * Written by David Howells (dhowells@redhat.com) |
diff --git a/kernel/futex.c b/kernel/futex.c index d6cf71d08f21..c934689043b2 100644 --- a/kernel/futex.c +++ b/kernel/futex.c | |||
@@ -488,7 +488,7 @@ static void drop_futex_key_refs(union futex_key *key) | |||
488 | * | 488 | * |
489 | * Return: a negative error code or 0 | 489 | * Return: a negative error code or 0 |
490 | * | 490 | * |
491 | * The key words are stored in *key on success. | 491 | * The key words are stored in @key on success. |
492 | * | 492 | * |
493 | * For shared mappings, it's (page->index, file_inode(vma->vm_file), | 493 | * For shared mappings, it's (page->index, file_inode(vma->vm_file), |
494 | * offset_within_page). For private mappings, it's (uaddr, current->mm). | 494 | * offset_within_page). For private mappings, it's (uaddr, current->mm). |
@@ -1259,9 +1259,9 @@ static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval) | |||
1259 | * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0) | 1259 | * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0) |
1260 | * | 1260 | * |
1261 | * Return: | 1261 | * Return: |
1262 | * 0 - ready to wait; | 1262 | * - 0 - ready to wait; |
1263 | * 1 - acquired the lock; | 1263 | * - 1 - acquired the lock; |
1264 | * <0 - error | 1264 | * - <0 - error |
1265 | * | 1265 | * |
1266 | * The hb->lock and futex_key refs shall be held by the caller. | 1266 | * The hb->lock and futex_key refs shall be held by the caller. |
1267 | */ | 1267 | */ |
@@ -1717,9 +1717,9 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key, | |||
1717 | * hb1 and hb2 must be held by the caller. | 1717 | * hb1 and hb2 must be held by the caller. |
1718 | * | 1718 | * |
1719 | * Return: | 1719 | * Return: |
1720 | * 0 - failed to acquire the lock atomically; | 1720 | * - 0 - failed to acquire the lock atomically; |
1721 | * >0 - acquired the lock, return value is vpid of the top_waiter | 1721 | * - >0 - acquired the lock, return value is vpid of the top_waiter |
1722 | * <0 - error | 1722 | * - <0 - error |
1723 | */ | 1723 | */ |
1724 | static int futex_proxy_trylock_atomic(u32 __user *pifutex, | 1724 | static int futex_proxy_trylock_atomic(u32 __user *pifutex, |
1725 | struct futex_hash_bucket *hb1, | 1725 | struct futex_hash_bucket *hb1, |
@@ -1785,8 +1785,8 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex, | |||
1785 | * uaddr2 atomically on behalf of the top waiter. | 1785 | * uaddr2 atomically on behalf of the top waiter. |
1786 | * | 1786 | * |
1787 | * Return: | 1787 | * Return: |
1788 | * >=0 - on success, the number of tasks requeued or woken; | 1788 | * - >=0 - on success, the number of tasks requeued or woken; |
1789 | * <0 - on error | 1789 | * - <0 - on error |
1790 | */ | 1790 | */ |
1791 | static int futex_requeue(u32 __user *uaddr1, unsigned int flags, | 1791 | static int futex_requeue(u32 __user *uaddr1, unsigned int flags, |
1792 | u32 __user *uaddr2, int nr_wake, int nr_requeue, | 1792 | u32 __user *uaddr2, int nr_wake, int nr_requeue, |
@@ -2142,8 +2142,8 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb) | |||
2142 | * be paired with exactly one earlier call to queue_me(). | 2142 | * be paired with exactly one earlier call to queue_me(). |
2143 | * | 2143 | * |
2144 | * Return: | 2144 | * Return: |
2145 | * 1 - if the futex_q was still queued (and we removed unqueued it); | 2145 | * - 1 - if the futex_q was still queued (and we removed unqueued it); |
2146 | * 0 - if the futex_q was already removed by the waking thread | 2146 | * - 0 - if the futex_q was already removed by the waking thread |
2147 | */ | 2147 | */ |
2148 | static int unqueue_me(struct futex_q *q) | 2148 | static int unqueue_me(struct futex_q *q) |
2149 | { | 2149 | { |
@@ -2333,9 +2333,9 @@ static long futex_wait_restart(struct restart_block *restart); | |||
2333 | * acquire the lock. Must be called with the hb lock held. | 2333 | * acquire the lock. Must be called with the hb lock held. |
2334 | * | 2334 | * |
2335 | * Return: | 2335 | * Return: |
2336 | * 1 - success, lock taken; | 2336 | * - 1 - success, lock taken; |
2337 | * 0 - success, lock not taken; | 2337 | * - 0 - success, lock not taken; |
2338 | * <0 - on error (-EFAULT) | 2338 | * - <0 - on error (-EFAULT) |
2339 | */ | 2339 | */ |
2340 | static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) | 2340 | static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) |
2341 | { | 2341 | { |
@@ -2422,8 +2422,8 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q, | |||
2422 | * with no q.key reference on failure. | 2422 | * with no q.key reference on failure. |
2423 | * | 2423 | * |
2424 | * Return: | 2424 | * Return: |
2425 | * 0 - uaddr contains val and hb has been locked; | 2425 | * - 0 - uaddr contains val and hb has been locked; |
2426 | * <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked | 2426 | * - <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked |
2427 | */ | 2427 | */ |
2428 | static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, | 2428 | static int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags, |
2429 | struct futex_q *q, struct futex_hash_bucket **hb) | 2429 | struct futex_q *q, struct futex_hash_bucket **hb) |
@@ -2895,8 +2895,8 @@ pi_faulted: | |||
2895 | * called with the hb lock held. | 2895 | * called with the hb lock held. |
2896 | * | 2896 | * |
2897 | * Return: | 2897 | * Return: |
2898 | * 0 = no early wakeup detected; | 2898 | * - 0 = no early wakeup detected; |
2899 | * <0 = -ETIMEDOUT or -ERESTARTNOINTR | 2899 | * - <0 = -ETIMEDOUT or -ERESTARTNOINTR |
2900 | */ | 2900 | */ |
2901 | static inline | 2901 | static inline |
2902 | int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, | 2902 | int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, |
@@ -2968,8 +2968,8 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb, | |||
2968 | * If 4 or 7, we cleanup and return with -ETIMEDOUT. | 2968 | * If 4 or 7, we cleanup and return with -ETIMEDOUT. |
2969 | * | 2969 | * |
2970 | * Return: | 2970 | * Return: |
2971 | * 0 - On success; | 2971 | * - 0 - On success; |
2972 | * <0 - On error | 2972 | * - <0 - On error |
2973 | */ | 2973 | */ |
2974 | static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, | 2974 | static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, |
2975 | u32 val, ktime_t *abs_time, u32 bitset, | 2975 | u32 val, ktime_t *abs_time, u32 bitset, |
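The futex_wait_setup() kernel-doc above says that when uaddr no longer contains val, setup fails with -EWOULDBLOCK and the hash bucket is left unlocked. From user space, this shows up as the futex(2) system call returning -1 with errno set to EAGAIN. On Linux, EWOULDBLOCK has the same value as EAGAIN. The sketch below assumes a Linux host that provides <linux/futex.h>. The wrapper names futex_wait_if() and futex_wake_n() are hypothetical. They are not kernel or libc APIs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE */
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall(), NULL */

/* Hypothetical wrapper: sleep on *word only while it still equals
 * expected. Returns 0 after a wakeup, or -errno on failure
 * (-EAGAIN if *word != expected when the kernel checks it). */
static long futex_wait_if(uint32_t *word, uint32_t expected)
{
	long rc = syscall(SYS_futex, word, FUTEX_WAIT_PRIVATE, expected,
			  NULL, NULL, 0);
	return rc == -1 ? -errno : rc;
}

/* Hypothetical wrapper: wake up to nr tasks blocked on word.
 * Returns the number of tasks woken, or -errno on failure. */
static long futex_wake_n(uint32_t *word, int nr)
{
	long rc = syscall(SYS_futex, word, FUTEX_WAKE_PRIVATE, nr,
			  NULL, NULL, 0);
	return rc == -1 ? -errno : rc;
}
```

Given `uint32_t w = 1;`, the call `futex_wait_if(&w, 0)` returns -EAGAIN immediately. The value check fails before the caller is ever queued. When nobody is waiting, `futex_wake_n(&w, 1)` returns 0.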
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2e30d925a40d..ad43468e89f0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -7,7 +7,7 @@ | |||
7 | * This file contains the core interrupt handling code, for irq-chip | 7 | * This file contains the core interrupt handling code, for irq-chip |
8 | * based architectures. | 8 | * based architectures. |
9 | * | 9 | * |
10 | * Detailed information is available in Documentation/DocBook/genericirq | 10 | * Detailed information is available in Documentation/core-api/genericirq.rst |
11 | */ | 11 | */ |
12 | 12 | ||
13 | #include <linux/irq.h> | 13 | #include <linux/irq.h> |
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index eb4d3e8945b8..79f987b942b8 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -6,7 +6,7 @@ | |||
6 | * | 6 | * |
7 | * This file contains the core interrupt handling code. | 7 | * This file contains the core interrupt handling code. |
8 | * | 8 | * |
9 | * Detailed information is available in Documentation/DocBook/genericirq | 9 | * Detailed information is available in Documentation/core-api/genericirq.rst |
10 | * | 10 | * |
11 | */ | 11 | */ |
12 | 12 | ||
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 948b50e78549..8bbd06405e60 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -4,7 +4,7 @@ | |||
4 | * | 4 | * |
5 | * This file contains the interrupt descriptor management code | 5 | * This file contains the interrupt descriptor management code |
6 | * | 6 | * |
7 | * Detailed information is available in Documentation/DocBook/genericirq | 7 | * Detailed information is available in Documentation/core-api/genericirq.rst |
8 | * | 8 | * |
9 | */ | 9 | */ |
10 | #include <linux/irq.h> | 10 | #include <linux/irq.h> |
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 198527a62149..858a07590e39 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -227,9 +227,9 @@ static void __sched __mutex_lock_slowpath(struct mutex *lock); | |||
227 | * (or statically defined) before it can be locked. memset()-ing | 227 | * (or statically defined) before it can be locked. memset()-ing |
228 | * the mutex to 0 is not allowed. | 228 | * the mutex to 0 is not allowed. |
229 | * | 229 | * |
230 | * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging | 230 | * (The CONFIG_DEBUG_MUTEXES .config option turns on debugging |
231 | * checks that will enforce the restrictions and will also do | 231 | * checks that will enforce the restrictions and will also do |
232 | * deadlock debugging. ) | 232 | * deadlock debugging) |
233 | * | 233 | * |
234 | * This function is similar to (but not equivalent to) down(). | 234 | * This function is similar to (but not equivalent to) down(). |
235 | */ | 235 | */ |
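The mutex_init() comment above makes two points. A mutex must be explicitly initialized instead of memset() to zero, and CONFIG_DEBUG_MUTEXES adds checks that include deadlock debugging. Kernel mutexes cannot be exercised outside the kernel, so the sketch below uses POSIX threads as a user-space analogue. This is an assumption made for illustration, not the kernel mechanism. The mutex is set up with pthread_mutex_init(). The PTHREAD_MUTEX_ERRORCHECK type plays a role loosely similar to the debug option: it reports EDEADLK when a thread relocks a mutex it already holds. The helper relock_result() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialize an error-checking mutex, lock it,
 * then lock it again from the same thread. Returns the result of the
 * second lock attempt (EDEADLK for an errorcheck mutex), or -1 if
 * initialization failed. */
static int relock_result(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int second;

	if (pthread_mutexattr_init(&attr) != 0)
		return -1;
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

	/* Explicit initialization; a zero-filled object is not a substitute. */
	if (pthread_mutex_init(&m, &attr) != 0) {
		pthread_mutexattr_destroy(&attr);
		return -1;
	}
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);
	second = pthread_mutex_lock(&m);  /* self-deadlock reported, not hung */
	pthread_mutex_unlock(&m);         /* release the first, successful lock */
	pthread_mutex_destroy(&m);
	return second;
}
```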
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9c5d40a50930..ca9460f049b8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -286,7 +286,7 @@ config DEBUG_FS | |||
286 | write to these files. | 286 | write to these files. |
287 | 287 | ||
288 | For detailed documentation on the debugfs API, see | 288 | For detailed documentation on the debugfs API, see |
289 | Documentation/DocBook/filesystems. | 289 | Documentation/filesystems/. |
290 | 290 | ||
291 | If unsure, say N. | 291 | If unsure, say N. |
292 | 292 | ||
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 533f912638ed..ab4ff0eea776 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -13,7 +13,7 @@ menuconfig KGDB | |||
13 | CONFIG_FRAME_POINTER to aid in producing more reliable stack | 13 | CONFIG_FRAME_POINTER to aid in producing more reliable stack |
14 | backtraces in the external debugger. Documentation of | 14 | backtraces in the external debugger. Documentation of |
15 | kernel debugger is available at http://kgdb.sourceforge.net | 15 | kernel debugger is available at http://kgdb.sourceforge.net |
16 | as well as in DocBook form in Documentation/DocBook/. If | 16 | as well as in Documentation/dev-tools/kgdb.rst. If |
17 | unsure, say N. | 17 | unsure, say N. |
18 | 18 | ||
19 | if KGDB | 19 | if KGDB |
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 34678828e2bb..f9653987c0f9 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -181,7 +181,7 @@ done: | |||
181 | * | 181 | * |
182 | * This function will lock the socket if a skb is returned, so | 182 | * This function will lock the socket if a skb is returned, so |
183 | * the caller needs to unlock the socket in that case (usually by | 183 | * the caller needs to unlock the socket in that case (usually by |
184 | * calling skb_free_datagram). Returns NULL with *err set to | 184 | * calling skb_free_datagram). Returns NULL with @err set to |
185 | * -EAGAIN if no data was available or to some other value if an | 185 | * -EAGAIN if no data was available or to some other value if an |
186 | * error was detected. | 186 | * error was detected. |
187 | * | 187 | * |
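The datagram.c comment above says the function returns NULL with @err set to -EAGAIN when no data is available. For a non-blocking receive, user space typically sees this as recv() returning -1 with errno set to EAGAIN. The sketch below shows one way to observe it and assumes a Linux/POSIX host. The AF_UNIX datagram socket pair and the helper name recv_empty_nonblocking() are illustrative choices, not part of this patch.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a connected datagram socket pair and try
 * a non-blocking receive while nothing is queued. Returns 0 if recv()
 * succeeded, otherwise -errno from the call that failed. */
static int recv_empty_nonblocking(void)
{
	int sv[2];
	char buf[64];
	ssize_t n;
	int err;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -errno;

	n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
	err = (n < 0) ? -errno : 0;   /* capture before close() can touch errno */

	close(sv[0]);
	close(sv[1]);
	return err;
}
```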
diff --git a/net/core/sock.c b/net/core/sock.c
index 727f924b7f91..0c3fc16223f9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2675,9 +2675,12 @@ EXPORT_SYMBOL(release_sock); | |||
2675 | * @sk: socket | 2675 | * @sk: socket |
2676 | * | 2676 | * |
2677 | * This version should be used for very small section, where process wont block | 2677 | * This version should be used for very small section, where process wont block |
2678 | * return false if fast path is taken | 2678 | * return false if fast path is taken: |
2679 | * | ||
2679 | * sk_lock.slock locked, owned = 0, BH disabled | 2680 | * sk_lock.slock locked, owned = 0, BH disabled |
2680 | * return true if slow path is taken | 2681 | * |
2682 | * return true if slow path is taken: | ||
2683 | * | ||
2681 | * sk_lock.slock unlocked, owned = 1, BH enabled | 2684 | * sk_lock.slock unlocked, owned = 1, BH enabled |
2682 | */ | 2685 | */ |
2683 | bool lock_sock_fast(struct sock *sk) | 2686 | bool lock_sock_fast(struct sock *sk) |
diff --git a/scripts/.gitignore b/scripts/.gitignore
index e063daa3ec4a..0442c06eefcb 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -7,7 +7,6 @@ pnmtologo | |||
7 | unifdef | 7 | unifdef |
8 | ihex2fw | 8 | ihex2fw |
9 | recordmcount | 9 | recordmcount |
10 | docproc | ||
11 | check-lc_ctype | 10 | check-lc_ctype |
12 | sortextable | 11 | sortextable |
13 | asn1_compiler | 12 | asn1_compiler |
diff --git a/scripts/Makefile b/scripts/Makefile
index 1d80897a9644..c06f4997d700 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -6,8 +6,6 @@ | |||
6 | # pnmttologo: Convert pnm files to logo files | 6 | # pnmttologo: Convert pnm files to logo files |
7 | # conmakehash: Create chartable | 7 | # conmakehash: Create chartable |
8 | # conmakehash: Create arrays for initializing the kernel console tables | 8 | # conmakehash: Create arrays for initializing the kernel console tables |
9 | # docproc: Used in Documentation/DocBook | ||
10 | # check-lc_ctype: Used in Documentation/DocBook | ||
11 | 9 | ||
12 | HOST_EXTRACFLAGS += -I$(srctree)/tools/include | 10 | HOST_EXTRACFLAGS += -I$(srctree)/tools/include |
13 | 11 | ||
@@ -29,16 +27,12 @@ HOSTLOADLIBES_extract-cert = -lcrypto | |||
29 | always := $(hostprogs-y) $(hostprogs-m) | 27 | always := $(hostprogs-y) $(hostprogs-m) |
30 | 28 | ||
31 | # The following hostprogs-y programs are only build on demand | 29 | # The following hostprogs-y programs are only build on demand |
32 | hostprogs-y += unifdef docproc check-lc_ctype | 30 | hostprogs-y += unifdef |
33 | 31 | ||
34 | # These targets are used internally to avoid "is up to date" messages | 32 | # These targets are used internally to avoid "is up to date" messages |
35 | PHONY += build_unifdef build_docproc build_check-lc_ctype | 33 | PHONY += build_unifdef |
36 | build_unifdef: $(obj)/unifdef | 34 | build_unifdef: $(obj)/unifdef |
37 | @: | 35 | @: |
38 | build_docproc: $(obj)/docproc | ||
39 | @: | ||
40 | build_check-lc_ctype: $(obj)/check-lc_ctype | ||
41 | @: | ||
42 | 36 | ||
43 | subdir-$(CONFIG_MODVERSIONS) += genksyms | 37 | subdir-$(CONFIG_MODVERSIONS) += genksyms |
44 | subdir-y += mod | 38 | subdir-y += mod |
diff --git a/scripts/check-lc_ctype.c b/scripts/check-lc_ctype.c
deleted file mode 100644
index 9097ff5449fb..000000000000
--- a/scripts/check-lc_ctype.c
+++ /dev/null
@@ -1,11 +0,0 @@ | |||
1 | /* | ||
2 | * Check that a specified locale works as LC_CTYPE. Used by the | ||
3 | * DocBook build system to probe for C.UTF-8 support. | ||
4 | */ | ||
5 | |||
6 | #include <locale.h> | ||
7 | |||
8 | int main(void) | ||
9 | { | ||
10 | return !setlocale(LC_CTYPE, ""); | ||
11 | } | ||
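The deleted check-lc_ctype.c probed whether a locale works as LC_CTYPE. It called setlocale(LC_CTYPE, "") and exited with a non-zero status on failure. An empty string makes setlocale() take the locale from the environment (LC_ALL, LC_CTYPE, LANG). The sketch below performs the same probe, but it takes an explicit locale name instead of reading the environment, and it restores the previous setting afterwards. The function name locale_ctype_works() is hypothetical.

```c
#define _GNU_SOURCE
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: report whether name is accepted as LC_CTYPE.
 * setlocale() returns NULL when the requested locale is unavailable. */
static bool locale_ctype_works(const char *name)
{
	const char *cur = setlocale(LC_CTYPE, NULL);
	char *saved = NULL;
	bool ok;

	/* The queried string may be overwritten by the next call; copy it. */
	if (cur)
		saved = strdup(cur);

	ok = setlocale(LC_CTYPE, name) != NULL;

	/* Restore whatever LC_CTYPE was in effect before the probe. */
	if (saved) {
		setlocale(LC_CTYPE, saved);
		free(saved);
	}
	return ok;
}
```

To mirror the original C.UTF-8 probe, call `locale_ctype_works("C.UTF-8")`. The result depends on which locales are installed on the build host.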
diff --git a/scripts/docproc.c b/scripts/docproc.c
deleted file mode 100644
index 0a12593b9041..000000000000
--- a/scripts/docproc.c
+++ /dev/null
@@ -1,681 +0,0 @@ | |||
1 | /* | ||
2 | * docproc is a simple preprocessor for the template files | ||
3 | * used as placeholders for the kernel internal documentation. | ||
4 | * docproc is used for documentation-frontend and | ||
5 | * dependency-generator. | ||
6 | * The two usages have in common that they require | ||
7 | * some knowledge of the .tmpl syntax, therefore they | ||
8 | * are kept together. | ||
9 | * | ||
10 | * documentation-frontend | ||
11 | * Scans the template file and call kernel-doc for | ||
12 | * all occurrences of ![EIF]file | ||
13 | * Beforehand each referenced file is scanned for | ||
14 | * any symbols that are exported via these macros: | ||
15 | * EXPORT_SYMBOL(), EXPORT_SYMBOL_GPL(), & | ||
16 | * EXPORT_SYMBOL_GPL_FUTURE() | ||
17 | * This is used to create proper -function and | ||
18 | * -nofunction arguments in calls to kernel-doc. | ||
19 | * Usage: docproc doc file.tmpl | ||
20 | * | ||
21 | * dependency-generator: | ||
22 | * Scans the template file and list all files | ||
23 | * referenced in a format recognized by make. | ||
24 | * Usage: docproc depend file.tmpl | ||
25 | * Writes dependency information to stdout | ||
26 | * in the following format: | ||
27 | * file.tmpl src.c src2.c | ||
28 | * The filenames are obtained from the following constructs: | ||
29 | * !Efilename | ||
30 | * !Ifilename | ||
31 | * !Dfilename | ||
32 | * !Ffilename | ||
33 | * !Pfilename | ||
34 | * | ||
35 | */ | ||
36 | |||
37 | #define _GNU_SOURCE | ||
38 | #include <stdio.h> | ||
39 | #include <stdlib.h> | ||
40 | #include <string.h> | ||
41 | #include <ctype.h> | ||
42 | #include <unistd.h> | ||
43 | #include <limits.h> | ||
44 | #include <errno.h> | ||
45 | #include <getopt.h> | ||
46 | #include <sys/types.h> | ||
47 | #include <sys/wait.h> | ||
48 | #include <time.h> | ||
49 | |||
50 | /* exitstatus is used to keep track of any failing calls to kernel-doc, | ||
51 | * but execution continues. */ | ||
52 | int exitstatus = 0; | ||
53 | |||
54 | typedef void DFL(char *); | ||
55 | DFL *defaultline; | ||
56 | |||
57 | typedef void FILEONLY(char * file); | ||
58 | FILEONLY *internalfunctions; | ||
59 | FILEONLY *externalfunctions; | ||
60 | FILEONLY *symbolsonly; | ||
61 | FILEONLY *findall; | ||
62 | |||
63 | typedef void FILELINE(char * file, char * line); | ||
64 | FILELINE * singlefunctions; | ||
65 | FILELINE * entity_system; | ||
66 | FILELINE * docsection; | ||
67 | |||
68 | #define MAXLINESZ 2048 | ||
69 | #define MAXFILES 250 | ||
70 | #define KERNELDOCPATH "scripts/" | ||
71 | #define KERNELDOC "kernel-doc" | ||
72 | #define DOCBOOK "-docbook" | ||
73 | #define RST "-rst" | ||
74 | #define LIST "-list" | ||
75 | #define FUNCTION "-function" | ||
76 | #define NOFUNCTION "-nofunction" | ||
77 | #define NODOCSECTIONS "-no-doc-sections" | ||
78 | #define SHOWNOTFOUND "-show-not-found" | ||
79 | |||
80 | enum file_format { | ||
81 | FORMAT_AUTO, | ||
82 | FORMAT_DOCBOOK, | ||
83 | FORMAT_RST, | ||
84 | }; | ||
85 | |||
86 | static enum file_format file_format = FORMAT_AUTO; | ||
87 | |||
88 | #define KERNELDOC_FORMAT (file_format == FORMAT_RST ? RST : DOCBOOK) | ||
89 | |||
90 | static char *srctree, *kernsrctree; | ||
91 | |||
92 | static char **all_list = NULL; | ||
93 | static int all_list_len = 0; | ||
94 | |||
95 | static void consume_symbol(const char *sym) | ||
96 | { | ||
97 | int i; | ||
98 | |||
99 | for (i = 0; i < all_list_len; i++) { | ||
100 | if (!all_list[i]) | ||
101 | continue; | ||
102 | if (strcmp(sym, all_list[i])) | ||
103 | continue; | ||
104 | all_list[i] = NULL; | ||
105 | break; | ||
106 | } | ||
107 | } | ||
108 | |||
109 | static void usage (void) | ||
110 | { | ||
111 | fprintf(stderr, "Usage: docproc [{--docbook|--rst}] {doc|depend} file\n"); | ||
112 | fprintf(stderr, "Input is read from file.tmpl. Output is sent to stdout\n"); | ||
113 | fprintf(stderr, "doc: frontend when generating kernel documentation\n"); | ||
114 | fprintf(stderr, "depend: generate list of files referenced within file\n"); | ||
115 | fprintf(stderr, "Environment variable SRCTREE: absolute path to sources.\n"); | ||
116 | fprintf(stderr, " KBUILD_SRC: absolute path to kernel source tree.\n"); | ||
117 | } | ||
118 | |||
119 | /* | ||
120 | * Execute kernel-doc with parameters given in svec | ||
121 | */ | ||
122 | static void exec_kernel_doc(char **svec) | ||
123 | { | ||
124 | pid_t pid; | ||
125 | int ret; | ||
126 | char real_filename[PATH_MAX + 1]; | ||
127 | /* Make sure output generated so far are flushed */ | ||
128 | fflush(stdout); | ||
129 | switch (pid=fork()) { | ||
130 | case -1: | ||
131 | perror("fork"); | ||
132 | exit(1); | ||
133 | case 0: | ||
134 | memset(real_filename, 0, sizeof(real_filename)); | ||
135 | strncat(real_filename, kernsrctree, PATH_MAX); | ||
136 | strncat(real_filename, "/" KERNELDOCPATH KERNELDOC, | ||
137 | PATH_MAX - strlen(real_filename)); | ||
138 | execvp(real_filename, svec); | ||
139 | fprintf(stderr, "exec "); | ||
140 | perror(real_filename); | ||
141 | exit(1); | ||
142 | default: | ||
143 | waitpid(pid, &ret ,0); | ||
144 | } | ||
145 | if (WIFEXITED(ret)) | ||
146 | exitstatus |= WEXITSTATUS(ret); | ||
147 | else | ||
148 | exitstatus = 0xff; | ||
149 | } | ||
150 | |||
151 | /* Types used to create list of all exported symbols in a number of files */ | ||
152 | struct symbols | ||
153 | { | ||
154 | char *name; | ||
155 | }; | ||
156 | |||
157 | struct symfile | ||
158 | { | ||
159 | char *filename; | ||
160 | struct symbols *symbollist; | ||
161 | int symbolcnt; | ||
162 | }; | ||
163 | |||
164 | struct symfile symfilelist[MAXFILES]; | ||
165 | int symfilecnt = 0; | ||
166 | |||
167 | static void add_new_symbol(struct symfile *sym, char * symname) | ||
168 | { | ||
169 | sym->symbollist = | ||
170 | realloc(sym->symbollist, (sym->symbolcnt + 1) * sizeof(char *)); | ||
171 | sym->symbollist[sym->symbolcnt++].name = strdup(symname); | ||
172 | } | ||
173 | |||
174 | /* Add a filename to the list */ | ||
175 | static struct symfile * add_new_file(char * filename) | ||
176 | { | ||
177 | symfilelist[symfilecnt++].filename = strdup(filename); | ||
178 | return &symfilelist[symfilecnt - 1]; | ||
179 | } | ||
180 | |||
181 | /* Check if file already are present in the list */ | ||
182 | static struct symfile * filename_exist(char * filename) | ||
183 | { | ||
184 | int i; | ||
185 | for (i=0; i < symfilecnt; i++) | ||
186 | if (strcmp(symfilelist[i].filename, filename) == 0) | ||
187 | return &symfilelist[i]; | ||
188 | return NULL; | ||
189 | } | ||
190 | |||
191 | /* | ||
192 | * List all files referenced within the template file. | ||
193 | * Files are separated by tabs. | ||
194 | */ | ||
195 | static void adddep(char * file) { printf("\t%s", file); } | ||
196 | static void adddep2(char * file, char * line) { line = line; adddep(file); } | ||
197 | static void noaction(char * line) { line = line; } | ||
198 | static void noaction2(char * file, char * line) { file = file; line = line; } | ||
199 | |||
200 | /* Echo the line without further action */ | ||
201 | static void printline(char * line) { printf("%s", line); } | ||
202 | |||
203 | /* | ||
204 | * Find all symbols in filename that are exported with EXPORT_SYMBOL & | ||
205 | * EXPORT_SYMBOL_GPL (& EXPORT_SYMBOL_GPL_FUTURE implicitly). | ||
206 | * All symbols located are stored in symfilelist. | ||
207 | */ | ||
208 | static void find_export_symbols(char * filename) | ||
209 | { | ||
210 | FILE * fp; | ||
211 | struct symfile *sym; | ||
212 | char line[MAXLINESZ]; | ||
213 | if (filename_exist(filename) == NULL) { | ||
214 | char real_filename[PATH_MAX + 1]; | ||
215 | memset(real_filename, 0, sizeof(real_filename)); | ||
216 | strncat(real_filename, srctree, PATH_MAX); | ||
217 | strncat(real_filename, "/", PATH_MAX - strlen(real_filename)); | ||
218 | strncat(real_filename, filename, | ||
219 | PATH_MAX - strlen(real_filename)); | ||
220 | sym = add_new_file(filename); | ||
221 | fp = fopen(real_filename, "r"); | ||
222 | if (fp == NULL) { | ||
223 | fprintf(stderr, "docproc: "); | ||
224 | perror(real_filename); | ||
225 | exit(1); | ||
226 | } | ||
227 | while (fgets(line, MAXLINESZ, fp)) { | ||
228 | char *p; | ||
229 | char *e; | ||
230 | if (((p = strstr(line, "EXPORT_SYMBOL_GPL")) != NULL) || | ||
231 | ((p = strstr(line, "EXPORT_SYMBOL")) != NULL)) { | ||
232 | /* Skip EXPORT_SYMBOL{_GPL} */ | ||
233 | while (isalnum(*p) || *p == '_') | ||
234 | p++; | ||
235 | /* Remove parentheses & additional whitespace */ | ||
236 | while (isspace(*p)) | ||
237 | p++; | ||
238 | if (*p != '(') | ||
239 | continue; /* Syntax error? */ | ||
240 | else | ||
241 | p++; | ||
242 | while (isspace(*p)) | ||
243 | p++; | ||
244 | e = p; | ||
245 | while (isalnum(*e) || *e == '_') | ||
246 | e++; | ||
247 | *e = '\0'; | ||
248 | add_new_symbol(sym, p); | ||
249 | } | ||
250 | } | ||
251 | fclose(fp); | ||
252 | } | ||
253 | } | ||
254 | |||
255 | /* | ||
256 | * Document all external or internal functions in a file. | ||
257 | * Call kernel-doc with following parameters: | ||
258 | * kernel-doc [-docbook|-rst] -nofunction function_name1 filename | ||
259 | * Function names are obtained from all the src files | ||
260 | * by find_export_symbols. | ||
261 | * intfunc uses -nofunction | ||
262 | * extfunc uses -function | ||
263 | */ | ||
264 | static void docfunctions(char * filename, char * type) | ||
265 | { | ||
266 | int i,j; | ||
267 | int symcnt = 0; | ||
268 | int idx = 0; | ||
269 | char **vec; | ||
270 | |||
271 | for (i=0; i <= symfilecnt; i++) | ||
272 | symcnt += symfilelist[i].symbolcnt; | ||
273 | vec = malloc((2 + 2 * symcnt + 3) * sizeof(char *)); | ||
274 | if (vec == NULL) { | ||
275 | perror("docproc: "); | ||
276 | exit(1); | ||
277 | } | ||
278 | vec[idx++] = KERNELDOC; | ||
279 | vec[idx++] = KERNELDOC_FORMAT; | ||
280 | vec[idx++] = NODOCSECTIONS; | ||
281 | for (i=0; i < symfilecnt; i++) { | ||
282 | struct symfile * sym = &symfilelist[i]; | ||
283 | for (j=0; j < sym->symbolcnt; j++) { | ||
284 | vec[idx++] = type; | ||
285 | consume_symbol(sym->symbollist[j].name); | ||
286 | vec[idx++] = sym->symbollist[j].name; | ||
287 | } | ||
288 | } | ||
289 | vec[idx++] = filename; | ||
290 | vec[idx] = NULL; | ||
291 | if (file_format == FORMAT_RST) | ||
292 | printf(".. %s\n", filename); | ||
293 | else | ||
294 | printf("<!-- %s -->\n", filename); | ||
295 | exec_kernel_doc(vec); | ||
296 | fflush(stdout); | ||
297 | free(vec); | ||
298 | } | ||
299 | static void intfunc(char * filename) { docfunctions(filename, NOFUNCTION); } | ||
300 | static void extfunc(char * filename) { docfunctions(filename, FUNCTION); } | ||
301 | |||
302 | /* | ||
303 | * Document specific function(s) in a file. | ||
304 | * Call kernel-doc with the following parameters: | ||
305 | * kernel-doc -docbook -function function1 [-function function2] | ||
306 | */ | ||
307 | static void singfunc(char * filename, char * line) | ||
308 | { | ||
309 | char *vec[200]; /* Enough for specific functions */ | ||
310 | int i, idx = 0; | ||
311 | int startofsym = 1; | ||
312 | vec[idx++] = KERNELDOC; | ||
313 | vec[idx++] = KERNELDOC_FORMAT; | ||
314 | vec[idx++] = SHOWNOTFOUND; | ||
315 | |||
316 | /* Split line up in individual parameters preceded by FUNCTION */ | ||
317 | for (i=0; line[i]; i++) { | ||
318 | if (isspace(line[i])) { | ||
319 | line[i] = '\0'; | ||
320 | startofsym = 1; | ||
321 | continue; | ||
322 | } | ||
323 | if (startofsym) { | ||
324 | startofsym = 0; | ||
325 | vec[idx++] = FUNCTION; | ||
326 | vec[idx++] = &line[i]; | ||
327 | } | ||
328 | } | ||
329 | for (i = 0; i < idx; i++) { | ||
330 | if (strcmp(vec[i], FUNCTION)) | ||
331 | continue; | ||
332 | consume_symbol(vec[i + 1]); | ||
333 | } | ||
334 | vec[idx++] = filename; | ||
335 | vec[idx] = NULL; | ||
336 | exec_kernel_doc(vec); | ||
337 | } | ||
338 | |||
339 | /* | ||
340 | * Insert specific documentation section from a file. | ||
341 | * Call kernel-doc with the following parameters: | ||
342 | * kernel-doc -docbook -function "doc section" filename | ||
343 | */ | ||
344 | static void docsect(char *filename, char *line) | ||
345 | { | ||
346 | /* kerneldoc -docbook -show-not-found -function "section" file NULL */ | ||
347 | char *vec[7]; | ||
348 | char *s; | ||
349 | |||
350 | for (s = line; *s; s++) | ||
351 | if (*s == '\n') | ||
352 | *s = '\0'; | ||
353 | |||
354 | if (asprintf(&s, "DOC: %s", line) < 0) { | ||
355 | perror("asprintf"); | ||
356 | exit(1); | ||
357 | } | ||
358 | consume_symbol(s); | ||
359 | free(s); | ||
360 | |||
361 | vec[0] = KERNELDOC; | ||
362 | vec[1] = KERNELDOC_FORMAT; | ||
363 | vec[2] = SHOWNOTFOUND; | ||
364 | vec[3] = FUNCTION; | ||
365 | vec[4] = line; | ||
366 | vec[5] = filename; | ||
367 | vec[6] = NULL; | ||
368 | exec_kernel_doc(vec); | ||
369 | } | ||
370 | |||
371 | static void find_all_symbols(char *filename) | ||
372 | { | ||
373 | char *vec[4]; /* kerneldoc -list file NULL */ | ||
374 | pid_t pid; | ||
375 | int ret, i, count, start; | ||
376 | char real_filename[PATH_MAX + 1]; | ||
377 | int pipefd[2]; | ||
378 | char *data, *str; | ||
379 | size_t data_len = 0; | ||
380 | |||
381 | vec[0] = KERNELDOC; | ||
382 | vec[1] = LIST; | ||
383 | vec[2] = filename; | ||
384 | vec[3] = NULL; | ||
385 | |||
386 | if (pipe(pipefd)) { | ||
387 | perror("pipe"); | ||
388 | exit(1); | ||
389 | } | ||
390 | |||
391 | switch (pid=fork()) { | ||
392 | case -1: | ||
393 | perror("fork"); | ||
394 | exit(1); | ||
395 | case 0: | ||
396 | close(pipefd[0]); | ||
397 | dup2(pipefd[1], 1); | ||
398 | memset(real_filename, 0, sizeof(real_filename)); | ||
399 | strncat(real_filename, kernsrctree, PATH_MAX); | ||
400 | strncat(real_filename, "/" KERNELDOCPATH KERNELDOC, | ||
401 | PATH_MAX - strlen(real_filename)); | ||
402 | execvp(real_filename, vec); | ||
403 | fprintf(stderr, "exec "); | ||
404 | perror(real_filename); | ||
405 | exit(1); | ||
406 | default: | ||
407 | close(pipefd[1]); | ||
408 | data = malloc(4096); | ||
409 | do { | ||
410 | while ((ret = read(pipefd[0], | ||
411 | data + data_len, | ||
412 | 4096)) > 0) { | ||
413 | data_len += ret; | ||
414 | data = realloc(data, data_len + 4096); | ||
415 | } | ||
416 | } while (ret == -EAGAIN); | ||
417 | if (ret != 0) { | ||
418 | perror("read"); | ||
419 | exit(1); | ||
420 | } | ||
421 | waitpid(pid, &ret ,0); | ||
422 | } | ||
423 | if (WIFEXITED(ret)) | ||
424 | exitstatus |= WEXITSTATUS(ret); | ||
425 | else | ||
426 | exitstatus = 0xff; | ||
427 | |||
428 | count = 0; | ||
429 | /* poor man's strtok, but with counting */ | ||
430 | for (i = 0; i < data_len; i++) { | ||
431 | if (data[i] == '\n') { | ||
432 | count++; | ||
433 | data[i] = '\0'; | ||
434 | } | ||
435 | } | ||
436 | start = all_list_len; | ||
437 | all_list_len += count; | ||
438 | all_list = realloc(all_list, sizeof(char *) * all_list_len); | ||
439 | str = data; | ||
440 | for (i = 0; i < data_len && start != all_list_len; i++) { | ||
441 | if (data[i] == '\0') { | ||
442 | all_list[start] = str; | ||
443 | str = data + i + 1; | ||
444 | start++; | ||
445 | } | ||
446 | } | ||
447 | } | ||
448 | |||
449 | /* | ||
450 | * Terminate s at first space, if any. If there was a space, return pointer to | ||
451 | * the character after that. Otherwise, return pointer to the terminating NUL. | ||
452 | */ | ||
453 | static char *chomp(char *s) | ||
454 | { | ||
455 | while (*s && !isspace(*s)) | ||
456 | s++; | ||
457 | |||
458 | if (*s) | ||
459 | *s++ = '\0'; | ||
460 | |||
461 | return s; | ||
462 | } | ||
463 | |||
464 | /* Return pointer to directive content, or NULL if not a directive. */ | ||
465 | static char *is_directive(char *line) | ||
466 | { | ||
467 | if (file_format == FORMAT_DOCBOOK && line[0] == '!') | ||
468 | return line + 1; | ||
469 | else if (file_format == FORMAT_RST && !strncmp(line, ".. !", 4)) | ||
470 | return line + 4; | ||
471 | |||
472 | return NULL; | ||
473 | } | ||
474 | |||
475 | /* | ||
476 | * Parse file, calling action specific functions for: | ||
477 | * 1) Lines containing !E | ||
478 | * 2) Lines containing !I | ||
479 | * 3) Lines containing !D | ||
480 | * 4) Lines containing !F | ||
481 | * 5) Lines containing !P | ||
482 | * 6) Lines containing !C | ||
483 | * 7) Default lines - lines not matching the above | ||
484 | */ | ||
485 | static void parse_file(FILE *infile) | ||
486 | { | ||
487 | char line[MAXLINESZ]; | ||
488 | char *p, *s; | ||
489 | while (fgets(line, MAXLINESZ, infile)) { | ||
490 | p = is_directive(line); | ||
491 | if (!p) { | ||
492 | defaultline(line); | ||
493 | continue; | ||
494 | } | ||
495 | |||
496 | switch (*p++) { | ||
497 | case 'E': | ||
498 | chomp(p); | ||
499 | externalfunctions(p); | ||
500 | break; | ||
501 | case 'I': | ||
502 | chomp(p); | ||
503 | internalfunctions(p); | ||
504 | break; | ||
505 | case 'D': | ||
506 | chomp(p); | ||
507 | symbolsonly(p); | ||
508 | break; | ||
509 | case 'F': | ||
510 | /* filename */ | ||
511 | s = chomp(p); | ||
512 | /* function names */ | ||
513 | while (isspace(*s)) | ||
514 | s++; | ||
515 | singlefunctions(p, s); | ||
516 | break; | ||
517 | case 'P': | ||
518 | /* filename */ | ||
519 | s = chomp(p); | ||
520 | /* DOC: section name */ | ||
521 | while (isspace(*s)) | ||
522 | s++; | ||
523 | docsection(p, s); | ||
524 | break; | ||
525 | case 'C': | ||
526 | chomp(p); | ||
527 | if (findall) | ||
528 | findall(p); | ||
529 | break; | ||
530 | default: | ||
531 | defaultline(line); | ||
532 | } | ||
533 | } | ||
534 | fflush(stdout); | ||
535 | } | ||
536 | |||
537 | /* | ||
538 | * Is this a RestructuredText template? Answer the question by seeing if its | ||
539 | * name ends in ".rst". | ||
540 | */ | ||
541 | static int is_rst(const char *file) | ||
542 | { | ||
543 | char *dot = strrchr(file, '.'); | ||
544 | |||
545 | return dot && !strcmp(dot + 1, "rst"); | ||
546 | } | ||
547 | |||
548 | enum opts { | ||
549 | OPT_DOCBOOK, | ||
550 | OPT_RST, | ||
551 | OPT_HELP, | ||
552 | }; | ||
553 | |||
554 | int main(int argc, char *argv[]) | ||
555 | { | ||
556 | const char *subcommand, *filename; | ||
557 | FILE * infile; | ||
558 | int i; | ||
559 | |||
560 | srctree = getenv("SRCTREE"); | ||
561 | if (!srctree) | ||
562 | srctree = getcwd(NULL, 0); | ||
563 | kernsrctree = getenv("KBUILD_SRC"); | ||
564 | if (!kernsrctree || !*kernsrctree) | ||
565 | kernsrctree = srctree; | ||
566 | |||
567 | for (;;) { | ||
568 | int c; | ||
569 | struct option opts[] = { | ||
570 | { "docbook", no_argument, NULL, OPT_DOCBOOK }, | ||
571 | { "rst", no_argument, NULL, OPT_RST }, | ||
572 | { "help", no_argument, NULL, OPT_HELP }, | ||
573 | {} | ||
574 | }; | ||
575 | |||
576 | c = getopt_long_only(argc, argv, "", opts, NULL); | ||
577 | if (c == -1) | ||
578 | break; | ||
579 | |||
580 | switch (c) { | ||
581 | case OPT_DOCBOOK: | ||
582 | file_format = FORMAT_DOCBOOK; | ||
583 | break; | ||
584 | case OPT_RST: | ||
585 | file_format = FORMAT_RST; | ||
586 | break; | ||
587 | case OPT_HELP: | ||
588 | usage(); | ||
589 | return 0; | ||
590 | default: | ||
591 | case '?': | ||
592 | usage(); | ||
593 | return 1; | ||
594 | } | ||
595 | } | ||
596 | |||
597 | argc -= optind; | ||
598 | argv += optind; | ||
599 | |||
600 | if (argc != 2) { | ||
601 | usage(); | ||
602 | exit(1); | ||
603 | } | ||
604 | |||
605 | subcommand = argv[0]; | ||
606 | filename = argv[1]; | ||
607 | |||
608 | if (file_format == FORMAT_AUTO) | ||
609 | file_format = is_rst(filename) ? FORMAT_RST : FORMAT_DOCBOOK; | ||
610 | |||
611 | /* Open file, exit on error */ | ||
612 | infile = fopen(filename, "r"); | ||
613 | if (infile == NULL) { | ||
614 | fprintf(stderr, "docproc: "); | ||
615 | perror(filename); | ||
616 | exit(2); | ||
617 | } | ||
618 | |||
619 | if (strcmp("doc", subcommand) == 0) { | ||
620 | if (file_format == FORMAT_RST) { | ||
621 | time_t t = time(NULL); | ||
622 | printf(".. generated from %s by docproc %s\n", | ||
623 | filename, ctime(&t)); | ||
624 | } | ||
625 | |||
626 | /* Need to do this in two passes. | ||
627 | * First pass is used to collect all symbols exported | ||
628 | * in the various files; | ||
629 | * Second pass generate the documentation. | ||
630 | * This is required because some functions are declared | ||
631 | * and exported in different files :-(( | ||
632 | */ | ||
633 | /* Collect symbols */ | ||
634 | defaultline = noaction; | ||
635 | internalfunctions = find_export_symbols; | ||
636 | externalfunctions = find_export_symbols; | ||
637 | symbolsonly = find_export_symbols; | ||
638 | singlefunctions = noaction2; | ||
639 | docsection = noaction2; | ||
640 | findall = find_all_symbols; | ||
641 | parse_file(infile); | ||
642 | |||
643 | /* Rewind to start from beginning of file again */ | ||
644 | fseek(infile, 0, SEEK_SET); | ||
645 | defaultline = printline; | ||
646 | internalfunctions = intfunc; | ||
647 | externalfunctions = extfunc; | ||
648 | symbolsonly = printline; | ||
649 | singlefunctions = singfunc; | ||
650 | docsection = docsect; | ||
651 | findall = NULL; | ||
652 | |||
653 | parse_file(infile); | ||
654 | |||
655 | for (i = 0; i < all_list_len; i++) { | ||
656 | if (!all_list[i]) | ||
657 | continue; | ||
658 | fprintf(stderr, "Warning: didn't use docs for %s\n", | ||
659 | all_list[i]); | ||
660 | } | ||
661 | } else if (strcmp("depend", subcommand) == 0) { | ||
662 | /* Create first part of dependency chain | ||
663 | * file.tmpl */ | ||
664 | printf("%s\t", filename); | ||
665 | defaultline = noaction; | ||
666 | internalfunctions = adddep; | ||
667 | externalfunctions = adddep; | ||
668 | symbolsonly = adddep; | ||
669 | singlefunctions = adddep2; | ||
670 | docsection = adddep2; | ||
671 | findall = adddep; | ||
672 | parse_file(infile); | ||
673 | printf("\n"); | ||
674 | } else { | ||
675 | fprintf(stderr, "Unknown option: %s\n", subcommand); | ||
676 | exit(1); | ||
677 | } | ||
678 | fclose(infile); | ||
679 | fflush(stdout); | ||
680 | return exitstatus; | ||
681 | } | ||
diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index a26a5f2dce39..c1ffd31ff423 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -2189,6 +2189,8 @@ sub dump_struct($$) {
2189 | $members =~ s/\s*CRYPTO_MINALIGN_ATTR//gos; | 2189 | $members =~ s/\s*CRYPTO_MINALIGN_ATTR//gos; |
2190 | # replace DECLARE_BITMAP | 2190 | # replace DECLARE_BITMAP |
2191 | $members =~ s/DECLARE_BITMAP\s*\(([^,)]+), ([^,)]+)\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; | 2191 | $members =~ s/DECLARE_BITMAP\s*\(([^,)]+), ([^,)]+)\)/unsigned long $1\[BITS_TO_LONGS($2)\]/gos; |
2192 | # replace DECLARE_HASHTABLE | ||
2193 | $members =~ s/DECLARE_HASHTABLE\s*\(([^,)]+), ([^,)]+)\)/unsigned long $1\[1 << (($2) - 1)\]/gos; | ||
2192 | 2194 | ||
2193 | create_parameterlist($members, ';', $file); | 2195 | create_parameterlist($members, ';', $file); |
2194 | check_sections($file, $declaration_name, "struct", $sectcheck, $struct_actual, $nested); | 2196 | check_sections($file, $declaration_name, "struct", $sectcheck, $struct_actual, $nested); |
diff --git a/scripts/kernel-doc-xml-ref b/scripts/kernel-doc-xml-ref
deleted file mode 100755
index 104a5a5ba2c8..000000000000
--- a/scripts/kernel-doc-xml-ref
+++ /dev/null
@@ -1,198 +0,0 @@
1 | #!/usr/bin/perl -w | ||
2 | |||
3 | use strict; | ||
4 | |||
5 | ## Copyright (C) 2015 Intel Corporation ## | ||
6 | # ## | ||
7 | ## This software falls under the GNU General Public License. ## | ||
8 | ## Please read the COPYING file for more information ## | ||
9 | # | ||
10 | # | ||
11 | # This software reads a XML file and a list of valid interal | ||
12 | # references to replace Docbook tags with links. | ||
13 | # | ||
14 | # The list of "valid internal references" must be one-per-line in the following format: | ||
15 | # API-struct-foo | ||
16 | # API-enum-bar | ||
17 | # API-my-function | ||
18 | # | ||
19 | # The software walks over the XML file looking for xml tags representing possible references | ||
20 | # to the Document. Each reference will be cross checked against the "Valid Internal Reference" list. If | ||
21 | # the referece is found it replaces its content by a <link> tag. | ||
22 | # | ||
23 | # usage: | ||
24 | # kernel-doc-xml-ref -db filename | ||
25 | # xml filename > outputfile | ||
26 | |||
27 | # read arguments | ||
28 | if ($#ARGV != 2) { | ||
29 | usage(); | ||
30 | } | ||
31 | |||
32 | #Holds the database filename | ||
33 | my $databasefile; | ||
34 | my @database; | ||
35 | |||
36 | #holds the inputfile | ||
37 | my $inputfile; | ||
38 | my $errors = 0; | ||
39 | |||
40 | my %highlights = ( | ||
41 | "<function>(.*?)</function>", | ||
42 | "\"<function>\" . convert_function(\$1, \$line) . \"</function>\"", | ||
43 | "<structname>(.*?)</structname>", | ||
44 | "\"<structname>\" . convert_struct(\$1) . \"</structname>\"", | ||
45 | "<funcdef>(.*?)<function>(.*?)</function></funcdef>", | ||
46 | "\"<funcdef>\" . convert_param(\$1) . \"<function>\$2</function></funcdef>\"", | ||
47 | "<paramdef>(.*?)<parameter>(.*?)</parameter></paramdef>", | ||
48 | "\"<paramdef>\" . convert_param(\$1) . \"<parameter>\$2</parameter></paramdef>\""); | ||
49 | |||
50 | while($ARGV[0] =~ m/^-(.*)/) { | ||
51 | my $cmd = shift @ARGV; | ||
52 | if ($cmd eq "-db") { | ||
53 | $databasefile = shift @ARGV | ||
54 | } else { | ||
55 | usage(); | ||
56 | } | ||
57 | } | ||
58 | $inputfile = shift @ARGV; | ||
59 | |||
60 | sub open_database { | ||
61 | open (my $handle, '<', $databasefile) or die "Cannot open $databasefile"; | ||
62 | chomp(my @lines = <$handle>); | ||
63 | close $handle; | ||
64 | |||
65 | @database = @lines; | ||
66 | } | ||
67 | |||
68 | sub process_file { | ||
69 | open_database(); | ||
70 | |||
71 | my $dohighlight; | ||
72 | foreach my $pattern (keys %highlights) { | ||
73 | $dohighlight .= "\$line =~ s:$pattern:$highlights{$pattern}:eg;\n"; | ||
74 | } | ||
75 | |||
76 | open(FILE, $inputfile) or die("Could not open $inputfile") or die ("Cannot open $inputfile"); | ||
77 | foreach my $line (<FILE>) { | ||
78 | eval $dohighlight; | ||
79 | print $line; | ||
80 | } | ||
81 | } | ||
82 | |||
83 | sub trim($_) | ||
84 | { | ||
85 | my $str = $_[0]; | ||
86 | $str =~ s/^\s+|\s+$//g; | ||
87 | return $str | ||
88 | } | ||
89 | |||
90 | sub has_key_defined($_) | ||
91 | { | ||
92 | if ( grep( /^$_[0]$/, @database)) { | ||
93 | return 1; | ||
94 | } | ||
95 | return 0; | ||
96 | } | ||
97 | |||
98 | # Gets a <function> content and add it a hyperlink if possible. | ||
99 | sub convert_function($_) | ||
100 | { | ||
101 | my $arg = $_[0]; | ||
102 | my $key = $_[0]; | ||
103 | |||
104 | my $line = $_[1]; | ||
105 | |||
106 | $key = trim($key); | ||
107 | |||
108 | $key =~ s/[^A-Za-z0-9]/-/g; | ||
109 | $key = "API-" . $key; | ||
110 | |||
111 | # We shouldn't add links to <funcdef> prototype | ||
112 | if (!has_key_defined($key) || $line =~ m/\s+<funcdef/i) { | ||
113 | return $arg; | ||
114 | } | ||
115 | |||
116 | my $head = $arg; | ||
117 | my $tail = ""; | ||
118 | if ($arg =~ /(.*?)( ?)$/) { | ||
119 | $head = $1; | ||
120 | $tail = $2; | ||
121 | } | ||
122 | return "<link linkend=\"$key\">$head</link>$tail"; | ||
123 | } | ||
124 | |||
125 | # Converting a struct text to link | ||
126 | sub convert_struct($_) | ||
127 | { | ||
128 | my $arg = $_[0]; | ||
129 | my $key = $_[0]; | ||
130 | $key =~ s/(struct )?(\w)/$2/g; | ||
131 | $key =~ s/[^A-Za-z0-9]/-/g; | ||
132 | $key = "API-struct-" . $key; | ||
133 | |||
134 | if (!has_key_defined($key)) { | ||
135 | return $arg; | ||
136 | } | ||
137 | |||
138 | my ($head, $tail) = split_pointer($arg); | ||
139 | return "<link linkend=\"$key\">$head</link>$tail"; | ||
140 | } | ||
141 | |||
142 | # Identify "object *" elements | ||
143 | sub split_pointer($_) | ||
144 | { | ||
145 | my $arg = $_[0]; | ||
146 | if ($arg =~ /(.*?)( ?\* ?)/) { | ||
147 | return ($1, $2); | ||
148 | } | ||
149 | return ($arg, ""); | ||
150 | } | ||
151 | |||
152 | sub convert_param($_) | ||
153 | { | ||
154 | my $type = $_[0]; | ||
155 | my $keyname = convert_key_name($type); | ||
156 | |||
157 | if (!has_key_defined($keyname)) { | ||
158 | return $type; | ||
159 | } | ||
160 | |||
161 | my ($head, $tail) = split_pointer($type); | ||
162 | return "<link linkend=\"$keyname\">$head</link>$tail"; | ||
163 | |||
164 | } | ||
165 | |||
166 | # DocBook links are in the API-<TYPE>-<STRUCT-NAME> format | ||
167 | # This method gets an element and returns a valid DocBook reference for it. | ||
168 | sub convert_key_name($_) | ||
169 | { | ||
170 | #Pattern $2 is optional and might be uninitialized | ||
171 | no warnings 'uninitialized'; | ||
172 | |||
173 | my $str = $_[0]; | ||
174 | $str =~ s/(const|static)? ?(struct)? ?([a-zA-Z0-9_]+) ?(\*|&)?/$2 $3/g ; | ||
175 | |||
176 | # trim | ||
177 | $str =~ s/^\s+|\s+$//g; | ||
178 | |||
179 | # spaces and _ to - | ||
180 | $str =~ s/[^A-Za-z0-9]/-/g; | ||
181 | |||
182 | return "API-" . $str; | ||
183 | } | ||
184 | |||
185 | sub usage { | ||
186 | print "Usage: $0 -db database filename\n"; | ||
187 | print " xml source file(s) > outputfile\n"; | ||
188 | exit 1; | ||
189 | } | ||
190 | |||
191 | # starting point | ||
192 | process_file(); | ||
193 | |||
194 | if ($errors) { | ||
195 | print STDERR "$errors errors\n"; | ||
196 | } | ||
197 | |||
198 | exit($errors); | ||
diff --git a/scripts/selinux/README b/scripts/selinux/README
index 4d020ecb7524..5ba679c5be18 100644
--- a/scripts/selinux/README
+++ b/scripts/selinux/README
@@ -1,2 +1,2 @@
1 | Please see Documentation/security/SELinux.txt for information on | 1 | Please see Documentation/admin-guide/LSM/SELinux.rst for information on |
2 | installing a dummy SELinux policy. | 2 | installing a dummy SELinux policy. |
diff --git a/security/apparmor/match.c b/security/apparmor/match.c
index 960c913381e2..72c604350e80 100644
--- a/security/apparmor/match.c
+++ b/security/apparmor/match.c
@@ -226,7 +226,7 @@ void aa_dfa_free_kref(struct kref *kref)
226 | * @flags: flags controlling what type of accept tables are acceptable | 226 | * @flags: flags controlling what type of accept tables are acceptable |
227 | * | 227 | * |
228 | * Unpack a dfa that has been serialized. To find information on the dfa | 228 | * Unpack a dfa that has been serialized. To find information on the dfa |
229 | * format look in Documentation/security/apparmor.txt | 229 | * format look in Documentation/admin-guide/LSM/apparmor.rst |
230 | * Assumes the dfa @blob stream has been aligned on a 8 byte boundary | 230 | * Assumes the dfa @blob stream has been aligned on a 8 byte boundary |
231 | * | 231 | * |
232 | * Returns: an unpacked dfa ready for matching or ERR_PTR on failure | 232 | * Returns: an unpacked dfa ready for matching or ERR_PTR on failure |
diff --git a/security/apparmor/policy_unpack.c b/security/apparmor/policy_unpack.c
index f3422a91353c..981d570eebba 100644
--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -13,7 +13,7 @@
13 | * License. | 13 | * License. |
14 | * | 14 | * |
15 | * AppArmor uses a serialized binary format for loading policy. To find | 15 | * AppArmor uses a serialized binary format for loading policy. To find |
16 | * policy format documentation look in Documentation/security/apparmor.txt | 16 | * policy format documentation see Documentation/admin-guide/LSM/apparmor.rst |
17 | * All policy is validated before it is used. | 17 | * All policy is validated before it is used. |
18 | */ | 18 | */ |
19 | 19 | ||
diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index bb6324d1ccec..69855ba0d3b3 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -11,7 +11,7 @@
11 | * it under the terms of the GNU General Public License as published by | 11 | * it under the terms of the GNU General Public License as published by |
12 | * the Free Software Foundation, version 2 of the License. | 12 | * the Free Software Foundation, version 2 of the License. |
13 | * | 13 | * |
14 | * See Documentation/security/keys-trusted-encrypted.txt | 14 | * See Documentation/security/keys/trusted-encrypted.rst |
15 | */ | 15 | */ |
16 | 16 | ||
17 | #include <linux/uaccess.h> | 17 | #include <linux/uaccess.h> |
diff --git a/security/keys/encrypted-keys/masterkey_trusted.c b/security/keys/encrypted-keys/masterkey_trusted.c
index b5b4812dbc87..cbf0bc127a73 100644
--- a/security/keys/encrypted-keys/masterkey_trusted.c
+++ b/security/keys/encrypted-keys/masterkey_trusted.c
@@ -11,7 +11,7 @@
11 | * it under the terms of the GNU General Public License as published by | 11 | * it under the terms of the GNU General Public License as published by |
12 | * the Free Software Foundation, version 2 of the License. | 12 | * the Free Software Foundation, version 2 of the License. |
13 | * | 13 | * |
14 | * See Documentation/security/keys-trusted-encrypted.txt | 14 | * See Documentation/security/keys/trusted-encrypted.rst |
15 | */ | 15 | */ |
16 | 16 | ||
17 | #include <linux/uaccess.h> | 17 | #include <linux/uaccess.h> |
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 9822e500d50d..63e63a42db3c 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -8,7 +8,7 @@
8 | * as published by the Free Software Foundation; either version | 8 | * as published by the Free Software Foundation; either version |
9 | * 2 of the License, or (at your option) any later version. | 9 | * 2 of the License, or (at your option) any later version. |
10 | * | 10 | * |
11 | * See Documentation/security/keys-request-key.txt | 11 | * See Documentation/security/keys/request-key.rst |
12 | */ | 12 | */ |
13 | 13 | ||
14 | #include <linux/module.h> | 14 | #include <linux/module.h> |
diff --git a/security/keys/request_key_auth.c b/security/keys/request_key_auth.c
index 0f062156dfb2..afe9d22ab361 100644
--- a/security/keys/request_key_auth.c
+++ b/security/keys/request_key_auth.c
@@ -8,7 +8,7 @@
8 | * as published by the Free Software Foundation; either version | 8 | * as published by the Free Software Foundation; either version |
9 | * 2 of the License, or (at your option) any later version. | 9 | * 2 of the License, or (at your option) any later version. |
10 | * | 10 | * |
11 | * See Documentation/security/keys-request-key.txt | 11 | * See Documentation/security/keys/request-key.rst |
12 | */ | 12 | */ |
13 | 13 | ||
14 | #include <linux/module.h> | 14 | #include <linux/module.h> |
diff --git a/security/keys/trusted.c b/security/keys/trusted.c
index 435e86e13879..ddfaebf60fc8 100644
--- a/security/keys/trusted.c
+++ b/security/keys/trusted.c
@@ -8,7 +8,7 @@
8 | * it under the terms of the GNU General Public License as published by | 8 | * it under the terms of the GNU General Public License as published by |
9 | * the Free Software Foundation, version 2 of the License. | 9 | * the Free Software Foundation, version 2 of the License. |
10 | * | 10 | * |
11 | * See Documentation/security/keys-trusted-encrypted.txt | 11 | * See Documentation/security/keys/trusted-encrypted.rst |
12 | */ | 12 | */ |
13 | 13 | ||
14 | #include <crypto/hash_info.h> | 14 | #include <crypto/hash_info.h> |
diff --git a/security/yama/Kconfig b/security/yama/Kconfig
index 90c605eea892..96b27405558a 100644
--- a/security/yama/Kconfig
+++ b/security/yama/Kconfig
@@ -7,6 +7,7 @@ config SECURITY_YAMA
7 | system-wide security settings beyond regular Linux discretionary | 7 | system-wide security settings beyond regular Linux discretionary |
8 | access controls. Currently available is ptrace scope restriction. | 8 | access controls. Currently available is ptrace scope restriction. |
9 | Like capabilities, this security module stacks with other LSMs. | 9 | Like capabilities, this security module stacks with other LSMs. |
10 | Further information can be found in Documentation/security/Yama.txt. | 10 | Further information can be found in |
11 | Documentation/admin-guide/LSM/Yama.rst. | ||
11 | 12 | ||
12 | If you are unsure how to answer this question, answer N. | 13 | If you are unsure how to answer this question, answer N. |