Documentation: add How to avoid botching up ioctls

I pointed some folks at this and they wondered why it wasn't in the kernel Documentation directory. So now it is. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Michael Ellerman <mpe@ellerman.id.au> 2014-08-06 02:32:17 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2014-08-09 12:13:40 -0400
commit: efe4a772215869e780e69d0037dca50e34aee6a7 (patch)
tree: 6538ee03d03a4bc15b5dbb104688b9e35309c819
parent: c309bfa9b481e7dbd3e1ab819271bf3009f44859 (diff)
2 files changed, 221 insertions, 0 deletions
diff --git a/Documentation/ioctl/00-INDEX b/Documentation/ioctl/00-INDEX
index d2fe4d4729ef..c1a925787950 100644
--- a/Documentation/ioctl/00-INDEX
+++ b/Documentation/ioctl/00-INDEX
@@ -1,5 +1,7 @@
 00-INDEX
        - this file
+botching-up-ioctls.txt
+        - how to avoid botching up ioctls
 cdrom.txt
        - summary of CDROM ioctl calls
 hdio.txt
diff --git a/Documentation/ioctl/botching-up-ioctls.txt b/Documentation/ioctl/botching-up-ioctls.txt
new file mode 100644
index 000000000000..45fe78c58019
--- /dev/null
+++ b/Documentation/ioctl/botching-up-ioctls.txt
@@ -0,0 +1,219 @@
+(How to avoid) Botching up ioctls
+=================================
+From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
+By: Daniel Vetter, Copyright © 2013 Intel Corporation
+One clear insight kernel graphics hackers gained in the past few years is that
+trying to come up with a unified interface to manage the execution units and
+memory on completely different GPUs is a futile effort. So nowadays every
+driver has its own set of ioctls to allocate memory and submit work to the GPU.
+Which is nice, since there's no more insanity in the form of fake-generic, but
+actually only used once interfaces. But the clear downside is that there's much
+more potential to screw things up.
+To avoid repeating all the same mistakes again I've written up some of the
+lessons learned while botching the job for the drm/i915 driver. Most of these
+only cover technicalities and not the big-picture issues like what the command
+submission ioctl exactly should look like. Learning these lessons is probably
+something every GPU driver has to do on its own.
+Prerequisites
+-------------
+First the prerequisites. Without these you have already failed, because you
+will need to add a a 32-bit compat layer:
+ * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
+   the kernel has special types like __u32, __s64. Use them.
+ * Align everything to the natural size and use explicit padding. 32-bit
+   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
+   64-bit platforms do. So we always need padding to the natural size to get
+   this right.
+ * Pad the entire struct to a multiple of 64-bits - the structure size will
+   otherwise differ on 32-bit versus 64-bit. Having a different structure size
+   hurts when passing arrays of structures to the kernel, or if the kernel
+   checks the structure size, which e.g. the drm core does.
+ * Pointers are __u64, cast from/to a uintprt_t on the userspace side and
+   from/to a void __user * in the kernel. Try really hard not to delay this
+   conversion or worse, fiddle the raw __u64 through your code since that
+   diminishes the checking tools like sparse can provide.
+Basics
+------
+With the joys of writing a compat layer avoided we can take a look at the basic
+fumbles. Neglecting these will make backward and forward compatibility a real
+pain. And since getting things wrong on the first attempt is guaranteed you
+will have a second iteration or at least an extension for any given interface.
+ * Have a clear way for userspace to figure out whether your new ioctl or ioctl
+   extension is supported on a given kernel. If you can't rely on old kernels
+   rejecting the new flags/modes or ioctls (since doing that was botched in the
+   past) then you need a driver feature flag or revision number somewhere.
+ * Have a plan for extending ioctls with new flags or new fields at the end of
+   the structure. The drm core checks the passed-in size for each ioctl call
+   and zero-extends any mismatches between kernel and userspace. That helps,
+   but isn't a complete solution since newer userspace on older kernels won't
+   notice that the newly added fields at the end get ignored. So this still
+   needs a new driver feature flags.
+ * Check all unused fields and flags and all the padding for whether it's 0,
+   and reject the ioctl if that's not the case. Otherwise your nice plan for
+   future extensions is going right down the gutters since someone will submit
+   an ioctl struct with random stack garbage in the yet unused parts. Which
+   then bakes in the ABI that those fields can never be used for anything else
+   but garbage.
+ * Have simple testcases for all of the above.
+Fun with Error Paths
+--------------------
+Nowadays we don't have any excuse left any more for drm drivers being neat
+little root exploits. This means we both need full input validation and solid
+error handling paths - GPUs will die eventually in the oddmost corner cases
+anyway:
+ * The ioctl must check for array overflows. Also it needs to check for
+   over/underflows and clamping issues of integer values in general. The usual
+   example is sprite positioning values fed directly into the hardware with the
+   hardware just having 12 bits or so. Works nicely until some odd display
+   server doesn't bother with clamping itself and the cursor wraps around the
+   screen.
+ * Have simple testcases for every input validation failure case in your ioctl.
+   Check that the error code matches your expectations. And finally make sure
+   that you only test for one single error path in each subtest by submitting
+   otherwise perfectly valid data. Without this an earlier check might reject
+   the ioctl already and shadow the codepath you actually want to test, hiding
+   bugs and regressions.
+ * Make all your ioctls restartable. First X really loves signals and second
+   this will allow you to test 90% of all error handling paths by just
+   interrupting your main test suite constantly with signals. Thanks to X's
+   love for signal you'll get an excellent base coverage of all your error
+   paths pretty much for free for graphics drivers. Also, be consistent with
+   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
+   userspace library. The i915 driver botched this with the set_tiling ioctl,
+   now we're stuck forever with some arcane semantics in both the kernel and
+   userspace.
+ * If you can't make a given codepath restartable make a stuck task at least
+   killable. GPUs just die and your users won't like you more if you hang their
+   entire box (by means of an unkillable X process). If the state recovery is
+   still too tricky have a timeout or hangcheck safety net as a last-ditch
+   effort in case the hardware has gone bananas.
+ * Have testcases for the really tricky corner cases in your error recovery code
+   - it's way too easy to create a deadlock between your hangcheck code and
+   waiters.
+Time, Waiting and Missing it
+----------------------------
+GPUs do most everything asynchronously, so we have a need to time operations and
+wait for oustanding ones. This is really tricky business; at the moment none of
+the ioctls supported by the drm/i915 get this fully right, which means there's
+still tons more lessons to learn here.
+ * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
+   v4l use by default nowadays. But let userspace know which timestamps are
+   derived from different clock domains like your main system clock (provided
+   by the kernel) or some independent hardware counter somewhere else. Clocks
+   will mismatch if you look close enough, but if performance measuring tools
+   have this information they can at least compensate. If your userspace can
+   get at the raw values of some clocks (e.g. through in-command-stream
+   performance counter sampling instructions) consider exposing those also.
+ * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
+   convenient time specification, but it's mostly the standard.
+ * Check that input time values are normalized and reject them if not. Note
+   that the kernel native struct ktime has a signed integer for both seconds
+   and nanoseconds, so beware here.
+ * For timeouts, use absolute times. If you're a good fellow and made your
+   ioctl restartable relative timeouts tend to be too coarse and can
+   indefinitely extend your wait time due to rounding on each restart.
+   Especially if your reference clock is something really slow like the display
+   frame counter. With a spec laywer hat on this isn't a bug since timeouts can
+   always be extended - but users will surely hate you if their neat animations
+   starts to stutter due to this.
+ * Consider ditching any synchronous wait ioctls with timeouts and just deliver
+   an asynchronous event on a pollable file descriptor. It fits much better
+   into event driven applications' main loop.
+ * Have testcases for corner-cases, especially whether the return values for
+   already-completed events, successful waits and timed-out waits are all sane
+   and suiting to your needs.
+Leaking Resources, Not
+----------------------
+A full-blown drm driver essentially implements a little OS, but specialized to
+the given GPU platforms. This means a driver needs to expose tons of handles
+for different objects and other resources to userspace. Doing that right
+entails its own little set of pitfalls:
+ * Always attach the lifetime of your dynamically created resources to the
+   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
+   needs to be shared across processes -  fd-passing over unix domain sockets
+   also simplifies lifetime management for userspace.
+ * Always have O_CLOEXEC support.
+ * Ensure that you have sufficient insulation between different clients. By
+   default pick a private per-fd namespace which forces any sharing to be done
+   explictly. Only go with a more global per-device namespace if the objects
+   are truly device-unique. One counterexample in the drm modeset interfaces is
+   that the per-device modeset objects like connectors share a namespace with
+   framebuffer objects, which mostly are not shared at all. A separate
+   namespace, private by default, for framebuffers would have been more
+   suitable.
+ * Think about uniqueness requirements for userspace handles. E.g. for most drm
+   drivers it's a userspace bug to submit the same object twice in the same
+   command submission ioctl. But then if objects are shareable userspace needs
+   to know whether it has seen an imported object from a different process
+   already or not. I haven't tried this myself yet due to lack of a new class
+   of objects, but consider using inode numbers on your shared file descriptors
+   as unique identifiers - it's how real files are told apart, too.
+   Unfortunately this requires a full-blown virtual filesystem in the kernel.
+Last, but not Least
+-------------------
+Not every problem needs a new ioctl:
+ * Think hard whether you really want a driver-private interface. Of course
+   it's much quicker to push a driver-private interface than engaging in
+   lengthy discussions for a more generic solution. And occasionally doing a
+   private interface to spearhead a new concept is what's required. But in the
+   end, once the generic interface comes around you'll end up maintainer two
+   interfaces. Indefinitely.
+ * Consider other interfaces than ioctls. A sysfs attribute is much better for
+   per-device settings, or for child objects with fairly static lifetimes (like
+   output connectors in drm with all the detection override attributes). Or
+   maybe only your testsuite needs this interface, and then debugfs with its
+   disclaimer of not having a stable ABI would be better.
+Finally, the name of the game is to get it right on the first attempt, since if
+your driver proves popular and your hardware platforms long-lived then you'll
+be stuck with a given ioctl essentially forever. You can try to deprecate
+horrible ioctls on newer iterations of your hardware, but generally it takes
+years to accomplish this. And then again years until the last user able to
+complain about regressions disappears, too.
author	Michael Ellerman <mpe@ellerman.id.au>	2014-08-06 02:32:17 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2014-08-09 12:13:40 -0400
commit	efe4a772215869e780e69d0037dca50e34aee6a7 (patch)
tree	6538ee03d03a4bc15b5dbb104688b9e35309c819
parent	c309bfa9b481e7dbd3e1ab819271bf3009f44859 (diff)

diff --git a/Documentation/ioctl/00-INDEX b/Documentation/ioctl/00-INDEX index d2fe4d4729ef..c1a925787950 100644 --- a/Documentation/ioctl/00-INDEX +++ b/Documentation/ioctl/00-INDEX
@@ -1,5 +1,7 @@
1	00-INDEX	1	00-INDEX
2	- this file	2	- this file
		3	botching-up-ioctls.txt
		4	- how to avoid botching up ioctls
3	cdrom.txt	5	cdrom.txt
4	- summary of CDROM ioctl calls	6	- summary of CDROM ioctl calls
5	hdio.txt	7	hdio.txt


diff --git a/Documentation/ioctl/botching-up-ioctls.txt b/Documentation/ioctl/botching-up-ioctls.txt new file mode 100644 index 000000000000..45fe78c58019 --- /dev/null +++ b/Documentation/ioctl/botching-up-ioctls.txt
@@ -0,0 +1,219 @@
		1	(How to avoid) Botching up ioctls
		2	=================================
		3
		4	From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
		5
		6	By: Daniel Vetter, Copyright © 2013 Intel Corporation
		7
		8	One clear insight kernel graphics hackers gained in the past few years is that
		9	trying to come up with a unified interface to manage the execution units and
		10	memory on completely different GPUs is a futile effort. So nowadays every
		11	driver has its own set of ioctls to allocate memory and submit work to the GPU.
		12	Which is nice, since there's no more insanity in the form of fake-generic, but
		13	actually only used once interfaces. But the clear downside is that there's much
		14	more potential to screw things up.
		15
		16	To avoid repeating all the same mistakes again I've written up some of the
		17	lessons learned while botching the job for the drm/i915 driver. Most of these
		18	only cover technicalities and not the big-picture issues like what the command
		19	submission ioctl exactly should look like. Learning these lessons is probably
		20	something every GPU driver has to do on its own.
		21
		22
		23	Prerequisites
		24	-------------
		25
		26	First the prerequisites. Without these you have already failed, because you
		27	will need to add a a 32-bit compat layer:
		28
		29	* Only use fixed sized integers. To avoid conflicts with typedefs in userspace
		30	the kernel has special types like __u32, __s64. Use them.
		31
		32	* Align everything to the natural size and use explicit padding. 32-bit
		33	platforms don't necessarily align 64-bit values to 64-bit boundaries, but
		34	64-bit platforms do. So we always need padding to the natural size to get
		35	this right.
		36
		37	* Pad the entire struct to a multiple of 64-bits - the structure size will
		38	otherwise differ on 32-bit versus 64-bit. Having a different structure size
		39	hurts when passing arrays of structures to the kernel, or if the kernel
		40	checks the structure size, which e.g. the drm core does.
		41
		42	* Pointers are __u64, cast from/to a uintprt_t on the userspace side and
		43	from/to a void __user * in the kernel. Try really hard not to delay this
		44	conversion or worse, fiddle the raw __u64 through your code since that
		45	diminishes the checking tools like sparse can provide.
		46
		47
		48	Basics
		49	------
		50
		51	With the joys of writing a compat layer avoided we can take a look at the basic
		52	fumbles. Neglecting these will make backward and forward compatibility a real
		53	pain. And since getting things wrong on the first attempt is guaranteed you
		54	will have a second iteration or at least an extension for any given interface.
		55
		56	* Have a clear way for userspace to figure out whether your new ioctl or ioctl
		57	extension is supported on a given kernel. If you can't rely on old kernels
		58	rejecting the new flags/modes or ioctls (since doing that was botched in the
		59	past) then you need a driver feature flag or revision number somewhere.
		60
		61	* Have a plan for extending ioctls with new flags or new fields at the end of
		62	the structure. The drm core checks the passed-in size for each ioctl call
		63	and zero-extends any mismatches between kernel and userspace. That helps,
		64	but isn't a complete solution since newer userspace on older kernels won't
		65	notice that the newly added fields at the end get ignored. So this still
		66	needs a new driver feature flags.
		67
		68	* Check all unused fields and flags and all the padding for whether it's 0,
		69	and reject the ioctl if that's not the case. Otherwise your nice plan for
		70	future extensions is going right down the gutters since someone will submit
		71	an ioctl struct with random stack garbage in the yet unused parts. Which
		72	then bakes in the ABI that those fields can never be used for anything else
		73	but garbage.
		74
		75	* Have simple testcases for all of the above.
		76
		77
		78	Fun with Error Paths
		79	--------------------
		80
		81	Nowadays we don't have any excuse left any more for drm drivers being neat
		82	little root exploits. This means we both need full input validation and solid
		83	error handling paths - GPUs will die eventually in the oddmost corner cases
		84	anyway:
		85
		86	* The ioctl must check for array overflows. Also it needs to check for
		87	over/underflows and clamping issues of integer values in general. The usual
		88	example is sprite positioning values fed directly into the hardware with the
		89	hardware just having 12 bits or so. Works nicely until some odd display
		90	server doesn't bother with clamping itself and the cursor wraps around the
		91	screen.
		92
		93	* Have simple testcases for every input validation failure case in your ioctl.
		94	Check that the error code matches your expectations. And finally make sure
		95	that you only test for one single error path in each subtest by submitting
		96	otherwise perfectly valid data. Without this an earlier check might reject
		97	the ioctl already and shadow the codepath you actually want to test, hiding
		98	bugs and regressions.
		99
		100	* Make all your ioctls restartable. First X really loves signals and second
		101	this will allow you to test 90% of all error handling paths by just
		102	interrupting your main test suite constantly with signals. Thanks to X's
		103	love for signal you'll get an excellent base coverage of all your error
		104	paths pretty much for free for graphics drivers. Also, be consistent with
		105	how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
		106	userspace library. The i915 driver botched this with the set_tiling ioctl,
		107	now we're stuck forever with some arcane semantics in both the kernel and
		108	userspace.
		109
		110	* If you can't make a given codepath restartable make a stuck task at least
		111	killable. GPUs just die and your users won't like you more if you hang their
		112	entire box (by means of an unkillable X process). If the state recovery is
		113	still too tricky have a timeout or hangcheck safety net as a last-ditch
		114	effort in case the hardware has gone bananas.
		115
		116	* Have testcases for the really tricky corner cases in your error recovery code
		117	- it's way too easy to create a deadlock between your hangcheck code and
		118	waiters.
		119
		120
		121	Time, Waiting and Missing it
		122	----------------------------
		123
		124	GPUs do most everything asynchronously, so we have a need to time operations and
		125	wait for oustanding ones. This is really tricky business; at the moment none of
		126	the ioctls supported by the drm/i915 get this fully right, which means there's
		127	still tons more lessons to learn here.
		128
		129	* Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
		130	v4l use by default nowadays. But let userspace know which timestamps are
		131	derived from different clock domains like your main system clock (provided
		132	by the kernel) or some independent hardware counter somewhere else. Clocks
		133	will mismatch if you look close enough, but if performance measuring tools
		134	have this information they can at least compensate. If your userspace can
		135	get at the raw values of some clocks (e.g. through in-command-stream
		136	performance counter sampling instructions) consider exposing those also.
		137
		138	* Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
		139	convenient time specification, but it's mostly the standard.
		140
		141	* Check that input time values are normalized and reject them if not. Note
		142	that the kernel native struct ktime has a signed integer for both seconds
		143	and nanoseconds, so beware here.
		144
		145	* For timeouts, use absolute times. If you're a good fellow and made your
		146	ioctl restartable relative timeouts tend to be too coarse and can
		147	indefinitely extend your wait time due to rounding on each restart.
		148	Especially if your reference clock is something really slow like the display
		149	frame counter. With a spec laywer hat on this isn't a bug since timeouts can
		150	always be extended - but users will surely hate you if their neat animations
		151	starts to stutter due to this.
		152
		153	* Consider ditching any synchronous wait ioctls with timeouts and just deliver
		154	an asynchronous event on a pollable file descriptor. It fits much better
		155	into event driven applications' main loop.
		156
		157	* Have testcases for corner-cases, especially whether the return values for
		158	already-completed events, successful waits and timed-out waits are all sane
		159	and suiting to your needs.
		160
		161
		162	Leaking Resources, Not
		163	----------------------
		164
		165	A full-blown drm driver essentially implements a little OS, but specialized to
		166	the given GPU platforms. This means a driver needs to expose tons of handles
		167	for different objects and other resources to userspace. Doing that right
		168	entails its own little set of pitfalls:
		169
		170	* Always attach the lifetime of your dynamically created resources to the
		171	lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
		172	needs to be shared across processes - fd-passing over unix domain sockets
		173	also simplifies lifetime management for userspace.
		174
		175	* Always have O_CLOEXEC support.
		176
		177	* Ensure that you have sufficient insulation between different clients. By
		178	default pick a private per-fd namespace which forces any sharing to be done
		179	explictly. Only go with a more global per-device namespace if the objects
		180	are truly device-unique. One counterexample in the drm modeset interfaces is
		181	that the per-device modeset objects like connectors share a namespace with
		182	framebuffer objects, which mostly are not shared at all. A separate
		183	namespace, private by default, for framebuffers would have been more
		184	suitable.
		185
		186	* Think about uniqueness requirements for userspace handles. E.g. for most drm
		187	drivers it's a userspace bug to submit the same object twice in the same
		188	command submission ioctl. But then if objects are shareable userspace needs
		189	to know whether it has seen an imported object from a different process
		190	already or not. I haven't tried this myself yet due to lack of a new class
		191	of objects, but consider using inode numbers on your shared file descriptors
		192	as unique identifiers - it's how real files are told apart, too.
		193	Unfortunately this requires a full-blown virtual filesystem in the kernel.
		194
		195
		196	Last, but not Least
		197	-------------------
		198
		199	Not every problem needs a new ioctl:
		200
		201	* Think hard whether you really want a driver-private interface. Of course
		202	it's much quicker to push a driver-private interface than engaging in
		203	lengthy discussions for a more generic solution. And occasionally doing a
		204	private interface to spearhead a new concept is what's required. But in the
		205	end, once the generic interface comes around you'll end up maintainer two
		206	interfaces. Indefinitely.
		207
		208	* Consider other interfaces than ioctls. A sysfs attribute is much better for
		209	per-device settings, or for child objects with fairly static lifetimes (like
		210	output connectors in drm with all the detection override attributes). Or
		211	maybe only your testsuite needs this interface, and then debugfs with its
		212	disclaimer of not having a stable ABI would be better.
		213
		214	Finally, the name of the game is to get it right on the first attempt, since if
		215	your driver proves popular and your hardware platforms long-lived then you'll
		216	be stuck with a given ioctl essentially forever. You can try to deprecate
		217	horrible ioctls on newer iterations of your hardware, but generally it takes
		218	years to accomplish this. And then again years until the last user able to
		219	complain about regressions disappears, too.