author | Jonathan Corbet <corbet@lwn.net> | 2010-05-12 16:23:48 -0400 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2010-05-14 11:18:55 -0400 |
commit | 4047f8b1f9f4b4ecc4863f5f10cd9ba388b32a94 (patch) | |
tree | 4cc50188ea1920b4963b45b798658784e49c7a9d /Documentation | |
parent | b57f95a38233a2e73b679bea4a5453a1cc2a1cc9 (diff) |
Add a document describing the padata interface
This originally appeared as http://lwn.net/Articles/382257/.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/padata.txt | 107 |
2 files changed, 109 insertions, 0 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 06b982affe76..dd10b51b4e65 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -250,6 +250,8 @@ numastat.txt | |||
250 | - info on how to read Numa policy hit/miss statistics in sysfs. | 250 | - info on how to read Numa policy hit/miss statistics in sysfs. |
251 | oops-tracing.txt | 251 | oops-tracing.txt |
252 | - how to decode those nasty internal kernel error dump messages. | 252 | - how to decode those nasty internal kernel error dump messages. |
253 | padata.txt | ||
254 | - An introduction to the "padata" parallel execution API | ||
253 | parisc/ | 255 | parisc/ |
254 | - directory with info on using Linux on PA-RISC architecture. | 256 | - directory with info on using Linux on PA-RISC architecture. |
255 | parport.txt | 257 | parport.txt |
diff --git a/Documentation/padata.txt b/Documentation/padata.txt
new file mode 100644
index 000000000000..269d7d0d8335
--- /dev/null
+++ b/Documentation/padata.txt
@@ -0,0 +1,107 @@ | |||
1 | The padata parallel execution mechanism | ||
2 | Last updated for 2.6.34 | ||
3 | |||
4 | Padata is a mechanism by which the kernel can farm work out to be done in | ||
5 | parallel on multiple CPUs while retaining the ordering of tasks. It was | ||
6 | developed for use with the IPsec code, which needs to be able to perform | ||
7 | encryption and decryption on large numbers of packets without reordering | ||
8 | those packets. The crypto developers made a point of writing padata in a | ||
9 | sufficiently general fashion that it could be put to other uses as well. | ||
10 | |||
11 | The first step in using padata is to set up a padata_instance structure for | ||
12 | overall control of how tasks are to be run: | ||
13 | |||
14 | #include <linux/padata.h> | ||
15 | |||
16 | struct padata_instance *padata_alloc(const struct cpumask *cpumask, | ||
17 | struct workqueue_struct *wq); | ||
18 | |||
19 | The cpumask describes which processors will be used to execute work | ||
20 | submitted to this instance. The workqueue wq is where the work will | ||
21 | actually be done; it should be a multithreaded queue, naturally. | ||
22 | |||
23 | There are functions for enabling and disabling the instance: | ||
24 | |||
25 | void padata_start(struct padata_instance *pinst); | ||
26 | void padata_stop(struct padata_instance *pinst); | ||
27 | |||
28 | These functions literally do nothing beyond setting or clearing the | ||
29 | "padata_start() was called" flag; if that flag is not set, other functions | ||
30 | will refuse to work. | ||
31 | |||
32 | The list of CPUs to be used can be adjusted with these functions: | ||
33 | |||
34 | int padata_set_cpumask(struct padata_instance *pinst, | ||
35 | cpumask_var_t cpumask); | ||
36 | int padata_add_cpu(struct padata_instance *pinst, int cpu); | ||
37 | int padata_remove_cpu(struct padata_instance *pinst, int cpu); | ||
38 | |||
39 | Changing the CPU mask has the look of an expensive operation, though, so it | ||
40 | probably should not be done with great frequency. | ||
41 | |||
42 | Actually submitting work to the padata instance requires the creation of a | ||
43 | padata_priv structure: | ||
44 | |||
45 | struct padata_priv { | ||
46 | /* Other stuff here... */ | ||
47 | void (*parallel)(struct padata_priv *padata); | ||
48 | void (*serial)(struct padata_priv *padata); | ||
49 | }; | ||
50 | |||
51 | This structure will almost certainly be embedded within some larger | ||
52 | structure specific to the work to be done. Most of its fields are private to | ||
53 | padata, but the structure should be zeroed at initialization time, and the | ||
54 | parallel() and serial() functions should be provided. Those functions will | ||
55 | be called in the process of getting the work done as we will see | ||
56 | momentarily. | ||
57 | |||
58 | The submission of work is done with: | ||
59 | |||
60 | int padata_do_parallel(struct padata_instance *pinst, | ||
61 | struct padata_priv *padata, int cb_cpu); | ||
62 | |||
63 | The pinst and padata structures must be set up as described above; cb_cpu | ||
64 | specifies which CPU will be used for the final callback when the work is | ||
65 | done; it must be in the current instance's CPU mask. The return value from | ||
66 | padata_do_parallel() is a little strange; zero is an error return | ||
67 | indicating that the caller forgot the padata_start() formalities. -EBUSY | ||
68 | means that somebody, somewhere else is messing with the instance's CPU | ||
69 | mask, while -EINVAL is a complaint about cb_cpu not being in that CPU mask. | ||
70 | If all goes well, this function will return -EINPROGRESS, indicating that | ||
71 | the work is in progress. | ||
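
Because success is reported as -EINPROGRESS and zero is an error, a caller might wrap the check in a small helper. The following is a minimal userspace sketch of that mapping, assuming only the four return values described above; padata_submit_ok() and padata_submit_status() are hypothetical names and are not part of the padata API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: true only when the work was actually queued. */
static bool padata_submit_ok(int ret)
{
	return ret == -EINPROGRESS;
}

/*
 * Hypothetical helper: describe a padata_do_parallel() return value,
 * following the meanings given in the text above.
 */
static const char *padata_submit_status(int ret)
{
	switch (ret) {
	case -EINPROGRESS:
		return "work submitted";
	case 0:
		return "instance not started";
	case -EBUSY:
		return "cpumask is being changed";
	case -EINVAL:
		return "cb_cpu not in cpumask";
	default:
		return "unexpected return value";
	}
}
```

Note that a conventional "if (ret)" error check would get this backwards: it would treat the successful -EINPROGRESS as a failure and the zero return as success.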
72 | |||
73 | Each task submitted to padata_do_parallel() will, in turn, be passed to | ||
74 | exactly one call to the above-mentioned parallel() function, on one CPU, so | ||
75 | true parallelism is achieved by submitting multiple tasks. Despite the | ||
76 | fact that the workqueue is used to make these calls, parallel() is run with | ||
77 | software interrupts disabled and thus cannot sleep. The parallel() | ||
78 | function gets the padata_priv structure pointer as its lone parameter; | ||
79 | information about the actual work to be done is probably obtained by using | ||
80 | container_of() to find the enclosing structure. | ||
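
The embedding pattern can be tried outside the kernel. The sketch below is a userspace illustration only: it defines a simplified stand-in for container_of() and a mock struct padata_priv holding just the two callbacks shown earlier, and struct my_job, my_parallel() and run_job() are hypothetical names invented for the example.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Mock of struct padata_priv: only the two callbacks from the text. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Hypothetical job structure with padata_priv embedded in it. */
struct my_job {
	int input;
	int output;
	struct padata_priv padata;
};

/* Receives only the padata_priv pointer and recovers the enclosing job. */
static void my_parallel(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->output = job->input * 2;
}

/* Calls the callback directly, standing in for what padata would do. */
static int run_job(int input)
{
	struct my_job job = { 0 };	/* zeroed, as the text requires */

	job.input = input;
	job.padata.parallel = my_parallel;
	job.padata.parallel(&job.padata);
	return job.output;
}
```

Here run_job(21) returns 42. In real kernel code the callback would be invoked by padata after padata_do_parallel(), not called directly.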
81 | |||
82 | Note that parallel() has no return value; the padata subsystem assumes that | ||
83 | parallel() will take responsibility for the task from this point. The work | ||
84 | need not be completed during this call, but, if parallel() leaves work | ||
85 | outstanding, it should be prepared to be called again with a new job before | ||
86 | the previous one completes. When a task does complete, parallel() (or | ||
87 | whatever function actually finishes the job) should inform padata of the | ||
88 | fact with a call to: | ||
89 | |||
90 | void padata_do_serial(struct padata_priv *padata); | ||
91 | |||
92 | At some point in the future, padata_do_serial() will trigger a call to the | ||
93 | serial() function in the padata_priv structure. That call will happen on | ||
94 | the CPU requested in the initial call to padata_do_parallel(); it, too, is | ||
95 | done through the workqueue, but with local software interrupts disabled. | ||
96 | Note that this call may be deferred for a while since the padata code takes | ||
97 | pains to ensure that tasks are completed in the order in which they were | ||
98 | submitted. | ||
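
The deferral can be pictured with a small reorder buffer. The following userspace sketch is an illustration of the ordering guarantee only and is not padata's actual implementation; struct reorder, reorder_complete() and MAX_TASKS are hypothetical names, and each task is assumed to receive a sequence number (0, 1, 2, ...) at submission time.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

struct reorder {
	bool done[MAX_TASKS];	/* done[i]: task i finished its parallel work */
	size_t next;		/* next sequence number allowed to serialize */
	int order[MAX_TASKS];	/* sequence numbers in serialization order */
	size_t n_serialized;	/* number of valid entries in order[] */
};

/*
 * Mark task 'seq' as complete, then release every task that is now at
 * the head of the queue. A task that finishes early waits here until
 * all earlier submissions have finished too.
 */
static void reorder_complete(struct reorder *r, size_t seq)
{
	if (seq >= MAX_TASKS)
		return;

	r->done[seq] = true;
	while (r->next < MAX_TASKS && r->done[r->next]) {
		r->order[r->n_serialized++] = (int)r->next;
		r->next++;
	}
}
```

With a zero-initialized struct reorder, completing task 2 first releases nothing; completing task 0 releases only task 0; completing task 1 then releases tasks 1 and 2 together, so the serialization order is 0, 1, 2 regardless of the completion order.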
99 | |||
100 | The one remaining function in the padata API should be called to clean up | ||
101 | when a padata instance is no longer needed: | ||
102 | |||
103 | void padata_free(struct padata_instance *pinst); | ||
104 | |||
105 | This function will busy-wait while any remaining tasks are completed, so it | ||
106 | might be best not to call it while there is work outstanding. Shutting | ||
107 | down the workqueue, if necessary, should be done separately. | ||