diff options
author | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-04-16 18:20:36 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@ppc970.osdl.org> | 2005-04-16 18:20:36 -0400 |
commit | 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch) | |
tree | 0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/crypto |
Linux-2.6.12-rc2v2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
Diffstat (limited to 'Documentation/crypto')
-rw-r--r-- | Documentation/crypto/api-intro.txt | 244 | ||||
-rw-r--r-- | Documentation/crypto/descore-readme.txt | 352 |
2 files changed, 596 insertions, 0 deletions
diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt new file mode 100644 index 000000000000..a2d5b4900772 --- /dev/null +++ b/Documentation/crypto/api-intro.txt | |||
@@ -0,0 +1,244 @@ | |||
1 | |||
2 | Scatterlist Cryptographic API | ||
3 | |||
4 | INTRODUCTION | ||
5 | |||
6 | The Scatterlist Crypto API takes page vectors (scatterlists) as | ||
7 | arguments, and works directly on pages. In some cases (e.g. ECB | ||
8 | mode ciphers), this will allow for pages to be encrypted in-place | ||
9 | with no copying. | ||
10 | |||
11 | One of the initial goals of this design was to readily support IPsec, | ||
12 | so that processing can be applied to paged skb's without the need | ||
13 | for linearization. | ||
14 | |||
15 | |||
16 | DETAILS | ||
17 | |||
18 | At the lowest level are algorithms, which register dynamically with the | ||
19 | API. | ||
20 | |||
21 | 'Transforms' are user-instantiated objects, which maintain state, handle all | ||
22 | of the implementation logic (e.g. manipulating page vectors), provide an | ||
23 | abstraction to the underlying algorithms, and handle common logical | ||
24 | operations (e.g. cipher modes, HMAC for digests). However, at the user | ||
25 | level they are very simple. | ||
26 | |||
27 | Conceptually, the API layering looks like this: | ||
28 | |||
29 | [transform api] (user interface) | ||
30 | [transform ops] (per-type logic glue e.g. cipher.c, digest.c) | ||
31 | [algorithm api] (for registering algorithms) | ||
32 | |||
33 | The idea is to make the user interface and algorithm registration API | ||
34 | very simple, while hiding the core logic from both. Many good ideas | ||
35 | from existing APIs such as Cryptoapi and Nettle have been adapted for this. | ||
36 | |||
37 | The API currently supports three types of transforms: Ciphers, Digests and | ||
38 | Compressors. The compression algorithms especially seem to be performing | ||
39 | very well so far. | ||
40 | |||
41 | Support for hardware crypto devices via an asynchronous interface is | ||
42 | under development. | ||
43 | |||
44 | Here's an example of how to use the API: | ||
45 | |||
46 | #include <linux/crypto.h> | ||
47 | |||
48 | struct scatterlist sg[2]; | ||
49 | char result[128]; | ||
50 | struct crypto_tfm *tfm; | ||
51 | |||
52 | tfm = crypto_alloc_tfm("md5", 0); | ||
53 | if (tfm == NULL) | ||
54 | fail(); | ||
55 | |||
56 | /* ... set up the scatterlists ... */ | ||
57 | |||
58 | crypto_digest_init(tfm); | ||
59 | crypto_digest_update(tfm, &sg, 2); | ||
60 | crypto_digest_final(tfm, result); | ||
61 | |||
62 | crypto_free_tfm(tfm); | ||
63 | |||
64 | |||
65 | Many real examples are available in the regression test module (tcrypt.c). | ||
66 | |||
67 | |||
68 | CONFIGURATION NOTES | ||
69 | |||
70 | As Triple DES is part of the DES module, for those using modular builds, | ||
71 | add the following line to /etc/modprobe.conf: | ||
72 | |||
73 | alias des3_ede des | ||
74 | |||
75 | The Null algorithms reside in the crypto_null module, so these lines | ||
76 | should also be added: | ||
77 | |||
78 | alias cipher_null crypto_null | ||
79 | alias digest_null crypto_null | ||
80 | alias compress_null crypto_null | ||
81 | |||
82 | The SHA384 algorithm shares code within the SHA512 module, so you'll | ||
83 | also need: | ||
84 | alias sha384 sha512 | ||
85 | |||
86 | |||
87 | DEVELOPER NOTES | ||
88 | |||
89 | Transforms may only be allocated in user context, and cryptographic | ||
90 | methods may only be called from softirq and user contexts. | ||
91 | |||
92 | When using the API for ciphers, performance will be optimal if each | ||
93 | scatterlist contains data which is a multiple of the cipher's block | ||
94 | size (typically 8 bytes). This prevents having to do any copying | ||
95 | across non-aligned page fragment boundaries. | ||
96 | |||
97 | |||
98 | ADDING NEW ALGORITHMS | ||
99 | |||
100 | When submitting a new algorithm for inclusion, a mandatory requirement | ||
101 | is that at least a few test vectors from known sources (preferably | ||
102 | standards) be included. | ||
103 | |||
104 | Converting existing well known code is preferred, as it is more likely | ||
105 | to have been reviewed and widely tested. If submitting code from LGPL | ||
106 | sources, please consider changing the license to GPL (see section 3 of | ||
107 | the LGPL). | ||
108 | |||
109 | Algorithms submitted must also be generally patent-free (e.g. IDEA | ||
110 | will not be included in the mainline until around 2011), and be based | ||
111 | on a recognized standard and/or have been subjected to appropriate | ||
112 | peer review. | ||
113 | |||
114 | Also check for any RFCs which may relate to the use of specific algorithms, | ||
115 | as well as general application notes such as RFC2451 ("The ESP CBC-Mode | ||
116 | Cipher Algorithms"). | ||
117 | |||
118 | It's a good idea to avoid using lots of macros and use inlined functions | ||
119 | instead, as gcc does a good job with inlining, while excessive use of | ||
120 | macros can cause compilation problems on some platforms. | ||
121 | |||
122 | Also check the TODO list at the web site listed below to see what people | ||
123 | might already be working on. | ||
124 | |||
125 | |||
126 | BUGS | ||
127 | |||
128 | Send bug reports to: | ||
129 | James Morris <jmorris@redhat.com> | ||
130 | Cc: David S. Miller <davem@redhat.com> | ||
131 | |||
132 | |||
133 | FURTHER INFORMATION | ||
134 | |||
135 | For further patches and various updates, including the current TODO | ||
136 | list, see: | ||
137 | http://samba.org/~jamesm/crypto/ | ||
138 | |||
139 | |||
140 | AUTHORS | ||
141 | |||
142 | James Morris | ||
143 | David S. Miller | ||
144 | |||
145 | |||
146 | CREDITS | ||
147 | |||
148 | The following people provided invaluable feedback during the development | ||
149 | of the API: | ||
150 | |||
151 | Alexey Kuznetzov | ||
152 | Rusty Russell | ||
153 | Herbert Valerio Riedel | ||
154 | Jeff Garzik | ||
155 | Michael Richardson | ||
156 | Andrew Morton | ||
157 | Ingo Oeser | ||
158 | Christoph Hellwig | ||
159 | |||
160 | Portions of this API were derived from the following projects: | ||
161 | |||
162 | Kerneli Cryptoapi (http://www.kerneli.org/) | ||
163 | Alexander Kjeldaas | ||
164 | Herbert Valerio Riedel | ||
165 | Kyle McMartin | ||
166 | Jean-Luc Cooke | ||
167 | David Bryson | ||
168 | Clemens Fruhwirth | ||
169 | Tobias Ringstrom | ||
170 | Harald Welte | ||
171 | |||
172 | and; | ||
173 | |||
174 | Nettle (http://www.lysator.liu.se/~nisse/nettle/) | ||
175 | Niels Möller | ||
176 | |||
177 | Original developers of the crypto algorithms: | ||
178 | |||
179 | Dana L. How (DES) | ||
180 | Andrew Tridgell and Steve French (MD4) | ||
181 | Colin Plumb (MD5) | ||
182 | Steve Reid (SHA1) | ||
183 | Jean-Luc Cooke (SHA256, SHA384, SHA512) | ||
184 | Kazunori Miyazawa / USAGI (HMAC) | ||
185 | Matthew Skala (Twofish) | ||
186 | Dag Arne Osvik (Serpent) | ||
187 | Brian Gladman (AES) | ||
188 | Kartikey Mahendra Bhatt (CAST6) | ||
189 | Jon Oberheide (ARC4) | ||
190 | Jouni Malinen (Michael MIC) | ||
191 | |||
192 | SHA1 algorithm contributors: | ||
193 | Jean-Francois Dive | ||
194 | |||
195 | DES algorithm contributors: | ||
196 | Raimar Falke | ||
197 | Gisle Sælensminde | ||
198 | Niels Möller | ||
199 | |||
200 | Blowfish algorithm contributors: | ||
201 | Herbert Valerio Riedel | ||
202 | Kyle McMartin | ||
203 | |||
204 | Twofish algorithm contributors: | ||
205 | Werner Koch | ||
206 | Marc Mutz | ||
207 | |||
208 | SHA256/384/512 algorithm contributors: | ||
209 | Andrew McDonald | ||
210 | Kyle McMartin | ||
211 | Herbert Valerio Riedel | ||
212 | |||
213 | AES algorithm contributors: | ||
214 | Alexander Kjeldaas | ||
215 | Herbert Valerio Riedel | ||
216 | Kyle McMartin | ||
217 | Adam J. Richter | ||
218 | Fruhwirth Clemens (i586) | ||
219 | Linus Torvalds (i586) | ||
220 | |||
221 | CAST5 algorithm contributors: | ||
222 | Kartikey Mahendra Bhatt (original developers unknown, FSF copyright). | ||
223 | |||
224 | TEA/XTEA algorithm contributors: | ||
225 | Aaron Grothe | ||
226 | |||
227 | Khazad algorithm contributors: | ||
228 | Aaron Grothe | ||
229 | |||
230 | Whirlpool algorithm contributors: | ||
231 | Aaron Grothe | ||
232 | Jean-Luc Cooke | ||
233 | |||
234 | Anubis algorithm contributors: | ||
235 | Aaron Grothe | ||
236 | |||
237 | Tiger algorithm contributors: | ||
238 | Aaron Grothe | ||
239 | |||
240 | Generic scatterwalk code by Adam J. Richter <adam@yggdrasil.com> | ||
241 | |||
242 | Please send any credits updates or corrections to: | ||
243 | James Morris <jmorris@redhat.com> | ||
244 | |||
diff --git a/Documentation/crypto/descore-readme.txt b/Documentation/crypto/descore-readme.txt new file mode 100644 index 000000000000..166474c2ee0b --- /dev/null +++ b/Documentation/crypto/descore-readme.txt | |||
@@ -0,0 +1,352 @@ | |||
1 | Below is the orginal README file from the descore.shar package. | ||
2 | ------------------------------------------------------------------------------ | ||
3 | |||
4 | des - fast & portable DES encryption & decryption. | ||
5 | Copyright (C) 1992 Dana L. How | ||
6 | |||
7 | This program is free software; you can redistribute it and/or modify | ||
8 | it under the terms of the GNU Library General Public License as published by | ||
9 | the Free Software Foundation; either version 2 of the License, or | ||
10 | (at your option) any later version. | ||
11 | |||
12 | This program is distributed in the hope that it will be useful, | ||
13 | but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
14 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
15 | GNU Library General Public License for more details. | ||
16 | |||
17 | You should have received a copy of the GNU Library General Public License | ||
18 | along with this program; if not, write to the Free Software | ||
19 | Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. | ||
20 | |||
21 | Author's address: how@isl.stanford.edu | ||
22 | |||
23 | $Id: README,v 1.15 1992/05/20 00:25:32 how E $ | ||
24 | |||
25 | |||
26 | ==>> To compile after untarring/unsharring, just `make' <<== | ||
27 | |||
28 | |||
29 | This package was designed with the following goals: | ||
30 | 1. Highest possible encryption/decryption PERFORMANCE. | ||
31 | 2. PORTABILITY to any byte-addressable host with a 32bit unsigned C type | ||
32 | 3. Plug-compatible replacement for KERBEROS's low-level routines. | ||
33 | |||
34 | This second release includes a number of performance enhancements for | ||
35 | register-starved machines. My discussions with Richard Outerbridge, | ||
36 | 71755.204@compuserve.com, sparked a number of these enhancements. | ||
37 | |||
38 | To more rapidly understand the code in this package, inspect desSmallFips.i | ||
39 | (created by typing `make') BEFORE you tackle desCode.h. The latter is set | ||
40 | up in a parameterized fashion so it can easily be modified by speed-daemon | ||
41 | hackers in pursuit of that last microsecond. You will find it more | ||
42 | illuminating to inspect one specific implementation, | ||
43 | and then move on to the common abstract skeleton with this one in mind. | ||
44 | |||
45 | |||
46 | performance comparison to other available des code which i could | ||
47 | compile on a SPARCStation 1 (cc -O4, gcc -O2): | ||
48 | |||
49 | this code (byte-order independent): | ||
50 | 30us per encryption (options: 64k tables, no IP/FP) | ||
51 | 33us per encryption (options: 64k tables, FIPS standard bit ordering) | ||
52 | 45us per encryption (options: 2k tables, no IP/FP) | ||
53 | 48us per encryption (options: 2k tables, FIPS standard bit ordering) | ||
54 | 275us to set a new key (uses 1k of key tables) | ||
55 | this has the quickest encryption/decryption routines i've seen. | ||
56 | since i was interested in fast des filters rather than crypt(3) | ||
57 | and password cracking, i haven't really bothered yet to speed up | ||
58 | the key setting routine. also, i have no interest in re-implementing | ||
59 | all the other junk in the mit kerberos des library, so i've just | ||
60 | provided my routines with little stub interfaces so they can be | ||
61 | used as drop-in replacements with mit's code or any of the mit- | ||
62 | compatible packages below. (note that the first two timings above | ||
63 | are highly variable because of cache effects). | ||
64 | |||
65 | kerberos des replacement from australia (version 1.95): | ||
66 | 53us per encryption (uses 2k of tables) | ||
67 | 96us to set a new key (uses 2.25k of key tables) | ||
68 | so despite the author's inclusion of some of the performance | ||
69 | improvements i had suggested to him, this package's | ||
70 | encryption/decryption is still slower on the sparc and 68000. | ||
71 | more specifically, 19-40% slower on the 68020 and 11-35% slower | ||
72 | on the sparc, depending on the compiler; | ||
73 | in full gory detail (ALT_ECB is a libdes variant): | ||
74 | compiler machine desCore libdes ALT_ECB slower by | ||
75 | gcc 2.1 -O2 Sun 3/110 304 uS 369.5uS 461.8uS 22% | ||
76 | cc -O1 Sun 3/110 336 uS 436.6uS 399.3uS 19% | ||
77 | cc -O2 Sun 3/110 360 uS 532.4uS 505.1uS 40% | ||
78 | cc -O4 Sun 3/110 365 uS 532.3uS 505.3uS 38% | ||
79 | gcc 2.1 -O2 Sun 4/50 48 uS 53.4uS 57.5uS 11% | ||
80 | cc -O2 Sun 4/50 48 uS 64.6uS 64.7uS 35% | ||
81 | cc -O4 Sun 4/50 48 uS 64.7uS 64.9uS 35% | ||
82 | (my time measurements are not as accurate as his). | ||
83 | the comments in my first release of desCore on version 1.92: | ||
84 | 68us per encryption (uses 2k of tables) | ||
85 | 96us to set a new key (uses 2.25k of key tables) | ||
86 | this is a very nice package which implements the most important | ||
87 | of the optimizations which i did in my encryption routines. | ||
88 | it's a bit weak on common low-level optimizations which is why | ||
89 | it's 39%-106% slower. because he was interested in fast crypt(3) and | ||
90 | password-cracking applications, he also used the same ideas to | ||
91 | speed up the key-setting routines with impressive results. | ||
92 | (at some point i may do the same in my package). he also implements | ||
93 | the rest of the mit des library. | ||
94 | (code from eay@psych.psy.uq.oz.au via comp.sources.misc) | ||
95 | |||
96 | fast crypt(3) package from denmark: | ||
97 | the des routine here is buried inside a loop to do the | ||
98 | crypt function and i didn't feel like ripping it out and measuring | ||
99 | performance. his code takes 26 sparc instructions to compute one | ||
100 | des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37. | ||
101 | he claims to use 280k of tables but the iteration calculation seems | ||
102 | to use only 128k. his tables and code are machine independent. | ||
103 | (code from glad@daimi.aau.dk via alt.sources or comp.sources.misc) | ||
104 | |||
105 | swedish reimplementation of Kerberos des library | ||
106 | 108us per encryption (uses 34k worth of tables) | ||
107 | 134us to set a new key (uses 32k of key tables to get this speed!) | ||
108 | the tables used seem to be machine-independent; | ||
109 | he seems to have included a lot of special case code | ||
110 | so that, e.g., `long' loads can be used instead of 4 `char' loads | ||
111 | when the machine's architecture allows it. | ||
112 | (code obtained from chalmers.se:pub/des) | ||
113 | |||
114 | crack 3.3c package from england: | ||
115 | as in crypt above, the des routine is buried in a loop. it's | ||
116 | also very modified for crypt. his iteration code uses 16k | ||
117 | of tables and appears to be slow. | ||
118 | (code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc) | ||
119 | |||
120 | ``highly optimized'' and tweaked Kerberos/Athena code (byte-order dependent): | ||
121 | 165us per encryption (uses 6k worth of tables) | ||
122 | 478us to set a new key (uses <1k of key tables) | ||
123 | so despite the comments in this code, it was possible to get | ||
124 | faster code AND smaller tables, as well as making the tables | ||
125 | machine-independent. | ||
126 | (code obtained from prep.ai.mit.edu) | ||
127 | |||
128 | UC Berkeley code (depends on machine-endedness): | ||
129 | 226us per encryption | ||
130 | 10848us to set a new key | ||
131 | table sizes are unclear, but they don't look very small | ||
132 | (code obtained from wuarchive.wustl.edu) | ||
133 | |||
134 | |||
135 | motivation and history | ||
136 | |||
137 | a while ago i wanted some des routines and the routines documented on sun's | ||
138 | man pages either didn't exist or dumped core. i had heard of kerberos, | ||
139 | and knew that it used des, so i figured i'd use its routines. but once | ||
140 | i got it and looked at the code, it really set off a lot of pet peeves - | ||
141 | it was too convoluted, the code had been written without taking | ||
142 | advantage of the regular structure of operations such as IP, E, and FP | ||
143 | (i.e. the author didn't sit down and think before coding), | ||
144 | it was excessively slow, the author had attempted to clarify the code | ||
145 | by adding MORE statements to make the data movement more `consistent' | ||
146 | instead of simplifying his implementation and cutting down on all data | ||
147 | movement (in particular, his use of L1, R1, L2, R2), and it was full of | ||
148 | idiotic `tweaks' for particular machines which failed to deliver significant | ||
149 | speedups but which did obfuscate everything. so i took the test data | ||
150 | from his verification program and rewrote everything else. | ||
151 | |||
152 | a while later i ran across the great crypt(3) package mentioned above. | ||
153 | the fact that this guy was computing 2 sboxes per table lookup rather | ||
154 | than one (and using a MUCH larger table in the process) emboldened me to | ||
155 | do the same - it was a trivial change from which i had been scared away | ||
156 | by the larger table size. in his case he didn't realize you don't need to keep | ||
157 | the working data in TWO forms, one for easy use of half the sboxes in | ||
158 | indexing, the other for easy use of the other half; instead you can keep | ||
159 | it in the form for the first half and use a simple rotate to get the other | ||
160 | half. this means i have (almost) half the data manipulation and half | ||
161 | the table size. in fairness though he might be encoding something particular | ||
162 | to crypt(3) in his tables - i didn't check. | ||
163 | |||
164 | i'm glad that i implemented it the way i did, because this C version is | ||
165 | portable (the ifdef's are performance enhancements) and it is faster | ||
166 | than versions hand-written in assembly for the sparc! | ||
167 | |||
168 | |||
169 | porting notes | ||
170 | |||
171 | one thing i did not want to do was write an enormous mess | ||
172 | which depended on endedness and other machine quirks, | ||
173 | and which necessarily produced different code and different lookup tables | ||
174 | for different machines. see the kerberos code for an example | ||
175 | of what i didn't want to do; all their endedness-specific `optimizations' | ||
176 | obfuscate the code and in the end were slower than a simpler machine | ||
177 | independent approach. however, there are always some portability | ||
178 | considerations of some kind, and i have included some options | ||
179 | for varying numbers of register variables. | ||
180 | perhaps some will still regard the result as a mess! | ||
181 | |||
182 | 1) i assume everything is byte addressable, although i don't actually | ||
183 | depend on the byte order, and that bytes are 8 bits. | ||
184 | i assume word pointers can be freely cast to and from char pointers. | ||
185 | note that 99% of C programs make these assumptions. | ||
186 | i always use unsigned char's if the high bit could be set. | ||
187 | 2) the typedef `word' means a 32 bit unsigned integral type. | ||
188 | if `unsigned long' is not 32 bits, change the typedef in desCore.h. | ||
189 | i assume sizeof(word) == 4 EVERYWHERE. | ||
190 | |||
191 | the (worst-case) cost of my NOT doing endedness-specific optimizations | ||
192 | in the data loading and storing code surrounding the key iterations | ||
193 | is less than 12%. also, there is the added benefit that | ||
194 | the input and output work areas do not need to be word-aligned. | ||
195 | |||
196 | |||
197 | OPTIONAL performance optimizations | ||
198 | |||
199 | 1) you should define one of `i386,' `vax,' `mc68000,' or `sparc,' | ||
200 | whichever one is closest to the capabilities of your machine. | ||
201 | see the start of desCode.h to see exactly what this selection implies. | ||
202 | note that if you select the wrong one, the des code will still work; | ||
203 | these are just performance tweaks. | ||
204 | 2) for those with functional `asm' keywords: you should change the | ||
205 | ROR and ROL macros to use machine rotate instructions if you have them. | ||
206 | this will save 2 instructions and a temporary per use, | ||
207 | or about 32 to 40 instructions per en/decryption. | ||
208 | note that gcc is smart enough to translate the ROL/R macros into | ||
209 | machine rotates! | ||
210 | |||
211 | these optimizations are all rather persnickety, yet with them you should | ||
212 | be able to get performance equal to assembly-coding, except that: | ||
213 | 1) with the lack of a bit rotate operator in C, rotates have to be synthesized | ||
214 | from shifts. so access to `asm' will speed things up if your machine | ||
215 | has rotates, as explained above in (3) (not necessary if you use gcc). | ||
216 | 2) if your machine has less than 12 32-bit registers i doubt your compiler will | ||
217 | generate good code. | ||
218 | `i386' tries to configure the code for a 386 by only declaring 3 registers | ||
219 | (it appears that gcc can use ebx, esi and edi to hold register variables). | ||
220 | however, if you like assembly coding, the 386 does have 7 32-bit registers, | ||
221 | and if you use ALL of them, use `scaled by 8' address modes with displacement | ||
222 | and other tricks, you can get reasonable routines for DesQuickCore... with | ||
223 | about 250 instructions apiece. For DesSmall... it will help to rearrange | ||
224 | des_keymap, i.e., now the sbox # is the high part of the index and | ||
225 | the 6 bits of data is the low part; it helps to exchange these. | ||
226 | since i have no way to conveniently test it i have not provided my | ||
227 | shoehorned 386 version. note that with this release of desCore, gcc is able | ||
228 | to put everything in registers(!), and generate about 370 instructions apiece | ||
229 | for the DesQuickCore... routines! | ||
230 | |||
231 | coding notes | ||
232 | |||
233 | the en/decryption routines each use 6 necessary register variables, | ||
234 | with 4 being actively used at once during the inner iterations. | ||
235 | if you don't have 4 register variables get a new machine. | ||
236 | up to 8 more registers are used to hold constants in some configurations. | ||
237 | |||
238 | i assume that the use of a constant is more expensive than using a register: | ||
239 | a) additionally, i have tried to put the larger constants in registers. | ||
240 | registering priority was by the following: | ||
241 | anything more than 12 bits (bad for RISC and CISC) | ||
242 | greater than 127 in value (can't use movq or byte immediate on CISC) | ||
243 | 9-127 (may not be able to use CISC shift immediate or add/sub quick), | ||
244 | 1-8 were never registered, being the cheapest constants. | ||
245 | b) the compiler may be too stupid to realize table and table+256 should | ||
246 | be assigned to different constant registers and instead repetitively | ||
247 | do the arithmetic, so i assign these to explicit `m' register variables | ||
248 | when possible and helpful. | ||
249 | |||
250 | i assume that indexing is cheaper or equivalent to auto increment/decrement, | ||
251 | where the index is 7 bits unsigned or smaller. | ||
252 | this assumption is reversed for 68k and vax. | ||
253 | |||
254 | i assume that addresses can be cheaply formed from two registers, | ||
255 | or from a register and a small constant. | ||
256 | for the 68000, the `two registers and small offset' form is used sparingly. | ||
257 | all index scaling is done explicitly - no hidden shifts by log2(sizeof). | ||
258 | |||
259 | the code is written so that even a dumb compiler | ||
260 | should never need more than one hidden temporary, | ||
261 | increasing the chance that everything will fit in the registers. | ||
262 | KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING. | ||
263 | (actually, there are some code fragments now which do require two temps, | ||
264 | but fixing it would either break the structure of the macros or | ||
265 | require declaring another temporary). | ||
266 | |||
267 | |||
268 | special efficient data format | ||
269 | |||
270 | bits are manipulated in this arrangement most of the time (S7 S5 S3 S1): | ||
271 | 003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx | ||
272 | (the x bits are still there, i'm just emphasizing where the S boxes are). | ||
273 | bits are rotated left 4 when computing S6 S4 S2 S0: | ||
274 | 282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx | ||
275 | the rightmost two bits are usually cleared so the lower byte can be used | ||
276 | as an index into an sbox mapping table. the next two x'd bits are set | ||
277 | to various values to access different parts of the tables. | ||
278 | |||
279 | |||
280 | how to use the routines | ||
281 | |||
282 | datatypes: | ||
283 | pointer to 8 byte area of type DesData | ||
284 | used to hold keys and input/output blocks to des. | ||
285 | |||
286 | pointer to 128 byte area of type DesKeys | ||
287 | used to hold full 768-bit key. | ||
288 | must be long-aligned. | ||
289 | |||
290 | DesQuickInit() | ||
291 | call this before using any other routine with `Quick' in its name. | ||
292 | it generates the special 64k table these routines need. | ||
293 | DesQuickDone() | ||
294 | frees this table | ||
295 | |||
296 | DesMethod(m, k) | ||
297 | m points to a 128byte block, k points to an 8 byte des key | ||
298 | which must have odd parity (or -1 is returned) and which must | ||
299 | not be a (semi-)weak key (or -2 is returned). | ||
300 | normally DesMethod() returns 0. | ||
301 | m is filled in from k so that when one of the routines below | ||
302 | is called with m, the routine will act like standard des | ||
303 | en/decryption with the key k. if you use DesMethod, | ||
304 | you supply a standard 56bit key; however, if you fill in | ||
305 | m yourself, you will get a 768bit key - but then it won't | ||
306 | be standard. it's 768bits not 1024 because the least significant | ||
307 | two bits of each byte are not used. note that these two bits | ||
308 | will be set to magic constants which speed up the encryption/decryption | ||
309 | on some machines. and yes, each byte controls | ||
310 | a specific sbox during a specific iteration. | ||
311 | you really shouldn't use the 768bit format directly; i should | ||
312 | provide a routine that converts 128 6-bit bytes (specified in | ||
313 | S-box mapping order or something) into the right format for you. | ||
314 | this would entail some byte concatenation and rotation. | ||
315 | |||
316 | Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s) | ||
317 | performs des on the 8 bytes at s into the 8 bytes at d. (d,s: char *). | ||
318 | uses m as a 768bit key as explained above. | ||
319 | the Encrypt|Decrypt choice is obvious. | ||
320 | Fips|Core determines whether a completely standard FIPS initial | ||
321 | and final permutation is done; if not, then the data is loaded | ||
322 | and stored in a nonstandard bit order (FIPS w/o IP/FP). | ||
323 | Fips slows down Quick by 10%, Small by 9%. | ||
324 | Small|Quick determines whether you use the normal routine | ||
325 | or the crazy quick one which gobbles up 64k more of memory. | ||
326 | Small is 50% slower then Quick, but Quick needs 32 times as much | ||
327 | memory. Quick is included for programs that do nothing but DES, | ||
328 | e.g., encryption filters, etc. | ||
329 | |||
330 | |||
331 | Getting it to compile on your machine | ||
332 | |||
333 | there are no machine-dependencies in the code (see porting), | ||
334 | except perhaps the `now()' macro in desTest.c. | ||
335 | ALL generated tables are machine independent. | ||
336 | you should edit the Makefile with the appropriate optimization flags | ||
337 | for your compiler (MAX optimization). | ||
338 | |||
339 | |||
340 | Speeding up kerberos (and/or its des library) | ||
341 | |||
342 | note that i have included a kerberos-compatible interface in desUtil.c | ||
343 | through the functions des_key_sched() and des_ecb_encrypt(). | ||
344 | to use these with kerberos or kerberos-compatible code put desCore.a | ||
345 | ahead of the kerberos-compatible library on your linker's command line. | ||
346 | you should not need to #include desCore.h; just include the header | ||
347 | file provided with the kerberos library. | ||
348 | |||
349 | Other uses | ||
350 | |||
351 | the macros in desCode.h would be very useful for putting inline des | ||
352 | functions in more complicated encryption routines. | ||