author: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 18:20:36 -0400
committer: Linus Torvalds <torvalds@ppc970.osdl.org> 2005-04-16 18:20:36 -0400
commit: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
tree: 0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/arm/nwfpe
Linux-2.6.12-rc2 (tag: v2.6.12-rc2)
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
Diffstat (limited to 'Documentation/arm/nwfpe')
-rw-r--r--  Documentation/arm/nwfpe/NOTES        29
-rw-r--r--  Documentation/arm/nwfpe/README       70
-rw-r--r--  Documentation/arm/nwfpe/README.FPE  156
-rw-r--r--  Documentation/arm/nwfpe/TODO         67
4 files changed, 322 insertions, 0 deletions
diff --git a/Documentation/arm/nwfpe/NOTES b/Documentation/arm/nwfpe/NOTES
new file mode 100644
index 000000000000..40577b5a49d3
--- /dev/null
+++ b/Documentation/arm/nwfpe/NOTES
@@ -0,0 +1,29 @@
There seems to be a problem with exp(double) and our emulator. I haven't
been able to track it down yet. This does not occur with the emulator
supplied by Russell King.

I also found one oddity in the emulator. I don't think it is serious but
will point it out. The ARM calling conventions require floating point
registers f4-f7 to be preserved over a function call. The compiler quite
often uses an stfe instruction to save f4 on the stack upon entry to a
function, and an ldfe instruction to restore it before returning.

I was looking at some code that calculated a double result, stored it in f4,
then made a function call. Upon return from the function call the number in
f4 had been converted to an extended value in the emulator.

This is a side effect of the stfe instruction. The double in f4 had to be
converted to extended, then stored. If an lfm/sfm combination had been used,
then no conversion would occur. This has performance considerations. The
result from the function call and f4 were used in a multiplication. If the
emulator sees a multiply of a double and extended, it promotes the double to
extended, then does the multiply in extended precision.

This code will cause this problem:

double x, y, z;
z = log(x)/log(y);

The result of log(x) (a double) will be calculated, returned in f0, then
moved to f4 to preserve it over the log(y) call. The division will be done
in extended precision, due to the stfe instruction used to save f4 in log(y).
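
The promotion rule described in these notes can be sketched in C. This is
only an illustration of the behaviour: the precision enum and both helper
functions are hypothetical names and are not taken from the emulator source.

```c
/* Hypothetical sketch: these names are illustrative, not emulator code. */
typedef enum { PREC_SINGLE = 0, PREC_DOUBLE = 1, PREC_EXTENDED = 2 } precision;

/* A dyadic operation is carried out at the wider of its two operand
 * precisions, so double combined with extended is done in extended. */
precision operation_precision(precision a, precision b)
{
    return a > b ? a : b;
}

/* Models saving a register with stfe and restoring it with ldfe: the
 * value comes back as extended, whatever precision it had before. */
precision after_stfe_ldfe(precision before)
{
    (void)before;   /* the original precision is not preserved */
    return PREC_EXTENDED;
}
```

With these helpers, a double held in f4 across a call comes back as
extended, so an operation on it and a double result is done in extended
precision.
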
diff --git a/Documentation/arm/nwfpe/README b/Documentation/arm/nwfpe/README
new file mode 100644
index 000000000000..771871de0c8b
--- /dev/null
+++ b/Documentation/arm/nwfpe/README
@@ -0,0 +1,70 @@
This directory contains the version 0.92 test release of the NetWinder
Floating Point Emulator.

The majority of the code was written by me, Scott Bambrough. It is
written in C, with a small number of routines in inline assembler
where required. It was written quickly, with a goal of implementing a
working version of all the floating point instructions the compiler
emits as the first target. I have attempted to be as optimal as
possible, but there remains much room for improvement.

I have attempted to make the emulator as portable as possible. One of
the problems is with leading underscores on kernel symbols. ELF
kernels have no leading underscores; a.out compiled kernels do. I
have attempted to use the C_SYMBOL_NAME macro wherever this may be
important.
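
The definition of C_SYMBOL_NAME is not shown in this file. The following is
a minimal sketch of how such a macro is commonly written, assuming a
configuration switch (SKETCH_AOUT_SYMBOLS, a name invented here) that selects
a.out-style naming. The symbol nwfpe_enter is also only an example name.

```c
/* Hypothetical switch: define it to get a.out-style symbol names. */
/* #define SKETCH_AOUT_SYMBOLS */

#ifdef SKETCH_AOUT_SYMBOLS
#define C_SYMBOL_NAME(name) "_" #name   /* a.out: leading underscore */
#else
#define C_SYMBOL_NAME(name) #name       /* ELF: name used as-is */
#endif

/* Returns the assembler-level spelling of an example C symbol, as it
 * would be pasted into an inline assembler string. */
const char *entry_symbol(void)
{
    return C_SYMBOL_NAME(nwfpe_enter);
}
```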
16
17Another choice I made was in the file structure. I have attempted to
18contain all operating system specific code in one module (fpmodule.*).
19All the other files contain emulator specific code. This should allow
20others to port the emulator to NetBSD for instance relatively easily.
21
22The floating point operations are based on SoftFloat Release 2, by
23John Hauser. SoftFloat is a software implementation of floating-point
24that conforms to the IEC/IEEE Standard for Binary Floating-point
25Arithmetic. As many as four formats are supported: single precision,
26double precision, extended double precision, and quadruple precision.
27All operations required by the standard are implemented, except for
28conversions to and from decimal. We use only the single precision,
29double precision and extended double precision formats. The port of
30SoftFloat to the ARM was done by Phil Blundell, based on an earlier
31port of SoftFloat version 1 by Neil Carson for NetBSD/arm32.
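
As an illustration of what a software floating-point implementation works
with, here is a minimal sketch that splits an IEEE single precision value
into its fields. The type and function names are invented for this example;
SoftFloat's own routines may be named and structured differently.

```c
#include <stdint.h>

/* Hypothetical helper type: not part of SoftFloat. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits, the implicit leading 1 is not stored */
} unpacked_single;

/* Splits the raw 32-bit pattern of a single precision number. */
unpacked_single unpack_single(uint32_t bits)
{
    unpacked_single u;

    u.sign     = bits >> 31;
    u.exponent = (bits >> 23) & 0xFF;
    u.fraction = bits & 0x007FFFFF;
    return u;
}
```

For example, 1.0 has the bit pattern 0x3F800000, which unpacks to sign 0,
exponent 127 and fraction 0.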

The file README.FPE contains a description of what has been implemented
so far in the emulator. The file TODO contains information on what
remains to be done, and other ideas for the emulator.

Bug reports, comments, and suggestions should be directed to me at
<scottb@netwinder.org>. General reports of "this program doesn't
work correctly when your emulator is installed" are useful for
determining that bugs still exist, but are virtually useless when
attempting to isolate the problem. Please report them, but don't
expect quick action. Bugs still exist. The problem remains in isolating
which instruction contains the bug. Small programs illustrating a specific
problem are a godsend.

Legal Notices
-------------

The NetWinder Floating Point Emulator is free software. Everything Rebel.com
has written is provided under the GNU GPL. See the file COPYING for copying
conditions. Excluded from the above is the SoftFloat code. John Hauser's
legal notice for SoftFloat is included below.

-------------------------------------------------------------------------------
SoftFloat Legal Notice

SoftFloat was written by John R. Hauser. This work was made possible in
part by the International Computer Science Institute, located at Suite 600,
1947 Center Street, Berkeley, California 94704. Funding was partially
provided by the National Science Foundation under grant MIP-9311980. The
original version of this code was written as part of a project to build
a fixed-point vector processor in collaboration with the University of
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.

THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
-------------------------------------------------------------------------------
diff --git a/Documentation/arm/nwfpe/README.FPE b/Documentation/arm/nwfpe/README.FPE
new file mode 100644
index 000000000000..26f5d7bb9a41
--- /dev/null
+++ b/Documentation/arm/nwfpe/README.FPE
@@ -0,0 +1,156 @@
The following describes the current state of the NetWinder's floating point
emulator.

The following nomenclature is used to describe the floating point
instructions. It follows the conventions in the ARM manual.

<S|D|E> = <single|double|extended>, no default
{P|M|Z} = {round to +infinity,round to -infinity,round to zero},
          default = round to nearest

Note: items enclosed in {} are optional.

Floating Point Coprocessor Data Transfer Instructions (CPDT)
------------------------------------------------------------

LDF/STF - load and store floating

<LDF|STF>{cond}<S|D|E> Fd, Rn
<LDF|STF>{cond}<S|D|E> Fd, [Rn, #<expression>]{!}
<LDF|STF>{cond}<S|D|E> Fd, [Rn], #<expression>

These instructions are fully implemented.

LFM/SFM - load and store multiple floating

Form 1 syntax:
<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn]
<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn, #<expression>]{!}
<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn], #<expression>

Form 2 syntax:
<LFM|SFM>{cond}<FD,EA> Fd, <count>, [Rn]{!}

These instructions are fully implemented. They store/load three words
for each floating point register into the memory location given in the
instruction. The format in memory is unlikely to be compatible with
other implementations, in particular the actual hardware. Specific
mention of this is made in the ARM manuals.
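
Since each register occupies three words, the amount of memory an LFM/SFM
touches follows directly from the register count. A small sketch, assuming
32-bit (4 byte) ARM words and a count of 1 to 4; the function and constant
names are made up for this example:

```c
#include <stddef.h>

enum { SFM_WORDS_PER_REG = 3, SFM_BYTES_PER_WORD = 4 };

/* Bytes transferred by an LFM/SFM of `count` registers (1 to 4).
 * Returns 0 for a count outside that range. */
size_t lfm_sfm_bytes(unsigned count)
{
    if (count < 1 || count > 4)
        return 0;
    return (size_t)count * SFM_WORDS_PER_REG * SFM_BYTES_PER_WORD;
}
```

Saving f4-f7 with a single SFM therefore needs 48 bytes of stack.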

Floating Point Coprocessor Register Transfer Instructions (CPRT)
----------------------------------------------------------------

Conversions, read/write status/control register instructions

FLT{cond}<S,D,E>{P,M,Z} Fn, Rd   Convert integer to floating point
FIX{cond}{P,M,Z} Rd, Fn          Convert floating point to integer
WFS{cond} Rd                     Write floating point status register
RFS{cond} Rd                     Read floating point status register
WFC{cond} Rd                     Write floating point control register
RFC{cond} Rd                     Read floating point control register

FLT/FIX are fully implemented.

RFS/WFS are fully implemented.

RFC/WFC are fully implemented. RFC/WFC are supervisor only instructions, and
presently check the CPU mode, and raise an invalid instruction trap if not
called from supervisor mode.
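
A sketch of what such a mode check might look like, assuming the 32-bit ARM
program status register layout in which the low five bits hold the processor
mode. The function name is hypothetical, and the emulator's actual check may
be written differently.

```c
#include <stdint.h>

#define PSR_MODE_MASK 0x1Fu   /* low five bits of the 32-bit ARM PSR */
#define PSR_MODE_USR  0x10u   /* user mode */
#define PSR_MODE_SVC  0x13u   /* supervisor mode */

/* Returns 1 if RFC/WFC may execute, 0 if the emulator should raise an
 * invalid instruction trap instead. */
int rfc_wfc_allowed(uint32_t psr)
{
    return (psr & PSR_MODE_MASK) == PSR_MODE_SVC;
}
```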

Compare instructions

CMF{cond} Fn, Fm    Compare floating
CMFE{cond} Fn, Fm   Compare floating with exception
CNF{cond} Fn, Fm    Compare negated floating
CNFE{cond} Fn, Fm   Compare negated floating with exception

These are fully implemented.
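
The relationship between CMF and CNF can be sketched as follows. Host
doubles stand in for the emulator's internal values, the result is returned
as an enum rather than in the ARM condition flags, and exception handling
(the difference between the plain and the E forms) is left out. All names
are invented for this example.

```c
typedef enum { ORD_LESS, ORD_EQUAL, ORD_GREATER, ORD_UNORDERED } fp_order;

/* CMF: compare Fn with Fm. A NaN operand makes the result unordered. */
fp_order cmf(double fn, double fm)
{
    if (fn != fn || fm != fm)   /* only a NaN is unequal to itself */
        return ORD_UNORDERED;
    if (fn < fm)
        return ORD_LESS;
    if (fn > fm)
        return ORD_GREATER;
    return ORD_EQUAL;
}

/* CNF: compare Fn with the negation of Fm. */
fp_order cnf(double fn, double fm)
{
    return cmf(fn, -fm);
}
```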

Floating Point Coprocessor Data Instructions (CPDO)
---------------------------------------------------

Dyadic operations:

ADF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - add
SUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - subtract
RSF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse subtract
MUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - multiply
DVF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - divide
RDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse divide

These are fully implemented.

FML{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast multiply
FDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast divide
FRD{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast reverse divide

These are fully implemented as well. They use the same algorithm as the
non-fast versions. Hence, in this implementation their performance is
equivalent to the MUF/DVF/RDV instructions. This is acceptable according
to the ARM manual. The manual notes these are defined only for single
operands; on the actual FPA11 hardware they do not work for double or
extended precision operands. The emulator currently does not enforce this
restriction, and performs the requested operation at the requested precision.
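
A sketch of how the fast variants can share code with the regular ones.
Host doubles stand in for the emulator's SoftFloat arithmetic, and the enum
and function names are invented for this example.

```c
typedef enum { OP_MUF, OP_DVF, OP_RDV, OP_FML, OP_FDV, OP_FRD } dyadic_op;

/* Computes Fd from Fn and Fm for the multiply/divide family. */
double dyadic(dyadic_op op, double fn, double fm)
{
    switch (op) {
    case OP_MUF:
    case OP_FML:            /* fast multiply: same code as MUF */
        return fn * fm;
    case OP_DVF:
    case OP_FDV:            /* fast divide: same code as DVF */
        return fn / fm;
    case OP_RDV:
    case OP_FRD:            /* fast reverse divide: same code as RDV */
        return fm / fn;
    }
    return 0.0;             /* opcode not handled by this sketch */
}
```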

RMF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - IEEE remainder

This is fully implemented.

Monadic operations:

MVF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move
MNF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move negated

These are fully implemented.

ABS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - absolute value
SQT{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - square root
RND{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - round

These are fully implemented.

URD{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - unnormalized round
NRM{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - normalize

These are implemented. URD is implemented using the same code as the RND
instruction. Since URD cannot return an unnormalized number, NRM becomes
a NOP.

Library calls:

POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)

LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e
EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent

These are not implemented. They are not currently issued by the compiler,
and are handled by routines in libc. These are not implemented by the FPA11
hardware, but are handled by the floating point support code. They should
be implemented in future versions.

Signalling:

Signals are implemented. However, current ELF kernels produced by Rebel.com
have a bug in them that prevents the module from generating a SIGFPE. This
is caused by a failure to alias fp_current to the kernel variable
current_set[0] correctly.

The kernel provided with this distribution (vmlinux-nwfpe-0.93) contains
a fix for this problem and also incorporates the current version of the
emulator directly. It is possible to run with no floating point module
loaded with this kernel. It is provided as a demonstration of the
technology and for those who want to do floating point work that depends
on signals. It is not strictly necessary to use the module.

A module (either the one provided by Russell King, or the one in this
distribution) can be loaded to replace the functionality of the emulator
built into the kernel.
diff --git a/Documentation/arm/nwfpe/TODO b/Documentation/arm/nwfpe/TODO
new file mode 100644
index 000000000000..8027061b60eb
--- /dev/null
+++ b/Documentation/arm/nwfpe/TODO
@@ -0,0 +1,67 @@
TODO LIST
---------

POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)

LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e
EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent

These are not implemented. They are not currently issued by the compiler,
and are handled by routines in libc. These are not implemented by the FPA11
hardware, but are handled by the floating point support code. They should
be implemented in future versions.

There are a couple of ways to approach the implementation of these. One
method would be to use accurate table methods for these routines. I have
a couple of papers by S. Gal from IBM's research labs in Haifa, Israel that
seem to promise extreme accuracy (on the order of 99.8%) and reasonable speed.
These methods are used in GLIBC for some of the transcendental functions.

Another approach, which I know little about, is CORDIC. This stands for
Coordinate Rotation Digital Computer, and is a method of computing
transcendental functions using mostly shifts and adds and a few
multiplications and divisions. The ARM excels at shifts and adds,
so such a method could be promising, but requires more research to
determine if it is feasible.
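
To give an idea of what CORDIC looks like, here is a minimal fixed-point
sketch that computes sine and cosine with shifts, adds and a small table.
It is not a proposal for the emulator: it uses 16.16 fixed point rather
than the emulator's floating point formats, it only handles angles between
-pi/2 and pi/2, and all names in it are invented for this example. It also
assumes that right-shifting a negative integer is an arithmetic shift, as
it is with GCC.

```c
#include <stdint.h>

#define CORDIC_ITERATIONS 16
#define CORDIC_INV_GAIN   39797   /* about 0.60725, scaled by 2^16 */

/* atan(2^-i) in radians, scaled by 2^16. */
static const int32_t cordic_atan[CORDIC_ITERATIONS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

typedef struct { int32_t sine; int32_t cosine; } cordic_result;

/* angle is in radians scaled by 2^16, and so are both results. */
cordic_result cordic_sincos(int32_t angle)
{
    int32_t x = CORDIC_INV_GAIN, y = 0, z = angle;
    cordic_result r;
    int i;

    for (i = 0; i < CORDIC_ITERATIONS; i++) {
        int32_t dx = y >> i;    /* both shifts use the old x and y */
        int32_t dy = x >> i;

        if (z >= 0) {           /* rotate towards the target angle */
            x -= dx; y += dy; z -= cordic_atan[i];
        } else {                /* overshot: rotate back */
            x += dx; y -= dy; z += cordic_atan[i];
        }
    }
    r.sine = y;
    r.cosine = x;
    return r;
}
```

Each iteration adds roughly one bit of accuracy, so the results are only
good to a few units in the last place of the 16-bit fraction.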

Rounding Methods

The IEEE standard defines 4 rounding modes. Round to nearest is the
default, but rounding to + or - infinity or round to zero are also allowed.
Many architectures allow the rounding mode to be specified by modifying bits
in a control register. Not so with the ARM FPA11 architecture. To change
the rounding mode one must specify it with each instruction.

This has made porting some benchmarks difficult. It is possible to
introduce such a capability into the emulator. The FPCR contains
bits describing the rounding mode. The emulator could be altered to
examine a flag, which if set forced it to ignore the rounding mode in
the instruction, and use the mode specified in the bits in the FPCR.
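
A sketch of the selection logic this would need. Everything here is an
assumption made for illustration: the position of the rounding field in the
instruction (bits 5 and 6) should be checked against the ARM manual, the
position of the rounding bits in the FPCR is invented, and the override flag
is simply passed in as a parameter because where it would live is still an
open question.

```c
#include <stdint.h>

/* Rounding field of a data operation opcode (assumed to be bits 5-6). */
#define INSN_ROUNDING_MASK       0x00000060u
#define ROUND_TO_NEAREST         0x00000000u
#define ROUND_TO_PLUS_INFINITY   0x00000020u
#define ROUND_TO_MINUS_INFINITY  0x00000040u
#define ROUND_TO_ZERO            0x00000060u

/* Hypothetical position of the rounding bits inside the FPCR. */
#define FPCR_ROUND_SHIFT  1
#define FPCR_ROUND_MASK   0x3u

/* Returns the rounding mode to use, in the opcode's own encoding.
 * If `override` is non-zero the instruction's mode is ignored and the
 * mode held in the FPCR is used instead. */
uint32_t effective_rounding(uint32_t opcode, uint32_t fpcr, int override)
{
    if (override)
        return ((fpcr >> FPCR_ROUND_SHIFT) & FPCR_ROUND_MASK) << 5;
    return opcode & INSN_ROUNDING_MASK;
}
```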

This would require a method of getting/setting the flag, and the bits
in the FPCR. This requires a kernel call in ArmLinux, as WFC/RFC are
supervisor only instructions. If anyone has any ideas or comments I
would like to hear them.

[NOTE: pulled out from some docs on ARM floating point, specifically
 for the Acorn FPE, but not limited to it:

 The floating point control register (FPCR) may only be present in some
 implementations: it is there to control the hardware in an implementation-
 specific manner, for example to disable the floating point system. The user
 mode of the ARM is not permitted to use this register (since the right is
 reserved to alter it between implementations) and the WFC and RFC
 instructions will trap if tried in user mode.

 Hence, the answer is yes, you could do this, but then you will run a high
 risk of becoming isolated if and when hardware FP emulation comes out
 -- Russell].