author     Joseph Myers <joseph@codesourcery.com>  2013-11-04 11:54:46 -0500
committer  Scott Wood <scottwood@freescale.com>  2014-01-07 19:38:59 -0500
commit     28fbf1d540920ad6722fa6ac15237a307932bc9b
tree       f9aab017ed91c41428e9a019c25d58d0a24e5f29
parent     d06b3326dfd02f3f036b670d622fe56eb68a5f30
powerpc: fix e500 SPE float to integer and fixed-point conversions
The e500 SPE floating-point emulation code has several problems in how it handles conversions to integer and fixed-point fractional types.

There are the following 20 relevant instructions. These can convert to signed or unsigned 32-bit integers, either rounding towards zero (as correct for C casts from floating-point to integer) or according to the current rounding mode, or to signed or unsigned 32-bit fixed-point values (values in the range [-1, 1) or [0, 1)). For conversion from double precision there are also instructions to convert to 64-bit integers, rounding towards zero, although as far as I know those instructions are completely theoretical (they are only defined for implementations that support both SPE and classic 64-bit, and I'm not aware of any such hardware even though the architecture definition permits that combination).

#define EFSCTUI    0x2d4
#define EFSCTSI    0x2d5
#define EFSCTUF    0x2d6
#define EFSCTSF    0x2d7
#define EFSCTUIZ   0x2d8
#define EFSCTSIZ   0x2da
#define EVFSCTUI   0x294
#define EVFSCTSI   0x295
#define EVFSCTUF   0x296
#define EVFSCTSF   0x297
#define EVFSCTUIZ  0x298
#define EVFSCTSIZ  0x29a
#define EFDCTUIDZ  0x2ea
#define EFDCTSIDZ  0x2eb
#define EFDCTUI    0x2f4
#define EFDCTSI    0x2f5
#define EFDCTUF    0x2f6
#define EFDCTSF    0x2f7
#define EFDCTUIZ   0x2f8
#define EFDCTSIZ   0x2fa

The emulation code, for the instructions that come in variants rounding either towards zero or according to the current rounding direction, uses "if (func & 0x4)" as a condition for using _FP_ROUND (otherwise _FP_ROUND_ZERO is used). The condition is correct, but the code it controls isn't. Whether _FP_ROUND or _FP_ROUND_ZERO is used makes no difference, as the effect of those soft-fp macros is to round an intermediate floating-point result using the low three bits (the last one sticky) of the working format. As these operations are dealing with a freshly unpacked floating-point input, those low bits are zero and no rounding occurs.
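For reference, the intended behavior of the 32-bit integer conversions can be modeled in user-space C. This is a minimal sketch under stated assumptions, not the kernel code: it operates on a `double` instead of soft-fp's unpacked representation, `enum rnd_mode`, `round_in_mode()` and `spe_to_int32()` are hypothetical names, and it follows the Power ISA rules summarized in this message (NaN converts to zero, out-of-range values saturate, so negative values give 0 for unsigned results). The signedness test `(func & 0x3) != 0` is only meaningful for the integer forms, since fractional opcodes such as EFSCTUF (0x2d6) also have non-zero low bits.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Opcode values from the list above: the single-precision integer forms. */
#define EFSCTUI   0x2d4
#define EFSCTSI   0x2d5
#define EFSCTUIZ  0x2d8
#define EFSCTSIZ  0x2da

/* Hypothetical stand-in for the rounding-mode field of SPEFSCR. */
enum rnd_mode { RND_NEAREST, RND_ZERO, RND_UP, RND_DOWN };

/* Round a finite x to an integral value using the given mode. */
static double round_in_mode(double x, enum rnd_mode mode)
{
	double t, frac;

	assert(x > -0x1p62 && x < 0x1p62);	/* keeps the int64_t cast defined */
	t = (double)(int64_t)x;			/* truncate toward zero */
	frac = x - t;				/* exact, in (-1, 1) */

	switch (mode) {
	case RND_UP:
		return frac > 0 ? t + 1 : t;
	case RND_DOWN:
		return frac < 0 ? t - 1 : t;
	case RND_NEAREST:			/* ties to even */
		if (frac > 0.5 || frac < -0.5 ||
		    ((frac == 0.5 || frac == -0.5) && ((int64_t)t & 1)))
			return t + (frac > 0 ? 1 : -1);
		return t;
	default:				/* RND_ZERO */
		return t;
	}
}

/*
 * User-space model of efsctui, efsctsi, efsctuiz and efsctsiz.
 * Bit 0x4 of func selects "current rounding mode" versus "always
 * truncate", and a non-zero (func & 0x3) selects a signed result;
 * the operand's sign plays no part in that choice.
 */
static uint32_t spe_to_int32(unsigned func, double x, enum rnd_mode mode)
{
	int is_signed = (func & 0x3) != 0;
	double lo = is_signed ? -2147483648.0 : 0.0;
	double hi = is_signed ? 2147483647.0 : 4294967295.0;
	double r;

	if (isnan(x))
		return 0;			/* NaN converts to zero */
	if (!(func & 0x4))
		mode = RND_ZERO;		/* the ...Z forms always truncate */

	/* Clamp infinities and huge values; the saturation below gives
	 * the same final answer and keeps round_in_mode() in range. */
	if (x > 0x1p40)
		x = 0x1p40;
	if (x < -0x1p40)
		x = -0x1p40;

	r = round_in_mode(x, mode);
	if (r < lo)
		r = lo;				/* e.g. negative -> 0 when unsigned */
	if (r > hi)
		r = hi;
	return is_signed ? (uint32_t)(int32_t)r : (uint32_t)r;
}
```

For instance, `spe_to_int32(EFSCTUIZ, -5.0, RND_NEAREST)` returns 0 (unsigned saturation, not a sign-dependent signed conversion), and with `RND_DOWN` in effect -2.5 converts to the bit pattern of -3 under EFSCTSI but of -2 under EFSCTSIZ.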
The emulation code then uses the FP_TO_INT_* macros for the actual integer conversion, with the effect of always rounding towards zero; for rounding according to the current rounding direction, it should be using FP_TO_INT_ROUND_*.

The instructions in question have semantics defined (in the Power ISA documents) for out-of-range values and NaNs: out-of-range values saturate and NaNs are converted to zero. The emulation does nothing to follow those semantics for NaNs (the soft-fp handling is to treat them as infinities), and messes up the saturation semantics. For single-precision conversion to integers, (((func & 0x3) != 0) || SB_s) is the condition used for doing a signed conversion. The first part is correct, but the second isn't: negative numbers should result in saturation to 0 when converted to unsigned. Double-precision conversion to 64-bit integers correctly uses ((func & 0x1) == 0). Double-precision conversion to 32-bit integers uses (((func & 0x3) != 0) || DB_s), with a correct first part and an incorrect second part. And vector float conversion to integers uses (((func & 0x3) != 0) || SB0_s) (and similarly for the other vector element), where the sign bit check is again wrong.

The incorrect handling of negative numbers converted to unsigned was introduced in commit afc0a07d4a283599ac3a6a31d7454e9baaeccca0. The rationale given there was a C testcase with a cast from float to unsigned int. Conversion of out-of-range floating-point numbers to integer types in C is undefined behavior in the base standard, and Annex F defines it to produce an unspecified value. That is, the C testcase used to justify that patch is incorrect (there is no ISO C requirement for a particular value resulting from this conversion), and in any case, the correct semantics for such emulation are the semantics for the instruction (unsigned saturation, which is what it does in hardware when the emulation is disabled).

The conversion to fixed-point values has its own problems.
The fixed-point conversion code doesn't try to do a full emulation; it relies on the trap handler only being called for arguments that are infinities, NaNs, subnormals or out-of-range values. That's fine, but the logic ((vb.wp[1] >> 23) == 0xff && ((vb.wp[1] & 0x7fffff) > 0)) for NaN detection won't detect negative NaNs as being NaNs (the same applies for the double-precision case), and subnormals are mapped to 0 rather than respecting the rounding mode; the code should also explicitly raise the "invalid" exception. The code for vectors works by executing the scalar float instruction with trapping disabled, meaning at least subnormals won't be handled correctly.

As well as all those problems in the main emulation code, the rounding handler (used to emulate rounding upward and downward when not supported in hardware and when no higher-priority exception occurred) has its own problems.

* It gets called in some cases even for the instructions rounding to zero, and then acts according to the current rounding mode when it should just leave the truncated result provided by hardware alone.
* It presumes that the result is a single-precision, double-precision or single-precision vector as appropriate for the instruction type, determines the sign of the result accordingly, and then adjusts the result based on that sign and the rounding mode.
  - In the single-precision cases, at least, the sign determination for an integer result is the same as for a floating-point result; in the double-precision case, converted to 32-bit integer or fixed point, the sign of a double-precision value is in the high part of the register but it's the low part of the register that has the result of the conversion.
  - If the result is unsigned fixed-point, its sign may be wrongly determined as negative (this does not actually cause problems, because inexact unsigned fixed-point results with the high bit set can only appear when converting from double, in which case the sign determination is instead wrongly using the high part of the register).
  - If the sign of the result is correctly determined as negative, any adjustment required to change the truncated result to one correct for the rounding mode should be in the opposite direction for two's-complement integers from that for sign-magnitude floating-point values.
  - And if the integer result is zero, the correct sign can only be determined by examining the original operand, and not at all (as far as I can tell) if the operand and result are the same register.

This patch fixes all these problems (as far as possible, given the inability to determine the correct sign in the rounding handler when the truncated result is 0, the conversion is to a signed type and the truncated result has overwritten the original operand).

Conversion to fixed-point now uses full emulation, and does not use "asm" in the vector case; the semantics are exactly those of converting to integer according to the current rounding direction, once the exponent has been adjusted, so the code makes that adjustment and then uses the FP_TO_INT_ROUND_* macros.

The testcase I used for verifying that the instructions (other than the theoretical conversions to 64-bit integers) produce the correct results is at <http://lkml.org/lkml/2013/10/8/708>.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
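The fixed-point strategy described in this message (add 31 or 32 to the exponent, then perform an ordinary rounding, saturating integer conversion) can be sketched in user-space C on the raw bits of a single-precision register word. This is an illustrative model under stated assumptions, not the soft-fp code: it assumes IEEE binary32 `float` with no flush-to-zero, `enum rnd_mode`, `is_nan_bits()` and `spe_to_fixed()` are hypothetical names, and it keeps the pre-patch NaN test (as `old_is_nan()`) alongside a sign-independent one to show why negative NaNs slipped through.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the rounding-mode field of SPEFSCR. */
enum rnd_mode { RND_NEAREST, RND_ZERO, RND_UP, RND_DOWN };

/* The pre-patch NaN test: with the sign bit set, (w >> 23) is 0x1ff
 * rather than 0xff, so negative NaNs are not recognized. */
static int old_is_nan(uint32_t w)
{
	return (w >> 23) == 0xff && (w & 0x7fffff) > 0;
}

/* Sign-independent test: all-ones exponent and non-zero fraction. */
static int is_nan_bits(uint32_t w)
{
	return ((w >> 23) & 0xff) == 0xff && (w & 0x7fffff) != 0;
}

/*
 * User-space model of efsctsf/efsctuf: scale by 2^31 (signed) or 2^32
 * (unsigned), i.e. add 31 or 32 to the exponent, then do an ordinary
 * rounding, saturating conversion to a 32-bit integer.
 */
static uint32_t spe_to_fixed(uint32_t w, int is_signed, enum rnd_mode mode)
{
	double lo = is_signed ? -2147483648.0 : 0.0;
	double hi = is_signed ? 2147483647.0 : 4294967295.0;
	double x, t, frac;
	float f;

	if (is_nan_bits(w))
		return 0;			/* NaN converts to zero */

	memcpy(&f, &w, sizeof f);		/* assumes IEEE binary32 */
	/* Exact in double, subnormal inputs included. */
	x = (double)f * (is_signed ? 0x1p31 : 0x1p32);
	if (x > 0x1p40)				/* clamp infinities */
		x = 0x1p40;
	if (x < -0x1p40)
		x = -0x1p40;

	t = (double)(int64_t)x;			/* truncate toward zero */
	frac = x - t;				/* exact */
	assert(frac > -1.0 && frac < 1.0);
	if ((mode == RND_UP && frac > 0) ||
	    (mode == RND_DOWN && frac < 0) ||
	    (mode == RND_NEAREST &&		/* ties to even */
	     (frac > 0.5 || frac < -0.5 ||
	      ((frac == 0.5 || frac == -0.5) && ((int64_t)t & 1)))))
		t += frac > 0 ? 1 : -1;

	if (t < lo)				/* saturate */
		t = lo;
	if (t > hi)
		t = hi;
	return is_signed ? (uint32_t)(int32_t)t : (uint32_t)t;
}
```

With this model, 0.5 (0x3f000000) converts to 0x40000000 as signed and 0x80000000 as unsigned, 1.0 saturates to 0x7fffffff as signed, and the smallest positive subnormal (0x00000001) converts to 1 under upward rounding rather than always to 0.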
-rw-r--r-- arch/powerpc/math-emu/math_efp.c | 263
1 files changed, 188 insertions, 75 deletions
diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index ecdf35d8cb00..01a0abb94dd6 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -275,21 +275,13 @@ int do_spe_mathemu(struct pt_regs *regs)
 
 		case EFSCTSF:
 		case EFSCTUF:
-			if (!((vb.wp[1] >> 23) == 0xff && ((vb.wp[1] & 0x7fffff) > 0))) {
-				/* NaN */
-				if (((vb.wp[1] >> 23) & 0xff) == 0) {
-					/* denorm */
-					vc.wp[1] = 0x0;
-				} else if ((vb.wp[1] >> 31) == 0) {
-					/* positive normal */
-					vc.wp[1] = (func == EFSCTSF) ?
-						0x7fffffff : 0xffffffff;
-				} else { /* negative normal */
-					vc.wp[1] = (func == EFSCTSF) ?
-						0x80000000 : 0x0;
-				}
-			} else { /* rB is NaN */
-				vc.wp[1] = 0x0;
+			if (SB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				SB_e += (func == EFSCTSF ? 31 : 32);
+				FP_TO_INT_ROUND_S(vc.wp[1], SB, 32,
+						(func == EFSCTSF));
 			}
 			goto update_regs;
 
@@ -306,16 +298,25 @@ int do_spe_mathemu(struct pt_regs *regs)
 		}
 
 		case EFSCTSI:
-		case EFSCTSIZ:
 		case EFSCTUI:
+			if (SB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_ROUND_S(vc.wp[1], SB, 32,
+						((func & 0x3) != 0));
+			}
+			goto update_regs;
+
+		case EFSCTSIZ:
 		case EFSCTUIZ:
-			if (func & 0x4) {
-				_FP_ROUND(1, SB);
+			if (SB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
 			} else {
-				_FP_ROUND_ZERO(1, SB);
+				FP_TO_INT_S(vc.wp[1], SB, 32,
+						((func & 0x3) != 0));
 			}
-			FP_TO_INT_S(vc.wp[1], SB, 32,
-					(((func & 0x3) != 0) || SB_s));
 			goto update_regs;
 
 		default:
@@ -404,22 +405,13 @@ cmp_s:
 
 		case EFDCTSF:
 		case EFDCTUF:
-			if (!((vb.wp[0] >> 20) == 0x7ff &&
-			   ((vb.wp[0] & 0xfffff) > 0 || (vb.wp[1] > 0)))) {
-				/* not a NaN */
-				if (((vb.wp[0] >> 20) & 0x7ff) == 0) {
-					/* denorm */
-					vc.wp[1] = 0x0;
-				} else if ((vb.wp[0] >> 31) == 0) {
-					/* positive normal */
-					vc.wp[1] = (func == EFDCTSF) ?
-						0x7fffffff : 0xffffffff;
-				} else { /* negative normal */
-					vc.wp[1] = (func == EFDCTSF) ?
-						0x80000000 : 0x0;
-				}
-			} else { /* NaN */
-				vc.wp[1] = 0x0;
+			if (DB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				DB_e += (func == EFDCTSF ? 31 : 32);
+				FP_TO_INT_ROUND_D(vc.wp[1], DB, 32,
+						(func == EFDCTSF));
 			}
 			goto update_regs;
 
@@ -437,21 +429,35 @@ cmp_s:
 
 		case EFDCTUIDZ:
 		case EFDCTSIDZ:
-			_FP_ROUND_ZERO(2, DB);
-			FP_TO_INT_D(vc.dp[0], DB, 64, ((func & 0x1) == 0));
+			if (DB_c == FP_CLS_NAN) {
+				vc.dp[0] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_D(vc.dp[0], DB, 64,
+						((func & 0x1) == 0));
+			}
 			goto update_regs;
 
 		case EFDCTUI:
 		case EFDCTSI:
+			if (DB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_ROUND_D(vc.wp[1], DB, 32,
+						((func & 0x3) != 0));
+			}
+			goto update_regs;
+
 		case EFDCTUIZ:
 		case EFDCTSIZ:
-			if (func & 0x4) {
-				_FP_ROUND(2, DB);
+			if (DB_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
 			} else {
-				_FP_ROUND_ZERO(2, DB);
+				FP_TO_INT_D(vc.wp[1], DB, 32,
+						((func & 0x3) != 0));
 			}
-			FP_TO_INT_D(vc.wp[1], DB, 32,
-					(((func & 0x3) != 0) || DB_s));
 			goto update_regs;
 
 		default:
@@ -556,37 +562,60 @@ cmp_d:
 			cmp = -1;
 			goto cmp_vs;
 
-		case EVFSCTSF:
-			__asm__ __volatile__ ("mtspr 512, %4\n"
-				"efsctsf %0, %2\n"
-				"efsctsf %1, %3\n"
-				: "=r" (vc.wp[0]), "=r" (vc.wp[1])
-				: "r" (vb.wp[0]), "r" (vb.wp[1]), "r" (0));
-			goto update_regs;
-
 		case EVFSCTUF:
-			__asm__ __volatile__ ("mtspr 512, %4\n"
-				"efsctuf %0, %2\n"
-				"efsctuf %1, %3\n"
-				: "=r" (vc.wp[0]), "=r" (vc.wp[1])
-				: "r" (vb.wp[0]), "r" (vb.wp[1]), "r" (0));
+		case EVFSCTSF:
+			if (SB0_c == FP_CLS_NAN) {
+				vc.wp[0] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				SB0_e += (func == EVFSCTSF ? 31 : 32);
+				FP_TO_INT_ROUND_S(vc.wp[0], SB0, 32,
+						(func == EVFSCTSF));
+			}
+			if (SB1_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				SB1_e += (func == EVFSCTSF ? 31 : 32);
+				FP_TO_INT_ROUND_S(vc.wp[1], SB1, 32,
+						(func == EVFSCTSF));
+			}
 			goto update_regs;
 
 		case EVFSCTUI:
 		case EVFSCTSI:
+			if (SB0_c == FP_CLS_NAN) {
+				vc.wp[0] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_ROUND_S(vc.wp[0], SB0, 32,
+						((func & 0x3) != 0));
+			}
+			if (SB1_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_ROUND_S(vc.wp[1], SB1, 32,
+						((func & 0x3) != 0));
+			}
+			goto update_regs;
+
 		case EVFSCTUIZ:
 		case EVFSCTSIZ:
-			if (func & 0x4) {
-				_FP_ROUND(1, SB0);
-				_FP_ROUND(1, SB1);
+			if (SB0_c == FP_CLS_NAN) {
+				vc.wp[0] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
 			} else {
-				_FP_ROUND_ZERO(1, SB0);
-				_FP_ROUND_ZERO(1, SB1);
+				FP_TO_INT_S(vc.wp[0], SB0, 32,
+						((func & 0x3) != 0));
+			}
+			if (SB1_c == FP_CLS_NAN) {
+				vc.wp[1] = 0;
+				FP_SET_EXCEPTION(FP_EX_INVALID);
+			} else {
+				FP_TO_INT_S(vc.wp[1], SB1, 32,
+						((func & 0x3) != 0));
 			}
-			FP_TO_INT_S(vc.wp[0], SB0, 32,
-					(((func & 0x3) != 0) || SB0_s));
-			FP_TO_INT_S(vc.wp[1], SB1, 32,
-					(((func & 0x3) != 0) || SB1_s));
 			goto update_regs;
 
 		default:
@@ -681,14 +710,16 @@ int speround_handler(struct pt_regs *regs)
 	union dw_union fgpr;
 	int s_lo, s_hi;
 	int lo_inexact, hi_inexact;
-	unsigned long speinsn, type, fc, fptype;
+	int fp_result;
+	unsigned long speinsn, type, fb, fc, fptype, func;
 
 	if (get_user(speinsn, (unsigned int __user *) regs->nip))
 		return -EFAULT;
 	if ((speinsn >> 26) != 4)
 		return -EINVAL; /* not an spe instruction */
 
-	type = insn_type(speinsn & 0x7ff);
+	func = speinsn & 0x7ff;
+	type = insn_type(func);
 	if (type == XCR) return -ENOSYS;
 
 	__FPU_FPSCR = mfspr(SPRN_SPEFSCR);
@@ -708,6 +739,65 @@ int speround_handler(struct pt_regs *regs)
 	fgpr.wp[0] = current->thread.evr[fc];
 	fgpr.wp[1] = regs->gpr[fc];
 
+	fb = (speinsn >> 11) & 0x1f;
+	switch (func) {
+	case EFSCTUIZ:
+	case EFSCTSIZ:
+	case EVFSCTUIZ:
+	case EVFSCTSIZ:
+	case EFDCTUIDZ:
+	case EFDCTSIDZ:
+	case EFDCTUIZ:
+	case EFDCTSIZ:
+		/*
+		 * These instructions always round to zero,
+		 * independent of the rounding mode.
+		 */
+		return 0;
+
+	case EFSCTUI:
+	case EFSCTUF:
+	case EVFSCTUI:
+	case EVFSCTUF:
+	case EFDCTUI:
+	case EFDCTUF:
+		fp_result = 0;
+		s_lo = 0;
+		s_hi = 0;
+		break;
+
+	case EFSCTSI:
+	case EFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible. */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		break;
+
+	case EVFSCTSI:
+	case EVFSCTSF:
+		fp_result = 0;
+		/* Recover the sign of a zero result if possible. */
+		if (fgpr.wp[1] == 0)
+			s_lo = regs->gpr[fb] & SIGN_BIT_S;
+		if (fgpr.wp[0] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	case EFDCTSI:
+	case EFDCTSF:
+		fp_result = 0;
+		s_hi = s_lo;
+		/* Recover the sign of a zero result if possible. */
+		if (fgpr.wp[1] == 0)
+			s_hi = current->thread.evr[fb] & SIGN_BIT_S;
+		break;
+
+	default:
+		fp_result = 1;
+		break;
+	}
+
 	pr_debug("round fgpr: %08x %08x\n", fgpr.wp[0], fgpr.wp[1]);
 
 	switch (fptype) {
@@ -719,15 +809,30 @@ int speround_handler(struct pt_regs *regs)
 		if ((FP_ROUNDMODE) == FP_RND_PINF) {
 			if (!s_lo) fgpr.wp[1]++; /* Z > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (s_lo) fgpr.wp[1]++; /* Z < 0, choose Z2 */
+			if (s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
 	case DPFP:
 		if (FP_ROUNDMODE == FP_RND_PINF) {
-			if (!s_hi) fgpr.dp[0]++; /* Z > 0, choose Z1 */
+			if (!s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z > 0, choose Z1 */
+				else
+					fgpr.wp[1]++; /* Z > 0, choose Z1 */
+			}
 		} else { /* round to -Inf */
-			if (s_hi) fgpr.dp[0]++; /* Z < 0, choose Z2 */
+			if (s_hi) {
+				if (fp_result)
+					fgpr.dp[0]++; /* Z < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z < 0, choose Z2 */
+			}
 		}
 		break;
 
@@ -738,10 +843,18 @@ int speround_handler(struct pt_regs *regs)
 			if (hi_inexact && !s_hi)
 				fgpr.wp[0]++; /* Z_high word > 0, choose Z1 */
 		} else { /* round to -Inf */
-			if (lo_inexact && s_lo)
-				fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
-			if (hi_inexact && s_hi)
-				fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+			if (lo_inexact && s_lo) {
+				if (fp_result)
+					fgpr.wp[1]++; /* Z_low < 0, choose Z2 */
+				else
+					fgpr.wp[1]--; /* Z_low < 0, choose Z2 */
+			}
+			if (hi_inexact && s_hi) {
+				if (fp_result)
+					fgpr.wp[0]++; /* Z_high < 0, choose Z2 */
+				else
+					fgpr.wp[0]--; /* Z_high < 0, choose Z2 */
+			}
 		}
 		break;
 
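The direction fix in the last hunks of speround_handler() can be illustrated in isolation. The sketch below is a simplified user-space model, not the kernel handler: `enum dir_mode` and `round_fixup()` are hypothetical names, and it works on a single 32-bit word with the sign of the exact result passed in separately (in the patched DPFP path the sign comes from the high word while an integer result is adjusted in the low word). It assumes, as the handler does, that hardware has already produced a result truncated toward zero and flagged it as inexact.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two directed modes the handler emulates. */
enum dir_mode { ROUND_UP, ROUND_DOWN };

/*
 * Fix up one word that hardware truncated toward zero and flagged as
 * inexact.  'negative' is the sign of the exact result (for a zero
 * integer result it must be recovered from the operand, if that still
 * exists); 'fp_result' is 1 for a sign-magnitude float and 0 for a
 * two's-complement integer or fixed-point value.
 */
static uint32_t round_fixup(uint32_t word, int negative, int fp_result,
			    enum dir_mode mode)
{
	assert(mode == ROUND_UP || mode == ROUND_DOWN);

	if (mode == ROUND_UP) {
		/* A non-negative value moves away from zero: +1 step
		 * for both integers and float bit patterns. */
		if (!negative)
			word++;
	} else if (negative) {
		if (fp_result)
			word++;		/* larger magnitude, more negative */
		else
			word--;		/* two's complement: one step down */
	}
	return word;
}
```

For example, -2.5 truncated to -2 becomes -3 under downward rounding, whereas the increment the old code applied would have produced -1. The zero case shows why the sign has to be recovered from the operand: a truncated 0 becomes -1 only if `negative` is known to be set.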