[cctbxbb] Trapping NaN or not trapping NaN

Mon Apr 11 01:06:46 PDT 2016

Hi Paul,

> On 11 Apr 2016, at 05:07, Paul Adams <pdadams at lbl.gov> wrote:
> 
> 
> Okay, this sounds like a reasonable solution under the circumstances. Do you anticipate any problems with removing the trap? I only have the vaguest recollection of why it was in there, other than to make sure the code didn’t have errors that created NaN.

First, let me remind we trap three floating point exceptions: division-by-zero, overflow (i.e. infinity), and NaN. And that we don’t do that for releases: only for development. The only point of trapping, really, is that the code fails as close to the origin of the problem as possible. Indeed if a NaN, or an +/- infinity was to make its way to our tests, we could catch it: one test would fail. Especially true for NaN, since if a is NaN then a > b, a <= b, etc are all false, for any value of b. But then we see a NaN or a +/- Infinity as the end result of what might be a long series of code, going through several layers of the cctbx and then the scitbx for example: it might be difficult to find where the problem actually occurs. 

The problem is mitigated by the fine granularity of our tests: if e.g. a division-by-zero occurs in matrix inversion in the scitbx and makes infinities appear in a cctbx test, I would say that it is most likely the scitbx test for matrix inversion would have caught it too. But there is still the risk of a hard to chase bug. But then the developer investigating the bug can switch trapping on again. With the caveat that he may then trigger spurious bugs like the one we are discussing. Considering all those remarks, I’d say it is a good as it gets.

Thinking some more about it, the problem spreads to all 3 floating point exceptions. Indeed the problem we have now for NaN could appear for the other traps too. The following code:

double x = x != 0 ? 1/x : 0;

could be optimised by the compiler so as to compute 1/x regardless of the value of x and then pick either the result of that or 0 depending on the test x != 0. Since we trap division by 0, we would crash although the code returns a perfectly valid value. Same with the following code:

double y = x > too_big ? plateau : std::exp(x);

The compiler may decide to always evaluate “plateau” and std::exp(x) in parallel and then select the right result. Then we may get +infinity if x is too big.

But let’s be conservative: we have only seen issues with NaN’s, so let’s disable that trap only for now, keeping in mind we may need to remove the others too.

>  I’m hoping that compilers may have moved on in the last few years… 

You have to realise that it is what we are doing that is esoteric. Trapping floating-point exceptions is not standard at all and the like of Boost certainly don’t routinely test in an environment with such traps. There are high-profile codes out there that do take advantage of infinities and division-by-zero to produce accurate results fast. Boost implementation of special functions and probability distribution thrives to be compliant with IEEE 754 and therefore gracefully handle infinities and zeroes for example.

Best wishes,

Luc