[cctbxbb] Trapping NaN or not trapping NaN

Graeme.Winter at diamond.ac.uk Graeme.Winter at diamond.ac.uk
Mon Apr 11 01:11:34 PDT 2016


Agree - compiler writers know more than I do about making fast code :)

This trapping is a real pain and probably was useful back when SGI's and Alphas roamed the earth, but today I think this is just unhelpful - I vote for retiring this

Will also help with graphics programs etc too! Often get crashes in matplotlib from this...

Thanks Graeme

-----Original Message-----
From: cctbxbb-bounces at phenix-online.org [mailto:cctbxbb-bounces at phenix-online.org] On Behalf Of Luc Bourhis
Sent: 11 April 2016 09:07
To: cctbx mailing list
Subject: Re: [cctbxbb] Trapping NaN or not trapping NaN

Hi Paul,

> On 11 Apr 2016, at 05:07, Paul Adams <pdadams at lbl.gov> wrote:
> Okay, this sounds like a reasonable solution under the circumstances. Do you anticipate any problems with removing the trap? I only have the vaguest recollection of why it was in there, other than to make sure the code didn’t have errors that created NaN.

First, let me remind we trap three floating point exceptions: division-by-zero, overflow (i.e. infinity), and NaN. And that we don’t do that for releases: only for development. The only point of trapping, really, is that the code fails as close to the origin of the problem as possible. Indeed if a NaN, or an +/- infinity was to make its way to our tests, we could catch it: one test would fail. Especially true for NaN, since if a is NaN then a > b, a <= b, etc are all false, for any value of b. But then we see a NaN or a +/- Infinity as the end result of what might be a long series of code, going through several layers of the cctbx and then the scitbx for example: it might be difficult to find where the problem actually occurs. 

The problem is mitigated by the fine granularity of our tests: if e.g. a division-by-zero occurs in matrix inversion in the scitbx and makes infinities appear in a cctbx test, I would say that it is most likely the scitbx test for matrix inversion would have caught it too. But there is still the risk of a hard to chase bug. But then the developer investigating the bug can switch trapping on again. With the caveat that he may then trigger spurious bugs like the one we are discussing. Considering all those remarks, I’d say it is a good as it gets.

Thinking some more about it, the problem spreads to all 3 floating point exceptions. Indeed the problem we have now for NaN could appear for the other traps too. The following code:

double x = x != 0 ? 1/x : 0;

could be optimised by the compiler so as to compute 1/x regardless of the value of x and then pick either the result of that or 0 depending on the test x != 0. Since we trap division by 0, we would crash although the code returns a perfectly valid value. Same with the following code:

double y = x > too_big ? plateau : std::exp(x);

The compiler may decide to always evaluate “plateau” and std::exp(x) in parallel and then select the right result. Then we may get +infinity if x is too big.

But let’s be conservative: we have only seen issues with NaN’s, so let’s disable that trap only for now, keeping in mind we may need to remove the others too.

>  I’m hoping that compilers may have moved on in the last few years… 

You have to realise that it is what we are doing that is esoteric. Trapping floating-point exceptions is not standard at all and the like of Boost certainly don’t routinely test in an environment with such traps. There are high-profile codes out there that do take advantage of infinities and division-by-zero to produce accurate results fast. Boost implementation of special functions and probability distribution thrives to be compliant with IEEE 754 and therefore gracefully handle infinities and zeroes for example.

Best wishes,


cctbxbb mailing list
cctbxbb at phenix-online.org

This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. 
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom

More information about the cctbxbb mailing list