[phenixbb] Are sigma cutoffs for R-free reflections cheating?

Thu Dec 10 10:28:18 PST 2009

(I got busy and did not follow up earlier.)

I think the value of weak reflections is obvious once you think about
it the right way, and it is why I/sigma cutoffs have been discouraged
for many years now. The significance of a reflection is not its overall
amplitude, but how far it deviates from the expected value, which is
approximately the average value for its resolution bin. Weak
reflections are far more useful than average-intensity reflections. If
you picture the structure factor as a vector in the complex plane, a
weak reflection is very well defined because its phase hardly matters:
whatever the phase, the vector stays close to the origin.

People get confused about the significance of I/sigma for two reasons.
First, weak reflections have almost no effect on maps, and many rules
of thumb were developed for heavy-atom methods. Refinement is different.

Second, I/sigma is a useful validity measure for a set (resolution bin) 
of reflections, because it indicates the significance of the expectation 
value. If the I/sigma for a shell is small, then all reflections are 
approximately the same as the expectation value.

In my experience, anisotropic data can be poorly behaved when many weak
reflections are missing in the low-resolution directions, because there
is nothing to prevent the calculated structure factors from refining to
non-zero values there.

My point is that whatever the argument for culling reflections, or for
any other sort of weighting scheme, it should never be applied when
calculating the "true" R-free value.

In practice, some culling is sensible when scaling, but it should be 
restricted to rejections based on multiple observations of the same 
reflection, and not systematic culling such as I/sigma cutoffs. That is 
why HKL sets the default sigma cutoff to -3.
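
To make the difference between those two kinds of cutoff concrete, here
is a minimal numerical sketch. It is not the calculation from any paper
or program. Everything in it is an assumption chosen for illustration:
acentric Wilson (exponential) intensities, a constant sigma, ten
resolution shells, a "true" scale of 1000 and B of 20 A^2, and an
unweighted straight-line fit of ln<I> against (sin(theta)/lambda)^2.
The function names are hypothetical. It compares a cutoff of -3 with a
cutoff of +2 on the same simulated data.

```python
import math
import random

# All values below are illustrative assumptions, not taken from real data.
TRUE_SCALE = 1000.0   # overall scale k
TRUE_B = 20.0         # isotropic B factor in A^2
SIGMA = 20.0          # constant measurement error on every intensity
S2_SHELLS = [0.01 * i for i in range(1, 11)]  # (sin(theta)/lambda)^2 per shell


def shell_mean(true_mean, cutoff, n, rng):
    """Mean observed intensity in one shell, keeping only I/sigma >= cutoff."""
    kept = []
    for _ in range(n):
        # Acentric Wilson statistics: true intensity is exponential,
        # and the measurement adds Gaussian noise.
        observed = rng.expovariate(1.0 / true_mean) + rng.gauss(0.0, SIGMA)
        if observed / SIGMA >= cutoff:
            kept.append(observed)
    return sum(kept) / len(kept)


def wilson_fit(cutoff, n_per_shell=5000, seed=0):
    """Fit ln<I> = ln(k) - 2*B*s^2 by unweighted least squares; return (k, B)."""
    rng = random.Random(seed)  # same seed gives the same raw data for any cutoff
    ys = []
    for s2 in S2_SHELLS:
        true_mean = TRUE_SCALE * math.exp(-2.0 * TRUE_B * s2)
        ys.append(math.log(shell_mean(true_mean, cutoff, n_per_shell, rng)))
    n = len(S2_SHELLS)
    mean_x = sum(S2_SHELLS) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in S2_SHELLS)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(S2_SHELLS, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0


if __name__ == "__main__":
    for cutoff in (-3.0, 2.0):
        k, b = wilson_fit(cutoff)
        print(f"I/sigma cutoff {cutoff:+.0f}: fitted k = {k:.0f}, B = {b:.1f}")
```

Under these assumptions the -3 cutoff removes almost nothing and the
fit recovers a B close to the value that went in. The +2 cutoff
inflates the mean intensity most in the weak high-resolution shells, so
the fitted B comes out too low. Note that the cutoff here is applied
to simulated single observations, not to merged multiple measurements.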

Joe Krahn

Randy Read wrote:
> Pavel wanted some evidence of whether or not it makes a difference to  
> omit very weak reflections.  Here's one relevant paper.  Hirshfeld and  
> Rabinovich (Acta Cryst. A29: 510-513, 1973) showed in a numerical  
> experiment that, if you omit weak intensities, there is a systematic  
> error in refined scale and ADP parameters.  They used least squares so  
> it's possible that the results would be somewhat different with  
> maximum likelihood targets, but at least here is an objective  
> demonstration that the weak data can have a significant influence.
> 
> Regards,
> 
> Randy
...