[cctbxbb] Question about sigma(I) calculation in merge_equivalents

Phil Evans pre at mrc-lmb.cam.ac.uk
Fri Sep 7 08:53:20 PDT 2012


I think this is a hard question and I don't know the answer. Here are a few of my thoughts.

Phil

-------------- next part --------------
A non-text attachment was scrubbed...
Name: AveragingMultipleMeasurements.pdf
Type: application/pdf
Size: 396640 bytes
Desc: not available
URL: <http://phenix-online.org/pipermail/cctbxbb/attachments/20120907/14c85b83/attachment-0001.pdf>
-------------- next part --------------



On 6 Sep 2012, at 17:20, Keitaro Yamashita wrote:

> Dear Luc,
> 
> Thank you for your explanation! I understand better than before.
> 
> Usually, sigmas given by data processing programs are already
> corrected based on their error model.
> Sigmas are adjusted to match the actual scatter.
> I think using the internal variance amounts to re-correcting the sigmas.
> Is that a valid approach? If the internal variance is bigger, does it
> suggest that the error model is not perfect?
> 
> I calculated external/internal variances using lysozyme (standard
> sample in protein crystallography) data.
> Intensities and sigmas are determined by XDS.
> 
> I attached two plots, where "Imean", "wsigma", "sigma" are the averaged
> intensity, internal sigma, and external sigma, respectively.
> One plot is a histogram of wsigma/sigma by multiplicity. For lower
> multiplicity, we can see extreme discrepancies. (Note that the
> vertical axes are not on the same scale.)
> The other plot is wsigma/sigma vs intensity. Extreme discrepancies can
> be seen at lower intensities.
> I hope it is interesting for you.
> 
> 
>> Crystals uses only the external variance by default, the reasoning being that the internal variance, being based on sample statistics, is almost always too unreliable because groups of equivalent reflections are too small.
> 
> Then, I think it would be nice if we could choose the method in cctbx.
> I mean, an option to choose between "use the bigger one" and "always use
> the external/internal variance" would be nice to have.
> 
> I am looking forward to your comment.
> 
> Best regards,
> Keitaro
> 
> 2012/9/6 Luc Bourhis <luc_j_bourhis at mac.com>:
>> Dear Keitaro,
>> 
>>> But it is still unclear to me why it takes the greatest of the
>>> "internal" variance and "external" variance.
>>> Is it based on some tests using real data? or is it theoretically
>>> superior to using always external variance?
>> 
>> Those are good questions and to be honest I do not know for sure the answer to them. As seems common in applied statistics, the treatment starts with by-the-book methods relying on a well-defined theory, but at the end there is always a completely heuristic twist. I would argue this is particularly true in crystallography. But let me try to give some rationales.
>> 
>> It seems to me that the internal and external variance should not differ too much. Let's consider the two ways this may not be true.
>> 
>> 1. The quoted intensities of a group of equivalent reflections have a small spread, leading to a small internal variance, but the quoted sigmas are comparatively big, resulting in an external variance significantly bigger than the internal one. This is a possible event but an unlikely one: the statistical intuition in that case is to say that the small internal variance is a fluke and to use the external one instead.
>> 
>> 2. An external variance significantly smaller than the internal one should ring an alarm bell. Indeed, small quoted sigmas, and hence a small external variance, strongly suggest that the intensities cannot spread too much from their assumed common true value, whereas the comparatively bigger internal variance blatantly contradicts that. Thus either the intensities or the sigmas have not been correctly determined. Crystallographers seem to err on the side of trusting the data here, i.e. to disregard the sigmas, and therefore to choose the internal variance.
>> 
>>> I would like to know how this method affects further crystallographic process.
>> 
>> I am afraid I do not have experience with your domain, protein crystallography. I know that the small-molecule program Crystals uses only the external variance by default, the reasoning being that the internal variance, being based on sample statistics, is almost always too unreliable because groups of equivalent reflections are too small. Since Crystals is as well accepted as ShelXL for producing publishable structures, this answers your question in at least one corner of crystallography, unfortunately not yours.
>> 
>> I think it could be a simple and interesting exercise to take a representative protein dataset of yours, then to print the redundancy, the internal variance, and the external variance. Actually I would be surprised if such a study has not already been done and published. Perhaps some of the gurus on this forum can shed more light on that subject.
>> 
>> Best wishes,
>> 
>> Luc
>> 
>> _______________________________________________
>> cctbxbb mailing list
>> cctbxbb at phenix-online.org
>> http://phenix-online.org/mailman/listinfo/cctbxbb
> <internal_external_sigmas_vs_I.png><internal_external_sigmas_by_multiplicity.pdf>
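
For readers following the thread, the quantities discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cctbx implementation: it uses the textbook inverse-variance weighted mean, takes the external variance as 1/sum(w) and the internal variance as sum(w*(I - mean)^2) / ((n - 1)*sum(w)) with w = 1/sigma^2, and the function name `merge_group` and its `policy` argument are hypothetical. The exact formulas in cctbx's merge_equivalents may differ in detail.

```python
import math


def merge_group(intensities, sigmas, policy="max"):
    """Merge one group of symmetry-equivalent measurements.

    Hypothetical helper, not part of cctbx. Returns a tuple
    (mean, sigma, sigma_internal, sigma_external), where sigma is the
    value selected by `policy`: "max", "external" or "internal".
    """
    n = len(intensities)
    if n == 0 or n != len(sigmas):
        raise ValueError("need equally many intensities and sigmas (at least one)")

    # Inverse-variance weights from the quoted sigmas.
    weights = [1.0 / (s * s) for s in sigmas]
    sum_w = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / sum_w

    # External variance: propagated from the quoted sigmas only.
    var_ext = 1.0 / sum_w

    # Internal variance: from the observed scatter about the weighted mean.
    # It is undefined for a single measurement, so it is set to zero there.
    if n > 1:
        chi2 = sum(w * (i - mean) ** 2 for w, i in zip(weights, intensities))
        var_int = chi2 / ((n - 1) * sum_w)
    else:
        var_int = 0.0

    if policy == "max":
        var = max(var_int, var_ext)
    elif policy == "external":
        var = var_ext
    elif policy == "internal":
        # Fall back to the external variance when there is no scatter to use.
        var = var_int if n > 1 else var_ext
    else:
        raise ValueError("unknown policy: %r" % (policy,))

    return mean, math.sqrt(var), math.sqrt(var_int), math.sqrt(var_ext)


if __name__ == "__main__":
    # Case 2 of the thread: scatter larger than the quoted sigmas suggest.
    print(merge_group([100.0, 110.0, 90.0], [5.0, 5.0, 5.0]))
    # Case 1 of the thread: scatter much smaller than the quoted sigmas.
    print(merge_group([100.0, 101.0, 99.0], [5.0, 5.0, 5.0]))
```

With `policy="max"` the first example selects the internal sigma and the second the external one, which mirrors the two cases Luc describes. Printing the ratio of the internal to the external sigma per group, together with the multiplicity n, would reproduce the kind of comparison Keitaro plotted.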


