[cctbxbb] use_internal_variance in iotbx.merging_statistics

richard.gildea at diamond.ac.uk
Tue Nov 1 02:21:32 PDT 2016


Dear Keitaro,

iotbx.merging_statistics does have an option to change the use_internal_variance parameter. In xia2 we use use_internal_variance=False, eliminate_sys_absent=False and n_bins=20 by default when calculating merging statistics, which gives results comparable to those calculated by Aimless:

$ iotbx.merging_statistics 
Usage: 
phenix.merging_statistics [data_file] [options...]

Calculate merging statistics for non-unique data, including R-merge, R-meas,
R-pim, and redundancy.  Any format supported by Phenix is allowed, including
MTZ, unmerged Scalepack, or XDS/XSCALE (and possibly others).  Data should
already be on a common scale, but with individual observations unmerged.
  Diederichs K & Karplus PA (1997) Nature Structural Biology 4:269-275
    (with erratum in: Nat Struct Biol 1997 Jul;4(7):592)
  Weiss MS (2001) J Appl Cryst 34:130-135.
  Karplus PA & Diederichs K (2012) Science 336:1030-3.


Full parameters:

  file_name = None
  labels = None
  space_group = None
  unit_cell = None
  symmetry_file = None
  high_resolution = None
  low_resolution = None
  n_bins = 10
  extend_d_max_min = False
  anomalous = False
  sigma_filtering = *auto xds scala scalepack
    .help = "Determines how data are filtered by SigmaI and I/SigmaI. XDS"
            "discards reflections whose intensity after merging is less than"
            "-3*sigma, Scalepack uses the same cutoff before merging, and"
            "SCALA does not do any filtering. Reflections with negative SigmaI"
            "will always be discarded."
  use_internal_variance = True
  eliminate_sys_absent = True
  debug = False
  loggraph = False
  estimate_cutoffs = False
  job_title = None
    .help = "Job title in PHENIX GUI, not used on command line"


Below is my email to Pavel and Billy from when we discussed this issue a while back:

The difference between use_internal_variance=True/False is explained in Luc's document here:

libtbx.pdflatex $(libtbx.find_in_repositories cctbx/miller)/equivalent_reflection_merging.tex

Essentially, use_internal_variance=False uses only the unmerged sigmas to compute the merged sigmas, whereas use_internal_variance=True also takes into account the spread of the unmerged intensities. More precisely, use_internal_variance=True uses the larger of the variance estimated from the spread of the intensities and the variance computed from the unmerged sigmas. As a result, use_internal_variance=True can never give a higher I/sigI than use_internal_variance=False, only an equal or lower one. The relevant code in the cctbx is here:

https://sourceforge.net/p/cctbx/code/HEAD/tree/trunk/cctbx/miller/merge_equivalents.h#l379
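
As a rough illustration of the distinction, here is a minimal Python sketch. It is not the cctbx implementation: the function name merged_sigma is hypothetical, and the inverse-variance weighting and the particular formula for the "internal" variance of the weighted mean are assumptions chosen for illustration. The header linked above and Luc's document remain the reference for what cctbx actually does.

```python
import math


def merged_sigma(intensities, sigmas, use_internal_variance):
    """Merge equivalent observations (illustrative sketch only).

    Assumes inverse-variance weights w = 1/sigma^2. The 'external'
    variance of the weighted mean is 1/sum(w). The 'internal' variance
    is taken here as the weighted sample variance of the observations
    divided by (n - 1), which is one common convention and may differ
    in detail from what cctbx does.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    sum_w = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / sum_w

    # Variance propagated from the unmerged sigmas alone.
    external_var = 1.0 / sum_w

    n = len(intensities)
    if not use_internal_variance or n < 2:
        return mean, math.sqrt(external_var)

    # Variance estimated from the spread of the unmerged intensities.
    spread = sum(w * (i - mean) ** 2 for w, i in zip(weights, intensities))
    internal_var = spread / ((n - 1) * sum_w)

    # Use the larger of the two, so the merged sigma can only grow.
    return mean, math.sqrt(max(internal_var, external_var))


if __name__ == "__main__":
    sigmas = [5.0, 5.0, 5.0]
    for obs in ([100.0, 110.0, 90.0], [100.0, 101.0, 99.0]):
        for flag in (False, True):
            i_mean, sig = merged_sigma(obs, sigmas, flag)
            print(obs, "use_internal_variance=%s" % flag,
                  "I/sigI = %.2f" % (i_mean / sig))
```

With widely scattered observations the internal estimate dominates and I/sigI drops; when the observations agree to well within their sigmas, both settings give the same merged sigma.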

Aimless has a similar option for the SDCORRECTION keyword, if you set the option SAMPLESD, which I think is equivalent to use_internal_variance=True. The default behaviour of Aimless is equivalent to use_internal_variance=False:

http://www.mrc-lmb.cam.ac.uk/harry/pre/aimless.html#sdcorrection

"SAMPLESD is intended for very high multiplicity data such as XFEL serial data. The final SDs are estimated from the weighted population variance, assuming that the input sigma(I)^2 values are proportional to the true errors. This probably gives a more realistic estimate of the error in <I>. In this case refinement of the corrections is switched off unless explicitly requested."

I think that the "external" variance is probably better if the sigmas from the scaling program are reliable, or for low multiplicity data. For high multiplicity data or if the sigmas from the scaling program are not reliable, then "internal" variance is probably better.

Cheers,

Richard

Dr Richard Gildea
Data Analysis Scientist
Tel: +441235 77 8078

Diamond Light Source Ltd.
Diamond House
Harwell Science & Innovation Campus
Didcot
Oxfordshire
OX11 0DE

________________________________________
From: cctbxbb-bounces at phenix-online.org [cctbxbb-bounces at phenix-online.org] on behalf of Keitaro Yamashita [k.yamashita at spring8.or.jp]
Sent: 01 November 2016 07:23
To: cctbx mailing list
Subject: [cctbxbb] use_internal_variance in iotbx.merging_statistics

Dear Phenix/CCTBX developers,

iotbx/merging_statistics.py is used by phenix.merging_statistics,
phenix.table_one, and so on. Between Phenix 1.10.1 and 1.11, the
merging-statistics code changed significantly.

Previously, miller.array.merge_equivalents() was always called with
argument use_internal_variance=False, which is consistent with XDS,
Aimless and so on. Currently, use_internal_variance=True is the default,
and cannot be changed by users (see below).

These changes were made by @afonine and @rjgildea in rev. 22973 (Sep
26, 2015) and 23961 (Mar 8, 2016). Could anyone explain why these
changes were introduced?

https://sourceforge.net/p/cctbx/code/22973
https://sourceforge.net/p/cctbx/code/23961


My points are:

- We cannot actually control the use_internal_variance= parameter
because it is not passed to merge_equivalents() in class
filter_intensities_by_sigma.

- In previous versions, if I gave XDS output to
phenix.merging_statistics, the <I/sigma> values shown were calculated
in the same way as XDS does; this is no longer the case in the current
version.

- For users of, for example, phenix.table_one who expect the previous
behavior, this can cause inconsistency: the statistics would not be
consistent with the data used in refinement.


cf. the related discussion in cctbxbb:
http://phenix-online.org/pipermail/cctbxbb/2012-October/000611.html


Best regards,
Keitaro