[phenixbb] Rfree and a low resolution data set

Wed Apr 4 05:44:05 PDT 2018

Thank you, Tim !

An excellent remark !

Sacha

-----Original Message-----
From: phenixbb-bounces at phenix-online.org [mailto:phenixbb-bounces at phenix-online.org] On behalf of Tim Gruene
Sent: Wednesday, 4 April 2018 14:29
To: phenixbb at phenix-online.org
Subject: Re: [phenixbb] Rfree and a low resolution data set

Dear Toon, dear Sacha,

When you have very few reflections, too few for Rfree, you should use Rcomplete instead. Rcomplete is the proper way to validate, and Rfree was only introduced because Rcomplete takes longer to compute. In the 1990s this was an issue, but nowadays Rcomplete can be calculated within a few minutes on a decent computer in cases where you have fewer than about 20,000 unique reflections.

Rcomplete can be computed with a single reflection set aside during each refinement run.
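
To make the bookkeeping concrete, here is a minimal Python sketch of the cross-validation loop, not the implementation of any particular program. It assumes Rcomplete is evaluated as sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections, where each |Fcalc| comes from a refinement run that did not use that reflection. The function refine_and_predict is a hypothetical placeholder for your own wrapper around SHELXL, Refmac or phenix.refine; the names split_into_folds and r_complete are likewise made up for this illustration, and scaling between Fobs and Fcalc is left out.

```python
import random
from typing import Callable, Dict, List, Tuple

HKL = Tuple[int, int, int]


def split_into_folds(hkls: List[HKL], n_folds: int, seed: int = 0) -> List[List[HKL]]:
    """Shuffle the reflections and deal them into n_folds disjoint free sets."""
    if not 2 <= n_folds <= len(hkls):
        raise ValueError("n_folds must be between 2 and the number of reflections")
    shuffled = list(hkls)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def r_complete(
    f_obs: Dict[HKL, float],
    refine_and_predict: Callable[[List[HKL], List[HKL]], Dict[HKL, float]],
    n_folds: int,
) -> float:
    """Collect |Fcalc| for every reflection from a run that left it out.

    refine_and_predict(work, free) is a hypothetical user-supplied callback:
    it should refine against the work reflections only and return |Fcalc|
    for the free reflections.
    """
    hkls = list(f_obs)
    f_calc_free: Dict[HKL, float] = {}
    for free in split_into_folds(hkls, n_folds):
        free_set = set(free)
        work = [h for h in hkls if h not in free_set]
        f_calc_free.update(refine_and_predict(work, free))
    # Every reflection has been "free" exactly once, so the sums run over all data.
    numerator = sum(abs(f_obs[h] - f_calc_free[h]) for h in hkls)
    denominator = sum(f_obs[h] for h in hkls)
    return numerator / denominator
```

Setting n_folds equal to the number of reflections gives the case with a single reflection set aside per run; a smaller n_folds trades some rigour for fewer refinement runs.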

Rcomplete could also be used for ML estimates instead of Rfree, but as far as I know this has not been implemented yet.

The concept of Rcomplete was introduced by Axel Brunger. The good thing about Rcomplete (and actually also Rfree) is that the need to set aside the free set from the very beginning is an urban myth: at any stage during refinement, the R-value from reflections not used during refinement is going to converge towards Rfree=Rcomplete, whether or not these reflections were used previously in refinement. I called this "Tickle's conjecture", because I understood the idea from Ian Tickle's discussions on ccp4bb.

The concept of Rcomplete was introduced in A. Bruenger, "Free R value: cross-validation in crystallography", Meth. Enzymol. (1997), 277, 366-396. Its name was, to the best of my knowledge, introduced with Jiang & Bruenger, "Protein Hydration Observed by X-ray Diffraction", J. Mol. Biol. (1994), 243, 100-115.

A set of experiments that demonstrate the properties of Rcomplete, like Rcomplete=Rfree and the removal of bias simply by refinement, is given in Luebben & Gruene, "New method to compute Rcomplete enables maximum likelihood refinement for small datasets", PNAS (2015), 112, 8999-9003.

When using SHELXL, there is a convenient GUI to calculate Rcomplete, written by Jens Luebben, available at https://github.com/JLuebben/R_complete

For Refmac and Phenix.refine, it requires a little bit of scripting. PDB_REDO also calculates Rcomplete.

The concept of Rcomplete is applicable to all sorts of refinement runs. Its usefulness was recently demonstrated for charge density studies, Krause et al., "Validation of experimental charge-density refinement strategies: when do we overfit?", IUCrJ (2017), 4, 420-430 (and therein renamed as Rcross).

Best,
Tim