Re: [phenixbb] Rfree and a low resolution data set

4 Apr 2018

      Dear Toon, dear Sacha,

When you have very few reflections, too few for Rfree, you should use Rcomplete
instead. Rcomplete is the proper way to validate, and Rfree was only introduced
because Rcomplete takes longer to compute. In the 1990s this was an issue, but
nowadays Rcomplete can be calculated within a few minutes on a decent computer
in cases where you have fewer than about 20,000 unique reflections.

Rcomplete can be computed with a single reflection set aside during each
refinement run.
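
A minimal sketch of the pooling, for illustration only: each reflection
contributes the |Fcalc| from the run in which it was left out, and one R-value
is computed over all of them. The function name r_complete is invented here,
and the sketch assumes that the calculated amplitudes are already on the scale
of the observed ones; it is not taken from any refinement program.

```python
def r_complete(f_obs, f_calc_left_out):
    """Pooled R-value over all reflections.

    f_obs[i]:           observed amplitude of reflection i
    f_calc_left_out[i]: calculated amplitude of reflection i, taken from
                        the refinement run that did NOT use reflection i
    Assumption: f_calc_left_out is already scaled to f_obs.
    """
    if len(f_obs) != len(f_calc_left_out):
        raise ValueError("need one left-out Fcalc per observed reflection")
    numerator = sum(abs(abs(fo) - abs(fc))
                    for fo, fc in zip(f_obs, f_calc_left_out))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy numbers, invented purely to exercise the function.
example = r_complete([10.0, 20.0, 30.0], [9.0, 22.0, 27.0])
print(example)
```

Because every reflection appears exactly once in the sum, the statistic is
based on the full data set rather than on a small test set.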

Rcomplete could also be used for ML estimates instead of Rfree, but as far as I
know this has not been implemented yet.

The concept of Rcomplete was introduced by Axel Bruenger. The good thing about
Rcomplete (and actually also Rfree) is that the need to set aside the free set
from the very beginning is an urban myth: at any stage during refinement, the
R-value from reflections not used during refinement is going to converge
towards Rfree=Rcomplete, whether or not these reflections were used previously
in refinement. I called this "Tickle's conjecture", because I understood the
idea from Ian Tickle's discussions on ccp4bb.

The concept of Rcomplete was introduced in A. Bruenger, "Free R value:
cross-validation in crystallography", Meth. Enzymol. (1997), 277, 366-396. Its
name was, to the best of my knowledge, introduced with Jiang & Bruenger,
"Protein Hydration Observed by X-ray Diffraction", J. Mol. Biol. (1994), 243,
100-115.

A set of experiments that demonstrate the properties of Rcomplete, like
Rcomplete=Rfree and the removal of bias simply by refinement, are given in
Luebben & Gruene, "New method to compute Rcomplete enables maximum likelihood
refinement for small datasets", PNAS (2015), 112, 8999-9003.

When using SHELXL, there is a comfortable GUI to calculate Rcomplete written by
Jens Luebben, available at https://github.com/JLuebben/R_complete

For Refmac and Phenix.refine, it requires a little bit of scripting. PDB_REDO
also calculates Rcomplete.
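
The scripting could look roughly like the following sketch. It is not tied to
Refmac or phenix.refine and uses none of their options: refine_without is a
hypothetical callback that you would have to write yourself, so that it runs
your refinement program with the given reflections excluded and returns the
calculated amplitudes for exactly those reflections. The names
split_into_test_sets and collect_left_out_fcalc are likewise invented for
illustration.

```python
import random


def split_into_test_sets(n_reflections, set_size=1, seed=0):
    """Partition reflection indices 0..n-1 into disjoint test sets.

    set_size=1 sets aside a single reflection per refinement run.
    """
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + set_size]
            for i in range(0, n_reflections, set_size)]


def collect_left_out_fcalc(n_reflections, refine_without, set_size=1, seed=0):
    """Run one refinement per test set and pool the free |Fcalc| values.

    refine_without(test_set) is a HYPOTHETICAL user-supplied function: it
    must refine against all reflections except those in test_set and
    return a dict {reflection index: |Fcalc|} for the excluded ones.
    """
    left_out = [None] * n_reflections
    for test_set in split_into_test_sets(n_reflections, set_size, seed):
        fcalc_free = refine_without(test_set)
        for idx in test_set:
            left_out[idx] = fcalc_free[idx]
    return left_out


# Stand-in for a real refinement run, only to show the calling convention.
def fake_refine(test_set):
    return {idx: float(idx) for idx in test_set}


pooled = collect_left_out_fcalc(6, fake_refine, set_size=2)
```

Every reflection ends up with exactly one entry, taken from the run in which it
was not used for refinement; the pooled list is what enters the Rcomplete sum.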

The concept of Rcomplete is applicable to all sorts of refinement runs. Its
usefulness was recently demonstrated for charge density studies, Krause et al.,
"Validation of experimental charge-density refinement strategies: when do we
overfit?", IUCrJ (2017), 4, 420-430 (and therein renamed as Rcross).

Best,
Tim

On Wed, 2018-04-04 at 10:19 +0000, Alexandre OURJOUMTSEV wrote:
> Dear Toon,
> 
> I think some of your questions are addressed by the work
> 
> Praznikar, J. & Turk, D. (2014) Free kick instead of cross-validation in
> maximum-likelihood refinement of macromolecular crystal structures. Acta
> Cryst. D70, 3124-3134
> 
> (that does not say how to validate the result without the R-free but shows how
> to get this result).
> Please have a look at it.
> 
> Note that the authors talk about ML and not about LS refinement; I wonder why
> you need to use LS.
> 
> Best regards,
> 
> Sacha Urzhumtsev
> 
> From: [email protected] [mailto:phenixbb-bounces@phenix-online.org]
> On behalf of Toon Van Thillo
> Sent: Wednesday, 4 April 2018 11:48
> To: [email protected]
> Subject: [phenixbb] Rfree and a low resolution data set
> 
> 
> Hi all,
> 
> 
> 
> Currently I am refining a data set which showed anisotropic diffraction.
> Aimless suggested cutoffs at 2.3, 2.6 and 3.6 angstrom for the h, k and l axes.
> 
> I chose a general 3.6 cutoff to obtain satisfactory statistics for Rmeas,
> I/sd(I) and CC1/2. At this resolution the data set consists of approximately
> 2800 reflections.
> 
> 
> 
> Generally 5% of the set is set aside as the Rfree test set and I found that a
> minimum of 500 reflections in total is used to produce a reliable Rfree.
> However, 5% only amounts to 140 reflections in this case. I am hesitant to
> include more reflections as I would have to go up to 20% of the reflections to
> obtain more than 500 reflections for the test set. In a discussion on the CCP4
> message boards some time ago it was suggested to do multiple refinements with
> different test sets:
> 
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1411&L=ccp4bb&F=&S=&P=125570
> 
> 
> 
> In the thread it was also discussed that a least squares approach is preferred
> when using a small test set. However, when using a LS target, the resulting
> Rfree is very high (10% higher than when using the automatic option) and
> phenix.refine produces awful geometry (24% Ramachandran outliers, 105
> clashscore...). It seems that the refinement is performed without restraints?
> Optimize X-ray/stereochemistry weight does not result in improved
> stereochemistry. My question is if the LS approach is still relevant and if
> so, is there an explanation (and solution) for the bad statistics?
> 
> 
> 
> Kind regards,
> 
> 
> 
> Toon Van Thillo
> 
> 
> 
> 
> _______________________________________________
> phenixbb mailing list
> [email protected]
> http://phenix-online.org/mailman/listinfo/phenixbb
> Unsubscribe: [email protected]
-- 
--
Paul Scherrer Institut
Tim Gruene
- persoenlich -
OSUA/204
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297

GPG Key ID = A46BEE1A