[phenixbb] Adequate size for Free R test set?
pafonine at lbl.gov
Tue Aug 3 10:38:57 PDT 2010
I think almost everyone has his/her own opinion on this... Here is what I think:
1) The test set should be such that each "relatively thin resolution
shell" receives at least 50 reflections, and we empirically found that
150 is "good enough" within the phenix.refine framework.
For "relatively thin resolution shell" definition see:
Lunin & Skovoroda. Acta Cryst. (1995). A51, 880-887. "R-free
likelihood-based estimates of errors for phases calculated from atomic
This basically defines how many test reflections you need.
2) It is customary to set aside either 5 or 10% of the reflections for the
test set, with a total maximum of 2000. These are all "magic numbers" that I
presume more or less satisfy "1)", so they became widely used.
3) The presence of high-order NCS and selecting free-flags using the "thin
shells" algorithm is a different story (Acta Cryst. (2006). D62,
227--238). It is good to do that because it removes the cross-talk
between test and work reflections due to NCS, but at the same time it
invalidates requirement "1)". So, this is a gray area (for me at least).
4) Some people believe that the final refinement run should be done
using all reflections, arguing that taking away 5-10% of test
reflections worsens the maps. There is some truth in this, yes, removing
the data worsens the maps, but:
a) it is noticeable (in the sense that it can reduce the interpretability
of some parts of the map) only in extreme cases of somewhat low
resolution or low-completeness data, b) in most other cases it is
simply negligible, c) removing reflections randomly has a much smaller
effect than removing them systematically (see page #40 here:
some relevant references in 2010 PHENIX paper in Acta D). However, if
you do that "final run", you will invalidate the final refinement
statistics, Rfree and Rwork, and the final structure thus obtained can
no longer have an Rfree associated with it.
On 8/3/10 10:04 AM, Joseph Noel wrote:
> Hi Folks,
> It's been a while since I personally refined many structures. In the
> past, I used as a default, 5% of my unique reflections for the Free R
> test set. I have a high resolution structure with 150,000 unique
> reflections and noticed that Phenix defaults are 5% or 2000
> reflections, whichever is smaller. What is the current consensus on an
> adequate number of unique reflections to use for cross-validation?
> P.S. I really, really love Phenix.
> Joseph P. Noel, Ph.D.
> Investigator, Howard Hughes Medical Institute
> Professor, The Jack H. Skirball Center for Chemical Biology and Proteomics
> The Salk Institute for Biological Studies
> 10010 North Torrey Pines Road
> La Jolla, CA 92037 USA
> Phone: (858) 453-4100 extension 1442
> Cell: (858) 349-4700
> Fax: (858) 597-0855
> E-mail: noel at salk.edu <mailto:noel at salk.edu>
> Web Site (Salk): http://www.salk.edu/faculty/faculty_details.php?id=37
> Web Site (HHMI): http://hhmi.org/research/investigators/noel.html