<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi Joe,<br>
<br>
I think almost everyone has their own opinion on this... Here is
what I think:<br>
<br>
1) The test set should be such that each "relatively thin resolution
shell" receives at least 50 reflections, and we found empirically
that 150 is "good enough" within the phenix.refine framework. <br>
For the definition of a "relatively thin resolution shell" see:<br>
Lunin &amp; Skovoroda. Acta Cryst. (1995). A51, 880-887.
"R-free likelihood-based estimates of errors for phases calculated
from atomic models".<br>
<br>
This basically defines how many test reflections you need.<br>
<br>
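As a rough way to check requirement "1)" on an existing set of free
flags, here is a minimal sketch. It assumes shells are built by sorting
reflections by d-spacing and splitting them into groups of nearly equal
size; that binning, the function names and the inputs are illustrative
assumptions, not the definition from the paper and not phenix.refine
code.<br>

```python
def count_free_per_shell(d_spacings, free_flags, n_shells):
    """Count test reflections in each of n_shells resolution shells.

    Assumption of this sketch: reflections are sorted from low to high
    resolution (large to small d) and split into shells of nearly
    equal size.
    """
    n = len(d_spacings)
    order = sorted(range(n), key=lambda i: d_spacings[i], reverse=True)
    counts = []
    for s in range(n_shells):
        start = s * n // n_shells
        stop = (s + 1) * n // n_shells
        counts.append(sum(1 for i in order[start:stop] if free_flags[i]))
    return counts


def shells_below_minimum(counts, minimum=50):
    """Return indices of shells with fewer than `minimum` test reflections."""
    return [s for s, c in enumerate(counts) if minimum > c]


# Toy data: 1000 reflections, every 10th one flagged as test.
d = [10.0 - 0.008 * i for i in range(1000)]
flags = [i % 10 == 0 for i in range(1000)]
counts = count_free_per_shell(d, flags, 4)
print(counts, shells_below_minimum(counts))
```

Any shell index returned by shells_below_minimum points at a resolution
range where the test set is thinner than the 50-reflection guideline.<br>
<br>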
2) It is customary to set aside either 5% or 10% of the reflections
for the test set, with a total maximum of 2000. These are all "magic
numbers" that, I presume, more or less satisfy "1)", which is why they
became widely used.<br>
<br>
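The arithmetic behind these numbers can be sketched as follows. The
function names are hypothetical; the 5% fraction, the 2000 cap and the
50-per-shell minimum are the values quoted in this thread, applied here
to the 150,000-reflection data set from the question.<br>

```python
def free_set_size(n_reflections, fraction=0.05, max_size=2000):
    """Test-set size: a fraction of the data, capped at max_size."""
    return min(int(round(fraction * n_reflections)), max_size)


def max_shells(n_test, min_per_shell=50):
    """Largest number of shells that can each hold min_per_shell test
    reflections, assuming the test set is spread evenly over shells."""
    return n_test // min_per_shell


n_test = free_set_size(150000)   # 5% would be 7500, so the cap applies
print(n_test, max_shells(n_test))
```

For 150,000 reflections the cap gives 2000 test reflections (about 1.3%
of the data), which is enough for up to 40 shells at 50 reflections
each, assuming an even spread.<br>
<br>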
3) The presence of high-order NCS, and selecting free-flags using the
"thin shells" algorithm, is a different story (Acta Cryst. (2006). D62,
227–238). It is good to do that because it removes the cross-talk
between test and work reflections due to NCS, but at the same time
it invalidates requirement "1)". So this is a gray area (for me
at least).<br>
<br>
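To make the idea of shell-wise selection concrete, here is a toy
sketch. It is not the algorithm from the cited paper: the equal-size
binning, the "every k-th shell" rule and the function name are
assumptions made only for illustration.<br>

```python
def thin_shell_free_flags(d_spacings, n_thin_shells, every):
    """Flag whole thin resolution shells as test reflections.

    Reflections are sorted from low to high resolution, split into
    n_thin_shells groups of nearly equal size, and every `every`-th
    group is assigned to the test set in full.
    """
    n = len(d_spacings)
    order = sorted(range(n), key=lambda i: d_spacings[i], reverse=True)
    flags = [False] * n
    for s in range(0, n_thin_shells, every):
        start = s * n // n_thin_shells
        stop = (s + 1) * n // n_thin_shells
        for i in order[start:stop]:
            flags[i] = True
    return flags


# Toy data: 1000 reflections, 100 thin shells, every 20th shell is test.
d = [10.0 - 0.008 * i for i in range(1000)]
flags = thin_shell_free_flags(d, 100, 20)
print(sum(flags))
```

In this sketch the test reflections sit in a few narrow resolution
ranges instead of being spread across all of them, which is where the
tension with requirement "1)" comes from.<br>
<br>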
4) Some people believe that the final refinement run should be done
using all reflections, arguing that taking away 5-10% of the data as
test reflections worsens the maps. There is some truth in this: yes,
removing data worsens the maps, but:<br>
a) it is noticeable (in the sense that it can reduce the
interpretability of some parts of the map) only in extreme cases of
fairly low resolution or low completeness data, b) in most
other cases it is simply negligible, c) removing reflections
randomly has a much smaller effect than removing them systematically
(see page #40 here:
<a class="moz-txt-link-freetext" href="http://www.phenix-online.org/presentations/latest/pavel_maps.pdf">http://www.phenix-online.org/presentations/latest/pavel_maps.pdf</a> and
some relevant references in the 2010 PHENIX paper in Acta D). However,
if you do that "final run", you invalidate the final refinement
statistics, Rfree and Rwork, and the resulting final structure
can no longer have an Rfree associated with it.<br>
<br>
Pavel.<br>
<br>
<br>
On 8/3/10 10:04 AM, Joseph Noel wrote:
<blockquote cite="mid:257DBAE2-D059-43E4-96F3-B7768711839E@salk.edu"
type="cite">Hi Folks,
<div><br>
</div>
<div>It's been a while since I personally refined many structures.
In the past, I used, as a default, 5% of my unique reflections
for the Free R test set. I have a high resolution structure with
150,000 unique reflections and noticed that the Phenix defaults are
5% or 2000 reflections, whichever is smaller. What is the
current consensus on an adequate number of unique reflections to
use for cross-validation?</div>
<div><br>
</div>
<div>Thanks!</div>
<div>Joe</div>
<div><br>
</div>
<div>P.S. I really, really love Phenix. <br>
<div>
<span class="Apple-style-span" style="border-collapse:
separate; color: rgb(0, 0, 0); font-family: 'Lucida Grande';
font-size: medium; font-style: normal; font-variant: normal;
font-weight: normal; letter-spacing: normal; line-height:
normal; orphans: 2; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;"><span
class="Apple-style-span" style="border-collapse: separate;
color: rgb(0, 0, 0); font-family: 'Lucida Grande';
font-size: medium; font-style: normal; font-variant:
normal; font-weight: normal; letter-spacing: normal;
line-height: normal; orphans: 2; text-indent: 0px;
text-transform: none; white-space: normal; widows: 2;
word-spacing: 0px;">
<div style="word-wrap: break-word;">___________________________________________________________</div>
<div style="word-wrap: break-word;">Joseph P. Noel, Ph.D.<br>
Investigator, Howard Hughes Medical Institute<br>
Professor, The Jack H. Skirball Center for Chemical
Biology and Proteomics<br>
The Salk Institute for Biological Studies<br>
10010 North Torrey Pines Road<br>
La Jolla, CA 92037 USA<br>
<br>
Phone: (858) 453-4100 extension 1442<br>
Cell: (858) 349-4700<br>
Fax: (858) 597-0855<br>
E-mail: <a moz-do-not-send="true"
href="mailto:noel@salk.edu">noel@salk.edu</a><br>
<br>
Web Site (Salk): <a moz-do-not-send="true"
href="http://www.salk.edu/faculty/faculty_details.php?id=37">http://www.salk.edu/faculty/faculty_details.php?id=37</a><br>
Web Site (HHMI): <a moz-do-not-send="true"
href="http://hhmi.org/research/investigators/noel.html">http://hhmi.org/research/investigators/noel.html</a><br>
___________________________________________________________<br>
</div>
</span></span>
</div>
<br>
</div>
<pre wrap="">
_______________________________________________
phenixbb mailing list
<a class="moz-txt-link-abbreviated" href="mailto:phenixbb@phenix-online.org">phenixbb@phenix-online.org</a>
<a class="moz-txt-link-freetext" href="http://phenix-online.org/mailman/listinfo/phenixbb">http://phenix-online.org/mailman/listinfo/phenixbb</a>
</pre>
</blockquote>
<br>
</body>
</html>