Hi Joe,
Normally, 5% for R-free is sufficient.
Has anyone studied this and come to this conclusion (is there a
publication)? I'm not aware of one.
In fact, it is the absolute number that matters. The number of test
reflections per relatively thin resolution shell should be at least
50. This ensures that the determination of maximum-likelihood
target parameters (alpha/beta, or sigmaa) is well defined. More is
better, but only up to a point, since excluding too many reflections
from refinement is harmful too. If interpolation is used, then fewer
than 50 can be used (I guess this is what CNS does), but I have
reasons not to like it.
When phenix.refine creates test reflections, by default it selects 10%
of the reflections, but no more than 2000.
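
To make these numbers concrete, here is a minimal Python sketch of the
arithmetic. It is only an illustration: the function names are
hypothetical and not part of phenix.refine, and it assumes the
resolution shells are equally populated, which real binning only
approximates.

```python
# Hypothetical helpers for illustration only; not phenix.refine code.

def free_set_size(n_reflections, percent=10, max_free=2000):
    """Test-set size: a percentage of all reflections, capped at max_free."""
    return min(n_reflections * percent // 100, max_free)


def max_shells(n_free, min_per_shell=50):
    """Largest number of equally populated resolution shells that still
    leaves at least min_per_shell test reflections in every shell."""
    return n_free // min_per_shell


if __name__ == "__main__":
    for n in (5000, 20000, 100000):
        n_free = free_set_size(n)
        print(n, n_free, max_shells(n_free))
```

For a data set of 5000 reflections this gives 500 test reflections,
enough for 10 shells of 50. At 100000 reflections the cap applies, so
the test set stays at 2000 reflections, or 40 shells of 50.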
Even though you may not do
real-space refinement with free reflections, external tools can do that
with the maps written out.
It is most efficient to combine local and global real-space
refinement with reciprocal-space refinement. I call it dual-space
refinement. This is why it is tightly integrated into phenix.refine:
http://cci.lbl.gov/~afonine/fix_rotamers/fit_rotamers.pdf
I am sure that many people will use it when
they find that real-space fitting and refinement tools lower R-free,
unaware that they are cheating.
Sure. This is why reflections flagged as free-R are not used in the map
calculation for real-space refinement (see my previous email).
With 10% test reflections, I suspect that difference maps used to find
waters can easily find a few noise peaks with significant R-free
contributions.
- I'm not aware of any systematic study on this matter, although I can
believe it in theory;
- phenix.refine uses very sophisticated filtering tools;
- I guess at some point I will switch to using Average Kick Maps for
water picking. This should remove the noise peaks, and so eliminate the
problem (I need to test all of this, though).
IMHO, using test reflections for anything but computing R-free should
always be avoided unless you are unable to proceed using only the
non-test. Using test reflections is always cheating to some extent,
although trivial amounts of bias are probably removed during refinement.
I just think it is better to be very strict about test reflections and
avoid the possibility of bias.
Test reflections are used for calculation of m and D in 2mFo-DFc and
mFo-DFc maps, as well as in alpha/beta parameters of ML target. This is
inevitable.
Exclusion of test reflections ought to be an option. Ideally, deposited
PDB files should report whether maps used for model building included
test reflections.
Ideally yes, but I can name a hundred other similarly important
parameters to report. At the very least, the full set of parameters
must be reported so that the published R-factors are 100% reproducible.
I'm sure this is an easily achievable goal to start with.
Pavel.