R/R free reported values
Hi Phenix and CCP4 community (sorry for the cross-posting),

I was looking at a PDB file. The http://www.rcsb.org/ website page gives the following values:

R-Value: 0.103 (work)
R-Free: 0.134

The actual PDB file gives the following:

R: 0.110
FREE R VALUE: 0.170

I was wondering why they differ. The structure is 2CWS, at 1.0 Å resolution.

Thank you,
On Mon, Oct 27, 2014 at 1:00 PM, Cedric wrote:
> I was wondering why the difference. The structure is 1.0A resolution 2CWS.
I don't have an exact explanation, but there is some inconsistency in the way data are stored internally by the PDB. The mmCIF file is the most complete representation:

    _refine.ls_R_factor_R_work                0.103
    _refine.ls_R_factor_R_free                0.134
    ...
    _pdbx_refine.R_factor_all_no_cutoff       0.1101
    _pdbx_refine.R_factor_obs_no_cutoff       ?
    _pdbx_refine.free_R_factor_no_cutoff      0.1704

So, the first set is what gets displayed on the web page, and the second set ends up in the PDB header. I suspect something went awry in deposition, but only the PDB and the depositors can answer that question.

I wouldn't take the advertised statistics at face value anyway; I prefer to rely on recalculated values (and if these are significantly different, I view the structure with suspicion). In this case, phenix.model_vs_data says:

    r_work(re-computed) : 0.1081
    r_free(re-computed) : 0.1377

which, accounting for the precision loss in PDB format and the differences between SHELXL and Phenix, are reasonable enough, and suggest that the values on the web page are probably accurate.

-Nat
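For readers who want to repeat this comparison, here is a minimal Python sketch. It is only an illustration: it parses the single-line mmCIF items quoted above with plain string splitting (not a real mmCIF parser), and the helper names `parse_simple_items` and `closest_pair` are hypothetical. The "sum of absolute differences" score is an assumption made here for illustration, as is treating the all-data R-factor as comparable to R-work. The numbers are the ones quoted in this message.

```python
# Sketch: compare the two deposited R-factor pairs with the recomputed values.
# Plain-text parsing only; loop_ blocks and quoted strings are NOT handled.

MMCIF_SNIPPET = """\
_refine.ls_R_factor_R_work                0.103
_refine.ls_R_factor_R_free                0.134
_pdbx_refine.R_factor_all_no_cutoff       0.1101
_pdbx_refine.R_factor_obs_no_cutoff       ?
_pdbx_refine.free_R_factor_no_cutoff      0.1704
"""


def parse_simple_items(text):
    """Return {item_name: float or None} for single-line `_cat.item value` pairs.

    In mmCIF, '?' means unknown and '.' means not applicable; both map to None.
    """
    items = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_"):
            name, value = parts
            items[name] = None if value in ("?", ".") else float(value)
    return items


def closest_pair(items, r_work_recomputed, r_free_recomputed):
    """Return the category whose (R, R-free) pair is nearest the recomputed pair.

    Assumption: nearness is the sum of absolute differences, and the all-data
    R-factor in _pdbx_refine is treated as a stand-in for R-work.
    """
    pairs = {
        "_refine": (
            items["_refine.ls_R_factor_R_work"],
            items["_refine.ls_R_factor_R_free"],
        ),
        "_pdbx_refine": (
            items["_pdbx_refine.R_factor_all_no_cutoff"],
            items["_pdbx_refine.free_R_factor_no_cutoff"],
        ),
    }
    scores = {
        category: abs(r - r_work_recomputed) + abs(r_free - r_free_recomputed)
        for category, (r, r_free) in pairs.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    deposited = parse_simple_items(MMCIF_SNIPPET)
    # Recomputed values as reported by phenix.model_vs_data in this message.
    print(closest_pair(deposited, 0.1081, 0.1377))
```

With these numbers the R-free term dominates: the `_pdbx_refine` R-free is about 0.033 away from the recomputed value, while the `_refine` R-free is within about 0.004.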
Hi Nat,
Would you define "significant" for me (as you see it of course)?
I am NOT an X-ray crystallographer but I was intrigued by this post. I have
RE-refined structures and have seen R/R (Free) go up by, say, 0.04,
say from 0.110 to 0.150.
Would you say that is significant for high resolution structures?
Thank you,
George
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
On Mon, Oct 27, 2014 at 1:19 PM, George Devaniranjan wrote:
> Would you define "significant" for me (as you see it of course)?

Pavel's definition:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2906258/

"...the difference between R factors computed using the different methods is typically less than 0.01%."

I think this is probably a typo and it is supposed to mean "1%" or "0.01", which would have been my estimate. Certainly differences below 0.005 are hardly worth noticing, and below 0.001 is statistical noise. Differences above 0.01 are more worrisome (although not entirely unheard of).

> I am NOT a X-ray crystallographer but I was intrigued by this post. I have RE-refined structures and have seen R/R (Free) go up say by 0.04
> Say from 0.110 to 0.150
> Would you say that is significant for high resolution structures ?

Yes. It may reflect a sub-optimal refinement strategy, although I have seen this happen occasionally when re-refining (in Phenix) an ultra-high-resolution structure previously refined with SHELX.

-Nat
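The rough thresholds in this reply can be written down as a small Python helper. This is only a sketch of one person's rule of thumb, not an established standard: the function name `describe_r_difference` is hypothetical, and the label for the 0.005 to 0.01 range is an assumption, since the message does not characterize that range.

```python
def describe_r_difference(r_reported, r_recomputed):
    """Label the gap between a reported and a recomputed R-factor.

    Both inputs are fractions (e.g. 0.134), not percentages (13.4).
    Thresholds follow the rule of thumb given in the thread.
    """
    delta = abs(r_reported - r_recomputed)
    if delta < 0.001:
        return "statistical noise"
    if delta < 0.005:
        return "hardly worth noticing"
    if delta <= 0.01:
        # Assumption: the thread does not describe this range explicitly.
        return "grey zone"
    return "worrisome"


if __name__ == "__main__":
    # Web-page R-free vs. recomputed R-free for 2CWS (values from the thread).
    print(describe_r_difference(0.134, 0.1377))
    # PDB-header R-free vs. recomputed R-free.
    print(describe_r_difference(0.1704, 0.1377))
    # The re-refinement example from the question.
    print(describe_r_difference(0.110, 0.150))
```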
On 10/27/14 4:43 PM, Nathaniel Echols wrote:
> "...the difference between R factors computed using the different methods is typically less than 0.01%."
> I think this is probably a typo and it is supposed to mean "1%" or "0.01", which would have been my estimate.
It depends on what you compare and how. Two scenarios:

1) Compute Fcalc using FFT (Fc_fft) and by direct summation (Fc_direct), then compute R-factor(Fc_fft, Fc_direct). In this case the R-factor will indeed typically be below 1% (not 0.01%!). The attached script illustrates this (to run: phenix.python run.py). Table 4 in P. V. Afonine and A. Urzhumtsev, "On a fast calculation of structure factors at a subatomic resolution", Acta Cryst. (2004). A60, 19-32, does exactly this comparison.

2) Run two otherwise identical refinements, one using FFT and the other using direct summation. In this case the difference between the R-factors is likely to be below 0.01%, because the refinable parameters will absorb the differences.

Pavel
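The attached run.py is not reproduced in the archive. As a stand-in, here is a minimal pure-Python sketch of the quantity in scenario 1, the R-factor between two sets of calculated structure factors for the same reflections. It does not use cctbx; the function name `r_factor_percent` is hypothetical, no scale factor is applied (an assumption that seems reasonable when both sets come from the same model), and the structure-factor values in the usage example are invented for illustration only.

```python
def r_factor_percent(f_ref, f_other):
    """R = sum(| |F_ref| - |F_other| |) / sum(|F_ref|), returned in percent.

    f_ref and f_other are equal-length sequences of real or complex
    structure factors for the same reflections, in the same order.
    """
    if len(f_ref) != len(f_other):
        raise ValueError("both sets must cover the same reflections")
    numerator = sum(abs(abs(a) - abs(b)) for a, b in zip(f_ref, f_other))
    denominator = sum(abs(a) for a in f_ref)
    if denominator == 0:
        raise ValueError("reference amplitudes sum to zero")
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    # Invented values, standing in for Fc_direct and Fc_fft.
    fc_direct = [120.0 + 35.0j, 80.0 - 12.0j, 45.0 + 60.0j, 10.0 + 3.0j]
    fc_fft = [119.6 + 35.1j, 80.3 - 12.0j, 44.9 + 59.8j, 10.1 + 3.0j]
    print(f"R(Fc_direct, Fc_fft) = {r_factor_percent(fc_direct, fc_fft):.3f} %")
```

Reporting the result in percent keeps the "1%" versus "0.01%" distinction discussed above explicit; divide by 100 to compare with R-factors quoted as fractions.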
I believe SHELXL reports R1 cut at 4 sigma F and also R1 on all data, which I've got to bet is the source of those two different values. However, I've never used SHELXL for protein and don't know if the cutoff is redefined for protein data. (R1 = R-factor on F.)

Phil Jeffrey
Princeton

On 10/27/14 4:13 PM, Nathaniel Echols wrote:
> I don't have an exact explanation, but there is some inconsistency in the way data are stored internally by the PDB.
On Mon, Oct 27, 2014 at 1:48 PM, Phil Jeffrey wrote:
> I believe SHELXL reports R1 cut at 4 sigma F and also R1 on all data, which I've got to bet is the source of those two different values. However I've never used SHELXL for protein and don't know if the cutoff is redefined for protein data.

No, but the fact that phenix.model_vs_data comes closer to the lower values (without discarding any data) suggests that the higher values don't reflect the statistics with cutoffs applied. (As expected, model_vs_data reports slightly lower R-factors when the cutoffs are used.)

-Nat
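To make the cutoff discussion concrete, here is a small Python sketch of R1 (the R-factor on F) computed on all data and again with a 4 sigma cutoff on F. It is an illustration under stated assumptions, not SHELXL's or Phenix's implementation: the function name `r1` is hypothetical, the calculated amplitudes are assumed to be already on the scale of the observed ones (the `scale` parameter defaults to 1.0), and the toy reflections in the usage example are invented.

```python
def r1(f_obs, sigma_f, f_calc, sigma_cutoff=None, scale=1.0):
    """R1 = sum(|Fo - scale*|Fc||) / sum(Fo) over the selected reflections.

    If sigma_cutoff is given, only reflections with Fo > sigma_cutoff * sigma(Fo)
    are used; otherwise all reflections are used.
    """
    selected = [
        (fo, abs(fc))
        for fo, sig, fc in zip(f_obs, sigma_f, f_calc)
        if sigma_cutoff is None or fo > sigma_cutoff * sig
    ]
    if not selected:
        raise ValueError("no reflections pass the sigma cutoff")
    sum_fo = sum(fo for fo, _ in selected)
    if sum_fo == 0:
        raise ValueError("observed amplitudes sum to zero")
    return sum(abs(fo - scale * fc) for fo, fc in selected) / sum_fo


if __name__ == "__main__":
    # Invented toy data: the third reflection is weak (Fo = 2 sigma).
    f_obs = [100.0, 50.0, 4.0]
    sigma_f = [2.0, 2.0, 2.0]
    f_calc = [98.0, 52.0, 8.0]
    print("R1 (all data):   ", round(r1(f_obs, sigma_f, f_calc), 4))
    print("R1 (F > 4 sigma):", round(r1(f_obs, sigma_f, f_calc, sigma_cutoff=4.0), 4))
```

In this toy case the weak, poorly fitted reflection is dropped by the cutoff, so the cutoff R1 comes out lower than the all-data R1, which is the direction described in the reply above.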
Hi Phil,

SHELXL makes no distinction between structure types, and it always reports both values. I ignore the cut-off value, though. Differences are usually due to solvent models, and the less solvent in the structure, the fewer differences you'd expect. In this case, though, I assume a typo in _pdbx_refine.free_R_factor_no_cutoff rather than anything else. The structure is from 2005, when much manual work was necessary for structure deposition.

Best,
Tim

On 10/27/2014 09:48 PM, Phil Jeffrey wrote:
> I believe SHELXL reports R1 cut at 4 sigma F and also R1 on all data, which I've got to bet is the source of those two different values.
--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
participants (6)
- Cedric
- George Devaniranjan
- Nathaniel Echols
- Pavel Afonine
- Phil Jeffrey
- Tim Gruene