[phenixbb] WAS: changing TLS groups mid refinement
Phil Jeffrey
pjeffrey at princeton.edu
Mon May 17 14:37:11 PDT 2010
Can we turn the argument on its head?
Demonstrate that the way phenix.refine, as currently implemented,
inappropriately throws away weak data is never deleterious
to the quality of a protein structure model.
See below - a test on real data/structure suggests that the data does
matter.
> phenix.refine does not have intensity based X-ray refinement targets and
> therefore phenix.refine does not use intensities in refinement. Although
> it accepts input reflection files with intensities which it then
> converts to amplitudes for all subsequent purposes.
So let's look at real data:
Short version:
phenix.refine throws out 2896 of 48895 reflections, including 11% of the
data in the outermost shell, compared to using TRUNCATE to prep my data.
Evaluated on the common subset of data, the model refined against the
truncate=yes data has an R-free that is 0.8% lower than the model from
the default run.
0.8% at a 24% R-free (24.0 vs 24.8) is pretty significant IMHO.
Longer version:
Both runs use the same MTZ file and PDB file. The only difference is the
column selection: the default labels (IMEAN, SIGIMEAN) versus the
truncate=yes structure factors (F, SIGF).
phenix.refine's default behavior:
| 12: 2.1292 - 2.0684 0.90 2536 123 0.1851 0.2381
| 13: 2.0684 - 2.0139 0.88 2462 123 0.1813 0.2654
| 14: 2.0139 - 1.9648 0.87 2400 136 0.1936 0.2637
| 15: 1.9648 - 1.9201 0.81 2273 130 0.2090 0.2789
| 16: 1.9201 - 1.8793 0.83 2303 118 0.2216 0.2761
| 17: 1.8793 - 1.8417 0.72 1998 106 0.2388 0.2790
phenix.refine, forcing it to use F, SIGF out of truncate (truncate=yes):
| 13: 2.1079 - 2.0524 0.97 2546 135 0.1862 0.2383
| 14: 2.0524 - 2.0023 0.96 2545 128 0.1874 0.2623
| 15: 2.0023 - 1.9568 0.96 2499 149 0.1920 0.2562
| 16: 1.9568 - 1.9152 0.92 2452 132 0.2106 0.2385
| 17: 1.9152 - 1.8769 0.95 2500 130 0.2169 0.2895
| 18: 1.8769 - 1.8415 0.83 2182 115 0.2403 0.2680
Columns are resolution range, completeness (work+free), #work, #free,
Rwork, Rfree. The incompleteness in the outer shell of the "complete"
data is because I was overly pessimistic in setting the detector
distance. Mea culpa. The outer shell R-symm in SCALEPACK is 53.8%.
Default behavior yields:
Final: r_work = 0.1898 r_free = 0.2479 bonds = 0.007 angles = 1.114
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.842
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 0.02
REMARK 3 COMPLETENESS FOR RANGE (%) : 91.38
REMARK 3 NUMBER OF REFLECTIONS : 45999
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.1928
REMARK 3 R VALUE (WORKING SET) : 0.1898
REMARK 3 FREE R VALUE : 0.2479
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.08
REMARK 3 FREE R VALUE TEST SET COUNT : 2339
Truncate=yes data yields:
Final: r_work = 0.1932 r_free = 0.2473 bonds = 0.007 angles = 1.113
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.841
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.34
REMARK 3 COMPLETENESS FOR RANGE (%) : 97.10
REMARK 3 NUMBER OF REFLECTIONS : 48895
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.1960
REMARK 3 R VALUE (WORKING SET) : 0.1932
REMARK 3 FREE R VALUE : 0.2473
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.10
REMARK 3 FREE R VALUE TEST SET COUNT : 2494
Despite the inclusion of more weak data the R-free doesn't change much
(0.2473 vs 0.2479). One would expect it to increase a little - the same
way that R-work does. However, by default phenix.refine discards 5.7% of
the data overall (completeness 91.38% vs 97.10%) and 11% of the data in
the outermost shell, and this is for a dataset that is not at all
anisotropic - you would expect the trend to be far worse with anisotropic
data, where a lot of the data can be weak at the high resolution limit.
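The counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the numbers from the two REMARK 3 blocks and the per-shell tables; the variable names are mine. Note that 5.7% is the difference in completeness, while the 2896 discarded reflections are about 5.9% of the reflections actually measured.

```python
# Reflection counts and completeness taken from the two REMARK 3 blocks.
n_default = 45999       # default run (IMEAN, SIGIMEAN)
n_truncate = 48895      # truncate=yes run (F, SIGF)
compl_default = 91.38   # percent
compl_truncate = 97.10  # percent

# Reflections present in the truncate=yes run but dropped by default.
n_discarded = n_truncate - n_default
frac_of_measured = 100.0 * n_discarded / n_truncate

# Difference in overall completeness, in percentage points.
compl_drop = compl_truncate - compl_default

# Outermost shell completeness from the per-shell tables (0.83 vs 0.72).
outer_drop = round(100 * (0.83 - 0.72))

print(n_discarded)
print(f"{frac_of_measured:.1f}% of measured reflections")
print(f"{compl_drop:.1f} points of overall completeness")
print(f"{outer_drop} points of outer-shell completeness")
```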
The bigger question is: what would R-free be for the common data subset
(Imean > 0) but using the truncate=yes F values and PDB file? I used
SFTOOLS to make this selection, and then refined just the bulk solvent
correction for the truncate=yes PDB file against this data subset:
Final R-work = 0.1884, R-free = 0.2399
i.e. if you refine the model against all the data from TRUNCATE, but
then cut to the subset that phenix.refine would use by default, the
R-free is lower by 0.8%.
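The selection itself is simple to state: keep the TRUNCATE amplitude for every reflection whose Imean is positive. The sketch below illustrates that criterion in plain Python. It is not SFTOOLS syntax, the function name is hypothetical, and the reflection values are invented for illustration only.

```python
# Hypothetical toy reflections: (h, k, l) -> (Imean, F from TRUNCATE).
# These values are made up; they are not from the dataset in this post.
reflections = {
    (1, 0, 0): (152.3, 12.4),
    (2, 1, 0): (-3.1, 1.9),   # negative intensity, dropped
    (0, 3, 1): (0.0, 2.2),    # zero intensity, dropped
    (4, 2, 2): (8.7, 3.0),
}

def common_subset(refl):
    """Keep the TRUNCATE amplitude only where Imean > 0."""
    return {hkl: f for hkl, (imean, f) in refl.items() if imean > 0}

subset = common_subset(reflections)
print(sorted(subset))
```

The point of the selection is that the amplitudes come from TRUNCATE while the list of reflections matches what the default run would keep, so the two R-free values are computed over the same reflections.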
The R-free test count was the same as for the default phenix.refine
behavior, so this superficially suggests I didn't do anything wrong.
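For reference, the three R-free values reported in this post can be put side by side. This is just the arithmetic on the numbers above; the dictionary keys are my own labels.

```python
# R-free values as reported in this post.
r_free = {
    "default": 0.2479,                  # IMEAN/SIGIMEAN, default behavior
    "truncate, all data": 0.2473,       # F/SIGF, all 48895 reflections
    "truncate, common subset": 0.2399,  # F/SIGF, Imean > 0 subset
}

# Differences relative to the default run.
drop_all = r_free["default"] - r_free["truncate, all data"]
drop_common = r_free["default"] - r_free["truncate, common subset"]

print(f"all data:      {100 * drop_all:.2f} points lower")
print(f"common subset: {100 * drop_common:.1f} points lower")
```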
Phil Jeffrey