[phenixbb] WAS: changing TLS groups mid refinement

Phil Jeffrey pjeffrey at princeton.edu
Mon May 17 14:37:11 PDT 2010


Can we turn the argument on its head?

Demonstrate that the way phenix.refine, as currently implemented,
inappropriately throws away weak data can never be deleterious to the
quality of a protein structure model.


See below - a test on a real dataset and structure suggests that the
weak data does matter.

> phenix.refine does not have intensity based X-ray refinement targets and 
> therefore phenix.refine does not use intensities in refinement. Although 
> it accepts input reflection files with intensities which it then 
> converts to amplitudes for all subsequent purposes.

So let's look at real data:

Short version:

phenix.refine throws out 2896 of 48895 reflections, including 11% of the
data in the outermost shell, compared to using TRUNCATE to prep my data.
If you refine against all of the truncate=yes data and then evaluate
that PDB file on the common subset of data, R-free is 0.8% lower than
for phenix.refine's default behavior.

0.8% at a 24% R-free (24.0 vs 24.8) is pretty significant IMHO.


Longer version:

Both runs use the same MTZ file and PDB file. The only difference is
the column selection: the default labels (IMEAN, SIGIMEAN) or the
truncate=yes structure factors (F, SIGF).

phenix.refine's default behavior

| 12:  2.1292 -  2.0684 0.90   2536  123 0.1851 0.2381
| 13:  2.0684 -  2.0139 0.88   2462  123 0.1813 0.2654
| 14:  2.0139 -  1.9648 0.87   2400  136 0.1936 0.2637
| 15:  1.9648 -  1.9201 0.81   2273  130 0.2090 0.2789
| 16:  1.9201 -  1.8793 0.83   2303  118 0.2216 0.2761
| 17:  1.8793 -  1.8417 0.72   1998  106 0.2388 0.2790

phenix.refine, forcing it to use F, SIGF out of truncate (truncate=yes)

| 13:  2.1079 -  2.0524 0.97   2546  135 0.1862 0.2383
| 14:  2.0524 -  2.0023 0.96   2545  128 0.1874 0.2623
| 15:  2.0023 -  1.9568 0.96   2499  149 0.1920 0.2562
| 16:  1.9568 -  1.9152 0.92   2452  132 0.2106 0.2385
| 17:  1.9152 -  1.8769 0.95   2500  130 0.2169 0.2895
| 18:  1.8769 -  1.8415 0.83   2182  115 0.2403 0.2680

Columns are resolution range, completeness (work+free), #work, #free, 
Rwork, Rfree.  The incompleteness in the outer shell of the "complete" 
data is because I was overly pessimistic in setting the detector 
distance.  Mea culpa.  The outer shell R-symm in SCALEPACK is 53.8%.
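
As a rough sketch of how the per-shell lines above can be read
programmatically, the snippet below parses the two outermost-shell lines
using the column order just described. The helper name parse_shell and
the regular expression are illustrative assumptions, not part of
phenix.refine. Note that the two runs bin the data slightly differently,
so the shell counts are only approximately comparable.

```python
import re

# Outermost-shell lines copied verbatim from the two logs above.
DEFAULT_OUTER = "| 17:  1.8793 -  1.8417 0.72   1998  106 0.2388 0.2790"
TRUNCATE_OUTER = "| 18:  1.8769 -  1.8415 0.83   2182  115 0.2403 0.2680"

# Fields: bin number, d_max - d_min, completeness, #work, #free, Rwork, Rfree
SHELL_RE = re.compile(
    r"\|\s*(\d+):\s+([\d.]+)\s+-\s+([\d.]+)\s+([\d.]+)"
    r"\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)"
)


def parse_shell(line):
    """Split one per-shell log line into named fields (hypothetical helper)."""
    m = SHELL_RE.match(line.strip())
    if m is None:
        raise ValueError("not a shell line: %r" % line)
    return {
        "bin": int(m.group(1)),
        "d_max": float(m.group(2)),
        "d_min": float(m.group(3)),
        "completeness": float(m.group(4)),
        "n_work": int(m.group(5)),
        "n_free": int(m.group(6)),
        "r_work": float(m.group(7)),
        "r_free": float(m.group(8)),
    }


for label, line in (("default", DEFAULT_OUTER), ("truncate=yes", TRUNCATE_OUTER)):
    shell = parse_shell(line)
    total = shell["n_work"] + shell["n_free"]
    print(label, shell["completeness"], total, shell["r_free"])
```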


Default behavior yields:
Final: r_work = 0.1898 r_free = 0.2479 bonds = 0.007 angles = 1.114

REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.842
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 32.943
REMARK   3   MIN(FOBS/SIGMA_FOBS)              : 0.02
REMARK   3   COMPLETENESS FOR RANGE        (%) : 91.38
REMARK   3   NUMBER OF REFLECTIONS             : 45999
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.1928
REMARK   3   R VALUE            (WORKING SET) : 0.1898
REMARK   3   FREE R VALUE                     : 0.2479
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 5.08
REMARK   3   FREE R VALUE TEST SET COUNT      : 2339

Truncate=yes data yields:
Final: r_work = 0.1932 r_free = 0.2473 bonds = 0.007 angles = 1.113

REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.841
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 32.943
REMARK   3   MIN(FOBS/SIGMA_FOBS)              : 1.34
REMARK   3   COMPLETENESS FOR RANGE        (%) : 97.10
REMARK   3   NUMBER OF REFLECTIONS             : 48895
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   R VALUE     (WORKING + TEST SET) : 0.1960
REMARK   3   R VALUE            (WORKING SET) : 0.1932
REMARK   3   FREE R VALUE                     : 0.2473
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 5.10
REMARK   3   FREE R VALUE TEST SET COUNT      : 2494

Despite the inclusion of more weak data, R-free barely changes (0.2473
vs 0.2479). It should increase a little, the same way that R-work does
(0.1932 vs 0.1898). However, phenix.refine discards 5.7% of the data
overall (completeness 91.38% vs 97.10%) and 11% of the data in the
outermost shell (completeness 0.72 vs 0.83), and this is for a dataset
that is not at all anisotropic. You would expect the trend to be far
worse with anisotropic data, where a lot of the data can be weak at the
high-resolution limit.
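
The bookkeeping behind these figures can be checked from the two
REMARK 3 blocks above. The short sketch below uses only numbers quoted
in those blocks; the variable names are illustrative. It shows both the
fraction of the TRUNCATE reflections that go missing and the drop in
completeness, which is presumably where the 5.7% figure comes from.

```python
# Values copied from the two REMARK 3 blocks above.
n_default = 45999        # NUMBER OF REFLECTIONS, default (IMEAN, SIGIMEAN)
n_truncate = 48895       # NUMBER OF REFLECTIONS, truncate=yes (F, SIGF)
compl_default = 91.38    # COMPLETENESS FOR RANGE (%), default
compl_truncate = 97.10   # COMPLETENESS FOR RANGE (%), truncate=yes

# Reflections present after TRUNCATE but absent from the default run.
discarded = n_truncate - n_default

# The same loss expressed two ways: relative to the measured reflections,
# and as a difference in completeness (percentage points).
pct_of_measured = 100.0 * discarded / n_truncate
completeness_drop = compl_truncate - compl_default

print("discarded reflections:", discarded)
print("percent of measured data: %.1f" % pct_of_measured)
print("completeness drop (points): %.2f" % completeness_drop)
```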

The bigger question is: what would R-free be for the common data subset
(Imean > 0), but using the truncate=yes F values and PDB file? I used
SFTOOLS to make this selection, and then refined just the bulk solvent
correction for the truncate=yes PDB file against this data subset:
Final R-work = 0.1884, R-free = 0.2399
i.e. if you refine the model against all the data from TRUNCATE, but
then cut to the subset that phenix.refine would use by default, the
R-free is lower by 0.8% (0.2399 vs 0.2479).
The R-free test count was the same as for the default phenix.refine 
behavior, so this superficially suggests I didn't do anything wrong.
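
To make the comparison above concrete, here is a minimal sketch of the
same idea on toy data: keep only reflections with Imean > 0, then
compute R-free on that subset using the F values from TRUNCATE. The
reflection values are invented purely for illustration, the function
name r_factor is hypothetical, and the R-factor is the plain unscaled
form sum|Fobs - Fcalc| / sum(Fobs). Real refinement programs also apply
scaling and a bulk solvent correction, and the actual selection here was
done with SFTOOLS, not with this code.

```python
def r_factor(f_obs, f_calc):
    """Unscaled R = sum|Fobs - Fcalc| / sum(Fobs) over the given reflections."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


# Hypothetical toy reflections (NOT from the dataset in this post):
# (Imean, F from TRUNCATE, Fcalc from the model, belongs to free set)
reflections = [
    (120.0, 10.9, 10.0, False),
    (-3.0, 1.2, 2.0, True),
    (45.0, 6.7, 7.0, True),
    (0.5, 1.5, 1.0, False),
    (-1.0, 1.4, 0.5, False),
    (80.0, 8.9, 9.5, True),
]

# Common subset: reflections that survive an Imean > 0 cut.
common = [r for r in reflections if r[0] > 0]

# Free-set reflections before and after the cut.
free_all = [r for r in reflections if r[3]]
free_common = [r for r in common if r[3]]

r_free_all = r_factor([r[1] for r in free_all], [r[2] for r in free_all])
r_free_common = r_factor([r[1] for r in free_common], [r[2] for r in free_common])

print("free reflections, all data:", len(free_all))
print("free reflections, Imean > 0:", len(free_common))
print("R-free, all data: %.4f" % r_free_all)
print("R-free, Imean > 0 subset: %.4f" % r_free_common)
```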

Phil Jeffrey





