Can we turn the argument on its head? Demonstrate that the way phenix.refine, as currently implemented, inappropriately throws away weak data is never potentially deleterious to the quality of a protein structure model. See below: a test on real data and a real structure suggests that the data does matter.
phenix.refine does not have intensity-based X-ray refinement targets and therefore does not use intensities in refinement, although it accepts input reflection files with intensities, which it then converts to amplitudes for all subsequent purposes.
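The crux is how intensities become amplitudes. A straight F = sqrt(I) conversion has no answer for I <= 0, so those reflections are lost, whereas TRUNCATE's French & Wilson treatment estimates a positive F for every reflection from a Bayesian posterior with a Wilson prior. Below is a minimal sketch of both, assuming acentric reflections only and a known mean intensity for the resolution shell (`mean_i`). The function names are hypothetical, and this is not the TRUNCATE or phenix.refine code.

```python
import math


def naive_f(i_obs, sig_i):
    """F = sqrt(I) with first-order SIGF. Returns None for I <= 0,
    i.e. the reflection is simply dropped."""
    if i_obs <= 0.0:
        return None
    f = math.sqrt(i_obs)
    return f, sig_i / (2.0 * f)


def french_wilson_acentric(i_obs, sig_i, mean_i, n=4000):
    """Posterior mean and SD of F = sqrt(J) for one acentric reflection.

    Prior (Wilson, acentric): p(J) = exp(-J / mean_i) / mean_i for J >= 0.
    Likelihood: i_obs ~ Normal(J, sig_i**2).
    Their product is Normal(mu, sig_i**2) in J, truncated to J >= 0,
    with mu = i_obs - sig_i**2 / mean_i.
    """
    mu = i_obs - sig_i ** 2 / mean_i
    # Length scale of the truncated posterior: sig_i when mu >= 0, shorter
    # (roughly sig_i**2 / |mu|) when the mass piles up against J = 0.
    scale = sig_i if mu >= 0.0 else min(sig_i, sig_i ** 2 / -mu)
    lower = max(mu - 12.0 * scale, 0.0)
    upper = max(mu, 0.0) + 12.0 * scale
    h = (upper - lower) / n
    js = [lower + (k + 0.5) * h for k in range(n)]  # midpoint-rule nodes
    # Log-weights relative to their maximum, so very negative
    # intensities do not underflow every weight to zero.
    logw = [-((j - mu) ** 2) / (2.0 * sig_i ** 2) for j in js]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    mean_f = sum(wk * math.sqrt(j) for wk, j in zip(w, js)) / z
    mean_j = sum(wk * j for wk, j in zip(w, js)) / z
    sd_f = math.sqrt(max(mean_j - mean_f ** 2, 0.0))
    return mean_f, sd_f


if __name__ == "__main__":
    for i_obs in (400.0, 25.0, 0.0, -15.0):
        print(i_obs, naive_f(i_obs, 10.0),
              french_wilson_acentric(i_obs, 10.0, mean_i=50.0))
```

For strong reflections the two estimates agree closely. They differ only for weak and negative intensities, which is exactly where the outer-shell data discussed here live.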
So let's look at real data.

Short version: phenix.refine throws out 2896 of 48895 reflections, including roughly 11% of the reflections in the outermost shell (completeness 0.72 vs 0.83), compared with using TRUNCATE to prepare my data. If the model refined against the truncate=yes data is evaluated against the common subset of data, its R-free is 0.8% lower (24.0% vs 24.8%). 0.8% at an R-free of 24% is pretty significant IMHO.

Longer version: I used the same MTZ and PDB files, changing only the input columns: either the default column label selection (IMEAN, SIGIMEAN) or the truncate=yes structure factors (F, SIGF).

phenix.refine's default behavior:

    12: 2.1292 - 2.0684   0.90   2536   123   0.1851   0.2381
    13: 2.0684 - 2.0139   0.88   2462   123   0.1813   0.2654
    14: 2.0139 - 1.9648   0.87   2400   136   0.1936   0.2637
    15: 1.9648 - 1.9201   0.81   2273   130   0.2090   0.2789
    16: 1.9201 - 1.8793   0.83   2303   118   0.2216   0.2761
    17: 1.8793 - 1.8417   0.72   1998   106   0.2388   0.2790

phenix.refine, forcing it to use F, SIGF out of truncate (truncate=yes):

    13: 2.1079 - 2.0524   0.97   2546   135   0.1862   0.2383
    14: 2.0524 - 2.0023   0.96   2545   128   0.1874   0.2623
    15: 2.0023 - 1.9568   0.96   2499   149   0.1920   0.2562
    16: 1.9568 - 1.9152   0.92   2452   132   0.2106   0.2385
    17: 1.9152 - 1.8769   0.95   2500   130   0.2169   0.2895
    18: 1.8769 - 1.8415   0.83   2182   115   0.2403   0.2680

Columns are resolution range, completeness (work+free), #work, #free, Rwork, Rfree. The incompleteness in the outer shell of the "complete" data is because I was overly pessimistic in setting the detector distance. Mea culpa. The outer-shell R-symm in SCALEPACK is 53.8%.

Default behavior yields:

    Final: r_work = 0.1898 r_free = 0.2479 bonds = 0.007 angles = 1.114

    REMARK 3 DATA USED IN REFINEMENT.
    REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.842
    REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
    REMARK 3 MIN(FOBS/SIGMA_FOBS) : 0.02
    REMARK 3 COMPLETENESS FOR RANGE (%) : 91.38
    REMARK 3 NUMBER OF REFLECTIONS : 45999
    REMARK 3
    REMARK 3 FIT TO DATA USED IN REFINEMENT.
    REMARK 3 R VALUE (WORKING + TEST SET) : 0.1928
    REMARK 3 R VALUE (WORKING SET) : 0.1898
    REMARK 3 FREE R VALUE : 0.2479
    REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.08
    REMARK 3 FREE R VALUE TEST SET COUNT : 2339

Truncate=yes data yields:

    Final: r_work = 0.1932 r_free = 0.2473 bonds = 0.007 angles = 1.113

    REMARK 3 DATA USED IN REFINEMENT.
    REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.841
    REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
    REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.34
    REMARK 3 COMPLETENESS FOR RANGE (%) : 97.10
    REMARK 3 NUMBER OF REFLECTIONS : 48895
    REMARK 3
    REMARK 3 FIT TO DATA USED IN REFINEMENT.
    REMARK 3 R VALUE (WORKING + TEST SET) : 0.1960
    REMARK 3 R VALUE (WORKING SET) : 0.1932
    REMARK 3 FREE R VALUE : 0.2473
    REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.10
    REMARK 3 FREE R VALUE TEST SET COUNT : 2494

Despite the inclusion of more weak data, R-free barely changes (0.2473 vs 0.2479). It should increase a little, the same way that R-work does (0.1932 vs 0.1898). However, phenix.refine lowers the overall completeness by 5.7 percentage points (97.10% to 91.38%) and discards about 11% of the outermost shell, and this is for a dataset that is not at all anisotropic. You would expect the effect to be far worse with anisotropic data, where a lot of the data can be weak at the high-resolution limit.

The bigger question is: what would R-free be for the common data subset (IMEAN > 0), but using the truncate=yes F values and PDB file? I used SFTOOLS to make this selection, and then refined just the bulk-solvent correction for the truncate=yes PDB file against this data subset:

    Final R-work = 0.1884, R-free = 0.2399

i.e. if you refine the model against all the data from TRUNCATE, but then cut to the subset that phenix.refine would use by default, R-free is lower by 0.8% (0.2399 vs 0.2479). The R-free test count was the same as for the default phenix.refine behavior, so this superficially suggests I didn't do anything wrong.

Phil Jeffrey
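The SFTOOLS selection step amounts to intersecting two reflection lists on their Miller indices and recomputing R-work and R-free on what remains. Here is a minimal sketch, assuming reflections are held in plain Python dicts keyed by (h, k, l) and a single least-squares scale factor stands in for phenix.refine's bulk-solvent and anisotropic scaling. Because of that simplification, its absolute R values would not reproduce the numbers above. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

HKL = Tuple[int, int, int]


def common_subset(i_mean: Dict[HKL, float], f_trunc: Dict[HKL, float]) -> List[HKL]:
    """Miller indices present in the TRUNCATE output whose IMEAN is > 0,
    i.e. the reflections a 'drop I <= 0' conversion would also keep."""
    return sorted(h for h in f_trunc if i_mean.get(h, 0.0) > 0.0)


def ls_scale(f_obs: Dict[HKL, float], f_calc: Dict[HKL, float],
             keys: Iterable[HKL]) -> float:
    """Single least-squares scale k minimising sum (Fo - k*Fc)^2."""
    keys = list(keys)
    return (sum(f_obs[h] * f_calc[h] for h in keys)
            / sum(f_calc[h] ** 2 for h in keys))


def r_factor(f_obs: Dict[HKL, float], f_calc: Dict[HKL, float],
             keys: Iterable[HKL], k: float) -> float:
    """R = sum |Fo - k*Fc| / sum Fo over the given reflections."""
    keys = list(keys)
    return (sum(abs(f_obs[h] - k * f_calc[h]) for h in keys)
            / sum(f_obs[h] for h in keys))


def r_work_free(f_obs: Dict[HKL, float], f_calc: Dict[HKL, float],
                free_flags: Dict[HKL, bool], keys: Iterable[HKL]):
    """R-work and R-free on `keys`, with the scale fitted to the work set only."""
    keys = list(keys)
    work = [h for h in keys if not free_flags[h]]
    free = [h for h in keys if free_flags[h]]
    k = ls_scale(f_obs, f_calc, work)
    return r_factor(f_obs, f_calc, work, k), r_factor(f_obs, f_calc, free, k)
```

Taking the scale from the work set alone keeps the free reflections out of every fitted quantity, which is the point of R-free.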