R-factor difference between phenix and sftools

Hi,

we refined a structure in phenix.refine (dev-1810) and the R-factors are:

  Final R-work = 0.3221, R-free = 0.3638

phenix.cc_star agrees:

  phenix.cc_star com_001.pdb com_001.mtz \
    f_obs_labels="F-obs,SIGF-obs" \
    f_model_labels="F-model,PHIF-model" \
    unmerged_data="xscale.hkl"

  r_work: 0.322
  r_free: 0.364

so does phenix.model_vs_data:

  phenix.model_vs_data com_001.pdb com_001.mtz

  r_work : 0.3221
  r_free : 0.3638
  sigma_cutoff : None

However, when I calculate the R-factors with sftools, the results differ from the phenix values:

  sftools << eof
  read com_001_f_model.mtz
  Y
  select col R_FREE_FLAGS > 0
  correl col FOBS FMODEL
  select invert
  correl col FOBS FMODEL
  quit
  Y
  eof

  R-work: 33.6
  R-free: 38.1

Does anyone perhaps know what causes this difference?

With best regards,
Simon

PS: The structure is in space group P212121 with strong pseudocentering tNCS (dimer in the asymmetric unit, native Patterson peak at 0.500 0.500 0.497). There is no indication of a P21 twin (L-test and refinement in P21 with twin law), and the space group is not the higher-symmetry I222 (alternating strong and weak reflections are visible in the diffraction patterns, and structural differences between the two protomers of the dimer are resolved). Two datasets from different beamlines. As might be expected, the R-factors after refinement are unusually high (e.g. 28/35 [conventional resolution cutoff at 3.1 A], 32/36 with Kay's CC cutoff at 2.6 A). I wanted to compare refinements between different programs to see how they cope with the unexpected bimodal amplitude distribution.

Dear Simon, I found this a very puzzling observation and given the high throughput of this board, it is likewise puzzling you have not received a reply yet - or have you? Best regards, Tim On 10/12/2014 08:25 PM, Simon Jenni wrote:
-- Dr Tim Gruene Institut fuer anorganische Chemie Tammannstr. 4 D-37077 Goettingen GPG Key ID = A46BEE1A

Thanks Tim, yes, I missed this one!

Hi Simon,

the differences in R-factors you observe are most likely because we use an improved bulk-solvent modeling and overall scaling procedure. This is described here:

  Bulk-solvent and overall scaling revisited: faster calculations, improved results. Afonine PV, Grosse-Kunstleve RW, Adams PD, Urzhumtsev A. Acta Cryst. D69, 625-634 (2013).

Specifically, Figures 3-4 in this paper show that your observations are to be expected. Other contributions to the differences are discussed here:

  phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. P.V. Afonine, R.W. Grosse-Kunstleve, V.B. Chen, J.J. Headd, N.W. Moriarty, J.S. Richardson, D.C. Richardson, A. Urzhumtsev, P.H. Zwart, P.D. Adams. J. Appl. Cryst. 43, 677-685 (2010).

Pavel

On 10/13/14 11:10 AM, Tim Gruene wrote:

Hi Pavel,

thanks for the explanation. Call me lazy, as I have not read the reference yet: does it explain how the bulk solvent is incorporated into the model such that it is not represented in the FMODEL written by Phenix (see Simon's script for sftools)? My naive thinking is that the bulk-solvent model adds to FMODEL, so that the calculated low-resolution reflections in particular match the measured ones better.

Best,
Tim

On 10/13/2014 08:21 PM, Pavel Afonine wrote:

Hi Tim,

oh, I see. Yes, you are right, FMODEL contains everything, including bulk solvent.

Any MTZ file out of phenix.refine contains four blocks of data:

1) input reflections (Fobs or Iobs, flags);
2) data actually used in refinement. For example, if Iobs were input, phenix.refine will convert them into Fobs and use those in refinement. That's what will be in this block;
3) Fmodel (the total model structure factors including all scales, bulk solvent etc.);
4) various Fourier map coefficients.

Using 2) and 3) you can reproduce the reported R-factors exactly. Basically, you can read such an MTZ file with a script of your choice, take the corresponding arrays, apply the R-factor formula from a textbook, and you will obtain the reported R-factors.

Now, I'm not a great expert in sftools, so I can't really comment on the script that Simon quoted below. If it really does take the arrays of data from 2) and 3) and uses the R-factor formula, then the numbers must match. Otherwise this is puzzling!

Simon, if you send me the MTZ file offlist I will have a closer look. Thanks!

All the best,
Pavel

On 10/13/14 11:25 AM, Tim Gruene wrote:

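[Editorial sketch] The "textbook formula" check described above might look roughly like the following in Python. This is only an illustration under stated assumptions: the amplitudes and flags are made-up toy values rather than data read from an MTZ file, the function names are hypothetical, and the convention that a flag value of 1 marks the test set is assumed here and must be checked against the actual file.

```python
def r_factor(f_obs, f_model):
    """Textbook R = sum|Fobs - Fmodel| / sum(Fobs), as a fraction."""
    return sum(abs(o - m) for o, m in zip(f_obs, f_model)) / sum(f_obs)


def r_work_r_free(f_obs, f_model, flags, test_flag=1):
    """Split reflections by free-R flag and return (R-work, R-free).

    ASSUMPTION: reflections whose flag equals test_flag form the test set.
    ASSUMPTION: f_model is already on the scale of f_obs.
    """
    work_obs = [o for o, f in zip(f_obs, flags) if f != test_flag]
    work_mod = [m for m, f in zip(f_model, flags) if f != test_flag]
    free_obs = [o for o, f in zip(f_obs, flags) if f == test_flag]
    free_mod = [m for m, f in zip(f_model, flags) if f == test_flag]
    return r_factor(work_obs, work_mod), r_factor(free_obs, free_mod)


# Toy amplitudes (hypothetical, not from any real dataset).
f_obs = [100.0, 50.0, 80.0, 20.0]
f_model = [90.0, 55.0, 70.0, 25.0]
flags = [0, 0, 1, 1]

r_work, r_free = r_work_r_free(f_obs, f_model, flags)
print(f"R-work = {r_work:.4f}, R-free = {r_free:.4f}")
```

With real data, the two arrays would be the "data actually used in refinement" and the Fmodel amplitudes from the same MTZ file.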
On Sun, Oct 12, 2014 at 11:25 AM, Simon Jenni
In the default phenix.refine output MTZ, the "F-obs" column will not be scaled to F-model. My guess is that your input data have already been placed on an absolute scale based on the Wilson statistics, so the results are reasonably close, but when I tried using the same commands on an XFEL dataset I got an R-factor of 192. Try using F-obs-filtered, which will be on the same scale. (This doesn't explain however why phenix.cc_star nonetheless produces the expected output. Can you please send me the logfile?) -Nat

On Mon, Oct 13, 2014 at 11:54 AM, Nathaniel Echols
Okay, this statement is at least partially incorrect: your data are clearly on the correct scale in the phenix.refine output file, but the data in the file I used are not. (I'm going to blame this on the weirdness of certain XFEL data.)

However, I did eventually figure out the problem: SFTOOLS is using a different formula for the R-factor. If you give it the command "correl help", the output will include this:

  RFACT Rfactor in percent ( 200*Sum|col1-col2|/sum(col1+col2) )

This disagrees with our source code, the Rupp textbook, Kay's wiki, and Wikipedia, all of which use sum(col1) as the denominator (assuming col1 == F-obs, but in our code it's written more generally).

In other words: the R-factors from SFTOOLS cannot be meaningfully compared to the R-factors from refinement.

-Nat
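[Editorial sketch] To make the difference between the two denominators concrete, here is a small Python comparison. The second function simply transcribes the expression quoted from "correl help"; it is not SFTOOLS code, the function names are hypothetical, and the amplitudes are made-up toy values.

```python
def r_conventional(f_obs, f_model):
    """Conventional R in percent: 100 * sum|Fobs - Fmodel| / sum(Fobs)."""
    diff = sum(abs(o - m) for o, m in zip(f_obs, f_model))
    return 100.0 * diff / sum(f_obs)


def r_symmetric(col1, col2):
    """Expression quoted from 'correl help', in percent:
    200 * sum|col1 - col2| / sum(col1 + col2)."""
    diff = sum(abs(a - b) for a, b in zip(col1, col2))
    return 200.0 * diff / sum(a + b for a, b in zip(col1, col2))


# Toy amplitudes (hypothetical); the model sums to less than the observations.
f_obs = [100.0, 50.0, 80.0, 20.0]    # sum = 250
f_model = [90.0, 45.0, 70.0, 25.0]   # sum = 230

conventional = r_conventional(f_obs, f_model)
symmetric = r_symmetric(f_obs, f_model)
print(f"conventional: {conventional:.1f}%  symmetric: {symmetric:.1f}%")
```

The two expressions coincide only when sum(col1) equals sum(col2). When the model amplitudes sum to less than the observed ones, the symmetric form gives the larger value, which is the direction of the difference reported at the start of the thread, although this toy example does not by itself confirm that explanation for the real data.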