R-factor difference between phenix and sftools
Hi,

we refined a structure in phenix.refine (dev-1810) and the R-factors are:

Final R-work = 0.3221, R-free = 0.3638

phenix.cc_star agrees:

phenix.cc_star com_001.pdb com_001.mtz \
  f_obs_labels="F-obs,SIGF-obs" \
  f_model_labels="F-model,PHIF-model" \
  unmerged_data="xscale.hkl"

r_work: 0.322
r_free: 0.364

so does phenix.model_vs_data:

phenix.model_vs_data com_001.pdb com_001.mtz

r_work : 0.3221
r_free : 0.3638
sigma_cutoff : None

However, when I calculate the R-factors with sftools, I get a discrepancy between the phenix and sftools results:

sftools << eof
read com_001_f_model.mtz
Y
select col R_FREE_FLAGS > 0
correl col FOBS FMODEL
select invert
correl col FOBS FMODEL
quit
Y
eof

R-work: 33.6
R-free: 38.1

Does anyone perhaps know what causes this difference?

With best regards,
Simon

PS: The structure is in space group P212121 with strong pseudocentering tNCS (dimer in the asymmetric unit, native Patterson peak at 0.500 0.500 0.497). There is no indication of a P21 twin (L-test and refinement in P21 with twin law), nor of the higher-symmetry space group I222 (alternating strong and weak reflections are visible in the diffraction patterns, and the structural differences between the two protomers of the dimer are resolved). Two datasets from different beamlines. As might be expected, the R-factors after refinement are unusually high (e.g. 28/35 with a conventional resolution cutoff at 3.1 A, 32/36 with Kay's CC cutoff at 2.6 A). I wanted to compare refinements between different programs to see how they cope with the unexpected bimodal amplitude distribution.
Dear Simon,

I found this a very puzzling observation and given the high throughput of this board, it is likewise puzzling you have not received a reply yet - or have you?

Best regards,
Tim

On 10/12/2014 08:25 PM, Simon Jenni wrote:
> Does anyone perhaps know what causes this difference?

_______________________________________________
phenixbb mailing list
http://phenix-online.org/mailman/listinfo/phenixbb

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
Thanks Tim, yes, I missed this one!

Hi Simon,

the differences in R-factors you observe are most likely because we use an improved bulk-solvent modeling and overall scaling procedure. This is described here:

Bulk-solvent and overall scaling revisited: faster calculations, improved results. P.V. Afonine, R.W. Grosse-Kunstleve, P.D. Adams, A. Urzhumtsev. Acta Cryst. D69, 625-634 (2013).

Specifically, Figures 3-4 in this paper make your observations quite expected.

Other contributions to differences are discussed here:

phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. P.V. Afonine, R.W. Grosse-Kunstleve, V.B. Chen, J.J. Headd, N.W. Moriarty, J.S. Richardson, D.C. Richardson, A. Urzhumtsev, P.H. Zwart, P.D. Adams. J. Appl. Cryst. 43, 677-685 (2010).

Pavel

On 10/13/14 11:10 AM, Tim Gruene wrote:
> I found this a very puzzling observation and given the high throughput of this board, it is likewise puzzling you have not received a reply yet - or have you?
Hi Pavel,

thanks for the explanation. Call me lazy as I have not read the reference yet - does it explain how the bulk solvent is incorporated into the model so that it is not represented in the FMODEL written by Phenix (see Simon's script for sftools)? My naive thinking is that the bulk-solvent model adds to FMODEL so that, in particular, the calculated low-resolution reflections match the measured ones better.

Best,
Tim

On 10/13/2014 08:21 PM, Pavel Afonine wrote:
> the differences in R-factors you observe are most likely because we use improved bulk-solvent modeling and overall scaling procedure.

--
Dr Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
Hi Tim,

oh, I see.. Yes, you are right, FMODEL contains everything, including bulk-solvent.

Any MTZ file out of phenix.refine contains four blocks of data:

1) input reflections (Fobs or Iobs, flags);
2) data actually used in refinement. For example, if Iobs were input, phenix.refine will convert them into Fobs and use those in refinement. That's what will be in this block;
3) Fmodel (the total model structure factors including all scales, bulk-solvent etc);
4) various Fourier map coefficients.

Using 2) and 3) you can reproduce the reported R-factors exactly. Basically, you can read such an MTZ file with a script of your choice, take the corresponding arrays, apply the R-factor formula from a textbook, and you will obtain the reported R-factors.

Now, I'm not a great expert in sftools, so I can't really comment on the script that Simon quoted below.. If it really does take the arrays of data from 2) and 3) and uses the R-factor formula, then the numbers must match... Otherwise this is puzzling!

Simon, if you send me the MTZ file off-list I will have a closer look.. - thanks!

All the best,
Pavel

On 10/13/14 11:25 AM, Tim Gruene wrote:
> My naive thinking is that the bulk solvent model add to FMODEL so that particularly the calculated low resolution reflections match better the measured ones.
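Pavel's recipe (take the amplitudes actually used in refinement and the total Fmodel amplitudes, then apply the textbook formula separately to the work and free sets) could be sketched roughly as follows. This is only an illustration under stated assumptions: the arrays are made-up toy values rather than data read from an MTZ file, the function names `r_factor` and `r_work_free` are hypothetical, and no extra scale factor is applied because Fmodel is said above to already include all scales.

```python
import numpy as np


def r_factor(f_obs, f_model):
    """Textbook R-factor: sum|Fobs - Fmodel| / sum(Fobs), as a fraction."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    return np.abs(f_obs - f_model).sum() / f_obs.sum()


def r_work_free(f_obs, f_model, free_mask):
    """Return (R-work, R-free) given a boolean mask marking the free set."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    free = np.asarray(free_mask, dtype=bool)
    r_work = r_factor(f_obs[~free], f_model[~free])
    r_free = r_factor(f_obs[free], f_model[free])
    return r_work, r_free


# Toy amplitudes (hypothetical values, not from the thread).
f_obs = np.array([100.0, 80.0, 60.0, 40.0, 20.0])
f_model = np.array([90.0, 85.0, 50.0, 45.0, 25.0])
free_mask = np.array([False, False, False, True, True])

r_work, r_free = r_work_free(f_obs, f_model, free_mask)
print(f"R-work = {r_work:.4f}, R-free = {r_free:.4f}")
```

With real data, the three arrays would come from the corresponding MTZ columns; which flag value marks the free set depends on the convention of the file and has to be checked rather than assumed.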
On Sun, Oct 12, 2014 at 11:25 AM, Simon Jenni wrote:
> However, when I calculate the R-factors with sftools, I get a discrepancy between the phenix and sftools results:
>
> sftools << eof
> read com_001_f_model.mtz
> Y
> select col R_FREE_FLAGS > 0
> correl col FOBS FMODEL
> select invert
> correl col FOBS FMODEL
> quit
> Y
> eof
>
> R-work: 33.6
> R-free: 38.1
>
> Does anyone perhaps know what causes this difference?

In the default phenix.refine output MTZ, the "F-obs" column will not be scaled to F-model. My guess is that your input data have already been placed on an absolute scale based on the Wilson statistics, so the results are reasonably close, but when I tried using the same commands on an XFEL dataset I got an R-factor of 192. Try using F-obs-filtered, which will be on the same scale.

(This doesn't explain however why phenix.cc_star nonetheless produces the expected output. Can you please send me the logfile?)

-Nat
On Mon, Oct 13, 2014 at 11:54 AM, Nathaniel Echols wrote:
> In the default phenix.refine output MTZ, the "F-obs" column will not be scaled to F-model. My guess is that your input data have already been placed on an absolute scale based on the Wilson statistics, so the results are reasonably close, but when I tried using the same commands on an XFEL dataset I got an R-factor of 192.

Okay, this statement is at least partially incorrect - your data are clearly on the correct scale in the phenix.refine output file, but the data in the file I used are not. (I'm going to blame this on the weirdness of certain XFEL data.)

However, I did eventually figure out the problem: SFTOOLS is using a different formula for the R-factor. If you give it the command "correl help", it will include this:

RFACT Rfactor in percent ( 200*Sum|col1-col2|/sum(col1+col2) )

Which disagrees with our source code, and the Rupp textbook, and Kay's wiki, and Wikipedia, all of which use sum(col1) as the denominator (assuming col1 == F-obs, but in our code it's written more generally). In other words: the R-factors from SFTOOLS cannot be meaningfully compared to the R-factors from refinement.

-Nat
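To make the difference between the two definitions concrete, here is a small sketch that evaluates both on the same pair of columns. The SFTOOLS expression is transcribed from the "correl help" text quoted above; the function names and the toy amplitudes are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def r_textbook(f_obs, f_calc):
    """Conventional R-factor: sum|Fobs - Fcalc| / sum(Fobs), as a fraction."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_sftools_percent(col1, col2):
    """Expression quoted from 'correl help': 200*Sum|col1-col2|/sum(col1+col2)."""
    numerator = sum(abs(a - b) for a, b in zip(col1, col2))
    return 200.0 * numerator / sum(a + b for a, b in zip(col1, col2))


# Toy amplitudes (hypothetical); the model sums to less than the observations.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_model = [90.0, 70.0, 55.0, 35.0, 20.0]

print(f"textbook: {100.0 * r_textbook(f_obs, f_model):.2f} %")
print(f"sftools : {r_sftools_percent(f_obs, f_model):.2f} %")
# Swapping the columns changes the textbook value but not the sftools one.
print(f"textbook, swapped: {100.0 * r_textbook(f_model, f_obs):.2f} %")
print(f"sftools , swapped: {r_sftools_percent(f_model, f_obs):.2f} %")
```

In this toy case the summed model amplitudes are smaller than the summed observed amplitudes, so the SFTOOLS expression gives the larger number. The direction of the difference depends only on the ratio of the two sums.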
Interesting.. I use this formula to calculate the R-factor between two data sets when I cannot choose which one to call "Fobs" and which one to call "Fcalc". But clearly, this is not exactly what we call the R-factor.

Pavel

On 10/13/14 2:02 PM, Nathaniel Echols wrote:
> RFACT Rfactor in percent ( 200*Sum|col1-col2|/sum(col1+col2) )
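The relation between the two definitions can be written out in a few lines. This is a sketch that assumes both are evaluated over the same reflections with col1 = F_obs and col2 = F_model; the shorthand S_obs and S_model for the summed amplitudes is introduced here and does not appear in the thread.

```latex
\begin{align}
R &= \frac{\sum \lvert F_{\mathrm{obs}} - F_{\mathrm{model}} \rvert}{S_{\mathrm{obs}}},
\qquad
R_{\mathrm{sftools}} = \frac{2 \sum \lvert F_{\mathrm{obs}} - F_{\mathrm{model}} \rvert}{S_{\mathrm{obs}} + S_{\mathrm{model}}},
\qquad
S_{\mathrm{obs}} = \sum F_{\mathrm{obs}},\;
S_{\mathrm{model}} = \sum F_{\mathrm{model}} \\
\frac{R_{\mathrm{sftools}}}{R} &= \frac{2\, S_{\mathrm{obs}}}{S_{\mathrm{obs}} + S_{\mathrm{model}}}
\quad\Longrightarrow\quad
\frac{S_{\mathrm{model}}}{S_{\mathrm{obs}}} = \frac{2R}{R_{\mathrm{sftools}}} - 1
\end{align}
```

The second expression is unchanged when the two columns are swapped, which is why it suits the comparison of two data sets where neither is obviously "Fobs". The two definitions coincide only when S_model = S_obs. Purely as arithmetic on the numbers reported above (0.3221 versus 33.6 % for the work set), and only if both programs really used the same reflections and columns, the ratio S_model/S_obs would come out at roughly 0.92.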
Thanks to all, and especially Nat for finding out the cause (different formula for R-factor calculation in sftools and phenix)!

All the best,
Simon
On Mon, Oct 13, 2014 at 5:16 PM, Pavel Afonine wrote:
> Interesting.. I use this formula to calculate R-factor between two data sets when I cannot choose which one to call "Fobs" and which one to call "Fcalc".
participants (4): Nathaniel Echols, Pavel Afonine, Simon Jenni, Tim Gruene