Rfree and a low resolution data set
Hi all,

Currently I am refining a data set that showed anisotropic diffraction. Aimless suggested cutoffs of 2.3, 2.6 and 3.6 angstrom along the h, k and l axes. I chose a general 3.6 angstrom cutoff to obtain satisfactory statistics for Rmeas, I/sd(I) and CC1/2. At this resolution the data set consists of approximately 2800 reflections.

Generally 5% of the reflections are set aside as the Rfree test set, and I found that a minimum of about 500 reflections in total is needed to produce a reliable Rfree. However, 5% only amounts to 140 reflections in this case. I am hesitant to include more reflections, as I would have to go up to nearly 20% of the reflections to obtain more than 500 in the test set.

In a discussion on the CCP4 message boards some time ago it was suggested to do multiple refinements with different test sets: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1411&L=ccp4bb&F=&S=&P=125570

In that thread it was also discussed that a least-squares (LS) approach is preferred when using a small test set. However, when using an LS target, the resulting Rfree is very high (10% higher than when using the automatic option) and phenix.refine produces awful geometry (24% Ramachandran outliers, a clashscore of 105...). It seems that the refinement is performed without restraints? Optimizing the X-ray/stereochemistry weight does not result in improved stereochemistry.

My question is whether the LS approach is still relevant and, if so, is there an explanation (and solution) for the bad statistics?

Kind regards,
Toon Van Thillo
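[Editor's note: the arithmetic behind the free-set sizes quoted above can be checked in a few lines. The counts below use the approximate numbers from the post (2800 reflections, a 500-reflection target), not exact values from the actual data set.]

```python
# Back-of-the-envelope check of the free-set sizes quoted above,
# using the approximate numbers from the post.
n_total = 2800       # reflections at the 3.6 angstrom cutoff
target_free = 500    # commonly quoted minimum for a stable Rfree

for frac in (0.05, 0.10, 0.20):
    print(f"{frac:.0%} free set -> {int(n_total * frac)} reflections")

needed = target_free / n_total
print(f"fraction needed for {target_free} free reflections: {needed:.1%}")
```

This reproduces the 140-reflection figure for 5% and shows that roughly 18% of the data would have to be sacrificed to reach 500 free reflections.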
Dear Toon,

I think some of your questions are addressed by the work: Praznikar, J. & Turk, D. (2014) Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures. Acta Cryst. D70, 3124-3134 (it does not say how to validate the result without the Rfree, but it shows how to get the result). Please look at it.

Note that the authors talk about ML and not about LS refinement; I wonder why you need to use LS.

Best regards,
Sacha Urzhumtsev
Dear Toon, dear Sacha,

when you have very few reflections, too few for Rfree, you should use Rcomplete instead. Rcomplete is the proper way to validate; Rfree was only introduced because Rcomplete takes longer to compute. In the 1990s this was an issue, but nowadays Rcomplete calculates within a few minutes on a decent computer in cases where you have fewer than about 20,000 unique reflections. Rcomplete can be computed with as little as a single reflection set aside during each refinement run. Rcomplete could also be used for ML estimates instead of Rfree, but as far as I know this has not been implemented yet.

The good thing about Rcomplete (and actually also Rfree) is that the need to set aside the free set from the very beginning is an urban myth: at any stage during refinement, the R-value from reflections not used during refinement is going to converge towards Rfree = Rcomplete, whether or not these reflections were used previously in refinement. I call this "Tickle's conjecture", because I understood the idea from Ian Tickle's discussions on ccp4bb.

The concept of Rcomplete was introduced by Axel Brunger, in A. Bruenger, "Free R value: cross-validation in crystallography", Meth. Enzymol. (1997), 277, 366-396. Its name was, to the best of my knowledge, introduced with Jiang & Bruenger, "Protein Hydration Observed by X-ray Diffraction", J. Mol. Biol. (1994), 243, 100-115. A set of experiments that demonstrate the properties of Rcomplete, such as Rcomplete = Rfree and the removal of bias simply by refinement, is given in Luebben & Gruene, "New method to compute Rcomplete enables maximum likelihood refinement for small datasets", PNAS (2015), 112, 8999-9003.

When using SHELXL, there is a comfortable GUI to calculate Rcomplete written by Jens Luebben, available at https://github.com/JLuebben/R_complete . For Refmac and phenix.refine, it requires a little bit of scripting. PDB_REDO also calculates Rcomplete.
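[Editor's note: for the "little bit of scripting" part, the bookkeeping amounts to running one refinement per held-out subset and then pooling the held-out reflections into a single R-value. Below is a minimal sketch of the pooling step only; the function names and the idea of passing plain (Fobs, Fcalc) arrays are illustrative, not an existing phenix or Refmac API — in practice you would extract these values from each refinement's output.]

```python
import numpy as np

def r_factor(fobs, fcalc):
    """Conventional crystallographic R-value: sum|Fo - Fc| / sum Fo."""
    fobs = np.asarray(fobs, dtype=float)
    fcalc = np.asarray(fcalc, dtype=float)
    return np.abs(fobs - fcalc).sum() / fobs.sum()

def r_complete(held_out_runs):
    """Rcomplete sketch: pool the held-out (Fobs, Fcalc) pairs from
    every jackknife refinement run (each reflection held out exactly
    once across the runs) and compute one R-value over the union."""
    fobs = np.concatenate([np.asarray(fo, float) for fo, _ in held_out_runs])
    fcalc = np.concatenate([np.asarray(fc, float) for _, fc in held_out_runs])
    return r_factor(fobs, fcalc)

# toy example: two runs, three reflections held out in total
runs = [([100.0, 200.0], [90.0, 210.0]),
        ([50.0], [55.0])]
print(r_complete(runs))  # (10 + 10 + 5) / 350
```

The point of the pooling is that each individual free set can be tiny (even a single reflection), yet the final R-value is computed over all reflections, which is what makes Rcomplete usable for very small data sets.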
The concept of Rcomplete is applicable to all sorts of refinement runs. Its usefulness was recently demonstrated for charge-density studies: Krause et al., "Validation of experimental charge-density refinement strategies: when do we overfit?", IUCrJ (2017), 4, 420-430 (where it is renamed Rcross).

Best,
Tim
--
Paul Scherrer Institut
Tim Gruene - persoenlich - OSUA/204
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297
GPG Key ID = A46BEE1A
Thank you, Tim! An excellent remark!

Sacha
Dear Toon,

this doesn't answer your question, but cutting your data at 3.6 Å is probably not the best way of continuing. Try processing with autoPROC or submit the full data to the Staraniso server (staraniso.globalphasing.com) to get a better representation of the extent of reciprocal space that your experiment covers. You will end up with more reflections for refinement etc.

All best.

Andreas
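[Editor's note: the "extent of reciprocal space" point can be made concrete. With per-axis limits like the 2.3/2.6/3.6 Å suggested by Aimless, an ellipsoidal cutoff keeps many reflections that a spherical 3.6 Å cutoff discards. Below is a toy illustration for an orthogonal cell with the anisotropy axes aligned to h, k, l; the 60 Å cubic cell and the axis alignment are invented for the sketch, and real anisotropy analysis (autoPROC/Staraniso) is considerably more sophisticated.]

```python
import itertools

def keep_ellipsoidal(h, k, l, cell=(60.0, 60.0, 60.0), limits=(2.3, 2.6, 3.6)):
    """Ellipsoidal cutoff for an orthogonal cell with anisotropy axes
    aligned to h, k, l: limit 2.3 A along h, 2.6 A along k, 3.6 A
    along l, smoothly interpolated in between."""
    a, b, c = cell
    dh, dk, dl = limits
    return (h * dh / a) ** 2 + (k * dk / b) ** 2 + (l * dl / c) ** 2 <= 1.0

def keep_spherical(h, k, l, cell=(60.0, 60.0, 60.0), dmin=3.6):
    """Isotropic cutoff: keep if resolution d >= dmin, i.e. 1/d^2 <= 1/dmin^2."""
    a, b, c = cell
    return (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2 <= 1.0 / dmin ** 2

# count surviving reflections on one octant of a toy index grid
hkls = list(itertools.product(range(27), repeat=3))
n_ell = sum(keep_ellipsoidal(*r) for r in hkls)
n_sph = sum(keep_spherical(*r) for r in hkls)
print(n_ell, n_sph)  # the ellipsoid keeps roughly twice as many here
```

The exact gain depends on the cell and the true anisotropy direction, but the sketch shows why an anisotropic treatment recovers far more data than a spherical cut at the weakest direction's limit.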
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: [email protected]
Dear Toon,

if you have data up to 2.3 Å resolution, you should keep it; it is very valuable. The isotropic statistics that you are looking at, as output by Aimless, are I'm sure awful, because you simply don't have data (lack of completeness) in the high-resolution shells, and the statistics are produced out of a mixture of real data (weak) and noise.

Staraniso has done a lot of work on outputting statistics that are more meaningful in terms of Table 1 and anisotropy, so it could be worth a try to submit your data and see what comes out of it. In any case, I would use the data to the resolution limit suggested by Aimless, and judge your electron density maps. I'm sure you will see a big improvement if you include the data to the 2.3 Å limit, even though it is not going to look like an isotropic 2.3 Å data set.

If you use the data output by Staraniso, please ask the server to also output your original data, so you can deposit the original intensities in the PDB and investigators can look at them untouched in the future. There is a check box on the server to tick.

Best of luck with your data,

Vincent
--
Vincent Chaptal, PhD
Institut de Biologie et Chimie des Protéines
Drug Resistance and Membrane Proteins Laboratory
7 passage du Vercors
69007 LYON, FRANCE
+33 4 37 65 29 01
http://www.ibcp.fr
Dear Toon & Vincent,

Please note that, even if you tick the box to append the original data, the Staraniso server will still discard any reflections that it considers to be too weak to be observed. So you still need to go back to the data prior to Staraniso to keep all the original data for deposition in the PDB.

The elimination of weak data can be very useful for programs that do not account well for the effect of measurement error, but it can in fact be a serious problem for Phaser, which assumes implicitly that no systematic truncation has been carried out. Phaser does its own analysis of anisotropy, which can be misled by throwing away all the weak reflections that indicate which direction of diffraction is weakest. Phaser also has its own method to decide (after the anisotropy and, if relevant, tNCS analysis) which reflections will not contribute significantly to the likelihood calculations; in the cases I've looked at, Staraniso discards a substantial number of reflections that Phaser's criteria label as being at least marginally useful.

As time goes on, one hopes that more and more programs will be adapted to make better use of the data and its estimated errors, and it would be a great pity if data they could use were left out of the PDB.

Best wishes,

Randy Read
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research
Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, U.K.
Tel: +44 1223 336500
Fax: +44 1223 336827
E-mail: [email protected]
www-structmed.cimr.cam.ac.uk
participants (6)
- Alexandre OURJOUMTSEV
- Andreas Forster
- Randy Read
- Tim Gruene
- Toon Van Thillo
- vincent Chaptal