Re: [phenixbb] questions related to Phenix refinement
Dear Pavel,

on 18.01.15 at 07:23, [email protected] wrote:

Date: Sat, 17 Jan 2015 20:36:30 -0800
From: Pavel Afonine
To: Smith Lee, "[email protected]"
Subject: Re: [phenixbb] questions related to Phenix refinement

Hello Smith,
In the X-ray statistics by resolution bin of the phenix.refine result, there is a column "%complete". For my refinement data, I find that the better the resolution (going from lower to higher resolution), the lower the completeness (for example, for 40-6 A %complete is 98%, for 3.1-3.0 A it is 60%, and for 2.2-2.1 A it is 6%). Will you please tell me what this "%complete" means, and why it decreases in the better diffraction bins?

Completeness is the number of reflections you have compared to the number theoretically possible, so the higher the completeness the better. Ideally (and it's not that uncommon these days) you should have a 100% complete data set over the whole inf-d_min resolution range. Anything below, say, 80% in any resolution bin is bad, and the numbers you quote (6-60%) mean something is wrong with the data set.
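[Editor's note: the "%complete" statistic under discussion is just the ratio of observed to theoretically possible reflections in each resolution shell. A minimal Python sketch, not Phenix/cctbx code; the toy reflection counts are invented to mimic the bins quoted above:

```python
# Illustrative sketch: completeness per resolution shell.
# "Observed" and "possible" reflections are represented here only by their
# d-spacings (in Angstrom); real programs count unique Miller indices.

def completeness_by_shell(d_obs, d_possible, shells):
    """For each (d_max, d_min) shell, return (d_max, d_min,
    n_observed, n_possible, percent complete)."""
    result = []
    for d_max, d_min in shells:
        n_obs = sum(1 for d in d_obs if d_min <= d < d_max)
        n_pos = sum(1 for d in d_possible if d_min <= d < d_max)
        pct = 100.0 * n_obs / n_pos if n_pos else 0.0
        result.append((d_max, d_min, n_obs, n_pos, pct))
    return result

# Toy data: nearly complete at low resolution, very sparse at high resolution
d_possible = [6.0] * 50 + [3.05] * 100 + [2.15] * 200
d_obs      = [6.0] * 49 + [3.05] * 60  + [2.15] * 12

shells = [(40.0, 5.0), (3.1, 3.0), (2.2, 2.1)]
for d_max, d_min, n_obs, n_pos, pct in completeness_by_shell(d_obs, d_possible, shells):
    print(f"{d_max:5.1f}-{d_min:3.1f} A: {n_obs}/{n_pos} = {pct:.0f}% complete")
```

A real program generates the "possible" set from the unit cell, space group and resolution limit; the idea, though, is exactly this ratio.]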
Given your standing in the community, the last sentence will lead many inexperienced people to believe that they should cut their data at the resolution where the completeness falls below, say, 80%. But that would be wrong. There is no reason to consider the completeness in a high-resolution shell as "too low" as long as the data in that shell are good. Particularly in refinement, any reflection helps to improve the model and to reduce overfitting.

Of course, more complete is better! Nowadays there should be no reason _not_ to get >95% (>99% in high-symmetry space groups) completeness in the low-resolution shells, unless people stick to the "collect the minimum rotation range" paradigm and get the starting angle (or the point group!) wrong.

best,
Kay
Dear Kay,

thanks for your email and for bringing up this topic!
You wrote:

Given your standing in the community, the last sentence will lead many inexperienced people to believe that they should cut their data at the resolution where the completeness falls below, say, 80%. But that would be wrong. There is no reason to consider the completeness in a high-resolution shell as "too low" as long as the data in that shell are good. Particularly in refinement, any reflection helps to improve the model and to reduce overfitting.
Clearly, email is not the best means of communication, especially when written without a lawyer's help and then read between the lines! No, I was not suggesting cutting the data, particularly not with the cut judged by completeness alone. What I was really saying is that such a severely incomplete data set should be alarming and should prompt a review of the data collection and processing steps (rather than months spent struggling with a poor data set!).

Also, I think extremes such as routine data cutoffs by "sigma" and/or resolution (as used to be common in the past) and a panic fear of throwing away a single reflection (as the modern trend is) may both be counterproductive. Indeed, non-permanent data cutoffs by resolution (or by other criteria, such as those derived from Fobs vs Fmodel differences) may be essential for the success of refinement and of phasing by Molecular Replacement:

Watkin, D. (2008). Structure refinement: some background theory and practical strategies. J. Appl. Cryst. 41, 491-522.
Read, R. J. (1999). Detecting outliers in non-redundant diffraction data. Acta Cryst. D55, 1759-1764.
Afonine, P. V., Grosse-Kunstleve, R. W., Urzhumtsev, A. & Adams, P. D. (2009). Automatic multiple-zone rigid-body refinement with a large convergence radius. J. Appl. Cryst. 42, 607-615.
See also the STIR option in SHELX.

Also, incomplete data can distort maps. As few as 1% of missing reflections may be sufficient to destroy the image of the molecule in Fourier maps:

Urzhumtsev, A. G. (1991). Low-resolution phases: influence on SIR syntheses and retrieval with double-step filtration. Acta Cryst. A47, 794-801.
Urzhumtsev, A., Afonine, P. V., Lunin, V. Y., Terwilliger, T. C. & Adams, P. D. (2014). Metrics for comparison of crystallographic maps. Acta Cryst. D70, 2593-2606.
Lunina, N. L., Lunin, V. Y. & Podjarny, A. D. Retrieval of lost reflections in high resolution Fourier syntheses by 'soft' solvent flattening. CCP4 Newsletter 41: http://www.ccp4.ac.uk/newsletters/newsletter41/00_contents.html

Finally, it is a poor idea to assign as the data resolution the resolution of the highest-resolution reflection, unless the data set is 100% complete. Instead, the effective resolution (which has a strict mathematical definition and meaning) should be used:

Urzhumtseva, L., Klaholz, B. & Urzhumtsev, A. (2013). On effective and optical resolutions of diffraction data sets. Acta Cryst. D69, 1921-1934.

Summarizing: a severely incomplete data set should trigger suspicion. If it is the only data set available, then correct expectations should be set about the (possible difficulty of) structure solution and the quality of the final model.

All the best,
Pavel
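[Editor's note: Pavel's last point, on effective resolution, can be made concrete with a rough back-of-the-envelope approximation — mine, not the rigorous definition of the Urzhumtseva et al. paper. The number of reflections out to resolution d grows roughly as 1/d^3, so a data set with overall completeness c behaves like a complete data set to d_eff = d_min * c^(-1/3):

```python
# Rough sketch of "effective resolution": the resolution at which a 100%
# complete data set would contain as many reflections as were actually
# observed. Uses the approximation N(d) ~ const / d**3 (lattice points inside
# a reciprocal-space sphere of radius 1/d); ignores symmetry and anisotropy.

def effective_resolution(d_min, overall_completeness):
    """d_min: nominal high-resolution limit (Angstrom);
    overall_completeness: fraction of possible reflections observed (0-1]."""
    if not 0.0 < overall_completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return d_min * overall_completeness ** (-1.0 / 3.0)

# A data set nominally "to 2.1 A" but only 50% complete overall carries
# roughly the information content of a complete data set to:
print(f"{effective_resolution(2.1, 0.5):.2f} A")  # ~2.65 A
```

The cited Acta Cryst. D69 paper gives the exact treatment; this cube-root rule is only the crudest isotropic estimate.]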
Dear Pavel,
thanks for your thoughtful email! I am not going to try and comment on every specific point you make, mostly because I don't see any fundamental disagreement. People should really go and read the papers you cite; reading mailing lists is not a substitute for that.
Let me just make one remark: crystallography, although it rests on firm foundations, is complex enough that there are no simple rules for everything. It is oversimplification that has done a bad service to our science; just think of the "religious" Rsym cutoffs, in use until not long ago, which caused people to discard a lot of valuable data. This is why I am against seemingly innocent rules of thumb like "the high-resolution cutoff should be made at X I/sigma, or where the completeness falls below Y%" (and no, I'm not implying that you said this). There are better ways, but they are not as simplistic.
best,
Kay
On Sunday, January 18, 2015 18:58 CET, Pavel Afonine wrote:
[Pavel's message of 18 January 2015, quoted in full above, trimmed]
--
Kay Diederichs
http://strucbio.biologie.uni-konstanz.de
email: [email protected]
Tel +49 7531 88 4049, Fax 3183
Fachbereich Biologie, Universität Konstanz, Box 647, D-78457 Konstanz
Dear Kay,
Let me just make one remark: crystallography, although it rests on firm foundations, is complex enough that there are no simple rules for everything. [...]
I can't agree more!

Pavel
participants (3)
- Kay Diederichs
- Kay Diederichs
- Pavel Afonine