Re: [phenixbb] questions related to Phenix refinement

18 Jan 2015

      Dear Kay,

thanks for your email and for bringing up this topic!
...
...
...
...
In the X-ray statistics by resolution bin of the phenix.refine result,
there is a column "%complete". For my refinement data, I find that the
better the resolution (going from lower resolution to higher
resolution), the lower the completeness (for example, for 40-6 A
%complete is 98%, for 3.1-3.0 A %complete is 60%, for 2.2-2.1 A
%complete is 6%).
Will you please tell me what this "%complete" means, and why it
decreases in the better-diffracting bins?

Completeness is the number of reflections you have relative to the number
theoretically possible. So the higher the completeness, the better. Ideally
(and it's not that uncommon these days) you should have a 100% complete data
set in the d_min-inf resolution range. Anything below, say, 80% in any
resolution bin is bad, and the numbers you quote (6-60%) mean something is
wrong with the dataset.

Given your standing in the community, the last sentence will lead many
inexperienced people to believe that they should cut their data at the
resolution where the completeness falls below "say 80"%.
But that would be wrong. There is no reason to consider the completeness
"too low" in a high-resolution shell as long as the data in that
shell are good. Particularly in refinement, any reflection helps to
improve the model and to reduce overfitting.

Clearly, email is not the best means of communication, especially when it is 
written without a lawyer's help and then read between the lines!

No, I was not suggesting cutting the data, particularly if the cutoff is 
judged by completeness alone. What I was really saying is that if 
the data set is this incomplete, that should raise an alarm and prompt a 
review of the data collection and processing steps (rather than spending 
months struggling with a poor data set!).
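
To make the "%complete" column concrete, here is a minimal Python sketch of how per-bin completeness could be tabulated and low bins flagged for review. The function names and the reflection counts are hypothetical, chosen only so that the percentages match the three bins quoted above; this is not how phenix.refine builds its table. The 80% threshold is the "say 80" figure from this thread and is used here as a trigger for review, not as a data cutoff.

```python
def completeness_table(bins):
    """Return (d_max, d_min, percent_complete) for each resolution bin.

    bins: list of (d_max, d_min, n_observed, n_possible), where n_possible
    is the number of unique reflections theoretically possible in the bin.
    """
    rows = []
    for d_max, d_min, n_observed, n_possible in bins:
        percent = 100.0 * n_observed / n_possible
        rows.append((d_max, d_min, percent))
    return rows


def bins_to_review(rows, threshold=80.0):
    """Return the (d_max, d_min) bins whose completeness is below threshold.

    A flagged bin is a prompt to revisit data collection and processing;
    it does not by itself mean the bin should be discarded.
    """
    return [(d_max, d_min) for d_max, d_min, percent in rows
            if percent < threshold]


# Hypothetical counts reproducing the percentages quoted in this thread.
example_bins = [
    (40.0, 6.0, 980, 1000),
    (3.1, 3.0, 600, 1000),
    (2.2, 2.1, 60, 1000),
]
example_rows = completeness_table(example_bins)
flagged = bins_to_review(example_rows)
```

With these made-up counts the 3.1-3.0 A and 2.2-2.1 A bins are flagged, while the 40-6 A bin is not.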

Also, I think both extremes may be counterproductive: routine data cutoffs 
by "sigma" and/or resolution (as was common in the past), and a panicked 
fear of throwing away any reflection (as is the modern trend). Indeed, 
non-permanent data cutoffs by resolution (or by other criteria, such as 
those derived from Fobs vs Fmodel differences) may be essential for the 
success of refinement and of phasing by molecular replacement:

           J. Appl. Cryst. (2008). 41, 491-522
           Structure refinement: some background theory and practical strategies
           D. Watkin

           Acta Cryst. (1999). D55, 1759-1764
           Detecting outliers in non-redundant diffraction data
           R. J. Read

           J. Appl. Cryst. (2009). 42, 607-615
           Automatic multiple-zone rigid-body refinement with a large convergence radius
           P. V. Afonine, R. W. Grosse-Kunstleve, A. Urzhumtsev and P. D. Adams

           STIR option in SHELX.
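
The idea of a non-permanent cutoff can be sketched in a few lines of Python. This is only an illustration under simple assumptions: reflections are hypothetical (d-spacing, Fobs) tuples, and the function names are made up for this example. It is not the algorithm of any of the references above, nor of SHELX; it only shows that the cutoff is applied to a temporary working set while the full data set stays untouched.

```python
def select_by_resolution(reflections, d_min, d_max=float("inf")):
    """Return a new list of reflections with d_min <= d <= d_max.

    reflections: list of (d_spacing_in_angstrom, f_obs) tuples.
    The input list is not modified, so the cutoff is non-permanent.
    """
    return [r for r in reflections if d_min <= r[0] <= d_max]


def resolution_zones(reflections, cutoffs):
    """Yield (d_min, working_set) from the lowest to the highest resolution.

    Each working set uses all data down to the given d_min, so successive
    zones add higher-resolution reflections to the earlier ones.
    """
    for d_min in sorted(cutoffs, reverse=True):
        yield d_min, select_by_resolution(reflections, d_min)


# Hypothetical data: (d-spacing in Angstrom, Fobs).
data = [(12.0, 510.0), (6.5, 320.0), (3.4, 95.0), (2.2, 12.0)]

# Number of reflections used at each stage, coarsest zone first.
zone_sizes = [(d_min, len(working))
              for d_min, working in resolution_zones(data, [2.0, 8.0, 3.0])]
```

In a real protocol each working set would be passed to a refinement or search step, with the final step using all of the data.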

Also, incomplete data can distort maps. Missing as little as 1% of the 
reflections may be sufficient to destroy the image of the molecule in 
Fourier maps:

           Acta Cryst. (1991). A47, 794-801
           Low-resolution phases: influence on SIR syntheses and retrieval with double-step filtration
           A. G. Urzhumtsev

           Acta Cryst. (2014). D70, 2593-2606
           Metrics for comparison of crystallographic maps
           A. Urzhumtsev, P. V. Afonine, V. Y. Lunin, T. C. Terwilliger and P. D. Adams

           Retrieval of lost reflections in high resolution Fourier syntheses by 'soft' solvent flattening.
           Natalia L. Lunina, Vladimir Y. Lunin and Alberto D. Podjarny
           http://www.ccp4.ac.uk/newsletters/newsletter41/00_contents.html

Finally, it is a poor idea to quote the resolution of the 
highest-resolution reflection as the resolution of the data set unless the 
data set is 100% complete. Instead, the effective resolution (which has a 
strict mathematical definition and meaning) should be used:

           Acta Cryst. (2013). D69, 1921-1934
           On effective and optical resolutions of diffraction data sets
           L. Urzhumtseva, B. Klaholz and A. Urzhumtsev
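
To give a feeling for the numbers, here is a rough Python sketch based on a simple counting argument, which is an assumption of this example and not the strict definition from the paper above. The number of reflections inside a sphere of radius 1/d in reciprocal space scales as 1/d^3, so a data set extending to d_high with overall completeness C contains about as many reflections as a complete data set extending to d_high * C^(-1/3). The function name is hypothetical.

```python
def effective_resolution_estimate(d_high, completeness):
    """Rule-of-thumb estimate: d_high * C**(-1/3).

    d_high: d-spacing of the highest-resolution reflection, in Angstrom.
    completeness: overall completeness as a fraction, 0 < C <= 1.
    This counting-based estimate ignores which reflections are missing,
    so it is only a first approximation.
    """
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return d_high * completeness ** (-1.0 / 3.0)


# A complete data set keeps its nominal resolution.
complete_case = effective_resolution_estimate(2.0, 1.0)

# With only 1/8 of the reflections present, the estimate doubles.
sparse_case = effective_resolution_estimate(2.0, 0.125)
```

So a data set with reflections to 2.0 A but an overall completeness of 12.5% carries, by this estimate, roughly the information content of a complete 4 A data set.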

In summary, a severely incomplete data set should trigger suspicion. If 
that's the only data set available, then realistic expectations should be 
set about the (possible difficulty of) structure solution and the quality 
of the final model.

All the best,
Pavel

Pavel Afonine