[phenixbb] Geometry Restraints - Anisotropic truncation

Thu May 3 11:22:53 PDT 2012

On 05/03/12 10:35, Kendall Nettles wrote:
> Dale, 
> I completely agree that Rfree does not always correlate with the best models. Or maybe I should say that, to see a correlation with other measures of model quality, I think you need differences in Rfree on the order of a couple of percent at least.
> 
> My question was whether Rfree changes differently when you throw out reflections with different signal to noise ratios, and whether this difference might be a useful guide to selecting resolution. Are you saying that Rfree is completely irrelevant to selecting resolution, or that some combination of Rfree and other model quality measures should be considered together? 
> 
   There are philosophical issues here as well as practical ones.  On the practical
side, you have to choose a resolution limit for your data long before you have
a model good enough to start reliable free R calculations or "model quality" measures.
To my mind these sorts of things can only be applied very late in the game and you
have already pretty much defined your model.  Changing the resolution limit then
is only useful to make the stats "look better".  If you can "improve" your R's without
changing the model you have not accomplished anything, since what people look for in
the R values is an assessment of the quality of the model.
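
   A toy calculation can make this concrete.  The sketch below is purely illustrative:
the reflections are simulated, and the B factor, noise level, and 15% model error are
assumed values, not taken from any real project.  The Fcalc list (the "model") never
changes, yet R drops as the resolution cutoff is tightened, even though every discarded
reflection was generated with real signal in it.

```python
import math
import random


def r_factor(pairs):
    """Conventional R = sum|Fobs - Fcalc| / sum|Fobs| over (Fobs, Fcalc) pairs."""
    num = sum(abs(fo - fc) for fo, fc in pairs)
    den = sum(abs(fo) for fo, _ in pairs)
    return num / den


def simulate(n=5000, d_min=1.1, d_max=20.0, b_factor=20.0, sigma=2.0,
             model_error=0.15, seed=0):
    """Return (d, Fobs, Fcalc) triples for a made-up data set (all values assumed)."""
    rng = random.Random(seed)
    reflections = []
    for _ in range(n):
        s = rng.uniform(1.0 / d_max, 1.0 / d_min)              # s = 1/d
        f_true = 100.0 * math.exp(-b_factor * s * s / 4.0)     # falls off with resolution
        f_obs = abs(f_true + rng.gauss(0.0, sigma))            # measurement noise
        f_calc = f_true * (1.0 + rng.gauss(0.0, model_error))  # fixed, imperfect model
        reflections.append((1.0 / s, f_obs, f_calc))
    return reflections


def r_at_cutoff(reflections, d_cut):
    """R computed only from reflections with d >= d_cut; the model is untouched."""
    return r_factor([(fo, fc) for d, fo, fc in reflections if d >= d_cut])


if __name__ == "__main__":
    data = simulate()
    for d_cut in (1.1, 1.25, 1.5, 2.0):
        print(f"d >= {d_cut:4.2f} A   R = {r_at_cutoff(data, d_cut):.3f}")
```

   The falling R in this toy says nothing about the model, which is identical in every
row; it only reflects which reflections were counted.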

   I believe (back to philosophy) that you have to define your data set and then build
a model consistent with it.  Any large scale modification of the data set after the
fact is problematic.

> I'm also confused about this:
> 
>> An important point is that the Fc's must never be used to judge the quality of the Fo's in a production environment.  
> 
> Does this mean that you shouldn't use maps to judge the resolution limits? I also don't understand how you can use the model as the judge but not the Fc. If there are some objective qualities of the model that can be used to determine resolution cut-offs, then it seems to me that it still might be possible to incorporate an automated procedure during the refinement. Do you envision that the optimal resolution limit might change over the course of refinement, as the phases improve? 
> 
   Models, Fc's, maps, they are all the same to me.  Each flows from the other.  You
can't criticize your data based on a map any more than the model.

   I didn't say, however, that the map has to be calculated using all the data you
used in refinement.  The map is only a tool for presenting information to the human
eye (in Coot rebuilding at least).  We make all sorts of decisions when calculating
maps, from sampling rates and contour levels to different sets of Fourier coefficients
each with special properties.  Certainly resolution limit can be one of these choices.
But these choices are not permanent and they do not affect the subsequent refinement
or assessment of the final model.

   An example is the sharpening of low resolution maps.  This technique can make the
map more interpretable to the model builder, but the original structure factors used
in refinement should never be "sharpened".  Refinement will do just fine with a blurry
data set and produce a model with suitably high B factors.  To sharpen the Fobs is
deceptive, perhaps inadvertently, but deceptive nonetheless.
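
   As a minimal sketch of that separation, assume the common isotropic form of
sharpening, in which each map coefficient is scaled by exp(-B_sharp / (4 d^2)) with a
negative B_sharp.  The function name and the numbers below are hypothetical; the point
is only that the scaling is applied to a copy used for display, while the amplitudes
handed to refinement stay as measured.

```python
import math


def sharpen_for_display(amplitudes, d_spacings, b_sharp):
    """Return a scaled copy of map amplitudes; the inputs are left untouched.

    Each amplitude is multiplied by exp(-b_sharp / (4 d^2)), so a negative
    b_sharp boosts high-resolution terms (sharpening) and a positive value
    blurs them.
    """
    return [f * math.exp(-b_sharp / (4.0 * d * d))
            for f, d in zip(amplitudes, d_spacings)]


if __name__ == "__main__":
    d_spacings = [8.0, 4.0, 3.0]        # resolution of each term, in Angstroms
    f_obs = [120.0, 35.0, 9.0]          # made-up amplitudes, used for refinement as-is
    f_map = sharpen_for_display(f_obs, d_spacings, b_sharp=-50.0)
    for d, fo, fm in zip(d_spacings, f_obs, f_map):
        print(f"d = {d:3.1f} A   Fobs = {fo:6.1f}   map coefficient = {fm:6.1f}")
```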

   No, I don't think the resolution limit might change over the course of refinement.
The data set contained signal before we started refinement and the amount of signal
is exactly the same afterwards, presuming you left it alone.

   In the distant past one would start refinement using only low resolution data and
increase the resolution as the size of the required shifts got smaller.  It has been
a long time since the refinement programs we use had a radius of convergence that
poor.  The maximum likelihood procedures used today go a long way toward properly
weighting the (Fobs-Fcalc)'s to allow convergence without such crude manipulation
of the data.

Dale

> Kendall
> 
> On May 3, 2012, at 1:05 PM, Dale Tronrud wrote:
> 
>>
>>   The fact that the R value stats get better when you toss out data is
>> NOT an indication that those data contain no signal.  It simply indicates
>> that that subset has a lower signal/noise than the remaining data.  If
>> you decide to throw away all data with less than average signal to noise
>> you will get better and better R values until you have no data left at
>> all!
>>
>>   Tests along the line of what Tom has recommended are in the right
>> direction, but they have already been done.  I have unpublished work
>> where I took a project with a 1.25A data set, as judged by I/sigI > 2
>> and near 100% completeness, and tested the addition of higher resolution
>> data out to 1.1A with very poor stats on both counts.  I found that
>> the Rfree calculated only to 1.25A improved by adding the noisy data,
>> and the esd's (I was using shelxl) dropped indicating that the model
>> was more precise.  I performed the appropriate control to show that
>> you couldn't just add any numbers out there, you had to use the measured
>> numbers to get the improvements.
>>
>>   At the CCP4 meeting in January Kay Diederichs reported on work he has
>> done with P. A. Karplus which was much more rigorous.  They show that
>> a lot of data beyond the usual cut-off limits is useful for improving
>> the final model by several measures and they have developed a tool for
>> determining, on an objective level, at what resolution there is no longer
>> signal.  That resolution limit was found to be much higher than what we
>> are used to and our final R values will be higher as a consequence.  But the
>> models that result are better when assessed by properly controlled tests.
>> This work will be in print shortly.
>>
>>   An important point is that the Fc's must never be used to judge the
>> quality of the Fo's in a production environment.  At the very least you
>> have to recognize that you don't have reliable Fc's at the start of
>> refinement and yet you need to decide what data to use.  If all you are
>> doing is changing your resolution limit after refinement to "clean up
>> your stats" you are wasting your time.  That sort of thing has nothing
>> to do with building better models.  The Diederichs and Karplus test
>> looks directly at the F^2s in the unmerged data to see what signal is
>> there.
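
   The message does not spell out the statistic, so the following is only a sketch of
one way a model-free test on unmerged intensities can work: split the repeated
measurements of each reflection into two random halves, merge each half, and correlate
the two sets of merged intensities (similar in spirit to the half-dataset correlation
usually written CC1/2).  It is not the Diederichs and Karplus implementation; the
simulated shells, noise level, and function names are all assumptions.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def half_dataset_cc(unmerged, rng):
    """Correlate the merged intensities of two random halves of the observations.

    `unmerged` is a list with one entry per unique reflection, each entry being
    the list of repeated intensity measurements of that reflection.
    """
    half_a, half_b = [], []
    for observations in unmerged:
        if len(observations) < 2:
            continue                      # cannot split a single measurement
        shuffled = list(observations)
        rng.shuffle(shuffled)
        middle = len(shuffled) // 2
        half_a.append(mean(shuffled[:middle]))
        half_b.append(mean(shuffled[middle:]))
    return pearson(half_a, half_b)


def simulate_shell(mean_intensity, sigma, rng, n_unique=2000, multiplicity=4):
    """Made-up shell: exponentially distributed true intensities plus Gaussian noise."""
    shell = []
    for _ in range(n_unique):
        true_i = rng.expovariate(1.0 / mean_intensity) if mean_intensity > 0 else 0.0
        shell.append([true_i + rng.gauss(0.0, sigma) for _ in range(multiplicity)])
    return shell


if __name__ == "__main__":
    rng = random.Random(0)
    weak = simulate_shell(mean_intensity=5.0, sigma=10.0, rng=rng)
    empty = simulate_shell(mean_intensity=0.0, sigma=10.0, rng=rng)
    print(f"weak-signal shell: CC = {half_dataset_cc(weak, rng):.2f}")
    print(f"noise-only shell:  CC = {half_dataset_cc(empty, rng):.2f}")
```

   In this toy the weak shell has a merged I/sigI of only about 1, yet its half-dataset
correlation stays clearly above that of the noise-only shell, and no Fc is involved
anywhere in the calculation.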
>>
>>   None of this says anything about the merits of spherical versus
>> elliptical cutoff surfaces.  These tests only discuss the radius of
>> whatever surface you choose.  It seems to me if the signal/noise ratio
>> drops off faster in some directions than others that the point where
>> there is no signal will differ too.  Whatever those elliptical cutoff
>> limits are, they should be much more generous than current practice
>> and not determined by looking at R values.
>>
>> Dale Tronrud
>>
>> On 05/03/12 08:24, Terwilliger, Thomas C wrote:
>>>
>>> Hi Kendall,
>>> Yes, I think you could use this kind of approach to make overall
>>> decisions of any kind, including those you suggest. I would not use
>>> Rsleep for anything at all, other than calculating a final number.   I
>>> would use a fixed Rfree set (which could be a subset of the total free
>>> set or the whole set) for all such decision making. If a lot of such
>>> decisions are made with Rfree...then yes it would be good to have an
>>> Rsleep to make sure that everything is ok.
>>> All the best,
>>> Tom T
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Kendall Nettles [knettles at scripps.edu]
>>> *Sent:* Thursday, May 03, 2012 9:05 AM
>>> *To:* Terwilliger, Thomas C; PHENIX user mailing list
>>> *Subject:* Fwd: [phenixbb] Geometry Restraints - Anisotropic truncation
>>>
>>> Hi Tom, 
>>> Do you think something like this could be used during refinement to
>>> identify the "best" resolution limits? If you have an Rsleep set would
>>> Rfree be sufficient for this? I imagine collecting data with a ring of
>>> noise and then letting the optimal resolution be determined during
>>> refinement. My understanding of this is that the modern refinement
>>> algorithms can handle some noise in the reflections, but maybe this
>>> could be a way to optimize how much signal is needed to contribute in a
>>> positive fashion? 
>>> Kendall
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> phenixbb mailing list
>>> phenixbb at phenix-online.org
>>> http://phenix-online.org/mailman/listinfo/phenixbb