[phenixbb] Optimizing the Geometry Weight for a 2.76 A structure

Fri Apr 2 09:35:10 PDT 2010

    That uncertainty is not quite the same thing.  What you describe
is the uncertainty in the free R due to the small sampling size of
the test set.  It is the spread of free R's obtained when different
test sets are chosen for the same project and circumstances, and is
useful when comparing free R's calculated using different test sets,
or, heaven forbid, trying to compare free R's from different crystals.
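
    For concreteness, the sampling estimate quoted below from Phil
Jeffrey's message, sigma(Rfree)/Rfree ~ 1/sqrt(Ntest), can be evaluated
with a few lines of Python.  This is only a sketch of that rule of
thumb; the function name rfree_sigma is made up for illustration and
is not part of phenix or any other package.

```python
import math

def rfree_sigma(rfree, n_test):
    """Rule-of-thumb standard deviation of R-free due to test-set size.

    Implements sigma(Rfree) ~ Rfree / sqrt(Ntest) as quoted in the thread.
    `rfree` is a fraction (0.24 for 24%), `n_test` the number of test
    reflections.  The name is hypothetical, chosen for this example.
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return rfree / math.sqrt(n_test)

# The case discussed in the thread: 1000 free reflections, R-free of 24%.
sigma = rfree_sigma(0.24, 1000)
print(f"sigma(Rfree) is roughly {100 * sigma:.2f}%")
```

This number describes the spread between different test sets, which is
the quantity discussed in the paragraph above.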

    In this case the question is "How much confidence do I have that
a drop in free R (estimate) from a single test set of X% indicates
that the average free R calculated over all test sets actually dropped?"

    It is possible that a free R estimate that happens by chance to
be 0.2% lower than the "true" free R will always be 0.2% lower, so
that its behavior is a reliable predictor.  It is also possible that
the free R estimate from a particular test set that was 0.2% lower
than the true value becomes 0.2% higher after refinement.  In that
case the free R estimate would have much worse reliability as a
predictor of true free R improvement than it has as a predictor of
the true free R itself.

    I don't know that anyone has done the tests to determine the
actual behavior.  My personal bias is that there probably is some
tendency for "reversion to the mean" where a free R estimate that
happens to be too good will tend to become less "too good" with
refinement and one that is too bad will tend to become less "too bad".
If this is true, differences in the free R estimate will have higher
reliability in predicting changes in the true free R than the estimate
has in predicting the value of the true free R itself.
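
    One way to make the scenarios above concrete is a toy model, and
it is only that: assume the error of the free R estimate (estimate
minus true value) has the same standard deviation sigma before and
after refinement, and that the two errors have correlation rho.  Then
the standard deviation of the error in the *change* of the estimate is
sigma*sqrt(2*(1 - rho)).  Both the model and the function name
sigma_of_change are assumptions made for illustration; nobody in this
thread has measured rho.

```python
import math

def sigma_of_change(sigma, rho):
    """Std. dev. of the error in a free R *difference* under a toy model.

    Assumes (hypothetically) that the test-set error before and after
    refinement has the same standard deviation `sigma` and correlation
    `rho`, so Var(e_after - e_before) = 2 * sigma**2 * (1 - rho).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return sigma * math.sqrt(2.0 * (1.0 - rho))

sigma = 0.0076  # roughly the value for 1000 test reflections at R-free 24%

# rho = 1: the estimate is always off by the same amount, so a drop
# in the estimate is a perfectly reliable indicator.
print(sigma_of_change(sigma, 1.0))

# rho = -1: an estimate that was too low becomes equally too high,
# so the difference is twice as uncertain as the value itself.
print(sigma_of_change(sigma, -1.0))

# rho = 0.5 is the break-even point in this model.
print(sigma_of_change(sigma, 0.5))
```

In this model the difference is more reliable than the value itself
only when rho exceeds 0.5, which is the quantity the tests mentioned
above would have to determine.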

    If you do believe that the free R estimate is "fuzzy" and we see
that the minimum of the free R estimate vs. weight is rather broad,
how do you decide the optimal weight?  Let's go to the extreme and
say you have a function that looks like a square well.  Do you
choose the weight at the center of the well, assuming that is the
best compromise?  Or do you choose the side of the well with
the lowest working R and the cleanest difference map?  Or do you
choose the side with the tightest geometry and the least overfitting?
If the improvement in free R estimate is not sufficiently precise
to determine the weight, I would like a little more guidance in
how the weight should be determined.

Dale Tronrud


Phil Jeffrey wrote:
> More pertinent to this example is the s.d. of the R-free itself, i.e.
> 
> sigma(Rfree)/Rfree ~ 1/sqrt(Ntest)
> 
> lifted from Kleywegt and Brunger, Structure, 15 August 1996, 4:897–904
> (but the original analysis from an earlier paper)
> 
> So for 1000 free reflections and an R-free of 24% the sigma is about 
> 0.75% and exceeds the range of variation that you're seeing, i.e. not a 
> significant fluctuation in R-free.
> 
> Phil Jeffrey
> Princeton
> 
> Pavel Afonine wrote:
> [snip]
> 
>> Also, the values 0.2377, 0.2388, 0.2399 ... look all the same to me. 
>> If you run 100 identical refinement runs where in each refinement the 
>> only difference is the random seed, you will get an ensemble of  
>> refined structures and the Rfree/Rwork spread can be as large as 1-2% 
>> or so (it depends on the resolution). This is because the random seed 
>> is involved in target weights calculation and therefore a small change 
>> in the random seed may slightly change the weights and this may be 
>> enough for the refinement to take another route to another "local 
>> minimum".