[phenixbb] Optimizing the Geometry Weight for a 2.76 A structure

Fri Apr 2 14:04:58 PDT 2010

Dear Phil,

    I don't have access to vol 277 of Met Enz here, which is sad since
I wrote a chapter in the thing, so I can't comment on Axel's exact
words.

    I agree that the uncertainty in how well a free R estimate based on
a subset of reflections predicts the mean free R calculated from all
possible test sets of that size is proportional to 1/sqrt(N).
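For reference, the commonly quoted counting-statistics estimate, which I am assuming is the form under discussion, is sigma(R_free) ~ R_free/sqrt(N). Here is a minimal Python sketch of that rule of thumb. The function name is hypothetical, and the R_free of 0.25 is only an example value:

```python
import math


def rfree_sigma(r_free: float, n_test: int) -> float:
    """Approximate s.d. of a free R computed from n_test test reflections.

    Assumes the counting-statistics form sigma(R_free) ~ R_free / sqrt(N).
    This is a rule of thumb, not an exact result.
    """
    if n_test <= 0:
        raise ValueError("n_test must be a positive number of reflections")
    return r_free / math.sqrt(n_test)


# Illustrative R_free of 0.25: each 4x increase in N only halves sigma.
for n in (250, 1000, 4000):
    print(f"N = {n:5d}  sigma(R_free) ~ {rfree_sigma(0.25, n):.4f}")
```

The point of the loop is the scaling. A test set four times larger buys only a factor of two in precision.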

    My point is that the pdf of the difference of two random variables
is not the same as the pdf of either of them.  You have to know the
coupling (correlation) between them to decide whether their difference
has greater or less precision than either variable alone.  For example,
suppose two measurements each have a precision of 1% but differ from
each other by only 0.5%.  That small difference can still be significant
if the causes of the uncertainties are systematic and shared by both
measurements, because those common "errors" cancel out in the difference.
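Here is the same point in standard notation. Take two measurements A and B with standard deviations sigma_A and sigma_B and correlation coefficient rho. The spread of their difference is:

```latex
\sigma_{A-B} = \sqrt{\sigma_A^2 + \sigma_B^2 - 2\rho\,\sigma_A\sigma_B}
```

Now set sigma_A = sigma_B = 1%.

With independent errors (rho = 0), sigma_{A-B} = sqrt(2)% ~ 1.41%. A 0.5% difference is then only about 0.35 sigma.

With strongly shared errors (rho = 0.99, an illustrative value), sigma_{A-B} = sqrt(0.02)% ~ 0.14%. The same 0.5% difference is then about 3.5 sigma.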

    I believe the fluctuations in the free R estimate due to the small
size of the test set come from the particular indices in the test set
and are therefore mainly systematic.  The intensities of some reflections
may have been measured better than others, and your test set may happen
to be enriched in those, resulting in a slightly lower free R estimate.
Since the test set stays the same throughout refinement, the benefit of
this enrichment (or detriment, if you were unlucky) will be subtracted
out when you compare the two free R estimates during your
optimization.  I expect the difference between two free R estimates
(using the same test set) will be a more precise indicator of the true
change in free R than either estimate is in predicting the free R itself.
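Below is a toy simulation of this argument. It assumes a made-up error model in which each reflection carries two errors:

- a large error shared by both models, standing in for measurement quality and anything neither model fits;
- a smaller model-specific error.

The sketch compares the spread of the free R difference in two cases: both models scored on the same test set, and the two models scored on different test sets. The error magnitudes and function names are hypothetical, chosen only for illustration.

```python
import random
import statistics


def r_factor(f_obs, diff, idx):
    """R = sum|Fo - Fc| / sum Fo over the reflection indices in idx."""
    return sum(abs(diff[i]) for i in idx) / sum(f_obs[i] for i in idx)


def compare_test_sets(n_refl=20000, n_test=1000, n_trials=200, seed=1):
    """Spread of R_free(B) - R_free(A) for same vs. different test sets.

    Hypothetical error model: Fo - Fc = shared + specific, where `shared`
    is a fixed per-reflection error common to both models and `specific`
    is smaller for the slightly better model B.
    """
    rng = random.Random(seed)
    f_obs = [rng.uniform(50.0, 150.0) for _ in range(n_refl)]
    shared = [rng.gauss(0.0, 20.0) for _ in range(n_refl)]
    diff_a = [s + rng.gauss(0.0, 5.0) for s in shared]
    diff_b = [s + rng.gauss(0.0, 3.0) for s in shared]

    # "True" change in R, computed over every reflection.
    all_idx = range(n_refl)
    true_delta = r_factor(f_obs, diff_b, all_idx) - r_factor(f_obs, diff_a, all_idx)

    same, different = [], []
    for _ in range(n_trials):
        t1 = rng.sample(range(n_refl), n_test)
        t2 = rng.sample(range(n_refl), n_test)
        # Both models scored on the same test set: shared errors cancel.
        same.append(r_factor(f_obs, diff_b, t1) - r_factor(f_obs, diff_a, t1))
        # Models scored on independent test sets: little cancels.
        different.append(r_factor(f_obs, diff_b, t2) - r_factor(f_obs, diff_a, t1))
    return true_delta, statistics.stdev(same), statistics.stdev(different)


true_delta, sd_same, sd_diff = compare_test_sets()
print(f"true change in R     : {true_delta:+.4f}")
print(f"s.d., same test set  : {sd_same:.4f}")
print(f"s.d., different sets : {sd_diff:.4f}")
```

In this toy model the same-test-set difference should scatter noticeably less than the different-test-set one, because the shared per-reflection term cancels. How much it cancels for real data depends on how strongly the two models' errors are correlated.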

Dale

Phil Jeffrey wrote:
> Still, I am totally unclear why the form of the equation, which is 
> just counting/Poisson statistics, doesn't also apply to the uncertainty 
> of calculating the average percentage deviation over a test set of size 
> N.  For example a test set of N=1000 could be regarded as 100 different 
> test sets of size N=10, and I think it's likely that the distribution of 
> R-free for these 100 mini test sets would be Poisson in form centered 
> around the R-free for the superset N=1000.
> 
> So, since the change in |Fo-Fc| for each reflection doesn't simply scale 
> with wxc in structure refinement, nor does it inevitably decrease for 
> every reflection when the structure improves, the change in R-free for 
> any given change in structure should be related to N in the form of the 
> equation given, and so that equation does reflect the s.d. of R-free as 
> an estimate of structure improvement.
> 
> No?
> 
> If not, then I misinterpreted what Brunger meant on p.394 of his 1997 Met 
> Enz paper, because that's certainly what it read like.
> 
> Phil
> 
> det102 at uoxray.uoregon.edu wrote:
>>
>>    That uncertainty is not quite the same thing.  What you describe
>> is the uncertainty in the free R due to the small sampling size of
>> the test set.  It is the spread of free R's obtained when different
>> test sets are chosen for the same project and circumstances, and is
>> useful when comparing free R's calculated using different test sets,
>> or, heaven forbid, trying to compare free R's from different crystals.
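Phil's thought experiment above treats one test set of 1000 reflections as 100 mini test sets of 10, and it can be tried numerically. This sketch assumes an invented distribution of Fo and |Fo-Fc|, and the function name is hypothetical. It checks only two things: that the mini-set R-frees center on the superset value, and how widely they spread. It does not test the exact shape of their distribution.

```python
import random
import statistics


def mini_set_rfree(n_test=1000, n_mini=100, seed=7):
    """R-free of a whole test set and of n_mini equal, disjoint chunks of it.

    Hypothetical data: Fo uniform on [50, 150], |Fo - Fc| half-normal.
    Reflections are i.i.d., so consecutive chunks are a random partition.
    """
    rng = random.Random(seed)
    f_obs = [rng.uniform(50.0, 150.0) for _ in range(n_test)]
    resid = [abs(rng.gauss(0.0, 20.0)) for _ in range(n_test)]
    r_super = sum(resid) / sum(f_obs)

    size = n_test // n_mini
    minis = []
    for k in range(n_mini):
        chunk = range(k * size, (k + 1) * size)
        minis.append(sum(resid[i] for i in chunk) / sum(f_obs[i] for i in chunk))
    return r_super, minis


r_super, minis = mini_set_rfree()
sd_mini = statistics.stdev(minis)
print(f"superset R-free          : {r_super:.4f}")
print(f"mean of mini-set R-frees : {statistics.mean(minis):.4f}")
print(f"s.d. of mini-set R-frees : {sd_mini:.4f}")
print(f"implied s.d. of superset : {sd_mini / len(minis) ** 0.5:.4f}")
```

Dividing the s.d. of the mini-set values by sqrt(100) gives a rough estimate of the s.d. of the 1000-reflection R-free itself. This is the 1/sqrt(N) scaling again.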