[phenixbb] Optimizing the Geometry Weight for a 2.76 A structure

Thu Apr 1 16:51:27 PDT 2010

Hi Sam,

> When the spread between R-work and R-free widens, this means that 
> you've optimized R-work at the expense of R-free, i.e. you're 
> "overfitting".  If they're too close together this can indicate that 
> you've somehow biased your refinement (by improper assignment of 
> R-free flags in the presence of NCS, accidentally switching R-free 
> sets in the middle of refinement, or using a very similar MR search 
> model refined against a different set are the usual reasons).

I would also add to the above list: "improper assignment of R-free flags 
in the presence of twinning".

>   Therefore, if a refinement raises R-work and lowers R-free, this is 
> almost always a good thing.

I think what matters is "by how much" each one rises or falls. Say you 
have 2.5 A resolution data: I would probably still prefer Rwork/Rfree ~ 
20/26% over 23/25%, since the 20/26% result is more likely to give a 
better map, at the cost of an "insignificant" Rfree fluctuation.

>  Anything that raises R-free relative to the starting value is bad.

A possible exception is when you run refinement for the first time (not 
in your life, but for a given model and data). Then Rwork ~ Rfree at the 
start, and they diverge as refinement progresses (both may go down, one 
faster than the other, or Rwork may drop while Rfree increases). Again, 
what's important here is "by how much" they drop or rise.
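
To make "by how much" concrete, here is a minimal Python sketch (not 
part of PHENIX) that tabulates the change in Rwork, the change in Rfree, 
and the final Rfree-Rwork gap for a refinement. The function name 
r_factor_changes and the starting values of 0.30/0.30 are invented for 
illustration; the two end points are the 20/26% and 23/25% cases 
mentioned above.

```python
def r_factor_changes(start, end):
    """Return (delta Rwork, delta Rfree, final gap).

    start and end are (Rwork, Rfree) pairs given as fractions,
    e.g. (0.20, 0.26) for 20/26%.
    """
    d_work = end[0] - start[0]   # negative means Rwork dropped
    d_free = end[1] - start[1]   # negative means Rfree dropped
    gap = end[1] - end[0]        # final Rfree - Rwork spread
    return d_work, d_free, gap


# Hypothetical starting point: Rwork ~ Rfree before the first refinement.
start = (0.30, 0.30)

# The two outcomes compared in the text above.
outcomes = {"20/26%": (0.20, 0.26), "23/25%": (0.23, 0.25)}

for label, end in outcomes.items():
    d_work, d_free, gap = r_factor_changes(start, end)
    print(f"{label}: dRwork={d_work:+.3f}  dRfree={d_free:+.3f}  gap={gap:.3f}")
```

The numbers alone do not decide which result is better; they only put 
the size of each change side by side so it can be weighed against the 
map quality.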

>     2. How can I judge the output from my refinement? I have looked at
>     Rwork, Rfree, and the molprobity clashscore and overall score
>     values. I included them below, at the end of this email. How do I
>     tell which the best refinement is? Which one would you suggest? I
>     thought the best was wxc = 0.1 since the R-work and Rfree aren't
>     changed much from the start values but the geometry is far better.
>
> The first one in the list, with wxc=0.01, R=0.2104, R-free=0.2378 is 
> definitely the best, because all of the statistics that matter are 
> much better than in any other refinement.
>
> PS.  Use POLYGON in the GUI to get a better idea of how good these 
> statistics are relative to other structures.

Additionally, look at the local model-to-density fit quality: the map 
correlation reported per residue (at lower resolution) or per atom 
(at higher resolution). It is available in the PHENIX GUI and from the 
command line.

Also, the values 0.2377, 0.2388, 0.2399 ... all look the same to me. If 
you run 100 otherwise identical refinements in which the only 
difference is the random seed, you will get an ensemble of refined 
structures, and the Rfree/Rwork spread can be as large as 1-2% or so (it 
depends on the resolution). This is because the random seed is involved 
in the target weight calculation, so a small change in the random seed 
may slightly change the weights, and this may be enough for the 
refinement to take another route to another "local minimum".
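
As a rough illustration of this point, the Python sketch below (not 
PHENIX code) estimates the seed-to-seed noise from a set of replicate 
Rfree values and then lists which candidate Rfree values differ by less 
than that noise. The replicate values and the function names 
seed_spread and indistinguishable_pairs are made up for the example; in 
practice the replicates would come from your own refinements run with 
different random seeds. The three candidate values are the ones quoted 
above.

```python
from itertools import combinations
from statistics import mean, stdev


def seed_spread(rfree_values):
    """Return (mean, sample standard deviation, max - min) of Rfree values."""
    return (mean(rfree_values),
            stdev(rfree_values),
            max(rfree_values) - min(rfree_values))


def indistinguishable_pairs(values, noise):
    """Return all pairs of values that differ by no more than `noise`."""
    return [(a, b) for a, b in combinations(values, 2) if abs(a - b) <= noise]


# Hypothetical Rfree values from identical runs differing only in the seed.
replicates = [0.236, 0.241, 0.238, 0.245, 0.239]
avg, sd, value_range = seed_spread(replicates)
print(f"mean={avg:.4f}  sd={sd:.4f}  range={value_range:.4f}")

# Rfree values from the comparison discussed above.
candidates = [0.2377, 0.2388, 0.2399]

# Use the observed replicate range as a crude noise estimate (an assumption).
for a, b in indistinguishable_pairs(candidates, value_range):
    print(f"{a:.4f} vs {b:.4f}: within seed-to-seed noise")
```

Using the range as the noise threshold is a deliberately crude choice; 
a multiple of the standard deviation would work just as well for this 
purpose.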

Pavel.
