[phenixbb] refinement question

Pavel Afonine PAfonine at lbl.gov
Fri Jun 26 12:30:48 PDT 2009


Hi Maia,

I would add a couple of comments too:

1)
> I am wondering why the ncs refinement gives me a better Rfree (21.0%) 

Because by using NCS you added some "observations", and therefore 
improved the data-to-parameters ratio, which in turn reduced the degree 
of overfitting.
The fact that the R-factor dropped (rather than increased) probably 
suggests that you selected the NCS groups correctly.

2)
> the ncs refinement gives me a better Rfree (21.0%) 
> than without ncs (21.7%). 

Here is another stream of thought...
The target for restrained coordinate (and similarly for B-factor) 
refinement looks like this (in phenix.refine):

T_total = wxc_scale * wxc * T_xray + wc * T_geometry

where the relative target weight wxc is determined as  wxc ~ ratio of 
the norms of the gradients of the two terms; see:

Brünger, A.T., Karplus, M. & Petsko, G.A. (1989). Acta Cryst. A45, 
50-61. "Crystallographic refinement by simulated annealing: application 
to crambin"
Brünger, A.T. (1992). Nature (London), 355, 472-474. "The free R value: 
a novel statistical quantity for assessing the accuracy of crystal 
structures"
Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. 
Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances 
crystallographic simulated annealing refinement"
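
To make the weighting concrete, here is a minimal Python sketch of the 
target above. It is only an illustration, not the phenix.refine code: 
the function names are made up, and the form of the ratio 
(|grad T_geometry| / |grad T_xray|) is my reading of "ratio of the 
gradient norms", so treat it as an assumption.

```python
import math


def norm(vec):
    """Euclidean norm of a flat list of gradient components."""
    return math.sqrt(sum(g * g for g in vec))


def relative_weight(grad_xray, grad_geometry):
    """wxc ~ |grad T_geometry| / |grad T_xray| (assumed form of the ratio)."""
    gx = norm(grad_xray)
    if gx == 0.0:
        raise ValueError("X-ray gradient is zero; the weight is undefined")
    return norm(grad_geometry) / gx


def total_target(t_xray, t_geometry, wxc, wxc_scale, wc):
    """T_total = wxc_scale * wxc * T_xray + wc * T_geometry."""
    return wxc_scale * wxc * t_xray + wc * t_geometry
```

Anything that changes the geometry gradient (for instance an extra 
restraint term) changes wxc in this picture, and with it the balance 
between the two terms of T_total.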

Before wxc scale is computed, the structure is subjected to a short 
molecular dynamics run; this is where the random component comes into 
play.

Now, having said this, we know that if you run, for example, 100 
identical phenix.refine runs, where the only difference between each run 
is the random seed, you will get 100 slightly different refinement 
results. The spread in R-factors depends on resolution and, if I 
remember correctly, for a structure at ~2A resolution I was getting 
delta_R from about 0.1 to 2%. It can be higher at lower resolution and 
smaller at higher resolution.

The NCS term goes into T_geometry, which in turn means that it changes 
(somehow) the weight. This may explain the difference in R-factors, and 
I would consider a difference of less than 1% insignificant, unless it 
is a highly systematic and consistent observation, or unless the 
comparison has been made independent of the weight.
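
One rough way to check whether a difference stands out from the 
seed-to-seed noise is sketched below in Python. The function names and 
the criterion (difference of the means compared with the larger 
min-to-max range) are my own assumptions for illustration; this is a 
crude check, not a proper statistical test.

```python
import statistics


def seed_spread(r_values):
    """Return (mean, range) of R-factors from runs differing only in seed."""
    return statistics.mean(r_values), max(r_values) - min(r_values)


def exceeds_seed_noise(r_with_ncs, r_without_ncs):
    """True if the difference between the mean R-factors of the two sets
    of runs is larger than the larger seed-to-seed range of either set."""
    mean_a, range_a = seed_spread(r_with_ncs)
    mean_b, range_b = seed_spread(r_without_ncs)
    return abs(mean_a - mean_b) > max(range_a, range_b)
```

Each argument is a list of Rfree values (in %) collected from several 
runs that are identical except for the random seed.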

To make it less arbitrary I would suggest running two refinement jobs 
using "optimize_wxc=true optimize_wxu=true", one with NCS and the other 
without. I'm sure I suggested this a month or two ago.
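
A small Python sketch for setting up the two jobs so that they differ 
only in the NCS setting. The file names are placeholders, and the 
"main.random_seed" and "main.ncs" parameter names are assumptions on my 
part; please check them against the parameters of your installed 
version. Only the two optimize_* options are taken from the text above.

```python
def build_refine_command(model, data, seed, ncs_args=()):
    """Build the argument list for one phenix.refine job."""
    cmd = [
        "phenix.refine",
        model,
        data,
        "optimize_wxc=true",          # suggested above
        "optimize_wxu=true",          # suggested above
        f"main.random_seed={seed}",   # assumed parameter name
    ]
    cmd.extend(ncs_args)              # NCS-specific options, if any
    return cmd


# Placeholder file names; the NCS switch shown is an assumption.
with_ncs = build_refine_command(
    "model.pdb", "data.mtz", seed=1, ncs_args=["main.ncs=true"]
)
without_ncs = build_refine_command("model.pdb", "data.mtz", seed=1)
```

Each list can be passed to subprocess.run() to launch the job. Keeping 
the same seed in both commands means the NCS setting is the only 
difference between the two runs.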

Pavel.




