Re: [phenixbb] geometry_minimization makes molprobity score worse

13 Jul 2021


      P.S.:

and if on top of using SS and Ramachandran plot restraints I bump up the 
nonbonded weight by a bit (150), then I get

---------------------------------------
   Deviations from Ideal Values.
     Bond      :  0.003   0.012    519
     Angle     :  0.776   3.318    702

   Molprobity Statistics.
     All-atom Clashscore : 0.00
     Ramachandran Plot:
       Outliers :  0.00 %
       Allowed  :  0.00 %
       Favored  : 100.00 %
     Rotamer Outliers :  0.00 %
     MolProbity score: 0.5

   Rama-Z: 1.08
---------------------------------------

Pavel

On 7/13/21 11:44, Pavel Afonine wrote:
...
James,
okay here comes a more detailed analysis...
This is the statistics for monomer.pdb:
---------------------------------------
Deviations from Ideal Values.
  Bond      :  0.014   0.037    519
  Angle     :  1.622   6.565    702
Molprobity Statistics.
  All-atom Clashscore : 0.00
  Ramachandran Plot:
    Outliers :  0.00 %
    Allowed  :  1.61 %
    Favored  : 98.39 %
  Rotamer Outliers :  1.85 %
  MolProbity score: 0.70
Rama-Z: -0.32
---------------------------------------
and here it is for monomer_minimized.pdb:
---------------------------------------
Deviations from Ideal Values.
  Bond      :  0.001   0.010    519
  Angle     :  0.361   3.089    702
Molprobity Statistics.
  All-atom Clashscore : 3.12
  Ramachandran Plot:
    Outliers :  1.61 %
    Allowed  :  9.68 %
    Favored  : 88.71 %
  Rotamer Outliers :  1.85 %
  MolProbity score: 1.89
Rama-Z: -3.20
---------------------------------------
What I see is totally unsurprising to me. The model does not have 
hydrogens and restraint only have repulsions as NCI term. 
All-together, these two facts (no H and poor NCI terms) result in 
deterioration of secondary structure (and other H-bonds), poorer 
Ramachandran plot and slightly larger clashscore. Since Molprobity 
score is based on accumulation of clashes, Rama plot stats, rotamers 
etc -- no wonder it got worse in this case!
As a side remark, I don't know if clashscore 0 is better or worse than 
3.12. Generally, zero clashscore is not the goal. Clashscore 
calculation is agnostic to genuine NCI and various non-standard 
covalent links. For example, if you have something covalently attached 
to your aa residue, clashscore calculation will regard it as a 
terrible clash, which in reality isn't.
Now, back to your example. Adding H to monomer.pdb and running 
minimization gives me:
---------------------------------------
Deviations from Ideal Values.
  Bond      :  0.001   0.008    519
  Angle     :  0.340   3.422    702
Molprobity Statistics.
  All-atom Clashscore : 0.00
  Ramachandran Plot:
    Outliers :  1.61 %
    Allowed  : 14.52 %
    Favored  : 83.87 %
  Rotamer Outliers :  0.00 %
  MolProbity score: 1.18
Rama-Z: -4.59
---------------------------------------
Expectedly, adding H reduced clashscore, eliminated rotamer outliers, 
but deteriorated secondary structure even more as evidenced by Rama 
favored and terrible Rama-Z. Turns out for the Molprobity score the 
improvements in clashscore and rotamers outweighed worsening of 
Ramachandran stats, and the Molprobity score becames better as result.
Now, if I re-ran minimization of model with H but this time using 
secondary structure restraints (which are essentially H-bond 
restraints for SS elements) and Ramachandran plot restraints, here is 
what I get:
---------------------------------------
Deviations from Ideal Values.
  Bond      :  0.001   0.009    519
  Angle     :  0.340   3.267    702
Molprobity Statistics.
  All-atom Clashscore : 1.04
  Ramachandran Plot:
    Outliers :  0.00 %
    Allowed  :  0.00 %
    Favored  : 100.00 %
  Rotamer Outliers :  0.00 %
  MolProbity score: 0.80
Rama-Z: 0.90
---------------------------------------
Now everything is 'perfect', and Molprobity score is the same as for 
your starting model. Can you achieve the same stats with other 
packages? I'm *not* saying the model with this statistics is realistic 
or any better than your initial model (remember, outliers that are 
supported by the data are expected!) but this result shows that 
underlying algorithms work as expected.
So far all makes sense to me...
Pavel
On 7/12/21 17:55, James Holton wrote:
...
Thank you Dale for the thoughtful explanation.
I understand and agree that the "ideal" geometry is just an 
approximation.  But is our approximation so bad that that it is 
impossible to construct a model that doesn't have "problems"? As in: 
do we always have to decide between strained bonds and angles vs bad 
clashes and torsional outliers?  Even in the absence of an x-ray term?
I don't believe that is so.  My evidence to support this is that I 
can get pretty close to perfect molprobity scores using amber 
(imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 0 
0 0 0), but not with phenix.geometry_minimization. I thought perhaps 
the phenix developers might find that interesting?  I certainly do.
I do get a better clash score if I provide a model with phenix.reduce 
hydrogens, but I always get Rama outliers. CDL or no CDL. And, of 
course, if I turn on Rama restraints I always get clashes, no matter 
what nonbonded_weight I use.
1aho is a small, 1.0 A structure so it is not hard to see what is 
specifically happening.  The one Rama outlier is Gly17. After amber 
minimization it is on the edge of the "favored" region, but after 
geometry_minimization it is pushed into outlier territory. The other 
thing bumping up the Molprobity score is a steric clash between 
OG:Ser40 and O:Gly43.  Yes, there is a hydrogen between them, but the 
O--O distance goes from 2.61 to 2.33 A, which is pretty tight for an 
H-bond. I realize that in common practice a single outlier is 
considered "OK", but in reality they are physically unreasonable.
I bring up my "whirlygig hypothesis" because the contour length of 
the main chain is the only aspect of macromolecular models that is 
sensitive to small "errors" in the geometry. Or, at least, errors 
that are not counter-balanced by other errors. Yes, the difference in 
the ideal (aka "goal") CA-CA distance in phenix is only 2% shorter 
than it is in the other three packages, but a 2% stretch of a 100 aa 
chain requires ~4 kcal/mol.  This is significantly more than any 
conventional dihedral and/or anti-bumping restraints, which is where 
I think the energy is going.
Or do you think something else is going on?
Thank you for taking the time to consider this,
-James
On 7/9/2021 2:12 PM, Dale Tronrud wrote:
...
Hi,
   I thought I'd toss in an observation that might clarify some of 
your problems.  As Pavel noted earlier, the uncertainty in the 
"ideal" values in libraries based on high resolution crystal 
structures are about 0.02 A for most bonds and one to two degrees 
for bond angles (with a higher number for the NCaC angle).  The 
origin of these uncertainties is not "error" but the actual 
variability of atomic arrangements caused by their environment. The 
variability these "uncertainties" represent are, in fact, real.
   Not only are the angles and bonds in individual molecules 
perturbed by their surroundings, but the distortions of one angle 
will, systematically, be connected to distortions of other angles.  
For example, when the NCaC angle is larger than average both the 
NCaCb and CbCaC angles will tend to be smaller.  This particular 
relationship is ignored in most libraries of standard geometry but 
is implicit in the CDLs that Andy Karplus has produced.
   What this fact indicates to me is that the standard libraries 
describe an ensemble of molecular models that all are consistent 
with the library, and long as their deviations from the mean are 
within the uncertainty.  This means that there is nothing at all 
special about the conformation that has average bond length and 
angles and makes me question to utility of creating an "idealized" 
model.  There really is no evidence that a significant number of 
molecules actually exist with that set of lengths and angles.  In 
fact, since we don't really understand all the interrelationships 
between the lengths and angles, there is a good chance that a 
molecule with all "perfect" angles and lengths is impossible! There 
may be a conflict which we identify as a Molprobaty clash which 
forbids that arrangement of atoms despite them being within the 
simple-minded ensemble limits of a library w/o interrelationships.
   In the absence of any other information you may decide that an 
"idealized" set of coordinates is the safest statement you can make 
about your molecule's structure but this limitation of traditional 
geometry libraries will mean that you have little to no confidence 
that any molecule will actually have that structure.  It is kind of 
like the expectation value of a structure factor whose magnitude you 
know but you have no idea of the phase.  There is a center to that 
distribution, but that doesn't mean that the probability at that 
center is non-zero.
Dale Tronrud
On 7/8/2021 10:24 AM, James Holton wrote:
...
Thank you Pavel for your prompt response!
I agree with everything you wrote below, and that is a good point 
about 2nd derivatives.
However, what I'm seeing is the opposite of what you might predict. 
See below.
On 7/7/2021 11:27 PM, Pavel Afonine wrote:
...
Hi James,
thanks for email and sharing your observations!
...
Greetings all, and I hope this little observation helps improve 
things somehow.
I did not expect this result, but there it is. My MolProbity 
score goes from 0.7 to 1.9 after a run of 
phenix.geometry_minimization
I started with an AMBER-minimized model (based on 1aho), and that 
got me my best MolProbity score so far (0.7). But, even with 
hydrogens and waters removed the geometry_minimization run 
increases the clashscore from 0 to 3.1 and Ramachandran favored 
drops from 98% to 88% with one residue reaching the outlier level.
It is not a secret that 'standard geometry restraints' used in 
Phenix and alike (read Refmac, etc) are very simplistic. They are 
not aware of main chain preferential conformations (Ramachandran 
plot), favorable side chain rotamer conformations. They don't even 
have any electrostatic/attraction terms -- only anti-bumping 
repulsion! Standard geometry restraints won't like any NCI 
(non-covalent interaction) and likely will make interacting atoms 
break apart rather than stay close together interacting.
Yes, there's the rub: I'm not seeing "interacting atoms break 
apart", but rather they are being smashed together. Torsion angles 
are also being twisted out of allowed regions of the Ramachandran 
plot.
All this with the x-ray term turned off!
...
With this in mind any high quality (high-resolution) atomic model 
or the one optimized using sufficiently high-level QM is going to 
have a more realistic geometry than the result of geometry 
regularization against very simplistic restraints target. An example:
https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
and previous papers on the topic.
I agree, but what doesn't make sense to me is how the "simplistic 
restraints" of phenix.geometry_minimization would be so 
inconsistent with the "simplistic restraints" in phenix.molprobity ?
What I am doing here is starting with an energy-minimized model of 
a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB 
potential in AMBER.  I get my best molprobity scores this way, but 
I need an x-ray refinement program like phenix.refine to compare 
these models with reality.  It troubles me that the "geometry" in 
the x-ray refinement program all by itself messes up my molprobity 
score.
...
...
Just for comparison, with refmac5 in "refi type ideal" mode I see 
the MolProbity rise to 1.13, but Clashscore remains zero, some 
Ramas go from favored to allowed, but none rise to the level of 
outliers.
I believe this is because of the nature of minimizer used. Refmac 
uses 2nd derivative based one, which in a nutshell means it can 
move the model much less (just a bit in vicinity of a local 
minimum) than any program that uses gradients only (like Phenix).
good point.
So, what should I do to stabilize phenix.geometry_minimization? 
Crank up the non-bonded weight? Restrain to starting coordinates?
...
...
Files and logs here:
https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
I suspect this might have something to do with library values for 
main-chain bonds and angles?  They do seem to vary between 
programs. Phenix having the shortest CA-CA distance by up to 0.08 
A. After running thorough minimization on a poly-A peptide I get:
bond   amber   refmac  phenix  shelxl Stryer
 C-N   1.330   1.339   1.331   1.325     1.32
 N-CA  1.462   1.482   1.455   1.454     1.47
CA-C   1.542   1.534   1.521   1.546     1.53
CA-CA  3.862   3.874 *3.794* 3.854
So, which one is "right" ?
I'd say they are all the same, within their 'sigmas' which are 
from memory about 0.02A:
I note that 3.874 - 3.794 = 0.08 > 0.02
This brings me to my pet theory.  I think what is going on is small 
errors like this build up a considerable amount of tension in the 
long main chain. For this 64-mer, the contour length of the main 
chain after idealization is ~5 A shorter after 
phenix.geometry_minimization than it is after shelxl or amber. That 
5 A has to come from somewhere.  Without stretching bonds or 
bending angles the only thing left to do is twisting torsions. A 
kind of "whirlygig" effect.
The question is: is the phenix CA-CA distance too short?  Or is the 
amber CA-CA distance too long?
Shall we vote?
-James
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: [email protected]