[phenixbb] geometry_minimization makes molprobity score worse

Tue Jul 13 12:04:07 PDT 2021

P.S.:

and if on top of using SS and Ramachandran plot restraints I bump up the 
nonbonded weight by a bit (150), then I get

---------------------------------------
   Deviations from Ideal Values.
     Bond      :  0.003   0.012    519
     Angle     :  0.776   3.318    702

   Molprobity Statistics.
     All-atom Clashscore : 0.00
     Ramachandran Plot:
       Outliers :  0.00 %
       Allowed  :  0.00 %
       Favored  : 100.00 %
     Rotamer Outliers :  0.00 %
     MolProbity score: 0.5

   Rama-Z: 1.08
---------------------------------------

Pavel

On 7/13/21 11:44, Pavel Afonine wrote:
> James,
>
> okay here comes a more detailed analysis...
>
> This is the statistics for monomer.pdb:
>
> ---------------------------------------
> Deviations from Ideal Values.
>   Bond      :  0.014   0.037    519
>   Angle     :  1.622   6.565    702
>
> Molprobity Statistics.
>   All-atom Clashscore : 0.00
>   Ramachandran Plot:
>     Outliers :  0.00 %
>     Allowed  :  1.61 %
>     Favored  : 98.39 %
>   Rotamer Outliers :  1.85 %
>   MolProbity score: 0.70
>
> Rama-Z: -0.32
> ---------------------------------------
>
> and here it is for monomer_minimized.pdb:
>
> ---------------------------------------
> Deviations from Ideal Values.
>   Bond      :  0.001   0.010    519
>   Angle     :  0.361   3.089    702
>
> Molprobity Statistics.
>   All-atom Clashscore : 3.12
>   Ramachandran Plot:
>     Outliers :  1.61 %
>     Allowed  :  9.68 %
>     Favored  : 88.71 %
>   Rotamer Outliers :  1.85 %
>   MolProbity score: 1.89
>
> Rama-Z: -3.20
> ---------------------------------------
>
> What I see is totally unsurprising to me. The model does not have 
> hydrogens and restraint only have repulsions as NCI term. 
> All-together, these two facts (no H and poor NCI terms) result in 
> deterioration of secondary structure (and other H-bonds), poorer 
> Ramachandran plot and slightly larger clashscore. Since Molprobity 
> score is based on accumulation of clashes, Rama plot stats, rotamers 
> etc -- no wonder it got worse in this case!
>
> As a side remark, I don't know if clashscore 0 is better or worse than 
> 3.12. Generally, zero clashscore is not the goal. Clashscore 
> calculation is agnostic to genuine NCI and various non-standard 
> covalent links. For example, if you have something covalently attached 
> to your aa residue, clashscore calculation will regard it as a 
> terrible clash, which in reality isn't.
>
> Now, back to your example. Adding H to monomer.pdb and running 
> minimization gives me:
>
> ---------------------------------------
> Deviations from Ideal Values.
>   Bond      :  0.001   0.008    519
>   Angle     :  0.340   3.422    702
>
> Molprobity Statistics.
>   All-atom Clashscore : 0.00
>   Ramachandran Plot:
>     Outliers :  1.61 %
>     Allowed  : 14.52 %
>     Favored  : 83.87 %
>   Rotamer Outliers :  0.00 %
>   MolProbity score: 1.18
>
> Rama-Z: -4.59
> ---------------------------------------
>
> Expectedly, adding H reduced clashscore, eliminated rotamer outliers, 
> but deteriorated secondary structure even more as evidenced by Rama 
> favored and terrible Rama-Z. Turns out for the Molprobity score the 
> improvements in clashscore and rotamers outweighed worsening of 
> Ramachandran stats, and the Molprobity score becames better as result.
>
> Now, if I re-ran minimization of model with H but this time using 
> secondary structure restraints (which are essentially H-bond 
> restraints for SS elements) and Ramachandran plot restraints, here is 
> what I get:
>
>
> ---------------------------------------
> Deviations from Ideal Values.
>   Bond      :  0.001   0.009    519
>   Angle     :  0.340   3.267    702
>
> Molprobity Statistics.
>   All-atom Clashscore : 1.04
>   Ramachandran Plot:
>     Outliers :  0.00 %
>     Allowed  :  0.00 %
>     Favored  : 100.00 %
>   Rotamer Outliers :  0.00 %
>   MolProbity score: 0.80
>
> Rama-Z: 0.90
> ---------------------------------------
>
> Now everything is 'perfect', and Molprobity score is the same as for 
> your starting model. Can you achieve the same stats with other 
> packages? I'm *not* saying the model with this statistics is realistic 
> or any better than your initial model (remember, outliers that are 
> supported by the data are expected!) but this result shows that 
> underlying algorithms work as expected.
>
> So far all makes sense to me...
>
> Pavel
>
> On 7/12/21 17:55, James Holton wrote:
>> Thank you Dale for the thoughtful explanation.
>>
>> I understand and agree that the "ideal" geometry is just an 
>> approximation.  But is our approximation so bad that that it is 
>> impossible to construct a model that doesn't have "problems"? As in: 
>> do we always have to decide between strained bonds and angles vs bad 
>> clashes and torsional outliers?  Even in the absence of an x-ray term?
>>
>> I don't believe that is so.  My evidence to support this is that I 
>> can get pretty close to perfect molprobity scores using amber 
>> (imin=1), refmac (refi type ideal), and pdb2ins/shelxl (WGHT 0.1 0 0 
>> 0 0 0), but not with phenix.geometry_minimization. I thought perhaps 
>> the phenix developers might find that interesting?  I certainly do.
>>
>> I do get a better clash score if I provide a model with phenix.reduce 
>> hydrogens, but I always get Rama outliers. CDL or no CDL. And, of 
>> course, if I turn on Rama restraints I always get clashes, no matter 
>> what nonbonded_weight I use.
>>
>> 1aho is a small, 1.0 A structure so it is not hard to see what is 
>> specifically happening.  The one Rama outlier is Gly17. After amber 
>> minimization it is on the edge of the "favored" region, but after 
>> geometry_minimization it is pushed into outlier territory. The other 
>> thing bumping up the Molprobity score is a steric clash between 
>> OG:Ser40 and O:Gly43.  Yes, there is a hydrogen between them, but the 
>> O--O distance goes from 2.61 to 2.33 A, which is pretty tight for an 
>> H-bond. I realize that in common practice a single outlier is 
>> considered "OK", but in reality they are physically unreasonable.
>>
>> I bring up my "whirlygig hypothesis" because the contour length of 
>> the main chain is the only aspect of macromolecular models that is 
>> sensitive to small "errors" in the geometry. Or, at least, errors 
>> that are not counter-balanced by other errors. Yes, the difference in 
>> the ideal (aka "goal") CA-CA distance in phenix is only 2% shorter 
>> than it is in the other three packages, but a 2% stretch of a 100 aa 
>> chain requires ~4 kcal/mol.  This is significantly more than any 
>> conventional dihedral and/or anti-bumping restraints, which is where 
>> I think the energy is going.
>>
>> Or do you think something else is going on?
>>
>> Thank you for taking the time to consider this,
>>
>> -James
>>
>>
>> On 7/9/2021 2:12 PM, Dale Tronrud wrote:
>>> Hi,
>>>
>>>    I thought I'd toss in an observation that might clarify some of 
>>> your problems.  As Pavel noted earlier, the uncertainty in the 
>>> "ideal" values in libraries based on high resolution crystal 
>>> structures are about 0.02 A for most bonds and one to two degrees 
>>> for bond angles (with a higher number for the NCaC angle).  The 
>>> origin of these uncertainties is not "error" but the actual 
>>> variability of atomic arrangements caused by their environment. The 
>>> variability these "uncertainties" represent are, in fact, real.
>>>
>>>    Not only are the angles and bonds in individual molecules 
>>> perturbed by their surroundings, but the distortions of one angle 
>>> will, systematically, be connected to distortions of other angles.  
>>> For example, when the NCaC angle is larger than average both the 
>>> NCaCb and CbCaC angles will tend to be smaller.  This particular 
>>> relationship is ignored in most libraries of standard geometry but 
>>> is implicit in the CDLs that Andy Karplus has produced.
>>>
>>>    What this fact indicates to me is that the standard libraries 
>>> describe an ensemble of molecular models that all are consistent 
>>> with the library, and long as their deviations from the mean are 
>>> within the uncertainty.  This means that there is nothing at all 
>>> special about the conformation that has average bond length and 
>>> angles and makes me question to utility of creating an "idealized" 
>>> model.  There really is no evidence that a significant number of 
>>> molecules actually exist with that set of lengths and angles.  In 
>>> fact, since we don't really understand all the interrelationships 
>>> between the lengths and angles, there is a good chance that a 
>>> molecule with all "perfect" angles and lengths is impossible! There 
>>> may be a conflict which we identify as a Molprobaty clash which 
>>> forbids that arrangement of atoms despite them being within the 
>>> simple-minded ensemble limits of a library w/o interrelationships.
>>>
>>>    In the absence of any other information you may decide that an 
>>> "idealized" set of coordinates is the safest statement you can make 
>>> about your molecule's structure but this limitation of traditional 
>>> geometry libraries will mean that you have little to no confidence 
>>> that any molecule will actually have that structure.  It is kind of 
>>> like the expectation value of a structure factor whose magnitude you 
>>> know but you have no idea of the phase.  There is a center to that 
>>> distribution, but that doesn't mean that the probability at that 
>>> center is non-zero.
>>>
>>> Dale Tronrud
>>>
>>>
>>>
>>> On 7/8/2021 10:24 AM, James Holton wrote:
>>>> Thank you Pavel for your prompt response!
>>>>
>>>> I agree with everything you wrote below, and that is a good point 
>>>> about 2nd derivatives.
>>>>
>>>> However, what I'm seeing is the opposite of what you might predict. 
>>>> See below.
>>>>
>>>> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>>>>> Hi James,
>>>>>
>>>>> thanks for email and sharing your observations!
>>>>>
>>>>>> Greetings all, and I hope this little observation helps improve 
>>>>>> things somehow.
>>>>>>
>>>>>> I did not expect this result, but there it is. My MolProbity 
>>>>>> score goes from 0.7 to 1.9 after a run of 
>>>>>> phenix.geometry_minimization
>>>>>>
>>>>>> I started with an AMBER-minimized model (based on 1aho), and that 
>>>>>> got me my best MolProbity score so far (0.7). But, even with 
>>>>>> hydrogens and waters removed the geometry_minimization run 
>>>>>> increases the clashscore from 0 to 3.1 and Ramachandran favored 
>>>>>> drops from 98% to 88% with one residue reaching the outlier level.
>>>>>
>>>>> It is not a secret that 'standard geometry restraints' used in 
>>>>> Phenix and alike (read Refmac, etc) are very simplistic. They are 
>>>>> not aware of main chain preferential conformations (Ramachandran 
>>>>> plot), favorable side chain rotamer conformations. They don't even 
>>>>> have any electrostatic/attraction terms -- only anti-bumping 
>>>>> repulsion! Standard geometry restraints won't like any NCI 
>>>>> (non-covalent interaction) and likely will make interacting atoms 
>>>>> break apart rather than stay close together interacting.
>>>>
>>>> Yes, there's the rub: I'm not seeing "interacting atoms break 
>>>> apart", but rather they are being smashed together. Torsion angles 
>>>> are also being twisted out of allowed regions of the Ramachandran 
>>>> plot.
>>>>
>>>> All this with the x-ray term turned off!
>>>>
>>>>> With this in mind any high quality (high-resolution) atomic model 
>>>>> or the one optimized using sufficiently high-level QM is going to 
>>>>> have a more realistic geometry than the result of geometry 
>>>>> regularization against very simplistic restraints target. An example:
>>>>>
>>>>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>>>>
>>>>> and previous papers on the topic.
>>>>
>>>> I agree, but what doesn't make sense to me is how the "simplistic 
>>>> restraints" of phenix.geometry_minimization would be so 
>>>> inconsistent with the "simplistic restraints" in phenix.molprobity ?
>>>>
>>>> What I am doing here is starting with an energy-minimized model of 
>>>> a 1.0 A structure (1aho). It's not a fancy QM, just the ff14SB 
>>>> potential in AMBER.  I get my best molprobity scores this way, but 
>>>> I need an x-ray refinement program like phenix.refine to compare 
>>>> these models with reality.  It troubles me that the "geometry" in 
>>>> the x-ray refinement program all by itself messes up my molprobity 
>>>> score.
>>>>
>>>>>
>>>>>> Just for comparison, with refmac5 in "refi type ideal" mode I see 
>>>>>> the MolProbity rise to 1.13, but Clashscore remains zero, some 
>>>>>> Ramas go from favored to allowed, but none rise to the level of 
>>>>>> outliers.
>>>>>
>>>>> I believe this is because of the nature of minimizer used. Refmac 
>>>>> uses 2nd derivative based one, which in a nutshell means it can 
>>>>> move the model much less (just a bit in vicinity of a local 
>>>>> minimum) than any program that uses gradients only (like Phenix).
>>>> good point.
>>>>
>>>> So, what should I do to stabilize phenix.geometry_minimization? 
>>>> Crank up the non-bonded weight? Restrain to starting coordinates?
>>>>
>>>>>> Files and logs here:
>>>>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>>>>
>>>>>> I suspect this might have something to do with library values for 
>>>>>> main-chain bonds and angles?  They do seem to vary between 
>>>>>> programs. Phenix having the shortest CA-CA distance by up to 0.08 
>>>>>> A. After running thorough minimization on a poly-A peptide I get:
>>>>>> bond   amber   refmac  phenix  shelxl Stryer
>>>>>>  C-N   1.330   1.339   1.331   1.325     1.32
>>>>>>  N-CA  1.462   1.482   1.455   1.454     1.47
>>>>>> CA-C   1.542   1.534   1.521   1.546     1.53
>>>>>> CA-CA  3.862   3.874 *3.794* 3.854
>>>>>>
>>>>>> So, which one is "right" ?
>>>>>
>>>>> I'd say they are all the same, within their 'sigmas' which are 
>>>>> from memory about 0.02A:
>>>> I note that 3.874 - 3.794 = 0.08 > 0.02
>>>>
>>>> This brings me to my pet theory.  I think what is going on is small 
>>>> errors like this build up a considerable amount of tension in the 
>>>> long main chain. For this 64-mer, the contour length of the main 
>>>> chain after idealization is ~5 A shorter after 
>>>> phenix.geometry_minimization than it is after shelxl or amber. That 
>>>> 5 A has to come from somewhere.  Without stretching bonds or 
>>>> bending angles the only thing left to do is twisting torsions. A 
>>>> kind of "whirlygig" effect.
>>>>
>>>> The question is: is the phenix CA-CA distance too short?  Or is the 
>>>> amber CA-CA distance too long?
>>>>
>>>> Shall we vote?
>>>>
>>>> -James
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> phenixbb mailing list
>>>> phenixbb at phenix-online.org
>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>> Unsubscribe: phenixbb-leave at phenix-online.org
>>>>
>>