[phenixbb] geometry_minimization makes molprobity score worse

Fri Jul 9 14:12:26 PDT 2021

Hi,

    I thought I'd toss in an observation that might clarify some of your 
problems.  As Pavel noted earlier, the uncertainty in the "ideal" values 
in libraries based on high resolution crystal structures are about 0.02 
A for most bonds and one to two degrees for bond angles (with a higher 
number for the NCaC angle).  The origin of these uncertainties is not 
"error" but the actual variability of atomic arrangements caused by 
their environment.  The variability these "uncertainties" represent are, 
in fact, real.

    Not only are the angles and bonds in individual molecules perturbed 
by their surroundings, but the distortions of one angle will, 
systematically, be connected to distortions of other angles.  For 
example, when the NCaC angle is larger than average both the NCaCb and 
CbCaC angles will tend to be smaller.  This particular relationship is 
ignored in most libraries of standard geometry but is implicit in the 
CDLs that Andy Karplus has produced.

    What this fact indicates to me is that the standard libraries 
describe an ensemble of molecular models that all are consistent with 
the library, and long as their deviations from the mean are within the 
uncertainty.  This means that there is nothing at all special about the 
conformation that has average bond length and angles and makes me 
question to utility of creating an "idealized" model.  There really is 
no evidence that a significant number of molecules actually exist with 
that set of lengths and angles.  In fact, since we don't really 
understand all the interrelationships between the lengths and angles, 
there is a good chance that a molecule with all "perfect" angles and 
lengths is impossible!  There may be a conflict which we identify as a 
Molprobaty clash which forbids that arrangement of atoms despite them 
being within the simple-minded ensemble limits of a library w/o 
interrelationships.

    In the absence of any other information you may decide that an 
"idealized" set of coordinates is the safest statement you can make 
about your molecule's structure but this limitation of traditional 
geometry libraries will mean that you have little to no confidence that 
any molecule will actually have that structure.  It is kind of like the 
expectation value of a structure factor whose magnitude you know but you 
have no idea of the phase.  There is a center to that distribution, but 
that doesn't mean that the probability at that center is non-zero.

Dale Tronrud


On 7/8/2021 10:24 AM, James Holton wrote:
> Thank you Pavel for your prompt response!
> 
> I agree with everything you wrote below, and that is a good point about 
> 2nd derivatives.
> 
> However, what I'm seeing is the opposite of what you might predict. See 
> below.
> 
> On 7/7/2021 11:27 PM, Pavel Afonine wrote:
>> Hi James,
>>
>> thanks for email and sharing your observations!
>>
>>> Greetings all, and I hope this little observation helps improve 
>>> things somehow.
>>>
>>> I did not expect this result, but there it is. My MolProbity score 
>>> goes from 0.7 to 1.9 after a run of phenix.geometry_minimization
>>>
>>> I started with an AMBER-minimized model (based on 1aho), and that got 
>>> me my best MolProbity score so far (0.7). But, even with hydrogens 
>>> and waters removed the geometry_minimization run increases the 
>>> clashscore from 0 to 3.1 and Ramachandran favored drops from 98% to 
>>> 88% with one residue reaching the outlier level.
>>
>> It is not a secret that 'standard geometry restraints' used in Phenix 
>> and alike (read Refmac, etc) are very simplistic. They are not aware 
>> of main chain preferential conformations (Ramachandran plot), 
>> favorable side chain rotamer conformations. They don't even have any 
>> electrostatic/attraction terms -- only anti-bumping repulsion! 
>> Standard geometry restraints won't like any NCI (non-covalent 
>> interaction) and likely will make interacting atoms break apart rather 
>> than stay close together interacting.
> 
> Yes, there's the rub: I'm not seeing "interacting atoms break apart", 
> but rather they are being smashed together.  Torsion angles are also 
> being twisted out of allowed regions of the Ramachandran plot.
> 
> All this with the x-ray term turned off!
> 
>> With this in mind any high quality (high-resolution) atomic model or 
>> the one optimized using sufficiently high-level QM is going to have a 
>> more realistic geometry than the result of geometry regularization 
>> against very simplistic restraints target. An example:
>>
>> https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf
>>
>> and previous papers on the topic.
> 
> I agree, but what doesn't make sense to me is how the "simplistic 
> restraints" of phenix.geometry_minimization would be so inconsistent 
> with the "simplistic restraints" in phenix.molprobity ?
> 
> What I am doing here is starting with an energy-minimized model of a 1.0 
> A structure (1aho). It's not a fancy QM, just the ff14SB potential in 
> AMBER.  I get my best molprobity scores this way, but I need an x-ray 
> refinement program like phenix.refine to compare these models with 
> reality.  It troubles me that the "geometry" in the x-ray refinement 
> program all by itself messes up my molprobity score.
> 
>>
>>> Just for comparison, with refmac5 in "refi type ideal" mode I see the 
>>> MolProbity rise to 1.13, but Clashscore remains zero, some Ramas go 
>>> from favored to allowed, but none rise to the level of outliers.
>>
>> I believe this is because of the nature of minimizer used. Refmac uses 
>> 2nd derivative based one, which in a nutshell means it can move the 
>> model much less (just a bit in vicinity of a local minimum) than any 
>> program that uses gradients only (like Phenix).
> good point.
> 
> So, what should I do to stabilize phenix.geometry_minimization? Crank up 
> the non-bonded weight?  Restrain to starting coordinates?
> 
>>> Files and logs here:
>>> https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
>>>
>>> I suspect this might have something to do with library values for 
>>> main-chain bonds and angles?  They do seem to vary between programs. 
>>> Phenix having the shortest CA-CA distance by up to 0.08 A. After 
>>> running thorough minimization on a poly-A peptide I get:
>>> bond   amber   refmac  phenix  shelxl Stryer
>>>  C-N   1.330   1.339   1.331   1.325     1.32
>>>  N-CA  1.462   1.482   1.455   1.454     1.47
>>> CA-C   1.542   1.534   1.521   1.546     1.53
>>> CA-CA  3.862   3.874 *3.794* 3.854
>>>
>>> So, which one is "right" ?
>>
>> I'd say they are all the same, within their 'sigmas' which are from 
>> memory about 0.02A:
> I note that 3.874 - 3.794 = 0.08 > 0.02
> 
> This brings me to my pet theory.  I think what is going on is small 
> errors like this build up a considerable amount of tension in the long 
> main chain. For this 64-mer, the contour length of the main chain after 
> idealization is ~5 A shorter after phenix.geometry_minimization than it 
> is after shelxl or amber.  That 5 A has to come from somewhere.  Without 
> stretching bonds or bending angles the only thing left to do is twisting 
> torsions. A kind of "whirlygig" effect.
> 
> The question is: is the phenix CA-CA distance too short?  Or is the 
> amber CA-CA distance too long?
> 
> Shall we vote?
> 
> -James
> 
> 
> 
> _______________________________________________
> phenixbb mailing list
> phenixbb at phenix-online.org
> http://phenix-online.org/mailman/listinfo/phenixbb
> Unsubscribe: phenixbb-leave at phenix-online.org
>