Hi all, I'm using Phenix to refine a small protein, about 300 residues long. The resolution of my data is about 2.9 Å. I currently have a refined model with good R-factors (about 23/26), very reasonable RMSDs, and no Ramachandran outliers. However, I found that the clashscore is extremely high: 63!! And there are also a lot of rotamer outliers: 8.5%. What kind of strategy can I use to reduce these? Thank you very much! Best, Celina
2011/11/17 CelinaSocek
Which version are you using? We made some changes over the summer that improve the geometry, especially the clashscore - these are present in version 1.7.2. If you're still using an earlier version, upgrading might largely fix the problem. However, if there is conformational strain in the model, it is difficult to fix severe clashes without further rebuilding. You should inspect the clashes in Coot (the Phenix GUI has shortcuts for this, but Coot alone also has this capability) and fix as many as possible yourself. It is recommended that you use explicit hydrogen atoms when doing this, since the reason for a clash is often not obvious if only heavy atoms are present. (You don't necessarily need to refine with hydrogens, though - they're just an aid to visualizing clashes.) The rotamer problem may be related; there isn't much you can do with these except fix them manually, but relieving some of the clashes may help. The dihedral angle restraints simply aren't that strong, so in areas of weak density it can be difficult to completely prevent outliers. -Nat
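For reference, the clashscore under discussion is MolProbity's count of serious steric overlaps (non-bonded atom pairs whose van der Waals shells interpenetrate by at least 0.4 Å), normalized per 1000 atoms. A minimal sketch of the arithmetic (the function name is just for illustration):

    def clashscore(n_serious_overlaps, n_atoms):
        # MolProbity clashscore: serious steric overlaps (>= 0.4 A of van der
        # Waals interpenetration) per 1000 atoms in the model.
        return 1000.0 * n_serious_overlaps / n_atoms

    # e.g. 150 serious overlaps among 2400 atoms gives a clashscore of 62.5
    print(clashscore(150, 2400))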
2011/11/17 CelinaSocek
There are most likely two reasons for the poor clashscore:
1) the weighting between the chemical and X-ray potentials is putting too much emphasis on the data (see http://dx.doi.org/10.1107/S0907444911039060)
2) I'm not sure what kind of vdW function phenix uses by default (repulsive only?), but it's very difficult, if not impossible, to get accurate interatomic separation distances without summing a Lennard-Jones style vdW potential and (at least) fixed atomic charge electrostatics.
Regards, Tim
Hi,
1) the weighting between the chemical and X-ray potentials is putting too much emphasis on the data (see http://dx.doi.org/10.1107/S0907444911039060)
Most of the time the default weight used in Phenix is good enough. When it's not, you can always use "optimize_xyz_weight=true" to get the optimal value. This is described in detail here: http://www.phenix-online.org/newsletter/ - see the article "Improved target weight optimization in phenix.refine". Adding hydrogens may help further.
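Conceptually, the weight optimization is a grid search: run a short refinement at several trial data/restraints weights and keep the one that gives the lowest R-free without degrading the geometry. A hedged sketch of that idea (the callback and the 0.02 Å cutoff are hypothetical illustrations, not phenix.refine internals):

    def optimize_weight(trial_weights, refine_and_score, max_rmsd_bonds=0.02):
        # Pick the data/restraints weight with the lowest R-free among trials
        # whose bond-length RMSD stays acceptable.  `refine_and_score` is a
        # hypothetical callback returning (r_free, rmsd_bonds) for one weight.
        best_weight, best_r_free = None, float("inf")
        for weight in trial_weights:
            r_free, rmsd_bonds = refine_and_score(weight)
            if rmsd_bonds <= max_rmsd_bonds and r_free < best_r_free:
                best_weight, best_r_free = weight, r_free
        return best_weight

    # usage: optimize_weight([0.5, 1, 2, 5, 10], my_short_refinement_run)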
2) I'm not sure what kind of vdW function phenix uses by default (repulsive only?), but it's very difficult, if not impossible, to get accurate interatomic separation distances without summing a Lennard-Jones style vdW potential and (at least) fixed atomic charge electrostatics.
phenix.refine uses a repulsion term only. Although one can imagine reasons why attractive terms might be helpful, in practice they can be counterproductive when the model geometry is not great, since attractive terms may lock wrong conformations in place and prevent them from moving towards the correct positions dictated by the electron density. Pavel
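To make the distinction concrete, here is a hedged sketch contrasting a full 12-6 Lennard-Jones potential (repulsive wall plus attractive well) with a generic truncated repulsion-only term; the functional forms and parameters are purely illustrative, not the actual phenix/cctbx nonbonded function:

    import numpy as np

    def lennard_jones(r, epsilon=0.2, r_min=3.8):
        # Full 12-6 Lennard-Jones: steep repulsion inside r_min plus an
        # attractive well of depth epsilon at r = r_min.
        x = r_min / r
        return epsilon * (x**12 - 2.0 * x**6)

    def repulsion_only(r, r_min=3.8, k=4.0):
        # Generic truncated repulsion: zero beyond the contact distance and
        # rising steeply inside it, so overlapping atoms are pushed apart
        # but never pulled together.
        return np.where(r < r_min, (r_min - r) ** k, 0.0)

    r = np.linspace(3.0, 5.0, 5)
    print(lennard_jones(r))   # attractive (negative) beyond the minimum
    print(repulsion_only(r))  # non-negative everywhere, zero past r_min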
On Fri, Nov 18, 2011 at 1:15 PM, Pavel Afonine
1) the weighting between the chemical and X-ray potentials is putting too much emphasis on the data (see http://dx.doi.org/10.1107/S0907444911039060)
most of the time the default weight used in Phenix is good enough. Once it's not the case, then you can always use "optimize_xyz_weight=true" to get the optimal value. This is described in details here:
http://www.phenix-online.org/newsletter/
see article "Improved target weight optimization in phenix.refine".
The method of using the ratio of gradients doesn't make sense in a maximum-likelihood context, and I've never understood the rationale for it - even this paper notes that it breaks down in the high- and low-resolution regimes (as would be expected, since the X-ray gradient will be very large or very small, respectively).
2) I'm not sure what kind of vdW function phenix uses by default (repulsive only?), but it's very difficult, if not impossible, to get accurate interatomic separation distances without summing a Lennard-Jones style vdW potential and (at least) fixed atomic charge electrostatics.
phenix.refine uses repulsion term only. Although one can imagine reasons why attraction terms may be helpful, in reality they may be counterproductive if the model geometry quality is not great since attractive terms may lock wrong conformations and not let them move towards correct positions dictated by the electron density.
Refinement using a force field without electrostatics versus with electrostatics was recently investigated (http://dx.doi.org/10.1021/ct100506d), and the results favor its inclusion across a range of models/resolutions. However, a purely repulsive potential should be able to yield good results! Perhaps the original poster can try a different weighting scheme? Regards, Tim
On Sat, Nov 19, 2011 at 1:20 AM, Tim Fenn
However, a purely repulsive potential should be able to yield good results! Perhaps the original poster can try a different weighting scheme?
We still haven't heard details from the original poster about the version used, etc., so it's somewhat premature to draw conclusions about the causes and/or remedies. Previous versions of Phenix used a weight on the repulsive terms that was at least 6x too weak, with the result that clashscores (and Ramachandran scores) were often substandard at low resolution. This has been fixed, and with the current version it is very rare to see a clashscore above 60 at the end of refinement *unless* the starting model is severely strained, and models with good packing are rarely made worse. With proper rebuilding and weighting it is entirely possible to refine a sub-4 Å model with Phenix to a clashscore below 20 (the MolProbity-recommended maximum) without any additional restraints or tricks. However, in all of our tests there is a limit to how much improvement one can expect from minimization alone - and even really aggressive sampling like what Rosetta does has its limits. The geometry restraints aren't the limitation here (Rosetta uses all atoms and attractive forces too); the optimization techniques are. -Nat
Hi Tim,
The method of using the ratio of gradients doesn't make sense in a maximum likelihood context,
Assuming that by "a maximum likelihood context" you mean refinement using a maximum-likelihood (ML) criterion as the X-ray term (or, more generally, I would call it the experimental data term, since it can be neutron data too, for instance), I find the whole statement above a little strange, since it mixes two different and unrelated things: the type of crystallographic data term, and the method of determining the relative scale (weight) between it and the other term (the restraints).
I don't see how the choice of crystallographic data term (LS, ML, real-space or any other) is related to the method of determining this scale. The only difference between LS and ML targets is that the latter accounts for model incompleteness and errors in a statistical manner. The differences between LS and ML are completely irrelevant to the choice of weight between the crystallographic and restraints terms. In fact, the ML target can even be approximated with LS (J. Appl. Cryst. (2003). 36, 158-159) without any noticeable loss. The ML target itself can be formulated in a few different ways, and that alone can result in optimal weight values that differ by an order of magnitude while showing no difference in refinement results (since it is a matter of the relative scale between two functions, which can be totally arbitrary).
The ratio of gradient norms gives a good estimate of the optimal weight. In fact, if you look at the math, for a two-atom system it should be multiplied by cos(angle_between_gradient_vectors), which for a many-atom structure averages out to approximately 0.5 (this is what is used in CNS by default), if I remember all this correctly. If the data and restraints terms are normalized (it doesn't matter how), then the weight value becomes predictable. For example, the optimal weight between ML and stereochemistry restraints in phenix.refine ranges between 1 and 10, most of the time being ~5, and the ratio of gradient norms predicts this very well. Furthermore, you can always normalize any crystallographic data term such that the optimal weight is around 1.
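A numerical sketch of the gradient-norm scheme described above (hedged: the target convention T = w * E_xray + E_restraints and all numbers are illustrative, not phenix.refine's actual code):

    import numpy as np

    def gradient_ratio_weight(grad_restraints, grad_xray, cos_factor=0.5):
        # Estimate the weight on the data term in T = w * E_xray + E_restraints
        # as the ratio of gradient norms, times the averaged cosine factor
        # (~0.5) mentioned above.  Both gradients are flat arrays of length 3N.
        return cos_factor * (np.linalg.norm(grad_restraints) /
                             np.linalg.norm(grad_xray))

    # e.g. if the restraints gradient is ten times "steeper" than the data
    # gradient, the data term gets a weight of about 5:
    g_geom = np.random.normal(scale=10.0, size=3000)
    g_xray = np.random.normal(scale=1.0, size=3000)
    print(gradient_ratio_weight(g_geom, g_xray))  # ~5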
phenix.refine uses repulsion term only. Although one can imagine reasons why attraction terms may be helpful, in reality they may be counterproductive if the model geometry quality is not great since attractive terms may lock wrong conformations and not let them move towards correct positions dictated by the electron density.
Refinement using a force field without electrostatics versus with electrostatics was recently investigated (http://dx.doi.org/10.1021/ct100506d), and found to favor its inclusion across a range of models/resolutions.
I had a look at this and more recent papers. I apologize in advance if I missed it, but I couldn't find an example showing how the proposed methodology performs for poor models - I mean real working models (incomplete, with errors, like the one you get right out of an MR solution). The tests shown in Acta Cryst. (2011). D67, 957-965 are all performed using models from the PDB, which are supposedly good already. Sure, these models may have small "cosmetic" problems, but as Joosten et al. demonstrated, there is always room for improvement in PDB-deposited models. This is partly because the methodology and tools keep improving, so re-refinement of PDB-deposited models using newer tools is very likely to yield better models, as you confirm once again in your paper. What would be really interesting to see is how your new methodology performs in real-life routine cases, where a structure is far from the good final one. All the best! Pavel
On Sat, Nov 19, 2011 at 11:20 PM, Pavel Afonine
The method of using the ratio of gradients doesn't make sense in a maximum likelihood context,
assuming that by "a maximum likelihood context" you mean refinement using a maximum-likelihood (ML) criterion as X-ray term (or, more generally, I would call it experimental data term, as it can be neutron too, for instance), I find the whole statement above as a little bit strange since it mixes different and absolutely not related things: type of crystallographic data term and a method of relative scale (weight) determination between it and the other term (restraints).
I don't see how the choice of crystallographic data term (LS, ML, real-space or any other) is related to the method of this scale determination.
This shouldn't be a surprise - in short, the errors are used as weights in both LS and ML optimization targets; the latter just uses a different form for the errors, one that estimates all the model and unmeasured uncertainties (like phase error). So if the data are poorly predicted by a model, the ML target is broader/flatter (as are the gradients!), while good/complete models yield a sharper ML target. The likelihood target is naturally weighted, in a sense. This doesn't happen with least squares (unless the weights are not the inverse variances, which seems to be what the MLMF paper you mentioned is doing?). The likelihood function can then be plugged into Bayes' law - if the model and data error terms are all accounted for, no other weighting should be necessary. This is discussed in Airlie McCoy's excellent review (http://dx.doi.org/10.1107/S0907444904016038) - see sections 4.4 and 4.6 - and the derivation is also in http://dx.doi.org/10.1107/S0907444911039060. Hope this helps! Regards, Tim
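As a rough illustration of that point, in a schematic Gaussian-limit form (illustrative only, not the exact target that phenix.refine or any other program implements), the per-reflection likelihood residual looks like a least-squares term whose variance is inflated by model error:

    -\log L \;\approx\; \sum_{h}
      \frac{\bigl(|F_{o,h}| - D_h\,|F_{c,h}|\bigr)^{2}}
           {\sigma_{o,h}^{2} + \varepsilon_h\,\sigma_{\Delta,h}^{2}}
      \;+\; \mathrm{const}

As the model gets worse, D_h shrinks and sigma_Delta,h grows, so the denominator inflates and the target and its gradients flatten; ordinary least squares with fixed weights has no such built-in damping.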
The likelihood function can then be plugged in to Bayes' law - if the model and data error terms are all accounted for, no other weighting should be necessary.
If I arbitrarily multiply the ML function (or any other - it doesn't matter which) by 100, the weight will have to account for this. A weight based on the ratio of gradient norms (or any other scheme of a similar kind) will do so automatically. Given the number and variety of targets (both data and restraints) we need to deal with (because each combination of model and data quality requires an adequate parameterization), this flexibility is essential. Postulating a fixed weight introduces rigidity, and that doesn't help to sample the space when doing optimization. Pavel
Dear All, I am refining a crystal structure with phenix.refine. I have enabled the option to automatically correct N/Q/H errors. During refinement, for some residues it writes "both conformations clash, check manually". Will you please tell me what "both conformations clash, please check manually" means, and how I should deal with it? I am looking forward to getting your reply. Cheers, Dialing
Hi Dialing,
For cases where phenix.refine indicates that both conformations of an N/Q/H residue clash, it is likely that some part of the model in the vicinity of the N/Q/H residue is built incorrectly and is trapped in a false minimum. Refinement may not be powerful enough to correct this on its own, so your best course of action is to look at it in a model-building program such as Coot and correct the problem by hand. It is best to examine your molecule with hydrogens present so you can see all of the clashes. The MolProbity web server may also be helpful to you, or if you're using the Phenix GUI, you can launch KiNG from the validation window, which will give you the same graphical analysis that you would find in MolProbity.
If you have further questions please let me know.
Thanks,
Jeff
For a very detailed explanation, here is the original publication for the program Reduce: http://www.ncbi.nlm.nih.gov/pubmed/9917408
-Nat
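To make the "flip" concrete: an N/Q/H correction is a 180-degree rotation of the Asn/Gln terminal amide or the His imidazole, which swaps nearly isosteric atom pairs; Reduce scores both orientations by hydrogen-bonding and steric criteria, and the warning means neither orientation is clash-free. A hedged sketch of just the atom bookkeeping (illustrative only, not Reduce's actual code):

    # Atom pairs that trade places in an N/Q/H flip.
    FLIP_SWAPS = {
        "ASN": [("OD1", "ND2")],
        "GLN": [("OE1", "NE2")],
        "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
    }

    def flipped_atom_name(resname, atom_name):
        # Return the name an atom maps onto when the side chain is flipped;
        # atoms not involved in the swap keep their names.
        for a, b in FLIP_SWAPS.get(resname, []):
            if atom_name == a:
                return b
            if atom_name == b:
                return a
        return atom_name

    print(flipped_atom_name("ASN", "OD1"))  # ND2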
Dear All, By rigid-body refinement (Phenix) with a search model (Model A) I got an output PDB (Model B) and an output MTZ file of a diffraction dataset. In Coot I modified Model B and saved it as Model C. I want to continue to optimize Model C with phenix.refine. In the phenix.refine GUI, of course I need to input the MTZ file, but for the PDB file, should I input only Model C, or all of Models A, B and C? If I input all of them, how does phenix.refine distinguish the role of each PDB model? I am looking forward to getting your reply. Cheers, Dialing
On Mon, Nov 21, 2011 at 2:21 PM, Dialing Pretty
By rigid-body refinement (Phenix) with a search model (Model A) I got an output PDB (Model B) and an output MTZ file of a diffraction dataset. In Coot I modified Model B and saved it as Model C. I want to continue to optimize Model C with phenix.refine.
In the phenix.refine GUI, of course I need to input the MTZ file, but for the PDB file, should I input only Model C, or all of Models A, B and C? If I input all of them, how does phenix.refine distinguish the role of each PDB model?
You only need to input the latest model. The exception is when you are working with low-resolution data (worse than about 2.8 Angstrom) and want to use a higher-resolution structure as a reference model, but this is completely optional (and sometimes unnecessary). -Nat
Dear All,
The default value of the starting temperature for simulated annealing in phenix.refine is 5000. At this temperature all of the protein structure will be destroyed.
Will you please explain why we start the simulated annealing at 5000 K? Will you please explain the basic theory behind simulated annealing in phenix.refine, or point me to an introduction to it?
I am looking forward to getting your reply.
Cheers,
Dialing
On Sun, Nov 20, 2011 at 8:19 PM, Dialing Pretty
The default value of the starting temperature of Phenix refine simulated annealing is 5000. At this temperature all the protein structure will be destroyed.
Will you please explain why we start the simulated annealing at 5000? Will you please explain to me the basic theory of the Phenix refine simulated annealing or tell me a link on the introduction of Phenix refine simulated annealing?
The method was developed by Axel Brunger in the late 1980s - these are good references: http://atbweb.stanford.edu/scripts/papers.php?sendfile=49 http://atbweb.stanford.edu/scripts/papers.php?sendfile=33 The annealing uses very few timesteps of dynamics (relative to a genuine MD simulation), so the protein doesn't have time to explode - but hopefully it will manage to escape local minima. The point isn't to be physically realistic anyway, just to optimize the fit to the X-ray data. -Nat
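A hedged sketch of the idea (the numbers are illustrative only, not necessarily phenix.refine's exact protocol): annealing runs a short burst of dynamics at each temperature of a slow-cooling ladder, so the total simulation time is tiny compared with a real MD run.

    import numpy as np

    def slow_cool(t_start=5000.0, t_final=300.0, cool_rate=100.0,
                  steps_per_temp=25, timestep_fs=0.5):
        # Slow-cooling ladder: a few dozen MD steps at each temperature,
        # stepping down from t_start towards t_final.
        temps = np.arange(t_start, t_final, -cool_rate)
        total_ps = len(temps) * steps_per_temp * timestep_fs / 1000.0
        return temps, total_ps

    temps, total_ps = slow_cool()
    # ~47 temperatures and well under 1 ps of total dynamics: far too short
    # for the protein to unfold, but hot enough to hop out of local minima.
    print(len(temps), round(total_ps, 2))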
Hi Dialing,
The default value of the starting temperature of Phenix refine simulated annealing is 5000. At this temperature all the protein structure will be destroyed.
It depends on the parametrization. In phenix.refine you can use 10000-15000 K and the model will not explode. Simply try it - that's the best way to find out.
Will you please explain why we start the simulated annealing at 5000?
As with the ~500-600 other parameters in phenix.refine, the defaults are set to values that are "good on average, most of the time". If you want to do aggressive SA refinement (e.g. Korostelev et al., PNAS 2009), then 5000 K is too low and 10000 K may be much better. If you want to compute a multi-start SA-averaged map for a 1 Å resolution model, then 5000 K is too high, and 500-1000 K is a better starting point. Pavel
participants (6)
- CelinaSocek
- Dialing Pretty
- Jeff Headd
- Nathaniel Echols
- Pavel Afonine
- Tim Fenn