adding hydrogens -> increasing clash score
Hi! I have a probably very naive question: when one includes hydrogens in the refinement of a model built with data at ~3.6A, surely the clash score should not go up? I am including hydrogens in order to improve the model quality where I cannot fit the side chains into density, etc. Could it be that the default 3 macrocycles are not enough to minimize a changed (reduced) model? Thanks, Tanya
Hi Tanya, please try the latest nightly build dev-833: http://www.phenix-online.org/download/nightly_builds.cgi and let me know if the problem persists. Pavel.
On Fri, 2011-07-29 at 15:18 -0400, Tatyana Sysoeva wrote:
when one includes hydrogens into a refinement of a model build with data at ~3.6A
Not sure what you mean by "including into refinement", but if it means that hydrogen positions are actually refined... well, you shouldn't do that at 3.6A. -- "Hurry up before we all come back to our senses!" Julian, King of Lemurs
I meant that initially I did not have H in the refinement, sorry for the confusion.
By the way, should I still include "not (element H)" in the NCS parameters, or is that done automatically? I used this line, but I am not sure if it is needed.
I will try the new build and report the results.
Thank you,
Tanya
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
Hi Ed,
On Fri, 2011-07-29 at 15:18 -0400, Tatyana Sysoeva wrote:
when one includes hydrogens into a refinement of a model build with data at ~3.6A
Not sure what you mean by "including into refinement", but if it means that hydrogen positions are actually refined... well, you shouldn't do that at 3.6A.
Good remark! It is quite tricky to do that in phenix.refine by accident, though, so I guess most people mean the "riding model", which is what phenix.refine does by default if H atoms are present in the PDB file. Pavel.
I ran the same commands with the build dev-833. Below are the results. These parameters are obtained by running phenix.ramalyze, cbetdev, and clashscore. I am almost done repeating the fix_rotamers run. The first time it gave far worse validation results than running phenix without fix_rotamers=true. I don't understand what I am doing wrong in the runs and would appreciate your help!
Thanks, Tanya

I used phenix-dev-833 to test the riding hydrogens refinement at 3.6A.
I ran two identical commands:
Command line arguments: "model.pdb" "data.mtz" "main.ncs=true" "ncs.find_automatically=false" "ncs_groups.params" "refinement.input.xray_data.high_resolution=3.6" "mgadpbef.cif" "refinement.ncs.excessive_distance_limit=None" "strategy=individual_sites+individual_sites_real_space+group_adp+occupancies"
with the only difference being in the input files: ncs.params and model.pdb.
Model pdb contained the model identical to the control run but with added H atoms. I have not added H to the ADP molecules since I did not know how to write a correct CIF file for it. NCS definitions were changed by the addition of “and not (element H)” to each line. It was done to exclude H from the NCS groups.

*no hydrogens*
# Date 2011-07-29 Time 19:09:58 EDT -0400 (1311980998.46 s)
wall clock time: 2646.75 s
Start R-work = 0.2891, R-free = 0.3267
Final R-work = 0.2868, R-free = 0.3304
clashscore 31.77
cbeta 0
rama 12.62% outliers

*with hydrogens*
# Date 2011-07-29 Time 20:14:20 EDT -0400 (1311984860.16 s)
wall clock time: 6517.60 s
Start R-work = 0.2910, R-free = 0.3273
Final R-work = 0.2720, R-free = 0.3353
clashscore 96.12
cbeta 151
rama 19.67% outliers

Interestingly, in the previous release version the same refinement runs produced:
without H clashscore = 77.580195/cbeta=8
with H clashscore = 119.729960/cbeta=330
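One quick way to read the two runs above is the gap between R-free and R-work, a commonly used indicator of overfitting. The following is a minimal Python sketch that uses only the final values quoted in this message; the dictionary layout and the helper name `r_gap` are illustrative assumptions and not part of PHENIX.

```python
# Final statistics copied from the two dev-833 runs reported above.
# The dictionary layout and helper name are illustrative, not a PHENIX API.
runs = {
    "no hydrogens": {"r_work": 0.2868, "r_free": 0.3304, "clashscore": 31.77},
    "with hydrogens": {"r_work": 0.2720, "r_free": 0.3353, "clashscore": 96.12},
}


def r_gap(run):
    """Return R-free minus R-work, rounded to four decimals."""
    return round(run["r_free"] - run["r_work"], 4)


for name, run in runs.items():
    print(f"{name}: R-free - R-work = {r_gap(run):.4f}, "
          f"clashscore = {run['clashscore']}")
```

With these numbers the gap widens from 0.0436 without hydrogens to 0.0633 with hydrogens, while the clashscore roughly triples, so the lower R-work of the hydrogen run does not by itself indicate a better model.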
On Sat, Jul 30, 2011 at 11:35 AM, Tatyana Sysoeva wrote:
I am almost done repeating fix-rotamers run. First time it gave a way worse validation results then before running phenix without fix_rotamers=true.

This method behaves very poorly at low resolution - I ran some tests recently on a variety of 3.0-4.5A structures solved by molecular replacement, and while the rotamer fitting usually decreased the R-factor, it always made the clashscore much worse. I wouldn't bother using it if you have worse than 3.0A data.

Model pdb contained the model identical to the control run but with added H atoms. I have not add H to the ADP molecules since I did not know how to write a correct CIF file for it.

If you use phenix.ready_set, it will handle the ADP molecules.

Interestingly in previous release version the same refinement runs produced:
without H clashscore = 77.580195/cbeta =8
with H clashscore = 119.729960/cbeta=330

We increased the weight on the nonbonded restraints 5x in the last few releases - this always improves the clashscore and Ramachandran statistics at low resolution, although it doesn't change the R-factors very much. Still, 12% is an awful lot of Ramachandran outliers, even at that resolution. If you're willing to experiment a little, try adding "ncs.type=torsion" to the parameters (without hydrogens) and let us know what happens.
-Nat
Hi Tatyana, could you please send me the data and model files (off-list) so I can investigate? I will post the summary on phenixbb once we know what's going on. I see a number of potential issues but I don't want to waste time guessing, since it's much more efficient if you send me the files then I run some tests and tell you how to proceed. I'm pretty confident there is a quick fix. Thanks! Pavel.
Hi Tanya, thanks for sending the data and model files. Here are some results.

1) Starting values corresponding to the model you sent me:
r_work = 0.2891 r_free = 0.3267
MOLPROBITY STATISTICS.
ALL-ATOM CLASHSCORE : 67.04
RAMACHANDRAN PLOT:
OUTLIERS : 6.04 %
ALLOWED : 16.58 %
FAVORED : 77.38 %
ROTAMER OUTLIERS : 14.80 %
CBETA DEVIATIONS : 13

2) I ran six refinement jobs using the input model with and without H atoms, and for each model I tried 3 ways of using NCS in refinement: the new option defining NCS in torsion angle space; letting phenix.refine define NCS automatically and apply it in Cartesian space; and finally the NCS selections that you sent me. So 2 (model with and w/o H) * 3 (NCS options) = 6 refinements in total. Here are the results:

MODEL with H:

- torsion NCS:
r_work = 0.2951 r_free = 0.3319
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 19.91
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.99 %
REMARK 3 ALLOWED : 6.99 %
REMARK 3 FAVORED : 91.02 %
REMARK 3 ROTAMER OUTLIERS : 0.45 %
REMARK 3 CBETA DEVIATIONS : 0

- Cartesian NCS defined automatically:
r_work = 0.3260 r_free = 0.3427
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 12.79
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.02 %
REMARK 3 ALLOWED : 5.97 %
REMARK 3 FAVORED : 93.01 %
REMARK 3 ROTAMER OUTLIERS : 14.07 %
REMARK 3 CBETA DEVIATIONS : 45

- Cartesian NCS using your selections for NCS groups:
r_work = 0.2792 r_free = 0.3245
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 15.29
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 0.90 %
REMARK 3 ALLOWED : 7.77 %
REMARK 3 FAVORED : 91.33 %
REMARK 3 ROTAMER OUTLIERS : 12.87 %
REMARK 3 CBETA DEVIATIONS : 36

MODEL without H:

- torsion NCS:
r_work = 0.2938 r_free = 0.3301
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 30.12
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.89 %
REMARK 3 ALLOWED : 7.01 %
REMARK 3 FAVORED : 91.11 %
REMARK 3 ROTAMER OUTLIERS : 0.22 %
REMARK 3 CBETA DEVIATIONS : 2

- Cartesian NCS defined automatically:
r_work = 0.3174 r_free = 0.3354
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 33.76
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 0.63 %
REMARK 3 ALLOWED : 6.31 %
REMARK 3 FAVORED : 93.06 %
REMARK 3 ROTAMER OUTLIERS : 12.60 %
REMARK 3 CBETA DEVIATIONS : 142

- Cartesian NCS using your selections for NCS groups:
r_work = 0.2762 r_free = 0.3233
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 41.83
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 0.73 %
REMARK 3 ALLOWED : 8.32 %
REMARK 3 FAVORED : 90.95 %
REMARK 3 ROTAMER OUTLIERS : 18.36 %
REMARK 3 CBETA DEVIATIONS : 118

3) Looking at the results above, I would say we have two overall equally good results (in my interpretation):

MODEL with H (torsion NCS):
r_work = 0.2951 r_free = 0.3319
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 19.91
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.99 %
REMARK 3 ALLOWED : 6.99 %
REMARK 3 FAVORED : 91.02 %
REMARK 3 ROTAMER OUTLIERS : 0.45 %
REMARK 3 CBETA DEVIATIONS : 0

and

MODEL without H (torsion NCS):
r_work = 0.2938 r_free = 0.3301
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 30.12
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.89 %
REMARK 3 ALLOWED : 7.01 %
REMARK 3 FAVORED : 91.11 %
REMARK 3 ROTAMER OUTLIERS : 0.22 %
REMARK 3 CBETA DEVIATIONS : 2

Compare these to your original structure:
r_work = 0.2891 r_free = 0.3267
MOLPROBITY STATISTICS.
ALL-ATOM CLASHSCORE : 67.04
RAMACHANDRAN PLOT:
OUTLIERS : 6.04 %
ALLOWED : 16.58 %
FAVORED : 77.38 %
ROTAMER OUTLIERS : 14.80 %
CBETA DEVIATIONS : 13

4) To be able to reproduce these numbers you need to use the most recent PHENIX version from the nightly builds: http://www.phenix-online.org/download/nightly_builds.cgi (use dev-838 and up).

5) I'm sending the relevant files off-list.

6) The six commands I used are:

phenix.refine data.mtz model_H.pdb ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true ncs.type=torsion output.prefix=H secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlogH &

phenix.refine data.mtz model.pdb ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true ncs.type=torsion output.prefix=noH secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlognoH &

phenix.refine data.mtz model_H.pdb excessive_distance_limit=None ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true output.prefix=H_ncsC secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlogH_ncsC &

phenix.refine data.mtz model.pdb excessive_distance_limit=None ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true output.prefix=noH_ncsC secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlognoH_ncsC &

phenix.refine data.mtz model_H.pdb excessive_distance_limit=None ncs_groups_H.params ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true output.prefix=H_ncsC_cust secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlogH_ncsC_cust &

phenix.refine data.mtz model.pdb excessive_distance_limit=None ncs_groups_H.params ramachandran_restraints=true main.number_of_mac=5 optimize_xyz_weight=true optimize_adp_weight=true --overwrite main.ncs=true output.prefix=noH_ncsC_cust secondary_structure_restraints=true *cif xray_data.high_res=3.6 > & zlognoH_ncsC_cust &

It may not be necessary to use weight optimization, although I did not try without it. I recommend you try it: if it turns out not to be necessary, then it may save you many hours of time.

Also, the number of CB outliers and Ramachandran outliers in your original model probably indicates that it is far from final and requires some careful manual analysis.

As Nat mentioned before, fit_rotamers will not work at resolutions lower than 2.8-3.0A, as it relies on "reasonably good" density. I plan to extend it to lower resolutions, but it will take some time.

You don't need to run tools like phenix.ramalyze, etc. separately after refinement, since most of the Molprobity statistics are reported in REMARK 3 records by phenix.refine, as in the examples above.

Finally, you don't really have to type all these long commands. Using parameter files is easy, and can reduce the amount of typing to something like:
phenix.refine params.eff
See https://www.phenix-online.org/presentations/latest/pavel_phenix_refine.pdf for details and examples.

7) Finally finally, at 3.6A resolution or so, R-factors within ~2% of each other can be considered the same. For example, I would not say that r_free = 0.3245 is better than 0.3319. For discussion see the article "Improved target weight optimization in phenix.refine" in http://phenix-online.org/newsletter/CCN_2011_07.pdf.

Please let me know if you have any questions or need any help with this.
Pavel.
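The two practical points in the message above (MolProbity statistics can be read from the REMARK 3 records, and R-factors within ~2% are effectively the same at this resolution) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the regular expression are hypothetical and not part of PHENIX, the REMARK 3 lines are matched with flexible whitespace because the exact column layout is not shown in the thread, and "~2%" is read here as an absolute difference of 0.02.

```python
import re

# Matches statistic lines such as "REMARK 3 ALL-ATOM CLASHSCORE : 19.91" or
# "REMARK 3 OUTLIERS : 1.99 %". Header lines without a number are skipped.
STAT_LINE = re.compile(
    r"^REMARK\s+3\s+(?P<name>[A-Z][A-Z \-]*?)\s*:\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*%?\s*$"
)

# Assumption: "within ~2%" is read as an absolute difference of 0.02 in R.
R_TOLERANCE = 0.02


def parse_remark3_stats(text):
    """Collect numeric 'NAME : value' entries from REMARK 3 lines.

    The bare key 'outliers' holds the Ramachandran outliers, because the
    'RAMACHANDRAN PLOT:' header line carries no number and is dropped.
    """
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            key = match.group("name").lower().replace(" ", "_").replace("-", "_")
            stats[key] = float(match.group("value"))
    return stats


def r_factors_equivalent(r_a, r_b, tolerance=R_TOLERANCE):
    """True when two R-factors differ by no more than the tolerance."""
    return abs(r_a - r_b) <= tolerance


# REMARK 3 block quoted in the thread for the model with H, torsion NCS.
example = """\
REMARK 3 MOLPROBITY STATISTICS.
REMARK 3 ALL-ATOM CLASHSCORE : 19.91
REMARK 3 RAMACHANDRAN PLOT:
REMARK 3 OUTLIERS : 1.99 %
REMARK 3 ALLOWED : 6.99 %
REMARK 3 FAVORED : 91.02 %
REMARK 3 ROTAMER OUTLIERS : 0.45 %
REMARK 3 CBETA DEVIATIONS : 0
"""

stats = parse_remark3_stats(example)
print(stats)
print(r_factors_equivalent(0.3245, 0.3319))
```

Under that reading of the tolerance, r_free = 0.3245 and 0.3319 differ by 0.0074 and count as equivalent, so the choice between those runs would rest on the geometry statistics rather than on R-free.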
participants (4)
- Ed Pozharski
- Nathaniel Echols
- Pavel Afonine
- Tatyana Sysoeva