Refining partially occupied DNA on top of itself
Hello, I am having difficulty using Phenix to build and refine a DNA duplex. The issue is that my asymmetric unit consists of one protein monomer bound to DNA, but the protein is a dimer and the DNA is not palindromic, so each monomer is bound to a different DNA sequence. As a result, the density for the two different DNA half-sites is averaged in the asymmetric unit. I have tried placing two duplexes directly on top of one another, each with 0.5 occupancy, and then refining, but I have noticed two problems. The first is that when xyz refinement is off and I look at the output files, the difference density for the DNA is awfully green (positive), as if there were only *one* helix with 0.5 occupancy there. The second is that when I turn xyz refinement on and look at the output files, one of the two half-sites has moved several angstroms, into a region that generally has no density. I expect that, if done properly, the backbone of both half-site DNAs ought not to move. Any advice would be greatly appreciated. Thanks, Bryan Schmidt
Hi Bryan, when you place them on top of one another with occupancy 0.5, you need to treat them as "alternative conformations", that is, set altloc identifiers A and B. They will then not "see" each other through the non-bonded interaction term and therefore will not be pushed apart. Their relative occupancies will also be refined. Please let me know if you have any difficulty with this. I do not understand why xyz refinement is off. If you send me the inputs (data + model + commands), I will be happy to have a closer look. Pavel. (In reply to [email protected], 5/7/09 8:25 AM.)
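A minimal sketch of the relabelling Pavel describes, assuming the two overlapping duplexes currently sit under separate chain IDs in a standard fixed-column PDB file. The function name `set_altloc` and the chain-to-altloc mapping are hypothetical and only for illustration. Whether the two copies should also share chain IDs and residue numbering is not stated in the thread and is worth checking against the Phenix documentation.

```python
def set_altloc(pdb_lines, chain_to_altloc, occupancy=0.5):
    """Return PDB lines with altloc and occupancy set for selected chains.

    pdb_lines       -- iterable of fixed-column PDB records (strings)
    chain_to_altloc -- hypothetical mapping, e.g. {"C": "A", "E": "B"}
    occupancy       -- starting occupancy written to every relabelled atom
    """
    for altloc in chain_to_altloc.values():
        if len(altloc) != 1:
            raise ValueError("altloc identifiers must be a single character")

    out = []
    for line in pdb_lines:
        # Only coordinate records carry altloc (column 17) and occupancy
        # (columns 55-60); everything else is passed through untouched.
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 60:
            altloc = chain_to_altloc.get(line[21])  # column 22 = chain ID
            if altloc is not None:
                line = (
                    line[:16] + altloc          # column 17 = altloc
                    + line[17:54]
                    + f"{occupancy:6.2f}"       # columns 55-60 = occupancy
                    + line[60:]
                )
        out.append(line)
    return out
```

Applied to the lines of the model file with a mapping such as `{"C": "A", "E": "B"}` (chain IDs assumed), this leaves the protein untouched and gives each duplex its own altloc with occupancy 0.5 as the starting point for occupancy refinement.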
_______________________________________________ phenixbb mailing list [email protected] http://www.phenix-online.org/mailman/listinfo/phenixbb
Hi all, I am wondering why the ncs refinement gives me a better Rfree (21.0%) than without ncs (21.7%). I have had this experience with different projects and different data (3.3 Å and 2.3 Å). The Rwork is similar in both cases (~18%). Since this is consistently the case with my data, which refinement do you think should be the final one? Maia
NCS restraints increase the data:parameter ratio in low-resolution situations. Others have suggested not releasing NCS restraints unless your resolution is better than 2.0 Å and the model is complete (including NCS can help you build and complete your model through the use of NCS-averaged maps). When you do not include NCS, do you see sidechain differences among the monomers that make up the NCS? Personally, I keep NCS in unless I see genuine differences among the monomers (which could be biologically significant, depending on your system). Of course, such small differences (say, in a binding pocket) would likely need to be verified with omit maps or, even better, biochemically. Just my 0.02. FR (In reply to Maia Cherney, Jun 26, 2009, 9:49 AM.)
--------------------------------------------- Francis Reyes M.Sc. 215 UCB University of Colorado at Boulder gpg --keyserver pgp.mit.edu --recv-keys 67BA8D5D 8AE2 F2F4 90F7 9640 28BC 686F 78FD 6669 67BA 8D5D
Francis, Engin, Thank you very much for your input. I appreciate your idea to keep the ncs but remove it from the parts of the model that have violations. Maia
Hi Maia,

I would add a couple of comments too:

1) "I am wondering why the ncs refinement gives me a better Rfree (21.0%)"

Because by using NCS you added some "observations" and therefore improved the data-to-parameters ratio, which in turn reduced the degree of overfitting. The fact that the R-factor dropped (rather than increased) probably suggests that you selected the NCS groups correctly.

2) "the ncs refinement gives me a better Rfree (21.0%) than without ncs (21.7%)."

Here is another stream of thought. The target for restrained coordinate (and similarly for B-factor) refinement in phenix.refine looks like this:

T_total = wxc_scale * wxc * T_xray + wc * T_geometry

where the relative target weight wxc is determined as wxc ~ ratio of the gradients' norms:

Brünger, A.T., Karplus, M. & Petsko, G.A. (1989). Acta Cryst. A45, 50-61. "Crystallographic refinement by simulated annealing: application to crambin"
Brünger, A.T. (1992). Nature (London), 355, 472-474. "The free R value: a novel statistical quantity for assessing the accuracy of crystal structures"
Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement"

Before the wxc scale is computed, the structure is subjected to a short molecular dynamics run; this is where the random component comes into play.

Now, having said this, we know that if you run, for example, 100 identical phenix.refine runs where the only difference between runs is the random seed, you will get 100 slightly different refinement results. The spread in R-factors depends on resolution and, if I remember correctly, for a structure at ~2 Å resolution I was getting delta_R ranging from about 0.1 to 2%. It can be higher at lower resolution and smaller at higher resolution.

The NCS term goes into T_geometry, which in turn means that it changes (somehow) the weight. This may explain the difference in R-factors, and I would consider a difference of less than 1% insignificant (unless it is a highly systematic and consistent observation, and unless the comparison has been made weight-independent).

To make it less arbitrary, I would suggest running two refinement jobs using "optimize_wxc=true optimize_wxu=true", one with NCS and the other without NCS. I'm sure I suggested this a month or two ago.

Pavel.
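The weighting described above can be illustrated with a small sketch. It assumes that "ratio of gradient norms" means the norm of the geometry-term gradient divided by the norm of the X-ray-term gradient, in the spirit of the cited simulated-annealing papers; the thread does not spell out the exact expression used in phenix.refine. The function names are hypothetical.

```python
import math


def gradient_norm(grad):
    """Euclidean norm of a flat list of gradient components."""
    return math.sqrt(sum(g * g for g in grad))


def relative_weight(grad_xray, grad_geometry):
    """wxc taken as |grad T_geometry| / |grad T_xray| (an assumption)."""
    norm_xray = gradient_norm(grad_xray)
    if norm_xray == 0.0:
        raise ValueError("X-ray gradient is zero; weight is undefined")
    return gradient_norm(grad_geometry) / norm_xray


def total_target(t_xray, t_geometry, wxc, wxc_scale, wc):
    """T_total = wxc_scale * wxc * T_xray + wc * T_geometry."""
    return wxc_scale * wxc * t_xray + wc * t_geometry
```

Because any NCS restraint term is added to T_geometry, it changes the geometry gradient and therefore wxc, which is one way the two refinements can end up with different effective X-ray weights.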
Hi Pavel, Thank you for your letter. You mentioned "unless it is highly systematic and consistent observation ..". Yes, it is highly systematic and consistent for these data. I continue to refine this structure with both strategies (with and without ncs). Weight optimization keeps the same difference between the Rfree values (it improves both). I liked Engin's suggestion to keep the ncs but remove some offending residues from the ncs restraints. (Some residues have slightly different conformations, and they get moved out of density by the ncs restraints.) Maia
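One way to check whether a difference in Rfree is "systematic and consistent" is to set it against the run-to-run spread Pavel describes. The sketch below assumes you have collected Rfree values from several otherwise identical refinements that differ only in random seed, once with NCS and once without. The function name and the factor of two applied to the spread are arbitrary illustrative choices, not something taken from the thread.

```python
from statistics import mean, stdev


def compare_rfree(with_ncs, without_ncs, spread_factor=2.0):
    """Compare mean Rfree of two strategies against the seed-to-seed spread.

    with_ncs, without_ncs -- lists of Rfree values (percent), one per seed;
                             each list needs at least two values.
    spread_factor         -- arbitrary multiple of the spread that the mean
                             difference must exceed to be called notable.
    """
    mean_with = mean(with_ncs)
    mean_without = mean(without_ncs)
    # Use the larger of the two sample standard deviations as the spread.
    spread = max(stdev(with_ncs), stdev(without_ncs))
    difference = mean_without - mean_with
    return {
        "mean_with_ncs": mean_with,
        "mean_without_ncs": mean_without,
        "difference": difference,
        "spread": spread,
        "exceeds_spread": abs(difference) > spread_factor * spread,
    }
```

If `exceeds_spread` is false, the observed gap is within what a change of random seed alone could produce.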
Hi Maia, yes, we discussed it with Engin a few months ago and agreed that automatic NCS detection in phenix.refine should be enhanced by making use of maps (for example) and by automatically taking care of outliers. Pavel. (In reply to Maia Cherney, 6/26/09 2:38 PM.)
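Until outliers are handled automatically, candidate residues to exclude from the NCS restraints can be listed by hand. The sketch below is not how phenix.refine detects them. It assumes the two NCS copies have already been superposed and that matching atoms are supplied in the same order; the function name, the input layout and the 1.0 Å cutoff are all hypothetical.

```python
import math


def flag_ncs_outliers(copy_a, copy_b, cutoff=1.0):
    """List residues whose conformation differs between two NCS copies.

    copy_a, copy_b -- dict mapping residue number to a list of (x, y, z)
                      tuples, already superposed, atoms in matching order.
    cutoff         -- per-residue RMSD (in Å) above which a residue is flagged.
    Returns a list of (residue_number, rmsd) sorted by residue number.
    """
    flagged = []
    for res_id in sorted(set(copy_a) & set(copy_b)):
        atoms_a, atoms_b = copy_a[res_id], copy_b[res_id]
        # Skip residues that cannot be compared atom by atom.
        if not atoms_a or len(atoms_a) != len(atoms_b):
            continue
        sq_sum = sum(
            (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            for (xa, ya, za), (xb, yb, zb) in zip(atoms_a, atoms_b)
        )
        rmsd = math.sqrt(sq_sum / len(atoms_a))
        if rmsd > cutoff:
            flagged.append((res_id, rmsd))
    return flagged
```

Residues returned by such a check would still need to be inspected against the density before being removed from the NCS selection.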
participants (4): bschmidt@berkeley.edu, Francis E Reyes, Maia Cherney, Pavel Afonine