Discrepancy between R-factors from phenix.refine vs phenix Generate "Table 1"
Dear Phenix Community,

I am preparing my structure for deposition and ran it through the Generate "Table 1" function in Phenix. The inputs I gave it were my .mtz structure factors file, the final pdb coordinate file, and the log file from HKL-2000. I did not provide unmerged data.

My pdb file (which had been refined using phenix.refine) had values of r_work=0.2342, r_free=0.2603. However, phenix.model_vs_data (from Generate "Table 1") gave me r_work=0.2940, r_free=0.3150.

I also got this warning about resolution limits, but I think it should not have affected the R-free calculation to this extent:

WARNING: Resolution limits in scaling log for structure A549T are inconsistent with resolution limits in the MTZ file: 50.00 - 3.02 (in logfile) 48.47 - 3.03 (in MTZ file)

When I refined my model in Phenix, I used the twin law h,-h-k,-l. I read in the documentation that twinning can account for some of this discrepancy, but that the program is supposed to take twinning into account if it will lower the calculated R-work by more than 2%, which it doesn't seem to have done (or there is some other problem with my data).

Thanks for your help!
-Sam
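For readers who want to reproduce this kind of comparison, a minimal check is to refine with the explicit twin law and then run the validation tool directly. File names below are placeholders and all other parameters are left at defaults:

    # refine against the data with an explicit twin law
    phenix.refine model.pdb data.mtz twin_law="h,-h-k,-l"

    # recompute R-factors the way Generate "Table 1" does internally
    phenix.model_vs_data model_refined.pdb data.mtz

If the two disagree by several percent, as here, the usual suspects are different resolution cutoffs, anisotropy handling, or the twin law being applied in one program but not the other (which, as the thread below shows, is what happened).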
On Wed, May 29, 2013 at 4:13 PM, Sam Stampfer wrote:
> My pdb file (which had been refined using phenix.refine) had values of r_work=0.2342, r_free=0.2603. However, phenix.model_vs_data (from Generate "Table 1") gave me r_work=0.2940, r_free=0.3150.
This is definitely not the intended behavior. Could you please send me the inputs off-list?

Thanks,
Nat
On Wed, May 29, 2013 at 4:13 PM, Sam Stampfer wrote:
> When I refined my model in phenix, I used the twin law h,-h-k,-l. I read in the documentation that twinning can account for some of this discrepancy [...]
Okay, the problem is that your data don't actually appear to be twinned. The automatic method used by phenix.model_vs_data (which is used internally for Table 1 and the validation GUI) only tries possible twin laws if the results of the "L test" show a suspicious distribution of intensities. Your data look fine, so it doesn't bother trying the twin laws.

That the R-factors are lower when you refine with a twin law isn't necessarily indicative of the data actually being twinned - Garib Murshudov has looked into this in detail, but I confess to being ignorant of the math (I can probably dig up his paper on the subject if anyone is interested). However, I'm pretty sure the data are actually in a higher-symmetry space group. Will send details and new files off-list (probably tomorrow at this rate).

I should probably change some of the programs and/or documentation to make it clearer what is being done internally, since it took me a bit of digging to realize what was going on. In general, though, always be very careful before running twinned refinement! I have seen several users do this by mistake when they really had higher symmetry. The maps will also be more model-biased when using twinned refinement, so it's good to avoid doing this unless absolutely necessary.

-Nat
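For readers unfamiliar with it, the L test is the local intensity statistic of Padilla & Yeates (Acta Cryst. (2003). D59, 1124-1130): for pairs of reflections that are close in reciprocal space but not related by symmetry, L = (I1 - I2)/(I1 + I2). Below is a back-of-the-envelope numpy sketch of the statistic itself, not of Xtriage's actual implementation (which also handles the choice of reflection pairs):

    import numpy as np

    def l_test(i1, i2):
        """Padilla-Yeates L statistic for paired intensities.
        Expected values: untwinned    <|L|> = 0.500, <L^2> = 0.333
                         perfect twin <|L|> = 0.375, <L^2> = 0.200"""
        L = (i1 - i2) / (i1 + i2)
        return np.mean(np.abs(L)), np.mean(L ** 2)

    # sanity check with simulated Wilson-like (exponential) intensities
    rng = np.random.default_rng(0)
    e = rng.exponential(1.0, (4, 100000))
    print(l_test(e[0], e[1]))             # ~ (0.500, 0.333): untwinned
    # a perfect twin mixes each pair of twin-related intensities 50:50
    j1, j2 = 0.5 * (e[0] + e[1]), 0.5 * (e[2] + e[3])
    print(l_test(j1, j2))                 # ~ (0.375, 0.200): perfect twin

Values near the untwinned expectations are why model_vs_data sees no reason to try any twin law on Sam's data.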
Samuel, Nat, and all,
I have a similar issue. My data are nearly perfectly twinned, so Xtriage usually fails to find any twin law. However, when I solve the structure in the lower-symmetry space group with twinning, my maps look beautiful; in the higher-symmetry space group, only 3 of the 6 monomers fit into nice-looking density. I simply use Generate "Table 1" and uncheck the box so it won't regenerate R-factors. The discrepancy between the R-factors is, however, much more significant than 2%: from phenix.refine I get 0.18/0.22, and from Table 1, 0.28/0.31.
~Heather
On Thu, May 30, 2013 at 7:09 AM, Heather Condurso wrote:
> My data are nearly perfectly twinned, so Xtriage usually fails to find any twin law. [...] From phenix.refine I get 0.18/0.22, and from Table 1, 0.28/0.31.
It's actually a little surprising that Xtriage wouldn't work - would you be willing to send us the data? Keep in mind that if you have perfect twinning and refine with the twin law, I think the maps will look beautiful pretty much by definition, because of the model bias. (Which doesn't mean that you did it wrong - the R-factors sound realistic - you just need to be very careful and make lots of omit maps.)

It helps to have actual examples when warning against the dangers of twin refinement and model bias in general, so here's one (attached). This map isn't beautiful, but it matches the model very closely, including the complete lack of sidechains. Another Phenix user got this result by accident with a mostly-if-not-entirely incorrect MR solution - fortunately, the combination of missing sidechain density and relatively high R-free (~40%) prompted him to email us. I suspect that if we searched the PDB we'd find an actual published example (especially considering how many spurious ligands have been found there).

(Sorry to sound like a broken record here, but this keeps coming up in emails and workshops - perhaps we need better validation tools.)

-Nat
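A practical note on the omit maps Nat recommends: recent Phenix versions ship a dedicated tool for composite omit maps, which systematically omits each region of the model in turn and so avoids judging a twin-refined map against the very model that was used to detwin it. A minimal invocation, assuming default parameters and placeholder file names, would be something like:

    phenix.composite_omit_map data.mtz model.pdb

Check the tool's documentation in your installation for the available omit modes.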
Sorry if this is a little long, but I saw that more than one person was having a similar problem.

First, I agree with everything Nat has said, but I will explain a little bit more. Twinning and/or pseudosymmetry can be a hairy issue to deal with, and in some cases NCS can be a real pain. I have seen many cases where just pseudosymmetry is present, but people assume it is twinned as well. To be fair, it is very easy to come to that conclusion, especially at the early stages. Let me just set the stage with a hypothetical example...

You collect a 2.5 Å dataset that processes well in P622 with an Rsym = 0.11 (overall). The intensity statistics look normal, which I won't go into right now because it is complicated (even though Nat alluded to it with the 'L test'). You perform MR with a search model that has 60% sequence identity and get a solution with 1 molecule in the AU in P6522. The solvent content is 0.6. You run the solution through rigid-body, followed by positional and B refinement with simulated annealing. The resulting R = 0.35, Rfree = 0.40. You look at the maps and the density does not look good: you can see density for most of the main chain, but almost all of the side-chain density is missing. You go about trying to build in the density, but the R/Rfree do not drop much. What is your next step? Some would go to Se-Met or heavy-atom soaks to solve by SAD or MIR, but the results are the same... Now what?

Most people would say: it's hexagonal, where merohedral twinning is possible, so it must be twinned. You rescale the data in P6, P312, P321, and even P3 (tetartohedral twinning), and the Rsym is about the same in all the subgroups, with P6 and P3 being the lowest at Rsym = 0.08. You perform MR in P65 and get a solution with 2 molecules in the AU. To 'check' if it is twinned, you run refinement twice from the same starting model: first, twin refinement specifying the 'twin law' (from Xtriage), and second, regular refinement, and then compare the results. You think the results speak for themselves... with the twin refinement, R = 0.24, Rfree = 0.28, with the twin fraction refining to 0.49 (nearly perfect twinning). The maps look great! The regular refinement (without specifying the 'twin law') refines to R = 0.30, Rfree = 0.32, and the maps look better than in P6522, but not as good as the maps from the twin refinement. You think, 'This has to be twinned!' But here is the kicker: IT'S NOT TWINNED!!! How could this be???

When you input the 'twin law' in refinement and the twin fraction refines to >0.4, Phenix detwins using the proportionality rules. This method uses the model in the detwinning process, which WILL introduce model bias in the maps, so they look great. Even if your model is NOT correct, the maps will still look good, because of the model bias. For this reason, you should calculate several 'omit maps' over different regions of your structure and inspect them, or look at maps calculated from a refinement in which you did NOT specify the 'twin law'. As for the R-factor being lower, the calculated twin R-factor is usually lower than the standard R-factor since the equations are NOT the same, so a drop in R-factor at this early stage is not conclusive evidence that your data are twinned. There are a couple of papers explaining this (which Nat alluded to, and which I believe I reference below).

I have seen many people get an 'itchy finger' and try to use the twin refinement too early. As Nat mentioned, you don't want to run it unless you are at the end of model building and ready for deposition, but the R-factor is still too high. Then it may be necessary to run refinement with the 'twin law' included to see if the R-factor drops 'significantly'.

To summarize: if you suspect an issue with twinning and/or pseudosymmetry because the maps don't look as good as they should for a given resolution and the R-factor is too high, rescale/reprocess your data in lower-symmetry SGs and rerun MR. When you get a solution, refine it WITHOUT specifying the 'twin law' and look at the maps.

A) If they look much better than the solution in the higher-symmetry SG, continue your model building and refinement (WITHOUT the 'twin law') in this lower-symmetry SG until the model building is complete and you would be ready to deposit the structure in the PDB. If the R-factor is still too high, then run refinement including the 'twin law' with the twin operator from Xtriage. If you see an appreciable decrease in R-factor, then you probably have twinning. If the R-factor dropped but is still not low enough, then either your model is still not completely correct or you have something else going on as well (see the end of B below).

B) If the maps still do not look any better than in the higher-symmetry SG, try MR in another lower-symmetry SG and repeat. I sometimes go down to P1, if I have enough data. If the maps still look bad (assuming you have a complete dataset and an MR solution), then you may not have a SG issue at all, but something else. I would look into other issues, including anisotropy, pseudo-translational symmetry, order-disorder, missing molecules, etc. I will usually scroll through the images to see if I see something obvious, and reprocess the data more carefully, looking for things like extra spots not predicted, or predicted spots that aren't actually there.

Here are a few references to papers that explain some of these issues more in depth, with examples:
Acta Crystallogr. (2008). D64, 99-107.
Acta Crystallogr. (2012). D68, 1541-1548.
Acta Crystallogr. (2006). D62, 83-95.

I hope this helps some people...

Jon

--
Jonathan P. Schuermann, Ph.D.
Beamline Scientist, NE-CAT
Argonne National Laboratory, 436E
9700 S. Cass Ave.
Argonne, IL 60439
Email: [email protected]
Tel: (630) 252-0682
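Jon's point about proportional detwinning can be made concrete with a toy calculation. The numpy sketch below is a schematic of the idea only, not Phenix's actual algorithm (the exact partitioning formula phenix.refine uses may differ): with a twin fraction near 0.5, the observed intensity is apportioned between the twin-related reflections almost entirely according to the calculated model intensities, so even a completely wrong model ends up 'confirming' itself in the detwinned data.

    import numpy as np

    rng = np.random.default_rng(0)
    n, alpha = 100000, 0.5            # alpha = twin fraction (perfect twin)

    # "true" crystal intensities for twin-related reflections h and h'
    I_h, I_hp = rng.exponential(1.0, n), rng.exponential(1.0, n)

    # twinned observation for h: a mixture of the twin-related pair
    J_h = alpha * I_h + (1.0 - alpha) * I_hp

    # a completely wrong model: calculated intensities unrelated to the truth
    Ic_h, Ic_hp = rng.exponential(1.0, n), rng.exponential(1.0, n)

    # proportional detwinning: apportion J_h according to the model
    I_detw = J_h * Ic_h / (alpha * Ic_h + (1.0 - alpha) * Ic_hp)

    # the "detwinned" data now correlate with the wrong model...
    print(np.corrcoef(I_detw, Ic_h)[0, 1])   # clearly positive
    # ...even though the true intensities never did
    print(np.corrcoef(I_h, Ic_h)[0, 1])      # ~ 0.0

The first correlation is clearly positive even though the model was generated independently of the data; that correlation is pure model bias, and it is exactly what makes the maps from twin refinement 'look great'.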
Hi, just for completeness... correctly selecting free-R flags is essential in the case of twinning, and phenix.refine does it by default: twin-related reflections must carry the same flag, otherwise work reflections leak information into the free set through the twin law. If flags were chosen without having that in mind, then "R=0.24, Rfree=0.28" from your example may easily become "R=0.24, Rfree=0.34" once you switch to correctly generated flags. I've seen it more than once.

Pavel
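To spell out what 'correctly generated' means in practice, here is a cctbx sketch, assuming the generate_r_free_flags API as found in the cctbx versions I have used (the file name and the 5% fraction are placeholders):

    from iotbx.reflection_file_reader import any_reflection_file

    # read the merged data and pick the first intensity array
    arrays = any_reflection_file("data.mtz").as_miller_arrays()
    i_obs = [a for a in arrays if a.is_xray_intensity_array()][0]

    # use_lattice_symmetry groups reflections related by potential twin
    # laws, so twin-related reflections always end up with the same flag
    flags = i_obs.generate_r_free_flags(fraction=0.05,
                                        use_lattice_symmetry=True)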
Participants (5): Heather Condurso, Jon Schuermann, Nathaniel Echols, Pavel Afonine, Sam Stampfer