R-factor expectations when translational pseudo symmetry is present
Dear PhenixBB,

We have a crystal form that xtriage flags as having strong translational pseudo symmetry (TPS) (Patterson peak 57% the height of the origin peak, p-value = 3E-5). The space group is P21212. We can solve the structure with MR and refine to R = 0.233 and R-free = 0.276 at 2.25 Angstrom resolution. The maps look very good, but do not suggest major additional modeling that could be done to improve the structure and lower the R-factors. I know that one expects the R-factors from refinement to be higher when TPS is present, but my question is: how high is too high? Has anyone done a study that shows the expectations for R-factors when TPS is present?

Thanks,
Jack

John J. Tanner
Interim Chair, Department of Biochemistry
Professor of Biochemistry and Chemistry
Department of Biochemistry
University of Missouri-Columbia
117 Schweitzer Hall
503 S College Avenue
Columbia, MO 65211
Phone: 573-884-1280
Fax: 573-882-5635
Email: [email protected]
http://faculty.missouri.edu/~tannerjj/tannergroup/tanner.html
Lab: Schlundt Annex rooms 3,6,9, 203B, 203C
Office: Schlundt Annex 203A
Hi,
“How high is too high?” depends upon a lot of factors in your data that are related to the pseudo symmetry but also to the potential presence of other problems (e.g., anisotropy).
Look at this very useful Acta Cryst D paper from Paul Adams' group and read the section on translational pseudo symmetry (Section 3.2):
Zwart PH, Grosse-Kunstleve RW, Lebedev AA, Murshudov GN, Adams PD. Surprises and pitfalls arising from (pseudo)symmetry. Acta Crystallogr D Biol Crystallogr. 2008 Jan;64(Pt 1):99-107. Epub 2007 Dec 5.
DOI: 10.1107/S090744490705531X (https://doi.org/10.1107/S090744490705531X)
Diana
**************************************************
Diana R. Tomchick
Professor
Departments of Biophysics and Biochemistry
University of Texas Southwestern Medical Center
5323 Harry Hines Blvd.
Rm. ND10.214A
Dallas, TX 75390-8816
[email protected]
(214) 645-6383 (phone)
(214) 645-6353 (fax)
On Apr 12, 2018, at 8:38 PM, Tanner, John J.
Hi Jack,

The Polygon tool is designed to answer questions like "what Rwork, Rfree and Rfree-Rwork should I expect at this resolution?". If you are focusing on R-factors only, you can get a quick idea using a command-line tool:

  phenix.r_factor_statistics 2.25

Histogram of Rwork for models in PDB at resolution 2.15-2.35 A:
  0.123 - 0.144 : 36
  0.144 - 0.165 : 442
  0.165 - 0.187 : 1669
  0.187 - 0.208 : 2782
  0.208 - 0.230 : 2023   <<< Your case
  0.230 - 0.251 : 812
  0.251 - 0.273 : 165
  0.273 - 0.294 : 19
  0.294 - 0.316 : 5
  0.316 - 0.337 : 3
Histogram of Rfree for models in PDB at resolution 2.15-2.35 A:
  0.160 - 0.183 : 43
  0.183 - 0.207 : 405
  0.207 - 0.231 : 1485
  0.231 - 0.255 : 2759
  0.255 - 0.278 : 2216   <<< Your case
  0.278 - 0.302 : 861
  0.302 - 0.326 : 142
  0.326 - 0.350 : 36
  0.350 - 0.373 : 7
  0.373 - 0.397 : 2
Histogram of Rfree-Rwork for all model in PDB at resolution 2.15-2.35 A:
  0.001 - 0.011 : 55
  0.011 - 0.021 : 247
  0.021 - 0.031 : 782
  0.031 - 0.041 : 1597
  0.041 - 0.050 : 2124   <<< Your case
  0.050 - 0.060 : 1716
  0.060 - 0.070 : 912
  0.070 - 0.080 : 316
  0.080 - 0.090 : 131
  0.090 - 0.100 : 76
Number of structures considered: 7956

So it looks like the R-factors you have are what one would expect at this resolution.

Pavel

On 4/12/18 18:38, Tanner, John J. wrote:
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb Unsubscribe: [email protected]
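As a rough cross-check of where a given model sits in these distributions, the bin counts from the phenix.r_factor_statistics output above can be loaded into a short Python script. This is a minimal sketch, not part of Phenix and not how Polygon itself ranks models; the function name `locate` and the rule for values that land exactly on a shared bin edge are assumptions made for illustration. It reports the fraction of the 7956 structures whose values fall in bins entirely below and entirely above the bin containing a given value.

```python
from bisect import bisect_right

# Bin edges and counts copied from the phenix.r_factor_statistics 2.25 output
# quoted in this thread (PDB models at 2.15-2.35 A resolution).
HISTOGRAMS = {
    "Rwork": (
        [0.123, 0.144, 0.165, 0.187, 0.208, 0.230, 0.251, 0.273, 0.294, 0.316, 0.337],
        [36, 442, 1669, 2782, 2023, 812, 165, 19, 5, 3],
    ),
    "Rfree": (
        [0.160, 0.183, 0.207, 0.231, 0.255, 0.278, 0.302, 0.326, 0.350, 0.373, 0.397],
        [43, 405, 1485, 2759, 2216, 861, 142, 36, 7, 2],
    ),
    "Rfree-Rwork": (
        [0.001, 0.011, 0.021, 0.031, 0.041, 0.050, 0.060, 0.070, 0.080, 0.090, 0.100],
        [55, 247, 782, 1597, 2124, 1716, 912, 316, 131, 76],
    ),
}


def locate(edges, counts, value):
    """Return (bin_index, frac_below, frac_above) for value.

    frac_below: fraction of structures in bins entirely below value's bin.
    frac_above: fraction of structures in bins entirely above value's bin.
    Assumption: a value exactly on a shared edge is assigned to the upper bin.
    """
    if not edges[0] <= value < edges[-1]:
        raise ValueError(f"{value} lies outside the histogram range")
    i = bisect_right(edges, value) - 1
    total = sum(counts)
    below = sum(counts[:i]) / total
    above = sum(counts[i + 1:]) / total
    return i, below, above


if __name__ == "__main__":
    r_work, r_free = 0.233, 0.276  # values reported in the original post
    for name, value in [("Rwork", r_work), ("Rfree", r_free), ("Rfree-Rwork", r_free - r_work)]:
        edges, counts = HISTOGRAMS[name]
        i, below, above = locate(edges, counts, value)
        print(f"{name} = {value:.3f}: bin {edges[i]:.3f}-{edges[i + 1]:.3f}, "
              f"{below:.1%} of structures lower, {above:.1%} higher")
```

With Jack's values, Rwork = 0.233 falls in the 0.230 - 0.251 bin (one bin above the line marked "Your case" in the quoted output), Rfree = 0.276 in 0.255 - 0.278, and Rfree-Rwork = 0.043 in 0.041 - 0.050. Because adjacent bins share an edge, the choice of which bin owns an edge value is arbitrary; the sketch picks the upper one.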
Thanks, Pavel, for the advice on Polygon. I ran Polygon using the Phenix GUI. According to the GUI, my R-factors are on the edge of the histogram:
https://www.dropbox.com/s/txq7pmu1xrjmjb7/polygon.png?dl=0
It seems like the R-factor histograms from the GUI are different from those generated by phenix.r_factor_statistics (see attached png file).
Also, what is the difference between the GUI histograms for R-work and R-work (PDB)?
Thanks,
Jack
On Apr 13, 2018, at 1:11 PM, Pavel Afonine
Please ignore “(see attached png file)” in my previous post.
Sent from Jack's iPhone
On Apr 14, 2018, at 7:49 AM, Tanner, John J.
Pavel,

I have an issue with the general use of these metrics as an "IQ score" for protein structures. They completely ignore the details of the experimental data and use one value, the maximum resolution, to set the bar. There are at least two reasons that this can be a poor choice. (1) Highly anisotropic data may go to 2.8 A along one cell axis, but only to 3.4 A along the other two. (2) The criteria used to cut the data have changed. Previously, an I/sigma of ~3 or an Rmerge of ~30% was considered the limit of usable data. Today many data sets use CC1/2 >= 0.5 as a cutoff, which includes significantly more high-resolution data and pushes the "resolution" to a higher value. In both cases we are now comparing data sets with data to I/sigma ~1 against older data sets with an I/sigma cutoff of ~3-5. These are not meaningful comparisons. If the software were to define a comparative resolution based on I/sigma and completeness, these comparisons would be more meaningful.

If you want to reexamine the use of a single 'factor' in evaluating anything, I can highly recommend Stephen Jay Gould's The Mismeasure of Man. We need to examine the assumptions that are made in the creation of these metrics.
--
Yours sincerely,
Mark A. White, Ph.D.
Associate Professor of Biochemistry and Molecular Biology,
Manager, Sealy Center for Structural Biology and Molecular Biophysics
Macromolecular X-ray Laboratory,
Basic Science Building, Room 6.658A
University of Texas Medical Branch
Galveston, TX 77555-0647
[email protected]
http://xray.utmb.edu
QQ: "I suppose it is tempting, if the only tool you have is a hammer, to
treat everything as if it were a nail."
- Abraham Maslow (1966)
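Mark's second point, that the cutoff criterion alone can shift the reported resolution of the same data set, can be illustrated with a small Python sketch. The shell statistics in it are hypothetical numbers invented purely for illustration, and the helper names (`Shell`, `resolution_limit`, `i_over_sigma_cutoff`, `cc_half_cutoff`) are likewise hypothetical; real data-processing programs apply their own, more elaborate rules. The thresholds are the ones Mark mentions: I/sigma of about 3 versus CC1/2 >= 0.5.

```python
from typing import Callable, List, NamedTuple, Optional


class Shell(NamedTuple):
    d_min: float         # high-resolution edge of the shell, in Angstrom
    i_over_sigma: float  # mean I/sigma(I) in the shell
    cc_half: float       # CC1/2 in the shell


def resolution_limit(shells: List[Shell], keep: Callable[[Shell], bool]) -> Optional[float]:
    """Walk shells from low to high resolution and return the d_min of the
    last shell that passes `keep`, stopping at the first shell that fails.
    Returns None if even the lowest-resolution shell fails."""
    limit = None
    for shell in shells:
        if not keep(shell):
            break
        limit = shell.d_min
    return limit


def i_over_sigma_cutoff(s: Shell) -> bool:
    return s.i_over_sigma >= 3.0  # older-style criterion


def cc_half_cutoff(s: Shell) -> bool:
    return s.cc_half >= 0.5  # CC1/2-based criterion


if __name__ == "__main__":
    # Hypothetical shell statistics, invented purely for illustration.
    shells = [
        Shell(3.0, 20.0, 0.99),
        Shell(2.6, 8.0, 0.95),
        Shell(2.4, 3.5, 0.85),
        Shell(2.3, 2.0, 0.70),
        Shell(2.2, 1.1, 0.52),
        Shell(2.1, 0.6, 0.30),
    ]
    print("I/sigma >= 3 cutoff:", resolution_limit(shells, i_over_sigma_cutoff))
    print("CC1/2 >= 0.5 cutoff:", resolution_limit(shells, cc_half_cutoff))
```

On these made-up shells the I/sigma criterion stops at 2.4 A while the CC1/2 criterion keeps data out to 2.2 A, so the same measurements would be compared against different resolution ranges of the PDB.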
Hi Mark,

All true statements, in general. These tools are not meant to label an outlier as 'wrong'. Instead, they are meant to alert the user to something unusual, prompt them to pay closer attention and, eventually, explain the oddity (as a result of paying closer attention). Very much like in your example: if Polygon shows an outlier and you bring good arguments to explain it (such as peculiarities of the data: anisotropy, I/sigma, Rmerge, etc.), then it's great and you are good to go.

The most common use case for Polygon is when someone uses a suboptimal refinement strategy, gets highly unlikely refinement statistics (such as R = 25% at 1 A resolution), and that goes unnoticed and ends up in the database. One of my favorite examples is 1eic (1.4 A, Rw = 20%, Rf = 25%). Polygon instantly tells you this is highly unusual. Applying a proper refinement protocol, I can trivially get Rw and Rf down to 14% and 17% (otherwise, I would not know that I could potentially do this!).

We use resolution as a guide simply because it is easy for most users to grasp. Clearly, something like an effective resolution (one that accounts for data completeness, for example) may potentially be better, but if I say "2A resolution" most people will instantly know what I mean, while if I say "effective resolution is 2A" I will have to explain what I mean (and I'm sure not all will be patient enough to listen!). All in all, I'd say Polygon is based on a collection of compromises and shortcuts to get something useful and easy to grasp quickly.

All the best,
Pavel

On 4/17/18 12:16, Mark A. White wrote:
participants (4)
- Diana Tomchick
- Mark A. White
- Pavel Afonine
- Tanner, John J.