Hi Mark,

All true statements, in general.

These tools are not meant to label an outlier as 'wrong'. Instead, they are meant to alert a user to something unusual, prompt a closer look, and eventually lead to an explanation of the oddity (as a result of that closer look).

Very much like in your example, if Polygon shows an outlier and you bring good arguments to explain it (such as a peculiarity of the data -- anisotropy, I/sigma, Rmerge, etc.) then it's great and you are good to go.

The most common use case for Polygon is when someone uses a suboptimal refinement strategy and gets hugely unlikely refinement statistics (such as R=25% at 1A resolution), which go unnoticed and end up in the database. One of my favorite examples is 1eic (1.4A, Rw=20%, Rf=25%). Polygon instantly tells you this is highly unusual. Applying a proper refinement protocol, I can trivially get Rw and Rf down to 14 and 17% (otherwise, I would not know whether I could potentially do this!).

Resolution is used as a guide simply because it is easy for most users to grasp. Clearly, something like effective resolution (that accounts for data completeness, for example) may potentially be better, but if I say "2A resolution" most people will instantaneously know what I mean, while if I say "effective resolution is 2A" I will have to explain what I mean (and I'm sure not all will be patient enough to listen!).
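
To make the idea of a completeness-aware resolution concrete, here is a minimal sketch of one possible definition. It assumes that the number of reflections grows as 1/d^3, so a data set with completeness C at nominal resolution d_min holds roughly as many reflections as a complete data set at d_min * C^(-1/3). This is an illustration only: it is not necessarily what Polygon or any other Phenix tool computes, and the function name is hypothetical.

```python
def effective_resolution(d_min, completeness):
    """Completeness-corrected resolution in Angstrom (illustrative sketch).

    Assumption: reflection count scales as 1/d^3, so an incomplete data set
    at d_min is treated like a complete one at d_min * C**(-1/3).
    d_min        -- nominal high-resolution limit in Angstrom
    completeness -- overall completeness as a fraction in (0, 1]
    """
    if d_min <= 0.0:
        raise ValueError("d_min must be positive")
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return d_min * completeness ** (-1.0 / 3.0)


if __name__ == "__main__":
    # A nominally 2 A data set that is only 80% complete.
    print(round(effective_resolution(2.0, 0.80), 2))
```

Under this assumption a fully complete data set keeps its nominal resolution, and lower completeness pushes the effective value to a larger (worse) number.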

All in all, I'd say Polygon is based on a collection of compromises and shortcuts to get something useful and easy to grasp quickly.

All the best,
Pavel

On 4/17/18 12:16, Mark A. White wrote:
Pavel,

I have an issue with the general use of these metrics as an "IQ score" for protein structures.  They completely ignore the details of the experimental data and use one value, the maximum resolution, to set the bar.  There are at least two reasons that this can be a poor choice.  (1) Highly anisotropic data may go to 2.8A along one cell axis, but only to 3.4A along the other two. (2) The parameters used to cut the data.  Previously an I/sigma~3 or an Rmerge~30% was considered the limit of usable data.  Today many data sets use CC1/2>=0.5 as a cutoff, which will include significantly more high-resolution data and push the "Resolution" to a higher value.  In both cases we are now comparing data sets with data to I/sigma ~1 to older data sets with an I/sigma cutoff of ~3-5.  These are not meaningful comparisons.  If the software were to define a comparative resolution based on I/sigma and completeness, then these comparisons would be more meaningful.

If you want to reexamine the use of a single 'factor' in evaluating anything, I can highly recommend Stephen Jay Gould's The Mismeasure of Man.  We need to examine the assumptions that are made in the creation of these metrics.

--
Yours sincerely,

Mark A. White, Ph.D.
Associate Professor of Biochemistry and Molecular Biology,
Manager, Sealy Center for Structural Biology and Molecular Biophysics
Macromolecular X-ray Laboratory,
Basic Science Building, Room 6.658A
University of Texas Medical Branch
Galveston, TX 77555-0647
mailto://[email protected]
http://xray.utmb.edu

QQ: "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail."
- Abraham Maslow (1966)

-----Original Message-----
From: Pavel Afonine <[email protected]>
To: Tanner, John J. <[email protected]>, [email protected] <[email protected]>
Subject: Re: [phenixbb] R-factor expectations when translational pseudo symmetry is present
Date: Fri, 13 Apr 2018 11:11:59 -0700

Hi Jack,

The Polygon tool is designed to answer questions like "what Rwork, Rfree and Rfree-Rwork should I expect at this resolution?".
If focusing on R-factors only, you can get a quick idea using a command-line tool:

phenix.r_factor_statistics 2.25

Histogram of Rwork for models in PDB at resolution 2.15-2.35 A:
     0.123 - 0.144      : 36
     0.144 - 0.165      : 442
     0.165 - 0.187      : 1669
     0.187 - 0.208      : 2782
     0.208 - 0.230      : 2023 <<< Your case
     0.230 - 0.251      : 812
     0.251 - 0.273      : 165
     0.273 - 0.294      : 19
     0.294 - 0.316      : 5
     0.316 - 0.337      : 3
Histogram of Rfree for models in PDB at resolution 2.15-2.35 A:
     0.160 - 0.183      : 43
     0.183 - 0.207      : 405
     0.207 - 0.231      : 1485
     0.231 - 0.255      : 2759
     0.255 - 0.278      : 2216 <<< Your case
     0.278 - 0.302      : 861
     0.302 - 0.326      : 142
     0.326 - 0.350      : 36
     0.350 - 0.373      : 7
     0.373 - 0.397      : 2
Histogram of Rfree-Rwork for all model in PDB at resolution 2.15-2.35 A:
     0.001 - 0.011      : 55
     0.011 - 0.021      : 247
     0.021 - 0.031      : 782
     0.031 - 0.041      : 1597
     0.041 - 0.050      : 2124 <<< Your case
     0.050 - 0.060      : 1716
     0.060 - 0.070      : 912
     0.070 - 0.080      : 316
     0.080 - 0.090      : 131
     0.090 - 0.100      : 76
Number of structures considered: 7956

So it looks like the R-factors you have are what one would expect at this resolution.
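
As a rough way to put a number on "what one would expect", the bin counts printed above can be turned into an approximate rank. The sketch below is not part of phenix.r_factor_statistics; the helper name is hypothetical, and the bin edges are the rounded values from the printout. It reports what fraction of the 7956 structures fall in Rfree bins below, in, and above the bin containing a given value.

```python
# Rfree bin counts copied from the histogram above (resolution 2.15-2.35 A).
# Each entry is (low edge, high edge, number of structures).
RFREE_BINS = [
    (0.160, 0.183, 43),
    (0.183, 0.207, 405),
    (0.207, 0.231, 1485),
    (0.231, 0.255, 2759),
    (0.255, 0.278, 2216),
    (0.278, 0.302, 861),
    (0.302, 0.326, 142),
    (0.326, 0.350, 36),
    (0.350, 0.373, 7),
    (0.373, 0.397, 2),
]


def rank_in_histogram(bins, value):
    """Return fractions of entries in bins (below, containing, above) value."""
    total = sum(count for _, _, count in bins)
    below = inside = above = 0
    for low, high, count in bins:
        if value >= high:
            below += count      # whole bin lies below the value
        elif value < low:
            above += count      # whole bin lies above the value
        else:
            inside += count     # this bin contains the value
    return below / total, inside / total, above / total


if __name__ == "__main__":
    # Rfree = 0.276 is the value quoted in the original question.
    below, inside, above = rank_in_histogram(RFREE_BINS, 0.276)
    print(f"below: {below:.1%}  same bin: {inside:.1%}  above: {above:.1%}")
```

Because only binned counts are available, the result is coarse: it locates the value to within one bin, not to an exact percentile.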

Pavel

On 4/12/18 18:38, Tanner, John J. wrote:

Dear PhenixBB,


We have a crystal form that xtriage flags as having strong translational pseudo symmetry (Patterson peak 57% the height of the origin peak, p-value = 3E-5). 


The space group is P21212. We can solve the structure with MR and refine to R=0.233 and R-free =0.276 at 2.25 Angstrom resolution. The maps look very good, but do not suggest major additional modeling that could be done to improve the structure and lower the R-factors. I know that one expects the R-factors from refinement to be higher when TPS is present, but my question is how high is too high?  Has anyone done a study that shows the expectations for R-factors when TPS is present? 


Thanks,


Jack 

John J. Tanner
Interim Chair, Department of Biochemistry 
Professor of Biochemistry and Chemistry
Department of Biochemistry
University of Missouri-Columbia
117 Schweitzer Hall
503 S College Avenue
Columbia, MO 65211
Phone: 573-884-1280
Fax: 573-882-5635
Email: [email protected]
http://faculty.missouri.edu/~tannerjj/tannergroup/tanner.html
Lab: Schlundt Annex rooms 3,6,9, 203B, 203C
Office: Schlundt Annex 203A




_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: [email protected]
