Re: [phenixbb] Table 1 statistics
Dear all,

I just want to point out one possible source of the discrepancy between the I/sigma reported by a data processing program and the I/sigma reported by a program that uses the relevant cctbx routine: that cctbx routine calculates the variance of merged data as max("internal variance", "external variance"), which is different from what the data processing programs do (they calculate the "internal variance" - I hope I didn't get it the wrong way round). SHELXC and SHELXL also calculate the variance of the merged data the way the data processing programs do.

I dislike this "feature" of cctbx, but I do not know if phenix.table_one actually uses this routine for this purpose.

Thanks,
Kay

P.S. You can find the discussion over at the cctbx mailing list; see e.g. http://phenix-online.org/pipermail/cctbxbb/2012-September/000530.html and other posts.

On 10/18/2012 09:08 PM, [email protected] wrote:
Date: Thu, 18 Oct 2012 08:07:44 -0700
From: Nathaniel Echols
Subject: Re: [phenixbb] Table 1 statistics

On Thu, Oct 18, 2012 at 7:53 AM, Andrew Waight wrote:
Thanks for Phenix! Anyway, a quick question: I tried out the "generate Table 1" utility for fun and noticed that my I/SigI and completeness do not match what is found in my XSCALE file. ... How does Phenix compute these parameters? Obviously my I/sigI has gotten much better (1.59 versus 0.85), but the scaling statistics should be the correct ones.
It computes these statistics directly from the reflection data you provide, not the log file. (The reasoning: since these are the data Phenix actually uses, the statistics calculated from them are more accurate and relevant than whatever the scaling program thinks.) My guess is that some of the internal processing accounts for the difference - could you please send me the data file?
FYI, the plan for the future is to recalculate all of these statistics from unmerged data rather than rely on parsing data processing logfiles, which obey no standard format and are subject to change without notice.
-Nat
--
Kay Diederichs http://strucbio.biologie.uni-konstanz.de
email: [email protected] Tel +49 7531 88 4049 Fax 3183
Fachbereich Biologie, Universität Konstanz, Box M647, D-78457 Konstanz
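[To make Nat's point concrete, here is a minimal cctbx-based sketch of computing <I/sigI> and completeness directly from a reflection file. The file name is a placeholder, and exact behavior may vary with the cctbx version; this is illustrative, not the actual phenix.table_one code.]

    from iotbx.reflection_file_reader import any_reflection_file
    from cctbx.array_family import flex

    # "data.mtz" is a placeholder; any format iotbx understands should work.
    arrays = any_reflection_file("data.mtz").as_miller_arrays(merge_equivalents=True)
    for arr in arrays:
        if arr.is_xray_intensity_array() and arr.sigmas() is not None:
            mean_i_over_sig = flex.mean(arr.data() / arr.sigmas())
            print(arr.info().label_string(),
                  "<I/sigI> = %.2f" % mean_i_over_sig,
                  "completeness = %.3f" % arr.completeness())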
On Fri, Oct 19, 2012 at 8:07 AM, Kay Diederichs wrote:

... that cctbx routine calculates the variance of merged data as max("internal variance", "external variance"), which is different from what the data processing programs do ...
Good point. However, this particular CCTBX routine is only used when merging non-unique data, which would not be the typical input here (at least in the official release).
I dislike this "feature" of cctbx but I do not know if phenix.table_one actually uses this routine for this purpose.
It does. But I may change this, if I can be certain it won't screw things up for the small molecule folks. I'm just not sure why it was written that way in the first place... -Nat
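[For readers following along, here is a schematic Python sketch of the two variance estimates Kay describes, assuming inverse-variance weighting of n observations of one unique reflection. The helper name is hypothetical and this is not the actual cctbx merging code.]

    def merged_variances(intensities, sigmas):
        # Schematic illustration only; the real cctbx routine may differ.
        weights = [1.0 / s**2 for s in sigmas]
        sum_w = sum(weights)
        i_mean = sum(w * i for w, i in zip(weights, intensities)) / sum_w

        # "Internal" variance: propagated from the individual sigmas alone.
        var_internal = 1.0 / sum_w

        # "External" variance: from the scatter of the observations
        # about the weighted mean.
        n = len(intensities)
        if n > 1:
            var_external = (sum(w * (i - i_mean)**2
                                for w, i in zip(weights, intensities))
                            / ((n - 1) * sum_w))
        else:
            var_external = var_internal

        # Data processing programs (and SHELXC/SHELXL) reportedly report
        # var_internal; the cctbx routine in question takes the larger value.
        return i_mean, var_internal, max(var_internal, var_external)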
I was under the impression that the differences arise due to XDS using SIGNAL/NOISE >= -3.0 as the cutoff, thus allowing some negative reflections, while Phenix uses a strict cutoff of 0? Note that the completeness in the high-resolution shell is lower in Phenix than in XDS (90.6 vs 99.7), a behavior I have seen several times while solving different structures: XDS will show high completeness at high resolution, and then in phenix.refine it always drops. Am I wrong in thinking this?

Thanks for any insights!

-Bjørn
--
Bjørn Panyella Pedersen
Macromolecular Structure Group
University of California, San Francisco
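[As a toy illustration of how the rejection threshold alone can move shell completeness - the numbers below are made up, not from Bjørn's data:]

    # One resolution shell with 8 theoretically possible unique reflections,
    # some measured with negative I/sigma.
    i_over_sig = [-2.5, -0.7, -0.1, 0.3, 1.2, 4.0, 9.5, 15.0]
    n_possible = len(i_over_sig)

    kept_lenient = [x for x in i_over_sig if x >= -3.0]  # XDS-style threshold
    kept_strict  = [x for x in i_over_sig if x >= 0.0]   # strict cutoff at zero

    print("lenient: %.1f%% complete" % (100.0 * len(kept_lenient) / n_possible))
    print("strict:  %.1f%% complete" % (100.0 * len(kept_strict) / n_possible))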
On Thu, Oct 25, 2012 at 9:59 AM, Bjørn Pedersen wrote:
I was under the impression that the differences arise due to XDS using SIGNAL/NOISE >= -3.0 as the cutoff ... while Phenix uses a strict cutoff of 0? ... the completeness in the high-resolution shell is lower in Phenix than in XDS (90.6 vs 99.7) ...
Phenix used to have a strict cutoff, but it now runs the French & Wilson treatment by default, so negative intensities are allowed. However: Kay pointed out that in another piece of code, I am calculating I/sigma starting from amplitudes, which are always positive. For Table 1, if you use intensities as input this won't matter, but if you use amplitudes and many of the original intensities were negative, it may be overestimating I/sigma. Fixing this now.

It still doesn't explain why the completeness would be so far off, however.

-Nat
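[A toy illustration of the bias Nat mentions, with made-up numbers - this is not Phenix's actual code. Once intensities have been converted to amplitudes, squaring the amplitudes can only give non-negative values, so the recomputed mean I/sigma is pulled upward for weak data. The sketch below uses simple truncation (F = sqrt(max(I, 0))) rather than French & Wilson, but shows the same floor-at-zero effect:]

    import math

    i_obs = [-2.0, -0.5, 0.8, 3.0, 10.0]  # measured intensities, some negative
    sig_i = [ 1.0,  1.0, 1.0, 1.0,  1.0]

    mean_from_intensities = sum(i / s for i, s in zip(i_obs, sig_i)) / len(i_obs)

    f_obs = [math.sqrt(max(i, 0.0)) for i in i_obs]
    i_back = [f * f for f in f_obs]  # F**2 is never negative
    mean_from_amplitudes = sum(i / s for i, s in zip(i_back, sig_i)) / len(i_back)

    print(mean_from_intensities)  # 2.26
    print(mean_from_amplitudes)   # 2.76 -- overestimated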
Hi Nat and Bjørn,

I apologize if this is way off base, but perhaps the difference in completeness is due to the (potential) difference between I-obs / F-obs (the original reflections after data processing and reduction) and I-obs-filtered / F-obs-filtered (the reflections actually used in refinement). Specifically, I am referring to data refined in older versions of Phenix (before French & Wilson was the default) and to datasets that have some weak reflections or negative intensities. Perhaps that is the case here?

An easy way to tell if there is a difference is to open an output MTZ from refinement in ViewHKL (which comes with CCP4 6.3); it nicely lists the original data vs the data used for refinement, along with the corresponding completeness for each.

Hope that helps,
Kip
On Thu, Oct 25, 2012 at 12:03 PM, Kip Guja wrote:

... perhaps the difference in completeness is due to the (potential) difference between I-obs / F-obs and I-obs-filtered / F-obs-filtered ... Perhaps that is the case here?
Definitely possible, and there are probably still other filtering steps in Phenix that don't use French & Wilson. Generally, the safest thing to do is either (a) run the French & Wilson treatment immediately once you have your data (depending on how you process it, this may happen automatically), or (b) always use the original intensities for everything (which will work for Phenix, more or less, but not for other software).

However, the "filtered" arrays with outliers removed generally won't be re-used anywhere; they are included for the sake of having a complete representation of the refinement result, but I believe they are also ignored (based on the label) by other programs.

-Nat
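[Along the lines of Kip's ViewHKL suggestion, a command-line alternative - a sketch assuming a cctbx installation; the output file name is a placeholder and the array labels vary with Phenix version:]

    from iotbx.reflection_file_reader import any_reflection_file

    # "refine_001.mtz" is a placeholder for a phenix.refine output file.
    mtz = any_reflection_file("refine_001.mtz")
    for arr in mtz.as_miller_arrays():
        labels = arr.info().label_string()
        # Compare e.g. I-obs vs I-obs-filtered (or F-obs vs F-obs-filtered).
        if "obs" in labels.lower():
            print(labels, "completeness = %.3f" % arr.completeness())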
participants (4)
- Bjørn Pedersen
- Kay Diederichs
- Kip Guja
- Nathaniel Echols