Hello Phenixers, Are there any considerations to changing TLS groups mid refinement? Can I rest assured that after a few rounds of refinement with the new TLS definitions everything will sort itself out or should I do something drastic, like remove all the ANISOU records, reset B factors to some constant value, and start over? Thanks, Scott
Hi Scott,
Are there any considerations to changing TLS groups mid refinement?
What do you mean? Sometime soon there will be a tool to identify TLS groups automatically, so you won't need to run TLSMD. Maybe later on we will make it run automatically as part of refinement, so you just say "I want to do TLS refinement" and the rest will be done for you. Pavel.
Hi Pavel, I have inherited a refinement project. I was not satisfied with the previous choice of TLS groups, so I used TLSMD to determine a new set of TLS domains. Can I just proceed, or should I do anything special to the starting PDB to "erase" any influence of the previous choice of TLS groups? Thanks, Scott
Hi Scott,
I have inherited a refinement project. I was not satisfied with the previous choice of TLS groups. I used TLSMD to determine a new set of TLS domains. Can I just proceed or should I do anything special to the starting PDB to "erase" any influence of the previous choice of TLS groups.
you don't need to do anything special - just keep going with the new TLS groups. Which model did you use to determine the new TLS groups? If you used the old one with the old ADPs, then the new TLS groups are influenced by the previous choice of TLS groups, because TLSMD uses the ADPs from your input PDB file to determine the groups. Ideally, you would do the following:
- convert all B-factors to isotropic and reset them to a constant value;
- do group ADP refinement with one or two ADPs per residue in phenix.refine;
- submit the refined model to TLSMD to determine the TLS groups;
- throw away this model (the one for which you refined group ADPs);
- use your current best available model with the new TLS groups.
Pavel.
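(For reference, the reset-and-group-ADP steps above map onto command lines roughly like the following. This is only a sketch: the parameter names are quoted from memory and may differ between Phenix versions, and the file names are placeholders - check the phenix.pdbtools and phenix.refine documentation before relying on them.)

# 1. make all ADPs isotropic and reset them to a constant value (e.g. 25);
#    phenix.pdbtools writes a modified copy of the input PDB file
phenix.pdbtools model.pdb convert_to_isotropic=true set_b_iso=25

# 2. group ADP refinement against your data (per-residue grouping;
#    see the group_adp_refinement_mode parameter)
phenix.refine model_modified.pdb data.mtz strategy=group_adp

# 3. submit the group-ADP-refined model to TLSMD, take the TLS group
#    definitions it suggests, discard that model, and continue refining
#    your current best model with the new groups, e.g.
phenix.refine best_model.pdb data.mtz strategy=tls+individual_sites+individual_adp \
  adp.tls="chain A and resseq 1:120" adp.tls="chain A and resseq 121:250"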
Hi, I'm getting ready to deposit some models and they have different FOBS cutoffs. I have been running PHENIX 1.6.1 (336) with the bare bones inputs. I am wondering why there is an FOBS cutoff by default, and how that cutoff is determined? Thanks, -bob
Hi Bob,
phenix.refine does not use any data cutoff for refinement. What is reported in a record like
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.00
is simply what phenix.refine found in your data file: the minimal value of the ratio Fobs/sigma(Fobs) among all the reflections is 1.0. So you shouldn't worry about this.
Pavel.
On Fri, 2010-05-14 at 15:35 -0700, Pavel Afonine wrote:
phenix.refine does not use any data cutoff for refinement.
So was the Fo>0 hard-wired cutoff removed? I don't have the latest version so I can't check myself. -- "I'd sacrifice myself, but I'm too good at whistling." Julian, King of Lemurs
Hi Ed,
I agree I was too generous in my statement and you promptly caught it - thanks!
phenix.refine does catch and deal with clearly nonsensical situations, like having Fobs<=0 in refinement. So saying "phenix.refine does not use any data cutoff for refinement" was not precise, indeed. In addition, phenix.refine automatically removes Fobs outliers based on R. Read's paper.
I don't see much sense in having a term (0-Fcalc)**2 in a least-squares target, or the equivalent one in an ML target. Implementing an intensity-based ML target function (or the corresponding LS) would allow using Iobs<=0, but this is not done yet, and that is a different story - your original question was about Fo (Fobs).
Do you have rock-solid evidence that substituting missing (unmeasured) Fobs with 0 would be better than just using the actual set (Fobs>0) in refinement? Or did I miss a relevant paper on this matter? I would appreciate it if you could point me to one. Unless I see clear evidence that this would improve things I wouldn't want to spend time implementing it, and unfortunately I don't have time right now to experiment with this myself.
Thanks!
Pavel.
This has been discussed before. For a start, look at French and Wilson (French, G.S. & Wilson, K.S. (1978). Acta Cryst. A34, 517).
Fobs < 0 is not possible, but Fobs = 0 clearly conveys some information (i.e. the reflection is weak). Simply deleting the data is the worst-case scenario, where you remove any information content for that reflection during refinement. I'm surprised that this would even need further exposition, especially in light of the community tendency to use higher outer-shell Rsymms (50-60%), where a significant proportion of the data would be expected to be weak and therefore at risk of arbitrary cutoff by phenix.refine. If I/sigI = 2 (a not uncommon outer-shell criterion) then a decent proportion of the data might have I<0, and these data are really there, and weak, and not the imagination of the processing program.
Does phenix.refine enforce an I<=0 cutoff too? It certainly behaves as if it does.
As it stands, the best way to get around this undesirable feature of phenix.refine is to use CCP4's TRUNCATE program (with "TRUNCATE YES"), which makes much of the weak data small but positive. You still have to tell phenix.refine not to use Imean and SigImean (it will use those if it can find them) and to use F instead. Again, include F=0 data. F<0 should probably immediately terminate program execution, since it implies a data content error.
Phil Jeffrey
Princeton
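(For what it's worth, the workaround described above boils down to two things: run the I-to-F conversion yourself with the French & Wilson option of CCP4's truncate, and then point phenix.refine explicitly at the amplitude columns so it does not silently pick up IMEAN/SIGIMEAN. A sketch only - the PHIL parameter path is from memory and the column labels are whatever your truncate run actually wrote:)

phenix.refine model.pdb data_truncated.mtz \
  refinement.input.xray_data.labels="F,SIGF"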
Excellent point! As was discussed before, F=0 is most likely the consequence of a user error (an improper I->F conversion protocol), while F<0 clearly indicates that something is wrong, and the most sensible response from any program would be to treat it as a fatal error rather than quietly ignoring it. The information can be gleaned from the log files, but who reads those unless something goes terribly wrong? On Mon, 2010-05-17 at 12:08 -0400, Phil Jeffrey wrote:
Again, include F=0 data. F<0 should probably immediately terminate program execution, since it infers a data content error.
-- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
Dear Phil,
This has been discussed before. For a start look at French and Wilson (French G.S. and Wilson K.S. Acta. Cryst. (1978), A34, 517.)
do they demonstrate the benefits of using these data in refinement (see below)? Well, I guess I need to re-read it again.
Fobs < 0 is not possible, but Fobs = 0 clearly conveys some information (i.e. the reflection is weak). Simply deleting the data is the worst case scenario where you remove any information content on that reflection during refinement. I'm surprised that this would even need further exposition, especially in light of the community tendency to use higher outer-shell Rsymms (50-60%) where a significant proportion of the data would be expected to be weak and therefore subject to risk of arbitrary cutoff by phenix.refine. If I/sigI = 2 (a not uncommon outer shell criterion) then a decent proportion of the data might have I<0, and this data is really there and weak and not the imagination of the processing program.
I'm all with you: yes, more data is better. Although I repeat my original question: did anyone demonstrate that "refinement with Fobs=0" vs "refinement with Fobs>0" results in:
- a noticeably better map, so you can find more details and explain more features,
- a noticeably better model,
- anything else visibly better?
If so, then I agree it is worth spending time on implementing it. However, if it's just theoretically/esthetically nice to have (like many other things, such as a better-than-flat bulk-solvent model) but the benefits are not really clear, then I would still keep it on my to-do list (since it's good in general) but with much lower priority. Yes, treating negatives and zeros properly is theoretically good - there is even a paper with myself as a co-author that touches on it. But for practical considerations it's all about the ratio of "time invested into implementing it vs benefits obtained as a result".
Does phenix.refine enforce an I<=0 cutoff too ? It certainly behaves as if it does.
phenix.refine does not have intensity-based X-ray refinement targets, and therefore it does not use intensities in refinement, although it accepts input reflection files with intensities, which it then converts to amplitudes for all subsequent purposes. Pavel.
Can we turn the argument on its head? Demonstrate that the way phenix.refine, as currently implemented, inappropriately throws away weak data is never potentially deleterious to the quality of a protein structure model. See below - a test on real data and a real structure suggests that the data do matter.
phenix.refine does not have intensity based X-ray refinement targets and therefore phenix.refine does not use intensities in refinement. Although it accepts input reflection files with intensities which it then converts to amplitudes for all subsequent purposes.
So let's look at real data.

Short version: phenix.refine throws out 2896 of 48895 reflections, including 11% of the data in the outermost shell, compared to using TRUNCATE to prep my data. Using the common data subset, the structure refined against the truncate=yes data has an R-free lower by 0.8%. 0.8% at a 24% R-free (24.0 vs 24.8) is pretty significant IMHO.

Longer version: Using the same MTZ file and PDB file, and just changing the column-label selection: the default (IMEAN, SIGIMEAN) versus the truncate=yes structure factors (F, SIGF).

phenix.refine's default behavior:
| 12: 2.1292 - 2.0684 0.90 2536 123 0.1851 0.2381
| 13: 2.0684 - 2.0139 0.88 2462 123 0.1813 0.2654
| 14: 2.0139 - 1.9648 0.87 2400 136 0.1936 0.2637
| 15: 1.9648 - 1.9201 0.81 2273 130 0.2090 0.2789
| 16: 1.9201 - 1.8793 0.83 2303 118 0.2216 0.2761
| 17: 1.8793 - 1.8417 0.72 1998 106 0.2388 0.2790

phenix.refine, forcing it to use F, SIGF out of truncate (truncate=yes):
| 13: 2.1079 - 2.0524 0.97 2546 135 0.1862 0.2383
| 14: 2.0524 - 2.0023 0.96 2545 128 0.1874 0.2623
| 15: 2.0023 - 1.9568 0.96 2499 149 0.1920 0.2562
| 16: 1.9568 - 1.9152 0.92 2452 132 0.2106 0.2385
| 17: 1.9152 - 1.8769 0.95 2500 130 0.2169 0.2895
| 18: 1.8769 - 1.8415 0.83 2182 115 0.2403 0.2680

Columns are resolution range, completeness (work+free), #work, #free, Rwork, Rfree. The incompleteness in the outer shell of the "complete" data is because I was overly pessimistic in setting the detector distance. Mea culpa. The outer-shell R-symm in SCALEPACK is 53.8%.

Default behavior yields:
Final: r_work = 0.1898 r_free = 0.2479 bonds = 0.007 angles = 1.114
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.842
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 0.02
REMARK 3 COMPLETENESS FOR RANGE (%) : 91.38
REMARK 3 NUMBER OF REFLECTIONS : 45999
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.1928
REMARK 3 R VALUE (WORKING SET) : 0.1898
REMARK 3 FREE R VALUE : 0.2479
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.08
REMARK 3 FREE R VALUE TEST SET COUNT : 2339

Truncate=yes data yields:
Final: r_work = 0.1932 r_free = 0.2473 bonds = 0.007 angles = 1.113
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.841
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 32.943
REMARK 3 MIN(FOBS/SIGMA_FOBS) : 1.34
REMARK 3 COMPLETENESS FOR RANGE (%) : 97.10
REMARK 3 NUMBER OF REFLECTIONS : 48895
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 R VALUE (WORKING + TEST SET) : 0.1960
REMARK 3 R VALUE (WORKING SET) : 0.1932
REMARK 3 FREE R VALUE : 0.2473
REMARK 3 FREE R VALUE TEST SET SIZE (%) : 5.10
REMARK 3 FREE R VALUE TEST SET COUNT : 2494

Despite the inclusion of more weak data the R-free doesn't change much. It should increase a little - the same way that R-work does. However phenix.refine discards 5.7% of the data overall and 11% of the data in the outermost shell, and this is for a dataset that is not at all anisotropic - you would expect the trend to be far worse with anisotropic data, where a lot of the data can be weak at the high-resolution limit.

The bigger question is: what would R-free be for the common data subset (Imean > 0), but using the truncate=yes F values and PDB file? I used SFTOOLS to make this selection, and then refined just the bulk-solvent correction for the truncate=yes PDB file against this data subset:

Final R-work = 0.1884, R-free = 0.2399

i.e. if you refine the model against all the data from TRUNCATE, but then cut back to the subset that phenix.refine would use by default, the R-free is lower by 0.8%. The R-free test count was the same as for the default phenix.refine behavior, so this superficially suggests I didn't do anything wrong.
Phil Jeffrey
On Mon, 2010-05-17 at 17:37 -0400, Phil Jeffrey wrote:
Demonstrate that the way that phenix.refine, as currently implemented, inappropriately throws away weak data is never potentially deleterious to the quality of a protein structure model.
Phil,
I have always used denzo/scalepack, and then scalepack2mtz to convert the .sca file to an .mtz file, so my data are always processed according to French & Wilson.
Now from what you are saying I understand that there is some possibility of ending up using non-truncated data with phenix - and not only that, it seems to be the default?
I suspect that French & Wilson statistics are actually applied by phenix internally if the supplied mtz file only has intensities. If that is not the case, the problem is more serious than I thought. Otherwise it would take deliberate user effort to get into trouble, since TRUNCATE=YES is the default of the corresponding CCP4 program.
Ed.
-- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
On Mon, May 17, 2010 at 3:18 PM, Ed Pozharski
I have always used denzo/scalepack, and then scalepack2mtz to convert .sca file to .mtz file. So my data is always processed according to French&Wilson.
Now from what you are saying I understand that there is some possibility to get into using non-truncated data with phenix? And not only that, it seems to be the default?
FYI, AutoSol, AutoMR, and Phaser all accept scalepack files as input (or d*TREK or XDS, I think), and generate MTZ files as output, so if a user jumps directly from HKL2000 to Phenix, it would be very easy to skip the French&Wilson step. The need to run an extra conversion step in a different suite is not going to be obvious to grad students (and many if not most postdocs). We've discussed implementing the French&Wilson protocol in CCTBX, but I don't know how much work that is (since I still don't know what it actually does after reading this entire discussion). -Nat
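(Since the question of what French & Wilson actually does keeps coming up: the core of the 1978 paper is a Bayesian estimate. The measured intensity is treated as a noisy Gaussian observation of a true intensity J >= 0 and combined with a Wilson prior whose mean Sigma is taken from the surrounding resolution shell. For acentric reflections - a sketch only; the centric case uses a different prior, and the paper evaluates the integrals via tabulated values:

p(J \mid I_{obs}) \;\propto\; \exp\!\big[-(I_{obs}-J)^2/2\sigma^2\big]\,\exp(-J/\Sigma), \qquad J \ge 0

\langle J \rangle = \int_0^\infty J\,p(J \mid I_{obs})\,dJ, \qquad
\langle F \rangle = \int_0^\infty \sqrt{J}\,p(J \mid I_{obs})\,dJ

Both posterior means are strictly positive even when I_obs is negative, which is why truncate can hand every measured reflection to a refinement program as a small but positive amplitude.)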
As far as I remember from my early encounters with Denzo and Scalepack during my visit to Yale University at the beginning of the 90's (Alan Friedman, if you are listening, thank you for that), Scalepack is already doing this sort of "tinkering", and French & Wilson statistics - known in other places that claim priority as Haiker statistics - are not necessary. However, maybe I am wrong, and input from Otwinowski and/or Minor may clarify the situation.
FF
Dr Felix Frolow
Professor of Structural Biology and Biotechnology
Department of Molecular Microbiology and Biotechnology
Tel Aviv University 69978, Israel
Acta Crystallographica D, co-editor
e-mail: [email protected]
Tel: ++972 3640 8723 Fax: ++972 3640 9407 Cellular: ++972 547 459 608
Dear all,
Sorry to jump in late, and also to update the subject line (which had nothing to do with "TLS groups" in the last messages). I would also like to point out an article of which, in fact, Pavel is one of the co-authors - Lunin et al. (2002), Acta Cryst. A58, 270-282 - that is directly relevant to the question of using Fobs=0.
A comparison of maximum-likelihood-based and least-squares refinements, and a development of the ML target into a Taylor series, show that ML refinement has two basic features:
1) it defines the weights for some terms in a very particular way;
2) for weak reflections, especially when the model is incomplete (which is always the case), it suggests fitting Fcalc to 0 and not at all to the measured Fobs.
You may look at the corresponding plots in the article. This means that, complementary to the French & Wilson treatment of the experimental data, an option to use Fobs=0 may not be so meaningless in a refinement program, especially if for some reason one wants to use LS refinement and not ML.
Best regards,
Sacha
PS: My personal request to the phenixbb users - please take care to put a proper subject line on your original message.
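(To make the least-squares side of that point concrete: in a target of the form

T_{LS} = \sum_{hkl} w_{hkl}\,\big(|F_{obs}| - |F_{calc}|\big)^2

a reflection recorded as Fobs = 0 contributes w_{hkl}\,|F_{calc}|^2, i.e. it actively restrains Fcalc toward zero rather than carrying no information - which is just what the Lunin et al. analysis, as summarized above, says the ML target effectively does for weak reflections when the model is incomplete.)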
I doubt this. Scalepack outputs intensities, and the negative ones stay negative. On Tue, 2010-05-18 at 08:44 +0300, Felix Frolow wrote:
Scalepack is already doing this sort of "tinkering", and French & Wilson statistics
-- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
I suspect that French&Wilson statistics is actually applied by phenix internally if the supplied mtz-file only has intensities.
phenix.refine does not use the French&Wilson algorithm internally. I started looking into it a couple times but went on to work on other things when the extent of the data manipulations made me feel uneasy. (Giving the users the option would be nice, though.) Ralf
Now from what you are saying I understand that there is some possibility to get into using non-truncated data with phenix? And not only that, it seems to be the default?
You make it sound like it is a bad thing. The effect of restraint weights (ADP, geometry) most likely has a much bigger impact on the final structure than a small fraction of smallish intensities (*).
The French and Wilson procedure does circumvent issues with negative intensities, but it depends on some prior knowledge of the intensity distribution and uses this to estimate the 'error-free amplitude'. It all sounds nice, but it is not without trouble. If one has pseudo-translational symmetry, for instance, the basic Wilson prior isn't valid. I agree that smallish intensities are useful (if not essential in that case, depending on the degree of the PTS), but a Bayesian update with an incorrect prior might even be worse. Also, why limit oneself to updating negative intensities? This is a fairly arbitrary decision rule. And how does one update - do we update to the most likely posterior intensity or to the most likely posterior amplitude? These need not be the same.
I fully agree with Pavel that a solution should be pursued, and I also agree that the effort/payoff ratio should be used for ranking purposes. There is an intensity-massaging option available in phenix; it is not the default, well hidden, and not for the faint of heart, for the reasons stated above.
Cheers,
Peter
(*): when the fraction is not small but fairly sizeable due to pseudo-translational symmetry, I definitely wouldn't want a French and Wilson procedure let loose on my intensities; an ML intensity target that can deal with this kind of data is the only solution.
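(A footnote to the posterior-intensity-versus-posterior-amplitude question above: because the square root is concave, the two updates genuinely differ - by Jensen's inequality,

\langle F \rangle = \int_0^\infty \sqrt{J}\,p(J \mid I_{obs})\,dJ \;\le\; \sqrt{\langle J \rangle},

so choosing which quantity to report is a real modelling decision, not a cosmetic one.)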
-- ----------------------------------------------------------------- P.H. Zwart Beamline Scientist Berkeley Center for Structural Biology Lawrence Berkeley National Laboratories 1 Cyclotron Road, Berkeley, CA-94703, USA Cell: 510 289 9246 BCSB: http://bcsb.als.lbl.gov PHENIX: http://www.phenix-online.org CCTBX: http://cctbx.sf.net -----------------------------------------------------------------
On Mon, 2010-05-17 at 16:20 -0700, Peter Zwart wrote:
You make it sound like it is a bad thing. The effect of restraint weights (ADP, geometry) has most likely a much bigger impact on the final structure then a small fraction of smallish intensities (*)
Peter,
I think discarding data has to be justified. Two points:
1. The fraction of negative intensities is not necessarily small. It depends on the resolution cutoff (and you make an excellent point about PTS), but looking at scalepack log files I can tell that in my hands the fraction is often 10% or more (DISCLAIMER: I belong to the I/sigma=1 resolution-cutoff cult).
2. Just because these reflections are weak does not mean that they are insignificant. Their contribution to the maps may be small (now I fear another round of the "fill in missing Fobs with Fc for map calculation" discussion), but keeping Fc close to zero for these reflections during refinement seems just as important as keeping Fc close to whatever values the strong reflections have.
The weak reflections are not fundamentally worse than stronger reflections in the same resolution range - they are measured with roughly the same precision. Moreover, the practice of setting negative intensities to zero and then ignoring them in refinement discards those that are barely negative and leaves in those that were (quite randomly) barely positive.
You are absolutely right that other factors will have an impact on the model, but that does not mean that discarding weak data is justified. Crystallographic refinement is a Rube Goldberg machine, and all the components should be as good as we can make them. Perhaps there could be something better than French & Wilson, but discarding negative-intensity reflections is hardly the solution.
Cheers,
Ed.
PS. Personally I have no stake in this, since I always use truncate.
-- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
The philosophical arguments are fine, but is Pavel not justified in asking for real cases where it matters? It's not like he hasn't spent time thinking about it, and it's not like the whole phenix community isn't sitting in his ear about their own favourite missing fix.
I suppose what comes to mind is a case with hectic pseudotranslation causing half the reflections to be systematically almost, but not quite, zero. But then again, I understand that even if you don't toss the weak ones out, current algorithms don't deal with this well anyway, so it needs special treatment (for now: refine in the smaller cell, then rigid-body refine in the super-cell).
phx.
Frank,
Having had a case of such hectic pseudosymmetry, it turns out that the problem originates already in the scaling, which assumes a unimodal intensity distribution. The weak reflections then have their sigmas heavily overestimated and the strong ones overestimated. I had to scale the weak and strong reflections separately and rigid-body refine against the weak ones... Of course this type of problem has little to do with the issue of treating negative intensities.
Esko
Quoting "Frank von Delft"
The philosophical arguments are fine, but is Pavel not justified in asking for real cases where it matters? It's not like he hasn't spent time thinking about it, and it's not like the whole phenix community isn't sitting in his ear about their own favourite missing fix.
I suppose what comes to mind is a case with hectic pseudotranslation causing half the reflections to be systematically almost but not quite zero. But then again, I understand that even if you don't toss the weak ones out, current algorithms don't deal with this well anyway, so it needs special treatment (for now: refine in smaller cell, then rigid-body refine in super-cell).
phx.
On 18/05/2010 18:33, Ed Pozharski wrote:
On Mon, 2010-05-17 at 16:20 -0700, Peter Zwart wrote:
You make it sound like it is a bad thing. The effect of restraint weights (ADP, geometry) has most likely a much bigger impact on the final structure then a small fraction of smallish intensities (*)
Peter,
I think discarding data has to be justified. Two points:
1. The fraction of negative intensities is not necessarily small. It depends on resolution cutoff (and you make an excellent point about PTS), but looking at scalepack log-files I can tell that in my hands the fraction is often 10% or more (DISCLAIMER: I belong to I/sigma=1 resolution cutoff cult).
2. Just because these reflections are weak does not mean that they are insignificant. Their contribution to the maps may be small (now I fear another round of "fill-in missing Fobs with Fc for map calculation" discussion), but keeping Fc close to zero for these reflections during refinement seems to be just as important as to keep Fc close to whatever values the strong reflections have.
The weak reflections are not fundamentally worse than strong(er) reflections in the same resolution range. They are measured with roughly the same precision. Moreover, the practice of setting negative intensities to zero and then ignoring them in refinement discards those that are barely negative and leaves in those that were (quite randomly) barely positive.
You are absolutely right that other factors will have impact on the model. But that does not mean that discarding weak data is justified. Crystallographic refinement is a Rube Goldberg machine, and all the components should be as good as we can make them. Perhaps there could be something better than French&Wilson, but discarding negative intensity reflections is hardly the solution.
Cheers,
Ed.
PS. Personally I have no stake in this, since I always use truncate.
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
If by "scaling" you mean scaling unmerged observations together, I don't think it is right that it assumes a unimodal intensity distribution. In refinement I believe the underlying distributions are assumed to be unimodal, but in data reduction scaling there is no such assumption. That scaling just assumes that the experiment (mainly the diffraction geometry) can be parameterised in such a way that reflections close together in reciprocal space and in measurement time have similar and parameterisable scales. This is independent of the intensity distribution. Phil On 19 May 2010, at 08:44, [email protected] wrote:
I recently went through this - I had a .sca file that had gone through the Scalepack2mtz -> Truncate route, then used it in phenix, and since Imean and SigImean are still present in the MTZ file phenix.refine uses those. As per Pavel, it simply converts them to F's by square root and discards Imean less than or equal to zero. This is the DEFAULT behavior, and so quite pernicious and easy to miss. I caught this while checking R-free in shells and noticing lower completeness than Scalepack indicated, as per my example.
Phil
Dear Phil,
Can we turn the argument on its head ?
well, it depends what you call "head" -:)
Short version:
phenix.refine throws out 2896 reflections of 48895, including 11% of the data in the outermost shell, compared to using TRUNCATE to prep my data. Using the common data subset the structure has a decreased R-free of 0.8% if you refine against the truncate=yes PDB file with the common subset of data.
0.8% at a 24% R-free (24.0 vs 24.8) is pretty significant IMHO.
You cannot compare R-factors that were computed using different sets of reflections. Therefore the above comparison is not valid, obviously. The same applies to your "Longer version". Let's compare apples with apples.
Comparing R-factors in this case does not tell you that one refinement is better or worse than the other. It just doesn't tell you anything, because the R-factor is not a good measure when you are dealing with two different datasets (datasets containing different numbers of reflections).
Pavel.
Hi Phil,
Can you redo the exercise but get your intensities like this:
phenix.reflection_file_converter mysca.sca --massage-intensities \
  --write_mtz_amplitudes --mtz_root_label=FM \
  --mtz=massage.mtz
You did use the common set of miller indices for R value computation, right?
Cheers,
P
You cannot compare the R-factors that were computed using different sets of reflections. Therefore the above comparison is not valid, obviously. Same applies to your "Longer version". Let's compare apples with apples.
Read the example. It's the same set of reflections. In particular I did that because it's the same set of F's, which is not very easy to get to (phenix.refine's reflection utilities don't let me output F if I do a <= truncation on intensity).
Specifically: the default behavior of Phenix from the mtz is to take Imean, throw away anything with Imean < 0, and convert to F. That mtz file contained F's out of CCP4's TRUNCATE, but all the data; phenix.refine throws that data away internally. So I have three selection methods:
1. TRUNCATE F data, which I have to force phenix.refine to use via labels='F,SIGF'
2. F data that phenix.refine converts from Imean
3. A subset of #1 reduced by the selection criteria used in #2
So I take a PDB file refined against option #1 and compare to #3. This contains all the free-R reflections that #2 has, although the F values differ because TRUNCATE modifies them. That's about as fair a comparison as I can find: a PDB file refined against #2 and compared to #2, versus a PDB file refined against #1 and compared to #3. The only difference between #1 and #3 is that #1 contains F's with Imean < 0, altered via TRUNCATE.
Comparing R-factors in this case does not tell that one refinement is better or worse than the other one. It just doesn't tell anything because the R-factor is not a good measure when you deal with two different datasets (datasets containing different amount of reflections).
This would mean that the whole thing is inherently untestable because of phenix.refine's rejection criteria - there will always be a difference in data count because of that. Propose a better experiment. Phil
Dear Phil,
Comparing R-factors in this case does not tell that one refinement is better or worse than the other one. It just doesn't tell anything because the R-factor is not a good measure when you deal with two different datasets (datasets containing different amount of reflections).
This would mean that the whole thing is inherently untestable because of phenix.refine's rejection criteria - there will always be a difference in data count because of that.
I guess this paper nicely explains this: Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T. C. Terwilliger, R. W. Grosse-Kunstleve, P. V. Afonine, P. D. Adams, N. W. Moriarty, P. H. Zwart, R. J. Read, D. Turk and L.-W. Hung Acta Cryst. D63, 597-610 (2007).
Propose a better experiment.
Just a stream of thought (you can tune this up): do two complete structure solutions in parallel, 1) using the dataset containing Fobs=0 and 2) using the dataset with Fobs>0. Given the above paper, you would probably need to build an ensemble of models in each experiment. Then find the differences between the two results and demonstrate that these differences are "important" (or, put differently, analyze the differences and maybe you will find them "important"). Yes, you will need to define what "important" means: a tiny gain in R-factor (or at least making sure it stays the same), revealing some new structural details, building a more complete model, resolving unclear densities, and so on. You would probably need to do this for a bunch of structures (and not just for your favorite one), at different resolutions. If you are lucky, you may run into a case where the building/refinement process gets stuck if you remove Fobs=0 and you magically unstick it by including those Fobs=0. And so on, and so on.
A nice little project for someone who has some extra time to spend. You might even publish it: "On the impact of weak reflections (Fobs=0) on structure solution and final model quality".
But I would not do this myself, because I know it is good to use all data - at the very least it will not harm - and I know it will be done, and phenix.refine will use these data. The only question is priority: since I don't know how important it is (so far no one has convinced me that I have to rush to do it right now), I will not jump into doing it today, but rather keep doing more pressing things. But sometime in the future it will be there.
All the best!
Pavel.
Dear Phil, on a further thought about your "Longer version":
Default behavior yields: Final: r_work = 0.1898 r_free = 0.2479 bonds = 0.007 angles = 1.114 (...) Truncate=yes data yields: Final: r_work = 0.1932 r_free = 0.2473 bonds = 0.007 angles = 1.113
these R-free results above seem "identical" to me (we can compare them since they were computed from the same reflections in both refinements). Indeed, a difference of 0.0006 (0.06% in R-factor terms) - well, this is close to the impact of FFT vs direct summation -:)
Also, see one of my previous posts about possible deviations in R-factors between similar refinement runs. The context was somewhat different but the basic idea stands.
Pavel.
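(A back-of-the-envelope check on whether a difference of 0.0006 in R-free can be meaningful: a commonly quoted rough estimate - the exact prefactor depends on the derivation - puts the statistical spread of R-free at about

\sigma(R_{free}) \sim R_{free}/\sqrt{N_{test}} \approx 0.247/\sqrt{2339} \approx 0.005,

an order of magnitude larger than the difference being discussed here.)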
No, you've missed the point here.
Except they are absolutely not the same set of reflections.
phenix.refine throws away the weakest data in the "default behavior" case - it throws away 5.7% of them, so there are fewer reflections (read down the email and look at the REMARK 3 records from the PDB files). Truncate includes those data in the truncate=yes case, which contains more reflections (see REMARK 3); this additional data will be weaker, so you'd expect R-free and R-work to be higher.
I even agree that it's not wise to compare these two R-frees, which is why I created the data subset in the last test in my earlier email. A more ideal test would be to remove the extra truncate=yes reflections from the test set. However, I don't know of any program that will do that (at least not with my current mtz file), and I'm not going to hand-edit the data since I'm actually using this data. Regenerating a test set would involve concerns about removing work-set bias from the new test set, and that's a can of worms I do not want to enter into.
Phil
Pavel Afonine wrote: Do you have rock solid evidence that substituting missing (unmeasured) Fobs with 0 would be better than just using actual set (Fobs>0) in refinement?
Phil Jeffrey wrote: Fobs = 0 clearly conveys some information (i.e. the reflection is weak). Simply deleting the data is the worst case scenario
The problem seems to be that phenix is using Fobs=0 as a "missing number flag", which precludes its use as a valid measurement. I second the truncate advice - not actually truncating the reflections at zero, but histogram-shifting them above zero (French & Wilson). Even without French & Wilson, the number of reflections that are precisely 0.00000 must be rather small. However, if all negative intensities are set to zero there could be a lot, and it might still be good to refine against them, since they must have been pretty weak: any change which increases their calculated value should be penalized relative to one that decreases it. Ed
The person(s) in charge of cctbx can give a more specific answer, but I suspect this is not the case. While the main format is mtz (and thus NaN is used to specify missing reflections), it appears to me from a quick look at the mmtbx module that internally missing reflections are exactly that - missing from the corresponding python data structures. On Mon, 2010-05-17 at 12:49 -0400, Edward A. Berry wrote:
The problem seems to be that phenix is using Fobs=0 as a "missing number flag" which precludes its use as a valid measurement.
-- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
The person(s) in charge of cctbx can give more specific answer, but I suspect this is not the case. While the main format is mtz (and thus NaN is used to specify missing reflections), it appears to me from a quick look at mmtbx module that internally missing reflections are exactly that - missing from corresponding python data structures.
That's true, but the whole story is slightly more complicated. Reflections with Fobs=0 are ignored at the mtz reading stage if Sigma=0 or Sigma=NaN.
On Mon, 2010-05-17 at 12:49 -0400, Edward A. Berry wrote:
The problem seems to be that phenix is using Fobs=0 as a "missing number flag" which precludes its use as a valid measurement.
I think we could fairly easily change phenix.refine to use Fobs=0 if Sigma > 0. I'll add that to the low-priority to-do list (since there's the truncate workaround). Ralf
On Mon, 2010-05-17 at 13:00 -0700, Ralf W. Grosse-Kunstleve wrote:
That's true, but the whole story is slightly more complicated. Reflections with Fobs=0 are ignored at the mtz reading stage if Sigma=0 or Sigma=NaN.
That's good since both 0 and NaN values for sigma are unreasonable.
I think we could fairly easily change phenix.refine to use Fobs=0 if Sigma > 0. I'll add that to the low-priority to-do list (since there's the truncate workaround).
I guess truncate is not a workaround but rather something one must do. But this option may be useful when one inherits an incorrectly processed dataset. Only a few reflections will have Iobs=0 in any dataset (I just checked and got 8 reflections out of ~18000 with I=0.0 in scalepack output), and truncate will push them into positive territory. Of course, if one got such a crazy dataset with a lot of zeros, a simple trick would be to increment every Fobs by a very small number - that would cheat the Fobs>0 check. Ed. -- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
On Mon, 2010-05-17 at 08:47 -0700, Pavel Afonine wrote:
Do you have rock solid evidence that substituting missing (unmeasured) Fobs with 0 would be better than just using actual set (Fobs>0) in refinement? Or did I miss any relevant paper on this matter? I would appreciate if you point me out. Unless I see a clear evidence that this would improve things I wouldn't waste time on implementing it. Unfortunately I don't have time right now for experimenting with this myself.
Pavel, I don't think anyone (certainly not me) has ever suggested replacing *missing* reflections with zeros. On both occasions (now and last December) the issue was the Fobs=0 reflections introduced by I->F conversion of negative intensities. Ed. -- "I'd jump in myself, if I weren't so good at whistling." Julian, King of Lemurs
participants (14)
- Alexandre Urzhumtsev
- Ed Pozharski
- Edward A. Berry
- esko.oksanen@helsinki.fi
- Felix Frolow
- Frank von Delft
- Nathaniel Echols
- Pavel Afonine
- Peter Zwart
- Phil Evans
- Phil Jeffrey
- Ralf W. Grosse-Kunstleve
- Robert Immormino
- Scott Classen