[phenixbb] phenix and weak data
Randy Read
rjr27 at cam.ac.uk
Wed Dec 12 07:47:19 PST 2012
On 12 Dec 2012, at 15:36, Douglas Theobald wrote:
> On Dec 12, 2012, at 1:46 AM, Ed Pozharski <epozh001 at UMARYLAND.EDU> wrote:
>
>> On Tue, 2012-12-11 at 11:27 -0500, Douglas Theobald wrote:
>>
>>> What is the evidence, if any, that the exptl sigmas are actually negligible compared to fit beta (is it alluded to in Lunin 2002)? Is there somewhere in phenix output I can verify this myself?
>>
>> Essentially, equation 4 in Lunin (2002) is the same as equation 14 in
>> Murshudov (1997) or equation 1 in Cowtan (2005) or 12-79 in Rupp (2010).
>> The difference is that instead of combination of sigf^2 and sigma_wc you
>> have a single parameter, beta. One can do that assuming that
>> sigf<<sqrt(beta). Phenix log files list optimized beta parameter in
>> each resolution shell.
>
> From the log file:
>
> |-----------------------------------------------------------------------------|
> |R-free likelihood based estimates for figures of merit, absolute phase error,|
> |and distribution parameters alpha and beta (Acta Cryst. (1995). A51, 880-887)|
> |                                                                             |
> | Bin    Resolution      No. Refl.     FOM   Phase  Scale   Alpha        Beta |
> |   #      range          work  test         error factor                     |
> |  1: 44.4859 -  3.0705  14086   154   0.93  12.12   1.00    0.98   118346.13 |
> |  2:  3.0705 -  2.4372  13777   149   0.91  15.26   1.00    0.99    58331.77 |
> |  3:  2.4372 -  2.1291  13644   148   0.94  11.42   1.00    0.99    23216.31 |
>
> it appears that phenix estimates alpha and beta from the R-free set rather than from the working set (I might be misreading that). Is that correct?
Yes, using the cross-validation data was a key step in getting maximum likelihood refinement to work. A long time ago (a few years before our first paper on ML refinement) I implemented a first version of the MLF target we put into CNS, but the sigmaA values were estimated from the working data. What happened was that the data would be over-fit, then the sigmaA estimates would go up (with part of the increase being a result of the over-fitting), then in the next cycle the pressure to fit the data relative to the restraints would be higher, and so on.

The best I could claim for this at the time was that the resulting models were at least as good as the ones from least-squares refinement, but the R-factors were higher (indicating that there was still less over-fitting). It would have been hard to sell the advantage of higher R-factors to the protein crystallography community, so it was good that, when we started using cross-validated sigmaA values, the convergence radius improved and we could get significantly better models with lower R-factors.

I think you'll find that all the programs, not just phenix.refine, use only the cross-validation data to estimate the variance parameters for the likelihood target.
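
On the earlier question of checking sigf << sqrt(beta) from the output: a minimal sketch, not part of phenix, that pulls the per-bin beta values out of the log excerpt quoted above and prints sqrt(beta) for each resolution shell. The function name parse_bins and the regular expression are hypothetical helpers written for this excerpt only. Comparing the result against the experimental sigmas assumes that both are on the same scale, which should be confirmed before drawing conclusions.

```python
import math
import re

# Rows copied from the log excerpt quoted in this thread.
LOG_EXCERPT = """\
| 1: 44.4859 - 3.0705 14086 154 0.93 12.12 1.00 0.98 118346.13|
| 2: 3.0705 - 2.4372 13777 149 0.91 15.26 1.00 0.99 58331.77|
| 3: 2.4372 - 2.1291 13644 148 0.94 11.42 1.00 0.99 23216.31|
"""

# Hypothetical pattern for one table row: bin, resolution range, work/test
# counts, FOM, phase error, scale factor, alpha, beta.
ROW = re.compile(
    r"(\d+):\s*([\d.]+)\s*-\s*([\d.]+)\s+(\d+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*\|"
)


def parse_bins(text):
    """Return one dict per resolution bin found in the log text."""
    bins = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m is None:
            continue  # header, separator, or unrelated line
        beta = float(m.group(10))
        bins.append({
            "bin": int(m.group(1)),
            "d_max": float(m.group(2)),
            "d_min": float(m.group(3)),
            "n_work": int(m.group(4)),
            "n_test": int(m.group(5)),
            "alpha": float(m.group(9)),
            "beta": beta,
            # sqrt(beta) is the quantity to compare against sigf
            "sqrt_beta": math.sqrt(beta),
        })
    return bins


if __name__ == "__main__":
    for b in parse_bins(LOG_EXCERPT):
        print(f"bin {b['bin']}: {b['d_max']:.2f}-{b['d_min']:.2f} A, "
              f"n_test={b['n_test']}, sqrt(beta)={b['sqrt_beta']:.1f}")
```

The n_test column is worth watching as well, since alpha and beta in each shell are estimated from only that many free reflections.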
Randy
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research Tel: + 44 1223 336500
Wellcome Trust/MRC Building Fax: + 44 1223 336827
Hills Road E-mail: rjr27 at cam.ac.uk
Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk