[cctbxbb] Model based outlier calculation in mmtbx.scaling.outlier_rejection

Sun Sep 8 18:33:55 PDT 2013

Dear all,

Thank you for your replies. I have already looked at the Read (1999)
paper, but I could not find a direct explanation of this implementation
(or of what the message in the code describes).

Thanks to advice from a friend, I now understand that the code does
something like a likelihood-ratio test. The square root is taken
because the cumulative distribution function of the chi-square
distribution with one degree of freedom is erf(sqrt(x/2)), so with
x = 2*LLG it becomes erf(sqrt(LLG)). However, I still
do not understand why it is raised to the power of N (**N).
I would be grateful if you could explain the reason.
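
For reference, the identity above and the quoted p_value formula can be
checked numerically. The following is only a minimal sketch using the
Python standard library. The function names are hypothetical and are not
part of mmtbx, and the code is not taken from outlier_rejection.py. The
comment on the power of N states a general property of independent
draws; whether that is the intended justification in the code is an
assumption not confirmed in this thread.

```python
import math
import random


def chi2_cdf_1dof(x):
    """CDF of the chi-square distribution with one degree of freedom.

    If Z is standard normal, P(Z**2 <= x) = P(|Z| <= sqrt(x))
    = erf(sqrt(x / 2)).
    """
    return math.erf(math.sqrt(x / 2.0))


def empirical_cdf_of_squared_normal(x, samples=100_000, seed=0):
    """Monte Carlo estimate of P(Z**2 <= x) for standard normal Z."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if rng.gauss(0.0, 1.0) ** 2 <= x)
    return hits / samples


def p_value(llg, n):
    """The formula quoted in this thread: 1 - erf(sqrt(LLG))**N.

    With x = 2 * LLG, erf(sqrt(LLG)) equals chi2_cdf_1dof(x).
    For N independent draws with CDF F, F(x)**N is the CDF of the
    largest draw (general fact; its relevance here is an assumption).
    """
    return 1.0 - math.erf(math.sqrt(llg)) ** n


if __name__ == "__main__":
    x = 2.0
    print("analytic  CDF:", chi2_cdf_1dof(x))
    print("empirical CDF:", empirical_cdf_of_squared_normal(x))
    # Hypothetical values of LLG and N, chosen only for illustration.
    print("p_value(LLG=5, N=1):   ", p_value(5.0, 1))
    print("p_value(LLG=5, N=1000):", p_value(5.0, 1000))
```

With N = 1 the formula reduces to the ordinary chi-square tail
probability for x = 2*LLG. For larger N the same LLG gives a larger
p_value.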

Best regards,
Keitaro

2013/9/6 Peter Zwart <PHZwart at lbl.gov>:
> Hi,
>
> It has been a while since I wrote this and you could potentially be right
> that I forgot to divide by the second derivative. I'll have a look.
>
> P
>
>
> On 5 September 2013 02:31, Keitaro Yamashita <k.yamashita at spring8.or.jp>
> wrote:
>>
>> Dear cctbx developers,
>>
>> I am interested in the implementation of model-based reflection
>> outlier rejection. While reading
>> mmtbx/scaling/outlier_rejection.py (lines 244-351), I noticed what
>> may be a discrepancy between what log_message explains and what
>> the code actually does. The log_message in the code says:
>>
>> > Outliers are rejected on the basis of the assumption that a scaled
>> > log likelihood differnce 2(log[P(Fobs)]-log[P(Fmode)])/Q\" is
>> > distributed
>> > according to a Chi-square distribution (Q\" is equal to the second
>> > derivative of the log likelihood function of the mode of the
>> > distribution).
>> > The outlier threshold of the p-value relates to the p-value of the
>> > extreme value distribution of the chi-square distribution.
>>
>> while the actual p_value is calculated for each hkl as
>> p_value = 1 - erf(sqrt(LLG))**N,
>> where
>> LLG = log p(F=Fbar | Fmodel) - log p(F=Fobs | Fmodel),
>> and N is the number of reflections. Here, Fbar is the value of F that
>> maximizes p(F | Fmodel). At least, Q (the second
>> derivative of p(F=Fbar | Fmodel)) is not used in the actual
>> calculation.
>>
>> Could someone please explain the meaning of the actual calculation?
>> Why is the square root taken, and why is the erf() result raised to
>> the power of N?
>>
>> Thank you very much,
>> Keitaro
>> _______________________________________________
>> cctbxbb mailing list
>> cctbxbb at phenix-online.org
>> http://phenix-online.org/mailman/listinfo/cctbxbb
>
>
>
>
> --
> -----------------------------------------------------------------
> P.H. Zwart
> Research Scientist
> Berkeley Center for Structural Biology
> Lawrence Berkeley National Laboratories
> 1 Cyclotron Road, Berkeley, CA-94703, USA
> Cell: 510 289 9246
> BCSB:      http://bcsb.als.lbl.gov
> PHENIX:   http://www.phenix-online.org
> SASTBX:  http://sastbx.als.lbl.gov
> -----------------------------------------------------------------
>