[cctbxbb] Model based outlier calculation in mmtbx.scaling.outlier_rejection

Peter Zwart phzwart at gmail.com
Sun Sep 8 22:56:13 PDT 2013


The reason for the power of N might have to do with extreme value statistics.
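The extreme-value idea can be made concrete under an assumption this thread does not establish: that the N per-reflection statistics are independent chi-square variables with one degree of freedom, each with CDF F(x) = erf(sqrt(x/2)). The largest of N such values then has CDF F(x)**N, so the chance that the largest one reaches x is 1 - F(x)**N. With x = 2*LLG this has the same shape as the 1 - erf(sqrt(LLG))**N in the code. The function names below are hypothetical, and the simulation is only a sanity check.

```python
import math
import random


def chi2_1dof_cdf(x):
    """CDF of a chi-square variable with one degree of freedom: erf(sqrt(x/2))."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


def max_of_n_pvalue(x, n):
    """P(max of n independent chi2(1) draws >= x) = 1 - F(x)**n (independence assumed)."""
    return 1.0 - chi2_1dof_cdf(x) ** n


def simulate_max_exceedance(x, n, trials=10000, seed=0):
    """Monte Carlo estimate of the same probability (sanity check only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The square of a standard normal draw is a chi2(1) draw.
        largest = max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        if largest >= x:
            hits += 1
    return hits / trials


# A "3 sigma" value (x = 9) is rare for one reflection but common among 50.
print(max_of_n_pvalue(9.0, 1))   # about 0.0027
print(max_of_n_pvalue(9.0, 50))  # about 0.126
print(simulate_max_exceedance(9.0, 50))
```

Under this reading, the **N turns a per-reflection tail probability into the probability that the most extreme of N reflections is at least that extreme, so a value that looks significant on its own is judged against how often such a value would appear somewhere in the whole data set.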

Tomorrow I'll take a detailed look.

On Sep 8, 2013, at 18:33, Keitaro Yamashita <k.yamashita at spring8.or.jp> wrote:

> Dear all,
> 
> Thank you for your replies. I have already looked at the Read (1999)
> paper, but I could not find a direct explanation of this implementation
> (or of what the message in the code describes).
> 
> Thanks to advice from a friend, I understand that the code performs
> something like a likelihood-ratio test. The square root is taken
> because the cumulative distribution function of the chi-square
> distribution with one degree of freedom is erf(sqrt(x/2)). However, I
> still do not understand why the result is raised to the power of N (**N).
> I would be grateful if you could explain the reason.
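The square-root argument above can be checked numerically. A chi-square variable with one degree of freedom is the square of a standard normal Z, so P(Z**2 <= x) = 2*Phi(sqrt(x)) - 1 = erf(sqrt(x/2)). Assuming the statistic being tested is 2*LLG, evaluating that CDF at x = 2*LLG gives erf(sqrt(LLG)), which is the erf(sqrt(LLG)) that appears in the code. The function names are hypothetical; only the Python standard library is used, and x is assumed non-negative.

```python
import math
from statistics import NormalDist


def chi2_1dof_cdf_via_normal(x):
    """P(Z**2 <= x) for Z ~ N(0, 1), written as 2*Phi(sqrt(x)) - 1 (x >= 0)."""
    return 2.0 * NormalDist().cdf(math.sqrt(x)) - 1.0


def chi2_1dof_cdf_via_erf(x):
    """The closed form quoted in the thread: erf(sqrt(x/2)) (x >= 0)."""
    return math.erf(math.sqrt(x / 2.0))


def cdf_of_twice_llg(llg):
    """Assuming 2*LLG ~ chi2(1), its CDF at 2*LLG simplifies to erf(sqrt(LLG))."""
    return chi2_1dof_cdf_via_erf(2.0 * llg)


# The two forms of the chi2(1) CDF agree to floating-point precision.
for x in (0.1, 1.0, 4.0, 9.0):
    print(x, chi2_1dof_cdf_via_normal(x), chi2_1dof_cdf_via_erf(x))
```

The factor of 2 in 2*LLG cancels the /2 inside the square root, which is why the code can pass LLG itself to sqrt.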
> 
> Best regards,
> Keitaro
> 
> 2013/9/6 Peter Zwart <PHZwart at lbl.gov>:
>> Hi,
>> 
>> It has been a while since I wrote this, and you could well be right
>> that I forgot to divide by the second derivative; I'll have a look.
>> 
>> P
>> 
>> 
>> On 5 September 2013 02:31, Keitaro Yamashita <k.yamashita at spring8.or.jp>
>> wrote:
>>> 
>>> Dear cctbx developers,
>>> 
>>> I am interested in the implementation of model-based reflection
>>> outlier rejection. Reading the code in
>>> mmtbx/scaling/outlier_rejection.py (lines 244-351), I noticed what
>>> may be a discrepancy between what log_message explains and the
>>> actual code. The log_message in the code says:
>>> 
>>>> Outliers are rejected on the basis of the assumption that a scaled
>>>> log likelihood difference 2(log[P(Fobs)]-log[P(Fmode)])/Q" is distributed
>>>> according to a Chi-square distribution (Q" is equal to the second
>>>> derivative of the log likelihood function of the mode of the
>>>> distribution).
>>>> The outlier threshold of the p-value relates to the p-value of the
>>>> extreme value distribution of the chi-square distribution.
>>> 
>>> whereas the actual p_value is calculated for each hkl as
>>> p_value = 1 - erf(sqrt(LLG))**N,
>>> where
>>> LLG = log p(F=Fbar | Fmodel) - log p(F=Fobs | Fmodel),
>>> and N is the number of reflections. Here, Fbar is the F that
>>> maximizes p(F | Fmodel). In particular, Q (the second derivative
>>> of log p(F | Fmodel) at F=Fbar) is not used in the actual
>>> calculation.
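The calculation described above can be sketched in a few lines. This is a minimal illustration, not the mmtbx implementation: a Gaussian log-density (the hypothetical `toy_gaussian_log_p`) stands in for log p(F | Fmodel), its mode Fbar is simply the mean, and the 1e-3 threshold is an arbitrary choice for the example. With this stand-in, 2*LLG equals ((Fobs - mean)/sigma)**2, which is chi-square with one degree of freedom.

```python
import math


def toy_gaussian_log_p(f, mean, sigma):
    """Hypothetical stand-in for log p(F | Fmodel); not the mmtbx likelihood."""
    return -0.5 * ((f - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def outlier_p_values(log_p_obs, log_p_mode):
    """p = 1 - erf(sqrt(LLG))**N with LLG = log p(Fbar) - log p(Fobs), as quoted in the thread."""
    n = len(log_p_obs)  # N = number of reflections
    p_values = []
    for lp_obs, lp_mode in zip(log_p_obs, log_p_mode):
        llg = max(lp_mode - lp_obs, 0.0)  # clamp tiny negative round-off at the mode
        p_values.append(1.0 - math.erf(math.sqrt(llg)) ** n)
    return p_values


means = [10.0, 10.0, 10.0, 10.0]
sigmas = [1.0, 1.0, 1.0, 1.0]
f_obs = [10.2, 9.5, 11.0, 15.0]

log_p_obs = [toy_gaussian_log_p(f, m, s) for f, m, s in zip(f_obs, means, sigmas)]
# For a Gaussian the mode Fbar is the mean itself.
log_p_mode = [toy_gaussian_log_p(m, m, s) for m, s in zip(means, sigmas)]

p_values = outlier_p_values(log_p_obs, log_p_mode)
threshold = 1e-3  # hypothetical cutoff, chosen only for this example
flagged = [i for i, p in enumerate(p_values) if p < threshold]
print(flagged)  # [3]
```

Only the 15.0 observation, five standard deviations from the mean, falls below the cutoff. Because of the **N, a larger N makes any given deviation look less surprising, so more extreme values are needed before a reflection is flagged.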
>>> 
>>> Could someone please explain the meaning of the actual calculation?
>>> Why take the square root and raise the erf() result to the power of N?
>>> 
>>> Thank you very much,
>>> Keitaro
>>> _______________________________________________
>>> cctbxbb mailing list
>>> cctbxbb at phenix-online.org
>>> http://phenix-online.org/mailman/listinfo/cctbxbb
>> 
>> 
>> 
>> 
>> --
>> -----------------------------------------------------------------
>> P.H. Zwart
>> Research Scientist
>> Berkeley Center for Structural Biology
>> Lawrence Berkeley National Laboratories
>> 1 Cyclotron Road, Berkeley, CA-94703, USA
>> Cell: 510 289 9246
>> BCSB:      http://bcsb.als.lbl.gov
>> PHENIX:   http://www.phenix-online.org
>> SASTBX:  http://sastbx.als.lbl.gov
>> -----------------------------------------------------------------
>> 