Randy Read
rjr27 at cam.ac.uk
Wed Dec 5 13:30:43 PST 2012
Dear Phil,
Yes, you're exactly right: the Z-scores come from the initial fast scoring, so if the LLG rescoring changes the relative order of the peaks, the Z-scores will look out of sync.
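As a quick check of this, the three peaks in Phil's log quoted below can be sorted by each column. This is only a sketch: the numbers are copied from the log, and reading the "raw" value of the raw/top column as the fast search score is my assumption, not something the log states.

```python
# Three peaks copied from the quoted log:
# (rank after rescoring, rank before rescoring, LLG, Z-score, raw score)
# Assumption: the "raw" value is the fast search score (FSS).
peaks = [
    (1, 6, 42.16, 5.14, 52.58),
    (2, 3, 40.23, 5.24, 53.18),
    (3, 10, 40.18, 4.82, 50.69),
]


def order_by(column):
    """Return the post-rescoring ranks sorted by one column, highest first."""
    ranked = sorted(peaks, key=lambda p: p[column], reverse=True)
    return [p[0] for p in ranked]


by_llg = order_by(2)  # order implied by the rescored LLG
by_z = order_by(3)    # order implied by the printed Z-score
by_raw = order_by(4)  # order implied by the raw score

print("LLG order:    ", by_llg)
print("Z-score order:", by_z)
print("raw order:    ", by_raw)
```

For these three peaks the Z-score order agrees with the raw-score order (and with the rank before rescoring), not with the LLG order.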
This is a new feature, and it still catches me off guard occasionally as well. The reason for the change is that the overhead of rescoring 500 random translations for each orientation, just to get an LLG-based Z-score, was a large part of the overall computational expense of difficult MR problems, and Airlie realised that we could eliminate most of that without much (any?) impact on finding the correct answer. Instead, a total of 500 random translations over all orientations are rescored, so that an overall Z-score can be computed for each peak once all the translations for all the rotations have been collected.
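To illustrate the pooled idea, here is a minimal sketch of a Z-score computed against a single set of random LLG values. It is not Phaser's code: the function name, the use of a population standard deviation, and the random LLG values are all assumptions made for illustration. Only the three peak LLGs come from the log quoted below.

```python
import statistics


def pooled_z_scores(peak_llgs, random_llgs):
    """Z-score of each peak LLG against one pooled random sample.

    Hypothetical helper: the mean and standard deviation are taken once,
    over the LLGs of all randomly sampled translations.
    """
    mean = statistics.fmean(random_llgs)
    sd = statistics.pstdev(random_llgs)  # population SD is an assumption
    if sd == 0.0:
        raise ValueError("random sample has zero spread")
    return [(llg - mean) / sd for llg in peak_llgs]


# Peak LLGs from the quoted log; the random-sample LLGs are invented.
peak_llgs = [42.16, 40.23, 40.18]
random_llgs = [1.0, 3.0, 5.0]

z = pooled_z_scores(peak_llgs, random_llgs)
print(z)
```

Because every peak is compared with the same mean and standard deviation, a Z-score defined this way always sorts in the same order as the LLG.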
The header for that section of the logfile is misleading, so I'll see if I can clarify that tomorrow.
By the way, if anyone has examples where changes like this stop Phaser from succeeding where it used to, then please tell us.
Regards,
Randy
-----
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research Tel: +44 1223 336500
Wellcome Trust/MRC Building Fax: +44 1223 336827
Hills Road E-mail: rjr27 at cam.ac.uk
Cambridge CB2 0XY, U.K. www-structmed.cimr.cam.ac.uk
On 5 Dec 2012, at 00:17, Phil Jeffrey wrote:
> Phaser from CCP4 6.3.0 built via fink running on OSX 10.6.8 in 64-bit mode.
>
> This is a snippet from a much larger run, still ongoing. In multiple instances the order of the LLG and the Z-score in the translation function are not in sync (42.16, 40.23, 40.18 vs 5.14, 5.24, 4.82), and it looks like the Z-scores are still shown in the peak rank order before rescoring, while the LLGs are in the order after rescoring.
>
>
> SET #5 of 6 TRIAL #60 of 115
> ----------------------------
> Search Euler = 263.0 82.1 85.7, Ensemble = ensemb3
>
> ANNOTATION: RFZ=3.6 TFZ=3.8 PAK=0 LLG=26 LLG=30
> Known MR solutions
> SOLU SPAC C 1 2 1
> SOLU 6DIM ENSE ensemb3 EULER 30.6 65.7 246.9 FRAC 0.90 0.00 0.17 BFAC -5.14
>
> Grid sampling: 0.825029 Angstroms
>
> Select peaks over 67.5% of top (i.e. 0.675*(top-mean)+mean)
> Top 360 translations before clustering will be rescored
> Calculating Likelihood for TF SET #5 of 6 TRIAL #60 of 115
> 0% 100%
> |=========================================================================| DONE
>
> Scoring 1 randomly sampled translations
> Generating Statistics for TF SET #5 of 6 TRIAL #60 of 115
> 0% 100%
> |==| DONE
>
>
> Top Peaks With Clustering
> -------------------------
> # Rank of the peak after rescoring search points
> (#) Rank of the peak before rescoring search points
> LLG Log-Likelihood Gain
> Z-Score Number of standard deviations of LLG above the mean
> FSS Fast Search Score
>
> Select all peaks
> There were 159 peaks
>   #  (#)  Frac X  Frac Y  Frac Z     LLG  Z-score  Split  #Group       raw/top
>   1    6   0.842   0.923   0.779  +42.16     5.14      0       2  52.58/ 52.58
>   2    3   0.369   0.865   0.553  +40.23     5.24     37       2  53.18/ 53.18
>   3   10   0.743   0.937   0.261  +40.18     4.82     48       2  50.69/ 50.69
> #SITES = 159: OUTPUT TRUNCATED TO 3 SITES
>
