[phenixbb] Phaser output and error messages

Randy Read rjr27 at cam.ac.uk
Tue Nov 17 04:06:24 PST 2009


Hi,

I think a number of these questions could be answered by looking
carefully through the whole logfile and seeing what it tells you about
what is happening in each step of the calculation.  The primary
literature should also be considered part of what documents a
program.

1. Phaser uses likelihood to solve structures by molecular
replacement, so the best solution is the one with the highest
log-likelihood gain (LLG).  One way to think about this is to
consider molecular replacement as testing a series of hypotheses about
how the molecule is oriented and how it is positioned; likelihood
measures how consistent the data are with each of these hypotheses.
The one with the highest LLG is the one that is supported most
strongly by the data.
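
As a rough illustration of "the best solution is the one with the
highest LLG", here is a minimal Python sketch that scans the text of a
.sol file and returns the SOLU SET line with the largest LLG.  It assumes
the layout seen in the SOLU SET lines quoted from the original message
further down.  It also assumes that when a line carries more than one
LLG annotation, the last one is the most recent value.  Neither
assumption is taken from the Phaser documentation, and best_solution is
a made-up helper name.

```python
import re

# Matches annotations such as "LLG=14" on a "SOLU SET" line.
LLG_PATTERN = re.compile(r"LLG=(-?\d+(?:\.\d+)?)")


def best_solution(sol_text):
    """Return (llg, set_line) for the SOLU SET with the highest LLG, or None."""
    best = None
    for line in sol_text.splitlines():
        if not line.startswith("SOLU SET"):
            continue
        llgs = [float(value) for value in LLG_PATTERN.findall(line)]
        if not llgs:
            continue  # no LLG on this line, so it cannot be ranked
        llg = llgs[-1]  # assumption: the last LLG is the most recent one
        if best is None or llg > best[0]:
            best = (llg, line.strip())
    return best


# Lines copied from the examples quoted in this thread (different models,
# combined here only to exercise the parser).
example = """\
SOLU SET  RFZ=3.8 TFZ=4.1 PAK=0 LLG=21 LLG=20
SOLU 6DIM ENSE model4 EULER  278.058  129.347   33.292 FRAC  0.28446  0.29011 -0.07986
SOLU SET  RFZ=3.7 TFZ=4.6 PAK=0 LLG=25
SOLU 6DIM ENSE model3 EULER  293.943  128.068  332.147 FRAC  0.06273  0.13175  0.25054
SOLU SET  RFZ=3.1 TFZ=5.0 PAK=0
"""

print(best_solution(example))
```

Because the file is already sorted by LLG, this should normally just
confirm that the first SOLU SET is the best one.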

On the point of "tuning" parameters, I'm not sure what you mean.  In a  
particular case, you should know what you put into your  
crystallization drop, and you will usually have some expectation about  
the stoichiometry of any complexes, so you usually have a good idea of  
the possible content of the asymmetric unit and the sequence identity  
of the models (thus giving you a rough idea of the expected RMS  
error).  You may have to test different choices for the number of  
copies, if different numbers are consistent with the range of solvent  
contents observed in crystals, but you can do better than a generic  
assumption of, say, 50% solvent.  The sequence identity <-> RMS error  
relationship is only approximate, but once the structure is solved  
then refinement programs like phenix.refine will do a better job of  
estimating the impact of errors in the coordinates.  However, if you  
only see negative LLG values in a search, then (as the documentation  
says) you should revise the estimated RMS error upwards, because the  
model is clearly worse than you would expect from the sequence identity.
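
To make the two estimates above concrete, here is a small Python sketch.
The solvent fraction uses the standard Matthews relationship (solvent
fraction is roughly 1 - 1.23/Vm, with Vm in cubic Angstroms per dalton),
and the RMS estimate uses the Chothia and Lesk (1986) formula.  The 0.8
Angstrom floor, the function names, the accepted solvent range and the
example crystal are all assumptions chosen for illustration, not values
taken from Phaser itself.

```python
import math


def solvent_fraction(cell_volume_a3, n_symops, n_copies, mw_da):
    """Estimate the solvent fraction from the Matthews coefficient Vm."""
    vm = cell_volume_a3 / (n_symops * n_copies * mw_da)  # A^3 per Da
    return 1.0 - 1.23 / vm


def rms_from_identity(identity):
    """Rough expected RMS error (A) for a sequence identity given as 0..1.

    Chothia & Lesk relationship; the 0.8 A floor is an assumption here.
    """
    return max(0.8, 0.40 * math.exp(1.87 * (1.0 - identity)))


# Hypothetical crystal: 400,000 A^3 cell, 4 symmetry operators, 25 kDa protein.
for copies in (1, 2, 3, 4):
    fraction = solvent_fraction(400_000.0, 4, copies, 25_000.0)
    # Illustrative bounds only; the observed range in crystals is broad.
    plausible = 0.25 <= fraction <= 0.80
    print(copies, round(fraction, 3), "plausible" if plausible else "unlikely")

print(round(rms_from_identity(0.30), 2))
```

When more than one copy number gives a plausible solvent content, each
of them is worth testing in a separate search.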

2. Given that the potential solutions in the .sol file are sorted by  
LLG, I'm not sure where the idea would come from that they could be  
given in the order they were found.  You can follow the solutions in  
the logfile and see this.  The whole computation has to finish before  
it is known which is the best solution.  We have heuristics to stop
Phaser spending too much time looking down blind alleys and, as we
improve our understanding of how to distinguish a correct solution from
noise, we will improve these heuristics.  In other words, Phaser already
stops as soon as we know how to tell that the solution has been found.

The ten-minute timeout is not a good idea.  A Phaser molecular  
replacement run comes after weeks to years of protein expression,  
crystallization and data collection, and before days to months of  
rebuilding, refinement and interpretation, so if it takes 30 minutes,  
two hours or even a day to find a solution, then it doesn't seem too  
long to wait.

3. To help people, in cases where (say) the computer crashes in the  
middle of a long run, we've made Phaser write out  
intermediate .sol, .pdb and .mtz files, so that (in principle) you  
could pick up from the middle, or you could examine an intermediate  
solution, say with 2 of 3 components placed.  If you stop it in the  
middle, then you will get files from, say, after the translation  
search but before the packing check, or after the packing check but  
before the rigid-body refinement.  The results won't be as good, and  
you may well miss something better that would have been found later.

4. If Phaser reports that there is no scattering in a model, it means  
that you have supplied an empty PDB file, or one where all the  
occupancies are equal to zero, or one containing only HETATM records  
and no ATOM records.  If this happens in other circumstances, then it  
would be a bug and we would appreciate seeing the offending PDB file.
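
The three cases above can be checked before running Phaser.  The
following Python sketch counts ATOM records with non-zero occupancy,
reading the occupancy from columns 55-60 of the fixed-column PDB format.
The function name count_scatterers is hypothetical, and this is only a
pre-check under those assumptions, not a reproduction of what Phaser
does internally.

```python
def count_scatterers(pdb_text):
    """Count ATOM records whose occupancy (columns 55-60) is above zero."""
    count = 0
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # HETATM and all other record types are ignored
        try:
            occupancy = float(line[54:60])
        except ValueError:
            continue  # missing or malformed occupancy field
        if occupancy > 0.0:
            count += 1
    return count


# Made-up records: one normal atom, one zero-occupancy atom, one HETATM.
example = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      3  CA  MET A   1      26.266  25.413   2.842  0.00  9.67",
    "HETATM    2  O   HOH A   2      10.000  10.000  10.000  1.00 20.00",
])

print(count_scatterers(example))
```

A result of zero for a file such as model1.pdb would correspond to one
of the three situations listed above.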

I hope that helps.

Regards,

Randy Read

On 15 Nov 2009, at 13:13, Ian Stokes-Rees wrote:

> I'm having some discussion with a colleague about phaser output (we're
> using Phaser 2.1.4).  We haven't been able to find any documentation
> which can clarify our situation, and I'm hoping someone on the list can
> help answer these questions.  I should mention that I am relatively new
> to Phaser.
>
>
>
> 1. PHASER.sol files:  Which "SOLU SET" does Phaser consider to be the
> best?  The first or the last?  Or the one with the highest LLG, wherever
> that may be?  In our experience of running Phaser over several MTZ files
> with a range of models the best Phaser solution has always been the
> first, and this has had the highest LLG.
>
> Note: this is with "untuned" Phaser settings for identity, solvent
> fraction, or number of search models in ASU -- our goal is to do a first
> run with "generic" settings for these over a larger set of models, then
> (from TFZ and LLG scores) select a subset for which we will tune Phaser
> parameters and PDB search model variations.
>
>
>
> 2. If we are right that the first "SOLU SET" entry is indicative of the
> potential for the search model to form a good MR candidate, then is it
> the case that the first entry is the first Phaser solution that is
> computed?  Or is the PHASER.sol file a sorted list output at the end of
> the run?  From my reading of the documentation it is output in order of
> computation, and *for our purposes* (if my first statement in this
> question is correct) Phaser can stop after it outputs this first
> solution.  Is there some way to tell Phaser to stop after the first
> solution is output?
>
> I realize that this doesn't sound like it makes sense (how could Phaser
> know to pick the best solution first, and even if it could, why would it
> ever continue past this point), however I ask because we have put a 10
> minute timeout into our Phaser runs and we have many situations where we
> get a timeout but PHASER.sol has already been generated and the best LLG
> solutions are output first.  It leaves me wondering why it didn't just
> stop on its own after outputting the first result instead of being
> aborted by our (external) timeout that terminates the process?
>
>
>
> 3. PHASER.sol files: For single domain search models, we usually get
> output of the form:
>
> SOLU SET  RFZ=4.5 TFZ=5.2 PAK=0 LLG=14 LLG=14
> SOLU 6DIM ENSE model1 EULER  242.049   45.040  326.088 FRAC -0.09425  0.50268  0.42575
>
> however we see three variations:
>
> i) No LLG:
>
> SOLU SET  RFZ=3.1 TFZ=5.0 PAK=0
> SOLU 6DIM ENSE model2 EULER   59.983   69.335  319.701 FRAC -1.17131 -0.70030  0.23150
>
> ii) One LLG:
>
> SOLU SET  RFZ=3.7 TFZ=4.6 PAK=0 LLG=25
> SOLU 6DIM ENSE model3 EULER  293.943  128.068  332.147 FRAC  0.06273  0.13175  0.25054
>
> iii) Two LLG entries, but with different values:
>
> SOLU SET  RFZ=3.8 TFZ=4.1 PAK=0 LLG=21 LLG=20
> SOLU 6DIM ENSE model4 EULER  278.058  129.347   33.292 FRAC  0.28446  0.29011 -0.07986
>
>
>
> 4. Occasionally we get an error that we don't understand:
>
> FATAL RUNTIME ERROR: No scattering in pdbfile model1.pdb
>
> What does this mean?  Is there a problem with the PDB file?  We can't
> see anything obvious in the ones which produce this error.
>
> Thanks,
>
> Ian
>
> -- 
> Ian Stokes-Rees, Research Associate
> SBGrid, Harvard Medical School
> http://sbgrid.org
>
> _______________________________________________
> phenixbb mailing list
> phenixbb at phenix-online.org
> http://www.phenix-online.org/mailman/listinfo/phenixbb

------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research      Tel: + 44 1223 336500
Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
Hills Road                                    E-mail: rjr27 at cam.ac.uk
Cambridge CB2 0XY, U.K.                       www-structmed.cimr.cam.ac.uk



