[phenixbb] Phaser output and error messages
Randy Read
rjr27 at cam.ac.uk
Tue Nov 17 04:06:24 PST 2009
Hi,
I think a number of these questions could be answered by looking
carefully through the whole logfile and seeing what it tells you about
what is happening in each step of the calculation. As well, the
primary literature should be considered to be part of what documents a
program.
1. Phaser uses likelihood to solve structures by molecular
replacement, so the best solution is the one with the highest log-
likelihood-gain (LLG). One of the ways we talk about this is to
consider molecular replacement as testing a series of hypotheses about
how the molecule is oriented and how it is positioned, and likelihood
measures how consistent the data are with each of these hypotheses.
The one with the highest LLG is the one that is supported most
strongly by the data.
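Here is a minimal Python sketch of picking the highest-LLG hypothesis from a .sol file. It scans the SOLU SET lines for LLG= annotations. The function names are hypothetical. Treating the last LLG= value on a line as the most up-to-date one is an assumption and is not stated in this thread. Sets with no LLG= value are skipped.

```python
import re

# Matches one "LLG=<number>" annotation; a SOLU SET line may carry zero, one or several.
LLG_RE = re.compile(r"LLG=(-?\d+(?:\.\d+)?)")


def parse_solu_sets(text):
    """Return (set_line, llg) pairs, one per SOLU SET line in a .sol file.

    llg is the LAST LLG= value on the line (assumed to be the most recent
    stage), or None if the line carries no LLG= annotation.
    """
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("SOLU SET"):
            continue
        values = [float(v) for v in LLG_RE.findall(line)]
        results.append((line, values[-1] if values else None))
    return results


def best_solution(text):
    """Return the (set_line, llg) pair with the highest LLG, or None if no set has one."""
    scored = [(line, llg) for line, llg in parse_solu_sets(text) if llg is not None]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])
```

Because the choice uses max() over LLG, it does not depend on where a set appears in the file.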
On the point of "tuning" parameters, I'm not sure what you mean. In a
particular case, you should know what you put into your
crystallization drop, and you will usually have some expectation about
the stoichiometry of any complexes, so you usually have a good idea of
the possible content of the asymmetric unit and the sequence identity
of the models (thus giving you a rough idea of the expected RMS
error). You may have to test different choices for the number of
copies, if different numbers are consistent with the range of solvent
contents observed in crystals, but you can do better than a generic
assumption of, say, 50% solvent. The sequence identity <-> RMS error
relationship is only approximate, but once the structure is solved
then refinement programs like phenix.refine will do a better job of
estimating the impact of errors in the coordinates. However, if you
only see negative LLG values in a search, then (as the documentation
says) you should revise the estimated RMS error upwards, because the
model is clearly worse than you would expect from the sequence identity.
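The two estimates described above can be sketched in Python under some stated assumptions. The solvent fraction uses the standard Matthews relation, solvent fraction = 1 - 1.23/Vm, with Vm in cubic angstroms per dalton. The RMS estimate uses the classic Chothia & Lesk (1986) fit. Phaser's own internal conversion may differ. The 30-70% solvent window, the 0.2 A step in revise_rms and all of the function names are hypothetical and only serve as illustration.

```python
import math


def solvent_fraction(cell_volume, n_sym_ops, n_copies, mw_per_copy):
    """Matthews-style solvent fraction for a trial number of copies in the ASU.

    cell_volume in cubic angstroms, mw_per_copy in daltons; n_sym_ops is the
    number of asymmetric units in the unit cell.
    """
    vm = cell_volume / (n_sym_ops * n_copies * mw_per_copy)  # Matthews coefficient
    return 1.0 - 1.23 / vm


def plausible_copy_numbers(cell_volume, n_sym_ops, mw_per_copy,
                           lo=0.3, hi=0.7, max_copies=12):
    """Copy numbers whose implied solvent fraction lies in [lo, hi] (hypothetical window)."""
    return [n for n in range(1, max_copies + 1)
            if lo <= solvent_fraction(cell_volume, n_sym_ops, n, mw_per_copy) <= hi]


def chothia_lesk_rms(identity):
    """Approximate RMS error in angstroms from sequence identity (fraction 0-1).

    Chothia & Lesk (1986): rms = 0.40 * exp(1.87 * H), H = fraction of mutated residues.
    """
    return 0.40 * math.exp(1.87 * (1.0 - identity))


def revise_rms(rms, llgs, step=0.2):
    """Raise the RMS estimate if a search produced only negative LLG values."""
    if llgs and all(llg < 0 for llg in llgs):
        return rms + step
    return rms
```

For example, a 1,000,000 cubic angstrom cell with 4 asymmetric units and a 25 kDa search model gives 3, 4 or 5 copies within the 30-70% window. Each of those copy numbers would then be a separate hypothesis to test.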
2. Given that the potential solutions in the .sol file are sorted by
LLG, I'm not sure where the idea would come from that they could be
given in the order they were found. You can follow the solutions in
the logfile and see this. The whole computation has to finish before
it is known which is the best solution. We have heuristics to stop
Phaser spending too much time looking down blind alleys and, as we
improve our understanding of how to recognize a correct solution from
noise, we will improve these heuristics. So we already stop as early
as we currently know how to once the solution has been found.
The ten-minute timeout is not a good idea. A Phaser molecular
replacement run comes after weeks to years of protein expression,
crystallization and data collection, and before days to months of
rebuilding, refinement and interpretation, so if it takes 30 minutes,
two hours or even a day to find a solution, then it doesn't seem too
long to wait.
3. To help people, in cases where (say) the computer crashes in the
middle of a long run, we've made Phaser write out
intermediate .sol, .pdb and .mtz files, so that (in principle) you
could pick up from the middle, or you could examine an intermediate
solution, say with 2 of 3 components placed. If you stop it in the
middle, then you will get files from, say, after the translation
search but before the packing check, or after the packing check but
before the rigid-body refinement. The results won't be as good, and
you may well miss something better that would have been found later.
4. If Phaser reports that there is no scattering in a model, it means
that you have supplied an empty PDB file, or one where all the
occupancies are equal to zero, or one containing only HETATM records
and no ATOM records. If this happens in other circumstances, then it
would be a bug and we would appreciate seeing the offending PDB file.
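The three cases listed above can be checked before running Phaser with a short Python sketch. The function name is hypothetical. The sketch reads the occupancy from columns 55-60 of standard PDB ATOM records. Following the HETATM case above, it assumes that only ATOM records contribute scattering. An empty result does not rule out other problems with the file.

```python
def no_scattering_reasons(pdb_text):
    """List which of the three 'no scattering' cases apply to a PDB file's text.

    An empty list means none of them applies. Only ATOM records are assumed
    to count as scatterers, so HETATM-only files are flagged.
    """
    lines = pdb_text.splitlines()
    atom_lines = [l for l in lines if l.startswith("ATOM")]
    hetatm_lines = [l for l in lines if l.startswith("HETATM")]
    reasons = []
    if not atom_lines and not hetatm_lines:
        reasons.append("no ATOM or HETATM records (empty model)")
    elif not atom_lines:
        reasons.append("only HETATM records, no ATOM records")
    else:
        occupancies = []
        for line in atom_lines:
            field = line[54:60].strip()  # occupancy field, columns 55-60
            try:
                occupancies.append(float(field))
            except ValueError:
                pass  # skip malformed or truncated records
        if occupancies and all(occ == 0.0 for occ in occupancies):
            reasons.append("all ATOM occupancies are zero")
    return reasons
```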
I hope that helps.
Regards,
Randy Read
On 15 Nov 2009, at 13:13, Ian Stokes-Rees wrote:
> I'm having some discussion with a colleague about phaser output (we're
> using Phaser 2.1.4). We haven't been able to find any documentation
> which can clarify our situation, and I'm hoping someone on the list can
> help answer these questions. I should mention that I am relatively new
> to Phaser.
>
>
>
> 1. PHASER.sol files: Which "SOLU SET" does Phaser consider to be the
> best? The first or the last? Or the one with the highest LLG, wherever
> that may be? In our experience of running Phaser over several MTZ files
> with a range of models the best Phaser solution has always been the
> first, and this has had the highest LLG.
>
> Note: this is with "untuned" Phaser settings for identity, solvent
> fraction, or number of search models in ASU -- our goal is to do a first
> run with "generic" settings for these over a larger set of models, then
> (from TFZ and LLG scores) select a subset for which we will tune Phaser
> parameters and PDB search model variations.
>
>
>
> 2. If we are right that the first "SOLU SET" entry is indicative of the
> potential for the search model to form a good MR candidate, then is it
> the case that the first entry is the first Phaser solution that is
> computed? Or is the PHASER.sol file a sorted list output at the end of
> the run? From my reading of the documentation it is output in order of
> computation, and *for our purposes* (if my first statement in this
> question is correct) Phaser can stop after it outputs this first
> solution. Is there some way to tell Phaser to stop after the first
> solution is output?
>
> I realize that this doesn't sound like it makes sense (how could Phaser
> know to pick the best solution first, and even if it could, why would it
> ever continue past this point), however I ask because we have put a 10
> minute timeout into our Phaser runs and we have many situations where we
> get a timeout but PHASER.sol has already been generated and the best LLG
> solutions are output first. It leaves me wondering why it didn't just
> stop on its own after outputting the first result instead of being
> aborted by our (external) timeout that terminates the process?
>
>
>
> 3. PHASER.sol files: For single domain search models, we usually get
> output of the form:
>
> SOLU SET RFZ=4.5 TFZ=5.2 PAK=0 LLG=14 LLG=14
> SOLU 6DIM ENSE model1 EULER 242.049 45.040 326.088 FRAC -0.09425 0.50268 0.42575
>
> however we see three variations:
>
> i) No LLG:
>
> SOLU SET RFZ=3.1 TFZ=5.0 PAK=0
> SOLU 6DIM ENSE model2 EULER 59.983 69.335 319.701 FRAC -1.17131 -0.70030 0.23150
>
> ii) One LLG:
>
> SOLU SET RFZ=3.7 TFZ=4.6 PAK=0 LLG=25
> SOLU 6DIM ENSE model3 EULER 293.943 128.068 332.147 FRAC 0.06273 0.13175 0.25054
>
> iii) Two LLG entries, but with different values:
>
> SOLU SET RFZ=3.8 TFZ=4.1 PAK=0 LLG=21 LLG=20
> SOLU 6DIM ENSE model4 EULER 278.058 129.347 33.292 FRAC 0.28446 0.29011 -0.07986
>
>
>
> 4. Occasionally we get an error that we don't understand:
>
> FATAL RUNTIME ERROR: No scattering in pdbfile model1.pdb
>
> What does this mean? Is there a problem with the PDB file? We can't
> see anything obvious in the ones which produce this error.
>
> Thanks,
>
> Ian
>
> --
> Ian Stokes-Rees, Research Associate
> SBGrid, Harvard Medical School
> http://sbgrid.org
>
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research    Tel: + 44 1223 336500
Wellcome Trust/MRC Building                 Fax: + 44 1223 336827
Hills Road                                  E-mail: rjr27 at cam.ac.uk
Cambridge CB2 0XY, U.K.                     www-structmed.cimr.cam.ac.uk