Hi,
I think a number of these questions could be answered by looking carefully through the whole logfile and seeing what it tells you about what is happening in each step of the calculation. As well, the primary literature should be considered to be part of what documents a program.
1. Phaser uses likelihood to solve structures by molecular replacement, so the best solution is the one with the highest log-likelihood-gain (LLG). One of the ways we talk about this is to consider molecular replacement as testing a series of hypotheses about how the molecule is oriented and how it is positioned, and likelihood measures how consistent the data are with each of these hypotheses. The one with the highest LLG is the one that is supported most strongly by the data.
On the point of "tuning" parameters, I'm not sure what you mean. In a particular case, you should know what you put into your crystallization drop, and you will usually have some expectation about the stoichiometry of any complexes, so you usually have a good idea of the possible content of the asymmetric unit and the sequence identity of the models (thus giving you a rough idea of the expected RMS error). You may have to test different choices for the number of copies, if different numbers are consistent with the range of solvent contents observed in crystals, but you can do better than a generic assumption of, say, 50% solvent. The sequence identity <-> RMS error relationship is only approximate, but once the structure is solved then refinement programs like phenix.refine will do a better job of estimating the impact of errors in the coordinates. However, if you only see negative LLG values in a search, then (as the documentation says) you should revise the estimated RMS error upwards, because the model is clearly worse than you would expect from the sequence identity.
2. Given that the potential solutions in the .sol file are sorted by LLG, I'm not sure where the idea would come from that they could be given in the order they were found. You can follow the solutions in the logfile and see this. The whole computation has to finish before it is known which is the best solution. We have heuristics to stop Phaser spending too much time looking down blind alleys and, as we improve our understanding of how to recognize a correct solution from noise, we will improve these heuristics. So we're already doing as well as we know how to stop when the solution is found.
The ten-minute timeout is not a good idea. A Phaser molecular replacement run comes after weeks to years of protein expression, crystallization and data collection, and before days to months of rebuilding, refinement and interpretation, so if it takes 30 minutes, two hours or even a day to find a solution, then it doesn't seem too long to wait.
3. To help people, in cases where (say) the computer crashes in the middle of a long run, we've made Phaser write out intermediate .sol, .pdb and .mtz files, so that (in principle) you could pick up from the middle, or you could examine an intermediate solution, say with 2 of 3 components placed. If you stop it in the middle, then you will get files from, say, after the translation search but before the packing check, or after the packing check but before the rigid-body refinement. The results won't be as good, and you may well miss something better that would have been found later.
4. If Phaser reports that there is no scattering in a model, it means that you have supplied an empty PDB file, or one where all the occupancies are equal to zero, or one containing only HETATM records and no ATOM records. If this happens in other circumstances, then it would be a bug and we would appreciate seeing the offending PDB file.
I hope that helps.
Regards,
Randy Read
On 15 Nov 2009, at 13:13, Ian Stokes-Rees wrote:
I'm having some discussion with a colleague about phaser output (we're using Phaser 2.1.4). We haven't been able to find any documentation which can clarify our situation, and I'm hoping someone on the list can help answer these questions. I should mention that I am relatively new to Phaser.
1. PHASER.sol files: Which "SOLU SET" does Phaser consider to be the best? The first or the last? Or the one with the highest LLG, wherever that may be? In our experience of running Phaser over several MTZ files with a range of models the best Phaser solution has always been the first, and this has had the highest LLG.
Note: this is with "untuned" Phaser settings for identity, solvent fraction, or number of search models in ASU -- our goal is to do a first run with "generic" settings for these over a larger set of models, then (from TFZ and LLG scores) select a subset for which we will tune Phaser parameters and PDB search model variations.
2. If we are right that the first "SOLU SET" entry is indicative of the potential for the search model to form a good MR candidate, then is it the case that the first entry is the first Phaser solution that is computed? Or is the PHASER.sol file a sorted list output at the end of the run? From my reading of the documentation it is output in order of computation, and *for our purposes* (if my first statement in this question is correct) Phaser can stop after it outputs this first solution. Is there some way to tell Phaser to stop after the first solution is output?
I realize that this may not sound like it makes sense (how could Phaser know to pick the best solution first, and even if it could, why would it ever continue past this point?). However, I ask because we have put a 10 minute timeout into our Phaser runs, and we have many situations where we get a timeout but PHASER.sol has already been generated and the best LLG solutions are output first. It leaves me wondering why Phaser didn't just stop on its own after outputting the first result, instead of being aborted by our (external) timeout that terminates the process.
3. PHASER.sol files: For single domain search models, we usually get output of the form:
SOLU SET RFZ=4.5 TFZ=5.2 PAK=0 LLG=14 LLG=14
SOLU 6DIM ENSE model1 EULER 242.049 45.040 326.088 FRAC -0.09425 0.50268 0.42575
however we see three variations:
i) No LLG:
SOLU SET RFZ=3.1 TFZ=5.0 PAK=0
SOLU 6DIM ENSE model2 EULER 59.983 69.335 319.701 FRAC -1.17131 -0.70030 0.23150
ii) One LLG:
SOLU SET RFZ=3.7 TFZ=4.6 PAK=0 LLG=25
SOLU 6DIM ENSE model3 EULER 293.943 128.068 332.147 FRAC 0.06273 0.13175 0.25054
iii) Two LLG entries, but with different values:
SOLU SET RFZ=3.8 TFZ=4.1 PAK=0 LLG=21 LLG=20
SOLU 6DIM ENSE model4 EULER 278.058 129.347 33.292 FRAC 0.28446 0.29011 -0.07986
4. Occasionally we get an error that we don't understand:
FATAL RUNTIME ERROR: No scattering in pdbfile model1.pdb
What does this mean? Is there a problem with the PDB file? We can't see anything obvious in the ones which produce this error.
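The reply above names three situations that produce this error: an empty PDB file, all occupancies equal to zero, or only HETATM records and no ATOM records. A minimal Python sketch for checking a file against those three cases follows. The function name `count_scatterers` is hypothetical, the column positions follow the standard fixed-width PDB format (occupancy in columns 55-60), and treating a blank or unreadable occupancy field as unoccupied is an assumption made here, not a statement about how Phaser itself reads the file.

```python
def count_scatterers(pdb_text):
    """Count ATOM records, and ATOM records with occupancy > 0, in PDB text.

    Returns a tuple (n_atom_records, n_with_positive_occupancy).
    Hypothetical helper for illustration; it is not part of Phaser.
    """
    n_atom = 0
    n_occupied = 0
    for line in pdb_text.splitlines():
        # Only ATOM records are counted; HETATM and other records are skipped.
        if not line.startswith("ATOM  "):
            continue
        n_atom += 1
        # Occupancy occupies columns 55-60 (1-based) of an ATOM record.
        occupancy_field = line[54:60].strip()
        try:
            occupancy = float(occupancy_field)
        except ValueError:
            # Assumption: a blank or malformed field counts as unoccupied.
            continue
        if occupancy > 0.0:
            n_occupied += 1
    return n_atom, n_occupied
```

Calling `count_scatterers(open("model1.pdb").read())` returns two counts. If the first is zero, the file has no ATOM records at all (empty, or HETATM only); if only the second is zero, every ATOM record has zero occupancy.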
Thanks,
Ian
-- Ian Stokes-Rees, Research Associate SBGrid, Harvard Medical School http://sbgrid.org
------
Randy J. Read
Department of Haematology, University of Cambridge
Cambridge Institute for Medical Research      Tel: + 44 1223 336500
Wellcome Trust/MRC Building                   Fax: + 44 1223 336827
Hills Road                                    E-mail: [email protected]
Cambridge CB2 0XY, U.K.                       www-structmed.cimr.cam.ac.uk
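
Since the reply states that the best solution is the one with the highest LLG, a short script can rank the SOLU SET annotations of the kind quoted in the original message. This is a sketch only: `rank_solution_sets` is a hypothetical helper, not part of Phaser, and when an annotation carries two LLG values it assumes the last one is the most recent, which this thread does not confirm. Sets with no LLG at all are placed last.

```python
import re

# Matches annotations such as "LLG=14" or "LLG=-3.5" and captures the number.
LLG_PATTERN = re.compile(r"\bLLG=([-+]?\d+(?:\.\d+)?)")


def rank_solution_sets(sol_text):
    """Return a list of (llg, text) pairs sorted by LLG, highest first.

    Everything between one "SOLU SET" keyword and the next is treated as
    one solution set, so the SOLU 6DIM line may share a line with the
    annotation or sit on the following line.
    """
    ranked = []
    for chunk in sol_text.split("SOLU SET")[1:]:
        values = [float(v) for v in LLG_PATTERN.findall(chunk)]
        # Assumption: the last LLG listed is the most recent value.
        llg = values[-1] if values else None
        ranked.append((llg, chunk.strip()))
    # Sets without an LLG sort after all sets that have one.
    ranked.sort(key=lambda item: (item[0] is None, -(item[0] or 0.0)))
    return ranked
```

For example, `rank_solution_sets(open("PHASER.sol").read())[0]` gives the set with the highest LLG in that file. Because a run stopped by an external timeout may leave an intermediate .sol file, as the reply explains, the top entry of such a file is not necessarily what a completed run would report.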