Python-based Hierarchical ENvironment for Integrated Xtallography
Frequently asked questions for phenix.refine

Note: questions specific to the GUI can be found in its documentation.

General

How can I make phenix.refine run faster?

There are currently two options for this:
The OpenMP parallelization is not particularly efficient, since the FFTs take less than half of the typical runtime; a speedup of 40% is usually the maximum. The process-level parallelization with 'nproc' is most useful when restraint weight optimization is enabled, since these procedures can be run as multiple separate processes. In these circumstances a speedup of 4-5x is possible. However, a run using default parameters will not significantly benefit from setting 'nproc'.

There are several limitations to these options:
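As a rough check on the OpenMP figure quoted above, Amdahl's law relates the fraction of runtime that is parallelized to the best overall speedup that can be expected. The sketch below is illustrative only: the FFT fractions and thread counts are assumed values chosen for the example, not measurements from phenix.refine, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Overall speedup when only `parallel_fraction` of the runtime
    scales with the number of threads (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


if __name__ == "__main__":
    # Assumed, illustrative fractions of total runtime spent in FFTs.
    for fft_fraction in (0.3, 0.4, 0.5):
        for n_threads in (2, 4, 8):
            speedup = amdahl_speedup(fft_fraction, n_threads)
            print(f"FFT fraction {fft_fraction:.1f}, "
                  f"{n_threads} threads: {speedup:.2f}x")
```

Under these assumptions, a run that spends 40% of its time in FFTs gains about 1.43x on 4 threads, and any fraction below one half stays under 2x no matter how many threads are used, which is in line with the modest ceiling described above.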
When should I use simulated annealing?

Simulated annealing (SA) is most useful early in refinement, when the model is far from convergence. Manually built models, or MR solutions involving significant local conformational changes, are common inputs where SA can improve over simple gradient-driven refinement. It is generally less helpful later in refinement, and/or at high resolution.

When should I use rigid-body refinement?

This typically only needs to be performed once after molecular replacement, unless you dock in additional domains later. Continuing to use rigid-body refinement in later runs will not improve your structure, and only adds to the runtime.

What type of experimental data should I use for refinement?

Either amplitudes (F) or intensities (I) may be used in refinement (in any file format), but intensities will be used preferentially. Both anomalous and non-anomalous data are supported; there does not appear to be any particular benefit to model quality using one or the other with the default strategy. However, anomalous data may be used to refine anomalous scattering factors for heavy atoms, and anomalous difference map coefficients will automatically be created in the output MTZ file. For these reasons, anomalous data are recommended.

Why does phenix.refine not use all data in refinement?

Reflections with abnormal values tend to reduce the performance of the refinement engine. These are identified based on several criteria (see Read 1999 for details) and filtered out at the beginning of each macro-cycle. You can prevent this by setting xray_data.remove_outliers=False.

How many macro-cycles should I run?

We recommend at least five to ensure convergence, but in some cases (especially poorly refined structures) considerably more may be required for optimal results. (The default is three macro-cycles due to speed considerations.)

Restraints

When should I optimize the geometry and/or B-factor restraint weights?
This may be beneficial if the automatic weighting does not pick a good scale for the X-ray and restraint terms; this will often be recognizable by higher-than-expected bond and angle RMSDs. In general it rarely hurts to optimize the weights, and often results in a significantly better refinement, but it is several times slower than ordinary refinement unless you have a highly parallel system. However, we strongly recommend weight optimization in the final round of refinement, where it becomes essential to prevent overfitting.

When should I use non-crystallographic symmetry (NCS) restraints?

An approximate cutoff for NCS restraints is 2.0 Angstrom - at higher resolution the data alone are usually sufficient, but at lower resolution additional restraints are usually necessary. This is somewhat subjective due to the behavior of the global NCS restraints currently used by default in PHENIX, but will be addressed in future versions.

What is the difference between global and torsion NCS, and which one should I pick?

The global NCS restraints treat groups as rigid bodies, where all atoms in each group are expected to be related to the others by a single rotation and translation operation. This does not respect local deformations in the related molecules, which are common even at lower resolution. The torsion NCS restraints restrain dihedral angles instead, and allow them to be unrestrained if genuinely different. This will eventually become the default, since it often results in significantly better refinement.

Both NCS restraint types make my structure worse - what should I do?

In most cases this is due to the restraint of B-factors between NCS groups, which may actually have very different levels of disorder. Setting the NCS B-factor weight term to zero usually fixes the problem.

My resolution is X Angstroms; what should RMS(bonds) and RMS(angles) be?
This is somewhat controversial, but absolute upper limits for a well-refined protein structure at high resolution are typically 0.02 Angstrom for RMS(bonds) and 2.0 degrees for RMS(angles); usually they will be significantly lower. As resolution decreases the acceptable deviation from geometry restraints also decreases, so at 3.5 Angstrom, more appropriate values would be 0.01 and 1.0. We recommend using the POLYGON tool in the validation summary to judge your structure relative to others at similar resolutions.

Why does my output model have very poor geometry (RMS(bonds) and RMS(angles))?

This usually means that the automatic X-ray/geometry weighting did not work properly; this can sometimes happen if the starting model also has poor geometry. Optimizing the weight (optimize_xyz_weight=True, or equivalent GUI control in the "Refinement settings" tab) will usually fix this problem.

I have experimental phases for this structure, but the initial maps were poor. Should I still use phase restraints?

The experimental phases used to restrain refinement describe a bimodal probability distribution for every angle, rather than the single values used to generate a map. In most cases the additional restraints will not hurt refinement, and can often help.

Why is phenix.refine messing up my ligand geometry?

This often happens when the restraints were generated using ReadySet from a PDB file, and the ligand code is not recognizable in the Chemical Components Database. eLBOW will try to guess the molecular topology based on the coordinates alone, but this is imprecise and may not yield the desired result. For best results, restraints for non-standard ligands should be generated in eLBOW using a SMILES string or similar source of topology information.

What can I do to make my low-resolution structure better?
In general, if NCS is present in your structure, you should always use NCS restraints at low resolution; it is worth trying both the Cartesian (global) and torsion restraints to see which works best for your model. This alone usually helps with the geometry and overfitting, although it is rarely sufficient by itself. There are also several different types of restraint specifically designed to help with low-resolution refinement (consult the full phenix.refine manual page for details on each):
I have ions very close to water molecules/protein atoms, and phenix.refine keeps trying to move them apart. How can I prevent this?

Use phenix.ready_set or phenix.metal_coordination to generate custom bond (and optionally, bond angle) restraints, which will be output to a parameter file ending in ".edits". If you are using the PHENIX GUI, there is a toolbar button for ReadySet in the phenix.refine interface, which will automatically load the output files for use in phenix.refine.

I had previously generated custom restraints using ReadySet in the PHENIX GUI, but the atoms have changed. Now phenix.refine crashes because it can't find the atom selections. How do I remove the old custom restraints?

In the Utilities menu, select "Clear custom restraints."

B-factors/ADPs/TLS

TLS refinement is generally valid at any resolution; at low resolution, it may be best to make each chain a single group, instead of trying to split them into smaller pieces. However, it is best to wait until near the end of refinement to add TLS; until then you should refine with isotropic ADPs only.

Can I use both TLS and anisotropic ADPs?

Yes, but not for the same atoms - since TLS is essentially constrained anisotropic refinement, the two methods are mutually exclusive.

Where is the switch for anisotropic vs. isotropic B-factors/ADPs?

phenix.refine does not have a single global switch for defining ADP parameterization; rather, when the "Individual ADPs" strategy is defined, the program uses several criteria to determine how atoms should be treated:
In the GUI, several common parameterizations are pre-defined in the dialog for entering ADP selections. Note that although it is possible to combine all of the different ADP refinement strategies in a single run, the atom selections for individual and grouped refinement may not overlap, nor may the selections for anisotropic ADPs and TLS groups.

When should I refine anisotropic ADPs instead of TLS groups?

There is no precise cutoff where you should turn on anisotropic ADPs, but these are approximate guidelines:
There may be circumstances where anisotropic refinement is permissible at slightly lower resolution, but 1.7 Angstrom is probably a lower limit. Exceptions may sometimes be made for metal ions, since they scatter very strongly. As always, you should use the drop in R-free to judge whether the change in parameterization was appropriate - a decrease of 0.5% (i.e. 0.005) or better indicates success.

When should I refine grouped B-factors/ADPs instead of individual?

It is again difficult to give an exact rule, since it depends on several properties of the crystal including resolution, solvent content, presence of NCS, etc. In general, the higher the data-to-parameter ratio, the more likely individual ADPs are to work well. As an approximate example, consider these two hypothetical structures:
In this case, the latter structure can probably be refined with individual ADPs, while the former is more marginal. If in doubt, early rounds of refinement may be done with grouped ADPs, switching to individual as the structure nears convergence. In general, it is usually worth trying individual ADPs at some point; ultimately the effect on R-factors (primarily R-free, but also the gap between R-work and R-free) is the most important guideline.

Interpreting results

My resolution is X Angstroms, and my R/R-free are Y and Z. Am I done refining?

A partial answer can be obtained by looking at POLYGON, which plots histograms of statistics for PDB structures solved at similar resolutions, and compares these to the statistics for your output model. As a general rule, R-factors alone should not be used to decide if a structure is "done", but should be examined in combination with the validation report.

My resolution is X Angstroms, the structure is complete and well-validated, the maps look great, but my R and R-free are still really high. How can I make them lower?

There are several possible explanations for this:
The gap between R-work and R-free is very large - how can I fix this?

Overfitting during refinement is usually helped by adding more restraints, and/or tightening the standard geometry restraints. If the output geometry is already within reasonable limits (typically RMS(bonds) < 0.016 and RMS(angles) < 1.8), ideas to try include adding NCS restraints if NCS is present, secondary structure restraints, or reference model restraints (if a high-resolution structure is available). At lower resolutions (worse than 3.0 A), it may also be prudent to try grouped ADP refinement, and if desperate, Ramachandran restraints. TLS refinement can often reduce overfitting across a wide range of resolutions. However, depending on the degree of overfitting, it may be necessary to perform extensive manual rebuilding first. (Note that if the large R/R-free gap suddenly appears after refinement of a model that was previously not overfit, this usually indicates incorrect parameterization of the refinement, e.g. using anisotropic ADPs at an inappropriate resolution.)

Hydrogens

When should I refine with hydrogens?

This is largely a matter of personal preference. Using explicit riding hydrogen atoms can improve geometry at any resolution; at higher resolutions, approximately 2 Angstrom or better, they will generally improve R-free as well. At atomic resolution (1.5 A or better) they should always be part of the final model. Note that unless you have true subatomic resolution (0.9 A or better), the hydrogens should always be refined as "riding", meaning that their coordinates are defined by the heavy atoms, not individually refined. Although phenix.ready_set includes an option to add hydrogens to waters, we do not recommend this unless you have exceptionally high resolution and/or neutron data.

Why are my hydrogen atoms added by PHENIX exploding when I run real-space refinement in Coot?
Versions of Coot prior to 0.6.2 used a version of the CCP4 monomer library with hydrogen atoms named according to the PDB format version 2 standard; PHENIX can recognize these, but defaults to PDB v.3. To reconcile the different conventions, you can download the newer version of the monomer library (currently available here) and set the environment variable COOT_REFMAC_LIB_DIR to point to the directory in which you unpack it.

Why can't PHENIX automatically remove hydrogens from the output PDB file?

We strongly discourage removing any atoms used in refinement from the model, as it makes reproducing the published R-factors very difficult and eliminates essential information about how the structure was refined.

Miscellaneous

How can I model a charged atom?

The charge occupies columns 79-80 at the end of each ATOM or HETATM record, immediately following the element symbol. The format is the magnitude of the charge followed by its sign, for example "1-" or "2+". You can edit the PDB file manually to add this, but we recommend using phenix.pdbtools:

  phenix.pdbtools model.pdb charge_selection="element Mn" charge=2

This is also available in the GUI under "Model tools". The effect of setting the charge will be to use modified scattering factors for X-ray refinement, which can be helpful if you notice difference density appearing at ion sites. Note that it will have no effect on the geometry, since phenix.refine does not take electrostatics into account.

I can't see density for an arginine sidechain beyond the C-gamma atom. How should I model it?

Opinion in the crystallography community differs on the proper approach to disordered sidechains, with significant support for both of the following methods voiced on the PHENIX and CCP4 mailing lists:
A third approach, setting the occupancy of missing atoms to zero but leaving them in the model, is strongly disfavored, as the resulting positions and B-factors are entirely theoretical (but not immediately obvious as such).

References