Frequently asked questions for phenix.refine

Note: questions specific to the GUI can be found in its documentation.

General

How can I make phenix.refine run faster?

There are currently two options for this:

  • compile from source with OpenMP support, which parallelizes the Fast Fourier Transform (FFT) used in structure factor and gradient calculation
  • specify the 'nproc' parameter (or equivalent control in the GUI), which parallelizes various grid searches by starting multiple Python jobs

The OpenMP parallelization is not particularly efficient, since the FFTs take less than half of the typical runtime; a speedup of 40% is usually the maximum. The process-level parallelization with 'nproc' is most useful when restraint weight optimization is enabled, since these procedures can be run as multiple separate processes. In these circumstances a speedup of 4-5x is possible. However, a run using default parameters will not significantly benefit from setting 'nproc'.
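
The ceiling on the OpenMP speedup follows from the fact that only the FFT portion of the run is parallelized. As a rough illustration (not a measurement of phenix.refine itself), Amdahl's law gives the best-case speedup when a fraction of the runtime is spread over several threads. The value of 0.3 used below is an assumed FFT fraction, chosen only because it is "less than half"; the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Best-case speedup when only `parallel_fraction` of the runtime
    is divided across `n_threads` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)


# Assumed FFT share of the runtime (illustrative, not measured).
FFT_FRACTION = 0.3

for n in (1, 2, 4, 8, 64):
    print(n, round(amdahl_speedup(FFT_FRACTION, n), 2))
```

Under this assumption the speedup approaches, but never exceeds, 1/(1 - 0.3), or about 1.43x, however many threads are used.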

There are several limitations to these options:

  • OpenMP support is incompatible with the Phenix GUI
  • 'nproc' is incompatible with Windows
  • these approaches are incompatible with each other

What type of experimental data should I use for refinement?

Either amplitudes (F) or intensities (I) may be used in refinement (in any file format), but intensities will be used preferentially. Both anomalous and non-anomalous data are supported; there does not appear to be any particular benefit to model quality using one or the other with the default strategy. However, anomalous data may be used to refine anomalous scattering factors for heavy atoms, and anomalous difference map coefficients will automatically be created in the output MTZ file. For these reasons, anomalous data are recommended, but in general the R-factors will be similar.

Should I use the MTZ file output by phenix.refine as input for the next round of refinement?

The only time this is necessary is when you refined against a dataset that did not include R-free flags, and let phenix.refine generate a new test set. In this case, you should use the file ending in "_data.mtz" for all future rounds of refinement. You do not need to update the input file in each round, as the actual raw data (and R-free flags) are not modified.

I ran AutoSol to get a partial model that I now want to refine. Which data file should I give as input: the original .sca file from HKL2000, or the file overall_best_refine_data.mtz from AutoSol?

Always use the MTZ file output by AutoSol. This contains a new set of R-free flags that have been used to refine the model; starting over with the .sca file will result in a new set of flags being generated, which biases R-free.

Why does phenix.refine not use all data in refinement?

Reflections with abnormal values tend to reduce the performance of the refinement engine. These are identified based on several criteria (see Read 1999 for details) and filtered out at the beginning of each macro-cycle. You can prevent this by setting xray_data.remove_outliers=False.

How many macro-cycles should I run?

We recommend at least five to ensure convergence, but in some cases (especially poorly refined structures) considerably more may be required for optimal results. (The default is three macro-cycles due to speed considerations.)

How should I decide what resolution limit to use for my data?

This is a contentious subject, and not settled, although it is widely agreed that throwing away usable data simply to reduce the R-factors is not an acceptable practice. Traditional criteria for resolution limits include (among others) truncating the data at the resolution where the mean I/sigmaI falls below 2, but this may exclude valuable data. See the documentation on unmerged data (especially the cited references) for more details. In general it may be useful to include additional, weaker data in refinement in the final stages, since the weighting performed by maximum-likelihood refinement prevents these from degrading the model, and may improve it in some cases. (Note that more data will also cause phenix.refine to take longer to run, which often imposes a practical limit on resolution.)

Is it okay to refine against data that have been modified by anisotropic scaling or ellipsoidal truncation?

There is no technical reason why this is impossible. However, modified data should be used with extreme caution for several reasons:

  • phenix.refine already performs anisotropic scaling internally and will correct the output map coefficients for anisotropy, so the pre-scaling should be redundant.
  • Because phenix.refine (like REFMAC) by default substitutes F-calc for missing reflections when calculating 2mFo-DFc and similar map coefficients, ellipsoidal truncation will result in eliminated high-resolution reflections being replaced by F-calc in the maps. This will always make the maps look better, because they are more biased to the model!
  • Even if you do choose to refine against anisotropically scaled data, you should always deposit the original, unmodified data in the PDB upon publication.

Optimization methods

When should I use simulated annealing?

Simulated annealing (SA) is most useful early in refinement, when the model is far from convergence. Manually built models, or MR solutions involving significant local conformational changes, are common inputs where SA can improve over simple gradient-driven refinement. It is generally less helpful later in refinement, and/or at high resolution.

When should I use rigid-body refinement?

This typically only needs to be performed once after molecular replacement, unless you dock in additional domains later. Continuing to use rigid-body refinement in later runs will not improve your structure, and only adds to the runtime.

What happened to the old fix_rotamers option?

As of version 1.8.3, the real-space refinement strategy incorporates both global minimization and local fitting; the latter is similar to fix_rotamers but much faster, and it incorporates backbone flexibility. Note that it will not run at very high or low resolution, or when explicit hydrogen atoms are present.

When is the ordered solvent method useful?

The ordered solvent method is generally useful down to a low-resolution limit of approximately 2.8Å, depending on data quality. At atomic resolution (beyond approximately 1.2Å), it is useful early in refinement when no waters have been placed yet, but as the structure becomes more complete it may remove weaker, partial-occupancy waters, and it is unable to handle static disorder (alternate conformations).

Targets and restraints

When should I optimize the geometry and/or B-factor restraint weights?

This may be beneficial if the automatic weighting does not pick a good scale for the X-ray and restraint terms; this will often be recognizable by higher-than-expected bond and angle RMSDs. In general it rarely hurts to optimize the weights, and often results in a significantly better refinement, but it is several times slower than ordinary refinement unless you have a highly parallel system. However, we strongly recommend weight optimization in the final round of refinement, where it becomes essential to prevent overfitting.

How can I set the target weights manually?

Our usual response is: don't do this manually, use the automatic optimization instead. Although this takes significantly longer to run, in practice most users will spend an equivalent amount of time manually adjusting the weights by trial-and-error. If you are certain you need to have manual control, the parameters fix_wxc (for geometry restraints) and fix_wxu (for B-factor restraints) will set the weights.

When should I use non-crystallographic symmetry (NCS) restraints?

An approximate cutoff for NCS restraints is 2.0 Angstrom - at higher resolution the data alone are usually sufficient, but at lower resolution additional restraints are usually necessary. This is somewhat subjective due to the behavior of the global NCS restraints currently used by default in PHENIX, but will be addressed in future versions.

What is the difference between global and torsion NCS, and which one should I pick?

The global NCS restraints treat groups as rigid bodies, where all atoms in each group are expected to be related to the others by a single rotation and translation operation. This does not respect local deformations in the related molecules, which are common even at lower resolution. The torsion NCS restraints restrain dihedral angles instead, and allow them to be unrestrained if genuinely different. This will eventually become the default, since it often results in significantly better refinement.

How do I specify the .ncs_spec file from AutoSol or phenix.find_ncs for use in refinement?

The .ncs_spec files containing rotation and translation matrices are only used in density modification and model building. For refinement, the NCS relationships are always given as atom selections, and in the case of the default torsion NCS restraints, the automatically detected restraint groups should be very accurate.

Both NCS restraint types make my structure worse - what should I do?

In most cases this is due to the restraint of B-factors between NCS groups, which may actually have very different levels of disorder. Setting the NCS B-factor weight term to zero usually fixes the problem.

My resolution is X Angstroms; what should RMS(bonds) and RMS(angles) be?

This is somewhat controversial, but absolute upper limits for a well-refined protein structure at high resolution are typically 0.02 Angstrom for RMS(bonds) and 2.0 degrees for RMS(angles); usually they will be significantly lower. As resolution decreases the acceptable deviation from geometry restraints also decreases, so at 3.5 Angstrom, more appropriate values would be 0.01 Angstrom and 1.0 degrees. We recommend using the POLYGON tool in the validation summary to judge your structure relative to others at similar resolutions.
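
For reference, RMS(bonds) and RMS(angles) are root-mean-square deviations of the model values from the restraint targets. A minimal sketch of that calculation follows, using made-up bond lengths purely for illustration; the function name and the numbers are not taken from phenix.refine.

```python
import math


def rms_deviation(model_values, ideal_values):
    """Root-mean-square deviation between model and ideal values.

    Works for bond lengths (Angstrom) or bond angles (degrees).
    """
    if len(model_values) != len(ideal_values) or not model_values:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(m - i) ** 2 for m, i in zip(model_values, ideal_values)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical bond lengths in Angstrom (illustrative numbers only).
ideal = [1.525, 1.329, 1.458, 1.231]
model = [1.535, 1.319, 1.468, 1.221]

print(f"RMS(bonds) = {rms_deviation(model, ideal):.3f}")
```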

Why does my output model have very poor geometry (RMS(bonds) and RMS(angles))?

This usually means that the automatic X-ray/geometry weighting did not work properly; this can sometimes happen if the starting model also has poor geometry. Optimizing the weight (optimize_xyz_weight=True, or equivalent GUI control in the "Refinement settings" tab) will usually fix this problem.

I have experimental phases for this structure, but the initial maps were poor. Should I still use the MLHL target?

The experimental phases used to restrain refinement describe a bimodal probability distribution for every phase angle, rather than the single values used to generate a map. In most cases the additional restraints will not hurt refinement, and can often help.

Why is phenix.refine messing up my ligand geometry?

This often happens when the restraints were generated using ReadySet from a PDB file, and the ligand code is not recognizable in the Chemical Components Database. eLBOW will try to guess the molecular topology based on the coordinates alone, but this is imprecise and may not yield the desired result. For best results, restraints for non-standard ligands should be generated in eLBOW using a SMILES string or similar source of topology information.

What can I do to make my low-resolution structure better?

In general, if NCS is present in your structure, you should always use NCS restraints at low resolution; it is worth trying both the Cartesian (global) and torsion restraints to see which works best for your model. This alone usually helps with the geometry and overfitting, although it is rarely sufficient by itself. There are also several different types of restraint specifically designed to help with low-resolution refinement (consult the full phenix.refine manual page for details on each):

  • Reference model: this restrains the torsion angles to a high-resolution model, using a potential that gently releases angles which are genuinely different (allowing both local deformations and domain movements). This is usually the best option, and typically improves R-free, overfitting, and validation statistics. However, it depends on the availability of a suitable reference model.
  • Secondary structure restraints: these are simply harmonic distance restraints between hydrogen-bonding atoms in protein helices and sheets, and nucleic acid base pairs. Annotation can be either automatic or manual. The primary limitation is currently the need for outlier filtering, which removes spurious bonds but also some genuine ones. It has a small effect on validation statistics, and typically none on R-free, but it does keep secondary structures stable.

I have ions very close to water molecules/protein atoms, and phenix.refine keeps trying to move them apart. How can I prevent this?

Use phenix.ready_set or phenix.metal_coordination to generate custom bond (and optionally, bond angle) restraints, which will be output to a parameter file ending in ".edits". If you are using the PHENIX GUI, there is a toolbar button for ReadySet in the phenix.refine interface, which will automatically load the output files for use in phenix.refine.

I had previously generated custom restraints using ReadySet in the PHENIX GUI, but the atoms have changed. Now phenix.refine crashes because it can't find the atom selections. How do I remove the old custom restraints?

In the Utilities menu, select "Clear custom restraints."

What does "sigma" mean for geometry restraints, and what values are appropriate?

The sigma is the estimated standard deviation (e.s.d.) of the target value. For distance (bond) restraints, the sigma will be in Angstrom units; for bond angle and dihedral restraints it will be in degrees. A typical covalent bond will have a sigma of 0.02 or 0.03, and weaker "bond" restraints such as hydrogen bonds or metal coordination restraints will be looser with sigma between 0.05 and 0.1. For bond angle restraints, sigma is usually a few degrees; for dihedrals, up to 20-30 degrees is not uncommon. (See also the next question below.)

I want my ligand geometry to be absolutely perfect with no deviation from the target value(s). Can I just set the sigmas to zero or an extremely low value?

You cannot set the sigma to zero because the weight on the restraints is equal to 1/sigma^2. A very low value will not crash, but it will almost certainly confuse the minimizer and result in a sub-optimal structure, because those restraints will dominate the target and gradients, forcing the minimizer to take inappropriately large steps.
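
The effect of sigma on the weight is easy to tabulate. This is a minimal sketch of the 1/sigma^2 relationship stated above, not the code used by phenix.refine; the function name and the very small example sigma are illustrative.

```python
def restraint_weight(sigma):
    """Weight of a geometry restraint, taken as 1 / sigma**2."""
    if sigma <= 0:
        raise ValueError("sigma must be positive; 1/sigma^2 is undefined at zero")
    return 1.0 / sigma ** 2


# Sigmas in Angstrom: a typical covalent bond, a looser "bond", and an
# artificially tight value of the kind the question asks about.
for sigma in (0.02, 0.1, 0.001):
    print(f"sigma = {sigma:<6} weight = {restraint_weight(sigma):,.0f}")
```

Tightening sigma from 0.02 to 0.001 multiplies the weight by 400, which is how a handful of restraints can come to dominate the target.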

One part of my structure is particularly poor; how can I make the geometry restraints tighter for only those atoms?

The parameter scope refinement.geometry_restraints.edits.scale_restraints allows you to upweight the restraints for the specified atom selection, for any combination of bond lengths, bond angles, dihedral angles, and chirality. For instance:

refinement.geometry_restraints.edits.scale_restraints {
  atom_selection = "chain B"
  scale = 2.5
  apply_to = *bond *angle dihedral chirality
}

You may specify multiple such parameter blocks. The scale should be a relatively small number, typically less than 10 (you may also reduce the weight if you want). Note that since this affects the path of the minimizer, the overall geometry RMSDs (and the deviations for restraints which were not scaled) will likely change as a result.
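
Because several blocks may be given, it can be convenient to generate them rather than type each one. The following is a minimal sketch that writes blocks in the same layout as the example above; the helper name and the second atom selection are illustrative assumptions, and the text it produces should be checked against the phenix.refine parameter documentation before use.

```python
RESTRAINT_TYPES = ("bond", "angle", "dihedral", "chirality")


def scale_restraints_block(atom_selection, scale, apply_to):
    """Return one scale_restraints parameter block as text.

    Selected restraint types are marked with '*', as in the example above.
    """
    unknown = set(apply_to) - set(RESTRAINT_TYPES)
    if unknown:
        raise ValueError(f"unknown restraint types: {sorted(unknown)}")
    if scale <= 0:
        raise ValueError("scale must be positive")
    choices = " ".join(
        ("*" + name) if name in apply_to else name for name in RESTRAINT_TYPES
    )
    return (
        "refinement.geometry_restraints.edits.scale_restraints {\n"
        f'  atom_selection = "{atom_selection}"\n'
        f"  scale = {scale}\n"
        f"  apply_to = {choices}\n"
        "}\n"
    )


# Two blocks: the one from the text, plus a hypothetical second selection.
blocks = [
    scale_restraints_block("chain B", 2.5, {"bond", "angle"}),
    scale_restraints_block("chain C and resseq 10:25", 5.0, {"dihedral"}),
]
print("\n".join(blocks))
```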

Can I make phenix.refine restrain the planarity of RNA/DNA base pairs?

At present this requires specifying each base pair individually as a custom planarity restraint (part of the parameter scope refinement.geometry_restraints.edits), which may be excessively time-consuming for large structures. Automatic planarity restraints may be added in a future version.

Can I use a reference model to restrain ligand coordinates?

The reference model restraints are only intended to work with macromolecules. However, you may use separate harmonic restraints for any subset of atoms; these will tether the selected atoms to their initial coordinates. This can be effective when you already have good geometry and map fit for the restrained atoms; however, it does not allow for genuine conformational differences.

How do I stop simulated annealing from pushing certain atoms too far out of density?

The harmonic restraints are suitable for this purpose. This is especially useful when generating a simulated annealing omit map, where atoms may move to fill voids left by omitted scatterers.

B-factors/ADPs/TLS

When should I use TLS?

TLS refinement is generally valid at any resolution; at low resolution, it may be best to make each chain a single group, instead of trying to split them into smaller pieces. However, it is best to wait until near the end of refinement to add TLS; until then you should refine with isotropic ADPs only.

Can I use both TLS and anisotropic ADPs?

Yes, but not for the same atoms - since TLS is essentially constrained anisotropic refinement, the two methods are mutually exclusive.

Where is the switch for anisotropic vs. isotropic B-factors/ADPs?

phenix.refine does not have a single global switch for defining ADP parameterization; rather, when the "Individual ADPs" strategy is defined, the program uses several criteria to determine how atoms should be treated:

  • By default, atoms that are anisotropic in the input model (i.e. have ANISOU records) will be kept anisotropic if the resolution is at least 1.7A; isotropic atoms remain isotropic.
  • If the resolution is lower, all atoms will be converted to isotropic unless otherwise specified.
  • The atom selections for isotropic and anisotropic atoms may also be defined explicitly. Typically, the hydrogens will always be isotropic; protein atoms will be anisotropic at high resolution, and sometimes waters as well.

In the GUI, several common parameterizations are pre-defined in the dialog for entering ADP selections. Note that although it is possible to combine all of the different ADP refinement strategies in a single run, the atom selections for individual and grouped refinement may not overlap, nor may the selections for anisotropic ADPs and TLS groups.
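
The default behaviour described in the first two bullets above can be summarized as a small decision function. This is only a paraphrase of the rules as stated here, not the program's actual implementation; the function and argument names are hypothetical, and explicit atom selections (the third bullet) are represented by a simple override.

```python
def default_adp_type(has_anisou, resolution, explicit=None):
    """Paraphrase of the default ADP treatment for one atom.

    has_anisou: True if the atom has an ANISOU record in the input model
    resolution: high-resolution limit of the data, in Angstrom
    explicit:   optional user choice ("isotropic" or "anisotropic")
    """
    if explicit is not None:
        if explicit not in ("isotropic", "anisotropic"):
            raise ValueError("explicit must be 'isotropic' or 'anisotropic'")
        return explicit
    # "Resolution is at least 1.7 A" means a limit of 1.7 A or better,
    # i.e. a numerically smaller or equal value.
    if has_anisou and resolution <= 1.7:
        return "anisotropic"
    return "isotropic"


print(default_adp_type(has_anisou=True, resolution=1.5))
print(default_adp_type(has_anisou=True, resolution=2.2))
print(default_adp_type(has_anisou=False, resolution=1.2))
```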

When should I refine anisotropic ADPs instead of TLS groups?

There is no precise cutoff where you should turn on anisotropic ADPs, but these are approximate guidelines:

  • At 1.5 Angstrom resolution or better, refine protein/nucleic acid/ligand heavy atoms (C/N/O or heavier) anisotropically, and waters isotropically.
  • At 1.2 Angstrom or better, waters should also be anisotropic.
  • It is almost never appropriate to refine hydrogens anisotropically.

There may be circumstances where anisotropic refinement is permissible at slightly lower resolution, but 1.7 Angstrom is probably a lower limit. Exceptions may sometimes be made for metal ions, since they scatter very strongly. As always, you should use the drop in R-free to judge whether the change in parameterization was appropriate - a decrease of 0.5% (i.e. 0.005) or better indicates success.
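
The R-free criterion in the last sentence amounts to a simple comparison between two otherwise identical runs. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def parameterization_helped(rfree_before, rfree_after, min_drop=0.005):
    """True if R-free fell by at least `min_drop` (0.005 = 0.5%)."""
    for value in (rfree_before, rfree_after):
        if not 0.0 <= value <= 1.0:
            raise ValueError("R-free must be given as a fraction, e.g. 0.215")
    # Small tolerance so that a drop of exactly 0.005 is not lost to
    # floating-point rounding.
    return (rfree_before - rfree_after) >= min_drop - 1e-9


# Hypothetical R-free values before and after switching to anisotropic ADPs.
print(parameterization_helped(0.215, 0.205))  # drop of 0.010
print(parameterization_helped(0.215, 0.213))  # drop of 0.002
```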

When should I refine grouped B-factors/ADPs instead of individual?

It is again difficult to give an exact rule, since it depends on several properties of the crystal including resolution, solvent content, presence of NCS, etc. In general, the higher the data-to-parameter ratio, the more likely individual ADPs are to work well. As an approximate example, consider these two hypothetical structures:

  • a 3.5 Angstrom structure of a single protein chain per asymmetric unit, with 38% solvent content
  • a 4.0 Angstrom structure of 3 NCS-related chains, with 70% solvent content

In this case, the latter structure can probably be refined with individual ADPs, while the former is more marginal. If in doubt, early rounds of refinement may be done with grouped ADPs, switching to individual as the structure nears convergence. In general, it is usually worth trying individual ADPs at some point; ultimately the effect on R-factors (primarily R-free, but also the gap between R-work and R-free) is the most important guideline.
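
To make the comparison concrete, the data-to-parameter ratio for the two hypothetical structures can be estimated very roughly. The sketch below rests on several assumptions that are not part of phenix.refine: the number of unique reflections is approximated as 2*pi*V/(3*d^3) for an asymmetric-unit volume V and resolution d; each non-hydrogen protein atom is assumed to occupy about 17 cubic Angstrom; individual isotropic refinement is taken to use four parameters per atom; and N-fold NCS is treated, optimistically, as leaving only 1/N of the atoms independent. All names are hypothetical.

```python
import math

# Assumed average volume per non-hydrogen protein atom, in cubic Angstrom.
VOLUME_PER_ATOM = 17.0
# x, y, z and one isotropic B-factor per atom.
PARAMS_PER_ATOM = 4


def data_to_parameter_ratio(resolution, solvent_fraction, ncs_copies=1):
    """Very rough ratio of unique reflections to refined parameters."""
    if resolution <= 0 or not 0.0 <= solvent_fraction < 1.0 or ncs_copies < 1:
        raise ValueError("invalid input")
    # Asymmetric-unit volume available per atom, including solvent.
    volume_per_atom = VOLUME_PER_ATOM / (1.0 - solvent_fraction)
    # Unique reflections per atom: 2*pi*V / (3*d^3), with V per atom.
    reflections_per_atom = (
        2.0 * math.pi * volume_per_atom / (3.0 * resolution ** 3)
    )
    # Optimistic treatment of NCS: only 1/ncs_copies of the atoms
    # are counted as independent.
    return reflections_per_atom * ncs_copies / PARAMS_PER_ATOM


first = data_to_parameter_ratio(3.5, 0.38)
second = data_to_parameter_ratio(4.0, 0.70, ncs_copies=3)
print(f"3.5 A, 38% solvent, no NCS:     {first:.2f}")
print(f"4.0 A, 70% solvent, 3-fold NCS: {second:.2f}")
```

Under these assumptions the second structure has roughly four times the data-to-parameter ratio of the first, despite its lower resolution.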

Twinning

I performed twin refinement and my R-free went down by 1%; does that mean my structure is twinned?

No, because R-factors calculated with and without twinning are not necessarily on the same scale. In phenix.model_vs_data, the structure is only considered twinned if application of a twin law reduces R-work by at least 2%. Note that if you specify twin_law=Auto, phenix.refine will use the same procedure to determine the twin law (if any).

My data has multiple twin laws; can I use these in Phenix?

Currently we only support a single twin law; programs capable of refining tetartohedrally twinned structures are REFMAC and SHELXL.

What are the disadvantages of twinned refinement?

In Phenix specifically, twinned refinement uses a least-squares (LS) target instead of the more powerful maximum likelihood target used for conventional refinement. Additionally, twinned refinement makes no use of experimental phases (if available) as restraints. Some refinement protocols may not work with twinning, although as of April 2013 most of these have been fixed. More generally, the output map coefficients will have a significantly worse model bias problem than conventional maps; this effect increases as the twin fraction nears 0.5.

Using R-free

phenix.refine stops with an error message about the model being refined against a different set of R-free flags. How can I fix this?

First, make sure that you have not actually generated a new set of R-free flags; once you have these flags for a given dataset, you should continue using them throughout the building and refinement process. The error message is intended to guard against this happening accidentally. If, however, you have collected new higher-resolution data and extended the old R-free flags, then the error message may be ignored. The R-free flag comparison is based on information stored in the REMARK records in the input PDB file, so if you edit the PDB file and remove the line containing the word "hexdigest", the refinement will be able to continue.
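
If you do need to remove the stored R-free flag checksum, the edit described above can be scripted. This is a minimal sketch that drops every line containing the word "hexdigest"; the function names are hypothetical, and the sample REMARK text is illustrative rather than the exact format phenix.refine writes.

```python
def strip_hexdigest_lines(lines):
    """Return the lines of a PDB file without any that mention 'hexdigest'."""
    return [line for line in lines if "hexdigest" not in line]


def strip_hexdigest_file(input_path, output_path):
    """Write a copy of a PDB file with the hexdigest line(s) removed."""
    with open(input_path) as handle:
        kept = strip_hexdigest_lines(handle.readlines())
    with open(output_path, "w") as handle:
        handle.writelines(kept)
    return len(kept)


# Illustrative records only; the REMARK wording is made up for this example.
example = [
    "REMARK example flags hexdigest 0123456789abcdef\n",
    "CRYST1   50.000   60.000   70.000  90.00  90.00  90.00 P 21 21 21\n",
    "ATOM      1  N   MET A   1      11.000  12.000  13.000  1.00 20.00           N\n",
]
print("".join(strip_hexdigest_lines(example)), end="")
```

Writing to a new file with strip_hexdigest_file, rather than editing in place, keeps the original model intact.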

I have a model that was refined against an earlier set of R-free flags that I don't have access to. How can I avoid biasing the R-free when I refine this model?

There are several methods for this, but the easiest is to reset the B-factors (using PDBTools or the "Modify start model" option in phenix.refine) and run simulated annealing on the coordinates. If you are especially worried about bias you can alternatively randomize the coordinates and perform energy minimization, or build an entirely new model starting from the phases calculated from the original model. However, we usually find that the annealing is aggressive enough to remove any "memory" of the original R-free flags.

My resolution is X Angstroms, and my R/R-free are Y and Z. Am I done refining?

A partial answer can be obtained by looking at POLYGON, which plots histograms of statistics for PDB structures solved at similar resolutions, and compares these to the statistics for your output model. As a general rule, R-factors alone should not be used to decide if a structure is "done", but should be examined in combination with the validation report.

My resolution is X Angstroms, the structure is complete and well-validated, the maps look great, but my R and R-free are still really high. How can I make them lower?

There are several possible explanations for this:

  • Twinning with a relatively small twin fraction (perhaps 10%) may not obviously affect map quality, but can still have a significant impact on R-factors. Run Xtriage to look for evidence of twinning, and possible twin laws for use in phenix.refine. (As noted above, however, only one twin law may be used at a time.)
  • Undiagnosed translational NCS (AKA translational pseudosymmetry) can have a similar effect. In this case, the diffraction images may have been processed incorrectly, thus missing the fainter spots that result from tNCS. The program LABELIT, or more specifically the command labelit.index (distributed with PHENIX, but not yet available in the GUI) can be run on the original images to provide appropriate indexing parameters.

The gap between R-work and R-free is very large - how can I fix this?

Overfitting during refinement is usually helped by adding more restraints, and/or tightening the standard geometry restraints. If the output geometry is already within reasonable limits (typically RMS(bonds) < 0.016 and RMS(angles) < 1.8), ideas to try include adding NCS restraints if NCS is present, secondary structure restraints, or reference model restraints (if a high-resolution structure is available). At lower resolutions (worse than 3.0A), it may also be prudent to try grouped ADP refinement, and if desperate, Ramachandran restraints. TLS refinement can often improve overfitting across a wide range of resolutions. However, depending on the degree of overfitting, it may be necessary to perform extensive manual rebuilding first. (Note that if the large R/R-free gap suddenly appears after refinement of a model that was previously not overfit, this usually indicates incorrect parameterization of the refinement, e.g. using anisotropic ADPs at an inappropriate resolution.)

I ran a round of refinement, rebuilt in Coot, and refined again. My previous R-free was 0.25, but the new refinement starts out at 0.35. Why is it so high?

The initial R-factors reported by phenix.refine are without bulk-solvent correction, which usually has a significant impact on R-factors. Once this step is performed (at the start of the first macrocycle), the R-free should drop immediately to approximately the expected value.

Why does phenix.refine give me a different R-factor than program X?

There are many explanations for this. Even without minimization, the bulk solvent and scaling methods alone may account for as much as a 1% difference or more in calculated R-factors. For refinement results, the differences in target functions, restraints, and minimizers may be significant. In some cases the explanation is as simple as running too few cycles of refinement for one or the other program. In general, if you find a case where phenix.refine performs significantly worse than another program, we encourage you to contact us at bugs@phenix-online.org.
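
The sensitivity to scaling is easy to see even in a toy calculation. The sketch below computes the conventional R-factor, sum(|Fobs - k*Fcalc|) / sum(Fobs), for the same made-up amplitudes using two common choices of the overall scale k. It ignores bulk solvent and anisotropy entirely, and the function names and numbers are illustrative assumptions, not the procedure used by phenix.refine or any other program.

```python
def r_factor(f_obs, f_calc, scale):
    """Conventional R-factor for amplitudes with an overall scale on F-calc."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def scale_by_sums(f_obs, f_calc):
    """Scale that makes the summed amplitudes equal."""
    return sum(f_obs) / sum(f_calc)


def scale_least_squares(f_obs, f_calc):
    """Scale that minimizes sum((Fobs - k*Fcalc)^2)."""
    cross = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    return cross / sum(fc * fc for fc in f_calc)


# Made-up amplitudes, for illustration only.
f_obs = [120.0, 85.0, 40.0, 15.0, 60.0]
f_calc = [100.0, 90.0, 30.0, 20.0, 55.0]

for name, scale_fn in (
    ("sums", scale_by_sums),
    ("least squares", scale_least_squares),
):
    k = scale_fn(f_obs, f_calc)
    print(f"{name:>13}: k = {k:.4f}, R = {r_factor(f_obs, f_calc, k):.4f}")
```

The two scales give slightly different R-factors for identical data and model, before any difference in bulk-solvent treatment is considered.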

Interpreting results

I solved my structure by MR and refined, and the R-free is 45%. The maps are messy and I see a lot of difference density, but none around my molecule. Why isn't refinement working?

This frequently indicates that too few copies of the structure were placed by MR, and an additional chain needs to be added. Remember that the predictions of unit cell contents based on the Matthews coefficient (performed by Xtriage, for example) only provide an estimate, not an exact answer. At high resolution a solvent content of 40% or less is quite common.

Why am I seeing negative blobs in the difference (mFo-DFc) map in hydrophobic voids?

In previous versions of Phenix, the bulk solvent mask was often being extended to include these regions. This should no longer be a problem as of July 2013, but if you continue to see this effect, please contact us at bugs@phenix-online.org.

After coordinate and B-factor refinement the heavy atoms in my structure have negative mFo-DFc peaks. How do I get rid of these?

Refining the occupancies will often fix this problem. Alternatively, if a significant amount of anomalous scattering is expected at the wavelength used for data collection, anomalous group refinement may also be helpful. We do not recommend setting the B-factor manually and turning off refinement for the problematic atoms.

After refinement the mFo-DFc map has positive density around correctly placed atoms that are already at full occupancy. Is my model missing something?

Usually this means that the initial B-factors of the input model were too high for refinement to converge. Typically the minimizer is very good at raising low B-factors to the correct value, but gets stuck in the opposite direction. The observed result could happen if you refine starting from a lower-resolution model, or if you build new residues in Coot and the default B-factors are well above what they should be (typically this only happens at atomic resolution). To fix the problem, you just need to reset the B-factors to a suitably low value. This can be done at the start of refinement by setting the parameter refinement.modify_start_model.adp.set_b_iso; in the GUI, this can be found in the Settings menu under "Modify start model" --> "Modify ADPs...".

Why does Phenix validation report a different number of Ramachandran outliers than Coot/Procheck/the PDB/other program?

The phi/psi distributions used in Phenix are the same as those in the MolProbity server (Chen et al. 2010), and are based on a curated set of 8000 high-resolution crystal structures. There are now six distributions for different residue classes (general, glycine, Ile/Val, pre-Pro, cis-Pro, and trans-Pro). These distributions are stored in 2-degree increments. Other programs generally use older and/or less precise distributions to score phi/psi angles, which frequently results in disagreements for residues which are on the border of allowed and outlier regions of the plot. We suggest that you rely primarily on the results in Phenix (or MolProbity), as the distributions we use are very accurate and based on the latest structural data.

Hydrogens

When should I refine with hydrogens?

This is largely a matter of personal preference. Using explicit riding hydrogen atoms can improve geometry at any resolution; at higher resolutions, approximately 2 Angstrom or better, they will generally improve R-free as well. At atomic resolution (1.5 A or better) they should always be part of the final model. Note that unless you have true subatomic resolution (0.9 A or better), the hydrogens should always be refined as "riding", meaning that their coordinates are defined by the heavy atoms, not individually refined.

What about water molecules?

Although phenix.ready_set includes an option to add hydrogens to waters, we do not recommend this unless you have exceptionally high resolution and/or neutron data.

Why are my hydrogen atoms added by PHENIX exploding when I run real-space refinement in Coot?

Versions of Coot prior to 0.6.2 used a version of the CCP4 monomer library with hydrogen atoms named according to the PDB format version 2 standard; PHENIX can recognize these, but defaults to PDB v.3. To reconcile the different conventions, you can download the newer version of the monomer library (currently available here) and set the environment variable COOT_REFMAC_LIB_DIR to point to the directory in which you unpack it.

Why can't PHENIX automatically remove hydrogens from the output PDB file?

We strongly discourage removing any atoms used in refinement from the model, as it makes reproducing the published R-factors very difficult and eliminates essential information about how the structure was refined.

How come I have a bunch of clashes with water molecules in the validation results after running solvent update?

phenix.refine is relatively aggressive in placing solvent atoms in unmodeled density. However, this may sometimes result in clashes if the density represents ions, unmodeled residues, or alternate conformations rather than solvent. For this reason, we recommend that the final round of refinement not include solvent update (regardless of resolution), after any clashing water atoms have been removed. (You should also attempt to model the observed density features if possible, although this is not always straightforward.)

Miscellaneous

How can I model a charged atom?

The charge occupies columns 79-80 at the end of each ATOM or HETATM record, immediately following the element symbol. The format is the magnitude of the charge followed by its sign, for example "1-" or "2+". You can edit the PDB file manually to add this, but we recommend using phenix.pdbtools:

phenix.pdbtools model.pdb charge_selection="element Mn" charge=2

This is also available in the GUI under "Model tools". The effect of setting the charge will be to use modified scattering factors for X-ray refinement, which can be helpful if you notice difference density appearing at ion sites. Note that it will have no effect on the geometry, since phenix.refine does not take electrostatics into account.
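If you do prefer to edit the file yourself, the column layout described above can be sketched in a few lines of Python. This is only an illustration of the fixed-width format, not part of Phenix; the function name set_pdb_charge and the example atom are hypothetical.

```python
def set_pdb_charge(line: str, charge: int) -> str:
    """Return an ATOM/HETATM record with the charge written in columns 79-80.

    Hypothetical helper for illustration; phenix.pdbtools is the recommended route.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave all other record types untouched
    if not -9 <= charge <= 9:
        raise ValueError("the charge field has room for a single digit only")
    # Two blanks for a neutral atom, otherwise digit then sign, e.g. "2+".
    field = "  " if charge == 0 else f"{abs(charge)}{'+' if charge > 0 else '-'}"
    # Columns 79-80 are string indices 78-79; pad short lines to 80 characters.
    padded = line.rstrip("\n").ljust(80)
    return padded[:78] + field + padded[80:]


# Build a made-up manganese HETATM record with the standard column widths
# (78 characters long, ending with the element symbol in columns 77-78).
example = (
    "HETATM{:5d} {:<4s}{:1s}{:>3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
).format(1234, "MN", "", "MN", "A", 301, "", 12.345, 23.456, 34.567, 1.00, 25.00, "MN")

with_charge = set_pdb_charge(example, 2)
print(with_charge)
```

The record is padded to 80 characters before the charge is written, so the element symbol in columns 77-78 is left untouched.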

I can't see density for an arginine sidechain beyond the C-gamma atom. How should I model it?

Opinion in the crystallography community differs on the proper approach to disordered sidechains, with significant support for both of the following methods voiced on the PHENIX and CCP4 mailing lists:

  • Delete all atoms not visible in density, but leave the residue name alone. This is arguably the most conservative approach, as it avoids modeling any features not supported by the data, and it is consistent with the treatment of missing loops. The main disadvantage is aesthetic, since it is more difficult to visualize and interpret the biological effects of a structure with missing sidechains. Anecdotal evidence suggests that some non-crystallographers may be confused by this.
  • Pick an appropriate rotamer, and let the B-factors rise to account for disorder. This avoids truncated sidechains that may be mistaken for other residues, and is more realistic when interpreting surface electrostatics. The atomic B-factors and coordinates are actually refined against the data, however weak. It is potentially dangerous because it implies a greater level of confidence in these positions than is justified by the data. Additionally, the ADP restraints will keep the B-factors of nearby atoms similar (within some tolerance), which is normally essential for stable refinement but may artificially lower the B-factors of disordered sidechains.

A third approach, setting the occupancy of missing atoms to zero but leaving them in the model, is strongly disfavored, as the resulting positions and B-factors are entirely theoretical (but not immediately obvious as such).

Running phenix.model_vs_data (or the validation GUI) results in a slightly different R-factor than reported in the PDB header by phenix.refine. Shouldn't these be the same?

phenix.refine and phenix.model_vs_data use the same code to perform the bulk solvent correction and scaling, so they should report approximately the same R-factors given identical inputs. The discrepancy arises when taking a PDB file from refinement and running it back through phenix.model_vs_data. Because of the limited precision of the PDB format (three digits after the decimal point for coordinates, two digits for B-factors and occupancy), the atomic properties recorded in the PDB file will not be exactly the same as their actual refined values. In practice the difference in R-factors is statistically insignificant, however.
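To get a feel for the size of this rounding, the sketch below writes random atomic parameters into fixed-width fields with the precision quoted above and reads them back. It is a stand-alone illustration using invented values, not the code Phenix uses to write PDB files.

```python
import random


def pdb_round_trip(x, y, z, occ, b_iso):
    """Mimic writing x, y, z, occupancy and B to PDB columns and reading them back."""
    text = f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b_iso:6.2f}"
    values, start = [], 0
    for width in (8, 8, 8, 6, 6):
        values.append(float(text[start:start + width]))
        start += width
    return tuple(values)


random.seed(0)  # fixed seed so the run is repeatable
max_xyz_err = 0.0
max_b_err = 0.0
for _ in range(10000):
    xyz = [random.uniform(-99.0, 99.0) for _ in range(3)]
    occ = random.uniform(0.0, 1.0)
    b_iso = random.uniform(5.0, 80.0)
    back = pdb_round_trip(*xyz, occ, b_iso)
    # Track the worst rounding error seen for coordinates and for B-factors.
    max_xyz_err = max(max_xyz_err, *(abs(p - q) for p, q in zip(xyz, back[:3])))
    max_b_err = max(max_b_err, abs(b_iso - back[4]))

print(f"largest coordinate error: {max_xyz_err:.6f} A")
print(f"largest B-factor error:   {max_b_err:.6f} A^2")
```

The coordinate error can never exceed 0.0005 A and the B-factor error can never exceed 0.005 A^2, which is why the resulting change in R-factors is so small.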

When should I perform anomalous group refinement?

This is most useful when you have a large number of strong anomalous scatterers, where map artifacts are common. In such cases a more precise modeling of atomic scattering may improve the R-factors as well. It is generally not necessary for routine cases, but it may be advantageous for identifying weak anomalous scatterers (such as ions from the crystallization condition) when used to calculate an anomalous log-likelihood gradient (LLG) or residual map at the end of refinement.

How does phenix.refine deal with atoms on special positions?

There are two ways to handle atoms on special positions (e.g. symmetry axes):

  • If the occupancy is set to the value expected for the special position (e.g. 0.5 for an atom on a two-fold axis) or less, the coordinate position will be refined, and it will not interact with its symmetry mates.
  • If the occupancy is set to 1, the atom will be constrained to stay at the special position, and the occupancy will be corrected internally when calculating structure factors.

Note that partial-occupancy atoms on special positions will have their occupancies refined if using default settings; you can disable this by instructing phenix.refine to remove a specific atom selection from occupancy refinement (the keyword refinement.refine.occupancies.remove_selection, or in the GUI, edit the atom selections for the Occupancy strategy).

How can I extract the isotropic B-factor equivalent from a structure refined with TLS or anisotropic atoms?

You don't need any extra steps; the B-factor column in ATOM records in the PDB (or mmCIF) file will already be the total B-factor.
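If you want to cross-check the B-factor column against an ANISOU record, the usual convention is that the isotropic equivalent is 8*pi^2 times the mean of the three diagonal elements of U. The sketch below assumes the standard PDB ANISOU layout (integers scaled by 10^4, with U11, U22 and U33 in columns 29-49); the function name and the example record are invented for illustration and are not part of Phenix.

```python
import math


def b_iso_from_anisou(line: str) -> float:
    """Isotropic-equivalent B (A^2) from the diagonal of one ANISOU record."""
    if not line.startswith("ANISOU"):
        raise ValueError("expected an ANISOU record")
    # U11, U22, U33 occupy columns 29-35, 36-42 and 43-49 (0-based slices below),
    # stored as integers equal to U * 10^4 in A^2.
    u11 = int(line[28:35])
    u22 = int(line[35:42])
    u33 = int(line[42:49])
    u_eq = (u11 + u22 + u33) / 3.0 * 1.0e-4
    return 8.0 * math.pi ** 2 * u_eq


# A made-up record with U11 = U22 = U33 = 0.25 A^2 and zero off-diagonal terms.
example = (
    "ANISOU{:5d} {:4s}{:1s}{:>3s} {:1s}{:4d}{:1s} "
    "{:7d}{:7d}{:7d}{:7d}{:7d}{:7d}"
).format(1, " CA ", "", "ALA", "A", 1, "", 2500, 2500, 2500, 0, 0, 0)

print(f"B_eq = {b_iso_from_anisou(example):.2f} A^2")
```

The value obtained this way should agree with the B-factor column of the matching ATOM record to within the rounding of the file format.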

Why does phenix.refine output ANISOU records for individual atoms even though I only performed isotropic and TLS refinement?

TLS refinement is essentially constrained anisotropic refinement, so the individual atoms are anisotropic (just not independent); the ANISOU records simply make this explicit, since they have a standard format and are recognized by a variety of programs, unlike TLS information in the PDB header.

Why doesn't the PDB header report the bulk solvent parameters K_sol and B_sol?

Newer versions of Phenix use an improved bulk-solvent correction and scaling procedure (Afonine et al. 2013; see also Uson et al. (1999) Acta Cryst. D55, 1158–1167). It uses an entirely different parameterization that we find performs better, so there are no K_sol and B_sol values to report.

What is the difference between the various scattering tables? When should I use something other than the default?

If you are refining a neutron structure, you should of course use the neutron scattering table. The other tables are all X-ray-specific. The default, n_gaussian, is the best choice, as it uses a dynamically determined number of Gaussians to approximate the tabulated form factors to the required accuracy. it1992, which is commonly used in other programs, is four Gaussians plus a constant, taken from the 1992 edition of International Tables. wk1995, from (Waasmaier & Kirfel 1995), is five Gaussians and is more accurate (but slower) than it1992.
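The X-ray tables differ only in how many Gaussian terms they use to approximate the same form-factor curve, f(s) = sum of a_i * exp(-b_i * s^2) plus a constant c, where s = sin(theta)/lambda. The sketch below evaluates a four-Gaussian-plus-constant approximation for carbon. The coefficients are quoted from memory as typical published values and should be treated as illustrative; they are not read from the Phenix tables, and the function name is hypothetical.

```python
import math


def form_factor(s, a, b, c=0.0):
    """Evaluate f(s) = sum_i a_i * exp(-b_i * s^2) + c, with s = sin(theta)/lambda in 1/A."""
    return sum(a_i * math.exp(-b_i * s * s) for a_i, b_i in zip(a, b)) + c


# Illustrative four-Gaussian-plus-constant coefficients for neutral carbon
# (assumed values for this example, not taken from Phenix).
CARBON_A = (2.3100, 1.0200, 1.5886, 0.8650)
CARBON_B = (20.8439, 10.2075, 0.5687, 51.6512)
CARBON_C = 0.2156

for d_spacing in (4.0, 2.0, 1.0):
    s = 1.0 / (2.0 * d_spacing)  # Bragg's law: sin(theta)/lambda = 1/(2d)
    f = form_factor(s, CARBON_A, CARBON_B, CARBON_C)
    print(f"d = {d_spacing:.1f} A  ->  f = {f:.3f} electrons")
```

At s = 0 the sum of the a_i and c is close to the number of electrons (6 for carbon), and the value falls off with increasing resolution. Adding more Gaussian terms makes the approximation more accurate, at the cost of extra computation.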

Citations

How should I cite phenix.refine?

Either (Afonine et al. 2012) or (Adams et al. 2010) is suitable; we recommend the latter if you used additional components of Phenix. If you used the integrated MolProbity validation in the GUI, you should also cite (Chen et al. 2010) and/or (Echols et al. 2012).

How should I specify the refinement program in my PDB deposition?

If the PDB or mmCIF file you are depositing was output by phenix.refine, the refinement program (including version number) is already specified in the header. Otherwise, "PHENIX (version 1.8.4)" is suitable (replaced with the actual version number). However, note that if you used multiple refinement programs (for example REFMAC and PHENIX) during the course of structure determination, only the last will usually be named in the header, so we suggest that you edit this information during deposition to complete the list.

What are some references for the underlying methods used in phenix.refine?

For technical background, the most thorough source is (Afonine et al. 2012), which contains all of the relevant citations. We recommend that everyone who uses phenix.refine read this paper at some point even if you are not concerned with theory, as it provides a more detailed explanation for the methods and motivations behind the program than this documentation.

Where can I read more about the principles of refinement in general?

Bernhard Rupp's "BioMolecular Crystallography" is the most modern and complete reference, and includes a detailed explanation of the maximum likelihood methods used in phenix.refine and many other programs.

References

PHENIX: a comprehensive Python-based system for macromolecular structure solution. P.D. Adams, P.V. Afonine, G. Bunkóczi, V.B. Chen, I.W. Davis, N. Echols, J.J. Headd, L.W. Hung, G.J. Kapral, R.W. Grosse-Kunstleve, A.J. McCoy, N.W. Moriarty, R. Oeffner, R.J. Read, D.C. Richardson, J.S. Richardson, T.C. Terwilliger, and P.H. Zwart. Acta Cryst. D66, 213-221 (2010).

A robust bulk-solvent correction and anisotropic scaling procedure. P.V. Afonine, R.W. Grosse-Kunstleve, and P.D. Adams. Acta Cryst. D61, 850-855 (2005).

Towards automated crystallographic structure refinement with phenix.refine. P.V. Afonine, R.W. Grosse-Kunstleve, N. Echols, J.J. Headd, N.W. Moriarty, M. Mustyakimov, T.C. Terwilliger, A. Urzhumtsev, P.H. Zwart, and P.D. Adams. Acta Cryst. D68, 352-367 (2012).

Bulk-solvent and overall scaling revisited: faster calculations, improved results. P.V. Afonine, R.W. Grosse-Kunstleve, P.D. Adams, and A. Urzhumtsev. Acta Cryst. D69, 625-634 (2013).

MolProbity: all-atom structure validation for macromolecular crystallography. V.B. Chen, W.B. Arendall, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, L.W. Murray, J.S. Richardson, and D.C. Richardson. Acta Cryst. D66, 12-21 (2010).

Graphical tools for macromolecular crystallography in PHENIX. N. Echols, R.W. Grosse-Kunstleve, P.V. Afonine, G. Bunkóczi, V.B. Chen, J.J. Headd, A.J. McCoy, N.W. Moriarty, R.J. Read, D.C. Richardson, J.S. Richardson, T.C. Terwilliger, and P.D. Adams. J. Appl. Cryst. 45, 581-586 (2012).

Crystallographic model quality at a glance. L. Urzhumtseva, P.V. Afonine, P.D. Adams, and A. Urzhumtsev. Acta Cryst. D65, 297-300 (2009).