Overview of molecular replacement in Phenix

Contents

Steps in structure solution with molecular replacement
- Model selection
- Molecular replacement
Automated Molecular replacement
Frequently Asked Questions
Reference

Molecular replacement (MR) is a phasing method that uses prior information in the form of known structures that are related or homologous to components in the crystal. Because it requires no additional experimental procedures or data, and additionally simplifies model-building, MR is usually the method of choice for structure determination when a suitable search model is available.

For new users, we recommend reading this document first, followed by the Phaser-MR tutorial for a step-by-step introduction to running MR in the Phenix GUI.

Steps in structure solution with molecular replacement

Model selection

Structures of suitable models are normally found via homology searches (using e.g. NCBI Blast for reasonably close relatives or HHpred for more distant relatives), and can be characterized by the sequence identity with the target sequence.

The main restriction on the use of molecular replacement is the requirement for a suitably similar search model. Although there is no exact rule for this, the relationship between sequence identity and MR success is roughly as follows:

Better than 40% identity: usually easy, unless large conformational changes are involved.

30-40%: MR usually possible, but sometimes more difficult.

20-30%: MR usually difficult if at all possible, careful model search and preparation required.

Less than 20%: MR unlikely to work, but MR-Rosetta can help in marginal cases.

In terms of RMSD, above 2.5A is very unlikely to work, while 1.5A or less is preferable.

Structures that undergo large conformational changes may need to be split into separate domains for searching, regardless of sequence identity. Where multiple similar search models are available, combining these into an ensemble will often improve the likelihood of success, particularly if the trimming option in Ensembler is used to trim the ensemble back to its conserved core.

In Phenix, the Sculptor and Ensembler utilities are available for preparing search models. Sculptor can be used to improve the model by pruning residues and side chains according to an alignment between the model structure and the target sequence, and also to apply B-factor weights to the model to weight down unreliable parts. Ensembler can be used to superpose homologous models automatically. The recommended order is to run Sculptor and then Ensembler, for reasons explained in the MR FAQ.

Molecular replacement

The procedure to place a search model is roughly divided into two steps, a rotation function (RF) to determine its orientation, and a translation function (TF) to determine its absolute position in the unit cell. Multiple components can be placed sequentially to solve the structure of a complex or the structure of a crystal containing multiple copies in the asymmetric unit.

In Phenix, MR is performed by the program Phaser, written by Randy Read's group at the University of Cambridge. Although Phaser may be run on the command line with CCP4-style inputs, we recommend using the Phaser-MR GUI. This GUI requires manual preparation and specification of search models, but enables fine control over parameters and multi-step searches, which may be necessary for difficult structures. See the Phaser-MR GUI manual for additional details specific to that interface.

(Phaser is also used for experimental phasing, but this functionality is exposed through the AutoSol wizard and Phaser-EP GUI.)

Input files and mandatory parameters

All of the MR programs in Phenix require a single reflections file containing experimental data (with sigmas); the Phaser-MR GUI will accept any file format or data type, including intensities. The procedure traditionally uses all reflections, so R-free flags are not required. (Unlike refinement, this does not significantly bias the final R-free value, since there are so few degrees of freedom: six to rotate and translate each molecule, and one each for the overall B-factor and estimated RMS error.)

At least one search model is required; in the context of Phaser the search model is usually referred to as an "ensemble". In many cases this will be a single PDB file containing one structure. For more distant models, an actual ensemble model may be used instead - either a PDB file with multiple MODEL records, or multiple similar PDB files. When using a multi-model ensemble, all models must be superposed in the same orientation; the Ensembler utility is used for this. There are no limitations on the size, complexity, or number of search models. For complexes or multimers, if the relative positions of individual subunits do not change, the entire assembly (for instance, a ribosomal subunit) may be used as a search model instead of placing each component separately.

Another option is to search using electron density (or rather, an MTZ file containing pre-weighted map coefficients), often solved at low resolution or obtained from cryoEM image reconstruction. This requires additional information about the center and extent of the map section to search with. It is best to prepare this MTZ file using the phenix.cut_out_density tool, making sure that the density is placed in a unit cell that is at least 2.5 times the x, y and z extents of the density.

The maximum likelihood phasing methods used in Phaser require prior knowledge about the deviation (or variance) of the search model(s) from the real structure, and the expected ASU contents or scattering mass of the crystal. To specify the model variance, either an RMSD value or percent sequence identity may be used (these will be converted internally, using relationships determined from a database of test calculations). It is important to minimize the variance if possible (see Model selection for guidelines), which often requires eliminating atoms or modifying B-factors. The Sculptor utility will perform this step, given a PDB file and a sequence alignment. (This is usually unnecessary for search models with high sequence identity to the target molecule, however, especially at lower sequence identity, processing models with the Sculptor utility is highly recommended.) Additionally, the Ensembler utility can trim loops that deviate among members of the ensemble, leaving only the conserved core.

For ASU contents, you may supply a sequence file (protein or nucleic acid), or simply enter the molecular weight. The standalone versions of Phaser also accept the fractional ASU contents of each search ensemble, if known. Note that the ASU contents data does not necessarily have a 1:1 correspondence with the search ensembles (see MR FAQ for details); if you wish to simply specify the molecular weight of a complex while searching for multiple subunits, you may enter the mass for the complex as a single component. (You may also use multi-record FASTA-format sequence files.) Even if you are only searching for a single ensemble out of several (e.g. the protein in a protein-DNA complex), you must still supply the expected ASU contents of the entire crystal because Phaser needs to know what fraction of the asymmetric unit each search model comprises.

Outline of MR procedure

The automated molecular replacement method in Phaser involves several discrete steps:

Anisotropy correction: scales reflections as necessary to overcome anisotropy (weak data in a particular direction).

Translational non-crystallographic symmetry (tNCS) correction: checks for the presence of tNCS. If present, parameters describing the translation and small orientation differences between copies are determined and used to compute correction factors.

Rotation function: identifies possible orientations of the model.

Translation function: given the orientation(s) from the RF, finds the absolute position(s) in the unit cell.

Packing analysis: filters TF results based on number of clashes between atoms, given a certain cutoff.

Refinement and phasing: performs simple rigid-body refinement of the placed molecules, and calculates phases from the final solution.

Log-likelihood gain calculation: determines the final LLG, which can be used to evaluate the success of MR.

If multiple search models are used, these steps will be performed sequentially for each model. Although each step may be run individually in the Phaser-MR GUI, this is necessary only in exceptionally difficult cases. In fact, it is usually better to allow Phaser to work out an optimal search order and, if necessary, test different choices of search order.

Limitations

You should watch out for otherwise valid solutions being thrown out because of packing clashes due to model deviations and/or extra residues. Phaser will attempt to resolve clashes by pruning segments of chain that refine to low occupancy values. If this does not succeed, manually removing the offending residues or lowering the packing cutoff can circumvent the problem.

For cases where anomalous data from a SAD experiment are available, a poor (but genuine) MR solution may be used to identify heavy-atom sites and combined with SAD phases, a technique known as MR-SAD. This may provide a decent-quality map where neither technique is independently sufficient. At the end of MR in the Phaser-MR GUI, if SAD data are available a button will be presented allowing MR-SAD in the AutoSol wizard to be launched easily.

Automated Molecular replacement

The MRage framework integrates model processing steps and molecular replacement into one application. It provides a highly customizable interface with parallelization options and the ability to connect to the NCBI Blast and wwPDB websites to perform homology searches and fetch potential models. A separate MRage tutorial is available on the Phaser home page.

Frequently Asked Questions

These are now covered on a separate section.

Reference

Phaser crystallographic software. A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni, and R.J. Read. J Appl Crystallogr 40, 658-674 (2007).