Automated molecular replacement with AutoMR

Author(s)

Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
AutoMR Wizard: Tom Terwilliger, Laurent Storoni, Randy Read, and Airlie McCoy
phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoMR Wizard

The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.

The AutoMR Wizard begins with datafiles with structure factor amplitudes and uncertainties, a search model or models, and identifies placements of the search models that are compatible with the data.

Usage

The AutoMR wizard has been deprecated and is no longer available in the Phenix GUI, but it may still be run from the command-line, and from parameters files. Both versions are identical except in the way that they take commands from the user. See Using the PHENIX Wizards for details of how to run a Wizard.

Summary of inputs and outputs for AutoMR

Input data file. This file can be in most any format, and must contain either amplitudes or intensities and sigmas. You can specify what resolution to use for molecular replacement and separately what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended) then defaults will be used for molecular replacement (i.e. use data to 2.5A if available to solve structure, then carry out rigid body refinement of final solution with all data) and all the data will be used for model rebuilding.

Contents of the asymmetric unit. PHASER needs to know what the total mass in the asymmetric unit is (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.

Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.

Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as starting points for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order of searching for ensembles makes a difference, but Phaser chooses a sensible default search order based on the size and assumed accuracy of the different ensembles. In difficult cases you could try permuting the search order.

Each ensemble can be specified by a single PDB file or a set of PDB files. The contents of one set of PDB files for an ensemble must all be oriented in the same way, as they will be put together and used as a group always in the molecular replacement process.

You will need to specify how similar you think each input PDB file that is part of an ensemble is to the structure that is in your crystal. You can specify either sequence identity, or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!

Output of AutoMR

Output files from AutoMR

When you run AutoMR the output files will be in a subdirectory with your run number:

AutoMR_run_1_/   # subdirectory with results

A summary file listing the results of the run and the other files produced:

AutoMR_summary.dat  # overall summary

A warnings file listing any warnings about the run

AutoMR_warnings.dat  # any warnings

A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)

AutoMR_Facts.dat   # all Facts about the run

Molecular replacement model, structure factors, and map coefficients:

MR.1.pdb
MR.1.mtz

The AutoMR wizard writes out MR.1.pdb and MR.1.mtz as well as output log files. The MR.1.pdb file will contain all the components of your MR solution. If there are multiple PDB files in an ensemble, the model with the lowest estimated rmsd is chosen to represent the whole ensemble and is written to MR.1.pdb. If there are multiple copies of a model, the chains are lettered sequentially A B C... The MR.1.mtz file contains the data from your input file to the full resolution available, as well as sigmaA-weighted 2Fo-Fc map coefficients based on the rigid-body-refined model.

Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.1.mtz.

How to run the AutoMR Wizard

Running the AutoMR Wizard is easy. For example, from the command-line you can type:

phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1

The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000 and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.

Components, copies, search models, and ensembles

Your structure is composed of one or more components such as a 20Kd subunit with sequence seq-of-20Kd-subunit.
There may be one or more copies of each component in your structure.
You can search for the location(s) of a component with a search model that consists of a single structure or an ensemble of structures.

What the AutoMR wizard needs to run

In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:

1. a datafile name (native.sca or data=native.sca)
1. a search model (search_model.pdb or coords=search_model.pdb)
(3) how similar the search model is to your structure ( RMS=0.8 or identity=75)
(4) information about the contents of the asymmetric unit: (mass=23000 or seq_file=seq.dat) and (copies=1)

It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as

coords="model_1.pdb" coords="model_2.pdb" coords="model_3.pdb"

In this case you need to give the RMS or identity for each model: identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell. You may also use phenix.ensembler to generate a single multi-model PDB file containing the entire ensemble. In this case you should specify a single overall RMS or identity for the ensemble.

Running from a parameters file

You can run phenix.automr from a parameters file. This is often convenient because you can generate a default one with:

phenix.automr --show_defaults > my_automr.eff

and then you can just edit this file to match your needs and run it with:

phenix.automr  my_automr.eff

If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use a parameters file and specify them like this (put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = None
    RMS = "0.85"
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = None
    RMS = "0.90"
  }
  component {
    seq_file = "seq1.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
  component {
    seq_file = "seq2.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
}

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file data.mtz has columns F SIGF then you might specify

data=data.mtz
input_label_string="F SIGF"

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=data.mtz  # display all labels for data.mtz

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Using the PHENIX Wizards for how to do this. Some of the most common parameters are:

data=w1.sca       # data file
model=coords.pdb  # starting model
seq_file=seq.dat  # sequence file

Examples

Standard AutoMR run with coords.pdb native.sca

Run AutoMR using coords.pdb as search model, native.sca as data, assume RMS between coords.pdb and true model is about 0.85 A, the sequence of true model is seq.dat and there is 1 copy in the asymmetric unit:

phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Specifying data columns

Run AutoMR as above, but specify the data columns explicitly:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the data columns are specified by a string that includes both F and SIGF : "F SIGF". The string must match some set of data labels that can be extracted automatically from your data file. You can find the possible values of this string as described above with

phenix.automr display_labels=data.mtz

Specifying a refinement file for AutoBuild

Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:

phenix.automr coords.pdb RMS=0.85 seq.dat copies=1  \
    data=data.mtz input_label_string="F SIGF"  \
    input_refinement_file=refinement.mtz \
    input_refinement_labels="FP SIGFP FreeR_flag"  \
    n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the commands input_refinement_file and input_refinement_labels are in the scope "autobuild_variables" . These commands and others with this prefix are passed on to AutoBuild.

AutoMR searching for 2 components

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures. This is all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  build = False
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = None
    RMS = "0.85"
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = None
    RMS = "0.90"
  }
  component {
    seq_file = "seq1.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
  component {
    seq_file = "seq2.dat"
    component_type = *protein nucleic_acid
    mass = None
    component_copies = 1
  }
}

Specifying molecular masses of 2 components

Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%). This is again all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  seq_file = seq.dat
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = 50
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 1
    coords = mol2.pdb
    identity = 60
  }
  component {
    component_type = *protein nucleic_acid
    mass = 30000
    component_copies = 1
  }
  component {
    component_type = *protein nucleic_acid
    mass = 40000
    component_copies = 1
  }
  autobuild_variables{
    n_cycle_rebuild_max = 1
  }
}

AutoMR searching for 2 components, but specifying the orientation of one of them

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures. Define the orientation and position of one component. Define the number of copies to find for each component (0 for beta, which is fixed, 1 for blip). This is again all done by creating a parameters file with all the control information in it. Put all of this in a file like "my_mr.eff" and run it with "phenix.automr my_mr.eff":

automr {
  data = "w1.sca"
  seq_file = seq.dat
  ensemble {
    ensembleID = "mol1"
    copies_to_find = 1
    coords = mol1.pdb
    identity = 50
  }
  ensemble {
    ensembleID = "mol2"
    copies_to_find = 0
    coords = mol2.pdb
    identity = 60
  }
  component {
    component_type = *protein nucleic_acid
    mass = 30000
    component_copies = 1
  }
  component {
    component_type = *protein nucleic_acid
    mass = 40000
    component_copies = 1
  }
  autobuild_variables{
    n_cycle_rebuild_max = 1
  }
 fixed_ensembles {
 fixed_ensembleID_list="mol2"
 fixed_euler_list = 199.84 41.535 184.15
 fixed_frac_list = -0.49736 -0.15895 -0.28067
 }
}

Note: you have to define an ensemble for the fixed molecule (mol2 in this example) and that you search for 0 copies of this molecule.

Possible Problems

Specific limitations and problems

The AutoBuild Wizard can build PROTEIN, RNA, or DNA, but it can only build one at a time. If your MR model contains more than one type of chain, then you will need to run AutoBuild separately from AutoMR and when you run AutoBuild, specify one of them with input_lig_file_list and the type of chain to build with chain_type:

input_lig_file_list=ProteinPartofMRmodel.pdb
chain_type=DNA

The keywords "cell" and "sg" have been replaced with "unit_cell" and "space_group" to make the keywords the same as in other phenix applications.
The syntax for searches with more than one ensemble and more than one component have changed in PHENIX version 1.4.
If you use an ensemble as a search model, the output structure will contain just the first member of the ensemble, so you may wish to put the member that is likely to be the most similar to the true structure as the first one in your ensemble.
AutoMR no longer is able to pass arbitrary commands to AutoBuild (this was discontinued to allow for a fixed set of inputs for AutoBuild).
The AutoMR Wizard can take most settings of most space groups, however it can only use the hexagonal setting of rhombohedral space groups (eg., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting due to difficulties with the use of asuset in the version of ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Phaser crystallographic software. A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni, and R.J. Read. J Appl Crystallogr 40, 658-674 (2007).

Likelihood-enhanced fast rotation functions. L.C. Storoni, A.J. McCoy, and R.J. Read. Acta Crystallogr D Biol Crystallogr 60, 432-8 (2004).

Likelihood-enhanced fast translation functions. A.J. McCoy, R.W. Grosse-Kunstleve, L.C. Storoni, and R.J. Read. Acta Crystallogr D Biol Crystallogr 61, 458-64 (2005).