Automated Model Building and Rebuilding using AutoBuild

Author(s)

Purpose of the AutoBuild Wizard

The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through a parameters file. The AutoBuild Wizard begins with datafiles containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model. It then carries out cycles of model-building and refinement alternating with model-based density modification, and produces a relatively complete atomic model.

The AutoBuild Wizard uses RESOLVE, xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.

The Wizard begins with either experimental phases (e.g., from AutoSol) or with an atomic model that can be used to generate calculated phases. The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions > 3 A the model may be quite incomplete and not well refined.

The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.

The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).

Usage

The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from parameters files. All three versions are identical except in the way that they take commands from the user. See Using the PHENIX Wizards for details of how to run a Wizard. The command-line version will be described here.

How the AutoBuild Wizard works

The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model.

In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building.

Additionally, during the first few cycles additional phase information is obtained by detecting and enhancing (1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and density at the coordinates of individual atoms, and an analysis including a summary of atoms and residues that are in strong, moderate, or weak density and out of density is provided.

Automation and user control

The AutoBuild Wizard has been designed for ease of use combined with maximal user control, with as many parameters set automatically by the Wizard as possible, but with parameters remaining accessible to the user through a GUI and through parameters files. The Wizard uses the input/output routines of the cctbx library, allowing data files of many different formats so that the user does not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild Wizard allows a high degree of automation of refinement so that neither the user nor the Wizard is required to specify parameters for refinement. The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.

Core modules in the AutoBuild Wizard

The five core modules in the AutoBuild Wizard are

The standard procedures available in the AutoBuild Wizard that are based on these modules include:

Starting from a set of experimental phases and structure factor amplitudes, normally model-building and completion starting from experimental phases is carried out, and then the resulting model is rebuilt from scratch.

Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, rebuilding a model in place is by default carried out if the starting model differs less than about 50% in sequence from the desired model, and otherwise the resulting model is rebuilt from scratch. It is generally a good idea to specify which you want to happen using the keyword "rebuild_in_place=True" (to keep the basic input model) or "rebuild_in_place=False" (to build a new model).

What the AutoBuild wizard needs to run

...and optional files

  • coefficients for a starting map (map_file=resolve.mtz)
  • a file for refinement (refinement_file=exptl_fobs_freeR_flags.mtz)
  • a high-resolution datafile (hires_file=high_res.sca)

Anisotropy correction and B-factor sharpening

The AutoBuild wizard will apply an anisotropy correction and B-factor sharpening to all the raw experimental data by default (controlled by the keyword remove_aniso=True). The target overall Wilson B factor can be set with the keyword b_iso, as in b_iso=25. By default the target Wilson B will be 10 times the resolution of the data (e.g., if the resolution is 3 A then b_iso=30.), or the actual Wilson B of the data, whichever is lower.

If an anisotropy correction is applied then the entire AutoBuild run will be carried out with anisotropy-corrected and sharpened data. At the very end of the run the final model will be re-refined against the uncorrected refinement data and this re-refined model and the uncorrected refinement data (with freeR flags) will be written out as overall_best.pdb and overall_best_refine_data.mtz.
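The default target for b_iso described above is simple arithmetic, and a minimal sketch of it is shown here. The function name default_b_iso is hypothetical and is not part of Phenix; it only restates the rule "10 times the resolution, or the actual Wilson B, whichever is lower".

```python
def default_b_iso(resolution, wilson_b):
    """Default target Wilson B as described in the text (illustrative only).

    resolution: high-resolution limit of the data in Angstroms
    wilson_b:   actual Wilson B of the data
    """
    return min(10.0 * resolution, wilson_b)


# 3 A data with a Wilson B of 45: the target is 10 x 3 = 30
print(default_b_iso(3.0, 45.0))
# 3 A data with a Wilson B of 22.5: the actual Wilson B is lower, so it is used
print(default_b_iso(3.0, 22.5))
```

If you prefer a different target, you would set it explicitly with b_iso on the command line as shown above.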

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file ref.mtz has columns FP SIGFP and FreeR then you might specify

refinement_file=ref.mtz
input_refinement_labels="FP SIGFP None None None None None None FreeR"

The keywords for labels and anticipated input labels (program labels) are:

input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag
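As an illustration of the "one name per expected column, None for anything missing" rule, the following sketch builds a label string from the program labels listed above. The helper build_label_string and the dictionary layout are assumptions made for this example and are not Phenix functions.

```python
# Program labels expected for the main data file, in the order listed above.
INPUT_LABELS = ["FP", "SIGFP", "PHIB", "FOM", "HLA", "HLB", "HLC", "HLD", "FreeR_flag"]


def build_label_string(program_labels, columns_in_file):
    """Return one entry per expected program label, with "None" where missing.

    columns_in_file maps a program label to the column name used in your file.
    """
    return " ".join(columns_in_file.get(label, "None") for label in program_labels)


# A file that only has FP, SIGFP and a free-R column called FreeR
labels = build_label_string(
    INPUT_LABELS, {"FP": "FP", "SIGFP": "SIGFP", "FreeR_flag": "FreeR"}
)
print('input_labels="%s"' % labels)
```

The resulting string has nine entries, one for each program label, so the Wizard can tell which columns are absent.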

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=w1.mtz  # display all labels for w1.mtz

NOTES: if your data files contain a mixture of amplitude and intensity data then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under General Limitations for details).

Specifying other general parameters

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Using the PHENIX Wizards for how to do this. Some of the most common parameters are:

data=w1.sca       # data file
model=coords.pdb  # starting model
rebuild_in_place=true # rebuild input model in place
rebuild_in_place=false # build a new model; add or subtract residues
                       #   from input model as necessary
seq_file=seq.dat  # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3     # dmin of 3 A
s_annealing=True  # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5  # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5  # max number of rebuild cycles (starting from a model)

Running from a parameters file

You can run phenix.autobuild from a parameters file. This is often convenient because you can generate a default one with:

phenix.autobuild --show_defaults > my_autobuild.eff

and then you can just edit this file to match your needs and run it with:

phenix.autobuild  my_autobuild.eff

Picking waters in AutoBuild

By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that waters are placed if the resolution of the data is high enough (typically 3 A or better).

You can tell AutoBuild not to have phenix.refine pick waters with the command:

place_waters=False

If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):

refinement {
  ordered_solvent {
    low_resolution = 2.8
  }
}
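The interaction between place_waters and the low-resolution cutoff can be summarized in a short sketch. It assumes only the rule stated on this page: waters are not added when the resolution used for refinement is numerically larger than the cutoff. The function name waters_will_be_placed is hypothetical, and the default cutoff of 3.0 is the AutoBuild default for ordered_solvent_low_resolution quoted later on this page.

```python
def waters_will_be_placed(place_waters, refinement_resolution, low_resolution=3.0):
    """Return True if ordered solvent would be added under the rule in the text.

    refinement_resolution: resolution used for refinement, in Angstroms
    low_resolution:        cutoff for adding waters, in Angstroms
    """
    return bool(place_waters) and refinement_resolution <= low_resolution


print(waters_will_be_placed(True, 2.5))        # data better than the cutoff
print(waters_will_be_placed(True, 3.5))        # data worse than the cutoff
print(waters_will_be_placed(True, 3.5, 3.6))   # cutoff raised to 3.6 A
print(waters_will_be_placed(False, 2.0))       # place_waters=False
```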

Keeping waters from your input file in AutoBuild

You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say,

keep_input_waters=True
place_waters=No

NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or "keep_pdb_atoms=No" . This is because if place_waters=Yes and keep_pdb_atoms=Yes then phenix.refine will add waters and then the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.
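The combination of keywords described in this note can be checked before a run. The sketch below is only an illustration of that rule; check_water_keywords is a hypothetical helper, not part of the Wizard.

```python
def check_water_keywords(keep_input_waters, place_waters, keep_pdb_atoms):
    """Return a warning if new waters would be kept in preference to input waters.

    Returns None when the combination of keywords is consistent with the note.
    """
    if keep_input_waters and place_waters and keep_pdb_atoms:
        return ("keep_input_waters=True also needs place_waters=No "
                "or keep_pdb_atoms=No")
    return None


print(check_water_keywords(True, True, True))    # problematic combination
print(check_water_keywords(True, False, True))   # consistent: returns None
```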

Twinning and AutoBuild

AutoBuild does not know about twinning, but you can incorporate a twin law into the refinement steps in the AutoBuild procedure if your crystal is twinned. Use phenix.xtriage to identify twinning and the twin law. Then specify the twin law in a parameters file (see next section) and provide that to AutoBuild with a keyword such as "refine_eff_file=twin_law.eff".

You may also want to try using the keyword "two_fofc_in_rebuild" which will use the 2Fo-Fc map from phenix.refine in model-building.

R-free flags and test set

In Phenix the parameter test_flag_value sets the value of the test set that is to be free. Normally Phenix sets up test sets with values of 0 and 1 with 1 as the free set. The CCP4 convention is values of 0 through 19 with 0 as the free set. Either of these is recognized by default in Phenix and you do not need to do anything special. If you have any other convention (for example values of 0 to 19 and test set is 1) then you can specify this with test_flag_value.

Note that phenix.refine and AutoBuild write out PDB files that contain the test_flag_value. AutoBuild can read this test_flag_value and use it automatically. However if there is a conflict between this test_flag_value and the default value based on your data file, you may have to specify which to use.
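The two conventions described above can be illustrated with a small sketch. This is not the procedure Phenix uses internally, which is not described on this page; guess_test_flag_value is a hypothetical function that only encodes the two conventions named in the text.

```python
def guess_test_flag_value(flags):
    """Guess the free-set value for the two conventions described in the text."""
    values = set(flags)
    if values == {0, 1}:
        return 1      # flags of 0 and 1: 1 marks the free set
    if values and values <= set(range(20)) and max(values) > 1:
        return 0      # CCP4-style flags 0 through 19: 0 marks the free set
    return None       # any other convention: specify test_flag_value yourself


print(guess_test_flag_value([0, 1, 0, 0, 1]))
print(guess_test_flag_value(list(range(20))))
print(guess_test_flag_value([5, 21, 5]))
```

Note that a file with values of 0 to 19 in which the test set is 1 looks identical to the CCP4 convention from the flag values alone, which is why test_flag_value has to be given explicitly in that case.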

Special note on anomalous data and AutoBuild: AutoBuild does not support anomalous test sets. If you have a data file with anomalous data that has Rfree flags such as Rfree(+),Rfree(-) then you will need to merge these Rfree flags before running AutoBuild. Here is how:

Go to the reflection file editor, read in your refine_data.mtz (or whatever it is called) file with anomalous data. Copy all the data and Rfree flags to the output file, but select "Edit arrays" and in the window that comes up do the following:

  1. Change the names of the output data arrays from

    I+ SigI+ I- SigI- to I SigI (or equivalent)

  2. Specify "merge if present" for "anomalous"

  3. Do the same for the Rfree Flags array.

  4. Run the reflection editor.

Now you have a data file that is non-anomalous and that has the same test set as your original. You can use this in AutoBuild.

Specifying phenix.refine parameters

You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:

refine_eff_file=refinement_params.eff    # set any phenix.refine params not set by AutoBuild

This file might contain a twin-law for refinement:

refinement {
  twinning {
    twin_law = "-k, -h, -l"
  }
}

You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.

Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by user in input_eff_file)

phenix.refine keyword | Wizard keyword(s) and notes
refinement.main.number_of_macro_cycles | ncycle_refine
refinement.main.simulated_annealing | s_annealing (only applies to 1st refinement in rebuild. SA in any other refinements controlled by input_eff_file, if any)
refinement.ncs.find_automatically | refine_with_ncs=True turns on automatic ncs search
refinement.main.ncs | refine_with_ncs=True turns on ncs
refinement.ncs.coordinate_sigma | Normally not set by Wizard. However if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True then the ncs coordinate sigma is equal to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among ncs copies
refinement.main.random_seed | i_ran_seed sets the random seed at the beginning of a Wizard... this affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines)
refinement.main.ordered_solvent | place_waters=True will set ordered_solvent to True. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
refinement.main.ordered_solvent | place_waters_in_combine=True will set ordered_solvent to True, only applying this to the final combination step of multiple-model generation. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
refinement.ordered_solvent.low_resolution | ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) to 3 A. If the resolution used for refinement is larger than the value of ordered_solvent_low_resolution then ordered solvent is not added.
refinement.main.use_experimental_phases | use_mlhl=True will set refinement.main.use_experimental_phases to True
refinement.refine.strategy | The Wizard keywords refine refine_b refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default) isotropic displacement factors are refined. If refine_xyz=True (default) coordinates are refined.
refinement.main.occupancy_max | max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. Default is to do nothing and use the default from phenix.refine (1.0)
refinement.refine.occupancies.individual | The combination of Wizard keywords of semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add to your .eff file other names of atoms to have occupancies refined as well.
refinement.main.high_resolution | Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.
refinement.pdb_interpretation.link_distance_cutoff | link_distance_cutoff

The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user

Specifying resolve/resolve_pattern parameters

Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the command like this:

resolve_command="'resolution 200 3'"    # NOTE ' and " quotes

This will put the text

resolution 200 3

at the end of every temporary command file created to run resolve. (This is why it is not overridden by AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.
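The reason for the two sets of quotes can be seen by looking at what the shell hands to the program. The sketch below uses Python's shlex.split, which follows POSIX shell quoting rules, to show the argument as it would arrive. The data file name is taken from the examples on this page, and how the Wizard then interprets the inner quotes is an assumption here.

```python
import shlex

# The command as you would type it; the outer " quotes are consumed by the shell.
command_line = '''phenix.autobuild data=w1.sca resolve_command="'resolution 200 3'"'''

argv = shlex.split(command_line)

# The last argument still carries the inner ' quotes, so the three words
# "resolution 200 3" stay together as one value.
print(argv[-1])
```

With only one set of quotes the shell would strip them, and the value would reach the Wizard without any quoting around the embedded spaces.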

Including ligand coordinates in AutoBuild

If your input PDB file contains ligands (anything other than solvent that is not protein if your chain_type=PROTEIN, for example) then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing the keywords from these defaults:

keep_input_ligands=True
keep_input_waters=False

The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.

You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a parameters file. In the command-line version you do this with:

input_lig_file_list=my_ligand.pdb

NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.

NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand. In that case you will want to run phenix.elbow to create a cif definition file for this ligand:

phenix.elbow my_ligand.pdb --id=LIG

where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work you may need to give phenix.elbow additional information to generate your ligand.

Once phenix.elbow has generated your ligand you can use the keyword "cif_def_file_list" to tell AutoBuild about this ligand:

cif_def_file_list=elbow.LIG.my_ligand.pdb.cif

Specifying arbitrary commands and cif files for phenix.refine

You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:

refine_eff_file_list=link.eff cif_def_file_list=link.cif

This example comes from the phenix.refine manual page in which a link is specified in a cif definition file link.cif:

 data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
 5pho     add      .      O5T    O    OH      .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
 5pho     add      O5T     P         coval        1.520    0.020

and this is applied with a parameters file link.eff:

 refinement.pdb_interpretation.apply_cif_modification
{
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}

You can have any number of cif files and parameters files.

Output files from AutoBuild

When you run AutoBuild the output files will be in a subdirectory with your run number:

AutoBuild_run_1_/   # subdirectory with results

The key output files that are produced are:

AutoBuild_summary.dat  # overall summary
AutoBuild_run_1_1.log # overall log file
AutoBuild_warnings.dat  # any warnings
overall_best.pdb

NOTE 1: The "working_best.pdb" file is the current working best model. If an anisotropy correction and sharpening are applied (remove_aniso=True) then working_best.pdb will be refined against the corrected data. At the end of the run the last working_best.pdb will be re-refined against the original data (overall B refined only) and written out as overall_best.pdb.

NOTE 2: If there are multiple chains or multiple ncs copies, each chain will be given its own chainID (A B C D...). Segments that are not assigned to a chain are given a separate chainID and are given a segid of "UNK" to indicate that their assignment is unknown. ChainID's for ligands are kept as input. The chainID for solvent molecules is normally S.

overall_best_denmod_map_coeffs.mtz
overall_best_refine_map_coeffs.mtz
overall_best_refine_data.mtz

NOTE: The labels for this mtz file are typically:

FP SIGFP PHIM FOMM HLAM HLBM HLCM HLDM FreeR_flag

The file overall_best_refine_data.mtz (identical to the file exptl_fobs_phases_freeR_flags.mtz) has a copy of the (experimental) HL coefficients that were input to autobuild. The labels HLAM HLBM etc have the ending "M" because they were copied by resolve and it outputs these labels...but in fact they are not density modified phases from autobuild, just copied straight from the input data file.

overall_best.log
overall_best.log_refine
overall_best.log_eval
overall_best_ncs_info.ncs

Standard building, rebuild_in_place, and multiple-models

The AutoBuild Wizard has two overall methods for building a model.

The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and building model outside of the region that has already been built.

The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. The way this works is a segment of the model is deleted and then is filled-in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model. NOTE: If you are using rebuild_in_place then your model must be quite similar to your sequence file, and in particular the model must not extend in the N-terminal direction beyond your sequence file. Minor edits (amino acid replacements) will be done automatically. Also NOTE: rebuild_in_place is not designed for models that contain alternate conformations. It is designed for a model with a single conformation. If you supply a model with some residues or side-chains with a blank altloc, and some with an altloc of A and some with B, then all those with A or B will be ignored (only the first conformer is considered).

The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models. By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks. See Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases and Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models.

Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:

Phenix.autobuild is set up so that you can specify the number of processors (nproc). Here is how to choose how to set it:

Additionally you will want to set two more parameters:

run_command ="command you use to submit a job to your system"
background=False   # probably false if this is a cluster, true if this is a multiprocessor machine

If you have a queueing system with 20 nodes, then you probably submit jobs with something like "qsub -someflags myjob.sh", where someflags are whatever flags you use (or just "qsub myjob.sh" if no flags). Then you might use

run_command="qsub -someflags"  background=False nproc=20

or

run_command="qsub"  background=False nproc=20

or, if you have a 20-processor machine instead, then you might say

run_command=sh  background=True nproc=20

so that it would run your jobs with sh on your machine, and run them all in the background (i.e., all at one time).
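The two cases above differ only in the values of run_command and background. As a minimal sketch, the helper below (parallel_keywords, a hypothetical name) assembles the three keywords for either case as a list of arguments, which could be appended to a phenix.autobuild command.

```python
def parallel_keywords(nproc, cluster, submit_command="qsub"):
    """Return the keyword arguments for parallel runs as described in the text.

    cluster=True  : jobs go to a queueing system via submit_command
    cluster=False : jobs run with sh in the background on one machine
    """
    if cluster:
        return ["run_command=%s" % submit_command,
                "background=False",
                "nproc=%d" % nproc]
    return ["run_command=sh", "background=True", "nproc=%d" % nproc]


print(" ".join(parallel_keywords(20, cluster=True)))
print(" ".join(parallel_keywords(20, cluster=False)))
```

If your submit command contains spaces (for example flags to qsub), it needs to be quoted on the command line as in the examples above.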

Resolution limits in AutoBuild

There are several resolution limits used in AutoBuild. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

Name | Description | How default value is set
resolution | Overall resolution. Used as high-resolution limit for density modification. Used as default for refinement resolution and model-building resolution if they are not set. | Resolution of input datafile. If a hires datafile is provided, the resolution of that data is used.
refinement_resolution | Resolution for refinement | value of "resolution"
resolution_build | Resolution for model-building | value of "resolution"
overall_resolution | Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files. | None
multiple_models_starting_resolution | Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution to generate diversity. | 4 A by default
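The way the defaults cascade from "resolution" can be written out as a short sketch. The function default_resolution_limits is hypothetical and only mirrors the defaults listed above; it is not how the Wizard is implemented.

```python
def default_resolution_limits(data_resolution, hires_resolution=None,
                              resolution=None, refinement_resolution=None,
                              resolution_build=None):
    """Fill in resolution limits using the defaults listed in the text."""
    if resolution is None:
        # Resolution of the hires datafile if one is provided, else of the input data
        resolution = hires_resolution if hires_resolution is not None else data_resolution
    return {
        "resolution": resolution,
        "refinement_resolution":
            resolution if refinement_resolution is None else refinement_resolution,
        "resolution_build":
            resolution if resolution_build is None else resolution_build,
        "overall_resolution": None,                    # no truncation by default
        "multiple_models_starting_resolution": 4.0,    # 4 A by default
    }


limits = default_resolution_limits(2.5, hires_resolution=1.8, resolution_build=2.2)
for name, value in limits.items():
    print(name, value)
```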

Phase extension in AutoBuild

If you supply a starting map file and a hires_file (with native data to higher resolution) and you do not supply a model, then AutoBuild will by default carry out phase extension, increasing the resolution in steps of s_step in s (where s = 1/d_min). If you do supply a model, or you do not supply a hires_file, or you do not supply a starting map file, then the final resolution will be used throughout (no phase extension steps).
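Stepping in s = 1/d_min rather than in d_min means the resolution limits are evenly spaced in reciprocal space. The sketch below shows one way such a schedule could be generated. The starting resolution (taken here as the resolution of the starting map), the s_step value of 0.1, and the exact handling of the first and last steps are assumptions for illustration; phase_extension_resolutions is a hypothetical function.

```python
def phase_extension_resolutions(start_d_min, final_d_min, s_step):
    """Return resolution limits (in A) evenly spaced in s = 1/d_min.

    The last entry is always the final resolution.
    """
    if s_step <= 0:
        raise ValueError("s_step must be positive")
    s = 1.0 / start_d_min
    s_final = 1.0 / final_d_min
    limits = []
    while s + s_step < s_final:
        s += s_step
        limits.append(1.0 / s)
    limits.append(final_d_min)
    return limits


# From a 4 A starting map to 2 A data in steps of 0.1 in 1/d
schedule = phase_extension_resolutions(4.0, 2.0, 0.1)
print(["%.2f" % d for d in schedule])
```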

Sample AutoBuild Commands

NOTE: Output files will be in subdirectories labelled "AutoBuild_run_1_" "AutoBuild_run_2_" etc.

Run AutoBuild beginning with experimental data

phenix.autobuild data=solve_1.mtz seq_file=seq.dat \
input_ncs_file=ha.pdb

Here the data in solve_1.mtz (FP SIGFP PHIB FOM HLA HLB HLC HLD) will be used as the starting point for density modification. Then a model will be built and refined. In subsequent cycles the models that have been built will be used to improve the phases in density modification. If NCS can be found from the sites in ha.pdb or from any models that are built, then NCS will be used in density modification.

Run AutoBuild beginning with a model and rebuild in place

phenix.autobuild data=w1.sca seq_file=seq.dat model=coords.pdb \
rebuild_in_place=True

Here "rebuild_in_place=True" tells AutoBuild to keep the overall model you have supplied, not to add or subtract residues from it, except that AutoBuild will try to edit the model to match the sequence in your sequence file. The AutoBuild Wizard will use your model and the data in w1.sca to generate starting phases, then it will carry out density modification to improve those phases, and adjust your model, rebuilding the model to match the resulting map and refining the model. This will be done iteratively, with the new model from each cycle being used at the start of the next one. If NCS is found in your model then it will be used in the density modification process.

Add more residues to a model or rebuild a model

phenix.autobuild data=solve_1.mtz seq_file=seq.dat \
   model=coords.pdb rebuild_in_place=False

Here "rebuild_in_place=False" tells AutoBuild to build a new model, adding or subtracting residues as necessary. The data in solve_1.mtz (FP SIGFP PHIB FOM HLA HLB HLC HLD) will be used along with your model as the starting point for density modification. Then a new model will be built and refined. In subsequent cycles the models that have been built will be used to improve the phases in density modification. If NCS is found in your model or any model that is built, then it will be used in density modification.

Run AutoBuild automatically after AutoSol

phenix.autobuild after_autosol

AutoBuild will identify the AutoSol run (in your working directory) with the highest overall score, then it will take the experimental phases (solve_xx.mtz or phaser_xx.mtz, where xx is the solution number) from that run, along with the corresponding density-modified map (resolve_xx.mtz) and the heavy_atom file (ha_xx.pdb_formatted.pdb) as inputs. Additionally, data for refinement are read in from exptl_fobs_freeR_flags_xx.mtz.

AutoBuild will then build a model, refine it, use the refined model in density modification, then iterate the model-building, refinement, and density modification process until no further improvement in the model occurs.

Merge in hires data

phenix.autobuild data=solve_2.mtz hires_file=w1.sca  seq_file=seq.dat

The high-resolution data in w1.sca will be used for FP and SIGFP. Other information from solve_2.mtz (PHIB FOM HLA HLB HLC HLD) will be kept.

Truncate density at heavy-atom sites

phenix.autobuild data=solve_2.mtz seq_file=seq.dat input_ha_file=ha.pdb truncate_ha_sites_in_resolve=True

The heavy-atom sites in ha.pdb will be used to mark locations where high density is to be ignored during initial cycles of density modification. This can be useful if the heavy-atom peaks are very pronounced in the experimental map. The sites in ha.pdb will also be included in the model for the structure if they do not overlap with any atoms that are built as part of the model.

Skip NCS in model_building and refinement

phenix.autobuild data=solve_2.mtz seq_file=seq.dat find_ncs=False refine_with_ncs=False

The keyword "find_ncs=False" disables the finding of NCS from the models that are built and its use in density modification and model-building. The keyword "refine_with_ncs=False" disables finding NCS and its use in the refinement process. Together they prevent all use of NCS.

Make a SA-omit map around atoms in target.pdb

phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb   composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted.

Make a simple composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

Make a SA composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=sa_omit

Coefficients for the output simulated-annealing composite omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

Combine composite OMIT files from a set of parallel runs on different computers

If you run a composite OMIT job but it fails at the last step of combining files, or if you run all the individual omit boxes on different machines, you can still combine them all into one single composite omit map.

You can do this by copying all the individual mtz files with map coefficients for omit regions to a single directory.

Here is a script you can edit and use to combine omit maps representing different omit regions into one.

NOTE: the OMIT regions must be defined identically in this run and in the runs that produced your overall_best_denmod_map_coeffs.mtz_OMIT_REGION_1 etc. files. The n_xyz keyword, which sets the grid, ensures this. Copy its values from one of the resolve log files written when you ran your omit job (for example, AutoBuild_run_1_/TEMP0/AutoBuild_run_1_/TEMP0/resolve.log will have a line like "nu nv nw: 32 32 32 "; copy those numbers).

 ------------------------------------
#!/bin/csh -f
# COMBINE OMIT SCRIPT
phenix.resolve << EOD
hklin exptl_fobs_phases_freeR_flags.mtz
labin FP=FP SIGFP=SIGFP
n_xyz 32 32 32  # YOU MUST SET THIS BASED ON THE nu nv nw in a resolve log file.
solvent_content 0.85
no_build
ha_file NONE
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_1
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_2
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_3
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_4
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_5
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_6
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_7
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_8
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_9
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_10
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_11
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_12
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_13
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_14
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_15
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_16
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_17
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_18
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_19
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_20
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_21
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_22
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_23
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_24
omit
EOD
# END OF COMBINE OMIT SCRIPT
 ------------------------------------
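
The list of combine_map lines in the script above can also be generated rather than typed by hand. The following is a minimal sketch in POSIX shell, not part of PHENIX: the helper names get_n_xyz and make_combine_input are hypothetical, the resolve log is assumed to contain a line ending in the three grid numbers after "nu nv nw:", and the map files are assumed to follow the overall_best_denmod_map_coeffs.mtz_OMIT_REGION_<i> naming shown above. The other keyword values are copied unchanged from the script.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PHENIX) that write the resolve input
# used to combine omit-region maps.

# Print the three grid numbers from the first "nu nv nw:" line of a resolve log.
# Assumes the three numbers are the last three fields on that line.
get_n_xyz() {
  awk '/nu nv nw:/ { print $(NF-2), $(NF-1), $NF; exit }' "$1"
}

# Print resolve keywords for combining omit regions 1..N.
# Usage: make_combine_input N "nu nv nw"
make_combine_input() {
  n_regions=$1
  grid=$2
  # Keyword values below are copied from the COMBINE OMIT SCRIPT.
  echo "hklin exptl_fobs_phases_freeR_flags.mtz"
  echo "labin FP=FP SIGFP=SIGFP"
  echo "n_xyz $grid"
  echo "solvent_content 0.85"
  echo "no_build"
  echo "ha_file NONE"
  i=1
  while [ "$i" -le "$n_regions" ]; do
    echo "combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_$i"
    i=$((i + 1))
  done
  echo "omit"
}
```

For the 24 regions in the script above, the output of make_combine_input 24 "$(get_n_xyz resolve.log)" could be inspected first and then piped into phenix.resolve in place of the here-document.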

Make an iterative-build omit map around atoms in target.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
   composite_omit_type=iterative_build_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted.

Make an SA-omit map around residues 3 and 4 in chain A of coords.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
   omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A   \
   composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ . An additional map coefficients file omit_region.mtz will show you the region that has been omitted.

Create one very good rebuilt model

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
  include_input_model=True  \
  multiple_models_number=1 n_cycle_rebuild_max=5

The final model will be in the subdirectory MULTIPLE_MODELS in the file all_models.pdb (this file will contain just one model).

Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then specify also edit_pdb=False.)

Note also that if include_input_model=True then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.

Touch up a model

phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8

You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.

Remove the worst-fitting residues from a model

phenix.autobuild data=data.mtz model=coords.pdb \
 delete_bad_residues_only=True \
 input_map_file=map_coeffs.mtz \
 worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8

The trimmed model will be in the file (the run number may vary):

AutoBuild_run_1_/starting_model_trimmed.pdb

and the removed residues will be in the file:

AutoBuild_run_1_/starting_model_removed_residues.pdb

You can delete just the worst parts of your model by setting delete_bad_residues_only=True. You can decide what parts to remove based on a minimum model-map correlation (by residue). You can decide how much to remove using worst_percent_res_rebuild or with min_cc_res_rebuild, or both. (These are the same parameters used to decide which residues to rebuild with touch_up=True.)

Here the input_map_file is optional; if you do not provide it then a model-based density-modified map will be used to evaluate your model.

Create 20 very good rebuilt models that are as different as possible

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
   multiple_models_number=20 n_cycle_rebuild_max=5

The 20 final models will be in the subdirectory MULTIPLE_MODELS in the file all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.
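
To inspect the ensemble model by model, the combined file can be counted and split with standard tools. The sketch below is a hedged example in POSIX shell; it assumes that all_models.pdb separates its models with standard PDB MODEL and ENDMDL records, which is not confirmed here. The helper names count_models and split_models are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PHENIX) for a multi-model PDB file.
# Assumes models are delimited by standard MODEL / ENDMDL records.

# Print the number of MODEL records in the file.
count_models() {
  grep -c '^MODEL ' "$1"
}

# Write each model to PREFIX_1.pdb, PREFIX_2.pdb, ...
# Only the lines between MODEL and ENDMDL are copied, so header records
# such as CRYST1 are NOT carried over to the split files.
# Usage: split_models all_models.pdb PREFIX
split_models() {
  awk -v prefix="$2" '
    /^MODEL /  { n++; out = prefix "_" n ".pdb"; next }
    /^ENDMDL/  { close(out); out = ""; next }
    out != ""  { print > out }
  ' "$1"
}
```

For example, count_models MULTIPLE_MODELS/all_models.pdb would be expected to report 20 for the run above if the assumption about MODEL records holds.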

Combining files from a nearly-complete autobuild run with rebuild_in_place=True

If you have run autobuild with rebuild_in_place=True then the last step is combining the models that have been produced. If you ran the job in separate batches and want to combine the final models, you can use the script below.

Note that all the models must have exactly the same set of atoms (aside from any solvent).

Basically you run a dummy autobuild run to create a directory and database entries, then you copy your files there, then you run autobuild and tell it to carry on and do the combine step. You'll need a map_coeffs.mtz file that has map coefficients (they won't be used but have to be there just to make it run).

--------------------------------------------------------
#!/bin/csh -f
#COMBINE_MODELS SCRIPT

if (-d PDS || -d AutoBuild_run_1_) then
 echo "Please run in a directory without PDS or AutoBuild_run_1_"
 exit 1
endif

echo "Setting up combine models with a dummy run. NOTE: multiple_models_group_number must be correct"

phenix.autobuild fobs.mtz multiple_models=true seq_file=seq.dat \
 combine_only=true multiple_models_group_number=2 \
input_map_file=map_coeffs.mtz \
multiple_models_number=1 > dummy_autobuild.log

echo "Copying files to AutoBuild_run_1_/MULTIPLE_MODELS"
mkdir -p AutoBuild_run_1_/MULTIPLE_MODELS
cp coords1.pdb AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.pdb_1_1
cp coords2.pdb AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.pdb_1_2
cp map_coeffs_1.mtz AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.mtz_1_1
cp map_coeffs_2.mtz AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.mtz_1_2

ls AutoBuild_run_1_/MULTIPLE_MODELS/

echo "Running autobuild to combine files in AutoBuild_run_1_/MULTIPLE_MODELS"

phenix.autobuild combine_only=true seq_file=seq.dat carry_on=true run=1 > autobuild_combine.log

# END OF COMBINE_MODELS SCRIPT
-------------------------------------------------------
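
The script above stages two models. For a larger batch, the copy commands can be generated in a loop. The sketch below is a minimal POSIX shell example with a hypothetical helper name (print_stage_commands). It assumes that the _1_<i> suffix used for the two models above extends unchanged to further models, and that the inputs are named coords<i>.pdb and map_coeffs_<i>.mtz as in the script. Check multiple_models_group_number against the number of models you stage.

```shell
#!/bin/sh
# Hypothetical helper (not part of PHENIX): print the cp commands that stage
# N models and their map coefficients, following the
# coords<i>.pdb -> initial_model.pdb_1_<i> pattern of the script above.
# ASSUMPTION: the "_1_<i>" suffix generalizes beyond the two models shown.
# Usage: print_stage_commands N [DEST_DIR]
print_stage_commands() {
  n_models=$1
  dest=${2:-AutoBuild_run_1_/MULTIPLE_MODELS}
  i=1
  while [ "$i" -le "$n_models" ]; do
    echo "cp coords$i.pdb $dest/initial_model.pdb_1_$i"
    echo "cp map_coeffs_$i.mtz $dest/initial_model.mtz_1_$i"
    i=$((i + 1))
  done
}
```

The function only prints commands, so the list can be reviewed before it is run (for instance by piping it to sh).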

Build starting from a very accurate but very small part of a model

phenix.autobuild data=data.mtz model=MR.pdb \
rebuild_from_fragments=True \
seq_file=seq.dat \
i_ran_seed=124881 \
nproc=4

You can have autobuild try to rebuild starting from fragments of a model by setting rebuild_from_fragments=True. This sets the parameters two_fofc_denmod_in_rebuild=True, all_maps_in_rebuild=True, and rebuild_in_place=False, and sets consider_main_chain_list to include your input model. You might want to use this if you look for ideal helices using Phaser and then rebuild the resulting partial model, as in the Arcimboldo procedure. The special feature of helices is that they can be very accurately placed in some cases, which really helps the subsequent rebuilding. If you have enough computer time, run the procedure several or even many times with different values of i_ran_seed. Each time you will get a slightly different result.

Here two different types of density-modified maps are calculated and models are built with each. The starting phases and phase probabilities for one type are based on a sigmaA-weighted 2mFo-DFc map. Those for the other type come from density modification using a model-based map as a target map and finding phases that yield a map that is as close to this one as possible. In either case the starting phases and phase probabilities are used in a second cycle of density modification in which part of the density modification target is a calculated map and part is standard density modification (including solvent flattening, histogram matching, and NCS).
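
Repeating the run with several values of i_ran_seed can be scripted. The sketch below is a minimal POSIX shell example that only prints one command per seed. The helper name print_seed_runs is hypothetical, the seed values in the usage note are arbitrary, and the remaining keywords are copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of PHENIX): print one phenix.autobuild
# command per random seed. Nothing is executed here (dry run).
# Usage: print_seed_runs SEED [SEED ...]
print_seed_runs() {
  for seed in "$@"; do
    echo "phenix.autobuild data=data.mtz model=MR.pdb rebuild_from_fragments=True seq_file=seq.dat i_ran_seed=$seed nproc=4"
  done
}
```

For example, print_seed_runs 124881 71231 90017 prints three commands, which could then be run one after another (for instance by piping the output to sh).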

Morph an MR model and rebuild it

phenix.autobuild data=data.mtz model=MR.pdb \
morph=True rebuild_in_place=False seq_file=seq.dat

You can have autobuild morph your input model, distorting it to match the density-modified map that is produced from your model and data. This can be used to make an improved starting model in cases where the MR model is very different from the structure that is to be solved. For the morphing to work, the two structures must be topologically similar and differ mostly by movements of domains or motifs such as a group of helices or a sheet.

The morphing process consists of identifying a coordinate shift to apply to each N (or P for nucleic acids) atom that maximizes the local density correlation between the model and the map. This is smoothed and applied to the structure to generate a morphed structure.

Build an RNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA

Build a DNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA

Density-modify with or without a model and make maps

You can use the AutoBuild Wizard as a convenient way to run resolve density modification with or without including model-based information. Just use a command like this:

phenix.autobuild data=data.mtz model=coords.pdb \
   maps_only=True seq_file=seq.dat

or

phenix.autobuild data=data.mtz  \
   maps_only=True seq_file=seq.dat

The Wizard will calculate the same map that it would normally calculate given these data, and then it will write the map out and stop.

Density-modify starting with your map coefficients and make maps

You can use the AutoBuild Wizard as a convenient way to run resolve density modification starting with map coefficients you define. Just use a command like this:

phenix.autobuild data=data.mtz \
     maps_only=True  seq_file=seq.dat \
     map_file=starting_map.mtz map_labels="2FOFCWT PH2FOFCWT"

The Wizard will start with the phases in starting_map.mtz, calculate the same map that it would normally calculate given these data, and then write the map out and stop.

Calculate a prime-and-switch map

phenix.autobuild data=data.mtz solvent_fraction=.6 \
   ps_in_rebuild=True model=coords.pdb maps_only=True

The output prime-and-switch map will be in the file prime_and_switch.mtz.

Possible Problems

General Limitations

You can include more than one type of chain in rebuilding by supplying one type of chains as ligands with input_lig_file_list and rebuilding another type:

chain_type=PROTEIN  # build only protein
input_lig_file_list=MyDNA.pdb  # just read in DNA coordinates and include in refinement

In this case only protein chains will be built, but the DNA coordinates in MyDNA.pdb will be included in all refinements and will be written out to the final coordinate file. You may wish to add the keyword:

keep_pdb_atoms=False  #keep the ligand atoms if model (pdb) and ligand overlap

which will tell AutoBuild that the ligand (DNA) atoms are to be kept if the model that is being built (protein) overlaps with it. (The default is to keep the model that is being built and to discard any ligand atoms that overlap).

This whole process is likely to require substantial editing of the PDB files by hand because when you build DNA, a lot of chains are going to be built into the protein region, and when you build protein, it is going to be accidentally built into the DNA.

The column names in your data file (for use with input_labels) may not be obvious. Here is how to find out what they will be. Do a quick dummy run like this with XXX as labels:

phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"

The Wizard will print out a list of available labels like this:

Sorry, the label XXX does not exist as an amplitude array in
the input_data_file ImportRawData_run_8_/w2_PHX.mtz
...available labels are: ['w2', 'SIGw2', 'None']

Then you know that the correct command is:

phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"
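
The label list can also be pulled out of the Wizard output automatically. The sketch below is a minimal POSIX shell filter with a hypothetical name (list_labels). It assumes the message has the form shown above, with the labels in a bracketed, quoted, comma-separated list on the "available labels are:" line.

```shell
#!/bin/sh
# Hypothetical helper (not part of PHENIX): read Wizard output on stdin and
# print the labels from the "available labels are: [...]" line, space-separated.
list_labels() {
  # Keep only the text inside the brackets, then strip quotes and commas.
  sed -n "s/.*available labels are: *\[\(.*\)\].*/\1/p" | tr -d "',"
}
```

With the message shown above as input, this prints the labels w2, SIGw2 and None on one line. The output of the dummy run could be piped into list_labels, assuming the message is written to standard output.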

Specific limitations and problems

Literature

Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read, and P.D. Adams. Acta Cryst. D64, 61-69 (2008).

Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, P.D. Adams, N.W. Moriarty, P.H. Zwart, R.J. Read, D. Turk, and L.-W. Hung. Acta Cryst. D63, 597-610 (2007).

Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 1174-82 (2003).

Using prime-and-switch phasing to reduce model bias in molecular replacement. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 60, 2144-9 (2004).

Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 58, 2213-5 (2002).

Maximum-likelihood density modification. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 56, 965-72 (2000).

Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 58, 2082-6 (2002).

Statistical density modification using local pattern matching. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 1688-701 (2003).

Maximum-likelihood density modification using pattern recognition of structural motifs. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 57, 1755-62 (2001).

Map-likelihood phasing. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 57, 1763-75 (2001).

Automated side-chain model building and sequence assignment by template matching. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 45-9 (2003).

Automated main-chain model building by template matching and iterative fragment extension. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 38-44 (2003).

List of all available keywords