Model completion by extraction of secondary structure and reassembly

Author(s)

model_completion: Tom Terwilliger

Purpose

Try to complete a model by extracting all the secondary structure elements (helices and strands), then rearranging and reconnecting them into a new model.

How Model Completion works:

Model completion is based on the idea that the helices and strands in a protein model are generally correct, while the loops connecting them may be inaccurate or may even connect the wrong helices and strands.

Model completion first identifies secondary structure in a model. Then it examines all the loops connecting these helices and strands in the model and keeps the best-fitting and most plausible loops intact. All the other loops are removed from the model, resulting in a set of fragments representing part of the structure. The sequence assignments and connectivities of these fragments are then removed so that any fragment can potentially be connected to any other fragment.
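The fragment-extraction step described above can be illustrated with a minimal Python sketch. This is not the Phenix implementation: the function names, the one-letter labels ('H', 'E', 'L'), the density values and the low-density cutoff are all hypothetical, and the loop test (fraction of loop residues in low density) is only one plausible criterion, loosely modelled on the max_fraction_verylow keyword.

```python
from itertools import groupby

def split_into_fragments(labels, keep_loop):
    """Split residue indices into fragments by removing rejected loops.

    labels: one character per residue, 'H' (helix), 'E' (strand) or 'L' (loop).
    keep_loop: hypothetical callback; takes the residue indices of one loop
               and returns True if that loop should stay in the model.
    """
    fragments, current, index = [], [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segment = list(range(index, index + length))
        index += length
        if label == "L" and not keep_loop(segment):
            # Rejected loop: close the current fragment and start a new one.
            if current:
                fragments.append(current)
            current = []
        else:
            current.extend(segment)
    if current:
        fragments.append(current)
    return fragments

# Illustrative per-residue density values (made up for this example).
labels = "HHHHLLEEEELLLLHHH"
density = [1.0] * 4 + [0.8, 0.6] + [1.0] * 4 + [0.1, 0.2, 0.1, 0.3] + [1.0] * 3

def keep_loop(segment, low_density=0.5, max_fraction_low=0.75):
    # Keep a loop unless too many of its residues sit in low density.
    n_low = sum(1 for i in segment if density[i] < low_density)
    return n_low / len(segment) <= max_fraction_low

fragments = split_into_fragments(labels, keep_loop)
print(fragments)
```

In this example the first loop is kept, so the helix and strand it joins stay in one fragment, while the second loop is dropped and the final helix becomes a separate fragment.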

The core step in model completion is the identification of all plausible connections between ends of fragments in this partial model. Connections that were present in the supplied model are considered in this step, along with potential connections found by tracing from one fragment to another through high density in the density map. Each possible connection is scored on a set of criteria that includes (1) the lowest density at or between main-chain atom positions, (2) the maximum difference in density between adjacent main-chain atom positions, and (3) the length of the connection.
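To make the three listed criteria concrete, here is a small Python sketch of a connection score. The linear form and the weights (w_min, w_step, w_length) are assumptions chosen for illustration; the actual scoring function and its additional criteria are not described in this document.

```python
def connection_score(densities, w_min=1.0, w_step=1.0, w_length=0.1):
    """Score one candidate connection (higher is better).

    densities: map density sampled at successive main-chain positions
               along the connection (hypothetical input format).
    The weights are illustrative assumptions, not Phenix defaults.
    """
    if len(densities) < 2:
        raise ValueError("need at least two sampled positions")
    lowest = min(densities)                       # criterion (1)
    largest_step = max(abs(b - a)                 # criterion (2)
                       for a, b in zip(densities, densities[1:]))
    length = len(densities)                       # criterion (3)
    return w_min * lowest - w_step * largest_step - w_length * length

# A short connection through even density versus a longer one with a gap.
good = connection_score([1.0, 0.9, 1.1, 1.0])
bad = connection_score([1.0, 0.2, 1.1, 1.0, 0.9, 1.0])
print(good > bad)
```

With these weights a connection is penalized for any dip in density, for abrupt changes between neighbouring positions, and for being long, so the first connection scores higher than the second.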

All possible arrangements of the available fragments and connections that use all the available fragments (in either forward or reverse directions) are then considered using an iterative connection algorithm. Pairs of fragments are listed, then all pairs of pairs (triples). Larger groupings are then assembled from the higher-scoring sets of smaller groupings (typically the top half are considered at each stage), until a set of potential complete chains is obtained. At this stage a chain is a CA-only model. All the CA-only models are then converted to full chains using PULCHRA (Rotkiewicz & Skolnick 2008), followed by adding side chains with phenix.sequence_from_map.
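The iterative assembly can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm as implemented: groupings are grown one fragment at a time at the chain end only, fragment reversal is ignored, the score of a chain is taken to be the sum of its connection scores, and the names assemble, link_score and keep_fraction are hypothetical.

```python
from math import ceil

def assemble(fragments, link_score, keep_fraction=0.5):
    """Build chains that use every fragment, pruning at each stage.

    fragments: list of fragment identifiers.
    link_score: dict mapping an ordered pair (a, b) to the score of the
                connection from the end of a to the start of b.
    keep_fraction: fraction of groupings carried to the next stage.
    """
    # Stage 1: every scored ordered pair is a grouping of size two.
    groups = [(pair, score) for pair, score in link_score.items()]
    for _ in range(3, len(fragments) + 1):
        groups.sort(key=lambda g: g[1], reverse=True)
        kept = groups[:max(1, ceil(len(groups) * keep_fraction))]
        extended = []
        for chain, score in kept:
            for nxt in fragments:
                step = link_score.get((chain[-1], nxt))
                if nxt in chain or step is None:
                    continue
                extended.append((chain + (nxt,), score + step))
        groups = extended
    return sorted(groups, key=lambda g: g[1], reverse=True)

fragments = ["A", "B", "C", "D"]
link_score = {("A", "B"): 2.0, ("B", "C"): 1.5, ("C", "D"): 1.0,
              ("A", "C"): 0.5, ("C", "B"): 0.4, ("B", "D"): 0.3}
chains = assemble(fragments, link_score)
print(chains)
```

Pruning to the top half at each stage keeps the number of groupings manageable, at the cost of possibly discarding an arrangement whose early connections score poorly.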

The potential complete chains obtained typically have many duplicate parts but differ in connectivity. Complete chains are scored in a manner similar to that used above for fragments, with an additional contribution to the score based on the match between the supplied sequence and side-chain density along a chain.

The highest-scoring complete chains are then optimized by recombination among these chains followed by real-space refinement. The optimized chains are rescored and the top resulting chains are provided.

Examples

Standard run of model_completion:

Running model_completion is easy. From the command line you can type:

phenix.model_completion my_model.pdb my_map.mrc resolution=3 nproc=8

This will attempt to complete my_model.pdb based on the map my_map.mrc at a resolution of 3 Å using 8 processors.
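If you prefer to launch the run from a script, the same command can be assembled in Python. The helper build_command below is hypothetical and simply reproduces the keyword=value form shown above; it does not check that a keyword exists, and nothing is executed unless the commented subprocess call is enabled.

```python
import shlex

def build_command(model, map_file, **keywords):
    """Return the argument list for a phenix.model_completion run.

    keywords are written as keyword=value, in the order given.
    """
    args = ["phenix.model_completion", model, map_file]
    args.extend(f"{name}={value}" for name, value in keywords.items())
    return args

command = build_command("my_model.pdb", "my_map.mrc", resolution=3, nproc=8)
print(shlex.join(command))

# To actually run it (requires Phenix to be installed and on the PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```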

Possible Problems

Specific limitations and problems:

Literature

Fast procedure for reconstruction of full-atom protein models from reduced representations. P. Rotkiewicz and J. Skolnick. J Comput Chem 29, 1460-5 (2008).

Additional information

List of all available keywords

job_title = None Job title in PHENIX GUI, not used on command line
input_files
- seq_file = None Sequence file
- map_model
  - full_map = None Input full map file
  - half_map = None Input half map files
  - model = None Input model file
output
- rebuilt_model = None Output file name
- overwrite = True Overwrite files with same names
model_completion
- refine_cycles = 3 Refinement cycles (set to zero to not refine)
- max_fraction_verylow = 0.75 Maximum fraction of residues in a loop in low density to keep
- allow_reverse = False Allow reverse direction of fragments
- minimum_score_tries = 1 Number of tries at model completion by lowering minimum connection score
- convincing_placement_score = 1 Z score of sequence alignment for a pair of segments in input model to keep the connection. Use 1 to keep most, -100 to keep everything, 3 to keep only very convincing junctions
crystal_info
- resolution = None Nominal resolution of map
- resolution_ratio = 1.0 Ratio of cutoff resolution to nominal resolution of map
- minimum_resolution = 2.5 If resolution is finer than minimum_resolution, use minimum_resolution. This speeds up model-building.
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- wrapping = None You can specify whether the map is wrapped (can map values outside bounds to inside with cell translations).
- sequence = None Sequences
control
- nproc = 1 Number of processors (if None, use all available)
- thoroughness = quick *medium thorough Thoroughness. Medium and thorough take longer than quick
- ignore_symmetry_conflicts = False You can ignore the symmetry information (CRYST1) from coordinate files. This may be necessary if your model has been placed in a box with box_map for example.
- read_map_and_model = False Read data from previous runs
- verbose = False Verbose output
gui GUI-specific parameter required for output directory
- output_dir = None
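As a worked example of the minimum_resolution keyword described above: a finer resolution is a smaller number in Å, so the limit amounts to taking the larger of the two values. The function name is hypothetical, and the sketch deliberately ignores resolution_ratio, whose interaction with this limit is not specified here.

```python
def working_resolution(resolution, minimum_resolution=2.5):
    """Return the resolution (in Å) used after applying minimum_resolution.

    A map finer than minimum_resolution is treated as if it were at
    minimum_resolution; coarser maps are left unchanged.
    """
    return max(resolution, minimum_resolution)

finer_map = working_resolution(1.8)    # finer than the 2.5 Å limit
coarser_map = working_resolution(3.0)  # coarser than the limit
print(finer_map, coarser_map)
```

A 1.8 Å map is therefore handled at 2.5 Å, while a 3 Å map is handled at 3 Å.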