Docking a model into a cryo-EM map with dock_in_map
Author(s)
- dock_in_map: Tom Terwilliger
Purpose
The routine dock_in_map will automatically dock a model or models into a map.
Usage
How dock_in_map works:
Dock-in-map uses both SSM and convolution-based shape searches to find a
part of a map that is similar to a model. The key elements of the search are:
- An SSM search is carried out first (protein only). This search is
  identical to the ssm_match_to_map option in superpose_models. Helices
  and strands are found in the map to create a target model, then a
  brute-force superposition of helices and strands in the search model to
  those in the target model is carried out. Potential superpositions are
  evaluated by map-model correlation.
- If the SSM search fails to find a satisfactory superposition, an initial
  search at lower resolution that focuses on the overall shape of the
  molecule is carried out. This allows a quick search but is less
  effective if the map contains more than one molecule. If this fails, the
  next methods are tried.
- Initial search without rotation. This allows a fast search that can place
  a molecule that is just shifted by a translation.
- Optional initial search based on matching moments of inertia of model and
  map. This is also a fast search but requires an accurate map that closely
  resembles the model and does not have extra density.
- Full search at the full resolution of the map. This search can be run on
  multiple processors to speed it up.
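The staged strategy described above can be pictured as a simple fallback loop: each stage proposes a placement, the placement is scored by a map-model correlation, and the first stage that reaches a cutoff wins. The following is a minimal sketch of that idea only, not the dock_in_map implementation. The function names, the stage labels and the flattened lists of density values are hypothetical stand-ins, and the cutoff of 0.4 simply mirrors the default value of min_cc.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equally sized lists of density values."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat map carries no shape information
    return cov / sqrt(var_a * var_b)


def staged_search(stages, target_density, min_cc=0.4):
    """Try each (name, propose) stage in order.

    Stops at the first stage whose proposal reaches min_cc and returns
    (stage name, cc) for the best-scoring proposal seen so far.
    """
    best_name, best_cc = None, -1.0
    for name, propose in stages:
        model_density = propose()  # hypothetical: density of the placed model
        cc = correlation(model_density, target_density)
        if cc > best_cc:
            best_name, best_cc = name, cc
        if cc >= min_cc:
            break  # good enough, skip the slower stages
    return best_name, best_cc


# Toy data: the first stage gives a poor fit, the second a good one.
target = [0.0, 1.0, 2.0, 1.0, 0.0]
stages = [
    ("ssm", lambda: [1.0, 0.0, 0.0, 0.0, 1.0]),
    ("low_res", lambda: [0.1, 0.9, 2.1, 1.1, 0.0]),
    ("full", lambda: list(target)),
]
name, cc = staged_search(stages, target)
print(name, round(cc, 3))
```

In this toy run the "full" stage is never reached, which is the point of ordering the stages from fastest to slowest.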
Uses
You can use dock_in_map to place any number of copies of any number of unique
molecules. You specify the molecules one by one:
search_model=model1.pdb
search_model=model2.pdb
search_model=model3.pdb
and the number of copies all at once with:
search_model_copies="3 1 2"
to look for 3 copies of model1.pdb, one of model2.pdb and two of model3.pdb.
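If you generate commands from a script, the pairing of models and copy numbers can be kept together and expanded into arguments at the end. The helper below is a hypothetical convenience function, not part of Phenix; it only assembles the search_model and search_model_copies keywords in the space-separated form shown above. The quotes around the copy numbers are intended for pasting into a shell command line.

```python
def search_model_args(models):
    """Build dock_in_map arguments from (filename, copies) pairs.

    Returns one search_model=... entry per model, in order, followed by a
    single search_model_copies="..." entry listing the copies in the same order.
    """
    if not models:
        raise ValueError("at least one search model is required")
    for filename, copies in models:
        if copies < 1:
            raise ValueError("copies must be at least 1 for %s" % filename)
    args = ["search_model=%s" % filename for filename, _ in models]
    copies_text = " ".join(str(copies) for _, copies in models)
    args.append('search_model_copies="%s"' % copies_text)
    return args


models = [("model1.pdb", 3), ("model2.pdb", 1), ("model3.pdb", 2)]
args = search_model_args(models)
print(" ".join(args))
```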
You can also use dock_in_map to superimpose density from one map on another
map. You just specify the second map with the density:
search_map_file=map_file.ccp4
and dock_in_map will find the density in this map, convert it to a pseudo-model
that corresponds to this density, and then search for the pseudo-model just as
any other model. When dock_in_map is done finding the pseudo-model, it applies
the transformation that it found to the original density in your search_map_file
to create a new map that superimposes on the target map.
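The transformation found in docking is a rigid-body operation, a rotation followed by a translation, and the same operation is then applied to whatever is being moved. The sketch below applies such an operation to a list of Cartesian coordinates. The function name and the example rotation are illustrative assumptions, and the resampling of density onto a new grid that a map (rather than a set of points) requires is not shown.

```python
def apply_rigid_body(rotation, translation, points):
    """Return R*x + t for each point x.

    rotation is a 3x3 matrix given as nested lists, translation is a
    3-vector, and points is a list of (x, y, z) tuples.
    """
    moved = []
    for x in points:
        moved.append(tuple(
            sum(rotation[i][j] * x[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved


# Example: rotate 90 degrees about z, then shift by 10 along x.
rot_z_90 = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
shift = (10.0, 0.0, 0.0)
moved = apply_rigid_body(rot_z_90, shift, [(1.0, 0.0, 0.0), (0.0, 2.0, 5.0)])
print(moved)
```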
Examples
Standard run of dock_in_map:
Running dock_in_map is easy. From the command line you can type:
phenix.dock_in_map 1ss8_A.pdb emd_8750.map resolution=4 nproc=4 \
pdb_out=placed_model_from_emd_8750.pdb
where 1ss8_A.pdb is the model you would like to place and emd_8750.map is
a map in CCP4, MRC or a related format. You also specify the nominal
resolution of the map (resolution=4) and the number of processors to use
(nproc=4).
For docking density into a map:
phenix.dock_in_map emd_8750.map density.ccp4 resolution=4 \
superposed_map_file=density_superposed.ccp4
Possible Problems
If your map has pseudo-symmetry (like a proteasome) you might need to box
one subunit or try ssm_search=False to use a more thorough search in docking.
Specific limitations and problems:
Literature
Additional information
List of all available keywords
- job_title = None Job title in PHENIX GUI, not used on command line
- input_files
- map_file = None File with CCP4-style map. May have origin in any location.
- seq_file = None Optional sequence file
- search_model = None Input PDB file(s) to be placed in the map.
- search_map_file = None Search map file with CCP4-style map to be superimposed on the reference map. Map will be converted to a pseudo-model that corresponds to density in map. At completion of dock_in_map the search map file will be transformed to match the target map and written out.
- search_model_copies = None You can specify how many copies for each search model by supplying one value of search_model_copies for each of your search models. Put them in all at once as in search_model_copies="1,3,2".
- search_map_copies = None You can specify how many copies for each search map by supplying one value of search_map_copies for each of your search maps. Put them all in at once as in search_map_copies="3,4,5".
- fixed_model = None Fixed model marking region not to be considered. Fixed model will be added to placed search model to create new fixed model if append_to_fixed_model is set and multiple searches are done.
- target_model = None Model (typically helices-strands model) already docked into map and used as a marker.
- symmetry_file = None Symmetry file (.ncs_spec format or MATRX records) with reconstruction symmetry. Used in identification of unique part of map. Required if allow_symmetry is set and more than one copy is requested. NOTE: symmetry file applies to map file in original position
- output_files
- pdb_out = placed_model.pdb Search PDB file, transformed to superpose on the target map file.
- superposed_map_file = superposed_map.ccp4 Search map file (if just one), transformed to superpose on the target map file.
- skip_fixed_model_in_pdb_out = True Skip fixed model in output PDB file. Just write the new part.
- temp_dir = None Temporary directory.
- crystal_info
- resolution = None High-resolution limit for main search. This can be lower resolution than the data. The search is quicker at lower resolution. If your model is poor, try 2-3 A lower resolution than your data (e.g., if your data is 2.5 A, try 5 A).
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- sequence = None Sequence as text string. Normally supply a sequence file instead
- chain_type = *PROTEIN DNA RNA Chain type
- wrapping = None You can specify whether the map is wrapped (can map values outside bounds to inside with cell translations).
- search
- dock_chains_individually = False Dock each chain (identified by chain_id field) individually. This is useful if the chains might have different orientations in the map compared to the search model, such as in a search model that is a predicted model. Normally used along with create_unique_chain_at_end=True for predicted models to put the model back together.
- create_unique_chain_at_end = None Take all docked chains and try to create one chain that has no duplicate residue numbers and that has distances between ends of fragments consistent with the number of residues between them. If symmetry has been applied, create N copies representing that symmetry. Default is True if dock_chains_individually = True and False otherwise.
- minimum_cc_to_keep_domain = 0.2 Do not keep domains in create_unique_chain_at_end if cc is less than minimum_cc_to_keep_domain.
- weight_sequential_fragments_by_distance = None Try to dock a chain in a way that minimizes the distance between its first residue and the previously-docked residue with the highest residue number less than this, and similarly for the last residue and the next available placed residue. Default is True if create_unique_chain_at_end is set. Normally use only if you have a model with a single chain and you are docking pieces of that chain.
- choose_better_of_individual_and_group_docking = None Dock the entire search model (normally only one) and also dock it chain by chain, then pick the better-fitting of the two for each chain.
- low_res_search = True Try to fit by searching at low resolution first
- dock_with_mr = True Try using MR to dock first
- ssm_search = None Try to fit by searching for secondary structure. Default is False for dock_in_map and True for dock_and_rebuild if dock_with_mr is not used.
- refine_cycles = 3 Number of rigid-body refinement cycles
- ssm_search_min_cc = 0.30 Stop ssm search if this cc achieved
- backup_resolution_cutoff_searches = 2 If initial fitting does not work, try up to this many times increasing resolution by backup_resolution_ratio each time
- backup_resolution_ratio = 1.67 If initial fitting does not work, try up to backup_resolution_cutoff_searches times increasing the resolution by backup_resolution_ratio each time
- resolution_radius_scale = 0.5 Resolution for low-res search will be resolution_radius_scale times the radius of gyration of the search model.
- align_moments = False Try to fit by aligning moments of inertia if size of density region and molecule are similar
- max_radius_ratio = 2. Radius of gyration of molecule and density must be within max_radius_ratio of each other
- radius_scale = 1.5 Mask for density and atoms will be radius_scale times the radius of gyration of model
- use_symmetry = True If search_model_copies or search_map_copies are the same for each model and greater than one and allow_symmetry is set, use symmetry in the map to place all copies after the first one.
- skip_if_low_cc = True Skip solution before rigid-body refinement if map-model CC is less than half of min_cc.
- rigid_body_refinement = True Run rigid-body refinement on final model
- rigid_body_refinement_single_unit = True Run rigid-body refinement with just one unit (do not break up into chains)
- rigid_body_refinement_split_method = *chain_id segid When splitting up molecule for rigid-body refinement (if rigid_body_refinement_single_unit=False), use either chain_id or segid to split up molecule
- rigid_body_refinement_resolution = None Run rigid-body refinement at this resolution if specified
- append_to_fixed_model = True Append placed search model to fixed model (if any) after search
- min_cc = 0.4 If quick run, stop if minimum CC is achieved in local search. Also always skip if starting CC_mask is less than 1/4 min_cc.
- run_in_boxes = True Run on sub-boxes and combine at the end
- target_box_size = 60 Try to get boxes about this big on a side (grid units)
- target_boxes = None Try to get this many boxes. Default is nproc unless this makes box size much smaller than target_box_size
- box_to_run = None Run only this box
- box_overlap_scale = 1 box overlap (overlap of boxes) will be box_overlap_scale times the density radius
- edge_ratio = 10 Box edge will be box_overlap times edge_ratio
- density_radius = None Radius for density to be cut out and compared. Default is 6 times the resolution.
- model_radius = 3 Radius for removing density near fixed_model
- zero_value = 0 Value to set map in regions overlapping fixed model
- density_peaks = 20 Number of NCS-related peaks of density to check
- delta_phi = 20 Angular spacing of search
- max_rot = None Maximum rotations to try
- rotz_only = None Rotate only around Z
- single_positions_to_try = 10 Number of offset positions to try in optimizing orientation. Positions along the chain are selected as centers and a local fit near each position is carried out. The resulting offsets relative to the original placement are used to optimize the overall orientation and position.
- max_position_shift_frac = 0.05 Maximum fractional positional shift in single_positions run
- min_relative_cc = 0.67 Minimum local CC relative to original CC to keep a local search. This is a way to reject local searches that are completely wrong.
- sieve_fit = None Use sieve_fit fraction of single positions in fitting. If None, use all
- ncs_copies_max = None Maximum number of matching models to write. If more than one they will be written as MODEL records in the output PDB file. You can get them individually afterwards with phenix.pdbtools placed_model.pdb keep="model 1" etc.
- start_rot = None Three numbers rotx, rotz, rotx defining the starting rotation of the search model. Normally used along with delta_phi=1000 or max_rot=1 to generate exactly one defined rotation.
- search_center = None Optional coordinates in search model for centering search. Note this is different from target_search_center which is the location xyz in the map to look.
- search_center_selection = None Optional selection defining coordinates in search model for centering search.
- target_search_center = None Optional coordinates in reference map where search_center should be approximately located after superimposing maps. Used to eliminate possible superpositions that place the search center elsewhere. Overlap scores decreased based on distance to target_search_center/density_radius.
- model_search_position = None Optional coordinates (usually part of search model) for matching to target_search_position after transformation. These can be specified in addition to target_search_center. Can have multiple model_search_position and target_search_position pairs by specifying each multiple times in order. NOTE: Not compatible with map search
- target_search_position = None Target positions for model_search_position after transformation.
- search_position_radius = None Radius for comparison of target_search_position and transformed search_position values. Default is density_radius. If specified, must be a single value or the same number of values as entries in model_search_position and target_search_position
- rot_id_n = None Number of rotation groups. Along with rot_id_group, allows defining groups of rotations to be carried out in one run.
- rot_id_group = None rotation group to include. See rot_id_n.
- map_box = True Run map_box to extract useful part of map before search
- fix_search_position = False You can choose to not move your model center of mass to the origin by fixing the search position
- search_box_size = None You can choose the size of the search box for local searches. Normally set by default corresponding to 3 times density_radius
- lower_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
- upper_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
- keep_search_order = None Keep search order as input
- remove_water = False Remove waters and other hetero atoms from input files
- control
- nproc = 1 Number of processors to use
- random_seed = 171731 Random seed
- verbose = False Verbose output
- quick = True Try translation search first and stop if CC is at least min_cc
- gui GUI-specific parameter required for output directory
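Several of the defaults listed above are defined relative to other values: density_radius defaults to 6 times the resolution, search_box_size to 3 times density_radius, the low-resolution search uses resolution_radius_scale times the radius of gyration of the search model, and the backup searches scale the resolution by backup_resolution_ratio each time. The sketch below works these relationships through for one case. It is an illustration under stated assumptions, not Phenix code: the function names are hypothetical, the radius of gyration is computed without atomic weights, and "increasing the resolution" is assumed to mean multiplying the resolution value by the ratio.

```python
from math import sqrt


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) coordinates."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords
    ) / n
    return sqrt(mean_sq)


def derived_defaults(resolution, coords,
                     resolution_radius_scale=0.5,
                     backup_resolution_cutoff_searches=2,
                     backup_resolution_ratio=1.67):
    """Evaluate the defaults that depend on resolution or on model size."""
    density_radius = 6 * resolution
    return {
        "density_radius": density_radius,
        "search_box_size": 3 * density_radius,
        "low_res_search_resolution":
            resolution_radius_scale * radius_of_gyration(coords),
        # Assumption: each backup search multiplies the previous value.
        "backup_resolutions": [
            resolution * backup_resolution_ratio ** k
            for k in range(1, backup_resolution_cutoff_searches + 1)
        ],
    }


# Toy model: four points 10 units from the origin in the xy plane.
coords = [(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, -10.0, 0.0)]
defaults = derived_defaults(4.0, coords)
for key, value in defaults.items():
    print(key, value)
```

Setting any of these keywords explicitly on the command line replaces the corresponding derived value.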