Fixing register errors in a model with fix_insertions_deletions
Author(s)
- fix_insertions_deletions: Tom Terwilliger
Purpose
The routine fix_insertions_deletions is a tool for fixing register errors
in a model by comparing the density at side-chain positions to a
sequence file.
Usage
Normally you will access the functionality of fix_insertions_deletions
by running the Phenix map_to_model tool in the Phenix GUI.
However, you can also run it directly from the command line (there is no
GUI for fix_insertions_deletions).
How fix_insertions_deletions works:
The fix_insertions_deletions tool examines the density in the supplied map
at the position of each side chain in the supplied model and creates a table
of side-chain probabilities corresponding to each segment in the model.
These side-chain probabilities are used to generate a map-based sequence for
the model and map. This map-based sequence is then compared to the actual
sequence to identify positions where the sequence register is likely to
be incorrect, and what changes in register are needed to fix it.
Each position where a register shift is needed is used as a target for
main-chain rebuilding. During rebuilding the specified insertion or deletion
is enforced so that, if possible, only models with the desired changes are
obtained.
Additional rebuilding of the worst-fitting regions is also carried out.
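The idea of comparing a map-based sequence with the actual sequence can be illustrated with a toy sketch. This is not the Phenix implementation: the per-residue probability tables, the function names (window_score, best_offset, find_register_shift), the window size, and the scoring by summed log-probability are all assumptions made for illustration. For one segment, the sketch finds the sequence offset that best explains the start of the segment and the offset that best explains its end; if the two differ, a register shift lies somewhere in between.

```python
import math

FLOOR = 1e-3  # assumed probability for residues absent from a table


def window_score(probs, sequence, offset):
    """Sum of log-probabilities that position i holds sequence[offset + i]."""
    return sum(
        math.log(table.get(sequence[offset + i], FLOOR))
        for i, table in enumerate(probs)
    )


def best_offset(probs, sequence):
    """Return the offset into `sequence` that best explains this window."""
    n_offsets = len(sequence) - len(probs) + 1
    return max(range(n_offsets), key=lambda off: window_score(probs, sequence, off))


def find_register_shift(probs, sequence, window=4):
    """Compare the register at the start and at the end of a segment.

    Returns 0 if both ends agree, a positive number if residues appear to be
    missing from the model (insertion needed), and a negative number if the
    model appears to contain extra residues (deletion needed).
    """
    start = best_offset(probs[:window], sequence)
    end = best_offset(probs[-window:], sequence) - (len(probs) - window)
    return end - start


# Toy data: the true sequence, and a model whose side-chain density
# matches the sequence except that residue G was skipped during building.
sequence = "ACDEFGHIKLMN"
observed = "ACDEFHIKLM"
model_probs = [{residue: 0.9} for residue in observed]

shift = find_register_shift(model_probs, sequence)
print(shift)  # 1: one residue needs to be inserted
```

In this toy case the start of the segment matches the sequence at offset 0 and the end matches at offset 1, so one residue must be inserted between them. The real tool additionally scores against random sequences and rebuilds the main chain to apply the change.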
Using fix_insertions_deletions:
The tool fix_insertions_deletions is usually run automatically as part of
trace_and_build. However, you can run it yourself to try to fix a model.
Input map file: The map file should cover the model you supply.
Resolution: Specify the resolution of your map (usually the
resolution defined by your half-dataset Fourier shell correlation).
Model: Supply a model that you want to fix. Only the main chain will matter.
Sequence: Supply a sequence file that covers at least the part of the
model that is supplied.
Examples
Standard run of fix_insertions_deletions:
You can use fix_insertions_deletions to fix register shifts in
a model based on a cryo-EM map:
phenix.fix_insertions_deletions my_map.mrc resolution=3 my_model.pdb seq.dat
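If you want to script runs with additional keywords, one option is to assemble the command line in Python. The sketch below is a convenience wrapper and not part of Phenix: the helper name build_command is hypothetical, and the keyword values shown (nproc=4, quick=False, pdb_out=fixed_model.pdb) are illustrative choices using names from the keyword list on this page. The code only builds and prints the command; actually running it requires a Phenix installation on your PATH.

```python
import shlex


def build_command(map_file, model_file, seq_file, resolution, **keywords):
    """Assemble a phenix.fix_insertions_deletions call as an argument list."""
    args = [
        "phenix.fix_insertions_deletions",
        map_file,
        f"resolution={resolution}",
        model_file,
        seq_file,
    ]
    # Extra keywords are appended as keyword=value, in sorted order so the
    # resulting command is reproducible.
    for name, value in sorted(keywords.items()):
        args.append(f"{name}={value}")
    return args


cmd = build_command(
    "my_map.mrc", "my_model.pdb", "seq.dat", 3,
    nproc=4, quick=False, pdb_out="fixed_model.pdb",
)
print(shlex.join(cmd))
# To execute (requires Phenix): subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems if file names contain spaces.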
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all available keywords
- job_title = None Job title in PHENIX GUI, not used on command line
- input_files
- map_file = None File with CCP4-style map. May have origin in any location.
- model_file = None Input PDB file with chains to be adjusted.
- seq_file = None Optional sequence file
- placement_pickle_file = None Read placements from this file
- output_files
- target_output_format = *None pdb mmcif Desired output format (if possible). Choices are None (try to use input format), pdb, mmcif. If output model does not fit in pdb format, mmcif will be used. Default is pdb.
- pdb_out = fix_insertions_deletions.pdb Output rebuilt model
- output_placement_pickle_file = None Write placements to this file
- temp_dir = None Temporary directory. Default is fix_insertions_deletions_xx, where xx is chosen so that a new directory is created
- crystal_info
- resolution = None High-resolution limit for map analysis.
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- chain_type = *PROTEIN Chain type. Must be PROTEIN
- solvent_content = None Solvent fraction of the cell. If this is density cut out from a bigger cell, you can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- use_mask_if_present = True If map is masked, use the mask as solvent content
- sequence = None Sequences
- origin_cart = None Origin (cartesian coordinates, overrides value based on map)
- strategy
- first_cycle_for_fixing = 1 First macro-cycle where insertions/deletions should be tested
- loop_method = *trace_chain *extend_only *split_loop *rebuild Method for loop building. None means try everything. trace_chain finds CA positions to trace the chain. extend_only uses resolve model-building to build the loop. split_loop cuts the loop in the middle and refines. rebuild rebuilds loops
- mask_secondary_structure_in_split_loop = True Mask out secondary structure in split loop
- residues_to_skip_on_ends = -2 Skip residues_to_skip_on_ends at ends of secondary structure in masking. Negative means keep that many residues off ends of secondary structure
- rebuilding
- fix_insertions_deletions = False Fix insertions and deletions using sequence_from_map to adjust alignment to match sequence that is supplied. NOTE: Use split_with_sequence instead, which works better.
- restrain_ends_to_original = None Restrain this many residues at each end to original
- fix_insertions_deletions_only = None Only fix clear insertions and deletions
- take_all_insertions_deletions = None Accept insertion/deletion fixes without scoring (default quick is True)
- ratio_for_sequence_register = 1.0 Accept fixes for insertions/deletions if score is above ratio_for_sequence_register times previous score
- try_as_is = None Try rebuilding without insertions/deletions
- try_insertions = None Try insertions
- try_deletions = None Try deletions
- refine = None Refine models at start of procedure. Default is True unless quick is set.
- refine_cycles = 1 Refinement cycles (except final cycles)
- refine_b = None Refine B-values
- good_enough_cc = None If all residues have this CC, don't bother rebuilding them. Default is 0.7 (0.6 if quick=True)
- rebuild_length_worst = 15 Longest rebuild length to try
- max_insert_or_delete = 1 Maximum residues to insert or delete in one rebuild stage
- average_rebuild_length = True Choose rebuild length as average of minimum and optimal if optimal is longer than rebuild_length_worst. Alternative is use minimum if optimal is too long.
- time_per_residue = 1 How long to try in fitting loops (sec/residue)
- max_rebuild_cycles = None Maximum rebuilding cycles per macro cycle. Default is 20 (1 if quick is set and sequence is present)
- macro_cycles = None Macro cycles of rebuilding and refinement. Default is 4 (1 if quick is set)
- start_rebuild = None Starting residue to rebuild. If specified, this is all that is done
- end_rebuild = None Ending residue to rebuild. If specified, this is all that is done
- rebuild_segment = 1 Segment to rebuild from start_rebuild to end_rebuild. None means rebuild all segments from start_rebuild to end_rebuild.
- minimum_contact_distance = 3 Minimum distance between CA atoms not immediately connected
- split_with_sequence = False Use sequence assignment to identify sequence register errors. Runs sequence_from_map to split and fix assignment and then fit_all_loops to fill in gaps.
- keep_connectivity_in_split_with_sequence = True Keep connectivity in split_with_sequence
- first_cycle_for_split = 2 First macro-cycle where splitting should be tested
- split_input_model = True Input model will be split into segments
- weights
- weight_vdw = 10 weight on very close CA-CA contacts
- weight_ca_ca_dist = 1. weight on CA-CA distance
- weight_proximity_to_known_position = 0.5 weight on proximity to well-identified CA position
- weight_density = 1.0 weight on density at mid-points between CA atoms
- weight_cc_mask_score = 1.0 Weight on map correlation in evaluating chain direction
- weight_seq_score = 1.0 Weight on sequence-map match in evaluating chain direction
- weight_x_gly_score = 1.0 Weight on excess Gly and X residues in evaluating chain direction
- sequencing
- random_sequences = 100 Number of random sequences of each length to use as baseline
- positive_gap_penalty = 1 Penalty for missing residues is positive_gap_penalty * gap**2
- negative_gap_penalty = 2 Gap penalty for extra residues is negative_gap_penalty * gap**2
- max_gap_length = 2 Maximum gap length for adjacent alignments
- minimum_alignment_length = 5 Minimum length of an alignment
- score_by_residue_groups = None Use residue groups in sequence alignment and listing of optimal sequences. Default is True unless no sequence is supplied.
- trace_chain
- helices_strands_cc_min = 0.5 Minimum map CC for helices/strands
- n_random_frag = 100
- control
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- random_seed = 171731 Random seed
- verbose = False Verbose output
- skip_temp_dir = True Skip temp_dir when scoring
- quick = True Quick run
- superquick = False Very quick run
- ignore_symmetry_conflicts = False You can ignore the symmetry information (CRYST1) from coordinate files. This may be necessary if your model has been placed in a box with box_map for example.
- max_dirs = 1000 Maximum number of directories (fix_insertions_deletions_xxxx)
- resolve_size = None Size of resolve to use.
- coarse_grid = None Use a coarse grid in RESOLVE (saves on memory)
- em_side_density = False Use EM side chain density. Alternative is to use standard x-ray side chain density in sequence templates.
- gui GUI-specific parameter required for output directory
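The gap penalties described for the sequencing keywords positive_gap_penalty and negative_gap_penalty can be written out explicitly. A minimal sketch, assuming that a positive gap means residues missing from the model and a negative gap means extra residues in the model (a sign convention inferred from the keyword names, not confirmed by this page); the function name gap_penalty is hypothetical, and the defaults are the ones listed above.

```python
def gap_penalty(gap, positive_gap_penalty=1.0, negative_gap_penalty=2.0):
    """Quadratic gap penalty: weight * gap**2, with a weight chosen by sign.

    gap > 0: residues missing from the model (assumed convention).
    gap < 0: extra residues in the model (assumed convention).
    """
    if gap > 0:
        return positive_gap_penalty * gap ** 2
    if gap < 0:
        return negative_gap_penalty * gap ** 2
    return 0.0


# With the default weights, a gap of two extra residues costs twice as much
# as a gap of two missing residues.
for gap in (-2, -1, 0, 1, 2):
    print(gap, gap_penalty(gap))
```

Because the penalty grows with the square of the gap, long gaps are penalized much more heavily than short ones, which is consistent with the small default for max_gap_length.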