Rapid model-building with trace_and_build

Author(s)

trace_and_build: Tom Terwilliger

Purpose

The routine trace_and_build is a tool for rapid protein model-building.

Usage

How trace_and_build works:

The trace_and_build tool traces the path of a polypeptide chain by working from high to low density and following the highest density path that does not yield branching. It then finds CB positions and builds an atomic model.

Using trace_and_build:

Normally you will access the functionality of trace_and_build by running the Phenix map_to_model tool in the Phenix GUI. However, you can also run it directly from the command line (there is no GUI for trace_and_build). Usually you will run trace_and_build on the unique part of a cryo-EM map. Your main options are the resolution, whether to try a quick run (quick=True) or a more thorough run (quick=False), and how many segments to try to build.

Input map file: Usually you should supply trace_and_build with a sharpened map that represents the unique part of your structure. You can use phenix.map_sharpening to sharpen your map. If your map has symmetry you can use phenix.map_box with extract_unique=true to extract this part of the map.

Resolution: Specify the resolution of your map (usually the resolution defined by your half-dataset Fourier shell correlation).

Sequence file: Supply a sequence file with the sequence or sequences of the molecule(s) to be built. If there is more than one sequence, they are simply joined together at this point.

Segments to build: You can choose to build just one or a few of the longest segments that trace_and_build can find (for example max_segments=3), or everything (max_segments=None).

Quick run: You can try to run quickly (quick=True) or more thoroughly (quick=False). One difference is that with quick=True, the number of segments to build is normally limited to the number of residues in your sequence file divided by 50, while with quick=False, it is unlimited. The other is that with quick=False, after building the model an attempt is made to fix insertions and deletions (fix_insertions_deletions=True). You can set these parameters separately as well.
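The quick-run limit on segments described above can be sketched as follows. This is an illustration only: the helper name quick_segment_limit, the use of floor division, and the minimum of one segment are assumptions, not taken from the trace_and_build source.

```python
def quick_segment_limit(n_residues, quick=True, residues_per_segment=50):
    """Return the assumed cap on segments to build, or None for no cap."""
    if not quick:
        return None  # quick=False: the number of segments is unlimited
    # Assumption: round down, but always allow at least one segment.
    return max(1, n_residues // residues_per_segment)


print(quick_segment_limit(420))               # 8
print(quick_segment_limit(420, quick=False))  # None
```

If you want a different limit, set max_segments explicitly rather than relying on the quick setting.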

Input model: You can supply an input fixed model and trace_and_build will use it as a potential interpretation of the tracing of part of the map. It will not be used as is, rather CA positions will be extracted and used in later interpretation steps. This fixed model will take the place of the find_helices_strands step that is otherwise carried out.

Procedure used by trace_and_build

The procedure used by trace_and_build has several steps:

If no model is supplied, an initial fixed model is created by searching the map for regular secondary structure with the tool find_helices_strands. Optionally, both directions of each segment of the fixed model can be kept (allow_reverse=True).

The core of trace_and_build is to find, extend, and connect segments of high density in the map. The way this is done is a lot like the way a person would examine the path of a chain in a map, starting from a region of clear (high) density, following the chain until it ends, then lowering the contour level until a path is visible and following that one.

Initial segments of density are identified using the model that is supplied or found with find_helices_strands. New segments are identified from extended regions of high density, working down from high to low density in the map. Connections and extensions are also made working down from high to low density. Chain tracings are only kept if they do not have branching.
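The idea of working down from high to low density and keeping only unbranched tracings can be illustrated with a toy example. This is a minimal sketch, not the trace_and_build algorithm: points are nodes in a small hand-made graph, the function names are hypothetical, and "unbranched" is simplified to "no point has more than two neighbors in the region" (so closed rings are not detected).

```python
def components(nodes, neighbors):
    """Connected components of the subgraph induced by the set `nodes`."""
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(m for m in neighbors[n] if m in nodes and m not in comp)
        seen |= comp
        result.append(comp)
    return result


def is_unbranched(comp, neighbors):
    """True if no point has more than two neighbors inside the component."""
    return all(sum(m in comp for m in neighbors[n]) <= 2 for n in comp)


def trace(density, neighbors, thresholds):
    """Lower the threshold step by step; keep the largest unbranched regions."""
    kept = []
    for t in sorted(thresholds, reverse=True):
        above = {n for n, d in density.items() if d >= t}
        for comp in components(above, neighbors):
            if is_unbranched(comp, neighbors):
                # An unbranched region replaces any smaller kept region inside it.
                kept = [s for s in kept if not s <= comp] + [comp]
    return kept


# Chain a-b-c-d with a weak side branch e attached to b.
density = {"a": 5, "b": 4, "c": 4, "d": 3, "e": 1}
neighbors = {"a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"], "d": ["c"], "e": ["b"]}
segments = trace(density, neighbors, thresholds=[4, 3, 1])
print([sorted(s) for s in segments])  # [['a', 'b', 'c', 'd']]
```

At the lowest threshold the side branch e joins the region, which makes it branched, so the tracing found at the previous threshold is the one that is kept.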

Once the path(s) of the polypeptide chain are identified, likely positions of CB atoms are identified from the presence of side-chain density along the traced path. The positions of CA atoms are then guessed and refined with the tool phenix.refine_ca_model which adjusts the number of CA atoms and their positions to match the likely CB positions, the chain tracing, and expected CA-CA distances.
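As a rough illustration of the CA-CA distance restraint, the sketch below places trial CA positions along a traced path at intervals of ca_ca_distance (3.8 A by default). It is only a first-guess spacing along the path length under stated assumptions. The function name is hypothetical, and the adjustment against likely CB positions that phenix.refine_ca_model performs is not modeled here.

```python
import math


def place_ca_along_path(path, ca_ca_distance=3.8):
    """Walk a polyline of (x, y, z) points and drop a trial CA position
    every ca_ca_distance Angstroms of path length, starting at the first point."""
    placed = [path[0]]
    needed = ca_ca_distance  # path length still to cover before the next CA
    for p, q in zip(path, path[1:]):
        seg = math.dist(p, q)
        pos = 0.0  # distance already consumed along this segment
        while seg - pos >= needed:
            pos += needed
            f = pos / seg
            placed.append(tuple(a + f * (b - a) for a, b in zip(p, q)))
            needed = ca_ca_distance
        needed -= seg - pos
    return placed


# A straight 12 A path along x gives trial CA positions at x = 0, 3.8, 7.6 and 11.4.
cas = place_ca_along_path([(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)])
print([round(p[0], 1) for p in cas])  # [0.0, 3.8, 7.6, 11.4]
```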

Once a CA-only model is created, an attempt is made to correct the model using CA positions in the fixed model (input directly or from find_helices_strands). In this step the CA positions in segments in the fixed model are matched with those of the CA-only model, and if they overlap, the CA positions from the fixed model are used.

An all-atom model is generated from each CA-only model using Pulchra. If allow_reverse is set, then each possible direction of each segment is considered. Each of these possibilities is refined and scored based on map-model correlation (CC), agreement of side-chain density with the sequence, and H-bonding in the model.

The highest-scoring model is written out so that it superimposes on the input map.

Examples

Standard run of trace_and_build:

You can use trace_and_build to build a model based on a cryo-EM map:

phenix.trace_and_build my_map.mrc resolution=2.8 my_seq.dat
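If you script your runs, the same command line can be assembled from Python. The sketch below builds a more thorough run limited to the three longest segments. The file names are the placeholders used above, the keywords quick and max_segments are those described in this document, and actually running the command requires a Phenix installation on your PATH.

```python
import shlex

# Arguments for a thorough run (quick=False) keeping at most 3 segments.
args = [
    "phenix.trace_and_build",
    "my_map.mrc",
    "resolution=2.8",
    "my_seq.dat",
    "quick=False",
    "max_segments=3",
]
command = shlex.join(args)
print(command)

# To execute (only where Phenix is installed):
# import subprocess
# subprocess.run(args, check=True)
```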

Using trace_and_build to evaluate forward and reverse versions of a model:

You can use trace_and_build to work on one or more segments, checking both directions:

phenix.trace_and_build my_map.mrc resolution=2.8 my_seq.dat my_fragment.pdb \
find_chains=False extend_chains=False connect_chains=False allow_reverse=True

This will read in your fragment(s), create forward and reverse versions, score them both, and try to build the better one (but without extending it or building any new model). You can set verbose=True to see more details of the scoring if you like. If one direction is clearly better than the other, only it will be kept. If you want to keep both, set the parameter convincing_delta_score to a big number or None (take everything).
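The rule for keeping one or both directions can be sketched as below. This is a minimal illustration, assuming the decision is a simple comparison of the score difference with convincing_delta_score. The function name and the example scores are hypothetical, and the actual scoring combines several terms that are not modeled here.

```python
def directions_to_keep(score_forward, score_reverse, convincing_delta_score=10.0):
    """Return the directions that survive the forward/reverse comparison."""
    if convincing_delta_score is None:
        return ["forward", "reverse"]  # None: take everything
    delta = score_forward - score_reverse
    if delta >= convincing_delta_score:
        return ["forward"]  # forward is convincingly better
    if -delta >= convincing_delta_score:
        return ["reverse"]  # reverse is convincingly better
    return ["forward", "reverse"]  # difference not convincing: keep both


print(directions_to_keep(42.0, 25.0))        # ['forward']
print(directions_to_keep(42.0, 38.0))        # ['forward', 'reverse']
print(directions_to_keep(42.0, 25.0, None))  # ['forward', 'reverse']
```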

Possible Problems

Specific limitations and problems:

Literature

Fast procedure for reconstruction of full-atom protein models from reduced representations. P. Rotkiewicz and J. Skolnick. J Comput Chem 29, 1460-5 (2008).

Automated side-chain model building and sequence assignment by template matching. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 45-9 (2003).

Rapid model building of alpha-helices in electron-density maps. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 66, 268-75 (2010).

Rapid model building of beta-sheets in electron-density maps. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 66, 276-84 (2010).

Additional information

List of all available keywords

job_title = None Job title in PHENIX GUI, not used on command line
input_files
- map_file = None File with CCP4-style map. May have origin in any location.
- model_file = None Input PDB file with fragments to be considered as starting points for building. Used in the same way as the model generated with find_helices_strands is used. If allow_reverse is set, try both directions of each fragment. If connect_chains is set, try connecting pairs of fragments. If extend_chains is set, try extending each end of each fragment.
- seq_file = None Sequence file (Fasta format or sequences separated by blank lines)
- marker_model_file = None Input PDB file with marker atoms from trace_chain
- trace_model_file = None Input PDB file with dummy CA atoms marking path of chain at about 0.5-1 A intervals
- input_scoring_file = None Input .pkl file with scoring information (can have more than one)
- fixed_model = None same as model_file.
- placement_pickle_file = None Read placements from this file
output_files
- pdb_out = trace_and_build.pdb Model built with trace_and_build
- output_scoring_file = None Output .pkl file with scoring information
- output_model_file = None Output PDB file with possible CA/CB positions
- output_marker_model_file = None Output PDB file with marker atoms from trace_chain
- output_placement_pickle_file = None Write placements to this file
- temp_dir = None Temporary directory. Default is trace_and_build_xx where xx creates new directory
crystal_info
- resolution = None High-resolution limit for map analysis.
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- chain_type = *PROTEIN Chain type. Must be PROTEIN
- ca_ca_distance = 3.8 CA-CA distance
- solvent_content = None Solvent fraction of the cell. If this is density cut out from a bigger cell, you can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- use_mask_if_present = True If map is masked, use the mask as solvent content
- sequence = None Sequences
- wrapping = False For cryo-EM maps, wrapping should be off
- origin_cart = None Origin (cartesian coordinates, overrides value based on map)
strategy
- get_path_length_only = False Just find adjusted path length and return it
- find_chains = True Try to create new chains in build_all_loops
- extend_chains = True Try to extend chains in build_all_loops
- connect_chains = True Try to connect chains in build_all_loops
- length_variants_to_try = 5 You can try several lengths of fragments (number of CA). Not compatible with allow_reverse=False
- allow_reverse = True If two chains are opposite direction but connected, reverse the one with lower score (usually shorter) and connect them. Also consider both directions of fragments from model_file.
- morph_ca_only = False Just morph chain and return trace through them
- trace_outside_model = False Trace outside model. After initial tracing, mask out region found and look for additional segments
- correct_segments = True Correct segments. Try to fix errors using fixed model as a template
- chains_from_fixed_model = True Use fixed model to mark initial chain positions.
- rerun_with_avail_seq = False Rerun sequence alignment with parts of the sequence already used removed.
- mask_side_chains = False Mask side chains in existing model in build_all_loops
- same_chain_rmsd_max = 1.0 RMSD between two CA models of same length to consider them the same
- local_trace = True Run trace in local regions
- split_and_join = True Split fragments at low density and then rejoin. Criterion is sd_ratio_pare_model
- test_threshold_fixed_segments = False Test thresholds when creating fixed segments
- require_ends_in_region = False Add ends of chain to region if necessary
- try_alternate_joins = False try joining each end of pair of fragments, not just the first
- skip_path_if_missing = True Skip path if points are missing
- split_at_low_density = True Split at low density points in split_and_join
- rejoin_split_fragments = True Specifically rejoin the split fragments leaving out poor junctions
- trim_ends_and_join = False Trim ends in split_and_join
- keep_main_chain_path = False Keep path of main chain when creating fixed fragments. Helps incorporate fixed model into final model, but may decrease length of fragments created (not used by default in sequence_from_map).
- high_density_from_model = False Create high density in map along path of main chain atoms in fixed model. Also sets keep_main_chain_path. Helps incorporate fixed model into final model, but may decrease length of fragments created (not used by default in sequence_from_map).
- max_grid_spacing = 1.4 Try to resample map with spacing of about target_grid_spacing if it is greater than max_grid_spacing
- target_grid_spacing = 1.2 Try to resample map with spacing of about target_grid_spacing if it is greater than max_grid_spacing
- min_grid_spacing = 0.8 Try to resample map with spacing of about target_min_grid_spacing if it is less than min_grid_spacing
- target_min_grid_spacing = 1.0 Try to resample map with spacing of about target_min_grid_spacing if it is less than min_grid_spacing
- pare_model = False Pare back model from ends if not convincing. Minimum value is mean minus sd_ratio_pare_model times sd.
- remove_clashes = True Remove clashing side chains in sequence_from_map
- trim_ends_extra_points = 1 Extra points tried in each direction deciding where to trim ends in split_and_join
- min_sd_ratio = 0.1 Minimum ratio of SD to mean
- sd_ratio_pare_model = 2.5 Lower limit of density after pare_model or in split_and_join is mean minus sd_ratio_pare_model times sd
- box_buffer = 5 Box buffer. Box will be at least the size of fragments plus this buffer in each direction (grid units)
- box_size = 150 150 150 You can specify the size of the boxes to use (grid units) when finding chains
- box_overlap_ratio = 3 If increasing the box size by 1/box_overlap_ratio reduces the number of boxes, do this.
- vary_sharpening = None Variable sharpening values. Zero is always added. For example 75 -50 -100 will sharpen by B = 75 then blur by b = 50 and b = 100. Increases residues built but may decrease accuracy
- maximum_duplication = 0.5 Maximum duplication in fragments
- target_n_overlap = 10 You can specify the targeted overlap of boxes
- ends_only = True Keep track only of the very ends of chains. Ignore direction. Just 2 atoms for each chain (one at each end).
- minimum_new_chain_length = 15 Minimum length to try building a new chain
- similar_threshold_ratio = 0.1 Similar threshold ratio. Thresholds differing by this fraction of the difference between maximum and minimum thresholds are considered similar.
- max_merge_cycles = 100 Maximum cycles of merging fragments
- minimum_extension_length = 10 Minimum length to try extending a chain
- minimum_extension_improvement = 2 Minimum improvement to keep a trial extension
- weight_path_by_density = True Weight path by density to trace through high density. Scale on distance is exp(path_density_weight times log(density-density_min)/(density max - density min))
- weight_fixed_segments_by_density = False Weight fixed segments path by density
- final_connect_chains = True Connect chains at very end after merging fragments. Uses full map
- path_density_weight = 2 Weight on having high density along path of trace.
- min_weight_ratio = 0.0001 Minimum scale on distance in trace through high density
- target_trim_length = 7.5 Trim back chains this much before trying to recombine.
- max_branch_length = 10 Maximum branch (not main path) to keep a segment. If a branch is longer than max_branch_length or longer than max_fractional_branch_length times the main path, reject
- max_fract_non_contiguous = 0.25 Maximum fractional non-contiguous density
- max_fractional_branch_length = 0.5 Maximum branch (not main path) to keep a segment. If a branch is longer than max_branch_length or longer than max_fractional_branch_length times the main path, reject
- matching_end_dist = 3 Maximum distance between end atoms to consider them matching for purposes of excluding duplicate ends
- sharing_end_dist = 12 Maximum distance between end atoms to consider them matching for purposes of linking them. Can be larger than matching because the end points may not be at the ends of the chain.
- create_loop_maps = None Find connections between fragments in input model and write out small maps with just the density for the connection and the associated fragments. Default is True.
- max_points_per_region_ratio = 1.5 If set, max_points_per_region will be reset to this value times the current actual maximum points in any region, up to maximum of max_points_per_region_ever
- max_points_per_region_ever = 12000 Maximum number of points in a region ever. See max_points_per_region_ratio and max_points_per_region
- max_points_per_region = 4000 Maximum number of points in a region. Will be ignored if more than this. (Limiting this can prevent very long connections from being made, but reduces possibility that a connection that is totally wrong is made).
- max_new_chains = 1 Maximum number of new chains to obtain in one pass
- regions_for_new_chains = 3 Maximum number of top regions to examine for new chain info
- max_loops = 10 Maximum number of loops to write out in create_loop_maps
- half_window = 2 Half window (grid points) for examining shape of density
- max_loop_iterations = 999 Maximum number of iterations to look for loops
- spacing = 1 Spacing of points for a trace
- intervals = 10 Number of threshold values to try in create_loop_maps
- mean_density_ratio = 2 Guess of mean density at coordinates of atoms is mean density in map plus SD of density in map, corrected for solvent content, times mean_density_ratio
- min_sd_to_mean = 0.20 Limit SD of density at atoms to at least this value times the mean value.
- residues_to_have_middle = 10 Length of a chain to have a middle that can be masked out.
- min_points_in_region = 10 Minimum points in region to be considered as a new chain
- max_points_in_region = 3000 Maximum points in region to be considered as a new chain
- sd_density_ratio = 0.5 Guess of SD of density at coordinates of atoms is SD of density in map, corrected for solvent content, times sd_density_ratio
- threshold = None Density threshold for following density.
- threshold_low = None Density threshold low value for following density.
- threshold_high = None Density threshold high value for following density.
- save_threshold_info = False Use threshold info from initial analysis throughout
- threshold_from_model = False Use density at coordinates in model to estimate thresholds
- test_threshold = False Test thresholds when making connections and use maximum that will connect
- sd_ratio_intervals = 1 SD ratio intervals. Density is traced starting from highest and going to lowest in sd_ratio_intervals. XXX not used
- sd_ratio_create = 1 SD ratio for creating new chains. Default is same as sd_ratio.
- sd_ratio = 2. Lower limit of density to consider is sd_ratio below mean of density at C/N/CA atoms in current model in create_loop_maps. Applies when connecting segments. A high value may result in incorrect connections.
- min_fixed_model_segment_length = 4 Minimum length of segment in fixed_model
- mean_ratio = 0.25 Lower limit of density to consider is mean_ratio times mean of density at C/N/CA atoms in current model in create_loop_maps after subtracting mean of map
- minimum_fraction_intervals_to_try = 0.67 If at least this fraction of intervals have been tried in trace, terminate if recent fraction of long branches is too high.
- retry_long_branches = None If a long branch is found, try to retrace chain
- max_fraction_long_branches = 0.4 Terminate trace if recent fraction of long branches is high
- skip_remainder_on_non_completion = True Skip remainder at this threshold if too many failures
- n_tries_furthest_length = 3 Number of tries to include in looking for longest path
- recent_fraction_length = 8 Number of tries to include in recent fraction of long branches
- expand_size = 1 Expansion of mask when cutting out loop density
- build_segments = True Build segments
- first_cycle_for_fixing = None First macro-cycle where insertions/deletions should be tested
- score_cc_mask_offset = 0.5 Value of CC_mask that just starts to give a positive score. Typical values are 0 or 0.5
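Two of the strategy rules above are simple enough to write out. The sketch below shows the branch-rejection test (max_branch_length, max_fractional_branch_length) and the choice of a resampling target for the map grid (max_grid_spacing, target_grid_spacing, min_grid_spacing, target_min_grid_spacing). The function names are hypothetical and the logic is a direct reading of the keyword descriptions, not code taken from trace_and_build.

```python
def reject_for_branching(main_path_length, branch_length,
                         max_branch_length=10,
                         max_fractional_branch_length=0.5):
    """True if a branch is too long, in absolute terms or relative to the main path."""
    return (branch_length > max_branch_length
            or branch_length > max_fractional_branch_length * main_path_length)


def resample_target(grid_spacing, max_grid_spacing=1.4, target_grid_spacing=1.2,
                    min_grid_spacing=0.8, target_min_grid_spacing=1.0):
    """Return the grid spacing to resample to, or None if no resampling is needed."""
    if grid_spacing > max_grid_spacing:
        return target_grid_spacing  # grid too coarse
    if grid_spacing < min_grid_spacing:
        return target_min_grid_spacing  # grid too fine
    return None


print(reject_for_branching(40, 12))  # True  (branch longer than 10)
print(reject_for_branching(40, 8))   # False
print(reject_for_branching(12, 8))   # True  (branch longer than half the main path)
print(resample_target(1.6))          # 1.2
print(resample_target(1.1))          # None
```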
trace_and_build
- min_segment_length = 15 Minimum residues in a segment to keep (after joining and insertion).
- max_segments = None Maximum number of segments to keep (after joining and insertion). Default is take all with min_segment_length or more residues
- min_segments = 1 Minimum number of segments to keep (after joining and insertion) before skipping if segment length is too short
- fix_insertions_deletions = True Retrace chains and use sequence alignment to fix insertions and deletions.
- convincing_delta_score = 10. Convincing delta score for forward vs reverse directions that nearly always means the higher-scoring one is correct.
- find_helices_strands = True Find helices/strands before tracing chain if no starting model is supplied
- minimum_length_angstroms_helices_strands = 12 Minimum length (A CA start - CA end) for helix/strand to keep
- trim_chains = True Try to trim chains if turns are too tight (i-j-k with dist(i, k) < min_dist_ik)
- min_dist_ik = 4.5 Minimum i-k distance for CA i j k.
- fill_gaps = True Try to fill short gaps in input model
- tolerance_residue_distance = 2 Tolerance for residue-residue distance (typically 4.0)
- dot_min = -0.2 Minimum cosine of angle between sequential CA-CA directions.
- minimum_length = 2 Minimum length between CA-CA positions
- max_gap_residues = 2 Maximum number of residues to try and fill in gaps. Must be 1 or 2
- max_trim_ends = 1 Maximum number of residues to trim from ends before gap filling
- max_gap_dist = 3 Maximum distance to span in gap
- minimum_relative_density = 0.50 Minimum density at marker sites, relative to mean at coordinates of input model, to keep. Also minimum along path between a marker site and nearest main_chain trace. Also minimum ratio of density at ends of a connection and in gap to be filled.
- get_marker_atoms_by_segment = True Get marker atoms by segment. Only applies if map size is at least minimum_size_marker_atoms_by_segment
- minimum_size_marker_atoms_by_segment = 3375000 Use get_marker_atoms_by_segment if it is set and map size is at least this big (typically 150x150x150).
- morph_segments = True Morph segments to fit density before looking for side chains
- morphing_iterations = 3 Number of iterations of morphing
- points_per_atom = 3 Number of points between atoms in tracing path of main chain
- shift_radius = 2 Marker atoms within shift_radius of a main-chain atom are used to identify shift of main-chain in morphing
- minimum_sites = 4 Minimum nearby sites for shifting main chain in morphing
- smoothing_window = 5 Smoothing window for shifts of main-chain in morphing
- main_chain_radius = 1.5 Approximate radius of tube of density for main-chain. Marked points within this radius of an atom in the main-chain are considered main-chain, those outside are side chains and other chains
- side_chain_radius = 3.0 Consider marked points between main_chain_radius and side_chain_radius as likely side-chain markers. Best to use value of 3 or less or neighboring side chains will interfere.
- mask_atoms_radius = 2.5 Radius to use when masking around already-built model atoms
- delta_atoms_radius = 2. Increase in mask_atoms_radius around atoms at ends of already-built model.
- non_expanded_atoms_radius = None Atoms radius for creating mask showing main-chain. Should be slightly larger than the grid used so that a point on a grid point will have 6 adjacent grid points within this radius. Also should be comparable to N-CA-C distances. Default = max(1.0, grid+0.01)
- exclude_residues = 3 Residues to exclude at each end of chain in masking
- shell_radius = 0.5 Divide region between main_chain_radius and side_chain_radius into shells of thickness shell_radius
- cluster_width_ratio = 0.2 Maximum cluster width relative to CA-CA distance
- minimum_vector = 1.0 Minimum length of CB vector to include
- pruning_ratio = 3 Minimum distance between cluster centers relative to cluster width
- n_short_min = 20 N short min. Length of group of CA to optimize together (min). Default in trace_and_build is None, in map-to-model is 20 (quick) or 50 (not quick).
- n_short_max = 200 N short max. Length of group of CA to optimize together (max)
- n_short_delta = 40 N short delta. Length of group of CA to optimize together (delta)
- ca_ca_distance_tol = 1.0 CA-CA tolerance in scoring a set of possible CA atoms
- ca_ca_distance_tol_curves = 1.0 CA-CA tolerance in curves
- ca_ca_distance_curves = 3.0 CA-CA distance for residues in curves
- residues_defining_curves = 3 Number of residues in a row checked for defining a segment that curves.
- sites_to_delete = 3 Sites to delete and then try to rebuild as a group with 1 extra residue
- add_group = True Delete sites_to_delete and then rebuild as a group with 1 extra residue
- n_add_group = 10 Cycles for sites_to_delete
- n_tries_factor = 2 Try n_sites times n_tries_factor times to improve trace
- sites_cart_split_size = 1000 Size of sites_cart for finding sites with trace_chain
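The tight-turn criterion used by trim_chains (three sequential CA atoms i, j, k with dist(i, k) less than min_dist_ik) can be checked as in the sketch below. The function name and the example coordinates are hypothetical, and how the chain is then trimmed is not shown.

```python
import math


def tight_turns(ca_sites, min_dist_ik=4.5):
    """Indices j of CA atoms whose neighbors i=j-1 and k=j+1 are closer than min_dist_ik."""
    return [j for j in range(1, len(ca_sites) - 1)
            if math.dist(ca_sites[j - 1], ca_sites[j + 1]) < min_dist_ik]


# Three CA atoms in a line, then a fourth that doubles back sharply.
sites = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (4.6, 2.3, 0.0)]
print(tight_turns(sites))  # [2]
```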
weights
- weight_vdw = 10 weight on very close CA-CA contacts
- weight_ca_ca_dist = 1. weight on CA-CA distance
- weight_proximity_to_known_position = 0.5 weight on proximity to well-identified CA position
- weight_chain_follows_trace = 10. weight on following the trace. Basically makes sure that the main chain does not cut across weak density
- distance_chain_follows_trace = 0.5 Target distance for following the trace.
- weight_target_length = 0.5 weight on total path length given number of CA positions
- weight_density = 1.0 weight on density at mid-points between CA atoms
- weight_cc_mask_score = 1.0 Weight on map correlation in evaluating chain direction
- weight_seq_score = 1.0 Weight on sequence-map matching evaluating chain direction
- weight_x_gly_score = 1.0 Weight on excess Gly and X residues in evaluating chain direction
- weight_refine_rmsd = 0.0 Weight on rmsd between CA before and after refinement
trace_chain
- helices_strands_cc_min = 0.35 Minimum map CC for helices/strands. Along with combine_models_score_min defines which fragments are tossed. Fragments are kept unless both criteria fail.
- use_all_trace_chains = False Use all trace_chains possibilities, not just the best
- n_overlap = 2 Maximum overlap in trace_chain nonamers
- n_random_frag = 100
- n_sift_nona = 2 Maximum nonamers with a common mid-point
- dist_ca_tol_start = 0.40 Minimum tolerance for CA-CA distances.
- dist_ca_tol_max = 0.6 Maximum deviation of CA-CA distances from target
- reject_ca_z_score = 2.5 Reject CA atoms in trace if their density is more than Z of reject_ca_z_score below the mean and more than fraction reject_ca_fraction below the mean.
- reject_ca_fraction = 0.5 Reject CA atoms in trace if their density is more than Z of reject_ca_z_score below the mean and more than fraction reject_ca_fraction below the mean.
- cc_ratio_min = None Minimum map CC relative to maximum for helices/strands.
- combine_models_score_min = 3.0 Minimum score for a chain in combine_models (after helices_strands). Score is mostly defined by sqrt(n_atoms)*overall map CC.
- rho_cut_min = 0.75 Minimum density (rho/sigma) at coordinates of potential CA atoms in trace_chain, after normalization for solvent fraction. For constant actual local rms in a map, the sigma (overall rms) of the map is proportional to the sqrt(1-solvent_fraction). Therefore rho_cut_min is adjusted by sqrt(0.5)/sqrt(1-solvent_fraction) to place it on a constant scale relative to a map with standard local rms.
- target_angle = 180 Target angle for CA-CA-CA (set to 180 to maximize chain length)
- rho_cut_min_low = 1. Starting value of rho_cut_min. Applies if rho_cut_min_delta is set (rho_cut_min is ignored in this case)
- rho_cut_min_high = 5 Ending value of rho_cut_min. Applies if rho_cut_min_delta is set
- rho_cut_min_delta = None Incremental value of rho_cut_min. If set, rho_cut_min will be ignored
- rat_pair_min = 0.5 Minimum ratio of density at midpoint between points to trace chain between them
- rad_sep_trace = 0.6 Dummy atom separation in trace_chain. Usually 0.6 A for a thorough run and 0.75 A for a quick run. Increased automatically if resolution is greater than 3 A. Value of rad_mask_trace in resolve will be rad_sep_trace*2
- target_p_ratio = 4 Target ratio of atoms to peaks in trace_chain
- target_n_ratio = 1 Target ratio of nonamers to peaks in trace_chain
- max_triple_ratio = None Maximum ratio of triples to pairs in trace_chain
- max_pent_ratio = None Maximum ratio of pentamers to pairs in trace_chain
- n_atoms_total_scale = 3 Ratio of estimated atoms in au to standard estimate
- atom_target_ratio = 1.0 Target ratio of CA to look for to expected atoms in structure. Standard is 0.45, quick is 0.35
- min_end_correl = 0.5 Minimum correlation of direction estimated from two ends to use end matching as criterion for keeping a chain
- add_side_chains = True Add in side chains at trace_chain step
- user_end_ratio = 2.0 Ratio of rad_user for ends of input chains to middle
- user_end_length = 3 Points in input chains within user_end_length of an end are considered near the end
- rad_user = 2.5 Radius for input chains in trace_chains. Points within this radius of an input chain will be removed.
- time_per_volume = 5 How long to try in trace chain (sec per volume of 10000 A**3). Try 2-20 to speed up trace_chain. Has a similar effect to n_tries_target_p = 2 and n_tries_p_ratio = 5.
- n_tries_target_p = None How many tries to get target p value in trace_chain (n_target_p, n_tries_target_p). Set to 40 for slow, 2 for quick
- n_tries_p_ratio = None Tries for p ratio. n_p_ratio. n_tries_max. Set to 20 for slow, 5 for quick
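The solvent-fraction correction described for rho_cut_min can be written out as below. This is a sketch that assumes "adjusted by" means multiplied by the factor sqrt(0.5)/sqrt(1-solvent_fraction). The function name is hypothetical.

```python
import math


def adjusted_rho_cut_min(rho_cut_min=0.75, solvent_fraction=0.5):
    """Scale rho_cut_min so that it refers to a map with standard local rms."""
    return rho_cut_min * math.sqrt(0.5) / math.sqrt(1.0 - solvent_fraction)


print(round(adjusted_rho_cut_min(0.75, 0.5), 3))    # 0.75 (no change at 50% solvent)
print(round(adjusted_rho_cut_min(0.75, 0.875), 3))  # 1.5
```

With more solvent the overall rms of the map is lower, so density at atoms is higher in units of sigma and the cutoff is raised accordingly.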
sequencing
- random_sequences = None Number of random sequences of each length to use as baseline
- positive_gap_penalty = None Penalty for missing residues is positive_gap_penalty * gap**2
- negative_gap_penalty = None Gap penalty for extra residues is negative_gap_penalty * gap**2
- max_gap_length = None Maximum gap length for adjacent alignments
- minimum_crossover_length = None Minimum length of crossovers
- minimum_alignment_length = None Minimum length of an alignment
- minimum_crossover_segment_length = 10 Minimum length of a segment in crossovers
- too_far_crossover = 0.75 fraction of CA-CA distance that is too far to cross over between chains
- too_much_further_crossover = 0.5 fraction of CA-CA distance that is too much further than the last matching CA-CA distance to cross over between chains
- score_by_residue_groups = None Use residue groups in sequence alignment and listing of optimal sequences. Default is use default from sequence_from_map
- split_using_alignment = False At end, split fragments based on alignment. Default is False
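The quadratic gap penalties described above (positive_gap_penalty and negative_gap_penalty) can be sketched as follows. The sign convention (a positive gap means missing residues, a negative gap means extra residues), the function name, and the example penalty values are assumptions made for illustration. The defaults shown in the keyword list are None, meaning values are chosen by the program.

```python
def gap_penalty(gap, positive_gap_penalty, negative_gap_penalty):
    """Penalty for a gap of `gap` residues between adjacent alignments."""
    if gap > 0:
        return positive_gap_penalty * gap ** 2  # residues missing from the model
    if gap < 0:
        return negative_gap_penalty * gap ** 2  # extra residues in the model
    return 0.0


print(gap_penalty(3, 1.0, 2.0))   # 9.0
print(gap_penalty(-2, 1.0, 2.0))  # 8.0
```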
rebuilding
- refine = True Refine all-atom models at end of procedure
- refine_b = None Refine B-values
- refine_cycles = None Refinement cycles (except final cycles). Typical is 1 for quick or superquick and 5 for thorough building
- good_enough_cc = None If all residues have this CC, don't bother rebuilding them
- minimum_contact_distance = 3 Minimum distance between CA atoms not immediately connected
- split_with_sequence = None Use sequence to identify sequence register errors. Runs replace_side_chains with reassign_sequence = true and assign_sequence with iterative_assignment = true.
- rebuild_length_worst = 15 Longest rebuild length to try
- max_insert_or_delete = 1 Maximum residues to insert or delete in one rebuild stage
- time_per_residue = 1 How long to try in fitting loops (sec/residue)
- try_as_is = None Try rebuilding without insertions/deletions
- split_loop = None Try split_loop method for fixing insertions/deletions
- mask_secondary_structure_in_split_loop = True Mask out secondary structure in split loop
- try_insertions = None Try insertions
- try_deletions = None Try deletions
- max_rebuild_cycles = None Maximum rebuilding cycles per macro cycle
- macro_cycles = None Macro cycles of rebuilding and refinement
- start_rebuild = None Starting residue to rebuild. If specified, this is all that is done
- end_rebuild = None Ending residue to rebuild. If specified, this is all that is done
- rebuild_segment = 1 Segment to rebuild from start_rebuild to end_rebuild. None means rebuild all segments from start_rebuild to end_rebuild.
- min_z_score = 2.5 Minimum Z-score for keeping a residue in trace-chain stage. (density at N C CA atoms less than mean for all residues minus min_z_score times SD for all residues).
- min_average_z_score = 2 Minimum average Z-score for keeping a residue in trace-chain stage. (average density at N C CA atoms less than mean for all residues minus min_average_z_score times SD for all residues).
control
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- random_seed = 171731 Random seed
- verbose = False Verbose output
- write_maps = True Verbose map output
- skip_temp_dir = None Skip temp_dir
- trace_only = False Trace map and stop
- use_starting_model_ca_directly = False Use CA from starting model directly
- use_starting_model_ca_as_guide = False Applies if use_starting_model_ca_directly. Only use supplied CA as a guide
- return_trace_fragments = False return trace fragments with trace_only (both must be set)
- quick = True Quick run. Refine just 1 cycle and look for up to 3 segments
- superquick = False Very quick run
- max_dirs = 1000 Maximum number of directories (trace_and_build_xxx)
- resolve_size = None Size of resolve to use.
- coarse_grid = None Use a coarse grid in RESOLVE (saves on memory)
- em_side_density = None Use EM side chain density
gui GUI-specific parameter required for output directory
- output_dir = None