Model-building into cryo-EM and low-resolution maps with map_to_model

Author(s)

map_to_model: Tom Terwilliger

Purpose

The routine map_to_model will interpret a map (cryo-EM, low-resolution X-ray) and try to build an atomic model, fully automatically.

Usage

How map_to_model works:

If you have a CCP4-style (mrc, etc) map or just mtz map coefficients and a sequence file, you can use map_to_model to build a model into your map. The tool map_to_model will identify what kind of chains to build based on your sequence file. It will find where your molecule is in the map and cut out and work with just that part of the density.

If your map has been averaged based on NCS symmetry and you supply a file with that NCS information (.ncs_spec, biomtr.dat, etc), map_to_model will find the asymmetric unit of NCS, build that, then expand to the entire map. You can also just supply the type of symmetry (e.g., D7) or ask to check all plausible symmetry.

The map_to_model tool will cut the density in your map into small pieces of connected density and try to build a model into each one. It will merge all the pieces into a compact model, refine it, and superimpose the final model on the original map. The type of chain or chains to be built is chosen based on your sequence file. If multiple chain types are considered, the entire map is interpreted with each chain type, then the best-fitting non-overlapping chains are chosen.

Output files from map_to_model

map_to_model.pdb: A PDB file with the resulting model, superimposed on the original map (or on the magnified map if magnification is applied).

Applying magnification to the map

Cryo-EM maps often have a scale that is not precisely defined by the experiment. map_to_model allows application of a scale factor (magnification) to the grid of the map. Normally this scale factor will be close to 1. If the magnification is specified and is not equal to 1, it will be applied to the input map, and a magnified map will be written to the output directory and used as if it were the original input file from then on. Additionally, any input NCS information will be adjusted by the same magnification factor (translations and centers are scaled by the magnification factor, rotations are unchanged). Input models are not modified.
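As a rough illustration of the NCS adjustment described above, the sketch below scales the translation and center of one operator and leaves its rotation alone. The function name scale_ncs_operator and the nested-list layout are assumptions made for this example; they are not part of the Phenix API.

```python
def scale_ncs_operator(rotation, translation, center, magnification):
    """Adjust one NCS operator for a map magnification.

    Hypothetical helper: rotation is a 3x3 nested list, translation and
    center are 3-element sequences in Angstroms.
    """
    scaled_translation = [magnification * t for t in translation]
    scaled_center = [magnification * c for c in center]
    # Rotations are dimensionless, so they are returned unchanged.
    return rotation, scaled_translation, scaled_center


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot, trans, cen = scale_ncs_operator(
    identity, [10.0, 0.0, -5.0], [50.0, 50.0, 50.0], magnification=1.02)
```

With a magnification of 1.02, a translation of 10 A becomes 10.2 A and a center at 50 A moves to 51 A.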

Shifting the map to the origin

Most crystallographic maps have the origin at the corner of the map (grid point [0,0,0]), while most cryo-EM maps have the origin in the middle of the map. To make a consistent map, any map with an origin not at the corner is shifted to put the origin at grid point [0,0,0]. This shifted map is used for further steps in model-building. At the conclusion of model-building, the model is shifted back to superimpose on the original map.

Finding the region containing the molecule

By default (density_select=True), the region of the map containing density is cut out of the entire map. This is particularly useful if the original map is very large and the molecule only takes up a small part of the map. This portion of the map is then shifted to place the origin at grid point [0,0,0]. (At the conclusion of model-building, the final model is shifted back to superimpose on the original map.) The region containing density is chosen as a box containing all the points above a threshold, typically 5% of the maximum in the map.
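The box selection can be pictured with a small pure-Python sketch. It assumes the map is a nested list indexed as grid[i][j][k] with a positive maximum; the names density_box and to_original are hypothetical, and the real map_box code will differ in detail.

```python
def density_box(grid, threshold_fraction=0.05):
    """Return (lower, upper) grid indices of the box enclosing every
    point whose value exceeds threshold_fraction * max(grid)."""
    max_value = max(v for plane in grid for row in plane for v in row)
    cutoff = threshold_fraction * max_value
    selected = [
        (i, j, k)
        for i, plane in enumerate(grid)
        for j, row in enumerate(plane)
        for k, v in enumerate(row)
        if v > cutoff
    ]
    lower = tuple(min(p[axis] for p in selected) for axis in range(3))
    upper = tuple(max(p[axis] for p in selected) for axis in range(3))
    return lower, upper


def to_original(point_in_box, lower):
    """Map a grid point in the cut-out box back to the original grid."""
    return tuple(p + l for p, l in zip(point_in_box, lower))


grid = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
grid[1][2][1] = 1.0
grid[2][2][3] = 0.5
grid[0][0][0] = 0.01  # below 5% of the maximum, so it is ignored
lower, upper = density_box(grid)
```

Here the box runs from grid point (1, 2, 1) to (2, 2, 3). Adding lower to a point in the box recovers its position in the original map, which is the same bookkeeping needed to shift the final model back.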

Map sharpening/blurring

By default (auto_sharpen=True) the resolution dependence of the map will be adjusted to maximize the clarity of the map. You can choose to use map kurtosis or the adjusted surface area of the map (default) for this purpose.

Kurtosis is a standard statistical measure that reflects the peakiness of the map.

The adjusted surface area is a combination of the surface area of contours in the map at a particular threshold and of the number of distinct regions enclosed by the top 30% (default) of those contours. The threshold is chosen by default to be one where the volume enclosed by the contours is 20% of the non-solvent volume in the map. The weighting between the surface area (to be maximized) and number of regions enclosed (to be minimized) is chosen empirically (default region_weight=40, as given in the keyword list).
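The keyword list describes this score as surface area minus region_weight times the number of regions. The sketch below applies that formula to two made-up candidate maps; the function name, the candidate labels, and the numbers are illustrative assumptions, and the region_weight of 40 is the value shown for that keyword in the keyword list.

```python
def adjusted_surface_area(surface_area, n_regions, region_weight):
    """Score = surface area minus region_weight times number of regions."""
    return surface_area - region_weight * n_regions


# Hypothetical (surface_area, n_regions) pairs for two sharpened maps.
candidates = {"map_A": (5200.0, 12), "map_B": (5300.0, 30)}
scores = {
    name: adjusted_surface_area(area, regions, region_weight=40)
    for name, (area, regions) in candidates.items()
}
best = max(scores, key=scores.get)
```

map_B has slightly more surface area, but it is broken into many more regions, so map_A gets the better adjusted score.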

Several resolution-dependent functions are tested, and the one that gives the best kurtosis (or adjusted surface area) is chosen. In each case the map is transformed to obtain Fourier coefficients. The amplitudes of these coefficients are then adjusted, keeping the phases constant. The available functions for modifying the amplitudes are:

No sharpening (map is left as is)

Sharpening b-factor applied over entire resolution range (b_sharpen
applied to achieve an effective isotropic overall b-value of b_iso).

Sharpening b-factor applied up to resolution specified with the
resolution=xxx keyword, then not applied beyond this resolution (with
transition specified by the keyword k_sharpen, b_iso_to_d_cut).  If
blurring (sharpening with value less than zero) is applied,
the blurring is applied over the entire resolution range.

Resolution-dependent sharpening factor with three parameters.
First the resolution-dependence of the map is removed by normalizing the
amplitudes.  Then a scale factor S is applied to the data, where
log10(S) is determined by coefficients b[0],b[1],b[2] and a resolution
d_cut (typically d_cut is the nominal resolution of the map).
The value of log10(S) varies smoothly from 0 at resolution=infinity, to b[0]
at d_cut/2, to b[1] at d_cut, and to b[1]+b[2] at the highest resolution
in the map.  The value of b[1] is limited to being no larger than b[0] and the
value of b[1]+b[2] is limited to be no larger than b[1].

Sharpening using a half-dataset correlation.
The resolution-dependent correlation of density in two half-maps
is used to identify the optimal resolution-dependent weighting of
the map.  This approach requires a target resolution which is used
to set the overall fall-off with resolution for an ideal map.  That
fall-off for an ideal map is then multiplied by an estimated
resolution-dependent correlation of density in the map with the true
map (the estimation comes from the half-map correlations).


Model-based sharpening.
You can identify the sharpening parameters using your map and a
model.  This approach requires a guess of the RMSD between the model and
the true model.  The resolution-dependent correlation of model and map
density is used as in the half-map approach above to identify the
weighting of Fourier coefficients.
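The simple b-factor options above can be sketched as an amplitude scale factor that depends on resolution. This sketch assumes the conventional isotropic form exp(b/(4 d^2)), with positive b sharpening and negative b blurring, and it replaces the smooth k_sharpen transition with a hard cutoff for brevity. The function name sharpening_scale is hypothetical.

```python
import math


def sharpening_scale(d, b_sharpen, d_cut=None):
    """Scale factor for a Fourier amplitude at resolution d (Angstroms).

    Assumes the conventional isotropic B-factor form. If d_cut is given,
    sharpening is not applied at resolutions finer than d_cut (a hard
    cutoff standing in for the smooth k_sharpen transition). Blurring
    (negative b_sharpen) is always applied over the whole range.
    """
    if b_sharpen > 0 and d_cut is not None and d < d_cut:
        return 1.0
    return math.exp(b_sharpen / (4.0 * d * d))


overall = sharpening_scale(4.0, 100.0)            # amplitudes at 4 A boosted
limited = sharpening_scale(2.0, 100.0, d_cut=3.0)  # beyond d_cut: untouched
blurred = sharpening_scale(2.0, -100.0, d_cut=3.0)  # blurring still applies
```

Phases are left alone in every case; only the amplitudes are multiplied by the scale factor before the map is recalculated.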

Finding the NCS asymmetric unit of the map

If you supply NCS matrices describing the NCS used to average the map (if any), then map_to_model will try to define a region of the map that represents the NCS asymmetric unit. Application of the NCS operators to the NCS asymmetric unit will generate the entire map, and application to a model built into the asymmetric unit will generate the entire model.

You can also supply the type of symmetry (e.g., C3, D7, etc) and map_to_model will try to find that symmetry in the map. You can even search for all plausible symmetry (ANY). For helical symmetry either an NCS file or the rotation and translation information is required, however.
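For helical symmetry, the rotation and translation information amounts to one rotation about z and one rise along z per subunit. The sketch below generates the k-th helical copy of a point, assuming the helix axis runs along z through the coordinate origin; the function name is hypothetical, while the parameter names follow the helical_rot_deg and helical_trans_z_angstrom keywords.

```python
import math


def helical_copy(point, k, helical_rot_deg, helical_trans_z_angstrom):
    """Position of the k-th helical copy of point (x, y, z)."""
    angle = math.radians(k * helical_rot_deg)
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = point
    return (c * x - s * y, s * x + c * y, z + k * helical_trans_z_angstrom)


# One step of a helix with a 90 degree twist and a 4.8 A rise.
next_copy = helical_copy((10.0, 0.0, 0.0), 1, 90.0, 4.8)
```

In this example the point at (10, 0, 0) moves to approximately (0, 10, 4.8).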

Normally identification of the NCS asymmetric unit and segmentation of the map (below) are done as a single step, yielding an asymmetric unit and a set of contiguous regions of density within that asymmetric unit. The asymmetric unit of NCS will be written out as a map to the directory given by segmentation_directory, superimposed on the shifted map (so that they can be viewed together in Coot).

Segmentation of the map

By default (segment=True) the map or NCS asymmetric unit of the map will be segmented (cut into small pieces) into regions of connected density. This is done by choosing a threshold of density and identifying contiguous regions where all grid points are above this threshold. The threshold is chosen to yield regions that have a size corresponding to about 50 residues. The regions of density are written out to the directory given by segmentation_directory and are superimposed on the shifted map (if you load the shifted map and a region map in Coot, they should superimpose).
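The thresholding step can be illustrated with a simple flood fill over a grid. This is a minimal sketch that uses 6-connected neighbours and ignores periodicity and symmetry; the function name find_regions is hypothetical, and segment_and_split_map itself is more elaborate.

```python
from collections import deque

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def find_regions(grid, threshold):
    """Return lists of contiguous grid points whose density exceeds threshold."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    seen = set()
    regions = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if grid[i][j][k] <= threshold or (i, j, k) in seen:
                    continue
                # Breadth-first search outward from this unvisited point.
                region = []
                queue = deque([(i, j, k)])
                seen.add((i, j, k))
                while queue:
                    x, y, z = queue.popleft()
                    region.append((x, y, z))
                    for dx, dy, dz in NEIGHBORS:
                        p = (x + dx, y + dy, z + dz)
                        inside = 0 <= p[0] < nx and 0 <= p[1] < ny and 0 <= p[2] < nz
                        if inside and p not in seen and grid[p[0]][p[1]][p[2]] > threshold:
                            seen.add(p)
                            queue.append(p)
                regions.append(region)
    return regions


# A 1 x 1 x 7 strip of density values.
strip = [[[0.9, 0.8, 0.1, 0.7, 0.1, 0.0, 0.6]]]
high = find_regions(strip, 0.5)    # three small regions
low = find_regions(strip, 0.05)    # two larger regions
```

Raising the threshold splits the density into more, smaller regions, so the threshold can be tuned until the typical region has the desired size.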

Model-building

Models are built in several ways by map_to_model and then the best-fitting, non-overlapping models are chosen. The main methods used for model-building are:

Standard RESOLVE model-building for PROTEIN/RNA/DNA for the entire
asymmetric unit of NCS (or the entire molecule if no NCS).

Helices (RNA) or helices/strands (PROTEIN) for the entire asymmetric unit

Chain tracing (RNA/PROTEIN/DNA) for each segmented region, with various
values of map sharpening applied

RESOLVE model-building for each segmented region, with various values of
map sharpening applied

Intermediate models are refined with phenix.real_space_refine and are written out relative to the shifted map with origin at [0,0,0]. You can view these intermediate models, the shifted map, the shifted map containing just the asymmetric unit of NCS, and any region maps in Coot, and they should all superimpose.

Once all intermediate models are built, all models of each chain type are combined, taking the best-fitting model for each part of the map. Then all chain types are combined, once again taking the best-fitting model for each part of the map. The models are refined again.
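One simple way to picture the choice of best-fitting, non-overlapping models is a greedy pass over fragments sorted by score. The sketch below is only an illustration of that idea: the fragment names, the scores, and the use of integer labels for the map positions each fragment occupies are all assumptions, and map_to_model may combine models differently.

```python
def pick_non_overlapping(fragments):
    """Keep the highest-scoring fragments that do not share map positions.

    Each fragment is (name, score, occupied), where occupied is a set of
    labels for the parts of the map the fragment covers.
    """
    kept = []
    occupied = set()
    for name, score, cells in sorted(fragments, key=lambda f: f[1], reverse=True):
        if occupied.isdisjoint(cells):
            kept.append(name)
            occupied.update(cells)
    return kept


fragments = [
    ("protein_A", 0.72, {1, 2, 3}),
    ("rna_A", 0.55, {3, 4}),      # overlaps protein_A at position 3
    ("rna_B", 0.60, {5, 6}),
]
chosen = pick_non_overlapping(fragments)
```

Here protein_A and rna_B are kept, and rna_A is dropped because it overlaps the better-fitting protein_A.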

Then (if present) NCS is applied to the model and the full model is refined. Finally the best model, with NCS applied if present, is shifted to match the original map and is written out.

Iterative map improvement with model-building and sharpening

You can carry out multiple cycles of model-building and map sharpening, using the model from each cycle in the sharpening process. The map is sharpened to make it as similar as possible to density calculated from the current model and the new map is used for the next cycle of model building.

Examples

Standard run of map_to_model:

Running map_to_model is easy. From the command-line you can type:

phenix.map_to_model my_map.map seq.fa ncs_file=find_ncs.ncs_spec

where my_map.map is a CCP4, mrc or other related map format, seq.fa is a sequence file, and find_ncs.ncs_spec is an optional file specifying any NCS operators used in averaging the map. This can be in the form of BIOMTR records from a PDB file as well.

Standard run of map_to_model, specifying NCS type:

Finding NCS with map_to_model is easy. From the command-line you can type:

phenix.map_to_model my_map.map seq.fa ncs_type=D7

Here map_to_model will look for D7 (7-fold symmetry along the c-axis and 2-fold symmetry along a or b) in the map and apply it if it is found. It will also write out the NCS matrices corresponding to this symmetry.
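For reference, the rotational part of a D7 point group can be generated as below: seven rotations about z (the c-axis), each optionally combined with a two-fold about x or y. This is a sketch of the symmetry itself, not of how map_to_model searches for it; the function names are hypothetical, and the two_fold_along_x argument mirrors the keyword of the same name.

```python
import math


def rot_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def dihedral_rotations(n, two_fold_along_x=True):
    """Rotations of point group Dn: n-fold about z, two-fold about x or y."""
    if two_fold_along_x:
        two_fold = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    else:
        two_fold = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    operators = []
    for k in range(n):
        r = rot_z(360.0 * k / n)
        operators.append(r)                    # pure n-fold rotation
        operators.append(matmul(r, two_fold))  # n-fold combined with two-fold
    return operators


d7 = dihedral_rotations(7)
```

The list d7 holds 14 rotation matrices, the first of which is the identity.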

Combining partial runs from map_to_model:

If you have a completed or partially-completed run of map_to_model and you want to run again but you do not want to re-run all the steps, you can skip building some models by specifying a previously-built model for some steps. The models you can specify are:

init_PROTEIN

init_RNA

init_DNA

helices_strands_only_PROTEIN

build_rna_helices_RNA

standard_PROTEIN

standard_RNA

standard_DNA

final_PROTEIN

final_RNA

final_DNA

For example, you can say:

phenix.map_to_model my_map.map seq.fa ncs_file=find_ncs.ncs_spec \
  partial_model=last_run/final_PROTEIN.pdb \
  partial_model_type=final_PROTEIN \
  partial_model=last_run/final_RNA.pdb \
  partial_model_type=final_RNA

and then the files last_run/final_PROTEIN.pdb and last_run/final_RNA.pdb will simply be combined to create a new model, instead of doing a full model-building run.
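Because each partial_model must be paired, in order, with a partial_model_type, it can help to assemble the arguments programmatically. The helper below is a hypothetical convenience, not part of Phenix; it only uses the keyword names and model types listed on this page and builds the same command as the example above.

```python
VALID_TYPES = {
    "init_PROTEIN", "init_RNA", "init_DNA",
    "helices_strands_only_PROTEIN", "build_rna_helices_RNA",
    "standard_PROTEIN", "standard_RNA", "standard_DNA",
    "final_PROTEIN", "final_RNA", "final_DNA",
}


def partial_model_args(models):
    """Turn (path, model_type) pairs into ordered command-line arguments."""
    args = []
    for path, model_type in models:
        if model_type not in VALID_TYPES:
            raise ValueError("unknown partial_model_type: %s" % model_type)
        # Keep each model next to its type so the two lists stay in step.
        args.append("partial_model=%s" % path)
        args.append("partial_model_type=%s" % model_type)
    return args


command = ["phenix.map_to_model", "my_map.map", "seq.fa",
           "ncs_file=find_ncs.ncs_spec"]
command += partial_model_args([
    ("last_run/final_PROTEIN.pdb", "final_PROTEIN"),
    ("last_run/final_RNA.pdb", "final_RNA"),
])
print(" ".join(command))
```

The resulting list can be joined into a single command line or passed to a process launcher of your choice.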

The stages of model-building that can be replaced are:

init (building into individual regions of density)

helices_strands_only (PROTEIN search for helices and strands)

build_rna_helices (RNA search for A-form helices)

standard (RESOLVE model-building on entire structure)

final (combined model for one chain type)

The chain-types that can be specified are:

PROTEIN

RNA

DNA

Possible Problems

If you have a very large structure, your computer may not have enough memory to run map_to_model, and one or more sub-processes might crash. The best solution is to run on a computer with more memory. You can also cut back the resolution used in steps that use Fourier transformation.

If your queueing system crashes during a run or one or more sub-processes crash, you might end up with models built for some stages of building and not for others. You can carry out another run and read in the models that have already been built so that you do not need to build them again (see above in the section on combining partial runs).

Specific limitations and problems:

Literature

Additional information

List of all available keywords

job_title = None Job title in PHENIX GUI, not used on command line
input_files
- map_coeffs_file = None File with map coefficients
- map_coeffs_labels = None Optional label specifying which columns of map coefficients to use
- map_file = None File with CCP4-style map
- seq_file = None Sequence file
- ncs_file = None Input NCS spec file for use with segment_and_split_map.
- starting_model = None Optional input PDB files to be used as starting point for further model building.
- sharpen_with_starting_model = False If starting_model is supplied, use it in identifying optimal resolution-dependent map sharpening
- include_starting_model = True If starting_model is supplied, use it in model-building.
- target_ncs_au_file = None Optional PDB file to partially define the ncs asymmetric unit of the map. The coordinates in this file will be used to mark part of the ncs au and all points nearby that are not part of another ncs au will be added.
- partial_model = None Optional input PDB file with protein/RNA/DNA chains for one step in model-building. If supplied, no model-building for this step will be carried out. Must match offset map (normally not the same as the original map.) This model will replace the model normally built at the step specified by partial_model_type and should match its position. Multiple partial models may be specified. You need to specify what type each one is with the keyword partial_model_type (the order of partial_model entries must match the order of partial_model_type entries). NOTE: This is intended as a way to use the models from a previous run, not to input your own model. For your own models, use starting_model instead.
- partial_model_type = init_PROTEIN init_RNA init_DNA helices_strands_only_PROTEIN build_rna_helices_RNA standard_PROTEIN standard_RNA standard_DNA *final_PROTEIN final_RNA final_DNA Model type for a partial model. The model will be used instead of generating this kind of model. One required for each partial_model. The partial_model_type keyword specifies both the model-building step (init, helices_strands_only, build_rna_helices, standard, final) and the chain_type (RNA, DNA, PROTEIN).
- trace_chain_pdb_in = None Input PDB file to use instead of tracing chains. Suitable only for the case where one region is being analyzed. The PDB file should match the original map (not the offset region).
- info_pickle_file = None Pickle file from segment_and_split_map. You can read in this file and choose which map file to use.
- input_map_id_start = None first ID (number 1 to N) of the map to analyze from info_pickle file
- input_map_id_end = None last ID (number 1 to N) of the map to analyze from info_pickle file
- seed_ca_in = None Input PDB file with possible CA positions to assist in tracing chains
- seed_ca_only = False Only use sites from seed_ca_in (do not find additional sites)
- model_pickle_file = None Pickle file with working models. You can read in this file and start with the models in it. Only suitable if just one region of the map is to be analyzed (same as for trace_chain_pdb_in)
output_files
- params_out = map_to_model_params.eff Output parameters file
- pdb_out = map_to_model.pdb Output PDB file
- pdb_out_unique = map_to_model_unique.pdb Output PDB file (ncs AU). Just asymmetric unit of model
- pdb_out_reverse = map_to_model_reverse.pdb Optional output reversed PDB file (traced opposite direction as pdb_out). Created if include_reverse is set.
- protein_mask_file = None Output ccp4-style map with mask for region of macromolecule
- region_model_suffix = None Suffix (before .pdb) for region models
- output_directory = map_to_model Place where final files will go
- segmentation_directory = segmented_maps Place where segmented maps will go if segment=True
crystal_info
- chain_types = PROTEIN DNA RNA Chain type (PROTEIN/DNA/RNA). You can choose one or more. If not specified, it will be guessed from your sequence file. Multiple chain types are fine.
- scattering_table = *n_gaussian wk1995 it1992 electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- is_crystal = None Defines whether this is a crystal (or cryo-EM). Default is True if map_coefficients are supplied and False if a map is supplied. If True and no NCS operators are supplied, then segmentation yields the asymmetric unit of the crystal. Additionally the final model will represent the asymmetric unit of the crystal (not the entire map). Normally set is_crystal and use_sg_symmetry together.
- use_sg_symmetry = None If you set use_sg_symmetry=True then the symmetry of the space group will be used. For example in P1 a point at one end of the unit cell is next to a point on the other end. Normally for cryo-EM data this should be set to False. Default is True if map_coefficients are supplied and False if a map is supplied.
- resolution = None High-resolution limit. Data will be truncated at this resolution. If a map is supplied, it will be Fourier filtered at this resolution (multiplied by d_min_ratio). Required if input is a map and only_original_map is not set.
- solvent_content = None Solvent fraction of the cell. If this is density cut out from a bigger cell, you can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- truncate_at_d_min = None Truncate data at resolution specified. Default is True if map_coeffs are supplied, False if map is supplied
- map_inside_box = True Place centers of all chains inside (0,1). This is required during normal operation.
- space_group = None Space group (normally read from the data file)
- unit_cell = None Unit Cell (normally read from the data file)
- chain_type = PROTEIN DNA RNA Do not set. Use chain_types instead.
- sequence = None Sequence as text string. Normally supply a sequence file instead
- origin_frac = None Saves origin shift
- solvent_fraction = None Same as solvent_content. Only used on input and copied to solvent content.
reconstruction_symmetry
- ncs_type = None Symmetry used in reconstruction. For example D7, C3, C2, I (icosahedral), T (tetrahedral), or ANY (try everything and use the highest symmetry found). Ignored if ncs_file is supplied. Note: ANY does not search for helical symmetry.
- ncs_center = None Center (in A) for NCS operators (if ncs is found automatically). If set to None, first guess is the center of the cell and then if that fails, found automatically as the center of the density in the map.
- optimize_center = None Optimize position of NCS center. Default is False if ncs_center is supplied or center of map is used, and True if it is found automatically.
- helical_rot_deg = None helical rotation about z in degrees
- helical_trans_z_angstrom = None helical translation along z in Angstrom units
- two_fold_along_x = None Specifies if D or I two-fold is along x (True) or y (False). If None, both are tried.
- random_points = None Number of random points in map to examine in finding NCS
- n_rescore = None Number of NCS operators to rescore
- op_max = None If ncs_type is ANY, try up to op_max-fold symmetries
map_modification
- magnification = None Magnification to apply to input map. Input map grid will be scaled by magnification factor before anything else is done.
- b_iso = None Target B-value for map (sharpening will be applied to yield this value of b_iso)
- b_sharpen = None Sharpen with this b-value. Contrast with b_iso, which yields a targeted value of b_iso.
- resolution_dependent_b_sharpen = None If set, apply resolution_dependent_b_sharpen (b0 b1 b2). Log10(amplitudes) will start at 1, change to b0 at half of resolution specified, changing linearly, change to b1 at resolution specified, and change to b2 at high-resolution limit of map
- d_min_ratio = 0.833 Sharpening will be applied using d_min equal to d_min_ratio times resolution. If None, box of reflections with the same grid as the map used.
- auto_sharpen = True Apply auto-sharpening in segment_and_split_map (requires segment=True)
- auto_sharpen_regions = False Automatically determine sharpening for each region (in addition to the overall map).
- auto_sharpen_methods = *no_sharpening *b_iso *b_iso_to_d_cut *resolution_dependent Methods to use in sharpening. b_iso searches for b_iso to maximize sharpening target (kurtosis or adjusted_sa). b_iso_to_d_cut applies b_iso only up to resolution specified, with fall-off of k_sharpen. resolution_dependent adjusts 3 parameters to sharpen variably over resolution range.
- local_sharpening = None Sharpen locally using overlapping regions. NOTE: Best to turn off local_aniso_in_local_sharpening if NCS is present. If local_aniso_in_local_sharpening is True and NCS is present this can distort the map for some NCS copies because an anisotropy correction is applied based on local density in one copy and is transferred without rotation to other copies.
- local_aniso_in_local_sharpening = None Use local anisotropy in local sharpening. Default is True unless NCS is present.
- box_in_auto_sharpen = True Use a representative box of density for initial auto-sharpening instead of the entire map.
- max_box_fraction = None If box is greater than this fraction of entire map, use entire map.
- use_weak_density = False When choosing box of representative density, use poor density (to get optimized map for weaker density)
- k_sharpen = None Steepness of transition between sharpening (up to resolution) and not sharpening (d < resolution). Note: for blurring, all data are blurred (regardless of resolution), while for sharpening, only data with d about resolution or lower are sharpened. This prevents making very high-resolution data too strong. Note 2: if k_sharpen is zero or None, then no transition is applied and all data are sharpened or blurred. Note 3: only used if b_iso is set.
- search_b_min = None Low bound for b_iso search.
- search_b_max = None High bound for b_iso search.
- search_b_n = None Number of b_iso values to search.
- residual_target = None Target for maximization steps in sharpening. Can be kurtosis or adjusted_sa (adjusted surface area)
- sharpening_target = None Overall target for sharpening. Can be kurtosis or adjusted_sa (adjusted surface area). Used to decide which sharpening approach is used. Note that during optimization, residual_target is used (they can be the same.)
- require_improvement = True Require improvement in score for sharpening to be applied
- region_weight = 40 Region weighting in adjusted surface area calculation. Score is surface area minus region_weight times number of regions. Default is 40.
- sa_percent = 30. Percent of target regions used in calculation of adjusted surface area. Default is 30.
- fraction_occupied = 0.20 Fraction of molecular volume targeted to be inside contours. Used to set contour level. Default is 0.20
- n_bins = 20 Number of resolution bins for sharpening. Default is 20.
- max_regions_to_test = 30 Number of regions to test for surface area in adjusted_sa scoring of sharpening
- eps = None
strategy
- cycles = None You can run cycles of model-building interspersed with model-based map sharpening.
- cycle_auto_sharpen = True Apply auto-sharpening on all cycles (not just first)
- density_select = None Run map_box with density_select=True to cut out the region in the input map that contains density. Useful if the input map is much larger than the structure. Default is True if a map is supplied and False if is_crystal is set or map_coeffs are supplied.
- density_select_threshold = None Choose region where density is this fraction of maximum or greater
- get_half_height_width = None Use 4 times half-width at half-height as estimate of max size
- segment = True Run segment_and_split_map to break up map into pieces for model-building (recommended for larger structures). You can also run segment_and_split_map separately and then run model-building (there are more parameters available for segment_and_split_map in that case).
- build_in_regions = True Run building on individual segmented regions
- include_phase_and_build = True If segment is True, also run phase_and_build on the entire asymmetric unit of the map.
- include_helices_strands_only = True If True, also run build_one_model on the entire asymmetric unit of the map with helices_strands (protein) or with build_rna_helices (RNA).
- max_nproc_for_build_one_model = 30 Use only up to this number of processors for a single run of phase_and_build. Typically there is little time advantage to using more than about 30 processors and if you use too many the time could increase and quality decrease due to splitting up the model-building too much.
- nmodels_for_phase_and_build = 20 Number of models to build in phase_and_build
- score_with_cc_in_phase_and_build = True Use map-model CC to score in phase_and_build
- include_reverse = False Create reversed model where all chains are traced in opposite direction to pdb_out
- reverse_only = False Only create reversed model where all chains are traced in opposite direction to pdb_out (requires input model).
- quick = False Run quickly (sets region_b_sharpen=0, thorough_resolve_model_building=False, ss_refine_ncycle=1)
- superquick = False Run very quickly just to test routines
- pdb_in_only = None Only use pdb_in (do not build anything new)
- assign_sequence = True Run assign_sequence to match model to sequence
- rearrange_before_assign_sequence = True Run replace_side_chains with reassign_sequence before assign_sequence
- fit_loops_before_assign_sequence = True Run fit_loops before assign_sequence
- extend_with_resolve = True Run resolve model extension
- min_percent_assigned_for_assign_sequence = 25 Skip assign_sequence if initial percentage sequence assigned is lower than min_percent_assigned_for_assign_sequence
segmentation
- soft_mask = False Use soft mask (smooth change from inside to outside with radius based on resolution of map). In development
- value_outside_mask = None Value to assign to density outside masks in segment_and_split_map
- min_relative_helical_cc_to_keep = 0.90 For helical symmetry, keep copies within this range of max at end
- add_neighbors = True Add neighboring regions around the NCS au. Turns off exclude_points_in_ncs_copies also.
region_sharpening
- include_original_map = False You can include the original map without Fourier filtering in local maps with this keyword. To use only the original map use only_original_map=True
- only_original_map = False You can include just the original map without Fourier filtering in local maps. Applies if the starting point is a map (not map coefficients).
- region_b_sharpen = None B-factor for sharpening of local maps. Normally set to None and several values are tested. (Positive is sharpen, negative is blur) Note: if resolution is specified map will be Fourier filtered to that resolution even if it is not sharpened.
- region_b_sharpen_low = -105 Lowest value of region_b_sharpen to use. Applies if region_b_sharpen_delta is set. Negative is blur, positive is sharpen.
- region_b_sharpen_high = 0 Ending value of region_b_sharpen. Applies if region_b_sharpen_delta is set. Negative is blur, positive is sharpen.
- region_b_sharpen_delta = 15 Incremental value of region_b_sharpen. If set, region_b_sharpen will be applied with values from region_b_sharpen_low to region_b_sharpen_high in increments of region_b_sharpen_delta. Ignored if region_b_sharpen is set.
resolve_model_building
- run_resolve_model_building = None Run resolve model-building algorithm on maps. If None, set to True for RNA/DNA and False for protein.
- thorough_resolve_model_building = True Use thorough resolve model-building
- run_helices_strands_only = None Run resolve helices-strands-only algorithms on maps. If None, set to True for RNA and False for protein. This applies to segmented maps only if they exist. Compare with include_helices_strands_only, which applies to the entire asymmetric unit.
trace_chain
- run_trace_chain = True Run trace-chain model-building algorithm on maps
- rho_cut_min = 2.0 Minimum density (rho/sigma) at coordinates of potential CA atoms in trace_chain, after normalization for solvent fraction. For constant actual local rms in a map, the sigma (overall rms) of the map is proportional to the sqrt(1-solvent_fraction). Therefore rho_cut_min is adjusted by sqrt(0.5)/sqrt(1-solvent_fraction) to place it on a constant scale relative to a map with standard local rms.
- rho_cut_min_low = 1. Starting value of rho_cut_min. Applies if rho_cut_min_delta is set (rho_cut_min is ignored in this case)
- rho_cut_min_high = 5 Ending value of rho_cut_min. Applies if rho_cut_min_delta is set
- rho_cut_min_delta = None Incremental value of rho_cut_min. If set, rho_cut_min will be ignored
- rat_pair_min = 0.5 Minimum ratio of density at midpoint between points to trace chain between them
- dist_ca_tol_max = 0.80 Maximum tolerance for CA-CA distances. Normally 0.8 A for thorough run and 1.3 A for quick
- dist_ca_tol_start = 0.10 Minimum tolerance for CA-CA distances.
- rad_sep_trace = 0.75 Dummy atom separation in trace_chain. Usually 0.6 A for thorough run and 0.75 A for quick. Increased automatically if resolution is greater than 3 A. Value of rad_mask_trace in resolve will be rad_sep_trace*2.
- target_p_ratio = 3 Target ratio of nonamers to peaks in trace_chain
- max_triple_ratio = 10 Maximum ratio of triples to pairs in trace_chain
- max_pent_ratio = 10 Maximum ratio of pentamers to pairs in trace_chain
- atom_target_ratio = 0.45 Target ratio of CA to look for to expected atoms in structure. Standard is 0.45, quick is 0.35.
- build_both_directions = False Build chains both directions and choose after merging
- min_end_correl = 0.5 Minimum correlation of direction estimated from two ends to use end matching as criterion for keeping a chain
- add_side_chains = True Add in side chains at trace_chain step
crossing
- combine_models = True Merge models from trace_chain to form single model
- standard_merge = None Merge by reading all chains and running resolve merging.
- use_cc_in_combine_extend = None You can choose to use the correlation of density rather than density at atomic positions to score models in the merge_second_model or merge_both_models step. This may be useful at lower resolution (> 3 A)
- merge_remainder = True Merge remainder in merge_by_segment_correlation
- merge_by_segment_correlation = True Merge using segment correlation if merge_with_combine_models is selected. Can be used along with merge_remainder=True/False and standard_merge=True/False
- merge_second_model = True In merging models, cut up second model to fill gaps in first. Normally used internally.
- wang_radius = 5 Smoothing radius for solvent identification
- cc_min = 0.40 Minimum map-model correlation to keep a segment after refinement
- assignment_weight = 0.20 If set, increase score of segments assigned to sequence.
- cc_min_rna = 0.25 Minimum map-model correlation (RNA/DNA building)
- overlap_tolerance = None Minimum distance between N/P/C1prime in different chains. Used to reject NCS-related chains. Default is 3 A for RNA/DNA and 2 A for protein
- ncs_clash_threshold = 2. Threshold for considering two atoms too close after applying NCS
refinement
- refine = True Refine fragments after each building cycle
- refine_adp = False Refine individual atomic displacement factors (adp)
- number_of_sa_models = 20 Number of refinement sa_models.
- number_of_trials = 20 Number of refinement trials.
- number_of_macro_cycles = 5 Number of refinement macro_cycles.
- number_of_build_cycles = 5 Number of refinement (SA) cycles.
rebuilding
- rebuild = True Rebuild model. Only for protein.
- rebuild_cycles = 2 Number of rebuilding cycles. Set to zero to skip rebuilding.
- trace_loops = True Rebuild model by iterative loop tracing. Only for protein.
- trace_loops_target_time = 60 Target time for completing a single trace_loops job. You can decrease it to speed it up (and miss some rebuilding) or increase it and possibly get some more successful rebuilding.
- loop_lib = False Rebuild model using loop library. Only for protein.
- rebuild_length = 8 Length of segments to be rebuilt
- rebuild_length_worst = 6 Length of segments to be rebuilt in rebuilding worst sections
- offset_length = 3 Offset of segments to be rebuilt
- rebuild_length_loop_lib = 3 Length of segments to be rebuilt with loop library (max=3)
- offset_length_loop_lib = 1 Offset of segments to be rebuilt with loop library
- start_res = None Start residue for rebuilding. Normally use None.
- end_res = None Ending residue for rebuilding. Normally use None.
- worst_rebuild_percent = 10 If specified, this percentage of residues will be rebuilt. Otherwise, the entire model will be rebuilt
- target_insert = None Length of insert to put from start_res to end_res. Normally leave blank. Can be used to force the addition or subtraction of residues.
- target_insert_offset = None If non-zero, try to rebuild with this number of extra residues in each rebuilt segment. Normally leave blank. Can be used to force the addition or subtraction of residues.
iterative_ss_refine
- iterative_ss = True Run iterative assignment of secondary structure and refinement with secondary structure restraints
- ss_refine_ncycle = None Overall cycles of iterative secondary_structure identification and refinement. Default is 10
- refine_cycles = 5 Cycles of refinement within iterative secondary structure and refinement
- parallel_runs = 4 Number of parallel runs of iterative secondary structure assignment and refinement. Best model from each group is taken each cycle
- combine_annotations = False Combine existing secondary_structure annotations each cycle with new annotations
- replace_side_chains = True Replace side chains during iterative optimization
- regularize = True regularize secondary structure during iterative optimization
mr_rosetta
- run_mr_rosetta = False Run mr_rosetta to rebuild model
- rosetta_models = 20 Number of rosetta models to build
- relax_models = 5 Number of relaxed rosetta models to build
- rosetta_fixed_seed = None Fixed seed for Rosetta. Only use for regression tests.
control
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- build_only = False Only build model (then stop)
- build_and_refine_only = False Only build model and refine (then stop)
- resolve_size = None Size of resolve to use.
- random_seed = 77151 Random seed (allows duplicating calculations or getting different results each time.)
- verbose = False Verbose output
- comparison_model = None Comparison model
gui GUI-specific parameter required for output directory
- output_dir = None