Model-building into cryo-EM and low-resolution maps with map_to_model
Author(s)
- map_to_model: Tom Terwilliger
Purpose
The routine map_to_model will interpret a map (cryo-EM, low-resolution
X-ray) and try to build an atomic model, fully automatically.
GUI
A Graphical User Interface is available.
Usage
How map_to_model works:
If you have a CCP4-style (mrc, etc) map or just mtz map coefficients
and a sequence file, you can use map_to_model to build a model into
your map. The tool map_to_model will identify what kind of chains to
build based on your sequence file. It will find where your molecule is
in the map and cut out and work with just that part of the density.
To run map_to_model successfully, you'll need a map with resolution of
about 4.5 A or better. The higher the resolution, the more complete the
model that you will typically get.
It is easiest to supply map_to_model with a map that has already been
sharpened optimally (for example with phenix.auto_sharpen). It is also easiest
if you cut out just the unique part of your map before supplying it to
map_to_model (you can use phenix.map_box to do this with
the extract_unique=True option). After you are all done you can reconstruct
the entire molecule using phenix.apply_ncs.
(Alternatively you can have map_to_model do everything for you (auto_sharpen,
symmetry average, cut out the unique part, reconstruct the entire molecule).
This is a lot slower so normally it is best to prepare just the right part
of the map in advance.)
You have a choice in map_to_model of running quickly (quick=True) or
more thoroughly. Normally quick=True should be fine, particularly
for building protein chains. In this mode,
map_to_model runs the phenix.trace_and_build tool to trace the chain,
identify positions of side chains, build a model for each segment of density,
and refine the entire model. This can be done quickly with multiple processors,
requiring perhaps 15 minutes to build 300 residues with 4 processors.
If you choose thorough building (quick=False), map_to_model will
try several different methods of building your model.
One method is to build in small regions of the map.
The map_to_model tool will cut the density
in your map into small pieces of connected density and
try to build a model into each one. It will merge all the
pieces into a compact model, refine it, and superimpose the
final model on the original map. Another method is the trace_and_build
method used in the quick version. A third is standard resolve model-building.
A fourth is finding helices and strands.
If your structure has RNA or DNA, model-building for these chains is done
separately from protein. The type of chain or chains to be
built are chosen based on your sequence file. If multiple chain types
are considered, the entire map is interpreted with each chain type, then
the best-fitting non-overlapping chains are chosen.
How to use map_to_model and trace_and_build to build and complete a model
The recommended way to build a model using phenix is first to run
map_to_model and get a starting model, and then to try and build additional
model and fix errors using trace_and_build and fix_insertions_deletions.
Note that map_to_model will show
you all the chains that are built along with their density, so you can see
even before it is done whether there are changes in chain tracing that you
would like to make.
Once map_to_model is done, you can try to connect parts of the model, extend
chains, or build new chains using trace_and_build. map_to_model writes
out each segment as a separate chain, so you can specify chains to be
connected with trace_and_build. For example, if you think chains A and B
should be connected, you can use pdbtools to create a working model
working_AB.pdb with just chains A and B. Then you can run
trace_and_build with working_AB.pdb and your map, specifying
connect_chains and turning off extend_chains and find_chains. Then
trace_and_build will attempt to connect your chains A and B and create a
new model with a single chain. You can then replace chains A and B in your
model with the new chain.
You can also try to fix insertions and deletions in your model by running
fix_insertions_deletions with your model, your sequence file, and your map.
You can try to improve the sequence assignment with sequence_from_map. This
is run automatically at the end of map_to_model and trace_and_build
but if you create new chains you might want to try and assign them to
sequence separately.
If you want to create a model with full symmetry, you can do this with
map_to_model as well. You supply the full map, your model of the
unique part of the structure as the starting_model, and the symmetry
matrices that you can obtain from the map_symmetry tool. You can turn
off all the model-building methods so that the only thing that happens is
assembly and refinement.
Then map_to_model will apply the symmetry, remove any pieces that overlap
due to symmetry, and refine the entire model.
Applying magnification to the map
Cryo-EM maps often have a scale that is not precisely defined by the
experiment. map_to_model allows application of a scale factor
(magnification) to the grid of the map. Normally this scale factor
will be close to 1. If the magnification is specified and is not
equal to 1, it will be applied to the input map and a magnification map
will be written out to the output directory and will be used as if it were
the original input file from then on. Additionally any input symmetry
information will be adjusted by the same magnification factor (translations and
centers are scaled by the magnification factor, rotations are unchanged).
Any input models are not modified.
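As a rough illustration of how the symmetry information is adjusted, here is a minimal Python sketch. It assumes an operator is stored as a rotation matrix, a translation, and a center, all in Angstroms. The function name scale_symmetry_operator and this data layout are hypothetical and are not part of map_to_model.

```python
def scale_symmetry_operator(rotation, translation, center, magnification):
    """Return (rotation, translation, center) after applying a magnification.

    Rotations are left unchanged; translations and centers are multiplied
    by the magnification factor.
    """
    scaled_translation = tuple(magnification * t for t in translation)
    scaled_center = tuple(magnification * c for c in center)
    return rotation, scaled_translation, scaled_center


# Hypothetical two-fold operator about z with a translation and a center
rot = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))
new_rot, new_trans, new_center = scale_symmetry_operator(
    rot, (10.0, 0.0, 5.0), (50.0, 50.0, 50.0), magnification=1.02)
print(new_trans, new_center)
```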
Shifting the map to the origin
Most crystallographic maps have the origin at the corner of the map
(grid point [0,0,0]), while most cryo-EM maps have the origin in the
middle of the map. To make a consistent map, any maps with an origin not
at the corner are shifted to put the origin at grid point [0,0,0]. This map
is the shifted map that is used for further steps in model-building.
At the conclusion of model-building, the model is shifted back to
superimpose on the original map.
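The bookkeeping for the shift can be sketched as follows. This is a simplified illustration that assumes an orthogonal cell and a map described by the grid index of its first stored point. The function names are hypothetical and the actual implementation may differ.

```python
def origin_shift_cart(first_grid_point, unit_cell_lengths, grid_size):
    """Cartesian position (Angstroms) of the first grid point of the map.

    Assumes an orthogonal cell, so each axis can be treated independently.
    """
    return tuple(o * a / n for o, a, n in
                 zip(first_grid_point, unit_cell_lengths, grid_size))


def to_shifted_frame(xyz, shift):
    """Coordinates relative to the shifted map (first grid point at [0,0,0])."""
    return tuple(x - s for x, s in zip(xyz, shift))


def to_original_frame(xyz, shift):
    """Move coordinates built in the shifted map back onto the original map."""
    return tuple(x + s for x, s in zip(xyz, shift))


# Hypothetical cryo-EM map: 100 A cube, 100 grid points per axis, origin centered
shift = origin_shift_cart((-50, -50, -50), (100.0, 100.0, 100.0), (100, 100, 100))
atom_shifted = to_shifted_frame((0.0, 0.0, 0.0), shift)
atom_original = to_original_frame(atom_shifted, shift)
print(shift, atom_shifted, atom_original)
```

In this toy case an atom at the center of the original map ends up in the middle of the shifted box and returns to its starting position when the shift is reversed.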
Finding the region containing the molecule
If requested (density_select=True), the region of the map containing density
is cut out of the entire map. This is particularly useful if the original map
is very large and the molecule only takes up a small part of the map. This
portion of the map is then shifted to place the origin at grid point [0,0,0].
(At the conclusion of model-building, the final model is shifted back to
superimpose on the original map.) The region containing density is chosen
as a box containing all the points above a threshold, typically 5% of the
maximum in the map.
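The box selection described above can be sketched in a few lines of Python. This is only an illustration on a nested-list grid. The function name density_box is hypothetical, and the real tool works on full map objects rather than lists.

```python
def density_box(grid, fraction_of_max=0.05):
    """Return (lower, upper) grid indices of the smallest box that encloses
    all points with density above fraction_of_max times the map maximum."""
    max_value = max(v for plane in grid for row in plane for v in row)
    threshold = fraction_of_max * max_value
    selected = [(i, j, k)
                for i, plane in enumerate(grid)
                for j, row in enumerate(plane)
                for k, v in enumerate(row)
                if v > threshold]
    if not selected:
        raise ValueError("no grid points above threshold")
    lower = tuple(min(p[axis] for p in selected) for axis in range(3))
    upper = tuple(max(p[axis] for p in selected) for axis in range(3))
    return lower, upper


# Toy 6x6x6 map: two points above 5% of the maximum, one weak point below it
grid = [[[0.0] * 6 for _ in range(6)] for _ in range(6)]
grid[2][3][1] = 10.0
grid[4][3][2] = 1.0
grid[0][0][0] = 0.2   # below the threshold of 0.5, so it is ignored
lower, upper = density_box(grid)
print(lower, upper)
```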
Map sharpening/blurring
If requested (auto_sharpen=True), the resolution dependence of the map will
be adjusted to maximize the clarity of the map. It is generally
preferable, however, to do this in advance using the phenix.auto_sharpen tool.
If you run auto_sharpen, you can choose to use
map kurtosis or the adjusted surface area of the map (default) for this
purpose.
Kurtosis is a standard statistical measure that reflects the peakiness of
the map.
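To make the idea of peakiness concrete, here is a small sketch that computes the standard (non-excess) kurtosis of a list of density values. Whether the program uses exactly this definition is an assumption; the point is only that a map with a few sharp peaks scores higher than a flat one.

```python
def kurtosis(values):
    """Fourth central moment divided by the square of the variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


flat_map = [0.0, 1.0] * 50          # two equally populated density levels
peaky_map = [0.0] * 99 + [10.0]     # mostly flat with a single sharp peak
print(kurtosis(flat_map), kurtosis(peaky_map))
```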
The adjusted surface area is a combination of the surface area of
contours in the map at a particular threshold
and of the number of distinct regions enclosed by the top 30% (default) of
those contours. The threshold is chosen by default to be one where the
volume enclosed by the contours is 20% of the non-solvent volume in the map.
The weighting between the surface area (to be maximized) and number of regions
enclosed (to be minimized) is chosen empirically (default region_weight=20).
Several resolution-dependent functions are tested, and the one
that gives the best adjusted surface area (or kurtosis) is chosen.
In each case the map is transformed to obtain Fourier coefficients. The
amplitudes of these coefficients are then adjusted, keeping the phases
constant. The available functions for modifying the amplitudes are:
No sharpening (map is left as is)
Sharpening b-factor applied over entire resolution range (b_sharpen
applied to achieve an effective isotropic overall b-value of b_iso).
Sharpening b-factor applied up to resolution specified with the
resolution=xxx keyword, then blurring applied beyond this resolution (with
transition specified by the keyword k_sharpen, b_iso_to_d_cut). If
sharpening value is less than zero (map is to be blurred),
the blurring is applied over the entire resolution range.
Resolution-dependent sharpening factor with three parameters.
First the resolution-dependence of the map is removed by normalizing the
amplitudes. Then a scale factor S is applied to the data, where
log10(S) is determined by coefficients b[0],b[1],b[2] and a resolution
d_cut (typically d_cut is the nominal resolution of the map).
The value of log10(S) varies smoothly from 0 at resolution=infinity, to b[0]
at d_cut/2, to b[1] at d_cut, and to b[1]+b[2] at the highest resolution
in the map. The value of b[1] is limited to being no larger than b[0] and the
value of b[1]+b[2] is limited to be no larger than b[1].
Sharpening using a half-dataset correlation.
The resolution-dependent correlation of density in two half-maps
is used to identify the optimal resolution-dependent weighting of
the map. This approach requires a target resolution which is used
to set the overall fall-off with resolution for an ideal map. That
fall-off for an ideal map is then multiplied by an estimated
resolution-dependent correlation of density in the map with the true
map (the estimation comes from the half-map correlations).
Model-based sharpening.
You can identify the sharpening parameters using your map and a
model. This approach requires a guess of the RMSD between the model and
the true model. The resolution-dependent correlation of model and map
density is used as in the half-map approach above to identify the
weighting of Fourier coefficients.
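The three-parameter resolution-dependent function in the list above can be sketched as a piecewise-linear curve. This sketch makes two assumptions that are not confirmed by the text: the interpolation is linear in inverse resolution s = 1/d, and "d_cut/2" is read as half of the inverse resolution of d_cut (that is, s = 0.5/d_cut). The function name log10_scale is hypothetical.

```python
def log10_scale(d, d_cut, d_min, b):
    """log10(S) at resolution d (Angstroms) for coefficients b = (b0, b1, b2).

    Assumes d_min < d_cut and linear interpolation in s = 1/d between
    s = 0, 0.5/d_cut, 1/d_cut and 1/d_min (an assumption, see text).
    """
    b0 = b[0]
    b1 = min(b[1], b0)             # b[1] may be no larger than b[0]
    b_high = min(b1 + b[2], b1)    # b[1]+b[2] may be no larger than b[1]
    knots = [(0.0, 0.0), (0.5 / d_cut, b0), (1.0 / d_cut, b1),
             (1.0 / d_min, b_high)]
    s = min(1.0 / d, knots[-1][0])  # clamp beyond the map resolution limit
    for (s_lo, v_lo), (s_hi, v_hi) in zip(knots, knots[1:]):
        if s <= s_hi:
            t = (s - s_lo) / (s_hi - s_lo)
            return v_lo + t * (v_hi - v_lo)
    return b_high


# Hypothetical values: d_cut = 3 A, map extends to 2 A
params = (0.2, 0.1, -0.3)
for d in (float("inf"), 6.0, 3.0, 2.0):
    print(d, log10_scale(d, 3.0, 2.0, params))
```

The amplitude at resolution d would then be multiplied by 10 raised to this value.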
Local sharpening
Any of the sharpening methods can be applied locally instead of globally
over the entire map. For local sharpening, the map is cut into overlapping
boxes and sharpening parameters are determined for each box and applied.
For parts of boxes that overlap, the distances from a grid point to the
centers of the corresponding boxes are used to weight the values for those
boxes.
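One plausible way to blend values from overlapping boxes is sketched below. The text only says that distances to the box centers are used for weighting; the inverse-distance weights here are an assumption chosen for illustration and are not necessarily what the program does. The function name blend_box_values is hypothetical.

```python
import math


def blend_box_values(point, box_centers, box_values, eps=1e-6):
    """Blend per-box values at a grid point, weighting each box by the
    inverse of the distance from the point to that box's center."""
    weights = [1.0 / (math.dist(point, c) + eps) for c in box_centers]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, box_values)) / total


centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
values = [2.0, 4.0]   # hypothetical sharpened density from each box
midway = blend_box_values((5.0, 0.0, 0.0), centers, values)
near_first = blend_box_values((1.0, 0.0, 0.0), centers, values)
print(midway, near_first)
```

A point midway between two box centers gets the average of the two values, and a point closer to one center is dominated by that box.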
Finding the asymmetric unit of the map
Normally you should supply map_to_model with the unique part (asymmetric unit)
of the map. You can get this using the phenix.map_box tool with the
extract_unique=True option. However if you want map_to_model to do this
automatically, you can supply symmetry matrices describing the
symmetry used to average the map (if any),
then map_to_model will try to define a region of the map that represents
the asymmetric unit of the map. Application of the symmetry operators to the
asymmetric unit will generate the entire map, and application to a model built
into the asymmetric unit will generate the entire model.
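Generating the entire model from the asymmetric unit amounts to applying each operator to every coordinate. A minimal sketch, assuming each operator is a (rotation matrix, translation) pair acting on Cartesian coordinates, is shown here. The function names and the data layout are hypothetical.

```python
def apply_operator(rotation, translation, xyz):
    """Return rotation * xyz + translation for one Cartesian coordinate."""
    return tuple(
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3))


def expand_model(au_coords, operators):
    """Return one copy of the asymmetric-unit coordinates per operator."""
    return [[apply_operator(r, t, xyz) for xyz in au_coords]
            for r, t in operators]


identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
two_fold_z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))
operators = [(identity, (0, 0, 0)), (two_fold_z, (0, 0, 0))]
copies = expand_model([(1.0, 2.0, 3.0)], operators)
print(copies)
```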
You can also supply the type of symmetry (e.g., C3, D7, etc)
and map_to_model will try to find that symmetry in the map. You can even
search for all plausible symmetry (ANY). For helical symmetry either a
symmetry file or the rotation and translation information is required, however.
Normally
identification of the asymmetric unit and segmentation of the map (below)
are done as a single step, yielding an asymmetric unit and a set of
contiguous regions of density within that asymmetric unit. The asymmetric unit
will be written out as a map to the segmentation_dir directory,
superimposed on the shifted map (so that they can be viewed together in Coot).
Segmentation of the map
By default (segment=True) the map or asymmetric unit of the map will
be segmented (cut into small pieces) into regions of connected density. This
is done by choosing a threshold of density and identifying contiguous regions
where all grid points are above this threshold. The threshold is chosen to
yield regions that have a size corresponding to about 50 residues. The
regions of density are written out to the segmentation_dir directory
and are superimposed on the shifted map (if you load the shifted map and
a region map in Coot, they should superimpose).
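The thresholding step can be illustrated with a simple flood fill. This sketch uses a nested-list grid and face-sharing (6-connected) neighbors, both of which are assumptions. It does not attempt the search for a threshold that gives regions of about 50 residues. The function name connected_regions is hypothetical.

```python
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def connected_regions(grid, threshold):
    """Return a list of regions; each region is a list of (i, j, k) grid
    points above threshold that are connected through shared faces."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    seen = set()
    regions = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if grid[i][j][k] <= threshold or (i, j, k) in seen:
                    continue
                # Breadth-first flood fill starting from this unvisited point
                region, queue = [], deque([(i, j, k)])
                seen.add((i, j, k))
                while queue:
                    x, y, z = queue.popleft()
                    region.append((x, y, z))
                    for dx, dy, dz in NEIGHBORS:
                        p = (x + dx, y + dy, z + dz)
                        if (0 <= p[0] < nx and 0 <= p[1] < ny and 0 <= p[2] < nz
                                and p not in seen
                                and grid[p[0]][p[1]][p[2]] > threshold):
                            seen.add(p)
                            queue.append(p)
                regions.append(region)
    return regions


# Toy 5x5x5 map with two separate blobs of density
grid = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
for point in ((0, 0, 0), (0, 0, 1), (0, 1, 1), (3, 3, 3), (4, 3, 3)):
    grid[point[0]][point[1]][point[2]] = 1.0
regions = connected_regions(grid, threshold=0.5)
print([len(r) for r in regions])
```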
Model-building
Models are built in several ways by map_to_model and then the best-fitting,
non-overlapping models are chosen. If the quick method for map_to_model is
selected and the chain type is protein, then the default is to run just
the trace_and_build model-building method. If quick=False, all methods are
tried. The main methods used for model-building
are:
Chain tracing followed by model-building (trace_and_build). Continuous
density is identified in the map, choosing the longest paths if there
are choices. Then C-alpha positions are identified from the presence of
side chain density, an all-atom model is constructed and refined.
Standard RESOLVE model-building for PROTEIN/RNA/DNA for the entire
asymmetric unit (or the entire molecule if no symmetry was used).
Helices (RNA) or helices/strands (PROTEIN) for the entire asymmetric unit.
Chain tracing (RNA/PROTEIN/DNA) for each segmented region, with various
values of map sharpening applied.
RESOLVE model-building for each segmented region, with various values of
map sharpening applied.
Intermediate models are refined with phenix.real_space_refine and are
written out relative to the shifted map with origin at [0,0,0].
You can view these intermediate models, the shifted map, and the
shifted map containing just the asymmetric unit, and any
region maps in Coot and they should all superimpose.
Once all intermediate models are built, all models of each chain type are
combined, taking the best-fitting model for each part of the map. Then all
chain types are combined, once again taking the best-fitting model for each
part of the map. The models are refined again.
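Choosing the best-fitting model for each part of the map can be pictured as a greedy selection. The sketch below is only a guess at the general idea, not the procedure the program uses: each candidate carries a fit score and the set of map positions it occupies, and candidates are accepted in order of decreasing score if they do not overlap anything already accepted. All names and scores are hypothetical.

```python
def choose_non_overlapping(candidates):
    """candidates: list of (name, fit_score, occupied_points).

    Accept candidates from best to worst fit, skipping any that share a
    map position with a candidate that was already accepted.
    """
    chosen, occupied = [], set()
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    for name, fit_score, points in ranked:
        if occupied.isdisjoint(points):
            chosen.append(name)
            occupied.update(points)
    return chosen


candidates = [
    ("helix_A", 0.8, {1, 2, 3}),
    ("strand_B", 0.6, {3, 4}),     # overlaps helix_A at position 3
    ("helix_C", 0.5, {5, 6}),
]
print(choose_non_overlapping(candidates))
```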
Then (if present) symmetry is applied to the model and the full model is refined.
Finally the best model, with symmetry applied if present, is shifted to match
the original map and is written out.
Iterative map improvement with model-building and sharpening
A procedure is available
for carrying out multiple cycles of model-building
and map sharpening, using the model from each cycle in the sharpening process.
In practice this may help in borderline cases but it is not recommended for
normal use. The map is
sharpened to make it as similar as possible to density calculated from
the current model and the new map is used for the next cycle of model
building.
Examples
Standard run of map_to_model:
Running map_to_model is easy. From the command-line you can type:
phenix.map_to_model my_map.map seq.fa resolution=3
where my_map.map is a map in CCP4, MRC or a related format and seq.fa is a
sequence file. Normally you should sharpen your map with phenix.auto_sharpen
and cut out the unique part of the map with phenix.map_box (using the
extract_unique=True keyword) before supplying it to map_to_model.
Output files from map_to_model
- map_to_model.pdb: A PDB file with the resulting model, superimposed on
  the original map (or on the magnified map if magnification is applied).
Standard run of map_to_model, specifying symmetry type:
Running map_to_model with symmetry is also easy. From the command-line
you can type:
phenix.map_to_model my_map.map seq.fa symmetry=D7 resolution=3
Here map_to_model will look for D7 (7-fold symmetry along the c-axis and
2-fold symmetry along a or b) in the map and apply it if it is found. It will
also write out the matrices corresponding to this symmetry.
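For reference, the rotation matrices of a dihedral point group such as D7 can be generated as in the following sketch: seven rotations about z, plus each of those combined with a two-fold about x. The choice of x for the two-fold axis and the function names are assumptions for illustration. The matrices written by map_to_model may be ordered or oriented differently.

```python
import math


def rot_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3))


TWO_FOLD_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))


def dihedral_operators(n):
    """Return the 2n rotation matrices of the point group Dn."""
    rotations = [rot_z(360.0 * k / n) for k in range(n)]
    return rotations + [matmul(r, TWO_FOLD_X) for r in rotations]


ops = dihedral_operators(7)
print(len(ops))
```

For D7 this gives 14 operators, the first of which is the identity.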
Using carry_on to continue with a partially-finished run
If you have a completed or partially-completed run of map_to_model and
you want to run again but you do not want to re-run all the steps, you can
just carry on from where you left off by using the keyword carry_on=True.
Using carry_on to break up your run into small pieces
You can use carry_on=True to progressively build up your model
from pieces or to run many jobs in parallel.
First you run map_to_model with segment_and_split_only=True
to split up your
map and write a file (the info_pickle_file) that has all the
necessary information to put everything together.
Then you can create script files to run each small step in the
analysis. You can do that with the keyword split_up_job=True. This
will put your scripts in a commands/ subdirectory and you can
run them with your queueing system. When they are done,
you can use carry_on=True to put everything together.
You can also specify a source or other command to be placed in all
your scripts with the keyword source_command="xxx".
The scripts created by the split_up_job command use a set of keywords
like this (here only RNA will
be built, building will be done in just region number 26,
secondary structure searches are disabled, and overall model-building
is disabled):
do_only_one_thing=True
chain_type=RNA
build_in_regions=True
input_map_id_start=26
input_map_id_end=26
include_helices_strands_only=False
include_phase_and_build=False
Another run might look like this, where overall model-building
is done starting with model number 4:
do_only_one_thing=True
chain_type=PROTEIN
build_in_regions=False
include_helices_strands_only=False
include_phase_and_build=True
model_start=4
You can run all the combinations that you would like in parallel.
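To see how the keyword sets vary from one job to the next, the following sketch assembles the per-region keywords shown above for a few regions. It is purely illustrative: split_up_job writes the real scripts for you, the helper name region_job_keywords is hypothetical, and the number of regions here is made up.

```python
def region_job_keywords(chain_type, region_id):
    """Keywords for building one chain type in a single segmented region."""
    return [
        "do_only_one_thing=True",
        "chain_type=%s" % chain_type,
        "build_in_regions=True",
        "input_map_id_start=%d" % region_id,
        "input_map_id_end=%d" % region_id,
        "include_helices_strands_only=False",
        "include_phase_and_build=False",
    ]


# Hypothetical case: RNA built separately in regions 1 to 3
jobs = [region_job_keywords("RNA", region_id) for region_id in range(1, 4)]
for keywords in jobs:
    print(" ".join(keywords))
```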
Special treatment of structures with very high symmetry
If your structure has very high symmetry (by default, 20 or more symmetry
operators), then segment_and_split_map will try to cut out a piece
of your map and work with that instead of working with the whole map.
You can control this with the keyword select_au_box=True (or None, which
will give the default behavior). If you use select_au_box then a box
that contains about n_au_box asymmetric units of the map (default of 5)
will be cut out of your map. That map will be worked with along with the
symmetry operators that apply to it for the remainder of the analysis.
At the end of the analysis, just the region cut out will be built and
placed in the original reference frame. This process can greatly speed
up building and save on memory.
Building just the asymmetric unit of your model to save memory
Another way to save on memory is to
run just the segment_and_split step of map_to_model on a computer with a
lot of memory, then use the small map created by segment_and_split
for subsequent stages, and then finally take the resulting model for this
small map, put it into map_to_model on the computer with a lot of
memory and use it to create a final model. Here are the steps for doing this:
In a clean directory,
run phenix.map_to_model with the keyword
stop_after_segment_and_split_map=True on your map, including symmetry
information. Run this on a computer with a huge amount of memory.
Your segmentation information will be in segmented_maps/
Take the file segmented_maps/box_map_au.ccp4, which contains just the
asymmetric unit of density from your original map, and in a new
directory (perhaps box_map_directory/)
run phenix.map_to_model with box_map_au.ccp4 and the unique
part of your sequence file. This can be run (with luck) on a computer
with a more moderate amount of memory. The final model will be
box_map_directory/map_to_model/map_to_model.pdb;
it should match the box_map_au.ccp4 map.
Go back to the original directory where you split up your original map.
Run phenix.map_to_model with carry_on=True and specifying three keywords:
starting_model=box_map_directory/map_to_model/map_to_model.pdb
starting_model_is_from_box_map_au=True
build_new_model=False
Now map_to_model will start with your model from the small map, shift it
to match the original map, apply any symmetry you used originally,
refine the model, and write out a model map_to_model/map_to_model.pdb
Possible Problems
If you have a very large structure it is possible that your computer may not
have enough memory to run map_to_model and that one or more sub-processes
might crash.
The easiest solution is to run phenix.map_box with the extract_unique=True
option beforehand so that you are running map_to_model on just the
unique part of the structure.
You can also just cut out boxes of density from your structure with the
phenix.map_box tool and work on each one individually.
You can also set the
keyword coarse_grid=True to use a coarse grid in RESOLVE and save memory.
You can also cut back the resolution which will save memory.
Otherwise, you can try on a computer with even more memory.
If your queueing system crashes during a run or one or more sub-processes
crashes, then you might end up with models built for some stages of
building and others not. You can continue on with the keyword
carry_on=True in this case.
(See the section above on using carry_on to continue with a
partially-finished run.)
Specific limitations and problems:
Literature
- A fully automatic method yielding initial models from high-resolution electron cryo-microscopy maps. T.C. Terwilliger, P.D. Adams, P.V. Afonine, and O.V. Sobolev. bioRxiv (2018).
Additional information
List of all available keywords
- job_title = None Job title in PHENIX GUI, not used on command line
- input_files
- map_coeffs_file = None File with map coefficients
- map_coeffs_labels = None Optional label specifying which columns of map coefficients to use
- map_file = None File with CCP4-style map. This is the input map and its position defines the starting reference frame. The map may be shifted in later steps but the final model will be in the same reference frame as this input map.
- seq_file = None Sequence file
- ncs_file = None Input symmetry file for use with segment_and_split_map. Refers to the symmetry in the reference frame of the input map.
- model_file = None Optional input PDB files to be used as starting point for further model building. This model or models should be in the same reference frame as the input map unless starting_model_is_from_box_map_au is set.
- starting_model_is_from_box_map_au = False The starting model was built into box_map_au.ccp4, the asymmetric unit map created by segment_and_split_map. This model is offset from the original cell. This keyword will shift the starting model to match the original map before using it.
- sharpen_with_starting_model = True If starting_model is supplied, use it in identifying optimal resolution-dependent map sharpening. Also defines whether model-based sharpening will be used on cycles after the first.
- include_starting_model = True If starting_model is supplied, use it in model-building.
- target_ncs_au_file = None Optional PDB file to partially define the ncs asymmetric unit of the map. The coordinates in this file will be used to mark part of the ncs au and all points nearby that are not part of another ncs au will be added. The model should be positioned relative to the input map. If starting_model is supplied and include_starting_model=True then it will be used for this purpose as well.
- partial_model = None Optional input PDB file with protein/RNA/DNA chains for one step in model-building. If supplied, no model-building for this step will be carried out. Must match offset map (normally not the same as the original map.) This model will replace the model normally built at the step specified by partial_model_type and should match its position. Multiple partial models may be specified. You need to specify what type each one is with the keyword partial_model_type (the order of partial_model entries must match the order of partial_model_type entries). NOTE: This is intended as a way to use the models from a previous run, not to input your own model. For your own models, use starting_model instead.
- partial_model_type = init_PROTEIN init_RNA init_DNA helices_strands_only_PROTEIN build_rna_helices_RNA standard_PROTEIN standard_RNA standard_DNA *final_PROTEIN final_RNA final_DNA Model type for a partial model. The model will be used instead of generating this kind of model. One required for each partial_model. The partial_model_type keyword specifies both the model-building step (init, helices_strands_only, build_rna_helices, standard, final) and the chain_type (RNA, DNA, PROTEIN).
- trace_chain_pdb_in = None Input PDB file to use instead of tracing chains. Suitable only for the case where one region is being analyzed. The PDB file should match the original map (not the offset region).
- info_pickle_file = None Pickle file from segment_and_split_map. You can read in this file and choose which map file to use. Default is in segmentation_directory with file name of segment_and_split_map_info.pkl.
- input_map_id_start = None first ID (number 1 to N) of the map to analyze from info_pickle file
- input_map_id_end = None last ID (number 1 to N) of the map to analyze from info_pickle file
- seed_ca_in = None Input PDB file with possible CA positions to assist in tracing chains
- seed_ca_only = False Only use sites from seed_ca_in (do not find additional sites)
- model_pickle_file = None Pickle file with working models. You can read in this file and start with the models in it. Only suitable if just one region of the map is to be analyzed (same as for trace_chain_pdb_in).
- pdb_to_shift = None You can supply a model and then set the keyword shift_type to specify which map this model is to match. You can reverse the shift with reverse_shift=True. Requires the info_pickle_file file. If specified this is the only thing that happens (stops after writing out the file to shifted_pdb).
- reverse_shift = False Specifies that the shift is from shifted_map or box_map_au reference frame back TO the original map reference frame (default is FROM the original map reference frame to the shifted_map or box_map_au map.)
- output_files
- params_out = map_to_model_params.eff Output parameters file
- pdb_out = map_to_model.pdb Output PDB file. It will be placed in both the output_directory and the final_output_directory.
- pdb_out_unique = map_to_model_unique.pdb Output PDB file (ncs AU). Just asymmetric unit of model
- sharpened_map_file = sharpened_map.ccp4 Output sharpened map file. This will be the map that is used in model-building. It will be superimposed on the original map.
- pdb_out_reverse = map_to_model_reverse.pdb Optional output reversed PDB file (traced opposite direction as pdb_out). Created if include_reverse is set.
- protein_mask_file = None Output ccp4-style map with mask for region of macromolecule
- region_model_suffix = None Suffix (before .pdb) for region models
- shifted_pdb = shifted_pdb.pdb Name of output file when using pdb_to_shift.
- output_directory = None Place where most files will go. Default is map_to_model_xxx
- final_output_directory = None The map_to_model.pdb file and sharpened map files will go here in addition to the output_directory.
- segmentation_directory = None Place where segmented maps will go if segment=True. Default is segmented_maps_xxx
- crystal_info
- chain_types = PROTEIN DNA RNA Chain type (PROTEIN/DNA/RNA). You can choose one or more. If not specified, it will be guessed from your sequence file. Multiple chain types are fine.
- minimum_fraction = 0.001 If chain types are automatically determined, ignore any with fraction less than this.
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- is_crystal = None Defines whether this is a crystal (or cryo-EM). Default is True if map_coefficients are supplied and False if a map is supplied. If True and no symmetry operators are supplied, segmentation yields the asymmetric unit of the crystal. Additionally the final model will represent the asymmetric unit of the crystal (not the entire map). Normally set is_crystal and use_sg_symmetry together.
- use_sg_symmetry = None If you set use_sg_symmetry=True then the symmetry of the space group will be used. For example in P1 a point at one end of the unit cell is next to a point on the other end. Normally for cryo-EM data this should be set to False. Default is True if map_coefficients are supplied and False if a map is supplied.
- resolution = None High-resolution limit. Data will be truncated at this resolution. If a map is supplied, it will be Fourier filtered at this resolution (multiplied by d_min_ratio). Required if input is a map and only_original_map is not set.
- solvent_content = None Solvent fraction of the cell. If this is density cut out from a bigger cell, you can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- truncate_at_d_min = None Truncate data at resolution specified. Default is True if map_coeffs are supplied, False if map is supplied
- map_inside_box = True Place centers of all chains inside (0,1). This is required during normal operation.
- space_group = None Space group (normally read from the data file)
- unit_cell = None Unit Cell (normally read from the data file)
- chain_type = PROTEIN DNA RNA Do not set. Use chain_types instead.
- sequence = None Sequence as text string. Normally supply a sequence file instead
- origin_frac = None Saves origin shift
- solvent_fraction = None Same as solvent_content. Only used on input and copied to solvent content.
- reconstruction_symmetry
- symmetry = None Symmetry used in reconstruction. For example D7, C3, C2, I (icosahedral), T (tetrahedral), or ANY (try everything and use the highest symmetry found). Ignored if ncs_file is supplied or if asymmetric_map is specified or if a map obtained with extract_unique is supplied.
- find_symmetry = None Find symmetry in the map (alternative to supplying a symmetry file or specifying symmetry or supplying a map of unique part or specifying asymmetric_map=True).
- unique_part_only = True If symmetry is used, only the unique part of the map will be analyzed. (Do not refine full model and do not write out full model. Otherwise the same as usual).
- asymmetric_map = None Specifies that this is an asymmetric map and no symmetry is to be supplied or found. Alternative to supplying a symmetry file or symmetry.
- ncs_center = None Center (in A) for symmetry operators (if symmetry is found automatically). If set to None, first guess is the center of the cell and then if that fails, found automatically as the center of the density in the map.
- optimize_center = None Optimize position of symmetry center. Default is False if ncs_center is supplied or center of map is used and True if it is found automatically.
- helical_rot_deg = None helical rotation about z in degrees
- helical_trans_z_angstrom = None helical translation along z in Angstrom units
- two_fold_along_x = None Specifies if D or I two-fold is along x (True) or y (False). If None, both are tried.
- random_points = None Number of random points in map to examine in finding symmetry
- n_rescore = None Number of symmetry operators to rescore
- op_max = None If symmetry is ANY, try up to op_max-fold symmetries
- tol_r = 0.02 tolerance in rotations for point group or helical symmetry
- abs_tol_t = 2 tolerance in translations (A) for point group or helical symmetry
- rel_tol_t = 0.05 tolerance in translations (fractional) for point group or helical symmetry
- max_helical_operators = None Maximum helical operators (if extending existing helical operators)
- require_helical_or_point_group_symmetry = None Normally helical or point-group symmetry (or none) is expected. However in some cases (helical + rotational symmetry for example) this is not needed and is not the case. If set to True then the program will stop just before symmetry is applied if symmetry is present but helical or point-group symmetry is not present.
- map_modification
- magnification = None Magnification to apply to input map. Input map grid will be scaled by magnification factor before anything else is done.
- b_iso = None Target B-value for map (sharpening will be applied to yield this value of b_iso)
- b_sharpen = None Sharpen with this b-value. Contrast with b_iso, which yields a targeted value of b_iso.
- b_blur_hires = 200 Blur high_resolution data (higher than d_cut) with this b-value. Contrast with b_sharpen applied to data up to d_cut. Note on defaults: If None and b_sharpen is positive (sharpening) then high-resolution data is left as is (not sharpened). If None and b_sharpen is negative (blurring) high-resolution data is also blurred.
- resolution_dependent_b_sharpen = None If set, apply resolution_dependent_b_sharpen (b0 b1 b2). Log10(amplitudes) will start at 1, change to b0 at half of resolution specified, changing linearly, change to b1 at resolution specified, and change to b2 at high-resolution limit of map
- normalize_amplitudes_in_resdep = False Normalize amplitudes in resolution-dependent sharpening
- d_min_ratio = 0.833 Sharpening will be applied using d_min equal to d_min_ratio times resolution. If None, a box of reflections with the same grid as the map is used.
- input_d_cut = None High-resolution limit for sharpening
- rmsd = None RMSD of model to true model (if supplied). Used to estimate expected fall-off with resolution of correct part of model-based map. If None, assumed to be resolution times rmsd_resolution_factor.
- rmsd_resolution_factor = 0.25 default RMSD is resolution times resolution factor
- auto_sharpen = False Apply auto-sharpening in segment_and_split_map (requires segment=True).
- auto_sharpen_regions = False Automatically determine sharpening for each region (in addition to the overall map).
- auto_sharpen_methods = no_sharpening b_iso *b_iso_to_d_cut resolution_dependent model_sharpening half_map_sharpening target_b_iso_to_d_cut None Methods to use in sharpening. b_iso searches for b_iso to maximize sharpening target (kurtosis or adjusted_sa). b_iso_to_d_cut applies b_iso only up to resolution specified, with fall-over of k_sharpen. resolution_dependent adjusts 3 parameters to sharpen variably over resolution range. Default is b_iso_to_d_cut. target_b_iso_to_d_cut uses target_b_iso_ratio to set b_iso.
- local_sharpening = None Sharpen locally using overlapping regions. NOTE: Best to turn off local_aniso_in_local_sharpening if symmetry is present. If local_aniso_in_local_sharpening is True and symmetry is present this can distort the map for some symmetry copies because an anisotropy correction is applied based on local density in one copy and is transferred without rotation to other copies.
- local_aniso_in_local_sharpening = None Use local anisotropy in local sharpening. Default is True unless symmetry is present.
- overall_before_local = True Apply overall scaling before local scaling
- box_in_auto_sharpen = False Use a representative box of density for initial auto-sharpening instead of the entire map.
- density_select_in_auto_sharpen = False Choose representative box of density for initial auto-sharpening with density_select method (choose region where there is high density). Normally set to False (instead in map_to_model use density_select=True which cuts out the map before auto-sharpening)
- allow_box_if_b_iso_set = False Allow box_in_auto_sharpen (if set to True) even if b_iso is set. Default is to set box_in_auto_sharpen=False if b_iso is set.
- soft_mask = None Use soft mask (smooth change from inside to outside with radius based on resolution of map). Default False unless thorough is True
- max_box_fraction = None If box is greater than this fraction of entire map, use entire map.
- density_select_max_box_fraction = None If box is greater than this fraction of entire map, use entire map for density_select. Default is 0.95
- use_weak_density = False When choosing box of representative density, use poor density (to get optimized map for weaker density)
- k_sharpen = None Steepness of transition between sharpening (up to resolution) and not sharpening (d < resolution). Note: for blurring, all data are blurred (regardless of resolution), while for sharpening, only data with d about resolution or lower are sharpened. This prevents making very high-resolution data too strong. Note 2: if k_sharpen is zero or None, then no transition is applied and all data is sharpened or blurred. Note 3: only used if b_iso is set.
- iterate = False You can iterate auto-sharpening. This is useful in cases where you do not specify the solvent content and it is not accurately estimated until sharpening is optimized.
- optimize_b_blur_hires = False Optimize value of b_blur_hires. Only applies for auto_sharpen_methods b_iso_to_d_cut and b_iso. This is normally carried out and helps prevent over-blurring at high resolution if the same map is sharpened more than once.
- optimize_d_cut = None Optimize value of d_cut. Only applies for auto_sharpen_methods b_iso_to_d_cut and b_iso
- region_weight_method = initial_ratio *delta_ratio b_iso Method for choosing region_weights. Initial_ratio uses ratio of surface area to regions at low B value. Delta ratio uses change in this ratio from low to high B. B_iso uses resolution-dependent b_iso (not weights) with the formula b_iso=5.9*d_min**2
- search_b_min = None Low bound for b_iso search.
- search_b_max = None High bound for b_iso search.
- search_b_n = None Number of b_iso values to search.
- residual_target = None Target for maximization steps in sharpening. Can be kurtosis or adjusted_sa (adjusted surface area)
- sharpening_target = None Overall target for sharpening. Can be kurtosis or adjusted_sa (adjusted surface area). Used to decide which sharpening approach is used. Note that during optimization, residual_target is used (they can be the same.)
- target_b_iso_ratio = 5.9 Target b_iso ratio : b_iso is estimated as target_b_iso_ratio * resolution**2
- region_weight = 40 Region weighting in adjusted surface area calculation. Score is surface area minus region_weight times number of regions. Default is 40.
- discard_if_worse = None Discard sharpening if worse
- sa_percent = 30. Percent of target regions used in calculation of adjusted surface area. Default is 30.
- fraction_occupied = 0.20 Fraction of molecular volume targeted to be inside contours. Used to set contour level. Default is 0.20
- n_bins = 20 Number of resolution bins for sharpening. Default is 20.
- max_regions_to_test = 30 Number of regions to test for surface area in adjusted_sa scoring of sharpening
- eps = None
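Several of the map_modification defaults above are simple functions of the map resolution. The Python sketch below restates those relationships exactly as the parameter descriptions give them. The function names are hypothetical and are not part of Phenix; the program itself may apply additional logic that is not described here.

```python
# Illustrative helpers only. These names are hypothetical and are not Phenix API;
# each function restates a relationship given in the parameter descriptions.

def estimated_b_iso(resolution, target_b_iso_ratio=5.9):
    """b_iso estimate: target_b_iso_ratio * resolution**2."""
    return target_b_iso_ratio * resolution ** 2


def sharpening_d_min(resolution, d_min_ratio=0.833):
    """d_min used for sharpening: d_min_ratio * resolution."""
    return d_min_ratio * resolution


def default_rmsd(resolution, rmsd_resolution_factor=0.25):
    """RMSD assumed when rmsd is None: resolution * rmsd_resolution_factor."""
    return resolution * rmsd_resolution_factor


def adjusted_surface_area(surface_area, n_regions, region_weight=40):
    """adjusted_sa score: surface area minus region_weight * number of regions."""
    return surface_area - region_weight * n_regions


if __name__ == "__main__":
    resolution = 3.0  # Angstroms, example value only
    print("b_iso estimate:", estimated_b_iso(resolution))
    print("d_min for sharpening:", sharpening_d_min(resolution))
    print("default rmsd:", default_rmsd(resolution))
    print("adjusted_sa:", adjusted_surface_area(1000.0, 10))
```

The units of surface_area in the last function are whatever the program uses internally; the example numbers are arbitrary and serve only to show how region_weight penalizes a fragmented map.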
- strategy
- keep_longest_model = None Keep longest model (most residues)
- build_new_model = True If set to False, no new model building is done...just read in any existing input models and put them together and refine.
- cycles = 1 You can run cycles of model-building interspersed with model-based map sharpening.
- cycle_auto_sharpen = True Apply auto-sharpening on all cycles after the first, even if auto_sharpen is False. This applies model-based sharpening.
- density_select = None Run map_box with density_select=True to cut out the region in the input map that contains density. Useful if the input map is much larger than the structure. Default is True if an unmodified map is supplied and False if is_crystal is set or map_coeffs are supplied or the map has been extracted with extract_unique.
- density_select_threshold = None Choose region where density is this fraction of maximum or greater
- soft_mask_in_density_select = False Use soft mask (smooth change from inside to outside with radius based on resolution of map) when cutting out map with map_box
- get_half_height_width = None Use 4 times half-width at half-height as estimate of max size
- box_ncs_au = None Box the map containing just the au of the map. Default is True if an unmodified map is supplied and False if is_crystal is set or map_coeffs are supplied or the map has been extracted with extract_unique.
- segment = True Run segment_and_split_map to break up map into pieces for model-building (recommended for larger structures). Required if map origin is not at (0,0,0). Not used in trace_and_build. You can also run segment_and_split_map separately and then run model-building (there are more parameters available for segment_and_split_map in that case).
- build_in_regions = None Run building on individual segmented regions
- include_trace_and_build = True If segment is True, run trace_and_build on the entire asymmetric unit of the map.
- include_phase_and_build = None If segment is True, run phase_and_build on the entire asymmetric unit of the map.
- include_helices_strands_only = None If True, run build_one_model on the entire asymmetric unit of the map with helices_strands (protein) or with build_rna_helices (RNA).
- min_segment_length = None Minimum residues in a segment to keep in trace_and_build (after joining and insertion). NOTE: default of 7 is set to build as much model as possible. Default in trace_and_build is 15, suitable for optimal results with sequence_from_map. Default if quick is 15, otherwise 15.
- split_and_join = None Split fragments at low density and then rejoin. Criterion is sd_ratio_pare_model. Default False if quick, True otherwise.
- trace_outside_model = None Trace outside model. After initial tracing, mask out region found and look for additional segments. Default False if quick, True otherwise.
- retry_long_branches = None If a long branch is found, try to retrace chain
- correct_segments = True Correct segments. Try to fix errors using fixed model as a template.
- vary_sharpening = None Variable sharpening values. Zero is always added. For example 75 -75 -125 will sharpen by B=75 then blur by b=75 and b=125. May increase residues built but decrease accuracy. Default in trace_and_build and sequence_from_map is None. Default if quick is None. Default otherwise is 75 -75 -125.
- vary_high_density_from_model = None Try high_density_from_model True and False. Default is True unless quick=True
- high_density_from_model = None Create high density in map along path of main chain atoms in fixed model in trace_and_build. Also sets keep_main_chain_path. NOTE: Default is try True/False unless quick=True. True may build as much model as possible. Default in trace_and_build is False, suitable for optimal results with sequence_from_map.
- fix_insertions_deletions = None Retrace chains to fix insertions and deletions in trace_and_build. Default is False unless quick=False.
- max_nproc_for_build_one_model = 30 Use only up to this number of processors for a single run of phase_and_build. Typically there is little time advantage to using more than about 30 processors and if you use too many the time could increase and quality decrease due to splitting up the model-building too much.
- nmodels_for_phase_and_build = None Number of models to build in phase_and_build. Default 2 unless thorough is True (then 20)
- score_with_cc_in_phase_and_build = True Use map-model CC to score in phase_and_build
- include_reverse = False Create reversed model where all chains are traced in opposite direction to pdb_out
- reverse_only = False Only create reversed model where all chains are traced in opposite direction to pdb_out (requires input model).
- quick_and_medium = None Run quick and medium methods
- thoroughness = quick *medium thorough extra_thorough If quick, then run more quickly. Suitable for easy cases with protein only. If medium, standard run. If thorough, run quick and medium and take best result. If extra_thorough, then run with more cycles and more attempts to build. Suitable for challenging cases.
- quick_debug = False Run very quickly just to test routines
- pdb_in_only = None Only use pdb_in (do not build anything new)
- assign_sequence = None Run assign_sequence to match model to sequence. Default is True if sequence_from_map_before_fit_loops is False (use one or the other).
- rerun_with_avail_seq = True Rerun sequence alignment with parts of the sequence already used removed. With a large structure this can take a long time.
- rearrange_before_assign_sequence = None Run replace_side_chains with reassign_sequence before assign_sequence. Default False unless thorough.
- sequence_from_map_before_fit_loops = True Run sequence_from_map before fit_loops
- create_gap_length = None Create gaps of this length if segments are nearly overlapping. A gap of 2 is suitable. This is to help fit_loops. Not default as it does not seem to help.
- fit_loops_before_assign_sequence = True Run fit_loops before assign_sequence
- extend_with_resolve = None Run resolve model extension. Default False unless thorough
- min_percent_assigned_for_assign_sequence = 25 Skip assign_sequence if percentage of sequence assigned is lower than min_percent_assigned_for_assign_sequence
- segmentation
- value_outside_mask = None Value to assign to density outside masks in segment_and_split_map
- min_relative_helical_cc_to_keep = 0.90 For helical symmetry, keep copies within this range of max at end
- add_neighbors = True Add neighboring regions around the NCS au. Turns off exclude_points_in_ncs_copies also.
- select_au_box = None Select box containing at least one representative region of the map. Also select just symmetry operators relevant to that box. Default is true if number of operators is at least n_ops_to_use_au_box
- n_ops_to_use_au_box = None If number of operators is this big or more and select_au_box is None, set it to True.
- n_au_box = None Number of symmetry copies to try and get inside au_box
- lower_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
- upper_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
- trace_and_build
- find_chains = True Try to create new chains in build_all_loops
- extend_chains = True Try to extend chains in build_all_loops
- connect_chains = True Try to connect chains in build_all_loops
- find_helices_strands = True Find helices/strands before tracing chain if no starting model is supplied
- n_short_min = 50 N short min. Length of group of CA to optimize together (min)
- allow_reverse = True If two chains are opposite direction but connected, reverse the one with lower score (usually shorter) and connect them in trace_and_build. Also consider both directions of fragments from model_file.
- length_variants_to_try = None You can try several lengths of fragments (number of CA). Not compatible with allow_reverse=False. Typical values are 1 or 5
- final_connect_chains = None Connect chains at very end after merging fragments. Uses full map. Default False if quick, True otherwise
- region_sharpening
- include_original_map = False You can include the original map without Fourier filtering in local maps with this keyword. To use only the original map use only_original_map=True
- only_original_map = False You can include just the original map without Fourier filtering in local maps. Applies if the starting point is a map (not map coefficients).
- region_b_sharpen = None B-factor for sharpening of local maps. Normally set to None and several values are tested. (Positive is sharpen, negative is blur) Note: if resolution is specified map will be Fourier filtered to that resolution even if it is not sharpened.
- region_b_sharpen_low = -105 Lowest value of region_b_sharpen to use. Applies if region_b_sharpen_delta is set. Negative is blur, positive is sharpen.
- region_b_sharpen_high = 0 Ending value of region_b_sharpen. Applies if region_b_sharpen_delta is set. Negative is blur, positive is sharpen.
- region_b_sharpen_delta = 15 Incremental value of region_b_sharpen. If set, region_b_sharpen will be applied with values from region_b_sharpen_low to region_b_sharpen_high in increments of region_b_sharpen_delta. Ignored if region_b_sharpen is set.
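The scan defined by region_b_sharpen_low, region_b_sharpen_high and region_b_sharpen_delta can be written out explicitly. The sketch below lists the B values that would be tried with the defaults, assuming both end points are included (the text says "from ... to ... in increments of" but does not state inclusivity). The function name is hypothetical and is not part of Phenix.

```python
# Hypothetical helper (not Phenix API): enumerate region_b_sharpen values.

def region_b_sharpen_values(low=-105, high=0, delta=15):
    """Return B values from low to high in steps of delta.

    Assumes both end points are included. Negative values blur,
    positive values sharpen.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    values = []
    b = low
    while b <= high:
        values.append(b)
        b += delta
    return values


if __name__ == "__main__":
    # With the defaults this lists eight values, all of them blurring or zero.
    print(region_b_sharpen_values())
```

Setting region_b_sharpen to a single value bypasses this scan, since region_b_sharpen_delta is ignored when region_b_sharpen is set.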
- resolve_model_building
- run_resolve_model_building = None Run resolve model-building algorithm on maps. If None, set to True for RNA/DNA and False for protein.
- thorough_resolve_model_building = True Use thorough resolve model-building
- run_helices_strands_only = None Run resolve helices-strands-only algorithms on maps. If None set to True for RNA and False for protein. This applies to segmented maps only if they exist. Compare with include_helices_strands_only which applies to the entire asymmetric unit
- trace_chain
- run_trace_chain = True Run trace-chain model-building algorithm on maps
- rho_cut_min = 2.0 Minimum density (rho/sigma) at coordinates of potential CA atoms in trace_chain, after normalization for solvent fraction. For constant actual local rms in a map, the sigma (overall rms) of the map is proportional to the sqrt(1-solvent_fraction). Therefore rho_cut_min is adjusted by sqrt(0.5)/sqrt(1-solvent_fraction) to place it on a constant scale relative to a map with standard local rms.
- target_angle = 180 Target angle for CA-CA-CA (set to 180 to maximize chain length)
- rho_cut_min_low = 1. Starting value of rho_cut_min. Applies if rho_cut_min_delta is set (rho_cut_min is ignored in this case)
- rho_cut_min_high = 5 Ending value of rho_cut_min. Applies if rho_cut_min_delta is set
- rho_cut_min_delta = None Incremental value of rho_cut_min. If set, rho_cut_min will be ignored
- rat_pair_min = 0.5 Minimum ratio of density at midpoint between points to trace chain between them
- dist_ca_tol_max = 0.80 Maximum tolerance for CA-CA distances. Normally 0.8 A for thorough run and 1.3 A for quick
- dist_ca_tol_start = 0.10 Minimum tolerance for CA-CA distances.
- rad_sep_trace = 0.75 Dummy atom separation in trace_chain. Usually 0.6 A for thorough run and 0.75 A for quick. Increased automatically if resolution is greater than 3 A. Value of rad_mask_trace in resolve will be rad_sep_trace*2.
- n_atoms_total_scale = 3.33 Ratio of estimated atoms in au to standard estimate
- target_p_ratio = 3 Target ratio of atoms to peaks in trace_chain
- target_n_ratio = 3 Target ratio of nonamers to peaks in trace_chain
- max_triple_ratio = 10 Maximum ratio of triples to pairs in trace_chain
- max_pent_ratio = 10 Maximum ratio of pentamers to pairs in trace_chain
- atom_target_ratio = 0.45 Target ratio of CA to look for to expected atoms in structure. Standard is 0.45, quick is 0.35
- build_both_directions = False Build chains both directions and choose after merging
- min_end_correl = 0.5 Minimum correlation of direction estimated from two ends to use end matching as criterion for keeping a chain
- add_side_chains = True Add in side chains at trace_chain step
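The solvent-fraction normalization described for rho_cut_min can be illustrated numerically. The sketch below assumes the adjustment is a multiplication of rho_cut_min by sqrt(0.5)/sqrt(1-solvent_fraction), which is how the description reads but is not confirmed here, and it also shows the stated relation between rad_sep_trace and rad_mask_trace. The function names are hypothetical and are not part of Phenix.

```python
import math

# Hypothetical helpers (not Phenix API) restating two trace_chain relations.

def adjusted_rho_cut_min(rho_cut_min=2.0, solvent_fraction=0.5):
    """Scale rho_cut_min by sqrt(0.5)/sqrt(1 - solvent_fraction).

    Assumption: the adjustment is multiplicative. At a solvent fraction
    of 0.5 the cutoff is unchanged; at higher solvent fractions the
    overall map sigma is smaller, so the cutoff in sigma units rises.
    """
    if not 0.0 <= solvent_fraction < 1.0:
        raise ValueError("solvent_fraction must be in [0, 1)")
    scale = math.sqrt(0.5) / math.sqrt(1.0 - solvent_fraction)
    return rho_cut_min * scale


def rad_mask_trace(rad_sep_trace=0.75):
    """rad_mask_trace used in resolve: rad_sep_trace * 2."""
    return rad_sep_trace * 2


if __name__ == "__main__":
    for fraction in (0.3, 0.5, 0.75):
        print(fraction, adjusted_rho_cut_min(2.0, fraction))
    print("rad_mask_trace:", rad_mask_trace())
```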
- crossing
- combine_models = True Merge models from trace_chain to form single model
- standard_merge = None Merge by reading all chains and running resolve merging.
- extend_in_merge = False Extend fragments of first and second model during merging of models
- use_cc_in_combine_extend = None You can choose to use the correlation of density rather than density at atomic positions to score models in the merge_second_model or merge_both_models step. This may be useful at lower resolution (> 3 A)
- merge_remainder = True Merge remainder in merge_by_segment_correlation
- merge_by_segment_correlation = True Merge using segment correlation if merge_with_combine_models is selected. Can be used along with merge_remainder=True/False and standard_merge=True/False
- merge_second_model = True In merging models, cut up second model to fill gaps in first. Normally used internally.
- remove_poor_fragments = True After merging models with RNA and protein, remove poor protein chains with remove_poor_fragments
- remove_poor_fragments_minimum_rna = 0.20 Minimum RNA:protein residues in sequence file to run remove_poor_fragments
- wang_radius = 5 Smoothing radius for solvent identification
- cc_min = 0.40 Minimum map-model correlation to keep a segment after refinement
- assignment_weight = 0.20 If set, increase score of segments assigned to sequence.
- cc_min_rna = 0.25 Minimum map-model correlation (RNA/DNA building)
- overlap_tolerance = None Minimum distance between N/P/C1prime in different chains. Used to reject symmetry-related chains. Default is 3 A for RNA/DNA and 2 A for protein
- ncs_clash_threshold = 2. Threshold for considering two atoms too close after applying symmetry
- remove_outside_cell = None Remove chains with atoms outside cell
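The correlation cutoffs cc_min and cc_min_rna, and the chain-type-dependent default for overlap_tolerance, can be summarized in a few lines. This is a minimal sketch under stated assumptions: a segment is kept when its map-model correlation is greater than or equal to the cutoff, and chain types are labeled with the strings "PROTEIN", "RNA" and "DNA". Neither detail is given in the text above, and the function names are hypothetical, not Phenix API.

```python
# Hypothetical helpers (not Phenix API) illustrating the crossing defaults.

NUCLEIC_ACID_TYPES = ("RNA", "DNA")  # assumed labels for chain types


def keep_segment(map_model_cc, chain_type="PROTEIN",
                 cc_min=0.40, cc_min_rna=0.25):
    """Return True if a refined segment passes the correlation cutoff.

    Assumes the comparison is 'greater than or equal to'.
    """
    if chain_type.upper() in NUCLEIC_ACID_TYPES:
        threshold = cc_min_rna
    else:
        threshold = cc_min
    return map_model_cc >= threshold


def default_overlap_tolerance(chain_type="PROTEIN"):
    """Default overlap_tolerance in A: 3 for RNA/DNA, 2 for protein."""
    return 3.0 if chain_type.upper() in NUCLEIC_ACID_TYPES else 2.0


if __name__ == "__main__":
    print(keep_segment(0.30, "PROTEIN"))
    print(keep_segment(0.30, "RNA"))
    print(default_overlap_tolerance("DNA"))
```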
- refinement
- refine = True Refine fragments after each building cycle
- refine_adp = False Refine individual atomic displacement factors (adp)
- number_of_sa_models = None Number of refinement sa_models. Default 2 (20 if thorough)
- number_of_trials = None Number of refinement trials. Default 2 (20 if thorough)
- number_of_macro_cycles = None Number of refinement macro_cycles. Default 2 (5 if thorough)
- number_of_build_cycles = None Number of build cycles. Default is 2 (2 if thorough)
- rebuilding
- rebuild = True Rebuild model. Only for protein.
- rebuild_cycles = None Number of rebuilding cycles. Set to zero to skip rebuilding. Default is 1 (2 if thorough)
- trace_loops = True Rebuild model by iterative loop tracing. Only for protein.
- trace_loops_target_time = 60 Target time for completing a single trace_loops job. You can decrease it to speed it up (and miss some rebuilding) or increase it and possibly get some more successful rebuilding.
- loop_lib = False Rebuild model using loop library. Only for protein.
- rebuild_length = 8 Length of segments to be rebuilt
- rebuild_length_worst = 6 Length of segments to be rebuilt in rebuilding worst sections
- offset_length = 3 Offset of segments to be rebuilt
- rebuild_length_loop_lib = 3 Length of segments to be rebuilt with loop library (max=3)
- offset_length_loop_lib = 1 Offset of segments to be rebuilt with loop library
- start_res = None Start residue for rebuilding. Normally use None.
- end_res = None Ending residue for rebuilding. Normally use None.
- worst_rebuild_percent = 10 If specified, this percentage of residues will be rebuilt. Otherwise, the entire model will be rebuilt
- target_insert = None Length of insert to put from start_res to end_res. Normally leave blank. Can be used to force the addition or subtraction of residues.
- target_insert_offset = None If non-zero, try to rebuild with this number of extra residues in each rebuilt segment. Normally leave blank. Can be used to force the addition or subtraction of residues.
- iterative_ss_refine
- iterative_ss = True Run iterative assignment of secondary structure and refinement with secondary structure restraints
- ss_refine_ncycle = None Overall cycles of iterative secondary_structure identification and refinement. Default is 10
- refine_cycles = None Cycles of refinement within iterative secondary structure and refinement. Default 2 (5 if thorough=True)
- parallel_runs = None Number of parallel runs of iterative secondary structure assignment and refinement. Best model from each group is taken each cycle. Default 1 (4 if thorough)
- combine_annotations = False Combine existing secondary_structure annotations each cycle with new annotations
- replace_side_chains = True Replace side chains during iterative optimization
- regularize = True regularize secondary structure during iterative optimization
- mr_rosetta
- run_mr_rosetta = False Run mr_rosetta to rebuild model
- rosetta_models = 20 Number of rosetta models to build
- relax_models = 5 Number of relaxed rosetta models to build
- rosetta_fixed_seed = None Fixed seed for Rosetta. Only use for regression tests.
- control
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- max_dirs = 1000 Maximum number of directories (map_to_model_xxxx)
- build_only = False Only build model (then stop)
- build_and_refine_only = False Only build model and refine (then stop)
- skip_combine_models = None Skip combine_models step
- skip_final_refinement = None Skip final refinement
- stop_after_segment_and_split_map = False Only carry out through segment_and_split_map step (then stop)
- split_up_job = False Generate scripts and optionally run them (if run_split_jobs=True) to run very small parts of your overall job individually. This is used to speed up the overall process for big structures such as ribosomes. You can run this after you have set up the job and segmented_maps/segment_and_split_map_info.pkl is present. After these jobs are done, you run map_to_model once more with carry_on=True and it will combine everything and finish up.
- run_split_jobs = False Run the split jobs from split_up_job=True directly. Default is to write scripts to the split_jobs_directory.
- split_jobs_commands_directory = commands Place where scripts to run split jobs will go
- split_jobs_logs_directory = logs Place where logs from split jobs will go
- split_jobs_source_command = None You can set the source command for split jobs
- remove_pdb_block = None Clear out any .pdb_block files written by previous runs of map_to_model. Normally only set to False when you are running jobs with split_up_job. Default is True unless do_only_one_thing is set.
- do_only_one_thing = None You can use this to not combine models. Just build and save. This is part of carry_on=True and is a way to build up pieces of your model.
- stop_after_combine_chain_types = False Stop after combining models from different chain types
- model_start = None You can use this to specify skipping model-building of model numbers below this (used to start in the middle)
- model_end = None You can use this to specify skipping model-building of model numbers above this (used to end in the middle)
- resolve_size = None Size of resolve to use.
- coarse_grid = None Use a coarse grid in RESOLVE (saves on memory)
- random_seed = 77151 Random seed (allows duplicating calculations or getting different results each time.)
- verbose = False Verbose output
- comparison_model = None Comparison model
- carry_on = False You can try to carry on from where you left off. The info_pickle_file (default or whatever you specify) will be read and any files listed in partial_model_type that are present will be used. If they are not present they will be generated. With this keyword you can progressively build up your model from pieces. You can also run many jobs in parallel building just part of the chain using the keywords do_only_one_thing input_map_id_start input_map_id_end model_start model_end and specifying chain_type and turning on and off include_helices_strands_only include_phase_and_build and build_in_regions. After they all run, you can run from the beginning again with carry_on=True and all the intermediate files will be combined to yield a full model.
- memory_check = None Map-to-model checks to make sure you have enough memory on your machine to run. You can disable this by setting this keyword to False. The estimates are approximate so it is possible your job could run even if the check fails. Note the check does not take any other uses of the memory on your machine into account.
- em_side_density = None Use EM side chain density in assign_sequence (and trace_and_build if em_side_density_in_tnb is True). Default is True if scattering_table is electron.
- em_side_density_in_tnb = False EM side chain density in trace_and_build (if em_side_density is True)
- ignore_map_limitations = None Ignore limitations such as map cannot be sharpened
- gui GUI-specific parameter required for output directory
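The split_up_job and carry_on keywords in the control section describe a two-stage workflow: generate (and optionally run) the small split jobs, then run map_to_model once more to combine the pieces. The sketch below only assembles the keyword=value argument lists for the two stages and prints them; it does not run anything. The input file names are hypothetical placeholders, the helper name is not part of Phenix, and other inputs that a real run may need (for example the resolution) are not shown.

```python
# Sketch only: build keyword=value argument lists for the two-stage
# split_up_job / carry_on workflow. Nothing is executed.

def phil_args(params):
    """Format a dict of parameters as keyword=value strings."""
    return ["%s=%s" % (key, value) for key, value in params.items()]


# Hypothetical input file names; replace with your own map and sequence file.
inputs = ["my_map.mrc", "my_sequence.dat"]

# Stage 1: write scripts for the split jobs (run_split_jobs=False is the
# default, so the scripts are written but not run).
split_run = ["phenix.map_to_model"] + inputs + phil_args(
    {"split_up_job": True, "run_split_jobs": False, "nproc": 4})

# Stage 2: after the split jobs have finished, combine everything.
final_run = ["phenix.map_to_model"] + inputs + phil_args(
    {"carry_on": True, "nproc": 4})

if __name__ == "__main__":
    print(" ".join(split_run))
    print(" ".join(final_run))
```

Keeping the two argument lists side by side makes it clear that the second run differs from the first only in replacing the split keywords with carry_on=True.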