Density modification of cryo-EM maps with resolve_cryo_em

Author(s)

resolve_cryo_em: Tom Terwilliger

Purpose

The routine resolve_cryo_em is a tool for carrying out density modification of cryo-EM maps.

Usage

Density modification with resolve_cryo_em can be carried out based on two half-maps, along with the FSC-based resolution and a sequence file specifying the contents of the map.

Density modification can also be carried out by using the initial density-modified map as a basis for generating multiple models, then averaging the model-based maps with the density-modified map to yield a model-based density-modified map.

How resolve_cryo_em works:

Density modification with resolve_cryo_em is based on two ideas. One is that the errors in Fourier coefficients representing a cryo-EM map are (to some extent) uncorrelated. This means that one Fourier coefficient does not know about the errors in another one. (Note that this does not include errors that are correlated simply because the molecule is small and is placed in a large map. Correlated errors in this context are those where one Fourier coefficient has been adjusted to compensate for errors in another one.)

The other is that some features in a map are known in advance. These could include the flatness of the solvent region, the distributions of map values in the solvent and macromolecule regions, and similarities of symmetry-related regions.

Density modification then works by adjusting the Fourier coefficients for the map to agree both with the original map and with the expected features. This improves the Fourier coefficients, and the key result is that the map improves everywhere, not just where the information about expected features was available.

Unique features of density modification for cryo-EM are that two half-maps with independent errors are available (allowing estimation of errors), and that the errors in Fourier coefficients are (more or less) distributed as two-dimensional Gaussians (i.e., both phase and amplitude errors). This leads to many differences in implementation from density modification in crystallography, though the core elements are identical.
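As a rough illustration of how a known feature can reduce errors in Fourier coefficients, the following toy sketch builds a one-dimensional map with a flat solvent region, adds white noise (so that errors in the Fourier coefficients are uncorrelated), and flattens the solvent. It is not the resolve_cryo_em algorithm, which uses maximum-likelihood density modification rather than simple flattening. The map, the noise level, and all names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.arange(n)

# Known feature: the macromolecule occupies only grid points 96..159;
# everything else is flat solvent (zero in this toy).
inside = (x >= 96) & (x < 160)

# Toy "true" map: a few Gaussian peaks inside the macromolecule region.
true_map = np.zeros(n)
for center in (110, 120, 128, 137, 146):
    true_map += np.exp(-0.5 * ((x - center) / 2.0) ** 2)
true_map[~inside] = 0.0

# White real-space noise gives uncorrelated errors in the Fourier coefficients.
noisy_map = true_map + rng.normal(scale=0.3, size=n)

# Impose the known feature: flatten the solvent region.
flattened_map = np.where(inside, noisy_map, 0.0)


def mean_sq_coefficient_error(test_map, reference_map):
    """Mean squared error of the Fourier coefficients of test_map."""
    diff = np.fft.fft(test_map) - np.fft.fft(reference_map)
    return float(np.mean(np.abs(diff) ** 2))


err_before = mean_sq_coefficient_error(noisy_map, true_map)
err_after = mean_sq_coefficient_error(flattened_map, true_map)
print(f"Fourier coefficient error before: {err_before:.2f}, after: {err_after:.2f}")
```

In this toy the remaining error is the part of the noise that lies inside the macromolecule region, so the error in the Fourier coefficients drops roughly in proportion to the solvent fraction.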

Using resolve_cryo_em:

Normally you will access the functionality of resolve_cryo_em by running the ResolveCryoEM tool in the Phenix GUI. You can also run it from the command line. You may wish to run it from the command line in the background with multiprocessing if you are running with density_modify_with_model=True, as this can take a long time (for example, 1 day x 16 processors for 250 residues in the unique part of the model).

Half-maps: Supply two unmasked half-maps. They can be sharpened beforehand, but this makes little difference.

Sequence file: Supply a sequence file with the sequence of the molecule. Be sure to put in all copies of the molecule (i.e. a 24-mer needs 24 chains).

Procedure used by resolve_cryo_em

The inputs to resolve_cryo_em are:

Two unmasked half-maps
Sequence file or molecular mass or solvent fraction
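The three alternatives in the second input all describe how much of the map is macromolecule. A minimal sketch of how a molecular mass might be turned into a solvent fraction is shown below. The volume-per-dalton constant (about 1.23 cubic A per Da, a commonly quoted average for protein) and the function name are assumptions for illustration; the value the program actually uses is not stated here. Multiplying the mass by ncs_copies follows the description of the molecular_mass keyword.

```python
# Assumed average protein volume per unit mass (cubic Angstroms per dalton).
# Illustrative value only; not taken from resolve_cryo_em.
PROTEIN_VOLUME_PER_DALTON = 1.23


def estimate_solvent_fraction(molecular_mass, box_dimensions, ncs_copies=1):
    """Estimate the solvent fraction of a rectangular box (dimensions in A)."""
    total_mass = molecular_mass * ncs_copies
    box_volume = box_dimensions[0] * box_dimensions[1] * box_dimensions[2]
    macromolecule_volume = total_mass * PROTEIN_VOLUME_PER_DALTON
    return 1.0 - macromolecule_volume / box_volume


# Hypothetical example: four copies of a 25 kDa chain in a 100 A cubic box.
solvent_fraction = estimate_solvent_fraction(25000.0, (100.0, 100.0, 100.0), ncs_copies=4)
print(f"Estimated solvent fraction: {solvent_fraction:.3f}")
```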

The procedure used by resolve_cryo_em has several steps:

Boxing of maps:  If the supplied maps are much larger than the molecule,
the maps are trimmed down to about 5 A bigger than the largest dimension
of the molecule (estimated from a low-res mask and the molecular
volume based on sequence or as specified) in each direction.

Resolution estimate and half-map sharpening of maps: The half-maps are
compared as a function of resolution and the resolution (FSC=0.143)
is estimated and the maps are sharpened based on the estimated map quality
of the full (averaged) map.  A full map is calculated.

Generation of map-value (density) histograms:  The full map is analyzed
to identify the distribution of map values in the solvent and
macromolecule region.  These histograms are to be used in density
modification.

Density modification of half-maps:  Each half map is density-modified
using maximum-likelihood density modification. The histograms of map
values from the preceding step are used as targets indicating what the
distribution should be in the density modified maps.

Estimation of errors:  Fourier coefficients for the two starting
half-maps and the two density-modified maps are compared to give FSC
values as a function of resolution.  These FSC values are used to estimate
correlated and uncorrelated errors in the four maps and to identify
optimal weighting between original and density-modified maps.

Optional real-space and sigma weighting:  The smoothed local rms differences
between original half maps and between density-modified half maps are
used (optionally) to identify location-specific weighting for the
original and density-modified maps.  The variance of Fourier coefficients
among the four maps are used (optionally) to weight individual final
Fourier coefficients.

Optional spectral scaling and local sharpening.  The final
map is optionally scaled with a resolution-dependent scale factor
representing the radial part of a typical Fourier transform of a
macromolecule.  The final map is optionally locally resolution-filtered
(local sharpening).  The final map is also optionally blurred slightly
with a blurring dependent on the overall resolution of the map.
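The half-map comparison used in the resolution-estimate and error-estimation steps can be sketched with NumPy as follows. This is a simplified illustration, not the code used by resolve_cryo_em: the shell binning, the function names, and the rule for picking the resolution are assumptions. The fsc_ref helper uses the widely used relation FSCref = sqrt(2 FSC / (1 + FSC)), under which FSC = 0.143 corresponds to FSCref of about 0.5.

```python
import numpy as np


def fsc_curve(half_map_1, half_map_2, voxel_size, n_shells=20):
    """Return (resolution in A at the outer edge of each shell, FSC per shell)."""
    f1 = np.fft.fftn(half_map_1)
    f2 = np.fft.fftn(half_map_2)
    # Spatial frequency (1/A) of every Fourier coefficient.
    axes = [np.fft.fftfreq(n, d=voxel_size) for n in half_map_1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    s = np.sqrt(sum(g ** 2 for g in grids)).ravel()
    edges = np.linspace(0.0, 0.5 / voxel_size, n_shells + 1)  # up to Nyquist
    shell = np.digitize(s, edges) - 1
    cross = (f1 * np.conj(f2)).real.ravel()
    power_1 = (np.abs(f1) ** 2).ravel()
    power_2 = (np.abs(f2) ** 2).ravel()
    fsc = np.zeros(n_shells)
    for i in range(n_shells):
        sel = shell == i
        denom = np.sqrt(power_1[sel].sum() * power_2[sel].sum())
        fsc[i] = cross[sel].sum() / denom if denom > 0 else 0.0
    return 1.0 / edges[1:], fsc


def resolution_at_threshold(resolution, fsc, threshold=0.143):
    """Resolution of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(fsc < threshold)[0]
    return float(resolution[-1] if below.size == 0 else resolution[below[0]])


def fsc_ref(fsc):
    """Estimated correlation of the averaged map with the true map."""
    fsc = np.clip(fsc, 0.0, 1.0)
    return np.sqrt(2.0 * fsc / (1.0 + fsc))


# Hypothetical test data: a shared signal plus independent noise in each half-map.
rng = np.random.default_rng(1)
signal = rng.normal(size=(16, 16, 16))
half_1 = signal + rng.normal(size=signal.shape)
half_2 = signal + rng.normal(size=signal.shape)
resolution, fsc = fsc_curve(half_1, half_2, voxel_size=1.0, n_shells=8)
print(resolution_at_threshold(resolution, fsc))
```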

Procedure used by resolve_cryo_em for density modification with model-building

Density modification with model-building adds additional cycles to the density modification procedure. In these cycles multiple models are built using map_to_model, and the averaged density and the uncertainty in the averaged density are used to combine the model density with the initial density-modified map.

The procedure includes:

Create initial density-modified half-maps and full map

Create N (typically 16) variants of the full map by changing the resolution
cutoff, spectral_scaling, and blurring of the map.

Build a model into each modified full map

Refine some of the models against half-map 1 and some against half-map 2

Create one composite model based on all models

Create model density for each half-map based on the models refined against
that map.  This model density will have a mean value and variance for each
point in the map near to at least 3 models.

Create composite density for each half-map by combining the model density
with the density-modified half-map, weighting the model density according to
its consistency among models.

Density-modify each composite half-map, and create a new set of density
modified half-maps and full map, as in the procedure for standard
density modification.

Sharpen the resulting maps using model-based sharpening with the composite
model.
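The steps that create model density and composite density can be sketched as follows. The inverse-variance weighting, the small constant added to the variance, and all names are assumptions chosen for illustration; the text above only states that the model density is weighted according to its consistency among models and is defined where at least 3 models are nearby.

```python
import numpy as np


def combine_with_model_density(denmod_map, model_maps, near_model,
                               denmod_variance=1.0, min_models=3):
    """Blend model density into a density-modified half-map.

    model_maps: array of shape (n_models, ...) with model-based density.
    near_model: boolean array of the same shape, True where a grid point
                is near that model.
    """
    n_near = near_model.sum(axis=0)
    usable = n_near >= min_models          # points near at least min_models models
    count = np.maximum(n_near, 1)          # avoid division by zero
    mean = np.where(near_model, model_maps, 0.0).sum(axis=0) / count
    variance = np.where(near_model, (model_maps - mean) ** 2, 0.0).sum(axis=0) / count
    # Assumed weighting: inverse variance, so consistent model density counts more.
    w_model = np.where(usable, 1.0 / (variance + 1e-6), 0.0)
    w_denmod = 1.0 / denmod_variance
    return (w_model * mean + w_denmod * denmod_map) / (w_model + w_denmod)


# Hypothetical example: three identical model maps; the first grid point
# is near only two of them, so it keeps the density-modified value.
denmod_map = np.zeros(4)
model_maps = np.ones((3, 4))
near_model = np.ones((3, 4), dtype=bool)
near_model[2, 0] = False
combined = combine_with_model_density(denmod_map, model_maps, near_model)
print(combined)
```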

Examples

Standard run of resolve_cryo_em:

You can use resolve_cryo_em to density-modify a cryo-EM map:

phenix.resolve_cryo_em half_map_A.mrc half_map_B.mrc seq_file=seq.dat
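If you launch runs from a script, the command line can be assembled as in the sketch below. The helper function is hypothetical; the keywords shown (seq_file, nproc, density_modify_with_model) are taken from the keyword list in this document, and the file names are the placeholders used above.

```python
import shlex


def build_resolve_cryo_em_command(half_map_1, half_map_2, seq_file, **keywords):
    """Assemble a phenix.resolve_cryo_em command line as a list of arguments."""
    args = ["phenix.resolve_cryo_em", half_map_1, half_map_2, f"seq_file={seq_file}"]
    args += [f"{name}={value}" for name, value in keywords.items()]
    return args


# Model-based density modification on 16 processors (can take a long time).
cmd = build_resolve_cryo_em_command(
    "half_map_A.mrc", "half_map_B.mrc", "seq.dat",
    nproc=16, density_modify_with_model=True,
)
print(shlex.join(cmd))
# To actually run it where Phenix is installed: subprocess.run(cmd, check=True)
```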

Possible Problems

If the half-maps have been masked the procedure may not work well.

If the solvent noise is very non-uniform the procedure may work poorly. By default a rectangular solid region enclosing the molecule is cut out and used in density modification. You can supply a boxed map and set the keyword box_before_analysis=False to avoid this.

If the maps have very prominent density away from the macromolecule this may interfere with density modification.

If there is non-macromolecule but real density in the maps this may interfere with density modification (for example, lipid density).

Specific limitations and problems:

Density modification introduces some correlations between half-maps due to solvent flattening. This can have a small effect on the resolution estimates obtained with half-map FSC. The resolution estimates provided by the program are corrected for this effect.

If you use the real_space_weighting or sigma_weighting or sharpening_type=local_final_half_map options there may be some extra correlations between half-maps introduced. Calculating resolution using FSC between these density-modified maps can lead to overstating the resolution. The resolution estimates provided by the program are before applying these weighting schemes (unless you specify local_methods_final_cycle = False and run multiple cycles) so they are not normally affected by this.

The density modification procedure works best at resolutions of about 4.5 A or better.

Model-based density modification necessarily biases the map towards the models that are built. By building multiple models, the effect of this bias is reduced but not eliminated. For example, if the starting map has an error that causes models to be built with a side chain in the wrong place, the new model-based density will show even more density in that location. It is essential that the original or non-model-based maps be consulted to evaluate any specific density in the map.

Literature

Improvement of cryo-EM maps by density modification. T.C. Terwilliger, S.J. Ludtke, R.J. Read, P.D. Adams, and P.V. Afonine. bioRxiv (2019).

Additional information

List of all available keywords

job_title = None Job title in PHENIX GUI, not used on command line
input_files
- half_map_file_name_1 = None Half map file name
- half_map_file_name_2 = None Half map file name
- map_file_name = None Map file name. Normally just supply half maps. If you supply a full map it will be sharpened based on the half-maps and used to get histograms in the first cycle of density modification.
- target_map_file_name = None Input target map file name
- mask_file_name = None Input mask file name (as map file with 1 for macromolecule, 0 solvent, smoothly varying between 1 and 0). Used instead of automatic mask to define region where map correlations are calculated.
- model_1 = None Model corresponding to half-map 1 (unique part of model) to use in density modification. All model_1 models should be refined against half_map_1 or denmod_half_map_1 before supplying them. Normally supply at least minimum_model_files corresponding to half-map 1 and the same for half-map 2. Generated automatically if density_modify_with_model is set.
- model_2 = None Model corresponding to half-map 2 (unique part of model) to use in density modification. All model_2 models should be refined against half_map_2 or denmod_half_map_2 before supplying them. Normally supply at least minimum_model_files corresponding to half-map 2 and the same for half-map 1. Generated automatically if density_modify_with_model is set.
- minimum_model_files = 3 Minimum number of models representing half-maps 1 and 2. NOTE: must be at least 2.
- denmod_map_file_name = None Optional input density-modified full map. Normally generated automatically
- denmod_half_map_file_name_1 = None Optional input density-modified half map file 1. Normally generated automatically
- denmod_half_map_file_name_2 = None Optional input density-modified half map file 2. Normally generated automatically
- truncate_density_file_name = None Input model file containing atomic coordinates around which density is to be truncated at 2 sigma in density modification. This can be used to truncate high density for heavy atoms or high density that is away from the macromolecule so that they do not interfere with density modification. Note that this means that the density in these regions will typically be much lower after density modification. See also rad_mask, the radius around the coordinates to mask.
- rad_mask = 5 Radius around truncate_density atom positions to truncate density in density modification
- histograms_file_name = None Input histograms file name. Normally this file is created automatically
- guess_histograms_path = True If histograms path is too long, try to guess the path based on the resolve default path and the working directory
- symmetry_file = None Symmetry file used to apply symmetry during density modification and model-building. Required if density_modify_with_model is set and ncs_copies is greater than 1.
- seq_file = None Sequence file used to estimate volume occupied by the molecule and to generate models (if density_modify_with_model is set). Supply the unique part of the sequence file and use ncs_copies to specify how many copies of this are present. If ncs_copies=1 or None, then the seq_file must include all copies (if a dimer, must have two copies of the chain) and no NCS will be applied.
output_files
- pdb_out = merged_model.pdb Merged model
- initial_map_file_name = initial_map.ccp4 Output scaled original map file name. Only written out if apply_cc_star_to_initial_map is True. (Otherwise the initial map is highly sharpened and not suitable for viewing).
- denmod_map_file_name = denmod_map.ccp4 Denmod map file name
- denmod_blur_50_map_file_name = None Output scaled denmod map file name, blurred with B of 50
- denmod_half_map_1_file_name = denmod_half_map_1.ccp4 Output scaled denmod half map 1 file name
- denmod_half_map_2_file_name = denmod_half_map_2.ccp4 Output scaled denmod half map 2 file name
- map_phasing_half_map_1_file_name = None Map phasing half map 1 file name
- map_phasing_half_map_2_file_name = None Map phasing half map 2 file name
- output_mask_file_name = None Output mask file
- temp_dir = None Temporary directory. Default is resolve_cryo_em_xx, where xx is chosen so that a new directory is created
- restore_full_size = False Restore full size of maps on output by padding with zeroes.
- resample_on_fine_grid = False Resample output maps on fine grid with scale of resampling_ratio to starting resolution of map. Requires input maps have origin at (0,0,0) and restore_full_size is True.
- resampling_ratio_to_resolution = 0.2 Ratio of gridding for output map to starting resolution of map
- output_directory = None Location for output files
crystal_info
- resolution = None Estimated resolution of full map data. Used to set dm_resolution
- original_resolution = None Original resolution (set automatically).
- minimum_resolution = None Minimum resolution (set automatically).
- n_xyz = None Gridding for resolve density modification calculation. Normally set automatically as same as gridding of input map after boxing.
- auto_gridding = None Automatically set gridding in resolve density modification based on the resolution (not same as input map)
- use_model_mask = False Use model mask if a model is supplied
- soft_mask = True Use soft mask when boxing map
- soft_mask_radius = None Radius for soft mask when boxing map. Default is twice the resolution of the map
- ncs_copies = None Number of copies in the entire structure of whatever is in seq_file or whatever is specified as the molecular_mass. You can leave ncs_copies=None or ncs_copies=1 and specify entire molecule in seq_file or molecular mass. Or you can set ncs_copies=xx and specify just the unique part of the sequence or molecular_mass.
- box_before_analysis = True Cut out and soft-mask a box of density around the molecule before density modification. Required if patterns is True.
- box_cushion = 5 Buffer around box of density around molecule to be cut out for density modification
- density_select = False Use density_select option to cut out molecule in box_before_analysis
- close_index = None Index within this value of edge is considered close. If edge of proposed box is within close_index of end, take the whole box.
- min_close_index = 10 Smallest value for estimate of close_index
- min_close_index_ratio = 10 Close index will be larger of min_close_index and cell grid units divided by min_close_index_ratio.
- solvent_content = None Solvent fraction (content) of the cell. You can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1. This number applies to the boxed map if boxing is done here.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- molecular_mass = None Molecular mass of molecule in Da. Used as alternative method of specifying solvent content. If ncs_copies is specified, total molecular mass = molecular_mass times ncs_copies.
- fraction_of_max_mask_threshold = None Threshold of standard deviation map in low resolution mask identification of solvent content. Used if no solvent content and no molecular mass and no sequence file are specified. A good value is 0.05.
- wang_radius = None Local averaging radius for solvent identification in masking. Default is 1.5* resolution
- denmod_wang_radius = None Local averaging radius for solvent identification in density modification Default is 1.5* resolution
- smoothing_radius = None Radius for mask smoothing. Default is twice resolution.
- buffer_radius = None Radius for mask buffer in map_box. Default is smoothing radius value.
- minimum_smoothing_radius = 5 Minimum radius for mask smoothing.
- minimum_solvent_content = 0.5 Stop and ask for a bigger box if solvent content appears to be less than minimum_solvent_content
- maximum_solvent_content = 0.9999 Stop and ask for a smaller box if solvent content appears to be more than maximum_solvent_content
- overall_delta_b = -30 Overall delta B to apply to model map coefficients, relative to overall B of the map. (-30 means get the overall B of the map, subtract 30, set B of atoms in model to this value.)
- n_bins = 20 Number of resolution bins for sharpening. Default is 20. More bins may help slightly but will slow down density modification.
- set_resolution_to_dm_resolution = True Set resolution to dm_resolution once it is determined
- dm_resolution_offset_list = None List of resolution offsets to try for density modification. Applied if quick=False
- dm_resolution_list = None List of resolutions to try for density modification. Normally set automatically
- dm_resolution = None High-resolution limit for density modification. Normally set automatically from input resolution.
- dm_res_a = 2.4 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_b = 0.99 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_c = 3.0 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_d = -0.2 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- chain_type = *None PROTEIN RNA DNA Chain type. Determined automatically from sequence file if not given. Mixed chain types are fine (leave blank if so).
- sequence = None Optional sequence. Can be used instead of a seq_file
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
strategy
- optimize_r_value = False Optimize choice of histograms/sharpening based on R-value for density modification. Alternative is based on estimated improvement in resolution.
- optimize_sharpening_method = None Try unsharpened and half-map sharpened map coefficients as starting point for density modification, then auto-set overall sharpening if remove_aniso=true, then density modify and choose based on minimum density modification R value
- density_modify_density_target_start_with_denmod = False Start with density-modified maps in density_modify_density_target
- density_modify_model_simple_merge = True Merge model_density with original without another density modification run.
- density_modify_density_target = False Use density from unmerged models as target for density modification
- use_model_variance = True Use estimate of model variance to weight density target.
- mask_density_target = True Mask density target to use only region where model was present
- use_merged_models_in_denmod = False Use merged model in density modification (one for each half map)
- use_tight_mask = False Use tight mask in average_maps
- model_mask_radius = 3 Radius for masking density from model
- no_denmod_after_model = False No density modification after recombining model information. Not implemented.
- use_correlated_fsc = False Assume errors in model-based FSC values are highly correlated
- use_maximum_inside_mask = False Use map_1 density (original) if higher than map_2 (model-based) inside map_2 mask in average_maps
- sd_smoothing_radius_ratio = 0.75 Radius for smoothing density sd estimates as ratio to resolution. Used in average_maps
- smooth_model_map = False Smooth model map in average_maps
- use_half_datasets_for_sigmas_by_reflection = False Use differences between F1 and F2 for sigmas ( times 0.707). Alternative is use these differences in shells only. Do not use this as it introduces correlations between half-datasets.
- fix_amplitudes = False Fix amplitudes, only change phases
- use_variance = True Use model variance in model-based density modification
- use_symmetry_in_denmod = False Use symmetry in density modification. Normally not used as maps already have symmetry applied.
- use_denmod_map_in_denmod_with_model = False Use density-modified map to start density modification with model
- refine = True Refine models
- sequence_models = True Get sequences of models
- write_map_files = True Write out final map files
- delta_resolution = 0.0 0.5 Resolution offsets to try in model-building
- macro_cycles = 3 Cycles of real-space refinement in model-building
- build_thoroughness = *None quick medium thorough Thoroughness of model-building
- build_methods = high_density_from_model no_high_density_from_model Build methods to try in model-building. Allowed methods: trace_and_build phase_and_build high_density_from_model no_high_density_from_model quick medium
- backup_build_methods = phase_and_build Build methods to try in model-building if initial methods do not yield enough models. Allowed methods: trace_and_build phase_and_build high_density_from_model no_high_density_from_model quick medium
- vary_blur_and_spectral_scaling_with_model = True In density_modify_with_model, generate maps with and without blur_by_resolution and spectral_scaling (4 maps)
- density_modify_with_model = None Run model-based density modification. If models are supplied, use them. Otherwise generate models and refine against half-maps. Use model information in density modification.
- model_cycles = 3 Number of cycles of model-based density modification
- scale_rmsd = 1 Scale on RMSD estimate for model-map agreement
- use_model_as_target = True Use model as target in density modification.
- patterns = False Use resolve_patterns to identify patterns in map and use them in density modification. Requires box_before_analysis.
- update_starting_map = True Update starting half maps each cycle using half-maps from previous cycle. Alternative is to restart with original (sharpened) half maps each cycle.
- update_histograms = True Update histograms each cycle using working map. Alternative is to use input histogram file or generate histograms once.
- database_list = None List of histogram database entries to try one at a time if optimizing histograms. Database 0 is local database for this run. Databases 1 2 3 4 5 are 1 - 3 A databases. Default is 0 for quick/5 for quick/density_modify_unsharpened_maps and [0,5] for non-quick
- database_number = None Use this database entry by default (0 is local database, 5 std)
- rad_smooth_ratio = 1 Ratio of rad_smooth to resolution
- spectral_scaling = False Scale average Fourier coefficient amplitude vs resolution to match spectrum expected for a protein in a box of solvent. This can restore resolution-dependent features of a map.
- blur_by_resolution = False Blur after half-map averaging by B=10 times resolution. This can be used along with blur by resolution factor to apply a slight blurring that may make the final map easier to interpret.
- blur_by_resolution_factor = 10 Blur after half-map averaging by B= blur_by_resolution_factor times resolution
- sigma_weighting = None In recombination step, weight individual fourier coefficients based on estimated variances: 1/sqrt(normalized_variance). Can be combined with real_space_weighting. Can cause correlations between density-modified half maps so that half-map FSC values may overstate resolution.
- average_neighbors = False In sigma_weighting, average neighboring weights
- apply_cc_star_to_initial_map = None Apply resolution-dependent cc_star (FSCref) weighting to initial map before density modification. If False, apply any other scaling but set cc_star=1.
- local_methods_final_cycle = True Apply any local sharpening/averaging only on last cycle
- real_space_weighting = None In recombination step, weight maps in half-map averaging based on local half-map variance in real space (smoothed with smoothing_radius). Can be combined with sigma_weighting. Can cause correlations between density-modified half maps so that half-map FSC values may overstate resolution.
- sharpening_type = None half_map *final_half_map local_final_half_map Sharpening to apply at each stage. half_map means scale data in each shell to maximum then multiply by estimated FSCref from input first half maps. final_half_map means same except use estimate of FSCref for final maps. local_final_half_map uses local FSCref estimates.
- cycles = 1 Cycles of density modification
- damping = None Damping of shifts on each cycle.
- minimum_r_value_improvement_per_cycle = 0.001 Minimum r_value improvement to keep cycling. Does not apply if damping is used.
- minimum_improvement_per_cycle = 0.001 Minimum improvement to keep cycling. Does not apply if damping is used.
- zero_half_dataset_correlation = False Assume correlation between density-modified half-datasets is zero.
- very_high_error = 100 If an error (asqr,bsqr,csqr,ssqr) is this big or bigger, require it to keep this value for all subsequent bins
- cc_cut = 0.2 Estimate of minimum highly reliable CC in half-map FSC. Used to decide at what CC value to smooth the remaining CC values. Also used to decide whether scale_using_last will apply
- correct_for_final_value = True Correct FSC values for correlated errors by assuming that the last smoothed value represents the correlated error
- max_cc_for_rescale = 0.1 Min reliable CC in half-maps. Used along with cc_cut and scale_using_last to correct for small errors in FSC estimation at high resolution. If the value of FSC near the high-resolution limit is above max_cc_for_rescale, assume these values are correct and do not correct them.
- scale_using_last = 3 If set, assume that the last scale_using_last bins in the FSC for half-map or model sharpening are about zero (corrects for errors in the half-map process). Only applies if these values are less than cc_cut
- mask_atoms_atom_radius = 5 Atom radius in model masking
- control_no_denmod = False Run dummy density modification just flattening solvent as estimated by probability mask
- zero_solvent = False Zero solvent region (for testing)
- randomize_solvent = False Randomize solvent region (for testing). Also set solvent_content and solvent_noise_ratio to run this.
- solvent_noise_ratio = 0.4 Solvent noise ratio for randomize_solvent
- use_bins_for_solvent_noise = True Use bins in solvent noise
- require_non_identical_half_maps = True Require half maps that are not identical
- density_modify_unsharpened_maps = None Use unsharpened (original) half maps in density modification. If set and quick is True then also set database=5
- default_fom = 0.9 Default FOM value if unknown
anisotropy
- remove_aniso = True Remove anisotropy from data and optionally sharpen before density modification
- b_iso = None Target overall B value for anisotropy correction. Ignored if remove_aniso = False. If None, default is target_b_ratio*resolution
- max_b_iso = None Default maximum overall B value for anisotropy correction. Ignored if remove_aniso = False. Recommended if b_iso is set. If used, default is minimum of (max_b_iso, B of dataset, target_b_ratio*resolution)
- target_b_ratio_list = 10. 20. Set of ratios of target B to resolution to try. Applied if quick=False and ignored if remove_aniso=False or b_iso is set
- target_b_ratio = 10. Default ratio of target B value to resolution for anisotropy correction. Ignored if remove_aniso = False. Ignored if quick or b_iso is set. If used, default is minimum of (max_b_iso, B of dataset, target_b_ratio*resolution)
control
- quick = True Quick run, do not optimize dm_resolution, target_b_iso or histogram database. If density_modify_unsharpened_maps is true set database=5 (use default histograms). Turns off real_space_weighting and sigma_weighting unless they are set.
- ignore_symmetry_conflicts = True Ignore symmetry conflicts in input files
- ignore_limitations = False Ignore restrictions on what can be run
- stop_after_mask = False Stop after getting mask
- verbose = False Verbose output
- max_dirs = 1000 Maximum number of directories (trace_and_build_xxx)
- resolve_size = None Size of resolve to use.
- resolve_command_file = None File with commands for resolve
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- random_seed = 171731 Random seed
- test_gui = None Run from command line but test GUI features
gui GUI-specific parameter required for output directory
- output_dir = None
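
Two of the resolution-dependent defaults described in the keyword list can be written out as small functions. This is a sketch based only on the formulas quoted above, assuming that a, b, res_c and d in the dm_resolution formula correspond to dm_res_a, dm_res_b, dm_res_c and dm_res_d; the function names are hypothetical.

```python
def guess_dm_resolution(resolution, dm_res_a=2.4, dm_res_b=0.99,
                        dm_res_c=3.0, dm_res_d=-0.2):
    """res_dm = a + b*(res - res_c) + d*(res - res_c)**2, with the listed defaults."""
    delta = resolution - dm_res_c
    return dm_res_a + dm_res_b * delta + dm_res_d * delta ** 2


def blur_b_value(resolution, blur_by_resolution_factor=10.0):
    """B value used for blur_by_resolution: factor times resolution."""
    return blur_by_resolution_factor * resolution


for res in (3.0, 4.0):
    print(res, round(guess_dm_resolution(res), 2), blur_b_value(res))
```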