Density modification of cryo-EM maps with resolve_cryo_em
Author(s)
- resolve_cryo_em: Tom Terwilliger
Purpose
The routine resolve_cryo_em is a tool for carrying out density modification
of cryo-EM maps.
Usage
How resolve_cryo_em works:
Density modification with resolve_cryo_em is based on two ideas.
One is that the errors in Fourier coefficients representing a cryo-EM map are
(to some extent) uncorrelated. This means that one Fourier coefficient does
not know about the errors in another one. (Note that this is not including
errors that are correlated simply because the molecule is small and is
placed in a large map. Correlated errors in this context are those where one
Fourier coefficient has been adjusted to compensate for errors in another one.)
The other is that some features in a map are known in advance. This could
include features such as the flatness of the solvent region, distributions
of map values in the solvent and macromolecule regions, and similarities of
symmetry-related regions.
Then the way density modification works is that Fourier coefficients for
the map are adjusted to agree both with the original map and with
the expected features. This improves the Fourier coefficients,
and the key result is that the map improves everywhere, not just where
the information about expected features was available.
Unique features of density modification for cryo-EM are that two half-maps with
independent errors are available in cryo-EM (allowing estimation of
errors), and that the errors in Fourier coefficients are (more or less)
distributed as two-dimensional Gaussians (i.e., both phase and amplitude
errors). This leads to many differences in implementation from density
modification in crystallography, though the core elements are identical.
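The way two half-maps with independent errors allow error estimation can be illustrated with a Fourier shell correlation (FSC) calculation. The following is a minimal sketch, not the resolve_cryo_em implementation: the function names are hypothetical, the input is assumed to be matched lists of complex Fourier coefficients for one resolution shell, and the CC* expression is the standard half-map relation sqrt(2*FSC/(1+FSC)).

```python
import math
import random


def fsc(f1, f2):
    """Fourier shell correlation between two matched lists of complex
    Fourier coefficients belonging to one resolution shell."""
    num = sum((a * b.conjugate()).real for a, b in zip(f1, f2))
    den = math.sqrt(sum(abs(a) ** 2 for a in f1) * sum(abs(b) ** 2 for b in f2))
    return num / den if den > 0 else 0.0


def cc_star(fsc_half):
    """Estimated correlation of the averaged (full) map with the true map,
    from the half-map FSC (standard relation, assumed here)."""
    if fsc_half <= 0:
        return 0.0
    return math.sqrt(2.0 * fsc_half / (1.0 + fsc_half))


# Toy data: one shared signal plus independent two-dimensional Gaussian
# errors (real and imaginary parts) for each half-map.
rng = random.Random(0)
signal = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]


def noisy(values, sigma):
    """Add independent complex Gaussian errors to each coefficient."""
    return [v + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for v in values]


half_1 = noisy(signal, 1.0)
half_2 = noisy(signal, 1.0)

# Signal and error variances are equal in this toy case, so the FSC
# should come out near 0.5.
shell_fsc = fsc(half_1, half_2)
print("FSC: %.3f  CC*: %.3f" % (shell_fsc, cc_star(shell_fsc)))
```

Because the errors in the two half-maps are independent, they do not contribute to the numerator on average, so the FSC reflects only the shared signal.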
Using resolve_cryo_em:
Normally you will access the functionality of resolve_cryo_em by running
the Phenix map_to_model tool in the Phenix GUI.
However, you can run it directly as well (there is no GUI for resolve_cryo_em).
Half-maps: Supply two unmasked half maps. They can be sharpened but it does
not make much of a difference.
Sequence file: Supply a sequence file with the sequence of the molecule. Be
sure to put in all copies of the molecule (i.e. a 24-mer needs 24 chains).
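A sequence file containing all copies can be generated from a single chain. The sketch below is illustrative only: the helper name is hypothetical, and it assumes a FASTA-style layout in which each chain starts with a ">" header line. The ncs_copies keyword described in the keyword list is an alternative to replicating the sequence by hand.

```python
def replicate_sequence(sequence, copies):
    """Return FASTA-style text holding `copies` identical chains.

    Whitespace inside `sequence` is removed and letters are upper-cased.
    The '>' header layout is an assumption of this sketch.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    seq = "".join(sequence.split()).upper()
    records = [">chain_%d\n%s" % (i + 1, seq) for i in range(copies)]
    return "\n".join(records) + "\n"


# Example: three copies of a short, made-up peptide sequence.
text = replicate_sequence("mkv lat", 3)
print(text)
```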
Procedure used by resolve_cryo_em
The inputs to resolve_cryo_em are:
Two unmasked half-maps
A sequence file, molecular mass, or solvent fraction
The procedure used by resolve_cryo_em has several steps:
Boxing of maps: If the supplied maps are much larger than the molecule,
the maps are trimmed down to about 5 A bigger than the largest dimension
of the molecule (estimated from a low-res mask and the molecular
volume based on sequence or as specified) in each direction.
Resolution estimate and half-map sharpening of maps: The half-maps are
compared as a function of resolution to estimate the resolution (the point
where FSC=0.143). The maps are then sharpened based on the estimated map
quality of the full (averaged) map, and a full map is calculated.
Generation of map-value (density) histograms: The full map is analyzed
to identify the distribution of map values in the solvent and
macromolecule region. These histograms are to be used in density
modification.
Density modification of half-maps: Each half map is density-modified
using maximum-likelihood density modification. The histograms of map
values from the preceding step are used as targets indicating what the
distribution should be in the density modified maps.
Estimation of errors: Fourier coefficients for the two starting
half-maps and the two density-modified maps are compared to give FSC
values as a function of resolution. These FSC values are used to estimate
correlated and uncorrelated errors in the four maps and to identify
optimal weighting between original and density-modified maps.
Optional real-space and sigma weighting: The smoothed local rms differences
between original half maps and between density-modified half maps are
used (optionally) to identify location-specific weighting for the
original and density-modified maps. The variance of Fourier coefficients
among the four maps is used (optionally) to weight individual final
Fourier coefficients.
Optional spectral scaling and local sharpening: The final
map is optionally scaled with a resolution-dependent scale factor
representing the radial part of a typical Fourier transform of a
macromolecule. The final map is optionally locally resolution-filtered
(local sharpening). The final map is also optionally blurred slightly
with a blurring dependent on the overall resolution of the map.
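The resolution estimate in the procedure above (the point where the half-map FSC falls to 0.143) can be sketched as a simple threshold search. This is a hedged illustration rather than the code used by resolve_cryo_em: the function name is hypothetical, shells are assumed to be ordered from low to high resolution, and linear interpolation in 1/d between shells is an assumption of this sketch.

```python
def resolution_at_threshold(resolutions, fsc_values, threshold=0.143):
    """Estimate the resolution (in A) where the FSC first drops below
    `threshold`.

    resolutions: shell d-spacings in A, ordered from low resolution
                 (large d) to high resolution (small d).
    fsc_values:  half-map FSC for each shell, in the same order.
    """
    prev_s = None
    prev_fsc = None
    for d, cc in zip(resolutions, fsc_values):
        s = 1.0 / d  # work in reciprocal spacing
        if cc < threshold:
            if prev_s is None:
                return d  # already below threshold in the first shell
            # Linear interpolation between the two bracketing shells.
            frac = (prev_fsc - threshold) / (prev_fsc - cc)
            return 1.0 / (prev_s + frac * (s - prev_s))
        prev_s, prev_fsc = s, cc
    return resolutions[-1]  # FSC never dropped below the threshold


# Made-up FSC curve for illustration only.
shells = [10.0, 6.0, 4.5, 3.8, 3.4, 3.0]
curve = [0.99, 0.95, 0.80, 0.45, 0.20, 0.05]
print("Estimated resolution: %.2f A" % resolution_at_threshold(shells, curve))
```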
Examples
Standard run of resolve_cryo_em:
You can use resolve_cryo_em to density-modify a cryo-EM map:
phenix.resolve_cryo_em half_map_A.mrc half_map_B.mrc seq_file=seq.dat
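If you want to launch the same command from a script, the argument list can be assembled in Python. This is a sketch under stated assumptions: build_command is a hypothetical helper, the file names are those of the example above, and extra keywords (here nproc, taken from the keyword list below) are assumed to be accepted in the same name=value form as seq_file.

```python
import shlex


def build_command(half_map_1, half_map_2, seq_file=None, **keywords):
    """Assemble the argument list for a phenix.resolve_cryo_em run.

    Extra keywords are appended as name=value in sorted order so the
    resulting command line is reproducible.
    """
    args = ["phenix.resolve_cryo_em", half_map_1, half_map_2]
    if seq_file is not None:
        args.append("seq_file=%s" % seq_file)
    for name in sorted(keywords):
        args.append("%s=%s" % (name, keywords[name]))
    return args


cmd = build_command("half_map_A.mrc", "half_map_B.mrc",
                    seq_file="seq.dat", nproc=4)
print(" ".join(shlex.quote(a) for a in cmd))

# To actually run it (requires a Phenix installation on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```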
Possible Problems
If the half-maps have been masked the procedure may not work well.
If the maps have very prominent density away from the macromolecule this
may interfere with density modification.
If there is non-macromolecule but real density in the maps this
may interfere with density modification (for example, lipid density).
Specific limitations and problems:
The density modification procedure works best at resolutions of about 4.5 A or better.
Literature
- Improvement of cryo-EM maps by density modification. T.C. Terwilliger, S.J. Ludtke, R.J. Read, P.D. Adams, and P.V. Afonine. bioRxiv (2019).
Additional information
List of all available keywords
- job_title = None Job title in PHENIX GUI, not used on command line
- input_files
- half_map_file_name_1 = None Half map file name
- half_map_file_name_2 = None Half map file name
- map_file_name = None Map file name. Normally just supply half maps. If you supply a full map it will be sharpened based on the half-maps and used to get histograms in the first cycle of density modification.
- target_map_file_name = None Input target map file name
- histograms_file_name = None Input histograms file name. Normally this file is created automatically
- symmetry_file = None Symmetry file used to apply symmetry during density modification
- seq_file = None Sequence file used to estimate volume occupied by the molecule. If ncs_copies=1 or None, then the seq_file must include all copies (if a dimer, must have two copies of the chain). If ncs_copies=xx then the sequence file is replicated ncs_copies times.
- output_files
- initial_map_file_name = None Output scaled original map file name
- denmod_map_file_name = denmod_map.ccp4 Output scaled denmod map file name
- denmod_blur_50_map_file_name = None Output scaled denmod map file name, blurred with B of 50
- denmod_half_map_1_file_name = denmod_half_map_1.ccp4 Output scaled denmod half map 1 file name
- denmod_half_map_2_file_name = denmod_half_map_2.ccp4 Output scaled denmod half map 2 file name
- temp_dir = temp_dir temporary directory used to write local files
- crystal_info
- resolution = None Estimated resolution of full map data. Used to set dm_resolution
- original_resolution = None Original resolution (set automatically).
- n_xyz = None Gridding for resolve density modification calculation. Normally set automatically as same as gridding of input map after boxing.
- auto_gridding = None Automatically set gridding in resolve density modification based on the resolution (not same as input map)
- scattering_table = n_gaussian wk1995 it1992 *electron neutron Choice of scattering table for structure factor calculations. Standard for X-ray is n_gaussian, for cryoEM is electron.
- soft_mask = True Use soft mask when boxing map
- soft_mask_radius = None Radius for soft mask when boxing map. Default is twice the resolution of the map
- ncs_copies = None Number of copies in the entire structure of whatever is in seq_file or whatever is specified as the molecular_mass. You can leave ncs_copies=None or ncs_copies=1 and specify entire molecule in seq_file or molecular mass. Or you can set ncs_copies=xx and specify just the unique part of the sequence or molecular_mass.
- box_before_analysis = True Cut out and soft-mask a box of density around the molecule before density modification. Required if patterns is True.
- box_cushion = 5 Buffer around box of density around molecule to be cut out for density modification
- solvent_content = None Solvent fraction (content) of the cell. You can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1. This number applies to the boxed map if boxing is done here.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation
- molecular_mass = None Molecular mass of molecule in Da. Used as alternative method of specifying solvent content. If ncs_copies is specified, total molecular mass = molecular_mass times ncs_copies.
- fraction_of_max_mask_threshold = None threshold of standard deviation map in low resolution mask identification of solvent content. Used if no solvent content and no molecular mass and no sequence file are specified. A good value is 0.05.
- wang_radius = None Wang radius for solvent identification in masking. Default is 1.5* resolution
- denmod_wang_radius = None Wang radius for solvent identification in density modification. Default is 1.5* resolution
- smoothing_radius = None Radius for mask smoothing. Default is twice resolution.
- buffer_radius = None Radius for mask buffer in map_box. Default is smoothing radius value.
- minimum_smoothing_radius = 5 Minimum radius for mask smoothing.
- minimum_solvent_content = 0.5 Stop and ask for a bigger box if solvent content appears to be less than minimum_solvent_content
- maximum_solvent_content = 0.9999 Stop and ask for a smaller box if solvent content appears to be more than maximum_solvent_content
- n_bins = 20 Number of resolution bins for sharpening. Default is 20.
- set_resolution_to_dm_resolution = True Set resolution to dm_resolution once it is determined
- dm_resolution_offset_list = -0.1 -0.2 -0.3 -0.4 -0.5 -0.6 List of resolution offsets to try for density modification. Normally set automatically
- dm_resolution_list = None List of resolutions to try for density modification. Normally set automatically
- dm_resolution = None High-resolution limit for density modification. Normally set automatically from input resolution.
- dm_res_a = 2.4 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_b = 0.99 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_c = 3.0 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- dm_res_d = -0.2 dm_resolution will be guessed from resolution with the formula: res_dm = a + b*(res-res_c) + d * (res-res_c)**2
- chain_type = *None PROTEIN RNA DNA Chain type. Determined automatically from sequence file if not given. Mixed chain types are fine (leave blank if so).
- sequence = None Optional sequence. Can be used instead of a seq_file
- strategy
- patterns = False Use resolve_patterns to identify patterns in map and use them in density modification. Requires box_before_analysis.
- update_starting_map = False Update starting half maps each cycle using half-maps from previous cycle. Alternative is to restart with original (sharpened) half maps each cycle.
- update_histograms = True Update histograms each cycle using working map. Alternative is to use input histogram file or generate histograms once.
- database_list = 1 4 5 6 List of database entries to try one at a time if resolution is database_cutoff or finer. Database 1 is local database for this run. Databases 4 5 6 are 2 - 3 A databases.
- database = None Just read this database entry (1 is the local database, 6 std)
- database_entries_to_read = 5 Number of histogram database entries to read from standard histogram file in solve_resolve/ext_ref_files/segments/rho.list. These will be appended to histograms estimated from half-maps. If resolution is finer than database_cutoff (or 2 A if it is not set) all of these histograms will be tried and best chosen based on denmod R value. Must be at least 1.
- database_cutoff = None Resolution cutoff for reading and testing all database entries (A). If resolution is finer than this the R-value in density modification will be used to choose which histogram to use.
- rad_smooth_ratio = 1 Ratio of rad_smooth to resolution
- spectral_scaling = True Apply standard amplitude vs resolution in half-map averaging
- blur_by_resolution = True Blur after half-map averaging by B=10 times resolution
- blur_by_resolution_factor = 10 Blur after half-map averaging by B= blur_by_resolution_factor times resolution
- sigma_weighting = False In recombination step, weight individual fourier coefficients based on estimated variances: 1/sqrt(normalized_variance). Can be combined with real_space_weighting.
- apply_cc_star_to_initial_map = False Apply resolution-dependent cc_star weighting to initial map before density modification. If False, apply any other scaling but set cc_star=1.
- local_sharpening_final_cycle_only = True Apply any local sharpening only on last cycle
- real_space_weighting = False In recombination step, weight maps in half-map averaging based on local half-map variance in real space (smoothed with smoothing_radius). Can be combined with sigma_weighting.
- sharpening_type = None half_map final_half_map *local_final_half_map Sharpening to apply at each stage. Half_map means scale data in each shell to maximum then multiply by estimated CC* from input first half maps. Final_half_map means same except use estimate of CC* for final maps. Local_final_half_map uses local CC* estimates.
- cycles = 5 Cycles of density modification
- minimum_improvement_per_cycle = 0.005 Minimum improvement to keep cycling
- very_high_error = 100 If an error (asqr,bsqr,csqr,ssqr) is this big or bigger, require it to keep this value for all subsequent bins
- cc_cut = 0.2 Estimate of minimum highly reliable CC in half-map FSC. Used to decide at what CC value to smooth the remaining CC values. Also used to decide whether scale_using_last will apply
- correct_for_final_value = True Correct FSC values for correlated errors by assuming that the last smoothed value represents the correlated error
- max_cc_for_rescale = 0.1 Min reliable CC in half-maps. Used along with cc_cut and scale_using_last to correct for small errors in FSC estimation at high resolution. If the value of FSC near the high-resolution limit is above max_cc_for_rescale, assume these values are correct and do not correct them.
- scale_using_last = 3 If set, assume that the last scale_using_last bins in the FSC for half-map or model sharpening are about zero (corrects for errors in the half-map process). Only applies if these values are less than cc_cut
- mask_atoms_atom_radius = 5 Atom radius in model masking
- control_no_denmod = False Run dummy density modification just zeroing out solvent as estimated by probability mask
- anisotropy
- remove_aniso = True Remove anisotropy from data and optionally sharpen before density modification
- b_iso = None Target overall B value for anisotropy correction. Ignored if remove_aniso = False. If None, default is target_b_ratio*resolution
- max_b_iso = None Default maximum overall B value for anisotropy correction. Ignored if remove_aniso = False. Recommended if b_iso is set. If used, default is minimum of (max_b_iso, B of dataset, target_b_ratio*resolution)
- target_b_ratio_list = 10. 15. 20. Set of ratios of target B to resolution to try. Ignored if remove_aniso=False or b_iso is set
- target_b_ratio = 10. Default ratio of target B value to resolution for anisotropy correction. Ignored if remove_aniso = False. Ignored if b_iso is set. If used, default is minimum of (max_b_iso, B of dataset, target_b_ratio*resolution)
- control
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- random_seed = 171731 Random seed
- quick = True Quick run, do not optimize dm_resolution, target_b_iso or histogram database. Note: if dm_resolution < 2 A resolve will automatically test several databases
- verbose = False Verbose output
- write_maps = True Write intermediate maps
- ignore_symmetry_conflicts = True You can ignore the symmetry information (CRYST1) from coordinate files. This may be necessary if your model has been placed in a box with box_map for example.
- resolve_size = None Size of resolve to use.
- resolve_command_file = None File with commands for resolve
- gui GUI-specific parameter required for output directory
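Two of the keyword descriptions above state explicit formulas: the guess for dm_resolution (dm_res_a through dm_res_d) and the blurring B value (blur_by_resolution_factor times resolution). The sketch below simply evaluates those formulas with the listed defaults. The function names are hypothetical, and any additional limits or adjustments the program may apply are not represented.

```python
def guess_dm_resolution(resolution, a=2.4, b=0.99, c=3.0, d=-0.2):
    """Evaluate res_dm = a + b*(res - res_c) + d*(res - res_c)**2 using
    the default dm_res_a, dm_res_b, dm_res_c and dm_res_d values."""
    delta = resolution - c
    return a + b * delta + d * delta ** 2


def blur_b_value(resolution, blur_by_resolution_factor=10):
    """B value used for blurring after half-map averaging, as described
    for blur_by_resolution_factor: factor times resolution."""
    return blur_by_resolution_factor * resolution


for res in (2.5, 3.0, 3.5, 4.0):
    print("resolution %.1f A: dm_resolution guess %.2f A, blur B %.0f"
          % (res, guess_dm_resolution(res), blur_b_value(res)))
```

With the defaults, the guessed dm_resolution is finer than the input resolution over this range, which is consistent with the negative offsets listed in dm_resolution_offset_list.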