Auto-sharpening cryo-EM or crystallographic maps with auto_sharpen
Author(s)
- auto_sharpen: Tom Terwilliger
Purpose
The routine auto_sharpen will automatically identify optimal
sharpening/blurring/map adjustment for the input map and will write
out an optimized version of the map.
GUI
A Graphical User Interface is available.
Usage
How auto_sharpen works:
Auto-sharpen adjusts the resolution dependence of the map
to maximize the clarity of the map. Clarity can be measured either by
map kurtosis or by the adjusted surface area of the map (the
default).
The auto-sharpen tool works best at resolutions of about 4.5 A or
better. At lower resolutions it may be unable to distinguish
variations in the quality of the map as the sharpening is varied.
Kurtosis is a standard statistical measure that reflects the peakiness of
the map.
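As a rough illustration of the kurtosis measure, the sketch below computes the standard (non-excess) kurtosis, the fourth central moment divided by the squared variance, for a flat list of density values. The function name map_kurtosis and the toy values are illustrative assumptions; the exact normalization used inside auto_sharpen is not stated here.

```python
def map_kurtosis(values):
    """Kurtosis (fourth central moment / variance**2) of a list of density values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("map is constant; kurtosis is undefined")
    return m4 / (m2 * m2)


# Toy "maps" (hypothetical values): density concentrated in one strong peak
# scores higher than density spread evenly over its range.
flat_map = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
peaky_map = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0]

flat_kurtosis = map_kurtosis(flat_map)
peaky_kurtosis = map_kurtosis(peaky_map)
```

A sharper map tends to have a few strong, well-defined peaks on a flat background, which is why a higher kurtosis is taken as a sign of a clearer map.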
The adjusted surface area is a combination of the surface area of
contours in the map at a particular threshold
and of the number of distinct regions enclosed by the top 30% (default) of
those contours. The threshold is chosen by default to be one where the
volume enclosed by the contours is 20% of the non-solvent volume in the map.
The weighting between the surface area (to be maximized) and number of regions
enclosed (to be minimized) is chosen empirically (default region_weight=20).
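The balance between surface area and number of regions can be sketched as a single score, using the formula given in the region_weight keyword description (surface area minus region_weight times number of regions). The function name and the numbers are hypothetical, and any normalization of the two terms that the program may apply is left out.

```python
def adjusted_surface_area(surface_area, n_regions, region_weight=20.0):
    """Score = surface area - region_weight * number of regions (higher is better)."""
    return surface_area - region_weight * n_regions


# Hypothetical measurements for two sharpened versions of the same map.
# The second has slightly more surface area but is broken into many pieces.
connected = adjusted_surface_area(surface_area=5000.0, n_regions=12)
fragmented = adjusted_surface_area(surface_area=5400.0, n_regions=60)
```

With these toy numbers the connected map scores higher, reflecting the idea that oversharpening adds surface detail but breaks the density into many disconnected regions.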
Several resolution-dependent functions are tested, and the one
that gives the best adjusted surface area (or kurtosis) is chosen.
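The selection step can be pictured as a simple search over trial sharpening values. The sketch below uses the default search bounds listed for search_b_min, search_b_max and search_b_n in the keyword list; the evenly spaced grid and the stand-in score function are assumptions made for illustration only.

```python
def b_iso_grid(search_b_min=-100.0, search_b_max=300.0, search_b_n=21):
    """Evenly spaced trial b_iso values, end points included."""
    step = (search_b_max - search_b_min) / (search_b_n - 1)
    return [search_b_min + i * step for i in range(search_b_n)]


def best_b_iso(score, trial_values):
    """Return the trial value with the highest score."""
    return max(trial_values, key=score)


def toy_score(b_iso):
    """Stand-in score with a single maximum at b_iso = 80 (hypothetical).

    In a real run the score would be the adjusted surface area or the
    kurtosis of the map sharpened with this b_iso.
    """
    return -(b_iso - 80.0) ** 2


trials = b_iso_grid()
chosen = best_b_iso(toy_score, trials)
```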
In each case the map is transformed to obtain Fourier coefficients. The
amplitudes of these coefficients are then adjusted, keeping the phases
constant. The available functions for modifying the amplitudes are:
- No sharpening (map is left as is).
- Sharpening b-factor applied over the entire resolution range (b_sharpen
  applied to achieve an effective isotropic overall b-value of b_iso).
- Sharpening b-factor applied up to the resolution specified with the
  resolution=xxx keyword, then blurred beyond this resolution (with
  transition specified by the keyword k_sharpen, b_iso_to_d_cut). If
  the sharpening b_sharpen is negative (blurring the map),
  the blurring is applied over the entire resolution range.
- Resolution-dependent sharpening factor with three parameters.
  First the resolution-dependence of the map is removed by normalizing the
  amplitudes. Then a scale factor S is applied to the data, where
  log10(S) is determined by coefficients b[0],b[1],b[2] and a resolution
  d_cut (typically d_cut is the nominal resolution of the map).
  The value of log10(S) varies smoothly from 0 at resolution=infinity, to b[0]
  at d_cut/2, to b[1] at d_cut, and to b[1]+b[2] at the highest resolution
  in the map. The value of b[1] is limited to being no larger than b[0] and the
  value of b[1]+b[2] is limited to being no larger than b[1].
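The three-parameter scale factor can be sketched as an interpolation between anchor points. This sketch rests on two assumptions that are not confirmed by the text above: the interpolation is linear in 1/d (the text only says "smoothly"), and "d_cut/2" is read as half-way to d_cut in reciprocal space, that is 1/d = 0.5/d_cut. The function name and the example values are hypothetical.

```python
def log10_scale(d, b, d_cut, d_min):
    """Piecewise-linear log10(S) at resolution d (in A), interpolated in 1/d.

    Assumed anchor points in 1/d (requires d_min < d_cut):
      0 -> 0, 0.5/d_cut -> b[0], 1/d_cut -> b[1], 1/d_min -> b[1] + b[2].
    """
    b0 = b[0]
    b1 = min(b[1], b0)           # b[1] is limited to be no larger than b[0]
    b_high = min(b1 + b[2], b1)  # b[1]+b[2] is limited to be no larger than b[1]
    s = 1.0 / d
    knots = [(0.0, 0.0), (0.5 / d_cut, b0), (1.0 / d_cut, b1), (1.0 / d_min, b_high)]
    for (s_lo, v_lo), (s_hi, v_hi) in zip(knots, knots[1:]):
        if s <= s_hi:
            return v_lo + (s - s_lo) * (v_hi - v_lo) / (s_hi - s_lo)
    return b_high  # beyond the highest resolution in the map


# Hypothetical coefficients for a 3 A map with data extending to 2.5 A
coeffs = (0.6, 0.4, -0.2)
at_half = log10_scale(6.0, coeffs, d_cut=3.0, d_min=2.5)
at_d_cut = log10_scale(3.0, coeffs, d_cut=3.0, d_min=2.5)
at_d_min = log10_scale(2.5, coeffs, d_cut=3.0, d_min=2.5)
```

The amplitude multiplier at each resolution is then 10 raised to the returned value.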
You can also choose to specify the sharpening/blurring parameters for your
map and they will simply be applied to the map. For example you can apply
a sharpening B-value (b_sharpen) to sharpen the map, or you can specify a
target overall B-value (b_iso) to obtain after sharpening.
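For orientation, the effect of a sharpening B-value on Fourier amplitudes can be sketched with the usual exponential B-factor term. The sign convention follows the text above (positive b_sharpen sharpens, negative blurs). The helper names, and the assumption that b_sharpen is simply the difference between the current and target overall B-values, are illustrative and not taken from the program.

```python
import math


def b_factor_scale(d, b_sharpen):
    """Amplitude multiplier at resolution d (in A) for a sharpening B-value.

    Positive b_sharpen boosts high-resolution terms (sharpening);
    negative b_sharpen damps them (blurring).
    """
    return math.exp(b_sharpen / (4.0 * d * d))


def b_sharpen_for_target(current_b_iso, target_b_iso):
    """Sharpening B-value assumed to move the overall B-value to target_b_iso."""
    return current_b_iso - target_b_iso


# Hypothetical map with an overall B-value of 150 A^2 and a target b_iso of 90 A^2
b_sharpen = b_sharpen_for_target(150.0, 90.0)
scale_at_3A = b_factor_scale(3.0, b_sharpen)
scale_at_10A = b_factor_scale(10.0, b_sharpen)
```

The multiplier grows toward high resolution, so fine detail is strengthened relative to the low-resolution envelope of the map.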
Box of density in sharpening
Normally phenix.auto_sharpen will determine the optimal sharpening by
examining the density in a box cut out of your map, then apply this to
the entire map.
Local sharpening
You can choose to apply autosharpening locally if you want. In this case
the auto-sharpening parameters are determined in many boxes cut out of
the map, and corresponding sharpened maps are calculated. The map that
is produced is a weighted map where the density at a particular point
comes mostly from the sharpened map based on a box near that point.
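The weighting of locally sharpened maps can be pictured as a distance-weighted average. The sketch below assumes Gaussian weights based on the distance from the point to each box center, scaled by a smoothing radius; the actual weighting function used by the program is not described here, and all names and values are hypothetical.

```python
import math


def blend_local_maps(point, box_centers, box_values, smoothing_radius):
    """Weighted average of per-box sharpened densities at one point.

    Weights fall off as a Gaussian of the distance from the point to each
    box center (an assumed weighting scheme).
    """
    weights = [
        math.exp(-0.5 * (math.dist(point, center) / smoothing_radius) ** 2)
        for center in box_centers
    ]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, box_values)) / total


# Two hypothetical boxes 30 A apart; each gives a different sharpened
# density value at the point of interest.
centers = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
values = [1.0, 3.0]

near_first = blend_local_maps((5.0, 0.0, 0.0), centers, values, smoothing_radius=10.0)
midway = blend_local_maps((15.0, 0.0, 0.0), centers, values, smoothing_radius=10.0)
```

A point close to the first box takes nearly all of its density from that box, while a point midway between the two boxes receives an equal mix.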
Half-map-based sharpening
You can identify the sharpening parameters using two half-maps if you
want. The resolution-dependent correlation of density in the two half-maps
is used to identify the optimal resolution-dependent weighting of
the map. This approach requires a target resolution which is used
to set the overall fall-off with resolution for an ideal map. That
fall-off for an ideal map is then multiplied by an estimated
resolution-dependent correlation of density in the map with the true
map (the estimation comes from the half-map correlations).
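The product described above can be sketched as follows. The conversion from half-map correlation (FSC) to an estimated correlation with the true map uses the widely used relation sqrt(2*FSC/(1+FSC)); whether auto_sharpen uses exactly this relation is an assumption. The ideal fall-off is modeled here as an exponential B-factor term with b_iso set to target_b_iso_ratio times resolution squared, which borrows the target_b_iso_ratio keyword and is also an assumption. All function names are hypothetical.

```python
import math


def cc_to_true_map(fsc):
    """Estimated correlation of the full map with the true map from half-map FSC."""
    fsc = max(fsc, 0.0)  # negative FSC values are treated as no signal
    return math.sqrt(2.0 * fsc / (1.0 + fsc))


def ideal_falloff(d, b_iso):
    """Relative amplitude of an ideal map at resolution d (in A) for overall B-value b_iso."""
    return math.exp(-b_iso / (4.0 * d * d))


def target_amplitude(d, fsc, resolution, target_b_iso_ratio=5.9):
    """Ideal fall-off multiplied by the estimated correlation with the true map."""
    b_iso = target_b_iso_ratio * resolution ** 2
    return ideal_falloff(d, b_iso) * cc_to_true_map(fsc)


# Hypothetical FSC values in two resolution shells of a 3 A map
strong_shell = target_amplitude(d=6.0, fsc=0.9, resolution=3.0)
weak_shell = target_amplitude(d=3.0, fsc=0.2, resolution=3.0)
```

Shells where the half-maps agree poorly are down-weighted, so noise at high resolution is not amplified by the sharpening.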
Model-based sharpening
You can instead identify the sharpening parameters using your map and a
model. This approach requires a guess of the RMSD between the model and
the true model. The resolution-dependent correlation of model and map
density is used as in the half-map approach above to identify the
weighting of Fourier coefficients.
Using crystallographic maps
You can use phenix.auto_sharpen with a crystallographic map (represented
as map coefficients).
Shifting the map to the origin
Most crystallographic maps have the origin at the corner of the map
(grid point [0,0,0]), while most cryo-EM maps have the origin in the
middle of the map. An output map with the origin shifted to the
corner of the map is optionally written out.
Output files from auto_sharpen
sharpened_map.ccp4: Sharpened map.
shifted_sharpened_map.ccp4: Sharpened map, shifted to place the origin on grid point (0,0,0).
sharpened_map_coeffs.mtz: Sharpened map, shifted to place the origin on grid point (0,0,0), represented as map coefficients.
Examples
Standard run of auto_sharpen:
Running auto_sharpen is easy. From the command-line you can type:
phenix.auto_sharpen my_map.map resolution=2.6
where my_map.map is a map in CCP4, MRC or a related format, and you
specify the nominal resolution of the map.
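Other runs follow the same keyword=value pattern. As a small sketch, the helper below assembles command lines using keywords from the keyword list (b_sharpen, local_sharpening); the helper itself, the file name and the parameter values are hypothetical, and the resulting argument list could be passed to subprocess.run on a machine where Phenix is installed.

```python
import shlex


def auto_sharpen_command(map_file, resolution, **keywords):
    """Build the argument list for a phenix.auto_sharpen run."""
    args = ["phenix.auto_sharpen", map_file, f"resolution={resolution}"]
    args += [f"{name}={value}" for name, value in keywords.items()]
    return args


# Apply a fixed sharpening B-value instead of searching for one
fixed = auto_sharpen_command("my_map.map", 2.6, b_sharpen=50)

# Sharpen locally, using overlapping boxes cut out of the map
local = auto_sharpen_command("my_map.map", 2.6, local_sharpening=True)

# Show the commands as they would be typed on the command line
print(shlex.join(fixed))
print(shlex.join(local))
```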
Possible Problems
Specific limitations and problems:
Maps produced with the extract-unique option of map_box should not be
sharpened with auto-sharpen. These maps are closely masked around the
density of a single molecule and are set to zero in much of the map,
so the information about noise in the map that normally is available
is missing and auto-sharpen does not work properly.
Literature
- Automated map sharpening by maximization of detail and connectivity. T.C. Terwilliger, P.V. Afonine, O.V. Sobolev, and P.D. Adams. bioRxiv (2018).
Additional information
List of all available keywords
- job_title = None Job title in PHENIX GUI, not used on command line
- input_files
- map_file = None File with CCP4-style map
- half_map_file = None Half map (two should be supplied) for FSC calculation. Must have grid identical to map_file
- map_coeffs_file = None Optional file with map coefficients
- map_coeffs_labels = None Optional label specifying which columns of map coefficients to use
- pdb_file = None If a model is supplied, the map will be adjusted to maximize map-model correlation. This can be used to improve a map in regions where no model is yet built.
- ncs_file = None File with NCS information (typically point-group NCS with the center specified). Typically in PDB format. Can also be a .ncs_spec file from phenix. Created automatically if symmetry is specified.
- seq_file = None Sequence file (unique chains only, 1-letter code, chains separated by blank line or greater-than sign.) Can have chains that are DNA/RNA/protein and all can be present in one file.
- input_weight_map_pickle_file = None Weight map pickle file
- output_files
- shifted_map_file = shifted_map.ccp4 Input map file shifted to new origin.
- sharpened_map_file = sharpened_map.ccp4 Sharpened input map file. In the same location as input map.
- shifted_sharpened_map_file = None Input map file shifted to place origin at 0,0,0 and sharpened.
- sharpened_map_coeffs_file = sharpened_map_coeffs.mtz Sharpened input map (shifted to new origin if original origin was not 0,0,0), written out as map coefficients
- output_weight_map_pickle_file = weight_map_pickle_file.pkl Output weight map pickle file
- output_directory = None Directory where output files are to be written.
- crystal_info
- is_crystal = None Defines whether this is a crystal (or cryo-EM). Default is True if use_sg_symmetry=True and False otherwise.
- resolution = None Optional nominal resolution of the map.
- solvent_content = None Optional solvent fraction of the cell.
- solvent_content_iterations = 3 Iterations of solvent fraction estimation. Used for ID of solvent content in boxed maps.
- molecular_mass = None Molecular mass of molecule in Da. Used as alternative method of specifying solvent content.
- ncs_copies = None You can specify ncs copies and seq file to define solvent content
- wang_radius = None Wang radius for solvent identification. Default is 1.5* resolution
- buffer_radius = None Buffer radius for mask smoothing. Default is resolution
- pseudo_likelihood = None Use pseudo-likelihood method for half-map sharpening. (In development)
- map_modification
- b_iso = None Target B-value for map (sharpening will be applied to yield this value of b_iso). If sharpening method is not supplied, default is to use b_iso_to_d_cut sharpening.
- b_sharpen = None Sharpen with this b-value. Contrast with b_iso, which yields a targeted value of b_iso
- b_blur_hires = 200 Blur high-resolution data (higher than d_cut) with this b-value. Contrast with b_sharpen applied to data up to d_cut. Note on defaults: If None and b_sharpen is positive (sharpening) then high-resolution data is left as is (not sharpened). If None and b_sharpen is negative (blurring) high-resolution data is also blurred.
- resolution_dependent_b = None If set, apply resolution_dependent_b (b0 b1 b2). Log10(amplitudes) will start at 1, change to b0 at half of resolution specified, changing linearly, change to b1/2 at resolution specified, and change to b1/2+b2 at d_min_ratio*resolution
- normalize_amplitudes_in_resdep = False Normalize amplitudes in resolution-dependent sharpening
- d_min_ratio = 0.833 Sharpening will be applied using d_min equal to d_min_ratio times resolution. Default is 0.833
- scale_max = 100000 Scale amplitudes from inverse FFT to yield maximum of this value
- input_d_cut = None High-resolution limit for sharpening
- rmsd = None RMSD of model to true model (if supplied). Used to estimate expected fall-off with resolution of correct part of model-based map. If None, assumed to be resolution times rmsd_resolution_factor.
- rmsd_resolution_factor = 0.25 default RMSD is resolution times resolution factor
- fraction_complete = None Completeness of model (if supplied). Used to estimate correct part of model-based map. If None, estimated from max(FSC).
- auto_sharpen = True Automatically determine sharpening using kurtosis maximization or adjusted surface area. Default is True
- auto_sharpen_methods = no_sharpening b_iso *b_iso_to_d_cut resolution_dependent model_sharpening half_map_sharpening target_b_iso_to_d_cut None Methods to use in sharpening. b_iso searches for b_iso to maximize sharpening target (kurtosis or adjusted_sa). b_iso_to_d_cut applies b_iso only up to resolution specified, with fall-over of k_sharpen. resolution_dependent adjusts 3 parameters to sharpen variably over resolution range. Default is b_iso_to_d_cut. target_b_iso_to_d_cut uses target_b_iso_ratio to set b_iso.
- box_in_auto_sharpen = False Use a representative box of density for initial auto-sharpening instead of the entire map. Default is False.
- density_select_in_auto_sharpen = True Choose representative box of density for initial auto-sharpening with density_select method (choose region where there is high density). Normally use False for X-ray data and True for cryo-EM.
- density_select_threshold_in_auto_sharpen = None Threshold for density select choice of box. Default is 0.05. If your map has low overall contrast you might need to make this bigger such as 0.2.
- allow_box_if_b_iso_set = False Allow box_in_auto_sharpen (if set to True) even if b_iso is set. Default is to set box_in_auto_sharpen=False if b_iso is set.
- soft_mask = True Use soft mask (smooth change from inside to outside with radius based on resolution of map).
- use_weak_density = False When choosing box of representative density, use poor density (to get optimized map for weaker density)
- discard_if_worse = None Discard sharpening if worse
- local_sharpening = None Sharpen locally using overlapping regions. NOTE: Best to turn off local_aniso_in_local_sharpening if NCS is present. If local_aniso_in_local_sharpening is True and NCS is present this can distort the map for some NCS copies because an anisotropy correction is applied based on local density in one copy and is transferred without rotation to other copies.
- local_aniso_in_local_sharpening = None Use local anisotropy in local sharpening. Default is True unless NCS is present.
- overall_before_local = True Apply overall scaling before local scaling
- select_sharpened_map = None Select a single sharpened map to use
- read_sharpened_maps = None Read in previously-calculated sharpened maps
- write_sharpened_maps = None Write out local sharpened maps
- smoothing_radius = None Sharpen locally using smoothing_radius. Default is 2/3 of mean distance between centers for sharpening
- box_center = None You can specify the center of the box (A units)
- box_size = 30 30 30 You can specify the size of the box (grid units)
- target_n_overlap = 10 You can specify the targeted overlap of boxes in local sharpening
- restrict_map_size = None Restrict box map to be inside full map (required for cryo-EM data). Default is True if use_sg_symmetry=False and False if use_sg_symmetry=True
- remove_aniso = True You can remove anisotropy (overall and locally) during sharpening
- max_box_fraction = 0.5 If box is greater than this fraction of entire map, use entire map. Default is 0.5.
- density_select_max_box_fraction = 0.95 If box is greater than this fraction of entire map, use entire map for density_select. Default is 0.95
- cc_cut = 0.2 Estimate of minimum highly reliable CC in half-map FSC. Used to decide at what CC value to smooth the remaining CC values.
- max_cc_for_rescale = 0.2 Used along with cc_cut and scale_using_last to correct for small errors in FSC estimation at high resolution. If the value of FSC near the high-resolution limit is above max_cc_for_rescale, assume these values are correct and do not correct them.
- scale_using_last = 3 If set, assume that the last scale_using_last bins in the FSC for half-map or model sharpening are about zero (corrects for errors in the half-map process).
- mask_atoms = True Mask atoms when using model sharpening
- mask_atoms_atom_radius = 3 Mask for mask_atoms will have mask_atoms_atom_radius
- value_outside_atoms = None Value of map outside atoms (set to mean to have mean value inside and outside mask be equal)
- k_sharpen = 10 Steepness of transition between sharpening (up to resolution) and not sharpening (d < resolution). Note: for blurring, all data are blurred (regardless of resolution), while for sharpening, only data at about the specified resolution or at lower resolution are sharpened. This prevents making very high-resolution data too strong. Note 2: if k_sharpen is zero or None, then no transition is applied and all data are sharpened or blurred.
- iterate = False You can iterate auto-sharpening. This is useful in cases where you do not specify the solvent content and it is not accurately estimated until sharpening is optimized.
- optimize_b_blur_hires = False Optimize value of b_blur_hires. Only applies for auto_sharpen_methods b_iso_to_d_cut and b_iso. This is normally carried out and helps prevent over-blurring at high resolution if the same map is sharpened more than once.
- optimize_d_cut = None Optimize value of d_cut. Only applies for auto_sharpen_methods b_iso_to_d_cut and b_iso
- adjust_region_weight = True Adjust region_weight to make overall change in surface area equal to overall change in normalized regions over the range of search_b_min to search_b_max using b_iso_to_d_cut.
- region_weight_method = initial_ratio *delta_ratio b_iso Method for choosing region_weights. Initial_ratio uses ratio of surface area to regions at low B value. Delta ratio uses change in this ratio from low to high B. B_iso uses resolution-dependent b_iso (not weights) with the formula b_iso=5.9*d_min**2
- region_weight_factor = 1.0 Multiplies region_weight after calculation with region_weight_method above
- region_weight_buffer = 0.1 Region_weight adjusted to be region_weight_buffer away from minimum or maximum values
- target_b_iso_ratio = 5.9 Target b_iso ratio : b_iso is estimated as target_b_iso_ratio * resolution**2
- target_b_iso_model_scale = 0. For model sharpening, the target_biso is scaled (normally zero).
- signal_min = 3.0 Minimum signal in estimation of optimal b_iso. If not achieved, use any other method chosen.
- search_b_min = None Low bound for b_iso search. Default is -100.
- search_b_max = None High bound for b_iso search. Default is 300.
- search_b_n = None Number of b_iso values to search. Default is 21.
- residual_target = None Target for maximization steps in sharpening. Can be kurtosis or adjusted_sa (adjusted surface area). Default is adjusted_sa.
- sharpening_target = None Overall target for sharpening. Can be kurtosis or adjusted_sa (adjusted surface area). Used to decide which sharpening approach is used. Note that during optimization, residual_target is used (they can be the same.) Default is adjusted_sa.
- require_improvement = None Require improvement in score for sharpening to be applied. Default is True.
- region_weight = None Region weighting in adjusted surface area calculation. Score is surface area minus region_weight times number of regions. Default is set automatically. A smaller value will give more sharpening.
- sa_percent = None Percent of target regions used in calculation of adjusted surface area. Default is 30.
- fraction_occupied = None Fraction of molecular volume targeted to be inside contours. Used to set contour level. Default is 0.20
- n_bins = None Number of resolution bins for sharpening. Default is 20.
- regions_to_keep = None You can specify a limit to the number of regions to keep when generating the asymmetric unit of density.
- max_regions_to_test = None Number of regions to test for surface area in adjusted_sa scoring of sharpening. Default is 30
- eps = None
- k_sol = 0.35 k_sol value for model map calculation. IGNORED (Not applied)
- b_sol = 50 b_sol value for model map calculation. IGNORED (Not applied)
- control
- verbose = False Verbose output
- resolve_size = None Size of resolve to use.
- ignore_map_limitations = None Ignore limitations such as map cannot be sharpened
- multiprocessing = *multiprocessing sge lsf pbs condor pbspro slurm Choices are multiprocessing (single machine) or queuing systems
- queue_run_command = None run command for queue jobs. For example qsub.
- nproc = 1 Number of processors to use
- gui GUI-specific parameter required for output directory