Density modification of cryo EM maps with resolve_cryo_em

Author(s)

Purpose

The routine resolve_cryo_em is a tool for carrying out density modification of cryo-EM maps

Usage

Density modification with resolve_cryo_em is carried out using two half-maps, along with the FSC-based resolution and a sequence file specifying the contents of the map.

Density modification can also be carried out by using the initial density-modified map or a single supplied model as a basis for generating multiple models, then using model-based density as part of the density modification target to yield a model-based density modified map.

Video Tutorial

The tutorial video is available on the Phenix YouTube channel and covers the following topics:

How resolve_cryo_em works:

Density modification with resolve_cryo_em is based on two ideas. One is that the errors in Fourier coefficients representing a cryo-EM map are (to some extent) uncorrelated. This means that one Fourier coefficient does not know about the errors in another one. (Note that this is not including errors that are correlated simply because the molecule is small and is placed in a large map. Correlated errors in this context are those where one Fourier coefficient has been adjusted to compensate for errors in another one.)

The other is that some features in a map are known in advance. This could include features such as the flatness of the solvent region, distributions of map values in the solvent and macromolecule region, similarities of symmetry-related regions.

Then the way density modification works is that Fourier coefficients for the map are adjusted to agree both with the original map and with the expected features. This improves the Fourier coefficients, and the key result is that the map improves everywhere, not just where the information about expected features was available.

Unique features of density modification for cryo-EM are that two half-maps with independent errors are available in cryo-EM (allowing estimation of errors), and that the errors in Fourier coefficients are (more or less) distributed as two-dimensional Gaussians (i.e both phase and amplitude errors). This leads to many differences in implementation density modification in crystallography though core elements are identical.

Using resolve_cryo_em:

Normally you will access the functionality of resolve_cryo_em by running the ResolveCryoEM tool in the Phenix GUI. You can also run it from the command line. You may wish to run it from the command line in background with multiprocessing if you are running with denmod_with_models=True as this can take a long time (perhaps 1 day x 16 processors if you have 250 residues in the unique part of your model).

Half-maps: Supply two unmasked half maps. They can be sharpened but it does not make much of a difference. They must be unmasked and they must still have noise in the solvent region. They can be boxed with a soft boundary (if they are boxed specify box_before_analysis=False so it isn't done again)

Sequence file: Supply a sequence file with the sequence of the molecule. Be sure to put in all copies of the molecule (i.e. a 24-mer needs 24 chains). (You can also supply one copy of the molecule and specify copies=24 if you want.)

Model file: You can supply a model; if you do then the model will be randomized and refined against the half-maps to yield 8 models for each half map. These will be used in model-based density modification.

Automatically-generated models: You can specify density_modify_with_model=True and not supply any model. In this case 8 models will be automatically generated for each half-map and used in model-based density modification.

Procedure used by resolve_cryo_em

The basic inputs to resolve_cryo_em are:

Two unmasked half-maps
sequence file or molecular mass or solvent fraction

Optional inputs are:

Full map (Recombined with density-modified half maps)
Model (Used to generate multi-model representation for density modification)
Symmetry file (Supply with model or if automatic model generation is
   used to limit the modeling to the unique part of the map. Alternatively,
   used in density modification in cases where the symmetry is not part
   of the reconstruction process).
Mask file (Used to identify the region where the macromolecule is present)

The procedure used by resolve_cryo_em has several steps:

Boxing of maps:  If the supplied maps are much larger than the molecule,
the maps are trimmed down to about 5 A bigger than the largest dimension
of the molecule (estimated from a low-res mask and the molecular
volume based on sequence or as specified) in each direction, using
a soft mask at the edges of the box.

Resolution estimate and half-map sharpening of maps: The half-maps are
compared as a function of resolution and the resolution (FSC=0.143)
is estimated and the maps are sharpened based on the estimated map quality
of the full (averaged) map.  A full map is calculated.

Generation of map-value (density) histograms:  The full map is analyzed
to identify the distribution of map values in the solvent and
macromolecule region.  These histograms are optionally used in density
modification.

Map-phasing of half-maps:  Each half map is used in a process of
map-based estimation of new Fourier coefficients using a
maximum-likelihood procedure. In essence, one Fourier coefficient is
removed from the map at a time. Then a new value of that Fourier coefficient
is found that maximizes the likelihood of the map given all the other
Fourier coefficients. The likelihood is calculated from the agreement of
the histograms of the (new) map with expected histograms.  For example,
if the solvent region is expected to be flat, then the new map has a
high likelihood if it is flat in the solvent region.  By varying the one
Fourier coefficient the flatness of the solvent region will change. Similarly
the histograms of density in the region of the macromolecule will change
depending on the value of the one Fourier coefficient. The best value for
each Fourier coefficient is then used to calculated a map-phasing map.

Estimation of errors:  Fourier coefficients for the two starting
half-maps and the two density-modified maps are compared to give FSC
values as a function of resolution.  These FSC values are used to estimate
correlated and uncorrelated errors in the four maps and to identify
optimal weighting between original and density-modified maps.

Recombination of original and map-phasing half-maps:  Based on the
estimated errors in original and map-phasing half-maps, all four maps
are recombined to create a new density-modified map.  Additionally,
each half-map and associated map-phasing half map are recombined to
create two new density-modified half-maps.

Optional real-space and sigma weighting:  The smoothed local rms differences
between original half maps and between density-modified half maps are
used (optionally) to identify location-specific weighting for the
original and density-modified maps.  The variance of Fourier coefficients
among the four maps are used (optionally) to weight individual final
Fourier coefficients.

Optional starting with unsharpened maps.  The input half maps are used in the
density modification step without adjusting the resolution dependence
of the maps with half-map sharpening.  This can be useful
for high-resolution maps.  Note that whether or not this option is
selected, normally an overall B-factor is still applied to the input maps
based on the nominal resolution.  The difference therefore is in the
details of the resolution dependence which is adjusted in the default
case based on a comparison of half-maps, but not with density
modify unsharpened maps.  The overall B-factor
applied before density modification is controlled by the "remove_aniso"
and "b_iso" keywords.  By default remove_aniso=True and b_iso is
target_b_ratio*resolution (target_b_ratio=10), so before density
modification whatever resolution dependence and anisotropy is present in
the each map is adjusted to yield an overall B of b_iso and no anisotropy.


Optional alternative final sharpening of maps.  The final sharpening normally
consists of two parts.  One is scaling the map in shells of resolution
based on the estimated correlation of the map coefficients at each
resolution with the true ones.  This is controlled by the
keyword final_scale_with_fsc_ref.   The second part of final scaling is
scaling Fourier coefficients in each shell of resolution either to match
the low-resolution shell or to match the scaled half-maps, or to
be part-way between these (controlled by the keyword geometric_scaling).

Optional spectral scaling and local sharpening.  The final
map is optionally scaled with a resolution-dependent scale factor
representing the radial part of a typical Fourier transform of a
macromolecule.  The final map is optionally locally resolution-filtered
(local sharpening).  The final map is also optionally blurred slightly
with a blurring dependent on the overall resolution of the map.

Optional use of an input full map.  If you supply a full map in addition
to the half-maps then the full map will be recombined with the two
map-phasing half-maps instead of the average of the two original
half-maps being recombined.  This could improve the final map if your
full map has been filtered in some special way.

Optional density modification with multiple models.  You can generate
multiple-model representations of each of your two half maps either
by supplying a single model (it will be randomized and refined against
each half-map) or by specifying density_modify_with_model=True (models
will be generated automatically).  The multiple models will be used to
estimate the density at each point near the models along with a guess of
the uncertainty of that density.  These are used to create a target map
for density modification that is based on model density and the
map from standard density modification, weighted according to their
uncertainty estimates.  This target map is then used in density modification
(as a target for the region containing the model), where Fourier coefficients   that lead to a map that is more similar to the target density are considered
more likely thatn those that do not.  This procedure necessarily can
introduce model bias, but this is greatly reduced by the use of multiple
models, uncertainty estimates, and the likelihood-based density modification.
You may with to use a finer resolution for density modification
(dm_resolution) if you include model information because the model can
provide higher-resolution information that simple density modification.

Density modification including symmetry that is not part of reconstruction
symmetry.  If your molecule has symmetry that was not used in the
reconstruction process, it can be used in density modification. Supply a
symmetry_file (.ncs_spec file) that you can obtain with the find_ncs tool
in phenix (uses a model or density) and it will be included.

Procedure used by resolve_cryo_em for density modification with model-building

Density modification with model-building adds additional cycles to the density modification procedure in which multiple models are built using map_to_model and the averaged density and uncertainty in the average density is used to combine the model density with the initial density-modified map.

The procedure includes:

Create initial density-modified half-maps and full map

Create N (typically 16) variants of the full map by changing the resolution
cutoff, spectral_scaling, and blurring of the map.

Build a model into each modified full map. If symmetry file supplied,
  build only the unique part of the model.

Refine some of the models against half-map 1 and some against half-map 2

Create one composite model based on all models

Create model density for each half-map based on the models refined againt
that map.  This model density will have a mean value and variance for each
point in the map near to at least 3 models.

Create composite density for each half-maps by combining the model density
with the density-modified half-map, weighting the model density according to
its consistency among models.

Density-modify each composite half-map, and create a new set of density
modified half-maps and full map, as in the procedure for standard
density modification.

Sharpen the resulting maps using model-based sharpening with the composite
model.

Note that this procedure takes a lot of computation.

Note on R-values in density modification

The R-value for density modification is quite different from a refinement R. Basically the statistical (maximum-likelihood) density modification procedure allows calculation of each structure factor based on the values of all the other structure factors and the expectations about the map (i.e, flat solvent, distribution of density values in protein, ncs, etc.) These "map-phasing" structure factors are (at least in the first cycle) independent of the starting structure factors and can be compared with the measured data and an R-value obtained. The values of these R-values range from about 0.25 to 0.5 in most cases, and when the map is very good usually the R values are smaller. These R values do not involve any atomic model so they are quite unlike a refinement R.

For discussion of how map-likelihood structure factors are calculated in Resolve, see the Map-likelihood phasing reference below.

Examples

Standard run of resolve_cryo_em:

You can use resolve_cryo_em to density-modify a cryo-EM map:

phenix.resolve_cryo_em half_map_A.mrc half_map_B.mrc seq_file=seq.dat

What to expect with density modification

If density modification is working well, you should see:

An improvement in reported resolution. Typically this is small..from 0.05 A
to 0.5 A.  If it is zero...it probably did not work.

More fine detail. Density modification mostly changes
the high-resolution part of your map. (The low-resolution part of your map
is generally already very good, and usually if it is not then density
modification can't fix it. Occasionally low and high-resolution aspects
of a map can improve.)  Compare your original (auto-sharpened) map with
your density-modified map and see if side chains and carbonyl oxygens are more
clear and if the connectivity of the chain improves.

If you supply or generate models, you should see improved high-resolution
detail in the region of the model.  You are not likely to see improvement
away from the model as you would in crystallography.  The multiple models
generated are basically identifying what locations in your map are
compatible with being the location of an atom, based on the density already
there and the geometrical restrictions on a model. That information then
is included in density modification and improves density in the region of
the model but is not strong enough to improve density elsewhere.

What to try to improve your map

Usually you should first just run with all defaults, supplying two half-maps and a sequence file and nominal resolution. See how the map looks and use it as a baseline. Then some things to try to improve the map are:

Add a full map file to recombine it with density-modified half-maps. This
can be helpful if your full map has been processed in a way that makes it
much better than an average of your half-maps.

Add a mask file to exclude part of the map. If your map has some huge noise
peaks or just density that is not part of your molecule, you can try to
exclude it by supplying a mask file (a map file with 1 where your molecule is
located and 0 elsewhere).

If you have a model, you can supply the model and it will be used to generate
a multi-model representation of your molecule. This multi-model representation
will be used in density modification. Note that you should supply a
symmetry file if there is symmetry in your reconstruction so that only
the unique part needs to be analyzed.

If you don't have a model but your resolution is high (about 3.5 A or better),
you can improve your map by automatically generating a multi-model
representation of your structure. This is used in the same way as the
multi-model representation starting with a supplied model, and a symmetry
file is helpful to identify just the unique part of the molecule.

You can try changing
the resolution (overall) and the density-modification resolution.  Normally
the resolution is the gold-standard resolution from comparing your half-maps.
However varying this may help.  The density-modification resolution is
normally set automatically to be somewhat finer than the overall resolution,
but its exact value does have an effect so you could try different values.

You can try changing some of the defaults that occasionally help. One is
setting density_modify_unsharpened_maps=True (density modifies the original
half-maps without auto-sharpening them). Another is setting
final_scale_with_fsc_ref=False (applies final resolution-dependent scaling
that is part-way between standard half-map FSC-based sharpening and
constant amplitudes over all resolutions).
These usually are best for high_resolution maps (< 3 A).

Real_space_weighting, sigma_weighting, and spectral_scaling may improve the
final map.  Real-space weighting weights original and density-modified maps
based on their local uncertainties.  Sigma-weighting weights them by
uncertainties in individual Fourier coefficients.  Spectral scaling scales the
final map by resolution to match the resolution-dependence of a typical
protein.

How to tell how well density modification is working

Density modification changes the sharpening of your map, so just because the map looks better or worse doesn't necessarily mean that anything important has happened.

If you have two maps and you want to know which is the better one, here are some things you can do:

1. Create a matched version of your maps that have the same
resolution-dependence using phenix.map_sharpening. Run it like this:

phenix.map_sharpening n_bins=100 auto_sharpen_methods=external_map_sharpening
     external_map_file=target_map.map map_to_change.map resolution=4.1
     sharpened_map_file=map_to_change_matched.map

Now you can look at target_map.map and map_to_change_matched.map and
differences you see are due to intrinsic properties of the maps, not just
sharpening.

2. If you have a good model, refine the model against each map.  Then use
phenix.mtriage to estimate the resolution where the FSC(map,model)==0.5
for each map. The map with a lower value of this resolution may be better.

3. As in #2, once you have a pair of maps and a model refined against
each map, you can run phenix.map_sharpening with each map and model and
it will print out the average FSC for each.  The one with the higher
overall FSC may be better.

Possible Problems

If the half-maps have been masked the procedure may not work well.

If the solvent noise is very non-uniform the procedure may work poorly. By default a rectangular solid region enclosing the molecule is cut out and used in density modification. You can supply a boxed map and set the keyword box_before_analysis=False to avoid this.

If the maps have very prominent density away from the macromolecule this may interfere with density modification. You can get around this by supplying a mask (as a map, 1=inside the molecule).

If there is non-macromolecule but real density in the maps this may interfere with density modification (for example, lipid density).

Specific limitations and problems:

If your computer does not have a lot of memory and you use multiple processors (nproc bigger than 1) then resolve_cryo_em can crash. If this happens try nproc=1.

Density modification introduces some correlations between half-maps due to solvent flattening. This can have a small effect on the resolution estimates obtained with half-map FSC. The resolution estimates provided by the program are corrected for this effect.

If you use the real_space_weighting or sigma_weighting or sharpening_type=local_final_half_map options there may be some extra correlations between half-maps introduced. Calculating resolution using FSC between these density-modified maps can lead to overstating the resolution. The resolution estimates provided by the program are before applying these weighting schemes (unless you specify local_methods_final_cycle = False and run multiple cycles) so they are not normally affected by this.

The density modification procedure works best in the resolution range of about 4.5 A or better, but occasionally works very well at lower resolutions (not tested beyond 9 A). The procedure does not work in every case (we do not have a great metric for success but about half of cases appear to be improved according to the metrics described above). Varying the parameters can have a substantial effect on the outcome.

Model-based density modification necessarily biases the map towards the models that are built. By building multiple models, the effect of this bias is reduced but not eliminated. For example if the starting map has an error that causes models to be built with a side chain the wrong place, the new model- based density will show even more density in that location. That is, this process can accentuate errors in the original maps if they match plausible locations of atoms in a model. Balancing this, the procedure generally does not appear to yield density at positions where there is no density in the original map even if atoms are located there in a model. It is important that the original or non-model-based maps be consulted to evaluate any specific density in the map.

Literature

Additional information

List of all available keywords