Docking processed AlphaFold2 and other predicted models in cryo-EM maps

Author(s)

Purpose

dock_predicted_model docks the domains of a model produced by AlphaFold, RoseTTAFold, or other prediction software into a cryo-EM map. It uses the connectivity of the model as a restraint during docking, so that the docked domains normally end up in a reasonable arrangement. It can take map symmetry into account.

The dock_predicted_model program is normally the second step in working on an AlphaFold or other predicted model with a cryo-EM structure.

The first step is to process the predicted model with phenix.process_predicted_model, which trims off the uncertain residues and breaks the remaining structure into a best guess of rigid domains.

The next step is to dock each of the domains of the processed model into the map, keeping plausible connectivity. This is done with phenix.dock_predicted_model.

The third step is to morph the predicted model onto the docked domains and then to rebuild all the parts of the predicted model using the density in the map. This is done with phenix.rebuild_predicted_model.

How dock_predicted_model works:

The model input to dock_predicted_model is a model file derived from your starting predicted model (for example, an AlphaFold model), split so that each chain represents one domain (rigid unit) of the predicted model, and with all uncertain residues removed. This is normally the output of phenix.process_predicted_model.

Note that you can work with one chain at a time even in a cryo-EM map that has many chains.

The map input to dock_predicted_model is normally your best sharpened or density-modified cryo-EM map. It can also be a map generated by any other procedure (including crystallography).

If you are able to mask your map, keeping only the part representing the region where this model (one chain) belongs, that can be very helpful. You can also box the map around this region. If you have a map that has many chains, this masking can greatly shorten the time for docking. If you don't know at all where in your map the model goes, you can supply the entire map. If your map has symmetry, this will normally be found automatically.

The first step in docking the predicted model is to extract the unique part of the map. If the map is asymmetric, this is the region where there is density. If it is symmetric, it is the unique part of the density, chosen to make a compact unit. Note that this step may not be perfect, particularly in the case of a very asymmetric molecule. If you have the time, you may be able to do much better by segmenting the map by hand and selecting just the region representing your molecule.

The next step is to sort the domains in the processed model by size and dock them in the map. Each domain is docked and refined by rigid-body refinement. Then the transformation obtained for that domain is applied to all the other domains as a quick guess as to how they should be placed. If they match the density, they are refined as well. If a complete model can be obtained by connecting these pieces (with distances between residues in one domain and the next residues in another domain compatible with the number of residues in between that are missing), then the docking is complete. If it is not, additional domains are docked, until all have been tried.
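The connectivity test described above can be illustrated with a minimal Python sketch. This is not the dock_predicted_model implementation: the function names, the dictionary layout for a domain, and the 3.8 Angstrom per-residue spacing limit are all assumptions chosen for illustration. It only checks that the distance between the end of one docked domain and the start of the next could be spanned by the residues missing in between.

```python
import math

# Assumed upper bound on the distance covered per residue (roughly the
# CA-CA spacing of an extended chain). Illustrative value only, not a
# dock_predicted_model parameter.
MAX_CA_SPACING = 3.8  # Angstroms


def gap_is_bridgeable(end_xyz, start_xyz, end_resno, start_resno,
                      max_spacing=MAX_CA_SPACING):
    """Return True if the missing residues could span the gap between two domain ends."""
    # Number of CA-CA links between the two residues, counting missing ones.
    n_links = start_resno - end_resno
    if n_links < 1:
        return False
    distance = math.dist(end_xyz, start_xyz)
    return distance <= n_links * max_spacing


def domains_connect(domains, max_spacing=MAX_CA_SPACING):
    """Check every consecutive pair of domains, ordered by residue number.

    Each domain is a dict (hypothetical layout) with keys
    first_resno, last_resno, first_xyz, last_xyz.
    """
    ordered = sorted(domains, key=lambda d: d["first_resno"])
    return all(
        gap_is_bridgeable(a["last_xyz"], b["first_xyz"],
                          a["last_resno"], b["first_resno"], max_spacing)
        for a, b in zip(ordered, ordered[1:])
    )
```

In this sketch, two domain ends that are 10 Angstroms apart with four links between them pass the test, while ends 30 Angstroms apart with the same number of links do not.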

When all domains are placed, all the residues in the domains are sorted and a single clean chain with chain ID from the original predicted model and gaps corresponding to the processed model is produced.

Examples

Standard run of dock_predicted_model:

Running dock_predicted_model is easy. From the command line you can type:

phenix.dock_predicted_model model=my_model.pdb \
   processed_model_file=my_model_processed.pdb map_file=my_map.ccp4 \
   resolution=3

This will dock the domains (one for each chain ID) from my_model_processed.pdb into my_map.ccp4. It will rename all the chains to match the chain in my_model.pdb and arrange all the residues in order.

Possible Problems

For complicated molecules, this procedure may not work well because it does not examine every possible docking position of every domain, and a complex structure might have more than one copy of a particular chain. You can greatly improve the chance of success if you are able to mask the map around one copy of the chain you are looking for.

If that doesn't work, you can try to dock the domains from your AlphaFold model using phenix.dock_in_map, asking for multiple copies of the domains, and then manually choose which domains go together. You can then create one PDB file with one copy of each domain and go on to phenix.rebuild_predicted_model with that as your docked model. This PDB file must have just one chain ID and all residues arranged sequentially; gaps are allowed.
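As a rough illustration of preparing such a file by hand, the following Python sketch takes the text of a PDB file containing one copy of each chosen domain, sets a single chain ID, and orders the atoms by residue number. The function name is hypothetical and this is not a Phenix tool. It assumes standard fixed-column ATOM/HETATM records and residue numbers that are unique across the domains; it ignores insertion codes and does not renumber atom serial numbers.

```python
def merge_domains_to_single_chain(pdb_text, chain_id="A"):
    """Return PDB text with one chain ID and atoms ordered by residue number."""
    # Keep only coordinate records; other records are dropped in this sketch.
    atom_lines = [
        line for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
    # Residue number is in columns 23-26 (0-based slice 22:26). The sort is
    # stable, so the atom order inside each residue is preserved.
    atom_lines.sort(key=lambda line: int(line[22:26]))
    # Chain ID is column 22 (0-based index 21).
    merged = [line[:21] + chain_id + line[22:] for line in atom_lines]
    return "\n".join(merged + ["TER", "END"]) + "\n"
```

A typical use would be to read the hand-assembled file, pass its text through this function with the chain ID of the original predicted model, and write the result to a new file to supply as the docked model.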

Specific limitations and problems:

Literature

Additional information

List of all available keywords