Rebuild predicted model morphs and rebuilds a model produced by AlphaFold, RoseTTAFold, or other prediction software to fit a cryo-EM map, using a set of docked domains from the predicted model as a template.
The rebuild_predicted_model program is normally the third step in working on an AlphaFold or other predicted model with a cryo-EM structure.
The first step is to process the predicted model by trimming off all the uncertain residues and breaking up the remaining structure into a best guess of rigid domains. This is done with phenix.process_predicted_model.
The second step is to dock each of the domains of the processed model into the map, keeping plausible connectivity. This is done with phenix.dock_predicted_model.
The third step is to morph the predicted model onto the docked domains and then to rebuild all the parts of the predicted model using the density in the map. This is done with phenix.rebuild_predicted_model.
The rebuild_predicted_model procedure uses three pieces of input information.
The first is the starting predicted model (AlphaFold model). This model is assumed to generally be quite accurate and to have a chain with all the right residues, but in which some parts of the model are not useful. Further, it is assumed that the predicted model was supplied with a measure of residue accuracy (as AlphaFold models are) so that poorly-predicted residues can be identified and fixed. Note that you will normally work with a single chain at a time in this procedure even if your map has multiple chains in it.
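As a rough illustration of how poorly-predicted residues can be identified from a per-residue accuracy measure, the sketch below reads the B-factor column of a PDB file, which is where AlphaFold writes its pLDDT confidence value. The function name and the cutoff of 70 are assumptions made for this example only; they are not the criteria used by the Phenix tools, and other prediction software may store a different kind of value in that column.

```python
def low_confidence_residues(pdb_lines, plddt_cutoff=70.0):
    """Return sorted (chain, resseq) pairs whose CA pLDDT is below the cutoff.

    Assumes AlphaFold-style output: pLDDT (0-100, higher is better) stored
    in the B-factor field (columns 61-66) of standard PDB ATOM records.
    The default cutoff is an illustrative assumption.
    """
    flagged = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                      # skip HETATM, TER, REMARK, ...
        if line[12:16].strip() != "CA":
            continue                      # one value per residue is enough
        chain = line[21]                  # column 22: chain ID
        resseq = int(line[22:26])         # columns 23-26: residue number
        plddt = float(line[60:66])        # columns 61-66: B-factor field
        if plddt < plddt_cutoff:
            flagged.add((chain, resseq))
    return sorted(flagged)
```

Calling it with `open("my_model.pdb").readlines()` would list the residues that a trimming step might remove before docking.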
The second piece of information is a docked set of domains from this predicted model. These domains can be any parts of the model and they are assumed to be the accurate parts of the model. They have to be placed in about the right places in the map but not every detail needs to be correct. This docked model can be the output of phenix.dock_predicted_model but it can come from any procedure that yields a single chain matching the chain in your predicted model (but it may have gaps).
The third piece of information is a map. Normally this is a cryo-EM map but it can be any map that was used to dock the domains. The map input to rebuild_predicted_model is normally your best sharpened or density-modified cryo-EM map. It can also be a map generated by any other procedure (including crystallography).
Using a masked map in rebuild_predicted_model can be somewhat helpful but it is not nearly as helpful as in the docking step.
The first step in rebuilding the predicted model is to morph the model to match the docked domains. In essence this means that the parts of the model that match a docked domain superimpose on the docked domain, and the residues in between are stretched to span the gap. These residues between domains may be in totally implausible arrangements at this step. They are serving as markers for where a chain goes, not an actual tracing.
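The idea of stretching the in-between residues to span a gap can be pictured with a very simple sketch: place the connecting residues evenly along the straight line between the last residue of one docked domain and the first residue of the next. This is only an illustration of the "marker" concept under that simplifying assumption; the function name is hypothetical and the actual morphing in rebuild_predicted_model is not claimed to work this way.

```python
def span_gap(start, end, n_between):
    """Place n_between points evenly on the straight line from start to end.

    start, end: (x, y, z) coordinates of the residues flanking the gap,
    taken from two docked domains. The returned points are placeholders
    marking where the chain goes, not a plausible tracing.
    """
    points = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)           # fraction of the way across the gap
        points.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return points
```

For example, `span_gap((0, 0, 0), (4, 0, 0), 3)` places three marker positions one unit apart between the two flanking residues.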
The next step is to rebuild each docked domain and each connecting loop in several ways, scoring each one based on fit to the map. The methods used to rebuild are:
- Simple refinement.
- Iterative resolution-dependent refinement. Starting from low resolution, carry out refinement, shift to higher resolution and repeat.
- Simple loop-fitting. Use `phenix.fit_loops <fit_loops.html>`_ to re-fit each loop. The number of residues in the loop remains fixed.
- Trace-through-density loop fitting. Use a procedure for finding a connection through high density to trace the possible path for a loop, and then build a chain to match that tracing.
- Extend ends. If residues are poorly defined at the ends, try to extend the ends of the model in the N- or C-terminal direction, starting at the last residues that are accurately placed.
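The low-to-high resolution progression used in iterative resolution-dependent refinement can be sketched as a simple schedule of resolution values, one per refinement cycle. The function name, the evenly spaced steps, and the example starting resolution are all assumptions for illustration; the schedule actually used by the program is not described here.

```python
def resolution_schedule(start, target, n_cycles):
    """Resolutions (in Angstroms) from coarse `start` to fine `target`.

    Larger numbers mean lower resolution, so the list decreases.
    Evenly spaced steps are an illustrative assumption.
    """
    if n_cycles < 2:
        return [target]
    step = (start - target) / (n_cycles - 1)
    return [round(start - i * step, 2) for i in range(n_cycles)]
```

A refinement loop would then run one cycle at each value in turn, for instance `for d_min in resolution_schedule(9.0, 3.0, 4): ...`.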
This procedure yields hypotheses for each segment in the model. The final step is to simply merge all the best segments into one composite model. The final model is not refined further, so additional refinement will be useful.
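The merge step described above amounts to picking, for each segment, the hypothesis with the best map-fit score and joining the winners in chain order. The sketch below shows that selection with plain Python data; the data layout, the function name, and the convention that a higher score means a better fit are assumptions for this example.

```python
def merge_best_segments(candidates_by_segment):
    """Build a composite model from the best-scoring candidate per segment.

    candidates_by_segment: list, in chain order, where each entry is a
    list of (score, residues) hypotheses for one segment. A higher score
    is assumed to mean a better fit to the map.
    """
    composite = []
    for candidates in candidates_by_segment:
        best_score, best_residues = max(candidates, key=lambda c: c[0])
        composite.extend(best_residues)   # keep segments in chain order
    return composite
```

Because no refinement happens after the pieces are joined, the junctions between segments taken from different hypotheses are one place where later refinement is likely to help.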
Running rebuild_predicted_model is easy. From the command-line you can type:
phenix.rebuild_predicted_model model=my_model.pdb \
  processed_model_file=my_model_processed.pdb map_file=my_map.ccp4 \
  resolution=3
This will morph my_model.pdb onto the domains (one for each chain ID) in my_model_processed.pdb and rebuild it using the density in my_map.ccp4. It will rename all the chains to match the chain in my_model.pdb and arrange all the residues in order.
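If you prefer to launch the command above from a script, a minimal sketch is shown below. It only assembles the same keyword=value arguments that appear in the example command and hands them to the shell program; the helper name is hypothetical, and the script assumes that phenix.rebuild_predicted_model is on your PATH (it just prints the command if it is not).

```python
import shutil
import subprocess


def build_rebuild_command(model, processed_model, map_file, resolution):
    """Return the argument list for the example command shown above.

    Only the keywords that appear in that example are used here.
    """
    return [
        "phenix.rebuild_predicted_model",
        f"model={model}",
        f"processed_model_file={processed_model}",
        f"map_file={map_file}",
        f"resolution={resolution}",
    ]


if __name__ == "__main__":
    cmd = build_rebuild_command(
        "my_model.pdb", "my_model_processed.pdb", "my_map.ccp4", 3
    )
    if shutil.which(cmd[0]) is None:
        # Phenix is not set up in this environment; show what would run.
        print("Not found on PATH; would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.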