AlphaFold and Phenix

You can use the predicted models from AlphaFold and other prediction software in Phenix. Using these models can be very helpful in structure determination because the models can be very accurate over much of their length and the models come with accuracy estimates that allow removal of poorly-predicted regions.

General procedure for using AlphaFold models in Phenix

To use AlphaFold models in Phenix you can follow this overall procedure:

1. Get an AlphaFold model (or a model from the PDB) for each chain in your structure. You can use the "AlphaFold in Colab" button in the Phenix GUI to do this using the Google Colab notebook system. (See also the documentation for running AlphaFold).

2. Trim the model and break into rigid domains. You can use phenix.process_predicted_model to do this.

3. Dock your models (cryo-EM) or carry out molecular replacement (crystallography) to place your models in the right places in your map or unit cell. You can use phenix.dock_in_map (cryo-EM) or phenix.phaser (crystallography) to do this.

4. Fill in the missing parts of your models with loop fitting or iterative model-building. You can do this with phenix.fit_loops for cryo-EM and phenix.autobuild for crystallography.

Background

Structure prediction software is now capable of generating models that are highly accurate over some or all parts of the models. Importantly, these predictions often come with reliable residue-by-residue estimates of uncertainty.

Compact domains in these predicted models in which all the residues have high confidence often will be very accurate over the entire domains. However, separate domains that each have high confidence but are connected by lower confidence residues sometimes have relative positions and orientations that differ between predicted and experimentally-determined structures.

When using predicted models as a starting point for experimental structure determination, it can be helpful to:

Remove low-confidence residues entirely

Break up the model into domains and allow the domains to have
different orientations

For a high-confidence predicted model, you might try using the predicted model as-is first. For most predicted models, you may want to try removing low-confidence residues, then additionally try breaking the model into domains and placing the domains one at a time.

An important feature of recent predicted models is that they generally have very accurate sequence alignment. That means that the assignment of the sequence to the high-confidence parts of the model is usually correct. This can make a very big difference in completion of the remainder of the structure (the parts that were not predicted with high confidence) because you know exactly what residues go in the gaps. This means that model-building of the remainder of the structure can often be completed with loop-fitting tools instead of trying to rebuild everything.

Literature

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

Hiranuma, N., Park, H., Baek, M. et al. Improved protein structure

refinement guided by deep learning based accuracy estimation. Nat Commun 12, 1340 (2021). https://doi.org/10.1038/s41467-021-21511-x