AlphaFold and Phenix

You can use predicted models from AlphaFold and other structure-prediction software in Phenix. These models can be very helpful in structure determination because they are often very accurate over much of their length, and they come with accuracy estimates that let you remove poorly predicted regions.

General procedure for using AlphaFold models in Phenix

To use AlphaFold models in Phenix you can follow this overall procedure:

1. Get an AlphaFold model (or a model from the PDB) for each chain in your structure. You can use the "AlphaFold in Colab" button in the Phenix GUI to do this using the Google Colab notebook system. (See also the documentation for running AlphaFold).

2. Trim the model and break it into rigid domains. You can use phenix.process_predicted_model to do this.

3. Dock your models (cryo-EM) or carry out molecular replacement (crystallography) to place your models in the right places in your map or unit cell. You can use phenix.dock_predicted_model (cryo-EM) or phenix.phaser (crystallography) to do this.

4. Fill in the missing parts of your models with loop fitting or iterative model-building. You can do this with phenix.rebuild_predicted_model for cryo-EM and phenix.autobuild for crystallography.

5. Refine the rebuilt predicted models that you obtain. You can use phenix.real_space_refine for cryo-EM models and phenix.refine for crystallographic models.

6. Examine your resulting model in detail, using the validation tools that are part of phenix.real_space_refine and phenix.refine to help you identify problem areas and using manual model-building tools to fix them.
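Step 2 relies on the per-residue confidence values that come with a prediction. As a minimal sketch of the trimming idea (not the implementation in phenix.process_predicted_model), the Python below assumes an AlphaFold-style PDB file in which the pLDDT score (0 to 100) is stored in the B-factor column, and drops atoms below a cutoff. The function name `trim_low_confidence` and the cutoff of 70 are illustrative choices, not Phenix defaults.

```python
def trim_low_confidence(pdb_lines, min_plddt=70.0):
    """Return the PDB lines with low-confidence atoms removed.

    Assumes the AlphaFold convention of writing pLDDT (0-100) into the
    B-factor field (columns 61-66). Non-coordinate lines such as TER and
    END are kept unchanged. The default cutoff of 70 is an example value.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            plddt = float(line[60:66])  # B-factor field holds pLDDT here
            if plddt < min_plddt:
                continue  # poorly predicted atom: leave it out
        kept.append(line)
    return kept


# Example use (the file name is hypothetical):
# with open("model.pdb") as f:
#     trimmed = trim_low_confidence(f.read().splitlines())
```

Because AlphaFold writes the same pLDDT for every atom of a residue, filtering atoms this way removes whole residues. If your prediction tool stores confidence as a fraction from 0 to 1, scale the cutoff accordingly.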

Advanced steps

Iteration of AlphaFold and model-building with experimental data

You may be able to improve your rebuilt AlphaFold model by repeating the AlphaFold step with your rebuilt model supplied as a template (and no other templates). You can do this with the "AlphaFold in Colab" button in the Phenix GUI, which uses the Google Colab notebook system (see also the documentation for running AlphaFold).

Structures with multiple chains

If your structure has more than one chain, you will need to carry out some additional steps. For a crystal structure, you'll want to generate a processed AlphaFold (or other) model for each chain and supply all of them to phenix.phaser for molecular replacement, usually all at once.

For a cryo-EM structure, you can work with one chain at a time. You can use the whole map for each chain, or if you have some idea of what chain goes where, you can mask out or box the map so that it shows only one chain and use that as your map. Boxing or masking the map can speed up the process and improve the result considerably.
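Boxing the map around one chain requires a region that encloses that chain. If you already have a rough placement for the chain (for example, from an earlier docking run), a minimal sketch like the one below computes an axis-aligned bounding box around its atoms with some padding. The function name and the 5 Å padding are assumptions for illustration; in practice you would give such a region to a Phenix map-boxing or masking tool rather than use this code directly.

```python
def chain_bounding_box(pdb_lines, chain_id, padding=5.0):
    """Return ((xmin, ymin, zmin), (xmax, ymax, zmax)) in Å for one chain.

    Reads x, y, z from the fixed PDB columns 31-54 and the chain ID from
    column 22, then expands the box by `padding` Å on every side.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] == chain_id:
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    if not coords:
        raise ValueError(f"no atoms found for chain {chain_id!r}")
    # Smallest and largest coordinate along each axis, padded outward
    lower = tuple(min(c[i] for c in coords) - padding for i in range(3))
    upper = tuple(max(c[i] for c in coords) + padding for i in range(3))
    return lower, upper
```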

Boxing maps before rebuilding

For complex structures with many chains or with chains that contain domains with long linkers, docking can be very complicated and take a long time. In these cases it may be especially helpful to box or mask the map if you can do that. If you cannot, you might want to run the docking step individually with each domain that you get from phenix.process_predicted_model and then examine where they went in the map. If the domains seem to correspond to different molecules, you might want to mask out the part of the density that corresponds to the molecule you don't want to fit and re-try. You can also try running phenix.dock_in_map with one domain at a time and ask to find multiple copies; then you can choose the one that matches up with the other domains you have placed.
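When phenix.dock_in_map returns several copies of one domain, one simple way to pick the copy that matches up with the domains you have already placed is to prefer the copy lying closest to them. The sketch below applies that rule by comparing domain centers. The rule itself and the names `center` and `pick_copy` are hypothetical; you may also want to check linker lengths and the fit to density by eye before accepting a choice.

```python
import math


def center(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def pick_copy(candidate_centers, placed_centers):
    """Return the index of the candidate copy nearest to any placed domain.

    Hypothetical selection rule: a copy whose center is close to an
    already placed domain is the most plausible neighbor in the chain.
    """
    def distance_to_nearest_placed(c):
        return min(math.dist(c, p) for p in placed_centers)

    return min(
        range(len(candidate_centers)),
        key=lambda i: distance_to_nearest_placed(candidate_centers[i]),
    )
```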

Running rebuilding in a single step

For a cryo-EM structure, you can carry out steps 2-4 in one step with phenix.dock_and_rebuild. This just links the processing, docking, and rebuilding steps together.

Background

Structure prediction software is now capable of generating models that are highly accurate over some or all parts of the models. Importantly, these predictions often come with reliable residue-by-residue estimates of uncertainty.

Compact domains in these predicted models in which all the residues have high confidence are often very accurate over the entire domain. However, separate domains that each have high confidence but are connected by lower-confidence residues sometimes have relative positions and orientations that differ between the predicted and experimentally determined structures.
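To make the idea of splitting at low-confidence linkers concrete, the sketch below takes per-residue pLDDT values in chain order and returns the contiguous runs of confident residues, treating each sufficiently long run as a candidate domain. This is a deliberately simplified, sequence-only rule assumed for illustration: it ignores spatial compactness and is not necessarily how phenix.process_predicted_model defines domains. The cutoff of 70 and the minimum length of 20 residues are arbitrary example values.

```python
def high_confidence_segments(residues, min_plddt=70.0, min_length=20):
    """Split a chain into contiguous runs of confident residues.

    residues: list of (residue_number, plddt) pairs in chain order.
    Returns (first_residue, last_residue) for each run of at least
    min_length residues; low-confidence stretches act as separators.
    """
    segments = []
    run = []  # residue numbers in the current confident run
    for resnum, plddt in residues:
        if plddt >= min_plddt:
            run.append(resnum)
        else:
            # A low-confidence residue ends the current run
            if len(run) >= min_length:
                segments.append((run[0], run[-1]))
            run = []
    # Close a run that extends to the end of the chain
    if len(run) >= min_length:
        segments.append((run[0], run[-1]))
    return segments
```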

When using predicted models as a starting point for experimental structure determination, it can be helpful to:

Remove low-confidence residues entirely

Break up the model into domains and allow the domains to have different orientations

For a high-confidence predicted model, you might try using the predicted model as-is first. For most predicted models, you may want to try removing low-confidence residues, then additionally try breaking the model into domains and placing the domains one at a time.

An important feature of recent predicted models is that their sequence alignment is generally very accurate: the assignment of the sequence to the high-confidence parts of the model is usually correct. This can make a very big difference in completing the rest of the structure (the parts that were not predicted with high confidence), because you know exactly which residues go in the gaps. As a result, the remainder of the structure can often be completed with loop-fitting tools instead of rebuilding everything.
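Because the sequence assignment of the confident regions is usually right, working out what has to go in each gap is simple bookkeeping. The sketch below assumes that the model's residue numbers map directly onto positions in the full sequence (residue `first_residue` is the first letter) and lists each missing stretch together with the residues that belong there. The function name `missing_segments` is hypothetical.

```python
def missing_segments(present_residues, sequence, first_residue=1):
    """List the stretches of the sequence that are absent from the model.

    Assumes residue number r corresponds to sequence[r - first_residue].
    Returns (start, end, subsequence) for each run of missing residues.
    """
    present = set(present_residues)
    last = first_residue + len(sequence) - 1
    gaps = []
    start = None  # first residue of the gap currently being scanned
    for resnum in range(first_residue, last + 1):
        if resnum not in present:
            if start is None:
                start = resnum
        elif start is not None:
            # Gap ended just before this residue
            seq = sequence[start - first_residue:resnum - first_residue]
            gaps.append((start, resnum - 1, seq))
            start = None
    if start is not None:
        # Gap runs to the C-terminus
        gaps.append((start, last, sequence[start - first_residue:]))
    return gaps
```

Gaps at the chain ends are reported too, even though they are anchored to the model on only one side.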

What to do after applying Phenix tools to predicted models

While AlphaFold and other predicted models can be quite accurate overall, some details in otherwise-accurate regions, and sometimes whole regions, can be incompatible with your experimental data. The Phenix procedures that use predicted models as starting points for building models against your experimental data are designed to keep the accurate parts of the predicted models and to replace the inaccurate parts. Depending on the resolution of your data, the automatically produced models may be quite accurate or may themselves need a lot of trimming and rebuilding.

Once you have used Phenix tools to modify a predicted model based on experimental data, you will want to analyze the resulting model carefully, comparing every detail to the experimental map or data. You will generally want to use manual model-building tools such as Coot or ISOLDE to fix the small and large errors that remain in the model.

You can also iterate the AlphaFold prediction, using your rebuilt model as an input to a new round of AlphaFold (see also the documentation for running AlphaFold).

Literature

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

Hiranuma, N., Park, H., Baek, M. et al. Improved protein structure refinement guided by deep learning based accuracy estimation. Nat Commun 12, 1340 (2021). https://doi.org/10.1038/s41467-021-21511-x