Running AlphaFold2 in Google Colab

You can create a new AlphaFold model using cloud computing and a simple web interface available from the Phenix GUI.

How to run AlphaFold on Colab

All you need is the sequence of your protein in 1-letter code.
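
If your sequence comes from a FASTA file or has spaces and line breaks in it, you may want to tidy it before pasting. The following is a minimal sketch, not part of Phenix or the notebook: the function name clean_sequence is hypothetical, and restricting the input to the 20 standard amino-acid letters is an assumption (the notebook may accept other codes).

```python
# Hypothetical helper, not part of Phenix or the Colab notebook.
# Assumption: only the 20 standard amino-acid letters are wanted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace; return an upper-case sequence."""
    lines = [ln.strip() for ln in raw.splitlines()]
    # Skip blank lines and FASTA headers (lines starting with '>').
    lines = [ln for ln in lines if ln and not ln.startswith(">")]
    # Remove any remaining internal whitespace and normalize case.
    seq = "".join("".join(lines).split()).upper()
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError("Non-standard residue codes: " + "".join(bad))
    return seq
```

For example, clean_sequence applied to a FASTA record returns a single upper-case string that can be pasted into the form.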

To access the site, you can use the "AlphaFold2 in CoLab" button in the Phenix GUI or you can go directly to the Phenix AlphaFold Colab notebook.

Then you paste your sequence into the form and type in a job name. You submit that information by hitting the Run button (a circle with a right-arrow in it) next to the form.

Then you load up AlphaFold by hitting the next Run button, and finally you run AlphaFold by hitting the third Run button. That's it. Loading AlphaFold can take a couple of minutes, and running it usually takes 10-15 minutes for a moderately-sized protein (350 residues or so).

Your AlphaFold model is downloaded automatically along with some informational files as a zip file.
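
Once one or more zip files have been downloaded, you can unpack them with a few lines of Python. This is only a sketch using the standard zipfile module: the function name unpack_results and the directory layout are hypothetical, and nothing is assumed about the names of the files inside each zip file.

```python
import zipfile
from pathlib import Path


def unpack_results(download_dir, out_dir):
    """Extract every .zip in download_dir into out_dir/<zip name without .zip>/.

    Returns the list of directories that were written.
    """
    out = Path(out_dir)
    extracted = []
    for zpath in sorted(Path(download_dir).glob("*.zip")):
        target = out / zpath.stem  # one subdirectory per zip file
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zpath) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Keeping one subdirectory per zip file avoids overwriting files when several predictions contain files with the same names.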

This version of AlphaFold automatically scans the PDB for templates that are similar in sequence to your sequence and includes them in the prediction. This is in addition to finding all the sequence homologues as part of the core prediction method in AlphaFold.

Note: There is a second button in the GUI labelled "AlphaFold2 with template in CoLab". That is the button you want to use if you already have a model and want to improve it.

Advanced use of the AlphaFold notebook

You can run multiple sequences sequentially with this AlphaFold notebook. You supply a file that has one job name and one sequence on each line. These are then used as input to AlphaFold, one at a time. You get back one zip file for each sequence you submit.
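
A file like this can be written by hand or generated. The sketch below builds the text of such a file from a dictionary of job names and sequences. The function name format_jobs is hypothetical, and separating the job name from the sequence with a single space is an assumption; check the notes in the notebook for the exact format it expects.

```python
def format_jobs(jobs):
    """Return text with one 'job_name sequence' pair per line.

    Assumption: job name and sequence are separated by a single space.
    """
    lines = []
    for name, seq in jobs.items():
        # A job name containing whitespace would be ambiguous on a one-line record.
        if not name or any(c.isspace() for c in name):
            raise ValueError("Job name must be non-empty with no whitespace: %r" % name)
        # Strip whitespace from the sequence and normalize to upper case.
        lines.append(name + " " + "".join(seq.split()).upper())
    return "\n".join(lines) + "\n"
```

The returned text can be saved with, for example, pathlib.Path("jobs.txt").write_text(...), where jobs.txt is just a placeholder file name.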

You can also run multiple sequences by simply entering another sequence and job name after you hit the Run button to load your first sequence and job name. You can do this as many times as you want.

About Google Colab

Google Colab is a really nice system for sharing and running software. Anyone with a Google login can run, share and create notebooks on Colab.

Colab notebooks have one or more 'cells' that each do something and set things up for the next cell. You can run an individual cell with the Run button next to it. You can also run all the cells in order with the "Run all" item in the "Runtime" pull-down menu. This notebook can be run that way too: just put in your sequence and job name and hit "Run all".

Notebooks can have inputs like numbers or sequences or places to upload files. The inputs can be available at the beginning or may appear during execution.

If something goes wrong, you can (sometimes) fix an input and re-run the cell where the problem occurred, or all cells starting there.

You do have to pay a little attention to exactly which notebook you are running. Because notebooks can be easily changed and shared, there can be many versions of a notebook (as is the case for AlphaFold).

Possible problems

If something goes wrong with the run, you can just reload the site and start again from the beginning.

The Colab notebook can crash or time out at any time. That means that if you are running multiple predictions you could lose a lot of work. You can mitigate this problem by manually downloading results as they appear (using the folder icon on the left side of the notebook, selecting a .zip file to show a download menu, and downloading the file).

See the notes at the bottom of the Colab notebook for more hints.

Background

The Google DeepMind software AlphaFold2 can be run on Google's cloud computing resources, using a Google login and Google's Colab notebook service. The AlphaFold team created an AlphaFold Colab notebook and the ColabFold team created a simpler version called ColabFold: AlphaFold2 w/ MMseqs2.

The notebook for Phenix is a further simplified version of the ColabFold notebook suitable for use with Phenix.

Non-commercial use only

Google DeepMind has made the code for AlphaFold2 open source and available to anyone. Note, however, that the AlphaFold2 database of parameters (required for use of AlphaFold2) is licensed only for non-commercial use. That means that use of this notebook is limited to non-commercial purposes (NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation).

Required citations for using the AlphaFold Colab notebook

If you use a model from the AlphaFold Colab notebook, be sure to cite the following two publications:

  1. The AlphaFold2 paper:

    Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  2. The ColabFold notebook on which the Phenix AlphaFold notebook is based:

    Mirdita, M., Ovchinnikov, S., Steinegger, M. ColabFold - Making protein folding accessible to all. bioRxiv 2021.08.15.456425.