Running AlphaFold2 in Google Colab

You can create a new AlphaFold model using cloud computing and a simple web interface available from the Phenix GUI.

You can also run AlphaFold and iteratively improve the AlphaFold models by including information from a density map.

How to run AlphaFold with a density map

You will need:

The 1-letter sequence of your protein (one chain)

A density map. Normally you should use phenix.map_box to cut out the
part of your map that includes the protein to be modelled.
You can include your entire map instead, but this makes it very slow
to upload and slow to run.

The resolution of your map.

A Phenix download password. This is the password you or someone from
your institution has used to download Phenix.  It is updated each week
so you may need to get a new one rather frequently.  You can get your
password from https://phenix-online.org/download .

To access the site, you can use the "AlphaFold with density map" button in the Phenix GUI. This takes you to a Google Colab notebook (running on Google GPU machines in the cloud).

You first reboot the virtual machine by hitting the Run button next to the first cell in the notebook.

Then you paste in the sequence, set the resolution, choose a job name (like myjb_any_text_here, where the 4 characters before the first underscore define the job name), and enter the download password.
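The job-name convention can be illustrated with a minimal sketch. It assumes the job name is simply whatever comes before the first underscore; the function name job_prefix is hypothetical and this is not the notebook's own code.

```python
def job_prefix(job_name: str) -> str:
    """Return the part of a job name that precedes the first underscore.

    Assumption: the prefix is everything before the first underscore.
    The notebook's actual parsing is not shown in this document.
    """
    prefix = job_name.split("_", 1)[0]
    if not prefix:
        raise ValueError("job name must not start with an underscore")
    return prefix


if __name__ == "__main__":
    # The example name from the text above
    print(job_prefix("myjb_any_text_here"))
```

Checking the prefix before you start is a cheap way to avoid two runs whose outputs share the same name.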

You can upload your map on the fly, or you can first upload it to a directory called "ColabInputs" in your Google Drive folder. In the third cell you specify whether you have already uploaded the map.

Then you use the pull-down menu to select Run All and the notebook starts. If you haven't uploaded your map, it will give you an upload button to upload it. If you have, it will ask for permission to access your Google Drive folder.

After that everything is automatic: the notebook installs AlphaFold and Phenix and runs the iterative prediction and model rebuilding for you. Then it automatically downloads a zip file with the AlphaFold models and rebuilt versions. You can have the notebook save your work in a Google Drive folder called ColabOutputs if you want (this is a good idea, as the notebook can crash or time out). The notebook also has many helpful hints at the bottom of the screen.

How to run AlphaFold on Colab

You will need the 1-letter sequence of your protein (that's all).
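Before pasting a sequence into the form it can help to clean it up locally. The sketch below assumes that only the 20 standard amino-acid letters are wanted; the notebook may accept other characters, and the names clean_sequence and STANDARD_AMINO_ACIDS are hypothetical.

```python
# The 20 standard amino acids in 1-letter code
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, convert to upper case and check the letters.

    Assumption: anything outside the 20 standard letters is treated as
    an error here. The notebook itself may be more permissive.
    """
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    unexpected = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(unexpected))
    return seq


if __name__ == "__main__":
    # A short made-up fragment with a line break and mixed case
    print(clean_sequence("mkt ay\nIAK"))
```

This catches common copy-and-paste problems such as line breaks, spaces, or a FASTA header line left in the sequence.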

To access the site, you can use the "AlphaFold2" button in the Phenix GUI.

Then you paste your sequence into the form and type in a job name. You enter that information by hitting the Run button (a circle with a right-arrow in it) next to the form.

Then you load up AlphaFold by hitting the next Run button, and finally you run AlphaFold by hitting the third Run button. That's it. Loading AlphaFold can take a couple of minutes, and running it usually takes 10-15 minutes for a moderately-sized protein (350 residues or so).

Your AlphaFold model is downloaded automatically along with some informational files as a zip file.

This version of AlphaFold allows you to use templates from the PDB that are similar in sequence to your sequence and include them in the prediction. This is in addition to finding all the sequence homologues as part of the core prediction method in AlphaFold.

Advanced use of the AlphaFold notebook

You can run multiple sequences sequentially with this AlphaFold notebook. You supply a file that has one job name and one sequence on each line. These are then used as input to AlphaFold, one at a time. You get back one zip file for each sequence you submit.
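A file of this kind can be checked before you upload it. The sketch below assumes the job name and the sequence on each line are separated by whitespace, which this document does not specify; the function name parse_jobs and the file name jobs.txt are hypothetical.

```python
def parse_jobs(text: str) -> list:
    """Parse lines of the form '<job name> <sequence>' into pairs.

    Assumption: the two fields are separated by whitespace. Check the
    notes in the notebook for the separator it actually expects.
    """
    jobs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected a job name and a sequence, found %d fields"
                % (line_number, len(fields))
            )
        job_name, sequence = fields
        jobs.append((job_name, sequence.upper()))
    return jobs


if __name__ == "__main__":
    # Two made-up jobs; in practice read the text from your own file,
    # for example: parse_jobs(open("jobs.txt").read())
    example = "jb01 MKTAYIAK\njb02 GSHMLEDP\n"
    for job_name, sequence in parse_jobs(example):
        print(job_name, len(sequence))
```

Because one zip file comes back for each sequence, it is also worth checking that the job names in the file are all different.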

You can also run multiple sequences by simply entering another sequence and job name after you hit the Run button to load your first sequence and job name. You can do this as many times as you want.

Using the Phenix AlphaFold Colab notebook to run AlphaFold with a template and iterate modeling

You can run AlphaFold with a template that you supply if you want. You can do this with the Phenix Colab notebook that you access through the Phenix GUI.

The purpose of running AlphaFold with a template is to improve a model that you already have by using it as a template in an AlphaFold prediction. This allows you to iterate the process of AlphaFold prediction and model rebuilding and can further improve a model beyond what you can achieve with a single cycle.

About Google Colab

Google Colab is a really nice system for sharing and running software. Anyone with a Google login can run, share and create notebooks on Colab.

Colab notebooks have one or more 'cells' that each do something and set up for the next cell. You can run an individual cell with the Run button next to the cell. You can also run all the cells in order with the "Run all" item in the "Runtime" pull-down menu (you can run this notebook that way too: just put in your sequence and job name and hit "Run all").

Notebooks can have inputs like numbers or sequences or places to upload files. The inputs can be available at the beginning or may appear during execution.

If something goes wrong, you can (sometimes) fix an input and re-run the cell where the problem occurred, or all cells starting there.

You do have to pay a little attention to exactly what notebook you are running. As notebooks can be easily changed and shared, there can be many versions of a notebook (as for AlphaFold).

Possible problems

If something goes wrong with the run, you can just load the site over again and start from the beginning.

The Colab notebook can crash or time out at any time. That means if you are running multiple predictions you could lose a lot of work. You can mitigate this problem by manually downloading results as they appear (using the folder icon on the left side of the notebook, selecting a .zip file to show a download menu, and downloading the file).
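If you prefer to copy results with code rather than by hand, the idea can be sketched as follows. This is only an illustration, not part of the Phenix notebook: the function name back_up_zip_files and the example paths are hypothetical, and the sketch assumes the zip files are written to a single working directory.

```python
import shutil
from pathlib import Path


def back_up_zip_files(work_dir, backup_dir):
    """Copy every .zip file in work_dir to backup_dir.

    A file is skipped if a copy with the same name and size is already
    present. Returns the list of newly written copies.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for zip_path in sorted(Path(work_dir).glob("*.zip")):
        target = backup / zip_path.name
        if target.exists() and target.stat().st_size == zip_path.stat().st_size:
            continue  # already backed up
        shutil.copy2(zip_path, target)
        copied.append(target)
    return copied


# Hypothetical usage in a notebook cell, assuming your Google Drive is
# already mounted and these paths match your session:
# back_up_zip_files("/content", "/content/drive/MyDrive/ColabOutputs")
```

Running such a copy after each prediction means that a crash costs you at most the job that was in progress.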

See the notes at the bottom of the Colab notebook for more hints.

Background

The Google DeepMind software AlphaFold2 can be run with a Google login on Google's cloud computing platform using Google's Colab notebook service. The AlphaFold team created an AlphaFold Colab notebook and the ColabFold team created a simpler version called ColabFold: AlphaFold2 w/ MMseqs2.

The notebook for Phenix is a further simplified version of the ColabFold notebook suitable for use with Phenix.

Non-commercial and commercial use now permitted

Google DeepMind has made the code for AlphaFold2 open source and available to anyone. As of January 2022 the AlphaFold2 database of parameters (required for use of AlphaFold2) is also licensed for use by anyone. That means anyone can use this notebook for non-commercial or commercial purposes.

Required citations for using the AlphaFold Colab notebook

If you use a model from the AlphaFold Colab notebook, be sure to cite the following two publications:

  1. The AlphaFold2 paper:

    Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  2. The ColabFold notebook on which the Phenix AlphaFold notebook is based:

    Mirdita, M., Ovchinnikov, S., Steinegger, M. ColabFold - Making protein folding accessible to all. bioRxiv 2021.08.15.456425.