PredictModel: Predict structures of chains in a sequence file

Author(s)

Purpose

Predict model can be used to generate predicted models from a sequence file. One predicted model is generated for each unique sequence.

You can choose whether to use a multiple sequence alignment (MSA), whether to use templates from the PDB, and whether to supply your own template to be used as a guide in prediction.

Sequence file

The sequence file that you supply specifies what is going to be predicted and how many copies of each chain are present in the structure (one for every copy in the sequence file). All models that are created and input will be associated with one or more of the chains in your sequence file based on sequence identity (normally every model should match a chain in the sequence file exactly).
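As an illustration of how copies are counted, the sketch below tallies identical sequences in a sequence file. It assumes a FASTA-style file in which every chain starts with a ">" header line. The function name count_chain_copies is hypothetical and is not part of Phenix.

```python
from collections import Counter


def count_chain_copies(seq_text):
    """Return a Counter mapping each unique sequence to its number of copies.

    Assumes FASTA-style input: each chain begins with a '>' header line.
    """
    sequences = []
    current = []
    for line in seq_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous chain, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))
    return Counter(sequences)


example = """>chain A
MKTAYIAK
>chain B
MKTAYIAK
>chain C
GSHMLEDP
"""
copies = count_chain_copies(example)
for sequence, n in copies.items():
    print(sequence, n)
```

In this example there are two unique sequences, so one predicted model would be generated for each, and the first sequence is present in two copies.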

Model prediction

Predict model normally uses either a Phenix server or a Google Colab notebook to carry out AlphaFold prediction of one chain at a time.

Prediction is fully automated with the Phenix server through the Phenix GUI, so all you have to do is let it run. The Phenix GUI will use the Phenix server to carry out the prediction and put it in the working directory.

You can specify what inputs the AlphaFold prediction should use. These always include a sequence file, and they can also include an optional multiple sequence alignment file, optional templates, and keywords for model prediction such as the number of models to generate, the random seed, and whether to use a multiple sequence alignment.

If you want to use Colab instead of the Phenix server, the procedure has slightly more steps. The Phenix GUI provides a button that opens Google Colab in a browser. You then start the Colab notebook and upload a zipped tar file that the GUI writes out to your Downloads directory. The notebook runs the prediction and downloads a .zip file with the results. The Phenix GUI recognizes this file and opens it. You will need to be logged in to a Google account for this to work.

When prediction is being carried out, the Phenix GUI waits for the predicted models to appear in the working directory, then it writes out the resulting models.
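The waiting step can be pictured as a simple polling loop. The sketch below is only an illustration and is not how the Phenix GUI is implemented. The function name wait_for_models, the "*.pdb" file pattern, and the timeout values are all assumptions.

```python
import time
from pathlib import Path


def wait_for_models(directory, pattern="*.pdb", timeout=3600.0, poll_interval=10.0):
    """Poll a directory until files matching pattern appear, then return them.

    Raises TimeoutError if nothing appears within timeout seconds.
    The pattern and timings are hypothetical defaults.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = sorted(Path(directory).glob(pattern))
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError("no predicted models appeared in %s" % directory)
        time.sleep(poll_interval)
```

A call such as wait_for_models("my_working_dir", timeout=600.0) would return a sorted list of matching paths as soon as at least one is present.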

MSA calculation vs model prediction

If you use the Phenix server, the calculation of multiple sequence alignments (MSAs) is a separate step from model prediction. The Phenix GUI on your computer sends a request to the mmseqs2 server, which creates an MSA and sends it back to your computer. Then your computer uploads the MSA to the Phenix server, which uses it in an AlphaFold prediction.

If you want, you can supply your own MSAs. The key requirement is that the sequence of the first entry in your MSA must exactly match the sequence to be predicted.
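Before supplying your own MSA, you can check the first-entry requirement yourself. The sketch below assumes a FASTA-style or a3m-style alignment file with ">" header lines, and it skips any leading lines that start with "#" (an assumption about the file layout). The function name first_msa_entry_matches is hypothetical.

```python
def first_msa_entry_matches(msa_text, target_sequence):
    """Return True if the first MSA entry is identical to target_sequence.

    Assumes '>' header lines; lines starting with '#' are ignored.
    The comparison is exact: gaps or case differences count as mismatches.
    """
    first = []
    seen_header = False
    for line in msa_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if seen_header:
                break  # reached the second entry; stop reading
            seen_header = True
            continue
        if seen_header:
            first.append(line)
    return "".join(first) == target_sequence.strip()


msa = """>query
MKTAYIAK
>hit_1
MKT-YIAK
"""
print(first_msa_entry_matches(msa, "MKTAYIAK"))
```

The comparison is deliberately strict, because the requirement is an exact match to the sequence to be predicted.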

You can also skip the use of MSAs. This can be useful if you supply a template and you want AlphaFold to rebuild your template instead of doing a new prediction.

Number of models

AlphaFold can carry out multiple predictions for a sequence. You can specify how many of these to carry out. The PredictModel tool will choose the one with the highest value of pLDDT (predicted local distance difference test).
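A selection of this kind can be sketched as follows. AlphaFold models carry per-residue pLDDT values in the B-factor column of the PDB file. The sketch assumes that models are ranked by the mean pLDDT over CA atoms. The exact score used by PredictModel is not specified here, and the function names are hypothetical.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over CA atoms, read from the B-factor column (columns 61-66)."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no CA atoms found")
    return sum(values) / len(values)


def pick_best_model(models):
    """Return the name of the model with the highest mean pLDDT.

    models maps a model name to the text of its PDB file.
    Ranking by mean CA pLDDT is an assumption of this sketch.
    """
    return max(models, key=lambda name: mean_plddt(models[name]))
```

Given a dictionary such as {"model_1": text_1, "model_2": text_2}, pick_best_model returns the key of the model whose mean pLDDT is highest.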

Using templates from the PDB

You can request that templates from the PDB be used in prediction. If you use this feature, models will be predicted both with and without the templates, and the model with the highest pLDDT will be saved.

Using supplied templates

You can supply your own templates. As for templates from the PDB, if you use this feature, models will be predicted both with and without the templates, and the model with the highest pLDDT will be saved.

Precalculated MSAs and models save time if you rerun with the same sequence

When you run a prediction on the Phenix server, it saves the MSA and the predicted model. Then if you run the same request again (same number of AlphaFold models, same sequence, same choice of including templates from the PDB, no supplied template), the server will return the result from the original run. You can turn this off with the precalculated results and precalculated MSA keywords if you want. Note that the server does not save your sequence or parameters. It just makes a hash string from them so that if the same request is made again it can detect it. The results are simply saved in a file that has the hash string as a name.
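The idea of identifying a repeated request by a hash can be sketched as below. This is not the scheme the Phenix server uses: the choice of SHA-256, the JSON encoding, and the parameter names are all assumptions made for illustration.

```python
import hashlib
import json


def request_key(sequence, n_models, use_pdb_templates):
    """Return a hex digest that identifies a prediction request.

    Identical inputs always give the same key, so a stored result can be
    looked up by key without keeping the sequence itself.
    """
    payload = json.dumps(
        {
            "sequence": sequence.strip().upper(),
            "n_models": n_models,
            "use_pdb_templates": use_pdb_templates,
        },
        sort_keys=True,  # fixed key order gives a reproducible string
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = request_key("MKTAYIAK", 5, False)
print(key[:16])
```

Changing any one input, such as the number of models, gives a different key, which is why a rerun is recognized only when all of the listed parameters are the same.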

Precalculated MSAs and models allow you to get your result even if disconnected

If you get disconnected from the server during a prediction job, you can just wait for the job to finish, then submit a new request with the same parameters. As the server saves the predictions, it will return the result to you right away (if it is finished).

Using AlphaFold to improve a model you already have

As AlphaFold often produces models with quite good geometry, you can use it as a procedure for geometry optimization. You supply your working model and your sequence, and you turn off the use of MSAs and the use of templates from the PDB. Of course AlphaFold does not know about your density map, so it could move the model away from the density. Normally you would use refinement to help with this.

Using AlphaFold to build parts of a model that are missing

If you want AlphaFold to try to build the missing parts of a chain that you have partially built, supply your working partial model and your sequence, and allow the use of MSAs (and, optionally, the use of templates from the PDB). Your working model will be used as a template for the part of the structure that you have already built. Note that your partially built model must have the correct sequence.

Examples

Standard run of predict_model

Running predict_model is easy. From the command line you can type:

phenix.predict_and_build jobname=myjob seq_file=seq.dat \
  prediction_server=PhenixServer \
  stop_after_predict=True

Common questions

How long will it take?

A typical protein chain with 200 residues will take about 5-10 minutes to run, once it has started on the Phenix server.

Timing is more or less proportional to chain length. If you specify that more than 5 models are to be built, it will typically take longer.
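For a rough idea of scale, the figures above can be turned into a simple proportional estimate. This is only a back-of-the-envelope sketch based on the 200-residue figure quoted above. It ignores queue time and the number of models, and the function name estimate_minutes is hypothetical.

```python
def estimate_minutes(n_residues):
    """Rough (low, high) run time in minutes, scaled linearly from
    5-10 minutes for a 200-residue chain."""
    scale = n_residues / 200.0
    return (5.0 * scale, 10.0 * scale)


low, high = estimate_minutes(400)
print("about %.0f-%.0f minutes" % (low, high))
```

By this estimate a 400-residue chain would take roughly twice as long as a 200-residue chain.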

If your chain has already been predicted for you or for someone else, you normally should get a result (a copy of that prediction) in about 30 seconds or less.

If the Phenix server is full (typically 6 jobs can run at once), you can use the Server button on the Phenix GUI to see the expected wait time to start a job.

Server problems

The most common problem in running Predict model is that either the Phenix server or the Colab server is not working as expected. Normally the first thing to try is simply to let the program retry (it will normally do this for a while). If that does not work, you can stop the program (abort in the GUI) and run again with the same parameters (i.e., the same jobname), except changing the server from PhenixServer to Colab or vice versa. That will give you another chance. If neither server is working, you can take the files in the packaged .tgz file (listed in the GUI output), use them to get your own prediction with any server, and put the resulting predicted models in the place specified in the GUI or program output.

Specific limitations and problems:

Literature

Additional information

List of all available keywords (same as predict_and_build keywords)