PredictModel: Predict structures of chains in a sequence file

Author(s)

Purpose

PredictModel generates predicted models from a sequence file. One predicted model is generated for each unique sequence.

Sequence file

The sequence file that you supply specifies which chains are to be predicted and how many copies of each chain are present in the structure (the sequence file contains one entry for every copy). All models, whether created by the prediction or supplied as input, are associated with one or more of the chains in your sequence file based on sequence identity (normally every model should match a chain in the sequence file exactly).
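The relation between copies and predictions can be illustrated with a short sketch. It assumes the sequence file is in FASTA format (a `>` header line followed by sequence lines), and the helper names `parse_fasta` and `count_chain_copies` are hypothetical, not part of Phenix. Each unique sequence is predicted once; repeated entries only count as extra copies in the structure.

```python
from collections import Counter


def parse_fasta(text):
    """Return the sequences in a FASTA-formatted string, in file order."""
    sequences = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new entry; store the previous one first.
            if current:
                sequences.append("".join(current).upper())
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current).upper())
    return sequences


def count_chain_copies(text):
    """Map each unique sequence to its number of copies (hypothetical helper)."""
    return Counter(parse_fasta(text))


example = """>chain A
MKTAYIAKQR
>chain B
MKTAYIAKQR
>chain C
GSHMLEDPV
"""

copies = count_chain_copies(example)
for sequence, n in copies.items():
    print(f"{sequence}: copies={n}, predicted once")
```

Here two entries share one sequence, so only two predictions would be needed for three chains.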

Model prediction

Predict model normally uses either a Phenix server or a Google Colab notebook to carry out AlphaFold prediction of one chain at a time.

The general procedure is to create a set of files that specify all the inputs needed for model prediction (a sequence file, an optional multiple sequence alignment (MSA) file, optional templates, and keywords for model prediction such as the number of models to generate, the random seed, and whether to use an MSA) and to package all these files in a single zipped tar (.tgz) file.
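A minimal sketch of this packaging step, using Python's standard `tarfile` module. The file names (`seq.fa`, `settings.json`, `msa.a3m`) and the keyword names in the settings dictionary are hypothetical; the actual layout that Phenix writes into the .tgz file is not described here.

```python
import io
import json
import os
import tarfile
import tempfile


def add_text_file(tar, name, text):
    """Add an in-memory text file to an open tar archive."""
    data = text.encode("utf-8")
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def package_inputs(tgz_path, sequence_text, settings, msa_text=None):
    """Write prediction inputs into one gzip-compressed tar file (hypothetical layout)."""
    with tarfile.open(tgz_path, "w:gz") as tar:
        add_text_file(tar, "seq.fa", sequence_text)
        # Keywords such as the number of models, the random seed and MSA use.
        add_text_file(tar, "settings.json", json.dumps(settings, indent=2))
        if msa_text is not None:  # the MSA file is optional
            add_text_file(tar, "msa.a3m", msa_text)
    return tgz_path


tgz = package_inputs(
    os.path.join(tempfile.mkdtemp(), "myjob.tgz"),
    ">A\nMKTAYIAKQR\n",
    {"num_models": 5, "random_seed": 17, "use_msa": True},
)
with tarfile.open(tgz, "r:gz") as tar:
    names = tar.getnames()
print(names)  # ['seq.fa', 'settings.json']
```

Writing everything into one archive means a single upload carries the complete request, whichever server runs it.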

If the Phenix server is specified, the Phenix GUI will use the Phenix server to carry out the prediction and put it in the working directory. This is the normal way to use this tool.

If Colab is specified, the Phenix GUI provides a button that opens Google Colab in a browser. You then start the Colab notebook and upload the zipped tar file. The notebook runs the prediction and downloads a .zip file with the results. The Phenix GUI recognizes this file, opens it, and puts the new prediction in the working directory, which is specified by the unique jobname that you provide for your work. You need to be logged in to a Google account for this to work.

While the prediction is being carried out, the Phenix GUI waits for the predicted models to appear in the working directory and then writes out the resulting models.
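The waiting step can be pictured as a simple polling loop. This is a hypothetical sketch, not the Phenix GUI's code: `wait_for_models`, the `*.pdb` pattern and the timing values are assumptions, and a real tool would also have to make sure a file is completely written before reading it.

```python
import glob
import os
import tempfile
import time


def wait_for_models(directory, pattern="*.pdb", timeout=600.0, interval=5.0):
    """Poll `directory` until files matching `pattern` appear (hypothetical helper).

    Returns the sorted matching paths, or an empty list if `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = sorted(glob.glob(os.path.join(directory, pattern)))
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)


# Usage: a model file that is already present is found on the first check.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, "model_1.pdb"), "w").close()
print(wait_for_models(workdir, timeout=1.0, interval=0.1))
```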

MSA calculation vs model prediction

If you use the Phenix server, the calculation of multiple sequence alignments (MSAs) is a separate step from model prediction. The Phenix GUI on your computer sends a request to the mmseqs2 server, which creates an MSA and sends it back to your computer. Then your computer uploads the MSA to the Phenix server, which uses it in an AlphaFold prediction.

If you want, you can supply your own MSAs. The key requirement is that the sequence of the first entry in your MSA must exactly match the sequence to be predicted.
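A quick way to check this requirement before submitting is to compare the first entry of your MSA with the sequence to be predicted. This sketch assumes a FASTA- or A3M-style MSA in which each entry starts with a `>` header; the helper names are hypothetical, and any lines before the first header (such as the `#` line that some A3M files begin with) are skipped.

```python
def first_msa_sequence(msa_text):
    """Return the sequence of the first entry in a FASTA/A3M-style MSA."""
    lines = []
    started = False
    for line in msa_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if started:
                break  # reached the second entry
            started = True
        elif started and line:
            lines.append(line)
    return "".join(lines)


def msa_matches_query(msa_text, query):
    """True if the first MSA entry is identical to the sequence to be predicted."""
    return first_msa_sequence(msa_text) == query.strip()


msa = ">query\nMKTAYIAKQR\n>hit_1\nMKTAYLAKQR\n"
print(msa_matches_query(msa, "MKTAYIAKQR"))  # True
print(msa_matches_query(msa, "MKTAYIAKQ"))   # False
```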

You can also skip the use of MSAs. This can be useful if you supply a template and you want AlphaFold to rebuild your template instead of doing a new prediction.

Number of models

AlphaFold can carry out multiple predictions for a sequence, and you can specify how many of these to carry out. The PredictModel tool will choose the one with the highest value of pLDDT (predicted local distance difference test).
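Choosing the best of several predictions can be sketched as follows. The sketch assumes AlphaFold-style PDB output, in which per-residue pLDDT is written to the B-factor column (PDB columns 61-66), and scores each model by its mean pLDDT over CA atoms; the helper names are hypothetical and PredictModel's exact scoring may differ.

```python
def mean_plddt(pdb_text):
    """Mean pLDDT over CA atoms, read from the B-factor column (columns 61-66)."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return sum(values) / len(values) if values else 0.0


def best_model(models):
    """Return the name of the model with the highest mean pLDDT.

    models: dict mapping a model name to the text of its PDB file.
    """
    return max(models, key=lambda name: mean_plddt(models[name]))


def atom_line(serial, resseq, plddt):
    """Format a fixed-column CA ATOM record with pLDDT in the B-factor field."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Usage: two toy two-residue models; model_2 has the higher mean pLDDT.
model_1 = "\n".join(atom_line(i, i, b) for i, b in enumerate([70.0, 80.0], start=1))
model_2 = "\n".join(atom_line(i, i, b) for i, b in enumerate([90.0, 85.0], start=1))
print(best_model({"model_1": model_1, "model_2": model_2}))  # model_2
```

The same comparison applies when models are predicted both with and without templates: the model with the higher score is kept.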

Using templates from the PDB

You can request that templates from the PDB be used in prediction. If you use this feature, models will be predicted both with and without the templates, and the model with the highest pLDDT will be saved.

Using supplied templates

You can also supply your own templates. As with templates from the PDB, if you use this feature, models will be predicted both with and without the templates, and the model with the highest pLDDT will be saved.

Precalculated MSAs and models save time if you rerun with the same sequence

When you run a prediction on the Phenix server, it saves the MSA and the predicted model. Then if you run the same request again (same number of AlphaFold models, same sequence, same choice of including templates from the PDB, no supplied template), the server will return the result from the original run. You can turn this off with the precalculated results and precalculated MSA keywords if you want. Note that the server does not save your sequence or parameters. It just makes a hash string from them so that if the same request is made again it can detect it. The results are simply saved in a file that has the hash string as a name.
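The caching idea can be illustrated with a hash over the request parameters. This is a hypothetical scheme, since the server's actual hash inputs and algorithm are not documented here: SHA-256 over a canonical JSON encoding of the sequence, the number of models and the PDB-template choice. Identical requests give the same key, so a stored result can be found by its file name without keeping the sequence itself.

```python
import hashlib
import json


def request_key(sequence, num_models, include_pdb_templates):
    """Build a hash string identifying a prediction request (hypothetical scheme)."""
    request = {
        "sequence": sequence.strip().upper(),
        "num_models": num_models,
        "include_pdb_templates": include_pdb_templates,
    }
    # sort_keys gives the same JSON text, and so the same hash, for equal requests.
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = request_key("MKTAYIAKQR", 5, False)
repeat = request_key("MKTAYIAKQR", 5, False)
with_templates = request_key("MKTAYIAKQR", 5, True)
print(first == repeat, first == with_templates)  # True False
```

Changing any parameter, such as including templates from the PDB, produces a different key and therefore a new prediction.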

Precalculated MSAs and models allow you to get your result even if disconnected

If you get disconnected from the server during a prediction job, you can wait for the job to finish and then submit a new request with the same parameters. Because the server saves its predictions, it will return the result to you right away (provided the job has finished).

Examples

Standard run of predict_model

Running predict_model is easy: it is run as phenix.predict_and_build with stop_after_predict=True. From the command line you can type:

phenix.predict_and_build jobname=myjob seq_file=seq.dat \
  prediction_server=PhenixServer \
  stop_after_predict=True

Common questions

How long will it take?

A typical protein chain with 200 residues will take about 5-10 minutes to run, once it has started on the Phenix server.

Timing is more or less proportional to chain length. If you specify that more than 5 models are to be built, it will typically take longer.
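The rule of thumb above can be turned into a rough estimate. This sketch simply scales the 5-10 minute figure for a 200-residue chain linearly with chain length; it ignores queue time and the extra time needed for more than 5 models, so the numbers are only approximate.

```python
def estimated_minutes(n_residues, ref_residues=200, ref_range=(5.0, 10.0)):
    """Rough run-time range in minutes, scaled linearly from the 200-residue figure.

    Ignores queue time and the extra time needed for more than 5 models.
    """
    scale = n_residues / ref_residues
    low, high = ref_range
    return (low * scale, high * scale)


print(estimated_minutes(400))  # (10.0, 20.0)
```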

If your chain has already been predicted for you or for someone else, you normally should get a result (a copy of that prediction) in about 30 seconds or less.

If the Phenix server is full (typically 6 jobs can run at once), you can use the Server button on the Phenix GUI to see the expected wait time to start a job.

Server problems

The most common problem in running Predict model is that either the Phenix server or the Colab server is not working as expected. Normally the first thing to try is simply to let the program retry (it will normally do this for a while). If that does not work, stop the program (abort in the GUI) and run it again with the same parameters (i.e., the same jobname), except that you change the server from PhenixServer to Colab or vice versa. That gives you another chance. If neither server is working, you can take the files in the packaged .tgz file (listed in the GUI output), use them to obtain your own prediction from any server, and put the resulting predicted models in the location specified in the GUI or program output.
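If you need to send the inputs to another server, the packaged .tgz file can be unpacked with Python's standard `tarfile` module. The sketch below is hypothetical (`unpack_inputs` and the file names are assumptions, not Phenix names); it refuses archive entries that would be written outside the destination directory and returns the list of file names so you can see what was packaged.

```python
import os
import tarfile
import tempfile


def unpack_inputs(tgz_path, destination):
    """List and extract the packaged prediction inputs (hypothetical helper)."""
    root = os.path.realpath(destination)
    with tarfile.open(tgz_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Reject entries that would land outside the destination directory.
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return [m.name for m in members]


# Usage: build a tiny stand-in archive, then unpack it.
work = tempfile.mkdtemp()
seq_path = os.path.join(work, "seq.fa")
with open(seq_path, "w") as f:
    f.write(">A\nMKTAYIAKQR\n")
tgz_path = os.path.join(work, "myjob.tgz")
with tarfile.open(tgz_path, "w:gz") as tar:
    tar.add(seq_path, arcname="seq.fa")

out_dir = os.path.join(work, "unpacked")
print(unpack_inputs(tgz_path, out_dir))  # ['seq.fa']
```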

Specific limitations and problems:

Literature

Additional information

List of all available keywords (same as predict_and_build keywords)