PredictAndBuild: Solving an X-ray structure or interpreting a cryo-EM map using predicted models
Author(s)
predict_and_build: Tom Terwilliger
Acknowledgements
The Phenix AlphaFold server uses AlphaFold (Jumper et al., 2021),
the mmseqs2 MSA server (Steinegger and Söding, 2017), and
scripts derived from ColabFold (Mirdita et al., 2022).
basic overview of X-ray or cryo-EM structure determination with AlphaFold models
how to run PredictAndBuild with X-ray or cryo-EM data via the GUI
how to interpret the output
Purpose
Predict and build can be used to generate predicted models and to use them
to solve an X-ray structure by molecular replacement or to interpret a cryo-EM map. Predict and build then carries out iterative model rebuilding and
prediction to improve the models. The iterative procedure allows creation of
more accurate predicted models than can be obtained with a simple prediction.
Normally Predict and build is used as a way to automatically generate a
fairly accurate model starting from just a sequence file and either
cryo-EM half-maps or X-ray data. Additionally, Predict and build provides
morphed versions of unrefined predicted models that can be useful as
reference models for refinement.
Steps in predict_and_build
The principal steps in predict_and_build are:
Model prediction (e.g., with AlphaFold)
Model trimming
Structure solution by molecular replacement (X-ray) or map interpretation
by docking (cryo-EM) to create a scaffold model
Morphing of original predicted models onto the scaffold model
Model refinement and rebuilding
Iteration of model prediction using the rebuilt models as templates
Sequence file
The sequence file that you supply specifies what is going to be predicted and
how many copies of each chain are present in the structure (one for every copy
in the sequence file). All models that are created and input will be associated
with one or more of the chains in your sequence file based on sequence
identity (normally every model should match a chain in the sequence file
exactly).
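As an illustration of how the sequence file defines the number of copies of each chain, here is a minimal sketch (not the Phenix implementation) that reads sequences in FASTA format or separated by blank lines and counts how often each unique sequence appears. The function names read_sequences and count_copies and the example sequences are hypothetical.

```python
from collections import Counter

def read_sequences(text):
    """Return one sequence per chain copy from FASTA or blank-line-separated text."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            # A blank line or a FASTA header ends the current sequence
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))
    return sequences

def count_copies(text):
    """Map each unique sequence to the number of copies listed in the file."""
    return Counter(read_sequences(text))

# Hypothetical sequence file: two copies of one chain and one copy of another
example = """>chain A
MKTAYIAKQR
QISFVKSHFS

>chain B
MKTAYIAKQR
QISFVKSHFS

>chain C
GSHMLEDPVA
"""
copies = count_copies(example)
for sequence, n in copies.items():
    print(n, "copies of", sequence)
```

In this example the first sequence is listed twice, so two copies of that chain would be expected in the structure.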
Input data (cryo-EM)
For cryo-EM maps, normally you will supply two full-size half-maps. These will
be used in density modification to create a single full-size map. This map
will then be boxed to extract just the part that contains the molecule (the
part with high density). Alternatively, you can supply a single full-size or
boxed map if you wish.
Input data (X-ray)
For a crystal structure, you will supply an mtz-style file containing data
for Fobs and sigFobs. Optionally this file can contain a test set
(FreeR_flags). The space group specified in this data file
and its enantiomer, if any, will be used in structure determination. This
means you should run phenix.xtriage first so that you have
a good idea of the space group.
Model prediction
Predict and build normally uses the Phenix server
to carry out AlphaFold prediction of one chain at a time. It can
also use predictions that you supply or that you generate on demand and put
in the same place on your computer as those that are created by one of
the servers.
Prediction (see phenix.predict_model for details)
is fully automated with the Phenix server through the Phenix GUI,
so all you have to do is let it run. The Phenix GUI will use the Phenix
server to carry out the prediction and put it in the working directory.
You can specify what inputs the AlphaFold prediction should use. These
always include a sequence file, and can also include an optional multiple
sequence alignment file, optional templates, and keywords for model prediction
such as the number of models to generate, the random seed, and whether to use
a multiple sequence alignment.
When prediction is being carried out, the Phenix GUI waits for the predicted
models to appear in the working directory, then it goes on to the next steps.
If you want to create these models in some other way, you can just put them
in the working directory (the expected file names are listed in the GUI when
prediction starts) and they will be used. Note that all prediction files
must have all the residues in the corresponding sequence present.
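If you supply your own prediction files, you may want to check that every residue in the sequence is present before starting. The following is a minimal sketch of such a check, assuming a PDB-format model numbered from residue 1; it only compares residue numbers (not residue names), and the helper names residues_with_ca, missing_residues and ca_line are hypothetical.

```python
def residues_with_ca(pdb_text):
    """Return the set of residue numbers that have a CA atom in PDB-format text."""
    found = set()
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name in 13-16, residue number in 23-26
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            found.add(int(line[22:26]))
    return found

def missing_residues(pdb_text, sequence):
    """List residue numbers 1..len(sequence) that are absent from the model."""
    present = residues_with_ca(pdb_text)
    return [i for i in range(1, len(sequence) + 1) if i not in present]

def ca_line(resseq, resname="ALA", chain="A"):
    """Write a minimal fixed-column ATOM record for a CA atom (example data only)."""
    return "ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        resseq, resname, chain, resseq, 0.0, 0.0, 3.8 * resseq, 1.0, 90.0)

# Hypothetical five-residue sequence with residue 3 missing from the model
sequence = "MKTAY"
pdb_text = "\n".join(ca_line(i) for i in (1, 2, 4, 5))
missing = missing_residues(pdb_text, sequence)
print("Missing residues:", missing)
```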
Model trimming
The predicted models are trimmed to remove low-confidence sections and to
split into compact domains using
phenix.process_predicted_model . This
allows the molecular replacement (X-ray) or docking (cryo-EM) procedures
to use the most accurate parts of the models, increasing the chance of finding
the correct solution. Splitting into compact domains increases the chance
of correctly placing parts of chains that have different orientations in the
predicted model compared to the actual structure.
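The idea of removing low-confidence sections can be sketched as follows. This is a simplified illustration only, not the algorithm used by phenix.process_predicted_model (which also splits the model into compact domains). It assumes a list of per-residue pLDDT values, a cutoff of 70, and a hypothetical minimum segment length of 3 residues.

```python
def confident_segments(plddt, cutoff=70.0, min_length=3):
    """Return (start, end) residue ranges (1-based, inclusive) with pLDDT >= cutoff."""
    segments, start = [], None
    for i, value in enumerate(plddt, start=1):
        if value >= cutoff:
            if start is None:
                start = i  # open a new confident segment
        elif start is not None:
            segments.append((start, i - 1))  # close the segment before residue i
            start = None
    if start is not None:
        segments.append((start, len(plddt)))
    # Discard segments too short to be useful on their own
    return [(s, e) for s, e in segments if e - s + 1 >= min_length]

# Hypothetical per-residue pLDDT values for a 14-residue model
plddt = [40, 55, 82, 91, 95, 88, 60, 45, 75, 92, 93, 96, 71, 30]
segments = confident_segments(plddt)
print(segments)
```

Only the residues inside the returned ranges would be kept for molecular replacement or docking in this simplified picture.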
Structure determination by molecular replacement (X-ray data)
Predict and build uses phenix.phaser with all default
parameters to solve a structure by molecular replacement. If your structure
cannot be solved automatically in this way, then you will want to take your
trimmed predicted models, solve your structure separately by molecular
replacement (e.g., by using phenix.phaser but
changing some parameters), then supplying the molecular replacement model
to Predict and build as a "scaffold model". After molecular replacement,
domains are rearranged if necessary to allow sequential parts to be connected,
creating a scaffold model for use in the morphing step to follow.
The molecular replacement model is refined using the X-ray data, yielding
a 2mFo-DFc map. This map is density-modified as well, and the map that has
the higher correlation with the refined model is used as the working map.
Structure determination by docking (cryo-EM data)
Predict and build first carries out density modification based on the two
half-maps that you provide. This creates an optimized map for interpretation.
Predict and build then sequentially docks all the domains from all the
predicted models into the map. After docking, the domains are rearranged
if necessary to allow sequential parts to be connected. This docking is
carried out by phenix.dock_and_rebuild . The
rearranged docked model is the scaffold model that will be used in the next step.
Morphing of original predicted models onto the scaffold model
The scaffold model obtained from predicted models and X-ray or cryo-EM maps
usually contains one chain for each sequence in your sequence file. Your
predicted models normally include one model matching each unique sequence in
the sequence file. For each chain in the scaffold model, the predicted model
that is most similar is then morphed (adjusted) to superimpose on the
scaffold model.
In the simplest case, your predicted models all yielded
a single domain in the trimming step. In this case the chains in your scaffold
model will match your predicted models exactly and morphing consists of
simple superposition of the full predicted model on the docked chain. In a
more complicated case, your trimmed model may have consisted of multiple
domains for one chain, so that your scaffold model may have separately-placed
domains for a particular chain. In this case, the morphing consists of
superimposing the parts of the full predicted model that match the scaffold,
and smoothly deforming the connecting segments.
Model refinement, rebuilding, and trimming
Model refinement and rebuilding is done by
phenix.dock_and_rebuild . This procedure consists
of identifying all the parts of each chain that are either predicted with
low confidence or that do not match the density, rebuilding them in
several different ways, and picking the one that matches the density the best.
Then the individual chains are refined with real-space refinement.
At the end of rebuilding, residues in the model that are in poor density
are identified and marked with zero occupancies. A new trimmed final model
lacking these residues is also created.
For X-ray data, the trimmed model is refined using the X-ray data, yielding a
new working map (2mFo-DFc or density modified, as in the molecular
replacement step).
Iteration of model prediction using the rebuilt models as templates
A key element of the Predict and build procedure is iteration of model
prediction using chains from the rebuilt model as templates. This
improves model prediction compared to a single prediction step. Normally
the entire procedure is repeated until the change in predicted models between
subsequent cycles is small.
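The stopping rule described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the Phenix implementation: the change between cycles is measured here as the RMSD between matching coordinates that are assumed to be already superimposed, the tolerance and maximum_cycles values are hypothetical, and next_model stands in for one full cycle of rebuilding and prediction. The minimum_cycles default of 2 is taken from the keyword list.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, assumed superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def iterate(next_model, model, tolerance=0.3, minimum_cycles=2, maximum_cycles=10):
    """Repeat next_model until the change between successive models is small."""
    for cycle in range(1, maximum_cycles + 1):
        new_model = next_model(model)
        change = rmsd(model, new_model)
        model = new_model
        if cycle >= minimum_cycles and change < tolerance:
            break
    return model, cycle

# Toy stand-in for a prediction cycle: move each atom halfway toward a target
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]

def halfway(model):
    return [((x + tx) / 2, (y + ty) / 2, (z + tz) / 2)
            for (x, y, z), (tx, ty, tz) in zip(model, target)]

start = [(4.0, 0.0, 0.0), (7.8, 0.0, 0.0)]
final_model, cycles = iterate(halfway, start)
print("Stopped after", cycles, "cycles")
```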
Output models
The results of the Predict and build procedure are:
A final docked model. This model is the last scaffold model and is
normally suitable for use as a reference model in further refinement.
Normally this model will consist of pure predicted models,
placed appropriately to match the map.
A final trimmed rebuilt model. This model is obtained by rebuilding and
refinement, followed by trimming residues that are in poor density.
It is normally suitable for use as a
starting point for further model rebuilding, refinement, and addition of
ligands and covalent modifications. A final untrimmed model is also
provided. This can be used as a hypothesis for the poorly-fitting regions.
In some cases (low-resolution data), the trimmed rebuilt model may have very
poor geometry and you might want to use the final docked model as the
starting point for refinement and continuation of building of your structure.
The Carry-on directory and restarting or re-running jobs
Predict and build saves the working files in a carry_on_directory.
At the end of the log file (and in the GUI at the end of a run) the
carry-on directory is listed. In the GUI it is normally a directory with the
same number as the job and ending in "CarryOn": for
PredictAndBuild_30 it is PredictAndBuild_30_CarryOn.
If you have a previous run that you want to restart or continue, specify
the name of this directory in the GUI as the Carry-on directory, make sure
the flag "carry_on=True" is set, and rerun the
job with otherwise the same inputs as before.
Then running the job will result in reading all the results
that had been accumulated by the previous job, then continuing from where it
left off.
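On the command line the same restart is requested with the carry_on and carry_on_directory keywords. As a small sketch, the helper below (a hypothetical function, not part of Phenix) builds these two arguments from the directory of a previous job, assuming the usual naming in which the carry-on directory is inside the job directory and is named after the job with "_CarryOn" appended.

```python
import os

def carry_on_arguments(previous_job_directory):
    """Build carry_on keywords for restarting from a previous job directory."""
    job_directory = os.path.normpath(previous_job_directory)
    jobname = os.path.basename(job_directory)
    # Assumed layout: <job directory>/<jobname>_CarryOn
    carry_on_directory = os.path.join(job_directory, jobname + "_CarryOn")
    return ["carry_on=True", "carry_on_directory=" + carry_on_directory]

arguments = carry_on_arguments("/Users/Documents/PredictAndBuild_30")
print(" ".join(arguments))
```

These arguments would be added to the otherwise unchanged command used for the original run.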
The carry-on directory lets you restart after stopping or crashing.
It also lets you
change some parameters and carry on from where you left off.
The carry-on directory also lets you move a project to another computer if you want. You
can copy the carry-on directory and put it in any
directory on any computer. Then you can specify that directory as the
Carry-on directory on that computer and it will carry on from
where you left off.
Note that the file names inside the Carry-on directory contain
the jobname, so if you want to move individual files around, you might have
to change their names.
The universal solution for (most) problems in PredictAndBuild
The solution to many problems that you may have with PredictAndBuild is simply
to completely stop running your job, then restart, specifying the
Carry-on directory from the job that stopped.
This will pick up where you left off. It is suitable for cases
where something crashed, where you accidentally stopped a job, where you lost
connections to the server, or where something else went wrong.
In some cases to stop running your job you might have to stop all Python
processes on your computer and quit the GUI, but in most cases you can
just hit Abort in the GUI and wait a little.
MSA calculation vs model prediction
If you use the Phenix server, the calculation of multiple sequence alignments
(MSAs) is a separate step from model prediction. The Phenix GUI
on your computer sends a request to the mmseqs2 server, which creates an MSA
and sends it back to your computer. Then your computer uploads the MSA to
the Phenix server, which uses it in an AlphaFold prediction.
If you want, you can supply your own MSAs. The key requirement is that the
sequence of the first entry in your MSA must exactly match the sequence to be
predicted.
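If you supply your own MSA, a quick check of the first entry can save a failed run. The sketch below assumes an a3m or FASTA-style MSA file in which each entry starts with a ">" header line and optional comment lines start with "#"; the function names are hypothetical.

```python
def first_msa_sequence(msa_text):
    """Return the sequence of the first entry in an a3m/FASTA-style MSA."""
    lines, seen_header = [], False
    for line in msa_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if seen_header:
                break  # second header: the first entry is complete
            seen_header = True
            continue
        if seen_header:
            lines.append(line)
    return "".join(lines)

def msa_matches_sequence(msa_text, sequence):
    """True if the first MSA entry is exactly the sequence to be predicted."""
    return first_msa_sequence(msa_text) == sequence

# Hypothetical MSA with the query as the first entry
msa = """>query
MKTAYIAK
>hit1
MKTAYLAK
>hit2
MRT-YIAK
"""
print(msa_matches_sequence(msa, "MKTAYIAK"))
```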
You can also skip the use of MSAs. This can be useful if you supply a template
and you want AlphaFold to rebuild your template instead of doing a new
prediction.
Number of models
AlphaFold can carry out multiple predictions for a sequence. You can specify
how many of these to carry out. The PredictModel tool will choose the one with
the highest value of pLDDT (predicted local distance difference test).
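The selection among multiple predictions can be pictured with the following sketch. It is an illustration only: it assumes that each prediction is scored by its mean per-residue pLDDT, that predictions are indexed 0, 1, 2, and so on, and that prediction stops early once a score reaches good_enough_plddt (described in the keyword list as typically 80). The predict argument is a hypothetical caller-supplied function standing in for an AlphaFold run.

```python
def choose_prediction(predict, number_of_models=5, good_enough_plddt=80.0):
    """Return (index, mean pLDDT) of the best prediction, stopping early if good enough."""
    best = None
    for index in range(number_of_models):
        plddt = predict(index)  # per-residue pLDDT values for this prediction
        score = sum(plddt) / len(plddt)
        if best is None or score > best[1]:
            best = (index, score)
        if score >= good_enough_plddt:
            break  # no need to generate more models
    return best

# Toy stand-in for prediction: fixed per-residue pLDDT values for each index
toy_results = {0: [60.0, 70.0], 1: [85.0, 90.0], 2: [95.0, 99.0]}
calls = []

def toy_predict(index):
    calls.append(index)
    return toy_results[index]

best = choose_prediction(toy_predict, number_of_models=3)
print("Best prediction:", best, "after", len(calls), "predictions")
```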
Using templates from the PDB
You can request that templates from the PDB be used in prediction. If you use
this feature, models will be predicted both with and without the templates, and
the model with the highest pLDDT will be saved.
Using supplied templates
You can supply your own templates. As for templates from the PDB, if you use
this feature, models will be predicted both with and without the templates, and
the model with the highest pLDDT will be saved.
Using supplied predictions or RNA/DNA models
You can supply homology models or models that you have built yourself for any
of the chains in your structure. If you are working with RNA or DNA, you have
to do this. You just supply a model (does not have to be perfect) for each
chain that you do not want to have AlphaFold predict, and you call it a
predicted model (note the difference between a template -- a model to guide
AlphaFold prediction, and a predicted model -- the prediction or an alternative
to a prediction.)
Using PredictAndBuild to change the sequences in a model
If you have docked (cryo-EM) or placed (X-ray) chains for a structure from
one species using data from another species, you can use PredictAndBuild to
fix all the sequences and create a plausible model for each chain.
There are several ways you can do this. If you first run AlphaFold to get
a predicted model for each unique chain, you can
use a command like this:
Here existing.pdb is the model you have already created (wrong sequences).
The model replacements.pdb contains your predicted models for each unique
chain, where each predicted model has the right sequence, but can be
in any orientation or position. The
sequence file seq.dat has all the correct sequences. The map is required
as it is used to adjust the predicted models.
The keyword b_value_field_is=b_value tells PredictAndBuild that the predicted
models contain B-values (atomic displacement parameters), not pLDDT values
(predicted Local Distance Difference Test).
If you actually have no map,
you can create a model-based map from your existing.pdb structure with
the phenix.fmodel tool. The map type you want is called
complex (structure factors) and you would want to make it a low-resolution
map (like 10 A).
If you don't want to generate the predicted models yourself, you can skip
that input and just let PredictAndBuild do the predictions. Of course if you
have RNA or DNA chains, you will need to supply predicted models for those
chains.
Rebuilding a model without prediction (morph and refine only):
You can use predict_and_build to just rebuild a model that is placed (in a
cryo-EM map) like this:
In the GUI, select PredictAndBuild (cryo-EM data)
Supply: your map (call it “Map file”), your model (call it
“predicted model”), your sequence.
Just below the inputs there is a line that starts with
“Contents of b-value field”… Select “b_value” (not plddt),
as this model has B values in its b-value field.
In Prediction and building settings, go to All parameters,
then “Search parameters”, and search for “already”. Tick
the check box for “Models are already placed”.
In Prediction and building settings, select “Building” and check
the box for “Refine only”.
Hit Run. It should take your model, morph it to match
the map, and refine it.
Examples
Standard run of predict_and_build
Running predict_and_build is easy. From the command-line you can type:
This will carry out all the steps of prediction, molecular replacement and
iterative rebuilding and prediction to yield my_model_rebuilt.pdb.
If you run again with the same command, Predict and build will read all the previous files and just give you the results.
Common questions
How long will it take to run?
Running predict_and_build can take anywhere from an hour to several days,
depending largely on the size of the chains to be placed and the number of
chains. Increasing the number of processors used (default of 4) will speed
it up. If you are unsure if the program is running, have a look at the log
file, where the status and current time are printed out periodically. Also
you can just see if any Python processes are running on your machine.
Server problems
The most common problem in running Predict and build is that
the Phenix server is not working as expected. Normally the first thing
to try is just let the program retry (it will do this for a while normally).
Here is a test you can run to see if everything is ok:
In the GUI, hit AlphaFold/AlphaFold model prediction/Prediction settings/
and check the box for “Run test job on server and stop”. Then hit Run.
In about 10 sec it should say “Test job completed successfully” if everything
is ok.
If the server is still not working, you can take the files in the packaged
.tgz file (listed in the GUI output), use them to get your own prediction with
any server, and put the resulting predicted models in the place specified in
the GUI or program output.
Problems with low-confidence models
If your predicted models have low confidence, the entire procedure may not work.
If the resolution of your data is about 4.5 A or better, and most of
your predicted models have high confidence (pLDDT > 70), then
the procedure has a good chance of working. Otherwise, the chances are lower.
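This rule of thumb can be written as a small check. The sketch below is only a restatement of the guideline above, not a test carried out by Predict and build: it takes one overall pLDDT value per predicted model, and it interprets "most" as more than half of the models, which is an assumption. The function name is hypothetical.

```python
def good_chance_of_working(resolution, model_plddts,
                           worst_resolution=4.5, plddt_cutoff=70.0):
    """Apply the rule of thumb: resolution of 4.5 A or better and mostly confident models."""
    confident = sum(1 for plddt in model_plddts if plddt > plddt_cutoff)
    mostly_confident = confident > len(model_plddts) / 2  # "most" taken as more than half
    return resolution <= worst_resolution and mostly_confident

print(good_chance_of_working(3.2, [88.0, 75.0, 62.0]))
print(good_chance_of_working(5.0, [88.0, 90.0]))
```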
Symmetry problems
If your cryo-EM map has pseudo-symmetry (like a proteasome) you might need
to box one subunit or try ssm_search=False to use a more thorough search
in docking.
Working on individual chains in large cryo-EM structures
If you are working with a large cryo-EM structure (or even a small one),
you may find it
most efficient to break up the work into small pieces. If you can identify
where individual chains are in your map, you can box around each chain,
creating individual boxes for each chain to work on. One way to create
a box around a chain is to put a dummy molecule in the density you are
interested in (anything that more or less covers the region you want), then
use the MapBox tool to create a little map surrounding
that dummy molecule.
When you use the MapBox tool be
sure to use the defaults and do not shift the origin (if you shift the origin
then the models you get will not superimpose on the original map).
Then of course you have to guess which sequence goes with that density. You
could try a few possibilities and run PredictAndBuild on your small map with
each likely sequence.
When you are done with each box, you can just combine all the models because
they will stay in their correct locations.
Inverted map:
If your map is inverted (left-handed), docking and model-building will not
work properly. You can often tell if your map is inverted because any
helices will be left-handed. If you are unsure, you can run MapBox with
invert_hand=True to invert the map and then see if docking works. Note that
if your map is inverted, you will want to invert all your maps and start
everything from the beginning.
Specific limitations and problems:
AlphaFold only predicts protein structures, not RNA or DNA, so you will have to
either build those chains separately or supply predicted models (homology
models, for example) for those chains. If you supply predicted models for some
chains and not others, predict_and_build will try to predict the missing
chains and build with all the available models.
The predict_and_build tool has two command-line parameters for nproc
(prediction.nproc and control.nproc), two for resolution
(prediction.resolution and crystal_info.resolution), and two for random_seed
(prediction.random_seed and control.random_seed).
Normally use the control.nproc, crystal_info.resolution and
control.random_seed parameters. However, if you want to change the number
of processors used in prediction, you can set prediction.nproc (on the server,
nproc is only used for getting templates with structure_search and is limited
to 4 processors).
If you want to set the random seed used in prediction to a particular value
you can set prediction.random_seed (and also usually set
use_supplied_random_seed_in_prediction=True).
Literature
Accelerating crystal structure determination with iterative AlphaFold prediction. T.C. Terwilliger, P.V. Afonine, D. Liebschner, T.I. Croll, A.J. McCoy, R.D. Oeffner, C.J. Williams, B.K. Poon, J.S. Richardson, R.J. Read, and P.D. Adams. Acta Cryst D79, 234-244 (2023).
Improved AlphaFold modeling with implicit experimental information. T.C. Terwilliger, B.K. Poon, P.V. Afonine, C.J. Schlicksup, T.I. Croll, C. Millán, J.S. Richardson, R.J. Read, and P.D. Adams. Nature Methods 19, 1376-1382 (2022).
AlphaFold predictions: great hypotheses but no match for experiment. T.C. Terwilliger, P.V. Afonine, D. Liebschner, T.I. Croll, A.J. McCoy, R.D. Oeffner, C.J. Williams, B.K. Poon, J.S. Richardson, R.J. Read, and P.D. Adams. bioRxiv (2023).
Making protein folding accessible to all. M. Mirdita, K. Schütze, Y. Moriwaki, L. Heo, S. Ovchinnikov, and M. Steinegger. Nature Methods 19, 679-682 (2022).
Highly accurate protein structure prediction with AlphaFold. J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, A. Bridgland, C. Meyer, S.A.A. Kohl, A.J. Ballard, A. Cowie, B. Romera-Paredes, S. Nikolov, R. Jain, J. Adler, T. Back, S. Petersen, D. Reiman, E. Clancy, M. Zielinski, M. Steinegger, M. Pacholska, T. Berghammer, S. Bodenstein, D. Silver, O. Vinyals, A.W. Senior, K. Kavukcuoglu, P. Kohli, and D. Hassabis. Nature 596, 583-589 (2021).
MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. M. Steinegger and J. Söding. Nature Biotechnology 35, 1026-1028 (2017).
lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. V. Mariani, M. Biasini, A. Barbato, and T. Schwede. Bioinformatics 29, 2722-2728 (2013).
Additional information
List of all available keywords
jobname = '' Name of this job. Used to create a directory that will hold all the working files for the run. Normally set automatically by the GUI to match the directory where the job is run: directory: PredictAndBuild_30, jobname: PredictAndBuild_30. When run on command line you can specify any 4-or-more character jobname if you want.
carry_on_directory = None Name of carry-on directory (stores intermediate and final results). You can continue with a previous run by specifying this directory from a previous run. To rerun or carry on from the job in /Users/Documents/PredictAndBuild_30, you specify carry_on_directory= /Users/Documents/PredictAndBuild_30/PredictAndBuild_30_CarryOn. Then if you also specify carry_on=True, files will be read from this directory and copied to a new carry_on_directory inside the working directory. Normally set automatically by the GUI in the directory containing your job: directory: PredictAndBuild_30, jobname: PredictAndBuild_30 carry_on_directory: PredictAndBuild_30_CarryOn. NOTE: Cannot be the current directory. NOTE: if you specify a CarryOn directory you must specify carry_on=True if you want it to be used.
upload_download_directory = None Directory to use for uploading and downloading files from Colab. Normally use your standard Downloads folder.
alphafold_model_suffix = AlphaFold_cycle_1.pdb Name of AlphaFold model is <jobname>_<alphafold_model_suffix> when coming from Colab server.
pae_suffix = PAE_cycle_1.jsn Name of PAE file is <jobname>_<pae_suffix> when coming from Colab server.
carry_on = False Carry on with a job from where it left off. Read in all files that were created instead of making new ones. To use, run the job you want exactly as you did the first time except set carry_on=True and specify carry_on_directory=xxx where xxx is the carry_on_directory from the run you want to continue. If your previous run was /Users/Documents/PredictAndBuild_30, specify: carry_on_directory= /Users/Documents/PredictAndBuild_30/PredictAndBuild_30_CarryOn. NOTE: if you specify a CarryOn directory you must also specify carry_on=True (False is the default) if you want this to be used.
prediction_server = *PhenixServer Colab ManualPrediction LocalServer Choose where prediction calculations will be done. Normally use the Phenix server (PhenixServer). As a backup you can use Colab (you will need a Google account.) Use ManualPrediction if you are going to do the prediction off-line and copy the result to the location specified by PredictAndBuild. LocalServer is used internally only.
run_directly = False Run directly. Used internally only
max_len_at_server = 1500 Maximum length of sequence at server. Used internally only
dummy_run = False Write dummy file. Used internally only
run_server_test = False Run test of server and stop (does nothing else and does not require any inputs)
copy_of_final_model = None Copy final model here. Used internally and by predict_model. (Also copy map files to same directory). If None, copy to working directory. Use copy_of_final_model=null to not write these files at all.
copy_of_msa = None Copy final msa here (only if there is just one). Used internally
copy_of_pae = None Copy final pae here (only if there is just one). Used internally
predicted_model_prefix = None Predicted models will have this prefix. Used by predict_model.
job_title = None Job title in PHENIX GUI, not used on command line
predict_and_build
max_rmsd_to_predict_together = 2 Two models are considered similar (and not worth predicting separately) if they have an rmsd after superposition of this value or less (A)
template_file = None Template model to be used in prediction.
get_msa_with_mmseqs2 = True Run the mmseqs2 server from this computer to get MSA if not already available. Alternatives are to supply an MSA or to run the mmseqs2 server from the Phenix server or Colab.
combine_rebuilt_and_autobuild = True Combine rebuilt and autobuild models and use if better Rfree (X-ray only).
include_side_in_templates = False Include side-chains beyond CB in templates as well as just main and CB (two runs).
only_include_side_in_templates = None Include side-chains beyond CB in templates only (one run)
use_templates_in_prediction = True Use templates if available (i.e., after first cycle)
required_free_r_to_keep = 0.49 Skip refinements if R is worse than this value
autobuild_resolution = 3.5 Worst resolution where autobuild density modification may be useful.
autobuild_satisfactory_r_value = 0.25 Skip autobuild if R is this value or better
autobuild_for_xray_density_modification = None Use autobuild rebuilding for density modification. Default is True if resolution is autobuild_resolution or better.
autobuild_n_cycle_rebuild_max = 3 Maximum autobuild rebuild cycles
autobuild_n_cycle_build_max = 1 Maximum autobuild build cycles
autobuild_quick = False Run AutoBuild in quick mode
autobuild_min_resolution = 2 Minimum resolution for autobuilding. Resolution will be cut off at this value if it is better (smaller number).
density_modify_at_start = None Density modify cryo-EM maps before analysis. Requires two half maps. Default is True if half maps are supplied and no full map is supplied, otherwise False. Works best if maps contain entire structure with boundary (at least 5 A) or are original full-sized maps.
density_modify_with_model = None Density modify using rebuilt model each cycle. Requires two half-maps for cryo-EM. Works best for cryo-EM if maps contain entire structure with boundary (at least 5 A) or are original full-sized maps. Default is True for X-ray and False for cryo-EM.
large_sequence = 1500 Length of sequence that requires job_size=large
pause_after_mr = False Pause after MR and refinement (X-ray only). You can go on after checking your results.
pause_after_docking = False Pause after docking and refinement (Cryo-EM only). You can go on after checking your results.
stop_after_predict = False Stop after prediction. No map file needed if set.
stop_after_mr = False Stop after MR.
parallel_predictions = 1 Number of predictions to run at once. If more than one, no feedback is provided until the end. If there are waiting jobs in the queue running more than one at a time does not help (your first job may go near the front of the line but the others stay at the end until the first one finishes).
write_prediction_files_and_wait = False Write prediction files in working directory and wait for result to appear. Only used for servers.
put_files_for_prediction_here = None Program will put prediction files (pdb, msa) in this directory for the rest server to read (may look like /app/xxx/xxx)
files_for_prediction_will_be_transfered_here = None Program will expect that prediction files (pdb, msa) from the put_files_for_prediction_here directory will be copied to this directory so the rest server can read them. Your system has to automatically do this transfer.
precalculated_msas_dir = /app/results/precalculated_msas Location of precalculated msas (usually on server)
precalculated_results_dir = /app/results/precalculated_results Location of precalculated results (usually on server)
precalculated_pae_dir = /app/results/precalculated_pae Location of precalculated pae files (usually on server)
allow_precalculated_result_file = True Allow using precalculated results if inputs are identical
allow_precalculated_msa_file = True Allow using precalculated msas if inputs are identical
files_from_prediction_may_be_found_here = None Look for output from prediction here (in case full path is not available)
rest_server_incoming = None Location on REST server where files should be placed
max_wait_time = 100000 Maximum wait time for write_prediction_files_and_wait (sec)
stop_if_internet_not_available = True Stop if attempting to predict models online and internet is not available
number_of_models = 5 Number of models to generate for each chain. Normal way to set the value of random_seed_iterations. If a good pLDDT is found, no more models are created, where good is good_enough_plddt (typically 80)
use_supplied_random_seed_in_prediction = False Use random seed to set up prediction (default is to use a standard random seed).
update_unique_sequences = True Update unique sequences based on MR or docking results
minimum_residues = 20 Minimum residues in a chain
maximum_residues = 10000 Maximum residues in a sequence that can be used for prediction
break_into_chunks_if_length_is = 1500 If a sequence is less than maximum_residues but at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size, then reassemble using overlap window of overlap_match_size.
chunk_size = 600 If a sequence is less than maximum_residues but at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size, then reassemble using overlap window of overlap_match_size.
overlap_size = 200 If a sequence is less than maximum_residues but at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size, then reassemble using overlap window of overlap_match_size.
overlap_match_size = 50 If a sequence is less than maximum_residues but at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size, then reassemble using overlap window of overlap_match_size.
morph_reassembled_model = True If break_into_chunks_if_length_is is applied, after reassembling identify close domains that were in different chunks and repredict them to get their relationships. Then adjust the reassembly to minimize overlaps and match pair relationships. NOTE: This reduces overlaps and can improve domain relationships but can yield distortions of the predicted model.
minimum_cycles = 2 Minimum cycles to carry out
use_input_scaffold_if_supplied = True If input scaffold is provided, use it throughout (all cycles)
default_maxit_dir = /app/results/rest/ Default place to look for maxit if not supplied. This directory should contain the whole maxit-v11.100-prod-src
skip_copying_files = False Skip copying output files to GUI directory
refinement
macro_cycles = 3 Refinement macro_cycles
rest_server
url = None The URL for the Phenix REST server Normally set automatically
port = None The port for interacting with the Phenix REST server Normally set automatically
token = None Authentication token for accessing the Phenix REST server. Normally set automatically
timeout = 5 Time to keep trying to connect to a server
quick_check_interval = 1 Time in seconds between job status checks
check_interval = 60 Time in seconds between job status checks
max_tries = 10 Number of tries to get result from server
max_tries_on_availability = 2 Number of tries to see if server is up
stop_if_server_not_available = True Stop if server is not available (wrong url or down)
job_size = small *medium large Size of job (small, medium, large)
requires_gpu = True Job requires GPU
running_server_test = False This is a test job
verbose = False Verbose output
input_files
seq_file = None Input sequence file. Required if no models supplied. Used to create predicted models. Format should be Fasta or simple sequences separated by blank lines. One sequence per chain to be generated. Must match input models if both are supplied.
xray_data_file = None Input X-ray data file (MTZ format).
xray_data_label = None Label specifying which data column to use from xray_data_file
xray_test_flag_label = None Label specifying which test flag column (if any) to use from xray_data_file
truncate_models_to_match_sequences = True If input sequences start after or end before supplied predicted models, trim the predicted models to match sequences
density_select = None If set, trim map to region containing molecule (non-zero region) before use. Recommended for cryo-EM maps that have a large map that is mostly unused. Default is True if map is full-size and False if not.
predicted_model = None One or more input models in one or more files (normally predicted models) to be placed. Normally should start at residue 1 and match sequence file exactly.
b_value_field_is = *plddt rmsd b_value The B-factor field in predicted models can be pLDDT (confidence, 0-1 or 0-100) or rmsd (A) or a B-factor. Only applies to protein chains (always B-factor for non-protein).
msa_list = None Multiple sequence alignment (MSA) file. Format is a3m only. First sequence in an MSA file must match a sequence in the sequence file exactly.
models_are_already_placed = False You can specify that all models (predicted_model or model) are already placed (docked, placed in unit cell). No docking or MR is done if so. Equivalent to specifying a scaffold_model with the same contents as the model.
model_copies_list = None List of number of copies to find of each model in predicted_model. Default is 1 (must specify all or none of them). Specify all together like model_copies_list=1,2,1,1
processed_model_file = None Processed model file (e.g., from phenix.process_predicted_model). If supplied, skip the process_predicted_model step and use this file. This model is expected to have one chain for each domain and actual B-values in the B-value field. It is expected that all poorly-predicted residues have been removed.
docked_model_file = None Docked model (e.g., output of dock_processed_model). If supplied, use this file as the docked model (skip process_predicted_model and docking steps). This model is expected to have a single chain with gaps for parts of the model that are not accurately known from the prediction. NOTE: You still need to supply the predicted model as the input model for this procedure in addition to this docked model.
morphed_model_file = None Morphed model (e.g., full model morphed to match output of dock_processed_model). If supplied, skip the process_predicted_model and docking steps and use this file as the morphed model. This model is expected to have a single chain with no gaps. NOTE: You do not need to supply the predicted model as the input model for this procedure in addition to this morphed model.
previous_model_file = None Previous model (from a previous run or external). Must match sequence of working model exactly. Used as a source of possible model information and as a hypothesis for docking of working model. If docked_model_file is also supplied, previous_model is used only as source of model information.
scaffold_model = None Scaffold model. If supplied, used as target for docking of each chain in predicted_model or model. Must be similar in sequence to models to be docked. If supplied, model_copies_list is ignored and docking is done by superposition onto this structure instead of docking into density or MR. If the scaffold model does not have all chains represented, they are added in by docking.
scaffold_minimum_chain_cc = 0.35 Chains in scaffold model that have a CC of at least scaffold_minimum_chain_cc are not re-docked, but those with lower CC are.
base_scaffold_minimum_chain_cc = -1 Value of scaffold_minimum_chain_cc in cases where scaffold is known to be good.
fragments_model_file = None Fragments model (e.g., map_to_model.pdb). Used as a source of possible model information and as a hypothesis for docking of working model. If docked_model_file is also supplied, fragments_model is used only as source of model information.
model_to_rebuild_file = None Model to be rebuilt. Must be already in place (already docked). Must match supplied sequences exactly.
symmetry_file = None Symmetry file (.ncs_spec format or MATRX records) with reconstruction symmetry. Used in identification of unique part of map. NOTE: symmetry file applies to map file in original position. NOTE 2: Only proper symmetry (point-group, helical) is allowed. NOTE 3: Only applies to cryo-EM maps.
pae_file = None Optional input json file with matrix of inter-residue estimated errors (pae file)
distance_model_file = None Distance_model_file. A PDB or mmCIF file containing the model corresponding to the PAE matrix. Only needed if weight_by_ca_ca_distances is True.
search_model_copies = None Used internally only
search_model = None used internally only
map_model
full_map = None Input map file (for cryo-EM). This can be a boxed or masked map showing just the molecule to dock (best) or a full map with symmetry. If your map has symmetry be sure to set asymmetric_map = False. If you have a map with symmetry you can supply a symmetry file if you want. Otherwise symmetry will be automatically determined.
half_map = None Input half map files. Usually supply one full map or 2 half maps
model = None Input predicted model (e.g., AlphaFold model). Assumed to have pLDDT values in B-value field (or RMSD values). May have multiple chains. Normally use predicted_model instead.
output_files
output_model_prefix = None Output files with superposed models will begin with this prefix
output_seq_file = None Sequence file (possibly edited) written to this file by PredictAndBuild
pdb_out = None Used internally only
temp_dir = None Temporary directory. Default is dock_and_build_xx where xx creates new directory
crystal_info
resolution = None High-resolution limit for main search. This can be lower resolution than the data. The search is quicker at lower resolution. If your model is poor, try 2-3 A lower resolution than your data (e.g., if your data is 2.5 A, try 5 A).
scattering_table = *None n_gaussian wk1995 it1992 electron neutron Choice of scattering table for structure factor calculations. Default for X-ray is n_gaussian, for cryoEM is electron.
wrapping = None You can specify whether the map is wrapped (can map values outside bounds to inside with cell translations).
asymmetric_map = None Specifies that this is an asymmetric map and no symmetry is to be supplied or found. Alternative to supplying a symmetry file or symmetry. Applies to cryo-EM reconstructions only.
solvent_content = None Solvent fraction (content) of the cell. You can specify the fraction of the volume of this cell that is taken up by the macromolecule. Normally set automatically. Values go from 0 to 1.
pdb70_text = None Database of PDB entries to be used as templates (pdb70 file). Normally used internally only
sequence = None Old-style sequence string (alternative to sequence file).
chain_type = *PROTEIN DNA RNA Chain type
msa = None MSA as text. Supply MSA as text or as an msa_file. Used internally
templates_as_string = None Templates as string (single PDB file with one or more chains)
space_group = None Space group (normally read from the data file, applies to X-ray)
space_group_alternatives = *HAND ALL NONE Space group alternatives for MR: ALL: all space groups in point group HAND: enantiomer and listed space group LIST: list supplied (not available) NONE: only supplied space group
unit_cell = None Unit Cell (normally read from the data file, applies to X-ray)
unique_sequences File names and data labels.
sequence = "Enter or edit sequence"
copies = 1 Copies of this sequence
label = ""
iteration
cycles = None Cycles of prediction and rebuilding (default is 10 for Thorough, 3 for Standard and 1 for Quick)
cycle = 1 Iteration cycle
process_predicted_model
remove_low_confidence_residues = True Remove low-confidence residues (based on minimum plddt or maximum_rmsd, whichever is specified)
continuous_chain = False When removing low-confidence residues, only trim from ends
split_model_by_compact_regions = True Split model into compact regions after removing low-confidence residues.
maximum_domains = 3 Maximum domains to obtain. You can use this to merge the closest domains at the end of splitting the model. Make it bigger (and optionally make domain_size smaller) to get more domains. If model is processed in chunks, maximum_domains will apply to each chunk.
domain_size = 15 Approximate size of domains to be found (A units). This is the resolution that will be used to make a domain map. If you are getting too many domains, try making domain_size bigger (maximum is 70 A).
adjust_domain_size = True If more than maximum_domains are initially found, increase domain_size in increments of 5 A and take the value that gives the smallest number of domains, but at least maximum_domains.
minimum_domain_length = 10 Minimum length of a domain to keep (reject at end if smaller).
maximum_fraction_close = 0.3 Maximum fraction of CA in one domain close to one in another before merging them
minimum_sequential_residues = 5 Minimum length of a short segment to keep (reject at end if smaller).
minimum_remainder_sequence_length = 15 used to choose whether the sequence of a removed segment is written to the remainder sequence file.
b_value_field_is = *plddt rmsd b_value The B-factor field in predicted models can be pLDDT (confidence, 0-1 or 0-100) or rmsd (A) or a B-factor
input_plddt_is_fractional = None You can specify if the input plddt values (in B-factor field) are fractional (0-1) or not (0-100). By default if all values are between 0 and 1 it is fractional.
minimum_plddt = None If low-confidence residues are removed, the cutoff is defined by minimum_plddt or maximum_rmsd, whichever is defined (you cannot define both). A minimum plddt of 0.70 corresponds to a maximum rmsd of 1.5. Minimum plddt values are fractional or not depending on the value of input_plddt_is_fractional.
maximum_rmsd = 1.5 If low-confidence residues are removed, the cutoff is defined by minimum_plddt or maximum_rmsd, whichever is defined (you cannot define both). A minimum plddt of 0.70 corresponds to a maximum rmsd of 1.5. Minimum plddt values are fractional or not depending on the value of input_plddt_is_fractional.
default_maximum_rmsd = 1.5 Default value of maximum_rmsd, used if maximum_rmsd is not set
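The correspondence quoted above (a minimum plddt of 0.70 matching a maximum rmsd of 1.5) can be illustrated with a short sketch. It assumes the empirical conversion rmsd = 1.5 * exp(4 * (0.7 - pLDDT)) with fractional pLDDT, which is not stated in this parameter list and is taken here as an assumption. The function names are hypothetical and are not part of Phenix.

```python
import math


def estimated_rmsd_from_plddt(plddt_values, input_plddt_is_fractional=None):
    """Estimate per-residue rmsd (A) from pLDDT values.

    ASSUMPTION: uses rmsd = 1.5 * exp(4 * (0.7 - fractional_plddt)).
    If input_plddt_is_fractional is None, values are treated as fractional
    only when all of them lie between 0 and 1 (as described above).
    """
    values = list(plddt_values)
    if input_plddt_is_fractional is None:
        input_plddt_is_fractional = all(0.0 <= v <= 1.0 for v in values)
    divisor = 1.0 if input_plddt_is_fractional else 100.0
    return [1.5 * math.exp(4.0 * (0.7 - v / divisor)) for v in values]


def keep_residue_flags(plddt_values, maximum_rmsd=1.5,
                       input_plddt_is_fractional=None):
    """Return True for residues whose estimated rmsd is within maximum_rmsd."""
    rmsds = estimated_rmsd_from_plddt(plddt_values, input_plddt_is_fractional)
    return [rmsd <= maximum_rmsd for rmsd in rmsds]


if __name__ == "__main__":
    # pLDDT on the 0-100 scale: 90 and 70 are kept, 40 is removed
    print(keep_residue_flags([90.0, 70.0, 40.0]))
```

Under this assumption, raising maximum_rmsd keeps more residues, which is equivalent to lowering minimum_plddt.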
subtract_minimum_b = False If set, subtract the lowest B-value from all B-values just before writing out the final files. Does not affect the cutoff for removing low- confidence residues.
pae_power = 1 If PAE matrix (predicted alignment error matrix) is supplied, each edge in the graph will be weighted proportional to (1/pae**pae_power). Use this to try and get the number of domains that you want (try 1, 0.5, 1.5, 2)
pae_cutoff = 5 If PAE matrix (predicted alignment error matrix) is supplied, graph edges will only be created for residue pairs with pae<pae_cutoff
pae_graph_resolution = 0.5 If PAE matrix (predicted alignment error matrix) is supplied, pae_graph_resolution regulates how aggressively the clustering algorithm is. Smaller values lead to larger clusters. Value should be larger than zero, and values larger than 5 are unlikely to be useful
weight_by_ca_ca_distance = False Adjust the edge weighting for each residue pair according to the distance between CA residues. If this is True, then distance_model can be provided, otherwise supplied model will be used. See also distance_power
distance_power = 1 If weight_by_ca_ca_distance is True, then edge weights will be multiplied by 1/distance**distance_power.
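The PAE-related keywords above (pae_power, pae_cutoff, weight_by_ca_ca_distance, distance_power) can be read together as a rule for weighting one graph edge. The sketch below is only an illustration of that rule as described in the text; the function name and the min_pae guard against division by zero are assumptions, and the clustering step controlled by pae_graph_resolution is not shown.

```python
def pae_edge_weight(pae, pae_power=1.0, pae_cutoff=5.0,
                    ca_ca_distance=None, distance_power=1.0,
                    min_pae=1.0e-3):
    """Return the weight of the edge for one residue pair, or None if no edge.

    An edge is created only if pae < pae_cutoff. Its weight is
    1 / pae**pae_power, optionally multiplied by
    1 / distance**distance_power when a CA-CA distance is supplied.
    min_pae is a hypothetical guard (not a Phenix keyword) for pae == 0.
    """
    if pae >= pae_cutoff:
        return None
    weight = 1.0 / max(pae, min_pae) ** pae_power
    if ca_ca_distance is not None:
        weight *= 1.0 / ca_ca_distance ** distance_power
    return weight


if __name__ == "__main__":
    print(pae_edge_weight(2.0))                      # plain PAE weighting
    print(pae_edge_weight(2.0, pae_power=2.0))       # stronger PAE dependence
    print(pae_edge_weight(2.0, ca_ca_distance=4.0))  # with distance weighting
    print(pae_edge_weight(6.0))                      # above cutoff: no edge
```

A larger pae_power makes confident pairs dominate the graph more strongly, which is one way to change how many domains are found.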
stop_if_no_residues_obtained = True Raise Sorry and stop if processing yields no residues
keep_all_if_no_residues_obtained = False Keep everything if processing yields no residues
vrms_from_rmsd_intercept = 0.25 Estimate of vrms (error in model) from pLDDT will be based on vrms_from_rmsd_intercept + vrms_from_rmsd_slope * pLDDT where mean pLDDT of non-low_confidence_residues is used.
vrms_from_rmsd_slope = 1.0 Estimate of vrms (error in model) from pLDDT will be based on vrms_from_rmsd_intercept + vrms_from_rmsd_slope * pLDDT where mean pLDDT of non-low_confidence_residues is used.
break_into_chunks_if_length_is = 1500 If a sequence is at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size for domain identification using split_model_by_compact_regions without a pae matrix
chunk_size = 600 If a sequence is at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size for domain identification using split_model_by_compact_regions without a pae_matrix
overlap_size = 200 If a sequence is at least break_into_chunks_if_length_is, break it into chunks of length chunk_size with overlap of overlap_size for domain identification using split_model_by_compact_regions without a pae_matrix
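The chunking described by break_into_chunks_if_length_is, chunk_size and overlap_size can be pictured with a minimal sketch. The exact way Phenix places chunk boundaries is not given here, so the fixed-step scheme below (step = chunk_size - overlap_size, last chunk truncated at the end of the sequence) is an assumption, and chunk_ranges is a hypothetical helper.

```python
def chunk_ranges(n_residues, chunk_size=600, overlap_size=200,
                 break_into_chunks_if_length_is=1500):
    """Return 1-based inclusive (start, end) residue ranges for each chunk.

    Sequences shorter than break_into_chunks_if_length_is are returned
    as a single range. ASSUMPTION: consecutive chunks start
    chunk_size - overlap_size residues apart.
    """
    if n_residues < break_into_chunks_if_length_is:
        return [(1, n_residues)]
    step = chunk_size - overlap_size
    if step <= 0:
        raise ValueError("overlap_size must be smaller than chunk_size")
    ranges = []
    start = 1
    while True:
        end = min(start + chunk_size - 1, n_residues)
        ranges.append((start, end))
        if end == n_residues:
            break
        start += step
    return ranges


if __name__ == "__main__":
    for start, end in chunk_ranges(1500):
        print(start, end)
```

With the default values, a 1500-residue sequence gives four chunks under this scheme, each sharing 200 residues with its neighbor (the last chunk is shorter).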
search
dock_chains_individually = False Dock each chain (identified by chain_id field) individually. This is useful if the chains might have different orientations in the map compared to the search model, such as in a search model that is a predicted model. Normally used along with create_unique_chain_at_end=True for predicted models to put the model back together.
create_unique_chain_at_end = None Take all docked chains and try to create one chain that has no duplicate residue numbers and that has distances between ends of fragments consistent with the number of residues between them. If symmetry has been applied, create N copies representing that symmetry. Default is True if dock_chains_individually = True and False otherwise.
minimum_cc_to_keep_domain = 0.2 Do not keep domains in create_unique_chain_at_end if cc is less than minimum_cc_to_keep_domain.
weight_sequential_fragments_by_distance = None Try to dock a chain in a way that minimizes the distance between its first residue and the previously-docked residue with the highest residue number less than this, and similarly for the last residue and the next available placed residue. Default is True if create_unique_chain_at_end is set. Normally use only if you have a model with a single chain and you are docking pieces of that chain.
choose_better_of_individual_and_group_docking = None Dock entire search model (normally one only) and also by chain...pick better-fitting of the two for each chain
low_res_search = True Try to fit by searching at low resolution first
dock_with_mr = True Try using MR to dock first
ssm_search = None Try to fit by searching for secondary structure. Default is False for dock_in_map and True for dock_and_rebuild if dock_with_mr is not used.
refine_cycles = 3 rigid-body refinement cycles
ssm_search_min_cc = 0.30 Stop ssm search if this cc achieved
backup_resolution_cutoff_searches = 2 If initial fitting does not work, try up to this many times increasing resolution by backup_resolution_ratio each time
backup_resolution_ratio = 1.67 If initial fitting does not work, try up to backup_resolution_cutoff_searches times increasing the resolution by backup_resolution_ratio each time
resolution_radius_scale = 0.5 Resolution for low-res search will be resolution_radius_scale times the radius of gyration of the search model.
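As a rough illustration of backup_resolution_cutoff_searches, backup_resolution_ratio and resolution_radius_scale, the sketch below lists the resolution values that would be tried. It assumes that "increasing the resolution" means multiplying the resolution value in A by the ratio (that is, going to lower resolution) on each retry. The function names are hypothetical.

```python
def search_resolutions(resolution, backup_resolution_cutoff_searches=2,
                       backup_resolution_ratio=1.67):
    """Return the initial resolution followed by the backup resolutions (A).

    ASSUMPTION: each backup search multiplies the previous resolution
    value by backup_resolution_ratio.
    """
    return [resolution * backup_resolution_ratio ** i
            for i in range(backup_resolution_cutoff_searches + 1)]


def low_res_search_resolution(radius_of_gyration, resolution_radius_scale=0.5):
    """Resolution (A) for the low-resolution search, as described above."""
    return resolution_radius_scale * radius_of_gyration


if __name__ == "__main__":
    print([round(d, 2) for d in search_resolutions(3.0)])
    print(low_res_search_resolution(20.0))
```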
align_moments = False Try to fit by aligning moments of inertia if size of density region and molecule are similar
max_radius_ratio = 2. Radius of gyration of molecule and density must be within max_radius_ratio of each other
radius_scale = 1.5 Mask for density and atoms will be radius_scale times the radius of gyration of model
use_symmetry = True If search_model_copies or search_map_copies are the same for each model and greater than one and allow_symmetry is set, use symmetry in the map to place all copies after the first one.
skip_if_low_cc = True Skip solution before rigid-body refinement if map-model CC is less than half of min_cc.
rigid_body_refinement = True Run rigid-body refinement on final model
rigid_body_refinement_single_unit = True Run rigid-body refinement with just one unit (do not break up into chains)
rigid_body_refinement_split_method = *chain_id segid When splitting up molecule for rigid-body refinement (if rigid_body_refinement_single_unit=False), use either chain_id or segid to split up molecule
rigid_body_refinement_resolution = None Run rigid-body refinement at this resolution if specified
append_to_fixed_model = True Append placed search model to fixed model (if any) after search
min_cc = 0.4 If quick run, stop if minimum CC is achieved in local search. Also always skip if starting CC_mask is less than 1/4 min_cc.
run_in_boxes = True Run on sub-boxes and combine at the end
target_box_size = 60 Try to get boxes about this big on a side (grid units)
target_boxes = None Try to get this many boxes. Default is nproc unless this makes box size much smaller than target_box_size
box_to_run = None Run only this box
box_overlap_scale = 1 box overlap (overlap of boxes) will be box_overlap_scale times the density radius
edge_ratio = 10 Box edge is box_overlap times edge_ratio
density_radius = None Radius for density to be cut out and compared. Default is 6 times the resolution.
model_radius = 3 Radius for removing density near fixed_model
zero_value = 0 Value to set map in regions overlapping fixed model
density_peaks = 20 Number of NCS-related peaks of density to check
delta_phi = 20 Angular spacing of search
max_rot = None Maximum rotations to try
rotz_only = None Rotate only around Z
single_positions_to_try = 10 Number of offset positions to try in optimizing orientation. Positions along the chain are selected as centers and a local fit near each position is carried out. The resulting offsets relative to the original placement are used to optimize the overall orientation and position.
max_position_shift_frac = 0.05 Maximum fractional positional shift in single_positions run
min_relative_cc = 0.67 Minimum local CC relative to original CC to keep a local search. This is a way to reject local searches that are completely wrong.
sieve_fit = None Use sieve_fit fraction of single positions in fitting. If None, use all
ncs_copies_max = None Maximum number of matching models to write. If more than one they will be written as MODEL records in the output PDB file. You can get them individually afterwards with phenix.pdbtools placed_model.pdb keep="model 1" etc.
start_rot = None Three numbers rotx, rotz, rotx defining the starting rotation of the search model. Normally used along with delta_phi=1000 or max_rot=1 to generate exactly one defined rotation.
search_center = None Optional coordinates in search model for centering search. Note this is different from target_search_center which is the location xyz in the map to look.
search_center_selection = None Optional selection defining coordinates in search model for centering search.
target_search_center = None Optional coordinates in reference map where search_center should be approximately located after superimposing maps. Used to eliminate possible superpositions that place the search center elsewhere. Overlap scores decreased based on distance to target_search_center/density_radius.
model_search_position = None Optional coordinates (usually part of search model) for matching to target_search_position after transformation. These can be specified in addition to target_search_center. Can have multiple model_search_position and target_search_position pairs by specifying each multiple times in order. NOTE: Not compatible with map search
target_search_position = None Target positions for model_search_position after transformation.
search_position_radius = None Radius for comparison of target_search_position and transformed search_position values. Default is density_radius. If specified, must be a single value or the same number of values as entries in model_search_position and target_search_position
rot_id_n = None Number of rotation groups. Along with rot_id_group, allows defining groups of rotations to be carried out in one run.
rot_id_group = None rotation group to include. See rot_id_n.
map_box = True Run map_box to extract useful part of map before search
fix_search_position = False You can choose to not move your model center of mass to the origin by fixing the search position
search_box_size = None You can choose the size of the search box for local searches. Normally set by default corresponding to 3 times density_radius
lower_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
upper_bounds = None You can select a part of your map for analysis with lower_bounds and upper_bounds.
keep_search_order = None Keep search order as input
remove_water = False Remove waters and other hetero atoms from input files
prediction
prediction_method = *alphafold Prediction method to use
template_search_method = *mmseqs2 structure_search Method to identify templates from PDB and generate MSAs. If structure_search is set, you can specify what PDB databases to use (same as in phenix.structure_search). If structure_search is set, you must supply your own MSA file with the keyword upload_msa_file=True (structure_search does not generate MSAs).
starting_alphafold_model = None Starting AlphaFold model. You can supply an AlphaFold model and skip the initial AlphaFold step. This is equivalent to setting up an output_directory with just an AlphaFold model named as the expected first AlphaFold model and specifying carry_on=True.
input_directory = ColabInputs Input directory containing density map. The map filename must start with the same characters as the jobname (only including characters before the first underscore). If you are supplying an MSA file it goes here as well.
output_directory = ColabOutputs Output directory. Copy outputs to output_directory. Used to restart with carry on.
save_outputs_in_google_drive = False If run on Colab, copy outputs to output_directory in Google drive
content_dir = None Content directory. Default is working directory
maxit_path = None Path to maxit (pdb to cif) converter. Optional.
data_dir = /mnt Data directory (location of AlphaFold parameters)
upload_file_with_jobname_resolution_sequence_lines = None Upload a file with a set of jobs (Colab only). Each line in the file is a jobname, resolution, and sequence
maximum_cycles = 10 Maximum cycles to carry out
cycle_rmsd_to_resolution_ratio = 0.25 Stop iteration if rmsd between subsequent AlphaFold models is less than cycle_rmsd_to_resolution_ratio times the resolution for two cycles in a row
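The stopping rule for cycle_rmsd_to_resolution_ratio can be written as a small check. This is a sketch of the criterion as worded above only; it does not include the limits set by minimum_cycles and maximum_cycles, and should_stop_iteration is a hypothetical name.

```python
def should_stop_iteration(rmsd_between_cycles, resolution,
                          cycle_rmsd_to_resolution_ratio=0.25,
                          cycles_in_a_row=2):
    """Return True if the last cycles_in_a_row rmsd values are all below
    cycle_rmsd_to_resolution_ratio times the resolution.

    rmsd_between_cycles: rmsd (A) between subsequent predicted models,
    one value per cycle, oldest first.
    """
    if len(rmsd_between_cycles) < cycles_in_a_row:
        return False
    threshold = cycle_rmsd_to_resolution_ratio * resolution
    recent = rmsd_between_cycles[-cycles_in_a_row:]
    return all(rmsd < threshold for rmsd in recent)


if __name__ == "__main__":
    # At 3 A resolution the threshold is 0.75 A
    print(should_stop_iteration([2.0, 0.6, 0.5], resolution=3.0))
    print(should_stop_iteration([2.0, 0.6], resolution=3.0))
```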
significant_increase_in_residues = 5 Significant increase in residues in model
password = None Phenix download password (Colab only). The password used to download Phenix at your institution. Updated weekly, so you may need to request a new one frequently.
version = dev-4502 Version of Phenix to run (Colab only)
query_sequence = None Sequence
resolution = None Resolution of map (A). Internal use only. Normally set crystal_info.resolution instead.
jobname = None Name of this job. The first characters before any underscore must be unique and will define the first characters of the corresponding map file in the input directory. The job name will normally also be the name of the working directory. Used internally only
use_msa = True Use multiple sequence alignments at some point
skip_all_msa_after_first_cycle = False Skip multiple sequence alignments after first cycle
include_templates_from_pdb = True Include templates from PDB. Note special behavior when running predict_and_build: applies only on first cycle, and if PhenixServer or Colab is used, prediction will be run with and without templates from the PDB and the model with the highest plddt will be kept
maximum_templates_from_pdb = 20 Maximum templates from PDB to include
release_date = None Release date for templates from PDB (only use templates released up to this date). Format is 2020-05-14.
upload_msa_file = False Supply MSA directly (.a3m format). You can supply the MSA as a file in your input_directory and it will be used instead of using mmseqs2 to generate an MSA. Your file name must end in .a3m. The format is two lines per sequence, the first starts with a greater-than sign and is ignored, the second is a sequence of letters or minus signs. All sequence lines must have the same length. The first sequence must be the target.
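The MSA layout described for upload_msa_file can be checked before a run with a few lines of code. The sketch below tests only the rules stated above (header and sequence lines alternate, sequence lines contain letters or minus signs, all have the same length, and the first sequence is the target). It is not the parser used by Phenix, the function names are hypothetical, and it does not handle comment lines or other variations that some .a3m files contain.

```python
def read_simple_a3m(text):
    """Return the sequence lines of an MSA in the two-lines-per-entry layout."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or len(lines) % 2 != 0:
        raise ValueError("Expected pairs of header and sequence lines")
    sequences = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError("Header line must start with '>': %s" % header)
        if not all(c.isalpha() or c == "-" for c in sequence):
            raise ValueError("Unexpected character in: %s" % sequence)
        sequences.append(sequence)
    if len(set(len(s) for s in sequences)) != 1:
        raise ValueError("All sequence lines must have the same length")
    return sequences


def check_msa(text, target_sequence):
    """Raise ValueError if the MSA breaks the rules; return the entry count."""
    sequences = read_simple_a3m(text)
    if sequences[0] != target_sequence:
        raise ValueError("First sequence in the MSA must be the target")
    return len(sequences)


if __name__ == "__main__":
    msa_text = ">target\nMKTAYIAK\n>hit1\nMKTA-IAK\n>hit2\n--TAYIAR\n"
    print(check_msa(msa_text, "MKTAYIAK"))
```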
upload_manual_templates = False Supply templates for AlphaFold prediction. Used in the same way as templates from the PDB unless uploaded_templates_are_map_to_model is set. May be .cif or .pdb files. Template file names must start with characters matching the first characters in the jobname (before the first underscore); the remainder of the file name can be anything but must end in .cif or .pdb. The files must be in the input_directory or be uploaded (in Colab only).
uploaded_templates_are_map_to_model = False The manual templates are models that may or may not have the sequence of the alphafold models. These are used only as suggestions for placement of the main chain.
upload_maps = True Use maps (required to be True)
random_seed = 7231771 Random seed. Used internally
random_seed_iterations = 1 Random seed iterations of AlphaFold in first cycle. The model with the highest plDDT will be used. If running predict_and_build and include_templates_from_pdb is True, this many iterations will be carried out with and without templates. In predict_and_build and predict_chain set number_of_models instead.
minimum_random_seed_iterations = 1 Random seed iterations of AlphaFold after first cycle. The model with the highest plDDT will be used
big_improvement = 10 How much improvement in plDDT is worth going through all randomization cycles
good_enough_plddt = 80 Value of plDDT that is good enough to not make any more models
nproc = 4 Number of processors to use. Internal use only. Normally set control.nproc instead
debug = False Debugging run (print traceback error messages)
get_msa_only = False Just get MSA and save it on server, no prediction. Does not return the MSA
carry_on = False Carry on from where previous run ended. Used (usually in Colab) to go on after a crash or timeout. Requires that files are saved in the output directory
cif_dir = None Location of templates (normally set automatically)
template_hit_list = None List of templates (normally set automatically)
jobnames = None List of jobnames (normally set automatically)
resolutions = None List of resolutions (normally set automatically)
include_templates_from_pdb_list = None List of include_templates_from_pdb (normally set automatically)
include_side_in_templates_list = None List of include_side_in_templates_list (normally set automatically)
working_directory = None Working directory (used internally only)
maps_uploaded = None List of maps (used internally only)
msas_uploaded = None List of msas (used internally only)
num_models = 1 Number of models (used internally only)
homooligomer = 1 Number of copies (used internally only)
cycle = None Cycle number (internal use only)
host_url = https://api.colabfold.com Host url for mmseqs
use_env = None Use env (internal use only)
use_custom_msa = None Use custom msa (internal use only)
use_templates = None Use templates (internal use only)
template_paths = None Template paths (internal use only)
mtm_file_name = None map_to_model file name (internal use only)
cycle_model_file_name = None cycle_model_file_name (internal use only)
previous_final_model_name = None previous_final_model_name (internal use only)
msa = None MSA (internal use only)
msa_is_msa_object = None MSA info (internal use only)
deletion_matrix = None Deletion matrix (internal use only)
structure_search_params
structure_search
pdb_file = None Enter a PDB file name
sequence = None Optional Fasta sequence file. Only needed for a quick sequence search against RCSB without a PDB.
output_prefix = 'output' Provide an output prefix if needed
blastpath = None Enter path to blastall executable
sequence_only = False Do a Blast search against PDBaa sequence instead of
doing a Ramachandran-based structure search
structure_only = False Do only a Ramachandran-based structure search.
db_used = 'rcsb' structure database used in search. rcsb, scop95, or AF2
db = 'rcsb' Database used in search. rcsb, scop95, or AF2.
get_ligand = False Use get_ligand=True to retrieve ligands.
get_ramacode_only = False Generate Rama code for input pdb/cif only.
This is for developers only.
get_xml_only = False Get BLAST XML output returned as a string object.
No coordinate superposition will be performed. Developers only.
use_pdb100aa = False Use PDB100 sequence database for sequence search.
use_custom_db = False Use custom database specified by custom_db_files/custom_db_dir.
custom_db_dir = None The directory of pdb/cif files to make custom database.
Default is current directory
custom_db_files = None Filenames of the pdb/cif files separated by spaces for database.
If none specified, all pdb/cif in the custom_db_dir will be collected
atom_selection = 'all' Choose part of the pdb used in the search (default=all).
for example: chain B, resseq 113:219, ... etc.
get_pdb = 10 get_pdb=N will collect and superpose the top N
homologous pdbs. Use get_pdb=0 to disable this option.
deposited_before = 0 Specify the latest year of matching structures to be considered
for scoring. Pdbs deposited after this year will be discarded.
deposited_after = 0 Specify the earliest year of matching structures to be considered
for scoring. Pdbs deposited before this year will be discarded.
batch_size = 0 Process the pdbs in batch of <batch_size> until <min_match>
hits are identified or until all <get_pdb> pdbs are processed
min_match = 0 Finish structure_search when <min_match> matches are found.
Usually used with <trim_ends> to exit the search once suitable pdbs are found.
keep_all_pdb = False Keep all the PDB files, including full PDB, PDB_Chain and
superposed PDB_Chain. Default is False which will keep only superposed
PDB_Chain files in the directory specified in the output message.
trim_ends = False Remove terminal residues of hit pdbs extending beyond those
of the target pdb.
write_pdb = True Set to False if no output pdb file is needed. Sometimes useful
when calling Structure_Search within another program and only passing pdb
objects.
write_results = True Set to False if no output results/log files are needed. Useful
when calling Structure_Search within another program and only passing pdb
objects.
trim_hit_pdb = False Remove extra domains, extended loops, and unfit portions
of hit pdbs after superposed to the target pdb.
pickle_hits = False Pickle blast hit results from xml output.
coot_display = False (default) Display output pdb files in coot.
ask_coot = True Prompt for coot display options
PDB_MIRRORDIR = None Enter the top directory of the local RCSB PDB mirror. The program
will try to retrieve PDBs and/or structure factors from this mirror first.
Note this assumes the directory tree under it follows that of RCSB --
pdb files as 'pdb####.ent.gz' in PDB_MIRRORDIR/data/structures/divided/pdb directory.
If you use PDB's rsync script, this variable would be the same as the $MIRRORDIR set
in the script.
PDB_MIRROR_MMCIF = None Enter the parent directory of the mmcif files in the local PDB mirror.
MMCIFs will be retrieved from subdirectory ## where ## are the second and third letters
in the PDB id. This keyword should be $PDB_MIRRORDIR/data/structures/divided/mmcif directory.
PDB_MIRROR_PDB = None Enter the parent directory of the PDB files in the local PDB mirror.
PDBs will be retrieved from subdirectory ## where ## are the second and third letters
in the PDB id. This keyword should be $PDB_MIRRORDIR/data/structures/divided/pdb directory.
We recommend setting PDB_MIRRORDIR, which will take care of PDB_MIRROR_PDB and
the other mirror keywords together. However, users may choose to specify
PDB_MIRROR_PDB directly.
PDB_MIRROR_STRUCTURE_FACTORS = None Enter the parent directory of the structure factor files in the local PDB mirror.
Structure factors will be retrieved from subdirectory ## where ## are the second
and third letters in the PDB id. This keyword should be the same as the
$PDB_MIRRORDIR/data/structures/divided/structure_factors directory.
We recommend setting PDB_MIRRORDIR and it will take care of both PDB_MIRROR_PDB and
PDB_MIRROR_STRUCTURE_FACTORS together. However, users may choose to specify
PDB_MIRROR_STRUCTURE_FACTORS directly.
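The mirror layout described for the PDB files (file name 'pdb####.ent.gz', stored in a subdirectory named after the second and third characters of the PDB id) can be illustrated with a short sketch. The helper name mirror_pdb_path and the example mirror location /data/pdb_mirror are hypothetical, and lowercasing the id is an assumption. The sketch only builds the expected path and does not check that the file exists.

```python
from pathlib import Path


def mirror_pdb_path(pdb_mirrordir: str, pdb_id: str) -> Path:
    """Hypothetical helper: expected location of a PDB entry in a local mirror."""
    pdb_id = pdb_id.lower()  # assumption: mirror file names are lowercase
    if len(pdb_id) != 4:
        raise ValueError("expected a 4-character PDB id")
    # The subdirectory is the second and third characters of the id.
    subdir = pdb_id[1:3]
    return (Path(pdb_mirrordir) / "data" / "structures" / "divided" / "pdb"
            / subdir / f"pdb{pdb_id}.ent.gz")


path = mirror_pdb_path("/data/pdb_mirror", "1ABC")
print(path.as_posix())
# /data/pdb_mirror/data/structures/divided/pdb/ab/pdb1abc.ent.gz
```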
local_pdb_dir = None Enter the path directly to your local PDB repository.
verbose = False verbose output
debug = False debugging output
job_title = None Job title in PHENIX GUI, not used on command line
gui GUI-specific parameter required for output directory
output_dir = None
build
rebuild_strategy = Thorough Standard Quick Rebuilding strategy. Standard is up to 3 cycles, refining and replacing poorly-fitting loops in predicted models each cycle, using autobuild for density modification (if X-ray). Quick is refinement only, one cycle. Thorough is up to 10 cycles, extensive rebuilding.
refine_only = None Refine only, no rebuilding steps (set automatically with Quick)
refine_only_resolution = 3.5 Default cutoff for using Refine only
run_fit_loops = True Run standard fit_loops
run_iterative_morph = True Run iterative morphing
run_trace_loops_through_density = True Run loop fitting with trace_through_density algorithm
run_refine = True Refine morphed model. Required for run_fit_loops or run_trace_loops_through_density
run_iterative_resolution_refine = True Refine morphed model with iterative resolution method
run_extend = True Extend ends of morphed model
extract_unique = True Extract unique part of map. Applies after density_select (if set).
use_symmetry_in_extract_unique = True Use symmetry from symmetry file (if available) when extracting unique part of map.
acceptable_docking_cc = 0.5 Acceptable docking CC for a chain
minimum_docking_cc = 0.15 Minimum docking CC for a chain
minimum_cutoff = 0.1 Minimum cutoff for estimating density in better parts of model
reasonable_cc_ratio = 0.80 Acceptable CC (ratio) for a chain relative to average. Used to make sure a chain entirely in very bad density is not kept
reasonable_cc_diff = 0.15 Acceptable CC (difference) for a chain relative to average. Used to make sure a chain entirely in very bad density is not kept
shift_field_distance = None Shift field characteristic distance (default = 10 A) for morphing
cc_sd_ratio = 3. Keep residues with CC at least within cc_sd_ratio of the mean CC for good residues
cc_sd_ratio_end = 2. Keep residues with CC at least within cc_sd_ratio_end of the mean CC for good residues (applies to end of fragments)
cc_sd_ratio_ok = 2. Residues with CC at least within cc_sd_ratio_ok of the mean CC for good residues are ok (do not delete on the basis of plddt)
max_gap_ratio = 3. Allow CA-CA distance to be up to max_gap_ratio times expected
maximum_connectivity_deviation = 15 Maximum connectivity deviation. Reject a solution with bigger deviation than this plus 2 * resolution.
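As a worked example of the rejection rule for maximum_connectivity_deviation, the limit is the parameter value plus twice the resolution. The sketch below is illustrative only: the function names are hypothetical, and the units (taken here to be Angstroms, like the resolution) and the use of "greater than" as the rejection test are assumptions.

```python
def connectivity_limit(resolution: float,
                       maximum_connectivity_deviation: float = 15.0) -> float:
    """Largest deviation tolerated: parameter value plus 2 * resolution."""
    return maximum_connectivity_deviation + 2.0 * resolution


def accept_solution(deviation: float, resolution: float,
                    maximum_connectivity_deviation: float = 15.0) -> bool:
    """Hypothetical check: reject a solution whose deviation exceeds the limit."""
    limit = connectivity_limit(resolution, maximum_connectivity_deviation)
    return deviation <= limit


# With the default of 15 and a 3.5 A map, the limit is 15 + 2 * 3.5 = 22.
print(connectivity_limit(3.5))
print(accept_solution(20.0, 3.5), accept_solution(25.0, 3.5))
```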
keep_fraction_of_best = 0.5 Acceptable CC to keep as ratio to best found. Applies if best found is at least acceptable_docking_cc.
keep_maximum_entries = 10 Maximum dock positions to keep
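One way to read keep_fraction_of_best together with acceptable_docking_cc and keep_maximum_entries is sketched below. This is not the docking code itself: the function select_placements and the (name, CC) tuples are hypothetical, and skipping the fraction filter when the best CC is below acceptable_docking_cc is an assumption, since the text only states when the filter applies.

```python
from typing import List, Tuple

Placement = Tuple[str, float]  # (label, map-model CC), hypothetical representation


def select_placements(
    placements: List[Placement],
    acceptable_docking_cc: float = 0.5,
    keep_fraction_of_best: float = 0.5,
    keep_maximum_entries: int = 10,
) -> List[Placement]:
    """Hypothetical helper: rank dock positions by CC and keep the better ones."""
    ranked = sorted(placements, key=lambda p: p[1], reverse=True)
    if not ranked:
        return []
    best_cc = ranked[0][1]
    # The fraction-of-best filter applies only if the best CC is acceptable.
    if best_cc >= acceptable_docking_cc:
        cutoff = keep_fraction_of_best * best_cc
        ranked = [p for p in ranked if p[1] >= cutoff]
    return ranked[:keep_maximum_entries]


docked = [("A", 0.62), ("B", 0.40), ("C", 0.25), ("D", 0.33)]
kept = select_placements(docked)
print(kept)
```

Here the best CC is 0.62, so the cutoff is 0.31 and placement C (CC 0.25) is dropped while A, B and D are kept in order of decreasing CC.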
rmsd_for_similar_placement = None RMSD value indicating that two placements are similar. default is resolution of map
rigid_body_refine_cycles = 1 Refinement cycles for rigid-body refinement
overlap_ca_ca_distance = 3 Overlap distance for CA-CA atoms (or P-P)
maximum_combinations = 100000 Maximum combinations of placements to consider
proceed_with_any_symmetry = False Run even with symmetry that is not point-group or helical. Not recommended as symmetry may not work properly
box_cushion = 20 Size of buffer around model when boxing
refine_cycles = 3 Refinement cycles
loop_refine_cycles = 5 Refinement cycles for loops
loop_backup_residues = 3 Number of tries removing one residue at a time from each end of existing ends of loop if no loop is found with initial gap
residues_to_trim = 5 Residues to trim on each end of all trimmed fragments
find_ncs_from_model = True Find NCS (symmetry) from working models and apply in density modification. Also turns on finding NCS in any autobuild density modification (including density-based search)
allow_split_more_than_one_chain_for_mr = False Allow splitting chains into domains for mr (X-ray only) even if there are multiple chains. Normally only split if a single chain as reassembly may not work with split multiple chains.
acceptable_cc_ratio = 0.8 Keep segments with CC at least equal to average of base segments times minimum_cc_ratio minus difference between segment confidence (plDDT) and confidence cutoff (typically 0.7)
low_res_if_multiple_solutions = 3.5 Try phaser MR at increasingly lower resolutions up to this value if multiple solutions are found
delta_low_res = 0.5 Resolution increment for low_res_if_multiple_solutions
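The resolution limits implied by low_res_if_multiple_solutions and delta_low_res can be listed with a small sketch. The function name mr_resolution_series is hypothetical, and the starting point (the data resolution plus one increment) is an assumption; the actual sequence tried by predict_and_build may differ.

```python
from typing import List


def mr_resolution_series(
    data_resolution: float,
    low_res_if_multiple_solutions: float = 3.5,
    delta_low_res: float = 0.5,
) -> List[float]:
    """Hypothetical helper: resolution cutoffs to try, from high to low resolution."""
    series: List[float] = []
    # Assumption: the first retry is one increment beyond the data resolution.
    resolution = data_resolution + delta_low_res
    # Small tolerance guards against floating-point drift in the comparison.
    while resolution <= low_res_if_multiple_solutions + 1e-6:
        series.append(round(resolution, 2))
        resolution += delta_low_res
    return series


print(mr_resolution_series(2.0))  # [2.5, 3.0, 3.5]
print(mr_resolution_series(3.2))  # [] (the first step, 3.7, already exceeds 3.5)
```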
control
stop_after_dock = None Stop after docking step
stop_after_morph = False Stop after morphing step
read_files = False Read existing output files and use them if present
write_files = True Write output files
nproc = 1 Number of processors to use on your local machine
ignore_symmetry_conflicts = True You can ignore the symmetry information (CRYST1) from coordinate files. This may be necessary if your model has been placed in a box with box_map for example.
random_seed = 171731 Random seed
max_dirs = 1000 Maximum number of directories (dock_and_build_xxx)
verbose = False Verbose output
quick = True Run quickly
gui GUI-specific parameter required for output directory