Automated ligand identification

Contents

Author(s)
Purpose
Purpose of the phenix.ligand_identification command
Usage
How the phenix.ligand_identification works:
How to run the phenix.ligand_identification
What the phenix.ligand_identification command needs to run:
Choice of ligand library (keywords+examples)
Output files from phenix.ligand_identification command
Multiprocessing
Running from a parameters file
Examples
Sample command_line inputs
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all available keywords

Author(s)

phenix.ligand_identification: Li-Wei Hung
PHENIX GUI and PDS Server: Nigel W. Moriarty, Nathaniel Echols
RESOLVE: Tom Terwilliger

Purpose

Purpose of the phenix.ligand_identification command

The phenix.ligand_identification command fits a library of the 180 most frequently observed ligands in the PDB to a given electron density map, then analyzes and ranks the fitting results.

By default, the phenix.ligand_identification command works with the ligand library provided with Phenix. It can also take a custom ligand library provided by the user.

Usage

The phenix.ligand_identification task can be run from the command line or from the PHENIX GUI.

How the phenix.ligand_identification works:

The phenix.ligand_identification command uses the RESOLVE ligand fitting methods described in the LigandFit documentation. It carries out this fitting process for each of the 180 most frequently observed ligands in the Protein Data Bank, or for each ligand in a custom library as described above, and then scores and ranks the overall fitting results. By default, a real-space refinement is carried out on each fitted ligand between RESOLVE fitting and Phenix scoring. The scoring algorithm takes into account the correlation between the ligand and the density as well as the non-bonded interactions between the fitted ligand and the input model. The output consists of a list of the best-fitted ligands from the library. The command can also display the top-ranked ligand in Coot, with or without the electron density (use the keyword "--open_in_coot=True").
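The fit, score, and rank sequence described above can be pictured with a small Python sketch. It does not call RESOLVE or Phenix. The `FitResult` record, the `combined_score` formula, the penalty weight, and the toy numbers are all hypothetical stand-ins, chosen only to show how a density correlation and a non-bonded term might be folded into a single ranking. The scoring function actually used by phenix.ligand_identification is not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    """Hypothetical record for one fitted ligand (not a Phenix class)."""
    code: str             # 3-letter ligand code
    cc: float             # ligand-to-map correlation coefficient
    clash_penalty: float  # stand-in for the non-bonded term, >= 0


def combined_score(fit, clash_weight=0.1):
    # Assumed form: reward density correlation, penalize bad contacts.
    # The weight 0.1 is arbitrary and for illustration only.
    return fit.cc - clash_weight * fit.clash_penalty


def rank_fits(fits, top_n=3):
    """Return the top_n fits, best combined score first."""
    return sorted(fits, key=combined_score, reverse=True)[:top_n]


# Toy values, not results from any real run.
fits = [
    FitResult("ATP", cc=0.82, clash_penalty=0.5),
    FitResult("ADP", cc=0.80, clash_penalty=0.0),
    FitResult("GOL", cc=0.55, clash_penalty=0.0),
]
for fit in rank_fits(fits):
    print(fit.code, round(combined_score(fit), 3))
```

In this toy case the ligand with the slightly lower correlation ranks first because it has no clash penalty, which is the kind of trade-off a combined score is meant to capture.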

How to run the phenix.ligand_identification

Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand. This program accepts all LigandFit keywords in addition to the built-in keywords listed at the end of this document.

What the phenix.ligand_identification command needs to run:

The phenix.ligand_identification command needs:

an MTZ file containing structure factors
(optional) a PDB file with your protein model, without the ligand
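Before launching a long run it can help to confirm that these inputs exist. The Python sketch below is only an illustration: `missing_inputs` and `build_command` are hypothetical helper names, not part of Phenix, and the argument spelling (mtz_in=, input_labels=, model=, nproc=) is copied from the examples later in this document.

```python
import os


def missing_inputs(mtz_in, model=None, exists=os.path.isfile):
    """Return the input files (required MTZ, optional model) that are absent."""
    paths = [mtz_in] + ([model] if model is not None else [])
    return [path for path in paths if not exists(path)]


def build_command(mtz_in, input_labels, model=None, nproc=1):
    """Assemble the argument list for a basic run; the model is optional."""
    cmd = ["phenix.ligand_identification",
           f"mtz_in={mtz_in}",
           f"input_labels={input_labels}"]
    if model is not None:
        cmd.append(f"model={model}")
    cmd.append(f"nproc={nproc}")
    return cmd


problems = missing_inputs("nsf-d2.mtz", "nsf-d2_noligand.pdb")
if problems:
    print("missing input files:", ", ".join(problems))
else:
    print(" ".join(build_command("nsf-d2.mtz", "F", "nsf-d2_noligand.pdb")))
```

The returned list can be handed to `subprocess.run` unchanged, in which case no shell quoting of the values is needed.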

Choice of ligand library (keywords+examples)

*default ligand library: no keyword needed

*to specify a list of ligands in 3-letter codes:

--ligand_list="ATP CTP TTP GTP"

*to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:

--ligand_dir=/my/compound/library/A1217321

*to use all ligands found in sequence and structural homologs of the input model as the search library. (Note: the ligand library is compiled entirely internally. No query is sent through the network.):

--use_pdb_ligand=True

*to use all ligands associated with proteins of specific function found in the PDB as the search library:

--function="tyrosine kinase"

*to use all ligands found in proteins with a specific Enzyme Classification number in the PDB:

--EC=1.1.1.3

*to use all ligands found in proteins with a specific Pfam accession number in the PDB:

--pfam=PF00042

*to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:

--scop='DNA/RNA-binding 3-helical bundle'

--cath='Trypsin-like serine proteases'

*to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:

--GO=0009253

*to use all ligands found in proteins with a specific InterPro ID in the PDB:

--ipro=014838

*to filter ligand sizes (# of non-H atoms) in ligand library to be used in ligand search:

--natom_min=30 --natom_max=80   (use either or both)
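When runs are scripted, the library choices in this section can be collected in one place and turned into command-line arguments. The following Python sketch assumes the plain key=value spelling used in the Examples section; `library_args` is a hypothetical helper, and it passes any extra terms (function, EC, pfam and so on) through without checking that Phenix accepts them.

```python
def library_args(ligand_list=None, ligand_dir=None, use_pdb_ligand=False,
                 natom_min=None, natom_max=None, **terms):
    """Translate ligand library choices into key=value arguments.

    Extra keyword arguments (for example function=..., EC=..., pfam=...)
    are passed through unchanged; nothing is validated here.
    """
    args = []
    if ligand_list:
        args.append("ligand_list=" + " ".join(ligand_list))
    if ligand_dir:
        args.append(f"ligand_dir={ligand_dir}")
    if use_pdb_ligand:
        args.append("use_pdb_ligand=True")
    for key, value in terms.items():
        args.append(f"{key}={value}")
    if natom_min is not None:
        args.append(f"natom_min={natom_min}")
    if natom_max is not None:
        args.append(f"natom_max={natom_max}")
    return args


print(library_args(function="tyrosine kinase", natom_min=30, natom_max=80))
```

An empty call returns an empty list, which corresponds to the default ligand library.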

Output files from phenix.ligand_identification command

When you run the phenix.ligand_identification command, the output files are written to the directory where you started Phenix:

A summary file of the fitting results of all ligands:

overall_ligand_scores.log

A summary table listing the results of the top ranked ligands:

topligand.txt

The last column, "Sequence in library", contains numbers (shown as '###' below) giving the sequence number of the corresponding ligand. The final fitted ligand coordinates and all the log files are in the corresponding '###' files described below.

PDB files with the fitted ligands:

ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb

Resolve fitting Score files:

ligand_fit_scores/[3-letter code].scores

Map coefficients for the map used for fitting:

resolve_map.mtz

Command file to display results in coot:

display.com (uses ligid.scm, also in the same directory)
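To collect the fitted models after a run, the file name template above can be parsed for the ligand code and the sequence number. This Python sketch relies only on the template RSR_FITTED_[3-letter code]_###.pdb given in this section. It assumes that the code is one to three alphanumeric characters and that ### consists of digits; neither detail is confirmed here, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Pattern derived from the template RSR_FITTED_[3-letter code]_###.pdb.
# Assumption: code is 1-3 alphanumerics and ### is made of digits only.
FITTED_NAME = re.compile(r"^RSR_FITTED_([A-Za-z0-9]{1,3})_(\d+)\.pdb$")


def parse_fitted_name(filename):
    """Return (ligand_code, sequence_number), or None if the name differs."""
    match = FITTED_NAME.match(Path(filename).name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def list_fitted_ligands(run_dir="."):
    """Map sequence number -> (ligand code, path) for ligand_fit_pdbs/."""
    found = {}
    pdb_dir = Path(run_dir, "ligand_fit_pdbs")
    for path in sorted(pdb_dir.glob("RSR_FITTED_*.pdb")):
        parsed = parse_fitted_name(path.name)
        if parsed is not None:
            code, number = parsed
            found[number] = (code, path)
    return found


print(parse_fitted_name("ligand_fit_pdbs/RSR_FITTED_ATP_12.pdb"))
```

The sequence number recovered this way is the value to match against the "Sequence in library" column of topligand.txt.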

Multiprocessing

The phenix.ligand_identification program has built-in multiprocessing capability. Use the keyword

--nproc=[number of threads]

to run the command in multiprocessing mode. In general, the processing speed scales with the number of CPU cores used, up to the number of free cores the system can allocate at run time. For example, the nsf-d2 example takes about 25 minutes with nproc=8, compared with about 180 minutes as a single process, on a dual-Xeon W5580 machine.
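The timings quoted above translate into a speedup and a parallel efficiency as follows. The short Python calculation uses only the two run times and the process count stated in this section.

```python
def speedup_and_efficiency(serial_minutes, parallel_minutes, nproc):
    """Return (speedup, parallel efficiency) for a timed run."""
    speedup = serial_minutes / parallel_minutes
    return speedup, speedup / nproc


# nsf-d2 example: about 180 minutes single-process, about 25 minutes with 8.
speedup, efficiency = speedup_and_efficiency(180, 25, 8)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

That is a speedup of about 7.2 on 8 processes, or roughly 90% efficiency, so scaling is close to linear for this example.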

Running from a parameters file

You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a default one with:

phenix.ligand_identification --show_defaults > my_ligand.eff

and then you can just edit this file to match your needs and run it with:

phenix.ligand_identification  my_ligand.eff
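Values in a generated parameters file can also be set by script rather than by hand. The Python sketch below assumes that the .eff file holds one "name = value" line per keyword, as the keyword listing at the end of this document suggests. `set_param` is a hypothetical helper, not part of Phenix, and the sample text is an assumed excerpt rather than real program output. Only the first matching line is changed, because some names (for example nproc) occur in more than one parameter group.

```python
import re


def set_param(eff_text, key, value):
    """Return eff_text with the first '<indent>key = ...' line set to value."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$",
                         re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value),
                                   eff_text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in parameter text")
    return new_text


# Assumed excerpt of a parameters file, for illustration only.
sample = "ligand_identification {\n  mtz_in = None\n  nproc = 1\n}\n"
edited = set_param(set_param(sample, "mtz_in", "nsf-d2.mtz"), "nproc", 8)
print(edited)
```

Reading my_ligand.eff, applying `set_param`, and writing the text back gives a file that can be passed to phenix.ligand_identification as shown above.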

Examples

Sample command_line inputs

1. Standard run of ligand_identification (input protein model and data file, default ligand library, 8 CPUs)

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F nproc=8

2. Search for a ligand using a difference map or pre-calculated map coefficients from phenix.refine

If you refine a model with a command such as,

phenix.refine data.mtz partial.pdb

then you will end up with the refined model,

partial_refine_001.pdb

and a map coefficients file:

partial_refine_001_map_coeffs.mtz

You can then run ligand_identification using the 2Fo-Fc map calculated from these map coefficients:

phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz \
 input_labels="2FOFCWT PH2FOFCWT" model=partial_refine_001.pdb nproc=8

For the Fo-Fc map from the same file, use:

phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz \
 input_labels="FOFCWT PHFOFCWT" model=partial_refine_001.pdb nproc=8

In the above two cases, the "model" keyword is optional. If it is provided, non-bonded energy terms are used in scoring.
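The two map-coefficient commands above differ only in the column labels, so they can be generated from the name of the model given to phenix.refine. In the Python sketch below the output naming (partial.pdb giving partial_refine_001.pdb and partial_refine_001_map_coeffs.mtz) is taken from the single example above. Generalizing it to other prefixes and serial numbers is an assumption, and the helper names are hypothetical.

```python
from pathlib import Path

# Column labels quoted in the two commands above.
MAP_LABELS = {
    "2fofc": "2FOFCWT PH2FOFCWT",
    "fofc": "FOFCWT PHFOFCWT",
}


def refine_outputs(model_path, serial=1):
    """Guess phenix.refine output names from the input model name."""
    prefix = f"{Path(model_path).stem}_refine_{serial:03d}"
    return f"{prefix}.pdb", f"{prefix}_map_coeffs.mtz"


def diffmap_command(model_path, map_kind="2fofc", nproc=8, use_model=True):
    """Build the argument list for a run on pre-calculated map coefficients."""
    model, map_coeffs = refine_outputs(model_path)
    cmd = ["phenix.ligand_identification",
           "mtz_type=diffmap",
           f"mtz_in={map_coeffs}",
           f"input_labels={MAP_LABELS[map_kind]}",
           f"nproc={nproc}"]
    if use_model:  # optional; when given, non-bonded terms enter the scoring
        cmd.append(f"model={model}")
    return cmd


print(diffmap_command("partial.pdb", map_kind="fofc"))
```

Because the result is an argument list, the label pair needs no extra quoting when it is passed to `subprocess.run`.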

The examples below show various ways of specifying custom ligand libraries based on the 'standard run' example above.

3. Identify ligand from a series of ligands in 3-letter codes

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_list="ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP"

4. Identify ligand from a given set of pdb files (could be your compound library) in a specific directory

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_dir=/my/compound/library/A1217321

This command takes all .pdb files in ligand_dir and makes a custom library to carry out the search.

5. Identify ligand from a library of ligands found in homologous structures (sequence or structural homologs) of the input PDB file.

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F use_pdb_ligand=True

This command compiles a library of ligands found in PDB entries homologous to the 'model=' PDB file, using Phenix's internal library. You can combine this keyword with other functional keywords (function, EC, pfam, ipro, etc.) and the program will compile a non-redundant combined ligand library. You may further filter the library to limit the sizes of the ligands used in the search (see below). All of this can be done in the GUI as well.

6. Identify ligand from a library composed of all ligands found in Tyrosine kinases in the PDB, and number of non-H atoms between 20 and 40

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F function="Tyrosine kinase" natom_min=20 natom_max=40

This command can be useful when you want to use a function-specific ligand library. The 'function' term should match one of the EC terms. Note that in the above example, if only "kinase" is specified, ligands found in all types of kinases will be searched.
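The size filter counts non-hydrogen atoms. To estimate in advance which of your own ligand files would pass a given natom_min and natom_max, a count along the following lines may help. This Python sketch reads the element from columns 77-78 of HETATM records (standard PDB layout) and falls back to the first letter of the atom name when that column is blank. It is an approximation written for illustration, and it is not known whether Phenix counts atoms in exactly this way.

```python
def count_non_h_atoms(pdb_lines):
    """Count HETATM records whose element is not hydrogen (or deuterium)."""
    count = 0
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        element = line[76:78].strip().upper()
        if not element:
            # Rough fallback: first letter of the atom name (columns 13-16).
            # This misreads two-letter elements that start with H, such as Hg.
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element not in ("H", "D"):
            count += 1
    return count


def within_size(pdb_lines, natom_min=4, natom_max=999):
    """Apply the natom_min/natom_max window (defaults from the keyword list)."""
    return natom_min <= count_non_h_atoms(pdb_lines) <= natom_max


# Three made-up records: two heavy atoms and one hydrogen.
demo = [
    "HETATM    1  C1  GOL A   1".ljust(76) + " C",
    "HETATM    2  O1  GOL A   1".ljust(76) + " O",
    "HETATM    3  H11 GOL A   1".ljust(76) + " H",
]
print(count_non_h_atoms(demo), within_size(demo, natom_min=2, natom_max=5))
```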

7. Identify ligand from a library composed of all ligands found in proteins belonging to a specific Enzyme Classification

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F EC=1.1.3

This command can be useful when you want a function-specific ligand library and your protein belongs to an EC class. Note that the EC number can be any parent of the actual EC number (e.g. you can use 1.1.3 instead of 1.1.3.1, although EC=1.1.3 gives a broader library).
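The parent relationship between EC numbers can be made concrete with a field-by-field comparison. The Python sketch below illustrates only the idea that 1.1.3 covers 1.1.3.1; `ec_matches` is a hypothetical helper and does not reproduce how Phenix looks up its internal library.

```python
def ec_matches(query, ec_number):
    """True if ec_number equals query or lies below it in the EC hierarchy."""
    query_parts = query.strip(".").split(".")
    ec_parts = ec_number.strip(".").split(".")
    return ec_parts[:len(query_parts)] == query_parts


for ec in ("1.1.3.1", "1.1.30.1", "1.1.1.3"):
    print(ec, ec_matches("1.1.3", ec))
```

Comparing whole fields rather than string prefixes matters here: a plain prefix test would wrongly treat 1.1.30.1 as a member of class 1.1.3.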

The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand

Possible Problems

Specific limitations and problems:

When using a custom ligand library in PDB format, the ligand atoms in the user-provided PDB files should be in 'HETATM' records.
For other limitations related to ligand fitting, please refer to the documentation of the LigandFit wizard.
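Files in a custom library can be checked against the HETATM requirement before a run. The Python sketch below uses the fact that ATOM and HETATM record names both occupy columns 1-6 of a PDB line, so the record name can be swapped without shifting any other column. The helper names are hypothetical, and it is advisable to keep a copy of the original files.

```python
def atom_records(pdb_lines):
    """Return the ATOM records, which a custom library file should not use."""
    return [line for line in pdb_lines if line.startswith("ATOM  ")]


def to_hetatm(pdb_lines):
    """Rewrite ATOM records as HETATM; all other lines are left unchanged."""
    return ["HETATM" + line[6:] if line.startswith("ATOM  ") else line
            for line in pdb_lines]


# Made-up records for illustration.
lines = [
    "ATOM      1  C1  LIG A   1       0.000   0.000   0.000  1.00 20.00           C",
    "HETATM    2  O1  LIG A   1       1.200   0.000   0.000  1.00 20.00           O",
    "END",
]
fixed = to_hetatm(lines)
print(len(atom_records(lines)), len(atom_records(fixed)))
```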

Literature

Ligand identification using electron-density map correlations. T.C. Terwilliger, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 63, 101-7 (2007).

Automated ligand fitting by core-fragment fitting and extension into density. T.C. Terwilliger, H. Klei, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 62, 915-22 (2006).

Additional information

List of ligands in the PHENIX ligand_identification default library:

PG4
CRY
CYS
DIO
DOX
GOL
MO5
NBN
OXL
OXM
PUT
PYR
F3S
MO6
PEG
COA
DTT
FS4
HED
HEZ
LI1
MPD
SF4
SIN
TMN
TRS
URA
BEN
BEZ
MET
NIO
PGA
POP
ASP
DAO
FSO
FUC
GLU
LYS
PEP
PGE
PHB
PHQ
PLM
TAR
XYS
ADE
AKG
7HP
BGC
CAM
HC4
ORO
DKA
GAL
GLC
MAN
MES
ARG
PHE
CIT
FLC
MMA
MPO
MYR
NHE
OLA
PG4
AMG
FER
NAA
NAG
NDG
SPM
EPE
NGA
PLP
TRP
BTN
F6P
FTT
G6P
LDA
UPL
1PE
BH4
H4B
THM
1PG
P6G
U10
ADN
BOG
EST
FBP
GSH
GTT
NVP
RET
UMP
C5P
C8E
DHT
TMP
UFP
CMP
NCN
PRP
BGC
MAN
GTS
IMP
LAT
MAL
SUC
GLC
AMP
FPP
PQQ
T44
2GP
3GP
5GP
TYD
UDP
GTX
SAH
TDP
TPP
IMO
SAM
A3P
ADP
DCP
GDP
NAG
XYS
2PE
CTP
DAD
FOK
TTP
DTP
DGA
FMN
SAP
ACP
ANP
APC
ATP
FOL
GNP
GSP
GTP
MTX
MAN
CB3
MA4
UPG
UD1
SPO
HEC
HEM
NAG
NAD
NAI
ACR
GLC
NAP
NDP
DHE
BPH
ACO
BCL
FAD
CAA
GLC
AP5
BPB
B12

List of all available keywords

Note: To use LigandFit parameters in ligand_identification, prefix the keyword with its parameter group (the scope name under which the keyword is listed below, such as search_target or search_parameters). For example,

*to specify search center:

search_target.search_center="10 10 10"

*to specify number of copies of the ligand in the asymmetric unit:

search_parameters.number_of_ligands=2
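When several LigandFit parameters are set from a script, the group prefix can be added mechanically. The Python sketch below flattens a nested dictionary into group.keyword=value strings. `scoped_args` is a hypothetical helper, and the two entries shown are the ones from the examples above.

```python
def scoped_args(params, prefix=""):
    """Flatten nested dicts into 'group.keyword=value' argument strings."""
    args = []
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(scoped_args(value, name))
        else:
            args.append(f"{name}={value}")
    return args


settings = {
    "search_target": {"search_center": "10 10 10"},
    "search_parameters": {"number_of_ligands": 2},
}
print(scoped_args(settings))
```

The strings carry no shell quotes; those are needed only when the command is typed at a shell prompt, as in the examples above.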

ligand_identification
- mtz_in = None Enter an MTZ file name
- mtz_type = *F diffmap If the input is a precalculated difference map, use "diffmap" instead
- input_labels = None Provide a label of F if mtz_type is F; provide F, PHI if the MTZ file is a difference map
- ligand_list = None Enter ligands to be searched. Ligands should be in 3-letter codes separated by spaces. For example, ligand_list="ATP CTP APN FMN". If there is no input or it is uninterpretable, the default library will be used. See phenix.doc for the default ligand library
- ligand_dir = None Directory of your ligand library. Files in this directory with .pdb or .cif extensions will be used in ligand search. If None, default library will be used.
- work_dir = None Top level directory where jobs will be run. Default is the directory where phenix.ligand_identification is started.
- EC = None Provide an EC# of your protein. The program will screen all ligands found in PDBs within this enzyme class or subclass.
- function = None Provide function description of your protein. For example, 'kinase'. The program will then screen all ligands found in PDBs associated with your functional terms. Note: function should belong to partial EC terms.
- scop_fold = None Provide fold information of your protein in SCOP terms. For example, 'tim barrel'. The program will then screen all ligands found in PDBs with the particular folds in SCOP terms
- cath = None Provide structural information of protein in CATH terms. For example, 'Alanine racemase'. The program will then screen all ligands found in PDBs associated with this CATH term.
- pfam = None Provide PFam IDs for protein(s). For example 'PF01280'. The program will then screen all ligands found in PDBs of the PF01280 family.
- go = None Provide Gene Ontology(GO) information of the protein. For example, '01234'. The program will then screen all ligands found in PDBs associated with that GO number.
- ipro = None Provide Interpro ID of the protein. For example, '01740'. The program will then screen all ligands found in PDBs associated with the specified InterPro ID.
- use_pdb_ligand = False Use ligands found in homologous pdbs of the input model. The program will conduct a fast structure and sequence search within Phenix (no internet queries) to identify homologs of the input pdb. All ligands found in these homologous pdbs will be used in the ligand search.
- natom_min = 4 minimum number of non-H atoms in ligand.
- natom_max = 999 maximum number of non-H atoms in ligand.
- model = None Enter a PDB file containing the protein only.
- high_resolution = 2.5 specify the high resolution to use. default=2.5
- low_resolution = 1000 Low resolution
- restart_run = False Use (restart_run = True) to continue unfinished run
- partial_analyze = False Analyze results before jobs complete. This function is currently not available with command-line version.
- ncpu = 1 Number of CPUs to use.
- n_indiv_tries_min = 30 usually 0 to 10, but set up to 300 to try harder to find soln
- n_indiv_tries_max = 300 usually 0 to 10, but set up to 300 to try harder to find soln
- n_group_search = 4 usually 3, but set up to 10 to try harder to find soln
- search_dist = 10 usually 10 A; always at least 5. smaller speeds up search
- delta_phi_lig = 40 usually 40 degree increments. set lower to search more
- fit_phi_inc = 20 usually 10 increments. set lower to search more
- local_search = True Usually True; Use False to force complete search
- search_center = '' Search center (in A) for ligand search. Leave it empty or 0 0 0 to ignore.
- ligand_near_res = None Search ligand near this protein residue. Please enter a residue selection syntax, e.g. "chain A and resid 120". Leave it empty or None to ignore.
- nproc = 1 number of processors to use. This is used in resolve internal parallelization
- verbose = False verbose output
- debug = False debugging output
- use_ligandfit = True use ligandfit to run fitting
- search_mode = *default LigandFit Use default mode (1 trial, faster), or LigandFit mode (5 trials, takes longer, or 1 very short trial with 'quick=True') in ligand fitting
- temp_dir = Auto Optional temporary work directory
- output_dir = None Output directory where files are to be written
- dry_run = False Just read in and check parameter names
- number_of_ligands = None Total number of ligand sites. Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
- cc_min = 0.75 Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
- open_in_coot = False If true, Phenix will automatically start Coot after the run is complete.
- non_bonded = True If true, non_bonded terms will be applied to the scores if RSR is used and model exists.
- keep_all_files = False If true, all intermediate pdb and log files will be kept.
- cif_def_file_list = None You can supply cif files for real-space refinement. example: cif_def_file_list='/my/cif/file1.cif /my/cif/file2.cif'
- real_space_target_weight = 10 You can change the weight on the real-space term in real-space refinement on the ligand after fitting.
- job_title = None Job title in PHENIX GUI, not used on command line
- ligandfit
  - data = None Datafile. This can be any format if only FP is to be read in. If phases are to be read in then MTZ format is required. The Wizard will guess the column identification. If you want to specify it you can say input_labels="FP" , or input_labels="FP PHIB FOM".
  - ligand = None Three-letter code of ligand, or file containing information about the ligand (PDB or SMILES)
  - model = None PDB file with model for everything but the ligand
  - quick = False Set to True for running as quickly as possible.
  - crystal_info
    - unit_cell = None Enter cell parameter a b c alpha beta gamma
    - resolution = 0 High-resolution limit. Zero means keep everything. If map_in is specified, resolution must be given.
    - space_group = None Space Group symbol (i.e., C2221 or C 2 2 21)
  - file_info
    - file_or_file_list = *single_file file_with_list_of_files Choose if you want to input a single file with PDB or other information about the ligand or if you want to input a file containing a list of files with this information for a list of ligands
    - input_labels = None Labels for input data columns
    - lig_map_type = fo-fc_difference_map fobs_map *pre_calculated_map_coeffs Enter the type of map to use in ligand fitting fo-fc_difference_map: Fo-Fc difference map phased on partial model (requires FOBS in your input file) fobs_map: Fo map phased on partial model (requires FOBS in your input file) pre_calculated_map_coeffs: map calculated from FP PHIB [FOM] coefficients in input data file (or 2FOFCWT PH2FOFCWT coeffs) If you supply a map just leave this at pre_calculated_map_coeffs.
    - ligand_format = *PDB SMILES Enter whether the files contain SMILES strings or PDB formatted information
  - input_files
    - existing_ligand_file_list = None You can enter a list of PDB files with ligands you have already fit. These will be used to exclude that region from consideration.
    - ligand_start = None LigandFit will attempt to put your ligand superimposing on ligand_start if supplied. This must have some of the same atoms as your ligand, but does not have to have all of them.
    - ncs_in = None You can supply a file with NCS information for use with ligands_from_ncs
    - input_ligand_compare_file = None If you enter a PDB file with a ligand in it, the coordinates of the newly-built ligand will be compared with the coordinates in this file.
    - cif_def_file_list = None You can supply cif files for real-space refinement after fitting
    - refinement_file = None You can supply a file for full refinement containing F/I SIGF/SIGI FreeR_flag If you supply this file then after real-space refinement a round of full refinement will be carried out with phenix.refine
    - fobs_labels = None Labels for Fobs SigFobs or Iobs SigIobs for refinement_file... same format as for phenix.refine
    - r_free_label = None Label for FreeR_flag in refinement_file...same format as for phenix.refine
    - map_in = None Map file (alternative to data file). Can be .ccp4, .map, .mrc and will be converted to map coefficients before use.
  - search_parameters
    - fixed_ligand = False Use fixed ligand (no rotations of any bonds) if set
    - conformers = 1 Enter how many conformers to create. If greater than 1, then ELBOW will always be used to generate them. If 1 then ELBOW will be used if a PDB file is not specified. These conformers are used to identify allowed torsion angles for your ligand. The alternative is to use the empirical rules in RESOLVE. ELBOW takes longer but is more accurate.
    - group_search = 0 Enter the ID number of the group from the ligand to use to seed the search for conformations
    - ligand_cc_min = 0.75 Enter the minimum correlation coefficient of the ligand to the map to quit searching for more conformations
    - ligand_completeness_min = 1 Enter the minimum completeness of the ligand to the map to quit searching for more conformations
    - local_search = True If local_search is True then, only the region within search_dist of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
    - search_dist = 10 If local_search is True then, only the region within this distance of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
    - use_cc_local = False You can specify the use of a local correlation coefficient for scoring ligand fits to the map. If you do not do this, then the region over which the ligand is scored is all points within 2.5 A of the atoms in the ligand. If you do specify use_cc_local, then the region over which the ligand is scored is all these points, plus all the contiguous points that have density greater than 0.5 * sigma.
    - ligands_from_ncs = False You can try to use ncs (from your partial model file or from your ncs_in file) along with any ligands already found to place additional copies of your ligand. Only applicable if there is one type of ligand.
    - max_ligands_from_ncs = 1 You can specify how many of the ligands already found to consider using NCS (usually 1)
    - n_group_search = 3 Enter the number of different fragments of the ligand that will be looked for in FFT search of the map
    - n_indiv_tries_max = 10 If 0 is specified, all fragments are searched at once otherwise all are first searched at once then individually up to the number specified
    - n_indiv_tries_min = 5 If 0 is specified, all placements of a fragment are tested at once otherwise all are first tested at once then individually up to the number specified
    - number_of_ligands = 1 Number of copies of the ligand expected in the asymmetric unit
    - offsets_list = 7 53 29 You can specify an offset for the orientation of the templates in searching for ligands. This is used in generating diversity in models.
    - refine_ligand = True You can carry out real-space refinement on the ligand after fitting
    - ligand_occupancy = 1.0 You can set the initial occupancy of the ligand
    - real_space_target_weight = 10. You can change the weight on the real-space term in real-space refinement on the ligand after fitting
    - fitting Parameters for tracing ligand
      - delta_phi_ligand = 40 Specify the angle (degrees) between successive tries in FFT search for fragments
      - fit_phi_inc = 20 Specify the angle (degrees) between rotations around bonds
      - fit_phi_range = -180 180 Range of bond rotation angles to search
  - search_target
    - ligand_near_chain = None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. If you set ligand_near_chain="None" or leave it blank or do not set it, then all chains will be included. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
    - ligand_near_res = None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
    - ligand_near_pdb = None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand+ macromolecule and specify the ligand name with name_of_ligand_near_pdb.
    - name_of_ligand_near_pdb = None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand+ macromolecule and specify the ligand name with name_of_ligand_near_pdb.
    - search_center = 0.0 0.0 0.0 Enter coordinates for center of search region (ignored if [0.,0.,0.])
  - general
    - extend_try_list = True You can fill out the list of parallel jobs to match the number of jobs you want to run at one time, as specified with nbatch.
    - ligand_id = None You can specify an integer value for the ID of a ligand... This number will be added to whatever residue number the ligand search model in input_lig_file has. The keyword is only valid if a single copy of the ligand is to be found.
    - nbatch = 5 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.
    - nproc = 1 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch. If you set nproc=Auto and your machine has n processors, then it will use n-1 processors, or 1 if only 1 is available
    - resolve_command_list = None Commands for resolve. One per line in the form: keyword value (put in the quotes.) value can be optional Examples: coarse_grid resolution 200 2.0 hklin test.mtz . NOTE: for command-line usage you need to enclose the whole set of commands in double quotes and each individual command in single quotes.
    - coot_name = "coot" If your version of coot is called something else, then you can specify that here.
    - i_ran_seed = 72432 Random seed (positive integer) for model-building and simulated annealing refinement
    - raise_sorry = False You can have any failure end with a Sorry instead of simply printout to the screen
    - background = True When you specify nproc=nn, you can run the jobs in background (default if nproc is greater than 1) or foreground (default if nproc=1). If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
    - check_wait_time = 1.0 You can specify the length of time (seconds) to wait between checking for subprocesses to end
    - max_wait_time = 1.0 You can specify the length of time (seconds) to wait when looking for a file. If you have a cluster where jobs do not start right away you may need a longer time to wait. The symptom of too short a wait time is 'File not found'
    - wait_between_submit_time = 1.0 You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is File exists:....
    - cache_resolve_libs = True Use caching of resolve libraries to speed up resolve
    - resolve_size = 12 Size for solve/resolve ("", "_giant", "_huge", "_extra_huge", or a number where 12=giant and 18=huge)
    - check_run_command = False You can have the wizard check your run command at startup
    - run_command = "sh " When you specify nproc=nn, you can run the subprocesses as jobs in background with sh (default) or submit them to a queue with the command of your choice (i.e., qsub ). If you have a multi-processor machine, use sh. If you have a cluster, use qsub or the equivalent command for your system. NOTE: If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If nproc is greater than 1 and you use run_command='sh '(or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
    - queue_commands = None You can add any commands that need to be run for your queueing system. These are written before any other commands in the file that is submitted to your queueing system. For example on a PBS system you might say: queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS -j oe' queue_commands='#PBS -l walltime=03:00:00' queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put in the characters '<path>' in any queue_commands line and this will be replaced by a string of characters based on the path to the run directory. The first character and last two characters of each part of the path will be included, separated by '_',up to 15 characters. For example 'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1' would be represented by: 'tld_W_5_A1__TP0_1'
    - condor_universe = vanilla The universe for condor is usually vanilla. However you might need to set it to local for your cluster
    - add_double_quotes_in_condor = True You might need to turn on or off double quotes in condor job submission scripts. These are already default elsewhere but may interfere with condor paths.
    - condor = None Specifies if the group_run_command is submitting a job to a condor cluster. Set by default to True if group_run_command=condor_submit, otherwise False. For condor job submission mr_rosetta uses a customized script with condor commands. Also uses one_subprocess_level=True
    - last_process_is_local = True If true, run the last process in a group in background with sh as part of the job that is submitting jobs. This prevents having the job that is submitting jobs sit and wait for all the others while doing nothing
    - skip_r_factor = False You can skip R-factor calculation if refinement is not done and maps_only=True
    - test_flag_value = Auto Normally leave this at Auto (default). This parameter sets the value of the test set that is to be free. Normally phenix sets up test sets with values of 0 and 1 with 1 as the free set. The CCP4 convention is values of 0 through 19 with 0 as the free set. Either of these is recognized by default in Phenix. If you have any other convention (for example values of 0 to 19 and test set is 1) then you can specify this with test_flag_value.
    - skip_xtriage = False You can bypass xtriage if you want. This will prevent you from applying anisotropy corrections, however.
    - base_path = None You can specify the base path for files (default is current working directory)
    - temp_dir = None Define a temporary directory
    - local_temp_directory = None Write all temporary files to this directory and copy files specified by keep_files back to TEMP directory in working directory at the end of run. Used to speed up read/write operations. NOTE: Deletes all files except those specified by keep_files just like clean_up=True.
    - clean_up = None At the end of the entire run the TEMP directories will be removed if clean_up is True. Files listed in keep_files will not be deleted. If you want to remove files after your run is finished use a command like "phenix.autobuild run=1 clean_up=True"
    - print_citations = True Print citations at end of run
    - solution_output_pickle_file = None At end of run, write solutions to this file in output directory if defined
    - job_title = None Job title in PHENIX GUI, not used on command line
    - top_output_dir = None This is used in subprocess calls of wizards and to tell the Wizard where to look for the STOPWIZARD file.
    - wizard_directory_number = None This is used by the GUI to define the run number for Wizards. It is the same as desired_run_number. NOTE: this value can only be specified on the command line, as the directory number is set before parameters files are read.
    - verbose = False Command files and other verbose output will be printed
    - extra_verbose = False Facts and possible commands will be printed every cycle if True
    - debug = False You can have the wizard stop with error messages about the code if you use debug. Additionally the output goes to the terminal if you specify "debug=True"
    - require_nonzero = True Require non-zero values in data columns to consider reading in.
    - remove_path_word_list = None List of words identifying paths to remove from PATH. These can be used to shorten your PATH. For example, "cns ccp4 coot" would remove all paths containing these words except those also containing phenix. Capitalization is ignored.
    - fill = False Fill in all missing reflections to resolution res_fill. Applies to density modified maps. See also filled_2fofc_maps in autobuild.
    - res_fill = None Resolution for filling in missing data. The default is to fill to the highest resolution of any datafile. Only applies to density modified maps. Ignored if fill=False.
    - check_only = False Just read in and check initial parameters. Not for general use
    - keep_files = ligandfit*.pdb List of files that are not to be cleaned up. Wildcards are permitted.
  - display
    - number_of_solutions_to_display = None Number of solutions to put on screen and to write out
    - solution_to_display = 1 Solution number of the solution to display and write out (use 0 to let the wizard display the top solution).
  - run_control
    - ignore_blanks = None ignore_blanks allows you to have a command-line keyword with a blank value like "input_lig_file_list="
    - stop = None You can stop the current wizard with "stopwizard" or "stop". If you type "phenix.autobuild run=3 stop" then this will stop run 3 of autobuild.
    - display_facts = None Set display_facts to True and optionally run=[run-number] to display the facts for run run-number. If you just say display_facts then the facts for the highest-numbered existing run will be shown.
    - display_summary = None Set display_summary to True and optionally run=[run-number] to show the summary for run run-number. If you just say display_summary then the summary for the highest-numbered existing run will be shown.
    - carry_on = None Set carry_on to True to carry on with highest-numbered run from where you left off.
    - run = None Set run to n to continue with run n where you left off.
    - copy_run = None Set copy_run to n to copy run n to a new run and continue where you left off.
    - display_runs = None List all runs for this wizard.
    - delete_runs = None List runs to delete: 1 2 3-5 9:12
    - display_labels = None display_labels=test.mtz will list all the labels that identify data in test.mtz. You can use the label strings that are produced in AutoSol to identify which data to use from a datafile, like this (the entire string in quotes counts here): peak.data="F+ SIGF+ F- SIGF-". You can use the individual labels from these strings as identifiers for data columns in AutoSol and AutoBuild, like this (each individual label counts): input_refinement_labels="FP SIGFP FreeR_flags".
    - dry_run = False Just read in and check parameter names
    - params_only = False Just read in and return parameter defaults
    - display_all = False Just read in and display parameter defaults
    - coot = None Not presently applicable for ligandfit
  - special_keywords
    - write_run_directory_to_file = None Writes the full name of a run directory to the specified file. This can be used as a call-back to tell a script where the output is going to go.
  - non_user_parameters These are obsolete parameters and parameters that the wizards use to communicate among themselves. Not normally for general use.
    - gui_output_dir = None Used only by the GUI
    - sg = None Obsolete. Use space_group instead
    - get_lig_volume = False You can ask to get the volume of the ligand and to then stop
    - input_data_file = None Not normally used. Use "data=" instead
    - input_lig_file = None Not normally used. Use "ligand=" instead.
    - ligand_code = None Not normally used. Use "ligand=" instead.
    - input_partial_model_file = None Not normally used. Use "model=" instead
    - cif_already_generated = False You can specify that the ligand cif file is already generated
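
Two of the keywords above take small list-style values: delete_runs (for example "1 2 3-5 9:12") and remove_path_word_list (for example "cns ccp4 coot"). The following Python sketch illustrates how such values could be interpreted. It is not the Phenix implementation. The function names expand_run_spec and filter_path are hypothetical, and the reading of both "-" and ":" as inclusive ranges is an assumption based only on the example string in the keyword description.

```python
import os


def expand_run_spec(spec):
    """Expand a run list such as '1 2 3-5 9:12' into sorted run numbers.

    Hypothetical helper: assumes '-' and ':' both separate an inclusive
    start and end run number.
    """
    runs = set()
    for token in spec.split():
        for sep in ("-", ":"):
            if sep in token:
                start, end = token.split(sep, 1)
                runs.update(range(int(start), int(end) + 1))
                break
        else:
            # No range separator found: the token is a single run number.
            runs.add(int(token))
    return sorted(runs)


def filter_path(path, words, keep_word="phenix"):
    """Drop PATH entries containing any of the words, ignoring case,
    unless the entry also contains keep_word.

    Hypothetical helper mirroring the remove_path_word_list description.
    """
    lowered = [w.lower() for w in words]
    kept = []
    for entry in path.split(os.pathsep):
        low = entry.lower()
        if any(w in low for w in lowered) and keep_word not in low:
            continue
        kept.append(entry)
    return os.pathsep.join(kept)
```

Under these assumptions, expand_run_spec("1 2 3-5 9:12") yields runs 1 through 5 and 9 through 12, and filter_path applied with the words cns, ccp4 and coot keeps an entry such as /opt/phenix/coot/bin while dropping /opt/ccp4/bin.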