Automated ligand identification

Contents

Author(s)
Purpose
Purpose of the phenix.ligand_identification command
Usage
How the phenix.ligand_identification works:
How to run the phenix.ligand_identification
What the phenix.ligand_identification command needs to run:
Choice of ligand library (keywords+examples)
Output files from phenix.ligand_identification command
Multi-thread computing
Running from a parameters file
Examples
Sample command_line inputs
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all available keywords

Author(s)

phenix.ligand_identification: Li-Wei Hung
PHENIX GUI and PDS Server: Nigel W. Moriarty, Nathaniel Echols
RESOLVE: Tom Terwilliger

Purpose

Purpose of the phenix.ligand_identification command

The phenix.ligand_identification command fits a library of the 180 ligands most frequently observed in the PDB to a given electron density map. The program also analyzes and ranks the ligand-fitting results.

By default, the phenix.ligand_identification command uses the ligand library provided with Phenix. It can also use a custom ligand library supplied by the user.

Usage

The phenix.ligand_identification task can be run from the command line or from the PHENIX GUI.

How the phenix.ligand_identification works:

The phenix.ligand_identification command uses the RESOLVE ligand-fitting methods described in the LigandFit documentation. It carries out this fitting for every ligand in a library (by default the 180 ligands most frequently observed in the Protein Data Bank, or a custom library as described above), then scores and ranks the fitting results. By default, a real-space refinement of the fitted ligand is carried out between the RESOLVE fitting and the Phenix scoring. The scoring algorithm takes into account both the correlation between the fitted ligand and the electron density and, when a model is supplied, the non-bonded interactions between the fitted ligand and the input model. The output is a list of the best-fitting ligands from the library. The command can also open the top-ranked ligand in Coot, with or without the electron density (use the keyword "--open_in_coot=True").
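To illustrate the density-correlation term only, the sketch below computes a Pearson correlation coefficient between model density and map density sampled at the same grid points, and ranks ligands by it. It is a minimal sketch under stated assumptions: the names `density_correlation` and `rank_by_correlation` are hypothetical, the density samples are assumed to be plain lists of numbers, and this is not the actual Phenix or RESOLVE scoring code, which also includes the non-bonded terms.

```python
import math

def density_correlation(model_density, map_density):
    """Pearson correlation between two equal-length lists of density values
    sampled at the same grid points (hypothetical input format)."""
    n = len(model_density)
    if n != len(map_density) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_model = sum(model_density) / n
    mean_map = sum(map_density) / n
    cov = sum((a - mean_model) * (b - mean_map)
              for a, b in zip(model_density, map_density))
    var_model = sum((a - mean_model) ** 2 for a in model_density)
    var_map = sum((b - mean_map) ** 2 for b in map_density)
    if var_model == 0 or var_map == 0:
        return 0.0  # flat density: correlation undefined, treat as no agreement
    return cov / math.sqrt(var_model * var_map)

def rank_by_correlation(fits):
    """fits maps a ligand code to (model_density, map_density); best CC first."""
    return sorted(fits, key=lambda code: density_correlation(*fits[code]),
                  reverse=True)
```

A real score would also penalize clashes with the input model when one is given; how Phenix weights the two terms is not described here.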

How to run the phenix.ligand_identification

Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand

What the phenix.ligand_identification command needs to run:

The phenix.ligand_identification command needs:

an MTZ file containing structure factors
(optional) a PDB file with your protein model, without the ligand

Choice of ligand library (keywords+examples)

*default ligand library: no keyword needed

*to specify a list of ligands in 3-letter codes:

--ligand_list="ATP CTP TTP GTP"

*to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:

--ligand_dir=/my/compound/library/A1217321

*to use all ligands associated with proteins of specific function found in the PDB as the search library:

--function="tyrosine kinase"

*to use all ligands found in proteins with a specific Enzyme Classification number in the PDB:

--EC=1.1.1.3

*to use all ligands found in proteins with a specific Pfam accession number in the PDB:

--pfam=PF00042

*to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:

--scop='DNA/RNA-binding 3-helical bundle'

--cath='Trypsin-like serine proteases'

*to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:

--GO=0009253

*to use all ligands found in proteins with a specific InterPro ID in the PDB:

--ipro=014838

Output files from phenix.ligand_identification command

When you run the phenix.ligand_identification command, the output files are written to the directory in which you started the program:

A summary file of the fitting results of all ligands:

overall_ligand_scores.log

A summary table listing the results of the top ranked ligands:

topligand.txt

The last column, "Sequence in library", contains a number ('###') giving the sequence number of the corresponding ligand in the library. The final fitted ligand coordinates and all log files are in the corresponding '###' files described below.

PDB files with the fitted ligands:

ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb

RESOLVE fitting score files:

ligand_fit_scores/[3-letter code].scores

Map coefficients for the map used for fitting:

resolve_map.mtz

Command file to display results in coot:

display.com (uses ligid.scm, also in the same directory)
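To go from a row of topligand.txt to the matching coordinate file, the fitted-ligand files can be indexed by their sequence number. This sketch assumes only the naming pattern listed above (RSR_FITTED_[3-letter code]_###.pdb inside ligand_fit_pdbs); the helper names are hypothetical, and the exact number of digits in ### is not assumed.

```python
import re
from pathlib import Path

# Naming pattern from the output-file list: RSR_FITTED_[3-letter code]_###.pdb
FITTED_NAME = re.compile(r"^RSR_FITTED_([A-Za-z0-9]{1,3})_(\d+)\.pdb$")

def parse_fitted_name(filename):
    """Return (ligand_code, sequence_number) for a fitted-ligand file, else None."""
    match = FITTED_NAME.match(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def index_fitted_ligands(run_dir="."):
    """Map library sequence number -> (ligand_code, path) for one run directory."""
    index = {}
    for path in sorted(Path(run_dir, "ligand_fit_pdbs").glob("RSR_FITTED_*.pdb")):
        parsed = parse_fitted_name(path.name)
        if parsed is not None:
            code, seq = parsed
            index[seq] = (code, path)
    return index
```

With this index, the number in the "Sequence in library" column of topligand.txt can be used directly as a key to find the refined coordinates for that ligand.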

Multi-thread computing

The phenix.ligand_identification program has built-in multi-thread processing capability. Use the keyword

--nproc=[number of threads]

to run the command in multi-threaded mode. In general, the processing speed scales roughly in proportion to the number of threads used, up to the number of free cores the system can allocate at run time. For example, on a dual-Xeon W5580 machine the nsf-d2 example takes about 25 minutes to run with 8 threads, compared with about 180 minutes as a single process.
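The timings quoted above correspond to the following speedup and parallel efficiency. These figures apply only to this one example on that machine and should not be read as a general guarantee.

```latex
S_8 = \frac{T_1}{T_8} \approx \frac{180\ \text{min}}{25\ \text{min}} = 7.2,
\qquad
E_8 = \frac{S_8}{8} = \frac{7.2}{8} = 0.9
```

That is, the 8-thread run achieved roughly 90% of the ideal eight-fold speedup.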

Running from a parameters file

You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a default one with:

phenix.ligand_identification --show_defaults > my_ligand.eff

and then you can just edit this file to match your needs and run it with:

phenix.ligand_identification my_ligand.eff

Examples

Sample command_line inputs

Standard run of ligand_identification (input protein model and data file, default ligand library, 8 CPUs)

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F nproc=8

Searching for a ligand using a difference map or precalculated map coefficients from phenix.refine

If you refine a model with a command such as

phenix.refine data.mtz partial.pdb

then you will end up with the refined model,

partial_refine_001.pdb

and a map coefficients file:

partial_refine_001_map_coeffs.mtz

You can then run ligand_identification using the 2Fo-Fc map calculated from these map coefficients:

phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz \
 input_labels="2FOFCWT PH2FOFCWT" model=partial_refine_001.pdb nproc=8

For the Fo-Fc map from the same file, use:

phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz \
 input_labels="FOFCWT PHFOFCWT" model=partial_refine_001.pdb nproc=8

In both cases above, the "model" keyword is optional. If a model is provided, non-bonded terms are used in the scoring.
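When scripting several runs (for example, one per map type), building the argument list in code avoids the shell-quoting problems that a label pair such as "2FOFCWT PH2FOFCWT" can cause. This is a minimal sketch: `MAP_LABELS` and `ligand_id_command` are hypothetical helpers, only keywords shown in the examples above are used, and it assumes that keyword order on the command line does not matter.

```python
import shlex

# Label pairs from the phenix.refine map-coefficient examples above.
MAP_LABELS = {
    "2fofc": "2FOFCWT PH2FOFCWT",
    "fofc": "FOFCWT PHFOFCWT",
}

def ligand_id_command(map_coeffs, map_kind="2fofc", model=None, nproc=8):
    """Hypothetical helper: argument list for a diffmap run.

    Assumes keyword order on the command line does not matter.
    """
    args = [
        "phenix.ligand_identification",
        "mtz_type=diffmap",
        f"mtz_in={map_coeffs}",
        f"input_labels={MAP_LABELS[map_kind]}",  # one argv entry, space included
        f"nproc={nproc}",
    ]
    if model is not None:  # optional; enables the non-bonded scoring terms
        args.append(f"model={model}")
    return args

# Print a copy-pasteable, correctly quoted shell command.
print(shlex.join(ligand_id_command("partial_refine_001_map_coeffs.mtz",
                                   model="partial_refine_001.pdb")))
```

The returned list can also be passed unchanged to `subprocess.run`, which sends each entry as a single argument without going through a shell.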

The examples below show various ways of specifying custom ligand libraries based on the 'standard run' example above.

Identify a ligand from a list of ligands given as 3-letter codes

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_list="ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP"
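Because an empty or uninterpretable ligand_list makes the program fall back to the default library (see the ligand_list keyword), a quick pre-check of the list can catch typos before a long run. The sketch below is hypothetical and not part of Phenix; it simply assumes that every entry should be a three-character code of uppercase letters and digits, as in the examples. Entries that fail the check are reported rather than fixed, and repeats are dropped.

```python
import re

# Assumption: ligand codes are exactly three uppercase letters/digits.
CODE = re.compile(r"^[A-Z0-9]{3}$")

def parse_ligand_list(text):
    """Split a ligand_list string into (valid_codes, rejected_tokens)."""
    codes, rejected = [], []
    for token in text.split():
        if not CODE.match(token):
            rejected.append(token)   # flag for the user to check
        elif token not in codes:     # drop repeats, keep first-seen order
            codes.append(token)
    return codes, rejected

codes, rejected = parse_ligand_list("ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP")
print(len(codes), rejected)
```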

Identify a ligand from a set of PDB files (for example, your compound library) in a specific directory

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_dir=/my/compound/library/A1217321

This command takes all .pdb files in ligand_dir and makes a custom library from them for the search.

Identify a ligand from a library of all ligands found in tyrosine kinases in the PDB

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F function="Tyrosine kinase"

This command is useful when you want to use a function-specific ligand library. The 'function' term should correspond to (part of) an Enzyme Classification (EC) description. Note that in the above example, if only "kinase" is specified, ligands found in all types of kinases will be searched.

Identify a ligand from a library of all ligands found in proteins belonging to a specific Enzyme Classification

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F EC=1.1.3

This command is useful when you want to use a function-specific ligand library and your protein belongs to a known EC class. The EC number can be any parent class of the actual EC number (e.g. you can use 1.1.3 instead of 1.1.3.1, although EC=1.1.3 gives a broader library).
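The parent-class rule amounts to a field-by-field prefix match on the dot-separated EC number. The sketch below illustrates that idea only: `ec_within` and `ligands_for_class` are hypothetical names, and the mapping from EC numbers to ligands, which the program derives from the PDB, is assumed here to be a plain dictionary supplied by the caller.

```python
def ec_within(query, ec_number):
    """True if ec_number lies inside the (possibly partial) EC class `query`.

    Fields are compared one by one, so '1.1.3' matches '1.1.3.1'
    but not '1.1.30.1' (a plain string-prefix test would wrongly match it).
    """
    q = query.split(".")
    e = ec_number.split(".")
    return len(q) <= len(e) and e[:len(q)] == q

def ligands_for_class(query, ligands_by_ec):
    """Union of ligand codes for every EC number inside the query class.

    ligands_by_ec is a hypothetical {ec_number: [ligand codes]} mapping.
    """
    found = set()
    for ec_number, codes in ligands_by_ec.items():
        if ec_within(query, ec_number):
            found.update(codes)
    return sorted(found)
```

A shorter query such as 1.1.3 matches more EC numbers than 1.1.3.1, which is why the resulting library is broader.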

The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand

Possible Problems

Specific limitations and problems:

The current Resolve_ligand_identification task works with the ligand library provided with Phenix by default. It can also fit and rank ligands from a custom PDB library supplied by the user. The ligand atoms in the user-provided PDB files should be in 'HETATM' records.
For other RESOLVE-related limitations, please refer to the LigandFit wizard documentation.
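Before pointing ligand_dir at a compound library, the HETATM requirement can be checked with a short script. This sketch is not part of Phenix: the helper names are hypothetical, it only looks at files with a .pdb extension (the keyword list also mentions .cif files, which it ignores), and it relies on the PDB format convention that the record name occupies columns 1-6.

```python
from pathlib import Path

def record_counts(pdb_path):
    """Count ATOM and HETATM records in a PDB file (record name = columns 1-6)."""
    counts = {"ATOM": 0, "HETATM": 0}
    with open(pdb_path) as handle:
        for line in handle:
            record = line[:6].strip()
            if record in counts:
                counts[record] += 1
    return counts

def check_ligand_dir(ligand_dir):
    """Names of .pdb files whose ligand atoms are not all in HETATM records."""
    problems = []
    for path in sorted(Path(ligand_dir).glob("*.pdb")):
        counts = record_counts(path)
        if counts["ATOM"] > 0 or counts["HETATM"] == 0:
            problems.append(path.name)
    return problems

def atom_to_hetatm(line):
    """Rewrite one ATOM record as HETATM; both names fill columns 1-6."""
    return "HETATM" + line[6:] if line.startswith("ATOM  ") else line
```

Files reported by `check_ligand_dir` can be rewritten line by line with `atom_to_hetatm`, which leaves every column after the record name unchanged.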

Literature

Ligand identification using electron-density map correlations. T.C. Terwilliger, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 63, 101-107 (2007).

Automated ligand fitting by core-fragment fitting and extension into density. T.C. Terwilliger, H. Klei, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 62, 915-922 (2006).

Additional information

List of ligands in the PHENIX ligand_identification default library:

PG4
CRY
CYS
DIO
DOX
GOL
MO5
NBN
OXL
OXM
PUT
PYR
F3S
MO6
PEG
COA
DTT
FS4
HED
HEZ
LI1
MPD
SF4
SIN
TMN
TRS
URA
BEN
BEZ
MET
NIO
PGA
POP
ASP
DAO
FSO
FUC
GLU
LYS
PEP
PGE
PHB
PHQ
PLM
TAR
XYS
ADE
AKG
7HP
BGC
CAM
HC4
ORO
DKA
GAL
GLC
MAN
MES
ARG
PHE
CIT
FLC
MMA
MPO
MYR
NHE
OLA
PG4
AMG
FER
NAA
NAG
NDG
SPM
EPE
NGA
PLP
TRP
BTN
F6P
FTT
G6P
LDA
UPL
1PE
BH4
H4B
THM
1PG
P6G
U10
ADN
BOG
EST
FBP
GSH
GTT
NVP
RET
UMP
C5P
C8E
DHT
TMP
UFP
CMP
NCN
PRP
BGC
MAN
GTS
IMP
LAT
MAL
SUC
GLC
AMP
FPP
PQQ
T44
2GP
3GP
5GP
TYD
UDP
GTX
SAH
TDP
TPP
IMO
SAM
A3P
ADP
DCP
GDP
NAG
XYS
2PE
CTP
DAD
FOK
TTP
DTP
DGA
FMN
SAP
ACP
ANP
APC
ATP
FOL
GNP
GSP
GTP
MTX
MAN
CB3
MA4
UPG
UD1
SPO
HEC
HEM
NAG
NAD
NAI
ACR
GLC
NAP
NDP
DHE
BPH
ACO
BCL
FAD
CAA
GLC
AP5
BPB
B12

List of all available keywords

ligand_identification
- mtz_in = None Enter an MTZ file name
- mtz_type = *F diffmap If the input is a precalculated difference map, use "diffmap" instead
- input_labels = None Provide the label of F if mtz_type is F; provide the F and PHI labels if the MTZ file contains a difference map
- ligand_list = None Enter the ligands to be searched, as 3-letter codes separated by spaces. For example, ligand_list="ATP CTP APN FMN". If there is no input or the input cannot be interpreted, the default library will be used. See phenix.doc for the default ligand library
- ligand_dir = None Directory of your ligand library. Files in this directory with .pdb or .cif extensions will be used in the ligand search. If None, the default library will be used.
- work_dir = None Top level directory where jobs will be run. Default is the directory where phenix.ligand_identification is started.
- EC = None Provide the EC number of your protein. The program will screen all ligands that interact with proteins in this class found in the PDB
- function = None Provide functional terms for your protein, for example 'kinase'. The program will then screen all ligands that interact with proteins of similar function found in the PDB. Note: the function should be (part of) an EC term.
- scop_fold = None Provide fold information for your protein in SCOP terms, for example 'tim barrel'. The program will then screen all ligands that interact with proteins with TIM barrel folds in the PDB.
- cath = None Provide structural information for your protein in CATH terms, for example 'kinase'. The program will then screen all ligands that interact with proteins with kinase folds in the PDB.
- pfam = None Provide Pfam IDs for the protein(s), for example 'PF001232'. The program will then screen all ligands that interact with proteins with PF001232 in the PDB.
- go = None Provide Gene Ontology (GO) information for the protein in the crystal, for example '01234'. The program will then screen all ligands that interact with proteins with that GO number in the PDB.
- ipro = None Provide the InterPro ID of the protein, for example '01740'. The program will then screen all ligands that interact with proteins with the specified InterPro ID in the PDB.
- model = None Enter a PDB file containing the protein only.
- high_resolution = 2.5 specify the high resolution to use. default=2.5
- low_resolution = 1000 Low resolution
- restart_run = False Use (restart_run = True) to continue unfinished run
- partial_analyze = False Analyze results before jobs complete. This function is currently not available with command-line version.
- ncpu = 1 Number of CPUs to use.
- n_indiv_tries_min = 30 usually 0 to 10, but set up to 300 to try harder to find soln
- n_indiv_tries_max = 300 usually 0 to 10, but set up to 300 to try harder to find soln
- n_group_search = 4 usually 3, but set up to 10 to try harder to find soln
- search_dist = 10 usually 10 A; always at least 5. smaller speeds up search
- delta_phi_lig = 40 usually 40 degree increments. set lower to search more
- fit_phi_inc = 20 usually 10 increments. set lower to search more
- local_search = True Usually True; Use False to force complete search
- nproc = 1 number of processors to use. This is used in RESOLVE internal parallelization
- verbose = False verbose output
- debug = False debugging output
- temp_dir = Auto Optional temporary work directory
- output_dir = None Output directory where files are to be written
- dry_run = False Just read in and check parameter names
- number_of_ligands = None Total number of ligand sites. Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
- cc_min = 0.50 Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
- open_in_coot = False If true, Phenix will automatically start Coot after the run is complete.
- refine_ligand = True If true, real-space refinement will be applied to the top 20 ranked ligands (or fewer).
- non_bonded = True If true, non_bonded terms will be applied to the scores if RSR is used and model exists.
- keep_all_files = False If true, all intermediate pdb and log files will be kept.
- cif_def_file_list = None You can supply cif files for real-space refinement. example: cif_def_file_list='/my/cif/file1.cif /my/cif/file2.cif'
- real_space_target_weight = 10 You can change the weight on the real-space term in the real-space refinement of the ligand after fitting.
- job_title = None Job title in PHENIX GUI, not used on command line
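The number_of_ligands and cc_min descriptions above define a stopping rule: the search stops at whichever limit is reached first, and either limit is ignored when it is None. The sketch below restates only that rule. The real search is replaced by a hypothetical list of best-fit correlation coefficients, one per search cycle, so it does not reproduce how find_all_ligands actually locates each site.

```python
def place_ligands(candidate_ccs, cc_min=0.50, number_of_ligands=None):
    """Apply the cc_min / number_of_ligands stopping rule.

    candidate_ccs: hypothetical stand-in for the search, giving the
    correlation coefficient of the best fit found at each cycle.
    """
    placed = []
    for cc in candidate_ccs:
        if cc_min is not None and cc < cc_min:
            break  # best remaining fit is below cc_min
        placed.append(cc)
        if number_of_ligands is not None and len(placed) >= number_of_ligands:
            break  # requested number of ligand sites reached
    return placed
```

Setting both limits to None would make the loop run over every candidate, so in practice at least one of them is normally left at a finite value.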