Automated ligand identification


	Python-based Hierarchical ENvironment for Integrated Xtallography

Author(s)
Purpose: Purpose of the phenix.ligand_identification command
Usage: How the phenix.ligand_identification works:; How to run the phenix.ligand_identification; What the phenix.ligand_identification command needs to run:; Choice of ligand library (keywords+examples); Output files from phenix.ligand_identification command; Multi-thread computing
Running from a parameters file
Examples: Sample command_line inputs
Possible Problems: Specific limitations and problems:
Literature
Additional information: List of ligands in the PHENIX ligand_identification default library; List of all ligand_identification keywords

Author(s)

phenix.ligand_identification: Li-Wei Hung
PHENIX GUI and PDS Server: Nigel W. Moriarty
RESOLVE: Tom Terwilliger

Purpose

Purpose of the phenix.ligand_identification command

The phenix.ligand_identification command fits a library of the 180 most frequently observed ligands in the PDB to a given electron density map. The program also analyzes and ranks the ligand fitting results.

The phenix.ligand_identification command works with the ligand library provided with the Phenix program by default. It can also take a custom ligand library provided by the users.

Usage

The phenix.ligand_identification task can be run from the command line or from the PHENIX GUI.

How the phenix.ligand_identification works:

The phenix.ligand_identification command uses the RESOLVE ligand fitting methods described in the LigandFit documentation. It carries out this fitting process for a library of the 180 most frequently observed ligands in the Protein Data Bank, or for a custom library as described above, and then scores and ranks the overall fitting results. By default, a real-space refinement is carried out on the ligand between RESOLVE fitting and Phenix scoring. The scoring algorithm takes into account the density correlation between the ligand and the map as well as the non-bonded interactions between the fitted ligand and the input model. The output consists of a list of the best fitted ligands from the library. The command can also display the top ranked ligand in Coot, with or without the electron density (use keyword "--open_in_coot=True").

How to run the phenix.ligand_identification

Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand

What the phenix.ligand_identification command needs to run:

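The phenix.ligand_identification command needs:

(1) an MTZ file with structure factor amplitudes or map coefficients (keyword mtz_in)

(2) (optional) a PDB file with your protein model without ligand (keyword model)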

Choice of ligand library (keywords+examples)

*default ligand library: no keyword needed

*to specify a list of ligands in 3-letter codes:

--ligand_list="ATP CTP TTP GTP"

*to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:

--ligand_dir=/my/compound/library/A1217321

*to use all ligands associated with proteins of specific function found in the PDB as the search library:

--function="tyrosine kinase"

*to use all ligands found in proteins with a specific Enzyme Classification number in the PDB:

--EC=1.1.1.3

*to use all ligands found in proteins with a specific Pfam accession number in the PDB:

--pfam=PF00042

*to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:

--scop='DNA/RNA-binding 3-helical bundle'

--cath='Trypsin-like serine proteases'

*to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:

--GO=0009253

*to use all ligands found in proteins with a specific InterPro ID in the PDB:

--ipro=014838

Output files from phenix.ligand_identification command

When you run the phenix.ligand_identification command, the output files are written to the directory where you started Phenix:

A summary file of the fitting results of all ligands:
```
overall_ligand_scores.log
```
A summary table listing the results of the top ranked ligands:
```
topligand.txt
```
The last column, "Sequence in library", contains numbers '###' giving the sequence number of the corresponding ligand. The final fitted ligand coordinates and all the log files are in the corresponding '###' files described below.

PDB files with the fitted ligands:

ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb

Resolve fitting Score files:

ligand_fit_scores/[3-letter code].scores

Map coefficients for the map used for fitting:
```
resolve_map.mtz
```

Command file to display results in coot:

display.com (uses ligid.scm, also in the same directory)
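
After a run finishes, it can be handy to confirm that the files listed above are actually present. The following is a minimal shell sketch, not part of Phenix: the function name `check_ligid_outputs` is hypothetical, and it assumes the run directory contains exactly the file and directory names given in this section.

```shell
# Hypothetical helper (not part of Phenix): report which of the documented
# output files are missing from a run directory.
check_ligid_outputs() {
  run_dir=${1:-.}
  missing=0
  # Plain files named in the documentation above
  for f in overall_ligand_scores.log topligand.txt resolve_map.mtz display.com; do
    if [ ! -f "$run_dir/$f" ]; then
      echo "missing file: $f"
      missing=$((missing + 1))
    fi
  done
  # Sub-directories holding fitted ligands and RESOLVE score files
  for d in ligand_fit_pdbs ligand_fit_scores; do
    if [ ! -d "$run_dir/$d" ]; then
      echo "missing directory: $d"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when nothing is missing
  [ "$missing" -eq 0 ]
}
```

Calling `check_ligid_outputs .` in the directory where the job was started prints one line per missing item and returns a non-zero status if anything is absent.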

Multi-thread computing

The phenix.ligand_identification program has built-in multi-thread processing capability. Use keyword

--nproc=[number of threads]

to run the command in multi-threaded mode. In general, the processing speed scales roughly in proportion to the number of threads used, up to the maximum number of free cores the system can allocate at run time. For example, on a dual-Xeon W5580 machine the nsf-d2 example takes about 25 minutes with 8 threads, compared with about 180 minutes as a single process.
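
The timings quoted above can be turned into a speedup and a parallel efficiency with a little arithmetic. This is only a worked calculation on the two numbers given in the text; the helper name `speedup` is made up for the illustration.

```shell
# Speedup and parallel efficiency from two wall-clock timings (minutes).
# Usage: speedup <single_process_time> <parallel_time> <threads>
speedup() {
  awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
    s = t1 / tn          # how many times faster the parallel run is
    e = s / n            # fraction of ideal linear scaling achieved
    printf "speedup=%.2f efficiency=%.2f\n", s, e
  }'
}

# Timings quoted above for the nsf-d2 example
speedup 180 25 8
```

For the nsf-d2 figures this prints `speedup=7.20 efficiency=0.90`, that is, about 90% of ideal linear scaling on 8 threads.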

Running from a parameters file

You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a default one with:

phenix.ligand_identification --show_defaults > my_ligand.eff

and then you can just edit this file to match your needs and run it with:

phenix.ligand_identification  my_ligand.eff
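
As a rough idea of what an edited parameters file might contain, the sketch below writes a small file in PHIL syntax from the shell. The scope name and the parameter names are taken from the keyword list at the end of this document and the input files are the nsf-d2 example files. The helper name `write_ligid_eff` is hypothetical, and the exact layout is an assumption that should be checked against the output of `--show_defaults`.

```shell
# Hypothetical helper: write a small parameters file in PHIL syntax.
# Scope and parameter names come from the keyword list in this document;
# the file names are the nsf-d2 example inputs.
write_ligid_eff() {
  cat > "${1:-my_ligand.eff}" <<'EOF'
ligand_identification {
  mtz_in = nsf-d2.mtz
  model = nsf-d2_noligand.pdb
  input_labels = F
  nproc = 8
}
EOF
}

# Typical use (the second line needs a working Phenix installation):
# write_ligid_eff my_ligand.eff
# phenix.ligand_identification my_ligand.eff
```

Parameters that are left out of the file should keep the defaults shown in the keyword list.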

Examples

Sample command_line inputs

Standard run of ligand_identification (input protein model and data file, default ligand library, 8 CPUs)

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F nproc=8

Search ligand from a difference map or pre-calculated map coefficients from phenix.refine
If you refine a model with a command such as,
```
phenix.refine data.mtz partial.pdb
```
then you will end up with the refined model,
```
partial_refine_001.pdb
```
and a map coefficients file:
```
partial_refine_001_map_coeffs.mtz
```
You can then run ligand_identification using the 2Fo-Fc map calculated from these map coefficients:
```
phenix.ligand_identification mtz_type=diffmap \
 mtz_in=partial_refine_001_map_coeffs.mtz input_labels="2FOFCWT PH2FOFCWT" \
 model=partial_refine_001.pdb nproc=8
```
For the Fo-Fc map from the same file, use:
```
phenix.ligand_identification mtz_type=diffmap \
 mtz_in=partial_refine_001_map_coeffs.mtz input_labels="FOFCWT PHFOFCWT" \
 model=partial_refine_001.pdb nproc=8
```
In the above two cases, the "model" keyword is optional. If a model is provided, non-bonded terms will be used in scoring.
The examples below show various ways of specifying custom ligand libraries based on the 'standard run' example above.

Identify ligand from a series of ligands in 3-letter codes

phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_list="ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP"
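
Because ligand_list expects 3-letter codes separated by spaces, a quick check of the string before launching a long job can catch typing mistakes. The helper below is a hypothetical sketch and not a Phenix feature. It only checks the shape of each token (three alphanumeric characters), not whether the code exists in the PDB.

```shell
# Hypothetical pre-check: succeed only if every space-separated token
# is exactly three alphanumeric characters.
valid_ligand_list() {
  [ -n "$1" ] || return 1
  for code in $1; do          # intentional word splitting on spaces
    case "$code" in
      [A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]) ;;
      *) echo "not a 3-letter code: $code"; return 1 ;;
    esac
  done
}
```

For example, `valid_ligand_list "ATP GTP CTP"` succeeds silently, while a list containing a token such as `GLYC` is reported and rejected.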

Identify ligand from a given set of pdb files (could be your compound library) in a specific directory
```
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F ligand_dir=/my/compound/library/A1217321
```
This command will take all .pdb files in the ligand_dir and make a custom library to carry out the search.

Identify ligand from a library composed of all ligands found in Tyrosine kinases in the PDB
```
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F function="Tyrosine kinase"
```
This command could be useful when you want to use a function-specific ligand library. The 'function' should be one of the EC terms. Note that in the above example, if only "kinase" is specified, ligands found in all types of kinases will be searched.

Identify ligand from a library composed of all ligands found in proteins belonging to a specific Enzyme Classification
```
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
 input_labels=F EC=1.1.3
```
This command could be useful when you want to use a function-specific ligand library and your protein belongs to an EC class. Note that the EC can be any parent set of the actual EC (e.g. you can use 1.1.3 instead of 1.1.3.1, although you'll get a broader library with EC=1.1.3).
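
The "parent set" relationship between EC numbers is a prefix match on whole dot-separated fields. The sketch below illustrates that relationship only. It is not how Phenix queries the PDB, and the function name `ec_is_parent` is made up for the example.

```shell
# Illustration only: is $1 equal to, or a parent class of, EC number $2?
ec_is_parent() {
  case "$2" in
    "$1" | "$1".*) return 0 ;;   # same EC, or parent followed by ".<sub-class>"
    *) return 1 ;;
  esac
}
```

With this definition, `ec_is_parent 1.1.3 1.1.3.1` succeeds, whereas `ec_is_parent 1.1.3 1.1.30.2` fails because the third field differs.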

The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand

Possible Problems

Specific limitations and problems:

The current Resolve_ligand_identification task works with the ligand library provided with the Phenix program by default. It is also capable of fitting and ranking ligands in a custom PDB library provided by the user. The ligand atoms in the user-provided PDB files should be under 'HETATM' records.
For other RESOLVE-related limitations, please refer to the documentation of the LigandFit wizard.
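
Since only .pdb files are picked up from a custom library directory and their ligand atoms are expected under HETATM records, a small pre-flight check can save a failed run. This is a hypothetical helper written for illustration and is not shipped with Phenix. It only tests whether each file contains at least one line starting with HETATM.

```shell
# Hypothetical pre-flight check for a custom library directory:
# list .pdb files that contain no HETATM records.
check_ligand_dir() {
  dir=$1
  bad=0
  found=0
  for f in "$dir"/*.pdb; do
    [ -e "$f" ] || continue       # glob matched nothing
    found=$((found + 1))
    if ! grep -q '^HETATM' "$f"; then
      echo "no HETATM records: ${f##*/}"
      bad=$((bad + 1))
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no .pdb files in $dir"
    return 1
  fi
  # Exit status is 0 only when every .pdb file has HETATM records
  [ "$bad" -eq 0 ]
}
```

Running `check_ligand_dir /my/compound/library/A1217321` before the search lists any files that would need their records converted.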

Literature

Ligand identification using electron-density map correlations. T. C. Terwilliger, P. D. Adams, N. W. Moriarty and J. D. Cohn Acta Cryst. D63, 101-107 (2007)

Automated ligand fitting by core-fragment fitting and extension into density. T. C. Terwilliger, H. Klei, P. D. Adams, N. W. Moriarty and J. D. Cohn Acta Cryst. D62, 915-922 (2006)

Additional information

List of ligands in the PHENIX ligand_identification default library

---------------------------------------------------------------

 PG4
 CRY
 CYS
 DIO
 DOX
 GOL
 MO5
 NBN
 OXL
 OXM
 PUT
 PYR
 F3S
 MO6
 PEG
 COA
 DTT
 FS4
 HED
 HEZ
 LI1
 MPD
 SF4
 SIN
 TMN
 TRS
 URA
 BEN
 BEZ
 MET
 NIO
 PGA
 POP
 ASP
 DAO
 FSO
 FUC
 GLU
 LYS
 PEP
 PGE
 PHB
 PHQ
 PLM
 TAR
 XYS
 ADE
 AKG
 7HP
 BGC
 CAM
 HC4
 ORO
 DKA
 GAL
 GLC
 MAN
 MES
 ARG
 PHE
 CIT
 FLC
 MMA
 MPO
 MYR
 NHE
 OLA
 PG4
 AMG
 FER
 NAA
 NAG
 NDG
 SPM
 EPE
 NGA
 PLP
 TRP
 BTN
 F6P
 FTT
 G6P
 LDA
 UPL
 1PE
 BH4
 H4B
 THM
 1PG
 P6G
 U10
 ADN
 BOG
 EST
 FBP
 GSH
 GTT
 NVP
 RET
 UMP
 C5P
 C8E
 DHT
 TMP
 UFP
 CMP
 NCN
 PRP
 BGC
 MAN
 GTS
 IMP
 LAT
 MAL
 SUC
 GLC
 AMP
 FPP
 PQQ
 T44
 2GP
 3GP
 5GP
 TYD
 UDP
 GTX
 SAH
 TDP
 TPP
 IMO
 SAM
 A3P
 ADP
 DCP
 GDP
 NAG
 XYS
 2PE
 CTP
 DAD
 FOK
 TTP
 DTP
 DGA
 FMN
 SAP
 ACP
 ANP
 APC
 ATP
 FOL
 GNP
 GSP
 GTP
 MTX
 MAN
 CB3
 MA4
 UPG
 UD1
 SPO
 HEC
 HEM
 NAG
 NAD
 NAI
 ACR
 GLC
 NAP
 NDP
 DHE
 BPH
 ACO
 BCL
 FAD
 CAA
 GLC
 AP5
 BPB
 B12

List of all ligand_identification keywords

------------------------------------------------------------------------------- 
Legend:
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
ligand_identification
   mtz_in= None Enter an MTZ file name
   mtz_type= *F diffmap If input a precalculated difference map, use "D"
             instead
   input_labels= None Provide a label of F if mtz type is F, provide F, PHI if
                 mtz is a difference map
   ligand_list= None enter ligands to be searched. Ligands should be in
                3-letter codes separated by spaces. For example,
                ligand_list="ATP CTP APN FMN". If no input or uninterpretable,
                Default lib will be used. See phenix.doc for default ligand
                lib
   ligand_dir= None Directory of your ligand library. Files in this directory
               with .pdb extensions will be used in ligand search. If None,
               default lib will be used.
   work_dir= None Top level directory where jobs will be run. Default is the
             directory where phenix.ligand_identification is started.
   EC= None Provide an EC# of your protein. The program will screen all
       ligands interact with proteins in this class found in PDB
   function= None Provide functional terms of your protein. For example
             'kinase'. The program will then screen all ligands interact with
             proteins of similar functions found in PDB. Note: function should
             belong to partial EC terms.
   scop_fold= None Provide fold information of your protein in SCOP terms. For
              example 'tim barrel'. The program will then screen all ligands
              interact with protein with TIM barrel folds in PDB.
   cath= None Provide structural information of protein in CATH terms. For
         example 'kinase'. The program will then screen all ligands interact
         with proteins with kinase folds in PDB.
   pfam= None Provide PFam IDs for protein(s). For example 'PF001232'. The
         program will then screen all ligands interact with proteins with all
         PF001232 in PDB.
   go= None Provide Gene Ontology(GO) information of protein in crystal. For
       example '01234'. The program will then screen all ligands interact with
       proteins with that GO number in PDB.
   ipro= None Provide Interpro ID of protein. For example '01740'. The program
         will then screen all ligands interact with proteins with the
         specified InterPro ID in PDB.
   model= None Enter a PDB file containing the protein only.
   high_resolution= 2.5 specify the high resolution to use. default=2.5
   low_resolution= 1000 Low resolution
   restart_run= False Use (restart_run = True) to continue unfinished run
   partial_analyze= False Analyze results before jobs complete. This function
                    is currently not available with command-line version.
   ncpu= 1 Number of CPUs to use.
   n_indiv_tries_min= 30 usually 0 to 10, but set up to 300 to try harder to
                      find soln
   n_indiv_tries_max= 300 usually 0 to 10, but set up to 300 to try harder to
                      find soln
   n_group_search= 4 usually 3, but set up to 10 to try harder to find soln
   search_dist= 10 usually 10 A; always at least 5. smaller speeds up search
   delta_phi_lig= 40 usually 40 degree increments. set lower to search more
   fit_phi_inc= 20 usually 10 increments. set lower to search more
   local_search= True Usually True; Use False to force complete search
   nproc= 1 number of processors to use. This is used in resolve internal
          parallelization
   verbose= False verbose output
   debug= False debugging output
   temp_dir= Auto Optional temporary work directory
   output_dir= None Output directory where files are to be written
   dry_run= False Just read in and check parameter names
   number_of_ligands= None Total number of ligand sites. Ignored if "None".
                      find_all_ligands will keep looking until the correlation
                      coefficient for the fit of the best ligand is less than
                      cc_min or the number of ligands placed is
                      number_of_ligands, whichever comes first
   cc_min= 0.50 Ignored if "None". find_all_ligands will keep looking until
           the correlation coefficient for the fit of the best ligand is less
           than cc_min or the number of ligands placed is number_of_ligands,
           whichever comes first
   open_in_coot= False If true, Phenix will automatically start Coot after the
                 run is complete.
   refine_ligand= True If true, real-space refinement will be applied to the
                  top 20 ranked (or less) ligands.
   non_bonded= True If true, non_bonded terms will be applied to the scores if
               RSR is used and model exists.
   keep_all_files= False If true, all intermediate pdb and log files will be
                   kept.
   cif_def_file_list= None You can supply cif files for real-space refinement.
                      example: cif_def_file_list='/my/cif/file1.cif
                      /my/cif/file2.cif'
   real_space_target_weight= 10 You can change the weight on the
                             real-space term in real-space refinement on the
                             ligand after fitting.
   job_title= None Job title in PHENIX GUI, not used on command line
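
The help text for number_of_ligands and cc_min describes a stopping rule: ligands keep being placed until the best fit falls below cc_min or number_of_ligands sites have been placed, whichever comes first. The sketch below replays that rule on a list of made-up correlation coefficients. It is an illustration of the rule as worded above, not the Phenix implementation, and the function name `count_placed` is hypothetical.

```shell
# Illustration of the documented stopping rule.
# Usage: count_placed <cc_min|None> <number_of_ligands|None> <cc values...>
# Prints how many ligands would be placed before the search stops.
count_placed() {
  cc_min=$1
  max_n=$2
  shift 2
  placed=0
  for cc in "$@"; do
    # Stop once the requested number of sites has been placed
    if [ "$max_n" != "None" ] && [ "$placed" -ge "$max_n" ]; then
      break
    fi
    # Stop when the best remaining fit drops below cc_min
    if [ "$cc_min" != "None" ] &&
       awk -v c="$cc" -v m="$cc_min" 'BEGIN { exit !(c < m) }'; then
      break
    fi
    placed=$((placed + 1))
  done
  echo "$placed"
}
```

With invented values, `count_placed 0.50 None 0.82 0.71 0.45 0.60` prints `2`, because the third fit is below cc_min, and `count_placed 0.50 1 0.82 0.71` prints `1`, because the site limit is reached first.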