Contents
The phenix.ligand_identification command carries out fitting of a library of 180 most frequently observed ligands in the PDB to a given electron density map. The program also conducts the analysis and ranking of the ligand fitting results.
The phenix.ligand_identification command works with the ligand library provided with the Phenix program by default. It can also take a custom ligand library provided by the users.
The phenix.ligand_identification task can be run from command line, or from the PHENIX GUI.
The phenix.ligand_identification command uses RESOLVE ligand fitting methods as described in the LigandFit documentation. The phenix.ligand_identification carries out this fitting process for a library of 180 most frequently observed ligands in the Protein Data Bank, or a custom library as described above, scores and ranks the overall fitting results. A real-space refinement is carried out on the ligand by default between RESOLVE fitting and Phenix scoring. The scoring algorithm takes into consideration of density correlation between ligand and density as well as non-bonded interactions between fitted ligand and the input model. The output consists of a list of the best fitted ligands from the library. The command provides options to view the top ranked ligand in coot with or without the electron density (use keyword "--open_in_coot=True").
Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand. This program accepts all LigandFit keywords in addition to the build-in keywords listed at the end of this document.
The phenix.ligand_identification command needs:
*default ligand library: no keyword needed
*to specify a list of ligands in 3-letter codes:
--ligand_list="ATP CTP TTP GTP"
*to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:
--ligand_dir=/my/compound/library/A1217321
*to use all ligands fround in sequence and structural homologs of the input model as the search library. (Note, the compilation of ligand library is all done internally. No query is sent throught the network.) :
--use_pdb_ligand=True
*to use all ligands associated with proteins of specific function found in the PDB as the search library:
--function="tyrosine kinase"
*to use all ligands found in proteins with a specific Enzyme Classification number in the PDB:
--EC=1.1.1.3
*to use all ligands found in proteins with a specific Pfam accession number in the PDB:
--pfam=PF00042
*to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:
--scop='DNA/RNA-binding 3-helical bundle'
--cath='Trypsin-like serine proteases'
*to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:
--GO=0009253
*to use all ligands found in proteins with a specific InterPro ID in the PDB:
--ipro=014838
*to filter ligand sizes (# of non-H atoms) in ligand library to be used in ligand search:
When you run phenix.ligand_identification command, the output files will be in the directory you started Phenix:
overall_ligand_scores.log
topligand.txt
The last column "Sequence in library' contains numbers '###' indicating the sequence number of the corresponding ligands. The final fitted ligand coordinates and all the log files are in the corresponding'###' files described below.
ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb
ligand_fit_scores/[3-letter code].scores
resolve_map.mtz
display.com (uses ligid.scm, also in the same directory)
The phenix.ligand_identification program has build-in multi- processing capability. Use keyword
--nproc=[number of threads]
to run the command in multiprocessing mode. In general, the processing speed is proportional to the the number of CPU cores used, up to the maximum free cores the system can allocate at run time. For example, it takes about 25 minutes for the nsf-d2 example to run in 8 threaded mode, while single-process uses about 180 minutes for the same job on a dual-Xeon W5580 machine.
You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a default one with:
phenix.ligand_identification --show_defaults > my_ligand.eff
and then you can just edit this file to match your needs and run it with:
phenix.ligand_identification my_ligand.eff
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F nproc=8
If your refine a model with a command such as,
phenix.refine data.mtz partial.pdb
then you will end up with the refined model,
partial_refine_001.pdb
and a map coefficients file:
partial_refine_001_map_coeffs.mtz
You can then run ligand_identification using the 2Fo-Fc map calculated from these map coefficients:
phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz input_labels="2FOFCWT PH2FOFCWT" model=partial_refine_001.pdb nproc=8
For Fo-Fc map from the same file you can say:
phenix.ligand_identification mtz_type=diffmap mtz_in=partial_refine_001_map_coeffs.mtz input_labels="FOFCWT PHFOFCWT" model=partial_refine_001.pdb nproc=8
In the above two cases, "model" keyword is optional. If provided, non-bonded energy terms will be used in scoring.
The examples below show various ways of specifying custom ligand libraries based on the 'standard run' example above.
3. Identify ligand from a series of ligands in 3-letter codes
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F ligand_list="ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP"
4. Identify ligand from a given set of pdb files (could be your compound library) in a specific directory
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F ligand_dir=/my/compound/library/A1217321
This command will take all .pdb files in the ligand_dir and a make a custom library to carry out the search
5. Identify ligand from a library of ligands founds in homologous structures (sequence or structural homologs) of the input pdb file.
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F use_pdb_ligand=True
This command will compile a library of ligands found in homologous pdbs of the 'model=' pdb file from Phenix's internal library. You can combine this keyword with other functional keywords (function, EC, pfam, ipro ...etc) and the program will compile a non-redundant combined ligand library. You may further filter the library to limit the sizes of ligands used in the search. See below. All these can be done in the GUI as well.
6. Identify ligand from a library composed of all ligands found in Tyrosine kinases in the PDB, and number of non-H atoms between 20 and 40
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F function="Tyrosine kinase" natom_min=20 natom_max=40
This command could be useful when you want to use a function-specific ligand library. The 'function' should be in one of the EC terms. Note in the above example, if only "kinase" is specified, ligand found in all types of kinases will be searched.
7. Identify ligand from a library composed of all ligands found in proteins belonging to a specific Enzyme Classification
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb input_labels=F EC=1.1.3
This command could be useful when you want to use a function-specific ligand library, and your protein belongs to a EC. Note the EC can be any parent set of the actual EC. (e.g. you can use 1.1.3 instead of 1.1.3.1 although you'll get a broader set of library with EC=1.1.3).
The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand
List of ligands in the PHENIX ligand_identification default library:
PG4 CRY CYS DIO DOX GOL MO5 NBN OXL OXM PUT PYR F3S MO6 PEG COA DTT FS4 HED HEZ LI1 MPD SF4 SIN TMN TRS URA BEN BEZ MET NIO PGA POP ASP DAO FSO FUC GLU LYS PEP PGE PHB PHQ PLM TAR XYS ADE AKG 7HP BGC CAM HC4 ORO DKA GAL GLC MAN MES ARG PHE CIT FLC MMA MPO MYR NHE OLA PG4 AMG FER NAA NAG NDG SPM EPE NGA PLP TRP BTN F6P FTT G6P LDA UPL 1PE BH4 H4B THM 1PG P6G U10 ADN BOG EST FBP GSH GTT NVP RET UMP C5P C8E DHT TMP UFP CMP NCN PRP BGC MAN GTS IMP LAT MAL SUC GLC AMP FPP PQQ T44 2GP 3GP 5GP TYD UDP GTX SAH TDP TPP IMO SAM A3P ADP DCP GDP NAG XYS 2PE CTP DAD FOK TTP DTP DGA FMN SAP ACP ANP APC ATP FOL GNP GSP GTP MTX MAN CB3 MA4 UPG UD1 SPO HEC HEM NAG NAD NAI ACR GLC NAP NDP DHE BPH ACO BCL FAD CAA GLC AP5 BPB B12
Note: To use LigandFit parameters in ligand_identification, the parameter group (i.e. the bold text above the desired keyword) should be added to the keywords. For example,
*to specify search center:
search_target.search_center="10 10 10"
*to specify number of copies of the ligand in the asymmetric unit:
search_parameters.number_of_ligands=2
{{phil:phenix.command_line.ligand_identification}}