The phenix.ligand_identification command fits a library of the 180 most frequently observed ligands in the PDB to a given electron density map, then analyzes and ranks the fitting results.
By default, the phenix.ligand_identification command works with the ligand library provided with Phenix. It can also take a custom ligand library supplied by the user.
The phenix.ligand_identification task can be run from the command line or from the PHENIX GUI.
The phenix.ligand_identification command uses the RESOLVE ligand fitting methods described in the LigandFit documentation. It carries out this fitting process for a library of the 180 most frequently observed ligands in the Protein Data Bank, or for a custom library as described above, then scores and ranks the overall fitting results. By default, a real-space refinement is carried out on the ligand between RESOLVE fitting and Phenix scoring. The scoring algorithm takes into account the density correlation between the ligand and the map as well as non-bonded interactions between the fitted ligand and the input model. The output consists of a list of the best-fitted ligands from the library. The command provides options to view the top-ranked ligand in Coot with or without the electron density (use the keyword "--open_in_coot=True").
Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand
The phenix.ligand_identification command needs:
* default ligand library: no keyword needed
* to specify a list of ligands in 3-letter codes:
    --ligand_list="ATP CTP TTP GTP"
* to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:
    --ligand_dir=/my/compound/library/A1217321
* to use all ligands associated with proteins of a specific function found in the PDB as the search library:
    --function="tyrosine kinase"
* to use all ligands found in proteins with a specific Enzyme Classification number in the PDB:
    --EC=1.1.1.3
* to use all ligands found in proteins with a specific Pfam accession number in the PDB:
    --pfam=PF00042
* to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:
    --scop='DNA/RNA-binding 3-helical bundle'
    --cath='Trypsin-like serine proteases'
* to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:
    --GO=0009253
* to use all ligands found in proteins with a specific InterPro ID in the PDB:
    --ipro=014838
When you run the phenix.ligand_identification command, the output files are written to the directory from which you started Phenix:
overall_ligand_scores.log
topligand.txt
The last column, "Sequence in library", contains numbers ("###") giving the sequence number of the corresponding ligand in the library. The final fitted ligand coordinates and all the log files are in the corresponding "###" files described below.
ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb
ligand_fit_scores/[3-letter code].scores
resolve_map.mtz
display.com (uses ligid.scm, also in the same directory)
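The file naming pattern above can be used to pull out the fitted coordinates for one ligand of interest. The following is a minimal sketch, not part of Phenix: the helper name fitted_pdbs_for is hypothetical, and it assumes the shell is in the directory where the run was started.

```shell
# Hypothetical helper (not part of Phenix): list fitted-coordinate files
# for one ligand, given its 3-letter code.
# Assumes it is run from the directory where the job was started.
fitted_pdbs_for() {
    ligand_code="$1"
    # Naming pattern from the text: ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb
    for fitted in ligand_fit_pdbs/RSR_FITTED_"${ligand_code}"_*.pdb; do
        # When nothing matches, the shell leaves the pattern unexpanded; skip it.
        if [ -e "$fitted" ]; then
            printf '%s\n' "$fitted"
        fi
    done
}

# Example: show the fitted coordinates for ATP, if any exist here.
fitted_pdbs_for ATP
```

The matching per-ligand scores would then be expected in ligand_fit_scores/ATP.scores, following the second pattern listed above.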
The phenix.ligand_identification program has built-in multi-threaded processing capability. Use the keyword
--nproc=[number of threads]
to run the command in multi-threaded mode. In general, the processing speed is proportional to the number of threads used, up to the maximum number of free cores the system can allocate at run time. For example, on a dual-Xeon W5580 machine the nsf-d2 example takes about 25 minutes with 8 threads, compared with about 180 minutes for the same job as a single process.
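As a worked check of the timings quoted above, the speedup and parallel efficiency can be computed directly. This is only an arithmetic sketch; the function name report_speedup is hypothetical and the numbers are the approximate ones from the text.

```shell
# Hypothetical helper: report speedup and parallel efficiency.
# Arguments: single-process minutes, multi-threaded minutes, thread count.
report_speedup() {
    awk -v single="$1" -v threaded="$2" -v nthreads="$3" 'BEGIN {
        speedup = single / threaded
        printf "speedup: %.1fx, parallel efficiency: %.0f%%\n", speedup, 100 * speedup / nthreads
    }'
}

# Timings quoted for the nsf-d2 example: 180 min single, 25 min with 8 threads.
report_speedup 180 25 8
# prints: speedup: 7.2x, parallel efficiency: 90%
```

A 7.2-fold speedup on 8 threads is consistent with the roughly proportional scaling described above.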
You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a default one with:
phenix.ligand_identification --show_defaults > my_ligand.eff
and then you can just edit this file to match your needs and run it with:
phenix.ligand_identification my_ligand.eff
A standard run on the nsf-d2 example data:
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
    input_labels=F nproc=8
If you refine a model with a command such as
phenix.refine data.mtz partial.pdb
then you will end up with the refined model,
partial_refine_001.pdb
and a map coefficients file:
partial_refine_001_map_coeffs.mtz
You can then run ligand_identification using the 2Fo-Fc map calculated from these map coefficients:
phenix.ligand_identification mtz_type=diffmap \
    mtz_in=partial_refine_001_map_coeffs.mtz input_labels="2FOFCWT PH2FOFCWT" \
    model=partial_refine_001.pdb nproc=8
For the Fo-Fc map from the same file, use:
phenix.ligand_identification mtz_type=diffmap \
    mtz_in=partial_refine_001_map_coeffs.mtz input_labels="FOFCWT PHFOFCWT" \
    model=partial_refine_001.pdb nproc=8
In the above two cases, the "model" keyword is optional. If it is provided, non-bonded terms are used in scoring.
The examples below show various ways of specifying custom ligand libraries based on the 'standard run' example above.
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
    input_labels=F ligand_list="ATP GTP CTP TTP ADP GDP CDP TDP A3P GSP NAP AMP GMP CMP TMP"
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
    input_labels=F ligand_dir=/my/compound/library/A1217321
This command takes all .pdb files in the ligand_dir and makes a custom library from them to carry out the search.
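Because the custom library is built from the .pdb files in the directory, it can be worth confirming that the directory actually contains some before starting a long run. A minimal sketch follows; the helper name count_pdb_files is hypothetical and the path is the placeholder used in the example above.

```shell
# Hypothetical helper (not part of Phenix): count regular files ending in
# .pdb directly inside a directory.
count_pdb_files() {
    library_dir="$1"
    pdb_count=0
    for candidate in "$library_dir"/*.pdb; do
        # An unmatched pattern stays literal and fails the -f test.
        if [ -f "$candidate" ]; then
            pdb_count=$((pdb_count + 1))
        fi
    done
    echo "$pdb_count"
}

# Placeholder path taken from the example command.
lib=/my/compound/library/A1217321
echo "$(count_pdb_files "$lib") .pdb files found in $lib"
```

A count of zero would suggest a wrong path or files with a different extension.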
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
    input_labels=F function="Tyrosine kinase"
This command is useful when you want to use a function-specific ligand library. The 'function' value should match one of the EC terms. Note that in the above example, if only "kinase" is specified, ligands found in all types of kinases will be searched.
phenix.ligand_identification mtz_in=nsf-d2.mtz model=nsf-d2_noligand.pdb \
    input_labels=F EC=1.1.3
This command is useful when you want to use a function-specific ligand library and your protein belongs to an EC class. Note that the EC can be any parent set of the actual EC (e.g. you can use 1.1.3 instead of 1.1.3.1, although you will get a broader library with EC=1.1.3).
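The parent/child relationship between EC numbers can be illustrated with a simple prefix test. This sketch only shows the hierarchy itself; it is an assumption-based illustration and not how Phenix performs the lookup. The function name ec_covers is hypothetical.

```shell
# Hypothetical illustration of the EC hierarchy: a parent class covers an
# EC number if the two are equal or the number extends the parent by
# further dot-separated fields.
ec_covers() {
    parent_ec="$1"
    actual_ec="$2"
    case "$actual_ec" in
        "$parent_ec" | "$parent_ec".*) return 0 ;;
        *) return 1 ;;
    esac
}

# 1.1.3 is a parent of 1.1.3.1, so EC=1.1.3 gives the broader library.
if ec_covers 1.1.3 1.1.3.1; then
    echo "1.1.3 covers 1.1.3.1"
fi
```

Matching on whole dot-separated fields keeps 1.1.3 from being treated as a parent of 1.1.31.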
The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand
Ligand identification using electron-density map correlations. T.C. Terwilliger, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 63, 101-7 (2007).
Automated ligand fitting by core-fragment fitting and extension into density. T.C. Terwilliger, H. Klei, P.D. Adams, N.W. Moriarty, and J.D. Cohn. Acta Crystallogr D Biol Crystallogr 62, 915-22 (2006).
List of ligands in the PHENIX ligand_identification default library:
PG4 CRY CYS DIO DOX GOL MO5 NBN OXL OXM PUT PYR F3S MO6 PEG COA DTT FS4 HED HEZ LI1 MPD SF4 SIN TMN TRS URA BEN BEZ MET NIO PGA POP ASP DAO FSO FUC GLU LYS PEP PGE PHB PHQ PLM TAR XYS ADE AKG 7HP BGC CAM HC4 ORO DKA GAL GLC MAN MES ARG PHE CIT FLC MMA MPO MYR NHE OLA PG4 AMG FER NAA NAG NDG SPM EPE NGA PLP TRP BTN F6P FTT G6P LDA UPL 1PE BH4 H4B THM 1PG P6G U10 ADN BOG EST FBP GSH GTT NVP RET UMP C5P C8E DHT TMP UFP CMP NCN PRP BGC MAN GTS IMP LAT MAL SUC GLC AMP FPP PQQ T44 2GP 3GP 5GP TYD UDP GTX SAH TDP TPP IMO SAM A3P ADP DCP GDP NAG XYS 2PE CTP DAD FOK TTP DTP DGA FMN SAP ACP ANP APC ATP FOL GNP GSP GTP MTX MAN CB3 MA4 UPG UD1 SPO HEC HEM NAG NAD NAI ACR GLC NAP NDP DHE BPH ACO BCL FAD CAA GLC AP5 BPB B12