Python-based Hierarchical ENvironment for Integrated Xtallography
Automated ligand identification
Author(s)
Purpose
Purpose of the phenix.ligand_identification command
The phenix.ligand_identification command fits each ligand in a library of the 180 ligands most frequently observed in the PDB to a given electron density map, then analyzes and ranks the fitting results. By default it uses the ligand library supplied with Phenix; it can also take a custom ligand library provided by the user.

Usage
The phenix.ligand_identification task can be run from the command line or from the PHENIX GUI.

How phenix.ligand_identification works:
The phenix.ligand_identification command uses the RESOLVE ligand-fitting methods described in the LigandFit documentation. It carries out this fitting process for every ligand in the library (the default library of the 180 ligands most frequently observed in the Protein Data Bank, or a custom library as described above), then scores and ranks the overall fitting results. By default, a real-space refinement of the fitted ligands is carried out between RESOLVE fitting and Phenix scoring. The scoring algorithm takes into account both the correlation between the fitted ligand and the electron density and the non-bonded interactions between the fitted ligand and the input model. The output is a list of the best-fitting ligands from the library. The command can also open the top-ranked ligand in Coot, with or without the electron density (use the keyword "--open_in_coot=True").

How to run phenix.ligand_identification
Example commands are provided below. The nsf-d2 files are in $PHENIX/phenix_examples/nsf-d2-ligand.

What phenix.ligand_identification needs to run:
The phenix.ligand_identification command needs:
(2) (optional) a PDB file with your protein model, without ligand

Choice of ligand library (keywords + examples)
* default ligand library: no keyword needed
* to search a specified list of ligands (3-letter codes separated by spaces):
  --ligand_list="ATP CTP TTP GTP"
* to use all small molecules with a .pdb extension in the 'ligand_dir' as the search library:
  --ligand_dir=/my/compound/library/A1217321
* to use all ligands associated with proteins of a specific function found in the PDB as the search library:
  --function="tyrosine kinase"
* to use all ligands found in proteins with a specific Enzyme Classification (EC) number in the PDB:
  --EC=1.1.1.3
* to use all ligands found in proteins with a specific Pfam accession number in the PDB:
  --pfam=PF00042
* to use all ligands found in proteins with specific SCOP and/or CATH terms in the PDB:
  --scop='DNA/RNA-binding 3-helical bundle' --cath='Trypsin-like serine proteases'
* to use all ligands found in proteins with a specific Gene Ontology (GO) accession number in the PDB:
  --GO=0009253
* to use all ligands found in proteins with a specific InterPro ID in the PDB:
  --ipro=014838

Output files from phenix.ligand_identification command
When you run the phenix.ligand_identification command, the output files are written to the directory from which you started Phenix:
overall_ligand_scores.log
topligand.txt
The last column, "Sequence in library", contains numbers ('###') giving the sequence number of the corresponding ligands. The final fitted ligand coordinates and all the log files are in the corresponding '###' files described below.
ligand_fit_pdbs/RSR_FITTED_[3-letter code]_###.pdb
ligand_fit_scores/[3-letter code].scores
resolve_map.mtz
display.com (uses ligid.scm, also in the same directory)

Multi-thread computing
The phenix.ligand_identification program has built-in multi-threaded processing. Use the keyword --nproc=[number of threads] to run the command in multi-threaded mode. In general, processing speed increases in proportion to the number of threads used, up to the maximum number of free cores the system can allocate at run time.
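Because extra threads stop helping once they exceed the cores actually available, a wrapper script might cap the value it passes to --nproc. The following is a minimal Python sketch of that idea; the helper names available_cpus and choose_nproc are hypothetical and not part of Phenix. It counts the CPUs this process is allowed to run on, which is an upper bound on free cores rather than a measurement of how many are currently idle.

```python
import os


def available_cpus() -> int:
    """CPUs this process may run on (an upper bound on free cores)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity masks set by schedulers or containers.
        return len(os.sched_getaffinity(0))
    # Fallback for other platforms; os.cpu_count() may return None.
    return os.cpu_count() or 1


def choose_nproc(requested: int) -> int:
    """Hypothetical helper: clamp a requested thread count to [1, available CPUs]."""
    return max(1, min(requested, available_cpus()))


if __name__ == "__main__":
    n = choose_nproc(8)
    print(f"phenix.ligand_identification --nproc={n}")
```

On a machine with at least 8 usable CPUs this prints --nproc=8; on a smaller machine it falls back to the number of CPUs available, where the speedup described above is expected to level off.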
For the nsf-d2 example, a run with 8 threads takes about 25 minutes, while a single process takes about 180 minutes for the same job on a dual-Xeon W5580 machine.

Running from a parameters file
You can run phenix.ligand_identification from a parameters file. This is often convenient because you can generate a file of default parameters with:
phenix.ligand_identification --show_defaults > my_ligand.eff
and then edit this file to match your needs and run it with:
phenix.ligand_identification my_ligand.eff

Examples
Sample command-line inputs
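As a sketch of how the library-selection keywords from the "Choice of ligand library" list combine on one command line, the Python helper below assembles an argument list suitable for subprocess. The helper name build_ligand_id_command and its validation against a fixed keyword set are illustrative assumptions, not part of Phenix. The keyword spellings are copied from the library-choice examples; the keyword list at the end of this page spells some of them differently (for example scop_fold and go), so check them against --show_defaults for your installation.

```python
from typing import Dict, List, Sequence

# Library-selection keywords exactly as spelled in the
# "Choice of ligand library" examples on this page.
LIBRARY_KEYWORDS = (
    "ligand_list", "ligand_dir", "function", "EC",
    "pfam", "scop", "cath", "GO", "ipro",
)


def build_ligand_id_command(library: Dict[str, str],
                            extra_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: build an argv list for phenix.ligand_identification.

    An empty `library` dict means "use the default library" (no keyword needed).
    `extra_args` (for example "--nproc=8") are appended unchanged.
    """
    cmd = ["phenix.ligand_identification"]
    for key, value in library.items():
        if key not in LIBRARY_KEYWORDS:
            raise ValueError(f"unknown library keyword: {key!r}")
        # One argv element per option, so values containing spaces
        # need no shell quoting when passed to subprocess.run(cmd).
        cmd.append(f"--{key}={value}")
    cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    print(build_ligand_id_command({"function": "tyrosine kinase"},
                                  extra_args=["--nproc=8"]))
```

Passing the resulting list to subprocess.run(cmd, check=True) avoids shell quoting entirely; the shell equivalent of the example is phenix.ligand_identification --function="tyrosine kinase" --nproc=8. How the MTZ and model files are supplied is left to extra_args here, since this sketch does not assume a particular input syntax.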
The nsf-d2 files in the above examples can be found in $PHENIX/phenix_examples/nsf-d2-ligand.

Possible Problems
Specific limitations and problems:
Literature
Additional information
List of ligands in the PHENIX ligand_identification default library
---------------------------------------------------------------
PG4 CRY CYS DIO DOX GOL MO5 NBN OXL OXM PUT PYR F3S MO6 PEG COA DTT FS4 HED HEZ LI1 MPD SF4 SIN TMN TRS URA BEN BEZ MET NIO PGA POP ASP DAO FSO FUC GLU LYS PEP PGE PHB PHQ PLM TAR XYS ADE AKG 7HP BGC CAM HC4 ORO DKA GAL GLC MAN MES ARG PHE CIT FLC MMA MPO MYR NHE OLA PG4 AMG FER NAA NAG NDG SPM EPE NGA PLP TRP BTN F6P FTT G6P LDA UPL 1PE BH4 H4B THM 1PG P6G U10 ADN BOG EST FBP GSH GTT NVP RET UMP C5P C8E DHT TMP UFP CMP NCN PRP BGC MAN GTS IMP LAT MAL SUC GLC AMP FPP PQQ T44 2GP 3GP 5GP TYD UDP GTX SAH TDP TPP IMO SAM A3P ADP DCP GDP NAG XYS 2PE CTP DAD FOK TTP DTP DGA FMN SAP ACP ANP APC ATP FOL GNP GSP GTP MTX MAN CB3 MA4 UPG UD1 SPO HEC HEM NAG NAD NAI ACR GLC NAP NDP DHE BPH ACO BCL FAD CAA GLC AP5 BPB B12

List of all ligand_identification keywords
-------------------------------------------------------------------------------
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python-style formatting descriptor
-------------------------------------------------------------------------------
ligand_identification
   mtz_in= None Enter an MTZ file name
   mtz_type= *F diffmap If the input is a precalculated difference map, use "D" instead
   input_labels= None Provide the label of F if mtz_type is F; provide F and PHI if the MTZ is a difference map
   ligand_list= None Enter the ligands to be searched, as 3-letter codes separated by spaces. For example, ligand_list="ATP CTP APN FMN". If there is no input or it cannot be interpreted, the default library will be used. See phenix.doc for the default ligand library
   ligand_dir= None Directory of your ligand library.
      Files in this directory with .pdb extensions will be used in the ligand search. If None, the default library will be used.
   work_dir= None Top-level directory where jobs will be run. Default is the directory where phenix.ligand_identification is started.
   EC= None Provide the EC number of your protein. The program will screen all ligands that interact with proteins in this class found in the PDB
   function= None Provide functional terms for your protein, for example 'kinase'. The program will then screen all ligands that interact with proteins of similar function found in the PDB. Note: function should belong to partial EC terms.
   scop_fold= None Provide fold information for your protein in SCOP terms, for example 'tim barrel'. The program will then screen all ligands that interact with proteins with TIM barrel folds in the PDB.
   cath= None Provide structural information for your protein in CATH terms, for example 'kinase'. The program will then screen all ligands that interact with proteins with kinase folds in the PDB.
   pfam= None Provide Pfam IDs for the protein(s), for example 'PF001232'. The program will then screen all ligands that interact with proteins in PF001232 in the PDB.
   go= None Provide Gene Ontology (GO) information for the protein in the crystal, for example '01234'. The program will then screen all ligands that interact with proteins with that GO number in the PDB.
   ipro= None Provide the InterPro ID of the protein, for example '01740'. The program will then screen all ligands that interact with proteins with the specified InterPro ID in the PDB.
   model= None Enter a PDB file containing the protein only.
   high_resolution= 2.5 Specify the high resolution to use. Default=2.5
   low_resolution= 1000 Low resolution
   restart_run= False Use (restart_run = True) to continue an unfinished run
   partial_analyze= False Analyze results before jobs complete. This function is currently not available in the command-line version.
   ncpu= 1 Number of CPUs to use.
   n_indiv_tries_min= 30 Usually 0 to 10, but set up to 300 to try harder to find a solution
   n_indiv_tries_max= 300 Usually 0 to 10, but set up to 300 to try harder to find a solution
   n_group_search= 4 Usually 3, but set up to 10 to try harder to find a solution
   search_dist= 10 Usually 10 Å; always at least 5. Smaller values speed up the search
   delta_phi_lig= 40 Usually 40-degree increments. Set lower to search more
   fit_phi_inc= 20 Usually 10 increments. Set lower to search more
   local_search= True Usually True; use False to force a complete search
   nproc= 1 Number of processors to use. This is used in RESOLVE internal parallelization
   verbose= False Verbose output
   debug= False Debugging output
   temp_dir= Auto Optional temporary work directory
   output_dir= None Output directory where files are to be written
   dry_run= False Just read in and check parameter names
   number_of_ligands= None Total number of ligand sites. Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
   cc_min= 0.50 Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first
   open_in_coot= False If True, Phenix will automatically start Coot after the run is complete.
   refine_ligand= True If True, real-space refinement will be applied to the top 20 ranked (or fewer) ligands.
   non_bonded= True If True, non-bonded terms will be applied to the scores if RSR is used and a model exists.
   keep_all_files= False If True, all intermediate PDB and log files will be kept.
   cif_def_file_list= None You can supply CIF files for real-space refinement, for example: cif_def_file_list='/my/cif/file1.cif /my/cif/file2.cif'
   real_space_target_weight= 10 You can change the weight on the real-space term in the real-space refinement of the ligand after fitting.
   job_title= None Job title in the PHENIX GUI; not used on the command line
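The number_of_ligands and cc_min entries describe a simple stopping rule for find_all_ligands. The Python sketch below restates that rule so the interaction of the two parameters is easy to check. fit_best_remaining is a hypothetical callback standing in for one round of fitting: given how many ligands have been placed so far, it returns the best ligand's name and correlation coefficient for the next site, or None when nothing is left. This illustrates the loop logic only and is not Phenix's implementation.

```python
from typing import Callable, List, Optional, Tuple

Fit = Tuple[str, float]  # (ligand name, correlation coefficient)


def find_all_ligands(fit_best_remaining: Callable[[int], Optional[Fit]],
                     cc_min: Optional[float] = 0.50,
                     number_of_ligands: Optional[int] = None) -> List[Fit]:
    """Place ligands until cc < cc_min or number_of_ligands are placed.

    Either limit is ignored when it is None, mirroring the parameter help.
    """
    placed: List[Fit] = []
    while True:
        # Stop once the requested number of sites has been filled.
        if number_of_ligands is not None and len(placed) >= number_of_ligands:
            break
        result = fit_best_remaining(len(placed))
        if result is None:  # hypothetical "nothing left to fit" signal
            break
        _, cc = result
        # Stop when the best remaining fit is below the correlation cutoff.
        if cc_min is not None and cc < cc_min:
            break
        placed.append(result)
    return placed


if __name__ == "__main__":
    ccs = [0.8, 0.7, 0.45, 0.9]  # made-up values for illustration

    def fake_fit(i: int) -> Optional[Fit]:
        return (f"SITE{i}", ccs[i]) if i < len(ccs) else None

    print(find_all_ligands(fake_fit, cc_min=0.50))
```

With the made-up values above, the default cc_min of 0.50 stops the search at the third site (cc 0.45), so two ligands are placed even though a later site would have scored higher; setting number_of_ligands=1 stops after the first placement, whichever limit is reached first.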