PHENIX Python-based Hierarchical ENvironment for Integrated Xtallography

structure_search

Overview

The structure_search tool quickly identifies structural and/or sequence homologs of an input PDB file in the Protein Data Bank. It uses the SARST algorithm, and a typical search against the whole PDB takes less than one second. Optionally, it also lists the ligands found in the PDB structures of those homologs.

Usage

  • Identify and superpose homologous PDB entries of mypdb.pdb
    phenix.structure_search mypdb.pdb
  • Obtain a list of homologs of mypdb.pdb and all ligands found in the structures of those homologs
    phenix.structure_search mypdb.pdb get_ligand=True
  • Use a local PDB mirror and obtain superposed homologs of mypdb.pdb
    phenix.structure_search mypdb.pdb PDB_MIRRORDIR=/path/to/pdb_mirror/top-level
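
When many searches are scripted, the invocations above can be assembled programmatically. The following is a minimal sketch that assumes only the keyword=value syntax shown in the usage examples. The helper name build_structure_search_command is hypothetical and not part of Phenix. It only builds the argument list and does not run the program.

```python
def build_structure_search_command(pdb_file, **keywords):
    """Return an argument list for phenix.structure_search.

    Hypothetical helper (not part of Phenix). Keywords are rendered in the
    keyword=value form used in the usage examples above.
    """
    command = ["phenix.structure_search", pdb_file]
    for name, value in keywords.items():
        # str(True) is "True", matching the get_ligand=True example.
        command.append(f"{name}={value}")
    return command


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(build_structure_search_command("mypdb.pdb", get_ligand=True)))
```

Passing the returned list to subprocess.run would execute the search, assuming phenix.structure_search is available on the PATH.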

More information can be found in the Input files and Output files sections below.

Input files

Required input:

  • pdb_file: the file containing the protein model of interest.
    By default, structure_search searches for both structure and sequence homologs of the input PDB file.

Optional inputs:

  • get_ligand: Set to True to obtain a list of ligands found in homologous PDB entries (default=False).
  • job_title: Title of the current job.
  • output_prefix: Prefix for output files, if needed.
  • get_pdb: Collect and superpose the top N homologous PDB entries (default=10).
  • coot_display: Display the superposed PDB files in Coot. The default is True when the E-value is below 1E-18 and False when it is above.
  • sequence_only: Perform only a sequence search against the PDB, using Phenix's internal database (default=False).
  • structure_only: Perform only a structure search against the PDB, using the SARST method.
  • PDB_MIRRORDIR: Use a local PDB mirror instead of the RCSB server. See the 'Using Local PDB mirror' section.
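
The E-value-dependent default of coot_display can be written as a one-line rule. This is only an illustrative sketch of the rule as stated above, not the program's actual code. The function name default_coot_display is hypothetical, and the text does not say which hit's E-value is used, so the value is passed in as a parameter.

```python
# Threshold quoted in the coot_display description above.
E_VALUE_CUTOFF = 1e-18


def default_coot_display(e_value, cutoff=E_VALUE_CUTOFF):
    """Sketch of the stated default: True below the cutoff, False otherwise.

    Assumption: an E-value exactly equal to the cutoff is treated as False,
    because the text does not say how that case is handled.
    """
    return e_value < cutoff


if __name__ == "__main__":
    for e_value in (1e-30, 1e-5):
        print(e_value, default_coot_display(e_value))
```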

Output files

In addition to the screen output, these files contain the results of structure_search:
  • output_(sequence/structure).txt: files containing sequence/structure homologs of 'pdb_file' sorted by scores.
  • MyBlast_(sequence/structure).log: Standard BLAST output with selected pairwise alignments. NOTE: for structure alignment, the 'sequences' are structure-based Ramachandran codes (see References), not one-letter amino acid codes.
  • pdb_ligand.txt (if get_ligand=True): file containing all ligands found in all homologs from this search.
  • superposed PDB files: Found in the TEMPPDB_## subdirectory reported in the program output.
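
After a run, the files listed above can be gathered with a short script. The sketch below makes two assumptions: that "(sequence/structure)" expands to the literal names output_sequence.txt, output_structure.txt, MyBlast_sequence.log and MyBlast_structure.log, and that output_prefix was not set. The function name collect_outputs is hypothetical.

```python
from pathlib import Path

# Assumed default file names (no output_prefix), read from the list above.
EXPECTED_FILES = [
    "output_sequence.txt",
    "output_structure.txt",
    "MyBlast_sequence.log",
    "MyBlast_structure.log",
    "pdb_ligand.txt",  # only written when get_ligand=True
]


def collect_outputs(run_dir):
    """Return (existing result files, TEMPPDB_* subdirectories) under run_dir."""
    run_dir = Path(run_dir)
    files = [run_dir / name for name in EXPECTED_FILES if (run_dir / name).is_file()]
    # Superposed PDB files live in a TEMPPDB_## subdirectory.
    superposed_dirs = sorted(p for p in run_dir.glob("TEMPPDB_*") if p.is_dir())
    return files, superposed_dirs


if __name__ == "__main__":
    found_files, found_dirs = collect_outputs(".")
    for path in found_files + found_dirs:
        print(path)
```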

Using Local PDB mirror

By default, the program retrieves the mmCIF files of homologous PDB entries from the RCSB server for downstream processing. Users may instead use a local PDB mirror if the environment variable "PDB_MIRRORDIR" is already defined in the shell running phenix.structure_search. Alternatively, users may define it on the command line or specify the path in the GUI. See more details below.
  • PDB_MIRRORDIR: Defines the top level of the local PDB mirror. The program tries to retrieve PDB mmCIF files from the local mirror as long as the path exists. Note that this assumes the directory tree under it follows that of the RCSB server, so the program will try to access $PDB_MIRRORDIR/data/structures/divided/mmCIF. The program falls back to the RCSB server if the path contains errors.
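
The lookup and fallback described above can be checked before starting a search. The following is a minimal sketch, not the program's actual logic. The function name resolve_mmcif_source is hypothetical, and giving an explicitly passed path precedence over the environment variable is an assumption.

```python
import os
from pathlib import Path

# Subdirectory the program expects under the mirror's top level.
MMCIF_SUBDIR = Path("data") / "structures" / "divided" / "mmCIF"


def resolve_mmcif_source(mirror_dir=None, environ=os.environ):
    """Return the local mmCIF directory, or None to fall back to the RCSB server.

    Assumption: an explicitly passed mirror_dir takes precedence over the
    PDB_MIRRORDIR environment variable.
    """
    top_level = mirror_dir or environ.get("PDB_MIRRORDIR")
    if not top_level:
        return None
    mmcif_dir = Path(top_level) / MMCIF_SUBDIR
    # A missing or malformed tree means the mirror cannot be used.
    return mmcif_dir if mmcif_dir.is_dir() else None


if __name__ == "__main__":
    source = resolve_mmcif_source()
    print(source if source is not None else "RCSB server (no usable local mirror)")
```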

References

Lo WC, Huang PJ, Chang CH, Lyu PC. BMC Bioinformatics. 2007, 8:307

List of all available keywords

{{phil:phenix.command_line.structure_search}}