PHENIX Python-based Hierarchical ENvironment for Integrated Xtallography

structure_search

Overview

structure_search quickly identifies structural homologs of an input PDB file in the Protein Data Bank. It uses the SARST algorithm, and a search against the whole PDB typically takes less than one second. Optionally, it can also list the ligands found in the structures of those homologs.

Usage

  • Obtain superposed PDB chains, sorted by similarity to mypdb.pdb:

    phenix.structure_search mypdb.pdb

  • Obtain a list of homologs of mypdb.pdb and all ligands found in the structures of those homologs:

    phenix.structure_search mypdb.pdb get_ligand=True

  • Use a local PDB mirror and obtain superposed homologs of mypdb.pdb:

    phenix.structure_search mypdb.pdb PDB_MIRRORDIR=/path/to/pdb_mirror/top-level
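Each command line above passes options as keyword=value pairs after the PDB file. For scripted use, the following minimal sketch assembles such a command from Python and, optionally, runs it. The helper names build_structure_search_command and run_structure_search are hypothetical (they are not part of PHENIX), and actually running a search assumes phenix.structure_search is on your PATH.

```python
import shlex
import subprocess


def build_structure_search_command(pdb_file, **keywords):
    """Assemble a phenix.structure_search argument list (hypothetical helper).

    Keywords become key=value pairs, as in the command lines above;
    Python booleans are rendered as True/False.
    """
    args = ["phenix.structure_search", pdb_file]
    for key, value in keywords.items():
        args.append(f"{key}={value}")
    return args


def run_structure_search(pdb_file, **keywords):
    """Run the search and return its screen output (requires PHENIX on PATH)."""
    args = build_structure_search_command(pdb_file, **keywords)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only builds and prints the command; nothing is executed here.
    cmd = build_structure_search_command("mypdb.pdb", get_ligand=True, get_pdb=5)
    print(shlex.join(cmd))  # phenix.structure_search mypdb.pdb get_ligand=True get_pdb=5
```

Passing an argument list to subprocess.run (rather than a single shell string) avoids quoting problems when paths such as PDB_MIRRORDIR contain spaces.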

More information can be found in the Input files and Output files sections below.

Input files

Required input:

  • pdb_file: the file containing the protein model of interest.

Optional inputs:

  • get_ligand: set to True to obtain a list of ligands found in the homologous PDB entries (default = False).

  • job_title: title of the current job (used by the PHENIX GUI).

  • output_prefix: prefix for the output files, if needed (default = 'output').

  • get_pdb: collect and superpose the top N homologous PDB entries (default = 10; get_pdb=0 disables this step).

  • coot_display: display the superposed PDB files in Coot. The default depends on the E-value: False if it is greater than 1E-18, True if it is smaller.

  • sequence_only: perform a BLAST sequence search against the PDB using PHENIX's internal database instead of a structure search. This option does not require a network connection.

  • PDB_MIRRORDIR: use a local PDB mirror instead of the RCSB server. See the 'Using Local PDB mirror' section.

  • PDB_MIRROR_PDB: set the path to the coordinate files of the local PDB mirror. See the 'Using Local PDB mirror' section.

  • PDB_MIRROR_STRUCTURE_FACTORS: set the path to the structure factor files of the local PDB mirror. See the 'Using Local PDB mirror' section.
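The coot_display default is stated only as a threshold rule on the E-value. The small sketch below spells out one reading of that rule. It assumes the rule is applied to the best (smallest) E-value among the hits, and it treats an E-value of exactly 1E-18 as False; neither detail is documented. Note also that the keyword list at the end of this page gives False as the default, so treat this as an illustration of the option description, not of PHENIX's code. The function name default_coot_display is hypothetical.

```python
E_VALUE_THRESHOLD = 1e-18  # threshold quoted in the coot_display description


def default_coot_display(hit_e_values, threshold=E_VALUE_THRESHOLD):
    """Return the coot_display default implied by the E-value rule (hypothetical).

    Assumption: the rule uses the best (smallest) E-value among the hits.
    """
    if not hit_e_values:
        return False  # no homologs found, nothing to display
    best = min(hit_e_values)
    # E-value above the threshold -> False, below -> True.
    # Behaviour at exactly the threshold is undocumented; treated as False here.
    return best < threshold
```

For example, a search whose best hit has an E-value of 1E-25 would open Coot by default, while one whose best hit is 1E-3 would not.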

Output files

In addition to the screen output, the following files contain the results of structure_search:
  • output.txt: homologs of 'pdb_file', sorted by score.

  • MyBlast.log: standard BLAST output with selected pairwise alignments. NOTE: for structure alignments, the 'sequences' are structure-based Ramachandran codes (see References), not one-letter amino acid codes.

  • pdb_ligand.txt (if get_ligand=True): all ligands found in the homologs from this search.

  • superposed PDB files: written to a TEMPPDB_## subdirectory, as reported in the program output.

Using Local PDB mirror

By default the program retrieves homologous PDB files from the RCSB server for downstream processing. Users may instead use a local PDB mirror if the environment variables PDB_MIRROR_PDB and PDB_MIRROR_STRUCTURE_FACTORS are already defined in the shell from which phenix.structure_search is run. Alternatively, users may set one or more of the following keywords on the command line:
  • PDB_MIRRORDIR: the top-level directory of the local PDB mirror. If the path exists, the program will try to retrieve PDB files and/or structure factors from the local mirror. This assumes the directory tree below it follows the RCSB layout, i.e. coordinate files named 'pdb####.ent.gz' in the PDB_MIRRORDIR/data/structures/divided/pdb directory. If you use the PDB's rsync script, this variable should be the same as $MIRRORDIR in that script.

  • PDB_MIRROR_PDB: alternatively, set the path to the PDB coordinate files directly. This keyword should point to the equivalent of the $PDB_MIRRORDIR/data/structures/divided/pdb directory.

  • PDB_MIRROR_STRUCTURE_FACTORS: same as PDB_MIRROR_PDB, but for structure factors.

We recommend setting PDB_MIRRORDIR, as it takes care of both PDB_MIRROR_PDB and PDB_MIRROR_STRUCTURE_FACTORS. However, users may specify PDB_MIRROR_PDB or PDB_MIRROR_STRUCTURE_FACTORS instead. The program falls back to the RCSB server if any of these paths is invalid.
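The mirror layout described above (coordinate files 'pdb####.ent.gz' under data/structures/divided/pdb, in a subdirectory named after the second and third characters of the PDB ID) can be resolved as in the following sketch. It is not PHENIX's implementation: the function name local_pdb_path is hypothetical, an explicit PDB_MIRROR_PDB-style directory is assumed to take priority over PDB_MIRRORDIR, and a None result stands in for "fall back to the RCSB server". Structure factors are not handled.

```python
import tempfile
from pathlib import Path


def local_pdb_path(pdb_id, mirror_dir=None, pdb_dir=None):
    """Return the local path of an entry's coordinate file, or None (hypothetical).

    mirror_dir plays the role of PDB_MIRRORDIR and pdb_dir that of PDB_MIRROR_PDB.
    None means no usable local file, i.e. fall back to the RCSB server.
    """
    # Assumption: an explicit coordinate directory takes priority over the top level.
    if pdb_dir is None and mirror_dir is not None:
        pdb_dir = Path(mirror_dir, "data", "structures", "divided", "pdb")
    if pdb_dir is None:
        return None  # no mirror configured
    pdb_id = pdb_id.lower()  # mirror file names use lowercase IDs
    # Divided layout: subdirectory ## = second and third characters of the ID.
    candidate = Path(pdb_dir, pdb_id[1:3], f"pdb{pdb_id}.ent.gz")
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Build a throwaway mirror containing a single (empty) entry, 1abc.
    with tempfile.TemporaryDirectory() as top:
        entry = Path(top, "data", "structures", "divided", "pdb", "ab", "pdb1abc.ent.gz")
        entry.parent.mkdir(parents=True)
        entry.touch()
        print(local_pdb_path("1ABC", mirror_dir=top))  # .../divided/pdb/ab/pdb1abc.ent.gz
        print(local_pdb_path("2xyz", mirror_dir=top))  # None: fall back to RCSB
```

Checking that the file exists, rather than only that the directory exists, mirrors the documented behaviour of falling back to the RCSB server when a path has errors.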

References

Lo WC, Huang PJ, Chang CH, Lyu PC. BMC Bioinformatics. 2007, 8:307

List of all available keywords

  • structure_search
    • pdb_file = None Enter a PDB file name
    • sequence = None Enter a FASTA sequence file to search for sequence homologs in the RCSB PDB when no PDB file is available.
    • output_prefix = 'output' Provide an output prefix if needed
    • blastpath = None Enter path to blastall executable
    • sequence_only = False Do a BLAST search against the PDBaa sequence database instead of a Ramachandran-code-based structure search
    • get_ligand = False Use get_ligand=True to retrieve ligands.
    • get_pdb = 10 get_pdb=N will collect and superpose the top N homologous pdbs. Use get_pdb=0 to disable this option.
    • keep_all_pdb = False Keep all the PDB files, including full PDB, PDB_Chain and superposed PDB_Chain files. The default (False) keeps only the superposed PDB_Chain files, in the directory specified in the output message.
    • coot_display = False (default) Display output pdb files in coot.
    • PDB_MIRRORDIR = None Enter the top directory of a local RCSB PDB mirror. The program will try to retrieve PDBs and/or structure factors from this mirror first. Note this assumes the directory tree under it follows the RCSB layout, with pdb files named 'pdb####.ent.gz' in the PDB_MIRRORDIR/data/structures/divided/pdb directory. If you use PDB's rsync script, this variable would be the same as the $MIRRORDIR set in the script.
    • PDB_MIRROR_PDB = None Enter the parent directory of the PDB files in the local PDB mirror. PDBs will be retrieved from subdirectory ##, where ## are the second and third letters of the PDB ID. This keyword should be the $PDB_MIRRORDIR/data/structures/divided/pdb directory. We recommend setting PDB_MIRRORDIR, which takes care of both PDB_MIRROR_PDB and PDB_MIRROR_STRUCTURE_FACTORS. However, users may choose to specify PDB_MIRROR_PDB directly.
    • PDB_MIRROR_STRUCTURE_FACTORS = None Enter the parent directory of the structure factor files in the local PDB mirror. Structure factors will be retrieved from subdirectory ##, where ## are the second and third letters of the PDB ID. This keyword should be the same as the $PDB_MIRRORDIR/data/structures/divided/structure_factors directory. We recommend setting PDB_MIRRORDIR, which takes care of both PDB_MIRROR_PDB and PDB_MIRROR_STRUCTURE_FACTORS. However, users may choose to specify PDB_MIRROR_STRUCTURE_FACTORS directly.
    • local_pdb_dir = None Enter the path directly to your local PDB repository.
    • verbose = False verbose output
    • debug = False debugging output
    • job_title = None Job title in PHENIX GUI, not used on command line
    • gui GUI-specific parameters required for the output directory
      • output_dir = None