structure_search
Overview
structure_search is a tool for quickly identifying structural and/or sequence homologs of an input PDB file in the Protein Data Bank. It uses the SARST algorithm (see References), and a search against the whole PDB typically takes less than one second. An option lets users obtain a list of the ligands found in the PDB structures of those homologs.
 
Usage
- Identify and superpose homologous pdbs of mypdb.pdb
- phenix.structure_search mypdb.pdb
 
- Obtain a list of homologs of mypdb.pdb and all ligands found in the structures of those homologs
- phenix.structure_search mypdb.pdb get_ligand=True
 
- Use a local PDB mirror and obtain superposed homologs of mypdb.pdb
- phenix.structure_search mypdb.pdb PDB_MIRRORDIR=/path/to/pdb_mirror/top-level
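When the tool is driven from a script, the command lines above can be assembled programmatically. The following is a minimal sketch, assuming only the keyword=value argument style shown in the examples; the helper name build_command is hypothetical and is not part of Phenix.

```python
import shlex


def build_command(pdb_file, **keywords):
    """Assemble a phenix.structure_search argument list (hypothetical helper).

    Keywords are rendered in the keyword=value form used in the examples above.
    """
    args = ["phenix.structure_search", pdb_file]
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args


# Same invocation as the ligand example above.
cmd = build_command("mypdb.pdb", get_ligand=True)
print(shlex.join(cmd))
```

On a machine where Phenix is installed, the resulting list could be passed to subprocess.run(cmd, check=True).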
 
More information can be found in the Output files section and the keyword list below.
 
Output files
- In addition to the screen output, these files contain the results of structure_search:
- output_(sequence/structure).txt: files containing sequence/structure homologs of 'pdb_file' sorted by scores.
- MyBlast_(sequence/structure).log: Standard BLAST output with selected pairwise alignments. NOTE: for structure alignment, the 'sequences' are structure-based Ramachandran codes (see References), not one-letter amino acid codes.
- pdb_ligand.txt (if get_ligand=True): file containing all ligands found in all homologs from this search.
- superposed PDB files: Can be found in TEMPPDB_## subdirectory as prompted in the program output.
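To check which of the files listed above a run has produced, a short script can look for them by name. This is a minimal sketch, assuming the default output_prefix of 'output' and assuming that "(sequence/structure)" expands to one file per search type; the helper names are hypothetical.

```python
from pathlib import Path


def find_superposed_dirs(run_dir="."):
    """Return the TEMPPDB_* subdirectories of run_dir, sorted by name."""
    return sorted(p for p in Path(run_dir).glob("TEMPPDB_*") if p.is_dir())


def list_result_files(run_dir="."):
    """Map each expected results file name to whether it exists in run_dir.

    The file names assume the default output_prefix ('output').
    """
    names = [
        "output_sequence.txt",
        "output_structure.txt",
        "MyBlast_sequence.log",
        "MyBlast_structure.log",
        "pdb_ligand.txt",  # only written when get_ligand=True
    ]
    return {name: (Path(run_dir) / name).exists() for name in names}
```

The program output remains the authoritative source for the name of the TEMPPDB_## subdirectory; the glob above only finds candidates.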
 
 
Using Local PDB mirror
- By default, the program retrieves the mmCIF files of homologous PDB entries from the RCSB server for downstream processing.  Users may instead use a local PDB mirror if the environment variable "PDB_MIRRORDIR" is already defined in the shell that runs phenix.structure_search.  Alternatively, users may define it on the command line or specify the path in the GUI. See more details below.
- PDB_MIRRORDIR: Defines the top level of the local PDB mirror.  The program will try to retrieve PDB mmCIF files from the local mirror if the path exists. Note that this assumes the directory tree under it follows that of the RCSB server; the program will try to access $PDB_MIRRORDIR/data/structures/divided/mmCIF. The program falls back to the RCSB server if the path contains errors.
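The mirror layout and the fallback described above can be illustrated with a short sketch. It assumes the two-character subdirectory taken from the second and third characters of the PDB id (as described in the keyword list) and a file name of the form "<id>.cif.gz", which is an assumption based on the usual RCSB layout rather than something stated here. The function names are hypothetical and do not reflect the program's actual implementation.

```python
import os
from pathlib import Path


def mmcif_mirror_path(pdb_id, mirror_dir):
    """Expected mmCIF location for pdb_id under a local mirror (sketch)."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError(f"expected a 4-character PDB id, got {pdb_id!r}")
    subdir = pdb_id[1:3]  # second and third characters of the id
    # The "<id>.cif.gz" file name is an assumption, not taken from this page.
    return (Path(mirror_dir) / "data" / "structures" / "divided" / "mmCIF"
            / subdir / f"{pdb_id}.cif.gz")


def choose_source(mirror_dir=None):
    """Return 'local' if the mirror's mmCIF tree exists, otherwise 'rcsb'."""
    mirror_dir = mirror_dir or os.environ.get("PDB_MIRRORDIR")
    if mirror_dir:
        tree = Path(mirror_dir) / "data" / "structures" / "divided" / "mmCIF"
        if tree.is_dir():
            return "local"
    return "rcsb"
```

For example, mmcif_mirror_path("1ABC", "/path/to/pdb_mirror/top-level") points into the "ab" subdirectory of the mmCIF tree.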
 
 
References
Lo WC, Huang PJ, Chang CH, Lyu PC. BMC Bioinformatics. 2007, 8:307
 
List of all available keywords
- structure_search
- pdb_file = None  Enter a PDB file name 
- sequence = None  Optional Fasta sequence file. Only needed for a quick sequence search against RCSB without a PDB. 
- output_prefix = 'output'  Provide an output prefix if needed 
- blastpath = None  Enter path to blastall executable 
- sequence_only = False  Do a BLAST search against the PDBaa sequence database instead of
        doing a Ramachandran-based structure search
- structure_only = False  Do only a Ramachandran-based structure search. 
- db_used = 'rcsb' structure database used in search. rcsb or scop95.
- get_ligand = False  Use get_ligand=True to retrieve ligands.
- get_ramacode_only = False  Generate Rama code for input pdb/cif only.
        This is for developers only.
- get_xml_only = False Get BLAST XML output returned as a string object.
        No coordinate superposition will be performed. Developers only.
- use_pdb100aa = False  Use PDB100 sequence database for sequence search.
- use_custom_db = False  Use custom database specified by custom_db_files/custom_db_dir.
- custom_db_dir = None  The directory of pdb/cif files to make custom database.
        Default is current directory
- custom_db_files = None  Filenames of the pdb/cif files, separated by spaces, for the database.
        If none are specified, all pdb/cif files in custom_db_dir will be collected
- atom_selection = 'all' Choose part of the pdb used in the search (default=all).
        for example: chain B, resseq 113:219, ... etc. 
- get_pdb = 10  get_pdb=N will collect and superpose the top N
         homologous pdbs.  Use get_pdb=0 to disable this option.
- deposited_before = 0  Specify the latest year of matching structures to be considered
        for scoring. PDBs deposited after this year will be discarded.
- deposited_after = 0  Specify the earliest year of matching structures to be considered
        for scoring. PDBs deposited before this year will be discarded.
- batch_size = 0  Process the pdbs in batches of <batch_size> until <min_match>
         hits are identified or until all <get_pdb> pdbs are processed
- min_match = 0  Finish structure_search when <min_match> matches are found.
        Usually used with <trim_ends> to exit the search once suitable pdbs are found.
- keep_all_pdb = False  Keep all the PDB files, including full PDB, PDB_Chain and
        superposed PDB_Chain.  Default is False which will keep only superposed
        PDB_Chain files in the directory specified in the output message.
- trim_ends = False  Remove terminal residues of hit pdbs extending beyond those
        of the target pdb.
- write_pdb = True  Set to False if no output pdb file is needed. Sometimes useful
        when using Structure_Search within another program and only pdb objects
        need to be passed.
- write_results = True  Set to False if no output results/log files are needed. Useful
        when calling Structure_Search within another program and only pdb objects
        need to be passed.
- trim_hit_pdb = False  Remove extra domains, extended loops, and unfit portions
        of hit pdbs after superposition onto the target pdb.
- pickle_hits = False  Pickle blast hit results from xml output.
- coot_display = False  (default) Display output pdb files in coot.
- ask_coot = True  Prompt for coot display options
- PDB_MIRRORDIR = None  Enter the top directory of the local RCSB PDB mirror.  The program
        will try to retrieve PDBs and/or structure factors from this mirror first.
        Note this assumes the directory trees under it follow those in RCSB --
        pdb files as 'pdb####.ent.gz' in the PDB_MIRRORDIR/data/structures/divided/pdb directory.
        If you use PDB's rsync script, this variable would be the same as the $MIRRORDIR set
        in the script.
- PDB_MIRROR_MMCIF = None  Enter the parent directory of the mmCIF files in the local PDB mirror.
        mmCIF files will be retrieved from subdirectory ##, where ## are the second and third characters
        in the PDB id.  This keyword should be the $PDB_MIRRORDIR/data/structures/divided/mmCIF directory.
        
- PDB_MIRROR_PDB = None  Enter the parent directory of the PDB files in the local PDB mirror.
        PDBs will be retrieved from subdirectory ##, where ## are the second and third characters
        in the PDB id.  This keyword should be the $PDB_MIRRORDIR/data/structures/divided/pdb directory.
        We recommend setting PDB_MIRRORDIR, which takes care of PDB_MIRROR_PDB and the
        other mirror keywords together.  However, users may choose to specify PDB_MIRROR_PDB
        directly.
- PDB_MIRROR_STRUCTURE_FACTORS = None  Enter the parent directory of the structure factor files in the local PDB mirror.
        Structure factors will be retrieved from subdirectory ##, where ## are the second
        and third characters in the PDB id.  This keyword should be the same as the
        $PDB_MIRRORDIR/data/structures/divided/structure_factors directory.
        We recommend setting PDB_MIRRORDIR, which takes care of both PDB_MIRROR_PDB and
        PDB_MIRROR_STRUCTURE_FACTORS together.  However, users may choose to specify
        PDB_MIRROR_STRUCTURE_FACTORS directly.
- local_pdb_dir = None  Enter the path directly to your local PDB repository.
- verbose = False verbose output
- debug = False debugging output
- job_title = None Job title in PHENIX GUI, not used on command line
- gui  GUI-specific parameter required for output directory
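Several keywords above describe selection logic rather than simple switches. The sketch below illustrates two of them: the deposition-year window (deposited_before, deposited_after) and the batch loop controlled by get_pdb, batch_size and min_match. It is an illustration of the documented behaviour, not the program's code. It assumes that a value of 0 means "no limit" for the year keywords, batch_size and min_match, and it uses a hypothetical is_match callback in place of whatever test the program applies to decide that a hit counts as a match.

```python
def in_year_window(year, deposited_before=0, deposited_after=0):
    """True if a deposition year passes both limits (0 is taken as 'no limit')."""
    if deposited_before and year > deposited_before:
        return False  # deposited after the latest allowed year
    if deposited_after and year < deposited_after:
        return False  # deposited before the earliest allowed year
    return True


def process_in_batches(hits, is_match, get_pdb=10, batch_size=0, min_match=0):
    """Walk the top get_pdb hits batch by batch, stopping once min_match is reached.

    is_match is a hypothetical callback; hits is any ordered sequence of hits.
    """
    candidates = list(hits)[:get_pdb]
    # Assumption: batch_size=0 means all candidates form a single batch.
    step = batch_size if batch_size > 0 else max(len(candidates), 1)
    matches = []
    for start in range(0, len(candidates), step):
        batch = candidates[start:start + step]
        matches.extend(h for h in batch if is_match(h))
        if min_match and len(matches) >= min_match:
            break  # enough matches found; skip the remaining batches
    return matches


# Hypothetical hits as (pdb_id, deposition_year) pairs, best score first.
hits = [("1abc", 2001), ("2def", 2010), ("3ghi", 2015), ("4jkl", 2020)]
recent = process_in_batches(
    hits,
    is_match=lambda h: in_year_window(h[1], deposited_after=2005),
    batch_size=2,
    min_match=1,
)
print(recent)
```

In this sketch each batch is processed in full before min_match is checked, so the result may contain more than min_match entries.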