Homology modelling using simple_homology_model

Purpose

This program can be used for building a homology model for a target sequence using a template structure and an alignment.

Installation

simple_homology_model uses Rosetta, software developed in the Baker laboratory at the University of Washington. See the central installation notes for Rosetta.

Usage

simple_homology_model is available from the command line.

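Input files

The program takes a template structure (PDB file), an alignment file and, optionally, a sequence file for the target. If no sequence file is given, the first sequence from the alignment is used as the target.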

Output files

The homology model is written out as specified in output.model.

Description

The program first discards all residues from the template structure that are not present in the target and, for those that are present, morphs the residue type to match the target (this step is performed by Sculptor and is governed by the alignment). It then detects the missing segments and devises a suitable loop modelling setup, which is fed into Rosetta to do the actual building. For a summary of all keywords with the corresponding defaults, see the Additional information section.

Chain-to-alignment-matching

The chain_to_alignment_matching parameters control how chain-to-alignment matching is performed. For more information, see the Sculptor documentation.

Homology modelling setup

For each loop, the program backtracks min_residue_margin residues on both ends, then checks the distance between the bridge residues. If this is shorter than residue_distance * number-of-residues-in-between, it backtracks one more residue on each side, and keeps doing this until the criterion is satisfied or it hits a terminus on either end. After this, it evaluates the residue segments that are not assigned for loop modelling and will be kept fixed during the modelling step. Segments shorter than min_edge_segment_length are discarded at the termini, segments shorter than min_internal_segment_length are discarded internally, and the surrounding loops are joined. It then checks the loop lengths, and stops with an error message if a loop is longer than max_loop_length.
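
The segment filtering and loop length check described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's actual code: the function name plan_loops, the (start, end) index representation and the reading of "edge segment" as the first or last fixed segment are all hypothetical. The backtracking step is left out.

```python
def plan_loops(fixed_segments, chain_length,
               min_edge_segment_length=5,
               min_internal_segment_length=2,
               max_loop_length=60):
    """Drop short fixed segments, then derive the loops to be modelled.

    fixed_segments: sorted, non-overlapping (start, end) residue index
    pairs, 0-based and inclusive. Returns (kept_segments, loops).
    """
    kept = []
    last = len(fixed_segments) - 1
    for i, (start, end) in enumerate(fixed_segments):
        length = end - start + 1
        # Assumption: "edge" means the first or last fixed segment.
        is_edge = i == 0 or i == last
        minimum = (min_edge_segment_length if is_edge
                   else min_internal_segment_length)
        if length >= minimum:
            kept.append((start, end))

    # Loops are everything not covered by a kept segment, so dropping a
    # segment automatically joins the loops on either side of it.
    loops = []
    cursor = 0
    for start, end in kept:
        if start > cursor:
            loops.append((cursor, start - 1))
        cursor = end + 1
    if cursor < chain_length:
        loops.append((cursor, chain_length - 1))

    # Stop with an error if any loop exceeds the allowed length.
    for start, end in loops:
        if end - start + 1 > max_loop_length:
            raise ValueError(
                "%d-%d: loop longer than maximum allowed" % (start, end))
    return kept, loops
```

With the defaults and a 60-residue chain, plan_loops([(0, 2), (10, 30), (35, 35), (40, 59)], 60) keeps the segments (10, 30) and (40, 59) and returns the loops (0, 9) and (31, 39): the short edge segment and the single-residue internal segment are dropped and merged into their neighbouring loops.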

Homology modelling

The program runs Rosetta with a kinematic loop closure algorithm (rosetta_loop_closure), as this does not require selected fragments. After the loop is closed, Rosetta can be instructed to run a refinement (rosetta_loop_refinement).

Command line

phaser.simple_homology_model \
    [ command-line switches ] \
    [ PHIL-format parameter files ] \
    [ PHIL command-line assignments ] \
    [ PDB file ] \
    [ alignment file ] \
    [ sequence file ]

Command-line switches:

-h, --help            show this help message and exit
--show-defaults       print PHIL and exit
-i, --stdin           read PHIL from stdin as well

PHIL arguments:

Everything not starting with a dash ('-') is interpreted as a PHIL argument. This can be a PHIL-format file containing parameters, a command-line assignment, or a file whose type is recognized automatically from its extension (structure files and alignment files are recognized this way).
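
As a rough illustration of this rule, the Python sketch below sorts command-line arguments into switches and the different kinds of PHIL argument. It is not the program's own parser: the function name classify_argument and both extension sets are assumptions made for the example, and the extensions the program really recognizes may differ.

```python
import os

# Assumption: illustrative extension sets, not the program's actual lists.
STRUCTURE_EXTENSIONS = {".pdb", ".ent"}
ALIGNMENT_EXTENSIONS = {".aln", ".ali", ".pir"}


def classify_argument(arg):
    """Return a label describing how an argument would be interpreted."""
    if arg.startswith("-"):
        return "switch"
    # Check for an assignment first, so that a value such as
    # "output.model=model.pdb" is not mistaken for a structure file.
    if "=" in arg:
        return "PHIL assignment"
    extension = os.path.splitext(arg)[1].lower()
    if extension in STRUCTURE_EXTENSIONS:
        return "structure file"
    if extension in ALIGNMENT_EXTENSIONS:
        return "alignment file"
    # Assumption: anything else is treated as a PHIL parameter file.
    return "PHIL parameter file"


# Hypothetical file names, for illustration only.
for arg in ["--show-defaults", "template.pdb", "target.aln",
            "output.model=model.pdb", "params.phil"]:
    print(arg, "->", classify_argument(arg))
```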

Warning and error messages

Error messages

  • Structure file is missing: no PDB file specified.
  • Chain .. is not recognised as protein: the chain specified for modelling is not protein.
  • Alignment file is missing: no alignment file specified.
  • Could not match chain with alignment: chain-to-alignment matching was not successful (reason indicated in previous line).
  • ..: loop longer than maximum allowed: loop .. is longer than the maximum allowed.
  • Incomplete Rosetta environment setup: ..: the Rosetta environment is not set up.
  • Cannot locate loopmodel: ..: the loopmodel Rosetta executable is not found.
  • Cannot locate Rosetta database: the Rosetta database is not found.
  • Rosetta finished with error code ..: Rosetta exited with an error code.
  • No models produced: although Rosetta exited indicating success, no model was created.

Warnings

  • No sequence file provided: no sequence file was provided; the first sequence from the alignment will be used as the target.
  • Multiple models output - using only first: Rosetta generated multiple models, and only the first one will be used.

Additional information

List of all available keywords

  • input: Parameters controlling input
    • structure = None PDB file name
    • alignment = None Alignment file name
    • sequence = None Target sequence
  • output: Parameters controlling output
    • model = model.pdb Output model file name
  • chain_to_alignment_matching: Chain-to-alignment matching options
    • consecutivity = *geometry numbering Consecutivity criterion to detect chain breaks
    • min_hss_length = 3 Minimum length of a sequence fragment to be included in chain alignment
    • max_seed_hss_count = 12 Number of HSS to use in extensive search
    • max_completion_hss_count = 6 Number of HSS to use in gap filling
    • min_sequence_overlap = 10 Minimum overlap between sequences to perform full alignment
    • min_sequence_identity = 0.80 Minimum sequence identity of accepted chain alignment
  • homology_modelling: Parameters controlling homology modelling
    • min_residue_margin = 2 Minimum number of residues to cut back on both sides of loops
    • residue_distance = 2.5 Distance covered by a single residue (in Å)
    • min_edge_segment_length = 5 Discard edge segments if shorter (after considering gap margins)
    • min_internal_segment_length = 2 Discard internal segments if shorter (after considering gap margins)
    • max_loop_length = 60 Maximum loop length
    • rosetta_max_build_attempts = 1000 Maximum build attempts to close loop
    • rosetta_bump_overlap_factor = 0.1 Allows some atomic overlap in initial loop closures
    • rosetta_loop_closure = kic *ngk Algorithm for Rosetta loop closure
    • rosetta_loop_refinement = default quick test fast *no Algorithm for Rosetta loop refinement
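
Several keywords in this list (consecutivity, rosetta_loop_closure, rosetta_loop_refinement) take one value out of a fixed set, written as a space-separated list with an asterisk in front of one entry. Reading the asterisk as the marker for the default follows the usual PHIL convention for choice parameters. The short Python sketch below shows how such a definition can be read; parse_choice is a hypothetical helper written for this illustration and is not part of PHIL or the program.

```python
def parse_choice(definition):
    """Split a PHIL-style choice definition into (options, selected).

    Words prefixed with '*' are the selected (default) entries; the
    asterisk is stripped from the returned option names.
    """
    options = []
    selected = []
    for word in definition.split():
        if word.startswith("*"):
            word = word[1:]
            selected.append(word)
        options.append(word)
    return options, selected


options, selected = parse_choice("default quick test fast *no")
print(options)   # ['default', 'quick', 'test', 'fast', 'no']
print(selected)  # ['no']
```

Read this way, the defaults are geometry for consecutivity, ngk for rosetta_loop_closure and no for rosetta_loop_refinement.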