phenix.structural_domain_search

Contents

Algorithm
Usage
- Important parameters
- Output
References
Command-line options

Finds domains by calculating low cuts in the interaction graph and evaluating whether each cut would correspond to a viable domain decomposition.

Algorithm

Assuming that interactions within a domain are much stronger than those between domains, a domain boundary can be considered a low cut in a suitably constructed interaction graph.
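To make the notion of a low cut concrete, the following minimal sketch computes the weight of a cut, i.e. the summed strength of all interactions that cross a given partition. The toy graph, its vertex names and its weights are hypothetical and are not taken from the program; the weights only loosely echo the connection strengths listed under the command-line options.

```python
def cut_weight(edges, side_a):
    """Sum the weights of all edges with exactly one endpoint in side_a."""
    side_a = set(side_a)
    return sum(w for (u, v), w in edges.items() if (u in side_a) != (v in side_a))


# Hypothetical toy interaction graph: two tightly bound triples of chunks
# joined by a single weak contact.
edges = {
    ("a1", "a2"): 50.0, ("a2", "a3"): 50.0, ("a1", "a3"): 20.0,
    ("b1", "b2"): 50.0, ("b2", "b3"): 50.0, ("b1", "b3"): 20.0,
    ("a3", "b1"): 2.0,  # weak inter-domain contact
}

# Cutting between the two triples severs only the weak contact.
domain_cut = cut_weight(edges, {"a1", "a2", "a3"})

# Cutting a single chunk out of its triple severs strong contacts.
internal_cut = cut_weight(edges, {"a1"})

print(domain_cut, internal_cut)
```

The cut that separates the two triples is far cheaper than any cut through a triple, which is the property the search relies on.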

The structure is divided into chunks, and an interaction network is calculated. This is then turned into an interaction graph, and low cuts are enumerated. Each cut is then evaluated to determine whether it would correspond to viable domains, based on size, compactness and interface strength, and viable domains are stored. The procedure is repeated recursively until no viable cuts can be found.
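The recursive procedure can be illustrated with a simplified sketch. It is not the program's implementation: it takes only the single global minimum cut at each step (using a hand-written Stoer-Wagner routine) instead of enumerating several low cuts, and it replaces the size, compactness and interface criteria with two hypothetical thresholds, min_size (counted in chunks) and max_cut. The toy graph is likewise invented for illustration.

```python
def build_graph(edges):
    """Turn {(u, v): weight} into a symmetric adjacency dictionary."""
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def stoer_wagner(graph):
    """Return (weight, side) of a global minimum cut of a weighted graph."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # working copy
    merged = {u: [u] for u in g}  # original vertices folded into each vertex
    best_weight, best_side = float("inf"), []
    while len(g) > 1:
        # One phase: repeatedly add the most tightly connected vertex.
        start = next(iter(g))
        added = [start]
        conn = {v: g[start].get(v, 0.0) for v in g if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            added.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = added[-2], added[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, list(merged[t])
        # Merge the last vertex t into the second-to-last vertex s.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0.0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        merged[s].extend(merged[t])
        del g[t], merged[t]
    return best_weight, set(best_side)


def subgraph(graph, vertices):
    """Restrict the graph to the given vertices."""
    return {u: {v: w for v, w in graph[u].items() if v in vertices} for u in vertices}


def decompose(graph, min_size=3, max_cut=5.0):
    """Split recursively while the minimum cut is weak and both sides are large enough."""
    if len(graph) < 2 * min_size:
        return [set(graph)]
    weight, side = stoer_wagner(graph)
    rest = set(graph) - side
    if weight > max_cut or min(len(side), len(rest)) < min_size:
        return [set(graph)]  # no viable cut: keep as one domain
    return (decompose(subgraph(graph, side), min_size, max_cut)
            + decompose(subgraph(graph, rest), min_size, max_cut))


# Hypothetical chain with three compact groups of chunks and two weak links.
edges = {}
for name in "abc":
    edges[(name + "1", name + "2")] = 50.0
    edges[(name + "2", name + "3")] = 50.0
    edges[(name + "1", name + "3")] = 20.0
edges[("a3", "b1")] = 2.0
edges[("b3", "c1")] = 3.0

domains = decompose(build_graph(edges))
print(sorted(sorted(d) for d in domains))
```

On this toy input the first cut removes the weakest link (weight 2), and the recursion then splits the remaining six chunks at the second weak link (weight 3), giving three groups of three chunks each.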

Usage

The only required input to the program is a structure file (PDB or CIF format), which is then analysed. The domain-finding algorithm is run for each protein chain, and the results are written to the logfile and to PHIL-syntax .eff-files.

The simplest command line:

phenix.structural_domain_search example.pdb

Runtime depends on the structure size: typically about 1 min for structures of 200-300 residues, and about 10 min for structures of up to 1000 residues.

Important parameters

max_iterations: controls how many low cuts to enumerate. Increasing this number will make the search more exhaustive, but also slower.
sufficient_count: terminate the search early once the requested number of cuts has been found. It is probably a good idea to search for more than one, since the lowest cut does not always correspond to the highest domain quality score. The default value (30) is perhaps slightly conservative, and good results can be achieved with sufficient_count = 10. On the other hand, the underlying clustering algorithm gets very slow when more than 100 values need to be clustered, so this is a practical upper limit.
min_size: minimum size for an acceptable domain. Domains smaller than this will not be accepted. Default is 30.
prune_fraction: controls whether to simplify the interaction graph by joining weakly connected vertices. This can make the correct domain decomposition appear earlier and hence make the search more efficient, at a slight loss of cut precision. A typical prune_fraction is 0.05-0.15.
algorithm: the quick algorithm typically gives correct results, but for some tricky (especially multiple-domain) structures the thorough algorithm may find a better decomposition. Running times of up to 10 times that of the quick algorithm are not uncommon.
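As an illustration of how these parameters might be supplied, the sketch below assembles a command line from Python. It assumes that the program accepts the usual PHIL-style name=value overrides on the command line and that short parameter names such as sufficient_count are resolved without their full scope path; neither is stated on this page, so check the program's own help output. The helper build_command is hypothetical.

```python
import shlex


def build_command(structure_file, **overrides):
    """Assemble an argument list with name=value parameter overrides.

    Assumption: the program accepts PHIL-style name=value arguments.
    """
    args = ["phenix.structural_domain_search", structure_file]
    args += [f"{name}={value}" for name, value in overrides.items()]
    return args


cmd = build_command(
    "example.pdb",
    sufficient_count=10,   # fewer cuts than the default, as suggested above
    prune_fraction=0.1,    # within the typical 0.05-0.15 range
    algorithm="thorough",  # slower, for tricky multi-domain structures
)
print(shlex.join(cmd))

# To actually run it (requires a Phenix installation on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the structure file name contains spaces.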

Output

The program writes out the best domain decomposition for each chain, as a series of .eff-files containing the residue segments.

All considered decompositions are recorded in the logfile.

References

Efficient Algorithms for the Problems of Enumerating Cuts by Non-decreasing Weights. L.-P. Yeh, B.-F. Wang, and H.-H. Su. Algorithmica 56, 297-312 (2010).

A simple min-cut algorithm. M. Stoer, and F. Wagner. Journal of the ACM 44, 585-591 (1997).

An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. Y. Boykov, and V. Kolmogorov. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 1124-1137 (2004).

Protein domain decomposition using a graph-theoretic approach. Y. Xu, D. Xu, and H.N. Gabow. Bioinformatics 16, 1091-1104 (2000).

Command-line options

input: Input files
- structure = None Structure file
output: Output files
- root = "domain" Root for domain files
configuration: Calculation parameters
- interaction_strength: Parameters to calculate interaction strength between two residues
  - significance_threshold = 0.01 Discard interaction if below this level
  - prune_fraction = 0.00 Discard lowest connected segments
  - long_range_contact: Parameters for defining a long-range contact
    - max_spatial_distance = 4.0 Maximum distance for atom-atom contact
  - short_range_contact: Parameters controlling the strength of short-range contacts
    - default = 10.0 Strength of a coil-to-coil and mixed type connections
    - helix_to_helix_connection = 50.0 Strength of a helix-to-helix connection
    - strand_to_strand_connection = 50.0 Strength of a strand-to-strand connection
  - assembly_membership: Parameters rewarding membership in a higher-order assembly
    - sheet_bonus = 20.0 Extra strength if two strands are within the same sheet
- chunking: Parameters controlling chunking of segments
  - coil_max_size = 15 Maximum chunk size for a coil
  - helix_max_size = None Maximum chunk size for a helix
  - strand_max_size = None Maximum chunk size for a strand
- evaluation: Parameters controlling domain acceptance
  - min_size = 30 Minimum domain size in residues
  - min_compactness = 0.6 Minimum compactness (internal interaction strength to atoms ratio)
  - max_interface_strength = 0.4 Maximum interface strength to internal interaction strength ratio
- search: Parameters controlling domain search
  - algorithm = *quick thorough Algorithm to use
  - max_iterations = 500 Maximum cut iterations to perform
  - sufficient_count = 31 Quit search if sufficient number of domains has been located
  - cluster_radius = 20 Radius for identifying quasi-equivalent domains
  - max_residue_difference_for_equality = 10 Maximum residue differences between two domains
  - max_residue_overlap_for_compatibility = 5 Max number of common residues between two domains
  - strong_connection = 100 Do not separate multisegment domains that are at least as strongly connected
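The three evaluation options can be read as a simple acceptance test. The sketch below follows the option descriptions literally (compactness as internal interaction strength divided by atom count, interface strength as a fraction of internal interaction strength); the program's exact formulas are not given here, so this is an assumption, and the CandidateDomain class and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateDomain:
    n_residues: int
    n_atoms: int
    internal_strength: float   # summed interaction strength inside the domain
    interface_strength: float  # summed interaction strength across the cut


def is_viable(domain, min_size=30, min_compactness=0.6, max_interface_strength=0.4):
    """Apply the size, compactness and interface criteria in turn."""
    if domain.n_residues < min_size:
        return False
    if domain.n_atoms <= 0 or domain.internal_strength <= 0:
        return False  # avoid dividing by zero for degenerate candidates
    compactness = domain.internal_strength / domain.n_atoms
    if compactness < min_compactness:
        return False
    interface_ratio = domain.interface_strength / domain.internal_strength
    return interface_ratio <= max_interface_strength


candidates = {
    "compact, weak interface": CandidateDomain(120, 950, 800.0, 150.0),
    "too small": CandidateDomain(20, 160, 200.0, 10.0),
    "too loose": CandidateDomain(60, 480, 200.0, 30.0),
    "interface too strong": CandidateDomain(60, 480, 400.0, 250.0),
}
results = {label: is_viable(c) for label, c in candidates.items()}
for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Only the first candidate passes all three checks; each of the others fails exactly one criterion, which shows how the thresholds act independently.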