phenix.structural_domain_search
Finds domains by calculating low cuts in the interaction graph and evaluating
whether each cut would correspond to a viable domain decomposition.
Assuming that interactions within a domain are much stronger than those between
domains, a domain boundary can be considered a low cut within a suitably
constructed interaction graph.
The structure is divided up into chunks, and an interaction network is
calculated. This is then turned into an interaction graph, and low cuts are
enumerated. Each cut is then evaluated to see whether it would correspond to
viable domains, based on size, compactness and interface strength, and viable
domains are stored. The procedure is then repeated recursively until no viable
cuts can be found.
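The recursive procedure above can be illustrated with a small, self-contained sketch. This is not the program's implementation: it considers only the single lowest cut at each step (found with the Stoer-Wagner algorithm) instead of enumerating many low cuts, it ignores compactness, and the names add_edge, min_cut, subgraph, internal_strength and decompose, as well as the toy graph, are hypothetical.

```python
def add_edge(graph, u, v, weight):
    """Store an undirected weighted edge in a dict-of-dicts graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def min_cut(graph):
    """Stoer-Wagner global minimum cut: returns (weight, nodes on one side)."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # working copy
    members = {u: {u} for u in g}  # original nodes merged into each vertex
    best_weight, best_side = float("inf"), set()
    while len(g) > 1:
        # One phase: repeatedly add the most tightly connected vertex.
        start = next(iter(g))
        order = [start]
        conn = {v: g[start].get(v, 0.0) for v in g if v != start}
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            order.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = order[-2], order[-1]
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, set(members[t])
        # Merge the last vertex t into the second-to-last vertex s.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0.0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        del g[t]
        members[s] |= members.pop(t)
    return best_weight, best_side


def subgraph(graph, keep):
    """Restrict the graph to the nodes in `keep`, preserving node order."""
    return {u: {v: w for v, w in graph[u].items() if v in keep}
            for u in graph if u in keep}


def internal_strength(graph, part):
    """Sum of edge weights with both ends inside `part`."""
    return sum(w for u in part for v, w in graph[u].items() if v in part) / 2.0


def decompose(graph, sizes, min_size=30, max_interface_strength=0.4):
    """Split recursively at the lowest cut while both halves look viable."""
    nodes = set(graph)
    if len(nodes) < 2:
        return [nodes]
    cut, side = min_cut(graph)
    other = nodes - side
    for part in (side, other):
        if sum(sizes[chunk] for chunk in part) < min_size:
            return [nodes]  # one half would be too small
        internal = internal_strength(graph, part)
        if internal == 0 or cut / internal > max_interface_strength:
            return [nodes]  # interface too strong relative to the interior
    return (decompose(subgraph(graph, side), sizes, min_size, max_interface_strength)
            + decompose(subgraph(graph, other), sizes, min_size, max_interface_strength))


# Toy interaction graph: two tightly bound triples of chunks, weakly linked.
toy = {}
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]:
    add_edge(toy, u, v, 50.0)
add_edge(toy, "c", "d", 5.0)
sizes = {chunk: 20 for chunk in toy}  # assumed residues per chunk

domains = decompose(toy, sizes)
print(sorted(sorted(d) for d in domains))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The weak c-d edge is the lowest cut and is accepted. Further cuts inside either triple would isolate a single 20-residue chunk, which is below min_size, so the recursion stops there.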
The only required input to the program is a structure file (PDB/cif), which
will then be analysed. The domain finding algorithm is repeated for each
protein chain, and results are written to the logfile and also into
PHIL-syntax .eff-files.
The simplest command line:
phenix.structural_domain_search example.pdb
Runtime depends on the structure size: typically 1 min for structures of
200 to 300 residues, and 10 min for structures of up to 1000 residues.
- max_iterations: controls how many low cuts to enumerate. Increasing this
number will make the search more exhaustive, but also slower.
- sufficient_count: terminate the search early if the requested number of
cuts have been found. It is probably a good idea to search for more than one,
since the lowest cut does not always correspond to the highest domain quality
score, but the default value (30) is perhaps slightly conservative, and good
results can be achieved with sufficient_count = 10. On the other hand, the
underlying clustering algorithm gets very slow when more than 100 values need
to be clustered, and hence this is a practical upper limit.
- min_size: minimum size for an acceptable domain. Domains smaller than this
will not be accepted. Default is 30.
- prune_fraction: controls whether to simplify the interaction graph by
joining low-connected vertices. This can make the correct domain
decomposition appear earlier and hence make the search more efficient, at
a slight loss of cut precision. A typical prune_fraction is 0.05-0.15.
- algorithm: the quick algorithm typically gives correct results, but for
some tricky (especially multiple domain) structures the thorough algorithm
may find a better decomposition. Running times of up to 10 times that of the
quick algorithm are not uncommon.
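How max_iterations and sufficient_count interact can be sketched as a simple search loop. This is an illustration only, under the assumption that candidate cuts arrive in order of increasing weight; the function search, the callables is_viable and quality, and the example tuples are hypothetical and not part of the program.

```python
from itertools import islice


def search(cuts, is_viable, quality, max_iterations=500, sufficient_count=30):
    """Scan cuts in order; stop at the iteration budget or once enough
    viable candidates are collected, then return the best-scoring one."""
    viable = []
    for cut in islice(cuts, max_iterations):
        if is_viable(cut):
            viable.append(cut)
            if len(viable) >= sufficient_count:
                break  # early termination
    return max(viable, key=quality, default=None)


# Hypothetical candidates as (cut weight, quality score), lowest cut first.
candidates = [(1.0, 0.2), (1.5, 0.0), (2.0, 0.9), (3.0, 0.5)]


def viable(cut):
    return cut[1] > 0.0


def score(cut):
    return cut[1]


first_only = search(candidates, viable, score, sufficient_count=1)
several = search(candidates, viable, score, sufficient_count=3)
print(first_only, several)  # (1.0, 0.2) (2.0, 0.9)
```

Stopping at the first viable cut returns the lowest cut, while collecting three candidates finds a higher-scoring decomposition. This is the reason for searching for more than one cut.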
The program writes out the best domain decomposition for each chain, as a
series of .eff-files containing the residue segments.
All considered decompositions are recorded in the logfile.
Efficient Algorithms for the Problems of Enumerating Cuts by Non-decreasing Weights. L.-P. Yeh, B.-F. Wang, and H.-H. Su. Algorithmica 56, 297-312 (2010).
A simple min-cut algorithm. M. Stoer, and F. Wagner. Journal of the ACM 44, 585-591 (1997).
An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. Y. Boykov, and V. Kolmogorov. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 1124-1137 (2004).
Protein domain decomposition using a graph-theoretic approach. Y. Xu, D. Xu, and H.N. Gabow. Bioinformatics 16, 1091-1104 (2000).
- input: Input files
- structure = None Structure file
- output: Output files
- root = "domain" Root for domain files
- configuration: Calculation parameters
- interaction_strength: Parameters to calculate interaction strength between two residues
- significance_threshold = 0.01 Discard interaction if below this level
- prune_fraction = 0.00 Discard lowest connected segments
- long_range_contact: Parameters for defining a long-range contact
- max_spatial_distance = 4.0 Maximum distance for atom-atom contact
- short_range_contact: Parameters controlling the strength of short range contacts
- default = 10.0 Strength of a coil-to-coil and mixed type connections
- helix_to_helix_connection = 50.0 Strength of a helix-to-helix connection
- strand_to_strand_connection = 50.0 Strength of a strand-to-strand connection
- assembly_membership: Parameters rewarding membership in a higher order assembly
- sheet_bonus = 20.0 Extra strength if two strands are within the same sheet
- chunking: Parameters controlling chunking of segments
- coil_max_size = 15 Minimum chunksize for a coil
- helix_max_size = None Minimum chunksize for a helix
- strand_max_size = None Minimum chunksize for a strand
- evaluation: Parameters controlling domain acceptance
- min_size = 30 Minimum domain size in residues
- min_compactness = 0.6 Minimum compactness (internal interaction strength to atoms ratio)
- max_interface_strength = 0.4 Maximum interface strength to internal interaction strength ratio
- search: Parameters controlling domain search
- algorithm = *quick thorough Algorithm to use
- max_iterations = 500 Maximum cut iterations to perform
- sufficient_count = 31 Quit search if sufficient number of domains has been located
- cluster_radius = 20 Radius for identifying quasi-equivalent domains
- max_residue_difference_for_equality = 10 Maximum residue differences between two domains
- max_residue_overlap_for_compatibility = 5 Max number of common residues between two domains
- strong_connection = 100 Do not separate multisegment domains that are at least as strongly connected
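As a rough illustration of how the three evaluation thresholds listed above could combine, the following sketch applies them to a candidate domain. The exact definitions used by the program are not given here. The Candidate fields and the is_viable function are hypothetical, and the ratios simply follow the parameter descriptions: compactness is internal interaction strength divided by atom count, and the interface ratio is interface strength divided by internal interaction strength.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one proposed domain."""
    residues: int
    atoms: int
    internal_strength: float   # total interaction strength inside the domain
    interface_strength: float  # total interaction strength across the cut


def is_viable(c, min_size=30, min_compactness=0.6, max_interface_strength=0.4):
    """Apply the size, compactness and interface thresholds in turn."""
    if c.residues < min_size:
        return False
    if c.atoms == 0 or c.internal_strength <= 0:
        return False  # ratios below would be undefined
    if c.internal_strength / c.atoms < min_compactness:
        return False
    return c.interface_strength / c.internal_strength <= max_interface_strength


compact = Candidate(residues=120, atoms=900, internal_strength=720.0,
                    interface_strength=90.0)
too_small = Candidate(residues=20, atoms=150, internal_strength=200.0,
                      interface_strength=10.0)
sticky = Candidate(residues=120, atoms=900, internal_strength=720.0,
                   interface_strength=400.0)

print(is_viable(compact), is_viable(too_small), is_viable(sticky))
```

Here the first candidate passes all three tests (compactness 0.8, interface ratio 0.125), the second fails on size, and the third fails because its interface is too strong relative to its interior.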