Ensembler can be used to superpose multiple chains to be used as an ensemble search model for molecular replacement.
Ensembler can be run from the PHENIX GUI and the command line, the only difference being the way commands are taken from the user.
The graphical user interface makes all settings accessible either as part of the main window (for frequently used options) or as a dialog box Ensemble generation settings.... Input files are specified either by the +/- button pair, or by drag-and-drop onto the window area. File types are automatically recognized, and added to the relevant input section.
The superposed chains can be written out either as a quasi multiple model PDB file that is readable by phaser directly (output style merged, output file name root_merged.pdb) or as a series of files containing each chain separately (output style separate, file name root_pdb_chain.pdb, where pdb is the name of the PDB file the chain was read from, and chain is the chain identifier). The file name root can be changed via the root parameter of the output (default: ensemble).
The workflow consists of several stages that can be independently configured. These are listed in order of execution, and control parameters are accessible from the Ensemble generation settings... dialog box. For a summary of all parameters with the corresponding defaults, see the Additional information section.
Establishes the equivalence of residues among the input chains. There are several options available:
Most common residue mapping mode is ssm. If no SSM alignment can be done or this is imprecise (e.g. no secondary structure), muscle is a good second choice.
Maps selected atoms within equivalent residues to each other. The mapping is done by name hence the order of atoms in the residue does not matter. If atoms are missing from certain residues (or if certain residues contain extra atoms), a gap will be filled where necessary. Atom selection is controlled by the atoms parameter of the configuration scope. Default atom selection: CA.
Equivalent positions are superposed iteratively to find a globally optimal solution. There are two superposition algorithms implemented, which primarily differ in how they handle gaps in equivalent positions.
Both algorithms use Diamond's formulation to solve the pairwise rotational superposition problem (Diamond, 1988).
An exception is raised if there are less than 3 sites present for superposition.
Multiple superposition is an iterative process and consists of a series of pairwise superpositions. The convergence criterion is controlled by the convergence parameter (in the superposition scope), which is the r.m.s. difference change between two consecutive iterations.
Automatic weighting can be used to improve superposition, either to amplify highly homologous regions or to decrease the effect of incorrect site-equivalence (typically arises because of a wrong alignment). Implemented weighting schemes are as follows:
unit - Unit weights (equivalent to no weighting).
robust_resistant - Robust-resistant weighting scheme (default). This tends to converge fast and give reliable results. The exact formula for weighting is as follows:
w = 1 - ( delta2 / tolerance2 )2 if delta < tolerance w = 0 (otherwise)
where delta is the deviation from the average. Tolerance is an empirical value, and its optimal value is close to (unweighted r.m.s.d.) 2 and can be controlled by the critical parameter of the robust_resistant scope.
Weighting is iterated with superposition until weights converge, which can be controlled by the convergence parameter of the weighting scope.
In case of highly dissimilar structures (or incorrect residue mapping), weight determination may temporarily need to be damped to avoid divergence. This is done automatically (in steps controlled by the incremental_damping_factor parameter of the weighting scope), until a preset value (controlled by the max_damping_factor parameter of the weighting scope) is reached, at which point an exception is raised.
Hierarchical cluster analysis is performed using the pairwise r.m.s. differences as a distance measure. The clustering parameter of the configuration score can be used to adjust cluster boundaries.
This option trims residues from the final superposed model where the unweighted r.m.s.d. is above a certain threshold (threshold parameter of the trimming scope). Useful in removing flexible loops, etc. Default: no trimming.
After superposition is complete, the chains can be sorted by sequence identity (identity), fraction of common sites wrt all aligned atom positions (overlap), weighted r.m.s.d. (wrmsd) or unweighted r.m.s.d. (unwrmsd). This is controlled by the sort parameter of the output scope. Default: input order (input).
phenix.ensembler \ [ command-line switches ] \ [ PHIL-format parameter files ] \ [ PHIL command-line assignments ] \ [ PDB-files ] \ [ alignment files ]
Command-line switches:
-h, --help show this help message and exit --show-defaults print PHIL and exit -i, --stdin read PHIL from stdin as well -v, --verbosity set verbosity level (DEBUG,INFO,WARNING,VERBOSE)
PHIL arguments:
Everything not starting with a dash('-') is interpreted as a PHIL argument. This can be a PHIL-format file containing parameters, command-line assignment or a file whose type is automatically recognized (based on file extension; structure files and alignment files are recognized automatically).
[Diamond1988] | A note on the rotational superposition problem R. Diamond Acta Cryst. A44, 211-216 (1988) |
[Diamond1992] | On the multiple simultaneous superposition of molecular structures by rigid body transformations R. Diamond Protein Science 1, 1279-1287 (1992) |
[WangSnoeyink2008] | Defining and computing optimum RMSD for gapped and weighted multiple-structure alignment X. Wang and J. Snoeyink EEE/ACM Transactions on Computational Biology and Bioinformatics 5, 525-533 (2008) |