Sculptor can be used to improve a molecular replacement model using additional information available from an alignment and/or structure. It is based on an algorithm outlined in Schwarzenbacher et al. (2004).
The following terms are used with a special meaning:
There is a standalone sculptor_new program available from the command line. The unified model preparation program sculpt_ensemble is available from the command line and also from the PHENIX GUI.
The fully processed structure is output. The file is named according to the following convention: root_pdb.pdb, where root is a user-defined parameter (accessible from the output scope), and pdb is the basename of the input PDB file.
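The naming convention can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that "basename" means the input file name with its directory and extension removed.

```python
import os


def sculptor_output_name(root, input_pdb_path):
    """Compose an output file name following the root_pdb.pdb convention.

    Assumption: "pdb" is the input file name without directory or extension.
    """
    base = os.path.splitext(os.path.basename(input_pdb_path))[0]
    return "{}_{}.pdb".format(root, base)
```

For example, with root set to sculpt and an input file /data/1abc.pdb, this sketch gives sculpt_1abc.pdb.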
The workflow consists of several stages that can be independently configured. These are listed in order of execution. For a summary of all keywords with the corresponding defaults, see the Additional information section.
CNS-style atom selection syntax. Default: all.
conformation for disordered entities, and discards the rest. Also involves sanitize_occupancies.
sanitize_occupancies: resets all occupancies to 1.0.
keep_crystal_symmetry: retain the CRYST1 record of the model structure.
In addition, chains will be analysed, and hetero, sugar and solvent atoms will be separated from protein/DNA/RNA chains if they are not already separated by TER cards.
Parameters in the chain_to_alignment_matching scope control how a sequence from an alignment is matched to the sequence of a macromolecule chain, and what constitutes an acceptable match. The sequence from the alignment is considered strictly consecutive, while gaps are allowed in the sequence derived from the protein chain. This is governed by the consecutivity parameter: geometry means that a chain segment is strictly consecutive if there is a bond to the neighbouring residue, and numbering means that residue numbering is used to decide whether a residue is connected to its neighbours. The min_sequence_identity parameter is the threshold for accepting a possible match between the two sequences.
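The two ideas in this paragraph, splitting a chain into consecutive segments by numbering and accepting a match above an identity threshold, can be sketched as follows. This is a simplified illustration and not Sculptor's implementation: the function names are hypothetical, and identity is assumed to be the fraction of identical residues among positions where neither sequence has a gap.

```python
def consecutive_segments(residue_numbers):
    """Split residue numbers into runs where each number follows the previous one.

    Mimics the idea behind consecutivity = numbering (an assumption about its exact rule).
    """
    segments = []
    for number in residue_numbers:
        if segments and number == segments[-1][-1] + 1:
            segments[-1].append(number)
        else:
            segments.append([number])
    return segments


def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over positions that are gap-free in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def is_acceptable_match(aligned_a, aligned_b, min_sequence_identity):
    """Accept the match if identity reaches the threshold."""
    return sequence_identity(aligned_a, aligned_b) >= min_sequence_identity
```

With residue numbers 1, 2, 3, 7, 8 the first function returns two segments, which is how a gap in the chain would show up under numbering-based consecutivity.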
These parameters control how input error estimates are matched to macromolecule chains. In addition, if no external errors are available, the modelling program Rosetta is installed and configured for use with PHENIX, and there is network connectivity, Sculptor can be instructed to calculate an error estimate (calculate_if_not_provided parameter and homology_modelling scope) by first making a homology model using Rosetta and then submitting it to the ProQ2 server. The raw results obtained will be written out if the output_prefix parameter is set. Please note that this only works for protein chains. For more information, see the simple_homology_model documentation.
Please note that the input/obtained error estimates are only used if a suitable processing method is selected. Such methods are available in the Deletion and B-factor prediction sections.
Discards residues from a model chain that are unlikely to improve signal in molecular replacement. This information is calculated either from the alignment or from estimated errors.
There are multiple algorithms available:
These algorithms can also be used together in any combination. In this case, a residue will be deleted if it is assigned for deletion by any active algorithm.
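The combination rule amounts to taking the union of the deletions proposed by each active algorithm. A minimal sketch, assuming each algorithm can be modelled as a function that returns the set of residues it wants deleted; the two example algorithms and all names are hypothetical stand-ins, not Sculptor's own:

```python
def combine_deletions(residue_ids, algorithms):
    """Delete a residue if any active algorithm marks it; return (kept, deleted)."""
    marked = set()
    for algorithm in algorithms:
        marked.update(algorithm(residue_ids))
    kept = [r for r in residue_ids if r not in marked]
    return kept, sorted(marked)


def delete_gap_aligned(gap_aligned):
    """Hypothetical algorithm: delete residues aligned to a gap in the target."""
    return lambda residue_ids: {r for r in residue_ids if r in gap_aligned}


def delete_high_error(errors, threshold):
    """Hypothetical algorithm: delete residues whose estimated error exceeds a threshold."""
    return lambda residue_ids: {r for r in residue_ids if errors.get(r, 0.0) > threshold}
```

For residues 1 to 5, with residue 2 aligned to a gap and residue 4 carrying a high error estimate, both residues are removed even though each algorithm flags only one of them.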
Makes small adjustments to the mainchain of a chain (taking results from deletion into account) to make it obey basic macromolecular features.
These algorithms can also be used together in combination. In this case, the chain will be processed sequentially by both algorithms.
This governs how the sidechain of an amino acid residue in the model is morphed into the target type.
This phase determines the level (the distance from the Calpha atom) up to which a residue sidechain in the model is potentially similar to its counterpart in the target.
These algorithms can also be used together in any combination, in which case the sidechain will be truncated to the shortest value suggested.
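Truncating to the shortest suggested value can be sketched as taking the minimum over the levels proposed by the active algorithms. The level numbering below (mainchain atoms at 0, CB at 1, and so on outwards) is an assumption made for illustration, as are the names used:

```python
# Hypothetical level assignment for a leucine residue: number of bonds from CA.
LEU_LEVELS = {"N": 0, "CA": 0, "C": 0, "O": 0, "CB": 1, "CG": 2, "CD1": 3, "CD2": 3}


def truncate_sidechain(atom_levels, suggested_levels):
    """Keep atoms whose level does not exceed the shortest suggested level."""
    keep_up_to = min(suggested_levels)
    return [name for name, level in atom_levels.items() if level <= keep_up_to]
```

If one algorithm suggests keeping the full leucine sidechain (level 3) and another suggests keeping only CB (level 1), the result is a residue truncated at CB.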
B-factor prediction tries to increase B-factors for atoms that are likely to be more flexible or more in error. The calculation takes simple physical properties into account, and these are linearly transformed to B-factors (controlled by the factor parameter of the corresponding scope). If the lowest resulting B-factor is below the minimum parameter (from the bfactor scope), a constant is added to all B-factors so that the lowest of them equals minimum (this is primarily intended to avoid negative B-factors).
Algorithms can be used in combination, in which case the sum of the predicted B-factors is used. This mode can also be used to map sequence similarity or accessible surface area to residues/atoms for display purposes.
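The arithmetic described above can be sketched as follows. This is a simplified reading of the text, not Sculptor's code: each algorithm is assumed to contribute factor times a per-atom property value, the contributions are summed, and the result is shifted upwards if its lowest value falls below minimum. Whether the original B-factors also enter the sum is not stated here, so they are left out.

```python
def predict_bfactors(properties_by_algorithm, factors, minimum):
    """Sum linearly scaled properties per atom, then shift so that min(B) >= minimum."""
    n_atoms = len(properties_by_algorithm[0])
    totals = [0.0] * n_atoms
    for properties, factor in zip(properties_by_algorithm, factors):
        for i, value in enumerate(properties):
            totals[i] += factor * value
    lowest = min(totals)
    if lowest < minimum:
        shift = minimum - lowest  # constant added to every atom
        totals = [b + shift for b in totals]
    return totals
```

As a worked example, property values (1, 2, 3) with factor 10 and (0, 1, 0) with factor -5 sum to (10, 15, 30). With minimum set to 20, a constant of 10 is added, giving (20, 25, 40).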
Renumbers residues according to the target or model sequence. It is also possible to turn renumbering off (option original).
Renames residues according to their counterpart in the target sequence. Please note that this is only a name change. Sidechain atoms are always mapped onto their target counterparts, and deleted if not present in the target. On the other hand, the addition of atoms that are present in the target and not in the model does not take place if renaming is not performed.
Controls the addition of missing atoms.
Discards residues from a model chain that are unlikely to improve signal in molecular replacement.
There are two algorithms available:
These algorithms can also be used together in any combination. In this case, a residue will be deleted if it is assigned for deletion by any active algorithm.
Makes small adjustments to the mainchain of a chain (taking results from deletion into account) to make it obey basic macromolecular features.
These algorithms can also be used together in combination. In this case, the chain will be processed sequentially by both algorithms.
This can be used to trim existing glycosyl chains based on the distance from the residue to which they are attached. Connectivity of glycosyl chains is worked out from distance tests, and the maximum_bond_length parameter can be used to adjust this slightly. Branched glycosyl chains are also handled.
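One way to picture this step is a breadth-first walk outwards from the attachment residue, linking residues whenever any pair of their atoms lies within the bond-length cutoff. The sketch below rests on several assumptions: "distance" is read as the number of links from the attachment residue, the default cutoff of 1.8 is an arbitrary illustrative value, and the function names and the max_depth argument are hypothetical.

```python
import math
from collections import deque


def linked(atoms_a, atoms_b, maximum_bond_length):
    """Distance test: two residues are bonded if any atom pair is close enough."""
    return any(
        math.dist(a, b) <= maximum_bond_length for a in atoms_a for b in atoms_b
    )


def trim_glycosyl_chain(protein_atoms, sugars, max_depth, maximum_bond_length=1.8):
    """Return names of sugars within max_depth links of the attachment residue.

    sugars maps a residue name to a list of (x, y, z) atom coordinates.
    """
    depth = {}
    queue = deque()
    # Sugars bonded directly to the protein residue are at depth 1.
    for name, atoms in sugars.items():
        if linked(protein_atoms, atoms, maximum_bond_length):
            depth[name] = 1
            queue.append(name)
    # Breadth-first search follows every branch of the glycosyl tree.
    while queue:
        current = queue.popleft()
        for name, atoms in sugars.items():
            if name not in depth and linked(sugars[current], atoms, maximum_bond_length):
                depth[name] = depth[current] + 1
                queue.append(name)
    return sorted(name for name, d in depth.items() if d <= max_depth)
```

Because the search visits every neighbour of each sugar, a branch point is handled in the same way as a linear chain: both branches receive the depth of their parent plus one.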
Residues in these chains are normally deleted, unless an exception is made by specifying the residue codes that are to be retained. This is primarily intended to keep known ligands of protein classes (e.g. HEM).
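The exception rule is a simple filter on residue codes. A minimal sketch, assuming residues are represented as (residue code, residue number) pairs and that codes are compared case-insensitively; both are choices made for this illustration:

```python
def filter_hetero_residues(residues, keep_codes=()):
    """Delete hetero residues unless their residue code is listed in keep_codes."""
    keep = {code.upper() for code in keep_codes}
    return [residue for residue in residues if residue[0].upper() in keep]
```

With no exceptions specified, every residue is removed; listing HEM retains the haem group and nothing else.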
These are removed from the model.
Sequence similarity is calculated from the full alignment supplied (taking all present sequences into account), using a scoring matrix (currently blosum50, blosum62, dayhoff and identity are available). Raw scores are then smoothed using one of two algorithms:
Sequence similarity calculation is configured individually for the steps that are using it.
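To make the idea of per-position scoring followed by smoothing concrete, the sketch below scores each column of an alignment with the identity matrix and then applies a plain moving average. Both choices are assumptions for illustration: the way Sculptor averages over sequences is not described here, and the moving average merely stands in for its smoothing algorithms.

```python
def column_similarity(alignment, target_index=0, gap="-"):
    """Identity score per target position, averaged over the other sequences."""
    target = alignment[target_index]
    scores = []
    for position, residue in enumerate(target):
        others = [seq[position] for i, seq in enumerate(alignment) if i != target_index]
        if residue == gap or not others:
            scores.append(0.0)
            continue
        matches = sum(1 for other in others if other == residue)
        scores.append(matches / len(others))
    return scores


def smooth(scores, half_window=1):
    """Moving average over a window of 2 * half_window + 1, shortened at the ends."""
    smoothed = []
    for i in range(len(scores)):
        window = scores[max(0, i - half_window): i + half_window + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For the three aligned sequences ACDE, ACDF and AGDE, with the first as target, the raw scores are 1.0, 0.5, 1.0 and 0.5; smoothing with a window of three pulls isolated high and low values towards their neighbours.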
Governs how missing error values are substituted.
Missing value substitution is configured individually for the steps that are using it.
phenix.sculptor \
    [ command-line switches ] \
    [ PHIL-format parameter files ] \
    [ PHIL command-line assignments ] \
    [ PDB-files ] \
    [ alignment files ]
-h, --help            show this help message and exit
--show-defaults       print PHIL and exit
-i, --stdin           read PHIL from stdin as well
-v, --verbosity       set verbosity level (info,debug,verbose)
--version             show program's version number and exit
--text-logfile FILE   Verbatim copy of log stream
--html-logfile FILE   Verbatim copy of log stream in HTML
Everything not starting with a dash ('-') is interpreted as a PHIL argument. This can be a PHIL-format file containing parameters, command-line assignment or a file whose type is automatically recognized (based on file extension). Note that sequence files are not accepted on the command line, since associated chains could not easily be guessed and require a fully specified parameter scope.
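The dispatch rule described above can be sketched as a small classifier. It is not the program's actual argument parser: the function name, the rule that an argument containing "=" is an assignment, and the extension table are all assumptions made for illustration.

```python
import os

# Hypothetical extension table; the extensions actually recognized may differ.
EXTENSIONS = {
    ".pdb": "pdb_file",
    ".aln": "alignment_file",
    ".phil": "phil_file",
    ".eff": "phil_file",
}


def classify_argument(argument):
    """Classify one command-line argument following the rules in the text."""
    if argument.startswith("-"):
        return "switch"
    if "=" in argument:
        return "phil_assignment"  # assumption: assignments look like name=value
    extension = os.path.splitext(argument)[1].lower()
    return EXTENSIONS.get(extension, "unrecognized")
```

Under these assumptions, model.pdb is treated as a PDB file and an argument such as output.root=sculpt as a PHIL assignment (the parameter path shown is illustrative only).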
[Schwarzenbacher2004] The importance of alignment accuracy for molecular replacement. R. Schwarzenbacher, A. Godzik, S. K. Grzechnik and L. Jaroszewski. Acta Cryst. D60, 1229-1236 (2004).
Improvement of molecular-replacement models with Sculptor. G. Bunkoczi and R. J. Read. Acta Cryst. D67, 303-312 (2011).