Python-based Hierarchical ENvironment for Integrated Xtallography
Model editing with Sculptor
Author(s)
Purpose
Sculptor can be used to improve a molecular replacement model using additional information available from an alignment and/or structure.

Conventions
The following terms are used with the special meaning:
Usage
Sculptor can be run from the PHENIX GUI and the command line, the only difference being the way commands are taken from the user.

Input files
Command line
    phenix.sculptor \
      [ command-line switches ] \
      [ PHIL-format parameter files ] \
      [ PHIL command-line assignments ] \
      [ PDB-files ] \
      [ alignment files ]

Command-line switches:
    -h, --help        show this help message and exit
    --show-defaults   print PHIL and exit
    -i, --stdin       read PHIL from stdin as well
    -v, --verbosity   set verbosity level (DEBUG,INFO,WARNING,VERBOSE)
    --mode            set mode (flexible,predefined)

PHIL arguments:
Everything not starting with a dash ('-') is interpreted as a PHIL argument. This can be a PHIL-format file containing parameters, command-line assignment or a file whose type is automatically recognized (based on file extension). Note that sequence files are not accepted on the command line, since associated chains could not easily be guessed and require a fully specified parameter scope.

GUI
The graphical user interface makes all settings accessible either as part of the main window (for frequently used options) or through a series of dialog boxes under Settings for: Main-chain removal, Main-chain polishing, Sidechain pruning, B-factors, Renumbering and Renaming. The input PDB file is specified in the PDB file: input line, while alignments and target sequences can be added through the Sequence alignment files... and the Sequence files... dialog boxes, respectively.

Output files
In flexible mode, the fully processed structure is output. The file is named according to the following convention: root_pdb.pdb, where root is a user-defined parameter (accessible from the output scope), and pdb is the basename of the input PDB file. In predefined mode, there is an output file produced for each requested protocol, and named according to root_pdb_N.pdb, where N is the number of the corresponding protocol.

Description
The workflow consists of several stages that can be independently configured. These are listed in order of execution. For a summary of all keywords with the corresponding defaults, see the Additional information section.

Preprocessing
In addition, chains will be analysed, and solvent atoms will be separated from protein chains if they are not separated by TER cards.

Protein chains
Deletion
Discards residues from a model chain that are unlikely to improve signal in molecular replacement. This information is calculated from the alignment. There are multiple algorithms available:
These algorithms can also be used together in any combination. In this case, a residue will be deleted if it is assigned for deletion by any active algorithm.

Polishing
Makes small adjustments to the mainchain of a chain (taking results from deletion into account) to make it obey basic macromolecular features.
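As an illustration of the kind of adjustment polishing makes, the sketch below removes short unconnected segments left over after deletion, in the spirit of the remove_short algorithm and its minimum_length keyword (default 3) from the keyword list. The function name, the boolean-mask representation of a chain, and the exact rule (runs shorter than minimum_length are dropped) are assumptions made for this example and are not taken from Sculptor's implementation.

```python
def remove_short_segments(kept, minimum_length=3):
    """Return a copy of `kept` in which runs of retained residues
    shorter than `minimum_length` are switched off.

    `kept` holds one boolean per residue in chain order
    (True = residue survived the deletion stage). Hypothetical helper.
    """
    result = list(kept)
    start = None  # index where the current run of kept residues began
    # A sentinel False closes a run that extends to the end of the chain.
    for index, flag in enumerate(list(kept) + [False]):
        if flag and start is None:
            start = index
        elif not flag and start is not None:
            if index - start < minimum_length:
                for position in range(start, index):
                    result[position] = False
            start = None
    return result


mask = [True, True, False, True, True, True, True, False, True]
print(remove_short_segments(mask))
# [False, False, False, True, True, True, True, False, False]
```

Working on a mask rather than on coordinates keeps the example independent of any PDB-handling library.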
These algorithms can also be used together in combination. In this case, the chain will be processed sequentially by both algorithms.

Pruning
This phase determines the level (distance from the Calpha atom) up to which a residue sidechain in the model is potentially similar to its counterpart in the target.
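A minimal sketch of truncating a sidechain at a given level may make this concrete. It assumes that the level corresponds to the remoteness letter in standard PDB atom names (B = 1, G = 2, D = 3, ...), which is roughly the number of bonds from the Calpha atom, and it covers heavy atoms of standard amino acids only. All names in the code are hypothetical. The identity rule and the default pruning_level of 2 follow the help text of the schwarzenbacher scope in the keyword list.

```python
# Remoteness letters in PDB atom names (CB, CG1, OD2, NE, CZ, OH ...).
# Treating them as the pruning level is an assumption of this sketch.
REMOTENESS = {"A": 0, "B": 1, "G": 2, "D": 3, "E": 4, "Z": 5, "H": 6}
MAINCHAIN = {"N", "CA", "C", "O", "OXT"}
FULL_LENGTH = max(REMOTENESS.values())


def atom_level(name):
    """Level of a heavy atom: 0 for mainchain atoms, else its remoteness."""
    if name in MAINCHAIN:
        return 0
    return REMOTENESS[name[1]]


def prune_sidechain(atom_names, level):
    """Keep only atoms whose level does not exceed `level`."""
    return [name for name in atom_names if atom_level(name) <= level]


def schwarzenbacher_level(model_residue, target_residue, pruning_level=2):
    """Full sidechain if the residue types agree, else truncate."""
    if model_residue == target_residue:
        return FULL_LENGTH
    return pruning_level


leucine = ["N", "CA", "C", "O", "CB", "CG", "CD1", "CD2"]
print(prune_sidechain(leucine, schwarzenbacher_level("LEU", "ILE")))
# ['N', 'CA', 'C', 'O', 'CB', 'CG']
```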
These algorithms can also be used together in any combination, in which case the sidechain will be truncated to the shortest value suggested.

B-factor prediction
B-factor prediction tries to increase B-factors for atoms that are likely to be more flexible or more in error. The calculation takes simple physical properties into account, and these are linearly transformed to B-factors (controlled by the factor parameter of the corresponding scope). If any resulting B-factor is lower than the minimum parameter (from the bfactor scope), a constant is added to all B-factors so that the lowest of them equals minimum (this is primarily intended to avoid negative B-factors).
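The linear transformation and the minimum shift described above can be sketched as follows. The function name and the list-based input are hypothetical. The values factor = -100 (similarity scope) and minimum = 10 are the defaults from the keyword list. How Sculptor actually derives the per-atom property values is not shown here.

```python
def predict_bfactors(values, factor, minimum=10.0):
    """Scale property values to B-factors and lift them to `minimum`.

    `values` are per-atom (or per-residue) property values, for example
    sequence similarity scores. Hypothetical helper for illustration.
    """
    if not values:
        return []
    scaled = [factor * value for value in values]
    lowest = min(scaled)
    if lowest < minimum:
        # Add the same constant everywhere so the lowest value equals minimum.
        shift = minimum - lowest
        scaled = [bfactor + shift for bfactor in scaled]
    return scaled


similarity = [1.0, 0.5, 0.0, -0.5]
print(predict_bfactors(similarity, factor=-100, minimum=10))
# [10.0, 60.0, 110.0, 160.0]
```

With a negative factor, well-conserved positions (high similarity) end up with the lowest B-factors, and the shift keeps every value at or above minimum.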
Algorithms can be used in combination, in which case the sum of the predicted B-factors is used. This mode can also be used to map sequence similarity or accessible surface area to residues/atoms for display purposes.

Renumber
Renumbers residues according to the target or model sequence. It is also possible to turn renumbering off (option original).

Rename
Renames residues according to their counterpart in the target sequence. It also "morphs" the sidechain, i.e. renames atoms and deletes atoms that are not present in the target residue. It can also generate missing atoms if their positions are determined unambiguously by the atoms present (available via the completion parameter of the macromolecule scope).
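The Renumber step can be illustrated with a small sketch that derives new residue numbers for the model from two rows of an alignment. Several details are assumptions of this example rather than documented behaviour: the function name, numbering by position in the ungapped sequence, returning None for model residues aligned with a target gap, and the meaning given to use="model". The start argument mirrors the start keyword (default 1) from the keyword list.

```python
def renumber(target_aligned, model_aligned, use="target", start=1, gap="-"):
    """Return new residue numbers for the model residues, in chain order.

    With use="target" a model residue receives the number of the target
    residue it is aligned with (None if aligned with a gap); with
    use="model" the model residues are numbered consecutively.
    """
    if use not in ("target", "model"):
        raise ValueError("use must be 'target' or 'model'")
    if len(target_aligned) != len(model_aligned):
        raise ValueError("alignment rows must have equal length")
    numbers = []
    target_number = start - 1
    model_number = start - 1
    for target_char, model_char in zip(target_aligned, model_aligned):
        if target_char != gap:
            target_number += 1
        if model_char == gap:
            continue  # no model residue in this alignment column
        model_number += 1
        if use == "model":
            numbers.append(model_number)
        elif target_char != gap:
            numbers.append(target_number)
        else:
            numbers.append(None)
    return numbers


print(renumber("MKTAY-L", "M-TAYGL", use="target"))
# [1, 3, 4, 5, None, 6]
```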
Non-macromolecular chains
Residues in these chains are normally deleted, unless an exception is made by specifying the residue codes that are to be retained. This is primarily intended to keep known ligands of protein classes (e.g. HEM).

Sequence similarity calculation
Sequence similarity is calculated from the full alignment supplied (taking all present sequences into account), using a scoring matrix (currently blosum50, blosum62, dayhoff and identity are available). Raw scores are then averaged over a window of residues (defaults to 5 residues in both directions) that is weighted using either uniform or triangular weights. The resulting scores are "normalized" so that 1.0 would indicate a perfect alignment, 0.0 would be a random match, and -1.0 a (locally) fully gapped alignment (on average). This helps to ensure that defaults are sensible irrespective of the choice of scoring matrix. Note that it is possible to obtain values outside this range. Sequence similarity calculation is configured individually for the steps that are using it.

Specific limitations and possible problems
Processing features
Error messages
Warning messages
Literature
Citation
Improvement of molecular-replacement models with Sculptor. G. Bunkoczi and R. J. Read. Acta Cryst. D67, 303-312 (2011)

Additional information
List of all sculptor keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help

Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
hetero= None    Keep named hetero residues
min_hssp_length= 6    Length of residue segment that indicates a reliable match
min_matching_fraction= 0.4    Minimum matching fraction in residue-to-alignment matching
input    Input files
  model    Input pdb file
    file_name= None    PDB file name
    selection= all    Selection string
    remove_alternate_conformations= False    Remove alternate conformations
    sanitize_occupancies= False    Sets occupancies > 1.0 to 1.0
  alignment    Input alignment file
    file_name= None
    target_index= 1    Index of target sequence in alignment
  sequence    Input sequence file
    file_name= None    Sequence file
    chain_ids= None
output    Output options
  job_title= None    Job title in PHENIX GUI, not used on command line
  folder= .    Output file folder
  root= sculpt    Output file root
  format= *pdb coot    Output file format
macromolecule    Workflow step configuration
  completion= *cbeta    Sidechain completion algorithms (* = active)
  rename= True    True: enable; False: disable
  keep_ptm_if_base_residues_agree= False    Keep post-translational modification if residues agree
  deletion    Configure mainchain deletion
    use= completeness_based_similarity remove_long threshold_based_similarity *gap    Available algorithms (* = active)
    completeness_based_similarity    Delete residues based on sequence similarity to get same number of gaps as the Schwarzenbacher algorithm
      offset= 0.0    Completeness in fraction of model length (0.0 = completeness from Schwarzenbacher algorithm, useful range: +/-0.05)
      calculation    Configure sequence similarity calculation
        matrix= *blosum50 blosum62 dayhoff identity    Similarity matrix
        window= 5    Averaging window width
        weighting= *triangular uniform    Weighting scheme
    remove_long    Delete residue if aligned with gap
      min_length= 3    Minimum length for mainchain segment to remove
    threshold_based_similarity    Delete residue if sequence similarity is low
      threshold= -0.2    Threshold to accept a residue
      calculation    Configure sequence similarity calculation
        matrix= *blosum50 blosum62 dayhoff identity    Similarity matrix
        window= 5    Averaging window width
        weighting= *triangular uniform    Weighting scheme
    gap    Delete residue if aligned with gap
  polishing    Configure mainchain polishing
    use= remove_short keep_regular    Available algorithms (* = active)
    remove_short    Delete short unconnected segments
      minimum_length= 3    Minimum length
    keep_regular    Keep residues in secondary structure
      maximum_length= 1    Maximum length
  pruning    Configure sidechain pruning
    use= *schwarzenbacher similarity    Available algorithms (* = active)
    schwarzenbacher    Truncate atoms if target residue != source residue
      pruning_level= 2    Level of truncation
    similarity    Truncate atoms based on sequence similarity
      pruning_level= 2    Level of intermediate truncation
      full_length_limit= 0.2    Limit of no truncation
      full_truncation_limit= -0.2    Limit for full truncation
      calculation    Configure sequence similarity calculation
        matrix= *blosum50 blosum62 dayhoff identity    Similarity matrix
        window= 5    Averaging window width
        weighting= *triangular uniform    Weighting scheme
  bfactor    Configure bfactor prediction
    use= asa *original similarity    Available algorithms (* = active)
    minimum= 10    Minimum allowed value (a constant is added if any B-factors would fall below this value)
    asa    Use accessible surface area to predict new B-values
      factor= 2    Transform values by multiplying with a factor
      precision= 960    Number of points per atom
      probe_radius= 1.4    Radius for probing surface accessibility
    original    Use original bfactors to predict new B-values
      factor= 1    Transform values by multiplying with a factor
    similarity    Use sequence similarity to predict new B-values
      factor= -100    Transform values by multiplying with a factor
      calculation    Configure sequence similarity calculation
        matrix= *blosum50 blosum62 dayhoff identity    Similarity matrix
        window= 5    Averaging window width
        weighting= *triangular uniform    Weighting scheme
  renumber
    use= model *target original    Mainchain numbering; (* = selected; None: disable)
    start= 1    Number for first residue
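
The window and weighting keywords of the calculation scopes control the averaging described under Sequence similarity calculation. A minimal sketch of such a windowed average is given below. It assumes that per-position raw scores are already available as a list, that the triangular weights fall off linearly from the central residue, and that the window is simply cut off at the chain ends. None of these details are confirmed by this document, and the function names are hypothetical.

```python
def window_weights(window=5, weighting="triangular"):
    """Weights for offsets -window..+window around the central residue."""
    offsets = range(-window, window + 1)
    if weighting == "uniform":
        return [1.0 for _ in offsets]
    if weighting == "triangular":
        # Linear fall-off from the centre; the exact shape is an assumption.
        return [float(window + 1 - abs(offset)) for offset in offsets]
    raise ValueError("unknown weighting: %s" % weighting)


def smooth_scores(raw_scores, window=5, weighting="triangular"):
    """Weighted average of raw scores over `window` residues each way."""
    weights = window_weights(window, weighting)
    smoothed = []
    for centre in range(len(raw_scores)):
        total = 0.0
        weight_sum = 0.0
        for offset, weight in zip(range(-window, window + 1), weights):
            position = centre + offset
            # Positions beyond the chain ends are skipped and the
            # remaining weights are renormalised.
            if 0 <= position < len(raw_scores):
                total += weight * raw_scores[position]
                weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


raw = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(smooth_scores(raw, window=1, weighting="triangular"))
# [1.0, 1.0, 0.75, 0.25, 0.0, 0.0]
```

Triangular weighting lets the central residue dominate its own score, whereas uniform weighting spreads a local mismatch evenly over the whole window.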