Automated protein-ligand structure determination with phenix.ligand_pipeline
phenix.ligand_pipeline is an automation system combining Xtriage, Phaser, eLBOW,
phenix.refine, AutoBuild, and
LigandFit, with optional interaction with Coot. In favorable cases, it can produce a near-finished structure
of a protein-ligand complex starting from minimal inputs, and can significantly
reduce the manual effort required for more difficult structures.
Data preparation. The program begins by importing the experimental data
and converting it to amplitudes with standard MTZ column labels (this
simplifies later steps). R-free flags will be generated if not available,
otherwise the user-supplied flags will be used (extended to higher resolution
if necessary). Xtriage will be run to assess data quality and guess an
appropriate number of search copies for MR (which, by default, will be the
same as the number of ligands to look for).
Ligand generation. eLBOW will be run on the input ligand definition to
generate suitable geometry restraints and a starting structure for LigandFit.
The most important option is the choice of whether to run the semi-empirical
AM1 quantum-mechanical optimization or
not (optimize=True). This is not done by default because it is more
time-consuming, but in some cases it may lead to improved geometry. For
troublesome cases, we recommend generating the ligand restraints and structure
manually prior to running the pipeline, then supplying the output CIF as
input to the pipeline with the additional argument
keep_input_restraints=True. In this case, eBLOW will only be used to
generate a PDB file, and the input restraints will be used for refinement.
Model preparation. For each input model, a series of cleanup steps is
run to remove specific details such as alternate conformations, hydrogen atoms,
or waters. The model will also be processed
with Sculptor to ensure that all residues match the target
sequence. (This will not extend truncated sidechains, but this is available
as a later step after ligand placement.)
Model placement. Two options are available, rigid-body refinement or
molecular replacement:
- Molecular replacement. This uses the full MR_AUTO mode in Phaser, which
includes rigid-body and group B-factor refinement (although the B-factors
are not output). The number of search
copies will be determined automatically by Xtriage (copies=Auto), but
since this will occasionally guess too few copies, you can explicitly set
the search number instead (e.g. copies=2). If Phaser finds fewer copies
than expected, the number of expected ligands will automatically be adjusted.
Non-water heteroatoms will be preserved at their original occupancies.
If you want to preserve a specific placement or orientation of the MR
solution, you can specify a previously determined model by passing the
reference_structure parameter.
- Rigid-body refinement. If the input model is already close enough to
the target structure, rigid-body refinement alone will be suitable.
Currently MR is run by default, but this will be changed in the future; if
you are confident that the model can be correctly placed without MR, add
mr=False or skip_mr=True or --skip_mr to jump straight to
refinement. (This will automatically turn on rigid-body refinement in the
subsequent step.) A more useful (but slightly slower) option is mr=Auto,
which will attempt rigid-body refinement and then revert to running Phaser
if the R-free is greater than 0.4 or no models with appropriate symmetry are
found.
In automatic or MR mode, you may specify multiple models as input; the best
model will be selected either by Phaser or the rigid-body refinement step.
The space group will be updated based on the Phaser solution or selected model
if necessary.
Initial refinement. The MR solution (or starting model, if MR was not
performed) will be refined using a relatively conservative resolution-dependent
strategy, by default rotamer fitting,
real-space and reciprocal-space coordinate minimization, individual B-factor
refinement, and TLS groups. Torsion NCS will be used if appropriate.
Rigid-body refinement will be used if Phaser was not previously run. Water
molecules will be added if the resolution is sufficient, but these will be
removed prior to ligand fitting. Simulated annealing may be run
if desired (anneal=True), but this will significantly increase the
runtime. Because rapid convergence is more important at this point than
ideal geometry, weight optimization will not be used by default. (Water
picking is used because this improves the maps, but the solvent atoms will be
removed prior to ligand fitting to avoid overlap.)
If the refined R-free is above 0.5, the program will immediately exit, since
this indicates that the structure may be incorrect or at least in need of
extensive manual rebuilding (or possibly additional search copies for MR). If
R-free is above 0.3, the default behavior (build=Auto) is to rebuild the
model in AutoBuild. Otherwise, the structure is considered suitable for
ligand fitting.
Rebuilding. Rebuilding is optional but may be essential in some cases
where large rearrangements are required. AutoBuild will be run with
rebuild_in_place=Auto, which in most cases means that residues will not be
added or removed. If run, AutoBuild is generally the most time-consuming
stage in the pipeline, although it can take advantage of multiple processors.
An alternate protocol, run by default, is to simply delete those residues with
poor fit to density, with the goal of moving them out of the way of the ligand
binding site. Either the mainchain or sidechain atoms or both can be removed,
but the default is to only prune sidechains. These may optionally be rebuilt
after the ligand is placed.
As a final preparation step, waters are removed from the model and the
difference map is recalculated.
Ligand placement. Currently only LigandFit is supported, although
additional options will be available in the future. The target map will be
the mFo-DFc map from phenix.refine. The parameters have been adjusted slightly
for optimal performance at the expense of slightly longer runtime; you may
specify quick=True or aggressive=True to override the defaults.
Real-space refinement will be performed on ligands after placement to obtain
an optimized fit and local geometry. Unlike most of the other steps, here the
resolution will be truncated by default to 1.8Å (ligandfit.d_min=1.8),
because higher-resolution data tend to lead to lower correlations even when
the fit is excellent.
An essential feature of automatic ligand fitting is a cutoff model-to-map CC
to prevent false positives (i.e. spurious placement). By default this is set
to 0.7, which has been empirically determined to nearly always indicate a
successful fit. Any ligands with a CC below this value will be omitted from
the model. In some cases this may actually reject correct fits (see
Troubleshooting section below), but we recommend that these solutions be
viewed with suspicion. If no acceptable ligand fits are obtained, the
pipeline will skip further refinement and exit early with an error message.
In many cases the one or more ligands will be placed, but fewer than the
desired number; in these situations, a separate post-LigandFit routine will
first attempt to use NCS operators to place additional copies of the ligand,
pruning the model as necessary to avoid clashes. If there are still fewer
copies than desired, the pipeline will continue with refinement but print a
warning at the end of the run.
Final refinement and validation. If the ligand placement was at least
partially successful, a final round of refinement will be performed on the
combined model. In this run, waters will be added automatically if the
resolution is at least 2.9Å, and the geometry weight will be optimized if the
resolution is worse than 1.75Å (this is much slower but can run on multiple
processors, and generally produces a superior model).
For users desiring more control over the rebuilding process, or for difficult
cases where more aggressive model modifications are required, the program can
be run semi-interactively by adding the flag interactive=True (or
alternately, --interactive). This will launch Coot after the first
round of refinement, with model and maps loaded, allowing rebuilding to take
place. The recommended procedure is to locate the putative ligand density
in the mFo-DFc map, and adjust the protein structure as necessary to remove
overlap with the ligand binding site (as well as any other modifications
that will not be automatically fixed by refinement and/or AutoBuild). Once
rebuilding is complete, click the button "Return to PHENIX" to save the
edited model and resume the pipeline (the next step will be LigandFit, with
a freshly calculated mFo-DFc map). At any point during rebuilding, you can
obtain new maps for the edited model by clicking "Fetch new maps" on the
toolbar, which will write out a temporary model and recalculate maps in the
background, automatically loading these into Coot when finished.
Because the interactive mode overcomes many of the limitations present when
significant rebuilding is required, it is the reocmmended method for running
the pipeline, especially for new users.
All files will be written to a directory named pipeline_X, where X is
an automatically incremented integer. Most of the important output files will
be named after the ligand code, by default LIG:
- LIG_data.mtz: filtered data and R-free flags
- LIG_final.pdb: final output model
- LIG_final.mtz: final map coefficients (as well as data and flags)
- LIG_ligand.cif: ligand geometry restraints
- pipeline.log: log file for the entire run
- visualize_coot.py: script for loading results in Coot
The Coot script can be run like this:
coot --script pipeline_1/visualize_coot.py
This will open the final model and map coefficients, and if ligand fitting
was successful, a window will be created containing buttons to zoom in on
the ligand(s).
Inside the output directory, separate subdirectories will be created for each
stage of the pipeline, e.g. phaser_0, refine_0, refine_1,
LigandFit_run_1_. You do not normally need any of the files in these
directories, but they may be useful for debugging purposes. In particular,
if LigandFit failed to place one or more copies, the ligand PDB files will not
be incorporated into the final model, but will still be available in the
LigandFit output directory.
Although the pipeline can produce nearly-complete models in favorable cases
(typically at moderate-to-high resolution, approximately 2.2 to 1.8Å), the
structures are unlikely to be truly complete. Several recommendations for
finishing the refinement in Coot:
Inspect the ligand binding closely. A number of structures in the PDB
have been shown to exhibit ligand placements which are not supported by
the data, leading to several retractions so far. Although the CC cutoff
for ligand placement is designed to avoid this outcome, it is essential
that you visually confirm the refined ligand position. Of particular
concern:
- Does the difference map show residual positive or (especially) negative
peaks around the ligand binding site? Even if the overall position is
approximately correct, this may indicate problems with the specific
conformation and/or geometry; it is not uncommon, for example, for
nearly-symmetric rings to be fit backwards.
- Does the overall shape of the 2mFo-DFc map approximate that of the
ligand? If not, this could mean that the ligand has been mistakenly
placed in density for missing protein atoms, or for another unrelated
ligand (such as contaminants from the crystallization solution). Note
that it is quite possible for flexible ligands to be partially
disordered, but in such cases the majority of the ligand should still
be clearly defined in the map.
- Does the binding make biological sense? Pay special attention to
hydrogen bond donors and acceptors in both the ligand and the protein,
and especially the interactions of charged groups. Chemically
nonsensical binding usually indicates an incorrect fit (or, in extreme
cases, an incorrect ligand).
Look for additional ligands. It is quite common for protein structures
to co-crystallize with buffer molecules, ions, and endogenous contaminants.
These can be placed by running LigandFit, or interactively in Coot.
Check for errors or missing pieces in the protein model. Even after
refinement and/or rebuilding, some local fitting problems may be present
(especially at lower resolution). Additionally, it is very common for
pieces of the start model to be missing from the refined structure, or
vice-versa; this will usually be obvious from the maps. Alternate
conformations will normally be stripped out before molecular replacement,
so at high resolution (typically 1.6Å or better), these may need to be
added manually. You should also check the validation results to identify
geometry problems and misfitting.
If none of the above steps requires further adjustments, your model is
probably complete. Otherwise, if you make any changes, you should run a
final round of refinement with weight optimization to get the best possible
model, and a final set of maps which should also be inspected.
More generally, with any automation system, two rules apply: exercise
appropriate skepticism and seek expert opinion if unsure of the results.
(If a local expert is not available, email help@phenix-online.org for advice.)
- For maximum flexibility, no restriction is placed on the quality of the
starting model or its similarity to the target model; the Sculptor step will
take care of any necessary sequence deletions. However, in practice it is
vastly more effective when working with near-identical structures.
- When working with a set of related structures that differ from previously
solved crystal forms, it will probably be most productive to solve the first
one interactively, then use this as a starting point for the remaining
structures.
- Only one type of ligand is supported at present. Essential cofactors or
other molecules whose binding mode is already known may included in the
input model; in these cases, you should add keep_hetatms=True to instruct
Phaser to leave the occupancies unmodified. Any additional ligands will
need to be placed after the final refinement.
- Multiple stereoisomers will need to be screened independently; the program
does not currently have the ability to evaluate different chiral forms.
- Covalently bound ligands may be placed correctly, but the covalent linkage
will not be handled, and nonbonded forces may distort binding. Additionally,
the ligand parameterization may be based on the unbound molecule.
- Structures such as HIV protease which exhibit symmetric binding of inhibitors
will not be handled properly - and more generally, alternate conformations
for the ligand are not modeled. (The pipeline may be able to place a single
conformation, however.)
- The pipeline is designed to work at resolutions of 3.0A or better; while
it is possible to solve complexes at lower resolution, the ligand fitting
procedure is not expected to work as well.
- There is no high-resolution limit, but structures at atomic resolution will
inevitably require more manual effort to complete due to the presence of
alternate conformations and (often) additional small-molecule ligands, which
are not presently handled by Phenix.
- Building is currently limited to either rebuild-in-place mode or building
an entirely new model from scratch, and rebuilds the entire model. This
means that for structures where local rebuilding or addition of a few
residues is required, interactive mode will probably be more successful.
- Most of the optimization of program behavior and parameters has been focused
on structures bound to drug-like molecules; less attention has been given to
fragments. These tend to have weaker binding and partial occupancy, and are
therefore more difficult to distinguish in maps.
The pipeline may fail for a number of reasons, usually because LigandFit could
not successfully place any copies of the ligand. If this happens, the program
will skip the final refinement step and exit early with an error message. You
can still run the Coot script to load the output of the previous refinement
or AutoBuild, which may provide clues about the cause of failure. Several
common problems are listed below. Note that these may also apply to cases
where the pipeline runs to completion but was unable to place all copies of
the ligand successfully (in which case a warning message will be printed).
- The protein blocks the ligand site. This is one of the most common
problems, especially in structures which undergo extensive conformational
changes or local rearrangements, such as kinases. Even a single stray
sidechain may prevent LigandFit from placing the ligand correctly, both
due to flattened difference density, and steric clashes. For structures
with NCS, this can sometimes be overcome by the NCS application procedure
described above, but this still requires that at least one copy was placed
correctly. Running AutoBuild can sometimes help, but this also has the
risk of moving the protein into ligand density. The interactive mode is
strongly recommended for overcoming this problem.
- Spurious difference density for other atoms. If large regions of
positive mFo-DFc density corresponding to missing protein residues are
present, LigandFit may sometimes place the ligand in these sites instead of
the correct binding site. Again, interactive mode is the recommended
solution (and an optimal search model can help too). Note that unexpected
ordered buffer components may also lead to false positives; this is
especially problematic for fragments.
- Poor or incomplete difference density. In some cases, a particularly
flexible ligand may not fit entirely into difference density; this is not
uncommon or necessarily problematic, but it can reduce the CC below the
cutoff required to continue. More common is poor density quality for the
entire molecule; this can be caused by an incompletely refined model, low
resolution, or partial binding. In some cases, it means that the ligand
is not bound at all. While reducing the CC cutoff may allow some
structures to proceed, we recommend inspecting the density interactively,
both before and after ligand placement. Also note that maximum entropy
treatment of the difference map (maxent=True) may improve the fitting
in borderline cases.
- Poor ligand geometry. In some cases eLBOW is unable to automatically
determine the optimal geometry from the inputs. Running AM1 optimization
may help for structures where the initial geometry is partially correct.
However, in extreme cases the geometry will be too distorted for AM1 to
work, and additional information on geometry is required. For this reason,
we recommend supplying as rich a source of information as possible;
in particular, if geometry from a crystal structure of the target compound
is available, this should be included in the inputs.
Speeding up execution: the bulk of the runtime is spent refining the model;
the default protocol is very robust but not especially fast, and may be
unnecessarily intensive for easy cases. You can switch to a faster protocol
by specifying quick_refine=True, which will shorten both refinement steps
from 6 to 3 cycles, and disable weight optimization.
Hydrogen atoms: these will be added prior to the final refinement step.
Usually this will improve geometry somewhat (and also R-factors at high
resolution); however, it has the side effect of making refinement significantly
slower. Specfiy after_ligand.hydrogens=False to leave them off.
Rebuilding sidechains: sidechains truncated by a previous step (Sculptor or
the pruning step), or missing in the input model, may optionally be restored
after ligand placement by specifying extend_model=True.
Standalone tools: although most of the pipeline is built around
pre-existing programs in Phenix, several new tools have been developed for
this protocol and are also available as separate command-line programs:
- mmtbx.select_best_starting_model: given a data file and a collection
of PDB files, calculate R-factors and (optionally) run rigid-body
refinement to pick the best one for further refinement
- mmtbx.prune_model: remove protein atoms with poor fit to density.
- mmtbx.apply_ncs_to_ligand: place additional copies of the desired
ligand using NCS operators between protein chains.
- mmtbx.extend_sidechains: rebuild missing protein sidechain atoms and
perform quick real-space refinement.
- mmtbx.validate_ligand: check for possible fitting errors in a specific
ligand.
- phenix.sort_hetatms: group heteroatoms (ligands, water, etc.) with the
closest macromolecule(s) in the PDB file, similar to the procedure used at
PDB deposition.
- phenix.development.optimize_model: this is essentially the final
refinement step as a standalone program. It was originally developed as an
automated tool for re-refining structures in the PDB, and may be useful
both for preparing starting models and for additional refinement of
pipeline-built structures after manual fixes.
- Automating crystallographic structure solution and refinement of protein-ligand complexes. N. Echols, N.W. Moriarty, H.E. Klei, P.V. Afonine, G. Bunkóczi, J.J. Headd, A.J. McCoy, R.D. Oeffner, R.J. Read, T.C. Terwilliger, and P.D. Adams. Acta Crystallogr D Biol Crystallogr 70, 144-54 (2014).
- Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation. P.H. Zwart, R.W. Grosse-Kunstleve, and P.D. Adams. CCP4 Newsletter Winter, Contribution 7 (2005).
- Towards automated crystallographic structure refinement with phenix.refine. P.V. Afonine, R.W. Grosse-Kunstleve, N. Echols, J.J. Headd, N.W. Moriarty, M. Mustyakimov, T.C. Terwilliger, A. Urzhumtsev, P.H. Zwart, and P.D. Adams. Acta Crystallogr D Biol Crystallogr 68, 352-67 (2012).
- Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read, and P.D. Adams. Acta Cryst. D64, 61-69 (2008).
- MolProbity: all-atom structure validation for macromolecular crystallography. V.B. Chen, W.B. Arendall, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, L.W. Murray, J.S. Richardson, and D.C. Richardson. Acta Cryst. D66, 12-21 (2010).
- input_files
- seq_file = None Sequence of the crystallized protein. This should correspond to the search model, i.e. if the model contains two copies of the protein, the sequence file should as well.
- model = None Search model for MR (or starting model for refinement). Water molecules will be removed. Additional ligands may be present but their occupancy will be set to zero by Phaser unless you specify keep_hetatms=True.
- ligand = None Ligand description in chemical format (e.g. SMILES)
- ligand_smiles = None SMILES string for ligand
- ligand_pdb = None PDB file with ligand geometry and/or atom names
- ligand_code = None Chemical components code for ligand (if no file is provided), for molecules already present in the PDB.
- extra_cif = None Additional CIF file(s) for ligands already present in the structure.
- reference_structure = None If specified, phenix.find_alt_orig_sym_mate will be applied to map the solution to the reference structure. Primarily intended for testing, but also useful in cases where a common reference frame is desired.
- xray_data
- wavelength = None
- file_name = None
- labels = None
- high_resolution = None
- low_resolution = None
- outliers_rejection = True Remove basic wilson outliers , extreme wilson outliers , and beamstop shadow outliers
- french_wilson_scale = True
- sigma_fobs_rejection_criterion = None
- sigma_iobs_rejection_criterion = None
- ignore_all_zeros = True
- force_anomalous_flag_to_be_equal_to = None
- convert_to_non_anomalous_if_ratio_pairs_lone_less_than_threshold = 0.5
- phase_labels = None
- space_group = None
- unit_cell = None
- unmerged_data = None
- unmerged_labels = None
- french_wilson
- max_bins = 60 Maximum number of resolution bins
- min_bin_size = 40 Minimum number of reflections per bin
- r_free_flags
- file_name = None This is normally the same as the file containing Fobs and is usually selected automatically.
- label = None
- test_flag_value = None This value is usually selected automatically - do not change unless you really know what you're doing!
- ignore_r_free_flags = False Use all reflections in refinement (work and test)
- disable_suitability_test = False
- ignore_pdb_hexdigest = False If True, disables safety check based on MD5 hexdigests stored in PDB files produced by previous runs.
- generate = False Generate R-free flags (if not available in input files)
- fraction = 0.1
- max_free = 2000
- lattice_symmetry_max_delta = 5
- use_lattice_symmetry = True
- use_dataman_shells = False Used to avoid biasing of the test set by certain types of non-crystallographic symmetry.
- n_shells = 20
- reference_data
- file_name = None
- labels = None
- output
- prefix = None
- directory = None
- make_subdir = True If True, the program will create a subdirectory for the current run.
- directory_number = None
- verbose = False
- job_title = None Job title in PHENIX GUI, not used on command line
- coot_absolute_paths = True
- export_for_web = False
- runtime
- nproc = 1 Number of processors to use for parallelized steps (including LigandFit, AutoBuild, and phenix.refine with weight optimization). If set to Auto, the program will use as many CPUs as are available, minus the current load average.
- ligand_copies = None Number of copies of the ligand to place. By default, this is the same as the number of copies as the MR search model.
- skip_xtriage = False
- clean_models = True Run various cleanup on input model file(s) before MR or refinement.
- mr = True If True, molecular replacement will be run to place the model. If Auto, the program will try rigid-body refinement first, then run Phaser if the R-free is too high. Otherwise, rigid-body refinement will be used.
- skip_mr = False Alias for mr=False (deprecated)
- prune = Auto Prune the model after refinement to remove residues and sidechains in poor density.
- build = Auto Run AutoBuild after initial refinement. By default, this will be done if R-free is greater than the max_r_free cutoff.
- skip_ligand = False
- refine_after_fitting = True Run phenix.refine a final time after ligand placement. If not True, new maps will be calculated instead.
- refine_ligand_occupancy = False Refine group occupancy for each placed ligand. Recommended for fragments.
- mr_program = *Auto phaser mrage Program to use for molecular replacement. Will use Phaser by default if a PDB file is included in input, otherwise will use phaser.mrage (sequence required).
- method = *ligandfit Program to use for ligand placement. Currently only LigandFit is supported, but other programs will be added in the future.
- fill_maps = True
- extend_model = False If enabled, incomplete sidechains will be rebuilt and fit to density if possible after ligand placement.
- validate = True
- sort_hetatms = True Rearrange non-polymer heteroatoms (waters and ligands) to have the same chain ID and asymmetric unit as the nearest polymer chain.
- run_command = None
- ncs_for_ligands = True Use NCS operators to place additional ligands if LigandFit was not able to find all copies with acceptable CC.
- interactive = False Run program interactively with manual editing in Coot between refinement (or autobuilding) and ligand-fitting steps. Requires that Coot be installed and present on the path.
- stop_if_r_free_greater_than = 0.5 Cutoff for continuing the program after initial round of refinement - usually an R-free above 0.5 indicates that the model contains serious defects which cannot be automatically fixed (or is simply wrong altogether). Set to None to disable this check.
- auto_resolution = False If True, and unmerged intensities are included as input, the merging statistics will be used to determine appropriate resolution cutoffs for different steps in the pipeline (starting with a conservative cutoff for the initial refinement, and extending to higher resolution for later steps).
- xtriage
- min_i_over_sigma = None
- min_cc_one_half = None
- max_r_merge = None
- max_r_meas = None
- min_completeness = None
- prepare_modelOptions for preparing input model(s) for MR or refinement
- sculpt = True Run Sculptor to homogenize sidechains before MR or refinement
- remove_waters = True Remove all water molecules (HOH)
- remove_hydrogens = True Remove explicit hydrogen atoms
- remove_alt_confs = True Remove alternate conformations
- convert_semet_to_met = True Change MSE residues to MET
- convert_to_isotropic = True Convert atoms to anisotropic
- reset_occupancies = True Set occupancies to 1.0
- remove_ligands = False Remove all ligands
- reset_hetatm_flag = False Change HETATM records to ATOM
- sort_first = True
- select_modelModel selection options
- d_min = None
- rigid_body_refine = True
- optimize_b_factors = False
- max_cell_angle_rmsd = 1.0
- max_cell_edge_rmsd = 1.0
- max_r_free = 0.4
- phaser
- d_min = 2.0 Resolution limit for running Phaser.
- copies = Auto Number of copies to search for. If Auto, Xtriage will be used to guess based on the expected solvent content.
- identity = None Sequence identity of the search model to the target structure. If None and rmsd is also None, perfect sequence identity will be assumed.
- rmsd = None RMSD of the search model to the target structure. If not specified, the sequence identity will be used instead.
- sg_alt = none hand *all Choice of alternate space groups to try.
- ignore_packing = False
- pack_cutoff = 10
- keep_hetatms = True If True, HETATMs will not have their occupancies reset to zero.
- force_accept_composition = False
- tncs = True
- prune_model
- resolution_factor = 1/4. Map grid spacing (multiplied times d_min).
- sidechains = True Remove poor sidechains
- mainchain = False Remove entire residues in poor density
- min_backbone_2fofc = 0.8 Minimum 2mFo-DFc sigma level at C-alpha to keep. Residues with C-alpha in density below this cutoff will be deleted.
- min_backbone_fofc = -3.0 Maximum mFo-DFc sigma level at C-alpha to keep. Residues with C-alpha in difference density below this cutoff will be deleted.
- min_sidechain_2fofc = 0.6 Minimum mean 2mFo-DFc sigma level for sidechain atoms to keep. Residues with sidechains below this level will be truncated.
- max_sidechain_fofc = -2.8 Maximum mean 2mFo-DFc sigma level for sidechain atoms to keep. Residues with sidechains below this level will be truncated.
- min_cc = 0.7 Minimum overall CC for entire residue to keep.
- min_cc_sidechain = 0.6 Minimum overall CC for sidechains to keep.
- min_fragment_size = 3 Minimum fragment size to keep. Fragments smaller than this will be deleted in the final step (based on the assumption that the adjacent residues were already removed). Set this to None to prevent fragment filtering.
- check_cgamma = True Check for poor density at the C-gamma atom for long sidechains. Useful in cases where the terminal atoms may have been misfit into nearby density.
- autobuild
- d_min = None Resolution for running AutoBuild
- max_r_free = 0.3 R-free cutoff for deciding whether to run AutoBuild when build=Auto.
- max_frac_missing = 0.05
- quick = False Run AutoBuild in quick mode. Inferior results, but a huge time-saver.
- rebuild_in_place = Auto Controls whether AutoBuild will only rebuild the existing model, without adding or removing atoms, or build an entirely new model.
- include_input_model = True Specifies whether to incorporate parts of the input model when building a new model (i.e. rebuild_in_place=False).
- phil_file = None Parameters file for AutoBuild.
- refine
- quick_refine = False Use a shorter refinement protocol: fewer cycles and no weight optimization. Suitable for isomorphous structures.
- d_min = None
- ncs_type = *Auto torsion cartesian None
- twin_law = Auto
- phil_file = None File containing additional parameters for phenix.refine.
- after_mr
- cycles = 6 Number of macrocycles of refinement to run.
- optimize_weights = False Perform weight optimization (geometry and ADP restraints).
- sites = True Refine with individual_sites strategy.
- real_space = Auto
- rigid_body = Auto Run rigid-body refinement. By default this is only run if MR was not used to place the model.
- adp_type = *Auto aniso iso
- tls = Auto
- update_waters = Auto Add/remove waters automatically. These will not interfere with ligand placement, since the fitting programs ignore them.
- anneal = False Run simulated annealing. May increase radius of convergence, especially for rough initial models, but significantly adds to runtime.
- ready_set = False Run ready_set to generate additional parameter files for phenix.refine. Hydrogens are not added at this stage.
- after_ligand
- optimize_weights = Auto Perform a grid search to optimize the XYZ and ADP restraint weights. Not usually necessary if d_min <= 1.75.
- hydrogens = True Add hydrogens to model with phenix.ready_set. If Auto, the decision will be made based on resolution, the presence of hydrogens in the input model, or a mention of riding hydrogens in the PDB header.
- cycles = 6 Number of macrocycles.
- tls = Auto Refine TLS groups.
- adp_type = *Auto aniso iso B-factor parameterization
- waters = Auto Run ordered solvent update - normally resolution-dependent.
- ions = None List of element symbols for ion picking (separated by commas or spaces).
- outlier_rejection = True
- elbow
- ligand_id = Auto Three-letter residue code to use in output PDB file. If not specified, will default to LIG .
- optimize = False Run AM1 quantum-mechanical optimization on ligand geometry.
- initial_geometry = None Starting geometry for AM1 optimization.
- final_geometry = None Final geometry for output ligand
- keep_input_restraints = True If True, and the input files include pre-calculated restraints for the target ligand, eLBOW will propagate these restraints instead of generating new ones.
- template_pdb = None PDB file containing atom names to use for ligand. If not specified (and not present in the input ligand file), the atom names will be generated automatically.
- ligandfit
- d_min = 1.5 Resolution cutoff for ligand fitting. For atomic resolution datasets, LigandFit usually works better at slightly lower resolution.
- map_type = *mFo-DFc fo_minus_fo Type of map to use for ligand fitting. The default is to use the standard mFo-DFc difference map, but if a reference dataset is supplied, an isomorphous difference map may be used instead.
- quick = False Run LigandFit in quick (single-job) mode. Not recommended.
- aggressive = False Adjusts the LigandFit parameters to search for more conformations, at the expense of increased runtime.
- min_cc = None
- conformers = 20
- n_group_search = 5
- ligand_near_residue = None
- ligand_near_chain = None
- fixed_ligand = False
- min_ligand_cc_keep = 0.7 Minimum CC to consider a ligand placement correct. Ligands with at least this CC will be incorporated into the current model for refinement.
- min_overall_cc_stop = 0.75 Minimum overall CC (for all ligands) to consider the structure complete.
- maxent = False
- first_only = False If True, LigandFit will place a single ligand, and the remaining copies will be added based solely on NCS operators.
- fobs_minus_fobs
- multiscale = False
- d_min = None
- max_r_factor = 0.2 Maximum allowed R-factor for comparison of the target and reference datasets.
- relative_length_tolerance = 0.01 Tolerance for deviation in unit cell lengths between target and reference datasets.
- absolute_angle_tolerance = 1. Tolerance for deviation in unit cell angles between target and reference datasets.
- ligand_ncsParameters for placing missing ligands using NCS operators between protein chains.
- d_min = 2.5
- max_rmsd = 2.0
- min_cc = 0.7
- min_cc_reference = 0.85
- min_2fofc = 1.0
- min_dist_center = 5
- remove_clashing_atoms = True
- clash_cutoff = 2.0
- write_sampled_pdbs = False
- extension
- selection = None
- build_hydrogens = Auto
- max_atoms_missing = None
- use_rotamers = True
- anneal_residues = False
- skip_rsr = False
- output_model = None
- output_map_coeffs = None
- sorting
- preserve_chain_id = False The default behavior is to group heteroatoms with the nearest macromolecule chain, whose ID is inherited. This parameter disables the change of chain ID, and preserves the original chain ID.
- waters_only = False Rearrange waters, but leave all other ligands alone.
- sort_waters_by = none *b_iso Ordering of waters - by default it will sort them by the isotropic B-factor.
- set_hetatm_record = True Convert ATOM to HETATM where appropriate.
- ignore_selection = None Selection of atoms to skip. Any residue group which overlaps with this selection will be preserved with the original chain ID and numbering.
- renumber = True Renumber heteroatoms once they are in new chains.
- sequential_numbering = True If True, the heteroatoms will be renumbered starting from the next available residue number after the end of the associated macromolecule chain. Otherwise, numbering will start from 1.
- distance_cutoff = 6.0 Cutoff for identifying nearby macromolecule chains. This should be kept relatively small for speed reasons, but it may miss waters that are far out in solvent channels.
- remove_waters_outside_radius = False Remove waters more than the specified distnace cutoff to the nearest polymer chain (to avoid PDB complaints).
- loose_chain_id = X Chain ID assigned to heteroatoms that can't be mapped to a nearby macromolecule chain.
- testingParameters for testing and benchmarking the program - not intended for general use.
- pdb_id = None If specified, the program will download all relevant data including search model and SMILES string from the PDB, and exit.