Automated Model Building and Rebuilding using AutoBuild
- AutoBuild Wizard: Tom Terwilliger
- PHENIX GUI: Nathaniel Echols
- phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
- RESOLVE: Tom Terwilliger
- phenix.xtriage: Peter Zwart
The purpose of the AutoBuild Wizard is to provide a highly automated
system for model rebuilding and completion. The Wizard design allows the
user to specify data files and parameters through an interactive GUI, or
alternatively through a parameters file. The AutoBuild Wizard begins
with data files containing structure factor amplitudes and uncertainties,
along with either experimental phase information or a starting model,
carries out cycles of model-building and refinement alternating with
model-based density modification, and produces a relatively complete
atomic model. The AutoBuild Wizard uses RESOLVE, xtriage and
phenix.refine to build an atomic model, refine it, and improve it with
iterative density modification, refinement, and model-building.
The Wizard begins with either experimental phases (e.g., from AutoSol)
or with an atomic model that can be used to generate calculated phases.
The AutoBuild Wizard produces a refined model that can be nearly
complete if the data are strong and the resolution is about 2.5 A or
better. At lower resolutions (2.5 - 3 A) the model may be less complete,
and at resolutions worse than 3 A the model may be quite incomplete and
not well refined.
The AutoBuild Wizard can be used to generate OMIT maps (simple omit,
SA-omit, iterative-build omit) that can cover the entire unit cell or
specific residues in a PDB file.
The AutoBuild Wizard can generate a set of models compatible with
experimental data (multiple_models).
The AutoBuild Wizard can be run from the PHENIX GUI, from the
command-line, and from parameters files. All three versions are
identical except in the way that they take commands from the user. See
Using the PHENIX Wizards for details of how to
run a Wizard. The command-line version will be described here.
The AutoBuild Wizard begins with experimental structure factor
amplitudes, along with either experimental or model-based estimates of
crystallographic phases. The phase information is improved by using
statistical density modification to improve the correlation of
NCS-related density in the map (if present) and to improve the match of
the distribution of electron densities in the map with those expected
from a model map. This improved map is then used to build and refine an
atomic model.
In subsequent cycles, the models from previous cycles are used as a
source of phase information in statistical density modification,
iteratively improving the quality of the map used for model-building.
Additionally, during the first few cycles additional phase information
is obtained by detecting and enhancing (1) the presence of
commonly-found local patterns of density in the map, and (2) the
presence of density in the shape of helices and strands. The final model
obtained is analyzed for residue-based map correlation and density at
the coordinates of individual atoms, and an analysis including a summary
of atoms and residues that are in strong, moderate, or weak density and
out of density is provided.
The AutoBuild Wizard has been designed for ease of use combined with
maximal user control, with as many parameters set automatically by the
Wizard as possible, but maintaining parameters accessible to the user
through a GUI and through parameters files. The Wizard uses the
input/output routines of the cctbx library, allowing data files of many
different formats so that the user does not have to convert their data
to any particular format before using the Wizard. Use of the
phenix.refine refinement package in the AutoBuild Wizard allows a high
degree of automation of refinement so that neither the user nor the Wizard
is required to specify parameters for refinement. The phenix.refine
package automatically includes a bulk solvent model and automatically
places solvent molecules.
The five core modules in the AutoBuild Wizard are
- building a new model into an electron density map
- rebuilding an existing model
- refinement
- iterative model-building beginning from experimental phase information, and
- iterative model-building beginning from a model.
The standard procedures available in the AutoBuild Wizard that are based
on these modules include:
- model-building and completion starting from experimental phases,
- rebuilding a model from scratch, with or without experimental phase information, and
- rebuilding a model in place, maintaining connectivity and sequence register.
Starting from a set of experimental phases and structure factor
amplitudes, model-building and completion starting from experimental
phases is normally carried out, and then the resulting model is rebuilt
from scratch.
Starting from a model (e.g., from molecular replacement) and
experimental structure factor amplitudes, rebuilding a model in place
is carried out by default if the starting model differs by less than
about 50% in sequence from the desired model; otherwise the model is
rebuilt from scratch. It is generally a good idea to specify which you
want to happen using the keyword "rebuild_in_place=True" (to keep the
basic input model) or "rebuild_in_place=False" (to build a new model).
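Typical input files for the AutoBuild Wizard (with their command-line
forms in parentheses) are:
- a data file, optionally with phases, HL coefficients and a freeR flag
(w1.sca or data=w1.sca)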
- a sequence file (seq.dat or seq_file=seq.dat) or a model
(coords.pdb or model=coords.pdb)
- coefficients for a starting map (map_file=resolve.mtz)
- a file for refinement
(refinement_file=exptl_fobs_freeR_flags.mtz)
- a high-resolution datafile (hires_file=high_res.sca)
The AutoBuild Wizard will apply an anisotropy correction and B-factor
sharpening to all the raw experimental data by default (controlled by
the keyword remove_aniso=True). The target overall Wilson B factor can
be set with the keyword b_iso, as in b_iso=25. By default the target
Wilson B will be 10 times the resolution of the data (e.g., if the
resolution is 3 A then b_iso=30.), or the actual Wilson B of the data,
whichever is lower.
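The default rule for the target Wilson B can be written out as a short calculation. This is only a sketch of the rule as stated above; the function name is hypothetical and it is not the code AutoBuild itself runs.

```python
def default_target_b_iso(resolution, wilson_b):
    """Default sharpening target as described in the text:
    10 times the resolution (in A) or the actual Wilson B of the data,
    whichever is lower."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return min(10.0 * resolution, wilson_b)


# Data to 3 A with a Wilson B of 45: the target becomes 30.
target = default_target_b_iso(3.0, 45.0)
```

If the Wilson B of the data is already below 10 times the resolution, the data's own value is returned, so in this sketch the default never raises the target above the Wilson B of the data.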
If an anisotropy correction is applied then the entire AutoBuild run
will be carried out with anisotropy-corrected and sharpened data. At the
very end of the run the final model will be re-refined against the
uncorrected refinement data and this re-refined model and the
uncorrected refinement data (with freeR flags) will be written out as
overall_best.pdb and overall_best_refine_data.mtz.
You can specify many more parameters as well. See the list of keywords,
defaults and descriptions at the end of this page, and the general
information about running Wizards in "Using the PHENIX Wizards", for
how to do this. Some of the most common parameters are:
data=w1.sca # data file
model=coords.pdb # starting model
rebuild_in_place=true # rebuild input model in place
rebuild_in_place=false # build a new model; add or subtract residues
# from input model as necessary
seq_file=seq.dat # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3 # dmin of 3 A
s_annealing=True # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5 # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5 # max number of rebuild cycles (starting from a model)
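When AutoBuild is launched from a script, the keyword=value arguments listed above can be assembled programmatically. The following is a minimal sketch; the helper name autobuild_command is hypothetical, and only keywords that appear in the list above are used.

```python
import shlex


def autobuild_command(**keywords):
    """Build an argument list of the form phenix.autobuild key=value ..."""
    args = ["phenix.autobuild"]
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args


cmd = autobuild_command(
    data="w1.sca",
    model="coords.pdb",
    rebuild_in_place=True,
    resolution=3,
    n_cycle_rebuild_max=5,
)
# A shell-ready string; the list itself could be passed to subprocess.run.
command_line = shlex.join(cmd)
```

Keeping the arguments as a list avoids quoting problems when a value contains spaces.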
You can run phenix.autobuild from a parameters file. This is often
convenient because you can generate a default one with:
phenix.autobuild --show_defaults > my_autobuild.eff
and then you can just edit this file to match your needs and run it
with:
phenix.autobuild my_autobuild.eff
By default AutoBuild will instruct phenix.refine to pick waters using
its standard procedure. This means that if the resolution of the data is
high enough (typically 3 A or better) then waters are placed.
You can tell AutoBuild not to have phenix.refine pick waters with the
command:
place_waters=False
If you want to place waters at a lower resolution, you will need to
reset the low-resolution cutoff for placing waters in phenix.refine. You
would do that in a "refinement_params.eff" file containing lines like
these (see below for passing parameters to phenix.refine with an ".eff"
file):
refinement {
ordered_solvent {
low_resolution = 2.8
}
}
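The interaction between place_waters and the low-resolution cutoff can be summarized as a simple check. This is a sketch of the rule as described in this document (waters are added only if the refinement resolution is not larger than the cutoff), not the logic phenix.refine actually runs; the function name is hypothetical and the default cutoff of 3.0 A is the default of ordered_solvent_low_resolution quoted in the refinement-parameter table on this page.

```python
def waters_will_be_placed(refinement_resolution, place_waters=True,
                          low_resolution=3.0):
    """Return True if ordered solvent would be added under the stated rule.

    refinement_resolution: d_min used for refinement, in A
    low_resolution: value of refinement.ordered_solvent.low_resolution
    """
    if not place_waters:
        return False
    # "Larger" resolution means lower-resolution data (a bigger d_min).
    return refinement_resolution <= low_resolution
```

For example, data refined to 3.2 A would get no waters with a cutoff of 3.0 A, but would with a cutoff of 3.5 A.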
AutoBuild does not know about twinning, but you can incorporate a twin
law into the refinement steps in the AutoBuild procedure if your crystal
is twinned. Use phenix.xtriage to identify twinning and the twin law.
Then specify the twin law in a parameters file (see next section) and
provide that to AutoBuild with a keyword such as
"refine_eff_file=twin_law.eff".
You may also want to try using the keyword "two_fofc_in_rebuild"
which will use the 2Fo-Fc map from phenix.refine in model-building.
In Phenix the parameter test_flag_value sets the value of the test set
that is to be free. Normally Phenix sets up test sets with values of 0
and 1 with 1 as the free set. The CCP4 convention is values of 0 through
19 with 0 as the free set. Either of these is recognized by default in
Phenix and you do not need to do anything special. If you have any other
convention (for example values of 0 to 19 and test set is 1) then you
can specify this with test_flag_value.
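The two conventions recognized by default can be illustrated with a small check on the set of flag values in a data file. This is a hedged sketch, not the detection code used by Phenix: the function name is hypothetical, and it assumes that every flag value of the convention actually occurs in the file.

```python
def guess_test_flag_value(flag_values):
    """Guess which flag value marks the free (test) set.

    Returns 1 for the Phenix convention (values 0 and 1),
    0 for the CCP4 convention (values 0 through 19),
    and None when test_flag_value has to be given explicitly.
    """
    values = set(flag_values)
    if values == {0, 1}:
        return 1
    if values == set(range(20)):
        return 0
    return None
```

A file whose flags run from 0 to 19 but whose free set is 1 looks identical to the CCP4 convention in this check, which is exactly the case where test_flag_value must be specified by hand.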
Note that phenix.refine and AutoBuild write out PDB files that contain
the test_flag_value. AutoBuild can read this test_flag_value and use it
automatically. However if there is a conflict between this test_flag_value
and the default value based on your data file, you may have to specify which
to use.
Special note on anomalous data and AutoBuild: AutoBuild does not support
anomalous test sets. If you have a data file with anomalous data that
has Rfree flags such as Rfree(+),Rfree(-) then you will need to merge these
Rfree flags before running AutoBuild. Here is how:
Go to the reflection file editor, read in your refine_data.mtz (or whatever
it is called) file with anomalous data. Copy all the data and Rfree
flags to the output file, but select "Edit arrays" and in the window
that comes up do the following:
- Change the names of the output data arrays from
I+ SigI+ I- SigI- to I SigI (or equivalent)
- Specify "merge if present" for "anomalous"
- Do the same for the Rfree Flags array.
Run the reflection editor.
Now you have a data file that is non-anomalous and that has the same test set as your original. You can use this in AutoBuild.
You can control phenix.refine parameters that are not specified directly
by AutoBuild using a refinement parameters (.eff) file:
refine_eff_file=refinement_params.eff # set any phenix.refine params not set by AutoBuild
This file might contain a twin-law for refinement:
refinement {
twinning {
twin_law = "-k, -h, -l"
}
}
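A parameters file with this nested-scope layout can also be written from a script. The sketch below renders a nested dictionary in the brace layout used in the examples on this page. The function name is hypothetical, and quoting every string value is an assumption that matches the twin_law example but may not suit every parameter type.

```python
def render_scopes(params, indent=0):
    """Render a nested dict as lines of 'scope { ... }' and 'name = value'."""
    pad = "  " * indent
    lines = []
    for name, value in params.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{name} {{")
            lines.extend(render_scopes(value, indent + 1))
            lines.append(f"{pad}}}")
        elif isinstance(value, str):
            # Assumption: string values are written in double quotes.
            lines.append(f'{pad}{name} = "{value}"')
        else:
            lines.append(f"{pad}{name} = {value}")
    return lines


twin_params = {"refinement": {"twinning": {"twin_law": "-k, -h, -l"}}}
eff_text = "\n".join(render_scopes(twin_params))
```

The resulting text could be saved as refinement_params.eff and passed with refine_eff_file as described above.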
You can put any phenix.refine parameters in this file, but a few
parameters that are set directly by AutoBuild override your inputs from
the refine_eff_file. These parameters are listed below.
Refinement parameters that must be set using AutoBuild Wizard keywords
(overwriting any values provided by user in input_eff_file)
phenix.refine keyword | Wizard keyword(s) and notes
refinement.main.number_of_macro_cycles | ncycle_refine
refinement.main.simulated_annealing | s_annealing (only applies to 1st refinement in rebuild. SA in any other refinements controlled by input_eff_file, if any)
refinement.ncs.find_automatically | refine_with_ncs=True turns on automatic ncs search
refinement.main.ncs | refine_with_ncs=True turns on ncs
refinement.ncs.coordinate_sigma | Normally not set by Wizard. However if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True then the ncs coordinate sigma is equal to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among ncs copies
refinement.main.random_seed | i_ran_seed sets the random seed at the beginning of a Wizard... this affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines)
refinement.main.ordered_solvent | place_waters=True will set ordered_solvent to True. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
refinement.main.ordered_solvent | place_waters_in_combine=True will set ordered_solvent to True, only applying this to the final combination step of multiple-model generation. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.
refinement.ordered_solvent.low_resolution | ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) to 3 A. If the resolution used for refinement is larger than the value of ordered_solvent_low_resolution then ordered solvent is not added.
refinement.main.use_experimental_phases | use_mlhl=True will set refinement.main.use_experimental_phases to True
refinement.refine.strategy | The Wizard keywords refine refine_b refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default) isotropic displacement factors are refined. If refine_xyz=True (default) coordinates are refined.
refinement.main.occupancy_max | max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. Default is to do nothing and use the default from phenix.refine (1.0)
refinement.refine.occupancies.individual | The combination of Wizard keywords of semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add to your .eff file other names of atoms to have occupancies refined as well.
refinement.main.high_resolution | Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.
refinement.pdb_interpretation.link_distance_cutoff | link_distance_cutoff
The following parameters controlling phenix.refine output are set
directly in AutoBuild and cannot be set by the user:
- refinement.output.write_eff_file
- refinement.output.write_geo_file
- refinement.output.write_def_file
- refinement.output.write_maps
- refinement.output.write_map_coefficients
Similarly, you can control resolve and resolve_pattern parameters. For
these parameters, your inputs will not be overridden by AutoBuild. The
format is a little tricky: you have to put two sets of quotes around the
command like this:
resolve_command="'resolution 200 3'" # NOTE ' and " quotes
This will put the text
resolution 200 3
at the end of every temporary command file created to run resolve. (This
is why it is not overridden by AutoBuild commands; they will all come
before your commands in the resolve command file.) Note that some
commands in resolve may be incompatible with this usage.
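The reason for the two sets of quotes can be seen by looking at what a shell does to the argument before AutoBuild receives it. The sketch below uses Python's shlex.split, which follows POSIX shell word-splitting rules; treating this as representative of your own shell (for example csh) is an assumption.

```python
import shlex

# Outer double quotes protect the inner single quotes from the shell.
both_quotes = 'phenix.autobuild data=w1.sca resolve_command="\'resolution 200 3\'"'
# With only one set of quotes the shell strips them entirely.
one_quote = "phenix.autobuild data=w1.sca resolve_command='resolution 200 3'"

tokens_both = shlex.split(both_quotes)
tokens_one = shlex.split(one_quote)

# What AutoBuild would see as its last argument in each case.
seen_with_both = tokens_both[-1]
seen_with_one = tokens_one[-1]
```

With both sets of quotes the argument still carries the single quotes around the resolve text, so the value reaches AutoBuild as one quoted string; with a single set the quotes are removed by the shell before AutoBuild parses the keyword.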
If your input PDB file contains ligands (anything other than solvent
that is not protein if your chain_type=PROTEIN, for example) then by
default these ligands will be kept, used in refinement, and written out
to your output PDB file. Any solvent molecules will by default be
discarded. You can change this behavior by changing the keywords from
these defaults:
keep_input_ligands=True
keep_input_waters=False
The AutoBuild Wizard will use phenix.elbow to generate geometries for
any ligands that are not recognized.
You can also tell AutoBuild to add the contents of any PDB files that
you wish to supply to the current version of the structure just before
refinement, so all the refined models produced contain whatever
AutoBuild has built, plus the contents of these PDB files. This can be
done through the GUI, the command-line, or a parameters file. In the
command-line version you do this with:
input_lig_file_list=my_ligand.pdb
NOTE: The files in input_lig_file_list will be edited to make them
all HETATM records to tell AutoBuild to ignore these residues in
rebuilding.
NOTE: You may need to tell phenix.refine about the geometry of your
ligands. You will get an error message if the ligand is not recognized
and an automatic run of phenix.elbow does not succeed in generating your
ligand. In that case you will want to run phenix.elbow to create a cif
definition file for this ligand:
phenix.elbow my_ligand.pdb --id=LIG
where LIG is the 3-letter ID code that you use in my_ligand.pdb to
identify your ligand. If the automatic run does not work you may need to
give phenix.elbow additional information to generate your ligand.
Once phenix.elbow has generated your ligand you can use the keyword
"cif_def_file_list" to tell AutoBuild about this ligand:
cif_def_file_list=elbow.LIG.my_ligand.pdb.cif
You can tell AutoBuild to apply any set of cif definitions to the model
during refinement by using a combination of specification files and the
commands cif_def_file_list and refine_eff_file_list:
refine_eff_file_list=link.eff cif_def_file_list=link.cif
This example comes from the phenix.refine manual page in which a link is
specified in a cif definition file link.cif:
data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
5pho add . O5T O OH .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
5pho add O5T P coval 1.520 0.020
and this is applied with a parameters file link.eff:
refinement.pdb_interpretation.apply_cif_modification
{
data_mod = 5pho
residue_selection = resname GUA and name O5T
}
You can have any number of cif files and parameters files.
When you run AutoBuild the output files will be in a subdirectory with
your run number:
AutoBuild_run_1_/ # subdirectory with results
The key output files that are produced are:
- A summary file listing the results of the run and the other files
produced:
AutoBuild_summary.dat # overall summary
- A log file describing the entire run:
AutoBuild_run_1_1.log # overall log file
- A warnings file listing any warnings about the run
AutoBuild_warnings.dat # any warnings
- The final refined model
overall_best.pdb
NOTE 1: The "working_best.pdb" file is the current working best model.
If an anisotropy correction and sharpening are applied
(remove_aniso=True) then working_best.pdb will be refined against the
corrected data. At the end of the run the last working_best.pdb will be
re-refined against the original data (overall B refined only) and
written out as overall_best.pdb.
NOTE 2: If there are multiple chains or multiple ncs copies, each chain
will be given its own chainID (A B C D...). Segments that are not
assigned to a chain are given a separate chainID and are given a segid
of "UNK" to indicate that their assignment is unknown. ChainID's for
ligands are kept as input. The chainID for solvent molecules is normally
S.
- Final map coefficients used to build refined model. Use FWT PHWT in
maps. Normally this is a density-modified map from resolve.
overall_best_denmod_map_coeffs.mtz
- sigmaA-weighted 2mFo-DFc and Fo-Fc map coefficients from
phenix.refine based on the last working_best.pdb model. These map
coefficients will be sharpened and anisotropy-corrected if
remove_aniso=True. (The file working_best.pdb is the same as
overall_best.pdb, except it is refined against sharpened,
anisotropy-corrected data if remove_aniso=True). The map
coefficients are 2FOFCWT PH2FOFCWT for the 2mFo-DFc map and FOFC and
PHFOFC for the Fo-Fc difference map. These map coefficients are
filled (missing reflections are given Fc values).
overall_best_refine_map_coeffs.mtz
- MTZ file with FP, phases and HL coeffs if present, and freeR_flags
for refinement
overall_best_refine_data.mtz
NOTE: The labels for this mtz file are typically:
FP SIGFP PHIM FOMM HLAM HLBM HLCM HLDM FreeR_flag
The file overall_best_refine_data.mtz (identical to the file
exptl_fobs_phases_freeR_flags.mtz) has a copy of the (experimental)
HL coefficients that were input to autobuild. The labels HLAM HLBM etc
have the ending "M" because they were copied by resolve and it outputs
these labels...but in fact they are not density modified phases from
autobuild, just copied straight from the input data file.
- Final log file for model-building
overall_best.log
- Final log file for refinement
overall_best.log_refine
- Evaluation of fit of model to map
overall_best.log_eval
- Summary of NCS information
overall_best_ncs_info.ncs
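After a run it can be convenient to check which of the files listed above were actually written. The following is a minimal sketch; the function name is hypothetical, and it assumes the files sit directly inside the run subdirectory (for example AutoBuild_run_1_/). Some files, such as the NCS summary, may legitimately be absent for a given run.

```python
from pathlib import Path

# File names taken from the list of key output files above.
KEY_OUTPUT_FILES = [
    "AutoBuild_summary.dat",
    "AutoBuild_warnings.dat",
    "overall_best.pdb",
    "overall_best_denmod_map_coeffs.mtz",
    "overall_best_refine_map_coeffs.mtz",
    "overall_best_refine_data.mtz",
    "overall_best.log",
    "overall_best.log_refine",
    "overall_best.log_eval",
    "overall_best_ncs_info.ncs",
]


def missing_outputs(run_dir):
    """Return the names of key output files not found in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in KEY_OUTPUT_FILES
            if not (run_dir / name).is_file()]
```

Calling missing_outputs("AutoBuild_run_1_") returns an empty list when every file in the list is present.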
The AutoBuild Wizard has two overall methods for building a model.
The first method (standard build) is to build a model from scratch. This
involves identification of where helices (and strands, for proteins) are
located, extension using fragment libraries, connection of segments,
identification of side-chains, and sequence alignment. These methods are
augmented in the standard building procedure by loop-fitting and
building model outside of the region that has already been built.
The second method (rebuild_in_place) takes an existing model and
rebuilds it without adding or deleting any residues and without changing
the connectivity of the chain. The way this works is a segment of the
model is deleted and then is filled-in again by rebuilding from the
remaining ends. This is repeated for overlapping segments covering the
entire model. NOTE: If you are using rebuild_in_place then your model
must be quite similar to your sequence file, and in particular the model
must not extend in the N-terminal direction beyond your sequence file.
Minor edits (amino acid replacements) will be done automatically. Also
NOTE: rebuild_in_place is not designed for models that contain
alternate conformations. It is designed for a model with a single
conformation. If you supply a model with some residues or side-chains
with a blank altloc, and some with an altloc of A and some with B, then
all those with A or B will be ignored (only the first conformer is
considered).
The multiple-models approach really has two levels of multiple models.
At the first level, several (multiple_models_group_number, default is
number_of_parallel_models) models are built (using
rebuild_in_place) and are then recombined into a single good model. At
the next level, this whole process may be done more than once
(multiple_models_number times), yielding several very good models. By
default, if you ask for rebuild_in_place, then you will get a single
very good model, created by running rebuild_in_place several times and
recombining the models.
The AutoBuild Wizard is set up to take advantage of multi-processor
machines or batch queues by splitting the work into separate tasks. See
Tutorial 4: Iterative model-building, density modification and
refinement starting from experimental phases and
Tutorial 6: Automatically rebuilding a structure solved by Molecular
Replacement for a description of the method
used by the AutoBuild Wizard to run build jobs as sub-processes and to
combine the results into single models.
Here are the key factors that determine how splitting model-building
into batches and running them on one or more processors works:
- nbatch is the number of batches of work. As long as nbatch is
fixed then the results of running the Wizard will be the same, no
matter how many processors are used. Normally you will not need to
adjust it.
- nproc is the number of processors to split the work among
- number_of_parallel_models is the number of models to build at
once. The default is to set number_of_parallel_models=nbatch. This
affects both standard building (number_of_parallel_models sets how
many initial models to build) and rebuild_in_place
(number_of_parallel_models determines whether a single model is
built or a set of models are built and recombined into a single
model).
phenix.autobuild is set up so that you can specify the number of
processors (nproc). Here is how to choose a value:
- If you are using rebuild_in_place=False, then use nproc=4. (Any
more will not make any difference.)
- If you are using rebuild_in_place=True, then use nproc=5.
(Again, any more will not make any difference.)
- If you are calculating an omit map, then use nproc=5 * number of
omit regions (i.e., up to 100 or more, depending on how many
processors you have)
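These guidelines can be collected into a small helper. This is only a restatement of the three rules above as code; the function name is hypothetical and the result is a suggested upper bound, to be limited by the processors you actually have.

```python
def suggested_nproc(rebuild_in_place, n_omit_regions=0):
    """Suggested nproc following the guidelines in the text.

    n_omit_regions > 0 means an omit map is being calculated.
    """
    if n_omit_regions > 0:
        return 5 * n_omit_regions
    return 5 if rebuild_in_place else 4
```

For an omit map with 20 omit regions this gives 100, matching the "up to 100 or more" remark above.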
Additionally you will want to set two more parameters:
run_command="command you use to submit a job to your system"
background=False # probably False if this is a cluster, True if this is a multiprocessor machine
If you have a queueing system with 20 nodes, then you probably submit
jobs with something like "qsub -someflags myjob.sh", where someflags
are whatever flags you use (or just "qsub myjob.sh" if no flags). Then
you might use
run_command="qsub -someflags" background=False nproc=20
or
run_command="qsub" background=False nproc=20
If you have a 20-processor machine instead, then you might say
run_command=sh background=True nproc=20
so that it would run your jobs with sh on your machine, and run them all
in the background (i.e., all at one time).
There are several resolution limits used in AutoBuild. You can leave
them all to default, or you can set any of them individually. Here is a
list of these limits and how their default values are set:
Name | Description | How default value is set
resolution | Overall resolution. Used as high-resolution limit for density modification. Used as default for refinement resolution and model-building resolution if they are not set. | Resolution of input datafile. If a hires datafile is provided, the resolution of that data is used.
refinement_resolution | Resolution for refinement | value of "resolution"
resolution_build | Resolution for model-building | value of "resolution"
overall_resolution | Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files. | None
multiple_models_starting_resolution | Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution to generate diversity. | 4 A by default
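The way the defaults cascade from one limit to another can be sketched as follows. This mirrors the "How default value is set" entries in the table above; the function name and argument layout are hypothetical, and a value of None means that the keyword was not set by the user.

```python
def resolve_resolution_limits(data_resolution, hires_resolution=None,
                              resolution=None, refinement_resolution=None,
                              resolution_build=None, overall_resolution=None,
                              multiple_models_starting_resolution=4.0):
    """Fill in unset resolution limits using the defaults from the table."""
    if resolution is None:
        # A hires data file, if provided, defines the overall resolution.
        if hires_resolution is not None:
            resolution = hires_resolution
        else:
            resolution = data_resolution
    if refinement_resolution is None:
        refinement_resolution = resolution
    if resolution_build is None:
        resolution_build = resolution
    return {
        "resolution": resolution,
        "refinement_resolution": refinement_resolution,
        "resolution_build": resolution_build,
        "overall_resolution": overall_resolution,
        "multiple_models_starting_resolution":
            multiple_models_starting_resolution,
    }


limits = resolve_resolution_limits(3.0, hires_resolution=2.2,
                                   resolution_build=2.5)
```

Here the 2.2 A hires data set the overall and refinement resolution, while model-building uses the explicitly requested 2.5 A.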
If you supply a starting map file and a hires_file (with native data
to higher resolution) and you do not supply a model, then AutoBuild
will by default carry out phase extension (in increments of s_step in
s, where s = 1/d_min). If you do supply a model, or you do not supply a
hires_file, or you do not supply a starting map file, then the
resolution used will be the final resolution (no phase extension
steps).
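Stepping in s = 1/d_min rather than in d_min gives resolution shells that are evenly spaced in reciprocal space. The sketch below lists the d_min values such a schedule would visit. It is an illustration only: the function name is hypothetical, and the exact schedule AutoBuild uses (starting point, rounding, handling of the last step) is an assumption.

```python
def phase_extension_resolutions(d_start, d_final, s_step):
    """List d_min values from just beyond d_start down to d_final,
    advancing s = 1/d_min by s_step each time."""
    if s_step <= 0:
        raise ValueError("s_step must be positive")
    if d_final >= d_start:
        # Nothing to extend: use the final resolution directly.
        return [d_final]
    s_start = 1.0 / d_start
    s_final = 1.0 / d_final
    resolutions = []
    n = 1
    # Small tolerance so the last step is not duplicated by round-off.
    while s_start + n * s_step < s_final - 1e-9:
        resolutions.append(1.0 / (s_start + n * s_step))
        n += 1
    resolutions.append(d_final)
    return resolutions


steps = phase_extension_resolutions(4.0, 2.0, 0.05)
```

With a starting map at 4 A, hires data to 2 A and a step of 0.05 in s, the schedule has five steps and ends at exactly the final resolution.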
NOTE: Output files will be in subdirectories labelled
"AutoBuild_run_1_" "AutoBuild_run_2_" etc.
phenix.autobuild data=solve_1.mtz seq_file=seq.dat \
input_ncs_file=ha.pdb
Here the data in solve_1.mtz (FP SIGFP PHIB FOM HLA HLB HLC HLD) will
be used as the starting point for density modification. Then a model
will be built and refined. In subsequent cycles the models that have
been built will be used to improve the phases in density modification.
If NCS can be found from the sites in ha.pdb or from any models that are
built, then NCS will be used in density modification.
phenix.autobuild data=w1.sca seq.dat model=coords.pdb \
rebuild_in_place=True
Here "rebuild_in_place=True" tells AutoBuild to keep the overall model
you have supplied, not to add or subtract residues from it, except that
AutoBuild will try to edit the model to match the sequence in your
sequence file. The AutoBuild Wizard will use your model and the data in
w1.sca to generate starting phases, then it will carry out density
modification to improve those phases, and adjust your model, rebuilding
the model to match the resulting map and refining the model. This will
be done iteratively, with the new model from each cycle being used at
the start of the next one. If NCS is found in your model then it will be
used in the density modification process.
phenix.autobuild data=solve_1.mtz seq_file=seq.dat \
model=coords.pdb rebuild_in_place=False
Here "rebuild_in_place=False" tells AutoBuild to build a new model,
adding or subtracting residues as necessary. The data in solve_1.mtz
(FP SIGFP PHIB FOM HLA HLB HLC HLD) will be used along with your model
as the starting point for density modification. Then a new model will be
built and refined. In subsequent cycles the models that have been built
will be used to improve the phases in density modification. If NCS is
found in your model or any model that is built, then it will be used in
density modification.
phenix.autobuild after_autosol
AutoBuild will identify the AutoSol run (in your working directory) with
the highest overall score, then it will take the experimental phases
(solve_xx.mtz or phaser_xx.mtz, where xx is the solution number) from
that run, along with the corresponding density-modified map
(resolve_xx.mtz) and the heavy_atom file (ha_xx.pdb_formatted.pdb)
as inputs. Additionally, data for refinement are read in from
exptl_fobs_freeR_flags_xx.mtz.
AutoBuild will then build a model, refine it, use the refined model in
density modification, then iterate the model-building, refinement, and
density modification process until no further improvement in the model
occurs.
phenix.autobuild data=solve_2.mtz hires_file=w1.sca seq_file=seq.dat
The high-resolution data in w1.sca will be used for FP and SIGFP. Other
information from solve_2.mtz (PHIB FOM HLA HLB HLC HLD) will be kept.
phenix.autobuild data=solve_2.mtz seq_file=seq.dat input_ha_file=ha.pdb truncate_ha_sites_in_resolve=True
The heavy-atom sites in ha.pdb will be used to mark locations where high
density is to be ignored during initial cycles of density modification.
This can be useful if the heavy-atom peaks are very pronounced in the
experimental map. The sites in ha.pdb will also be included in the model for
the structure if they do not overlap with any atoms that are built as part of
the model.
phenix.autobuild data=solve_2.mtz seq_file=seq.dat find_ncs=False refine_with_ncs=False
The keyword "find_ncs=False" disables the finding of NCS from the
models that are built and its use in density modification and
model-building. The keyword "refine_with_ncs=False" disables finding
NCS and its use in the refinement process. Together they prevent all use
of NCS.
phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file
resolve_composite_map.mtz in the subdirectory OMIT/ . An additional
map coefficients file omit_region.mtz will show you the region that has
been omitted.
phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit
Coefficients for the output omit map will be in the file
resolve_composite_map.mtz in the subdirectory OMIT/ .
phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=sa_omit
Coefficients for the output simulated-annealing composite omit map will
be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .
If you run a composite OMIT job but it fails at the last step of
combining files, or if you run all the individual omit boxes on
different machines, you can still combine them all into one single
composite omit map.
You can do this by copying all the individual mtz files with map
coefficients for omit regions to a single directory.
Here is a script you can edit and use to combine omit maps representing
different omit regions into one.
NOTE: you need to ensure that the OMIT regions are defined in the same
way in this run as in the runs that produced your
overall_best_denmod_map_coeffs.mtz_OMIT_REGION_1 etc. files. You ensure
this with the n_xyz command, which sets the grid. You can copy the grid
values from one of the resolve log files created when you ran your omit
job (e.g., AutoBuild_run_1_/TEMP0/AutoBuild_run_1_/TEMP0/resolve.log
will have a line like "nu nv nw: 32 32 32" and you copy those numbers).
------------------------------------
#!/bin/csh -f
# COMBINE OMIT SCRIPT
phenix.resolve << EOD
hklin exptl_fobs_phases_freeR_flags.mtz
labin FP=FP SIGFP=SIGFP
n_xyz 32 32 32 # YOU MUST SET THIS BASED ON THE nu nv nw in a resolve log file.
solvent_content 0.85
no_build
ha_file NONE
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_1
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_2
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_3
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_4
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_5
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_6
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_7
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_8
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_9
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_10
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_11
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_12
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_13
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_14
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_15
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_16
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_17
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_18
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_19
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_20
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_21
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_22
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_23
combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_24
omit
EOD
# END OF COMBINE OMIT SCRIPT
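The combine_map lines and the n_xyz values in the script above can also be generated rather than typed by hand. The following is a minimal sketch in Bourne shell (sh, not csh), assuming the omit-region files are named as in the script above and that the resolve log contains a line of the form "nu nv nw: 32 32 32". The function names grid_from_log and write_combine_input are hypothetical helpers for illustration and are not part of PHENIX.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of PHENIX.

# Print the three grid numbers from the first "nu nv nw:" line of a resolve log.
grid_from_log() {
  sed -n 's/.*nu nv nw: *\([0-9][0-9]*\)  *\([0-9][0-9]*\)  *\([0-9][0-9]*\).*/\1 \2 \3/p' "$1" | head -n 1
}

# Print resolve input that combines omit regions 1..N on the given grid.
# Usage: write_combine_input N "nu nv nw"
write_combine_input() {
  n_regions=$1
  grid=$2
  echo "hklin exptl_fobs_phases_freeR_flags.mtz"
  echo "labin FP=FP SIGFP=SIGFP"
  echo "n_xyz $grid"
  echo "solvent_content 0.85"
  echo "no_build"
  echo "ha_file NONE"
  i=1
  while [ "$i" -le "$n_regions" ]; do
    echo "combine_map overall_best_denmod_map_coeffs.mtz_OMIT_REGION_$i"
    i=$((i + 1))
  done
  echo "omit"
}
```

For example, write_combine_input 24 "$(grid_from_log resolve.log)" > combine.inp writes the same commands as the script above, with the grid taken from resolve.log. The file can then be given to phenix.resolve on standard input in place of the here-document.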
phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
composite_omit_type=iterative_build_omit
Coefficients for the output omit map will be in the file
resolve_composite_map.mtz in the subdirectory OMIT/. An additional
map coefficients file omit_region.mtz will show you the region that has
been omitted.
phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A \
composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file
resolve_composite_map.mtz in the subdirectory OMIT/. An additional
map coefficients file omit_region.mtz will show you the region that has
been omitted.
phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
include_input_model=True \
multiple_models_number=1 n_cycle_rebuild_max=5
The final model will be in the subdirectory MULTIPLE_MODELS in the file
all_models.pdb (this file will contain just one model).
Note that this procedure will keep the sequence that is present in
coords.pdb. If you supply a sequence file it will edit the sequence of
coords.pdb to match your sequence file and discard any residues that do
not match. (If you want to input a sequence file but not edit the
sequence in coords.pdb and not discard any non-matching residues, then
also specify edit_pdb=False.)
Note also that if include_input_model=True then no randomization cycle
will be carried out and multiple_models_starting_resolution is
ignored.
phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8
You can rebuild just the worst parts of your model by setting
touch_up=True. You can decide what parts to rebuild based on a minimum
model-map correlation (by residue). You can decide how much to rebuild
using worst_percent_res_rebuild or with min_cc_res_rebuild, or
both.
phenix.autobuild data=data.mtz model=coords.pdb \
delete_bad_residues_only=True \
input_map_file=map_coeffs.mtz \
worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8
The trimmed model will be in the file (the run number may vary):
AutoBuild_run_1_/starting_model_trimmed.pdb
and the removed residues will be in the file:
AutoBuild_run_1_/starting_model_removed_residues.pdb
You can delete just the worst parts of your model by setting
delete_bad_residues_only=True. You can decide what parts to remove
based on a minimum model-map correlation (by residue). You can decide
how much to remove using worst_percent_res_rebuild or with
min_cc_res_rebuild, or both. (These are the same parameters used to
decide which residues to rebuild in touch_up=True.)
Here the input_map_file is optional; if you do not provide it then a
model-based density-modified map will be used to evaluate your model.
phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
multiple_models_number=20 n_cycle_rebuild_max=5
The 20 final models will be in the subdirectory MULTIPLE_MODELS in the
file all_models.pdb. This procedure is useful for generating an
ensemble of models that are each individually consistent with the data,
and yet are diverse. The variation among these models is an indication
of the uncertainty in each of the models. Note that the ensemble of
models is not a representation of the ensemble of structures that is
truly present in the crystal.
If you have run autobuild with rebuild_in_place=True then the last
step is combining the models that have been produced. If you ran the job
in separate batches and want to combine the final models, you can use
the script below.
Note that all the models must have exactly the same set of atoms (aside
from any solvent).
Basically you run a dummy autobuild run to create a directory and
database entries, then you copy your files there, then you run autobuild
and tell it to carry on and do the combine step. You'll need a
map_coeffs.mtz file that has map coefficients (they won't be used but
have to be there just to make it run).
--------------------------------------------------------
#!/bin/csh -f
#COMBINE_MODELS SCRIPT
if (-d PDS || -d AutoBuild_run_1_) then
echo "Please run in a directory without PDS or AutoBuild_run_1_"
exit 1
endif
echo "Setting up combine models with a dummy run. NOTE: multiple_models_group_number must be correct"
phenix.autobuild fobs.mtz multiple_models=true seq_file=seq.dat \
combine_only=true multiple_models_group_number=2 \
input_map_file=map_coeffs.mtz \
multiple_models_number=1 > dummy_autobuild.log
echo "Copying files to AutoBuild_run_1_/MULTIPLE_MODELS"
mkdir AutoBuild_run_1_/MULTIPLE_MODELS
cp coords1.pdb AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.pdb_1_1
cp coords2.pdb AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.pdb_1_2
cp map_coeffs_1.mtz AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.mtz_1_1
cp map_coeffs_2.mtz AutoBuild_run_1_/MULTIPLE_MODELS/initial_model.mtz_1_2
ls AutoBuild_run_1_/MULTIPLE_MODELS/
echo "Running autobuild to combine files in AutoBuild_run_1_/MULTIPLE_MODELS"
phenix.autobuild combine_only=true seq_file=seq.dat carry_on=true run=1 > autobuild_combine.log
# END OF COMBINE_MODELS SCRIPT
-------------------------------------------------------
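Because the combine step requires all models to have exactly the same set of atoms (aside from solvent), it can be worth checking the files before copying them. The sketch below, in Bourne shell (sh, not csh), compares the atom name, residue name, chain ID and residue number columns of two PDB files, ignoring the order of the atoms. It assumes standard fixed-column PDB records and that solvent means residues named HOH; atom_keys and same_atoms are hypothetical helper names, not PHENIX commands.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of PHENIX.

# Print sorted atom identifiers (PDB columns 13-27: atom name, altloc,
# residue name, chain ID, residue number, insertion code) for all
# ATOM/HETATM records except waters named HOH (an assumption).
atom_keys() {
  awk '/^(ATOM  |HETATM)/ && substr($0, 18, 3) != "HOH" { print substr($0, 13, 15) }' "$1" | sort
}

# Succeed (exit status 0) only if both files contain the same non-water atoms.
same_atoms() {
  [ "$(atom_keys "$1")" = "$(atom_keys "$2")" ]
}
```

For example, same_atoms coords1.pdb coords2.pdb || echo "atom sets differ" reports a mismatch between the two input models used in the script above.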
phenix.autobuild data=data.mtz model=MR.pdb \
rebuild_from_fragments=True \
seq_file=seq.dat \
i_ran_seed=124881 \
nproc=4
You can have autobuild try to start rebuilding from fragments of a
model by setting rebuild_from_fragments=True. This sets the parameters
two_fofc_denmod_in_rebuild=True, all_maps_in_rebuild=True, and
rebuild_in_place=False, and sets consider_main_chain_list to include
your input model. You might want to use this if you look for ideal
helices using Phaser and then rebuild the resulting partial model, as in
the Arcimboldo procedure. The special feature of finding helices is that
they can be very accurately placed in some cases, which really helps the
subsequent rebuilding. If you have enough computer time, then run it
several or even many times with different values of i_ran_seed. Each
time you'll get a slightly different result.
Here two different types of density-modified maps are calculated and
models are built with each. The starting phases and phase probabilities
for one type are based on a sigmaA-weighted 2mFo-DFc map. Those for the
other type come from density modification using a model-based map as a
target map and finding phases that yield a map that is as close to this
one as possible. In either case the starting phases and phase
probabilities are used in a second cycle of density modification in
which part of the density modification target is a calculated map and
part is standard density modification (including solvent flattening,
histogram matching, NCS).
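Running the job several times with different values of i_ran_seed is easy to script. The following Bourne shell (sh) sketch only prints one command per seed, so that the commands can be reviewed before they are run. The function name fragment_rebuild_commands is a hypothetical helper, and the seed values other than 124881 are arbitrary examples.

```shell
#!/bin/sh
# Sketch only: prints one phenix.autobuild command per seed; nothing is run.
# The keywords are the ones used in the example command above.
fragment_rebuild_commands() {
  for seed in "$@"; do
    echo "phenix.autobuild data=data.mtz model=MR.pdb rebuild_from_fragments=True seq_file=seq.dat i_ran_seed=$seed nproc=4"
  done
}

# Example: three runs that differ only in the random seed.
fragment_rebuild_commands 124881 71223 90817
```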
phenix.autobuild data=data.mtz model=MR.pdb \
morph=True rebuild_in_place=False seq_file=seq.dat
You can have autobuild morph your input model, distorting it to match
the density-modified map that is produced from your model and data. This
can be used to make an improved starting model in cases where the MR
model is very different from the structure that is to be solved. For the
morphing to work, the two structures must be topologically similar and
differ mostly by movements of domains or motifs such as a group of
helices or a sheet.
The morphing process consists of identifying a coordinate shift to apply
to each N (or P for nucleic acids) atom that maximizes the local density
correlation between the model and the map. These shifts are smoothed and
applied to the structure to generate a morphed structure.
phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA
phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA
You can use the AutoBuild Wizard as a convenient way to run resolve
density modification with or without including model-based information.
Just use a command like this:
phenix.autobuild data=data.mtz model=coords.pdb \
maps_only=True seq_file=seq.dat
or
phenix.autobuild data=data.mtz \
maps_only=True seq_file=seq.dat
The Wizard will calculate the same map that it would normally calculate
given these data, and then it will write the map out and stop.
You can use the AutoBuild Wizard as a convenient way to run resolve
density modification starting with map coefficients you define. Just use
a command like this:
phenix.autobuild data=data.mtz \
maps_only=True seq_file=seq.dat \
map_file=starting_map.mtz map_labels="2FOFCWT PH2FOFCWT"
The Wizard will start with the phases in starting_map.mtz, calculate the
same map that it would normally calculate given these data, and then
write the map out and stop.
phenix.autobuild data=data.mtz solvent_fraction=.6 \
ps_in_rebuild=True model=coords.pdb maps_only=True
The output prime-and-switch map will be in the file
prime_and_switch.mtz.
- The AutoBuild wizard edits input PDB files to remove multiple
conformations. It will also renumber residues if the file contains
residues with insertion codes. All references to residue numbers
(e.g. rebuild_res_start_list) refer to the edited, renumbered
model. This model can be found in the AutoBuild_run_1_ (or
appropriate) directory as "edited_pdb.pdb".
- If you are using rebuild_in_place then your model must be quite
similar to your sequence file, and in particular the model must not
extend in the N-terminal direction beyond your sequence file. Minor
edits (amino acid replacements) will be done automatically.
- The AutoBuild wizard expects residue numbers to not decrease along a
chain. It will stop if residue 250 in chain B is found between
residues 116 and 117 in the same chain, for example. To get around
this, use insertion codes (make residue 250 residue 116A instead).
- The keywords "cell" and "sg" have been replaced with "unit_cell" and
"space_group" to make the keywords the same as in other phenix
applications.
- The AutoBuild model-building can only build one type of chain at a
time (default chain_type='PROTEIN'; other choices are RNA and DNA).
If you supply a PDB file containing more than one type of chain for
rebuilding, then all the residues that are not that type of chain are
treated as ligands and are (by default, keep_input_ligands=True)
included in refinement but not in rebuilding. Any input solvent
molecules are (by default, keep_input_waters=False) ignored.
You can include more than one type of chain in rebuilding by supplying
one type of chains as ligands with input_lig_file_list and rebuilding
another type:
chain_type=PROTEIN # build only protein
input_lig_file_list=MyDNA.pdb # just read in DNA coordinates and include in refinement
In this case only protein chains will be built, but the DNA coordinates
in MyDNA.pdb will be included in all refinements and will be written out
to the final coordinate file. You may wish to add the keyword:
keep_pdb_atoms=False #keep the ligand atoms if model (pdb) and ligand overlap
which will tell AutoBuild that the ligand (DNA) atoms are to be kept if
the model that is being built (protein) overlaps with it. (The default
is to keep the model that is being built and to discard any ligand atoms
that overlap).
This whole process is likely to require substantial editing of the PDB
files by hand because when you build DNA, a lot of chains are going to
be built into the protein region, and when you build protein, it is
going to be accidentally built into the DNA.
- Any file in input_lig_file_list containing ATOM records will have
them replaced with HETATM records. This is so that the
rebuild_in_place algorithm does not try to use them in rebuilding.
- The ligand generation routine in phenix.elbow will not generate heme
groups at this point. Most other ligands can be automatically
generated.
- If your input data file contains both intensity data and amplitude
data, only the amplitude data is exposed in the AutoBuild Wizard. If
you want to use the intensity data then you have to create a file
that does not have amplitude data in it.
- If your input data file has only intensity data and you wish to
specify which columns of data the AutoBuild Wizard is to use, then
you have to specify the names that the columns will have AFTER
importing the data and conversion to amplitudes, not the original
column names.
These column names may not be obvious. Here is how to find out what they
will be. Do a quick dummy run like this with XXX as labels:
phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"
The Wizard will print out a list of available labels like this:
Sorry, the label XXX does not exist as an amplitude array in
the input_data_file ImportRawData_run_8_/w2_PHX.mtz
...available labels are: ['w2', 'SIGw2', 'None']
Then you know that the correct command is:
phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"
- The AutoBuild Wizard cannot build modified residues. If you supply a
model with modified residues, these will be taken out of the chain
and treated as ligands, and the chain will be broken at that point.
By default the modified residues will be added to your model just
before refinement and a cif definitions file will be automatically
generated for these residues. You can also add these residues with
the input_lig_file_list procedure if you want.
- The AutoBuild Wizard will not build very short chains unless you set
the variable group_ca_length (default=4 for building a model from
scratch) to a smaller number. The shortest chain that will be built
is group_ca_length. If you use rebuild_in_place, then the default
shortest chain allowed is 1 residue, so any part of a model you
supply is rebuilt.
- By default the AutoBuild Wizard splits jobs into one or more parts
(determined by the parameter "nbatch") and runs them as
sub-processes. These may run sequentially or in parallel, depending
on the value of the parameter "nproc" . In some cases the running of
sub-processes can lead to timing errors in which a file is not
written fully before it is to be read by the next process. This
appears more often when jobs are run on nfs-mounted disks than on a
local disk. If this occurs, a solution is to set the parameter
"nbatch=1" so that the jobs are not run as sub-processes. You can also
specify "number_of_parallel_models=1" which will do much the same
thing. Note that changing the value of "nbatch" will normally change
the results of running the Wizard. (Changing the value of "nproc"
does not change the results, it changes only how many jobs are run at
once.)
- In many versions of the shell tcsh (and sh), the length of the shell
variable PATH is limited (for example to 4096 characters). If your
PATH is quite long then when AutoBuild runs a sub-process, it may
accidentally increase the PATH to a value that is over the limit. The
symptom is that you get a message like "Word too long". If this
happens, try 'echo $PATH' to see if it is very long, and if so see
if you can remove some entries from it. Alternatively, you may want to
shorten your path in PHENIX by specifying: remove_path_word_list='coot
cns ccp4' (add as many paths as you have but do not need within
PHENIX). You may also want to install a new version of tcsh, which
will allow a much longer path. You can get a new version from
ftp://ftp.astron.com/pub/tcsh/
- The size of the asymmetric unit in the SOLVE/RESOLVE portion of the
AutoBuild wizard is limited by the memory in your computer and the
binaries used. The Wizard is supplied with regular-size ("", size=6),
giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge
("_extra_huge", size=36). Larger-size versions can be obtained on
request.
- The AutoBuild Wizard can take most settings of most space groups.
However, it can only use the hexagonal setting of rhombohedral space
groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups
114-119 (not found in macromolecular crystallography) even in the
standard setting, due to difficulties with the use of asuset in the
version of ccp4 libraries used in PHENIX for these settings and space
groups.
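The requirement noted above that residue numbers not decrease along a chain can be checked before starting a run. The following is a minimal sketch in Bourne shell (sh), assuming standard fixed-column PDB ATOM records (chain ID in column 22, residue number in columns 23-26). The name check_residue_order is a hypothetical helper, not a PHENIX command. It looks only at ATOM records and ignores insertion codes, so a residue renamed from 250 to 116A as suggested above passes the check.

```shell
#!/bin/sh
# Sketch only: helper name is illustrative and not part of PHENIX.

# Report every place where the residue number decreases within a chain.
# Exit status is 0 if the order is acceptable and 1 otherwise.
check_residue_order() {
  awk '
    /^ATOM  / {
      chain = substr($0, 22, 1)          # chain ID, column 22
      resnum = substr($0, 23, 4) + 0     # residue number, columns 23-26
      if ((chain in last) && resnum < last[chain]) {
        printf "chain %s: residue %d follows residue %d\n", chain, resnum, last[chain]
        bad = 1
      }
      last[chain] = resnum
    }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, check_residue_order coords.pdb prints one line for each out-of-order residue, and prints nothing if the numbering is acceptable.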
Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read, and P.D. Adams. Acta Cryst. D64, 61-69 (2008).
Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, P.D. Adams, N.W. Moriarty, P.H. Zwart, R.J. Read, D. Turk, and L.-W. Hung. Acta Cryst. D63, 597-610 (2007).
Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 1174-82 (2003).
Using prime-and-switch phasing to reduce model bias in molecular replacement. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 60, 2144-9 (2004).
Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 58, 2213-5 (2002).
Maximum-likelihood density modification. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 56, 965-72 (2000).
Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 58, 2082-6 (2002).
Statistical density modification using local pattern matching. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 1688-701 (2003).
Maximum-likelihood density modification using pattern recognition of structural motifs. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 57, 1755-62 (2001).
Map-likelihood phasing. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 57, 1763-75 (2001).
Automated side-chain model building and sequence assignment by template matching. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 45-9 (2003).
Automated main-chain model building by template matching and iterative fragment extension. T.C. Terwilliger. Acta Crystallogr D Biol Crystallogr 59, 38-44 (2003).
- autobuild
- data = None Datafile. This file can be a .sca or mtz or other standard file. The Wizard will guess the column identification. You can specify the column labels to use with: input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'. Substitute any labels you do not have with None. If you only have myFP and mysigFP you can just say input_labels='myFP mysigFP'. If you have free R flags, phase information or HL coefficients that you want to use then an mtz file is required. If this file contains phase information, this phase information should be experimental (i.e., MAD/SAD/MIR etc), and should not be density-modified phases (enter any files with density-modified phases as input_map_file instead). NOTE: If you supply HL coefficients they will be used in phase recombination. If you supply PHIB or PHIB and FOM and not HL coefficients, then HL coefficients will be derived from your PHIB and FOM and used in phase recombination. If you also specify a hires data file, then FP and SIGFP will come from that data file (and not this one). If an input_refinement_file is specified, then F, Sigma, FreeR_flag (if present) from that file will be used for refinement instead of this one.
- model = None PDB file with starting model. NOTE: If your PDB file has been previously refined, then please make sure that you provide the free R flags that were used in that refinement. These can come from the data file or from the refinement_file.
- seq_file = Auto Text file with 1-letter code of protein sequence. Separate chains with a blank line or line starting with >. Normally you should include one copy of each unique chain. NOTE: if 1 copy of each unique chain is provided it is assumed that there are ncs_copies (could be 1) of each unique chain. If more than one copy of any chain is provided it is assumed that the asymmetric unit contains the number of copies of each chain that are given, multiplied by ncs_copies. So if the sequence file has two copies of the sequence for chain A and one of chain B, the cell contents are assumed to be ncs_copies*2 of chain A and ncs_copies of chain B. If there are unequal numbers of copies of chains, be sure to set solvent_fraction. ADDITIONAL NOTES: 1. lines starting with > are ignored and separate chains 2. FASTA format is fine 3. If you enter a PDB file for rebuilding and it has the sequence you want, then the sequence file is not necessary. NOTE: You can also enter the name of a PDB file that contains SEQRES records, and the sequence from the SEQRES records will be read, written to seq_from_seqres_records.dat, and used as your input sequence. If you have a duplex DNA, enter each strand as a separate chain. NOTE 1: for AutoBuild you can specify start_chains_list on the first line of your sequence file: >> start_chains_list 23 11 5 NOTE 2. Characters such as numbers and non-printing characters in the sequence file are ignored. NOTE 3. Be sure that your sequence file does not have any blank lines in the middle of your sequence, as these are interpreted as the beginning of another chain.
- map_file = Auto MTZ file containing starting map. This file must be a mtz file. The Wizard will guess the column identification. You can specify the column labels to use with: input_map_labels='FP PHIB FOM' Substitute any labels you do not have with None. If you only have myFP and myPHIB you can just say input_map_labels='myFP myPHIB'. This map will be used in the first cycle of model-building. NOTE 1: If use_map_file_as_hklstart=True then this file will be used instead to start density modification. NOTE 2: default for this keyword is Auto, which means "carry out normal process to guess this keyword". This means if you specify "after_autosol" in AutoBuild, AutoBuild will automatically take the value from AutoSol. If you do not want this to happen, you can specify None which means "No file"
- refinement_file = Auto File for refinement. This file can be a .sca or mtz or other standard file. This file will be merged with your data file, with any phase information coming from your data file. If this file has free R flags, they will be used, otherwise if the data file has them, those will be used, otherwise they will be generated. The Wizard will guess the column identification. You can specify the column labels to use with: input_refinement_labels='FP SIGFP FreeR_flag' Substitute any labels you do not have with None. If you only have myFP and mysigFP you can just say input_refinement_labels='myFP mysigFP'. Data file to use for refinement. The data in this file should not be corrected for anisotropy. It will be combined with experimental phase information (if any) from input_data_file for refinement. If you leave this blank, then the data in the input_data_file will be used in refinement. If no anisotropy correction is applied to the data you do not need to specify a datafile for refinement. If an anisotropy correction is applied to the data files, then you should enter an uncorrected datafile for refinement. Any standard format is fine; normally only F and sigF will be used. Bijvoet pairs and duplicates will be averaged. If an mtz file is provided then a free R flag can be read in as well. Any HL coeffs and phase information in this file is ignored. NOTE: default for this keyword is Auto, which means "carry out normal process to guess this keyword". This means if you specify "after_autosol" in AutoBuild, AutoBuild will automatically take the value from AutoSol. If you do not want this to happen, you can specify None which means "No file"
- hires_file = Auto File with high-resolution data. This file can be a .sca or mtz or other standard file. The Wizard will guess the column identification. You can specify the column labels to use with: input_hires_labels='FP SIGFP'.
- crystal_info
- unit_cell = None Enter cell parameters (a b c alpha beta gamma)
- space_group = None Space Group symbol (e.g., C2221 or C 2 2 21)
- solvent_fraction = None Solvent fraction in crystals (0 to 1). This is normally set automatically from the number of NCS copies and the sequence. If your crystal has unequal numbers of different chains, then be sure to set the solvent fraction.
- chain_type = *Auto PROTEIN DNA RNA You can specify whether to build protein, DNA, or RNA chains. At present you can only build one of these in a single run. If you have both DNA and protein, build one first, then run AutoBuild again, supplying the prebuilt model in the "input_lig_file_list" and build the other. NOTE: default for this keyword is Auto, which means "carry out normal process to guess this keyword". The process is to look at the sequence file and/or input pdb file to see what the chain type is. If there is more than one type, the type with the larger number of residues is guessed. If you want to force the chain_type, then set it to PROTEIN RNA or DNA.
- resolution = 0 High-resolution limit. Used as resolution limit for density modification and as general default high-resolution limit. If resolution_build or refinement_resolution are set then they override this for model-building or refinement. If overall_resolution is set then data beyond that resolution is ignored completely. Zero means keep everything.
- dmax = 500 Low-resolution limit
- overall_resolution = 0 If overall_resolution is set, then all data beyond this is ignored. NOTE: this is only suggested if you have a very big cell and need to truncate the data to allow the wizard to run at all. Normally you should use 'resolution' and 'resolution_build' and 'refinement_resolution' to set the high-resolution limit
- sequence = None Plain text containing 1-letter code of protein sequence. Same as seq_file except the sequence is read directly, not from a file. If both are given, seq_file is ignored.
- input_files
- input_labels = None Labels for input data columns
- input_hires_labels = None Labels for input hires file (FP SIGFP FreeR_flag)
- input_map_labels = None Labels for input map coefficient columns (FP PHIB FOM) NOTE: FOM is optional (set to None if you wish)
- input_refinement_labels = None Labels for input refinement file columns (FP SIGFP FreeR_flag)
- input_ha_file = None If the flag "truncate_ha_sites_in_resolve" is set then density at sites specified with input_ha_file is truncated to improve the density modification procedure. Additionally these sites are added to input_lig_file_list. NOTE: if the chain ID for atoms in this file is A then the chain ID will be renamed to Z so that the residue number and chain ID combination does not duplicate residues to be built. NOTE: if a hires_file is supplied, this input_ha_file and also any atoms in chain Z of input PDB file will be ignored unless force_input_ha is set to True.
- force_input_ha = False If a hires_file is supplied, input_ha_file and any atoms in chain Z of input pdb file will be ignored unless force_input_ha is set to True.
- include_ha_in_model = True Add contents of input_ha_file to the working model just before refinement (by adding it to input_lig_file_list).
- cif_def_file_list = None You can enter any number of CIF definition files. These are normally used to tell phenix.refine about the geometry of a ligand or unusual residue. You usually will use these in combination with "PDB file with metals/ligands" (keyword "input_lig_file_list" ) which allows you to attach the contents of any PDB file you like to your model just before it gets refined. You can use phenix.elbow to generate these if you do not have a CIF file and one is requested by phenix.refine
- input_lig_file_list = None This script adds the contents of these PDB files to each model just prior to refinement. Normally you might use this to put in any heavy-atoms that are in the refined structure (for example the heavy atoms that were used in phasing), or to add a ligand to your model. (By default if you supply input_ha_file this will be added to your input_lig_file_list.) If the atoms in this PDB file are not recognized by phenix.refine, then you can specify their geometries with a cif definitions file using the keyword "cif_def_file_list". You can easily generate cif definitions for many ligands using phenix.elbow in PHENIX. You can put anything you like in the files in input_lig_file_list, but any atoms that fall within 1.5 A of any atom in the current model will be tossed (not written to the model).
- keep_input_ligands = True You can choose whether to (by default) let the wizard keep ligands by separating them out from the rest of your model and adding them back to your rebuilt model, or alternatively to remove all ligands from your input pdb file before rebuild_in_place.
- keep_input_waters = False You can choose whether to keep input waters (solvent) when using rebuild_in_place. If you keep them, then you should specify either "place_waters=No" or "keep_pdb_atoms=No" because if place_waters=True and keep_pdb_atoms=True then phenix.refine will add waters and then the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.
- keep_pdb_atoms = True If true, keep the model coordinates when model and ligand coordinates are within dist_close_overlap and ligands in input_lig_file_list are being added to the current model. If false, keep instead the ligand coordinates.
- remove_residues_on_special_positions = False If true, remove any residues with atoms on special positions just before refinement.
- refine_eff_file_list = None You can enter any number of refinement parameter files. These are normally used to tell phenix.refine defaults to apply, as well as creating specialized definitions such as unusual amino acid residues and linkages. These parameters override the normal phenix.refine defaults. They themselves can be overridden by parameters set by the Wizard and by you, controlling the Wizard. NOTE: Any parameters set by AutoBuild directly (such as number_of_macro_cycles, high_resolution, etc...) will not be taken from this parameters file. This is useful only for adding extra parameters not normally set by AutoBuild.
- map_file_is_density_modified = True You can specify that the input_map_file has been density modified. (This changes the assumptions on statistics of the map.)
- map_file_fom = None You can specify the FOM of the input map file (useful in cases where the map file has only FWT PHFWT and no FOM column). This FOM is used to set the default smoothing radius for the density modification solvent boundary and also to decide whether extreme density modification is to be applied
- use_map_file_as_hklstart = None You can specify that the file named as input_map_file will be used as starting coefficients for density modification in the first cycle. NOTE: if maps_only=True and input_map_file is set, then use_map_file_as_hklstart will be set to True
- use_map_in_resolve_with_model = False You can specify that the current map file be used as hklstart in density modification with a model.
- identity_from_remark = True You can use sequence identity from remark like this one: REMARK PHASER ENSEMBLE MODEL 0 ID 100.0 Here the identity is 100 percent. Ignored if there is no remark of this kind in the input model.
- input_data_type = None You can specify the data type for shelx files. The options are: amplitudes (same as hklf3) or intensities (same as hklf4)
- aniso
- remove_aniso = True Remove anisotropy from data files before use. Note: map files are assumed to be already corrected and are not affected by this. The input refinement file is also not affected by this.
- b_iso = None Target overall B value for anisotropy correction. Ignored if remove_aniso = False. If None, default is minimum of (max_b_iso, lowest B of datasets, target_b_ratio*resolution)
- max_b_iso = 40. Default maximum overall B value for anisotropy correction. Ignored if remove_aniso = False. Ignored if b_iso is set. If used, default is minimum of (max_b_iso, lowest B of datasets, target_b_ratio*resolution)
- target_b_ratio = 10. Default ratio of target B value to resolution for anisotropy correction. Ignored if remove_aniso = False. Ignored if b_iso is set. If used, default is minimum of (max_b_iso, lowest B of datasets, target_b_ratio*resolution)
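The default target B value described for b_iso, max_b_iso and target_b_ratio can be sketched as follows. This is a minimal illustration of the stated rule (minimum of max_b_iso, lowest B of datasets, target_b_ratio*resolution), not the Wizard's own code; the function name default_b_iso and the argument dataset_b_values are hypothetical.

```python
def default_b_iso(dataset_b_values, resolution, max_b_iso=40.0,
                  target_b_ratio=10.0, b_iso=None, remove_aniso=True):
    """Return the target overall B for anisotropy correction, or None if unused.

    dataset_b_values: overall B values of the input datasets (hypothetical input).
    resolution: high-resolution limit in Angstroms.
    """
    if not remove_aniso:
        # All anisotropy keywords are ignored when remove_aniso=False.
        return None
    if b_iso is not None:
        # An explicit b_iso overrides max_b_iso and target_b_ratio.
        return float(b_iso)
    # Documented default: the smallest of the three candidate values.
    return min(max_b_iso, min(dataset_b_values), target_b_ratio * resolution)


# Two datasets with overall B of 55 and 32 at 2.5 A resolution:
# candidates are 40 (max_b_iso), 32 (lowest B) and 25 (10 * 2.5).
print(default_b_iso([55.0, 32.0], resolution=2.5))
```

With these inputs the resolution-based term is the smallest, so it sets the target; at lower resolution (for example 4.5 A) max_b_iso or the lowest dataset B would take over.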
- decision_making
- acceptable_r = 0.25 Used to decide whether the model is acceptable enough to quit if it is not improving much. A good value is 0.25
- r_switch = 0.4 R-value criterion for deciding whether to use the R-value or the map correlation as the measure of model quality. A good value is 0.40
- semi_acceptable_r = 0.3 Used to decide whether the model is acceptable enough to skip rebuilding the model from scratch and focus on adding loops and extending it. A good value is 0.3
- reject_weak = False You can rebuild or remove just the residues in weak density. This will reject residues with density < 0.5 * mean - SD, where the density, mean and SD are for either main-chain or all atoms in residues. If set, overrides min_cc_res_rebuild and worst_percent_res_rebuild.
- min_weak_z = 0.2 Minimum number of sd of rho above 0.5*mean of all residues for keeping weak residues if reject_weak=True
- min_cc_res_rebuild = 0.4 You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
- min_seq_identity_percent = 50 The sequence in your input PDB file will be adjusted to match the sequence in your sequence file (if any). If there are insertions/deletions in your model and the wizard does not seem to identify them, you can split up your PDB file by adding records like this: BREAK You can specify the minimum sequence identity between your sequence file and a segment from your input PDB file to consider the sequences to be matched. Default is 50.0%. You might want a higher number to make sure that deletions in the sequence are noticed.
- dist_close = None If main-chain atom rmsd is less than dist_close then crossover between chains in different models is allowed at this point. If you input a negative number the defaults will be used
- dist_close_overlap = 1.5 Model or ligand coordinates but not both are kept when model and ligand coordinates are within dist_close_overlap and ligands in input_lig_file_list are being added to the current model. NOTE: you might want to decrease this if your ligand atoms get removed by the wizard. Default=1.5 A
- loop_cc_min = 0.4 You can specify the minimum correlation of density from a loop with the map.
- group_ca_length = 4 In resolve building you can specify how short a fragment to keep. Normally 4 or 5 residues should be the minimum.
- group_length = 2 In resolve building you can specify how many fragments must be joined to make a connected group that is kept. Normally 2 fragments should be the minimum.
- include_molprobity = False This command is currently disabled. You can choose to include the clash score from MolProbity as one of the scoring criteria in comparing and merging models. The score is combined with the model-map correlation CC by summing in a weighted clashscore. If clashscore for a residue has a value < ok_molp_score then its value is (clashscore-ok_molp_score)*scale_molp_score, otherwise its value is zero.
- ok_molp_score = None You can choose to include the clash score from MolProbity as one of the scoring criteria in comparing and merging models. The score is combined with the model-map correlation CC by summing in a weighted clashscore. If clashscore for a residue has a value < ok_molp_score (the threshold defined by ok_molp_score) then its value is (clashscore-ok_molp_score)*scale_molp_score, otherwise its value is zero.
- scale_molp_score = None You can choose to include the clash score from MolProbity as one of the scoring criteria in comparing and merging models. The score is combined with the model-map correlation CC by summing in a weighted clashscore. If clashscore for a residue has a value < ok_molp_score then its value is (clashscore-ok_molp_score)*scale_molp_score, otherwise its value is zero.
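The reject_weak rule in the decision_making keywords above (reject residues with density < 0.5 * mean - SD) can be sketched as follows. This is an illustration only: the function name weak_residues and the input dictionary are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator the Wizard uses.

```python
from statistics import mean, pstdev


def weak_residues(density_by_residue):
    """Return the residue labels whose density falls below 0.5*mean - SD.

    density_by_residue: mapping of residue label -> mean density at its
    (main-chain or all) atoms. Labels and values are hypothetical inputs.
    """
    values = list(density_by_residue.values())
    # Assumption: SD is the population standard deviation over all residues.
    cutoff = 0.5 * mean(values) - pstdev(values)
    return [label for label, rho in density_by_residue.items() if rho < cutoff]


densities = {"A1": 1.0, "A2": 1.0, "A3": 1.0, "A4": 1.0, "A5": -1.0}
# mean = 0.6 and SD = 0.8, so the cutoff is about -0.5 and only A5 is flagged.
print(weak_residues(densities))
```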
- density_modification
- add_classic_denmod = None You can run classic density modification with solvent flipping after any other kind of density modification. Note: this keyword cannot be used with an omit map. Note also: requires experimental phases (HL coeffs). Default is False unless extreme_dm=True and FOM is less than fom_for_extreme_dm.
- skip_classic_if_worse_fom = True Skip results of add_classic_denmod if FOM gets worse during density modification
- skip_ncs_in_add_classic = True Skip using NCS in add_classic_denmod (speeds it up)
- thorough_denmod = *Auto True False Choose whether you want to go for thorough density modification when no model is used ("False" speeds it up and for a terrible map is sometimes better)
- hl = False You can choose whether to calculate hl coeffs when doing density modification (True) or not to do so (False). Default is No.
- mask_type = *histograms probability wang classic Choose method for obtaining probability that a point is in the protein vs solvent region. Default is "histograms". If you have a SAD dataset with a heavy atom such as Pt or Au then you may wish to choose "wang" because the histogram method is sensitive to very high peaks. Options are: histograms: compare local rms of map and local skew of map to values from a model map and estimate probabilities. This one is usually the best. probability: compare local rms of map to distribution for all points in this map and estimate probabilities. In a few cases this one is much better than histograms. wang: take points with highest local rms and define as protein. classic: run classical density modification with solvent flipping.
- mask_from_pdb = None You can specify a PDB file to define a mask for the macromolecule in density modification (i.e., the solvent boundary). All points within rad_mask_from_pdb of an atom in the PDB file defined by mask_from_pdb will be considered to be within the macromolecule
- mask_type_extreme_dm = histograms probability *wang classic If FOM of phasing is up to fom_for_extreme_dm_rebuild then defaults for density modification become: mask_type=wang wang_radius=20 mask_cycles=1 minor_cycles=4. Applies to rebuild stages of autobuild. For build use instead fom_for_extreme_dm.
- mask_cycles_extreme_dm = 1 Mask cycles in extreme density modification
- minor_cycles_extreme_dm = 4 Minor cycles in extreme density modification
- wang_radius_extreme_dm = 20. Wang radius in extreme density modification
- precondition = False Precondition density before modification
- minimum_ncs_cc = 0.30 Minimum NCS correlation to keep, except in case of extreme_dm
- extreme_dm = False Turns on extreme density modification if True, or if Auto and FOM is up to fom_for_extreme_dm. Use extreme_dm=True if your phasing is really weak and density modification is not working well. Note: requires input phase information (HL coeffs). Also note: not compatible with ps_in_rebuild, two_fofc_in_rebuild, or two_fofc_denmod_in_rebuild.
- fom_for_extreme_dm_rebuild = 0.10 If extreme_dm is on and FOM of phasing is up to fom_for_extreme_dm_rebuild then defaults for density modification become: mask_type=mask_type_extreme_dm wang_radius=wang_radius_extreme_dm mask_cycles=mask_cycles_extreme_dm minor_cycles=minor_cycles_extreme_dm Applies to rebuild stages of autobuild. For build use instead fom_for_extreme_dm
- fom_for_extreme_dm = 0.35 If extreme_dm is on and FOM of phasing is up to fom_for_extreme_dm then defaults for density modification become: mask_type=wang wang_radius=20 mask_cycles=1 minor_cycles=4. Applies to build stages of autobuild. For rebuild use instead fom_for_extreme_dm_rebuild
- rad_mask_from_pdb = 2 You can define the radius for calculation of the protein mask. Applies only to mask_from_pdb.
- modify_outside_delta_solvent = 0.05 You can set the initial solvent content to be a little lower than calculated when you are running modify_outside_model. Usually 0.05 is fine.
- modify_outside_model = False You can choose whether to modify the density in the "protein" region outside the region specified in your current model by matching histograms with the region that is specified by that model. This can help by raising the density in this protein region up to a value similar to that where atoms are already placed.
- truncate_ha_sites_in_resolve = *Auto True False You can choose to truncate the density near heavy-atom sites at a maximum of 2.5 sigma. This is useful in cases where the heavy-atom sites are very strong, and rarely hurts in cases where they are not. The heavy-atom sites are specified with "input_ha_file" and the radius is rad_mask
- rad_mask = None You can define the radius for calculation of the protein mask. Applies only to truncate_ha_sites_in_resolve. Default is resolution of data.
- s_step = None You can define the increment in s (1/resolution) for phase extension in resolve density modification. Default is 0.005
- res_start = None You can define the resolution for starting phase extension in RESOLVE density modification. Default is resolution to which phase information is available. Please note res_start and map_dmin_start have different effects. The starting resolution within RESOLVE is set by res_start and the step within RESOLVE is set by s_step. These can be used with any density modification procedure. The keyword map_dmin_start creates a set of macro-cycles of RESOLVE density modification where the output map for each cycle is the input_map_file for the next. The map_dmin_start keyword only applies if maps_only=True and only applies to density modification without a model.
- map_dmin_start = None Starting resolution for step-wise macro-cycle density modification. Only applies if maps_only=True and no model is supplied. Please note res_start and map_dmin_start have different effects. The starting resolution within RESOLVE is set by res_start and the step within RESOLVE is set by s_step. These can be used with any density modification procedure. The keyword map_dmin_start creates a set of macro-cycles of RESOLVE density modification where the output map for each cycle is the input_map_file for the next. The map_dmin_start keyword only applies if maps_only=True and only applies to density modification without a model.
- map_dmin_incr = 0.25 Step size (A) for changes in resolution for macro-cycle density modification
- use_resolve_fragments = True This script normally uses information from fragment identification as part of density modification for the first few cycles of model-building. Fragments are identified during model-building. The fragments are used, with weighting according to the confidence in their placement, in density modification as targets for density values.
- use_resolve_pattern = True Local pattern identification is normally used as part of density modification during the first few cycles of model building.
- use_hl_anom_in_denmod = False Default is False (use HL coefficients in density modification). NOTE: if True, you must supply HLanom coefficients. Allows you to specify that HL coefficients including only the phase information from the imaginary (anomalous difference) contribution from the anomalous scatterers are to be used in density modification. Two sets of HL coefficients are produced by Phaser. HLA HLB etc are HL coefficients including the contribution of both the real scattering and the anomalous differences. HLanomA HLanomB etc are HL coefficients including the contribution of the anomalous differences alone. These HL coefficients for anomalous differences alone are the ones that you will want to use in cases where you are bringing in model information that includes the real scattering from the model used in Phaser, such as when you are carrying out density modification with a model or refinement of a model. If use_hl_anom_in_denmod=True then the HLanom HL coefficients from Phaser are used in density modification.
- use_hl_anom_in_denmod_with_model = False See use_hl_anom_in_denmod If use_hl_anom_in_denmod=True then the HLanom HL coefficients from Phaser are used in density modification with a model
- mask_as_mtz = False Defines how omit_output_mask_file ncs_output_mask_file and protein_output_mask_file are written out. If mask_as_mtz=False it will be a ccp4 map. If mask_as_mtz=True it will be an mtz file with map coefficients FP PHIM FOMM (all three required)
- protein_output_mask_file = None Name of map to be written out representing your protein (non-solvent) region. If mask_as_mtz=False the map will be a ccp4 map. If mask_as_mtz=True it will be an mtz file with map coefficients FP PHIM FOMM (all three required)
- ncs_output_mask_file = None Name of map to be written out representing your ncs asymmetric unit. If mask_as_mtz=False the map will be a ccp4 map. If mask_as_mtz=True it will be an mtz file with map coefficients FP PHIM FOMM (all three required)
- omit_output_mask_file = None Name of map to be written out representing your omit region. If mask_as_mtz=False the map will be a ccp4 map. If mask_as_mtz=True it will be an mtz file with map coefficients FP PHIM FOMM (all three required)
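The way extreme_dm, fom_for_extreme_dm and fom_for_extreme_dm_rebuild select density-modification defaults can be sketched as follows. This is a minimal illustration of the thresholds and default values listed above, not the Wizard's implementation; the function name extreme_dm_overrides and the stage labels "build" and "rebuild" are hypothetical, and "FOM is up to the threshold" is interpreted here as FOM <= threshold, which is an assumption.

```python
def extreme_dm_overrides(fom, stage, extreme_dm=True,
                         fom_for_extreme_dm=0.35,
                         fom_for_extreme_dm_rebuild=0.10):
    """Return the density-modification defaults changed by extreme_dm.

    fom: figure of merit of the phasing.
    stage: "build" or "rebuild" (hypothetical labels for the two stages).
    An empty dict means the normal defaults stay in effect.
    """
    if stage not in ("build", "rebuild"):
        raise ValueError("stage must be 'build' or 'rebuild'")
    # Build stages use fom_for_extreme_dm; rebuild stages use the _rebuild value.
    threshold = fom_for_extreme_dm if stage == "build" else fom_for_extreme_dm_rebuild
    # Assumption: "up to" means less than or equal to the threshold.
    if not extreme_dm or fom > threshold:
        return {}
    # Documented defaults of the *_extreme_dm keywords.
    return {"mask_type": "wang", "wang_radius": 20.0,
            "mask_cycles": 1, "minor_cycles": 4}


print(extreme_dm_overrides(0.30, "build"))    # FOM below 0.35: extreme settings
print(extreme_dm_overrides(0.30, "rebuild"))  # FOM above 0.10: normal settings
```

The two thresholds differ because the rebuild stages already have a model; under the defaults listed above, extreme settings in rebuilding are reserved for FOM values of 0.10 or lower.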
- maps
- maps_only = False You can choose whether to skip all model-building and just calculate maps and write out the results. This also runs just 1 cycle and turns on HL coefficients.
- n_xyz_list = None You can specify the grid to use for map calculations.
- model_building
- build_type = *RESOLVE RESOLVE_AND_BUCCANEER You can choose to build models with RESOLVE or with RESOLVE and BUCCANEER, and how many different models to build with RESOLVE. The more you build, the more likely to get a complete model. Note that rebuild_in_place can only be carried out with RESOLVE model-building. For BUCCANEER model building you need CCP4 version 6.1.2 or higher and BUCCANEER version 1.3.0 or higher.
- allow_negative_residues = False Normally the wizard does not allow negative residue numbers, and all residues with negative numbers are rejected when they are read in. You can allow them if you wish.
- highest_resno = None Highest residue number to be considered "placed" in sequence for rebuild_in_place
- semet = False You can specify that the dataset that is used for refinement is a selenomethionine dataset, and that the model should be the SeMet version of the protein, with all SD of MET replaced with Se of MSE.
- use_met_in_align = *Auto True False You can use the heavy-atom positions in input_ha_file as markers for Met SD positions.
- base_model = None You can enter a PDB file with coordinates to be used as a starting point for model-building. These coordinates will be included in the same way as fragments placed by searching for helices and strands in initial model-building. Note the difference from the use of models in consider_main_chain_list, which are merged with models after they are built. NOTE: Only use this if you want to keep the input model and just add to it.
- consider_main_chain_list = None This keyword lets you name any number of PDB files to consider as templates for model-building. Every time models are built, the contents of these files will be merged with them and the best parts will be kept. NOTE: this only uses the main-chain atoms of your PDB files.
- dist_connect_max_helices = None Set maximum distance between ends of helices and other ends to try and connect them in insert_helices.
- edit_pdb = True You can choose to edit the input PDB file in rebuild_in_place to match the input sequence (default=True). NOTE: residues with residue numbers higher than 'highest_resno' are assumed to not have a known sequence and will not be edited. By default the value of 'highest_resno' is the highest residue number from the sequence file, after adding it to the starting residue number from start_chains_list. You can also set it directly.
- helices_strands_only = False You can choose to use a quick model-building method that only builds secondary structure. At low resolution this may be both quicker and more accurate than trying to build the entire structure.
- resolution_helices_strands = 3 Resolution to switch to helices_strands_only
- helices_strands_start = False You can choose to use a quick model-building method that builds secondary structure as a way to get started...then model completion is done as usual. (Contrast with helices_strands_only which only does secondary structure)
- cc_helix_min = None Minimum CC of helical density to map at low resolution when using helices_strands_only
- cc_strand_min = None Minimum CC of strand density to map when using helices_strands_only
- loop_lib = False Use loop library to fit loops. Only applicable for chain_type=PROTEIN
- standard_loops = True Use standard loop fitting
- trace_loops = False Use loop tracing to fit loops. Only applicable for chain_type=PROTEIN
- refine_trace_loops = True Refine loops (real-space) after trace_loops
- density_of_points = None Packing density of points to consider as possible CA atoms in trace_loops. Try 1.0 for a quick run, up to 5 for a much more thorough run. If None, try value depending on value of quick.
- max_density_of_points = None Maximum packing density of points to consider as possible CA atoms in trace_loops.
- cutout_model_radius = None Radius to cut out density for trace_loops If None, guess based on length of loop
- max_cutout_model_radius = 20. Maximum value of cutout_model_radius to try
- padding = 1. Padding for cut out density in trace_loops
- max_span = 30 Maximum length of a gap to try to fill
- max_overlap = None Maximum number of residues from ends to start with. (1=use existing ends, 2=one in from ends etc) If None, set based on value of quick.
- min_overlap = None Minimum number of residues from ends to start with. (1=use existing ends, 2=one in from ends etc)
- include_input_model = True The keyword include_input_model defines whether the input model (if any) is to be crossed with models that are derived from it, and the best parts of each kept. It also defines whether the input model is to be included in combination steps during initial model-building. Note that if multiple_models=True and include_input_model=True then no initial cycle of randomization will be carried out and the keyword multiple_models_starting_resolution is ignored. In most cases you should use include_input_model=True. If you want to generate maximum diversity with multiple-models then you may wish to use include_input_model=False. Also if you want to decrease the amount of bias from your starting model you may wish to use include_input_model=False.
- input_compare_file = None If you are rebuilding a model or already think you know what the model should be, you can include a comparison file in rebuilding. The model is not used for anything except to write out information on coordinate differences in the output log files. NOTE: this feature does not always work correctly.
- merge_models = False You can choose to only merge any input models and write out the resulting model. The best parts of each model will be kept based on model-map correlation. Normally used along with number_of_parallel_models=1
- morph = False You can choose whether to distort your input model in order to match the current working map. This may be useful for MR models that are quite distant from the correct structure.
- morph_main = False You can choose whether to use only main-chain atoms plus c-beta atoms in calculation of shifts in morphing. Default is morph_main=False; use all atoms including side-chain atoms.
- dist_cut_base = 3.0 Tolerance for base pairing (A) for RNA/DNA
- morph_cycles = 2 Number of iterations of morphing each time it is run.
- morph_rad = 7 Smoothing radius for morphing. The density from your model and from the map are calculated with the radius morph_rad, then they are adjusted to overlap optimally
- n_ca_enough_helices = None Set maximum number of CA to add to ends of helices and other ends to try and connect them in insert_helices.
- delta_phi = 20 Approximate angular sampling for search for regular secondary structure in building
- offsets_list = 53 7 23 You can specify an offset for the orientation of the helix and strand templates in building. This is used in generating different starting models.
- all_maps_in_rebuild = False If two_fofc_in_rebuild or two_fofc_denmod_in_rebuild are set you can choose to try both density-modified and two_fofc-based maps in building. Note: Set to True if you specify rebuild_from_fragments=True. Note: not compatible with map_phasing=True.
- ps_in_rebuild = False You can choose to use a prime-and-switch resolve map in all cycles of rebuilding instead of a density-modified map. This is normally used in combination with maps_only to generate a prime-and-switch map. The map coeffs will be in prime_and_switch.mtz
- use_ncs_in_ps = False You can choose to use NCS in prime-and-switch
- remove_outlier_segments_z_cut = 3.0 You can remove any segments that are not assigned to sequence during model-building if the mean density at atomic positions is more than remove_outlier_segments_z_cut sd lower than the mean for the structure.
- refine = True This script normally refines the model during building. Say False to skip refinement
- refine_final_model_vs_orig_data = True This script normally refines the model at the end against the original (non-aniso-corrected) data and writes out a CIF version of the model as well.
- reference_model = None You can specify a reference model for refinement
- resolution_build = 0 Enter the high-resolution limit for model-building. If 0.0, the value of resolution is used as a default.
- restart_cycle_after_morph = 5 Morphing (if morph=True) will go only up to this cycle, and then the morphed PDB file will be used as a starting PDB file from then on, removing all previous models. If restart_cycle_after_morph=0 then the model will be morphed and not rebuilt
- retrace_before_build = False You can choose to retrace your model n_mini times and use a map based on these retraced models to start off model-building. This is the default for rebuilding models if you are not using rebuild_in_place. You can also specify n_iter_rebuild, the number of cycles of retrace-density-modify-build before starting the main build.
- reuse_chain_prev_cycle = True You can choose whether to allow model-building to include atoms from each cycle in the model for the next cycle. This must be True if you use retrace_before_build
- richardson_rotamers = *Auto True False You can choose to use the rotamer library from SC Lovell, JM Word, JS Richardson and DC Richardson (2000) "The Penultimate Rotamer Library" Proteins: Structure Function and Genetics 40 389-408. Typically this works well in RESOLVE model-building for nearly-final models but not as well earlier in the process. Default (Auto) is to use these rotamers for rebuild_in_place but not otherwise.
- rms_random_frag = None Rms random position change added to residues on ends of fragments when extending them. If you enter a negative number, defaults will be used.
- rms_random_loop = None Rms random position change added to residues on ends of loops in tries for building loops. If you enter a negative number, defaults will be used.
- start_chains_list = None You can specify the starting residue number for each of the unique chains in your structure. If you use a sequence file then the unique chains are extracted and the order must match the order of your starting residue numbers. For example, if your sequence file has chains A and B (identical) and chains C and D (identical to each other, but different than A and B) then you can enter 2 numbers, the starting residues for chains A and C. NOTE: you need to specify an input sequence file for start_chains_list to be applied.
- trace_as_lig = False You can specify that in building steps the ends of chains are to be extended using the LigandFit algorithm. This is default for nucleic acid model-building.
- track_libs = False You can keep track of what libraries each atom in a built structure comes from.
- two_fofc_denmod_in_rebuild = False You can choose to use a density-modified sigmaa-weighted 2Fo-Fc map in all cycles of rebuilding instead of a density-modified map. In density modification the density in the region defined by the current model will be truncated at +2sigma to reduce the dominance of parts of the map with model defined. Additionally only 2 mask cycles of 3 minor cycles will be done. Additionally place_waters will be turned off. If the model is highly incomplete this can sometimes allow model-building to work even when it will not for density-modified maps. The map coeffs will be in two_fofc_denmod_map.mtz. You might consider turning on all_maps_in_rebuild as well. Note: Setting two_fofc_denmod_in_rebuild=True will by default set place_waters=False. Set to True if you specify rebuild_from_fragments=True.
- rebuild_from_fragments = False You can use rebuild_from_fragments=True as a shortcut to turn on two_fofc_denmod_in_rebuild and all_maps_in_rebuild and to use your model in each cycle with consider_main_chain_list. If you use rebuild_from_fragments=True you might also want to set i_ran_seed=xxxxx for some integer xxxxx and run the job 10 or 20 times to have a higher chance of success. This approach is designed for cases where you have a small part of your model very accurately placed and want to build the rest of the model.
- two_fofc_in_rebuild = False You can choose to use a sigmaa-weighted 2Fo-Fc map in all cycles of rebuilding instead of a density-modified map. If the model is poor this can sometimes allow model-building in place to work even when it will not for density-modified maps.
- refine_map_coeff_labels = "2FOFCWT PH2FOFCWT" You can pick which map coefficients from phenix.refine to use if two_fofc_in_rebuild=True
- filled_2fofc_maps = True You can choose to use filled 2Fo-Fc maps when two_fofc_in_rebuild is used. Default is True
- map_phasing = False You can choose to use statistical density modification starting with a 2mFo-DFc map, including model information instead of a standard density-modified map with model information. This density modification will include NCS if present. Note: not compatible with all_maps_in_rebuild=True
- use_any_side = True You can choose to have resolve model-building place the best-fitting side chain at each position, even if the sequence is not matched to the map.
- use_cc_in_combine_extend = False You can choose to use the correlation of density rather than density at atomic positions to score models in combine_extend
- sort_hetatms = False Waters are automatically named with the chain of the closest macromolecule if you set sort_hetatms=True. This is for the final model only.
- map_to_object = None You can supply a target position for your model with map_to_object=my_target.pdb. Then at the very end your molecule will be placed as close to this as possible. The center of mass of the autobuild model will be superimposed on the center of mass of my_target.pdb using space group symmetry, taking any match closer than 15 A within 3 unit cells of the original position. The new file will be overall_best_mapped.pdb
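Several of the rebuilding keywords above imply or exclude one another. The following sketch collects only the relationships stated in this keyword list into a small pre-flight check. It is an illustration, not part of AutoBuild: the function name check_rebuild_options and the plain-dictionary representation of the parameters are hypothetical.

```python
def check_rebuild_options(params):
    """Apply documented implied settings and list documented conflicts.

    params: dict of keyword -> value (a hypothetical stand-in for a
    parameters file). Returns (effective_params, list_of_problem_strings).
    """
    p = dict(params)
    problems = []

    # rebuild_from_fragments is described as a shortcut for these two keywords.
    if p.get("rebuild_from_fragments"):
        p["two_fofc_denmod_in_rebuild"] = True
        p["all_maps_in_rebuild"] = True

    # two_fofc_denmod_in_rebuild sets place_waters=False by default,
    # so only fill it in when the user has not set place_waters.
    if p.get("two_fofc_denmod_in_rebuild") and "place_waters" not in params:
        p["place_waters"] = False

    # Documented incompatibilities.
    if p.get("all_maps_in_rebuild") and p.get("map_phasing"):
        problems.append("all_maps_in_rebuild is not compatible with map_phasing")
    for key in ("ps_in_rebuild", "two_fofc_in_rebuild", "two_fofc_denmod_in_rebuild"):
        if p.get("extreme_dm") and p.get(key):
            problems.append("extreme_dm is not compatible with " + key)

    # retrace_before_build requires reuse_chain_prev_cycle (default True).
    if p.get("retrace_before_build") and not p.get("reuse_chain_prev_cycle", True):
        problems.append("retrace_before_build requires reuse_chain_prev_cycle=True")

    return p, problems


effective, problems = check_rebuild_options(
    {"rebuild_from_fragments": True, "map_phasing": True})
print(effective["place_waters"], problems)
```

Here rebuild_from_fragments switches on all_maps_in_rebuild, which then conflicts with map_phasing, so the check reports one problem.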
- multiple_models
- combine_only = False Once you have created a set of initial models you can merge them together into a final set. This option is useful if you have split up the creation of multiple models into different directories, and then you have copied all the initial models to one directory for combining.
- multiple_models = False You can build a set of models, all compatible with your data. You can specify how many models with multiple_models_number. If you are using rebuild_in_place you can specify whether to generate starting models or not with multiple_models_starting.
- multiple_models_first = 1 Specify which model to build first
- multiple_models_group_number = 5 You can build several initial models and merge them. Normally 5 initial models is fine.
- multiple_models_last = 20 Specify which model to end with
- multiple_models_number = 20 Specify how many models to build.
- multiple_models_starting = True You can specify how to generate starting models for multiple models. If you are using rebuild_in_place and you specify "True" then the Wizard will rebuild your starting model at the resolution specified in multiple_models_starting_resolution. If you are not using rebuild_in_place the Wizard will always build a starting model at the current resolution.
- multiple_models_starting_resolution = 4 You can set the resolution for rebuilding an initial model. A value of 0.0 will use the resolution of the dataset.
- place_waters_in_combine = None You can choose whether phenix.refine automatically places ordered solvent (waters) during the last cycle of multiple-model generation. This is separate from place_waters, which applies to all other cycles. If None, then value of place_waters will be used.
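The combine_only entry mentions splitting the creation of multiple models across directories. One way to divide the work is to give each job its own range of model numbers through multiple_models_first and multiple_models_last. The sketch below only computes such ranges; the function name split_model_ranges and the idea of one range per job are assumptions for illustration, not a documented Wizard procedure.

```python
def split_model_ranges(multiple_models_number=20, n_jobs=4):
    """Split model numbers 1..multiple_models_number into contiguous ranges.

    Returns a list of (first, last) pairs, one per job, that could be used
    as multiple_models_first and multiple_models_last for that job.
    """
    if multiple_models_number < 1 or n_jobs < 1:
        raise ValueError("both arguments must be at least 1")
    # Never create more jobs than there are models.
    n_jobs = min(n_jobs, multiple_models_number)
    base, extra = divmod(multiple_models_number, n_jobs)
    ranges = []
    first = 1
    for job in range(n_jobs):
        # The first 'extra' jobs take one additional model each.
        size = base + (1 if job < extra else 0)
        ranges.append((first, first + size - 1))
        first += size
    return ranges


for first, last in split_model_ranges(20, 4):
    print("multiple_models_first=%d multiple_models_last=%d" % (first, last))
```

After all jobs finish, the initial models would be copied to one directory and merged with combine_only=True, as described above.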
- ncs
- find_ncs = *Auto True False This script normally deduces ncs information from the NCS in chains of models that are built during iterative model-building. The update is done each cycle in which an improved model is obtained. Say False to skip this. See also "input_ncs_file" which can be used to specify NCS at the start of the process. If find_ncs="No" then only this starting NCS will be used and it will not be updated. You can use find_ncs="No" to specify exactly what residues will be used in NCS refinement and exactly what NCS operators to use in density modification. You can use the function $PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to help you set up an input_ncs_file that has your specifications in it. NOTE: if an input map_file is provided then if no ncs is found from a model, ncs will be searched for in the density of that map.
- input_ncs_file = None You can enter NCS information in 3 ways: (1) an ncs_spec file produced by AutoSol or AutoBuild with NCS information; (2) a heavy-atom PDB file that contains ncs in the heavy-atom sites; (3) a PDB file with a model that contains chains with NCS. The wizard will derive NCS information from any of these if specified. See also "find_ncs" which determines whether the wizard will update NCS from models that are built during iterative building.
- ncs_copies = None Number of copies of the molecule in the au (note: only one type of molecule allowed at present)
- ncs_refine_coord_sigma_from_rmsd = False You can choose to use the current NCS rmsd as the value of the sigma for NCS restraints. See also ncs_refine_coord_sigma_from_rmsd_ratio
- ncs_refine_coord_sigma_from_rmsd_ratio = 1 You can choose to multiply the current NCS rmsd by this value before using it as the sigma for NCS restraints See also ncs_refine_coord_sigma_from_rmsd
- no_merge_ncs_copies = False Normally False (do merge NCS copies). If True, then do not use each NCS copy to try to build the others.
- optimize_ncs = True This script normally deduces ncs information from the NCS in chains of models that are built during iterative model-building. Optimize NCS adds a step to try and make the molecule formed by NCS as compact as possible, without losing any point-group symmetry.
- use_ncs_in_build = True Use NCS information in the model assembly stage of model-building. Also if no_merge_ncs_copies is not set, then use each NCS copy to try to build the others.
- ncs_in_refinement = *torsion cartesian None Use torsion_angle refinement of NCS. Alternative is cartesian or None (None will use phenix.refine default)
- omit
- composite_omit_type = *None simple_omit refine_omit sa_omit iterative_build_omit Your choices of types of OMIT maps are: None: normal operation, no omit. simple_omit: omit the atoms in the OMIT region in calculating a sigmaA-weighted 2mFo-DFc map, with no refinement. refine_omit: as simple_omit, but refine with standard refinement. sa_omit: omit the atoms in the OMIT region, carry out simulated-annealing refinement, then calculate a sigmaA-weighted 2mFo-DFc map. iterative_build_omit: set the occupancy of atoms in the OMIT region to 0 throughout an entire iterative model-building, density modification and refinement process (takes a long time). All these omit map types are available as composite omit maps (default) or as omit maps around a region defined by a PDB file (using omit_box_pdb_list). The resulting OMIT map will be in the directory OMIT with file name resolve_composite_map.mtz. This mtz file contains the map coefficients to create the OMIT map. The file "omit_region.mtz" contains the coefficients for a map showing the boundaries of the OMIT region.
- n_box_target = None You can tell the Wizard how many omit boxes to try and set up (but it will not necessarily choose your number because it has to be nicely divisible into boxes that fit your asymmetric unit). A suitable number is 24. The larger the number of boxes, the better the map will be, but the longer it will take to calculate the map.
- n_cycle_image_min = 3 Pattern recognition (resolve_pattern) and fragment identification ("image based density modification") are used as part of the density modification process. These are normally only useful in the first few cycles of iterative model-building. This script tries model-building both with and without including image information, and proceeds with the most complete model. Once at least n_cycle_image_min cycles have been carried out with image information, if the image-based map results in a less-complete model than the one without image information, image information is no longer included.
- n_cycle_rebuild_omit = 10 Model-building is normally carried out using the "best" available map. If omit_on_rebuild is True, then on every n_cycle_rebuild_omit-th cycle of model rebuilding a composite omit map is used instead. If you specify 0 and omit_on_rebuild is True, omit maps will be used every cycle. Normally every 10th cycle is optimal.
- offset_boundary = 2. Specify the boundary (in A) around atoms in omit_box_pdb for definition of the omit region. Contrast with omit_boundary, which applies to composite omit.
- omit_boundary = 2. Specify the boundary (in A) around atoms in omit_boxes for definition of the omit region. Contrast with offset_boundary, which applies to omit_box_pdb.
- omit_box_start = 0 To carry out omit in only some of the omit boxes, use omit_box_start and omit_box_end.
- omit_box_end = 0 To carry out omit in only some of the omit boxes, use omit_box_start and omit_box_end.
- omit_box_pdb_list = None This keyword applies if you have set omit_region_specification to "omit_around_pdb". To automatically set an OMIT region, specify one or more PDB files with omit_box_pdb_list. The omit region boundaries will be the limits in x, y and z of the atoms in this file, plus a border of offset_boundary. To use only some of the atoms in the file, specify values for the starting residue, ending residue and chain to omit (omit_res_start_list, omit_res_end_list and omit_chain_list). If you specify more than one file (or if you specify more than one segment of a file with omit_chain_list or omit_res_start_list and omit_res_end_list) then a set of omit runs will be carried out and combined into one composite omit.
- omit_chain_list = None You can choose to omit just a portion of your model with the keywords omit_res_start_list 3 omit_res_end_list 4 omit_chain_list chain1 (use "" to select all chains). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by listing them separated by spaces. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain_list will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.
- omit_offset_list = 0 0 0 0 0 0 To carry out one iterative build omit, with a region defined in grid units, enter nxs,nxe,nys,nye,nzs,nze in omit_offset_list.
- omit_on_rebuild = False You can specify whether to use an omit map for building the model on rebuild cycles. Default is True if you start with a model, False if you are building a model from scratch. The omit map is calculated every n_cycle_rebuild_omit cycles
- omit_selection = None Selection string defining atoms in input pdb to be used to define the OMIT region. For use with omit_region_specification=omit_selection
- omit_region_specification = *composite_omit omit_around_pdb omit_selection You can specify what region an omit (simple/sa-omit/iterative-build-omit) map is to be calculated for. Composite omit will create a map over the entire asymmetric unit by dividing the asymmetric unit into overlapping boxes, calculating omit maps for each, and splicing all the results together into a single composite omit map. You can tell the Wizard how many omit boxes to try and set up with the keyword "n_box_target" (but it will not necessarily choose your number because it has to be nicely divisible into boxes that fit your asymmetric unit). Omit around PDB will omit around the region defined by the PDB file(s) you enter for omit_box_pdb (or around the residues in that PDB file that you specify). If you specify omit_around_pdb then you must enter a pdb file to omit around. If you specify omit_selection you must enter a selection string in omit_selection
- omit_res_start_list = None You can choose to omit just a portion of your model with the keywords omit_res_start_list 3 omit_res_end_list 4 omit_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by listing them separated by spaces. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain_list will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.
- omit_res_end_list = None You can choose to omit just a portion of your model with the keywords omit_res_start_list 3 omit_res_end_list 4 omit_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by listing them separated by spaces. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain_list will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.
- rebuild_in_place
- min_seq_identity_percent_rebuild_in_place = 95 Minimum sequence identity to use rebuild_in_place by default
- n_cycle_rebuild_in_place = None Number of cycles for rebuild_in_place for multiple models only
- n_rebuild_in_place = 1 You can choose how many times to rebuild your model in place with rebuild_in_place
- rebuild_chain_list = None You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3 rebuild_res_end_list 4 rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain_list will be rebuilt. The smallest region that can be rebuilt is 4 residues.
- rebuild_in_place = *Auto True False You can choose to rebuild your model while fixing the sequence alignment by iteratively rebuilding segments within the model. This is done n_rebuild_in_place times, then the models are recombined, taking the best-fitting parts of each. Crossovers are allowed where the main-chain atom rmsd is less than dist_close. Note that the sequence of the input model must match the supplied sequence closely enough to allow a clear alignment. Also, this method does not build any new chain; it just moves the existing model around. Normally this procedure is useful if the model is greater than 95% identical with the target sequence. You can include information directly from the starting model if you want with the keyword include_input_model. Then this model will be recombined with the models that are built based on it. Note that this requires that the input model have a sequence that is identical to the model to be rebuilt. You can also rebuild just a portion of the model with the keywords rebuild_res_start_list 3 rebuild_res_end_list 4 rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. NOTE: if a region cannot be rebuilt the original coordinates will be preserved for that region.
- rebuild_near_chain = None You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.
- rebuild_near_dist = 7.5 You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.
- rebuild_near_res = None You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.
- rebuild_res_end_list = None You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3 rebuild_res_end_list 4 rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain_list will be rebuilt. The smallest region that can be rebuilt is 4 residues.
- rebuild_res_start_list = None You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3 rebuild_res_end_list 4 rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain_list will be rebuilt. The smallest region that can be rebuilt is 4 residues.
- rebuild_side_chains = False You can choose to replace side chains (with extend_only) before rebuilding the model (not normally used)
- redo_side_chains = True You can choose whether AutoBuild should consider replacing all your side chains in rebuild_in_place, taking new ones if they fit the density better. If True, this is applied to all side chains, not only those that are rebuilt.
- replace_existing = True In rebuild_in_place the usual default is to force the replacement of all residues, even if the rebuilt ones are not as good a fit as the original. The rebuilt model is then crossed with the original model (if include_input_model=True) and the better parts of each are then kept. You can override the replacement of all residues in the initial model-building by saying "False" (do not force replacement of residues, keep whatever is better). Additionally if you set the "touch_up" flag then the default is "True": keep whatever is better.
- delete_bad_residues_only = False You can simply delete the worst parts of your model and write out the resulting model with delete_bad_residues_only=True. The criteria used are the ones set with touch_up. Any residues that would be rebuilt by touch_up=True will be deleted by delete_bad_residues_only. NOTE: delete_bad_residues_only ignores ligands, waters etc., so you may need to put them back in afterwards.
- touch_up = False You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). This is set with min_cc_residue_rebuild=0.82. Alternatively you can rebuild the worst percentage of residues: worst_percent_res_rebuild=6. If a value is set for both of these then residues qualifying in either way are rebuilt. NOTE: touch_up is only available with rebuild_in_place.
- touch_up_extra_residues = None Number of residues on each side of the residues identified in touch_up that you want to rebuild. Normally you will want to rebuild one or more on each side.
- worst_percent_res_rebuild = 2 You can rebuild just the worst parts of your model by setting touch_up=True. You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
- smooth_range = None You can specify the number of residues to smooth over in making choices for touch_up and delete_bad_residues_only. Typically use 3 or 5.
- smooth_minimum_length = None If specified, then any segments remaining after smoothing that are shorter than smooth_minimum_length will be removed.
- refinement
- refine_b = True You can choose whether phenix.refine is to refine individual atomic displacement parameters (B values)
- refine_se_occ = True You can choose to refine the occupancy of SE atoms in a SEMET structure (default=True). This only applies if semet=true
- skip_clash_guard = True Skip refinement check for atoms that clash
- correct_special_position_tolerance = None Adjust the tolerance for the special position check. If 0., then the check for clashes near special positions is not carried out. This sometimes allows phenix.refine to continue even if an atom is near a special position. If 1., then checks within 1 A of special positions. If None, then uses the phenix.refine default (1).
- use_mlhl = True This script normally uses information from the input file (HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
- generate_hl_if_missing = False This script normally uses information from the input file (HLA HLB HLC HLD) in refinement. Say No to not generate HL coeffs from input phases.
- place_waters = True You can choose whether phenix.refine automatically places ordered solvent (waters) during the refinement process.
- refinement_resolution = 0 Enter the high-resolution limit for refinement only. This high-resolution limit can be different than the high-resolution limit for other steps. The default ("None" or 0.0) is to use the overall high-resolution limit for this run (as set by resolution)
- ordered_solvent_low_resolution = None You can choose what resolution cutoff to use for placing ordered solvent in phenix.refine. If the resolution of refinement is greater than this cutoff, then no ordered solvent will be placed, even if refinement.main.ordered_solvent=True.
- link_distance_cutoff = 3 You can specify the maximum bond distance for linking residues in phenix.refine called from the wizards.
- r_free_flags_fraction = 0.1 Maximum fraction of reflections in the free R set. You can choose the maximum fraction of reflections in the free R set and the maximum number of reflections in the free R set. The number of reflections in the free R set will be up to the lower of the values defined by these two parameters.
- r_free_flags_max_free = 2000 Maximum number of reflections in the free R set. You can choose the maximum fraction of reflections in the free R set and the maximum number of reflections in the free R set. The number of reflections in the free R set will be up to the lower of the values defined by these two parameters.
- r_free_flags_use_lattice_symmetry = True When generating r_free_flags you can decide whether to include lattice symmetry (good in general, necessary if there is twinning).
- r_free_flags_lattice_symmetry_max_delta = 5 You can set the maximum deviation of distances in the lattice that are to be considered the same for purposes of generating a lattice-symmetry-unique set of free R flags.
- allow_overlapping = None Default is None (set automatically, normally False unless S or Se atoms are the anomalously-scattering atoms). You can allow atoms in your ligand files to overlap atoms in your protein/nucleic acid model. This overrides 'keep_pdb_atoms'. Useful in early stages of model-building and refinement. The ligand atoms get the altloc indicator 'L'. NOTE: The ligand occupancy will be refined by default if you set allow_overlapping=True (because of the altloc indicator). You can turn this off with fix_ligand_occupancy=True.
- fix_ligand_occupancy = None If allow_overlapping=True then ligand occupancies are refined as a group. You can turn this off with fix_ligand_occupancy=true NOTE: has no effect if allow_overlapping=False
- remove_outlier_segments = True You can remove any segments that are not assigned to sequence if their mean B values are more than remove_outlier_segments_z_cut sd higher than the mean for the structure. NOTE: this is done after refinement, so the R/Rfree are no longer applicable; the remarks in the PDB file are removed
- twin_law = None You can specify a twin law for refinement like this: twin_law='-h,k,-l'
- max_occ = None You can choose to set the maximum value of occupancy for atoms that have their occupancies refined. Default is None (use default value of 1.0 from phenix.refine)
- refine_before_rebuild = True You can choose to refine the input model before rebuilding it
- refine_with_ncs = True This script can allow phenix.refine to automatically identify NCS and use it in refinement. NOTE: ncs refinement and placing waters automatically are mutually exclusive at present.
- refine_xyz = True You can choose whether phenix.refine is to refine coordinates
- s_annealing = False You can choose to carry out simulated annealing during the first refinement after initial model-building
- skip_hexdigest = False You may wish to ignore the hexdigest of the free R flags in your input PDB file if (1) the dataset you provide is not identical to the one that you refined with (but has the same free R flags), or (2) you are providing both an input_data_file and an input_refinement_file or input_hires_file. In the second case, the resulting composite file may not have the same hexdigest even though the free R flags are copied over. The default is to set skip_hexdigest=True for case #2. For case #1 you have to tell the Wizard to skip the hexdigest (because it cannot know about this).
- use_hl_anom_in_refinement = False See use_hl_anom_in_denmod. If use_hl_anom_in_refinement=True then the HLanom HL coefficients from Phaser are used in refinement
- thoroughness
- build_outside = True Define whether to use the BuildOutside module in model_building
- connect = True Define whether to use the connect module in model_building. This module tries to connect nearby chains with loops, without using the sequence. This is different than fit_loops (which uses the sequence to identify the exact number of residues in the loop).
- extensive_build = False You can choose whether to build a new model on every cycle and carry out extra model-building steps every cycle. Default is False (build a new model on first cycle, after that carry out extra steps).
- fit_loops = True You can fit loops automatically if sequence alignment has been done.
- insert_helices = True Define whether to use the insert_helices module in model_building. This module tries to insert helices identified with find_helices_strands into the current working model. This can be useful as the standard build sometimes builds strands into helical density at low resolution.
- n_cycle_build = None Choose the number of cycles of building and chain extension during each cycle of model-building (default of 1).
- n_cycle_build_max = 6 Maximum number of cycles for iterative model-building, starting from experimental phases without a model. Even if a satisfactory model is not found, a maximum of n_cycle_build_max cycles will be carried out.
- n_cycle_build_min = 1 Minimum number of cycles for iterative model-building, starting from experimental phases without a model. Even if a satisfactory model is found, n_cycle_build_min cycles will be carried out.
- n_cycle_rebuild_max = 15 Maximum number of cycles for iterative model-rebuilding, starting from a model. Even if a satisfactory model is not found, a maximum of n_cycle_rebuild_max cycles will be carried out.
- n_cycle_rebuild_min = 1 Minimum number of cycles for iterative model-rebuilding, starting from a model. Even if a satisfactory model is found, n_cycle_rebuild_min cycles will be carried out.
- n_mini = 10 You can choose how many times to retrace your model in "retrace_before_build"
- n_random_frag = 0 In resolve building you can randomize each fragment slightly so as to generate more possibilities for tracing based on extending it.
- n_random_loop = 3 Number of randomized tries from each end for building loops. If 0, then one try. If N, then N additional tries with randomization based on rms_random_loop.
- n_try_rebuild = 2 Number of attempts to build each segment of chain
- ncycle_refine = 3 Choose number of refinement cycles (3)
- number_of_models = None This parameter lets you choose how many initial models to build with RESOLVE within a single build cycle. This parameter is now superseded by number_of_parallel_models, which sets the number of models (but now entire build cycles) to carry out in parallel. None or zero means set it automatically. That is what you normally should use. The number_of_models is by default set to 1 and number_of_parallel_models is set to the value of nbatch (typically 4).
- number_of_parallel_models = 0 This parameter lets you choose how many models to build in parallel. None or 0 means set it automatically. That is what you normally should use. You can set this to 1 to prevent the wizard from running multiple jobs in parallel
- skip_combine_extend = False You can choose whether to skip the combine-extend step in model-building if only one model is available
- fully_skip_combine_extend = False You can choose whether to skip the combine-extend step in model-building in all cases
- thorough_loop_fit = True Try many conformations and accept them even if the fit is not perfect? If you say True the parameters for thorough loop fitting are: n_random_loop=100 rms_random_loop=0.3 rho_min_main=0.5 while if you say False those for quick loop fitting are: n_random_loop=20 rms_random_loop=0.3 rho_min_main=1.0
- general
- coot_name = "coot" If your version of coot is called something else, then you can specify that here.
- i_ran_seed = 72432 Random seed (positive integer) for model-building and simulated annealing refinement
- raise_sorry = False You can have any failure end with a Sorry instead of simply printout to the screen
- background = True When you specify nproc=nn, you can run the jobs in background (default if nproc is greater than 1) or foreground (default if nproc=1). If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
- check_wait_time = 1.0 You can specify the length of time (seconds) to wait between checking for subprocesses to end
- max_wait_time = 1.0 You can specify the length of time (seconds) to wait when looking for a file. If you have a cluster where jobs do not start right away you may need a longer time to wait. The symptom of too short a wait time is 'File not found'
- wait_between_submit_time = 1.0 You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is File exists:....
- cache_resolve_libs = True Use caching of resolve libraries to speed up resolve
- resolve_size = 12 Size for solve/resolve ("", "_giant", "_huge", "_extra_huge" or a number, where 12=giant and 18=huge).
- check_run_command = False You can have the wizard check your run command at startup
- run_command = "sh " When you specify nproc=nn, you can run the subprocesses as jobs in background with sh (default) or submit them to a queue with the command of your choice (e.g., qsub). If you have a multi-processor machine, use sh. If you have a cluster, use qsub or the equivalent command for your system. NOTE: If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If nproc is greater than 1 and you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
- queue_commands = None You can add any commands that need to be run for your queueing system. These are written before any other commands in the file that is submitted to your queueing system. For example on a PBS system you might say: queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS -j oe' queue_commands='#PBS -l walltime=03:00:00' queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put the characters '<path>' in any queue_commands line and this will be replaced by a string of characters based on the path to the run directory. The first character and last two characters of each part of the path will be included, separated by '_', up to 15 characters. For example 'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1' would be represented by: 'tld_W_5_A1__TP0_1'
- condor_universe = vanilla The universe for condor is usually vanilla. However you might need to set it to local for your cluster
- add_double_quotes_in_condor = True You might need to turn on or off double quotes in condor job submission scripts. These are already default elsewhere but may interfere with condor paths.
- condor = None Specifies if the group_run_command is submitting a job to a condor cluster. Set by default to True if group_run_command=condor_submit, otherwise False. For condor job submission mr_rosetta uses a customized script with condor commands. Also uses one_subprocess_level=True
- last_process_is_local = True If true, run the last process in a group in background with sh as part of the job that is submitting jobs. This prevents having the job that is submitting jobs sit and wait for all the others while doing nothing
- skip_r_factor = False You can skip R-factor calculation if refinement is not done and maps_only=True
- test_flag_value = Auto Normally leave this at Auto (default). This parameter sets the value of the test set that is to be free. Normally phenix sets up test sets with values of 0 and 1 with 1 as the free set. The CCP4 convention is values of 0 through 19 with 0 as the free set. Either of these is recognized by default in Phenix. If you have any other convention (for example values of 0 to 19 and test set is 1) then you can specify this with test_flag_value.
- skip_xtriage = False You can bypass xtriage if you want. This will prevent you from applying anisotropy corrections, however.
- base_path = None You can specify the base path for files (default is current working directory)
- temp_dir = None Define a temporary directory (it must exist)
- clean_up = None At the end of the entire run the TEMP directories will be removed if clean_up is True. Files listed in keep_files will not be deleted. If you want to remove files after your run is finished use a command like "phenix.autobuild run=1 clean_up=True"
- print_citations = True Print citations at end of run
- solution_output_pickle_file = None At end of run, write solutions to this file in output directory if defined
- job_title = None Job title in PHENIX GUI, not used on command line
- top_output_dir = None This is used in subprocess calls of wizards and to tell the Wizard where to look for the STOPWIZARD file.
- wizard_directory_number = None This is used by the GUI to define the run number for Wizards. It is the same as desired_run_number. NOTE: this value can only be specified on the command line, as the directory number is set before parameters files are read.
- verbose = False Command files and other verbose output will be printed
- extra_verbose = False Facts and possible commands will be printed every cycle if True
- debug = False You can have the wizard stop with error messages about the code if you use debug. Additionally the output goes to the terminal if you specify "debug=True"
- require_nonzero = True Require non-zero values in data columns to consider reading in.
- remove_path_word_list = None List of words identifying paths to remove from PATH. These can be used to shorten your PATH. For example, "cns ccp4 coot" would remove all paths containing these words except those also containing phenix. Capitalization is ignored.
- fill = False Fill in all missing reflections to resolution res_fill. Applies to density modified maps. See also filled_2fofc_maps in autobuild.
- res_fill = None Resolution for filling in missing data (default = highest resolution of any datafile). Only applies to density modified maps. Default is fill to high resolution of data. Ignored if fill=False
- check_only = False Just read in and check initial parameters. Not for general use
- keep_files = overall_best* AutoBuild_run_*.log List of files that are not to be cleaned up. Wildcards are permitted.
- after_autosol = False You can specify that you want to continue on starting with the highest-scoring run of AutoSol in your working directory.
- nbatch = 3 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.
- nproc = 1 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch. If you set nproc=Auto and your machine has n processors, then it will use n-1 processors, or 1 if only 1 is available
- quick = False Run everything quickly (number_of_parallel_models=1 n_cycle_build_max=1 n_cycle_rebuild_max=1)
- resolve_command_list = None Commands for resolve, one per line, in the form: keyword value (the value can be optional). Examples: coarse_grid; resolution 200 2.0; hklin test.mtz. NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes ('), like this: resolve_command_list="'no_build' 'b_overall 23' "
- resolve_pattern_command_list = None Commands for resolve_pattern, one per line, in the form: keyword value (the value can be optional). Examples: resolution 200 2.0; hklin test.mtz. NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes ('), like this: resolve_pattern_command_list="'resolution 200 20' 'hklin test.mtz' "
- ignore_errors_in_subprocess = False Try to ignore errors in sub-processes. This is useful in cases where a very rare crash occurs and you want to just ignore that step and go on.
- send_notification = False
- notify_email = None
- special_keywords
- write_run_directory_to_file = None Writes the full name of a run directory to the specified file. This can be used as a call-back to tell a script where the output is going to go.
- run_control
- coot = None Set coot to True and optionally run=[run-number] to run Coot with the current model and map for run run-number. In some wizards (AutoBuild) you can edit the model and give it back to PHENIX to use as part of the model-building process. If you just say coot then the highest-numbered existing run will be used.
- ignore_blanks = None ignore_blanks allows you to have a command-line keyword with a blank value like "input_lig_file_list="
- stop = None You can stop the current wizard with "stopwizard" or "stop". If you type "phenix.autobuild run=3 stop" then this will stop run 3 of autobuild.
- display_facts = None Set display_facts to True and optionally run=[run-number] to display the facts for run run-number. If you just say display_facts then the facts for the highest-numbered existing run will be shown.
- display_summary = None Set display_summary to True and optionally run=[run-number] to show the summary for run run-number. If you just say display_summary then the summary for the highest-numbered existing run will be shown.
- carry_on = None Set carry_on to True to carry on with the highest-numbered run from where you left off.
- run = None Set run to n to continue with run n where you left off.
- copy_run = None Set copy_run to n to copy run n to a new run and continue where you left off.
- display_runs = None List all runs for this wizard.
- delete_runs = None List of runs to delete, for example: 1 2 3-5 9:12
- display_labels = None display_labels=test.mtz will list all the labels that identify data in test.mtz. You can use the label strings that are produced in AutoSol to identify which data to use from a datafile, like this: peak.data="F+ SIGF+ F- SIGF-" (here the entire string in quotes counts). You can use the individual labels from these strings as identifiers for data columns in AutoSol or AutoBuild, like this: input_refinement_labels="FP SIGFP FreeR_flags" (here each individual label counts).
- dry_run = False Just read in and check parameter names
- params_only = False Just read in and return parameter defaults. Not for general use
- display_all = False Just read in and display parameter defaults
- non_user_parameters These are obsolete parameters and parameters that the wizards use to communicate among themselves. Not normally for general use.
- gui_output_dir = None Used only by the GUI
- background_map = None You can supply an mtz file (REQUIRED LABELS: FP PHIM FOMM) to use as map coefficients to calculate the electron density in all points in an omit map that are not part of any omitted region. (Default="")
- boundary_background_map = None You can supply an mtz file (REQUIRED LABELS: FP PHIM FOMM) to use as map coefficients to calculate the electron density in all points in the boundary map that are not part of any omitted region. (Default="")
- extend_try_list = True You can fill out the list of parallel jobs to match the number of jobs you want to run at one time, as specified with nbatch.
- force_combine_extend = False You can choose whether to force the combine-extend step in model-building
- model_list = None This keyword lets you name any number of PDB files to consider as starting models for model-building. NOTE: This differs from consider_main_chain_list which will try to add your PDB files EVERY cycle of merging models. In contrast model_list will only do it on the first cycle. NOTE: this only uses the main-chain atoms of your PDB files.
- oasis_cnos = None If you have OASIS and want to run it before resolve density modification, enter the number of C, N, O and S atoms here, like this: "C 250 N 121 O 85 S 3"
- offset_boundary_background_map = None You can set the offset of the boundary_background_map.
- skip_refine = False Skip refinement (used in get_connections/assign_sequence)
- sg = None Obsolete. Use space_group instead
- input_data_file = None Not normally used. Same as "data=".
- input_map_file = Auto Not normally used. Same as map_file.
- input_refinement_file = Auto Not normally used. Same as refinement_file.
- input_pdb_file = None Not normally used. Same as "model=".
- input_seq_file = Auto Not normally used. Same as seq_file.
- super_quick = None Shortcut for a very quick run of autobuild. Same as: number_of_parallel_models=1 refine=false n_cycle_rebuild_max=1 remove_aniso=False skip_xtriage=True ncs_copies=1 find_ncs=false fully_skip_combine_extend=True fit_loops=False redo_side_chains=False insert_helices=False build_outside=False connect=False
- require_test_set = False Require that input data file have a test set
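
The quoting rule given above for resolve_command_list and resolve_pattern_command_list (double quotes around the whole set, single quotes around each command) is easy to get wrong when a command line is assembled by a script. The following is a minimal Python sketch of that rule. The helper name format_command_list is hypothetical and is not part of PHENIX. It omits the trailing space that appears before the closing double quote in the examples above, on the assumption that the space is not significant, and it rejects commands that themselves contain quote characters, because the text does not say how those would be handled.

```python
def format_command_list(keyword, commands):
    """Build keyword="'cmd1' 'cmd2'" text for pasting into a shell command line.

    Hypothetical helper (not a PHENIX function). Each command goes in single
    quotes and the whole set goes in double quotes, as the NOTE for
    resolve_command_list describes.
    """
    for command in commands:
        # Nested quotes are not covered by the documented rule, so refuse them.
        if "'" in command or '"' in command:
            raise ValueError(f"command may not contain quotes: {command!r}")
    quoted = " ".join(f"'{command}'" for command in commands)
    return f'{keyword}="{quoted}"'


if __name__ == "__main__":
    # Reproduces the documented example, minus the trailing space.
    print(format_command_list("resolve_command_list", ["no_build", "b_overall 23"]))
```

The resulting string is meant to be appended to a phenix.autobuild command line typed into a shell. Whether the outer double quotes are still needed when arguments are passed without a shell (for example as a list to a subprocess call) is not stated in this document, so the sketch deliberately stops at producing the text.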