Building a model starting from a very poor map with parallel_autobuild
- Authors
- Purpose
- How parallel autobuild works
- Inputs for parallel autobuild
- Possible Problems
- Specific limitations:
- References:
- Keywords:
- List of all parallel_autobuild keywords
Authors
parallel_autobuild: Tom Terwilliger
Purpose
The goal of parallel_autobuild is to build models starting from very poor
maps. In this situation most of the models created by autobuild are not
very good, but some will be a little better than others. The strategy is to
find these slightly-better models and iterate model-building based on them.
The working hypothesis for parallel_autobuild is that the quality
of the starting map is a key determinant of how good the final model
will be. The strategy is then to take a starting map and data, get
the best model from this starting point, and use it to calculate a new
starting map that is hopefully a little better than the first one.
Iterating this process may finally lead to a starting map that is good
enough to yield a complete model.
How parallel autobuild works
parallel_autobuild is a tool to carry out the process of picking
good models from many autobuild runs automatically.
It will run parallel jobs of phenix.autobuild,
identify the best model and the corresponding map, then take the
best map as the starting point for another cycle of building.
The choice of the best model is not always obvious.
parallel_autobuild uses the R-values of the models to make this
choice if at least one R-value is below parallel_r_switch (0.50 by
default). If all R-values are higher, the map correlation of each model
to its corresponding density-modified map is used instead.
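The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not the code used by parallel_autobuild: the BuildResult class, the pick_best function and the example run names and values are hypothetical. Picking the lowest R-value among the models below the cutoff is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BuildResult:
    name: str       # hypothetical label for one autobuild run
    r_value: float  # R-value of the model from that run
    map_cc: float   # correlation of the model to its density-modified map


def pick_best(results: List[BuildResult],
              parallel_r_switch: float = 0.5) -> BuildResult:
    """Choose by R-value if any model is below the cutoff, else by map CC."""
    below_cutoff = [r for r in results if r.r_value < parallel_r_switch]
    if below_cutoff:
        # At least one model has a usable R-value: lowest R wins (assumption).
        return min(below_cutoff, key=lambda r: r.r_value)
    # All R-values are too high to be informative: highest map CC wins.
    return max(results, key=lambda r: r.map_cc)


runs = [
    BuildResult("run_1", r_value=0.54, map_cc=0.41),
    BuildResult("run_2", r_value=0.47, map_cc=0.38),
    BuildResult("run_3", r_value=0.52, map_cc=0.45),
]
best = pick_best(runs)
print(best.name)
```

In this example run_2 is chosen because it is the only model with an R-value below 0.5, even though run_3 has the higher map correlation.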
Inputs for parallel autobuild
The inputs for parallel autobuild are the same as for autobuild, with
the addition of a few control parameters. The main parameters are
the number of processors to use, the number of overall cycles,
and the number of parallel autobuild jobs to be run in each cycle.
The number of processors available for each autobuild job then
determines how many models are built in each autobuild cycle.
A run_command keyword allows you to specify how jobs are
to be run. Parallel autobuild can be run on a queueing system
(with a command such as qsub) or on a multiprocessor
machine (e.g., with sh).
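As a rough sketch of how these inputs fit together, the following Python snippet assembles a command line from the keywords listed later in this document (data, seq_file, iterations, n_parallel, nproc, run_command). The executable name phenix.parallel_autobuild, the file names my_data.mtz and my_seq.dat, and the key=value form of the arguments are assumptions made for illustration. Check them against your installation before use.

```python
import shlex


def build_command(data, seq_file, iterations=1, n_parallel=1, nproc=1,
                  run_command="sh "):
    """Return an argument list for one parallel_autobuild run (sketch only)."""
    return [
        "phenix.parallel_autobuild",    # assumed executable name
        f"data={data}",
        f"seq_file={seq_file}",
        f"iterations={iterations}",     # number of overall cycles
        f"n_parallel={n_parallel}",     # autobuild runs per cycle
        f"nproc={nproc}",               # processors to use
        f"run_command={run_command}",   # e.g. "sh " locally, "qsub " on a queue
    ]


# Hypothetical input file names.
cmd = build_command("my_data.mtz", "my_seq.dat",
                    iterations=3, n_parallel=4, nproc=4)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a queueing system the same sketch would be called with run_command="qsub " instead of the default.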
Possible Problems
Specific limitations:
Parallel_autobuild does not recombine models from different
parallel runs. It is possible that the use of phenix.combine_models
with parallel_autobuild could further improve its performance.
References:
- Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read, and P.D. Adams. Acta Cryst. D64, 61-69 (2008).
Keywords:
List of all parallel_autobuild keywords
-------------------------------------------------------------------------------
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
parallel_autobuild
iterations= 1 Total number of iterations of autobuilding
n_parallel= 1 Total number of autobuild runs in each iteration
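parallel_r_switch= 0.5 If R is less than parallel_r_switch, the R-value is used to choose the best model; otherwise the map correlation is used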
nproc= 1 Number of processors to use
background= True run jobs in background or not (if nproc is greater than 1)
run_command= "sh " Command for running jobs (e.g., sh or qsub )
verbose= False verbose output
raise_sorry= False Raise sorry if problems
debug= False debugging output
temp_dir= Auto Optional temporary work directory
work_dir= "" Work directory (by default, PARALLEL_AUTOBUILD_RUN_xxxx/)
output_dir= "" Output directory where files are to be written
dry_run= False Just read in and check parameter names
check_wait_time= 10.0 You can specify the length of time (seconds) to wait
between checking for subprocesses to end
wait_between_submit_time= 1.0 You can specify the length of time (seconds)
to wait between each job that is submitted when
running sub-processes. This can be helpful on
NFS-mounted systems when running with multiple
processors to avoid file conflicts. The symptom
of too short a wait_between_submit_time is File
exists:....
autobuild
data= None Datafile. This file can be a .sca or mtz or other standard file.
The Wizard will guess the column identification. You can specify the
column labels to use with: input_labels='FP SIGFP PHIB FOM HLA HLB
HLC HLD FreeR_flag' Substitute any labels you do not have with None.
If you only have myFP and mysigFP you can just say input_labels='myFP
mysigFP'. If you have free R flags, phase information or HL
coefficients that you want to use then an mtz file is required. If
this file contains phase information, this phase information should
be experimental (i.e., MAD/SAD/MIR etc), and should not be
density-modified phases (enter any files with density-modified phases
as input_map_file instead). NOTE: If you supply HL coefficients they
will be used in phase recombination. If you supply PHIB or PHIB and
FOM and not HL coefficients, then HL coefficients will be derived
from your PHIB and FOM and used in phase recombination. If you also
specify a hires data file, then FP and SIGFP will come from that data
file (and not this one). If an input_refinement_file is specified,
then F, Sigma, FreeR_flag (if present) from that file will be used
for refinement instead of this one.
model= None PDB file with starting model. NOTE: If your PDB file has been
previously refined, then please make sure that you provide the free
R flags that were used in that refinement. These can come from the
data file or from the refinement_file.
seq_file= Auto Text file with 1-letter code of protein sequence. Separate
chains with a blank line or line starting with >. Normally you
should include one copy of each unique chain. NOTE: if 1 copy of
each unique chain is provided it is assumed that there are
ncs_copies (could be 1) of each unique chain. If more than one
copy of any chain is provided it is assumed that the asymmetric
unit contains the number of copies of each chain that are given,
multiplied by ncs_copies. So if the sequence file has two copies
of the sequence for chain A and one of chain B, the cell contents
are assumed to be ncs_copies*2 of chain A and ncs_copies of chain
B. ADDITIONAL NOTES: 1. lines starting with > are ignored and
separate chains 2. FASTA format is fine 3. If you enter a PDB
file for rebuilding and it has the sequence you want, then the
sequence file is not necessary. NOTE: You can also enter the name
of a PDB file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input sequence. If
you have a duplex DNA, enter each strand as a separate chain.
NOTE: for AutoBuild you can specify start_chains_list on the
first line of your sequence file: >> start_chains_list 23
11 5
map_file= Auto MTZ file containing starting map. This file must be an mtz
file. The Wizard will guess the column identification. You can
specify the column labels to use with: input_map_labels='FP PHIB
FOM' Substitute any labels you do not have with None. If you only
have myFP and myPHIB you can just say input_map_labels='myFP
myPHIB'. This map will be used in the first cycle of
model-building. NOTE 1: If use_map_file_as_hklstart=True then
this file will be used instead to start density modification.
NOTE 2: default for this keyword is Auto, which means "carry
out normal process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not want
this to happen, you can specify None which means "No
file"
refinement_file= Auto File for refinement. This file can be a .sca or mtz
or other standard file. This file will be merged with your
data file, with any phase information coming from your
data file. If this file has free R flags, they will be
used, otherwise if the data file has them, those will be
used, otherwise they will be generated. The Wizard will
guess the column identification. You can specify the
column labels to use with: input_refinement_labels='FP
SIGFP FreeR_flag' Substitute any labels you do not have
with None. If you only have myFP and mysigFP you can just
say input_refinement_labels='myFP mysigFP'. Data file to
use for refinement. The data in this file should not be
corrected for anisotropy. It will be combined with
experimental phase information (if any) from
input_data_file for refinement. If you leave this blank,
then the data in the input_data_file will be used in
refinement. If no anisotropy correction is applied to the
data you do not need to specify a datafile for refinement.
If an anisotropy correction is applied to the data files,
then you should enter an uncorrected datafile for
refinement. Any standard format is fine; normally only F
and sigF will be used. Bijvoet pairs and duplicates will
be averaged. If an mtz file is provided then a free R flag
can be read in as well. Any HL coeffs and phase
information in this file is ignored. NOTE: default for
this keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild
will automatically take the value from AutoSol. If you do
not want this to happen, you can specify None which means
"No file"
hires_file= Auto File with high-resolution data. This file can be a .sca or
mtz or other standard file. The Wizard will guess the column
identification. You can specify the column labels to use with:
input_hires_labels='FP SIGFP'.
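The seq_file rules above (chains separated by a blank line or a line starting with >, and copies multiplied by ncs_copies) can be illustrated with a short Python sketch. The function names and the example sequences are hypothetical, and this is not the parser used by the Wizard.

```python
from collections import Counter


def parse_chains(text):
    """Split sequence text into chains at blank lines or lines starting with >."""
    chains, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(">"):
            # Separator line: close the chain collected so far, if any.
            if current:
                chains.append("".join(current))
                current = []
        else:
            current.append(stripped.upper())
    if current:
        chains.append("".join(current))
    return chains


def assumed_cell_contents(text, ncs_copies=1):
    """Copies of each unique chain assumed to be in the asymmetric unit."""
    counts = Counter(parse_chains(text))
    return {seq: n * ncs_copies for seq, n in counts.items()}


# Hypothetical sequence file: two copies of chain A, one of chain B.
seq_text = """>chain A
MKTAYIAK

>chain A again
MKTAYIAK
>chain B
GSHMLEDP
"""
contents = assumed_cell_contents(seq_text, ncs_copies=2)
print(contents)
```

With ncs_copies=2 the sketch reports four copies of chain A and two of chain B, matching the ncs_copies*2 and ncs_copies rule given for seq_file.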
crystal_info
unit_cell= None Enter cell parameters (a b c alpha beta gamma)
space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
solvent_fraction= None Solvent fraction in crystals (0 to 1). This is
normally set automatically from the number of NCS
copies and the sequence.
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the
"input_lig_file_list" and build the other. NOTE:
default for this keyword is Auto, which means "carry
out normal process to guess this keyword". The process
is to look at the sequence file and/or input pdb file to see
what the chain type is. If there is more than one type, the
type with the larger number of residues is guessed. If you
want to force the chain_type, then set it to PROTEIN RNA or
DNA.
resolution= 0 High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely. Zero means keep everything.
dmax= 500 Low-resolution limit
overall_resolution= 0 If overall_resolution is set, then all data beyond
this is ignored. NOTE: this is only suggested if you
have a very big cell and need to truncate the data
to allow the wizard to run at all. Normally you
should use 'resolution' and 'resolution_build' and
'refinement_resolution' to set the high-resolution
limit
sequence= None Plain text containing 1-letter code of protein sequence.
Same as seq_file except the sequence is read directly, not
from a file. If both are given, seq_file is ignored.
input_files
input_labels= None Labels for input data columns
input_hires_labels= None Labels for input hires file (FP SIGFP
FreeR_flag)
input_map_labels= None Labels for input map coefficient columns (FP PHIB
FOM) NOTE: FOM is optional (set to None if you wish)
input_refinement_labels= None Labels for input refinement file columns
(FP SIGFP FreeR_flag)
input_ha_file= None If the flag "truncate_ha_sites_in_resolve"
is set then density at sites specified with input_ha_file
is truncated to improve the density modification
procedure. Additionally these sites are added to
input_lig_file_list.
include_ha_in_model= True Add contents of input_ha_file to the working
model just before refinement (by adding it to
input_lig_file_list).
cif_def_file_list= None You can enter any number of CIF definition
files. These are normally used to tell phenix.refine
about the geometry of a ligand or unusual residue.
You usually will use these in combination with
"PDB file with metals/ligands" (keyword
"input_lig_file_list" ) which allows you to
attach the contents of any PDB file you like to your
model just before it gets refined. You can use
phenix.elbow to generate these if you do not have a
CIF file and one is requested by phenix.refine
input_lig_file_list= None This script adds the contents of these PDB
files to each model just prior to refinement.
Normally you might use this to put in any
heavy-atoms that are in the refined structure (for
example the heavy atoms that were used in phasing),
or to add a ligand to your model. (By default if
you supply input_ha_file this will be added to your
input_lig_file_list.) If the atoms in this PDB file
are not recognized by phenix.refine, then you can
specify their geometries with a cif definitions
file using the keyword
"cif_def_file_list". You can easily
generate cif definitions for many ligands using
phenix.elbow in PHENIX. You can put anything you
like in the files in input_lig_file_list, but any
atoms that fall within 1.5 A of any atom in the
current model will be tossed (not written to the
model).
keep_input_ligands= True You can choose whether to (by default) let the
wizard keep ligands by separating them out from the
rest of your model and adding them back to your
rebuilt model, or alternatively to remove all
ligands from your input pdb file before
rebuild_in_place.
keep_input_waters= False You can choose whether to keep input waters
(solvent) when using rebuild_in_place. If you keep
them, then you should specify either
"place_waters=No" or
"keep_pdb_atoms=No" because if
place_waters=True and keep_pdb_atoms=True then
phenix.refine will add waters and then the wizard
will keep the new waters from the new PDB file
created by phenix.refine preferentially over the ones
in your input file.
keep_pdb_atoms= True If true, keep the model coordinates when model and
ligand coordinates are within dist_close_overlap and
ligands in input_lig_file_list are being added to the
current model. If false, keep instead the ligand
coordinates.
refine_eff_file_list= None You can enter any number of refinement
parameter files. These are normally used to tell
phenix.refine defaults to apply, as well as
creating specialized definitions such as unusual
amino acid residues and linkages. These parameters
override the normal phenix.refine defaults. They
themselves can be overridden by parameters set by
the Wizard and by you, controlling the Wizard.
NOTE: Any parameters set by AutoBuild directly
(such as number_of_macro_cycles, high_resolution,
etc...) will not be taken from this parameters
file. This is useful only for adding extra
parameters not normally set by AutoBuild.
map_file_is_density_modified= False You can specify that the
input_map_file has been density modified.
(This changes the assumptions on
statistics of the map.)
map_file_fom= None You can specify the FOM of the input map file (useful
in cases where the map file has only FWT PHFWT and no FOM
column). This FOM is used to set the default smoothing
radius for the density modification solvent boundary and
also to decide whether extreme density modification is to
be applied
use_map_file_as_hklstart= False You can specify that the file named as
input_map_file will be used as starting
coefficients for density modification in the
first cycle. NOTE: if maps_only=True and
input_map_file is set, then
use_map_file_as_hklstart will be set to True
use_map_in_resolve_with_model= False You can specify that the current
map file be used as hklstart in density
modification with a model.
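The overlap rule shared by input_lig_file_list, keep_pdb_atoms and dist_close_overlap (default 1.5 A) can be sketched as follows. This is a simplified illustration with hypothetical function and variable names. It treats atoms as bare (x, y, z) tuples and uses a strict "less than" distance test, which is an assumption.

```python
import math


def merge_ligand_atoms(model_atoms, ligand_atoms,
                       dist_close_overlap=1.5, keep_pdb_atoms=True):
    """Return (model_atoms_kept, ligand_atoms_kept) after removing overlaps."""
    def close(a, b):
        return math.dist(a, b) < dist_close_overlap

    if keep_pdb_atoms:
        # Keep the model; drop ligand atoms that overlap any model atom.
        kept_ligand = [lig for lig in ligand_atoms
                       if not any(close(lig, m) for m in model_atoms)]
        return list(model_atoms), kept_ligand
    # Keep the ligand; drop model atoms that overlap any ligand atom.
    kept_model = [m for m in model_atoms
                  if not any(close(m, lig) for lig in ligand_atoms)]
    return kept_model, list(ligand_atoms)


# Hypothetical coordinates in Angstroms.
model = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(merge_ligand_atoms(model, ligand))
print(merge_ligand_atoms(model, ligand, keep_pdb_atoms=False))
```

Lowering dist_close_overlap in this sketch keeps more ligand atoms, which mirrors the advice to decrease it if ligand atoms are being removed.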
aniso
remove_aniso= True Remove anisotropy from data files before use. Note:
map files are assumed to be already corrected and are not
affected by this. Also the input refinement file is not
affected by this.
b_iso= None Target overall B value for anisotropy correction. Ignored if
remove_aniso = False. If None, default is minimum of (max_b_iso,
lowest B of datasets, target_b_ratio*resolution)
max_b_iso= 40. Default maximum overall B value for anisotropy
correction. Ignored if remove_aniso = False. Ignored if b_iso
is set. If used, default is minimum of (max_b_iso, lowest B
of datasets, target_b_ratio*resolution)
target_b_ratio= 10. Default ratio of target B value to resolution for
anisotropy correction. Ignored if remove_aniso = False.
Ignored if b_iso is set. If used, default is minimum of
(max_b_iso, lowest B of datasets,
target_b_ratio*resolution)
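The default target B value described for b_iso, max_b_iso and target_b_ratio is the minimum of three quantities. A minimal Python sketch of that rule follows; the function name and the example B values are hypothetical.

```python
def target_b_iso(resolution, dataset_b_values, b_iso=None,
                 max_b_iso=40.0, target_b_ratio=10.0, remove_aniso=True):
    """Target overall B for anisotropy correction, or None if not applied."""
    if not remove_aniso:
        # All three keywords are ignored when remove_aniso is False.
        return None
    if b_iso is not None:
        # An explicit b_iso overrides max_b_iso and target_b_ratio.
        return b_iso
    return min(max_b_iso, min(dataset_b_values), target_b_ratio * resolution)


# Hypothetical datasets with overall B values of 32.0 and 28.5.
print(target_b_iso(2.5, [32.0, 28.5]))  # limited by target_b_ratio * resolution
print(target_b_iso(5.0, [32.0, 28.5]))  # limited by the lowest dataset B
```

At 2.5 A the ratio term (10 * 2.5 = 25.0) is the smallest of the three, while at 5.0 A the lowest dataset B (28.5) takes over.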
decision_making
acceptable_r= 0.25 Used to decide whether the model is acceptable enough
to quit if it is not improving much. A good value is 0.25
r_switch= 0.4 R-value criterion for deciding whether to use R-value or
map correlation as the criterion for model quality. A good value
is 0.40
semi_acceptable_r= 0.3 Used to decide whether the model is acceptable
enough to skip rebuilding the model from scratch and
focus on adding loops and extending it. A good value
is 0.3
reject_weak= False You can rebuild or remove just the residues in weak
density. This will reject residues with density < 0.5 * mean
- SD where the density, mean and SD are for either
main-chain or all atoms in residues. If set, overrides
min_cc_res_rebuild and worst_percent_res_rebuild.
min_weak_z= 0.2 Minimum number of sd of rho above 0.5*mean of all
residues for keeping weak residues if reject_weak=True
min_cc_res_rebuild= 0.4 You can rebuild just the worst parts of your
model by setting touch_up=True. You can decide what
parts to rebuild based on a minimum model-map
correlation (by residue). You can decide how much to
rebuild using worst_percent_res_rebuild or with
min_cc_res_rebuild, or both.
min_seq_identity_percent= 50 The sequence in your input PDB file will be
adjusted to match the sequence in your
sequence file (if any). If there are
insertions/deletions in your model and the
wizard does not seem to identify them, you can
split up your PDB file by adding records like
this: BREAK You can specify the minimum
sequence identity between your sequence file
and a segment from your input PDB file to
consider the sequences to be matched. Default
is 50.0%. You might want a higher number to
make sure that deletions in the sequence are
noticed.
dist_close= None If main-chain atom rmsd is less than dist_close then
crossover between chains in different models is allowed at
this point. If you input a negative number the defaults will
be used
dist_close_overlap= 1.5 Model or ligand coordinates but not both are
kept when model and ligand coordinates are within
dist_close_overlap and ligands in
input_lig_file_list are being added to the current
model. NOTE: you might want to decrease this if your
ligand atoms get removed by the wizard. Default=1.5
A
loop_cc_min= 0.4 You can specify the minimum correlation of density from
a loop with the map.
group_ca_length= 4 In resolve building you can specify how short a
fragment to keep. Normally 4 or 5 residues should be
the minimum.
group_length= 2 In resolve building you can specify how many fragments
must be joined to make a connected group that is kept.
Normally 2 fragments should be the minimum.
include_molprobity= False This command is currently disabled. You can
choose to include the clash score from MolProbity as
one of the scoring criteria in comparing and merging
models. The score is combined with the model-map
correlation CC by summing in a weighted clashscore.
If clashscore for a residue has a value <
ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score,
otherwise its value is zero.
ok_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score (the threshold defined by ok_molp_score)
then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
scale_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
density_modification
thorough_denmod= *Auto True False Choose whether you want to go for
thorough density modification when no model is used
("False" speeds it up and for a terrible map
is sometimes better)
hl= False You can choose whether to calculate hl coeffs when doing
density modification (True) or not to do so (False). Default is No.
mask_type= *histograms probability wang classic Choose method for
obtaining probability that a point is in the protein vs
solvent region. Default is "histograms". If you
have a SAD dataset with a heavy atom such as Pt or Au then
you may wish to choose "wang" because the histogram
method is sensitive to very high peaks. Options are:
histograms: compare local rms of map and local skew of map to
values from a model map and estimate probabilities. This one
is usually the best. probability: compare local rms of map to
distribution for all points in this map and estimate
probabilities. In a few cases this one is much better than
histograms. wang: take points with highest local rms and
define as protein. classic: run classical density
modification with solvent flipping.
mask_from_pdb= None You can specify a PDB file to define a mask for the
macromolecule in density modification (i.e., the solvent
boundary). All points within rad_mask_from_pdb of an atom
in the PDB file defined by mask_from_pdb will be
considered to be within the macromolecule
mask_type_extreme_dm= histograms probability *wang classic If FOM of
phasing is up to fom_for_extreme_dm_rebuild
then defaults for density modification become:
mask_type=wang wang_radius=20 mask_cycles=1
minor_cycles=4. Applies to rebuild stages of
autobuild. For build use instead
fom_for_extreme_dm
mask_cycles_extreme_dm= 1 Mask cycles in extreme density modification
minor_cycles_extreme_dm= 4 Minor cycles in extreme density modification
wang_radius_extreme_dm= 20. Wang radius in extreme density modification
precondition= False Precondition density before modification
minimum_ncs_cc= 0.30 Minimum NCS correlation to keep, except in case of
extreme_dm
extreme_dm= False Turns on extreme density modification if FOM is up to
fom_for_extreme_dm
fom_for_extreme_dm_rebuild= 0.10 If extreme_dm is true and FOM of
phasing is up to fom_for_extreme_dm_rebuild
then defaults for density modification
become: mask_type=mask_type_extreme_dm
wang_radius=wang_radius_extreme_dm
mask_cycles=mask_cycles_extreme_dm
minor_cycles=minor_cycles_extreme_dm Applies
to rebuild stages of autobuild. For build
use instead fom_for_extreme_dm
fom_for_extreme_dm= 0.35 If extreme_dm is true and FOM of phasing is up
to fom_for_extreme_dm then defaults for density
modification become: mask_type=wang wang_radius=20
mask_cycles=1 minor_cycles=4. Applies to build
stages of autobuild. For rebuild use instead
fom_for_extreme_dm_rebuild
rad_mask_from_pdb= 2 You can define the radius for calculation of the
protein mask. Applies only to mask_from_pdb
modify_outside_delta_solvent= 0.05 You can set the initial solvent
content to be a little lower than
calculated when you are running
modify_outside_model. Usually 0.05 is fine.
modify_outside_model= False You can choose whether to modify the density
in the "protein" region outside the
region specified in your current model by matching
histograms with the region that is specified by
that model. This can help by raising the density
in this protein region up to a value similar to
that where atoms are already placed.
truncate_ha_sites_in_resolve= *Auto True False You can choose to
truncate the density near heavy-atom sites
at a maximum of 2.5 sigma. This is useful
in cases where the heavy-atom sites are
very strong, and rarely hurts in cases
where they are not. The heavy-atom sites
are specified with
"input_ha_file" and the radius
is rad_mask
rad_mask= None You can define the radius for calculation of the protein
mask. Applies only to truncate_ha_sites_in_resolve. Default is
resolution of data.
use_resolve_fragments= True This script normally uses information from
fragment identification as part of density
modification for the first few cycles of
model-building. Fragments are identified during
model-building. The fragments are used, with
weighting according to the confidence in their
placement, in density modification as targets for
density values.
use_resolve_pattern= True Local pattern identification is normally used
as part of density modification during the first
few cycles of model building.
use_hl_anom_in_denmod= False Default is False (use HL coefficients in
density modification). NOTE: if True, you must
supply HLanom coefficients. Allows you to specify
that HL coefficients including only the phase
information from the imaginary (anomalous
difference) contribution from the anomalous
scatterers are to be used in density
modification. Two sets of HL coefficients are
produced by Phaser. HLA HLB etc are HL
coefficients including the contribution of both
the real scattering and the anomalous
differences. HLanomA HLanomB etc are HL
coefficients including the contribution of the
anomalous differences alone. These HL
coefficients for anomalous differences alone are
the ones that you will want to use in cases where
you are bringing in model information that
includes the real scattering from the model used
in Phaser, such as when you are carrying out
density modification with a model or refinement
of a model. If use_hl_anom_in_denmod=True then the
HLanom HL coefficients from Phaser are used in
density modification
use_hl_anom_in_denmod_with_model= False See use_hl_anom_in_denmod. If
use_hl_anom_in_denmod=True then the
HLanom HL coefficients from Phaser are
used in density modification with a
model
mask_as_mtz= False Defines how omit_output_mask_file,
ncs_output_mask_file and protein_output_mask_file are
written out. If mask_as_mtz=False it will be a ccp4 map. If
mask_as_mtz=True it will be an mtz file with map
coefficients FP PHIM FOMM (all three required)
protein_output_mask_file= None Name of map to be written out
representing your protein (non-solvent)
region. If mask_as_mtz=False the map will be a
ccp4 map. If mask_as_mtz=True it will be an
mtz file with map coefficients FP PHIM FOMM
(all three required)
ncs_output_mask_file= None Name of map to be written out representing
your ncs asymmetric unit. If mask_as_mtz=False the
map will be a ccp4 map. If mask_as_mtz=True it
will be an mtz file with map coefficients FP PHIM
FOMM (all three required)
omit_output_mask_file= None Name of map to be written out representing
your omit region. If mask_as_mtz=False the map
will be a ccp4 map. If mask_as_mtz=True it will
be an mtz file with map coefficients FP PHIM FOMM
(all three required)
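The extreme density modification keywords above switch a group of defaults when the FOM of phasing is low. The sketch below illustrates that switch under stated assumptions: the function name and return format are hypothetical, "up to" is read as "less than or equal to", and the same *_extreme_dm values are applied in both the build and rebuild stages.

```python
def extreme_dm_overrides(fom, stage, extreme_dm=False,
                         fom_for_extreme_dm=0.35,
                         fom_for_extreme_dm_rebuild=0.10,
                         mask_type_extreme_dm="wang",
                         wang_radius_extreme_dm=20.0,
                         mask_cycles_extreme_dm=1,
                         minor_cycles_extreme_dm=4):
    """Return overriding defaults for density modification, or None."""
    if stage not in ("build", "rebuild"):
        raise ValueError("stage must be 'build' or 'rebuild'")
    # Build stages use fom_for_extreme_dm; rebuild stages use the _rebuild cutoff.
    cutoff = fom_for_extreme_dm if stage == "build" else fom_for_extreme_dm_rebuild
    if extreme_dm and fom <= cutoff:
        return {
            "mask_type": mask_type_extreme_dm,
            "wang_radius": wang_radius_extreme_dm,
            "mask_cycles": mask_cycles_extreme_dm,
            "minor_cycles": minor_cycles_extreme_dm,
        }
    # None means the normal density modification defaults stay in place.
    return None


print(extreme_dm_overrides(0.30, "build", extreme_dm=True))
print(extreme_dm_overrides(0.30, "rebuild", extreme_dm=True))
```

With a FOM of 0.30 the overrides apply in the build stage (cutoff 0.35) but not in the rebuild stage (cutoff 0.10).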
maps
maps_only= False You can choose whether to skip all model-building and
just calculate maps and write out the results. This also runs
just 1 cycle and turns on HL coefficients.
n_xyz_list= None You can specify the grid to use for map calculations.
model_building
build_type= *RESOLVE RESOLVE_AND_BUCCANEER You can choose to build
models with RESOLVE or with RESOLVE and BUCCANEER, and
how many different models to build with RESOLVE.
The more you build, the more likely to get a complete model.
Note that rebuild_in_place can only be carried out with
RESOLVE model-building. For BUCCANEER model building you
need CCP4 version 6.1.2 or higher and BUCCANEER version
1.3.0 or higher
allow_negative_residues= False Normally the wizard does not allow
negative residue numbers, and all residues with
negative numbers are rejected when they are
read in. You can allow them if you wish.
highest_resno= None Highest residue number to be considered
"placed" in sequence for rebuild_in_place
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
use_met_in_align= *Auto True False You can use the heavy-atom positions
in input_ha_file as markers for Met SD positions.
base_model= None You can enter a PDB file with coordinates to be used as
a starting point for model-building. These coordinates will
be included in the same way as fragments placed by searching
for helices and strands in initial model-building. Note the
difference from the use of models in
consider_main_chain_list, which are merged with models after
they are built. NOTE: Only use this if you want to keep the
input model and just add to it.
consider_main_chain_list= None This keyword lets you name any number of
PDB files to consider as templates for
model-building. Every time models are built,
the contents of these files will be merged
with them and the best parts will be kept.
NOTE: this only uses the main-chain atoms of
your PDB files.
dist_connect_max_helices= None Set maximum distance between ends of
helices and other ends to try and connect them
in insert_helices.
edit_pdb= True You can choose to edit the input PDB file in
rebuild_in_place to match the input sequence (default=True).
NOTE: residues with residue numbers higher than
'highest_resno' are assumed to not have a known sequence and
will not be edited. By default the value of 'highest_resno' is
the highest residue number from the sequence file, after
adding it to the starting residue number from
start_chains_list. You can also set it directly
helices_strands_only= False You can choose to use a quick model-building
method that only builds secondary structure. At
low resolution this may be both quicker and more
accurate than trying to build the entire structure.
If you are running the AutoSol Wizard, normally
you should choose 'False' as standard building is
quick. When your structure is solved by AutoSol,
go on to AutoBuild and build a more complete model
(still using helices_strands_only=False). NOTE:
helices_strands_only does not apply in AutoSol if
phase_improve_and_build=True
helices_strands_start= False You can choose to use a quick
model-building method that builds secondary
structure as a way to get started...then model
completion is done as usual. (Contrast with
helices_strands_only which only does secondary
structure)
cc_helix_min= None Minimum CC of helical density to map at low
resolution when using helices_strands_only
cc_strand_min= None Minimum CC of strand density to map when using
helices_strands_only
loop_lib= False Use loop library to fit loops. Only applicable for
chain_type=PROTEIN
standard_loops= True Use standard loop fitting
trace_loops= False Use loop tracing to fit loops. Only applicable for
chain_type=PROTEIN
refine_trace_loops= True Refine loops (real-space) after trace_loops
density_of_points= None Packing density of points to consider as
possible CA atoms in trace_loops. Try 1.0 for a quick
run, up to 5 for a much more thorough run. If None, try
value depending on value of quick.
max_density_of_points= None Maximum packing density of points to
consider as possible CA atoms in trace_loops.
cutout_model_radius= None Radius to cut out density for trace_loops. If
None, guess based on length of loop
max_cutout_model_radius= 20. Maximum value of cutout_model_radius to try
padding= 1. Padding for cut out density in trace_loops
max_span= 30 Maximum length of a gap to try to fill
max_overlap= None Maximum number of residues from ends to start with.
(1=use existing ends, 2=one in from ends etc). If None, set
based on value of quick.
min_overlap= None Minimum number of residues from ends to start with.
(1=use existing ends, 2=one in from ends etc)
include_input_model= True The keyword include_input_model defines
whether the input model (if any) is to be crossed
with models that are derived from it, and the best
parts of each kept. It also defines whether the
input model is to be included in combination steps
during initial model-building. Note that if
multiple_models=True and include_input_model=True
then no initial cycle of randomization will be
carried out and the keyword
multiple_models_starting_resolution is ignored. In
most cases you should use include_input_model=True
If you want to generate maximum diversity with
multiple-models then you may wish to use
include_input_model=False. Also if you want to
decrease the amount of bias from your starting
model you may wish to use
include_input_model=False.
input_compare_file= None If you are rebuilding a model or already think
you know what the model should be, you can include a
comparison file in rebuilding. The model is not used
for anything except to write out information on
coordinate differences in the output log files.
NOTE: this feature does not always work correctly.
merge_models= False You can choose to only merge any input models and
write out the resulting model. The best parts of each
model will be kept based on model-map correlation.
Normally used along with number_of_parallel_models=1
morph= False You can choose whether to distort your input model in order
to match the current working map. This may be useful for MR
models that are quite distant from the correct structure.
morph_main= False You can choose whether to use only main-chain atoms
plus c-beta atoms in calculation of shifts in morphing.
Default is morph_main=False; use all atoms including
side-chain atoms.
dist_cut_base= 3.0 Tolerance for base pairing (A) for RNA/DNA
morph_cycles= 2 Number of iterations of morphing each time it is run.
morph_rad= 7 Smoothing radius for morphing. The density from your model
and from the map are calculated with the radius morph_rad,
then they are adjusted to overlap optimally
n_ca_enough_helices= None Set maximum number of CA to add to ends of
helices and other ends to try and connect them in
insert_helices.
delta_phi= 20 Approximate angular sampling for search for regular
secondary structure in building
offsets_list= 53 7 23 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
all_maps_in_rebuild= False If two_fofc_in_rebuild or
two_fofc_denmod_in_rebuild are set you can choose
to try both density-modified and two_fofc-based
maps in building. Note: Set to True if you specify
rebuild_from_fragments=True. Note: not compatible
with map_phasing=True.
ps_in_rebuild= False You can choose to use a prime-and-switch resolve
map in all cycles of rebuilding instead of a
density-modified map. This is normally used in
combination with maps_only to generate a prime-and-switch
map. The map coeffs will be in prime_and_switch.mtz
use_ncs_in_ps= False You can choose to use NCS in prime-and-switch
remove_outlier_segments_z_cut= 3.0 You can remove any segments that are
not assigned to sequence during
model-building if the mean density at
atomic positions is more than
remove_outlier_segments_z_cut sd lower
than the mean for the structure.
refine= True This script normally refines the model during building. Say
False to skip refinement
refine_final_model_vs_orig_data= True This script normally refines the
model at the end against the original
(non-aniso-corrected) data and writes
out a CIF version of the model as well
reference_model= None You can specify a reference model for refinement
resolution_build= 0 Enter the high-resolution limit for model-building.
If 0.0, the value of resolution is used as a default.
restart_cycle_after_morph= 5 Morphing (if morph=True) will go only up to
this cycle, and then the morphed PDB file
will be used as a starting PDB file from then
on, removing all previous models. If
restart_cycle_after_morph=0 then the model
will be morphed and not rebuilt
retrace_before_build= False You can choose to retrace your model n_mini
times and use a map based on these retraced models
to start off model-building. This is the default
for rebuilding models if you are not using
rebuild_in_place. You can also specify
n_iter_rebuild, the number of cycles of
retrace-density-modify-build before starting the
main build.
reuse_chain_prev_cycle= True You can choose to allow model-building to
include atoms from each cycle in the model the
next cycle or not. This must be True if you use
retrace_before_build
richardson_rotamers= *Auto True False You can choose to use the rotamer
library from SC Lovell, JM Word, JS Richardson and
DC Richardson (2000) "The Penultimate Rotamer
Library" Proteins: Structure Function and
Genetics 40 389-408. Typically this
works well in RESOLVE model-building for
nearly-final models but not as well earlier in the
process. Default (Auto) is to use these rotamers
for rebuild_in_place but not otherwise.
rms_random_frag= None Rms random position change added to residues on
ends of fragments when extending them. If you enter a
negative number, defaults will be used.
rms_random_loop= None Rms random position change added to residues on
ends of loops in tries for building loops. If you enter
a negative number, defaults will be used.
start_chains_list= None You can specify the starting residue number for
each of the unique chains in your structure. If you
use a sequence file then the unique chains are
extracted and the order must match the order of your
starting residue numbers. For example, if your
sequence file has chains A and B (identical) and
chains C and D (identical to each other, but
different than A and B) then you can enter 2 numbers,
the starting residues for chains A and C. NOTE: you
need to specify an input sequence file for
start_chains_list to be applied.
trace_as_lig= False You can specify that in building steps the ends of
chains are to be extended using the LigandFit algorithm.
This is default for nucleic acid model-building.
track_libs= False You can keep track of what libraries each atom in a
built structure comes from.
two_fofc_denmod_in_rebuild= False You can choose to use a
density-modified sigmaa-weighted 2Fo-Fc map
in all cycles of rebuilding instead of a
density-modified map. In density
modification the density in the region
defined by the current model will be
truncated at +2sigma to reduce the dominance
of parts of the map with model defined.
Additionally only 2 mask cycles of 3 minor
cycles will be done. In addition,
place_waters will be turned off. If the
model is highly incomplete this can
sometimes allow model-building to work even
when it will not for density-modified maps.
The map coeffs will be in
two_fofc_denmod_map.mtz. You might consider
turning on all_maps_in_rebuild as well.
Note: Setting
two_fofc_denmod_in_rebuild=True will by
default set place_waters=False. Set to True
if you specify rebuild_from_fragments=True.
rebuild_from_fragments= False You can use rebuild_from_fragments=True as
a shortcut to turn on two_fofc_denmod_in_rebuild
and all_maps_in_rebuild and to use your model in
each cycle with consider_main_chain_list. If you
use rebuild_from_fragments=True you might also
want to set i_ran_seed=xxxxx for some integer
xxxxx and run the job 10 or 20 times to have a
higher chance of success. This approach is
designed for cases where you have a small part
of your model very accurately placed and want to
build the rest of the model.
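As a rough illustration of the repeated-seed approach described for
rebuild_from_fragments, the following Python sketch only builds and prints
one command line per seed; it does not run anything. The file names
(data.mtz, fragment.pdb, seq.dat) and the seed values are hypothetical
placeholders, and the data, model and seq_file keywords are assumed to be
the inputs you would normally give to autobuild.

```python
def fragment_rebuild_commands(n_runs=10, first_seed=71231):
    """Return one phenix.autobuild command line per random seed.

    File names and seed values are placeholders for illustration only.
    """
    commands = []
    for i in range(n_runs):
        seed = first_seed + i  # any set of distinct integers will do
        commands.append(
            "phenix.autobuild data=data.mtz model=fragment.pdb "
            "seq_file=seq.dat rebuild_from_fragments=True "
            "i_ran_seed=%d" % seed
        )
    return commands


if __name__ == "__main__":
    for command in fragment_rebuild_commands():
        print(command)
```

Each printed line would be run as a separate job, ideally in its own
directory, and the resulting models compared afterwards.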
two_fofc_in_rebuild= False You can choose to use a sigmaa-weighted
2Fo-Fc map in all cycles of rebuilding instead of a
density-modified map. If the model is poor this can
sometimes allow model-building in place to work
even when it will not for density-modified maps.
refine_map_coeff_labels= "2FOFCWT PH2FOFCWT" You can pick which map
coefficients from phenix.refine to use if
two_fofc_in_rebuild=True
filled_2fofc_maps= True You can choose to use filled 2Fo-Fc maps when
two_fofc_in_rebuild is used. Default is True
map_phasing= False You can choose to use statistical density
modification starting with a 2mFo-DFc map, including model
information instead of a standard density-modified map with
model information. This density modification will include
NCS if present. Note: not compatible with
all_maps_in_rebuild=True
use_any_side= True You can choose to have resolve model-building place
the best-fitting side chain at each position, even if the
sequence is not matched to the map.
use_cc_in_combine_extend= False You can choose to use the correlation of
density rather than density at atomic
positions to score models in combine_extend
sort_hetatms= False Waters are automatically named with the chain of the
closest macromolecule if you set sort_hetatms=True. This is
for the final model only.
map_to_object= None You can supply a target position for your model with
map_to_object=my_target.pdb. Then at the very end your
molecule will be placed as close to this as possible. The
center of mass of the autobuild model will be
superimposed on the center of mass of my_target.pdb using
space group symmetry, taking any match closer than 15 A
within 3 unit cells of the original position. The new
file will be overall_best_mapped.pdb
multiple_models
combine_only= False Once you have created a set of initial models you
can merge them together into a final set. This option is
useful if you have split up the creation of multiple
models into different directories, and then you have
copied all the initial models to one directory for
combining.
multiple_models= False You can build a set of models, all compatible
with your data. You can specify how many models with
multiple_models_number. If you are using
rebuild_in_place you can specify whether to generate
starting models or not with multiple_models_starting.
multiple_models_first= 1 Specify which model to build first
multiple_models_group_number= 5 You can build several initial models and
merge them. Normally 5 initial models is
fine.
multiple_models_last= 20 Specify which model to end with
multiple_models_number= 20 Specify how many models to build.
multiple_models_starting= True You can specify how to generate starting
models for multiple models. If you are using
rebuild_in_place and you specify
"True" then the Wizard will rebuild
your starting model at the resolution
specified in
multiple_models_starting_resolution. If you
are not using rebuild_in_place the Wizard will
always build a starting model at the current
resolution.
multiple_models_starting_resolution= 4 You can set the resolution for
rebuilding an initial model. A
value of 0.0 will use the
resolution of the dataset.
place_waters_in_combine= None You can choose whether phenix.refine
automatically places ordered solvent (waters)
during the last cycle of multiple-model
generation. This is separate from place_waters,
which applies to all other cycles. If None,
then value of place_waters will be used.
ncs
find_ncs= *Auto True False This script normally deduces ncs information
from the NCS in chains of models that are built during
iterative model-building. The update is done each cycle in
which an improved model is obtained. Say False to skip this.
See also "input_ncs_file" which can be used to
specify NCS at the start of the process. If
find_ncs="No" then only this starting NCS will be
used and it will not be updated. You can use find_ncs
"No" to specify exactly what residues will be used
in NCS refinement and exactly what NCS operators to use in
density modification. You can use the function
$PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to
help you set up an input_ncs_file that has your specifications
in it. NOTE: if an input map_file is provided then if no ncs
is found from a model, ncs will be searched for in the density
of that map.
input_ncs_file= None You can enter NCS information in 3 ways: (1) an
ncs_spec file produced by AutoSol or AutoBuild with NCS
information (2) a heavy-atom PDB file that contains ncs
in the heavy-atom sites (3) a PDB file with a model that
contains chains with NCS. The wizard will derive NCS
information from any of these if specified. See also
"find_ncs" which determines whether the wizard
will update NCS from models that are built during
iterative building.
ncs_copies= None Number of copies of the molecule in the asymmetric unit
(note: only one type of molecule allowed at present)
ncs_refine_coord_sigma_from_rmsd= False You can choose to use the
current NCS rmsd as the value of the
sigma for NCS restraints. See also
ncs_refine_coord_sigma_from_rmsd_ratio
ncs_refine_coord_sigma_from_rmsd_ratio= 1 You can choose to multiply the
current NCS rmsd by this value
before using it as the sigma for
NCS restraints. See also
ncs_refine_coord_sigma_from_rmsd
no_merge_ncs_copies= False Normally False (do merge NCS copies). If
True, then do not use each NCS copy to try to build
the others.
optimize_ncs= True This script normally deduces ncs information from the
NCS in chains of models that are built during iterative
model-building. Optimize NCS adds a step to try and make
the molecule formed by NCS as compact as possible, without
losing any point-group symmetry.
use_ncs_in_build= True Use NCS information in the model assembly stage
of model-building. Also if no_merge_ncs_copies is not
set, then use each NCS copy to try to build the
others.
ncs_in_refinement= *torsion cartesian None Use torsion_angle refinement
of NCS. Alternative is cartesian or None (None will
use phenix.refine default)
omit
composite_omit_type= *None simple_omit refine_omit sa_omit
iterative_build_omit Your choices of types of OMIT
maps are: None - normal operation, no omit.
simple_omit - omit the atoms in OMIT region in
calculating a sigmaA-weighted 2mFo-DFc map with no
refinement. refine_omit - as simple_omit, but
refine with standard refinement. sa_omit - omit the
atoms in OMIT region, carry out simulated-annealing
refinement, then calculate a sigmaA-weighted
2mFo-DFc map. iterative_build_omit - set occupancy
of atoms in OMIT region to 0 throughout an entire
iterative model-building, density modification and
refinement process (takes a long time). All these
omit map types are available as composite omit maps
(default) or as omit maps around a region defined
by a PDB file (using omit_box_pdb_list). The
resulting OMIT map will be in the directory OMIT
with file name resolve_composite_map.mtz. This mtz
file contains the map coefficients to create the
OMIT map. The file "omit_region.mtz"
contains the coefficients for a map showing the
boundaries of the OMIT region.
n_box_target= None You can tell the Wizard how many omit boxes to try
and set up (but it will not necessarily choose your number
because it has to be nicely divisible into boxes that fit
your asymmetric unit). A suitable number is 24. The larger
the number of boxes, the better the map will be, but the
longer it will take to calculate the map.
n_cycle_image_min= 3 Pattern recognition (resolve_pattern) and fragment
identification ("image based density
modification") are used as part of the density
modification process. These are normally only useful
in the first few cycles of iterative model-building.
This script tries model-building both with and
without including image information, and proceeds
with the most complete model. Once at least
n_cycle_image_min cycles have been carried out with
image information, if the image-based map results in
a less-complete model than the one without image
information, image information is no longer included.
n_cycle_rebuild_omit= 10 Model-building is normally carried out using
the "best" available map. If
omit_on_rebuild is True, then every
n_cycle_rebuild_omit cycle of model rebuilding, a
composite omit map is used instead. If you specify
0 and omit_on_rebuild is True, omit maps will be
used every cycle. Normally every 10th cycle is
optimal.
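The interaction between omit_on_rebuild and n_cycle_rebuild_omit can be
sketched as follows. This is only an illustration of the rule stated above,
not the Wizard's own code; in particular, the assumption that "every n-th
cycle" means cycles n, 2n, 3n, ... (counting from 1) is ours.

```python
def use_omit_map(cycle, omit_on_rebuild=False, n_cycle_rebuild_omit=10):
    """Return True if rebuild cycle `cycle` (1-based) would use an omit map."""
    if not omit_on_rebuild:
        return False  # omit maps are never used on rebuild cycles
    if n_cycle_rebuild_omit == 0:
        return True  # 0 means a composite omit map on every cycle
    # Assumption: "every n-th cycle" means cycles n, 2n, 3n, ...
    return cycle % n_cycle_rebuild_omit == 0
```

With the defaults shown in the keyword list (omit_on_rebuild=False) the
function always returns False; setting omit_on_rebuild=True selects cycles
10, 20, 30 and so on.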
offset_boundary= 2. Specify the boundary in A around atoms in
omit_box_pdb for definition of omit region. Contrast
with omit_boundary which applies for composite omit
omit_boundary= 2. Specify the boundary in A around atoms in omit_boxes
for definition of omit region. Contrast with
offset_boundary which applies for omit_box_pdb
omit_box_start= 0 To only carry out omit in some of the omit boxes, use
omit_box_start and omit_box_end
omit_box_end= 0 To only carry out omit in some of the omit boxes, use
omit_box_start and omit_box_end
omit_box_pdb_list= None This keyword applies if you have set OMIT region
specification to "omit_around_pdb". To
automatically set an OMIT region specify a PDB
file(s) with omit_box_pdb_list. The omit region
boundaries will be the limits in x y z of the atoms
in this file, plus a border of offset_boundary. To
use only some of the atoms in the file, specify
values for starting residue, ending residue and chain to omit
(omit_res_start_list, omit_res_end_list and
omit_chain_list). If you specify more than one file
(or if you specify more than one segment of a file
with omit_chain_list or omit_res_start_list and
omit_res_end_list) then a set of omit runs will be
carried out and combined into one composite omit.
omit_chain_list= None You can choose to omit just a portion of your
model with the keywords omit_res_start_list 3
omit_res_end_list 4 omit_chain_list chain1 (use "" to
select all chains). The residues from 3 to 4 of chain1
will be omitted. You can specify more than one region by
listing them separated by spaces. If you specify more
than one region, a separate omit run will be carried
out for each one and then the maps will be put together
afterwards. If there is more than one chain in the
input PDB file then only the chain defined by
omit_chain will be omitted. NOTE: Zero for start and end
and "" for chain is the same as choosing
everything.
omit_offset_list= 0 0 0 0 0 0 To carry out one iterative build omit,
with a region defined in grid units, enter
nxs,nxe,nys,nye,nzs,nze in omit_offset_list.
omit_on_rebuild= False You can specify whether to use an omit map for
building the model on rebuild cycles. Default is True
if you start with a model, False if you are building a
model from scratch. The omit map is calculated every
n_cycle_rebuild_omit cycles
omit_selection= None Selection string defining atoms in input pdb to be
used to define the OMIT region. For use with
omit_region_specification=omit_selection
omit_region_specification= *composite_omit omit_around_pdb
omit_selection You can specify what region an
omit (simple/sa-omit/iterative-build-omit)
map is to be calculated for. Composite omit
will create a map over the entire asymmetric
unit by dividing the asymmetric unit into
overlapping boxes, calculating omit maps for
each, and splicing all the results together
into a single composite omit map. You can
tell the Wizard how many omit boxes to try
and set up with the keyword
"n_box_target" (but it will not
necessarily choose your number because it has
to be nicely divisible into boxes that fit
your asymmetric unit). Omit around PDB will
omit around the region defined by the PDB
file(s) you enter for omit_box_pdb (or around
the residues in that PDB file that you
specify). If you specify omit_around_pdb then
you must enter a pdb file to omit around. If
you specify omit_selection you must enter a
selection string in omit_selection
omit_res_start_list= None You can choose to omit just a portion of your
model with the keywords omit_res_start_list 3
omit_res_end_list 4 omit_chain_list chain1 (use
" " for blank). The residues from 3 to 4
of chain1 will be omitted. You can specify
more than one region by listing them separated by
spaces. If you specify more than one region, a
separate omit run will be carried out for each one
and then the maps will be put together afterwards.
If there is more than one chain in the input PDB
file then only the chain defined by omit_chain will
be omitted. NOTE: Zero for start and end and
"" for chain is the same as choosing
everything.
omit_res_end_list= None You can choose to omit just a portion of your
model with the keywords omit_res_start_list 3
omit_res_end_list 4 omit_chain_list chain1 (use
" " for blank). The residues from 3 to 4 of
chain1 will be omitted. You can specify more
than one region by listing them separated by spaces.
If you specify more than one region, a separate omit
run will be carried out for each one and then the
maps will be put together afterwards. If there is
more than one chain in the input PDB file then only
the chain defined by omit_chain will be omitted. NOTE:
Zero for start and end and "" for chain is
the same as choosing everything.
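The way the three omit lists line up into regions can be illustrated with a
small Python sketch. It is not the Wizard's parser: the function name is
hypothetical, it assumes the lists are given as space-separated strings of
equal length, and it does not handle the special blank-chain or zero
"everything" values described above.

```python
def omit_regions(omit_res_start_list, omit_res_end_list, omit_chain_list):
    """Pair up three space-separated lists into (chain, start, end) tuples.

    Each tuple corresponds to one separate omit run.
    """
    starts = [int(x) for x in omit_res_start_list.split()]
    ends = [int(x) for x in omit_res_end_list.split()]
    chains = omit_chain_list.split()
    if not (len(starts) == len(ends) == len(chains)):
        raise ValueError("the three lists must have the same number of entries")
    return list(zip(chains, starts, ends))
```

For example, omit_regions("3 20", "4 25", "A B") gives two regions,
residues 3 to 4 of chain A and residues 20 to 25 of chain B, each of which
would get its own omit run before the maps are combined.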
rebuild_in_place
min_seq_identity_percent_rebuild_in_place= 95 Minimum sequence identity
to use rebuild_in_place by
default
n_cycle_rebuild_in_place= None Number of cycles for rebuild_in_place for
multiple models only
n_rebuild_in_place= 1 You can choose how many times to rebuild your
model in place with rebuild_in_place
rebuild_chain_list= None You can choose to rebuild just a portion of
your model with the keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1
(use " " for blank). The residues from 3
to 4 of chain1 will be rebuilt. You can specify more
than one region by using the Parameter Group Options
button to add lines. If there is more than one
chain in the input PDB file then only the chain
defined by rebuild_chain will be rebuilt. The
smallest region that can be rebuilt is 4 residues.
rebuild_in_place= *Auto True False You can choose to rebuild your model
while fixing the sequence alignment by iteratively
rebuilding segments within the model. This is done
n_rebuild_in_place times, then the models are
recombined, taking the best-fitting parts of each.
Crossovers allowed where main-chain atom rmsd is less
than dist_close. Note that the sequence of the input
model must match the supplied sequence closely enough
to allow a clear alignment. Also this method does not
build any new chain, it just moves the existing model
around. Normally this procedure is useful if the model
is greater than 95% identical with the target
sequence. You can include information directly from
the starting model if you want with the keyword
include_input_model. Then this model will be
recombined with the models that are built based on it.
Note that this requires that the input model have a
sequence that is identical to the model to be rebuilt.
You can also rebuild just a portion of the model with
the keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1 (use
" " for blank). The residues from 3 to 4 of
chain1 will be rebuilt. You can specify more than one
region by using the Parameter Group Options button to
add lines. NOTE: if a region cannot be rebuilt the
original coordinates will be preserved for that
region.
rebuild_near_chain= None You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_near_dist= 7.5 You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_near_res= None You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_res_end_list= None You can choose to rebuild just a portion of
your model with the keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1
(use " " for blank). The residues from 3
to 4 of chain1 will be rebuilt. You can specify
more than one region by using the Parameter Group
Options button to add lines. If there is more
than one chain in the input PDB file then only
the chain defined by rebuild_chain will be
rebuilt. The smallest region that can be rebuilt
is 4 residues.
rebuild_res_start_list= None You can choose to rebuild just a portion of
your model with the keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1
(use " " for blank). The residues from
3 to 4 of chain1 will be rebuilt. You can
specify more than one region by using the
Parameter Group Options button to add lines. If
there is more than one chain in the input PDB
file then only the chain defined by
rebuild_chain will be rebuilt. The smallest
region that can be rebuilt is 4 residues.
rebuild_side_chains= False You can choose to replace side chains (with
extend_only) before rebuilding the model (not
normally used)
redo_side_chains= True You can have AutoBuild decide whether
to replace all your side chains in rebuild_in_place,
taking new ones if they fit the density better. If
True, this is applied to all side chains, not only
those that are rebuilt.
replace_existing= True In rebuild_in_place the usual default is to force
the replacement of all residues, even if the rebuilt
ones are not as good a fit as the original. The
rebuilt model is then crossed with the original model
(if include_input_model=True) and the better parts of
each are then kept. You can override the replacement
of all residues in the initial model-building by
saying "False" (do not force replacement of
residues, keep whatever is better). Additionally if
you set the "touch_up" flag then the default
is "True": keep whatever is better.
delete_bad_residues_only= False You can simply delete the worst parts of
your model and write out the resulting model
with delete_bad_residues_only=True. The
criteria used are the ones set with touch_up.
Any residues that would be rebuilt by
touch_up=True will be deleted by
delete_bad_residues_only. NOTE:
delete_bad_residues_only ignores ligands,
waters etc. so you may need to put them back
in afterwards.
touch_up= False You can rebuild just the worst parts of your model by
setting touch_up=True. You can decide what parts to rebuild
based on a minimum model-map correlation (by residue). This
is set with min_cc_residue_rebuild=0.82. Alternatively you can
rebuild the worst percentage of these:
worst_percent_res_rebuild=6. If a value is set for both of
these then residues qualifying in either way are rebuilt.
NOTE: touch_up is only available with rebuild_in_place.
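The "either criterion" rule for touch_up can be sketched in Python as
below. This is an illustration only: the function and argument names are
hypothetical, the per-residue correlations are assumed to be available as a
dictionary, and the rounding used to turn a percentage into a residue count
is our assumption rather than documented behavior.

```python
def residues_to_rebuild(cc_by_residue, min_cc=None, worst_percent=None):
    """Return the sorted residue numbers flagged by either criterion.

    cc_by_residue maps residue number -> model-map correlation.
    """
    flagged = set()
    if min_cc is not None:
        # Criterion 1: correlation below the minimum
        flagged.update(r for r, cc in cc_by_residue.items() if cc < min_cc)
    if worst_percent is not None:
        # Criterion 2: the worst-fitting percentage of all residues
        n_worst = int(round(len(cc_by_residue) * worst_percent / 100.0))
        ranked = sorted(cc_by_residue, key=cc_by_residue.get)  # lowest CC first
        flagged.update(ranked[:n_worst])
    return sorted(flagged)
```

A residue is selected if it fails the correlation cutoff, falls in the
worst percentage, or both; if neither value is set, nothing is selected.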
touch_up_extra_residues= None Number of residues on each side of the
residues identified in touch_up that you want
to rebuild. Normally you will want to rebuild
one or more on each side.
worst_percent_res_rebuild= 2 You can rebuild just the worst parts of
your model by setting touch_up=True. You can
decide how much to rebuild using
worst_percent_res_rebuild or with
min_cc_res_rebuild, or both.
smooth_range= None You can specify what number of residues to smooth in
making choices for touch_up and delete_bad_residues_only.
Typically use 3 or 5.
smooth_minimum_length= None If specified, then any segments remaining
after smoothing that are shorter than
smooth_minimum_length will be removed.
refinement
refine_b= True You can choose whether phenix.refine is to refine
individual atomic displacement parameters (B values)
refine_se_occ= True You can choose to refine the occupancy of SE atoms
in a SEMET structure (default=True). This only applies if
semet=true
skip_clash_guard= True Skip refinement check for atoms that clash
correct_special_position_tolerance= None Adjust tolerance for special
position check. If 0., then check
for clashes near special positions
is not carried out. This sometimes
allows phenix.refine to continue
even if an atom is near a special
position. If 1., then checks within
1 A of special positions. If None,
then uses phenix.refine default. (1)
use_mlhl= True This script normally uses information from the input file
(HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
generate_hl_if_missing= False This script normally uses information from
the input file (HLA HLB HLC HLD) in refinement.
Say No to not generate HL coeffs from input
phases.
place_waters= True You can choose whether phenix.refine automatically
places ordered solvent (waters) during the refinement
process.
refinement_resolution= 0 Enter the high-resolution limit for refinement
only. This high-resolution limit can be different
than the high-resolution limit for other steps.
The default ("None" or 0.0) is to use
the overall high-resolution limit for this run
(as set by resolution)
ordered_solvent_low_resolution= None You can choose what resolution
cutoff to use for placing ordered solvent
in phenix.refine. If the resolution of
refinement is greater than this cutoff,
then no ordered solvent will be placed,
even if
refinement.main.ordered_solvent=True.
link_distance_cutoff= 3 You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
r_free_flags_fraction= 0.1 Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
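The combined effect of r_free_flags_fraction and r_free_flags_max_free can
be written as a one-line rule. The sketch below is a minimal illustration
with a hypothetical function name; truncating the fractional count to an
integer is our assumption, and the actual selection of reflections is left
to the refinement program.

```python
def free_set_size(n_reflections, r_free_flags_fraction=0.1,
                  r_free_flags_max_free=2000):
    """Upper limit on the number of reflections in the free R set."""
    by_fraction = int(n_reflections * r_free_flags_fraction)
    # The limit is the lower of the fraction-based and absolute values
    return min(by_fraction, r_free_flags_max_free)
```

With the default values, a data set of 100000 reflections is capped at 2000
free reflections by r_free_flags_max_free, while a small data set is
limited by the fraction instead.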
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
r_free_flags_lattice_symmetry_max_delta= 5 You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
allow_overlapping= None Default is None (set automatically, normally
False unless S or Se atoms are the
anomalously-scattering atoms). You can allow atoms in
your ligand files to overlap atoms in your
protein/nucleic acid model. This overrides
'keep_pdb_atoms'. Useful in early stages of
model-building and refinement. The ligand atoms get
the altloc indicator 'L'. NOTE: The ligand occupancy
will be refined by default if you set
allow_overlapping=True (because of the altloc
indicator). You can turn this off with
fix_ligand_occupancy=True.
fix_ligand_occupancy= None If allow_overlapping=True then ligand
occupancies are refined as a group. You can turn
this off with fix_ligand_occupancy=True. NOTE: has
no effect if allow_overlapping=False
remove_outlier_segments= True You can remove any segments that are not
assigned to sequence if their mean B values are
more than remove_outlier_segments_z_cut sd
higher than the mean for the structure. NOTE:
this is done after refinement, so the R/Rfree
are no longer applicable; the remarks in the
PDB file are removed
twin_law= None You can specify a twin law for refinement like this:
twin_law='-h,k,-l'
max_occ= None You can choose to set the maximum value of occupancy for
atoms that have their occupancies refined. Default is None (use
default value of 1.0 from phenix.refine)
refine_before_rebuild= True You can choose to refine the input model
before rebuilding it
refine_with_ncs= True This script can allow phenix.refine to
automatically identify NCS and use it in refinement.
NOTE: ncs refinement and placing waters automatically
are mutually exclusive at present.
refine_xyz= True You can choose whether phenix.refine is to refine
coordinates
s_annealing= False You can choose to carry out simulated annealing
during the first refinement after initial model-building
skip_hexdigest= False You may wish to ignore the hexdigest of the free R
flags in your input PDB file if (1) the dataset you
provide is not identical to the one that you refined
with (but has the same free R flags), or (2) you are
providing both an input_data_file and an
input_refinement_file or input_hires_file. In the
second case, the resulting composite file may not have
the same hexdigest even though the free R flags are
copied over. The default is to set skip_hexdigest=True
for case #2. For case #1 you have to tell the Wizard to
skip the hexdigest (because it cannot know about this).
use_hl_anom_in_refinement= False See use_hl_anom_in_denmod. If
use_hl_anom_in_refinement=True then the
HLanom HL coefficients from Phaser are used
in refinement
thoroughness
build_outside= True Define whether to use the BuildOutside module in
model_building
connect= True Define whether to use the connect module in
model_building. This module tries to connect nearby chains with
loops, without using the sequence. This is different than
fit_loops (which uses the sequence to identify the exact number
of residues in the loop).
extensive_build= False You can choose whether to build a new model on
every cycle and carry out extra model-building steps
every cycle. Default is False (build a new model on
first cycle, after that carry out extra steps).
fit_loops= True You can fit loops automatically if sequence alignment
has been done.
insert_helices= True Define whether to use the insert_helices module in
model_building. This module tries to insert helices
identified with find_helices_strands into the current
working model. This can be useful as the standard build
sometimes builds strands into helical density at low
resolution.
n_cycle_build= None Choose number of cycles of building and chain
extension during each cycle of model-building (default
is 1).
n_cycle_build_max= 6 Maximum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is not
found, a maximum of n_cycle_build_max cycles will be
carried out.
n_cycle_build_min= 1 Minimum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is
found, n_cycle_build_min cycles will be carried out.
n_cycle_rebuild_max= 15 Maximum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is not found, a maximum of
n_cycle_rebuild_max cycles will be carried out.
n_cycle_rebuild_min= 1 Minimum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is found, n_cycle_rebuild_min
cycles will be carried out.
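The way the minimum and maximum cycle limits interact (n_cycle_build_min/n_cycle_build_max and n_cycle_rebuild_min/n_cycle_rebuild_max) can be sketched as a loop that never stops before the minimum and never runs past the maximum. The names run_cycles, build_one_cycle and is_satisfactory are hypothetical placeholders for illustration; they are not part of the Wizard.

```python
def run_cycles(build_one_cycle, is_satisfactory, n_min=1, n_max=6):
    """Run between n_min and n_max cycles and return the number carried out.

    build_one_cycle(cycle) -> model   (hypothetical callback)
    is_satisfactory(model) -> bool    (hypothetical callback)
    """
    if n_min > n_max:
        raise ValueError("n_min must not exceed n_max")
    cycles_done = 0
    for cycle in range(1, n_max + 1):
        model = build_one_cycle(cycle)
        cycles_done = cycle
        # A satisfactory model ends the run only once the minimum is reached.
        if cycle >= n_min and is_satisfactory(model):
            break
    return cycles_done


# A model that is always satisfactory still gets n_min cycles;
# one that never is stops at n_max.
print(run_cycles(lambda c: c, lambda m: True, n_min=2, n_max=6))
print(run_cycles(lambda c: c, lambda m: False, n_min=1, n_max=15))
```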
n_mini= 10 You can choose how many times to retrace your model in
"retrace_before_build"
n_random_frag= 0 In resolve building you can randomize each fragment
slightly so as to generate more possibilities for tracing
based on extending it.
n_random_loop= 3 Number of randomized tries from each end for building
loops. If 0, then one try. If N, then N additional tries
with randomization based on rms_random_loop.
n_try_rebuild= 2 Number of attempts to build each segment of chain
ncycle_refine= 3 Choose number of refinement cycles (3)
number_of_models= None This parameter lets you choose how many initial
models to build with RESOLVE within a single build
cycle. This parameter is now superseded by
number_of_parallel_models, which sets the number of
models (but now entire build cycles) to carry out in
parallel. None or zero means set it automatically.
That is what you normally should use. The
number_of_models is by default set to 1 and
number_of_parallel_models is set to the value of
nbatch (typically 4).
number_of_parallel_models= 0 This parameter lets you choose how many
models to build in parallel. None or 0 means
set it automatically. That is what you
normally should use. You can set this to 1 to
prevent the wizard from running multiple jobs
in parallel
skip_combine_extend= False You can choose whether to skip the
combine-extend step in model-building if only one
model is available
fully_skip_combine_extend= False You can choose whether to skip the
combine-extend step in model-building in all
cases
thorough_loop_fit= True Try many conformations and accept them even if
the fit is not perfect? If you say True the
parameters for thorough loop fitting are:
n_random_loop=100 rms_random_loop=0.3
rho_min_main=0.5 while if you say False those for
quick loop fitting are: n_random_loop=20
rms_random_loop=0.3 rho_min_main=1.0
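The two loop-fitting presets listed for thorough_loop_fit can be written out as a small lookup table. The values are taken from the description above; the names LOOP_FIT_PRESETS and loop_fit_parameters are hypothetical and used only for this sketch.

```python
# Presets as stated in the thorough_loop_fit description.
LOOP_FIT_PRESETS = {
    True: {"n_random_loop": 100, "rms_random_loop": 0.3, "rho_min_main": 0.5},
    False: {"n_random_loop": 20, "rms_random_loop": 0.3, "rho_min_main": 1.0},
}


def loop_fit_parameters(thorough_loop_fit=True):
    """Return a copy of the parameter set implied by thorough_loop_fit."""
    return dict(LOOP_FIT_PRESETS[bool(thorough_loop_fit)])


print(loop_fit_parameters(True))
print(loop_fit_parameters(False))
```

The thorough preset tries more conformations and accepts weaker density (lower rho_min_main); the quick preset does the opposite.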
general
coot_name= "coot" If your version of coot is called something else, then
you can specify that here.
i_ran_seed= 72432 Random seed (positive integer) for model-building and
simulated annealing refinement
raise_sorry= False You can have any failure end with a Sorry instead of
simply printing a message to the screen
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command='sh ' (or similar, sh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
check_wait_time= 1.0 You can specify the length of time (seconds) to
wait between checking for subprocesses to end
max_wait_time= 1.0 You can specify the length of time (seconds) to wait
when looking for a file. If you have a cluster where jobs
do not start right away you may need a longer time to
wait. The symptom of too short a wait time is 'File not
found'
wait_between_submit_time= 1.0 You can specify the length of time
(seconds) to wait between each job that is
submitted when running sub-processes. This can
be helpful on NFS-mounted systems when running
with multiple processors to avoid file
conflicts. The symptom of too short a
wait_between_submit_time is 'File exists:....'
cache_resolve_libs= True Use caching of resolve libraries to speed up
resolve
resolve_size= 12 Size for solve/resolve
("", "_giant",
"_huge", "_extra_huge" or a number,
where 12=giant and 18=huge)
check_run_command= False You can have the wizard check your run command
at startup
run_command= "sh " When you specify nproc=nn, you can run the
subprocesses as jobs in background with sh (default) or
submit them to a queue with the command of your choice
(e.g., qsub). If you have a multi-processor machine, use
sh. If you have a cluster, use qsub or the equivalent
command for your system. NOTE: If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If nproc is greater than 1 and you use
run_command='sh ' (or similar, sh is default) then normally
you will use background=True so that all the jobs run
simultaneously.
queue_commands= None You can add any commands that need to be run for
your queueing system. These are written before any other
commands in the file that is submitted to your queueing
system. For example on a PBS system you might say:
queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS
-j oe' queue_commands='#PBS -l walltime=03:00:00'
queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put
in the characters '' in any queue_commands line and this
will be replaced by a string of characters based on the
path to the run directory. The first character and last
two characters of each part of the path will be
included, separated by '_',up to 15 characters. For
example
'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1'
would be represented by: 'tld_W_5_A1__TP0_1'
condor_universe= vanilla The universe for condor is usually vanilla.
However you might need to set it to local for your
cluster
add_double_quotes_in_condor= True You might need to turn on or off
double quotes in condor job submission
scripts. These are already default
elsewhere but may interfere with condor
paths.
condor= None Specifies if the group_run_command is submitting a job to a
condor cluster. Set by default to True if
group_run_command=condor_submit, otherwise False. For condor job
submission mr_rosetta uses a customized script with condor
commands. Also uses one_subprocess_level=True
last_process_is_local= True If true, run the last process in a group in
background with sh as part of the job that is
submitting jobs. This prevents having the job
that is submitting jobs sit and wait for all the
others while doing nothing
skip_r_factor= False You can skip R-factor calculation if refinement is
not done and maps_only=True
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
base_path= None You can specify the base path for files (default is
current working directory)
temp_dir= None Define a temporary directory (it must exist)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is False (keep
these directories). If you want to remove them after your run
is finished use a command like "phenix.autobuild run=1
clean_up=True". Files listed in keep_files will not be
deleted.
print_citations= True Print citations at end of run
solution_output_pickle_file= None At end of run, write solutions to this
file in output directory if defined
title= None Enter any text you like to help identify what you did in
this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
wizard_directory_number= None This is used by the GUI to define the run
number for Wizards. It is the same as
desired_run_number NOTE: this value can only be
specified on the command line, as the directory
number is set before parameters files are read.
verbose= False Command files and other verbose output will be printed
extra_verbose= False Facts and possible commands will be printed every
cycle if True
debug= False You can have the wizard stop with error messages about the
code if you use debug. Additionally the output goes to the
terminal if you specify "debug=True"
require_nonzero= True Require non-zero values in data columns to
consider reading in.
remove_path_word_list= None List of words identifying paths to remove
from PATH These can be used to shorten your PATH.
For example... cns ccp4 coot would remove all
paths containing these words except those also
containing phenix. Capitalization is ignored.
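The filtering rule described for remove_path_word_list can be sketched as follows. This is a minimal illustration of the stated behavior (drop PATH entries containing any listed word unless the entry also contains phenix, ignoring case). The function name filter_path and the ':' separator are assumptions made for the sketch.

```python
def filter_path(path, words, keep_word="phenix", sep=":"):
    """Drop PATH entries containing any of `words`, unless they also contain keep_word.

    Matching ignores capitalization, as described for remove_path_word_list.
    """
    lowered_words = [w.lower() for w in words]
    kept = []
    for entry in path.split(sep):
        lowered = entry.lower()
        has_word = any(w in lowered for w in lowered_words)
        if has_word and keep_word not in lowered:
            continue  # remove this entry from the path
        kept.append(entry)
    return sep.join(kept)


example = "/usr/bin:/opt/CCP4/bin:/opt/phenix/build/coot/bin:/opt/cns/bin"
print(filter_path(example, ["cns", "ccp4", "coot"]))
# -> /usr/bin:/opt/phenix/build/coot/bin
```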
fill= False Fill in all missing reflections to resolution res_fill.
Applies to density modified maps. See also filled_2fofc_maps in
autobuild.
res_fill= None Resolution for filling in missing data (default = highest
resolution of any datafile). Only applies to density modified
maps. Default is fill to high resolution of data. Ignored if
fill=False
check_only= False Just read in and check initial parameters. Not for
general use
keep_files= overall_best* AutoBuild_run_*.log List of files that are not
to be cleaned up. Wildcards are permitted.
after_autosol= False You can specify that you want to continue on
starting with the highest-scoring run of AutoSol in your
working directory.
nbatch= 3 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch. If you set
nproc=Auto and your machine has n processors, then it will use
n-1 processors, or 1 if only 1 is available
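The nproc=Auto rule stated above (use n-1 of n processors, or 1 if only 1 is available) amounts to the following. The function name resolve_nproc is hypothetical, and os.cpu_count() is used here only as a stand-in for however the Wizard counts processors.

```python
import os


def resolve_nproc(nproc, available=None):
    """Translate an nproc setting into a processor count.

    'Auto' means: leave one processor free, but always use at least one.
    """
    if available is None:
        available = os.cpu_count() or 1  # assumption: stand-in for the Wizard's own count
    if isinstance(nproc, str) and nproc.lower() == "auto":
        return max(1, available - 1)
    return int(nproc)


print(resolve_nproc("Auto", available=8))  # -> 7
print(resolve_nproc("Auto", available=1))  # -> 1
print(resolve_nproc(4, available=8))       # -> 4
```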
quick= False Run everything quickly (number_of_parallel_models=1
n_cycle_build_max=1 n_cycle_rebuild_max=1)
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value (the value can be optional).
Examples: coarse_grid; resolution 200 2.0; hklin
test.mtz. NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single
quotes (') like this:
resolve_command_list="'no_build' 'b_overall 23' "
resolve_pattern_command_list= None Commands for resolve_pattern. One per
line in the form: keyword value (the value can
be optional). Examples: resolution 200 2.0;
hklin test.mtz. NOTE: for command-line
usage you need to enclose the whole set of
commands in double quotes (") and
each individual command in single quotes
(') like this:
resolve_pattern_command_list="'resolution 200 20'
'hklin test.mtz' "
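The quoting convention for resolve_command_list and resolve_pattern_command_list (each command in single quotes, the whole set in double quotes) is easy to get wrong by hand. A small helper such as the one below can build the argument from a list of commands. The helper name format_command_list is hypothetical, and the output format simply mirrors the examples given above.

```python
def format_command_list(keyword, commands):
    """Build keyword="'cmd1' 'cmd2' " from a list of command strings."""
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError("commands must not contain quote characters")
    quoted = " ".join("'%s'" % command for command in commands)
    # The trailing space before the closing double quote follows the examples above.
    return '%s="%s "' % (keyword, quoted)


print(format_command_list("resolve_command_list", ["no_build", "b_overall 23"]))
# -> resolve_command_list="'no_build' 'b_overall 23' "
```

When the result is passed through a shell, the outer double quotes are consumed by the shell, so the program receives the single-quoted commands as one argument.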
ignore_errors_in_subprocess= False Try to ignore errors in sub-processes.
This is useful in cases where a very rare
crash occurs and you want to just ignore
that step and go on.
send_notification= False
notify_email= None
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown.
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like
"input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard"
or "stop". If you type "phenix.autobuild run=3
stop" then this will stop run 3 of autobuild.
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown.
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off.
run= None Set run to n to continue with run n where you left off.
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off.
display_runs= None List all runs for this wizard.
delete_runs= None List runs to delete: 1 2 3-5 9:12
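The run-list syntax shown for delete_runs ("1 2 3-5 9:12") mixes single run numbers with ranges. A minimal sketch of how such a list could be expanded is given below. It assumes that both '-' and ':' denote inclusive ranges, which the example suggests but the text does not state. The function name parse_run_list is hypothetical.

```python
def parse_run_list(text):
    """Expand a run list such as '1 2 3-5 9:12' into a list of run numbers.

    Assumption: 'a-b' and 'a:b' both mean the inclusive range a..b.
    """
    runs = []
    for token in text.split():
        for separator in ("-", ":"):
            if separator in token:
                first, last = token.split(separator, 1)
                runs.extend(range(int(first), int(last) + 1))
                break
        else:
            # No range separator found: a single run number.
            runs.append(int(token))
    return runs


print(parse_run_list("1 2 3-5 9:12"))
# -> [1, 2, 3, 4, 5, 9, 10, 11, 12]
```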
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this:
peak.data="F+ SIGF+ F- SIGF-". The entire
string in quotes counts here. You can use the individual
labels from these strings as identifiers for data
columns in AutoSol or AutoBuild like this:
input_refinement_labels="FP SIGFP FreeR_flags"
# each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults. Not for
general use
display_all= False Just read in and display parameter defaults
non_user_parameters These are obsolete parameters and parameters that the
wizards use to communicate among themselves. Not
normally for general use.
gui_output_dir= None Used only by the GUI
background_map= None You can supply an mtz file (REQUIRED LABELS: FP
PHIM FOMM) to use as map coefficients to calculate the
electron density in all points in an omit map that are
not part of any omitted region. (Default="")
boundary_background_map= None You can supply an mtz file (REQUIRED
LABELS: FP PHIM FOMM) to use as map
coefficients to calculate the electron density
in all points in the boundary map that are not
part of any omitted region.
(Default="")
extend_try_list= True You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
force_combine_extend= False You can choose whether to force the
combine-extend step in model-building
model_list= None This keyword lets you name any number of PDB files to
consider as starting models for model-building. NOTE: This
differs from consider_main_chain_list which will try to add
your PDB files EVERY cycle of merging models. In contrast
model_list will only do it on the first cycle. NOTE: this
only uses the main-chain atoms of your PDB files.
oasis_cnos= None Enter number of C N O and S atoms here if you have
OASIS and want to run it before resolve density modification
like this: "C 250 N 121 O 85 S 3"
offset_boundary_background_map= None You can set the offset of the
boundary_background_map.
skip_refine= False Skip refinement (used in
get_connections/assign_sequence)
sg= None Obsolete. Use space_group instead
input_data_file= None Not normally used (same as "data=").
input_map_file= Auto Not normally used. (Same as map_file).
input_refinement_file= Auto Not normally used. Same as refinement_file
input_pdb_file= None Not normally used. Same as "model="
input_seq_file= Auto Not normally used. Same as seq_file