Molecular replacement and autobuilding using Phaser, Rosetta, and
autobuild with mr_rosetta
- Author(s)
- Purpose
- Tools from Rosetta that are used in mr_rosetta
- Steps where mr_rosetta uses structure-modeling algorithms
- Summary of the procedure used in mr_rosetta
- Details of the procedure used in mr_rosetta
- Notes on the procedure used in mr_rosetta
- Viewing solutions and restarting with saved solutions
- Running mr_rosetta on a cluster
- Single file system required for mr_rosetta
- Read/write delay allows for slow NFS disks
- Tracking your log files
- Re-running parts of your mr_rosetta jobs
- Failures in sub-processes
- Stopping mr_rosetta
- Installing Rosetta for use with mr_rosetta
- Setting up for a run of mr_rosetta A. Fragment files from the Robetta server
- Setting up for a run of mr_rosetta B. Alignment files from the hhpred server
- Search models and alignment files
- Output files from mr_rosetta
- Graphical interface
- Parameters files in mr_rosetta
- Examples
- Standard run of mr_rosetta:
- Running mr_rosetta with a model that is already placed in the unit cell
- Rebuilding your model with Rosetta before MR
- Running mr_rosetta from a homology search (with an hhr file)
- Getting a default parameters file for mr_rosetta:
- Testing mr_rosetta:
- Possible Problems
- Debugging problems with running mr_rosetta:
- Specific limitations and problems:
- Literature
- Additional information
- List of all mr_rosetta keywords
Author(s)
- mr_rosetta: Tom Terwilliger, Frank DiMaio, Randy Read, David Baker
Purpose
mr_rosetta is a procedure for extending the range of molecular replacement
by combining tools from the structure-modeling field (Rosetta) with
crystallographic molecular replacement, model-building, density
modification and refinement. The approach is described in
DiMaio et al. (2011). It can also be used to rebuild a model with a combination
of Rosetta and Phenix tools.
A key requirement for using mr_rosetta is that you must have a
sequence alignment between your target protein and the protein used as a template
to model it. You can try several different alignments,
but a good alignment has to be among them or the procedure
is unlikely to succeed. The reason is that Rosetta homology
modeling makes strong use of the sequence, so if your alignment is incorrect
you are essentially trying to build the wrong molecule.
The basic process is to find MR solutions with automr,
rebuild them with Rosetta, then rebuild those models with
phenix.autobuild. The combination of Rosetta rebuilding and
Phenix rebuilding is the key part of this method. In slightly more
detail, the process is to select possible MR solutions with
automr (Phaser), one of which must later be shown to be correct for the
procedure to succeed; score them by LLG after Rosetta relaxation;
pick the best solutions; rebuild each of these with Rosetta, including
map information (a density term); score the resulting models with
Rosetta; select the highest-scoring models and score them by LLG; verify that the top
solutions are all about the same (their electron density maps are correlated);
and rebuild the top models with autobuild.
mr_rosetta can handle a single copy of a single chain, or multiple
copies of a single chain (NCS), or multiple copies of multiple chains
(groups of NCS). If you supply one or more input search models, then the
entire crystallographic asymmetric unit must contain some multiple
of the search models you supply (Phaser will be used to find
copies of the search model). NCS will be found automatically in
your search model and in any models assembled by mr_rosetta.
NOTE: if your molecule has multiple chain types, then you cannot use
the simple hhr file input (see below) and you cannot automatically run
pre-refinement with Rosetta on your molecule. Instead you need to use mr_model_preparation
and phenix.automr to place your model. Then you can supply the aligned,
placed structure to mr_rosetta for rebuilding. Additionally, in this
case you will need to supply a different set of fragment files for each
chain type.
Tools from Rosetta that are used in mr_rosetta
-
Calculation of Rosetta energies for a model.
-
Model relaxation (rebuilding) including just the standard Rosetta
energy term. This optional rebuilding can be carried out before molecular
replacement to improve the starting model.
-
Model completion and relaxation (filling in missing sections and rebuilding)
with Rosetta energies and fit to a map (DiMaio et al., 2009).
This step is carried out after
an MR solution has been found. The placed model is used to calculate a
2Fo-Fc map, and fit to this map is used as a target in addition to the
standard Rosetta energies. Note: sequence-specific fragment
libraries (obtained from the Robetta webserver) are used in this process.
Steps where mr_rosetta uses structure-modeling algorithms
-
Prior to MR. Edited templates are optionally rebuilt before carrying out
molecular replacement.
-
In scoring MR solutions. MR solutions are scored by Phaser LLG scores
calculated after model relaxation with Rosetta, including the term for
fit to the current density map.
-
In rebuilding MR solutions. Initial molecular replacement solutions are
rebuilt using Rosetta model completion and relaxation, including the term
for fit to the current density map.
Summary of the procedure used in mr_rosetta
The overall process in one cycle of mr_rosetta is: (a) edit the model and
place it in the unit cell (e.g., by molecular replacement, MR), (b) score all MR
solutions and take the best ones by LLG for further steps, (c) rebuild each
model 20-2000 times using Rosetta and a density-modified 2Fo-Fc map to yield
Rosetta models, (d) refine the Rosetta models, average the density from the top 20%,
and continue rebuilding each Rosetta model using the averaged density, and (e) take
the top models based on LLG score and rebuild them with autobuild.
An optional prerefinement step is to carry out Rosetta modeling in
step (a) above, before carrying out molecular replacement.
Details of the procedure used in mr_rosetta
-
Check the installation and verify that the Rosetta binary and libraries are
available (you can specify these with the keyword rosetta_path, the overall path to
the Rosetta directories, and with the keywords rosetta_binary_dir, rosetta_binary_name,
rosetta_script_dir and rosetta_database_dir, which give paths relative to
rosetta_path).
-
Read the reflections file (it must be in CCP4 mtz format and have a free R set; use the
PHENIX GUI or import_and_add_free to set this up).
-
An optional model editing step. If an hhpred .hhr file is supplied,
mr_rosetta reads through this file, downloads the
PDB files specified, and applies the specified alignments to generate pairs of
alignment files and edited models (e.g., 2cng.ali and 2cng_mr.pdb based on
2cng.pdb). This step is carried out by sculptor using a default
protocol.
NOTE 1: To tailor this step, use mr_model_preparation and then supply the aligned model to mr_rosetta.
NOTE 2: If your structure contains more than one chain or requires more than one homology model to represent the structure, then you need to use mr_model_preparation and phenix.automr to place your model. Then you can supply the aligned, placed structure to mr_rosetta for rebuilding.
-
NOTE: The steps below are carried out for each model/alignment file pair
supplied (or for each pair generated by mr_rosetta if an hhpred .hhr
file is supplied with alignment information).
-
Check the model, alignment file and sequence file to verify that they match
(i.e., that the alignment file can be applied to the model to yield a model
with the sequence in the sequence file). Copy the overall B-factor and the
list of atomic B-factors from the model, to be substituted into subsequent
models before scoring. Rewrite the sequence file in standard (FASTA)
format.
- Optional Rosetta modeling prior to molecular replacement (prerefinement).
The edited starting model (template) is rebuilt with Rosetta using the
standard Rosetta energy functions and a fragment library specific to the
target sequence to fill in gaps.
(If no gaps are present, no fragment files are necessary.)
-
Run automr to find molecular replacement solutions (default
number_of_output_models=5).
-
Refine each MR solution with phenix.refine (if refine_after_mr=True). Use the
resulting 2mFo-DFc map as the starting point for density modification, yielding
a density-modified current density map for this refined solution, to be used
in the Rosetta rebuilding below.
The refined model itself is ignored, and its LLG is noted but not used.
- Determine the baseline LLG for model improvement.
This LLG can either be the LLG obtained in the previous step, or an LLG
obtained after rebuilding the MR model with Rosetta. Optionally,
perform Rosetta relaxation (rebuilding rescore_mr.nstruct models, typically 5)
without filling in missing sections, including a density term from the map
from the previous step. The sequence of all parts of the
model at this point will match the target sequence.
Score each solution by LLG. These relaxed models are not carried forward; instead,
the best LLG for any rebuilt model is taken as the score for the original
model from MR. This score will be used later to prioritize MR solutions
for further analysis.
-
Sample relaxation script used to run relaxation in Rosetta:
#!/bin/sh
cd MR_ROSETTA_1/RESCORE_MR_1/RELAX_AND_SCORE_IN_SETS_1/RUN_1/WORK_1
/net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode cm \
-in:file:extended_pose 1 \
-in:file:fasta MR_ROSETTA_1/WORK_1/EDITED_1crb_fasta.txt \
-in:file:alignment MR_ROSETTA_1/WORK_1/EDITED_1crb_2qo4.ali \
-in:file:template_pdb MR_ROSETTA_1/AutoMR_run_1_/2QO4.1.pdb \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso 3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile \
MR_ROSETTA_1/AutoMR_run_1_/2QO4.1_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-cm:aln_format grishin \
-MR:max_gaplength_to_model 0 \
-nstruct 1 \
-ignore_unrecognized_res \
-overwrite
-
Take the top models (max_solutions_to_rebuild=5), ranked by the LLG scores
from the baseline scoring step above, and rebuild them
with Rosetta, this time filling in missing sections and including
the density term with the current map (the same map as used above in relaxation)
as part of the target function, generating a total of
(rosetta_rebuild.nstruct=20) rebuilt models.
These are Rosetta models.
The sequence of these models
will be the same as that of the target (unless there are long gaps in the
template that cannot be filled by Rosetta).
Note: in many cases 20 models
is sufficient; in others, far more models (e.g., 1000 or 2000) will make the
method work better. This can take a lot of time unless you
have a cluster to run on. This Rosetta rebuilding step uses a library of
fragments specific to the target sequence to fill in any gaps.
(If no gaps are present, no fragment files are necessary.)
Sample rebuild script used:
#!/bin/sh
cd MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_1/WORK_1
/net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode cm \
-in:file:extended_pose 1 \
-in:file:fasta MR_ROSETTA_1/WORK_1/EDITED_1crb_fasta.txt \
-in:file:alignment MR_ROSETTA_1/WORK_1/EDITED_1crb_2qo4.ali \
-in:file:template_pdb MR_ROSETTA_1/AutoMR_run_1_/2QO4.1.pdb \
-loops:frag_sizes 9 3 1 \
-loops:frag_files inputs/aa1crb_09_05.200_v1_3.gz \
inputs/aa1crb_03_05.200_v1_3.gz none \
-loops:random_order \
-loops:random_grow_loops_by 5 \
-loops:extended \
-loops:remodel quick_ccd \
-loops:relax relax \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso 3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile MR_ROSETTA_1/AutoMR_run_1_/2QO4.1_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-cm:aln_format grishin \
-MR:max_gaplength_to_model 8 \
-nstruct 1 \
-ignore_unrecognized_res \
-overwrite
- Choose the top rebuilt models (percentage_to_rescore=10, i.e., the top 10%) based on
Rosetta score (including the density term) and rescore them by LLG.
-
Determine whether the top (number_of_required_cc=5) Rosetta models by LLG score
are all similar (map correlation between the map for the top
model and the map for each of the others).
-
Refine the top Rosetta models, ranked by LLG, with phenix.refine
(if refine_top_models.run_refine_top_models=True, percent_to_refine=20).
Save the new 2mFo-DFc map, density modify it, and then average the
top density-modified
maps (top based on Rosetta score) to yield an averaged density map
used in the next relaxation step. The refined model is ignored.
-
Relax (rebuild) the Rosetta models into their corresponding
density maps with Rosetta, generating (relax_top_models.nstruct=5) models for
each. Score each relaxed model by LLG and take the best LLG as its score. Save the
best relaxed model as the new solution. Sample relax script used:
#!/bin/sh
cd MR_ROSETTA_1/GROUP_OF_RESCORE_MR_ROSETTA_2/RUN_1/RESCORE_MR_1/RELAX_AND_SCORE_IN_SETS_1/RUN_1/WORK_1
/net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode relax \
-in::file::s \
MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_8/WORK_1/S_2QO4B_0001_edited.pdb \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso 3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile \
MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_8/WORK_1/S_2QO4B_0001_edited_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-nstruct 1 \
-overwrite
-
Take the top (number_to_autobuild=5) relaxed refined rebuilt Rosetta models
from the previous step (scored by LLG) and rebuild them with phenix.autobuild.
Report the R/freeR of each model.
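The selection of the top percentage_to_rescore=10 percent of rebuilt models by Rosetta score can be pictured with a short shell sketch. This is not how mr_rosetta does it internally; it assumes a hypothetical two-column text file (model name, Rosetta score) and a hypothetical function name, top_percent, and it treats lower (more negative) Rosetta scores as better.

```shell
#!/bin/sh
# Hypothetical helper, not part of mr_rosetta: print the names of the best
# PERCENT percent of models from a two-column table "model_name score".
# Rosetta scores are energies, so LOWER (more negative) is treated as better.
top_percent() {
  score_file=$1   # hypothetical score table, one model per line
  percent=$2      # e.g. 10, mirroring percentage_to_rescore=10
  # Count non-empty lines; "|| true" keeps an empty file from failing
  n=$(grep -c . "$score_file" || true)
  # Integer ceiling of n * percent / 100
  k=$(( (n * percent + 99) / 100 ))
  if [ "$k" -lt 1 ]; then
    k=1   # always keep at least one model
  fi
  # Sort numerically on the score column (C locale: "." as decimal point)
  LC_ALL=C sort -k2,2n "$score_file" | head -n "$k" | awk '{ print $1 }'
}
```

To rank by LLG instead, where higher is better, the sort would be reversed (sort -k2,2nr).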
Notes on the procedure used in mr_rosetta
-
Viewing solutions and restarting with saved solutions
At each stage, existing solutions are saved as a Python "pkl" (pickle) file.
These solutions are stored as "mr_rosetta_solution" objects, which keep track
of the model and its history, the map coefficients and labels, etc.
They can be displayed with "display_solutions=True", and they can be read
back in to mr_rosetta with the keyword "mr_rosetta_solutions=results.pkl"
and used as inputs for subsequent runs, starting at any step that can use
those solutions.
NOTE:
You can restart mr_rosetta only at the beginning of major stages (like
"place_model", "rosetta_rebuild", etc.), but not in between.
Normally at the end of a major stage a .pkl file is written out, with text
like "type this to see all the results". You can almost always give your
original command plus the keywords "start_point=xxx" and
"mr_rosetta_solutions=my_pickle_file.pkl", and mr_rosetta should then
continue from there.
-
Running mr_rosetta on a cluster
Jobs can be run on a single machine or on a cluster. A run command for
single jobs (single_run_command="sh") and a run command for batch
jobs (group_run_command=qsub) can be specified, as well as the number of
processors to use (e.g., nproc=200).
The qsub command is used on Sun Grid Engine clusters. You can also
use mr_rosetta on a Condor cluster by setting
group_run_command="condor_submit ".
-
Single file system required for mr_rosetta
All files are stored on a single file system that must be accessible to
all jobs.
-
Read/write delay allows for slow NFS disks
Reads and writes of files are generally accompanied by a wait of up to max_wait_time=100 seconds for the new file to appear.
-
Tracking your log files
mr_rosetta runs all CPU-intensive jobs as sub-processes.
When it submits a sub-process to
do the work, it lists the name of the corresponding log file. You can work
your way down to the bottom level at any time by reading through these log
files, copying the name of the next log file, and opening it, until you get
to the place where the actual work is done.
-
Re-running parts of your mr_rosetta jobs
Sub-processes are always run in sub-directories. Each sub-process has a
file "RUN_FILE_1" that contains the information to run the sub-process,
a parameter file PARAMS_1.eff and a log file "RUN_FILE_1.log" with the
log file of running that sub-process.
Note that you can use the parameters files to re-run any jobs that you want.
You can say something like:
phenix.mr_rosetta PARAMS_1.eff
and that will rerun the job specified in that directory.
-
Failures in sub-processes
If some sub-processes fail, normally the failures will be ignored. This
is useful as your overall job can often continue even if a few
refinement or rosetta jobs fail. However if the failure is from the
queueing system (rather than in the actual running of the jobs)
then the overall job may still fail.
-
Stopping mr_rosetta
If you create a file "STOPWIZARD" in the top
level directory (i.e., MR_ROSETTA_1/), then each job in the entire
process will stop as soon as
any Phenix part of the process takes over (i.e., as soon as Rosetta jobs
finish).
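The read/write delay described above (max_wait_time=100) amounts to polling for a file until it appears or a time limit is reached. A minimal shell sketch of that idea, using a hypothetical function wait_for_file that checks once per second, might look like this; it is an illustration of the behavior, not mr_rosetta's own code.

```shell
#!/bin/sh
# Hypothetical sketch (not mr_rosetta's own code): poll once per second until
# a file exists, giving up after max_wait seconds (cf. max_wait_time=100).
wait_for_file() {
  target_file=$1
  max_wait=${2:-100}   # seconds; default mirrors max_wait_time=100
  waited=0
  while [ ! -e "$target_file" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "gave up waiting for $target_file after $max_wait s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

On a slow NFS disk, a longer limit simply means more polling rounds before giving up.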
Installing Rosetta for use with mr_rosetta
To run mr_rosetta, you need to install Rosetta from the Baker laboratory
at the University of Washington. This is pretty easy, and a summary of steps
is given below. Once you have installed Rosetta, you need to set the
environment variable $PHENIX_ROSETTA_PATH. Then you will have all the software
you need for running mr_rosetta.
Note: this set of instructions is for Rosetta version 3.2...presumably
future versions will look very similar except for the numbering.
Downloading and installing Rosetta is pretty easy if your computer is
compatible and it takes about an hour if you have a 2-processor
machine...or just a few minutes if you have a multiprocessor machine to
compile with.
NOTE: If you run into trouble, see the FULL INSTRUCTIONS at http://www.rosettacommons.org/manuals/archive/rosetta3.2_user_guide/ or http://www.rosettacommons.org/manuals/archive/rosetta3.2_ or http://www.rosettacommons.org/manuals/archive/rosetta3.2.1_user_guide/
NOTE: If you run into trouble on Ubuntu 11.04 or later, also see: http://morganbye.net/blog/2011/05/rosetta-32-ubuntu-1104 (but only for the basic.settings and options.settings modifications and for using: scons bin mode=release cxx=gcc cxx_ver=4.5)
-
go to http://depts.washington.edu/uwc4c/express-licenses/assets/rosetta/ ,
find "Academic License" and click on "LICENSE". Fill out the form, and
receive by email a link to the download site and a login/password.
-
Go to Rosetta 3.2 on the download site, select Download, and
Download Rosetta "as one bundle". Note if you have a mac you may need to
install some additional patches (please see the instructions on the download
page).
-
Unpack and install Rosetta: Go to the directory where you want to install
it and move the downloaded file "rosetta3.2_Bundles.tgz" there. Then...
tar xzf rosetta3.2_Bundles.tgz
This should give you a directory rosetta-3.2 that contains:
BioTools new_apps.note rosetta_demos
foldit release.note rosetta_fragments
manual rosetta_database rosetta_source
NOTE: if your directory contains .tgz files instead of the listing above
you may need to also run the same tar command on the individual .tgz files.
-
Now you want to compile. You must have python on your machine...if not
you will need to install it from http://www.python.org/. You will also
need scons. If you don't have scons you can get it from www.scons.org/.
NOTE: you will need version 2.2 or later of python and 0.96.1 of scons.
You can check your versions with:
python --version
scons --version
-
NOTE 2: on Ubuntu you may also need zlib1g-dev. You can get
this library and scons with:
sudo su # UBUNTU ONLY FOR INSTALLING zlib1g and scons
apt-get install zlib1g-dev
apt-get install scons
- In the scons command below the "-j2" means use 2 processors....adjust for
your system. Takes about 1 hour with 2 processors.
cd rosetta_source
python external/scons-local/scons.py -j2 bin mode=release
cd ..
If you get to "scons: done building targets." you are all set!
-
Note where you have installed Rosetta. The directory you just set up,
now containing "rosetta_source" and "rosetta_database", is what is
called "PHENIX_ROSETTA_PATH". If this directory is, for example,
/net/sigma/raid1/rosetta-3.2, then you can now set an environment
variable in your ".profile" (sh or bash shell) or ".cshrc" (C shell) to
mark where Rosetta is located.
If you are using the bash or sh shells:
export PHENIX_ROSETTA_PATH=/your-path-to-rosetta-here/rosetta-3.2
or, if you are using csh or tcsh (C shell):
setenv PHENIX_ROSETTA_PATH /your-path-to-rosetta-here/rosetta-3.2
- If your machine is behind a firewall and you need to go through a proxy
server, and you use a .hhr file so that mr_rosetta downloads files from the PDB,
then you will need to specify your proxy server. You can use the following
command to specify it (replacing the example with YOUR proxy server).
If you are using the bash or sh shells:
export HTTP_PROXY=proxyout.mydomain.edu:8080
or, if you are using csh or tcsh (C shell):
setenv HTTP_PROXY proxyout.mydomain.edu:8080
-
Now you are completely ready to go with Rosetta and with mr_rosetta.
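Before running mr_rosetta, you might check that PHENIX_ROSETTA_PATH points at a directory with the layout described above. The sketch below uses a hypothetical function, check_rosetta_path. Its assumption that the compiled binary is named mr_protocols.* inside rosetta_source/bin comes from the sample scripts earlier on this page and may differ on your platform or Rosetta version.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of PHENIX): confirm that a Rosetta
# directory has the rosetta_source / rosetta_database layout described above.
check_rosetta_path() {
  root=${1:-${PHENIX_ROSETTA_PATH:-}}
  if [ -z "$root" ]; then
    echo "PHENIX_ROSETTA_PATH is not set" >&2
    return 1
  fi
  for sub in rosetta_source rosetta_database; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing directory: $root/$sub" >&2
      return 1
    fi
  done
  # Assumption: the mr_protocols binary lands in rosetta_source/bin, as in
  # the sample scripts (e.g. mr_protocols.default.linuxgccrelease).
  set -- "$root"/rosetta_source/bin/mr_protocols.*
  if [ ! -e "$1" ]; then
    echo "no mr_protocols binary in $root/rosetta_source/bin" >&2
    return 1
  fi
  echo "ok: $root"
}
```

Called with no argument, check_rosetta_path checks the directory named by $PHENIX_ROSETTA_PATH.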
Setting up for a run of mr_rosetta A. Fragment files from the Robetta server
To run mr_rosetta on your structure, you will need to use the Robetta
fragment server at the Univ. of Washington to generate 9-mer and 3-mer
fragments from the PDB that are compatible with your sequence file. This takes
a few hours but is very easy to do.
To obtain the two required files:
-
go to: http://robetta.bakerlab.org/fragmentsubmit.jsp
-
register
-
paste your sequence file into the form
-
Receive an email from the server after a few hours saying that your files are ready
-
Download the files (two files with similar filenames, one containing a 9 and
one a 3, like aat000_09_05.200_v1_3.gz and aat000_03_05.200_v1_3.gz)
-
These are your fragment files. You will need to list them in your mr_rosetta
parameters file
-
NOTE1: if your chain has more than 650 residues, then you will need to split
it into pieces of 650 residues or fewer before submitting the sequence
to the Robetta server. You will then get several 3-mer and 9-mer fragment
files, one for each piece that you submit. You can then simply paste these
together after editing all but the first to fix the residue numbers. To edit
the files, use:
phenix.adjust_robetta_resid <fragment_file_name> <new_fragment_file_name> <offset-for-residue numbers>
-
NOTE2: if you have multiple chain types in your structure then you will
want to have a separate set of fragments files for each chain type. You can
specify these with: fragment_files_chain_list, fragment_files_3_mer_by_chain,
and fragment_files_9_mer_by_chain instead of fragment_files.
Use fragment_files_chain_list to define which chain ID each
of your fragment_files_3_mer_by_chain and
fragment_files_9_mer_by_chain go with.
-
NOTE3: You only need one set of fragments files for each
UNIQUE chain. So if chains A and C are the same, you just
need to specify fragments for chain A.
If you have two different chains A
and B and fragment files frag_A_3 frag_A_9 frag_B_3 frag_B_9
then you should use:
fragment_files_chain_list=A
fragment_files_chain_list=B
fragment_files_3_mer_by_chain=frag_A_3
fragment_files_9_mer_by_chain=frag_A_9
fragment_files_3_mer_by_chain=frag_B_3
fragment_files_9_mer_by_chain=frag_B_9
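For chains longer than 650 residues (NOTE1 above), the split and the residue offsets can be worked out with a short shell sketch. The function name split_sequence is hypothetical; it assumes the sequence is a single line of one-letter codes with no FASTA header, and it prints each piece preceded by the number of residues before it. Whether that number is exactly the offset expected by the residue-renumbering command above is an assumption you should check.

```shell
#!/bin/sh
# Hypothetical helper: split a one-letter sequence (no FASTA header, no line
# breaks) into pieces of at most max_len residues, printing "offset piece".
# The offset is the number of residues that precede each piece.
split_sequence() {
  sequence=$1
  max_len=${2:-650}   # Robetta limit quoted above
  total=${#sequence}
  offset=0
  while [ "$offset" -lt "$total" ]; do
    first=$((offset + 1))
    last=$((offset + max_len))
    piece=$(printf '%s\n' "$sequence" | cut -c "$first-$last")
    echo "$offset $piece"
    offset=$((offset + max_len))
  done
}
```

Each printed piece would be submitted to Robetta separately, and the offset noted for renumbering the fragment files of that piece.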
Setting up for a run of mr_rosetta B. Alignment files from the hhpred server
You will need to tell mr_rosetta what to use as search models and
the alignment between the search models and your target structure. The
easiest way is to use the hhpred server (Söding J. (2005) Protein
homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.)
Here is what to do:
-
go to: http://toolkit.tuebingen.mpg.de/hhpred and paste in your sequence
and hit submit job (using all defaults).
-
In a few minutes there will be a new page with alignments in color.
You want to click on the little Save button on the line above all the
alignments. Save that .hhr file; this contains a list of all the
PDB entries with similar sequences and the alignments.
-
Repeat the run of hhpred (hit "Rerun job"),
this time selecting Alignment mode as global in
the middle of the page. Save the resulting .hhr file as well, and enter
both .hhr files as your hhr_files.
The HHR analysis file from hhpred contains PDB entries
similar in sequence to your target, along with sequence alignments. It is used
to create a list of search models and alignment files. If you supply
this file, you do not need to specify alignment files or search models.
Search models and alignment files
If you supply an hhr analysis file from hhpred, you do not need to worry
(usually) about the details of your search models and alignment files.
However you can supply mr_rosetta with your own list of search models and
a corresponding list of alignment files. This section describes what the
alignment files need to look like (two ways you can format these files.)
Here are your options for supplying alignment information:
- If you have a pre-edited PDB file (i.e., you ran
something beforehand to make the sequence be just what you want), then you
just supply the PDB file and the sequence file (the two sequences may be
identical, or the PDB file may have deletions relative to the sequence file),
as in the sample scripts.
- Otherwise, if you have a .ali alignment file (see below),
you can supply that along with the PDB
file and the sequence file, and then PHENIX will use sculptor to apply the
alignment.
- Otherwise, and commonly, you supply a .hhr file, and mr_rosetta downloads
the PDB file and applies the alignment in the hhr file (so you don't have to
supply either a search model or an alignment file).
- If you want to apply an alignment yourself with an hhr file, you can use
mr_model_preparation.
- If you want to apply an alignment yourself with an alignment file
that contains just 4 lines:
(1) > (greater-than symbol), then a title for the target sequence,
(2) the target sequence, with - for gaps,
(3) > (greater-than symbol), then a title for the template sequence, and
(4) the sequence of the protein in the template PDB you are supplying, with - for gaps,
then you can use mr_model_preparation.
You can generate an alignment file with phenix.muscle if you do not have
one from another source.
Use a command like this:
phenix.muscle -in my_two_sequences.dat -out my_alignment.ali
where my_two_sequences.dat looks like:
> title text for sequence of target (your structure) to follow
LVLKWVMSTKYVEAGELKEGSYVVIDGEPCRVVEIEKSKTGKHGSAKARIVAVGVFDGGKRTLSLPVDAQVEVPIIEKFT
AQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPGAEVEVWQILDRYKIIRVKG
> title text for sequence of template (supplied PDB) to follow
qlmdmrd AQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPGAEVEVWQILDRYKIIRVKG qlmdmrd
and my_alignment.ali (your .ali file) looks like:
> title text for sequence of target (your structure) to follow
LVLKWVMSTKYVEAGELKEGSYVVIDGEPCRVVEIEKSKTGKHGSAKARIVAVGVFDGGK
RTLSLPVDAQVEVPIIEKFTAQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPG
AEVEVWQILDRYKIIRVKG-------
> title text for sequence of template (supplied PDB) to follow
------------------------------------------------------------
-------------QLMDMRDAQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPG
AEVEVWQILDRYKIIRVKGQLMDMRD
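Because the target and template records in an aligned file like my_alignment.ali must have the same number of characters (gaps included), a quick length check can catch a damaged alignment before a long run. This is a sketch with a hypothetical function name, check_fasta_alignment; it assumes exactly two records, each starting with a > title line, and it ignores line breaks and blanks inside the sequences.

```shell
#!/bin/sh
# Hypothetical check: a two-record, FASTA-style alignment file must have
# target and template sequences of equal length (dashes count as characters).
check_fasta_alignment() {
  awk '
    /^>/ { n++; next }                                # start of a new record
    { gsub(/[ \t\r]/, ""); len[n] += length($0) }     # add up wrapped lines
    END {
      if (n != 2) { print "expected 2 sequences, found " n; exit 1 }
      if (len[1] != len[2]) { print "length mismatch: " len[1] " vs " len[2]; exit 1 }
      print "ok " len[1]
    }' "$1"
}
```

For example, check_fasta_alignment my_alignment.ali prints "ok" followed by the aligned length, or a message and a non-zero exit status.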
You have two options for alignment files if you are going to use one.
- You can use an alignment file that sculptor can recognize. This file looks like this (there must be exactly the same number of characters in the target
and the template sequences, including dashes for gaps):
> title text for sequence of target (your structure) to follow
VDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDD
> title text for sequence of template (supplied PDB) to follow
-AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDG
-
Alternatively, you can use a second format for the alignment file for
mr_rosetta. This file is different from the alignment file for
sculptor or mr_model_preparation; it is the Grishin-style alignment format
that Rosetta reads (the sample scripts above read it with -cm:aln_format grishin).
Here is a sample:
## 1CRB_ 2qo4_A
# hhsearch
scores_from_program: 0 1.00
1 VDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDD
0 -AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDG
--
Here is what has to be on each line:
-
Line 1: two ## signs, then the target PDB ID, then the template PDB ID. NOTE: the template
PDB ID must match the starting characters of your input
search model file names (the file names themselves, not
including the path to them).
-
Line 2 just has a # sign and the word hhsearch.
-
Line 3 just has some text like scores_from_program: 0 1.00
-
Line 4 has a number, then the entire sequence of
the structure to be solved (the target), all on one line. The number
is how many residues at the N-terminus of this sequence are to be ignored
in generating a model. Usually this is 0, but if you supply a sequence that
is not what is in your crystal, it could have some other number.
If you are supplying a template PDB file that has residues to be removed,
indicate these positions with a dash (-) in your sequence. Note: the
sequence on this line cannot start with a dash.
-
Line 5 has a number, then the matching sequence of the template PDB, using
dashes (-) to indicate residues that are not present in the template PDB.
There must be exactly the same number of characters in the sequence of your
target and the sequence of your template. The number is how many residues
at the N-terminus of your template PDB are to be ignored. If you have fully
edited your template PDB to match the target sequence, the number will be 0.
-
Line 6 has two dashes: --
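The six-line rules above can be checked mechanically. The sketch below, using a hypothetical function check_grishin_alignment, tests the line-by-line layout, that the target sequence does not start with a dash, and that lines 4 and 5 carry sequences of equal length. It does not check that the template ID matches your search model file names, and it is not the parser Rosetta itself uses (the sample scripts read this format with -cm:aln_format grishin).

```shell
#!/bin/sh
# Hypothetical validator for the six-line alignment format described above.
check_grishin_alignment() {
  awk '
    NR == 1 && !/^## [^ ]+ [^ ]+/       { print "line 1: expected ## target template"; bad = 1 }
    NR == 2 && !/^#/                    { print "line 2: expected # hhsearch"; bad = 1 }
    NR == 3 && !/^scores_from_program:/ { print "line 3: expected scores_from_program:"; bad = 1 }
    NR == 4 || NR == 5 {
      # Each sequence line is "<number> <sequence>"
      if (NF != 2 || $1 !~ /^[0-9]+$/) { print "line " NR ": expected <number> <sequence>"; bad = 1 }
      seq[NR] = $2
    }
    NR == 6 && $0 != "--" { print "line 6: expected --"; bad = 1 }
    END {
      if (NR < 6) { print "expected at least 6 lines, found " NR; exit 1 }
      if (seq[4] ~ /^-/) { print "line 4: target sequence cannot start with -"; bad = 1 }
      if (length(seq[4]) != length(seq[5])) { print "lines 4-5: sequence lengths differ"; bad = 1 }
      if (bad) exit 1
      print "ok"
    }' "$1"
}
```

Run against the sample shown above, the check should print "ok"; any problem is reported by line number with a non-zero exit status.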
Output files from mr_rosetta
The output files from mr_rosetta are the same as those from phenix.autobuild:
a model and map coefficients. These will be in a subdirectory listed at the
end of your log file. The files will be something like:
MR_ROSETTA_1/..../AutoBuild_run_1_/overall_best.pdb
and
MR_ROSETTA_1/..../AutoBuild_run_1_/overall_best_denmod_map_coeffs.mtz.
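Since the exact subdirectory (shown above as ....) depends on the run, one way to locate the final files is to search the run directory with find. The function name find_autobuild_outputs is hypothetical, and the sketch assumes the final files keep the names overall_best.pdb and overall_best_denmod_map_coeffs.mtz inside an AutoBuild_run_* directory, as shown above.

```shell
#!/bin/sh
# Hypothetical helper: list final AutoBuild outputs anywhere below a run directory.
find_autobuild_outputs() {
  top=${1:-MR_ROSETTA_1}   # top-level mr_rosetta directory
  # Only files at any depth below an AutoBuild_run_* directory are reported
  find "$top" -type f -path '*/AutoBuild_run_*/*' \
    \( -name overall_best.pdb -o -name overall_best_denmod_map_coeffs.mtz \) |
    sort
}
```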
Graphical interface
A GUI for MR-Rosetta is now available in the "Molecular replacement" category.
Its function is essentially identical to that of the command-line version,
but many of the details are not shown by default. In addition to the methods
described above for configuring your system to use Rosetta from PHENIX, the
GUI also includes a preferences setting in the "Wizards" section for defining
the path to the Rosetta installation. If the GUI does not detect that you
have the environment set up correctly, it will issue a warning when started.
The configuration tab in the GUI
includes a list into which any combination of input files may be added; the
file types should be recognized (and any relevant data they contain extracted,
such as space group and MTZ label information) automatically. For more
complex inputs involving fragment files, click the button labeled "Other
inputs" below the list of files.
The number of
processors to use will be set to one fewer than the number of CPU
cores PHENIX thinks are available, but if you are using a queueing system this
number can be increased. You can change how MR-Rosetta runs child processes
by clicking the "Job control" button in the lower left-hand corner of the
configuration tab.
Because it usually takes hours to run, MR-Rosetta will always be launched by
the GUI as a "detached" job, meaning that you can close the GUI without
killing the process, and resume it later.
While MR-Rosetta is running, the current set of solutions will be continuously
updated in a tab labeled "Current results", with the relevant score (LLG from
Phaser, Rosetta score, or R-factor). You may view any of these solutions
by clicking the buttons next to them.
Once the job is complete, a simple summary tab will be displayed, listing
the output files and basic statistics such as R-factors. If the program was
successful, the R-free will usually be below 50%, although this may vary
depending on resolution and data quality. Buttons are provided to start
additional programs or view the model and maps.
Parameters files in mr_rosetta
When you run mr_rosetta it will write out a
mr_rosetta_params.eff parameter file that can be used to
re-run mr_rosetta (just as for essentially all PHENIX methods).
Examples
Standard run of mr_rosetta:
Before you run mr_rosetta, you need to get fragment files from the
Robetta server (see Setting up for a run of mr_rosetta, part A, above).
Then you need an hhr alignment information file from the hhpred server
(see Setting up for a run of mr_rosetta, part B, above), or else
a search model and an alignment file to go with it.
Once you have these files, running mr_rosetta is easy.
If you have a search model (coords1.pdb) and an alignment file for it
(coords1.ali), fragment files test3.gz and test9.gz, a sequence file
seq.dat, and a data file coords1.mtz with FP, SIGFP and FreeR_flag, you can type:
phenix.mr_rosetta \
seq_file=seq.dat \
data=coords1.mtz \
alignment_files=coords1.ali \
search_models=coords1.pdb \
already_placed=False \
fragment_files=test3.gz \
fragment_files=test9.gz \
rescore_mr.relax=False \
rosetta_models=20 \
ncs_copies=2 \
space_group=p212121 \
use_all_plausible_sg=False \
nproc=200 \
group_run_command=qsub
and mr_rosetta will run automatically, generating 20 rosetta models during
structure determination.
If you have an hhr alignment information file, you can specify it with
hhr_files=myhhpred.hhr instead of using search_models and alignment_files.
You can then tell mr_rosetta how many of the PDB entries listed in it to use
with read_hhpred.number_of_models=1 (to use just the best one, for example).
Running mr_rosetta with a model that is already placed in the unit cell
You can also run mr_rosetta purely as a model-building tool. This is convenient
if you have found an MR solution but cannot rebuild it successfully. Here
is an example; the keyword to use is already_placed=True:
phenix.mr_rosetta \
seq_file=seq.dat \
data=coords1.mtz \
search_models=coords1.pdb \
already_placed=True \
fragment_files=test3.gz \
fragment_files=test9.gz \
rescore_mr.relax=False \
rosetta_models=20 \
ncs_copies=2 \
space_group=p212121 \
use_all_plausible_sg=False \
nproc=200 \
group_run_command=qsub
Rebuilding your model with Rosetta before MR
If your search model is too distant to find a molecular replacement solution, you
can prerefine your model with Rosetta before carrying out molecular replacement.
Here is an example. The keyword to use is: run_prerefine=True.
NOTE 1: It is best to specify the number of ncs_copies if you use
run_prerefine. If you do not, then you may end up running several parallel
jobs, each of which is independently carrying out prerefinement on the
same input model (to be used later with different numbers of ncs copies).
Once you have run your job with one value of ncs_copies, you can just use
the best prerefined model from that job as a search model in your other
runs.
phenix.mr_rosetta \
seq_file=seq.dat \
data=coords1.mtz \
search_models=coords1.pdb \
run_prerefine=True \
number_of_prerefine_models=1000 \
fragment_files=test3.gz \
fragment_files=test9.gz \
rescore_mr.relax=False \
rosetta_models=20 \
ncs_copies=2 \
space_group=p212121 \
use_all_plausible_sg=False \
nproc=200 \
group_run_command=qsub
NOTE 2: If you have a model and only want to run pre-refinement and
nothing else, you can do so without any data:
phenix.mr_rosetta \
seq_file=seq.dat \
search_models=coords1.pdb \
run_prerefine=True \
number_of_prerefine_models=1000
Your pre-refined model(s) will be listed in
MR_ROSETTA_1/GROUP_OF_PLACE_MODEL_1/RUN_FILE_1.log
and you can pick the best of these (most negative score, listed first).
Running mr_rosetta from a homology search (with an hhr file)
If you have run hhpred and obtained a .hhr file with a list of alignments
of proteins in the PDB with your sequence, you can run starting from your
sequence file and this .hhr file.
Here is an example. The keyword to use is: hhr_files=my_hhr_file.hhr.
phenix.mr_rosetta \
seq_file=bfr258e.fasta \
data=bfr258e_data.mtz \
hhr_files=bfr258e.hhr \
read_hhpred.number_of_models=1 \
read_hhpred.number_of_models_to_skip=0 \
fragment_files=aabfr__03_05.200_v1_3.gz \
fragment_files=aabfr__09_05.200_v1_3.gz \
rescore_mr.relax=False \
rosetta_models=20 \
ncs_copies=1 \
nproc=200 \
group_run_command=qsub
NOTE: it is generally a good idea to run several separate mr_rosetta jobs,
one for each homology model you want to extract from the PDB, and possibly
also separately for each possible number of NCS copies. You can do this by
adjusting the "read_hhpred.number_of_models_to_skip" from 0 to N and the
value of "ncs_copies" in the script above. In this way, you
can simply pick the first job that gives you a good solution. If you instead
process all templates and NCS choices in a single mr_rosetta job, every
step waits for the slowest of them to finish. If there are multiple NCS
copies and some search models are poor, this can take a very long time.
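The separate runs suggested in the NOTE above can be set up with a small shell script. It starts one job per template and NCS choice, each in its own directory so their output stays apart. This is a sketch, not part of PHENIX. The directory names (skip0_ncs1 and so on), the choice of three templates and of 1 or 2 NCS copies, the DRY_RUN switch, and running each job in the background with nohup are all assumptions. The file names and the other keywords are taken from the hhr example above. By default the script only prints the commands.

```sh
# Start (or just print) one mr_rosetta job per hhpred template and NCS count.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really launch the jobs.
launch() {    # usage: launch <templates_to_skip> <ncs_copies>
  _skip=$1
  _ncs=$2
  _dir="skip${_skip}_ncs${_ncs}"      # hypothetical directory naming
  # Paths start with ../ because each job runs inside its own directory.
  set -- phenix.mr_rosetta \
    seq_file=../bfr258e.fasta \
    data=../bfr258e_data.mtz \
    hhr_files=../bfr258e.hhr \
    read_hhpred.number_of_models=1 \
    read_hhpred.number_of_models_to_skip="$_skip" \
    fragment_files=../aabfr__03_05.200_v1_3.gz \
    fragment_files=../aabfr__09_05.200_v1_3.gz \
    rescore_mr.relax=False \
    rosetta_models=20 \
    ncs_copies="$_ncs" \
    nproc=200 \
    group_run_command=qsub
  if [ "${DRY_RUN:-1}" != 0 ]; then
    echo "($_dir) $*"
    return 0
  fi
  mkdir -p "$_dir" || return 1
  # Background each job so that all combinations proceed independently.
  ( cd "$_dir" && nohup "$@" > mr_rosetta.out 2>&1 ) &
}

launch_all() {
  for skip in 0 1 2; do       # the three best templates in the .hhr file
    for ncs in 1 2; do        # candidate numbers of NCS copies
      launch "$skip" "$ncs"
    done
  done
}

launch_all
```

Run the script once as is to check the commands, then run it with DRY_RUN=0 to submit them. Each job here asks for nproc=200, so reduce that if your queue cannot take all six jobs at once.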
Getting a default parameters file for mr_rosetta:
Usually you will want to edit a parameters file so that you can specify more
details of the run. You can get a default parameters file with:
phenix.mr_rosetta
and then just edit that file.
Testing mr_rosetta:
You can do a test of mr_rosetta to make sure everything is ok with:
phenix_regression.wizards.test_command_line_rosetta_quick_tests
Possible Problems
Debugging problems with running mr_rosetta:
If mr_rosetta fails, the first thing (after just checking the commands you
used) is to run the mr_rosetta regression tests to make sure that the
installations of phenix and rosetta are both ok:
phenix_regression.wizards.test_command_line_rosetta_quick_tests
That should take 10-20 minutes to run and say "OK" for all the tests.
If one or more of them instead say "FAILED", go into the directory of the
failed test (for example, test_autobuild/) and run the script there
(e.g., ./test_autobuild.com). It should fail again, so that
you can track down what is not working.
If all the tests are OK, then the problem is most likely specific to
your data or script.
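You may want to keep a record of the regression test output and find failures quickly. A minimal sketch is to save the output with tee and search it for "FAILED". The function names and the log file name rosetta_quick_tests.log are our own choices. The sketch assumes only what is stated above: each test reports either "OK" or "FAILED".

```sh
# Run the quick Rosetta regression tests and keep a copy of all output.
run_quick_tests() {
  phenix_regression.wizards.test_command_line_rosetta_quick_tests 2>&1 |
    tee rosetta_quick_tests.log
}

# Print each line of a test log that mentions FAILED, with its line number.
# Returns 1 if any were found, 0 otherwise.
report_failures() {
  if grep -n 'FAILED' "$1"; then
    echo "At least one test FAILED; see $1" >&2
    return 1
  fi
  echo "No FAILED tests in $1"
}
```

For example, run run_quick_tests and then report_failures rosetta_quick_tests.log.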
The best way to debug this is to go to the last sub-process that has
failed or hung and look at the log file, and possibly re-run that step
from the terminal. Here is how to get there:
- In your main log file, the last lines will be something like:
Starting job 1...Log will be: /net/omega/raid1/scratch1/terwillMR_ROSETTA_2/GROUP_OF_PLACE_MODEL_1/RUN_FILE_1.log
- This log file may in turn say that further jobs were submitted. If so,
go to the end of that log file, find the name of the next log file, and so
on, until you reach the very last step that was run.
- Your last run is in the directory where RUN_FILE_1.log is located.
There will be the following files (more if there are lots of runs in
this directory of course):
terwill@sigma> cd MR_ROSETTA_2/GROUP_OF_PLACE_MODEL_1/
terwill@sigma> ls -tlr
total 60
-rwx------ 1 terwill lanl 1495 Feb 5 14:54 RUN_FILE_1.sh*
-rwx------ 1 terwill lanl 282 Feb 5 14:54 RUN_FILE_1*
-rw-r--r-- 1 terwill lanl 6431 Feb 5 14:54 PARAMS_1.eff
-rw-r--r-- 1 terwill lanl 6564 Feb 5 14:54 mr_rosetta_params.eff
-rw-r--r-- 1 terwill lanl 130 Feb 5 14:54 INFO_FILE_1
drwxr-xr-x 6 terwill lanl 4096 Feb 5 16:44 RUN_1/
-rw-r--r-- 1 terwill lanl 21575 Feb 5 16:45 RUN_FILE_1.log
-rw-r--r-- 1 terwill lanl 51 Feb 5 16:46 JOBS_RUNNING
Here:
- PARAMS_1.eff are the parameters used in the run
- RUN_FILE_1.sh actually runs the job (e.g., phenix.mr_rosetta PARAMS_1.eff)
NOTE: usually this is mr_rosetta but it could also be another
routine, so you do have to look at it or the first line of PARAMS_1.eff
which will name the routine used.
- RUN_FILE_1.log is the log file for this run. Look at the end of this file.
- The job is run in RUN_1/
The key here is that you can type
phenix.mr_rosetta PARAMS_1.eff
and the exact same job that failed or ran will be run again. You can use
this to debug what is going on.
- Look at the log file RUN_FILE_1.log and the files in RUN_1/.
Note which file was written last in RUN_1/; this may give a
clue as to when and where the problem occurred. Usually there will be an
error message in RUN_FILE_1.log that may be informative.
- If the run in question is a Rosetta job, the actual Rosetta job is run in a subdirectory of RUN_1/, in a directory like:
MR_ROSETTA_2/GROUP_OF_ROSETTA_REBUILD_1/RUN_1/REBUILD_IN_SETS_1/RUN_5/WORK_1
This is the working directory for run 5 of set 1, inside RUN_1 of a group
of Rosetta models. In this directory you will find something like:
terwill@sigma> cd WORK_1/
terwill@sigma> ls -tlr
total 684
-rw-r--r-- 1 terwill lanl 1475 Feb 5 16:48 rebuild.flags
-rwxr-xr-x 1 terwill lanl 304 Feb 5 16:48 run_rebuild.sh*
-rw-r--r-- 1 terwill lanl 422921 Feb 5 17:26 S_3DZB__0001.pdb
-rw-r--r-- 1 terwill lanl 665 Feb 5 17:26 score.sc
-rw-r--r-- 1 terwill lanl 97437 Feb 5 17:26 rebuild.log
-rw-r--r-- 1 terwill lanl 158717 Feb 5 18:17 S_3DZB__0001_ed.pdb
Here:
- rebuild.flags are the commands to Rosetta
- run_rebuild.sh is a command file to run Rosetta with rebuild.flags
- rebuild.log is the log file
You can look at the log file and see if there are any messages. Then you can rerun the Rosetta job in a scratch directory with:
mkdir junk
cd junk
../run_rebuild.sh
With luck, you will get the same errors and you can debug from there by
changing the parameters or input files in rebuild.flags to see what
was causing the problems.
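Following the chain of log files by hand can be tedious when jobs are nested several levels deep. The sketch below automates it under a few assumptions. Each parent log must name its sub-job log on a line containing "Log will be:", as in the example above. Those paths must be absolute, or relative to the directory you run the sketch from. The most recently started job (the last such line) is assumed to be the one you want. If several sub-jobs ran in parallel, you still need to check the others by hand. The functions last_log and show_last_run are hypothetical helpers, not PHENIX commands.

```sh
# Follow "Log will be: <path>" lines from a starting log down to the
# most recently started sub-job log, and print that log's path.
last_log() {
  log=$1
  n=0
  while [ "$n" -lt 100 ]; do          # guard against accidental cycles
    next=$(grep 'Log will be:' "$log" | tail -n 1 |
      sed -e 's/.*Log will be:[[:space:]]*//' -e 's/[[:space:]]*$//')
    # Stop when there is no further log, or it does not exist (yet).
    if [ -z "$next" ] || [ ! -f "$next" ] || [ "$next" = "$log" ]; then
      break
    fi
    log=$next
    n=$((n + 1))
  done
  printf '%s\n' "$log"
}

# Show the last log file, its run directory, and the end of the log.
show_last_run() {
  log=$(last_log "$1")
  echo "Last log file: $log"
  echo "Run directory: $(dirname "$log")"
  tail -n 20 "$log"
}
```

For example, show_last_run mr_rosetta.log reports the last log, its directory and its final lines (mr_rosetta.log is the default name of the main log file). From that directory, you can re-run the step from its PARAMS file as described above.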
Specific limitations and problems:
mr_rosetta does not have the full flexibility of autobuild,
so you may want to get a nearly-complete model with mr_rosetta and then
use autobuild to increase the completeness and quality.
You may also want to take the output of mr_rosetta and then put it back in
as input to mr_rosetta and re-run it to improve your model.
File names of PDB files for mr_rosetta need to have at least 4 characters
before the .pdb. So test.pdb is fine, but my.pdb is not.
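Because short file names cause problems, it can be worth checking names before starting a long run. Below is a minimal sketch. The function name check_pdb_names is hypothetical. It checks only the 4-character rule stated above, counting the characters of the file name itself without its directory or the .pdb extension.

```sh
# Report PDB files whose names have fewer than 4 characters before ".pdb".
# Returns 1 if any such file is found, 0 otherwise.
check_pdb_names() {
  status=0
  for f in "$@"; do
    stem=$(basename "$f" .pdb)        # drop the directory and the .pdb suffix
    if [ "${#stem}" -lt 4 ]; then
      echo "name too short for mr_rosetta: $f" >&2
      status=1
    fi
  done
  return $status
}
```

For example, check_pdb_names coords1.pdb my.pdb reports my.pdb. Copying that file to a longer name (cp my.pdb my_model.pdb) avoids the problem.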
Literature
- Frank DiMaio, Thomas C. Terwilliger, Randy J. Read, Alexander Wlodawer, Gustav
Oberdorfer, Eugene Valkov, Assaf Alon, Deborah Fass, Herbert L. Axelrod,
Debanu Das, Sergey M. Vorobiev, Hideo Iwai, P. Raj Pokkuluri & David Baker
(2011) "Increasing the Radius of Convergence of Molecular Replacement by
Density and Energy Guided Protein Structure Optimization"
Nature, 473, 540-543.
- DiMaio, F., Tyka, M.D., Baker, M.L., Chiu, W., Baker, D. (2009).
"Refinement of Protein Structures into Low-Resolution Density Maps
Using Rosetta" J. Mol. Biol. 392, 181-190.
Additional information
List of all mr_rosetta keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
mr_rosetta
input_files
seq_file= None File with 1-letter code sequence of molecule. Chains
separated by blank line or greater-than sign
hhr_files= None Optional HHR analysis file from hhpred. This file
contains PDB entries similar in sequence to your target and
sequence alignments. It is used to create a list of search
models and alignment files. If you supply this file you do
not need to specify alignment files or search models. To
obtain this file go to:
http://toolkit.tuebingen.mpg.de/hhpred and paste in your
sequence. Run it twice, once with default parameters and once
with alignment mode=global. Download the '.hhr' output file
each time and enter them as your hhr_files. NOTE: If your
model has more than one chain type then you cannot use an hhr
file to start the analysis. Instead you will need to use
phenix.mr_model_preparation and phenix.automr to create your
model; then you can start phenix.mr_rosetta with
already_placed=True
alignment_files= None Alignment file. Supply a list if you have a list
of search models, with alignment_files=model_1.ali
alignment_files=model_2.ali etc. NOTE 1: Not needed if
you supply an hhr alignment analysis file. NOTE 2: Not
needed if your search model has the same sequence as
your sequence file. NOTE 3: If your model has more than
one unique chain, your alignment file should just be
the contents of several single alignment files, one
after the other in a single file. NOTE 4: The alignment
file format (.ali) looks like this:
Line 1: target PDB ID, then template PDB ID. NOTE: the
template PDB ID must match the first 5 characters of
your input search model file names (the file names
themselves, not including the path to them)
Lines 2, 3: keep as is
Line 4: OFFSET, then the entire sequence of the target PDB
Line 5: OFFSET, then the matching sequence of the template PDB
Line 6: keep as is (--)
NOTE: OFFSET is the residue position, starting with ZERO,
where the alignment starts. For line 4 (target PDB
sequence), take the number from the hhpred output and
subtract 1. For line 5 (template PDB sequence), it
should always be 0. Example:
## 1CRB_ 2qo4_A
# hhsearch
scores_from_program: 0 1.00
1 VDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDELHLEMRAEGVTCKQVFKKV
0 -AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDGKKLKCIVKLDGGKLVCRT----DRFSHIQEIKAGEMVETLTVGGTTMIRKSKKI
--
model_info_file= None Pickled file containing information about starting
model
data= None Data file with experimental data ( FP SIGFP or I SIGI ).
NOTE: May optionally instead contain F SIGF FreeR_flag for
backward compatibility
data_labels= None Optional labels for experimental data. Normally these
would be something like I,SIGI or F,SIGF
free_r_data= None Optional data file with free_r flags. By default this
is the same file as used for experimental data.
free_r_labels= None Optional labels for free_r flags. Normally these
would be something like FreeR_flags or R_free_flags
labin= None Optional Labin line for file with data. This is present for
backward compatibility only. Normally use data_labels instead.
Only allowed if the data file contains myFP, mySIGFP and
myFreeR_flag: LABIN FP=myFP SIGFP=mySIGFP FreeR_flag=myFreeR_flag
search_models= None Search model PDB file. Not needed if you supply an
hhr alignment analysis file. You can supply several with
search_models=model_1.pdb search_models=model_2.pdb (and
a matching list of alignment files). NOTE: If you supply
a search model that contains more than one chain, then
the entire search model (all chains) will be used. If
your search model has multiple chains and you specify
model_already_placed=False (i.e., run MR) then MR will be
run with your entire search model as a single search
model. If you need to run MR on parts of your search
model, or if you need to combine several search models,
then you will need to run automr first, then take the
resulting model and use it as a search_model for
mr_rosetta, specifying model_already_placed=True.
copies_in_search_models= None You can specify how many ncs copies are in
each search model (Not usually necessary, used
to skip combinations of ncs_copies and search
models that are implausible)
mr_rosetta_solutions= None You can read in prior solutions (.pkl files).
Then you can skip steps and it will pick up where
it left off. You can load just some solutions with
the keyword ids_to_load
ids_to_load= None You can restrict the load of mr_rosetta solutions to
just one or more IDs
map_coeffs= None Data file (mtz format) with map coeffs (FP PHIB FOM)
for rosetta electron density. Normally use None; this is used
in iterations of mr_rosetta to pass map coeffs from
autobuild to the next cycle, where they are used instead of
refinement or phaser map coeffs. If used, you should also set
run_refine_top_models=False. Always used with
labin_map_coeffs and map
labin_map_coeffs= None Labin line for map coeffs file, something like
FP=FP PHIB=PHIM FOM=FOMM. Normally use None; used by
mr_rosetta in iteration
map= None Map file (ccp4 format) with map coeffs for rosetta electron
density. This is expected if map_coeffs is specified. Normally this
map should not contain freeR reflection information. Normally use
None; this is used in iterations of mr_rosetta to pass map from
autobuild to the next cycle where it is used instead of refinement
or phaser map
refinement_params= None You can specify a parameters file for refinement
display_solutions= False You can display any solutions in
mr_rosetta_solutions and select some of them with
ids_to_load
fragment_files= None Fragment files (normally required for Rosetta
rebuilding if your template has any gaps to fill). Enter
them one at a time (e.g., fragment_files=myfragments3.gz
fragment_files=myfragments9.gz). To obtain the two
required files, go to:
http://robetta.bakerlab.org/fragmentsubmit.jsp and
register, then submit your sequence file, and when the
server finishes, download the two files '...9...gz' and
'...3....gz'. These are your fragment files of length 9
and 3. NOTE: if your molecule has multiple chains, use
fragment_files_chain_list,
fragment_files_3_mer_by_chain, and
fragment_files_9_mer_by_chain instead.
fragment_files_chain_list= None If your molecule has multiple chains,
use fragment_files_chain_list,
fragment_files_3_mer_by_chain, and
fragment_files_9_mer_by_chain instead of
fragment_files. Use fragment_files_chain_list
to define which chain ID each of your
fragment_files_3_mer_by_chain and
fragment_files_9_mer_by_chain go with. NOTE:
You only need one set of fragments files for
each UNIQUE chain. So if chains A and C are
the same, you just need to specify fragments
for chain A. If you have two different chains
A and B and fragment files frag_A_3 frag_A_9
frag_B_3 frag_B_9 then you should use:
fragment_files_chain_list=A
fragment_files_chain_list=B
fragment_files_3_mer_by_chain=frag_A_3
fragment_files_9_mer_by_chain=frag_A_9
fragment_files_3_mer_by_chain=frag_B_3
fragment_files_9_mer_by_chain=frag_B_9
fragment_files_9_mer_by_chain= None See fragment_files_chain_list.
fragment_files_3_mer_by_chain= None See fragment_files_chain_list.
use_dummy_fragment_files= False You can use dummy fragment files (this
is ok if your template matches your sequence
file already). If True, then you do not need
to supply fragment files
sort_fragment_files= True Sort the fragment files by name so that 9 is
first then 3
output_files
log= mr_rosetta.log Output log file
params_out= mr_rosetta_params.eff Parameters file to rerun mr_rosetta
directories
temp_dir= "" Optional temporary work directory
workdir= "" Optional work directory. Base path for all work
output_dir= "" Output directory where files are to be written
gui_output_dir= None Output directory for PHENIX GUI. Not used when run
from the command line.
top_output_dir= None Output directory for entire set of runs
rosetta_path= "" Location of rosetta directories. All rosetta files are
located relative to this path. You can set the environment
variable 'PHENIX_ROSETTA_PATH' to indicate where rosetta
is to be found. In csh/tcsh, use something like:
setenv PHENIX_ROSETTA_PATH /Users/Shared/unix/rosetta
In bash/sh, use:
export PHENIX_ROSETTA_PATH=/Users/Shared/unix/rosetta
rosetta_binary_dir= "rosetta_source/bin" Directory with rosetta binaries
for mr_rosetta. Path is relative to rosetta_path
rosetta_binary_name= "mr_protocols.default" Name of rosetta binary. Path
is relative to rosetta_path+rosetta_binary_dir.
NOTE: any suffixes such as '.default',
'.macosgccrelease', '.linuxrelease' are ignored
rosetta_script_dir= "rosetta_source/src/apps/public/electron_density"
Directory with rosetta scripts for mr_rosetta. Path is
relative to rosetta_path
rosetta_pilot_script_dir= "rosetta_source/src/apps/pilot/frank/"
Directory with development rosetta scripts for
mr_rosetta. Path is relative to rosetta_path
rosetta_database_dir= "rosetta_database" Location of rosetta database.
Path is relative to rosetta_path
read_hhpred
number_of_models= 5 Take the first number_of_models models from the
hhpred similarity analysis that are specified with
hhr_files. (this will give you number_of_models models
for each hhr file)
number_of_models_to_skip= 0 Skip the first number_of_models_to_skip
models (most similar) in hhpred file (Useful
along with number_of_models to pick any one or
group of templates from your hhpred file)
copies_to_extract= None Number of copies of the unique chain defined in
your hhr_file to extract from the template PDB file
(if possible). You can specify more than one value:
copies_to_extract='1 2 4' will try to run MR with a
monomer, dimer, and tetramer from each template PDB
file (if available). If None, then the values used
will be: 1, ncs_copies, and all other divisors of
ncs_copies, so if ncs_copies=6, the values will be 1,
2, 3, and 6. Note: if ncs_copies is also None and the
number of copies that can fit in the cell is large,
then this can lead to a lot of different combinations
being tried.
only_extract_proper_symmetry= False Only extract groups of copies from
template that form proper symmetry (i.e.,
do not extract 2 molecules from a trimer).
Note: not implemented. All are currently
extracted.
place_model
run_place_model= True Run place_model: use AutoMR or place existing
model. Each model will be used to generate
number_of_prerefine_models. These will be grouped in
batches of number_of_models_in_ensembles, with up to
ensembles_to_create ensembles. An ensemble can be a
single model or group of models. Note: this can take a
lot of CPU time. You might want to only do this on a
single input model
model_already_placed= False Use model_already_placed to indicate that
your model is already placed in the correct
location
model_already_aligned= False Use model_already_aligned to indicate that
your model is already edited to match your
sequence
number_of_output_models= 5 Number of Phaser molecular replacement models
to consider
align_with_sculptor= True Use phenix.sculptor and
phenix.mr_model_preparation to apply alignments and
edit templates (alternative is to use Rosetta
scripts).
identity= None Percent identity between search model and target. Normally
set automatically based on your alignment file
identity_for_scoring_only= 25 Percent identity between search model and
target to be used for LLG scoring. This is
normally a fixed value so that scores from
different templates can be compared.
use_all_plausible_sg= True Often you will want to search all space
groups with the same point group as you may not
know which is correct from your data.
overlap_allowed= 10 Solutions will be accepted by default if fewer than
10 percent of residues are involved in clashes. You can
choose to increase the percent clashes if the packing
is tight and your search molecule is not exactly the
same as the molecule in the cell.
selection_criteria_rot_value= 75 Choose a value for your criterion for
keeping rotation solutions at each stage.
Percent of Best Score: AutoMR looks down
the list of LLG scores and only keeps the
ones that differ from the mean by more
than the chosen percentage, compared to
the top solution.
fast_search_mode= True Run phaser with selection_criteria_rot_value and
then if no obvious solution, repeat with cutoff
lowered by search_down_percent
search_down_percent= 25 Used if fast_search_mode=True. Run phaser with
selection_criteria_rot_value and then if no obvious
solution, repeat with cutoff lowered by
search_down_percent
mr_resolution= 3.0 Resolution for molecular replacement
refine_after_mr= True Refine placed model for map calculation only
before rescoring and rebuilding. Required for
denmod_after_refine
denmod_after_refine= True After refinement, density-modify map before
rosetta scoring and rebuilding. Note:
denmod_after_refine appears separately in the
scopes place_model and refine_top_models so you
need to set it separately in each place
ps_in_rebuild= False You can choose to use a prime-and-switch resolve
map in map calculation/density modification in the
place_model step.
find_ncs_after_mr= True Find NCS in model after placing model if
ncs_copies is greater than 1
min_length_ncs= 10 Minimum chain length for NCS search
copies_of_search_model_to_place= None (Optional) number of copies of
search model to place with MR. This is
how many new copies of search model to
add to anything already present. By
default, calculated from ncs_copies,
copies in the search model, and number
of already-placed copies (defined with
fixed_model=myfixed_model.pdb). Note the
difference from ncs_copies, which is the
total number of copies in the asymmetric unit
prerefine Used if you need to improve your models before MR
run_prerefine= False Pre-refine models before MR
number_of_prerefine_models= 1000 Number of models to generate in
prerefinement
number_of_models_in_ensemble= 1 Number of top-scoring models to use
as an ensemble for MR. NOTE: Not
implemented: only one model used at
this point
fixed_ensembles If you already know the placement of one or more
molecules you can specify them as fixed ensembles. NOTE
1: you are specifying location and orientation of one or
more copies of the search model NOTE 2: you cannot
specify use_all_plausible_sg if you have fixed ensembles
fixed_ensembleID_list= None Enter the word 'ensemble_1' to indicate
that you want to specify a copy of your search
model that is to be fixed. To specify more
than one placement just say 'ensemble_1' more
than once. For example if you specify
fixed_ensembleID_list=ensemble_1 and do not
specify fixed_euler_list or fixed_frac_list,
it will be assumed that you have one copy of
your search model already placed (with the
input coordinates), and you are looking to
place an additional
copies_of_search_model_to_place copies of the
search model.
fixed_euler_list= 0.0 0.0 0.0 Enter Euler angles (from AutoMR or
Phaser) for fixed component. NOTE 2: you can enter
more than one fixed component if you want. If you
do, then enter fixed_euler_list in multiples of 3
numbers and also fixed_frac_list in multiples of 3
numbers.
fixed_frac_list= 0.0 0.0 0.0 Enter fractional offset (location) for
fixed component (from AutoMR or Phaser). NOTE 2: you
can enter more than one fixed
component if you want. If you do, then enter
fixed_euler_list in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_frac_list_is_fractional= True Normally fixed_frac_list is
fractional coordinates. You can say
fixed_frac_list_is_fractional=False to
instead use orthogonal angstroms to
specify the locations of your
ensembles.
rescore_mr
run_rescore_mr= True Rescore MR solutions, optionally by rosetta modeling
nstruct= 5 Number of models to build with rosetta in rescoring if
relax=True
relax= False Relax solution with rosetta modeling before rescoring NOTE
1: if you only have one solution to rescore (as in the case where
you supplied a placed model) you might want to say relax=False to
not bother to relax the model. NOTE 2: if your model has multiple
chain types then you have to use relax=False.
include_unrelaxed_in_scoring= False Include unrelaxed (original) model
in scoring
align= True Use alignment file in relax procedure
edit_model= False Edit model before rescoring using model_info_file
stage_to_rescore= mr_solution You can specify the stage of solutions to
consider for rescoring (i.e., mr_solution,
rosetta_solution) Default is mr_solution; during
scoring of rosetta solutions it is set automatically
to rosetta_solution
rosetta_rebuild
run_rosetta_rebuild= True Run rosetta modeling on best rescored MR
solutions
stage_to_rebuild= rescored_mr_solution Normally set automatically. You
can specify the stage of solutions to consider for
rebuilding (i.e., mr_solution, rosetta_solution)
Default is rescored_mr_solution
max_solutions_to_rebuild= 5 Keep all solutions with at least
llg_percent_of_max_to_keep, up to
max_solutions_to_rebuild, and at least
min_solutions_to_rebuild
min_solutions_to_rebuild= 1 Keep all solutions with at least
llg_percent_of_max_to_keep, up to
max_solutions_to_rebuild, and at least
min_solutions_to_rebuild
llg_percent_of_max_to_keep= 50 Keep all solutions with at least
llg_percent_of_max_to_keep, up to
max_solutions_to_rebuild, and at least
min_solutions_to_rebuild
rosetta_models= 100 Number of models to build with rosetta in rebuilding
chunk_size= 1 If background=False, divide the nstruct models into chunks
of chunk_size or smaller to keep the length of individual
jobs shorter
edit_model= True Edit model before use
superpose_model= False Superpose the rebuilt model on the original
model. (Restore original location and orientation as
much as possible.)
rosetta_rescore
run_rosetta_rescore= True Run Phaser rescoring on top rosetta rebuilt
models
percentage_to_rescore= 20 Rescore percentage_to_rescore of top rosetta
solutions with Phaser RNP LLG scoring
min_solutions_to_rescore= 2 Rescore at least percentage_to_rescore of
the rosetta models, and at least
min_solutions_to_rescore. Usually choose at
least 2 so that they can be compared
similarity
run_similarity= False Identify similarity of top solutions
required_cc= 0.20 Value of required_cc for number_of_required_cc top
solutions to carry on
number_of_required_cc= 5 Number of top solutions with CC of required_cc
or better to top solution required to carry on
refine_top_models
run_refine_top_models= True Refine models to get map before relaxing
them to get new map coeffs. Use unrefined model
in relaxation however
stage_to_refine= None You can specify the stage of solutions to consider
for rescoring (i.e., mr_solution, rosetta_solution)
Default is rosetta_solution; during scoring of rosetta
solutions
sort_score_type= None You can specify the scoring method for choosing
top models to refine
percent_to_refine= 20 Percentage of top models to refine
denmod_after_refine= True After refinement, density-modify map before
rosetta scoring and rebuilding. Note: this appears
separately in the scopes place_model and
refine_top_models so you need to set it separately
in each place
average_density_top_models
run_average_density_top_models= True Average density from top models
percent_to_average= 100 percentage of refined models to use in averaging
relax_top_models
run_relax_top_models= True Relax rosetta rebuilt solutions, scoring with
LLG
stage_to_relax= None You can specify the stage of solutions to consider
for relaxing (i.e., mr_solution, rosetta_solution)
Default is rescored_rosetta_solution; during
relax_top_models
number_to_relax= 2 Number of top models to relax
nstruct= 5 Number of rosetta relaxed models to build for each starting
model in relaxation (best will be chosen)
autobuild_top_models
run_autobuild_top_models= True Autobuild top relaxed_rosetta_solutions
number_to_autobuild= 2 Number of top models to autobuild
quick= False Use fewer cycles
phase_and_build= False Use phase_and_build to rebuild models instead of
autobuild (much faster, but not quite as good)
macro_cycles= None Number of overall cycles for phase_and_build
(macro_cycles) or autobuild (n_cycle_rebuild_max) Set by
default if None (recommended)
morph= False You can choose whether to distort your model in order to
match the current working map. This may be useful for MR models
that are quite distant from the correct structure. [See also
repeats_with_morph which will instead morph your model at the
beginning of each repeat cycle up to repeats_with_morph times.]
edit_model= True Edit model before rescoring using model_info_file
use_map_coeffs= True Use current best map as starting map coeffs in
autobuild (Only applies if phase_and_build=False)
setup_repeat_mr_rosetta
run_setup_repeat_mr_rosetta= True Set up for running mr_rosetta again
using results from current run. Must be run
before repeat_mr_rosetta
repeats= 1 Maximum repeats of running mr_rosetta. (Runs one cycle if
repeats=0)
template_repeats= 0 Number of repeat cycles in which to restart from the
template (the MR solution corresponding to best
current model) instead of continuing with best current
model. This may be useful for very poor starting
models. This is normally combined with morph_repeats
(i.e., morph the template at the end of one cycle and
use that in the next cycle)
morph_repeats= 0 Number of repeat cycles in which to morph the starting
model corresponding to the best current model instead of
continuing as is with the best current model. This may be
useful for very poor starting models. This is normally
combined with template_repeats (i.e., morph the template
at the end of one cycle and use that in the next cycle).
[See also morph, which instead applies morphing during
autobuilding.]
number_to_repeat= 1 Number of top models to re-run in mr_rosetta. Usually
this should be 1. If you are using condor, or set
one_subprocess_level=True, it must be 1.
acceptable_r= 0.25 Used to decide whether the model is acceptable enough
to quit if it is not improving much. A good value is 0.25
minimum_delta_r= None Used to decide whether the model is improving.
Skip additional cycles if the improvement in R since the last cycle
is less than minimum_delta_r.
repeat_mr_rosetta
run_repeat_mr_rosetta= True Run mr_rosetta again using results from
current run
copies_in_new_search_group= 1 Number of copies of the model to use to
create a search model that is to be placed
with MR on repeat cycles (if not all NCS
copies are found on the first cycle). If you
are searching with a dimer on the first
cycle, you might want to consider
copies_in_new_search_group=2.
update_map_coeffs_with_autobuild= True Update map coeffs only during
autobuild (not with refinement) on
cycles of iteration
rosetta_modeling
map_resolution= 3. Map resolution in rosetta modeling
map_grid_spacing= 1.5 Grid spacing in map in Rosetta rebuilding
map_weight= 1. Weighting on map in Rosetta rebuilding
map_window= 5 Smoothing distance (residues) for map scoring
include_solvation_energy= True Include the solvation energy term in Rosetta
modeling. If you are modeling a membrane
protein you may want to turn this off. Note:
if False, then a weights_file is created with
the specification fa_sol 0.0.
weights_file= None Optional weights file for Rosetta. If specified, this
will be used instead of the file
$PHENIX_ROSETTA_PATH/rosetta_database/scoring/weights/score12_full.wts
crystal_info
resolution= 0. high-resolution limit for map calculation
space_group= None You can specify the space group. If None then the
space group in your input data file or its inverse will be
used unless you specify use_all_plausible_sg=True
chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and
side-chain atoms)
ncs_copies= 1 Number of copies of unique sequence defined in sequence
file expected in the a.u. This is how many molecules there
are in the asymmetric unit. NOTE: you can specify more than
one value with ncs_copies='1 2 7', in which case each will
be tried. You can also specify None in which case all likely
values (those leading to solvent content from 0.35 to 0.65)
will be tried.
control
verbose= False Verbose output
debug= False Debugging output
raise_sorry= False Raise sorry if problems
dry_run= False Just read in and check parameter names
nproc= 1 Number of processors to use
group_run_command= "sh " Command to use to run multiple jobs. This may be
sh if you are using a single machine (where you might
set background=True) or something like 'qsub' or
'qsub -q all.q@theta' on a cluster (where you should
leave background=False).
queue_commands= None You can add any commands that need to be run for
your queueing system. For example, on a PBS system you
might say:
queue_commands='#PBS -N mr_rosetta'
queue_commands='#PBS -j oe'
queue_commands='#PBS -l walltime=03:00:00'
queue_commands='#PBS -l nodes=1:ppn=1'
condor= None Specifies if the group_run_command is submitting a job to a
condor cluster. Set by default to True if
group_run_command=condor_submit, otherwise False. For condor job
submission mr_rosetta uses a customized script with condor
commands. Also uses one_subprocess_level=True
one_subprocess_level= None Specifies that a subprocess cannot submit a
job
single_run_command= "sh " Command to use to run single jobs. Normally
this is sh.
last_process_is_local= True If true, run the last process in a group in
background with sh as part of the job that is
submitting jobs. This prevents having the job
that is submitting jobs sit and wait for all the
others while doing nothing
background= None Run in background. If None, automatically set to True
if nproc is greater than one and group_run_command is sh
ignore_errors_in_subprocess= True Generally use
ignore_errors_in_subprocess=True to ignore
errors in sub-processes. This allows you to
continue even if a few jobs crash. If all
jobs in a group crash, the process will
stop. NOTE: if a job hangs or never
runs, this will not be detected, and you
will have to either put a file with the
name FINISHED in the directory where the
job was to run (e.g.,
MR_ROSETTA_3/GROUP_OF_PLACE_MODEL_1/RUN_1/FINISHED)
or stop the whole job by putting a
file with the name STOPWIZARD in the main
run directory (e.g.,
MR_ROSETTA_3/STOPWIZARD).
check_run_command= False Try out the run command to make sure it works. Use
False if your queue may not be available at the
beginning of your run. Use True if you want to check
things out.
max_wait_time= 100 Maximum time (sec) to wait for a file to be written
(Useful for queues or nfs-mounted systems)
wait_between_submit_time= 1.0 You can specify the length of time
(seconds) to wait between each job that is
submitted when running sub-processes. This can
be helpful on NFS-mounted systems when running
with multiple processors to avoid file
conflicts. The symptom of too short a
wait_between_submit_time is an error of the
form "File exists: ...".
wizard_directory_number= None Directory number for MR_ROSETTA_xx.
Normally None except if called from GUI
n_dir_max= 100000 Maximum number of directories to create (must be as
big as nproc or nstruct/chunk)
number_to_print= 5 Number of entries to print in long lists
write_run_directory_to_file= None The working directory name is written
to this file
resolve_command_list= None You can supply any resolve command here for
autobuild. NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single
quotes (') like this:
resolve_command_list="'no_build' 'b_overall 23'"
start_point= *place_model rescore_mr rosetta_rebuild rosetta_rescore
similarity refine_top_models average_density_top_models
relax_top_models autobuild_top_models
setup_repeat_mr_rosetta repeat_mr_rosetta
You can specify what point to start at by supplying a
rosetta_solutions .pkl file and specifying a place to start.
stop_point= place_model rescore_mr rosetta_rebuild rosetta_rescore
similarity refine_top_models average_density_top_models
relax_top_models autobuild_top_models
setup_repeat_mr_rosetta repeat_mr_rosetta
You can specify a step to stop at (after completing this step).
For example, to carry out just place_model, you can say
start_point=place_model stop_point=place_model.
clean_up= False At the end of the autobuild runs the TEMP directories
will be removed if clean_up is True.
non_user_params
file_base= None String defining intermediate file names. Normally set
automatically. If given, must match the 3rd word on the first
line of the alignment file.
print_citations= True Print citation information at end of run
highest_id= 0 Start ID numbers with highest_id+1
is_sub_process= False identifies if this is a sub-process or top-level
job
dummy_autobuild= False Allows you to skip actual run of autobuild
dummy_refinement= False Allows you to run refinements but not to change
coordinates
dummy_rosetta= False Allows you to skip models from rosetta steps
prerefine_only= False Set internally to allow pre-refinement without data
skip_clash_guard= True Skip clash guard check in refinement
correct_special_position_tolerance= None Adjust tolerance for special
position check. If 0., then check
for clashes near special positions
is not carried out. This sometimes
allows phenix.refine to continue
even if an atom is near a special
position. If 1., then checks within
1 A of special positions. If None,
then uses phenix.refine default. (1)
comparison_mtz= None Allows you to compare results with an existing map
file
labin_comparison_mtz= None labin line for comparison mtz
write_local_files= False Used to create test pickle files only with
phenix.mr_rosetta mr_rosetta_solutions=results.pkl
display_solutions=True write_local_files=true
rosetta_fixed_seed= None Fixed seed for Rosetta, so that the same answer
is always obtained. Use for regression tests only.
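The crystal_info keyword ncs_copies=None tries every value that leads to a solvent content from 0.35 to 0.65. As a rough sketch of how such a list could be enumerated, assuming the standard Matthews relation for protein crystals (solvent fraction = 1 - 1.23/V_M, with V_M = cell volume / (symmetry operators x copies x molecular weight) in A^3/Da), the Python below lists candidate copy numbers. The function names and the max_copies cap are hypothetical, and this is not the code that mr_rosetta itself uses.

```python
def solvent_fraction(cell_volume, n_sym_ops, ncs_copies, mw_per_chain):
    """Matthews estimate of the solvent fraction for a protein crystal.

    cell_volume in A^3, mw_per_chain in Da. The constant 1.23 A^3/Da
    corresponds to a protein partial specific volume of about 0.74 cm^3/g.
    """
    vm = cell_volume / (n_sym_ops * ncs_copies * mw_per_chain)  # Matthews coefficient, A^3/Da
    return 1.0 - 1.23 / vm


def likely_ncs_copies(cell_volume, n_sym_ops, mw_per_chain,
                      low=0.35, high=0.65, max_copies=60):
    """Copy numbers whose estimated solvent content lies in [low, high].

    max_copies is an arbitrary (hypothetical) upper bound on the search.
    """
    return [n for n in range(1, max_copies + 1)
            if low <= solvent_fraction(cell_volume, n_sym_ops, n, mw_per_chain) <= high]


# P1 cell (1 symmetry operator) of 100000 A^3 with a 10 kDa chain
print(likely_ncs_copies(100000.0, 1, 10000.0))  # -> [3, 4, 5]
```

Giving an explicit list such as ncs_copies='1 2 7' skips any such estimate and simply tries each listed value.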
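The NOTE under ignore_errors_in_subprocess describes two manual ways out of a hung run: a file named FINISHED in the directory of a job that never completes, or a file named STOPWIZARD in the main run directory. A minimal Python sketch of creating these marker files follows. It assumes that only the presence of the file matters (the keyword text says nothing about its contents), and the helper names mark_job_finished and stop_whole_run are hypothetical. Running `touch` on the command line does the same job.

```python
import tempfile
from pathlib import Path


def mark_job_finished(job_dir):
    """Place an empty FINISHED file in a sub-job directory.

    Assumption: mr_rosetta only checks that the file exists.
    """
    job_dir = Path(job_dir)
    if not job_dir.is_dir():
        raise FileNotFoundError(f"no such job directory: {job_dir}")
    marker = job_dir / "FINISHED"
    marker.touch()
    return marker


def stop_whole_run(run_dir):
    """Place an empty STOPWIZARD file in the main MR_ROSETTA_xx directory."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"no such run directory: {run_dir}")
    marker = run_dir / "STOPWIZARD"
    marker.touch()
    return marker


if __name__ == "__main__":
    # Demonstration in a scratch directory that mimics the layout in the
    # keyword text; on a real run you would pass paths such as
    # "MR_ROSETTA_3/GROUP_OF_PLACE_MODEL_1/RUN_1" and "MR_ROSETTA_3".
    with tempfile.TemporaryDirectory() as scratch:
        run = Path(scratch) / "MR_ROSETTA_3"
        job = run / "GROUP_OF_PLACE_MODEL_1" / "RUN_1"
        job.mkdir(parents=True)
        print(mark_job_finished(job).relative_to(scratch))
        print(stop_whole_run(run).relative_to(scratch))
```

The directory checks are there so that a mistyped path fails loudly instead of silently creating a marker that mr_rosetta never looks at.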
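The quoting rule for resolve_command_list (double quotes around the whole set, single quotes around each command) is easy to get wrong. The sketch below builds that argument and then uses Python's standard shlex module to show what a POSIX shell would hand to the program as a single argument. The helper names are hypothetical, and the command line is illustrative only: a real run also needs the other mr_rosetta inputs.

```python
import shlex


def resolve_commands_value(commands):
    """Wrap each resolve command in single quotes and join them with spaces."""
    return " ".join(f"'{c}'" for c in commands)


def shell_argument(commands):
    """Command-line form: the whole set is additionally enclosed in double quotes."""
    return f'resolve_command_list="{resolve_commands_value(commands)}"'


cmds = ["no_build", "b_overall 23"]
line = "phenix.mr_rosetta " + shell_argument(cmds)
print(line)
# A POSIX shell strips the outer double quotes, so the program receives
# the whole keyword=value pair as one argument:
print(shlex.split(line))
```

If you launch mr_rosetta from Python with subprocess and a list of arguments, no shell is involved. In that case, pass the second element of the shlex.split result directly and omit the outer double quotes.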