Molecular replacement and autobuilding using Phaser, Rosetta, and autobuild with mr_rosetta

Author(s)

Purpose

mr_rosetta is a procedure for extending the range of molecular replacement by combining tools from the structure-modeling field (Rosetta) with crystallographic molecular replacement, model-building, density modification and refinement. The approach is described in DiMaio et al. (2011). It can also be used to rebuild a model with a combination of Rosetta and Phenix tools.

A key requirement for using mr_rosetta is a sequence alignment between your target protein and the protein used as a template to model it. You can try several different alignments, but your set must include a good one or the procedure is unlikely to succeed. The reason is that Rosetta homology modeling makes strong use of the sequence, so if your alignment is incorrect you are essentially trying to build the wrong molecule.

The basic process is to find MR solutions with Phaser, rebuild them with Rosetta, then rebuild those models with phenix.autobuild. The combination of Rosetta rebuilding and Phenix rebuilding is the key part of this method. In slightly more detail, the process is to select possible MR solutions with Phaser (one of which must later be shown to be correct for the procedure to succeed), score them with LLG following Rosetta relaxation, pick the best solutions, rebuild each of these with Rosetta including map information (density term), score the resulting models with Rosetta, select the best-scoring models and score them with LLG, verify that the top solutions are all about the same (their electron density maps are correlated), and rebuild the top models with autobuild.

mr_rosetta can handle a single copy of a single chain, or multiple copies of a single chain (NCS), or multiple copies of multiple chains (groups of NCS). If you supply one or more input search models, then the entire crystallographic asymmetric unit must contain some multiple of the search models you supply (Phaser will be used to find copies of the search model). NCS will be found automatically in your search model and in any models assembled by mr_rosetta.

NOTE: If your molecule has multiple chain types, then you cannot use the simple hhr file input (see below) and you cannot automatically run pre-refinement with Rosetta on your molecule. Instead you need to use mr_model_preparation and Phaser to place your model. Then you can supply the aligned, placed structure to mr_rosetta for rebuilding. In this case you will also need to supply a different set of fragment files for each chain type.

Tools from Rosetta that are used in mr_rosetta

Steps where mr_rosetta uses structure-modeling algorithms

Summary of the procedure used in mr_rosetta

The overall process in one cycle of mr_rosetta is: (a) edit the model and place it in the unit cell (i.e., molecular replacement, MR), (b) score all MR solutions and take the best ones by LLG for further steps, (c) rebuild each model 20-2000 times using Rosetta and a density-modified 2Fo-Fc map to yield Rosetta models, (d) refine the Rosetta models, average the density from the top 20%, and continue rebuilding each Rosetta model using the averaged density, and (e) take the top models based on LLG score and rebuild them with autobuild. An optional prerefinement step is to carry out Rosetta modeling in step (a) above, before carrying out molecular replacement.

Details of the procedure used in mr_rosetta

NOTE 1: For tailoring of this step, use mr_model_preparation and then supply the aligned model to mr_rosetta.

NOTE 2: If your structure contains more than one chain or requires more than one homology model to represent the structure, then you need to use mr_model_preparation and Phaser to place your model. Then you can supply the aligned, placed structure to mr_rosetta for rebuilding.

#!/bin/sh
# Script 1: relax and score an MR solution (RESCORE_MR / RELAX_AND_SCORE step)
cd MR_ROSETTA_1/RESCORE_MR_1/RELAX_AND_SCORE_IN_SETS_1/RUN_1/WORK_1
 /net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode cm \
-in:file:extended_pose 1 \
-in:file:fasta MR_ROSETTA_1/WORK_1/EDITED_1crb_fasta.txt \
-in:file:alignment MR_ROSETTA_1/WORK_1/EDITED_1crb_2qo4.ali \
-in:file:template_pdb MR_ROSETTA_1/AutoMR_run_1_/2QO4.1.pdb \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso   3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile \
     MR_ROSETTA_1/AutoMR_run_1_/2QO4.1_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-cm:aln_format grishin \
-MR:max_gaplength_to_model 0 \
-nstruct 1 \
-ignore_unrecognized_res  \
-overwrite
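
#!/bin/sh
# Script 2: rebuild a model with Rosetta using fragments and the density map (REBUILD_IN_SETS step)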
cd MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_1/WORK_1
 /net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode cm \
-in:file:extended_pose 1 \
-in:file:fasta MR_ROSETTA_1/WORK_1/EDITED_1crb_fasta.txt \
-in:file:alignment MR_ROSETTA_1/WORK_1/EDITED_1crb_2qo4.ali \
-in:file:template_pdb MR_ROSETTA_1/AutoMR_run_1_/2QO4.1.pdb \
-loops:frag_sizes 9 3 1 \
-loops:frag_files inputs/aa1crb_09_05.200_v1_3.gz \
   inputs/aa1crb_03_05.200_v1_3.gz none \
-loops:random_order \
-loops:random_grow_loops_by 5 \
-loops:extended \
-loops:remodel quick_ccd \
-loops:relax relax \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso     3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile  MR_ROSETTA_1/AutoMR_run_1_/2QO4.1_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-cm:aln_format grishin \
-MR:max_gaplength_to_model 8 \
-nstruct 1  \
-ignore_unrecognized_res \
-overwrite

#!/bin/sh
# Script 3: relax and rescore a rebuilt Rosetta model (RESCORE_MR step after rebuilding)
cd MR_ROSETTA_1/GROUP_OF_RESCORE_MR_ROSETTA_2/RUN_1/RESCORE_MR_1/RELAX_AND_SCORE_IN_SETS_1/RUN_1/WORK_1
 /net/terwill/rosetta/rosetta_source/bin/mr_protocols.default.linuxgccrelease \
-database /net/terwill/rosetta/rosetta_database \
-MR:mode relax \
-in::file::s \
 MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_8/WORK_1/S_2QO4B_0001_edited.pdb \
-relax:default_repeats 4 \
-relax:jump_move true \
-edensity:mapreso   3.00 \
-edensity:grid_spacing 1.5 \
-edensity:mapfile \
 MR_ROSETTA_1/WORK_1/REBUILD_IN_SETS_1/RUN_8/WORK_1/S_2QO4B_0001_edited_refine_001_map_coeffs.map \
-edensity:sliding_window_wt 1.0 \
-edensity:sliding_window 5 \
-nstruct 1 \
-overwrite

Notes on the procedure used in mr_rosetta

Viewing solutions and restarting with saved solutions

At each stage, existing solutions are saved as a Python "pkl" file. They are stored as "mr_rosetta_solution" objects, which keep track of the model and its history, the map coefficients and labels, etc. These solutions can be displayed with "display_solutions=True". They can be read back in to mr_rosetta with the keyword "mr_rosetta_solutions=results.pkl" and used as inputs for subsequent runs, starting at any step that can use those solutions.

NOTE: You can restart mr_rosetta only at the beginning of major stages (like "place_model", "rosetta_rebuild", etc.), not in between.

Normally at the end of a major stage a .pkl file is written out with text like "type this to see all the results". You can almost always give your original command plus the keywords "start_point=xxx" and "mr_rosetta_solutions=my_pickle_file.pkl", and it should then continue on from there.
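As a rough sketch of preparing such a restart (assuming a POSIX shell, and assuming the saved solutions are files ending in .pkl somewhere under the run directory), the helper below finds the newest .pkl file and prints the two keywords to append to your original command. The function name latest_pkl is hypothetical, and the start_point value rosetta_rebuild is only an example taken from the stage names mentioned above; use the stage that applies to your run.

```sh
#!/bin/sh
# Hypothetical helper (not part of Phenix): print the most recently
# modified .pkl file found under a run directory such as MR_ROSETTA_1.
latest_pkl() {
    # ls -t lists newest first; file names are assumed to contain no newlines
    find "$1" -type f -name '*.pkl' -exec ls -t {} + 2>/dev/null | head -n 1
}

run_dir=${1:-MR_ROSETTA_1}
pkl=$(latest_pkl "$run_dir")
if [ -n "$pkl" ]; then
    # Append these keywords to your original phenix.mr_rosetta command.
    echo "start_point=rosetta_rebuild mr_rosetta_solutions=$pkl"
else
    echo "no .pkl file found under $run_dir" >&2
fi
```

The newest file is not necessarily the one you want; check the log text that accompanies each .pkl file before restarting.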

Running mr_rosetta on a cluster

Jobs can be run on a single machine or on a cluster. A run command for single jobs (single_run_command="sh") and a run command for batch jobs (group_run_command=qsub) can be specified as well as the number of processors to use (nproc=200).

The qsub command is used in Sun Grid Engine clusters. You can also use mr_rosetta on a Condor cluster, using group_run_command="condor_submit ".

Single file system required for mr_rosetta

All files are stored on a single file system that must be accessible to all jobs.

Read/write delay allows for slow NFS disks

Reads and writes of files are (generally) accompanied by a wait of up to max_wait_time=100 sec for the new file to appear.
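The idea of waiting for a file to appear on a slow shared disk can be illustrated with a small polling loop. This is only a sketch of the concept in POSIX shell, not the code mr_rosetta uses; the function name wait_for_file and the one-second polling interval are assumptions.

```sh
# Illustrative only: poll for a file, giving up after max_wait seconds.
wait_for_file() {
    file=$1
    max_wait=${2:-100}   # seconds; 100 matches the max_wait_time quoted above
    waited=0
    while [ ! -e "$file" ]; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1     # timed out: the file never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

If your own wrapper scripts read mr_rosetta output from an NFS disk, a check like wait_for_file overall_best.pdb 100 before reading may avoid spurious "file not found" errors.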

Tracking your log files

mr_rosetta runs all cpu-intensive jobs as sub-processes. When it submits a sub process to do the work it lists the name of the corresponding log file. You can work your way down to the bottom level at any time by reading through these log files, copying the name of the next log file, and opening it until you get to the place where the actual work is done.
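Following the chain of log files by hand can be tedious, so a small helper may be useful. The sketch below assumes that each log announces its sub-process logs on lines containing "Log will be: <path>", as in the example output in the debugging section of this document, and that paths contain no spaces. The function names next_logs and walk_logs are hypothetical.

```sh
# Hypothetical helper: print the sub-process log files named in a log.
next_logs() {
    sed -n 's/.*Log will be: *//p' "$1"
}

# Print a log file and, recursively, every log file it refers to.
walk_logs() {
    echo "$1"
    for sub in $(next_logs "$1"); do
        if [ -f "$sub" ]; then
            walk_logs "$sub"
        fi
    done
}
```

Running walk_logs on your top-level log lists the logs from the top level down to the jobs where the actual work is done.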

Re-running parts of your mr_rosetta jobs

Sub-processes are always run in sub-directories. Each sub-process has a file "RUN_FILE_1" that contains the information to run the sub-process, a parameter file "PARAMS_1.eff", and a log file "RUN_FILE_1.log" that records the output of running that sub-process. Note that you can use the parameter files to re-run any jobs that you want. You can say something like:

phenix.mr_rosetta PARAMS_1.eff

and that will rerun the job specified in that directory.

Failures in sub-processes

If some sub-processes fail, normally the failures will be ignored. This is useful as your overall job can often continue even if a few refinement or Rosetta jobs fail. However, if the failure is from the queueing system (rather than in the actual running of the jobs) then the overall job may still fail.

Stopping mr_rosetta

If you create a file "STOPWIZARD" in the top level directory (i.e., MR_ROSETTA_1/), then each job in the entire process will stop as soon as any Phenix part of the process takes over (i.e., as soon as Rosetta jobs finish).
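A minimal sketch of requesting a stop from the shell is given below. It assumes that an empty file is sufficient (the text above only says the file must exist); the function name request_stop is hypothetical.

```sh
# Create STOPWIZARD in the top-level run directory (default: MR_ROSETTA_1).
request_stop() {
    run_dir=${1:-MR_ROSETTA_1}
    if [ ! -d "$run_dir" ]; then
        echo "no such run directory: $run_dir" >&2
        return 1
    fi
    touch "$run_dir/STOPWIZARD"
}
```

Remember to remove the STOPWIZARD file before restarting a run in the same directory.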

Ignoring long-running place_model jobs and going on

There are two ways to avoid the problem of having one or more long-running (and very likely eventually unsuccessful) place_model sub-processes that prevent mr_rosetta from going on. One way is to set the parameter sufficient_number_finished=nn, where you are satisfied if any nn place_model jobs finish successfully. Then once nn jobs finish, all the rest are simply ignored and mr_rosetta goes on. The jobs that are ignored will continue on until they finish (and will still be ignored.)

A second way is to edit the value of sufficient_number_finished after mr_rosetta has started. This is convenient if you see that all the jobs except for one or two are done, and these seem to be going on forever. You can create a little file "GO_ON" in the directory where place_model is being run that contains the value you want for sufficient_number_finished (if you want it to stop right away and at least one job is finished, just put in 1). This directory is the directory where the log files for place_model are located. For example your overall log file might say,

Splitting work into 2 jobs and running with 2 processors using sh
background=True in /Users/terwill/unix/misc/junk/test_place_model/MR_ROSETTA_7/GROUP_OF_PLACE_MODEL_1

in which case that directory is where you would put GO_ON, where the file GO_ON just contains a number (the value for sufficient_number_finished). As in the use of sufficient_number_finished, the ignored jobs do not actually stop, so if you really want to stop them you will need to do that in another way (mr_rosetta does not capture the job numbers for sub-processes so it can't stop them, it can only monitor what they have done.)

NOTE: you can use this second method with the GO_ON file to tell mr_rosetta how many finished jobs to require for any set of sub-processes. Just put this file in the directory specified for that group of sub-processes. This also works for other phenix tools including autobuild, ligandfit and find_all_ligands that have the 'Splitting work into...' text in their log files.
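Writing the GO_ON file takes one line of shell; the sketch below adds a check that the value is a whole number. The function name write_go_on is hypothetical, and the directory you pass must be the one reported after "Splitting work into..." in your own log.

```sh
# Write a GO_ON file containing the desired sufficient_number_finished.
write_go_on() {
    group_dir=$1
    n=${2:-1}            # 1 means: go on as soon as one job has finished
    case $n in
        ''|*[!0-9]*)
            echo "sufficient_number_finished must be a whole number" >&2
            return 1 ;;
    esac
    echo "$n" > "$group_dir/GO_ON"
}
```

For the example log above you would call write_go_on with the GROUP_OF_PLACE_MODEL_1 directory and the number of finished jobs you are willing to accept.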

Installing Rosetta for use with mr_rosetta

To run mr_rosetta, you need to install Rosetta from the Baker laboratory at the University of Washington. This is pretty easy, and a summary of steps is given below. Once you have installed Rosetta you need to set the environment variable $PHENIX_ROSETTA_PATH. Then you have all the software you will need for running mr_rosetta.

Notes on Rosetta versions:

Downloading and installing Rosetta is pretty easy if your computer is compatible. Compiling takes about an hour on a 2-processor machine, or just a few minutes on a machine with many processors.

NOTE: If you have trouble, see the FULL INSTRUCTIONS at http://www.rosettacommons.org/manuals/archive/rosetta3.2_user_guide/ or http://www.rosettacommons.org/manuals/archive/rosetta3.2_ or http://www.rosettacommons.org/manuals/archive/rosetta3.2.1_user_guide/

NOTE: If you have trouble on Ubuntu 11.04 or later, also see: http://morganbye.net/blog/2011/05/rosetta-32-ubuntu-1104 (but only for the basic.settings and options.settings modifications and for using: scons bin mode=release cxx=gcc cxx_ver=4.5)

tar xzf rosetta3.2_Bundles.tgz

This should give you a directory rosetta-3.2 that contains:

BioTools                new_apps.note           rosetta_demos
foldit                  release.note            rosetta_fragments
manual                  rosetta_database        rosetta_source

NOTE: If your directory contains .tgz files instead of the listing above, you may also need to run the same tar command on the individual .tgz files.

NOTE 2: Rosetta versions 3.6 and later have a different directory structure. It looks instead like:

demos   main    tools

Check that python and scons are available:

python --version
scons --version

On Ubuntu only, install zlib1g-dev and scons as root if they are missing:

sudo su   # UBUNTU ONLY FOR INSTALLING zlib1g and scons
apt-get install zlib1g-dev
apt-get install scons

In the scons command below the "-j2" means use 2 processors; adjust for your system. The build takes about 1 hour with 2 processors. Please note: Do not move your Rosetta installation after compiling it. The compilation builds in path names, and if you move the binaries to some other place they will not run.

cd rosetta_source
python external/scons-local/scons.py -j2 bin mode=release
cd ..

If you get to "scons: done building targets." you are all set!

export PHENIX_ROSETTA_PATH=/your-path-to-rosetta-here/rosetta-3.2

or csh (C-shell):

setenv PHENIX_ROSETTA_PATH /your-path-to-rosetta-here/rosetta-3.2
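Before starting a long run it can be worth confirming that the variable is set and points at a real directory. The check below is a sketch for sh/bash; the function name check_rosetta_path is hypothetical, and it deliberately does not test for particular sub-directories because the layout differs between Rosetta versions.

```sh
# Report whether PHENIX_ROSETTA_PATH is set and names an existing directory.
check_rosetta_path() {
    if [ -z "${PHENIX_ROSETTA_PATH:-}" ]; then
        echo "PHENIX_ROSETTA_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "$PHENIX_ROSETTA_PATH" ]; then
        echo "PHENIX_ROSETTA_PATH is not a directory: $PHENIX_ROSETTA_PATH" >&2
        return 1
    fi
    echo "PHENIX_ROSETTA_PATH=$PHENIX_ROSETTA_PATH"
}
```

A failure here usually means the export or setenv line was typed in a different shell session from the one running Phenix.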

If you are using the bash or sh shells:

export HTTP_PROXY=proxyout.mydomain.edu:8080

or csh (C-shell):

setenv HTTP_PROXY proxyout.mydomain.edu:8080

Setting up for a run of mr_rosetta A. Fragment files from the Robetta server

NOTE: For versions 3.6 and later of Rosetta, fragments files are optional.

To run mr_rosetta on your structure, you will need to use the Robetta fragment server at the Univ. of Washington to generate 9-mer and 3-mer fragments from the PDB that are compatible with your sequence file. This takes a few hours but is very easy to do.

To obtain the two required files:

phenix.adjust_robetta_resid <fragment_file_name> <new_fragment_file_name> <offset-for-residue numbers>

Setting up for a run of mr_rosetta B. Alignment files from the hhpred server

You will need to tell mr_rosetta what to use as search models and the alignment between the search models and your target structure. The easiest way is to use the hhpred server (Söding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960.) Here is what to do:

Search models and alignment files

If you supply an hhr analysis file from hhpred, you do not need to worry (usually) about the details of your search models and alignment files. However you can supply mr_rosetta with your own list of search models and a corresponding list of alignment files. This section describes what the alignment files need to look like (two ways you can format these files.)

Here are your options for supplying alignment information:

You can generate an alignment file with phenix.muscle if you do not have one from another source. Use a command like this:

phenix.muscle -in my_two_sequences.dat -out my_alignment.ali

where my_two_sequences.dat looks like:

> title text for sequence of target (your structure) to follow
LVLKWVMSTKYVEAGELKEGSYVVIDGEPCRVVEIEKSKTGKHGSAKARIVAVGVFDGGKRTLSLPVDAQVEVPIIEKFT
AQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPGAEVEVWQILDRYKIIRVKG
> title text for sequence of template (supplied PDB) to follow
qlmdmrd AQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPGAEVEVWQILDRYKIIRVKG qlmdmrd

and my_alignment.ali (your .ali file) looks like:

> title text for sequence of target (your structure) to follow
LVLKWVMSTKYVEAGELKEGSYVVIDGEPCRVVEIEKSKTGKHGSAKARIVAVGVFDGGK
RTLSLPVDAQVEVPIIEKFTAQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPG
AEVEVWQILDRYKIIRVKG-------
> title text for sequence of template (supplied PDB) to follow
------------------------------------------------------------
-------------QLMDMRDAQILSVSGDVIQLMDMRDYKTIEVPMKYVEEEAKGRLAPG
AEVEVWQILDRYKIIRVKGQLMDMRD
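In any pairwise alignment the two gapped sequences must have the same total length, so a quick sanity check on an .ali file of the form shown above is to compare them. The sketch below uses awk from a POSIX shell; it is not part of mr_rosetta, the function name check_ali_lengths is hypothetical, and it assumes exactly two entries that each begin with a ">" title line.

```sh
# Check that the two aligned sequences in a ">"-titled .ali file
# have the same length (gap characters included).
check_ali_lengths() {
    awk '
        /^>/  { n++; next }                      # title line starts a new entry
        n > 0 { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }
        END {
            if (n != 2) {
                print "expected 2 sequences, found " (n + 0)
                exit 2
            }
            if (length(seq[1]) != length(seq[2])) {
                print "length mismatch: " length(seq[1]) " vs " length(seq[2])
                exit 1
            }
            print "ok: " length(seq[1]) " aligned positions"
        }
    ' "$1"
}
```

A length mismatch usually means a line was lost or wrapped incorrectly when the file was edited by hand.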

You have two options for alignment files if you are going to use one.

> title text for sequence of target (your structure) to follow
VDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDD
> title text for sequence of template (supplied PDB) to follow
-AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDG

## 1CRB_ 2qo4_A
# hhsearch
scores_from_program: 0 1.00
1 VDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNYIMDFQVGKEFEEDLTGIDD
0 -AFSGTWQVYAQENYEEFLRAISLPEEVIKLAKDVKPVTEIQQNGSDFTITSKTPGKTVTNSFTIGKEAEIT--TMDG
--

Here is what has to be on each line:

Output files from mr_rosetta

The output files from mr_rosetta are the same as those from phenix.autobuild: a model and map coefficients. These will be in a subdirectory listed at the end of your log file. The files will be something like: MR_ROSETTA_1/..../AutoBuild_run_1_/overall_best.pdb and MR_ROSETTA_1/..../AutoBuild_run_1_/overall_best_denmod_map_coeffs.mtz.

Graphical interface

A GUI for MR-Rosetta is now available in the "Molecular replacement" category. Its function is essentially identical to that of the command-line version, but many of the details are not shown by default. In addition to the methods described above for configuring your system to use Rosetta from PHENIX, the GUI also includes a preferences setting in the "Wizards" section for defining the path to the Rosetta installation. If the GUI does not detect that you have the environment set up correctly, it will issue a warning when started.

The configuration tab in the GUI includes a list into which any combination of input files may be added; the file types should be recognized (and any relevant data they contain extracted, such as space group and MTZ label information) automatically. For more complex inputs involving fragment files, click the button labeled "Other inputs" below the list of files.

[Figure: MR-Rosetta configuration tab (mr_rosetta_config.png)]

The number of processors to use will be set to one fewer than the total number of CPU cores PHENIX thinks are available, but if you are using a queueing system this number can be increased. You can change how MR-Rosetta runs child processes by clicking the "Job control" button in the lower left-hand corner of the configuration tab.

Because it usually takes hours to run, MR-Rosetta will always be launched by the GUI as a "detached" job, meaning that you can close the GUI without killing the process, and resume it later. While MR-Rosetta is running, the current set of solutions will be continuously updated in a tab labeled "Current results", with the relevant score (LLG from Phaser, Rosetta score, or R-factor). You may view any of these solutions by clicking the buttons next to them.

[Figure: MR-Rosetta "Current results" tab (mr_rosetta_solutions.png)]

Once the job is complete, a simple summary tab will be displayed, listing the output files and basic statistics such as R-factors. If the program was successful, the R-free will usually be below 50%, although this may vary depending on resolution and data quality. Buttons are provided to start additional programs or view the model and maps.

[Figure: MR-Rosetta results summary tab (mr_rosetta_result.png)]

Adding a specific Rosetta command (disulfides to fix) to mr_rosetta

When you run mr_rosetta you can specify a command or commands to be added to the Rosetta scripts. For example, if your model has disulfide bonds between residues 12 and 15 and between 22 and 39 you can say:

rosetta_command="-MR::disulf 12:15 22:39"

In this command, the two residue numbers of each disulfide are colon-separated, different disulfides are separated by spaces, and the numbering corresponds to the input fasta file. If you have multiple commands you can just give multiple rosetta_command statements. Note that any commands will be applied to all Rosetta scripts in this mr_rosetta run. That means that you can't have different commands for different steps that use Rosetta. It also means that you cannot specify chain names or use different commands for different chains. At the moment this feature is most useful if you are supplying a single chain (or multiple chains with identical sequences).

Parameters files in mr_rosetta

When you run mr_rosetta it will write out a mr_rosetta_params.eff parameter file that can be used to re-run mr_rosetta (just as for essentially all PHENIX methods).

Examples

Standard run of mr_rosetta

Before you run mr_rosetta, you need to get fragment files from the Robetta server (see Setting up for a run of mr_rosetta, part A, above). Then you need an hhr alignment information file from the hhpred server (see Setting up for a run of mr_rosetta, part B, above), or else a search model and an alignment file to go with it.

Once you have these files, running mr_rosetta is easy. If you have a sequence file (seq.dat), a search model (coords1.pdb) and an alignment file for it (coords1.ali), fragment files test3.gz and test9.gz, and a data file coords1.mtz with FP SIGFP and FreeR_flag, you can type:

phenix.mr_rosetta \
  seq_file=seq.dat \
  data=coords1.mtz \
  alignment_files=coords1.ali \
  search_models=coords1.pdb \
  already_placed=False \
  fragment_files=test3.gz \
  fragment_files=test9.gz \
  rescore_mr.relax=False \
  rosetta_models=20 \
  ncs_copies=2 \
  space_group=p212121  \
  use_all_plausible_sg=False \
  nproc=200 \
  group_run_command=qsub

and mr_rosetta will run automatically, generating 20 Rosetta models during structure determination.

If you have an hhr alignment information file, you can specify that instead of search_models and alignment_files, with the command hhr_files=myhhpred.hhr. Then you can tell mr_rosetta how many of the PDB files to use with read_hhpred.number_of_models=1 (to use just the best one, for example).

Running mr_rosetta with a model that is already placed in the unit cell

You can run mr_rosetta as a purely model-building tool as well. This is convenient if you have found a MR solution but cannot rebuild it successfully. Here is an example. The keyword to use is already_placed=True:

phenix.mr_rosetta \
  seq_file=seq.dat \
  data=coords1.mtz \
  search_models=coords1.pdb \
  already_placed=True \
  fragment_files=test3.gz \
  fragment_files=test9.gz \
  rescore_mr.relax=False \
  rosetta_models=20 \
  ncs_copies=2 \
  space_group=p212121  \
  use_all_plausible_sg=False \
  nproc=200 \
  group_run_command=qsub

Rebuilding your model with Rosetta before MR

If your search model is too distant to find a molecular replacement solution, you can prerefine your model with Rosetta before carrying out molecular replacement. The keyword to use is: run_prerefine=True.

NOTE 1: It is best to specify the number of ncs_copies if you use run_prerefine. If you do not, then you may end up running several parallel jobs, each of which is independently carrying out prerefinement on the same input model (to be used later with different numbers of NCS copies). Once you have run your job with one value of ncs_copies, you can just use the best prerefined model from that job as a search model in your other runs.

Here is an example:

phenix.mr_rosetta \
  seq_file=seq.dat \
  data=coords1.mtz \
  search_models=coords1.pdb \
  run_prerefine=True \
  number_of_prerefine_models=1000 \
  fragment_files=test3.gz \
  fragment_files=test9.gz \
  rescore_mr.relax=False \
  rosetta_models=20 \
  ncs_copies=2 \
  space_group=p212121  \
  use_all_plausible_sg=False \
  nproc=200 \
  group_run_command=qsub

NOTE 2: If you have a model and just want to run pre-refinement and nothing else, you can do so without any data:

phenix.mr_rosetta \
  seq_file=seq.dat \
  search_models=coords1.pdb \
  run_prerefine=True \
  number_of_prerefine_models=1000

Your pre-refined model(s) will be listed in

MR_ROSETTA_1/GROUP_OF_PLACE_MODEL_1/RUN_FILE_1.log

and you can pick the best of these (most negative score, listed first).

Running mr_rosetta from a homology search (with an hhr file)

If you have run hhpred and obtained a .hhr file with a list of alignments of proteins in the PDB with your sequence, you can run starting from your sequence file and this .hhr file. Here is an example. The keyword to use is: hhr_files=my_hhr_file.hhr.

phenix.mr_rosetta \
 seq_file=bfr258e.fasta \
 data=bfr258e_data.mtz \
 hhr_files=bfr258e.hhr \
 read_hhpred.number_of_models=1 \
 read_hhpred.number_of_models_to_skip=0 \
 fragment_files=aabfr__03_05.200_v1_3.gz \
 fragment_files=aabfr__09_05.200_v1_3.gz \
 rescore_mr.relax=False \
 rosetta_models=20 \
 ncs_copies=1 \
 nproc=200 \
 group_run_command=qsub

NOTE: it is generally a good idea to run several separate mr_rosetta jobs, one for each homology model you want to extract from the PDB, and possibly also separately for each possible number of NCS copies. You can do this by adjusting the "read_hhpred.number_of_models_to_skip" from 0 to N and the value of "ncs_copies" in the script above. In this way, you can just pick the first job that gives you a good solution. If you run them all at once, then all jobs will wait for the slowest job to finish at each step. If there are multiple NCS copies and some search models are poor, this can sometimes take a very long time.
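One way to set up such a series of separate jobs is to generate one command per combination of read_hhpred.number_of_models_to_skip and ncs_copies. The sketch below only prints the commands, using the file names from the example above; the function name print_jobs and the ranges passed to it are placeholders. Running each printed command in its own working directory is an assumption made here to keep the outputs of different jobs apart.

```sh
# Print one phenix.mr_rosetta command per (models_to_skip, ncs_copies) pair.
print_jobs() {
    max_skip=$1      # highest value of read_hhpred.number_of_models_to_skip
    max_ncs=$2       # highest value of ncs_copies to try
    skip=0
    while [ "$skip" -le "$max_skip" ]; do
        ncs=1
        while [ "$ncs" -le "$max_ncs" ]; do
            echo "phenix.mr_rosetta seq_file=bfr258e.fasta data=bfr258e_data.mtz" \
                 "hhr_files=bfr258e.hhr read_hhpred.number_of_models=1" \
                 "read_hhpred.number_of_models_to_skip=$skip" \
                 "fragment_files=aabfr__03_05.200_v1_3.gz" \
                 "fragment_files=aabfr__09_05.200_v1_3.gz" \
                 "rescore_mr.relax=False rosetta_models=20" \
                 "ncs_copies=$ncs nproc=200 group_run_command=qsub"
            ncs=$((ncs + 1))
        done
        skip=$((skip + 1))
    done
}

# Example: the first three homology models, each with 1 or 2 NCS copies.
print_jobs 2 2
```

Review the printed commands before submitting them, since each one can occupy many processors for a long time.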

Getting a default parameters file for mr_rosetta

Usually you will want to edit a parameters file so that you can specify more details of the run. You can get a default parameters file with:

phenix.mr_rosetta

and then just edit that file.

Testing mr_rosetta

You can do a test of mr_rosetta to make sure everything is ok with:

phenix_regression.wizards.test_command_line_rosetta_quick

Possible Problems

Environment problems when running mr_rosetta

If you get an error message something like "... /opt/rosetta3.4/rosetta_source/bin/mr_protocols.default.linuxgccrelease: error while loading shared libraries: libprotocols.7.so: cannot open shared object file: No such file or directory", this can mean that LD_LIBRARY_PATH is not correctly interpreted by mr_rosetta. Try running this command just before running phenix, or put it in your bash.bashrc file (if using sh/bash) or .cshrc file (if using csh): PHENIX_TRUST_OTHER_ENV="yes"

Debugging problems with running mr_rosetta

If mr_rosetta fails, the first thing to do (after checking the commands you used) is to run the mr_rosetta regression tests to make sure that the installations of Phenix and Rosetta are both OK:

phenix_regression.wizards.test_command_line_rosetta_quick

That should take 10-20 minutes to run and say "OK" for all the tests. If one or more of these say "FAILED" instead, you can go into the failed run directory (for example, test_autobuild/) and run the script there (e.g., ./test_autobuild.com), which should fail, and you can then track down what is not working.

NOTE: On some systems there may be some really minor (numerical) differences between the standard results and those on your system. These can cause the "FAILED" to be printed out but can safely be ignored. You can tell by looking at the file "diff.dat" that should be in the failed run directory and you'll see that the differences are very minor.

If the tests all are OK, then there is something specific to your data or script. The best way to debug this is to go to the last sub-process that has failed or hung and look at the log file, and possibly re-run that step from the terminal. Here is how to get there:

Starting job 1...Log will be: /net/omega/raid1/scratch1/terwillMR_ROSETTA_2/GROUP_OF_PLACE_MODEL_1/RUN_FILE_1.log
terwill@sigma> cd MR_ROSETTA_2/GROUP_OF_PLACE_MODEL_1/
terwill@sigma> ls -tlr
total 60
-rwx------ 1 terwill lanl  1495 Feb  5 14:54 RUN_FILE_1.sh*
-rwx------ 1 terwill lanl   282 Feb  5 14:54 RUN_FILE_1*
-rw-r--r-- 1 terwill lanl  6431 Feb  5 14:54 PARAMS_1.eff
-rw-r--r-- 1 terwill lanl  6564 Feb  5 14:54 mr_rosetta_params.eff
-rw-r--r-- 1 terwill lanl   130 Feb  5 14:54 INFO_FILE_1
drwxr-xr-x 6 terwill lanl  4096 Feb  5 16:44 RUN_1/
-rw-r--r-- 1 terwill lanl 21575 Feb  5 16:45 RUN_FILE_1.log
-rw-r--r-- 1 terwill lanl    51 Feb  5 16:46 JOBS_RUNNING

Here:

The key here is that you can type

phenix.mr_rosetta PARAMS_1.eff

and the exact same job that failed or ran will be run again. You can use this to debug what is going on.

MR_ROSETTA_2/GROUP_OF_ROSETTA_REBUILD_1/RUN_1/REBUILD_IN_SETS_1/RUN_5/WORK_1

Here this is in RUN_1 of a group of rosetta models, set 1, run 5, working directory. In this directory you will find something like:

terwill@sigma> cd WORK_1/
terwill@sigma> ls -tlr
total 684
-rw-r--r-- 1 terwill lanl   1475 Feb  5 16:48 rebuild.flags
-rwxr-xr-x 1 terwill lanl    304 Feb  5 16:48 run_rebuild.sh*
-rw-r--r-- 1 terwill lanl 422921 Feb  5 17:26 S_3DZB__0001.pdb
-rw-r--r-- 1 terwill lanl    665 Feb  5 17:26 score.sc
-rw-r--r-- 1 terwill lanl  97437 Feb  5 17:26 rebuild.log
-rw-r--r-- 1 terwill lanl 158717 Feb  5 18:17 S_3DZB__0001_ed.pdb

Here:

You can look at the log file and see if there are any messages. Then you can rerun the Rosetta job in a scratch directory with:

mkdir junk
cd junk
../run_rebuild.sh

With luck, you will get the same errors and you can debug from there by changing the parameters or input files in rebuild.flags to see what was causing the problems.

Specific limitations and problems

mr_rosetta does not have the full flexibility of autobuild, so you may want to get a nearly-complete model with mr_rosetta and then use autobuild to increase the completeness and quality. You may also want to take the output of mr_rosetta and then put it back in as input to mr_rosetta and re-run it to improve your model.

File names of PDB files for mr_rosetta need to have at least 4 characters before the .pdb. So test.pdb is fine, but my.pdb is not.
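A quick way to catch this before starting a run is to measure the file name in the shell. The check below is a sketch that follows the rule exactly as stated above (at least 4 characters before ".pdb"); the function name pdb_name_ok is hypothetical.

```sh
# Succeed if the file name has at least 4 characters before ".pdb".
pdb_name_ok() {
    base=$(basename "$1" .pdb)   # strip any directory and the .pdb suffix
    [ "${#base}" -ge 4 ]
}

for f in test.pdb my.pdb; do
    if pdb_name_ok "$f"; then
        echo "$f: ok"
    else
        echo "$f: name too short for mr_rosetta"
    fi
done
```

Renaming a short file (for example, my.pdb to my_model.pdb) before the run avoids the problem.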

Literature

Improved molecular replacement by density- and energy-guided protein structure optimization. F. DiMaio, T.C. Terwilliger, R.J. Read, A. Wlodawer, G. Oberdorfer, U. Wagner, E. Valkov, A. Alon, D. Fass, H.L. Axelrod, D. Das, S.M. Vorobiev, H. Iwaï, P.R. Pokkuluri, and D. Baker. Nature 473, 540-3 (2011).

phenix.mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta. T.C. Terwilliger, F. DiMaio, R.J. Read, D. Baker, G. Bunkóczi, P.D. Adams, R.W. Grosse-Kunstleve, P.V. Afonine, and N. Echols. J Struct Funct Genomics 13, 81-90 (2012).

Refinement of protein structures into low-resolution density maps using rosetta. F. DiMaio, M.D. Tyka, M.L. Baker, W. Chiu, and D. Baker. J Mol Biol 392, 181-90 (2009).

List of all available keywords