Crystallographic refinement with Rosetta (phenix.rosetta_refine)

Overview

The program phenix.rosetta_refine provides a convenient wrapper for running Rosetta refinement of protein X-ray crystal structures (DiMaio et al. 2013), which integrates the Rosetta methods for conformational sampling with the X-ray targets, B-factor refinement, and map generation in phenix.refine (Afonine et al. 2012). This is intended for use in difficult cases, especially at low resolution where it combines a wide radius of convergence with excellent geometry, and for preparing crystal structures for further modelling in Rosetta.

Currently both the setup and execution of Rosetta refinement have some limitations; please read this entire document before attempting to use the program! Note that Rosetta does not currently run on Windows; you must have a Mac or Linux system to use this program.

Installation

Because the build mechanism for hybrid Rosetta/Phenix refinement requires some specialized handling of the Rosetta package, it may interfere with other uses of the suite (e.g. MR-Rosetta). If you also plan to use Rosetta for other purposes, we recommend keeping a separate installation of Rosetta for them. This limitation will be resolved in future releases.

First, you should install the appropriate version of Phenix. The binary installers should be sufficient; users of the Mac graphical package can set up the command-line environment with this command (replacing the version number as needed):

source /Applications/PHENIX-1.8.4-1496/Contents/phenix-1.8.4-1496/phenix_env.sh

You must obtain a separate license for Rosetta, which is free for academic users. The software is primarily distributed as source code, which will be linked to Phenix as part of the install process. Once you have obtained and unpacked the Rosetta distribution, the top-level directory should have a sub-directory called "main", containing additional sub-directories "source" and "database". You should first set environment variables to tell Phenix where to find the Rosetta distribution, and tell Rosetta where to find the database files, for example:

export PHENIX_ROSETTA_PATH=/usr/local/rosetta
export ROSETTA3_DB=/usr/local/rosetta/main/database

or for csh/tcsh users:

setenv PHENIX_ROSETTA_PATH "/usr/local/rosetta"
setenv ROSETTA3_DB "/usr/local/rosetta/main/database"
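
Before starting the build, it can help to confirm that the Rosetta tree has the layout described above. The following is a minimal sketch for Bourne-type shells (sh/bash), not something shipped with Phenix or Rosetta: the function name check_rosetta_layout is hypothetical, and it only tests for the "main/source" and "main/database" sub-directories.

```shell
# Hypothetical helper (not part of Phenix or Rosetta): check that a Rosetta
# tree contains the sub-directories described in this document.
check_rosetta_layout() {
    root="$1"
    for sub in main/source main/database; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing directory: $root/$sub" >&2
            return 1
        fi
    done
    echo "layout OK: $root"
}

# Example call, skipped when PHENIX_ROSETTA_PATH has not been set.
if [ -n "${PHENIX_ROSETTA_PATH:-}" ]; then
    check_rosetta_layout "$PHENIX_ROSETTA_PATH" || true
fi
```

If the check reports a missing directory, the most likely cause is that PHENIX_ROSETTA_PATH points at the wrong level of the unpacked distribution.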

Once you have both a Phenix installation and a Rosetta source distribution, you can build the hybrid system with this command (located in the Phenix tree):

rosetta.build_phenix_interface nproc=X

where X is the number of processors you wish to use for compilation. This will automatically set up Rosetta to use the Python bundled with Phenix.

Execution

Running phenix.rosetta_refine is superficially similar to running phenix.refine, but with a much more limited set of options. Required inputs are a PDB file, a data file (MTZ format), and CIF restraint files for any non-standard ligands. Other common parameters include the number of processors to use, the method for parallelizing Rosetta jobs, and the refinement protocol, for example:

phenix.rosetta_refine model.pdb data.mtz nproc=5 technology=sge protocol=hires

The default behavior is to run the full refinement protocol as described in the program citation, creating five unique models, and picking the model with the best free R-factor. This will then be passed through phenix.refine with a null strategy to optimize hydrogen and bulk solvent parameters. If desired, the argument postref=True can be given to enable additional cycles of refinement in phenix.refine, including both coordinates and B-factors with weight optimization.

The main choice of refinement strategy is between the "full" and "hires" protocols, both of which are described using the RosettaScripts syntax (Fleishman et al. 2011). The "hires" protocol is intended both for high-resolution structures and in general any structure that is already near convergence. The XML scripts for these methods are included in the Rosetta distribution. You may also create your own XML script with a customized protocol and pass that to phenix.rosetta_refine.

You can ask the program to parallelize jobs across multiple CPUs or several different queuing systems; the technology parameter determines the method used, and nproc indicates the number of jobs to run at once. The actual number of processes will be limited by the number_of_models parameter, which defaults to 5. Requesting more models may provide incremental improvements in some cases but we have found it rarely makes a difference of more than 1-2% in R-free.
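
As a rough illustration of how these two parameters interact, the sketch below computes the number of simultaneous jobs as the smaller of nproc and number_of_models. This is an interpretation of the description above, not a statement about the program's internals, and the function name effective_jobs is hypothetical.

```shell
# Hypothetical helper: number of jobs that can actually run at once, assuming
# it is the smaller of nproc and number_of_models (default 5).
effective_jobs() {
    nproc="$1"
    models="${2:-5}"
    if [ "$nproc" -lt "$models" ]; then
        echo "$nproc"
    else
        echo "$models"
    fi
}

effective_jobs 16      # prints 5  (default number_of_models=5)
effective_jobs 3 5     # prints 3
effective_jobs 16 20   # prints 16
```

Under this assumption, requesting nproc=16 with the default settings would leave eleven processors idle, so nproc only needs to exceed 5 when number_of_models is raised as well.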

Output is written to a new directory with a name such as "rosetta_1"; the directory number increases sequentially with each additional run. Because phenix.refine is used as the final step, the output files are of the same types as those written by that program.
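
Because each run creates a new numbered directory, a small helper can be used to locate the most recent one. The following sketch assumes the directories are named "rosetta_N" with an integer N, following the example name above; the function name latest_rosetta_dir is hypothetical. The comparison is numeric so that "rosetta_10" is treated as newer than "rosetta_2".

```shell
# Hypothetical helper: print the rosetta_N directory with the highest N
# under the given base directory (default: current directory).
latest_rosetta_dir() {
    base="${1:-.}"
    best=""
    best_n=-1
    for d in "$base"/rosetta_*; do
        [ -d "$d" ] || continue
        n="${d##*_}"                 # text after the last underscore
        case "$n" in
            ''|*[!0-9]*) continue ;; # skip names without a purely numeric suffix
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n="$n"
            best="$d"
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        return 1
    fi
}

# Example call; reports on stderr when no matching directory exists.
latest_rosetta_dir . || echo "no rosetta_N directory found" >&2
```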

Limitations

The program has several restrictions compared to phenix.refine:

  • Molecules are limited to the standard amino acids and a small number of common ions and ligands. If you have additional molecules in your model, these will be removed prior to Rosetta refinement and restored afterwards. You must still supply any necessary CIFs to the program, as these will be used later when running the final phenix.refine step.
  • RNA is not supported; however, the separate program ERRASER (Chou et al. 2012), which also employs both Rosetta and Phenix, may be used to optimize the RNA components of the model in real space.
  • Alternate conformations are not supported.
  • The B-factor refinement is limited to simple isotropic refinement at present; for more flexible parameterization you should run phenix.refine separately.
  • Rosetta scales poorly above approximately 1000 residues; although larger structures may work, the performance tends to be worse.
  • At present it is not possible to impose NCS restraints on structures in Rosetta.

Also note that the program is significantly more processor-intensive than phenix.refine, although less so than MR-Rosetta (DiMaio et al. 2011). We recommend that you use a multiprocessor computer or queuing system for greatest efficiency.

Troubleshooting

This program is under development; please address any questions concerning its use to help@phenix-online.org. The following guidelines may be useful for best performance:

  • The default settings are optimized for difficult structures, where conventional refinement gets stuck quickly. The assumption is that fine details such as waters and ligand molecules have not yet been added. For now, the only alternative is the "hires" (high-resolution) protocol, which employs gentler optimization; additional refinement scripts will be provided in the future.
  • For optimal results, especially when preparing a structure for further Rosetta modelling, you may wish to design your own Rosetta refinement protocol using RosettaScripts.
  • Unlike methods such as simulated annealing with the default geometry restraints, Rosetta refinement will almost never improve the fit to the experimental data at the expense of model geometry. If your structure contains severe distortions as a result of aggressive optimization, Rosetta refinement may actually degrade the model.
  • While Rosetta refinement can improve structures that are outside the radius of convergence of traditional crystallographic refinement, it still requires that the molecule(s) be correctly placed in the unit cell. If you are uncertain of your molecular replacement solution, we recommend trying MR-Rosetta instead.
  • The tests performed in DiMaio et al. 2013 indicated that the best results may come from combining orthogonal refinement methods such as DEN (Schröder et al. 2010) with Rosetta.

References

Improved low-resolution crystallographic refinement with Phenix and Rosetta. F. DiMaio, N. Echols, J.J. Headd, T.C. Terwilliger, P.D. Adams, and D. Baker. Nat Methods 10, 1102-4 (2013).

Towards automated crystallographic structure refinement with phenix.refine. P.V. Afonine, R.W. Grosse-Kunstleve, N. Echols, J.J. Headd, N.W. Moriarty, M. Mustyakimov, T.C. Terwilliger, A. Urzhumtsev, P.H. Zwart, and P.D. Adams. Acta Crystallogr D Biol Crystallogr 68, 352-67 (2012).

Correcting pervasive errors in RNA crystallography through enumerative structure prediction. F.C. Chou, P. Sripakdeevong, S.M. Dibrov, T. Hermann, and R. Das. Nat Methods 10, 74-6 (2012).

Improved molecular replacement by density- and energy-guided protein structure optimization. F. DiMaio, T.C. Terwilliger, R.J. Read, A. Wlodawer, G. Oberdorfer, U. Wagner, E. Valkov, A. Alon, D. Fass, H.L. Axelrod, D. Das, S.M. Vorobiev, H. Iwaï, P.R. Pokkuluri, and D. Baker. Nature 473, 540-3 (2011).

RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite. S.J. Fleishman, A. Leaver-Fay, J.E. Corn, E.M. Strauch, S.D. Khare, N. Koga, J. Ashworth, P. Murphy, F. Richter, G. Lemmon, J. Meiler, and D. Baker. PLoS One 6, e20161 (2011).

Super-resolution biomolecular crystallography with low-resolution data. G.F. Schröder, M. Levitt, and A.T. Brunger. Nature 464, 1218-22 (2010).

List of all available keywords