Ralf Grosse-Kunstleve, Paul Adams, Randy Read, Gabor Bunkoczi, Tom Terwilliger
The HySS (Hybrid Substructure Search) submodule of the Phenix package is a highly-automated procedure for the location of anomalous scatterers in macromolecular structures. HySS starts with the automatic detection of the reflection file format and analyses all available datasets in a given reflection file to decide which of these is best suited for solving the structure. The search parameters are automatically adjusted based on the available data and the number of expected sites given by the user. The search method is a systematic multi-trial procedure employing
- direct-space Patterson interpretation followed by
- reciprocal-space Patterson interpretation followed by
- dual-space direct methods or Phaser LLG completion followed by
- automatic comparison of the solutions and
- automatic termination detection.
The end result is a consensus model which is exported in a variety of file formats suitable for frequently used phasing and density modification packages.
The core search procedure is applicable to both anomalous diffraction and isomorphous replacement problems. However, currently the command line interface is limited to work with anomalous diffraction data or externally preprocessed difference data.
HySS gained several new features in 2014. First, it can use all the processors available on your computer, which can make the direct methods search for the anomalously scattering substructure very quick.
Second, HySS now automatically tries Phaser completion if the direct methods approach does not give a clear solution right away. Phaser completion uses the likelihood function to create an LLG map that is used to find additional sites, and it can be much more powerful than direct methods completion in HySS. It takes considerably longer than direct methods completion, but it is now quite feasible, particularly if you have several processors on your computer.
Third, HySS automatically tries several resolution cutoffs for the searches if the first try does not give a convincing solution. It also starts out with a few Patterson seeds and then tries more if those do not give a clear solution.
HySS now considers a solution convincing if it finds the same solution several times, starting with different initial Patterson peaks as seeds. The more sites in the solution, the fewer duplicates need to be found for the solution to count as convincing.
Taken together, these changes make the new HySS much faster than the old HySS, and it can solve substructures that the old HySS could not.
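The escalation strategy described above (more resolution cutoffs, more Patterson seeds, and a duplicate-based test for a convincing solution) can be illustrated with a minimal Python sketch. This is not the HySS implementation: the thresholds in `required_matches`, the loop nesting order, and the `run_trials` callback are all hypothetical and chosen only to show the shape of the logic.

```python
def required_matches(n_sites):
    """Hypothetical rule: the more sites, the fewer duplicates are required.

    The cutoffs below are illustrative only, not the values used by HySS.
    """
    if n_sites >= 20:
        return 2
    if n_sites >= 5:
        return 3
    return 4


def is_convincing(n_sites, n_matching_solutions):
    """True if enough independent trials agreed on the same solution."""
    return n_matching_solutions >= required_matches(n_sites)


def escalating_search(run_trials, resolutions, seed_counts, n_sites):
    """Try each resolution cutoff; for each, start with few seeds, then more.

    run_trials(d_min, n_seeds) is a caller-supplied (hypothetical) function
    that returns how many trials converged on the same solution.
    Returns (d_min, n_seeds, n_matching) for the first convincing result,
    or None if nothing convincing was found.
    """
    for d_min in resolutions:
        for n_seeds in seed_counts:
            n_matching = run_trials(d_min, n_seeds)
            if is_convincing(n_sites, n_matching):
                return d_min, n_seeds, n_matching
    return None
```

In this sketch the search stops at the first convincing result, which mirrors the idea of automatic termination; a caller would pass in whatever function actually performs the trials.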
The HySS GUI is listed in the "Experimental phasing" category of the main PHENIX GUI. Most options are shown in the main window, but only the fields highlighted below are mandatory. The data labels will be selected automatically if the reflection file contains anomalous arrays, and any symmetry information present in the file will be loaded into the unit cell and space group fields.
It may be helpful to run Xtriage first to determine an appropriate high resolution cutoff, as most datasets do not have significant anomalous signal in the highest resolution shells. The wavelength is only required if Phaser is being used for rescoring. Additional options are described below in the command-line documentation.
At the end of the run, a tab will be added showing output files and basic statistics. A correlation coefficient of XXX usually indicates that the sites are real. If you are happy with the sites, you can load them into AutoSol or Phaser directly from this window.
A full list of sites is displayed in the "Edit sites" tab. For a typical high-quality selenomethionine dataset, such as the p9-sad tutorial data used here, valid sites should have an occupancy close to 1, but for certain types of heavy-atom soaks (such as bromine) all sites may have partial occupancy. You can edit the sites by changing the occupancy or unchecking any that you wish to discard, then clicking the "Save selected" button.
Enter phenix.hyss without arguments to obtain a list of the available command line options:
usage: phenix.hyss [options] reflection_file n_sites element_symbol
Example: phenix.hyss w1.sca 66 Se
The site_min_distance, site_min_distance_sym_equiv, and site_min_cross_distance options are available to override the default minimum distance of 3.5 Angstroms between substructure sites.
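To illustrate what a minimum-distance constraint between sites means, here is a minimal Python sketch. It assumes sites are given as Cartesian coordinates in Angstroms and deliberately ignores symmetry equivalents and unit cell translations, which a real crystallographic check would have to include. The function name `filter_by_min_distance` is hypothetical and not part of HySS.

```python
import math


def filter_by_min_distance(sites, min_distance=3.5):
    """Keep each site only if it is at least min_distance (Angstroms)
    away from every site accepted before it.

    Simplification: plain Cartesian distances, no symmetry equivalents.
    """
    accepted = []
    for site in sites:
        if all(math.dist(site, other) >= min_distance for other in accepted):
            accepted.append(site)
    return accepted


# Toy input: the second site is about 1.7 Angstroms from the first.
sites = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 0.0, 0.0)]
kept = filter_by_min_distance(sites)
```

With the default of 3.5 Angstroms the second toy site is rejected because it lies too close to the first; lowering `min_distance` would let it through.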
The real_space_squaring option can be useful for large structures with high-resolution data. In this case the large number of triplets generated for the reciprocal-space direct methods procedure (i.e. the tangent formula) may lead to excessive memory allocation. By default HySS switches to real-space direct methods (i.e. E-map squaring) if it searches for more than 100 sites. If this limit is too high given the available memory, use the real_space_squaring option. In our experience, it is not critical to employ reciprocal-space direct methods for substructures with a large number of sites.
If the molecular_weight and solvent_content options are used, HySS will help in determining the number of substructure sites in the unit cell, interpreting the number of sites specified on the command line as the number of sites per molecule. For example:
phenix.hyss gere_MAD.mtz 2 se molecular_weight=8000 solvent_content=0.70
This tells HySS that the molecule has a molecular weight of 8 kDa, that the crystal has an estimated solvent content of 70%, and that we expect to find 2 Se sites per molecule. The HySS output will now show the following:
#---------------------------------------------------------------------------#
| Formula for calculating the number of molecules given a molecular weight. |
|---------------------------------------------------------------------------|
| n_mol = ((1.0-solvent_content)*v_cell)/(molecular_weight*n_sym*.783)      |
#---------------------------------------------------------------------------#
Number of molecules: 6
Number of sites: 12
Values used in calculation:
Solvent content: 0.70
Unit cell volume: 476839
Molecular weight: 8000.00
Number of symmetry operators: 4
HySS will go on searching for 12 sites.
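The arithmetic behind this output can be checked with a short Python sketch that applies the formula printed by HySS to the values listed above. The function name is hypothetical, and rounding to the nearest integer is an assumption made here to reproduce the reported count of 6; the source does not state how HySS rounds.

```python
def estimate_n_molecules(solvent_content, v_cell, molecular_weight, n_sym):
    """Formula as printed in the HySS output:
    n_mol = ((1.0-solvent_content)*v_cell)/(molecular_weight*n_sym*.783)
    """
    return ((1.0 - solvent_content) * v_cell) / (molecular_weight * n_sym * 0.783)


# Values taken from the example output above.
n_mol = estimate_n_molecules(
    solvent_content=0.70, v_cell=476839, molecular_weight=8000.0, n_sym=4
)

# Assumption: round to the nearest whole molecule, with a minimum of one.
n_mol_rounded = max(1, round(n_mol))

sites_per_molecule = 2  # the n_sites argument given on the command line
n_sites_total = n_mol_rounded * sites_per_molecule

print(f"n_mol = {n_mol:.2f} -> {n_mol_rounded} molecules, {n_sites_total} sites")
```

The unrounded value is about 5.71, which rounds to 6 molecules and therefore 12 sites, in agreement with the output shown.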
If the HySS consensus model does not lead to an interpretable electron density map, please try the search=full option:
phenix.hyss your_file.sca 100 se search=full
This disables the automatic termination detection, and the run will in general take considerably longer. If the full search leads to a better consensus model, please let us know, because we will want to improve the automatic termination detection.
Another possibility is to override the automatic determination of the high-resolution limit with the resolution option. In some cases the resolution limit is very critical. Truncating the high-resolution limit of the data can sometimes lead to a successful search, as more reflections with a weak anomalous signal are excluded.
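As an illustration of what truncating the high-resolution limit does to the data, the following Python sketch drops reflections whose d-spacing is smaller than a chosen cutoff. To keep the example short it assumes an orthorhombic cell (all angles 90 degrees), where 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2; the function names and the toy cell and indices are hypothetical and not taken from HySS.

```python
import math


def d_spacing_orthorhombic(hkl, cell):
    """d-spacing in Angstroms for Miller indices hkl in an orthorhombic cell.

    cell is (a, b, c) in Angstroms. The (0, 0, 0) index is not a valid
    reflection and must not be passed in.
    """
    h, k, l = hkl
    a, b, c = cell
    inv_d_sq = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_sq)


def apply_high_resolution_cutoff(reflections, cell, d_min):
    """Keep only reflections with d-spacing >= d_min (i.e. lower resolution)."""
    return [hkl for hkl in reflections
            if d_spacing_orthorhombic(hkl, cell) >= d_min]


# Toy example: a 50 x 60 x 70 Angstrom cell and a 3.0 Angstrom cutoff.
cell = (50.0, 60.0, 70.0)
reflections = [(1, 0, 0), (10, 0, 0), (20, 0, 0), (0, 30, 0)]
kept = apply_high_resolution_cutoff(reflections, cell, d_min=3.0)
```

Here the reflections at 2.5 and 2.0 Angstroms are excluded, leaving only the two lower-resolution ones. In an actual HySS run the resolution option does this selection for you.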
Enabling a Phaser-based rescoring protocol can also help (rescore=phaser-complete is recommended). It is less affected by suboptimal resolution cutoffs and also provides more discrimination with noisy data. Switching on the phaser-map extrapolation protocol is also worthwhile, since it increases the success rate and adds only a small runtime overhead compared to Phaser-based rescoring.
If there is no consensus model at the end of a HySS run, please try alternative programs. For example, run SHELXD with the .ins and .hkl files that are automatically generated by HySS:
Writing anomalous differences as SHELX HKLF file: mbp_anom_diffs.hkl
Writing SHELXD ins file: mbp_anom_diffs.ins
If HySS does not produce a consensus model even though it is possible to solve the substructure with other programs, we would like to investigate. Please send email to bugs@phenix-online.org.
EMMA stands for Euclidean Model Matching, which superimposes two sets of coordinates as well as possible given symmetry and origin choices. See the phenix.emma documentation for more details.
Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003).
Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni, and R.J. Read. Acta Cryst. D60, 1220-1228 (2004).