Frequently asked questions about experimental phasing

Contents

General
Experimental phasing (autosol)
Phaser

General

What can I do if autosol says "this version does not seem big enough"?

Autosol tries to automatically determine the size of solve or resolve, but if your data is very high resolution or a very large unit cell, you can get the message:

***************************************************
Sorry, this version does not seem big enough...
(Current value of isizeit is  30)
Unfortunately your computer will only accept a size of  30
with your current settings.
You might try cutting back the resolution
You might try "coarse_grid" to reduce memory
You might try "unlimit" allow full use of memory
***************************************************

You cannot get rid of this problem by specifying the resolution with resolution=4.0 because autosol/autobuild/ligandfit uses the resolution cutoff you specify in all calculations, but the high-res data is still carried along.

The easiest solution to this problem is to edit your data file to have lower- resolution data. You can do it like this:

phenix.reflection_file_converter huge.sca --sca=big.sca --resolution=4.0

or in the GUI, use the reflection file editor.

A second solution is to tell autosol to ignore the high-res data explicitly with one of these commands (on the command line or in the GUI):

resolve_command="'resolution 200 4.0'"
solve_command="'resolution 200 4.0'"
resolve_pattern_command="'resolution 200 4.0'"

Note the two sets of quotes; both are required for this command-line input. Just one set of quotes is required in the GUI. These commands are applied after all other inputs in resolve/solve/resolve_pattern and therefore all data outside these limits will be ignored.

Why am I not allowed to use a file with FAVG SIGFAVG DANO SIGDANO in autosol or autobuild?

The group of MTZ columns FAVG SIGFAVG DANO SIGDANO is a special one that should normally not be used in Phenix. The reason is that Phenix stores this data as F+ SIGF+ F- SIGF-, but in the conversion process between F+/F- and FAVG/DANO, information is lost. Therefore you should normally supply data files with F+ SIGF+ F- SIGF- (or intensities), or fully merged data (F,SIG) to Phenix routines. As a special case, if you have anomalous data saved as FAVG SIGFAVG DANO SIGDANO you can supply this to autosol, however this requires either that (1) you supply a refinement file with F SIG, or that (2) your data file has a separate F SIG pair of columns (other than the FAVG SIGFAVG columns that are part of the FAVG/DANO group).

How can I specify a mask for density modification in autosol/autobuild/?

In autobuild you can simply use the command:

mask_from_pdb = my_mask_file.pdb
rad_mask_from_pdb = 2

where my_mask_file.pdb has atoms in it marking the region to be masked. All points within rad_mask_from_pdb of an atom in my_mask_file.pdb will be considered inside the mask.

If you want to specify a mask in autosol, add this command:

resolve_command_list="'model ../../coords.pdb'  'use_model_mask'"

where there are " and ' quotes and coords.pdb is the model to use for a mask. Note the "../../" because coords.pdb is in your working directory but when resolve runs the run directory is 2 directories lower, so relative to that directory your coords.pdb is at "../../coords.pdb".

You will know it is working if your resolve_xx.log says:

Using model mask calculated from coordinates

Note: this command is most appropriate for use with the keyword maps_only=True because phenix.autobuild also uses model=... so that iterative model-building may not work entirely correctly in this case. Two parts that may not function correctly are "build_outside_model" (which will use your model as a mask and not the current one), and evaluate_model (which will evaluate your starting model, not the current model).

What do I do if autobuild says TRIED resolve_extra_huge ...but not OK?

In most cases when you get this error in phenix:

TRIED resolve_extra_huge ...but not OK

it actually means "your computer does not have enough memory to run resolve_extra_huge". If that is the case then you are kind of stuck unless you have another computer with even more memory+swap space, or you cut back on the resolution of the input data (Note that you have to actually lower the resolution in the input file, not just set "resolution=" because all the data is kept but not used if you just set the resolution).

You can also try the keyword:

resolve_command_list="'coarse_grid'"

(note 2 sets of quotes)

Sometimes the not OK message can happen if your system and PHENIX are not matching, so that resolve or solve cannot run at all. You can test for this by typing:

phenix.resolve

and if it loads up (just type QUIT or END or control-C to end it) then it runs, and if it doesn't, there is a system mismatch.

Experimental phasing (autosol)

How can I tell autosol which columns to use from my mtz file?

In the GUI this is handled automatically, and available columns will be loaded into drop-down menus. On the command line, autosol will normally try to guess the appropriate columns of data from an input data file. If there are several choices, then you can tell autosol which one to use with the command_line keywords labels, peak.labels, infl.labels etc. For example if you have two input datafiles w1 and w2 for a 2-wavelength MAD dataset and you want to select the w1(+) and w1(-) data from the first file and w2(+) and w2(-1) from the second, you could use following keywords (see "How do I know what my choices of labels are for my data file" to know what to put in these lines):

input_file_list=" w1.mtz w2.mtz"
group_labels_list=" 'w1(+) SIGw1(+) w1(-) SIGw1(-)' 'w2(+) SIGw2(+) w2(-) SIGw2(-)'"

Note that all the labels for one set of anomalous data from one file are grouped together in each set of quotes.

You could accomplish the same thing from a parameters file specifying something like:

wavelength{
  wavelength_name = peak
  data = w1.mtz
  labels = w1(+) SIGw1(+) w1(-) SIGw1(-)
}
wavelength{
  wavelength_name = infl
  data = w2.mtz
  labels = w2(+) SIGw2(+) w2(-) SIGw2(-)
}

How do I know what my choices of labels are for my data file?

You can find out what your choices of labels are by running the command:

phenix.autosol show_labels=w1.mtz

This will provide a listing of the labels in w1.mtz and suggestions for their use in autosol/autobuild/ligandfit. For example the labels for w1.mtz yields:

List of all anomalous datasets in  w1.mtz
'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all datasets in  w1.mtz
'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all individual labels in  w1.mtz
'w1(+)'
'SIGw1(+)'
'w1(-)'
'SIGw1(-)'

Suggested uses:
labels='w1(+) SIGw1(+) w1(-) SIGw1(-)'
input_labels='w1(+) SIGw1(+) None None None None None None None'
input_refinement_labels='w1(+) SIGw1(+) None'
input_map_labels='w1(+) None None'

Why do I get "None of the solve versions worked" in autosol?

If you get this or a similar message for resolve, first have a look at LAST.LOG if it exists in your AutoSol_run_xx_ or AutoBuild_run_xx_ directory. The end of that file may give you a hint as to what was wrong.

The next thing to try is running one of these commands (just kill them with control-C if they do run):

phenix.solve

or:

phenix.resolve

If these load up solve or resolve, then they basically work and the problem is probably in the size of your dataset, some formatting issue, or the like.

If they do not run, then the problem is in your system setup. If you are using redhat linux, try changing the option of selinux to selinux=disabled in your /etc/sysconfig/selinux file.

It is also possible that you do not have the application csh installed on your system. If you have Ubuntu linux, csh and tcsh are not included in a normal installation. It is easy to install csh and tcsh under linux and it just takes a minute. On Ubuntu or Debian, you can say:

apt-get install tcsh

or on Fedora or CentOS and similar distributions:

yum install tcsh

and that should do it.

How can I do a quick check for iso and ano differences in an MIR dataset?

You can say:

phenix.autosol native.data=native.sca deriv.data=deriv.sca

and wait a couple minutes until it has scaled the data (once it says "RUNNING HYSS" you are far enough) and then have a look at:

AutoSol_run_1_/TEMP0/dataset_1_scale.log

which will say near the end:

isomorphous differences derivs            1  - native

Differences by shell:

shell   dmin    nobs      Fbar      R     scale    SIGNAL  NOISE   S/N

1     5.600  1018     285.012     0.287   0.998 105.05  26.73   3.93
2     4.200  1386     324.927     0.216   1.000  84.78  26.76   3.17
3     3.920   542     330.807     0.214   1.002  85.00  28.36   3.00
4     3.710   523     286.487     0.237   1.002  81.31  27.29   2.98
5     3.500   662     282.383     0.235   1.001  75.58  37.12   2.04
6     3.360   518     255.782     0.241   1.003  72.69  27.18   2.67
7     3.220   630     237.778     0.253   1.000  68.87  29.94   2.30
8     3.080   727     208.271     0.255   1.000  61.39  29.19   2.10
9     2.940   897     190.044     0.254   0.999  42.78  42.99   1.00
10     2.800  1067     169.022     0.280   0.999  50.54  33.24   1.52

Total:          7970     256.096     0.245   1.000  75.29  31.41   2.48

Here R is <Fderiv-Fnative>/(2 <Fderiv+Fnative>), noise is <sigma>, signal is sqrt(<(Fderiv-Fnative)**2>-<sigma**2>), and S/N is the ratio of signal to noise.

I ran autosol to get a partial model that I now want to refine. Which data file should I give as input: the original .sca file from HKL2000, or the file overall_best_refine_data.mtz from autosol?

Always use the MTZ file output by autosol. This contains a new set of R-free flags that have been used to refine the model; starting over with the .sca file will result in a new set of flags being generated, which biases R-free.

Phaser

How can I calculate a log-likelihood gradient (LLG) map in Phaser?

See the maps section of the general FAQ list.