Using the PHENIX Wizards

Purpose

Any Wizard can be run from the PHENIX GUI, from the command line, or from a keyworded parameters file. All three versions are identical except in how they take commands and keywords from the user.

This page describes how to run a Wizard and what a Wizard does in general. The specific Wizard help pages describe the details of each PHENIX Wizard.

Overview of Structure Determination with the PHENIX Wizards

You can use the AutoSol Wizard to solve structures by SAD, MAD, SIR/SIRAS, and MIR/MIRAS. The AutoMR and AutoSol Wizards together can carry out MRSAD. The AutoSol Wizard can also combine SAD, MAD, SIR, and MIR datasets and solve the structure using all available data.

Once you have experimental or MR phases, you can carry out iterative model-building, density modification, and refinement with the AutoBuild Wizard to improve your model. Finally you can use the rebuild_in_place feature of the AutoBuild Wizard to make one very good final model.

If your structure contains ligands, you can place them using the LigandFit Wizard.

This help page describes how to run the Wizards from the GUI, the command line, or a parameters file. The individual Wizard documentation pages describe the strategies and commands for each Wizard.

Usage

Wizard data directories, sub-directories, Facts, and the PDS (Project Data Storage)

The directory that you are in when you start up PHENIX is your working directory.
Each run of a Wizard will have all output data in a subdirectory of your working directory named like this (for AutoSol run 3):

AutoSol_run_3_/

This subdirectory will have one or more temporary directories:

AutoSol_run_3_/TEMP0/

which contain intermediate files. These temporary directories will be deleted when the Wizard is finished (unless you set the parameter clean_up to False).

For OMIT and MULTIPLE-MODEL runs, the final OMIT maps and multiple models will be in a subdirectory of your run directory:

AutoSol_run_3_/OMIT/
AutoSol_run_3_/MULTIPLE_MODELS/

All the parameter values as well as any other information that a Wizard generates during its run is stored in the PDS (Project Data Storage) and/or the Wizard Facts. The Facts are values of parameters and pointers to files in the PDS. The Facts keep track of the current knowledge available to the Wizard. Each time a step is completed by a Wizard, the new Facts are saved (overwriting old ones for that run). As the Facts define the state of the Wizard, the Wizard can be restarted any time by loading the appropriate set of Facts.
The PDS (Project Data Storage) will be in your working directory:

./PDS/

The PDS contains the output of each of your runs for all Wizards and a record of all the Facts (parameters and data) for each run. If you delete a run using the PHENIX Wizard GUI or with a command like "phenix.autosol delete_runs=2", the corresponding entries in the PDS are also deleted. You can copy the PDS from one place to another. Note that if you delete directories such as "AutoSol_run_1_" by hand then the corresponding information remains in the PDS. For this reason it is best to use the GUI or specific commands to delete runs.
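
As a quick cross-check of what is on disk, the run directories of one Wizard can be listed from the shell. This is only a sketch that relies on the directory naming pattern described above (for example AutoSol_run_3_); the function name list_wizard_runs is made up for this illustration. It does not read the PDS, so the Wizard's own show_runs listing remains the authoritative record.

```shell
# Print the run directories of one Wizard found in the current directory.
# Relies only on the naming pattern described above (e.g. AutoSol_run_3_).
# list_wizard_runs is a hypothetical helper, not a PHENIX command.
list_wizard_runs() {
  wizard_name="$1"
  for run_dir in "${wizard_name}"_run_*_; do
    # Skip the unexpanded pattern when nothing matches
    if [ -d "$run_dir" ]; then
      printf '%s\n' "$run_dir"
    fi
  done
}

list_wizard_runs AutoSol
```

A directory that appears here but not in the show_runs listing (or the reverse) suggests that something was deleted or copied by hand.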

Running a Wizard using a multiprocessor machine or on a cluster

You can take advantage of a multiprocessor machine or a cluster when running the Wizards (currently this applies to the LigandFit and AutoBuild Wizards). For example, adding the keyword

nproc=4

to a command-line command for a Wizard will use 4 processors to run the wizard (if possible). Normally you will run the parallel processes in the background with the default of

background=True

If you have a cluster with a batch queue, you can send subprocesses to the batch queue with

run_command=qsub

(or whatever your batch command is). In this case you will use

background=False

so that the batch queue can keep track of your jobs.
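
The two cases above (local background processes versus a batch queue) can be captured in a small shell helper. This is a sketch only: the function name wizard_parallel_args is made up for this illustration, and it simply prints the keywords described above so they can be appended to a Wizard command.

```shell
# Assemble the parallel-processing keywords described above.
# wizard_parallel_args is a hypothetical helper, not a PHENIX command.
# $1 = number of processors
# $2 = optional batch submission command (for example qsub)
wizard_parallel_args() {
  nproc="$1"
  batch_cmd="${2:-}"
  if [ -n "$batch_cmd" ]; then
    # Batch queue: let the queue keep track of the jobs
    printf 'nproc=%s run_command=%s background=False\n' "$nproc" "$batch_cmd"
  else
    # Local machine: run the parallel processes in the background
    printf 'nproc=%s background=True\n' "$nproc"
  fi
}

wizard_parallel_args 4
wizard_parallel_args 4 qsub
```

The printed keywords can then be added to the end of your usual phenix.autobuild or phenix.ligandfit command line.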

The Wizards divide the data into nbatch batches during processing. The value of

nbatch=3

is set from 3 to 5 by default (depending on the Wizard) and is appropriate if you have up to nbatch processors. If you have more, then you may wish to increase nbatch to match the number of processors. The reason it is done this way is that the value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.

Running a Wizard from a GUI

Basic operation of a Wizard from the GUI

Start up the PHENIX GUI in your working directory by typing "phenix".
Answer "yes" to the question "Do you want to make it a project directory?".
Launch a Wizard from the PHENIX GUI by double-clicking on the name of the Wizard ("AutoSol") under "Wizards" in the Strategy Interface of the main GUI.
The Wizard will come up in a blue window and will open a grey Parameters window asking you for information on what files to use and what to do.
Enter the file names and make choices as necessary (NOTE: to select a file click on the yellow box to the right of the file entry field. To add a new file entry field click on the "Parameter group options" tab if present).
Proceed to the next window by clicking "Continue" in the upper left corner of the grey Parameters window.
The Wizard will guide you through the necessary inputs, then it will continue on its own until it is finished.
When the Wizard is done, you can double-click on the Display icon (the little magnifying glass on the upper left of the blue Wizard window) to show a list of files and maps that can be displayed. (NOTE: The Display Options window is updated when you open it. Once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows and this will prevent you from opening it again.)
You can open the Parameters window any time the Wizard is stopped by clicking on the Parameters icon (4 little lines in the upper left corner of the blue Wizard window). This allows you to carry out some of the more advanced options below.
Your output log file will be in a file called "AutoSol.1.output" for an AutoSol run. You can also see the same file by clicking on the "LOG" button at the lower right of the blue or green window.

Keeping track of multiple runs of a Wizard from the GUI

You can run more than one Wizard job at a time if you want. Each run of a Wizard is put in a separate sub-directory (e.g., "AutoSol_run_1_").
When you start a Wizard, it will start a new run of that Wizard.
If you want to continue on with the highest-numbered run of a Wizard, you can start the Wizard with the continue button for that Wizard (for example the continue_AutoSol button).
If you want to go back to a previous run, you can use the Run Control and Run Number selections near the bottom of any Parameters window (NOTE: to open the parameters window click on the lines at the upper left of the blue Wizard window). Select goto_run and choose a run number to go to.
If you want to copy a previous run and go on, use the Run Control and Run Number selections and select copy_run and choose a run number to copy. The Wizard will create a new run (with number equal to the highest previous number plus one) and carry on with it.
To see what runs are available, select View or Delete Runs in the Navigate tab at the lower left of any Parameters window.
If you want to stop the Wizard, hit the PAUSE button on the green Wizard window (the Wizard is green when running, blue or purple when stopped). NOTE: this may take a little time, particularly if Phaser, HYSS, or phenix.refine is running. In those cases, if you really want to stop the Wizard right away, go to "Strategy" and then select "Stop Strategy" and it will be stopped.

Setting parameters of a Wizard from the GUI

You can set any parameter in a Wizard by selecting the variable in the Choose Variable to Set tab. The next time you click Continue, the Wizard will save all the current inputs as usual, and then instead of going on to the next step, it will open a window asking you for the new value of that variable. When you enter it and press Continue, the Wizard will continue on with what it was doing, but with this new value.
NOTE that some parameters (e.g., resolution) may affect many steps. If a prior step is affected by a parameter that is changed, the Wizard does not go back and change it. If you want the parameter change to affect something that has already been done, you need to re-run the corresponding step.
NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword when you are running a Wizard using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords. These can be set in the GUI from the Choose Variable pull-down menu. You just type in the command to the entry form like this: (for resolve_command):

res_start 4.0

telling resolve in this case to start out density modification at a resolution of 4 A. This allows you to control what solve, resolve and resolve_pattern do more finely than you otherwise can in the Wizards.

Navigating steps in a Wizard from the GUI

When the Wizard is done or Paused, you can select any available step in the Navigate tab at the middle bottom of any Parameter window. This tells the Wizard to get any necessary inputs for that step and to then carry it out.
The Wizards normally start out in Manual mode (one step at a time, asking user for inputs). Once the necessary inputs are entered, the Wizard enters Automatic mode (no more asking for inputs until something required is missing). You can control this by specifying Manual or Automatic in the Auto/Manual tab at the bottom right of any Wizard.

Running a Wizard from the command-line

Basic operation of a Wizard from the command-line

You can run a wizard from the command line like this (autosol is the AutoSol wizard):

phenix.autosol data=w1.sca seq_file=seq.dat 2 Se

The command-line interpreter will try to interpret obvious information (2 means sites=2, Se means atom_type=Se) and will run the wizard.
To see all the information about this wizard and the keywords that you can set for this wizard, type:

phenix.autosol --help all

Any wizard keyword can be entered at the command line (not just the ones labelled "command-line only"). The documentation for each wizard lists all the keywords that apply to that wizard.
If you want to stop a Wizard, you can create a file "STOPWIZARD" and put it in the subdirectory (e.g., AutoSol_run_2_/) where the Wizard is running. This is like hitting the PAUSE button on the GUI and stops the wizard cleanly.
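
Creating the stop file can be done from the shell along these lines. This is a minimal sketch: the function name request_wizard_stop is made up for this illustration, and it assumes that an empty file named STOPWIZARD is sufficient, since only the file name is specified above.

```shell
# Ask a running Wizard to stop cleanly by creating a STOPWIZARD file
# in its run directory. request_wizard_stop is a hypothetical helper.
# Assumption: an empty file is enough; only the file name is documented.
request_wizard_stop() {
  run_dir="$1"
  if [ ! -d "$run_dir" ]; then
    echo "No such run directory: $run_dir" >&2
    return 1
  fi
  touch "$run_dir/STOPWIZARD"
}

# Example call (the run directory name is a placeholder):
# request_wizard_stop AutoSol_run_2_
```

Checking that the directory exists first avoids leaving a stray STOPWIZARD file in the wrong place when the run number is mistyped.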

Keeping track of multiple runs of a Wizard from the command-line

When you start a Wizard from the command line, the default is to start a new run of that Wizard.
To see all the available runs of this Wizard, type:

phenix.autosol show_runs

To delete runs 1,2 and 4-7 of this Wizard, type something like this:

phenix.autosol delete_runs="1 2 4-7"

Note that the group of numbers is enclosed in quotes ("). This tells the input parser (iotbx.phil) that all these numbers go with the one keyword of delete_runs. Note also that there are no spaces around the "=" sign!
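
The quotes also matter before the input parser ever sees the text, because the shell itself splits unquoted words into separate arguments. The sketch below shows this with a throwaway function (count_args is a name made up for this illustration) that reports how many arguments it receives; no PHENIX command is run.

```shell
# Report how many separate arguments the shell passes on.
# count_args is a hypothetical helper used only for this demonstration.
count_args() {
  echo "$#"
}

count_args delete_runs="1 2 4-7"   # quoted: the whole list stays in one argument
count_args delete_runs=1 2 4-7     # unquoted: the list is split across three arguments
```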

To go back to run 2 and carry on (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like:

phenix.autosol run=2 resolution=3.0

To carry on with the current highest-numbered run (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like:

phenix.autosol carry_on resolution=3.0

To copy run 2 to a new run and carry on from there (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like:

phenix.autosol copy_run=2 resolution=3.0

Setting parameters of a Wizard from the command-line

When you run a Wizard from the command-line, two files are produced and put in the subdirectory of the Wizard (e.g., AutoBuild_run_3_/).

A parameters (".eff") file will be produced that you can edit to rerun the Wizard:

phenix.autosol autosol.eff

This autosol.eff file (for AutoSol) contains the values of all the AutoSol parameters at the time of starting the Wizard.

Note that the syntax in the autosol.eff file is very slightly different from the syntax on the command line. On the command line, if a value has several parts, you enclose them in quotes and there are no spaces around the "=" sign:

phenix.autosol ... input_phase_labels="FP PHIM FOMM"

In the .eff file, you MUST leave off the quotes or the three values will be treated as one, and you should leave blanks around the "=" sign:

input_phase_labels = FP PHIM FOMM

The reason these are different is that in the .eff file, the structure of the file and the brackets tell the PHIL parser what is grouped together, while from the command line, the quotes tell the parser what is to be grouped together.

A parameters file (".eff") is produced that you can edit and use like this:

phenix.autosol parameters.eff

To get keyword help on a specific keyword you can type:

phenix.autosol --help data  # get help on the keyword data for autosol

To show current Facts (values of all parameters) for the highest-numbered run:

phenix.autosol show_facts

To show current Facts (values of all parameters) for run 3:

phenix.autosol run=3 show_facts

To show current summary:

phenix.autosol show_summary

When you use a keyword like data= you need to give enough information to specify this keyword uniquely. You can see all the keywords for each PHENIX Wizard or tool at the end of the documentation for that Wizard or tool. This will have entries like this (for AutoSol):

autosol
       sites= None Number of heavy-atom sites. (Command-line only)

which describes the keyword sites in the scope defined by autosol. You can explicitly specify this on the command line with:

autosol.sites=3

which in this case is entirely the same as

sites=3

NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword in PHENIX using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords from the command line. The format is a little tricky: you have to put two sets of quotes around the command like this:

resolve_command="'ligand_start start.pdb'"    # NOTE ' and " quotes

This will put the text

ligand_start start.pdb

at the end of every temporary command file created to run resolve.
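
To see why two sets of quotes are used, it can help to look at what the shell actually hands to the program. In the sketch below, show_arg is a throwaway function made up for this illustration; it prints its first argument unchanged. The shell removes the outer double quotes, while the inner single quotes survive and are still present in the text the Wizard's parser receives.

```shell
# Print one argument exactly as a program would receive it.
# show_arg is a hypothetical helper used only for this demonstration.
show_arg() {
  printf '%s\n' "$1"
}

# The outer " quotes are consumed by the shell; the inner ' quotes are kept.
show_arg resolve_command="'ligand_start start.pdb'"
```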

Running a Wizard from a parameters file

Parameters files are an easy way to specify any parameters that you want to use when running a Wizard. They are structured in a clear way and you can edit them to set the values that you want.

You can get a parameters file to edit with any wizard by running the wizard with the flag "--show_defaults":

phenix.autosol --show_defaults

Here is a parameters file "sad.eff" to run a SAD dataset with "phenix.autosol sad.eff":

autosol {
  atom_type = Se
  sites = 2
  seq_file = sequence.dat
  crystal_info {
    space_group = C2
    unit_cell = 76.08 27.97 42.36 90 103.2 90
    resolution = 2.6
  }
  wavelength {
    data = high.sca
    lambda = 0.9600
    f_prime = -1.5
    f_double_prime = 3
  }
}

Note the scope names ("autosol" or "crystal_info") followed by paired brackets ({ ....}) which enclose sets of parameters that are related.

The values of parameters are usually entered on one line, without quotation marks (as in this example) unless they are to be all considered as a single item.
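
If you process several similar datasets, a parameters file with this structure can be written from a shell script. The following is a sketch under stated assumptions: the function name write_sad_eff and its argument order are made up for this illustration, and the file it writes contains only a subset of the keywords from the example above, which may or may not be enough for your data.

```shell
# Write a small AutoSol parameters file using the scope layout shown above.
# write_sad_eff is a hypothetical helper, not a PHENIX command.
# Arguments: output file, atom type, number of sites, data file, sequence file
write_sad_eff() {
  out_file="$1"
  atom_type="$2"
  sites="$3"
  data_file="$4"
  seq_file="$5"
  # Values are written without quotes, as required in a parameters file
  cat > "$out_file" <<EOF
autosol {
  atom_type = $atom_type
  sites = $sites
  seq_file = $seq_file
  wavelength {
    data = $data_file
  }
}
EOF
}

# Example call (all file names are placeholders):
# write_sad_eff sad.eff Se 2 high.sca sequence.dat
```

The resulting file can then be edited by hand to add further scopes such as crystal_info before it is passed to phenix.autosol.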

You can specify almost anything either in a parameters file or on the command line. In the above example you could also just say:

phenix.autosol atom_type=Se sites=2 seq_file=sequence.dat \
    space_group=C2 \
    unit_cell="76.08 27.97 42.36 90 103.2 90" \
    resolution=2.6 \
    data=high.sca \
    lambda=0.9600 \
    f_prime=-1.5 \
    f_double_prime=3

(note that the cell parameters are in quotes on the command line and not in the parameters file) and you would get the same results. For simple cases the command-line format is fine, but for anything with a lot of parameters to set it is much easier to just edit a parameters file.

Specific limitations and problems:

In the GUI version of the Wizards, the Display Options window is updated only when you open it. Further, once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows, and this will prevent you from opening it again until you close the open window.
The Wizards use file names based on the names of your input files, but they do not differentiate between files with the same name coming from different directories. Consequently you should not use two files with different contents but with the same file name as inputs to a Wizard, even if they come from separate starting directories.
If you stop a Wizard and continue on with a command such as phenix.autobuild run=2 then you can change most parameters with keywords just as if you were starting from scratch, but if you had previously changed a keyword away from the default, you cannot set it back to the default in this way (the Wizard ignores keywords that are the same as the default).
You should not work on the same run in two ways at the same time. This can lead to unpredictable results because the two runs will really be the same run and the data and databases for the two runs will be overwriting each other. This means you need to be careful that if you goto_run 1 of a Wizard in one window that you do not also goto_run 1 of the same Wizard in another window. On the other hand, it is perfectly fine to work on run 1 of a Wizard in one window and run 2 of the same Wizard in another window.
The PHENIX Wizards can take most settings of most space groups. However, they can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and they cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting, because of difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
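
The file-name limitation above can be checked before starting a run. The sketch below (duplicate_basenames is a name made up for this illustration) prints any file name that occurs more than once among the given paths, ignoring the directory part. It cannot tell whether the contents differ, so a reported name only signals that the inputs deserve a closer look.

```shell
# Print file names that appear more than once among the given paths,
# ignoring directories. duplicate_basenames is a hypothetical helper.
duplicate_basenames() {
  for input_file in "$@"; do
    basename "$input_file"
  done | sort | uniq -d
}

# Example call (all paths are placeholders):
duplicate_basenames peak/data.sca remote/data.sca sequence.dat
```

If a name is reported, renaming one of the files before giving it to the Wizard avoids the clash.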