Automated ligand fitting with LigandFit
- LigandFit Wizard: Tom Terwilliger
- PHENIX GUI and PDS Server: Nigel W. Moriarty
- RESOLVE: Tom Terwilliger
The LigandFit Wizard carries out fitting of flexible ligands to electron
density maps.
The LigandFit Wizard can be run from the PHENIX GUI, from the
command-line, and from parameters files. All three versions are
identical except in the way that they take commands from the user. See
Using the PHENIX Wizards for details of how to
run a Wizard. The command-line version will be described here.
The LigandFit wizard provides a command-line and graphical user
interface allowing the user to identify a datafile containing
crystallographic structure factor information, an optional PDB file with
a partial model of the structure without the ligand, and a PDB file
containing the ligand to be fit (in an allowed but arbitrary
conformation).
The wizard checks the data files for consistency and then calls RESOLVE
to carry out the fitting of the ligand into the electron-density map.
The best map to use is usually a 2Fo-Fc map from phenix.refine.
You can also have LigandFit calculate a difference map
(with F=FP-FC), an Fobs map (calculated from FP with phases PHIC from
the input partial model), or an arbitrary map calculated with FP, PHI
and an optional FOM. If you supply an input partial model, the region
occupied by the partial model is flattened in the map used to fit the
ligand, so that the ligand will normally not be placed in this region.
The ligand fitting is done by RESOLVE in a three-stage process. First,
the largest contiguous region of density in the map not already occupied
by the model is identified. The ligand will be placed in this density.
(If desired, the location of the ligand can instead be defined by the
user as near a certain residue or near specified coordinates.) Next,
many possible placements of the largest rigid sub-fragments of the
ligand are found within this region of high density. Third, each of
these placements is taken as a starting point for fitting the remainder
of the ligand. All these ligand fits are scored based on the fit to the
density, and the best-fitting placement is written out.
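The final scoring step can be illustrated with a small sketch. This is not RESOLVE's implementation; it only assumes that each candidate placement can be turned into model density values at the same grid points as the map, and it uses a plain Pearson correlation coefficient as the score. The function names and the sample numbers are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat map or flat model carries no information
    return cov / math.sqrt(var_x * var_y)

def best_placement(map_density, placements):
    """Score every placement against the map and return (name, cc) of the best.

    map_density: density values at a set of grid points (assumed input).
    placements:  dict of placement name -> model density at the same points.
    """
    scored = {name: correlation(map_density, model)
              for name, model in placements.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical numbers: placement "a" follows the map, "b" is anti-correlated.
observed = [0.1, 0.9, 1.0, 0.2]
candidates = {"a": [0.0, 1.0, 1.0, 0.0], "b": [1.0, 0.0, 0.0, 1.0]}
name, cc = best_placement(observed, candidates)
```

In this toy case placement "a" is selected, because its model density rises and falls with the map.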
The output of the wizard consists of a fitted ligand in PDB format and a
summary of the quality of the fit. Multiple copies of a ligand can be
fit to a single map in an automated fashion using the LigandFit wizard
as well.
Running the LigandFit Wizard is easy. For example, from the command-line
you can type:
phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb
The LigandFit Wizard will fit the ligand in
ligand.pdb based on the structure factor amplitudes in datafile.mtz,
calculating phases based on partial_model.pdb. All rotatable bonds will
be identified and allowed to take stereochemically reasonable
orientations.
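If you drive LigandFit from a script, the command line above can be assembled as an argument list. The following is a minimal sketch; the helper name ligandfit_command is hypothetical, and it assumes that phenix.ligandfit is on your PATH when the command is eventually run. Only keywords described on this page are used.

```python
import shlex

def ligandfit_command(data, ligand, model=None, **keywords):
    """Build an argument list for phenix.ligandfit (hypothetical helper).

    data, ligand and model map to the data=, ligand= and model= keywords;
    any further keyword=value pairs are appended unchanged.
    """
    args = ["phenix.ligandfit", f"data={data}"]
    if model is not None:
        args.append(f"model={model}")
    args.append(f"ligand={ligand}")
    for key, value in keywords.items():
        args.append(f"{key}={value}")
    return args

cmd = ligandfit_command("datafile.mtz", "ligand.pdb", model="partial_model.pdb")
print(shlex.join(cmd))
# To actually run it (requires a PHENIX installation):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with values that contain spaces, such as column labels.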
The ligandfit wizard needs:
- a datafile (w1.sca or data=w1.sca); this can be any format
- a PDB file with your model without ligand (model=partial.pdb; optional if your datafile contains map coefficients)
- a file with information about your ligand (ligand=side.pdb), or a 3-letter code for your ligand (ligand=ATP)
The ligand file can be a PDB file with 1 stereochemically acceptable
conformation of your ligand. It can alternatively be a file containing a
SMILES string, in which case the starting ligand conformation will be
generated with the PHENIX elbow routine. It can also be a 3-letter code
that specifies a ligand in the Chemical Components Dictionary of the
PDB, in which case the ligand is taken from that dictionary with
idealized geometry.
The command-line ligandfit interpreter will guess which file is your
data file, but you have to tell it which file is the model and which is
the ligand.
When you run LigandFit the output files will be in a subdirectory with
your run number:
LigandFit_run_1_/ # subdirectory with results
- A summary file listing the results of the run and the other files
produced:
LigandFit_summary.dat # overall summary
- A file that lists all parameters and knowledge accumulated by the
Wizard during the run (some parts are binary and are not printed)
LigandFit_Facts.dat # all Facts about the run
- A warnings file listing any warnings about the run
LigandFit_warnings.dat # any warnings
- A PDB file with the fitted ligand (in this case the first copy of
ligand number 1):
ligand_fit_1_1.pdb
- A log file with the fitting of the ligand:
ligand_1_1.log
- A log file with the fit of the ligand to the map:
ligand_cc_1_1.log
- Map coefficients for the map used for fitting:
resolve_map.mtz
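A script that post-processes results needs to find the run directory and the fitted ligands. The sketch below relies only on the naming shown above (LigandFit_run_N_ directories and ligand_fit_*_*.pdb files); the function names are hypothetical, and it assumes that the highest run number is the most recent run.

```python
import re
from pathlib import Path

RUN_DIR_PATTERN = re.compile(r"^LigandFit_run_(\d+)_$")

def latest_run_dir(base="."):
    """Return the highest-numbered LigandFit_run_N_ directory under base, or None."""
    runs = []
    for path in Path(base).iterdir():
        match = RUN_DIR_PATTERN.match(path.name)
        if match and path.is_dir():
            runs.append((int(match.group(1)), path))
    if not runs:
        return None
    # Compare run numbers numerically so that run 10 sorts after run 2.
    return max(runs, key=lambda item: item[0])[1]

def fitted_ligand_files(run_dir):
    """Return the fitted-ligand PDB files in run_dir, sorted by name."""
    return sorted(Path(run_dir).glob("ligand_fit_*_*.pdb"))
```

For example, fitted_ligand_files(latest_run_dir(".")) lists the fitted ligands of the latest run in the current directory, provided at least one run directory exists.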
You can run phenix.ligandfit from a parameters file. This is often
convenient because you can generate a default one with:
phenix.ligandfit --show_defaults > my_ligandfit.eff
and then you can just edit this file to match your needs and run it
with:
phenix.ligandfit my_ligandfit.eff
- Standard run of ligandfit (generate map from model and data file)
phenix.ligandfit w1.sca model=partial.pdb ligand=ATP \
lig_map_type=fo-fc_difference_map
- Build into a map from pre-determined coefficients
phenix.ligandfit data=perfect.mtz \
lig_map_type=pre_calculated_map_coeffs \
model=partial.pdb ligand=NAD
- Quick run of ligandfit
phenix.ligandfit w1.sca model=partial.pdb ligand=ATP quick=True
- Run ligandfit using pre-calculated map coefficients from
phenix.refine
If you refine a model with a command such as,
phenix.refine data.mtz partial.pdb
then you will end up with the refined model,
partial_refine_001.pdb
and a map coefficients file:
partial_refine_001_map_coeffs.mtz
You can then run ligandfit using the 2Fo-Fc map calculated from these
map coefficients:
phenix.ligandfit data=partial_refine_001_map_coeffs.mtz \
model=partial_refine_001.pdb ligand=NAD quick=True
or if you want to specify the coefficients explicitly you can add the
column labels:
phenix.ligandfit data=partial_refine_001_map_coeffs.mtz \
model=partial_refine_001.pdb ligand=GOL quick=True \
input_labels="2FOFCWT PH2FOFCWT"
For a difference map from the same file you can say:
phenix.ligandfit data=partial_refine_001_map_coeffs.mtz \
model=partial_refine_001.pdb ligand=AMP quick=True \
input_labels="FOFCWT PHFOFCWT"
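When the command is built in a script, the two label pairs used above can be kept in a small lookup table. This is a sketch only: the dictionary keys "2fofc" and "fofc" and the function name are hypothetical, while the column labels are the ones shown in the examples above.

```python
import shlex

# Column-label pairs taken from the phenix.refine examples above.
REFINE_MAP_LABELS = {
    "2fofc": "2FOFCWT PH2FOFCWT",  # 2Fo-Fc map
    "fofc": "FOFCWT PHFOFCWT",     # Fo-Fc difference map
}

def input_labels_argument(map_kind):
    """Return the input_labels=... argument for one of the two map kinds."""
    return f"input_labels={REFINE_MAP_LABELS[map_kind]}"

arg = input_labels_argument("fofc")
# As one element of an argument list no extra quoting is needed;
# shlex.quote shows the form to use when typing it in a shell.
print(shlex.quote(arg))
```

The value contains a space, so it must stay a single argument, which is why the shell examples above put the labels in quotes.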
- Run ligandfit on a series of ligands specified in ligand_list.dat
phenix.ligandfit w1.sca model=partial.pdb \
ligand=ligand_list.dat file_or_file_list=file_with_list_of_files
Note that you have to specify
file_or_file_list=file_with_list_of_files
or else the Wizard will try to interpret the contents of
ligand_list.dat as a SMILES string. Here the
"file_with_list_of_files" is a flag, not something you substitute
with an actual file name. You use it just as listed above.
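A list file such as ligand_list.dat can be written from a script. The sketch below assumes a layout of one ligand file name per line, which is not specified on this page, so check it against your PHENIX version. The function name and the ligand file names are hypothetical.

```python
from pathlib import Path

def write_ligand_list(ligand_files, list_path="ligand_list.dat"):
    """Write one ligand file name per line (assumed layout) and return
    the two keywords needed to make LigandFit read the list."""
    text = "".join(f"{name}\n" for name in ligand_files)
    Path(list_path).write_text(text)
    return [
        f"ligand={list_path}",
        # Literal flag value, not a placeholder for a file name.
        "file_or_file_list=file_with_list_of_files",
    ]
```

For example, write_ligand_list(["atp.pdb", "nad.pdb"]) creates ligand_list.dat in the current directory and returns the two keyword arguments to append to the phenix.ligandfit command.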
- Place ligand near residue 92 of chain "A" from partial.pdb
phenix.ligandfit w1.sca model=partial.pdb ligand=ADP \
ligand_near_chain="A" ligand_near_res=92
- Use start.pdb as a template for some of the atoms in the ligand;
build the remainder of the ligand, fixing the coordinates of the
corresponding atoms:
phenix.ligandfit w1.sca model=partial.pdb ligand=GTP \
ligand_start=start.pdb
NOTE: the file start.pdb must contain an entire rigid group of atoms, so
that ligandfit can identify the position and orientation of at least one
rigid part of the ligand.
- Use NCS from the model and the first ligand fitted to guess the
positions of NCS-related ligands:
phenix.ligandfit w1.sca model=partial.pdb ligand=GTP \
ligands_from_ncs=True
In Phenix the parameter test_flag_value sets the value of the test set
that is to be free. Normally Phenix sets up test sets with values of 0
and 1 with 1 as the free set. The CCP4 convention is values of 0 through
19 with 0 as the free set. Either of these is recognized by default in
Phenix and you do not need to do anything special. If you have any other
convention (for example values of 0 to 19 and test set is 1) then you
can specify this with test_flag_value.
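The two conventions recognized by default can be written as a simple decision rule. This sketch is not the Phenix implementation: it is a deliberately strict reading of the paragraph above, and the function name is hypothetical. It returns None when neither convention matches, which is the case where you would set test_flag_value yourself.

```python
def guess_test_flag_value(flag_values):
    """Guess which flag value marks the free set (strict illustrative rule).

    Returns 1 for the Phenix convention (flags are exactly 0 and 1),
    0 for the CCP4 convention (flags are exactly 0 through 19),
    and None for anything else.
    """
    distinct = set(flag_values)
    if distinct == {0, 1}:
        return 1
    if distinct == set(range(20)):
        return 0
    return None
```

For a file with flags 0 to 19 where the free set is 1, this rule would return 0, which is exactly the situation in which you need to pass test_flag_value=1 explicitly.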
- The ligand to be searched for must have at least 3 atoms.
- The partial-model file should not have any atoms (other than waters,
which are automatically removed) in the position where the ligand is
to be built. If this file contains atoms other than waters in that
position, remove them before building the ligand.
- If a ring in the ligand can have more than one conformation (e.g.,
chair or boat conformation) then you need to do separate runs for
each conformation of the ring (rings are taken as fixed units in
LigandFit).
- LigandFit ignores insertion codes, so if you specify a residue with
ligand_near_res, only the residue number is used.
- The size of the asymmetric unit in the SOLVE/RESOLVE portion of the
LigandFit wizard is limited by the memory in your computer and the
binaries used. The Wizard is supplied with regular-size ("", size=6),
giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge
("_extra_huge", size=36). Larger-size versions can be obtained on
request.
- The LigandFit Wizard can take most settings of most space groups.
However, it can only use the hexagonal setting of rhombohedral space
groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups
114-119 (not found in macromolecular crystallography) even in the
standard setting. This is due to difficulties with the use of asuset in
the version of the CCP4 libraries used in PHENIX for these settings and
space groups.
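The binary sizes listed above for the SOLVE/RESOLVE portion of the Wizard can be kept in a lookup table when scripting runs. This is a sketch only: the function name is hypothetical, and how a required size would be determined for a given structure is not covered on this page, so required_size is an assumed input.

```python
# Suffix -> size, as listed in the limitations above.
RESOLVE_SIZES = {"": 6, "_giant": 12, "_huge": 18, "_extra_huge": 36}

def smallest_resolve_suffix(required_size):
    """Return the suffix of the smallest supplied binary whose size is at
    least required_size, or None if none of the supplied sizes is enough."""
    for suffix, size in sorted(RESOLVE_SIZES.items(), key=lambda item: item[1]):
        if size >= required_size:
            return suffix
    return None
```

A return value of None corresponds to the case where a larger-size version would have to be requested.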
- ligandfit
- data = None Datafile. This can be any format if only FP is to be read in. If phases are to be read in then MTZ format is required. The Wizard will guess the column identification. If you want to specify it you can say input_labels="FP" , or input_labels="FP PHIB FOM".
- ligand = None Three-letter code of ligand, or file containing information about the ligand (PDB or SMILES)
- model = None PDB file with model for everything but the ligand
- quick = False Set to True for running as quickly as possible.
- crystal_info
- unit_cell = None Enter cell parameter a b c alpha beta gamma
- resolution = 0 High-resolution limit. Zero means keep everything.
- space_group = None Space Group symbol (e.g., C2221 or C 2 2 21)
- file_info
- file_or_file_list = *single_file file_with_list_of_files Choose if you want to input a single file with PDB or other information about the ligand or if you want to input a file containing a list of files with this information for a list of ligands
- input_labels = None Labels for input data columns
- lig_map_type = fo-fc_difference_map fobs_map *pre_calculated_map_coeffs Enter the type of map to use in ligand fitting. fo-fc_difference_map: Fo-Fc difference map phased on partial model (requires FOBS in your input file). fobs_map: Fo map phased on partial model (requires FOBS in your input file). pre_calculated_map_coeffs: map calculated from FP PHIB [FOM] coefficients in input data file (or 2FOFCWT PH2FOFCWT coeffs).
- ligand_format = *PDB SMILES Enter whether the files contain SMILES strings or PDB formatted information
- input_files
- existing_ligand_file_list = None You can enter a list of PDB files with ligands you have already fit. These will be used to exclude that region from consideration.
- ligand_start = None LigandFit will attempt to put your ligand superimposing on ligand_start if supplied. This must have some of the same atoms as your ligand, but does not have to have all of them.
- ncs_in = None You can supply a file with NCS information for use with ligands_from_ncs
- input_ligand_compare_file = None If you enter a PDB file with a ligand in it, the coordinates of the newly-built ligand will be compared with the coordinates in this file.
- cif_def_file_list = None You can supply cif files for real-space refinement after fitting
- refinement_file = None You can supply a file for full refinement containing F/I SIGF/SIGI FreeR_flag. If you supply this file, then after real-space refinement a round of full refinement will be carried out with phenix.refine.
- fobs_labels = None Labels for Fobs SigFobs or Iobs SigIobs for refinement_file... same format as for phenix.refine
- r_free_label = None Label for FreeR_flag in refinement_file...same format as for phenix.refine
- search_parameters
- fixed_ligand = False Use fixed ligand (no rotations of any bonds) if set
- conformers = 1 Enter how many conformers to create. If greater than 1, then ELBOW will always be used to generate them. If 1 then ELBOW will be used if a PDB file is not specified. These conformers are used to identify allowed torsion angles for your ligand. The alternative is to use the empirical rules in RESOLVE. ELBOW takes longer but is more accurate.
- group_search = 0 Enter the ID number of the group from the ligand to use to seed the search for conformations
- ligand_cc_min = 0.75 Enter the minimum correlation coefficient of the ligand to the map to quit searching for more conformations
- ligand_completeness_min = 1 Enter the minimum completeness of the ligand to the map to quit searching for more conformations
- local_search = True If local_search is True, then only the region within search_dist of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
- search_dist = 10 If local_search is True, then only the region within this distance of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
- use_cc_local = False You can specify the use of a local correlation coefficient for scoring ligand fits to the map. If you do not do this, then the region over which the ligand is scored is all points within 2.5 A of the atoms in the ligand. If you do specify use_cc_local, then the region over which the ligand is scored is all these points, plus all the contiguous points that have density greater than 0.5 * sigma.
- ligands_from_ncs = False You can try to use ncs (from your partial model file or from your ncs_in file) along with any ligands already found to place additional copies of your ligand. Only applicable if there is one type of ligand.
- max_ligands_from_ncs = 1 You can specify how many of the ligands already found to consider using NCS (usually 1)
- n_group_search = 3 Enter the number of different fragments of the ligand that will be looked for in FFT search of the map
- n_indiv_tries_max = 10 If 0 is specified, all fragments are searched at once otherwise all are first searched at once then individually up to the number specified
- n_indiv_tries_min = 5 If 0 is specified, all placements of a fragment are tested at once otherwise all are first tested at once then individually up to the number specified
- number_of_ligands = 1 Number of copies of the ligand expected in the asymmetric unit
- offsets_list = 7 53 29 You can specify an offset for the orientation of the templates in searching for ligands. This is used in generating diversity in models.
- refine_ligand = True You can carry out real-space refinement on the ligand after fitting
- ligand_occupancy = 1.0 You can set the initial occupancy of the ligand
- real_space_target_weight = 10. You can change the weight on the real-space term in real-space refinement on the ligand after fitting
- fitting Parameters for tracing ligand
- delta_phi_ligand = 40 Specify the angle (degrees) between successive tries in FFT search for fragments
- fit_phi_inc = 20 Specify the angle (degrees) between rotations around bonds
- fit_phi_range = -180 180 Range of bond rotation angles to search
- search_target
- ligand_near_chain = None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. If you set ligand_near_chain="None" or leave it blank or do not set it, then all chains will be included. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
- ligand_near_res = None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
- ligand_near_pdb = None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand+ macromolecule and specify the ligand name with name_of_ligand_near_pdb.
- name_of_ligand_near_pdb = None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand+ macromolecule and specify the ligand name with name_of_ligand_near_pdb.
- search_center = 0.0 0.0 0.0 Enter coordinates for center of search region (ignored if [0.,0.,0.])
- general
- extend_try_list = True You can fill out the list of parallel jobs to match the number of jobs you want to run at one time, as specified with nbatch.
- ligand_id = None You can specify an integer value for the ID of a ligand... This number will be added to whatever residue number the ligand search model in input_lig_file has. The keyword is only valid if a single copy of the ligand is to be found.
- nbatch = 5 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.
- nproc = 1 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch. If you set nproc=Auto and your machine has n processors, then it will use n-1 processors, or 1 if only 1 is available
- resolve_command_list = None Commands for resolve, one per line, in the form: keyword value (the value can be optional). Examples: coarse_grid; resolution 200 2.0; hklin test.mtz. NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
- coot_name = "coot" If your version of coot is called something else, then you can specify that here.
- i_ran_seed = 72432 Random seed (positive integer) for model-building and simulated annealing refinement
- raise_sorry = False You can have any failure end with a Sorry instead of simply printout to the screen
- background = True When you specify nproc=nn, you can run the jobs in background (default if nproc is greater than 1) or foreground (default if nproc=1). If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
- check_wait_time = 1.0 You can specify the length of time (seconds) to wait between checking for subprocesses to end
- max_wait_time = 1.0 You can specify the length of time (seconds) to wait when looking for a file. If you have a cluster where jobs do not start right away you may need a longer time to wait. The symptom of too short a wait time is 'File not found'
- wait_between_submit_time = 1.0 You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is File exists:....
- cache_resolve_libs = True Use caching of resolve libraries to speed up resolve
- resolve_size = 12 Size for solve/resolve ("", "_giant", "_huge", "_extra_huge", or a number, where 12=giant and 18=huge)
- check_run_command = False You can have the wizard check your run command at startup
- run_command = "sh " When you specify nproc=nn, you can run the subprocesses as jobs in background with sh (default) or submit them to a queue with the command of your choice (e.g., qsub). If you have a multi-processor machine, use sh. If you have a cluster, use qsub or the equivalent command for your system. NOTE: If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If nproc is greater than 1 and you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
- queue_commands = None You can add any commands that need to be run for your queueing system. These are written before any other commands in the file that is submitted to your queueing system. For example on a PBS system you might say: queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS -j oe' queue_commands='#PBS -l walltime=03:00:00' queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put in the characters '<path>' in any queue_commands line and this will be replaced by a string of characters based on the path to the run directory. The first character and last two characters of each part of the path will be included, separated by '_',up to 15 characters. For example 'test_autobuild/WORK_5/AutoBuild_run_1_/TEMP0/RUN_1' would be represented by: 'tld_W_5_A1__TP0_1'
- condor_universe = vanilla The universe for condor is usually vanilla. However you might need to set it to local for your cluster
- add_double_quotes_in_condor = True You might need to turn on or off double quotes in condor job submission scripts. These are already default elsewhere but may interfere with condor paths.
- condor = None Specifies if the group_run_command is submitting a job to a condor cluster. Set by default to True if group_run_command=condor_submit, otherwise False. For condor job submission mr_rosetta uses a customized script with condor commands. Also uses one_subprocess_level=True
- last_process_is_local = True If true, run the last process in a group in background with sh as part of the job that is submitting jobs. This prevents having the job that is submitting jobs sit and wait for all the others while doing nothing
- skip_r_factor = False You can skip R-factor calculation if refinement is not done and maps_only=True
- test_flag_value = Auto Normally leave this at Auto (default). This parameter sets the value of the test set that is to be free. Normally phenix sets up test sets with values of 0 and 1 with 1 as the free set. The CCP4 convention is values of 0 through 19 with 0 as the free set. Either of these is recognized by default in Phenix. If you have any other convention (for example values of 0 to 19 and test set is 1) then you can specify this with test_flag_value.
- skip_xtriage = False You can bypass xtriage if you want. This will prevent you from applying anisotropy corrections, however.
- base_path = None You can specify the base path for files (default is current working directory)
- temp_dir = None Define a temporary directory (it must exist)
- clean_up = None At the end of the entire run the TEMP directories will be removed if clean_up is True. Files listed in keep_files will not be deleted. If you want to remove files after your run is finished use a command like "phenix.autobuild run=1 clean_up=True"
- print_citations = True Print citations at end of run
- solution_output_pickle_file = None At end of run, write solutions to this file in output directory if defined
- title = None Enter any text you like to help identify what you did in this run
- top_output_dir = None This is used in subprocess calls of wizards and to tell the Wizard where to look for the STOPWIZARD file.
- wizard_directory_number = None This is used by the GUI to define the run number for Wizards. It is the same as desired_run_number NOTE: this value can only be specified on the command line, as the directory number is set before parameters files are read.
- verbose = False Command files and other verbose output will be printed
- extra_verbose = False Facts and possible commands will be printed every cycle if True
- debug = False You can have the wizard stop with error messages about the code if you use debug. Additionally the output goes to the terminal if you specify "debug=True"
- require_nonzero = True Require non-zero values in data columns to consider reading in.
- remove_path_word_list = None List of words identifying paths to remove from PATH These can be used to shorten your PATH. For example... cns ccp4 coot would remove all paths containing these words except those also containing phenix. Capitalization is ignored.
- fill = False Fill in all missing reflections to resolution res_fill. Applies to density modified maps. See also filled_2fofc_maps in autobuild.
- res_fill = None Resolution for filling in missing data (default = highest resolution of any datafile). Only applies to density modified maps. Default is fill to high resolution of data. Ignored if fill=False
- check_only = False Just read in and check initial parameters. Not for general use
- keep_files = ligandfit*.pdb List of files that are not to be cleaned up. wildcards permitted
- display
- number_of_solutions_to_display = None Number of solutions to put on screen and to write out
- solution_to_display = 1 Solution number of the solution to display and write out ( use 0 to let the wizard display the top solution)
- run_control
- ignore_blanks = None ignore_blanks allows you to have a command-line keyword with a blank value like "input_lig_file_list="
- stop = None You can stop the current wizard with "stopwizard" or "stop". If you type "phenix.autobuild run=3 stop" then this will stop run 3 of autobuild.
- display_facts = None Set display_facts to True and optionally run=[run-number] to display the facts for run run-number. If you just say display_facts then the facts for the highest-numbered existing run will be shown.
- display_summary = None Set display_summary to True and optionally run=[run-number] to show the summary for run run-number. If you just say display_summary then the summary for the highest-numbered existing run will be shown.
- carry_on = None Set carry_on to True to carry on with highest-numbered run from where you left off.
- run = None Set run to n to continue with run n where you left off.
- copy_run = None Set copy_run to n to copy run n to a new run and continue where you left off.
- display_runs = None List all runs for this wizard.
- delete_runs = None List runs to delete: 1 2 3-5 9:12
- display_labels = None display_labels=test.mtz will list all the labels that identify data in test.mtz. You can use the label strings that are produced in AutoSol to identify which data to use from a datafile like this: peak.data="F+ SIGF+ F- SIGF-" # the entire string in quotes counts here You can use the individual labels from these strings as identifiers for data columns in AutoSol and AutoBuild like this: input_refinement_labels="FP SIGFP FreeR_flags" # each individual label counts
- dry_run = False Just read in and check parameter names
- params_only = False Just read in and return parameter defaults
- display_all = False Just read in and display parameter defaults
- coot = None Not presently applicable for ligandfit
- special_keywords
- write_run_directory_to_file = None Writes the full name of a run directory to the specified file. This can be used as a call-back to tell a script where the output is going to go.
- non_user_parameters These are obsolete parameters and parameters that the wizards use to communicate among themselves. Not normally for general use.
- gui_output_dir = None Used only by the GUI
- sg = None Obsolete. Use space_group instead
- get_lig_volume = False You can ask to get the volume of the ligand and to then stop
- input_data_file = None Not normally used. Use "data=" instead
- input_lig_file = None Not normally used. Use "ligand=" instead.
- ligand_code = None Not normally used. Use "ligand=" instead.
- input_partial_model_file = None Not normally used. Use "model=" instead
- cif_already_generated = False You can specify that the ligand cif file is already generated
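The nproc=Auto rule described under the general parameters above (use n-1 processors on a machine with n processors, or 1 if only 1 is available) can be sketched as follows. The function name is hypothetical, and os.cpu_count() is only an assumption about how the number of processors might be obtained; it is not necessarily what the Wizard does.

```python
import os

def auto_nproc(n_available=None):
    """Return the processor count implied by nproc=Auto:
    n-1 on a machine with n processors, but never fewer than 1."""
    if n_available is None:
        # os.cpu_count() can return None, so fall back to 1.
        n_available = os.cpu_count() or 1
    return max(1, n_available - 1)
```

For example, auto_nproc(8) gives 7 and auto_nproc(1) gives 1.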