Python-based Hierarchical ENvironment for Integrated Xtallography
Automated ligand fitting with LigandFit
Author(s)
Purpose

Purpose of the LigandFit Wizard

The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.

Usage

The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from parameters files. All three versions are identical except in the way that they take commands from the user. See Using the PHENIX Wizards for details of how to run a Wizard. The command-line version will be described here.

How the LigandFit Wizard works

The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation). The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map.

The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map, calculated with FP PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.

The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid sub-fragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All these ligand fits are scored based on the fit to the density, and the best-fitting placement is written out.

The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit. Multiple copies of a ligand can also be fit to a single map in an automated fashion using the LigandFit wizard.

How to run the LigandFit Wizard

Running the LigandFit Wizard is easy. For example, from the command-line you can type:

phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb

The LigandFit Wizard will carry out ligand fitting of the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases based on partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.

What the LigandFit wizard needs to run
Running from a parameters file

You can run phenix.ligandfit from a parameters file. This is often convenient because you can generate a default one with:

phenix.ligandfit --show_defaults > my_ligandfit.eff

and then you can just edit this file to match your needs and run it with:

phenix.ligandfit my_ligandfit.eff

Examples

Sample command_line inputs
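
As a hedged illustration of the keyword=value command-line form described above, the following Python sketch assembles a phenix.ligandfit argument list. The helper name build_ligandfit_command is hypothetical (it is not part of PHENIX), the file names are placeholders, and only keywords that appear in the keyword list on this page are used. The sketch only builds and prints the command; it does not run PHENIX.

```python
import shlex


def build_ligandfit_command(data, ligand, model=None, **keywords):
    """Return an argument list for phenix.ligandfit (hypothetical helper).

    data, model and ligand map to the data=, model= and ligand= keywords;
    any extra keywords are appended as name=value in sorted order.
    """
    args = ["phenix.ligandfit", f"data={data}"]
    if model is not None:  # the partial model is optional
        args.append(f"model={model}")
    args.append(f"ligand={ligand}")
    for name, value in sorted(keywords.items()):
        args.append(f"{name}={value}")
    return args


# Basic run: data, partial model and ligand file (placeholder names).
basic = build_ligandfit_command(
    "datafile.mtz", "ligand.pdb", model="partial_model.pdb"
)

# Same run, asking for two copies of the ligand on four processors.
two_copies = build_ligandfit_command(
    "datafile.mtz", "ligand.pdb", model="partial_model.pdb",
    number_of_ligands=2, nproc=4,
)

# shlex.join adds shell quoting where a value needs it.
print(shlex.join(basic))
print(shlex.join(two_copies))
```

The argument list can be passed to a process launcher as-is; keeping each keyword=value pair as one list element avoids quoting problems when a value contains spaces.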
Possible Problems
Specific limitations and problems
Literature
Additional information
List of all LigandFit keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
ligandfit
data= None Datafile. This can be any format if only FP is to be read in. If
phases are to be read in then MTZ format is required. The Wizard will
guess the column identification. If you want to specify it you can
say input_labels="FP", or input_labels="FP PHIB FOM".
ligand= None Three-letter code of ligand, or file containing information
about the ligand (PDB or SMILES)
model= None PDB file with model for everything but the ligand
quick= False Set to True for running as quickly as possible.
crystal_info
unit_cell= None Enter cell parameter a b c alpha beta gamma
resolution= 0 High-resolution limit. Zero means keep everything.
space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
file_info
file_or_file_list= *single_file file_with_list_of_files Choose if you
want to input a single file with PDB or other
information about the ligand or if you want to input
a file containing a list of files with this
information for a list of ligands
input_labels= None Labels for input data columns
lig_map_type= fo-fc_difference_map fobs_map *pre_calculated_map_coeffs
Enter the type of map to use in ligand fitting.
fo-fc_difference_map: Fo-Fc difference map phased on
partial model (requires FOBS in your input file).
fobs_map: Fo map phased on partial model (requires FOBS in
your input file).
pre_calculated_map_coeffs: map calculated from FP PHIB
[FOM] coefficients in input data file (or 2FOFCWT
PH2FOFCWT coeffs).
ligand_format= *PDB SMILES Enter whether the files contain SMILES
strings or PDB formatted information
input_files
existing_ligand_file_list= None You can enter a list of PDB files with
ligands you have already fit. These will be
used to exclude that region from
consideration.
ligand_start= None LigandFit will attempt to superimpose your ligand
on ligand_start if supplied. This must have
some of the same atoms as your ligand, but does not have
to have all of them.
ncs_in= None You can supply a file with NCS information for use with
ligands_from_ncs
input_ligand_compare_file= None If you enter a PDB file with a ligand in
it, the coordinates of the newly-built ligand
will be compared with the coordinates in this
file.
cif_def_file_list= None You can supply cif files for real-space
refinement after fitting
refinement_file= None You can supply a file for full refinement
containing F/I SIGF/SIGI FreeR_flag. If you supply this
file, then after real-space refinement a round of full
refinement will be carried out with phenix.refine
fobs_labels= None Labels for Fobs SigFobs or Iobs SigIobs for
refinement_file (same format as for phenix.refine)
r_free_label= None Label for FreeR_flag in refinement_file (same format
as for phenix.refine)
search_parameters
conformers= 1 Enter how many conformers to create. If greater than 1,
then ELBOW will always be used to generate them. If 1 then
ELBOW will be used if a PDB file is not specified. These
conformers are used to identify allowed torsion angles for
your ligand. The alternative is to use the empirical rules
in RESOLVE. ELBOW takes longer but is more accurate.
group_search= 0 Enter the ID number of the group from the ligand to use
to seed the search for conformations
ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the
ligand to the map to quit searching for more
conformations
ligand_completeness_min= 1 Enter the minimum completeness of the ligand
to the map to quit searching for more
conformations
local_search= True If local_search is True, then only the region within
search_dist of the point in the map with the highest local
rmsd will be searched in the FFT search for fragments
search_dist= 10 If local_search is True, then only the region within
this distance of the point in the map with the highest
local rmsd will be searched in the FFT search for fragments
use_cc_local= False You can specify the use of a local correlation
coefficient for scoring ligand fits to the map. If you do
not do this, then the region over which the ligand is
scored is all points within 2.5 A of the atoms in the
ligand. If you do specify use_cc_local, then the region
over which the ligand is scored is all these points, plus
all the contiguous points that have density greater than
0.5 * sigma.
ligands_from_ncs= False You can try to use ncs (from your partial model
file or from your ncs_in file) along with any ligands
already found to place additional copies of your
ligand. Only applicable if there is one type of
ligand.
max_ligands_from_ncs= 1 You can specify how many of the ligands already
found to consider using NCS (usually 1)
n_group_search= 3 Enter the number of different fragments of the ligand
that will be looked for in FFT search of the map
n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
once; otherwise all are first searched at once, then
individually up to the number specified
n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
tested at once; otherwise all are first tested at once,
then individually up to the number specified
number_of_ligands= 1 Number of copies of the ligand expected in the
asymmetric unit
offsets_list= 7 53 29 You can specify an offset for the orientation of
the templates in searching for ligands. This is used in
generating diversity in models.
refine_ligand= True You can carry out real-space refinement on the
ligand after fitting
real_space_target_weight= 10. You can change the weight on the
real-space term in real-space refinement on
the ligand after fitting
fitting Parameters for tracing ligand
delta_phi_ligand= 40 Specify the angle (degrees) between successive
tries in FFT search for fragments
fit_phi_inc= 20 Specify the angle (degrees) between rotations around
bonds
fit_phi_range= -180 180 Range of bond rotation angles to search
search_target
ligand_near_chain= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. If you set
ligand_near_chain="None" or leave it blank
or do not set it, then all chains will be included.
The keywords ligand_near_res and ligand_near_chain
refer to residue/chain in the file defined by
input_partial_model_file (or model if running from
command line).
ligand_near_res= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if
running from command line).
ligand_near_pdb= None You can specify where LigandFit should look for
your ligands by providing a PDB file containing one or
more copies of the ligand. If you want you can provide
a PDB file with ligand + macromolecule and specify the
ligand name with name_of_ligand_near_pdb.
name_of_ligand_near_pdb= None You can specify where LigandFit should
look for your ligands by providing a PDB file
containing one or more copies of the ligand. If
you want you can provide a PDB file with
ligand + macromolecule and specify the ligand
name with name_of_ligand_near_pdb.
search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
(ignored if [0.,0.,0.])
general
extend_try_list= True You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
ligand_id= None You can specify an integer value for the ID of a
ligand... This number will be added to whatever residue
number the ligand search model in input_lig_file has. The
keyword is only valid if a single copy of the ligand is to be
found.
nbatch= 5 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch. If you set
nproc=Auto and your machine has n processors, then it will use
n-1 processors, or 1 if only 1 is available
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value (the value can be optional).
Examples:
coarse_grid
resolution 200 2.0
hklin test.mtz
NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single
quotes (') like this:
resolve_command_list="'no_build' 'b_overall 23' "
coot_name= "coot" If your version of coot is called something else, then
you can specify that here.
i_ran_seed= 72432 Random seed (positive integer) for model-building and
simulated annealing refinement
raise_sorry= False You can have any failure end with a Sorry instead of
simply printout to the screen
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command='sh ' (or similar, sh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
max_wait_time= 1.0 You can specify the length of time (seconds) to wait
when looking for a file. If you have a cluster where jobs
do not start right away you may need a longer time to
wait. The symptom of too short a wait time is 'File not
found'
wait_between_submit_time= 1.0 You can specify the length of time
(seconds) to wait between each job that is
submitted when running sub-processes. This can
be helpful on NFS-mounted systems when running
with multiple processors to avoid file
conflicts. The symptom of too short a
wait_between_submit_time is File exists:....
cache_resolve_libs= True Use caching of resolve libraries to speed up
resolve
resolve_size= 12 Size for solve/resolve ("", "_giant", "_huge",
"_extra_huge", or a number where 12=giant and 18=huge)
check_run_command= False You can have the wizard check your run command
at startup
run_command= "sh " When you specify nproc=nn, you can run the
subprocesses as jobs in background with sh (default) or
submit them to a queue with the command of your choice
(e.g., qsub). If you have a multi-processor machine, use
sh. If you have a cluster, use qsub or the equivalent
command for your system. NOTE: If you set run_command=qsub
(or otherwise submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If nproc is greater than 1 and you use
run_command='sh ' (or similar, sh is default) then normally
you will use background=True so that all the jobs run
simultaneously.
last_process_is_local= True If true, run the last process in a group in
background with sh as part of the job that is
submitting jobs. This prevents having the job
that is submitting jobs sit and wait for all the
others while doing nothing
skip_r_factor= False You can skip R-factor calculation if refinement is
not done and maps_only=True
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
base_path= None You can specify the base path for files (default is
current working directory)
temp_dir= None Define a temporary directory (it must exist)
clean_up= True At the end of the entire run the TEMP directories will be
removed if clean_up is True. The default is yes, delete these
directories. If you want to remove them after your run is
finished, use a command like "phenix.autobuild run=1
clean_up=True". Files listed in keep_files will not be
deleted.
solution_output_pickle_file= None At end of run, write solutions to this
file in output directory if defined
title= None Enter any text you like to help identify what you did in
this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
wizard_directory_number= None This is used by the GUI to define the run
number for Wizards. It is the same as
desired_run_number. NOTE: this value can only be
specified on the command line, as the directory
number is set before parameters files are read.
verbose= False Command files and other verbose output will be printed
extra_verbose= False Facts and possible commands will be printed every
cycle if True
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
Additionally the output goes to the terminal if you specify
"debug=True"
require_nonzero= True Require non-zero values in data columns to
consider reading in.
remove_path_word_list= None List of words identifying paths to remove
from PATH. These can be used to shorten your PATH.
For example, "cns ccp4 coot" would remove all
paths containing these words except those also
containing phenix. Capitalization is ignored.
fill= False Fill in all missing reflections to resolution res_fill.
Applies to density modified maps. See also filled_2fofc_maps in
autobuild.
res_fill= None Resolution for filling in missing data (default = highest
resolution of any datafile). Only applies to density modified
maps. Default is fill to high resolution of data. Ignored if
fill=False
keep_files= ligandfit*.pdb List of files that are not to be cleaned up.
Wildcards are permitted.
display
number_of_solutions_to_display= None Number of solutions to put on
screen and to write out
solution_to_display= 1 Solution number of the solution to display and
write out (use 0 to let the wizard display the top
solution)
run_control
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like
"input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard"
or "stop". If you type "phenix.autobuild run=3
stop" then this will stop run 3 of autobuild.
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown.
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off.
run= None Set run to n to continue with run n where you left off.
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off.
display_runs= None List all runs for this wizard.
delete_runs= None List runs to delete: 1 2 3-5 9:12
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this:
peak.data="F+ SIGF+ F- SIGF-" # the entire string in quotes counts here
You can use the individual labels from these strings as
identifiers for data columns in AutoSol and AutoBuild
like this:
input_refinement_labels="FP SIGFP FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
coot= None Not presently applicable for ligandfit
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
non_user_parameters These are obsolete parameters and parameters that the
wizards use to communicate among themselves. Not
normally for general use.
gui_output_dir= None Used only by the GUI
sg= None Obsolete. Use space_group instead
get_lig_volume= False You can ask to get the volume of the ligand and to
then stop
input_data_file= None Not normally used. Use "data=" instead
input_lig_file= None Not normally used. Use "ligand=" instead.
ligand_code= None Not normally used. Use "ligand=" instead.
input_partial_model_file= None Not normally used. Use "model="
instead
cif_already_generated= False You can specify that the ligand cif file is
already generated
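
Two of the rules stated in the keyword list above lend themselves to a short sketch: the command-line quoting for resolve_command_list (each command in single quotes, the whole set in double quotes) and the processor count described for nproc=Auto (n-1 processors, or 1 if only 1 is available). The Python below is an illustration under those stated rules only; the function names format_resolve_command_list and processors_for_auto are hypothetical and are not part of PHENIX.

```python
def format_resolve_command_list(commands):
    """Quote RESOLVE commands for command-line use (hypothetical helper).

    Each individual command is wrapped in single quotes and the whole
    set is wrapped in double quotes, as the keyword help describes.
    """
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError(f"command must not contain quotes: {command!r}")
    quoted = " ".join(f"'{command}'" for command in commands)
    return f'resolve_command_list="{quoted}"'


def processors_for_auto(n_available):
    """Processor count described for nproc=Auto: n-1, or 1 if only 1 exists."""
    if n_available < 1:
        raise ValueError("need at least one processor")
    return max(1, n_available - 1)


# The two example commands are taken from the keyword help text.
print(format_resolve_command_list(["no_build", "b_overall 23"]))
print(processors_for_auto(8))
```

Rejecting commands that already contain quote characters keeps the sketch simple; it makes no attempt to escape them.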