Finding all ligands in a map with phenix.find_all_ligands

Author(s)
Purpose
Usage: How phenix.find_all_ligands works; Output files from phenix.find_all_ligands
Examples: Standard run of phenix.find_all_ligands
Possible Problems: Specific limitations and problems
Literature
Additional information: List of all find_all_ligands keywords

Author(s)

phenix.find_all_ligands: Tom Terwilliger

Purpose

phenix.find_all_ligands is a command line tool for finding all the ligands in a map by repetitively running phenix.ligandfit with a series of ligands and choosing the best-fitting one at each cycle.

Usage

How phenix.find_all_ligands works:

The basic procedure for phenix.find_all_ligands has three steps. The first is to identify the largest contiguous region of density in your map that is not already occupied by your model or previously-fitted ligands. The second is to fit each ligand (you identify the candidate ligands in advance) into this density. The third is to choose the one that fits the density the best. Then the best-fitting ligand is added to the structure and the process is repeated until the number of ligands you request is found or the correlation of ligand to the map drops below the value you specify (default=0.5).
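
The cycle described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the actual implementation: the function find_all_ligands_sketch and the fit_ligand callback are hypothetical names, and the callback stands in for a phenix.ligandfit run that returns the map correlation of the best fit for one candidate ligand.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: takes a candidate ligand file name and the names of
# the ligands already placed, and returns the map correlation of its best fit.
FitFunction = Callable[[str, List[str]], float]


def find_all_ligands_sketch(
    ligand_list: List[str],
    fit_ligand: FitFunction,
    number_of_ligands: Optional[int] = None,
    cc_min: Optional[float] = 0.5,
) -> List[Tuple[str, float]]:
    """Fit every candidate, keep the best one, and repeat until a limit is hit."""
    if number_of_ligands is None and cc_min is None:
        raise ValueError("set number_of_ligands or cc_min so the loop can stop")
    placed: List[Tuple[str, float]] = []
    while ligand_list and (
        number_of_ligands is None or len(placed) < number_of_ligands
    ):
        already = [name for name, _ in placed]
        # Steps 1 and 2: fit each candidate into the largest unoccupied density
        # (represented here by the callback).
        scores = {lig: fit_ligand(lig, already) for lig in ligand_list}
        # Step 3: choose the candidate that fits the density best.
        best = max(scores, key=scores.get)
        if cc_min is not None and scores[best] < cc_min:
            break  # the best fit is below the correlation cutoff
        placed.append((best, scores[best]))
    return placed


# Toy stand-in for the fitting step: made-up correlations for each cycle.
TOY_SCORES = [
    {"ATP.pdb": 0.85, "NAD.pdb": 0.70},
    {"ATP.pdb": 0.60, "NAD.pdb": 0.72},
    {"ATP.pdb": 0.40, "NAD.pdb": 0.45},
]


def toy_fit(ligand: str, already: List[str]) -> float:
    return TOY_SCORES[len(already)][ligand]


result = find_all_ligands_sketch(
    ["ATP.pdb", "NAD.pdb"], toy_fit, number_of_ligands=5, cc_min=0.5
)
print(result)
```

With the made-up scores, two ligands are placed and the search stops in the third cycle because the best correlation falls below cc_min.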

Output files from phenix.find_all_ligands

The output ligand files from phenix.find_all_ligands are normally in the temporary directory (default='temp_dir'). They will be files with names such as "SITE_1_ATP.pdb" for the placement of ATP in the first site fitted.
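
If you want to collect the fitted ligands in a script, a sketch like the following may help. It assumes that every output file follows the pattern suggested by the single example name above (SITE_, then the site number, then the ligand name, then .pdb); that pattern and the helper names parse_site_file and list_site_files are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed pattern, inferred only from the example name "SITE_1_ATP.pdb".
SITE_FILE = re.compile(r"^SITE_(\d+)_(.+)\.pdb$")


def parse_site_file(name: str) -> Optional[Tuple[int, str]]:
    """Return (site number, ligand name), or None if the name does not match."""
    match = SITE_FILE.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def list_site_files(directory: str = "temp_dir") -> List[Tuple[int, str, Path]]:
    """List matching files in a directory, sorted by site number."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    found = []
    for path in folder.glob("SITE_*_*.pdb"):
        parsed = parse_site_file(path.name)
        if parsed is not None:
            found.append((parsed[0], parsed[1], path))
    return sorted(found, key=lambda item: (item[0], item[1]))


print(parse_site_file("SITE_1_ATP.pdb"))
```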

Examples

Standard run of phenix.find_all_ligands:

Running phenix.find_all_ligands is easy. Usually you will want to edit a small parameter file (find_all_ligands.eff) to contain your commands, as in the example here. The find_all_ligands commands determine what searches are done, and the ligandfit commands are passed to phenix.ligandfit for the actual fitting:

#  commands for running phenix.find_all_ligands
find_all_ligands {
  number_of_ligands = 5
  cc_min = 0.5
  ligand_list =  ATP.pdb NAD.pdb
  nproc = 2
}
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
}

You might also want to add some additional commands for phenix.ligandfit. Any ligandfit command is allowed, except that the commands "ligand" and "input_lig_file" are ignored, because the input ligands come from the find_all_ligands command "ligand_list":

# find_all_ligands.eff  more commands for ligandfit
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
  ligand_cc_min = 0.75
  verbose = Yes
}

where you can put any phenix.ligandfit commands in the braces. Then you can run this with the command:

phenix.find_all_ligands find_all_ligands.eff
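
If you prefer to drive the run from a script, the same steps can be wrapped in Python. This is a minimal sketch using only the standard library: it writes the parameter file shown earlier and calls the command line shown above. The helper names build_command and run_find_all_ligands are hypothetical, and the sketch assumes phenix.find_all_ligands is on your PATH (it does nothing beyond writing the file if it is not).

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Same parameters as the example parameter file in this section.
PARAMS = """\
find_all_ligands {
  number_of_ligands = 5
  cc_min = 0.5
  ligand_list = ATP.pdb NAD.pdb
  nproc = 2
}
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
}
"""


def build_command(eff_file: str) -> List[str]:
    """Command line equivalent to: phenix.find_all_ligands <eff_file>"""
    return ["phenix.find_all_ligands", eff_file]


def run_find_all_ligands(
    eff_file: str = "find_all_ligands.eff", params: str = PARAMS
) -> Optional[int]:
    """Write the parameter file and run the command; return its exit code."""
    Path(eff_file).write_text(params)
    command = build_command(eff_file)
    if shutil.which(command[0]) is None:
        # Phenix is not set up in this shell, so skip the run.
        print(command[0] + " not found on PATH; parameter file written only")
        return None
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    run_find_all_ligands()
```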

Possible Problems

Specific limitations and problems:

This method uses phenix.ligandfit to do the ligand fitting, so all the commands, features, and limitations of phenix.ligandfit apply to phenix.find_all_ligands.

Literature

Additional information

NOTE: in addition to the find_all_ligands keywords shown here, all phenix.ligandfit commands are also allowed, except that the commands "ligand" and "input_lig_file" are ignored, because the input ligands come from the find_all_ligands keyword "ligand_list".

List of all find_all_ligands keywords

------------------------------------------------------------------------------- 
Legend for parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
find_all_ligands
   number_of_ligands= None Total number of ligand sites. Ignored if "None".
                      find_all_ligands will keep looking until the correlation
                      coefficient for the fit of the best ligand is less than
                      cc_min or the number of ligands placed is
                      number_of_ligands, whichever comes first
   cc_min= 0.50 Ignored if "None". find_all_ligands will keep looking until
           the correlation coefficient for the fit of the best ligand is less
           than cc_min or the number of ligands placed is number_of_ligands,
           whichever comes first
   ligand_list= None List of files with ligands to find
   nproc= 1 number of processors to use
   background= True run jobs in background or not (if nproc is greater than 1)
   run_command= "sh " Command for running jobs (e.g., sh or qsub )
   verbose= False verbose output
   raise_sorry= False Raise sorry if problems
   debug= False debugging output
   temp_dir= Auto Optional temporary work directory
   output_dir= "" Output directory where files are to be written
   dry_run= False Just read in and check parameter names
   max_wait_time= 1.0 You can specify the length of time (seconds) to wait
                  when looking for a file. If you have a cluster where jobs do
                  not start right away you may need a longer time to wait. The
                  symptom of too short a wait time is 'File not found'
   wait_between_submit_time= 1.0 You can specify the length of time (seconds)
                             to wait between each job that is submitted when
                             running sub-processes. This can be helpful on
                             NFS-mounted systems when running with multiple
                             processors to avoid file conflicts. The symptom
                             of too short a wait_between_submit_time is 'File
                             exists:....'
ligandfit
   data= None Datafile. This can be any format if only FP is to be read in. If
         phases are to be read in then MTZ format is required. The Wizard will
         guess the column identification. If you want to specify it you can
         say input_labels="FP", or input_labels="FP PHIB FOM".
   ligand= None Three-letter code of ligand, or file containing information
           about the ligand (PDB or SMILES)
   model= None PDB file with model for everything but the ligand
   quick= False Set to True for running as quickly as possible.
   crystal_info
      unit_cell= None Enter cell parameter a b c alpha beta gamma
      resolution= 0 High-resolution limit. Zero means keep everything.
      space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
   file_info
      file_or_file_list= *single_file file_with_list_of_files Choose if you
                         want to input a single file with PDB or other
                         information about the ligand or if you want to input
                         a file containing a list of files with this
                         information for a list of ligands
      input_labels= None Labels for input data columns
      lig_map_type= fo-fc_difference_map fobs_map *pre_calculated_map_coeffs
                    Enter the type of map to use in ligand fitting:
                    fo-fc_difference_map: Fo-Fc difference map phased on
                    partial model (requires FOBS in your input file);
                    fobs_map: Fo map phased on partial model (requires FOBS
                    in your input file);
                    pre_calculated_map_coeffs: map calculated from FP PHIB
                    [FOM] coefficients in input data file (or 2FOFCWT
                    PH2FOFCWT coeffs)
      ligand_format= *PDB SMILES Enter whether the files contain SMILES
                     strings or PDB formatted information
   input_files
      existing_ligand_file_list= None You can enter a list of PDB files with
                                 ligands you have already fit. These will be
                                 used to exclude that region from
                                 consideration.
      ligand_start= None LigandFit will attempt to put your ligand
                    superimposing on ligand_start if supplied. This must have
                    some of the same atoms as your ligand, but does not have
                    to have all of them.
      ncs_in= None You can supply a file with NCS information for use with
              ligands_from_ncs
      input_ligand_compare_file= None If you enter a PDB file with a ligand in
                                 it, the coordinates of the newly-built ligand
                                 will be compared with the coordinates in this
                                 file.
      cif_def_file_list= None You can supply cif files for real-space
                         refinement after fitting
      refinement_file= None You can supply a file for full refinement
                       containing F/I SIGF/SIGI FreeR_flag. If you supply this
                       file then after real-space refinement a round of full
                       refinement will be carried out with phenix.refine
      fobs_labels= None Labels for Fobs SigFobs or Iobs SigIobs for
                   refinement_file... same format as for phenix.refine
      r_free_label= None Label for FreeR_flag in refinement_file...same format
                    as for phenix.refine
   search_parameters
      conformers= 1 Enter how many conformers to create. If greater than 1,
                  then ELBOW will always be used to generate them. If 1 then
                  ELBOW will be used if a PDB file is not specified. These
                  conformers are used to identify allowed torsion angles for
                  your ligand. The alternative is to use the empirical rules
                  in RESOLVE. ELBOW takes longer but is more accurate.
      group_search= 0 Enter the ID number of the group from the ligand to use
                    to seed the search for conformations
      ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the
                     ligand to the map to quit searching for more
                     conformations
      ligand_completeness_min= 1 Enter the minimum completeness of the ligand
                               to the map to quit searching for more
                               conformations
      local_search= True If local_search is True, then only the region within
                    search_dist of the point in the map with the highest local
                    rmsd will be searched in the FFT search for fragments
      search_dist= 10 If local_search is True, then only the region within
                   this distance of the point in the map with the highest
                   local rmsd will be searched in the FFT search for fragments
      use_cc_local= False You can specify the use of a local correlation
                    coefficient for scoring ligand fits to the map. If you do
                    not do this, then the region over which the ligand is
                    scored is all points within 2.5 A of the atoms in the
                    ligand. If you do specify use_cc_local, then the region
                    over which the ligand is scored is all these points, plus
                    all the contiguous points that have density greater than
                    0.5 * sigma.
      ligands_from_ncs= False You can try to use ncs (from your partial model
                        file or from your ncs_in file) along with any ligands
                        already found to place additional copies of your
                        ligand. Only applicable if there is one type of
                        ligand.
      max_ligands_from_ncs= 1 You can specify how many of the ligands already
                            found to consider using NCS (usually 1)
      n_group_search= 3 Enter the number of different fragments of the ligand
                      that will be looked for in FFT search of the map
      n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
                         once; otherwise all are first searched at once, then
                         individually up to the number specified
      n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
                         tested at once; otherwise all are first tested at
                         once, then individually up to the number specified
      number_of_ligands= 1 Number of copies of the ligand expected in the
                         asymmetric unit
      offsets_list= 7 53 29 You can specify an offset for the orientation of
                    the templates in searching for ligands. This is used in
                    generating diversity in models.
      refine_ligand= True You can carry out real-space refinement on the
                     ligand after fitting
      real_space_target_weight= 10. You can change the weight on the
                                real-space term in real-space refinement on
                                the ligand after fitting
      fitting Parameters for tracing ligand
         delta_phi_ligand= 40 Specify the angle (degrees) between successive
                           tries in FFT search for fragments
         fit_phi_inc= 20 Specify the angle (degrees) between rotations around
                      bonds
         fit_phi_range= -180 180 Range of bond rotation angles to search
   search_target
      ligand_near_chain= None You can specify where to search for the ligand
                         either with search_center or with ligand_near_res and
                         ligand_near_chain. If you set
                         ligand_near_chain="None" or leave it blank
                         or do not set it, then all chains will be included.
                         The keywords ligand_near_res and ligand_near_chain
                         refer to residue/chain in the file defined by
                         input_partial_model_file (or model if running from
                         command line).
      ligand_near_res= None You can specify where to search for the ligand
                       either with search_center or with ligand_near_res and
                       ligand_near_chain. The keywords ligand_near_res and
                       ligand_near_chain refer to residue/chain in the file
                       defined by input_partial_model_file (or model if
                       running from command line).
      ligand_near_pdb= None You can specify where LigandFit should look for
                       your ligands by providing a PDB file containing one or
                       more copies of the ligand. If you want you can provide
                       a PDB file with ligand + macromolecule and specify the
                       ligand name with name_of_ligand_near_pdb.
      name_of_ligand_near_pdb= None You can specify where LigandFit should
                               look for your ligands by providing a PDB file
                               containing one or more copies of the ligand. If
                               you want you can provide a PDB file with
                               ligand + macromolecule and specify the ligand
                               name with name_of_ligand_near_pdb.
      search_center= 0.0 0.0 0.0 Enter coordinates for center of search region
                     (ignored if [0.,0.,0.])
   general
      extend_try_list= True You can fill out the list of parallel jobs to
                       match the number of jobs you want to run at one time,
                       as specified with nbatch.
      ligand_id= None You can specify an integer value for the ID of a
                 ligand... This number will be added to whatever residue
                 number the ligand search model in input_lig_file has. The
                 keyword is only valid if a single copy of the ligand is to be
                 found.
      nbatch= 5 You can specify the number of processors to use (nproc) and
              the number of batches to divide the data into for parallel jobs.
              Normally you will set nproc to the number of processors
              available and leave nbatch alone. If you leave nbatch as None it
              will be set automatically, with a value depending on the Wizard.
              This is recommended. The value of nbatch can affect the results
              that you get, as the jobs are not split into exact replicates,
              but are rather run with different random numbers. If you want to
              get the same results, keep the same value of nbatch.
      nproc= 1 You can specify the number of processors to use (nproc) and the
             number of batches to divide the data into for parallel jobs.
             Normally you will set nproc to the number of processors available
             and leave nbatch alone. If you leave nbatch as None it will be
             set automatically, with a value depending on the Wizard. This is
             recommended. The value of nbatch can affect the results that you
             get, as the jobs are not split into exact replicates, but are
             rather run with different random numbers. If you want to get the
             same results, keep the same value of nbatch. If you set
             nproc=Auto and your machine has n processors, then it will use
             n-1 processors, or 1 if only 1 is available
      resolve_command_list= None Commands for resolve. One per line in the
                            form: keyword value (value can be optional).
                            Examples: coarse_grid; resolution 200 2.0; hklin
                            test.mtz. NOTE: for command-line usage you need to
                            enclose the whole set of commands in double quotes
                            (") and each individual command in single
                            quotes (') like this:
                            resolve_command_list="'no_build' 'b_overall
                            23' "
      coot_name= "coot" If your version of coot is called something else, then
                 you can specify that here.
      i_ran_seed= 72432 Random seed (positive integer) for model-building and
                  simulated annealing refinement
      raise_sorry= False You can have any failure end with a Sorry instead of
                   simply printout to the screen
      background= True When you specify nproc=nn, you can run the jobs in
                  background (default if nproc is greater than 1) or
                  foreground (default if nproc=1). If you set run_command=qsub
                  (or otherwise submit to a batch queue), then you should set
                  background=False, so that the batch queue can keep track of
                  your runs. There is no need to use background=True in this
                  case because all the runs go as controlled by your batch
                  system. If you use run_command='sh ' (or similar, sh is
                  default) then normally you will use background=True so that
                  all the jobs run simultaneously.
      max_wait_time= 1.0 You can specify the length of time (seconds) to wait
                     when looking for a file. If you have a cluster where jobs
                     do not start right away you may need a longer time to
                     wait. The symptom of too short a wait time is 'File not
                     found'
      wait_between_submit_time= 1.0 You can specify the length of time
                                (seconds) to wait between each job that is
                                submitted when running sub-processes. This can
                                be helpful on NFS-mounted systems when running
                                with multiple processors to avoid file
                                conflicts. The symptom of too short a
                                wait_between_submit_time is 'File exists:....'
      cache_resolve_libs= True Use caching of resolve libraries to speed up
                          resolve
      resolve_size= 12 Size for solve/resolve
                    ("","_giant","_huge","_extra_huge" or a number,
                    where 12=giant 18=huge)
      check_run_command= False You can have the wizard check your run command
                         at startup
      run_command= "sh " When you specify nproc=nn, you can run the
                   subprocesses as jobs in background with sh (default) or
                   submit them to a queue with the command of your choice
                   (e.g., qsub). If you have a multi-processor machine, use
                   sh. If you have a cluster, use qsub or the equivalent
                   command for your system. NOTE: If you set run_command=qsub
                   (or otherwise submit to a batch queue), then you should set
                   background=False, so that the batch queue can keep track of
                   your runs. There is no need to use background=True in this
                   case because all the runs go as controlled by your batch
                   system. If nproc is greater than 1 and you use
                   run_command='sh ' (or similar, sh is default) then normally
                   you will use background=True so that all the jobs run
                   simultaneously.
      last_process_is_local= True If true, run the last process in a group in
                             background with sh as part of the job that is
                             submitting jobs. This prevents having the job
                             that is submitting jobs sit and wait for all the
                             others while doing nothing
      skip_r_factor= False You can skip R-factor calculation if refinement is
                     not done and maps_only=True
      skip_xtriage= False You can bypass xtriage if you want. This will
                    prevent you from applying anisotropy corrections, however.
      base_path= None You can specify the base path for files (default is
                 current working directory)
      temp_dir= None Define a temporary directory (it must exist)
      clean_up= True At the end of the entire run the TEMP directories will be
                removed if clean_up is True. The default is yes, delete these
                directories. If you want to remove them after your run is
                finished, use a command like "phenix.autobuild run=1
                clean_up=True". Files listed in keep_files will not be
                deleted
      solution_output_pickle_file= None At end of run, write solutions to this
                                   file in output directory if defined
      title= None Enter any text you like to help identify what you did in
             this run
      top_output_dir= None This is used in subprocess calls of wizards and to
                      tell the Wizard where to look for the STOPWIZARD file.
      wizard_directory_number= None This is used by the GUI to define the run
                               number for Wizards. It is the same as
                               desired_run_number. NOTE: this value can only be
                               specified on the command line, as the directory
                               number is set before parameter files are read.
      verbose= False Command files and other verbose output will be printed
      extra_verbose= False Facts and possible commands will be printed every
                     cycle if True
      debug= False You can have the wizard stop with error messages about the
             code if you use debug. NOTE: you cannot use Pause with debug.
             Additionally the output goes to the terminal if you specify
             "debug=True"
      require_nonzero= True Require non-zero values in data columns to
                       consider reading in.
      remove_path_word_list= None List of words identifying paths to remove
                             from PATH These can be used to shorten your PATH.
                             For example... cns ccp4 coot would remove all
                             paths containing these words except those also
                             containing phenix. Capitalization is ignored.
      fill= False Fill in all missing reflections to resolution res_fill.
            Applies to density modified maps. See also filled_2fofc_maps in
            autobuild.
      res_fill= None Resolution for filling in missing data (default = highest
                resolution of any datafile). Only applies to density modified
                maps. Default is fill to high resolution of data. Ignored if
                fill=False
      keep_files= ligandfit*.pdb List of files that are not to be cleaned up.
                  wildcards permitted
   display
      number_of_solutions_to_display= None Number of solutions to put on
                                      screen and to write out
      solution_to_display= 1 Solution number of the solution to display and
                           write out ( use 0 to let the wizard display the top
                           solution)
   run_control
      ignore_blanks= None ignore_blanks allows you to have a command-line
                     keyword with a blank value like
                     "input_lig_file_list="
      stop= None You can stop the current wizard with "stopwizard"
            or "stop". If you type "phenix.autobuild run=3
            stop" then this will stop run 3 of autobuild.
      display_facts= None Set display_facts to True and optionally
                     run=[run-number] to display the facts for run run-number.
                     If you just say display_facts then the facts for the
                     highest-numbered existing run will be shown.
      display_summary= None Set display_summary to True and optionally
                       run=[run-number] to show the summary for run
                       run-number. If you just say display_summary then the
                       summary for the highest-numbered existing run will be
                       shown.
      carry_on= None Set carry_on to True to carry on with highest-numbered
                run from where you left off.
      run= None Set run to n to continue with run n where you left off.
      copy_run= None Set copy_run to n to copy run n to a new run and continue
                where you left off.
      display_runs= None List all runs for this wizard.
      delete_runs= None List runs to delete: 1 2 3-5 9:12
      display_labels= None display_labels=test.mtz will list all the labels
                      that identify data in test.mtz. You can use the label
                      strings that are produced in AutoSol to identify which
                      data to use from a datafile like this:
                      peak.data="F+ SIGF+ F- SIGF-" (the entire string in
                      quotes counts here). You can use the individual labels
                      from these strings as identifiers for data columns in
                      AutoSol and AutoBuild like this:
                      input_refinement_labels="FP SIGFP FreeR_flags" (each
                      individual label counts)
      dry_run= False Just read in and check parameter names
      params_only= False Just read in and return parameter defaults
      display_all= False Just read in and display parameter defaults
      coot= None Not presently applicable for ligandfit
   special_keywords
      write_run_directory_to_file= None Writes the full name of a run
                                   directory to the specified file. This can
                                   be used as a call-back to tell a script
                                   where the output is going to go.
   non_user_parameters These are obsolete parameters and parameters that the
                       wizards use to communicate among themselves. Not
                       normally for general use.
      gui_output_dir= None Used only by the GUI
      sg= None Obsolete. Use space_group instead
      get_lig_volume= False You can ask to get the volume of the ligand and to
                      then stop
      input_data_file= None Not normally used. Use "data=" instead
      input_lig_file= None Not normally used. Use "ligand=" instead.
      ligand_code= None Not normally used. Use "ligand=" instead.
      input_partial_model_file= None Not normally used. Use "model="
                                instead
      cif_already_generated= False You can specify that the ligand cif file is
                             already generated
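
As a small worked example of one rule in the keyword list above, the ligandfit nproc help states that nproc=Auto uses n-1 processors on a machine with n processors, or 1 if only 1 is available. The sketch below restates that rule in Python; the function name resolve_nproc_auto is hypothetical, and the way Phenix itself counts processors is not shown here.

```python
import os
from typing import Optional


def resolve_nproc_auto(n_available: Optional[int] = None) -> int:
    """Return n-1 for a machine with n processors, but never less than 1."""
    if n_available is None:
        # os.cpu_count() can return None, so fall back to 1 in that case.
        n_available = os.cpu_count() or 1
    return max(1, n_available - 1)


print(resolve_nproc_auto(8))
```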