Python-based Hierarchical ENvironment for Integrated Xtallography
Finding all ligands in a map with phenix.find_all_ligands
Author(s)
Purpose

phenix.find_all_ligands is a command-line tool for finding all the ligands in a map by repeatedly running phenix.ligandfit with a series of ligands and choosing the best-fitting one at each cycle.

Usage

How phenix.find_all_ligands works:

The basic procedure for phenix.find_all_ligands has three steps. The first is to identify the largest contiguous region of density in your map that is not already occupied by your model or by previously fitted ligands. The second is to fit each candidate ligand (you identify the candidate ligands in advance) into this density. The third is to choose the ligand that fits the density best. The best-fitting ligand is then added to the structure, and the process is repeated until the number of ligands you request has been found or the correlation of the ligand to the map drops below the value you specify (default=0.5).

Output files from phenix.find_all_ligands

The output ligand files from phenix.find_all_ligands are normally written to the temporary directory (default='temp_dir'). They have names such as "SITE_1_ATP.pdb" for the placement of ATP in the first site fitted.

Examples

Standard run of phenix.find_all_ligands:

Running phenix.find_all_ligands is easy. Usually you will want to edit a small parameter file (find_all_ligands.eff) to contain your commands, like this. The find_all_ligands commands determine what searches are done, and the ligandfit commands are passed to phenix.ligandfit for the actual fitting:

    # commands for running phenix.find_all_ligands
    find_all_ligands {
      number_of_ligands = 5
      cc_min = 0.5
      ligand_list = ATP.pdb NAD.pdb
      nproc = 2
    }
    ligandfit {
      data = "nsf-d2.mtz"
      model = "nsf-d2_noligand.pdb"
      lig_map_type = fo-fc_difference_map
    }

You might also want to add some additional commands for phenix.ligandfit to this file.
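If you prefer to generate the parameter file from a script rather than edit it by hand, the two-scope layout of the example above can be reproduced programmatically. The following is a minimal Python sketch, not part of PHENIX: the helper names format_scope and build_eff are hypothetical, and the sketch only assembles text in the same "scope { keyword = value }" form as the example. It does not validate keywords.

```python
def format_scope(name, params):
    """Render one scope in the 'name { key = value }' form used in the example file."""
    lines = [f"{name} {{"]
    for key, value in params.items():
        # Lists (such as ligand_list) are written space-separated, as in the example.
        if isinstance(value, (list, tuple)):
            value = " ".join(str(item) for item in value)
        lines.append(f"  {key} = {value}")
    lines.append("}")
    return "\n".join(lines)


def build_eff(find_all_ligands, ligandfit):
    """Return the text of a parameter file with the two scopes shown in the example."""
    header = "# commands for running phenix.find_all_ligands"
    scopes = [
        format_scope("find_all_ligands", find_all_ligands),
        format_scope("ligandfit", ligandfit),
    ]
    return "\n".join([header] + scopes) + "\n"


# Values copied from the example parameter file in this document.
eff_text = build_eff(
    find_all_ligands={
        "number_of_ligands": 5,
        "cc_min": 0.5,
        "ligand_list": ["ATP.pdb", "NAD.pdb"],
        "nproc": 2,
    },
    ligandfit={
        "data": '"nsf-d2.mtz"',
        "model": '"nsf-d2_noligand.pdb"',
        "lig_map_type": "fo-fc_difference_map",
    },
)
print(eff_text)
```

Saving the printed text as find_all_ligands.eff gives a file with the same contents as the hand-edited example.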
Any commands for ligandfit are allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands command "ligand_list":

    # find_all_ligands.eff more commands for ligandfit
    ligandfit {
      data = "nsf-d2.mtz"
      model = "nsf-d2_noligand.pdb"
      lig_map_type = fo-fc_difference_map
      ligand_cc_min = 0.75
      verbose = Yes
    }

where you can put any phenix.ligandfit commands in the braces. Then you can run this with the command:

    phenix.find_all_ligands find_all_ligands.eff

Possible Problems

Specific limitations and problems:
Literature

Additional information

NOTE: in addition to the find_all_ligands keywords shown here, all phenix.ligandfit commands are also allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands command "ligand_list".

List of all find_all_ligands keywords

-------------------------------------------------------------------------------
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
find_all_ligands
  number_of_ligands= None Total number of ligand sites. Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first.
  cc_min= 0.50 Ignored if "None". find_all_ligands will keep looking until the correlation coefficient for the fit of the best ligand is less than cc_min or the number of ligands placed is number_of_ligands, whichever comes first.
  ligand_list= None List of files with ligands to find
  nproc= 1 Number of processors to use
  background= True Run jobs in background or not (if nproc is greater than 1)
  run_command= "sh " Command for running jobs (e.g., sh or qsub)
  verbose= False Verbose output
  raise_sorry= False Raise sorry if problems
  debug= False Debugging output
  temp_dir= Auto Optional temporary work directory
  output_dir= "" Output directory where files are to be written
  dry_run= False Just read in and check parameter names
  check_wait_time= 1.0 You can specify the length of time (seconds) to wait between checking for subprocesses to end
  wait_between_submit_time= 1.0 You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is File exists:....
ligandfit
  data= None Datafile. This can be any format if only FP is to be read in. If phases are to be read in then MTZ format is required. The Wizard will guess the column identification. If you want to specify it you can say input_labels="FP", or input_labels="FP PHIB FOM".
  ligand= None Three-letter code of ligand, or file containing information about the ligand (PDB or SMILES)
  model= None PDB file with model for everything but the ligand
  quick= False Set to True for running as quickly as possible.
  crystal_info
    unit_cell= None Enter cell parameter a b c alpha beta gamma
    resolution= 0 High-resolution limit. Zero means keep everything.
    space_group= None Space Group symbol (e.g., C2221 or C 2 2 21)
  file_info
    file_or_file_list= *single_file file_with_list_of_files Choose if you want to input a single file with PDB or other information about the ligand or if you want to input a file containing a list of files with this information for a list of ligands
    input_labels= None Labels for input data columns
    lig_map_type= fo-fc_difference_map fobs_map *pre_calculated_map_coeffs Enter the type of map to use in ligand fitting. fo-fc_difference_map: Fo-Fc difference map phased on partial model (requires FOBS in your input file); fobs_map: Fo map phased on partial model (requires FOBS in your input file); pre_calculated_map_coeffs: map calculated from FP PHIB [FOM] coefficients in input data file (or 2FOFCWT PH2FOFCWT coeffs)
    ligand_format= *PDB SMILES Enter whether the files contain SMILES strings or PDB formatted information
  input_files
    existing_ligand_file_list= None You can enter a list of PDB files with ligands you have already fit. These will be used to exclude that region from consideration.
    ligand_start= None LigandFit will attempt to place your ligand superimposed on ligand_start if supplied. This must have some of the same atoms as your ligand, but does not have to have all of them.
    ncs_in= None You can supply a file with NCS information for use with ligands_from_ncs
    input_ligand_compare_file= None If you enter a PDB file with a ligand in it, the coordinates of the newly-built ligand will be compared with the coordinates in this file.
    cif_def_file_list= None You can supply cif files for real-space refinement after fitting
    refinement_file= None You can supply a file for full refinement containing F/I SIGF/SIGI FreeR_flag. If you supply this file then after real-space refinement a round of full refinement will be carried out with phenix.refine
    fobs_labels= None Labels for Fobs SigFobs or Iobs SigIobs for refinement_file...same format as for phenix.refine
    r_free_label= None Label for FreeR_flag in refinement_file...same format as for phenix.refine
  search_parameters
    conformers= 1 Enter how many conformers to create. If greater than 1, then ELBOW will always be used to generate them. If 1 then ELBOW will be used if a PDB file is not specified. These conformers are used to identify allowed torsion angles for your ligand. The alternative is to use the empirical rules in RESOLVE. ELBOW takes longer but is more accurate.
    group_search= 0 Enter the ID number of the group from the ligand to use to seed the search for conformations
    ligand_cc_min= 0.75 Enter the minimum correlation coefficient of the ligand to the map to quit searching for more conformations
    ligand_completeness_min= 1 Enter the minimum completeness of the ligand to the map to quit searching for more conformations
    local_search= True If local_search is True, then only the region within search_dist of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
    search_dist= 10 If local_search is True, then only the region within this distance of the point in the map with the highest local rmsd will be searched in the FFT search for fragments
    use_cc_local= False You can specify the use of a local correlation coefficient for scoring ligand fits to the map. If you do not do this, then the region over which the ligand is scored is all points within 2.5 A of the atoms in the ligand. If you do specify use_cc_local, then the region over which the ligand is scored is all these points, plus all the contiguous points that have density greater than 0.5 * sigma.
    ligands_from_ncs= False You can try to use ncs (from your partial model file or from your ncs_in file) along with any ligands already found to place additional copies of your ligand. Only applicable if there is one type of ligand.
    max_ligands_from_ncs= 1 You can specify how many of the ligands already found to consider using NCS (usually 1)
    n_group_search= 3 Enter the number of different fragments of the ligand that will be looked for in FFT search of the map
    n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at once; otherwise all are first searched at once, then individually up to the number specified
    n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are tested at once; otherwise all are first tested at once, then individually up to the number specified
    number_of_ligands= 1 Number of copies of the ligand expected in the asymmetric unit
    offsets_list= 7 53 29 You can specify an offset for the orientation of the templates in searching for ligands. This is used in generating diversity in models.
    refine_ligand= True You can carry out real-space refinement on the ligand after fitting
    ligand_occupancy= 1.0 You can set the initial occupancy of the ligand
    real_space_target_weight= 10. You can change the weight on the real-space term in real-space refinement on the ligand after fitting
  fitting Parameters for tracing ligand
    delta_phi_ligand= 40 Specify the angle (degrees) between successive tries in FFT search for fragments
    fit_phi_inc= 20 Specify the angle (degrees) between rotations around bonds
    fit_phi_range= -180 180 Range of bond rotation angles to search
  search_target
    ligand_near_chain= None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. If you set ligand_near_chain="None" or leave it blank or do not set it, then all chains will be included. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
    ligand_near_res= None You can specify where to search for the ligand either with search_center or with ligand_near_res and ligand_near_chain. The keywords ligand_near_res and ligand_near_chain refer to residue/chain in the file defined by input_partial_model_file (or model if running from command line).
    ligand_near_pdb= None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand + macromolecule and specify the ligand name with name_of_ligand_near_pdb.
    name_of_ligand_near_pdb= None You can specify where LigandFit should look for your ligands by providing a PDB file containing one or more copies of the ligand. If you want you can provide a PDB file with ligand + macromolecule and specify the ligand name with name_of_ligand_near_pdb.
    search_center= 0.0 0.0 0.0 Enter coordinates for center of search region (ignored if [0.,0.,0.])
  general
    extend_try_list= True You can fill out the list of parallel jobs to match the number of jobs you want to run at one time, as specified with nbatch.
    ligand_id= None You can specify an integer value for the ID of a ligand... This number will be added to whatever residue number the ligand search model in input_lig_file has. The keyword is only valid if a single copy of the ligand is to be found.
    nbatch= 5 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.
    nproc= 1 You can specify the number of processors to use (nproc) and the number of batches to divide the data into for parallel jobs. Normally you will set nproc to the number of processors available and leave nbatch alone. If you leave nbatch as None it will be set automatically, with a value depending on the Wizard. This is recommended. The value of nbatch can affect the results that you get, as the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch. If you set nproc=Auto and your machine has n processors, then it will use n-1 processors, or 1 if only 1 is available
    resolve_command_list= None Commands for resolve. One per line in the form: keyword value (value can be optional). Examples: coarse_grid resolution 200 2.0 hklin test.mtz NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
    coot_name= "coot" If your version of coot is called something else, then you can specify that here.
    i_ran_seed= 72432 Random seed (positive integer) for model-building and simulated annealing refinement
    raise_sorry= False You can have any failure end with a Sorry instead of simply a printout to the screen
    background= True When you specify nproc=nn, you can run the jobs in background (default if nproc is greater than 1) or foreground (default if nproc=1). If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
    check_wait_time= 1.0 You can specify the length of time (seconds) to wait between checking for subprocesses to end
    max_wait_time= 1.0 You can specify the length of time (seconds) to wait when looking for a file. If you have a cluster where jobs do not start right away you may need a longer time to wait. The symptom of too short a wait time is 'File not found'
    wait_between_submit_time= 1.0 You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is File exists:....
    cache_resolve_libs= True Use caching of resolve libraries to speed up resolve
    resolve_size= 12 Size for solve/resolve ("", "_giant", "_huge", "_extra_huge" or a number where 12=giant 18=huge)
    check_run_command= False You can have the wizard check your run command at startup
    run_command= "sh " When you specify nproc=nn, you can run the subprocesses as jobs in background with sh (default) or submit them to a queue with the command of your choice (e.g., qsub). If you have a multi-processor machine, use sh. If you have a cluster, use qsub or the equivalent command for your system. NOTE: If you set run_command=qsub (or otherwise submit to a batch queue), then you should set background=False, so that the batch queue can keep track of your runs. There is no need to use background=True in this case because all the runs go as controlled by your batch system. If nproc is greater than 1 and you use run_command='sh ' (or similar, sh is default) then normally you will use background=True so that all the jobs run simultaneously.
    queue_commands= None You can add any commands that need to be run for your queueing system. These are written before any other commands in the file that is submitted to your queueing system. For example on a PBS system you might say: queue_commands='#PBS -N mr_rosetta' queue_commands='#PBS -j oe' queue_commands='#PBS -l walltime=03:00:00' queue_commands='#PBS -l nodes=1:ppn=1' NOTE: you can put in the characters ' |
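The interaction of the find_all_ligands keywords number_of_ligands, cc_min and ligand_list documented above can be illustrated with a small sketch. This is a toy Python model of the stopping rule only, not the PHENIX implementation: place_ligands, toy_fit and the score table are hypothetical stand-ins for the real fitting done by phenix.ligandfit, and the numbers are invented for illustration. Raising an error when both limits are None is a choice made for this sketch, since the documentation does not say what happens in that case.

```python
def place_ligands(candidates, fit_ligand, number_of_ligands=None, cc_min=0.5):
    """Toy model of the documented loop: fit every candidate, keep the best, repeat.

    fit_ligand(ligand, placed) is a hypothetical callable that returns the
    correlation coefficient of that ligand in the next unoccupied density.
    """
    if number_of_ligands is None and cc_min is None:
        # Assumption of this sketch: refuse to run with no stopping criterion.
        raise ValueError("set number_of_ligands, cc_min, or both")
    if not candidates:
        raise ValueError("ligand_list is empty")

    placed = []
    # number_of_ligands is ignored if None.
    while number_of_ligands is None or len(placed) < number_of_ligands:
        scores = {ligand: fit_ligand(ligand, placed) for ligand in candidates}
        best = max(scores, key=scores.get)
        # cc_min is ignored if None; otherwise stop once the best fit is below it.
        if cc_min is not None and scores[best] < cc_min:
            break
        placed.append((len(placed) + 1, best, scores[best]))
    return placed


# Invented correlation values for three successive sites (illustration only).
toy_scores = [
    {"ATP": 0.82, "NAD": 0.61},
    {"ATP": 0.55, "NAD": 0.74},
    {"ATP": 0.41, "NAD": 0.38},
]


def toy_fit(ligand, placed):
    """Look up the invented score for the site that would be filled next."""
    return toy_scores[len(placed)][ligand]


result = place_ligands(["ATP", "NAD"], toy_fit, number_of_ligands=5, cc_min=0.5)
for site, ligand, cc in result:
    # File names follow the "SITE_1_ATP.pdb" pattern quoted in this document.
    print(f"SITE_{site}_{ligand}.pdb  cc={cc:.2f}")
```

With these invented scores the loop places ATP at the first site and NAD at the second, then stops at the third site because the best correlation there is below cc_min, even though number_of_ligands has not been reached.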