Python-based Hierarchical ENvironment for Integrated Xtallography
Fitting loops to fill in gaps in models with fit_loops
Purpose

fit_loops is a tool for building a loop into density to connect existing chain ends. You supply a model with a gap, a sequence file, and coefficients for an electron density map, and you specify the first and last residues to be built. fit_loops then attempts to build that loop. fit_loops fits one loop at a time, but if you have multiple identical chains, you can fit them all at once.

You can use either of two methods to fit the loops. By default, fit_loops uses resolve chain extension to try to trace residues from the ends of segments in your input PDB file. If it can connect the segments, it writes out the connecting loops. Alternatively, you can use a loop library supplied with PHENIX to connect ends of segments from your input PDB file.

If you want a more complete model-building process, use phenix.autobuild instead. fit_loops can be run from the command line or from the PHENIX GUI.

Usage

How fit_loops works:

fit_loops calculates a map from the supplied map coefficients, then tries to extend the ends of the supplied model into the gap region, following the electron density in the map.

Output files from fit_loops

model_with_loops.pdb: The output from fit_loops is a new PDB file containing your input model with the newly built loop inserted into it (if a loop could be found). The output file name is controlled by the keyword pdb_out.

Examples

Standard run of fit_loops:

A typical command-line input would be:

    phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz \
      seq_file=nsf.seq start=37 end=43 chain_id=None

This will fit a loop starting with residue 37 and ending with residue 43 in nsf_gap.pdb. phenix.fit_loops expects that your existing model nsf_gap.pdb has a chain ending at residue 36 and another starting at residue 44. Because chain_id=None in this example, if nsf_gap.pdb contains multiple chains A, B and C, the gap will be filled in all three.
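The gap-selection rule described above (a loop from residue 37 to 43 needs flanking residues 36 and 44, and chain_id=None matches every chain) can be sketched in Python. This is a minimal illustration under stated assumptions, not fit_loops code: the function find_gaps and the dictionary-of-residue-numbers model are hypothetical, and a gap is assumed to be any break in the residue numbering of a chain.

```python
from typing import Dict, List, Optional, Tuple


def find_gaps(
    chains: Dict[str, List[int]],
    chain_id: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[Tuple[str, int, int]]:
    """Return (chain, first_missing, last_missing) for each internal gap.

    Hypothetical helper: None for chain_id, start or end matches anything,
    mirroring the documented meaning of those fit_loops keywords.
    """
    gaps = []
    for cid, residues in sorted(chains.items()):
        if chain_id is not None and cid != chain_id:
            continue  # chain not selected
        nums = sorted(set(residues))
        for before, after in zip(nums, nums[1:]):
            if after - before <= 1:
                continue  # consecutive residues: no gap here
            first, last = before + 1, after - 1  # residues to be built
            if start is not None and first != start:
                continue
            if end is not None and last != end:
                continue
            gaps.append((cid, first, last))
    return gaps


# Hypothetical model: chains A, B and C each end at 36 and resume at 44.
model = {cid: list(range(1, 37)) + list(range(44, 80)) for cid in "ABC"}
print(find_gaps(model, chain_id=None, start=37, end=43))
# [('A', 37, 43), ('B', 37, 43), ('C', 37, 43)]
print(find_gaps(model, chain_id="B", start=37, end=43))
# [('B', 37, 43)]
```

Treating None as a wildcard keeps a single call usable both for one chain and for all identical chains at once.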
If you want (or need) to specify the column names from your MTZ file, tell fit_loops which columns are FP and PHIB (and optionally FOM), in this format:

    phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz \
      seq_file=nsf.seq start=37 end=43 chain_id=None \
      labin="FP=2FOFCWT PHIB=PH2FOFCWT"

(The keyword list notes that labin is kept for backward compatibility and that map_coeff_labels is normally used instead.)

If you want to try to fit a loop with poor density, you might want to lower the minimum map-model correlation required for the loop (loop_cc_min, default 0.2):

    phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz \
      seq_file=nsf.seq start=37 end=43 chain_id=None \
      loop_cc_min=0.1

To use the loop library in PHENIX, use the keyword loop_lib:

    phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz \
      seq_file=nsf.seq start=37 end=39 chain_id=None loop_lib=True

This will fit a loop starting with residue 37 and ending with residue 39. Loops in the library are currently at most 3 residues long.

To use the loop library in PHENIX to try to connect any pair of segments that have a suitable geometrical relationship, use the keyword connect_all (described as connect_all_segments in the keyword list):

    phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz \
      seq_file=nsf.seq connect_all=True

This will go through all pairs of segments, trying to connect them with a loop from the PHENIX loop library. Note that this is a last-resort approach; normally, use the default and let fit_loops connect segments that are close in sequence.
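The example runs above differ only in a few keywords, so a small Python helper that assembles the argument list can make scripted runs less error-prone. This is a hypothetical wrapper (build_fit_loops_command is not part of PHENIX): it only builds the command, applies the documented 3-residue loop library limit as an assumed pre-check, and uses the standard-library shlex.join (Python 3.8+) to show the shell-quoted form.

```python
import shlex
from typing import List, Optional

# From the text above: library loops are currently at most 3 residues long.
LOOP_LIB_MAX_LENGTH = 3


def build_fit_loops_command(
    pdb_in: str,
    mtz_in: str,
    seq_file: str,
    start: int,
    end: int,
    chain_id: Optional[str] = None,
    labin: Optional[str] = None,
    loop_cc_min: Optional[float] = None,
    loop_lib: bool = False,
) -> List[str]:
    """Hypothetical helper: return a phenix.fit_loops argument list."""
    n_residues = end - start + 1
    if n_residues < 1:
        raise ValueError("end must not be smaller than start")
    if loop_lib and n_residues > LOOP_LIB_MAX_LENGTH:
        raise ValueError(
            f"loop_lib supports at most {LOOP_LIB_MAX_LENGTH} residues, got {n_residues}"
        )
    args = [
        "phenix.fit_loops",
        f"pdb_in={pdb_in}",
        f"mtz_in={mtz_in}",
        f"seq_file={seq_file}",
        f"start={start}",
        f"end={end}",
        f"chain_id={chain_id}",  # None becomes "None", i.e. any chain
    ]
    if labin is not None:
        args.append(f"labin={labin}")  # one argument, even though it contains a blank
    if loop_cc_min is not None:
        args.append(f"loop_cc_min={loop_cc_min}")
    if loop_lib:
        args.append("loop_lib=True")
    return args


cmd = build_fit_loops_command(
    "nsf_gap.pdb", "map_coeffs.mtz", "nsf.seq", 37, 43,
    labin="FP=2FOFCWT PHIB=PH2FOFCWT",
)
print(shlex.join(cmd))
# phenix.fit_loops pdb_in=nsf_gap.pdb mtz_in=map_coeffs.mtz seq_file=nsf.seq start=37 end=43 chain_id=None 'labin=FP=2FOFCWT PHIB=PH2FOFCWT'
```

Passing the list directly to subprocess.run(cmd) would avoid shell quoting altogether, since each keyword=value pair is already a separate argument.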
List of all fit_loops keywords

-------------------------------------------------------------------------------
Legend: scope names are followed by the parameters they contain (indented).
Each parameter is shown as name= default value, followed by its help text.

Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------

fit_loops
  input_files
    pdb_in= None
        PDB file with gap to fill.
    mtz_in= None
        MTZ file with coefficients for a map
    map_coeff_labels= None
        If map coefficients cannot be identified automatically from your MTZ file, you can specify the label or labels for them. (Please separate labels with blank space, MTZ columns grouped together separated by commas with no blanks.) You can specify:
          map coefficient labels (e.g., FWT,PHIFWT)
          amplitudes and phases (e.g., FP,SIGFP PHIB), or
          amplitudes, phases and weights (e.g., FP,SIGFP PHIB FOM)
    labin= ""
        Labin line for MTZ file with map coefficients. Normally use map_coeff_labels instead; labin is available for backward compatibility. You can specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
    seq_file= None
        Sequence file (1-letter code, one copy of each chain)
    seq_prob_file= None
        File seq_prob.dat from resolve sequence alignment
  output_files
    pdb_out= connect.pdb
        Output PDB file (will be missing if no result).
    log= None
        Output logfile
    params_out= fit_loops_params.eff
        Parameters file to rerun fit_loops
  fitting
    min_acceptable_prob= None
        Minimum loop probability to consider
    refine_loops= True
        Refine fitted loops in loop_lib
    chain_id= None
        Chain ID for chain containing missing loops. If None, any chain ID is allowed. All missing segments matching chain_id, start and end will be fit.
    start= None
        Starting residue number of loop(s) to be fit. If None, any start is allowed. All missing segments matching chain_id, start and end will be fit.
    end= None
        Ending residue number of loop(s) to be fit. If None, any end is allowed. All missing segments matching chain_id, start and end will be fit.
    remove_loops= False
        Remove existing residues and replace with new loop. All segments matching chain_id, start and end will be fit.
    skip_trim= True
        If skip_trim=True (default), the model with added loops is written out without checking for overlaps with non-sequence-aligned residues. If skip_trim=False, this check is carried out. Note that skip_trim=False can cause some residues to be de-assigned from sequence if they cannot be successfully matched to density. In such cases you might try skip_trim=True.
    score_min= 1.0
        Minimum connection score for connect_all_segments
    save_acceptable_loops= False
        Just return a file with possible loops
    n_random_loop= 200
        Number of tries at building loops
    connect_all_segments= False
        Try to connect all segments to each other, regardless of sequence and residue numbers. Note: this is a last-resort approach. Normally just use the default and let fit_loops connect segments that are close in sequence.
    sequential_only= False
        Only connect adjacent segments in connect_all_segments
    ignore_sequence_register= False
        Ignore the input sequence register
    all_assigned= False
        Assume all residues in model can be assigned to sequence
    sequence_offset= None
        Input sequence file residue numbers are offset by sequence_offset before use (same as adding sequence_offset*X to the beginning). Note: the number of sequence_offset values must match the number of chains in the sequence file.
    loop_cc_min= 0.2
        Minimum loop map-model correlation
    aggressive= False
        Aggressive loop building (risky)
    target_insert= None
        You can try to force the number of residues to insert with trace_loops. If None, try to fill in the gap based on the number of missing residues. If set and greater than 0, only accept the number of residues given. If zero, take any length. Not supported.
    loop_lib= False
        Use loop library to fit loops. Only applicable for chain_type=PROTEIN
    standard_loops= True
        Use standard loop fitting
    trace_loops= False
        Use loop tracing to fit loops. Only applicable for chain_type=PROTEIN
    refine_trace_loops= True
        Refine loops (real-space) after trace_loops
    density_of_points= None
        Packing density of points to consider as possible CA atoms in trace_loops. Try 1.0 for a quick run, up to 5 for a much more thorough run. If None, a value is chosen depending on the value of quick.
    max_density_of_points= None
        Maximum packing density of points to consider as possible CA atoms in trace_loops.
    cutout_model_radius= None
        Radius to cut out density for trace_loops. If None, guess based on length of loop.
    max_cutout_model_radius= 20.
        Maximum value of cutout_model_radius to try
    padding= 1.
        Padding for cut-out density in trace_loops
    max_span= 30
        Maximum length of a gap to try to fill
    max_overlap= None
        Maximum number of residues from ends to start with (1=use existing ends, 2=one in from ends, etc.). If None, set based on value of quick.
    min_overlap= None
        Minimum number of residues from ends to start with (1=use existing ends, 2=one in from ends, etc.)
  crystal_info
    resolution= 0.
        High-resolution limit for map calculation
    chain_type= *PROTEIN DNA RNA
        Chain type (for identifying main-chain and side-chain atoms)
  directories
    temp_dir= "temp_dir"
        Optional temporary work directory
    output_dir= None
        Output directory where files are to be written
    gui_output_dir= None
        GUI use only; does not apply to command line version
  control
    verbose= False
        Verbose output
    quick= False
        Try to run quickly
    raise_sorry= False
        Raise sorry if problems
    debug= False
        Debugging output
    dry_run= False
        Just read in and check parameter names
    coarse_grid= False
        Use coarse grid (saves on memory)
    i_ran_seed= None
        Random seed
    resolve_command_list= None
        Commands for resolve. One per line in the form: keyword value (value can be optional). Examples:
          coarse_grid
          resolution 200 2.0
          hklin test.mtz
        NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
    write_run_directory_to_file= None
        The working directory name is written to this file
    pickled_arg_dict= None
        Pickled keywords to __init__ are in this file
    nproc= 1
        You can specify the number of processors to use
    max_wait_time= 1.0
        You can specify the length of time (seconds) to wait when looking for a file. If you have a cluster where jobs do not start right away, you may need a longer time to wait. The symptom of too short a wait time is 'File not found'.
    wait_between_submit_time= 1.0
        You can specify the length of time (seconds) to wait between each job that is submitted when running sub-processes. This can be helpful on NFS-mounted systems when running with multiple processors to avoid file conflicts. The symptom of too short a wait_between_submit_time is 'File exists:....'
    background= None
        Run jobs in background or not (if nproc is greater than 1). Usually set automatically. If run_command is sh or csh, True.
    run_command= "sh "
        Command for running jobs (e.g., sh or qsub)
  non_user_params
    print_citations= True
        Print citation information at end of run
  gui
      GUI-specific parameters, not used on command line
    result_file= None
    job_title= None
        Job title in PHENIX GUI, not used on command line
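Two of the keyword descriptions above specify string formats: map_coeff_labels (commas inside an MTZ column group, a blank between groups) and resolve_command_list on the command line (each command in single quotes, the whole set in double quotes). The hypothetical Python helpers below, which are not part of PHENIX, sketch those two formats; rejecting embedded separators and quotes is an added assumption to keep the output unambiguous.

```python
from typing import Iterable, Sequence


def format_map_coeff_labels(groups: Sequence[Sequence[str]]) -> str:
    """Hypothetical helper: commas inside a column group, one blank between groups."""
    for group in groups:
        for label in group:
            if " " in label or "," in label:
                raise ValueError(f"unexpected separator in label {label!r}")
    return " ".join(",".join(group) for group in groups)


def format_resolve_command_list(commands: Iterable[str]) -> str:
    """Hypothetical helper: single-quote each command, double-quote the whole set."""
    quoted = []
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError(f"quotes are not allowed inside a command: {command!r}")
        quoted.append(f"'{command}'")
    # Trailing blank before the closing quote reproduces the documented example.
    return 'resolve_command_list="' + " ".join(quoted) + ' "'


print(format_map_coeff_labels([["FWT", "PHIFWT"]]))
# FWT,PHIFWT
print(format_map_coeff_labels([["FP", "SIGFP"], ["PHIB"], ["FOM"]]))
# FP,SIGFP PHIB FOM
print(format_resolve_command_list(["no_build", "b_overall 23"]))
# resolve_command_list="'no_build' 'b_overall 23' "
```

A map_coeff_labels value that contains a blank would also need quoting on the shell command line, as the labin example does.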