Rapid phase improvement and model-building using phase_and_build

Author(s)
Purpose
Usage: How phase_and_build works; Output files from phase_and_build; Parameters files in phase_and_build
Examples: Standard run of phase_and_build
Possible Problems: Specific limitations and problems
Literature
Additional information: List of all phase_and_build keywords

Author(s)

phase_and_build: Tom Terwilliger

Purpose

phase_and_build is a new and rapid method for improving the quality of your map and building a model. The approach is to carry out an iterative process of building a model as rapidly as possible and using this model in density modification to improve the map. This approach is related to the older phenix.autobuild approach. The difference is that in phenix.autobuild much effort was spent on building the best possible model at each stage before carrying out density modification, while in phenix.phase_and_build speed of model-building is optimized. The result is that phenix.phase_and_build is 10 times faster than phenix.autobuild, yet it produces nearly as good a model in the end. The phenix.phase_and_build approach will also find NCS from your starting map and apply it during density modification.

Usage

How phase_and_build works:

phase_and_build first identifies a free set of reflections if they are not supplied. A map without the test set reflections is created for real-space refinement. A set of data for density modification is created which contains anisotropy-corrected data (if supplied).
Next phase_and_build estimates the solvent fraction (if not supplied) from the sequence file and cell parameters and the maximum and most likely number of NCS copies.
A starting map is obtained by density modification of the input data (unless a map coefficients file is supplied).
NCS is identified from the starting map (if the maximum number of NCS copies is greater than 1). If a heavy-atom file is supplied, NCS will first be identified from the heavy-atom sites, and the map will be used if the expected number of sites is not obtained.
Next one or more cycles of phase improvement are carried out. In each cycle a model is built using phenix.build_one_model, the model is refined in real-space and/or in reciprocal space, and then the model is included in density modification to create a new map.
Finally, a full model-building cycle is carried out using the most recent map. One or more models are built with phenix.build_one_model and are refined, then are combined, again with phenix.build_one_model. The resulting model is refined and sequence assignment and fitting of short loops is carried out with phenix.assign_sequence. Then longer loops are fit with phenix.fit_loops and the final model is refined and written out.
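
The solvent-fraction estimate in the second step above can be illustrated with the standard Matthews relationship (protein volume fraction of roughly 1.23/Vm). This is only a rough sketch and not necessarily the calculation phase_and_build performs internally. The cell volume, number of symmetry operators, residue count, NCS copy number and the average residue mass of 110 Da are all assumed values chosen for illustration.

```shell
# Hypothetical inputs (not taken from any real dataset)
cell_volume=500000     # unit-cell volume in cubic Angstroms
n_symops=4             # symmetry operators in the space group
n_residues=250         # residues in one copy, counted from the sequence file
ncs_copies=2           # assumed number of NCS copies

solvent_fraction=$(awk -v v="$cell_volume" -v z="$n_symops" \
    -v r="$n_residues" -v n="$ncs_copies" 'BEGIN {
    mass = r * 110.0 * n * z        # approximate protein mass in the cell (Da)
    vm = v / mass                   # Matthews coefficient (A^3/Da)
    printf "%.2f", 1.0 - 1.23 / vm  # fraction of the cell not occupied by protein
}')
echo "estimated solvent fraction: $solvent_fraction"
```

If the automatic estimate looks wrong for your crystal, a value obtained this way could be supplied with the solvent_fraction and ncs_copies keywords listed at the end of this page.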

Output files from phase_and_build

phase_and_build.pdb: A PDB file with the resulting model
phase_and_build_map_coeffs.mtz: An MTZ file with optimized phases

Parameters files in phase_and_build

When you run phenix.phase_and_build it will write out a phase_and_build_params.eff parameter file that can be used to re-run phenix.phase_and_build (just as for essentially all PHENIX methods). In addition, phenix.phase_and_build will write out the parameters files for the intermediate methods used as part of phenix.phase_and_build to the temporary directory used in building. You can run these with:

phenix.find_ncs temp_dir/find_ncs_params.eff # runs NCS identification
phenix.autobuild temp_dir/AutoBuild_run_1_/autobuild.eff   # runs first cycle of density modification
phenix.build_one_model temp_dir/build_one_model_params.eff # runs most recent model-building
phenix.assign_sequence temp_dir/assign_sequence_params.eff # runs sequence assignment and filling short gaps
phenix.fit_loops temp_dir/fit_loops_params.eff # runs loop fitting

This gives you control of all the steps in map improvement and model-building, in addition to letting you run them all together with phenix.phase_and_build.
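
As a convenience, the following sketch lists which of the intermediate steps can currently be re-run by checking which parameter files are present in the temporary directory. The function name list_rerun_cmds is hypothetical. The program and file names are the ones shown in the commands above, and the autobuild step is left out because its parameter file lives in a subdirectory.

```shell
# Print a re-run command for each intermediate parameter file found in a directory.
list_rerun_cmds() {
    dir=$1
    for step in \
        phenix.find_ncs:find_ncs_params.eff \
        phenix.build_one_model:build_one_model_params.eff \
        phenix.assign_sequence:assign_sequence_params.eff \
        phenix.fit_loops:fit_loops_params.eff
    do
        prog=${step%%:*}          # text before the colon: program name
        eff=$dir/${step#*:}       # text after the colon: parameter file
        [ -f "$eff" ] && echo "$prog $eff"
    done
    return 0
}

list_rerun_cmds temp_dir   # temp_dir is the default temporary directory name
```

Each printed line is a command that can be pasted into the shell after editing the corresponding .eff file.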

Examples

Standard run of phase_and_build:

Running phase_and_build is easy. From the command-line you can type:

phenix.phase_and_build exptl_fobs_phases_freeR_flags.mtz sequence.dat

If you want to supply a file with anisotropy-corrected data to use in density modification you can do so:

phenix.phase_and_build data=exptl_fobs_phases_freeR_flags.mtz \
seq_file=sequence.dat \
aniso_corrected_data=solve_1.mtz

where solve_1.mtz is anisotropy-corrected (the amplitudes are not measured amplitudes, but rather are corrected with an anisotropic B-factor), and exptl_fobs_phases_freeR_flags.mtz contains experimental amplitudes. These two files normally will contain the same phase information. (Usually these files will come from phenix.autosol.)

You can also add a starting model or a starting map to phenix.phase_and_build. This means that you can run it once, get a new model and map, then run it again to further improve your model and map.
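
A second pass of this kind might look like the sketch below. It uses the pdb_in and map_coeffs keywords from the keyword list and assumes the first run wrote its results under the names given in the output files section. The output names chosen for the second run (via pdb_out and mtz_out) are hypothetical. The command is only assembled and printed here so that you can check it before running it.

```shell
# Second pass: reuse the model and map written by a first run.
# Input names follow the output files section; adjust them if your run differs.
cmd="phenix.phase_and_build \
data=exptl_fobs_phases_freeR_flags.mtz \
seq_file=sequence.dat \
pdb_in=phase_and_build.pdb \
map_coeffs=phase_and_build_map_coeffs.mtz \
pdb_out=phase_and_build_2.pdb \
mtz_out=phase_and_build_map_coeffs_2.mtz"
echo "$cmd"
```

Writing the second-run results to new file names keeps the first-run model and map available for comparison.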

Possible Problems

Specific limitations and problems:

phenix.phase_and_build does not have the full flexibility of phenix.autobuild, so you may want to get a nearly-complete model with phenix.phase_and_build and then use phenix.autobuild to increase the completeness and quality.
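
A follow-up run of that kind could be set up roughly as follows. This is a sketch only: the data=, seq_file= and model= keywords for phenix.autobuild are assumed here and are not documented on this page, so check them against the phenix.autobuild documentation before use. The command is printed rather than executed.

```shell
# Follow-up rebuild with phenix.autobuild, starting from the phase_and_build model.
# The keyword names below are assumptions; verify them for your PHENIX version.
autobuild_cmd="phenix.autobuild \
data=exptl_fobs_phases_freeR_flags.mtz \
seq_file=sequence.dat \
model=phase_and_build.pdb"
echo "$autobuild_cmd"
```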

Literature

Additional information

List of all phase_and_build keywords

------------------------------------------------------------------------------- 
Legend: black bold - scope names
        black - parameter names
        red - parameter values
        blue - parameter help
        blue bold - scope help
        Parameter values:
          * means selected parameter (where multiple choices are available)
          False is No
          True is Yes
          None means not provided, not predefined, or left up to the program
          "%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- 
phase_and_build
   input_files
      data= None MTZ file containing FP SIGFP PHIB FOM HLA HLB HLC HLD
            FreeR_flags Used as source of FP SIGFP freeR information in
            refinement and as source of experimental phase information for
            density modification. A suitable file is
            exptl_fobs_phases_freeR_flags.mtz from autosol or autobuild NOTE:
            This is a temporary requirement. You can also supply any other
            format of file if the data columns can be identified automatically
      labin= None Labin line for MTZ file with FreeR_flags. This is optional
             if phase_and_build can guess the labels. Otherwise specify a line
             like: FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC
             HLD=HLD FreeR_flags=myFreeR_flags
      aniso_corrected_data= None Optional MTZ file containing
                            anisotropy-corrected data with FP SIGFP PHIB FOM
                            HLA HLB HLC HLD Used as source of FP SIGFP
                            information for density modification. A suitable
                            file is solve_1.mtz or phaser_1.mtz If none
                            supplied, the mtz file specified as data will be
                            used.
      labin_aniso_corrected_data= None Labin line for aniso_corrected data MTZ
                                  file. This is optional if phase_and_build
                                  can guess the labels
      map_file_fom= None You can specify the FOM of the map_coeffs file
                    (useful in cases where the map file has only FWT PHFWT and
                    no FOM column). This FOM is used to set the default
                    smoothing radius for the density modification solvent
                    boundary.
      map_file_is_density_modified= False You can specify that the
                                    input_map_file has been density modified.
                                    (This changes the assumptions on
                                    statistics of the map.)
      ha_file= None Heavy atom sites to be used to find NCS and to remove high
               peaks of density in initial density modification
      seq_file= None File with 1-letter code sequence of molecule. Chains
                separated by blank line or greater-than sign
      pdb_in= None Optional starting PDB file (ends will be extended if
              present)
      map_coeffs= None MTZ file with coefficients for a map
      labin_map_coeffs= None Labin line for MTZ file with map coefficients.
                        This is optional if build_one_model can guess the
                        correct coefficients for FP PHI and FOM. Otherwise
                        specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP
                        is your column label for FP
      ncs_info_file= None ncs_spec file with NCS information (written by
                     simple_ncs_from_pdb or find_ncs)
      remove_free NOTE remove_free params only used in build_one_model, not
                  phase_and_build
         free_in= None MTZ file containing FreeR_flags NOTE free_in only used
                  in build_one_model. Ignored by phase_and_build Used as
                  source of freeR information for real_space refinement. Note
                  other columns of data may be present and can be used in
                  reciprocal-space refinement. A suitable file is
                  exptl_fobs_phases_freeR_flags.mtz from autosol/autobuild or
                  my_model_refine_data.mtz from phenix.refine
         labin_free= None Labin line for MTZ file with FreeR_flags. This is
                     optional if build_one_model can guess the correct
                     coefficients for FreeR_flags. Otherwise specify:
                     FreeR_flags=myFreeR_flags
         map_coeffs_no_free= None Optional MTZ file with coefficients for a
                             map with freeR set removed. Use instead of
                             free_in. This map will be used for real-space
                             refinement
         labin_no_free= None Labin line for MTZ file with map coefficients and
                        freeR set removed. This is optional if build_one_model
                        can guess the correct coefficients for FP PHI and FOM.
                        Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM
                        where myFP is your column label for FP
   output_files
      mtz_out= 'phase_and_build_map_coeffs.mtz' Output MTZ file with map coeffs
      pdb_out= build_one_model.pdb Output PDB file
      log= build_one_model.log Output log file
      params_out= phase_and_build_params.eff Parameters file to rerun
                  phase_and_build
      job_title= None Job title in PHENIX GUI, not used on command line
   cycles
      ncycle= 2 Number of initial cycles of model-building, refinement and
              density modification
      nmodels= 1 Number of models to build with map from initial cycles
   ncs
      find_ncs= True Find NCS from input_ha_file or density or chains in the
                model
      update_ncs= True Update NCS as new information becomes available
      use_ha_in_ncs= True Use ha_file as source of NCS information
      optimize_ncs= True Try to map NCS operators close together
      minimum_ncs_cc= None Minimum CC for NCS (default unless extreme denmod)
   density_modification
      truncate_ha_sites_in_resolve= True You can choose to truncate the
                                    density near heavy-atom sites at a maximum
                                    of 2.5 sigma. This is useful in cases
                                    where the heavy-atom sites are very
                                    strong, and rarely hurts in cases where
                                    they are not. The heavy-atom sites are
                                    specified with "ha_file"
      use_hl_anom_in_denmod= False You can choose to use HL coefficients not
                             including model information (HLanom) in density
                             modification. They must be present in your data
                             file
      use_hl_anom_in_denmod_with_model= False You can choose to use HL
                                        coefficients not including model
                                        information (HLanom) in density
                                        modification when model information is
                                        used. They must be present in your
                                        data file
      extreme_dm= False Turns on extreme density modification if FOM is up to
                  fom_for_extreme_dm
      fom_for_extreme_dm= 0.35 If extreme_dm is true and FOM of phasing is up
                          to fom_for_extreme_dm then defaults for density
                          modification become: mask_type=wang wang_radius=20
                          mask_cycles=1 minor_cycles=4
   refinement
      refine= True Refine with standard reciprocal-space refinement
      refine_pdb_in= False Refine input model (if any) before using it
      use_hl_anom_in_refinement= False You can choose to use HL coefficients
                                 not including model information (HLanom) in
                                 refinement. They must be present in your data
                                 file
      include_ha_in_refinement= True You can choose to include your heavy-atom
                                sites in the model for refinement. This is a
                                good idea if your structure includes these
                                heavy-atom sites (i.e., for SAD or MAD
                                structures where you are not using a native
                                dataset). Heavy-atom sites that overlap an
                                atom in your model will be ignored.
      refine_se_occ= True You can choose to refine the occupancy of SE atoms
                     in a SEMET structure (default=True). This only applies if
                     semet=true
      ordered_solvent= True You can add waters during refinement
      flood_with_waters= False You can use the parameters file in
                         $PHENIX/phenix/phenix/autosol/flood.par to add lots
                         of waters during the phase improvement stage
      macro_cycles= None You can set the number of macro_cycles in refinement
                    Default (None) will use phenix.refine default
      add_free_r_if_needed= True If your input data file has no FreeR_flag
                            then it will be added
      allow_overlapping= True You can allow atoms in your ligand files to
                         overlap atoms in your protein/nucleic acid model.
                         This overrides 'keep_pdb_atoms' Useful in early
                         stages of model-building and refinement The ligand
                         atoms get the altloc indicator 'L' NOTE: the ligand
                         occupancy gets refined by default. You can turn this
                         off with fix_ligand_occupancy=True
      fix_ligand_occupancy= False If allow_overlapping=True then ligand
                            occupancies are refined as a group. You can turn
                            this off with fix_ligand_occupancy=true NOTE: has
                            no effect if allow_overlapping=False
      skip_hexdigest= False You may wish to ignore the hexdigest of the free R
                      flags in your input PDB file if the dataset you provide
                      is not identical to the one that you refined with (but
                      has the same free R flags).
      ncs_in_refinement= *torsion cartesian None Use torsion_angle refinement
                         of NCS. Alternative is cartesian or None (None will
                         use phenix.refine default)
      correct_special_position_tolerance= None Adjust tolerance for special
                                          position check. If 0., then check
                                          for clashes near special positions
                                          is not carried out. This sometimes
                                          allows phenix.refine to continue
                                          even if an atom is near a special
                                          position. If 1., then checks within
                                          1 A of special positions. If None,
                                          then uses phenix.refine default. (1)
      rs_refine= True You can run real-space refinement after model-building
                 NOTE: real-space refinement requires a source of FreeR_flag,
                 and standard (reciprocal-space) refinement requires Fobs,
                 SigFobs and a source of FreeR_flag. For real-space refinement
                 you can supply either an mtz file with a FreeR_flag column or
                 an mtz map file that has all the FreeR reflections removed
   model_building
      fit_loops= True Include loop fitting in full model-building. At lower
                 resolution (3.5 A) it may be best to skip this step
      trace_loops= False Use trace_loops algorithm in loop fitting
      standard_loops= True Use standard_loops algorithm in loop fitting
      loop_lib= False Use loop_lib algorithm in loop fitting
      assign_sequence= True Include sequence assignment and short loop joining
                       in full model-building. At lower resolution (3.5 A) it
                       may be best to skip this step Only applicable for
                       chain_type=PROTEIN
      min_percent_assigned_for_assign_sequence= 50 Skip assign_sequence if
                                                initial percentage sequence
                                                assigned is lower than
                                                min_percent_assigned_for_assign_sequence
      quick= False You can run quickly (superquick_build/delta_phi=30.) or
             more thoroughly (default, thorough_build/delta_phi=20.)
      insert_helices= False You can find helices and use them as a starting
                      point for model-building. This is useful if your
                      resolution is worse than 3 A.
      i_ran_seed= 712341 Random seed for model-building
   directories
      temp_dir= "temp_dir" Optional temporary work directory
      output_dir= "" Output directory where files are to be written
      top_output_dir= "" Top output directory for control files
      base_gui_dir= None Base output path for Phenix GUI only.
   crystal_info
      ncs_copies= none Number of NCS copies (defines solvent_fraction with
                  sequence) Normally determined automatically
      resolution= 0. high-resolution limit for map calculation
      solvent_fraction= None You can specify the solvent fraction Normally it
                        is set automatically
      chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and
                  side-chain atoms)
      semet= False You can specify that your protein contains selenomethionine
   control
      verbose= False Verbose output
      raise_sorry= False Raise sorry if problems
      debug= False Debugging output
      dry_run= False Just read in and check parameter names
      write_run_directory_to_file= None The working directory name is written
                                   to this file
      resolve_command_list= None You can supply any resolve command here NOTE:
                            for command-line usage you need to enclose the
                            whole set of commands in double quotes (")
                            and each individual command in single quotes (')
                            like this: resolve_command_list="'no_build'
                            'b_overall 23' "
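
The keywords above can also be collected in a parameters file instead of being typed on the command line. The sketch below writes a small file using scope and keyword names from this list. The file name is hypothetical, and the brace-delimited layout is assumed to match the phase_and_build_params.eff file that phase_and_build writes, so compare against that file if in doubt. Here fit_loops is switched off, as suggested above for lower-resolution data, and ncycle is raised from its default of 2.

```shell
params_file=my_phase_and_build.eff   # hypothetical file name

# Write a small parameters file; scope and keyword names come from the list above.
cat > "$params_file" <<'EOF'
phase_and_build {
  input_files {
    data = exptl_fobs_phases_freeR_flags.mtz
    seq_file = sequence.dat
  }
  cycles {
    ncycle = 3
  }
  model_building {
    fit_loops = False
  }
}
EOF

echo "wrote $params_file; run it with: phenix.phase_and_build $params_file"
```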