Correlation of map and model after adjusting model for origin shifts with get_cc_mtz_pdb

Author(s)

get_cc_mtz_pdb: Tom Terwilliger

Purpose

get_cc_mtz_pdb is a command line tool for adjusting the origin of a PDB file using space-group symmetry so that the PDB file superimposes on a map, obtaining the correlation of model and map, and analyzing the correlation for each residue.

Usage

How get_cc_mtz_pdb works:

get_cc_mtz_pdb calculates a model map based on the supplied PDB file, then uses RESOLVE to find the origin shift (using space-group symmetry) that maximizes the correlation of this model map with a map calculated from the map coefficients supplied in an mtz file. This shift is applied to the atoms in the PDB file to create offset.pdb, and the correlation of offset.pdb with the map is then analyzed residue by residue. Atoms and residues that are out of density or in weak density are flagged.
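The origin-shift search described above can be illustrated with a toy one-dimensional example. This is only a sketch of the idea (try each allowed shift, keep the one with the highest correlation), not the RESOLVE implementation. The function names, the periodic 1-D grid, and the list of allowed shifts are all assumptions made for illustration.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length density lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat map has no defined correlation; report 0
    return cov / sqrt(var_a * var_b)


def shift_periodic(density, shift):
    """Move a periodic 1-D density by `shift` grid points."""
    n = len(density)
    return [density[(i - shift) % n] for i in range(n)]


def best_origin_shift(model_map, target_map, allowed_shifts):
    """Return (shift, cc) for the allowed shift with the highest correlation."""
    scored = [
        (correlation(shift_periodic(model_map, s), target_map), s)
        for s in allowed_shifts
    ]
    cc, shift = max(scored)
    return shift, cc


# Toy data (hypothetical): the model density equals the target density,
# but its origin is displaced by half of an 8-point unit cell.
target_map = [0.0, 0.1, 1.0, 0.2, 0.0, 0.0, 0.3, 0.1]
model_map = shift_periodic(target_map, -4)

# Assumed set of shifts permitted by symmetry in this toy cell.
allowed_shifts = [0, 4]

shift, cc = best_origin_shift(model_map, target_map, allowed_shifts)
print(f"best shift = {shift} grid points, correlation = {cc:.3f}")
```

In the real program the maps are three-dimensional and the candidate shifts come from the space group, but the selection criterion is the same: the shift that gives the highest map correlation.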

You can set several parameters to define how the correlations are calculated.

By default model density is calculated using the atom types, occupancies and isotropic thermal factors (B-values) supplied in the PDB file. If you specify

scale=True

then an overall B as well as an increment in B-values for each atom beyond CB (for proteins) will be added to the values in the PDB file, after adjusting these parameters to maximize the map correlation.
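As a rough illustration of what scale=True changes, the sketch below adds an overall B to every atom and an extra increment to atoms beyond CB. It assumes that "beyond CB" means any protein atom other than N, CA, C, O and CB, and that the increment is the same for all such atoms; the text does not spell out either point. The names scaled_b, b_overall and delta_b are chosen for illustration, and the optimization of these two values against the map correlation is not shown.

```python
# Assumption: atoms "beyond CB" are all atoms not in this set.
MAIN_CHAIN_AND_CB = {"N", "CA", "C", "O", "CB"}


def scaled_b(atom_name, b_iso, b_overall, delta_b):
    """Return the B-value used for model density after scaling.

    b_overall is added to every atom; delta_b is added only to atoms
    beyond CB (side-chain atoms past the beta carbon).
    """
    b = b_iso + b_overall
    if atom_name not in MAIN_CHAIN_AND_CB:
        b += delta_b
    return b


# Hypothetical atoms: (name, B-value from the PDB file)
atoms = [("CA", 20.0), ("CB", 22.0), ("CG", 25.0)]

adjusted = [
    (name, scaled_b(name, b, b_overall=5.0, delta_b=3.0)) for name, b in atoms
]
for name, b in adjusted:
    print(name, b)
```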

If you specify

use_only_refl_present_in_mtz=True

then the model-based map will be calculated using the same set of reflections as the map calculated from your input mtz file. This reduces the effect of missing reflections on the calculation (but the resulting correlation is no longer the actual map-model correlation).

In the calculation of the map correlation in the region of the model, the region where the model is located is defined as all points within a distance rad_max of an atom in the model. The value of rad_max is adjusted in each case to maximize this correlation. Its value is typically similar to the high-resolution limit of the map.
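The region definition above can be sketched as follows: select the grid points within rad_max of any atom, compute the correlation over those points only, and repeat for several candidate radii to find the one that gives the highest value. This is a simplified illustration with hypothetical data and function names, not the code used by get_cc_mtz_pdb.

```python
from math import dist, sqrt


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length density lists."""
    n = len(a)
    if n < 2:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def region_points(grid_points, atoms, rad_max):
    """Indices of grid points within rad_max of at least one atom."""
    return [
        i
        for i, point in enumerate(grid_points)
        if any(dist(point, atom) <= rad_max for atom in atoms)
    ]


def cc_in_region(model_density, map_density, grid_points, atoms, rad_max):
    """Map-model correlation restricted to the region around the atoms."""
    idx = region_points(grid_points, atoms, rad_max)
    return correlation(
        [model_density[i] for i in idx], [map_density[i] for i in idx]
    )


def best_rad_max(model_density, map_density, grid_points, atoms, candidates):
    """Return (rad_max, cc) for the candidate radius with the highest cc."""
    scored = [
        (cc_in_region(model_density, map_density, grid_points, atoms, r), r)
        for r in candidates
    ]
    cc, rad = max(scored)
    return rad, cc


# Toy data (hypothetical): six grid points 1 A apart, one atom at x = 2.
grid_points = [(float(x), 0.0, 0.0) for x in range(6)]
atoms = [(2.0, 0.0, 0.0)]
model_density = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
map_density = [0.5, 1.0, 3.0, 1.0, 0.2, 0.9]

rad, cc = best_rad_max(
    model_density, map_density, grid_points, atoms, candidates=[1.5, 2.5]
)
print(f"rad_max = {rad}, correlation = {cc:.3f}")
```

In this toy case the smaller radius wins because the map contains density away from the atom that the model does not explain.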

Output files from get_cc_mtz_pdb

offset.pdb: A PDB file offset to match the origin in the mtz file.

Examples

Standard run of get_cc_mtz_pdb:

Running get_cc_mtz_pdb is easy. From the command line you can type:

phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb

If you want (or need) to specify the column names from your mtz file, you will need to tell get_cc_mtz_pdb what FP and PHIB (and optionally FOM) are, in this format:

phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb \
labin="FP=2FOFCWT PHIB=PH2FOFCWT"
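If you launch get_cc_mtz_pdb from a script, the same command line can be assembled in Python. The helper below is a minimal sketch: build_command is a hypothetical name, and it only formats the keyword=value arguments shown in this document. Passing the arguments as a list avoids shell quoting problems with the spaces inside the labin value.

```python
import shlex


def build_command(mtz_file, pdb_file, **keywords):
    """Assemble an argument list for phenix.get_cc_mtz_pdb.

    Each keyword is written as name=value, matching the command-line
    syntax used in the examples above.
    """
    argv = ["phenix.get_cc_mtz_pdb", mtz_file, pdb_file]
    for name, value in keywords.items():
        argv.append(f"{name}={value}")
    return argv


argv = build_command(
    "map_coeffs.mtz",
    "coords.pdb",
    labin="FP=2FOFCWT PHIB=PH2FOFCWT",
    scale=True,
)

# Show the equivalent shell command, with quoting added where needed.
print(shlex.join(argv))

# To actually run it (requires a PHENIX installation on the PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```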

Possible Problems

Specific limitations and problems:

The option to use atom selections in get_cc_mtz_pdb can cause some confusion, because the CC values obtained can depend on the atom selections. The PDB file (after atom selections are applied) is used to calculate model density. This model density is compared to the map from your input map coefficients in the region surrounding the selected atoms. Because the density of an atom extends some distance away from it, the model density in this region can depend on the presence of nearby atoms.

In versions of PHENIX up to 1.3-final, defaults were set to maximize the correlation coefficient rather than to give the correlation using the existing thermal parameters and including only the reflections present in the mtz file. These previous defaults were equivalent to using the values:

scale=True
use_only_refl_present_in_mtz=True

These defaults were changed so that the correlation values obtained by default in a case where no origin shifts are needed would correspond to those obtained by simply calculating (1) a map using the input map coefficients and (2) a map from the PDB file and then determining the correlation between these maps.

List of all available keywords

get_cc_mtz_pdb
- pdb_in = None PDB file with coordinates to evaluate
- atom_selection = None Any selection specified with atom_selection is applied to input model (pdb_in) before using the model. NOTE: this option can result in confusing output because the model density near an atom and therefore the local CC value depends not just on the atoms being considered but also nearby atoms. Therefore if you remove some atoms next to a residue with atom_selection then the CC for that residue may change.
- mtz_in = None MTZ file with coefficients for a map
- map_in = None CCP4 or MRC-style map file
- labin = "" Labin line for MTZ file with map coefficients. This is optional if get_cc_mtz_pdb can guess the correct coefficients for FP, PHIB and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
- offset_pdb_suffix = offset Suffix for output version of PDB file (offset to maximize correlation with mtz file if fix_xyz is not set)
- resolution = None high-resolution limit for map calculation
- use_only_refl_present_in_mtz = False You can specify that only reflections present in your mtz file are used in the comparison.
- scale = False If you set scale=True then get_cc_mtz_pdb applies an overall B factor and a delta_b for each atom beyond CB.
- split_conformers = False If you want to have A and B conformers analyzed separately you can say split_conformers=True. Note that this requires that the entire A conformer for a residue must be before (or after) the entire B conformer for that residue.
- use_cc_mask = None Use cc_mask calculation to match validation methods. Default is True for map_in and False for mtz_in
- fix_xyz = False If you want your PDB file compared to the map from your mtz file with no offsets at all (fixed position) then specify fix_xyz=True
- fix_rad_max = False If you want to use a fixed radius around all atoms for calculation of the correlation use fix_rad_max=True. To set the value, use rad_max=xxx, otherwise it is set automatically. If rad_max is set, fix_rad_max is set to True
- rad_max = None If you want to use a fixed radius around all atoms for calculation of the correlation use fix_rad_max=True. To set the value, use rad_max=2.5, otherwise it is set automatically
- any_offset = False You can search for a match with any offset, including offsets that are not allowed by space-group symmetry
- chain_type = *PROTEIN DNA RNA Chain type (for identifying main-chain and side-chain atoms)
- temp_dir = "temp_dir" Temporary work directory
- output_dir = "" Output directory where files are to be written
- gui_output_dir = None
- verbose = True Verbose output
- quick = False Skip the residue-by-residue correlations for a quick run
- raise_sorry = False Raise sorry if problems
- debug = False Debugging output
- dry_run = False Just read in and check parameter names
- resolve_command_list = None You can supply any resolve command here NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
- job_title = None Job title in PHENIX GUI, not used on command line
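
The quoting required by resolve_command_list (each command in single quotes, the whole set in double quotes) is easy to get wrong by hand. The helper below is a small sketch that builds the argument text from a list of commands; the function name is hypothetical, and it produces the same form as the example in the keyword description, without the trailing space before the closing double quote.

```python
def format_resolve_command_list(commands):
    """Format commands as resolve_command_list="'cmd1' 'cmd2'".

    Each command is wrapped in single quotes and the whole set in
    double quotes, as required for command-line usage.
    """
    for command in commands:
        if "'" in command or '"' in command:
            raise ValueError(f"command must not contain quotes: {command!r}")
    quoted = " ".join(f"'{command}'" for command in commands)
    return f'resolve_command_list="{quoted}"'


argument = format_resolve_command_list(["no_build", "b_overall 23"])
print(argument)
```

The printed text is what you would type after phenix.get_cc_mtz_pdb in a shell. If you pass arguments as a list from a script instead, the outer double quotes are handled by the shell only and should be left out of the list element.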