Correlation of map and model after adjusting model for origin shifts with get_cc_mtz_pdb
Author(s)
- get_cc_mtz_pdb: Tom Terwilliger
Purpose
get_cc_mtz_pdb is a command line tool for adjusting the origin of a
PDB file using space-group symmetry so that the PDB file superimposes on
a map, obtaining the correlation of model and map, and analyzing the
correlation for each residue.
Usage
How get_cc_mtz_pdb works:
get_cc_mtz_pdb calculates a model map based on the supplied PDB file,
then uses RESOLVE to find the origin shift (using space-group symmetry)
that maximizes the correlation of this model map with a map calculated
with the supplied map coefficients in an mtz file. This shift is applied
to the atoms in the PDB file to create offset.pdb, and the correlation
of offset.pdb with the map is then analyzed residue by residue. Atoms
and residues that are out of density or in weak density are flagged.
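The origin-shift search described above can be illustrated with a toy sketch. This is not the RESOLVE implementation: it assumes a one-dimensional periodic map stored as a Python list, and the function names (pearson, best_origin_shift) and the list of allowed shifts are hypothetical. In the real program the maps are three-dimensional and the allowed shifts come from the space-group symmetry.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat map carries no information to correlate
    return cov / math.sqrt(var_a * var_b)

def best_origin_shift(model_map, target_map, allowed_shifts):
    """Return (shift, cc) for the allowed shift giving the highest correlation.

    Both maps are 1D periodic lists of density values. A shift of s grid
    points moves the model density from index i to index (i + s) % n.
    """
    n = len(model_map)
    best = None
    for s in allowed_shifts:
        shifted = [model_map[(i - s) % n] for i in range(n)]
        cc = pearson(shifted, target_map)
        if best is None or cc > best[1]:
            best = (s, cc)
    return best

# Toy data: the target map is the model map displaced by half the cell.
model = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 4.0, 1.0]

# Hypothetical allowed shifts: no shift, or half the cell (4 grid points).
shift, cc = best_origin_shift(model, target, allowed_shifts=[0, 4])
print(shift, cc)
```

Restricting the search to a short list of allowed shifts mirrors the idea that only symmetry-permitted origin choices are tested, rather than every possible translation.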
You can set several parameters to define how the correlations are
calculated.
By default model density is calculated using the atom types, occupancies
and isotropic thermal factors (B-values) supplied in the PDB file. If
you specify
scale=True
then an overall B as well as an increment in B-values for each atom
beyond CB (for proteins) will be added to the values in the PDB file,
after adjusting these parameters to maximize the map correlation.
If you specify
use_only_refl_present_in_mtz=True
then the model-based map will be calculated using the same set of
reflections as the map calculated from your input mtz file. This reduces
the effect of missing reflections on the calculation (but the
correlation is then no longer the actual map-model correlation).
In the calculation of the map correlation in the region of the model,
the region where the model is located is defined as all points within a
distance rad_max of an atom in the model. The value of rad_max is
adjusted in each case to maximize this correlation. Its value is
typically similar to the high-resolution limit of the map.
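The idea of restricting the correlation to points within rad_max of the model can be sketched as follows. This is a simplified illustration, not the code used by get_cc_mtz_pdb: the grid is a plain list of Cartesian points, the densities are plain lists, and the helper names (points_near_atoms, masked_cc) and the candidate radii are assumptions made for the example.

```python
import math

def points_near_atoms(grid_points, atoms, rad_max):
    """Indices of grid points within rad_max of at least one atom."""
    selected = []
    for idx, point in enumerate(grid_points):
        if any(math.dist(point, atom) <= rad_max for atom in atoms):
            selected.append(idx)
    return selected

def masked_cc(model_density, map_density, indices):
    """Correlation of two densities using only the selected grid points."""
    a = [model_density[i] for i in indices]
    b = [map_density[i] for i in indices]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # no variation inside the mask, so no correlation
    return cov / math.sqrt(var_a * var_b)

# Toy data: six grid points along x and a single atom at x = 1.
grid = [(float(x), 0.0, 0.0) for x in range(6)]
atoms = [(1.0, 0.0, 0.0)]
model_density = [1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
map_density = [2.0, 6.0, 2.0, 5.0, 5.0, 5.0]

near = points_near_atoms(grid, atoms, rad_max=1.5)
cc = masked_cc(model_density, map_density, near)

# Choose the radius giving the highest correlation from a few candidates,
# loosely mimicking the automatic adjustment of rad_max.
candidates = [1.5, 2.5]
best_rad = max(
    candidates,
    key=lambda r: masked_cc(model_density, map_density,
                            points_near_atoms(grid, atoms, r)),
)
print(near, cc, best_rad)
```

In this toy case the map contains unrelated density away from the atom, so the smaller radius gives the higher correlation.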
Output files from get_cc_mtz_pdb
offset.pdb: A PDB file offset to match the origin in the mtz file.
Examples
Standard run of get_cc_mtz_pdb:
Running get_cc_mtz_pdb is easy. From the command line you can
type:
phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb
If you want (or need) to specify the column names from your mtz file,
you will need to tell get_cc_mtz_pdb what FP and PHIB (and optionally
FOM) are, in this format:
phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb \
labin="FP=2FOFCWT PHIB=PH2FOFCWT"
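If you prefer to launch the program from a script, the same command line can be assembled in Python. The sketch below is only an illustration: build_get_cc_command is a hypothetical helper, not part of PHENIX, and it simply writes each keyword as name=value in the style shown above. The file names and column labels are the ones used in the examples.

```python
import subprocess

def build_get_cc_command(mtz_file, pdb_file, **keywords):
    """Assemble the argument list for phenix.get_cc_mtz_pdb.

    Each keyword is written as name=value. Python booleans are rendered
    as True or False, matching the command-line style.
    """
    args = ["phenix.get_cc_mtz_pdb", mtz_file, pdb_file]
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args

cmd = build_get_cc_command(
    "map_coeffs.mtz",
    "coords.pdb",
    labin="FP=2FOFCWT PHIB=PH2FOFCWT",
    scale=True,
)
print(cmd)

# To actually run it (requires a PHENIX installation on your PATH):
# subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list rather than through a shell, the labin value with its embedded space stays a single argument and needs no extra quoting.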
Possible Problems
Specific limitations and problems:
The option to use atom selections in get_cc_mtz_pdb can cause some
confusion, because the CC values obtained can depend on the atom
selections. The way this works is that the PDB file (after atom
selections) is used to calculate model density. This model density is
compared to the map from your input map coefficients in the region
surrounding the atoms selected. This model density can depend on the
presence of nearby atoms, because the density from an atom extends some
distance beyond the atom itself.
In versions of PHENIX up to 1.3-final, defaults were set to maximize the
correlation coefficient rather than to give the correlation using the
existing thermal parameters and including only the reflections present
in the mtz file. These previous defaults were equivalent to using the
values:
scale=True
use_only_refl_present_in_mtz=True
These defaults were changed so that the correlation values obtained by
default in a case where no origin shifts are needed would correspond to
those obtained by simply calculating (1) a map using the input map
coefficients and (2) a map from the PDB file and then determining the
correlation between these maps.
List of all available keywords
- get_cc_mtz_pdb
- pdb_in = None PDB file with coordinates to evaluate
- atom_selection = None Any selection specified with atom_selection is applied to input model (pdb_in) before using the model. NOTE: this option can result in confusing output because the model density near an atom and therefore the local CC value depends not just on the atoms being considered but also nearby atoms. Therefore if you remove some atoms next to a residue with atom_selection then the CC for that residue may change.
- mtz_in = None MTZ file with coefficients for a map
- map_in = None CCP4 or MRC-style map file
- labin = "" Labin line for MTZ file with map coefficients.
This is optional if get_cc_mtz_pdb
can guess the correct coefficients
for FP, PHIB and FOM. Otherwise specify:
LABIN FP=myFP PHIB=myPHI FOM=myFOM
where myFP is your column label for FP
- offset_pdb_suffix = offset Suffix for output version of pdb file, (offset to maximize correlation with mtz file if fix_xyz is not set)
- resolution = None high-resolution limit for map calculation
- use_only_refl_present_in_mtz = False You can specify that only reflections present in
your mtz file are used in the comparison.
- scale = False If you set scale=True then get_cc_mtz_pdb applies
an overall B factor and a delta_b for each atom beyond CB.
- split_conformers = False If you want to have A and B conformers analyzed separately
you can say split_conformers=True. Note that this requires that
the entire A conformer for a residue must be before (or after)
the entire B conformer for that residue.
- use_cc_mask = None Use cc_mask calculation to match validation methods. Default is True for map_in and False for mtz_in
- fix_xyz = False If you want your PDB file compared to the map from
your mtz file with no offsets at all (fixed position)
then specify fix_xyz=True
- fix_rad_max = False If you want to use a fixed radius around all atoms for calculation of the correlation use fix_rad_max=True. To set the value, use rad_max=xxx, otherwise it is set automatically. If rad_max is set, fix_rad_max is set to True
- rad_max = None If you want to use a fixed radius around all atoms for calculation of the correlation use fix_rad_max=True. To set the value, use rad_max=2.5, otherwise it is set automatically
- any_offset = False You can search for a match with any offset even
though this is not allowed by space-group symmetry
- chain_type = *PROTEIN DNA RNA Chain type (for identifying main-chain and side-chain atoms)
- temp_dir = "temp_dir" Temporary work directory
- output_dir = "" Output directory where files are to be written
- gui_output_dir = None
- verbose = True Verbose output
- quick = False Skip the residue-by-residue correlations for a quick run
- raise_sorry = False Raise sorry if problems
- debug = False Debugging output
- dry_run = False Just read in and check parameter names
- resolve_command_list = None You can supply any resolve command here NOTE: for command-line usage you need to enclose the whole set of commands in double quotes (") and each individual command in single quotes (') like this: resolve_command_list="'no_build' 'b_overall 23' "
- job_title = None Job title in PHENIX GUI, not used on command line
- output_files
- target_output_format = *None pdb mmcif Desired output format (if possible). Choices are None (try to use input format), pdb, mmcif. If output model does not fit in pdb format, mmcif will be used. Default is pdb.
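The quoting rule for resolve_command_list (each command in single quotes, the whole set in double quotes) is easy to get wrong by hand. The sketch below shows one way to generate the value. The helper name format_resolve_command_list is hypothetical, and it assumes that the trailing space inside the double quotes in the example above is not significant.

```python
def format_resolve_command_list(commands):
    """Wrap each RESOLVE command in single quotes and join with spaces."""
    for command in commands:
        if "'" in command or '"' in command:
            # Nested quotes would break the quoting scheme described above.
            raise ValueError(f"quote characters are not supported: {command!r}")
    return " ".join(f"'{command}'" for command in commands)

value = format_resolve_command_list(["no_build", "b_overall 23"])

# Text to paste on a shell command line: the double quotes keep the
# whole set of commands together as one argument.
shell_argument = f'resolve_command_list="{value}"'
print(shell_argument)
```

The commands no_build and b_overall 23 are the ones given in the keyword description; any other RESOLVE command would be passed in the same way.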