CaBLAM Validation in Phenix

Authors

cablam_validate: Christopher Williams (Richardson Lab, Duke University)

Purpose

CaBLAM stands for C-Alpha Based Low-resolution Annotation Method. It uses protein CA geometry to evaluate mainchain geometry and to identify areas of probable secondary structure. CaBLAM is intended for low-resolution structures, where compound errors or ambiguities in a model may make the results of highly sensitive measures of protein conformation, such as Ramachandran analysis, difficult or impossible to interpret.

How CaBLAM works

For each residue, cablam_validate calculates several measures based on CA geometry. It uses these measures as coordinates for comparison against contours of expected protein behavior derived from a high-quality dataset (Top8000). Residues that fall outside these contours are considered outliers and are reported in validation feedback.

cablam_validate also compares local CA geometry measures that are robust at low resolution against contours of expected secondary structure behavior. Residues that fall within these secondary structure contours are reported as probable members of that secondary structure type.

Running CaBLAM Validation

CaBLAM validation is currently run from the command line. It accepts a .pdb file (or any structure file that Phenix finds similarly readable).

phenix.cablam_validate file.pdb output=[choice of: full_kin, kin, ca_kin, oneline, text, points, records]
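For scripted use, the command line above can be assembled programmatically. The following is a minimal sketch, not part of Phenix: the helper names build_cablam_command and run_cablam are hypothetical, the set of allowed modes is simply the list of modes described in this document, and it assumes phenix.cablam_validate is on the PATH and writes its report to standard output.

```python
import shutil
import subprocess

# Output modes described in this document; treated here as the allowed set.
OUTPUT_MODES = ("full_kin", "kin", "ca_kin", "oneline", "text", "points", "records")


def build_cablam_command(model_path, output="text", outlier_cutoff=None):
    """Return the argument list for one phenix.cablam_validate run (hypothetical helper)."""
    if output not in OUTPUT_MODES:
        raise ValueError(f"unknown output mode: {output!r}")
    command = ["phenix.cablam_validate", model_path, f"output={output}"]
    if outlier_cutoff is not None:
        # Leave the parameter off entirely to get the program's own default.
        command.append(f"outlier_cutoff={outlier_cutoff}")
    return command


def run_cablam(model_path, output="text", outlier_cutoff=None):
    """Run the command and return whatever it printed (assumes output goes to stdout)."""
    command = build_cablam_command(model_path, output, outlier_cutoff)
    if shutil.which(command[0]) is None:
        raise RuntimeError("phenix.cablam_validate was not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, build_cablam_command("file.pdb", "ca_kin", 0.005) produces the argument list for the CA geometry markup run described below.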

Outlier Cutoff

outlier_cutoff=0.05 is the default. This captures the majority of modeling errors, but may also flag real-but-rare conformations, such as those in loops, as outliers.
outlier_cutoff=0.01 is also recommended to identify severe errors.
outlier_cutoff=0.005 is recommended for use with output=ca_kin.
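The relationship between a residue's contour level and the mild (0.05) and severe (0.01) cutoffs can be sketched as follows. This is an illustration only: the function name classify_contour_level is hypothetical, and treating a value exactly equal to a cutoff as not an outlier is an assumption, since the boundary behavior is not stated here.

```python
MILD_CUTOFF = 0.05    # default outlier_cutoff
SEVERE_CUTOFF = 0.01  # cutoff recommended for identifying severe errors


def classify_contour_level(contour_level, mild=MILD_CUTOFF, severe=SEVERE_CUTOFF):
    """Label a contour level as 'severe', 'mild', or 'ok'.

    Lower contour levels mean rarer conformations. Strict '<' comparisons
    are an assumption about how boundary values are handled.
    """
    if contour_level < severe:
        return "severe"
    if contour_level < mild:
        return "mild"
    return "ok"
```

With these cutoffs, a contour level of 0.01437 is labeled a mild outlier and 0.004 a severe one.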

Several output modes are available.

Full Kin

output=full_kin gives the most comprehensive validation output, recommended for general use. It generates two automatically named files in the working directory: file_cablam_multi.kin and file_cablam_multi.pdb, then opens these files together in a phenix.king window. The .kin file contains CaBLAM markup for mild (0.05) and severe (0.01) CaBLAM outliers and for CA geometry outliers. The .pdb file contains the original model, plus HELIX and SHEET-style records for secondary structure elements identified by CaBLAM. The KiNG window combines this feedback into a single, interactive format.

Kin

output=kin prints a markup kinemage to screen. This markup should be appended to a kinemage of the structure (such as a multi-criterion kinemage generated by phenix.kinemage). The markup consists of a set of colored lines drawn over each residue identified as an outlier. The lines follow the path of one of CaBLAM's geometric measures, a measure of the dihedral relationship between adjacent peptide planes. Among the CaBLAM measures, this is the dihedral most likely to be modeled incorrectly at low resolution.
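Appending the printed markup to an existing kinemage can be done with any text tool. The sketch below shows one way in Python; append_markup is a hypothetical helper, and it assumes the markup was captured as a string (for example, from the program's screen output) and that the kinemage is a plain-text file.

```python
from pathlib import Path


def append_markup(kinemage_path, markup_text):
    """Append CaBLAM markup text to the end of an existing kinemage file."""
    path = Path(kinemage_path)
    existing = path.read_text()
    with path.open("a") as handle:
        # Start the markup on its own line if the file lacks a trailing newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(markup_text)
        if not markup_text.endswith("\n"):
            handle.write("\n")
```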

CA Kin

output=ca_kin outlier_cutoff=0.005 prints a markup kinemage for C-Alpha geometry to screen. This markup should be appended to a kinemage of the structure (such as a multi-criterion kinemage generated by phenix.kinemage). CaBLAM is dependent on CA geometry, so it is necessary to know where that geometry is unreliable. The markup consists of red lines drawn along the CA Virtual Angle for each residue with a CA geometry outlier, but the virtual angle is not necessarily the angle at fault. Note that outlier_cutoff must be changed from its default (0.005 is recommended) to get meaningful results.
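As a geometric illustration, the sketch below computes a virtual angle from three positions. It assumes that the CA Virtual Angle for a residue is the angle at its CA atom between the preceding and following CA atoms; that definition is an assumption made for this example, and the function is not taken from the cablam_validate source.

```python
import math


def virtual_angle(prev_ca, this_ca, next_ca):
    """Angle in degrees at this_ca between prev_ca and next_ca.

    Each argument is an (x, y, z) coordinate. The three-consecutive-CA
    definition is an assumption made for illustration.
    """
    to_prev = [p - t for p, t in zip(prev_ca, this_ca)]
    to_next = [n - t for n, t in zip(next_ca, this_ca)]
    dot = sum(a * b for a, b in zip(to_prev, to_next))
    norm = math.sqrt(sum(a * a for a in to_prev)) * math.sqrt(sum(b * b for b in to_next))
    if norm == 0.0:
        raise ValueError("coincident CA positions; angle is undefined")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))
```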

Oneline

output=oneline prints a one-line summary of CaBLAM statistics for each of the submitted proteins. Sample output follows:

pdbid:residues:peptide_outlier_percent:peptide_bad_outlier_percent:ca_outlier_percent
3dnd:340:4.4:0.6:0.00

pdbid identifies the input file
residues gives the number of measurable residues assessed by CaBLAM (residues near termini and chain breaks are not assessable by CaBLAM)
peptide_outlier_percent gives the percent of residues with "mild" outlier status. Higher than 5% may be cause for concern.
peptide_bad_outlier_percent gives the percent of residues with "severe" outlier status. Higher than 1% may be cause for concern.
ca_outlier_percent gives the percent of residues with CA geometry outliers. Higher than 0.1% may be cause for concern.
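The colon-separated summary lends itself to simple parsing. The following sketch reads oneline output into named fields and applies the concern thresholds listed above. The names CablamSummary, parse_oneline, and flag_concerns are hypothetical, and the parser assumes the pdbid field itself contains no colon.

```python
from typing import NamedTuple


class CablamSummary(NamedTuple):
    pdbid: str
    residues: int
    peptide_outlier_percent: float
    peptide_bad_outlier_percent: float
    ca_outlier_percent: float


# Percentages above which each statistic may be cause for concern.
CONCERN_THRESHOLDS = {
    "peptide_outlier_percent": 5.0,
    "peptide_bad_outlier_percent": 1.0,
    "ca_outlier_percent": 0.1,
}


def parse_oneline(text):
    """Parse oneline output into a list of CablamSummary records."""
    summaries = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        # Skip blank lines, malformed lines, and the header row.
        if len(fields) != 5 or fields[0] == "pdbid":
            continue
        summaries.append(
            CablamSummary(fields[0], int(fields[1]), float(fields[2]),
                          float(fields[3]), float(fields[4]))
        )
    return summaries


def flag_concerns(summary):
    """Return the names of statistics that exceed their concern threshold."""
    return [name for name, limit in CONCERN_THRESHOLDS.items()
            if getattr(summary, name) > limit]
```

Applied to the sample line above, parse_oneline yields one record for 3dnd with 340 residues, and flag_concerns returns an empty list for it.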

Text

output=text is the default. This provides column-aligned, comma-separated records, one for each outlier residue. Sample output follows:

residue,contour_level,loose_alpha,regular_alpha,loose_beta,regular_beta,threeten
THR A  37 ,0.01437,0.00000,0.00000,0.02612,0.00000,0.00000
PHE A  54 ,0.01350,0.00027,0.00000,0.00109,0.00000,0.00000
ASN A  99 ,0.04785,0.00000,0.00000,0.12777,0.04921,0.00000
ASN A 115 ,0.04685,0.04231,0.00000,0.00011,0.00000,0.00000
GLY A 136 ,0.03689,0.00013,0.00000,0.00595,0.00000,0.00000

residue: A full residue identifier for the outlier
contour_level: The contour level for protein behavior at which the residue falls, in fraction form (max of 1.0). Lower values indicate a more severe outlier. 0.05 is the default cutoff for outliers
loose_alpha/regular_alpha: contour levels for expected alpha helix behavior. Higher values indicate greater confidence that the residue should be modeled as alpha helix. loose_alpha is the most reliable indicator, and its default cutoff is 0.001
loose_beta/regular_beta: contour levels for expected beta behavior. Higher values indicate greater confidence that the residue should be modeled as beta sheet. regular_beta is the most reliable indicator, and its default cutoff is 0.001
threeten: contour levels for expected three-ten helix behavior. Higher values indicate greater confidence that the residue should be modeled as three-ten helix. Its default cutoff is 0.001
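The text records can be read with a standard CSV parser. The sketch below parses them and lists the secondary structure types whose indicator meets the 0.001 cutoff, using loose_alpha, regular_beta, and threeten as the indicators named above. The function names are hypothetical, and returning every type that passes (rather than choosing one) is a choice made for this example, not a description of how cablam_validate resolves overlaps.

```python
import csv
import io

SS_CUTOFF = 0.001  # default cutoff given above for each indicator

# Indicator column used for each secondary structure type, as named above.
SS_INDICATORS = {
    "alpha": "loose_alpha",
    "beta": "regular_beta",
    "threeten": "threeten",
}


def parse_text_output(text):
    """Parse output=text records (header row included) into a list of dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {"residue": row["residue"].strip()}
        for name, value in row.items():
            if name is None or name == "residue":
                continue
            record[name] = float(value)
        records.append(record)
    return records


def probable_secondary_structure(record, cutoff=SS_CUTOFF):
    """Return the secondary structure types whose indicator meets the cutoff."""
    return [ss_type for ss_type, column in SS_INDICATORS.items()
            if record[column] >= cutoff]
```

On the sample above, ASN A 99 is reported as probable beta, ASN A 115 as probable alpha, and THR A 37 matches no type.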

Points

output=points prints to screen a dotlist of points in kinemage format, producing a point cloud of outlier residues in CaBLAM space. This is primarily a developer tool.

Records

output=records prints to screen HELIX and SHEET-style records for secondary structure elements identified by CaBLAM. Formatting for these records is not yet fully consistent with the PDB standard, but will be brought into closer adherence over time.

CaBLAM validation is not currently available through the phenix.refine GUI, but will be available in a future iteration of the validation tab.

Tools for exploring known structures are available through cablam.training.