Validation (cryo-EM)
Why
Getting from the initial idea about a molecule to the final atomic model
and its interpretation involves many steps. Errors can occur at each of
these steps, so validation procedures are necessary to ensure that the
atomic models are reasonable and fit the experimental data.
In particular, validation addresses data, model, and model-versus-data quality:
- Data. The quality of the experimental data, such as the resolution and the
amount of noise, limits the quality of the derived models. Therefore, it is vital to
understand the data quality so that you use model-building and
refinement procedures appropriate for the data at hand.
(See the Assessing Map Quality (cryo-EM) overview for more details.)
- Model. The underlying principles behind model validation are the same for
any experimental method: a good model should make chemical sense and be consistent
with empirical statistics for high-quality prior structures. Thus, atomic models
derived from cryo-EM reconstructions should conform to prior knowledge about
covalent geometry, non-bonded interactions, and known distributions of
side-chain and main-chain conformations in both proteins and nucleic acids.
Generally, the goal is to have as few outliers as is feasible, and any
remaining outliers should be explainable by their environment
and/or the experimental data.
- Model-versus-data fit. Model-versus-data validation criteria require the model
to describe its own data well. A model that is perfectly good from a stereochemistry
perspective may fit the 3D reconstruction poorly or not at all. In reciprocal
space, model-versus-data agreement is assessed by curves of the Fourier shell
correlation (FSC) as a function of resolution. In
real space, the agreement is measured by correlation coefficients (CCs).
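As an illustration of the real-space measure mentioned above, the following minimal sketch computes a Pearson correlation coefficient between two sets of map values sampled on the same grid points. It is not the Phenix implementation; the function name map_cc and the sample values are hypothetical, and real maps would be 3D grids rather than short lists.

```python
import math

def map_cc(map_a, map_b):
    """Pearson correlation coefficient between two equally sized lists of map values."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and the same size")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical flattened grid values: experimental map versus a map
# calculated from the atomic model (here simply offset by a constant).
experimental = [0.1, 0.4, 0.9, 0.5, 0.2, 0.0]
model_based = [0.2, 0.5, 1.0, 0.6, 0.3, 0.1]
cc_value = map_cc(experimental, model_based)
print(f"CC = {cc_value:.3f}")
```

Because the coefficient is insensitive to a constant offset or overall scale, a value close to 1 indicates that the two maps have the same shape, not that their absolute values match.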
How
Comprehensive validation (cryo-EM) is a one-stop program in Phenix that
validates model, data, and model-versus-data fit. The minimal input is a
map (in MRC, CCP4, or related format) and an atomic model (in PDB or mmCIF format).
However, providing the two half-maps in addition to the map and model is preferable.
For geometric validation, Phenix uses MolProbity, which is fully integrated into
the Comprehensive validation (cryo-EM) package. The graphics programs Coot and
PyMOL are also fully integrated, enabling communication of the validation results.
How to use the validation tools in the Phenix GUI: Click here
Common issues
- Outliers versus errors: In most structures, there will be Ramachandran and
side-chain rotamer outliers. If the data (the map) and the surrounding chemistry
support the outlier conformation, then it is most likely a genuine deviation from
prior expectations. If there is no supporting density and/or the local chemistry
conflicts with the conformation, it is probably an error that needs correction.
For example, an atomic model
derived from low-resolution data is expected to have no geometrical outliers
because they are unlikely to be supported by the experimental data.
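The decision rule described above can be written down as a short sketch. This is only an illustration of the reasoning, not a Phenix function: the name classify_outlier, the two boolean inputs (judgments a reviewer would make by inspecting the map and the local environment), and the residue labels are all hypothetical.

```python
def classify_outlier(density_supports, chemistry_supports):
    """Label a Ramachandran or rotamer outlier.

    Both arguments are booleans decided by inspection (hypothetical inputs,
    not values reported by any particular program).
    """
    # An outlier is accepted only when both the map and the chemistry support it.
    if density_supports and chemistry_supports:
        return "genuine outlier"
    return "probable error"

# Hypothetical flagged residues: (label, density supports?, chemistry supports?)
flagged = [
    ("A 45 GLY", True, True),
    ("A 102 LEU", False, True),
    ("B 17 ARG", True, False),
]
labels = {name: classify_outlier(d, c) for name, d, c in flagged}
for name, label in labels.items():
    print(name, "->", label)
```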
Related programs
- phenix.cablam_validation: The C-Alpha
Based Low-Resolution Annotation Method (CaBLAM) validates protein backbone
conformations in models determined at low resolution (2.5-4 Å), where it is difficult
to determine peptide orientations because carbonyl O atoms cannot be discerned in
the density maps. The program uses protein C-alpha geometry to evaluate the main-chain
geometry and identify areas of probable secondary structure.
- phenix.mtriage: This program computes various
cryo-EM map statistics, including resolution estimates, Fourier shell correlation
curves, and more.
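To give a sense of what evaluating C-alpha geometry can look like, the sketch below computes a pseudo-dihedral angle over four consecutive C-alpha positions. It assumes that such a pseudo-dihedral is representative of the C-alpha geometry referred to in the CaBLAM description above; it does not reproduce the parameters or scoring used by phenix.cablam_validation, and the function names and coordinates are hypothetical.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)  # central bond vector
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1) / b1_len
    return math.degrees(math.atan2(y, x))

# Idealized positions for four consecutive C-alpha atoms, spaced 3.8 Å apart
# with right-angle turns (hypothetical values, not from a real structure).
ca_trace = [(3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 3.8), (0.0, 3.8, 3.8)]
angle = dihedral(*ca_trace)
print(f"C-alpha pseudo-dihedral: {angle:.1f} degrees")
```

Because only C-alpha positions are needed, an angle of this kind can be calculated even when carbonyl O atoms are not visible in the map.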