Summarize geometric quality of a model
This tool summarizes the geometric quality of a model by calculating pseudo-energies (square of the value of the deviation from ideality divided by sigma) for every aspect of a model (each bond length, angle, rotamer, Ramachandran value, etc). For each metric (such as angles), the mean energy and worst energy are noted, and the probabilities of this mean and this worst value not occurring by chance are estimated. To reduce the dominance of very poor values, energy values E over 10 are filtered by reducing them toward 10 with a logarithmic function. The weighted energy for each metric is sum of the mean energy and the filtered worst energy for that metric, each weighted by their probabilities of not occurring by chance.
The overall energy is the sum of weighted energies for all metrics. The metrics used are:
CBetadev: Deviation of CB position from ideal (A) Clash: Clash of non-bonded atoms (A) Omega: Peptide omega angle deviation (degrees) Rama: Ramachandran outlier probability Rota: Rotamer outlier probability Angle: Angle deviation (degrees) Bond: Bond length deviation (A) Chir: Chirality deviation (A**3) Torsion: Torsion angle deviation (degrees) Full-nonbond: Non-bonded deviations, including all instances Nonbond: Non-bonded deviations (using Lennard-Jones potential, excluding bonds that are plausible and have low energies) Plane: Planarity deviations (A)
This tool estimates the expected energy for a structure that has all bonds, angles, etc, distributed as normal distributions with the sigmas used in the geometry validation step above. That is, if the structure had a normal distribution of errors, it would have about this energy. The energy is estimated by sampling from a normal distribution when calculating the energy terms instead of taking the actual deviations. The sampling process is carried out many times (20) and the averages are reported.
The tool estimates the overall ratio of deviations to sigmas in three steps
First it lists all the values deviation/sigma for all the metrics. If the deviations were drawn from random Gaussian distributions with standard deviations of their corresponding sigmas, these values of deviation/sigma would be normally distributed with a standard deviation of one.
Then the 99.7th percentile of values of deviation/sigma is noted. For a Gaussian distribution, this percentile corresponds to 3 standard deviations. The estimated ratio of deviations/sigma overall is then 1/3 of this 99.7th percentile value, or 1 standard deviation of the ratios of deviation to sigma.
You can use holton_geometry_validation to compare models for the same structure and identify which has fewer overall geometric problems.
You can use the expected energy to see if your structure has errors that are about what is expected based on ideal model geometry and uncertainties.
You can use the estimate of ratio of deviations to sigmas to further examine whether the deviations from ideality in your structure are about what is expected based on the expected variation in geometric values.
Running holton_geometry_validation is easy. From the command-line you can type:
phenix.holton_geometry_validation 1ss8_A.pdb
where 1ss8_A.pdb is the model you would like to evaluate.
If your structure has hydrogen positions that do not match those expected by Phenix, you can ignore them in all metrics except non-bonded interactions with ignore_h_except_in_nonbond=True. Alternatively you can redo their positions with keep_hydrogens=False.
If you want to ignore just hydrogen positions in Arginine residues, you can specify ignore_arg_h_nonbond=True.
If you want to exclude hydrogen nonbond contacts involving waters, use ignore_water_h_bonds, and if you want to exclude all bonds involving hydrogen, ignore_bond_lengths_with_h=True
The method is described at https://bl831.als.lbl.gov/~jamesh/challenge/twoconf/#score