This site contains the results of applying the Phenix tool phenix.map_to_model to 629 maps from the EMDB.
NOTES:
The purpose of these data is to illustrate what can (and cannot) be done automatically, and to provide a head start on model building for cryo-EM maps. Models produced with phenix.map_to_model are preliminary and only partially complete, and many contain chains that are built backwards or that have incorrect parts or joins.
Some of the models contain a few very bad contacts. The worst such model is map_to_model_5tc1_8397.pdb. Most of these contacts can be removed by running the Phenix tool remove_clashes: phenix.remove_clashes map_to_model_5tc1_8397.pdb. Most of the "all data" downloads contain trimmed models created in this way, labeled, for example, map_to_model_trimmed_5tc1_8397.pdb.
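The clash-removal step in the note above can be scripted for many models. The following is a minimal sketch, assuming Phenix is installed and phenix.remove_clashes is on the PATH; the helper names remove_clashes_command and run_remove_clashes are hypothetical, and only the command line quoted above is used (no additional flags are assumed).

```python
import shutil
import subprocess


def remove_clashes_command(model_path):
    """Return the argument list for the command quoted in the note above."""
    return ["phenix.remove_clashes", model_path]


def run_remove_clashes(model_path):
    """Run the tool if it is available; return None when it is not on PATH."""
    cmd = remove_clashes_command(model_path)
    if shutil.which(cmd[0]) is None:
        # Phenix is not installed or not set up in this shell.
        return None
    return subprocess.run(cmd, check=True)
```

For example, run_remove_clashes("map_to_model_5tc1_8397.pdb") would run the same command as shown in the note.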
For each structure, the models present in "all data" are: the full automatically-built model (e.g., for the structure EMD-2984, PDB entry 5a1a, map_to_model_5a1a_2984.pdb); the "init" model created using the trace-chain algorithm and iterative secondary structure optimization (model_init_PROTEIN_shifted_5a1a_2984.pdb); the helices-strands model (model_helices_strands_only_PROTEIN_shifted_5a1a_2984.pdb); and the RESOLVE model (model_standard_PROTEIN_shifted_5a1a_2984.pdb). The intermediate models are labeled "shifted" because they have been shifted from their positions to match the origin of the original map.
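The file names above follow a regular pattern built from the PDB ID and the EMDB number. The following is a minimal sketch of that pattern, inferred only from the example names given on this site; the function name model_filenames and the dictionary keys are hypothetical.

```python
def model_filenames(pdb_id, emdb_id):
    """Build the expected model file names for one PDB/EMDB pair.

    The patterns are inferred from the example names on this site
    (e.g., 5a1a / 2984); they are not taken from Phenix itself.
    """
    tag = f"{pdb_id.lower()}_{emdb_id}"
    return {
        "full": f"map_to_model_{tag}.pdb",
        "trimmed": f"map_to_model_trimmed_{tag}.pdb",
        "init": f"model_init_PROTEIN_shifted_{tag}.pdb",
        "helices_strands": f"model_helices_strands_only_PROTEIN_shifted_{tag}.pdb",
        "standard": f"model_standard_PROTEIN_shifted_{tag}.pdb",
    }


names = model_filenames("5a1a", 2984)
print(names["full"])
```

Note that not every download necessarily contains every file; for instance, only most of the "all data" downloads include a trimmed model.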
For EMDB-6272 (3j9s), the part of the map representing the deposited model was cut out from the deposited map because the full symmetry was not available. After running phenix.map_to_model, the map_to_model.pdb model was translated to match the original map. For this structure, the intermediate files (e.g., model_init_PROTEIN_shifted_3j9s_6272.pdb, model_standard_PROTEIN_shifted_3j9s_6272.pdb, model_helices_strands_only_PROTEIN_shifted_3j9s_6272.pdb) match the cut-out map, not the original map.
For EMD-4054 (5lij) there is no value for sequence match because the deposited model is a poly-alanine chain.
You can download all the data on this site with all_tar_mtm_2018-03-09.tgz.
You can also see an earlier version of this site, with 476 datasets analyzed.
A summary of data as reported in the paper, based on the 476 datasets above, is in the spreadsheet rmsd_estimates_2018-01-14b.xlsx.
The table below can be downloaded as a spreadsheet from: MTM_summary_2018-03-09.xlsx
EMDB | EMDB ID and link to EMDB entry |
PDB | PDB ID and link to PDB entry |
Resolution | Resolution from PDB |
CC (deposited) | Map-model correlation for deposited map and model using phenix.map_model_cc |
CC (map_to_model) | Map-model correlation for automatically-generated model and auto_sharpened map |
Symmetry | Symmetry from the EMDB. Note that this symmetry may or may not match the number of copies or the symmetry file. For example, the first entry in the table below (EMDB-6272, PDB 3j9s) is listed in the EMDB as having icosahedral symmetry; however, the deposited map is only a portion of the molecule containing 3 chains. The number of copies in this case is 3 and the symmetry file contains 3 operators. Some entries are marked NA to indicate that the EMDB did not specify the reconstruction symmetry. |
Copies | Number of reconstruction symmetry operators used (normally obtained from meta-data in the PDB or from symmetry of PDB deposit) |
Residues (Protein/RNA) | Protein residues. This is the number used in comparisons between deposited and automatically-generated models. It can be either the unique or the total number. |
% Matching | Percentage of protein/RNA residues in the deposited model, or in the unique part of the deposited model, that are within 3 Å of a residue in the automatically-generated model (residues are represented by their CA or P atoms). |
% Seq match | Percentage of matching residues that have the same residue name |
Download all data | Link to download all the data for this analysis. Includes: sequence, symmetry, resolution, automatically-generated model (map_to_model_xxxx_yyyy.pdb), model, and intermediate models: model_helices_strands_only_PROTEIN_shifted_xxxx_yyyy.pdb (helices-strands model), model_init_PROTEIN_shifted_xxxx_yyyy.pdb (trace-chain model), model_standard_PROTEIN_shifted_xxxx_yyyy.pdb (resolve model) |
Model (map_to_model) | Automatically generated model |
Deposited model | Model from PDB with symmetry (if any) applied |
Symmetry file | File containing symmetry matrices used in analysis. Note: only the first symmetry group is used; others are ignored. NCS refers to symmetry. Most of these were obtained by analysis of the deposited models and some have higher or lower symmetry than was used in the reconstruction. |
Sequence file | File containing sequence used in analysis. |
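The "% Matching" and "% Seq match" columns described in the table above can be illustrated with a small calculation. The following is a minimal sketch, not the code used to produce this site: it assumes each residue has already been reduced to its name and the coordinates of its CA or P atom, it treats "within 3 A" as a distance of at most 3.0, and it compares each matched deposited residue with the name of the nearest residue in the automatically-generated model. The function name match_statistics and the input layout are hypothetical.

```python
import math


def match_statistics(deposited, built, cutoff=3.0):
    """Return (percent_matching, percent_seq_match).

    deposited, built: lists of (residue_name, (x, y, z)), one entry per
    residue, using the CA (protein) or P (RNA) atom position.
    """
    matched = 0
    same_name = 0
    for name, xyz in deposited:
        # Brute-force search for the nearest residue in the built model.
        best_name = None
        best_dist = None
        for built_name, built_xyz in built:
            dist = math.dist(xyz, built_xyz)
            if best_dist is None or dist < best_dist:
                best_dist, best_name = dist, built_name
        if best_dist is not None and best_dist <= cutoff:
            matched += 1
            if best_name == name:
                same_name += 1
    # % Matching is relative to the deposited residues considered;
    # % Seq match is relative to the residues that matched.
    pct_matching = 100.0 * matched / len(deposited) if deposited else 0.0
    pct_seq_match = 100.0 * same_name / matched if matched else 0.0
    return pct_matching, pct_seq_match


deposited = [("ALA", (0, 0, 0)), ("GLY", (10, 0, 0)),
             ("SER", (20, 0, 0)), ("LEU", (30, 0, 0))]
built = [("ALA", (1, 0, 0)), ("VAL", (10, 2, 0))]
print(match_statistics(deposited, built))  # (50.0, 50.0)
```

In this toy example two of the four deposited residues lie within 3 A of a built residue, and one of those two has the same residue name. The double loop is quadratic in the number of residues; for large models a spatial index would be a more practical choice.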