Merging two models with combine_models



This routine takes a model and finds pieces of a second model file that improve its fit to density when they replace the corresponding pieces in the first model.


The main uses of phenix.combine_models are:

How combine_models works:

The phenix.combine_models tool can combine the best parts of two or more models. You can use it in either of two different ways. One approach keeps segments in the models intact, and the second applies crossovers within segments.

In the first approach (merge_by_segment_correlation=True), each fragment in each model is scored based on correlation to the map. The scoring also includes a weight based on the square root of the length of the segment. It also includes weights based on whether a segment has been assigned to the sequence and the number of secondary structure hydrogen bonds in that segment. This approach can combine chains of different types as well. A final increment is added to the score for a segment if it is considered very likely to be correct with different thresholds for chains of different types.

Then the segments are picked in rank order to create a composite model. Any part of a segment that does not overlap an existing part of the composite model is kept. If symmetry is present, overlaps include that symmetry.

In the second approach, (merge_by_segment_correlation=False), a working model is created by taking all of the first model supplied and filling in any empty regions with fragments from other models. Then one by one, the segments in the working model are recombined with all other segments. To carry out recombination between two chains, residues that match in the two chains are identified. Then for each segment between matching pairs of residues, whichever chain has the higher correlation to the map is kept to create a composite model.

This second approach to combine_models starts with two input models. The first model is used as the default; if nothing can be found in the second model that is better than what is in the first model, then that part of the first model is kept. The second model is used as a template for improving the first model. Fragments of the second model are considered as alternatives for corresponding segments in the first model.

The fit of the models to density is used to decide which of a pair of fragments is best. In general, the correlation of model density with the map is used as the criterion. In cases where unequal numbers of residues are considered, then this correlation is weighted by the square root of the number of residues in each case. During the optional merge_second_model step, the scoring is optionally based on correlation of density, or by default, based on density at the positions of the main-chain atoms in the model.

If the two input models are not in the same asymmetric unit of the crystal, then combine_models will move the pieces from the second model to the corresponding locations in the first model. In this way the final model has all its parts in the same place.

The second approach in combine_models has four main steps, each of which is optional:

Output files from combine_models

combine_models.pdb: A PDB file with your combined model.


Standard run of combine_models:

Running combine_models is easy. From the command-line you can type:

phenix.combine_models pdb_in=first.pdb \
   second_pdb_in=second.pdb \
   seq.dat \

This will combine first.pdb and second.pdb based on fit to the map from map_coeffs.mtz, recheck the sequence alignment to seq.dat, and write out the resulting model.

Ranking all fragments and picking the best ones:

phenix.combine_models pdb_in=model.pdb
seq.dat map_coeffs.mtz merge_by_segment_correlation=True

This will score each segment (fragment), then work down from best to worst, keeping any part of any segment that does not overlap with a better-scoring segment.

Selecting pieces from the two models:

To take first.pdb and then see if residues A21-A30 of second.pdb can improve it, you can type:

phenix.combine_models pdb_in=first.pdb \
   second_pdb_in=second.pdb \
   seq.dat \
   map_coeffs.mtz \
   second_pdb_in_atom_selection="(chain A and resid 21:30)" \

Replacing a specific segment:

To take first.pdb and then see if residues A21-A30 and B21-B30 can be improved by replacing them with residues C10-C20 and D10-D20 of second.pdb, you can tell combine_models to ignore residues A22-A29 and B22-B29 and to consider only residues C10-C20 and D10-D20 of second.pdb:

phenix.combine_models pdb_in=first.pdb \
   second_pdb_in=second.pdb \
   seq.dat \
   map_coeffs.mtz \
   pdb_in_atom_selection="(not  ( (chain A or chain B) and resid 22:29) )" \
   second_pdb_in_atom_selection="( (chain C or chain D) and resid 10:20)" \

Crossing two models that have entirely matched residues:

If your first.pdb and second.pdb have exactly the same residues present, and just differ in coordinates, then you might want to preserve all the connectivity by skipping the merge_second_model step, and by skipping the non_matching crossover step, and by skipping the reassignment of sequence. You can type preserve_connectivity=True as a shortcut for this:

phenix.combine_models pdb_in=first.pdb \
   second_pdb_in=second.pdb \
   seq.dat \
   map_coeffs.mtz \

Possible Problems

If you are supplying a very big map, running combine_models can require a lot of memory. In this case, you may want to run map_box with your map and a model file that contains all your models. This will box the map around the region where your models are located and make a much smaller map that will require less memory.

Specific limitations and problems:


Additional information

List of all available keywords