The basic steps that the simple_ncs_from_pdb carries out are:
The chains matching is done using dynamic programming alignment of residues and atoms. The first pass contains restriction on minimal similarity, set by min_percent (number of matching residues)/(number of residues in longer chain)
From the matching chains list, we remove chain pairs where the matching segments exceed the RMSD limit.
Matching segments are scanned for local residue misalignment. Residues where (max_atom_distance - min_atom_distance) > match_radius are excluded from matching segment. This allow local differences in matching chains.
If matching residues have different number of atoms (For example if one containing the side chain while the other not), only the matching atoms will be included.
Grouping of chains is not performed.
Alternative conformation are excluded from matching segments.
The matching is done by the residues name strings, not by the residue numbers, this allows handling of insertions in PDB file.
The result of the NCS search is combination of NCS related groups and invariant or non-NCS related regions, to the atom level. In every NCS group all copies have the same number of atoms and can be reproduced by applying the applying the appropriate rotation and translation to the master copy.
When running
phenix.simple_ncs_from_pdb 2h50.pdb
The following files will be produced
2h50_simple_ncs_from_pdb.phil
2h50_simple_ncs_from_pdb.ncs_spec 2h50_simple_ncs_from_pdb.resolve
The file that should be used for refinement is 4boz_simple_ncs_from_pdb.phil. This file can also be modified if a particular NCS relations need to be changed. The content of that file is the exact selection sting of the atoms in the NCS groups
The content of 2h50_simple_ncs_from_pdb.ncs is
ncs_group { reference = chain 'A' selection = chain 'C' selection = chain 'E' selection = chain 'G' selection = chain 'I' selection = chain 'K' selection = chain 'M' selection = chain 'O' selection = chain 'Q' selection = chain 'S' selection = chain 'U' selection = chain 'W' }
To get output in other format:
phenix.simple_ncs_from_pdb 4boz.pdb write_spec_files=True
Simple_ncs_from_pdb will analyze the chains in 4boz.pdb and identify any NCS that exists. For this sample run the following output is produced:
GROUP 1 Summary of NCS group with 2 operators: ID of chain/residue where these apply: [['A', 'D'], [[[147, 150], [152, 211], [213, 275], [280, 305], [307, 308]], [[147, 150], [152, 211], [213, 275], [280, 305], [307, 308]]]] RMSD (A) from chain A: 0.0 0.72 Number of residues matching chain A:[155, 155] OPERATOR 1 CENTER: 24.4880 -13.3177 -20.1848 ROTA 1: 1.0000 0.0000 0.0000 ROTA 2: 0.0000 1.0000 0.0000 ROTA 3: 0.0000 0.0000 1.0000 TRANS: 0.0000 0.0000 0.0000 OPERATOR 2 CENTER: 15.9430 11.8822 0.6609 ROTA 1: 0.7964 -0.5651 0.2152 ROTA 2: -0.5503 -0.8249 -0.1295 ROTA 3: 0.2507 -0.0152 -0.9680 TRANS: 18.3631 5.3425 -23.3604 GROUP 2 Summary of NCS group with 3 operators: ID of chain/residue where these apply: [['B', 'C', 'E'], [[[1, 41], [43, 71]], [[1, 41], [43, 71]], [[1, 41], [43, 71]]]] RMSD (A) from chain B: 0.0 0.8 0.77 Number of residues matching chain B:[70, 70, 70] OPERATOR 1 CENTER: 47.5581 -9.5652 -26.8403 ROTA 1: 1.0000 0.0000 0.0000 ROTA 2: 0.0000 1.0000 0.0000 ROTA 3: 0.0000 0.0000 1.0000 TRANS: 0.0000 0.0000 0.0000 OPERATOR 2 CENTER: 27.7410 -11.0237 -49.7361 ROTA 1: -0.6661 0.2615 -0.6986 ROTA 2: 0.2866 -0.7749 -0.5634 ROTA 3: -0.6886 -0.5755 0.4412 TRANS: 34.1743 -54.0788 7.8610 OPERATOR 3 CENTER: 29.3812 -4.6379 12.2671 ROTA 1: 0.7539 -0.5901 0.2888 ROTA 2: -0.5860 -0.8027 -0.1106 ROTA 3: 0.2971 -0.0858 -0.9510 TRANS: 19.1286 5.2864 -24.3021
Another way to view the results is
phenix.simple_ncs_from_pdb 4boz.pdb show_summary=true Chains in model: --------------------------------------------------- A B C D E . . . . . . . . . . . . . . . . . . . . . . . . . . NCS summary: --------------------------------------------------- Number of NCS groups : 2 Group # : 1 Number of copies : 2 Chains in master : 'A' Chains in copies : 'D' Group # : 2 Number of copies : 3 Chains in master : 'B' Chains in copies : 'C', 'E' . . . . . . . . . . . . . . . . . . . . . . . . . . Transforms: --------------------------------------------------- Group # : 1 Transform # : 1 RMSD : 0 ROTA 0 1.0000 0.0000 0.0000 ROTA 1 0.0000 1.0000 0.0000 ROTA 2 0.0000 0.0000 1.0000 TRANS 0.0000 0.0000 0.0000 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Transform # : 2 RMSD : 0.720792956817 ROTA 0 0.7964 -0.5503 0.2507 ROTA 1 -0.5651 -0.8249 -0.0152 ROTA 2 0.2152 -0.1295 -0.9680 TRANS -5.8295 14.4282 -25.8708 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Group # : 2 Transform # : 1 RMSD : 0 ROTA 0 1.0000 0.0000 0.0000 ROTA 1 0.0000 1.0000 0.0000 ROTA 2 0.0000 0.0000 1.0000 TRANS 0.0000 0.0000 0.0000 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Transform # : 2 RMSD : 0.796197645664 ROTA 0 -0.6661 0.2866 -0.6886 ROTA 1 0.2615 -0.7749 -0.5755 ROTA 2 -0.6986 -0.5634 0.4412 TRANS 43.6766 -46.3175 -10.0616 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Transform # : 3 RMSD : 0.772963874518 ROTA 0 0.7539 -0.5860 0.2971 ROTA 1 -0.5901 -0.8027 -0.0858 ROTA 2 0.2888 -0.1106 -0.9510 TRANS -4.1022 13.4462 -28.0502
There are 5 chains in the PDB file (A,B,C,D,E). In the first group the master is A the the copy D and in the second group the master is B and the copy is C and E. Chain C is not in any group.
RMSD (A) from chain A: 0.0 0.72
shows the RMSD of matching atoms between the master and every other copy, The list of numbers is the list of matching residues by residue number. Note that this is not the exact selection, as it appears in: 4boz_simple_ncs_from_pdb.ncs.
ROTA and TRANS are the rotation and translation information. In the .ncs_spec file and the default on-screen representation of the results, the rotation and translation are:
Master = ROT x Copy + TRANS
While in the summery and in the CCBTX implementation the coomon use is:
Copy = ROT x Master + TRANS
So the rotation/Translation are the inverse of each other in the two formats.
A portion of the contents of the 4boz_simple_ncs_from_pdb.ncs_spec file, which you can edit if you want and which you can use in the AutoBuild Wizard, are shown below. NOTE: The ncs operators describe how to map the N'th ncs-related copy on to the first copy.
Summary of NCS information Thu Apr 2 15:44:03 2015 /net/cci-filer2/raid1/home/... new_ncs_group new_operator rota_matrix 1.0000 0.0000 0.0000 rota_matrix 0.0000 1.0000 0.0000 rota_matrix 0.0000 0.0000 1.0000 tran_orth 0.0000 0.0000 0.0000 center_orth 24.4880 -13.3177 -20.1848 CHAIN A RMSD 0 MATCHING 155 RESSEQ 147:150 RESSEQ 152:211 RESSEQ 213:275 RESSEQ 280:305 RESSEQ 307:308 new_operator rota_matrix 0.7955 -0.5660 0.2164 rota_matrix -0.5511 -0.8242 -0.1300 rota_matrix 0.2520 -0.0159 -0.9676 tran_orth 18.4021 5.3569 -23.3621 center_orth 15.9430 11.8822 0.6609 CHAIN D RMSD 0.817 MATCHING 155 RESSEQ 147:150 RESSEQ 152:211 RESSEQ 213:275 RESSEQ 280:305 RESSEQ 307:308
e.g. set bigger chain_max_rmsd, smaller chain_similarity_threshold.
Simple_ncs_from_pdb produces very complicated selections, like
reference = (chain 'A' and (resid 147:150 or resid 152:194 or (resid 195 and (name N or name CA or name C or name O or name CB )) or resid 196:211 or resid 213:229 or (resid 230 and (name N or name CA or name C or name O or name CB or name CG )) or resid 231:298 or (resid 299 and (name N or name CA or name C or name O or name CB or name CG )) or resid 300:305 or resid 307:308))
This indicates that procedure is trying to exclude some residues or atoms from the selections. This could be caused by alternative confomations of residues (not supported), poor alignment of particular residues (try to increase residue_match_radius) or missing atoms in one of the chains.
Master = ROT x Copy + TRANS
and not:
Copy = ROT x Master + TRANS
They are the inverse of the rotation and translation that are used in the implementation of the NCS relation.
To run simple_ncs_from_pdb in validation mode one should supply NCS groups in form of phil file. The purpose of this mode is to illustrate how your NCS annotations will be filtered in phenix.refine and phenix.real_space_refine .
We filter annotations because we need to be sure that:
Without all these NCS restraints and constraints cannot work reliably. We understand that it could be tricky to manually create such selections, especially for poor models where some residues may be not modelled yet or lacking some atoms. Therefore we filter user-supplied annotations before refinement by default and there is no way to turn it off.
For every NCS group supplied by a user we:
Select part of the model containing reference atoms and all selection atoms.
Then we search for NCS in this sub-model using parameters from ncs_search scope with some modifications. Precisely:
These modifications are used to relax criteria for validation to a large extent to accomodate poor matches in poor models.
If found NCS relations cover all atoms selected in 1, validation passed, the NCS group is used as was defined by a user If found NCS relations do not cover all the initial atoms selected in 1, we create new NCS selections and use them further in refinement. If no NCS relations were found, the whole NCS group is discarded.
Note, we only filter supplied selections. Nothing extra will be added.
One may find something like this in the log-file:
Validating user-supplied NCS groups... Validating: ncs_group { reference = "(chain A and resid 11:376 and not (resid 19:31 or resid 42:67 or resid 70:123 or resid 132:145 or resid 148:184 or resid 190:249 or resid 339:357 or resid 377:396))" selection = "(chain C and resid 11:376 and not (resid 19:31 or resid 42:67 or resid 70:123 or resid 132:145 or resid 148:184 or resid 190:249 or resid 339:357 or resid 377:396))" } MODIFIED. Some of the atoms were excluded from your selection. The most common reasons are: 1. Missing residues in one or several copies in NCS group. 2. Presence of alternative conformations (they are excluded). 3. Residue mismatch in requested copies. Please check the validated selection further down.
and further down something like:
Found NCS groups: ncs_group { reference = (chain 'A' and (resid 11 through 18 or resid 32 through 41 or resid 68 through 6 \ 9 or resid 124 through 125 or resid 129 through 131 or resid 146 through 147 or \ resid 185 through 189 or resid 250 through 338 or resid 358 through 364 or resid \ 375 through 376)) selection = (chain 'C' and (resid 11 through 18 or resid 32 through 41 or resid 68 through 6 \ 9 or resid 124 through 131 or resid 146 through 147 or resid 185 through 189 or \ resid 250 through 338 or resid 358 through 376)) }
This is how NCS selections were modified for one of the reasons listed above. In this example, they are essentially the same with slightly different syntax. Let's look at the differences.
These two are the only differences from your selection and they don't change anything! They look much different because we have to construct new selections from scratch (internal representation of selections) automatically.
When one sees something like this in new selections:
(resid 28 and (name N or name CA or name C or name O or name CB or name CG or name CD ))
this is usually because another chain is lacking particular atom(s). Therefore we select particular atoms of the residue that present in all other copies. In this particular case, residue 28 in chain A has NE and CZ atoms, while chain B lacks them.
Other messages one can see is:
REJECTED because copies don't match good enough. Try to revise selections or adjust chain_similarity_threshold or chain_max_rmsd parameters. OK. All atoms were included in validated selection.