Thank you, Oleg, for the explanation. I was not aware of the history of segids.
As someone who started in crystallography in 1999, I've never
used segids. They have only been a nuisance. To my knowledge, they
are not intentionally added by users like me, who only play with
chain IDs, but software adds them. There is a "Change chain ID"
menu item in Coot but no segid equivalent. I've always thought
Phaser or Coot added them during some operation, since they appear
after molecular replacement and/or model building, but I have not
investigated, so I may be wrong. In Kevin's case, it was AutoBuild
(and Tom kindly fixed that).
Reading your message, I am convinced segids are unnecessary and
unused. So, ignore them I'd say, but I'm sure the phenix team has
thought more deeply about this, and knows of cases of actual use.
However, I do not agree with the point on second-order consequences: neither Coot nor PyMOL can display these cifs with correct chain sequences, and wwPDB does not accept them. Losing all carefully curated chain IDs (sometimes going up to AA and beyond) because of a stray segid is a pain. Those are significant consequences at the moment.
Regardless, I should be able to fix this issue going forward. I
really appreciate all the work you and the team do.
Thank you!
Engin
P.S. I can now go back to worrying only about having a ton of new
chains for each N-linked glycan in the cif file.
Hi Engin,
I share your frustration over this issue. Without defending the current approach too much, let me share the rationale behind what is going on in Phenix.
A brief historical note, thanks to https://www.wwpdb.org/documentation/file-format
SEGID was not in the original PDB format description of 1972. It was introduced in 1996: https://cdn.rcsb.org/wwpdb/docs/documentation/file-format/PDB_format_Jan_1996.pdf and quickly disappeared in 1998: https://www.wwpdb.org/documentation/file-format-content/format23/sect9.html#ATOM
Likely, because of a real necessity, two years were enough for the community to start using it and refuse to let it go despite its disappearance from the format specifications. The demand for the support of segid was probably the reason why CCTBX processes it even though CCTBX was first published in 2002.
Part of the structural biology community is using segids largely instead of chain IDs, often leaving the chain ID field blank. This is the major use case I'm aware of and the case CCTBX supports.
Now comes mmCIF, and there is NO place for segid because there has been no formal segid definition for the last 25 years: https://mmcif.wwpdb.org/docs/pdb_to_pdbx_correspondences.html#ATOMP
The absence of segid prevents us from converting such PDB files into mmCIF directly, so we have to get creative. Here is the present state of my understanding: either no segids and no ambiguity, or segids are used instead of chain IDs. I admit this is a rather narrow use-case scenario, and I can definitely see that random leftover or carry-over segids can spoil the output.
Connectivity differences resulting from such mmCIF/PDB files are second-order consequences, as we definitely use a lot of heuristics to figure out connectivity, and since conversion PDB ⇄ mmCIF is not equivalent in the presence of random segids, connectivity might be compromised.
The hope is that with the PDB format being gradually phased out, all of this will be of less concern for developers and users.
Best regards,
Oleg Sobolev.
On Tue, Mar 11, 2025 at 6:15 PM Engin Özkan <[email protected]> wrote:
Hi,
This was the source of a recent issue I have been having with PDB depositions. There were no ligands involved; as Oleg explained, the presence of segids resulted in a cif file that could not be interpreted by PyMOL or Coot, or used for PDB deposition, because chain IDs were overwritten in the cif file only, not in the pdb file.
If phenix chooses to overwrite chain IDs with segids in the cif file, that is one rational way to do it, although I would not prefer it. I am puzzled, though: why is the pdb file not handled the same way? Why produce pdb and cif files with different chain connectivities? I think treating both files consistently makes the most sense.
(Back to re-running all phenix.refine jobs after deleting all segid columns...)
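For anyone in the same position, deleting the segid columns can be scripted rather than done by hand. Below is a minimal Python sketch, not part of Phenix or CCTBX, that assumes fixed-column PDB records with the segid in columns 73-76 of ATOM, HETATM and ANISOU lines, as in the 1996 format description. The function name blank_segids is made up for illustration.

```python
def blank_segids(pdb_text: str) -> str:
    """Return pdb_text with the segid field (columns 73-76) blanked.

    Assumption: standard fixed-column PDB records. Only ATOM, HETATM and
    ANISOU lines are touched; every other line is passed through unchanged.
    """
    records = ("ATOM  ", "HETATM", "ANISOU")
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(records) and len(line) > 72:
            # Columns 73-76 (1-based) are string indices 72..75.
            # Everything from column 77 on (element, charge) is kept.
            line = line[:72] + "    " + line[76:]
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python blank_segids.py in.pdb > out.pdb
    with open(sys.argv[1]) as handle:
        sys.stdout.write(blank_segids(handle.read()))
```

Files with a non-standard column layout would not be handled correctly, so it is worth inspecting the output before re-running refinement.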
Sorry if I am misinterpreting this and thank you!
Engin
On 2/10/25 11:51 AM, Kevin M Jude wrote:
Got it. Phenix.autobuild assigns the UNK segid to chains with low confidence where the backbone was correct but sequence assignment was not. I corrected the sequence assignment for those positions in Coot, but the presence of the segids is not apparent without looking at the plain text of the pdb file.
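Since the segids are easy to miss in a graphical program, a quick scan of the file can show whether any are present. This is a minimal Python sketch assuming fixed-column PDB records (chain ID in column 22, segid in columns 73-76); segids_by_chain is a hypothetical helper, not a Phenix or CCTBX function.

```python
from collections import defaultdict


def segids_by_chain(pdb_text: str) -> dict:
    """Map each chain ID to the sorted list of non-blank segids found on it.

    Assumption: fixed-column ATOM/HETATM records. Chains whose atoms all
    have a blank segid field do not appear in the result.
    """
    found = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            chain_id = line[21:22]        # column 22
            segid = line[72:76].strip()   # columns 73-76
            if segid:
                found[chain_id].add(segid)
    return {chain: sorted(ids) for chain, ids in found.items()}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python segids_by_chain.py model.pdb
    with open(sys.argv[1]) as handle:
        for chain, ids in segids_by_chain(handle.read()).items():
            print(f"chain {chain!r}: segids {ids}")
```

An empty result means no ATOM or HETATM record carries a segid under this column assumption.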
Best wishes
Kevin
_______________________________________________
phenixbb mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Unsubscribe: phenixbb-leave@%(host_name)s
--
Engin Özkan, Ph.D.
Associate Professor
Dept of Biochemistry and Molecular Biology
University of Chicago
Phone: (773) 834-5498
http://ozkan.uchicago.edu