Got it. Phenix.autobuild assigns the UNK segid to chains with low confidence where the backbone was correct but sequence assignment was not. I corrected the sequence assignment for those positions in Coot, but the presence of the segids is not apparent without looking at the plain text of the pdb file.

 

Best wishes

 

Kevin

 

 

From: Oleg Sobolev <[email protected]>
Date: Monday, February 10, 2025 at 9:31 AM
To: Kevin M Jude <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [phenixbb] .cif file has 'UNK' auth_asym_id

Hi Kevin,

 

Thank you for the clarification; the situation is much clearer now.

 

The "UNK" in the segid field was the root cause of the issue. When segid is present, Phenix prioritizes it over the chain ID when reading PDB files. This explains why the segid was applied to the entire chain. Essentially, Phenix interprets any segid as a chain ID. Handling cases with different segids for the same chain becomes overly complex, especially since the segid is not a commonly used feature in the PDB format. I'm not sure whether your specific case could be accommodated within the current processing workflow.
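
For anyone hitting the same thing, here is a minimal sketch (plain Python, not Phenix code) of how one might check a PDB file for leftover segids and blank them before refinement. It assumes the legacy fixed-column layout, with the chain ID in column 22 and the segid in columns 73-76. The function names are made up for illustration.

```python
from collections import defaultdict

COORD_RECORDS = ("ATOM", "HETATM", "ANISOU")


def segids_by_chain(pdb_lines):
    """Map each chain ID to the set of non-blank segids found on its atoms."""
    found = defaultdict(set)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21:22]       # column 22 (assumed legacy layout)
            segid = line[72:76].strip()  # columns 73-76 (assumed legacy layout)
            if segid:
                found[chain_id].add(segid)
    return dict(found)


def strip_segids(pdb_lines):
    """Return copies of the lines with columns 73-76 blanked on coordinate records."""
    cleaned = []
    for line in pdb_lines:
        if line.startswith(COORD_RECORDS) and len(line) > 72:
            # Keep everything before and after the segid field untouched.
            line = (line[:72] + "    " + line[76:]).rstrip()
        cleaned.append(line)
    return cleaned
```

Both functions expect lines without trailing newlines, e.g. from `open(path).read().splitlines()`. A non-empty result from `segids_by_chain` would flag the chains worth inspecting.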

 

When Phenix reads an mmCIF model file, it uses the auth_asym_id as the chain ID and largely disregards the label_asym_id, which explains the duplicated labels.
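
To illustrate why this produces duplicates, here is a simplified sketch (not the actual Phenix logic) that keys atoms by chain, residue number, and atom name. The chain C row is hypothetical; only the chain B atom appears in this thread. Real atom identity also involves residue name, altloc, and insertion code, which are left out here.

```python
from collections import Counter


def duplicate_atom_keys(atoms, chain_field="auth_asym_id"):
    """Return atom keys that occur more than once when chains are named by chain_field."""
    counts = Counter(
        (atom[chain_field], atom["auth_seq_id"], atom["label_atom_id"])
        for atom in atoms
    )
    return sorted(key for key, n in counts.items() if n > 1)


atoms = [
    # Chain B atom from the thread; auth_asym_id was overwritten by the segid.
    {"label_asym_id": "B", "auth_asym_id": "UNK", "auth_seq_id": 2, "label_atom_id": "N"},
    # Hypothetical matching atom in chain C, also carrying the UNK segid.
    {"label_asym_id": "C", "auth_asym_id": "UNK", "auth_seq_id": 2, "label_atom_id": "N"},
]

by_auth = duplicate_atom_keys(atoms)                     # keyed on auth_asym_id
by_label = duplicate_atom_keys(atoms, "label_asym_id")   # keyed on label_asym_id
```

Keyed on auth_asym_id the two atoms collide, while keyed on label_asym_id they stay distinct, which matches the behavior described above.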

 

Great that you figured out the root cause. Please let me know if you have more questions!

 

Best regards,

Oleg Sobolev

 

 

On Fri, Feb 7, 2025 at 8:10PM Kevin M Jude <[email protected]> wrote:

Thanks Oleg. I investigated some more and found a clue:

A few residues at the termini of the B and C chains in the input pdb file have UNK as the segid (columns 73-75). The segids were introduced in autobuild. I had apparently noticed this and removed them in the .pdb when I finished refinement a few months ago, because the final output pdb file has a later edit date than the rest of the output files. In the .cif files, all residues in those chains are labeled as ‘UNK’ in the auth_asym_id. Now, three months later, when making Table 1 using the .cif file, I was surprised when phenix complained about ‘duplicate atoms’ in the cif file.

 

So now I guess the mystery to me is why phenix extends the UNK segid to the whole chain, and why phenix sees atoms with the same auth_asym_id (segid) but different label_asym_id (chain) as being duplicates. I’ll leave it to you to decide if this is a bug in the program or in the user, but I'm still happy to share my files with you off-list if you like.

 

Best wishes

Kevin

 

From: Oleg Sobolev <[email protected]>
Date: Friday, February 7, 2025 at 4:52 PM
To: Kevin M Jude <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [phenixbb] .cif file has 'UNK' auth_asym_id

Hi Kevin,

 

Thank you for the report. I would be happy to fix the issue. For this I need to be able to reproduce it myself. Can you please share (off-list) the inputs that you used for the refinement to produce this result? All files will be treated confidentially.

 

Best regards,

Oleg Sobolev.

 

On Fri, Feb 7, 2025 at 12:34PM Kevin M Jude <[email protected]> wrote:

I’ve finished refining a structure with three protein chains A, B, C. The pdb file looks ‘normal’ to me, but when I inspect the .cif file written by phenix, the auth_asym_id for chains B and C is ‘UNK’. label_asym_id is correct for all chains. I’m not really sure what the difference is between the auth_ and label_ fields. When I try to perform actions on the .cif file, I get duplicate atom label errors.

 

Here are a few example lines from the .pdb and .cif files, where auth_asym_id is field 6 and label_asym_id is field 16 of each .cif row:

 

ATOM    460  OXT PRO A  59       5.272  55.689  31.481  1.00 52.63           O
ANISOU  460  OXT PRO A  59     6541   6212   7244   1448   1260     96       O
TER
ATOM    461  N   ALA B   2      39.292  61.974  39.403  1.00 67.81           N
ANISOU  461  N   ALA B   2     7118   8729   9916  -1711   -388   2066       N

 

   ATOM 460 OXT . PRO A 59 ? 5.27169 55.68852 31.48123 1.000 52.63052 O ? A ? 58 1

   ATOM 461 N . ALA UNK 2 ? 39.29207 61.97430 39.40271 1.000 67.80544 N ? B ? 1 1
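
As a rough sketch, the two rows above can be checked for disagreement between the two chain fields by splitting on whitespace. The field positions (6 and 16) are taken from this message only; other files may order the _atom_site columns differently, and a plain split would break on quoted values, so a real mmCIF parser is the safer route for anything beyond a quick look. The function name is made up for illustration.

```python
AUTH_ASYM_FIELD = 6    # 1-based position of auth_asym_id in these rows (assumed)
LABEL_ASYM_FIELD = 16  # 1-based position of label_asym_id in these rows (assumed)


def asym_id_mismatches(rows):
    """Return (atom serial, auth_asym_id, label_asym_id) for rows where the two differ."""
    mismatches = []
    for row in rows:
        fields = row.split()
        # Skip anything that is not a complete atom row.
        if len(fields) < LABEL_ASYM_FIELD or fields[0] not in ("ATOM", "HETATM"):
            continue
        auth = fields[AUTH_ASYM_FIELD - 1]
        label = fields[LABEL_ASYM_FIELD - 1]
        if auth != label:
            mismatches.append((fields[1], auth, label))
    return mismatches


rows = [
    "ATOM 460 OXT . PRO A 59 ? 5.27169 55.68852 31.48123 1.000 52.63052 O ? A ? 58 1",
    "ATOM 461 N . ALA UNK 2 ? 39.29207 61.97430 39.40271 1.000 67.80544 N ? B ? 1 1",
]

mismatches = asym_id_mismatches(rows)
```

Note that a mismatch is not an error in itself: the two IDs legitimately differ in many files, for example when ligands and waters get their own label_asym_id. Here it is only a quick way to spot the unexpected 'UNK'.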

 

I’m able to convert the pdb file to a usable cif file using gemmi but wanted to report this weird behavior with phenix 1.21.2_5419.

 

-- 

Kevin Jude, PhD

Structural Biology Research Specialist, Garcia Lab

Howard Hughes Medical Institute

Stanford University School of Medicine

Beckman B177, 279 Campus Drive, Stanford CA 94305
