[phenixbb] single atom mmcif bug

Edwin Pozharski pozharskibb at gmail.com
Wed Sep 5 13:35:15 PDT 2018

There appears to be a bug in phenix.cif_as_pdb. I am looking at a bunch of
files that contain very few atoms.  When there is more than one atom, the
mmCIF file contains multiple HETATM records in a loop.  However, single-atom
entries seem to encode atomic coordinates in a different way, like so:

_atom_site.group_PDB             HETATM
_atom_site.id                    4905
_atom_site.type_symbol           CA
_atom_site.label_atom_id         CA
_atom_site.label_alt_id          .
_atom_site.label_comp_id         CA
_atom_site.label_asym_id         L
_atom_site.label_entity_id       5
_atom_site.label_seq_id          .
_atom_site.pdbx_PDB_ins_code     .
_atom_site.Cartn_x               0.394
_atom_site.Cartn_y               -6.965
_atom_site.Cartn_z               18.775
_atom_site.occupancy             1
_atom_site.B_iso_or_equiv        2
_atom_site.pdbx_formal_charge    ?
_atom_site.auth_atom_id          CA
_atom_site.auth_comp_id          CA
_atom_site.auth_asym_id          A
_atom_site.auth_seq_id           701
_atom_site.pdbx_PDB_model_num    1

This does not include a loop_ statement and instead lists the items
directly as name-value pairs.  If I try to convert this to PDB, cif_as_pdb
fails like so (and I suspect that pretty much any read of such a CIF file
would fail):

  Python argument types in
    double.__init__(double, str)
did not match C++ signature:
    __init__(boost::python::api::object, boost::python::numeric::array)
    __init__(boost::python::api::object, boost::python::tuple)
    __init__(boost::python::api::object, boost::python::list)
    __init__(boost::python::api::object, std::vector<double, std::allocator<double> >)
    __init__(boost::python::api::object, scitbx::af::const_ref<std::string,
    __init__(_object*, scitbx::af::shared_plain<double>)
    __init__(_object*, unsigned long)
    __init__(_object*, unsigned long, double)
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul>
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >, double)
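
The traceback suggests that a plain string reached a constructor that
expects a sequence of numbers, which is what one would get if a looped item
comes back as a list of strings while a non-looped item comes back as a
single string.  A minimal sketch of that suspected failure mode and of one
way to guard against it follows.  This is a guess, not the actual phenix
code: as_column and to_floats are hypothetical helpers, and the dictionaries
only stand in for whatever the real CIF reader returns.

```python
def as_column(value):
    """Return value as a list, wrapping a bare scalar in a one-element list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def to_floats(value):
    """Convert a CIF item (one string or a list of strings) to a list of floats."""
    return [float(v) for v in as_column(value)]


# Looped category: each item maps to a list of strings, one per row.
looped = {"_atom_site.Cartn_x": ["0.394", "1.250"]}
# Name-value category: the same item maps to a single string.
single = {"_atom_site.Cartn_x": "0.394"}

xs_looped = to_floats(looped["_atom_site.Cartn_x"])
xs_single = to_floats(single["_atom_site.Cartn_x"])
```

With this normalisation both forms end up as a list of floats, so downstream
code that builds a numeric array would not need to care which form the file
used.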

This has nothing to do with how many atoms there are, obviously - in fact,
removing all HETATM records but one from a CIF file that has a loop_
statement does not produce this bug.

The obvious guess is that the bug is somewhere in mmdb (I am assuming here
that phenix relies on it for cif read/write).

This is not a big deal, of course - in most scenarios, a structure contains
more than one atom.  It does not interfere with my work either - I am
discarding single-atom cases in this analysis anyway (so it actually serves
as an unintentional filter).

Still, it looks like a bug.  I am not an expert in the mmCIF format, but
iiuc, there is no strict requirement for the atom_site category to be
written as a loop_.  If there is (I can't find any such syntax requirement),
then the source of the data files (PDBe coordinate server) would be at fault
for not structuring the data properly.
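
Since a file with a loop_ and a single HETATM record converts fine, one
possible workaround is to rewrite the name-value category as a one-row loop_
before conversion.  A rough sketch, assuming each item sits on its own line
as "name value" and that there are no multi-line (semicolon-delimited)
values; key_value_to_loop is a hypothetical helper, not part of phenix.

```python
def key_value_to_loop(lines, category="_atom_site."):
    """Rewrite consecutive 'name value' lines of one category as a one-row loop_."""
    out, names, values = [], [], []

    def flush():
        # Emit the collected items as a loop_ header plus a single data row.
        if names:
            out.append("loop_")
            out.extend(names)
            out.append(" ".join(values))
            names.clear()
            values.clear()

    for line in lines:
        parts = line.split(None, 1)
        # A name-value line has two fields; a loop_ header line has only the name.
        if len(parts) == 2 and parts[0].startswith(category):
            names.append(parts[0])
            values.append(parts[1].strip())
        else:
            flush()
            out.append(line)
    flush()
    return out


example = [
    "data_test",
    "_atom_site.group_PDB             HETATM",
    "_atom_site.id                    4905",
    "_atom_site.Cartn_x               0.394",
    "#",
]
converted = key_value_to_loop(example)
```

Lines that already belong to a loop_ header carry only the item name, so
they pass through unchanged.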

