[phenixbb] single atom mmcif bug

Edwin Pozharski pozharskibb at gmail.com
Wed Sep 5 13:35:15 PDT 2018


There appears to be a bug in phenix.cif_as_pdb. I am looking at a bunch of
files that contain very few atoms. When there is more than one atom, the mmCIF
file contains multiple HETATM records in a loop. However, single-atom entries
seem to encode the atomic coordinates differently, like so:

#
_atom_site.group_PDB             HETATM
_atom_site.id                    4905
_atom_site.type_symbol           CA
_atom_site.label_atom_id         CA
_atom_site.label_alt_id          .
_atom_site.label_comp_id         CA
_atom_site.label_asym_id         L
_atom_site.label_entity_id       5
_atom_site.label_seq_id          .
_atom_site.pdbx_PDB_ins_code     .
_atom_site.Cartn_x               0.394
_atom_site.Cartn_y               -6.965
_atom_site.Cartn_z               18.775
_atom_site.occupancy             1
_atom_site.B_iso_or_equiv        2
_atom_site.pdbx_formal_charge    ?
_atom_site.auth_atom_id          CA
_atom_site.auth_comp_id          CA
_atom_site.auth_asym_id          A
_atom_site.auth_seq_id           701
_atom_site.pdbx_PDB_model_num    1

This block does not include a loop_ statement and instead lists the item
values directly. If I try to convert this to PDB, cif_as_pdb fails like so
(and I suspect that pretty much any attempt to read such a CIF file would fail):

  Python argument types in
    double.__init__(double, str)
did not match C++ signature:
    __init__(boost::python::api::object, boost::python::numeric::array)
    __init__(boost::python::api::object, boost::python::tuple)
    __init__(boost::python::api::object, boost::python::list)
    __init__(boost::python::api::object, std::vector<double, std::allocator<double> >)
    __init__(boost::python::api::object, scitbx::af::const_ref<std::string, scitbx::af::trivial_accessor>)
    __init__(_object*, scitbx::af::shared_plain<double>)
    __init__(_object*, unsigned long)
    __init__(_object*, unsigned long, double)
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >)
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >, double)
    __init__(_object*)

Obviously, this has nothing to do with how many atoms there are: removing all
HETATM records but one from a CIF file that has a loop_ statement does not
reproduce this bug.
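CIF syntax allows a category with a single row to be written either as a one-row loop_ or as plain tag-value pairs, and a reader is expected to treat both the same way. As a rough illustration only, the hypothetical `read_category` below (not an iotbx or mmdb function) normalizes both forms into a list of row dictionaries. It assumes whitespace-separated, unquoted values and does not handle quoted strings or semicolon text fields, so it is not a full CIF parser.

```python
from typing import Dict, List


def read_category(text: str, category: str) -> List[Dict[str, str]]:
    """Return the rows of `category`, whether written as loop_ or key-value.

    Hypothetical sketch: tokens are whitespace-separated; quoted strings and
    semicolon-delimited text fields are NOT handled.
    """
    prefix = "_" + category + "."
    tokens: List[str] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments (naive)
        tokens.extend(line.split())

    rows: List[Dict[str, str]] = []
    single: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() == "loop_":
            # Looped form: a run of tags, then values filling rows in order.
            i += 1
            tags: List[str] = []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            values: List[str] = []
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i].lower() != "loop_"
                   and not tokens[i].lower().startswith("data_")):
                values.append(tokens[i])
                i += 1
            if tags and tags[0].startswith(prefix):
                names = [t[len(prefix):] for t in tags]
                for start in range(0, len(values), len(names)):
                    chunk = values[start:start + len(names)]
                    rows.append(dict(zip(names, chunk)))
        elif tok.startswith(prefix) and i + 1 < len(tokens):
            # Non-looped form: one tag immediately followed by its value.
            single[tok[len(prefix):]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    if single:
        rows.append(single)
    return rows


single_text = """
_atom_site.group_PDB   HETATM
_atom_site.id          4905
_atom_site.Cartn_x     0.394
"""

looped_text = """
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.Cartn_x
HETATM 4905 0.394
HETATM 4906 1.250
"""

print(read_category(single_text, "atom_site"))
print(read_category(looped_text, "atom_site"))
```

Both calls yield rows with the same keys, which is the behaviour one would expect from any CIF reader regardless of how the category was written.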

The obvious guess is that the bug is somewhere in mmdb (I am assuming here
that phenix relies on it for CIF reading and writing).
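The traceback itself may offer a hint, though this is only a guess from the message: `double.__init__(double, str)` says a plain string reached a constructor (presumably scitbx's `flex.double`) whose accepted signatures all take an array, a sequence, or a size, never a single string. That would happen if the reader hands back a bare string for a non-looped item where it hands back a list of strings for a looped one. The pure-Python sketch below mimics that failure mode and one defensive fix; `as_column` and `to_floats` are hypothetical helpers, not phenix or iotbx functions.

```python
from typing import List, Union

# Assumption: a reader returns a str for a non-looped item and a list of
# str for a looped one. Neither helper below exists in phenix/iotbx.
CifValue = Union[str, List[str]]


def as_column(value: CifValue) -> List[str]:
    """Wrap a scalar item value into a one-element column."""
    if isinstance(value, str):
        return [value]
    return list(value)


def to_floats(value: CifValue) -> List[float]:
    """Convert a coordinate column to floats, tolerating the scalar form."""
    return [float(v) for v in as_column(value)]


looped = ["0.394", "1.250"]  # what a loop_ would give
single = "0.394"             # what a single tag-value pair would give

print(to_floats(looped))
print(to_floats(single))

# Without as_column, iterating the scalar walks its characters and fails
# on the '.' character.
try:
    [float(v) for v in single]
except ValueError as exc:
    print("naive conversion failed:", exc)
```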

This is not a big deal, of course: in most scenarios a structure contains
more than one atom. It does not affect me either, since I am discarding
single-atom cases in this analysis anyway (so the bug actually serves as an
unintentional filter).

Still, it looks like a bug. I am not an expert in the mmCIF format, but if I
understand correctly, there is no strict requirement for the atom_site category
to use a loop_. If there is (I can't find any such syntax requirement), then
the source of the data files (the PDBe coordinate server) would be at fault for
not structuring the data properly.
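Until the reader handles both forms, one possible workaround (a sketch, not an existing phenix tool) is to rewrite the non-looped category as an equivalent one-row loop_ before running cif_as_pdb. The hypothetical `loopify_category` below assumes exactly one tag and one unquoted value per line, as in the single-atom block above, and leaves categories that are already looped untouched.

```python
from typing import List


def loopify_category(text: str, category: str) -> str:
    """Rewrite non-looped `_category.item value` lines as a one-row loop_.

    Hypothetical sketch: assumes one tag-value pair per line with an
    unquoted value. Looped tags sit alone on their lines, so they are
    not matched and pass through unchanged.
    """
    prefix = "_" + category + "."
    tags: List[str] = []
    values: List[str] = []
    out: List[str] = []
    insert_at = -1  # position in `out` where the loop_ block will go
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefix):
            if insert_at < 0:
                insert_at = len(out)
            tags.append(parts[0])
            values.append(parts[1])
        else:
            out.append(line)
    if insert_at < 0:
        return text  # category not present in non-looped form
    out[insert_at:insert_at] = ["loop_"] + tags + [" ".join(values)]
    return "\n".join(out)


block = """#
_atom_site.group_PDB             HETATM
_atom_site.id                    4905
_atom_site.Cartn_x               0.394
#"""

print(loopify_category(block, "atom_site"))
```

The loop_ header and all tags are emitted first and the values follow on one line, in the same order as the tags, which keeps the row aligned with its columns.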

Cheers,

Ed.