Dear Edwin,

Thank you for pointing it out!

It was an oversight in the procedure for creating the internal model representation from mmCIF files in the cctbx library. I'm happy to report that this has been fixed, and newer versions of Phenix (after 3254) and the cctbx library should behave as expected.

Cctbx uses its own tools to parse mmCIF, so I don't know if this corner-case could be an issue for other libraries/software.

Best regards,
Oleg Sobolev.


On Wed, Sep 5, 2018 at 1:35 PM, Edwin Pozharski <pozharskibb@gmail.com> wrote:
There appears to be a bug in phenix.cif_as_pdb. I am looking at a bunch of files that contain very few atoms.  When there is more than one atom, the mmCIF file contains multiple HETATM records in a loop.  However, single-atom entries seem to encode atomic coordinates in a different way, like so

#
_atom_site.group_PDB             HETATM
_atom_site.id                    4905
_atom_site.type_symbol           CA
_atom_site.label_atom_id         CA
_atom_site.label_alt_id          .
_atom_site.label_comp_id         CA
_atom_site.label_asym_id         L
_atom_site.label_entity_id       5
_atom_site.label_seq_id          .
_atom_site.pdbx_PDB_ins_code     .
_atom_site.Cartn_x               0.394
_atom_site.Cartn_y               -6.965
_atom_site.Cartn_z               18.775
_atom_site.occupancy             1
_atom_site.B_iso_or_equiv        2
_atom_site.pdbx_formal_charge    ?
_atom_site.auth_atom_id          CA
_atom_site.auth_comp_id          CA
_atom_site.auth_asym_id          A
_atom_site.auth_seq_id           701
_atom_site.pdbx_PDB_model_num    1

This does not include a loop_ statement and instead lists each data item directly as a name-value pair.  If I try to convert this to pdb, cif_as_pdb fails like so (and I suspect that pretty much any cif-file read would fail)

  Python argument types in
    double.__init__(double, str)
did not match C++ signature:
    __init__(boost::python::api::object, boost::python::numeric::array)
    __init__(boost::python::api::object, boost::python::tuple)
    __init__(boost::python::api::object, boost::python::list)
    __init__(boost::python::api::object, std::vector<double, std::allocator<double> >)
    __init__(boost::python::api::object, scitbx::af::const_ref<std::string, scitbx::af::trivial_accessor>)
    __init__(_object*, scitbx::af::shared_plain<double>)
    __init__(_object*, unsigned long)
    __init__(_object*, unsigned long, double)
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >)
    __init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >, double)
    __init__(_object*)

This has nothing to do with how many atoms there are, obviously - in fact, removing all HETATM records but one from a cif file that has a loop_ statement does not produce this bug.

The obvious guess is that the bug is somewhere in mmdb (I am assuming here that phenix relies on it for cif read/write).  

This is not a big deal, of course - in most scenarios, a structure contains more than one atom.  It does not interfere with my work either - I am discarding single-atom cases in this analysis anyway (so it actually serves as an unintentional filter).

Still, it looks like a bug.  I am not an expert in mmcif format, but iiuc, there is no strict requirement for the atom_site category to use loop_.  If there is (I can't find any such syntax requirement), then the source of the data files (PDBe coordinate server) would be at fault for not structuring the data properly.
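
To illustrate the two encodings, here is a minimal sketch in Python of a reader that treats them the same way. It assumes a heavily simplified mmCIF syntax (whitespace-separated tokens, no quoted strings or multi-line text fields), and the function name read_category is hypothetical. It is not how cctbx or any other library actually parses mmCIF; it only shows that a single-row category can be normalized to the same column-of-lists shape as a looped one.

```python
def read_category(text, category="_atom_site"):
    """Return {item_name: [values, ...]} for one category.

    Simplified sketch: tokens are split on whitespace, so quoted
    strings and multi-line text fields are NOT handled.
    """
    tokens = [
        tok
        for line in text.splitlines()
        if not line.startswith("#")  # skip comment/separator lines
        for tok in line.split()
    ]
    prefix = category + "."
    columns = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "loop_":
            # Looped form: a run of item names, then rows of values.
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i])
                i += 1
            values = []
            while (i < len(tokens)
                   and not tokens[i].startswith("_")
                   and tokens[i] != "loop_"):
                values.append(tokens[i])
                i += 1
            if names and names[0].startswith(prefix):
                for col, name in enumerate(names):
                    # Every len(names)-th value belongs to this column.
                    columns[name] = values[col::len(names)]
        elif tokens[i].startswith(prefix) and i + 1 < len(tokens):
            # Non-looped form: name followed by exactly one value.
            # Wrap it in a list so callers always get a sequence.
            columns[tokens[i]] = [tokens[i + 1]]
            i += 2
        else:
            i += 1
    return columns


single = """#
_atom_site.group_PDB HETATM
_atom_site.id 4905
_atom_site.Cartn_x 0.394
"""

looped = """loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.Cartn_x
HETATM 4905 0.394
"""

# Both forms yield lists, so numeric conversion works the same way.
xs_single = [float(v) for v in read_category(single)["_atom_site.Cartn_x"]]
xs_looped = [float(v) for v in read_category(looped)["_atom_site.Cartn_x"]]
```

With this normalization a downstream array constructor always receives a sequence of values and never a bare string, which is the kind of mismatch the traceback above seems to point at.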

Cheers,

Ed.

_______________________________________________
phenixbb mailing list
phenixbb@phenix-online.org
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: phenixbb-leave@phenix-online.org