There appears to be a bug in phenix.cif_as_pdb. I am looking at a bunch of
files that contain very few atoms. When a file has more than one atom, the
mmCIF contains multiple HETATM records in a loop. However, single-atom
entries seem to encode the atomic coordinates differently, like so:
#
_atom_site.group_PDB HETATM
_atom_site.id 4905
_atom_site.type_symbol CA
_atom_site.label_atom_id CA
_atom_site.label_alt_id .
_atom_site.label_comp_id CA
_atom_site.label_asym_id L
_atom_site.label_entity_id 5
_atom_site.label_seq_id .
_atom_site.pdbx_PDB_ins_code .
_atom_site.Cartn_x 0.394
_atom_site.Cartn_y -6.965
_atom_site.Cartn_z 18.775
_atom_site.occupancy 1
_atom_site.B_iso_or_equiv 2
_atom_site.pdbx_formal_charge ?
_atom_site.auth_atom_id CA
_atom_site.auth_comp_id CA
_atom_site.auth_asym_id A
_atom_site.auth_seq_id 701
_atom_site.pdbx_PDB_model_num 1
This does not include a loop_ statement and instead lists the items
directly as tag-value pairs. If I try to convert this to pdb, cif_as_pdb
fails like so (and I suspect that pretty much any cif-file read would fail):
Python argument types in
double.__init__(double, str)
did not match C++ signature:
__init__(boost::python::api::object, boost::python::numeric::array)
__init__(boost::python::api::object, boost::python::tuple)
__init__(boost::python::api::object, boost::python::list)
__init__(boost::python::api::object, std::vector<double, std::allocator<double> >)
__init__(boost::python::api::object, scitbx::af::const_ref<std::string, scitbx::af::trivial_accessor>)
__init__(_object*, scitbx::af::shared_plain<double>)
__init__(_object*, unsigned long)
__init__(_object*, unsigned long, double)
__init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >)
__init__(_object*, scitbx::af::flex_grid<scitbx::af::small<long, 10ul> >, double)
__init__(_object*)
This has nothing to do with how many atoms there are, obviously - in fact,
removing all HETATM records but one from a cif file that has a loop_
statement does not produce this bug.
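Since the looped form with a single record converts cleanly, one possible
stopgap is to rewrite the tag-value atom_site items as a one-row loop_
before conversion. Below is a minimal sketch in plain Python (no phenix
imports). The function name loopify_atom_site is hypothetical, and it
assumes every atom_site item keeps its tag and value on the same line, as
in the example above.

```python
def loopify_atom_site(cif_text, prefix="_atom_site."):
    """Rewrite non-looped `_atom_site.*` tag-value items as a one-row loop_.

    Assumption: each item has its tag and value on a single line.
    Tags inside an existing loop_ header have no value on the line,
    so they are left untouched.
    """
    out = []
    tags, values = [], []
    first = None  # position in `out` where the tag-value items began
    for line in cif_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].startswith(prefix):
            if first is None:
                first = len(out)
            tags.append(parts[0])
            values.append(parts[1].strip())  # keep any quoting verbatim
        else:
            out.append(line)
    if tags:
        # loop_ keyword, then one tag per line, then a single data row
        out[first:first] = ["loop_"] + tags + [" ".join(values)]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "data_example\n"
        "#\n"
        "_atom_site.group_PDB HETATM\n"
        "_atom_site.id 4905\n"
        "_atom_site.Cartn_x 0.394\n"
        "#\n"
    )
    print(loopify_atom_site(example), end="")
```

The rewritten text can be saved to a new file and passed to cif_as_pdb in
place of the original. I have not checked this against every file, so
treat it as a workaround sketch rather than a fix.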
The obvious guess is that the bug is somewhere in mmdb (I am assuming here
that phenix relies on it for cif read/write).
This is not a big deal, of course - in most scenarios, a structure contains
more than one atom. It does not interfere with my work either - I am
discarding single-atom cases in this analysis anyway (so it actually serves
as an unintentional filter).
Still, it looks like a bug. I am not an expert in the mmCIF format, but
iiuc, there is no strict requirement for the atom_site category to be in a
loop_. If there is (I can't find any such syntax requirement), then the
source of the data files (PDBe coordinate server) would be at fault for not
structuring the data properly.
Cheers,
Ed.