[cctbxbb] Space-saving tips

Tristan Croll tic20 at cam.ac.uk
Mon Jan 25 06:36:55 PST 2021


Hi all,

Following on from Graeme's email about .o files: another way to dramatically reduce the size of the distribution, with little impact on code complexity or performance, would be to keep the various text-based data files in .zip format. With modern Python, fetching data from a zip archive is little different from working with uncompressed files (1-2 extra lines of code, and generally negligible runtime cost). Here are a couple of snippets from ISOLDE, where I keep all the MD ligand definition files in a .zip:

To get a list of the archive's contents (here, the names of the .xml files, without the extension):

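    def _ligand_db_from_zip(self, ligand_zip):
        '''Return the base names of all .xml entries in the given zip file.'''
        from zipfile import ZipFile
        import os
        namelist = []
        with ZipFile(ligand_zip) as zf:
            for fname in zf.namelist():
                # Keep only the XML definitions, stripped of their extension
                name, ext = os.path.splitext(fname)
                if ext.lower() == '.xml':
                    namelist.append(name)
        return namelist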

To read a given file from the zip as needed:

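        if ligand_db is not None:
            # ligand_db is a (zip file, name list) pair; avoid shadowing the zip() builtin
            zip_path, namelist = ligand_db
            if name in namelist:
                from zipfile import ZipFile
                logger.info('Loading residue template for {} from internal database'.format(name))
                with ZipFile(zip_path) as zf:
                    # zf.open() returns a binary file-like object for the archive member
                    with zf.open(name+'.xml') as xf:
                        forcefield.loadFile(xf)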
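
For code that expects decoded text rather than a binary file object, a minimal self-contained sketch of the same idea might look like the following. The function name, archive name and member path are hypothetical and only there for illustration; nothing here is taken from ISOLDE or cctbx.

```python
import io
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED


def read_text_member(zip_path, member, encoding="utf-8"):
    """Return the decoded text of one member of a zip archive."""
    with ZipFile(zip_path) as zf:
        # ZipFile.open() yields a binary file object; wrap it to decode text.
        with zf.open(member) as raw:
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                return text.read()


def demo():
    # Build a throwaway archive so the example is self-contained.
    tmpdir = tempfile.mkdtemp()
    zip_path = os.path.join(tmpdir, "example_data.zip")  # hypothetical name
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as zf:
        zf.writestr("contours/example.txt", "1.0 2.0\n3.0 4.0\n")
    return read_text_member(zip_path, "contours/example.txt")
```

Compared with a plain open() on an uncompressed file, the only additions are the ZipFile context manager and the text wrapper.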

Even just doing this for the CaBLAM contour data cuts about 160 MB off the size of the uncompressed distribution.
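
As a rough sketch of how such an archive could be produced, and how the saving could be checked, something like the following should work. The function name, the extension filter and the demo file are all assumptions for illustration; this is not the way the CaBLAM data is currently packaged.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED


def zip_text_files(src_dir, zip_path, extensions=(".txt", ".xml")):
    """Pack matching files under src_dir into zip_path.

    Returns (uncompressed_bytes, archive_bytes) so the saving can be checked.
    """
    raw_bytes = 0
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for fname in sorted(files):
                if os.path.splitext(fname)[1].lower() not in extensions:
                    continue
                full = os.path.join(root, fname)
                raw_bytes += os.path.getsize(full)
                # Store paths relative to src_dir so the archive is relocatable.
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return raw_bytes, os.path.getsize(zip_path)


def demo():
    # Write a highly repetitive text file, as a stand-in for gridded data.
    src = tempfile.mkdtemp()
    out = os.path.join(tempfile.mkdtemp(), "data.zip")
    with open(os.path.join(src, "grid.txt"), "w", newline="\n") as f:
        f.write("0.000 0.000 0.000\n" * 5000)
    return zip_text_files(src, out)
```

Calling demo() returns the total size of the input files and the size of the resulting archive, which gives a quick way to estimate the saving for any given data directory.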

Best regards,

Tristan



