[cctbxbb] some thoughts on cctbx and pip

Hi Tristan,

cctbx could be built to use your ChimeraX python, now that cctbx is moving to Python 3. The bootstrap script has the --with-python option for that. The specific environment setup boils down to setting two environment variables: LIBTBX_BUILD and either LD_LIBRARY_PATH on Linux, PATH on Win32, or DYLD_LIBRARY_PATH on macOS. If you work within a framework such as ChimeraX, it should not be difficult to ensure those two variables are set.
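
As a rough illustration of that setup, here is a minimal Python sketch that builds an environment with the two variables set, for example before launching a subprocess that imports cctbx. The function names and the assumption that the shared libraries live in a "lib" directory under the build directory are mine, for illustration only. On macOS the variable read by the dynamic loader is DYLD_LIBRARY_PATH.

```python
import os
import sys


def library_path_variable(platform=sys.platform):
    """Return the name of the dynamic-library search path variable."""
    if platform.startswith("win"):
        return "PATH"
    if platform == "darwin":
        return "DYLD_LIBRARY_PATH"
    return "LD_LIBRARY_PATH"


def cctbx_environment(build_dir, base_env=None, platform=sys.platform):
    """Return a copy of the environment with the two variables set.

    build_dir is the cctbx build directory. Placing the shared
    libraries in build_dir/lib is an assumption of this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LIBTBX_BUILD"] = build_dir

    var = library_path_variable(platform)
    lib_dir = os.path.join(build_dir, "lib")
    sep = ";" if platform.startswith("win") else ":"
    existing = env.get(var)
    # Prepend so the cctbx libraries win over any already on the path.
    env[var] = lib_dir + sep + existing if existing else lib_dir
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run. Setting the library path inside an already running interpreter is generally too late on Linux, since the loader reads it at process start.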

> To add my two cents on this: probably the second-most common question I've had about ISOLDE's implementation is, "why didn't you use CCTBX?". The honest answer to that is, "I didn't know how."
> Still don't, really - although the current developments are rather promising. The problem I've faced is that CCTBX was designed as its own self-contained Python (2.7, until very recently) environment, with its own interpreter and a lot of very specific environment setup. Meanwhile I'm developing ISOLDE in ChimeraX, which is *also* its own self-contained Python (3.7) environment. To plug one into the other in that form... well, I don't think I'm a good enough programmer to really know where to start.
> The move to Conda and a more modular CCTBX architecture should make a lot more possible in that direction. Pip would be even better for me personally (ChimeraX can install directly from the PyPI, but doesn't interact with Conda) - but I understand pretty well the substantial challenge that would amount to (not least being that the PyPI imposes a limit - around 100MB from memory? - on the size of an individual package).
>> Hi Graeme,
>> Yes, I know. But "black" is a program doing a very particular task
>> (code formatting, off the top of my head). Requiring the use of a wrapper
>> for python itself is another level. But ok, I think I am mellowing to
>> the idea after all! Talking with people around me, and extrapolating,
>> I would bet that, right now, a great majority of people interested in
>> cctbx in pip have already used the cctbx, so they know about the
>> Python wrapper, and they would not be too sanguine about that. My
>> concern is for the future, when pip will be the first time some people
>> use cctbx. Big fat warning notices on the PyPI page and a better error
>> message when cctbx fails because LIBTBX_BUILD is not set would be
>> needed but that could be all right.
>> If we do a pip installer, we should aim at a minimal install: cctbx,
>> iotbx and their dependencies, and that’s it.
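
The "better error message" idea could look roughly like the following Python sketch. The exception class, the function name and the wording of the messages are hypothetical. This is not how libtbx currently reports the problem, only an illustration of failing early with an actionable message.

```python
import os


class MissingBuildDirectory(RuntimeError):
    """Hypothetical error raised when LIBTBX_BUILD is unusable."""


def require_libtbx_build(environ=None):
    """Return the build directory, or raise with an explanatory message."""
    environ = os.environ if environ is None else environ
    build = environ.get("LIBTBX_BUILD")
    if not build:
        raise MissingBuildDirectory(
            "LIBTBX_BUILD is not set. cctbx needs this environment "
            "variable to point at its build directory. Set it before "
            "starting Python, or use the wrapper script installed "
            "with cctbx."
        )
    if not os.path.isdir(build):
        raise MissingBuildDirectory(
            "LIBTBX_BUILD is set to %r, which is not a directory." % build
        )
    return build
```

Calling such a check at import time would replace an obscure failure deep inside the package with a message telling a first-time pip user what to do.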
>>> Without discussing the merits of this or whether we _choose_ to make the move to supporting pip, I am certain it would be _possible_ - many other packages make dispatcher scripts when you pip install them e.g.
>>> Silver-Surfer rescale_f2 :) $ which black; cat $(which black)
>>> /Library/Frameworks/Python.framework/Versions/3.6/bin/black
>>> #!/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
>>> # -*- coding: utf-8 -*-
>>> import re
>>> import sys
>>> from black import main
>>> if __name__ == '__main__':
>>>   sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
>>>   sys.exit(main())
>>> So we _could_ work around the absence of LIBTBX_BUILD etc. in the system. Whether or not we elect to do the work is a different question, and it seems clear that there are very mixed opinions on this.
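
To make the dispatcher idea concrete, here is a minimal Python sketch of an entry point that works out LIBTBX_BUILD from its own location and re-executes the interpreter with the variable set. The package layout (a "build" directory next to the module) and all names are assumptions for illustration. Nothing here reflects an existing cctbx installer.

```python
import os
import sys

# Hypothetical layout: the wheel ships the build tree in a "build"
# directory next to this module.
BUILD_SUBDIR = "build"


def dispatcher_environment(module_file, environ):
    """Return (env, changed) with LIBTBX_BUILD pointing at the shipped build."""
    env = dict(environ)
    module_dir = os.path.dirname(os.path.abspath(module_file))
    build = os.path.join(module_dir, BUILD_SUBDIR)
    changed = env.get("LIBTBX_BUILD") != build
    env["LIBTBX_BUILD"] = build
    return env, changed


def main(argv=None):
    """Entry point a console script could call."""
    argv = sys.argv if argv is None else argv
    env, changed = dispatcher_environment(__file__, os.environ)
    if changed:
        # Restart the same interpreter so the variable is in place
        # from process start. The second pass sees changed == False.
        os.execve(sys.executable, [sys.executable] + list(argv), env)
    # Environment is already correct: the real command would run here.
    return 0
```

The library search path could be handled in the same function. The re-execution step is what lets the user keep typing a plain command name, as with the black script above.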
>>> Hi,
>>> Even if we managed to ship the boost dynamic libraries with pip, it would still not be pip-like, as we would still need our python wrappers to set LIBTBX_BUILD and LD_LIBRARY_PATH. Normal pip packages work with the standard python exe. We could get around LD_LIBRARY_PATH by changing the way we compile, using -Wl,-R, which is the runtime equivalent of the build-time -L. That’s a significant change that would need to be tested. But there is no way around setting LIBTBX_BUILD right now. Leaving that to the user is horrible. Perhaps there is a way to hack libtbx/env_config.py so that we can hardwire LIBTBX_BUILD in there when pip installs?
>>> Hi,
>>> I did look into that many years ago, and even toyed with building a pip installer. What stopped me is the exact conclusion you reached too: the user would not have the pip experience he expects. You are right that it is a lot of effort but is it worth it? Considering that remark, I don’t think so. Now, Conda was created specifically to go beyond pip pure-python-only support. Since cctbx has garnered support for Conda, the best avenue imho is to go the extra length to have a package on Anaconda.org<http://anaconda.org/>, and then to advertise it hard to every potential user out there.
>>> Hi, to avoid clouding Dorothee's documentation email thread, which I think is a highly useful enterprise, here are some thoughts about putting cctbx into pip.  Pip doesn't install non-python dependencies well.  I don't think boost is available as a package on pip (at least the package version we use).  wxPython4 isn't portable through pip (https://wiki.wxpython.org/How%20to%20install%20wxPython#Installing_wxPython-Phoenix_using_pip).  MPI libraries are system dependent.  If cctbx were a pure python package, pip would be fine, but cctbx is not.
>>> All that said, we could build a manylinux1 version of cctbx and upload it to PyPI (I'm just learning about this).  For a pip package to be portable (which is a requirement for cctbx), it needs to conform to PEP 513, the manylinux1 standard (https://www.python.org/dev/peps/pep-0513/).  For example, numpy is built according to this standard (see https://pypi.org/project/numpy/#files, where you'll see the manylinux1 wheel).  Note, the manylinux1 standard is based on CentOS 5.11, which we no longer support.
>>> There is also a manylinux2010 standard, which is based on CentOS 6 (https://www.python.org/dev/peps/pep-0571/).  This is likely a more attainable target (note though that by default C++11 is not supported on CentOS 6).
>>> If we built a manylinuxX version of cctbx and uploaded it to PyPI, the user would need all the non-python dependencies.  There's no way to specify these in pip.  For example, cctbx requires boost 1.63 or later.  The user will need to have it in a place their python can find it, or we could package it ourselves and supply it, similar to how the pip h5py package now comes with an hdf5 library, or how the pip numpy package includes an openblas library.  We'd have to do the same for any packages we depend on that aren't on pip using the manylinux standards, such as wxPython4.
>>> Further, we need to think about how dials and other cctbx-based packages interact.  If pip install cctbx is set up, how does pip install dials work, such that any dials shared libraries can find the cctbx libraries?  Can shared libraries from one pip package link against libraries in another pip package?  Would each package need to supply its own boost?  Possibly this is well understood in the pip field, but not by me :)
>>> Finally, there's the option of providing a source pip package.  This would require the full compiler toolchain for any given platform (macOS, linux, windows).  These are likely available for developers, but not for general users.
>>> Anyway, these are some of the obstacles.  Not saying it isn't possible, it's just a lot of effort.
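
For readers unfamiliar with how the manylinux standards show up in practice: the platform tag is the last dash-separated field of a wheel's file name. The following Python sketch reads that tag. The function names and the example file names in the usage note are made up for illustration.

```python
def wheel_platform_tags(filename):
    """Return the platform tags encoded in a wheel file name.

    A wheel name has the form
    name-version[-build]-python-abi-platform.whl
    and the platform field may hold several tags joined by dots.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel file name: %r" % filename)
    return parts[-1].split(".")


def is_manylinux(filename):
    """True if any platform tag of the wheel is a manylinux tag."""
    return any(
        tag.startswith("manylinux") for tag in wheel_platform_tags(filename)
    )
```

For a hypothetical file example_pkg-1.0-cp37-cp37m-manylinux1_x86_64.whl, the first function returns the single tag manylinux1_x86_64. A wheel tagged win_amd64 would not count as manylinux.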
