[cctbxbb] Scons for python3 released

Nicholas Devenish ndevenish at gmail.com
Tue Oct 17 03:56:19 PDT 2017


Hi All,

I spent a little bit of time looking at python3/libtbx so have some input
on this.

On Tue, Oct 10, 2017 at 6:16 PM, Billy Poon <bkpoon at lbl.gov> wrote:
>
> 1) Use Python 2 to build Python 2 version of CCTBX (no work)
>

This might not be as simple as "No Work" - cctbx is a few years behind on
SCons versions (libtbx.scons --version suggests 2.2.0, from 2012) so there
*might* be other issues upgrading the SCons version to 3.0, before trying
python3.

I also feel that SCons-Python3 is something of a red herring - the only
thing that non-python3-SCons prevents is a 100% python3-only codebase, and
unless the plan is to migrate the entire codebase, including all downstream
dependencies (like dials), to python3-only in one massive step (probably
impossible), everything would need to be dual 2/3 first, and only then would a
decision be taken on deprecating 2.7 support.

More usefully, outside of a small core of libtbx code, few of the
buildsystem files are bound to the main project, so this shouldn't be too
difficult. In fact, I've experimented with converting to CMake, and as one
of the approaches I explored, I wrote a SCons-emulator that read and parsed
the build *without* any scons/cctbx dependencies. To parse the entire
"builder=dials" SCons-tree only required this subset of libtbx:
https://github.com/ndevenish/read_scons/blob/master/tbx2cmake/import_env.py#L202-L235

(Note: my general CMake work works but isn't complete/ready/documented for
general viewing, and still much resembles a hacky project, but I thought
that the fact that this subset was sufficient to decouple the buildsystem is
a useful illustration of how simple the task might be)
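
To show why the decoupling is tractable, here is a minimal sketch of the general idea (not the tbx2cmake code linked above): SConscript files are plain Python, so they can be exec'd against stub objects that record targets instead of building them. It assumes the scripts only call `Import`, `Export`, `Clone`, `Append`, `SharedLibrary` and `Program`; real cctbx SConscripts use more of the SCons API, and `RecordingEnvironment` and `read_sconscript` are hypothetical names.

```python
import copy


def _as_list(value):
    # SCons accepts either a single string or a list in most places
    return [value] if isinstance(value, str) else list(value)


class RecordingEnvironment(object):
    """Hypothetical stand-in for an SCons Environment: records targets, builds nothing."""

    def __init__(self, targets=None, **variables):
        self.variables = variables
        # Clones share one target list so every target ends up in one place
        self.targets = targets if targets is not None else []

    def Clone(self, **overrides):
        variables = copy.deepcopy(self.variables)  # clones must not alias lists
        variables.update(overrides)
        return RecordingEnvironment(targets=self.targets, **variables)

    def Append(self, **variables):
        for key, values in variables.items():
            current = self.variables.get(key, [])
            self.variables[key] = _as_list(current) + _as_list(values)

    def SharedLibrary(self, target, source, **kwargs):
        self.targets.append(("shared_library", target, _as_list(source)))

    def Program(self, target, source, **kwargs):
        self.targets.append(("program", target, _as_list(source)))


def read_sconscript(text, exports):
    """Execute SConscript text against the stubs and return its final namespace."""
    namespace = {}

    def Import(*args):
        # Several names may share one string, e.g. Import("env env_etc")
        for arg in args:
            for name in arg.split():
                namespace[name] = exports[name]

    def Export(*args):
        for arg in args:
            for name in arg.split():
                exports[name] = namespace[name]

    namespace.update(Import=Import, Export=Export)
    exec(compile(text, "<SConscript>", "exec"), namespace)
    return namespace
```

After the run, the shared `targets` list holds every library and program the tree declares, which is the information another generator (e.g. CMake) would need.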

Regarding general Python3 conversion, it's definitely not "Just changing
the print statements". I undertook a study in August to convert libtbx
(being the core that *everything* depends on) to dual python2/3 and IIRC
got most of the tests working in python3. It's a couple of months
out-of-date, but is probably useful as a benchmark of the effort required.
The repository links are:

    https://github.com/ndevenish/cctbx_project/tree/py3k-modernize
    https://github.com/ndevenish/cctbx_project/tree/py3k

Probably best looked at with a graphical viewer to get a top-down view of
the history. My approach was to separate manual/automatic changes as
follows:

1. Remove legacy code/modules - e.g. old compatibility code. The Optik removal
came from this. We don't want to spend mental effort converting absorbed
external libraries from a decade ago (see also e.g. pexpect,
subprocess_with_fixes)
2. Make some manual fixes [Expanded as we go on]
3. Use futurize and modernize to update idioms ONLY e.g. remove pre-2.7
deprecated ways of working. Each operation was done in a separate commit
(so that changes are more visible and I thought people would have less
objection than to a massive code-change dump), and each commit ran the test
suite for libtbx. Some of the 'fixers' in each tool are complementary. If
there are any problems with tests or automatic conversion, then fix the
problem, put the fix into step 2, then start again. This step should be
entirely scriptable. I had 17 commits for separate fixes in this chain.
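
The commit-per-fixer loop in step 3 could be scripted roughly as follows. This is a sketch under assumptions: it drives `python-modernize` (or `futurize`) through the `-f`, `-w` and `-n` options both inherit from lib2to3's 2to3 interface, the fixer names and test command are whatever you pass in, and `run_fixer_chain` and its `run` hook are hypothetical (the hook only exists so the loop can be exercised without a real checkout).

```python
import subprocess


def run_fixer_chain(fixers, paths, test_cmd, tool="python-modernize",
                    run=subprocess.call):
    """Apply one fixer per commit; stop at the first fixer that breaks the tests.

    Returns (applied, failed): the fixers committed so far, and the fixer
    that failed (or None if the whole chain went through).
    """
    applied = []
    for fixer in fixers:
        # -f: only this fixer, -w: write changes back, -n: no .bak backup files
        if run([tool, "-f", fixer, "-w", "-n"] + list(paths)) != 0:
            return applied, fixer
        if run(list(test_cmd)) != 0:
            # Throw away this fixer's edits; fix the problem by hand (step 2),
            # then restart the chain
            run(["git", "checkout", "--"] + list(paths))
            return applied, fixer
        # git exits non-zero if the fixer changed nothing; harmless here
        run(["git", "commit", "-a", "-m", "Apply %s" % fixer])
        applied.append(fixer)
    return applied, None
```

A call might look like `run_fixer_chain(fixers, ["libtbx"], ["libtbx.run_tests_parallel", "module=libtbx"])`, with `fixers` taken from the tool's own fixer list; treat the test command as a placeholder for whatever runs the libtbx suite in your build.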

This is where the py3k-modernize branch stops, and it should in principle
be kept entirely safe to push back onto the python2-only repository. The
next steps form the `py3k` branch, which, not being intended for direct
pushing, is a little less organised (some of my changes could definitely be
moved to step 2):

4. Run 'modernize' to make as much of the codebase dual python2/3 as
possible. This introduces the dependency on 'six'
5. Run tests, implement various fixes, repeat. This work was ongoing when I
stopped working on the study.

Various (non-exhaustive) problems found:
- cStringIO isn't handled automatically, so these need to be fixed manually
( e.g.
https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a
)
- Iterators needed to be fixed in cases where they were missed (next vs
__next__)
- Rounding. Python3's round() uses 'Bankers Rounding' (ties go to the even
neighbour, where Python2 rounded them away from zero) and there are
formatting tests where this changes the output. I didn't know enough about
the exact desired result to know the best way to fix this
- libtbx uses compiler.misc.mangle and I don't know why - this was always a
private interface and was removed in 3.
- Moving print statements to functions - there were several failed tests
relating to the old python2 print soft-spacing behaviour, which was
removed. Not too difficult, but definitely causes
- A couple of text/binary mode file issues, which seemed to be simple but
may be more complicated than the test cases covered. I'd expect more issues
with this in the format readers though.
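
A few of the manual fixes above, sketched as dual 2/3 Python. These are illustrative helpers with hypothetical names (`Countdown`, `round_half_away`, `mangle`), not the changes on the branches linked above. The rounding helper assumes the desired result is the old Python 2 `round()` behaviour, and `mangle` reimplements the standard private-name mangling rule rather than porting `compiler.misc.mangle` line by line, so edge cases such as very long names may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

# cStringIO is gone in Python 3; fall back to io.StringIO (text-only there)
try:
    from cStringIO import StringIO  # Python 2: fast C implementation
except ImportError:
    from io import StringIO  # Python 3: io.StringIO is already C-accelerated


class Countdown(object):
    """Iterator for both versions: Python 3 calls __next__, Python 2 calls next."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # Python 2 spelling of the same method


def round_half_away(x, ndigits=0):
    """Python 2 style round(): ties go away from zero.

    Decimal(x) uses the exact binary value of the float, so e.g. 2.675
    (stored as 2.67499999...) still rounds down, as it does in Python 2.
    """
    quantum = Decimal(1).scaleb(-ndigits)
    return float(Decimal(x).quantize(quantum, rounding=ROUND_HALF_UP))


def mangle(name, klass):
    """Private-name mangling: __attr inside class Klass becomes _Klass__attr."""
    if not name.startswith("__") or name.endswith("__"):
        return name
    stripped = klass.lstrip("_")
    if not stripped:
        return name  # class names made only of underscores are not mangled
    return "_%s%s" % (stripped, name)
```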

I evaluated both the futurize (using future library) and modernize (using
the well known six library) tools, both being different approaches to 2to3,
but for dual 2/3 codebases. I liked the approach of futurize to attempt to
make code look as python3-idiomatic as possible, but some of the
performance implications were slightly opaque: e.g. libtbx makes heavy use
of cStringIO (presumably for a good reason), and futurize converted all of
these back to using StringIO in the Python2 case. So I settled on modernize,
as I felt two different compatibility libraries would be messy. In either
case, using the library means that you can identify exactly every place that
needs to be removed when moving to python3 only.
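
Because every shim goes through `six`, the eventual python3-only cleanup can start with a text search. A minimal sketch follows; `find_six_usages` and `scan_tree` are hypothetical helpers, and the regex is an assumption that catches the common import forms and `six.<name>` attribute access, will also flag `six.` inside comments and strings, and misses imports such as `import os, six`.

```python
import io
import os
import re

# "import six", "from six[.submodule] import ...", or any "six.<name>" access
SIX_PATTERN = re.compile(r"^\s*(?:import six\b|from six(?:\.\w+)* import\b)|\bsix\.\w+")


def find_six_usages(text):
    """Return the 1-based line numbers in text that mention six."""
    return [number for number, line in enumerate(text.splitlines(), 1)
            if SIX_PATTERN.search(line)]


def scan_tree(root):
    """Map each .py file under root to the lines where it uses six."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with io.open(path, encoding="utf-8", errors="replace") as handle:
                lines = find_six_usages(handle.read())
            if lines:
                hits[path] = lines
    return hits
```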


My conclusions:
- Automatic tools are useful for the bulk of changes, but there are still
lots of edge cases
- The complexity means that a phased approach is *absolutely* necessary -
starting by converting the core to 2/3 and only moving to 3 once everything
downstream is converted. Trying to convert everything at once would likely
mean months of feature-freeze.
- A separate "Remove legacy" cleaning phase might be very useful, though
obviously the scope of this could be endless
- SCons is probably the least important of the conversion worries


Nick