[cctbxbb] Scons for python3 released

Tue Oct 17 05:50:14 PDT 2017

Hi Nick and others,

That sounds like a great effort. A shame I didn't know about this. I 
have not had time to look in detail into your work but will nevertheless 
summarize my thoughts and work I have been doing lately in an effort to 
move CCTBX to python3.

I am not sure why it would be a waste of time to use SCons3.0 with 
python3 as I think you are suggesting. To me it seems a necessary 
step in creating a codebase that runs both on python2 and python3. Do I 
understand correctly that as long as CCTBX code is changed to comply 
with python3 and remain python2 compliant then such a codebase can be 
used in place of the current python2 only codebase for derived projects 
such as Dials and Phenix? Assuming this is the case I think it is worth 
focusing just on CCTBX only for now.

My own attempt at porting CCTBX to python3 consists of the following 
steps:
  * Replace Scons2 with Scons3
  * Update the subset of Boost sources to version 1.63
  * Run futurize stage1 and stage2 on the CCTBX
  * Build base components (libtiff, hdf5, python3.6 + add-on modules)
  * Run bootstrap.py build with Python3.6 repeatedly and provide mock-up 
fixes to allow the build to continue.
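
For anyone wanting to repeat the futurize step, here is a minimal sketch 
of how the two stages could be driven from Python. The helper name 
futurize_commands and the file name are made up for illustration; 
--stage1, --stage2 and -w (write changes back to the files) are options 
of the futurize command-line tool from the 'future' package. The sketch 
only builds the command lines and does not run them.

```python
def futurize_commands(py_files):
    """Return the stage 1 and stage 2 futurize invocations as argv lists.

    `py_files` is any iterable of paths to python sources (hypothetical input).
    Stage 1 is run before stage 2, so the order of the result matters.
    """
    files = list(py_files)
    return [
        ["futurize", "--stage1", "-w"] + files,
        ["futurize", "--stage2", "-w"] + files,
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for argv in futurize_commands(["example_module.py"]):
        print(" ".join(argv))
```

The argv lists can be handed to subprocess.call once the file list is 
known to be correct.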

This work is near completion in the sense that the sources now 
build, but they are unlikely to pass the tests due to the mock-up fixes, 
which often consist of replacing PyString_XXX functions with the 
equivalent PyUnicode_XXX or PyBytes_XXX functions, ignoring whether that 
is appropriate or not. These token fixes would also need to be guarded 
by #if PY_MAJOR_VERSION == 3 ... macros.
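
The reason a blind replacement is risky is that python2 has one string 
type doing double duty, while python3 separates text (str) from raw 
bytes. The following is only a Python-level sketch of the decision each 
replaced call has to make; to_native_str is a hypothetical helper, not 
something that exists in CCTBX, and the utf-8 default is an assumption.

```python
import sys

PY3 = sys.version_info[0] >= 3


def to_native_str(value, encoding="utf-8"):
    """Return `value` as the interpreter's native `str` type.

    python3: bytes are decoded to text, text is returned unchanged.
    python2: unicode is encoded to a byte string, str is returned unchanged.
    """
    if PY3:
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value
    # This branch is only reached on python2, where `unicode` exists.
    if isinstance(value, unicode):  # noqa: F821
        return value.encode(encoding)
    return value


if __name__ == "__main__":
    print(type(to_native_str(b"spacegroup")))
```

Whether a given call site actually wants text or bytes still has to be 
decided case by case.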

The sources are available on 
https://github.com/cctbx/cctbx_project/tree/Python3

The next steps are less well defined. One approach would be to set up a 
build system that migrates python2 code to python3 using the futurize 
script, then builds CCTBX, runs the tests, and presents build log files 
online as in http://cci-vm-6.lbl.gov:8010/one_line_per_build. With a 
hook to GitHub this could also be done on the fly as people commit code 
to CCTBX. This should encourage people to write code that runs on 
python2 as well as python3. Eventually once all tests for CCTBX pass we 
are done and can merge this codebase into the master branch.
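
As a rough illustration of such a build system, here is a sketch that 
runs the steps in order, stops at the first failure and produces a 
one-line summary per build. The function names, the step names and the 
test command are all hypothetical; only futurize and bootstrap.py build 
come from the steps above. The demo uses a fake runner so that nothing 
is actually executed.

```python
import subprocess


def run_pipeline(steps, runner=subprocess.call):
    """Run (name, argv) steps in order and stop at the first failure.

    `runner` takes an argv list and returns an exit code. Returns a list of
    (name, status) tuples with status "ok", "FAILED" or "skipped".
    """
    results = []
    failed = False
    for name, argv in steps:
        if failed:
            # An earlier step failed, so later steps are not attempted.
            results.append((name, "skipped"))
            continue
        if runner(argv) == 0:
            results.append((name, "ok"))
        else:
            results.append((name, "FAILED"))
            failed = True
    return results


def one_line_summary(results):
    """Format the results as a single line, one entry per step."""
    return " | ".join("%s: %s" % (name, status) for name, status in results)


if __name__ == "__main__":
    steps = [
        ("futurize", ["futurize", "--stage1", "-w", "example_module.py"]),
        ("build", ["python3.6", "bootstrap.py", "build"]),
        ("tests", ["python3.6", "run_tests.py"]),  # hypothetical test command
    ]
    # Fake runner: pretend every step except the build succeeds.
    fake = lambda argv: 1 if "bootstrap.py" in argv else 0
    print(one_line_summary(run_pipeline(steps, runner=fake)))
```

A GitHub hook would then only need to trigger this for each commit and 
publish the summary line together with the log files.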

Robert

On 17/10/2017 11:56, Nicholas Devenish wrote:
> Hi All,
> 
> I spent a little bit of time looking at python3/libtbx so have some
> input on this.
> 
> On Tue, Oct 10, 2017 at 6:16 PM, Billy Poon <bkpoon at lbl.gov> wrote:
> 
>> 1) Use Python 2 to build Python 2 version of CCTBX (no work)
> 
> This might not be as simple as "No Work" - cctbx is a few years behind
> on SCons versions (libtbx.scons --version suggests 2.2.0, from 2012)
> so there *might* be other issues upgrading the SCons version to 3.0,
> before trying python3.
> 
> I also feel that SCons-Python3 is something of a red herring - the
> only thing that non-python3-SCons prevents is a 100% python3-only
> codebase, and unless the plan is to migrate the entire codebase,
> including all downstream dependencies (like dials) to python3-only in
> one massive step (probably impossible), everything would need to be
> dual 2/3 first, and only then a decision taken on deprecating 2.7
> support.
> 
> More usefully, outside of a small core of libtbx code, not much of the
> buildsystem files are bound to the main project so this shouldn't be
> too difficult. In fact, I've experimented with converting to CMake,
> and as one of the approaches I explored, I wrote a SCons-emulator that
> read and parsed the build *without* any scons/cctbx dependencies. To
> parse the entire "builder=dials" SCons-tree only required this subset
> of libtbx:
> https://github.com/ndevenish/read_scons/blob/master/tbx2cmake/import_env.py#L202-L235
> [1]
> 
> (Note: my general CMake-work works but isn't complete/ready/documented
> for general viewing, and still much resembles a hacky project, but I
> thought the fact that this was sufficient to decouple the buildsystem
> is usefully illustrative of how simple the task might be)
> 
> Regarding general Python3 conversion, it's definitely not "Just
> changing the print statements". I undertook a study in August to
> convert libtbx (being the core that *everything* depends on) to dual
> python2/3 and IIRC got most of the tests working in python3. It's a
> couple of months out-of-date, but is probably useful as a benchmark of
> the effort required. The repository links are:
> 
>     https://github.com/ndevenish/cctbx_project/tree/py3k-modernize [2]
> 
>     https://github.com/ndevenish/cctbx_project/tree/py3k [3]
> 
> Probably best looked at with a graphical viewer to get a top-down view
> of the history. My approach was to separate manual/automatic changes
> as follows:
> 
> 1. Remove legacy code/modules - e.g. old compatibility. The Optik
> removal came from this. We don't want to spend mental effort
> converting absorbed external libraries from a decade ago (see also
> e.g. pexpect, subprocess_with_fixes)
> 2. Make some manual fixes [Expanded as we go on]
> 3. Use futurize and modernize to update idioms ONLY e.g. remove
> pre-2.7 deprecated ways of working. Each operation was done in a
> separate commit (so that changes are more visible and I thought people
> would have less objection than to a massive code-change dump), and
> each commit ran the test suite for libtbx. Some of the 'fixers' in
> each tool are complementary. If there are any problems with tests or
> automatic conversion, then fix the problem, put the fix into step 2,
> then start again. This step should be entirely scriptable. I had 17
> commits for separate fixes in this chain.
> 
> This is where the py3k-modernize branch stops, and should in
> principle be kept entirely safe to push back onto the python2-only
> repository. The next steps form the `py3k` branch (not being intended
> for direct pushing, is a little less organised - some of my changes
> could definitely be moved to step 2):
> 
> 4. Run 'modernize' to convert the codebase to as much python2/3 as
> possible. This introduces the dependency on 'six'
> 5. Run tests, implement various fixes, repeat. This work was ongoing
> when I stopped working on the study.
> 
> Various (non-exhaustive) problems found:
> - cStringIO isn't handled automatically, so these need to be fixed
> manually ( e.g.
> https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a
> [4] )
> 
> - Iterators needed to be fixed in cases where they were missed (next
> vs __next__)
> - Rounding. Python3 uses 'Banker's Rounding' and there are formatting
> tests where this changes the output. I didn't know enough about the
> exact desired result to know the best way to fix this
> - libtbx uses compiler.misc.mangle and I don't know why - this was
> always a private interface and was removed in 3.
> 
> - Moving print statements to functions - there were several failed
> tests relating to the old python2-print-soft-spacing behaviour, which
> was removed. Not too difficult, but definitely causes
> - A couple of text/binary mode file issues, which seemed to be simple
> but may be more complicated than the test cases covered. I'd expect
> more issues with this in the format readers though.
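
A few of the problems in this list can be made concrete with a small 
sketch that runs unchanged on python2 and python3. Countdown and 
round_half_up are made-up names for illustration, and using the decimal 
module to get the old round-half-away-from-zero behaviour is only one 
possible approach, not necessarily what the libtbx tests want.

```python
from decimal import Decimal, ROUND_HALF_UP

try:
    from cStringIO import StringIO  # python2: keep the fast C implementation
except ImportError:
    from io import StringIO  # python3: cStringIO no longer exists


class Countdown(object):
    """Iterator that works under both protocols (next and __next__)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # name used by python3
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # name used by python2


def round_half_up(x, ndigits=0):
    """Round ties away from zero, as the python2 round() builtin did.

    The python3 round() builtin rounds ties to the nearest even digit.
    """
    quantum = Decimal("1e%d" % -ndigits)
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    buf = StringIO()
    buf.write("countdown: ")
    buf.write(" ".join(str(i) for i in Countdown(3)))
    print(buf.getvalue())
    print(round_half_up(2.5))
```

The text/binary file mode issues are not covered here, since they depend 
on what each reader expects to find in the file.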
> 
> I evaluated both the futurize (using future library) and modernize
> (using the well known six library) tools, both being different
> approaches to 2to3, but for dual 2/3 codebases. I liked the approach
> of futurize to attempt to make code look as python3-idiomatic
> as-possible, but some of the performance implications were slightly
> opaque: e.g. libtbx makes heavy use of cStringIO (presumably for a
> good reason), and futurize converted all of these back to using
> StringIO in the Python2 case, so I settled on modernize as I felt two
> different compatibility libraries would be messy. In either case,
> using the library means that you can identify exactly everywhere that
> needs to be removed when moving to python3 only.
> 
> My conclusions:
> - Automatic tools are useful for the bulk of changes, but there are
> still lots of edge cases
> - The complexity means that a phased approach is *absolutely*
> necessary - starting by converting the core to 2/3 and only moving to
> 3 once everything downstream is converted. Trying to convert everything
> at once would likely mean months of feature-freeze.
> - A separate "Remove legacy" cleaning phase might be very useful,
> though obviously the domain of this could be endless
> - SCons is probably the least important of the conversion worries
> 
> Nick
> 
> Links:
> ------
> [1]
> https://github.com/ndevenish/read_scons/blob/master/tbx2cmake/import_env.py#L202-L235
> [2] https://github.com/ndevenish/cctbx_project/tree/py3k-modernize
> [3] https://github.com/ndevenish/cctbx_project/tree/py3k
> [4]
> https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a