[cctbxbb] [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py

markus.gerstel at diamond.ac.uk
Fri Sep 9 01:59:54 PDT 2016

Hi Billy,

Thanks for your mail.
As usual, it’s $insertYear and unicode is still not a solved problem :(

I ran into UnicodeEncode/DecodeErrors, but I am now happy that your change only exposed underlying issues in my code (outside of the cctbx/dials/xia2 repositories). I have sprinkled some forced UTF-8 encoding on top, and everything appears to be working fine now.
As for the changed output: it includes, for example, the default wget output, which puts the name of the file it writes to disk in ``quotes’’, and those quote characters follow the LC_ALL encoding. Fortunately we don’t really care about fancy formatting, so this is not a real problem.
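A minimal sketch of one way to force UTF-8 when calling external tools from Python, independent of the LC_ALL that the dispatcher exports. The helper name `run_external` and its parameters are hypothetical (not part of libtbx). It assumes Python 3.5+ for `subprocess.run`, copies the current environment, optionally overrides LC_ALL for the child only, and decodes the captured bytes as UTF-8 explicitly:

```python
import os
import subprocess
import sys


def run_external(cmd, lc_all=None, extra_env=None):
    """Run `cmd` and return its stdout decoded as UTF-8 text.

    Hypothetical helper: the child gets a copy of our environment,
    optionally with LC_ALL overridden, and the captured bytes are
    decoded explicitly instead of with the current locale's codec.
    """
    env = dict(os.environ)
    if lc_all is not None:
        env["LC_ALL"] = lc_all            # affects the child process only
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    # errors="replace" avoids a UnicodeDecodeError on stray non-UTF-8 bytes
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # A child Python prints a non-ASCII word; ascii() keeps our own output ASCII.
    text = run_external([sys.executable, "-c", "print('caf\\u00e9')"],
                        extra_env={"PYTHONIOENCODING": "utf-8"})
    print(ascii(text.strip()))
```

Passing `lc_all="C"` pins the child's locale, assuming the tool honours LC_ALL, so its formatting no longer depends on what the dispatcher exported.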


Dr Markus Gerstel MBCS
Postdoctoral Research Associate
Tel: +44 1235 778698

Diamond Light Source Ltd.
Diamond House
Harwell Science & Innovation Campus
OX11 0DE

From: cctbxbb-bounces at phenix-online.org [mailto:cctbxbb-bounces at phenix-online.org] On Behalf Of Billy Poon
Sent: 08 September 2016 19:45
To: cctbx mailing list
Cc: bkpoon at users.sourceforge.net
Subject: Re: [cctbxbb] [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py

Hi Markus,

There is an issue with non-ASCII paths (unicode type) and basic Python functions (e.g. the os.path functions) if the locale (like 'C') does not support UTF-8. Without UTF-8 support, these functions try to convert the unicode type into a str type with the 'ascii' encoding, which triggers a UnicodeEncodeError. I attached a script that tests it. The unicode path should fail for libtbx.python before my change and pass after my change. Alternatively, change the LC_ALL setting in the build/bin/libtbx.python dispatcher (if the en_US locale is available, en_US will fail and en_US.UTF-8 will work).
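To make the failure mode concrete: Python 2 converted unicode paths to byte strings using the filesystem encoding derived from the locale, which is ASCII under LC_ALL=C. The sketch below is written in Python 3, is not the attached test script, and uses a hypothetical `can_encode` helper and example path; it reproduces that conversion explicitly with `str.encode`. Note that Python 3.7+ largely sidesteps the C-locale case through PEP 538 (C locale coercion) and PEP 540 (UTF-8 mode), so `sys.getfilesystemencoding()` will usually report utf-8 there.

```python
import sys


def can_encode(path, encoding):
    """Return True if `path` survives the text -> bytes conversion that the
    OS layer needs before a system call, using `encoding`."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


path = "/data/caf\u00e9/image_0001.cbf"   # hypothetical non-ASCII path
# Roughly what Python 2 did under LC_ALL=C (ASCII filesystem encoding):
print("ascii:", can_encode(path, "ascii"))   # False
# What it does once the locale advertises UTF-8:
print("utf-8:", can_encode(path, "utf-8"))   # True
print("this interpreter uses:", sys.getfilesystemencoding())
```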

An additional wrinkle is that LC_ALL=C works fine on my Mac (OS X 10.10.5). Also, there is a "C.UTF-8" locale on Ubuntu, but not on CentOS.
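Because the set of installed locales differs between distributions, a dispatcher generator could probe candidates instead of hard-coding one. A minimal sketch, assuming that what `setlocale()` accepts in the build process is representative of what child processes on the same machine will see. `first_available_locale` is a hypothetical helper; it probes only LC_CTYPE (the category that controls character encoding) and restores the previous value afterwards:

```python
import locale


def first_available_locale(candidates):
    """Return the first name in `candidates` that setlocale() accepts, or None.

    Only LC_CTYPE is probed, and the previous setting is restored, so the
    process locale is left unchanged.
    """
    saved = locale.setlocale(locale.LC_CTYPE)      # query current value
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue                            # not installed here
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)    # undo the probe


if __name__ == "__main__":
    # C.UTF-8 exists on Ubuntu but not on CentOS, so order matters here.
    print(first_available_locale(["C.UTF-8", "en_US.UTF-8", "C"]))
```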

Basically, to support non-ASCII paths (unicode type) in basic Python functions, any locale with UTF-8 or utf8 will work. The en_US part is not that important.

What are the errors that you get? I ran the regression tests for dials (libtbx.run_tests_parallel module=dials) and dials_regression (module=dials_regression) and everything passes except for one test in dials_regression (dials_regression/test.py). But the error seems to be about a goniometer object. Do you have the en_US locale installed?

Right now, I'm just checking if LC_ALL is set in the user environment and using that if it has the extra UTF-8 part. I can also check the LANG environment variable. That might work better for users that do not have the en_US locale installed.
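The LANG fallback could look roughly like this. It is a hypothetical extension of the committed logic; `pick_utf8_locale` and `looks_utf8` are illustrative names, not libtbx functions. One design choice differs from r25333: the UTF-8 test is case-insensitive and ignores hyphens, so names such as `en_US.utf-8` are also accepted, whereas the committed substring check only matches `UTF-8` or `utf8`.

```python
import os


def looks_utf8(name):
    """True if a locale name advertises UTF-8 ('UTF-8', 'utf8', 'utf-8', ...)."""
    normalized = name.lower().replace("-", "")
    return "utf8" in normalized


def pick_utf8_locale(environ, default="en_US.UTF-8"):
    """Hypothetical extension of the r25333 logic: try LC_ALL, then LANG.

    The first variable that is set and names a UTF-8 locale wins;
    otherwise fall back to `default`, as the committed code does.
    """
    for var in ("LC_ALL", "LANG"):
        value = environ.get(var)
        if value and looks_utf8(value):
            return value
    return default


if __name__ == "__main__":
    print(pick_utf8_locale(os.environ))
```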

Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909
Web: https://phenix-online.org

On Thu, Sep 8, 2016 at 2:26 AM, <markus.gerstel at diamond.ac.uk> wrote:

I just spent some time tracking software crashes to this change. Is setting the default to en_US really appropriate and what we want?
In particular it affects the output of downstream, external software we run from within python.

What is the unicode issue you hint at in the commit message?



-----Original Message-----
From: bkpoon at users.sourceforge.net [mailto:bkpoon at users.sourceforge.net]
Sent: 07 September 2016 00:54
To: cctbx-cvs at lists.sourceforge.net
Subject: [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py

Revision: 25333
Author:   bkpoon
Date:     2016-09-06 23:54:29 +0000 (Tue, 06 Sep 2016)
Log Message:
Unicode support: set LC_ALL in dispatchers to the one in the user's environment (if available, and supports UTF-8), otherwise use the default setting of en_US.UTF-8; fixes unicode issue with python in Linux (e.g. os.path functions do not work correctly with unicode if LC_ALL=C)

Modified Paths:

Modified: trunk/libtbx/env_config.py
--- trunk/libtbx/env_config.py  2016-09-06 21:15:34 UTC (rev 25332)
+++ trunk/libtbx/env_config.py  2016-09-06 23:54:29 UTC (rev 25333)
@@ -945,6 +945,15 @@

   def write_bin_sh_dispatcher(self,
         source_file, target_file, source_is_python_exe=False):
+    # determine LC_ALL from environment (Python UTF-8 compatibility in Linux)
+    LC_ALL = os.environ.get('LC_ALL')     # user setting
+    if (LC_ALL is not None):
+      if ( ('UTF-8' not in LC_ALL) and ('utf8' not in LC_ALL) ):
+        LC_ALL = None
+    if (LC_ALL is None):
+      LC_ALL = 'en_US.UTF-8'              # default
     f = target_file.open("w")
     if (source_file is not None):
       print >> f, '#! /bin/sh'
@@ -975,7 +984,7 @@
     print >> f, '#'
     print >> f, _SHELLREALPATH_CODE
     print >> f, 'unset PYTHONHOME'
-    print >> f, 'LC_ALL=C'
+    print >> f, 'LC_ALL=' + LC_ALL
     print >> f, 'export LC_ALL'
     print >> f, 'LIBTBX_BUILD="$(shellrealpath "$0" && cd "$(dirname "$RESULT")/.." && pwd)"'
     print >> f, 'export LIBTBX_BUILD'
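For reference, the same LC_ALL selection ported to Python 3 and pulled out into standalone functions. The names `resolve_lc_all` and `write_locale_lines` are hypothetical; in libtbx the logic lives inside `write_bin_sh_dispatcher`. The one deliberate change is wrapping the value in `shlex.quote()`, which is not in the original diff, so that an unusual LC_ALL value cannot break the generated /bin/sh script:

```python
import io
import os
import shlex


def resolve_lc_all(environ):
    """Same rule as r25333: keep LC_ALL only if it mentions UTF-8/utf8."""
    lc_all = environ.get("LC_ALL")                  # user setting
    if lc_all is not None and "UTF-8" not in lc_all and "utf8" not in lc_all:
        lc_all = None
    if lc_all is None:
        lc_all = "en_US.UTF-8"                      # default
    return lc_all


def write_locale_lines(f, environ):
    """Emit the LC_ALL lines of a /bin/sh dispatcher (Python 3 syntax).

    shlex.quote() is an addition not in the original diff.
    """
    print("LC_ALL=" + shlex.quote(resolve_lc_all(environ)), file=f)
    print("export LC_ALL", file=f)


if __name__ == "__main__":
    buf = io.StringIO()
    write_locale_lines(buf, os.environ)
    print(buf.getvalue(), end="")
```

With `LC_ALL=C` in the environment this emits `LC_ALL=en_US.UTF-8` followed by `export LC_ALL`. Note that `en_US.utf-8` is also replaced by the default, because the check is case-sensitive.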


Cctbx-cvs mailing list
Cctbx-cvs at lists.sourceforge.net


cctbxbb mailing list
cctbxbb at phenix-online.org

