[cctbxbb] [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py

Billy Poon bkpoon at lbl.gov
Thu Sep 8 11:45:01 PDT 2016


Hi Markus,

There is an issue with non-ASCII paths (unicode type) and basic Python
functions if the locale (like 'C') does not support UTF-8. Without UTF-8
support, these functions try to convert the unicode type into a str type
with the 'ascii' encoding, which triggers a UnicodeEncodeError. I attached
a script that tests it. The unicode path should fail with libtbx.python
before my change and pass after it. Alternatively, you can change the
LC_ALL setting in the build/bin/libtbx.python dispatcher by hand (if the
en_US locale is available, en_US will fail and en_US.UTF-8 will work).
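
To make the failure mode concrete, here is a minimal sketch (this is not the
attached unicode_check.py; the path and the helper name encodes_with are made
up for illustration). Python 2 performs the unicode-to-str conversion
implicitly inside functions such as those in os.path; the sketch does the same
conversion explicitly so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical non-ASCII path, used only for illustration.
path = u"/tmp/d\u00e9j\u00e0_vu/data.mtz"


def encodes_with(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # This is the error that surfaces when the locale implies 'ascii'.
        return False
    return True


print(encodes_with(path, "ascii"))  # False: 'ascii' cannot represent the accents
print(encodes_with(path, "utf-8"))  # True: UTF-8 can represent any unicode path
```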

An additional wrinkle is that LC_ALL=C works fine on my Mac (OS X 10.10.5).
Also, there is a "C.UTF-8" locale on Ubuntu, but not on CentOS.
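
Since the set of installed locales differs between systems, one way to check
a given machine is to ask the C library directly. The following is a sketch
only; locale_is_available is a hypothetical helper, and it probes LC_CTYPE
(the category that governs character encoding) rather than LC_ALL. The
results depend entirely on the system it runs on.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Always restore the original setting, whether or not the probe worked.
        locale.setlocale(locale.LC_CTYPE, previous)
    return True


for candidate in ("C", "C.UTF-8", "en_US.UTF-8", "en_US"):
    print("%s: %s" % (candidate, locale_is_available(candidate)))
```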

Basically, to support non-ASCII paths (unicode type) in basic Python
functions, any locale with UTF-8 or utf8 will work. The en_US part is not
that important.

What are the errors that you get? I ran the regression tests for dials
(libtbx.run_tests_parallel module=dials) and dials_regression
(module=dials_regression) and everything passes except for one test in
dials_regression (dials_regression/test.py), and that error seems to be
about a goniometer object. Do you have the en_US locale installed?

Right now, I'm just checking whether LC_ALL is set in the user environment
and using it if it contains the UTF-8 (or utf8) part. I can also check the
LANG environment variable. That might work better for users who do not have
the en_US locale installed.
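
A rough sketch of what that fallback could look like, assuming the same
substring test as in rev 25333. The names has_utf8 and pick_lc_all are
hypothetical, and the LANG check is only the idea floated above, not
something that is in env_config.py.

```python
import os

DEFAULT_LOCALE = "en_US.UTF-8"  # the default used in rev 25333


def has_utf8(value):
    """Same substring test as the commit: 'UTF-8' or 'utf8'."""
    return value is not None and ("UTF-8" in value or "utf8" in value)


def pick_lc_all(environ=None):
    """Hypothetical helper: prefer LC_ALL, then LANG, then the default."""
    if environ is None:
        environ = os.environ
    for variable in ("LC_ALL", "LANG"):
        value = environ.get(variable)
        if has_utf8(value):
            return value
    return DEFAULT_LOCALE


# LC_ALL=C is rejected, so the UTF-8 value in LANG is used instead.
print(pick_lc_all({"LC_ALL": "C", "LANG": "de_DE.utf8"}))  # de_DE.utf8
```

With this ordering, a user who has only LANG=de_DE.utf8 set would get that
locale in the dispatcher instead of en_US.UTF-8, which may not be installed.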

--
Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909
Web: https://phenix-online.org

On Thu, Sep 8, 2016 at 2:26 AM, <markus.gerstel at diamond.ac.uk> wrote:

> Hi,
>
> I just spent some time tracking software crashes to this change. Is
> setting the default to en_US really appropriate and what we want?
> In particular it affects the output of downstream, external software we
> run from within python.
>
> What is the unicode issue you hint at in the commit message?
>
> -Markus
>
> Dr Markus Gerstel MBCS
> Postdoctoral Research Associate
> Tel: +44 1235 778698
>
> Diamond Light Source Ltd.
> Diamond House
> Harwell Science & Innovation Campus
> Didcot
> Oxfordshire
> OX11 0DE
>
> -----Original Message-----
> From: bkpoon at users.sourceforge.net [mailto:bkpoon at users.sourceforge.net]
> Sent: 07 September 2016 00:54
> To: cctbx-cvs at lists.sourceforge.net
> Subject: [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py
>
> Revision: 25333
>           http://sourceforge.net/p/cctbx/code/25333
> Author:   bkpoon
> Date:     2016-09-06 23:54:29 +0000 (Tue, 06 Sep 2016)
> Log Message:
> -----------
> Unicode support: set LC_ALL in dispatchers to the one in the user's
> environment (if available, and supports UTF-8), otherwise use the default
> setting of en_US.UTF-8; fixes unicode issue with python in Linux (e.g.
> os.path functions do not work correctly with unicode if LC_ALL=C
>
> Modified Paths:
> --------------
>     trunk/libtbx/env_config.py
>
> Modified: trunk/libtbx/env_config.py
> ===================================================================
> --- trunk/libtbx/env_config.py  2016-09-06 21:15:34 UTC (rev 25332)
> +++ trunk/libtbx/env_config.py  2016-09-06 23:54:29 UTC (rev 25333)
> @@ -945,6 +945,15 @@
>
>    def write_bin_sh_dispatcher(self,
>          source_file, target_file, source_is_python_exe=False):
> +
> +    # determine LC_ALL from environment (Python UTF-8 compatibility in Linux)
> +    LC_ALL = os.environ.get('LC_ALL')     # user setting
> +    if (LC_ALL is not None):
> +      if ( ('UTF-8' not in LC_ALL) and ('utf8' not in LC_ALL) ):
> +        LC_ALL = None
> +    if (LC_ALL is None):
> +      LC_ALL = 'en_US.UTF-8'              # default
> +
>      f = target_file.open("w")
>      if (source_file is not None):
>        print >> f, '#! /bin/sh'
> @@ -975,7 +984,7 @@
>      print >> f, '#'
>      print >> f, _SHELLREALPATH_CODE
>      print >> f, 'unset PYTHONHOME'
> -    print >> f, 'LC_ALL=C'
> +    print >> f, 'LC_ALL=' + LC_ALL
>      print >> f, 'export LC_ALL'
>      print >> f, 'LIBTBX_BUILD="$(shellrealpath "$0" && cd "$(dirname "$RESULT")/.." && pwd)"'
>      print >> f, 'export LIBTBX_BUILD'
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode_check.py
Type: text/x-python-script
Size: 501 bytes
Desc: not available
URL: <http://phenix-online.org/pipermail/cctbxbb/attachments/20160908/8cb6249a/attachment.bin>