[cctbxbb] [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py

Billy Poon bkpoon at lbl.gov
Fri Sep 9 12:48:42 PDT 2016


Hi Markus,

Great!

Just to let you know of some additional quirks Rob and I found about
unicode. Windows filesystems do not seem to like UTF-8, so you should use
the to_str and to_unicode functions in libtbx/utils.py if you want to
handle non-ASCII filenames on Windows. They default to 'mbcs' for the
encoding codec on Windows.
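
For illustration only, here is a minimal sketch of what platform-aware
conversion helpers of this kind could look like. The names to_bytes, to_text
and default_codec are hypothetical and are not the actual functions in
libtbx/utils.py; the only detail taken from the note above is that 'mbcs' is
the default codec on Windows. Falling back to UTF-8 elsewhere is an assumption.

```python
import sys


def default_codec():
    # 'mbcs' is only available on Windows; using UTF-8 everywhere else is an
    # assumption made for this sketch.
    return "mbcs" if sys.platform == "win32" else "utf-8"


def to_bytes(text, codec=None):
    """Encode a text string to bytes; bytes are passed through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode(codec or default_codec())


def to_text(data, codec=None):
    """Decode bytes to a text string; text is passed through unchanged."""
    if isinstance(data, bytes):
        return data.decode(codec or default_codec())
    return data


# Round trip of a non-ASCII filename with an explicit codec.
name = u"caf\u00e9.mtz"
encoded = to_bytes(name, "utf-8")
print(repr(encoded), to_text(encoded, "utf-8") == name)
```

Keeping the codec choice in one place means callers never hard-code an
encoding, which is the point of going through helpers rather than calling
encode and decode directly.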

--
Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909
Web: https://phenix-online.org

On Fri, Sep 9, 2016 at 1:59 AM, <markus.gerstel at diamond.ac.uk> wrote:

> Hi Billy,
>
>
>
> Thanks for your mail.
>
> As usual, it’s $insertYear and unicode is still not a solved problem :(
>
>
>
> I ran into UnicodeEncode/DecodeErrors, but I am now happy that your change
> only exposed underlying issues in my code (outside of the cctbx/dials/xia2
> repositories). I have sprinkled some forced UTF-8 encoding on top, and
> everything appears to be working fine now.
>
> As for the changed output: one example is the default wget output, which
> puts the name of the file it writes to disk in ``quotes’’, and those quotes
> follow the LC_ALL encoding. Fortunately we don’t really care about fancy
> formatting, so this is not a real problem.
>
>
>
> -Markus
>
>
>
> Dr Markus Gerstel MBCS
> Postdoctoral Research Associate
> Tel: +44 1235 778698
>
> Diamond Light Source Ltd.
> Diamond House
> Harwell Science & Innovation Campus
> Didcot
> Oxfordshire
> OX11 0DE
>
>
>
> *From:* cctbxbb-bounces at phenix-online.org [mailto:cctbxbb-bounces@phenix-online.org] *On Behalf Of* Billy Poon
> *Sent:* 08 September 2016 19:45
> *To:* cctbx mailing list
> *Cc:* bkpoon at users.sourceforge.net
> *Subject:* Re: [cctbxbb] [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py
>
>
>
> Hi Markus,
>
>
>
> There is an issue with non-ASCII paths (unicode type) and basic Python
> functions if the locale (like 'C') does not support UTF-8. Without UTF-8
> support, these functions try to convert the unicode type into a str type
> with the 'ascii' encoding, which triggers a UnicodeEncodeError. I attached
> a script that tests it. The unicode path should fail for libtbx.python
> before my change and pass after it. Alternatively, change the LC_ALL setting
> in the build/bin/libtbx.python dispatcher (if the en_US locale is
> available, en_US will fail and en_US.UTF-8 will work).
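
The failure mode described above can be reproduced in isolation. This is a
minimal sketch, not the attached test script: it only shows that a non-ASCII
path cannot be encoded with the 'ascii' codec, which is the implicit
conversion that raises the UnicodeEncodeError, while UTF-8 handles it. The
path and the helper name can_encode are made up for the example.

```python
def can_encode(path, codec):
    """Return True if the text path can be encoded with the given codec."""
    try:
        path.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical path containing one non-ASCII character.
path = u"/tmp/d\u00e9mo/data.mtz"

for codec in ("ascii", "utf-8"):
    print(codec, can_encode(path, codec))
```

With the path above, the 'ascii' attempt fails and the 'utf-8' attempt
succeeds, which matches the en_US versus en_US.UTF-8 behaviour described.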
>
>
>
> An additional wrinkle is that LC_ALL=C works fine on my mac (OS X
> 10.10.5). Also, there is a "C.UTF-8" locale on Ubuntu, but not on CentOS.
>
>
>
> Basically, to support non-ASCII paths (unicode type) in basic Python
> functions, any locale with UTF-8 or utf8 will work. The en_US part is not
> that important.
>
>
>
> What are the errors that you get? I ran the regression tests for dials
> (libtbx.run_tests_parallel module=dials) and dials_regression
> (module=dials_regression) and everything passes except for one test in
> dials_regression (dials_regression/test.py). But the error seems to be
> about a goniometer object. Do you have the en_US locale installed?
>
>
>
> Right now, I'm just checking if LC_ALL is set in the user environment and
> using that if it has the extra UTF-8 part. I can also check the LANG
> environment variable. That might work better for users who do not have
> the en_US locale installed.
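
A minimal sketch of that selection logic, with the LANG fallback included.
This is not the committed code: the function name pick_utf8_locale and the
order in which the variables are checked are assumptions. The substring test
for 'UTF-8' or 'utf8' and the en_US.UTF-8 default mirror the commit quoted
further down.

```python
import os


def pick_utf8_locale(environ=None, default="en_US.UTF-8"):
    """Return the first UTF-8 capable locale found in LC_ALL or LANG.

    Falls back to `default` when neither variable names a UTF-8 locale.
    """
    environ = os.environ if environ is None else environ
    for name in ("LC_ALL", "LANG"):  # assumed order: LC_ALL first, then LANG
        value = environ.get(name)
        # Same case-sensitive substring check as in the commit.
        if value and ("UTF-8" in value or "utf8" in value):
            return value
    return default


# Example with an explicit environment mapping instead of os.environ.
print(pick_utf8_locale({"LC_ALL": "C", "LANG": "de_DE.utf8"}))
```

Passing the environment as an argument keeps the function easy to test
without modifying the process environment.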
>
>
>
>
>
> On Thu, Sep 8, 2016 at 2:26 AM, <markus.gerstel at diamond.ac.uk> wrote:
>
> Hi,
>
> I just spent some time tracking software crashes to this change. Is
> setting the default to en_US really appropriate and what we want?
> In particular it affects the output of downstream, external software we
> run from within python.
>
> What is the unicode issue you hint at in the commit message?
>
> -Markus
>
>
> -----Original Message-----
> From: bkpoon at users.sourceforge.net [mailto:bkpoon at users.sourceforge.net]
> Sent: 07 September 2016 00:54
> To: cctbx-cvs at lists.sourceforge.net
> Subject: [Cctbx-cvs] SF.net SVN: cctbx:[25333] trunk/libtbx/env_config.py
>
> Revision: 25333
>           http://sourceforge.net/p/cctbx/code/25333
> Author:   bkpoon
> Date:     2016-09-06 23:54:29 +0000 (Tue, 06 Sep 2016)
> Log Message:
> -----------
> Unicode support: set LC_ALL in dispatchers to the one in the user's
> environment (if available, and supports UTF-8), otherwise use the default
> setting of en_US.UTF-8; fixes unicode issue with python in Linux (e.g.
> os.path functions do not work correctly with unicode if LC_ALL=C)
>
> Modified Paths:
> --------------
>     trunk/libtbx/env_config.py
>
> Modified: trunk/libtbx/env_config.py
> ===================================================================
> --- trunk/libtbx/env_config.py  2016-09-06 21:15:34 UTC (rev 25332)
> +++ trunk/libtbx/env_config.py  2016-09-06 23:54:29 UTC (rev 25333)
> @@ -945,6 +945,15 @@
>
>    def write_bin_sh_dispatcher(self,
>          source_file, target_file, source_is_python_exe=False):
> +
> +    # determine LC_ALL from environment (Python UTF-8 compatibility in Linux)
> +    LC_ALL = os.environ.get('LC_ALL')     # user setting
> +    if (LC_ALL is not None):
> +      if ( ('UTF-8' not in LC_ALL) and ('utf8' not in LC_ALL) ):
> +        LC_ALL = None
> +    if (LC_ALL is None):
> +      LC_ALL = 'en_US.UTF-8'              # default
> +
>      f = target_file.open("w")
>      if (source_file is not None):
>        print >> f, '#! /bin/sh'
> @@ -975,7 +984,7 @@
>      print >> f, '#'
>      print >> f, _SHELLREALPATH_CODE
>      print >> f, 'unset PYTHONHOME'
> -    print >> f, 'LC_ALL=C'
> +    print >> f, 'LC_ALL=' + LC_ALL
>      print >> f, 'export LC_ALL'
>      print >> f, 'LIBTBX_BUILD="$(shellrealpath "$0" && cd "$(dirname "$RESULT")/.." && pwd)"'
>      print >> f, 'export LIBTBX_BUILD'
>
> This was sent by the SourceForge.net collaborative development platform,
> the world's largest Open Source development site.
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Cctbx-cvs mailing list
> Cctbx-cvs at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cctbx-cvs
>
> --
> This e-mail and any attachments may contain confidential, copyright and or
> privileged material, and are for the use of the intended addressee only. If
> you are not the intended addressee or an authorised recipient of the
> addressee please notify us of receipt by returning the e-mail and do not
> use, copy, retain, distribute or disclose the information in or attached to
> the e-mail.
> Any opinions expressed within this e-mail are those of the individual and
> not necessarily of Diamond Light Source Ltd.
> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
> attachments are free from viruses and we cannot accept liability for any
> damage which you may sustain as a result of software viruses which may be
> transmitted in or with the message.
> Diamond Light Source Limited (company no. 4375679). Registered in England
> and Wales with its registered office at Diamond House, Harwell Science and
> Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
>
>
> _______________________________________________
> cctbxbb mailing list
> cctbxbb at phenix-online.org
> http://phenix-online.org/mailman/listinfo/cctbxbb
>
>