[cctbxbb] Exceptions squashed by easy_mp (revenge of)

Dr. Robert Oeffner rdo20 at cam.ac.uk
Tue Apr 3 05:20:35 PDT 2018


Hi Graeme,

Just had a look at the code in dials/util/mp.py. It seems that you are 
using parallel_map() to run jobs on a cluster via qsub. Unfortunately, 
multi_core_run() is not designed for that: it only runs on a single 
multi-core machine.
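The general idea behind preserving per-process stack traces can be sketched with the Python 3 standard library. The worker catches its own exception while its stack still exists and sends the formatted trace back to the parent as ordinary data. This is a minimal illustration under that assumption, not the libtbx.easy_mp implementation. risky_task(), run_and_capture() and run_all() are hypothetical names. Like multi_core_run(), it only uses the cores of a single machine.

```python
import traceback
from concurrent.futures import ProcessPoolExecutor


def risky_task(x):
    # Hypothetical job that fails for one particular input.
    if x == 3:
        raise ValueError("bad input: %d" % x)
    return x * x


def run_and_capture(args):
    # Runs inside the worker process. The exception is caught here, while the
    # worker's stack still exists, and the full trace is returned as a string.
    try:
        return args, risky_task(args), None
    except Exception:
        return args, None, traceback.format_exc()


def run_all(inputs, nproc=2):
    # Yields (args, result, error_text) per input, in input order;
    # error_text is None when the job succeeded.
    with ProcessPoolExecutor(max_workers=nproc) as pool:
        for item in pool.map(run_and_capture, inputs):
            yield item


if __name__ == "__main__":
    for args, result, err in run_all(range(5)):
        if err is None:
            print("job %r -> %r" % (args, result))
        else:
            print("job %r failed:\n%s" % (args, err))
```

The traceback is formatted in the child and returned as plain data. The parent can therefore log it next to the failing input, rather than reporting only that some job went wrong.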

Sorry,

Rob


On 03/04/2018 12:44, Graeme.Winter at Diamond.ac.uk wrote:
> Thanks Rob, I could not dig out the thread (and the mailing list archive does not have a search function that I could find).
> 
> I’ll talk to the crew about swapping this out for dials.*, though it is possibly quite a big change?
> 
> Cheers Graeme
> 
> On 3 Apr 2018, at 12:26, Dr. Robert Oeffner <rdo20 at cam.ac.uk> wrote:
> 
> Hi Graeme,
> 
> I recall we've been here before,
> http://phenix-online.org/pipermail/cctbxbb/2017-December/001807.html
> 
> I believe the solution is to use easy_mp.multi_core_run() instead of easy_mp.parallel_map(). Unlike easy_mp.parallel_map(), multi_core_run() preserves the stack traces of the individual processes.
> 
> Regards,
> 
> Rob
> 
> 
> On 03/04/2018 07:16, Graeme.Winter at Diamond.ac.uk wrote:
> Folks,
> Following up again on user reports of errors within easy_mp: all that gets logged is “something went wrong”, e.g.:
>   Using multiprocessing with 10 parallel job(s)
> Traceback (most recent call last):
>    File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 613, in <module>
>      halraiser(e)
>    File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 611, in <module>
>      script.run()
>    File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 341, in run
>      reflections = integrator.integrate()
>    File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/integrator.py", line 1214, in integrate
>      self.reflections, _, time_info = processor.process()
>    File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/processor.py", line 271, in process
>      preserve_exception_message = True)
>    File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 171, in multi_node_parallel_map
>      preserve_exception_message = preserve_exception_message)
>    File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 53, in parallel_map
>      preserve_exception_message = preserve_exception_message)
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
>      result = res()
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
>      self.traceback( exception = self.exception() )
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 115, in __call__
>      self.raise_handler( exception = exception )
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/mainthread.py", line 100, in poll
>      value = target( *args, **kwargs )
>    File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 91, in __call__
>      preserve_exception_message = self.preserve_exception_message)
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
>      result = res()
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
>      self.traceback( exception = self.exception() )
>    File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 86, in __call__
>      raise exception
> RuntimeError: Please report this error to dials-support at lists.sourceforge.net: exit code = -9
> I forget why it was decided that keeping the proper stack trace was a bad thing, but could this be revisited? It would greatly help to see it in the program output, especially when, as here, I do not have the user’s data.
> My email-fu is not strong enough to dig out the previous conversation.
> Cheers Graeme
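One detail in the log above is worth checking. Assume the "exit code = -9" in the final RuntimeError follows Python multiprocessing's convention. In that convention, a negative exit code -N means the worker was terminated by signal N, and signal 9 is SIGKILL (sent, for example, by the Linux out-of-memory killer). SIGKILL cannot be caught. A worker killed this way never raises a Python exception, so there is no worker stack trace to preserve. The POSIX-only sketch below reproduces the effect and decodes the exit code. Its function names are hypothetical, and it is not taken from DIALS or libtbx.

```python
import signal
import time
from multiprocessing import Process


def long_running_job():
    # Hypothetical worker standing in for a long integration job.
    time.sleep(60)


def exit_code_after_sigkill():
    # Start a worker, then kill it with SIGKILL (Process.kill(), Python 3.7+).
    # SIGKILL cannot be caught, so the child never raises a Python exception
    # and never gets the chance to produce a traceback.
    p = Process(target=long_running_job)
    p.start()
    p.kill()
    p.join()
    # multiprocessing reports "terminated by signal N" as exitcode == -N.
    return p.exitcode


def describe(exitcode):
    # Translate a multiprocessing exit code into a readable message.
    if exitcode is not None and exitcode < 0:
        return "killed by signal %s" % signal.Signals(-exitcode).name
    return "exited with status %s" % exitcode


if __name__ == "__main__":
    code = exit_code_after_sigkill()
    print(code, describe(code))  # on Linux/macOS: -9 killed by signal SIGKILL
```

If that is what happened here, the most the parent can report is the exit code. Checking memory limits on the affected machine may then be more informative than a longer traceback.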
> 
> 
> --
> Robert Oeffner, Ph.D.
> Research Associate, The Read Group
> Department of Haematology,
> Cambridge Institute for Medical Research
> University of Cambridge
> Cambridge Biomedical Campus
> Wellcome Trust/MRC Building
> Hills Road
> Cambridge CB2 0XY
> 
> www.cimr.cam.ac.uk/investigators/read/index.html
> tel: +44(0)1223 763234
> 
> 



