[cctbxbb] Exceptions squashed by easy_mp (revenge of)

Graeme.Winter at diamond.ac.uk
Tue Apr 3 05:23:09 PDT 2018


Hi Rob

I think this is true … sometimes

It sets up the qsub every time, but does not always use it - at least it works on my MacBook with no qsub ;-)

That said, the question remains why exception reports are bad for parallel map… we *are* using preserve_exception_message…
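
For reference, here is a minimal sketch of how a parent process can report either the traceback raised in a child or the signal that killed it. It uses only the Python 3 standard library and is not the easy_mp implementation; `run_and_report`, `_child`, `fails` and `gets_killed` are hypothetical names, and it assumes a POSIX host where the "fork" start method is available. If the "exit code = -9" in the report comes from `multiprocessing`'s `Process.exitcode`, a negative value means the child was terminated by that signal number (9 is SIGKILL), in which case there is no Python traceback to preserve.

```python
import multiprocessing
import os
import signal
import traceback

# Assumption: POSIX host. "fork" lets the child inherit the pipe end directly.
_ctx = multiprocessing.get_context("fork")


def _child(conn, func, arg):
    """Run func(arg) in the child; send the result or the formatted traceback."""
    try:
        conn.send(("ok", func(arg)))
    except Exception:
        # The traceback is turned into a string here, inside the child,
        # so it survives the trip across the process boundary.
        conn.send(("error", traceback.format_exc()))
    finally:
        conn.close()


def run_and_report(func, arg):
    """Return (status, payload); status is 'ok', 'error' or 'killed'."""
    recv_end, send_end = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_child, args=(send_end, func, arg))
    proc.start()
    send_end.close()  # the parent only reads
    try:
        message = recv_end.recv()
    except EOFError:
        message = None  # the child died before it could report anything
    proc.join()
    recv_end.close()
    if message is not None:
        return message
    code = proc.exitcode
    if code is not None and code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "signal %d" % -code
        return ("killed", "child terminated by %s; no Python traceback exists" % name)
    return ("killed", "child exited with status %s without reporting" % code)


def fails(x):
    raise ValueError("bad input %r" % (x,))


def gets_killed(x):
    # Stand-in for an external kill, e.g. by a batch system or the kernel.
    os.kill(os.getpid(), signal.SIGKILL)


if __name__ == "__main__":
    for func in (fails, gets_killed):
        status, payload = run_and_report(func, 3)
        print(status)
        print(payload)
```

In the first case the parent can print the child's full traceback; in the second all it can say is which signal ended the process.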

Cheers Graeme


> On 3 Apr 2018, at 13:20, Dr. Robert Oeffner <rdo20 at cam.ac.uk> wrote:
> 
> Hi Graeme,
> 
> Just had a look at the code in dials/util/mp.py. It seems that you are using parallel_map() on a cluster using qsub. Unfortunately multi_core_run() is not designed for that. It only runs on a single multi-core machine.
> 
> Sorry,
> 
> Rob
> 
> 
> On 03/04/2018 12:44, Graeme.Winter at Diamond.ac.uk wrote:
>> Thanks Rob, I could not dig out the thread (and the mail list thing does not have search that I could find)
>> I’ll talk to the crew about swapping this out for dials.* - though it is possibly quite a big change?
>> Cheers Graeme
>> On 3 Apr 2018, at 12:26, Dr. Robert Oeffner <rdo20 at cam.ac.uk> wrote:
>> Hi Graeme,
>> I recall we've been here before,
>> http://phenix-online.org/pipermail/cctbxbb/2017-December/001807.html
>> I believe the solution is to use easy_mp.multi_core_run() instead of easy_mp.parallel_map(). The former preserves the stack traces of the individual processes, unlike easy_mp.parallel_map().
>> Regards,
>> Rob
>> On 03/04/2018 07:16, Graeme.Winter at Diamond.ac.uk wrote:
>> Folks,
>> Following up on user reports again of errors within easy_mp - all that gets logged is “something went wrong” i.e.
>>  Using multiprocessing with 10 parallel job(s)
>> Traceback (most recent call last):
>>   File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 613, in <module>
>>     halraiser(e)
>>   File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 611, in <module>
>>     script.run()
>>   File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 341, in run
>>     reflections = integrator.integrate()
>>   File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/integrator.py", line 1214, in integrate
>>     self.reflections, _, time_info = processor.process()
>>   File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/processor.py", line 271, in process
>>     preserve_exception_message = True)
>>   File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 171, in multi_node_parallel_map
>>     preserve_exception_message = preserve_exception_message)
>>   File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 53, in parallel_map
>>     preserve_exception_message = preserve_exception_message)
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
>>     result = res()
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
>>     self.traceback( exception = self.exception() )
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 115, in __call__
>>     self.raise_handler( exception = exception )
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/mainthread.py", line 100, in poll
>>     value = target( *args, **kwargs )
>>   File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 91, in __call__
>>     preserve_exception_message = self.preserve_exception_message)
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
>>     result = res()
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
>>     self.traceback( exception = self.exception() )
>>   File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 86, in __call__
>>     raise exception
>> RuntimeError: Please report this error to dials-support at lists.sourceforge.net: exit code = -9
>> I forget why it was decided that keeping the proper stack trace was a bad thing, but could this be revisited? It would greatly help to see it in the output of the program (if, as is the case here, I do not have the user data).
>> My email-fu is not strong enough to dig out the previous conversation
>> Cheers Graeme
>> --
>> Robert Oeffner, Ph.D.
>> Research Associate, The Read Group
>> Department of Haematology,
>> Cambridge Institute for Medical Research
>> University of Cambridge
>> Cambridge Biomedical Campus
>> Wellcome Trust/MRC Building
>> Hills Road
>> Cambridge CB2 0XY
>> www.cimr.cam.ac.uk/investigators/read/index.html
>> tel: +44(0)1223 763234
> 
> 

