Thanks Rob. I've switched dials.stills_process to use easy_mp.multi_core_run. I tested raising an error in one of the processes and confirmed that the program finishes and that I can print the error message raised in that process. This is great.

Thanks,
-Aaron

On Wed, Dec 13, 2017 at 4:13 AM, R.D. Oeffner <rdo20@cam.ac.uk> wrote:
Hi Graeme,

As far as I can tell from stills_process.py, you're using easy_mp.parallel_map() for your multiprocessing. I would use easy_mp.multi_core_run() instead. It is a different version of easy_mp.parallel_map().
See https://www.phenix-online.org/newsletter/CCN_2017_01.pdf#page=6

One of its features is that it preserves stack traces from individual child processes should they crash, which is helpful for debugging. If a child process crashes in parallel_map(), however, you will not get a stack trace from the offending child, which makes correcting the misbehaviour more difficult.
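
To illustrate the idea of preserving a child's stack trace, here is a minimal sketch using only the Python standard library. It is not the libtbx implementation; the names call_with_traceback, run_jobs and reciprocal are hypothetical and exist only for this example. The assumption is simply that each worker catches its own exception and hands the formatted traceback back to the parent as a string, alongside the arguments and the result.

```python
import traceback
from multiprocessing import Pool


def call_with_traceback(job):
    """Run func(*args) and return (args, result, errstr).

    errstr is None on success. On failure it holds the traceback
    formatted inside the worker, so it survives the trip back to
    the parent process as a plain string.
    """
    func, args = job
    try:
        return args, func(*args), None
    except Exception:
        return args, None, traceback.format_exc()


def run_jobs(func, argstuples, nproc):
    """Yield (args, result, errstr) for every argument tuple.

    Hypothetical helper for this sketch: one failing job does not
    stop the remaining jobs from being processed and reported.
    """
    jobs = [(func, args) for args in argstuples]
    with Pool(processes=nproc) as pool:
        for outcome in pool.imap_unordered(call_with_traceback, jobs):
            yield outcome


def reciprocal(x):
    # Example job: raises ZeroDivisionError when x == 0.
    return 1.0 / x
```

Looping over run_jobs(reciprocal, [(4,), (0,), (2,)], 2) and printing errstr whenever it is not None would show the worker's own traceback for the failing argument while the other jobs still return their results. A worker that is killed outright by a signal cannot report anything this way, since its except clause never runs.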

Rob



On 13/12/2017 09:34, Graeme.Winter@diamond.ac.uk wrote:
Folks

Once again, the lack of stack traces in easy_mp is a big annoyance.

I forget the reasons why printing the actual error was a bad thing,
but I do remember a discussion some 18 months ago.

Not wishing to replay the entire transaction, but could we revisit this?

Thanks Graeme

Begin forwarded message:

From: <Danny.Axford@Diamond.ac.uk>
Subject: [Dials-support] dials.stills_process crash report
Date: 12 December 2017 at 17:43:41 GMT
To: <dials-support@lists.sourceforge.net>
Cc: martin.appleby@diamond.ac.uk

Perhaps this means something to someone. This is the SACLA data again;
it seems to run happily and then hit a problematic frame? Martin and I
see the error at pretty much the same point in the dataset. We were
both running on local machines rather than the cluster.

Cheers,
Danny



Detector 1 RMSDs by panel:
----------------------------------------------------
| Panel | Nref | RMSD_X  | RMSD_Y  | RMSD_DeltaPsi |
| id    |      | (px)    | (px)    | (deg)         |
----------------------------------------------------
| 1     | 89   | 0.37442 | 0.48949 | 0.2192        |
| 2     | 69   | 0.49627 | 0.4847  | 0.21385       |
| 5     | 90   | 0.48267 | 0.59909 | 0.22839       |
| 6     | 70   | 0.6094  | 0.55592 | 0.20944       |
----------------------------------------------------
Traceback (most recent call last):
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/build/../modules/dials/command_line/stills_process.py", line 870, in <module>
    halraiser(e)
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/build/../modules/dials/command_line/stills_process.py", line 868, in <module>
    script.run()
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/build/../modules/dials/command_line/stills_process.py", line 383, in run
    preserve_exception_message=True)
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/modules/cctbx_project/libtbx/easy_mp.py", line 623, in parallel_map
    result = res()
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/dls/science/groups/scisoft/DIALS/CD/latest/dials-dev20171212/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 86, in __call__
    raise exception
RuntimeError: Please report this error to dials-support@lists.sourceforge.net: exit code = -9


_______________________________________________
DIALS-support mailing list
DIALS-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dials-support


_______________________________________________
cctbxbb mailing list
cctbxbb@phenix-online.org
http://phenix-online.org/mailman/listinfo/cctbxbb
