[cctbxbb] Removal of Boost.Thread support

Wed Aug 15 12:04:50 PDT 2012

On Wed, Aug 15, 2012 at 7:21 AM, Jeffrey Van Voorst <vanv0059 at umn.edu> wrote:
> I have been lurking on this mailing list for a bit.  I am very interested in
> and have some practical experience with OpenMP and Nvidia CUDA programming.
> I work on such projects both to make use of modern hardware on typical
> single user machines, and because I find it fun.  I have found OpenMP to be
> rather easy to set up and to give good speedup, but it is generally very
> difficult to get close to the maximum theoretical performance (N cores gives
> a speedup of N) for relatively short computations (less than 1 second).
>
> I have several questions (that I know may not have simple answers):
> 0) Is there a public roadmap or recent plan of how to proceed?

Nope.  Most of the speed improvements we've discussed here at Berkeley
have focused on better algorithms and optimization methods.  There is
some interest (mostly from Peter) in porting the direct structure factor
calculations to GPUs, which would potentially make this method
accessible for macromolecules, but it's a long-term project.

> 1) Does the cctbx developers community take kindly to others meddling in the
> code?

Since CCTBX is an open-source project, we generally welcome meddling,
as long as a) you talk to us first, and b) you don't break anything.

> 2) For which types of machines would one be trying to tune cctbx's OpenMP
> code?  In general, the tradeoffs are different for machines with a small
> number of cores versus a massive shared memory platform (1000s of cores).

Small machines (where "small" means "2-64 cores").  Very few
calculations that we do are suitable for massive shared-memory
systems.

> 3) What is the primary motivation?  (e.g. have easy-to-extend code that makes
> use of more cores because they are there? or highly efficient methods that
> scale very well -- 12 cores should give as close as possible to 12x speedup
> with respect to 1 core)

I think most of the OpenMP support currently in CCTBX was
experimental - it seemed like an easy thing to try.  The main goal for
us at Berkeley was (and still is) to make Phenix faster; once it was
obvious that OpenMP wouldn't help very much, we sort of lost interest.
We've had far more luck with cruder parallelization using Python
multiprocessing (although this is very situational).  (A secondary
problem is that OpenMP is incompatible with the multiprocessing
module, so we don't distribute OpenMP builds of either CCTBX or
Phenix.)
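
To illustrate what "cruder parallelization" means here, a minimal sketch
using only the standard-library multiprocessing module.  The names
run_job and run_all and the toy workload are hypothetical stand-ins, not
CCTBX or Phenix code; the point is only that whole independent jobs are
handed to separate processes instead of threading an inner loop.

```python
import math
from multiprocessing import Pool


def run_job(seed):
    """Stand-in for one independent, CPU-bound task (hypothetical workload)."""
    total = 0.0
    for k in range(1, 200000):
        total += math.sin(seed * k) / k
    return seed, total


def run_all(seeds, nproc=4):
    """Farm the independent jobs out to separate worker processes."""
    with Pool(processes=nproc) as pool:
        # pool.map returns results in input order as (seed, total) pairs
        return dict(pool.map(run_job, seeds))


if __name__ == "__main__":
    results = run_all(range(8))
    for seed in sorted(results):
        print(seed, round(results[seed], 6))
```

This only pays off when the jobs really are independent and each one
runs long enough to outweigh the cost of starting processes and
pickling results, which is why it is so situational.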

The best use of OpenMP that I can think of would be to parallelize the
direct summation code, which is so inefficient that Amdahl's Law
shouldn't be as big of a buzz-kill as it was for the parallelized FFT
calculations.
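
To put rough numbers on the Amdahl's Law point, a small sketch of the
standard formula, speedup = 1 / ((1 - p) + p / N), where p is the
fraction of run time that can be parallelized and N is the number of
cores.  The values of p below are made-up illustrations, not
measurements of the FFT or direct summation code.

```python
def amdahl_speedup(parallel_fraction, ncores):
    """Upper bound on speedup when only part of the run time is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / ncores)


if __name__ == "__main__":
    # Illustrative parallel fractions only (assumed, not measured).
    for p in (0.50, 0.95, 0.99):
        print("p = %.2f: %.2fx on 12 cores" % (p, amdahl_speedup(p, 12)))
```

A calculation that is only 50% parallel can never exceed 2x no matter
how many cores it gets, whereas one dominated by a single expensive
loop (as the direct summation would be) keeps the serial fraction
small enough for extra cores to matter.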

-Nat