[phenixbb] parallel phenix.phaser on SGE cluster
schuerjp at anl.gov
Wed Jul 10 11:17:44 PDT 2013
It's not quite as easy as one would hope, but the difficulty is in the
SGE setup, not in Phenix or Phaser. We make extensive use of
multiprocessing across our cluster in our RAPD software, which is used
at the beamline to get results to the user faster.
I set this up a while ago, but from what I remember, you first have to
set up a 'parallel environment' (PE) in SGE. The options may be different
depending on your version of SGE. We use 6.2u4. There are probably
default PEs set up already, but they might not have the correct
parameters. I created a new one called 'smp' with the number of 'slots'
set to the number of cores in the cluster, or some lower limit.
(If you set 'slots' to 12 and you submit 5 jobs requiring 4 slots each,
only three will run at once; the other two wait until a running job
finishes and frees its slots.)
The 'allocation_rule' is set to '$pe_slots' so that all of a job's
slots are allocated on a single node. There are other rules that might
be better for what you want to do. In our case, I set up different queues
with different priorities that have access to specific PEs depending on
the jobs that are getting submitted at the beamline. Your setup may not
need this complexity. I would read through the huge manual for SGE for
details or do a search on Oracle's website.
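
As a rough illustration of the slot arithmetic in the 12-slot example
above, here is a small Python sketch. It is not SGE code: the function
name is made up for illustration, it assumes every job requests the same
number of slots, and it models only the PE-wide 'slots' limit.

```python
def jobs_running_at_once(pe_slots, slots_per_job, jobs_submitted):
    """Return (running, waiting) for identical jobs sharing one PE.

    Hypothetical helper for illustration only: it models just the PE-wide
    'slots' limit and ignores per-node core counts and queue limits.
    """
    if slots_per_job < 1:
        raise ValueError("slots_per_job must be at least 1")
    fit = pe_slots // slots_per_job      # jobs that fit under the PE limit
    running = min(fit, jobs_submitted)
    return running, jobs_submitted - running


# The example from the text: 12 slots, 5 jobs of 4 slots each.
print(jobs_running_at_once(12, 4, 5))   # prints (3, 2)
```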
When you submit the job, make sure you add '-pe smp 1-4' to the qsub
command ('qsub ... -pe smp 1-4 ...'), which tells SGE that your job
needs between 1 and 4 slots (cores) on a single node. You could also
specify a single integer (4 instead of 1-4) to request exactly 4 slots.
Obviously, you can modify these to your needs. After you submit the job,
run 'qstat' and look at the column labeled 'slots' to see how many slots
are reserved for the job.
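
If you submit from a script, the request can be assembled like this. It
is only a sketch: 'run_phaser.sh' is a hypothetical job script name, and
'smp' is the PE name from my setup, so substitute whatever exists on
your cluster. The code only builds and prints the command; it does not
call qsub.

```python
import shlex


def build_qsub_command(script, pe_name="smp", min_slots=1, max_slots=4):
    """Build the argument list for a qsub call that requests PE slots.

    'script' is a hypothetical job script path; 'smp' is the PE name
    used in the text and must exist on your cluster.
    """
    if not 1 <= min_slots <= max_slots:
        raise ValueError("need 1 <= min_slots <= max_slots")
    if min_slots == max_slots:
        slot_request = str(max_slots)              # fixed request, e.g. "4"
    else:
        slot_request = f"{min_slots}-{max_slots}"  # range, e.g. "1-4"
    return ["qsub", "-pe", pe_name, slot_request, script]


cmd = build_qsub_command("run_phaser.sh")
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually submit, pass the list to subprocess.run(cmd, check=True)
# on a host where qsub is available.
```

Building the command as a list rather than a single string avoids
shell-quoting problems if the script path contains spaces.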
In your Phaser command include 'JOBS 4' to match your requested number
of slots. I am not sure how much this speeds up a single Phaser job
because there isn't a whole lot of code to parallelize in MR. Randy Read
mentioned (either to me or the BB, I don't remember) that everything
in Phaser that could be parallelized already has been.
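
If you request a range such as 1-4, the number of slots actually granted
can differ from job to job. SGE exports the granted count to the job as
the NSLOTS environment variable, so the JOBS value can be derived from
it instead of being hard-coded. A minimal sketch, with made-up helper
names; how the keyword line reaches Phaser is left to your own job
script:

```python
import os


def granted_slots(default=1):
    """Return the slot count SGE granted, read from $NSLOTS.

    Falls back to 'default' when NSLOTS is unset or invalid (for
    example when running outside the cluster).
    """
    try:
        n = int(os.environ.get("NSLOTS", default))
    except ValueError:
        return default
    return max(1, n)


def phaser_jobs_keyword():
    """Format the JOBS keyword mentioned in the text for the granted slots."""
    return f"JOBS {granted_slots()}"


print(phaser_jobs_keyword())
```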
On a side note, if you write code in Python, each
multiprocessing.Process() you start runs as a separate process on the
same node, and the operating system will normally schedule it on
another core. SGE does not know about these extra processes, so you
have to account for them when you request a specific number of slots
during job submission; otherwise you could overload your cluster pretty
quickly. Many programs have an optimal number of slots to request, and
requesting more will not make them run any faster, but it will limit
the resources available for other jobs on the cluster. I assume Phaser
is one of these programs.
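
One way to keep a Python job within its allocation is to use a
multiprocessing.Pool sized from the slots SGE granted, rather than
starting one Process per task. This is only a sketch: worker_count is a
made-up helper, math.sqrt stands in for real work, and falling back to
one worker when NSLOTS is not set is my own conservative assumption.

```python
import math
import multiprocessing
import os


def worker_count(requested):
    """Cap the number of worker processes at the slots SGE granted.

    NSLOTS is set by SGE inside a job; outside a job we fall back to 1
    (an assumption made here to stay conservative).
    """
    try:
        slots = int(os.environ.get("NSLOTS", "1"))
    except ValueError:
        slots = 1
    return max(1, min(requested, slots))


if __name__ == "__main__":
    n = worker_count(requested=8)
    # Pool(processes=n) never runs more than n worker processes at once,
    # unlike starting one multiprocessing.Process() per task.
    with multiprocessing.Pool(processes=n) as pool:
        results = pool.map(math.sqrt, [1, 4, 9, 16])
    print(n, results)
```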
Jonathan P. Schuermann, Ph.D.
Beamline Scientist, NE-CAT
Argonne National Laboratory, 436E
9700 S. Cass Ave.
Argonne, IL 60439
Email: schuerjp at anl.gov
Tel: (630) 252-0682
On 07/10/2013 09:16 AM, L. Costenaro (IBB) wrote:
> I am trying to run phaser in a SGE cluster (qsub) using multiple proc
> (either phenix.phaser from the command line or phaser-MR from the
> GUI), but the jobs do not parallelize. When I run the phaser-MR
> locally (same executable) it does parallelize (multiple python threads).
> Any help , advice would be welcome.
> Best regards,