Good day!
I am trying to get a bunch of OS X 10.6 workstations managed by Condor to work together on an mr_rosetta project. I am new to batch systems and rather confused.
The Phenix manual states: "Single file system required for mr_rosetta: All files are stored on a single file system that must be accessible to all jobs."
What exactly has to go on the network share and how would I make the execution nodes aware of it? Could somebody please share some information on a working setup of this sort?
Thanks!
Best regards,
Dmitry
This is most easily done by mapping your user home directory identically on all nodes.
That is, if the user submitting the job could log in to any of the nodes, the shell, directory, and filesystem environment should look exactly the same as on the machine the job is submitted from.
F
On Sep 1, 2012, at 9:52 PM, Dmitry Rodionov wrote:
What exactly has to go on the network share and how would I make the execution nodes aware of it?
So, basically, no configuration is necessary as long as the NFS share is mounted on all hosts at exactly the same path, with either all_squash or a user with a matching UID across all hosts?
Dmitry
On 2012-09-02, at 12:58 AM, Francis E Reyes wrote:
This is most easily done by allowing your user home directory to be mapped identically on all nodes.
That is to say that if the user submitting the job could login to all the nodes, the shell / directory/ filesystem environment looks exactly the same as the machine the user is submitting the job from.
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
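The condition discussed above (same mount path and matching UID on every host) can be checked by hand before involving Condor. The following is only a minimal sketch: the function name describe_share is made up for illustration, and it assumes a Python interpreter is available on each node. Run it on the submit machine and on every execution node with the shared path as the argument, then compare the output; everything except the host name should agree.

```python
import os
import socket
import sys


def describe_share(path):
    """Collect the facts that should match on every node for a shared path.

    describe_share is a hypothetical helper, not part of Phenix or Condor.
    """
    return {
        "host": socket.gethostname(),          # differs per node, for labeling
        "path": path,                          # the path as the job would see it
        "exists": os.path.isdir(path),         # is the share mounted here?
        "realpath": os.path.realpath(path),    # exposes symlinked mount points
        "uid": os.getuid(),                    # should match unless squashed
        "writable": os.access(path, os.W_OK),  # jobs must be able to write
    }


if __name__ == "__main__":
    # Default to the home directory if no path is given on the command line.
    share = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~")
    for key, value in sorted(describe_share(share).items()):
        print("%s: %s" % (key, value))
```

If "exists" is false or "realpath" differs on some node, the jobs landing there will not see the same files as the submit machine.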
With shares out of the way, I started mr_rosetta on a home-grown Condor pool. Unfortunately, it stopped immediately after submitting the jobs with this error:
Splitting work into 10 jobs and running with 1 processors using /condor/condor-installed/bin/condor_submit background=None in /Volumes/condor/Sht1/MR_ROSETTA_5/GROUP_OF_PLACE_MODEL_1
Running using condor job submission
Starting job 1...Log will be: /Volumes/condor/Sht1/MR_ROSETTA_5/GROUP_OF_PLACE_MODEL_1/RUN_FILE_1.log
Submitting job(s)
ERROR: No such directory: /Volumes/condor/Sht1/MR_ROSETTA_5/GROUP_OF_PLACE_MODEL_1/"/Volumes/condor/Sht1/MR_ROSETTA_5/GROUP_OF_PLACE_MODEL_1"
The error can be reproduced by submitting the Phenix-generated jobs manually. I can run the scripts locally without any problem.
It sounds like condor_submit is unhappy with the job description files. What did I do wrong? Does anybody know how this can be fixed?
Please find the input files attached.
Thanks a bunch!
Dmitry
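One possible reading of the error above, offered as a guess since the attached job description files are not shown here: the path in the message is the working directory followed by the same directory wrapped in double quotes, which is what results when a quoted value is taken literally and, because it starts with a quote rather than a slash, treated as a relative path. The sketch below reproduces that string and shows one way to strip the quotes. The key name initialdir is a real condor_submit command, but that it is the offending line is an assumption, and the sample file contents (executable = run.sh) are hypothetical.

```python
import posixpath
import re


def resolve_like_relative(cwd, value):
    """Join value onto cwd the way a relative path would be resolved."""
    # A value starting with a double quote does not start with "/",
    # so it is appended to cwd instead of replacing it.
    return posixpath.join(cwd, value)


def strip_quoted_paths(submit_text, keys=("initialdir",)):
    """Remove surrounding double quotes from the values of the given keys."""
    pattern = re.compile(
        r'^([ \t]*(?:%s)[ \t]*=[ \t]*)"(.*)"[ \t]*$'
        % "|".join(re.escape(k) for k in keys),
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(r"\1\2", submit_text)


cwd = "/Volumes/condor/Sht1/MR_ROSETTA_5/GROUP_OF_PLACE_MODEL_1"

# Reproduces the shape of the path in the error message.
print(resolve_like_relative(cwd, '"%s"' % cwd))

# Hypothetical submit file contents; the real file may look different.
example = 'executable = run.sh\ninitialdir = "%s"\nqueue\n' % cwd
print(strip_quoted_paths(example))
```

If the generated files do contain a quoted directory, removing the quotes by hand and resubmitting one job would be a quick way to test this guess.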