[phenixbb] AutoBuild fails using NFS on Snow Leopard

Thomas C. Terwilliger terwilliger at lanl.gov
Fri Jan 28 08:37:09 PST 2011


Hi Chris,

I'm sorry for the trouble, and thanks very much for pointing it out.

I have not seen this exact problem before, but as you guessed there is a
general problem with NFS-mounted disks when competing sub-processes try to
set up the same directories.  AutoBuild checks whether a directory already
exists before creating it, but if another process creates a directory of
that name in the time between the check and the creation, the creation
can fail.
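
As an illustration only (this is not the code in PDS.py, and the helper
name ensure_dir is hypothetical), one common way to make directory
creation tolerant of this kind of race is to attempt the creation directly
and treat "already exists" as success, rather than checking first:

```python
import errno
import os


def ensure_dir(dirname):
    """Create dirname, treating 'already exists' as success.

    Hypothetical helper for illustration; not part of Phenix.
    """
    try:
        os.mkdir(dirname)
    except OSError as e:
        # Another process may have created the directory between any
        # earlier existence check and this call. That is acceptable as
        # long as what exists really is a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(dirname):
            raise
    return dirname
```

With this pattern, two sub-processes that both call ensure_dir on the same
path should both succeed, whichever one gets there first.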

Similarly, there can be a problem when a file is "written" to an
NFS-mounted disk but does not appear until some time later. In recent
versions (including yours) you can increase the time allowed for a file to
be written (AutoBuild checks over and over until the file appears) with
the keyword  max_wait_time=0.1.
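
The "checking over and over" idea can be sketched roughly as follows. This
is an assumption about the general technique, not the actual AutoBuild
implementation; the function name wait_for_file and the poll_interval
parameter are hypothetical, and only the name max_wait_time comes from the
keyword above:

```python
import os
import time


def wait_for_file(path, max_wait_time=10.0, poll_interval=0.1):
    """Return True once path exists, or False if max_wait_time elapses.

    Hypothetical helper for illustration; not part of Phenix.
    """
    deadline = time.time() + max_wait_time
    while True:
        if os.path.exists(path):
            return True
        if time.time() >= deadline:
            return False
        # Sleep briefly so the loop does not hammer the file server.
        time.sleep(poll_interval)
```

A larger max_wait_time simply gives a slow file server more time before
the caller gives up on the file.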

I have one idea that you can try now.  Can you try this simple fix for me?
Your file

   /common/app/phenix-1.7-650/phenix/phenix/autosol/run_group_of_wizards.py

has the following lines in it near the top:

class run_group_of_wizards(GeneralMethods):

  def __init__(self,workdir='',OutputDir='',top_output_dir='',
       run_command="sh ",
       condor=None,
       background=True,debug=False,nproc=None,verbose=False,wait_time=1.,
       wait_between_submit_time=0.1,ignore_errors_in_subprocess=False,
       CallingWizard=None,base_path=None,
       run_any_method=False,out=sys.stdout,quiet=True):

Please change the value of wait_between_submit_time=0.1 above to some
much larger number like "10.".  This will give the individual jobs time to
get set up without interfering with each other.
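
To show what a pause between submissions does in principle, here is a
minimal sketch. It is not how run_group_of_wizards is actually written;
the function submit_staggered and its launch and sleep arguments are
hypothetical, and only the name wait_between_submit_time is taken from the
signature above:

```python
import subprocess
import time


def submit_staggered(commands, wait_between_submit_time=10.0,
                     launch=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing between consecutive launches.

    Hypothetical helper for illustration; not part of Phenix.
    """
    processes = []
    for i, command in enumerate(commands):
        if i > 0:
            # Give the previous job time to create its directories
            # before the next job starts doing the same.
            sleep(wait_between_submit_time)
        processes.append(launch(command))
    return processes
```

The pause only makes a collision less likely; it does not rule one out if
a job takes longer than the wait to finish setting up.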

Does this fix the problem? (If so, I will make that a parameter to
AutoBuild.)

All the best,
Tom T




>> I've been having a problem running AutoBuild jobs in NFS-mounted project
>> directories on Snow Leopard.  The jobs start ok, but typically fail in a
>> build cycle with the error "File exists".  The problem is reproducible,
>> but the exact details of the failure are not - the point at which the
>> error occurs changes from run to run.  Identical jobs run to completion if
>> the project directory is on a local disk or an AFP-mounted volume.
>>
>> My guess is that it's a file locking issue, or maybe some sort of race
>> condition in file creation.  Has anybody else experienced this and if so,
>> is there a solution?
>>
>> The details are: OS X 10.6.6, mounting volumes by NFS from a variety of
>> machines (Ubuntu 10.0.4, OS X 10.6.6 & 10.4), Phenix 1.7-650.
>>
>> Regards,
>>
>> Chris
>>
>> -----%<----{ Log Extract }-----
>>
>> Sorry, a subprocess has failed...
>>
>> END OF LOG FILE
>> /work/nedu/foop/phenix_nedu_nfs/AutoBuild_run_3_/TEMP0/RUN_FILE_3.log :
>>
>> rd
>>     desired_run_number=self.desired_run_number)
>>   File
>> "/common/app/phenix-1.7-650/phenix/phenix/autosol/wizard_command_line.py",
>> line 132, in __init__
>>     overwrite_defaults=False,desired_run_number=desired_run_number)
>>   File "/common/app/phenix-1.7-650/phenix/phenix/wizards/AutoBuild.py",
>> line 83, in __init__
>>     init_OutputDir=init_OutputDir,init_RunInstance=init_RunInstance)
>>   File
>> "/common/app/phenix-1.7-650/phenix/phenix/autosol/AutoBaseExtend.py",
>> line 613, in local_init
>>     quiet=self.quiet,OutputDir=self.OutputDir)  #032107 OutputDir
>>   File "/common/app/phenix-1.7-650/phenix/phenix/autosol/RunWizardPDS.py",
>> line 65, in __init__
>>     self.ScriptPDS=ScriptPDS(workspace=workspace)
>>   File "/common/app/phenix-1.7-650/phenix/phenix/pds/ScriptPDS.py", line
>> 29, in __init__
>>     PhenixDataStorage.__init__(self, dirname)
>>   File "/common/app/phenix-1.7-650/phenix/phenix/pds/PDS.py", line 233, in
>> __init__
>>     os.mkdir(self.dirname)
>> OSError: [Errno 17] File exists: 'PDS/AutoBuild_run_1_'
>>
>> Possibly this subprocess is run on a machine with different
>> architecture than the main process?
>> --
>> Dr Chris Richardson :: Sysadmin, structural biology, icr.ac.uk
>>
>>
>> The Institute of Cancer Research: Royal Cancer Hospital, a charitable
>> Company Limited by Guarantee, Registered in England under Company No.
>> 534147 with its Registered Office at 123 Old Brompton Road, London SW7
>> 3RP.
>>
>> This e-mail message is confidential and for use by the addressee only.  If
>> the message is received by anyone other than the addressee, please return
>> the message to the sender by replying to it and then delete the message
>> from your computer and network.
>> _______________________________________________
>> phenixbb mailing list
>> phenixbb at phenix-online.org
>> http://phenix-online.org/mailman/listinfo/phenixbb
>>



