AutoBuild fails using NFS on Snow Leopard
I've been having a problem running AutoBuild jobs in NFS-mounted project directories on Snow Leopard. The jobs start ok, but typically fail in a build cycle with the error "File exists". The problem is reproducible, but the exact details of the failure are not - the point at which the error occurs changes from run to run. Identical jobs run to completion if the project directory is on a local disk or an AFP-mounted volume.

My guess is that it's a file locking issue, or maybe some sort of race condition in file creation. Has anybody else experienced this and if so, is there a solution?

The details are: OS X 10.6.6, mounting volumes by NFS from a variety of machines (Ubuntu 10.04, OS X 10.6.6 & 10.4), Phenix 1.7-650.

Regards,
Chris

-----%<----{ Log Extract }-----
Sorry, a subprocess has failed...
END OF LOG FILE /work/nedu/foop/phenix_nedu_nfs/AutoBuild_run_3_/TEMP0/RUN_FILE_3.log :
    rd desired_run_number=self.desired_run_number)
  File "/common/app/phenix-1.7-650/phenix/phenix/autosol/wizard_command_line.py", line 132, in __init__
    overwrite_defaults=False,desired_run_number=desired_run_number)
  File "/common/app/phenix-1.7-650/phenix/phenix/wizards/AutoBuild.py", line 83, in __init__
    init_OutputDir=init_OutputDir,init_RunInstance=init_RunInstance)
  File "/common/app/phenix-1.7-650/phenix/phenix/autosol/AutoBaseExtend.py", line 613, in local_init
    quiet=self.quiet,OutputDir=self.OutputDir) #032107 OutputDir
  File "/common/app/phenix-1.7-650/phenix/phenix/autosol/RunWizardPDS.py", line 65, in __init__
    self.ScriptPDS=ScriptPDS(workspace=workspace)
  File "/common/app/phenix-1.7-650/phenix/phenix/pds/ScriptPDS.py", line 29, in __init__
    PhenixDataStorage.__init__(self, dirname)
  File "/common/app/phenix-1.7-650/phenix/phenix/pds/PDS.py", line 233, in __init__
    os.mkdir(self.dirname)
OSError: [Errno 17] File exists: 'PDS/AutoBuild_run_1_'
Possibly this subprocess is run on a machine with different architecture than the main process?
-- Dr Chris Richardson :: Sysadmin, structural biology, icr.ac.uk The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network.
Hi Chris,

I'm sorry for the trouble, and thanks very much for pointing it out. I have not seen this exact problem before, but as you guessed there is a general problem with NFS-mounted disks when competing sub-processes try to set up directories. AutoBuild checks to see if a directory already exists before it creates it, but if a directory of that name gets created by another process in the time between the check and the creation, there can be a failure. Similarly, there can be a problem when a file is "written" to an NFS-mounted disk but does not appear until some time later. In recent versions (including yours) there is the option to increase the wait time for allowing a file to be written (checking over and over until the file appears) with the keyword max_wait_time=0.1.

I have one idea that you can try now. Can you try this simple fix for me? Your file /common/app/phenix-1.7-650/phenix/phenix/autosol/run_group_of_wizards.py has the following lines in it near the top:

class run_group_of_wizards(GeneralMethods):
  def __init__(self,workdir='',OutputDir='',top_output_dir='',
      run_command="sh ", condor=None,
      background=True,debug=False,nproc=None,verbose=False,wait_time=1.,
      wait_between_submit_time=0.1,ignore_errors_in_subprocess=False,
      CallingWizard=None,base_path=None,
      run_any_method=False,out=sys.stdout,quiet=True):

I want you to edit the number for wait_between_submit_time=0.1 above, to some much larger number like "10.". This will give the individual jobs time to get set up without interfering with each other.

Does this fix the problem? (If so, I will make that a parameter to AutoBuild.)

All the best,
Tom T
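
For readers who hit the same "File exists" error in their own scripts, the two failure modes described above (the check-then-create race and a file that becomes visible late on NFS) can be sketched roughly as follows. This is only an illustration in modern Python 3, not the code Phenix uses: the helper names ensure_dir and wait_for_file, the poll_interval argument and the default values are assumptions made for the example.

```python
import errno
import os
import time


def ensure_dir(path):
    """Create path, treating 'already exists' as success (hypothetical helper).

    Attempting the mkdir and handling EEXIST avoids the gap between
    'check that it is missing' and 'create it' that another process can hit.
    """
    try:
        os.mkdir(path)
    except OSError as e:
        # Another process created it first: fine, provided it is a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def wait_for_file(path, max_wait_time=10.0, poll_interval=0.1):
    """Poll until path is visible or max_wait_time seconds have passed.

    Returns True if the file appeared, False on timeout. Both the default
    values and the poll_interval argument are assumptions for this sketch.
    """
    deadline = time.monotonic() + max_wait_time
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

On Python 3, os.makedirs(path, exist_ok=True) gives similar behaviour to ensure_dir in a single call.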
On 28 Jan 2011, at 16:37, Thomas C. Terwilliger wrote:
> I want you to edit the number for wait_between_submit_time=0.1 above, to some much larger number like "10.". This will give the individual jobs time to get set up without interfering with each other.
>
> Does this fix the problem? (If so, I will make that a parameter to AutoBuild.)
My first test with wait_between_submit_time=10 has just completed successfully. I shall run some more tests, but it looks like this change fixes the problem.

Thanks for your help,
Chris
Hi Chris,

Oh good. Current overnight builds of phenix now have this as a parameter:

wait_between_submit_time = 1.0
  .help = " You can specify the length of time (seconds) to wait "
          "between each job that is submitted when running sub-processes."
          "This can be helpful on NFS-mounted systems when running "
          "with multiple processors to avoid file conflicts."
          "The symptom of too short a wait_between_submit_time is"
          "File exists:...."

Let me know of any other trouble with this!

All the best,
Tom
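
As a rough picture of what a wait between submissions does, the sketch below starts each job only after a pause, so every sub-process has time to create its directories before the next one begins. It is a minimal illustration in Python 3 under stated assumptions, not the run_group_of_wizards implementation: the function name run_staggered and the demo job commands are invented for the example, and only the parameter name wait_between_submit_time comes from the thread.

```python
import subprocess
import sys
import time


def run_staggered(commands, wait_between_submit_time=1.0):
    """Start each command in turn, sleeping between submissions.

    Returns the exit codes in submission order. The jobs still run
    concurrently; only their start times are spread out.
    """
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            # Give the previous job time to set up before launching the next.
            time.sleep(wait_between_submit_time)
        procs.append(subprocess.Popen(cmd))
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical stand-in jobs; a real run would launch the build scripts.
    jobs = [[sys.executable, "-c", "print('job %d')" % n] for n in range(3)]
    print(run_staggered(jobs, wait_between_submit_time=0.5))
```

A longer wait trades start-up time for safety: with N jobs the last one starts (N - 1) * wait_between_submit_time seconds after the first.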
Dear Tom,

I have downloaded the new phenix 1.7 and ran AutoSol. It produces resolve_1.mtz, resolve_2.mtz and exptl_fobs_phases_freeR_flags_1_with_hl_anom.mtz.

1. resolve_1.mtz shows good solvent boundaries, but the other two maps are not good at all. I thought the exptl_fobs file should contain modified phase information?

I traced the 3.0 Å map, and R/Rfree is 25/32%. I thought there may be some error in the model, perhaps a backwards trace or similar. Is a wrong trace possible at these R/Rfree values?

I tried AutoBuild several times to get other possible traces, giving my model as input with build in place set to both true and false, but AutoBuild never gave me a satisfactory result. So I tried the phenix.find_helix options without any model, which gave me a different trace, and I would like to proceed in that direction.

Is it possible to get a polyalanine model for the entire trace, instead of CA only, from either find_helix_strand or AutoBuild?

Thanks,
Ram
participants (3)
- Chris Richardson
- r n
- Thomas C. Terwilliger