Uploading error

googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,715,646
RAC: 3,361
Message 69000 - Posted: 4 Jan 2011, 0:49:01 UTC

What's up with this?

1/3/2011 7:44:15 PM rosetta@home [error] Error reported by file upload server: [mem_prub_run05_centroid_round03_A_subrun_007643_SAVE_ALL_OUT_IGNORE_THE_REST_22824_46_0_0] locked by file_upload_handler PID=-1
ID: 69000 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,084,721
RAC: 1,523
Message 69003 - Posted: 4 Jan 2011, 16:51:54 UTC

Never a dull moment... :-(

I've got a couple dozen finished jobs stuck at "upload pending" across several hosts as well.

But there are now 9 million jobs waiting to be processed in the queue and all servers show green... :-(

Ralf
ID: 69003 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 69005 - Posted: 4 Jan 2011, 18:03:56 UTC

The whole site went down for about 8 hours, presumably to fix the problems over the holiday, and everyone's now uploading all their work. It's taking up to an hour to validate my first uploads but that should improve. New work started coming down 2 hours ago.

Hopefully they've fixed the stats export too.
ID: 69005 · Rating: 0
banditwolf
Joined: 10 Jan 06
Posts: 28
Credit: 139,737
RAC: 0
Message 69006 - Posted: 4 Jan 2011, 18:46:09 UTC

I get this now:
1/4/2011 1:42:28 PM|rosetta@home|Message from server: Project is temporarily shut down for maintenance

Also, all server components are listed as 'Disabled'. Looks like it may be being worked on.
ID: 69006 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,084,721
RAC: 1,523
Message 69007 - Posted: 4 Jan 2011, 18:47:22 UTC

Well, now everything but the web server shows "disabled" (in green?), so hopefully this will all be fixed soon. It would be 7 days by now, which is quite a while even considering the holidays.
At least some form of acknowledgment on the project's web site would have been nice, to show that they are aware of the problems... :?

Ralf
ID: 69007 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 69008 - Posted: 4 Jan 2011, 19:07:33 UTC

Ditto. Looks like I spoke a bit too soon. A finished file uploaded but I couldn't report the task back for the moment. Hopefully not long now....
ID: 69008 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 69010 - Posted: 4 Jan 2011, 19:37:04 UTC

Looks like the stats have now been published.
Rosetta Moderator: Mod.Sense
ID: 69010 · Rating: 0
Plasmon_attack
Joined: 2 May 10
Posts: 13
Credit: 15,451,384
RAC: 0
Message 69013 - Posted: 4 Jan 2011, 22:29:39 UTC

I see the scheduler is back up and now I can upload finished work units, but it's still not sending out work. I have ~50 nodes waiting for work, several of them hyperthreaded. I joined yesterday. Is this much downtime typical? I usually have them fetch enough work for 2 days. Should I up this to 3, or 5, or 10? Since most of them were new to the project, they hadn't yet received a full two days of work units.
ID: 69013 · Rating: 0
Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 69014 - Posted: 4 Jan 2011, 22:57:17 UTC - in response to Message 69013.  
Last modified: 4 Jan 2011, 22:58:51 UTC

Is this much downtime typical?


Not really. At a guess there have only been about 5 outages lasting more than 12 hours in the last 6 months. While some of those took a while to get fixed, Rosetta is quite stable the majority of the time.
ID: 69014 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 69015 - Posted: 5 Jan 2011, 0:02:16 UTC

Between Xmas and New Year I'd say it was typical. In fact it was far worse last year with just about everyone out of Rosetta WUs for days. This time I'd guess most people are still running - just a bit of a hiccup for a few hours.

Today's credits are between 2 and 4 times my highest ever. Nothing lost that I can see.
ID: 69015 · Rating: 0
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 69016 - Posted: 5 Jan 2011, 4:50:54 UTC

Plasmon Attack said:

Is this much downtime typical?


Absolutely not, Rosetta is one of the most stable BOINC-based projects around. Unfortunately, the level of communications with the community is. Mod.Sense does his best but I think that when we hit one of these speed bumps he has as much trouble getting system status from the admins as we do.

I really think that if the "home page" is up then the system administrators should have a little two or three line post providing a little insight as to what the problem is.

If the project is completely down, then the same status could be posted on their Facebook page.

No one is expecting "real time" updates - keeping the community informed is not a burden, it is a common courtesy which should be extended to all members of the team.

ID: 69016 · Rating: 0
Ross Parlette
Joined: 10 Nov 05
Posts: 32
Credit: 2,165,044
RAC: 0
Message 69017 - Posted: 7 Jan 2011, 20:08:12 UTC

I have been experiencing some problems w/ uploading, starting in late December 2010. Even then, it had been taking multiple tries to u/l completed WU. My last successful u/l was on Jan 3 and I have two pending right now.

Has anyone else had problems u/l WU in this time frame?

Thanks.
ID: 69017 · Rating: 0
EW-3
Joined: 1 Sep 06
Posts: 27
Credit: 2,561,427
RAC: 0
Message 69023 - Posted: 7 Jan 2011, 23:19:30 UTC - in response to Message 69017.  

I have been experiencing some problems w/ uploading, starting in late December 2010. Even then, it had been taking multiple tries to u/l completed WU. My last successful u/l was on Jan 3 and I have two pending right now.

Has anyone else had problems u/l WU in this time frame?

Thanks.


yup, sitting on a boatload of finished units.



ID: 69023 · Rating: 0
gahudock
Joined: 17 Aug 06
Posts: 2
Credit: 622,685
RAC: 0
Message 69024 - Posted: 8 Jan 2011, 0:18:19 UTC

Is there anywhere we can go to get "official information" on what's going on? Or is this it (i.e. just a bunch of questions with no official response)?
ID: 69024 · Rating: 0
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 69025 - Posted: 8 Jan 2011, 0:27:13 UTC - in response to Message 69024.  

Is there anywhere we can go to get "official information" on what's going on? Or is this it (i.e. just a bunch of questions with no official response)?


When the project went down there was a temporary message put up on their webserver stating that the fileserver crashed and that they are working to restore it as soon as possible.

ID: 69025 · Rating: 0
Bill
Joined: 11 May 07
Posts: 1
Credit: 940,762
RAC: 0
Message 69033 - Posted: 8 Jan 2011, 7:33:32 UTC - in response to Message 69025.  

I too have been waiting for a while for uploads and additional work items - all servers (except the feeder) are working and active according to the status report.

Is someone working on this?

Bill H
----------------------------------------------------------------------------

08/01/2011 07:29:16 rosetta@home update requested by user
08/01/2011 07:29:21 rosetta@home Sending scheduler request: Requested by user.
08/01/2011 07:29:21 rosetta@home Reporting 2 completed tasks, requesting new tasks for CPU and GPU
08/01/2011 07:29:24 rosetta@home Scheduler request completed: got 0 new tasks
08/01/2011 07:29:24 rosetta@home Message from server: Server error: can't attach shared memory
ID: 69033 · Rating: 0
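For anyone watching a lot of hosts, the server's replies can be pulled out of a client log like the one above with a few lines of script. This is only a rough sketch in Python, assuming the log has been saved as plain text. The function name `server_messages` is made up for illustration and is not part of BOINC, and the timestamp and project layout before the marker is ignored because it differs between the logs quoted in this thread.

```python
import re

# Matches the "Message from server: ..." part of a client log line.
# Everything before the marker (timestamp, project name, separators)
# is deliberately ignored, since its layout varies between clients.
SERVER_MSG = re.compile(r"Message from server:\s*(.+)$")


def server_messages(log_text):
    """Return the text of every 'Message from server' line in log_text.

    Hypothetical helper for illustration only; not a BOINC API.
    """
    messages = []
    for line in log_text.splitlines():
        match = SERVER_MSG.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


if __name__ == "__main__":
    # Sample lines copied from posts in this thread.
    sample = (
        "08/01/2011 07:29:24 rosetta@home Scheduler request completed: got 0 new tasks\n"
        "08/01/2011 07:29:24 rosetta@home Message from server: "
        "Server error: can't attach shared memory\n"
        "1/4/2011 1:42:28 PM|rosetta@home|Message from server: "
        "Project is temporarily shut down for maintenance\n"
    )
    for msg in server_messages(sample):
        print(msg)
```

Run against the sample lines, it prints only the two server replies and skips the ordinary scheduler line.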
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,098
RAC: 0
Message 69034 - Posted: 8 Jan 2011, 7:43:16 UTC

From front page.

Jan 7, 2011
Well, our luck ran out. The SAN controller that has been causing so much trouble in the last few months finally tipped over in a rather destructive fashion, corrupting the binary tree on which the filesystem is based. We're trying to rebuild the thing but the sheer number of files in the filesystem (> 10M files) makes this process very, very slow. We're bringing the project up from a recent backup (12/09/10) but the backup wasn't a perfect replica of the environment, so we're having to scramble to get all the parts working together again. We only need a few more weeks and then our new, next-generation SAN will be ready to be put into place... I just thought the old one would last a few more weeks. I apologize for the hassle and appreciate your patience as we get things online again... KEL 01/07/11

Have a crunching good day!!
ID: 69034 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 8,784
Message 69053 - Posted: 9 Jan 2011, 0:32:27 UTC - in response to Message 69015.  

This time I'd guess most people are still running - just a bit of a hiccup for a few hours.

I really wish I hadn't written that...
ID: 69053 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org