SERVER PROBLEMS.

Message boards : Number crunching : SERVER PROBLEMS.

Sid Celery

Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 62487 - Posted: 27 Jul 2009, 2:10:59 UTC - in response to Message 62485.  

I had been trying all day to get some work units uploaded. Unfortunately none of them seem to be getting "Received" and credited. Has anyone else been seeing this problem along with all the other communication issues this weekend?

Uploading is an issue, but those that do manage to get returned are being credited quickly enough - no problem there. It's taken most of the day, but my completed WUs have all eventually found their way to the server. It's not the worst problem in the world unless you're close to deadline. They'll all go through eventually.

It's just those downloads that keep failing...

It doesn't look like it's much better this morning, all files but this one are O.K. which is a problem because it's the biggest - all 27 plus MB's.

Trouble is that this database is used by all 1.86 WUs, so when it fails none of the other downloaded parts count. Frustrating. I think I'm down to my last 12 hours work...
ID: 62487 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62488 - Posted: 27 Jul 2009, 2:25:55 UTC

Hi Sid.

// Trouble is that this database is used by all 1.86 WUs, so when it fails none of the other downloaded parts count. Frustrating. I think I'm down to my last 12 hours work...//

How true, you're doing better than me. I've got an hour left on one rig; the others are out.


ID: 62488 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62490 - Posted: 27 Jul 2009, 6:11:55 UTC

What a great weekend!

Getting this now just to top it all off.

Mon 27 Jul 2009 16:08:10 EST|rosetta@home|Sending scheduler request: To fetch work. Requesting 25276 seconds of work, reporting 0 completed tasks
Mon 27 Jul 2009 16:08:51 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks

ID: 62490 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 62491 - Posted: 27 Jul 2009, 8:51:56 UTC

P.P.L., you can always download more work and store it on your system; then you will never run out unless it's a long outage on the server.

Just change either your R@H user settings or the BOINC Manager network usage settings.
I store an extra 5 days of work on my system this way. I came close to running out but never actually ran out of work.
ID: 62491 · Rating: 0
Bill G
Joined: 28 Dec 07
Posts: 6
Credit: 11,405,707
RAC: 12,670
Message 62496 - Posted: 27 Jul 2009, 11:32:17 UTC

I keep 5 days worth on my systems, and they are bone dry.
ID: 62496 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 62507 - Posted: 27 Jul 2009, 15:53:11 UTC - in response to Message 62488.  

How true, you're doing better than me. I've got an hour left on one rig; the others are out.

Not any more... :(
ID: 62507 · Rating: 0
BAV
Joined: 18 Jul 09
Posts: 1
Credit: 12,461
RAC: 0
Message 62510 - Posted: 27 Jul 2009, 16:33:52 UTC

I have just started crunching Rosetta@home due to the poor performance of SETI@home recently. I have 5 machines with 12 processors between them waiting to crunch something!!

Oh well back to SETI I suppose.
ID: 62510 · Rating: 0
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 62513 - Posted: 27 Jul 2009, 19:01:56 UTC

I'm getting 1.87 tasks now, and they seem to be crunching.

The server is very slow at the moment. I expect it will catch up after a while and then things will be back to normal.
ID: 62513 · Rating: 0
david @ TPS

Joined: 26 Nov 06
Posts: 3
Credit: 881,762
RAC: 0
Message 62514 - Posted: 27 Jul 2009, 19:05:03 UTC

The mass exodus from Berzerkely has apparently overloaded most other projects.

I am attached to several, and am not able to get work on most of them. Fortunately, between them all, enough is flowing to keep most boxes with enough work.



ID: 62514 · Rating: 0
Gen_X_Accord
Joined: 5 Jun 06
Posts: 154
Credit: 279,018
RAC: 0
Message 62529 - Posted: 27 Jul 2009, 23:47:14 UTC

Whatever is going on, I'm out of work for Rosetta and Folding's gpu client. I'm running Folding's regular client just to have something to crunch. I'm almost ready to shut the computer off for an extended period of time for the first time in 3 years.
ID: 62529 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 62530 - Posted: 27 Jul 2009, 23:53:26 UTC - in response to Message 62513.  

I'm getting 1.87 tasks now, and they seem to be crunching.

Really? I just got my first WUs and the download failed, the same as before :(

'Server down' messages again:

28/07/2009 00:41:56 rosetta@home Reporting 9 completed tasks, requesting new tasks
28/07/2009 00:42:18 Project communication failed: attempting access to reference site
28/07/2009 00:42:19 Internet access OK - project servers may be temporarily down.
28/07/2009 00:42:21 rosetta@home Scheduler request failed: Couldn't connect to server
ID: 62530 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62532 - Posted: 28 Jul 2009, 1:18:02 UTC

There are still problems by the look of it.

I just had my quad D/L the exe and two tasks, but no DB zip for some reason, so the tasks errored straight away. So I have done a reset to see if that helps.

ID: 62532 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 62535 - Posted: 28 Jul 2009, 2:15:23 UTC - in response to Message 62532.  

I just had my quad D/L the exe and two tasks, but no DB zip for some reason so the tasks errored straight away.

Yes, well spotted. I wondered why it said "Download failed" when everything seemed to come down ok.
ID: 62535 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62544 - Posted: 28 Jul 2009, 6:52:45 UTC

Still getting nothing here.

Tue 28 Jul 2009 16:28:27 EST|rosetta@home|Sending scheduler request: Requested by user. Requesting 161758 seconds of work, reporting 0 completed tasks
Tue 28 Jul 2009 16:29:25 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks


ID: 62544 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 62564 - Posted: 28 Jul 2009, 19:04:32 UTC

All times are CET.

7/28/2009 9:01:23 PM|rosetta@home|Started download of lr8_1tul.out.zip
7/28/2009 9:01:29 PM||Project communication failed: attempting access to reference site
7/28/2009 9:01:29 PM|rosetta@home|Temporarily failed download of boinc_rb1_1tul.pdb: HTTP error
7/28/2009 9:01:29 PM|rosetta@home|Started download of boinc_rb1_1dhn.pdb
7/28/2009 9:01:30 PM||Internet access OK - project servers may be temporarily down.
7/28/2009 9:02:02 PM|rosetta@home|Finished download of boinc_rb1_1dhn.pdb
7/28/2009 9:02:02 PM|rosetta@home|Started download of lr8_1dhn.out.zip
ID: 62564 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62570 - Posted: 28 Jul 2009, 21:52:45 UTC

I see things haven't improved this morning, for me at least.

Wed 29 Jul 2009 07:45:06 EST|rosetta@home|Fetching scheduler list
Wed 29 Jul 2009 07:45:16 EST|rosetta@home|Master file download succeeded
Wed 29 Jul 2009 07:45:21 EST|rosetta@home|Sending scheduler request: To fetch work. Requesting 26435 seconds of work, reporting 0 completed tasks
Wed 29 Jul 2009 07:45:31 EST|rosetta@home|Scheduler request succeeded: got 0 new tasks
Wed 29 Jul 2009 07:47:30 EST||Project communication failed: attempting access to reference site
Wed 29 Jul 2009 07:47:30 EST|rosetta@home|Temporarily failed upload of 1qlx_NNMAKE_CONSTRAINT_BOINC_ABRELAX_SAVE_ALL_OUT_14240_677_2_0: HTTP error
Wed 29 Jul 2009 07:47:30 EST|rosetta@home|Backing off 1 min 0 sec on upload of 1qlx_NNMAKE_CONSTRAINT_BOINC_ABRELAX_SAVE_ALL_OUT_14240_677_2_0
Wed 29 Jul 2009 07:47:31 EST||Internet access OK - project servers may be temporarily down.



ID: 62570 · Rating: 0
Stan Wells

Joined: 24 Jan 06
Posts: 11
Credit: 1,723,723
RAC: 0
Message 62572 - Posted: 28 Jul 2009, 22:22:46 UTC

I am seeing the same problems. One of my machines was totally empty - I suspended all other projects (for the fifth time). This time I had extra time to stick around and after about four hours it picked up a few. My other machine has one uploading (not successful after two tries) and has one which it finally downloaded this morning. I have my prefs set for 6 days of work and it gave me one work unit which it states has a 4 hour time to completion. On this machine with 200 to this and 100 to just one other project - I would expect it to be finished in 6 hours - max.


ID: 62572 · Rating: 0
Warped

Joined: 15 Jan 06
Posts: 48
Credit: 1,788,185
RAC: 0
Message 62592 - Posted: 29 Jul 2009, 11:05:50 UTC - in response to Message 62572.  

I am seeing the same problems. One of my machines was totally empty - I suspended all other projects (for the fifth time). This time I had extra time to stick around and after about four hours it picked up a few. My other machine has one uploading (not successful after two tries) and has one which it finally downloaded this morning. I have my prefs set for 6 days of work and it gave me one work unit which it states has a 4 hour time to completion. On this machine with 200 to this and 100 to just one other project - I would expect it to be finished in 6 hours - max.



Try increasing your target CPU run time in your preferences. This reduces the amount of server contact required. I have mine set to 16 hours and I seem to be able to get work. Maybe there are more of the longer workunits available.
Warped
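
To put rough numbers on the "less server contact" point, here is a back-of-the-envelope sketch. It assumes every task runs for exactly the target CPU run time and that all cores are busy around the clock; the 4-core machine and the 3-hour comparison value are illustrative assumptions, not figures from this thread.

```python
def tasks_per_day(cores: int, target_runtime_hours: float) -> float:
    """Estimate how many tasks a machine finishes per day.

    Assumes each task runs for exactly the target run time and every
    core is busy 24 hours a day (a simplification).
    """
    return cores * 24 / target_runtime_hours


# Hypothetical 4-core machine; 3 hours is an illustrative short target.
short_target = tasks_per_day(4, 3)    # 32.0 tasks per day
long_target = tasks_per_day(4, 16)    # 6.0 tasks per day

print(f"3 h target:  {short_target:.0f} tasks/day")
print(f"16 h target: {long_target:.0f} tasks/day")
```

Each task needs its own downloads and its own upload, so fewer tasks per day should mean fewer transfers that can fail while the server is struggling.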

ID: 62592 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 62595 - Posted: 29 Jul 2009, 14:04:09 UTC - in response to Message 62592.  

Maybe there are more of the longer workunits available.


...that's not it. The workunit itself is identical regardless of your target runtime. The work unit itself enforces the runtime for you. It does this by starting a new model on the same protein until the runtime target is estimated to be exceeded by doing another model.

Rosetta Moderator: Mod.Sense
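
The runtime rule described above can be sketched as a small simulation. This is only an illustration of the idea, not the actual Rosetta code: the function name, the list of per-model durations, and the choice to estimate the next model's cost as the average of the models completed so far are all assumptions.

```python
def run_work_unit(target_runtime_s, model_times_s):
    """Simulate one work unit under a target runtime.

    model_times_s is a hypothetical list of how long each successive
    model would take, in seconds. The first model always runs. After
    that, a new model is started only if the elapsed time plus the
    estimated cost of one more model stays within the target.
    """
    elapsed = 0.0
    completed = 0
    for model_time in model_times_s:
        if completed > 0:
            # Assumption: estimate the next model from the average so far.
            estimate = elapsed / completed
            if elapsed + estimate > target_runtime_s:
                break
        elapsed += model_time
        completed += 1
    return completed, elapsed


# Same work unit (models of about 1000 s each), two different targets.
print(run_work_unit(1 * 3600, [1000] * 100))   # 1-hour target
print(run_work_unit(16 * 3600, [1000] * 100))  # 16-hour target
```

The input is the same in both calls; only the number of models completed before the task stops changes, which is why a longer target does not require a different kind of workunit.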
ID: 62595 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 62605 - Posted: 29 Jul 2009, 23:47:15 UTC

Why is it that when the server goes "down" the status on the home page does not change? Is it an internal issue with the server that is not bad enough to trigger a status change, or does the server have to be completely "offline"?

It's confusing to see in the messages that the server is down while the server status page is all green.

Any ideas?
ID: 62605 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org