Message boards : Number crunching : What's been up with R@H?
Author | Message |
---|---|
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,821,902 RAC: 15,180 |
Any news? Seems it was offline for the last 17hrs or so... |
rochester new york Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Quoting dcdc: "Any news? Seems it was offline for the last 17hrs or so..." I thought they might have gone out of business... At least it looks like it's running. |
retheridge Joined: 13 Aug 08 Posts: 1 Credit: 362,493 RAC: 0 |
Quoting dcdc: "Any news? Seems it was offline for the last 17hrs or so..." I don't have the answer, but I can confirm I've had the same problem. My results have all since uploaded, but I'm still waiting to download workunits -- my status message says "Communication deferred xx:xx:xx", and it repeats once the countdown completes. I suppose there are a lot of computers waiting in line, but it seems like a long time to have computers sitting idle when we could be processing the next amazing breakthrough. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I can confirm the project was down (by my own observation), and presume that the scheduler is very busy right now. My hosts have been requesting work, but sometimes getting responses back from the scheduler with no new work, just confirming completed results. It will take a couple of hours for things to get back to normal. Work is now flowing, but perhaps somewhat sporadically. Rosetta Moderator: Mod.Sense |
Keith E. Laidig Volunteer moderator Project developer Joined: 1 Jul 05 Posts: 154 Credit: 117,189,961 RAC: 0 |
Quoting Mod.Sense: "I can confirm the project was down (by my own observation), and presume that the scheduler is very busy right now. My hosts have been requesting work, but sometimes getting responses back from the scheduler with no new work, just confirming completed results. It will take a couple of hours for things to get back to normal. Work is now flowing, but perhaps somewhat sporadically." The back-end fileserver experienced a kernel panic while updating the filesystem journal. There is a tremendous amount of I/O on this old machine, and sometimes it doesn't keep up well. We're implementing a new back-end filesystem for R@H - along the lines of the SAN that was attempted last year - and plan to move R@H over in the next few weeks. The amount of data that exists, and the rapidity with which your clients update it, makes the transfer of data very challenging indeed... |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Quoting Mod.Sense: "I can confirm the project was down (by my own observation), and presume that the scheduler is very busy right now. My hosts have been requesting work, but sometimes getting responses back from the scheduler with no new work, just confirming completed results. It will take a couple of hours for things to get back to normal. Work is now flowing, but perhaps somewhat sporadically." Ah -- OK -- thanks for the detail -- I was beginning to think there was some sort of gentlemen's understanding regarding news updates here. Glad to hear an explanation. Note, this is the sort of thing which might have a home on the home page. |
rochester new york Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
Quoting BarryAZ: "Ah -- OK -- thanks for the detail -- I was beginning to think there was some sort of gentlemen's understanding regarding news updates here. Glad to hear an explanation. Note, this is the sort of thing which might have a home on the home page." Maybe we could get some kind of email alert to let us know Rosetta is down but not out... New people might not hang around long enough if they don't think the system is ever going to come back up, and we need new crunchers. |
fjpod Joined: 9 Nov 07 Posts: 17 Credit: 2,201,029 RAC: 0 |
I run R@H on about 10 computers. All but one are working fine. The problem one wasn't reporting finished WUs. Communications were automatically deferred for 24 hours. Been going on for about 2 days, so after trying one last update, I reset the project which didn't work either. I finally deleted the project and went to re-attach and it's telling me R@H is temporarily unavailable. Why on only one of my computers? All the others are receiving and reporting work. |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
Quoting fjpod: "I run R@H on about 10 computers. All but one are working fine. The problem one wasn't reporting finished WUs. Communications were automatically deferred for 24 hours. Been going on for about 2 days, so after trying one last update, I reset the project which didn't work either. I finally deleted the project and went to re-attach and it's telling me R@H is temporarily unavailable. Why on only one of my computers? All the others are receiving and reporting work." My PCs are working quite fine... |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Quoting fjpod: "I run R@H on about 10 computers. All but one are working fine. The problem one wasn't reporting finished WUs. Communications were automatically deferred for 24 hours. Been going on for about 2 days, so after trying one last update, I reset the project which didn't work either. I finally deleted the project and went to re-attach and it's telling me R@H is temporarily unavailable. Why on only one of my computers? All the others are receiving and reporting work." Hello fjpod, after reading your post, my first thought is that some security software (a firewall?) is preventing BOINC from accessing the Internet. Are you able to run another project on that one computer? Good luck, Path7. |
R.L. Casey Joined: 7 Jun 06 Posts: 91 Credit: 2,728,885 RAC: 0 |
Quoting fjpod: "I run R@H on about 10 computers. All but one are working fine. The problem one wasn't reporting finished WUs. Communications were automatically deferred for 24 hours. Been going on for about 2 days, so after trying one last update, I reset the project which didn't work either. I finally deleted the project and went to re-attach and it's telling me R@H is temporarily unavailable. Why on only one of my computers? All the others are receiving and reporting work." fjpod, which one of your computers is having the problem? Thanks! |
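
One way to check Path7's firewall theory on the affected machine is a minimal sketch like the one below. It assumes Python 3 is available there; `can_connect` is a hypothetical helper name, and the host and port suggested afterwards are assumptions taken from the site URL in the page footer, since the thread does not name the scheduler's address.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_connect("www.bakerlab.org", 443)` on a working machine and on the problem one gives a rough comparison: True on one and False on the other would point toward local security software or network settings rather than the project servers. A True result only shows that the port is reachable from that machine; a per-application firewall rule could still be blocking BOINC itself.
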
©2024 University of Washington
https://www.bakerlab.org