Message boards : Number crunching : loss of credit post crash
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I had 34 work units report post-crash. Of these, 27 came up as client errors and only 7 returned with an OK status. What's up with this? |
The_Bad_Penguin Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Don't cry too hard, lol, I "lost" ~40 WUs. But I did see that 2 or 3 of them had "compute errors". Trying to overtake me in Rosie credits again, Belgian, lol?! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Not picking on you here Greg... but a cursory review of your failed tasks shows the following message: <error_message>user requested transfer abort</error_message> ...if you abort the upload of your results (as appears to have occurred), your results cannot be useful to the project. Rosetta Moderator: Mod.Sense |
agge Joined: 14 Nov 06 Posts: 63 Credit: 432,341 RAC: 0 |
I doubt this is related, but yesterday, on one computer, I got 'compute error' on all of the WU for all projects (seti, einstein & wcg) except rosetta. It seems to be fine now after I reset the projects and restarted the computer. Any idea what this was about? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quoting Mod.Sense: "Not picking on you here Greg... but a cursory review of your failed tasks shows the following message:" Doh! So perhaps I should have just let them keep trying to communicate with the project last night during the communications troubles. I thought I was just stopping them from trying to communicate, not doing a total abort. So how do I go about just making them pause if they are already queued in the transfer section? Suspend network activity, or what? |
Beezlebub Joined: 18 Oct 05 Posts: 40 Credit: 260,375 RAC: 0 |
Quoting agge: "I doubt this is related, but yesterday, on one computer, I got 'compute error' on all of the WU for all projects (seti, einstein & wcg) except Rosetta. It seems to be fine now after I reset the projects and restarted the computer. Any idea what this was about?" A graphics glitch on one of my computers will crash any WU running at the time with a "client error" message: Rosetta, CPDN, anything with graphics. When I track down the problem I'll post back (might be a while, though). e6600 quad @ 2.5 GHz: 2418 floating point, 5227 integer; e6750 dual @ 3.71 GHz: 3598 floating point, 7918 integer |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Quoting Greg_BE: "doh! so perhaps i should have just let them continue and try and communicate with the project last night during the communications troubles. I thought I was just stopping them from trying to communicate and not do a total abort." I did the same with one of mine... then I remembered it deletes it ;-) I was half asleep. Yes, suspend network activity. I may open a Trac ticket at BOINC asking for 'suspend' to be added to individual uploads, as suspending network activity suspends ALL network activity. Team mauisun.org |
(_KoDAk_) Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
104537423 94863610 10 Sep 2007 4:39:11 UTC 11 Sep 2007 5:52:37 UTC Over Success Done 6,936.59 42.07 20.00; 104510027 94838135 10 Sep 2007 2:41:23 UTC 10 Sep 2007 14:56:26 UTC Over Success Done 42,022.69 254.87 20.00; 104510025 94838133 10 Sep 2007 2:41:23 UTC 10 Sep 2007 15:52:50 UTC Over Success Done 27,928.73 169.39 20.00; 104510023 94838131 10 Sep 2007 2:41:23 UTC 10 Sep 2007 23:58:21 UTC Over Success Done 15,540.84 94.26 20.00. 20.00??? What is it? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
KoDAk, your work units were ended prematurely by the watchdog. You are having the same issue described by several others in the "Problems with..." thread where the Rosetta score is stuck for 900 seconds. So the 20 credits is basically a thank you for trying to crunch the task. These were probably issued by the nightly run to award credit for failed tasks. The project is working both on preserving any useful work done on the task (which is probably why it didn't show as a failure in the list), and on resolving the problem with some of the CAPRI tasks that causes many of them to end in this way. Rosetta Moderator: Mod.Sense |
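As a rough illustration of the watchdog behaviour described above, here is a minimal Python sketch. It is not the Rosetta application's code: the function name `find_watchdog_stop` and the `(elapsed_seconds, score)` sample format are hypothetical, and only the 900-second limit comes from the post.

```python
from typing import Iterable, Optional, Tuple

STALL_LIMIT_SECONDS = 900.0  # the stall limit quoted in the post above


def find_watchdog_stop(
    samples: Iterable[Tuple[float, float]],
    stall_limit: float = STALL_LIMIT_SECONDS,
) -> Optional[float]:
    """Return the elapsed time at which a watchdog would end the task.

    `samples` is a hypothetical, time-ordered sequence of
    (elapsed_seconds, score) readings. Returns None if the score never
    stays unchanged for `stall_limit` seconds.
    """
    last_score: Optional[float] = None
    last_change = 0.0
    for elapsed, score in samples:
        if last_score is None or score != last_score:
            # The score moved, so restart the stall clock.
            last_score = score
            last_change = elapsed
        elif elapsed - last_change >= stall_limit:
            # The score has been stuck for the whole stall window.
            return elapsed
    return None


stuck = [(0, -10.0), (300, -12.5), (600, -12.5), (900, -12.5), (1200, -12.5)]
healthy = [(0, -10.0), (600, -11.0), (1200, -12.0)]
print(find_watchdog_stop(stuck))    # score last changed at 300 s
print(find_watchdog_stop(healthy))  # score keeps moving
```

In this sketch a task whose score last changed at 300 seconds is stopped at the first reading taken 900 or more seconds later, while a task whose score keeps moving is never stopped.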
No longer involved Joined: 19 Mar 06 Posts: 22 Credit: 327,220 RAC: 0 |
The credit I have been getting since the crash shows me going backwards by the hour. This is a really good system. The more work we do now the less we get credit for doing. Guess it is time to move on. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Quoting No longer involved: "The credit I have been getting since the crash shows me going backwards by the hour. This is a really good system. The more work we do now the less we get credit for doing. Guess it is time to move on." Phinehas, please describe how you are seeing fewer credits issued for completed work than you were prior to the system outage, because the credit system is the same as it has always been. Are you looking at RAC? Or credit for specific tasks? Rosetta Moderator: Mod.Sense |
Zxian Joined: 17 May 07 Posts: 18 Credit: 1,173,075 RAC: 0 |
Since the system outage, I'm getting far, far more WUs with the 20-credit "thank you" than before. I actually think that I never saw this before the outage. I've tried to "fix" this by making my machines run for only 3 hours per WU, but this isn't an ideal solution. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Quoting Zxian: "Since the system outage, I'm getting far, far more WUs with the 20-credit 'thank you' than before. I actually think that I never saw this before the outage. I've tried to 'fix' this by making my machines run for only 3 hours per WU, but this isn't an ideal solution." This seems to be due to the new type of tasks that are presently being sent out. You will note they have "CAPRI" in the name. I'm sure Rhiju is working hard on resolving these issues. They are working to predict structures for a CAPRI challenge. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Valid results returned past the deadline have been granted the claimed credit. The maximum value possible is 300 so if you claimed over 300 you get 300. |
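The rule in the post above amounts to clamping the claimed credit at 300. A minimal Python sketch of that arithmetic follows; the function name `granted_credit` is hypothetical and this is not the project's actual script.

```python
MAX_GRANTED_CREDIT = 300.0  # the cap stated in the post above


def granted_credit(claimed: float, cap: float = MAX_GRANTED_CREDIT) -> float:
    """Grant the claimed credit, but never more than the cap."""
    return min(claimed, cap)


# A claim under the cap is granted in full; a claim over it is cut to 300.
print(granted_credit(254.87))
print(granted_credit(412.0))
```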
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
David, I don't believe anyone in this thread has tasks where they claimed that much credit. I think the issue is the CAPRI tasks that are ended by watchdog due to Rosetta score not moving for 900 seconds. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Oh, I posted to the wrong thread. The watchdog errors suggest that there may be an issue with the application or the specific work units for CAPRI. I'll alert Rhiju and the others involved in CAPRI. Since the CAPRI experiment/competition is time sensitive, they may not be able to address the issue soon. |
No longer involved Joined: 19 Mar 06 Posts: 22 Credit: 327,220 RAC: 0 |
Quoting No longer involved: "The credit I have been getting since the crash shows me going backwards by the hour. This is a really good system. The more work we do now the less we get credit for doing. Guess it is time to move on." I am looking at Average Work Done, which is now down to 1286 and dropping fast. This seems to relate to the increasing delays from the server. It is now putting out 'communication deferred' times running into hours each day. The ranking of computers has dropped me from around 6 or 7 to somewhere around 39 now. Why do we bother with these kinds of stats when the host site determines the outcomes? I have watched hours and hours of work units sitting here, unable to be returned because the server was delaying communications. |
No longer involved Joined: 19 Mar 06 Posts: 22 Credit: 327,220 RAC: 0 |
Quoting No longer involved: "The credit I have been getting since the crash shows me going backwards by the hour. This is a really good system. The more work we do now the less we get credit for doing. Guess it is time to move on." I have been trying to respond to your request but the server does not take the update. This message shows what the server is doing to running jobs. The message boards say the server is up and running, yet I keep getting this type of message, sometimes with multiple hours of delay. That delay turns into reduced results and standings in the Teams and Computer ratings. Tue 18 Sep 22:26:26 2007|rosetta@home|Message from server: Project is temporarily shut down for maintenance Tue 18 Sep 22:26:26 2007|rosetta@home|Deferring communication for 1 hr 0 min 0 sec Tue 18 Sep 22:26:26 2007|rosetta@home|Reason: project is down |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Look instead at the total work completed credits. The average is based on something like the past 2 weeks, so you would expect it to drop and 'stay dropped' until the outage timeframe begins to fall outside that two-week window. Daily credits for me still haven't quite recovered to the pre-crash levels -- today's hiccups didn't help with that of course, nor did the release into the wild of some 'bad boy' work units which CPUs would chew on but not yield credit. Take a look at the message board topic regarding the 5.80 application and look through it -- work units with 'Capri' in the title have been mentioned as work units you want to abort. |
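To see why an average like this drops during an outage and then 'stays dropped' for a while, here is a small Python sketch of a decaying daily average. It is a simplified model, not BOINC's actual update code: the one-week half-life, the once-per-day update, and the function names `update_average` and `simulate` are all assumptions made for illustration.

```python
from typing import List

HALF_LIFE_DAYS = 7.0  # assumed half-life; treat as illustrative only


def update_average(average: float, credit_today: float,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Decay the old average by one day, then blend in today's credit."""
    weight = 0.5 ** (1.0 / half_life_days)
    return average * weight + (1.0 - weight) * credit_today


def simulate(start_average: float, daily_credits: List[float]) -> List[float]:
    """Return the average after each day in `daily_credits`."""
    history = []
    average = start_average
    for credit in daily_credits:
        average = update_average(average, credit)
        history.append(average)
    return history


# A host steady at 1000 credits/day, then a 7-day outage with no credit,
# followed by 14 days back at the old 1000 credits/day rate.
history = simulate(1000.0, [0.0] * 7 + [1000.0] * 14)
print(round(history[6], 1))   # average at the end of the outage
print(round(history[-1], 1))  # average two weeks after the outage
```

Under these assumptions the average halves over the week-long outage and is still well below its old level two weeks after normal work resumes, which matches the pattern described in the post.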
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Quoting BarryAZ: "Look instead at the total work completed credits. The average is based on something like the past 2 weeks, so you would expect it to drop and 'stay dropped' until the outage timeframe begins to fall outside that two week window. Daily credits for me still haven't quite recovered to the pre-crash levels -- todays hiccups didn't help with that of course, nor did the release into the wild of some 'bad boy' work units which CPU's would chew on but not yield credit. Take a look at the message board topic regarding the 5.80 application and look thru it -- work units with 'Capri' in the title have been mentioned as work units you want to abort." Hi everybody: Thanks for your posts and for your patience over the last week. Quite a few things have been crazy. We have been testing all our workunits on the RALPH test server and they went through fine -- so your feedback over here at Rosetta@home has been critical to identifying and (in some cases) fixing new problems. The issue with the CAPRI workunits appears to be the large number of generated models and the size of the output files; this was hammering our already frazzled fileservers. We are no longer sending out those jobs -- if we do, we'll fix this issue first. We're very sorry for this problem; it was totally unanticipated. There was also a separate issue with some workunits sent out before the crash not being accepted as valid; we had a problem with the database, and I think DK has fixed this. Then of course there was the massive outage; as BarryAZ has explained, this is causing some craziness with the credits that should hopefully be gone in a week or so. If you can, bear with us here. The results we're getting back are exciting on a number of scientific fronts. The CAPRI data on predicting protein-protein interactions is very interesting and we're analyzing it now. The work with NMR-constrained protein structural inference has the potential to revolutionize how structures are solved. And there's more exciting stuff coming soon -- we'll try to be as careful as possible! |
©2025 University of Washington
https://www.bakerlab.org