Message boards : Number crunching : Max # total results question
Author | Message |
---|---|
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Hi, I carefully read two good threads on what causes a result to be granted zero credit, even if it was returned on time and successfully. The threads were: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3217 and https://boinc.bakerlab.org/rosetta/forum_thread.php?id=2851 The workunit in question is: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=83885601 In this workunit, it appears that two results sent previously did not report back before the deadline. The server then sent my computer a result from this workunit to crunch. My computer reported back on time, but the previous computer reported back before me (though after its deadline). As a consequence, my computer received zero credit for the workunit. I'm really not going to complain about a few credits, but I would like to help the situation. I believe the scheduler (logic) on the server is doing exactly what it should by re-issuing a workunit when there is no reply after the deadline. In the two threads above, someone introduced the idea of having a "grace period" after the deadline but before the workunit is resent. Has there been any further discussion or action on this topic? Also, just to ask the simple question, was I granted no credit because of: 1) a workunit error (see link), even though Outcome = success, 2) someone else returning results before me, or 3) something else? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The first issuance of the WU was never sent back. The second was returned after the deadline had been reached, although perhaps the deadlines were extended a little because of the server outage during the upgrade to the SAN file server. The third was issued to you just after the 10-day expiration date was actually reached (i.e. before the second host had returned any result). So when the second host later did return its result, that was the first report of a successful completion. You returned your result afterwards and were the second successful completion. The settings are configured to accept only one successful completion, so the actual reason you were not granted credit was that the WU had already received one successful result, which is the maximum. The mistake was in issuing the task to you in the first place, when it was still possible for a completion report to be accepted. I believe there is a BOINC issue open to get this fixed in the server scheduling programs. (Someone please post a link if they find it.) Thank you for your understanding and constructive approach to the problem. Rosetta Moderator: Mod.Sense |
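To make the sequence above concrete, here is a toy model of the "only one successful completion" rule. It is a hypothetical sketch, not BOINC's actual transitioner or validator code: the class name, method name, and host labels are invented for illustration, and only the cap of one successful result comes from the post.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """Hypothetical toy model of one workunit's credit rule, not BOINC code."""
    max_success_results: int = 1
    credited_hosts: list = field(default_factory=list)

    def report_success(self, host: str) -> bool:
        """Record a successful report; return True if it earns credit.

        Lateness is not checked here: per the thread, the late result was
        still accepted because it was the first success to be reported.
        """
        if len(self.credited_hosts) >= self.max_success_results:
            return False  # success cap already reached: zero credit
        self.credited_hosts.append(host)
        return True


# Copy 1 never reports. Copy 2 reports after its deadline but before copy 3.
wu = WorkUnit(max_success_results=1)
late_copy_credited = wu.report_success("host of copy 2 (late)")
on_time_copy_credited = wu.report_success("host of copy 3 (on time)")
print(late_copy_credited, on_time_copy_credited)  # True False
```

The order of the reports, not who met their deadline, decides who gets credit in this model.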
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Thank you for the clarification. Occasionally, one of my RAH workunits will crash, so I've been keeping an eye on the project and posting on the sticky threads. Other projects don't have any problem. I just wanted to be sure my computer crunched that WU properly. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Here is a link to the issue I opened on the BOINC Trac system. Add this signature to your email: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Quote: "The mistake was in issuing the task to you in the first place, when it was still possible for a completion report to be accepted. I believe there is a BOINC issue open to get this fixed in the server scheduling programs. (Someone please post a link if they find it.)" I still contend that the mistake is setting "max # of success results" to just 1, when it is a known fact that work can be reissued as soon as a deadline has passed. IMHO this should be changed to 2 (and "max # of total results" should probably be changed to 3 as well). |
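The difference between a success cap of 1 and the proposed cap of 2 can be sketched by replaying the same sequence against both settings. This is a simplified, hypothetical model: the two caps are named after the "max # of ..." fields mentioned in this thread, a total cap of 3 is assumed for both runs, and the real BOINC server logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Hypothetical model; caps named after the 'max # of ...' fields."""
    max_total_results: int
    max_success_results: int
    issued: int = 0
    successes: int = 0

    def issue(self) -> bool:
        """Send out another copy unless the total-results cap is hit."""
        if self.issued >= self.max_total_results:
            return False
        self.issued += 1
        return True

    def report_success(self) -> bool:
        """Return True if this successful report earns credit."""
        if self.successes >= self.max_success_results:
            return False
        self.successes += 1
        return True


def replay(wu: WorkUnit):
    # Three copies issued: #1 never returns, #2 returns late, #3 on time.
    sent = [wu.issue() for _ in range(3)]
    credits = [wu.report_success(), wu.report_success()]
    return sent, credits


cap_one = replay(WorkUnit(max_total_results=3, max_success_results=1))
cap_two = replay(WorkUnit(max_total_results=3, max_success_results=2))
print(cap_one)  # ([True, True, True], [True, False])
print(cap_two)  # ([True, True, True], [True, True])
```

With a cap of 2, both hosts get credit, but the same starting points are crunched twice, which is the cost discussed in the next reply.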
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The reason for setting maximum successes to 1 is that the project would prefer that another WU be crunched instead of crunching the same one twice. You see, each WU generated on the server has a random number seed embedded within it, which is used to generate a unique series of starting points. This large number of unique starting points is what we are collectively exploring on our machines. So if a WU is completed by more than one machine, those machines have duplicated each other's efforts. It would therefore be preferable NOT to reissue the WU until results can no longer be accepted. Rosetta Moderator: Mod.Sense |
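Why a second copy adds nothing can be illustrated with any seeded pseudo-random generator: the same seed always reproduces the same sequence. The sketch below uses Python's `random.Random` as a stand-in; `starting_points` and its two-angle points are hypothetical, and Rosetta's real procedure for turning the seed into starting structures is not described in this thread.

```python
import random


def starting_points(seed: int, n: int = 3) -> list[tuple[float, float]]:
    """Hypothetical stand-in: derive n starting points from a WU's seed.

    Each point is just a pair of random angles in degrees; Rosetta's real
    sampling is far more involved.
    """
    rng = random.Random(seed)  # private generator seeded per WU
    return [(rng.uniform(-180, 180), rng.uniform(-180, 180)) for _ in range(n)]


wu_a = starting_points(seed=12345)
wu_a_again = starting_points(seed=12345)  # same WU crunched a second time
wu_b = starting_points(seed=67890)        # a different WU

print(wu_a == wu_a_again)  # True: the second run repeats the first exactly
print(wu_a == wu_b)        # False: a new seed explores new points
```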
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
A grace period before the transitioner (or scheduler?) re-issues the work makes much more sense, IMHO. I agree that it's better not to recrunch, because the researchers are treating each workunit as one sample in their study. In my case, a 48-hour delay before re-issuing the work would have been long enough for the late result to be returned. If the research/technical team can accept workunits possibly taking an extra 48 hours per late machine to finish, then it would solve the problem. Worst case, all the deadlines could be moved up 48 hours to account for the grace period, in the event someone doesn't return the result on time. What would be REALLY cool is if the Rosetta application *knew* the deadline. Then it would just "finish" once the deadline has passed. This would be similar to the preferred run-length parameter, but with an overriding parameter so it doesn't go past the deadline. |
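Both suggestions in this post can be expressed as small policy functions. These are hypothetical sketches of the ideas, not existing BOINC or Rosetta settings: `GRACE_PERIOD`, `should_reissue`, `run_time_budget`, the one-hour safety margin, and the example deadline are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical policy values, not real BOINC/Rosetta settings.
GRACE_PERIOD = timedelta(hours=48)


def should_reissue(deadline: datetime, now: datetime, reported: bool) -> bool:
    """Resend a copy only once the deadline plus the grace period has passed."""
    return not reported and now >= deadline + GRACE_PERIOD


def run_time_budget(preferred: timedelta, now: datetime, deadline: datetime,
                    safety_margin: timedelta = timedelta(hours=1)) -> timedelta:
    """Cap the preferred run length so the task finishes before its deadline."""
    remaining = deadline - now - safety_margin
    return max(timedelta(0), min(preferred, remaining))


deadline = datetime(2024, 1, 10, 12, 0)  # arbitrary example deadline
print(should_reissue(deadline, deadline + timedelta(hours=24), reported=False))  # False
print(should_reissue(deadline, deadline + timedelta(hours=49), reported=False))  # True
print(run_time_budget(timedelta(hours=8), deadline - timedelta(hours=3), deadline))  # 2:00:00
```

The grace period trades latency for fewer duplicates: each missed deadline delays the reissue by the full 48 hours.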
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
|
©2025 University of Washington
https://www.bakerlab.org