Message boards : Number crunching : Major problems with granted credit
Author | Message |
---|---|
Martin P. Joined: 26 May 06 Posts: 38 Credit: 168,333 RAC: 0 |
I am experiencing major problems with granted credit. On 2 occasions the claimed credit was 48.9 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217748955) and 90.4 (https://boinc.bakerlab.org/rosetta/result.php?resultid=217572433), while the granted credit was only 2 and 8. Run times are in line with previous, correctly credited work units. What is going on? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine? Didn't tasks with Zinc get a low credit rating? But I wouldn't think as low as 8 credits on a high end processor with no errors. On the other hand..if you break his run time out into hours, he had a 2.79 hour run and perhaps for that time frame the credit is correct. Wonder if it would have been better at 4-6 hrs run time? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"I thought that tasks ending in _0 were original to the specific user and the only other test would have been on RALPH. So how can another RAH user have the same task with that exact random start point assigned to them, unless it errored out on another machine?" Yes, this is true, but many machines will work on models for the same protein and same solution method, each with different starting points. So these are what is averaged together. Basically, look at the WU names and the batch number that precedes the _0. Many thousands of tasks will be generated with the same name and batch. Each has a unique random seed and will produce unique models. So, for example, the task first mentioned is cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0 where 5845 is the batch number, 79 is the task within that batch, and 0 is the replication level so far. But if you had a list of all the tasks for batch 5845, you would see some for proteins other than 1RY6A... or is it t313? I don't have a perfect understanding of the names either :) It wouldn't make sense to average runtimes from protein "A" with runtimes from protein "B". So, within a batch, the specific protein being studied is the basis of what gets averaged as results come in. Rosetta Moderator: Mod.Sense |
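As a rough illustration of the naming scheme described above, the last three underscore-separated fields of a WU name can be split off as batch, task and replication level. This is only a sketch based on the moderator's description: `parse_task_name` and `TaskName` are hypothetical names invented for the example, and the meaning of the earlier fields (protein, method and so on) is deliberately left unparsed because the thread itself is unsure about them.

```python
from typing import NamedTuple


class TaskName(NamedTuple):
    """Hypothetical container for the fields named in the post above."""
    description: str   # everything before the batch number, left unparsed
    batch: int         # batch number
    task: int          # task within that batch
    replication: int   # replication level so far


def parse_task_name(name: str) -> TaskName:
    # Split from the right so underscores inside the description are kept.
    # A name with fewer than four fields, or non-numeric trailing fields,
    # raises ValueError.
    description, batch, task, replication = name.rsplit("_", 3)
    return TaskName(description, int(batch), int(task), int(replication))


example = parse_task_name(
    "cc2_1_8_native_cen_cst_hb_t313__IGNORE_THE_REST_1RY6A_3_5845_79_0"
)
print(example.batch, example.task, example.replication)
```

Grouping results by `batch` plus the protein part of `description` would correspond to the averaging described in the post, but how the server actually identifies the protein is not stated in this thread.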
sarha1 Joined: 23 Sep 05 Posts: 5 Credit: 6,339,735 RAC: 0 |
I recently noticed a flat-rate credit being granted: exactly 2 credits per decoy counted. It is really strange. |
david @ TPS Joined: 26 Nov 06 Posts: 3 Credit: 881,762 RAC: 0 |
Thanks for the explanations. Due to the CUDA fiasco, I am looking for a new home for my farm. I have been crunching Rosetta with a few of my older boxes, but it does not sound like the big horsepower would be well served by this system. I might move a dual core and see how it likes it -- then decide. David PS: I have not looked at my computers for a while, and I see the credit numbers are MUCH better than they used to be! Here are a few that mirror Martin's, I think: 218028581 198676953 31 Dec 2008 10:21:34 UTC 5 Jan 2009 17:20:55 UTC Over Success Done 18,349.98 36.14 10.53; 218028580 198676951 31 Dec 2008 10:21:34 UTC 7 Jan 2009 11:33:44 UTC Over Success Done 18,927.20 37.28 2.00; 218028579 198676945 31 Dec 2008 10:21:34 UTC 7 Jan 2009 11:33:44 UTC Over Success Done 8,364.75 16.47 6.00. Think it's time to start re-tasking some horsepower............ |
bono_vox Joined: 5 Dec 05 Posts: 8 Credit: 371,092 RAC: 0 |
All recent granted credits are integer values. Strange... |
Sid Celery Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I'm seeing the same. 2.0 credits per model. I've EMailed the Project Team. Rosetta Moderator: Mod.Sense |
Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0 |
Hi Martin, sarha1, Sid and our brave Mod! The same here. At first I thought it was just a single case, but it keeps repeating: a flat two points per decoy - see here, here and here. All three abinitio norelax homfrag tasks end with _0 just like Greg said; however, the ones reported previously (on 6th of Jan) were rated more reasonably... If I were to bet, I would say some rating module on the server side broke during the latest crash. I hope it will be fixed soon - I know I don't have any significant crunching power on this lappy, but we do our best :) and it's a bit of a slap. ---- On the side: I had another problem with another WU from the same set. See the other topic. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What is the difference between the names of these two tasks: abinitio_norelax_homfrag_129_B_1a8oA_SAVE_ALL_OUT_4626_4562_0 and abinitio_norelax_homfrag_129_B_1bq9A_SAVE_ALL_OUT_4626_4562_0? The only difference I see is the 1a or 1b name, so are these two different proteins or starting points or what? On the first task I lost 20 credits (128 granted vs 148 claimed) and on the other task I gained credit (99 vs 130). But here is another interesting thing: the run times were different, and were never modified by me as far as I know, since the system has been down and no communication is possible. The first task ran 6 hrs and the other ran 4 hrs. How can this be if no settings were changed? On the credit point, the first task generated exactly 2 points per decoy for 64 decoys and the second came out to 1.61 for 81 decoys. Seems kind of low for credit per decoy. |
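The per-decoy figures quoted above are simply granted credit divided by the number of decoys. A minimal sketch of that arithmetic follows, with a check for the flat 2-credits-per-decoy pattern several posters reported. The function names and the tolerance are assumptions made for the example; this is not how the project's server computes credit.

```python
def credit_per_decoy(granted: float, decoys: int) -> float:
    """Granted credit divided by the number of decoys in the result."""
    if decoys <= 0:
        raise ValueError("decoys must be a positive integer")
    return granted / decoys


def is_flat_rate(granted: float, decoys: int,
                 rate: float = 2.0, tol: float = 1e-6) -> bool:
    """True if granted credit equals rate * decoys within a small tolerance.

    The default rate of 2.0 is the flat value reported in this thread.
    """
    return abs(granted - rate * decoys) <= tol


# First task from the post above: 128 credits granted for 64 decoys.
print(credit_per_decoy(128, 64), is_flat_rate(128, 64))
# Second task: 130 credits for 81 decoys, which does not fit the flat pattern.
print(is_flat_rate(130, 81))
```

With the rounded totals given in the post, the second task works out to roughly 1.6 credits per decoy, consistent with the poster's figure if the actual granted credit had decimals.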
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now." Thanks for the update. Hope you got everything else working OK now. Been quite a trying few days for you, I see. Thanks for your hard work. |
Sid Celery Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
"the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now." Confirmed, thanks. Pity about those lost ~300 credits, but I'll make it up now that I've upped my runtime to 4 hours and all those lockfile errors disappeared. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now." 4 hrs is a good runtime setting. Gives you long enough to complete at least 1 complicated model before it's done. |
ConflictingEmotions Joined: 5 Jun 08 Posts: 10 Credit: 3,081,990 RAC: 0 |
"the credit granting system was broken due to a corrupt database table. I fixed it and it appears to be running okay now." So when will we see the updates to the affected WUs that were already completed? |
Sid Celery Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
"4 hrs is a good runtime setting. gives you long enough to complete at least 1 complicated model before it's done." More likely, it reduces the amount of complaining about long running models... <oops> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"4 hrs is a good runtime setting. gives you long enough to complete at least 1 complicated model before it's done." lol Sid... I was thinking of something different in the long model area, but yeah, you're right... it should in theory reduce the complaints. But then again, that's up to our friends in Seattle to get the code right. |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,688,048 RAC: 10,544 |
"4 hrs is a good runtime setting. gives you long enough to complete at least 1 complicated model before it's done." Big models take longer - isn't the problem that some computers aren't finishing a single decoy in the run time and that's why they're overrunning? No problem with the code, just the model size. I expect they'd like to send out much bigger models than they currently do. I think it'd be useful if they could be more selective about how much resource the different models require and only send those to adequate computers. It'd also be useful if there were a selection of large and small tasks so, for example, a quad might be able to do up to two large and two small as a maximum. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"4 hrs is a good runtime setting. gives you long enough to complete at least 1 complicated model before it's done." It would help if the scheduler, or whatever program stores the info about our systems, could be made intelligent enough to read through the database of systems and their settings and say: oh, here's a system with 6 hour run times or longer, let's send a large model protein to it, and then do the same for systems with lower run times, less memory, etc. But I suppose such a program would take a lot of time to develop or is not possible at this time. That would take care of the overruns, I would think. |
Sid Celery Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
"...it should in theory reduce the complaints. but then again that's up to our friends in seattle to get the code right." Not entirely sure about that. Yes, there are bigger models coming through, but there also seems to be an issue of some taking unreasonably long while not returning anything like the credit a model of that size should warrant. That's what they seem to be trying to pick up. |
©2024 University of Washington
https://www.bakerlab.org