Message boards : Number crunching : Problems with Rosetta version 5.59
Author | Message |
---|---|
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
|
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
My problem is not any kind of error, but the steady drop in granted credit over the past two weeks, along with my RAC falling from a five-month level of 225+ to 206. I've heard that these even out. However, mine appears to be a steady, consistent drop. There have been code efficiency problems in the past, and I'm wondering if any inefficiencies exist in 5.5X. My system runs 24/7 except for a periodic system check every two weeks (approx. 5 to 10 minutes). I don't have any garbage running, as I check XP's task manager regularly. This only started about two weeks ago. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Rhiju, the announcement on the home page of v5.59 received a date of March 20. So the date is wrong, just like the last time. Wonder what's up with that? Oh, and could you add a link to your description of v5.59 to this "problems with...5.59" thread? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Feet1st wrote: "Rhiju, the announcement on the home page of v5.59 received a date of March 20. So the date is wrong, just like the last time. Wonder what's up with that?" Well, the wrong date is just silliness; I typed it in wrong! I'll talk to other team developers about the drop in credit, Keith; based on your post, it can't be inefficiencies in preempting/resuming, so we'll look for inefficiencies in the code as you suggest. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
OK. I've done a complete re-install of my system and the CPU is now running a consistent 99% in task manager. Before, I had some unusual background process activity eating as much as 4%. Give me a few days to see if my RAC comes back around. Talk about Spring Cleaning! |
niko Joined: 1 Apr 07 Posts: 3 Credit: 22,789 RAC: 0 |
I have no more "process *** not found" problems with the latest version on my Macs! |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Break free of the (side) chains that bind you!! :) Just a nit. I'm noticing that there is about a 5 minute period mid-model where no visible changes to the graphic occur. On my workunits called: s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_1775_0 this seems to hit around step 70,000. Just prior to that, the last redraw seems to get bad data, and you will see that many of the sidechains appear not to be connected to anything. If you get curious (as I always do) you rotate things around and find... well... yeah, THAT would NOT be connected to ANYTHING! Here's a link to my screenshot. Note, the first box seems to retain the sidechains intact. The very next step brings everything back into line... but it takes more than 5 min. to reach that next step at this point in the model. So, people have a much higher than average chance of catching this malformed frame when they pop in to admire their beautiful protein and RNA structures. |
Ai-Leng Joined: 14 Oct 06 Posts: 8 Credit: 4,715 RAC: 0 |
Well, I was one of the small number of people on a Mac with issues with v5.54. Now, something odd has occurred. BOINC was crunching away on s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_3094_0 and everything was going well; it had reached a progress of 95%. The next unit for Rosetta had even been downloaded, ready to start once this one had finished. It was at this point that I switched my Powerbook off to head to work. When I started my Powerbook up again, I noticed that the progress of the same work unit was 0% and the data processing had restarted. I didn't see any error messages when I switched my Mac back on. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
A problem with my first unit with the new app: when I shut down for the night it had 4hrs, 36min at 46% for a 10hr runtime. When I started up this morning the time was the same but the % had gone back to 0%. Now it's at 6hrs, 36min and showing 37.3% complete. I never had that before. P.S. 10hr runtime. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Feet1st, I've noticed that too, along with the first three docking models completing within the first hour and each model from the fourth on taking from 45 minutes to over an hour. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mags, if you happen to have a short runtime preference, and your crunching did not complete the first model, then what happened is just that the task never reached a checkpoint and the work up to that point is lost. This is why improved checkpointing is on the list for inclusion in the next release. [edit]The other possibility is that you are seeing the 0% complete issue, even though the graphic shows models crunched and the total CPU time. I'm not positive whether this issue, which was observed on Ralph, still exists in the 5.59 version or not. This may be what Rhiju was referring to when he said sometimes at the beginning of the model the estimate can be "a little off". Even if that issue still exists, the tasks completed OK. It was just a bit confusing to monitor the % complete after powering down like that. You will find it increases quickly and will complete the task at the expected time. Basically, if a task is restarted, work begins from the last checkpoint, or the last completed model. And it begins with the amount of CPU time you had at the time of the model completion or checkpoint... but for some reason, BOINC is not reporting that total CPU time to Rosetta. It is only reporting the time since this restart. So anyway, this is a kink that needs to be worked out. But it's an issue that the Project Team is already aware of. Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Peter, this is how it worked all along. The difference now is the improved indication of the % completed. I mean, what you would have observed before this would have been that you were crunching at 37.3% for the short period of time just before you shut down, and then when you powered back on you were back at 37.3%. But now... even though everything else is running the same, you see that progress indicator tick up from 37.3% through to 46% before you power down. Some work is always lost when BOINC shuts down, or when the Rosetta application is otherwise removed from memory. With the planned improved checkpointing, you will still lose work, but significantly less will be lost. It is simply the way computers work. If you want to preserve any given piece of information, you must write to disk. If you write to disk all the time, from an application that runs on your machine 24x7, or at least all the time the machine is powered on, then you will be using the disk drive too much. Over time that's not a good idea. The tradeoff for sparing the disk drive is that some work is lost. Sometimes more than an hour can be lost. This is why improved checkpointing is important. Rosetta Moderator: Mod.Sense |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Thanks Mod.Sense for your explanations in the previous posts! David K. and I are working on checkpointing. So many users have expressed concerns about shutting off and starting BOINC -- I'm thinking about ways for Rosetta to be more independent of the CPU run time estimate provided by BOINC. During checkpoints, for example, we can record the time so far in the run. Feet1st, that's a hilarious graphic! I noticed the same thing on test runs here, though it wouldn't freeze for 5 minutes ... just a few seconds. I wonder what's causing the craziness to continue for so long! Anyway, I'll look into fixing it -- didn't have time before due to all the urgent problems last week. If it's any consolation, those FOLD_AND_DOCK workunits are returning some pretty amazing results (which I'll probably start posting next week -- it's been a while since we've seen some "Top Predictions", huh?). Wow, things are quiet on this thread. I'll take that as a good sign. Break free of the (side) chains that bind you!! |
Ai-Leng Joined: 14 Oct 06 Posts: 8 Credit: 4,715 RAC: 0 |
Mod.Sense wrote: "Mags, if you happen to have a short runtime preference, and your crunching did not complete the first model, then what happened is just that the task never reached a checkpoint and the work up to that point is lost. This is why improved checkpointing is on the list for inclusion in the next release." The thing is, the total CPU time also resets to zero and the task continues to crunch. This same work unit is still being worked on, as it hasn't been completed. I switched my Powerbook off again to come home from work, and also during a fire drill at work. I will be leaving it on all night tonight for completion and reporting. Not sure if this additional information helps you at all. |
jimbreed Joined: 7 May 06 Posts: 1 Credit: 90,298 RAC: 0 |
This morning I was looking at the progress on a SYMM_FOLD_AND_DOCK_RELAX work unit that had been running for almost 6 hours and was only on the second model. I have an 8 hour preference. (I have a 1.6GHz Pentium 4 running XP-Home.) I clicked in the Low Energy pane to rotate the model and when I moved the mouse, the graphics window disappeared, no error messages, no sign of completion, just poof, it was gone. Boinc downloaded another work unit and started processing. The only message I got in Boinc was: 4/4/2007 6:52:45 AM|rosetta@home|Computation for task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom005__1638_6637_0 finished The result is 71154279. (Edited to correct the result id.) I never saw any graphics problems with earlier versions of Rosetta. |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
jimbreed wrote: "This morning I was looking at the progress on a SYMM_FOLD_AND_DOCK_RELAX work unit that had been running for almost 6 hours and was only on the second model. I have an 8 hour preference. (I have a 1.6GHz Pentium 4 running XP-Home.) I clicked in the Low Energy pane to rotate the model and when I moved the mouse, the graphics window disappeared, no error messages, no sign of completion, just poof, it was gone. Boinc downloaded another work unit and started processing." If the WU is finished, the graphics window shuts down by itself. Anders n |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mags, since the CPU time shown also dropped to zero, it means your work did not reach a checkpoint or a model end, i.e. the first case I was talking about. I've now confirmed on my machine that the % complete upon restart drops back to zero, even when models have been completed. Jimbreed, since these models take considerable time to complete, Rosetta is only able to get you within about 90 min. of your runtime preference. To begin another model at that point would take you past your preference. And so I suspect it was not your attempt to manipulate the graphics, but rather just the model reaching the end, that caused it to take down the graphic window. But the % complete is really just based on your CPU runtime preference, so the % complete was not aware we were going to be ending a bit early on this one. It looks like the task reported normally. You say there was no sign of completion. Not sure what you were expecting, but the message you posted is the normal sign of completion. Rosetta Moderator: Mod.Sense |
Tim Kunz Joined: 27 Dec 05 Posts: 9 Credit: 1,120,252 RAC: 0 |
I just had to reboot computer for Windows updates...a computation that was over 95% complete apparently was not saved...it reset and is recomputing from start. And another PC that was shut down normally and rebooted twice apparently restarted its task from zero each time also. (These were earlier in their computations... < 20%). This appears not to be checkpointing. |
Keith Akins Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Tim, I've noticed that too. Did your CPU time reset or did it remain the same? % complete by itself will not affect work completed that has been checkpointed. If this happens again, double check your CPU time and model number. That will tell the story. |
Tim Kunz Joined: 27 Dec 05 Posts: 9 Credit: 1,120,252 RAC: 0 |
Keith Akins wrote: "Tim, I've noticed that too. Did your CPU time reset or did it remain the same?" The CPU time reset also... complete restart. I'm allowing completion of current computations and redirecting CPUs to other projects until this is resolved. |
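
Several posts above describe the same mechanism: work is saved only at a checkpoint or model end, a restart resumes from the last saved state, and the % complete is derived from CPU time relative to the runtime preference. The following is a minimal Python sketch of that idea, not Rosetta's or BOINC's actual code; the JSON layout and every function and parameter name are assumptions made for illustration. It stores the accumulated CPU time inside the checkpoint itself, along the lines Rhiju suggests, so that progress after a restart would not depend on what the BOINC client reports.

```python
import json
import os
import tempfile


def save_checkpoint(path, models_done, cpu_seconds):
    """Write the state to a temp file, then rename it over the old checkpoint.

    The rename means a shutdown in the middle of a write leaves the
    previous checkpoint in place instead of a half-written file.
    """
    state = {"models_done": models_done, "cpu_seconds": cpu_seconds}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if nothing was ever saved.

    The fresh state is the "never reached a checkpoint" case: both the
    model count and the CPU time start again from zero.
    """
    if not os.path.exists(path):
        return {"models_done": 0, "cpu_seconds": 0.0}
    with open(path) as handle:
        return json.load(handle)


def percent_complete(cpu_seconds, runtime_preference_seconds):
    """Progress as a share of the runtime preference, capped at 100."""
    return min(100.0, 100.0 * cpu_seconds / runtime_preference_seconds)


def should_start_another_model(cpu_seconds, avg_model_seconds,
                               runtime_preference_seconds):
    """Start a new model only if it is expected to fit in the preference."""
    return cpu_seconds + avg_model_seconds <= runtime_preference_seconds
```

With P.P.L.'s numbers (4 hrs 36 min of a 10 hr runtime), `percent_complete(16560, 36000)` gives 46%. With jimbreed's numbers (6 hours used, an 8 hour preference, and assuming about 3 hours per model), `should_start_another_model(21600, 10800, 28800)` returns `False`, which is consistent with the task ending before the full 8 hours.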