Message boards : Number crunching : Minirosetta v1.32 bug thread
Author | Message |
---|---|
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Sid - your message threads make mention of the error, but no one has answered it. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
Peculiar thing is, as soon as I had a little moan about most WUs falling over I just had a great little run of successes, including one in excess of 3 hours. And the successes continue. The only change I made was to "Leave applications in memory while suspended?" I thought I had this as 'Yes' in my Boinc Manager settings, but it was marked as 'No' online. Hmm. And now 5.98 WUs are coming through too. That 10 people may represent 10,000 who are having trouble, getting disillusioned and detaching. It might do. Is there any evidence of that? The home page shows more users and more hosts each day and 239k successes in the last 24 hours (up from 235k the previous time I mentioned it). These graphs support that. What's the basis of your assertion? 09/09/08 20:18:35||Starting BOINC client version 6.2.18 for windows_x86_64 I was about to highlight this for being another Vista64 issue, then I glanced at the error messages being given and I'm staying clear. Way out of my depth on that one! Except I noticed all WUs succeeded prior to 7 Sept, which makes me wonder if my whole issue has been about leaving applications suspended in memory or not. I'll keep an eye on my progress now (with WU run time set to default 3 hours again). |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again? >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Not clear on "Leave applications in memory while suspended?". Mine are Yes, should I switch to no and try again? ...only if you wish to test whether undoing what apparently improved Sid's situation, and thus adopting the settings that Sid thinks may have contributed to his problems, causes trouble for you too. In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality. In practice, you want to leave tasks in memory (virtual memory is where they really are) while suspended to preserve all the work possible. Otherwise you are shutting the task down (every hour by default) and it may not have had a chance to save a checkpoint for the work it has done, so the work is lost, and done again when it starts again later. Rosetta Moderator: Mod.Sense |
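The point about lost work can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: the function name, the 25-minute checkpoint spacing and the 180-minute task are hypothetical, and this is not how the BOINC client actually schedules or checkpoints tasks. It simply counts the CPU minutes a task needs when each suspension either keeps it in memory or rolls it back to its last checkpoint.

```python
def cpu_minutes_to_finish(work_needed, slice_len, checkpoint_every, leave_in_memory):
    """Toy model: CPU minutes consumed before a task needing `work_needed`
    minutes of work completes.

    The task runs in slices of `slice_len` minutes and is suspended between
    slices. It saves a checkpoint every `checkpoint_every` minutes of progress
    (an assumed, fixed spacing). If it is not left in memory, each suspension
    discards everything done since the last checkpoint.
    """
    progress = 0   # minutes of useful work that currently count
    cpu_spent = 0  # minutes of CPU time consumed so far
    while True:
        remaining = work_needed - progress
        if remaining <= slice_len:
            return cpu_spent + remaining
        start = progress
        progress += slice_len
        cpu_spent += slice_len
        if not leave_in_memory:
            # Removed from memory: only checkpointed work survives.
            progress -= progress % checkpoint_every
        if progress <= start:
            # No checkpoint was reached during the slice, so the task
            # would repeat the same work forever.
            raise RuntimeError("task never gets past a checkpoint")


kept = cpu_minutes_to_finish(180, 60, 25, leave_in_memory=True)
dropped = cpu_minutes_to_finish(180, 60, 25, leave_in_memory=False)
print(kept, dropped)
```

With these made-up numbers the task left in memory needs 180 CPU minutes, while the one removed from memory needs 210, because 10 minutes of work are redone after each of the three suspensions.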
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
Mine are Yes, should I switch to no and try again? Trawling back through other threads for clues, the advice seems to be to set it as 'Yes'. But your symptoms seem very different to mine so I can't help, unfortunately. I was commenting as much on the fact that the online setting was different to what I had in my Boinc Manager. I thought they synchronised on each Update. In theory the setting will not affect whether a task runs properly or not. Sid may be building evidence that there is a flaw making the theory not match the reality. Don't think for a minute I have any idea what I'm doing or that I have a plan - I don't! But until something else changes I could hardly make things worse than they were! Maybe I've stumbled on some oversensitivity to one setting. I don't know. A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed. I like to think I'm making a difference, even if I'm just deluding myself. (Probably the latter...) |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
A couple of 2 hour WUs are going through now and 1 of 3 hours. Fingers crossed. All those went through ok, but 2 further ones failed overnight. Now it's all 5.98 WUs and 100% successes as usual. |
leonari Send message Joined: 11 Dec 05 Posts: 8 Credit: 4,074,293 RAC: 795 |
I noticed at 09:30 BST today, the 1th of September, 2008, that the Work Unit (WU) Rosetta Mini 1.32 abinitio_homfrag_71A_1jfvA_4443_45274_0 had 10 minutes to run at 95% complete (after circa five and a half hours CPU run time). About 30 minutes later the WU had only completed another 1%, which could have been because of business work I was doing, but because of the problems with Rosetta that I have had in the recent past, I suspended everything else to allow it to finish, and to see what would happen. About two hours later, the CPU run time had increased to eight hours forty-seven minutes but the WU had not finished, at 98.139% complete, yet the SETI application, which should run 75% of the time, had not moved. At that point I suspended the Rosetta application to allow SETI@home Enhanced 6.03 to run. With the time now at 22:54 BST, SETI has not apparently done anything since (that is, no progress and no increase in CPU time). However, on checking Windows 2000 Task Manager, the Rosetta Mini 1.32 is running at circa 30 to 90% CPU utilisation (even though it is suspended - allegedly), and SETI is running at 0%. I also checked the graphics for both SETI and Rosetta; neither worked. Question I asked myself: what is at fault here: Rosetta Mini 1.32, SETI 6.03 or BOINC 5.10.45? To see what happens next I have reset Rosetta Mini. SETI has started, the SETI graphics now work, and Windows 2000 Task Manager shows SETI at 90 to 95% utilisation. Comments, please? By the way, am I so unlucky with Rosetta, or is this a common occurrence? Also, by the way, I filed another problem with Rosetta Mini on this thread a few days ago but, although I am sure it appeared in the "Thread Record", it has since disappeared. Reasons? |
leonari Send message Joined: 11 Dec 05 Posts: 8 Credit: 4,074,293 RAC: 795 |
I noticed at 09:30 BST today, the 1th of September, 2008, that the Work Unit (WU) Rosetta Mini 1.32 abinitio_homfrag_71A_1jfvA_4443_45274_0 had 10 minutes to run at 95% complete (after circa five and a half hours CPU run time). About 30 minutes later the WU had only completed another 1%, which could have been because of business work I was doing, but because of the problems with Rosetta that I have had in the recent past, I suspended everything else to allow it to finish, and to see what would happen. Stupid person that I am, I now see that some messages are hidden, so my last question on why my previous message had disappeared is of no consequence. My laptop is a Dell 2.2 GHz C640i running Windows 2000 5.00.2195 SP4 - fairly old but has no problems running SETI or Ralph (strangely)! |
BrnmccO1 Send message Joined: 26 Jun 07 Posts: 17 Credit: 578,825 RAC: 0 |
Well, on this comp I've had quite a few 1.32 errors and some 1.28 errors as well. Like other people's, it runs 5.98s 100%. 191481586 is a typical example of the usual "Unhandled Exception Error" that bombs out the WU. Hopefully 1.34 will be better! In any case, I for one won't be missing 1.32. RIP. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
Task ID 191460060 <core_client_version>6.2.18</core_client_version> Someone else took on this WU and it didn't fare any better either. |
Roger L. Cousins Send message Joined: 5 Nov 05 Posts: 1 Credit: 21,116,718 RAC: 8,903 |
MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg. What's up with that? How do I terminate them, short of using Task Manager to stop them one by one? R Cousins |
Pepo Send message Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
MiniRosetta seems to be spawning multiple threads. I have run out of Page file several times. I see eighteen threads in process right now, and many of them are using up to 170 Meg. Do you see it on your WinXP host 361486? Using which application? Single threads of any Windows process do not have their 'own' allocated memory (in the context described here), memory is allocated (and accessible) 'per process'. What's the total physical/virtual memory usage of the Minirosetta process? Your pagefile size? What's up with that? How do I terminate them, short of using Task Manager to stop them one by one? Task manager does not support terminating single threads. Are you sure you are seeing threads, not processes? Peter |
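The distinction between threads and processes can be seen in a few lines. This is a minimal sketch in Python rather than anything taken from Minirosetta or Windows internals, and the names `shared` and `worker` are made up for the illustration; the count of eighteen threads simply echoes the number reported above. It only shows the general point that threads of one process work on the same memory instead of each holding a private copy.

```python
import os
import threading

shared = []  # a single list object living in this process's memory


def worker(n):
    # Every thread appends to the same list; nothing is copied per thread.
    shared.append((n, os.getpid(), id(shared)))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(18)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for _, pid, _ in shared}
list_ids = {obj_id for _, _, obj_id in shared}
print(len(shared), len(pids), len(list_ids))  # 18 entries, 1 process ID, 1 list object
```

If a monitoring tool really shows eighteen entries each with its own memory figure, they are more likely separate processes than threads, which is why the total usage of the process and the pagefile size are the useful numbers to report.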
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
It looks as if I may have solved my particular problems with Rosetta by giving up BOINC screen savers. I switched to standard Windows screen savers and re-attached the three computers which I had been forced to detach from Rosetta. The problem had been that something in Rosetta was rendering the three machines totally useless, pinning the CPU at 100% and generally making me miserable. I would get the machines back by using Task Manager and shutting down the errant application. Once I got rid of the BOINC screen saver, everything seemed to go back to normal. As I said, I re-attached to Rosetta, now about 36 hours ago. No machine has had any problems and I believe that I have results now in all four machines. I do not remember seeing any discussion here of screen savers. Maybe I just missed something. Let me say that I know that there are different philosophical positions on using the screen savers. I favor using them; many of them let me know what is going on in the 10 or so projects to which I am attached with a quick glance at the monitor. Any comments? >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Weird - we'll look into this. Has anyone else experienced problems like this?
http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Mike - Just to let you know, things are still going quite well on all four machines; I suppose you can look at my results. I have just yesterday detached the two PIIIs, to make room for another project which they can handle; but the two Core 2 Duos, which are really about 90% of my crunching ability, are of course still running Rosetta. I mean, the PIIIs only achieve what they do running 24/7, whereas the other two do not. You guys have a major responsibility in that Rosetta, because of its originating software, may be the most important project running on BOINC software. At least, I believe it is Proteome at WCG which uses your software. Best ever always. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
leonari Send message Joined: 11 Dec 05 Posts: 8 Credit: 4,074,293 RAC: 795 |
This is about problems with Minirosetta v1.34 as there does not appear to be a "thread" for it! As can be seen from the three incidents below, Rosetta sometimes continues to run regardless of the rules on how long it is allowed to run (may be a BOINC Manager problem?). It then "locks up" and continues to run at 100%, stopping anything else from running! Note: all of the message sequences below are sequential messages extracted from the "Messages" tab in BOINC.
Incident 1
05/10/2008 12:50:17|rosetta@home|Starting abinitio_nohomfrag_70_A_1ynvA_4466_27265_0
05/10/2008 12:50:35|rosetta@home|Starting task abinitio_nohomfrag_70_A_1ynvA_4466_27265_0 using minirosetta version 134
Rosetta locked up running at 100% - presumably for one and a half days! Aborted at 09:34 07/10/2008
07/10/2008 09:34:17|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
07/10/2008 09:34:22|rosetta@home|Scheduler request succeeded: got 0 new tasks
07/10/2008 09:34:50|SETI@home|Resuming task 22au08ac.21313.9479.16.8.9_1 using setiathome_enhanced version 603
Incident 2
10/10/2008 11:29:03||Starting BOINC client version 5.10.45 for windows_intelx86
10/10/2008 11:29:03||log flags: task, file_xfer, sched_ops
10/10/2008 11:29:03||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
10/10/2008 11:29:03||Data directory: C:\Program Files\BOINC
10/10/2008 11:29:07||Processor: 1 GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz [x86 Family 15 Model 2 Stepping 7]
10/10/2008 11:29:07||Processor features: fpu tsc sse mmx
10/10/2008 11:29:07||OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00)
10/10/2008 11:29:07||Memory: 511.43 MB physical, 1.21 GB virtual
10/10/2008 11:29:07||Disk: 17.70 GB total, 2.26 GB free
10/10/2008 11:29:07||Local time is UTC +1 hours
10/10/2008 11:29:11|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 97037; location: home; project prefs: default
10/10/2008 11:29:11|ralph@home|URL: http://ralph.bakerlab.org/; Computer ID: 1760; location: home; project prefs: default
10/10/2008 11:29:11|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 1960189; location: work; project prefs: default
10/10/2008 11:29:11||General prefs: from http://setiathome.ssl.berkeley.edu/ (last modified 08-Jun-2006 10:33:55)
10/10/2008 11:29:11||Host location: work
10/10/2008 11:29:11||General prefs: no separate prefs for work; using your defaults
10/10/2008 11:29:11||Reading preferences override file
10/10/2008 11:29:11||Preferences limit memory usage when active to 255.71MB
10/10/2008 11:29:11||Preferences limit memory usage when idle to 460.29MB
10/10/2008 11:29:11||Preferences limit disk usage to 2.26GB
10/10/2008 11:29:18|SETI@home|Restarting task 19au08ab.15460.9479.6.8.46_1 using setiathome_enhanced version 603
10/10/2008 11:33:21|SETI@home|Sending scheduler request: Requested by user. Requesting 36 seconds of work, reporting 1 completed tasks
10/10/2008 11:33:24|SETI@home|Scheduler request succeeded: got 1 new tasks
10/10/2008 11:33:27|SETI@home|Started download of 26au08ad.24455.4162.7.8.218
10/10/2008 11:33:38|SETI@home|Finished download of 26au08ad.24455.4162.7.8.218
10/10/2008 12:16:18|SETI@home|Computation for task 19au08ab.15460.9479.6.8.46_1 finished
10/10/2008 12:16:18|SETI@home|Starting 26au08ad.15112.2526.6.8.181_1
10/10/2008 12:16:18|SETI@home|Starting task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603
10/10/2008 12:16:20|SETI@home|Started upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 12:16:28|SETI@home|Finished upload of 19au08ab.15460.9479.6.8.46_1_0
10/10/2008 14:14:37|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1unrA_4466_47644_0 using minirosetta version 134
17:31 on the 10/10/2008 - Because Rosetta was still going at circa 85% but with no increase in either of the two SETI tasks (SETI should run 75% of the time), Rosetta was suspended at 17:31 on the 10/10/2008.
10/10/2008 17:31:02|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10/10/2008 17:31:07|SETI@home|Scheduler request succeeded: got 0 new tasks
10/10/2008 17:31:42|SETI@home|Resuming task 26au08ad.15112.2526.6.8.181_1 using setiathome_enhanced version 603
At 21:54 on the 11/10/08, it was observed that Rosetta had increased to 100% complete even though it was still suspended. By the way, there was no message to report that it had restarted. Aborted Rosetta.
11/10/2008 21:54:48|SETI@home|Starting 26au08ad.24455.4162.7.8.218_0
11/10/2008 21:54:51|SETI@home|Starting task 26au08ad.24455.4162.7.8.218_0 using setiathome_enhanced version 603
21:58 on the 11/10/08 - Rosetta still going, even though it had been aborted, but SETI was still not - "Screen capture" available. Terminated Rosetta task. SETI then started.
Incident 3
14/10/2008 11:53:21|rosetta@home|Restarting task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 using minirosetta version 134
14/10/2008 12:38:32|SETI@home|Started download of 25au08af.7275.890.10.8.52 (Note: First SET
14/10/2008 12:38:53|SETI@home|Finished download of 25au08af.7275.890.10.8.52
14/10/2008 12:41:15|ralph@home|Finished download of looprelax_tex_cst_oneparam.looprelax_tex_cst.t328_.tex.boinc_files.zip
14/10/2008 12:47:32|rosetta@home|Finished download of foldcst_simple.foldcst_simple.t313_.mtyka.boinc_files.zip
15/10/08 - Aborted "abinitio_nohomfrag_70_A_1zd0A_4466_59245_0" after the task was running at 100% for over twelve hours and stopping anything else from working - "Screen print" available. I also suspect that after this task first started, sometime on the 14th, no other task was allowed to start. Note that Rosetta was still taking processing power before the "abort" - "Screen print" available.
15/10/2008 10:10:01|SETI@home|Starting 25au08af.7275.890.10.8.52_1
15/10/2008 10:10:04|SETI@home|Starting task 25au08af.7275.890.10.8.52_1 using setiathome_enhanced version 603
15/10/2008 10:10:07|rosetta@home|Computation for task abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 finished
Everything now working as expected. |
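Message-tab extracts like the ones in the post above follow a `date time|project|message` layout, with an empty project field for client-wide messages. As a rough sketch for anyone wanting to measure the gaps between events, the snippet below splits such lines and computes the elapsed time between two of them. The names `LogEntry`, `parse_line` and `gap_hours` are made up for this illustration, and the day-first, four-digit-year date format is an assumption based on the dates quoted in the post; free-text notes mixed in with the log lines would need to be filtered out first.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str   # empty string for client-wide messages
    message: str


def parse_line(line):
    """Split one 'DD/MM/YYYY HH:MM:SS|project|message' line into a LogEntry."""
    # Split on the first two pipes only, in case the message contains more.
    stamp, project, message = line.strip().split("|", 2)
    when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    return LogEntry(when, project, message)


def gap_hours(earlier, later):
    """Elapsed hours between two parsed entries."""
    return (later.when - earlier.when).total_seconds() / 3600


start = parse_line(
    "14/10/2008 11:53:21|rosetta@home|Restarting task "
    "abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 using minirosetta version 134"
)
end = parse_line(
    "15/10/2008 10:10:07|rosetta@home|Computation for task "
    "abinitio_nohomfrag_70_A_1zd0A_4466_59245_0 finished"
)
print(f"{gap_hours(start, end):.1f} hours between restart and finish")
```

For the two Incident 3 lines used here, the gap comes out at a little over 22 hours of wall-clock time between the restart message and the "finished" message.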
Pepo Send message Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
This is about problems with Minirosetta v1.34 as there does not appear to be a "thread" for it! Sure there is one ;-) --> Minirosetta v1.34 bug thread Peter |
mitrichr Send message Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
Latest results are not good. 1.32 and 1.34 I had to abort WU's, but, still, I think that at least my problem relates to the screen saver locking everything up. If I use a different screen saver, I seem to have no problems. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
©2024 University of Washington
https://www.bakerlab.org