Message boards : Number crunching : Problems with Rosetta version 5.40
Author | Message |
---|---|
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
I looked through your recent results and most of them were caused by the backward compatibility problem which happened earlier this week (scroll down to the first few posts). Happy crunching! I got more errors in the last 5 days than I did in the last 100 days |
alexpoon Joined: 28 Dec 05 Posts: 6 Credit: 1,846 RAC: 0 |
19/11/2006 19:12:11|rosetta@home|Unrecoverable error for result PSH_0051_looprlx_GP120_OD1_138_148_5484_1404_20_0 ( - exit code -529697949 (0xe06d7363)) |
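A note on the two numbers in the log line above: the decimal and hexadecimal exit codes are the same 32-bit value, read once as signed and once as unsigned. The short Python sketch below shows the conversion. The helper name `to_unsigned32` is made up for illustration and is not part of BOINC or Rosetta. The value 0xE06D7363 is the code Windows reports for an unhandled Microsoft C++ exception, and its low three bytes spell "msc" in ASCII, which the last lines check.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


exit_code = -529697949  # decimal value copied from the log line

# Same bits, printed the way the log shows them in parentheses.
as_hex = hex(to_unsigned32(exit_code))
print(as_hex)

# Drop the top byte (0xe0) and decode the remaining three as ASCII.
low_bytes = to_unsigned32(exit_code).to_bytes(4, "big")[1:]
print(low_bytes.decode("ascii"))
```

This only decodes the number. It says nothing about which part of the application raised the exception.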
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
The graphics on this type of WU don't work very well. Most of the time there is nothing in the Accepted and Low Energy boxes. https://boinc.bakerlab.org/rosetta/result.php?resultid=47869770 Anders n |
genes Joined: 8 Oct 05 Posts: 60 Credit: 704,566 RAC: 4 |
I had to abort this wu: resultid=47806539 It had gotten stuck for hours, and was not using any CPU time, even though the Boinc CC said it was running. I suppose if I let it run a few more hours the watchdog would have stopped it, but I didn't want to waste any more time on it. |
Mats Petersson Joined: 29 Sep 05 Posts: 225 Credit: 951,788 RAC: 0 |
I aborted resultid 47184521 because, like the one in the previous post, it got stuck... -- Mats |
SOAN Joined: 27 Sep 05 Posts: 252 Credit: 63,160 RAC: 0 |
This one ran for an hour at 98% of the cpu but never registered any CPU time: resultid=48156890 |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
If others are seeing the problem reported in the last three updates, could you please just try ending and restarting BOINC? By end, I mean the File -> Exit. I've been seeing problems for several months where BOINC seems to lose contact and/or control of the Rosetta threads that do the crunching. It is BOINC that is supposed to tell the Rosetta thread when to be active. So, by the description that the Rosetta thread isn't getting CPU, it points more to a BOINC problem. Also, please report your BOINC version, and your platform (Windows, Linux, Mac). Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
If others are seeing the problem reported in the last three updates, could you please just try ending and restarting BOINC? By end, I mean the File -> Exit. File -> Exit does not end BOINC if it is running as a system service, or if it is running under another user's login. One way to end BOINC when running as a service is Control Panel -> Admin Tools -> Services, then right-click on BOINC and choose Stop. Another is Start -> Run and enter `net stop boinc`. The only way to exit BOINC that works across *all* Windows configurations is to get a command window, cd to the BOINC folder, and type `boinccmd --quit`, but my suggestion is to find which of the easy ways works for your set up, and use that. In Linux get a shell window (terminal window), cd to the BOINC folder, and type `./boinc_cmd --quit` (note the extra underscore in the name of the command!). By the way, the boinccmd or boinc_cmd command is quite powerful. It can also control an instance of BOINC running on another machine on the network, and this is the way you would place commands in a .BAT file or shell script. To see the entire range of its abilities, on Windows use `boinccmd --help|more` or on Linux `./boinc_cmd --help|less`. NB: The Windows version's help output shows all its examples as for the Linux version, so you do need to remember to leave out the underscore when taking its advice!! R~~ |
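The per-platform commands in the post above can be collected into a small lookup, sketched here in Python. The command names are exactly those given in the post for the BOINC clients of that time. Other BOINC releases may name the tool differently, so treat the table as an assumption to check against your own install. `QUIT_COMMANDS` and `quit_command` are hypothetical names used only for this illustration.

```python
from typing import List

# Command names as quoted in the post above (an assumption for any
# other BOINC release; check the BOINC folder on your own machine).
QUIT_COMMANDS = {
    "windows": ["boinccmd", "--quit"],
    "linux": ["./boinc_cmd", "--quit"],
}


def quit_command(platform: str) -> List[str]:
    """Return the argument list that asks a local BOINC client to exit."""
    key = platform.lower()
    if key not in QUIT_COMMANDS:
        raise ValueError(f"no quit command recorded for platform {platform!r}")
    # Return a copy so callers cannot modify the lookup table.
    return list(QUIT_COMMANDS[key])


# Print the commands rather than running them, so nothing is stopped by accident.
for name in ("Windows", "Linux"):
    print(name, "->", " ".join(quit_command(name)))
```

To actually run one of these, the returned list could be passed to `subprocess.run(..., check=True)` with the BOINC folder as the working directory, as the post describes. The sketch deliberately stops short of that.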
Steve Shedroff Joined: 7 Nov 05 Posts: 11 Credit: 250,657 RAC: 0 |
Ever since about October 20 something my computer was crashing more than normal. Windows XP SP2. I am not in control of the maintenance of this box so I always wait a few weeks to see if the auto-updates I get fix things. I noticed that Rosetta was up-versioned about the same time as my crashing problem so I stopped receiving work units for a short time. The problem went away. I noticed a new version of Rosetta today so I started back up. I'll let you know if I have further problems. Could have all been a coincidence. The crash always started with slow mouse/keyboard response, then total mouse and keyboard lockout. Power down was the only way to reboot. Just thought you should know. Steve |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Ever since about October 20 something my computer was crashing more than normal. ... Just thought you should know. Thanks Steve, In this case it fits into what we already know about problems with the changeover between versions. However it is always worth mentioning this kind of thing in case it has not been picked up before. Also Rosetta is a dev project, and not everyone has time to weather out a bugstorm when one occurs. Taking shelter in another project for a month is a good strategy! Glad you came back. R~~ |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
How long do we wait without an update to the %complete figure before we get worried? This task ran OK and did 18 decoys in about 3hrs on this machine, in the PSH_0058_looprlx series of WU. I have since put the target time up to 10hrs, and now another PSH_0058_looprlx task updated the %complete at around 3hrs (31%), and has now got to almost 10hrs without the %complete changing again. This seems a genuine effect, as boinccmd shows checkpoint CPU time: 11234.859375, current CPU time: 35711.500000, fraction done: 0.31510, i.e. about 24k sec without a checkpoint, i.e. over 6 hrs. This long-running task will be here once it is reported so you can see what stderr makes of this. Leaving it running for now, but wondered if others have seen this effect. By the way, it occurs to me that others may wonder how to see the checkpoint info, so I have just started the how to use boinccmd thread. R~~ |
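The "about 24k sec, over 6 hrs" figure in the post above is simply the gap between the two CPU times that boinccmd printed. A minimal Python sketch of that arithmetic follows. The three numbers are copied from the post, and the function name `seconds_since_checkpoint` is made up for this example.

```python
def seconds_since_checkpoint(current_cpu: float, checkpoint_cpu: float) -> float:
    """CPU seconds accumulated since the task last wrote a checkpoint."""
    return current_cpu - checkpoint_cpu


# Values quoted from the boinccmd output in the post above.
checkpoint_cpu = 11234.859375
current_cpu = 35711.5
fraction_done = 0.31510

since_checkpoint = seconds_since_checkpoint(current_cpu, checkpoint_cpu)
hours = since_checkpoint / 3600

print(f"{since_checkpoint:.0f} s since last checkpoint ({hours:.1f} h)")
print(f"fraction done still reported as {fraction_done:.1%}")
```

A large gap like this means that work done since the checkpoint would be lost if the task were restarted, which is one reason to keep an eye on it.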
Christian Diepold Joined: 23 Sep 05 Posts: 37 Credit: 300,225 RAC: 0 |
I had a crashed WU today. From time to time I look at the gfx of the current Rosetta WU and I never had problems with that before. But today, when I opened the gfx for a WU, my firewall popped up and told me that Rosetta wanted to contact "msdl.microsoft.com". I was like, WTF. I said no, and the second I hit the "no" button of my firewall, the WU crashed with exit code 1 (0x1). That's the WU. What gives? Why would Rosey call M-$oft? And why would a no in my firewall - just the same as if the internet was off - crash the whole WU? All coincidence? |
Killersocke@rosetta Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
So many problems with the WUs, and they are crashing my PC. So I will stop my work here. Please let me know when the problems are solved. |
genes Joined: 8 Oct 05 Posts: 60 Credit: 704,566 RAC: 4 |
I had to abort this wu: resultid=47806539 The above WU was running under Boinc CC 5.7.2 on Windows XP. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
I had a crashed WU today. From time to time I look at the gfx of the current Rosetta WU and I never had problems with that before. But today, when I opened the gfx for a WU, my firewall popped up and told me that Rosetta wanted to contact "msdl.microsoft.com". I was like, WTF. I said no, and the second I hit the "no" button of my firewall, the WU crashed with exit code 1 (0x1). Several people have already reported problems with the graphics on the current app, so it makes sense that this happened only when you opened the graphics. My guess is that the WU was already dead when your firewall popped up and asked the question, and was running the Microsoft (M$) debugger before quitting totally. msdl is possibly a debugging site within M$. If you try to open it with a browser you get redirected to their main site, to specific pages about using the debugger. There are two reasons I can think of why the debugger could try to talk to the msdl site - to ask for the translation of the error message into German (you have your machine set to use German wherever possible, I guess?), or to report the error automatically. Note - these are both guesses, I don't actually know. EDIT: Equally, it could be that the call comes, not from the debugger, but directly from the M$ code in the interface between Rosey and the graphics driver. Both the above reasons would apply there, too. Then, after you hit "no", the debugger completed immediately and the error was reported. I can understand why the whole thing felt very suspicious, but in fact I think it may simply be M$'s usual dodgy practice of using the net when you don't expect it to. R~~ |
Christian Diepold Joined: 23 Sep 05 Posts: 37 Credit: 300,225 RAC: 0 |
Ah, thx for the ideas River. Didn't know that msdl was a debugger. I always thought it meant "msdownload" or something like that. Yes, my BOINC version is set to German, so that theory of yours makes pretty much sense. :-) |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Ah, thx for the ideas River. Didn't know that msdl was a debugger. I always thought it meant "msdownload" or something like that. Yes, my BOINC version is set to German, so that theory of yours makes pretty much sense. I am only guessing from the fact that the browser re-directs to debug info. But my guess is Ms debug logging or suchlike. Who really knows? R~~ ps, note also the extra paragraph marked EDIT in my previous post |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 29 |
> Had this one fail. Each time I looked at it I did not notice that it was not doing anything and that the CPU had dropped back to idle. Boinc Manager said it was running but nothing was going on. It had hung for about 20 hours before I aborted it. https://boinc.bakerlab.org/rosetta/result.php?resultid=47931779 |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 29 |
> Just another addition about the graphics problem with Rosetta/Ralph (they both do it). Sometimes the debug data shows that a different project, not Rosetta, was the screensaver at the time of failure. This may be true, but what may have been forgotten is that on a dual-core or hyperthreaded processor, 2 jobs are running at the same time but only one screensaver at a time can be on the screen. A number of the lockups I had showed Seti or Einstein as being the screensaver when it froze. I then went to Task Manager and it told me that the Rosetta WU was "Not Responding", while the other task was actually still running without any problems. So I am guessing here that the screensaver may have been about to change to the other project (Rosetta) and 'hung' in the process (although the shortest WU I had fail ran less than 30 minutes with the screensaver on before dying). Anyway it could be something to look at. For a couple of weeks now I have not had the screensaver on for either Ralph or Rosetta due to the 70%+ failure rate, and I have had only the isolated failure of a WU since I did this. There are another 1 or 2 threads about this problem now, as people are not sure if it is a 5.40 problem, a Rosetta problem or a Boinc problem. Without graphics on I don't have a problem. |
Buffalo Bill Joined: 25 Mar 06 Posts: 71 Credit: 1,630,458 RAC: 0 |
This one tried to upload several times and I tried to get it to upload with "retry now". I kept getting a message about the file being locked by the scheduler. After a reboot and a few more tries I aborted the upload and it caused the WU to error out. 49343410 |
©2025 University of Washington
https://www.bakerlab.org