Message boards : Number crunching : Report Rosetta screensaver problems here
sslickerson · Joined: 14 Oct 05 · Posts: 101 · Credit: 578,497 · RAC: 0

> ...could I ask that anyone who feels they've got a system that is rock solid, regardless of running as screensaver, enable the BOINC screensaver, then go into your Rosetta preferences (perhaps for a specific location, if you've got more than one machine) and set the frame rate up a notch per day? And see if it remains stable, or if you eventually hit a point where you see problems as well?

Feet1st,

Over the past two days I increased my frame rate from the default to 30. My computer had zero graphics-related problems prior to this. I opened the graphics on this WU and within about 5 minutes it froze up completely (blank white window). The one thing I did differently with this WU was leave the graphics window open; all other times, even at this frame rate, I closed the window within seconds of opening it. I opened Windows Task Manager and noticed that there were two instances of this WU showing "not responding" (with only one open window). When I traced each process back, one went to explorer.exe and the other to the WU. Ctrl-Alt-Delete, and a computation error.

System: AMD Athlon 64 X2 Dual Core 4200+, Windows XP, integrated graphics: NVIDIA GeForce 6150LE.

Note that I never use the screensaver but open the graphics occasionally.

--Tim

Edit: I tried the screensaver with this WU. The screensaver froze within seconds and the WU errored out.
Feet1st · Joined: 30 Dec 05 · Posts: 1755 · Credit: 4,690,520 · RAC: 0

Thanks for trying that out, Tim. I think we've learned something there.

I enabled Rosetta in my firewall, because the application itself was trying to report the debug data but ZoneAlarm blocked it. Then I left the screensaver on overnight. In the morning I checked on things before I went to work: the screensaver was unresponsive, not refreshing, with no time incrementing. I had to Alt-Tab to see the task list and more or less break out of it. Then I left it running during the day and returned tonight to similar symptoms. Here are the day's logged messages (the problem entries are at 8:42 AM and 9:42 PM):

12/7/2006 2:02:00 AM||Rescheduling CPU: files downloaded
12/7/2006 8:42:32 AM|rosetta@home|rosetta not responding to screensaver, exiting
12/7/2006 8:42:36 AM|rosetta@home|Unrecoverable error for result FRA_t369_test_LARS_constraints_hom001_1_S_00001_0000655_0.pdbIGNORE_THE_REST_1435_192_0 ( - exit code -1 (0xffffffff))
12/7/2006 8:42:36 AM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/7/2006 8:42:36 AM||Rescheduling CPU: application exited
12/7/2006 8:42:36 AM|rosetta@home|Computation for task FRA_t369_test_LARS_constraints_hom001_1_S_00001_0000655_0.pdbIGNORE_THE_REST_1435_192_0 finished
12/7/2006 8:42:36 AM|rosetta@home|Starting task 1tvg_1_NMRREF_1_1tvg_1_idid_model_10IGNORE_THE_REST_idl_1432_2205_0 using rosetta version 541
12/7/2006 11:54:22 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
12/7/2006 11:54:22 AM|rosetta@home|Reason: To fetch work
12/7/2006 11:54:22 AM|rosetta@home|Requesting 586 seconds of new work, and reporting 1 completed tasks
12/7/2006 11:54:32 AM|rosetta@home|Scheduler request succeeded
12/7/2006 11:54:34 AM||Rescheduling CPU: files downloaded
12/7/2006 9:42:39 PM|rosetta@home|Unrecoverable error for result 1tvg_1_NMRREF_1_1tvg_1_idid_model_10IGNORE_THE_REST_idl_1432_2205_0 ( - exit code 1073807364 (0x40010004))
12/7/2006 9:42:39 PM|rosetta@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/7/2006 9:42:39 PM||Rescheduling CPU: application exited

Weirdest thing: I've now set BOINC to use only one of my CPUs on this machine. But when I brought up Task Manager this evening, it showed the rosetta .exe using 100% of the CPU. I mean the crunching process, just one of them, using 100%. I killed the screensaver task, and then it dropped back to 50%, as it should on an HT CPU. Perhaps this rings a bell with someone as to how this might be possible?

Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
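For anyone comparing these failures across hosts, here is a minimal C++ sketch that pulls the timestamp, project, result name and exit code out of message-log lines in the `time|project|message` layout shown above. It assumes exactly that layout and wording (as printed by the 5.x client in this thread) and result names without spaces; `FailedResult` and `parse_unrecoverable` are hypothetical names, not part of BOINC.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// One failed result as reported in a BOINC client message-log line.
struct FailedResult {
    std::string timestamp;    // e.g. "12/7/2006 8:42:36 AM"
    std::string project;      // e.g. "rosetta@home"
    std::string result_name;  // name of the errored result
    long long exit_code;      // signed decimal exit code as printed
};

// Matches lines of the form
//   <time>|<project>|Unrecoverable error for result <name> ( - exit code <n> (0x<hex>))
// and returns std::nullopt for every other kind of log line.
std::optional<FailedResult> parse_unrecoverable(const std::string& line) {
    static const std::regex pattern(
        R"(^([^|]*)\|([^|]*)\|Unrecoverable error for result (\S+) \( - exit code (-?\d+) \(0x[0-9a-fA-F]+\)\)$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return std::nullopt;
    }
    return FailedResult{m[1].str(), m[2].str(), m[3].str(), std::stoll(m[4].str())};
}
```

Feeding it each line of a saved log (for example via `std::getline`) leaves only the two error entries from the log above; scheduler and rescheduling lines return `std::nullopt`.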
sslickerson · Joined: 14 Oct 05 · Posts: 101 · Credit: 578,497 · RAC: 0

> Thanks for trying that out Tim. I think we've learned something there.

Yes, the same happened to me. At the time I was doing this experiment I also had a RALPH WU running. Normally, with both cores running, each gets 50% with very little variation. Instead, the Rosetta WU held 90%+ while RALPH got 10%, at times even 0, until I killed the process. Then it was back to normal (minus one perfectly good WU)...
Christoph Jansen · Joined: 6 Jun 06 · Posts: 248 · Credit: 267,153 · RAC: 0

Applications run flawlessly. But when I open the screensaver, the graphics sometimes become unresponsive and have to be terminated from Task Manager. The model then restarts and all is well. Just now I tried to solve it by simply waiting for the graphics to respond again, and this is what happened:

08.12.2006 09:35:45|rosetta@home|Unrecoverable error for result FRA_t103_test_LARS_constraints_oldfrags_barcode_enforced_hom001_12_IGNORE_THE_RESTS_00001_0010849_0.pdb_1429_272_0 ( - exit code -1073741819 (0xc0000005))
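The client prints each exit code twice: as a signed decimal and as the same 32 bits in hex. A minimal C++ sketch of that conversion, assuming 32-bit Windows exit codes; the lookup gives the Windows SDK (ntstatus.h) names for two codes that appear in this thread. A name only says what the value means, not what caused it here, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// BOINC prints an exit code as a signed decimal followed by the same 32 bits
// in hex. Converting the signed value to uint32_t (well-defined modular
// conversion) gives the hex form.
std::string exit_code_hex(long long code) {
    const auto bits = static_cast<std::uint32_t>(code);
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%08x", static_cast<unsigned>(bits));
    return buf;
}

// Names for two codes seen in this thread, as defined in the Windows SDK
// header ntstatus.h. The name says what the value means, not why it occurred.
const char* ntstatus_name(long long code) {
    switch (static_cast<std::uint32_t>(code)) {
        case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
        case 0x40010004u: return "DBG_TERMINATE_PROCESS";
        default:          return "(not in this table)";
    }
}
```

So -1073741819 above is 0xc0000005, an access violation, while 1073807364 (0x40010004) is the code Feet1st reports after ending the frozen process from Task Manager.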
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

Interesting that there was mention of a problem with the 5.7.x series compared to the 5.6.5 client, since there were no changes to the screensaver code between them. They are pretty much the same boinc.exe client, but I will check. I had my first error in a long time today, but it had nothing to do with graphics, since it was running in the background (standard launch from BOINC Manager, i.e. not as a service). It isn't in my list yet as it hasn't been reported, but it's a similar error to the one above.

Team mauisun.org
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

I had this happen this morning: the screensaver was running and QMC was being displayed. It was nearly finished, about 99.7% done (just a few minutes to go), when its 10-minute time slice was up. The next graphics shown were Rosetta's (I think it was actually Ralph, but also version 5.41). For a couple of minutes all went well; then the QMC WU finished, and the Rosetta graphics froze. I was able to regain control of the machine with Ctrl-Alt-Del to get Task Manager, and the taskbar also showed up. I selected BOINC Manager from the taskbar and clicked Close, then went to the system tray and closed the BOINC CC (core client). After restarting BOINC, the Ralph/Rosetta WU did not error out, but it restarted from the beginning. Sometimes I can't get control back, and I just have to hit the Power button. This also keeps the Rosetta WU from erroring out; it goes back to the last checkpoint or restarts.

The interesting thing here is that the Rosetta graphics froze when the BOINC CC had to deal with a different project's WU finishing and then had to start up a new WU. I run a lot of Spinhenge, and those WUs are short, usually only 35 minutes. This gives Rosetta a lot of opportunities to crash on my system. I'm currently running BOINC CC version 5.6.5, waiting for 5.7.6, which is supposed to have a fix in this area.

Edit: Rosetta graphics seem to run fine on the Pentium M laptop. No other projects are running there.
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

I've compiled you a pre-5.7.6 build based on what is currently in the code at the time of writing. (Note: it has SSE2 turned on, along with some other options; some of them, I don't know which, make the BoincGUI/SimpleGUI snappier for me. Seeing as I use it only for Rosetta or CPDN, that doesn't matter. It will bugger up QMC credit, but then it'll be nothing like what a lot of people claim there. No idea about Spinhenge.) This build, by the way, also adds two more changes (among some others) to what I mentioned a few posts above, which 'sound' related to some of our problems:

David 7 Dec 2006
- core client: add "bool restart" arg to kill_task. If true, the process is killed but we arrange to restart it again, instead of erroring out the result. (used when the app is killed because it doesn't respond to stop-screensaver-graphics message) This completes the fix from yesterday.
- core client: changed screensaver-mode ack timeout from 2 sec to 3 sec; added some debug messages
- API: add bool g_sleep: if you set this to true, timer activities stop (simulate application freezing up).

api/ boinc_api.C,h windows_opengl.C
client/ app.C,h app_control.C app_graphics.C gui_rpc_server_ops.C

David 8 Dec 2006
- API: fixed nasty bug that can result in application being both suspended (worker thread not running) and in a critical section (timer thread ignores messages to wake up worker thread). This is a deadlock; the app will never progress. The problem: bool in_critical_section needs to be declared volatile because it's used by both threads. Why didn't I listen to Bruce Allen when he told me to do this a long time ago?
- Core client: deal with apps that stop accepting process control messages (due to the above bug). Several parts to this:
  - Add a timeout to process control message queue. If 180 seconds elapse with an unread process control message in the send buffer, kill and restart the app. Note: when a process is checkpointing it doesn't handle process control messages, so this timeout needs to be large enough to handle the longest possible checkpoint. I think 180 should be large enough.
  - Initialize message queues on app (re)start.
  - MSG_QUEUE::msg_queue_purge() was conceptually messed up. We don't want to purge ALL the messages of the opposite type, just the one at the tail of the queue.

Whew! This one was exhausting.

api/ boinc_api.C
client/ app.C app_control.C app_start.C
lib/ app_ipc.h

WARNING: since I built these myself and the changes have only just gone into the code, David Anderson is probably the only one who has actually tested them!
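The two 8 Dec changes can be illustrated with a small C++ sketch. This is not BOINC's code: `AppSharedState` and `ControlChannel` are hypothetical names. The changelog fixes the shared flag by declaring it `volatile`; in standard C++11 and later, `std::atomic<bool>` is the portable way to make a flag written by one thread visible to another, so the sketch uses that instead. The timeout check mirrors the rule described above: if a control message sits unread for more than 180 seconds, kill and restart the app.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <optional>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical: state shared between an app's worker thread and timer thread.
// The worker sets the flag around code that must not be interrupted; the timer
// thread reads it before acting on suspend/resume requests. std::atomic makes
// each thread see the other's writes (a plain bool gives no such guarantee).
struct AppSharedState {
    std::atomic<bool> in_critical_section{false};
};

// Hypothetical client-side view of the process-control channel to one app.
class ControlChannel {
public:
    explicit ControlChannel(std::chrono::seconds timeout) : timeout_(timeout) {}

    // Client posts a message such as "suspend" or "resume". The clock starts
    // at the first unread message, so newer messages cannot postpone the timeout.
    void send(const std::string& msg, Clock::time_point now) {
        if (!pending_) {
            sent_at_ = now;
        }
        pending_ = msg;
    }

    // App has read the pending message.
    void mark_read() { pending_.reset(); }

    // True once a message has sat unread for longer than the timeout: the
    // client should then kill and restart the app rather than wait forever.
    bool should_restart(Clock::time_point now) const {
        return pending_.has_value() && (now - sent_at_) > timeout_;
    }

private:
    std::chrono::seconds timeout_;
    std::optional<std::string> pending_;
    Clock::time_point sent_at_{};
};
```

The timeout has to exceed the longest checkpoint, because (per the changelog) an app does not read control messages while it is checkpointing; 180 s is the value chosen there.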
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

Thanks, Fluffy, I will install 5.7.5 again and substitute your .exe's. We'll see how this goes. This morning, BTW, I found the machine completely frozen; not even the Power button would work (except, of course, by holding it for 5 seconds until the machine shut off). When I restarted, there were two Ralph WUs in the queue, along with the ever-present CPDN, an Einstein and a Spinhenge. I may get a chance to see whether this helps when the Spinhenge finishes soon.
----
Edit: Tried it. BoincMgr immediately errors out, perhaps a version mismatch. Boinc.exe by itself also errors out. No actual BOINC activity here, just a Windows message box. I'll continue running with 5.7.5 for now.
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

> Thanks, Fluffy, I will install 5.7.5 again and substitute your .exe's. We'll see how this goes.

It works fine for me just substituting directly for the 5.7.5 files, on Win Vista with an Athlon64 and on WinXP with a Pentium-M. All 3 files need to be replaced (you may also need to replace /windows/boinc.scr with the new boinc.scr file if you run via the screensaver). From your list, all of them can use SSE2 (also, the benchmarks are a good match :-)).
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

Trying again:
- Remember to shut BOINC down completely.
- Copy all 4 files (except boinc.scr) over a 5.7.5 installation.
- Copy boinc.scr over the one in the Windows directory.

Standard compile · SSE2 (makes it a bit 'snappier')
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

Hi Fluffy, OK, trying it again, using the "standard compile" version. This one also includes boincmgr.exe. Your SSE2 link is broken at the moment. Installing over 5.7.5 as before; of course, I've shut down BOINC completely, verified that no BOINC-related processes are still running, and renamed all the original files as a backup.
----
Nope, no joy. Here is what it says:
"C:\Program Files\BOINC\boincmgr.exe"
"This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem."
----
The original 5.7.5 files are back in place and everything works again. My system is a dual Xeon 3.06 GHz (Prestonia) with HT enabled. It supports MMX, SSE and SSE2. I don't know if that is incompatible with what you are building. This is the system in question.
Feet1st · Joined: 30 Dec 05 · Posts: 1755 · Credit: 4,690,520 · RAC: 0

Yeah, this is really weird and, I believe, a key clue to the screensaver problems. When my Windows PC locks up, the graphics show no time advancing, and my hyperthreaded PC has been set to run on only one CPU. But when I press Ctrl-Alt-Del and get to Task Manager, I see rosetta_5.41_windows_intelx86.exe consuming 99% of my CPU(s). This MUST mean there are multiple threads in that process; otherwise, how could it use both CPUs? I also noticed today that the CPU time shown advances at twice the wall-clock rate. In other words, I heard my clock tick 10 times and saw the task accumulate 20 seconds of CPU time.

The only way I've found to get control back and be able to display anything on my PC is to end the rosetta .exe, which is shown on the Applications tab as "not responding". This then results in the WU failing with exit status 1073807364 (0x40010004). This is the host I'm talking about.
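The inference above is simple arithmetic: one thread can accumulate at most one second of CPU time per second of wall-clock time, so 20 s of CPU time in 10 s of wall time means at least two threads were running at once. A minimal C++ sketch of that lower bound; the function name and the small tolerance for timing jitter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Lower bound on how many threads of a process were running concurrently,
// given the CPU time it accumulated over an interval of wall-clock time.
// One thread contributes at most 1 s of CPU time per wall-clock second.
// `tolerance` absorbs small measurement jitter (e.g. 10.2 s of CPU in 10 s
// of wall time still counts as one thread).
int min_concurrent_threads(double cpu_seconds, double wall_seconds,
                           double tolerance = 0.05) {
    if (wall_seconds <= 0.0 || cpu_seconds <= 0.0) {
        return 0;  // no interval or no CPU use: nothing to infer
    }
    const double ratio = cpu_seconds / wall_seconds;
    return std::max(1, static_cast<int>(std::ceil(ratio - tolerance)));
}
```

With the figures from the post, `min_concurrent_threads(20.0, 10.0)` returns 2, matching the conclusion that more than one thread in that process was busy.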
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

Feet1st, when you say it's been set to run on only one CPU, do you mean BOINC is only using one CPU, or did you turn off Hyperthreading in the BIOS? (I'm assuming from the rest of your post that it's BOINC that is only using one CPU.) I think BOINC will only run one project, but if the project is multithreaded (multiprocess, actually; the screensaver is a different process), it can use both halves of your CPU.

When you press Ctrl-Alt-Del, can you see the taskbar? If so, you should be able to right-click the BOINC icon and select "Exit", which will not cause the WU to be lost. You will, however, revert to the last checkpoint on whatever you were running.

Now, interestingly, a processor without hyperthreading doesn't seem to have problems. It must, however, divide its single thread of execution between the Rosetta app and the graphics, time-slicing them. A hyperthreaded or dual processor can run both simultaneously. Perhaps there is something in the Rosetta app or graphics that is not thread-safe.
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

As an additional piece of info, the NVIDIA vs. ATI testing that I've been conducting in the middle of this has shown no clear winner. Both cards run the graphics just fine on all my current projects, though my list is by no means complete. Rosetta/Ralph screensaver graphics crash equally well on either. I had thought that maybe one vendor's OpenGL library might be more bulletproof than the other's, but both seem to act similarly. If anything, the ATI can get its display settings messed up when the crash happens, making it difficult or impossible to read the screen. This may be a problem with the particular driver version I'm using, though, and not the card itself. I've been using 6-11_xp-2k_dd_37616, downloaded from AMD/ATI just a couple of days ago. For NVIDIA, I'm using 93.71, their current release.
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

genes, I have no idea why it does not work; it has worked on all of mine so far... The only thing I lose is the XP theming. Guess I missed some option somewhere.
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

> genes, I have no idea why it does not work, it has worked on all of mine so far...

Seems some files got mixed up, so I grabbed them again. This also reset all the options, so now it is a 'vanilla' compile; I have not touched a thing. Let's try again. If this does not work, have a beer :-) (I still have no idea why the previous build worked on mine but not yours.)
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

> genes, I have no idea why it does not work, it has worked on all of mine so far...

Well, thanks for your efforts. I guess I'll just wait for 5.7.6 to come out officially :-)
----
Edit: wait, wait, another try? OK, sure, I'll bite. I'll have the beer ready, though, just in case.
ACK. I can see the address your link is pointing to, but when I click it, I get a 404. (Your "standard compile" link from before still works, though.)
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

> genes, I have no idea why it does not work, it has worked on all of mine so far...

Er, yes, try the link again... that'll teach me to test my link first... and to actually put the files there. Also, time for me to sleep. I'll probably get bored tomorrow and play around with getting wxWidgets to compile, and then try to get it to compile against BOINC. It teaches me how to do these sorts of things.
genes · Joined: 8 Oct 05 · Posts: 60 · Credit: 700,874 · RAC: 909

*Sigh* Still no joy, Fluffy. I'm having the beer, though!
FluffyChicken · Joined: 1 Nov 05 · Posts: 1260 · Credit: 369,635 · RAC: 0

> *Sigh* Still no joy, Fluffy. I'm having the beer, though!

Well, they all work fine for me on all my computers. Hope the beer is good, though. I honestly have no idea...
©2024 University of Washington
https://www.bakerlab.org