Message boards : Number crunching : Jobs lock up and never finish
Author | Message |
---|---|
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
Jobs such as these never finish. In the BOINC client they look frozen, but the job doesn't utilize any CPU. I would guess that between 5% and 10% of the jobs do this. I'm running the 64-bit BOINC client on 64-bit Windows 7. Anybody else having problems like this, or know what may be wrong? t374__boinc_filtered_loopbuild_threading_cst_lb_tex_IGNORE_THE_REST_16900_5733_0 t365__boinc_filtered_loopbuild_threading_cst_all_tex_IGNORE_THE_REST_16902_5996_0 lr15clus_opt_.1eyv.1eyv.IGNORE_THE_REST.c.10.2.pdb.pdb.JOB_17448_1_0 |
Admin Joined: 13 Apr 07 Posts: 42 Credit: 260,782 RAC: 0 |
Yes, this has been a common problem for some. I would check the Rosetta 2.05 topic. I've been having a lot of trouble with this type of work unit lately. |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
"Jobs such as these never finish. In the BOINC client they look frozen, but the job doesn't utilize any CPU." I have had a few of these. I find that if you either suspend that work unit and then reactivate it, or exit BOINC and restart it, the work unit then usually completes normally. |
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
"Jobs such as these never finish. In the BOINC client they look frozen, but the job doesn't utilize any CPU." I'll try restarting the client and see. In the past I've just aborted the frozen task and let the next one in the queue kick off. |
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
"Jobs such as these never finish. In the BOINC client they look frozen, but the job doesn't utilize any CPU." I've upgraded to the 6.10.32 client and haven't seen the problem again yet. I'll keep my fingers crossed. |
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
"Jobs such as these never finish. In the BOINC client they look frozen, but the job doesn't utilize any CPU." The 6.10.32 client still has the same problem. Restarting the client does make the frozen jobs continue, but I've had jobs that I had to reset the client for several times before they finished. It would be cool if the client could detect this problem and take the appropriate "reset" action to make the job continue. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"It would be cool if the client could detect this problem and take the appropriate "reset" action to make the job continue progress." That is exactly what the "watchdog" that you hear me refer to all the time is designed to do. Unfortunately, whatever is preventing the active Rosetta thread from getting CPU also seems to prevent the watchdog thread from getting CPU, so it never gets a chance to do its thing. Rosetta Moderator: Mod.Sense |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Do the people with this issue see the WU start and then "freeze", or does it never really start? Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
"Do the people with this issue see the wu start and then "freeze" or do they never really start?" It appears that they actually do start. They progress and the percentage increases, during which they are utilizing CPU. Then, when I check on one, the time-to-completion number has gone up to 8, 10, or more hours and the running time is way over the normal 3 to 3 1/2 hours to completion. Task Manager shows that the job is in memory but not utilizing any CPU. I wonder if it's an issue with anti-virus; however, my anti-virus program isn't giving me any warnings. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I upgraded to 6.10.32 as suggested, but the problem persists on W7. Shutting down and restarting BOINC gets things going again. I doubt it's anti-virus related: I don't have one on my machine. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,626 RAC: 3,201 |
"I upgraded to 6.10.32 as suggested, but the problem persists on W7. Shutting down and restarting BOINC gets things going again. I doubt it's anti-virus related: I don't have one on my machine." Do you guys have the setting to keep tasks in memory when suspended set to yes? "Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes')" under Your Account, Computing Preferences, Processor Usage. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Okay, that sounds different. At Docking@Home we have been chasing a problem of WUs that are not running and never seem to even start. One of the threads concerning the issue is here: Sad stories at Docking! I wondered if it was a similar or the same problem. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Yes, I do. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,626 RAC: 3,201 |
DARN, this has helped fix this in the past. |
Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0 |
I had this problem as well; nothing really helped. It looked a bit like a RAM issue or just an "another application" issue: the WU was being crunched more and more heavily, and at some moment (e.g. when switching to some app or killing the browser) this would happen. For me it was like a quiet BOINC/Rosetta crash. That was the final straw for me for some time. I'll wait with Rosetta until there are some new WUs or, preferably, a new computer. :) Rosie seems to want more than I can always provide. |
mdillenk Joined: 19 Feb 06 Posts: 8 Credit: 865,454 RAC: 0 |
"I had this problem as well; nothing really helped. It looked a bit like a RAM issue or just an 'another application' issue: the WU was being crunched more and more heavily, and at some moment (e.g. when switching to some app or killing the browser) this would happen. For me it was like a quiet BOINC/Rosetta crash." My latest tweak has been to turn the max allowed memory use up to 90% when the computer is in use. By default it is 90% for when the computer is idle, so I turned it up to 90% for when it is in use as well. One of the jobs that froze took up 500 MB of RAM when I restarted it, so maybe the client is freezing the job because it's taking up too much memory when the computer goes from "idle" to "in use". |
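
For anyone who wants to check for this symptom outside the client, here is a minimal sketch of the idea discussed in the thread: a task counts as "frozen" when wall-clock time keeps advancing but its cumulative CPU time does not. This is not how the Rosetta watchdog or the BOINC client works; the names `Sample` and `is_stalled`, the 10-minute window, and the 1-second threshold are all assumptions made for illustration. Collecting the samples (for example by reading the task's CPU time from Task Manager or a script) is left out, and per Mod.Sense's point the check would have to run in a separate process so it is not starved along with the task.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One observation of a running task (hypothetical structure)."""
    wall_s: float  # wall-clock time of the observation, in seconds
    cpu_s: float   # cumulative CPU time the task had used by then, in seconds


def is_stalled(samples: Sequence[Sample],
               window_s: float = 600.0,
               min_cpu_s: float = 1.0) -> bool:
    """Return True if the task gained less than min_cpu_s of CPU time
    over at least window_s of wall-clock time ending at the last sample.

    Samples are assumed to be ordered from oldest to newest.
    """
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Walk backwards to the most recent sample that is at least
    # window_s older than the latest one; that is the baseline.
    for earlier in reversed(samples[:-1]):
        if latest.wall_s - earlier.wall_s >= window_s:
            return (latest.cpu_s - earlier.cpu_s) < min_cpu_s
    # Not enough history yet to cover the whole window.
    return False


if __name__ == "__main__":
    healthy = [Sample(0, 0), Sample(300, 290), Sample(600, 580)]
    frozen = [Sample(0, 0), Sample(300, 290),
              Sample(600, 290.2), Sample(900, 290.3)]
    print("healthy stalled?", is_stalled(healthy))
    print("frozen stalled?", is_stalled(frozen))
```

A check like this only flags the condition; what to do next (suspend and resume the task, or restart the client, as suggested earlier in the thread) is a separate decision.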
©2024 University of Washington
https://www.bakerlab.org