Jobs lock up and never finish

Message boards : Number crunching : Jobs lock up and never finish

mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65196 - Posted: 4 Feb 2010, 2:32:55 UTC

Jobs such as these never finish: in the BOINC client they look frozen, but the job doesn't use any CPU. I would guess that between 5% and 10% of jobs do this. I'm running the 64-bit BOINC client on 64-bit Windows 7. Is anybody else having problems like this, or does anyone know what may be wrong?

t374__boinc_filtered_loopbuild_threading_cst_lb_tex_IGNORE_THE_REST_16900_5733_0


t365__boinc_filtered_loopbuild_threading_cst_all_tex_IGNORE_THE_REST_16902_5996_0

lr15clus_opt_.1eyv.1eyv.IGNORE_THE_REST.c.10.2.pdb.pdb.JOB_17448_1_0
Admin

Joined: 13 Apr 07
Posts: 42
Credit: 260,782
RAC: 0
Message 65204 - Posted: 4 Feb 2010, 14:15:10 UTC

Yes, this has been a common problem for some. I would check the Rosetta 2.05 topic. I've been having a lot of trouble with these types of work units lately.
Evan

Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 65205 - Posted: 4 Feb 2010, 15:06:02 UTC

Jobs such as these never finish: in the BOINC client they look frozen, but the job doesn't use any CPU


I have had a few of these. I find that if you either suspend the work unit and then reactivate it, or exit BOINC and restart it, the work unit usually completes normally.
mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65209 - Posted: 4 Feb 2010, 20:52:48 UTC - in response to Message 65205.  

I have had a few of these. I find that if you either suspend the work unit and then reactivate it, or exit BOINC and restart it, the work unit usually completes normally.


I'll try restarting the client and see. In the past I've just aborted the frozen task and let the next one in the queue kick off.
mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65212 - Posted: 5 Feb 2010, 1:11:47 UTC - in response to Message 65209.  

I'll try restarting the client and see. In the past I've just aborted the frozen task and let the next one in the queue kick off.


I've upgraded to the 6.10.32 client and haven't seen the problem again yet. I'll keep my fingers crossed.
mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65241 - Posted: 8 Feb 2010, 22:34:43 UTC - in response to Message 65212.  

I've upgraded to the 6.10.32 client and haven't seen the problem again yet. I'll keep my fingers crossed.


The 6.10.32 client still has the same problem. Restarting the client does make the frozen jobs continue progress, but I've had jobs for which I had to reset the client several times before they finished. It would be cool if the client could detect this problem and take the appropriate "reset" action to make the job continue progress.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65243 - Posted: 9 Feb 2010, 3:57:57 UTC
Last modified: 9 Feb 2010, 4:12:47 UTC

It would be cool if the client could detect this problem and take the appropriate "reset" action to make the job continue progress.


That is exactly what the "watchdog" that you hear me refer to all the time is designed to do. Unfortunately, whatever is preventing the active Rosetta thread from getting CPU also seems to prevent the watchdog thread from getting CPU, so it never gets a chance to do its thing.
Rosetta Moderator: Mod.Sense
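To make the watchdog idea concrete, here is a minimal Python sketch of a heartbeat watchdog. It is not Rosetta's actual watchdog: the class name `Watchdog`, its `heartbeat()`/`is_stalled()` methods and the `timeout_s` parameter are hypothetical. The worker reports progress, and a supervisor treats a long silence as a stall that warrants a restart.

```python
import time


class Watchdog:
    """Minimal heartbeat watchdog (hypothetical sketch, not Rosetta's code).

    The worker calls heartbeat() whenever it makes progress; a supervisor
    calls is_stalled() periodically and would trigger a restart when the
    last heartbeat is older than `timeout_s` seconds.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_beat = clock()

    def heartbeat(self):
        # Record that the worker has just made progress.
        self.last_beat = self.clock()

    def is_stalled(self):
        # Stalled if no progress has been reported within the timeout window.
        return self.clock() - self.last_beat > self.timeout_s


if __name__ == "__main__":
    fake_now = [0.0]
    wd = Watchdog(timeout_s=600, clock=lambda: fake_now[0])
    fake_now[0] = 300.0
    print(wd.is_stalled())  # False: 5 minutes since the last heartbeat
    fake_now[0] = 900.0
    print(wd.is_stalled())  # True: 15 minutes with no heartbeat
    wd.heartbeat()
    print(wd.is_stalled())  # False again after fresh progress
```

Note the limitation described in this post: `is_stalled()` only helps if the code calling it actually gets scheduled. If the whole process is starved of CPU, a check running in a separate process (or in the BOINC client itself) would presumably be more robust.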
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 65253 - Posted: 10 Feb 2010, 0:42:15 UTC

Do the people with this issue see the WU start and then "freeze", or do they never really start?
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65256 - Posted: 10 Feb 2010, 5:54:56 UTC - in response to Message 65253.  

Do the people with this issue see the WU start and then "freeze", or do they never really start?


It appears that they actually do start. They progress (the percentage increases), and they use CPU while doing so. Then when I check on one, the time-to-completion estimate has gone up to 8, 10 or more hours, and the elapsed time is way over the normal 3 to 3.5 hours to completion. Task Manager shows that the job is in memory but not using any CPU. I wonder if it's an anti-virus issue; however, my anti-virus program isn't giving me any warnings.
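The symptom described here (task resident in memory, CPU time no longer increasing) can be checked mechanically by comparing two CPU-time snapshots taken a few minutes apart. A minimal Python sketch, assuming you already have a mapping of task name to cumulative CPU seconds from some source (Task Manager, a process monitor, etc.); the function `find_stalled`, the task names and the 1-second threshold are hypothetical:

```python
def find_stalled(before, after, min_cpu_delta_s=1.0):
    """Names of tasks whose CPU time barely moved between two snapshots.

    `before` and `after` map task name -> cumulative CPU seconds. How the
    snapshots are collected is left open (an assumption of this sketch).
    """
    stalled = []
    for name, cpu_after in after.items():
        cpu_before = before.get(name)
        if cpu_before is None:
            continue  # task appeared between snapshots; nothing to compare
        if cpu_after - cpu_before < min_cpu_delta_s:
            stalled.append(name)
    return sorted(stalled)


if __name__ == "__main__":
    # Two snapshots taken five minutes apart (made-up numbers).
    before = {"task_a": 7200.0, "task_b": 11000.0}
    after = {"task_a": 7500.0, "task_b": 11000.2}
    print(find_stalled(before, after))  # ['task_b']
```

Only compare tasks the client reports as running: a task that BOINC itself has suspended also shows no CPU growth, and that is expected.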

svincent

Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 65257 - Posted: 10 Feb 2010, 7:20:39 UTC

I upgraded to 6.10.32 as suggested, but the problem persists on W7. Shutting down and restarting BOINC gets things going again. I doubt it's anti-virus related: I don't have one on my machine.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65260 - Posted: 10 Feb 2010, 12:06:25 UTC - in response to Message 65257.  

I upgraded to 6.10.32 as suggested, but the problem persists on W7. Shutting down and restarting BOINC gets things going again. I doubt it's anti-virus related: I don't have one on my machine.


Do you guys have the setting to keep tasks in memory when suspended set to yes?
"Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes')" under Your Account, Computing Preferences, Processor Usage.
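For context on what this setting changes: roughly, with it set to 'no', a suspended task is removed from memory and later restarts from its last checkpoint, so any work done since that checkpoint is redone. The generic checkpoint/resume technique can be sketched in Python as below. It is not BOINC's or Rosetta's checkpoint code; the JSON file format, the function names and the `stop_after` parameter are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` so a crash mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0}


def run(path, total_steps, checkpoint_every=10, stop_after=None):
    """Hypothetical work loop that resumes from the last checkpoint.

    `stop_after` simulates the process being removed from memory after
    that many steps in this run (e.g. suspended with the setting at 'no').
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run == stop_after:
            return state["step"]  # stopped: progress since last checkpoint is lost
        state["step"] += 1  # one unit of (simulated) work
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["step"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(run(ckpt, total_steps=100, stop_after=25))  # 25 (stopped mid-run)
    print(load_checkpoint(ckpt))                      # {'step': 20}
    print(run(ckpt, total_steps=100))                 # 100 (resumed from step 20)
```

With checkpoints every 10 steps, a run stopped at step 25 leaves step 20 on disk, so the next run repeats steps 21 to 25. The same resume-from-checkpoint behaviour is what lets a task carry on after BOINC is restarted.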
adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 65262 - Posted: 10 Feb 2010, 15:26:38 UTC

Okay, that sounds different. At Docking@Home we have been chasing a problem of WUs not running and never even seeming to start; one of the threads about the issue is here:

Sad stories at Docking!

I wondered if it was a similar or the same problem.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
svincent

Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 65265 - Posted: 10 Feb 2010, 17:15:09 UTC - in response to Message 65260.  


Do you guys have the setting to keep tasks in memory when suspended set to yes?
"Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes')" under Your Account, Computing Preferences, Processor Usage.


Yes I do
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65273 - Posted: 11 Feb 2010, 10:53:08 UTC - in response to Message 65265.  


Yes I do


DARN, this has helped fix this in the past.
Aegis Maelstrom

Joined: 29 Oct 08
Posts: 61
Credit: 2,137,555
RAC: 0
Message 65284 - Posted: 11 Feb 2010, 15:38:59 UTC
Last modified: 11 Feb 2010, 15:39:46 UTC

I had this problem as well, and nothing really helped. It looked a bit like a RAM issue or just an "another application" issue: the WU was being crunched more and more heavily, and at some moment (e.g. switching to another app, killing the browser, etc.) this happens. For me it was like a quiet BOINC/Rosetta crash.

That was the final straw for me for a while. I'll hold off on Rosetta until there are new WUs or, preferably, until I get a new computer. :) Rosie seems to want more than I can always provide.
mdillenk

Joined: 19 Feb 06
Posts: 8
Credit: 865,454
RAC: 0
Message 65308 - Posted: 13 Feb 2010, 22:30:05 UTC - in response to Message 65284.  

I had this problem as well, and nothing really helped. It looked a bit like a RAM issue or just an "another application" issue: the WU was being crunched more and more heavily, and at some moment (e.g. switching to another app, killing the browser, etc.) this happens. For me it was like a quiet BOINC/Rosetta crash.

That was the final straw for me for a while. I'll hold off on Rosetta until there are new WUs or, preferably, until I get a new computer. :) Rosie seems to want more than I can always provide.



My latest tweak has been to raise the maximum allowed memory use while the computer is in use to 90%. By default it is 90% when the computer is idle, so now both settings are 90%. One of the jobs that froze took up 500 MB of RAM when I restarted it, so maybe the client is freezing the job because it's using too much memory when the computer goes from "idle" to "in use".
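This hypothesis is easier to reason about with a little arithmetic: the percentage preference caps the RAM that tasks may use, so a lower cap when the computer switches from idle to in use can leave a large task over the limit. A simplified Python model follows. The 2048 MB machine, the 50% in-use limit and the task sizes (two of them matching the 500 MB job mentioned above) are illustrative assumptions, the function names are hypothetical, and the "skip what doesn't fit" rule is not BOINC's actual scheduling logic.

```python
def ram_limit_mb(total_ram_mb, max_used_pct):
    """RAM (in MB) that tasks may use under a percentage preference."""
    return total_ram_mb * max_used_pct / 100.0


def tasks_that_fit(tasks, total_ram_mb, max_used_pct):
    """Names of tasks, taken in list order, that fit under the RAM limit.

    Simplified model (an assumption, not BOINC's scheduler): a task that
    would push usage over the limit is skipped, i.e. it has to wait.
    """
    limit = ram_limit_mb(total_ram_mb, max_used_pct)
    used = 0.0
    fitting = []
    for name, mb in tasks:
        if used + mb <= limit:
            used += mb
            fitting.append(name)
    return fitting


if __name__ == "__main__":
    # Illustrative numbers: a 2048 MB machine and three running tasks.
    tasks = [("a", 500), ("b", 500), ("c", 300)]
    print(ram_limit_mb(2048, 50))           # 1024.0
    print(tasks_that_fit(tasks, 2048, 90))  # ['a', 'b', 'c']  (idle, 90%)
    print(tasks_that_fit(tasks, 2048, 50))  # ['a', 'b']       (in use, 50%)
```

Setting the in-use limit equal to the idle one, as done here, removes that difference in this model; whether memory limits are really what freezes the jobs is still only a guess.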



©2024 University of Washington
https://www.bakerlab.org