What's with all the errors???

FernValleyIT

Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65056 - Posted: 21 Jan 2010, 23:11:07 UTC

Four machines - approx 2 out of 5 WU's error out. Another 1 out of 5 must be aborted by user to proceed. Less than 50% success rate. Reset project twice. No difference. Have now stopped getting WU's from Docking. Might be the switching back and forth. Will know more Friday.

Anyone else?

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,673,616
RAC: 11,118
Message 65059 - Posted: 22 Jan 2010, 8:54:46 UTC

Have you got 'leave applications in memory while suspended' checked? Also what's your switch time between projects? The longer the better...

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 28
Message 65061 - Posted: 22 Jan 2010, 9:55:15 UTC

I can see from my results that I have also seen errors, not many though. They fail after just a few seconds without intervention, and are failed by others as well. Not a big deal for me as I don't have to do anything, and the size of the download is not a problem, but could be for others.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65063 - Posted: 22 Jan 2010, 13:49:51 UTC

<< Have you got 'leave applications in memory while suspended' checked? Also what's your switch time between projects? The longer the better... >>

...did not have the memory thing checked. Switching set to 1 hour. Never had an issue. Not sure if they're related, but this behavior began after the holiday downtime, then ran okay for a couple weeks, and now has begun again. I'll give it some more time with the new prefs. Thanks!

FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65064 - Posted: 22 Jan 2010, 15:00:47 UTC - in response to Message 65063.  

...just wanted to add, this issue does not happen with Docking WU's. Thanks!

FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65066 - Posted: 22 Jan 2010, 18:44:50 UTC

No better. 18 out of the last 33 in error. What a waste. I'm detaching now. I'll watch for 2.06 and maybe come back then. Thanks.

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 65074 - Posted: 23 Jan 2010, 2:23:09 UTC - in response to Message 65066.  

<< No better. 18 out of the last 33 in error. What a waste. I'm detaching now. I'll watch for 2.06 and maybe come back then. Thanks. >>

Before you go, Roger, can you check your Boinc Manager preferences in the Advanced menu and go to the Processor Usage tab. If the "Use at most xx% CPU time" figure is less than 100% you get errors in WU characterised by the message "Can't acquire lockfile - exiting" in the WU details. I see these errors in your problem WUs.

After you change this, re-boot and try again.

If the lockfile problem persists, the zero-byte lockfiles are usually found in your Boinc\slots folder. Try to delete them manually. If they refuse to go, close down Boinc Manager, go into Task Manager and end all Boinc processes, then try again. They should go. Re-boot and try one last time.

In summary: 100% CPU usage in processor usage preferences, re-boot, ensure lockfiles have truly disappeared, re-boot and try one last time.

If it still doesn't work, I'm out of ideas. Hope it works.
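
For anyone who would rather script the lockfile check than hunt by hand, here is a minimal Python sketch. It assumes the lock file in each slot sub-folder is named `boinc_lockfile` and that the slots folder contains one sub-folder per running task; both are assumptions, so check your own installation first. The function names are hypothetical helpers, not part of Boinc.

```python
from pathlib import Path

# Assumed name of the per-slot lock file; verify it in your own slots folder.
LOCKFILE_NAME = "boinc_lockfile"


def find_stale_lockfiles(slots_dir):
    """Return zero-byte lock files found one level below slots_dir."""
    root = Path(slots_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.glob(f"*/{LOCKFILE_NAME}")
        if p.is_file() and p.stat().st_size == 0
    )


def remove_lockfiles(paths, dry_run=True):
    """Delete the given files; with dry_run=True they are only reported."""
    handled = []
    for p in paths:
        if not dry_run:
            p.unlink()
        handled.append(p)
    return handled
```

Shut down Boinc and end its processes before running this for real, then pass the path of your own slots folder to `find_stale_lockfiles` and look at the result with `dry_run=True` before deleting anything.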

FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65082 - Posted: 24 Jan 2010, 1:49:16 UTC - in response to Message 65074.  

<< In summary: 100% CPU usage in processor usage preferences, re-boot, ensure lockfiles have truly disappeared, re-boot and try one last time.

If it still doesn't work, I'm out of ideas. Hope it works. >>


Thanks Sid. I don't really want to run 100% 24/7. I'm throttled at 50 and would like to stay that way. Hopefully 2.06 will address this. Thanks again!

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 65083 - Posted: 24 Jan 2010, 5:47:59 UTC - in response to Message 65082.  
Last modified: 24 Jan 2010, 5:52:07 UTC

<< Thanks Sid. I don't really want to run 100% 24/7. I'm throttled at 50 and would like to stay that way. Hopefully 2.06 will address this. Thanks again! >>

That's ok if that's your intention, but be aware this is not a Rosetta issue, as I understand it, but a Boinc issue. It's just that Rosetta WUs seem to be most susceptible to falling over as a result. Version 2.07 (or whatever) won't solve it. I couldn't hazard a reason why, it just is. A new Boinc version might solve it, but I don't keep up with what they're working on so have no idea if it will.

While looking for this solution, previously given, I googled the phrase "can't acquire lockfile" and found reports of the same problem on the Seti project.

If you insist on running less than 100% then the errors will continue because of your choice.

Can I ask you just to switch to 100% for one day to confirm whether the problem clears for you? If it does, then you know you're in control of a solution yourself; if you then choose not to go with it, that's your own choice and you already know the outcome. Hopefully this one day will also confirm for you whether running at 100% is the problem you perceive or not.

I originally ran at 50% CPU, because I thought Boinc would slow my machine down, but when I switched to 100% to solve this very issue I didn't notice any slowdown at all.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 65085 - Posted: 24 Jan 2010, 11:22:10 UTC - in response to Message 65082.  

<< In summary: 100% CPU usage in processor usage preferences, re-boot, ensure lockfiles have truly disappeared, re-boot and try one last time.

If it still doesn't work, I'm out of ideas. Hope it works. >>

<< Thanks Sid. I don't really want to run 100% 24/7. I'm throttled at 50 and would like to stay that way. Hopefully 2.06 will address this. Thanks again! >>


Sid is right that this is a BOINC issue not restricted to Rosetta, but there is something else you can do on multiple-processor systems. If you currently allow BOINC to use 50% of CPU time on 100% of your processors, switch these numbers: allow 100% of CPU time but restrict BOINC to half of your cores. I believe this solution has worked for others.

Snags
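
The swap can also be written down as a local preferences override rather than clicked through the Manager. The Python sketch below builds such an override as a string. It assumes the file is called `global_prefs_override.xml` and that the two settings are stored as `<cpu_usage_limit>` (CPU time %) and `<max_ncpus_pct>` (% of processors) under a `<global_preferences>` root; treat those names as assumptions and compare them with the file your own BOINC version writes. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def swap_throttle(cpu_time_pct, ncpus_pct):
    """Exchange the two percentages: (CPU time %, processors %) in, swapped out."""
    return ncpus_pct, cpu_time_pct


def build_override(cpu_time_pct, ncpus_pct):
    """Return override XML text; tag names are assumed, not verified here."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_time_pct:.1f}"
    ET.SubElement(root, "max_ncpus_pct").text = f"{ncpus_pct:.1f}"
    return ET.tostring(root, encoding="unicode")


# Start from "50% of CPU time on 100% of processors" and swap the numbers.
new_cpu_time, new_ncpus = swap_throttle(50, 100)
override_xml = build_override(new_cpu_time, new_ncpus)
```

The sketch only produces text. Writing it into the BOINC data folder and telling the client to re-read its preferences is left to the reader, since those steps differ between installations.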

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65086 - Posted: 24 Jan 2010, 12:44:07 UTC - in response to Message 65082.  

<< In summary: 100% CPU usage in processor usage preferences, re-boot, ensure lockfiles have truly disappeared, re-boot and try one last time.

If it still doesn't work, I'm out of ideas. Hope it works. >>

<< Thanks Sid. I don't really want to run 100% 24/7. I'm throttled at 50 and would like to stay that way. Hopefully 2.06 will address this. Thanks again! >>


You do know that the 50% doesn't mean your cpu only uses 50% of itself, right? It just means Boinc only runs 50% of the time available to the cpu. It runs at 100% but only 50% of the time, it does not just use 50% of the cpu at any one moment. It might be better to set it to 100% and then change how Boinc runs on your pc. In Boinc Manager, Advanced, Preferences, you can set Boinc to only run during certain hours of the day, you can set it to only run after the machine has been idle for x amount of time, and you can set it to use less than 100% of the processors, meaning you can set it to only use 1 core of a dual core machine, or 3 cores of a quad core etc. You can do this thru Boinc Manager, and then it is machine specific, or you can do it on the webpage and it will be globally set for all your pc's. I have 16 pc's running right now so I do it thru Boinc Manager on each pc. In the end though it is your choice, as are the projects you choose to crunch for.
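
To picture the "100% but only 50% of the time" behaviour, here is a toy Python sketch of an on/off throttle. The 2-second cycle is an arbitrary assumption chosen to keep the numbers readable; the real switching interval is not stated in this thread, and `throttle_trace` is a hypothetical name.

```python
def throttle_trace(cpu_time_pct, seconds, period=2):
    """Per-second load (0 or 100) for a simple on/off throttle.

    The task runs flat out for the first cpu_time_pct of every period
    and is suspended for the rest. period is an assumed value.
    """
    on_time = period * cpu_time_pct / 100.0
    return [100 if (t % period) < on_time else 0 for t in range(seconds)]


trace = throttle_trace(50, 10)
average_load = sum(trace) / len(trace)
```

At a 50% setting the trace only ever contains 0 and 100, never 50; it is the average over the whole run that comes out at half.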

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,673,616
RAC: 11,118
Message 65087 - Posted: 24 Jan 2010, 12:48:57 UTC - in response to Message 65085.  


<< Sid is right that this is a BOINC issue not restricted to Rosetta, but there is something else you can do on multiple-processor systems. If you currently allow BOINC to use 50% of CPU time on 100% of your processors, switch these numbers: allow 100% of CPU time but restrict BOINC to half of your cores. I believe this solution has worked for others.

Snags >>

I'll 2nd that for netburst (P4) CPUs with HT turned on... Only using the real CPUs is better than 50% of all real and virtual CPUs...

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 65088 - Posted: 24 Jan 2010, 16:21:21 UTC - in response to Message 65086.  

<< You do know that the 50% doesn't mean your cpu only uses 50% of itself, right? It just means Boinc only runs 50% of the time available to the cpu. It runs at 100% but only 50% of the time, it does not just use 50% of the cpu at any one moment. >>

I run a sidebar gadget on Vista (System Monitor, I think) which shows the CPU usage of each of my 4 cores. When I was running at 50% CPU time it always confused me that every core was alternating 0% then 100% several times a second, every second, every minute, hour, day etc. So yes, you're right I think.

Madness, when I think about it now...

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,673,616
RAC: 11,118
Message 65089 - Posted: 24 Jan 2010, 18:51:49 UTC - in response to Message 65088.  

<< You do know that the 50% doesn't mean your cpu only uses 50% of itself, right? It just means Boinc only runs 50% of the time available to the cpu. It runs at 100% but only 50% of the time, it does not just use 50% of the cpu at any one moment. >>

<< I run a sidebar gadget on Vista (System Monitor, I think) which shows the CPU usage of each of my 4 cores. When I was running at 50% CPU time it always confused me that every core was alternating 0% then 100% several times a second, every second, every minute, hour, day etc. So yes, you're right I think.

Madness, when I think about it now... >>

It has to work like that on some time scale. The time-scale could be decreased to something so small that you wouldn't notice the fluctuations in real-time, but the devs might have opted against that because it might cause more cache-swapping and therefore reduce efficiency... I guess the most efficient way to do it is to have BOINC at 100% at all times and reduce the clock-rate to scale.

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 65092 - Posted: 25 Jan 2010, 0:51:54 UTC - in response to Message 65089.  

Errr, not sure about that. I guess I thought it would run constantly but put a ceiling of 50% on the processor usage directed toward Boinc/Rosetta. It was only when I understood that Boinc runs at low priority, so that anything and everything else that wanted to run went ahead of it, that I was reassured it was going to do what I actually wanted - i.e. not get in the way of any foreground apps I was running.

That said, mod.sense rightly pulled me up about the RAM Rosetta uses getting in the way of other apps, but I guess any adjustment to CPU time doesn't improve that side of things.

Saying that, I do limit the memory usage on the "disk and memory usage" tab while I'm running. I haven't heard about (nor had) any issues with that setting, though having 8GB RAM makes it a little redundant. I still use it to reduce any slowdown from using virtual RAM too much.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65096 - Posted: 25 Jan 2010, 4:25:38 UTC

The % of CPU time is basically used for BOINC to assure that it leaves at LEAST as much slack time as required to meet the setting. So if you run at say 70% it means BOINC will not even try to run for 30% of the time. And during the 70% of the time it is to run, it activates and places its chip in line for CPU, but higher priority tasks may not yield it. And as a result, BOINC may not actually get 70%, but something less than that. It is a rather rudimentary control, but it resolves the heat problems, which was, I believe, its major purpose in life.

So, what I'm saying is just that it isn't smart enough to figure out that it only actually got 5 seconds of CPU time during the last 7 seconds. And so it is going to go idle for the next 3 seconds, regardless of that fact. It really only would be noticeable on a busy machine ...which is perhaps why some wish to limit CPU% in the first place.

I wasn't sure which machine you were referring to. But it would indeed be more efficient to use 50% of the number of CPUs than to run 50% of the time. The reason being you would have half as many tasks active at any given time, and therefore be using much less memory. Ex: 4 CPUs running half of the CPUs would only begin 2 tasks. CPU should still run just as cool, but you'll only be using memory for 2. If you said to run all CPUs but only 50% of the time, 4 tasks will begin and all 4 will cycle on and off.
Rosetta Moderator: Mod.Sense
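
The two points above can be put into numbers with a small Python sketch. It is only a model of the description in this post, not BOINC's scheduler: the 10-second window, the `share_while_active` parameter and both function names are hypothetical, introduced purely for illustration.

```python
def cpu_seconds_received(cpu_time_pct, window=10.0, share_while_active=1.0):
    """CPU seconds a task actually gets in one window.

    The throttle idles for (100 - cpu_time_pct)% of the window no matter
    what. During the active part, higher-priority work may leave only a
    fraction (share_while_active) of the CPU for the task.
    """
    active_seconds = window * cpu_time_pct / 100.0
    return active_seconds * share_while_active


def tasks_started(total_cpus, ncpus_pct):
    """Tasks started (and held in memory) when limiting the number of CPUs."""
    return max(1, int(total_cpus * ncpus_pct / 100))


# 70% setting on a busy machine: active for 7 s, but only 5 s of CPU obtained.
busy = cpu_seconds_received(70, window=10.0, share_while_active=5 / 7)

# 4 CPUs: all CPUs at 50% of the time versus half the CPUs all of the time.
tasks_time_throttled = tasks_started(4, 100)
tasks_cpu_limited = tasks_started(4, 50)
```

In the busy case the task ends up with about 5 of the 10 seconds even though the setting says 70%, because the idle part is fixed and is not shortened to compensate. In the 4-CPU comparison the time-throttled setup starts four tasks and the CPU-limited one starts two, which is where the memory saving comes from.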

FernValleyIT
Joined: 1 Dec 05
Posts: 7
Credit: 84,334
RAC: 0
Message 65097 - Posted: 25 Jan 2010, 4:49:51 UTC - in response to Message 65096.  

<< The % of CPU time is basically >>


Well put. I was earlier going to comment about the memory thing. Load up a 4-core machine with 12-20 hour Docking WU's and you've used up 4GB memory. That's a major hit if you start swapping to hard drive. Not so bad with the smaller Rosetta WU's. The heat thing too is why I wanted to throttle back. So I'm back and put the number of processors at 50% working 100% of the time. Believe it or not, my core-2 spreads a nice smooth 30-70% across both cores, not just one. The 4-core machines use only 2 cores (one off of each processor) at about 80% each. Much smoother, better performance, and no errors yet. Thanks!

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65099 - Posted: 25 Jan 2010, 10:18:54 UTC - in response to Message 65097.  

<< The % of CPU time is basically >>

<< Well put. I was earlier going to comment about the memory thing. Load up a 4-core machine with 12-20 hour Docking WU's and you've used up 4GB memory. That's a major hit if you start swapping to hard drive. Not so bad with the smaller Rosetta WU's. The heat thing too is why I wanted to throttle back. So I'm back and put the number of processors at 50% working 100% of the time. Believe it or not, my core-2 spreads a nice smooth 30-70% across both cores, not just one. The 4-core machines use only 2 cores (one off of each processor) at about 80% each. Much smoother, better performance, and no errors yet. Thanks! >>


I almost hate to do this but.....you can't tell Boinc WHICH of the 2 processors to use, it uses any 2 it wants to, but does respect your 50% of the total. Now if you have HT that is a different story, but for 4 real live processors, Boinc will pick which 2 it uses. As for practical use, you should not notice any difference at all, you will now have 2 processors for your exclusive usage no matter what. If you needed to use only processors 1 and 3 for your own program for example, then you probably wouldn't be running Boinc anyway.

Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,249,734
RAC: 9,368
Message 65112 - Posted: 26 Jan 2010, 0:12:04 UTC - in response to Message 65097.  

<< The % of CPU time is basically... >>

<< Well put. I was earlier going to comment about the memory thing. Load up a 4-core machine with 12-20 hour Docking WU's and you've used up 4GB memory. That's a major hit if you start swapping to hard drive. Not so bad with the smaller Rosetta WU's. The heat thing too is why I wanted to throttle back. So I'm back and put the number of processors at 50% working 100% of the time. Believe it or not, my core-2 spreads a nice smooth 30-70% across both cores, not just one. The 4-core machines use only 2 cores (one off of each processor) at about 80% each. Much smoother, better performance, and no errors yet. Thanks! >>

I've just taken a look at each of your machines and they all have half a dozen or more completed WUs, all completed and validated successfully. Excellent news and a good discussion thread for this issue.

At one time I had this problem persist for months (literally) with no known solution. Now we've hit it quickly and, hopefully, have a happier user, rather than the discontented ones in the past.

Good job, guys.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65114 - Posted: 26 Jan 2010, 10:02:54 UTC - in response to Message 65112.  

<< The % of CPU time is basically... >>

<< Well put. I was earlier going to comment about the memory thing. Load up a 4-core machine with 12-20 hour Docking WU's and you've used up 4GB memory. That's a major hit if you start swapping to hard drive. Not so bad with the smaller Rosetta WU's. The heat thing too is why I wanted to throttle back. So I'm back and put the number of processors at 50% working 100% of the time. Believe it or not, my core-2 spreads a nice smooth 30-70% across both cores, not just one. The 4-core machines use only 2 cores (one off of each processor) at about 80% each. Much smoother, better performance, and no errors yet. Thanks! >>

<< I've just taken a look at each of your machines and they all have half a dozen or more completed WUs, all completed and validated successfully. Excellent news and a good discussion thread for this issue.

At one time I had this problem persist for months (literally) with no known solution. Now we've hit it quickly and, hopefully, have a happier user, rather than the discontented ones in the past.

Good job, guys. >>


Alright! I love it when in the end the User is happy and crunching!



©2024 University of Washington
https://www.bakerlab.org