Question about constant RAC

tiger

Joined: 16 Jul 06
Posts: 17
Credit: 1,083,385
RAC: 0
Message 63404 - Posted: 19 Sep 2009, 22:27:33 UTC

I don't mean a level that stays within a tight range; I mean a constant 6-significant-figure RAC. This fellow here:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1105953

Stays at 3088.71. After the most recent server downtime, where my three Q9550s went for hours with nothing to do, I increased the "additional work buffer" to the maximum of 10 days. But that is when the RAC of this one system seemed to freeze.

Any ideas?
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 63405 - Posted: 20 Sep 2009, 2:34:49 UTC - in response to Message 63404.  

At the moment it's 3,086.14.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63409 - Posted: 20 Sep 2009, 16:40:51 UTC
Last modified: 20 Sep 2009, 16:41:22 UTC

So your question is "why did they get work when I didn't?"

It just depends on when you hit the scheduler and how caught up the feeder is. The file server was overloaded for extended periods of time, causing various pieces of the process to have periods where they couldn't keep up with the 80,000 hosts that run R@h.

So increasing your work buffer is the main way to ensure you have enough work for such adverse events. Another way to improve your odds would be to increase your runtime preference. Now that you have a 10 day buffer, you'll want to ratchet that down before increasing runtime. Make any increases gradually over the course of days or you will find yourself with too much work to complete before the 10 day deadline. For example, if you have 6 days of work on hand and double your runtime preference, suddenly it becomes 12 days of work. That is, the change in runtime preference applies to work you already had waiting to run.
Rosetta Moderator: Mod.Sense
tiger

Joined: 16 Jul 06
Posts: 17
Credit: 1,083,385
RAC: 0
Message 63410 - Posted: 20 Sep 2009, 22:09:28 UTC - in response to Message 63409.  

So your question is "why did they get work when I didn't?"


No. I was marvelling at the constant RAC of one machine. It started at about the time I increased the additional work buffer. I was putting that out there in case someone knew whether that causes the RAC to be calculated differently, that's all.

Another way to improve your odds would be to increase your runtime preference. Now that you have a 10 day buffer, you'll want to ratchet that down before increasing runtime. Make any increases gradually over the course of days or you will find yourself with too much work to complete before the 10 day deadline. For example, if you have 6 days of work on hand and double your runtime preference, suddenly it becomes 12 days of work. That is, the change in runtime preference applies to work you already had waiting to run.


I re-read that a few times and still am not sure what you're saying. I have three (soon to be 4!) quads that run R@H non-stop. They can have 100% of any idle CPU time available.

I did get from your post that 10 days might be pushing it, in that the report deadline may come and go before the last piece is available. I reduced the AWB to 5 days.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63412 - Posted: 21 Sep 2009, 3:51:01 UTC

RAC is based only on completed work; it does not depend in any way on the pending work you have waiting to begin. Perhaps what you observed was an extended period when the host didn't contact the scheduler and the credit decay script hadn't yet adjusted its RAC.
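To illustrate, here is a rough Python sketch of an exponentially decaying average of the kind BOINC uses for RAC, as far as I know. The one-week half-life and the function names (decay_only, update_rac) are assumptions made for this sketch; it is not the project's actual server code. It shows why a stored RAC can sit at exactly the same value: the number only changes when new credit is granted or when a decay pass runs.

```python
import math

SECONDS_PER_DAY = 86400.0
HALF_LIFE = 7 * SECONDS_PER_DAY  # assumption: one-week half-life


def decay_only(rac, elapsed_seconds, half_life=HALF_LIFE):
    """Decay a stored RAC when no new credit has been granted."""
    return rac * math.exp(-elapsed_seconds * math.log(2) / half_life)


def update_rac(rac, elapsed_seconds, credit, half_life=HALF_LIFE):
    """Blend newly granted credit into the stored RAC.

    elapsed_seconds is the time since the last update; credit is the
    amount granted over that interval.
    """
    if elapsed_seconds <= 0:
        return rac  # nothing to average over
    weight = math.exp(-elapsed_seconds * math.log(2) / half_life)
    credit_per_day = credit / (elapsed_seconds / SECONDS_PER_DAY)
    return rac * weight + (1 - weight) * credit_per_day


if __name__ == "__main__":
    # A host with no newly granted credit: the displayed value stays put
    # until something calls decay_only() on it.
    print(decay_only(3088.71, 7 * SECONDS_PER_DAY))
    # A host earning credit at exactly its current rate keeps the same RAC.
    print(update_rac(3000.0, SECONDS_PER_DAY, 3000.0))
```

Under these assumptions, a frozen figure means neither function has run for that host, which fits a backlog on the server side rather than anything in the client's work buffer settings.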

If you aren't familiar with it, the Rosetta preferences allow you to define a preference for how long each task should run. The default is 3 hours, but you can select 1-24 hours per task as your preference. The application will do its best to follow your preference, but it will not always be possible.

If you have a 3hr preference, then your average CPU on an average day will do 8 tasks. You have a quad, so that's a total of about 32 tasks per day. So if you do get a full 10 day cache of work, you would have about 320 tasks queued up.

Now, if you were to change your preference to 8hrs, those existing 320 tasks are going to start running for roughly 8hrs instead of 3. And now 320 tasks cannot be completed within the 10 day deadline. So, work your cache of pending work down. Then ratchet up your runtime preference. Once the initial time to completion shown for a task is roughly in line with your current target runtime, you can safely increase the number of days of work you keep on hand because the estimates will be reasonably close. It takes the BOINC client a day or so to get used to the new runtime preference.
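The arithmetic above can be checked with a small Python sketch. The function names (queued_tasks, days_to_finish) are made up for illustration, and the sketch assumes every core runs Rosetta around the clock and every task runs for exactly the runtime preference, which real tasks only approximate.

```python
def queued_tasks(cores, runtime_hours, buffer_days):
    """Tasks needed to fill the buffer at a given runtime preference."""
    tasks_per_core_per_day = 24 / runtime_hours
    return round(cores * tasks_per_core_per_day * buffer_days)


def days_to_finish(tasks, cores, runtime_hours):
    """Days needed to run the queued tasks at a (new) runtime preference."""
    return tasks * runtime_hours / (cores * 24)


if __name__ == "__main__":
    # Quad core, 3hr preference, 10 day buffer.
    cache = queued_tasks(cores=4, runtime_hours=3, buffer_days=10)
    print(cache)
    # Same cache after switching the preference to 8hrs.
    print(days_to_finish(cache, cores=4, runtime_hours=8))
```

With these numbers the cache is 320 tasks, and at 8hrs each they would need roughly 26.7 days on four cores, well past the 10 day deadline. The same functions reproduce the earlier example: 6 days of work at one runtime becomes 12 days when the runtime is doubled.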


Rosetta Moderator: Mod.Sense
dcdc

Joined: 3 Nov 05
Posts: 1832
Credit: 119,821,902
RAC: 15,180
Message 63418 - Posted: 21 Sep 2009, 15:22:03 UTC - in response to Message 63410.  

Was it actively reporting new work units? I think if a machine doesn't report for a while then its RAC remains constant (there is a decay function, but I don't think it runs that often - once a week maybe?)

tiger

Joined: 16 Jul 06
Posts: 17
Credit: 1,083,385
RAC: 0
Message 63420 - Posted: 21 Sep 2009, 19:50:17 UTC - in response to Message 63418.  

Yep. I even manually did network communication. The RAC seems to be floating again, so maybe whatever updates RAC just was not running at the time.

was it actively reporting new work units?

dcdc

Joined: 3 Nov 05
Posts: 1832
Credit: 119,821,902
RAC: 15,180
Message 63421 - Posted: 21 Sep 2009, 22:49:15 UTC - in response to Message 63420.  

yeah - probably because of the recent validator backlog then...




©2024 University of Washington
https://www.bakerlab.org