Resource Share Obsolete?

Message boards : Number crunching : Resource Share Obsolete?

Paul D. Buck
Message 60723 - Posted: 19 Apr 2009, 5:36:08 UTC
Last modified: 19 Apr 2009, 5:37:39 UTC

Is it time to junk the concept of Resource Share?

Summary:
1) Resource Share does not meet the needs of a significant percentage of participants
2) Resource Share is thwarted by project work "throttling"
3) Resource Share is only shown as a collective across all resources
4) Resource Share is ineffective in allocation for projects with intermittent work loads
5) Resource Share allocation is trumped by hidden rules
6) Resource Share has not evolved even though almost all other aspects of work issue have

That is the question. We are a number of years into BOINC now, and the concept of Resource Share was intended to allow the participant to allocate his computing resources to various projects and to control the proportion used by each. But there are issues with the concept of Resource Share. Lots of issues. Not the least of which is the evolution of the options available to projects to control the issuance of work (including "trusted"/"High Reliability" systems, work throttling, etc.).

First of all, over 50% of participants attach to only one project. Some of that is because many never stick around for long and thus never attach to a second project; they abandon BOINC before they get that far. But there is a large segment of the participant community that has zero interest in more than one project. For them, resource share is meaningless because they are 100% dedicated to one project.

Many of those attached to 2 or 3 projects are only attached because the second project is a "safety" project: they only want work from it if their main project goes off-line for longer than their present work queue. For these people the Resource Share concept also does not work well, because BOINC will do work for that second (or even third) attached project even though it is supposed to be there for safety purposes only. Hence we see people with resource share allocations of 100/5 or 1,000/5, with the clear intent of not actually doing work for the second project unless there is an issue.

Added to this are the issues with NCI-class projects, where it is not at all clear what the impact will be if Resource Share and the distribution of work are allocated inappropriately. By that I mean: if I have 8 cores and elect QCN, which I want to run all the time, plus 16 tasks from FreeHAL, what will the allocation across the resources be? More significantly, NCI tasks are by definition not allowed to be queued, which means that if QCN is off-line when my task on hand completes, I cannot get another task until the site comes back on-line.

The design intent document, GpuWorkFetch, clearly states:

This design does not accommodate:
• jobs that use more than one coprocessor type
• jobs that change their resource usage dynamically (e.g. coprocessor jobs that decide to use the CPU instead).

More importantly, it also does not address the case where the GPU and CPU will both be heavily loaded, as is intended by the forthcoming application from The Lattice Project, where instead of a lightly loaded CPU and a heavily loaded GPU a single task will make heavy use of both.

Furthermore, my observations lead me to believe that the ways Resource Shares are used by work fetch and by the resource scheduler are inconsistent with each other, giving rise to some of the observable artifacts in operational patterns. I think the concept of Resource Share has been overloaded into the scheduling of tasks on the resources to, in effect, manage the resources so they run an "interesting" suite of work. Sadly, on "wide" systems this overloading only seems to have the effect of leaving many tasks started and partially completed.

Additional issues include the conflict between participant desires and work availability. For example, some projects simply do not have a steady supply of work. Examples include:

LHC, SIMAP, Pirates, Ralph, FreeHAL, AlmereGrid, BURP, Mind Modeling, etc.

Granted, some of these are testing projects, but most are not. What happens here is that, all things considered, I would prefer to have work from LHC to the exclusion of all other work. The only way I have to attempt to "force" this is to assign a high Resource Share to LHC and hope that I build up enough debt to get work from the project when it has some on hand.

Other projects "throttle" the number of tasks you can obtain, to manage their work flows or to improve "fairness" in participants' ability to actually get work given its poor availability. Examples are LHC, Milky Way, GPU Grid, etc.

One of the most obvious issues is that the "Projects" tab summarizes all the resource shares without regard for the different classes. It also does not take into account work availability history or any other factor.

So, what the participant is left to do is assign high shares to the projects of most interest, trying to accumulate debt so that when work *IS* available they can get some. Sadly, there are "hidden" rules that in effect "thwart" the participant's attempt to get the desired work to the exclusion of the work they "suffer" with because they have no choice.

Might a concept of project priority 1-10 with two "safety" projects be more useful, simpler, and perhaps more effective in resource management?

In the case of the single-project participant, they would elect their one project as priority 1 and select one or two other projects as "safety" projects. If, and only if, the main project is down and the work queue is below the trigger level would work be fetched from the safety projects. I would assume the rule would be to fetch no more than the Connect interval or 0.25 days, whichever is longer, until the main project revived.
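
In sketch form (the names and the function shape are invented for illustration, not anything in the actual client):

def safety_fetch_days(main_project_down, queue_days, trigger_days,
                      connect_interval_days):
    """How much work (in days) to request from a safety project.

    Fetch only if the main project is down AND the local queue has
    fallen below the trigger level; cap the request at the connect
    interval or 0.25 days, whichever is longer, until the main
    project revives.
    """
    if main_project_down and queue_days < trigger_days:
        return max(connect_interval_days, 0.25)
    return 0.0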

For those of us with more "interesting" project suites, the idea would be that work is fetched from all attached projects, but priority 1 projects get twice as much run time as priority 2, and so on down the line. In my case, I might have LHC and SIMAP as priority 1 projects. Both have intermittent availability, so the system should build up its hunger/debt, and when work does become available my clients should attempt to obtain as much work from those projects as possible. (I will admit that I am not sure the current debt system is the best way to manage this, especially since we have those internal "capping" rules; perhaps it is, perhaps not.)

The other wrinkle I would introduce, or rather remove, is the idea of Resource Share having anything to do with the scheduling of work on the system once it is downloaded. To put it another way, the priorities would control the fetching of work, but once on hand the work would be processed in deadline order.

This is not a complete concept in that I have not addressed NCI projects, multi-resource projects, and some of the other issues ... but might it not be time to look at this subject with fresh eyes?
mikey
Message 60729 - Posted: 19 Apr 2009, 12:49:23 UTC - in response to Message 60723.  

Is it time to junk the concept of Resource Share?


I think more people would crunch for more than one project consistently IF they could assign a processor to it. By that I mean that on a quad-core machine I could ALWAYS have 3 processors crunching for project A and the 4th processor for project B. None of this all 4 processors running 95% of the time on project A and 5% of the time all running project B! Let me, as the owner and user of the machine, decide how I want my processors to be used. Okay, maybe the default is to let Boinc decide, but why not let those of us that like to tweak, for good or bad, do it! Okay, so project B is down for 3 weeks and they have no work; if I choose to only run project B on one processor, then I am out of luck for that project! Boinc can be smart enough to send me a message!!! It IS A COMPUTER for goodness sake, it can be programmed to do what I want it to do!!!!

My thought is that the 'powers that be' are resistant to making changes, fearing that they will break it and it won't work anymore! Who cares!!! That is why we have many versions of Boinc: some released, some in alpha testing, some in beta testing and some, I hope, on the drawing board!!!! I totally agree with Paul here... Boinc is stuck in the past, and today's multi-processor systems are not being used to their full advantage by Boinc!!! Throw in the GPU and Boinc is a dinosaur that desperately needs updating to fit how today's users want to crunch! Berkeley had better be careful or someone else will take over our PCs with their better program. WCG and Folding (at least for GPUs), and I am sure others, already use something besides Boinc. Are Boinc's days numbered?
Mod.Sense
Volunteer moderator

Message 60732 - Posted: 19 Apr 2009, 16:21:07 UTC

mikey, how can the client work fetch assure work for a given CPU if the projects are only intermittently making work available and not giving you as much as you request?

Paul, clearly you've given this a lot of thought and have a lot of experience behind your observations. As you point out, many participants are not aware of the scheduling and resource shares in sufficient detail to even comment on your post. For them, I thought I would see if I could simplify the statement of your idea...

As I understand you (on my first read), you are basically suggesting that the current resource share and debt tracking system be replaced by a priority mechanism. History is essentially ignored, and the current request for work goes to the projects in the order of the priorities you establish. And perhaps a simple counter of the number of times work was requested but not received would be maintained, and then used to eventually promote the priority of a #3 project to #2 status until work is received.

So, your system would, much of the time, simply be based on the current state of what work you already have to do and what priorities you've established. Have I got that about right?

So, for example, someone that likes to help test Rosetta by running Ralph, which by design only has work available when there is testing going on, would set up Ralph as a #1 priority, and Rosetta as #2, and perhaps a backup project as #3. And when work is needed, it would just hit the projects in that order until some work is received.
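
To make that counter idea concrete, here is a rough sketch (the three-miss threshold and all the names are my invention, not anything in the client):

class ProjectPriority:
    def __init__(self, name, nominal_priority):
        self.name = name
        self.nominal = nominal_priority   # the priority the user set
        self.current = nominal_priority   # may be promoted when starved
        self.misses = 0                   # work requested but not received

    def record_request(self, got_work, promote_after=3):
        if got_work:
            self.current = self.nominal   # drop back once work arrives
            self.misses = 0
        elif self.current > 1:
            self.misses += 1
            if self.misses >= promote_after:
                self.current -= 1         # e.g. a #3 project becomes #2
                self.misses = 0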

I personally am always all for people having control of their resources, as mikey says. But I always keep in mind that the more choices you have, the more complicated the system is to make work properly, and the more participants turn away because they simply do not want to take the time to learn both BOINC and how their cell bill was computed. So, any means of masking the internals with a simple skin option would always be desirable.

Perhaps Paul's priority mechanism could be the simplified version, and the current debt system the choice for advanced users. However, with all the projects being separate, it's difficult to order your list without seeing them all at once. Such a configuration would fit nicely into an account manager.
Rosetta Moderator: Mod.Sense
Paul D. Buck
Message 60735 - Posted: 19 Apr 2009, 18:50:44 UTC

@Mikey

You are actually asking for another mechanism, one I have suggested before, which would allow the participant to better control the flow of work across the resources. For example, I like CPDN and would like to have it running nearly full time on my system, but never more than one task at a time. Because CPDN tasks are (or can be) "delicate", you can crash multiple models if you have more than one running at any given moment.

But you are correct that there is a lot of resistance. There is an assumption on the part of Dr. Anderson that most people run their systems unattended, and he may be correct. But the truth is that we really don't know how people use their systems with BOINC. I think a lot of people spend a lot more time looking at BOINC than he imagines. Until and unless we start instrumenting BOINC to track operational patterns, we will never know.

@Mod.Sense

You are close. History would be tracked, but in a different manner, and would come most into play for people with more projects than the norm. If you are only running one or two projects, history has almost no part to play in day-to-day operation and can mostly be ignored.

Following your example:

Priority 1: Ralph
Priority 2: Rosetta
Safety: Einstein

In this case, your client would always be looking to get Ralph work and would download it in preference to all other work. But it would set aside time for Rosetta work to run when Ralph work was not available. If, and only if, both Ralph and Rosetta had no work would you try to get work from Einstein.

Priority 1: Ralph
Priority 1: Rosetta
Safety: Einstein

In this case there would be no specific preference for obtaining work, though there would be a tendency to try to get work from Ralph when it was available, because Ralph has work on an intermittent basis. So, because it is likely that the last batch of work was obtained from Rosetta, when more work was needed we would check first with Ralph to see if it had work. If none, then we would get work from Rosetta; rinse, repeat.
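
In sketch form (field names invented for illustration):

def fetch_order(projects):
    """Order projects for the next work request: by priority first,
    then by the time we last actually received work from them, oldest
    first, so intermittent Ralph gets polled before Rosetta, which
    supplied the last batch."""
    return sorted(projects, key=lambda p: (p.priority, p.last_work_received))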

The "decay" or "debt" mechanism to balance the rotation is one part that is a little nebulous in my mind. I am not sure the current "debt" system is all that good in balancing the way we pull work. More importantly there are questions as to why some of the "hidden" rules exist and if they work that well ... or if they effectively gut the intent of Resource Share entirely.

I am going to make another long post to describe some more details and hopefully have some examples "soon" (today?) ...
Greg_BE
Message 60736 - Posted: 19 Apr 2009, 18:53:07 UTC
Last modified: 19 Apr 2009, 18:55:38 UTC

My comments were interrupted, and I see Paul has made his message more along the lines I was thinking.
Paul D. Buck
Message 60738 - Posted: 19 Apr 2009, 19:17:15 UTC - in response to Message 60736.  

My comments were interrupted, and I see Paul has made his message more along the lines I was thinking.

Commentus interruptus ... I hate it when that happens ... :)
Paul D. Buck
Message 60740 - Posted: 19 Apr 2009, 19:48:15 UTC - in response to Message 60739.  

Does this mean that every project with a lower priority than the 'number one' project will never be more than a fallback option? Where does that leave a project which I want to be worked on for say, 10% of the total time?

No, it would mean that you would spend half as much time on the lower priority projects as on the higher priority ones. I hope to explain THAT part in the next post. The first was to develop the basics ... I broke it up because the complaint in the past was that my concept posts are too long ... now we see the other problem ... if they are not long they are incomplete, and we have other problems...
nick n
Message 60743 - Posted: 19 Apr 2009, 20:44:53 UTC

nice write up! I agree it is junk!
Paul D. Buck
Message 60745 - Posted: 20 Apr 2009, 1:13:40 UTC

Is it time to junk the concept of Resource Share?

Summary:
1) Resource Share does not meet the needs of a significant percentage of participants
2) Resource Share is thwarted by project work "throttling"
3) Resource Share is only shown as a collective across all resources
4) Resource Share is ineffective in allocation for projects with intermittent work loads
5) Resource Share allocation is trumped by hidden rules
6) Resource Share has not evolved even though almost all other aspects of work issue have

7) The current design assumes that the participant will want the same Resource Share for the CPU and the GPU for a specific project.
8) Work fetch asks, or can ask, for work of a resource class that a project does not have.

If we allow the assignment of priority in the preferences, as we currently do with Resource Share, we would be able to have assignments like:

CPU: Priority 3 (default)
GPU: Priority 0 (Default, no work requested or run, or the project does not have this class work)
Combined (Mixed): Priority 0 (Default, no work requested or run, or the project does not have this class work, for projects like The Lattice Project's GARLI application which will run on both resources at the same time)
NCI: Priority 0 (Default, Project does not have this class work)

An advantage of this scheme is that work fetch would then know NOT to ask for work from projects that do not have that class of work, without the server-side scheduler having to make a database hit to find out what the participant wants, or can use.
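
As a sketch of what the client-side record might look like (the class shape and the zero-means-none convention are mine, purely for illustration):

from dataclasses import dataclass

@dataclass
class ProjectPrefs:
    name: str
    cpu: int = 3     # default CPU priority
    gpu: int = 0     # 0 = no work of this class requested, run, or offered
    mixed: int = 0   # tasks that load CPU and GPU together (e.g. GARLI)
    nci: int = 0     # non-CPU-intensive tasks

def fetch_candidates(projects, resource):
    """Projects eligible for a work request for one resource class.
    Because the zero is known client-side, we never ask a project for
    a class of work it does not have, and the server never needs a
    database hit to find that out."""
    return [p for p in projects if getattr(p, resource) > 0]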

For projects migrating to the new system the current Resource Share would be used to set the priorities, as follows:

Priority 1 = Resource Shares 100 or more
Priority 2 = Resource Shares 99-50
Priority 3 = Resource Shares 49-25
Priority 4 = Resource Shares 24-12
"Safety" = Resource Shares 11 or less

Assuming a task queue of 30 days (I know this is long, but bear with me; the actual code will proportion this down), the allocation of time to the projects in each priority group would mean that the client sets aside the following allocations of time (we can argue the merits of a decay of less than halving, or other proportions, but let us start here):

Priority 1 = 15 Days
Priority 2 = 7.5 Days
Priority 3 = 3.75 Days
Priority 4 = 1.875 Days
Priority 5 = 0.9375 Days (~22 hours)

This queue would be a "sliding" buffer where each day the work accomplished would be used to recalculate the new allocations (I say here at the end of the day, but again this can be done "on-the-fly" as it is now). The idea is that we would try to keep on hand 15 days of work from priority 1 projects, and so forth down the line. As work was accomplished, the allocations would be rebalanced, so that if 0.5 days of priority 3 work was done we would try to get that much work from priority 3 projects.
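
The arithmetic, sketched (the 30 days and the halving factor are the ones proposed above; the function shapes are illustrative):

QUEUE_DAYS = 30.0

def allocation_days(priority, queue_days=QUEUE_DAYS):
    """Days reserved per priority level: 15, 7.5, 3.75, 1.875, 0.9375."""
    return queue_days / (2 ** priority)

def refill_request_days(priority, on_hand_days):
    """Ask for whatever has been burned off the level's allocation
    since the last fetch (never a negative request)."""
    return max(0.0, allocation_days(priority) - on_hand_days)

# Example: 0.5 days of priority 3 work was completed, leaving 3.25
# days on hand, so refill_request_days(3, 3.25) == 0.5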

Again, if the priorities are the same for all projects attached there is no preference as to where work is obtained.

Let's talk practical examples. The typical user, per the user numbers, has attached to only one, two, or three projects (nearly 75% of all participants). This is in part because many participants try BOINC and quit; many more are only interested in a few projects, or only know of a few. Regardless, the typical participant might have a selection like this:

Priority 1 (C): SETI@Home
Priority 2 (C): Rosetta
Safety: Einstein
Safety: ABC@Home

In this case the system would be saving room for 15 days of SaH work and 7.5 days of Rosetta work, and attempting to keep that much on hand. If, and only if, there was no work at all from these two projects would the system contact either Einstein or ABC for work, and then it would only obtain a small safety buffer of work equal to the connect interval plus a margin.

Actually an even more likely configuration would be:

Priority 1 (C): SETI@Home
Safety: Rosetta
Safety: Einstein
Safety: ABC@Home

More complex? Try my configuration (partial, we are going to add to this later):

Priority 1 (G): GPU GRID
Priority 1 (C): SIMAP
Priority 1 (C): LHC@Home
Priority 1 (C): Pirates

Priority 2 (C): CPDN
Priority 2 (C): Einstein@Home
Priority 2 (C): Milky Way@Home
Priority 2 (C): Virtual Prairie
Priority 2 (C): WCG
Priority 2 (C): YoYo@Home

In this case, the system would be trying to get 15 days of work for GPU Grid, and 5 days each for the other three priority 1 projects. Each of the priority 2 projects would get 1.25 days.
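
Sketched, the split within a level (the function is mine, for illustration only):

def per_project_days(level_allocation_days, project_names):
    """Evenly divide one priority level's reserved days among its
    projects of a single resource class: 15 CPU days across SIMAP,
    LHC@Home and Pirates gives 5 days each."""
    share = level_allocation_days / len(project_names)
    return {name: share for name in project_names}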

In practical terms, what does THIS mean? Well, SIMAP, LHC@Home, Pirates and VP almost never have work, so when it came time to fetch more work they would be candidates for refilling the queue. If we track who has been providing us work and walk that list in inverse date order, we would likely check LHC@Home first (back-off allowing), followed by VP, then Pirates, and lastly SIMAP, which has work at the start of each month.

To round out this allocation, let me note that as far as the GPU goes I am one of those single-project types at the moment, partly by choice and partly because of the limited choices for GPU-enabled projects. But in my case I would make it:

Safety (G): SaH Beta
Safety (M): The Lattice Project

In case GPU Grid goes off-line I would try to get work from those two projects, but only if I was completely out of GPU Grid work.

So, let's take care of some of the objections...

This is a far simpler system to understand than resource share as balanced among projects.

Most participants are in the class of people that really, really are only interested in one or two projects, in some cases to the complete exclusion of all other projects. All the bells and whistles added to do all this Resource Share nonsense are of no interest to them and only serve to make their systems run worse. If they attach to more than one project, it is likely so that they have a "Safety" project, but they would be just as happy never to run work for it. Resource Share fails utterly here; this scheme does not.

Resource scheduling should actually become much simpler, because we can calculate more exactly how much time should be spent on a particular project for any given time period.

Tasks that are longer than the queue allocation will be a problem. Take CPDN, which can take longer than I might allocate in the 30-day queue. Well, even if we run in strictly deadline order and keep the rule that we don't miss a deadline ... so what? Just as we do now, the system will "borrow" time from the future and run the task. Then, until the daily deficit is made up, no more tasks are fetched.

22 hours a month is not much time? Well, on single-core systems, sure, but how many people are running single-core systems with more than one or two projects on them? Probably not very many ... the smallest system I have seen lately is a dual core, 4 cores is the midrange and easily available for $800, and i7 systems are becoming common higher-end systems. Apple's latest Mac Pro has 16 CPUs ... in these cases we are now talking at least 44 hours, 88 hours, or 176 hours (7 days per month) ... not insignificant time periods.

In fact, I have been simulating this system with resource shares for my suite of 30+ projects, and it works pretty well ... (Resource Shares of 200, 100, 50, 25, 10 and 5.)

In fact, the only major flaw I can think of at the moment is that queue "flexing" might seem to increase with this system. But this would be more an artifact of the lack of work from specific projects than a flaw in the design. The reality, I think, is that after a settling-in period participants would actually see a more stable queue from projects with work; only projects without work would cause issues, and those would be easily seen and understood.
mikey
Message 60749 - Posted: 20 Apr 2009, 12:04:10 UTC - in response to Message 60732.  
Last modified: 20 Apr 2009, 12:16:02 UTC

mikey, how can the client work fetch assure work for a given CPU if the projects are only intermittently making work available and not giving you as much as you request?


I guess it would have to do it based on Paul's priority system: Project A first, Project B second and Project C third, BUT still able to do this on a per-CPU basis. So for instance I could set up the 1st core of a dual-core system to crunch Ralph 1st, then if no work go to Rosetta 2nd, and then if neither of them have work go to Einstein 3rd. If none of the projects have work, in my example the 1st core would sit idle, which is one reason for the user to put several RELIABLE projects on the priority list. This is obviously not something the casual Boinc user would want to do, but for those of us that have been around a while and have our favorite projects that we like to contribute to, but just a little bit, my idea would work. Is my idea perfect? NO it is not, but I think you now have the basics of the idea I am thinking about.
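
Something like this, just to show the idea (the project names and the has_work check are made up):

CORE_PLANS = {
    0: ["Ralph", "Rosetta", "Einstein"],  # core 0: Ralph 1st, then fall back
    1: ["Rosetta"],                       # core 1: Rosetta only
}

def task_for_core(core, has_work):
    """First project in this core's list that has work, or None, in
    which case the core sits idle and Boinc should send a message."""
    for project in CORE_PLANS.get(core, []):
        if has_work(project):
            return project
    return None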

Paul's system keeps the LTD system; my idea gets rid of that, and if you get work from project B you would only get enough until your Time to Connect lets you connect and try project A again. Now, deadlines being what they are, different for every project, you might be getting more work than you would at first think you need, because project B's unit finishes before your Time to Connect arrives, so you would get extra units from project A to hold you over. People with always-on internet connections would not be affected by this last part.
Nothing But Idle Time

Message 60752 - Posted: 20 Apr 2009, 12:53:05 UTC

Is this work fetch/scheduling idea floated on other project fora, or only here? If only here, why? Shouldn't this be on the BOINC fora for generalized discussion, or is that like posting Republican ideas on a liberal website -- the audience just isn't interested and will rip you to shreds for mentioning it?
Paul D. Buck
Message 60757 - Posted: 20 Apr 2009, 19:11:43 UTC

@Mikey

One of the mental-model problems in the BOINC universe is that concepts sometimes get muddled and no one notices. Specifically, there is a blurring between work fetch and resource scheduling. At the moment I am addressing work fetch only, because I think the scheduling of work on the processors should be rather more in line with deadline order, with attention paid to other priorities only to the degree that I would be more restrictive than some current policies.

To these points, I would abandon a running task only if there was a certainty of a deadline miss otherwise, and we would, to the extent possible, never run more than one task at a time from a project on a specific resource class. In other words, if I have 4 CPUs and tasks from 10 projects, I would never run two tasks from any one project unless I was in deadline jeopardy.
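
A sketch of that selection rule (the task fields are invented for illustration):

def pick_tasks(tasks, n_cpus):
    """Run in deadline order, at most one task per project, unless a
    second task from the same project would otherwise certainly miss
    its deadline."""
    running, used_projects = [], set()
    for task in sorted(tasks, key=lambda t: t.deadline):
        if len(running) == n_cpus:
            break
        if task.project in used_projects and not task.deadline_jeopardy:
            continue
        running.append(task)
        used_projects.add(task.project)
    return running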

Fundamentally, if you correctly control the gateway, the population on hand is the one that you want. At the current time we are doing neither well ...

@Nothing But Idle Time
I posted here because, firstly, Einstein and SaH are both having significant difficulties with their servers and, secondly, SaH has grown increasingly hostile to, ahem, pagan ideas ...

But, not that it matters, I have also posted this on the dev mailing lists; since most people don't read those, I decided to post here for the nonce and collect some comments.
j2satx

Message 60758 - Posted: 20 Apr 2009, 20:51:33 UTC - in response to Message 60757.  

Fundamentally, if you correctly control the gateway, the population on hand is the one that you want. At the current time we are doing neither well ...


If you are managing to control your workload with the resources as you have them, wouldn't it work to merely have the option to also limit the number of WUs per project, so you would have exactly the number of WUs you want in the work queue?

Paul D. Buck
Message 60759 - Posted: 20 Apr 2009, 21:18:47 UTC - in response to Message 60758.  

If you are managing to control your workload with the resources as you have them, wouldn't it work to merely have the option to also limit the number of WUs per project, so you would have exactly the number of WUs you want in the work queue?

That is micromanagement. :)

The point they impress on me is that we should not add controls that allow participants to micromanage their computer's BOINC client. Of course I do, at times, use suspend (CPDN in particular) to limit myself to only one task running. Adding a control to do that automatically is a no-no, it appears...

To put it mildly, they don't actually have a consistent argument. They just use that one if they do not want to add a feature, even when that feature is asked for by at least two projects ... yet we already have controls that people use to micromanage (and far more than I do) ... but new ones that might help are verboten ...
Paul D. Buck
Message 60766 - Posted: 21 Apr 2009, 17:46:48 UTC - in response to Message 60764.  

If I was attached to say 4 projects and would like to do something like this:

Project A: 50%
Project B: 20%
Project C: 20%
Project D: 10%

1) Project A: 50%
2) Project B: 20%
2) Project C: 20%
3) Project D: 10%

Though the priority 2 projects would effectively be at 25%, not 20%.

How would I do that using priorities? And what would I do if I wanted to raise the time-share of project B to 25% and lower the time-share of project C to 15%?

You couldn't. The point was to simplify the system, and as a consequence you give up some flexibility to gain simplicity. The question is how many people actually get that granular besides you and perhaps me? If you look at the numbers, the vast majority are attached to one to three projects.

The priority system is, looking at it, a totally different system with a different sort of target. Resource share aims at dividing up a total crunch time; the priority system doesn't. And maybe you should not want that, it makes the system too complex, in my opinion.

Actually, they are aimed at the same ideal: allowing you to allocate your resources to your pleasure.

What about this idea? Sort of combine the 2 systems, priority and resource-share.
Default priority would be 1 (highest priority). Every project you attach to would have that priority initially. Sub-divide time between projects sharing the same priority using shares.

This adds complexity with no perceivable gain. Again, the priority 1 projects get twice the time of priority 2 projects, and so on down the line. No need for resource share at all ...
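
To pin down what the halving rule implies, a sketch (the normalization is mine; note the normalized figures come out a shade different from the rounded 50/25/25 above once a fourth project is in the mix):

def effective_shares(priorities):
    """Map {project: priority} to normalized time shares, weighting
    each project by 2**(1 - priority)."""
    weights = {name: 2.0 ** (1 - p) for name, p in priorities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# effective_shares({"A": 1, "B": 2, "C": 2, "D": 3})
# -> A 44.4%, B 22.2%, C 22.2%, D 11.1%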
Paul D. Buck
Message 60768 - Posted: 21 Apr 2009, 19:31:00 UTC - in response to Message 60767.  

The advantage of the change I proposed is that I would be able to manage BOINC the way I like it, while still allowing the possibility of a backup-project and better management of projects which don't offer work all the time.

Agreed. The problem is that the development team is very reluctant to embrace new ideas. The concept here is to simplify the logic of BOINC clients and yet deliver improved behavior. By eliminating lots of code and tests for this and that, the simplified algorithms would also have a better chance of working bug-free.

I, and others (as noted in this thread, for one example), are seeing anomalies in the way the current client obtains and schedules work. When I can get an answer at all, or a discussion, the gist is that I should not believe my lying eyes.

Yet on most of the projects I monitor, the trend is that a person will try 6.6.2x and after a week or two move back to an earlier version.

Fundamentally, the reason I disagree with your notion is that the point is to remove complexity, not add it. We have been adding complexity since BOINC started and I am not at all sure that the complexity is warranted or needed.
Nickhorsky

Message 60771 - Posted: 22 Apr 2009, 1:48:01 UTC

Until something better comes along, my limited CPU power is working for Rosetta. Period. In the future a better way may come along, but for now the average cruncher like me probably would prefer the K.I.S.S. method. Let's not complicate what is working or scare crunchers away. This logic, no matter how well thought out, serves no one! How about this thread goes away?
Paul D. Buck
Message 60772 - Posted: 22 Apr 2009, 1:56:55 UTC - in response to Message 60771.  

Until something better comes along, my limited CPU power is working for Rosetta. Period. In the future a better way may come along, but for now the average cruncher like me probably would prefer the K.I.S.S. method. Let's not complicate what is working or scare crunchers away. This logic, no matter how well thought out, serves no one! How about this thread goes away?

The flaw in your argument is that most of the rules of work fetch and resource scheduling are a complete waste of time for you. With a single project attached, you could get by with 90% less code in these areas.

And that is also what I am trying to do: move BOINC back to where it is keeping it simple.

In the first post I pointed out that most people are single-project participants, and those attached to more than one are usually doing so for "safety" purposes. For all those, BOINC is sub-optimal, because BOINC *WILL* do work for your safety project even if you don't want it to, given the limitations of the design based on Resource Shares.

In your case, you would see no difference in operation, unless you could measure the increase in operational speed from BOINC wasting less time doing things that are not needed.

As far as the thread going away. If you are not interested in the discussion feel free not to read it... :)
Paul D. Buck
Message 60799 - Posted: 23 Apr 2009, 21:31:46 UTC

To continue to develop the ideas I have proposed, I will point out a maxim of quality control: if you don't measure it, you have no idea of the quality, and you can only monitor the quality of those things that you measure. I know these are not the classic words, but the idea is the thing. The main point is that we measure very little in the world of BOINC, and so we actually know very little about how well BOINC works. In one sense, for some, that is good: it is hard to toss rocks if you don't have hard numbers, and, well, it is hard to get numbers if BOINC does not provide them.

Here are some quality metrics that would be useful in monitoring the interactions between BOINC clients and the projects. The point here is not to throw rocks at the projects, but to make BOINC clients smarter about how to meet the participant's desires. I think that Resource Share is not the way to get there, but even if we stick with RS, these metrics, if we captured them, would at least let us start to know what is going on between the projects and the clients. Personally, I would like an "opt-in" option so that these numbers (and others too, as we develop them) could be reported back to the developers so that they could see actual operational modes AS WE USE BOINC IN THE FIELD.

I know that they have a tough time with people like me, but here is the thing: because we don't monitor the beast, we don't really know what it is doing. This is one of the things I have suggested before, and I am suggesting it again. Those that want to can opt in, and I am sure that enough of us would do so that useful patterns would emerge.

Anyway, metrics:

Availability Rate: Tracked by counting the number of hits to the project that are satisfied. I would also display this as a percentage on the Projects tab for the participant to monitor.

Satisfaction Rate: The number of requests that are filled to the amount requested (ask for 5,000 seconds of work, get 5,000 seconds of work).

Throttle Limit: Very Low, Low, Medium, High, None. This could either be set by the project as part of its preferences, or discerned from the numbers in the scheduler messages (the "daily" and "per CPU" limit messages). Very Low would be 10 tasks or fewer on either limit; Low: 11-25; Medium: 26-50; High: 51-1,000; None: 1,001 or more. Again, these are suggestions; the upper limits may want adjusting, and maybe additional levels put in to smooth the curve.

Availability Type: Very Intermittent, Intermittent, Scheduled Outages, Scheduled Availability, Unscheduled Outages. Again, this could be a locally tracked value or set in the preferences. Projects would start out as Very Intermittent, and as work was obtained the average span of time between satisfied requests could be tracked. Scheduled Outages and Scheduled Availability would be a little tricky to detect, I am thinking, but we have two clear examples of each, with the scheduled outage being easier to detect in that there is a scheduler message indicating that scheduled maintenance is being performed. SaH, for example, has Tuesday outages, and SIMAP has scheduled availabilities. To feed this information down, the scheduler message suite could be expanded to keep the current scheduled-outage message, add an unscheduled-outage message, and modify the no-work-available message so that the client could be told that work is only available on a scheduled basis.
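
To make the first three concrete, a sketch of the client-side bookkeeping (the throttle bands are the ones proposed above; everything else is illustrative):

class ProjectMetrics:
    def __init__(self):
        self.requests = 0
        self.satisfied = 0        # request returned any work at all
        self.fully_satisfied = 0  # got as many seconds as were asked for

    def record(self, asked_seconds, got_seconds):
        self.requests += 1
        if got_seconds > 0:
            self.satisfied += 1
        if got_seconds >= asked_seconds:
            self.fully_satisfied += 1

    def availability_rate(self):
        """Percentage of hits to the project that were satisfied."""
        return 100.0 * self.satisfied / self.requests if self.requests else 0.0

    def satisfaction_rate(self):
        """Percentage of requests filled to the amount requested."""
        return 100.0 * self.fully_satisfied / self.requests if self.requests else 0.0

def throttle_band(task_limit):
    """Classify a project's per-day / per-CPU task cap."""
    if task_limit <= 10:   return "Very Low"
    if task_limit <= 25:   return "Low"
    if task_limit <= 50:   return "Medium"
    if task_limit <= 1000: return "High"
    return "None"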


So why do all this? Well, at the moment one of the things that bothers a bunch of us is that BOINC does not handle intermittent work very well. In my opinion, the proliferation of intermittent projects means that the elaborate rules we have put into place are, in many cases, trying to shield us from the consequences of this anomaly; but because they do not address the SPECIFIC issue, they work on a hit-or-miss basis, and may in fact make the real situation worse.

However, if we knew that SIMAP was available only during one week a month, we could "reserve" or "plan" for that, so that during that period we would concentrate on getting work from that project and balance the pulls to more closely match the desires of the participant. I know we try to kludge this with LTD and STD, but the truth is that BOINC does not really do much to address the imbalance that has built up. I don't know if it is because of the caps on debt build-up or something else, but the point is, if it were attempting to achieve a balance, I would be doing far more SIMAP in that first week every month than I actually accomplish.
Mod.Sense
Volunteer moderator

Message 60801 - Posted: 23 Apr 2009, 22:11:20 UTC - in response to Message 60798.  

@Mod.Sense,

Can we extract out Tomasz's issue to a new thread? It is important, but a sidetrack from RS theory.


Created a new thread for the topic. Let me know if I've not moved the correct posts to it.
Rosetta Moderator: Mod.Sense