Message boards : Number crunching : What about Docking@home and Proteins@home?
Author | Message |
---|---|
Gerry Rough Send message Joined: 2 Jan 06 Posts: 111 Credit: 1,389,340 RAC: 0 |
|
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Are the new BOINC projects Docking@home and Proteins@home similar to Rosetta? I sort of admit I think they probably are. But how so, and will their research also help Rosetta to complete the protein prediction picture?
AFAIK Docking is a kind of follow-up of Predictor. At least some of the staff are the same: Team Docking vs. Team Predictor. I'm no protein scientist, so it's up to them to decide about the sameness of the research, or rather the (important) differences. |
Michael G.R. Send message Joined: 11 Nov 05 Posts: 264 Credit: 11,247,510 RAC: 0 |
Docking seems to be invitation-only right now (unless I'm missing something). Do you know how to get one of these invitations? |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Saenger, I believe he was talking about Proteins at home, a French project from École Polytechnique. It too takes an invite code; see sig below. They do have some issues to solve at the moment. It's also WinXP only, though there seems to be some progress in getting Win9x to work; I'm not sure of that. This is very alpha and was supposed to be kept quiet, but now that they're exporting stats, I suppose they won't mind. |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Docking seems to be invitation-only right now (unless I'm missing something).
Write an email to the devs (email available somewhere on their pages), and if you own a Mac or a Linux machine, you might get one.
Saenger, I believe he was talking about Proteins at home. A French project from École Polytechnique. It too takes an invite code. See sig below.
I know, but I don't know anything about Proteins beyond that, which is why I didn't say anything about them. But I do crunch for Docking, and I know that M. Taufer and A. Kerstens were members of the Predictor team. |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)
At that time there were several different projects, and while they all looked at proteins they were either looking for different things or were trying different techniques. The impression I took away from the posting was that all the different approaches had some value, and that I personally was not equipped to make a call between them on the basis of the science.
So my guess is that in the big picture both those new projects will be helping to complete the picture. To some extent too, I'd guess they are in competition with each other in the race to get the best prediction technique etc.
River~~ |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
... Proteins at home. A French project from École Polytechnique. ...
When President de Gaulle was around they'd never have got away with that name - it would have had to be ProteinsChezNous or suchlike ;-)
They do have some issues to solve at the moment. It's also WinXP only, but there seems to be some progress in getting Win9x to work,
If XP works then it is possible that 2k will too; 2k is much more like XP than the 9x versions of Windows.
Many new projects have enjoyed the fantasy that they can export stats and still keep hush. It is a fallacy of course - once credits are released they find their way into people's sigs and then others want to find out more. But I'd suggest that it is not safe to assume that this project is free from that fantasy. R~~ |
zombie67 [MM] Send message Joined: 11 Feb 06 Posts: 316 Credit: 6,621,003 RAC: 0 |
|
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)
I see they have a much nicer forum layout as well :-) Rosetta should take note ;-)
If anyone can post there, could you ask how it compares to the THINK virtual screening/docking software developed by www.treweren.com? I'm sure there are a few people who would like to know; some of its HIV protease inhibition work can be found here. Since it's invite-only I cannot post there (afaik). Team mauisun.org |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
Since it's invite-only I cannot post there (afaik).
But they can post here ;) I'll post a link to this thread in the forum. |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Some time ago (six months maybe?) I saw a posting from David Baker comparing the different protein projects. Not sure if that was here, or on someone else's website. (Anyone still have a link to it?)
A useful link, thanks zombie. But actually I was thinking of this page, which has a comparison of lots of projects that were around when the page was written -- BOINC and other dc projects.
In the Life sciences part of the page is an explanation from WCG about the difference between what they do with Rosetta and what this project does with Rosetta, followed by a piece from our David about how Rosetta compares with some of the other protein projects around then. After you have read the life sciences part, scroll down to the bottom where he gives his own personal choice amongst projects.
Parts of the page are more up to date than others - it is a one-man effort by Dimitri (apart from the quotes he gets from project scientists), so one can't really fault him on the fact that some points are out of date. And by the way, the physics part of the page is just as useful, as are the comments on a few non-BOINC projects.
River~~ |
[AF>Slappyto] popolito Send message Joined: 8 Mar 06 Posts: 13 Credit: 1,041,105 RAC: 2 |
Proteins@home tries to find the different amino-acid sequences that fold into a given structure. The project's goal is to calculate the energy functions for the different sequences; they call it the inverse problem of protein folding. http://biology.polytechnique.fr/proteinsathome/documentation.php I'm sorry for my bad English :) |
hugothehermit Send message Joined: 26 Sep 05 Posts: 238 Credit: 314,893 RAC: 0 |
If Docking@Home isn't using redundancy, could someone tell Dr Armen to fix up the BOINC credit problem before Docking@Home goes through the Rosetta@Home experience? I'm sure that the Rosetta@Home team would be more than willing to explain why it is very important to fix it properly and early in the project's life, and to explain the credit system they came up with, here. If it is using redundancy, less power to them :) sorry about the bad wordplay :p |
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
If it is using redundancy, less power to them :) sorry about the bad wordplay :p
Every project is using redundancy, or it's plain random bulls*** and not science. Some do it by sending the same work to different participants (like Docking, Einstein, Malaria...), some do it somehow on the server side (like Rosetta, CPDN). How it's done is secondary, but if it's not done, it's not worth any power ;) |
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
If it is using redundancy, less power to them :) sorry about the bad wordplay :p
There certainly has to be some protection against rogue results (results that are wrong whether by dishonesty, or over-enthusiastic overclocking, or a faulty computer, etc.).
Having agreed thus far, there is an important difference between redundancy and other forms of error control like building an ensemble (which we are doing here). With redundancy (as the word is used by BOINC) exactly the same task is crunched by two different users. With an ensemble, everyone does different tasks (for example with different starting conditions (CPDN) or a different set of random numbers (Rosetta)), and the end result is derived from the ensemble of all the different results in such a way that a few rogue results are unlikely to make any difference (or the end result can be independently checked).
The crucial difference is that with redundancy, work is lost when two accurate and honest crunchers handle the same data, but the end result is the same whether there are a few rogues or not. (Rogue may mean cheating or may mean an unwitting problem.) With an ensemble, the presence of a small number of rogue results would very slightly lower the overall accuracy, but to an insignificant extent, and every accurate and honest client makes a unique contribution to the end result.
Two other approaches are worth noting. Leiden is using both redundancy and an ensemble: each WU that goes into the ensemble is double-crunched. This is undoubtedly the most accurate way of generating the end result given infinite resources, but whether they'd do better to have an ensemble twice as wide and no redundancy is another question.
Some maths projects have the ability to check a result more easily than calculating it. For example, it is easier to check that a square root is correct than to calculate it, so a project to calculate square roots could run every WU just once, and the validator would just square the delivered answer to see if it was right. (This is an imaginary project, but it illustrates the point.)
On Rosetta, the lowest-energy result is taken from the ensemble, and after all that crunching that is the one that matters. It is then easy for the team to check that that result is correct; if not, they'd disqualify it and go to the next lowest. If a million decoys were run, this means that typically each result is only run 1.000001 times, or 1.000002 times in the rare event that some error is found in the first. And less still if there is a "short-cut" test like the one for square roots.
Finally, not possible within BOINC but popular in some other grid projects, is the idea of "random redundancy". Only some WUs are double-crunched, but the users are not told which ones. This means that deliberate cheats have to stay honest. When a discrepancy is found, several more WUs from both users are checked. If one user is found to be generating more than a very, very small number of errors, all their work (and all their credit) is discarded and those WUs re-worked.
In my opinion the biggest weakness in BOINC was the decision to force the same degree of redundancy on every WU of a type. This is a reflection of the fact that, at the time BOINC was first designed, SETI had more computing power than it needed. Subsequent projects (including SETI when they get access to more input data) suffer from the fact that there is no degree of redundancy possible between 1 and 2. (btw - I like BOINC overall or I would not be here. I also don't think it is perfect!)
For example, Zetagrid double-crunched about 10% of all work, and less than 0.01% was found to be problematic, those runs then maybe generating 10x the work. This meant that each piece of work was run only about 1.101 times on average, compared to Einstein where the figure is over 2 (two initial runs and more when needed) or LHC where it is over 5. This mattered on Zetagrid: we missed out on being first to a trillion "zeroes" by less than 10% - we were up to 910 billion when a mainframe project got to the trillion. Had the project leader gone for redundancy à la BOINC we'd have been only just over halfway, a huge difference. Sadly, the Zetagrid project was not suitable for an ensemble approach, or we'd have got there in front of the mainframe.
So yes, error checking of some sort is essential; but it is meaningful to talk of methods of error checking that do, or don't, involve redundancy. And it is certainly meaningful to talk of the degree of redundancy as a measure of what proportion of the work is devoted to error control.
River~~ |
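As an illustration of the "cheap check" idea in the imaginary square-root project above, a validator might look like the following minimal C++ sketch. The function name and types are hypothetical, and this is not a real BOINC validator.
```cpp
#include <cstdint>

// Hypothetical cheap validator for the imaginary square-root project:
// verifying a result is far cheaper than recomputing it, so no second
// (redundant) copy of the workunit is needed.
// The client was asked for the integer square root r of n; checking it
// costs only two multiplications and two comparisons.
// Assumes n is small enough that (r + 1) * (r + 1) does not overflow.
bool validate_isqrt(uint64_t n, uint64_t r) {
    if (r * r > n) return false;               // claimed root too large
    if ((r + 1) * (r + 1) <= n) return false;  // claimed root too small
    return true;                               // r*r <= n < (r+1)*(r+1)
}
```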
Saenger Send message Joined: 19 Sep 05 Posts: 271 Credit: 824,883 RAC: 0 |
You must not mix up redundancy and validation. You have to validate the results, otherwise they are totally worthless, and so you have to have redundancy on some level or other.
If you have projects like CPDN or Rosetta, where a random seed starts a simulation, no real validation on the WU level is needed. The whole process is extremely redundant, as you crunch the same simulation thousands upon thousands of times with just wee alterations.
In projects with a vast amount of data to be searched for something meaningful, like the needle in the haystack (Einstein: gravitational waves, SETI: radio signals), that's simply impossible; here the better solution is validation on a per-WU level. There are probably projects that belong in both categories somehow. I don't know, but perhaps Leiden is one of those.
The stupid accusation of "wasted CPU time" for a needed validation is just that: stupid! It can be discussed how the validation can be performed without too much redundancy, but it's better to have more redundancy than too little. |
thom217 Send message Joined: 29 Oct 05 Posts: 12 Credit: 182 RAC: 0 |
I remember there was a gentleman who was in touch with Keith Davis, the head of the Find-a-Drug project, at the time of the project's closure. He is one of the people responsible for running Chmoogle, the search engine now called eMolecules. http://www.emolecules.com/ http://usefulchem.blogspot.com/2005/11/chmoogle.html Jean-Claude Bradley http://www.blogger.com/profile/6833158 He might be able to contribute to the Docking@Home database. |
thom217 Send message Joined: 29 Oct 05 Posts: 12 Credit: 182 RAC: 0 |
There is also a copy of the posts Caroline made at the FaD forum. http://www.fadbeens.co.uk/phpBB2/viewtopic.php?t=248 |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
In my opinion the biggest weakness in BOINC was the decision to force the same degree of redundancy on every WU of a type. This is a reflection of the fact that on SETI at the time BOINC was first designed they had more computing power than they needed. Subsequent projects (including SETI when they get access to more input data) suffer from the fact that there is no degree of redundancy possible between 1 and 2.
Actually, all the BOINC WU parameters are set per WU, and this includes min_quorum and target_nresults; the most common arrangement is just for the project-specific WU generator to use a config file where the BOINC WU parameters are constant. For example, in the SETI Enhanced WU generator (the splitter), the angle range of a WU decides fpops_est, fpops_bound and delay_bound (deadline). It would be no problem to extend this functionality by adding a couple of lines looking something like this:
if AR < 0.4 => min_quorum = 3, target_nresults = 4
if 0.4 <= AR < 0.5 => min_quorum = 2, target_nresults = 3
if 0.5 <= AR < 1.1 => min_quorum = 3, target_nresults = 3
if 1.1 <= AR => min_quorum = 2, target_nresults = 2
Well, I guess you've got my meaning, so I'll not go any more off-topic in this thread. :)
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
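For illustration, a C++ sketch of that per-WU selection might look like the following. The struct, function name and thresholds are only assumptions mirroring the pseudo-config in the post above, not actual SETI splitter code.
```cpp
// Illustrative sketch of picking redundancy parameters per workunit from
// its angle range (AR), just as fpops_est, fpops_bound and delay_bound
// are already derived from it in the post's description.
struct WuRedundancy {
    int min_quorum;       // results needed to agree before validation
    int target_nresults;  // copies of the workunit issued initially
};

WuRedundancy pick_redundancy(double ar) {
    if (ar < 0.4) return {3, 4};
    if (ar < 0.5) return {2, 3};
    if (ar < 1.1) return {3, 3};
    return {2, 2};
}
```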
River~~ Send message Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
In my opinion the biggest weakness in BOINC was the decision to force the same degree of redundancy on every WU of a type. This is a reflection of the fact that on SETI at the time BOINC was first designed they had more computing power than they needed. Subsequent projects (including SETI when they get access to more input data) suffer from the fact that there is no degree of redundancy possible between 1 and 2.
I think you missed my meaning. Your example indicates that all WUs of a type (i.e. with the same parameters) will have the same redundancy.
Non-BOINC projects may decide the redundancy after the scheduler knows who is going to get the work, so that, for example, there can be hosts that are more trusted and less trusted. It can also be decided after the event to increase the redundancy retrospectively, if one of the hosts (or the only host) is later discovered to have been cheating elsewhere. These ways of doing things also allow for random testing (with a pattern known before, or only after, the initial crunching).
In contrast to those schemes, BOINC insists that the need for redundancy inheres in the data of the WU, not in the crunchers. By insisting that the user can see the degree of redundancy, the use of random blind redundancy is ruled out, a pity as it is the most efficient way of spotting deliberate cheating.
R~~ |
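A minimal C++ sketch of that last idea, blind spot-checking combined with per-host trust, might look like the following. The trust thresholds, the 10% check rate and all struct and function names are hypothetical; nothing here is BOINC or project code.
```cpp
#include <random>

// Illustrative sketch of "decide redundancy after you know the host":
// untrusted or unknown hosts are always double-checked, while trusted
// hosts are only spot-checked at random and cannot tell which of their
// results will be verified.
struct Host {
    double error_rate;   // fraction of this host's past results found wrong
    long   results_done; // how much history we have on it
};

// Returns how many copies of a workunit to issue once the first host is known.
int copies_to_issue(const Host& h, std::mt19937_64& rng) {
    bool trusted = h.results_done > 1000 && h.error_rate < 1e-4;
    if (!trusted) return 2;                        // full redundancy
    std::bernoulli_distribution spot_check(0.10);  // blind 10% sample
    return spot_check(rng) ? 2 : 1;
}
```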