Improvements to Rosetta@home based on user feedback

Message boards : Number crunching : Improvements to Rosetta@home based on user feedback



senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16084 - Posted: 12 May 2006, 19:16:13 UTC

I think it is great to see this project's enthusiasm towards CASP. May I suggest shortening the deadlines to maybe a week or so during this project? I also think this would be appropriate since checkpointing has been such a success thus far.
ID: 16084
Profile Kerwin

Joined: 19 Sep 05
Posts: 10
Credit: 1,773,393
RAC: 0
Message 16173 - Posted: 13 May 2006, 16:56:38 UTC - in response to Message 16057.  
Last modified: 13 May 2006, 16:59:08 UTC

Aglarond, I agree with you. I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired Akos as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him. Because of his S41.06 client, my crunch time for a 'long' Einstein unit now hovers in the 45 to 48 minute range, compared to about 2 hours 45 minutes or more with the standard client. If he could bring his magic here now, I think it would be very good considering CASP has started.

I posted this idea in another thread, but now I think it belongs here:
What about kindly asking Akos Fekete, who made the optimizations on Einstein, if he could look at Rosetta and try to make optimizations here? It may be necessary to pay him for his effort, though, as he is rather busy.



ID: 16173
Profile Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 16193 - Posted: 13 May 2006, 19:54:01 UTC - in response to Message 16173.  

I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired Akos as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him.


Where did you read that Akos expressed an interest in optimising Rosetta? That would be great, but when I mentioned it in passing to him, a couple of months ago in the Einstein forums, it was met with "benign indifference" (or so it seemed to me).

Akos is evidently a code wizard (I've met a couple) and his help could make a huge difference for any project.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 16193
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16225 - Posted: 14 May 2006, 5:38:40 UTC
Last modified: 14 May 2006, 19:40:29 UTC

I did contact Akos a while ago, but he for the moment wants to focus on further improvements to the Einstein code.

On the other issue--rosetta@home "oldtimers" will remember we were planning to release the rosetta source code several months ago in hopes that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

ID: 16225
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16236 - Posted: 14 May 2006, 6:32:17 UTC - in response to Message 16225.  


On the other issue--rosetta@home "oldtimers" will remember we were planning to release the rosetta source code several months ago in hopes that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

-----------------------------------------------------------------------------

Hmmm. This is an interesting issue. I am currently attached to uFluids but do not crunch for them because I lost too much work. Anyway, their code was released over a month ago and I have yet to see any progress with it. Depending on one's reasoning, I see that both sides of the issue have good arguments.

I would agree with Dr. Baker that the code should be released in order to provide a service to the scientific community.

I do not think releasing the code to optimize it will do much good. I have observed that the majority of crunchers, such as myself, have little knowledge of "coding" and could not help even if they wanted to. The select few with the knowledge probably do not have the time or interest, or would manipulate it for their own benefit.

Although I have been with this project almost as long as David, I soon stopped crunching due to all of the errors my computer was getting at the time. Only recently have I resumed crunching for this project, so I do not know the history of the discussions here.

My personal suggestion would be to continue recruiting expertise from admins on other BOINC projects, as you seem to be doing. Chrulle, the former admin over at LHC, is looking for a job. Have you talked to him?

ID: 16236
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 16295 - Posted: 15 May 2006, 9:52:14 UTC - in response to Message 16236.  

Seti
The general optimisations over at Seti were mainly done by the compilers: adding faster math functions (IPP, etc.) and compiling for specific instruction sets (SSE, SSE2 and SSE3). There were some tweaks to caching etc., but it was mainly down to the compilers. Seti-enhanced now uses these 'math instructions' with some of the extra tweaks added, though it does not use the compiler optimisations for instruction sets, so that is what you'll see being released in their optimised apps.

Einstein
Akos reverse-engineered Einstein@home since they did not release the source. He dropped in code for SSE and SSE3, but I think his best work must be the 3DNow! drop-ins, which make even AMD K6s and Athlons (pre-AthlonXP) faster.
None of it was compiled from source, though I'm assuming he does that now ;-)


Rosetta
Given how fast the code changes, it would need to be optimised at the source level, speeding up generic sections. I do not think instruction-set work (3DNow!, SSE through SSE3, and maybe even multi-threading for dual-cores/hyperthreaders) would work here in the model of the above.
What I believe would need to happen is that BOINC reads the supported instruction sets from the CPU (relatively easy, if not already done) and sends them to the server; the server then sends out the correct app for the correct CPU and OS. That way they (Rosetta) just need to compile for each target, get it tested on Ralph, then release it over here. This is purely due to the pace of application changes: by the time someone compiled an optimised version it would be out of date, and we would be forever trying to keep up.
Either that, or compile the science app to determine the best code path for itself at run time, though this is not as fast and bulks the program out a little.
Team mauisun.org
ID: 16295
Profile Tallguy-13088

Joined: 14 Dec 05
Posts: 9
Credit: 843,378
RAC: 0
Message 16297 - Posted: 15 May 2006, 11:09:40 UTC - in response to Message 16295.  
Last modified: 15 May 2006, 11:12:44 UTC

In the world of the "mainframe", where I work for a living, it is not uncommon (in low-level system code) to see subroutines added that address specific optimizations at a "processor family" level or OS release level. This is typically done to address "incompatible" architectural changes (i.e. control block residency/format changes, etc.).

In the case of the multiple processor types present in a project of this nature, I suspect this would be a complex change to the code for the developers. My concern would be that there is a strong potential for making the code too complex for the average developer to maintain. Remember, this is still "basic science", and the focus is still on developing the techniques of the process Dr. Baker and crew are working on; as such, it is subject to regular change. I'm thinking that they want to focus on developing the techniques and not be "bogged down" with the specifics of processor-level optimizations quite yet. Once that is "where they want it", they can then set off on the task of "stroking the code".

Just my 2 cents.





ID: 16297
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16312 - Posted: 15 May 2006, 14:59:33 UTC

I agree that the application is changing too rapidly to worry about optimization. I know history tells us you can sometimes get 2x better throughput, but that is especially true for applications that were poorly coded to begin with. And it's especially true for applications that don't change, so the ONLY changes you are making pertain to optimizations.

I would just point out that releasing source might also open the door to new platforms. Not specifically with hope of an optimized client for a platform, but rather a straight port of the code. Once a port process is in place, it's generally pretty straightforward to keep up with new releases.

Either way, none of it is stuff you want to play with during CASP, while you're already stretching the limits and finding proteins that just happen to fit the profile of those you feel a new algorithm that's been on the shelf may work well with.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16312
Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 16540 - Posted: 18 May 2006, 15:04:55 UTC - in response to Message 16225.  
Last modified: 18 May 2006, 15:09:03 UTC

I did contact Akos a while ago, but he for the moment wants to focus on further improvements to the Einstein code.

See this New Scientist article on Akos' Einstein work. So maybe he has now completed his Einstein code speed-up activities and is looking for new challenges... ;-)
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org
ID: 16540
Profile XeNO

Joined: 21 Jan 06
Posts: 9
Credit: 109,466
RAC: 0
Message 16624 - Posted: 19 May 2006, 9:34:56 UTC - in response to Message 16410.  

Rosetta 5.16:

(1) We're continuing our efforts to reduce the memory usage of typical rosetta@home workunits. You can expect an even further reduction in memory footprint in our next update.

(2) We're testing a new science mode which uses the sequence and structural information from homologous proteins in an early phase of the simulation, but then returns to the target protein sequence in the final refinement phase. This mode appears to have a larger memory footprint than typical workunits, so we will only send these jobs to computers that have >1 GB of RAM.

(3) Also, we're trying a new feature where at the end of a simulation, Rosetta compares its fold to the predictions made by a dozen other algorithms. (Those predictions are sent to the clients in a compressed format.) Seeing consensus between different algorithms is usually a good sign that a prediction is right.



Is that greater than or equal to 1 GB of RAM, or strictly greater? When my computer is not in use I have no problem with Rosetta taking a hog's share of resources.



ID: 16624
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16820 - Posted: 22 May 2006, 8:23:28 UTC

A plea: can merging be fixed? I have orphaned computers left, right and centre. It's making a nonsense of the stats as well; BOINC Synergy thinks I have 6 machines when I have 3. Repeat that to some degree for every producer and the stats become completely meaningless.

Is there a db whiz among you?
ID: 16820
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 16830 - Posted: 22 May 2006, 12:41:00 UTC

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date so that results are not received too late.
ID: 16830
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16841 - Posted: 22 May 2006, 16:01:41 UTC - in response to Message 16830.  

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date so that results are not received too late.


Good point--thanks for catching this!
ID: 16841 · Rating: 1
NJMHoffmann

Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 16843 - Posted: 22 May 2006, 16:17:36 UTC - in response to Message 16840.  
Last modified: 22 May 2006, 16:19:14 UTC

Actually, there are a number of dates that are pertinent depending on the category of CASP in which a project submits its results. The dates you are seeing are for server predictions. The predictions that Rosetta is working on are in a different category. The reporting dates they are using have already taken into account the dates the project needs in order to meet the CASP deadlines for the category in which they will submit their results.

Are you sure? The CASP website says the release date for the structure will be the 4th of June.

Norbert

Edit: Just saw the answer of David Baker.
ID: 16843
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16844 - Posted: 22 May 2006, 16:45:45 UTC
Last modified: 22 May 2006, 16:48:06 UTC

The stats you cite are only adversely affected to the extent that individual machine statistics are the focus of the information desired. The vast majority of the stats are in fact collective, and not dependent on a view of a specific machine. All of the stats that depend on your total credit are still as accurate as ever; even the RAC for a particular machine will reflect the proper contribution if allowed to do so. The only stats significantly affected by this issue are those relating to the total credit for a particular machine.


True, it doesn't affect the project as such. I was primarily interested in the proportions of different CPUs when I was looking at the stats yesterday; it was then that it occurred to me that the numbers were probably out by a large margin. This in turn affects the credit/RAC averages per processor type, which are also interesting.

Just a way of deleting "false start" instances of machines would be a good place to start; I have several with zero credit. One took 3 "false starts" before it kicked into life last time, so the credit of one machine is read as that of 4.

It's pointless jeopardising the data in its entirety while trying to fix it - I'll wait until the bugs are out, thanks. :)
ID: 16844
Profile Jim_S
Joined: 26 Aug 06
Posts: 15
Credit: 497,976
RAC: 0
Message 43953 - Posted: 19 Jul 2007, 23:17:39 UTC

Why is Rosetta eating most of my resources even though I have the resource share set at 10%?

PEACE
ID: 43953
Profile Jim_S
Joined: 26 Aug 06
Posts: 15
Credit: 497,976
RAC: 0
Message 43968 - Posted: 20 Jul 2007, 13:04:16 UTC - in response to Message 43953.  
Last modified: 20 Jul 2007, 13:05:34 UTC

Why is Rosetta eating most of my resources even though I have the resource share set at 10%?

*BUMP* Any ideas or HEeeelllp?
Rosetta seems to IGNORE my BOINC settings.

PEACE
ID: 43968
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 43969 - Posted: 20 Jul 2007, 13:32:00 UTC

Jim, you used the term "resource share". This is a setting which tells BOINC how to allocate your machine's time across all of the projects you are attached to. If you are concerned because SETI is no longer running, it's not a problem. It's simply more efficient to run one project on a machine at a time, so BOINC rotates between your projects and, over time, will spend the configured share of time on each.

If you really only wanted BOINC to use 10% of your CPU time, there is a setting in your General Preferences which says "use at most ___ % of CPU". It most certainly is your choice, but I'd suggest that you either give BOINC at least 50%, or perhaps set it up to run only at night or during hours of the day when you aren't using the machine (these settings are on the same configuration page).

Once you change the setting for the location of the machine in question, just go back to your BOINC Manager and update the project so it brings down the new configuration.
Rosetta Moderator: Mod.Sense
ID: 43969
Profile Paydirt
Joined: 10 Aug 06
Posts: 127
Credit: 960,607
RAC: 0
Message 44058 - Posted: 22 Jul 2007, 2:00:48 UTC - in response to Message 43969.  

Hey Jim, I wanted to share some of my thoughts about BOINC using processor time. I've found that it does not cause any noticeable slowness on machines where I have it set to 100% CPU. This is because BOINC runs at the lowest priority, so whenever the computer needs the CPU for something else, BOINC temporarily gets out of the way. I don't even turn it off for gaming.

I do turn it off for spyware and virus sweeping.

ID: 44058
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 44061 - Posted: 22 Jul 2007, 9:19:08 UTC - in response to Message 44058.  

Hey Jim, I wanted to share some of my thoughts about BOINC using processor time. I've found that it does not cause any noticeable slowness on machines where I have it set to 100% CPU. This is because BOINC runs at the lowest priority, so whenever the computer needs the CPU for something else, BOINC temporarily gets out of the way. I don't even turn it off for gaming.

I do turn it off for spyware and virus sweeping.


Depending on your spyware/virus program, you could raise its priority a notch to save you having to stop BOINC for the big sweeps. I know mine is set for manual sweeps by default. If you run them as scheduled scans, you could just alter the schedule (assuming it uses the Windows built-in scheduler and not its own) to stop BOINC and restart it for you.

Another thing to do is restrict the run times of BOINC, say don't let it run all of Friday night (this can be set in the advanced preferences of newer BOINC clients). That way, anything BOINC keeps from working, like Google or Windows Desktop Search/indexing, Diskkeeper or other defragmenters, gets a chance to kick in. The reason these do not work alongside BOINC is that they look for inactivity of the computer rather than at priority (BOINC now does something similar with its in-use/not-in-use memory settings, but does it as it was intended to be done, afaik).


Anyway, that also gives another reason why people do not use BOINC: it interferes with defragmenters and virus scanners (though not so much here; Diskkeeper do know about this and have a solution). BOINC 'out of the box' needs tinkering to get it to work nicely alongside them.
Team mauisun.org
ID: 44061




©2025 University of Washington
https://www.bakerlab.org