Improvements to Rosetta@home based on user feedback

Message boards : Number crunching : Improvements to Rosetta@home based on user feedback



senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16084 - Posted: 12 May 2006, 19:16:13 UTC

I think it is great to see this project's enthusiasm towards CASP. May I suggest shortening the deadlines to maybe a week or so during this project? I also think this would be appropriate since checkpointing has been such a success thus far.
ID: 16084
Profile Kerwin

Joined: 19 Sep 05
Posts: 10
Credit: 1,773,393
RAC: 0
Message 16173 - Posted: 13 May 2006, 16:56:38 UTC - in response to Message 16057.  
Last modified: 13 May 2006, 16:59:08 UTC

Aglarond, I agree with you. I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired Akos as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him. Because of his S41.06 client, my crunch time for a 'long' Einstein unit now hovers in the 45 to 48 minute range, compared to about 2 hours 45 minutes or more with the standard client. If he could bring his magic here now, I think it would be very good considering CASP has started.

I posted this idea in another thread, but now I think it belongs here:
What about kindly asking Akos Fekete, who made the optimizations on Einstein, if he could look at Rosetta and try to make optimizations here? It may be necessary to pay him for his effort, though, as he is rather busy.



ID: 16173
Profile Dimitris Hatzopoulos

Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 16193 - Posted: 13 May 2006, 19:54:01 UTC - in response to Message 16173.  

I was on the Einstein boards a few days ago and it was stated by Bernd Machenschalk, a project developer, that they hired Akos as a consultant for the project. In fact, at the time of his posting, Akos was sitting next to Bernd looking over the code.

I do remember a few weeks ago that he expressed interest in trying to optimize Rosetta.

It would be great to have him.


Where did you read that Akos expressed an interest in optimising Rosetta? That would be great, but when I mentioned it in passing to him, a couple of months ago in the Einstein forums, it was met with "benign indifference" (or so it seemed to me).

Akos is evidently a code wizard (I've met a couple) and his help could make a huge difference for any project.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 16193
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16225 - Posted: 14 May 2006, 5:38:40 UTC
Last modified: 14 May 2006, 19:40:29 UTC

I did contact Akos a while ago, but he for the moment wants to focus on further improvements to the Einstein code.

On the other issue--rosetta@home "oldtimers" will remember we were planning to release the rosetta source code several months ago in hopes that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

ID: 16225
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16236 - Posted: 14 May 2006, 6:32:17 UTC - in response to Message 16225.  


On the other issue--rosetta@home "oldtimers" will remember we were planning to release the rosetta source code several months ago in hopes that another Akos might be able to make significant speedups, but many participants objected to this for a variety of reasons (code corruption, cheating, etc.). Our general philosophy is that all results and code we develop should be accessible to the public and the scientific community.

-----------------------------------------------------------------------------

Hmmm. This is an interesting issue. I am currently attached to uFluids but do not crunch for them because I lost too much work. Anyway, their code was released over a month ago and I have yet to see any progress with it. Depending on one's reasoning, I see that both sides of the issue have good arguments.

I would agree with Dr. Baker that the code should be released in order to provide a service to the scientific community.

I do not think releasing the code to optimize it will do much good. I have observed that the majority of crunchers, such as myself, have little knowledge of "coding" and could not help even if they wanted to. The select few with the knowledge probably do not have the time or interest, or would manipulate it for their own benefit.

Although I have been with this project almost as long as David, I soon stopped crunching due to all of the errors my computer was getting at the time. Only recently have I resumed crunching for this project, so I do not know the history of the discussions here.

My personal suggestion would be to continue recruiting expertise from admins on other BOINC projects, as you seem to be doing. Chrulle, the former admin over at LHC, is looking for a job. Have you talked to him?

ID: 16236
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 16295 - Posted: 15 May 2006, 9:52:14 UTC - in response to Message 16236.  

Seti
The general optimisations over at Seti were mainly done by the compilers: adding faster math functions (IPP, etc.) and compiling for specific instruction sets (SSE, SSE2 and SSE3). There were some tweaks to caching etc., but it was mainly down to the compilers. Seti-enhanced now uses these 'math instructions' with some of the extra tweaks added, though it does not use the compiler optimisations for instruction sets, so that is what you'll see being released in their optimised apps.

Einstein
Akos reverse-engineered Einstein@home since they did not release the source. He dropped in code for SSE and SSE3, but I think his best work must be the 3DNow! drop-ins, which make even AMD K6s and Athlons (pre-AthlonXP) faster.
None of it was compiled from source, though I'm assuming he does that now ;-)


Rosetta
Given how fast the code changes, it would need to be optimised at the source level, speeding up generic sections. I do not think instruction-set work (3DNow!, SSE through SSE3, and maybe even multi-threading for dual-cores/hyperthreaders) would work here in the model of the above.
What I believe would need to happen is that BOINC reads the supported instruction sets from the CPU (relatively easy, if not already done) and sends them to the server; the server then sends out the correct app for the correct CPU and OS. That way they (Rosetta) just need to compile for each target, get it tested on Ralph, then release it over here. This is purely due to the pace of application changes: by the time someone compiled an optimised version it would be out of date, and we would be forever trying to keep up.
Either that, or compile the science app to determine the best code path for itself at run time, though this is not as fast and bulks the program out a little.
Team mauisun.org
ID: 16295
Profile Tallguy-13088

Joined: 14 Dec 05
Posts: 9
Credit: 843,378
RAC: 0
Message 16297 - Posted: 15 May 2006, 11:09:40 UTC - in response to Message 16295.  
Last modified: 15 May 2006, 11:12:44 UTC

In the world of the "mainframe", where I work for a living, it is not uncommon (in low-level system code) to see subroutines added that address specific optimizations at a "processor family" level or OS release level. This is typically done to address "incompatible" architectural changes (i.e. control block residency/format changes, etc.).

In the case of the multiple processor types present in a project of this nature, I suspect this would be a complex change to the code for the developers. My concern would be that there is a strong potential for making the code too complex for the average developer to maintain. Remember, this is still "basic science", and the focus is still on developing the techniques of the process Dr. Baker and crew are working on; as such, it is subject to regular change. I'm thinking that they want to focus on developing the techniques and not be "bogged down" with the specifics of processor-level optimizations quite yet. Once that is "where they want it", they can then set off on the task of "stroking the code".

Just my 2 cents.





ID: 16297
Profile Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16312 - Posted: 15 May 2006, 14:59:33 UTC

I agree that the application is changing too rapidly to worry about optimization. I know history tells us you can sometimes get 2x better throughput, but that is especially true for applications that were poorly coded to begin with. And it's especially true for applications that don't change, so the ONLY changes you are making pertain to optimizations.

I would just point out that releasing source might also open the door to new platforms. Not specifically with hope of an optimized client for a platform, but rather a straight port of the code. Once a port process is in place, it's generally pretty straightforward to keep up with new releases.

Either way, none of it is stuff you want to play with during CASP, while you're already stretching the limits and finding proteins that just happen to fit the profile of those you feel a new algorithm that's been on the shelf may work well with.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16312
Profile Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 16540 - Posted: 18 May 2006, 15:04:55 UTC - in response to Message 16225.  
Last modified: 18 May 2006, 15:09:03 UTC

I did contact Akos a while ago, but he for the moment wants to focus on further improvements to the Einstein code.

See this New Scientist article on Akos' Einstein work. So maybe he has now completed his Einstein code speed-up activities and is looking for new challenges... ;-)
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org
ID: 16540
Profile XeNO

Joined: 21 Jan 06
Posts: 9
Credit: 109,466
RAC: 0
Message 16624 - Posted: 19 May 2006, 9:34:56 UTC - in response to Message 16410.  

Rosetta 5.16:

(1) We're continuing our efforts to reduce the memory usage of typical rosetta@home workunits. You can expect an even further reduction in memory footprint in our next update.

(2) We're testing a new science mode which uses the sequence and structural information from homologous proteins in an early phase of the simulation, but then returns to the target protein sequence in the final refinement phase. This mode appears to have a larger memory footprint than typical workunits, so we will only send these jobs to computers that have >1 GB of RAM.

(3) Also, we're trying a new feature where at the end of a simulation, Rosetta compares its fold to the predictions made by a dozen other algorithms. (Those predictions are sent to the clients in a compressed format.) Seeing consensus between different algorithms is usually a good sign that a prediction is right.



Is that greater than or equal to 1 GB of RAM, or strictly greater? When my computer is not in use I have no problem with Rosetta taking a hog's share of resources.



ID: 16624
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16820 - Posted: 22 May 2006, 8:23:28 UTC

A plea: can merging be fixed? I have orphaned computers left, right and centre. It's making a nonsense of the stats as well; BOINC Synergy thinks I have 6 machines when I have 3. Repeat that to some degree for every producer and the stats become completely meaningless.

Is there a db whiz among you?
ID: 16820
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 16830 - Posted: 22 May 2006, 12:41:00 UTC

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date so that results are not received too late.
ID: 16830
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 16841 - Posted: 22 May 2006, 16:01:41 UTC - in response to Message 16830.  

Today I got two T0287 targets:

https://boinc.bakerlab.org/rosetta/result.php?resultid=21140036
https://boinc.bakerlab.org/rosetta/result.php?resultid=21177415

The expiry date for this protein is June 1st, if I'm reading this page correctly:

http://predictioncenter.org/casp7/targets/cgi/casp7-view.cgi

However, the deadline for both WUs was June 5th, which is past the expiry date.

I suggest shortening the deadlines to match the expiry date so that results are not received too late.


Good point--thanks for catching this!
ID: 16841 · Rating: 1
NJMHoffmann

Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 16843 - Posted: 22 May 2006, 16:17:36 UTC - in response to Message 16840.  
Last modified: 22 May 2006, 16:19:14 UTC

Actually, there are a number of dates that are pertinent depending on the category of CASP in which a project submits its results. The dates you are seeing are for server predictions. The predictions that Rosetta is working on are in a different category. The reporting dates they are using have already taken into account the dates the project needs in order to meet the CASP deadlines for the category in which they will submit their results.

Are you sure? The CASP website says the release date for the structure will be the 4th of June.

Norbert

Edit: Just saw the answer of David Baker.
ID: 16843
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 16844 - Posted: 22 May 2006, 16:45:45 UTC
Last modified: 22 May 2006, 16:48:06 UTC

The stats you cite are only adversely affected to the extent that individual machine statistics are the focus of the information desired. The vast majority of the stats are in fact collective, and not dependent on a view of a specific machine. All of the stats that depend on your total credit are still as accurate as ever; even the RAC for a particular machine will reflect the proper contribution if allowed to do so. The only stats significantly affected by this issue are those relating to the total credit for a particular machine.


True, it doesn't affect the project as such. I was primarily interested in the proportions of different CPUs when I was looking at the stats yesterday; it was then that it occurred to me that the numbers were probably out by a large margin. This in turn affects the credit/RAC averages per processor type, which are also interesting.

Just a way of deleting "false start" instances of machines would be a good place to start; I have several with zero credit. One took 3 "false starts" before it kicked into life last time, so the credit of one machine is read as that of 4.

It's pointless jeopardising the data in its entirety while trying to fix it - I'll wait until the bugs are out, thanks. :)
ID: 16844
Profile Jim_S
Joined: 26 Aug 06
Posts: 15
Credit: 497,976
RAC: 0
Message 43953 - Posted: 19 Jul 2007, 23:17:39 UTC

Why is Rosetta eating most of my resources even though I have the resource share set at 10%?

PEACE
ID: 43953
Profile Jim_S
Joined: 26 Aug 06
Posts: 15
Credit: 497,976
RAC: 0
Message 43968 - Posted: 20 Jul 2007, 13:04:16 UTC - in response to Message 43953.  
Last modified: 20 Jul 2007, 13:05:34 UTC

Why is Rosetta eating most of my resources even though I have the resource share set at 10%?

*BUMP* Any ideas or HEeeelllp?
Rosetta seems to IGNORE my BOINC settings.

PEACE
ID: 43968
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 43969 - Posted: 20 Jul 2007, 13:32:00 UTC

Jim, you used the term "resource share". This is a setting which tells BOINC how to allocate your machine's time across all of the projects you are attached to. If you are concerned because SETI is no longer running, it's not a problem. It's simply more efficient to run one project on a machine at a time, so BOINC rotates between your projects and, over time, will spend the configured share of time on each.

If you really only wanted BOINC to use 10% of your CPU time, there is a setting in your General Preferences which says "use at most ___ % of CPU". It most certainly is your choice, but I'd suggest that you either give BOINC at least 50%, or perhaps set it up to run only at night or during hours of the day when you aren't using the machine (these settings are on the same configuration page).

Once you change the setting for the location of the machine in question, just go back to your BOINC Manager and update the project so it brings down the new configuration.
Rosetta Moderator: Mod.Sense
ID: 43969
Profile Paydirt
Joined: 10 Aug 06
Posts: 127
Credit: 960,607
RAC: 0
Message 44058 - Posted: 22 Jul 2007, 2:00:48 UTC - in response to Message 43969.  

Hey Jim, I wanted to share some of my thoughts about BOINC using processor time. I've found that it does not cause any noticeable slowness on machines where I have it set to 100% CPU. This is because BOINC runs at the lowest priority, so whenever the computer needs the CPU for something else, BOINC temporarily gets out of the way. I don't even turn it off for gaming.

I do turn it off for spyware and virus sweeping.

ID: 44058
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 44061 - Posted: 22 Jul 2007, 9:19:08 UTC - in response to Message 44058.  

Hey Jim, I wanted to share some of my thoughts about BOINC using processor time. I've found that it does not cause any noticeable slowness on machines where I have it set to 100% CPU. This is because BOINC runs at the lowest priority, so whenever the computer needs the CPU for something else, BOINC temporarily gets out of the way. I don't even turn it off for gaming.

I do turn it off for spyware and virus sweeping.


Depending on your spyware/virus program, you could raise its priority a notch to save you having to stop BOINC for the big sweeps. I know mine is set for manual sweeps by default. If you run them as scheduled scans, you could just alter the schedule (assuming it uses the Windows built-in scheduler and not its own) to stop BOINC and restart it for you.

Another thing to do is restrict the run times of BOINC, say don't let it run all of Friday night (this can be set in the advanced preferences of newer BOINC clients). That way, anything BOINC keeps from working, like Google or Windows Desktop Search/indexing, Diskkeeper or other defragmenters, gets a chance to kick in. The reason these do not work alongside BOINC is that they look for inactivity of the computer rather than at priority (BOINC now does something similar with its in-use/not-in-use memory settings, but does it as it was intended to be done, afaik).


Anyway, that also gives another reason why people do not use BOINC: it interferes with defragmenters and virus scanners (though not so much here; Diskkeeper do know about this and have a solution). BOINC 'out of the box' needs tinkering to get it to work nicely alongside them.
Team mauisun.org
ID: 44061




©2025 University of Washington
https://www.bakerlab.org