Message boards : Number crunching : A "Good" Session...
Author | Message |
---|---|
Stephen Andersen Send message Joined: 24 Jul 06 Posts: 15 Credit: 20,359 RAC: 0 |
What is considered a "good" session of crunching? I've searched (honestly) for about 2 hours in the FAQs and didn't find anything on what I was looking for. The best "stderr out" that I've had was 24 decoys with 0 RMSD skipped. What exactly does this mean? Please enlighten, I failed physics 20 years ago. LOL Thank you for your patience. |
Keith Akins Send message Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Well, I don't know what FPU benchmarks your system is pulling, or even what your system is. Is it a P4 greater than 2.5GHz? An AMD? What's your target CPU time in your preferences? If you're running at the 3-hour default CPU time, then 24 decoys is nothing to sneeze at. In fact, that would be really good. It also depends on the WU itself: some can produce 24 to 30 decoys in three hours, while others might produce only three to five. Again, if you haven't changed your CPU time from the default, I wouldn't be concerned at all. As for the "0 RMSD skipped", that just means there was no known structure to compute an RMSD against (these are CASP targets). Hope this helps. |
Stephen Andersen Send message Joined: 24 Jul 06 Posts: 15 Credit: 20,359 RAC: 0 |
Thank you for the reply. I wanted to see whether I was producing "less than average" or "poor" results from my participation. My average computing time is around 4-5 hours per WU, so I assume I'm doing fine. I'm glad to see that I'm hangin' with the 'big dogs'. LOL Many thanks again. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Each model is basically created equal: whether it produced low- or high-energy results, it was a learning experience for the Baker team. Collectively, we're working to crunch 100,000 models (also called decoys; I've never understood why there are two names) for each protein CASP is putting out for prediction, so if you did 30, I did 28, and someone else did 37... it all adds up. If we can collectively crunch 100,000 models, then the prediction should be pretty good. Note that your PC could be sent the same protein to study more than once. The project team might have predicted that by such-and-such a date they'd HAVE 100,000, but if they don't yet, they keep sending out WUs for that protein until they have the desired number of results. In other words, they're ALL good sessions. The more hours you have your machine on, the more models you will help to crunch. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Stephen Andersen Send message Joined: 24 Jul 06 Posts: 15 Credit: 20,359 RAC: 0 |
Maybe this could be an additional question in the FAQ. So, if I'm understanding this correctly, a decoy (or model) is a promising protein shape (or string) that researchers may want to revisit at a later date. And a result of 1 or more RMSDs skipped basically means my computer found a known protein that doesn't need further analysis. If I'm on the right track, no need to reply. Better yet, even if I'm not, no need to reply; I was lost back at "accepted energy". LMAO |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
The statement was that there is no RMSD, because the proteins being studied are unknown. You see, the RMSD is a measure of how far your current prediction varies from the known structure. Since the structure is NOT known, the RMSD cannot be computed. I've related it to testing out your new GPS by navigating with a sextant and compass, and your GPS at the same time, and comparing the readings. If you view "GPS" as the new method of navigating, that's what Rosetta is: the new method of determining the protein's shape. I don't have an answer for why some decoys aren't completed... otherwise I probably would have explained it in a QA item :) My guess is that we got started on one and somehow determined it would not be fruitful to proceed to complete it. Meaning, they somehow predicted, in advance of all the crunching, that this was a road we need not go down. There are trillions of combinations of shapes a given protein can take. We each take on a tiny piece of that search space and compute a complete "model" (as shown in the graphic). In completing that model, you've run through several million of the possible shapes, all the while trying to home in on the one with the lowest energy. The protein is what we're all studying. It's a question of what it looks like. We know the chemical strands that make it up, but we don't know what shape it takes in nature. I can hand you a pile of 2x4s of various lengths and you know what they look like. I can tell you to assemble them in THIS order (i.e. they're numbered for you)... I can even ask you to assemble them into a small shed... but how many shed configurations could be made? And which would prove to be the "correct" configuration (i.e. the one these 2x4s take when nature produces them)? Rosetta is trying to predict that shape. Each model represents a possible prediction. Rosetta produces 100,000 (or more) predictions for a single protein we wish to study... for example, a protein found in the bird flu virus. So, it's not the protein that's promising, it is the prediction you are working on that might be promising. Once the shape of that protein in the virus is known... then a protein can be invented or discovered which might knock it out (i.e. FIT with it). To use my shed analogy... you've got this hole in the corner... if I come along with a door frame assembly that FITS the hole, then we've GOT something. But first, I must know what your shed looks like. Hope that helps. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
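For anyone curious what "how far your prediction varies from the known structure" means in practice, here is a minimal Python sketch of an RMSD calculation. It is not Rosetta's actual code; the function name and the toy coordinates are invented for illustration, and real tools first superimpose the two structures (e.g. with the Kabsch algorithm) before measuring.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two sets of atom coordinates.

    Assumes both arrays have shape (n_atoms, 3) and are already aligned;
    production tools perform an optimal superposition first.
    """
    diff = coords_a - coords_b
    return np.sqrt((diff ** 2).sum() / len(coords_a))

# Toy example: a 3-atom "prediction" vs. a 3-atom "known" structure.
predicted = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.2, 0.0]])
known     = np.array([[0.0, 0.0, 0.0], [1.5, 0.1, 0.0], [3.1, 0.0, 0.0]])

print(rmsd(predicted, known))  # small value = prediction close to the known shape
```

When the true structure is unknown (as for the CASP targets discussed above), there is simply nothing to pass as the second argument, which is why the log reports the RMSD as skipped.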
Ethan Volunteer moderator Send message Joined: 22 Aug 05 Posts: 286 Credit: 9,304,700 RAC: 0 |
Evening Stephen, My understanding is a decoy, or 'model', means one run through the Rosetta machine. You start with a random number (defined by your work unit and your computer's clock), do all the math that Rosetta@home thanks you for doing, and finally spit out a result. Since the work unit sizes are defined by time (you get to set the length), your computer is able to finish several decoys in that period. Some CPUs are faster than others, and some proteins are harder to compute than others, so the number of decoys per work unit varies considerably. The length of a protein is very important to how long it takes to compute. Say the average protein is 1 mile long (just using this unit since we're used to it)... and it takes 100,000 decoys to get a decent answer. As the proteins get bigger, it's like walking on a longer trail. Say there is a split in the trail every additional mile. In order to get the same quality of answer on a 2-mile trail, you'd have to do twice as many decoys (walking down both paths to see where each leads). This is analogous to the size of a protein. For CASP7, this varied from fewer than a hundred residues to over four hundred. Since the path 'diverges' every mile (100 units of protein length), the number of decoys needed goes up quickly. For one of the 400-length proteins (4 miles), they need 800,000 decoys (100,000 times 2*2*2). In the end these are all averages. If you had a million-sided die, how many times would you have to roll it to get a given number? You'd be lucky if it was only a few times, but if my experience in craps is any example, you can roll for a long time and not get what you want :) Hopefully that helps, and isn't too separated from the science. -Ethan |
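A rough Python sketch of the scaling Ethan describes (the doubling-per-100-residues rule and the numbers come from his trail analogy, not from Rosetta's actual formula):

```python
# Trail analogy: the search space roughly doubles for every extra "mile"
# (100 residues), so the number of decoys needed grows exponentially.
BASE_LENGTH = 100        # residues ("one mile" in the analogy)
BASE_DECOYS = 100_000    # decoys needed for a 100-residue protein

def decoys_needed(length_in_residues):
    extra_splits = (length_in_residues - BASE_LENGTH) // BASE_LENGTH
    return BASE_DECOYS * 2 ** max(extra_splits, 0)

for length in (100, 200, 300, 400):
    print(length, decoys_needed(length))
# 100 -> 100,000   200 -> 200,000   300 -> 400,000   400 -> 800,000
```

The 400-residue case reproduces the 800,000 figure in the post (100,000 times 2*2*2).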
soriak Send message Joined: 25 Oct 05 Posts: 102 Credit: 137,632 RAC: 0 |
Each work unit starts off at a random point and uses an elaborate algorithm to make tiny modifications to the structure and calculate the energy again. If such a structure is considered possible, the energy is measured and added to the 'accepted energy' - then it checks whether that energy happens to be lower than the previous one (which would be good) or higher (bad), and based on that outcome continues with different movements in a seemingly random fashion. (It uses a Monte Carlo method with a Metropolis-style acceptance test - how that works in detail is beyond my understanding, but as the name suggests it is about random numbers.) The goal is to find the structure with the absolute lowest energy - that's the only shape that exists in nature. Each model (or decoy) begins at a new random point and goes from there. If you look at the screensaver you'll notice that shortly after the model/decoy starts, it will only make very small changes. Every time a new model/decoy begins, it'll do a massive transformation - basically it tries an entirely different structure, then goes through minor changes for that one until it moves on to the next model. Although a very low energy means it is closer to the real thing, high-energy results also provide data that is important in refining the search and making it more efficient. If it takes 100,000 models today to get a high probability of being close to the one found in nature, then refining the method may lead to only requiring 20,000 models. In the same time, five times as many protein structures could be determined. This is really a game of probabilities... it'd be impossible to test all combinations. A small protein may have about 100 different places where it can twist. Even under the most basic assumption (i.e. twist = yes or twist = no at each place, ignoring the angles) there would be 2^100 different combinations - roughly a 1 with 30 zeros. And of course they can twist at all sorts of different angles and in many more than 100 places. Researchers won't look at more than a few of the models we return - they're merely red dots on a graph like this: https://boinc.bakerlab.org/rosetta/rah_top_predictions.php The lowest-energy and the lowest-RMSD results are the interesting ones. If we actually find the right one, the lowest-RMSD and lowest-energy results would be exactly the same model. RMSD measures the difference between the model and the "real" protein, where the structure is known already. There are ways to determine that structure experimentally, but they are very time-consuming and expensive. They can, however, be used to test Rosetta, because the project team then knows how close to the actual structure we get. Right now we're testing CASP proteins - these structures have been determined using one of the expensive methods, but the results have not been made public. Programs like Rosetta (but also other approaches) each do their thing to submit a prediction, and at the end the organizers publish the real structures and show who got how close. Two years ago (CASP is held every 2 years) Rosetta submitted the most accurate predictions - and that was before they had the kind of computing power they now have with BOINC and all of us helping. Hope that answers your questions - if not, just ask away ;) Maybe I should also answer your initial question: there's no way to say how many models/decoys make a good run... it's different for every work unit and even every model. The best measure of your "work" is the credit system - you can compare credit (especially RAC, which is a recent average) to other users.
Just don't be discouraged if others have much higher RACs; many people connect multiple computers, run them 24/7, and overclock to get the most out of them. |
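The "accepted energy" behaviour soriak describes (always keep a lower-energy structure, and sometimes keep a higher-energy one so the search can escape dead ends) matches the standard Metropolis Monte Carlo acceptance rule. Here is a toy Python sketch of that rule applied to a made-up one-dimensional "energy" function; none of this is Rosetta's actual code, and the temperature value and move size are arbitrary.

```python
import math
import random

def metropolis_accept(new_energy, old_energy, temperature=1.0):
    """Metropolis criterion: always accept a lower-energy move; accept a
    higher-energy move with probability exp(-dE / T), which lets the
    search climb out of local minima (simplified relative to Rosetta)."""
    if new_energy <= old_energy:
        return True
    return random.random() < math.exp(-(new_energy - old_energy) / temperature)

# Toy search: each "move" perturbs a single number standing in for a structure,
# and the "energy" is a simple function with a known minimum at x = 2.
def energy(x):
    return (x - 2.0) ** 2

x, e = 10.0, energy(10.0)
for _ in range(10_000):
    x_new = x + random.uniform(-0.5, 0.5)   # small random perturbation
    e_new = energy(x_new)
    if metropolis_accept(e_new, e):
        x, e = x_new, e_new                 # the move's energy is "accepted"

print(x, e)   # should end up near x = 2 with energy near 0
```

Each Rosetta model is analogous to restarting this loop from a fresh random starting point, which is why the screensaver shows a big jump at the start of every new decoy and only small adjustments afterwards.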
Stephen Andersen Send message Joined: 24 Jul 06 Posts: 15 Credit: 20,359 RAC: 0 |
Thank you very much for the responses. I now understand a little more about the purpose and scope of Rosetta and its mission. If I've got more questions, I know where to ask. :) |
Stephen Andersen Send message Joined: 24 Jul 06 Posts: 15 Credit: 20,359 RAC: 0 |
Oops.. had to go back and delete it.. Clicked too fast for the 'puter. LOL |