Benchmark Rosetta - the results are incredible

Author	Message
Beteigeuze Joined: 29 Nov 05 Posts: 4 Credit: 8,257,422 RAC: 806	Message 52046 - Posted: 20 Mar 2008, 20:15:01 UTC
I found some inconsistencies that I have to tell you about. For some time I have been asking how important the L2 cache is: "E2180 vs. E4500, is a 2MB L2 Cache profitable" ? https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3637&nowrap=true#47371
For some days now I have been testing (benchmarking) a Core 2 Duo E6550 clocked at 2.45 GHz, and I am astonished at the granted credit. Should this CPU be faster than a Core 2 Duo E4400 clocked at 2.88 GHz or a Core 2 Duo E2200 clocked at 2.92 GHz? The internal benchmark measured a lower value for the E6550, so yes, the E2200 or the E4400 should be faster.
So I ran a test. I took some WUs and checked how long the same WUs need to complete on the different CPUs at the same clock rate. The first WU needed 3 h on the E2200 clocked at 2.45 GHz, but an E4500 and an E6550 clocked at 2.45 GHz needed exactly the same time. The second WU, the third WU and so on: all three CPUs needed the same time, differing by only a couple of minutes. How can that be?
OK, perhaps most of the work takes place in the RAM. I reduced the RAM clock speed. But no, the same time! I took an AMD X2 4400+ clocked at 2.45 GHz. The same time! I was quite put out about the matter.
Well, I cut the FSB in half on the 4400+ and the E2220 and restarted the test. It was a doozie! Only 1.2 GHz and still the same time!? I restored the old speed and set 50% CPU power in the preferences. How would you estimate the result now? This should be roughly equivalent to 1.2 GHz, so the same time? No, double the time. I don't understand this. If the calculations take place in the RAM, why do I have a processor load of 100%?
An example, a 2.45 GHz Intel vs. a 1.28 GHz AMD:
It seems that a mindless loop is producing the 100% CPU load, and that the calculation time depends on the CPU load. All discussions about the credit system that are based on these values are meaningless.
--------------------------------------------------
The other project I am engaged in is WCG (Help Conquer Cancer). The same bosh here? Here the results are realistic, but the AMD has no chance. The scaling is correct:
and the last pic (double time):
I hope you can reproduce/understand this.
ID: 52046

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 52048 - Posted: 20 Mar 2008, 21:57:05 UTC
I'm not sure what you believe you have measured. For starters, the time it takes to complete a task gives you no indication of the amount of work a given machine has completed. Credit is issued for each model completed. One machine might complete 10 models in 3hrs and another might complete 7 models in 2.8hrs... which one did more work?
There's no fixed number of models that need to be completed in any given task. Rosetta just keeps working on more models as the time remaining in your runtime preference allows. Some tasks have complex models that will take more than an hour of computing to complete a single model. Other tasks will go through a model every 10 minutes. And so credit is based upon the specific task and protein you complete models for. In fact, even within a specific task and protein, there is a fair degree of variation between one model and the next.
Since it's not possible to control the task and protein you are assigned, it is extremely difficult to create a fair testbed across multiple hosts.
Rosetta Moderator: Mod.Sense
ID: 52048
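
To put numbers on the "10 models in 3hrs versus 7 models in 2.8hrs" question above, here is a minimal sketch. It assumes both hosts worked on the same task and protein, since models from different tasks are not comparable, as the post explains. The function name `models_per_hour` is purely illustrative and is not part of Rosetta or BOINC.

```python
def models_per_hour(models_completed, runtime_hours):
    """Throughput in models per hour for one task on one host."""
    return models_completed / runtime_hours

# The two hypothetical hosts from the post above.
host_a = models_per_hour(10, 3.0)   # 10 models in 3 hours
host_b = models_per_hour(7, 2.8)    # 7 models in 2.8 hours

print(f"host A: {host_a:.2f} models/hour")
print(f"host B: {host_b:.2f} models/hour")
```

Runtime alone does not separate the two hosts; only the number of models completed per unit of runtime does.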

dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 52049 - Posted: 20 Mar 2008, 22:38:35 UTC
and so the most accurate benchmark is RAC ;)
ID: 52049

Paul Joined: 29 Oct 05 Posts: 193 Credit: 68,252,598 RAC: 4	Message 52057 - Posted: 21 Mar 2008, 11:22:27 UTC - in response to Message 52046.
These are great tests - thank you for taking the time to perform them. Unfortunately, the run time of a task provides very little information about the work done, because each task performs different computations and any given set of tasks yields inconsistent results.
It would be great to run each system for a week or more and look at the RAC for that system. After a week (maybe 2), the system will have received enough work units to provide an accurate picture of its actual performance.
I am very interested in the results of your tests because I want to make sure my future builds use the most productive parts for R@H. Keep us posted.
Thx! Paul
ID: 52057

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 52061 - Posted: 21 Mar 2008, 15:01:03 UTC
I've been working on PHP programs to pass along scheduler requests to the project, but rewrite the output to hit a download server of my own for the files needed to process the work. This takes the burden of the downloads from the Rosetta servers. So I've been watching the data streams that are exchanged between client and server in detail.
It might be possible to capture the information about a number of WUs over time, and cache all of the files needed to crunch them, and to sort of create a dummy BOINC project that would basically do nothing but let you download a set of benchmark WUs to measure and study. I could perhaps create multiple sets of benchmarks. Some for each Rosetta version. Some for abrelax. Some for docking. Etc.
The benchmark would give you a way to crunch the exact same WU each time, with the exact same random seed. And so you could then make an apples-to-apples comparison with another box, or user that has done the same. I should even be able to figure out how many credits your result would be granted (based upon the credit issued when my original host reported the completed work).
You would receive no credit for them, indeed the project would not be aware you are crunching them. The CPU time spent crunching them would offer no scientific benefit to the project (although perhaps the better understanding of how things work would bring more users to the project?).
If people think such an opportunity to perform a benchmark like this would be useful, please let me know your comments, and express your interest in this thread.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
ID: 52061

Beteigeuze Joined: 29 Nov 05 Posts: 4 Credit: 8,257,422 RAC: 806	Message 52062 - Posted: 21 Mar 2008, 19:21:29 UTC
Thank you very much for this feedback!
@Feet1st and the others: I think I have already made a dummy, perhaps in a slightly intricate way. I downloaded some WUs, then deactivated the network connection for this client and the tasks. Then I shut BOINC down completely. As a zip file I could transfer this dummy to every PC. I think this guarantees that every PC does the same work. This is the result you can see (I marked the area with the same WUs).
The key issue is that there is no significant difference between the calculation times when I choose another CPU or clock rate, as long as a 100% CPU load is guaranteed. How can a much slower CPU need the same time for a WU as a much faster CPU? I understand a WU as a work packet. The credits come later; that is a separate matter for me, but the same point should become clear there as well.
A little experiment: the internal benchmark on a 3.2 GHz E2200, with some Rosetta WUs:
floating point speed 3227.7 million ops/sec
integer speed 7437.7 million ops/sec
CPU time (sec).......claimed credit..........granted credit
3200 MHz
10,368.92...............64.00......................61.98
10,623.31...............65.57......................63.54
1600 MHz (reducing the multiplier)
10,026.92...............61.89......................32.45
10,475.34...............64.65......................34.98
8,841.64................54.57......................28.77
The CPU time (verify this time, it is correct) is the same. But if you take SETI or WCG, you get a comprehensible working time (my last 3 pics) ... or here: SETI BOINC Benchmark
What is so different about the Rosetta WUs? My intention is to optimize my machines.
Hi Feet1st, could you perhaps create a better test that everyone could use? How much effort would be involved in such a test?
(Sorry for my poor English; I can't take in all of your comments, thoughts and so on.)
ID: 52062
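
One way to read the table above is as granted credit per CPU hour. The sketch below only does the arithmetic on the five rows quoted in the post. The names `runs` and `granted_per_cpu_hour` are made up for illustration, and the result is a reading of these numbers only, not a statement about how Rosetta computes credit.

```python
# (cpu_seconds, claimed_credit, granted_credit) rows copied from the post above.
runs = {
    "3200 MHz": [(10368.92, 64.00, 61.98), (10623.31, 65.57, 63.54)],
    "1600 MHz": [
        (10026.92, 61.89, 32.45),
        (10475.34, 64.65, 34.98),
        (8841.64, 54.57, 28.77),
    ],
}

def granted_per_cpu_hour(rows):
    """Total granted credit divided by total CPU time in hours."""
    total_seconds = sum(cpu for cpu, _claimed, _granted in rows)
    total_granted = sum(granted for _cpu, _claimed, granted in rows)
    return total_granted / (total_seconds / 3600.0)

rates = {clock: granted_per_cpu_hour(rows) for clock, rows in runs.items()}
for clock, rate in rates.items():
    print(f"{clock}: {rate:.1f} granted credits per CPU hour")

ratio = rates["3200 MHz"] / rates["1600 MHz"]
print(f"3200 MHz / 1600 MHz ratio: {ratio:.2f}")
```

On these rows the CPU time per task is about the same at both clock speeds, while the granted credit per CPU hour at 3200 MHz comes out a little under twice the 1600 MHz figure.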

dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 52063 - Posted: 21 Mar 2008, 19:21:42 UTC
Hi Feet1st! It's been a busy few months here! I think it's a good idea - it'd allow more specific benchmarks than we can currently get, so we'd be able to see the true benefits of things like cache and memory speed...
ID: 52063

j2satx Joined: 17 Sep 05 Posts: 97 Credit: 3,670,592 RAC: 0	Message 52068 - Posted: 21 Mar 2008, 22:21:37 UTC - in response to Message 52061.
If people think such an opportunity to perform a benchmark like this would be useful, please let me know your comments, and express your interest in this thread.
I'm interested....wouldn't you have to run the same WU multiple times to get DCF trained? I'd like to see about ten in a package.
ID: 52068

Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0	Message 52076 - Posted: 22 Mar 2008, 13:11:23 UTC - in response to Message 52062.
CPU time (sec).......claimed credit..........granted credit
3200 MHz
10,368.92...............64.00......................61.98
10,623.31...............65.57......................63.54
1600 MHz (reducing the multiplier)
10,026.92...............61.89......................32.45
10,475.34...............64.65......................34.98
8,841.64................54.57......................28.77
The CPU time (verify this time, it is correct) is the same. But if you take SETI or WCG, you get a comprehensible working time (my last 3 pics) ... or here: What is so different about the Rosetta WUs
Maybe I don't understand what you are trying to depict, so if I'm way off just ignore me. What I see in your depiction is a Rosetta task runtime of about 3 hours, so regardless of the computer speed each task will run close to 3 hours. But one computer is 2x faster than the other and produces 2x more models... therefore 2x more credit. I assume that Seti and WCG give credits based on runtime, not on models produced; so you can't compare Rosetta to Seti in the same way.
ID: 52076

Beteigeuze Joined: 29 Nov 05 Posts: 4 Credit: 8,257,422 RAC: 806	Message 52077 - Posted: 22 Mar 2008, 14:55:13 UTC - in response to Message 52076.
@Nothing But Idle Time: This is a new approach for me; I have to get used to it. You mean one WU has the potential for several models, and a faster CPU can generate more of them, hence more credit. OK. But if every computer gets the same WUs, a chance is passed up on a slower CPU. What happens with a "half model"? Why isn't a complete investigation made? This should be an important factor.
ID: 52077

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 52080 - Posted: 22 Mar 2008, 18:01:31 UTC - in response to Message 52077.
What happens with a "half model"? Why isn't a complete investigation made? This should be an important factor.
There are no half models. And you are correct, this is an important factor. This is why at least one model must be completed before results can be sent back, regardless of your preferred runtime. For tasks with long-running models, and hosts with very short runtime preferences, this creates a disparity and poor estimates for time to completion.
So your machine will run the task until at least one model is completed. Then it checks the runtime so far as compared to the preference and decides if a second model should be started. Faster machines actually tend to crunch longer. I mean they get closer to the runtime preference. This is simply how the math works out.
Take the extreme example of a machine that takes 2 hours per model. It will compute for 2hrs, complete the model, and decide not to begin another one because it would very likely exceed the 3hr runtime preference. Now consider a machine that's twice as fast and only takes 1 hour per model for that same task. This machine will run the full three hours and complete three models in that time, as compared to the slower machine that ran for 2 hours and only completed one model. And so the faster machine will earn 3 times more credit in the 3 hours than the slower machine earned in 2. Or, to put it another way, credit earned per hour of runtime on the faster machine will be about double that of the slower machine.
And both machines, regardless of their relative speeds, get the same credit per model completed. This is why we say credit is based on work completed. Not based on your machine specs.
Rosetta Moderator: Mod.Sense
ID: 52080
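
The behaviour described above can be sketched as a small simulation. This is only an illustration under stated assumptions: every model on a host takes the same fixed time, and a new model is started only if it is expected to finish within the runtime preference. The function `run_task` and this exact stopping rule are hypothetical; the real Rosetta application may decide differently.

```python
def run_task(minutes_per_model, runtime_preference_minutes):
    """Return (models_completed, minutes_run) for one task.

    Assumed rule: always finish at least one model, then start another
    only if it is expected to finish within the runtime preference.
    """
    elapsed = 0
    models = 0
    while True:
        elapsed += minutes_per_model   # a started model always runs to completion
        models += 1
        if elapsed + minutes_per_model > runtime_preference_minutes:
            break                      # the next model would exceed the preference
    return models, elapsed

slow = run_task(minutes_per_model=120, runtime_preference_minutes=180)
fast = run_task(minutes_per_model=60, runtime_preference_minutes=180)

for name, (models, minutes) in (("slow", slow), ("fast", fast)):
    per_hour = models / (minutes / 60)
    print(f"{name}: {models} model(s) in {minutes} min, {per_hour:.2f} models/hour")
```

With the 3 hour preference from the post, the slow host stops after one model at 2 hours and the fast host completes three models in 3 hours, so at equal credit per model the fast host earns about double per hour of runtime.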

dcdc Joined: 3 Nov 05 Posts: 1835 Credit: 124,952,580 RAC: 45	Message 52083 - Posted: 22 Mar 2008, 19:06:22 UTC - in response to Message 52077.
@Nothing But Idle Time: This is a new approach for me; I have to get used to it. You mean one WU has the potential for several models, and a faster CPU can generate more of them, hence more credit. OK. But if every computer gets the same WUs, a chance is passed up on a slower CPU. What happens with a "half model"? Why isn't a complete investigation made? This should be an important factor.
In addition to what Mod.Sense has posted, it's also important to note that we're not running an exhaustive search of all possible models. There are too many possibilities to do that, so the search is based on a random sample of starting points.
ID: 52083

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 52104 - Posted: 24 Mar 2008, 19:30:26 UTC - in response to Message 52062.
Hi Feet1st, could you perhaps create a better test that everyone could use? How much effort would be involved in such a test?
The effort would involve doing a few things I've been meaning to do anyway, and would be very dependent upon whether, once I build it, it works the way I had imagined :) That's why I thought I'd only proceed on that if a number of people were interested in running such benchmarks.
Yes, your method of copying an entire BOINC directory structure and restoring it to another machine sounds fairly close to what I would achieve. But I'm not sure how you came up with credit figures. Perhaps those were other tasks that the two hosts had done?
My approach would have several advantages, such as the potential to run the benchmark on more than a single platform (Windows, Linux, Mac), and to have a complete benchmark that is easy to repeat if you like.
The problem is always making the time to do such things, more than the hours it takes. It would likely be several months before I'd have anything.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
ID: 52104