Core 2 QX6700 beats the 4 Opterons Dual core for the Number 1 position

Message boards : Number crunching : Core 2 QX6700 beats the 4 Opterons Dual core for the Number 1 position


Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 31059 - Posted: 13 Nov 2006, 16:42:54 UTC
Last modified: 13 Nov 2006, 16:43:13 UTC

Top 20
Of course, this QX6700 is overclocked to 4.0 GHz, but it is rock stable.
It also proves that HyperTransport is totally overhyped: if it were so good at memory bandwidth, the AMD system would be much faster than it is.
The QX6700 is beating an AMD system that has 4 memory controllers! What poor efficiency!

The Grandfather (they call it Quadfather), 4x4, will have about half as many processing units as this machine, will use about 125 W per socket, and will cost more than the QX6700. The motherboard will be more expensive than the 975XBX that I used. HyperTransport and all its high pin counts raise the price, but not the performance!!!

The demonstration is made: smart cores (Core 2) with prefetchers and good L2 caches are much better than expensive HyperTransport with aging cores (K8) and expensive motherboards.

So, Messrs. Three Marketeers, stop telling us that HyperTransport is the future; it is marketing hype, and we have exposed it here!!!!

May the Core be with you ;-)

who?
Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 31060 - Posted: 13 Nov 2006, 16:59:11 UTC

Great machine.

However, since Rosetta has very few L2-cache misses with a 1MB L2 cache, I expect that the number of memory controllers you have will not matter in this case. And of course, HyperTransport only really matters if there is communication between the processors... As Rosetta jobs don't need to communicate with each other (there is no shared memory between them, the application is statically linked and each instance has its own data set), this is not a particularly good benchmark for how good or bad any type of inter-processor communication is.
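Mats's point can be made concrete with a back-of-the-envelope estimate: the memory traffic a core generates is roughly its L2 access rate times its L2 miss rate times the cache-line size. The sketch below assumes a 64-byte line and a hypothetical 200 million L2 accesses per second per core; neither figure is a measurement of Rosetta, Core 2 or K8. The hit rates looped over are the ones quoted in this thread; only the linear dependence on the miss rate matters.

```python
# Back-of-the-envelope DRAM traffic per core from the L2 miss rate.
# Assumptions (hypothetical, for illustration only): 64-byte cache lines and
# 200 million L2 accesses per second per core.

LINE_BYTES = 64             # assumed cache-line size; each L2 miss fetches one line
L2_ACCESSES_PER_SEC = 2e8   # hypothetical access rate, not a measured value


def dram_traffic_mb_s(l2_accesses_per_sec: float, l2_miss_rate: float,
                      line_bytes: int = LINE_BYTES) -> float:
    """Memory traffic in MB/s = accesses/s * miss rate * bytes per line."""
    return l2_accesses_per_sec * l2_miss_rate * line_bytes / 1e6


for miss_rate in (0.0001, 0.01, 0.03):  # i.e. 99.99%, 99% and 97% hit rates
    mb_s = dram_traffic_mb_s(L2_ACCESSES_PER_SEC, miss_rate)
    print(f"hit rate {1 - miss_rate:.2%}: about {mb_s:.1f} MB/s per core")
```

Traffic scales linearly with the miss rate, so code that almost always hits in L2 puts little load on the memory controllers, however many there are.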

My best machine is on the next page down... :-(

--
Mats
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 31062 - Posted: 13 Nov 2006, 17:13:45 UTC - in response to Message 31060.  
Last modified: 13 Nov 2006, 17:14:20 UTC

You got my point: for number crunching, the programmer is usually smart enough to do a little data-locality work and avoid accesses that miss the L2 cache, which makes 4 memory controllers totally useless.
On top of this, you have to choose between avoiding 1 of the 3 latencies of a memory access by putting the memory controller on the die, or avoiding all 3 latencies by prefetching the data correctly.
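The trade-off between shortening each miss (an on-die memory controller) and hiding misses behind useful work (prefetching) can be illustrated with a toy timing model. This sketch does not model the "3 latencies" above, which the post does not spell out; it assumes a hypothetical loop that needs one new cache line per iteration, an in-order core that stalls for the full latency on a demand miss, and unlimited outstanding prefetches with no bandwidth limit. The function name and all numbers are made up for illustration.

```python
def simulate_loop_ns(n_iters: int, compute_ns: float, miss_ns: float,
                     prefetch_distance: int) -> float:
    """Total time (ns) for a toy loop that needs one new cache line per iteration.

    Hypothetical model: the core stalls for the full `miss_ns` on a demand miss;
    a prefetch issued at the top of iteration i fetches the line for iteration
    i + prefetch_distance and arrives `miss_ns` later; any number of prefetches
    may be in flight at once (no bandwidth limit).
    """
    ready_at = {}  # iteration index -> time its prefetched line arrives
    t = 0.0
    for i in range(n_iters):
        ahead = i + prefetch_distance
        if prefetch_distance > 0 and ahead < n_iters:
            ready_at[ahead] = t + miss_ns      # issue the prefetch now
        if i in ready_at:
            t = max(t, ready_at.pop(i))        # wait only for what is left
        else:
            t += miss_ns                       # demand miss: full latency exposed
        t += compute_ns                        # the useful work
    return t


# Hypothetical numbers: 10 ns of work per iteration, 100 ns per miss.
for distance in (0, 4, 20):
    total = simulate_loop_ns(1000, 10.0, 100.0, distance)
    print(f"prefetch distance {distance:2d}: {total / 1000:.1f} ns per iteration")
```

Under these assumptions, prefetching 20 iterations ahead hides the latency completely once the first 20 demand misses are paid, while lowering `miss_ns` (the on-die-controller option) only makes each exposed miss cheaper.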

On Rosetta, Core 2 has a 99.99% L2 cache hit rate; after running a little test on my X2, I found that this is far from true on the X2. The X2 hit rate is more like 97%, so 3% of its accesses go to a memory subsystem much slower than the core clock.
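The hit rates above can be plugged into the standard average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The 5 ns L2 hit time and 70 ns miss penalty below are hypothetical round numbers, not measurements of either chip; the point is how quickly the miss term grows as the hit rate drops from 99.99% to 97%.

```python
def amat_ns(hit_time_ns: float, hit_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus miss rate times miss penalty."""
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Hypothetical latencies, for illustration only.
L2_HIT_NS = 5.0
MISS_PENALTY_NS = 70.0

for label, hit_rate in (("Core 2 (as claimed)", 0.9999), ("X2 (as claimed)", 0.97)):
    print(f"{label}: {amat_ns(L2_HIT_NS, hit_rate, MISS_PENALTY_NS):.3f} ns per access")
```

How much of the run time those misses cost depends on the miss penalty as well as on the miss rate.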

Time to drive to work... Stay tuned, I'll release an optimized version of SETI soon to prove my point. I wish the Rosetta source code were open too.


who?
Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 31065 - Posted: 13 Nov 2006, 17:24:42 UTC - in response to Message 31062.  

Yes, if AMD built a processor JUST for Rosetta, then I doubt it would have one memory controller per processor. However, I also doubt that AMD would be a reasonably successful company if that were its speciality.

Which X2 model are you referring to? There are models with 256K, 512K and 1024K of L2 cache per core. On one with 1024K per core, I get a 99% L2-cache hit rate... With a smaller L2 cache, the hit rate would obviously be lower...
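The effect of L2 size on hit rate can be reproduced with a tiny cache simulator. This sketch models a fully associative LRU cache with 64-byte lines (real L2 caches are set-associative, so this is a simplification) and feeds it a hypothetical trace of random reads over a 600 KB working set; it is not a Rosetta trace. Random reads have almost no locality, so the absolute hit rates come out far below what well-structured code achieves; only the trend across the 256K, 512K and 1024K sizes is meaningful.

```python
import random
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size


def lru_hit_rate(addresses, cache_bytes, line_bytes=LINE_BYTES):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    capacity = cache_bytes // line_bytes  # number of lines the cache can hold
    cache = OrderedDict()                 # line number -> None, least recent first
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)       # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


# Hypothetical trace: 200,000 random byte addresses within a 600 KB working set.
rng = random.Random(42)
trace = [rng.randrange(600 * 1024) for _ in range(200_000)]

for kb in (256, 512, 1024):
    print(f"{kb}K L2: hit rate {lru_hit_rate(trace, kb * 1024):.1%}")
```

Once the whole working set fits (the 1024K case here), only the first touch of each line misses.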

--
Mats
Michael G.R.
Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 31073 - Posted: 13 Nov 2006, 18:18:54 UTC

Let's also keep things in perspective: newer CPU architectures being faster than old ones is nothing new.

The K8 architecture (A64) was introduced in 2003, so it's pretty impressive that it has stayed competitive this long.
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 31082 - Posted: 13 Nov 2006, 19:27:25 UTC

Actually, RAC may not be the best indicator, since it doesn't take into account bad work units, the effect of bulk uploading, etc.

BUT just by looking at the credit per hour:
Opteron 870: ~10 credits/hr per core (the 870 is 2 GHz, I think).
C2Q: ~30 credits/hr per core; you say that is at 4 GHz, so at its real speed (2.66 GHz) it would be about 20 credits/hr per core.
Overall the C2Q is faster at the same clock, but a slight overclock of all the Opteron cores would put that system past the C2Q in Rosetta@home.
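The arithmetic above can be checked in a few lines. The per-core rates and clocks are the approximate figures from this post, the core counts (eight Opteron cores, four Core 2 cores) come from the thread title, and linear scaling of credit with clock speed is an assumption: memory latency does not shrink when the core is overclocked, so real scaling may be somewhat below linear.

```python
# Approximate per-core figures from the post above; core counts from the thread title.
OPTERON_870 = {"cores": 8, "ghz": 2.0, "credits_per_hr_core": 10.0}
QX6700_OC = {"cores": 4, "ghz": 4.0, "credits_per_hr_core": 30.0}
QX6700_STOCK_GHZ = 2.66


def credits_per_ghz(cpu: dict) -> float:
    """Credits per hour per core, per GHz of clock."""
    return cpu["credits_per_hr_core"] / cpu["ghz"]


def system_credits_per_hr(cpu: dict, ghz=None) -> float:
    """Whole-machine credits/hr at `ghz`, ASSUMING credit scales linearly with clock."""
    clock = cpu["ghz"] if ghz is None else ghz
    return cpu["cores"] * credits_per_ghz(cpu) * clock


print(credits_per_ghz(QX6700_OC), credits_per_ghz(OPTERON_870))    # 7.5 5.0
print(f"{system_credits_per_hr(QX6700_OC):.1f}")                   # 120.0
print(f"{system_credits_per_hr(QX6700_OC, QX6700_STOCK_GHZ):.1f}")  # 79.8
print(f"{system_credits_per_hr(OPTERON_870):.1f}")                 # 80.0
```

Under these assumptions the stock QX6700 (about 79.8 credits/hr) and the 2 GHz Opteron system (80 credits/hr) come out almost level, which is why only a slight overclock of the Opterons would be needed to pull ahead.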

Of course it would be cheaper to run the C2Q ;-)


Mind you, you should be comparing it to the 3 GHz Opteron 856 (for that platform).

Or to the Socket F platform's 8220 SE (2.8 GHz, PC2-5300) rather than the aging 870.

;-)


Team mauisun.org
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 31102 - Posted: 14 Nov 2006, 1:15:28 UTC - in response to Message 31082.  

Hehehe, I am not going to slow my machine down just to find out that it is still faster than the Opteron when I slow it down...

Nah!

Who?

©2025 University of Washington
https://www.bakerlab.org