who will be the next king of the hill?

DHG
Joined: 18 Dec 06
Posts: 11
Credit: 8,277
RAC: 0
Message 35181 - Posted: 21 Jan 2007, 9:05:58 UTC - in response to Message 31474.  

And why would HyperTransport make any sort of difference to Rosetta? Anyone claiming that would be as clever as saying that a Ferrari is better than a normal car in London City rush-hour traffic - it isn't, because there's no room to benefit from the speed. As we've discussed before, a good machine (with large enough caches) will have a 97%+ cache hit rate with Rosetta... You are the one who said you wanted to see a stop to performance gain claims that are based on "other" improvements. You've got a brand new core in your processor, it's not doing much memory access, I presume, so even if you had a shared bus at 100 MHz, it shouldn't make much, if any, difference at all.

--
Mats

I agree with you. HyperTransport is useless on Rosetta :), as it is on all crunching workloads: a well-programmed crunching algorithm will always fit in the cache, or close to it.

I actually see a performance improvement on Rosetta when I increase the front-side bus speed; it is due to the fact that the 3% can be compressed to 1% :)

In the racing world, 2% is great :)

who?


Okay, I lost it. What on earth does an interconnect bus have to do with computing power? It's only used to connect the CPU to the chipset, or in multi-socket systems to connect the CPUs with each other, WITHOUT the need for "glue logic". The only related feature I can imagine that actually improves performance is the on-die memory controller.

Further, I have never heard anyone 'hype' the HT bus as a performance booster, and even if I had, I would have had a good laugh at them, just as I found it funny how anti-HT you were. But now it isn't funny anymore, it's getting annoying. As an engineer you must know better, because repeating this so many times (=> HT didn't do a thing to improve performance) when it isn't exactly the truth (=> it isn't meant to improve performance; it's just like saying that Intel's speed throttling doesn't do a thing to improve performance) just implies ignorance on your part.
ID: 35181 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 35190 - Posted: 21 Jan 2007, 13:09:41 UTC - in response to Message 35181.  

Who? hasn't really brought it up that much recently (you responded to a 2-month-old post ;)
But I'm sure he will now :-)
Team mauisun.org
ID: 35190 · Rating: 0
DHG
Joined: 18 Dec 06
Posts: 11
Credit: 8,277
RAC: 0
Message 35202 - Posted: 21 Jan 2007, 14:13:51 UTC - in response to Message 35190.  

Ugh, I'm feeling stupid right now for not noticing the date ... Well, I knew that he was saying this when I first arrived here, and then he quit. And it seemed (from my point of view, disregarding the date, that is) that it was starting all over again. So I guess I'll take back all the repeating-it-to-death stuff.
ID: 35202 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 35212 - Posted: 21 Jan 2007, 15:55:44 UTC

DHG, I don't know as much about hardware as I do about software, but I believe the point is just that Who? and Mats, observing Rosetta as it runs, are finding that it doesn't have to go out to memory very often: it gets what it needs loaded (or prefetched) into the processor cache. The fraction of accesses served from the cache is the cache hit rate. So, regardless of how much memory access speed might be improved, it won't run Rosetta much better, because Rosetta doesn't access main memory much of the time.
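
To put rough numbers on the cache-hit-rate argument, here is a minimal sketch using the textbook average memory access time formula (hit time plus miss rate times miss penalty). The 97% hit rate is the figure quoted earlier in this thread; the latencies in nanoseconds and the function name are made-up assumptions for illustration, not measurements of Rosetta.

```python
def average_access_time_ns(hit_rate, cache_ns, memory_ns):
    """Textbook average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_ns + (1.0 - hit_rate) * memory_ns


# Hypothetical latencies, chosen only for illustration.
CACHE_NS = 5.0
SLOW_MEMORY_NS = 100.0
FAST_MEMORY_NS = 50.0  # pretend the path to memory became twice as fast

for hit_rate in (0.97, 0.50):
    slow = average_access_time_ns(hit_rate, CACHE_NS, SLOW_MEMORY_NS)
    fast = average_access_time_ns(hit_rate, CACHE_NS, FAST_MEMORY_NS)
    print(f"hit rate {hit_rate:.0%}: {slow:.1f} ns -> {fast:.1f} ns "
          f"({slow / fast:.2f}x faster per access)")
```

Under these assumed numbers, halving the memory latency makes each access about 1.23x faster at a 97% hit rate, versus about 1.83x at a 50% hit rate. Memory accesses are only part of the total run time, so the effect on a whole work unit would be smaller still.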
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 35212 · Rating: 0
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 35217 - Posted: 21 Jan 2007, 17:12:01 UTC - in response to Message 35212.  
Last modified: 21 Jan 2007, 18:11:30 UTC

hehehehe :)

I have nothing to say about hyperSLOWtransport on the desktop; the Grandfather 4x4 prooooooved me right by slowing down all the games at launch.
Non-Uniform Memory Architecture (NUMA) may be good on servers, but it has proven to be a catastrophe on desktop computers.
And actually ... the slowdown of 4x4 comes from HT cross-link latency, so, as an engineer (see post below), I can say that HT degrades performance on the desktop, because I can prove it.

My V8 just passed RAC 3000 again ...

who?
ID: 35217 · Rating: 0
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 35219 - Posted: 21 Jan 2007, 18:14:45 UTC - in response to Message 35190.  
Last modified: 21 Jan 2007, 18:51:06 UTC


Who? hasn't really brought it up that much recently (you responded to a 2-month-old post ;)
But I'm sure he will now :-)

Done :P

Who? did that?
ID: 35219 · Rating: 0
DHG
Joined: 18 Dec 06
Posts: 11
Credit: 8,277
RAC: 0
Message 35223 - Posted: 21 Jan 2007, 19:27:23 UTC - in response to Message 35217.  
Last modified: 21 Jan 2007, 19:41:08 UTC

Don't start on 4x4; it's a total disaster and way too expensive for what it is. It's just a sort of cover for AMD's quad-less state, taken from the server world, and the only good thing about it is that it uses regular DDR2. The bad thing is that the software that is supposed to run on it (including the OS) is not intended for hardware like that. And if the OS decides that it wants to play around and migrate a process from CPU to CPU, then it's even worse.

Further, if HslowT is so bad, what's Intel's grandiose solution for connecting multiple sockets? "Glue logic", perhaps? (I believe it is; I can't be bothered to search much now.) Well, the last time I designed hardware, I still believed that the less logic there is between two points, the lower the latency. I'm speaking about two or more sockets; connecting two dies is less troublesome (Intel's quad).

The truth is that multi-socket solutions for the desktop, whatever brand of CPU they use, have degraded performance (in the current situation; perhaps later there will be better support?). The best possible solution (performance-wise) is everything on one die; of course, that would not really be affordable. The next best thing is multiple sockets with software that's aware of the situation.

Btw, I'm not claiming anywhere here that AMD has the better solution, only stating that HT is not bad (when used correctly). I would've done the same if it were up to me, and Intel will do the same with CSI. What would your nickname for CSI be? Common slow System Interface or something?
ID: 35223 · Rating: 0
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,621,003
RAC: 0
Message 35247 - Posted: 22 Jan 2007, 0:26:20 UTC - in response to Message 35217.  

My V8 just passed RAC 3000 again ...

You *really* need to get that thing over on SETI for a while. There is now a V8 Mac Pro. I would like to see how it does with alexkan's app vs. your V8 Windows box with your custom app.

http://setiathome.berkeley.edu/show_host_detail.php?hostid=3022805
Reno, NV
Team: SETI.USA
ID: 35247 · Rating: 0
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,621,003
RAC: 0
Message 35252 - Posted: 22 Jan 2007, 2:03:21 UTC - in response to Message 35247.  

Also, there is a *128*-way SGI machine that just took 1st place. The processors are pretty slow, but 128 of them together are generating 6k-8k credits per day. Currently its RAC is 4800 and growing. Sure would be nice to take down a 128-way machine.... =;^)
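
For anyone wondering why a machine producing 6k-8k per day shows a RAC of only 4800 "and growing", here is a rough sketch. It assumes RAC behaves as an exponentially decaying average with a one-week half-life and that the machine's daily output stays constant; the 7000/day figure is an assumed midpoint of the quoted range, and `projected_rac` is a hypothetical helper, not BOINC code.

```python
HALF_LIFE_DAYS = 7.0  # assumed RAC half-life (one week)


def projected_rac(current_rac, credit_per_day, days):
    """RAC after `days` of steady output, modelled as an exponential average."""
    weight = 0.5 ** (days / HALF_LIFE_DAYS)  # share still carried by the old RAC
    return current_rac * weight + credit_per_day * (1.0 - weight)


# 4800 is the RAC quoted in the post; 7000/day is an assumed midpoint of "6k-8k".
for days in (0, 7, 14, 28):
    print(f"day {days:2d}: RAC ~ {projected_rac(4800.0, 7000.0, days):.0f}")
```

In this simplified model the RAC closes half of the remaining gap to the daily rate every week, so it keeps climbing toward the sustained daily credit without ever exceeding it.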
Reno, NV
Team: SETI.USA
ID: 35252 · Rating: 0
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 35258 - Posted: 22 Jan 2007, 4:31:41 UTC - in response to Message 35252.  

I am convinced that Rosetta needs me more than SETI. I am sure an 8-core machine with SSE4 will easily take down this SGI ... you just need to wait for the CPU from Intel ;-)

who?
ID: 35258 · Rating: 0
zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,621,003
RAC: 0
Message 35260 - Posted: 22 Jan 2007, 5:35:16 UTC - in response to Message 35258.  

I understand. ...but we also need your optimized cruncher!
Reno, NV
Team: SETI.USA
ID: 35260 · Rating: 0
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 35261 - Posted: 22 Jan 2007, 5:55:44 UTC - in response to Message 35260.  

I am almost done with my work-related optimizations; I'll have time soon for SETI ...

SSE4 is a much bigger toy than SSSE3; I am more interested in SSE4.
The boss of the company said publicly that the 45nm Core 2 is working ...
I got new toys! Can't say anything about them. Exciting!!!!

If I were one of the cruncher programmers, I'd get into the SSE4 documentation. You can expect the compiler to do much better: the instructions were designed for vectorization.

I think Intel will finally see the sun of auto-vectorization :)

who?

ID: 35261 · Rating: 0
Who?
Joined: 2 Apr 06
Posts: 213
Credit: 1,366,981
RAC: 0
Message 35267 - Posted: 22 Jan 2007, 8:13:18 UTC - in response to Message 35223.  

Further, if HslowT is so bad, what's Intel's grandiose solution for connecting multiple sockets? "Glue logic", perhaps? (I believe it is; I can't be bothered to search much now.) Well, the last time I designed hardware, I still believed that the less logic there is between two points, the lower the latency. I'm speaking about two or more sockets; connecting two dies is less troublesome (Intel's quad).


Intel did a little more than "glue": in the S5000 chipset, you have 2 front-side buses and 4-channel memory. This is still a uniform memory architecture, and it outclassed 4x4 on every CPU-intensive test. This chipset plus the prefetchers makes a very well-balanced platform, much more so than NUMA, and it shows in 3D rendering or video compression/decompression + authoring.

The front-side bus is NEVER saturated on a desktop workload, and it lets the cores scale to 8 without a bottleneck. The cache coherency of the quad core is managed by a few snoop filters, and it shows very nice performance. I wish it were as simple as glue, but the dual socket under the sunlight shows impressive performance; more than glue was used ... Look at the scaling of Cinebench on the V8 machine: it is almost perfect per core. Same for POV-Ray, 3ds Max and others. Notice that the V8 machine does not slow down compared to a single quad core! No cross-link business to mess up the performance.
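
One way to check a "scaling is almost perfect per core" claim is to compute parallel efficiency, i.e. speedup divided by core count. The sketch below shows the arithmetic only: the render times are invented placeholders, not Cinebench results from the V8 machine, and `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(single_core_seconds, n_core_seconds, cores):
    """Speedup divided by core count; 1.0 means perfect scaling."""
    if cores < 1 or n_core_seconds <= 0:
        raise ValueError("cores must be >= 1 and times must be positive")
    speedup = single_core_seconds / n_core_seconds
    return speedup / cores


# Hypothetical render times in seconds, keyed by the number of cores used.
timings = {1: 800.0, 4: 205.0, 8: 104.0}

for cores, seconds in timings.items():
    efficiency = parallel_efficiency(timings[1], seconds, cores)
    print(f"{cores} core(s): {seconds:6.1f} s, efficiency {efficiency:.1%}")
```

An efficiency that stays close to 100% as cores are added would support the claim that the shared buses are not a bottleneck for that workload; a value that drops sharply at 8 cores would suggest the opposite.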

The solution with 2 FSBs at 1333 MHz is very elegant; it deserves more respect than "glue logic".

The glue, I used it for the neon lights, not for the architecture :-D

who?
ID: 35267 · Rating: 0