what about using CUDA for calculations?

2di

Send message
Joined: 6 May 08
Posts: 8
Credit: 51,990
RAC: 0
Message 52901 - Posted: 7 May 2008, 18:28:35 UTC

Hi guys.
I am pretty new to BOINC projects, but I think it's very cool.

Recently I discovered a new toy: CUDA.
CUDA is a technology that lets you run C-like code on the GPU. It is an Nvidia technology and works only on GeForce 8-series and later cards.

The GPU is ridiculously faster than a normal CPU, and it doesn't have an operating system to take care of.
My processor is an Intel Core 2 (1.8 GHz, overclocked to 3.1 GHz), and I have a pretty cheap graphics card, an 8500 GT (600 MHz). I ran a few tests on floating-point calculations and found that my GPU is twice as fast as the CPU.
You can easily integrate CUDA into C++, using GPU function calls to do the calculations and return the results.
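To make that concrete, here is a minimal sketch of the pattern being described (hypothetical names and toy math, added for illustration; this is not Rosetta or BOINC code): copy an array to the card, run a kernel over it in parallel, and copy the result back.

// Hypothetical sketch, not Rosetta or BOINC code: square an array of floats
// on the GPU. This is the basic CUDA pattern: copy data to the card, launch
// a kernel, copy the results back.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void square_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n)
        out[i] = in[i] * in[i];
}

int main()
{
    const int n = 1 << 20;
    float* h_in  = new float[n];
    float* h_out = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    square_kernel<<<blocks, threads>>>(d_in, d_out, n);  // runs on the GPU
    cudaDeviceSynchronize();

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[3] = %f\n", h_out[3]);

    cudaFree(d_in);
    cudaFree(d_out);
    delete[] h_in;
    delete[] h_out;
    return 0;
}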

So my idea is to create a GPU version for BOINC, so people using an Nvidia card can get triple the performance. I don't think it would be that difficult.
I believe ATI is looking into supporting CUDA as well, but that is just a rumor...

Well, what do you think?
ID: 52901
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52905 - Posted: 8 May 2008, 0:58:23 UTC
Last modified: 8 May 2008, 1:03:43 UTC

Ok, I'll bite. In a single word, "no".

My understanding is that Folding@Home already supports various ATI GPUs, and is *supposed* to be considering support for Nvidia GPUs.

You may find that the following threads contain somewhat relevant information as to why Rosetta, depending upon your point of view, either will not or cannot consider using GPUs / gaming consoles / CPU optimizations:


cpu optimization


does GPU upgrade help for R@H?


Sony Playstation 3 crunching data


Boinc/Rosetta on the Xbox 360?


Rosetta & Parallelization (gaming consoles)
ID: 52905
2di

Send message
Joined: 6 May 08
Posts: 8
Credit: 51,990
RAC: 0
Message 52909 - Posted: 8 May 2008, 12:15:32 UTC

Well, I looked through those topics.

CUDA is very different from any console; it doesn't require negotiations with Microsoft or Sony. Implementing BOINC on any console is a complex and expensive process: they use a different architecture from the PC, and you need their dev kits, toolchains and "stuff" like that.

It is also different from the GPU approaches mentioned in the forum.
You don't need to draw anything on the screen to see results; it works almost the same way as C, and you don't need complicated tricks to capture the result.


I have only very basic knowledge of ATI's programming support, but I know it requires some specialized GPU knowledge and isn't so simple. In CUDA, all you need to do is replace the old functions with GPU function calls, and that's about it. I'm not sure about the license, but it can't be worse than the Xbox 360 or PS3 licenses.

Also, to the best of my knowledge, all the heavy searching/scoring algorithms that operate on the data can be built from simple math operations (+, -, *, /, sqrt, sin, cos, ...), so that shouldn't be a problem for Rosetta.
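As a toy illustration of that claim (a made-up "energy" function, added for illustration and nothing like Rosetta's real scoring code), the same routine built only from basic arithmetic plus sqrtf/sinf can be compiled for both the CPU and the GPU simply by marking it __host__ __device__:

// Toy sketch, not Rosetta's real scoring code: the same math routine,
// built only from basic arithmetic plus sqrtf/sinf, compiled for both
// the CPU and the GPU.
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>

__host__ __device__ float pair_energy(float dx, float dy, float dz)
{
    float r = sqrtf(dx * dx + dy * dy + dz * dz);  // distance between two points
    return sinf(r) / (r + 1.0f);                   // made-up "energy" term
}

__global__ void energy_kernel(const float3* d, float* e, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        e[i] = pair_energy(d[i].x, d[i].y, d[i].z);  // same function, running on the GPU
}

int main()
{
    // On the CPU it is an ordinary function call:
    printf("cpu: %f\n", pair_energy(1.0f, 2.0f, 3.0f));

    // On the GPU the same function is called from inside a kernel:
    const int n = 1024;
    float3* d_pos;
    float*  d_e;
    cudaMalloc(&d_pos, n * sizeof(float3));
    cudaMalloc(&d_e,   n * sizeof(float));
    cudaMemset(d_pos, 0, n * sizeof(float3));
    energy_kernel<<<(n + 255) / 256, 256>>>(d_pos, d_e, n);
    cudaDeviceSynchronize();
    cudaFree(d_pos);
    cudaFree(d_e);
    return 0;
}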

Well, that's just my opinion.
or something...

ID: 52909
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52924 - Posted: 9 May 2008, 2:01:15 UTC
Last modified: 9 May 2008, 2:03:39 UTC

You make some valid points. Unfortunately, I have been trying for a very loooooong time to get the Project Staff just to "discuss" these topics, and they have refused.

So, if the project won't even take the time to discuss it, you've got about a snowball's chance in hell of them actually implementing it.

Sorry if I've rained on your parade.

But if you want a project that is receptive to the ideas you're suggesting, consider Folding@Home and read their FAQs, especially the one on the second-generation GPU client. While geared primarily to ATI, I think you'd find some people there to have a conversation with about Nvidia.

Good luck, whatever you decide!
ID: 52924
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 52929 - Posted: 9 May 2008, 5:17:37 UTC - in response to Message 52924.  

You make some valid points. Unfortunately, I have been trying for a very loooooong time to get the Project Staff just to "discuss" these topics, and they have refused.

So, if the project won't even take the time to discuss it, you've got about a snowball's chance in hell of them actually implementing it.

Sorry if I've rained on your parade.

But if you want a project that is receptive to the ideas you're suggesting, consider Folding@Home and read their FAQs, especially the one on the second-generation GPU client. While geared primarily to ATI, I think you'd find some people there to have a conversation with about Nvidia.

Good luck, whatever you decide!


The problem we have with Rosetta is that the different steps in structure prediction and design involve a succession of very different algorithms. In the Folding@home case, the molecular dynamics core is always the same, and optimizing it can give a big improvement, whereas to get a big speedup in Rosetta we would have to port many different parts of the code simultaneously, and we just haven't had the resources to do this yet.

ID: 52929
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 9,222
Message 52931 - Posted: 9 May 2008, 8:07:45 UTC

If it were just a compiler switch to throw, I'm sure they'd do it in a second ;)
ID: 52931
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52935 - Posted: 9 May 2008, 11:47:02 UTC
Last modified: 9 May 2008, 12:09:57 UTC

The problem we have with Rosetta is that the different steps in structure prediction and design involve a succession of very different algorithms...

whereas to get a big speedup in Rosetta we would have to port many different parts of the code simultaneously, and we just haven't had the resources to do this yet



Thank you DB for jumping in!!!


If I may, please, would you take a moment to clarify these two points:

1. we have been discussing this idea with Microsoft quite a bit over the past several weeks; I will keep everybody posted

and

Tony and Microsoft have been incredibly supportive of our efforts so far, and he is going to help us try to make this become a reality in the not too distant future.

I am sorry; I may have missed the follow-up. Can we assume, at this point, that there are no more discussions? And briefly, why didn't this work out? The architecture of the Xbox, or a lack of real commitment from MS?



2. Again, please forgive my ignorance in advance, but I was under the impression that Rosie had already "port[ed] many different parts of the code simultaneously"

San Diego Supercomputer Center helps speed high-tech drug design

"When David Baker, who also serves as a principal investigator for Howard Hughes Medical Institute, originally developed the code, it had to be run in serial - broken into manageable amounts of data, with each portion calculated in series, one after another.

Through a research collaboration, SDSC's expertise and supercomputing resources helped modify the Rosetta code to run in parallel on SDSC's massive supercomputers, dramatically speeding processing, and providing a testing ground for running the code on the world's fastest non-classified computer.

The groundbreaking demonstration, part of the biennial Critical Assessment of Structure Prediction (CASP) competition, used UW professor David Baker's Rosetta Code and ran on more than 40,000 central processing units (CPUs) of IBM's Blue Gene Watson Supercomputer, using the experience gained on the Blue Gene Data system installed at SDSC."



I really love Rosie, otherwise I wouldn't be here, and I'm really not trying to be a pain in the @$$, but there seems to have been either silence or some inconsistent messages and information from the Project about why certain things can't be done.

Can you please put to rest the issues of MS and the xBox,

and

the previous parallelization of Rosie's code and why it can't be applied to gaming consoles and gpu's

(given that "Khanna says that his gravity grid has been up and running for a little over a month now and that, crudely speaking, his eight consoles are equal to about 200 of the supercomputing nodes he used to rely on." Astrophysicist Replaces Supercomputer with Eight PlayStation 3s)

?

Sincere Thanks !!!
ID: 52935
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 9,222
Message 52938 - Posted: 9 May 2008, 12:25:04 UTC - in response to Message 52935.  

If I may, please, would you take a moment to clarify these two points:

1. we have been discussing this idea with Microsoft quite a bit over the past several weeks; I will keep everybody posted

and

Tony and Microsoft have been incredibly supportive of our efforts so far, and he is going to help us try to make this become a reality in the not too distant future.

I am sorry; I may have missed the follow-up. Can we assume, at this point, that there are no more discussions? And briefly, why didn't this work out? The architecture of the Xbox, or a lack of real commitment from MS?



I think it's a safe bet that the thermal problems (RROD) that the initial Xbox 360s suffered from put an end to any chance of running something that would max out parts of the hardware. The newer (65 nm and better-cooled) Xboxes would probably be fine, but differentiating between them would be a PR nightmare for MS.

ID: 52938
2di

Send message
Joined: 6 May 08
Posts: 8
Credit: 51,990
RAC: 0
Message 52939 - Posted: 9 May 2008, 12:35:49 UTC
Last modified: 9 May 2008, 12:42:28 UTC

I totally agree with The_Bad_Penguin.
It would be cool if you could explain the issues with parallel programming and GPUs.

I don't want to push anyone to do anything (even if I could ;) ). All I am saying is that there are new, simple ways of programming GPUs, and if this technology can be used to accelerate the project, it's worth considering.
:)


Yeah, Xboxes like to die, and temperature could be an issue;
sure, BOINC could use 70-80% of the CPU, but it is still going to reduce its life cycle.
ID: 52939
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 52940 - Posted: 9 May 2008, 13:38:47 UTC

Given Dr. Baker's other post, it was more the low memory of the SPEs than the redesign to take full advantage of the multiple cores that was the roadblock. It is hard to fit 500k (and growing) lines of code into 256 KB (not growing until a new generation of Cell, perhaps) of memory per SPE. That doesn't leave you any room for the protein you're working on.

And it is important to note that running on a Cell is not a "port" so much as a "redesign". A port is typically, for the most part, a recompile with a different CPU target. But a redesign would be required for any application to efficiently utilize the 8 SPEs.
Rosetta Moderator: Mod.Sense
ID: 52940
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 9,222
Message 52941 - Posted: 9 May 2008, 14:18:56 UTC - in response to Message 52939.  

Yeah, Xboxes like to die, and temperature could be an issue;
sure, BOINC could use 70-80% of the CPU, but it is still going to reduce its life cycle.

It very easily could, but there is no way MS would contemplate it if it is going to increase the chance of more RRODs; from a PR perspective, both the reduced throughput and confirming that the console has a thermal design flaw would be damaging.

It's very unfortunate, but there's plenty of untapped compute power out there, and hopefully minirosetta will make it easier to port/recompile for it. However, I'm sure they've got plenty on their plate with CASP8 and Foldit for the immediate future...
ID: 52941
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52942 - Posted: 9 May 2008, 14:32:49 UTC
Last modified: 9 May 2008, 14:36:53 UTC

Not a bad discussion we're having here!


@dcdc: points well taken. The next logical question, then: is R@H able to contact Sony? The PS3 has already proven itself capable of being a 24/7 cruncher, and there are no issues with differentiating "newer" from "older" models.

I do understand that UW/Baker Lab is receiving $$$ from the Gates Foundation, which is in theory separate from MS. Would this research funding be in jeopardy if R@H were to initiate contact with Sony? If the answer is "yes", then the answer is "yes"; no need to explain or sugar-coat, just say so.


@Mod.Sense: Agreed, and R@H is not the only one that has such concerns. Nevertheless, Dr. Frank Mueller noted that the biggest limitation in its current state is the 512 MB RAM constraint, but he did hint that he might try retrofitting additional memory if future tasks made it necessary.

Maybe this possibility of hardware hacking, retrofitting additional RAM, can be commented on by posters who are more technically savvy than I am. Is it as simple as desoldering the installed RAM chips and soldering new ones onto the PS3 motherboard?

I also agree with your comment about redesign as opposed to port.

The point I was attempting to make was that others had previously commented on the extreme difficulty of "redesigning" Rosetta's sequential code into parallel code, when in fact it had already been proven, with a finished product, that it could be done.

Not being a programmer, I would "assume" that the hardest part of getting Rosie to run on a gaming console's or GPU's parallel processors would be the initial change in structure from sequential to parallel, and that getting it onto a specific architecture (PS3, Xbox, Nvidia, ATI), while not "easy", would be "easier" than the part which had already been done (conversion to parallel code).
ID: 52942
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 52945 - Posted: 9 May 2008, 15:31:25 UTC

I found a pretty comprehensive description of the architecture here:
http://www.blachford.info/computer/Cell/Cell1_v2.html
You see, the 256 KB that each SPE has in its "local store" is actually integrated right onto the chip (illustrated at the top of the page). That's part of why they are so fast... and also part of why the available memory on each SPE is so small.

There is some discussion of the programming efforts required if you follow the link at the bottom for "Part 3 Programming the Cell".

So, in short, the Cell is a good step in the right direction. And it would seem likely that a next generation Cell might be appropriate. And, Rosetta Mini (which uses less memory and has more streamlined code that should be easier to make changes to) is a step in the right direction. And perhaps a next generation Mini might be a good fit for a next generation Cell.
Rosetta Moderator: Mod.Sense
ID: 52945
Profile dcdc

Send message
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 9,222
Message 52947 - Posted: 9 May 2008, 15:58:22 UTC - in response to Message 52942.  

@dcdc: points well taken. The next logical question, then: is R@H able to contact Sony? The PS3 has already proven itself capable of being a 24/7 cruncher, and there are no issues with differentiating "newer" from "older" models.

I don't think the PS3 is as straightforward to port to as the Xbox because:
it only has one CPU core
the Cell's SPEs are limited to 256 KB of local store each
there is only 256 MB of RAM available to the CPU

However, the creation of minirosetta and the recent supercomputer work may allow an SMP version (where all CPUs work on the same protein rather than one thread per CPU)?

Maybe this possibility of hardware hacking, retrofitting additional RAM, can be commented on by posters who are more technically savvy than I am. Is it as simple as desoldering the installed RAM chips and soldering new ones onto the PS3 motherboard?
I don't know about the PS3, but the original Xbox had solder points for additional RAM (for possible future expansion, or maybe two sets of lower-density chips?)

ID: 52947
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52948 - Posted: 9 May 2008, 16:18:22 UTC
Last modified: 9 May 2008, 16:24:18 UTC

Given the following, it may be a while before the Cell BE gets an increase in cache/RAM, and maybe a mini-miniRosetta will arrive first, lol.

What does mini-Rosetta need in terms of cache and RAM, for the PPE and for each of the SPEs?

So, while there is room for improvements (increased speed, more cache/RAM), compatibility concerns might nix them in the near term:

Speaking of Cell and sales, the presentation suggests that, despite IBM's promise that Cell could see widespread adoption outside of the console realm, Sony is still far and away IBM's main customer of Cell. Specifically, IBM states the following in the paper digest: "To guarantee the proper operation of existing gaming software, the exact cycle-by-cycle machine behavior, including operating frequency, must be preserved."

In other words, IBM's Cell shrink was made with Sony in mind; the chipmaker didn't take advantage of the shrink to make any performance-enhancing tweaks, opting instead to preserve the exact performance characteristics of the 65nm version, which itself preserved the performance characteristics of the 90nm version.

IBM shrinks Cell to 45nm. Cheaper PS3s will follow



Soooooooooooooooo, I guess getting back to the OP's original question, what are the difficulties with GPUs?

Some of the newer ones are coming to market with 1 GB of RAM, so do GPUs solve one problem but create others?
ID: 52948
2di

Send message
Joined: 6 May 08
Posts: 8
Credit: 51,990
RAC: 0
Message 52949 - Posted: 9 May 2008, 16:42:04 UTC

Let's solve problems as we encounter them.

WE WANT: to make Rosetta faster.
WE KNOW:
1) by using the 360/PS3/GPU we can get extra speed
2) one way or another it's possible to run the software on each platform (not necessarily easily)

WE NEED:
1) for the 360/PS3 we need permission to run Rosetta

PROBLEM:
1) there is no problem with anything. If for some reason the administration of the project cannot or doesn't want to use some of these platforms, that's fine.

This is my own opinion:
(i) A port for the 360 would be relatively easy: the 3-core IBM CPU can handle 3 threads easily, and it has a similar architecture to a PC. (ii) The PS3 would be harder; the Cell uses parallel processing, but that doesn't mean it cannot run a few separate threads, as my PC is doing at the moment. (iii) GPU porting shouldn't be so hard: all Nvidia GPUs of one generation use the same architecture, so the code will be compatible, and it also supports multithreading.
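For what it's worth, a host program can also check at run time which CUDA-capable cards are present and what compute capability they report, which is roughly what determines whether a given compiled kernel will run on them. A small sketch using standard CUDA runtime calls (the program itself is hypothetical):

// Hypothetical device-query program using standard CUDA runtime calls:
// list the CUDA-capable GPUs in the machine and their compute capability.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d: %s, compute capability %d.%d, %d multiprocessors\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
    }
    return 0;
}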

I don't think there is any point in discussing consoles; in the best-case scenario it's going to take a lifetime to sort out all the licenses with MS or Sony. From my point of view, GPUs are the closest possible solution.
ID: 52949
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52951 - Posted: 9 May 2008, 18:13:59 UTC - in response to Message 52949.  
Last modified: 9 May 2008, 19:08:27 UTC

While I appreciate your eagerness, I feel it necessary to correct one of your assumptions:

While it is true that MS developed the Xbox as a "closed" system, and you would need "permission" to run other software on it...

there is no "permission" required from Sony; the PS3 was developed as an "open" system onto which you can readily install Linux, and any other software that will run under the Linux OS (such as the BOINC middleware, and DC projects such as PS3GRID and yoyo@home's OGR wrapper).

So... as I understand it:

1. The Xbox is not a good candidate because MS has already spent $1B fixing a design flaw that caused RRODs, and even if it were fixed at present, it would be difficult for the software to identify whether an Xbox was "old" or "new" before deciding whether to run, so MS would very likely NOT donate any of its resources to help get DC crunching code onto it for any project.

2. The PS3, for the purposes of Rosie's needs, appears at present to be limited by the 256 KB local store and/or the 256 MB of RAM. If either Rosie's application were to become smaller, or Sony were to increase the cache and/or RAM, then perhaps the PS3 could be capable of crunching for Rosie; and Sony already HAS donated its resources to help get DC crunching code running on the PS3.

3. I will need to re-read some threads and articles about GPUs, and the problems specifically related to Nvidia (there are reasons why, up until now, F@H has only used ATI and not Nvidia), before I comment further, other than noting what F@H has already stated:


The GPU client is still the fastest, but it is the least flexible and can only run a very, very limited set of WUs. Thus, its points are not linearly proportional to the speed increase. The PS3 takes the middle ground between GPUs (extreme speed, but at limited types of WU's) and CPU's (less speed, but more flexibility in types of WUs).



So, GPUs may be very fast, but there are "very, very limited" instances where a DC project may be able to take advantage of that speed, and it may just be that the very nature of the Project does not allow it to have the "very, very limited" types of WUs which can benefit.
ID: 52951
2di

Send message
Joined: 6 May 08
Posts: 8
Credit: 51,990
RAC: 0
Message 52953 - Posted: 9 May 2008, 19:18:13 UTC

Yeah, you're right about the PS3; I knew about Linux but completely forgot about it, my bad.

I just have 2 questions ;)
Does anyone know why there is a strict limitation on the size of the cache? Honestly, I have never come across programs which require a specific amount of cache. Is it something to do with checking the IDLE state of the processor, or is it just a performance issue? As far as I know, the CPU cache is used to store frequently accessed data; it is much faster than RAM, but it isn't critical for an application to run. I mean, it's not going to prevent a program from being executed, is it?

And a second question ;): what is "very, very limited"?
Sorry guys, but I still can't understand it: you can do all the mathematical operations on the GPU, and access RAM and VRAM, so what else do you need to predict a protein?
ID: 52953
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 52955 - Posted: 9 May 2008, 20:28:08 UTC - in response to Message 52953.  
Last modified: 9 May 2008, 20:30:59 UTC

...what else do you need to predict a protein?


Robust compilers for the programming language(s) of your application, and enough spare development hours to devote to the effort.

As to cache, the article I linked earlier discusses how processing stops when the data being accessed is not in the cache and the system has to go out to memory to get it, and how this can impair your CPU by 80%. So, given that example, if you want more than half of your processor time doing useful work, then you need sufficient cache for the application's data.
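A back-of-the-envelope illustration of that trade-off, with made-up numbers rather than measured Rosetta or Cell figures: if an access that misses the fast local memory costs, say, 50 times as much as one that hits, even a small miss rate eats most of the processor's time.

// Back-of-envelope sketch with assumed numbers (host-only code):
// average memory-access cost as a function of the miss rate.
#include <cstdio>

int main()
{
    const double hit_cost  = 1.0;   // cycles per access that hits the cache (assumed)
    const double miss_cost = 50.0;  // cycles per access that misses (assumed)

    for (double miss_rate = 0.0; miss_rate <= 0.101; miss_rate += 0.02) {
        double avg = (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
        printf("miss rate %4.1f%% -> average access cost %5.2f cycles (%3.0f%% of peak speed)\n",
               100.0 * miss_rate, avg, 100.0 * hit_cost / avg);
    }
    return 0;
}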
Rosetta Moderator: Mod.Sense
ID: 52955