Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,822
RAC: 13,035
Message 107517 - Posted: 21 Oct 2022, 0:14:19 UTC - in response to Message 107516.  
Last modified: 21 Oct 2022, 0:14:45 UTC

I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647

Did you do anything special to combine cards on the project?
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 107518 - Posted: 21 Oct 2022, 1:04:07 UTC - in response to Message 107516.  

I would like to see what the actual usage is in Resource Monitor / Task Manager or the like.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107519 - Posted: 21 Oct 2022, 7:35:28 UTC - in response to Message 107517.  

I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647

Did you do anything special to combine cards on the project?



No...everything is standard.
I don't mess around with stuff like that.
All projects are default.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107520 - Posted: 21 Oct 2022, 7:36:17 UTC - in response to Message 107518.  

I would like to see what the actual usage is in Resource Monitor / Task Manager or the like.



When MOO is up to run again, I will grab a screen shot.
Right now Einstein and Prime are running.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,822
RAC: 13,035
Message 107521 - Posted: 21 Oct 2022, 7:45:23 UTC - in response to Message 107519.  

I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647

Did you do anything special to combine cards on the project?



No...everything is standard.
I don't mess around with stuff like that.
All projects are default.
I just tried Moo! on a computer with a Tahiti and a Fury (both AMD, and not too far apart: the Tahiti is 3 GB and 4 TFLOPS SP, the Fury is 4 GB and 8 TFLOPS SP), but I got tasks for only one AMD card at a time. Maybe it's a CUDA thing. I've asked here:
https://moowrap.net/forum_thread.php?id=647#8359
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107525 - Posted: 21 Oct 2022, 11:08:27 UTC
Last modified: 21 Oct 2022, 11:23:00 UTC



And then FAH GPU


What is interesting, though, is that Windows shows only 20% GPU usage, while MSI Afterburner shows 98%.



A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel.
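The copy-engine point above can be illustrated with a simple two-stage pipeline model. This is a minimal sketch under stated assumptions, not how Windows or the GPU driver actually schedules work: it assumes the job is split into n equal chunks, that each chunk needs copy_ms on the copy engine and then compute_ms on the 3D/compute engine, and that each engine handles one chunk at a time. The function names and timings are hypothetical.

```python
def sequential_time(n: int, copy_ms: float, compute_ms: float) -> float:
    """One engine does everything: each chunk is copied, then computed."""
    return n * (copy_ms + compute_ms)


def overlapped_time(n: int, copy_ms: float, compute_ms: float) -> float:
    """Copy engine and compute engine run in parallel as a two-stage pipeline.

    The first copy and the last compute cannot overlap with anything;
    for the chunks in between, the slower of the two stages sets the pace.
    """
    if n == 0:
        return 0.0
    return copy_ms + compute_ms + (n - 1) * max(copy_ms, compute_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 10 chunks, 2 ms to copy and 8 ms to compute each.
    print(sequential_time(10, 2.0, 8.0))  # 100.0
    print(overlapped_time(10, 2.0, 8.0))  # 82.0
```

With these made-up numbers, overlapping hides the copy time of every chunk except the first; once the compute stage is the slower one, it sets the pace no matter how fast the copies are.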
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 107528 - Posted: 21 Oct 2022, 19:04:05 UTC

Not the output I was expecting from Task Mangler, and it leaves me baffled.
Afterburner looks to be telling it as it is.
kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 163
Message 107529 - Posted: 21 Oct 2022, 19:43:35 UTC - in response to Message 107528.  

Try switching to the Performance tab, scrolling down to the GPU, and clicking the title of one of the graphs.
There should be a CUDA graph.
It doesn't show up in my screenshot because I have hardware-accelerated GPU scheduling enabled.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107530 - Posted: 21 Oct 2022, 20:44:40 UTC - in response to Message 107529.  
Last modified: 21 Oct 2022, 20:50:13 UTC

I did have a look at the time.
They were mirrored just as it says in the processes.
The text below all the images I posted explains it.

A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107531 - Posted: 21 Oct 2022, 20:55:27 UTC
Last modified: 21 Oct 2022, 20:56:43 UTC




So I am showing you the 1080 starting up and then the 1050 running.
As you can see, the Copy graph is active on both.

Again...refer to the text in the previous post.
kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 163
Message 107532 - Posted: 21 Oct 2022, 20:57:42 UTC - in response to Message 107531.  

What other graphs does it support?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107533 - Posted: 21 Oct 2022, 23:17:23 UTC - in response to Message 107532.  

What other graphs does it support?



What do you mean?
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107534 - Posted: 21 Oct 2022, 23:20:30 UTC
Last modified: 21 Oct 2022, 23:21:21 UTC

There are generally two ways to distribute computation across multiple devices:

Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup, that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc.

Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your code, in the following two setups:

On multiple GPUs (typically 2 to 8) installed on a single machine (single host, multi-device training). This is the most common setup for researchers and small-scale industry workflows.
On a cluster of many machines, each hosting one or multiple GPUs (multi-worker distributed training). This is a good setup for large-scale industry workflows, e.g. training high-resolution image classification models on tens of millions of images using 20-100 GPUs.

More at: https://keras.io/guides/distributed_training/
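To make "synchronous data parallelism" from the quoted guide concrete, here is a toy sketch in plain Python. It does not use the tf.distribute API; it assumes a hypothetical one-parameter model (y = w * x) with a mean-squared-error loss, and the "replicas" are just list slices rather than GPUs. Each replica computes a gradient on its own equal-sized shard of the batch, the gradients are averaged, and every replica applies the same update.

```python
from typing import List, Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(data: Sequence[float], replicas: int) -> List[List[float]]:
    """Split a batch into equal contiguous shards, one per replica."""
    if len(data) % replicas != 0:
        raise ValueError("batch size must be divisible by the number of replicas")
    size = len(data) // replicas
    return [list(data[i * size:(i + 1) * size]) for i in range(replicas)]


def sync_data_parallel_step(w: float, xs: Sequence[float], ys: Sequence[float],
                            replicas: int, lr: float) -> float:
    """One synchronous step: every replica starts from the same w, computes a
    gradient on its own shard, the gradients are averaged (the 'merge'), and
    the same update is applied everywhere so the replicas stay in sync."""
    grads = [grad_mse(w, x_shard, y_shard)
             for x_shard, y_shard in zip(shard(xs, replicas), shard(ys, replicas))]
    return w - lr * (sum(grads) / len(grads))


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    single = 0.0 - 0.1 * grad_mse(0.0, xs, ys)  # one device, whole batch
    parallel = sync_data_parallel_step(0.0, xs, ys, replicas=2, lr=0.1)
    print(single, parallel)  # prints: 3.0 3.0
```

With equal-sized shards, the average of the per-shard gradients equals the full-batch gradient, which is why the synchronous setup can keep convergence behaving like single-device training.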
kotenok2000
Joined: 22 Feb 11
Posts: 258
Credit: 483,503
RAC: 163
Message 107535 - Posted: 21 Oct 2022, 23:47:50 UTC - in response to Message 107534.  
Last modified: 21 Oct 2022, 23:48:31 UTC

There should be a graph called CUDA. For some reason, Task Manager doesn't count it on the Processes tab.
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,155
Message 107536 - Posted: 22 Oct 2022, 0:25:47 UTC - in response to Message 107534.  

[snip]

There are generally two ways to distribute computation across multiple devices:

Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup, that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc.

Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

NVIDIA GPUs work best with models that use few branches.

The GPU threads are scheduled in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously on any combination of the threads in the warp.

Therefore, it works best if the work is arranged so that there are few branches that affect some of the threads within a warp, but not all of them.

I've found little information on whether this also happens in AMD GPUs.
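The warp-divergence point can be shown with a toy cost model. This is a simplified sketch, not how the hardware or compiler actually schedules instructions: it assumes an if/else whose two sides cost if_cost and else_cost issue slots, that a warp whose threads all agree runs only one side, and that a warp whose threads disagree runs both sides with the inactive threads masked off. WARP_SIZE is 32, NVIDIA's warp width; for other hardware it is just a parameter. All function names are hypothetical.

```python
from typing import Callable, Sequence

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_cost(lanes: Sequence[int], cond: Callable[[int], bool],
              if_cost: int, else_cost: int) -> int:
    """Issue slots one warp spends on an if/else in this toy model."""
    taken = [cond(v) for v in lanes]
    cost = 0
    if any(taken):        # at least one thread needs the 'if' side
        cost += if_cost
    if not all(taken):    # at least one thread needs the 'else' side
        cost += else_cost
    return cost


def kernel_cost(values: Sequence[int], cond: Callable[[int], bool],
                if_cost: int, else_cost: int, warp_size: int = WARP_SIZE) -> int:
    """Total issue slots summed over all warps (not wall-clock time,
    since real warps also run concurrently)."""
    return sum(warp_cost(values[i:i + warp_size], cond, if_cost, else_cost)
               for i in range(0, len(values), warp_size))


if __name__ == "__main__":
    values = list(range(64))  # two warps' worth of threads
    # Branch lines up with warp boundaries: each warp runs only one side.
    print(kernel_cost(values, lambda v: v < 32, 10, 10))      # 20
    # Same number of 'true' threads, but every warp diverges.
    print(kernel_cost(values, lambda v: v % 2 == 0, 10, 10))  # 40
```

Both cases do the same amount of useful work; in this model the second costs twice as many issue slots only because the branch splits threads inside each warp.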

Dividing the work between two or more GPUs should work if the portion of the work on one GPU does not need to exchange more than a rather small amount of information with any other GPU. It also requires the application to divide the work properly, which means the division must be written into the application rather than expected to happen automatically.
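As an illustration of work that splits cleanly with almost no communication, here is a hypothetical sketch of dividing a range of independent work items (for example, keys to test) between devices in proportion to their relative speed. The device names and weights are assumptions for illustration, and nothing here claims this is how Moo! or distributed.net actually splits its work. Each device only needs its own start and end, so the GPUs exchange nothing until their results are combined.

```python
from typing import Dict, List, Tuple


def split_range(start: int, end: int,
                weights: Dict[str, float]) -> List[Tuple[str, int, int]]:
    """Split [start, end) into contiguous, non-overlapping sub-ranges, one per
    device, sized roughly in proportion to each device's weight (e.g. its
    relative throughput). The last device absorbs any rounding remainder."""
    total_w = sum(weights.values())
    if weights and total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    total_n = end - start
    names = list(weights)
    out = []
    cursor = start
    for i, name in enumerate(names):
        if i == len(names) - 1:
            stop = end
        else:
            stop = cursor + int(total_n * weights[name] / total_w)
        out.append((name, cursor, stop))
        cursor = stop
    return out


if __name__ == "__main__":
    # Hypothetical: one card roughly three times as fast as the other.
    for name, lo, hi in split_range(0, 100, {"gpu0": 3.0, "gpu1": 1.0}):
        print(name, lo, hi)  # gpu0 0 75, then gpu1 75 100
```

Giving the faster device a proportionally larger slice is meant to make both finish at about the same time; with equal slices the slower card would become the bottleneck.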

Multithreaded CPU applications can use multiple CPU cores at once if the application is written to allow this. The main restriction is that the virtual cores (hardware threads) within a physical core share that core's execution units, so they compete for them rather than each getting a full core's worth of throughput. However, main memory is slow enough that while one virtual core is stalled waiting on a main-memory access, another virtual core on the same physical core can keep executing if its inputs are already in cache, without making the first one wait any longer.
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,822
RAC: 13,035
Message 107537 - Posted: 22 Oct 2022, 2:22:46 UTC - in response to Message 107536.  

The GPU threads are scheduled in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously on any combination of the threads in the warp.

Therefore, it works best if the work is arranged so that there are few branches that affect some of the threads within a warp, but not all of them.

I've found little information on whether this also happens in AMD GPUs.
Are warps equivalent to wavefronts?
https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505
robertmiles
Joined: 16 Jun 08
Posts: 1232
Credit: 14,269,631
RAC: 3,155
Message 107538 - Posted: 22 Oct 2022, 2:35:10 UTC - in response to Message 107537.  

I've found little information on whether this also happens in AMD GPUs.
Are warps equivalent to wavefronts?
https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505

I read that, and found it unclear whether they are or not.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107539 - Posted: 22 Oct 2022, 8:27:22 UTC

So what's happening, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to get the work done more efficiently?
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,716,822
RAC: 13,035
Message 107541 - Posted: 22 Oct 2022, 9:11:34 UTC - in response to Message 107539.  

So what's happening, in plain terms, is that the MOO task is huge and complex and splits itself among the two GPUs and the CPUs to get the work done more efficiently?
Sounds like it. Might as well use both for one task if they can; then the task is completed in half the time.

I'm wondering about the efficiency, though. I get one done in 11 minutes on a single Radeon Fury (8 TFLOPS SP), and it uses NO CPU at all. You're taking 9 minutes (so 1.2 times faster) on two cards totalling 1.5 times the power, and using two CPU cores as well.
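The comparison in the paragraph above can be written out explicitly. This uses only the figures quoted in this post (11 minutes on the Fury, 9 minutes on the two NVIDIA cards, and the rough estimate that the two cards have 1.5 times the Fury's SP throughput); the function names are just for illustration.

```python
def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the new setup finishes one task."""
    return old_minutes / new_minutes


def relative_efficiency(old_minutes: float, new_minutes: float,
                        compute_ratio: float) -> float:
    """Speedup divided by how much more raw compute the new setup has.
    1.0 would mean the extra compute is used as well as the baseline uses its own."""
    return speedup(old_minutes, new_minutes) / compute_ratio


if __name__ == "__main__":
    print(round(speedup(11, 9), 2))                   # 1.22
    print(round(relative_efficiency(11, 9, 1.5), 2))  # 0.81
```

By these numbers, each unit of SP throughput on the two-card setup delivers only about 0.81 times what it does on the Fury, and that is before counting the two CPU cores.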

Maybe NVIDIA are just rubbish. It could be their total lack of DP support, so yours is having to use the CPU for those bits.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 107542 - Posted: 22 Oct 2022, 13:34:07 UTC - in response to Message 107541.  

It could be their total lack of DP support, so yours is having to use the CPU for those bits.

All GeForce 10X0 GPUs have DP, and Moo! doesn't need it at all. But I've been testing a GTX 275 since yesterday and I see the same issue with Moo! there: 100% of a CPU core to feed it, while the HD3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing.

BTW, use GPU-Z to check GPU usage, not Task Manager; that's obviously useless for this. The GPU should be near 100% when Moo! is running.
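For NVIDIA cards, a scriptable alternative to GPU-Z is nvidia-smi, the command-line tool installed with the NVIDIA driver. The sketch below assumes nvidia-smi is on the PATH and uses its --query-gpu and --format options; utilization.gpu is the driver's percentage of the recent sample period during which at least one kernel was running. It does not cover AMD cards such as the HD3850, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_utilization(csv_text: str) -> List[Tuple[int, str, Optional[int]]]:
    """Parse 'index, name, utilization' lines (csv, noheader, nounits).
    A non-numeric value such as '[N/A]' becomes None."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, util = rest.rsplit(",", 1)  # GPU name sits between the commas
        value = util.strip()
        rows.append((int(index), name.strip(), int(value) if value.isdigit() else None))
    return rows


def query_nvidia_utilization() -> List[Tuple[int, str, Optional[int]]]:
    """Run nvidia-smi once and return (index, name, GPU utilization %) per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        for index, name, util in query_nvidia_utilization():
            print(f"GPU {index} ({name}): {util}%")
    else:
        print("nvidia-smi not found")
```

Running it every few seconds while Moo! is crunching should show whether each card really sits near 100%.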



©2024 University of Washington
https://www.bakerlab.org