Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,822 RAC: 13,035 |
I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647

Did you do anything special to combine cards on the project?
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
I would like to see what the actual usage is in Resource Monitor / Task Manager or the like.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I've asked here if AMDs can do so: https://moowrap.net/forum_thread.php?id=647

No... everything is standard. I don't mess around with stuff like that. All projects are default.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I would like to see what the actual usage is in Resource Monitor / Task Manager or the like.

When Moo! is up to run again, I will grab a screenshot. Right now Einstein and Prime are running.
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,822 RAC: 13,035 |
I just tried Moo! on a computer with a Tahiti and a Fury (both AMD, not too far apart: the Tahiti is 3 GB / 4 Tflops SP, the Fury 4 GB / 8 Tflops SP). But I got tasks for one AMD at a time; maybe it's a CUDA thing. I've asked here: https://moowrap.net/forum_thread.php?id=647#8359
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
And then the FAH GPU task. What is interesting is that Windows shows only 20% usage of the GPU, but if you look at MSI Afterburner it shows 98%.

A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel.
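The overlap described in that help text (a copy engine transferring data while a compute engine works) can be sketched with ordinary threads. A loose analogy in Python, not real GPU code; the buffer names and sleep timings are invented for illustration:

```python
# Illustration only: mimics a copy engine and a compute engine
# working in parallel, as the "GPU engine" text above describes.
import threading
import queue
import time

buffers = queue.Queue()

def copy_engine():
    # Simulates the copy engine: transfers data while compute runs.
    for i in range(4):
        time.sleep(0.1)          # pretend DMA transfer
        buffers.put(f"buffer-{i}")
    buffers.put(None)            # signal: no more data

def compute_engine():
    # Simulates the 3D/compute engine: processes buffers as they arrive.
    while (buf := buffers.get()) is not None:
        time.sleep(0.2)          # pretend kernel execution
        print(f"processed {buf}")

t = threading.Thread(target=copy_engine)
t.start()
compute_engine()                 # copies and compute overlap in time
t.join()
```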
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Not the output I was expecting from task mangler, and it leaves me baffled. Afterburner looks to be telling it as it is.
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
Try switching to the Performance tab, scroll down to GPU, and click on the title of one of the graphs. There should be a CUDA graph. It doesn't show up in my screenshot because I have hardware-accelerated GPU scheduling enabled.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I did have a look at the time. They were mirrored, just as it says in the processes. The text below all the images I posted explains it:

A GPU engine represents an independent unit of silicon on the GPU that can be scheduled and can operate in parallel with one another. For example, a copy engine may be used to transfer data around while a 3D engine is used for 3D rendering. While the 3D engine can also be used to move data around, simple data transfers can be offloaded to the copy engine, allowing the 3D engine to work on more complex tasks, improving overall performance. In this case both the copy engine and the 3D engine would operate in parallel.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
So I am showing you the 1080 starting up and then the 1050 running. As you can see, the copy box is active on both. Again... refer to the text in the previous post.
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
What other graphs does it support? |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What other graphs does it support?

What do you mean?
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
There are generally two ways to distribute computation across multiple devices:

- Data parallelism, where a single model gets replicated on multiple devices or multiple machines. Each of them processes different batches of data, then they merge their results. There exist many variants of this setup that differ in how the different model replicas merge results, in whether they stay in sync at every batch or whether they are more loosely coupled, etc.

- Model parallelism, where different parts of a single model run on different devices, processing a single batch of data together. This works best with models that have a naturally-parallel architecture, such as models that feature multiple branches.

This guide focuses on data parallelism, in particular synchronous data parallelism, where the different replicas of the model stay in sync after each batch they process. Synchronicity keeps the model convergence behavior identical to what you would see for single-device training.

Specifically, this guide teaches you how to use the tf.distribute API to train Keras models on multiple GPUs, with minimal changes to your code, in the following two setups:

- On multiple GPUs (typically 2 to 8) installed on a single machine (single host, multi-device training). This is the most common setup for researchers and small-scale industry workflows.

- On a cluster of many machines, each hosting one or multiple GPUs (multi-worker distributed training). This is a good setup for large-scale industry workflows, e.g. training high-resolution image classification models on tens of millions of images using 20-100 GPUs.

More at: https://keras.io/guides/distributed_training/
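As a concrete illustration of the single-host, multi-device setup that guide describes, here is a minimal tf.distribute.MirroredStrategy sketch. The tiny model and random data are placeholders, not anything a BOINC project actually runs:

```python
# Minimal single-host multi-GPU sketch with tf.distribute.MirroredStrategy.
# The model and data are synthetic placeholders for illustration.
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU
# (falls back to CPU if none are present).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Everything that creates variables goes inside the scope.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each batch is split across the replicas; gradients are merged after
# every batch, keeping the replicas in sync (synchronous data parallelism).
x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, batch_size=256, epochs=2)
```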
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
There should be a graph called CUDA. For some reason Task Manager doesn't count it on the Processes tab.
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
[snip] There are generally two ways to distribute computation across multiple devices:

NVIDIA GPUs work best with models that use few branches. The GPU cores are in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp. Therefore, it works best if the work is arranged so that there are few branches that affect some of the cores within a warp, but not all of them. I've found little information on whether this also happens in AMD GPUs.

Dividing the work between two or more GPUs should work if the portion of the work on one GPU does not need to exchange more than a rather small amount of information with any other GPU, provided that the application can divide the work properly, which means that the division must be written into the application rather than expecting it to happen automatically.

Multithreaded CPU applications can use multiple CPU cores at once if the application is written to allow this. The main restriction on these is that no two virtual cores within a physical core can execute an instruction at the same time. However, main memory speed is usually such that any virtual core waiting on a main memory access will not have to wait any longer if another virtual core for that physical core can get its inputs from a cache instead of main memory.
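The cost of a branch that splits a warp comes from the hardware effectively running both sides for every lane and masking out the unwanted results (predication). A loose NumPy analogy of that idea, with an arbitrary threshold and arbitrary functions, not GPU code:

```python
# Predication sketch: how SIMT hardware avoids divergent branches.
# Both sides of the "branch" are evaluated for every element, then a
# mask selects the result, so all lanes execute the same instructions.
import numpy as np

x = np.random.rand(1_000_000).astype("float32")

# Branchy, per-element version (what a divergent warp would pay for):
def branchy(v):
    return np.array([np.sin(e) if e > 0.5 else np.cos(e) for e in v])

# Branch-free, predicated version: compute both, select with a mask.
def predicated(v):
    return np.where(v > 0.5, np.sin(v), np.cos(v))

# Same answer either way; only the execution pattern differs.
assert np.allclose(branchy(x[:1000]), predicated(x[:1000]))
```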
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,822 RAC: 13,035 |
The GPU cores are in groups of 32, called warps. Each warp can execute only one instruction at a time, but it can execute it simultaneously in any combination of the GPU cores in the warp.

Are warps equivalent to wavefronts? https://community.amd.com/t5/archives-discussions/stream-processor-quot-wavefront-quot-term-definition/td-p/81505
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
I've found little information on whether this also happens in AMD GPUs.

Are warps equivalent to wavefronts?

I read that, and found it confusing; I'm still not sure whether they are or not.
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
So what's happening then, in plain terms, is that the Moo! task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,716,822 RAC: 13,035 |
So what's happening then, in plain terms, is that the Moo! task is huge and complex and splits itself among the two GPUs and the CPUs to be more efficient in getting the work done?

Sounds like it. Might as well use both for one task if they can; then the task is completed in half the time. I'm wondering about the efficiency, though. I get one done in 11 minutes on a single Radeon Fury (8 Tflops SP), and it uses NO CPU at all. You're taking 9 minutes (so 1.2 times faster) on two cards totalling 1.5 times the power, and using two CPU cores as well. Maybe NVidia are just rubbish. It could be their total lack of DP support, so yours is having to use the CPU for those bits.
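For what it's worth, that comparison works out to roughly 81% scaling efficiency; a quick check using only the figures quoted in the posts above:

```python
# Quick scaling check using the figures quoted above.
single_time = 11.0    # minutes per Moo! task on one Fury (8 Tflops SP)
pair_time = 9.0       # minutes per task on the two-card setup
hardware_ratio = 1.5  # the pair has roughly 1.5x the SP Tflops

speedup = single_time / pair_time       # ~1.22x faster
efficiency = speedup / hardware_ratio   # ~0.81, i.e. ~81% scaling
print(f"speedup {speedup:.2f}x, scaling efficiency {efficiency:.0%}")
```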
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
It could be their total lack of DP support, so yours is having to use the CPU for those bits.

All GeForce 10x0 GPUs have DP, and Moo! doesn't need it at all. But I've been testing a GTX 275 since yesterday and I get the same issue there with Moo!: 100% of a CPU core to feed it, while the HD 3850 I had before needed 1-2%. On Milkyway, however, the GTX 275 does not need that much, so maybe it's a Moo! (or distributed.net) thing. BTW, use GPU-Z to check GPU usage, not Task Manager; that's obviously useless for that. The GPU should be near 100% when Moo! is running.