Tell us your thoughts on granting credit for large protein, long-running tasks


CIA
Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 95493 - Posted: 28 Apr 2020, 17:54:18 UTC - in response to Message 95488.  

Someone from DPC over here: we've notified the guy running the Nifhack account of this thread and asked if he wants, and is able, to clarify this. He's known for having access to huge amounts of computational power (at work, I believe) but can't deploy all of it all the time. He's also known to rarely part with specifics. My guess, too, is that those machines are indeed some sort of front-end hosts for the computers behind them.



Something similar to using Amazon's cloud to fire up Rosetta instances on a grand scale, but routing it all through a single BOINC client? Except instead of renting time on Amazon's systems, he has the ability to do this all on a private cloud?
ID: 95493
Sid Celery
Joined: 11 Feb 08
Posts: 2114
Credit: 41,105,271
RAC: 21,658
Message 95499 - Posted: 28 Apr 2020, 19:28:34 UTC - in response to Message 95461.  

My question about credits is, what is up with this guy? Within 3 days, he has the top three "fastest" computers by nearly a factor of 6.
They are returning a lot of Tasks for such a small number of core/threads.
0.72 day turn around. 8 hour runtime. 4,600 Tasks in progress on one system, over 6000 Valid.
0.72 day turn around. 8 hour runtime. 1,300 Tasks in progress on the others, roughly 1,650 each Valid on the others.

Number of times client has contacted the server, 3 for one system. 0 for the others?

Some sort of CPU compute cluster feeding its results through those host IDs?

Is the work really done? If it is, great. But the "credit" only goes through one host? If the work done is real, who cares?
Neat trick. If he's got authority to do it, great. If not, it's nothing to do with me...
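
For what it's worth, the quoted figures can be sanity-checked with a rough Little's law estimate. This is only a sketch: it assumes the "tasks in progress" count and the 0.72 day turnaround are steady-state averages and that each task keeps one thread busy for the full 8 hours. The function name is made up for illustration.

```python
# Back-of-envelope check using Little's law. It says nothing about the
# actual hardware behind these hosts; it only uses the quoted numbers.

def implied_concurrency(tasks_in_progress: int,
                        turnaround_days: float,
                        runtime_hours: float) -> float:
    """Estimate how many tasks must be running at the same time."""
    # Tasks finished per day if the in-progress pool turns over once
    # every `turnaround_days` on average.
    tasks_per_day = tasks_in_progress / turnaround_days
    # Each task occupies one thread for runtime_hours / 24 of a day.
    return tasks_per_day * (runtime_hours / 24.0)

big_host = implied_concurrency(4600, 0.72, 8.0)
small_host = implied_concurrency(1300, 0.72, 8.0)
print(f"largest host: ~{big_host:.0f} busy threads")
print(f"other hosts:  ~{small_host:.0f} busy threads each")
```

Under those assumptions the numbers only add up if far more threads are at work than a single listed host could provide, which fits the cluster guess.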
ID: 95499
lazyacevw
Joined: 18 Mar 20
Posts: 12
Credit: 93,576,463
RAC: 0
Message 95530 - Posted: 29 Apr 2020, 6:12:27 UTC - in response to Message 95488.  

Someone from DPC over here: we've notified the guy running the Nifhack account of this thread and asked if he wants, and is able, to clarify this. He's known for having access to huge amounts of computational power (at work, I believe) but can't deploy all of it all the time. He's also known to rarely part with specifics. My guess, too, is that those machines are indeed some sort of front-end hosts for the computers behind them.


Thanks for the insight! I'm just getting into edge computing and would love to know some of the details to play around with a similar setup.
ID: 95530
Millenium
Joined: 20 Sep 05
Posts: 68
Credit: 184,283
RAC: 0
Message 95546 - Posted: 29 Apr 2020, 14:06:24 UTC

Yup, nothing wrong in what he is doing, it all seems good, valid, crunching. It's just funny seeing a single host with that RAC
ID: 95546
[DPC]DeApen~Kuuke
Joined: 1 Feb 06
Posts: 1
Credit: 53,281
RAC: 0
Message 95555 - Posted: 29 Apr 2020, 17:22:54 UTC

Our DPC-member is working for Nikhef - our National Institute for Subatomic Physics.
It's not the first time Nikhef is testing new toys on Rosetta :)

The Dutch Power Cows have their yearly "stampede" on Rosetta this year. The stampede will end on April 30, so expect a slowdown in our production.

If you want to know a little bit more about the computing power behind Nifhack, run this article through DeepL or some other translator: https://www.nikhef.nl/news/nieuwe-nikhef-rekenclusters-gaan-eerst-aan-het-coronavirus-rekenen/ The original article is in Dutch.

In short:
64x Lenovo SR655 systems, most of them with an AMD EPYC 7702P and 512 GB of RAM.
They are housed in 4x 19" cabinets.
Each system has a Mellanox ConnectX-4 Lx that connects to the network with 2x 25 Gbit/s.
A subset of the systems is equipped with GPUs.


Kuuke, moderator DPC forum
ID: 95555
Terrible T
Joined: 29 Dec 16
Posts: 4
Credit: 1,333,030
RAC: 0
Message 95557 - Posted: 29 Apr 2020, 17:25:44 UTC - in response to Message 95530.  
Last modified: 29 Apr 2020, 17:27:54 UTC

too late ;

tks Kuuke
ID: 95557
Jacosito
Joined: 11 Jan 13
Posts: 1
Credit: 68,862
RAC: 0
Message 95560 - Posted: 29 Apr 2020, 19:16:56 UTC - in response to Message 94913.  

The initial WU time estimate is 8 hours. While processing, the WU time rises to 48 hours or more.
(Sorry for my English.)

Greetings
ID: 95560
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 95561 - Posted: 29 Apr 2020, 19:40:09 UTC - in response to Message 95560.  

You seem to have specified a runtime preference of the highest value allowed (36 hours). It will take BOINC Manager a few days to get used to tasks taking 36 hours to complete. The estimates are not very accurate when you first start out.
Rosetta Moderator: Mod.Sense
ID: 95561
bkil
Joined: 11 Jan 20
Posts: 97
Credit: 4,433,288
RAC: 0
Message 95567 - Posted: 29 Apr 2020, 20:46:34 UTC - in response to Message 95449.  

I think if you deploy a vast number of nodes from fixed OS images that have the BOINC folder hard-coded, then a clashing host ID could result in something like this.
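
A toy illustration of that guess, assuming the server simply tallies work per host ID. The IDs and task counts below are invented, and this is not BOINC's actual scheduler logic, which may well detect or split duplicate hosts.

```python
from collections import defaultdict

# Toy model only: a dict keyed by host ID stands in for the project's
# host table. It does not reproduce BOINC's real server behaviour.

def tally_by_host(reports):
    """Sum reported tasks per host ID."""
    totals = defaultdict(int)
    for host_id, tasks_done in reports:
        totals[host_id] += tasks_done
    return dict(totals)

# Hypothetical: 4 nodes cloned from one image all report as host 1001,
# while 2 normally installed nodes have their own IDs.
cloned = [(1001, 50)] * 4
unique = [(2001, 50), (2002, 50)]
totals = tally_by_host(cloned + unique)
print(totals)
```

In this model the cloned nodes collapse into one host entry that looks several times faster than any single machine.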
ID: 95567
Sid Celery
Joined: 11 Feb 08
Posts: 2114
Credit: 41,105,271
RAC: 21,658
Message 95611 - Posted: 30 Apr 2020, 15:09:26 UTC - in response to Message 95546.  

Yup, nothing wrong in what he is doing, it all seems good, valid, crunching. It's just funny seeing a single host with that RAC

I'm sure I've seen a news report in the past where the user didn't have the authority to take over all the machines
Now that was funny (to read)
ID: 95611
RandyF
Joined: 2 Nov 14
Posts: 6
Credit: 7,744,262
RAC: 0
Message 95980 - Posted: 4 May 2020, 3:28:25 UTC
Last modified: 4 May 2020, 3:41:02 UTC

WTH is this?! Took my overclocked Ryzen 3900x over SIX hours to get SIX........SIX credits?! Is this a credit error, or the norm? What a waste of electricity..? SMDH...https://photos.app.goo.gl/rGjmPdzUNLErS5CeA



Did I get a batch of bad WU's?! Can someone please look into the following?

All tasks: status Completed and validated, application Rosetta v4.15 windows_x86_64. Times are in seconds.

Task | Work unit | Sent (UTC) | Reported (UTC) | Total time | CPU time | Credit
1165850138 | 1045787673 | 30 Apr 2020, 22:27:22 | 3 May 2020, 19:02:13 | 23,504.78 | 22,564.69 | 7.06
1165798670 | 1048233647 | 30 Apr 2020, 21:57:15 | 3 May 2020, 19:02:51 | 24,268.53 | 23,202.55 | 6.46
1165774723 | 1048213014 | 30 Apr 2020, 21:23:53 | 3 May 2020, 18:07:52 | 22,475.43 | 21,490.02 | 7.65
1165751409 | 1048192909 | 30 Apr 2020, 20:50:56 | 3 May 2020, 16:42:29 | 22,156.37 | 21,163.57 | 6.22
1165779858 | 1048149788 | 30 Apr 2020, 20:42:40 | 3 May 2020, 16:41:20 | 23,246.45 | 22,230.59 | 6.32
1165742379 | 1048185110 | 30 Apr 2020, 20:37:25 | 3 May 2020, 22:31:38 | 31,895.94 | 30,596.40 | 45.84
1165733818 | 1048177553 | 30 Apr 2020, 20:25:06 | 3 May 2020, 22:32:28 | 34,077.06 | 32,814.71 | 39.39

Those were all crunched by computer 4246752. Measured floating point speed: 4958.25 million ops/sec
Measured integer speed: 19723.67 million ops/sec
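
To put those numbers on a common scale, here is a quick sketch that converts each task's CPU time and granted credit into credits per CPU-hour. The figures are copied from the task list above; the helper name is just for illustration.

```python
# (CPU time in seconds, credit granted), copied from the task list above.
tasks = [
    (22564.69, 7.06),
    (23202.55, 6.46),
    (21490.02, 7.65),
    (21163.57, 6.22),
    (22230.59, 6.32),
    (30596.40, 45.84),
    (32814.71, 39.39),
]

def credit_per_cpu_hour(cpu_seconds: float, credit: float) -> float:
    """Credit earned per hour of CPU time for one task."""
    return credit / (cpu_seconds / 3600.0)

rates = [credit_per_cpu_hour(s, c) for s, c in tasks]
for (s, c), r in zip(tasks, rates):
    print(f"{s / 3600:5.2f} h  {c:6.2f} credits  {r:5.2f} credits/CPU-hour")
```

The first five tasks come out at a little over one credit per CPU-hour, while the last two earn several times that.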
ID: 95980
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1670
Credit: 17,504,454
RAC: 24,396
Message 95981 - Posted: 4 May 2020, 3:39:28 UTC - in response to Message 95980.  
Last modified: 4 May 2020, 3:41:32 UTC

WTH is this?! Took my overclocked Ryzen 3900x over SIX hours to get SIX........SIX credits?! Is this a credit error, or the norm? What a waste of electricity..? SMDH...
With your computers hidden it's difficult to help, but maybe if you post this in the "Is the amount of credits I'm getting normal?" thread and make your systems visible it would be a start.
Grant
Darwin NT
ID: 95981
RandyF
Joined: 2 Nov 14
Posts: 6
Credit: 7,744,262
RAC: 0
Message 95982 - Posted: 4 May 2020, 3:42:00 UTC - in response to Message 95981.  

Thank you. I will post there.
ID: 95982
Admin
Project administrator
Joined: 1 Jul 05
Posts: 4805
Credit: 0
RAC: 0
Message 95984 - Posted: 4 May 2020, 3:52:08 UTC

I took a look and it appears your host was successfully returning results but then became unstable. Might there be an issue with the host?
ID: 95984
RandyF
Joined: 2 Nov 14
Posts: 6
Credit: 7,744,262
RAC: 0
Message 95987 - Posted: 4 May 2020, 4:55:05 UTC - in response to Message 95984.  
Last modified: 4 May 2020, 4:57:43 UTC

There was, indeed, a hiccup today! The CPU over-temp'd, and I had to dial it back... I guess 4.2GHz on 12 cores/24 threads @ 1.35v was too much. Does it look ok now? She's been running a lot cooler since the "incident". Lol. Came down from 96°C to ~72°C. Thanks for looking into my conundrum.... You guys and gals are awesome! You can delete my original post, if you want...

Now, back to the topic at hand... Bring on the monster WU's!
ID: 95987
CyberTailor
Joined: 26 Dec 18
Posts: 8
Credit: 579,176
RAC: 925
Message 96020 - Posted: 4 May 2020, 14:12:44 UTC - in response to Message 94913.  

if the BOINC Manager sees requests for memory that exceed that 4GB the task is actually aborted

If aborted tasks still return results, it's possible to assign credits for computed models.
ID: 96020
Sid Celery
Joined: 11 Feb 08
Posts: 2114
Credit: 41,105,271
RAC: 21,658
Message 96073 - Posted: 4 May 2020, 20:52:21 UTC - in response to Message 96020.  

if the BOINC Manager sees requests for memory that exceed that 4GB the task is actually aborted

If aborted tasks still return results, it's possible to assign credits for computed models.

I believe it does now - according to the CPU time the task reports back
ID: 96073
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96074 - Posted: 4 May 2020, 21:43:07 UTC

We need to define terms carefully. With "Aborted" work units, the BOINC Manager does not return the results files. If a work unit ends abnormally, or it is ended by the watchdog, then results are returned and credit is granted based on the number of completed models.
Rosetta Moderator: Mod.Sense
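
The distinction above can be sketched as a small decision rule. This is only an illustration: the outcome names, the `credit_per_model` parameter, and the assumption that normally completed tasks are credited the same way are all hypothetical, and the project's real credit formula is not given in this thread.

```python
from enum import Enum

class Outcome(Enum):
    COMPLETED = "completed"
    ENDED_ABNORMALLY = "ended abnormally"
    WATCHDOG = "ended by watchdog"
    ABORTED = "aborted"

def granted_credit(outcome: Outcome,
                   completed_models: int,
                   credit_per_model: float) -> float:
    """Hypothetical rule: aborted work returns no results, so no credit."""
    if outcome is Outcome.ABORTED:
        # No result files come back, so there are no models to count.
        return 0.0
    # Otherwise credit scales with the number of completed models.
    return completed_models * credit_per_model

print(granted_credit(Outcome.ABORTED, 5, 2.0))
print(granted_credit(Outcome.WATCHDOG, 3, 2.0))
```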
ID: 96074
RME
Joined: 4 Mar 20
Posts: 12
Credit: 1,211,010
RAC: 0
Message 96715 - Posted: 22 May 2020, 3:13:37 UTC - in response to Message 95339.  
Last modified: 22 May 2020, 3:18:59 UTC

I can't wait to get to 1,000,000 points so I can get my reward.

Well I made my million points and made me some pretty good tacos.
ID: 96715
Joseph Francis
Joined: 9 Jun 20
Posts: 2
Credit: 718
RAC: 0
Message 97302 - Posted: 9 Jun 2020, 9:33:01 UTC

Would like to see GPU work units in this project; that is where my desktop has its most efficient throughput.
ID: 97302



©2024 University of Washington
https://www.bakerlab.org