Message boards : Number crunching : Rosetta 4.1+ and 4.2+
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
and probably help the programmers find out why they went wrong. That's the annoying thing. There is a problem with them; thousands of results for these faulty Work Units have been sent back. So it's well past time to fix the problem before sending out even more pointless Tasks. It's not like it's one here or there that has an issue, or there are different types of failures- it's the entire group that fails, with the same error, every time. It may not use up much of our time, but it does use up project bandwidth & storage resources- which could be better used for work that does actually provide a result that isn't an error. Grant Darwin NT |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Remember this project is run by a university. It’s August. Most likely everybody is on holiday. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,739,033 RAC: 7,061 |
and probably help the programmers find out why they went wrong. That's the annoying thing. It's not like this is LHC, the tasks are small and download instantly, and they have very high bandwidth servers. They're not at their limit, so a few problems are not causing anything to be delayed. And I've no idea why you think there are a lot of them, I spot only one or two a day, while running 66 cores and I'm in front of the computer most of the time. The server will automatically stop resending a work unit if it crashes on a few of our machines. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
It's not like this is LHC, the tasks are small and download instantly, and they have very high bandwidth servers. But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage costs money. Better to use that money for things that provide valid results, not errors. And I've no idea why you think there are a lot of them, I spot only one or two a day, while running 66 cores and I'm in front of the computer most of the time. One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each. Check the top system list- roughly 30 or so errors or invalid per system per day. That works out to thousands, even tens of thousands, of Work Units that produced nothing but error/invalid results. It's not like this is something that just started, it's been going on for almost 2 weeks now. The server will automatically stop resending a work unit if it crashes on a few of our machines. Better not to send out rubbish in the first place- even more so once you should already know that it's rubbish from all the previous ones that failed. It's not like some of these Work Units process ok and others don't- all of the present failures are failing & failing in the same way. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
Remember this project is run by a university. It’s August. Most likely everybody is on holiday. It's not up to the project to sort this out, but the researchers that keep submitting work that doesn't produce useful results. And it's not been going on for just a few days, we're talking almost 2 weeks now. Grant Darwin NT |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,739,033 RAC: 7,061 |
But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage costs money. Better to use that money for things that provide valid results, not errors. They have a huge amount of spare bandwidth, I doubt it's metered any more than yours or mine. One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each. You need to express that as a percentage or it's meaningless. It's like when governments say 500 people died in car crashes. Yeah, out of 100 million, so not important. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,739,033 RAC: 7,061 |
Remember this project is run by a university. It’s August. Most likely everybody is on holiday. It's not up to the project to sort this out, but the researchers that keep submitting work that doesn't produce useful results. Do you seriously think they're blindly throwing in work when the last batch comes back as failed? Only thing I've seen act like that is my cat. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage costs money. Better to use that money for things that provide valid results, not errors. They have a huge amount of spare bandwidth, I doubt it's metered any more than yours or mine. You bet it's metered. It all has to be paid for, and the more you use the more you pay. One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each. You need to express that as a percentage or it's meaningless. It's not meaningless, it's an absolute value. It might be a small percentage of the whole, but it is still a large number. There will always be some Work Units that produce errors as they try new things, but it's ridiculous to keep submitting Work Units that have yet to produce a single valid result. It's like when governments say 500 people died in car crashes. Yeah, out of 100 million, so not important. Unless it's your wife, husband, kids, parents, girlfriend, boyfriend, best friend etc that is one of the dead. Grant Darwin NT |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
It's not up to the project to sort this out, but the researchers… who all work at universities, too |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
Do you seriously think they're blindly throwing in work when the last batch comes back as failed? Only thing I've seen act like that is my cat. The first ones came out a week and a half ago. The present ones were released in the last day or so. When the supply of new Work Units dries up, the Ready to send supply runs out within 2 days. That would indicate that yes they are sending out more work even when the initial tasks from that batch all failed. And keep on sending out more work, even as all the returns are failures. Either that or there is a large batch of work all queued up to be sent out, and instead of cancelling the rest of the batch (since everything returned so far has been Invalid or an Error) they're just letting it go out anyway. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
It's not up to the project to sort this out, but the researchers… who all work at universities, too Has the last week and a half been a University holiday in the US? Given that Rosetta hasn't run out of work for a while, I figure that some people must still be there submitting new work to process. It would be nice if they checked some of the early results coming back to see if they should keep submitting certain work or cancel it and sort out what's wrong with it, before resubmitting it. Grant Darwin NT |
James W Send message Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
And again I continue to receive validation errors (another 8 total today, 8-15-20) on my Android device (3396190) for the foldit1 tasks. Again I question why these clearly erroneous tasks continue to be created! The batch in question today was created during the day of 8/15. The admins and researcher(s) involved with this particular set of tasks must know by now that this particular set of tasks won't produce anything useful! Of note, I've only recently seen these errors/tasks on my Android devices -- not on my PCs -- for whatever reason. Examples of the specific validation errors have previously been quoted by myself and others in this thread. Today's errors include the following tasks: 1. Name: foldit1_2008762_0003_00_asym_dock_SAVE_ALL_OUT_1005836_4487 Task: 1241587912 2. Name: foldit1_2008835_c016_00_asym_dock_SAVE_ALL_OUT_1005885_4489 Task: 1241587932 3. Name: foldit1_2008835_0008_00_asym_dock_SAVE_ALL_OUT_1005883_4423 Task: 1241586218 Thank you, Grant, for your expert logical arguments regarding this continued issue! |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
Of note, I've only recently seen these errors/tasks on my Android devices -- not on my PCs -- for whatever reason. Luck. All foldit1 Work Units on my PCs have crashed & burned. foldit0 Work Units however aren't an issue and process normally. Edit- or more a case of bad luck with the system that's getting nothing but foldit1 Tasks. It's a slow system, and doesn't have much in its cache, so it's requesting new work as it completes the previous Task. Unfortunately at the time, there's a bunch of foldit1 Tasks sitting there, and since they complete in a matter of minutes, there's still plenty of them there each time it finishes one, and so ends up getting another as a replacement. Grant Darwin NT |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
A typical academic year in the U.S. is (at longest) September–July, so you can expect most universities to be very quiet for the whole of August. The plentiful supply of work might be down to researchers submitting huge batches before going on holiday, but they won’t see that it’s failed until they get back. And/or the people active now and submitting new work are not the same people, so are in no position to do anything about the broken tasks. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
A typical academic year in the U.S. is (at longest) September–July, so you can expect most universities to be very quiet for the whole of August. Huh. Here the academic year aligns with the calendar year. General schools start at the end of January (I think early/mid February for Unis) and the school year ends in mid Dec (Mid/late Nov for Unis). Summer semesters are from Mid Nov to late Feb. Grant Darwin NT |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2122 Credit: 41,179,786 RAC: 10,068 |
The recent posts are all rather peculiar to me. I thought I must have missed these tasks going through, but there are none at all in my recent history. I had a couple of errors, but I caused them myself after crashing one PC |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,739,033 RAC: 7,061 |
You bet it's metered. It all has to be paid for, and the more you use the more you pay. You know for a fact it's a metered connection do you? And since the errors are probably a tenth of a percent of the bandwidth, it's not worth getting upset about. It's not meaningless, it's an absolute value. It might be a small percentage of the whole, but it is still a large number. Which is unlikely if the percentage is low! Let me explain this in simpler terms.... Scenario 1: 100 people in a room, 2 die. That's alarming. It could have been you. Scenario 2: 1 million people in a room, 2 die. No concern at all. Chances are you're safe. But according to you, 2 is the same as 2, so just as dangerous! If that doesn't make sense to you, how about some real analogies: Scenario 1 becomes: 100 people go skydiving and 2 die. We can conclude skydiving is dangerous. Scenario 2 becomes: 1 million people drive to work every day and 2 die. We can conclude driving is 10,000 times safer than skydiving, but your calculation would say they're equally dangerous, because both killed 2 people. That would indicate that yes they are sending out more work even when the initial tasks from that batch all failed. And keep on sending out more work, even as all the returns are failures. You're assuming they're incompetent morons, I very much doubt that. And they probably have a lot more idea of how it works than you do, since they work there. Huh. Which means you have big holidays too, but you may have noticed you're in the other hemisphere, so the month names are different.... |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,739,033 RAC: 7,061 |
The recent posts are all rather peculiar to me. There's not many faulty tasks, just a few people getting upset over nothing. I think we must have some journalists in here, that's their way of thinking. One little thing goes wrong and they think the world is about to end. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
Which is unlikely if the percentage is low! Let me explain this in simpler terms.... Since you're missing the point entirely (deliberately or otherwise), there's no point discussing it further. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1677 Credit: 17,767,500 RAC: 22,869 |
The recent posts are all rather peculiar to me. And today is the first day I don't have any in my current Task list either, after having them continuously for over a week and a half. Even the number of errored/Invalid Tasks for the top systems has dropped off a lot over the last few hours. Grant Darwin NT |
©2024 University of Washington
https://www.bakerlab.org