Message boards : Number crunching : Validation errors prior to fileserver crash
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 8,784 |
A new team-mate has been running very successfully since joining, but on the 4th/5th he had 14 consecutive validation errors. No idea why, but it doesn't look like a problem at his end. Was it the first hint of issues on the fileserver? Did anyone else see this with their uploads? I didn't and neither did other team-mates. Can these WUs be re-checked? Edit: Oops! User is itnumberpi |
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
Hey Sid - how much run time did he have on these failed tasks? I had a few tasks fail shortly before the outage and I think that they ended up with validation errors - but since they only ran for 10-15 seconds each, it was clear that they were sour tasks from the start. CH |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 8,784 |
Looks to be the full 3 hours in the main - not a WU problem but a validation issue it seems to me. I'm not in contact with the guy to know more - he's a friend of a friend. |
Chris Holvenstot Joined: 2 May 10 Posts: 220 Credit: 9,106,918 RAC: 0 |
If that's the case then clearly it's not the same issue. Have a great night! |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,228,659 RAC: 8,784 |
Not sure why but I just took a look at your machines, seeing as you have such a high RAC. Take a look at your last results on these 3 machines. Same thing. https://boinc.bakerlab.org/rosetta/results.php?hostid=1312275 https://boinc.bakerlab.org/rosetta/results.php?hostid=1346087 https://boinc.bakerlab.org/rosetta/results.php?hostid=1277775 No idea why it happens on those and not on your other ones... Also, see KEL's message on the Rosetta front page. That's one guy who's not going to have a good night... |
Bernd Schnitker Joined: 2 Jan 09 Posts: 10 Credit: 62,009 RAC: 0 |
I have 2 that failed to validate from the 4th and 5th of Jan also. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357710503 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=357647672 I hope they are fixed in the end. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Some validation issues as the project restarted may be due to the restored database not being entirely in sync, as mentioned in the project news on the home page. This is probably why no validation is currently being done; it would be doing more harm than good to the databases. Some validation issues prior to the crash may have been precursors to the final failure that occurred. Rosetta Moderator: Mod.Sense |
©2024 University of Washington
https://www.bakerlab.org