Compute Error with CASP9_aa_benchmark_hybridization_run02_

Shurado

Joined: 9 Feb 12
Posts: 4
Credit: 11,710
RAC: 0
Message 72333 - Posted: 16 Feb 2012, 19:41:19 UTC

I now have two of these end with a compute error. Anyone know what went wrong?

I have a third one that I think I am going to just abort.

First
Second
Rocco Moretti

Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72338 - Posted: 17 Feb 2012, 4:13:07 UTC - in response to Message 72333.  

I now have two of these end with a compute error. Anyone know what went wrong?

I have a third one that I think I am going to just abort.

First
Second


Where did you get your BOINC client? It's being listed as version 6.8.31, but that doesn't seem to be on the official list of BOINC clients: http://boinc.berkeley.edu/trac/wiki/VersionHistory

You may want to try upgrading your BOINC client (http://boinc.berkeley.edu/download_all.php). There are reports of 7.0.8 being flaky with Rosetta@home, so I would stick to 6.12.34 or 6.10.60. Make sure you get the correct version (i.e. don't get the 64-bit client unless you have a 64-bit machine and a 64-bit version of Windows).
Shurado

Joined: 9 Feb 12
Posts: 4
Credit: 11,710
RAC: 0
Message 72340 - Posted: 17 Feb 2012, 6:23:30 UTC

It is from the Charity Engine. It was how I was introduced to crunching. I would switch out to the latest version but I don't know what would happen to my account.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 72341 - Posted: 17 Feb 2012, 9:48:38 UTC - in response to Message 72340.  

It is from the Charity Engine. It was how I was introduced to crunching. I would switch out to the latest version but I don't know what would happen to my account.

Simply download the Recommended Version for your OS from the official BOINC download page (and not from some other suspicious-looking page).

I see that both of your computers here on Rosetta run Windows XP x86, so the currently recommended version for those would be 6.12.34.

Depending on your other projects, an older version with shorter backoff intervals might be better, such as 6.10.18 (my favourite) or 6.10.60.

All available versions can be found here; however, there are also beta/development versions in there, so you should stick to those listed in the "Version History" posted above by Rocco Moretti.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 72348 - Posted: 17 Feb 2012, 20:00:12 UTC

Right, your accounts with BOINC projects are not based in the software of a specific PC. So you can change BOINC versions and so long as you use the same user name and password, you are using the same account. In fact, if you have multiple machines, they could have various operating systems and BOINC versions, and all run on behalf of the same account.
Rosetta Moderator: Mod.Sense
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 72353 - Posted: 18 Feb 2012, 10:48:32 UTC - in response to Message 72348.  

Right, your accounts with BOINC projects are not based in the software of a specific PC. So you can change BOINC versions and so long as you use the same user name and password, you are using the same account. In fact, if you have multiple machines, they could have various operating systems and BOINC versions, and all run on behalf of the same account.

Actually, the email address is the critical part. You might have different user names, and I think even different passwords, on different projects. I had that at first when I joined Rosetta and changed it later when I tried BAM.

Anyway, this Charity Engine might have some modified version of BOINC. From what I understand from their homepage, they sell the computing resources of BOINC users, so if ce3297 wants his to be sold to companies which pay for that, he may need to keep this strange version. But that he has to figure out by himself: either he asks over there or he simply tries. Reverting to the older version should not be a problem if he backs up his data folder.
Shurado

Joined: 9 Feb 12
Posts: 4
Credit: 11,710
RAC: 0
Message 72360 - Posted: 18 Feb 2012, 21:03:25 UTC
Last modified: 18 Feb 2012, 21:06:13 UTC

Yes, the email in place is in their control and I'm connected to their account manager. I managed to access my accounts by finding the keys for them (pretty happy to finally make some changes). Anyway, I'll certainly look into which version to install and change the email to my own.

I actually don't get anything, but they run a cash prize lottery for their 'volunteers' depending on the credits earned. My two computers are too slow to bother with it though.

As for the actual problem, it seems the task doesn't checkpoint so I'm going to try increasing the minutes between switching application and see what happens.
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 72362 - Posted: 19 Feb 2012, 10:28:27 UTC - in response to Message 72360.  
Last modified: 19 Feb 2012, 10:34:51 UTC

Yes, the email in place is in their control and I'm connected to their account manager. I managed to access my accounts by finding the keys for them (pretty happy to finally make some changes). Anyway, I'll certainly look into which version to install and change the email to my own.

Yes, if you have access to your personal accounts on all 3 BOINC projects, that's probably the best you can do.


As for the actual problem, it seems the task doesn't checkpoint so I'm going to try increasing the minutes between switching application and see what happens.

Yeah, the CASP9_aa_benchmark* tasks checkpoint quite rarely, as I posted here. I'm using 1500 minutes, so that the Rosetta tasks (and pretty much any other tasks) can complete without any interruption (I have 24h as default runtime here on Rosetta). Also, if you have enough RAM, you can try to leave the applications in memory while suspended.

EDIT: I see you don't have much RAM in your computers, especially the P4, so it's probably better to go with increasing the time between switching applications. You might also want to try allowing computing while the computer is in use, if you haven't tried that yet, and see if it's OK for you, i.e. that you don't get performance issues (which shouldn't happen in most cases).
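
For anyone who prefers to set these three preferences in a file rather than through the BOINC Manager, here is a minimal sketch in Python that builds the text of a local preferences override. It assumes the client reads a `global_prefs_override.xml` file from its data directory and that the element names `cpu_scheduling_period_minutes`, `leave_apps_in_memory` and `run_if_user_active` match your client version; neither is confirmed in this thread, so check the documentation for your BOINC version before using it. The function name `build_prefs_override` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(switch_minutes=1500, leave_in_memory=False,
                         run_if_user_active=True):
    """Return XML text for a local preferences override.

    The element names are assumptions about the BOINC preferences
    format; verify them against your client's documentation.
    """
    root = ET.Element("global_preferences")
    # "Switch between applications every N minutes"
    period = ET.SubElement(root, "cpu_scheduling_period_minutes")
    period.text = f"{float(switch_minutes):.6f}"
    # Keep suspended applications in RAM (costs memory on small machines)
    in_memory = ET.SubElement(root, "leave_apps_in_memory")
    in_memory.text = "1" if leave_in_memory else "0"
    # Allow computing while the computer is in use
    user_active = ET.SubElement(root, "run_if_user_active")
    user_active.text = "1" if run_if_user_active else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; saving it into the BOINC data directory is left to you.
    print(build_prefs_override())
```

The defaults mirror the advice above: a 1500-minute switch interval, applications not kept in memory (because of the low RAM), and computing allowed while the computer is in use.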
fatbozz

Joined: 10 Dec 05
Posts: 5
Credit: 1,762,734
RAC: 0
Message 72432 - Posted: 3 Mar 2012, 18:31:19 UTC

All CASP9 WUs finish with a computation error.
-
3.3.2012 17:42:48 | rosetta@home | Computation for task CASP9_ba_perfect_aln_hybrid_T0590_SAVE_ALL_OUT_IGNORE_THE_REST_42535_55_1 finished
3.3.2012 17:42:48 | rosetta@home | Output file CASP9_ba_perfect_aln_hybrid_T0590_SAVE_ALL_OUT_IGNORE_THE_REST_42535_55_1_0 for task CASP9_ba_perfect_aln_hybrid_T0590_SAVE_ALL_OUT_IGNORE_THE_REST_42535_55_1 absent
3.3.2012 17:42:48 | rosetta@home | Starting task CASP9_bg_perfect_aln_hybrid_T0624_SAVE_ALL_OUT_IGNORE_THE_REST_43019_54_1 using minirosetta version 322
3.3.2012 17:47:45 | rosetta@home | Computation for task CASP9_be_perfect_aln_hybrid_T0545_SAVE_ALL_OUT_IGNORE_THE_REST_42817_99_0 finished
3.3.2012 17:47:45 | rosetta@home | Output file CASP9_be_perfect_aln_hybrid_T0545_SAVE_ALL_OUT_IGNORE_THE_REST_42817_99_0_0 for task CASP9_be_perfect_aln_hybrid_T0545_SAVE_ALL_OUT_IGNORE_THE_REST_42817_99_0 absent
3.3.2012 17:47:45 | rosetta@home | Starting task CASP9_bg_perfect_aln_hybrid_T0588_SAVE_ALL_OUT_IGNORE_THE_REST_42995_21_1 using minirosetta version 322
3.3.2012 17:50:23 | rosetta@home | Computation for task CASP9_bg_perfect_aln_hybrid_T0624_SAVE_ALL_OUT_IGNORE_THE_REST_43019_54_1 finished
3.3.2012 17:50:23 | rosetta@home | Output file CASP9_bg_perfect_aln_hybrid_T0624_SAVE_ALL_OUT_IGNORE_THE_REST_43019_54_1_0 for task CASP9_bg_perfect_aln_hybrid_T0624_SAVE_ALL_OUT_IGNORE_THE_REST_43019_54_1 absent
3.3.2012 17:50:23 | rosetta@home | Starting task CASP9_bg_perfect_aln_hybrid_T0586_SAVE_ALL_OUT_IGNORE_THE_REST_42994_53_0 using minirosetta version 322
3.3.2012 17:54:27 | rosetta@home | Computation for task CASP9_bg_perfect_aln_hybrid_T0525_SAVE_ALL_OUT_IGNORE_THE_REST_42961_66_1 finished
3.3.2012 17:54:27 | rosetta@home | Output file CASP9_bg_perfect_aln_hybrid_T0525_SAVE_ALL_OUT_IGNORE_THE_REST_42961_66_1_0 for task CASP9_bg_perfect_aln_hybrid_T0525_SAVE_ALL_OUT_IGNORE_THE_REST_42961_66_1 absent
3.3.2012 17:54:27 | rosetta@home | Starting task CASP9_bg_perfect_aln_hybrid_T0563_SAVE_ALL_OUT_IGNORE_THE_REST_42979_51_1 using minirosetta version 322
3.3.2012 18:00:04 | rosetta@home | Computation for task CASP9_bg_perfect_aln_hybrid_T0586_SAVE_ALL_OUT_IGNORE_THE_REST_42994_53_0 finished
3.3.2012 18:00:04 | rosetta@home | Output file CASP9_bg_perfect_aln_hybrid_T0586_SAVE_ALL_OUT_IGNORE_THE_REST_42994_53_0_0 for task CASP9_bg_perfect_aln_hybrid_T0586_SAVE_ALL_OUT_IGNORE_THE_REST_42994_53_0 absent
3.3.2012 18:00:04 | rosetta@home | Starting task CASP9_bf_perfect_aln_hybrid_T0622_SAVE_ALL_OUT_IGNORE_THE_REST_42940_49_0 using minirosetta version 322
3.3.2012 18:03:26 | rosetta@home | Computation for task CASP9_bf_perfect_aln_hybrid_T0625_SAVE_ALL_OUT_IGNORE_THE_REST_42943_38_1 finished
3.3.2012 18:03:26 | rosetta@home | Output file CASP9_bf_perfect_aln_hybrid_T0625_SAVE_ALL_OUT_IGNORE_THE_REST_42943_38_1_0 for task CASP9_bf_perfect_aln_hybrid_T0625_SAVE_ALL_OUT_IGNORE_THE_REST_42943_38_1 absent
3.3.2012 18:03:26 | rosetta@home | Starting task CASP9_bf_perfect_aln_hybrid_T0614_SAVE_ALL_OUT_IGNORE_THE_REST_42936_49_0 using minirosetta version 322
3.3.2012 18:14:43 | rosetta@home | Computation for task CASP9_bf_perfect_aln_hybrid_T0622_SAVE_ALL_OUT_IGNORE_THE_REST_42940_49_0 finished
3.3.2012 18:14:43 | rosetta@home | Output file CASP9_bf_perfect_aln_hybrid_T0622_SAVE_ALL_OUT_IGNORE_THE_REST_42940_49_0_0 for task CASP9_bf_perfect_aln_hybrid_T0622_SAVE_ALL_OUT_IGNORE_THE_REST_42940_49_0 absent
3.3.2012 18:14:43 | rosetta@home | Starting task CASP9_bf_perfect_aln_hybrid_T0608_SAVE_ALL_OUT_IGNORE_THE_REST_42932_49_0 using minirosetta version 322
3.3.2012 18:18:09 | rosetta@home | Computation for task CASP9_bf_perfect_aln_hybrid_T0614_SAVE_ALL_OUT_IGNORE_THE_REST_42936_49_0 finished
3.3.2012 18:18:09 | rosetta@home | Output file CASP9_bf_perfect_aln_hybrid_T0614_SAVE_ALL_OUT_IGNORE_THE_REST_42936_49_0_0 for task CASP9_bf_perfect_aln_hybrid_T0614_SAVE_ALL_OUT_IGNORE_THE_REST_42936_49_0 absent
3.3.2012 18:18:09 | rosetta@home | Starting task CASP9_be_perfect_aln_hybrid_T0628_SAVE_ALL_OUT_IGNORE_THE_REST_42869_99_0 using minirosetta version 322
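
A long event log like the one above is easier to summarise with a small script. The following Python sketch pulls out every task whose output file was reported absent. It assumes each line has exactly the `date time | project | message` shape shown in this post; the helper name `find_absent_outputs` is made up for this example, and other client versions may format their log differently.

```python
import re

# Matches lines such as:
# 3.3.2012 17:42:48 | rosetta@home | Output file <file> for task <task> absent
ABSENT_RE = re.compile(
    r"^(?P<stamp>\S+ \S+) \| (?P<project>[^|]+?) \| "
    r"Output file (?P<file>\S+) for task (?P<task>\S+) absent$"
)


def find_absent_outputs(log_text):
    """Return (timestamp, task name) pairs for tasks with a missing output file."""
    hits = []
    for line in log_text.splitlines():
        match = ABSENT_RE.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("task")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python absent_outputs.py < saved_event_log.txt
    for stamp, task in find_absent_outputs(sys.stdin.read()):
        print(stamp, task)
```

Pasting the saved event log into the script's standard input lists one failed task per line, which is handy when reporting errors to the project.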

TJ
Volunteer moderator
Project developer
Project scientist

Joined: 22 Oct 10
Posts: 9
Credit: 216,670
RAC: 0
Message 72435 - Posted: 4 Mar 2012, 4:43:36 UTC - in response to Message 72432.  

All CASP9 WUs finish with a computation error.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 72449 - Posted: 5 Mar 2012, 0:04:18 UTC - in response to Message 72432.  

All CASP9 WUs finish with a computation error.


Mine too, I am turning Rosie off and aborting all my units until it is fixed! It is a waste of my resources!!
Rocco Moretti

Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72458 - Posted: 5 Mar 2012, 19:49:13 UTC

Sorry about that. The offending jobs have already been re-sized, so no new work units for the bad jobs will be sent out.

We're gearing up for CASP10 (less than two months to go!) and people in the lab are feverishly testing new protocols. These jobs, with a new and improved protocol, looked good when tested on RALPH@home, but turned out to have issues when run on a larger scale.

I'd like to put out a plug for RALPH@home (http://ralph.bakerlab.org/), our alpha testing server. If you can spare some cycles, we'd appreciate it if you'd sign up. The more power we have on the testing project, the larger the test jobs we can be comfortable sending out, and hopefully we will be able to find these problems sooner. (Note that as a testing project, there won't always be work queued, and the jobs/applications you get might be error prone, so it may not be appropriate for everyone.)

Thanks.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,169,305
RAC: 3,078
Message 72460 - Posted: 5 Mar 2012, 23:43:20 UTC - in response to Message 72458.  

Sorry about that. The offending jobs have already been re-sized, so no new work units for the bad jobs will be sent out.

We're gearing up for CASP10 (less than two months to go!) and people in the lab are feverishly testing new protocols. These jobs, with a new and improved protocol, looked good when tested on RALPH@home, but turned out to have issues when run on a larger scale.

I'd like to put out a plug for RALPH@home (http://ralph.bakerlab.org/), our alpha testing server. If you can spare some cycles, we'd appreciate it if you'd sign up. The more power we have on the testing project, the larger the test jobs we can be comfortable sending out, and hopefully we will be able to find these problems sooner. (Note that as a testing project, there won't always be work queued, and the jobs/applications you get might be error prone, so it may not be appropriate for everyone.)

Thanks.


Rocco, I KNOW you are VERY VERY busy, but MY computers, and others', are WHY you are busy! If you can't be around to catch and correct these errors more often, why put out this stuff? I crunched for days and days and days and all I got was error after error after error! FINALLY I say I have had enough, I am sure others have too, and you step in and say 'oops, my bad'?! As I used to tell the new guy at work, 'I am not a guinea pig, test your recipes out at home FIRST'! I tried to uncheck getting the CASP9 units in the project options; no such option exists! So essentially YOU, and the Team, were the cause of Rosetta crunching nothing but JUNK for the last few days or week! There are WAAAY too many BOINC projects in the sea to waste time crunching for Rosetta. No matter how noble your premise is, if all I am returning is junk, it is WORTHLESS!!!

You can take this with a grain of salt, you can take it to the delete key, I don't care, but I am NOT crunching for Rosetta until you can PROVE it is not WORTHLESS and MY time and money down the drain!!! Crunching is NOT free; we users each pay our own way to do this, to HELP YOU, the Project, with your research! I CHOSE to help, you CHOSE to give me junk work, I am now CHOOSING to move on!!
TD Nickell
Joined: 20 Jan 07
Posts: 10
Credit: 3,810,259
RAC: 0
Message 72482 - Posted: 10 Mar 2012, 18:55:45 UTC

The last ten work units I have done have failed. All were CASP9. What a waste of time.
When will this be fixed?




©2024 University of Washington
https://www.bakerlab.org