Message boards : Number crunching : Problems with Rosetta version 5.98
Author | Message |
---|---|
glaesum Send message Joined: 16 Oct 06 Posts: 21 Credit: 509,306 RAC: 35 |
Starting to get the occasional error on 5.98. Here is a t443 (wuid=158705243) that plugged away for nearly 15 hrs until it packed in with a validate error. Credit was claimed and granted but never actually got issued. There's no diagnostic on my task report, but the wingman's task stopped with a client error after 20 mins and does have lots of diagnostics (too many restarts with no progress). |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm also seeing this same slowdown problem. A 4 hour task on t443 (17438540) has been going for over 8 hours (according to BOINC) and longer in the real world. It appeared stuck on Model 2 Step 373221. Will try restarting BOINC several times as suggested above. Win XP, BOINC 5.10.28. |
Betting Slip Send message Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158648907 Not on mine there's not. Under the task ID it shows claimed credit, and then granted = 0. |
sslickerson Send message Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
|
Virtual Boss* Send message Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0 |
WU FRA_t449_CASP8_MANUAL_1_IGNORE_THE_RESTt449_1_ttxxxxT0449_1CHIM_0001_0001_0001_4142_3294_1 using rosetta_beta version 598.
This WU is still running at 17:22:00 CPU time.
Progress is now 99.049% and incrementing every 66 CPU secs.
To completion is now 00:09:56 (no change in the last CPU hour).
Currently at Model 22 Step 69581.
Also noticed no files in the task slot have been updated since 30/06/2008 12:38 PM (56.6 hrs ago real time, approx. 14.4 hrs ago CPU time).
I don't think it is worth keeping this WU going. |
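The check described in the post above (looking at when files in the task slot were last written) can be scripted. This is only a minimal sketch: the `slots/0` path is an assumed example location, the helper names are made up for illustration, and a stale slot directory does not by itself prove the task is dead, as the replies below point out.

```python
import os
import time


def newest_mtime(slot_dir):
    """Return the latest modification time (epoch seconds) of any file
    under slot_dir, or None if no files are found."""
    newest = None
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def hours_since_last_write(slot_dir, now=None):
    """Hours elapsed since any file under slot_dir was last modified,
    or None if the directory holds no files."""
    newest = newest_mtime(slot_dir)
    if newest is None:
        return None
    if now is None:
        now = time.time()
    return (now - newest) / 3600.0


if __name__ == "__main__":
    # Hypothetical slot path; adjust to wherever your client keeps task slots.
    SLOT_DIR = "slots/0"
    age = hours_since_last_write(SLOT_DIR)
    if age is None:
        print("No files found under", SLOT_DIR)
    else:
        print("Last file write was %.1f hours ago" % age)
```

Run it a few times over an hour or so; if the age keeps growing while CPU time climbs, that matches the symptom reported here.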
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
As long as the CPU is running, the WU is alive and well. My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient... |
Virtual Boss* Send message Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0 |
OK, will leave it running a while longer. |
anti-cancers Send message Joined: 2 Sep 06 Posts: 9 Credit: 173,262 RAC: 0 |
Result (#174921063)
Wed 2 Jul 2008 23:13:30 CEST|rosetta@home|Computation for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0 finished
Wed 2 Jul 2008 23:13:30 CEST|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0_0 for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0 absent |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 10 |
174952450 <core_client_version>5.10.45</core_client_version> Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
t407__CASP8_JUMPAB_SAMPLE2_newalign_SAVE_ALL_OUT_BARCODE_hom005__3659_88038_2 VALIDATE ERROR - NO CREDIT GRANTED just wasted 4 hrs cpu time on this one...I thought these issues were taken care of? |
billy ewell 1931 Send message Joined: 30 Mar 07 Posts: 14 Credit: 6,951,540 RAC: 4,488 |
Three work units failed in sequence between 01:19 and 01:43 UTC on 3 July. The error messages reported a "file transfer error". Apparently over 9 hours of CPU time were lost. The WUIDs are: 159628332, 159620771 and 159639505. I have no idea if 5.98 is the culprit, but I doubt it, as many other units have completed OK while I have been running 4 CPUs at 100% Rosetta for 68.5 hours straight. |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492 seems to be problematic. Same error adrianxw and anti-cancers posted.
stderr out
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 0. Maximum size: 8388608. RLIM_INFINITY 0
# cpu_run_time_pref: 21600
# random seed: 2138737
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 13919.5 cpu seconds
This process generated 1 decoys from 1 attempts
0 starting pdbs were skipped
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
Thu Jul 3 16:59:01 2008|rosetta@home|Computation for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1 finished
Thu Jul 3 16:59:01 2008|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1_0 for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1 absent |
kb7rzf Send message Joined: 7 Oct 05 Posts: 16 Credit: 35,427 RAC: 0 |
Well, had 1 compute error since I started crunching again here. The WU is FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226
The info from the STDERR OUT:
stderr out
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143013
======================================================
DONE :: 1 starting structures 20322.7 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]> |
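Several posts in this thread paste the same kind of stderr dump, so a small script can pull out the numbers that matter for comparing reports. This is a rough sketch only: `summarize_stderr` is a made-up helper, the regular expressions are fitted to the dumps quoted in this thread rather than to any documented format, and the sample text is trimmed from the post above.

```python
import re

# Sample trimmed from the stderr dump quoted in the post above.
SAMPLE = """
<core_client_version>5.10.45</core_client_version>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143013
DONE :: 1 starting structures 20322.7 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
</stderr_txt>
<message>
<file_xfer_error>
<error_code>-161</error_code>
</file_xfer_error>
</message>
"""


def summarize_stderr(text):
    """Extract run-time preference(s), CPU seconds, decoy counts and any
    file transfer error code from a pasted stderr dump.
    Fields that cannot be found are simply left out of the result."""
    summary = {}

    # The preference line can appear more than once (e.g. after a restart).
    prefs = re.findall(r"cpu_run_time_pref:\s*(\d+)", text)
    if prefs:
        summary["cpu_run_time_pref"] = [int(p) for p in prefs]

    m = re.search(r"starting structures\s+([\d.]+)\s+cpu seconds", text)
    if m:
        summary["cpu_seconds"] = float(m.group(1))

    m = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    if m:
        summary["decoys"] = int(m.group(1))
        summary["attempts"] = int(m.group(2))

    m = re.search(r"<error_code>(-?\d+)</error_code>", text)
    if m:
        summary["file_xfer_error_code"] = int(m.group(1))

    return summary


if __name__ == "__main__":
    for key, value in summarize_stderr(SAMPLE).items():
        print(key, "=", value)
```

For the sample, the summary shows the task finished its models (7 decoys in 20322.7 CPU seconds against a 21600 second preference) and only then hit error code -161 on the output file, which is the pattern the other reports here share.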
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This errored after 4hrs,50min on me, same error for two hosts again!
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=159591795
7/4/2008 2:29:37 PM|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_144_1_0 for task absent
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2139085
======================================================
DONE :: 1 starting structures 17427 cpu seconds
This process generated 4 decoys from 4 attempts
0 starting pdbs were skipped
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_144_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
pete. |
Virtual Boss* Send message Joined: 10 May 08 Posts: 35 Credit: 713,981 RAC: 0 |
[quote]As long as the CPU is running, the WU is alive and well. My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...[/quote]
Finally finished at 24:09:10 CPU.
Claimed Credit 189.06
Granted Credit 97.39
Not as good credit/work as normal, but thanks for the confidence to let it complete. |
sslickerson Send message Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
[quote][quote]As long as the CPU is running, the WU is alive and well. My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...[/quote] Finally finished at 24:09:10 CPU Claimed Credit 189.06 Granted Credit 97.39 not as good credit/work as normal but thanks for the confidence to let it complete[/quote]
But as you can see, the credit discrepancy is huge. This is actually a problem as far as I am concerned. The project staff should look into why this is happening. The application would not do, say, 5 decoys, decide it had time to do one more, and then take *14 hours* to do so. This is a huge problem and I hope the project staff are looking into it.
Tim |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
See my post here. It would seem that your credit is harmed when the last model you hit happens to take significantly longer than the others, because credit is based on averages and that particular model is anything but average. As I posted at the above reference, this issue of specific long-running models is already under investigation.
Rosetta Moderator: Mod.Sense |
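The averaging effect described above can be shown with a little arithmetic. This is a sketch under loudly stated assumptions: it supposes claimed credit scales with CPU time and granted credit is the number of models completed times a project-wide average claim per model. The thread does not give the actual formula, and every number below is hypothetical.

```python
# Hypothetical figures for illustration only; not taken from the project.
CREDIT_PER_CPU_HOUR = 10.0   # assumed claim rate for this host
AVG_CLAIM_PER_MODEL = 20.0   # assumed project-wide average claim per model


def claimed_credit(cpu_hours_per_model, credit_per_cpu_hour):
    """Claimed credit, assumed proportional to total CPU time spent."""
    return sum(cpu_hours_per_model) * credit_per_cpu_hour


def granted_credit(models_completed, avg_claim_per_model):
    """Granted credit, assumed to be models completed times the average
    claim per model reported across hosts."""
    return models_completed * avg_claim_per_model


# Six ordinary models of 2 CPU hours each.
typical_task = [2.0] * 6
# Five ordinary models, then one that runs for 14 CPU hours.
long_tail_task = [2.0] * 5 + [14.0]

for label, task in (("typical", typical_task), ("long tail", long_tail_task)):
    claimed = claimed_credit(task, CREDIT_PER_CPU_HOUR)
    granted = granted_credit(len(task), AVG_CLAIM_PER_MODEL)
    print("%-9s claimed=%.1f granted=%.1f" % (label, claimed, granted))
```

Under these assumptions both tasks are granted the same credit for six models, but the long-tail task claims twice as much because of the extra CPU hours, which is the kind of gap reported earlier in the thread.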
Hypermarkup Send message Joined: 3 Mar 06 Posts: 7 Credit: 112,275 RAC: 0 |
|
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This one last night. Edit// Ran about 7 mins.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158009479
7/4/2008 5:40:10 PM|rosetta@home|Output file for task t434_1_NMRREF_1_t434_1_T0434_2QPWA_2JV0_hybridIGNORE_THE_REST_truncated_4104_4531_1 absent
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2590114
ERROR:: Exit from: .refold.cc line: 338
</stderr_txt>
pete. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This is starting to P... me off; it ran for 5hrs,57min. Getting more of these than with the old 5.96 app. Will credit be given for the work done?
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=159600967
7/5/2008 12:50:11 PM|rosetta@home|Output file for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_473_1 absent
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143766
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 21418 cpu seconds
This process generated 9 decoys from 9 attempts
0 starting pdbs were skipped
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_473_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
pete. |
©2024 University of Washington
https://www.bakerlab.org