Problems with Rosetta version 5.98

glaesum
Joined: 16 Oct 06
Posts: 21
Credit: 508,632
RAC: 0
Message 54109 - Posted: 1 Jul 2008, 15:15:31 UTC

starting to get the occasional error on 5.98:

here is a t443 {wuid=158705243} that plugged away for nearly 15hrs until it packed in with a validate error. credit was claimed and granted but never actually got issued...

there's no diagnostic on my task report but the wingman's task stopped with client error after 20mins and does have lots of diagnostics (too many restarts with no progress).

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 54111 - Posted: 1 Jul 2008, 16:06:14 UTC

I'm also seeing this same slowdown problem. A 4 hour task on t443 (17438540) has been going for over 8 hours (according to BOINC) and longer in the real world. It appeared stuck on Model 2 Step 373221. Will try restarting BOINC several times as suggested above. Win XP, BOINC 5.10.28.

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 54114 - Posted: 1 Jul 2008, 17:52:02 UTC - in response to Message 54097.  

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158648907

Two validate errors after full crunch.

Rosetta needs to think about how to apply credit when the problems are obviously of project/WU source.

Jim

Credit is applied to these as claimed - it doesn't show on the task's main page but does if you hit the Task ID link on the left.

HTH
Danny



Not on mine they're not. On the task ID page it shows claimed, and then granted = 0.

sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 54129 - Posted: 1 Jul 2008, 23:45:19 UTC

Here is a really bad WU, even though it validated: 174615818

My runtime preference is 7200 seconds, but this one ran over 29000 seconds. Here is the kicker: Claimed credit 102.7, Granted credit 13.4.

and another with the same problem: 174641989

3-5 hours of wasted credit is a huge problem!




Virtual Boss*
Joined: 10 May 08
Posts: 35
Credit: 713,981
RAC: 0
Message 54141 - Posted: 2 Jul 2008, 12:00:49 UTC - in response to Message 54103.  

WU FRA_t449_CASP8_MANUAL_1_IGNORE_THE_RESTt449_1_ttxxxxT0449_1CHIM_0001_0001_0001_4142_3294_1 using rosetta_beta version 598

Original estimated run time about 6 CPU Hrs

Still running at 10:10:00 CPU

Progress 98.386% and incrementing 0.001 about every 25 CPU secs

To Completion 00:09:55 (no change last 30 CPU minutes)

At the current rate of increase it will take another 11+ CPU Hrs; or, if Prog% is calculated as time done / (time done + To completion), then it will run forever.

BTW Currently Model 22 Step 47795

the % complete and time to completion aren't linear - they're estimates, so don't worry about them if Rosetta's CPU time is increasing in task manager.

Danny



This WU is still running at 17:22:00 CPU

Progress now 99.049% and incrementing every 66 CPU secs

To completion is now 00:09:56 (no change last CPU Hr)

Currently Model 22 Step 69581

Also noticed no files in task slot have been updated since 30/06/2008 12:38 PM (56.6 Hrs ago Real Time)(approx 14.4 Hrs ago CPU Time)

I don't think it is worth keeping this WU going
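
The two estimates in this post can be checked with a little arithmetic. The sketch below is only a rough illustration in Python: the function names are made up for this example, and the second formula is the poster's guess about how the progress percentage might be derived, not something confirmed by the project.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def remaining_at_linear_rate(progress_pct: float, step_pct: float,
                             secs_per_step: float) -> float:
    """CPU seconds left if progress keeps rising by step_pct every secs_per_step."""
    return (100.0 - progress_pct) / step_pct * secs_per_step


def progress_from_times(elapsed_s: float, to_completion_s: float) -> float:
    """Hypothesised formula: progress = elapsed / (elapsed + to completion)."""
    return 100.0 * elapsed_s / (elapsed_s + to_completion_s)


# First reading: 98.386%, +0.001 about every 25 CPU seconds.
first_remaining_h = remaining_at_linear_rate(98.386, 0.001, 25) / 3600

# Compare the hypothesised formula with both reported readings.
first_guess = progress_from_times(hms_to_seconds("10:10:00"),
                                  hms_to_seconds("00:09:55"))
second_guess = progress_from_times(hms_to_seconds("17:22:00"),
                                   hms_to_seconds("00:09:56"))

print(f"linear estimate: {first_remaining_h:.1f} h remaining")
print(f"formula gives {first_guess:.3f}% (reported 98.386%)")
print(f"formula gives {second_guess:.3f}% (reported 99.049%)")
```

The linear estimate comes out at a little over 11 hours, matching the "11+ CPU Hrs" figure, and the hypothesised formula lands within a few hundredths of a percent of both reported readings. If that formula is right, a "To completion" value that stays near ten minutes means the percentage can approach 100% but never reach it on its own.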

ramostol
Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 54142 - Posted: 2 Jul 2008, 13:29:00 UTC - in response to Message 54141.  



This WU is still running at 17:22:00 CPU

...

I don't think it is worth keeping this WU going


As long as the CPU is running, the WU is alive and well.

My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...

Virtual Boss*
Joined: 10 May 08
Posts: 35
Credit: 713,981
RAC: 0
Message 54143 - Posted: 2 Jul 2008, 13:34:36 UTC - in response to Message 54142.  



This WU is still running at 17:22:00 CPU

...

I don't think it is worth keeping this WU going


As long as the CPU is running, the WU is alive and well.

My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...



OK, will leave it running a while longer

anti-cancers
Joined: 2 Sep 06
Posts: 9
Credit: 173,262
RAC: 0
Message 54147 - Posted: 2 Jul 2008, 21:19:21 UTC

Result (#174921063)

Wednesday, 2 Jul 2008, 23:13:30 CEST|rosetta@home|Computation for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0 finished
Wednesday, 2 Jul 2008, 23:13:30 CEST|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0_0 for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_2443_0 absent

adrianxw
Joined: 18 Sep 05
Posts: 653
Credit: 11,840,739
RAC: 23
Message 54151 - Posted: 3 Jul 2008, 9:04:20 UTC

174952450

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2137497
======================================================
DONE :: 1 starting structures 7768.21 cpu seconds
This process generated 1 decoys from 1 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_1732_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 54155 - Posted: 3 Jul 2008, 17:41:16 UTC

t407__CASP8_JUMPAB_SAMPLE2_newalign_SAVE_ALL_OUT_BARCODE_hom005__3659_88038_2
VALIDATE ERROR - NO CREDIT GRANTED

just wasted 4 hrs cpu time on this one...I thought these issues were taken care of?

billy ewell 1931
Joined: 30 Mar 07
Posts: 14
Credit: 6,899,522
RAC: 0
Message 54157 - Posted: 3 Jul 2008, 19:37:47 UTC

Three work units failed in sequence between 01:19 and 01:43 UTC on 3 July. The error messages cited a "file transfer error". Apparently over 9 hours of CPU time were lost. The WUIDs are: 159628332, 159620771 and 159639505. I have no idea if 5.98 is the culprit, but I doubt it, as many other units have completed OK while I have been running 4 CPUs 100% on Rosetta for 68.5 hours straight.

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 54162 - Posted: 3 Jul 2008, 21:43:23 UTC

FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492 seems to be problematic. Same error adrianxw and anti-cancers posted

stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 0.
Maximum size: 8388608.
RLIM_INFINITY 0
# cpu_run_time_pref: 21600
# random seed: 2138737
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 13919.5 cpu seconds
This process generated 1 decoys from 1 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Thu Jul 3 16:59:01 2008|rosetta@home|Computation for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1 finished
Thu Jul 3 16:59:01 2008|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1_0 for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1 absent
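
For anyone comparing several of these reports side by side, a short script can pull out the fields that matter. This is only a sketch in Python: the `summarize` function and the regular expressions are assumptions based on the stderr text pasted in this thread, not an official BOINC or Rosetta tool, and the sample is an excerpt of the report above.

```python
import re

# Excerpt of the task report pasted above (trimmed for brevity).
SAMPLE = """
# cpu_run_time_pref: 21600
# random seed: 2138737
DONE :: 1 starting structures 13919.5 cpu seconds
This process generated 1 decoys from 1 attempts
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_492_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""


def summarize(report: str) -> dict:
    """Extract CPU seconds, decoy count and file transfer errors from a report."""
    cpu = re.search(r"starting structures\s+([\d.]+)\s+cpu seconds", report)
    decoys = re.search(r"generated\s+(\d+)\s+decoys", report)
    # Each error is a (file name, error code) pair; whitespace between tags varies.
    errors = re.findall(
        r"<file_name>(.*?)</file_name>\s*<error_code>(-?\d+)</error_code>",
        report,
    )
    return {
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "xfer_errors": [(name, int(code)) for name, code in errors],
    }


summary = summarize(SAMPLE)
print(summary)
```

Running it over each pasted report makes it easy to see that the failing tasks in this thread finished their computation and then reported the same error code for the output file.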

kb7rzf
Joined: 7 Oct 05
Posts: 16
Credit: 35,427
RAC: 0
Message 54164 - Posted: 4 Jul 2008, 0:09:37 UTC

Well, had 1 compute error since I started crunching again here. The wu is FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226

The info from the STDERR OUT:

stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143013
======================================================
DONE :: 1 starting structures 20322.7 cpu seconds
This process generated 7 decoys from 7 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_1226_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 54166 - Posted: 4 Jul 2008, 4:46:34 UTC

This errored after 4 hrs 50 min on me, same error for two hosts again!

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=159591795

7/4/2008 2:29:37 PM|rosetta@home|Output file FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_144_1_0 for task absent


<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2139085
======================================================
DONE :: 1 starting structures 17427 cpu seconds
This process generated 4 decoys from 4 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4165_144_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

pete.


Virtual Boss*
Joined: 10 May 08
Posts: 35
Credit: 713,981
RAC: 0
Message 54168 - Posted: 4 Jul 2008, 15:44:06 UTC - in response to Message 54142.  

As long as the CPU is running, the WU is alive and well.

My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...

Finally finished at 24:09:10 CPU

Claimed Credit 189.06
Granted Credit 97.39

not as good credit/work as normal but thanks for the confidence to let it complete

sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 54170 - Posted: 4 Jul 2008, 16:48:55 UTC - in response to Message 54168.  



As long as the CPU is running, the WU is alive and well.

My largest WU needed 46 hours to complete (and 12 hours default runtime of course) - the time needed depends on your computer. So if you're not too impatient...

Finally finished at 24:09:10 CPU

Claimed Credit 189.06
Granted Credit 97.39

not as good credit/work as normal but thanks for the confidence to let it complete

But as you can see the credit discrepancy is huge. This is actually a problem as far as I am concerned. The project staff should look into why this is happening.

The application would not normally do, say, 5 decoys, decide it had time to do one more, and then take *14 hours* to do so. This is a huge problem and I hope the project staff are looking into it.

Tim




Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 54175 - Posted: 4 Jul 2008, 18:40:59 UTC

See my post here. It would seem that your credit is harmed when the last model you hit happens to take significantly longer than the others, because credit is based on averages and that particular model is anything but average.

As I posted at the above reference, this issue of specific long-running models is already under investigation.
Rosetta Moderator: Mod.Sense
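
To make the "credit is based on averages" point concrete, here is a toy calculation in Python. It is only a sketch under loudly stated assumptions: it assumes claimed credit scales with CPU time and granted credit is the number of completed models times an average credit per model. The rate, the model times and the function names are all hypothetical, and this is not the project's actual credit code.

```python
# Hypothetical numbers, chosen only to illustrate the effect.
CREDIT_PER_CPU_SECOND = 0.0035   # assumed benchmark-based claim rate
TYPICAL_MODEL_SECONDS = 1200     # assumed CPU time of an average model
AVG_CREDIT_PER_MODEL = TYPICAL_MODEL_SECONDS * CREDIT_PER_CPU_SECOND


def claimed_credit(model_seconds: list[float]) -> float:
    """Assumption: the claim is proportional to total CPU time spent."""
    return sum(model_seconds) * CREDIT_PER_CPU_SECOND


def granted_credit(model_seconds: list[float]) -> float:
    """Assumption: the grant is models completed times the average per model."""
    return len(model_seconds) * AVG_CREDIT_PER_MODEL


# Six ordinary models versus five ordinary models plus one 14 hour model.
typical_task = [TYPICAL_MODEL_SECONDS] * 6
slow_task = [TYPICAL_MODEL_SECONDS] * 5 + [14 * 3600]

typical_claimed = claimed_credit(typical_task)
typical_granted = granted_credit(typical_task)
slow_claimed = claimed_credit(slow_task)
slow_granted = granted_credit(slow_task)

print(f"typical: claimed {typical_claimed:.1f}, granted {typical_granted:.1f}")
print(f"slow:    claimed {slow_claimed:.1f}, granted {slow_granted:.1f}")
```

Under these assumptions both tasks are granted the same amount because both completed six models, but the task with one very slow model claims far more, which is the kind of gap between claimed and granted credit reported earlier in the thread.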

Hypermarkup
Joined: 3 Mar 06
Posts: 7
Credit: 112,275
RAC: 0
Message 54176 - Posted: 4 Jul 2008, 19:14:04 UTC


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 54179 - Posted: 4 Jul 2008, 21:32:09 UTC
Last modified: 4 Jul 2008, 21:45:59 UTC

This one last night. Edit// Ran about 7 mins.


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158009479


7/4/2008 5:40:10 PM|rosetta@home|Output file for task t434_1_NMRREF_1_t434_1_T0434_2QPWA_2JV0_hybridIGNORE_THE_REST_truncated_4104_4531_1 absent

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2590114
ERROR:: Exit from: .refold.cc line: 338

</stderr_txt>

pete.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 54180 - Posted: 5 Jul 2008, 3:26:47 UTC

This is starting to P... me off; it ran for 5 hrs 57 min. I'm getting more of these than with the old 5.96 app. Will credit be given for the work done?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=159600967

7/5/2008 12:50:11 PM|rosetta@home|Output file for task FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_473_1 absent


<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2143766
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 21418 cpu seconds
This process generated 9 decoys from 9 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>FRA_t453_CASP8_HYBRID_MANUAL_1_IGNORE_THE_RESTt451_1_axmin1_0001_4163_473_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

pete.


