Message boards : Number crunching : Problems with Rosetta version 5.59
Author | Message |
---|---|
rechenknecht123 Send message Joined: 15 Oct 06 Posts: 17 Credit: 2,022 RAC: 0 |
Running time 00:24:00, 13,327% done, time until ready 2281:34:26 |
rechenknecht123 Send message Joined: 15 Oct 06 Posts: 17 Credit: 2,022 RAC: 0 |
I don't know what happened. After 11 hours there should have been at least a few models made. Thanks, anders n. rechenknecht |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
This seems to be stuck when it is in the ab initio stage. As far as I can tell the strand is stuck; it's on model 24, step 11,000 and counting higher. But the RMSD is stuck on 13.xx, with xx being the variable numbers. The accepted energy is not really stuck, but does not register on the graph. It appears stuck at the top. The progress keeps counting in BOINC Manager though, so it's not stuck in an endless loop according to it. I will let it run its course, as it is now 6 hrs into the process. I have one more WU of the same type still to run. Is this normal? |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi greg_be ... I think the behavior is OK. Please leave it running! Fixing the scale on the graph of the energy is definitely on the "TO DO" list. |
rechenknecht123 Send message Joined: 15 Oct 06 Posts: 17 Credit: 2,022 RAC: 0 |
Hi Rhiju and anders n, this WU: Mo 9 Apr 00:02:04 2007|rosetta@home|Restarting task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom001__1638_96906_0 using rosetta version 559. It now hangs in the 13th hour at 98,745%. No checkpoint was written, so now it runs from 0% again, with 3104:34:17 h to go. This is the 5th run over Easter. What now: kill it for good, or run it a 6th time? rechenknecht |
Prime Lemur Send message Joined: 22 Feb 06 Posts: 1 Credit: 89,553 RAC: 0 |
Just a minor problem observed (may not be RAH's problem): I'm currently crunching 1fkaA_BOINC_ZEROWATSONCRICK_RNA_ABINITIO-1fkaA-chunk006__1659_76_1. I changed my RAH preferences (resource share) while another project was running. A new WU downloaded from RAH. When BOINC switched back to my first RAH WU, the Progress % reset to zero. (Thankfully) CPU Time did not change, but the To Completion time grew to almost double (to 03:58:45) what it was before switching projects/changing prefs/downloading new WU. Like I say, the CPU Time did not change, so there was no loss of work. I realise it could be a BOINC issue as much as RAH, but I thought I'd share this anyway. Prime Lemur |
anders n Send message Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Hi there, I tried to calculate how long a model should take on your Mac. It should take 3.5-5.5 hours with that kind of WU. If you decide to let it run, check the graphics from time to time to make sure the steps are still counting up. Anders n |
rechenknecht123 Send message Joined: 15 Oct 06 Posts: 17 Credit: 2,022 RAC: 0 |
Hello anders n, this is another WU, on my other disk partition. It runs under Mac OS 10.49 in the BOINC container 5.8.15. At 97,415% done, CPU time 6:17:00 h, time until ready 00:09:54 h, it has been standing there for 10 minutes. The graphics check is OK: stage symmetric relax, standing at model 1, step 69969, accepted energy: -311,4855. Now: Mo 9 Apr 11:02:32 2007|rosetta@home|Resuming task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_298_1 using rosetta version 559 |
rechenknecht123 Send message Joined: 15 Oct 06 Posts: 17 Credit: 2,022 RAC: 0 |
Might be a BOINC problem. This WU: Mo 9 Apr 11:02:32 2007|rosetta@home|Resuming task s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1638_298_1 using rosetta version 559. It ran up to 6:24:20 h running time at 97,455% done, remaining time 0:09:21 h, in step 1, model 69969. Then I stopped all WUs (seti, simap, r@h) in BOINC. Everything runs fine when I press start to continue the single WU, but when I close BOINC via the Quit button in the menu, this happens. rechenknecht |
(_KoDAk_) Send message Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
http://img45.imageshack.us/img45/2572/berru4.gif This is after restarting Rosetta. |
Purple Rabbit Send message Joined: 24 Sep 05 Posts: 28 Credit: 4,334,953 RAC: 963 |
I have had occasional problems with V5.59 on Linux Suse 10.2. Not every result had a problem. The following results (as a sample) died: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64664432 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64280490 More on this machine have died, but not on other machines. Host 395030 (the one referenced) is a Celeron 1.3 GHz CPU with 640 MB of RAM. I received a segmentation fault on these (and other) results. Everything was working OK before 5.59. My other computers are fine, both Windows and Linux. This seems strange to me. I'm running BOINC 5.8.17 (Windows) and BOINC 5.8.16 (Linux) on my machines. They ought to be current for now. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I just finished & returned this one, I don't know why it finished early and the numbers are odd. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=64512157 Over Success Done 22,383.00 46.47 52.78 cpu_run_time_pref: 36000 ====================================================== DONE :: 1 starting structures built 77 (nstruct) times This process generated 48 decoys from 48 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down |
Ty Send message Joined: 2 Mar 06 Posts: 2 Credit: 50,697 RAC: 0 |
BOINC 5.8.15 ~ Rosetta 5.59 ~ I have 4 active projects. Switch Apps 70 min. Workunit: 1lz1_BOINC_POSEDISULF_SAVE_ALL_OUT_1643_3382_0 Result id 72045046 "CPU Time" looks like it increments properly each second. "Progress" resets to 0 when restarting the unit, then looks like it increments. "To Completion" will increment for 3~4 seconds, then decrement by about 12~18 seconds, usually 15 seconds. In a 60 second period To Completion paused 11~16 times for 1 second while CPU Time incremented. Sometimes To Completion jumped back, sometimes it continued counting forward. The consistencies ~ CPU Time counted 5 seconds forward and To Completion jumped back 11~18 seconds (usually 15 sec). |
(_KoDAk_) Send message Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
What the *********** https://boinc.bakerlab.org/rosetta/result.php?resultid=71749731 https://boinc.bakerlab.org/rosetta/result.php?resultid=71748628 86,684.79 374.87 172.00 ?????????????????????? |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I had to do a Windows security update and restart my system. I was 2 hrs plus into the crunch when I exited. When I restarted the WU, the CPU time remained the same but the percent complete went back to 0. It's currently in model 7, where it last left off in models and steps, but the percent complete seems low at 1.43% and counting for just under 3 hrs into an 8 hr run. Is this correct? Or is there a problem with the percent complete stats? |
anders n Send message Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
There is a problem with the % to finish when you restart BOINC or when a WU is preempted and not set to stay in memory. You do not lose any more work now than you did before; only the % to finish is off. Anders n [edit] Also, the estimated time to finish goes high, then quickly comes down again. [/edit] |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
I am reporting this incident because (imperfect?) internet conditions seem to influence the WU crunching: (iBook G4 10.3.9) Result stderr out <core_client_version>5.8.17</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 2300066 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 # random seed: 2300066 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 ====================================================== DONE :: 1 starting structures built 7 (nstruct) times This process generated 6 decoys from 6 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> Local messages: ((The first 2 WUs completed successfully. Then I changed location and had to connect to a very unstable wireless network connection.)) Tir 10 Apr 17:00:40 2007|rosetta@home|Sending scheduler request: Requested by user Tir 10 Apr 17:00:40 2007|rosetta@home|Reporting 1 tasks Tir 10 Apr 17:04:46 2007|rosetta@home|Scheduler request failed: HTTP internal server error Tir 10 Apr 17:04:46 2007|rosetta@home|Deferring communication for 1 min 0 sec Tir 10 Apr 17:04:46 2007|rosetta@home|Reason: scheduler request failed Tir 10 Apr 17:04:51 2007|ralph@home|Sending scheduler request: To fetch work Tir 10 Apr 17:04:51 2007|ralph@home|Requesting 22067 seconds of new work Tir 10 Apr 17:10:03 2007||Project communication failed: attempting access to reference site Tir 10 Apr 17:10:03 2007|ralph@home|Scheduler request failed: a timeout was reached Tir 10 Apr 17:10:03 2007|ralph@home|Deferring communication for 1 hr 8 min 5 sec Tir 10 Apr 17:10:03 2007|ralph@home|Reason: scheduler request failed Tir 10 Apr 17:10:19 2007|rosetta@home|Sending scheduler request: Requested by user Tir 10 Apr 17:10:19 2007|rosetta@home|Reporting 1 tasks Tir 10 Apr 17:11:04 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file Tir 10 Apr 17:11:04 2007|rosetta@home|If this happens repeatedly you may need to reset the project. Tir 10 Apr 17:11:04 2007|rosetta@home|Restarting task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 using rosetta version 559 ((!!!!!!)) Tir 10 Apr 17:11:05 2007||Access to reference site succeeded - project servers may be temporarily down. Tir 10 Apr 17:12:50 2007|rosetta@home|Scheduler RPC succeeded [server version 509] Tir 10 Apr 17:12:50 2007|rosetta@home|Deferring communication for 4 min 2 sec Tir 10 Apr 17:12:50 2007|rosetta@home|Reason: requested by project Tir 10 Apr 18:18:09 2007|ralph@home|Sending scheduler request: To fetch work Tir 10 Apr 18:18:09 2007|ralph@home|Requesting 33299 seconds of new work Tir 10 Apr 18:18:55 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file Tir 10 Apr 18:18:55 2007|rosetta@home|If this happens repeatedly you may need to reset the project. Tir 10 Apr 18:18:55 2007|rosetta@home|Restarting task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 using rosetta version 559 ((!!!!!!)) 2007-04-10 18:23:07 [ralph@home] Scheduler request failed: HTTP internal server error 2007-04-10 18:23:07 [ralph@home] Deferring communication for 41 min 36 sec 2007-04-10 18:23:07 [ralph@home] Reason: scheduler request failed 2007-04-10 18:41:38 [---] Suspending network activity - user request ((After suspending network activity no more problems.)) -- R. A. Mostol |
Superfluence Send message Joined: 11 Apr 07 Posts: 2 Credit: 141 RAC: 0 |
I'm on a Mac 10.4.8 iBook and version 5.59 has A LOT of bugs! 1. Most annoying of all: the crunching starts at 0% every time BOINC is shut down and restarted. Some of the time is still there, but it's at 0%. The percentage runs faster from this point, BUT if it is restarted again (a second time, third time and so on) this newly crunched data is lost and the time is also reset to the same value as at the first restart. So I crunched the same s*** 10 times today :/ 2. When the Mac goes to sleep the project doesn't start right. So sometimes a restart of BOINC is needed and guess what: the data goes bye-bye... 3. Maybe this is a BOINC problem, but when Rosetta is crunching my iBook becomes very slow, especially the internet. PLZ help me, or fix the version, because this really "sucks monkey ass!" ;) |
Lada JNet Send message Joined: 25 Mar 07 Posts: 2 Credit: 1,518 RAC: 0 |
Hello, I'm not sure if you are aware of this problem, but from time to time the computation stops in the middle of the work. The CPU stops computing, nothing hangs, and restarting BOINC helps. However, I have some computers I am not constantly checking, and yesterday this glitch caused one of them to idle for ten hours before I found out and restarted BOINC... I'm afraid I cannot stay subscribed to the Rosetta project on unattended computers if this will occasionally happen. Some friends of mine observed the same problem and had to do the same. Well, we do not produce as significant an amount of credit as some other users, but I believe that losing any computation potential is a waste for Rosetta... If you would let me know when this is resolved, I'd like to go back to crunching more work for Rosetta on unattended computers. |
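
For anyone who wants to check their own saved messages for the restart pattern ramostol posted above, here is a minimal sketch in Python. It assumes the pipe-separated layout seen in those messages (timestamp|project|message) and simply skips anything else, including the bracketed layout at the end of that post. The function name `count_unfinished_exits` and the `sample` list are made up for illustration; this is not part of BOINC or Rosetta.

```python
from collections import Counter

# Marker text copied from the client messages quoted in this thread.
MARKER = "exited with zero status but no 'finished' file"


def count_unfinished_exits(lines):
    """Count, per task name, how often the marker message was logged.

    Assumes each relevant line looks like: timestamp|project|message.
    Lines in any other layout are ignored.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not the pipe-separated layout
        message = parts[2]
        if message.startswith("Task ") and message.endswith(MARKER):
            # Everything between "Task " and the marker is the task name.
            task = message[len("Task "):-len(MARKER)].strip()
            counts[task] += 1
    return counts


# Three lines taken from the messages posted above.
sample = [
    "Tir 10 Apr 17:11:04 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file",
    "Tir 10 Apr 17:11:04 2007|rosetta@home|If this happens repeatedly you may need to reset the project.",
    "Tir 10 Apr 18:18:55 2007|rosetta@home|Task 1kd5__BOINC_INCREASECYCLES10_NOCHAINBREAK_RNA_ABINITIO-1kd5_-_1661_1715_0 exited with zero status but no 'finished' file",
]

for task, n in count_unfinished_exits(sample).items():
    print(task, n)
```

A task that shows up more than once here is one that the client restarted repeatedly, which is the situation the "you may need to reset the project" message refers to.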
©2025 University of Washington
https://www.bakerlab.org