Minirosetta 3.73-3.78

Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,946,651
RAC: 13,227
Message 88042 - Posted: 9 Jan 2018, 18:21:07 UTC
Last modified: 9 Jan 2018, 18:25:28 UTC

Looks like something is wrong with the rb_01_08_.... series of WUs on minirosetta 3.78 (latest example: rb_01_08_77806_122534__t000__2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_541301_331_0).

I have seen some of these tasks consume huge amounts of RAM: they start in the standard 200-400 MB range but at some point can hoard up to 1400-1800 MB per task. Maybe even more, since it crashed after running out of RAM (8 GB RAM + 4 GB page/swap file on a 6-core CPU).
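As a rough check of those figures, here is a minimal sketch of the memory budget. It assumes one Rosetta task per core (six concurrent tasks), which the post does not state, and it ignores memory used by the operating system and other programs.

```python
# Figures taken from the post; one task per core is an assumption.
CORES = 6
RAM_MB = 8 * 1024          # 8 GB of physical RAM
SWAP_MB = 4 * 1024         # 4 GB page/swap file
NORMAL_MB = (200, 400)     # usual per-task range
BLOATED_MB = (1400, 1800)  # per-task range seen on the problem tasks


def total_mb(per_task_mb, tasks=CORES):
    """Total memory if every running task uses per_task_mb."""
    return per_task_mb * tasks


def fits(per_task_mb, budget_mb, tasks=CORES):
    """True if all tasks together stay within budget_mb."""
    return total_mb(per_task_mb, tasks) <= budget_mb


normal_peak = total_mb(NORMAL_MB[1])
bloated_low = total_mb(BLOATED_MB[0])
bloated_high = total_mb(BLOATED_MB[1])

print(f"normal peak:  {normal_peak} MB")
print(f"bloated low:  {bloated_low} MB (fits in RAM: {fits(BLOATED_MB[0], RAM_MB)})")
print(f"bloated high: {bloated_high} MB "
      f"(fits in RAM + swap: {fits(BLOATED_MB[1], RAM_MB + SWAP_MB)})")
```

Under these assumptions, even the low end of the bloated range exceeds physical RAM once all six tasks grow, and the high end leaves little of the combined RAM and swap for anything else.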

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88043 - Posted: 9 Jan 2018, 19:38:06 UTC - in response to Message 88042.  
Last modified: 9 Jan 2018, 20:03:21 UTC

I have seen some of these tasks consume huge amounts of RAM: they start in the standard 200-400 MB range but at some point can hoard up to 1400-1800 MB per task.

I have five on Windows 7 64-bit (i7-4771), and six on Ubuntu 16.04 (i7-3770) ranging from 1 to 19 hours with no problems yet, but I will keep an eye on them. If they blow up, it must be late in the run.

EDIT: By the way, I see you are using AMD CPUs. I got poor performance on my Ryzen 1700 on Rosetta, as I reported earlier. I wonder if they need to recompile it to fix this problem too?

Sid Celery
Joined: 11 Feb 08
Posts: 2122
Credit: 41,198,936
RAC: 9,975
Message 88084 - Posted: 17 Jan 2018, 3:57:06 UTC

Boinc 7.83 recent Mini-rosetta 3.78 error
nRoCM_01_P05055_group0_congq_SAVE_ALL_OUT_IGNORE_THE_REST_541727_1334_0
ERROR: ERROR: reading of AtomPair failed.

ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\ConstraintIO.cc line: 559
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 25,946,651
RAC: 13,227
Message 88088 - Posted: 17 Jan 2018, 7:44:06 UTC - in response to Message 88043.  
Last modified: 17 Jan 2018, 7:45:14 UTC

I have not seen such memory leaks lately either.

About AMD CPU performance: I do not know. I do not have any of the latest AMD CPUs (from the Ryzen family) yet.
I am still using older CPUs: one Phenom II X6 and two FX-8320 (Vishera/Piledriver), and I have not seen any performance issues with these older AMD CPUs in Rosetta. They are almost on par with the corresponding Intel CPUs (same generation/age and same core count).

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88130 - Posted: 20 Jan 2018, 22:48:04 UTC

I've recently begun having this issue on my Windows XP host with a Pentium 4 CPU. Previously there were no problems, though of course it was slow and earned relatively low credit, as expected. Using app 3.78 windows_intelx86. Workunit 872559942, Task 967645181.

01/20/2018 12:57:30 PM | Rosetta@home | Computation for task rb_01_17_79431_122764__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_542014_553_0 finished
01/20/2018 12:57:30 PM | Rosetta@home | Output file rb_01_17_79431_122764__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_542014_553_0_r1092951988_0 for task rb_01_17_79431_122764__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_542014_553_0 absent
01/20/2018 12:57:51 PM | | Suspending computation - CPU is busy
01/20/2018 12:58:01 PM | | Resuming computation
01/20/2018 12:58:17 PM | Rosetta@home | Sending scheduler request: To report completed tasks.
01/20/2018 12:58:17 PM | Rosetta@home | Reporting 1 completed tasks
01/20/2018 12:58:17 PM | Rosetta@home | Not requesting tasks: don't need (job cache full)
01/20/2018 12:58:26 PM | Rosetta@home | Scheduler request completed

Exit status -1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0121B939 read attempt to address 0x39D5626C

Same errors for Workunit 872559856, Task 967645069.
There is no point in my continuing to run Rosetta on this host if this situation continues, as it is able to run SETI@home without issue.
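The decimal and hexadecimal forms of the exit status above are the same 32-bit value read as signed and as unsigned. A minimal sketch of that conversion follows; the helper names are hypothetical and not part of BOINC or Rosetta.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


exit_status = -1073741819  # value reported in the post above
as_hex = f"0x{to_unsigned32(exit_status):08X}"
print(as_hex)
```

The same helper should apply to any other negative exit status shown on a BOINC task page.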

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88214 - Posted: 2 Feb 2018, 5:55:58 UTC

I'm having this error on my Windows XP host with a Pentium 4 CPU. Using app 3.78 windows_intelx86.
Name RhbaA_18619_a_trimmed_27_127len_cstwt_3.0_centerjumps_9mers_542830_9305_0
Workunit 875039666
Created 30 Jan 2018, 4:33:41 UTC
Sent 30 Jan 2018, 5:07:48 UTC
Report deadline 7 Feb 2018, 5:07:48 UTC
Received 2 Feb 2018, 3:33:42 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS
Computer ID 1580783


The following is repeated several times:

Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Using previously extracted minirosetta_database.
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Continuing computation from checkpoint: chk_S_00000001_Abrelax__rg_state ... success!

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88256 - Posted: 10 Feb 2018, 7:28:48 UTC

Re: My host Windows XP with Pentium 4 CPU (1 core HT). Issue with Rosetta Mini v3.78 windows_intelx86.

Name rb_02_04_80757_123466__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_544266_607_0
Workunit 876143103
Created 4 Feb 2018, 14:13:59 UTC
Sent 4 Feb 2018, 14:43:33 UTC
Report deadline 12 Feb 2018, 14:43:33 UTC
Received 9 Feb 2018, 10:24:36 UTC
Task: 971639876

Initialization complete.
Setting WU description ...
Using previously extracted minirosetta_database.
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_02_04_80757_123466__t000__1_C1_robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: semi-rotameric input invalid -- chi means differ. bb1: 120bb2: 110 original chi_1: -176.6 later chi_1: -177.8
ERROR: Exit from: C:\Users\boinc\src\Rosetta\main\source\src\core/pack/dunbrack/SemiRotamericSingleResidueDunbrackLibrary.tmpl.hh line: 1685
called boinc_finish

Aladar42
Joined: 14 Nov 17
Posts: 2
Credit: 67,864
RAC: 0
Message 88327 - Posted: 20 Feb 2018, 15:13:39 UTC

Couple of errors overnight:

https://boinc.bakerlab.org/workunit.php?wuid=878661631
https://boinc.bakerlab.org/workunit.php?wuid=878661900
https://boinc.bakerlab.org/workunit.php?wuid=878661716

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88789 - Posted: 2 May 2018, 6:18:54 UTC

Application version Rosetta Mini v3.78 windows_x86_64
Device: 1759960, Task: 993065168, and WU 894585192 .
Status: Error while computing.
Errors: Too many errors (may have bug). Too many total results.

Exit status -1 (0xFFFFFFFF) Unknown error code
Options::initialize()
Options::adding options()
Options::initialize() Check specs.
Options::initialize() End reached
ERROR: No values of the appropriate type specified for multi-valued option -jumps:random_sheets

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,601,998
RAC: 9,000
Message 88793 - Posted: 2 May 2018, 9:54:20 UTC

All "nas_final" WUs end after a few seconds with this error:

ERROR: unrecognized residue NAS
ERROR:: Exit from: ..\..\..\src\core\io\pose_from_sfr\PoseFromSFRBuilder.cc line: 1030
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88803 - Posted: 3 May 2018, 5:18:17 UTC

Application version Rosetta Mini v3.78 windows_x86_64
Device: 1759960, Task: 993062525, and WU 894544928.
Status: Error while computing.
Errors: Too many errors (may have bug). Too many total results.

Exit status: 1 (0x00000001) Unknown error code
<message> Incorrect function. (0x1) - exit code 1 (0x1)</message>

Starting work on structure: _00009
std::cerr: Exception was thrown:
chi angle must be between -180 and 180: -1.#IND
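The value `-1.#IND` is how older Microsoft C runtimes print an indeterminate NaN, so the chi angle here was most likely not a number at all rather than an angle slightly out of range. A minimal sketch of why NaN fails such a check follows; `chi_in_range` is a hypothetical stand-in, not Rosetta's actual code.

```python
import math


def chi_in_range(chi: float) -> bool:
    """Hypothetical range check: accept angles from -180 to 180 degrees."""
    return -180.0 <= chi <= 180.0


nan = float("nan")

print(chi_in_range(-177.8))  # an ordinary angle passes the check
print(chi_in_range(nan))     # every comparison with NaN is false, so this fails
print(math.isnan(nan))       # an explicit NaN test identifies the real problem
```

If that reading is right, the error points to an earlier calculation that produced NaN, not to the range check itself.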

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 88804 - Posted: 3 May 2018, 5:42:43 UTC
Last modified: 3 May 2018, 5:44:12 UTC

Application version Rosetta Mini v3.78 windows_x86_64
Device: 1759960, Task: 993062751, and WU 894756725.
Status: Aborted.

Exit status: 203 (0x000000CB) EXIT_ABORTED_VIA_GUI
BOINC:: Worker startup.
Starting watchdog...
Starting work on structure: _00001
Watchdog active.
Continuing computation from checkpoint: chk_S_00000001_Abrelax__rg_state ... success!

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75EC338D

Engaging BOINC Windows Runtime Debugger...

As the WU kept starting over and resetting its elapsed time to zero, at least 6 times, I aborted it. I figured that if it was still on structure 1 after 6 to 8 hours of number crunching, it wasn't going to finish well and was wasting CPU resources that could be used for other work.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,601,998
RAC: 9,000
Message 88819 - Posted: 5 May 2018, 8:59:34 UTC - in response to Message 88793.  

All "nas_final" WUs end after a few seconds with this error:

ERROR: unrecognized residue NAS
ERROR:: Exit from: ..\..\..\src\core\io\pose_from_sfr\PoseFromSFRBuilder.cc line: 1030
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


Again, all "nas_final" with the same error

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,601,998
RAC: 9,000
Message 89059 - Posted: 5 Jun 2018, 10:06:51 UTC

1003924662

ERROR: Unable to open atomset parameter file: minirosetta_database\chemical/atom_type_sets/fa_standard//

rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 23,031,114
RAC: 7,301
Message 89065 - Posted: 5 Jun 2018, 19:39:25 UTC - in response to Message 89059.  

1003924662

ERROR: Unable to open atomset parameter file: minirosetta_database\chemical/atom_type_sets/fa_standard//


Seems very strange that the path name has both forward and back slashes.
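To see how a mixed-separator path is interpreted under Windows and POSIX rules, here is a small sketch using Python's standard `ntpath` and `posixpath` modules. The position of the backslash (between `minirosetta_database` and `chemical`) is an assumption, and this shows path parsing only, not how Rosetta opens the file.

```python
import ntpath
import posixpath

# Path from the error message; the backslash position is an assumption.
reported = "minirosetta_database\\chemical/atom_type_sets/fa_standard//"

# Windows rules: both separators count, and duplicate or trailing ones collapse.
windows_view = ntpath.normpath(reported)

# POSIX rules: the backslash is an ordinary character inside the first name.
posix_view = posixpath.normpath(reported)

print(windows_view.split("\\"))
print(posix_view.split("/"))
```

Under Windows rules the path normalizes to four ordinary components, so the mixed slashes alone may be harmless there. Under POSIX rules the first component would be a single directory literally named `minirosetta_database\chemical`.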

Viking69
Joined: 3 Oct 05
Posts: 20
Credit: 6,804,326
RAC: 2,971
Message 89324 - Posted: 20 Jul 2018, 15:00:27 UTC

I am seeing this, but the WUs seem to be getting credit.

7/20/2018 7:57:02 AM | Rosetta@home | Task PH18070961_fold_SAVE_ALL_OUT_677251_949_0 exited with zero status but no 'finished' file
Hi all you enthusiastic crunchers.....

rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 23,031,114
RAC: 7,301
Message 89325 - Posted: 20 Jul 2018, 15:45:40 UTC - in response to Message 89324.  

I am seeing this, but the WUs seem to be getting credit.

7/20/2018 7:57:02 AM | Rosetta@home | Task PH18070961_fold_SAVE_ALL_OUT_677251_949_0 exited with zero status but no 'finished' file



Seems like a relatively harmless BOINC timing problem. From Snags on the Ralph board:
https://boinc.bakerlab.org/forum_thread.php?id=6376


The "exited with zero status but no 'finished' file" message occurs when some other task on your computer prevents the science app from communicating with BOINC. It is usually safe to ignore, as it has to happen 100 times to a task before the task gives up and errors out. On the BOINC forum, Jord (Ageless) makes the following suggestions:

Possible causes of the "Task exited with zero status but no 'finished' file" syndrome:

1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6 and 7) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.

2. Don't defrag your disk with BOINC on.

3. Don't run Scandisk with BOINC on.

4. Disable Drive Indexing.

5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.

6. Disable time synchronization in Windows XP/Vista. It is normally found under the clock (double-click it in the system tray), on the third tab (Internet in English); uncheck the sync option.

7. When you use BOINC's CPU throttling function, you can run into the too many exit(0)s error. The advice here is to disable the BOINC throttling (set it to 100%)** and reduce the number of CPUs/cores for BOINC to use*.
** Use at most 100.0 percent of CPU time.
* In BOINC 7.0, this is done through the option "On multiprocessors, use at most xxx% of the processors".
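To judge how close a task is to that limit, the message can be counted per task in a saved copy of the BOINC event log. A minimal sketch follows; the limit of 100 is the figure quoted above and has not been checked against BOINC's source, and the sample repeats one log line from this thread purely for illustration.

```python
import re
from collections import Counter

EXIT_LIMIT = 100  # figure quoted in the post above; an assumption here
PATTERN = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def count_premature_exits(log_lines):
    """Count 'no finished file' messages per task name."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines copied from this thread; the repetition is only for illustration.
exit_line = ("7/20/2018 7:57:02 AM | Rosetta@home | Task "
             "PH18070961_fold_SAVE_ALL_OUT_677251_949_0 "
             "exited with zero status but no 'finished' file")
other_line = ("01/20/2018 12:58:17 PM | Rosetta@home | "
              "Sending scheduler request: To report completed tasks.")
sample = [exit_line, other_line, exit_line]

counts = count_premature_exits(sample)
for task, seen in counts.items():
    print(f"{task}: {seen} of {EXIT_LIMIT} allowed exits")
```

A task whose count keeps climbing is a candidate for the checks in the list above, or for aborting before it fails with ERR_TOO_MANY_EXITS.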

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 89341 - Posted: 25 Jul 2018, 5:58:39 UTC

Application version Rosetta Mini v3.78 windows_intelx86
Device: 1759960, Task: 1015834050, and WU 915361344.
Status: Timed out - no response.
Outcome: No reply

As the WU kept starting over and resetting its elapsed time to zero, each time with the notation "Task exited with zero status but no 'finished' file," I reset the project for a change, rather than aborting the WU as is always suggested. Apparently the WU went into ghost-land until the timeout date/time came along. I guess aborting would have been the "better" option, since it would have reassigned the WU to another host, or would it? It appears another task was created for this WU, but it has not been sent as of this date and time. I'm wondering why it has not been reassigned yet, or whether a problem was found with this WU type.

Just an FYI at this point, as the task errored out due to timing out.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,601,998
RAC: 9,000
Message 89533 - Posted: 11 Sep 2018, 18:59:09 UTC

1027856016
1027855995

ERROR: ERROR: FragmentIO: could not open file 00001.200.9mers
ERROR:: Exit from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


After 2 years of this app, there are errors again...
Debug it, or abandon it and move everything to the 4.x version: that's the question.

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 89546 - Posted: 13 Sep 2018, 7:45:40 UTC

Application version Rosetta Mini v3.78 windows_intelx86
Device: 1759960, Task: 1026363230, and WU 924883597.
Status: Validate error.
Outcome: Validate error.
Errors: Too many total results, which was seen before my task completed.

I was 3 hours past deadline. However, the replacement task was "abandoned" and received less than 30 minutes after it was sent to host. No errors noted in my computation, per my review. The end of the Stderr output shows:
======================================================
DONE :: 1 starting structures 28379.1 cpu seconds
This process generated 39 decoys from 39 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

I'm just wondering why this was an error. Normally, I find that valid WUs returned after the deadline still get credit if they are completed before a valid replacement task.