Message boards : Number crunching : Rosetta@Home version 3.31
Author | Message |
---|---|
Polian Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. This task locked up and brought this system to its knees. When I checked the task properties, it was trying to use over 3GB of RAM, and this rig only has 4GB total. I had to stop BOINC and reboot the system, which I've never had to do before. It had tried to run for about 4 min before I caught it and killed it. This is just a small part of the result report; there's a lot more, so check it out yourself. rb_05_31_31656_62566__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_50638_248_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=464410649 No heartbeat from core client for 30 sec - exiting FILE_LOCK::unlock(): close failed.: Bad file descriptor *** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.31_x86_64-pc-linux-gnu: double free or corruption (fasttop): 0x0c19ee18 *** ======= Backtrace: ========= *** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.31_x86_64-pc-linux-gnu: double free or corruption (!prev): 0x0c18d9e8 *** ======= Backtrace: ========= *** glibc detected *** ../../projects/boinc.bakerlab.org_rosetta/minirosetta_3.31_x86_64-pc-linux-gnu: double free or corruption (fasttop): 0x0c18b820 *** SIGSEGV: segmentation violation ======= Backtrace: ========= [0xaa1dda1] [0xaa218bb] </stderr_txt> ]]> |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
See this thread. Probably related to core client 7.x as I have no problems with my 6.x clients. There must be more to it than that. Of my last 38 workunits under BOINC 7.0.25 with minirosetta 3.31, only one failed. Using 64-bit Windows (Vista on one computer, Windows 7 on the other). |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. This task was running when the other one went wild and got caught up in the crash, but it restarted OK when I rebooted. The result log has a lot of info from the crash as well, if it's any help. https://boinc.bakerlab.org/rosetta/result.php?resultid=509638850 rb_05_29_31621_62561__round2_t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_50618_2701_0 |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I received the second copy of this task. When I had a look at why it had been resent, it looked like it had the same excessive-memory error as the one I had yesterday. I wasn't going to let it crash my system again, so I aborted it. The first rig to try to run it was a 32-core Intel with only 16GB of RAM; I say "only" because of the number of cores it's running. rb_05_31_31656_62566__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_50638_57 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=464409998 You might want to look at why these are going wild with memory usage; it looks like they're from the same batch. |
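To put the "only 16GB" remark in numbers, here is a rough sketch of the arithmetic. It uses only figures quoted in this thread (a 32-core host with 16GB, a 4GB rig, and a task asking for over 3GB); the function names are made up for illustration, and the calculation ignores memory used by the operating system and other programs.

```python
def ram_per_core_gb(total_ram_gb: float, cores: int) -> float:
    """Average RAM per core if every core runs one task at the same time."""
    if cores < 1:
        raise ValueError("need at least one core")
    return total_ram_gb / cores


def tasks_that_fit(total_ram_gb: float, per_task_gb: float) -> int:
    """How many tasks of a given size fit in RAM, ignoring all other usage."""
    return int(total_ram_gb // per_task_gb)


# 32-core host with 16 GB: half a gigabyte per core.
print(ram_per_core_gb(16, 32))   # 0.5
# Tasks that want about 3 GB each:
print(tasks_that_fit(16, 3))     # 5 of the 32 cores could be fed
print(tasks_that_fit(4, 3))      # 1 on the 4 GB rig, with almost nothing left over
```

On these assumptions, a single 3GB task already takes most of a 4GB machine, which fits the lock-up described earlier.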
Wayne Miller Joined: 10 Feb 06 Posts: 5 Credit: 114,107 RAC: 0 |
I am still having problems with about 50% of my work units crashing since I've upgraded to the new BOINC, and even with Rosetta 3.31 version. Can someone point me in the right direction to find a solution? Or is there one? Thanks in advance. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. This one has failed twice with the same error; it ran for 14 sec on my rig. ab_11_29__optpps_T5311_optpps_03_09_35686_288404_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=464420142 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3 ERROR: core::util::switch_to_residue_type_set fails ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
Wiseman72 Joined: 10 Nov 06 Posts: 1 Credit: 2,914,852 RAC: 0 |
OK, 70% of my WUs are now failing since the upgrade to 7.0.25. Here is the stderr text for the failed ones; they are all identical. Does Rosetta have a problem with the new upgrade? We would like to know. <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> [2012- 6- 2 17:46:58:] :: BOINC:: Initializing ... ok. [2012- 6- 2 17:46:58:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/1kf5_homfrags.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 </stderr_txt> |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Another one of these, it erred after 15sec. ab_11_29__optpps_T5311_optpps_03_09_35686_290279_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=465028229 Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3 ERROR: core::util::switch_to_residue_type_set fails ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
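The reports so far show several distinct failure signatures rather than one common error. As a hedged sketch (not a tool from the project), a few lines of Python can sort pasted stderr text into those groups. The search strings are copied from the logs quoted in this thread; the labels and the function name are invented here for illustration.

```python
# Label -> substring to look for. The substrings come from the stderr
# excerpts posted in this thread; the labels are illustrative only.
SIGNATURES = {
    "lost contact with client": "No heartbeat from core client",
    "heap corruption (glibc)": "double free or corruption",
    "segmentation fault": "SIGSEGV: segmentation violation",
    "missing residue type": "can not find a residue type",
    "exit code 1 (Windows)": "Incorrect function. (0x1)",
}


def classify(stderr_text: str) -> list[str]:
    """Return the labels of every known signature found in the text."""
    return [label for label, needle in SIGNATURES.items() if needle in stderr_text]


sample = (
    "can not find a residue type that matches the residue "
    "PRO_p:pro_hydroxylated_case1at position 3 "
    "ERROR: core::util::switch_to_residue_type_set fails"
)
print(classify(sample))  # ['missing residue type']
```

Grouping reports this way makes it easier to see that the memory blow-ups, the residue-type failures, and the Windows exit code 1 failures may well have separate causes.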
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
I am still having problems with about 50% of my work units crashing since I've upgraded to the new BOINC, and even with Rosetta 3.31 version. Can someone point me in the right direction to find a solution? Or is there one? In my last 42 workunits, I've had only one fail, so there's probably more to your problems than you've mentioned. Are you able to go into the advanced section of BOINC Manager, Event Log, scroll to the top, and copy all the lines before the first BOINC project is listed? For example, here are my results from that for one of my computers: 6/3/2012 6:46:06 PM | | No config file found - using defaults 6/3/2012 6:46:06 PM | | Starting BOINC client version 7.0.25 for windows_x86_64 6/3/2012 6:46:06 PM | | log flags: file_xfer, sched_ops, task 6/3/2012 6:46:06 PM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 6/3/2012 6:46:06 PM | | Data directory: C:\ProgramData\BOINC 6/3/2012 6:46:06 PM | | Running under account Bobby 6/3/2012 6:46:06 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10] 6/3/2012 6:46:06 PM | | Processor: 6.00 MB cache 6/3/2012 6:46:06 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe 6/3/2012 6:46:06 PM | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00) 6/3/2012 6:46:06 PM | | Memory: 8.00 GB physical, 15.79 GB virtual 6/3/2012 6:46:06 PM | | Disk: 919.67 GB total, 526.13 GB free 6/3/2012 6:46:06 PM | | Local time is UTC -5 hours 6/3/2012 6:46:06 PM | | NVIDIA GPU 0: GeForce GTS 450 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 922MB available, 714 GFLOPS peak) 6/3/2012 6:46:06 PM | | OpenCL: NVIDIA GPU 0: GeForce GTS 450 (driver version 285.62, device version OpenCL 1.1 CUDA, 1024MB, 922MB available) Note that I have a rather high amount of memory: the maximum that the motherboard can hold. That seems important for some of the newer Rosetta@Home workunits that use over 1 GB of memory each. |
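When several people paste these startup lines, comparing them by eye gets tedious. The following is a minimal sketch, assuming the Event Log text looks like the excerpt above; the regular expressions are written against that excerpt only and may need adjusting for other client versions. The function name and dictionary keys are hypothetical.

```python
import re


def summarize_startup(log_text: str) -> dict:
    """Pull client version, CPU count/vendor and memory out of pasted Event Log text."""
    summary = {}

    m = re.search(r"Starting BOINC client version (\S+) for (\S+)", log_text)
    if m:
        summary["client_version"] = m.group(1)
        summary["platform"] = m.group(2)

    # Matches "Processor: 4 GenuineIntel ..." but not "Processor: 6.00 MB cache".
    m = re.search(r"Processor: (\d+) (\S+) ", log_text)
    if m:
        summary["logical_cpus"] = int(m.group(1))
        summary["cpu_vendor"] = m.group(2)

    m = re.search(r"Memory: ([\d.]+) GB physical, ([\d.]+) GB virtual", log_text)
    if m:
        summary["ram_gb"] = float(m.group(1))
        summary["virtual_gb"] = float(m.group(2))

    return summary


pasted = (
    "6/3/2012 6:46:06 PM | | Starting BOINC client version 7.0.25 for windows_x86_64 "
    "6/3/2012 6:46:06 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 "
    "@ 3.00GHz [Family 6 Model 23 Stepping 10] "
    "6/3/2012 6:46:06 PM | | Processor: 6.00 MB cache "
    "6/3/2012 6:46:06 PM | | Memory: 8.00 GB physical, 15.79 GB virtual"
)
print(summarize_startup(pasted))
```

Dividing `ram_gb` by `logical_cpus` then gives a quick memory-per-core figure to compare between hosts.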
Wayne Miller Joined: 10 Feb 06 Posts: 5 Credit: 114,107 RAC: 0 |
OK, here is mine. What am I supposed to be looking for? 6/4/2012 1:55:50 PM | | No config file found - using defaults 6/4/2012 1:55:50 PM | | Starting BOINC client version 7.0.25 for windows_x86_64 6/4/2012 1:55:50 PM | | log flags: file_xfer, sched_ops, task 6/4/2012 1:55:50 PM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 6/4/2012 1:55:50 PM | | Data directory: C:\ProgramData\BOINC 6/4/2012 1:55:50 PM | | Running under account Wayne 6/4/2012 1:55:50 PM | | Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 960T Processor [Family 16 Model 10 Stepping 0] 6/4/2012 1:55:50 PM | | Processor: 512.00 KB cache 6/4/2012 1:55:50 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow 6/4/2012 1:55:50 PM | | OS: Microsoft Windows Vista: Ultimate x64 Edition, Service Pack 2, (06.00.6002.00) 6/4/2012 1:55:50 PM | | Memory: 8.00 GB physical, 16.05 GB virtual 6/4/2012 1:55:50 PM | | Disk: 298.09 GB total, 150.02 GB free 6/4/2012 1:55:50 PM | | Local time is UTC -5 hours 6/4/2012 1:55:50 PM | | ATI GPU 0: Juniper (CAL version 1.4.1523, 1024MB, 991MB available, 2752 GFLOPS peak) 6/4/2012 1:55:50 PM | | OpenCL: ATI GPU 0: Juniper (driver version CAL 1.4.1523 (VM), device version OpenCL 1.1 AMD-APP-SDK-v2.5 (709.2), 1024MB, 991MB available) |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
In my recently rebuilt machine I run a similar system to you two but with less RAM & less disk space, but my last 81 tasks have completed just fine. 03/06/2012 06:01:43 | | No config file found - using defaults 03/06/2012 06:01:46 | | Starting BOINC client version 7.0.25 for windows_x86_64 03/06/2012 06:01:46 | | log flags: file_xfer, sched_ops, task 03/06/2012 06:01:46 | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 03/06/2012 06:01:46 | | Data directory: C:\ProgramData\BOINC 03/06/2012 06:01:46 | | Running under account Harry 03/06/2012 06:01:46 | | Processor: 4 AuthenticAMD AMD Phenom(tm) 9850 Quad-Core Processor [Family 16 Model 2 Stepping 3] 03/06/2012 06:01:46 | | Processor: 512.00 KB cache 03/06/2012 06:01:46 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs page1gb rdtscp 3dnowext 3dnow 03/06/2012 06:01:46 | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00) 03/06/2012 06:01:46 | | Memory: 4.00 GB physical, 8.17 GB virtual 03/06/2012 06:01:46 | | Disk: 76.33 GB total, 29.06 GB free 03/06/2012 06:01:46 | | Local time is UTC +1 hours 03/06/2012 06:01:46 | | ATI GPU 0: ATI Radeon HD 2600 (RV630) (CAL version 1.4.1385, 512MB, 480MB available, 348 GFLOPS peak) So it must be something else. Next on the list would be Computing Preferences in the BOINC Manager.
Processor Usage tab: While computer is on batteries: Ticked; While computer is in use: Ticked; Use GPU while computer is in use: Unticked; Only after computer has been idle for 0.00 minutes; While processor usage is less than 0 percent; Every day between hours of 00:00 and 00:00; Day of week override: none; Switch between applications every 250 minutes; On multiprocessor systems, use at most 100.00% of the processors; Use at most 100.00% CPU time. Network usage tab: All zero or blank or unticked except Minimum work buffer 0.10, Max additional work buffer 1.50 days. Disk & Memory usage tab: Use at most 10.00 GB disk space; Leave at least 0.10 GB disk space free; Use at most 50.00% of total disk space; Tasks checkpoint to disk at most every 60 seconds; Use at most 75.00% of page file (swap space). Memory: Use at most 60.00% when computer is in use; Use at most 90.00% when computer is idle; Leave applications in memory while suspended: Ticked. Exclusive applications tab: None. Any clues? |
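The memory percentages are easier to judge as absolute numbers. This is a back-of-the-envelope sketch, assuming the limit is simply a percentage of physical RAM; how the client actually enforces it is not described in this thread. The 4.00 GB, 60% and 90% figures come from the post above, the 1 GB task size from the earlier remark about newer workunits, and the function names are hypothetical.

```python
def ram_cap_gb(physical_gb: float, max_pct: float) -> float:
    """RAM that BOINC tasks may use, taken as a plain percentage of physical RAM."""
    return round(physical_gb * max_pct / 100, 2)


def tasks_within_cap(physical_gb: float, max_pct: float, per_task_gb: float) -> int:
    """Number of tasks of the given size that fit under the cap."""
    return int(ram_cap_gb(physical_gb, max_pct) // per_task_gb)


# 4.00 GB machine with the preferences listed above.
print(ram_cap_gb(4.00, 60))             # 2.4 GB while the computer is in use
print(ram_cap_gb(4.00, 90))             # 3.6 GB while it is idle
print(tasks_within_cap(4.00, 60, 1.0))  # 2 tasks of about 1 GB each
print(tasks_within_cap(4.00, 90, 1.0))  # 3 tasks of about 1 GB each
```

So on a quad-core with 4 GB, tasks of that size could not all run at once under these limits, even though nothing is failing.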
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
Ok here is mine. What am I supposed to be looking for? Nothing in particular. Using an AMD CPU instead of an Intel CPU might be significant, but otherwise we'll both have to wait for others to post similar information. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Just flicking over to my W7 Intel laptop, my BOINC Preferences are essentially the same as the Vista desktop posted above. 1 WU errored out in the last 50 WUs, so pretty much fine. Settings as follows: 03/06/2012 06:15:19 | | Starting BOINC client version 7.0.25 for windows_x86_64 03/06/2012 06:15:19 | | log flags: file_xfer, sched_ops, task 03/06/2012 06:15:19 | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 03/06/2012 06:15:19 | | Data directory: C:\ProgramData\BOINC 03/06/2012 06:15:19 | | Running under account Harry 03/06/2012 06:15:19 | | Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T6600 @ 2.20GHz [Family 6 Model 23 Stepping 10] 03/06/2012 06:15:19 | | Processor: 2.00 MB cache 03/06/2012 06:15:19 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm tm2 pbe 03/06/2012 06:15:19 | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 03/06/2012 06:15:19 | | Memory: 4.00 GB physical, 7.99 GB virtual 03/06/2012 06:15:19 | | Disk: 453.94 GB total, 268.85 GB free 03/06/2012 06:15:19 | | Local time is UTC +1 hours 03/06/2012 06:15:19 | | No usable GPUs found 03/06/2012 06:15:19 | | Config: don't use coprocessors The next usual factor is whether the machine is being overclocked or not, as the demands of Rosetta sometimes find the weak spots, so I read (no expert here). |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
In my recently rebuilt machine I run a similar system to you two but with less RAM & less disk space, but my last 81 tasks have completed just fine. My Computing Preferences settings that are different from yours for one computer: Processor usage: Use GPU while computer is in use: checked; Only after computer has been idle for 3.00 minutes; Switch between applications every 60.00 minutes; On multiprocessor systems, use at most 50.00% of the processors (has also been 75.00% for some workunits). Network usage: Minimum work buffer 0.00 days; Max additional work buffer 0.20 days. Disk and memory usage: Use at most 30.00 Gigabytes disk space; Leave at least 0.50 Gigabytes disk space free; Use at most 95.00% of page file (swap space). Memory usage: Use at most 40.00% when computer is in use; Use at most 40.00% when computer is idle. Summary: The use of an AMD CPU no longer looks especially significant. Nothing else looks very significant yet. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
From a second computer, also included in the 42 workunits I mentioned above: 5/30/2012 1:22:21 PM | | log flags: file_xfer, sched_ops, task 5/30/2012 1:22:21 PM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 5/30/2012 1:22:21 PM | | Data directory: C:\ProgramData\BOINC 5/30/2012 1:22:21 PM | | Running under account Bobby 5/30/2012 1:22:21 PM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [Family 6 Model 42 Stepping 7] 5/30/2012 1:22:21 PM | | Processor: 256.00 KB cache 5/30/2012 1:22:21 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 syscall nx lm vmx smx tm2 popcnt aes pbe 5/30/2012 1:22:21 PM | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00) 5/30/2012 1:22:21 PM | | Memory: 15.98 GB physical, 31.96 GB virtual 5/30/2012 1:22:21 PM | | Disk: 136.03 GB total, 61.46 GB free 5/30/2012 1:22:21 PM | | Local time is UTC -5 hours 5/30/2012 1:22:21 PM | | NVIDIA GPU 0: GeForce GT 440 (driver version 301.42, CUDA version 4.20, compute capability 2.1, 1536MB, 1414MB available, 342 GFLOPS peak) 5/30/2012 1:22:21 PM | | OpenCL: NVIDIA GPU 0: GeForce GT 440 (driver version 301.42, device version OpenCL 1.1 CUDA, 1536MB, 1414MB available) Processor usage: Use computer while computer is in use: checked; Only after computer has been idle for 0.00 minutes; Switch between applications every 60.00 minutes; On multiprocessor systems, use at most 88.00% of the processors. Network usage: Minimum work buffer 0.00 days; Max additional work buffer 0.20 days. Disk & memory usage: Use at most 50.00 Gigabytes; Leave at least 1.00 Gigabytes disk space free; Use at most 95.00% of page file (swap space). Memory usage: Use at most 80.00% when computer is in use; Use at most 80.00% when computer is idle. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Summary: The use of an AMD CPU no longer looks especially significant. Just looking up the thread, is the only problem you're having a slowdown when you go above 40% RAM? Otherwise you don't really have any problems? Or did I miss something? The one field you didn't mention is "Leave applications in memory while suspended: Ticked". Having this unticked has caused people problems in the past (for some reason). I don't know if this affects machines nowadays. The only other one that's possibly significantly different is "On multiprocessor systems, use at most 50.00% of the processors". I can't recall if it's this one or the CPU time field that was an issue once; probably the latter now I think about it, so probably OK. And the other factor for some was whether overclocking is in operation. Otherwise, I agree, the settings look pretty standard %^| |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
Summary: The use of an AMD CPU no longer looks especially significant. I have severe problems with slowdowns of non-BOINC 32-bit programs whenever I let my first computer go above about 40% memory used, as if it were trying to stuff all the 32-bit programs into just one 4 GB 32-bit memory space. No such problems seen on my second computer (the one with Windows 7). No similar problems with 64-bit programs. "Leave applications in memory while suspended: Ticked" is correct for both my computers also. "On multiprocessor systems, use at most 50.00% of the processors" is currently needed on my first computer to devote one CPU core to an antivirus scan expected to take a few more days, and still leave one CPU core available for what I do from the keyboard. It will go back to 75% when the antivirus scan finishes. My second computer has 4 CPU cores, hyperthreaded, so BOINC sees them as 8 CPU cores. Leaving just one of those 8 idle seems sufficient. I don't use overclocking, so I didn't mention it. |
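The percentages mentioned here translate into whole cores. Below is a small sketch of that conversion, assuming the client rounds down and always keeps at least one core; that rounding rule is an assumption, though it agrees with 88% of 8 leaving exactly one core idle. The function name is hypothetical.

```python
import math


def usable_cpus(logical_cpus: int, max_pct: float) -> int:
    """Cores BOINC may use for a given 'use at most N% of the processors' setting.

    Assumes rounding down with a floor of one core.
    """
    return max(1, math.floor(logical_cpus * max_pct / 100))


print(usable_cpus(4, 50))  # 2: two cores left for the antivirus scan and the user
print(usable_cpus(4, 75))  # 3: the setting once the scan finishes
print(usable_cpus(8, 88))  # 7: one of the 8 hyperthreaded cores stays idle
```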
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 9,368 |
Just looking up the thread, is the only problem you're having a slowdown when you go above 40% RAM? Otherwise you don't really have any problems? Or did I miss something? Ok, so it's not limited to BoincRosetta. That kind of makes sense, though it doesn't help you obviously... I don't use overclocking, so I didn't mention it. It's worth being explicit that it's not a factor when there aren't any clues elsewhere either - so it can be excluded as a possibility. Looks like we got nowhere fast. It may even be it's the quickest we've ever got nowhere... ;) |
Wayne Miller Joined: 10 Feb 06 Posts: 5 Credit: 114,107 RAC: 0 |
I'm not sure if this can be any help, but after my original post I looked in the computing preferences menu and found that the last category, the one that says "Use at most __% of CPU time", was at 50%. So I reset everything to the default settings. Now it's at 100%, and since then not one work unit has failed. Maybe it's just a coincidence. Anyway, I thought I would share. |
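For anyone who wants to check for the same throttle without clicking through the Manager, here is a hedged sketch. It assumes local preferences are stored as XML in a file such as `global_prefs_override.xml` in the BOINC data directory, with tags named `cpu_usage_limit` and `max_ncpus_pct`; verify both the file name and the tag names against your own installation before relying on them. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names are an assumption about the preferences file format.
THROTTLE_TAGS = ("cpu_usage_limit", "max_ncpus_pct")


def find_throttles(xml_text: str) -> dict:
    """Return every throttle-style setting that is below 100 percent."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in THROTTLE_TAGS:
        node = root.find(f".//{tag}")
        if node is not None and node.text and float(node.text) < 100.0:
            found[tag] = float(node.text)
    return found


# Illustrative content only, not copied from a real host.
example = """
<global_preferences>
  <max_ncpus_pct>100.000000</max_ncpus_pct>
  <cpu_usage_limit>50.000000</cpu_usage_limit>
</global_preferences>
"""
print(find_throttles(example))  # {'cpu_usage_limit': 50.0}
```

An empty result would mean neither setting is throttling the host, at least as far as that file is concerned.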