Message boards : Number crunching : Minirosetta v1.32 bug thread
Author | Message |
---|---|
JStateson Joined: 7 May 07 Posts: 15 Credit: 4,061,331 RAC: 0 |
Seeing segfaults periodically, but all tasks seem to finish OK. How is that? All results returned seem valid, except for some messages about not having a filter: https://boinc.bakerlab.org/rosetta/results.php?hostid=874157
I am running BOINC 6.2.15 amd64 on Ubuntu 8.04.1, and looking in the logs I see a bunch of segfaults:
jstateson@jyslinux3:/var/log$ grep -i segfault kern.log
kern.log:Aug 22 09:33:52 jyslinux3 kernel: [53889.135133] minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 22 17:41:43 jyslinux3 kernel: [83133.993543] minirosetta_1.3[7372]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.459737] minirosetta_1.3[20077]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.559667] minirosetta_1.3[19621]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
I switched to BOINC 6.2.15 from 5.15.45 after result time 21 Aug 2008 22:30:58, and those 4 segfaults occurred afterwards. Since all results were returned and all were valid, I am unsure what effect the segfaults had. |
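For anyone who wants to compare entries like these, here is a minimal Python sketch that pulls the fields out of the four kernel log lines quoted above. The regular expression and the helper names (`parse_segfaults`, `decode_error`) are illustrative assumptions, not part of BOINC or Rosetta. The error-code decoding follows the usual x86 page-fault convention (bit 0: page was present, bit 1: write access, bit 2: user mode), so error 6 would read as a user-mode write to a page that was not mapped. It does not say why the fault happened.

```python
import re
from collections import Counter

# The four kernel log lines quoted in the post above.
LOG = """\
kern.log:Aug 22 09:33:52 jyslinux3 kernel: [53889.135133] minirosetta_1.3[7041]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 22 17:41:43 jyslinux3 kernel: [83133.993543] minirosetta_1.3[7372]: segfault at ff3fbff8 rip 89bd380 rsp ff3fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.459737] minirosetta_1.3[20077]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
kern.log:Aug 23 18:24:55 jyslinux3 kernel: [75871.559667] minirosetta_1.3[19621]: segfault at ff5fbff8 rip 89bd380 rsp ff5fbed8 error 6
"""

# Hypothetical pattern for this particular kernel message layout.
SEGFAULT = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def parse_segfaults(text):
    """Return one dict of string fields per segfault line found in text."""
    return [m.groupdict() for m in SEGFAULT.finditer(text)]


def decode_error(code):
    """Split an x86 page-fault error code into its three low bits."""
    return {
        "page_present": bool(code & 1),
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


faults = parse_segfaults(LOG)
# How many faults share each instruction pointer.
rip_counts = Counter(f["rip"] for f in faults)
# Distance in bytes between the faulting address and the stack pointer.
gaps = [int(f["addr"], 16) - int(f["rsp"], 16) for f in faults]

for f, gap in zip(faults, gaps):
    print(f["pid"], f["rip"], hex(gap), decode_error(int(f["err"])))
```

All four entries share the same instruction pointer and the same distance between the faulting address and the stack pointer, which suggests the same code path failing each time rather than random corruption.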
Gray Handcock Joined: 26 Sep 05 Posts: 20 Credit: 2,018,415 RAC: 0 |
Hi, thus far what I have processed has been rated a success. I am vaguely curious as to why there is a discrepancy between claimed credit and granted credit. I mean this in a spirit of enquiry only, as I am not in a position to compete with anyone: I crunch here and there as and when I can... :(
187039158 170829845 24 Aug 2008 14:40:04 UTC 24 Aug 2008 17:24:10 UTC Over Success Done 6,949.45 16.82 7.76
186859960 170670039 23 Aug 2008 20:48:21 UTC 24 Aug 2008 16:18:07 UTC Over Success Done 7,501.63 18.15 6.32
186500690 170342018 22 Aug 2008 8:35:39 UTC 24 Aug 2008 13:43:03 UTC Over Success Done 9,431.64 22.82 8.09
185665231 169581780 18 Aug 2008 22:28:03 UTC 22 Aug 2008 20:22:17 UTC Over Success Done 6,588.55 15.76 5.32
185659319 169576641 18 Aug 2008 21:46:50 UTC 19 Aug 2008 9:17:56 UTC Over Success Done 7,270.64 17.39 6.88
I underline again: this is NOT a major issue - I am merely curious. I am running on a WinXP box with SP3 and BOINC Manager 6.3.10. Gray |
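To put a number on the discrepancy, the following Python sketch computes the granted-to-claimed ratio for the five results listed above. It assumes the last three figures in each row are CPU time in seconds, claimed credit and granted credit, in that order. The rows carry no headers, so that ordering is inferred from the poster's question, and the variable names are illustrative only.

```python
# (task ID, CPU time in seconds, claimed credit, granted credit)
# Column meaning is an assumption: the original rows have no headers.
RESULTS = [
    (187039158, 6949.45, 16.82, 7.76),
    (186859960, 7501.63, 18.15, 6.32),
    (186500690, 9431.64, 22.82, 8.09),
    (185665231, 6588.55, 15.76, 5.32),
    (185659319, 7270.64, 17.39, 6.88),
]

# Granted credit as a fraction of claimed credit, per task.
ratios = {task: granted / claimed for task, _cpu, claimed, granted in RESULTS}

total_claimed = sum(claimed for _task, _cpu, claimed, _granted in RESULTS)
total_granted = sum(granted for _task, _cpu, _claimed, granted in RESULTS)
overall = total_granted / total_claimed

for task, ratio in ratios.items():
    print(f"{task}: granted/claimed = {ratio:.2f}")
print(f"overall: {total_granted:.2f} / {total_claimed:.2f} = {overall:.2f}")
```

Under that assumption, each of these tasks was granted roughly 34% to 46% of what it claimed, and about 38% across all five.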
Storeytime Joined: 10 Oct 06 Posts: 2 Credit: 2,207,638 RAC: 0 |
Approximately 3 out of every 4 work units end in a computation error. It's gotten to the point where I only have a daily WU quota of 4. I have attached to other projects with no problems; it's getting annoying. Some computers lock up. |
Pilgrim57 Joined: 31 Jul 08 Posts: 3 Credit: 1,965,851 RAC: 0 |
I am getting this reported in finished 1.32 WUs: "needs psipred_ss2 to run filters". I also only get about 2/3 to 3/4 of claimed credit for all mini rosetta WUs! 186967182 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=186255800
https://boinc.bakerlab.org/rosetta/result.php?resultid=186219940
https://boinc.bakerlab.org/rosetta/result.php?resultid=186175420
https://boinc.bakerlab.org/rosetta/result.php?resultid=186135295
https://boinc.bakerlab.org/rosetta/result.php?resultid=186097857
https://boinc.bakerlab.org/rosetta/result.php?resultid=186051535
abinitio_only62_A_1tif__4438_3601_0
abinitio_only62_A_1louA_4438_3430_0
abinitio_only62_A_1fna__4438_2176_0
abinitio_homfrag_71_A_1mjcA_4443_534_0
abinitio_only62_A_1iibA_4434_6107_0
abinitio_only62_A_1louA_4434_4589_0
This shows the same thing as Pilgrim57's machine (needs psipred_ss2 to run filters): full run time, granted credit is above claimed like normal, and no other errors. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=186255800
Now the following show the same message:
Task ID 186291840 Name abinitio_homfrag_71_A_1tzaA_4443_1287_0
Task ID 186332312 Name abinitio_homfrag_71_A_2hx5A_4443_1477_0
Same result, full run time and perfect credit. |
Terrasapiens Joined: 25 Apr 08 Posts: 15 Credit: 368,919 RAC: 0 |
Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well. https://boinc.bakerlab.org/rosetta/results.php?userid=254884 |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well.
G'day Terrasapiens. Looking at your tasks, a couple of things I can think of:
1/ You have 1 GB of RAM shared between two cores, which could be a problem with some tasks; it has been known to cause problems for some people.
2/ Are you using onboard graphics? The error code looks like it could be a hardware conflict.
3/ If not no. 2, can you run tests on your RAM or try some other sticks?
pete |
Terrasapiens Joined: 25 Apr 08 Posts: 15 Credit: 368,919 RAC: 0 |
G'day Terrasapiens. Pete, as far as graphics go, I have an ATI All-In-Wonder series w/ 128 MB of RAM. I didn't seem to have any problem with rosetta WUs crashing until v1.28 came out. Since then only the mini rosetta ones crash. All else runs fine. I had 2 RAM issues a while back with the machine that caused it to randomly reboot. I found out that when I removed and reset the RAM cards everything worked fine. I've had no apps crashing on this machine other than the MR. Maybe sometime in the next couple of days I'll try removing the RAM again and then running the RAM test program I have to see if anything shows up. I don't have other sticks to try. Maybe some time this year the box will get a full gutting and upgrade, but that could be a while. Thanks |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Can someone take a look at my failed work units (link below) and tell me if there is any more debugging info I can provide to help fix the problems I've been having with the rosetta mini WUs? Is there a problem with the WUs themselves, or my machine or the mini app? Almost every one I've received in the past month or two has failed immediately. I don't think I've had any failures on the non-mini WUs nor on the Seti WUs I process as well. that link shows 'no access' |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Can someone take a look at my failed work units... Terrasapiens, I see you are running BOINC 6.2.18. Do you have any history running mini on older versions of BOINC? Are you using BOINC as your screensaver? Rosetta Moderator: Mod.Sense |
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
I have rather a serious problem that I'd appreciate some help or advice with. I've just upgraded my computer to an AMD Phenom 9850 Quad Core running Vista 64-bit, so I grabbed Boinc Mgr 6.2.18 for Windows 64-bit.
24/08/2008 18:23:59||Starting BOINC client version 6.2.18 for windows_x86_64
24/08/2008 18:23:59||log flags: task, file_xfer, sched_ops
24/08/2008 18:23:59||Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3
24/08/2008 18:23:59||Running as a daemon
24/08/2008 18:23:59||Data directory: C:\ProgramData\BOINC
24/08/2008 18:23:59||Running under account boinc_master
24/08/2008 18:23:59||Processor: 4 AuthenticAMD AMD Phenom(tm) 9850 Quad-Core Processor [AMD64 Family 16 Model 2 Stepping 3]
24/08/2008 18:23:59||Processor features: fpu tsc pae nx sse sse2 pni
24/08/2008 18:23:59||OS: Microsoft Windows Vista: Home Premium x64 Editon, Service Pack 1, (06.00.6001.00)
24/08/2008 18:23:59||Memory: 8.00 GB physical, 16.05 GB virtual
24/08/2008 18:23:59||Disk: 457.85 GB total, 378.16 GB free
24/08/2008 18:23:59||Local time is UTC +1 hours
24/08/2008 18:23:59||No coprocessors
24/08/2008 18:23:59|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 878134; location: home; project prefs: default
24/08/2008 18:23:59||General prefs: from rosetta@home (last modified 11-Feb-2008 13:52:58)
24/08/2008 18:23:59||Computer location: home
24/08/2008 18:23:59||General prefs: no separate prefs for home; using your defaults
24/08/2008 18:23:59||Reading preferences override file
24/08/2008 18:23:59||Preferences limit memory usage when active to 4914.23MB
24/08/2008 18:23:59||Preferences limit memory usage when idle to 7780.86MB
24/08/2008 18:23:59||Preferences limit disk usage to 4.66GB
My problem is that 79 of my last 151 WUs (last 4 days only) failed with a "Compute Error" as shown here. I noticed this a while back but saw the note above that old WUs were in the system, so I've left it for those to clear through. 
Examining those WUs that failed, there were zero failures for Rosetta 5.98 files, though I've actually had very few in the last week. All the failures have been with Mini 1.32 - maybe 40% success rate, 60% failure. The majority of these failures come after considerable processing - sometimes even 80% of the way through. Example of some errors: Task 187626469 and
27/08/2008 11:18:36|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:18:36|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:19:18|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:19:18|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:19:18|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 11:19:59|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:19:59|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:20:40|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 11:20:40|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 11:20:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
[And on repeatedly until finally...]
27/08/2008 12:25:39|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:25:39|rosetta@home|If this happens repeatedly you may need to reset the project. 
27/08/2008 12:25:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:20|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:26:20|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:26:20|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:39|rosetta@home|Task abinitio_homfrag_71_A_1prqA_4443_9291_0 exited with zero status but no 'finished' file
27/08/2008 12:26:39|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:27:01|rosetta@home|Computation for task abinitio_homfrag_71_A_2o7kA_4443_8695_0 finished
It's the same story with: Task 187600159 Task 187437714 Task 187389606
All of them come up with the same errors: "needs psipred_ss2 to run filters" and/or "Can't acquire lockfile - exiting".
Those that have been "Aborted by User" were where I noticed the progress on tasks had stopped ticking up, saw the same old error messages and just aborted to move on, hoping to have more luck with the next WU.
Thing is, some of these tasks have been taken on and completed by others:
Task 187600159 completed on an Intel Quad Core machine running XP SP3
Task 187389606 completed on an Intel Duo running XP SP2
I don't know whether this is an AMD issue, a Vista issue, a 64-bit issue or a MiniRosetta 1.32 issue - or whether it's some flakiness on my own machine. But I also don't know why all Rosetta 5.98 tasks run perfectly and 40% of Mini tasks go through OK. Any help or advice gratefully received. |
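A quick way to see which tasks are stuck in this exit-and-restart loop is to count the "exited with zero status but no 'finished' file" messages per task. The Python sketch below does that for a few of the log lines quoted in the post. It assumes the `timestamp|project|message` layout seen in those lines, and the function name `count_unfinished_exits` is made up for illustration. It is not a BOINC tool.

```python
from collections import Counter

# A short excerpt of the BOINC message log quoted in the post above.
LOG = """\
27/08/2008 12:25:39|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:25:39|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:25:40|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:20|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file
27/08/2008 12:26:20|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:26:20|rosetta@home|Restarting task abinitio_homfrag_71_A_2o7kA_4443_8695_0 using minirosetta version 132
27/08/2008 12:26:39|rosetta@home|Task abinitio_homfrag_71_A_1prqA_4443_9291_0 exited with zero status but no 'finished' file
27/08/2008 12:26:39|rosetta@home|If this happens repeatedly you may need to reset the project.
27/08/2008 12:27:01|rosetta@home|Computation for task abinitio_homfrag_71_A_2o7kA_4443_8695_0 finished
"""

MARKER = "exited with zero status but no 'finished' file"


def count_unfinished_exits(text):
    """Count, per task name, how often the client logged MARKER."""
    counts = Counter()
    for line in text.splitlines():
        # Assumed layout: timestamp|project|message
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2].strip()
        if message.startswith("Task ") and message.endswith(MARKER):
            task = message[len("Task "):-len(MARKER)].strip()
            counts[task] += 1
    return counts


counts = count_unfinished_exits(LOG)
for task, n in counts.most_common():
    print(f"{task}: {n} unfinished exit(s)")
```

Run against a full saved message log rather than this excerpt, a count like this would show whether the restarts pile up on a few work units or are spread across every Mini 1.32 task.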
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
27/08/2008 11:19:59|rosetta@home|Task abinitio_homfrag_71_A_2o7kA_4443_8695_0 exited with zero status but no 'finished' file One rather obvious note I forgot. While waiting for those old WUs to pass through, I did go through the process of 'resetting the project' with no discernible improvement. Edit again: I just looked at a Mini WU that ran successfully 187651948 and it also came up with several 'needs psipred_ss2 to run filters' errors but not enough to make it fail. |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
needs psipred_ss2 to run filters and/or
I have just run a batch of mini 1.32s on Ralph and none of them had the above message. |
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
needs psipred_ss2 to run filters Thanks for the feedback, Evan. Appreciated. I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference. But I looked at your last WU 187113607 at Rosetta and lo and behold you actually did get several "needs psipred_ss2 to run filters" errors, the same as me, but not enough to make the WU fall over - again like some of mine. In fact all your WUs show that error. Because our machines, OS and software are so very different, this seems to point to it being a Mini 1.32 bug after all. What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
needs psipred_ss2 to run filters
Keep an eye on this thread to see if anyone posts answers. I am getting a lot of those ss2 messages, but everything completes normally. |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
I took a look at your setup (hope you don't mind) because I'm concerned it's to do with me having an AMD or Vista or 64bit before it's a Mini 1.32 problem. I see you run an Intel P4 with XP SP2 under Boinc 5.10.20. Maybe that's a difference. Without any information from the backroom boys I would take a guess that they have been working at a fix. |
Sid Celery Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
What you don't show is my "Can't acquire lockfile - exiting" error. This seems to be the clue that screws up my WUs altogether then. I've seen it now. It looks like it's an annoying message but not fatal and not related to my new machine/hardware/OS. I'll rest easy on that one. It's the cause/meaning of the lockfile error then. Let's hope the backroom boys really are working on it and know where to point me (or themselves). I'd appreciate an acknowledgement, but I guess the issue of the day is the lack of any WUs right now. I'm crunching my last one (a 5.98 one) and it's gone to 6 hours with no sign of completing yet for some reason. Better that than nothing I guess. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,231 |
needs psipred_ss2 to run filters I also get a lot of those ss2 messages, but on an AMD processor using Vista SP1 and BOINC 5.10.45. I haven't seen any of the lockfile messages. I wonder if some of the current workunits are missing the ss2 file since they don't need filtering, but 1.32 doesn't have a way built in to just turn off any attempts to use this file. |
mitrichr Joined: 23 May 07 Posts: 44 Credit: 1,005,660 RAC: 0 |
The graphic is freezing, meaning, I assume, that the WU is a dead fish. {...} David- Yes, I know, that was what I had been doing and did not want to. >>RSM http://sciencespringe.wordpress.com http://facebook.com/sciencesprings |
©2024 University of Washington
https://www.bakerlab.org