Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Hello All! We're ready for a new update. I want to thank all of you who have helped over the last months to find and fix errors in minirosetta. A particular thank you goes to those who have donated their time over on RALPH and helped with their active feedback - we managed to find a number of difficult and rare bugs and put some new features into minirosetta that should help conserve computer time. Read about it here: http://ralph.bakerlab.org/forum_thread.php?id=431 and here: http://ralph.bakerlab.org/forum_thread.php?id=432 I should add that work over there will continue, but now supplemented with information from Rosetta@HOME. This update is highly focused on bugfixing and stability issues - we have virtually no new science in it, but we will hopefully now be able to run the science projects that have been in the pipeline waiting for BOINC - we're expecting quite a bit of work to go out very soon indeed. See Dr. Baker's journal for more details. Features/Fixes: 1.54 Release CHANGELOG
http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The link in the news item that should bring you to this thread is truncated. Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The news item also shows the year as 2008 (which is probably the last time you had enough coffee to be able to read the calendar!!). All these improvements are going to send TeraFLOPS much higher! Nice work Mike, and BakerLab. I can really see that you've come through for people here. Rosetta Moderator: Mod.Sense |
darengosse Jean-Paul Send message Joined: 9 Jun 06 Posts: 18 Credit: 259,459 RAC: 0 |
Hello, version 1.47 worked very well for me, with 151 workunits, 0 errors, and an average CPU time of 2.8 hours. I hope that the new version 1.54 will do as well... |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
If you are seeing errors with lock-file problems, try setting the CPU setting back to 100%. If you are running at 100% CPU preference and are getting this problem, I, for one, am very interested. If you are getting the failures and change the CPU setting to 100% and that cures the issue ... well, we are interested in THAT too ... I read about this in Einstein@Home and it seems to work for me ... YMMV ... |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,208,737 RAC: 2,882 |
I don't know about others but my Rosetta machines are running dry!!! The new minirosetta is stuck downloading at 89.25% and has been there for HOURS!!!! I have had to attach to a different project until it gets sorted out. So far all machines, exact same problem, one a dual core, one a single core. If you look at my computers, they are not hidden, any task that says "outcome unknown" is because the mini-rosetta download ain't happening!!!! Message in Boinc says 1/28/2009 4:45:03 AM|rosetta@home|Started download of minirosetta_1.54_windows_intelx86.exe 1/28/2009 4:50:11 AM|rosetta@home|Temporarily failed download of minirosetta_1.54_windows_intelx86.exe: HTTP error 1/28/2009 4:50:12 AM|rosetta@home|Started download of minirosetta_1.54_windows_intelx86.exe 1/28/2009 4:50:13 AM||Internet access OK - project servers may be temporarily down. 1/28/2009 4:50:34 AM||Project communication failed: attempting access to reference site 1/28/2009 4:50:34 AM|rosetta@home|Temporarily failed download of minirosetta_1.54_windows_intelx86.exe: connect() failed 1/28/2009 4:50:35 AM||Internet access OK - project servers may be temporarily down. etc, etc, etc, etc forever!!!! Another project now loves you!! |
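The log quoted in the post above can be used to measure how long each download attempt ran before it failed. The following is only a minimal sketch: the two log lines are copied from that post, while the helper names `parse_line` and `attempt_durations` are hypothetical and not part of BOINC, and the timestamp format is assumed from the lines shown.

```python
from datetime import datetime

# Two lines copied from the BOINC messages quoted in the post above.
LOG = """\
1/28/2009 4:45:03 AM|rosetta@home|Started download of minirosetta_1.54_windows_intelx86.exe
1/28/2009 4:50:11 AM|rosetta@home|Temporarily failed download of minirosetta_1.54_windows_intelx86.exe: HTTP error
"""

def parse_line(line):
    """Split one message line into (timestamp, project, text)."""
    stamp, project, text = line.split("|", 2)
    # Format assumed from the quoted lines: month/day/year, 12-hour clock.
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, text

def attempt_durations(log):
    """Yield the seconds between each 'Started download' and the next failure."""
    started = None
    for line in log.strip().splitlines():
        when, _project, text = parse_line(line)
        if text.startswith("Started download"):
            started = when
        elif text.startswith("Temporarily failed download") and started is not None:
            yield (when - started).total_seconds()
            started = None

durations = list(attempt_durations(LOG))
print(durations)
```

For the two lines shown, the first attempt ran for 308 seconds, a little over five minutes, before the HTTP error was reported.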
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
mikey, I haven't seen that problem myself, so it's not likely on the server side. At least not consistently. So it also seems odd that all of your servers are stopping... is it on the same file? You have to download the new programs, which are several MB. Are your machines all going through the same proxy or something that might be hung up on that particular file? Could I ask you to check the transfers tab and see exactly which file and how much of it you've downloaded? Your hosts seem to have pretty good bandwidth. Is anyone else seeing such a problem? Given the increase in project TFLOPS, I am thinking it is rare at best. Have you tried aborting the transfer on one of the machines? This may cause a couple of tasks to fail due to a download error, but BOINC will recover and eventually try to pull a fresh copy of the problem file. Rosetta Moderator: Mod.Sense |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
Paul, can you point me to the thing you read about lock-file problems on Einstein? 5% of jobs fail in this way consistently. I would love to know if the problem is us or the clients or what, and get it resolved. What do you mean by 100% CPU? If I can make this happen here on my machine I could learn more about what's going on. Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"What do you mean by 100% CPU?" It is in the "computing preferences" configured on the website for the venue of the machine. The setting is called "Use at most" at the bottom of the processor usage section. It can also be configured via the BOINC Manager for a specific host. Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,208,737 RAC: 2,882 |
"mikey, I haven't seen that problem myself, so it's not likely on the server side. At least not consistently. So it also seems odd that all of your servers are stopping... is it on the same file? You have to download the new programs, which are several MB. Are your machines all going through the same proxy or something that might be hung up on that particular file?" I do not use a proxy, just straight to the net. I use Comcast. "Could I ask you to check the transfers tab and see exactly which file and how much of it you've downloaded? Your hosts seem to have pretty good bandwidth." It is "minirosetta_1.54_windows_intelx86.exe". I have aborted, retried, everything, it is just stuck on ALL of my PCs. The one I am looking at right now has been trying for 11:51:02 and is going to retry in 03:34:34, and counting. "Is anyone else seeing such a problem? Given the increase in project TFLOPS, I am thinking it is rare at best." Yes I have, no luck, the file is stuck at 89.25, 89.26 or 89.27% depending on the PC. I am stuck at exactly 5.85 meg of 6.56 meg on all machines. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
mikey, if you would like to study this further, it would be helpful if you could create a cc_config.xml file and add the flag for debugging file transfers. You have to define the first three flags as shown, then just add a line for the: <file_xfer_debug>1</file_xfer_debug> If you already have such a file set up, do you have the <http_1_0> flag defined? Not asking you to do that one, just asking if you were already doing it. HTTP 1.0 does not have the ability to retry from the middle of the transfer (persistent file transfer is the term BOINC uses for this). It has to start over on each attempt. Then BOINC seems to only open the pipe for 5 minutes at a time. So if you can't get the whole thing in 5 minutes, it might never happen. Rosetta Moderator: Mod.Sense |
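For readers who want to try the suggestion above, here is a small sketch that builds such a file's contents. It rests on assumptions: the "first three flags" referred to in the post are not shown in this thread, so they are not reproduced, and the nesting used here (`<file_xfer_debug>` under `<log_flags>`, `<http_1_0>` under `<options>`, both inside `<cc_config>`) is my understanding of the BOINC client configuration layout and should be checked against the BOINC documentation. The function name `build_cc_config` is hypothetical.

```python
import xml.etree.ElementTree as ET

def build_cc_config(file_xfer_debug=True, http_1_0=None):
    """Return cc_config.xml text with the file-transfer debug flag set.

    http_1_0=None leaves the <http_1_0> option out entirely, matching the
    post's advice not to add it just for this test.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")  # assumed parent element
    ET.SubElement(log_flags, "file_xfer_debug").text = "1" if file_xfer_debug else "0"
    if http_1_0 is not None:
        options = ET.SubElement(root, "options")  # assumed parent element
        ET.SubElement(options, "http_1_0").text = "1" if http_1_0 else "0"
    return ET.tostring(root, encoding="unicode")

xml_text = build_cc_config()
print(xml_text)
```

The printed text would then be saved as cc_config.xml in the BOINC data directory, and the client would need to re-read its configuration (or be restarted) before the extra transfer messages appear in the log.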
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
I'm seeing a validate error on task 224245929 , workunit 204213187, Mac OS X 10.4.11. The task name is 1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_115354_1 : it ran twice as long as it was supposed to and I was the second person to get it. The original person to whom it was sent also got the same validate error: irritating after it took twice as long as it was supposed to. It seems to be one of these zinc-containing proteins that have a habit of doing this. <core_client_version>6.2.18</core_client_version> <![CDATA[ <stderr_txt> BOINC:: Initializing ... ok. [2009- 1-28 1:26:32:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Starting work on structure: _00001 Watchdog active. # cpu_run_time_pref: 14400 Starting work on structure: _00002 ====> called boinc_finish </stderr_txt> ]]> |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
mikey, I don't know why I didn't think of this before... Do a binary ftp of the file from here: boinc.bakerlab.org/download/minirosetta_1.54_windows_intelx86.exe and drop it into your Rosetta folder in your BOINC data directory under the projects subfolder. That will at least get you up and running, or on to the next file to see if similar problems continue. Rosetta Moderator: Mod.Sense |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
"It is "minirosetta_1.54_windows_intelx86.exe" I have aborted, retried, everything, it is just stuck on ALL of my pc's." My guess is that some anti-virus software either on your PC or at your ISP is blocking the download because the file is a .exe and it somehow looks suspicious to the anti-virus software. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2130 Credit: 41,424,155 RAC: 14,205 |
|
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I'm seeing a number of WUs ending at 99 models. They are ending normally, but they often take less than half my 12 hour (43,200 sec) preference. Some examples: https://boinc.bakerlab.org/rosetta/result.php?resultid=223957908 https://boinc.bakerlab.org/rosetta/result.php?resultid=223968996 https://boinc.bakerlab.org/rosetta/result.php?resultid=223981088 https://boinc.bakerlab.org/rosetta/result.php?resultid=223989528 https://boinc.bakerlab.org/rosetta/result.php?resultid=223997524 https://boinc.bakerlab.org/rosetta/result.php?resultid=224065056 |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
"I'm seeing a number of WUs ending at 99 models. They are ending normally, but they often take less than half my 12 hour (43,200 sec) preference." Sorry, I should have mentioned there is a new rule. Mini will not produce more than 99 models. It will finish gracefully and grant full credit. The reason for this is that I want to prevent your individual uploads from getting too large. In the future there will be a better way to do this, such as checking that the output file size has not reached some limit. It's just another safety hook that's been put in to prevent WUs from misbehaving. http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
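The rule described above explains why a work unit can end well short of the runtime preference. The toy simulation below is not minirosetta's actual code; `run_workunit`, its parameters, and the per-model times are made up for illustration. Only the 99-model cap and the 43,200-second preference come from the posts above.

```python
MAX_MODELS = 99  # cap stated in the post above

def run_workunit(time_pref_s, seconds_per_model, max_models=MAX_MODELS):
    """Simulate a work unit; return (models_made, elapsed_s, stop_reason)."""
    models, elapsed = 0, 0
    # Keep starting new models until either limit is reached.
    while models < max_models and elapsed < time_pref_s:
        elapsed += seconds_per_model  # pretend one model was computed
        models += 1
    reason = "model cap" if models >= max_models else "time preference"
    return models, elapsed, reason

# Fast models (hypothetical 200 s each): the cap ends the run early.
print(run_workunit(43_200, 200))
# Slow models (hypothetical 4,000 s each): the time preference ends the run.
print(run_workunit(43_200, 4_000))
```

With quick models the cap is hit after 19,800 simulated seconds, less than half of the 12-hour preference, which matches the behaviour reported in the quoted question.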
darengosse Jean-Paul Send message Joined: 9 Jun 06 Posts: 18 Credit: 259,459 RAC: 0 |
Hello, all. I had no problems receiving WUs for Minirosetta v1.54. I received 17 WUs, due by February 6, 2009 at 21:28:04 (France time). The first calculations should begin today (January 29), and if there are problems I will report them here. |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Paul, Two places to start are here and here ... I can also report that since I made that change I have been getting good results on Win XP systems ... I cannot see the high error rate I had in the past as the tasks have been purged ... It seemed to me to be a problem I had on XP and it was most severe on the i7 where there are more things going on ... |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,208,737 RAC: 2,882 |
mikey, I don't know why I didn't think of this before... No difference, I downloaded the file, dropped it into the directory C:BoincDataProjectsboinc.bakerlab.org_rosetta. It asked if I wanted to overwrite what was there, the new file was bigger, I said yes, exited Boinc, restarted Boinc, did an update of Rosetta and it is downloading the file AGAIN and is stuck at exactly the same place. I even turned off my ad-aware and anti-virus and no change. Change #1....Just after I first posted this I did a total shutdown and then a restart, no change, Boinc is still trying to download that same file! I am about ready to detach and then reattach and see if that fixes it! Change #2....I detached and then reattached. Started downloading all the Rosetta files again. I made sure everything Rosetta was gone out of the Boinc and all subdirectories, so downloading was not a surprise. It got thru all the files except the usual one, stopped at exactly the same place. I aborted the transfer and stopped Boinc. I then copied the file I had downloaded manually into the same place as before, and did another update of Rosetta. It asked for 36000 seconds of work and got none. It went into the communication deferred state and is now downloading the EXACT SAME FILE again!!!! It is also STUCK at the EXACT SAME PLACE!!!! I have no clue how to fix this and other projects are working just fine. Frustrating to say the least!!!!! |
©2024 University of Washington
https://www.bakerlab.org