Message boards : Number crunching : Rosetta 4.0+
Author | Message |
---|---|
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Quote: "I just found Rosetta 4.07 used 2,111,242,240 bytes (1.97 GIGAbytes) before my system crashed (i7-4770K, 8GB)." That is a bit high. The maximum I see for the last two weeks is 1179 MB, and usually less than 700 MB. However, I have 32 GB, so they might as well use it. My other projects (on LHC and GPUGrid Quantum Chemistry) often use more. |
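The memory figure quoted above can be checked with a little arithmetic: "1.97" matches binary gibibytes (2^30 bytes), whereas the same byte count is about 2.11 in decimal gigabytes (10^9 bytes). A minimal Python sketch of the conversion, using only the number from the post:

```python
# Convert the reported Rosetta 4.07 memory use (in bytes) to decimal GB and binary GiB.
BYTES_USED = 2_111_242_240  # figure quoted in the post


def to_gb(n_bytes: int) -> float:
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


def to_gib(n_bytes: int) -> float:
    """Binary gibibytes (2**30 bytes); the 1.97 figure in the post matches this unit."""
    return n_bytes / 2**30


print(f"{to_gb(BYTES_USED):.2f} GB (decimal)")   # 2.11 GB (decimal)
print(f"{to_gib(BYTES_USED):.2f} GiB (binary)")  # 1.97 GiB (binary)
```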
Darrell Joined: 28 Sep 06 Posts: 25 Credit: 51,934,631 RAC: 0 |
@Jim1348 Quote: "That is a bit high. The maximum I see for the last two weeks is 1179 MB, and usually less than 700 MB. However, I have 32 GB, so they might as well use it. My other projects (on LHC and GPUGrid Quantum Chemistry) often use more." And on my 32GB computers, I don't mind. I wasn't expecting version 4.07 to take so much, though. I would like to restrict these tasks to the "big boys", but there doesn't seem to be a way to select or deselect them. Perhaps just limit the tasks to a single CPU on the computers that have only 8GB. LHC often takes more, but its tasks run in a VM on my 32GB machines, so I can manage the load. |
Jesse Viviano Joined: 14 Jan 10 Posts: 42 Credit: 2,700,472 RAC: 0 |
Rosetta v4.07 crashed on work unit 886791424. The error message is below: Unhandled Exception Detected... |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,204,457 RAC: 10,266 |
Err... wut? Just had 27 Rosetta 4.07 tasks cancelled by the server - some that were already running. What happened?! Tuesday 03/04/2018 18:48:13 | Rosetta@home | Scheduler request completed |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,204,457 RAC: 10,266 |
And now 9 and 5 more on 2 other machines. Looks like all the DRH_curve_X jobs have been aborted... <sigh> |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
One of the features of the updated server code. On my Windows machine, they were taking excessive memory. So, that may be reason enough to cancel them. Rosetta Moderator: Mod.Sense |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,277,018 RAC: 1,575 |
I've had 8 tasks cancelled by server today, when they were, on the average, about half finished. Are you planning to issue any credit for the CPU time they used, or should I think of reducing the share of CPU time I offer to Rosetta@Home? Most of them used the 32-bit version of 4.07, even though they were running under 64-bit Windows and BOINC. The computer has 32 GB of memory, so the 64-bit version of 4.07 should have been able to give them all enough memory. |
aad Joined: 5 Jan 06 Posts: 9 Credit: 194,209,187 RAC: 276 |
Quote: "I've had 8 tasks cancelled by server today, when they were, on the average, about half finished. Are you planning to issue any credit for the CPU time they used, or should I think of reducing the share of CPU time I offer to Rosetta@Home?" Yeah. I had that too, just a few minutes ago... I sure hope this is not the new standard... |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Yes, I had four yesterday on an Ubuntu machine and one today on a Win7 machine that were aborted, a couple after 23+ hours. But it is better that they kill them if they know they are defective and save what time they can. Some more quality control would be better still. |
Sid Celery Joined: 11 Feb 08 Posts: 2122 Credit: 41,204,457 RAC: 10,266 |
Quote: "One of the features of the updated server code." Sure, I've seen tasks cancelled at the server end before, just never so many of a single task type. My question was more about what went wrong with the batch that they all had to be withdrawn. I hadn't noticed the memory issue, but I've got plenty to spare. My guess is the Rosetta guys were in such a rush to re-supply us with tasks after the recent outage that something major got missed in quality control, and it was only realised when tasks started coming back. My concern at the time was that we might've had another shortage, with new tasks needed to replace them in our buffers and none coming down, but that didn't happen. And I can see new DRH_curve_X tasks in my current buffer, so I'm guessing they got fixed and are now being fed back through to us. All's well that ends well. |
San-Fernando-Valley Joined: 16 Mar 16 Posts: 12 Credit: 143,229 RAC: 0 |
... just want to add my 2 cents worth: I started crunching today after a very long pause (many months). I noticed that after approx. 4:00 to 4:30 hours of elapsed time (4.07 and mini 3.78), the remaining time suddenly jumps from 4 hours to approx. 20 hours! This is happening on at least 3 of my rigs (Win7 64-bit). Does anybody have any idea whether this is OK or not? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Quote: "... just want to add my 2 cents worth:" It sounds like you may have changed your runtime preference on the R@h website. 24hrs is the highest value allowed. Beyond that, estimated time remaining is not a very reliable indicator. I would not presume any problem based solely on that. Rosetta Moderator: Mod.Sense |
James W Joined: 25 Nov 12 Posts: 130 Credit: 1,766,254 RAC: 0 |
I agree that estimated time remaining per BOINC is definitely an estimate, in general. On my Windows 7 machine, 4.07 will show approx. 5 hrs. estimated crunch time and mini 3.78 will show approx. 8 hrs. estimated crunch time. However, the 4.07 WUs and 3.78 WUs both end up taking approx. 8 hrs. to complete. |
San-Fernando-Valley Joined: 16 Mar 16 Posts: 12 Credit: 143,229 RAC: 0 |
... haven't changed anything ... All but one of the WUs finished without error. I find it sort of inappropriate to show an approx. runtime of 4 to 5 hours and then suddenly have the darn things increase to just under 24 hours!!! I'm sure you have stated somewhere how long these WUs run? |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,035,199 RAC: 7,488 |
Judging from the roughly 86k CPU seconds on your WUs, it appears your preference is set to 24 hours, even though the Rosetta command line says "-cpu_run_time 28800", i.e. 8 hours. Rosetta seems to be ignoring your 8-hour preference and running for the maximum 24 hours. Rosetta loops over multiple attempts until the time preference is reached, and then terminates when that loop finishes. It looks like you are getting the 24-hour credit, but something appears to be broken. Task 987505757 command: projects/boinc.bakerlab.org_rosetta/rosetta_4.07_windows_x86_64.exe @P49334_PF04281_0.6_domain1.bnd15.flags -in:file:boinc_wu_zip P49334_PF04281_0.6_domain1.bnd15.zip -nstruct 10000 -cpu_run_time 28800 -watchdog -boinc:max_nstruct 600 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 3903853 Starting watchdog... Watchdog active. ====================================================== DONE :: 1 starting structures 86140.1 cpu seconds This process generated 191 decoys from 191 attempts ====================================================== BOINC :: WS_max 3.94056e+08 |
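rjs5's description of how a task decides when to stop can be sketched as a CPU-time-budgeted loop. This is a hypothetical illustration, not Rosetta's actual source: the function name `run_decoys`, the fixed cost per decoy, and the way the time budget and the `max_nstruct` cap interact are all assumptions. The numbers come from the log above: 86140.1 CPU seconds for 191 decoys (about 451 s each) and `-cpu_run_time 28800` (8 hours).

```python
def run_decoys(cpu_run_time: float, seconds_per_decoy: float, max_nstruct: int) -> tuple[int, float]:
    """Hypothetical sketch: keep generating decoys until the CPU-time budget
    is used up or the decoy cap is reached. The budget is only checked
    between decoys, so the last decoy can overshoot it slightly."""
    elapsed = 0.0
    decoys = 0
    while elapsed < cpu_run_time and decoys < max_nstruct:
        elapsed += seconds_per_decoy  # stand-in for one complete decoy attempt
        decoys += 1
    return decoys, elapsed


per_decoy = 86140.1 / 191  # ~451 CPU seconds per decoy, taken from the log
decoys, elapsed = run_decoys(28800, per_decoy, 600)  # 8 h budget, max_nstruct 600
print(f"8 h budget: {decoys} decoys, {elapsed / 3600:.1f} h")  # 8 h budget: 64 decoys, 8.0 h
print(f"Observed:   191 decoys, {86140.1 / 3600:.1f} h")       # Observed:   191 decoys, 23.9 h
```

Under these assumptions an 8-hour setting would stop shortly after the 8-hour mark (about 64 decoys here), not near 24 hours, which is the discrepancy rjs5 is pointing at.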
Conan Joined: 11 Oct 05 Posts: 150 Credit: 4,193,109 RAC: 903 |
All 32-bit work units, on both this main project and the Ralph test project, fail with the "Can't Create Process" error. I have checked my anti-virus and it does not appear to be blocking them. Only the "Rosetta" tasks are failing; the "Rosetta Mini" tasks are running fine. Linux is OK. Conan |
Conan Joined: 11 Oct 05 Posts: 150 Credit: 4,193,109 RAC: 903 |
Quote: "All 32-bit work units, on both this main project and the Ralph test project, fail with the 'Can't Create Process' error." Any headway on this 32-bit issue? 64-bit on Linux runs fine for both work unit types. Both Rosetta and Ralph have the same issue with the Rosetta work units, while Rosetta Mini works fine on both projects. Could it be that the Rosetta work units are not valid Win32 applications, and should be recompiled as 32-bit, since they appear to be 64-bit instead? I am running 32-bit Windows XP on the computer that can't run the Rosetta work unit type. An answer would be nice, as it has been over 10 days now. Thanks, Conan |
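Conan's hypothesis, that the Windows "Rosetta" binary is actually 64-bit, can be checked locally by reading the Machine field of the executable's PE header; a 32-bit Windows system cannot start an x64 executable. The sketch below is a generic PE-header reader, not a BOINC or Rosetta tool; the script, its function name `pe_machine`, and pointing it at the executable in the BOINC projects folder are assumptions for illustration.

```python
import struct
import sys

# Machine-type codes from the PE/COFF file header (Microsoft PE format).
MACHINE_TYPES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64",
}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a Windows PE executable."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, f"unknown (0x{machine:04X})")


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_pe.py path\to\executable.exe
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", pe_machine(f.read()))
```

An "x64 (64-bit)" result for the binary a 32-bit XP host was sent would be consistent with the "Can't Create Process" failure, though on its own it would not prove that is the cause.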
m Joined: 2 May 09 Posts: 12 Credit: 7,839,281 RAC: 3,650 |
Could be this is your problem (and mine...) but don't hold your breath for a fix. |
Conan Joined: 11 Oct 05 Posts: 150 Credit: 4,193,109 RAC: 903 |
Quote: "Could be this is your problem (and mine...) but don't hold your breath for a fix." No, I won't be holding my breath, as that report, with the same error I have, is from 1 Feb 2018, so there has been no fix for this 32-bit issue for almost 3 months now. And none of the Ralph work units I have been given that also failed with the same error have been fixed either. What is the point of a test project when things are not tested? No point at all. Conan |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 1994 Credit: 9,602,547 RAC: 8,833 |
The screensaver crashes on all "Xy_00" WUs. |
©2024 University of Washington
https://www.bakerlab.org