work won't complete, says 10 min left only

Questions and Answers : Preferences : work won't complete, says 10 min left only

Luckydriver

Joined: 20 May 07
Posts: 1
Credit: 2,153
RAC: 0
Message 58733 - Posted: 11 Jan 2009, 16:17:44 UTC

I don't know if I changed something inadvertently, but my PC is only on for about 1 hour on weekdays; on weekends it is on for well over 8 hours.

I've had a unit that says its time to completion is 10 minutes, but it has been like that for about a week. Even today the PC has been on for 3 hours, but the unit won't finish.

I really have no clue what some of the settings are, but here they are (my PC is regularly down to only 500 MB of free disk space, if that matters):

Suspend work while computer is on battery power? (matters only for portable computers): yes
Suspend work while computer is in use? ('In use' means mouse/keyboard activity in last 1 minutes): no
Suspend work if no mouse/keyboard activity in last (needed to enter low-power mode on some computers; enforced by version 5.10.14+): --- minutes
Do work only between the hours of: (no restriction)
Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes'): yes
Switch between applications every (recommended: 60 minutes): 60 minutes
On multiprocessors, use at most: 1 processors
Use at most (enforced by version 5.6+): 50 percent of CPU time

Disk and memory usage
Use at most: 0.3 GB disk space
Leave at least (values smaller than 0.001 are ignored): 0.1 GB disk space free
Use at most: 10% of total disk space
Write to disk at most every: 60 seconds
Use at most: 10% of page file (swap space)
Use at most (enforced by version 5.8+): 50% of memory when computer is in use
Use at most (enforced by version 5.8+): 90% of memory when computer is not in use

Network usage
Computer is connected to the Internet about every (leave blank or 0 if always connected; BOINC will try to maintain at least this much work): 1 days
Maintain enough work for an additional (enforced by version 5.10+): 0.25 days
Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection): no
Disconnect when done? (matters only if you have a modem, ISDN or VPN connection): yes
Maximum download rate: no limit
Maximum upload rate: no limit
Use network only between the hours of (enforced by version 4.46+): (no restriction)
Skip image file verification? (check this ONLY if your Internet provider modifies image files, as UMTS does, for example; skipping verification reduces the security of BOINC): no

rzlatic

Joined: 20 Nov 07
Posts: 3
Credit: 327,897
RAC: 0
Message 94744 - Posted: 18 Apr 2020, 9:59:36 UTC
Last modified: 18 Apr 2020, 10:05:50 UTC

Same here.
Some (not all) workunits reach 98.xxx% after 9-10 hours, then slow to a crawl, reporting 10 minutes left while actually running for hours. A few hours later the computation time is 13 hours, and progress is still at 98%, only a small fraction higher than a few hours before.
If I restart the BOINC client at that point, the WU goes back to zero and starts over again. 13 hours wasted.

This has been happening for a week now, every day, with random workunits.

WU properties before boinc restart: https://imgur.com/WaDeO14
after boinc restart: https://imgur.com/b650PbA
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,751,275
RAC: 22,974
Message 94750 - Posted: 18 Apr 2020, 10:17:48 UTC - in response to Message 94744.  

See here.
May help.
Grant
Darwin NT
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,751,275
RAC: 22,974
Message 94752 - Posted: 18 Apr 2020, 10:23:56 UTC - in response to Message 58733.  

The default Target CPU time for a Task is 8 hours.
You could try changing it to 2 hours, but even so, with the times your system is on, a lot of your work will likely time out, as the deadlines for most Tasks are 3 days.
Anything you do on the computer will take time from Rosetta processing, so at 1 hour a day, even with a 2-hour target CPU time, there's a good chance of not making the deadlines.
And some Tasks over the last few days have been running way over their Target times.

Unfortunately, I really don't think Rosetta is the project for you with such limited time available for work to be done.
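
The arithmetic behind that advice can be sketched roughly as below. This is only an illustration: the function names are made up for this example, the 0.5 throttle is taken from the "50 percent of CPU time" setting in the first post, and it assumes CPU time accumulates linearly with powered-on time (real tasks lose further time to other activity on the machine).

```python
def cpu_hours_available(hours_on_per_day, cpu_fraction):
    """CPU hours delivered: powered-on hours summed over the days, scaled by the throttle."""
    return sum(hours_on_per_day) * cpu_fraction


def can_meet_deadline(target_cpu_hours, hours_on_per_day, cpu_fraction=1.0):
    """True if the machine can accumulate the target CPU time within the deadline window."""
    return cpu_hours_available(hours_on_per_day, cpu_fraction) >= target_cpu_hours


# Hypothetical 3-day deadline windows built from the numbers in the first post.
weekdays = [1, 1, 1]        # three weekdays, about 1 hour on each
over_weekend = [1, 8, 8]    # a Friday plus a weekend at 8 hours a day

for label, window in (("weekdays", weekdays), ("over weekend", over_weekend)):
    for target in (8, 2):
        ok = can_meet_deadline(target, window, cpu_fraction=0.5)
        print(f"{label}: {target}h target -> {'fits' if ok else 'misses deadline'}")
```

Under these assumptions, three weekdays at 50% CPU give only 1.5 CPU hours, which is short of even a 2-hour target, while a window that includes the weekend has room to spare.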
Grant
Darwin NT
Sid Celery

Joined: 11 Feb 08
Posts: 2121
Credit: 41,179,074
RAC: 11,480
Message 94762 - Posted: 18 Apr 2020, 12:52:40 UTC - in response to Message 94752.  

For Rosetta tasks, with a 3-day deadline and long task runs, it is better not to run Rosetta at all during the week and to run it only on the weekend, with a reduced runtime that ensures completion and a buffer of zero backup tasks.
If that doesn't work, then while it's nice that people want to contribute, with so little time available another project might be the right choice.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 94783 - Posted: 18 Apr 2020, 17:17:37 UTC

Trying to get back on topic for this thread... the estimated runtimes are challenging. If you estimate that it will take 30 minutes longer to run, and then run another 20 minutes, and now can see that you have 20 more minutes of work ahead, what do you show for the estimated runtime?

The models that Rosetta is crunching are each unique. They have their own pathway in their search. They may hit dead-ends rapidly, and quickly go back to the last fork in the road. They may hit dead-ends that take a long time to reach. Most generally, models complete in 10 or 20 minutes. It depends on the type of search protocol, the specific proteins and fragments being studied, and the specifics of the unique model being computed. Because of this, when the estimate gets down to 10 minutes, the WU starts to avoid burning up those last ten minutes. It does this by fudging the runtime estimate. Bottom line is that the WU really does not know how much longer it will take to complete the model it is on. It is still running as fast as it can. It is just trying to stage that last 10 minutes of estimated time remaining so that it shows you that it is progressing, without going over the time.

So, the last 10 minutes of estimated time shown is always the least accurate portion of the WU runtime. Some people see 10 minutes remaining and it "suddenly jumps to 100%", and they panic that something went wrong. Others see 10 minutes remaining, and it can stay that way for over an hour, and they panic that something went wrong.
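
To picture why the display can sit at "10 minutes left" while the percentage only creeps upward, here is a toy model. It is not Rosetta's actual code: the 10-minute floor, the function names, and the formula are assumptions chosen only to reproduce the kind of behaviour described in this thread.

```python
FLOOR_MINUTES = 10.0  # assumed: the remaining-time display is never allowed below this


def displayed_remaining(elapsed_min, target_min, floor_min=FLOOR_MINUTES):
    """Remaining-time estimate that is held at a floor while a model is still running."""
    return max(target_min - elapsed_min, floor_min)


def displayed_progress(elapsed_min, target_min, floor_min=FLOOR_MINUTES):
    """Fraction done implied by the elapsed time and the (floored) remaining estimate."""
    remaining = displayed_remaining(elapsed_min, target_min, floor_min)
    return elapsed_min / (elapsed_min + remaining)


target = 8 * 60  # an 8-hour target runtime, in minutes
for hours in (4, 9, 13):
    elapsed = hours * 60
    print(f"{hours:>2}h elapsed: {displayed_remaining(elapsed, target):.0f} min left, "
          f"{displayed_progress(elapsed, target):.2%} done")
```

In this toy model, a task that overruns its target keeps showing 10 minutes remaining, and its percentage rises ever more slowly toward 100% without reaching it until the current model actually finishes.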

Bottom line is that the WU will manage itself. The work your machine is doing has never been done before with the specific combination of protein, model start point, protocols used to study it, etc. Don't take the runtime estimates too seriously. This is where the "watchdog" will take care of you. There are cases where WUs encounter particularly long models. It may have been cruising along doing a model every 20 minutes, and then run into one that takes an hour. If that one happens to be the one that is started with an estimated runtime of 25 minutes, something has to give. 4 hours was deemed the point at which it is not worth continuing with the model, sort of calling the whole model a dead-end, and cutting it off. This is why the watchdog waits until the WU has run for 4 hours (of CPU time, not time on your clock) past the runtime preference the user has defined (on the website in the Rosetta preferences).
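
As a rough sketch of that cutoff rule (the function and constant names are hypothetical, and the real watchdog may check additional conditions):

```python
WATCHDOG_GRACE_HOURS = 4.0  # from the post: 4 hours of CPU time past the runtime preference


def watchdog_should_stop(cpu_hours_used, runtime_preference_hours,
                         grace_hours=WATCHDOG_GRACE_HOURS):
    """True once CPU time (not wall-clock time) reaches the preference plus the grace period."""
    return cpu_hours_used >= runtime_preference_hours + grace_hours


# With the default 8-hour preference the cutoff falls at 12 CPU hours.
print(watchdog_should_stop(11.0, 8.0))   # still inside the grace period
print(watchdog_should_stop(12.5, 8.0))   # past the cutoff
```

Note that the comparison is against CPU time, so a task on a throttled or busy machine can show far more wall-clock time than this before the cutoff applies.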

Quoting myself:
Most generally, models complete in 10 or 20 minutes

Yes, I know, "that's not true, I've seen some where they all take an hour". Yes, I know. Hence the next sentence starting with "It depends..." Some proteins are larger than others. Some WUs are actually studying two proteins and how they might interact with each other. And there are always new search protocols being developed, and (personal observation) it is not uncommon for new protocols to find themselves wandering down paths that do not prove to be fruitful. Once the Project Team studies the results and revises the protocol, they typically get much greater consistency in the runtime for each model.
Rosetta Moderator: Mod.Sense

©2024 University of Washington
https://www.bakerlab.org