Problems and Technical Issues with Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109161 - Posted: 24 Apr 2024, 0:49:35 UTC - in response to Message 109157.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.

Good for you.
I have a lot of "cancelled by the server"

So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task, and before it returns its result or times out it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy.

It's a consequence of the whole site being down.
It seems like, once the site came back up, it timed out tasks that had missed deadline straight away and reissued them, but the host didn't re-poll the server until its timer ran out (could've been 4-5hrs after the site came back up) to report they were completed.
It's just unfortunate.
ID: 109161 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109160 - Posted: 24 Apr 2024, 0:42:12 UTC - in response to Message 109154.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.

Good for you.
I have a lot of "cancelled by the server"

It was a very early call - in the first few hours.
In the end I had 13 cancelled by the server, none of which had started to run.
However, I did have 1 task that ran to completion, but came up with a validate error because the previous host reported it late.
On balance, it could've been a lot worse on a 16-thread machine. I'll live with it.
ID: 109160 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109159 - Posted: 24 Apr 2024, 0:31:42 UTC - in response to Message 109152.  

Which means the tipping point <is> a Rosetta issue after all
Nope.
If it took 12 hours to do 12 hours of work, there'd be no problem.
But because it takes 24hrs to do 12hrs work, it's a big problem. Even set to 8 hours, it would still take 16hrs, so still Panic mode.
Make it so the CPU isn't over committed, and all would be OK.

His problem is purely down to it taking 2-4 times longer than it should to process any BOINC Tasks, because the CPU is also processing Folding work on the same CPU cores/threads- X cores/threads trying to process X+1 or X+2 applications (that are using 100% of each core/thread) is always going to cause problems. As long as the number of applications being run is equal to or less than the number of cores/threads, all will be well- so limiting the number of cores/threads available to BOINC so Folding has as many as it needs (1, 2, 4 or however many that is) would sort it out.

Of course if "Use at most xx % of CPU time" is anything other than 100%, that would just add to the issues of doing Folding on the same cores/threads as BOINC work (as would any GPU Tasks from BOINC projects that require 1 core/thread per GPU Task being run to support it, and that too can be resolved, although it's more difficult than it needs to be).

I don't completely agree.
It's not just that a 12hr task (which Rosetta reports to Boinc as 8hrs for the bulk of its run) is taking 20-32hrs to complete; it's that the next tasks in the cache also show as 8hrs to Boinc but will likewise take 20-32hrs.
Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks.
14-22hrs less processing time to complete tasks will make a huge difference to whether Panic mode arises. I'd guess <all> the difference.
This is only an issue if the cache is set above a day. It can be made to work by ensuring Rosetta tasks only run for the time Adrian already thought they were set to (8hrs rather than 12hrs they actually run for).
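The 7-11hr figure can be checked with a rough sketch. It assumes elapsed time scales linearly with the target CPU time under the same contention; the function name is made up for illustration, and only the 20-32hr, 12hr and 8hr figures come from the post.

```python
def saving_range(elapsed_low, elapsed_high, old_target, new_target):
    """Hours saved per task when the target CPU time drops from
    old_target to new_target, assuming the slowdown stays the same."""
    scale = new_target / old_target          # 8 / 12 in the case above
    return elapsed_low * (1 - scale), elapsed_high * (1 - scale)

# 12hr tasks observed taking 20-32hrs; target changed to 8hrs
low, high = saving_range(20, 32, old_target=12, new_target=8)
print(f"saving per task: {low:.1f} to {high:.1f} hrs")
```

Under that assumption the saving comes out at roughly 7 to 11 hours per task, once for the running tasks and once more for each round of cached tasks.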

It can certainly be solved your way, but that gets a bit fiddly imo and doesn't resolve the confusion Rosetta runtime introduces.
I'd prefer my solution if I were him too, especially if RAM and disk space don't come into the equation.
And we already know Adrian didn't like your solution, so let's see what he thinks of my alternative. It's entirely up to him.
ID: 109159 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 173
Credit: 5,671,457
RAC: 3,329
Message 109157 - Posted: 23 Apr 2024, 17:36:12 UTC - in response to Message 109154.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.


Good for you.
I have a lot of "cancelled by the server"


So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task, and before it returns its result or times out it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy.
ID: 109157 · Rating: 0

rilian
Joined: 16 Jun 07
Posts: 12
Credit: 1,441,973
RAC: 7,011
Message 109155 - Posted: 23 Apr 2024, 16:01:27 UTC - in response to Message 109154.  
Last modified: 23 Apr 2024, 16:01:53 UTC


I have a lot of "cancelled by the server"


same here, i lost a hundred tasks :(
i crunch for Ukraine. Join our team forums about Rosetta@home
ID: 109155 · Rating: 0

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1864
Credit: 8,184,675
RAC: 7,690
Message 109154 - Posted: 23 Apr 2024, 12:39:09 UTC - in response to Message 109145.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.


Good for you.
I have a lot of "cancelled by the server"
ID: 109154 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1225
Credit: 13,856,762
RAC: 2,065
Message 109153 - Posted: 23 Apr 2024, 12:29:01 UTC

I remember from when I was running Folding@Home also that Folding@Home expects to use entire CPU cores, not just the available threads in that CPU core. An easy way to handle this is to start the Folding@Home program at least a full minute before starting any BOINC program.
ID: 109153 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,653,889
RAC: 13,460
Message 109152 - Posted: 23 Apr 2024, 7:23:57 UTC - in response to Message 109149.  

Which means the tipping point <is> a Rosetta issue after all
Nope.
If it took 12 hours to do 12 hours of work, there'd be no problem.
But because it takes 24hrs to do 12hrs work, it's a big problem. Even set to 8 hours, it would still take 16hrs, so still Panic mode.
Make it so the CPU isn't over committed, and all would be OK.
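To put numbers on that, here is a rough sketch assuming the simplest possible model: every process wants 100% of a thread and the OS shares time evenly. The function name and the 8-thread/16-process figures are hypothetical; only the 12hr and 8hr targets and the doubling come from the post.

```python
def wall_clock_hours(cpu_hours, busy_processes, threads):
    """Estimate elapsed time for a task that needs cpu_hours of CPU time
    when busy_processes compete for threads hardware threads."""
    # No slowdown while there are enough threads to go round.
    slowdown = max(1.0, busy_processes / threads)
    return cpu_hours * slowdown

print(wall_clock_hours(12, 16, 8))  # over-committed 2:1, 12hr target
print(wall_clock_hours(8, 16, 8))   # over-committed 2:1, 8hr target
print(wall_clock_hours(12, 8, 8))   # not over-committed
```

With twice as many busy processes as threads, the 12hr target takes 24hrs and the 8hr target takes 16hrs; with no over-commitment the 12hr task takes 12hrs.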

His problem is purely down to it taking 2-4 times longer than it should to process any BOINC Tasks, because the CPU is also processing Folding work on the same CPU cores/threads- X cores/threads trying to process X+1 or X+2 applications (that are using 100% of each core/thread) is always going to cause problems. As long as the number of applications being run is equal to or less than the number of cores/threads, all will be well- so limiting the number of cores/threads available to BOINC so Folding has as many as it needs (1, 2, 4 or however many that is) would sort it out.
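A small sketch of working out what to enter for BOINC's "Use at most xx % of the CPUs" computing preference so that Folding keeps its threads. The function name is hypothetical, and exactly how BOINC rounds the percentage back to a thread count is an assumption here, so check the number of running tasks after changing the setting.

```python
import math

def boinc_cpu_percent(total_threads, threads_for_folding):
    """Percentage of CPUs to allow BOINC, leaving threads_for_folding free."""
    if not 0 <= threads_for_folding < total_threads:
        raise ValueError("Folding must leave at least one thread for BOINC")
    free = total_threads - threads_for_folding
    # Round up so the percentage, applied back to total_threads,
    # still covers all of the free threads.
    return math.ceil(100 * free / total_threads)

print(boinc_cpu_percent(16, 4))  # 16 threads, 4 kept for Folding
print(boinc_cpu_percent(8, 1))   # 8 threads, 1 kept for Folding
```

For a 16-thread machine with 4 threads kept for Folding this gives 75%, and for an 8-thread machine with 1 thread kept it gives 88%.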

Of course if "Use at most xx % of CPU time" is anything other than 100%, that would just add to the issues of doing Folding on the same cores/threads as BOINC work (as would any GPU Tasks from BOINC projects that require 1 core/thread per GPU Task being run to support it, and that too can be resolved, although it's more difficult than it needs to be).
Grant
Darwin NT
ID: 109152 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1484
Credit: 14,653,889
RAC: 13,460
Message 109150 - Posted: 23 Apr 2024, 6:59:02 UTC - in response to Message 109134.  
Last modified: 23 Apr 2024, 7:13:33 UTC

I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask.
No you don't, you just ignore what you are told as to how to fix it. Twice now.


The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.
And since it is occurring with a BOINC project- actually all of your BOINC projects, not just this one- might it be somewhat obvious that those of us here doing BOINC work might have some idea of what is actually going on? While those at Folding- unless they do BOINC work as well- won't have the slightest idea of what you are complaining to them about?
And if you had paid the slightest bit of attention to the responses i gave you previously, you would understand what the problem is & how to fix it.


I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.
The third option would be to fix it so that both can co-exist, hundreds (if not a thousand +) of other people have done so.

Twice i have told you what the problem is. Twice i have told you how you could fix the problem.
And twice you have ignored completely everything you were told that would allow you to sort it out.

So, yeah, not doing either of them is probably the best option for you.
Grant
Darwin NT
ID: 109150 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109149 - Posted: 22 Apr 2024, 22:30:45 UTC - in response to Message 109133.  

This will solve the <entirety> of your problems, while (coincidentally) massively increasing your contribution to <all> the projects you run within your preferred settings.
He's running Folding at home as well.
He asked about this issue 4 years ago and ignored all advice as to how to fix it. He asked about it again about a month or so back, and once again refused to take any advice on how to resolve it.
He just likes to whinge about things he's not prepared to do anything about- ie Look in Task manager to see exactly what processes are using CPU time, and then limiting the number of cores/threads BOINC can use so it's not impacted by those used by Folding.

Ta, I didn't pick up the Folding@home involvement - that explains part of it.
But I do think it's the Target CPU time aspect that's tipping things over the edge - partly because I'm set to 12hr tasks too and it is a bit weird, but I run a small enough cache and only two projects so it never affects me.
The part about using 12hr tasks not changing the projected runtime of the rest of the Rosetta cache is something that was brought in... about 4 years ago.
I'm pretty sure that's not a coincidence.

Which means the tipping point <is> a Rosetta issue after all
ID: 109149 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109148 - Posted: 22 Apr 2024, 22:01:50 UTC - in response to Message 109134.  

I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask. The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.

I understand the issue better now.
Irrespective of fault, it seems like all Boinc projects are having problems coexisting with Folding@home, as evidenced by Grant's comment:
And the same issue is happening with your other projects.
Asteroids: 2hrs Runtime, 1hr CPU time.
SIdock: 31.5hrs Runtime, 27hrs 40min CPU time.
Denis: 3hr 40min Runtime, 1hr CPU time.

This is only a problem to the extent that tasks miss deadlines, which is what you have, so check these settings in turn:

1. Ensure "at most xx% of CPU time" is set to 100% for all Boinc tasks.
2. You may think Rosetta is set to 8hrs, but every one of your tasks runs to 43,200secs of CPU time, which is 12hrs. So go to your account online and within rosetta@home preferences reaffirm "Target CPU run time" is set explicitly to 8hrs and Update Preferences. Rosetta certainly thinks it's set to 12hrs.
3. If you still can't complete tasks within the deadline, reduce your cache size in Boinc, so you don't download too many tasks to complete before deadline.

I think Point 2 will be the solution.
Rosetta is a bit weird when non-default runtimes are set.
They're all downloaded as if they're 8hrs tasks, but when it gets close to that runtime only then does it adjust the remaining time up toward 12hrs.
So they run 4hrs longer, but the rest of the cache is still projected as if each task will take 8hrs.
It's been programmed to <not> adjust based on past history. I forget why but I do recall when it was deliberately made to work that way.
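A rough sketch of why that matters for cache size, assuming the client fills the cache using the 8hr estimate while each task really needs 12hrs (43,200secs) of CPU time. The function name, the 2-day cache and the 2x slowdown are hypothetical values for illustration only.

```python
def real_queue_days(cache_days, estimated_hours, actual_cpu_hours, slowdown=1.0):
    """Days needed to clear a cache that was filled on the basis of
    estimated_hours per task, when each task really takes
    actual_cpu_hours of CPU time and runs slowdown times slower."""
    return cache_days * (actual_cpu_hours / estimated_hours) * slowdown

cpu_hours = 43_200 / 3600   # CPU seconds reported per task, in hours
print(cpu_hours)
print(real_queue_days(2, estimated_hours=8, actual_cpu_hours=cpu_hours))
print(real_queue_days(2, estimated_hours=8, actual_cpu_hours=cpu_hours, slowdown=2.0))
```

So a nominal 2-day cache is really 3 days of work at 12hrs per task, and 6 days if contention doubles the runtime. Compare that figure against the task deadline when choosing a cache size.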
ID: 109148 · Rating: 0

kotenok2000
Joined: 22 Feb 11
Posts: 230
Credit: 324,699
RAC: 1,764
Message 109147 - Posted: 22 Apr 2024, 21:37:26 UTC - in response to Message 109145.  

I hope they will still get points when script runs, because each task would still generate unique data.
ID: 109147 · Rating: 0

Bryn Mawr
Joined: 26 Dec 18
Posts: 375
Credit: 10,724,519
RAC: 5,070
Message 109146 - Posted: 22 Apr 2024, 21:37:19 UTC - in response to Message 109134.  
Last modified: 22 Apr 2024, 21:39:13 UTC

I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask. The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.


This is not a problem with Boinc. It is not a problem with Folding. It is a problem with your configuration which is preventing the two projects, which have no way of knowing the other is there, from working together.

You have been given the configuration changes required, all that’s needed now is for you to try them.
ID: 109146 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109145 - Posted: 22 Apr 2024, 21:32:57 UTC - in response to Message 109143.  

Project seems to be alive now


And I report some wus over the deadline.
I don't know if they will consider these as valid

I would hope so. The site hasn't been up to reallocate the tasks to anyone else, so we should get credit even having missed deadline as we're the first to return the task.

That said, I wonder if any of the new tasks being issued might be resends... might be worth checking within an hour or two.

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.

Which tells me that any tasks we're downloading may fall into the same bracket.

My understanding is that if the tasks haven't been started, they will be cancelled by the server. <But>
If they have been started they'll run to completion and <be awarded no credit> because the previous user has already been awarded them!

This is all getting a bit ugly.

Check the status of all your tasks online, people...
ID: 109145 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109144 - Posted: 22 Apr 2024, 21:06:20 UTC - in response to Message 109142.  
Last modified: 22 Apr 2024, 21:08:02 UTC

Computing status
Work

Tasks ready to send 78401
Tasks in progress 112944
Workunits waiting for validation 21612
Workunits waiting for assimilation 5284
Workunits waiting for file deletion 0
Tasks waiting for file deletion 0
Transitioner backlog (hours) 0.00
Users

With credit 1379962
With recent credit 15782
Registered in past 24 hours 2
Computers

With credit 4530477
With recent credit 32018
Registered in past 24 hours 7
Current GigaFLOPS 132514

Hopefully this is just a reflection of all the tasks being returned now the site is back up.
Most of my tasks aren't validated yet, but some are.

I have got one weird one though:

Completed, can't validate

Edit: It's just changed while I was typing this to validated and credited.
That's weird - never seen that happen before.

Sounds like we need to be a little patient as everything gets processed.

Edit 2: I refreshed the page again and loads more of my tasks were Validated.
We should definitely give the site a chance to blast through them
ID: 109144 · Rating: 0

Sid Celery
Joined: 11 Feb 08
Posts: 1990
Credit: 38,522,839
RAC: 15,277
Message 109143 - Posted: 22 Apr 2024, 21:02:29 UTC - in response to Message 109136.  

Project seems to be alive now


And i report some wus over the deadline.
I don't know if they will consider these as valid

I would hope so. The site hasn't been up to reallocate the tasks to anyone else, so we should get credit even having missed deadline as we're the first to return the task.

That said, I wonder if any of the new tasks being issued might be resends... might be worth checking within an hour or two.
ID: 109143 · Rating: 0

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,650,416
RAC: 2,237
Message 109142 - Posted: 22 Apr 2024, 20:17:46 UTC

Computing status
Work

Tasks ready to send 78401
Tasks in progress 112944
Workunits waiting for validation 21612
Workunits waiting for assimilation 5284
Workunits waiting for file deletion 0
Tasks waiting for file deletion 0
Transitioner backlog (hours) 0.00
Users

With credit 1379962
With recent credit 15782
Registered in past 24 hours 2
Computers

With credit 4530477
With recent credit 32018
Registered in past 24 hours 7
Current GigaFLOPS 132514
ID: 109142 · Rating: 0

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,650,416
RAC: 2,237
Message 109140 - Posted: 22 Apr 2024, 19:42:59 UTC

Some are validated now
ID: 109140 · Rating: 0

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,650,416
RAC: 2,237
Message 109139 - Posted: 22 Apr 2024, 19:41:52 UTC

I notice a very low transfer rate, ~25KB/s,

and "waiting for validation" on the results just uploaded ...
ID: 109139 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,709,409
RAC: 1,933
Message 109138 - Posted: 22 Apr 2024, 18:47:53 UTC

And again, not a word from the project.
Wonder what it was this time.
ID: 109138 · Rating: 0


©2024 University of Washington
https://www.bakerlab.org