Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
Check the home page. As of 1900 UTC today, there are over 10 million queued tasks. I meant Work in Progress, sorry. Currently 198k and just 1k unsent. |
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
It's currently 4:30pm on Friday afternoon, their time. Well, a very strange thing has happened. boinc-process server is back running and the validation backlog is fully cleared already. We've barely reached Thursday morning UK time. I'm slightly disconcerted now tbh. Everything I thought I knew has been turned upside down. I shouldn't complain, but it's part of my character by now... |
mzelden Joined: 8 Sep 20 Posts: 3 Credit: 3,718,885 RAC: 5,306 |
Where are those error messages being shown? |
mzelden Joined: 8 Sep 20 Posts: 3 Credit: 3,718,885 RAC: 5,306 |
Where are those error messages being shown? I get this all the time too; I typically see it in the morning, after not working on the computer. I will say it started after a MBO change. I've tried uninstalling and reinstalling BOINC a couple of times and it hasn't helped. I haven't tried resetting the Rosetta project. If I just leave the computer idle for the screen saver timeout time, it seems to launch normally and I see the graphics. BTW, I still had SETI@HOME in my config, but deleted that completely and it didn't help. Rosetta@home is the only project. If I completely delete it and try to add it, I don't lose any credits, correct? I know I have to update first... |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
On the main page. Things are still very much broken- the amount of work In progress continues to fall, as does the number of Tasks processed per 24hrs. The Grafana graphs make it even easier to see what's going on. Whatever was broken appears to be working again- work In progress is climbing, the amount of work being returned each 24hrs is also increasing again. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
"On the main page. Things are still very much broken- the amount of work In progress continues to fall, as does the number of Tasks processed per 24hrs." Excellent link, thanks. I used to use a different page but forgot what it was. I can see the WiP figure has risen slightly, but it still seems 20-30k below what it was some weeks ago and there seem to be so few unsent most of the time. Still, I'm running down WCG and SiDock on all my PCs and managing to maximise my small cache on each too, if more slowly than usual. |
Random Joined: 10 Mar 24 Posts: 8 Credit: 33,684 RAC: 776 |
How long does it take some of you super computers to complete 1 wu? Just curious. |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
"How long does it take some of you super computers to complete 1 wu? Just curious." Rosetta Tasks are different to other projects- they are set to run for a certain amount of time- 8 hours for Rosetta 4.20 Tasks, and 3 hours for Rosetta Beta Tasks (although there are some batches where they are set to run for 8 hours). So a more powerful computer won't do more Tasks per day than a less powerful one- however the more powerful computer will do more processing in that time, and so it gets more Credit for each Task for doing the extra work. Grant Darwin NT |
Joined: 1 Dec 05 Posts: 2058 Credit: 10,888,763 RAC: 12,298 |
"How long does it take some of you super computers to complete 1 wu? Just curious." As Grant said, the difference is the number of decoys (named "Model" in the screensaver) you can run in a WU. An old Core i3, for example, makes 50 decoys in 4 hrs in a WU. The same WU, in 4 hrs, on a new Threadripper makes 400 decoys (the numbers are arbitrary, only an example). Your credit is based on the number of decoys. |
Tom M Joined: 20 Jun 17 Posts: 116 Credit: 25,259,723 RAC: 86,988 |
Apparently the graphics on version 6.06 beta are no longer working. Previously they worked, or at least a previous version of the beta app did. On Linux. Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
10 million jobs Queued up, but 0 ready to send. Once again, the number of Tasks In progress is dropping away as work is returned but people can't get new work to do. More project server system issues to be sorted out. Been an issue for about 8 hours now. There have been issues on and off with the Assimilators not keeping up with the load for the last couple of days- they're on the bwsrv1 host. Also on the bwsrv1 host is the Feeder (responsible for supplying new work)- so it looks like that's the server that's got problems. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
"10 million jobs Queued up, but 0 ready to send." Yup, I was suspicious of bwsrv1 during last week. Not that it's gone down - it's never been flagged as not running and half my calls for work are still successful - but it's as if it's running at half speed or even less. It takes long enough to fix servers when they're definitely not running. I don't know what will trigger someone to even look at this as a problem. Unless researchers start asking why their tasks aren't coming back as quickly as expected. Back to crossing fingers... |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
"10 million jobs Queued up, but 0 ready to send." Still broken- Tasks In progress has dropped by over 50,000. Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
Things improved for a while there, but once again the Tasks In progress are falling away & the Assimilators have a backlog that is growing rapidly. Grant Darwin NT |
Joined: 28 Mar 20 Posts: 1817 Credit: 18,534,891 RAC: 1 |
Things are still broken, but not as broken. The Assimilator backlog isn't as bad as it was, and the rate of decline in the amount of work being done has slowed down- but it is still dropping. It's now about half of what it was (over 205,000 down to 115,000 now). When there is little to no work, there are no problems getting what is available. Now there is a heap of work available, and it's almost impossible to get any. Grant Darwin NT |
Tom M Joined: 20 Jun 17 Posts: 116 Credit: 25,259,723 RAC: 86,988 |
"Things are still broken, but not as broken." Nothing in the Ready To Send at the moment. Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel..... |
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
"Things are still broken, but not as broken." On the plus side, when boinc-process goes down tomorrow, the number of tasks awaiting validation will be much lower than we're used to seeing... ...I've uncrossed my fingers and started clutching straws to see if that's a better strategy. |
jon b. Joined: 27 Dec 09 Posts: 1 Credit: 14,833,010 RAC: 94,248 |
I currently have my buffer set to store at least 0.15 and up to 0.25 additional days of work, and I have not run out of tasks on any of my computers yet. Another plus to maintaining a bit of a client-side queue is that it can help reduce load on the server by reducing the number of requests. Of course it would be ideal if the servers could keep up with our demand! Looking back at the Grafana logs, it looks like the boinc-process thing has been happening regularly on Wednesdays for at least a year. There may be a scheduled task performed during the downtime, such as a DB backup. Tasks are still being generated and distributed while boinc-process is down, and are validated when it comes back up. More of an annoyance than an actual operational issue. The real question is why the project team haven't shared any technical information with volunteers in so long. WCG had some issues a while back, and they did an excellent job of explaining the root cause of the problem and what they did to address it. From my personal experience, I know it can be difficult for researchers to find time for "public relations," but keeping volunteers/donors informed about the work they are contributing resources to shouldn't be neglected. |
Sid Celery Joined: 11 Feb 08 Posts: 2310 Credit: 43,444,891 RAC: 29,423 |
"I currently have my buffer set to store at least 0.15 and up to 0.25 additional days of work, and I have not run out of tasks on any of my computers yet. Another plus to maintaining a bit of a client-side queue is that it can help reduce load on the server by reducing the number of requests. Of course it would be ideal if the servers could keep up with our demand!" Pretty much agree with all that. I'm just back from a few days in Portugal, where I took my laptop with me and finally set it up for Boinc and Rosetta only, with just a 0.1 plus 0.1 cache size and 50% CPUs to keep the heat generation down, and I managed to grab sufficient tasks to keep it occupied, with none spare. I've got my main PC with all non-Rosetta projects set to NNT and that's grabbed enough tasks to keep going too. My two other PCs are allowing non-Rosetta tasks to run atm, so they're both only running a couple of Rosetta at a time. What I'd emphasise yet again is that those tasks that are only 3 hours (Rosetta Beta I think) should be set explicitly at 8hr runtimes rather than allowing the default to knock them down to 3hrs. This will keep people running a lot longer and reduce the demand for tasks, rather than having hosts finish quickly only to find no fresh tasks available. I'm personally convinced that this 3hr runtime setting is a mistake, however long it's persisted for. There's no downside whatsoever as a result of making this change, only an upside for everyone. I'm sure everyone already knows that boinc-process is down again - being Wednesday. I'd estimate about 10hrs ago. Crossing fingers that it may come back early again, like it did last week when it returned on Thursday am (UTC) rather than Friday. Edit: weird thing, but the assimilation backlog seems to have cleared down to zero at about the same time as validators went down. No idea what that's about. |
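
To put rough numbers on the runtime and cache-size points made in the last two posts, here is a minimal Python sketch. It assumes a host that runs Rosetta on every core around the clock and that each task runs for exactly its target runtime; the function names and the 8-core host are hypothetical, chosen only for illustration, and nothing here calls BOINC itself.

```python
# Illustrative arithmetic only. The function names and the 8-core host are
# assumptions for this example; they are not part of BOINC or Rosetta@home.


def tasks_per_day(cores: int, runtime_hours: float) -> float:
    """Tasks a host finishes (and so has to re-request) per day,
    assuming every core runs one fixed-runtime task at a time, 24/7."""
    return cores * 24.0 / runtime_hours


def tasks_in_cache(cores: int, runtime_hours: float, buffer_days: float) -> float:
    """Approximate number of tasks needed to fill a client-side
    buffer holding `buffer_days` days of work."""
    return tasks_per_day(cores, runtime_hours) * buffer_days


if __name__ == "__main__":
    cores = 8  # assumed host size
    for runtime in (3.0, 8.0):  # the 3hr and 8hr target runtimes from the thread
        per_day = tasks_per_day(cores, runtime)
        cached = tasks_in_cache(cores, runtime, buffer_days=0.25)
        print(f"{runtime:.0f}hr runtime: {per_day:.0f} tasks/day, "
              f"{cached:.0f} tasks for a 0.25-day buffer")
```

Under these assumptions, moving from a 3-hour to an 8-hour runtime cuts the number of tasks this host needs per day from 64 to 24, and the tasks needed to fill a 0.25-day buffer from 16 to 6. Real hosts will differ, since tasks can end early and not every core is always running Rosetta.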
©2025 University of Washington
https://www.bakerlab.org