Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 11,717,270 RAC: 11,974 |
Why can't message text be included in email notification?

I'll guess either server load or stupidity.
Rachael Lines Send message Joined: 11 Feb 22 Posts: 2 Credit: 2,865 RAC: 0 |
Hey everyone, I started crunching yesterday, but four of my tasks are showing a computation error. Is this normal, or do I need to suspend each one before shutting down my computer? Is turning off my PC what caused this? Thanks in advance.
kotenok2000 Send message Joined: 22 Feb 11 Posts: 258 Credit: 483,503 RAC: 133 |
You need to enable SVM in the BIOS to compute VirtualBox apps, and reset the switch that automatically disables VirtualBox apps when the host fails to compute them: at https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6177189 press "Allow".
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,609,434 RAC: 22,266 |
Hey everyone

It has tried to process Python Tasks, and they require VirtualBox in order to run. Your system has VirtualBox, but is having problems running it.

Waiting for VM "boinc_722c34e89dac8a69" to power on...
VBoxManage.exe: error: Not in a hypervisor partition (HVP=0) (VERR_NEM_NOT_AVAILABLE).
VBoxManage.exe: error: AMD-V is disabled in the BIOS (or by the host OS) (VERR_SVM_DISABLED)
VBoxManage.exe: error: Details: code E_FAIL (0x80004005), component ConsoleWrap, interface IConsole
2022-02-11 18:14:31 (7376): VM failed to start.
2022-02-11 18:14:31 (7376): Could not start
2022-02-11 18:14:31 (7376): ERROR: VM failed to start
2022-02-11 18:14:31 (7376): Powering off VM.
2022-02-11 18:14:31 (7376): Deregistering VM. (boinc_722c34e89dac8a69, slot#4)
2022-02-11 18:14:31 (7376): Removing network bandwidth throttle group from VM.
2022-02-11 18:14:31 (7376): Removing VM from VirtualBox.

I'd suggest checking your BIOS to make sure virtualisation is enabled, and then make sure that Hyper-V isn't enabled (under Windows Features). This may be of use (similar hardware & OS and what they had to do to get VirtualBox to work).

If you can get it working, you will be limited in the number of Python Tasks you can process due to the amount of RAM you have - from memory, at least 3GB of RAM is required per Task to start processing (even though they actually use much less). You will also probably need to increase the default amount of disk space BOINC can use if you want to process more than a few Python Tasks - just under 8GB of disk space is required per Task being processed.

You shouldn't have any issues processing Rosetta 4.20 Tasks (unless we get some that require more RAM than the present ones), but until the last couple of days they have been pretty much non-existent for the last few months.

I would also suggest running the BOINC Manager benchmarks - they are used to determine the amount of Credit you get for doing work, and your system is showing just the default values.

Grant
Darwin NT
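For anyone who wants to sanity-check those limits against their own machine, here is a minimal Python sketch. It is not project code; the 3GB RAM and 8GB disk per-task figures are simply the approximate numbers quoted above, and the example host specs are made up.

# Rough estimate of how many Rosetta Python (VirtualBox) tasks a host could run
# at once. The ~3 GB RAM and ~8 GB disk per task are the approximate figures
# mentioned above, not official project limits.

def max_python_tasks(ram_gb: float, boinc_disk_gb: float, cpu_threads: int,
                     ram_per_task_gb: float = 3.0,
                     disk_per_task_gb: float = 8.0) -> int:
    """Return the smallest of the RAM, disk and CPU-thread limits."""
    by_ram = int(ram_gb // ram_per_task_gb)
    by_disk = int(boinc_disk_gb // disk_per_task_gb)
    return max(0, min(by_ram, by_disk, cpu_threads))

# Hypothetical examples:
print(max_python_tasks(ram_gb=8, boinc_disk_gb=10, cpu_threads=4))    # -> 1 (disk-limited)
print(max_python_tasks(ram_gb=32, boinc_disk_gb=80, cpu_threads=16))  # -> 10 (RAM-limited)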
Rachael Lines Send message Joined: 11 Feb 22 Posts: 2 Credit: 2,865 RAC: 0 |
They were the Rosetta Python ones that were showing the error; I have some of the Rosetta 4.20 tasks working now. I will have a better look at it this afternoon. Thanks for the info.
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
All PcrV10AA_PcrV_HYF_ tasks fail after a few seconds: <stderr_txt>
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,609,434 RAC: 22,266 |
All PcrV10AA_PcrV_HYF_ tasks fail after a few seconds:

I've got plenty of _PcrV_ Tasks that have been processed and Validated, but around 50% of them crashed and burned within seconds of starting.

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF7DA118316 read attempt to address 0xFFFFFFFF

Grant
Darwin NT
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
I've got plenty of _PcrV_ Tasks that have been processed and Validated, but around 50% of them crashed and burned within seconds of starting.

+1. Now some of these WUs are running...
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Everything from that protein died and my wingmen had the same errors. Very good of them to dump untested tasks on the server. Thought they dumped them on RALPH first and if he liked them then they came to Rosie. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
Thought they dumped them on RALPH first and if he liked them then they came to Rosie.

Completely agree with you. Ralph is VERY underused.
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,149,199 RAC: 15,933 |
Finished the latest batch of Rosetta 4.20 tasks, so flicked back to WCG tasks automatically...

Yeah, I didn't read WCG's recent announcement properly. I thought it was going to be down from 14th to 28th February, not stop sending tasks that will complete in that period and then have the whole project be down until April 22nd. Already completed everything, ffs...

I'm going to have to install Virtual Box and give that another try, aren't I. God help me.
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,149,199 RAC: 15,933 |
Finished the latest batch of Rosetta 4.20 tasks, so flicked back to WCG tasks automatically...

Was just about to say I completed my first tasks (which I did) when something crashed and all my remaining tasks errored out:

16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi

Still, better than my previous attempts. I had up to 9 tasks running at a time within 32GB RAM on my 8C/16T machine.
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
I'm going to have to install Virtual Box and give that another try, aren't I

TN-Grid?? SiDock?
computezrmle Send message Joined: 9 Dec 11 Posts: 63 Credit: 9,680,103 RAC: 0 |
16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi

Looks like the vdi image got damaged and needs to be refreshed. Best would be to:
- Shut down BOINC
- Delete AIMNet_vm_v2.vdi from the projects directory
- Restart BOINC

This will initiate a fresh download of the compressed image (~2 GB), which will then be expanded to 6.9 GB.
Whenever a fresh task starts, AIMNet_vm_v2.vdi is run through the checksum calculator (MD5 check) and the result is compared to the checksum sent by the project. Only if that check succeeds is the image copied to a slot directory and renamed vm_image.vdi, which is what the task actually uses.
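If you would rather confirm the image really is damaged before deleting it, one way is to compute the MD5 yourself and compare it against the checksum the project lists for the file. A minimal sketch, assuming a default Windows BOINC data directory (adjust the path for your own install; the expected checksum below is a placeholder, not the real value):

import hashlib
from pathlib import Path

def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 a file in chunks so a multi-GB .vdi doesn't have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder path and checksum - substitute your own BOINC data directory
# and the MD5 the project publishes for this file.
image = Path(r"C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\AIMNet_vm_v2.vdi")
expected = "0123456789abcdef0123456789abcdef"  # hypothetical value

actual = md5_of_file(image)
print("OK" if actual == expected else f"MISMATCH: {actual}")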
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,149,199 RAC: 15,933 |
I'm going to have to install Virtual Box and give that another try, aren't I

I couldn't even access the home page of Sidock. TN-Grid may be something, but I'm not sure what. I'll persist here for a while longer.
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,149,199 RAC: 15,933 |
16/02/2022 2:41:27 | Rosetta@home | [error] MD5 check failed for AIMNet_vm_v2.vdi

A new version of AIMNet_vm_v2.vdi comes down after updating, with all the attributes you mention, but without a shutdown and restart. I've had a few Rosetta 4.20 and WCG tasks dribble through too.

I overclock my PC, and sometimes these checksum errors have been associated with overclocking, so I'm wary of that factor. At the same time, VBox tasks seem slightly less demanding than Rosetta tasks and I'm running a lot cooler with VBox, so maybe not. And I'm sure that only being able to run 8 or 9 tasks at a time rather than 16 plays into that too.

I've completed all the Rosetta and WCG tasks I got, but now I only have 2 VBox tasks and none further will download. Is it normal for VBox tasks to only be available intermittently? I'm getting what I'm getting and it's not the complete failure it was when I first tried. I'll give it a few more days.

Edit: I've had to click "Allow" on my PC's profile. I guess all the crashed tasks tripped it to restrict downloads.
Yup, that's done it.
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
I couldn't even access the home page of Sidock.

SiDock is in maintenance today, but will be back online soon.

TN-Grid may be something, but I'm not sure what.

It's a historical BOINC project about gene networks: http://gene.disi.unitn.it/test/
Killersocke@rosetta Send message Joined: 13 Nov 06 Posts: 29 Credit: 2,579,125 RAC: 0 |
World Community Grid: WCG Data Transfer Underway, Stress Test of New Infrastructure Scheduled For Feb 28th
We have started to transfer data for all active WCG projects to the Krembil Research Institute. We are gearing up to start testing the whole system on February 28, 2022.
09.02.2022 20:41:43 · read more...

--------------------------------------------------------------------------------

SiDock@home: Technical maintenance on February 15th
Hello! Additional server maintenance planned on February 15th, for several hours.
14.02.2022 22:57:41 · read more...
BoredEEdude Send message Joined: 11 Apr 12 Posts: 11 Credit: 38,954,694 RAC: 5 |
I have been running Rosetta on multiple computers for years, and it has been a mostly hands-off background task requiring minimal supervision. For the past few months, Rosetta work units have been unavailable for days on end. No errors are shown, just "got 0 new tasks".

2/16/2022 11:45:16 AM | Rosetta@home | update requested by user
2/16/2022 11:45:20 AM | Rosetta@home | Sending scheduler request: Requested by user.
2/16/2022 11:45:20 AM | Rosetta@home | Requesting new tasks for CPU
2/16/2022 11:45:22 AM | Rosetta@home | Scheduler request completed: got 0 new tasks
2/16/2022 11:45:22 AM | Rosetta@home | No tasks sent
2/16/2022 11:45:22 AM | Rosetta@home | Project requested delay of 31 seconds

When this happens, the online server status page shows approximately 5000 tasks ready to send, a large number (~100k) of tasks in progress, and little server-side processing occurring.

Computing status
Work
Tasks ready to send: 4992
Tasks in progress: 115529
Workunits waiting for validation: 0
Workunits waiting for assimilation: 1
Workunits waiting for file deletion: 1
Tasks waiting for file deletion: 1
Transitioner backlog (hours): 0.00

It seems that whenever the number of available tasks gets down to around 5000, all work units are considered sent, and the server backend is just waiting for completed work to be returned. I don't recall ever seeing the available tasks go down to zero. When I do eventually get some tasks, everything runs as expected locally until all tasks are finished. Then I go idle for days waiting for more tasks to become available.

It seems to me that the project is just not generating as much work for all of its users these days. I don't know if that is because the number of work units is down, or there are many more users available to process the same number of generally available units, or if the type of work has changed and I am unaware of what my system is lacking so it can be sent some of these "new" types of tasks now being made available. Is there a checklist somewhere that I can use to verify my system is set up correctly? Because my BOINC Manager currently thinks everything is running just fine.

I used to run Rosetta work exclusively. But to keep my computers occupied (non-idle) I have since added other projects so I can pick up other tasks when no Rosetta tasks are available. The downside is that when Rosetta tasks are available, these other projects dilute the amount of resources I can devote to Rosetta in the hands-off processing approach I prefer, as all projects now have to share the available CPU time.

If many Rosetta users are running out of work, but there are still tens or hundreds of thousands of tasks still in progress, can Rosetta start limiting the number of tasks sent to individual users (even if they are willing to backlog a large number of tasks locally)? I have seen other projects where tasks were only generated in large bursts, and the users knew to backlog days' or weeks' worth of tasks since the server would quickly run out of new tasks to send out. The result was that if you didn't stockpile tasks during the initial big release, you would virtually never see any tasks unless BOINC happened to check in during a new big release of tasks days or weeks in the future. Limiting the size of individual user backlogs would spread the available work out across all the available users. That would help retain more users, since everyone would feel like they are contributing to the project.
At this point, I feel like I'm getting sidelined with no work, while others are sitting on a lot of work units they cannot run immediately. And the rate of results getting back to Rosetta is delayed unnecessarily while it waits for the return of backlogged tasks from a few users instead of sending them to idle machines. My Rosetta@home statistics graph clearly shows 3 bursts of activity over a total of 8 days within the past 30 days. That leaves me sitting idle for 22 days (or about 75% of that time). My main PC (which the graph comes from) is capable of running 16 concurrent tasks in 32 GB of RAM at ~3.5 GHz CPU speed, so while I can normally complete many concurrent tasks in about 8 hours, for 75% of the month Rosetta gets ZERO results from me for lack of tasks to run.

https://drive.google.com/file/d/1X5aBWy0xj2wgV7DpF9tqjrRg8i8E-XEY/view
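For anyone curious how often their own work requests come back empty, the "got 0 new tasks" replies quoted above can be counted from a saved copy of the BOINC event log. A rough, unofficial sketch (the log path is just the usual Windows default; adjust it for your own setup):

from pathlib import Path

# Typical Windows location of the BOINC event log; adjust for your own install.
log_path = Path(r"C:\ProgramData\BOINC\stdoutdae.txt")

empty = 0      # Rosetta scheduler replies that returned no work
with_work = 0  # Rosetta scheduler replies that returned at least one task

for line in log_path.read_text(errors="ignore").splitlines():
    if "Rosetta@home" in line and "Scheduler request completed" in line:
        if "got 0 new tasks" in line:
            empty += 1
        else:
            with_work += 1

print(f"Empty replies: {empty}, replies with work: {with_work}")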
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I couldn't even access the home page of Sidock.

QuChem has been offline for 3 days now... must have blown something up to be offline this long. No webserver, no project server... dead.