File transfers.

Author	Message
adrianxw Joined: 18 Sep 05 Posts: 662 Credit: 12,167,519 RAC: 0	Message 91662 - Posted: 8 Feb 2020, 9:19:26 UTC Last modified: 8 Feb 2020, 9:20:36 UTC I noticed yesterday a Rosetta task on my list in the "downloading" state. Some time later it was still in the downloading state, so I went to Transfers and poked and prodded it. The download starts, but stops at 46.22%. Retry does the same. It is still like that today. Server status looks normal. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

LarryMajor Joined: 1 Apr 16 Posts: 22 Credit: 31,533,212 RAC: 0	Message 91663 - Posted: 8 Feb 2020, 10:34:14 UTC I'm having the same problem with two machines. It happens occasionally, but it's been bad the past 24 hours.

bfromcolo Joined: 25 Apr 13 Posts: 2 Credit: 1,294,095 RAC: 0	Message 91665 - Posted: 8 Feb 2020, 20:22:27 UTC I have had 3 tasks on 2 machines hung like this for hours, and these are very small downloads. To make matters worse, it stops other work from being downloaded, at least sometimes; it's not consistent here. Retrying the transfer didn't help with any of them. Aborting the transfer did help: it caused the associated work unit to fail, but on the next update everything was back in order. Sat 08 Feb 2020 08:26:01 AM MST | Rosetta@home | Not requesting tasks: some download is stalled
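The event-log line quoted above suggests one way to spot this condition on an unattended machine. What follows is only a minimal Python sketch, assuming the event log has been saved as text with `|`-separated fields (timestamp, project, message) as in that line. The function name `stalled_projects` and the second sample line are made up for illustration and are not part of BOINC.

```python
STALL_TEXT = "some download is stalled"


def stalled_projects(log_lines):
    """Return the set of project names whose log lines report a stalled download.

    Assumes each line looks like: timestamp | project | message
    """
    projects = set()
    for line in log_lines:
        # Tolerate escaped pipes ("\|") by stripping backslashes as well as spaces.
        parts = [p.strip(" \\") for p in line.split("|")]
        if len(parts) < 3:
            continue  # not a line in the assumed three-field format
        project = parts[1]
        message = "|".join(parts[2:])
        if STALL_TEXT in message:
            projects.add(project)
    return projects


sample = [
    # Line taken from the post above.
    "Sat 08 Feb 2020 08:26:01 AM MST | Rosetta@home | "
    "Not requesting tasks: some download is stalled",
    # Hypothetical unrelated line, invented for the example.
    "Sat 08 Feb 2020 08:27:00 AM MST | Example Project | some other message",
]
print(stalled_projects(sample))
```

A script like this only reports the stall; clearing it still needs one of the manual steps described in this thread.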

adrianxw Joined: 18 Sep 05 Posts: 662 Credit: 12,167,519 RAC: 0	Message 91670 - Posted: 9 Feb 2020, 13:37:20 UTC Still like that today. I aborted the transfer. Other jobs downloaded and started quickly.

Timo Joined: 9 Jan 12 Posts: 185 Credit: 45,662,635 RAC: 0	Message 91671 - Posted: 9 Feb 2020, 20:35:37 UTC Just a note to help others avoid having to 'abort transfer' (and thus inadvertently abort tasks that may then never get completed, which impacts research): I've found that closing the BOINC client, with the checkbox 'Stop running tasks when exiting the BOINC manager' checked, and restarting it force-retries the downloads, and they usually succeed. Still, this is definitely a networking issue on the UW side. Hopefully someone reads this forum post. 38 cores crunching for R@H on behalf of cancercomputer.org - a non-profit supporting High Performance Computing in Cancer Research

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 30,949,009 RAC: 87	Message 91677 - Posted: 12 Feb 2020, 8:01:11 UTC I have also had a few stuck files in the last few days, and BOINC stopped getting new work from R@H completely until I noticed it today and aborted the stuck file transfers.

Sid Celery Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 91766 - Posted: 24 Feb 2020, 13:21:37 UTC I've had this over the last few weeks - not entirely sure it's fixed even now. The biggest issue is machines left unattended for longer than my overall buffer size - in my case 24-34hrs. New tasks are prevented from coming down while a download is stalled (always a very small zip file) until all Rosetta tasks in my buffer are complete, so tasks are drawn from my backup project to completely fill the buffer instead. Once the stalled file/task is manually aborted/cleared, my priorities between Rosetta and the backup project mean backup tasks are all ignored unless they're manually forced to run, so there's a further day or two of clearing them out before the machine becomes unattended again, with the prospect of another failed Rosetta download, and everything repeats itself. This has been a constant job almost every single day of the last two weeks over 4 machines in 3 different locations, so if anyone can find a way of preventing this recurring I'd really appreciate it. It's not been funny.

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 91768 - Posted: 24 Feb 2020, 14:42:07 UTC - in response to Message 91766. Last modified: 24 Feb 2020, 15:29:45 UTC "Once the stalled file/task is manually aborted/cleared, my priorities between Rosetta and backup project mean backup tasks are all ignored unless they're manually forced to run, so there's a further day or two of clearing them out before the machine becomes unattended again with the prospect of another failed Rosetta download and everything repeats itself." That is annoying, I know. But if you have set the backup as a zero resource share, it will eventually clear itself out in order to meet its expiration date. It will just sit around for a while.

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 30,949,009 RAC: 87	Message 91814 - Posted: 1 Mar 2020, 2:24:50 UTC Yes, it will clear itself, but not in a good way: BOINC will just ignore such tasks from a project with "zero" resource share until they almost hit their deadlines. That triggers "panic mode", and BOINC reallocates all resources to them to be able to finish before the deadline. But sometimes it still misses some deadlines, as task duration estimates are far from perfect and some WUs can take way longer than BOINC thinks. It also does some other stupid things while in "panic mode", like ignoring the CPU core reservation setting (I set it to use 90% of CPUs at max = 7 of 8 cores, but BOINC in "panic mode" will use all 8) or pausing GPU work to free more CPU cores for CPU WUs, risking missed deadlines, and other things it was never allowed to do.
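The core-count arithmetic in the post above (90% of 8 cores giving 7) works out if the percentage is rounded down to a whole core. Here is a small Python sketch of that arithmetic only. The rounding rule, the one-core minimum, and the function name `usable_cores` are assumptions for illustration, not taken from the BOINC source.

```python
def usable_cores(total_cores, max_pct):
    """Whole cores available under a 'use at most N% of the CPUs' setting.

    Assumption: the result is rounded down and never drops below one core.
    """
    return max(1, int(total_cores * max_pct / 100))


print(usable_cores(8, 90))   # the setting described in the post
print(usable_cores(8, 100))  # what "panic mode" is reported to use
```

Under these assumptions the first call gives 7 and the second gives 8, matching the numbers in the post.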

Om Joined: 18 Feb 20 Posts: 16 Credit: 777,076 RAC: 0	Message 91967 - Posted: 14 Mar 2020, 16:16:26 UTC - in response to Message 91662. Last modified: 14 Mar 2020, 16:23:13 UTC .

Om Joined: 18 Feb 20 Posts: 16 Credit: 777,076 RAC: 0	Message 91968 - Posted: 14 Mar 2020, 16:16:26 UTC - in response to Message 91662. March 14th and the issue continues. I have one stuck at 82.22%. Aborting seems to be the only option...

Dr Who Fan Joined: 28 May 06 Posts: 108 Credit: 292,109 RAC: 0	Message 91972 - Posted: 14 Mar 2020, 19:37:32 UTC This thread/topic is a duplicate of Message boards : Number crunching : Stalled downloads. Let's not make multiple topics on the SAME issue!

Sid Celery Joined: 11 Feb 08 Posts: 2590 Credit: 47,220,881 RAC: 5	Message 91980 - Posted: 15 Mar 2020, 8:44:08 UTC - in response to Message 91768. "Once the stalled file/task is manually aborted/cleared, my priorities between Rosetta and backup project mean backup tasks are all ignored unless they're manually forced to run, so there's a further day or two of clearing them out before the machine becomes unattended again with the prospect of another failed Rosetta download and everything repeats itself." "That is annoying, I know. But if you have set the backup as a zero resource share, it will eventually clear itself out in order to meet its expiration date. It will just sit around for a while." I set it to 96.67% Rosetta to 3.33% WCG, but that's not the issue I'm seeing. Once all Rosetta tasks are complete, barring the Rosetta task with the stalled download, my entire buffer fills with the backup project, so I get 2.0 or 2.4 days of WCG tasks. When I resolve the Rosetta issue, I can manually force the WCG tasks to run (4 or 8 tasks at a time, depending on the cores for that machine), but as soon as they finish, Rosetta starts again and I have to manually start more WCG tasks. It's very boring as well as annoying. And when I'm at that location, I'm in one of two places for half a day at a time, so it can take 2 or 3 days to clear them or, as has just been the case, I don't get to clear them all in 3 days and have to leave for my other location for 3-4 days. I could just abort all the WCG tasks, I suppose, but I don't like to do that. If they run, then I'm sure of a long unattended run on Rosetta to catch up the debt. Which is great unless another Rosetta download fails, and then I'm back to square one, resolving a task that's failed while unattended there. This has been going on for nearly a month. To say I'm thoroughly sick and tired of it all would be an understatement.