Message boards : Number crunching : Stalled downloads
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
Downloads of 3 kB files keep getting stuck. Aborting the download, then aborting the task, then updating the project usually works. But sometimes I still can't get new work until I actually reboot the computer! BOINC thinks the download is still stalled: Rosetta@home 16/02/2020 11:00:16 AM Not requesting tasks: some download is stalled |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
Managed to find the log from before I had to reboot:
16-Feb-2020 07:49:29 [Rosetta@home] Started download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:36 [Rosetta@home] Temporarily failed download of 9v1nm_gb_c815_9mer_gb_001245.zip: transient HTTP error
16-Feb-2020 07:54:36 [Rosetta@home] Backing off 03:44:45 on download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:37 [---] Project communication failed: attempting access to reference site
16-Feb-2020 07:54:38 [---] Internet access OK - project servers may be temporarily down.
16-Feb-2020 09:29:54 [Rosetta@home] Computation for task rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0 finished
16-Feb-2020 09:29:59 [Rosetta@home] Starting task 7ub7ru9a_3h_design1_893125_1_0
16-Feb-2020 09:30:00 [Rosetta@home] Started upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 09:30:04 [Rosetta@home] Finished upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 10:30:15 [Rosetta@home] Sending scheduler request: To report completed tasks.
16-Feb-2020 10:30:15 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:30:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:01 [Rosetta@home] task 9v1nm_gb_c815_9mer_gb_001245_SAVE_ALL_OUT_892880_29_0 aborted by user
16-Feb-2020 10:59:06 [Rosetta@home] update requested by user
16-Feb-2020 10:59:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:07 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:59:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:08 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:25 [Rosetta@home] update requested by user
16-Feb-2020 10:59:28 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:28 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:30 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:00:14 [Rosetta@home] update requested by user
16-Feb-2020 11:00:16 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:00:16 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:00:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:10:02 [Rosetta@home] update requested by user
16-Feb-2020 11:10:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:10:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:10:09 [Rosetta@home] Scheduler request completed |
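For anyone who wants to spot these without scrolling through the event log, here is a minimal Python sketch that scans log lines for the "Backing off ... on download of ..." entries seen above. It assumes the log has been saved as text with one entry per line; the function name `find_backed_off_downloads` is made up for illustration and is not part of BOINC.

```python
import re
from datetime import timedelta

# Matches the back-off entries quoted in this thread, e.g.
# "... Backing off 03:44:45 on download of some_file.zip"
BACKOFF_RE = re.compile(
    r"Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)"
)


def find_backed_off_downloads(lines):
    """Return {filename: most recent back-off interval} for the given log lines."""
    backoffs = {}
    for line in lines:
        match = BACKOFF_RE.search(line)
        if match is None:
            continue
        hours, minutes, seconds, filename = match.groups()
        # A later entry for the same file overwrites the earlier one.
        backoffs[filename] = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=int(seconds)
        )
    return backoffs
```

Any file that shows up here while the client also logs "Not requesting tasks: some download is stalled" is a likely candidate for the one blocking new work.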
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again." I haven't tried restarting just BOINC; presumably that would be just as effective. But most of my machines are remote, so a system restart was easier than logging onto the machine and manually restarting BOINC. I don't think I can do that remotely through BoincTasks. |
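The abort-download, abort-task, update-project sequence described in this thread can also be driven from a script on remote machines. The sketch below builds `boinccmd` command lines using its `--host`, `--passwd`, `--file_transfer`, `--task` and `--project` options. It assumes `boinccmd` is installed locally and that the remote client is configured to accept remote GUI RPC connections; the helper names and all argument values are hypothetical, and the project URL should be whatever your client shows for Rosetta@home. It does not restart the BOINC client itself, and as reported above, this sequence does not always clear the stalled state.

```python
import subprocess


def build_unstick_commands(host, password, project_url, filename, task_name):
    """Build boinccmd invocations: abort transfer, abort task, update project."""
    base = ["boinccmd", "--host", host, "--passwd", password]
    return [
        base + ["--file_transfer", project_url, filename, "abort"],
        base + ["--task", project_url, task_name, "abort"],
        base + ["--project", project_url, "update"],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the commands separately from running them makes it easy to print and check them before pointing them at a real host.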
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"Same problem here. I haven't had to restart hosts, but on some of them I do have to restart BOINC to be able to download WUs again." Richard Haselgrove over at BOINC is looking into it, but needs some logs from someone with a stuck WU. See https://boinc.berkeley.edu/dev/forum_thread.php?id=13435 |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,071,731 RAC: 16,670 |
Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days. It has happened 4 or 5 times since the beginning of February. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days." Seems to have been fine here for a few days (on 4 computers); I should have seen more problems by now. Mind you, I'm not getting any of the type of tasks that get stuck ("multistate"). Are those the ones you get stuck with? Maybe they've paused those while they fix something? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days." Correction: I just got a multistate, and it downloaded fine. |
Bryn Mawr Joined: 26 Dec 18 Posts: 393 Credit: 12,114,842 RAC: 4,200 |
"Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days." Mine tended to be rb_02, and it was only a small portion of those. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"Yep, same shit here. Stuck downloads (which stop the flow of work for R@H, as BOINC stops getting new work from R@H and switches to a backup project, WCG in my case) every few days." In that case I guess it was a random fault with a Rosetta server. But once they were stuck, a retry didn't help. Corrupt disk somewhere in Rosetta? Oh well, every time it happens I can always remove the stuck download and get things going again. While I'm not looking, BOINC can always fall back on another project. But ever since someone offered to help look at the problem, it hasn't happened again, so I've had no logs to give them! |
Trotador Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
This issue keeps occurring every day, but it is especially annoying today. All hosts are blocked from downloading new units, and some of them are ending up idle. Has it been looked at on the project side? |
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,071,731 RAC: 16,670 |
Yep, I got a bunch of stuck downloads on 28 Feb too. The latest 2 examples:
https://boinc.bakerlab.org/rosetta/download/fc/rb_02_24_16848_16671_ab_t000__h002_robetta.zip
https://boinc.bakerlab.org/rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
From BOINC it looks like this (with http_debug):
01/03/2020 00:30:08 | Rosetta@home | Started download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:35:15 | Rosetta@home | Temporarily failed download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip: transient HTTP error
01/03/2020 00:35:15 | Rosetta@home | Backing off 05:44:16 on download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
-------- I noticed the stalled download (it had already been stuck for about 15-20 hours), turned http_debug on and pressed "retry" --------
01/03/2020 00:42:31 | | Re-reading cc_config.xml
01/03/2020 00:42:31 | | log flags: file_xfer, sched_ops, task, http_debug, work_fetch_debug
01/03/2020 00:42:31 | Rosetta@home | Found app_config.xml
01/03/2020 00:42:31 | Rosetta@home | [work_fetch] REC 4936.494 prio -0.068 can't request work: some download is stalled
01/03/2020 00:42:31 | Rosetta@home | [work_fetch] share 0.000
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::init_get(): https://boinc.bakerlab.org/rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle 'D:\Boinc\ca-bundle.crt'
01/03/2020 00:42:59 | Rosetta@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
01/03/2020 00:42:59 | Rosetta@home | Started download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
01/03/2020 00:42:59 | Rosetta@home | [http] [ID#10522] Info: Connection 3013 seems to be dead!
01/03/2020 00:42:59 | Rosetta@home | [http] [ID#10522] Info: Closing connection 3013
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Info: Trying 128.95.160.156...
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Info: Connected to boinc.bakerlab.org (128.95.160.156) port 80 (#3014)
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: GET /rosetta/download/224/PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip HTTP/1.1
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Host: boinc.bakerlab.org
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.2)
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept: */*
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept-Encoding: deflate, gzip
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Content-Type: application/x-www-form-urlencoded
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server: Accept-Language: en_GB
01/03/2020 00:43:00 | Rosetta@home | [http] [ID#10522] Sent header to server:
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: HTTP/1.1 200 OK
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Date: Sat, 29 Feb 2020 21:42:58 GMT
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Server: Apache/2.4.18
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Last-Modified: Sat, 22 Feb 2020 18:36:23 GMT
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: ETag: "a8a-59f2e6a4792b8"
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Accept-Ranges: bytes
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Content-Length: 2698
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: Content-Type: application/zip
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server:
01/03/2020 00:43:01 | Rosetta@home | [http] [ID#10522] Received header from server: PK
01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
01/03/2020 00:48:06 | Rosetta@home | [http] [ID#10522] Info: Closing connection 3014
01/03/2020 00:48:06 | Rosetta@home | [http] HTTP error: Timeout was reached
01/03/2020 00:48:06 | Rosetta@home | Temporarily failed download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip: transient HTTP error
01/03/2020 00:48:06 | Rosetta@home | Backing off 03:56:16 on download of PKY1232uM_gly_00722_127_2_SSC_matched_9_FR_C_R_B_0001_notail.zip
From a browser or other programs it looks the same: the R@H server responds and the download of the file begins, but at some point it stops completely until the timeout is triggered. Retries do not help; they just repeat the loop. |
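The "Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds" message is libcurl's low-speed abort (the `CURLOPT_LOW_SPEED_LIMIT` and `CURLOPT_LOW_SPEED_TIME` options); the values 10 and 300 are taken from the log line itself. Below is a simplified Python model of that rule, not libcurl's actual implementation: it averages over the most recent window, and the function name and sample format are assumptions made for illustration.

```python
def is_stalled(samples, limit_bytes_per_sec=10, window_sec=300):
    """Decide whether a transfer looks stalled.

    samples: list of (time_in_seconds, total_bytes_received), ascending in time.
    Returns True if the average rate over the last `window_sec` seconds
    is below `limit_bytes_per_sec`.
    """
    if not samples:
        return False
    latest_time, latest_bytes = samples[-1]
    window_start = latest_time - window_sec
    if samples[0][0] > window_start:
        # The transfer has not been observed for a full window yet.
        return False
    # Byte count at the start of the window: the last sample at or before it.
    bytes_at_start = 0
    for sample_time, sample_bytes in samples:
        if sample_time > window_start:
            break
        bytes_at_start = sample_bytes
    rate = (latest_bytes - bytes_at_start) / window_sec
    return rate < limit_bytes_per_sec
```

In the log above, the body starts arriving at 00:43:01 and the transfer is dropped at 00:48:06, about 300 seconds later, which is consistent with almost no data arriving after the first packet.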
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
I tried both of those links in my browser: the first worked fine, but the second stopped at 2610 of 2698 bytes. Seems rather random. |
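A browser test like this can be repeated from a script by comparing the number of bytes actually received with the `Content-Length` header. The following is a minimal Python sketch using only the standard library; the function names `count_body_bytes` and `probe` and the 30-second timeout are assumptions for illustration, and the stream is only assumed to offer a `read1` method, as `http.client.HTTPResponse` and `io.BytesIO` do.

```python
import socket
import urllib.request


def count_body_bytes(stream, expected_length, chunk_size=1024):
    """Read from `stream` until done; return (bytes_received, complete)."""
    received = 0
    while received < expected_length:
        try:
            # read1 returns whatever is available, up to the requested size.
            chunk = stream.read1(min(chunk_size, expected_length - received))
        except (socket.timeout, TimeoutError):
            break  # no data arrived within the socket timeout
        if not chunk:
            break  # the connection was closed early
        received += len(chunk)
    return received, received == expected_length


def probe(url, timeout_sec=30):
    """Download `url` and report how much of the advertised body arrived."""
    with urllib.request.urlopen(url, timeout=timeout_sec) as response:
        header = response.headers.get("Content-Length")
        if header is None:
            raise ValueError("server did not send a Content-Length header")
        return count_body_bytes(response, int(header))
```

A result such as `(2610, False)` for a file advertised as 2698 bytes would match what the browser showed here.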
Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 26,071,731 RAC: 16,670 |
Yes, the first link is now working for me too. But it did not work at the time I was writing my previous post (29 Feb 2020, ~22:20 UTC). |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
I have a few more stalled ones today. The main problem of course is that it prevents others from downloading, so you have to babysit it. It is fun for a while, but it is getting to be like LHC. If they can't get their servers to work, there is not much I can do for them. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"I have a few more stalled ones today. The main problem of course is that it prevents others from downloading, so you have to babysit it." LHC is irritating me too a little bit, but it's only CMS that screws up. You can turn CMS off completely in the website settings, or like me, just leave it running. They usually fail very quickly and don't waste much time, and I'm assuming that the failed tasks are helping them to fix the problem in some way. Also, if you're NOT running Linux, then switch off "run native tasks" in the LHC website settings. I had that enabled (I use Windows 10), thinking it would give more options of tasks to run. But it ended up stopping me getting any Theory or Atlas tasks. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"LHC is irritating me too a little bit, but it's only CMS that screws up." Thanks. I am running native ATLAS now. If they ever get CMS up again, I will give it a try. I think they are working on it. I just hope the Rosetta glitch is a minor server issue that does not fall in the long-term problem category that LHC does. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"LHC is irritating me too a little bit, but it's only CMS that screws up." For me, CMS only occupies a small amount of my computer's time. Failed tasks fail very early. I continue to allow them, to help the project figure out the problem. And for me, Rosetta is working perfectly now, not sure why. As I said earlier, I tried some links and failed to get a download in my browser of somebody's failed task, but no tasks my computers (4 of them) have been given are going wrong any more. Whatever was wrong isn't as bad as it used to be. I used to have to manually intervene with every computer about once a day. None have failed in the last week. It was mentioned somewhere that it's just overloaded servers at their end. Maybe they upgraded something, or maybe there's less load as people go off and do other projects. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 11,845,183 RAC: 9,025 |
"LHC is irritating me too a little bit, but it's only CMS that screws up." Oh my god, where did you get all those Ryzens from? That's pure pornography! |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Oh my god, where did you get all those Ryzens from? That's pure pornography!" I just happened to spend the winter expanding my fleet. They came online just in time for the coronavirus. I also run Folding@home on each one, which just recently announced a project for it. https://foldingathome.org/2020/02/27/foldinghome-takes-up-the-fight-against-covid-19-2019-ncov/ I just reserve a core in BOINC to support each GPU (everything from a GTX 750 Ti up to an RTX 2060). But they are really to heat my basement. You might as well have fun at the same time. |
©2024 University of Washington
https://www.bakerlab.org