Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
PFLIEGER Guy Joined: 20 Dec 15 Posts: 3 Credit: 1,230,645 RAC: 0 |
Today there is a problem with the server: most functions are not working. Guy PFLIEGER, MASEVAUX, ALSACE, France. Phone: 0033973514697 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like all servers are now active. Are you still seeing upload problems? Rosetta Moderator: Mod.Sense |
ncoded.com Joined: 16 Aug 16 Posts: 4 Credit: 39,895,071 RAC: 119,636 |
Hi, we are still getting upload problems. Status: project backoff. Thanks. |
Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0 |
I have two machines here with the same problem: 10 WUs stuck uploading. They transmit between 48 KB and 55 KB, then stop dead before going into the retry loop. |
Omega Joined: 25 Jan 11 Posts: 3 Credit: 1,012,997 RAC: 0 |
Upload problems remain. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
I'll take a look. Sorry for being late on this. |
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
"I'll take a look. Sorry for being late on this." At 17:40 Pacific time here, I still have 16 queued up waiting in line... Happy Easter, everyone! Cheers, /M |
Omega Joined: 25 Jan 11 Posts: 3 Credit: 1,012,997 RAC: 0 |
More WUs are getting stuck uploading. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
We've been trying to troubleshoot this issue but still do not know what is causing it. Our sys admin said that UW-IT has been contacted to determine if it's a UW network issue. Sorry for any inconvenience. |
ncoded.com Joined: 16 Aug 16 Posts: 4 Credit: 39,895,071 RAC: 119,636 |
Just an update: some WUs are actually uploading, but others are not. I'm not sure if this helps with troubleshooting, but I thought it was worth mentioning. Happy Easter, and happy holidays to everyone else. |
Keith E. Laidig Volunteer moderator Project developer Joined: 1 Jul 05 Posts: 154 Credit: 117,189,961 RAC: 0 |
Hey folks, Happy Easter to all. Yesterday, in response to contributors' concerns about our aged SSL cipher handling on the R@H services, we made changes to the webserver configurations. For reasons I don't yet understand, these changes resulted in upload timeouts and the slow collapse of the project backend. Regrettably, I had 'gone offline' yesterday evening to celebrate the season with family and was unaware of the problems until this AM [ADT]. We've reverted to the previous configurations and restarted the project. We apologize for the 'outage' and will keep monitoring the situation closely until we're convinced things are working. FYI: we are in the final stages of building out a new, shiny, high-powered BOINC system for R@H. More info coming.... -KEL [Update] It looks like there are still problems.... |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,015 RAC: 1,790 |
Could you modify your Server Status web page to show which of the server programs handle uploads and downloads? |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"Could you modify your Server Status web page to show which of the server programs handle uploads and downloads?" I just added some text: Web servers: boinc, srv1, srv2, srv3, srv4, srv5 (upload and download servers). boinc is load balanced among the srv web servers; the srv servers handle uploads and downloads. |
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,083,925 RAC: 1,942 |
So far I have seen one WU that is stuck, though I can't remotely check all the hosts that are running R@H. On the one host where I have noticed this since Friday, other WUs are uploading fine. The one that gets stuck is trying to upload, but from what I saw while watching some forced upload retries, it craps out at various amounts of data, between 3 KB and 32 KB, out of 739.36 KB. The WU in question is https://boinc.bakerlab.org/rosetta/workunit.php?wuid=820755819. I now see that there are actually a few more WUs sent on the same date that should all have been returned by now, on other hosts as well... EDIT: Actually, I just checked, and there are exactly 3 more WUs from the same date (one from early morning the next day, 3/12) on one other host, a laptop that I can't check remotely. However, that same laptop has successfully received and returned WUs sent after 3/11 and 3/12... As other WUs are uploading just fine, even on the same machine, I cannot think of a reason why a networking issue at UW would be causing this... :? Ralf |
amgthis Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
I've got 12 queued that are now past the due date. And another 14 waiting that haven't gone past the due date yet, but more are dropping off all the time. These are spread out on 15 different clients. |
BarryAZ Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
"I've got 12 queued that are now past the due date." Pretty similar here. Realizing that the issue is back at the Rosetta site, since it is not happening with other projects on the same systems, I've elected to suspend processing of Rosetta units, which pushes my processing over to WorldGrid. I periodically try to push the uploads, but so far (since last week) no joy here. Ideally the folks on the project side will figure out the problem at their end and resolve it. |
Brian Priebe Joined: 27 Nov 09 Posts: 16 Credit: 33,020,247 RAC: 0 |
I still have WUs stuck uploading. However, a different error sometimes shows up now:
17-Apr-2017 18:06:53 | rosetta@home | [error] Error reported by file upload server: [des_DS_160_fragments_fold_SAVE_ALL_OUT_470271_1422_0_0] locked by file_upload_handler PID=4156
17-Apr-2017 18:06:53 | rosetta@home | [error] Error reported by file upload server: [tj_3_6_junc_X_DHR55_DHR55_l3_t2_t3_4_v5c_fragments_abinitio_SAVE_ALL_OUT_474351_416_0_0] locked by file_upload_handler PID=23998 |
Sid Celery Joined: 11 Feb 08 Posts: 2124 Credit: 41,223,775 RAC: 11,118 |
"Still have WU's stuck uploading. However, a different error sometimes shows up now:" So where does file_upload_handler come from? Who is locking the file: the host machine or the server? My error messages are:
18/04/2017 03:38:57 | rosetta@home | Started upload of rb_04_14_73903_117148__t000__ab_robetta_IGNORE_THE_REST_480361_204_0_0
Again, the server... |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,015 RAC: 1,790 |
"Could you modify your Server Status web page to show which of the server programs handle uploads and downloads?" Looks good so far, but could you also add the status of each of those servers, to help us tell when to expect upload and download problems? |
Omega Joined: 25 Jan 11 Posts: 3 Credit: 1,012,997 RAC: 0 |
It seems to me that the upload problem has been solved. At least, all my stuck WUs have been uploaded now. |
©2024 University of Washington
https://www.bakerlab.org