Message boards : Number crunching : Chaos in Rosetta@Home???
Author | Message |
---|---|
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Ralph is definitely used. Our standard procedure is to send all jobs through Ralph first before running them on Rosetta. If some people in our group are skipping this and causing problems, they shouldn't be, and I'll make sure they don't do it again. joseps, the code signing error was a simple human error. Ralph uses a different signature than R@h, so all new apps have to be code signed (you can't test this on Ralph; you just have to do it right the first time and verify the signature). I have since fixed our code signing script so that the signature files always get overwritten, which ensures any incorrect copies are replaced with the correct signature (and verified). We do not have the resources to keep backup servers sitting idle, and they wouldn't have helped much with this latest mishap anyway. |
Sid Celery Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
"Ralph is definitely used. Our standard procedure is to send all jobs through Ralph first before running them on Rosetta. If there are some people in our group skipping this and may be causing problems, they shouldn't and I'll make sure they don't do it again." So this is really just a discipline problem. It seems to me this kind of thing can be solved if people are made aware of the limits of their authority, and updates can't go live unless they're signed off first by someone who does have that authority, along with the capability and availability to rectify any unexpected issue that arises. That covers everything, doesn't it? A hard lesson this time, but the consequences are much bigger than the problem, so it has to be done. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Ralph is definitely used. Our standard procedure is to send all jobs through Ralph first before running them on Rosetta. If there are some people in our group skipping this and may be causing problems, they shouldn't and I'll make sure they don't do it again." Sounds like a hard-core, strictly enforced protocol needs to be enacted. Make RAH off limits except to just a few key personnel who can check the work before it is released. I'm guessing this was a serious workload for the IT people to fix, and it was egg on the face for the project. Let's hope this was a lesson that will not be repeated any time soon. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,284,221 RAC: 1,121 |
"We are still in debug mode for our minirosetta application. There is a small memory leak and a 2 fold slow down in performance. The slow down was caused by a recent refactoring of the hydrogen bond energy code." Maybe that small memory leak is responsible for the problems I've been seeing lately on both of my computers. The total memory in use by processes, as reported by Windows Task Manager, is significantly less than the total physical memory in use, and whenever the total physical memory in use gets much above 50%, all programs that run in 32-bit mode slow down significantly. I've currently decided to handle the problem by telling BOINC that it can use no more than 40% of the memory on either computer, even when the computer isn't in use. This makes such problems slower to appear, but does not stop them entirely. Restarting the boinc.exe program more often helps too. There's also the possibility that the versions of BOINC on both these computers have trouble using more than 50% of the memory to run workunits in 32-bit mode. That's another reason to hurry up the availability of application programs that run in 64-bit mode, and to give future versions of BOINC separate control of how much memory can be used in 32-bit mode. |
googloo Joined: 15 Sep 06 Posts: 133 Credit: 22,732,248 RAC: 3,460 |
Another discipline problem is that new versions are being implemented without posting them to the Rosetta Application Version Release Log. This has happened several times in the past few weeks. Version 1.91 is still not there. It's very important that this is done so that those of us who subscribe to that thread get an email, and can update our firewalls. There may be other reasons that this is important, but that's why it's important to me. |
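
For readers curious what the code signing fix described in the first post might look like (always overwrite the signature file, then verify it), here is a minimal sketch. It is not the project's actual script, which the thread does not show. The function names, the file layout, and the use of HMAC-SHA256 as a stand-in for the real signing scheme are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
import tempfile
from pathlib import Path


def compute_signature(app_path: Path, key: bytes) -> str:
    """Stand-in signer (assumption): HMAC-SHA256 over the file contents.

    A real BOINC project signs with its own key pair and tooling; this
    only illustrates the overwrite-then-verify workflow.
    """
    return hmac.new(key, app_path.read_bytes(), hashlib.sha256).hexdigest()


def verify_signature(app_path: Path, sig_path: Path, key: bytes) -> bool:
    """Check the signature stored on disk against a freshly computed one."""
    on_disk = sig_path.read_text().strip()
    return hmac.compare_digest(on_disk, compute_signature(app_path, key))


def sign_and_verify(app_path: Path, sig_path: Path, key: bytes) -> None:
    """Always replace any existing signature file, then verify the result."""
    signature = compute_signature(app_path, key)

    # Write to a temporary file in the same directory and swap it into
    # place, so a stale signature (for example, one made with another
    # project's key) cannot survive the run.
    fd, tmp_name = tempfile.mkstemp(dir=sig_path.parent)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(signature)
    os.replace(tmp_name, sig_path)

    # Verify what is actually on disk, not the in-memory value.
    if not verify_signature(app_path, sig_path, key):
        raise RuntimeError(f"signature check failed for {app_path}")
```

The point of the design is that the script never trusts a signature file that already exists: it is replaced unconditionally and the copy on disk is re-checked before anything is released.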
©2024 University of Washington
https://www.bakerlab.org