Message boards : Number crunching : Memory and CPU problems with Ubuntu 16.04?
Author | Message |
---|---|
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
This suggests that you have "Leave applications in memory while suspended" unchecked. Each time you suspend a task it is taken out of memory and the work done after the last checkpoint is discarded. I'm not sure that either Mod.Sense or Chilean has described rosetta's increasing need for memory perfectly precisely. I think rosetta models may require more memory for subsequent stages of processing after the first, and that some models proceed through more stages of processing than other models within the same task. The caveat is that I haven't actually looked that closely at rosetta's memory behavior in quite a while, and I am vaguely aware that the rosetta team spent some time reexamining rosetta's use of memory in the somewhat recent past. Despite this, and the fact that Mod.Sense is almost always exactly right, I still think, given the variety of rosetta protocols, it is likely that any task increasing its memory further after the initial setup is behaving appropriately and that its need for more memory is not indicative of a memory leak or a bug. It was clear from your second post that the symptoms you described were most likely the expected result of a memory usage limit, with a possible discrepancy between the Ubuntu and Windows installations. This could be checked by answering Link's question, then checking the event log (per rjs5's suggestion) to see if BOINC was reading the preferences the way you expected. It would also tell you from where BOINC is reading those preferences. You could compare the event logs of the Windows and Ubuntu installations to confirm the memory limit preferences are the same. Mod.Sense asked you about this in his first response to you. Most of the responses have been from BOINC 101 with a sub-unit on rosetta. Mod.Sense's suggestion, to step back from the maze you've entered, look where everyone else is pointing, and double-check the basics, is from Troubleshooting 101. Best, Snags |
Snags Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
Sounds like a memory usage limit. EACH rosetta WU uses on average AT LEAST 0.5 GB of RAM (I have 3 right now using 600+ MB). This RAM usage increases as the WU progresses up to a certain maximum. It doesn't start using the maximum amount of RAM it'll eventually use right at the start... thus this slow increase in RAM usage. What Chilean is trying to point out is that if you have limited BOINC to no more than .5GB per core, it is inevitable that you will at least occasionally run into the memory usage limit and see the behavior you have described. Previously you said Ubuntu indicated there was available memory at the same time BOINC was suspending tasks with the "waiting for memory" message. This suggests the BOINC preferences are the limiting factor, not Ubuntu. At the beginning of the event log BOINC describes your machine and gives a few details of your preference settings. You should find these lines: Sat Apr 16 12:50:05 2016 | [name of project] | General prefs: from [name of project] (last modified [date time]) and if you are using local preferences instead of web-based preferences: Sat Apr 16 12:50:05 2016 | | Reading preferences override file then: Sat Apr 16 12:50:05 2016 | | max memory usage when active: xxx.xxMB Sat Apr 16 12:50:05 2016 | | max memory usage when idle: xxx.xxMB Sat Apr 16 12:50:05 2016 | | max disk usage: xxx.xxGB Sat Apr 16 12:50:05 2016 | | max CPUs used: x What are these values, and are they the same for both installations? If they are the same, the next step would be to look at the Activity Monitor, or whatever it's called in Ubuntu, and see precisely where your memory is being allocated. I don't have a computer science degree, but Troubleshooting 101 for every subject I've ever dealt with has included: Rule Out The Obvious. Best, Snags |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
Well, at this point I increasingly suspect invisible friends, but... Near as I can tell, there were NO updates to anything during the period when the BOINC Manager problems suddenly went away. Memory utilization appears to be quite stable with 4 units running and using less than 45% of the available memory. The mix of work units does not seem to affect the memory status of the machine, though recently the rb work units seem unusually likely to trigger immediate computation errors, though that problem is not just on the Ubuntu 16.04 machine, but also on other hardware and OSes. (Not a new problem, and so far it just goes away after a few days.) At the earlier time when I was asked for the other data, it was not accessible (because the menus weren't), though I'm pretty sure that is an unrelated bug in 16.04. Still following the discussion of that menu problem over on Launchpad. I certainly agree with you about backing down to check the basics, but right now I seem to be in the state of "If it ain't broke, don't fix it." While it would be nice to know what was going on there I'm not going to worry too much until it comes back. Shall we just call it teething problems in 16.04? (Still I have to rate it as the worst version upgrade since they broke the Japanese input system around Gutsy Gibbon time... I think Dapper Drake may have been my first Ubuntu?) #1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I would have to guess that you got an undesirable set of all high-memory tasks at the same time, over the course of several days, and this essentially exceeded your machine's resources available to BOINC. If you think about it, the BOINC Manager reacted fairly well to the situation and continued crunching through the work as best it could. Rosetta Moderator: Mod.Sense |
shanen Joined: 16 Apr 14 Posts: 195 Credit: 12,662,308 RAC: 0 |
I would have to guess that you got an undesirable set of all high-memory tasks at the same time, over the course of several days, and this essentially exceeded your machine's resources available to BOINC. If you think about it, the BOINC Manager reacted fairly well to the situation and continued crunching through the work as best it could. Could have sworn that I already replied to this? But it seems to have disappeared. I think I said something like "Perhaps so, but it still seems unfair, especially when I do my part and get no credit for trying." Just saw another example this morning. A computation error on an rb unit that had run some hours and was almost finished. Looking at the log, I see that my computer apparently requested some credit for it, but received nothing. On the one hand, I agree that better results should receive more reward, but on the other hand, it still feels like I'm being penalized for someone else's buggy software. (Ditto that ancient Mac unit. I'm pretty sure it will get no credit because it is way past its deadline. I'm letting it run largely to be impressed by the stability of the Mac in running the work unit for over a month... A bug? Bad assessment of the computational requirements? Whatever. NOT my mistake, but no credit even for the electricity consumed.) Then again, I think that excessive worry about credit creates a competitive atmosphere that can be almost anti-scientific. Doesn't matter much in a case like seti@home (where I was in the top 1% before BOINC appeared), though the "points frenzy" still bothered me. Much more of a concern where the computations are possibly contributing to journal publications... Anyway, I'm still dismissing the reported problems as teething for 16.04, and I'm not really concerned if the bugs are Rosetta's or Ubuntu's. I'm just trying to follow the rules of the apparent game and sometimes feeling some annoyance or frustration when they seem broken.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Do you mean like these two? https://boinc.bakerlab.org/rosetta/result.php?resultid=814278750 https://boinc.bakerlab.org/rosetta/result.php?resultid=814299542 As you can see, they were each granted credit equal to their credit claim, which, as you say, for this machine is better than the average. Rosetta Moderator: Mod.Sense |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 5,361 |
Do you mean like these two? Both of the workloads failed and were granted what was claimed. Probably not a good example. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It is an example of what we were describing here. Rosetta Moderator: Mod.Sense |
rjs5 Joined: 22 Nov 10 Posts: 273 Credit: 23,054,272 RAC: 5,361 |
|
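
Snags's suggestion in the thread, to compare the preference lines at the start of the event log on the Windows and Ubuntu installations, could be sketched roughly as follows. This is only an illustration and not anything BOINC ships: it assumes the log text has been copied out in the pipe-separated layout quoted above, the helper names `read_prefs` and `diff_prefs` are hypothetical, and the numbers in the two sample logs are made up.

```python
import re

# Matches the preference lines quoted in the thread, e.g.
# "Sat Apr 16 12:50:05 2016 | | max memory usage when active: 2048.00MB"
# Assumption: the log was copied in this pipe-separated layout.
PREF_RE = re.compile(
    r"\|\s*(max memory usage when active|max memory usage when idle|"
    r"max disk usage|max CPUs used):\s*([\d.]+)\s*(MB|GB)?\s*$"
)


def read_prefs(log_text):
    """Return {preference name: (value, unit)}; a later line overrides an earlier one."""
    prefs = {}
    for line in log_text.splitlines():
        match = PREF_RE.search(line)
        if match:
            name, value, unit = match.groups()
            prefs[name] = (float(value), unit or "")
    return prefs


def diff_prefs(a, b):
    """Return {name: (value in a, value in b)} for each preference that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Made-up sample values, for illustration only.
ubuntu_log = """\
Sat Apr 16 12:50:05 2016 |  | max memory usage when active: 2048.00MB
Sat Apr 16 12:50:05 2016 |  | max memory usage when idle: 3686.40MB
Sat Apr 16 12:50:05 2016 |  | max CPUs used: 4
"""
windows_log = """\
Sat Apr 16 12:50:05 2016 |  | max memory usage when active: 4096.00MB
Sat Apr 16 12:50:05 2016 |  | max memory usage when idle: 3686.40MB
Sat Apr 16 12:50:05 2016 |  | max CPUs used: 4
"""

differences = diff_prefs(read_prefs(ubuntu_log), read_prefs(windows_log))
for name, (ubuntu_value, windows_value) in differences.items():
    print(f"{name}: Ubuntu={ubuntu_value} Windows={windows_value}")
```

With the sample text, only the "max memory usage when active" line differs, which is the kind of discrepancy between installations the thread suggests ruling out first. An empty result would point toward looking at where the operating system itself is allocating memory.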
©2024 University of Washington
https://www.bakerlab.org