Memory and CPU problems with Ubuntu 16.04?

Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 79937 - Posted: 26 Apr 2016, 8:38:07 UTC - in response to Message 79933.  


The pattern I have now is to wait until it starts suspending work units. Then I suspend the running units, 4 other work units start running and memory drops back to the area around 30%--and then usage starts creeping up. Eventually it stops running 4 units and I can repeat the process.

This suggests that you have "Leave applications in memory while suspended" unchecked. Each time you suspend a task it is taken out of memory and the work done after the last checkpoint is discarded.

I'm not sure that either Mod.Sense or Chilean has described rosetta's increasing need for memory perfectly precisely. I think rosetta models may require more memory for subsequent stages of processing after the first and that some models proceed through more stages of processing than other models within the same task. The caveat is that I haven't actually looked that closely at rosetta's memory behavior in quite a while and I am vaguely aware that the rosetta team spent some time reexamining rosetta's use of memory in the somewhat recent past. Despite this and the fact that Mod.Sense is almost always exactly right, I still think, given the variety of rosetta protocols, that it is likely that any task increasing its memory further after the initial setup is behaving appropriately and that its need for more memory is not indicative of a memory leak or a bug.

It was clear from your second post that the symptoms you described were most likely the expected result of a memory usage limit with a possible discrepancy between the Ubuntu and Windows installations. This could be checked by answering Link's question then checking the event log (per rjs5's suggestion) to see if BOINC was reading the preferences the way you expected. It would also tell you from where BOINC is reading those preferences. You could compare the event logs of the Windows and Ubuntu installations to confirm the memory limit preferences are the same. Mod.Sense asked you about this in his first response to you.


Most of the responses have been from BOINC 101 with a sub-unit on rosetta. Mod.Sense's suggestion, to step back from the maze you've entered, look where everyone else is pointing, and double-check the basics, is from Troubleshooting 101.

Best,
Snags
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 79938 - Posted: 26 Apr 2016, 9:00:15 UTC - in response to Message 79936.  
Last modified: 26 Apr 2016, 9:04:09 UTC

Sounds like a memory usage limit. EACH rosetta WU uses on average AT LEAST 0.5 GB of RAM (I have 3 right now using 600+ MB). This RAM usage increases as the WU progresses up to a certain maximum. It doesn't start using the maximum amount of RAM it'll eventually use right at the start... thus this slow increase in RAM usage.

This means that having 4 normal WUs running at the same time will show up in your system monitor as 50% RAM usage JUST from Rosetta.
Your BOINC preferences are probably set up to only allow 50-60% of the maximum RAM, thus BOINC suspends WUs until the RAM usage is below this threshold.


Unless I have invisible friends, it sounds like a reasonable or at least sufficiently plausible explanation. If so, 16.04 is moving in the direction of bloatware, but that is certainly no surprise these days. Further so, it is plausible that few people are running similarly old machines and fewer of them are noticing the performance changes, which could explain the paucity of reports from other observers.

Then again, a lack of further comments from me may only indicate that I've given up and I'm running the machine under Windows 10. Much as I dislike Microsoft, I have to say at least this one isn't a flaming lemon. Yes, there are a couple of things I prefer doing from Linux, but nothing urgent right now.

Oh yeah and by the way, the menu bar problem seems to be widely reported over on Launchpad and they have consolidated most of the reports (including mine) into one giant thread there. Not clear how much progress they are making, but after reading most of it, I'd estimate the probability that it is related to this BOINC problem at under 25%. My own guess is that it involves a new dynamic menu feature that doesn't work correctly, but under Appearance settings I switched it back to static menus and I'm not seeing it now.


What Chilean is trying to point out is that if you have limited BOINC to no more than 0.5 GB per core, it is inevitable that you will at least occasionally run into the memory usage limit and see the behavior you have described. Previously you said Ubuntu indicated there was available memory at the same time BOINC was suspending tasks with the "waiting for memory" message. This suggests the BOINC preferences are the limiting factor, not Ubuntu.
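
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It is a deliberate simplification and not how the BOINC client actually schedules work: the function name `tasks_that_fit` is hypothetical, and the 4096 MB machine and 50% limit are illustrative figures, not values read from anyone's machine.

```python
def tasks_that_fit(total_ram_mb, limit_pct, task_mb_each):
    """Count how many tasks fit under a percentage-of-RAM limit.

    Simplified model (an assumption, not BOINC's real scheduler):
    tasks are admitted in order until the next one would push the
    total past the memory budget.
    """
    budget_mb = total_ram_mb * limit_pct / 100.0
    used_mb = 0.0
    running = 0
    for need_mb in task_mb_each:
        if used_mb + need_mb > budget_mb:
            break  # this task would exceed the limit, so it has to wait
        used_mb += need_mb
        running += 1
    return running


# Illustrative figures only: a 4096 MB machine with a 50% limit.
print(tasks_that_fit(4096, 50, [500, 500, 500, 500]))  # early in the run
print(tasks_that_fit(4096, 50, [600, 600, 600, 600]))  # after usage creeps up
```

With these made-up figures, four tasks at 500 MB each stay under the 2048 MB budget, but once each has grown to 600 MB only three fit, which would match the pattern of a task being suspended as usage creeps up.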

At the beginning of the event log BOINC describes your machine and gives a few details of your preference settings. You should find these lines:
Sat Apr 16 12:50:05 2016 | [name of project] | General prefs: from [name of project] (last modified [date time])
and if you are using local preferences instead of web-based preferences:
Sat Apr 16 12:50:05 2016 | | Reading preferences override file
then:
Sat Apr 16 12:50:05 2016 | | max memory usage when active: xxx.xxMB
Sat Apr 16 12:50:05 2016 | | max memory usage when idle: xxx.xxMB
Sat Apr 16 12:50:05 2016 | | max disk usage: xxx.xxGB
Sat Apr 16 12:50:05 2016 | | max CPUs used: x

What are these values and are they the same for both installations? If they are the same, the next step would be to look at the Activity Monitor or whatever it's called in Ubuntu and see precisely where your memory is being allocated.
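
If comparing the two event logs by eye is tedious, the comparison could be sketched in Python along these lines. This assumes the log lines look like the ones quoted above; the function names `read_prefs` and `diff_prefs` are hypothetical helpers, not part of BOINC, and the sample values are invented placeholders.

```python
import re

# Matches the preference lines quoted above, e.g.
# "Sat Apr 16 12:50:05 2016 | | max memory usage when active: 1971.52MB"
PREF_LINE = re.compile(
    r"\|\s*(max memory usage when active|max memory usage when idle|"
    r"max disk usage|max CPUs used):\s*(\S+)\s*$"
)


def read_prefs(log_text):
    """Collect the preference values reported in an event log."""
    prefs = {}
    for line in log_text.splitlines():
        match = PREF_LINE.search(line)
        if match:
            prefs[match.group(1)] = match.group(2)
    return prefs


def diff_prefs(first, second):
    """Return only the settings whose values differ between two logs."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k))
            for k in keys if first.get(k) != second.get(k)}


# Invented sample values, for illustration only.
ubuntu_log = """Sat Apr 16 12:50:05 2016 | | max memory usage when active: 1971.52MB
Sat Apr 16 12:50:05 2016 | | max CPUs used: 4"""
windows_log = """Sat Apr 16 12:50:05 2016 | | max memory usage when active: 2956.80MB
Sat Apr 16 12:50:05 2016 | | max CPUs used: 4"""

print(diff_prefs(read_prefs(ubuntu_log), read_prefs(windows_log)))
```

Paste the start of each installation's event log into the two strings; an empty result would mean the reported limits match and the difference lies elsewhere.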

I don't have a computer science degree but Troubleshooting 101 for every subject I've ever dealt with has included: Rule Out The Obvious.

Best,
Snags
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 79947 - Posted: 27 Apr 2016, 7:45:16 UTC - in response to Message 79937.  


The pattern I have now is to wait until it starts suspending work units. Then I suspend the running units, 4 other work units start running and memory drops back to the area around 30%--and then usage starts creeping up. Eventually it stops running 4 units and I can repeat the process.

This suggests that you have "Leave applications in memory while suspended" unchecked. Each time you suspend a task it is taken out of memory and the work done after the last checkpoint is discarded.

I'm not sure that either Mod.Sense or Chilean has described rosetta's increasing need for memory perfectly precisely. I think rosetta models may require more memory for subsequent stages of processing after the first and that some models proceed through more stages of processing than other models within the same task. The caveat is that I haven't actually looked that closely at rosetta's memory behavior in quite a while and I am vaguely aware that the rosetta team spent some time reexamining rosetta's use of memory in the somewhat recent past. Despite this and the fact that Mod.Sense is almost always exactly right, I still think, given the variety of rosetta protocols, that it is likely that any task increasing its memory further after the initial setup is behaving appropriately and that its need for more memory is not indicative of a memory leak or a bug.

It was clear from your second post that the symptoms you described were most likely the expected result of a memory usage limit with a possible discrepancy between the Ubuntu and Windows installations. This could be checked by answering Link's question then checking the event log (per rjs5's suggestion) to see if BOINC was reading the preferences the way you expected. It would also tell you from where BOINC is reading those preferences. You could compare the event logs of the Windows and Ubuntu installations to confirm the memory limit preferences are the same. Mod.Sense asked you about this in his first response to you.


Most of the responses have been from BOINC 101 with a sub-unit on rosetta. Mod.Sense's suggestion, to step back from the maze you've entered, look where everyone else is pointing, and double-check the basics, is from Troubleshooting 101.

Best,
Snags

Well, at this point I increasingly suspect invisible friends, but... Near as I can tell, there were NO updates to anything during the period when the BOINC Manager problems suddenly went away. Memory utilization appears to be quite stable with 4 units running and using less than 45% of the available memory. The mix of work units does not seem to affect the memory status of the machine, though recently the rb work units seem unusually likely to trigger immediate computation errors, though that problem is not just on the Ubuntu 16.04 machine, but also on other hardware and OSes. (Not a new problem, and so far it just goes away after a few days.)

At the earlier time when I was asked for the other data, it was not accessible (because the menus weren't), though I'm pretty sure that is an unrelated bug in 16.04. Still following the discussion of that menu problem over on Launchpad.

I certainly agree with you about backing down to check the basics, but right now I seem to be in the state of "If it ain't broke, don't fix it." While it would be nice to know what was going on there I'm not going to worry too much until it comes back. Shall we just call it teething problems in 16.04? (Still I have to rate it as the worst version upgrade since they broke the Japanese input system around Gutsy Gibbon time... I think Dapper Drake may have been my first Ubuntu?)
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79948 - Posted: 27 Apr 2016, 13:28:44 UTC

I would have to guess that you got an undesirable set of all high-memory tasks at the same time, over the course of several days, and this essentially exceeded your machine's resources available to BOINC. If you think about it, the BOINC Manager reacted fairly well to the situation and continued crunching through the work as best it could.
Rosetta Moderator: Mod.Sense
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 79964 - Posted: 29 Apr 2016, 2:07:57 UTC - in response to Message 79948.  

I would have to guess that you got an undesirable set of all high-memory tasks at the same time, over the course of several days, and this essentially exceeded your machine's resources available to BOINC. If you think about it, the BOINC Manager reacted fairly well to the situation and continued crunching through the work as best it could.


Could have sworn that I already replied to this? But it seems to have disappeared. I think I said something like "Perhaps so, but it still seems unfair, especially when I do my part and get no credit for trying."

Just saw another example this morning. A computation error on an rb unit that had run some hours and was almost finished. Looking at the log, I see that my computer apparently requested some credit for it, but received nothing. On the one hand, I agree that better results should receive more reward, but on the other hand, it still feels like I'm being penalized for someone else's buggy software. (Ditto that ancient Mac unit. I'm pretty sure it will get no credit because it is way past its deadline. I'm letting it run largely to be impressed by the stability of the Mac in running the work unit for over a month... A bug? Bad assessment of the computational requirements? Whatever. NOT my mistake, but no credit even for the electricity consumed.)

Then again, I think that excessive worry about credit creates a competitive atmosphere that can be almost anti-scientific. Doesn't matter much in a case like seti@home (where I was in the top 1% before BOINC appeared), though the "points frenzy" still bothered me. Much more of a concern where the computations are possibly contributing to journal publications...

Anyway, I'm still dismissing the reported problems as teething for 16.04, and I'm not really concerned if the bugs are Rosetta's or Ubuntu's. I'm just trying to follow the rules of the apparent game and sometimes feeling some annoyance or frustration when they seem broken.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79966 - Posted: 29 Apr 2016, 3:19:41 UTC

Do you mean like these two?
https://boinc.bakerlab.org/rosetta/result.php?resultid=814278750
https://boinc.bakerlab.org/rosetta/result.php?resultid=814299542

As you can see, they were each granted credit equal to their credit claim, which, as you say, for this machine is better than the average.
Rosetta Moderator: Mod.Sense
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,054,272
RAC: 5,361
Message 79967 - Posted: 29 Apr 2016, 4:20:26 UTC - in response to Message 79966.  

Do you mean like these two?
https://boinc.bakerlab.org/rosetta/result.php?resultid=814278750
https://boinc.bakerlab.org/rosetta/result.php?resultid=814299542

As you can see, they were each granted credit equal to their credit claim, which, as you say, for this machine is better than the average.



Both of the workloads failed and were granted what was claimed.

Probably not a good example.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79968 - Posted: 29 Apr 2016, 13:00:30 UTC

It is an example of what we were describing here.
Rosetta Moderator: Mod.Sense
rjs5

Joined: 22 Nov 10
Posts: 273
Credit: 23,054,272
RAC: 5,361
Message 79969 - Posted: 29 Apr 2016, 13:05:23 UTC - in response to Message 79968.  

It is an example of what we were describing here.


Ooooops. Then it is a perfect example! 8-)




©2024 University of Washington
https://www.bakerlab.org