Computing Systems

Informatics Computing Staff jottings

staff.ssh problems

The machine hosting the staff.ssh service (rydell) crashed on the evening of Tuesday 12th April. This was caused by a runaway process which consumed all available memory. The machine was rebooted at 6:30am on Wednesday 13th April and is now working normally.

For those interested in the details: the Linux kernel Out-Of-Memory (OOM) killer did kick in, and it did kill the runaway process. As is often the case, though, this did not free sufficient memory quickly enough, so the OOM killer went on to kill processes indiscriminately, leaving the system running but non-functional.

On both of the ssh login machines there is a limit on the number of processes a user is permitted to have running. This had not previously been changed from the default value of 1024. That default was clearly too high in this case, since the OOM killer kicked in before the limit was reached, and there is unlikely to be a good reason for anyone to run that many processes on an ssh login server. To help prevent this problem recurring we are going to drop the limit to 200, which is far more than anyone currently runs but significantly lower than the default. A limit on the number of processes does not, of course, prevent a small number of large processes from consuming all available memory, and it is unclear whether that can be prevented at present. When we upgrade to SL6 later in the year we will review the situation and see whether newer features of the Linux kernel allow us to do anything more to prevent total resource consumption and the crashes that follow.
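For reference, on a PAM-based Linux system a per-user process limit of this kind is normally set through the pam_limits module. The sketch below is illustrative only: the file location is the conventional one, and the value of 200 simply mirrors the limit described above rather than being a general recommendation.

    # /etc/security/limits.conf (applied by the pam_limits module at login)
    # Cap every user at 200 processes. The soft limit is what is enforced;
    # the hard limit is the ceiling the user cannot raise it beyond.
    *    soft    nproc    200
    *    hard    nproc    200

A user can check the limit currently in effect with "ulimit -u". A process that tries to fork beyond the limit simply gets an EAGAIN error, rather than pushing the machine towards the OOM killer.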

4 replies to “staff.ssh problems”

  1. Iain says:

    Maybe encourage users to set a “ulimit -v” in their shell configuration to prevent them accidentally trying to allocate stupid amounts of memory? One form of encouragement might be setting a default in the system-wide and default shell configurations (see the sketch after these replies).

  2. squinney says:

    The problem is that what is an appropriate amount of memory on one machine (e.g. a compute server) is completely wrong on another machine (e.g. an ssh login server with 1GB of RAM). I suspect most people would not want the hassle of altering it depending on their current server and task. I’m hopeful that the cgroups support in newer kernels will allow us to set up some sensible policies for different types of machines.

  3. Alan Bundy says:

    I assume if someone had a really good reason for a higher limit then this could be granted on a one-off basis.

  4. squinney says:

    Yes, we can make exceptions to the resource limits on a per-user basis where necessary.
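Both suggestions above can be sketched concretely. The snippet below is illustrative only: the 512MB figure and the “loginusers” group name are assumptions made for this example, and the cgroup interface shown is the v1 memory controller found in SL6-era kernels (the mount point varies between distributions).

    # Iain's suggestion: a default virtual-memory cap set in the
    # system-wide shell configuration, e.g. /etc/profile.d/memlimit.sh.
    # ulimit -v takes kilobytes; a soft limit can be raised by the user
    # up to the hard limit.
    ulimit -S -v 524288    # 512MB, illustrative value only

    # squinney's hope: a per-machine policy via the cgroup v1 memory
    # controller. Run as root; the mount point is distribution-specific.
    mkdir -p /sys/fs/cgroup/memory                        # create mount point if needed
    mount -t cgroup -o memory cgroup /sys/fs/cgroup/memory
    mkdir -p /sys/fs/cgroup/memory/loginusers
    echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/memory/loginusers/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/loginusers/tasks      # move this shell into the group

Unlike “ulimit -v”, which caps each process separately, the cgroup limit applies to the group’s total memory use, which is the kind of aggregate cap that would have contained the crash described above.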
