Dear pii3.brahms.bnl.gov users,

Apparently pii3 has developed a peculiar problem. The problem is specific to the 2.2 Linux kernel series and is likely to stay with us for a while (AFAIK, the coming 2.4 Linux kernel series also has the same problem). The linux-kernel hackers are aware of it and will hopefully come up with a solution eventually.

This is what is happening: when one or more processes consume all available memory (both real and swap), Linux tries to free up some memory by killing off processes. Unfortunately, the algorithm it uses to decide which processes to kill is flawed. Often, instead of killing the offending user processes (the ones that consumed all the memory), it kills critical system processes, such as the name server (named) or the NIS server (ypserv). I also see some dead httpd and mysqld processes, but both the Apache web server and the MySQL database server seem to survive the processcide.

I have identified two sources of unlimited memory consumption:

- user processes (e.g. netscape and root), and
- CGI scripts run by the web server.

To at least partially protect the critical services running on pii3 (named, NIS, system logger), I have decided to impose memory limits on both of these sources. The current memory limits for all users are essentially "unlimited".

For user processes, I will set the soft memory limit to 200 Mbytes. This should have no effect on user programs: pii3 has only 128 Mbytes of real memory, so a process that consumes more than that indicates a bug. Also note that a soft memory limit can be raised by the user when needed.

For CGI scripts run from the web server, I will try to set the memory limit to 64 Mbytes.

Hopefully this will make pii3.brahms.bnl.gov a little bit more stable. pii3 will be rebooted, and an additional message will go out when the memory limits are actually implemented.

--
Konstantin Olchanski
Physics Department, Brookhaven National Laboratory, Long Island, New York
olchansk@bnl.gov
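
As an illustration of the soft limit described in the message, here is a minimal Python sketch using the standard resource module on Linux. It is not the mechanism actually used on pii3, which the message does not specify. The helper names set_soft_limit and restore_soft_limit are hypothetical, and the choice of RLIMIT_AS (total address space) is an assumption, since the message does not say which resource limit is meant.

```python
import resource

MB = 1024 * 1024


def set_soft_limit(limit_bytes, which=resource.RLIMIT_AS):
    """Set only the soft limit; return the previous (soft, hard) pair.

    RLIMIT_AS is an assumed choice; the hard limit is left untouched,
    so an unprivileged process can later raise the soft limit again.
    """
    soft, hard = resource.getrlimit(which)
    # A soft limit may never exceed a finite hard limit.
    if hard != resource.RLIM_INFINITY and limit_bytes > hard:
        raise ValueError("soft limit cannot exceed the hard limit")
    resource.setrlimit(which, (limit_bytes, hard))
    return soft, hard


def restore_soft_limit(previous, which=resource.RLIMIT_AS):
    """Put back a (soft, hard) pair previously returned by set_soft_limit."""
    resource.setrlimit(which, previous)
```

Calling set_soft_limit(200 * MB) would apply the 200 Mbyte figure from the message to the current process and its children. Allocations beyond the limit then fail (a MemoryError in Python), and the process can raise its own soft limit again as long as it stays within the hard limit.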
This archive was generated by hypermail 2b29 : Thu Jul 13 2000 - 16:53:40 EDT