[auton-users] LOP1 policy change: Please Read
Michael J. Baysek
mjbaysek at cs.cmu.edu
Mon Feb 16 16:24:49 EST 2009
Hi everyone. This mail is to notify you of a change that has taken
effect on LOP1 that may affect how you can run processes there.
*SUMMARY*
Use of LOP1 is now limited to processes which require less than 2 GB of
memory (including overcommit) to run. Any process using more than this
will be terminated automatically, and without warning. Be aware that
this may cause problems with software packages like Matlab or ASL even
under normal use cases.
*BACKGROUND*
Because LOP1 is used by many as the default server to access CVS, it has
a special role in the lab. It needs to be available for that purpose,
even when overall lab CPU load is high. When LOP1 is overloaded,
particularly by high-memory jobs and the swap-related disk thrashing
they cause, the result is effectively a denial of service for many
users of CVS. Until now, it was all too easy to do this, even
completely by accident (which is typically how it happens).
*WHAT'S CHANGED?*
All new logins on LOP1 will be subject to a ulimit on virtual memory
(the VIRT column in top). When a single process asks for more than 2 GB
of virtual memory, the process will now segfault and terminate. All low
memory (< 2 GB) jobs will still run as expected.
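
To see what limit applies to your own session, something like the
following may help. It is only a sketch: the variable names are made up
for illustration, and the exact value configured on LOP1 is assumed here
to be 2 GB expressed in kilobytes, which is the unit "ulimit -v" uses in
bash.

```shell
#!/bin/bash
# 2 GB expressed in KB (assumed value; the exact number set on LOP1 is not stated).
LIMIT_KB=$((2 * 1024 * 1024))

# Print the virtual-memory limit of the current shell ("unlimited" if none is set).
current_limit=$(ulimit -v)
echo "current limit (KB): $current_limit"

# Apply the cap inside a command substitution, which runs in a subshell,
# so the limit of the calling shell is left untouched.
capped_limit=$(ulimit -v "$LIMIT_KB" 2>/dev/null; ulimit -v)
echo "limit inside subshell (KB): $capped_limit"
```

Running a job inside a similar subshell on another machine is one way to
find out in advance whether it would stay under the cap.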
*WHY THE HARD LIMIT?*
Imposing the limit helps ensure that access to CVS is not disrupted
during heavy load. A hard limit is the only way to guarantee this.
*I STILL NEED TO USE LOP1, WHAT CAN I DO?*
It is highly recommended that you DO NOT run applications like Matlab on
LOP1; use another machine instead. Appropriate warnings have been
added to the "matlab" and "asl" commands on LOP1 in hopes of preventing
problems. Be aware that not all apps can issue this warning.
Finally, if you are concerned about how the limits might affect your
process, you have two options: 1) run on any other machine, or 2) have
your script check the shell variable "$?" immediately after the job
exits for return code 139, which is the code for a segfault. If a
particular run segfaults on LOP1, you could simply record your launch
arguments to a file for execution on a different machine.
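
The second option could be sketched roughly as below. The function name
"run_or_record" and the file name "rerun_elsewhere.txt" are hypothetical,
chosen only for illustration. Exit status 139 is 128 plus signal 11
(SIGSEGV), which is how bash reports a process killed by a segfault.

```shell
#!/bin/bash
# File that collects command lines to rerun on another machine (assumed name).
RETRY_FILE="${RETRY_FILE:-rerun_elsewhere.txt}"

# Hypothetical wrapper: run the given command and, if it exits with 139
# (segfault), append its command line to $RETRY_FILE.
run_or_record() {
    "$@"
    status=$?
    if [ "$status" -eq 139 ]; then
        printf '%s\n' "$*" >> "$RETRY_FILE"
    fi
    return "$status"
}

# Example (my_job is a placeholder for your own program):
#   run_or_record ./my_job --input data.txt
```

Note that "$*" joins the arguments with spaces, so arguments that contain
spaces would need extra quoting before the recorded line can be replayed
exactly.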
--
Michael J. Baysek, Systems Analyst
Carnegie Mellon University - Auton Lab
www.cmu.edu - www.autonlab.org
412-268-8939