[auton-users] Fwd: LOU1 restored
Donghan (Jarod) Wang
donghanw at cs.cmu.edu
Mon Feb 11 10:35:19 EST 2013
Hi everyone,
LOU1, a compute node, was rebooted after it ran out of memory. All jobs were
terminated gracefully. All services on the node are back up and running now.
Please check your jobs.
Date/Time
---------------
Rebooted on Feb. 11 10:09 AM
Description
----------------
A user job exhausted both RAM and swap, which caused the server to become
unresponsive.
Before the reboot, all jobs were terminated gracefully so that they had a
chance to save their data to disk. I strongly recommend that you check your
jobs to make sure their data is consistent.
Please let me know if you have any questions/concerns.
Thanks,
Jarod