[auton-users] Fwd: LOT2 restored

Donghan (Jarod) Wang donghanw at cs.cmu.edu
Fri Feb 1 14:30:37 EST 2013


Hi everyone,

LOT2, a compute node, rebooted unexpectedly after a kernel panic. All jobs
running on it were terminated. All services are back up and running now.
Please check your jobs.

Description
----------------
A user job exhausted the memory and overloaded the system, resulting in a
system crash.

Date/Time
---------------
Crashed on Feb. 1, 1:15 PM
Rebooted on Feb. 1, 2:15 PM

For the next few hours, users are strongly advised to avoid running jobs
that may overload the system. A faulty disk was replaced this morning and
the system is still syncing the RAID array, so any system crash will delay
the sync process. The recovery is expected to finish in 12 hours.

Please let me know if you have any questions or concerns.

Thanks,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238