[auton-users] Neill1 Restored
Donghan (Jarod) Wang
donghanw at cs.cmu.edu
Thu Feb 14 18:22:48 EST 2013
Dear Neill users,
Neill1, a compute node, was rebooted due to system overload. All jobs
were terminated gracefully. All services on the node are back up and
running now.
Date/Time
---------------
Rebooted on Feb. 14 6:10 PM
Description
----------------
A user launched over 100 jobs, which far exceeded the node's CPU capacity
(it has 4 CPU cores). The system then stopped responding to outside
requests.
Before the reboot, all jobs were terminated gracefully so that they had a
chance to save their data to disk. I strongly recommend that you check your
jobs to ensure their results are consistent.
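If you need to run a large batch of jobs, one way to avoid overloading a
node is to cap how many run at once. The following is only a minimal sketch,
assuming your jobs are started from Python as external commands; `run_job`
and `run_all` are hypothetical names, not part of any tool installed on
Neill1. On a shared node, consider passing a `max_workers` value lower than
the core count so that other users still have CPU available.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one command (hypothetical job) and return its exit code."""
    return subprocess.run(cmd).returncode


def run_all(cmds, max_workers=None):
    """Run all commands, with at most max_workers running at the same time."""
    if max_workers is None:
        # Assumption: default to the number of cores (4 on a node like Neill1).
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread blocks on one subprocess, so the pool size
        # bounds the number of jobs running concurrently.
        return list(pool.map(run_job, cmds))


if __name__ == "__main__":
    # 100 placeholder jobs; replace with your real commands.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(100)]
    codes = run_all(jobs)
    print(sum(1 for c in codes if c != 0), "jobs failed")
```

With this pattern, all 100 jobs still complete, but they queue up instead of
competing for the CPU all at once.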
Please let me know if you have any questions/concerns.
Thanks,
Jarod