[auton-users] IMPORTANT, PLEASE READ: Auton System up, and review
Michael J. Baysek
mjbaysek at cs.cmu.edu
Sat Mar 28 22:56:57 EDT 2009
Today, at approximately 12:00 noon, our primary file server, LOFTY, went
down. This is the server that provides "BigPapa" file services, among
several other system-critical services. The usual methods of remote
access via the serial console yielded no results.
I arrived on the scene a bit after 1:00 PM. The boot disk had failed,
with unreadable sectors. I restored to a working drive from last
night's backup. The system appeared to be fine, and after a quick
stress test, I returned home.
When I checked on the system again, I found it had been rebooting itself
periodically since I left, so it was necessary to switch "BigPapa" file
services to the backup server. I completed this switchover at 22:40, and
BigPapa file services are now restored. This will keep things going
while I investigate what is wrong with LOFTY.
Be aware that, for the time being, the "/mnt/userdirs" directory is
unavailable, as is the Subversion repository. These services should be
restored on Monday.
Any processes that were running on the LOPs during the outage were not
terminated. However, please check any output files they generated for
signs of truncation or corruption. This "should not" happen, but it's
always safer to check.