[auton-users] IMPORTANT, PLEASE READ: Auton System up, and review

Michael J. Baysek mjbaysek at cs.cmu.edu
Sat Mar 28 22:56:57 EDT 2009


Today, at approximately 12:00 noon, our primary file server, LOFTY, went
down.  This is the server that provides "BigPapa" file services, among
several other system-critical services.  The usual method of remote
access, the serial console, did not yield any results.


I arrived on the scene a bit after 1:00 PM.  The boot disk had failed, 
with unreadable sectors.  I restored to a working drive from last 
night's backup.  The system appeared to be fine, and after a quick 
stress test, I returned home.


When I checked on the system again later, I found that it had been
rebooting itself periodically since I left.  It was therefore necessary
to switch "BigPapa" file services to the backup server.  I completed
this switchover at 22:40, and BigPapa file services are now restored.
This will keep things going while I find out what's wrong with LOFTY.


Be aware that the "/mnt/userdirs" directory is temporarily unavailable,
as is the Subversion repository.  These services should be restored on
Monday.


Any processes that were running on the LOPs during the outage were not
terminated.  However, please check any output files they generated for
signs of truncation or corruption.  This "should not" happen, but
it's always safer to check.
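
As a starting point for that check, here is a minimal Python sketch (not
an official Auton Lab tool).  It lists files under a directory that are
either empty or were last modified during the outage window.  The
function name find_suspect_files and the directory "my_output_dir" are
hypothetical, and the window (12:00 to 22:40 on March 28) is taken from
the times in this message, assumed to be local time.

```python
import os
from datetime import datetime


def find_suspect_files(root, window_start, window_end):
    """Return sorted (path, reason) pairs for files worth a manual look.

    A file is flagged if it is empty, or if its last-modified time falls
    inside [window_start, window_end]. Naive datetimes are treated as
    local time.
    """
    start_ts = window_start.timestamp()
    end_ts = window_end.timestamp()
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if info.st_size == 0:
                suspects.append((path, "empty"))
            elif start_ts <= info.st_mtime <= end_ts:
                suspects.append((path, "modified during outage"))
    return sorted(suspects)


if __name__ == "__main__":
    # Assumed outage window, local time; the directory is a placeholder.
    outage_start = datetime(2009, 3, 28, 12, 0)
    outage_end = datetime(2009, 3, 28, 22, 40)
    for path, reason in find_suspect_files(
        "my_output_dir", outage_start, outage_end
    ):
        print(f"{reason}: {path}")
```

This only narrows down which files to look at.  It cannot detect
corruption inside a file, so anything it flags still needs to be
inspected by hand or compared against a known-good copy.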




