[auton-users] Lofty Interruption Last Night

Jacob Joseph jmjoseph at andrew.cmu.edu
Thu Oct 7 10:57:55 EDT 2004


Late last night, sometime between 9pm and midnight, we hit a bug in the 
firmware of the primary disk array (BigPapa).  It took the array down, 
and it stayed down until I noticed around 1am.  After waiting out some 
emergency backups and an upgrade of the buggy firmware, I brought the 
array back up for good around 4-5am.

While nothing on the disks was lost, there is a very slim chance that 
writes in progress were interrupted at a point that could have caused 
corruption.  Because of the way we do NFS, and because writes are not 
cached in memory, this is extremely unlikely.  All transfers resumed 
when the array was brought back up.  To give a concrete picture, I 
would expect any such corruption to show up as a few bytes missing from 
the middle of a written file.

Despite the low likelihood of any trouble, I think it's prudent to let 
everyone know the two critical times at which Lofty was rebooted 
uncleanly.  If you were writing data either between 9pm and midnight 
(I don't know the exact time of the shutdown offhand) or at about 3am, 
and you see problems with your data, keep this issue in mind.  Please 
note that reads would not have been affected in any case.

I've tried to cover everything, but please let me know if you have any 
questions.

-Jacob


