[auton-users] Lofty Interruption Last Night
Jacob Joseph
jmjoseph at andrew.cmu.edu
Thu Oct 7 10:57:55 EDT 2004
Late last night, between 9pm and midnight, the primary disk array
(BigPapa) hit a firmware bug and went down. It stayed down until I
noticed around 1am. After waiting out some emergency backups and
upgrading the buggy firmware, I brought the array back up for good
around 4-5am.
While nothing on the disks was lost, there is a slim-to-nil chance that
writes in progress were interrupted at a point that could have resulted
in corruption. Because of the way we do NFS and the lack of in-memory
caching during writes, this is extremely unlikely. All transfers resumed
when the array was brought back up. To be concrete, I can imagine such
corruption showing up as a few bytes missing from the middle of a
written file.
Despite the low likelihood of any trouble, I think it's prudent to let
everyone know the two critical times when Lofty was uncleanly rebooted.
If you were writing between 9pm and midnight (I do not know the exact
time of the shutdown offhand) or at about 3am, and you see problems
with your data, it would be wise to keep this issue in mind. Please
note that reads would not have been affected in any case.
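For anyone who wants to check their own files, here is a minimal Python
sketch that lists files whose modification time falls inside the two
windows. The window boundaries are assumptions based loosely on the
times above (9pm to midnight on Oct 6, and half an hour either side of
3am on Oct 7, local time), and the names WINDOWS and
files_modified_in_windows are purely illustrative. A file rewritten
since then carries a newer timestamp and will not show up.

```python
import os
from datetime import datetime

# Assumed windows in local time; adjust if the exact times become known.
WINDOWS = [
    (datetime(2004, 10, 6, 21, 0), datetime(2004, 10, 7, 0, 0)),
    (datetime(2004, 10, 7, 2, 30), datetime(2004, 10, 7, 3, 30)),
]


def files_modified_in_windows(root, windows):
    """Return sorted paths under root whose mtime lies in any window."""
    # Convert the datetime pairs to epoch seconds once, up front.
    spans = [(start.timestamp(), end.timestamp()) for start, end in windows]
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Skip files that vanish or cannot be read.
                continue
            if any(lo <= mtime <= hi for lo, hi in spans):
                hits.append(path)
    return sorted(hits)
```

Calling files_modified_in_windows("/path/to/your/data", WINDOWS) returns
the candidate files. A match only means the file was last written in
one of the windows, not that it is damaged.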
While I've tried to address everything, please let me know if you have
any questions.
-Jacob