Hard disk drive failure

Predrag Punosevac predragp at andrew.cmu.edu
Mon Apr 23 17:59:48 EDT 2018


Dear Autonians,

Normally I try to keep quiet about occasional HDD failures that are
fixable. However, I suspect some of you noticed that
/zfsauton/public and /zfsauton/data went down for about 15 minutes and
that some of your scripts crashed.

Unfortunately, one of the HDDs in the ZFS pool hosting those two
datasets was failing long S.M.A.R.T. tests and had to be replaced on
short notice. Since I prefer a safer reboot over a hot swap, the server
was down for about 15 minutes (I don't like to risk accidentally
offlining a second HDD in the same pool). The affected ZFS
pool is being resilvered as I type this message.

root@uranus:~ # zpool status backups
  pool: backups
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 23 17:48:37 2018
        143G scanned out of 5.72T at 302M/s, 5h23m to go
        13.1G resilvered, 2.44% done
config:

        NAME                        STATE     READ WRITE CKSUM
        backups                     DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            da0                     ONLINE       0     0     0
            da1                     ONLINE       0     0     0
            da2                     ONLINE       0     0     0
            replacing-3             OFFLINE      0     0     0
              17410010232298071688  OFFLINE      0     0     0  was /dev/da3/old
              da3                   ONLINE       0     0     0  (resilvering)
            da4                     ONLINE       0     0     0
            da5                     ONLINE       0     0     0
            da6                     ONLINE       0     0     0
            da7                     ONLINE       0     0     0
            da8                     ONLINE       0     0     0
            da9                     ONLINE       0     0     0

errors: No known data errors
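
For anyone who wants to sanity-check the resilver figures above, here is
a minimal Python sketch that recomputes the progress and the remaining
time from the "scanned out of" line. It is only an illustration: the
helper names (to_bytes, resilver_estimate) are made up for this example,
and it assumes that zpool reports sizes in binary (1024-based) units and
that the percentage is simply scanned divided by total.

```python
import re

# Assumption: zpool prints sizes with binary (1024-based) unit suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a size such as '5.72T' to a byte count."""
    return float(text[:-1]) * UNITS[text[-1]]


def resilver_estimate(scan_line):
    """Return (percent_scanned, seconds_remaining) for a scan line."""
    m = re.search(
        r"([\d.]+[KMGT]) scanned out of ([\d.]+[KMGT]) at ([\d.]+[KMGT])/s",
        scan_line,
    )
    if m is None:
        raise ValueError("unrecognized scan line")
    scanned, total, rate = (to_bytes(g) for g in m.groups())
    return 100.0 * scanned / total, (total - scanned) / rate


# The line copied from the zpool status output in this message.
line = "143G scanned out of 5.72T at 302M/s, 5h23m to go"
percent, seconds = resilver_estimate(line)
hours, rem = divmod(round(seconds / 60), 60)
print(f"{percent:.2f}% scanned, about {hours}h{rem:02d}m to go")
```

Under those assumptions the script prints 2.44% and about 5h23m, which
agrees with what zpool reported. The estimate depends on the scan rate
staying near 302M/s, so the real completion time may drift.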


Best,
Predrag

