Main file server in serious danger
Predrag Punosevac
predragp at cs.cmu.edu
Wed Aug 23 19:14:43 EDT 2017
Dear Autonians,
As you know, we do not have any user quotas on the main file server.
The working assumption is that our users are smart, considerate, and
responsible. Unfortunately, while I was away last week somebody
offloaded a large amount of data to the main file server, practically
filling it up:
Filesystem                  Size  Used  Avail  Use%  Mounted on
gaia:/mnt/zfsauton/zdata    1.3T  1.2T    44G   97%  /zfsauton/data
gaia:/mnt/zfsauton/project  1.6T  1.5T    44G   98%  /zfsauton/project
gaia:/mnt/zfsauton/home      18T   15T   2.8T   85%  /zfsauton/home
The above numbers are with ZFS compression; in reality we now have
almost 24 TB of home directories alone. The ZFS pools are over 80%
full, which means that resilvering (in the case of a drive failure)
will take months instead of a day or two.
At this point I need you to restrict writing to the file system to the
bare minimum, as a crash of the main file server will make your home
directories unavailable and nobody will be able to log into our
infrastructure.
I have been running du for the past 2 hours to figure out the
culprits, but the command is not responsive. If you are the one who
dumped a large amount of data, please do not try to delete it. You
will not be able to free anything from ZFS that way; you will just
create more metadata and make things worse. I need to stop ZFS
snapshots and replications before anything can really be deleted. The
process is very cumbersome and has been done only once in the past.
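For reference, once du becomes responsive, the usual way to spot the
biggest offenders is to rank per-directory totals. The sketch below
demonstrates this on a throwaway directory so it is safe to run
anywhere; on the file server the argument would be a real path such as
/zfsauton/home/* (the mount point from the df output above).

```shell
# Sketch: rank top-level directories by size, largest first.
# The throwaway demo directory stands in for /zfsauton/home.
demo=$(mktemp -d)
mkdir -p "$demo/big" "$demo/small"
head -c 1048576 /dev/zero > "$demo/big/blob"    # 1 MiB file
head -c 4096    /dev/zero > "$demo/small/blob"  # 4 KiB file
du -sk "$demo"/* | sort -rn                     # sizes in KiB, descending
largest=$(du -sk "$demo"/* | sort -rn | head -1 | cut -f2)
echo "largest: $largest"
rm -rf "$demo"
```

Note that even after the offending files are found, the space only
comes back once the snapshots referencing them are destroyed, which is
why deletion has to wait for the snapshot/replication work above.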
Another option is to replace the current 3 TB HDDs with, for example,
6 TB HDDs and double the space to close to 50 TB. The cost of that is
approximately $4k. We also have a huge (150 TB) special-purpose file
server which is not currently utilized very much.
Best,
Predrag
More information about the Autonlab-users
mailing list