Main file server in serious danger

Predrag Punosevac predragp at cs.cmu.edu
Wed Aug 23 19:14:43 EDT 2017


Dear Autonians,

As you know, we do not enforce any user quotas on the main file server.
The working assumption is that our users are smart, considerate, and
responsible. Unfortunately, while I was away last week somebody offloaded
a large amount of data to the main file server, practically filling it up:

Filesystem                  Size  Used Avail Use% Mounted on
gaia:/mnt/zfsauton/zdata    1.3T  1.2T   44G  97% /zfsauton/data
gaia:/mnt/zfsauton/project  1.6T  1.5T   44G  98% /zfsauton/project
gaia:/mnt/zfsauton/home      18T   15T  2.8T  85% /zfsauton/home

The above numbers are with ZFS compression. In reality we now have almost
24 TB in home directories alone. The ZFS pools are over 80% full, which
means that resilvering (in the case of a drive failure) will take months
instead of a day or two.
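If you want to see how much your own tree contributes, GNU du can report both the on-disk size and the logical (pre-compression) size; the gap between the two is roughly what ZFS compression is saving. A small sketch — the path is just an example, any directory works:

```shell
#!/bin/sh
# Compare on-disk vs apparent (logical) size of a directory tree.
# On ZFS with compression enabled, the apparent size can far exceed
# the numbers df reports. The default path is only an example.
T=${1:-/zfsauton/home}
printf 'on-disk KB:  %s\n' "$(du -sk "$T" 2>/dev/null | cut -f1)"
printf 'logical KB:  %s\n' "$(du -sk --apparent-size "$T" 2>/dev/null | cut -f1)"
```

Note that --apparent-size is a GNU du option; on the BSD side the equivalent is du -A.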

At this point I need you to restrict writing to the file system to the
bare minimum, as a crash of the main file server would make your home
directories unavailable and nobody would be able to log into our
infrastructure.

I have been running du for the past 2 hours to figure out the culprits,
but the command is not responsive. If you are the one who dumped a large
amount of data, please do not try to delete it. You will not be able to
free anything from ZFS that way; you will just create more metadata and
make things worse. I need to stop ZFS snapshots and replications before
anything can really be deleted. The process is very cumbersome and has
been done only once in the past.
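For those wondering why deleting files does not free space: blocks that are referenced by a snapshot stay allocated until the snapshot itself is destroyed. The cleanup I have to do looks roughly like the fragment below — the dataset and snapshot names are assumptions for illustration, and this needs a live ZFS pool, so treat it as a sketch rather than something to run:

```shell
#!/bin/sh
# Sketch (dataset/snapshot names are assumptions, not our real ones).
# First see which snapshots are pinning space, largest first:
zfs list -t snapshot -o name,used -s used

# A dry run (-n -v) shows how much destroying a snapshot would free,
# without actually destroying anything:
zfs destroy -nv zfsauton/home@auto-2017-08-01

# Only then destroy it for real; the replication schedule has to be
# paused first or the snapshot will simply be recreated:
# zfs destroy zfsauton/home@auto-2017-08-01
```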


Another option is to replace the current 3 TB HDDs with, for example,
6 TB HDDs and double the space to close to 50 TB. The cost of that is
approximately $4k. We also have a huge (150 TB) special-purpose file
server which is currently not utilized very much.
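For the curious, growing a pool by swapping drives is done one disk at a time, and each swap forces a full resilver — which is exactly why the >80% fill level matters. A rough sketch, with pool and device names purely hypothetical:

```shell
#!/bin/sh
# Sketch only -- pool name and device names are made up for illustration.
# Each replace triggers a full resilver, so on a pool this full every
# swap could take a very long time.
# zpool set autoexpand=on zfsauton     # let the pool grow once all disks are swapped
# zpool replace zfsauton da3 da9       # old 3 TB disk -> new 6 TB disk
# zpool status zfsauton                # watch resilver progress before the next swap
```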


Best,
Predrag


More information about the Autonlab-users mailing list