network/NFS issues part II

predragp at andrew.cmu.edu
Sat Mar 14 23:19:20 EDT 2015


Dear Autonians,

After further troubleshooting of the network/NFS problem, I have pinpointed
the issue to the main file server, GAIA. To complicate matters, I realized
that /zfsauton/project and /zfsauton/data are mounted on the Neill group
computing nodes for the Highmark project. Nobody should be writing any data
to /zfsauton/data (it is mounted rw, so writing is possible), which leaves
a wedged /zfsauton/project or your /zfsauton/home directory (not relevant
for members of the Neill group) as the cause of all the trouble.

Careful inspection of the Collectd disk write data for the Gaia HDDs reveals
lots of tiny disk writes (the speed never exceeds 70k Bytes/s), which is
pathetic (for reference, the current speed on Neill-ZFS is 800k Bytes/s per
disk).
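
For anyone who wants to repeat this kind of comparison, here is a minimal
Python sketch of the arithmetic. It is not how Collectd computes its graphs;
it only assumes you have two cumulative bytes-written counters per disk taken
some seconds apart. The disk names, the counter values, and the 100,000
Bytes/s threshold are all made up for illustration, with the counters chosen
to reproduce the two rates quoted above.

```python
def write_rate(bytes_before, bytes_after, interval_s):
    """Average write rate in Bytes/s between two cumulative counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (bytes_after - bytes_before) / interval_s


def slow_disks(samples, interval_s, threshold=100_000):
    """Return {disk: rate} for disks writing slower than threshold Bytes/s.

    samples maps a disk name to a (bytes_before, bytes_after) pair.
    The default threshold is an arbitrary illustrative value.
    """
    rates = {
        disk: write_rate(before, after, interval_s)
        for disk, (before, after) in samples.items()
    }
    return {disk: rate for disk, rate in rates.items() if rate < threshold}


# Hypothetical counters sampled 10 seconds apart.
samples = {
    "gaia-disk0": (0, 700_000),     # 70k Bytes/s
    "neill-disk0": (0, 8_000_000),  # 800k Bytes/s
}
print(slow_disks(samples, interval_s=10))
```

With these sample numbers only the hypothetical gaia-disk0 entry falls under
the threshold.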

Anyhow, at this point I am confident that members of the Neill group are not
seriously affected unless they work on Highmark data. I would be happy to
temporarily unmount the /zfsauton/project and /zfsauton/data directories from
Neill[1-4] to make sure those four computing nodes are 100% OK.

The rest of the lab might experience various issues. Your safest bet for
logging into the lab is shell.autonlab.org, since NFS shares are not mounted
there. If the issues persist, I will be forced to reboot the main file server
(current uptime 267 days).

Predrag




