ssh login problems (NFS server killed due to overload)
Predrag Punosevac
predragp at andrew.cmu.edu
Mon Oct 24 11:43:05 EDT 2022
Dear Autoninas,
I got reports this morning from several of you (Ifi, Abby, Ben, Vedant)
about problems accessing the system. After a bit of investigation, I traced
the problem to the main file server. The server (NFS instance) appears to
be dead or severely degraded due to overload.
I am afraid that the only remedy will be to reboot the machine, perhaps
followed by a reboot of all 45+ computing nodes. This will result in a
significant loss of work and productivity. We went through this exercise
less than two months ago.
The Auton Lab cluster is not policed for rogue users. Its usability depends
on the collegial behaviour of each of our 130 members. The use of scratch
directories instead of taxing NFS is well described in the documentation,
and as recently as last week I added extra scratch on at least four
machines.
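As an illustration only, a job can do its heavy I/O in node-local scratch
and touch NFS once at the end. The sketch below is a minimal example in
Python. The function name `run_in_scratch` and its parameters are
hypothetical and are not taken from the lab documentation. The scratch
location defaults to the system temporary directory because the actual
scratch paths on the nodes are not given here.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch(job, results_dir, scratch_root=None):
    """Run job(workdir) in a local scratch directory, then copy outputs once.

    scratch_root is a hypothetical parameter: pass the node's scratch path
    if you know it; otherwise the system temporary directory is used.
    """
    root = scratch_root or tempfile.gettempdir()
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=root))
    try:
        # All intermediate reads and writes happen on local disk.
        job(workdir)
        # Copy the whole working directory to the destination in one pass.
        results = Path(results_dir)
        shutil.copytree(workdir, results, dirs_exist_ok=True)
    finally:
        # Always free the scratch space, even if the job failed.
        shutil.rmtree(workdir, ignore_errors=True)
    return results
```

Here `job` is any callable that writes its files under the directory it is
given, and `results_dir` would typically be a directory on NFS. Note that
`dirs_exist_ok` requires Python 3.8 or newer.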
Best,
Predrag