ssh login problems (NFS server killed due to overload)

Predrag Punosevac predragp at andrew.cmu.edu
Mon Oct 24 11:43:05 EDT 2022


Dear Autonians,

I got reports this morning from several of you (Ifi, Abby, Ben, Vedant)
about problems accessing the system. After a bit of investigation, I traced
the problem to the main file server. The server (NFS instance) appears to
be dead or severely degraded due to overload.

I am afraid that the only remedy will be to reboot the machine, perhaps
followed by a reboot of all 45+ computing nodes. This will result in a
significant loss of work and productivity. We went through this exercise
less than two months ago.

The Auton Lab cluster is not policed for rogue users. Its usability depends
on the collegial behaviour of each of our 130 members. The use of scratch
directories instead of taxing NFS is well described in the documentation,
and as recently as last week I added extra scratch space on at least four
machines.
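
For anyone unsure what "use scratch instead of NFS" looks like in practice,
here is a minimal sketch of the general pattern: read the input from NFS
once, do all the heavy I/O on the node-local scratch disk, and write the
result back to NFS once. The function name run_on_scratch, its parameters,
and any paths you pass in are hypothetical and are not taken from the lab
documentation; please check the documentation for the actual scratch
locations on each machine.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_scratch(nfs_input: Path, nfs_output_dir: Path, scratch_root: Path, work) -> Path:
    """Stage input on local scratch, run `work` there, copy the result back.

    `work` is a caller-supplied function (hypothetical interface) that takes
    the local input path and the working directory and returns the path of
    the result file it wrote inside that directory.
    """
    # Private working directory on the node-local scratch disk.
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        local_input = workdir / nfs_input.name
        shutil.copy2(nfs_input, local_input)         # single read from NFS
        local_result = work(local_input, workdir)    # heavy I/O stays local
        nfs_output_dir.mkdir(parents=True, exist_ok=True)
        final = nfs_output_dir / local_result.name
        shutil.copy2(local_result, final)            # single write to NFS
        return final
    finally:
        # Free the scratch space whether or not the job succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

The point of the design is that intermediate files, checkpoints, and
repeated reads never touch the file server; only the first copy in and the
last copy out do.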

Best,
Predrag