lop1 now works!
Predrag Punosevac
predragp at cs.cmu.edu
Tue Jan 26 15:07:52 EST 2016
Junier Oliva <junieroliva at gmail.com> wrote:
> Hi Predrag,
>
> I can't seem to log into lop1 even though it seems to be up on monit. Is it
> operational?
>
> Thanks,
> Junier
Stale NFS file handles! I rebooted lop1 and it is now fully
operational.
Predrag
> On Jan 25, 2016 6:05 PM, "Predrag Punosevac" <predragp at cs.cmu.edu> wrote:
>
> > Predrag Punosevac <predragp at cs.cmu.edu> wrote:
> >
> > > Predrag Punosevac <predragp at cs.cmu.edu> wrote:
> > >
> > > > Dear Autonians,
> > > >
> > > > One of us has done something nasty to our network file systems, which
> > > > has caused a massive outage in the Lab. I am working to restore the
> > > > services. Please stay tuned.
> > > >
> > > > Predrag
> > > >
> > > > P.S. The person who has done this will be hunted down and will have to
> > > > give at least 5 seminar talks by the end of the year.
> > >
> > >
> > > I have a little more info about this. The outage is caused by a power
> > > failure in one of our racks. I am working on this right now. I can't give
> > > an estimate of how long it will take to restore the services. File
> > > servers are affected!
> > >
> > > Predrag
> >
> > Ok Folks,
> >
> > I was able to partially restore power in A1-2C. This is the
> > most important server rack, as it hosts the core network infrastructure
> > servers, the file servers (GAIA, Neill-ZFS, Uranus), the virtual host Athena,
> > and the following computing nodes: GPU1, GPU2, ari, foxconn, low1,
> > lov3, lov4, lot1.
> >
> > This is the summary.
> >
> > All core network servers, Athena, and Uranus are safe, fully operational,
> > and connected to their own 120V PDU.
> >
> > File servers GAIA and Neill-ZFS are safe, fully operational, and
> > connected to their own 120V PDU.
> >
> > GPU1 and GPU2 are safe, fully operational, and connected to their own 208V
> > PDU.
> >
> > I have shut down the following computing nodes on an emergency basis:
> >
> > ari, foxconn, lov3, lov4, low1, and lot1. I am afraid to add them to
> > any of the above-mentioned PDUs. The good news is that I have
> > space for an extra power supply in this rack, so the best and
> > easiest solution would be to add another PDU/UPS and safely connect these
> > computing nodes to a separate power supply. They will remain down at least
> > until tomorrow morning while I consult with Artur about the future
> > course of action.
> >
> > Best,
> > Predrag
> >