CPU servers lo3, lo4, low1, ari, Foxconn down
Predrag Punosevac
predragp at andrew.cmu.edu
Mon Jun 22 13:13:26 EDT 2020
On Mon, Jun 22, 2020 at 12:03 PM Arundhati Banerjee
<arundhat at andrew.cmu.edu> wrote:
>
> Hi Predrag,
>
> I just wanted to bring to your attention that some of the CPU servers are not
> accessible at the moment. I actually had some code running on Foxconn which I
> am unable to access now. I would be obliged if you could kindly look into it.
>
I saw it on Monit when I logged in this morning. This appears to be a
major problem with the electric supply. The one thing common to all
five of these servers is that they are connected to the same dumb PDU
(power distribution unit) which in turn is connected to the same old
208V UPS. It was planned before COVID-19 to remove all computing nodes
from UPSs.
I am trying to talk to David, the server room attendant and the
person physically located in Wean 3611. However, I would not hold my
breath on this one. If the circuits are messed up, that would require
Ed Walter, me, and perhaps external contractors in the server room.
That can take a long time.
A nuclear option is moving those 5 servers to GHC. I am not sure that
I would have enough electricity there. In either case, we have a major
problem on our hands.
In my almost 8 years with the lab I have not seen such a catastrophic
failure of power circuits.
Best,
Predrag
> Thank you.
> Best regards,
> Arundhati
More information about the Autonlab-users
mailing list