<div dir="ltr">GPU[1-9] and Denver are not available. I could not reach them over IPMI, which meant that either rack A1-2A had no power or the switch had died. I just called the OPS guys. The machines are actually up. The switch was powered on but not flashing. The OPS guys rebooted the switch, but to no avail. Piotr will have to replace the switch tomorrow morning. We have a spare in the storage room ready for a situation like this.<div><br></div><div>On a related note, I updated the Xen host and restarted all virtual machines. LOP2 should now be available, as well as a bunch of other stuff including Observium. Everything else looks OK to me, but please ping Piotr and me with any issues. I need to get back to much bigger problems in my current lab :-)</div><div><br></div><div>Best,</div><div>Predrag</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 26, 2023 at 6:01 PM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu">predragp@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Kudos to Piotr! Everything is up and running now. I managed to patch all perimeter firewalls and network service machines during the outage. If you use OpenVPN on your desktop, you will need to restart the daemon. If you don't know how, just reboot the machine.<div><br></div><div>Predrag</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 26, 2023 at 4:20 PM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu" target="_blank">predragp@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi Piotr,<div><br></div><div>Two of our newer perimeter firewalls (Phobos/Deimos) have dead CMOS batteries. The power outage was too long. UPS batteries ran out of juice. 
We have the same problem we had in May of this year. You need to go to the server room, physically attach a monitor to each of these machines, and reset the boot order in UEFI. There is nothing I can do from New Mexico. All traffic goes through these machines. They are only a few years old, but they were apparently shipped with bad CMOS batteries.</div><div><br></div><div>Best,</div><div>Predrag</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 26, 2023 at 3:28 PM Piotr Bartosiewicz <<a href="mailto:pbartosi@andrew.cmu.edu" target="_blank">pbartosi@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Update:</div><div><a href="https://computing.cs.cmu.edu/dashboard/2023/scs-wean-machine-room-alert-10-26-2023" target="_blank">https://computing.cs.cmu.edu/dashboard/2023/scs-wean-machine-room-alert-10-26-2023</a></div><div><br></div><div>There was a power outage.</div><div><br></div><div>Piotr.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 26, 2023 at 3:25 PM Piotr Bartosiewicz <<a href="mailto:pbartosi@andrew.cmu.edu" target="_blank">pbartosi@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Looks like there is a network problem at the SCS level.</div><div>We're looking into it. </div><div><br></div><div>Piotr.</div><div><br></div></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>