From mjbaysek at cs.cmu.edu Wed Dec 22 11:43:37 2010
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Wed, 22 Dec 2010 11:43:37 -0500
Subject: [auton-users] Auton Downtime Scheduled
Message-ID: <4D122AB9.10106@cs.cmu.edu>

Hi Lab,

All Auton Lab computing resources will be down for maintenance on Wed Dec 29, from 7:00 AM to 7:00 PM. This outage applies also to Neill Lab users.

Services affected: All.

Any processes, jobs, or screen sessions that are still running on compute nodes at the time of the scheduled outage will be terminated.

Lab supported desktops will function by request only. Auton Lab staff, you must notify me by Tuesday if you need access to your machine on Wednesday.

Happy Holidays Everyone,

Mike

From mjbaysek at cs.cmu.edu Tue Dec 28 12:12:32 2010
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Tue, 28 Dec 2010 12:12:32 -0500
Subject: [auton-users] Auton Downtime Scheduled
Message-ID: <4D1A1A80.1020505@cs.cmu.edu>

Hi Lab,

This is a reminder about the downtime tomorrow. Please be sure to let me know in the next 12 hours if you will require access to your lab supported desktop machine during the outage window.

Mike

Original Notification Follows:

All Auton Lab computing resources will be down for maintenance on Wed Dec 29, from 7:00 AM to 7:00 PM. This outage applies also to Neill Lab users.

Services affected: All.

Any processes, jobs, or screen sessions that are still running on compute nodes at the time of the scheduled outage will be terminated.

Lab supported desktops will function by request only. Auton Lab staff, you must notify me by Tuesday if you need access to your machine on Wednesday.

Happy Holidays Everyone,

Mike

From mjbaysek at cs.cmu.edu Wed Dec 29 22:26:02 2010
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Wed, 29 Dec 2010 22:26:02 -0500
Subject: [auton-users] Auton Downtime Status
Message-ID: <4D1BFBCA.70505@cs.cmu.edu>

Hi Lab.

Status report (in short): Things are back up... with caveats.
More info: I was not able to get started until after 11, because some file copying took longer than estimated (way too many small files). My original estimate of 12 hours was about right, but due to the late start, the outage ran 3.5 hours past the scheduled end.

There were many ambitious goals undertaken for this outage window. Not all of them were completed. This is the downside of being a one-person I.T. department.

Measuring and redistributing the power requirements and rewiring took much longer than expected. Now that this is completed, we are ready for the new Hadoop cluster and other future growth. I'd like to thank Robin for giving me a very helpful hand in this very boring and time-consuming task.

Now the less happy news. Yet to be completed is the switchover to the new file server. This was a big goal that I wanted to get done this time, but it's not happening. We'll need to suffer with the slow write performance until the next maintenance window (TBA). It will be within the next month.

Additionally, Neill1 is down pending further troubleshooting. It should be back up before midweek next week.

Mike

Happy New Year :)