From mjbaysek at cs.cmu.edu Tue Mar 3 09:17:26 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Tue, 03 Mar 2009 09:17:26 -0500
Subject: [auton-users] Auton Lab VPN Outage Last Night
Message-ID: <49AD3BF6.9050004@cs.cmu.edu>

This message applies to you only if your machine is on the Auton Lab
VPN.

Last night there was an outage of the VPN service for clients whose
addresses are 10.17.1.x, which covers most of the VPN-connected
machines.

The problem is corrected now, and your machine should be responding
properly again. Please let me know if you continue to have any trouble.

-Mike

From mjbaysek at cs.cmu.edu Sat Mar 28 13:48:16 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Sat, 28 Mar 2009 13:48:16 -0400
Subject: [auton-users] System is Down
Message-ID: <49CE62E0.1080502@cs.cmu.edu>

I am in the server room working on the system now. Will keep you all
informed when the system is back up and running.

From mjbaysek at cs.cmu.edu Sat Mar 28 17:44:16 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Sat, 28 Mar 2009 17:44:16 -0400
Subject: [auton-users] System is Down
In-Reply-To: <49CE62E0.1080502@cs.cmu.edu>
References: <49CE62E0.1080502@cs.cmu.edu>
Message-ID: <49CE9A30.4020409@cs.cmu.edu>

System is now working again. The outage was caused by a disk failure in
the primary file server.

Because the file server hard-locked during the crash and at several
points during the diagnostics, I recommend you check the output of any
jobs that you had running as of 12:00 noon today, when the system went
down. If you have any doubt that your output files are complete, you
should probably run the job again.

Michael J. Baysek wrote:
> I am in the server room working on the system now. Will keep you all
> informed when the system is back up and running.
>
>
>
>

From mjbaysek at cs.cmu.edu Sat Mar 28 20:03:47 2009
From: mjbaysek at cs.cmu.edu (Michael J.
Baysek)
Date: Sat, 28 Mar 2009 20:03:47 -0400
Subject: [auton-users] System still unstable
Message-ID: <49CEBAE3.7010901@cs.cmu.edu>

I will keep you posted.

From mjbaysek at cs.cmu.edu Sat Mar 28 20:37:40 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Sat, 28 Mar 2009 20:37:40 -0400
Subject: [auton-users] System back to normal
In-Reply-To: <49CEBAE3.7010901@cs.cmu.edu>
References: <49CEBAE3.7010901@cs.cmu.edu>
Message-ID: <49CEC2D4.2020409@cs.cmu.edu>

System is back to normal... and stable!

Michael J. Baysek wrote:
> I will keep you posted.
>

From mjbaysek at cs.cmu.edu Sat Mar 28 20:58:07 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Sat, 28 Mar 2009 20:58:07 -0400
Subject: [auton-users] No it isn't
In-Reply-To: <49CEC2D4.2020409@cs.cmu.edu>
References: <49CEBAE3.7010901@cs.cmu.edu> <49CEC2D4.2020409@cs.cmu.edu>
Message-ID: <49CEC79F.3010307@cs.cmu.edu>

As soon as I sent the email out after 30 good minutes of uptime, the
server went down again, even after replacing the failed disk. I have
decided that I cannot trust this hardware right now without more
in-depth diagnosis. I am heading back to CMU and will be switching over
to the secondary server.

Here is what this means:

Subversion will be down.
/mnt/userdirs will be unavailable.

Again, will keep you all in the loop...

Mike

Michael J. Baysek wrote:
> System is back to normal... and stable!
>
>
>
> Michael J. Baysek wrote:
>> I will keep you posted.
>>
>

From mjbaysek at cs.cmu.edu Sat Mar 28 22:56:57 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Sat, 28 Mar 2009 22:56:57 -0400
Subject: [auton-users] IMPORTANT, PLEASE READ: Auton System up, and review
Message-ID: <49CEE379.7040805@cs.cmu.edu>

Today, at approximately 12:00 noon, our primary file server, LOFTY,
went down. This is the server that provides "BigPapa" file services,
among several other system-critical services. The normal methods for
remote access by serial console were not yielding any results.
I arrived on the scene a bit after 1:00 PM. The boot disk had failed,
with unreadable sectors. I restored to a working drive from last
night's backup. The system appeared to be fine, and after a quick
stress test, I returned home.

I checked on the system again, and noticed it had been rebooting itself
periodically since I had left. Upon noticing this, I knew it would be
necessary to switch "BigPapa" file services to the backup server. I
completed this switchover at 22:40, and BigPapa file services are again
restored. This will keep things going while I see what's wrong with
LOFTY.

Be aware that for the time being, the "/mnt/userdirs" directory is
temporarily unavailable. Also unavailable is the Subversion repository.
These services should be restored on Monday.

Any processes that were running on the LOPs during the outage were not
terminated. However, please check any output files generated for any
signs of truncated or corrupted files. This "should not" happen, but
it's always safer to check.

From mjbaysek at cs.cmu.edu Tue Mar 31 09:39:12 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Tue, 31 Mar 2009 09:39:12 -0400
Subject: [auton-users] Critical Notice for all Windows Users: Conficker Virus Warning and Fast Facts
Message-ID: <49D21D00.7040909@cs.cmu.edu>

Hi Lab,

You may have heard about the virus known as "Conficker". The virus has
had mainstream media attention, and is expected to be one of the
biggest worms we have seen in years. Tomorrow, the virus "phones home"
for instructions. Any machine infected by this virus is likely to be
turned into a botnet zombie host.

There are simple checks you can do yourself to see if you have the
worm.

1) If you open Internet Explorer and try to visit
http://windowsupdate.microsoft.com and you are unable to load the site,
you may already be infected by the worm.

2) Go to Start -> Run. Type services.msc in the box, and press Enter.
Look to see if BITS or Windows Defender services are disabled.
If they are, you may be infected.

THIS IS CRITICAL. If you are on a Windows host, please be certain (AND
DO IT NOW) that you have run Windows Update and verified all patches
are installed.

If you cannot access Windows Update, and believe your machine to be
infected, please let me know, unless you wish to clean the virus
yourself.

Again, you must do this today. The virus 'wakes up' on April 1.

--
--
Michael J. Baysek, Systems Analyst
Carnegie Mellon University - Auton Lab
www.cmu.edu - www.autonlab.org
412-268-8939

From mjbaysek at cs.cmu.edu Tue Mar 31 11:27:38 2009
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Tue, 31 Mar 2009 11:27:38 -0400
Subject: [auton-users] Clarification on Conficker fact
Message-ID: <49D2366A.6000705@cs.cmu.edu>

If you are looking for BITS, it is known as Background Intelligent
Transfer Service. I should have spelled that out.

Also, the Windows Defender service will only be present if you have
installed Windows Defender. If Defender is installed and disabled, you
should be concerned.

You can also check to see if the System Restore Service is enabled. If
disabled, it may be a sign of Conficker, or any number of other malware
programs.

--
--
Michael J. Baysek, Systems Analyst
Carnegie Mellon University - Auton Lab
www.cmu.edu - www.autonlab.org
412-268-8939