From mjbaysek at cs.cmu.edu Fri Jul 8 11:38:07 2011
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Fri, 08 Jul 2011 11:38:07 -0400
Subject: [auton-users] Auton Cluster Maintenance on Monday
Message-ID: <4E17245F.3060109@cs.cmu.edu>

Hi Lab,

On this coming Monday, July 10th starting at 8:00 AM, I will be
taking advantage of the relatively quiet system activity and will
be shutting down the Auton central compute system in order to
perform system maintenance. This work will set us up with the
extra storage and performance we will need in the coming months
and years.

The maintenance includes:

* Switch File server to Scientific Linux 6 OS, a RHEL clone.
* Switch Linux file services to NFS4 file protocol.
* Switch to larger disk array.
* Improvements in file performance over NFS.
* Software upgrade of primary Firewall server, time permitting.

During this maintenance, many services will be unavailable.

* /auton space, by all access methods.
* All compute nodes.
* CVS and Subversion.
* Network copies of Eclipse, Netbeans, Matlab, etc.
* MySQL server on LYRE.
* ViewVC.

These services are among those that will be largely unaffected, but
they may be restarted or briefly interrupted throughout the course
of the maintenance:

* TCWI instances.
* LOT1 (Project Server).
* GD1 (Project Server).
* Bugzilla.
* SDSS.

I plan to begin the maintenance at 8 AM. The work will last into
the afternoon.

For Lab Staff with desktop workstations:

This work requires me to forcibly unmount /auton from all machines
that mount it, so it's best to log out of your workstation on Friday
(Today) when you leave the office. If you must leave processes
running, any processes that have open file handles to /auton space
will hang forever. They will not 'pick up where they left off' when
the server comes online, like they usually would during a server
outage. It would be best to keep any processes accessing files only
on your local disk.

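
One possible way to check a Linux workstation for processes holding open
file handles under /auton before the unmount is sketched below in Python.
This is an illustration only, not a tool provided by the lab: the function
name pids_with_open_files and its parameters are hypothetical, it assumes a
Linux /proc filesystem, and it looks only at open file descriptors (not at
working directories or memory-mapped files). Without root it will generally
see only your own processes.

```python
import os


def pids_with_open_files(prefix="/auton", proc_root="/proc"):
    """Return {pid: [paths]} for processes with open files under prefix.

    Hypothetical helper: walks <proc_root>/<pid>/fd and resolves each
    descriptor symlink. Processes we cannot inspect are skipped silently.
    """
    prefix = prefix.rstrip("/")
    hits = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            # Match /auton itself or anything below it, but not /autonomous.
            if target == prefix or target.startswith(prefix + "/"):
                hits.setdefault(int(entry), []).append(target)
    return {pid: sorted(paths) for pid, paths in hits.items()}
```

Calling pids_with_open_files() with no arguments scans the live /proc tree;
any PID it returns belongs to a process that would be expected to hang once
/auton is force unmounted.
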
If you have any questions regarding the planned outage, or a specific
service, please drop me a mail.

Mike

P.S. /auton will need to be force unmounted from the NEILL system as
a result of this work. This maintenance should not affect the NEILL
system *unless you are accessing files from /auton*. If you are, those
processes will hang indefinitely.

-- 
Michael J. Baysek
Systems Analyst
Carnegie Mellon University / Auton Lab
412-268-8939 - mjbaysek at cs.cmu.edu
http://www.autonlab.org

From mjbaysek at cs.cmu.edu Fri Jul 8 11:46:58 2011
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Fri, 08 Jul 2011 11:46:58 -0400
Subject: [auton-users] Auton Cluster Maintenance on Monday
In-Reply-To: <4E17245F.3060109@cs.cmu.edu>
References: <4E17245F.3060109@cs.cmu.edu>
Message-ID: <4E172672.7000906@cs.cmu.edu>

Correction, MONDAY July 11th.

-- 
Michael J. Baysek
Systems Analyst
Carnegie Mellon University / Auton Lab
412-268-8939 - mjbaysek at cs.cmu.edu
http://www.autonlab.org

On 07/08/2011 11:38 AM, Michael J. Baysek wrote:
> Hi Lab,
>
> On this coming Monday, July 10th starting at 8:00 AM, I will be
> taking advantage of the relatively quiet system activity and will
> be shutting down the Auton central compute system in order to
> perform system maintenance. This work will set us up with the
> extra storage and performance we will need in the coming months
> and years.
>
> The maintenance includes:
>
> * Switch File server to Scientific Linux 6 OS, a RHEL clone.
> * Switch Linux file services to NFS4 file protocol.
> * Switch to larger disk array.
> * Improvements in file performance over NFS.
> * Software upgrade of primary Firewall server, time permitting.
>
> During this maintenance, many services will be unavailable.
>
> * /auton space, by all access methods.
> * All compute nodes.
> * CVS and Subversion.
> * Network copies of Eclipse, Netbeans, Matlab, etc.
> * MySQL server on LYRE.
> * ViewVC.
>
> These services are among those that will be largely unaffected, but
> they may be restarted or briefly interrupted throughout the course
> of the maintenance:
>
> * TCWI instances.
> * LOT1 (Project Server).
> * GD1 (Project Server).
> * Bugzilla.
> * SDSS.
>
> I plan to begin the maintenance at 8 AM. The work will last into
> the afternoon.
>
> For Lab Staff with desktop workstations:
>
> This work requires me to forcibly unmount /auton from all machines
> that mount it, so it's best to log out of your workstation on Friday
> (Today) when you leave the office. If you must leave processes
> running, any processes that have open file handles to /auton space
> will hang forever. They will not 'pick up where they left off' when
> the server comes online, like they usually would during a server
> outage. It would be best to keep any processes accessing files only
> on your local disk.
>
> If you have any questions regarding the planned outage, or a specific
> service, please drop me a mail.
>
> Mike
>
> P.S. /auton will need to be force unmounted from the NEILL system as
> a result of this work. This maintenance should not affect the NEILL
> system *unless you are accessing files from /auton*. If you are, those
> processes will hang indefinitely.

From mjbaysek at cs.cmu.edu Mon Jul 11 18:50:19 2011
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Mon, 11 Jul 2011 18:50:19 -0400
Subject: [auton-users] System Status
Message-ID: <4E1B7E2B.901@cs.cmu.edu>

Lab:

A brief update on the system status:

Maintenance and system refactoring are going well.

* Linux boxes are all back on /auton
* Software on /auton, like Matlab and Eclipse, is available again.
* CVS is back up
* Went from having a few GB of free space to 6 TB.
* NFS performance is much improved
* NFS now fully supports files of 2 GB and larger

A few things aren't working yet, but will be fixed in this order:

* Windows file service is not up yet
* Backups are not running yet
* SVN is not up yet
* MySQL on lyre is not up yet
* ViewVC is not up

The remaining items will be cleaned up in a matter of a few days.
The top 3 remaining items will be fixed tomorrow.

Please let me know if you have any problems (such as your system not
working correctly) with any of the changes.

-- 
Michael J. Baysek
Systems Analyst
Carnegie Mellon University / Auton Lab
412-268-8939 - mjbaysek at cs.cmu.edu
http://www.autonlab.org

From mjbaysek at cs.cmu.edu Mon Jul 25 11:55:00 2011
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Mon, 25 Jul 2011 11:55:00 -0400
Subject: [auton-users] Fwd: [sys] Host UP alert for lou1!
Message-ID: <4E2D91D4.10704@cs.cmu.edu>

LOU1 is back up.

Anyone who was doing/running anything on LOU1 last night at or around
11:20 PM, please let me know what you were doing. The kernel panicked
in an odd way, and I'd like to know as much as possible about what was
going on at the time of the crash to aid in diagnosing what happened.

Mike

-------- Original Message --------
Subject: [sys] Host UP alert for lou1!
Date: Mon, 25 Jul 2011 11:47:40 -0400 (EDT)
From: nagios at lmike.localdomain
To: sysnotify at int.autonlab.org

***** Nagios *****

Notification Type: RECOVERY
Host: lou1
State: UP
Address: 192.168.6.80
Info: PING OK - Packet loss = 0%, RTA = 0.76 ms

Date/Time: Mon Jul 25 11:47:40 EDT 2011