From predragp at andrew.cmu.edu Thu Feb 6 15:07:37 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 6 Feb 2014 15:07:37 -0500
Subject: [auton-users] LOV3 must be powered down
Message-ID:

Dear Autons,

I hate to do this, but a memory module is dead on LOV3 (yes, on the brand new computer) and I have to power it down to replace the module. How does tomorrow, 2/7/2014, at 2:00 PM sound to you?

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Thu Feb 6 15:12:11 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 6 Feb 2014 15:12:11 -0500
Subject: [auton-users] LOT2 must be powered down
Message-ID: <97cb1a8dc8313ace38187f780871ffd6.squirrel@webmail.andrew.cmu.edu>

Dear Autons,

LOT2 has the same dead memory module problem as LOV3, except that LOT2 memory modules are no longer manufactured. I will have to power it down to get the serial number from the modules and the BIOS, which Supermicro asked me for. As with LOV3, I would like to power down tomorrow at 2:00 PM, hopefully for no more than 30 minutes.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Fri Feb 7 17:57:12 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 7 Feb 2014 17:57:12 -0500
Subject: [auton-users] LOV3 repaired
Message-ID:

Dear Autons,

I replaced the faulty memory module (circa $250), courtesy of Silicon Mechanics :) LOV3 is back in business now. Please let me know if you notice anything strange.

Predrag

From predragp at andrew.cmu.edu Fri Feb 7 20:10:19 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 7 Feb 2014 20:10:19 -0500
Subject: [auton-users] LOT2 back on line
Message-ID: <0e578185fa6b6e26fdeb8ad987d491a5.squirrel@webmail.andrew.cmu.edu>

I successfully read off the motherboard serial number and BIOS version from LOT2. It is back online. It passes memtest, but mcelog is full of errors. Hopefully the Supermicro technicians from Silicon Mechanics will be able to help us.
I ask the 5 people who have special accounts on the RAID 5 to let me know via private e-mail where they need those file systems mounted via NFS.

Predrag

From predragp at andrew.cmu.edu Wed Feb 12 12:01:27 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 12 Feb 2014 12:01:27 -0500
Subject: [auton-users] LOW1 back in business
Message-ID: <242125e8d12923b36eae4cb9f4278342.squirrel@webmail.andrew.cmu.edu>

LOW1 is back in business after a kernel dump due to high memory use.

Predrag

From predragp at andrew.cmu.edu Fri Feb 14 16:46:06 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 14 Feb 2014 16:46:06 -0500
Subject: [auton-users] gcc (GCC) 4.8.1 on computing nodes
Message-ID:

Many of you have asked me off the list about the latest version of the GNU Compiler Collection and tools on our computing nodes. The new nodes LOV3, LOV4, and LOU1, as well as the NREC machines, have the latest GCC installed, but you will need to use the full path in your Makefiles. The other computing nodes, which need to be rebuilt, typically have gcc-4.7.

Make sure you are using the full path in your Makefiles:

root at lov3 yum.repos.d # /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

root at lov4 yum.repos.d # /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

root at lou1 yum.repos.d # /opt/rh/devtoolset-2/root/usr/bin/gcc --version
gcc (GCC) 4.8.1 20130715 (Red Hat 4.8.1-4)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

From predragp at andrew.cmu.edu Fri Feb 28 16:25:22 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 28 Feb 2014 16:25:22 -0500
Subject: [auton-users] DNS really slooow
Message-ID:

Dear Autons,

Many of you have noticed, while trying to log into computing nodes, that DNS is really slooow. Without metric monitoring it is difficult to pinpoint the exact culprit. However, our entire core network infrastructure, LOCK (firewall and VPN gateway), Lofty (DHCP server, primary DNS, and NIS controller), and Lair (secondary DNS), is way overdue for decommissioning. The good news is that this was anticipated, and AutonLab directors Dr. Dubrawski and Dr. Schneider have made significant hardware purchases in the past two months from their discretionary funds, which will enable us to fix the problems.

I am happy to report that as of this morning we have three new core network infrastructure machines running:

Areas (primary firewall, VPN gateway, and DNS server)
Atlas (LDAP domain controller and DNS server)
Horae (DMZ firewall and DNS server)

As of this afternoon the AutonLab DNS cluster is fully functional (we switched from BIND to Unbound for int.autonlab.org, so we now have a cluster of DNS servers). However, at this very moment only the new, not yet released file servers GAIA and Neill-ZFS, as well as one computing node, LOW1, have been switched to fully static IP addresses and the new DNS cluster.

An immediate fallout has been noticed by Benedikt Boecking: MATLAB no longer works on LOW1, because its goofy license manager can no longer just open random ports to talk to the university licensing server as in the past. The solution is to rebuild LOW1 and give it a self-hosted copy of MATLAB, just like we now have on LOV3, LOV4, LOU1, and the NAVY cluster. This will happen very soon.

My plan for the next couple of days is:

1.
Get the new OpenVPN server up and running, and switch your desktops to the new server (much improved security with TLS encryption).
2. Enable LDAP and move user info from Lofty to Atlas. As soon as LDAP is functional, the new file servers will be released.
3. Start gradually switching all computing nodes (as they get rebuilt) to static IP addresses and the DNS cluster (this will take a couple of weeks to complete).

I would like to thank you for your patience in this matter and ask you to aggressively report any unexpected behavior.

Most Kind Regards,
Predrag Punosevac
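
The DNS message above notes that, without metric monitoring, the source of the slow lookups is hard to pin down. As a rough aid when reporting slow logins, the sketch below times name resolution from a node using only the Python standard library. It is an illustration under stated assumptions, not a tool from the lab: the function names `time_lookup` and `summarize` are hypothetical, and any hostname passed in is whatever the reader chooses to test.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo, repeats=3):
    """Return wall-clock durations in seconds for resolving hostname.

    A failed lookup is recorded as None, so slow names and broken names
    can be told apart. The resolver argument exists only so the function
    can be exercised without network access (an assumption of this sketch).
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            resolver(hostname, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            durations.append(None)
            continue
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (number of successful lookups, slowest successful duration or None)."""
    ok = [d for d in durations if d is not None]
    return len(ok), (max(ok) if ok else None)
```

For example, `summarize(time_lookup("somehost.int.autonlab.org"))` (a placeholder name) gives the success count and the worst observed time. Repeated lookups may be answered from a local cache, so the first measurement is usually the most telling.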