From predragp at andrew.cmu.edu Thu Jan 2 11:45:23 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 2 Jan 2014 11:45:23 -0500
Subject: [auton-users] LOU1 kernel panic
Message-ID:

LOU1, our VirtualBox host, experienced a kernel panic while I was trying to get trafficjam working on it. It took 6 other virtual machines down with it.

The good news is that I have backups for everything. The second piece of good news is that trafficjam is up on LOW1 and will remain there until I am sure LOU1 can handle it. The other 6 virtual machines should be up by the end of the day, either on LOU1 or on the backup VirtualBox host LOW1.

Predrag

From predragp at andrew.cmu.edu Thu Jan 2 15:42:44 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 2 Jan 2014 15:42:44 -0500
Subject: [auton-users] LOU1 back in business
Message-ID: <7de6b73c55b9ed5873a27ecfcbf1100b.squirrel@webmail.andrew.cmu.edu>

LOU1 is basically back in business. I am still installing some small stuff, but the big things (Python27, MATLAB, R-3.0.2) are there. I will try to bring the virtual machines back online shortly.

Predrag

From predragp at andrew.cmu.edu Fri Jan 3 13:20:01 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 3 Jan 2014 13:20:01 -0500
Subject: [auton-users] Virtual Machines Update
Message-ID: <7b1ae4976bd2502e0a6175d029072e61.squirrel@webmail.andrew.cmu.edu>

5 out of the 7 virtual machines which I was moving from LOW1 to LOU1 are up and running (rawdata1, vlad, lweb, cdc, sdss). I am having trouble with trafficjam (the network interface disappeared after exporting the virtual machine). I am working on it right now. As noted earlier, cadata will have to be rebuilt. When all this is done, LOW1 will be rebuilt and used only as a computing node.

Best,
Predrag

From predragp at andrew.cmu.edu Tue Jan 7 17:16:14 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Tue, 7 Jan 2014 17:16:14 -0500
Subject: [auton-users] [Fwd: Reminder: SERVICE OUTAGE: Instance Manager and License Server]
Message-ID: <9c28fa039007669ebeb2569efc7b3ef6.squirrel@webmail.andrew.cmu.edu>

If you do not use MATLAB, please stop reading this e-mail now.

This will affect the old computing nodes which still use the University licensing server. If you need to use MATLAB, stick to LOV3, LOV4, LOU1, LXV1, LXV2, LERA, or your local desktop copy.

Predrag

---------------------------- Original Message ----------------------------
Subject: Reminder: SERVICE OUTAGE: Instance Manager and License Server
From:    "Help Desk"
Date:    Tue, January 7, 2014 3:51 pm
To:      "Help Desk"
--------------------------------------------------------------------------

DATE: Tuesday, January 7, 2014
TIME: 8:00PM - 10:00PM

SERVICES AFFECTED:
- License servers for MATLAB, Pro/Engineer, and Simics
- Windows account management via the Instance Manager

DETAILS:
On January 7, 2014, SCS Computing Facilities will perform software upgrades on virtual machine hosts supporting a variety of services, to address a recurring problem with a disk controller driver. This will result in these services being unavailable for part or all of this maintenance period.

In addition, the MATLAB license server will be upgraded during this outage, in order to allow us to support MATLAB R2013b and newer.

Please contact the SCS Help Desk at x8-4231 or send mail to help at cs.cmu.edu if you have any questions regarding this outage.

Thank you for your attention,
SCS Help Desk

From predragp at andrew.cmu.edu Wed Jan 8 12:35:37 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 8 Jan 2014 12:35:37 -0500
Subject: [auton-users] [Fwd: Reminder: SERVICE OUTAGE: SCS Web, Email and RAMS Financial Services - 1/9/2014]
Message-ID: <7160aa0feedff07c8a6f5c7b6e871bb7.squirrel@webmail.andrew.cmu.edu>

Some of us like to sleep at this time, but if you are one of the people who have a healthier lifestyle than me, you should probably be aware of this outage.

Predrag

---------------------------- Original Message ----------------------------
Subject: Reminder: SERVICE OUTAGE: SCS Web, Email and RAMS Financial Services - 1/9/2014
From:    "Help Desk"
Date:    Wed, January 8, 2014 9:37 am
To:      "Help Desk"
--------------------------------------------------------------------------

Date: Thursday, January 9, 2014
Time: 6:00 AM - 8:00 AM EST
Services Affected: SCS Web Services, Email Services, and RAMS Financial Services

On Thursday, January 9, 2014, SCS Computing Facilities will perform maintenance on the devices that provide SCS Web Services, Email Services, and RAMS Financial Services functionality. Applygrad, Graduate Student Tracking, ~username websites, and other Computing Facilities hosted web services/sites may be impacted during this maintenance period.

Users may experience intermittent outages of these services during the scheduled maintenance period.

Please contact the SCS Help Desk at x8-4231 or send mail to help+ at cs.cmu.edu with any questions or concerns regarding this maintenance period.

Thank you for your attention,
SCS Help Desk

From predragp at andrew.cmu.edu Wed Jan 8 14:52:22 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 8 Jan 2014 14:52:22 -0500
Subject: [auton-users] [Fwd: /auton on lyre 98% full]
Message-ID:

Dear Autons,

I hate to ask you again, but we have to do some house cleaning. Our main file server is running out of space.
Could you please try to prune files which you do not need, so that we ensure the stability of the file server at least until I can put the new 24TB file server, which I received today, into production.

For your convenience, these are some of the bigger users:

219G ./zonggel/research/sims
232G ./zonggel/research
232G ./zonggel
259G ./lujiec
357G ./tzukuoh/research/XDATA
525G ./tzukuoh/research
535G ./tzukuoh

Predrag

---------------------------- Original Message ----------------------------
Subject: /auton on lyre 98% full
From:    "Donghan (Jarod) Wang"
Date:    Wed, January 8, 2014 10:25 am
To:      "Predrag Punosevac"
--------------------------------------------------------------------------

Just wanted to bring to your attention that /auton on lyre is 98% full. There's less than 200GB of available space. You may want to work with some of the users with big home directories to work out a plan for reducing disk consumption.

Best,
Jarod

From predragp at andrew.cmu.edu Wed Jan 15 01:09:00 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 15 Jan 2014 01:09:00 -0500
Subject: [auton-users] Mantis Bug Tracker
Message-ID: <557b81e5985459eae1761c5edd19d7b6.squirrel@webmail.andrew.cmu.edu>

I would like to make a first announcement that Mantis Bug Tracker is production ready and available to the members of the Auton Lab. If you are a project manager and would like to use the bug tracking features, please send me a private e-mail.

Current caveats are:

1. The HTTP interface is at the moment only available on the Auton Lab local network.
   You will have to use an ssh reverse proxy to bypass the firewall if you want to use it right now from other locations. As soon as the DMZ firewall is in production, I will be able to turn this thing on for the WWW at a minute's notice.

2. At this point I have not integrated Mantis with our CVS and SVN repositories. I really, really want to do that, but it will have to wait 2-3 months until I finish more pressing issues.

3. I used the Gmail mail server to relay messages until I redo our Auton Lab mail server.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Wed Jan 15 17:07:50 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 15 Jan 2014 17:07:50 -0500
Subject: [auton-users] LOV4 back in business
Message-ID:

Dear Autons,

After a short downtime, LOV4 is now back in business. The good news is that the person who crashed the server by recklessly running too many scripts has just volunteered to buy a beer for the entire Lab :)

Most Kind Regards,
Predrag Punosevac

From predragp at andrew.cmu.edu Sat Jan 18 14:33:21 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Sat, 18 Jan 2014 14:33:21 -0500
Subject: [auton-users] LOT2 back in business
Message-ID: <4541bfb48895b4d46b6d0d1e01769d18.squirrel@webmail.andrew.cmu.edu>

LOT2 is back online after a short downtime due to a kernel dump. I am not 100% sure what caused the kernel dump, but the backtrace is pointing to the AFS kernel driver.

Most Kind Regards,
Predrag Punosevac

From predragp at andrew.cmu.edu Wed Jan 22 00:43:32 2014
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 22 Jan 2014 00:43:32 -0500
Subject: [auton-users] X2Go vis LOP1 and LOP2 ssh proxy
Message-ID: <967AE99D-F4E8-461D-935F-567DDC76BD31@andrew.cmu.edu>

Dear All,

Recently it has been brought to my attention by Rob that the X2Go client can use an ssh proxy to connect to the destination server.
In layman's terms, that effectively means you do not need a VPN gateway, or to know anything about ssh reverse proxies, to have full graphical access to the Auton Lab computing nodes.

I asked Rob to post a quick how-to, but he was not sure about that, so here is a quick how-to, which is of course tested.

1. On your home computer, open the X2Go client, which is available for all operating systems:
   http://wiki.x2go.org/doku.php/download:start
2. Click to configure a new session and name it.
3. As the host value, type the full name of the machine you want to connect to. For example: lou1.int.autonlab.org
4. In the login field, type your username.
5. Keep the port at 22.
6. Most IMPORTANT: check the option "Use Proxy server for SSH connection".
7. Additional options will open.
8. You may check the following options:
   a. Same login as on X2Go server
   b. Same password as on X2Go server
   since your LOP1 and LOP2 credentials are the same as on the target machine.
9. For the proxy host name, type either lop1.autonlab.org or lop2.autonlab.org
10. As the session type, please select single application "Terminal" to be on the safe side, as a full desktop environment might not be available.
11. Click OK.
12. You will be asked to provide your regular Auton Lab password.

Once the terminal appears on your local X server, type the name of the graphical application you want to use. For example, if you type matlab (possibly you need the full path)

/usr/local/MATLAB/R2013/bin/matlab

you will get the MATLAB desktop. You can use that MATLAB desktop to plot your data, or to make movies for that matter. You can similarly use RStudio Desktop or even Firefox.

On a final note, the X2Go server is so far only installed on LOV3, LOV4, and LOU1, as well as the NAVY cluster. As I rebuild the other computing nodes, it will become available on all of them.
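
For those who prefer a plain ssh client, the same hop through LOP1 or LOP2 can probably be expressed on the command line. The following is only a sketch and has not been tested against the lab machines: it assumes OpenSSH on the home computer and that the proxy hosts permit forwarding. The helper name build_ssh_command and the username "alice" are made up for illustration; the host names are the ones from the steps above. It only builds and prints the command, it does not connect.

```python
import shlex


def build_ssh_command(user, target, proxy="lop1.autonlab.org", x11=True):
    """Build an ssh argv that reaches `target` through `proxy`.

    Hypothetical helper, not a lab tool. It relies on two standard
    OpenSSH features: the ProxyCommand option and `ssh -W host:port`,
    which forwards stdin/stdout to the given host and port.
    """
    # %h and %p are expanded by ssh to the target host and port.
    proxy_cmd = f"ssh -W %h:%p {user}@{proxy}"
    argv = ["ssh"]
    if x11:
        # -X requests X11 forwarding so graphical programs display locally.
        argv.append("-X")
    argv += ["-o", f"ProxyCommand={proxy_cmd}", f"{user}@{target}"]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command line; "alice" is a placeholder username.
    print(shlex.join(build_ssh_command("alice", "lou1.int.autonlab.org")))
```

Plain X11 forwarding is generally slower than X2Go over a home connection, so this is a fallback rather than a replacement for the steps above.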

Enjoy,
Predrag

From predragp at andrew.cmu.edu Thu Jan 23 20:27:24 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 23 Jan 2014 20:27:24 -0500
Subject: [auton-users] RAM memory use on LOV3 98%
Message-ID:

Dear Autons,

Memory use on LOV3 is at 98%. You might want to switch to another computing node. I am not naming names. The machine is going to crash. I still do not have IPMI over LAN enabled, so I will not be able to reboot the machine from home.

A few people who have permission to store local files on LOV4 are using 100% of the space. Be aware that this space is volatile, and make sure you have your data on your local HDD.

As promised about 2 months ago, I am slowly resurrecting the monitoring system. If you are on the local network, you can check the current machine conditions at

http://monit.int.autonlab.org:8080
username: auton
password: Dr.Who

I will add a web proxy shortly so that people can check the M/Monit report from home.

The status page, a.k.a. the Ganglia cluster monitoring system, is going to be retired. We have only one soft cluster, Guard Dog, in the Lab, and I see no purpose in maintaining very complex software just for one cluster.

Cheers,
Predrag

From predragp at andrew.cmu.edu Thu Jan 23 22:31:40 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 23 Jan 2014 22:31:40 -0500
Subject: [auton-users] File server 96% full
Message-ID: <0b2662a5afa8dcf1166e61fa3a1279e9.squirrel@webmail.andrew.cmu.edu>

I would like to ask everyone to try to clean the garbage from the main file server and keep it stable for a little bit longer. Namely, LYRE is 96% full. Help is on the way. The replacement file server is up and running, but I am waiting for a Host Bus Adapter to arrive from California before I can configure the zpool and allow you to use it.

On a related note, the new Neill-zfs is fully configured and currently being tested by Seth. I anticipate that it will become available to the members of the Neill group sometime next week.

Predrag

From predragp at andrew.cmu.edu Fri Jan 24 16:29:57 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 24 Jan 2014 16:29:57 -0500
Subject: [auton-users] LOT2 is down
Message-ID:

LOT2 has crashed according to M/Monit. Hopefully a reboot will fix the machine. This is the second time LOT2 has gone down in the past week. I will have a closer look, in particular in light of the fact that there is a 3TB RAID full of data on that machine.

Predrag

From predragp at andrew.cmu.edu Fri Jan 24 17:03:08 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 24 Jan 2014 17:03:08 -0500
Subject: [auton-users] LOT2 back in business
Message-ID: <81da299b689ac1b783c777b6caf49241.squirrel@webmail.andrew.cmu.edu>

I rebooted LOT2 and it looks fine now. I will continue to monitor the hardware for possible trouble.

Predrag

From predragp at andrew.cmu.edu Tue Jan 28 15:17:21 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Tue, 28 Jan 2014 15:17:21 -0500
Subject: [auton-users] LOV3 back in business
Message-ID:

LOV3 is back in business after being unresponsive for an hour. It looks like somebody started a runaway script, which left the machine dead apart from ping and ssh as root.

Predrag

From predragp at andrew.cmu.edu Wed Jan 29 14:51:35 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 29 Jan 2014 14:51:35 -0500
Subject: [auton-users] File server 97%
Message-ID: <06b8f6c0c8d45e663adc5f190fbf1351.squirrel@webmail.andrew.cmu.edu>

Dear Autons,

I am asking for your help in cleaning unnecessary files off our main file server. It is dangerously full, and we need it up and running until the new file server is production ready.

Predrag
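
To see where the space in a home directory goes before pruning, something like the following may help. It is a minimal sketch, not a lab tool: the function names dir_sizes and largest are made up for illustration, and it sums apparent file sizes, so the numbers can differ somewhat from what du reports on the file server.

```python
import os


def dir_sizes(root):
    """Map root and every directory below it to the total bytes it holds.

    Sizes are apparent file sizes (os.path.getsize), not allocated
    disk blocks. Symbolic links are skipped rather than followed.
    """
    totals = {}
    # topdown=False visits children before parents, so each parent can
    # add up the already-computed totals of its subdirectories.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
        for name in dirnames:
            # Symlinked or unreadable directories were never walked: count 0.
            total += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = total
    return totals


def largest(root, n=10):
    """Return the n largest (path, bytes) pairs under root, biggest first."""
    sizes = dir_sizes(root)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, largest(os.path.expanduser("~"), 20) lists the twenty heaviest directories under your home directory. On a large tree over the network this walk can take a while.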