From predragp at cs.cmu.edu Fri Jan 8 15:50:14 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Fri, 08 Jan 2016 15:50:14 -0500
Subject: Printing issues
Message-ID: <20160108205014.Kn1obfWd6oj5%predragp@cs.cmu.edu>

Dear Autonians,

Some of you have noticed that printing stopped working yesterday morning,
right after the CS CMU power outage drill. I did a little sniffing of the
CS network, and their LPD service appears to be either down or firewalled.
I have notified their help desk about the problem.

If they have not fixed it by Monday, I will switch all our desktops to the
JetDirect protocol. I have already noticed that they are not firewalling
JetDirect port 9100, so we can print using raw sockets. If you need to
print something before Monday, let me know and I will fix your desktop.

Predrag

From predragp at cs.cmu.edu Mon Jan 25 12:21:00 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Mon, 25 Jan 2016 12:21:00 -0500
Subject: GPU machines
In-Reply-To: <56A64EEF.7070001@cs.cmu.edu>
References: <56A64EEF.7070001@cs.cmu.edu>
Message-ID: <20160125172100.pmjtmCLrLSRg%predragp@cs.cmu.edu>

"Sashank J. Reddi" wrote:

> Hi,
>
> I am trying to run a few experiments using Theano on the GPU machine
> (gpu1.int) and am encountering the following error:
>
> WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not
> available (error: Unable to get the number of gpus available: no
> CUDA-capable device is detected)
>
> Can you please help me with this?
>
> Thanks,
> Sashank

I am redirecting this to users at autonlab.org. Unless the recent update to
RHEL 7.2 broke something, both GPU1 and GPU2 were fully functional. Let's
see whether other people have seen this problem before I start
troubleshooting the installation.

Predrag

From predragp at cs.cmu.edu Mon Jan 25 16:57:30 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Mon, 25 Jan 2016 16:57:30 -0500
Subject: system outage
Message-ID: <20160125215730.GRSrtpQAAnBV%predragp@cs.cmu.edu>

Dear Autonians,

One of us has done something nasty to our network file systems, which has
caused a massive outage in the Lab. I am working to restore the services.
Please stay tuned.

Predrag

P.S. The person who has done this will be hunted down and will have to
give at least 5 seminar talks before the end of the year.

From predragp at cs.cmu.edu Mon Jan 25 17:21:27 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Mon, 25 Jan 2016 17:21:27 -0500
Subject: system outage
In-Reply-To: <20160125215730.GRSrtpQAAnBV%predragp@cs.cmu.edu>
References: <20160125215730.GRSrtpQAAnBV%predragp@cs.cmu.edu>
Message-ID: <20160125222127.Gf43PPQCN0wf%predragp@cs.cmu.edu>

Predrag Punosevac wrote:

> Dear Autonians,
>
> One of us has done something nasty to our network file systems, which
> has caused a massive outage in the Lab. I am working to restore the
> services. Please stay tuned.
>
> Predrag
>
> P.S. The person who has done this will be hunted down and will have to
> give at least 5 seminar talks before the end of the year.

I have a little more info about this. The outage was caused by a power
failure in one of our racks. I am working on it right now. I can't give an
estimate of how long it will take to restore the services. File servers
are affected!

Predrag

From predragp at cs.cmu.edu Mon Jan 25 18:04:35 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Mon, 25 Jan 2016 18:04:35 -0500
Subject: system outage
In-Reply-To: <20160125222127.Gf43PPQCN0wf%predragp@cs.cmu.edu>
References: <20160125215730.GRSrtpQAAnBV%predragp@cs.cmu.edu> <20160125222127.Gf43PPQCN0wf%predragp@cs.cmu.edu>
Message-ID: <20160125230435.6K33iYq70U5Y%predragp@cs.cmu.edu>

Predrag Punosevac wrote:

> Predrag Punosevac wrote:
>
> > Dear Autonians,
> >
> > One of us has done something nasty to our network file systems, which
> > has caused a massive outage in the Lab. I am working to restore the
> > services. Please stay tuned.
> >
> > Predrag
> >
> > P.S. The person who has done this will be hunted down and will have to
> > give at least 5 seminar talks before the end of the year.
>
>
> I have a little more info about this. The outage was caused by a power
> failure in one of our racks. I am working on it right now. I can't give
> an estimate of how long it will take to restore the services. File
> servers are affected!
>
> Predrag

OK folks,

I was able to partially restore the power in rack A1-2C. This is the most
important server rack, as it hosts the core network infrastructure
servers, the file servers (GAIA, Neill-ZFS, Uranus), the virtual host
Athena, and the following computing nodes: GPU1, GPU2, ari, foxconn, low1,
lov3, lov4, lot1.

Here is a summary.

All core network servers, Athena, and Uranus are safe, fully operational,
and connected to their own 120V PDU.

File servers GAIA and Neill-ZFS are safe, fully operational, and connected
to their own 120V PDU.

GPU1 and GPU2 are fully operational and safely connected to their own 208V
PDU.

I have shut down the following computing nodes on an emergency basis:

ari, foxconn, lov3, lov4, low1, and lot1. I am afraid to add them to any of
the above-mentioned PDUs.
The good news is that I have space for an extra power supply in this rack,
so the best and easiest solution would be to add another PDU/UPS and
safely connect these computing nodes to a separate power supply. They will
remain down at least until tomorrow morning while I consult with Artur
about the future course of action.

Best,
Predrag

From predragp at cs.cmu.edu Tue Jan 26 12:11:41 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 26 Jan 2016 12:11:41 -0500
Subject: LOT1 back in business
Message-ID: <20160126171141.FKHUMi8uIScu%predragp@cs.cmu.edu>

Dear Autonians,

I have enough power to put LOT1 back in business. The following computing
nodes are still down:

ari, foxconn, low1, lov3, lov4

I am waiting for permission to use an extra electrical circuit.

Predrag

From predragp at cs.cmu.edu Tue Jan 26 15:02:06 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 26 Jan 2016 15:02:06 -0500
Subject: Please reboot your Auton Lab desktops
In-Reply-To: <56A7AD18.1080700@cs.cmu.edu>
References: <20160126171141.FKHUMi8uIScu%predragp@cs.cmu.edu> <56A7AD18.1080700@cs.cmu.edu>
Message-ID: <20160126200206.z2d0uqg-2tHA%predragp@cs.cmu.edu>

Jeff Schneider wrote:

> Hi Predrag,
>
> I see that I am still unable to access my home auton directory from
> loco. Do I need to do something to regain access? Or is the file
> server for that still down?
>
> Jeff.

Our Linux machines assume that we are using NFSv4, which is a stateful
protocol. We are not: we use NFSv3, which is stateless, and that mistaken
assumption caused stale file handles. I have rebooted your computer and
everything should work now.

Predrag

P.S. I hope you don't mind if I share this e-mail with others. People who
have Auton Lab desktops should reboot their machines at their convenience
to fix the stale file handles. There is no way around it. If you notice
unusual behavior after that, please e-mail me.
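If you are unsure whether your desktop is affected before rebooting: on Linux, a stale NFS handle typically surfaces as the ESTALE error ("Stale file handle") when you stat a path on the affected mount. The following is a minimal illustrative sketch, not a lab-provided tool; the function name `is_stale_nfs_handle` is hypothetical, and by default it checks the current directory (so run it from your NFS home directory). The fix itself is still the reboot described above.

```python
import errno
import os
import sys


def is_stale_nfs_handle(path: str) -> bool:
    """Return True if stat() on `path` fails with ESTALE ("Stale file handle").

    Any other OSError (e.g. FileNotFoundError, PermissionError) is re-raised,
    because it points at a different problem than a stale NFS handle.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise
    return False


def main(paths: list[str]) -> None:
    # With no arguments, check "." (hypothetical default: your NFS home if
    # you run the script from there).
    for path in paths or ["."]:
        try:
            stale = is_stale_nfs_handle(path)
        except OSError as exc:
            print(f"{path}: error ({exc.strerror})")
            continue
        print(f"{path}: {'STALE, reboot needed' if stale else 'ok'}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Re-raising non-ESTALE errors is a deliberate choice: a missing path or a permission problem should not be misreported as a stale handle.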
>
>
> On 01/26/2016 12:11 PM, Predrag Punosevac wrote:
> > Dear Autonians,
> >
> > I have enough power to put LOT1 back in business. The following
> > computing nodes are still down:
> >
> > ari, foxconn, low1, lov3, lov4
> >
> > I am waiting for permission to use an extra electrical circuit.
> >
> > Predrag

From predragp at cs.cmu.edu Tue Jan 26 15:07:52 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 26 Jan 2016 15:07:52 -0500
Subject: lop1 now works!
In-Reply-To:
References: <20160125215730.GRSrtpQAAnBV%predragp@cs.cmu.edu> <20160125222127.Gf43PPQCN0wf%predragp@cs.cmu.edu> <20160125230435.6K33iYq70U5Y%predragp@cs.cmu.edu>
Message-ID: <20160126200752.UWNHDjI7Fuyf%predragp@cs.cmu.edu>

Junier Oliva wrote:

> Hi Predrag,
>
> I can't seem to log into lop1 even though it seems to be up on monit. Is
> it operational?
>
> Thanks,
> Junier

Stale NFS file handles! I rebooted lop1, and now it is fully operational.

Predrag

> On Jan 25, 2016 6:05 PM, "Predrag Punosevac" wrote:
>
> > Predrag Punosevac wrote:
> >
> > > Predrag Punosevac wrote:
> > >
> > > > Dear Autonians,
> > > >
> > > > One of us has done something nasty to our network file systems,
> > > > which has caused a massive outage in the Lab. I am working to
> > > > restore the services. Please stay tuned.
> > > >
> > > > Predrag
> > > >
> > > > P.S. The person who has done this will be hunted down and will have
> > > > to give at least 5 seminar talks before the end of the year.
> > >
> > >
> > > I have a little more info about this. The outage was caused by a
> > > power failure in one of our racks. I am working on it right now. I
> > > can't give an estimate of how long it will take to restore the
> > > services. File servers are affected!
> > >
> > > Predrag
> >
> > OK folks,
> >
> > I was able to partially restore the power in rack A1-2C.
> > This is the most important server rack, as it hosts the core network
> > infrastructure servers, the file servers (GAIA, Neill-ZFS, Uranus), the
> > virtual host Athena, and the following computing nodes: GPU1, GPU2, ari,
> > foxconn, low1, lov3, lov4, lot1.
> >
> > Here is a summary.
> >
> > All core network servers, Athena, and Uranus are safe, fully
> > operational, and connected to their own 120V PDU.
> >
> > File servers GAIA and Neill-ZFS are safe, fully operational, and
> > connected to their own 120V PDU.
> >
> > GPU1 and GPU2 are fully operational and safely connected to their own
> > 208V PDU.
> >
> > I have shut down the following computing nodes on an emergency basis:
> >
> > ari, foxconn, lov3, lov4, low1, and lot1. I am afraid to add them to
> > any of the above-mentioned PDUs. The good news is that I have space for
> > an extra power supply in this rack, so the best and easiest solution
> > would be to add another PDU/UPS and safely connect these computing
> > nodes to a separate power supply. They will remain down at least until
> > tomorrow morning while I consult with Artur about the future course of
> > action.
> >
> > Best,
> > Predrag
> >

From predragp at cs.cmu.edu Tue Jan 26 16:36:00 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 26 Jan 2016 16:36:00 -0500
Subject: foxconn is back now
Message-ID: <20160126213600.uPU6DWW7xHbR%predragp@cs.cmu.edu>

Dear Autonians,

We got an extra circuit for our A1-2C rack, as well as a new UPS and PDU.
That means the computing nodes are slowly coming back.

foxconn is now fully operational!

ari's file system looks damaged, and I will assess the situation tomorrow.

lov3, lov4, and low1 were scheduled for the RHEL 7.2 upgrade (from 6.7),
and I am doing it right now. Hopefully they will be ready by tomorrow.

Predrag

From awd at cs.cmu.edu Thu Jan 28 10:30:25 2016
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Thu, 28 Jan 2016 10:30:25 -0500
Subject: Maria's work getting popular in Latin American media
Message-ID: <56AA3411.1060504@cs.cmu.edu>

Dear Autonians,

Maria gives us a rare opportunity to practice our Spanish reading skills.
Check out the original summary of her study of sexual violence in El
Salvador:

http://especiales.laprensagrafica.com/2015/violaciones-en-elsalvador/

and a couple of the many news snippets from Mexico and Colombia stemming
from it:

http://aristeguinoticias.com/2501/mundo/cada-cuatro-horas-violan-a-una-persona-en-el-salvador-reportaje/
http://www.elespectador.com/noticias/elmundo/cada-cuatro-horas-una-persona-violada-el-salvador-articulo-612781

Nice job, Maria!

Artur

From emily at marinusanalytics.com Thu Jan 28 15:44:24 2016
From: emily at marinusanalytics.com (Emily Kennedy)
Date: Thu, 28 Jan 2016 20:44:24 +0000
Subject: Maria's work getting popular in Latin American media
In-Reply-To: <56AA3411.1060504@cs.cmu.edu>
References: <56AA3411.1060504@cs.cmu.edu>
Message-ID: <3C81E5FB-1D96-4E09-BBA0-AD0F13A9F8D0@marinusanalytics.com>

Wow, fantastic job, Maria!

Emily Kennedy
CEO, Founder
Marinus Analytics LLC
Direct: 916-205-1245
emily at marinusanalytics.com | marinusanalytics.com

On Jan 28, 2016, at 7:30 AM, Artur Dubrawski wrote:

Dear Autonians,

Maria gives us a rare opportunity to practice our Spanish reading skills.
Check out the original summary of her study of sexual violence in El
Salvador:

http://especiales.laprensagrafica.com/2015/violaciones-en-elsalvador/

and a couple of the many news snippets from Mexico and Colombia stemming
from it:

http://aristeguinoticias.com/2501/mundo/cada-cuatro-horas-violan-a-una-persona-en-el-salvador-reportaje/
http://www.elespectador.com/noticias/elmundo/cada-cuatro-horas-una-persona-violada-el-salvador-articulo-612781

Nice job, Maria!

Artur

From predragp at cs.cmu.edu Thu Jan 28 16:42:53 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Thu, 28 Jan 2016 16:42:53 -0500
Subject: LOV3, LOV4, LOW1 back in business
Message-ID: <20160128214253.pGDHgWn_jm9b%predragp@cs.cmu.edu>

Dear Autonians,

lov3, lov4, and low1 are back in business. I used the fact that the
servers were down to rebuild them with the latest RHEL 7.2. Unfortunately,
the installer was very moody and would not accept the large HDDs I had
from the old file servers, so the scratch space is the same as before
(about 350 GB on lov[3-4]). MATLAB is currently being installed.

Before you e-mail me that something is wrong, make sure you delete the old
records for lov3, lov4, and low1 from ~/.ssh/known_hosts; otherwise your
ssh client will not allow you to log in.

Enjoy,
Predrag

P.S. Also give me a few minutes with SELinux so that you can use ssh keys.

From predragp at cs.cmu.edu Fri Jan 29 17:04:31 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Fri, 29 Jan 2016 17:04:31 -0500
Subject: ARI is fixed now!
In-Reply-To:
References: <20160126213600.uPU6DWW7xHbR%predragp@cs.cmu.edu>
Message-ID: <20160129220431.kFFaneVwUCBE%predragp@cs.cmu.edu>

Anthony Wertz wrote:

> Predrag,
>
> Any update on ARI machine status?

I just fixed it. This is the last time I will ever install an OS on RAID1
unless there is a real reason for it...

Predrag

>
> 2016-01-26 16:36 GMT-05:00 Predrag Punosevac:
>
> > Dear Autonians,
> >
> > We got an extra circuit for our A1-2C rack, as well as a new UPS and
> > PDU. That means the computing nodes are slowly coming back.
> >
> > foxconn is now fully operational!
> > ari's file system looks damaged, and I will assess the situation
> > tomorrow.
> >
> > lov3, lov4, and low1 were scheduled for the RHEL 7.2 upgrade (from
> > 6.7), and I am doing it right now. Hopefully they will be ready by
> > tomorrow.
> >
> > Predrag
> >
>
> --
> *Anthony Wertz*
> Research Programmer and Analyst
> Robotics Institute - Auton Lab
> Carnegie Mellon University
> awertz at cmu.edu