From donghanw at cs.cmu.edu Mon Jul 8 07:50:46 2013
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Mon, 8 Jul 2013 07:50:46 -0400
Subject: [auton-users] Matlab 2013a
Message-ID:

Dear Auton users,

I'm pleased to announce that Matlab 2013a is available on all compute nodes.
To launch it, run the command matlab. To launch the older versions, R2012b
and R2012a, use the commands matlab2012b and matlab2012a respectively.

Have fun,
Jarod

From donghanw at cs.cmu.edu Tue Jul 9 15:40:23 2013
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Tue, 9 Jul 2013 15:40:23 -0400
Subject: [auton-users] Auton core services restored
In-Reply-To:
References:
Message-ID:

Dear Auton users,

All Auton services, except the Hadoop cluster, are up and running. Please
let me know if you have any questions.

Thanks,
Jarod

On Mon, Jul 8, 2013 at 11:10 PM, Donghan (Jarod) Wang wrote:

> Dear Auton users,
>
> Monday at 4:45PM, the SCS machine room lost all power to panel-G. Power
> was restored at 6:15PM. The outage lasted longer than the UPS batteries
> could hold power. The auton cluster went off, along with all other
> machines in that row.
>
> The core services have been restored. I've been working on restoring the
> remaining services.
>
> You may want to check your jobs.
>
> Best,
> Jarod

From donghanw at cs.cmu.edu Wed Jul 10 16:31:34 2013
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Wed, 10 Jul 2013 16:31:34 -0400
Subject: [auton-users] Auton core services restored
In-Reply-To:
References:
Message-ID:

Dear Auton users,

The Hadoop cluster is restored.

Best,
Jarod

On Tue, Jul 9, 2013 at 3:40 PM, Donghan (Jarod) Wang wrote:

> Dear Auton users,
>
> All Auton services, except the Hadoop cluster, are up and running. Please
> let me know if you have any questions.
>
> Thanks,
> Jarod
>
> On Mon, Jul 8, 2013 at 11:10 PM, Donghan (Jarod) Wang wrote:
>
>> Dear Auton users,
>>
>> Monday at 4:45PM, the SCS machine room lost all power to panel-G. Power
>> was restored at 6:15PM. The outage lasted longer than the UPS batteries
>> could hold power. The auton cluster went off, along with all other
>> machines in that row.
>>
>> The core services have been restored. I've been working on restoring the
>> remaining services.
>>
>> You may want to check your jobs.
>>
>> Best,
>> Jarod

From predragp at andrew.cmu.edu Tue Jul 16 14:48:45 2013
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Tue, 16 Jul 2013 14:48:45 -0400
Subject: [auton-users] Account Login Problem
Message-ID: <31392ec2f7ef1ad4876f094d13c2dbfc.squirrel@webmail.andrew.cmu.edu>

Hello All,

Many of you have already found out that you have been unable to log in to
your Auton accounts since 12:00 pm. The reason is that I accidentally shut
down Lyre while rebooting the Guard Dog cluster, which ran out of memory
while performing intensive computations during the weekend.

Guard Dog and all four nodes are up, but the services are not fully
restored according to my monitoring server. However, people should be able
to ssh to those machines.

Lyre, our main NFS/CIFS and MySQL server, is also up, but its services are
not completely restored at this time. Restarting Lyre is non-trivial
because of its complex original installation. As the documentation is
sketchy, I am working my way backward to bring Lyre back to its normal
state.

I apologize for ruining your productive day.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Tue Jul 16 19:38:34 2013
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Tue, 16 Jul 2013 19:38:34 -0400
Subject: [auton-users] Lyre Fully functional!
Message-ID: <945600b37c7f2131078d1a64cd5c5b4e.squirrel@webmail.andrew.cmu.edu>

I am happy to report that Lyre is fully functional again. You should not
have any problems logging in to your accounts. Your home directories should
be properly mounted. md2 (RAID 6), which hosts data and your home
directories, has been successfully restored to its normal state.

Kudos to Michael J. Baysek, who worked with me for the past hour and a half
to fix things. Thanks Michael!!!

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Wed Jul 17 13:00:32 2013
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 17 Jul 2013 13:00:32 -0400
Subject: [auton-users] NFS stale file handle Important!
Message-ID: <46a07e8c6ea25bbac39a42b1485500a9.squirrel@webmail.andrew.cmu.edu>

Dear All,

It has been brought to my attention that many client machines have had
problems with stale NFS file handles, which are a result of yesterday's
file server crash. The cleanest way to resolve the problem (vetted by Mike
and Jerad, with whom I just spoke) is to reboot the problematic client
machines so that /auton can be properly mounted from the file server. I did
that with lou1 this morning, and lou1 now works as expected.

I would like to do the same with low1, lop1, and lot1. If you are using any
of these three machines, please stop any running processes and log out by
1:30 EST, as I am going to bring them down and then back up.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Wed Jul 17 13:41:53 2013
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 17 Jul 2013 13:41:53 -0400
Subject: [auton-users] low1 and lop1 back
Message-ID: <2053d1e6350e8d51b54228687928e5bc.squirrel@webmail.andrew.cmu.edu>

Two key machines, low1 and lop1, are back after a controlled reboot.
They should be fully functional and you should not have any problems. I
have not rebooted lot1, as it does not seem to have any problems. My office
mates are on lot1 and it seems to work as expected.

Predrag

From gcabrera at dim.uchile.cl Fri Jul 19 11:30:18 2013
From: gcabrera at dim.uchile.cl (Guillermo Cabrera)
Date: Fri, 19 Jul 2013 11:30:18 -0400
Subject: [auton-users] Good bye
Message-ID:

Hi everyone,

Tomorrow I head back to Chile, and I just wanted to say goodbye and thank
every one of you for all your help: helping me with my research, having
food and drinks together, handling administrative issues, and taking dogs
out of my office.

Many thanks to all of you!

Guillermo.