From donghanw at cs.cmu.edu Tue Apr 10 14:57:50 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Tue, 10 Apr 2012 14:57:50 -0400
Subject: [auton-users] LOW1 maintenance, Apr. 12 - 15

Hello all,

As you may have noticed, LOW1 has a faulty memory module that needs to be
replaced. All services on LOW1 will be unavailable during the maintenance,
which will take place *Apr. 12th through 15th*. If this is a problem,
please let me know at your earliest convenience.

If you have ever seen the following message on LOW1, you have already
experienced the memory issue:

    low1 kernel: Northbridge Error (node 0): DRAM ECC error detected on the NB.

The maintenance is critical for resolving the issue and for providing you
with a reliable computing environment on LOW1. During the downtime, a
series of tests will be run to identify the faulty module; because of
LOW1's large memory capacity (512GB), detection takes longer.

Thanks,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Wed Apr 11 16:27:15 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Wed, 11 Apr 2012 16:27:15 -0400
Subject: [auton-users] Managing ssh key using keychain

Hello lab,

This message describes how to use keychain to allow ssh access from one
host to another securely, without typing a password or passphrase. It comes
in handy in many scenarios, such as cvs operations or ssh from lops to low1.

It requires a private/public ssh key pair.
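The key-pair requirement can be checked with a short script. What follows is only a minimal sketch, not part of the original announcement: the helper name has_keypair and the SSH_DIR variable are hypothetical, and the script assumes the keys use the default file names id_rsa and id_rsa.pub under ~/.ssh.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both halves of the RSA key pair
# exist in the directory given as the first argument.
has_keypair() {
    dir="$1"
    [ -f "$dir/id_rsa" ] && [ -f "$dir/id_rsa.pub" ]
}

# Assumption: the keys live in ~/.ssh; set SSH_DIR to point elsewhere.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

if has_keypair "$SSH_DIR"; then
    echo "key pair found in $SSH_DIR"
else
    echo "no complete key pair in $SSH_DIR; generate one first (e.g. ssh-keygen -t rsa)"
fi
```

The check only confirms that the two files exist; it does not verify that they belong together or that the public key is installed on the remote host.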
You can check that the pair exists by looking for the files id_rsa.pub and
id_rsa in your ssh directory; by default they are in

    /auton/home//.ssh/
    /neill/home//.ssh/
or
    /home//.ssh/

Once your ssh keys are ready (if not, see this tutorial:
http://www.autonlab.org/auton_intranet/skills/sshkeys.html), simply append
the following line to your shell startup file: .bashrc for bash and zsh, or
.cshrc for csh and tcsh.

    eval `keychain --eval --agents ssh id_rsa`

That's it. Let me know if you have any questions.

Enjoy,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Wed Apr 11 20:06:18 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Wed, 11 Apr 2012 20:06:18 -0400
Subject: [auton-users] LOW1 maintenance, Apr. 12 - 15

Hi all,

This is a friendly reminder that LOW1 will be down for maintenance from
tomorrow morning (Apr. 12th) through the 15th. Thanks for your attention.

Jarod

On Tue, Apr 10, 2012 at 2:57 PM, Donghan (Jarod) Wang wrote:
> Hello all,
>
> As you may have noticed, LOW1 has a faulty memory module that needs to be
> replaced. All services on LOW1 will be unavailable during the maintenance,
> which will take place *Apr. 12th through 15th*. If this is a problem,
> please let me know at your earliest convenience.
>
> If you have ever seen the following message on LOW1, you have already
> experienced the memory issue:
>
> low1 kernel: Northbridge Error (node 0): DRAM ECC error detected on the NB.
>
> The maintenance is critical for resolving the issue and for providing you
> with a reliable computing environment on LOW1.
> During the downtime, a series of tests will be run to identify the faulty
> module; because of LOW1's large memory capacity (512GB), detection takes
> longer.
>
> Thanks,
> Jarod
>
> --
> Donghan (Jarod) Wang
> Research Programmer
> Robotics Institute
> Carnegie Mellon University
> 5000 Forbes Avenue
> Pittsburgh, PA 15213
> Email: donghanw at cs.cmu.edu
> Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Sun Apr 15 21:54:50 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Sun, 15 Apr 2012 21:54:50 -0400
Subject: [auton-users] LOW1 maintenance, Apr. 12 - 15

Hi lab,

The LOW1 maintenance will be extended to Tuesday, April 17th. The extension
is necessary because the testing program (memtest86) needs a long time to
scan the 512GB of memory on LOW1 and detect the faulty module.

I apologize for any inconvenience that you may experience.

Thanks,
Jarod

On Wed, Apr 11, 2012 at 8:06 PM, Donghan (Jarod) Wang wrote:
> Hi all,
>
> This is a friendly reminder that LOW1 will be down for maintenance from
> tomorrow morning (Apr. 12th) through the 15th. Thanks for your attention.
>
> Jarod
>
> On Tue, Apr 10, 2012 at 2:57 PM, Donghan (Jarod) Wang wrote:
>> Hello all,
>>
>> As you may have noticed, LOW1 has a faulty memory module that needs to be
>> replaced. All services on LOW1 will be unavailable during the maintenance,
>> which will take place *Apr. 12th through 15th*. If this is a problem,
>> please let me know at your earliest convenience.
>>
>> If you have ever seen the following message on LOW1, you have already
>> experienced the memory issue:
>>
>> low1 kernel: Northbridge Error (node 0): DRAM ECC error detected on the NB.
>>
>> The maintenance is critical for resolving the issue and for providing you
>> with a reliable computing environment on LOW1.
>> During the downtime, a series of tests will be run to identify the faulty
>> module; because of LOW1's large memory capacity (512GB), detection takes
>> longer.
>>
>> Thanks,
>> Jarod
>>
>> --
>> Donghan (Jarod) Wang
>> Research Programmer
>> Robotics Institute
>> Carnegie Mellon University
>> 5000 Forbes Avenue
>> Pittsburgh, PA 15213
>> Email: donghanw at cs.cmu.edu
>> Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Wed Apr 18 10:02:28 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Wed, 18 Apr 2012 10:02:28 -0400
Subject: [auton-users] LOW1 maintenance, Apr. 12 - 15

Hi all,

LOW1 is up and running. The memory stress test came up clean. However,
Memtest86+ is not guaranteed to pinpoint faulty memory, so I decided to put
the machine back into production to see whether the error comes up again
during normal operation.

Best,
Jarod

On Sun, Apr 15, 2012 at 9:54 PM, Donghan (Jarod) Wang wrote:
> Hi lab,
>
> The LOW1 maintenance will be extended to Tuesday, April 17th. The extension
> is necessary because the testing program (memtest86) needs a long time to
> scan the 512GB of memory on LOW1 and detect the faulty module.
>
> I apologize for any inconvenience that you may experience.
>
> Thanks,
> Jarod
>
> On Wed, Apr 11, 2012 at 8:06 PM, Donghan (Jarod) Wang wrote:
>> Hi all,
>>
>> This is a friendly reminder that LOW1 will be down for maintenance from
>> tomorrow morning (Apr. 12th) through the 15th. Thanks for your attention.
>>
>> Jarod
>>
>> On Tue, Apr 10, 2012 at 2:57 PM, Donghan (Jarod) Wang wrote:
>>> Hello all,
>>>
>>> As you may have noticed, LOW1 has a faulty memory module that needs to
>>> be replaced. All services on LOW1 will be unavailable during the
>>> maintenance, which will take place Apr. 12th through 15th.
>>> If this is a problem, please let me know at your earliest convenience.
>>>
>>> If you have ever seen the following message on LOW1, you have already
>>> experienced the memory issue:
>>>
>>> low1 kernel: Northbridge Error (node 0): DRAM ECC error detected on the NB.
>>>
>>> The maintenance is critical for resolving the issue and for providing
>>> you with a reliable computing environment on LOW1. During the downtime,
>>> a series of tests will be run to identify the faulty module; because of
>>> LOW1's large memory capacity (512GB), detection takes longer.
>>>
>>> Thanks,
>>> Jarod
>>>
>>> --
>>> Donghan (Jarod) Wang
>>> Research Programmer
>>> Robotics Institute
>>> Carnegie Mellon University
>>> 5000 Forbes Avenue
>>> Pittsburgh, PA 15213
>>> Email: donghanw at cs.cmu.edu
>>> Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Thu Apr 19 11:19:17 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Thu, 19 Apr 2012 11:19:17 -0400
Subject: [auton-users] Matlab R2011b

Hello lab,

Matlab has been upgraded to R2011b on all compute nodes. Just use the
command matlab as you normally would, and you are ready to go. If the
command is not found, please add the following path to your PATH variable:

    /auton/software/bin/x86_64

The complete list of available toolboxes can be found on the following page:
http://www.cs.cmu.edu/~help/software_licensing/software_licenses/toolboxes.html

The previous version, R2010b, is obsolete but will continue to be available
for a few months. To launch it, use the command matlab2010b.

Best,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238
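For bash or zsh users, the PATH change mentioned in the Matlab announcement could look roughly like the sketch below. It is an illustration only and not part of the original message: the helper name path_append is hypothetical, and csh/tcsh users would need the equivalent setenv syntax instead.

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already listed, so sourcing the startup file twice does not duplicate it.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;       # otherwise add it at the end
    esac
}

# Directory named in the announcement.
path_append /auton/software/bin/x86_64
export PATH
```

Appending rather than prepending keeps any locally preferred binaries ahead of the shared software directory; swap the order inside path_append if the opposite is wanted.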
From donghanw at cs.cmu.edu Fri Apr 20 11:06:30 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Fri, 20 Apr 2012 11:06:30 -0400
Subject: [auton-users] LOW1 restored

Attention Users:

The LOU1 compute node had to be rebooted unexpectedly because a process
overloaded the machine and caused it to stop responding. Any processes or
sessions on that machine were killed. This happened on Apr 19th, around
10:20pm. Please check your jobs.

Thanks!
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Mon Apr 23 10:35:27 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Mon, 23 Apr 2012 10:35:27 -0400
Subject: [auton-users] LOW1 maintenance, 04/23, 3-5PM

Hello all,

The maintenance for LOW1 will take place on *April 23rd, 3:00PM - 5:00PM*.
During the maintenance, LOW1 will be taken down; the faulty module has been
identified and will be replaced.

If this is a problem, please let me know at your earliest convenience.

Thanks,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Mon Apr 23 17:00:11 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Mon, 23 Apr 2012 17:00:11 -0400
Subject: [auton-users] LOW1 maintenance, 04/23, 3-5PM

Hello lab,

The maintenance for LOW1 is complete; the machine is up and running. The
faulty memory module has been replaced.

Best,
Jarod

On Mon, Apr 23, 2012 at 10:35 AM, Donghan (Jarod) Wang wrote:
> Hello all,
>
> The maintenance for LOW1 will take place on *April 23rd, 3:00PM - 5:00PM*.
> During the maintenance, LOW1 will be taken down; the faulty module has been
> identified and will be replaced.
>
> If this is a problem, please let me know at your earliest convenience.
>
> Thanks,
> Jarod
>
> --
> Donghan (Jarod) Wang
> Research Programmer
> Robotics Institute
> Carnegie Mellon University
> 5000 Forbes Avenue
> Pittsburgh, PA 15213
> Email: donghanw at cs.cmu.edu
> Tel: +1 412 268 1238

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Thu Apr 26 11:11:59 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Thu, 26 Apr 2012 11:11:59 -0400
Subject: [auton-users] R 2.14.2

Hello lab,

R has been upgraded to 2.14.2 on all compute nodes, including neills.
Simply type the command R as you normally would, and you are ready to go.

The old versions, R 2.13 and R 2.12, are obsolete but will continue to be
available for a few months. To launch them, type the command R_2_12 on LOU1
or R_2_13 on any other compute node.

Best,
Jarod

-- 
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238
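
The rule for the legacy R commands (R_2_12 on LOU1, R_2_13 elsewhere) can be written as a small lookup. This is only a sketch and not part of the original announcement: the helper name legacy_r_command is hypothetical, and it assumes the node reports its name as "lou1", optionally followed by a domain.

```shell
# Hypothetical helper: print the legacy R command for a given host name.
legacy_r_command() {
    # Lower-case the name so that "LOU1" and "lou1" are treated alike.
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$host" in
        lou1|lou1.*) echo "R_2_12" ;;   # LOU1 keeps R 2.12
        *)           echo "R_2_13" ;;   # other compute nodes keep R 2.13
    esac
}

# Assumption: `uname -n` returns the node name in the form described above.
echo "legacy R command on this host: $(legacy_r_command "$(uname -n)")"
```

The function only prints the command name; it does not check that the command is actually installed on the node.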