From mjbaysek at cs.cmu.edu Thu Feb 2 09:37:54 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Thu, 02 Feb 2012 09:37:54 -0500
Subject: [auton-users] Unexpected Outage
Message-ID: <4F2A9FC2.4020200@cs.cmu.edu>

We are experiencing trouble with our servers at the moment. Please
stay tuned for updates.

From mjbaysek at cs.cmu.edu Thu Feb 2 12:25:41 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Thu, 02 Feb 2012 12:25:41 -0500
Subject: [auton-users] Unexpected Outage
In-Reply-To: <4F2A9FC2.4020200@cs.cmu.edu>
References: <4F2A9FC2.4020200@cs.cmu.edu>
Message-ID: <4F2AC715.4090200@cs.cmu.edu>

We have been making progress with the outage. We are bringing up the
last of the services. Let us know if you continue to have any trouble.

On 2/2/12 9:37 AM, Michael J. Baysek wrote:
> We are experiencing trouble with our servers at the moment. Please
> stay tuned for updates.

From donghanw at cs.cmu.edu Fri Feb 3 09:34:45 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Fri, 3 Feb 2012 09:34:45 -0500
Subject: [auton-users] Unexpected Outage
In-Reply-To: <4F2AC715.4090200@cs.cmu.edu>
References: <4F2A9FC2.4020200@cs.cmu.edu> <4F2AC715.4090200@cs.cmu.edu>
Message-ID:

Hi Neill server users,

We've resolved the login issue. Let us know if you continue to have
any trouble.

Best,
Jarod

On Thu, Feb 2, 2012 at 12:25 PM, Michael J. Baysek wrote:
> We have been making progress with the outage. We are bringing up the
> last of the services. Let us know if you continue to have any trouble.
>
> On 2/2/12 9:37 AM, Michael J. Baysek wrote:
>>
>> We are experiencing trouble with our servers at the moment. Please
>> stay tuned for updates.

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Tel: +1 412 268 1238

From mjbaysek at cs.cmu.edu Fri Feb 3 18:04:46 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Fri, 3 Feb 2012 18:04:46 -0500 (EST)
Subject: [auton-users] My Status and Going Forward
Message-ID: <8086.128.147.0.41.1328310286.squirrel@webmail.cs.cmu.edu>

Hi Lab,

Some of you may not know this, but I have been out since Tuesday and
basically living at Hamot Hospital in Erie, supporting my family
through my father's open heart surgery. Clinically, he's doing well,
but he has been quite miserable through the whole ordeal. He really is
doing well though. Thanks for all of your concern.

Anyhow, as a result of this surgery and the toll it has taken on me
and my family, I have been less than responsive to some of your
requests, and for that, I'd like to apologize.

Please be sure to CC Jarod Wang when requesting help from now on.
Jarod is a very friendly guy who is taking the reins of sysadminship
from me as I make my move to Arizona. February 16th is my final day at
the lab. You should now start getting into the habit of asking both
Jarod and me for help when you make a request.

Best Regards,
Mike

From jostlund at cs.cmu.edu Mon Feb 6 17:57:02 2012
From: jostlund at cs.cmu.edu (John K. Ostlund)
Date: Mon, 6 Feb 2012 17:57:02 -0500 (EST)
Subject: [auton-users] two interdependent DLLs in 64-bit MS-Windows?
Message-ID: <47874.128.2.176.180.1328569022.squirrel@webmail.cs.cmu.edu>

Hi gang,

I've got a crisis that I'd appreciate some help with, from someone who
knows MS-Windows way better than I do.

We currently have a single DLL, named tcube_java.dll, that
encapsulates all of the Auton Lab C projects used by the T-Cube Web
Interface. We need to be able to split this into two separate DLLs,
e.g., no_sources.dll and sources.dll, where the former contains
callable routines from projects where the user DOES NOT have sources
(but does have .h and .obj files), and the latter contains routines
from projects where the user DOES have sources (.h, .c, .obj).

One problem is, these DLLs are interdependent.
For example, extra (sources) depends on draw (no sources), which in
turn depends on utils (sources).

For extra credit, explain how to have Tomcat 6 on Windows serve the
sources.dll file, which in turn makes use of the no_sources.dll file.

Help! And Thanks! We literally need a solution to this by tomorrow
morning.

- John O.

From mjbaysek at cs.cmu.edu Wed Feb 8 08:09:14 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Wed, 08 Feb 2012 08:09:14 -0500
Subject: [auton-users] Another Unexpected Outage
Message-ID: <4F3273FA.5010507@cs.cmu.edu>

In what seems to be a repeat of the last outage, we are down again.

What we thought was a failed hard drive in one (rather aged) server
has turned out to be a more systemic problem with that server. As the
server (or motherboard therein) has proven itself to be unreliable, we
will be working to get the server or motherboard replaced today.

I expect it will be a matter of some hours before this is finished.
We should be back up and running by or before noon.

Please stay tuned for updates.

From mjbaysek at cs.cmu.edu Wed Feb 8 11:09:23 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Wed, 08 Feb 2012 11:09:23 -0500
Subject: [auton-users] Another Unexpected Outage
In-Reply-To: <4F3273FA.5010507@cs.cmu.edu>
References: <4F3273FA.5010507@cs.cmu.edu>
Message-ID: <4F329E33.5070408@cs.cmu.edu>

The situation has returned to normal.

--
Michael J. Baysek
Systems Analyst
Carnegie Mellon University / Auton Lab
412-268-8939 - mjbaysek at cs.cmu.edu
http://www.autonlab.org

On 02/08/2012 08:09 AM, Michael J. Baysek wrote:
> In what seems to be a repeat of the last outage, we are down again.
>
> What we thought was a failed hard drive in one (rather aged) server
> has turned out to be a more systemic problem with that server. As the
> server (or motherboard therein) has proven itself to be unreliable, we
> will be working to get the server or motherboard replaced today.
>
> I expect it will be a matter of some hours before this is finished.
> We should be back up and running by or before noon.
>
> Please stay tuned for updates.

From mjbaysek at cs.cmu.edu Fri Feb 10 09:48:36 2012
From: mjbaysek at cs.cmu.edu (Michael J. Baysek)
Date: Fri, 10 Feb 2012 09:48:36 -0500
Subject: [auton-users] CPU Resources are Almost Maxed
Message-ID: <4F352E44.4020004@cs.cmu.edu>

Hi Lab,

In recent days, Jarod and I have noticed heavy use on the compute
nodes. We've also noticed that some nodes have many more jobs running
than there are processors available on those nodes.

Please be courteous to other users and check the status page at
http://www.autonlab.org/status (or at least run top) prior to
launching your jobs. Launching more jobs than there are available
processors reduces overall throughput for everyone on the node. The
number of processor cores available on a node is the denominator of
the LOAD value shown on the status page.

Also, a note for Neill Lab users. If your jobs fit on a Neill Lab
node, you should always prefer those nodes over general-purpose Auton
Lab nodes. You should do this because general Auton users can't access
the idle Neill Lab nodes, whereas you can.

Best,
Mike

--
Michael J. Baysek
Systems Analyst
mjbaysek at cs.cmu.edu - mbaysek at gmail.com
http://www.autonlab.org

From donghanw at cs.cmu.edu Tue Feb 14 21:20:12 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Tue, 14 Feb 2012 21:20:12 -0500
Subject: [auton-users] Rebooting LOP1, 8:00AM, Feb. 18
Message-ID:

Hi lab,

LOP1 will be rebooted on Saturday, Feb. 18, at 8AM for essential
system maintenance.

Please save your work and log out from LOP1 by then. During the
downtime, you can use LOP2 to access the lab's machines.

If this is a problem, please let Jarod know as soon as possible.

Thanks,

Jarod

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Fri Feb 17 08:37:31 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Fri, 17 Feb 2012 08:37:31 -0500
Subject: [auton-users] Rebooting LOP1, 8:00AM, Feb. 18
In-Reply-To:
References:
Message-ID:

Hi all,

This message is a reminder that on Saturday, February 18, we will
reboot LOP1 at 8:00AM. Please save your work and log out from LOP1 by
then.

If this is a problem, please let Jarod (donghanw at cs.cmu.edu) know as
soon as possible.

Thanks,

Jarod

On Tue, Feb 14, 2012 at 9:20 PM, Donghan (Jarod) Wang wrote:
> Hi lab,
>
> LOP1 will be rebooted on Saturday, Feb. 18, at 8AM because of
> essential system maintenance.
>
> Please save your work and log out from LOP1 by then. During the down
> time, you can use LOP2 to access the lab's machines.
>
> If this is a problem, please let Jarod know as soon as possible.
>
> Thanks,
>
> Jarod
>
> --
> Donghan (Jarod) Wang
> Research Programmer
> Robotics Institute
> Carnegie Mellon University
> 5000 Forbes Avenue
> Pittsburgh, PA 15213
> Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Sat Feb 18 08:56:09 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Sat, 18 Feb 2012 08:56:09 -0500
Subject: [auton-users] Rebooting LOP1, 8:00AM, Feb. 18
In-Reply-To:
References:
Message-ID:

Hi lab,

LOP1 is up and running. Please let me know if you have any problems.

Thanks,

Jarod

On Fri, Feb 17, 2012 at 8:37 AM, Donghan (Jarod) Wang wrote:
> Hi all,
>
> This message is a reminder that on Saturday, February 18, we will
> reboot LOP1 at 8:00AM. Please save your work and log out from LOP1 by
> then.
>
> If this is a problem, please let Jarod (donghanw at cs.cmu.edu) know as
> soon as possible.
>
> Thanks,
>
> Jarod
>
> On Tue, Feb 14, 2012 at 9:20 PM, Donghan (Jarod) Wang
> wrote:
>> Hi lab,
>>
>> LOP1 will be rebooted on Saturday, Feb. 18, at 8AM because of
>> essential system maintenance.
>>
>> Please save your work and log out from LOP1 by then. During the down
>> time, you can use LOP2 to access the lab's machines.
>>
>> If this is a problem, please let Jarod know as soon as possible.
>>
>> Thanks,
>>
>> Jarod
>>
>> --
>> Donghan (Jarod) Wang
>> Research Programmer
>> Robotics Institute
>> Carnegie Mellon University
>> 5000 Forbes Avenue
>> Pittsburgh, PA 15213
>> Tel: +1 412 268 1238

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Sun Feb 19 15:12:25 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Sun, 19 Feb 2012 15:12:25 -0500
Subject: [auton-users] LOT2 restored
Message-ID:

Hi Lab,

LOT2 has just been restored. Please check your jobs. Let me know if
you find any problems.

Thanks,

Jarod

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Mon Feb 20 19:47:01 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Mon, 20 Feb 2012 19:47:01 -0500
Subject: [auton-users] LOP1 restored
Message-ID:

Hi lab,

LOP1 is back online. Please check your jobs.

The incident was caused by compute jobs misplaced on LOP1, which
overloaded the machine. Please use LOP1 and LOP2 only as pass-through
hosts. Please view http://www.autonlab.org/status to find an
appropriate node to run your compute jobs on.

Let me know if you have any problems.

Thanks,

Jarod

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238

From donghanw at cs.cmu.edu Mon Feb 27 11:08:46 2012
From: donghanw at cs.cmu.edu (Donghan (Jarod) Wang)
Date: Mon, 27 Feb 2012 11:08:46 -0500
Subject: [auton-users] LOR2 down
Message-ID:

All,

LOR2 is down at the moment. I am looking into this and will let you
know once I have the system back up and running.

Thank you for your patience.

-Jarod

--
Donghan (Jarod) Wang
Research Programmer
Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Email: donghanw at cs.cmu.edu
Tel: +1 412 268 1238