From jmjoseph at andrew.cmu.edu Mon Mar 1 19:26:51 2004
From: jmjoseph at andrew.cmu.edu (Jacob Joseph)
Date: Mon, 01 Mar 2004 19:26:51 -0500
Subject: [auton-users] Lab Outage
Message-ID: <4043D4CB.5020708@andrew.cmu.edu>

Due to uncontrollable temperatures in the lab, we are going to have to
shut down most if not all of the lab for tonight and reconsider our
options tomorrow. With the warmer weather this week, the AC unit is
unable to keep up. It turns out the cooling coil for our air
conditioning was never actually repaired since it burst in early January
and the replacement is being special ordered. I was not able to obtain
any ETA for this.

We are looking at a few creative solutions to maintain the temperature,
but until further notice, the lab will be down.

-Jacob

From jmjoseph at andrew.cmu.edu Mon Mar 1 20:29:41 2004
From: jmjoseph at andrew.cmu.edu (Jacob Joseph)
Date: Mon, 01 Mar 2004 20:29:41 -0500
Subject: [auton-users] Lab Outage
In-Reply-To: <1078188043.8547.67.camel@LYNX.AUTON.CS.CMU.EDU>
References: <4043D4CB.5020708@andrew.cmu.edu> <1078188043.8547.67.camel@LYNX.AUTON.CS.CMU.EDU>
Message-ID: <4043E385.3010408@andrew.cmu.edu>

I thought I'd forward this on to everyone:

I hope to leave lofty, the file server, up if at all possible. NFS from
outside the lab will continue to work. Pat is looking into a few NFS
problems on the desktops, so if you experience problems, send a mail to
admin at autonlab.org.

Thanks.

-Jacob

Adam Goode wrote:
> What time will the lab go down? I need to get data off the RAID...
>
> On Mon, 2004-03-01 at 19:26, Jacob Joseph wrote:
>
>>Due to uncontrollable temperatures in the lab, we are going to have to
>>shut down most if not all of the lab for tonight and reconsider our
>>options tomorrow. With the warmer weather this week, the AC unit is
>>unable to keep up. It turns out the cooling coil for our air
>>conditioning was never actually repaired since it burst in early January
>>and the replacement is being special ordered.
>>I was not able to obtain
>>any ETA for this.
>>
>>We are looking at a few creative solutions to maintain the temperature,
>>but until further notice, the lab will be down.
>>
>>-Jacob
>>
>>
>
>

From komarek at cmu.edu Wed Mar 3 10:16:04 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Wed, 03 Mar 2004 10:16:04 -0500
Subject: [auton-users] Lab machines available again, sort of.
Message-ID: <1078326964.30084.9.camel@localhost>

Hello everyone,

Some hacking and a bit of good luck have made the lab's computers
available again. At least for a while.

We currently have no air conditioning. We won't have air conditioning
for several more weeks. The room holding our computers is being fed air
from outside to cool it. If the weather becomes warmer, the lab will
become warmer. We're not sure how much compute load we can sustain. But
we survived through the night with the lop*, lina, lira and loki all
powered on and 4 cpus busy. It appears the outside air is currently cold
enough to sustain the lab.

If the lab's ambient temperature sensor reaches 28C, we will send a
warning to everyone that we will shut down some or all of the lab. If it
reaches 30C, we will shut down immediately. You can see the current
temperature of this sensor at

http://www2.autonlab.org/status/server24.html
(Machine Summary -> Appliance -> Ambient).

We may need to prioritize jobs running in the lab if we do a partial
shut-down. If you have an important job that you feel must keep running,
please send us a note at admin at autonlab.org. Alternatively you can
watch the temperature graph linked above and wait to email us until the
temperature becomes critical.

-Paul Komarek

From komarek at cmu.edu Thu Mar 4 14:48:09 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Thu, 04 Mar 2004 14:48:09 -0500
Subject: [auton-users] ssh session termination
Message-ID: <1078429689.30076.44.camel@localhost>

Hello everyone,

We are preparing to make lop6 and lop7, as well as the 8GB loq1 and
loq2, available to the lab.
Among the steps is a lab-wide network operation that might/will
terminate some/all ssh connections to the lab. In particular, we need
to restart our address-translation daemon.

We would like to make this change sometime tonight. Please make sure
that you are not relying on an ssh session into the lab staying open
after 8pm tonight. If you have an ssh session open which you absolutely
positively cannot lose, please email admin at autonlab.org as soon as
possible or call me (Paul Komarek) at 412.983.1284.

We will notify everyone via this list when lop6, lop7, loq1 and loq2 are
available. Some of you may discover you can connect to these machines,
but please do not start any jobs without contacting us first. Just one
of several reasons for this policy is that these machines are being
rebooted fairly often now as we put them through their paces. It is
also not clear how much thermal load we can handle in the lab while the
AC is off.

-Paul Komarek

From komarek at cmu.edu Fri Mar 5 10:28:10 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Fri, 05 Mar 2004 10:28:10 -0500
Subject: [auton-users] lab temp climbing today
Message-ID: <1078500490.26983.14.camel@localhost>

Hi everyone,

The lab temp is climbing rapidly today. You can see it on the
Appliance -> Ambient graph at

http://www2.autonlab.org/status/server24.html

I've shut down a few machines that weren't in use, and would like to see
if there are any volunteers for shutting down a few more when/if the
temp crosses through 28. All I need is contact info for that time;
until then, we'll let the machines continue working.

-Paul

From pgunn at cs.cmu.edu Fri Mar 5 10:39:23 2004
From: pgunn at cs.cmu.edu (Pat Gunn)
Date: Fri, 5 Mar 2004 10:39:23 -0500 (EST)
Subject: [auton-users] Drinks for today's meeting
Message-ID:

Hey all,

Jean just called, and the only drinks available from Alladin's were
Nestea Iced Tea and Water. If you don't like either, it's suggested
that you bring your own drink.
Otherwise, the order should be as specified.

--
Pat Gunn
Research/Systems Programmer, Auton Group, CMU

From jmjoseph at andrew.cmu.edu Sat Mar 6 12:50:09 2004
From: jmjoseph at andrew.cmu.edu (Jacob Joseph)
Date: Sat, 06 Mar 2004 12:50:09 -0500
Subject: [auton-users] Another LAB OUTAGE
Message-ID: <404A0F51.60807@andrew.cmu.edu>

Well, I'm sorry to admit that the machines have now won the war. Cosmic
rays, searing temperatures, and bad luck are to blame.

Lofty, the fileserver, at this point is dead. After several false
alarms, it appears that at the very least, the motherboard was
destroyed, presumptively due to the heat. We have no reason to believe
that the data on BigPapa has been damaged, but I cannot commit at this
point. Just to be clear, this is not the sort of problem that software
can cause and we will need replacement hardware.

The timing of these problems is always wonderful. Last week, I was
swamped with midterms and major assignments and now I've gone home for
break. Thanks to Paul and Jeanie's all-nighter last night, we at least
have a hope of recovering using another machine. This is not going to
be a fast operation though, so I'll keep everyone posted on the status.

Incidentally, the status pages are down as well. We've stolen that
machine for our temporary solution.

Once again, do thank Paul and Jeanie for their efforts.

-Jacob

From komarek at cmu.edu Sat Mar 6 17:34:10 2004
From: komarek at cmu.edu (Paul Komarek)
Date: 06 Mar 2004 17:34:10 -0500
Subject: [auton-users] The lab is back up
Message-ID: <1078612442.14538.54.camel@laptop>

The lab fileserver has been replaced. While its ability to handle high
load has been diminished, we only ever see a high load for certain jobs
run on loki -- and loki is down because we stole its SCSI card.

lop1-7 are available, as are the new 8GB loq1 and loq2. Lina and lira
are also available.

Please report any troubles to me (Paul Komarek) immediately. My mobile
phone number is 412.983.1284.
My email is komarek at cmu.edu. You might also want to write to
admin at autonlab.org.

-Paul Komarek

From anya at cmu.edu Mon Mar 8 23:11:32 2004
From: anya at cmu.edu (Anna Goldenberg)
Date: Mon, 8 Mar 2004 23:11:32 -0500
Subject: [auton-users] lop3-5 request
Message-ID: <00dc01c4058c$aafb9030$c600a8c0@Anna>

Hi,

I would like to reserve lop3-5 for the next 4 days to conduct
experiments for UAI. Please e-mail me if it's of any inconvenience to
you. If I don't hear from anybody till 10am tomorrow, they will be
reserved.

Thank you,
Anna

From komarek at cmu.edu Tue Mar 9 14:24:39 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Tue, 09 Mar 2004 14:24:39 -0500
Subject: [auton-users] Reservation reminder
Message-ID: <1078860278.9319.197.camel@localhost>

Anya has reserved lop3, lop4 and lop5 until Saturday.

-Paul Komarek

From komarek at cmu.edu Sat Mar 20 04:04:02 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sat, 20 Mar 2004 04:04:02 -0500
Subject: [auton-users] lop6 and lop7 idle
Message-ID: <1079773441.5115.392.camel@localhost>

Hi,

I see that lop6 and lop7 have gone idle. I'm trying to cram a few more
timing experiments into my draft thesis (due Tuesday). Would anyone
mind if I reserved these (to make sure my timings were reliable) until
Tuesday?

If there is no new load on lop6 and lop7 by 3pm "tomorrow" (Saturday),
and nobody objects via email or otherwise, I'll reserve lop6 and lop7.

-Paul Komarek

From komarek at cmu.edu Sat Mar 20 16:45:46 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sat, 20 Mar 2004 16:45:46 -0500
Subject: [auton-users] lop6 and lop7 *not* reserved
Message-ID: <1079819145.5113.400.camel@localhost>

Hi,

I was asked that I not reserve lop6 and lop7 by one of our users, and
therefore those machines are not reserved.

-Paul

From komarek at cmu.edu Sun Mar 21 14:09:30 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sun, 21 Mar 2004 14:09:30 -0500
Subject: [auton-users] loq1
Message-ID: <1079896170.5120.462.camel@localhost>

Hello,

Due to ram problems and pressing deadlines I have stolen loq1. If
someone needs lop3 in exchange, please email me (komarek at cmu.edu)
directly and we can arrange a "trade".

-Paul

From komarek at cmu.edu Sun Mar 21 14:32:39 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sun, 21 Mar 2004 14:32:39 -0500
Subject: [auton-users] loq1
In-Reply-To: <1079896170.5120.462.camel@localhost>
References: <1079896170.5120.462.camel@localhost>
Message-ID: <1079897558.5113.464.camel@localhost>

Heh, before anyone worries: by "ram problems" I meant "not enough ram".

-Paul

On Sun, 2004-03-21 at 14:09, Paul Komarek wrote:
> Hello,
>
> Due to ram problems and pressing deadlines I have stolen loq1. If
> someone needs lop3 in exchange, please email me (komarek at cmu.edu)
> directly and we can arrange a "trade".
>
> -Paul
>

From komarek at cmu.edu Mon Mar 22 00:18:32 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Mon, 22 Mar 2004 00:18:32 -0500
Subject: [auton-users] loq2
Message-ID: <1079932711.5116.487.camel@localhost>

Hi,

loq2 has not been used by anyone but anya for a while, and I have her
permission to steal it (promising I'll unpause her jobs later ;-).
Since I can make good use of a second 8GB machine right now, I'm
reserving it.

All of this is for my thesis deadline on Tuesday, so these machines
should be freed up shortly after that.

-Paul Komarek

From komarek at cmu.edu Mon Mar 22 16:08:11 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Mon, 22 Mar 2004 16:08:11 -0500
Subject: [auton-users] lop3,4,5 open again
Message-ID: <1079989690.5115.553.camel@localhost>

The reservations on lop3,4,5 have been released.

-Paul Komarek

From komarek at cmu.edu Wed Mar 24 15:40:29 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Wed, 24 Mar 2004 15:40:29 -0500
Subject: [auton-users] Unreserving loq2, lop1; reserving lop6,7
Message-ID: <1080160829.29997.362.camel@localhost>

Hello,

I was formerly using loq1 and loq2, which were reserved for both Anya
and me. I no longer need both 8GB machines, and Anya has given me
permission to release our reservations on loq2.

I am also releasing my long-held reservation on lop1 no later than 5:30p
today. lop1 has an "old" BIOS (it will be updated soon, and is only
known to affect a certain part of my code when run on two particular
datasets). "Old" means that the BIOS is older than that of lop6,7 and
loq1,2. In fact, this is true of lop3,4,5 as well. We have updated
this BIOS on lop2 and it appears to be working properly.

In truth, I am trading these for lop6 and lop7. I have a bit of an
emergency with my thesis, and need to rerun many experiments.

The current reservation status is

lop1: open (by 5:30p)
lop2: Paul Komarek
lop3: open
lop4: open
lop5: open
lop6: Paul Komarek
lop7: Paul Komarek
loq1: Paul Komarek
loq2: open

-Paul Komarek

From komarek at cmu.edu Thu Mar 25 02:20:08 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Thu, 25 Mar 2004 02:20:08 -0500
Subject: [auton-users] lop1 is available
Message-ID: <1080199207.29995.375.camel@localhost>

Oops, I forgot to free lop1 by 5:30p like I promised. Well, it's free
now.

Open machines:

lop1 4GB
lop3 4GB
lop4 4GB
lop5 4GB
loq2 8GB

-Paul Komarek

From komarek at cmu.edu Sat Mar 27 16:39:23 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sat, 27 Mar 2004 16:39:23 -0500
Subject: [auton-users] lab down for maintenance
Message-ID: <1080423562.16463.140.camel@localhost>

Hi everyone,

Most of the lab machines were empty, and the users of the rest said they
didn't need them today. We are using this opportunity to do some
rearranging of the machines into our two new racks.
Please call me (Paul: 412.983.1284) or Jacob (412.831.5240) if today's
outage is inconvenient in any way.

Also, please email us if you are using, or will need to use, the lab's
storage via NFS today or tonight. We would also like to move the
firewall, fileserver and disks. That requires taking all of the lab's
network services offline for a short period (<< 1 hour anticipated).

-Paul

From jmjoseph at andrew.cmu.edu Sat Mar 27 16:43:14 2004
From: jmjoseph at andrew.cmu.edu (Jacob Joseph)
Date: Sat, 27 Mar 2004 16:43:14 -0500
Subject: [auton-users] lab down for maintenance
In-Reply-To: <1080423562.16463.140.camel@localhost>
References: <1080423562.16463.140.camel@localhost>
Message-ID: <4065F572.4080209@andrew.cmu.edu>

Paul must be dyslexic. My phone number is 831-524-0666.

-Jacob

Paul Komarek wrote:
> Hi everyone,
>
> Most of the lab machines were empty, and the users of the rest said they
> didn't need them today. We are using this opportunity to do some
> rearranging of the machines into our two new racks. Please call me
> (Paul: 412.983.1284) or Jacob (412.831.5240) if today's outage is
> inconvenient in any way.
>
> Also, please email us if you are using, or will need to use, the lab's
> storage via NFS today or tonight. We would also like to move the
> firewall, fileserver and disks. That requires taking all of the lab's
> network services offline for a short period (<< 1 hour anticipated).
>
> -Paul

From komarek at cmu.edu Sun Mar 28 01:39:31 2004
From: komarek at cmu.edu (Paul Komarek)
Date: Sun, 28 Mar 2004 01:39:31 -0500
Subject: [auton-users] lab is available
Message-ID: <1080455970.16460.142.camel@localhost>

Hello,

The lab is available again, as is NFS to office machines. I (Paul)
still have a reservation on lop2,6,7, and loq1.

-Paul Komarek