From predragp at imap.srv.cs.cmu.edu  Thu May 12 14:05:51 2016
From: predragp at imap.srv.cs.cmu.edu (predragp)
Date: Thu, 12 May 2016 14:05:51 -0400
Subject: Fwd: [Network Outage] Maintenance - Wired and Wireless 05/19/16
In-Reply-To:
References:
Message-ID: <59b37f6dacdd2f16a384612a1cfa153a@imap.srv.cs.cmu.edu>

Unfortunately this network outage will affect the Auton Lab. The good
news is that they are upgrading wired Internet from 1 Gigabit to 20
Gigabit (my vote was that we immediately upgrade to 50 Gigabit, which is
just becoming available and not much more expensive).

Best,
Predrag

-------- Original Message --------
Subject: [Network Outage] Maintenance - Wired and Wireless 05/19/16
Date: 2016-05-12 11:11
From: SCS Help Desk
To: help at cs.cmu.edu

[Network Outage] Maintenance - Wired and Wireless 05/19/16

SCS Computing Facilities received this wired and wireless network outage
notice from Computing Services:

*** To verify the authenticity of this message, visit Computing Services
News at https://www.cmu.edu/computing/news/ ***

DAY, DATE & TIME:
Thursday, May 19, 2016 from 3:00-8:00 am

AREAS/BUILDINGS AFFECTED:
311 South Craig Street
4609 Winthrop Street
4612 Forbes Ave
4615 Forbes Ave
4616 Henry St.
4700 Fifth Avenue
Baker - Porter Hall
Collaborative Innovation Center
Cyert Hall
Doherty Hall
Elliott Dunlap Smith Hall
Facilities Management Services Building
Gates Center / Hillman Center Complex
Hamerschlag Hall
Newell-Simon Hall
Pittsburgh Technology Center
PPG
Roberts Engineering Hall
Scaife Hall
SEI-Rand
SEI-Sterling Plaza
Software Engineering Institute
Wean Hall

SERVICES AFFECTED:
Wired and Wireless Networks (CMU, CMU-SECURE, EDUROAM, and CMU-GUEST)

DETAILS:
Maintenance will be completed on the wired and wireless network
infrastructure from 3:00-8:00 am on Thursday, May 19, 2016. Those who
are using CMU wired and wireless networks MAY experience intermittent
connectivity loss.
If connectivity loss occurs, you will NOT have access to email, web
browsing, and a number of other network-dependent services.

Please direct any questions or comments to the Computing Services Help
Center (412-268-HELP or it-help at cmu.edu).

From predragp at cs.cmu.edu  Thu May 12 17:10:28 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Thu, 12 May 2016 17:10:28 -0400
Subject: neill[1-4] need to be powered down
Message-ID: <20160512211028.PWaOGFj9f%predragp@cs.cmu.edu>

Dear Autonians,

neill[1-4] computing nodes will need to be powered down in order to be
moved from the server RACK-1 to RACK-2. We are doing this in order to
prepare RACK-1 for a bunch of new GPU and CPU nodes we will be getting
in the near future.

Unless there is a serious objection (conference/paper deadlines) I would
like to power down neill[1-4] tomorrow between 10:00 AM and 12:00 PM.
Alternatively I can try to do the whole operation on Monday morning.

Cheers,
Predrag

From predragp at cs.cmu.edu  Fri May 13 12:25:04 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Fri, 13 May 2016 12:25:04 -0400
Subject: Neill[1-4]
Message-ID: <20160513162504.u3UL-kHJK%predragp@cs.cmu.edu>

Unfortunately the old railings are about 1 cm longer than RACK-2. I will
need at least a few more hours to put things back together. Right now
the Neill servers are outside of the rack.

Sorry,
Predrag

From predragp at cs.cmu.edu  Fri May 13 17:06:11 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Fri, 13 May 2016 17:06:11 -0400
Subject: Neill[1-4]
Message-ID: <20160513210611.VpskHWdwZ%predragp@cs.cmu.edu>

Neill[3-4] are back on-line. I am upgrading MATLAB to 2016a. Please give
me another hour before you attempt to use MATLAB.

Neill[1-2] are back in the rack and powered up but I can't ssh into them
right now. I am going back to the server room to see what the issue is.

I hope to have everything up and running by 6, including the MATLAB
upgrades. I apologize for the inconvenience.
I had to come up with custom railings to put things back.

Best,
Predrag

From predragp at cs.cmu.edu  Fri May 13 17:40:28 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Fri, 13 May 2016 17:40:28 -0400
Subject: Neill[1-2] are now back as well
Message-ID: <20160513214028.qvVlL7jry%predragp@cs.cmu.edu>

Neill[1-2] are back on-line as well. MATLAB is currently being
installed. This is my last e-mail regarding the Neill computing nodes.

Predrag

From predragp at cs.cmu.edu  Sat May 21 20:26:48 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Sat, 21 May 2016 20:26:48 -0400
Subject: Transferring large amount of data from external server to compute node
Message-ID: <20160522002648.svXqU2nAv%predragp@cs.cmu.edu>

Jayanth Koushik wrote:

> Hey Predrag
>
> For a new project, I need to transfer about 75 gigs of data (around 300
> files) to a GPU node on Auton Lab from another server. But I'm not quite
> sure how to do this without involving the NFS at some point. Is there
> any way to transfer the data directly to a compute node without going
> through one of the gateway nodes?
>

I can transfer the entire directory test (recursively) to my gpu1
scratch folder, using shell.autonlab.org as a proxy, by executing a
single command from my home desktop:

    scp -o ProxyCommand="ssh -W %h:%p shell.autonlab.org" -r test predrag@gpu1.int.autonlab.org:/home/scratch/predrag/

NFS is not involved at all.

Cheers,
Predrag

> I hope I've not misunderstood the architecture here.
>
> Thank you!
> Jayanth

From sheath at andrew.cmu.edu  Tue May 24 13:30:31 2016
From: sheath at andrew.cmu.edu (Simon Heath)
Date: Tue, 24 May 2016 13:30:31 -0400
Subject: ZFS share full
Message-ID: <0b6ad1344181a1c34118f04f366e1d88.squirrel@webmail.andrew.cmu.edu>

Hi all,

It looks like the /zfsauton/projects and /zfsauton/data directories are
entirely filled up, so if you want to copy things onto these you're
going to have trouble.
The logs show that about 1.5 TB of data has been created on these drives
since yesterday; since these drives are 1 TB and 4 TB respectively, this
is an excessive amount of space. I'm still working on finding out just
what is using all the space, but if it's you, please move your data to a
scratch drive or talk to me or Predrag about other solutions.

Thanks,
Simon

--
Simon Heath, Research Programmer and Analyst
Robotics Institute - Auton Lab
Carnegie Mellon University
sheath at andrew.cmu.edu

From predragp at imap.srv.cs.cmu.edu  Tue May 24 14:29:43 2016
From: predragp at imap.srv.cs.cmu.edu (predragp)
Date: Tue, 24 May 2016 14:29:43 -0400
Subject: Fwd: Brief Power Test Tomorrow in Wean, NSH and Smith
In-Reply-To: <574474A3.9080005@cs.cmu.edu>
References: <574474A3.9080005@cs.cmu.edu>
Message-ID:

Dear Autonians,

This shutdown will affect us in a significant way as we no longer have
UPSs. I would like to ask everyone to shut down their desktops when they
leave for home today.

Best,
Predrag

-------- Original Message --------
Subject: Fwd: Brief Power Test Tomorrow in Wean, NSH and Smith
Date: 2016-05-24 11:34
From: Artur Dubrawski
To: Predrag Punosevac

-------- Forwarded Message --------
SUBJECT: Brief Power Test Tomorrow in Wean, NSH and Smith
DATE: Tue, 24 May 2016 09:08:10 -0400
FROM: Jim Skees
TO: Jim Skees

SCS Folks,

This test of one of CMU's main Duquesne Light power feeds should only
produce a brief spike. However, if the mechanism being tested fails,
this may result in as long as a 15-minute power outage in Wean, NSH and
Smith between 6 and 7 a.m. tomorrow.

So if you have sensitive equipment or are simply a strong believer in
Murphy's Law, you may want to shut down your systems when you leave
campus later today.

--Jim

REMINDER - Duquesne Light will be performing the power failure test this
Wednesday, 5/25, between 6:00AM - 7:00AM. Please see below for more
information.
From: FMS Announce
Sent: Wednesday, May 11, 2016 12:54 PM
To: 'fms-shutdown at lists.andrew.cmu.edu'; 'fms-baker at lists.cmu.edu';
 'fms-hamburg at lists.andrew.cmu.edu'; 'fms-mmch at lists.andrew.cmu.edu';
 'fms-newelsimon at lists.andrew.cmu.edu'; 'fms-fmsb at lists.andrew.cmu.edu';
 'fms-porter at lists.andrew.cmu.edu'; 'fms-smith at lists.andrew.cmu.edu';
 'fms-wean at lists.andrew.cmu.edu'
Subject: UPDATE - Power Failure Testing - 5/25

UPDATE - Duquesne Light will be performing the power failure test on
Wednesday, 5/25, between 6:00AM - 7:00AM.

The buildings listed below will experience a momentary "blip" in power
during the test. If the test is not successful, the buildings will
experience a power outage for up to 15 minutes. Please remember to save
your files on your computers and turn all non-essential equipment off
when you leave on 5/24.

- Baker Hall
- Hamburg Hall
- Margaret Morrison College
- Newell Simon Hall
- Physical Plant (FMS)
- Porter Hall
- Smith Hall
- Wean Hall

If you would like to receive a follow-up email when the test concludes,
please send a message to Shannon Wetzel at swetzel at andrew.cmu.edu.

If you have any questions, please call Service Response at 8-2910 or
e-mail swetzel at andrew.cmu.edu.

From predragp at imap.srv.cs.cmu.edu  Tue May 24 14:46:52 2016
From: predragp at imap.srv.cs.cmu.edu (predragp)
Date: Tue, 24 May 2016 14:46:52 -0400
Subject: ZFS share full
In-Reply-To: <0b6ad1344181a1c34118f04f366e1d88.squirrel@webmail.andrew.cmu.edu>
References: <0b6ad1344181a1c34118f04f366e1d88.squirrel@webmail.andrew.cmu.edu>
Message-ID: <5eef842e45198430a235327fd71c583a@imap.srv.cs.cmu.edu>

On 2016-05-24 13:30, Simon Heath wrote:
> Hi all,
>
> It looks like the /zfsauton/projects and /zfsauton/data directories are
> entirely filled up, so if you want to copy things onto these you're
> going to have trouble.
> The logs show that about 1.5 TB of data has been created on these
> drives since yesterday; since these drives are 1 TB and 4 TB
> respectively, this is an excessive amount of space. I'm still working
> on finding out just what is using all the space, but if it's you,
> please move your data to a scratch drive or talk to myself or Predrag
> about other solutions.
>
> Thanks,
> Simon

This is very serious. I just didn't anticipate 1.5 TB of data being
created in one day. Anyhow, for all practical purposes our main file
server hosting user directories is now dead. Unfortunately all that nice
ZFS stuff (copy-on-write, checksums, consistency checks, and journaling)
comes with a price. The price is that once the volume /zfsauton is over
80% full, NFS performance will seriously degrade, and even trying to
erase files will not work the way people are used to on less
sophisticated file systems. /zfsauton is now 90% full.

Anyhow, I have to think for a day or two about the correct way to
proceed.

Predrag

From predragp at imap.srv.cs.cmu.edu  Tue May 24 16:01:43 2016
From: predragp at imap.srv.cs.cmu.edu (predragp)
Date: Tue, 24 May 2016 16:01:43 -0400
Subject: people hogging gpu memory they're not using
In-Reply-To:
References:
Message-ID: <77d1708bb872f3dc2c4d9fbc4c22bd0a@imap.srv.cs.cmu.edu>

On 2016-05-24 15:44, Dougal Sutherland wrote:
> Hey,
>
> So, there are two processes running on gpu2 that are claiming a lot of
> the gpu memory but seem to be idle:
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |    0       884    C   python                                        8053MiB |
> |    0      9904    C   python                                          78MiB |
> |    1       884    C   python                                          64MiB |
> |    1      9904    C   python                                        2485MiB |
> |    1     26099    C   caffe                                          105MiB |
> +-----------------------------------------------------------------------------+
>
> 884 is an IPython terminal run by chaoliu1, which has been
> running for a week; 9904 is "python test_trained_network.py", for
> almost three weeks. The GPU and CPU are both idle; it seems that these
> are just sitting there, claiming but not using the memory. (26099 is a
> Caffe process by suppe, but that's using way less memory so I don't
> care.)
>
> If I knew who these people were, I'd just email them myself, but maybe
> you can get in touch with them to stop hogging resources that they're
> not actually using?
>
> Thanks,
> Dougal

Hi Doug,

We have a new wiki and website (powered by DokuWiki) thanks to Simon
Heath, so the above behavior will be added to the "don't do it" section
of the Auton Lab Etiquette page. Right now we will treat this as a case
of an uninformed user and hope that the person will read this e-mail and
do the right thing. I am CCing users at autonlab.org.

Best,
Predrag
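
The check Dougal did by eye on the process table above can be
approximated with a short script. What follows is only a sketch under
stated assumptions, not a tool the lab uses: it assumes the table layout
is exactly the one pasted in the message (other driver versions print a
different layout), the function names are made up for illustration, and
the 1024 MiB threshold is an arbitrary choice. It only sums memory per
PID; it does not decide whether a process is idle, which would need
utilization data that is not in the table.

```python
import re

# One row of the "Processes" table as pasted in the message above, e.g.
# "|    0       884    C   python                     8053MiB |".
# Assumption: this layout; other nvidia-smi versions format rows differently.
ROW = re.compile(r"^\|\s*(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s*\|$")


def parse_process_table(text):
    """Return one dict per process row; header and border lines are skipped."""
    procs = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, ptype, name, mem = match.groups()
            procs.append({"gpu": int(gpu), "pid": int(pid), "type": ptype,
                          "name": name, "mem_mib": int(mem)})
    return procs


def memory_by_pid(procs):
    """Sum GPU memory (MiB) across all GPUs for each PID."""
    totals = {}
    for proc in procs:
        totals[proc["pid"]] = totals.get(proc["pid"], 0) + proc["mem_mib"]
    return totals


def large_holders(procs, threshold_mib=1024):
    """PIDs whose total GPU memory is at or above a (hypothetical) threshold."""
    totals = memory_by_pid(procs)
    return sorted(pid for pid, mib in totals.items() if mib >= threshold_mib)


# The rows from the message, used here as sample input.
SAMPLE = """\
|    0       884    C   python                                        8053MiB |
|    0      9904    C   python                                          78MiB |
|    1       884    C   python                                          64MiB |
|    1      9904    C   python                                        2485MiB |
|    1     26099    C   caffe                                          105MiB |
"""

if __name__ == "__main__":
    for pid in large_holders(parse_process_table(SAMPLE)):
        print(pid)
```

On the sample rows this flags PIDs 884 and 9904, the two processes named
in the message, and leaves the small Caffe process alone.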