From pnspence at andrew.cmu.edu Wed Apr 1 09:31:38 2020 From: pnspence at andrew.cmu.edu (Patricia N Spencer) Date: Wed, 1 Apr 2020 13:31:38 +0000 Subject: Headset - Last Call! reply by Noon Thursday Message-ID: Hi everyone! I hope this finds you having a good day! I have 4 headsets left. If anyone is in need, I will drive it to you and place at your door. Those left unadopted will be returned to Amazon this week. Thanks very much and take care! Trish [1562005799537] Trish Spencer Project Manager Auton Lab, Robotics Institute Carnegie Mellon University Newell Simon Hall, Room #3124| Pittsburgh, PA | 15222 T: 412.268.9422; M: 831-227-3137 | PNSPENCE at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 5461 bytes Desc: image001.png URL: From yeehos at andrew.cmu.edu Wed Apr 1 12:40:54 2020 From: yeehos at andrew.cmu.edu (Yeeho Song) Date: Wed, 1 Apr 2020 12:40:54 -0400 Subject: Headset - Last Call! reply by Noon Thursday In-Reply-To: References: Message-ID: Dear Patricia Spencer Thank you for your offer! If possible could you drop one off at 230 N.Craig Street as well? I live in an apartment building so if you could write my name on the box, it would be much easier for me to identify which package is mine! Sincerely, Yeeho Song On Wed, Apr 1, 2020 at 9:33 AM Patricia N Spencer wrote: > > > Hi everyone! > > > > I hope this finds you having a good day! > > > > I have 4 headsets left. If anyone is in need, I will drive it to you and > place at your door. > > > > Those left unadopted will be returned to Amazon this week. > > > > Thanks very much and take care! 
> > > > Trish > > > > [image: 1562005799537] > > *Trish Spencer* > Project Manager > > Auton Lab, Robotics Institute > > *Carnegie Mellon University * > > Newell Simon Hall, Room #3124*|* Pittsburgh, PA *|* 15222 > > T: 412.268.9422; M: 831-227-3137 *|** PNSPENCE at andrew.cmu.edu > * > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 5461 bytes Desc: not available URL: From awd at cs.cmu.edu Wed Apr 1 16:03:14 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Wed, 1 Apr 2020 16:03:14 -0400 Subject: Fwd: C3.ai COVID-19 Data Lake In-Reply-To: <5d9b1645-1045-469d-896b-221829a2f0e3@andrew.cmu.edu> References: <5d9b1645-1045-469d-896b-221829a2f0e3@andrew.cmu.edu> Message-ID: fyi ---------- Forwarded message --------- From: Srinivasa Narasimhan Date: Wed, Apr 1, 2020 at 2:54 PM Subject: C3.ai COVID-19 Data Lake To: Hi all, If you want to get access to the COVID-19 data sources, here is the c3.ai project weblink: https://c3.ai/covid/ Srinivas. _______________________________________________ ri-voting mailing list ri-voting at lists.andrew.cmu.edu https://lists.andrew.cmu.edu/mailman/listinfo/ri-voting -------------- next part -------------- An HTML attachment was scrubbed... URL: From hiteshar at andrew.cmu.edu Wed Apr 1 16:34:51 2020 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Wed, 1 Apr 2020 16:34:51 -0400 Subject: Permission denied error on scratch directory on all GPUs Message-ID: Hi Predrag, I started facing permission denied issue on scratch directory of all GPUs recently, leading to my processes being killed. I checked with Tanmay and he is also facing the same issue, so I guess it is an issue affecting more users. It looks like directory permissions got changed due to some reason: hiteshar at gpu6$ pwd /home hiteshar at gpu6$ ls -l total 8 drwx------. 
5 auton-local auton-local 134 Sep 28 2017 auton-local drwxr-xr-x. 3 root root 20 Sep 30 2019 MATLAB drwx------. 143 root root 4096 Mar 31 22:19 scratch Can you please fix the permissions at the earliest? Thanks, Hitesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From yeehos at andrew.cmu.edu Wed Apr 1 16:39:16 2020 From: yeehos at andrew.cmu.edu (Yeeho Song) Date: Wed, 1 Apr 2020 16:39:16 -0400 Subject: Permission denied error on scratch directory on all GPUs In-Reply-To: References: Message-ID: Dear Predrag Punosevac I would also like to add that I'm also having similar issues with all of CPUs as well; yeehos at lov5$ ls -l total 8 drwxr-xr-x. 3 root root 20 Oct 1 2019 MATLAB drwx------. 5 auton-local auton-local 138 Oct 1 2018 auton-local drwx------. 131 root root 4096 Mar 31 22:19 scratch Thank you for your help! Sincerely, Yeeho Song On Wed, Apr 1, 2020 at 4:37 PM Hitesh Arora wrote: > Hi Predrag, > > I started facing permission denied issue on scratch directory of all GPUs > recently, leading to my processes being killed. I checked with Tanmay and > he is also facing the same issue, so I guess it is an issue affecting more > users. > > It looks like directory permissions got changed due to some reason: > > hiteshar at gpu6$ pwd > /home > hiteshar at gpu6$ ls -l > total 8 > drwx------. 5 auton-local auton-local 134 Sep 28 2017 auton-local > drwxr-xr-x. 3 root root 20 Sep 30 2019 MATLAB > drwx------. 143 root root 4096 Mar 31 22:19 scratch > > Can you please fix the permissions at the earliest? > > Thanks, > Hitesh > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From predragp at andrew.cmu.edu Wed Apr 1 17:44:46 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 01 Apr 2020 17:44:46 -0400 Subject: [Permission Denied] Cd'ing into the scratch directory In-Reply-To: References: Message-ID: <20200401214446.FF-I4Gl59%predragp@andrew.cmu.edu> George Stoica wrote: > Hi Predrag, > > I hope everything is well. > > I am unable to cd into my scratch directory. I am receiving a "permission > denied" no matter which directory I choose within my scratch root directory > on GPU8: "/home/scratch/gis". I was wondering if you know by chance what > might be causing this issue? I have never encountered it before on the > auton resources. I also cannot sudo into the directories either. Fixed! I was removing a few user accounts and didn't realize that a runaway script had messed up permissions. Thanks for reporting! Predrag > > > Thanks very much, > George From ngisolfi at cs.cmu.edu Thu Apr 2 10:44:30 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 2 Apr 2020 10:44:30 -0400 Subject: [Lunch] Today @noon over Zoom Message-ID: <77E282DB-4623-477A-AE8B-1E70B5E4AC88@cs.cmu.edu> Hi Everyone, Auton Lab's bring-your-own-lunch will happen today @noon over Zoom. The link for convenience: https://cmu.zoom.us/j/492870487 Meeting ID: 492 870 487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From hiteshar at andrew.cmu.edu Thu Apr 2 13:27:32 2020 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Thu, 2 Apr 2020 13:27:32 -0400 Subject: GPU9 and GPU 10 scratch full In-Reply-To: References: Message-ID: Hi All, The GPU 9 and 10 scratch directories are almost full. Please check and delete/move your files from the scratch directories. Thanks, Hitesh On Tue, Feb 25, 2020 at 12:10 PM Hitesh Arora wrote: > Hi All, > > A gentle reminder on this! GPU 9 scratch is now completely FULL, making it > unusable.
Please check and delete/move your files from the scratch > directory of GPU 9. > > > Thanks, > Hitesh > > On Sat, Feb 1, 2020 at 12:42 PM Sarveshwaran Jayaraman < > sarveshj at andrew.cmu.edu> wrote: > >> Hi All, >> >> >> GPU9 scratch space is almost full -- please check your space usage and >> clear out any unnecessary files. Thanks for your help! >> >> >> sarveshj at gpu9$ df -h /home/scratch >> Filesystem Size Used Avail Use% Mounted on >> /dev/mapper/sl_gpu9-home 1.8T 1.8T 67G 97% /home >> >> >> [image: 1562005799537] >> >> Sarvesh Jayaraman >> Sr. Research Analyst, Auton Lab >> Carnegie Mellon University >> Mob: +1-240-893-4287 >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: OutlookEmoji-15620057995379b79b276-c9bc-475c-bdd9-7612280c8e92.png Type: image/png Size: 5461 bytes Desc: not available URL: From tanmaya at andrew.cmu.edu Thu Apr 2 13:48:48 2020 From: tanmaya at andrew.cmu.edu (Tanmay Agarwal) Date: Thu, 2 Apr 2020 13:48:48 -0400 Subject: Requesting users to kindly free-up resources once the runs are done! Message-ID: Hi Auton Users, I hope everyone is doing well and staying safe and healthy. I have found on a few GPUs that there are certain jobs that have been running for more than 2-3 weeks. I request you to kindly free-up the resources and not let them run for over weeks so that we can efficiently use the shared resources. Thanking you, Warm Regards, Tanmay Agarwal | MSR Graduate Student Robotics Institute @ CMU mailto: tanmaya at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Thu Apr 2 13:57:24 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Thu, 2 Apr 2020 13:57:24 -0400 Subject: Requesting users to kindly free-up resources once the runs are done! In-Reply-To: References: Message-ID: This doesn't apply to Cov19 research related effort. 
Full steam ahead! I will not think twice to dedicate resources to Pandemic fighting effort without waiting for approval from my supervisors. Predrag On Thu, Apr 2, 2020, 1:50 PM Tanmay Agarwal wrote: > Hi Auton Users, > > I hope everyone is doing well and staying safe and healthy. I have found > on a few GPUs that there are certain jobs that have been running for more > than 2-3 weeks. I request you to kindly free-up the resources and not let > them run for over weeks so that we can efficiently use the shared resources. > > Thanking you, > > Warm Regards, > > Tanmay Agarwal | MSR Graduate Student > Robotics Institute @ CMU > mailto: tanmaya at andrew.cmu.edu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tanmaya at andrew.cmu.edu Thu Apr 2 14:05:26 2020 From: tanmaya at andrew.cmu.edu (Tanmay Agarwal) Date: Thu, 2 Apr 2020 14:05:26 -0400 Subject: Requesting users to kindly free-up resources once the runs are done! In-Reply-To: References: Message-ID: Well, thanks Predrag for pointing that out. I would too appreciate any research efforts on that end and the above mail is just for users who are NOT working on Covid-19 related research / other important projects. Thanking you, Warm Regards, Tanmay Agarwal | MSR Graduate Student Robotics Institute @ CMU mailto: tanmaya at andrew.cmu.edu On Thu, Apr 2, 2020 at 1:57 PM Predrag Punosevac wrote: > This doesn't apply to Cov19 research related effort. Full steam ahead! I > will not think twice to dedicate resources to Pandemic fighting effort > without waiting for approval from my supervisors. > > Predrag > > On Thu, Apr 2, 2020, 1:50 PM Tanmay Agarwal > wrote: > >> Hi Auton Users, >> >> I hope everyone is doing well and staying safe and healthy. I have found >> on a few GPUs that there are certain jobs that have been running for more >> than 2-3 weeks. 
I request you to kindly free-up the resources and not let >> them run for over weeks so that we can efficiently use the shared resources. >> >> Thanking you, >> >> Warm Regards, >> >> Tanmay Agarwal | MSR Graduate Student >> Robotics Institute @ CMU >> mailto: tanmaya at andrew.cmu.edu >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Mon Apr 6 13:55:23 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 06 Apr 2020 13:55:23 -0400 Subject: GPU19 full In-Reply-To: References: Message-ID: <20200406175523.PN7GPhCnT%predragp@andrew.cmu.edu> Mayra Melendez wrote: > Hi, Predrag, > > How does one check how much space is left on one of the Lab's GPUs? I'm > trying to save something to scratch on GPU19, and I'm getting a "[Errno28] > No space left on device" error. I just want to be sure it's actually full - > and I'm not just running into some weird error - before I email the rest of > the Lab and ask them to empty what they can off that GPU. Hi Mayra, df /home would do it. GPU has only 256 GB NVMe drive and no extra storage space. The scratch space is really tiny around 160GB. It is all hogged by a single user hiteshar who is CC to this email. I do have spare drives but limited physical access to the machine during the pandemic. If he doesn't clean his thing I will have to clean the scratch directory for him. Cheers, Predrag > > Hope you're staying well, > Mayra Melendez From hiteshar at andrew.cmu.edu Mon Apr 6 14:06:47 2020 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Mon, 6 Apr 2020 14:06:47 -0400 Subject: GPU19 full In-Reply-To: <20200406175523.PN7GPhCnT%predragp@andrew.cmu.edu> References: <20200406175523.PN7GPhCnT%predragp@andrew.cmu.edu> Message-ID: Hi Predrag and Mayra, I cleared up some space, and am moving other files, so you should be able to run your processes now. Sorry about it, some of my runs dumped more data than I intended. 
I'm on it, and will clear it further right away. Thanks, Hitesh On Mon, Apr 6, 2020 at 1:55 PM Predrag Punosevac wrote: > Mayra Melendez wrote: > > > Hi, Predrag, > > > > How does one check how much space is left on one of the Lab's GPUs? I'm > > trying to save something to scratch on GPU19, and I'm getting a > "[Errno28] > > No space left on device" error. I just want to be sure it's actually > full - > > and I'm not just running into some weird error - before I email the rest > of > > the Lab and ask them to empty what they can off that GPU. > > > Hi Mayra, > > df /home > > would do it. GPU has only 256 GB NVMe drive and no extra storage space. > The scratch space is really tiny around 160GB. It is all hogged by a > single user > > hiteshar > > who is CC to this email. I do have spare drives but limited physical > access to the machine during the pandemic. If he doesn't clean his thing > I will have to clean the scratch directory for him. > > Cheers, > Predrag > > > > > > Hope you're staying well, > > Mayra Melendez > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngisolfi at cs.cmu.edu Thu Apr 9 09:23:35 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 9 Apr 2020 09:23:35 -0400 Subject: [Lunch] Today @noon over Zoom Message-ID: Hi Everyone, Auton Lab's bring-your-own-lunch will happen today @noon EDT over Zoom. The link for convenience: https://cmu.zoom.us/j/492870487 Meeting ID: 492 870 487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From boecking at andrew.cmu.edu Thu Apr 9 17:09:57 2020 From: boecking at andrew.cmu.edu (Benedikt Boecking) Date: Thu, 9 Apr 2020 17:09:57 -0400 Subject: Server load Message-ID: Hi all, Please be considerate with your research usage (see attached screenshot of lov6). 1. Before running a large job, check if it will slow others down because the server is already at its limit. 2.
When choosing the number of parallel threads or number of GPUs to use, please be mindful that you are not the only one in need of the resources. 3. Please check htop frequently. Some settings can cause multiprocessing you might not be aware of and overwhelm the server with a massive number of threads (e.g. OpenMP or Intel MKL). Thanks, Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 94695 bytes Desc: not available URL: From ngisolfi at cs.cmu.edu Thu Apr 16 11:17:18 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 16 Apr 2020 11:17:18 -0400 Subject: [Lunch] Today @noon over Zoom Message-ID: <48368E2B-EF57-4A0B-B706-FFF8644F3CEC@cs.cmu.edu> Hi Everyone, Auton Lab's bring-your-own-lunch will happen today @noon EDT over Zoom. The link for convenience: https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From shulij at andrew.cmu.edu Fri Apr 17 12:51:49 2020 From: shulij at andrew.cmu.edu (Shuli Jiang) Date: Fri, 17 Apr 2020 12:51:49 -0400 Subject: Shuli's Master's thesis defense on Tuesday (April.21) Message-ID: Dear Autonians, I will be holding my Master's thesis defense next *Tuesday (April 21), 16:30 ~ 17:30*, on "Deep Multi-view Clustering Using Local Similarity Graphs". You are welcome to attend if you are interested. Zoom link: https://cmu.zoom.us/j/99312100561 Thesis committee members: Prof. Artur Dubrawski (Chair), Prof. Jeff Schneider Title: Deep Multi-view Clustering Using Local Similarity Graphs Abstract: Multi-view clustering involves clustering data with different, possibly distinct feature sets simultaneously. In many application domains, multi-view data arises naturally.
For example, news can be described by both text and pictures, and multimedia segments can be described by their video signals from cameras and audio signals from voice recorders. Multi-view clustering has a wide range of potentially impactful applications. Yet, the benefits of using graph-based local similarity information to learn better representations of data for clustering, and the flexibility of incorporating pairwise constraints which may be accessible to improve clustering performance, are still under-explored in multi-view clustering. We present Local Similarity Graph based Multi-view Clustering (LSGMC), a new and improved correlation based multi-view clustering approach. The method leverages local similarity graphs constructed by mutual K nearest neighbors. LSGMC uses the graphs to guide search for a better data representation through exploring first order proximity within views, and utilizing complementary information across views. We empirically show that LSGMC can efficiently use information from multiple views to improve clustering accuracy, and outperform state-of-the-art multi-view alternatives on a variety of benchmark and real world datasets, including image data for hand digit recognition, text data for language recognition and acoustic-articulatory data for speech recognition. We further show that LSGMC is flexible in incorporating pairwise constraints and thus it can be naturally extended to handle semi-supervised learning problems. Thesis document: http://www.andrew.cmu.edu/user/shulij/master_thesis.pdf Happy Friday. Cheers, Shuli -- *Shuli Jiang* Carnegie Mellon University B.S. Computer Science, 2019 M.S. Computer Science, 2020 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From predragp at andrew.cmu.edu Fri Apr 17 18:24:07 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 17 Apr 2020 18:24:07 -0400 Subject: Fwd: Autonlab-sysinfo Digest, Vol 69, Issue 175 In-Reply-To: References: Message-ID: Dear Autonians, Please see below. One of the HDDs on the file server hosting /zfsauton/data and /zfsauton/project has crapped out on me. That, in turn, degraded one of the zfs pools
root at uranus:~ # zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
archive  21.8T  12.1T  9.62T        -         -    21%    55%  1.00x    ONLINE  -
backups  36.2T  23.4T  12.9T        -         -    17%    64%  1.00x  DEGRADED  -
data0    36.2T  17.3T  18.9T        -         -    30%    47%  1.00x    ONLINE  -
data1    36.2T  2.54T  33.7T        -         -     7%     6%  1.00x    ONLINE  -
zroot     107G  64.7G  42.3G        -         -    24%    60%  1.00x    ONLINE  -
which conveniently holds those two data sets. Under normal circumstances, I would just replace the HDD with the spare I have in my office. Unfortunately, RMA-ing a failed drive is quite challenging under these circumstances. In my experience the 3TB Seagate drives which were shipped with the server five years ago were nothing but trouble. I have already replaced 11 out of the 36 drives originally shipped. They will be out of warranty within the next 6 months. I have two options: 1. Replace the failed HDD and do lots of praying that another HDD doesn't die before I can RMA the faulty one. If another one dies we will be in the same position a month from now but I will not have a spare drive to react. 2. Pull the trigger and remount datasets from the backup which I made on the newest file server purchased by Dr. Schneider in December. It could be a few hours of inconvenience and perhaps tiny data loss. Right now, I am even scared to try to zfs replicate the delta from the last ZFS snapshot as that can kill another drive, which will degrade the zfs pool even further. So theoretically a tiny portion of the work in the project folder could remain on the decaying ZFS pool, which I will let rot.
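A minimal Python sketch of how the degraded pool could be picked out of the `zpool list` output quoted in the message above. The helper name `unhealthy_pools` is hypothetical and the sample text is simply that listing pasted in; this is an illustration under those assumptions, not the lab's actual monitoring setup.

```python
# Sample text copied from the `zpool list` output quoted in the message.
ZPOOL_LIST = """\
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
archive  21.8T  12.1T  9.62T        -         -    21%    55%  1.00x    ONLINE  -
backups  36.2T  23.4T  12.9T        -         -    17%    64%  1.00x  DEGRADED  -
data0    36.2T  17.3T  18.9T        -         -    30%    47%  1.00x    ONLINE  -
data1    36.2T  2.54T  33.7T        -         -     7%     6%  1.00x    ONLINE  -
zroot     107G  64.7G  42.3G        -         -    24%    60%  1.00x    ONLINE  -
"""


def unhealthy_pools(zpool_list_text):
    """Return {pool_name: health} for every pool whose HEALTH is not ONLINE.

    Hypothetical helper: it only parses whitespace-separated text in the
    column layout shown above and does not call the zpool utility itself.
    """
    lines = [ln for ln in zpool_list_text.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    health_col = header.index("HEALTH")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        if fields[health_col] != "ONLINE":
            result[fields[name_col]] = fields[health_col]
    return result


if __name__ == "__main__":
    print(unhealthy_pools(ZPOOL_LIST))  # prints {'backups': 'DEGRADED'}
```

Looking the columns up by header name rather than by fixed position keeps the sketch working if the column order differs between ZFS versions, which is an assumption worth checking against the output on the actual server.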
I will not do anything until I hear from Lab elders (Artur and Jeff). Thirty legacy home directories (zfsauton/home) which are regularly backed up are on the same file server. I will probably migrate those home directories to zfsauton2 as the incentive for keeping them in their current location (20-30 min inconvenience to users) is very low. My plan when we get into the normal operation mode is to get 36x12TB new HDDs and retire those four zfs pools (five years old) built with crappy Seagate drives. Best, Predrag ---------- Forwarded message --------- From: Date: Fri, Apr 17, 2020 at 12:00 PM Subject: Autonlab-sysinfo Digest, Vol 69, Issue 175 To: Send Autonlab-sysinfo mailing list submissions to autonlab-sysinfo at autonlab.org To subscribe or unsubscribe via the World Wide Web, visit https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo or, via email, send a message with subject or body 'help' to autonlab-sysinfo-request at autonlab.org You can reach the person managing the list at autonlab-sysinfo-owner at autonlab.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Autonlab-sysinfo digest..." Today's Topics: 1. neill-zfs.int.autonlab.org security run output (auton.sysnotify at gmail.com) 2. SMART error (ErrorCount) detected on host: uranus (auton.sysnotify at gmail.com) 3. SMART error (CurrentPendingSector) detected on host: uranus (auton.sysnotify at gmail.com) 4.
SMART error (OfflineUncorrectableSector) detected on host: uranus (auton.sysnotify at gmail.com) ---------------------------------------------------------------------- Message: 1 Date: Fri, 17 Apr 2020 07:01:02 -0000 From: auton.sysnotify at gmail.com To: sysinfo at autonlab.org Subject: neill-zfs.int.autonlab.org security run output Message-ID: <5e99542f.1c69fb81.22168.ff77 at mx.google.com> Content-Type: text/plain; charset="utf-8" neill-zfs.int.autonlab.org kernel log messages: > pid 47721 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 47881 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48047 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48252 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48416 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48576 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48711 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 48902 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49068 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49228 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49398 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49558 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49693 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 49884 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 50050 (winbindd), uid 0: exited on signal 6 (core dumped) > pid 50257 (winbindd), uid 0: exited on signal 6 (core dumped) -- End of security output -- ------------------------------ Message: 2 Date: Fri, 17 Apr 2020 08:47:45 -0400 From: auton.sysnotify at gmail.com To: root+ at cs.cmu.edu Subject: SMART error (ErrorCount) detected on host: uranus Message-ID: <5e99a571.1a6f0.4349afd1 at uranus.int.autonlab.org> This message was generated by the smartd daemon running on: host name: uranus DNS domain: int.autonlab.org The following warning/error was logged by the smartd daemon: Device: /dev/da6 [SAT], ATA 
error count increased from 0 to 14 Device info: ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB For details see host's SYSLOG. You can also use the smartctl utility for further investigation. No additional messages about this problem will be sent. ------------------------------ Message: 3 Date: Fri, 17 Apr 2020 09:17:11 -0400 From: auton.sysnotify at gmail.com To: root+ at cs.cmu.edu Subject: SMART error (CurrentPendingSector) detected on host: uranus Message-ID: <5e99ac57.1a6f7.79b86657 at uranus.int.autonlab.org> This message was generated by the smartd daemon running on: host name: uranus DNS domain: int.autonlab.org The following warning/error was logged by the smartd daemon: Device: /dev/da6 [SAT], 16 Currently unreadable (pending) sectors Device info: ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB For details see host's SYSLOG. You can also use the smartctl utility for further investigation. No additional messages about this problem will be sent. ------------------------------ Message: 4 Date: Fri, 17 Apr 2020 09:17:11 -0400 From: auton.sysnotify at gmail.com To: root+ at cs.cmu.edu Subject: SMART error (OfflineUncorrectableSector) detected on host: uranus Message-ID: <5e99ac57.1a6f9.1d0a864 at uranus.int.autonlab.org> This message was generated by the smartd daemon running on: host name: uranus DNS domain: int.autonlab.org The following warning/error was logged by the smartd daemon: Device: /dev/da6 [SAT], 16 Offline uncorrectable sectors Device info: ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB For details see host's SYSLOG. You can also use the smartctl utility for further investigation. No additional messages about this problem will be sent. 
------------------------------ Subject: Digest Footer _______________________________________________ Autonlab-sysinfo mailing list Autonlab-sysinfo at autonlab.org https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo ------------------------------ End of Autonlab-sysinfo Digest, Vol 69, Issue 175 ************************************************* From predragp at andrew.cmu.edu Sat Apr 18 12:58:50 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 18 Apr 2020 12:58:50 -0400 Subject: Autonlab-sysinfo Digest, Vol 69, Issue 175 In-Reply-To: References: Message-ID: I have a solution to this problem which will minimize the number of users affected. I am migrating the remaining few home directories from /zfsauton/home to /zfsauton2/home. Once the migration is finished I will destroy the empty ZFS pool and use its 6 HDDs as hot spare drives to repair the degraded pool and potentially to deal with future problems on the other 2 healthy ZFS pools containing large amounts of data. Predrag On Fri, Apr 17, 2020 at 6:24 PM Predrag Punosevac wrote: > > Dear Autonians, > > Please see below. One of the HDD on the file server hosting > /zfsauton/data and /zfsauton/project has crapped out on me. That, in > turn, degraded one of zfs pools > > root at uranus:~ # zpool list > NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP > HEALTH ALTROOT > archive 21.8T 12.1T 9.62T - - 21% 55% 1.00x ONLINE - > backups 36.2T 23.4T 12.9T - - 17% 64% 1.00x > DEGRADED - > data0 36.2T 17.3T 18.9T - - 30% 47% 1.00x ONLINE - > data1 36.2T 2.54T 33.7T - - 7% 6% 1.00x ONLINE - > zroot 107G 64.7G 42.3G - - 24% 60% 1.00x ONLINE - > > > which conveniently holds those two data sets. Under normal > circumstances, I would just replace HDD with the spare I have in my > office. Unfortunately, RMA-ing failed drive is quite challenging under > these circumstances. In my experience 3TB Seagate drives which were > shipped with the server five years ago were nothing but the trouble.
I > have already replaced 11 out of 36 drives originally shipped. They > will be out of warranty within the next 6 months. > > I have two options: > > 1. Replace the failed HDD and do lots of praying that another HDD > doesn't die before I can RMA the faulty one. If another one dies we > will be in the same position a month from now but I will not have a > spear drive to react. > > 2. Pull the trigger and remount datasets from the backup which I made > on the newest file server purchased by Dr. Schneider in December. It > could be a few hours of inconvenience and perhaps tiny data loss. > Right now, I am even scared to try to zfs replicate delta from the > last ZFS snapshot as that can kill another drive which will degrade > zfs pool even further. So theoretically a tiny portion of the work in > the project folder could remain on the decaying ZFS pool which I will > let rotten. > > I will not do anything until I hear from Lab elders (Artur and Jeff). > Thirty legacy home directories (zfsauton/home) which are regularly > backed up are on the same file server. I will probably migrate those > home directories to zfsauton2 as the insensitive on keeping them on > current location (20-30 min inconvenience to users) is very low. > > My plan when we get into the normal operation mode is to get 36x12TB > new HDDs and retire those four zfs pools (five years old) build with > crappy Seagate drives. 
> > Best, > Predrag > > > > > > > ---------- Forwarded message --------- > From: > Date: Fri, Apr 17, 2020 at 12:00 PM > Subject: Autonlab-sysinfo Digest, Vol 69, Issue 175 > To: > > > Send Autonlab-sysinfo mailing list submissions to > autonlab-sysinfo at autonlab.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo > or, via email, send a message with subject or body 'help' to > autonlab-sysinfo-request at autonlab.org > > You can reach the person managing the list at > autonlab-sysinfo-owner at autonlab.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Autonlab-sysinfo digest..." > > > Today's Topics: > > 1. neill-zfs.int.autonlab.org security run output > (auton.sysnotify at gmail.com) > 2. SMART error (ErrorCount) detected on host: uranus > (auton.sysnotify at gmail.com) > 3. SMART error (CurrentPendingSector) detected on host: uranus > (auton.sysnotify at gmail.com) > 4. 
SMART error (OfflineUncorrectableSector) detected on host: > uranus (auton.sysnotify at gmail.com) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 17 Apr 2020 07:01:02 -0000 > From: auton.sysnotify at gmail.com > To: sysinfo at autonlab.org > Subject: neill-zfs.int.autonlab.org security run output > Message-ID: <5e99542f.1c69fb81.22168.ff77 at mx.google.com> > Content-Type: text/plain; charset="utf-8" > > > neill-zfs.int.autonlab.org kernel log messages: > > pid 47721 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 47881 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48047 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48252 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48416 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48576 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48711 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 48902 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49068 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49228 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49398 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49558 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49693 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 49884 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 50050 (winbindd), uid 0: exited on signal 6 (core dumped) > > pid 50257 (winbindd), uid 0: exited on signal 6 (core dumped) > > -- End of security output -- > > > > ------------------------------ > > Message: 2 > Date: Fri, 17 Apr 2020 08:47:45 -0400 > From: auton.sysnotify at gmail.com > To: root+ at cs.cmu.edu > Subject: SMART error (ErrorCount) detected on host: uranus > Message-ID: <5e99a571.1a6f0.4349afd1 at uranus.int.autonlab.org> > > This message was generated by the smartd daemon running on: > > host name: uranus > DNS domain: 
int.autonlab.org > > The following warning/error was logged by the smartd daemon: > > Device: /dev/da6 [SAT], ATA error count increased from 0 to 14 > > Device info: > ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB > > For details see host's SYSLOG. > > You can also use the smartctl utility for further investigation. > No additional messages about this problem will be sent. > > > ------------------------------ > > Message: 3 > Date: Fri, 17 Apr 2020 09:17:11 -0400 > From: auton.sysnotify at gmail.com > To: root+ at cs.cmu.edu > Subject: SMART error (CurrentPendingSector) detected on host: uranus > Message-ID: <5e99ac57.1a6f7.79b86657 at uranus.int.autonlab.org> > > This message was generated by the smartd daemon running on: > > host name: uranus > DNS domain: int.autonlab.org > > The following warning/error was logged by the smartd daemon: > > Device: /dev/da6 [SAT], 16 Currently unreadable (pending) sectors > > Device info: > ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB > > For details see host's SYSLOG. > > You can also use the smartctl utility for further investigation. > No additional messages about this problem will be sent. > > > ------------------------------ > > Message: 4 > Date: Fri, 17 Apr 2020 09:17:11 -0400 > From: auton.sysnotify at gmail.com > To: root+ at cs.cmu.edu > Subject: SMART error (OfflineUncorrectableSector) detected on host: > uranus > Message-ID: <5e99ac57.1a6f9.1d0a864 at uranus.int.autonlab.org> > > This message was generated by the smartd daemon running on: > > host name: uranus > DNS domain: int.autonlab.org > > The following warning/error was logged by the smartd daemon: > > Device: /dev/da6 [SAT], 16 Offline uncorrectable sectors > > Device info: > ST4000NM0024-1HT178, S/N:Z4F05P33, WWN:5-000c50-07b5a28c3, FW:SN02, 4.00 TB > > For details see host's SYSLOG. > > You can also use the smartctl utility for further investigation. 
> No additional messages about this problem will be sent.
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Autonlab-sysinfo mailing list
> Autonlab-sysinfo at autonlab.org
> https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo
>
>
> ------------------------------
>
> End of Autonlab-sysinfo Digest, Vol 69, Issue 175
> *************************************************

From d.c.howarth at gmail.com Mon Apr 20 10:42:35 2020
From: d.c.howarth at gmail.com (Dan Howarth)
Date: Mon, 20 Apr 2020 10:42:35 -0400
Subject: lov3 scratch full
Message-ID:

Hello all, the lov3 scratch is full. Please check if there's anything you can delete.

Thank you

From boecking at andrew.cmu.edu Mon Apr 20 14:52:07 2020
From: boecking at andrew.cmu.edu (Benedikt Boecking)
Date: Mon, 20 Apr 2020 14:52:07 -0400
Subject: Current resource usage
Message-ID:

Hi everyone,

There is a rather small number of users currently taking up most of the computing resources from lov1 through lov6 (plus some of the smaller CPU servers). Because of this behavior, some servers such as lov6 are almost unusable.

If you are currently using a large number of threads on one or more of the computing nodes, please consider scaling your resource usage down a little so that your colleagues can also make use of the servers.

Best,
Ben

From sarveshj at andrew.cmu.edu Wed Apr 22 09:11:37 2020
From: sarveshj at andrew.cmu.edu (Sarveshwaran Jayaraman)
Date: Wed, 22 Apr 2020 13:11:37 +0000
Subject: Regarding GPU contention
Message-ID:

Hi All,

The following is a snapshot of sample GPU node usage, which is usually run by a single user. This pattern seems to have dramatically increased over the last couple of days.

|   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |
| 35%   57C    P2    92W / 250W |   7500MiB / 11019MiB |     38%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 38%   63C    P2   126W / 250W |   3738MiB / 11019MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
| 41%   66C    P2   137W / 250W |   3747MiB / 11019MiB |     55%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 41%   67C    P2   140W / 250W |   3705MiB / 11019MiB |     54%      Default |

I usually need to load a model backbone for my experiments, which requires around 10GB of RAM, but because of the above usage pattern I am unable to really use any GPU. One way around this is to increase the percentage GPU usage and reduce the number of GPUs used. Please be mindful, especially if you're running across multiple GPUs and your process has GPU usage in bursts.

Thanks for your help!

Sarvesh Jayaraman
Sr. Research Analyst, Auton Lab
Carnegie Mellon University
Mob: +1-240-893-4287

From predragp at andrew.cmu.edu Wed Apr 22 15:30:59 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 22 Apr 2020 15:30:59 -0400
Subject: GPU12
Message-ID: <20200422193059.bcsNxbdNh%predragp@andrew.cmu.edu>

Dear Autonians,

I hope everyone is healthy and safe. Just a quick heads up. Somebody crashed GPU12. It had to be cold rebooted. Thanks to IPMI I could do that from my home.
I am rebuilding the NVidia driver and CUDA, as the server had an uptime of almost a year and things have changed some. Please give it another 30 min before you try to use it.

Cheers,
Predrag

From ngisolfi at cs.cmu.edu Thu Apr 23 11:30:17 2020
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 23 Apr 2020 11:30:17 -0400
Subject: [Lunch] Today @noon over Zoom
Message-ID:

Hi Everyone,

Auton Lab's bring-your-own-lunch will happen today @noon EDT over Zoom. The link for convenience:

https://cmu.zoom.us/j/492870487

We hope to see you there!

- Nick

From awd at cs.cmu.edu Thu Apr 23 12:51:00 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Thu, 23 Apr 2020 12:51:00 -0400
Subject: Missing 17 year old recovered in under a week using Traffic Jam
In-Reply-To:
References: <431757831d2c1c5ec10ecfdf9.92d30128d1.20200423135936.1ac3522150.c5311d4b@mail12.sea31.mcsv.net>
Message-ID:

Dear Autonians,

Remember the Super Bowl in Houston in 2017? That's when we first put our "first appearance" stratifiers in counter-sex-trafficking software to systematic use. Marinus then made it a permanent part of their product, Traffic Jam. It has just been announced that this capability was essential in rescuing a sex-trafficked minor in the UK during the past week.

This is just a snippet of a large collection of instances in which software developed at CMU Auton Lab helps reclaim lives of the most vulnerable persons.

Congratulations, especially to everyone on the Team who contributed to developing Traffic Jam!

Cheers,
Artur

Missing 17 year old recovered in under a week using Traffic Jam

The First Appearance feature in Traffic Jam led to the identification of a minor last week. First Appearance displays any ads in your area where the phone number(s) and photo(s) have never been observed before by the system.
During the COVID-19 crisis, First Appearance can be utilized as a proactive safeguarding method to detect vulnerable and exploited persons entering sex work when social distancing and lockdowns are in effect.

Where Can You Find First Appearance?

You can find First Appearance under the *Leads* tab on the top menu bar in Traffic Jam. Be mindful of the possibility of spam/scam ads appearing in this report. This type of content involves the same phone number appearing in many cities on a single day.

*Copyright © 2020 Marinus Analytics, All rights reserved.*
Creating AI for Social Impact

*Our mailing address is:*
support at marinusanalytics.com

This email was sent to cara at marinusanalytics.com
Marinus Analytics · PO BOX 6127 · Pittsburgh, PA 15212-9998 · USA

From choset at andrew.cmu.edu Thu Apr 23 14:50:29 2020
From: choset at andrew.cmu.edu (Howie Choset)
Date: Thu, 23 Apr 2020 14:50:29 -0400
Subject: Missing 17 year old recovered in under a week using Traffic Jam
In-Reply-To:
References: <431757831d2c1c5ec10ecfdf9.92d30128d1.20200423135936.1ac3522150.c5311d4b@mail12.sea31.mcsv.net>
Message-ID: <08f801d619a0$0fec8970$2fc59c50$@andrew.cmu.edu>

Thank you for sharing. I hope this girl and her family can move on from this horrific episode of her life, which would only have been infinitely more horrific if it had not been for you and your team.

Howie

From: Artur Dubrawski
Sent: Thursday, April 23, 2020 12:51 PM
To: users at autonlab.org
Cc: Martial Hebert; Michael McQuade; Srinivasa Narasimhan; Farnam Jahanian; Byron Spice; George Darakos; Howie Choset
Subject: Missing 17 year old recovered in under a week using Traffic Jam

Dear Autonians,

Remember the Super Bowl in Houston in 2017?
That's when we first put our "first appearance" stratifiers in counter-sex-trafficking software to systematic use. Marinus then made it a permanent part of their product, Traffic Jam. It has just been announced that this capability was essential in rescuing a sex-trafficked minor in the UK during the past week.

This is just a snippet of a large collection of instances in which software developed at CMU Auton Lab helps reclaim lives of the most vulnerable persons.

Congratulations, especially to everyone on the Team who contributed to developing Traffic Jam!

Cheers,
Artur

Missing 17 year old recovered in under a week using Traffic Jam

The First Appearance feature in Traffic Jam led to the identification of a minor last week. First Appearance displays any ads in your area where the phone number(s) and photo(s) have never been observed before by the system. During the COVID-19 crisis, First Appearance can be utilized as a proactive safeguarding method to detect vulnerable and exploited persons entering sex work when social distancing and lockdowns are in effect.

Where Can You Find First Appearance?

You can find First Appearance under the Leads tab on the top menu bar in Traffic Jam. Be mindful of the possibility of spam/scam ads appearing in this report. This type of content involves the same phone number appearing in many cities on a single day.

Copyright © 2020 Marinus Analytics, All rights reserved.
Creating AI for Social Impact

Our mailing address is:
support at marinusanalytics.com

This email was sent to cara at marinusanalytics.com
Marinus Analytics · PO BOX 6127 · Pittsburgh, PA 15212-9998 · USA

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From awd at cs.cmu.edu Fri Apr 24 13:45:19 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 24 Apr 2020 13:45:19 -0400
Subject: Fwd: best paper honorable mention in ICCP2020
In-Reply-To: <0e846659-aa13-f735-e4d3-00c9d84223f1@andrew.cmu.edu>
References: <0e846659-aa13-f735-e4d3-00c9d84223f1@andrew.cmu.edu>
Message-ID:

Team,

Our own Chao Liu has just received an honorable mention in the best paper contest for his newest publication at the ICCP 2020 conference.

Congrats Chao!

Artur

From predragp at andrew.cmu.edu Fri Apr 24 18:48:48 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 24 Apr 2020 18:48:48 -0400
Subject: Connection Refused For bash.autonlab.org
In-Reply-To:
References:
Message-ID: <20200424224848.AFvsH-8zh%predragp@andrew.cmu.edu>

Naji Shajarisales wrote:

> Hi Predrag,
>
> Hope you are doing well and everything is fine with you during this
> bizarre
> days.
>
> I just wanted to check with you if anything has happened to
> bash.autonlab.org cause I do get connection refused.

Thanks for bringing this to my attention. Sure enough, I can't ping bash. That is my tiny office NUC, which can't be rebooted remotely. I have no idea what happened to it and we will not be able to find out until the pandemic is over. So for now your option is to use

lop2.autonlab.org

but within 30 minutes you will have another two options. We have

lop1.autonlab.org

for the Auton Lab legacy users (those with home directories on zfsauton) as the OS is not supporting autofs needed to mount your home directories.
I will create tiny fake home directories for everyone on lop1.autonlab.org in the next 20-30 min so that everyone can use it as a proxy machine to access the lab. Also, the other office desktop computers seem to be OK. I will enable

lion.auton.cs.cmu.edu
ECDSA key fingerprint: SHA256:BL7KygrfP6PApBpf6BFHlphnc9f0KpsdhSvsguAhP4I

which is not currently used by anybody, as the third gateway to the Lab.

Cheers,
Predrag

>
> Best,
> Naji

From predragp at andrew.cmu.edu Sun Apr 26 15:23:24 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 26 Apr 2020 15:23:24 -0400
Subject: Auton Lab Intranet major update
Message-ID: <20200426192324.VuINKPaOk%predragp@andrew.cmu.edu>

Dear Autonians,

As remote work is becoming the new norm during the pandemic, I decided to do a major update of our Intranet pages. We increasingly rely on documentation to perform our daily assignments, so I could no longer procrastinate. Please use the link below

https://www.autonlab.org/intranet

username: auton
password: Dr.Who

to log in and see the result. As you might imagine, editing html files by hand didn't scale up too well, so I decided to deploy the Sphinx Python Documentation Generator

https://www.sphinx-doc.org/en/master/

That could be useful for documenting other lab projects and code. Sphinx uses reStructuredText as its markup language. Both the language and the documentation preparation software are super easy to learn. Cheat sheets for both Sphinx and reStructuredText are numerous and easy to follow. For example

https://matplotlib.org/sampledoc/cheatsheet.html

Notice the show source tab, which we also have on our website. If you would like to contribute to the Auton Lab Intranet/wiki pages, please create the document using .rst format and I would be happy to put it onto the server.

At this point I really need to make a public apology to Sarveshwaran Jayaraman and Andrew Williams.
Those two have tried on numerous occasions over the past two months to make contributions both to our Intranet and to our main website. Unfortunately for most part I was busy with other things and ignored the stuff they were sending to me. I promised that as soon as I get some sleep I would download Sarveshwaran's documents from the CMU Box as well as apply Andrew's patches/updates to the main web site. Cheers, Predrag From predragp at andrew.cmu.edu Mon Apr 27 14:44:17 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 27 Apr 2020 14:44:17 -0400 Subject: GPU1 down In-Reply-To: References: Message-ID: <20200427184417.YcVG4j2OI%predragp@andrew.cmu.edu> Youngseog Chung wrote: > Hi Predrag, > > I hope you had a good weekend. > > I'm sorry to bother you on a Monday morning, but I wanted to bring to your > attention the possibility of GPU1 being down. > I have been trying to access GPU1 all day on Sunday, but couldn't get > through, and the problem seems to be specific to GPU1. > Fixed! Somebody run the server into the ground midday Sunday. Chirag reported Sunday evening. Predrag > If it's not too much trouble, would you mind taking a look at the situation? > > Thank you very much. > > > Best, > Youngseog From predragp at andrew.cmu.edu Mon Apr 27 18:31:38 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 27 Apr 2020 18:31:38 -0400 Subject: bash is up and running again In-Reply-To: References: <1d4e97de3a8b4ae5a7c97cd61fc8aa3c@andrew.cmu.edu> <9791e1aa5eda419194b9e058be458bc7@cmu.edu> <3d57e44851c947e6b2d4dd67dcf6ec41@andrew.cmu.edu> Message-ID: <20200427223138.k-BXXowvI%predragp@andrew.cmu.edu> Donghan Wang wrote: > Thank you, Artur, for the info. > > I've rebooted the following desktops > > 1. lake.auton.cs.cmu.edu NSH 3119 Predrag Punosevac > 2. lula.auton.cs.cmu.edu NSH 3119 Andrew Williams > 3. lean.auton.cs.cmu.edu NSH 3115 Donghan Wang > 4. lena.auton.cs.cmu.edu NSH 3122 Daniel Howarth > 5. 
lyra.auton.cs.cmu.edu NSH 3122 Sarveshwaren Jayarman > 6. land.auton.cs.cmu.edu NSH 3123 Jieshi Chen > 7. lupe.auton.cs.cmu.edu NSH 3125 Gus Welter > > Karen's desktop is up and running. > > Thanks, > Jarod Thanks Chief! Predrag From predragp at andrew.cmu.edu Mon Apr 27 22:04:16 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 27 Apr 2020 22:04:16 -0400 Subject: bash crashes again Message-ID: <20200428020416.kjbpdA4kr%predragp@andrew.cmu.edu> Dear Autonians, Just a quick heads up. bash.autonlab.org (my desktop) just crashed again. I have no idea what happened nor I care too much about it. There are other three shell gateways. lop2.autonlab.org lop1.autonlab.org lion.auton.cs.cmu.edu Best, Predrag From awd at cs.cmu.edu Tue Apr 28 11:28:50 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 28 Apr 2020 11:28:50 -0400 Subject: Two Auton Lab thesis defenses next week! Mark your calendars please... In-Reply-To: References: Message-ID: Dear Autonians, Please join me in attending 2 (yes, two) excellent virtual presentations by our own Maria De-Arteaga and Chao Liu, both of which are scheduled for the next week. (btw, I do not remember when was the last time we had more than one doctoral thesis defense scheduled in one week at the Auton Lab...) Maria's defense will be on Monday May 4th at 11am, The official announcement will be shared soon. Chao's defense is scheduled for Thursday May 7th at noon. The official announcement with the zoom link is included below. Please help seeing these outstanding colleagues move to the next levels of their professional lives by attending these presentations and cheering for them :) Cheers, Artur ----- Date: 07 May 2020 Time: 12:00 p.m. Place: *Virtual Presentation* https://cmu.zoom.us/j/2623852919 Type: Ph.D. 
Thesis Defense Who: Chao Liu Title: Vision with Small Baselines Abstract: 3D sensing with portable imaging systems is becoming more and more popular in computer vision applications such as autonomous driving, virtual reality, robotics manipulation and surveillance, due to the decreasing expense and size of RGB cameras. Despite the compactness and portability of the small baseline vision systems, it is well-known that the uncertainty in range finding using multiple views and the sensor baselines are inversely related. On the other hand, besides compactness, the small baseline vision system has its unique advantages such as easier correspondence and large overlapping regions across views. The goal of this thesis is to develop computational methods and small baseline imaging systems for 3D sensing of complex scenes in real world conditions. Our design principle is to physically model the scene complexities and specifically infer the uncertainties for the images captured with small baseline setups. With this design principle, we make four contributions. In the first contribution, we propose a two-stage near-light photometric stereo method using a small (6 cm diameter) LED ring. The imaging system is compact compared to traditional photometric stereo systems. In the second contribution, we develop an algorithm to simultaneously estimate the occlusion pattern and depth for thin structures from a focal image stack, which is obtained either by varying the focus/aperture of the lens or computed from a one-shot light field image. As the third contribution, we propose a learning-based method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with small camera baselines across adjacent frames. These depth probability volumes are accumulated over time as more incoming frames are processed sequentially, which effectively reduces depth uncertainty and improves accuracy, robustness, and temporal stability. 
Finally, using a pair of high resolution camera and laser projector, we develop a high spatial resolution Diffuse Optical Tomography (DOT) system that can detect accurate boundaries and relative depth of heterogeneous structures up to a depth of 8mm below a highly scattering medium such as whole milk. We showcase the application of a small baseline vision system for in-vivo micro-scale 3D reconstruction of capillary veins and develop a system for real-time analysis of microvascular blood flow for critical care. We believe that the computational methods developed in this thesis would find more applications of compact 3D sensing under challenging conditions. Thesis Committee Members: Srinivasa G. Narasimhan, Co-chair Artur W. Dubrawski, Co-chair Aswin C. Sankaranarayanan Manmohan Chandraker, University of California, San Diego A copy of the thesis document is available at: https://www.dropbox.com/s/cz75koh96ragy4x/thesis-small-baseline.pdf?dl=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Wed Apr 29 08:27:44 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 29 Apr 2020 08:27:44 -0400 Subject: bash crashes again In-Reply-To: References: <20200428020416.kjbpdA4kr%predragp@andrew.cmu.edu> Message-ID: <20200429122744.FKtS7nUVM%predragp@andrew.cmu.edu> Ifigeneia Apostolopoulou wrote: > Hi Predrag, > Hi Ifi > I just wanted to bring it to your attention: > > no gateway is currently working for me. I may occasionally be able to > login I just checked lop2.autonlab.org lop1.autonlab.org lion.auton.cs.cmu.edu and I have no problem login. I have logged first with my regular account to eliminate possibility that LDAP services are down. Then I have used my root account to log as a you to check autofs daemon. Please see below. 
root at lop2$ su - iapostol Last login: Wed Apr 29 07:47:06 EDT 2020 from c-73-154-131-241.hsd1.pa.comcast.net on pts/25 root at lop2$ pwd /zfsauton3/home/iapostol lop1# su - iapostol -bash-5.0$ pwd /zfsauton3/home/iapostol -bash-5.0$ uname -a OpenBSD lop1.int.autonlab.org 6.6 GENERIC.MP#8 amd64 [root at lion ~]# su - iapostol Last failed login: Wed Apr 29 00:34:26 EDT 2020 from c-73-154-131-241.hsd1.pa.comcast.net on ssh:notty There were 4 failed login attempts since the last successful login. -bash-4.2$ pwd /zfsauton3/home/iapostol -bash-4.2$ uname -a Linux lion.auton.cs.cmu.edu 3.10.0-1127.el7.x86_64 #1 SMP Wed Apr 8 08:26:53 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux Now lion is showing an interesting output. It shows that you had tried 4 times to log with incorrect credentials. That would definitely put you on the banned list at least for a while. However, if I have to put my money on your problems I would guess that there is a DNS problem. I made sure that the Auton Lab DNS servers are working as advertised so I will point to your personal DNS. There is some remote chance that you are experiencing that weird routing problem, reported by ram and me, when NSA breaks CMU routing tables and blocks bunch of residential ISP from reaching CMU. > but still can't find anything in my home directory :// > > > iapostol at lop2.autonlab.org:/zfsauton3/home/iapostol/ > > iapostol at lop2.autonlab.org's password: > > Could not chdir to home directory /zfsauton3/home/iapostol: Input/output > error This was actually more interesting part of your report. I immediately assumed that my auto.nfs file got corrupted or that autofs daemon is not working properly. I had a problem with autofs on lion.auton.cs.cmu.edu so I am not running it out of systemd. It is manually started. However, as of this morning autofs works both on lop2.autonlab.org and lion.auton.cs.cmu.edu which you can see from the above output. 
lop1.autonlab.org doesn't run the autofs daemon as it runs OpenBSD, which doesn't have a modern autofs daemon. In order for you to log into lop1.autonlab.org I created tiny local home directories, which are needed for you to ssh to computing nodes. I would not expect you to see anything inside your home directory on lop1.autonlab.org.

> > Just a quick heads up. bash.autonlab.org (my desktop) just crashed
> > again. I have no idea what happened nor I care too much about it. There
> > are other three shell gateways.

I do know what the problem with bash is. Bash is a NUC machine. After the upgrade to Red Hat 7.7, a network driver regression (reported by multiple people including me) was introduced, which caused the network interface to crap out. I typically manually select an older stable kernel when I reboot bash, but this time around I realized that somebody else rebooted the machine and the grub boot-loader just picked the new broken kernel. That thing is now going to rot for a while. FYI Red Hat dismissed my bug report since we are not paid customers :-) For Red Hat/IBM the problem doesn't exist.

Best,
Predrag

P.S. I almost forgot. If you had things running out of tmux or screen, make sure you log out first before you try to reconnect to the Auton Lab. I have seen all sorts of weird things happening because of that.

From sarveshj at andrew.cmu.edu Wed Apr 29 12:33:25 2020
From: sarveshj at andrew.cmu.edu (Sarveshwaran Jayaraman)
Date: Wed, 29 Apr 2020 16:33:25 +0000
Subject: GPU memory Usage
Message-ID: <5f3e7b4a92a74db09c2be0406d8dbb0e@andrew.cmu.edu>

Hi All,

I was trying to run my experiments on GPU 14 and came across this situation. On GPU ID 0 & 1 (highlighted in green & blue respectively) the user has not released the GPU memory after the experiment. A possible scenario could be that the user has not shut down the jupyter notebook after use (closing does not suffice). Therefore 2 out of possible 4 GPUs are not available on that node.
Please be mindful to free GPU memory after use for other users if that's the case. One simple solution I found around this is to convert your notebooks to Python scripts and run them using the nohup command.

Thanks for your understanding!

(base) sarveshj at gpu14$ nvidia-smi -l 3
Wed Apr 29 11:18:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |
| 32%   48C    P2    59W / 250W |  10980MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   66C    P2    98W / 250W |  10984MiB / 11019MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
| 33%   54C    P2    67W / 250W |   1935MiB / 11019MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 32%   43C    P2    62W / 250W |   8849MiB / 11019MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    203849      C   python3                                     1677MiB |
|    1    203849      C   python3                                      155MiB |
|    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |
|    2    203849      C   python3                                      155MiB |
|    2    232877      C   python                                      1613MiB |
|    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |
|    3    147113      C   python                                      1613MiB |
|    3    203849      C   python3                                      155MiB |
+-----------------------------------------------------------------------------+

Sarvesh Jayaraman
Sr. Research Analyst, Auton Lab
Carnegie Mellon University
Mob: +1-240-893-4287

From awd at cs.cmu.edu Wed Apr 29 12:39:20 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Wed, 29 Apr 2020 12:39:20 -0400
Subject: Two Auton Lab thesis defenses next week! Mark your calendars please...
In-Reply-To:
References:
Message-ID:

And the details of Maria's defense on Monday:

Please join us on Monday, May 4 via Zoom at 11am when Maria De-Arteaga (ML & Public Policy Joint PhD) will be defending her thesis.
*Title:* Machine Learning in High-Stakes Settings: Risks and Opportunities *Thesis committee:* Artur Dubrawski (Co-Chair), Alexandra Chouldechova (Co-Chair), Roni Rosenfeld, Adam Tauman Kalai (Microsoft Research) *Zoom Link:* https://cmu.zoom.us/j/94967473449?pwd=b09lL29qblg1ZU5BWHZhVDB2NjFjQT09 *Meeting ID:* 949 6747 3449 *Password:* 000312 *Abstract: * Machine learning (ML) is increasingly being used to support decision-making in critical settings, where predictions have potentially grave implications over human lives. Examples include healthcare, hiring, child welfare, and the criminal justice system. In this thesis, I study the risks and opportunities of machine learning in high-stakes settings. In the first chapter I focus on opportunities of ML to support experts' decisions when dealing with high-resolution multivariate data, a type of data that is particularly hard for humans to interpret. I propose methodology to discover latent complex multivariate correlation structures and illustrate its use in two different domains: (1) identification of radioactive threats in nuclear physics, and (2) prediction of neurological recovery of comatose patients in healthcare. In the second chapter, focused on algorithmic fairness, I demonstrate how societal biases encoded in historical data may be reproduced and amplified by ML models, and introduce a new algorithm to mitigate biases without assuming access to protected attributes. Finally, in the third chapter I characterize challenges that arise from the limitations of available labels in decision support contexts--such as the selective labels problem and omitted payoff bias--and propose methodology to estimate and leverage human consistency to improve algorithmic recommendations and human-machine complementarity. 
*Paper Link:* https://www.dropbox.com/s/h449z85r6nls8oc/Dissertation_DeArteaga.pdf?dl=0 On Tue, Apr 28, 2020 at 11:28 AM Artur Dubrawski wrote: > Dear Autonians, > > Please join me in attending 2 (yes, two) excellent virtual > presentations by our own Maria De-Arteaga and Chao Liu, both of which are > scheduled for the next week. > > (btw, I do not remember when was the last time we had more than one > doctoral thesis defense scheduled in one week at the Auton Lab...) > > Maria's defense will be on Monday May 4th at 11am, > The official announcement will be shared soon. > > Chao's defense is scheduled for Thursday May 7th at noon. > The official announcement with the zoom link is included below. > > Please help seeing these outstanding colleagues move to the next levels of > their professional lives by attending these presentations and cheering for > them :) > > Cheers, > Artur > > ----- > > Date: 07 May 2020 > > Time: 12:00 p.m. > > Place: *Virtual Presentation* https://cmu.zoom.us/j/2623852919 > > Type: Ph.D. Thesis Defense > > Who: Chao Liu > > Title: Vision with Small Baselines > > > Abstract: > 3D sensing with portable imaging systems is becoming more and more popular > in computer vision applications such as autonomous driving, virtual > reality, robotics manipulation and surveillance, due to the decreasing > expense and size of RGB cameras. Despite the compactness and portability of > the small baseline vision systems, it is well-known that the uncertainty in > range finding using multiple views and the sensor baselines are inversely > related. On the other hand, besides compactness, the small baseline vision > system has its unique advantages such as easier correspondence and large > overlapping regions across views. > > The goal of this thesis is to develop computational methods and small > baseline imaging systems for 3D sensing of complex scenes in real world > conditions. 
Our design principle is to physically model the scene > complexities and specifically infer the uncertainties for the images > captured with small baseline setups. > > With this design principle, we make four contributions. In the first > contribution, we propose a two-stage near-light photometric stereo method > using a small (6 cm diameter) LED ring. The imaging system is compact > compared to traditional photometric stereo systems. In the second > contribution, we develop an algorithm to simultaneously estimate the > occlusion pattern and depth for thin structures from a focal image stack, > which is obtained either by varying the focus/aperture of the lens or > computed from a one-shot light field image. As the third contribution, we > propose a learning-based method to estimate per-pixel depth and its > uncertainty continuously from a monocular video stream, with small camera > baselines across adjacent frames. These depth probability volumes are > accumulated over time as more incoming frames are processed sequentially, > which effectively reduces depth uncertainty and improves accuracy, > robustness, and temporal stability. Finally, using a pair of high > resolution camera and laser projector, we develop a high spatial resolution > Diffuse Optical Tomography (DOT) system that can detect accurate boundaries > and relative depth of heterogeneous structures up to a depth of 8mm below a > highly scattering medium such as whole milk. > > We showcase the application of a small baseline vision system for in-vivo > micro-scale 3D reconstruction of capillary veins and develop a system for > real-time analysis of microvascular blood flow for critical care. We > believe that the computational methods developed in this thesis would find > more applications of compact 3D sensing under challenging conditions. > > > > Thesis Committee Members: > > Srinivasa G. Narasimhan, Co-chair > Artur W. Dubrawski, Co-chair > Aswin C. 
Sankaranarayanan
> Manmohan Chandraker, University of California, San Diego
>
>
> A copy of the thesis document is available at:
>
> https://www.dropbox.com/s/cz75koh96ragy4x/thesis-small-baseline.pdf?dl=0
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Wed Apr 29 15:42:30 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 29 Apr 2020 15:42:30 -0400
Subject: GPU memory Usage
In-Reply-To: <5f3e7b4a92a74db09c2be0406d8dbb0e@andrew.cmu.edu>
References: <5f3e7b4a92a74db09c2be0406d8dbb0e@andrew.cmu.edu>
Message-ID: <20200429194230.MS-WFhP9w%predragp@andrew.cmu.edu>

Sarveshwaran Jayaraman wrote:

> Hi All,
>
> I was trying to run my experiments on GPU 14 and came across this
> situation. On GPU ID 0 & 1 (highlighted in green & blue respectively)
> a user has not released the GPU memory after an experiment.

This is one of the quintessential don'ts, and it is now well documented:

https://www.autonlab.org/autonlab_wiki/

Offending members will have their accounts suspended until they take a quiz and score above 80%. If you take the quiz and flunk it, a mandatory seven-day waiting period will be enforced :-))))))

Cheers,
Predrag

> A possible scenario could be that the user has not shut down the
> jupyter notebook after use (closing does not suffice). Therefore 2
> out of a possible 4 GPUs are not available on that node.
>
>
> Please be mindful to free GPU memory after use for other users if that's the case. One simple solution I found around this is to convert your notebooks to Python scripts and run them using the nohup command.
>
> Thanks for your understanding!
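One way to act on the advice above is to check which processes still hold GPU memory before and after a run. The following is a minimal sketch, not the Lab's tooling: it assumes `nvidia-smi` is on the PATH and accepts the `--query-compute-apps` CSV query (worth verifying against the installed driver), and the helper names `parse_compute_apps`, `memory_by_pid`, and `current_usage` are hypothetical, introduced only for illustration.

```python
import subprocess

# Assumed query: one CSV row per compute process, memory in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse 'pid, name, used_memory' rows; rows that do not fit are skipped."""
    rows = []
    for line in text.splitlines():
        try:
            # Split from both ends so a comma inside the process name is tolerated.
            pid, rest = line.split(",", 1)
            name, mem = rest.rsplit(",", 1)
            rows.append((int(pid), name.strip(), int(mem)))
        except ValueError:
            # Blank lines or non-numeric fields (for example "[N/A]") land here.
            continue
    return rows


def memory_by_pid(rows):
    """Sum used memory (MiB) per PID across all GPUs."""
    totals = {}
    for pid, _name, mem in rows:
        totals[pid] = totals.get(pid, 0) + mem
    return totals


def current_usage():
    """Run the assumed nvidia-smi query and return {pid: MiB}."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return memory_by_pid(parse_compute_apps(out))
```

Calling `current_usage()` on a GPU node and looking for your own PIDs after a job should have finished is one quick way to spot a notebook kernel that is still alive. A script started with `nohup` releases its memory when the process exits, which is why the conversion suggested above helps.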
>
>
> (base) sarveshj at gpu14$ nvidia-smi -l 3
> Wed Apr 29 11:18:10 2020
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |
> | 32%   48C    P2    59W / 250W |  10980MiB / 11019MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
> | 41%   66C    P2    98W / 250W |  10984MiB / 11019MiB |     18%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
> | 33%   54C    P2    67W / 250W |   1935MiB / 11019MiB |      9%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
> | 32%   43C    P2    62W / 250W |   8849MiB / 11019MiB |      4%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID   Type   Process name                             Usage      |
> |=============================================================================|
> |    0    203849      C   python3                                     1677MiB |
> |    1    203849      C   python3                                      155MiB |
> |    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |
> |    2    203849      C   python3                                      155MiB |
> |    2    232877      C   python                                      1613MiB |
> |    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |
> |    3    147113      C   python                                      1613MiB |
> |    3    203849      C   python3                                      155MiB |
> +-----------------------------------------------------------------------------+
>
>
> [1562005799537]
>
> Sarvesh Jayaraman
> Sr. Research Analyst, Auton Lab
> Carnegie Mellon University
> Mob: +1-240-893-4287
>
>

From gwelter at andrew.cmu.edu Wed Apr 29 19:35:48 2020
From: gwelter at andrew.cmu.edu (Gus Welter)
Date: Wed, 29 Apr 2020 19:35:48 -0400
Subject: GPU memory Usage
In-Reply-To: <20200429194230.MS-WFhP9w%predragp@andrew.cmu.edu>
References: <5f3e7b4a92a74db09c2be0406d8dbb0e@andrew.cmu.edu> <20200429194230.MS-WFhP9w%predragp@andrew.cmu.edu>
Message-ID:

[image: image.png]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 101850 bytes
Desc: not available
URL:

From awd at cs.cmu.edu Wed Apr 29 19:44:40 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Wed, 29 Apr 2020 19:44:40 -0400
Subject: Our own hero fighting against COVID at its US epicenter
Message-ID:

Dear Autonians,

Many of you know Luke Sciulli. Some of you have not met him yet. Luke joined the Lab in October last year as a senior analyst. He is a retired Green Beret - a US Army special operations combat medic. He was deployed and served in harm's way multiple times.
He was severely injured in action in Afghanistan 2 years ago, amazingly survived the ordeal, recovered from injury-induced paralysis and several other traumas, and joined our team after returning to his home in Pittsburgh while continuing his path towards full recovery. He is helping a lot with multiple projects involving the application of AI to healthcare. Most obviously, he is the key resource in the RoboTRAC/TRACIR projects, which aim to use robotics and AI to automate trauma care in the field, basically aiming to create a robotic version of Luke. He is sharing a lot of first-hand insights in the process, and that's invaluable.

Many of you know all that already, but I want to share another piece of news about Luke. He recently learned that his fellow special operations medics and doctors had spontaneously decided to organize and open an ad-hoc field hospital in New York City. Luke could not imagine not being a part of it, so he joined the endeavor as a volunteer some 12 days ago. So now he is there, attending to COVID patients who are in need of intensive care, working long shifts on the ICU floor, while amazingly using his remaining time to stay current with his primary CMU job duties. He is not only our ears and eyes in the field of combat against COVID, constantly looking for ideas on how we might creatively use AI and robotics to help the fight, but he is now also serving as a supervisor of that hospital.

I just wanted to share with you all how proud I am to have Luke as a part of our team. I keep asking him to stay safe and healthy, and I sincerely hope he listens to these requests, and that he will be back home soon, safe and sound.

Cheers,
Artur

PS. Check out this video. It is relevant: https://vimeo.com/410732813

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Wed Apr 29 21:10:45 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 29 Apr 2020 21:10:45 -0400
Subject: GPU memory Usage
In-Reply-To:
References: <5f3e7b4a92a74db09c2be0406d8dbb0e@andrew.cmu.edu> <20200429194230.MS-WFhP9w%predragp@andrew.cmu.edu>
Message-ID: <20200430011045.oE0HwNo4q%predragp@andrew.cmu.edu>

Gus Welter wrote:

> [image: image.png]

Yum. Check out the Dos and Don'ts on our wiki. I am issuing serious threats. Also check out the FAQ I added today:

https://www.autonlab.org/autonlab_wiki/faq.html

I wrote two long paragraphs just for you :-) "Why not Ubuntu?" and "May I use Docker?"

Speaking of which, I got a lot of crap today from Artur regarding those r-stan scripts. He wants those working ASAP. I can't deal with that Docker nonsense. I am busy before the weekend, but I will have to get to the bottom of it and see why your R script is crash dumping.

On a related note, I have marching orders to get you an Auton systems server account. Congratulations on the new job! I will do it over the weekend. The good news is that you will have more work while people are being laid off left and right. The bad news is that you will be working pro bono like all of us. On second thought, that seems like good news too, as it will help with your diet plan :-)

Cheers,
Predrag

From choset at andrew.cmu.edu Wed Apr 29 21:25:54 2020
From: choset at andrew.cmu.edu (Howie Choset)
Date: Wed, 29 Apr 2020 21:25:54 -0400
Subject: Our own hero fighting against COVID at its US epicenter
In-Reply-To:
References:
Message-ID: <1f1e01d61e8e$4bfd5050$e3f7f0f0$@andrew.cmu.edu>

Hurray for Luke. He is amazing.

Howie

From: Artur Dubrawski
Sent: Wednesday, April 29, 2020 7:45 PM
To: users at autonlab.org
Cc: Farnam Jahanian; James H. Garrett, Jr.; Michael McQuade; Martial Hebert; Srinivasa Narasimhan; Byron Spice; George Darakos; Cheryl Wehrer; Howie Choset; Christopher Atkeson; John Galeotti; Luke Sciulli; Matty, Douglas M COL USARMY HQDA SECARMY (USA); Kliethermes, Kenneth J COL USARMY FUTURES COMMAND (USA)
Subject: Our own hero fighting against COVID at its US epicenter

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From farnam at andrew.cmu.edu Wed Apr 29 21:47:27 2020
From: farnam at andrew.cmu.edu (Farnam Jahanian)
Date: Thu, 30 Apr 2020 01:47:27 +0000
Subject: Our own hero fighting against COVID at its US epicenter
In-Reply-To: <1f1e01d61e8e$4bfd5050$e3f7f0f0$@andrew.cmu.edu>
References: <1f1e01d61e8e$4bfd5050$e3f7f0f0$@andrew.cmu.edu>
Message-ID:

Artur,

Thank you for sharing. It is so inspiring to learn about Luke's life experiences and how he is leading with compassion, empathy and commitment to others. He is a great role model for all of us, and for the 14000 students who will return to our campus, hopefully in a few months. This is a story that needs to be shared.

Wishing Luke good health and a safe return home,

Farnam

---
Farnam Jahanian
President
Henry L. Hillman President's Chair
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ngisolfi at cs.cmu.edu Thu Apr 30 11:05:10 2020
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 30 Apr 2020 11:05:10 -0400
Subject: [Lunch] Today @noon over Zoom
Message-ID: <99CDFEEF-4B83-4C07-8060-32AD0755B8B7@cs.cmu.edu>

The link for convenience: https://cmu.zoom.us/j/492870487

We hope to see you there!

- Nick

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Thu Apr 30 21:55:28 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 30 Apr 2020 21:55:28 -0400
Subject: Lov5 Not Responding
In-Reply-To:
References:
Message-ID: <20200501015528.EMcbulm1N%predragp@andrew.cmu.edu>

Naji Shajarisales wrote:

> Hi Predrag,
>
> Just wanted to say I cannot reach lov5 now.

Just fixed with a cold reboot. A runaway Python script did it. I am not naming names :-) I would like to reboot it one more time after updating all packages. The server is already out of production. Give it 10-15 minutes before trying to log in.

> I have quite some good updates
> that I haven't committed on it right now.
>

I am not following 100%. If it is code, Git commits are cheap. You should commit frequently and push regularly. If it is data stored on anything ZFS, it will come through undamaged. If it is the output of a script that was still writing, it could very well be corrupt or nonexistent because the process was terminated.
Anything on XFS could be corrupted due to the unclean power cycle. Only ZFS, HAMMER1, and HAMMER2 handle crashes 100% cleanly. WAPBL will be OK in most situations. Linux has none of it.

> I hope it is possible to bring lov5 back without deleting anything from
> scratch.

Scratch has been deleted only 3 times in the almost 8 years I have been affiliated with the Lab. Each time that happened after a looong wait, when it was at 100%, unusable, and people didn't care about it. Now you are probably referring to the data corruption described earlier. I can't promise that. I am a system admin, not a magician.

>
> I would appreciate if you address this asap.

As always :-).

Cheers,
Predrag

>
> Best,
> Naji

From predragp at andrew.cmu.edu Thu Apr 30 22:00:11 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 30 Apr 2020 22:00:11 -0400
Subject: Auton Lab etiquette
Message-ID: <20200501020011.GnlkVakD5%predragp@andrew.cmu.edu>

Dear Autonians,

Dos and don'ts are now well documented.

https://www.autonlab.org/autonlab_wiki/aetiquette.html

This should never happen.

root at lov1$ du -h -s /tmp
13G /tmp

Every reboot is a loss of productivity. I am going to shut up now as I would like to keep the signal-to-noise ratio on this mailing list as high as possible.

Cheers,
Predrag
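The `du -h -s /tmp` check above can also be scripted, so that a job reports how much temporary data it has left behind before it exits. This is a minimal sketch under stated assumptions, not a Lab-provided tool: the function names `tree_size_bytes` and `human` are hypothetical, and the sketch sums apparent file sizes, so its totals can differ from `du`, which counts allocated disk blocks.

```python
import os
import stat


def tree_size_bytes(root, uid=None):
    """Sum apparent sizes of regular files under root.

    If uid is given, only files owned by that user ID are counted.
    Entries that vanish or cannot be read are skipped silently.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            if uid is not None and info.st_uid != uid:
                continue
            total += info.st_size
    return total


def human(n):
    """Format a byte count with binary units, in the spirit of `du -h`."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024
```

For example, `human(tree_size_bytes("/tmp", uid=os.getuid()))` would report only the calling user's share of `/tmp` on a Unix system, which is usually the part that user is able to clean up.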