From awd at cs.cmu.edu Sun May 3 19:56:49 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Sun, 3 May 2020 19:56:49 -0400 Subject: Reminder: Maria's defense at 11am this Monday In-Reply-To: References: Message-ID: On Wed, Apr 29, 2020 at 12:39 PM Artur Dubrawski wrote: > And the details of Maria's defense on Monday: > > Please join us on Monday, May 4 via Zoom at 11am when Maria De-Arteaga (ML > & Public Policy Joint PhD) will be defending her thesis. > > *Title:* Machine Learning in High-Stakes Settings: Risks and Opportunities > > *Thesis committee:* Artur Dubrawski (Co-Chair), Alexandra Chouldechova > (Co-Chair), Roni Rosenfeld, Adam Tauman Kalai (Microsoft Research) > > *Zoom Link:* > https://cmu.zoom.us/j/94967473449?pwd=b09lL29qblg1ZU5BWHZhVDB2NjFjQT09 > > *Meeting ID:* 949 6747 3449 > *Password:* 000312 > > *Abstract: * Machine learning (ML) is increasingly being used to support > decision-making in critical settings, where predictions have potentially > grave implications over human lives. Examples include healthcare, hiring, > child welfare, and the criminal justice system. In this thesis, I study the > risks and opportunities of machine learning in high-stakes settings. In the > first chapter I focus on opportunities of ML to support experts' decisions > when dealing with high-resolution multivariate data, a type of data that is > particularly hard for humans to interpret. I propose methodology to > discover latent complex multivariate correlation structures and illustrate > its use in two different domains: (1) identification of radioactive threats > in nuclear physics, and (2) prediction of neurological recovery of comatose > patients in healthcare. In the second chapter, focused on algorithmic > fairness, I demonstrate how societal biases encoded in historical data may > be reproduced and amplified by ML models, and introduce a new algorithm to > mitigate biases without assuming access to protected attributes. 
Finally, > in the third chapter I characterize challenges that arise from the > limitations of available labels in decision support contexts--such as the > selective labels problem and omitted payoff bias--and propose methodology > to estimate and leverage human consistency to improve algorithmic > recommendations and human-machine complementarity. > > > *Paper Link:* > https://www.dropbox.com/s/h449z85r6nls8oc/Dissertation_DeArteaga.pdf?dl=0 > > > On Tue, Apr 28, 2020 at 11:28 AM Artur Dubrawski wrote: > >> Dear Autonians, >> >> Please join me in attending 2 (yes, two) excellent virtual >> presentations by our own Maria De-Arteaga and Chao Liu, both of which are >> scheduled for the next week. >> >> (btw, I do not remember when was the last time we had more than one >> doctoral thesis defense scheduled in one week at the Auton Lab...) >> >> Maria's defense will be on Monday May 4th at 11am, >> The official announcement will be shared soon. >> >> Chao's defense is scheduled for Thursday May 7th at noon. >> The official announcement with the zoom link is included below. >> >> Please help seeing these outstanding colleagues move to the next levels >> of their professional lives by attending these presentations and cheering >> for them :) >> >> Cheers, >> Artur >> >> ----- >> >> Date: 07 May 2020 >> >> Time: 12:00 p.m. >> >> Place: *Virtual Presentation* https://cmu.zoom.us/j/2623852919 >> >> Type: Ph.D. Thesis Defense >> >> Who: Chao Liu >> >> Title: Vision with Small Baselines >> >> >> Abstract: >> 3D sensing with portable imaging systems is becoming more and more >> popular in computer vision applications such as autonomous driving, virtual >> reality, robotics manipulation and surveillance, due to the decreasing >> expense and size of RGB cameras. Despite the compactness and portability of >> the small baseline vision systems, it is well-known that the uncertainty in >> range finding using multiple views and the sensor baselines are inversely >> related. 
On the other hand, besides compactness, the small baseline vision >> system has its unique advantages such as easier correspondence and large >> overlapping regions across views. >> >> The goal of this thesis is to develop computational methods and small >> baseline imaging systems for 3D sensing of complex scenes in real world >> conditions. Our design principle is to physically model the scene >> complexities and specifically infer the uncertainties for the images >> captured with small baseline setups. >> >> With this design principle, we make four contributions. In the first >> contribution, we propose a two-stage near-light photometric stereo method >> using a small (6 cm diameter) LED ring. The imaging system is compact >> compared to traditional photometric stereo systems. In the second >> contribution, we develop an algorithm to simultaneously estimate the >> occlusion pattern and depth for thin structures from a focal image stack, >> which is obtained either by varying the focus/aperture of the lens or >> computed from a one-shot light field image. As the third contribution, we >> propose a learning-based method to estimate per-pixel depth and its >> uncertainty continuously from a monocular video stream, with small camera >> baselines across adjacent frames. These depth probability volumes are >> accumulated over time as more incoming frames are processed sequentially, >> which effectively reduces depth uncertainty and improves accuracy, >> robustness, and temporal stability. Finally, using a pair of high >> resolution camera and laser projector, we develop a high spatial resolution >> Diffuse Optical Tomography (DOT) system that can detect accurate boundaries >> and relative depth of heterogeneous structures up to a depth of 8mm below a >> highly scattering medium such as whole milk. 
>> >> We showcase the application of a small baseline vision system for in-vivo >> micro-scale 3D reconstruction of capillary veins and develop a system for >> real-time analysis of microvascular blood flow for critical care. We >> believe that the computational methods developed in this thesis would find >> more applications of compact 3D sensing under challenging conditions. >> >> >> >> Thesis Committee Members: >> >> Srinivasa G. Narasimhan, Co-chair >> Artur W. Dubrawski, Co-chair >> Aswin C. Sankaranarayanan >> Manmohan Chandraker, University of California, San Diego >> >> >> A copy of the thesis document is available at: >> >> https://www.dropbox.com/s/cz75koh96ragy4x/thesis-small-baseline.pdf?dl=0 >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Tue May 5 14:04:02 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 5 May 2020 14:04:02 -0400 Subject: Possible power outage Message-ID: Dear Autonians, Just a quick heads up. I am unable to ping the 3 firewall appliances protecting the PMx room and my office. Consequently, Karen Chen server forth is also unavailable. Those small appliances are MIPS64-based and should just auto boot once the power is restored. However, they are very slow and if the file system is trashed (they have to boot off FAT16) it is likely that it will take a few hours to fsck the file system or that I will have to manually clean things. I am still investigating; as of this moment I am not sure whether any other desktop computers are affected. lion.auton.cs.cmu.edu is OK and I am on it right now. Predrag From predragp at andrew.cmu.edu Tue May 5 14:16:09 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 5 May 2020 14:16:09 -0400 Subject: autofs issues In-Reply-To: References: <20200428020416.kjbpdA4kr%predragp@andrew.cmu.edu> <20200429122744.FKtS7nUVM%predragp@andrew.cmu.edu> Message-ID: Hi Ifi, Thank you for bringing this to my attention. 
The autofs daemon appears to be broken, at least on the machines running Red Hat 8.1 (gpu15-gpu19). I have upgraded and rebooted gpu15 and it didn't fix the problem. I started autofs from the command line. These symptoms are typical when upstream breaks SELinux policies. Knowing how Red Hat operates, I would not hold my breath that this will be fixed very soon. No big deal: I will start autofs manually on all affected machines (GPU15-19). All CPU nodes were upgraded last week to Red Hat 7.8 and they seem to work fine. GPU1-14 run mostly Red Hat 7.7 as the upgrade is time consuming due to the proprietary Nvidia drivers. Please report issues like this without a second thought, as that is the only way to fix them. Best, Predrag On Tue, May 5, 2020 at 9:44 AM Ifigeneia Apostolopoulou wrote: > > Hi (again!) Predrag and hope you are well! > > I think you were right: tmux was the culprit. But I'm still facing a problem for gpu15,gpu17,gpu18. any suggestions? see below > > Thanks (again!) :) > > > Last login: Tue May 5 08:40:24 2020 from 192.168.6.115 > > Could not chdir to home directory /zfsauton3/home/iapostol: No such file or directory > > iapostol at gpu15$ pwd > > / > > iapostol at gp > > > > > > On Wed, Apr 29, 2020 at 8:27 AM Predrag Punosevac wrote: >> >> Ifigeneia Apostolopoulou wrote: >> >> > Hi Predrag, >> > >> >> Hi Ifi >> >> > I just wanted to bring it to your attention: >> > >> > no gateway is currently working for me. I may occasionally be able to >> > login >> >> I just checked >> >> lop2.autonlab.org >> lop1.autonlab.org >> lion.auton.cs.cmu.edu >> >> and I have no problem logging in. I logged in first with my regular account >> to eliminate the possibility that LDAP services are down. Then I used >> my root account to log in as you to check the autofs daemon. Please see >> below. 
>> >> root at lop2$ su - iapostol >> Last login: Wed Apr 29 07:47:06 EDT 2020 from >> c-73-154-131-241.hsd1.pa.comcast.net on pts/25 >> root at lop2$ pwd >> /zfsauton3/home/iapostol >> >> >> lop1# su - iapostol >> -bash-5.0$ pwd >> /zfsauton3/home/iapostol >> -bash-5.0$ uname -a >> OpenBSD lop1.int.autonlab.org 6.6 GENERIC.MP#8 amd64 >> >> >> >> >> [root at lion ~]# su - iapostol >> Last failed login: Wed Apr 29 00:34:26 EDT 2020 from >> c-73-154-131-241.hsd1.pa.comcast.net on ssh:notty >> There were 4 failed login attempts since the last successful login. >> -bash-4.2$ pwd >> /zfsauton3/home/iapostol >> -bash-4.2$ uname -a >> Linux lion.auton.cs.cmu.edu 3.10.0-1127.el7.x86_64 #1 SMP Wed Apr 8 >> 08:26:53 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux >> >> >> Now lion is showing an interesting output. It shows that you had tried 4 >> times to log with incorrect credentials. That would definitely put you >> on the banned list at least for a while. However, if I have to put my >> money on your problems I would guess that there is a DNS problem. I made >> sure that the Auton Lab DNS servers are working as advertised so I will >> point to your personal DNS. There is some remote chance that you are >> experiencing that weird routing problem, reported by ram and me, when >> NSA breaks CMU routing tables and blocks bunch of residential ISP from >> reaching CMU. >> >> >> > but still can't find anything in my home directory :// >> > >> > >> > iapostol at lop2.autonlab.org:/zfsauton3/home/iapostol/ >> > >> > iapostol at lop2.autonlab.org's password: >> > >> > Could not chdir to home directory /zfsauton3/home/iapostol: Input/output >> > error >> >> >> This was actually more interesting part of your report. I immediately >> assumed that my auto.nfs file got corrupted or that autofs daemon is not >> working properly. I had a problem with autofs on lion.auton.cs.cmu.edu >> so I am not running it out of systemd. It is manually started. 
However, >> as of this morning autofs works both on lop2.autonlab.org and >> lion.auton.cs.cmu.edu which you can see from the above output. >> >> lop1.autonlab.org doesn't run the autofs daemon as it runs on OpenBSD >> which doesn't have a modern autofs daemon. In order for you to log into >> lop1.autonlab.org I created tiny local home directories which are >> needed for you to ssh to computing nodes. I would not expect that you >> see anything inside your home directory on lop1.autonlab.org. >> >> >> > > Just a quick heads up. bash.autonlab.org (my desktop) just crashed >> > > again. I have no idea what happened nor do I care too much about it. There >> > > are three other shell gateways. >> >> I do know what the problem with bash is. Bash is a NUC machine. After the >> upgrade to Red Hat 7.7 a network driver regression (reported by multiple >> people including me) was introduced which caused the network interface to >> crap out. I typically manually select an older stable kernel when I >> reboot bash but this time around I realized that somebody else rebooted the >> machine and the grub boot-loader just picked the new broken kernel. That thing is >> now going to rot for a while. FYI Red Hat dismissed my bug report >> since we are not paying customers :-) For Red Hat/IBM the problem doesn't >> exist. >> >> >> Best, >> Predrag >> >> P.S. I almost forgot. If you had things running out of tmux or screen >> make sure you log out first before you try to reconnect to the Auton >> Lab. I have seen all sorts of weird things happening because of that. From chiragn at cs.cmu.edu Tue May 5 14:37:51 2020 From: chiragn at cs.cmu.edu (Chirag Nagpal) Date: Tue, 5 May 2020 14:37:51 -0400 Subject: Linking Libcudart for tensorflow Message-ID: Hi all I've been trying to use tensorflow on the gpu machines. I tried to install tensorflow 2 and 1.13 using conda on python 2.7 and 3.7. Unfortunately when I import tensorflow it says it couldn't find libcudart8 in the python path. 
The machines already have a more advanced version of the library libcudart10 in usr/local/cuda. It's weird conda isn't able to link this version of tensorflow during installation. Any help/pointers would be appreciated. PS. FWIW Pytorch installed from conda works like a charm for me with cuda support, but unfortunately I really need tensorflow for some baselines. -- Sent from my phone. Apologies for the typos. -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Tue May 5 14:41:59 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 5 May 2020 14:41:59 -0400 Subject: Linking Libcudart for tensorflow In-Reply-To: References: Message-ID: Python 2.7 is end-of-life. I would very strongly discourage everyone from using it. I just upgraded all packages on GPU15-19 which run Red Hat 8.1. That is basically the state of the art on this platform. I would be the most curious if people can get tensorflow to work on those machines. Predrag On Tue, May 5, 2020 at 2:39 PM Chirag Nagpal wrote: > > Hi all > > I've been trying to use tensorflow on the gpu machines. I tried to install tensorflow 2 and 1.13 using conda on python 2.7 and 3.7. Unfortunately when I import tensorflow it says it couldn't find libcudart8 in the python path. > > The machines already have a more advanced version of the library libcudart10 in usr/local/cuda. It's weird conda isn't able to link this version of tensorflow during installation. > > Any help/pointers would be appreciated. > > PS. FWIW Pytorch installed from conda works like a charm for me with cuda support, but unfortunately I really need tensorflow for some baselines. > > > -- > Sent from my phone. Apologies for the typos. From awd at cs.cmu.edu Thu May 7 07:35:10 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Thu, 7 May 2020 07:35:10 -0400 Subject: It is today at noon! [Ph.D. 
Thesis Defense: Chao Liu] In-Reply-To: References: Message-ID: Reminder: Chao's PhD thesis defense is today at noon! Zoom link and other information can be found below. Cheers Artur ---------- Forwarded message --------- From: Suzanne Lyons Muth Date: Tue, Apr 28, 2020 at 10:49 AM Subject: RI Ph.D. Thesis Defense: Chao Liu To: ri-people at lists.andrew.cmu.edu Date: 07 May 2020 Time: 12:00 p.m. Place: *Virtual Presentation* https://cmu.zoom.us/j/2623852919 Type: Ph.D. Thesis Defense Who: Chao Liu Title: Vision with Small Baselines Abstract: 3D sensing with portable imaging systems is becoming more and more popular in computer vision applications such as autonomous driving, virtual reality, robotics manipulation and surveillance, due to the decreasing expense and size of RGB cameras. Despite the compactness and portability of the small baseline vision systems, it is well-known that the uncertainty in range finding using multiple views and the sensor baselines are inversely related. On the other hand, besides compactness, the small baseline vision system has its unique advantages such as easier correspondence and large overlapping regions across views. The goal of this thesis is to develop computational methods and small baseline imaging systems for 3D sensing of complex scenes in real world conditions. Our design principle is to physically model the scene complexities and specifically infer the uncertainties for the images captured with small baseline setups. With this design principle, we make four contributions. In the first contribution, we propose a two-stage near-light photometric stereo method using a small (6 cm diameter) LED ring. The imaging system is compact compared to traditional photometric stereo systems. 
In the second contribution, we develop an algorithm to simultaneously estimate the occlusion pattern and depth for thin structures from a focal image stack, which is obtained either by varying the focus/aperture of the lens or computed from a one-shot light field image. As the third contribution, we propose a learning-based method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with small camera baselines across adjacent frames. These depth probability volumes are accumulated over time as more incoming frames are processed sequentially, which effectively reduces depth uncertainty and improves accuracy, robustness, and temporal stability. Finally, using a pair of high resolution camera and laser projector, we develop a high spatial resolution Diffuse Optical Tomography (DOT) system that can detect accurate boundaries and relative depth of heterogeneous structures up to a depth of 8mm below a highly scattering medium such as whole milk. We showcase the application of a small baseline vision system for in-vivo micro-scale 3D reconstruction of capillary veins and develop a system for real-time analysis of microvascular blood flow for critical care. We believe that the computational methods developed in this thesis would find more applications of compact 3D sensing under challenging conditions. Thesis Committee Members: Srinivasa G. Narasimhan, Co-chair Artur W. Dubrawski, Co-chair Aswin C. Sankaranarayanan Manmohan Chandraker, University of California, San Diego A copy of the thesis document is available at: https://www.dropbox.com/s/cz75koh96ragy4x/thesis-small-baseline.pdf?dl=0 _______________________________________________ ri-people mailing list ri-people at lists.andrew.cmu.edu https://lists.andrew.cmu.edu/mailman/listinfo/ri-people -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ngisolfi at cs.cmu.edu Thu May 7 11:24:08 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 7 May 2020 11:24:08 -0400 Subject: [Lunch] Today @noon over zoom Message-ID: <556CD386-38B9-4919-9A32-6D4019A1EC54@cs.cmu.edu> Sorry for the late reminder; the link for convenience: https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngisolfi at cs.cmu.edu Thu May 7 11:46:54 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 7 May 2020 11:46:54 -0400 Subject: [Lunch] Today @noon over zoom CANCELLED In-Reply-To: References: <556CD386-38B9-4919-9A32-6D4019A1EC54@cs.cmu.edu> Message-ID: <791966C2-9FCE-453F-BC0A-8567877D7010@cs.cmu.edu> Thanks for the heads up Jarod! See you at Chao's defense! > On May 7, 2020, at 11:44 AM, Donghan Wang wrote: > > Nick, > > Chao will have his thesis defence at noon. I guess many lab members will be on that channel. > > Thanks, > Jarod > > On Thu, May 7, 2020 at 11:25 AM Nick Gisolfi > wrote: > Sorry for the late reminder; the link for convenience: > > https://cmu.zoom.us/j/492870487 > > We hope to see you there! > > - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngisolfi at cs.cmu.edu Thu May 14 09:52:34 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 14 May 2020 09:52:34 -0400 Subject: [Lunch] today @noon over zoom Message-ID: <076C2AC1-23AE-42A1-B709-950B4EECC31F@cs.cmu.edu> https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Fri May 15 19:27:34 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 15 May 2020 19:27:34 -0400 Subject: GPU19 cold reboot Message-ID: <20200515232734.euO3DVz3k%predragp@andrew.cmu.edu> Dear Autonians, GPU19 had to be cold rebooted. Somebody ran it into the ground.
Thank God for IPMI so I can do this remotely :-) Best, Predrag From predragp at andrew.cmu.edu Fri May 15 23:06:20 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 15 May 2020 23:06:20 -0400 Subject: gpu20 and gpu21 provisioned Message-ID: <20200516030620.g8-PCLIAO%predragp@andrew.cmu.edu> Dear Autonians, After about 12h of work you can finally log into the newest addition to our cluster: GPU20 and GPU21. These are some of the finest GPU nodes on the CMU campus. They came with a price tag of 37K each after an 8K per-unit education discount. I have no idea whom Dr. Jeff Schneider shook down for money but he surely knows how to do it. Each machine has 4 Tesla V100 GPU cards which have 32GB of GPU memory. That is sufficient to train 3D neural networks and to answer all the questions Dr. Dubrawski might ask you :-) On a serious note, we had a really hard time getting these monsters onto the campus during COVID-19 and provisioning them under current circumstances. We didn't have deep rack space for these nor electricity so I am temporarily borrowing space and electricity from somebody else. You will notice that the network is only 1 Gigabit as I could not get a 10 Gigabit network working over the 30m of Cat 5e which was needed to plug the machines into our switch. That is how far they are actually physically located from our cluster. I will have to get some sleep before adding scratch directories and installing MATLAB. Best, Predrag P.S. Dr. Schneider has more surprises but I will need to make a new trip to CMU in my hazmat suit before those babes are brought online.

predragp at gpu20$ nvidia-smi
Fri May 15 22:25:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   42C    P0    53W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   41C    P0    53W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   40C    P0    56W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   41C    P0    55W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

egp at gpu21$ pwd
/zfsauton2/home/predragp
predragp at gpu21$ nvidia-smi
Fri May 15 22:52:16 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   35C    P0    54W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   34C    P0    54W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   34C    P0    54W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   34C    P0    57W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
predragp at gpu21$ ounted.

From jeff4 at andrew.cmu.edu Mon May 18 12:10:56 2020 From: jeff4 at andrew.cmu.edu (Jeff Schneider) Date: Mon, 18 May 2020 12:10:56 -0400 Subject: gpu20 and gpu21 provisioned In-Reply-To: <20200516030620.g8-PCLIAO%predragp@andrew.cmu.edu> References: <20200516030620.g8-PCLIAO%predragp@andrew.cmu.edu> Message-ID: <2e9d107f-68ce-4665-793d-73300ad5c114@andrew.cmu.edu> Thanks for the hard work bringing these online! On 5/15/2020 11:06 PM, Predrag Punosevac wrote: > Dear Autonians, > > After about 12h of work you can finally log into the newest addition to > our cluster: GPU20 and GPU21. These are some of the finest GPU nodes on > the CMU campus. They came with a price tag of 37K each after 8K per unit > education discount. I have no idea whom Dr. Jeff Schneider shook down > for money but he surely knows how to do it. Each machine has 4 Tesla > V100 GPU cards which have 32GB of GPU memory. That is sufficient to > train 3D neural networks and to answer all the questions Dr. 
Dubrawski > might ask you :-) On a serious note, we had really hard time getting > these monsters onto the campus during the Cov19 and provisioning them > under current circumstances. We didn't have deep rack space for these > nor electricity so I am temporary borrowing space and electricity from > somebody else. You will notice that network is only 1Gigabit as I could > not get 10Gigabit network working with 30m Cat 5e which was needed to > plug machines into our switch. That is how far they are actually > physically located from our cluster. > > I will have to take sleep before adding scratch directories and > installing MATLAB. > > Best, > Predrag > > P.S. Dr. Schneider has more surprises but I will need to make a new trip > to CMU in my hazmat suit before those babes are brought online. > > > > > > > > predragp at gpu20$ nvidia-smi > Fri May 15 22:25:55 2020 > +-----------------------------------------------------------------------------+ > | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: > 10.2 | > |-------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M| Bus-Id Disp.A | Volatile > Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util > Compute M. | > |===============================+======================+======================| > | 0 Tesla V100-SXM2... Off | 00000000:61:00.0 Off | > 0 | > | N/A 42C P0 53W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | > 0 | > | N/A 41C P0 53W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2... Off | 00000000:89:00.0 Off | > 0 | > | N/A 40C P0 56W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2... 
Off | 00000000:8A:00.0 Off | > 0 | > | N/A 41C P0 55W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > > > +-----------------------------------------------------------------------------+ > | Processes: GPU > Memory | > | GPU PID Type Process name Usage > | > |=============================================================================| > | No running processes found > | > +-----------------------------------------------------------------------------+ > > > > egp at gpu21$ pwd > /zfsauton2/home/predragp > predragp at gpu21$ nvidia-smi > Fri May 15 22:52:16 2020 > +-----------------------------------------------------------------------------+ > | NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: > 10.2 | > |-------------------------------+----------------------+----------------------+ > | GPU Name Persistence-M| Bus-Id Disp.A | Volatile > Uncorr. ECC | > | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util > Compute M. | > |===============================+======================+======================| > | 0 Tesla V100-SXM2... Off | 00000000:61:00.0 Off | > 0 | > | N/A 35C P0 54W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 1 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | > 0 | > | N/A 34C P0 54W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 2 Tesla V100-SXM2... Off | 00000000:89:00.0 Off | > 0 | > | N/A 34C P0 54W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > | 3 Tesla V100-SXM2... 
Off | 00000000:8A:00.0 Off | > 0 | > | N/A 34C P0 57W / 300W | 0MiB / 32510MiB | 0% > Default | > +-------------------------------+----------------------+----------------------+ > > > +-----------------------------------------------------------------------------+ > | Processes: GPU > Memory | > | GPU PID Type Process name Usage > | > |=============================================================================| > | No running processes found > | > +-----------------------------------------------------------------------------+ > predragp at gpu21$ ounted. From awd at cs.cmu.edu Mon May 18 14:00:16 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 18 May 2020 14:00:16 -0400 Subject: a nice read about Luke Message-ID: https://www.cs.cmu.edu/news/cmu-trauma-care-researcher-joins-fight-against-covid-19-nyc -------------- next part -------------- An HTML attachment was scrubbed... URL: From RKP19 at pitt.edu Mon May 18 14:26:11 2020 From: RKP19 at pitt.edu (Poropatich,Ronald) Date: Mon, 18 May 2020 18:26:11 +0000 Subject: a nice read about Luke In-Reply-To: References: Message-ID: Artur, Thanks for sharing! Nicely done; we are so fortunate to have this selfless leader as part of our research efforts! Ironically, today is his official retirement from the Army! ron From: Artur Dubrawski Sent: Monday, May 18, 2020 2:00 PM To: users at autonlab.org Cc: Andrew W. Moore ; Howie Choset ; John Galeotti ; Poropatich,Ronald Subject: a nice read about Luke https://www.cs.cmu.edu/news/cmu-trauma-care-researcher-joins-fight-against-covid-19-nyc -------------- next part -------------- An HTML attachment was scrubbed... URL: From ngisolfi at cs.cmu.edu Thu May 21 11:13:11 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 21 May 2020 11:13:11 -0400 Subject: [Lunch] today @noon over zoom Message-ID: https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick -------------- next part -------------- An HTML attachment was scrubbed... 
From awd at cs.cmu.edu Fri May 22 14:32:46 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 22 May 2020 14:32:46 -0400
Subject: Fwd: Strategy to return research activity in CMU Pittsburgh facilities
In-Reply-To: <5ec7f5cfce259_1a49b2aeca77785e4417be@massmail-02.andrew.cmu.edu.mail>
References: <5ec7f5cfce259_1a49b2aeca77785e4417be@massmail-02.andrew.cmu.edu.mail>
Message-ID:

Team,

Please let me and Trish (cc'd) know if you have reasons to believe that some of our research activity should return to campus or to other facilities outside of our homes in the foreseeable future.

Thanks
Artur

---------- Forwarded message ---------
From: Michael McQuade
Date: Fri, May 22, 2020 at 11:54 AM
Subject: Strategy to return research activity in CMU Pittsburgh facilities
To:

Dear CMU Faculty and Researchers:

President Jahanian's email last Thursday announced the beginning of CMU's return to the Pittsburgh campus following Pennsylvania Governor Tom Wolf's move to "yellow" in Allegheny County as of May 15. Research is one of the primary missions of the university, and we have heard from many faculty and researchers who are eager to re-engage activities in CMU facilities. Today, I am pleased to provide information about our initial plans to bring research activities back to our Pittsburgh facilities in a phased fashion.

We are committed to returning to on-site research operations as soon as safely possible. Our plan, developed by a working group comprised of representatives from across our Pittsburgh research operations and with input from the broader CMU community, will begin by piloting the return of selected research that cannot be done remotely.

The safety, health and well-being of our community remains our highest priority. We know that there will be some in our community who will not be able to return to campus as quickly as others due to medical or personal issues.
Please be assured that in preparing for the return of research to campus (as, indeed, the return to campus overall), we have those individuals in mind and the university is developing processes to address their circumstances in a thoughtful and compassionate manner.

The process for returning research to campus will be guided by continual assessment of the most up-to-date information available in order to minimize the risks posed by the pandemic. As with all on-site activities, it is essential that everyone adhere to the COVID safety requirements the university has established.

In the first phase, beginning now, the deans are selecting a small set of research activities from among those that must be done in a CMU facility and which represent a variety of use cases. These pilots will allow us to test and improve our processes and safety requirements and our ability to adhere to them. For now, *all researchers must continue to operate remotely until authorized to return to our facilities*.

Each research activity wishing to return to a CMU facility (an individual researcher, a large group lab, a research support facility, a creative studio, etc.) must submit a plan using the template found on the COVID researcher resources site, describing the research activity it is proposing to do on campus, why it can't be done remotely and how the activity will be conducted in a manner that will meet the safety requirements set by the university, as well as those specific to the needs of the project. Initial pilots selected by the deans have been, or will be, notified shortly to prepare and submit their plans. Plans must be approved by the respective deans and me before work can begin in a CMU facility.

As we learn from and see success in our pilot projects, we will gradually expand access for additional research activities that have approved plans.
We will approve research activities in a rolling manner until all activities that must be done in a CMU facility can return and operate safely. Throughout the process, deans and ADRs will communicate details for submitting plans to those desiring to return to our facilities.

Two additional realities impact the return of on-site research. First, other campus activities (such as education and administrative functions) may also be restarting, and we need to ensure that we can collectively meet our low-density and other COVID-related safety protocols. Second, while we hope it will not be the case, we must plan for the possibility that a local resurgence of the COVID-19 virus may force us to once again move research out of our facilities. For this reason, all returning activities must detail emergency and rapid ramp-down procedures as part of their plans.

Research is a vital part of CMU's mission and we are excited to now be able to begin to restart research in our facilities. Please remain flexible during this important pilot phase. I will continue to keep you informed as we progress. Thank you again for your patience and for all your efforts.

Sincerely,
Michael

J. Michael McQuade
Vice President for Research

From boecking at andrew.cmu.edu Fri May 22 15:08:23 2020
From: boecking at andrew.cmu.edu (Benedikt Boecking)
Date: Fri, 22 May 2020 15:08:23 -0400
Subject: CPU compute nodes
Message-ID:

Hi all,

It looks like all our servers dedicated to CPU intensive computation are maxed out.

If you happen to have any programs running that you do not need anymore, would you please stop them to free up resources? Thanks in advance!

Best,
Ben
From chiragn at cs.cmu.edu Fri May 22 15:30:25 2020
From: chiragn at cs.cmu.edu (Chirag Nagpal)
Date: Fri, 22 May 2020 15:30:25 -0400
Subject: CPU compute nodes
In-Reply-To:
References:
Message-ID:

Re-emphasizing this request from Ben. I know it's NeurIPS time, but let's please be more reasonable!

On Fri, May 22, 2020 at 3:09 PM Benedikt Boecking wrote:

> Hi all,
>
> It looks like all our servers dedicated to CPU intensive computation are
> maxed out.
>
> If you happen to have any programs running that you do not need anymore,
> would you please stop them to free up resources? Thanks in advance!
>
> Best,
> Ben

--
*Chirag Nagpal*
PhD Student, Auton Lab
School of Computer Science
Carnegie Mellon University
cs.cmu.edu/~chiragn

From awd at cs.cmu.edu Fri May 22 18:08:59 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 22 May 2020 18:08:59 -0400
Subject: URGENT: We need to play nice with Auton Lab computing resources please
Message-ID:

Team,

It has been a while since this last happened, but it appears that just a few users are hogging all of our CPU capacity.

Please be more respectful of the rest of the team and refrain from preventing others from using the system.

We will soon be adding more CPUs, but for now we need to manage with what exists, and this requires all of us to be team players.

If in doubt, please refer to the how-to guide: https://www.autonlab.org/autonlab_wiki/aetiquette.html and/or ask Predrag for specific instructions.

If we cannot play nice as a team, we will be forced to install resource management tools, which would not be our top preference.

Thanks very much,
Artur
From predragp at andrew.cmu.edu Wed May 27 13:22:48 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 27 May 2020 13:22:48 -0400
Subject: access to home folder in clusters
In-Reply-To:
References:
Message-ID:

The autofs daemon is dead on a bunch of machines due to the configuration update I pushed yesterday (I needed to add Interns). I am logging into servers one by one right now and fixing problems.

Predrag

On Wed, May 27, 2020 at 1:18 PM Chao Liu wrote:
>
> Hi Predrag,
>
> I just noticed that on some GPU clusters (e.g. GPU21), I don't have any access to my home folder: /zfsauton3/home/. In some other clusters (e.g. GPU20), however, this folder is still accessible. Is there any maintenance going on for some clusters now?
>
> Thanks,
> Chao
>
>

From predragp at andrew.cmu.edu Wed May 27 16:15:28 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 27 May 2020 16:15:28 -0400
Subject: autofs deamon problem update
Message-ID:

Dear Autonians,

The autofs daemon problem appears to be Red Hat 8.1 specific. The affected computing nodes are GPU[15-21]. It appears that the upgrade to the recently released Red Hat 8.2 fixes the problem (it looks like some stupid SELinux knob). I have upgraded and rebooted GPU21. I know that many of you will be upset by an unannounced reboot, but that is the only way to unfreeze the machines for login.

I have already upgraded and rebooted GPU[19-21], and they appear to work as expected.

Predrag

From predragp at andrew.cmu.edu Wed May 27 16:33:16 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 27 May 2020 16:33:16 -0400
Subject: autofs deamon problem update
In-Reply-To:
References:
Message-ID:

This is now fixed across the board on GPU[15-21]. I don't plan to interfere with upcoming conference deadlines by rebooting machines. Let's hope that this is the very last forced reboot.
Predrag

On Wed, May 27, 2020 at 4:15 PM Predrag Punosevac wrote:
>
> Dear Autonians,
>
> The autofs daemon problem appears to be Red Hat 8.1 specific. The
> affected computing nodes are GPU[15-21]. It appears that
> the upgrade to the recently released Red Hat 8.2 fixes the problem (it
> looks like some stupid SELinux knob). I have upgraded and rebooted
> GPU21. I know that many of you will be upset by an unannounced reboot,
> but that is the only way to unfreeze the machines for login.
>
> I have already upgraded and rebooted GPU[19-21], and they appear to work as expected.
>
> Predrag

From ngisolfi at cs.cmu.edu Thu May 28 11:14:38 2020
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 28 May 2020 11:14:38 -0400
Subject: [Lunch] Today @noon over Zoom
Message-ID:

https://cmu.zoom.us/j/492870487

We hope to see you there!

- Nick

From predragp at andrew.cmu.edu Fri May 29 22:44:34 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 29 May 2020 22:44:34 -0400
Subject: lov[7-9] available
Message-ID: <20200530024434.FfJL5Lnmp%predragp@andrew.cmu.edu>

Dear Autonians,

I managed to provision 3 more CPU nodes (lov7, lov8, lov9) with 88 cores and 768 GB of RAM per machine. I hope this will help a bit with the Tuesday deadline.

Cheers,
Predrag

P.S. They run RHEL 8.2. I have not installed MATLAB yet, but other scientific software is there.

From jeff4 at andrew.cmu.edu Sun May 31 21:46:48 2020
From: jeff4 at andrew.cmu.edu (Jeff Schneider)
Date: Sun, 31 May 2020 21:46:48 -0400
Subject: lov[7-9] available
In-Reply-To: <20200530024434.FfJL5Lnmp%predragp@andrew.cmu.edu>
References: <20200530024434.FfJL5Lnmp%predragp@andrew.cmu.edu>
Message-ID: <0de61aa8-2c10-8523-f630-b04a01d669b9@andrew.cmu.edu>

This is awesome, thanks Predrag!
On 5/29/2020 10:44 PM, Predrag Punosevac wrote:
> Dear Autonians,
>
> I managed to provision 3 more CPU nodes (lov7, lov8, lov9) with 88 cores
> and 768 GB of RAM per machine. I hope this will help a bit with the
> Tuesday deadline.
>
> Cheers,
> Predrag
>
> P.S. They run RHEL 8.2. I have not installed MATLAB yet, but other
> scientific software is there.