From ngisolfi at cs.cmu.edu Thu Jun 4 09:37:49 2020
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 4 Jun 2020 09:37:49 -0400
Subject: [Lunch] Today @ noon over Zoom
Message-ID: <2387491C-9D26-4E3C-89F1-346DD5F55B9C@cs.cmu.edu>

https://cmu.zoom.us/j/492870487

We hope to see you there!

- Nick

From predragp at andrew.cmu.edu Thu Jun 4 14:28:53 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 4 Jun 2020 14:28:53 -0400
Subject: Gogs, X2Go on MAC
In-Reply-To:
References:
Message-ID:

I just tested Gogs access using X2Go on a Mac. It works flawlessly on my
daughters' MacBook Pro. It runs OS X 10.15.4, and the version of X2Go is
the one for 10.13 or higher:

https://wiki.x2go.org/doku.php/download:start

You should obviously have XQuartz installed (the computer must be
rebooted after that). You will not be able to start the X2Go client from
Launchpad, as the application is not signed by Apple. You will have to go
through Finder: hold the Control key, click the application, and choose
Open. You will then be able to launch the unsigned application.

Cheers,
Predrag

On Thu, Jun 4, 2020 at 11:16 AM Cristian Challu wrote:
>
> Hi Predrag,
>
> I want to upload the .tex files to Gogs, but I remember you said having an interface in Mac is complicated. What do you suggest? I am reading the wiki of the website and if I understand correctly the best way is to clone a repo on one of the autonlab servers. Do we already have a repo in Gogs?
>
> Thanks,
> Cristian

From awd at cs.cmu.edu Tue Jun 9 17:03:41 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 9 Jun 2020 17:03:41 -0400
Subject: Karen Chen's thesis defense: this Thursday 6/11, 1pm on zoom
Message-ID:

Team,

Please join us to witness the culmination of Karen's 16-year tenure as a
researcher in the Auton Lab. She will be defending her doctoral
dissertation this Thursday at 1pm via Zoom.

Hope to see a lot of Autonians there to cheer for Karen!
Artur

*Title:* Augmenting Human Perceptual and Reasoning Capabilities with Intelligent Multimodal Analytics: From Critical Care to Coaching Math Problem Solving

Zoom Meeting
https://cmu.zoom.us/j/94937712479?pwd=MFZubFNVdUNjYWxFQkl0UWRhZVJWUT09
Meeting ID: 949 3771 2479
Password: 562565

*Thesis Committee:*
Artur Dubrawski (Chair)
Daniel Nagin
Amy Ogan (HCII, CMU)
Sidney D'Mello (University of Colorado Boulder)

*Link to thesis draft*
https://cmu.box.com/s/1n2hg6q4l5yrg8hd0b4brbf7o58cyrlp

*Abstract:*
Augmenting human perceptual and reasoning capability by leveraging machine intelligence has been on the agenda of AI exploration since its inception. The recent developments in sensing technology make it possible to generate and accumulate massive amounts of high frequency multimodal data. This creates new opportunities for automating the analysis of this type of data where the dynamic and time sensitive decision making plays a major role in quality of service. In this thesis, I conduct simultaneous investigation in two presumably unrelated domains: monitoring of patients for instability in critical care, and monitoring elementary school students for their cognitive and affective states during math problem solving exercises. Those two contexts share similar monitoring paradigms with similar challenges, and offer comparable opportunities with high frequency multimodal observations. The goal of my thesis is to explore and demonstrate the practical utility of multimodal analytics of high frequency monitoring data in support of decision making in those contexts. I organized my research work around the following two thrusts.

The first thrust (Chapter 2) is motivated by the tension between the limited human capacity of ICU nurses to make best use of monitoring resources in fast-paced critical care settings.
The current practice primarily relies on the vigilance of clinical personnel to recognize CRI early; however, this is often insufficient given the complexity and apparent unpredictability of temporal patterns of risk progression for CRI. The methods developed in this project aim to augment individual nurse ability to identify physiologic indicators of impending CRI (perceptual capability) and recognize how bedside monitor data patterns reflect heterogeneous disease progression (reasoning capability).

The second thrust concerns education: augmenting human teacher perceptual and reasoning capability in providing personalized and adaptive support to students in either online or offline learning environments. The first project (Chapter 3) aims to address the reasoning capabilities required by teachers to understand and eventually to adapt to the learning progression in math practices ("learning curve") of a large number of students using the fine-grained log data collected from an on-line learning environment. I developed a data driven method for decomposing population level learning curves into distinctive groups that reveal interpretable patterns of "skill growth" which correlate with the students' future learning outcomes. The second project (Chapter 4) explores off-line learning scenarios of coaching math problem solving with young students. I collected multi-modal data from one-on-one coaching sessions between parent-child dyads in a naturalistic environment. Using those datasets, I developed analytical methods that may ease the cognitive load in resolving the practical teaching challenge of the "assistance dilemma", i.e. making real time decisions with regard to the timing and the type of support to provide (cognitive, metacognitive or social), in order to maximize students' exposure to "productive struggles".
Eventually, those methods may be used to bootstrap the perceptual and reasoning modules toward a vision of an "intelligent monitor" in the classroom and home that can recognize and reason over dynamic cognitive and affective student processes during math problem solving in an off-line learning environment.

From ngisolfi at cs.cmu.edu Thu Jun 11 09:48:20 2020
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 11 Jun 2020 09:48:20 -0400
Subject: [Lunch] Today @ noon over Zoom
Message-ID:

Link for Lunch: https://cmu.zoom.us/j/492870487

Also, remember to tune in to Karen's thesis defense at 1pm, right after lunch!

Karen's Defense (more details in message Artur sent on June 9):
https://cmu.zoom.us/j/94937712479?pwd=MFZubFNVdUNjYWxFQkl0UWRhZVJWUT09

- Nick

From predragp at andrew.cmu.edu Sun Jun 14 11:26:57 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 14 Jun 2020 11:26:57 -0400
Subject: GPU11 lying Idle
In-Reply-To:
References:
Message-ID: <20200614152657.sGSq8%predragp@andrew.cmu.edu>

Tanmay Agarwal wrote:

> Hi Predrag,
>
> Hope you are doing well.
>
> I just noticed that GPU11 seems to be sitting idle with all the GPU memory
> occupied by zombie processes. Can we reboot this machine and make it usable
> again?

Done!

>
> P.S: Attaching a screenshot of the current state of the machine.
>
> Thanking you,
>
> Warm Regards,
>
> Tanmay Agarwal | MSR Graduate Student
> Robotics Institute @ CMU
> mailto: tanmaya at andrew.cmu.edu

From predragp at andrew.cmu.edu Mon Jun 15 13:48:37 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Jun 2020 13:48:37 -0400
Subject: Python error on Auton server
In-Reply-To:
References:
Message-ID: <20200615174837.3AV16%predragp@andrew.cmu.edu>

Saswati Ray wrote:

> Hi Predrag,
>
> Since this morning, I see this error in my pre-existing working conda
> environment on lov3.
>

Damn Conda morons. I have seen that before. It looks like they broke the
Python 3.7.7 installation on RedHat 7.8. That is a linker problem you
see. I just checked and I can still load Python 3.7.7, but when I run a
conda command I see the same error. We have 8 computing nodes in total
running RedHat 8.2 and they are not affected. Please try to rebuild your
environment and report. I will see later today if reinstalling can help.
I am a moron too. I was contemplating upgrading Python to 3.8.3. I
should be much more careful with Conda guys.

You also have RedHat rh-python36 in /opt/rh. That one always works.

Predrag

> Traceback (most recent call last):
>   File "recap.py", line 15, in <module>
>     import numpy as np
>   File "/zfsauton2/home/sray/.local/lib/python3.7/site-packages/numpy/__init__.py", line 142, in <module>
>     from . import core
>   File "/zfsauton2/home/sray/.local/lib/python3.7/site-packages/numpy/core/__init__.py", line 102, in <module>
>     from . import _dtype_ctypes
>   File "/zfsauton2/home/sray/.local/lib/python3.7/site-packages/numpy/core/_dtype_ctypes.py", line 25, in <module>
>     import _ctypes
> ImportError: libffi.so.7: cannot open shared object file: No such file or directory
>
> Any ideas?
>
> Thanks,
> Saswati
>
> --
> Saswati Ray
> Senior Research Analyst
> Carnegie Mellon University - Auton Lab
> Newell-Simon Hall Room 3115, Pittsburgh PA 15213
> Phone: 412-268-1238

From predragp at andrew.cmu.edu Mon Jun 15 14:33:00 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Jun 2020 14:33:00 -0400
Subject: Python error on Auton server
In-Reply-To:
References: <20200615174837.3AV16%predragp@andrew.cmu.edu>
Message-ID: <20200615183300.yFqVY%predragp@andrew.cmu.edu>

Saswati Ray wrote:

> I cannot create a conda env on lov3. Which are the machines that you
> say are running RH 8.2 and are not affected?
>

Documentation?

https://www.autonlab.org/intranet

lov[7-9] gpu[15-21]

That is actually ten machines, not eight as I wrote in my original email.

> Thanks,
> Saswati

From awd at cs.cmu.edu Mon Jun 15 16:55:32 2020
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 15 Jun 2020 16:55:32 -0400
Subject: Let's congratulate our newest Doctors (and prepare for more to come soon!)
In-Reply-To: <0681ce3d-d48b-b1e4-a1e9-cc3bf7dbfb28@cmu.edu>
References: <0681ce3d-d48b-b1e4-a1e9-cc3bf7dbfb28@cmu.edu>
Message-ID:

Dear Autonians,

We have reached what appears to be a new high in the production rate of
doctors coming out of our Auton Lab production lines.

So, recently, we have had these outstanding individuals graduating with
their Ph.D.s:

Dr. Karen Chen - on her way to become a professor at the University of Maryland
Dr. Maria De Arteaga - on her way to become a professor at the University of Texas
Dr. Chao Liu - on his way to become a researcher at Nvidia

Please join me in congratulating Karen, Maria and Chao on their
remarkable accomplishments (and please strictly remember to address each
of them by "Doctor" from now on).

But, behold, we are not done yet! Please mark your calendars as one more
exciting thesis defense is coming. This one is by Yichong Xu (he will be
on his way to Microsoft Research afterwards).
See below for details.

Cheers,
Artur

*Thesis Defense*

Date: June 29, 2020
Time: 9:00am (EDT)
PhD Candidate: Yichong Xu
Virtual Presentation Link: https://cmu.zoom.us/j/99909151454

*Title:* Learning and Decision Making from Diverse Forms of Information

*Abstract:*
Classical machine learning posits that data are independently and identically distributed, in a single format usually the same as test data. In modern applications, however, additional information in other formats might be available freely or at a lower cost. For example, in data crowdsourcing we can collect preferences over the data points at a lower cost, instead of directly asking for the label of a single data point. In natural language understanding problems, we might have a limited amount of data in the target domain, but can use a large amount of general domain data for free.

The main topic of this thesis is to study how to efficiently incorporate these diverse forms of information into the learning and decision making process. We study two representative paradigms in this thesis. Firstly, we study learning and decision making problems with direct labels and comparisons. Our algorithms can efficiently combine comparisons with direct labels so that the total learning cost can be greatly reduced. Secondly, we study multi-task learning problems from multiple domain data, and design algorithms to transfer the data from a general, abundant domain to the target domain. We show theoretical guarantees of our algorithms as well as their statistical minimaxity through information-theoretic limits. On the practical side, we demonstrate promising experimental results on price estimation and natural language understanding tasks.

*Thesis Committee:*
Artur Dubrawski, Co-Chair
Aarti Singh, Co-Chair
Sivaraman Balakrishnan
John Langford (Microsoft Research)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From predragp at andrew.cmu.edu Mon Jun 15 17:54:18 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Jun 2020 17:54:18 -0400
Subject: Python error on Auton server
In-Reply-To:
References: <20200615174837.3AV16%predragp@andrew.cmu.edu> <20200615183300.yFqVY%predragp@andrew.cmu.edu>
Message-ID: <20200615215418.XtSrV%predragp@andrew.cmu.edu>

Saswati Ray wrote:

Hi Saswati and the rest of Autonians,

Could you please test

/opt/miniconda3-new

and

/opt/miniconda3-py38

on LOV3. Note that only LOV3 has the two new instances of conda.

miniconda3-new is a fresh installation of Python 3.7.7, which seems to
fix the linking problem for me.

miniconda3-py38 is Python 3.8.3, which we probably should start using in
the near future.

Considering the pace of Python development and its importance for our
lab, I propose to have two miniconda installations per server (the last
two consecutive releases):

/opt/miniconda-py37

/opt/miniconda-py38

I know that many people have their own installations of Conda, which is
not really optimal. If I get positive feedback I will go ahead and
refresh the miniconda installations on all nodes running RedHat 7.8.

Cheers,
Predrag

> Thanks Predrag!

From predragp at andrew.cmu.edu Mon Jun 15 18:32:16 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Jun 2020 18:32:16 -0400
Subject: /opt/miniconda-py38
Message-ID: <20200615223216.AAuaP%predragp@andrew.cmu.edu>

Hi Autonians,

A working copy of /opt/miniconda-py38 is now available across all nodes
running RedHat 7.8.

Best,
Predrag

From sray at cs.cmu.edu Mon Jun 15 22:17:47 2020
From: sray at cs.cmu.edu (Saswati Ray)
Date: Mon, 15 Jun 2020 22:17:47 -0400
Subject: Python error on Auton server
In-Reply-To: <20200615215418.XtSrV%predragp@andrew.cmu.edu>
References: <20200615174837.3AV16%predragp@andrew.cmu.edu> <20200615183300.yFqVY%predragp@andrew.cmu.edu> <20200615215418.XtSrV%predragp@andrew.cmu.edu>
Message-ID:

Predrag,

I tested /opt/miniconda3-new on lov3. It works fine now.

Thanks,
Saswati

--
Saswati Ray
Senior Research Analyst
Carnegie Mellon University - Auton Lab
Newell-Simon Hall Room 3115, Pittsburgh PA 15213
Phone: 412-268-1238
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From predragp at andrew.cmu.edu Mon Jun 15 22:53:07 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Jun 2020 22:53:07 -0400
Subject: miniconda-py37 and miniconda-py38
Message-ID: <20200616025307.oIMX5%predragp@andrew.cmu.edu>

Dear Autonians,

All computing nodes running RHEL 7.8 now have two fully working,
up-to-date versions of Python (3.7.7 and 3.8.3) installed using
miniconda. The new locations are

/opt/miniconda-py37

and

/opt/miniconda-py38

Please adjust your scripts and package locations accordingly before
reporting problems.

Best,
Predrag

From predragp at andrew.cmu.edu Tue Jun 16 22:41:40 2020
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 16 Jun 2020 22:41:40 -0400
Subject: HDF5 on AutonLab resources
In-Reply-To:
References:
Message-ID: <20200617024140.VGxke%predragp@andrew.cmu.edu>

George Stoica wrote:

> Hi Predrag,
>
> I hope everything is well.
>
> I was wondering whether there might be an existing HDF5 installation path
> on the autonlab resources? I have been trying to install caffe, and the

HDF5 had some nasty dependency issues. Package management is an NP-hard
problem, so that is expected from time to time on large projects like
RHEL. I removed it from our computing nodes as nobody needed it. I would
be happy to install HDF5 if you tell me exactly which packages you need.
It is actually a very good and very useful data format. I like it a lot.
I would install it on one of the machines running RHEL 8.2. Eventually
all computing nodes will run the 8.x branch, so there is no point looking
back.

> respective Makefile cannot find any HDF5 distribution. I have tried
> installing the latest distribution myself but it seems like there may be a
> roadblock with sudo privileges (at least based on the sources I have
> found).
>

We adhere to "The Principle of Least Privilege" around the Auton Lab,
as documented on our Wiki.
Neither you nor, for that matter, most people with lab accounts can
execute any commands which require elevated privileges.

> Or do you know if there might be an existing caffe installation that may be
> used? I ask this because there were other dependency issues such as a too

People did use Caffe in the past on our servers.

> old version of BOOST (the server default is 1.53, whereas the Makefile
> expects at least 1.54). While I believe I've gotten past these problems, I

There are multiple versions of the Boost C++ library installed on RHEL
7.8. RHEL 8.2 comes with a quite new C++ compiler and Boost libraries.
Not many people around the lab use C++ these days.

> am not very familiar with solving these lower level dependency issues and
> am concerned about potentially inadvertently screwing something up on the
> server.

That is why you are not a member of any sudoers group. The second rule in
the lab is that if you break things you get to keep all the pieces and
you have to put them back together. Can you guess who broke the most
things around here :-) Hint: He is just typing the reply to your email.

Cheers,
Predrag

>
> Thanks very much,
> George

From awertz at cmu.edu Tue Jun 16 23:10:10 2020
From: awertz at cmu.edu (Anthony Wertz)
Date: Tue, 16 Jun 2020 23:10:10 -0400
Subject: HDF5 on AutonLab resources
In-Reply-To: <20200617024140.VGxke%predragp@andrew.cmu.edu>
References: <20200617024140.VGxke%predragp@andrew.cmu.edu>
Message-ID: <18A08474-9654-4A88-A2B9-6AF5E40F0817@cmu.edu>

George,

If you want to build specific dependencies yourself, you'll have to do a little research, but most (if not all) build systems default to a system-visible install path (i.e., requires root) but allow specifying a custom (e.g., local) install path instead. For UNIX makefiles (one option for HDF5) it requires the --prefix directive. For CMake (an alternative, preferred (by me) option for HDF5) you use the CMAKE_INSTALL_PREFIX directive.
For boost you have to double check, it uses it?s own build tool, but I think they directive is, again, ?prefix. A common procedure is to install packages locally to ~/.local and add ~/.local/bin to your path and ~/.local/lib[64] to your LD_LIBRARY_PATH and LIBRARY_PATH environment variables. You may also want to look at PKG_CONFIG_PATH and MAN_PATH. I routinely build specific versions of various packages (including both HDF5 and boost), but it can be cumbersome to get started if you?re not already pretty comfortable using C/C++ build systems and working out build dependencies. And note that building those packages may imply building some number of additional packages in order to get the required capabilities. However, it is also worth noting that miniconda provides a lot of these dependencies already, including a newer version of HDF5. Predrag sent out an email earlier today or yesterday about where to find miniconda on the servers, I would google how to setup a conda environment and then just use `conda install hdf5`, the install caffe (noting that you?ll need to activate that conda environment to install and to use caffe). - Anthony > On 16Jun2020, at 22:41, Predrag Punosevac wrote: > > George Stoica wrote: > >> Hi Predrag, >> >> I hope everything is well. >> >> I was wondering whether there might be an existing HDF5 installation path >> on the autonlab resources? I have been trying to install caffe, and the > > HDF5 had some nasty dependency issues. Package management is NP hard > problem so that is expected from time to time on the large projects like > RHEL. I removed it from our computing nodes as nobody needed it. I would > be happy to install HDF5 if you tell me exactly what packages. It is > actually very good and very useful data format. I like it a lot. > I would install it on one of machines running RHEL 8.2. Eventually all > computing nodes will run 8.xxx branch so there is no point looking back. 
> > >> respective Makefile cannot find any HDF5 distribution. I have tried >> installing the latest distribution myself but it seems like there may be a >> roadblock with sudo privileges (at least based on the sources I have >> found). >> > > We use adhere to "The Principle of Least Privilege" around the Auton Lab > as documented on our Wiki. No you nor for that matter most people with > the lab accounts can't execute any commands which require elevated > privileges. > > >> Or do you know if there might be an existing caffe installation that may be >> used? I ask this because there were other dependency issues such as a too > > People did use Caffe in the past on our servers. > > >> old version of BOOST (the server default is 1.53,whereas the MakeFile >> expects at least 1.54). While I believe I've gotten past these problems, I > > > There are multiple version of Boost C++ library installed on RHEL 7.8. > RHEL 8.2 comes with quite new C++ compiler and Boost libraries. Not many > people around the lab use C++ these days. > > >> am not very familiar with solving these lower level dependency issues and >> am concerned about potentially inadvertently screwing something up on the >> server. > > That is why you are not member of any sudoers group. The second rule in > the lab is if you break things you get to keep all the pieces and you > have to put them back together. Can you guess who broke the most things > around here:-) Hint: He is just typing the replay to your email. > > > Cheers, > Predrag > > >> >> Thanks very much, >> George -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: Message signed with OpenPGP URL: From gis at andrew.cmu.edu Wed Jun 17 01:20:15 2020 From: gis at andrew.cmu.edu (George Stoica) Date: Tue, 16 Jun 2020 22:20:15 -0700 Subject: HDF5 on AutonLab resources In-Reply-To: <18A08474-9654-4A88-A2B9-6AF5E40F0817@cmu.edu> References: <20200617024140.VGxke%predragp@andrew.cmu.edu> <18A08474-9654-4A88-A2B9-6AF5E40F0817@cmu.edu> Message-ID: Hi Anthony and Predrag, Thanks very much for the advice and help. I had used the -prefix flag to get the BOOST install working, but had some additional trouble installing HDF5 using the same type of install. I didn't know about these additional flags and path variables so this is very helpful thank you! And additionally the hdf5 in conda. I'll take another look into this. Regarding an resource-wide installation of HDF5, it would be much appreciated. I don't see any particular version restriction for me, so whatever would work best for everyone. Regarding the sudo privileges: I completely agree about (me) not having these privileges and I apologize if my email came across in that way. I only mentioned it because I wasn't sure how best to move forward. But both the comments in these emails have been very helpful. Thanks very much, George On Tue, Jun 16, 2020 at 8:10 PM Anthony Wertz wrote: > George, > > If you want to build specific dependencies yourself, you?ll have to do a > little research, but most (if not all) build systems default to a > system-visible install path (i.e., requires root) but allow specifying a > custom (e.g., local) install path instead. For UNIX makefiles (one option > for HDF5) it requires the ?prefix directive. For CMAKE (an alternative, > preferred (by me) option for HDF5) you use the CMAKE_INSTALL_PREFIX > directive. For boost you have to double check, it uses it?s own build tool, > but I think they directive is, again, ?prefix. 
A common procedure is to > install packages locally to ~/.local and add ~/.local/bin to your PATH and > ~/.local/lib[64] to your LD_LIBRARY_PATH and LIBRARY_PATH environment > variables. You may also want to look at PKG_CONFIG_PATH and MANPATH. > > I routinely build specific versions of various packages (including both > HDF5 and boost), but it can be cumbersome to get started if you're not > already pretty comfortable using C/C++ build systems and working out build > dependencies. And note that building those packages may imply building some > number of additional packages in order to get the required capabilities. > > *However,* it is also worth noting that miniconda provides a lot of these > dependencies already, including a newer version of HDF5. Predrag sent out > an email earlier today or yesterday about where to find miniconda on the > servers, I would google how to set up a conda environment and then just use > `conda install hdf5`, then install caffe (noting that you'll need to > activate that conda environment to install and to use caffe). > > > - *Anthony* > > On 16Jun2020, at 22:41, Predrag Punosevac wrote: > > George Stoica wrote: > > Hi Predrag, > > I hope everything is well. > > I was wondering whether there might be an existing HDF5 installation path > on the autonlab resources? I have been trying to install caffe, and the > > > HDF5 had some nasty dependency issues. Package management is an NP-hard > problem so that is expected from time to time on large projects like > RHEL. I removed it from our computing nodes as nobody needed it. I would > be happy to install HDF5 if you tell me exactly which packages you need. It is > actually a very good and very useful data format. I like it a lot. > I would install it on one of the machines running RHEL 8.2. Eventually all > computing nodes will run the 8.x branch so there is no point looking back. > > > respective Makefile cannot find any HDF5 distribution.
I have tried > installing the latest distribution myself but it seems like there may be a > roadblock with sudo privileges (at least based on the sources I have > found). > > > We adhere to "The Principle of Least Privilege" around the Auton Lab > as documented on our Wiki. Neither you nor, for that matter, most people with > lab accounts can execute any commands which require elevated > privileges. > > > Or do you know if there might be an existing caffe installation that may be > used? I ask this because there were other dependency issues such as a too > > > People did use Caffe in the past on our servers. > > > old version of BOOST (the server default is 1.53, whereas the MakeFile > expects at least 1.54). While I believe I've gotten past these problems, I > > > > There are multiple versions of the Boost C++ library installed on RHEL 7.8. > RHEL 8.2 comes with a quite new C++ compiler and Boost libraries. Not many > people around the lab use C++ these days. > > > am not very familiar with solving these lower level dependency issues and > am concerned about potentially inadvertently screwing something up on the > server. > > > That is why you are not a member of any sudoers group. The second rule in > the lab is if you break things you get to keep all the pieces and you > have to put them back together. Can you guess who broke the most things > around here:-) Hint: He is just typing the reply to your email. > > > Cheers, > Predrag > > > > Thanks very much, > George From predragp at andrew.cmu.edu Wed Jun 17 14:07:44 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 17 Jun 2020 14:07:44 -0400 Subject: Osmesa for mujoco installation In-Reply-To: <14E3C13F-1627-44C6-B7A9-83887963DCA7@andrew.cmu.edu> References: <14E3C13F-1627-44C6-B7A9-83887963DCA7@andrew.cmu.edu> Message-ID: Hi Viraj, osmesa.h is a header file for the software you are trying to compile.
You don't have write privileges in system directories but you have access to the UNIX filters find and grep, and also to the shell. If that is not enough, RHEL is shipped with Python (I prefer Perl for sys admin). Thus you can find two scripting languages on each of our machines. This is a very easy question to answer for a CMU CS student. The source code you want to compile can live in any location, as can the resulting object code and binaries. Even missing dependencies, if any, should be easy to add using /opt/conda3 on RHEL 8.2, which GPU20 is running. Please re-read Anthony's email from yesterday. He and I might have different tastes when it comes to certain tools/software but he really knows what he is talking about. He also took the time to write that long email to educate all of us. Cheers, Predrag On Wed, Jun 17, 2020 at 1:37 PM Viraj Mehta wrote: > > Hi Predrag, > > Hope you're well. We are working on trying to get a student-licensed version of mujoco installed on GPU20. I was able to get a license for the machine and to download the appropriate software from their website and the mujoco-py repository. However, when I actually try to run the Mujoco gym environments, I am running into gcc compilation errors around a missing copy of osmesa.h. Do you know whether we have that installed or if it would be possible to do so? > > Thanks, > Viraj From virajm at andrew.cmu.edu Wed Jun 17 14:24:56 2020 From: virajm at andrew.cmu.edu (Viraj Mehta) Date: Wed, 17 Jun 2020 12:24:56 -0600 Subject: Osmesa for mujoco installation In-Reply-To: References: <14E3C13F-1627-44C6-B7A9-83887963DCA7@andrew.cmu.edu> Message-ID: <753BBBBF-3995-4200-AA1F-D75B58A4FCB4@andrew.cmu.edu> Hi Predrag, Thanks for the response, and sorry for not being more clear. I `find` it in /usr/include/GL on GPU3 but not on GPU20 in /usr/local, /usr/include, and also just /usr/. So that leads me to believe that it's not around.
Definitely could spend more time looking but was asking you to spare crawling the entire machine, since /usr seems like where it should be. I also read Anthony's email yesterday, though I think my situation is different in that the folks at OpenAI have obfuscated the compilation to the point that I only can see what is going on by trying to instantiate a gym environment or really digging into their library code. I'm happy to keep looking around using the unix tools or Python or whatever but it does seem missing to me. I'd appreciate some guidance. Thanks, Viraj > On Jun 17, 2020, at 12:07 PM, Predrag Punosevac wrote: > > Hi Viraj, > > osmesa.h is a header file for the software you are trying to compile. > You don't have write privileges in system directories but you have > access to the UNIX filters find and grep, and also to the shell. If that is not enough, RHEL > is shipped with Python (I prefer Perl for sys admin). Thus you can > find two scripting languages on each of our machines. This is a very > easy question to answer for a CMU CS student. > > The source code you want to compile can live in any location, as > can the resulting object code and binaries. Even missing dependencies, if > any, should be easy to add using /opt/conda3 on RHEL 8.2, which GPU20 is > running. > > Please re-read Anthony's email from yesterday. He and I might have > different tastes when it comes to certain tools/software but he really > knows what he is talking about. He also took the time to write that > long email to educate all of us. > > Cheers, > Predrag > > On Wed, Jun 17, 2020 at 1:37 PM Viraj Mehta wrote: >> >> Hi Predrag, >> >> Hope you're well. We are working on trying to get a student-licensed version of mujoco installed on GPU20. I was able to get a license for the machine and to download the appropriate software from their website and the mujoco-py repository.
However, when I actually try to run the Mujoco gym environments, I am running into gcc compilation errors around a missing copy of osmesa.h. Do you know whether we have that installed or if it would be possible to do so? >> >> Thanks, >> Viraj From predragp at andrew.cmu.edu Wed Jun 17 14:40:33 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 17 Jun 2020 14:40:33 -0400 Subject: Osmesa for mujoco installation In-Reply-To: <753BBBBF-3995-4200-AA1F-D75B58A4FCB4@andrew.cmu.edu> References: <14E3C13F-1627-44C6-B7A9-83887963DCA7@andrew.cmu.edu> <753BBBBF-3995-4200-AA1F-D75B58A4FCB4@andrew.cmu.edu> Message-ID: You don't have write privileges in /usr. You only have write privileges in your home directory and your scratch directory. The only system directory you can write in is /tmp. The UNIX filter find has a number of useful switches and OK support for regular expressions (not as good as Perl). Think before writing that one-line command. You only need to search folders where you have write permission. If it is not there, it is unlikely to be on the machine. You might need to talk to the OpenAI guys. Follow their written instructions closely. If the instructions say it is tested on Windows and Ubuntu 18.04 only, that means it runs on Windows and Ubuntu 18.04 only. Cheers, Predrag On Wed, Jun 17, 2020 at 2:25 PM Viraj Mehta wrote: > > Hi Predrag, > > Thanks for the response, and sorry for not being more clear. I `find` it in /usr/include/GL on GPU3 but not on GPU20 in /usr/local, /usr/include, and also just /usr/. So that leads me to believe that it's not around. Definitely could spend more time looking but was asking you to spare crawling the entire machine, since /usr seems like where it should be.
> > I also read Anthony's email yesterday, though I think my situation is different in that the folks at OpenAI have obfuscated the compilation to the point that I only can see what is going on by trying to instantiate a gym environment or really digging into their library code. > > I'm happy to keep looking around using the unix tools or Python or whatever but it does seem missing to me. I'd appreciate some guidance. > > Thanks, > Viraj > > > On Jun 17, 2020, at 12:07 PM, Predrag Punosevac wrote: > > > > Hi Viraj, > > > > osmesa.h is a header file for the software you are trying to compile. > > You don't have write privileges in system directories but you have > > access to the UNIX filters find and grep, and also to the shell. If that is not enough, RHEL > > is shipped with Python (I prefer Perl for sys admin). Thus you can > > find two scripting languages on each of our machines. This is a very > > easy question to answer for a CMU CS student. > > > > The source code you want to compile can live in any location, as > > can the resulting object code and binaries. Even missing dependencies, if > > any, should be easy to add using /opt/conda3 on RHEL 8.2, which GPU20 is > > running. > > > > Please re-read Anthony's email from yesterday. He and I might have > > different tastes when it comes to certain tools/software but he really > > knows what he is talking about. He also took the time to write that > > long email to educate all of us. > > > > Cheers, > > Predrag > > > > On Wed, Jun 17, 2020 at 1:37 PM Viraj Mehta wrote: > >> > >> Hi Predrag, > >> > >> Hope you're well. We are working on trying to get a student-licensed version of mujoco installed on GPU20. I was able to get a license for the machine and to download the appropriate software from their website and the mujoco-py repository. However, when I actually try to run the Mujoco gym environments, I am running into gcc compilation errors around a missing copy of osmesa.h.
Do you know whether we have that installed or if it would be possible to do so? > >> > >> Thanks, > >> Viraj From predragp at andrew.cmu.edu Wed Jun 17 14:50:08 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 17 Jun 2020 14:50:08 -0400 Subject: The license of Matlab 2019b on gpu7 has expired In-Reply-To: References: Message-ID: Hi Jarod, I am CC-ing users as this might be of concern to the wider community. No, the MATLAB license 2019b has not expired. The license manager might have died. It has to be restarted as a regular (non-root) user. Typically I log in as root and then run `su - auton-local`, `cd /home/MATLAB/R2019/etc/`, and `./lmstart`. Older, pre-2019b licenses have expired. I typically upgrade MATLAB in the lab only once a year due to the fact that CMU licenses run from the 1st of October until the 1st of October of the next year. I do have access to the R2020b pre-release. I am not doing testing for Mathworks unless they add 100k to my current annual salary. The license for the R2020b pre-release will expire in three months. I could install R2020a but that is a lot of work if it is not really needed for something which can be used only until the end of September. Predrag On Wed, Jun 17, 2020 at 2:32 PM Donghan Wang wrote: > > Predrag, > > Could you update it or install the latest Matlab available? > > Thanks, > Jarod From donghanw at cs.cmu.edu Wed Jun 17 15:57:31 2020 From: donghanw at cs.cmu.edu (Donghan Wang) Date: Wed, 17 Jun 2020 15:57:31 -0400 Subject: The license of Matlab 2019b on gpu7 has expired In-Reply-To: References: Message-ID: Predrag, Thanks for the information. I restarted the Matlab license manager on GPU7. It works now. The lmstart command writes to /var/tmp/lm_TMW.log. It would fail due to the ownership issue (shown below). So it seems only the auton-local account can restart it on computing nodes. [dwang at gpu7 etc]$ ./lmstart Error: Cannot write in logfile ($LM_LOGFILE). LM_LOGFILE = /var/tmp/lm_TMW.log -rw-rw-r--.
1 auton-local auton-local You are: uid=2043(dwang) gid=2043(dwang) Thanks, Jarod On Wed, Jun 17, 2020 at 2:50 PM Predrag Punosevac wrote: > Hi Jarod, > > I am CC-ing users as this might be of concern to the wider community. > > No, the MATLAB license 2019b has not expired. The license manager > might have died. It has to be restarted as a regular (non-root) user. > Typically > I log in as root and then do > > su - auton-local > cd /home/MATLAB/R2019/etc/ > ./lmstart > > Older, pre-2019b licenses have expired. > > I typically upgrade MATLAB in the lab only once a year due to the fact > that CMU licenses run from the 1st of October until the 1st of October > of the next year. I do have access to the R2020b pre-release. I am not doing > testing for Mathworks unless they add 100k to my current annual > salary. The license for the R2020b pre-release will expire in three > months. I could install R2020a but that is a lot of work if it is not > really needed for something which can be used only until the end of > September. > > > > Predrag > > > > On Wed, Jun 17, 2020 at 2:32 PM Donghan Wang wrote: > > > > Predrag, > > > > Could you update it or install the latest Matlab available? > > > > Thanks, > > Jarod From predragp at andrew.cmu.edu Wed Jun 17 16:00:58 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 17 Jun 2020 16:00:58 -0400 Subject: The license of Matlab 2019b on gpu7 has expired In-Reply-To: References: Message-ID: Just clear the MATLAB kludge out of the /tmp folder. You have access to those files. IIRC you also have sysadmin privileges on that machine. There are a few leftover files from when the license manager crashed. Predrag On Wed, Jun 17, 2020 at 3:57 PM Donghan Wang wrote: > > Predrag, > > Thanks for the information. I restarted the Matlab license manager on GPU7. It works now. > > The lmstart command writes to /var/tmp/lm_TMW.log.
It would fail due to the ownership issue (shown below). So it seems only the auton-local account can restart it on computing nodes. > > [dwang at gpu7 etc]$ ./lmstart > Error: Cannot write in logfile ($LM_LOGFILE). > LM_LOGFILE = /var/tmp/lm_TMW.log > -rw-rw-r--. 1 auton-local auton-local > You are: uid=2043(dwang) gid=2043(dwang) > > Thanks, > Jarod > > On Wed, Jun 17, 2020 at 2:50 PM Predrag Punosevac wrote: >> >> Hi Jarod, >> >> I am CC-ing users as this might be of concern to the wider community. >> >> No, the MATLAB license 2019b has not expired. The license manager >> might have died. It has to be restarted as a regular (non-root) user. >> Typically >> I log in as root and then do >> >> su - auton-local >> cd /home/MATLAB/R2019/etc/ >> ./lmstart >> >> Older, pre-2019b licenses have expired. >> >> I typically upgrade MATLAB in the lab only once a year due to the fact >> that CMU licenses run from the 1st of October until the 1st of October >> of the next year. I do have access to the R2020b pre-release. I am not doing >> testing for Mathworks unless they add 100k to my current annual >> salary. The license for the R2020b pre-release will expire in three >> months. I could install R2020a but that is a lot of work if it is not >> really needed for something which can be used only until the end of >> September. >> >> >> >> Predrag >> >> >> >> On Wed, Jun 17, 2020 at 2:32 PM Donghan Wang wrote: >> > >> > Predrag, >> > >> > Could you update it or install the latest Matlab available? >> > >> > Thanks, >> > Jarod From ngisolfi at cs.cmu.edu Thu Jun 18 10:02:28 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 18 Jun 2020 10:02:28 -0400 Subject: [Lunch] Today @noon over Zoom Message-ID: <49417417-D068-4CC8-ABBA-EF5AE6817427@cs.cmu.edu> https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick P.S. I may be a few minutes late today, so if you're the first to connect, hang tight and others will join!
From predragp at andrew.cmu.edu Fri Jun 19 18:15:39 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 19 Jun 2020 18:15:39 -0400 Subject: server issues, Lov 9 etc Message-ID: Dear Autonians, I am sending this email with the explanation instead of responding to the half-dozen emails I received around 4:00 PM. There is nothing wrong with our infrastructure. I managed to upgrade eleven critical infrastructure machines from OpenBSD 6.6 to OpenBSD 6.7 between 11:00 AM and 4:00 PM. The problems you saw around 4:00 PM were caused by me upgrading the main firewall and temporarily breaking the routing tables. I just got back from CMU. I still need to restart OpenVPN clients on your desktops but I am more or less done. We should be OK now for the next 6 months. By the way, CMU was like the eighth circle of Dante's Inferno. AC is switched off with the exception of the server room. It was at least 110 degrees in my office. Cheers, Predrag From awd at cs.cmu.edu Fri Jun 19 18:23:54 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Fri, 19 Jun 2020 18:23:54 -0400 Subject: server issues, Lov 9 etc In-Reply-To: References: Message-ID: Thanks Predrag! Artur PS Let's hope the high temps kill covid and other bugs :) On Fri, Jun 19, 2020 at 6:18 PM Predrag Punosevac wrote: > Dear Autonians, > > I am sending this email with the explanation instead of responding to > the half-dozen emails I received around 4:00 PM. > > There is nothing wrong with our infrastructure. I managed to upgrade > eleven critical infrastructure machines from OpenBSD 6.6 to OpenBSD 6.7 > between 11:00 AM and 4:00 PM. The problems you saw around > 4:00 PM were caused by me upgrading the main firewall and temporarily breaking > the routing tables. I just got back from CMU. I still need to > restart OpenVPN clients on your desktops > but I am more or less done. > > We should be OK now for the next 6 months.
By the way, CMU was like > the eighth circle of Dante's Inferno. AC is switched off with the exception > of the server room. It was at least 110 degrees in my office. > > > Cheers, > Predrag From predragp at andrew.cmu.edu Mon Jun 22 13:13:26 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 22 Jun 2020 13:13:26 -0400 Subject: CPU servers lo3, lo4, low1, ari, Foxconn down In-Reply-To: <2B1F9AC4-14E9-4C1C-9AF5-F930F5D8A134@andrew.cmu.edu> References: <2B1F9AC4-14E9-4C1C-9AF5-F930F5D8A134@andrew.cmu.edu> Message-ID: On Mon, Jun 22, 2020 at 12:03 PM Arundhati Banerjee wrote: > > Hi Predrag, > > I just wanted to bring to your attention that some of the CPU servers are not > accessible at the moment. I actually had some code running on Foxconn which I > am unable to access now. I would be obliged if you could kindly look into it. > I saw it on Monit when I logged in this morning. This appears to be a major problem with the electric supply. The one thing common to all five of these servers is that they are connected to the same dumb PDU (power distribution unit), which in turn is connected to the same old 208V UPS. It was planned before Cov19 to remove all computing nodes from UPSs. I am trying to talk to David, who is a server room attendant and the person physically located in Wean 3611. However, I would not hold my breath with this one. If the circuits are messed up, that would require Ed Walter, me, and perhaps external contractors in the server room. That can take a long time. A nuclear option is moving those 5 servers to GHC. I am not sure that I would have enough electricity there. In either case we have a major problem on our hands. In my almost 8 years with the lab I have not seen such a catastrophic failure of power circuits. Best, Predrag > Thank you.
> Best regards, > Arundhati From predragp at andrew.cmu.edu Mon Jun 22 15:42:12 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 22 Jun 2020 15:42:12 -0400 Subject: CPU servers lo3, lo4, low1, ari, Foxconn down In-Reply-To: References: <2B1F9AC4-14E9-4C1C-9AF5-F930F5D8A134@andrew.cmu.edu> Message-ID: I have really good news. It appears that a fuse on one of the UPSs had blown. The server room attendant was able to reset it following my directions over the phone. I was able to use IPMI to power up 4 out of 5 machines. LOW1 appears to be down due to faulty hardware. In my experience working with very old machines like LOW1, the most likely culprit is a dead RAM module. I do have some old yet good RAM modules to fix LOW1 but that will have to wait. Going forward: before Cov19 struck, I was in the process of moving all CPU nodes off the UPSs, which are no longer capable of backing power-hungry computing nodes. There is a design consensus among people who know the server room electric grid that going forward all computing nodes are to be considered stateless and designed to crash before they pull down with them mission-critical gear like file servers, firewalls, and web servers. All our GPU nodes are already in compliance but not the CPU nodes. Cheers, Predrag On Mon, Jun 22, 2020 at 1:13 PM Predrag Punosevac wrote: > > On Mon, Jun 22, 2020 at 12:03 PM Arundhati Banerjee > wrote: > > > > Hi Predrag, > > > > I just wanted to bring to your attention that some of the CPU servers are not > > accessible at the moment. I actually had some code running on Foxconn which I > am unable to access now. I would be obliged if you could kindly look into it. > > > > I saw it on Monit when I logged in this morning. This appears to be a > major problem with the electric supply. The one thing common to all > five of these servers is that they are connected to the same dumb PDU > (power distribution unit), which in turn is connected to the same old > 208V UPS.
It was planned before Cov19 to remove all computing nodes > from UPSs. > > I am trying to talk to David, who is a server room attendant and the > person physically located in Wean 3611. However, I would not hold my > breath with this one. If the circuits are messed up, that would require > Ed Walter, me, and perhaps external contractors in the server room. > That can take a long time. > A nuclear option is moving those 5 servers to GHC. I am not sure that > I would have enough electricity there. In either case we have a major > problem on our hands. > > In my almost 8 years with the lab I have not seen such a catastrophic > failure of power circuits. > > Best, > Predrag > > > > > > > Thank you. > > Best regards, > > Arundhati From predragp at andrew.cmu.edu Wed Jun 24 15:12:45 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 24 Jun 2020 15:12:45 -0400 Subject: Size limit of scratch directory In-Reply-To: References: Message-ID: Hi Sebastian, I am CC-ing users as your question and my answer will be of concern to others. The sizes of scratch directories are only limited by the total capacity of the HDD hosting the directory. Some of those HDDs are small (256GB) as they are NVMe drives. The others are not that small (2TB). Most computing nodes have free drive bays which will allow for the expansion of the scratch capacity. I do have older HDDs in our storage room which could be used for that. However, except in the extreme cases of small 256GB NVMe drives, I was reluctant to consider adding extra scratch space. More often than not the scratch space is filled because people don't clean up after themselves. A system in which I keep adding resources because people are too lazy (for lack of a better word) to pick up after themselves doesn't scale well. Thus in the past we had to clean scratch on a few machines. IIRC this has happened 3 times since I came to the lab 8 years ago but it did happen.
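A minimal sketch of how one might check scratch usage before cleaning up. It assumes scratch lives under /home/scratch (that path appears in a `df -h /home/scratch` message elsewhere in this archive); the per-user subdirectory and the helper names scratch_free and scratch_top are hypothetical, so adjust SCRATCH_DIR to your own layout.

```shell
#!/bin/sh
# Sketch only: report how full a scratch filesystem is and what is using it.
# ASSUMPTION: scratch is /home/scratch/<user>; override SCRATCH_DIR if it is not.
SCRATCH_DIR="${SCRATCH_DIR:-/home/scratch/${USER:-$(id -un)}}"

# scratch_free DIR: free space on the filesystem hosting DIR (hypothetical helper).
scratch_free() {
    df -h "$1"
}

# scratch_top DIR [N]: the N largest non-hidden entries directly under DIR,
# sizes in KB, biggest first (hypothetical helper).
scratch_top() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Only report when the assumed directory actually exists on this machine.
if [ -d "$SCRATCH_DIR" ]; then
    scratch_free "$SCRATCH_DIR"
    scratch_top "$SCRATCH_DIR" 10
fi
```

Running it on a node whose scratch is full should show which of your own directories are worth deleting or moving first; it never removes anything by itself.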
Cheers, Predrag On Wed, Jun 24, 2020 at 2:53 PM Sebastian Caldas Rivera wrote: > > Hello Predrag, > > I hope everything is well with you and your family. > > I had a question about the maximum storage size of the scratch directory. I have been running into issues with my experiments on lov5 because of "No space left on device" errors. My scratch directory on that machine should be ~4GB. I tried copying some data into my scratch directory in another machine (lov3) to run experiments there but got the same error even while just copying the data. > > Are there any limits on the scratch directories I'm unaware of? Am I doing something wrong when it comes to handling my data? > > Thank you, > Sebastian From ngisolfi at cs.cmu.edu Thu Jun 25 09:43:14 2020 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 25 Jun 2020 09:43:14 -0400 Subject: [Lunch] Today @noon over Zoom Message-ID: <49C349BF-C180-4348-B8D4-87B5AE8A282B@cs.cmu.edu> https://cmu.zoom.us/j/492870487 We hope to see you there! - Nick From awd at cs.cmu.edu Fri Jun 26 14:12:46 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Fri, 26 Jun 2020 14:12:46 -0400 Subject: Fwd: Thesis Defense - June 29, 2020 - Yichong Xu - Learning and Decision Making from Diverse Forms of Information In-Reply-To: <0681ce3d-d48b-b1e4-a1e9-cc3bf7dbfb28@cmu.edu> References: <0681ce3d-d48b-b1e4-a1e9-cc3bf7dbfb28@cmu.edu> Message-ID: A reminder about Yichong's defense coming up this Monday at 9am. All details below, it will be a fun talk so please come and join if you can!
Cheers, Artur ---------- Forwarded message --------- From: Diane Stidle Date: Mon, Jun 15, 2020 at 1:49 PM Subject: Thesis Defense - June 29, 2020 - Yichong Xu - Learning and Decision Making from Diverse Forms of Information To: ml-seminar at cs.cmu.edu , *Thesis Defense* Date: June 29, 2020 Time: 9:00am (EDT) PhD Candidate: Yichong Xu Virtual Presentation Link: https://cmu.zoom.us/j/99909151454 *Title: *Learning and Decision Making from Diverse Forms of Information Abstract: Classical machine learning posits that data are independently and identically distributed, in a single format usually the same as test data. In modern applications however, additional information in other formats might be available freely or at a lower cost. For example, in data crowdsourcing we can collect preferences over the data points instead of directly asking the labels of a single data point at a lower cost. In natural language understanding problems, we might have limited amount of data in the target domain, but can use a large amount of general domain data for free. The main topic of this thesis is to study how to efficiently incorporate these diverse forms of information into the learning and decision making process. We study two representative paradigms in this thesis. Firstly, we study learning and decision making problems with direct labels and comparisons. Our algorithms can efficiently combine comparisons with direct labels so that the total learning cost can be greatly reduced. Secondly, we study multi-task learning problems from multiple domain data, and design algorithms to transfer the data from a general, abundant domain to the target domain. We show theoretical guarantees of our algorithms as well as their statistical minimaxity through information-theoretic limits. On the practical side, we demonstrate promising experimental results on price estimation and natural language understanding tasks. 
*Thesis Committee:* Artur Dubrawski, Co-Chair Aarti Singh, Co-Chair Sivaraman Balakrishnan John Langford (Microsoft Research) -- Diane Stidle Graduate Programs Manager Machine Learning Department Carnegie Mellon University stidle at andrew.cmu.edu 412-268-1299 From predragp at andrew.cmu.edu Sat Jun 27 14:12:32 2020 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 27 Jun 2020 14:12:32 -0400 Subject: GPU18 and GPU19 scratch space Message-ID: <20200627181232.D6KMi%predragp@andrew.cmu.edu> Dear Autonians, I hope everyone is having a fabulous weekend. Could you please try to clean up the scratch space on GPU18 and GPU19 in particular. Currently scratch on those 2 machines is 100% full, which makes it useless for everyone. I would hate to use my sledge hammer to "fix" the problem. Cheers, Predrag From hiteshar at andrew.cmu.edu Sun Jun 28 07:38:52 2020 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Sun, 28 Jun 2020 07:38:52 -0400 Subject: GPU9 and GPU 10 scratch full In-Reply-To: References: Message-ID: Hi All, GPU 9 scratch space is almost full, and has been like that for a long time, so even a little additional usage makes it unusable for everyone. I would request you to please check and delete/move your files as convenient. Details: *gpu9: /dev/mapper/sl_gpu9-home 1.8T 1.8T 4.9G 100% /home* GPU 10 is also high on usage: *gpu10: /dev/mapper/sl_lov6-home 392G 340G 53G 87% /home* Thanks, Hitesh On Thu, Apr 2, 2020 at 1:27 PM Hitesh Arora wrote: > Hi All, > > GPU 9 and 10 scratch directories are almost full. Please check and > delete/move your files from the scratch directories. > > Thanks, > Hitesh > > On Tue, Feb 25, 2020 at 12:10 PM Hitesh Arora > wrote: > >> Hi All, >> >> A gentle reminder on this! GPU 9 scratch is now completely FULL, making >> it unusable. Please check and delete/move your files from the scratch >> directory of GPU 9.
>> >> >> Thanks, >> Hitesh >> >> On Sat, Feb 1, 2020 at 12:42 PM Sarveshwaran Jayaraman < >> sarveshj at andrew.cmu.edu> wrote: >> >>> Hi All, >>> >>> >>> GPU9 scratch space is almost full -- please check your space usage and >>> clear out any unnecessary files. Thanks for your help! >>> >>> >>> sarveshj at gpu9$ df -h /home/scratch >>> Filesystem Size Used Avail Use% Mounted on >>> /dev/mapper/sl_gpu9-home 1.8T 1.8T 67G 97% /home >>> >>> >>> Sarvesh Jayaraman >>> Sr. Research Analyst, Auton Lab >>> Carnegie Mellon University >>> Mob: +1-240-893-4287 From awd at cs.cmu.edu Tue Jun 30 11:45:02 2020 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 30 Jun 2020 11:45:02 -0400 Subject: Fwd: RI PhD Thesis Proposal: Sibi Venkatesan In-Reply-To: <4790d203d5904c1b9220ec1bf4074613@cmu.edu> References: <4790d203d5904c1b9220ec1bf4074613@cmu.edu> Message-ID: Team, Tomorrow morning we will have our own Sibi Venkatesan present his thesis proposal. Come and join us on zoom to enjoy this highly interesting talk! Cheers, Artur ---------- Forwarded message --------- From: Suzanne Lyons Muth Date: Wed, Jun 24, 2020 at 8:55 PM Subject: RI PhD Thesis Proposal: Sibi Venkatesan To: ri-people at lists.andrew.cmu.edu Date: 01 July 2020 Time: 9:00 a.m. Place: *Virtual Presentation* https://cmu.zoom.us/j/93286511839?pwd=RnBsSUJ5Qk4rSzhTSnQvbzJCUE9xQT09 Type: Ph.D. Thesis Proposal Who: Sibi Venkatesan Title: Understanding, Exploiting and Improving Inter-view Relationship Abstract: Multi-view machine learning has received substantial attention in various applications over recent years.
These applications typically involve learning on data obtained from multiple sources of information, for example in multi-sensor systems such as self-driving cars and patient bed-side monitoring. Learning models for such applications can often benefit from leveraging not only the information from individual sources, but also the interactions and relationships between these sources. In this proposal, we look at multi-view learning approaches which try to model these inter-view interactions explicitly. Here, we define interactions and relationships between views in terms of the information which is shared across these views, i.e. information redundancy between views. We distinguish between global relationships, which are shared across all views, and local relationships, which are only shared between a subset of views. For example, in a multi-camera system, we can think of global relationships as defined over the part of a scene which is visible to all cameras, while local relationships between a subset of views are defined by the intersection of the fields of view of only those cameras. We consider three main aspects of modeling such inter-view relationships. First, we look at *understanding* relationships within multi-view data. We describe two methods which aim to uncover and model local relationships between views: (i) Robust Multi-view Auto-Encoder, which generalizes the idea of drop-out to views as a whole and (ii) One-vs-Rest Embedding Learning, which explicitly models the local relationships by considering each view separately. We also propose extensions to these methods, as well as alternate approaches to understanding inter-view relationships. Next, we look at *exploiting* this understanding to solve down-stream tasks and real-world problems. Here, we use our proposed models to tackle real-world problems, and demonstrate the effectiveness of explicitly modeling inter-view relationships.
We also discuss how we can extend our approaches to looking at special applications, such as dynamical systems and asynchronous multi-view data. Finally, we discuss *improving* inter-view relationships by facilitating favorable interactions between views in multi-view data. We first show how we can re-interpret individual views as data points, allowing us to apply traditional machine learning approaches to modeling inter-view relationships. We then describe Scalable Active Search as a candidate approach for view-selection. We also propose additional methods to improve inter-view relationships using our view-as-data-point interpretation, and discuss ways for their online improvement. Thesis Committee Members: Artur Dubrawski, Chair Jeff Schneider Srinivasa Narasimhan Junier Oliva, University of North Carolina, Chapel Hill A copy of the thesis document is available at: www.andrew.cmu.edu/user/sibiv/Thesis_Proposal.pdf _______________________________________________ ri-people mailing list ri-people at lists.andrew.cmu.edu https://lists.andrew.cmu.edu/mailman/listinfo/ri-people