From ngisolfi at cs.cmu.edu Thu Aug 1 08:42:31 2019
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 1 Aug 2019 08:42:31 -0400
Subject: [Lunch] Today, 12-1pm @ patio between Hamburg / Smith
Message-ID: <3A77A9B4-13CF-40D7-82AC-A3853A08D393@cs.cmu.edu>

Hi Everyone,

Lab lunch will be today from 12-1pm. The weather is supposed to be partly cloudy but warm, so let's try meeting for lunch at the outdoor tables between Hamburg and Smith halls. We might as well make the most of the nice weather and relatively empty campus while we still can!

If anyone knows the name for this space, feel free to share or let me know so I can give more precise locations in the future.

See you at lunch!

- Nick

From yushal at andrew.cmu.edu Thu Aug 1 11:26:48 2019
From: yushal at andrew.cmu.edu (Yusha Liu)
Date: Thu, 1 Aug 2019 11:26:48 -0400
Subject: No GPU drivers detected on any gpu machine?
In-Reply-To: <20190725021328.b1FI1L2xq%predragp@andrew.cmu.edu>
References: <20190725021328.b1FI1L2xq%predragp@andrew.cmu.edu>
Message-ID:

Hi all,

Could anyone give me a guide on how to install tensorflow (<2.0 beta) compatible with CUDA 10.1 on the GPU machines? I haven't succeeded at that. Thanks, and sorry for the overhead.

Yours,
Yusha

On Wed, Jul 24, 2019 at 10:16 PM Predrag Punosevac wrote:

> Predrag Punosevac wrote:
>
> I apologize for top posting. Just a quick update. As of 5 minutes ago
> machines gpu[2-10] appear to have no issues. After all the upgrades and
> reboots it appears that we don't have any dead GPU cards on them and
> that the drivers and CUDA 10.1 work as expected. I understand that this
> is of little comfort to people who need to regenerate tensorflow,
> PyTorch, and all that "deep-learning" stuff, but I have no control over
> the upstream decisions.
>
> GPU1 appears to be broken at the moment. Without attaching a console to
> the machine it is difficult for me to assess the complexity of the
> problem.
>
> Once again, sorry for the downtime.
>
> Cheers,
> Predrag
>
> > A quick update on this issue and a resolution. I took a clue from the
> > fact that GPU10 was working as expected and narrowed down the issue to
> > the CUDA 9.1 installation. It appears that upstream has purposely
> > broken CUDA 9.1 via the dkms utility, which is used to recompile
> > kernel modules to fit a specific kernel release. They probably want
> > people to move to CUDA 10.1.
> >
> > Long story short, I upgraded the NVIDIA driver and CUDA to 10.1 on the
> > GPU2 and GPU3 servers. They appear to be working flawlessly on my end
> > as tested with the nvidia-smi utility as well as MATLAB. I have
> > recreated the GPU3 scratch directory, which was 100% used for almost
> > half a year. I have also reinstalled the libcudnn library on both
> > machines but I am unable to test it.
> >
> > This is all good, but it also means that people will have to
> > regenerate their tools from scratch to match the kernel, driver, and
> > CUDA versions. If you have things on GPU10 you probably could just
> > migrate them. This is very time consuming but we have no choice.
> >
> > The major bad news is that one of the GPU servers I tried to work on,
> > GPU1 (commissioned almost five years ago), didn't survive the reboot.
> > It also uses older Tesla K80 cards. I will have to attach a screen and
> > troubleshoot this machine. That will not happen today or, for that
> > matter, this week.
> >
> > My plan is now to move on and fix machines GPU[4-9], which will take
> > the rest of the day. Note that GPU7 is designated for a special
> > project and is not generally accessible.
> >
> > Most Kind Regards,
> > Predrag Punosevac
> >
> > On Wed, Jul 24, 2019 at 1:09 PM Predrag Punosevac wrote:
> > >
> > > Thank you so much for bringing this to my attention. GPU10 is not
> > > broken but sure enough you are right about the other machines. It
> > > appears that one of the recent updates has broken the driver. I will
> > > reinstall the drivers shortly and reboot the machines. This is also
> > > a notice for everyone else that GPU1-9 will have to be rebooted.
> > >
> > > Predrag
> > >
> > > On Wed, Jul 24, 2019 at 10:52 AM Chufan Gao wrote:
> > > >
> > > > Hi Predrag,
> > > >
> > > > I discovered today that when I run nvidia-smi, I get this error:
> > > >
> > > > NVIDIA-SMI has failed because it couldn't communicate with the
> > > > NVIDIA driver. Make sure that the latest NVIDIA driver is
> > > > installed and running.
> > > >
> > > > The same happens for all of the gpu machines that I tried. I am
> > > > confused - was there an update that broke it?
> > > >
> > > > Sincerely,
> > > > Andy Gao

--
Yusha Liu, Master's Student
Machine Learning Department
Carnegie Mellon University

From sarveshj at andrew.cmu.edu Thu Aug 1 11:52:22 2019
From: sarveshj at andrew.cmu.edu (Sarveshwaran Jayaraman)
Date: Thu, 1 Aug 2019 15:52:22 +0000
Subject: No GPU drivers detected on any gpu machine?
In-Reply-To:
References: <20190725021328.b1FI1L2xq%predragp@andrew.cmu.edu>,
Message-ID:

Hi Yusha,

I was able to install tensorflow-gpu version 1.14.0 using the following commands (note: "$:" refers to the shell prompt):

$: source
$: pip install tensorflow-gpu

# sanity check in python shell
$: python
>>> import tensorflow as tf
>>> tf.__version__  # should give you the installed version

Please let me know if these commands work for you. If not, please feel free to get in touch with me. Thanks!

Sarvesh Jayaraman
Sr. Research Analyst, Auton Lab
Carnegie Mellon University
Mob: +1-240-893-4287

________________________________
From: Autonlab-users on behalf of Yusha Liu
Sent: Thursday, August 1, 2019 11:26:48 AM
To: users at autonlab.org
Subject: Re: No GPU drivers detected on any gpu machine?

Hi all,

Could anyone give me a guide on how to install tensorflow (<2.0 beta) compatible with CUDA 10.1 on the GPU machines? I haven't succeeded at that. Thanks, and sorry for the overhead.
Yours,
Yusha

From predragp at andrew.cmu.edu Thu Aug 1 12:29:18 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 1 Aug 2019 12:29:18 -0400
Subject: No GPU drivers detected on any gpu machine?
In-Reply-To:
References: <20190725021328.b1FI1L2xq%predragp@andrew.cmu.edu>
Message-ID:

Could you be a little bit more precise? Are you using pip from /opt/rh/python-36, from /opt/miniconda3/python37, or from the base?

Predrag

From predragp at andrew.cmu.edu Fri Aug 2 19:45:27 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 2 Aug 2019 19:45:27 -0400
Subject: lov3 scratch 100% full
Message-ID:

Dear Autonians,

The lov3 scratch directory is now 100% full and not usable by anybody.
Heavy-handed sysadmin measures will have to be taken after consultation with the faculty sponsors.

Cheers,
Predrag

From predragp at andrew.cmu.edu Sun Aug 4 22:24:25 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 4 Aug 2019 22:24:25 -0400
Subject: bash and lop1 gateways down
Message-ID:

Something just happened to the CMU network while I was doing some work. I can't ssh to either one of our two ssh shell gateways. I have no clue what is going on.

Predrag

From predragp at andrew.cmu.edu Sun Aug 4 22:30:44 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 4 Aug 2019 22:30:44 -0400
Subject: bash and lop1 gateways down
In-Reply-To:
References:
Message-ID:

NREC seems to be down as well. I have no idea if it is a DNS issue, a network issue, or something else.

Predrag

From predragp at andrew.cmu.edu Sun Aug 4 22:36:41 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 4 Aug 2019 22:36:41 -0400
Subject: bash and lop1 gateways down
In-Reply-To:
References:
Message-ID:

It seems to be that strange case of me not being able to see any CMU IP address from my home computer, although I can ping them from one of my accounts in Texas. I am guessing most of you are not seeing what I am seeing right now and can continue to work.

Predrag

From predragp at andrew.cmu.edu Wed Aug 7 21:28:04 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 7 Aug 2019 21:28:04 -0400
Subject: CMU network issue
Message-ID:

Hi Rob,

I am CC-ing this message to users as it might be of wider interest. I am able to reproduce the weird network behavior you reported earlier. Namely, after I use x2goclient to connect to my desktop, the NX session times out, and after that I can no longer ping or dig any CMU address for several hours. I wonder if their firewall looks for specific applications and blacklists them.

Best,
Predrag

From ngisolfi at cs.cmu.edu Thu Aug 8 09:04:28 2019
From: ngisolfi at cs.cmu.edu (Nick Gisolfi)
Date: Thu, 8 Aug 2019 09:04:28 -0400
Subject: [Lunch] Today @12-1pm Gates 6th floor balcony
Message-ID:

Hi Everyone,

Let's try our luck with getting enough seating on the 6th floor Gates balcony for lab lunch today at noon. See you there!

- Nick

From gis at andrew.cmu.edu Thu Aug 8 13:05:18 2019
From: gis at andrew.cmu.edu (George Stoica)
Date: Thu, 8 Aug 2019 20:05:18 +0300
Subject: Trouble installing mpi4py
Message-ID:

Hi All,

I hope everything is well.

Apologies if this is not the correct place to ask this question; this is my first time posting.

I am trying to work with the openai baselines repository and need to install the mpi4py dependency for it. Unfortunately, I'm having a lot of trouble installing it, and none of the solutions I could find online appear to be working. I was wondering if anyone has faced this issue and, if so, how you were able to resolve it?

Thanks very much for your help!
George
-------------- next part --------------
An HTML attachment was scrubbed...

From sarveshj at andrew.cmu.edu Thu Aug 8 13:23:55 2019
From: sarveshj at andrew.cmu.edu (Sarveshwaran Jayaraman)
Date: Thu, 8 Aug 2019 17:23:55 +0000
Subject: Trouble installing mpi4py
In-Reply-To:
References:
Message-ID: <78e551f23d7249869902cee9323fc6d6@andrew.cmu.edu>

Hey George,

If you can provide more details, such as error messages or screenshots, it would be very helpful for troubleshooting your issue and providing suggestions. Thanks!

From predragp at andrew.cmu.edu Thu Aug 8 13:31:24 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 8 Aug 2019 13:31:24 -0400
Subject: Trouble installing mpi4py
In-Reply-To:
References:
Message-ID:

I just had a quick look. The software appears to be untested by upstream on Red Hat derivatives, which we use in the lab. I would first try to install the software on a spare laptop or desktop running one of the tested OSs (Ubuntu or OS X). If that works and the software looks useful, you should add yourself to one of the developer mailing lists and we could try to install it on Red Hat.

The nuclear option is that we use Singularity

https://sylabs.io/docs/

to run an Ubuntu userland on top of the Red Hat kernel, if that is the only way to install things.

Note that most large government labs in the U.S. use Red Hat just like we do, so upstream should get their act together.

Predrag

From donghanw at cs.cmu.edu Thu Aug 8 13:48:10 2019
From: donghanw at cs.cmu.edu (Donghan Wang)
Date: Thu, 8 Aug 2019 13:48:10 -0400
Subject: Trouble installing mpi4py
In-Reply-To:
References:
Message-ID:

Hi George,

I have had good experience using Conda to handle dependencies. In this case, you can do:

. /opt/miniconda3/etc/profile.d/conda.sh   # enable conda
conda create --name mpi4py python=3.6      # create a new env named mpi4py
conda activate mpi4py
conda install mpi4py
pip install tensorflow                     # or pip install tensorflow-gpu
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .

# optionally test the installation
pip install pytest pandas matplotlib gym[atari]
pytest

Hope it helps.

Thanks,
Jarod

From gis at andrew.cmu.edu Thu Aug 8 17:38:10 2019
From: gis at andrew.cmu.edu (George Stoica)
Date: Fri, 9 Aug 2019 00:38:10 +0300
Subject: Trouble installing mpi4py
In-Reply-To:
References:
Message-ID:

Thank you for all your help and suggestions! I got it working with Jarod's suggestion.

Thanks again,
George

From predragp at andrew.cmu.edu Fri Aug 9 17:01:24 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 9 Aug 2019 17:01:24 -0400
Subject: GPU1 update
Message-ID:

Dear Autonians,

A quick update on the state of the GPU1 machine. I am getting error code B4, which typically means a bad RAM module. Unfortunately, I will have to open the machine and play a little combinatorial game until I find the culprit. That is time consuming and will not happen before the end of next week.

Cheers,
Predrag

From predragp at andrew.cmu.edu Sat Aug 10 01:10:39 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 10 Aug 2019 01:10:39 -0400
Subject: GPU[11-14] added to the cluster
Message-ID: <20190810051039.vZq0MNj1m%predragp@andrew.cmu.edu>

Dear Autonians,

I just finished provisioning four more GPU computing nodes, GPU11, GPU12, GPU13, and GPU14, purchased by Dr. Jeff Schneider. They appear to be in working condition and I tested them lightly. Each server has four of the latest NVIDIA GeForce RTX 2080 Ti GPU cards. They are so new that the driver for them (430.40) was released less than 2 weeks ago. I didn't bother to install MATLAB, as support for these cards is likely to be added no earlier than R2020a (I will try adding R2019b when it becomes available), but all the other scientific software we use in the lab is there.

The servers come with 2 x Xeon Silver 4210 (2.2 GHz, 10-core, 14MB L3 Cache), which means that you have 40 CPU threads on each server. Unfortunately, due to budgetary constraints, each server has only 96GB of RAM, which is less than our other GPU nodes, but it is likely that we will be able to add RAM as funds become available.

Cheers,
Predrag

From predragp at andrew.cmu.edu Sat Aug 10 10:59:31 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 10 Aug 2019 10:59:31 -0400
Subject: cross privilege side-channel attack
Message-ID:

Hi Autonians,

I hope all of you are out in the fresh air and that you will not be reading this email before Monday. Goodfellas from Intel have screwed up again. Intel CPUs have another cross-privilege side-channel attack (SWAPGS).

https://threatpost.com/new-swapgs-side-channel-attack-bypasses-spectre-and-meltdown-defenses/147034/

Our perimeter machines, all of which are of course powered by OpenBSD (sorry Ubuntu guys that I have to hurt your feelings), have already been patched.
Due to the fact that machines had to be rebooted OpenVPN daemons on all shell gateways and desktops had to be restarted which could be perceived as a network interruption. In the case you wonder this is the second privilege hole discovered in two months. Hint: Google Intel CPUs have a cross privilege side-channel attack (MDS). At this point I don't even bother patching other OSs (I am not even sure that they came up with patches) as they still support hyperthreding which is highly unsafe and should not be enabled on mission critical machines. Cheers, Predrag From hiteshar at andrew.cmu.edu Sun Aug 11 13:44:03 2019 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Sun, 11 Aug 2019 13:44:03 -0400 Subject: Is auton bash down? Message-ID: Hi Predrag / Autonians, Is bash (bash.autonlab.org) down? I am unable to ssh to auton. Thanks, Hitesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Sun Aug 11 14:08:20 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sun, 11 Aug 2019 14:08:20 -0400 Subject: Is auton bash down? In-Reply-To: References: Message-ID: No. VPN died so your home directory could not have been mounted. I just restarted it. Predrag On Sun, Aug 11, 2019 at 1:44 PM Hitesh Arora wrote: > > Hi Predrag / Autonians, > > Is bash (bash.autonlab.org) down? I am unable to ssh to auton. > > Thanks, > Hitesh From hiteshar at andrew.cmu.edu Sun Aug 11 14:13:20 2019 From: hiteshar at andrew.cmu.edu (Hitesh Arora) Date: Sun, 11 Aug 2019 14:13:20 -0400 Subject: Is auton bash down? In-Reply-To: References: Message-ID: Thank you so much! It is working now. On Sun, Aug 11, 2019 at 2:08 PM Predrag Punosevac wrote: > No. VPN died so your home directory could not have been mounted. I > just restarted it. > > Predrag > > On Sun, Aug 11, 2019 at 1:44 PM Hitesh Arora > wrote: > > > > Hi Predrag / Autonians, > > > > Is bash (bash.autonlab.org) down? I am unable to ssh to auton. 
> > > > Thanks, > > Hitesh From ngisolfi at cs.cmu.edu Thu Aug 15 09:00:30 2019 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 15 Aug 2019 09:00:30 -0400 Subject: [Lunch] Today 12-1pm @ Gates 6th floor balcony Message-ID: <96E622D1-AF0C-423F-BC0D-34187929F218@cs.cmu.edu> Hi Everyone, Let's aim for lunch on the 6th floor balcony in Gates-Hillman again. Hopefully we can still gather enough space since the semester hasn't fully started yet. Bring your lunch, and we'll see you there! - Nick From mbandrews at cmu.edu Fri Aug 16 13:38:26 2019 From: mbandrews at cmu.edu (Michael Andrews) Date: Fri, 16 Aug 2019 13:38:26 -0400 Subject: Getting remote access to jupyter notebook Message-ID: Hi all, Is there a way to get remote access to a jupyter notebook running on one of the gpu machines without using a reverse ssh (i.e. https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314) as this apparently violates firewall rules? Thanks, Michael From predragp at andrew.cmu.edu Fri Aug 16 15:09:34 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 16 Aug 2019 15:09:34 -0400 Subject: Getting remote access to jupyter notebook In-Reply-To: References: Message-ID: Ok, I think I am getting what is happening here. You are essentially using a reverse ssh proxy as a VPN tunnel and then running a browser through it. That is going to be very slow and it is more or less the same thing you would do with ssh -Y -o "ProxyCommand ssh username@bash.autonlab.org" blabla What I am saying is try to use X2Goclient to connect via the bash.autonlab.org proxy to a computing node of your liking and then start the browser in an X2Goclient custom (openbox) session. That would work in real time.
My only concern is that in the past week or two Rob and I were getting blacklisted by the CMU firewall for using NX. Let me think a little bit more about the problem. I am surprised nobody wrote a server mode for Jupyter notebook like the one there is for RStudio. Predrag On Fri, Aug 16, 2019 at 1:39 PM Michael Andrews wrote: > > Hi all, > > Is there a way to get remote access to a jupyter notebook running on one of the gpu machines without using a reverse ssh (i.e. https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314) as this apparently violates firewall rules? > > Thanks, > Michael From gis at andrew.cmu.edu Fri Aug 16 16:03:27 2019 From: gis at andrew.cmu.edu (George Stoica) Date: Fri, 16 Aug 2019 13:03:27 -0700 Subject: Using GPUs with TF Message-ID: Hi All, I hope everything is well. I am unable to utilize the GPUs when running any version of tensorflow-gpu. From online it appears that the problem may be due to tensorflow not yet having support for CUDA 10.1, which is on every GPU machine I have access to. I was wondering if anyone is able to utilize the GPUs with tensorflow and, if so, how? Thanks very much! George From chufang at andrew.cmu.edu Fri Aug 16 16:07:23 2019 From: chufang at andrew.cmu.edu (Chufan Gao) Date: Fri, 16 Aug 2019 20:07:23 +0000 Subject: Using GPUs with TF In-Reply-To: References: Message-ID: <5FE8E4DF-43BB-4C86-8E3C-3FA65CCD1ACC@andrew.cmu.edu> Hi George, I ran into this issue previously, and a workaround is to simply install TF via conda, as it automatically configures it so that it works with your cuda version. https://docs.conda.io/en/latest/miniconda.html Sincerely, Chufan Gao On Aug 16, 2019, at 16:04, George Stoica wrote: Hi All, I hope everything is well. I am unable to utilize the gpu's when running any version of tensorflow-gpu.
From online it appears that the problem may be due to tensorflow not yet having support for Cuda 10.1, which is on every gpu I have access to. I was wondering if anyone is able to utilize the gpus with tensorflow and if so how are you able? Thanks very much! George From gis at andrew.cmu.edu Fri Aug 16 17:02:26 2019 From: gis at andrew.cmu.edu (George Stoica) Date: Fri, 16 Aug 2019 14:02:26 -0700 Subject: Using GPUs with TF In-Reply-To: <5FE8E4DF-43BB-4C86-8E3C-3FA65CCD1ACC@andrew.cmu.edu> References: <5FE8E4DF-43BB-4C86-8E3C-3FA65CCD1ACC@andrew.cmu.edu> Message-ID: Thanks for the suggestion! Unfortunately, installing it in a conda environment doesn't appear to be solving the problem. I still receive the following errors: [image: Screen Shot 2019-08-16 at 1.58.23 PM.png] [image: Screen Shot 2019-08-16 at 1.58.34 PM.png] These suggest that the GPU is still not accessible. Interestingly, however, it appears that all GPUs are visible, as running tf.test.is_gpu_available() displays all 4 GPUs. Is there anything else that you had to do when installing tensorflow-gpu to get it working properly? Thanks! George On Fri, Aug 16, 2019 at 1:07 PM Chufan Gao wrote: > Hi George, > > I ran into this issue previously, and a workaround is to simply install TF > via conda, as it automatically configures it so that it works with your > cuda version. > > https://docs.conda.io/en/latest/miniconda.html > > Sincerely, > Chufan Gao > > On Aug 16, 2019, at 16:04, George Stoica wrote: > > Hi All, > > I hope everything is well. > > I am unable to utilize the gpu's when running any version of > tensorflow-gpu. From online it appears that the problem may be due to > tensorflow not yet having support for Cuda 10.1, which is on every gpu I > have access to. I was wondering if anyone is able to utilize the gpus with > tensorflow and if so how are you able? > > > Thanks very much!
> George From mbandrews at cmu.edu Mon Aug 19 11:35:31 2019 From: mbandrews at cmu.edu (Michael Andrews) Date: Mon, 19 Aug 2019 11:35:31 -0400 Subject: Getting remote access to jupyter notebook In-Reply-To: References: Message-ID: Hi Predrag, Does it help with the firewall issue if I have a separate tunnel from my local machine to bash.autonlab.org and another one from the gpu machine to bash.autonlab.org? Does anyone else use jupyter notebooks remotely? On Fri, Aug 16, 2019 at 3:09 PM Predrag Punosevac wrote: > Ok I think I am getting what is happening here. You are essentially > using reverse ssh proxy as a VPN tunnel and then run Browser through > it. That is going to be very slow and it is more or less the same > thing you would do with > > ssh -Y -o "ProxyCommand ssh username@bash.autonlab.org" blabla > > What I am saying is try to use X2Goclient to connect via > bash.autonlab.org proxy to a computing node of your liking and then > start the browser in X2Goclient custom (openbox) session. That would > work in real time. My only concern is that in the past week or two I > and Rob were getting blacklisted by CMU firewall for using NX. Let me > think little bit more about the problem. I am surprised nobody wrote a > server mode for Jupyter notebook like the one there is for Rstudio. > > Predrag > > On Fri, Aug 16, 2019 at 1:39 PM Michael Andrews wrote: > > > > Hi all, > > > > Is there a way to get remote access to a jupyter notebook running on one > of the gpu machines without using a reverse ssh (i.e.
> https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314) > as this apparently violates firewall rules? > > > > Thanks, > > Michael From predragp at andrew.cmu.edu Mon Aug 19 12:18:00 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 19 Aug 2019 12:18:00 -0400 Subject: Getting remote access to jupyter notebook In-Reply-To: References: Message-ID: <20190819161800.q8xMj8sFQ%predragp@andrew.cmu.edu> Michael Andrews wrote: > Hi Predrag, > > Does it help with the firewall issue if I have a separate tunnel from my > local machine to bash.autonlab.org and another one from the gpu machine to > bash.autonlab.org? No, that makes it more complicated. > > Does anyone else use jupyter notebooks remotely? > Yes, using X2Go as explained during the orientation. Predrag > On Fri, Aug 16, 2019 at 3:09 PM Predrag Punosevac > wrote: > > > Ok I think I am getting what is happening here. You are essentially > > using reverse ssh proxy as a VPN tunnel and then run Browser through > > it. That is going to be very slow and it is more or less the same > > thing you would do with > > > > ssh -Y -o "ProxyCommand ssh username@bash.autonlab.org" blabla > > > > What I am saying is try to use X2Goclient to connect via > > bash.autonlab.org proxy to a computing node of your liking and then > > start the browser in X2Goclient custom (openbox) session. That would > > work in real time. My only concern is that in the past week or two I > > and Rob were getting blacklisted by CMU firewall for using NX. Let me > > think little bit more about the problem. I am surprised nobody wrote a > > server mode for Jupyter notebook like the one there is for Rstudio.
> > > > Predrag > > > > On Fri, Aug 16, 2019 at 1:39 PM Michael Andrews wrote: > > > > > > Hi all, > > > > > > Is there a way to get remote access to a jupyter notebook running on one > > of the gpu machines without using a reverse ssh (i.e. > > https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314) > > as this apparently violates firewall rules? > > > > > > Thanks, > > > Michael > > From awertz at cmu.edu Mon Aug 19 12:29:50 2019 From: awertz at cmu.edu (Anthony Wertz) Date: Mon, 19 Aug 2019 12:29:50 -0400 Subject: Getting remote access to jupyter notebook In-Reply-To: <20190819161800.q8xMj8sFQ%predragp@andrew.cmu.edu> References: <20190819161800.q8xMj8sFQ%predragp@andrew.cmu.edu> Message-ID: I almost always use a tunnel, configured similar to what Predrag specified. You can greatly simplify this process by storing these proxy configurations in your SSH config file, e.g. on my laptop I can put in ~/.ssh/config (on macOS, similar on *nix, you're on your own on Windows):

Host bash
    Hostname bash.autonlab.org
    UseKeychain yes

Host gpu2
    Hostname gpu2
    ProxyCommand ssh bash exec nc %h %p
    UseKeychain yes

And with that I can forward a port from GPU2 as if I could directly see the machine:

    ssh -L :localhost: gpu2

That being said, I proxy through my workstation which is substantially beefier than bash, so using bash as the middle-man might be what's slowing things down (Predrag can comment on that, there might be better gateways to use).
I generally have no issues forwarding jupyter notebooks or anything else this way, and (for me at least) it's usually snappier than running a browser in X2go (might be the mac client or my failure to configure things properly :-)) On Mon, Aug 19, 2019 at 12:19 PM Predrag Punosevac wrote: > Michael Andrews wrote: > > > Hi Predrag, > > > > Does it help with the firewall issue if I have a separate tunnel from my > > local machine to bash.autonlab.org and another one from the gpu machine > to > > bash.autonlab.org? > > > No makes it more complicated > > > > > Does anyone else use jupyter notebooks remotely? > > > > Yes using X2go as explained during the orientation. > > Predrag > > > On Fri, Aug 16, 2019 at 3:09 PM Predrag Punosevac < > predragp at andrew.cmu.edu> > > wrote: > > > > > Ok I think I am getting what is happening here. You are essentially > > > using reverse ssh proxy as a VPN tunnel and then run Browser through > > > it. That is going to be very slow and it is more or less the same > > > thing you would do with > > > > > > ssh -Y -o "ProxyCommand ssh username at bash.autonlab.org" blabla > > > > > > What I am saying is try to use X2Goclient to connect via > > > bash.autonlab.org proxy to a computing node of your liking and then > > > start the browser in X2Goclinent custom (openbox) session. That would > > > work in real time. My only concern is that in the past week or two I > > > and Rob were getting blacklisted by CMU firewall for using NX. Let me > > > think little bit more about the problem. I am surprised nobody wrote a > > > server mode for Jypiter notebook like the one there is for Rstudio. > > > > > > Predrag > > > > > > On Fri, Aug 16, 2019 at 1:39 PM Michael Andrews > wrote: > > > > > > > > Hi all, > > > > > > > > Is there a way to get remote access to a jupyter notebook running on > one > > > of the gpu machines without using a reverse ssh (i.e. 
> > > > https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314 > ) > > > as this apparently violates firewall rules? > > > > > > > > Thanks, > > > > Michael > > > > -- *Anthony Wertz* Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University awertz at cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbandrews at cmu.edu Mon Aug 19 13:20:56 2019 From: mbandrews at cmu.edu (Michael Andrews) Date: Mon, 19 Aug 2019 13:20:56 -0400 Subject: Getting remote access to jupyter notebook In-Reply-To: References: <20190819161800.q8xMj8sFQ%predragp@andrew.cmu.edu> Message-ID: Hi Anthony, thanks for the detailed instructions, this works for me. On Mon, Aug 19, 2019 at 12:30 PM Anthony Wertz wrote: > I almost always use a tunnel, configured similar to what Predrag specified. > > You can greatly simplify this process by storing these proxy > configurations in your SSH config file, e.g. on my laptop I can put in > ~/.ssh/config (on macOS, similar on *nix, you're on your own on Windows): > > Host bash > Hostname bash.autonlab.org > UseKeychain yes > > Host gpu2 > Hostname gpu2 > ProxyCommand ssh bash exec nc %h %p > UseKeychain yes > > And with that I can forward a port from GPU2 as if I could directly see > the machine: > > ssh -L :localhost: gpu2 > > That being said, I proxy through my workstation which is substantially > beefier than bash, so using bash as the middle-man might be what's slowing > things down (Predrag can comment on that, there might be better gateways to > use). 
I generally have no issues forwarding jupyter notebooks or anything > else this way, and (for me at least) it's usually snappier than running a > browser in X2go (might be the mac client or my failure to configure things > properly :-)) > > On Mon, Aug 19, 2019 at 12:19 PM Predrag Punosevac < > predragp at andrew.cmu.edu> wrote: > >> Michael Andrews wrote: >> >> > Hi Predrag, >> > >> > Does it help with the firewall issue if I have a separate tunnel from my >> > local machine to bash.autonlab.org and another one from the gpu >> machine to >> > bash.autonlab.org? >> >> >> No makes it more complicated >> >> > >> > Does anyone else use jupyter notebooks remotely? >> > >> >> Yes using X2go as explained during the orientation. >> >> Predrag >> >> > On Fri, Aug 16, 2019 at 3:09 PM Predrag Punosevac < >> predragp at andrew.cmu.edu> >> > wrote: >> > >> > > Ok I think I am getting what is happening here. You are essentially >> > > using reverse ssh proxy as a VPN tunnel and then run Browser through >> > > it. That is going to be very slow and it is more or less the same >> > > thing you would do with >> > > >> > > ssh -Y -o "ProxyCommand ssh username at bash.autonlab.org" blabla >> > > >> > > What I am saying is try to use X2Goclient to connect via >> > > bash.autonlab.org proxy to a computing node of your liking and then >> > > start the browser in X2Goclinent custom (openbox) session. That would >> > > work in real time. My only concern is that in the past week or two I >> > > and Rob were getting blacklisted by CMU firewall for using NX. Let me >> > > think little bit more about the problem. I am surprised nobody wrote a >> > > server mode for Jypiter notebook like the one there is for Rstudio. >> > > >> > > Predrag >> > > >> > > On Fri, Aug 16, 2019 at 1:39 PM Michael Andrews >> wrote: >> > > > >> > > > Hi all, >> > > > >> > > > Is there a way to get remote access to a jupyter notebook running >> on one >> > > of the gpu machines without using a reverse ssh (i.e. 
>> > > >> https://medium.com/@sankarshan7/how-to-run-jupyter-notebook-in-server-which-is-at-multi-hop-distance-a02bc8e78314 >> ) >> > > as this apparently violates firewall rules? >> > > > >> > > > Thanks, >> > > > Michael >> > > >> > > > -- > *Anthony Wertz* > Research Programmer and Analyst > Robotics Institute - Auton Lab > Carnegie Mellon University > awertz at cmu.edu From awd at cs.cmu.edu Wed Aug 21 15:47:56 2019 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Wed, 21 Aug 2019 15:47:56 -0400 Subject: our counter-human-trafficking work on youtube (through Marinus Analytics who partnered with Amazon)) Message-ID: Check this out: https://youtu.be/AwKfxefvfNk Artur PS Martial and Srinivas, is there a possibility that Amazon's facial matching algorithm could have CMU roots as well? From youngsec at andrew.cmu.edu Wed Aug 21 20:33:32 2019 From: youngsec at andrew.cmu.edu (Youngseog Chung) Date: Wed, 21 Aug 2019 17:33:32 -0700 Subject: No git on LOV6? Message-ID: Hi Predrag, Does LOV6 not have git installed? I can't seem to call any git commands. Thanks. Best, Youngseog From predragp at andrew.cmu.edu Wed Aug 21 17:47:51 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 21 Aug 2019 17:47:51 -0400 Subject: No git on LOV6? In-Reply-To: References: Message-ID:

root@lov6$ pwd
/opt/rh
root@lov6$ ls
devtoolset-7  httpd24         rh-git218  rh-python36
devtoolset-8  llvm-toolset-7  rh-git29

On Wed, Aug 21, 2019 at 5:33 PM Youngseog Chung wrote: > > Hi Predrag, > > Does LOV6 not have git installed? > > I can't seem to call any git commands. > > Thanks.
> > Best, > Youngseog From ngisolfi at cs.cmu.edu Thu Aug 22 10:17:58 2019 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 22 Aug 2019 10:17:58 -0400 Subject: [Lunch] Today 12-1pm @ Gates 6th Floor Balcony Message-ID: Hi Everyone, Lab lunch will be today from 12-1. We'll meet on the 6th floor Gates balcony again. See you there! - Nick From predragp at andrew.cmu.edu Thu Aug 22 16:20:58 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Thu, 22 Aug 2019 16:20:58 -0400 Subject: Unable to ssh to lop1 In-Reply-To: References: Message-ID: Hi Shimeng, I hope you don't mind me CC-ing all Auton Lab users as this affects a lot of people. I will give the explanation first and then propose the solution. About a year and a half ago, based upon experience I gained over the years with Auton Lab users and the ZFS file system, we started migrating user home directories to separate ZFS datasets (if people are not familiar with ZFS please refer to the Oracle, formerly Sun Microsystems, documentation as it is TL;TW). That meant that we also had to utilize the autofs daemon to do dynamic mounting of home directories, as the previously used /etc/fstab method would mean a few thousand NFS client mounts on our file server. Unfortunately the lop1.autonlab.org gateway was created at a time when the autofs daemon was not used and the underlying OS doesn't support it. That effectively means that you can't log onto lop1 as your home directory is not mounted and that you have no choice but to use bash.autonlab.org as the only shell gateway. Anybody whose home directory is on the zfsauton2 or zfsauton3 parent ZFS dataset is affected (about 80-90 users). Note that the members of the Neill Group as well as people with home directories on zfsauton are not affected and can still use the lop1 gateway. Many people also have Auton Lab issued desktops and they don't need other shell gateways. Replacing lop1.autonlab.org has been on my todo list for a while and yesterday was supposed to be D day.
However, important security patches came out, as well as the release of Alpine Linux 3.10.2 (our virtual OS of choice), and I spent the day patching/upgrading a bunch of servers as opposed to provisioning the new gateway. This can no longer wait and I will provision the new gateway, hopefully before the end of the next working day. Best, Predrag On Thu, Aug 22, 2019 at 2:31 PM Shimeng Peng wrote: > > Hi, I am unable to ssh to lop1, I always get connection closed when I try to remotely log on. > > > Could you please help with figuring it out > > > Thank you > > Shimeng > > From predragp at andrew.cmu.edu Fri Aug 23 18:36:08 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 23 Aug 2019 18:36:08 -0400 Subject: New shell gateway lop2.autonlab.org Message-ID: Dear Autonians, I added another shell gateway capable of running the autofs daemon to our computing infrastructure. ECDSA key fingerprint is: SHA256:LiG0+LN6Tf5EQZjZatD/WDYF2iV046y+Lnz1EXC+EXY or MD5:91:f8:39:fb:c2:24:89:6e:2c:b4:42:a0:cc:11:64:bf You could also use this shell gateway for x2go-client GUI access to our computing nodes. Please let me know if you notice any rough edges. Old lop1/cpu is still available but not useful to the lab members whose home directories are mounted dynamically. This should be a welcome addition to bash.autonlab.org Cheers, Predrag From predragp at andrew.cmu.edu Sun Aug 25 01:43:11 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sun, 25 Aug 2019 01:43:11 -0400 Subject: Mailing list cleanup Message-ID: <20190825054311.X4fkMXbBm%predragp@andrew.cmu.edu> Dear Autonians, In preparation for the start of the Fall semester I did a bit of mailing list cleaning (users at autonlab.org). It came to my attention that there was a discrepancy between the number of active Auton Lab user accounts (141) and the number of people who are subscribed to the mailing list. It seems that a few of you have opted out from the mailing list.
Our users mailing list is the principal way of communicating between lab members who are scattered over three continents (North America, Europe, and Africa) and numerous institutions. A valid email address is a prerequisite for an active account, in part due to legal reasons. As of today Jarod and I (list owners) will be getting notices of unsubscribes. An unsubscribe request will be understood as a request for account suspension and will be automatically honored. At this point there is only one email address on this general mailing list which is not affiliated with an active Lab account and that one is our founding member. Most Kind Regards, Predrag Punosevac From predragp at andrew.cmu.edu Mon Aug 26 15:53:13 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 26 Aug 2019 15:53:13 -0400 Subject: GPUs 11/13/14 nvidia-smi not working In-Reply-To: References: Message-ID: Thanks for the report! I can reproduce the problem. This is typically due to an upgraded kernel for which the nvidia module has to be rebuilt (if I reboot the machine that will happen automatically). Welcome to the wonderful world of proprietary software. So this is the notice to the rest of the group that I will be rebooting non-functional GPU nodes in the next hour or two until things are fixed. Cheers, Predrag On Mon, Aug 26, 2019 at 3:47 PM Abhay Gupta wrote: > > Hi Predrag, > > I was trying to access GPUs on servers 11/13/14 got this error while running 'nvidia-smi': > 'Failed to initialize NVML: Driver/library version mismatch' > > I think either the driver or the library needs to be updated for Nvidia drivers on these servers. Can you please have a look into this? Thanks. > > -- > Regards, > Abhay Gupta From predragp at andrew.cmu.edu Mon Aug 26 16:00:43 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 26 Aug 2019 16:00:43 -0400 Subject: GPUs 11/13/14 nvidia-smi not working In-Reply-To: References: Message-ID: Fixed.
The only problematic machines were 11, 13, and 14. On Mon, Aug 26, 2019 at 3:53 PM Predrag Punosevac wrote: > > Thanks for the report! I can reproduce the problem. This is typically > due to the upgraded kernel for which nvidia module has to be rebuilt > (If i reboot the machine that will happen automatically). Welcome to > the wonderful world of proprietary software. So this is the notice to > the rest of the group that I will be rebooting non functional GPU > nodes in next hour or two until things are fixed. > > Cheers, > Predrag > > On Mon, Aug 26, 2019 at 3:47 PM Abhay Gupta wrote: > > > > Hi Predrag, > > > > I was trying to access GPUs on servers 11/13/14 got this error while running 'nvidia-smi': > > 'Failed to initialize NVML: Driver/library version mismatch' > > > > I think either the driver or the library needs to be updated for Nvidia drivers on these servers. Can you please have a look into this? Thanks. > > > > -- > > Regards, > > Abhay Gupta From predragp at andrew.cmu.edu Mon Aug 26 20:52:59 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 26 Aug 2019 20:52:59 -0400 Subject: no git on gpu10 In-Reply-To: References: Message-ID: RedHat comes with an older version of Git so we use a non-standard installation location:

root@gpu10$ pwd
/opt/rh/rh-git218/root/bin
root@gpu10$ ./git --version
git version 2.18.1

Please check /opt for other interesting scientific software. /opt/rh is the RedHat developer library with newer tools. I am fixing damn DokuWiki so that I can post this info on our Intranet. Cheers, Predrag On Mon, Aug 26, 2019 at 8:47 PM Michael Andrews wrote: > > Hi Predrag, > > I can't seem to get access to git on gpu10: > > mbandrews@gpu10:$ git > -bash: git: command not found > > Was it installed on this machine?
> > Thanks, > Michael From mbandrews at cmu.edu Mon Aug 26 20:59:54 2019 From: mbandrews at cmu.edu (Michael Andrews) Date: Mon, 26 Aug 2019 20:59:54 -0400 Subject: no git on gpu10 In-Reply-To: References: Message-ID: Thanks, Michael On Mon, Aug 26, 2019 at 8:53 PM Predrag Punosevac wrote: > RedHat comes with older version of Git so we use non-standard > installation location > > > root@gpu10$ pwd > /opt/rh/rh-git218/root/bin > root@gpu10$ ./git --version > git version 2.18.1 > > > Please check /opt for other interesting scientific software. /opt/rh > is RedHat developer library with new tools. > > I am fixing damn DokuWiki so that I can post this info on our Intranet. > > Cheers, > Predrag > > > > On Mon, Aug 26, 2019 at 8:47 PM Michael Andrews wrote: > > > > Hi Predrag, > > > > I can't seem to get access to git on gpu10: > > > > mbandrews@gpu10:$ git > > -bash: git: command not found > > > > Was it installed on this machine? > > > > Thanks, > > Michael From awd at cs.cmu.edu Tue Aug 27 16:33:19 2019 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 27 Aug 2019 16:33:19 -0400 Subject: UPDATE: [In case you have been contemplating developing or applying automated machine learning... In-Reply-To: References: Message-ID: ... look no further than Saswati Ray's office. <-- that's what I said in June, and it holds!] However, Saswati's Task Force continues to excel at DARPA D3M evaluations. Please see below the current leaderboard -- the competition was indeed top-notch. Cheers and congrats to everyone who contributed and please continue your outstanding work! Artur [image: image.png] On Fri, Jun 14, 2019 at 8:41 AM Artur Dubrawski wrote: > ... look no further than Saswati Ray's office.
> > She leads the Auton Lab task force that got CMU to the very top of the > leader boards in the current DARPA D3M evaluations, leaving behind a number > of lead university (all the usual suspects included) and industrial > competitors. > > Congrats to everyone involved! > > Cheers, > Artur From srinivas at andrew.cmu.edu Tue Aug 27 16:44:22 2019 From: srinivas at andrew.cmu.edu (Srinivasa Narasimhan) Date: Tue, 27 Aug 2019 16:44:22 -0400 Subject: UPDATE: [In case you have been contemplating developing or applying automated machine learning... In-Reply-To: References: Message-ID: <30eb3c12-b798-b878-fd9c-7eccbdadc40b@cs.cmu.edu> Congratulations Saswati and the Autonlab! Srinivas. On 8/27/19 4:33 PM, Artur Dubrawski wrote: > ... look no further than Saswati Ray's office. <-- that's what I > said in June, and it holds!] > > However, Saswati's Task Force continues to excel at DARPA D3M > evaluations. > Please see below the current leaderboard -- the competition was indeed > top-notch. > > Cheers and congrats to everyone who contributed and please continue > your outstanding work! > > Artur > > image.png > On Fri, Jun 14, 2019 at 8:41 AM Artur Dubrawski > wrote: > > ... look no further than Saswati Ray's office. > > She leads the Auton Lab task force that got CMU to the very top of > the leader boards in the current DARPA D3M evaluations, leaving > behind a number of lead university (all the usual suspects > included) and industrial competitors. > > Congrats to everyone involved! > > Cheers, > Artur
From awd at cs.cmu.edu Wed Aug 28 19:35:15 2019 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Wed, 28 Aug 2019 19:35:15 -0400 Subject: *** Auton Lab's 26th Annual Picnic: Sunday, October 13th at Schenley Park *** In-Reply-To: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu> References: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu> Message-ID: Dear Autonians, We would like to invite you and your family to celebrate the 26th birthday of the Auton Lab. Please save the date for our annual picnic at Vietnam Veterans Pavilion in Schenley Park on Sunday, October 13th. Please RSVP through the web form below so that we could plan resources properly: https://forms.gle/gZ1GtL4KvFAGYPRGA Do not forget to enter your bid for the slogan we should put on the Lab's birthday cake. As a reminder, see the attached last year's winning bid, materialized (yes the cake was superyummy). Thanks, Artur and Jessie (Auton Lab CEO, Chief Entertainment Officer) [image: IMG_5142.jpg] From ngisolfi at cs.cmu.edu Thu Aug 29 09:22:37 2019 From: ngisolfi at cs.cmu.edu (Nick Gisolfi) Date: Thu, 29 Aug 2019 09:22:37 -0400 Subject: [Lunch] Today 12-1pm @ NSH 4513 Message-ID: <302DD3BB-84A0-4EFE-861E-04A3EA9BB4FE@cs.cmu.edu> Hi Everyone, Lab lunch will be in NSH 4513 today from 12-1pm. Bring your meal and we'll see you there! - Nick