From predragp at andrew.cmu.edu Fri Nov 2 18:03:45 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 02 Nov 2018 18:03:45 -0400 Subject: Misc issues Message-ID: <20181102220345.6pSbywqLQ%predragp@andrew.cmu.edu>

Dear Autonians,

During today's orientation session with three new members of the lab, a few issues caught my attention, so I am going to share them with you along with a status update on the main file server.

1. As of this morning, ZFS snapshots are disabled on the main file server hosting the majority of older accounts, and old snapshots have been deleted. Right now the zpool hosting home directories is 88% full. At the moment I am moving at least one large legacy home directory to the attic, which should free up 1 TB of space. This is still insufficient to drop the pool load below the 80% needed for normal NFS, resilvering, and snapshot operations. I am recalculating home directory sizes this very moment and I hope to have a report by Monday. Any directory larger than 0.5 TB will be a prime target for migration (or removal).

2. The Git web interface was temporarily down due to the SSL certificate update. After I restarted the PostgreSQL database, things work as expected for both local accounts (Anthony's and mine) and LDAP accounts (everyone else). Git clone, pull, and push also work per testing.

3. There was a report of an ssh key issue with the Git authentication needed for git operations from the CLI. This issue is user specific and has nothing to do with the SELinux NFS policy we had in the past,

setsebool -P use_nfs_home_dirs=true

or with the fact that 25 home directories are already migrated to the new file server and mounted by the autofs daemon on login. I verified this both on GPU machines (which have SELinux disabled) and on regular CPU machines (which have SELinux enabled), using an account that has to be autofs mounted at login.

4. You can also put your public key into .ssh/authorized_keys and use passwordless authentication, whether you have an old permanently mounted home directory or one mounted by the autofs daemon. This is tested. I had a rough time today during the demo (I guess it is just my age).

5. Finally, it appears that one out of four GPU cards on each of the GPU3 and GPU4 computing nodes is not working properly. Please see below. I have seen this in the past and the reason was dead hardware. I would like to reboot those two servers and do some further testing before we shell out $2500 for two used Titan Xp cards.

I hope you have a great weekend.

Predrag

root at gpu3$ nvidia-smi
Fri Nov 2 17:01:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:02:00.0 Off | N/A |
| 23% 29C P8 17W / 250W | 1081MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 00000000:03:00.0 Off | N/A |
| 23% 28C P8 9W / 250W | 1081MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN X (Pascal) Off | 00000000:82:00.0 Off | N/A |
| 23% 31C P8 11W / 250W | 1081MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4458 C python3 1071MiB |
| 1 19870 C python3 1071MiB |
| 2 3102 C python3 1071MiB |
+-----------------------------------------------------------------------------+

root at gpu4$ nvidia-smi
Fri Nov 2 17:01:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:02:00.0 Off | N/A |
| 41% 67C P2 169W / 250W | 3213MiB / 12196MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 00000000:03:00.0 Off | N/A |
| 25% 45C P8 19W / 250W | 2810MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN X (Pascal) Off | 00000000:83:00.0 Off | N/A |
| 23% 34C P8 15W / 250W | 317MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 21927 C python2 1689MiB |
| 1 7063 C ...auton/home/cnagpal/anaconda2/bin/python 1267MiB |
| 1 12081 C ...auton/home/cnagpal/anaconda2/bin/python 1525MiB |
| 2 28460 C python 307MiB |
+-----------------------------------------------------------------------------+

From predragp at andrew.cmu.edu Sat Nov 3 19:07:37 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 03 Nov 2018 19:07:37 -0400 Subject: GPU2 Fwd: Autonlab-sysinfo Digest, Vol 52, Issue 12 Message-ID: <20181103230737.oNp-UWyvW%predragp@andrew.cmu.edu>

Dear Autonians,

The GPU2 server is officially broken. Please see below. What you are not going to see in the report below is that, besides /root being full due to excessive caching in the /tmp folder (which resides on the root partition), /home/scratch is also full. I have emailed multiple times asking people to clear the /scratch directory, but to no avail. I will now have to resort to more heavy-handed methods, like periodically deleting all /home/scratch directories on all GPU machines, so that the machines are usable again.

Best,
Predrag

P.S. I am going for a stroll. If /home/scratch and the cache are not cleared, I will have to reboot the machine and delete /home/scratch around 9:00 PM tonight.
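If you want a quick look at what you are keeping in scratch before then, something along these lines should do it (a rough sketch using standard coreutils; adjust the path and username to the machine you are on):

du -sh /home/scratch/$USER
du -h --max-depth=1 /home/scratch/$USER | sort -h | tail -20
df -h /home/scratch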
-------- Original Message -------- From: autonlab-sysinfo-request at autonlab.org Subject: Autonlab-sysinfo Digest, Vol 52, Issue 12 To: autonlab-sysinfo at autonlab.org Date: Sat, 03 Nov 2018 17:21:15 -0400 Send Autonlab-sysinfo mailing list submissions to autonlab-sysinfo at autonlab.org To subscribe or unsubscribe via the World Wide Web, visit https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo or, via email, send a message with subject or body 'help' to autonlab-sysinfo-request at autonlab.org You can reach the person managing the list at autonlab-sysinfo-owner at autonlab.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Autonlab-sysinfo digest..." Today's Topics: 1. Cron /usr/lib64/sa/sa1 1 1 ((Cron Daemon)) 2. Cron run-parts /etc/cron.hourly ((Cron Daemon)) 3. Cron /usr/lib64/sa/sa1 1 1 ((Cron Daemon)) 4. Cron /usr/lib64/sa/sa1 1 1 ((Cron Daemon)) 5. Cron /usr/lib64/sa/sa1 1 1 ((Cron Daemon)) ---------------------------------------------------------------------- Message: 1 Date: Sat, 3 Nov 2018 15:50:01 -0400 (EDT) From: "(Cron Daemon)" To: root at gpu2.int.autonlab.org Subject: Cron /usr/lib64/sa/sa1 1 1 Message-ID: <20181103212106.67CDC1596247 at gpu2.int.autonlab.org> Content-Type: text/plain; charset=UTF-8 Cannot write data to system activity file: No space left on device ------------------------------ Message: 2 Date: Sat, 3 Nov 2018 16:10:45 -0400 (EDT) From: "(Cron Daemon)" To: root at gpu2.int.autonlab.org Subject: Cron run-parts /etc/cron.hourly Message-ID: <20181103212106.8CFB71596243 at gpu2.int.autonlab.org> Content-Type: text/plain; charset=UTF-8 /etc/cron.hourly/0yum-hourly.cron: Traceback (most recent call last): File "/usr/sbin/yum-cron", line 729, in main() File "/usr/sbin/yum-cron", line 726, in main base.updatesCheck() File "/usr/sbin/yum-cron", line 618, in updatesCheck self.populateUpdateMetadata() File "/usr/sbin/yum-cron", line 422, in populateUpdateMetadata self.pkgSack # honor skip_if_unavailable File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1074, in pkgSack = property(fget=lambda self: self._getSacks(), File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 778, in _getSacks self.repos.populateSack(which=repos) File "/usr/lib/python2.7/site-packages/yum/repos.py", line 347, in populateSack self.doSetup() File "/usr/lib/python2.7/site-packages/yum/repos.py", line 157, in doSetup self.retrieveAllMD() File "/usr/lib/python2.7/site-packages/yum/repos.py", line 88, in retrieveAllMD dl = repo._async and repo._commonLoadRepoXML(repo) File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1477, in _commonLoadRepoXML if self._latestRepoXML(local): File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1446, in _latestRepoXML oxml = self._saveOldRepoXML(local) File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1303, in _saveOldRepoXML shutil.copy2(local, old_local) File "/usr/lib64/python2.7/shutil.py", line 130, in copy2 copyfile(src, dst) File "/usr/lib64/python2.7/shutil.py", line 83, in copyfile with open(dst, 'wb') as fdst: IOError: [Errno 28] No space left on device: '/var/cache/yum/x86_64/7/SCL-core/repomd.xml.old.tmp' /etc/cron.hourly/ghc-doc-index: /usr/bin/ghc-doc-index: line 27: /var/lib/ghc/pkg-dir.cache.new: No space left on device diff: /var/lib/ghc/pkg-dir.cache.new: No such file or directory haddock: internal error: .: copyFile: resource exhausted (No space left on device) ------------------------------ Message: 3 Date: Sat, 3 Nov 2018 16:00:01 -0400 
(EDT) From: "(Cron Daemon)" To: root at gpu2.int.autonlab.org Subject: Cron /usr/lib64/sa/sa1 1 1 Message-ID: <20181103212106.7C6CA1596246 at gpu2.int.autonlab.org> Content-Type: text/plain; charset=UTF-8 Cannot write data to system activity file: No space left on device

------------------------------

Message: 4 Date: Sat, 3 Nov 2018 15:40:01 -0400 (EDT) From: "(Cron Daemon)" To: root at gpu2.int.autonlab.org Subject: Cron /usr/lib64/sa/sa1 1 1 Message-ID: <20181103212106.8282C1596247 at gpu2.int.autonlab.org> Content-Type: text/plain; charset=UTF-8 Cannot write data to system activity file: No space left on device

------------------------------

Message: 5 Date: Sat, 3 Nov 2018 15:20:01 -0400 (EDT) From: "(Cron Daemon)" To: root at gpu2.int.autonlab.org Subject: Cron /usr/lib64/sa/sa1 1 1 Message-ID: <20181103212106.742761596243 at gpu2.int.autonlab.org> Content-Type: text/plain; charset=UTF-8 Cannot write data to system activity file: No space left on device

------------------------------

Subject: Digest Footer _______________________________________________ Autonlab-sysinfo mailing list Autonlab-sysinfo at autonlab.org https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo

------------------------------

End of Autonlab-sysinfo Digest, Vol 52, Issue 12 ************************************************

From predragp at andrew.cmu.edu Sat Nov 3 19:11:14 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 03 Nov 2018 19:11:14 -0400 Subject: GPU 1 error In-Reply-To: References: Message-ID: <20181103231114.gH2ceoGrv%predragp@andrew.cmu.edu>

Biswajit Paria wrote: > Hi Predrag, > > I am trying to use GPU 1, and getting an unusual segmentation fault. The > same code that I was running for two days is now throwing a segmentation > fault. Is it possible to restart GPU1? Doesn't look like anyone else it > using it other than me.

Sure, if nobody is using it. Are you sure that you were using this machine after I rebooted it last week? Those library exception errors are typically due to the NVidia 3rd party binary blob drivers, which need to be reinstalled occasionally. I will give it two hours and reboot it at the same time as GPU2. If the driver gets broken, it will have to wait until Monday.

> > Here is stack trace in case you want to have a look: > > Stack trace returned 10 entries: > [bt] (0) > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib > mxnet.so(+0x31f81a) [0x7feebb24f81a] > [bt] (1) > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib > mxnet.so(+0x29f33b6) [0x7feebd9233b6] > [bt] (2) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680] > [bt] (3) /lib64/libpthread.so.0(raise+0x2b) [0x7fef7831954b] > [bt] (4) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680] > [bt] (5) /usr/lib64/nvidia/libcuda.so.1(+0xf88d5) [0x7fef304548d5] > [bt] (6) /usr/lib64/nvidia/libcuda.so.1(+0x248914) [0x7fef305a4914] > [bt] (7) /usr/lib64/nvidia/libcuda.so.1(+0x1e4e80) [0x7fef30540e80] > [bt] (8) /lib64/libpthread.so.0(+0x7dd5) [0x7fef78311dd5] > [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7fef7803bb3d] > > > Thanks in advance! > -- > Biswajit Paria > PhD student > Machine Learning Department > Carnegie Mellon University

From predragp at andrew.cmu.edu Sat Nov 3 22:57:17 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 03 Nov 2018 22:57:17 -0400 Subject: GPU 1 error In-Reply-To: References: <20181103231114.gH2ceoGrv%predragp@andrew.cmu.edu> Message-ID: <20181104025717.-mSPsGTrK%predragp@andrew.cmu.edu>

Biswajit Paria wrote: > I see. I was using it yesterday. It is possible that the CUDA is broken, > and it is somehow not using the cuda in my home directory. I will try to > get it to use my local CUDA, otherwise I will wait till Monday. > > Thanks! >

OK, I just spent almost 2h playing with GPU1. This is what I have done. I cleaned the system, upgraded the NVidia driver to 396.44, and upgraded CUDA to 9.2. I then cleaned and upgraded all the packages. Note that I didn't want to install the recently released CUDA 10, which is probably still poorly supported by applications. The system works like a Swiss watch now, but it is likely that all deep-learning tools are in a broken state. You will have to rebuild TensorFlow or whatever you were using.

The following two users

678.5 GiB joliva
513.8 GiB chunlial

should try to clean their scratch directories or at least e-mail me with an explanation for such excessive use.

I have now half-way scripted this process for Ansible so I could push it to all GPU nodes, but it is likely that I would inflict a lot of pain on people who are running jobs. We still have a problem on servers GPU3 and GPU4, which appear to have dead GPU cards.

Best,
Predrag

> > On Sat, Nov 3, 2018, 7:11 PM Predrag Punosevac wrote: > > > Biswajit Paria wrote: > > > > > Hi Predrag, > > > > > > I am trying to use GPU 1, and getting an unusual segmentation fault. The > > > same code that I was running for two days is now throwing a segmentation > > > fault. Is it possible to restart GPU1? Doesn't look like anyone else it > > > using it other than me. > > > > > > Sure if nobody is using it. Are you sure that you were using this > > machine after I rebooted last week? Those library exception errors are > > typically due to NVidia 3rd party binary blob drivers which needs to be > > reinstalled occasionally. I will give a two hours and reboot at the > > same time when I reboot GPU2. If the driver gets broken it will have to > > wait Monday. > > > > > > > > > > > > Here is stack trace in case you want to have a look: > > > > > > Stack trace returned 10 entries: > > > [bt] (0) > > > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib > > > mxnet.so(+0x31f81a) [0x7feebb24f81a] > > > [bt] (1) > > > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib > > > mxnet.so(+0x29f33b6) [0x7feebd9233b6] > > > [bt] (2) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680] > > > [bt] (3) /lib64/libpthread.so.0(raise+0x2b) [0x7fef7831954b] > > > [bt] (4) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680] > > > [bt] (5) /usr/lib64/nvidia/libcuda.so.1(+0xf88d5) [0x7fef304548d5] > > > [bt] (6) /usr/lib64/nvidia/libcuda.so.1(+0x248914) [0x7fef305a4914] > > > [bt] (7) /usr/lib64/nvidia/libcuda.so.1(+0x1e4e80) [0x7fef30540e80] > > > [bt] (8) /lib64/libpthread.so.0(+0x7dd5) [0x7fef78311dd5] > > > [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7fef7803bb3d] > > > > > > > > > Thanks in advance!
> > > -- > > > Biswajit Paria > > > PhD student > > > Machine Learning Department > > > Carnegie Mellon University

From predragp at andrew.cmu.edu Sat Nov 3 23:19:44 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sat, 03 Nov 2018 23:19:44 -0400 Subject: GPU1 and GPU2 scratch directory report In-Reply-To: <24A483DA-87C2-4D47-A3CA-D56831A34205@andrew.cmu.edu> References: <20181103230737.oNp-UWyvW%predragp@andrew.cmu.edu> <24A483DA-87C2-4D47-A3CA-D56831A34205@andrew.cmu.edu> Message-ID: <20181104031944.OtNjYrohj%predragp@andrew.cmu.edu>

Jayanth Koushik wrote: > Hey Predrag, > > I realize this is a serious issue, but would it at all be possible to keep my scratch directory? I only use 12GB but I have a personal toolchain installed there on which a lot of scripts depend. >

Sure thing! I did a little digging and the same users are hogging scratch directories on both GPU1 and GPU2. Now there might be a valid reason to do so ...

GPU1:
340.7 GiB joliva
320.9 GiB chunlial

GPU2:
678.5 GiB joliva
513.8 GiB chunlial

Cheers,
Predrag

Date: Sat, 03 Nov 2018 23:15:21 -0400 From: Predrag Punosevac To: jkoushik at andrew.cmu.edu Cc: joliva at cs.unc.edu chunlial at andrew.cmu.edu, users at autonlab.org Subject: Re: GPU1 and GPU2 scratch directory user report Message-ID: <20181104031521.zq4R9U0Is%predragp at andrew.cmu.edu> References: <20181103230737.oNp-UWyvW%predragp at andrew.cmu.edu> <24A483DA-87C2-4D47-A3CA-D56831A34205 at andrew.cmu.edu> In-Reply-To: <24A483DA-87C2-4D47-A3CA-D56831A34205 at andrew.cmu.edu> User-Agent: s-nail v14.8.12

Jayanth Koushik wrote: > Hey Predrag, > > I realize this is a serious issue, but would it at all be possible to keep my scratch directory? I only use 12GB but I have a personal toolchain installed there on which a lot of scripts depend. >

Sure thing! I did a little digging and the same users are hogging scratch directories on both GPU1 and GPU2. Now there might be a valid reason to do so ...

GPU1:
340.7 GiB joliva
320.9 GiB chunlial

GPU2:
678.5 GiB joliva
513.8 GiB chunlial

Cheers,
Predrag

> Thanks, > ~Jayanth > Thanks, > ~Jayanth

From predragp at andrew.cmu.edu Sun Nov 4 16:28:23 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sun, 04 Nov 2018 16:28:23 -0500 Subject: Main file server disk usage report Message-ID: <20181104212823.c498DWpOh%predragp@andrew.cmu.edu>

Dear Autonians,

The following users exceed the 250GB home directory quota which is implemented on the new file server. Keep in mind that ZFS uses lzma compression, so depending on the data type, 5.2 TB can translate into 10 TB of regular file system data. The reasons for such large home directories might be completely legitimate, but they should not have been created without prior notice. At least 3 of the people on this list are no longer formally affiliated with CMU. One of them, ffalck, has already contacted me and we are working on trimming and archiving his data. Anybody who has a home directory larger than 1TB should contact me at their earliest convenience.
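For reference, a rough way to check your own usage from any compute node is something like

du -sh ~
du -sh --apparent-size ~

where the first number is roughly what your directory occupies on the (compressed) pool and the second is the uncompressed size of your files. Treat both as approximations only.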
Best,
Predrag

5.2 TiB [##########] /jaylee
3.8 TiB [####### ] /yichongx
2.5 TiB [#### ] /joliva
1.5 TiB [## ] /lujiec
1.4 TiB [## ] /ffalck
1.3 TiB [## ] /htung
759.5 GiB [# ] /dsutherl
573.8 GiB [# ] /kkandasa
542.1 GiB [# ] /iapostol
536.2 GiB [# ] /pengrui
518.4 GiB [ ] /bpatra
464.7 GiB [ ] /chunlial
422.2 GiB [ ] /siyuh
416.6 GiB [ ] /jrmoniz
374.8 GiB [ ] /chiragn
327.2 GiB [ ] /yifeim
253.6 GiB [ ] /bparia

From predragp at andrew.cmu.edu Sun Nov 4 17:04:42 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Sun, 04 Nov 2018 17:04:42 -0500 Subject: Main file server disk usage report In-Reply-To: <9B3002CE-273C-404C-9931-681D6780875F@andrew.cmu.edu> References: <20181104212823.c498DWpOh%predragp@andrew.cmu.edu> <9B3002CE-273C-404C-9931-681D6780875F@andrew.cmu.edu> Message-ID: <20181104220442.CKfKOtNzK%predragp@andrew.cmu.edu>

Yichong Xu wrote: > Hi Predrag, > I will clean my home directory soon. > Besides this - In my opinion 250GB is a bit too small for one person, > especially for storing models and data. Is there anywhere else I can > store my data to? Thank you very much!

You don't get it. You can't delete anything from ZFS. Only I can delete things, with a great deal of effort (stopping and removing snapshots, stopping backups) and at the risk of everyone's home directories. ZFS is designed for data retention, not for data loss. The principal design error on my part, made 5 years ago, was to allow home directories to share the same ZFS dataset, thereby losing fine-grained control over data management. Right now, if you do rm -rf you will be able to delete things, because I had to turn off a bunch of switches.

The 250 GB home directory quota is not carved in stone and I can increase it on an as-needed basis. We have enough disk space, but I have to have some prior notice to be able to do planning. Please stop by my office so that we can develop some kind of migration strategy for your home directory. It will be a separate ZFS dataset with a quota we agree is reasonable for you to work normally, but that also keeps me from having these kinds of wildfires.

Predrag

> > Thanks, > Yichong > > > > On Nov 4, 2018, at 4:28 PM, Predrag Punosevac > wrote: > > Dear Autonians, > > The following users exceed 250GB home directory quota which is > implemented on the new file server. Keep in mind that ZFS is using lzma > compression so depends on the data type 5.2 TB translates into 10 TB > regular file system data. The reason for such large home directories > might be completely legit but they should not have been created without > prior notice. At least 3 of the people on this list are no longer > formally affiliated to CMU. One of them ffalck have already contacted me > and we are working on trimming and archiving his data. Anybody who has > home directory larger than 1TB should contact me at the earliest > convenience.
> > Best, > Predrag > >
> 5.2 TiB [##########] /jaylee
> 3.8 TiB [####### ] /yichongx
> 2.5 TiB [#### ] /joliva
> 1.5 TiB [## ] /lujiec
> 1.4 TiB [## ] /ffalck
> 1.3 TiB [## ] /htung
> 759.5 GiB [# ] /dsutherl
> 573.8 GiB [# ] /kkandasa
> 542.1 GiB [# ] /iapostol
> 536.2 GiB [# ] /pengrui
> 518.4 GiB [ ] /bpatra
> 464.7 GiB [ ] /chunlial
> 422.2 GiB [ ] /siyuh
> 416.6 GiB [ ] /jrmoniz
> 374.8 GiB [ ] /chiragn
> 327.2 GiB [ ] /yifeim
> 253.6 GiB [ ] /bparia
>

From predragp at andrew.cmu.edu Mon Nov 5 16:16:47 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 05 Nov 2018 16:16:47 -0500 Subject: HDD failure on the main file server Message-ID: <20181105211647.44Kh5fOCI%predragp@andrew.cmu.edu>

The HDD is already replaced, but I powered off the file server for about 5 minutes (it is just safer not to do a hot swap). Luckily this is an HDD from a ZFS pool that was 50% loaded, so the resilvering should be completed by tonight. If you are one of the people who promised to clean stuff out of their home directory, now is the time.

Predrag

From predragp at andrew.cmu.edu Mon Nov 5 21:33:42 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 05 Nov 2018 21:33:42 -0500 Subject: U.S. government FIPS approved algorithms Message-ID: <20181106023342.qN-AwPMxe%predragp@andrew.cmu.edu>

Dear Autonians,

An hour ago I switched LDAP authentication on all our computing nodes and the Auton Lab-maintained desktops to U.S. government FIPS approved algorithms for higher security protection:

https://csrc.nist.gov/csrc/media/publications/fips/140/2/final/documents/fips1402annexa.pdf

I have done multiple tests and authentication appears to be working as expected. Please let me know immediately if you notice any problems with logging into our infrastructure.

Sincerely,
Predrag Punosevac

From vjeansel at andrew.cmu.edu Tue Nov 6 12:43:40 2018 From: vjeansel at andrew.cmu.edu (Vincent Jeanselme) Date: Tue, 6 Nov 2018 12:43:40 -0500 Subject: Jupyter Notebooks Message-ID:

Hello,

Has anyone faced problems running Jupyter Notebook since yesterday? Do you remember what change had to be made to the sqlite database after the last reboot?

Thank you,

Vincent

-- Vincent Jeanselme ----------------- Analyst Researcher Auton Lab - Robotics Institute Carnegie Mellon University

From mbarnes1 at andrew.cmu.edu Tue Nov 6 12:59:19 2018 From: mbarnes1 at andrew.cmu.edu (Matthew Barnes) Date: Tue, 6 Nov 2018 12:59:19 -0500 Subject: Jupyter Notebooks In-Reply-To: References: Message-ID:

Also having this problem. Trying to create a new notebook hangs on "Creating new notebook in", and I am unable to open old notebooks. Is anyone's setup currently working?

On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme wrote: > Hello, > > Has anyone faced problems with running Jupyter Notebook since yesterday ? > Did you remember what was the change to operate after the last reboot of > the sqlite database ? > > Thank you, > > Vincent > > -- > Vincent Jeanselme > ----------------- > Analyst Researcher > Auton Lab - Robotics Institute > Carnegie Mellon University > >

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From predragp at andrew.cmu.edu Tue Nov 6 15:14:27 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 06 Nov 2018 15:14:27 -0500 Subject: Jupyter Notebooks In-Reply-To: References: Message-ID: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>

Matthew Barnes wrote: > Also having this problem.
Trying to create a new notebook hangs on > "Creating new notebook in", and unable to open old notebooks. Anyone's > setup currently working? >

Jupyter Notebook uses an sqlite database to store its info. Unless you explicitly force Jupyter to create the database in the scratch directory, the database is stored on the NFS share. There is nothing worse one can do in terms of data consistency than put a database or a private Git repo (talking about the server) onto NFS. The database was left in an inconsistent state after the file server was rebooted. You have to clear it and possibly recreate the database to be able to use Jupyter Notebook.

Best,
Predrag

> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme > wrote: > > > Hello, > > > > Has anyone faced problems with running Jupyter Notebook since yesterday ? > > Did you remember what was the change to operate after the last reboot of > > the sqlite database ? > > > > Thank you, > > > > Vincent > > > > -- > > Vincent Jeanselme > > ----------------- > > Analyst Researcher > > Auton Lab - Robotics Institute > > Carnegie Mellon University > > > >

From vjeansel at andrew.cmu.edu Tue Nov 6 16:45:10 2018 From: vjeansel at andrew.cmu.edu (Vincent Jeanselme) Date: Tue, 6 Nov 2018 16:45:10 -0500 Subject: Jupyter Notebooks In-Reply-To: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> Message-ID:

Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$USER/.ipython and reinstalling my jupyter, it still does not seem to work. I also have an issue with git: I am no longer able to pull from the server.

On 11/6/18 3:14 PM, Predrag Punosevac wrote: > Matthew Barnes wrote: > >> Also having this problem. Trying to create a new notebook hangs on >> "Creating new notebook in", and unable to open old notebooks. Anyone's >> setup currently working? >> > Jupyter Notebook is using sqlite database to store the info. Unless you > explicitly force Jupyter to create the database on the scratch directory > the database is stored on the NFS share. There is nothing worse one can > do in terms of data consistency than put a database or a private Git > repo (talking about the server) onto the NFS. The datebase was left in > inconsistent state after the file server was rebooted. You have to clear > it and possibly recreate the database to be able to use Jupyter > Notebook. > > Best, > Predrag > > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme >> wrote: > >>> Hello, >>> >>> Has anyone faced problems with running Jupyter Notebook since yesterday ? >>> Did you remember what was the change to operate after the last reboot of >>> the sqlite database ? >>> >>> Thank you, >>> >>> Vincent >>> >>> -- >>> Vincent Jeanselme >>> ----------------- >>> Analyst Researcher >>> Auton Lab - Robotics Institute >>> Carnegie Mellon University >>> >>>

-- Vincent Jeanselme ----------------- Analyst Researcher Auton Lab - Robotics Institute Carnegie Mellon University

From chiragn at cs.cmu.edu Tue Nov 6 17:33:07 2018 From: chiragn at cs.cmu.edu (Chirag Nagpal) Date: Tue, 6 Nov 2018 17:33:07 -0500 Subject: Jupyter Notebooks In-Reply-To: References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> Message-ID:

Predrag: I am able to reproduce the error on the scratch directory too.

Vincent, Matt: are you using the anaconda jupyter? It could be an anaconda upgrade that's responsible?
Chirag On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme wrote: > Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$USER/.ipython > and reinstalling my jupyter, it still does not seem to work. I also have an > issue with git, I am no longer able to pull from the server. > > > On 11/6/18 3:14 PM, Predrag Punosevac wrote: > >> Matthew Barnes wrote: >> >> Also having this problem. Trying to create a new notebook hangs on >>> "Creating new notebook in", and unable to open old notebooks. Anyone's >>> setup currently working? >>> >>> Jupyter Notebook is using sqlite database to store the info. Unless you >> explicitly force Jupyter to create the database on the scratch directory >> the database is stored on the NFS share. There is nothing worse one can >> do in terms of data consistency than put a database or a private Git >> repo (talking about the server) onto the NFS. The datebase was left in >> inconsistent state after the file server was rebooted. You have to clear >> it and possibly recreate the database to be able to use Jupyter >> Notebook. >> >> Best, >> Predrag >> >> >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < >>> vjeansel at andrew.cmu.edu> >>> wrote: >>> >>> Hello, >>>> >>>> Has anyone faced problems with running Jupyter Notebook since yesterday >>>> ? >>>> Did you remember what was the change to operate after the last reboot of >>>> the sqlite database ? >>>> >>>> Thank you, >>>> >>>> Vincent >>>> >>>> -- >>>> Vincent Jeanselme >>>> ----------------- >>>> Analyst Researcher >>>> Auton Lab - Robotics Institute >>>> Carnegie Mellon University >>>> >>>> >>>> -- > Vincent Jeanselme > ----------------- > Analyst Researcher > Auton Lab - Robotics Institute > Carnegie Mellon University > > -- *Chirag Nagpal* Graduate Student, Language Technologies Institute School of Computer Science Carnegie Mellon University cs.cmu.edu/~chiragn -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbarnes1 at andrew.cmu.edu Tue Nov 6 17:42:15 2018 From: mbarnes1 at andrew.cmu.edu (Matthew Barnes) Date: Tue, 6 Nov 2018 17:42:15 -0500 Subject: Jupyter Notebooks In-Reply-To: References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> Message-ID: I'm not using Anaconda. Unsure if this is related, but now also having issues with CUDA, too. On Tue, Nov 6, 2018 at 5:33 PM Chirag Nagpal wrote: > Predrag: I am able to reproduce the error on the scratch directory too. > > Vincent, Matt: are you using the anaconda jupyter? it could be an > anaconda upgrade thats responsible ? > > Chirag > > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme > wrote: > >> Even after changing my bashrc with export >> IPYTHONDIR=/home/scratch/$USER/.ipython and reinstalling my jupyter, it >> still does not seem to work. I also have an issue with git, I am no longer >> able to pull from the server. >> >> >> On 11/6/18 3:14 PM, Predrag Punosevac wrote: >> >>> Matthew Barnes wrote: >>> >>> Also having this problem. Trying to create a new notebook hangs on >>>> "Creating new notebook in", and unable to open old notebooks. Anyone's >>>> setup currently working? >>>> >>>> Jupyter Notebook is using sqlite database to store the info. Unless you >>> explicitly force Jupyter to create the database on the scratch directory >>> the database is stored on the NFS share. There is nothing worse one can >>> do in terms of data consistency than put a database or a private Git >>> repo (talking about the server) onto the NFS. 
The datebase was left in >>> inconsistent state after the file server was rebooted. You have to clear >>> it and possibly recreate the database to be able to use Jupyter >>> Notebook. >>> >>> Best, >>> Predrag >>> >>> >>> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < >>>> vjeansel at andrew.cmu.edu> >>>> wrote: >>>> >>>> Hello, >>>>> >>>>> Has anyone faced problems with running Jupyter Notebook since >>>>> yesterday ? >>>>> Did you remember what was the change to operate after the last reboot >>>>> of >>>>> the sqlite database ? >>>>> >>>>> Thank you, >>>>> >>>>> Vincent >>>>> >>>>> -- >>>>> Vincent Jeanselme >>>>> ----------------- >>>>> Analyst Researcher >>>>> Auton Lab - Robotics Institute >>>>> Carnegie Mellon University >>>>> >>>>> >>>>> -- >> Vincent Jeanselme >> ----------------- >> Analyst Researcher >> Auton Lab - Robotics Institute >> Carnegie Mellon University >> >> > > > -- > > *Chirag Nagpal* Graduate Student, Language Technologies Institute > School of Computer Science > Carnegie Mellon University > cs.cmu.edu/~chiragn >

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mbarnes1 at andrew.cmu.edu Tue Nov 6 18:01:59 2018 From: mbarnes1 at andrew.cmu.edu (Matthew Barnes) Date: Tue, 6 Nov 2018 18:01:59 -0500 Subject: CUDA hangs Message-ID:

Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang:

(torch) bash-4.2$ python
Python 2.7.5 (default, Jul 3 2018, 19:30:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
x>>> x = torch.zeros(4)
>>> x.cuda()

nvidia-smi works, and torch.cuda.is_available() returns True.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From predragp at andrew.cmu.edu Tue Nov 6 20:06:06 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 06 Nov 2018 20:06:06 -0500 Subject: CUDA hangs In-Reply-To: References: Message-ID: <20181107010606.qX0qLZrAb%predragp@andrew.cmu.edu>

Matthew Barnes wrote: > Is anyone else having issues with CUDA since this week? Even simple pytorch > commands hang: >

Do you have issues on all 8 GPU servers you can access (GPU 7 is used for a special project)? I upgraded the driver and CUDA to 9.2 on GPU1. I would not expect pytorch to work after that without reinstalling. GPU3 and GPU4 are reporting 3 GPU cards. That is a bad sign and means dead hardware. I am planning to reboot them and play with them a little bit before making a final diagnosis.

Predrag

> (torch) bash-4.2$ python > Python 2.7.5 (default, Jul 3 2018, 19:30:05) > [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import torch > x>>> x = torch.zeros(4) > >>> x.cuda() > > > nvidia-smi works, and torch.cuda.is_available() returns True.

From qiong.zhang at stat.ubc.ca Tue Nov 6 18:41:30 2018 From: qiong.zhang at stat.ubc.ca (qiong.zhang at stat.ubc.ca) Date: Tue, 06 Nov 2018 23:41:30 +0000 Subject: CUDA hangs In-Reply-To: References: Message-ID:

I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided; they don't work either.

Qiong

November 6, 2018 3:02 PM, "Matthew Barnes" wrote: Is anyone else having issues with CUDA since this week?
Even simple pytorch commands hang:

(torch) bash-4.2$ python
Python 2.7.5 (default, Jul 3 2018, 19:30:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
x>>> x = torch.zeros(4)
>>> x.cuda()

nvidia-smi works, and torch.cuda.is_available() returns True.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From predragp at andrew.cmu.edu Tue Nov 6 20:39:24 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 06 Nov 2018 20:39:24 -0500 Subject: Jupyter Notebooks In-Reply-To: References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> Message-ID: <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>

Chirag Nagpal wrote: > Predrag: I am able to reproduce the error on the scratch directory too. >

I am sure you guys have a problem, but one doesn't need Jupyter to do actual programming in Python. I am saying this because there is only a handful of you who are affected by this behavior (God knows what could have caused it, possibly even regressions from newer versions of packages) and I am the only firefighter, currently without the bandwidth to deal with such wildfires.

> Vincent, Matt: are you using the anaconda jupyter? it could be an anaconda > upgrade thats responsible ? > > Chirag > > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme > wrote: > > > Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$USER/.ipython > > and reinstalling my jupyter, it still does not seem to work. I also have an > > issue with git, I am no longer able to pull from the server. > >

The git issue is an environment variable issue, caused by the fact that there is only one user on the Gogs server (git) and all accounts are just aliases, with their own ssh keys, to this account. I don't use git enough to know it well, but those nasty files in your reponame/.git folder, which look like

predragp at lov3$ ls
branches description HEAD index logs ORIG_HEAD refs
config FETCH_HEAD hooks info objects packed-refs

apparently get populated in different ways depending on the login. So, for example, if I ssh to one of the computing nodes from home I get this:

predragp at lov3$ git pull
Host key fingerprint is SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
+---[ECDSA 256]---+
| .++++o.o.. |
| . .+=+B . |
|. . . . ..Bo. |
|.E + o . o . |
| .= * S |
|o=o* . |
|+=+.o .o |
|o.o+.+. . |
|. +o.. |
+----[SHA256]-----+
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 22 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (22/22), done.
From ssh://git:/predragp/ansible
cfc10cf..71a9aec master -> origin/master
Updating cfc10cf..71a9aec
Fast-forward
Linux/autofs/etc/auto.nfs | 18 +++++++++++++++++
Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt | 0
Linux/ldap/etc/openldap/ldap.conf | 6 +++---
Linux/ldap/etc/sssd/sssd.conf | 2 +-
Linux/ldap/ldap.yaml | 24 +++++++++++++++--------
5 files changed, 38 insertions(+), 12 deletions(-)
rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%)

If I ssh to my desktop I get

predragp at lake$ git pull
Host key fingerprint is SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
+---[ECDSA 256]---+
| .++++o.o.. |
| . .+=+B . |
|. . . . ..Bo. |
|.E + o . o . |
| .= * S |
|o=o* . |
|+=+.o .o |
|o.o+.+. . |
|. +o..
| +----[SHA256]-----+ Password for git at git.int.autonlab.org: which is the indication that my .ssh/config file and the ssh-key were not read even though Host git HostName git.int.autonlab.org Port 2222 User git IdentityFile /home/predragp/.ssh/git_rsa However if I log from the terminal to my desktop I don't have a Git issue. Best, Predrag > > > > On 11/6/18 3:14 PM, Predrag Punosevac wrote: > > > >> Matthew Barnes wrote: > >> > >> Also having this problem. Trying to create a new notebook hangs on > >>> "Creating new notebook in", and unable to open old notebooks. Anyone's > >>> setup currently working? > >>> > >>> Jupyter Notebook is using sqlite database to store the info. Unless you > >> explicitly force Jupyter to create the database on the scratch directory > >> the database is stored on the NFS share. There is nothing worse one can > >> do in terms of data consistency than put a database or a private Git > >> repo (talking about the server) onto the NFS. The datebase was left in > >> inconsistent state after the file server was rebooted. You have to clear > >> it and possibly recreate the database to be able to use Jupyter > >> Notebook. > >> > >> Best, > >> Predrag > >> > >> > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < > >>> vjeansel at andrew.cmu.edu> > >>> wrote: > >>> > >>> Hello, > >>>> > >>>> Has anyone faced problems with running Jupyter Notebook since yesterday > >>>> ? > >>>> Did you remember what was the change to operate after the last reboot of > >>>> the sqlite database ? > >>>> > >>>> Thank you, > >>>> > >>>> Vincent > >>>> > >>>> -- > >>>> Vincent Jeanselme > >>>> ----------------- > >>>> Analyst Researcher > >>>> Auton Lab - Robotics Institute > >>>> Carnegie Mellon University > >>>> > >>>> > >>>> -- > > Vincent Jeanselme > > ----------------- > > Analyst Researcher > > Auton Lab - Robotics Institute > > Carnegie Mellon University > > > > > > > -- > > *Chirag Nagpal* Graduate Student, Language Technologies Institute > School of Computer Science > Carnegie Mellon University > cs.cmu.edu/~chiragn From predragp at andrew.cmu.edu Tue Nov 6 20:41:14 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 06 Nov 2018 20:41:14 -0500 Subject: CUDA hangs In-Reply-To: References: Message-ID: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu> qiong.zhang at stat.ubc.ca wrote: > I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided, doesn't work as well. > Can you tell me which server? We have nine GPU servers. Predrag > Qiong > November 6, 2018 3:02 PM, "Matthew Barnes" )> wrote: > Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang: > (torch) bash-4.2$ python > Python 2.7.5 (default, Jul 3 2018, 19:30:05) > [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import torch > x>>> x = torch.zeros(4) > >>> x.cuda() > nvidia-smi works, and torch.cuda.is_available() returns True. From eyolcu at cs.cmu.edu Tue Nov 6 20:41:29 2018 From: eyolcu at cs.cmu.edu (Emre Yolcu) Date: Tue, 6 Nov 2018 20:41:29 -0500 Subject: CUDA hangs In-Reply-To: References: Message-ID: Could you try setting up everything in the scratch directory and test that way (if that's not what you're already doing)? The last time we had a CUDA problem I moved everything from /zfsauton/home to /home/scratch directories and I cannot reproduce the error on gpu{6,8,9}. 
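If moving everything over is too much at once, a smaller first step is to point the runtime caches at scratch instead of your NFS home. The exact variables depend on your toolchain, and the paths below are only illustrative, but something along these lines in .bashrc is a reasonable starting point:

export CUDA_CACHE_PATH=/home/scratch/$USER/.nv/ComputeCache
export IPYTHONDIR=/home/scratch/$USER/.ipython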
On Tue, Nov 6, 2018 at 6:41 PM, wrote: > I have a similar issue. When I submit the job, it says Runtime error: CUDA > error: unknown error. I tried the simple commands that you provided, > doesn't work as well. > > Qiong > > November 6, 2018 3:02 PM, "Matthew Barnes" <%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>> wrote: > > Is anyone else having issues with CUDA since this week? Even simple > pytorch commands hang: > (torch) bash-4.2$ python > Python 2.7.5 (default, Jul 3 2018, 19:30:05) > [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import torch > x>>> x = torch.zeros(4) > >>> x.cuda() > nvidia-smi works, and torch.cuda.is_available() returns True. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chiragn at cs.cmu.edu Tue Nov 6 20:43:24 2018 From: chiragn at cs.cmu.edu (Chirag Nagpal) Date: Tue, 6 Nov 2018 20:43:24 -0500 Subject: Jupyter Notebooks In-Reply-To: <20181107013924.QogiH68Av%predragp@andrew.cmu.edu> References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> <20181107013924.QogiH68Av%predragp@andrew.cmu.edu> Message-ID: I am working on fixing this. It is indeed SQLite and NFS not talking to each other which is the problem. I am able to get jupyter to behave somewhat better by forcing the SQLite server to be in the scratch instead if the NFS. This requires changing some default flags for jupyter. Its not completely fixed yet, but I will get back with an update soon. Chirag On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac wrote: > Chirag Nagpal wrote: > > > Predrag: I am able to reproduce the error on the scratch directory too. > > > > I am sure you guys have a problem but one doesn't need Jupyter to do > actual programming in Python. I am saying this because there are > handful of you who are affected by this behavior (God knows what could > have caused possibly even regressions by newer version of packages) and > I am the only firefighter currently without bandwidth to deal with such > wild fires. > > > Vincent, Matt: are you using the anaconda jupyter? it could be an > anaconda > > upgrade thats responsible ? > > > > Chirag > > > > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme < > vjeansel at andrew.cmu.edu> > > wrote: > > > > > Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$ > USER/.ipython > > > and reinstalling my jupyter, it still does not seem to work. I also > have an > > > issue with git, I am no longer able to pull from the server. > > > > > > The git issue is environmental variable issue which is caused by the > fact that there is only one user on Gogs server (git) and all accounts > are just aliases with their own ssh-keys to this account. I don't use > and know enough about git but those nasty files in your > > reponame/.git > > folder > > which look like > > predragp at lov3$ ls > branches description HEAD index logs ORIG_HEAD refs > config FETCH_HEAD hooks info objects packed-refs > > apparently get populated in different ways depending on the login. So > for example if I ssh to one of the computing nodes from home I get this > > > predragp at lov3$ git pull > Host key fingerprint is > SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA > +---[ECDSA 256]---+ > | .++++o.o.. | > | . .+=+B . | > |. . . . ..Bo. | > |.E + o . o . | > | .= * S | > |o=o* . | > |+=+.o .o | > |o.o+.+. . | > |. +o.. | > +----[SHA256]-----+ > remote: Enumerating objects: 34, done. 
> remote: Counting objects: 100% (34/34), done. > remote: Compressing objects: 100% (21/21), done. > remote: Total 22 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (22/22), done. > From ssh://git:/predragp/ansible > cfc10cf..71a9aec master -> origin/master > Updating cfc10cf..71a9aec > Fast-forward > Linux/autofs/etc/auto.nfs | 18 > +++++++++++++++++ > Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt | 0 > Linux/ldap/etc/openldap/ldap.conf | 6 +++--- > Linux/ldap/etc/sssd/sssd.conf | 2 +- > Linux/ldap/ldap.yaml | 24 > +++++++++++++++-------- > 5 files changed, 38 insertions(+), 12 deletions(-) > rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%) > > > If I ssh to my desktop I get > > predragp at lake$ git pull > Host key fingerprint is > SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA > +---[ECDSA 256]---+ > | .++++o.o.. | > | . .+=+B . | > |. . . . ..Bo. | > |.E + o . o . | > | .= * S | > |o=o* . | > |+=+.o .o | > |o.o+.+. . | > |. +o.. | > +----[SHA256]-----+ > Password for git at git.int.autonlab.org: > > which is the indication that my .ssh/config file and the ssh-key were > not read even though > > Host git > HostName git.int.autonlab.org > Port 2222 > User git > IdentityFile /home/predragp/.ssh/git_rsa > > However if I log from the terminal to my desktop I don't have a Git > issue. > > > Best, > Predrag > > > > > > > > > > > > On 11/6/18 3:14 PM, Predrag Punosevac wrote: > > > > > >> Matthew Barnes wrote: > > >> > > >> Also having this problem. Trying to create a new notebook hangs on > > >>> "Creating new notebook in", and unable to open old notebooks. > Anyone's > > >>> setup currently working? > > >>> > > >>> Jupyter Notebook is using sqlite database to store the info. Unless > you > > >> explicitly force Jupyter to create the database on the scratch > directory > > >> the database is stored on the NFS share. There is nothing worse one > can > > >> do in terms of data consistency than put a database or a private Git > > >> repo (talking about the server) onto the NFS. The datebase was left in > > >> inconsistent state after the file server was rebooted. You have to > clear > > >> it and possibly recreate the database to be able to use Jupyter > > >> Notebook. > > >> > > >> Best, > > >> Predrag > > >> > > >> > > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < > > >>> vjeansel at andrew.cmu.edu> > > >>> wrote: > > >>> > > >>> Hello, > > >>>> > > >>>> Has anyone faced problems with running Jupyter Notebook since > yesterday > > >>>> ? > > >>>> Did you remember what was the change to operate after the last > reboot of > > >>>> the sqlite database ? > > >>>> > > >>>> Thank you, > > >>>> > > >>>> Vincent > > >>>> > > >>>> -- > > >>>> Vincent Jeanselme > > >>>> ----------------- > > >>>> Analyst Researcher > > >>>> Auton Lab - Robotics Institute > > >>>> Carnegie Mellon University > > >>>> > > >>>> > > >>>> -- > > > Vincent Jeanselme > > > ----------------- > > > Analyst Researcher > > > Auton Lab - Robotics Institute > > > Carnegie Mellon University > > > > > > > > > > > > -- > > > > *Chirag Nagpal* Graduate Student, Language Technologies Institute > > School of Computer Science > > Carnegie Mellon University > > cs.cmu.edu/~chiragn > -- *Chirag Nagpal* Graduate Student, Language Technologies Institute School of Computer Science Carnegie Mellon University cs.cmu.edu/~chiragn -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From chiragn at cs.cmu.edu Tue Nov 6 21:39:30 2018 From: chiragn at cs.cmu.edu (Chirag Nagpal) Date: Tue, 6 Nov 2018 21:39:30 -0500 Subject: Jupyter Notebooks In-Reply-To: References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> <20181107013924.QogiH68Av%predragp@andrew.cmu.edu> Message-ID:

Ok, I think I have a quick fix for people struggling with Jupyter notebooks on NFS. There are essentially two parts to the problem. The first one deals with forcing jupyter to create its SQLite db in your scratch directory. The second part is the ipython directory.

Part 1:

Step 1: First ssh into the Auton computing environment and run

$ jupyter notebook --generate-config

This would create a config file 'jupyter_notebook_config.py' in ~/.jupyter (note that ~ here is /zfsauton/home/<username>).

Step 2: Edit the file above with your favourite text editor and add (or uncomment) the following line

c.NotebookNotary.db_file='/home/scratch/<username>/jupyter.log'

replacing <username> with your Auton username.

If you executed part 1 correctly, you should be able to create a new notebook and stop the IPython server using control-c. However, you will not be able to connect Jupyter to IPython, for which you need to perform part 2.

Part 2:

$ export IPYTHONDIR=/home/scratch/<username>/ipython

As always, replace <username> with your Auton username. You can also add this to .bashrc for next time ;)

That's it! It should work now!

Chirag

On Tue, Nov 6, 2018 at 8:43 PM, Chirag Nagpal wrote: > I am working on fixing this. > > It is indeed SQLite and NFS not talking to each other which is the > problem. I am able to get jupyter to behave somewhat better by forcing the > SQLite server to be in the scratch instead if the NFS. This requires > changing some default flags for jupyter. > > Its not completely fixed yet, but I will get back with an update soon. > > Chirag > > > > On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac > wrote: > >> Chirag Nagpal wrote: >> >> > Predrag: I am able to reproduce the error on the scratch directory too. >> > >> >> I am sure you guys have a problem but one doesn't need Jupyter to do >> actual programming in Python. I am saying this because there are >> handful of you who are affected by this behavior (God knows what could >> have caused possibly even regressions by newer version of packages) and >> I am the only firefighter currently without bandwidth to deal with such >> wild fires. >> >> > Vincent, Matt: are you using the anaconda jupyter? it could be an >> anaconda >> > upgrade thats responsible ? >> > >> > Chirag >> > >> > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme < >> vjeansel at andrew.cmu.edu> >> > wrote: >> > >> > > Even after changing my bashrc with export >> IPYTHONDIR=/home/scratch/$USER/.ipython >> > > and reinstalling my jupyter, it still does not seem to work. I also >> have an >> > > issue with git, I am no longer able to pull from the server. >> > > >> >> >> The git issue is environmental variable issue which is caused by the >> fact that there is only one user on Gogs server (git) and all accounts >> are just aliases with their own ssh-keys to this account. I don't use >> and know enough about git but those nasty files in your >> >> reponame/.git >> >> folder >> >> which look like >> >> predragp at lov3$ ls >> branches description HEAD index logs ORIG_HEAD refs >> config FETCH_HEAD hooks info objects packed-refs >> >> apparently get populated in different ways depending on the login.
So >> for example if I ssh to one of the computing nodes from home I get this >> >> >> predragp at lov3$ git pull >> Host key fingerprint is >> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA >> +---[ECDSA 256]---+ >> | .++++o.o.. | >> | . .+=+B . | >> |. . . . ..Bo. | >> |.E + o . o . | >> | .= * S | >> |o=o* . | >> |+=+.o .o | >> |o.o+.+. . | >> |. +o.. | >> +----[SHA256]-----+ >> remote: Enumerating objects: 34, done. >> remote: Counting objects: 100% (34/34), done. >> remote: Compressing objects: 100% (21/21), done. >> remote: Total 22 (delta 8), reused 0 (delta 0) >> Unpacking objects: 100% (22/22), done. >> From ssh://git:/predragp/ansible >> cfc10cf..71a9aec master -> origin/master >> Updating cfc10cf..71a9aec >> Fast-forward >> Linux/autofs/etc/auto.nfs | 18 >> +++++++++++++++++ >> Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt | 0 >> Linux/ldap/etc/openldap/ldap.conf | 6 +++--- >> Linux/ldap/etc/sssd/sssd.conf | 2 +- >> Linux/ldap/ldap.yaml | 24 >> +++++++++++++++-------- >> 5 files changed, 38 insertions(+), 12 deletions(-) >> rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%) >> >> >> If I ssh to my desktop I get >> >> predragp at lake$ git pull >> Host key fingerprint is >> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA >> +---[ECDSA 256]---+ >> | .++++o.o.. | >> | . .+=+B . | >> |. . . . ..Bo. | >> |.E + o . o . | >> | .= * S | >> |o=o* . | >> |+=+.o .o | >> |o.o+.+. . | >> |. +o.. | >> +----[SHA256]-----+ >> Password for git at git.int.autonlab.org: >> >> which is the indication that my .ssh/config file and the ssh-key were >> not read even though >> >> Host git >> HostName git.int.autonlab.org >> Port 2222 >> User git >> IdentityFile /home/predragp/.ssh/git_rsa >> >> However if I log from the terminal to my desktop I don't have a Git >> issue. >> >> >> Best, >> Predrag >> >> >> >> >> >> >> > > >> > > On 11/6/18 3:14 PM, Predrag Punosevac wrote: >> > > >> > >> Matthew Barnes wrote: >> > >> >> > >> Also having this problem. Trying to create a new notebook hangs on >> > >>> "Creating new notebook in", and unable to open old notebooks. >> Anyone's >> > >>> setup currently working? >> > >>> >> > >>> Jupyter Notebook is using sqlite database to store the info. Unless >> you >> > >> explicitly force Jupyter to create the database on the scratch >> directory >> > >> the database is stored on the NFS share. There is nothing worse one >> can >> > >> do in terms of data consistency than put a database or a private Git >> > >> repo (talking about the server) onto the NFS. The datebase was left >> in >> > >> inconsistent state after the file server was rebooted. You have to >> clear >> > >> it and possibly recreate the database to be able to use Jupyter >> > >> Notebook. >> > >> >> > >> Best, >> > >> Predrag >> > >> >> > >> >> > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < >> > >>> vjeansel at andrew.cmu.edu> >> > >>> wrote: >> > >>> >> > >>> Hello, >> > >>>> >> > >>>> Has anyone faced problems with running Jupyter Notebook since >> yesterday >> > >>>> ? >> > >>>> Did you remember what was the change to operate after the last >> reboot of >> > >>>> the sqlite database ? 
>> > >>>> >> > >>>> Thank you, >> > >>>> >> > >>>> Vincent >> > >>>> >> > >>>> -- >> > >>>> Vincent Jeanselme >> > >>>> ----------------- >> > >>>> Analyst Researcher >> > >>>> Auton Lab - Robotics Institute >> > >>>> Carnegie Mellon University >> > >>>> >> > >>>> >> > >>>> -- >> > > Vincent Jeanselme >> > > ----------------- >> > > Analyst Researcher >> > > Auton Lab - Robotics Institute >> > > Carnegie Mellon University >> > > >> > > >> > >> > >> > -- >> > >> > *Chirag Nagpal* Graduate Student, Language Technologies Institute >> > School of Computer Science >> > Carnegie Mellon University >> > cs.cmu.edu/~chiragn >> > > > > -- > > *Chirag Nagpal* Graduate Student, Language Technologies Institute > School of Computer Science > Carnegie Mellon University > cs.cmu.edu/~chiragn > -- *Chirag Nagpal* Graduate Student, Language Technologies Institute School of Computer Science Carnegie Mellon University cs.cmu.edu/~chiragn -------------- next part -------------- An HTML attachment was scrubbed... URL: From yichongx at cs.cmu.edu Tue Nov 6 21:43:29 2018 From: yichongx at cs.cmu.edu (Yichong Xu) Date: Wed, 7 Nov 2018 02:43:29 +0000 Subject: CUDA hangs In-Reply-To: References: Message-ID: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> Previously we have encountered this issue: Basically somehow you cannot put your cuda cache on nfs server now. Doing this will resolve the problem (works for me): export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder] Thanks, Yichong On Nov 6, 2018, at 7:41 PM, Emre Yolcu > wrote: Could you try setting up everything in the scratch directory and test that way (if that's not what you're already doing)? The last time we had a CUDA problem I moved everything from /zfsauton/home to /home/scratch directories and I cannot reproduce the error on gpu{6,8,9}. On Tue, Nov 6, 2018 at 6:41 PM, > wrote: I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided, doesn't work as well. Qiong November 6, 2018 3:02 PM, "Matthew Barnes" > wrote: Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang: (torch) bash-4.2$ python Python 2.7.5 (default, Jul 3 2018, 19:30:05) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import torch x>>> x = torch.zeros(4) >>> x.cuda() nvidia-smi works, and torch.cuda.is_available() returns True. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbarnes1 at andrew.cmu.edu Tue Nov 6 21:51:19 2018 From: mbarnes1 at andrew.cmu.edu (Matthew Barnes) Date: Tue, 6 Nov 2018 21:51:19 -0500 Subject: CUDA hangs In-Reply-To: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> References: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> Message-ID: The CUDA_CACHE_PATH works! Thanks for the quick fix. On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu wrote: > Previously we have encountered this issue: Basically somehow you cannot > put your cuda cache on nfs server now. Doing this will resolve the problem > (works for me): > export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder] > > *Thanks,* > *Yichong* > > > > On Nov 6, 2018, at 7:41 PM, Emre Yolcu wrote: > > Could you try setting up everything in the scratch directory and test that > way (if that's not what you're already doing)? 
The last time we had a CUDA > problem I moved everything from /zfsauton/home to /home/scratch directories > and I cannot reproduce the error on gpu{6,8,9}. > > On Tue, Nov 6, 2018 at 6:41 PM, wrote: > >> I have a similar issue. When I submit the job, it says Runtime error: >> CUDA error: unknown error. I tried the simple commands that you provided, >> doesn't work as well. >> >> Qiong >> >> November 6, 2018 3:02 PM, "Matthew Barnes" > <%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>> wrote: >> >> Is anyone else having issues with CUDA since this week? Even simple >> pytorch commands hang: >> (torch) bash-4.2$ python >> Python 2.7.5 (default, Jul 3 2018, 19:30:05) >> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import torch >> x>>> x = torch.zeros(4) >> >>> x.cuda() >> nvidia-smi works, and torch.cuda.is_available() returns True. >> >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vjeansel at andrew.cmu.edu Tue Nov 6 22:01:18 2018 From: vjeansel at andrew.cmu.edu (Vincent Jeanselme) Date: Tue, 6 Nov 2018 22:01:18 -0500 Subject: CUDA hangs In-Reply-To: References: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> Message-ID: Unfortunately not for me, I already had this path ... Le 06/11/2018 ? 21:51, Matthew Barnes a ?crit?: > The CUDA_CACHE_PATH works! Thanks for the quick fix. > > On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu > wrote: > > Previously we have encountered this issue: Basically somehow you > cannot put your cuda cache on nfs server now. Doing this will > resolve the problem (works for me): > export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder] > > /Thanks,/ > /Yichong/ > > > >> On Nov 6, 2018, at 7:41 PM, Emre Yolcu > > wrote: >> >> Could you try setting up everything in the scratch directory and >> test that way (if that's not what you're already doing)? The last >> time we had a CUDA problem I moved everything from /zfsauton/home >> to /home/scratch directories and I cannot reproduce the error on >> gpu{6,8,9}. >> >> On Tue, Nov 6, 2018 at 6:41 PM, > > wrote: >> >> I have a similar issue. When I submit the job, it says >> Runtime error: CUDA error: unknown error. I tried the simple >> commands that you provided, doesn't work as well. >> >> Qiong >> >> >> November 6, 2018 3:02 PM, "Matthew Barnes" >> > > >> wrote: >> >> Is anyone else having issues with CUDA since this week? >> Even simple pytorch commands hang: >> (torch) bash-4.2$ python >> Python 2.7.5 (default, Jul 3 2018, 19:30:05) >> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 >> Type "help", "copyright", "credits" or "license" for more >> information. >> >>> import torch >> x>>> x = torch.zeros(4) >> >>> x.cuda() >> nvidia-smi works, and torch.cuda.is_available() returns True. >> >> >> >> > -- Vincent Jeanselme ----------------- Analyst Researcher Auton Lab - Robotics Institute Carnegie Mellon University -------------- next part -------------- An HTML attachment was scrubbed... URL: From vjeansel at andrew.cmu.edu Tue Nov 6 22:06:53 2018 From: vjeansel at andrew.cmu.edu (Vincent Jeanselme) Date: Tue, 6 Nov 2018 22:06:53 -0500 Subject: Jupyter Notebooks In-Reply-To: References: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu> <20181107013924.QogiH68Av%predragp@andrew.cmu.edu> Message-ID: <94b2672f-cf66-f6ea-4780-f04305d1cb61@andrew.cmu.edu> Thank you ! Uncommenting the line: c.NotebookNotary.db_file worked for my Jupyter Le 06/11/2018 ? 
21:39, Chirag Nagpal a ?crit?: > Ok I think I have a quick fix for people struggling with jupyter > notebook on NFS. > > There are essentially two parts to the problem. The first one deals > with forcing jupyter to create its SQLite db in your scratch > directory. The second part is the ipython directory. > > > Part 1: > > Step 1?: First ssh into the Auton Computing Environment and run > > $jupyter notebook --generate-config > > This would create a config file 'jupyter_notebook_config.py' in > ~/.jupyter . ( Note that ~ this is /zfsauton/home/ ) > > Step 2: Edit the file above with your favourite text editor and add > (or uncomment) the following line > > c.NotebookNotary.db_file='/home/scratch//jupyter.log' > > replace with your auton username. > > If you executed part 1 correctly, you should be able to create a new > Notebook, and stop the Ipython server using control-c. However, you > will not be able to connect Jupyter to Ipython for which you need to > perform part 2 > > Part 2: > > $export IPYTHONDIR=/home/scratch//ipython > > as always replace with your auton username. > > you can also add this to .bashrc for next time ;) > > Thats it! > > It should work now! > > Chirag > > > > > > > > > On Tue, Nov 6, 2018 at 8:43 PM, Chirag Nagpal > wrote: > > I am working on fixing this. > > It is indeed SQLite and NFS not talking to each other which is the > problem. I am able to get jupyter to behave somewhat better by > forcing the SQLite server to be in the scratch instead if the NFS. > This requires changing some default flags for jupyter. > > Its not completely fixed yet, but I will get back with an update > soon. > > Chirag > > > > On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac > > wrote: > > Chirag Nagpal > > wrote: > > > Predrag: I am able to reproduce the error on the scratch > directory too. > > > > I am sure you guys have a problem but one doesn't need Jupyter > to do > actual programming in Python. I am saying this because there are > handful of you who are affected by this behavior (God knows > what could > have caused possibly even regressions by newer version of > packages) and > I am the only firefighter currently without bandwidth to deal > with such > wild fires. > > > Vincent, Matt:? are you using the anaconda jupyter? it could > be an anaconda > > upgrade thats responsible ? > > > > Chirag > > > > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme > > > > wrote: > > > > > Even after changing my bashrc with export > IPYTHONDIR=/home/scratch/$USER/.ipython > > > and reinstalling my jupyter, it still does not seem to > work. I also have an > > > issue with git, I am no longer able to pull from the server. > > > > > > The git issue is environmental variable issue which is caused > by the > fact that there is only one user on Gogs server (git) and all > accounts > are just aliases with their own ssh-keys to this account. I > don't use > and know enough about git but those nasty files in your > > reponame/.git > > folder > > which look like > > predragp at lov3$ ls > branches? description? HEAD? ?index? logs ?ORIG_HEAD? ? refs > config? ? FETCH_HEAD? ?hooks? info? ?objects packed-refs > > apparently get populated in different ways depending on the > login. So > for example if I ssh to one of the computing nodes from home I > get this > > > predragp at lov3$ git pull > Host key fingerprint is > SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA > +---[ECDSA 256]---+ > |? ? ?.++++o.o..? | > |? ? ?. .+=+B .? ?| > |. . . . ..Bo.? ? | > |.E + o . o? .? ? | > | .= *? ?S? ? ? ? 
| > |o=o*? .? ? ? ? ? | > |+=+.o .o? ? ? ? ?| > |o.o+.+. .? ? ? ? | > |.? ?+o..? ? ? ? ?| > +----[SHA256]-----+ > remote: Enumerating objects: 34, done. > remote: Counting objects: 100% (34/34), done. > remote: Compressing objects: 100% (21/21), done. > remote: Total 22 (delta 8), reused 0 (delta 0) > Unpacking objects: 100% (22/22), done. > From ssh://git:/predragp/ansible > ? ?cfc10cf..71a9aec? master? ? ?-> origin/master > Updating cfc10cf..71a9aec > Fast-forward > ?Linux/autofs/etc/auto.nfs ?| 18 > +++++++++++++++++ > ?Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt |? 0 > ?Linux/ldap/etc/openldap/ldap.conf ? ? ?|? 6 +++--- > ?Linux/ldap/etc/sssd/sssd.conf ? ? ?|? 2 +- > ?Linux/ldap/ldap.yaml | 24 > +++++++++++++++-------- > ?5 files changed, 38 insertions(+), 12 deletions(-) > ?rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%) > > > If I ssh to my desktop I get > > predragp at lake$ git pull > Host key fingerprint is > SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA > +---[ECDSA 256]---+ > |? ? ?.++++o.o..? | > |? ? ?. .+=+B .? ?| > |. . . . ..Bo.? ? | > |.E + o . o? .? ? | > | .= *? ?S? ? ? ? | > |o=o*? .? ? ? ? ? | > |+=+.o .o? ? ? ? ?| > |o.o+.+. .? ? ? ? | > |.? ?+o..? ? ? ? ?| > +----[SHA256]-----+ > Password for git at git.int.autonlab.org > : > > which is the indication that my .ssh/config file and the > ssh-key were > not read even though > > Host git > ? ? HostName git.int.autonlab.org > ? ? Port 2222 > ? ? User git > ? ? IdentityFile /home/predragp/.ssh/git_rsa > > However if I log from the terminal to my desktop I don't have > a Git > issue. > > > Best, > Predrag > > > > > > > > > > > > On 11/6/18 3:14 PM, Predrag Punosevac wrote: > > > > > >> Matthew Barnes > wrote: > > >> > > >> Also having this problem. Trying to create a new notebook > hangs on > > >>> "Creating new notebook in", and unable to open old > notebooks. Anyone's > > >>> setup currently working? > > >>> > > >>> Jupyter Notebook is using sqlite database to store the > info. Unless you > > >> explicitly force Jupyter to create the database on the > scratch directory > > >> the database is stored on the NFS share. There is nothing > worse one can > > >> do in terms of data consistency than put a database or a > private Git > > >> repo (talking about the server) onto the NFS. The > datebase was left in > > >> inconsistent state after the file server was rebooted. > You have to clear > > >> it and possibly recreate the database to be able to use > Jupyter > > >> Notebook. > > >> > > >> Best, > > >> Predrag > > >> > > >> > > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme < > > >>> vjeansel at andrew.cmu.edu > > > >>> wrote: > > >>> > > >>> Hello, > > >>>> > > >>>> Has anyone faced problems with running Jupyter Notebook > since yesterday > > >>>> ? > > >>>> Did you remember what was the change to operate after > the last reboot of > > >>>> the sqlite database ? 
> > >>>> > > >>>> Thank you, > > >>>> > > >>>> Vincent > > >>>> > > >>>> -- > > >>>> Vincent Jeanselme > > >>>> ----------------- > > >>>> Analyst Researcher > > >>>> Auton Lab - Robotics Institute > > >>>> Carnegie Mellon University > > >>>> > > >>>> > > >>>> -- > > > Vincent Jeanselme > > > ----------------- > > > Analyst Researcher > > > Auton Lab - Robotics Institute > > > Carnegie Mellon University > > > > > > > > > > > > -- > > > > *Chirag Nagpal* Graduate Student, Language Technologies > Institute > > School of Computer Science > > Carnegie Mellon University > > cs.cmu.edu/~chiragn > > > > > -- > *Chirag Nagpal > *Graduate Student, Language Technologies Institute > School of Computer Science > Carnegie Mellon University > cs.cmu.edu/~chiragn > > > > > -- > *Chirag Nagpal > * Graduate Student, Language Technologies Institute > School of Computer Science > Carnegie Mellon University > cs.cmu.edu/~chiragn -- Vincent Jeanselme ----------------- Analyst Researcher Auton Lab - Robotics Institute Carnegie Mellon University -------------- next part -------------- An HTML attachment was scrubbed... URL: From qiong.zhang at stat.ubc.ca Tue Nov 6 23:58:32 2018 From: qiong.zhang at stat.ubc.ca (qiong.zhang at stat.ubc.ca) Date: Wed, 07 Nov 2018 04:58:32 +0000 Subject: CUDA hangs In-Reply-To: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu> References: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu> Message-ID: <8ba316a89bd0eed2ce0cbb75a959c545@stat.ubc.ca> The CUDA_CACHE_PATH works for me! Thanks. Qiong November 6, 2018 5:41 PM, "Predrag Punosevac" wrote: > qiong.zhang at stat.ubc.ca wrote: > >> I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I >> tried the simple commands that you provided, doesn't work as well. > > Can you tell me which server? We have nine GPU servers. > > Predrag > >> Qiong >> November 6, 2018 3:02 PM, "Matthew Barnes" > (mailto:%22Matthew%20Barnes%22%20)> wrote: >> Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang: >> (torch) bash-4.2$ python >> Python 2.7.5 (default, Jul 3 2018, 19:30:05) >> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> import torch >> x>>> x = torch.zeros(4) >> x.cuda() >> nvidia-smi works, and torch.cuda.is_available() returns True. From vjeansel at andrew.cmu.edu Wed Nov 7 08:52:34 2018 From: vjeansel at andrew.cmu.edu (Vincent Jeanselme) Date: Wed, 7 Nov 2018 08:52:34 -0500 Subject: CUDA hangs In-Reply-To: References: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> Message-ID: Problem solved after restart of tmux On 11/6/18 10:01 PM, Vincent Jeanselme wrote: > > Unfortunately not for me, I already had this path ... > > Le 06/11/2018 ? 21:51, Matthew Barnes a ?crit?: >> The CUDA_CACHE_PATH works! Thanks for the quick fix. >> >> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu > > wrote: >> >> Previously we have encountered this issue: Basically somehow you >> cannot put your cuda cache on nfs server now. Doing this will >> resolve the problem (works for me): >> export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder] >> >> /Thanks,/ >> /Yichong/ >> >> >> >>> On Nov 6, 2018, at 7:41 PM, Emre Yolcu >> > wrote: >>> >>> Could you try setting up everything in the scratch directory and >>> test that way (if that's not what you're already doing)? 
The >>> last time we had a CUDA problem I moved everything from >>> /zfsauton/home to /home/scratch directories and I cannot >>> reproduce the error on gpu{6,8,9}. >>> >>> On Tue, Nov 6, 2018 at 6:41 PM, >> > wrote: >>> >>> I have a similar issue. When I submit the job, it says >>> Runtime error: CUDA error: unknown error. I tried the simple >>> commands that you provided, doesn't work as well. >>> >>> Qiong >>> >>> >>> November 6, 2018 3:02 PM, "Matthew Barnes" >>> >> > >>> wrote: >>> >>> Is anyone else having issues with CUDA since this week? >>> Even simple pytorch commands hang: >>> (torch) bash-4.2$ python >>> Python 2.7.5 (default, Jul 3 2018, 19:30:05) >>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 >>> Type "help", "copyright", "credits" or "license" for >>> more information. >>> >>> import torch >>> x>>> x = torch.zeros(4) >>> >>> x.cuda() >>> nvidia-smi works, and torch.cuda.is_available() returns >>> True. >>> >>> >>> >>> >> > -- > Vincent Jeanselme > ----------------- > Analyst Researcher > Auton Lab - Robotics Institute > Carnegie Mellon University -- Vincent Jeanselme ----------------- Analyst Researcher Auton Lab - Robotics Institute Carnegie Mellon University -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Wed Nov 7 12:22:05 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Wed, 07 Nov 2018 12:22:05 -0500 Subject: Moral of the story In-Reply-To: References: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu> Message-ID: <20181107172205.cWJ60phln%predragp@andrew.cmu.edu> Vincent Jeanselme wrote: > Problem solved after restart of tmux > This is a good opportunity for all of us to reflect on what we have learnt from this long public e-mail exchange. 1. Caching thing be it pytorch, ccache, or something else speeds up the things but create lot of problems when done on the volatile file system as NFS backed up by the most expensive file system ZFS. It creates unexpected hard to trace errors in the case of the file server unavailability. However from a system admin point of view create enormous garbage on the file server in the form of metadata needed to store hourly snapshots. I would wage $100 that we probably have 500GB in cache files and their snapshots alone on the main file server. I would really appreciate if everyone volunteerly uses only their scratch directories (not /tmp not NFS) for caching as well as clean their home directories during this time when ZFS snapshots are disabled. 2. Storing databases on NFS even unconsciously (sqlite used by Jupyter notebook) will sooner or later leave them in unconsistent state and lead to user frustration which is very hard and time consuming to trace and address. It is even worse doing it intensionally with PostgreSQL or MySQL. Please store your Jupyter notebooks sqlite databases on the scratch directory. For everything else more serious, we have database host that can be used on the need base. 3. Finally we all need to familiarize ourselves better with the tools we are using (Git/Gogs/tmux/screen etc). The decision that we adopt Git as a version control system for the Auton Lab was a long and carefully thought-out. For the record my opinion and my preference (fossil) didn't bare almost any weight. We had two other version control systems CVS and Subversion in the past which are still available as read only through ViewVC http://svnhub.int.autonlab.org/viewvc and I can assure you that we learnt the lectures by using them. 
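To make points 1 and 2 concrete, here is roughly what this thread has converged on. Treat it as a sketch only: the paths follow the /home/scratch/$USER convention used above, and the folder names (cuda_cache, ipython) are just illustrative, so substitute whatever you prefer. In ~/.bashrc:

# keep the CUDA JIT cache off the NFS share (Yichong's fix)
export CUDA_CACHE_PATH=/home/scratch/$USER/cuda_cache
# keep IPython state off the NFS share (part 2 of Chirag's fix)
export IPYTHONDIR=/home/scratch/$USER/ipython
# make sure the directories actually exist
mkdir -p /home/scratch/$USER/cuda_cache /home/scratch/$USER/ipython

Then, per part 1 of Chirag's fix, generate ~/.jupyter/jupyter_notebook_config.py with "jupyter notebook --generate-config" and point c.NotebookNotary.db_file at a file under /home/scratch/<your username>, so the notebook signature database (sqlite) never lives on NFS. Keep in mind that exports in ~/.bashrc only reach shells started after the change; long-running tmux/screen shells keep the environment they were started with until you re-source ~/.bashrc or restart them, which is presumably why restarting tmux cleared it up for Vincent. Now back to point 3.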
The same goes for the Gogs self-hosted Git service which provides us with web interface but also with bug tracking mechanism with code tagging, Wiki, and solid integration with Jenkins. Is it perfect? No it is not. Does one need to understand how the ssh-keys and environmental variables are read. Yes you have to get your feet wet and it is far easier to do it at the Auton Lab which is very forgiving academic computing environment than at your next place of employment. If you think that the Gogs alternative GitLab is any better think again and just talk to people who used it or God forbid try to set it up. Best, Predrag > On 11/6/18 10:01 PM, Vincent Jeanselme wrote: > > > > Unfortunately not for me, I already had this path ... > > > > Le 06/11/2018 ?? 21:51, Matthew Barnes a ??crit??: > >> The CUDA_CACHE_PATH works! Thanks for the quick fix. > >> > >> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu >> > wrote: > >> > >> Previously we have encountered this issue: Basically somehow you > >> cannot put your cuda cache on nfs server now. Doing this will > >> resolve the problem (works for me): > >> export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder] > >> > >> /Thanks,/ > >> /Yichong/ > >> > >> > >> > >>> On Nov 6, 2018, at 7:41 PM, Emre Yolcu >>> > wrote: > >>> > >>> Could you try setting up everything in the scratch directory and > >>> test that way (if that's not what you're already doing)? The > >>> last time we had a CUDA problem I moved everything from > >>> /zfsauton/home to /home/scratch directories and I cannot > >>> reproduce the error on gpu{6,8,9}. > >>> > >>> On Tue, Nov 6, 2018 at 6:41 PM, >>> > wrote: > >>> > >>> I have a similar issue. When I submit the job, it says > >>> Runtime error: CUDA error: unknown error. I tried the simple > >>> commands that you provided, doesn't work as well. > >>> > >>> Qiong > >>> > >>> > >>> November 6, 2018 3:02 PM, "Matthew Barnes" > >>> >>> > > >>> wrote: > >>> > >>> Is anyone else having issues with CUDA since this week? > >>> Even simple pytorch commands hang: > >>> (torch) bash-4.2$ python > >>> Python 2.7.5 (default, Jul 3 2018, 19:30:05) > >>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 > >>> Type "help", "copyright", "credits" or "license" for > >>> more information. > >>> >>> import torch > >>> x>>> x = torch.zeros(4) > >>> >>> x.cuda() > >>> nvidia-smi works, and torch.cuda.is_available() returns > >>> True. > >>> > >>> > >>> > >>> > >> > > -- > > Vincent Jeanselme > > ----------------- > > Analyst Researcher > > Auton Lab - Robotics Institute > > Carnegie Mellon University > > -- > Vincent Jeanselme > ----------------- > Analyst Researcher > Auton Lab - Robotics Institute > Carnegie Mellon University > From predragp at andrew.cmu.edu Fri Nov 9 17:45:38 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Fri, 9 Nov 2018 17:45:38 -0500 Subject: GPU3 and GPU4 to be rebooted for hardware failure assessment Message-ID: Dear Autonians, I need to reboot GPU3 and GPU4 servers to asses the health of GPU cards. If nobody says anything I will do it on Monday at 2:30 PM. Cheers, Predrag P.S. I do have spare GPU cards but it will take at least 24h after we power up the server to decide if the GPU cards are really dead. From predragp at andrew.cmu.edu Mon Nov 12 16:58:42 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 12 Nov 2018 16:58:42 -0500 Subject: GPU3 and GPU4 rebooted Message-ID: Per my Friday announcement GPU3 and GPU4 have been rebooted and updated. 
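For anyone who wants a quick way to put sustained load on every card while confirming they work, something along these lines should do it. This is only a rough sketch: it assumes a working PyTorch install, and the matrix size, iteration count, and card indices 0-3 are illustrative, so check nvidia-smi for the actual layout on the machine you are on.

for i in 0 1 2 3; do
  # each background job sees exactly one card and runs a few hundred large matrix multiplies on it
  CUDA_VISIBLE_DEVICES=$i python -c "import torch; x = torch.randn(8000, 8000, device='cuda'); [x.mm(x).sum() for _ in range(200)]; torch.cuda.synchronize()" &
done
wait
nvidia-smi

Because CUDA_VISIBLE_DEVICES pins each job to a single card, a flaky card should show up as one of the four processes erroring out or hanging while the other three finish.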
All four GPU cards (4 per servers) are now visible with nvidia-smi utility. It is of paramount importance that people hit these two machines hard with GPU intensive computations so that we see if that report of dead GPU cards was just a fluke or a real thing. Best, Predrag From awd at cs.cmu.edu Mon Nov 26 09:28:48 2018 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 26 Nov 2018 09:28:48 -0500 Subject: Auton Lab NIPS practice session: Thu 11/29 3pm, place TBD Message-ID: Team. Please mark your calendars for the time slot in the subject line above. I will follow up with the location as soon as we get it. If you have a paper or poster to present at NIPS or any of its workshops please come prepared with your talk and/or poster. We will emulate what happens at the conference so that you can practice. Fabian: we can project your poster on a screen and have you skyped-in for audio. Please work with Predrag to get connected. Thanks Artur -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Mon Nov 26 09:30:23 2018 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 26 Nov 2018 09:30:23 -0500 Subject: Auton Lab NIPS practice session: Thu 11/29 3pm, place TBD In-Reply-To: References: Message-ID: We will be in NSH 4201 On Mon, Nov 26, 2018 at 9:28 AM Artur Dubrawski wrote: > Team. > > Please mark your calendars for the time slot in the subject line above. > I will follow up with the location as soon as we get it. > > If you have a paper or poster to present at NIPS or any of its workshops > please come prepared with your talk and/or poster. We will emulate what > happens at the conference so that you can practice. > > Fabian: we can project your poster on a screen and have you skyped-in > for audio. Please work with Predrag to get connected. > > Thanks > Artur > -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Mon Nov 26 09:33:34 2018 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 26 Nov 2018 09:33:34 -0500 Subject: Fwd: RI Ph.D. Thesis Proposal: Chao Liu In-Reply-To: <4ab88d6ad5734673be16a56048d06d02@cmu.edu> References: <4ab88d6ad5734673be16a56048d06d02@cmu.edu> Message-ID: Team, Please come see Chao give his thesis proposal presentation this Friday at 2pm. Cheers Artur ---------- Forwarded message --------- From: Suzanne Muth Date: Wed, Nov 21, 2018 at 10:07 AM Subject: RI Ph.D. Thesis Proposal: Chao Liu To: ri-people at cs.cmu.edu Date: 30 November 2018 Time: 2:00 p.m. Place: NSH 4305 Type: Ph.D. Thesis Proposal Who: Chao Liu Topic: Vision with Small Baselines Abstract: Portable camera sensor systems are becoming more and more popular in computer vision applications such as autonomous driving, virtual reality, robotics manipulation and surveillance, due to the decreasing expense and size of RGB camera. Despite the compactness and portability of the small baseline vision systems, it is well-known that the uncertainty in range finding using multiple views and the sensor baselines are inversely related. For small baseline vision systems, this means high depth uncertainties even for close range objects. On the other hand, besides compactness, small baseline vision system has its unique advantages such as easier correspondence and large overlapping regions across views. How to utilize those advantages for small baseline vision setup while avoiding the limitations as much as possible? 
In this thesis proposal, we approach this question in terms of three aspects: scene complexity, uncertainties in the estimations and baseline distance in the setup. We first present a method for matting and depth recovery of 3D thin structures with self-occlusions using a single-view camera with finite aperture lens. In this work, we take advantage of the small camera baselines that makes the correspondence easier. We apply the proposed method to scenes at both macro and microscales. For macro-scale, we evaluate our method on scenes with complex 3D thin structures such as tree branches and grass. For micro-scale, we apply our method to *in-vivo* microscopic images of micro-vessels with diameters less than 50 *um*. We also utilize the small baselines for circularly placed point light sources (commonly seen in consumer devices like NESTcam, Amazon Cloudcam). We propose a two-stage near-light photometric stereo method. In the first stage, we optimizethe vertex positions using the differential images induced by small changes in lightsource position. This procedure yields a strong initial guess for the second stage that refines the estimations using the raw captured images. To handle the estimation uncertainties inherent in the small baseline setup, we propose a learning-based method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream. Compared to prior work, the proposed approach achieves more accurate and stable results, generalizes better to new datasets, and yields per-pixel depth probability map that accounts for the estimation uncertainties due to specular surface, occlusions in the scene and objects with large distance. To deal with the subsurface light scattering in the tissue, we propose a projector-camera setup with small baseline that works in a small scale and a method that combines the approximated model for subsurface light scattering, in order to see through skins and perform *in-vivo* blood flow analysis on human skin. We also propose to combine the benefits of small and large baseline vision systems, in order to handle large region occlusion and depth estimation for fine-grained structures at the same time. Thesis Committee Members: Srinivasa Narasimhan, Co-chair Artur Dubrawski, Co-chair Aswin Sankaranarayanan Manmohan Chandraker, University of California, San Diego A copy of thesis document is available at: https://www.dropbox.com/s/wwxe7mqy7mf947q/small-baseline.pdf?dl=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Tue Nov 27 05:38:03 2018 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 27 Nov 2018 05:38:03 -0500 Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs Message-ID: Team, This is a great recognition for Emily as well as the whole Marinus/CMU Auton Lab/Traffic Jam counter-human-trafficking team. Way to go Emily! Artur https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.mitchell at cs.cmu.edu Tue Nov 27 08:04:34 2018 From: tom.mitchell at cs.cmu.edu (Tom Mitchell) Date: Tue, 27 Nov 2018 08:04:34 -0500 Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs In-Reply-To: References: Message-ID: Wonderful! Congratulations Emily on this great recognition, and to all your collaborators too. Byron - in case you're not already on this, let's make some noise about it! 
best Tom On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski wrote: > Team, > > This is a great recognition for Emily as well as the whole Marinus/CMU > Auton Lab/Traffic Jam counter-human-trafficking team. > > Way to go Emily! > > Artur > > https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6 > > > > > > -- Tom M. Mitchell E. Fredkin University Professor Interim Dean School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From hebert at cs.cmu.edu Tue Nov 27 08:50:38 2018 From: hebert at cs.cmu.edu (Martial Hebert) Date: Tue, 27 Nov 2018 13:50:38 +0000 Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs In-Reply-To: References: Message-ID: <60b179a8cac449cc83d3b45613bbefbe@cs.cmu.edu> Excellent! Congratulations Emily. ________________________________ From: Artur Dubrawski Sent: Tuesday, November 27, 2018 5:38 AM To: users at autonlab.org Cc: Martial Hebert; Andrew W. Moore; Andrew Moore; Tom Mitchell; Roni Rosenfeld; Emily Kennedy; Cara Jones Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs Team, This is a great recognition for Emily as well as the whole Marinus/CMU Auton Lab/Traffic Jam counter-human-trafficking team. Way to go Emily! Artur https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mille856 at andrew.cmu.edu Tue Nov 27 08:59:52 2018 From: mille856 at andrew.cmu.edu (James Miller) Date: Tue, 27 Nov 2018 08:59:52 -0500 Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs In-Reply-To: References: Message-ID: Awesome Emily! Congratulations! On Tue, Nov 27, 2018, 8:13 AM Tom Mitchell Wonderful! Congratulations Emily on this great recognition, and to all > your collaborators too. > > Byron - in case you're not already on this, let's make some noise about it! > > best > Tom > > > On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski wrote: > >> Team, >> >> This is a great recognition for Emily as well as the whole Marinus/CMU >> Auton Lab/Traffic Jam counter-human-trafficking team. >> >> Way to go Emily! >> >> Artur >> >> https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6 >> >> >> >> >> >> > > -- > Tom M. Mitchell > E. Fredkin University Professor > Interim Dean > School of Computer Science > Carnegie Mellon University > www.cs.cmu.edu/~tom > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bapoczos at cs.cmu.edu Tue Nov 27 09:40:09 2018 From: bapoczos at cs.cmu.edu (Barnabas Poczos) Date: Tue, 27 Nov 2018 09:40:09 -0500 Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs In-Reply-To: References: Message-ID: This is awesome! Congratulations Emily! :-) Cheers, Barnabas ====================== Barnabas Poczos, PhD Associate Professor Co-Director of PhD Program Machine Learning Department Carnegie Mellon University On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski wrote: > > Team, > > This is a great recognition for Emily as well as the whole Marinus/CMU Auton Lab/Traffic Jam counter-human-trafficking team. > > Way to go Emily! 
> > Artur > > https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6 > > > > > From predragp at andrew.cmu.edu Tue Nov 27 22:33:44 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 27 Nov 2018 22:33:44 -0500 Subject: GPU2 is killed Message-ID: <20181128033344.wvqy5Wks8%predragp@andrew.cmu.edu> Dear Autonians, GPU2 was just killed by a user trying to use 110GB of memory per script. The user had multiple scripts running. I do realize that NIPS deadline is near but killing machine is not going to do us any good either. Best, Predrag From yz6 at andrew.cmu.edu Thu Nov 29 00:11:45 2018 From: yz6 at andrew.cmu.edu (Yang Zhang) Date: Thu, 29 Nov 2018 00:11:45 -0500 Subject: Matlab license error Message-ID: Hi, I am receiving this error when I run Matlab. Does anyone know what to do? Cheers, Yang -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2018-11-29 at 12.11.06 AM.png Type: image/png Size: 104188 bytes Desc: not available URL: From predragp at andrew.cmu.edu Thu Nov 29 17:39:20 2018 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Thu, 29 Nov 2018 17:39:20 -0500 Subject: LOV5 hard rebooted due to the crash Message-ID: Somebody went wild on lov5 and crash the machine which had to be cold rebooted. Please be mindful of how much you load machines as having frequent reboots is not going to help us with NIPS deadline. Best, Predrag From awd at cs.cmu.edu Fri Nov 30 09:32:40 2018 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Fri, 30 Nov 2018 09:32:40 -0500 Subject: Our Emily Kennedy, Marinus Analytics, and Traffic Jam software in The Washington Post Message-ID: Team, Emily and our Counter-Human-Trafficking work makes it to the national media again. This one is about the fallout of the shutdown of Backpage escort advertising and how the industry moved elsewhere and how the C-H-T community (led by Marinus Analytics on the technology side of things) managed to quickly cope with the change and continue extracting operationally useful intelligence from public data sources. Congrats Emily and the Traffic Jam team at Marinus and CMU! Artur https://www.washingtonpost.com/national/online-sex-ads-rebound-months-after-shutdown-of-backpage/2018/11/28/ff8fe3a4-f34b-11e8-99c2-cfca6fcf610c_story.html?utm_term=.7ed140a54b55 -------------- next part -------------- An HTML attachment was scrubbed... URL: From emily at marinusanalytics.com Fri Nov 30 17:11:00 2018 From: emily at marinusanalytics.com (Emily Kennedy) Date: Fri, 30 Nov 2018 22:11:00 +0000 Subject: Our Emily Kennedy, Marinus Analytics, and Traffic Jam software in The Washington Post In-Reply-To: References: Message-ID: Thank you for sharing this, Artur, and for highlighting it with your network! This work truly would have not been possible without its initial inception at CMU and from the amazing support we have received. Just so happens I also had the chance to share this week about our work fighting human trafficking at Hewlett Packard Enterprise?s Discover Conference in Madrid in the General Session with HPE CEO Antonio Neri. Video is here and my part begins around 12:30: https://www.youtube.com/watch?v=rCBZnXOBypU&feature=youtu.be Have a great weekend, everyone! 
Emily Kennedy President, Founder Marinus Analytics LLC +1 (866) 945-2803 LinkedIn | marinusanalytics.com On Nov 30, 2018, at 8:32 AM, Artur Dubrawski wrote: Team, Emily and our Counter-Human-Trafficking work makes it to the national media again. This one is about the fallout of the shutdown of Backpage escort advertising and how the industry moved elsewhere and how the C-H-T community (led by Marinus Analytics on the technology side of things) managed to quickly cope with the change and continue extracting operationally useful intelligence from public data sources. Congrats Emily and the Traffic Jam team at Marinus and CMU! Artur https://www.washingtonpost.com/national/online-sex-ads-rebound-months-after-shutdown-of-backpage/2018/11/28/ff8fe3a4-f34b-11e8-99c2-cfca6fcf610c_story.html?utm_term=.7ed140a54b55