From predragp at andrew.cmu.edu  Fri Nov  2 18:03:45 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 02 Nov 2018 18:03:45 -0400
Subject: Misc issues
Message-ID: <20181102220345.6pSbywqLQ%predragp@andrew.cmu.edu>

Dear Autonians,

During today's orientation session with three new members of the lab few
issues got my attention so I am goint to share them with you as well as
a status update on the main file server.


1. As of this morning ZFS snapshots are disabled on the main file server
hosting majority of older accounts and old snapshots have been deleted.
Right now zpool hosting home directories is 88% full. At the moment I am
moving at least one large legacy home directory to the attic which
should releave 1 TB of space. This is still insuficient to drop the pool
load to below 80% needed for normal NFS, resilvering, and snapshot
operations. I am recalculationg home directory sizes this very moment
and I hope I will have report by Monday. Any directory sized larger than
0.5TB will be prime target for migrartion (or removal).

2. Git web interface was temporary down due to the SSL certificate
update. After I restarted PostgreSQL database things work as expected
for both local accounts (Anthony's and mine) as well as LDAP accounts
(everyone else). Git clone, pull, and push also work per testing. 

3. There was a report on ssh key issue with Git authentication needed
for the git operations from cli. This issue is user specific and it has
nothing to do with SELinux NFS policies we had in the past

setsebool -P use_nfs_home_dirs=true

or the fact that 25 home directories are already migrated to the new
file server and mounted by autofs daemon on the login. I verfied this
both on GPU machines (which have SELinux disabled) as well as regular
CPU machines which have SELinux enabled with the account that had to be
autofs mounted per login. 


4. You can also put your public key into .ssh/authorized_keys and use
passwordless authentication whether you have an old permanently mount
home directory or the one mounte with autofs daemon. This is tested. 
I had a rough time today during the demo (I guess it is just my age). 


5. Finally it appears that at one out of four GPU cards on the GPU3 and
GPU4 computing nodes are not working properly. Please see below. I have
seen this in the past and the reason was dead hardware. I would like to
reboot those two servers and do some further testing before we shell out
$2500 for two used Titan Xp cards. 

I hope you have a great weekend.

Predrag

root at gpu3$ nvidia-smi
Fri Nov  2 17:01:29 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30
    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile
Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util
Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |
N/A |
| 23%   29C    P8    17W / 250W |   1081MiB / 12196MiB |      0%
Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |
N/A |
| 23%   28C    P8     9W / 250W |   1081MiB / 12196MiB |      0%
Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:82:00.0 Off |
N/A |
| 23%   31C    P8    11W / 250W |   1081MiB / 12196MiB |      0%
Default |
+-------------------------------+----------------------+----------------------+

     
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU
Memory |
|  GPU       PID   Type   Process name                             Usage
    |
|=============================================================================|
|    0      4458      C   python3
1071MiB |
|    1     19870      C   python3
1071MiB |
|    2      3102      C   python3
1071MiB |
+-----------------------------------------------------------------------------+

root at gpu4$ nvidia-smi
Fri Nov  2 17:01:34 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30
    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile
Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util
Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |
N/A |
| 41%   67C    P2   169W / 250W |   3213MiB / 12196MiB |      7%
Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |
N/A |
| 25%   45C    P8    19W / 250W |   2810MiB / 12196MiB |      0%
Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |
N/A |
| 23%   34C    P8    15W / 250W |    317MiB / 12196MiB |      0%
Default |
+-------------------------------+----------------------+----------------------+

     
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU
Memory |
|  GPU       PID   Type   Process name                             Usage
    |
|=============================================================================|
|    0     21927      C   python2
1689MiB |
|    1      7063      C   ...auton/home/cnagpal/anaconda2/bin/python
1267MiB |
|    1     12081      C   ...auton/home/cnagpal/anaconda2/bin/python
1525MiB |
|    2     28460      C   python
307MiB |
+-----------------------------------------------------------------------------+

From predragp at andrew.cmu.edu  Sat Nov  3 19:07:37 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 03 Nov 2018 19:07:37 -0400
Subject: GPU2 Fwd: Autonlab-sysinfo Digest, Vol 52, Issue 12
Message-ID: <20181103230737.oNp-UWyvW%predragp@andrew.cmu.edu>

Dear Autonians,

GPU2 server is officially broke. Please see below. What you are not
going to see in the below report is that besides /root being full due to
the accessive cashing in the /tmp folder which resides on the root
partition /home/scratch is also full.

I have emailed multiple times asking people to clear the /scratch
directory but no aval. I will have to use more heavy handed methods now
like periodically deleteing all /home/scratch directories on all GPU
machines so that machines are usable again.

Best,
Predrag


P.S. I am going for a stroll. If /home/scratch and cash is not clear I
will have to reboot the machine and delete /home/scratch around 9:00 PM
tonight.


-------- Original Message --------
From: autonlab-sysinfo-request at autonlab.org
Subject: Autonlab-sysinfo Digest, Vol 52, Issue 12
To: autonlab-sysinfo at autonlab.org
Date: Sat, 03 Nov 2018 17:21:15 -0400

Send Autonlab-sysinfo mailing list submissions to
	autonlab-sysinfo at autonlab.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo
or, via email, send a message with subject or body 'help' to
	autonlab-sysinfo-request at autonlab.org

You can reach the person managing the list at
	autonlab-sysinfo-owner at autonlab.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Autonlab-sysinfo digest..."


Today's Topics:

   1. Cron <root at gpu2> /usr/lib64/sa/sa1 1 1 ((Cron Daemon))
   2. Cron <root at gpu2> run-parts /etc/cron.hourly ((Cron Daemon))
   3. Cron <root at gpu2> /usr/lib64/sa/sa1 1 1 ((Cron Daemon))
   4. Cron <root at gpu2> /usr/lib64/sa/sa1 1 1 ((Cron Daemon))
   5. Cron <root at gpu2> /usr/lib64/sa/sa1 1 1 ((Cron Daemon))


----------------------------------------------------------------------

Message: 1
Date: Sat,  3 Nov 2018 15:50:01 -0400 (EDT)
From: "(Cron Daemon)" <auton.sysnotify at gmail.com>
To: root at gpu2.int.autonlab.org
Subject: Cron <root at gpu2> /usr/lib64/sa/sa1 1 1
Message-ID: <20181103212106.67CDC1596247 at gpu2.int.autonlab.org>
Content-Type: text/plain; charset=UTF-8

Cannot write data to system activity file: No space left on device


------------------------------

Message: 2
Date: Sat,  3 Nov 2018 16:10:45 -0400 (EDT)
From: "(Cron Daemon)" <auton.sysnotify at gmail.com>
To: root at gpu2.int.autonlab.org
Subject: Cron <root at gpu2> run-parts /etc/cron.hourly
Message-ID: <20181103212106.8CFB71596243 at gpu2.int.autonlab.org>
Content-Type: text/plain; charset=UTF-8

/etc/cron.hourly/0yum-hourly.cron:

Traceback (most recent call last):
  File "/usr/sbin/yum-cron", line 729, in <module>
    main()
  File "/usr/sbin/yum-cron", line 726, in main
    base.updatesCheck()
  File "/usr/sbin/yum-cron", line 618, in updatesCheck
    self.populateUpdateMetadata()
  File "/usr/sbin/yum-cron", line 422, in populateUpdateMetadata
    self.pkgSack # honor skip_if_unavailable
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1074, in <lambda>
    pkgSack = property(fget=lambda self: self._getSacks(),
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 778, in _getSacks
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 347, in populateSack
    self.doSetup()
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 157, in doSetup
    self.retrieveAllMD()
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 88, in retrieveAllMD
    dl = repo._async and repo._commonLoadRepoXML(repo)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1477, in _commonLoadRepoXML
    if self._latestRepoXML(local):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1446, in _latestRepoXML
    oxml = self._saveOldRepoXML(local)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1303, in _saveOldRepoXML
    shutil.copy2(local, old_local)
  File "/usr/lib64/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib64/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 28] No space left on device: '/var/cache/yum/x86_64/7/SCL-core/repomd.xml.old.tmp'
/etc/cron.hourly/ghc-doc-index:

/usr/bin/ghc-doc-index: line 27: /var/lib/ghc/pkg-dir.cache.new: No space left on device
diff: /var/lib/ghc/pkg-dir.cache.new: No such file or directory
haddock: internal error: .: copyFile: resource exhausted (No space left on device)


------------------------------

Message: 3
Date: Sat,  3 Nov 2018 16:00:01 -0400 (EDT)
From: "(Cron Daemon)" <auton.sysnotify at gmail.com>
To: root at gpu2.int.autonlab.org
Subject: Cron <root at gpu2> /usr/lib64/sa/sa1 1 1
Message-ID: <20181103212106.7C6CA1596246 at gpu2.int.autonlab.org>
Content-Type: text/plain; charset=UTF-8

Cannot write data to system activity file: No space left on device


------------------------------

Message: 4
Date: Sat,  3 Nov 2018 15:40:01 -0400 (EDT)
From: "(Cron Daemon)" <auton.sysnotify at gmail.com>
To: root at gpu2.int.autonlab.org
Subject: Cron <root at gpu2> /usr/lib64/sa/sa1 1 1
Message-ID: <20181103212106.8282C1596247 at gpu2.int.autonlab.org>
Content-Type: text/plain; charset=UTF-8

Cannot write data to system activity file: No space left on device


------------------------------

Message: 5
Date: Sat,  3 Nov 2018 15:20:01 -0400 (EDT)
From: "(Cron Daemon)" <auton.sysnotify at gmail.com>
To: root at gpu2.int.autonlab.org
Subject: Cron <root at gpu2> /usr/lib64/sa/sa1 1 1
Message-ID: <20181103212106.742761596243 at gpu2.int.autonlab.org>
Content-Type: text/plain; charset=UTF-8

Cannot write data to system activity file: No space left on device


------------------------------

Subject: Digest Footer

_______________________________________________
Autonlab-sysinfo mailing list
Autonlab-sysinfo at autonlab.org
https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo


------------------------------

End of Autonlab-sysinfo Digest, Vol 52, Issue 12
************************************************


From predragp at andrew.cmu.edu  Sat Nov  3 19:11:14 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 03 Nov 2018 19:11:14 -0400
Subject: GPU 1 error
In-Reply-To: <CAGPLMfqmshi-os0ywzhnA-RDAb_kohPQ3bBc=Km_7RxZ=y2KWg@mail.gmail.com>
References: <CAGPLMfqmshi-os0ywzhnA-RDAb_kohPQ3bBc=Km_7RxZ=y2KWg@mail.gmail.com>
Message-ID: <20181103231114.gH2ceoGrv%predragp@andrew.cmu.edu>

Biswajit Paria <bparia at cs.cmu.edu> wrote:

> Hi Predrag,
> 
> I am trying to use GPU 1, and getting an unusual segmentation fault. The
> same code that I was running for two days is now throwing a segmentation
> fault. Is it possible to restart GPU1? Doesn't look like anyone else it
> using it other than me.


Sure if nobody is using it. Are you sure that you were using this
machine after I rebooted last week? Those library exception errors are
typically due to NVidia 3rd party binary blob drivers which needs to be
reinstalled occasionally. I will give a two hours and reboot at the
same time when I reboot GPU2. If the driver gets broken it will have to
wait Monday.


> 
> Here is stack trace in case you want to have a look:
> 
> Stack trace returned 10 entries:
> [bt] (0)
> /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib
> mxnet.so(+0x31f81a) [0x7feebb24f81a]
> [bt] (1)
> /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib
> mxnet.so(+0x29f33b6) [0x7feebd9233b6]
> [bt] (2) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> [bt] (3) /lib64/libpthread.so.0(raise+0x2b) [0x7fef7831954b]
> [bt] (4) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> [bt] (5) /usr/lib64/nvidia/libcuda.so.1(+0xf88d5) [0x7fef304548d5]
> [bt] (6) /usr/lib64/nvidia/libcuda.so.1(+0x248914) [0x7fef305a4914]
> [bt] (7) /usr/lib64/nvidia/libcuda.so.1(+0x1e4e80) [0x7fef30540e80]
> [bt] (8) /lib64/libpthread.so.0(+0x7dd5) [0x7fef78311dd5]
> [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7fef7803bb3d]
> 
> 
> Thanks in advance!
> --
> Biswajit Paria
> PhD student
> Machine Learning Department
> Carnegie Mellon University

From predragp at andrew.cmu.edu  Sat Nov  3 22:57:17 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 03 Nov 2018 22:57:17 -0400
Subject: GPU 1 error
In-Reply-To: <CAGPLMfr_GycSwLFSuh3rXZUpEuqTEHaJSut+RqxFvZW6-m1B_g@mail.gmail.com>
References: <CAGPLMfqmshi-os0ywzhnA-RDAb_kohPQ3bBc=Km_7RxZ=y2KWg@mail.gmail.com>
 <20181103231114.gH2ceoGrv%predragp@andrew.cmu.edu>
 <CAGPLMfr_GycSwLFSuh3rXZUpEuqTEHaJSut+RqxFvZW6-m1B_g@mail.gmail.com>
Message-ID: <20181104025717.-mSPsGTrK%predragp@andrew.cmu.edu>

Biswajit Paria <bparia at cs.cmu.edu> wrote:

> I see. I was using it yesterday. It is possible that the CUDA is broken,
> and it is somehow not using the cuda in my home directory. I will try to
> get it to use my local CUDA, otherwise I will wait till Monday.
> 
> Thanks!
> 

Ok I just spent almost 2h playing with GPU1. This is what I have done. I
cleaned the system, NVidia driver to 396.44 and upgraded CUDA to 9.2. I
then cleaned and upgraded all the packages. Note that I didn't want to
install recently release CUDA 10 which is probably still poorly
supported by applications. 

The system works like a swiss watch now but it is likely that all
deep-learning tools are in broken state. You will have to rebuilt
tensor-flow or whatever you were using.

The following two users 

678.5 GiB joliva
513.8 GiB chunlial 

should try to clean their scratch directories or at least e-mail me with
an explanation for such excessive use. I have half-way scripted now this
process for Ansible so I could push this to all GPU nodes but it is
likely that I will inflict lot of pain to people who are running jobs. 

We still have a problem on the servers GPU3 and GPU4 which appear to
have dead GPU cards.

Best,
Predrag


> 
> On Sat, Nov 3, 2018, 7:11 PM Predrag Punosevac <predragp at andrew.cmu.edu
> wrote:
> 
> > Biswajit Paria <bparia at cs.cmu.edu> wrote:
> >
> > > Hi Predrag,
> > >
> > > I am trying to use GPU 1, and getting an unusual segmentation fault. The
> > > same code that I was running for two days is now throwing a segmentation
> > > fault. Is it possible to restart GPU1? Doesn't look like anyone else it
> > > using it other than me.
> >
> >
> > Sure if nobody is using it. Are you sure that you were using this
> > machine after I rebooted last week? Those library exception errors are
> > typically due to NVidia 3rd party binary blob drivers which needs to be
> > reinstalled occasionally. I will give a two hours and reboot at the
> > same time when I reboot GPU2. If the driver gets broken it will have to
> > wait Monday.
> >
> >
> >
> > >
> > > Here is stack trace in case you want to have a look:
> > >
> > > Stack trace returned 10 entries:
> > > [bt] (0)
> > > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib
> > > mxnet.so(+0x31f81a) [0x7feebb24f81a]
> > > [bt] (1)
> > > /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/lib
> > > mxnet.so(+0x29f33b6) [0x7feebd9233b6]
> > > [bt] (2) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> > > [bt] (3) /lib64/libpthread.so.0(raise+0x2b) [0x7fef7831954b]
> > > [bt] (4) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> > > [bt] (5) /usr/lib64/nvidia/libcuda.so.1(+0xf88d5) [0x7fef304548d5]
> > > [bt] (6) /usr/lib64/nvidia/libcuda.so.1(+0x248914) [0x7fef305a4914]
> > > [bt] (7) /usr/lib64/nvidia/libcuda.so.1(+0x1e4e80) [0x7fef30540e80]
> > > [bt] (8) /lib64/libpthread.so.0(+0x7dd5) [0x7fef78311dd5]
> > > [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7fef7803bb3d]
> > >
> > >
> > > Thanks in advance!
> > > --
> > > Biswajit Paria
> > > PhD student
> > > Machine Learning Department
> > > Carnegie Mellon University
> >

From predragp at andrew.cmu.edu  Sat Nov  3 23:19:44 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 03 Nov 2018 23:19:44 -0400
Subject: GPU1 and GPU2 scratch directory report
In-Reply-To: <24A483DA-87C2-4D47-A3CA-D56831A34205@andrew.cmu.edu>
References: <20181103230737.oNp-UWyvW%predragp@andrew.cmu.edu>
 <24A483DA-87C2-4D47-A3CA-D56831A34205@andrew.cmu.edu>
Message-ID: <20181104031944.OtNjYrohj%predragp@andrew.cmu.edu>

Jayanth Koushik <jkoushik at andrew.cmu.edu> wrote:

> Hey Predrag,
> 
> I realize this is a serious issue, but would it at all be possible to keep my scratch directory? I only use 12GB but I have a personal toolchain installed there on which a lot of scripts depend.
> 

Sure thing! I did a little digging and the same users are hogging
scratch directories on both GPU1 and GPU2. Now there might be a valid
reason to do so ... 

GPU1:
340.7 GiB joliva
320.9 GiB chunlial


GPU2:
678.5 GiB joliva
513.8 GiB chunlial

Cheers,
Predrag
Date: Sat, 03 Nov 2018 23:15:21 -0400
From: Predrag Punosevac <predragp at andrew.cmu.edu>
To: jkoushik at andrew.cmu.edu
Cc: joliva at cs.unc.edu chunlial at andrew.cmu.edu, users at autonlab.org
Subject: Re: GPU1 and GPU2 scratch directory user report
Message-ID: <20181104031521.zq4R9U0Is%predragp at andrew.cmu.edu>
References: <20181103230737.oNp-UWyvW%predragp at andrew.cmu.edu>
 <24A483DA-87C2-4D47-A3CA-D56831A34205 at andrew.cmu.edu>
In-Reply-To: <24A483DA-87C2-4D47-A3CA-D56831A34205 at andrew.cmu.edu>
User-Agent: s-nail v14.8.12

Jayanth Koushik <jkoushik at andrew.cmu.edu> wrote:

> Hey Predrag,
> 
> I realize this is a serious issue, but would it at all be possible to keep my scratch directory? I only use 12GB but I have a personal toolchain installed there on which a lot of scripts depend.
> 


Sure thing! I did a little digging and the same users are hogging
scratch directories on both GPU1 and GPU2. Now there might be a valid
reason to do so ... 

GPU1:
340.7 GiB joliva
320.9 GiB chunlial


GPU2:
678.5 GiB joliva
513.8 GiB chunlial

Cheers,
Predrag


> Thanks,
> ~Jayanth


> Thanks,
> ~Jayanth

From predragp at andrew.cmu.edu  Sun Nov  4 16:28:23 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 04 Nov 2018 16:28:23 -0500
Subject: Main file server disk usage report
Message-ID: <20181104212823.c498DWpOh%predragp@andrew.cmu.edu>

Dear Autonians,

The following users exceed 250GB home directory quota which is
implemented on the new file server. Keep in mind that ZFS is using lzma
compression so depends on the data type 5.2 TB translates into 10 TB
regular file system data. The reason for such large home directories
might be completely legit but they should not have been created without
prior notice. At least 3 of the people on this list are no longer
formally affiliated to CMU. One of them ffalck have already contacted me
and we are working on trimming and archiving his data. Anybody who has
home directory larger than 1TB should contact me at the earliest
convenience. 

Best,
Predrag


    5.2 TiB [##########] /jaylee
    3.8 TiB [#######   ] /yichongx
    2.5 TiB [####      ] /joliva
    1.5 TiB [##        ] /lujiec
    1.4 TiB [##        ] /ffalck
    1.3 TiB [##        ] /htung
  759.5 GiB [#         ] /dsutherl
  573.8 GiB [#         ] /kkandasa
  542.1 GiB [#         ] /iapostol
  536.2 GiB [#         ] /pengrui
  518.4 GiB [          ] /bpatra
  464.7 GiB [          ] /chunlial
  422.2 GiB [          ] /siyuh
  416.6 GiB [          ] /jrmoniz
  374.8 GiB [          ] /chiragn
  327.2 GiB [          ] /yifeim
  253.6 GiB [          ] /bparia

From predragp at andrew.cmu.edu  Sun Nov  4 17:04:42 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 04 Nov 2018 17:04:42 -0500
Subject: Main file server disk usage report
In-Reply-To: <9B3002CE-273C-404C-9931-681D6780875F@andrew.cmu.edu>
References: <20181104212823.c498DWpOh%predragp@andrew.cmu.edu>
 <9B3002CE-273C-404C-9931-681D6780875F@andrew.cmu.edu>
Message-ID: <20181104220442.CKfKOtNzK%predragp@andrew.cmu.edu>

Yichong Xu <yichongx at cs.cmu.edu> wrote:

> Hi Predrag,
> I will clean my home directory soon. 
> Besides this - In my opinion 250GB is a bit too small for one person,
> especially for storing models and data. Is there anywhere else I can
> store my data to? Thank you very much!

You don't get it. You can't delete anything from ZFS. Only I can delete
things with the great deal of effort (stop and remove snapshots) stop
backups and risk everyone's home directory. ZFS is designed for data
retention not for data loss. The principal designed error on my part
made 5 years ago was to allow home directories to share the same ZFS
dataset thereby losing the fine control over the data management. Right
now if you do rm -rf you will be able to delete the things because I had
to turn off bunch of switches.

250 GB home directory is not curved in stone and I can increase quota on
the need basis. We have enough disk space but I have to have some prior
notice to be able to do planning. Please stop by my office so that we
develop some kind migration strategy for your home directory. It will be
a separate ZFS dataset with the quota we agree to be reasonable for you
to work normally but also for me not to have these kind wildfires.

Predrag

> 
> Thanks,
> Yichong
> 
> 
> 
> On Nov 4, 2018, at 4:28 PM, Predrag Punosevac <predragp at andrew.cmu.edu<mailto:predragp at andrew.cmu.edu>> wrote:
> 
> Dear Autonians,
> 
> The following users exceed 250GB home directory quota which is
> implemented on the new file server. Keep in mind that ZFS is using lzma
> compression so depends on the data type 5.2 TB translates into 10 TB
> regular file system data. The reason for such large home directories
> might be completely legit but they should not have been created without
> prior notice. At least 3 of the people on this list are no longer
> formally affiliated to CMU. One of them ffalck have already contacted me
> and we are working on trimming and archiving his data. Anybody who has
> home directory larger than 1TB should contact me at the earliest
> convenience.
> 
> Best,
> Predrag
> 
> 
>    5.2 TiB [##########] /jaylee
>    3.8 TiB [#######   ] /yichongx
>    2.5 TiB [####      ] /joliva
>    1.5 TiB [##        ] /lujiec
>    1.4 TiB [##        ] /ffalck
>    1.3 TiB [##        ] /htung
>  759.5 GiB [#         ] /dsutherl
>  573.8 GiB [#         ] /kkandasa
>  542.1 GiB [#         ] /iapostol
>  536.2 GiB [#         ] /pengrui
>  518.4 GiB [          ] /bpatra
>  464.7 GiB [          ] /chunlial
>  422.2 GiB [          ] /siyuh
>  416.6 GiB [          ] /jrmoniz
>  374.8 GiB [          ] /chiragn
>  327.2 GiB [          ] /yifeim
>  253.6 GiB [          ] /bparia
> 


From predragp at andrew.cmu.edu  Mon Nov  5 16:16:47 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 05 Nov 2018 16:16:47 -0500
Subject: HDD failure on the main file server
Message-ID: <20181105211647.44Kh5fOCI%predragp@andrew.cmu.edu>

HDD is already replaced but I powered off file server for about 5
minutes (it is just safer no to do hot swap).  Luckily this is the HDD
from a ZFS pool which was 50% loaded so the resilvering should be
completed by tonight. If you are one of the people who promised to
clean the stuff from the home directory now is the time. 

Predrag

From predragp at andrew.cmu.edu  Mon Nov  5 21:33:42 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 05 Nov 2018 21:33:42 -0500
Subject: U.S. government FIPS approved algorithms
Message-ID: <20181106023342.qN-AwPMxe%predragp@andrew.cmu.edu>

Dear Autonians,

An hour ago I switched LDAP authentication on all our computing nodes
and the Auton Lab maintained desktops to U.S. government FIPS approved
algorithms for higher security protection

https://csrc.nist.gov/csrc/media/publications/fips/140/2/final/documents/fips1402annexa.pdf

I have done multiple testing and Authentication appears to be working as
expected . Please let me know immediately if you notice any problems
with logging into our infrastructure.

Sincerely,
Predrag Punosevac

From vjeansel at andrew.cmu.edu  Tue Nov  6 12:43:40 2018
From: vjeansel at andrew.cmu.edu (Vincent Jeanselme)
Date: Tue, 6 Nov 2018 12:43:40 -0500
Subject: Jupyter Notebooks
Message-ID: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>

Hello,

Has anyone faced problems with running Jupyter Notebook since yesterday ?
Did you remember what was the change to operate after the last reboot of 
the sqlite database ?

Thank you,

Vincent

-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University


From mbarnes1 at andrew.cmu.edu  Tue Nov  6 12:59:19 2018
From: mbarnes1 at andrew.cmu.edu (Matthew Barnes)
Date: Tue, 6 Nov 2018 12:59:19 -0500
Subject: Jupyter Notebooks
In-Reply-To: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
Message-ID: <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>

Also having this problem. Trying to create a new notebook hangs on
"Creating new notebook in", and unable to open old notebooks. Anyone's
setup currently working?

On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <vjeansel at andrew.cmu.edu>
wrote:

> Hello,
>
> Has anyone faced problems with running Jupyter Notebook since yesterday ?
> Did you remember what was the change to operate after the last reboot of
> the sqlite database ?
>
> Thank you,
>
> Vincent
>
> --
> Vincent Jeanselme
> -----------------
> Analyst Researcher
> Auton Lab - Robotics Institute
> Carnegie Mellon University
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/c2be27df/attachment.html>

From predragp at andrew.cmu.edu  Tue Nov  6 15:14:27 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 06 Nov 2018 15:14:27 -0500
Subject: Jupyter Notebooks
In-Reply-To: <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
Message-ID: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>

Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:

> Also having this problem. Trying to create a new notebook hangs on
> "Creating new notebook in", and unable to open old notebooks. Anyone's
> setup currently working?
> 

Jupyter Notebook is using sqlite database to store the info. Unless you
explicitly force Jupyter to create the database on the scratch directory
the database is stored on the NFS share. There is nothing worse one can
do in terms of data consistency than put a database or a private Git
repo (talking about the server) onto the NFS. The datebase was left in
inconsistent state after the file server was rebooted. You have to clear
it and possibly recreate the database to be able to use Jupyter
Notebook. 

Best,
Predrag


> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <vjeansel at andrew.cmu.edu>
> wrote:
> 
> > Hello,
> >
> > Has anyone faced problems with running Jupyter Notebook since yesterday ?
> > Did you remember what was the change to operate after the last reboot of
> > the sqlite database ?
> >
> > Thank you,
> >
> > Vincent
> >
> > --
> > Vincent Jeanselme
> > -----------------
> > Analyst Researcher
> > Auton Lab - Robotics Institute
> > Carnegie Mellon University
> >
> >

From vjeansel at andrew.cmu.edu  Tue Nov  6 16:45:10 2018
From: vjeansel at andrew.cmu.edu (Vincent Jeanselme)
Date: Tue, 6 Nov 2018 16:45:10 -0500
Subject: Jupyter Notebooks
In-Reply-To: <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
Message-ID: <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>

Even after changing my bashrc with export 
IPYTHONDIR=/home/scratch/$USER/.ipython and reinstalling my jupyter, it 
still does not seem to work. I also have an issue with git, I am no 
longer able to pull from the server.

On 11/6/18 3:14 PM, Predrag Punosevac wrote:
> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
>
>> Also having this problem. Trying to create a new notebook hangs on
>> "Creating new notebook in", and unable to open old notebooks. Anyone's
>> setup currently working?
>>
> Jupyter Notebook is using sqlite database to store the info. Unless you
> explicitly force Jupyter to create the database on the scratch directory
> the database is stored on the NFS share. There is nothing worse one can
> do in terms of data consistency than put a database or a private Git
> repo (talking about the server) onto the NFS. The datebase was left in
> inconsistent state after the file server was rebooted. You have to clear
> it and possibly recreate the database to be able to use Jupyter
> Notebook.
>
> Best,
> Predrag
>
>
>> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <vjeansel at andrew.cmu.edu>
>> wrote:
>>
>>> Hello,
>>>
>>> Has anyone faced problems with running Jupyter Notebook since yesterday ?
>>> Did you remember what was the change to operate after the last reboot of
>>> the sqlite database ?
>>>
>>> Thank you,
>>>
>>> Vincent
>>>
>>> --
>>> Vincent Jeanselme
>>> -----------------
>>> Analyst Researcher
>>> Auton Lab - Robotics Institute
>>> Carnegie Mellon University
>>>
>>>
-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University


From chiragn at cs.cmu.edu  Tue Nov  6 17:33:07 2018
From: chiragn at cs.cmu.edu (Chirag Nagpal)
Date: Tue, 6 Nov 2018 17:33:07 -0500
Subject: Jupyter Notebooks
In-Reply-To: <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
Message-ID: <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>

Predrag: I am able to reproduce the error on the scratch directory too.

Vincent, Matt:  are you using the anaconda jupyter? it could be an anaconda
upgrade thats responsible ?

Chirag

On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme <vjeansel at andrew.cmu.edu>
wrote:

> Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$USER/.ipython
> and reinstalling my jupyter, it still does not seem to work. I also have an
> issue with git, I am no longer able to pull from the server.
>
>
> On 11/6/18 3:14 PM, Predrag Punosevac wrote:
>
>> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
>>
>> Also having this problem. Trying to create a new notebook hangs on
>>> "Creating new notebook in", and unable to open old notebooks. Anyone's
>>> setup currently working?
>>>
>>> Jupyter Notebook is using sqlite database to store the info. Unless you
>> explicitly force Jupyter to create the database on the scratch directory
>> the database is stored on the NFS share. There is nothing worse one can
>> do in terms of data consistency than put a database or a private Git
>> repo (talking about the server) onto the NFS. The datebase was left in
>> inconsistent state after the file server was rebooted. You have to clear
>> it and possibly recreate the database to be able to use Jupyter
>> Notebook.
>>
>> Best,
>> Predrag
>>
>>
>> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
>>> vjeansel at andrew.cmu.edu>
>>> wrote:
>>>
>>> Hello,
>>>>
>>>> Has anyone faced problems with running Jupyter Notebook since yesterday
>>>> ?
>>>> Did you remember what was the change to operate after the last reboot of
>>>> the sqlite database ?
>>>>
>>>> Thank you,
>>>>
>>>> Vincent
>>>>
>>>> --
>>>> Vincent Jeanselme
>>>> -----------------
>>>> Analyst Researcher
>>>> Auton Lab - Robotics Institute
>>>> Carnegie Mellon University
>>>>
>>>>
>>>> --
> Vincent Jeanselme
> -----------------
> Analyst Researcher
> Auton Lab - Robotics Institute
> Carnegie Mellon University
>
>


-- 

*Chirag Nagpal* Graduate Student, Language Technologies Institute
School of Computer Science
Carnegie Mellon University
cs.cmu.edu/~chiragn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/5e40578f/attachment-0001.html>

From mbarnes1 at andrew.cmu.edu  Tue Nov  6 17:42:15 2018
From: mbarnes1 at andrew.cmu.edu (Matthew Barnes)
Date: Tue, 6 Nov 2018 17:42:15 -0500
Subject: Jupyter Notebooks
In-Reply-To: <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
 <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
Message-ID: <CAB7OVwD3s_dnmkp9zHk4+fvDumFuOp40qvKkzmRDS4TJdK+ktg@mail.gmail.com>

I'm not using Anaconda.

Unsure if this is related, but now also having issues with CUDA, too.

On Tue, Nov 6, 2018 at 5:33 PM Chirag Nagpal <chiragn at cs.cmu.edu> wrote:

> Predrag: I am able to reproduce the error on the scratch directory too.
>
> Vincent, Matt:  are you using the anaconda jupyter? it could be an
> anaconda upgrade thats responsible ?
>
> Chirag
>
> On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme <vjeansel at andrew.cmu.edu
> > wrote:
>
>> Even after changing my bashrc with export
>> IPYTHONDIR=/home/scratch/$USER/.ipython and reinstalling my jupyter, it
>> still does not seem to work. I also have an issue with git, I am no longer
>> able to pull from the server.
>>
>>
>> On 11/6/18 3:14 PM, Predrag Punosevac wrote:
>>
>>> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
>>>
>>> Also having this problem. Trying to create a new notebook hangs on
>>>> "Creating new notebook in", and unable to open old notebooks. Anyone's
>>>> setup currently working?
>>>>
>>>> Jupyter Notebook is using sqlite database to store the info. Unless you
>>> explicitly force Jupyter to create the database on the scratch directory
>>> the database is stored on the NFS share. There is nothing worse one can
>>> do in terms of data consistency than put a database or a private Git
>>> repo (talking about the server) onto the NFS. The datebase was left in
>>> inconsistent state after the file server was rebooted. You have to clear
>>> it and possibly recreate the database to be able to use Jupyter
>>> Notebook.
>>>
>>> Best,
>>> Predrag
>>>
>>>
>>> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
>>>> vjeansel at andrew.cmu.edu>
>>>> wrote:
>>>>
>>>> Hello,
>>>>>
>>>>> Has anyone faced problems with running Jupyter Notebook since
>>>>> yesterday ?
>>>>> Did you remember what was the change to operate after the last reboot
>>>>> of
>>>>> the sqlite database ?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vincent
>>>>>
>>>>> --
>>>>> Vincent Jeanselme
>>>>> -----------------
>>>>> Analyst Researcher
>>>>> Auton Lab - Robotics Institute
>>>>> Carnegie Mellon University
>>>>>
>>>>>
>>>>> --
>> Vincent Jeanselme
>> -----------------
>> Analyst Researcher
>> Auton Lab - Robotics Institute
>> Carnegie Mellon University
>>
>>
>
>
> --
>
> *Chirag Nagpal* Graduate Student, Language Technologies Institute
> School of Computer Science
> Carnegie Mellon University
> cs.cmu.edu/~chiragn
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/9df65847/attachment.html>

From mbarnes1 at andrew.cmu.edu  Tue Nov  6 18:01:59 2018
From: mbarnes1 at andrew.cmu.edu (Matthew Barnes)
Date: Tue, 6 Nov 2018 18:01:59 -0500
Subject: CUDA hangs
Message-ID: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>

Is anyone else having issues with CUDA since this week? Even simple pytorch
commands hang:

(torch) bash-4.2$ python
Python 2.7.5 (default, Jul  3 2018, 19:30:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
x>>> x = torch.zeros(4)
>>> x.cuda()


nvidia-smi works, and torch.cuda.is_available() returns True.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/f37e1c8c/attachment.html>

From predragp at andrew.cmu.edu  Tue Nov  6 20:06:06 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 06 Nov 2018 20:06:06 -0500
Subject: CUDA hangs
In-Reply-To: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
Message-ID: <20181107010606.qX0qLZrAb%predragp@andrew.cmu.edu>

Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:

> Is anyone else having issues with CUDA since this week? Even simple pytorch
> commands hang:
> 

Do you have issues on all 8 GPU servers (GPU 7 is used for special
project) you can access? 

I upgraded driver and CUDA to 9.2 on GPU1. I would not expect pytorch to
work after that without reinstalling. 

GPU3 and GPU4 are reporting 3 GPU cards. That is a bad sign and means
dead hardware. I am planning to reboot it and play with it little bit
before the final diagnosis. 

Predrag


> (torch) bash-4.2$ python
> Python 2.7.5 (default, Jul  3 2018, 19:30:05)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import torch
> x>>> x = torch.zeros(4)
> >>> x.cuda()
> 
> 
> nvidia-smi works, and torch.cuda.is_available() returns True.

From qiong.zhang at stat.ubc.ca  Tue Nov  6 18:41:30 2018
From: qiong.zhang at stat.ubc.ca (qiong.zhang at stat.ubc.ca)
Date: Tue, 06 Nov 2018 23:41:30 +0000
Subject: CUDA hangs
In-Reply-To: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
Message-ID: <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>

I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided, doesn't work as well.

Qiong 
November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu (mailto:%22Matthew%20Barnes%22%20<mbarnes1 at andrew.cmu.edu>)> wrote:
Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang:
(torch) bash-4.2$ python 
Python 2.7.5 (default, Jul 3 2018, 19:30:05) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import torch 
x>>> x = torch.zeros(4) 
>>> x.cuda()  
nvidia-smi works, and torch.cuda.is_available() returns True.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/24a81ef1/attachment-0001.html>

From predragp at andrew.cmu.edu  Tue Nov  6 20:39:24 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 06 Nov 2018 20:39:24 -0500
Subject: Jupyter Notebooks
In-Reply-To: <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
 <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
Message-ID: <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>

Chirag Nagpal <chiragn at cs.cmu.edu> wrote:

> Predrag: I am able to reproduce the error on the scratch directory too.
> 

I am sure you guys have a problem but one doesn't need Jupyter to do
actual programming in Python. I am saying this because there are
handful of you who are affected by this behavior (God knows what could
have caused possibly even regressions by newer version of packages) and
I am the only firefighter currently without bandwidth to deal with such
wild fires.

> Vincent, Matt:  are you using the anaconda jupyter? it could be an anaconda
> upgrade thats responsible ?
> 
> Chirag
> 
> On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme <vjeansel at andrew.cmu.edu>
> wrote:
> 
> > Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$USER/.ipython
> > and reinstalling my jupyter, it still does not seem to work. I also have an
> > issue with git, I am no longer able to pull from the server.
> >


The git issue is environmental variable issue which is caused by the
fact that there is only one user on Gogs server (git) and all accounts
are just aliases with their own ssh-keys to this account. I don't use
and know enough about git but those nasty files in your

reponame/.git 

folder 

which look like 

predragp at lov3$ ls
branches  description  HEAD   index  logs     ORIG_HEAD    refs
config    FETCH_HEAD   hooks  info   objects  packed-refs

apparently get populated in different ways depending on the login. So
for example if I ssh to one of the computing nodes from home I get this


predragp at lov3$ git pull
Host key fingerprint is
SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
+---[ECDSA 256]---+
|     .++++o.o..  |
|     . .+=+B .   |
|. . . . ..Bo.    |
|.E + o . o  .    |
| .= *   S        |
|o=o*  .          |
|+=+.o .o         |
|o.o+.+. .        |
|.   +o..         |
+----[SHA256]-----+
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 22 (delta 8), reused 0 (delta 0)
Unpacking objects: 100% (22/22), done.
>From ssh://git:/predragp/ansible
   cfc10cf..71a9aec  master     -> origin/master
Updating cfc10cf..71a9aec
Fast-forward
 Linux/autofs/etc/auto.nfs                         | 18
+++++++++++++++++
 Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt |  0
 Linux/ldap/etc/openldap/ldap.conf                 |  6 +++---
 Linux/ldap/etc/sssd/sssd.conf                     |  2 +-
 Linux/ldap/ldap.yaml                              | 24
+++++++++++++++--------
 5 files changed, 38 insertions(+), 12 deletions(-)
 rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%)


If I ssh to my desktop I get 

predragp at lake$ git pull
Host key fingerprint is
SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
+---[ECDSA 256]---+
|     .++++o.o..  |
|     . .+=+B .   |
|. . . . ..Bo.    |
|.E + o . o  .    |
| .= *   S        |
|o=o*  .          |
|+=+.o .o         |
|o.o+.+. .        |
|.   +o..         |
+----[SHA256]-----+
Password for git at git.int.autonlab.org:

which is the indication that my .ssh/config file and the ssh-key were
not read even though

Host git 
    HostName git.int.autonlab.org
    Port 2222
    User git 
    IdentityFile /home/predragp/.ssh/git_rsa 

However if I log from the terminal to my desktop I don't have a Git
issue. 


Best,
Predrag


> >
> > On 11/6/18 3:14 PM, Predrag Punosevac wrote:
> >
> >> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
> >>
> >> Also having this problem. Trying to create a new notebook hangs on
> >>> "Creating new notebook in", and unable to open old notebooks. Anyone's
> >>> setup currently working?
> >>>
> >>> Jupyter Notebook is using sqlite database to store the info. Unless you
> >> explicitly force Jupyter to create the database on the scratch directory
> >> the database is stored on the NFS share. There is nothing worse one can
> >> do in terms of data consistency than put a database or a private Git
> >> repo (talking about the server) onto the NFS. The datebase was left in
> >> inconsistent state after the file server was rebooted. You have to clear
> >> it and possibly recreate the database to be able to use Jupyter
> >> Notebook.
> >>
> >> Best,
> >> Predrag
> >>
> >>
> >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
> >>> vjeansel at andrew.cmu.edu>
> >>> wrote:
> >>>
> >>> Hello,
> >>>>
> >>>> Has anyone faced problems with running Jupyter Notebook since yesterday
> >>>> ?
> >>>> Did you remember what was the change to operate after the last reboot of
> >>>> the sqlite database ?
> >>>>
> >>>> Thank you,
> >>>>
> >>>> Vincent
> >>>>
> >>>> --
> >>>> Vincent Jeanselme
> >>>> -----------------
> >>>> Analyst Researcher
> >>>> Auton Lab - Robotics Institute
> >>>> Carnegie Mellon University
> >>>>
> >>>>
> >>>> --
> > Vincent Jeanselme
> > -----------------
> > Analyst Researcher
> > Auton Lab - Robotics Institute
> > Carnegie Mellon University
> >
> >
> 
> 
> -- 
> 
> *Chirag Nagpal* Graduate Student, Language Technologies Institute
> School of Computer Science
> Carnegie Mellon University
> cs.cmu.edu/~chiragn


From predragp at andrew.cmu.edu  Tue Nov  6 20:41:14 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 06 Nov 2018 20:41:14 -0500
Subject: CUDA hangs
In-Reply-To: <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
Message-ID: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu>

qiong.zhang at stat.ubc.ca wrote:

> I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided, doesn't work as well.
> 

Can you tell me which server? We have nine GPU servers.

Predrag

> Qiong 
> November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu (mailto:%22Matthew%20Barnes%22%20<mbarnes1 at andrew.cmu.edu>)> wrote:
> Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang:
> (torch) bash-4.2$ python 
> Python 2.7.5 (default, Jul 3 2018, 19:30:05) 
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2 
> Type "help", "copyright", "credits" or "license" for more information. 
> >>> import torch 
> x>>> x = torch.zeros(4) 
> >>> x.cuda()  
> nvidia-smi works, and torch.cuda.is_available() returns True.

From eyolcu at cs.cmu.edu  Tue Nov  6 20:41:29 2018
From: eyolcu at cs.cmu.edu (Emre Yolcu)
Date: Tue, 6 Nov 2018 20:41:29 -0500
Subject: CUDA hangs
In-Reply-To: <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
Message-ID: <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>

Could you try setting up everything in the scratch directory and test that
way (if that's not what you're already doing)? The last time we had a CUDA
problem I moved everything from /zfsauton/home to /home/scratch directories
and I cannot reproduce the error on gpu{6,8,9}.

On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca> wrote:

> I have a similar issue. When I submit the job, it says Runtime error: CUDA
> error: unknown error. I tried the simple commands that you provided,
> doesn't work as well.
>
> Qiong
>
> November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu
> <%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>> wrote:
>
> Is anyone else having issues with CUDA since this week? Even simple
> pytorch commands hang:
> (torch) bash-4.2$ python
> Python 2.7.5 (default, Jul 3 2018, 19:30:05)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import torch
> x>>> x = torch.zeros(4)
> >>> x.cuda()
> nvidia-smi works, and torch.cuda.is_available() returns True.
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/02560568/attachment.html>

From chiragn at cs.cmu.edu  Tue Nov  6 20:43:24 2018
From: chiragn at cs.cmu.edu (Chirag Nagpal)
Date: Tue, 6 Nov 2018 20:43:24 -0500
Subject: Jupyter Notebooks
In-Reply-To: <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
 <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
 <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>
Message-ID: <CAKH1gVrCz3Tig5X-iCTDVyUG_7fNXR9g32iKVXzwQMst7i7WMw@mail.gmail.com>

I am working on fixing this.

It is indeed SQLite and NFS not talking to each other which is the problem.
I am able to get jupyter to behave somewhat better by forcing the SQLite
server to be in the scratch instead if the NFS. This requires changing some
default flags for jupyter.

Its not completely fixed yet, but I will get back with an update soon.

Chirag


On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac <predragp at andrew.cmu.edu>
wrote:

> Chirag Nagpal <chiragn at cs.cmu.edu> wrote:
>
> > Predrag: I am able to reproduce the error on the scratch directory too.
> >
>
> I am sure you guys have a problem but one doesn't need Jupyter to do
> actual programming in Python. I am saying this because there are
> handful of you who are affected by this behavior (God knows what could
> have caused possibly even regressions by newer version of packages) and
> I am the only firefighter currently without bandwidth to deal with such
> wild fires.
>
> > Vincent, Matt:  are you using the anaconda jupyter? it could be an
> anaconda
> > upgrade thats responsible ?
> >
> > Chirag
> >
> > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme <
> vjeansel at andrew.cmu.edu>
> > wrote:
> >
> > > Even after changing my bashrc with export IPYTHONDIR=/home/scratch/$
> USER/.ipython
> > > and reinstalling my jupyter, it still does not seem to work. I also
> have an
> > > issue with git, I am no longer able to pull from the server.
> > >
>
>
> The git issue is environmental variable issue which is caused by the
> fact that there is only one user on Gogs server (git) and all accounts
> are just aliases with their own ssh-keys to this account. I don't use
> and know enough about git but those nasty files in your
>
> reponame/.git
>
> folder
>
> which look like
>
> predragp at lov3$ ls
> branches  description  HEAD   index  logs     ORIG_HEAD    refs
> config    FETCH_HEAD   hooks  info   objects  packed-refs
>
> apparently get populated in different ways depending on the login. So
> for example if I ssh to one of the computing nodes from home I get this
>
>
> predragp at lov3$ git pull
> Host key fingerprint is
> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
> +---[ECDSA 256]---+
> |     .++++o.o..  |
> |     . .+=+B .   |
> |. . . . ..Bo.    |
> |.E + o . o  .    |
> | .= *   S        |
> |o=o*  .          |
> |+=+.o .o         |
> |o.o+.+. .        |
> |.   +o..         |
> +----[SHA256]-----+
> remote: Enumerating objects: 34, done.
> remote: Counting objects: 100% (34/34), done.
> remote: Compressing objects: 100% (21/21), done.
> remote: Total 22 (delta 8), reused 0 (delta 0)
> Unpacking objects: 100% (22/22), done.
> From ssh://git:/predragp/ansible
>    cfc10cf..71a9aec  master     -> origin/master
> Updating cfc10cf..71a9aec
> Fast-forward
>  Linux/autofs/etc/auto.nfs                         | 18
> +++++++++++++++++
>  Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt |  0
>  Linux/ldap/etc/openldap/ldap.conf                 |  6 +++---
>  Linux/ldap/etc/sssd/sssd.conf                     |  2 +-
>  Linux/ldap/ldap.yaml                              | 24
> +++++++++++++++--------
>  5 files changed, 38 insertions(+), 12 deletions(-)
>  rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%)
>
>
> If I ssh to my desktop I get
>
> predragp at lake$ git pull
> Host key fingerprint is
> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
> +---[ECDSA 256]---+
> |     .++++o.o..  |
> |     . .+=+B .   |
> |. . . . ..Bo.    |
> |.E + o . o  .    |
> | .= *   S        |
> |o=o*  .          |
> |+=+.o .o         |
> |o.o+.+. .        |
> |.   +o..         |
> +----[SHA256]-----+
> Password for git at git.int.autonlab.org:
>
> which is the indication that my .ssh/config file and the ssh-key were
> not read even though
>
> Host git
>     HostName git.int.autonlab.org
>     Port 2222
>     User git
>     IdentityFile /home/predragp/.ssh/git_rsa
>
> However if I log from the terminal to my desktop I don't have a Git
> issue.
>
>
> Best,
> Predrag
>
>
>
>
>
>
> > >
> > > On 11/6/18 3:14 PM, Predrag Punosevac wrote:
> > >
> > >> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
> > >>
> > >> Also having this problem. Trying to create a new notebook hangs on
> > >>> "Creating new notebook in", and unable to open old notebooks.
> Anyone's
> > >>> setup currently working?
> > >>>
> > >>> Jupyter Notebook is using sqlite database to store the info. Unless
> you
> > >> explicitly force Jupyter to create the database on the scratch
> directory
> > >> the database is stored on the NFS share. There is nothing worse one
> can
> > >> do in terms of data consistency than put a database or a private Git
> > >> repo (talking about the server) onto the NFS. The datebase was left in
> > >> inconsistent state after the file server was rebooted. You have to
> clear
> > >> it and possibly recreate the database to be able to use Jupyter
> > >> Notebook.
> > >>
> > >> Best,
> > >> Predrag
> > >>
> > >>
> > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
> > >>> vjeansel at andrew.cmu.edu>
> > >>> wrote:
> > >>>
> > >>> Hello,
> > >>>>
> > >>>> Has anyone faced problems with running Jupyter Notebook since
> yesterday
> > >>>> ?
> > >>>> Did you remember what was the change to operate after the last
> reboot of
> > >>>> the sqlite database ?
> > >>>>
> > >>>> Thank you,
> > >>>>
> > >>>> Vincent
> > >>>>
> > >>>> --
> > >>>> Vincent Jeanselme
> > >>>> -----------------
> > >>>> Analyst Researcher
> > >>>> Auton Lab - Robotics Institute
> > >>>> Carnegie Mellon University
> > >>>>
> > >>>>
> > >>>> --
> > > Vincent Jeanselme
> > > -----------------
> > > Analyst Researcher
> > > Auton Lab - Robotics Institute
> > > Carnegie Mellon University
> > >
> > >
> >
> >
> > --
> >
> > *Chirag Nagpal* Graduate Student, Language Technologies Institute
> > School of Computer Science
> > Carnegie Mellon University
> > cs.cmu.edu/~chiragn
>


-- 

*Chirag Nagpal* Graduate Student, Language Technologies Institute
School of Computer Science
Carnegie Mellon University
cs.cmu.edu/~chiragn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/631dafbb/attachment-0001.html>

From chiragn at cs.cmu.edu  Tue Nov  6 21:39:30 2018
From: chiragn at cs.cmu.edu (Chirag Nagpal)
Date: Tue, 6 Nov 2018 21:39:30 -0500
Subject: Jupyter Notebooks
In-Reply-To: <CAKH1gVrCz3Tig5X-iCTDVyUG_7fNXR9g32iKVXzwQMst7i7WMw@mail.gmail.com>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
 <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
 <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>
 <CAKH1gVrCz3Tig5X-iCTDVyUG_7fNXR9g32iKVXzwQMst7i7WMw@mail.gmail.com>
Message-ID: <CAKH1gVr0BGgya1DORgaS5AD4qQ2BiNdW4MFX-At7jNDR6p1fVg@mail.gmail.com>

Ok I think I have a quick fix for people struggling with jupyter notebook
on NFS.

There are essentially two parts to the problem. The first one deals with
forcing jupyter to create its SQLite db in your scratch directory. The
second part is the ipython directory.


Part 1:

Step 1 : First ssh into the Auton Computing Environment and run

$jupyter notebook --generate-config

This would create a config file 'jupyter_notebook_config.py' in ~/.jupyter
. ( Note that ~ this is /zfsauton/home/<username> )

Step 2: Edit the file above with your favourite text editor and add (or
uncomment) the following line

c.NotebookNotary.db_file='/home/scratch/<username>/jupyter.log'

replace <username> with your auton username.

If you executed part 1 correctly, you should be able to create a new
Notebook, and stop the Ipython server using control-c. However, you will
not be able to connect Jupyter to Ipython for which you need to perform
part 2

Part 2:

$export IPYTHONDIR=/home/scratch/<username>/ipython

as always replace <username> with your auton username.

you can also add this to .bashrc for next time ;)

Thats it!

It should work now!

Chirag


On Tue, Nov 6, 2018 at 8:43 PM, Chirag Nagpal <chiragn at cs.cmu.edu> wrote:

> I am working on fixing this.
>
> It is indeed SQLite and NFS not talking to each other which is the
> problem. I am able to get jupyter to behave somewhat better by forcing the
> SQLite server to be in the scratch instead if the NFS. This requires
> changing some default flags for jupyter.
>
> Its not completely fixed yet, but I will get back with an update soon.
>
> Chirag
>
>
>
> On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac <predragp at andrew.cmu.edu
> > wrote:
>
>> Chirag Nagpal <chiragn at cs.cmu.edu> wrote:
>>
>> > Predrag: I am able to reproduce the error on the scratch directory too.
>> >
>>
>> I am sure you guys have a problem but one doesn't need Jupyter to do
>> actual programming in Python. I am saying this because there are
>> handful of you who are affected by this behavior (God knows what could
>> have caused possibly even regressions by newer version of packages) and
>> I am the only firefighter currently without bandwidth to deal with such
>> wild fires.
>>
>> > Vincent, Matt:  are you using the anaconda jupyter? it could be an
>> anaconda
>> > upgrade thats responsible ?
>> >
>> > Chirag
>> >
>> > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme <
>> vjeansel at andrew.cmu.edu>
>> > wrote:
>> >
>> > > Even after changing my bashrc with export
>> IPYTHONDIR=/home/scratch/$USER/.ipython
>> > > and reinstalling my jupyter, it still does not seem to work. I also
>> have an
>> > > issue with git, I am no longer able to pull from the server.
>> > >
>>
>>
>> The git issue is environmental variable issue which is caused by the
>> fact that there is only one user on Gogs server (git) and all accounts
>> are just aliases with their own ssh-keys to this account. I don't use
>> and know enough about git but those nasty files in your
>>
>> reponame/.git
>>
>> folder
>>
>> which look like
>>
>> predragp at lov3$ ls
>> branches  description  HEAD   index  logs     ORIG_HEAD    refs
>> config    FETCH_HEAD   hooks  info   objects  packed-refs
>>
>> apparently get populated in different ways depending on the login. So
>> for example if I ssh to one of the computing nodes from home I get this
>>
>>
>> predragp at lov3$ git pull
>> Host key fingerprint is
>> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
>> +---[ECDSA 256]---+
>> |     .++++o.o..  |
>> |     . .+=+B .   |
>> |. . . . ..Bo.    |
>> |.E + o . o  .    |
>> | .= *   S        |
>> |o=o*  .          |
>> |+=+.o .o         |
>> |o.o+.+. .        |
>> |.   +o..         |
>> +----[SHA256]-----+
>> remote: Enumerating objects: 34, done.
>> remote: Counting objects: 100% (34/34), done.
>> remote: Compressing objects: 100% (21/21), done.
>> remote: Total 22 (delta 8), reused 0 (delta 0)
>> Unpacking objects: 100% (22/22), done.
>> From ssh://git:/predragp/ansible
>>    cfc10cf..71a9aec  master     -> origin/master
>> Updating cfc10cf..71a9aec
>> Fast-forward
>>  Linux/autofs/etc/auto.nfs                         | 18
>> +++++++++++++++++
>>  Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt |  0
>>  Linux/ldap/etc/openldap/ldap.conf                 |  6 +++---
>>  Linux/ldap/etc/sssd/sssd.conf                     |  2 +-
>>  Linux/ldap/ldap.yaml                              | 24
>> +++++++++++++++--------
>>  5 files changed, 38 insertions(+), 12 deletions(-)
>>  rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%)
>>
>>
>> If I ssh to my desktop I get
>>
>> predragp at lake$ git pull
>> Host key fingerprint is
>> SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
>> +---[ECDSA 256]---+
>> |     .++++o.o..  |
>> |     . .+=+B .   |
>> |. . . . ..Bo.    |
>> |.E + o . o  .    |
>> | .= *   S        |
>> |o=o*  .          |
>> |+=+.o .o         |
>> |o.o+.+. .        |
>> |.   +o..         |
>> +----[SHA256]-----+
>> Password for git at git.int.autonlab.org:
>>
>> which is the indication that my .ssh/config file and the ssh-key were
>> not read even though
>>
>> Host git
>>     HostName git.int.autonlab.org
>>     Port 2222
>>     User git
>>     IdentityFile /home/predragp/.ssh/git_rsa
>>
>> However if I log from the terminal to my desktop I don't have a Git
>> issue.
>>
>>
>> Best,
>> Predrag
>>
>>
>>
>>
>>
>>
>> > >
>> > > On 11/6/18 3:14 PM, Predrag Punosevac wrote:
>> > >
>> > >> Matthew Barnes <mbarnes1 at andrew.cmu.edu> wrote:
>> > >>
>> > >> Also having this problem. Trying to create a new notebook hangs on
>> > >>> "Creating new notebook in", and unable to open old notebooks.
>> Anyone's
>> > >>> setup currently working?
>> > >>>
>> > >>> Jupyter Notebook is using sqlite database to store the info. Unless
>> you
>> > >> explicitly force Jupyter to create the database on the scratch
>> directory
>> > >> the database is stored on the NFS share. There is nothing worse one
>> can
>> > >> do in terms of data consistency than put a database or a private Git
>> > >> repo (talking about the server) onto the NFS. The datebase was left
>> in
>> > >> inconsistent state after the file server was rebooted. You have to
>> clear
>> > >> it and possibly recreate the database to be able to use Jupyter
>> > >> Notebook.
>> > >>
>> > >> Best,
>> > >> Predrag
>> > >>
>> > >>
>> > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
>> > >>> vjeansel at andrew.cmu.edu>
>> > >>> wrote:
>> > >>>
>> > >>> Hello,
>> > >>>>
>> > >>>> Has anyone faced problems with running Jupyter Notebook since
>> yesterday
>> > >>>> ?
>> > >>>> Did you remember what was the change to operate after the last
>> reboot of
>> > >>>> the sqlite database ?
>> > >>>>
>> > >>>> Thank you,
>> > >>>>
>> > >>>> Vincent
>> > >>>>
>> > >>>> --
>> > >>>> Vincent Jeanselme
>> > >>>> -----------------
>> > >>>> Analyst Researcher
>> > >>>> Auton Lab - Robotics Institute
>> > >>>> Carnegie Mellon University
>> > >>>>
>> > >>>>
>> > >>>> --
>> > > Vincent Jeanselme
>> > > -----------------
>> > > Analyst Researcher
>> > > Auton Lab - Robotics Institute
>> > > Carnegie Mellon University
>> > >
>> > >
>> >
>> >
>> > --
>> >
>> > *Chirag Nagpal* Graduate Student, Language Technologies Institute
>> > School of Computer Science
>> > Carnegie Mellon University
>> > cs.cmu.edu/~chiragn
>>
>
>
>
> --
>
> *Chirag Nagpal* Graduate Student, Language Technologies Institute
> School of Computer Science
> Carnegie Mellon University
> cs.cmu.edu/~chiragn
>


-- 

*Chirag Nagpal* Graduate Student, Language Technologies Institute
School of Computer Science
Carnegie Mellon University
cs.cmu.edu/~chiragn
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/329d1723/attachment-0001.html>

From yichongx at cs.cmu.edu  Tue Nov  6 21:43:29 2018
From: yichongx at cs.cmu.edu (Yichong Xu)
Date: Wed, 7 Nov 2018 02:43:29 +0000
Subject: CUDA hangs
In-Reply-To: <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
 <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
Message-ID: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>

Previously we have encountered this issue: Basically somehow you cannot put your cuda cache on nfs server now. Doing this will resolve the problem (works for me):
export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]

Thanks,
Yichong


On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu<mailto:eyolcu at cs.cmu.edu>> wrote:

Could you try setting up everything in the scratch directory and test that way (if that's not what you're already doing)? The last time we had a CUDA problem I moved everything from /zfsauton/home to /home/scratch directories and I cannot reproduce the error on gpu{6,8,9}.

On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca<mailto:qiong.zhang at stat.ubc.ca>> wrote:

I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I tried the simple commands that you provided, doesn't work as well.

Qiong

November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu<mailto:%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>> wrote:
Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang:
(torch) bash-4.2$ python
Python 2.7.5 (default, Jul 3 2018, 19:30:05)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
x>>> x = torch.zeros(4)
>>> x.cuda()
nvidia-smi works, and torch.cuda.is_available() returns True.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181107/76eece04/attachment.html>

From mbarnes1 at andrew.cmu.edu  Tue Nov  6 21:51:19 2018
From: mbarnes1 at andrew.cmu.edu (Matthew Barnes)
Date: Tue, 6 Nov 2018 21:51:19 -0500
Subject: CUDA hangs
In-Reply-To: <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
 <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
 <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>
Message-ID: <CAB7OVwB=kYQxY7PkCFxZwZDpLuZ2ES6i7ELj39d_02YFx-DkMw@mail.gmail.com>

The CUDA_CACHE_PATH works! Thanks for the quick fix.

On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu> wrote:

> Previously we have encountered this issue: Basically somehow you cannot
> put your cuda cache on nfs server now. Doing this will resolve the problem
> (works for me):
> export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
>
> *Thanks,*
> *Yichong*
>
>
>
> On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu> wrote:
>
> Could you try setting up everything in the scratch directory and test that
> way (if that's not what you're already doing)? The last time we had a CUDA
> problem I moved everything from /zfsauton/home to /home/scratch directories
> and I cannot reproduce the error on gpu{6,8,9}.
>
> On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca> wrote:
>
>> I have a similar issue. When I submit the job, it says Runtime error:
>> CUDA error: unknown error. I tried the simple commands that you provided,
>> doesn't work as well.
>>
>> Qiong
>>
>> November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu
>> <%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>> wrote:
>>
>> Is anyone else having issues with CUDA since this week? Even simple
>> pytorch commands hang:
>> (torch) bash-4.2$ python
>> Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import torch
>> x>>> x = torch.zeros(4)
>> >>> x.cuda()
>> nvidia-smi works, and torch.cuda.is_available() returns True.
>>
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/596c5dd5/attachment.html>

From vjeansel at andrew.cmu.edu  Tue Nov  6 22:01:18 2018
From: vjeansel at andrew.cmu.edu (Vincent Jeanselme)
Date: Tue, 6 Nov 2018 22:01:18 -0500
Subject: CUDA hangs
In-Reply-To: <CAB7OVwB=kYQxY7PkCFxZwZDpLuZ2ES6i7ELj39d_02YFx-DkMw@mail.gmail.com>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
 <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
 <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>
 <CAB7OVwB=kYQxY7PkCFxZwZDpLuZ2ES6i7ELj39d_02YFx-DkMw@mail.gmail.com>
Message-ID: <ae94a6e4-b158-8537-77dc-f2c5212e6dd4@andrew.cmu.edu>

Unfortunately not for me, I already had this path ...

Le 06/11/2018 ? 21:51, Matthew Barnes a ?crit?:
> The CUDA_CACHE_PATH works! Thanks for the quick fix.
>
> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu 
> <mailto:yichongx at cs.cmu.edu>> wrote:
>
>     Previously we have encountered this issue: Basically somehow you
>     cannot put your cuda cache on nfs server now. Doing this will
>     resolve the problem (works for me):
>     export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
>
>     /Thanks,/
>     /Yichong/
>
>
>
>>     On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu
>>     <mailto:eyolcu at cs.cmu.edu>> wrote:
>>
>>     Could you try setting up everything in the scratch directory and
>>     test that way (if that's not what you're already doing)? The last
>>     time we had a CUDA problem I moved everything from /zfsauton/home
>>     to /home/scratch directories and I cannot reproduce the error on
>>     gpu{6,8,9}.
>>
>>     On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca
>>     <mailto:qiong.zhang at stat.ubc.ca>> wrote:
>>
>>         I have a similar issue. When I submit the job, it says
>>         Runtime error: CUDA error: unknown error. I tried the simple
>>         commands that you provided, doesn't work as well.
>>
>>         Qiong
>>
>>
>>         November 6, 2018 3:02 PM, "Matthew Barnes"
>>         <mbarnes1 at andrew.cmu.edu
>>         <mailto:%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>>
>>         wrote:
>>
>>             Is anyone else having issues with CUDA since this week?
>>             Even simple pytorch commands hang:
>>             (torch) bash-4.2$ python
>>             Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>>             [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>>             Type "help", "copyright", "credits" or "license" for more
>>             information.
>>             >>> import torch
>>             x>>> x = torch.zeros(4)
>>             >>> x.cuda()
>>             nvidia-smi works, and torch.cuda.is_available() returns True.
>>
>>
>>
>>
>
-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/1bf14b23/attachment-0001.html>

From vjeansel at andrew.cmu.edu  Tue Nov  6 22:06:53 2018
From: vjeansel at andrew.cmu.edu (Vincent Jeanselme)
Date: Tue, 6 Nov 2018 22:06:53 -0500
Subject: Jupyter Notebooks
In-Reply-To: <CAKH1gVr0BGgya1DORgaS5AD4qQ2BiNdW4MFX-At7jNDR6p1fVg@mail.gmail.com>
References: <cba72cd8-aaaf-9cf9-4630-6c6dd2ced0b8@andrew.cmu.edu>
 <CAB7OVwCCY9R7R_QOaB+MtdAxhAZkqONW3TQBPECK5zTPCCP-nw@mail.gmail.com>
 <20181106201427.MbWsmawdm%predragp@andrew.cmu.edu>
 <c9946fc2-ad48-a03f-a423-719761944d80@andrew.cmu.edu>
 <CAKH1gVrBTPWt30wppFc7dTapMJ=0BBCJGMxm0Jz4vKNwoenGPw@mail.gmail.com>
 <20181107013924.QogiH68Av%predragp@andrew.cmu.edu>
 <CAKH1gVrCz3Tig5X-iCTDVyUG_7fNXR9g32iKVXzwQMst7i7WMw@mail.gmail.com>
 <CAKH1gVr0BGgya1DORgaS5AD4qQ2BiNdW4MFX-At7jNDR6p1fVg@mail.gmail.com>
Message-ID: <94b2672f-cf66-f6ea-4780-f04305d1cb61@andrew.cmu.edu>

Thank you ! Uncommenting the line: c.NotebookNotary.db_file worked for 
my Jupyter

Le 06/11/2018 ? 21:39, Chirag Nagpal a ?crit?:
> Ok I think I have a quick fix for people struggling with jupyter 
> notebook on NFS.
>
> There are essentially two parts to the problem. The first one deals 
> with forcing jupyter to create its SQLite db in your scratch 
> directory. The second part is the ipython directory.
>
>
> Part 1:
>
> Step 1?: First ssh into the Auton Computing Environment and run
>
> $jupyter notebook --generate-config
>
> This would create a config file 'jupyter_notebook_config.py' in 
> ~/.jupyter . ( Note that ~ this is /zfsauton/home/<username> )
>
> Step 2: Edit the file above with your favourite text editor and add 
> (or uncomment) the following line
>
> c.NotebookNotary.db_file='/home/scratch/<username>/jupyter.log'
>
> replace <username> with your auton username.
>
> If you executed part 1 correctly, you should be able to create a new 
> Notebook, and stop the Ipython server using control-c. However, you 
> will not be able to connect Jupyter to Ipython for which you need to 
> perform part 2
>
> Part 2:
>
> $export IPYTHONDIR=/home/scratch/<username>/ipython
>
> as always replace<username> with your auton username.
>
> you can also add this to .bashrc for next time ;)
>
> Thats it!
>
> It should work now!
>
> Chirag
>
>
>
>
>
>
>
>
> On Tue, Nov 6, 2018 at 8:43 PM, Chirag Nagpal <chiragn at cs.cmu.edu 
> <mailto:chiragn at cs.cmu.edu>> wrote:
>
>     I am working on fixing this.
>
>     It is indeed SQLite and NFS not talking to each other which is the
>     problem. I am able to get jupyter to behave somewhat better by
>     forcing the SQLite server to be in the scratch instead if the NFS.
>     This requires changing some default flags for jupyter.
>
>     Its not completely fixed yet, but I will get back with an update
>     soon.
>
>     Chirag
>
>
>
>     On Tue, Nov 6, 2018 at 8:39 PM, Predrag Punosevac
>     <predragp at andrew.cmu.edu <mailto:predragp at andrew.cmu.edu>> wrote:
>
>         Chirag Nagpal <chiragn at cs.cmu.edu <mailto:chiragn at cs.cmu.edu>>
>         wrote:
>
>         > Predrag: I am able to reproduce the error on the scratch
>         directory too.
>         >
>
>         I am sure you guys have a problem but one doesn't need Jupyter
>         to do
>         actual programming in Python. I am saying this because there are
>         handful of you who are affected by this behavior (God knows
>         what could
>         have caused possibly even regressions by newer version of
>         packages) and
>         I am the only firefighter currently without bandwidth to deal
>         with such
>         wild fires.
>
>         > Vincent, Matt:? are you using the anaconda jupyter? it could
>         be an anaconda
>         > upgrade thats responsible ?
>         >
>         > Chirag
>         >
>         > On Tue, Nov 6, 2018 at 4:45 PM, Vincent Jeanselme
>         <vjeansel at andrew.cmu.edu <mailto:vjeansel at andrew.cmu.edu>>
>         > wrote:
>         >
>         > > Even after changing my bashrc with export
>         IPYTHONDIR=/home/scratch/$USER/.ipython
>         > > and reinstalling my jupyter, it still does not seem to
>         work. I also have an
>         > > issue with git, I am no longer able to pull from the server.
>         > >
>
>
>         The git issue is environmental variable issue which is caused
>         by the
>         fact that there is only one user on Gogs server (git) and all
>         accounts
>         are just aliases with their own ssh-keys to this account. I
>         don't use
>         and know enough about git but those nasty files in your
>
>         reponame/.git
>
>         folder
>
>         which look like
>
>         predragp at lov3$ ls
>         branches? description? HEAD? ?index? logs ?ORIG_HEAD? ? refs
>         config? ? FETCH_HEAD? ?hooks? info? ?objects packed-refs
>
>         apparently get populated in different ways depending on the
>         login. So
>         for example if I ssh to one of the computing nodes from home I
>         get this
>
>
>         predragp at lov3$ git pull
>         Host key fingerprint is
>         SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
>         +---[ECDSA 256]---+
>         |? ? ?.++++o.o..? |
>         |? ? ?. .+=+B .? ?|
>         |. . . . ..Bo.? ? |
>         |.E + o . o? .? ? |
>         | .= *? ?S? ? ? ? |
>         |o=o*? .? ? ? ? ? |
>         |+=+.o .o? ? ? ? ?|
>         |o.o+.+. .? ? ? ? |
>         |.? ?+o..? ? ? ? ?|
>         +----[SHA256]-----+
>         remote: Enumerating objects: 34, done.
>         remote: Counting objects: 100% (34/34), done.
>         remote: Compressing objects: 100% (21/21), done.
>         remote: Total 22 (delta 8), reused 0 (delta 0)
>         Unpacking objects: 100% (22/22), done.
>         From ssh://git:/predragp/ansible
>         ? ?cfc10cf..71a9aec? master? ? ?-> origin/master
>         Updating cfc10cf..71a9aec
>         Fast-forward
>         ?Linux/autofs/etc/auto.nfs ?| 18
>         +++++++++++++++++
>         ?Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt |? 0
>         ?Linux/ldap/etc/openldap/ldap.conf ? ? ?|? 6 +++---
>         ?Linux/ldap/etc/sssd/sssd.conf ? ? ?|? 2 +-
>         ?Linux/ldap/ldap.yaml | 24
>         +++++++++++++++--------
>         ?5 files changed, 38 insertions(+), 12 deletions(-)
>         ?rename Linux/ldap/etc/openldap/{certs => cacerts}/ca.crt (100%)
>
>
>         If I ssh to my desktop I get
>
>         predragp at lake$ git pull
>         Host key fingerprint is
>         SHA256:XEkrneUFkkAPyf0gMOQKa3amiAp3QrXmu1x+sIJMcsA
>         +---[ECDSA 256]---+
>         |? ? ?.++++o.o..? |
>         |? ? ?. .+=+B .? ?|
>         |. . . . ..Bo.? ? |
>         |.E + o . o? .? ? |
>         | .= *? ?S? ? ? ? |
>         |o=o*? .? ? ? ? ? |
>         |+=+.o .o? ? ? ? ?|
>         |o.o+.+. .? ? ? ? |
>         |.? ?+o..? ? ? ? ?|
>         +----[SHA256]-----+
>         Password for git at git.int.autonlab.org
>         <mailto:git at git.int.autonlab.org>:
>
>         which is the indication that my .ssh/config file and the
>         ssh-key were
>         not read even though
>
>         Host git
>         ? ? HostName git.int.autonlab.org <http://git.int.autonlab.org>
>         ? ? Port 2222
>         ? ? User git
>         ? ? IdentityFile /home/predragp/.ssh/git_rsa
>
>         However if I log from the terminal to my desktop I don't have
>         a Git
>         issue.
>
>
>         Best,
>         Predrag
>
>
>
>
>
>
>         > >
>         > > On 11/6/18 3:14 PM, Predrag Punosevac wrote:
>         > >
>         > >> Matthew Barnes <mbarnes1 at andrew.cmu.edu
>         <mailto:mbarnes1 at andrew.cmu.edu>> wrote:
>         > >>
>         > >> Also having this problem. Trying to create a new notebook
>         hangs on
>         > >>> "Creating new notebook in", and unable to open old
>         notebooks. Anyone's
>         > >>> setup currently working?
>         > >>>
>         > >>> Jupyter Notebook is using sqlite database to store the
>         info. Unless you
>         > >> explicitly force Jupyter to create the database on the
>         scratch directory
>         > >> the database is stored on the NFS share. There is nothing
>         worse one can
>         > >> do in terms of data consistency than put a database or a
>         private Git
>         > >> repo (talking about the server) onto the NFS. The
>         datebase was left in
>         > >> inconsistent state after the file server was rebooted.
>         You have to clear
>         > >> it and possibly recreate the database to be able to use
>         Jupyter
>         > >> Notebook.
>         > >>
>         > >> Best,
>         > >> Predrag
>         > >>
>         > >>
>         > >> On Tue, Nov 6, 2018 at 12:44 PM Vincent Jeanselme <
>         > >>> vjeansel at andrew.cmu.edu <mailto:vjeansel at andrew.cmu.edu>>
>         > >>> wrote:
>         > >>>
>         > >>> Hello,
>         > >>>>
>         > >>>> Has anyone faced problems with running Jupyter Notebook
>         since yesterday
>         > >>>> ?
>         > >>>> Did you remember what was the change to operate after
>         the last reboot of
>         > >>>> the sqlite database ?
>         > >>>>
>         > >>>> Thank you,
>         > >>>>
>         > >>>> Vincent
>         > >>>>
>         > >>>> --
>         > >>>> Vincent Jeanselme
>         > >>>> -----------------
>         > >>>> Analyst Researcher
>         > >>>> Auton Lab - Robotics Institute
>         > >>>> Carnegie Mellon University
>         > >>>>
>         > >>>>
>         > >>>> --
>         > > Vincent Jeanselme
>         > > -----------------
>         > > Analyst Researcher
>         > > Auton Lab - Robotics Institute
>         > > Carnegie Mellon University
>         > >
>         > >
>         >
>         >
>         > --
>         >
>         > *Chirag Nagpal* Graduate Student, Language Technologies
>         Institute
>         > School of Computer Science
>         > Carnegie Mellon University
>         > cs.cmu.edu/~chiragn <http://cs.cmu.edu/~chiragn>
>
>
>
>
>     -- 
>     *Chirag Nagpal
>     *Graduate Student, Language Technologies Institute
>     School of Computer Science
>     Carnegie Mellon University
>     cs.cmu.edu/~chiragn <http://cs.cmu.edu/~chiragn>
>
>
>
>
> -- 
> *Chirag Nagpal
> * Graduate Student, Language Technologies Institute
> School of Computer Science
> Carnegie Mellon University
> cs.cmu.edu/~chiragn <http://cs.cmu.edu/~chiragn>

-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181106/5034d12f/attachment-0001.html>

From qiong.zhang at stat.ubc.ca  Tue Nov  6 23:58:32 2018
From: qiong.zhang at stat.ubc.ca (qiong.zhang at stat.ubc.ca)
Date: Wed, 07 Nov 2018 04:58:32 +0000
Subject: CUDA hangs
In-Reply-To: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu>
References: <20181107014114.WFKLqkUPf%predragp@andrew.cmu.edu>
 <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
Message-ID: <8ba316a89bd0eed2ce0cbb75a959c545@stat.ubc.ca>

The CUDA_CACHE_PATH works for me! Thanks.

Qiong


November 6, 2018 5:41 PM, "Predrag Punosevac" <predragp at andrew.cmu.edu> wrote:

> qiong.zhang at stat.ubc.ca wrote:
> 
>> I have a similar issue. When I submit the job, it says Runtime error: CUDA error: unknown error. I
>> tried the simple commands that you provided, doesn't work as well.
> 
> Can you tell me which server? We have nine GPU servers.
> 
> Predrag
> 
>> Qiong
>> November 6, 2018 3:02 PM, "Matthew Barnes" <mbarnes1 at andrew.cmu.edu
>> (mailto:%22Matthew%20Barnes%22%20<mbarnes1 at andrew.cmu.edu>)> wrote:
>> Is anyone else having issues with CUDA since this week? Even simple pytorch commands hang:
>> (torch) bash-4.2$ python
>> Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> import torch
>> x>>> x = torch.zeros(4)
>> x.cuda()
>> nvidia-smi works, and torch.cuda.is_available() returns True.


From vjeansel at andrew.cmu.edu  Wed Nov  7 08:52:34 2018
From: vjeansel at andrew.cmu.edu (Vincent Jeanselme)
Date: Wed, 7 Nov 2018 08:52:34 -0500
Subject: CUDA hangs
In-Reply-To: <ae94a6e4-b158-8537-77dc-f2c5212e6dd4@andrew.cmu.edu>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
 <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
 <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>
 <CAB7OVwB=kYQxY7PkCFxZwZDpLuZ2ES6i7ELj39d_02YFx-DkMw@mail.gmail.com>
 <ae94a6e4-b158-8537-77dc-f2c5212e6dd4@andrew.cmu.edu>
Message-ID: <e81b39a3-e159-0849-65eb-65fd38c1a182@andrew.cmu.edu>

Problem solved after restart of tmux

On 11/6/18 10:01 PM, Vincent Jeanselme wrote:
>
> Unfortunately not for me, I already had this path ...
>
> Le 06/11/2018 ? 21:51, Matthew Barnes a ?crit?:
>> The CUDA_CACHE_PATH works! Thanks for the quick fix.
>>
>> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu 
>> <mailto:yichongx at cs.cmu.edu>> wrote:
>>
>>     Previously we have encountered this issue: Basically somehow you
>>     cannot put your cuda cache on nfs server now. Doing this will
>>     resolve the problem (works for me):
>>     export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
>>
>>     /Thanks,/
>>     /Yichong/
>>
>>
>>
>>>     On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu
>>>     <mailto:eyolcu at cs.cmu.edu>> wrote:
>>>
>>>     Could you try setting up everything in the scratch directory and
>>>     test that way (if that's not what you're already doing)? The
>>>     last time we had a CUDA problem I moved everything from
>>>     /zfsauton/home to /home/scratch directories and I cannot
>>>     reproduce the error on gpu{6,8,9}.
>>>
>>>     On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca
>>>     <mailto:qiong.zhang at stat.ubc.ca>> wrote:
>>>
>>>         I have a similar issue. When I submit the job, it says
>>>         Runtime error: CUDA error: unknown error. I tried the simple
>>>         commands that you provided, doesn't work as well.
>>>
>>>         Qiong
>>>
>>>
>>>         November 6, 2018 3:02 PM, "Matthew Barnes"
>>>         <mbarnes1 at andrew.cmu.edu
>>>         <mailto:%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>>
>>>         wrote:
>>>
>>>             Is anyone else having issues with CUDA since this week?
>>>             Even simple pytorch commands hang:
>>>             (torch) bash-4.2$ python
>>>             Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>>>             [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>>>             Type "help", "copyright", "credits" or "license" for
>>>             more information.
>>>             >>> import torch
>>>             x>>> x = torch.zeros(4)
>>>             >>> x.cuda()
>>>             nvidia-smi works, and torch.cuda.is_available() returns
>>>             True.
>>>
>>>
>>>
>>>
>>
> -- 
> Vincent Jeanselme
> -----------------
> Analyst Researcher
> Auton Lab - Robotics Institute
> Carnegie Mellon University

-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181107/4bcf389e/attachment.html>

From predragp at andrew.cmu.edu  Wed Nov  7 12:22:05 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 07 Nov 2018 12:22:05 -0500
Subject: Moral of the story
In-Reply-To: <e81b39a3-e159-0849-65eb-65fd38c1a182@andrew.cmu.edu>
References: <CAB7OVwBTs71O1a0qE7R4JKdoEZrUREYhmn8UfZ4mHL4CG4nWWQ@mail.gmail.com>
 <e8d471abd906afe91e04031f8e388992@stat.ubc.ca>
 <CAO5Ti1s8hkdqGR-7NMw8BMmxvtSVYMAc28MTZCp4zEUVWLg4yQ@mail.gmail.com>
 <138093CD-5E74-42EA-A6E6-FA7F7BBB09ED@andrew.cmu.edu>
 <CAB7OVwB=kYQxY7PkCFxZwZDpLuZ2ES6i7ELj39d_02YFx-DkMw@mail.gmail.com>
 <ae94a6e4-b158-8537-77dc-f2c5212e6dd4@andrew.cmu.edu>
 <e81b39a3-e159-0849-65eb-65fd38c1a182@andrew.cmu.edu>
Message-ID: <20181107172205.cWJ60phln%predragp@andrew.cmu.edu>

Vincent Jeanselme <vjeansel at andrew.cmu.edu> wrote:

> Problem solved after restart of tmux
> 

This is a good opportunity for all of us to reflect on what we have
learnt from this long public e-mail exchange.


1. Caching thing be it pytorch, ccache, or something else speeds up the
things but create lot of problems when done on the volatile file system
as NFS backed up by the most expensive file system ZFS. It creates
unexpected hard to trace errors in the case of the file server
unavailability. However from a system admin point of view create
enormous garbage on the file server in the form of metadata needed to
store hourly snapshots. I would wage $100 that we probably have 500GB in
cache files and their snapshots alone on the main file server. 

I would really appreciate if everyone volunteerly uses only their
scratch directories (not /tmp not NFS) for caching as well as clean
their home directories during this time when ZFS snapshots are disabled.


2. Storing databases on NFS even unconsciously (sqlite used by Jupyter
notebook) will sooner or later leave them in unconsistent state and lead
to user frustration which is very hard and time consuming to trace and
address. It is even worse doing it intensionally with PostgreSQL or
MySQL. Please store your Jupyter notebooks sqlite databases on the
scratch directory. For everything else more serious, we have database
host that can be used on the need base.


3. Finally we all need to familiarize ourselves better with the tools we
are using (Git/Gogs/tmux/screen etc). The decision that we adopt Git as
a version control system for the Auton Lab was a long and carefully
thought-out.  For the record my opinion and my preference (fossil)
didn't bare almost any weight.  We had two other version control systems
CVS and Subversion in the past which are still available as read only
through ViewVC

http://svnhub.int.autonlab.org/viewvc

and I can assure you that we learnt the lectures by using them.  The
same goes for the Gogs self-hosted Git service which provides us with
web interface but also with bug tracking mechanism with code tagging,
Wiki, and solid integration with Jenkins. Is it perfect? No it is not.
Does one need to understand how the ssh-keys and environmental variables
are read. Yes you have to get your feet wet and it is far easier to do
it at the Auton Lab which is very forgiving academic computing
environment than at your next place of employment. If you think that the
Gogs alternative GitLab is any better think again and just talk to
people who used it or God forbid try to set it up. 

Best,
Predrag

> On 11/6/18 10:01 PM, Vincent Jeanselme wrote:
> >
> > Unfortunately not for me, I already had this path ...
> >
> > Le 06/11/2018 ?? 21:51, Matthew Barnes a ??crit??:
> >> The CUDA_CACHE_PATH works! Thanks for the quick fix.
> >>
> >> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu 
> >> <mailto:yichongx at cs.cmu.edu>> wrote:
> >>
> >>     Previously we have encountered this issue: Basically somehow you
> >>     cannot put your cuda cache on nfs server now. Doing this will
> >>     resolve the problem (works for me):
> >>     export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
> >>
> >>     /Thanks,/
> >>     /Yichong/
> >>
> >>
> >>
> >>>     On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu
> >>>     <mailto:eyolcu at cs.cmu.edu>> wrote:
> >>>
> >>>     Could you try setting up everything in the scratch directory and
> >>>     test that way (if that's not what you're already doing)? The
> >>>     last time we had a CUDA problem I moved everything from
> >>>     /zfsauton/home to /home/scratch directories and I cannot
> >>>     reproduce the error on gpu{6,8,9}.
> >>>
> >>>     On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca
> >>>     <mailto:qiong.zhang at stat.ubc.ca>> wrote:
> >>>
> >>>         I have a similar issue. When I submit the job, it says
> >>>         Runtime error: CUDA error: unknown error. I tried the simple
> >>>         commands that you provided, doesn't work as well.
> >>>
> >>>         Qiong
> >>>
> >>>
> >>>         November 6, 2018 3:02 PM, "Matthew Barnes"
> >>>         <mbarnes1 at andrew.cmu.edu
> >>>         <mailto:%22Matthew%20Barnes%22%20%3Cmbarnes1 at andrew.cmu.edu%3E>>
> >>>         wrote:
> >>>
> >>>             Is anyone else having issues with CUDA since this week?
> >>>             Even simple pytorch commands hang:
> >>>             (torch) bash-4.2$ python
> >>>             Python 2.7.5 (default, Jul 3 2018, 19:30:05)
> >>>             [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
> >>>             Type "help", "copyright", "credits" or "license" for
> >>>             more information.
> >>>             >>> import torch
> >>>             x>>> x = torch.zeros(4)
> >>>             >>> x.cuda()
> >>>             nvidia-smi works, and torch.cuda.is_available() returns
> >>>             True.
> >>>
> >>>
> >>>
> >>>
> >>
> > -- 
> > Vincent Jeanselme
> > -----------------
> > Analyst Researcher
> > Auton Lab - Robotics Institute
> > Carnegie Mellon University
> 
> -- 
> Vincent Jeanselme
> -----------------
> Analyst Researcher
> Auton Lab - Robotics Institute
> Carnegie Mellon University
> 

From predragp at andrew.cmu.edu  Fri Nov  9 17:45:38 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 9 Nov 2018 17:45:38 -0500
Subject: GPU3 and GPU4 to be rebooted for hardware failure assessment
Message-ID: <CAEFFt-HMGw9J3faxxNHQb6YSWQHuTrKaEKmeqSPN64i0UxUgKg@mail.gmail.com>

Dear Autonians,

I need to reboot GPU3 and GPU4 servers to asses the health of GPU
cards. If nobody says anything I will do it on Monday at 2:30 PM.

Cheers,
Predrag

P.S. I do have spare GPU cards but it will take at least 24h after we
power up the server to decide if the GPU cards are really dead.

From predragp at andrew.cmu.edu  Mon Nov 12 16:58:42 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 12 Nov 2018 16:58:42 -0500
Subject: GPU3 and GPU4 rebooted
Message-ID: <CAEFFt-EB2ZkSeNQRbtGavz1iQ87HWxcqKziAwnUR_gVuG01BVg@mail.gmail.com>

Per my Friday announcement GPU3 and GPU4 have been rebooted and
updated. All four GPU cards (4 per servers) are now visible with
nvidia-smi utility. It is of paramount importance that people hit
these two machines hard with GPU intensive computations so that we see
if that report of dead GPU cards was just a fluke or a real thing.

Best,
Predrag

From awd at cs.cmu.edu  Mon Nov 26 09:28:48 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 26 Nov 2018 09:28:48 -0500
Subject: Auton Lab NIPS practice session: Thu 11/29 3pm, place TBD
Message-ID: <CAJvAoysLRpcBGnnyE_g=42zc0Q4JmrXUwP_x__ZXj-OcqdLaFg@mail.gmail.com>

Team.

Please mark your calendars for the time slot in the subject line above.
I will follow up with the location as soon as we get it.

If you have a paper or poster to present at NIPS or any of its workshops
please come prepared with your talk and/or poster. We will emulate what
happens at the conference so that you can practice.

Fabian: we can project your poster on a screen and have you skyped-in
for audio. Please work with Predrag to get connected.

Thanks
Artur
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181126/ee22787e/attachment.html>

From awd at cs.cmu.edu  Mon Nov 26 09:30:23 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 26 Nov 2018 09:30:23 -0500
Subject: Auton Lab NIPS practice session: Thu 11/29 3pm, place TBD
In-Reply-To: <CAJvAoysLRpcBGnnyE_g=42zc0Q4JmrXUwP_x__ZXj-OcqdLaFg@mail.gmail.com>
References: <CAJvAoysLRpcBGnnyE_g=42zc0Q4JmrXUwP_x__ZXj-OcqdLaFg@mail.gmail.com>
Message-ID: <CAJvAoyt2=B=9Nuy2VFziCF6CPL=F-_XCc5RQVc-bJmT7HR3RZw@mail.gmail.com>

We will be in NSH 4201

On Mon, Nov 26, 2018 at 9:28 AM Artur Dubrawski <awd at cs.cmu.edu> wrote:

> Team.
>
> Please mark your calendars for the time slot in the subject line above.
> I will follow up with the location as soon as we get it.
>
> If you have a paper or poster to present at NIPS or any of its workshops
> please come prepared with your talk and/or poster. We will emulate what
> happens at the conference so that you can practice.
>
> Fabian: we can project your poster on a screen and have you skyped-in
> for audio. Please work with Predrag to get connected.
>
> Thanks
> Artur
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181126/8e6be57f/attachment.html>

From awd at cs.cmu.edu  Mon Nov 26 09:33:34 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 26 Nov 2018 09:33:34 -0500
Subject: Fwd: RI Ph.D. Thesis Proposal: Chao Liu
In-Reply-To: <4ab88d6ad5734673be16a56048d06d02@cmu.edu>
References: <4ab88d6ad5734673be16a56048d06d02@cmu.edu>
Message-ID: <CAJvAoyuivi=kjD+-Bt73gJk7hft-hNZLQw1=dY7zNOtSuew=KQ@mail.gmail.com>

Team,

Please come see Chao give his thesis proposal presentation this Friday at
2pm.

Cheers
Artur

---------- Forwarded message ---------
From: Suzanne Muth <lyonsmuth at cmu.edu>
Date: Wed, Nov 21, 2018 at 10:07 AM
Subject: RI Ph.D. Thesis Proposal: Chao Liu
To: ri-people at cs.cmu.edu <ri-people at cs.cmu.edu>


Date: 30 November 2018

Time: 2:00 p.m.

Place: NSH 4305

Type: Ph.D. Thesis Proposal

Who: Chao Liu

Topic: Vision with Small Baselines


Abstract:

Portable camera sensor systems are becoming more and more popular in
computer vision applications such as autonomous driving, virtual reality,
robotics manipulation and surveillance, due to the decreasing expense and
size of RGB camera. Despite the compactness and portability of the small
baseline vision systems, it is well-known that the uncertainty in range
finding using multiple views and the sensor baselines are inversely
related. For small baseline vision systems, this means high depth
uncertainties even for close range objects. On the other hand, besides
compactness,
small baseline vision system has its unique advantages such as easier
correspondence
and large overlapping regions across views. How to utilize those advantages
for small baseline vision setup while avoiding the limitations as much as
possible? In this thesis proposal, we approach this question in terms of
three aspects: scene complexity, uncertainties in the estimations and
baseline distance in the setup.


We first present a method for matting and depth recovery of 3D thin
structures with self-occlusions using a single-view camera with finite
aperture lens. In this work, we take advantage of the small camera
baselines that makes the correspondence easier. We apply the proposed
method to scenes at both macro and microscales. For macro-scale, we
evaluate our method on scenes with complex 3D thin structures such as tree
branches and grass. For micro-scale, we apply our method to *in-vivo*
microscopic images of micro-vessels with diameters less than 50 *um*.


We also utilize the small baselines for circularly placed point light
sources (commonly seen in consumer devices like NESTcam, Amazon Cloudcam).
We propose a two-stage near-light photometric stereo method. In the first
stage, we optimizethe vertex positions using the differential images
induced by small changes in lightsource position. This procedure yields a
strong initial guess for the second stage that refines the estimations
using the raw captured images.


To handle the estimation uncertainties inherent in the small baseline
setup, we propose a learning-based method to estimate per-pixel depth and
its uncertainty continuously from a monocular video stream. Compared to
prior work, the proposed approach achieves more accurate and stable
results, generalizes better to new datasets, and yields per-pixel depth
probability map that accounts for the estimation uncertainties due to
specular surface, occlusions in the scene and objects with large distance.


To deal with the subsurface light scattering in the tissue, we propose a
projector-camera setup with small baseline that works in a small scale and
a method that combines the approximated model for subsurface light
scattering, in order to see through skins and perform *in-vivo* blood flow
analysis on human skin.


We also propose to combine the benefits of small and large baseline vision
systems, in order to handle large region occlusion and depth estimation for
fine-grained structures at the same time.


Thesis Committee Members:

Srinivasa Narasimhan, Co-chair

Artur Dubrawski, Co-chair

Aswin Sankaranarayanan

Manmohan Chandraker, University of California, San Diego


A copy of thesis document is available at:

https://www.dropbox.com/s/wwxe7mqy7mf947q/small-baseline.pdf?dl=0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181126/c07a3f09/attachment-0001.html>

From awd at cs.cmu.edu  Tue Nov 27 05:38:03 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 27 Nov 2018 05:38:03 -0500
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs
Message-ID: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>

Team,

This is a great recognition for Emily as well as the whole Marinus/CMU
Auton Lab/Traffic Jam counter-human-trafficking team.

Way to go Emily!

Artur

https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181127/a0d169ad/attachment.html>

From tom.mitchell at cs.cmu.edu  Tue Nov 27 08:04:34 2018
From: tom.mitchell at cs.cmu.edu (Tom Mitchell)
Date: Tue, 27 Nov 2018 08:04:34 -0500
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs
In-Reply-To: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
References: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
Message-ID: <CAAZSyQ95P_zxU-t=A1q-sqfY6QBhFbLCT9P7_0SsSi6JpVbZHQ@mail.gmail.com>

Wonderful!   Congratulations Emily on this great recognition, and to all
your collaborators too.

Byron - in case you're not already on this, let's make some noise about it!

best
Tom


On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski <awd at cs.cmu.edu> wrote:

> Team,
>
> This is a great recognition for Emily as well as the whole Marinus/CMU
> Auton Lab/Traffic Jam counter-human-trafficking team.
>
> Way to go Emily!
>
> Artur
>
> https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6
>
>
>
>
>
>

-- 
Tom M. Mitchell
E. Fredkin University Professor
Interim Dean
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181127/3e61dff1/attachment.html>

From hebert at cs.cmu.edu  Tue Nov 27 08:50:38 2018
From: hebert at cs.cmu.edu (Martial Hebert)
Date: Tue, 27 Nov 2018 13:50:38 +0000
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs
In-Reply-To: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
References: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
Message-ID: <60b179a8cac449cc83d3b45613bbefbe@cs.cmu.edu>

Excellent! Congratulations Emily.


________________________________
From: Artur Dubrawski <awd at cs.cmu.edu>
Sent: Tuesday, November 27, 2018 5:38 AM
To: users at autonlab.org
Cc: Martial Hebert; Andrew W. Moore; Andrew Moore; Tom Mitchell; Roni Rosenfeld; Emily Kennedy; Cara Jones
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs

Team,

This is a great recognition for Emily as well as the whole Marinus/CMU Auton Lab/Traffic Jam counter-human-trafficking team.

Way to go Emily!

Artur

https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181127/97f896cf/attachment.html>

From mille856 at andrew.cmu.edu  Tue Nov 27 08:59:52 2018
From: mille856 at andrew.cmu.edu (James Miller)
Date: Tue, 27 Nov 2018 08:59:52 -0500
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs
In-Reply-To: <CAAZSyQ95P_zxU-t=A1q-sqfY6QBhFbLCT9P7_0SsSi6JpVbZHQ@mail.gmail.com>
References: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
 <CAAZSyQ95P_zxU-t=A1q-sqfY6QBhFbLCT9P7_0SsSi6JpVbZHQ@mail.gmail.com>
Message-ID: <CAObivkwM=fqVm_DQ3HwiQXV5pFpY5-MwtpheTmMxQSZKoPJkLg@mail.gmail.com>

Awesome Emily! Congratulations!

On Tue, Nov 27, 2018, 8:13 AM Tom Mitchell <tom.mitchell at cs.cmu.edu wrote:

> Wonderful!   Congratulations Emily on this great recognition, and to all
> your collaborators too.
>
> Byron - in case you're not already on this, let's make some noise about it!
>
> best
> Tom
>
>
> On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski <awd at cs.cmu.edu> wrote:
>
>> Team,
>>
>> This is a great recognition for Emily as well as the whole Marinus/CMU
>> Auton Lab/Traffic Jam counter-human-trafficking team.
>>
>> Way to go Emily!
>>
>> Artur
>>
>> https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6
>>
>>
>>
>>
>>
>>
>
> --
> Tom M. Mitchell
> E. Fredkin University Professor
> Interim Dean
> School of Computer Science
> Carnegie Mellon University
> www.cs.cmu.edu/~tom
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181127/de44f456/attachment-0001.html>

From bapoczos at cs.cmu.edu  Tue Nov 27 09:40:09 2018
From: bapoczos at cs.cmu.edu (Barnabas Poczos)
Date: Tue, 27 Nov 2018 09:40:09 -0500
Subject: Emily Kennedy among Forbes' 30 under 30 Social Entrepreneurs
In-Reply-To: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
References: <CAJvAoysb50trgepjdHpnLJyT2bFp6EiEXTL4_LjipDHwvefW5w@mail.gmail.com>
Message-ID: <CAHFiJZs2dUyS1zvj05n=9LpWWLfe9=3+S0fFQgU7186-0R58aw@mail.gmail.com>

This is awesome! Congratulations Emily! :-)

Cheers,
Barnabas

======================
Barnabas Poczos, PhD
Associate Professor
Co-Director of PhD Program
Machine Learning Department
Carnegie Mellon University

On Tue, Nov 27, 2018 at 5:38 AM Artur Dubrawski <awd at cs.cmu.edu> wrote:
>
> Team,
>
> This is a great recognition for Emily as well as the whole Marinus/CMU Auton Lab/Traffic Jam counter-human-trafficking team.
>
> Way to go Emily!
>
> Artur
>
> https://www.forbes.com/30-under-30/2019/social-entrepreneurs/#1e14b9b372e6
>
>
>
>
>

From predragp at andrew.cmu.edu  Tue Nov 27 22:33:44 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 27 Nov 2018 22:33:44 -0500
Subject: GPU2 is killed
Message-ID: <20181128033344.wvqy5Wks8%predragp@andrew.cmu.edu>

Dear Autonians,

GPU2 was just killed by a user trying to use 110GB of memory per script.
The user had multiple scripts running.  I do realize that NIPS deadline
is near but killing machine is not going to do us any good either.

Best,
Predrag

From yz6 at andrew.cmu.edu  Thu Nov 29 00:11:45 2018
From: yz6 at andrew.cmu.edu (Yang Zhang)
Date: Thu, 29 Nov 2018 00:11:45 -0500
Subject: Matlab license error
Message-ID: <AA790E8F-FFC4-4FC2-B35B-B03C201FB6CE@andrew.cmu.edu>

Hi,

I am receiving this error when I run Matlab. Does anyone know what to do?

Cheers,
Yang


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181129/7a13fd38/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-11-29 at 12.11.06 AM.png
Type: image/png
Size: 104188 bytes
Desc: not available
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181129/7a13fd38/attachment-0001.png>

From predragp at andrew.cmu.edu  Thu Nov 29 17:39:20 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 29 Nov 2018 17:39:20 -0500
Subject: LOV5 hard rebooted due to the crash
Message-ID: <CAEFFt-HUyB4me2LkPzdJboKp70i2oumD0OsnUFStodFziH0pxQ@mail.gmail.com>

Somebody went wild on lov5 and crash the machine which had to be cold
rebooted. Please be mindful of how much you load machines as having
frequent reboots is not going to help us with NIPS deadline.

Best,
Predrag

From awd at cs.cmu.edu  Fri Nov 30 09:32:40 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 30 Nov 2018 09:32:40 -0500
Subject: Our Emily Kennedy, Marinus Analytics, and Traffic Jam software in The
 Washington Post
Message-ID: <CAJvAoyu6=xQ89qXmnrPbVO86EWTwyk3hbXgF21WWHUR-=ObZXw@mail.gmail.com>

Team,

Emily and our Counter-Human-Trafficking work makes it to the national media
again.

This one is about the fallout of the shutdown of Backpage escort
advertising and how
the industry moved elsewhere and how the C-H-T community (led by Marinus
Analytics on
the technology side of things) managed to quickly cope with the change and
continue
extracting operationally useful intelligence from public data sources.

Congrats Emily and the Traffic Jam team at Marinus and CMU!

Artur

https://www.washingtonpost.com/national/online-sex-ads-rebound-months-after-shutdown-of-backpage/2018/11/28/ff8fe3a4-f34b-11e8-99c2-cfca6fcf610c_story.html?utm_term=.7ed140a54b55
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181130/53e6a5e8/attachment.html>

From emily at marinusanalytics.com  Fri Nov 30 17:11:00 2018
From: emily at marinusanalytics.com (Emily Kennedy)
Date: Fri, 30 Nov 2018 22:11:00 +0000
Subject: Our Emily Kennedy, Marinus Analytics, and Traffic Jam software in
 The Washington Post
In-Reply-To: <CAJvAoyu6=xQ89qXmnrPbVO86EWTwyk3hbXgF21WWHUR-=ObZXw@mail.gmail.com>
References: <CAJvAoyu6=xQ89qXmnrPbVO86EWTwyk3hbXgF21WWHUR-=ObZXw@mail.gmail.com>
Message-ID: <BC93CABA-A09F-4E3D-B99D-5EB718665B75@marinusanalytics.com>

Thank you for sharing this, Artur, and for highlighting it with your network! This work truly would have not been possible without its initial inception at CMU and from the amazing support we have received.

Just so happens I also had the chance to share this week about our work fighting human trafficking at Hewlett Packard Enterprise?s Discover Conference in Madrid in the General Session with HPE CEO Antonio Neri.

Video is here and my part begins around 12:30: https://www.youtube.com/watch?v=rCBZnXOBypU&feature=youtu.be

Have a great weekend, everyone!

Emily Kennedy
President, Founder
Marinus Analytics LLC
+1 (866) 945-2803<tel:+1%20(866)%20945-2803>
LinkedIn<https://www.linkedin.com/in/ekmarinus/> | marinusanalytics.com<http://www.marinusanalytics.com/>

On Nov 30, 2018, at 8:32 AM, Artur Dubrawski <awd at cs.cmu.edu<mailto:awd at cs.cmu.edu>> wrote:

Team,

Emily and our Counter-Human-Trafficking work makes it to the national media again.

This one is about the fallout of the shutdown of Backpage escort advertising and how
the industry moved elsewhere and how the C-H-T community (led by Marinus Analytics on
the technology side of things) managed to quickly cope with the change and continue
extracting operationally useful intelligence from public data sources.

Congrats Emily and the Traffic Jam team at Marinus and CMU!

Artur

https://www.washingtonpost.com/national/online-sex-ads-rebound-months-after-shutdown-of-backpage/2018/11/28/ff8fe3a4-f34b-11e8-99c2-cfca6fcf610c_story.html?utm_term=.7ed140a54b55
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20181130/800e606f/attachment.html>