PyTorch

Predrag Punosevac predragp at andrew.cmu.edu
Mon Mar 26 22:50:12 EDT 2018


Manzil Zaheer <manzil at cmu.edu> wrote:

> Thanks for the detailed analysis. But I am using pytorch. I have not tried Lua torch. Can you please check? Thanks again!
> 

I did. You have Python 3.6.4 in /opt/miniconda3/bin/python3.6

predrag at gpu3$ /opt/miniconda3/bin/python3.6
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.


Try reinstalling it in your scratch directory with

/opt/miniconda3/bin/conda  install pytorch torchvision cuda91 -c pytorch

You should see something like

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pillow-5.0.0               |   py36h3deb7b8_0         561 KB
    mkl-2018.0.2               |                1       205.2 MB
    cuda91-1.0                 |       h4c16780_0           3 KB  pytorch
    libpng-1.6.34              |       hb9fc6fc_0         334 KB
    freetype-2.8               |       hab7d2ae_1         804 KB
    libgfortran-ng-7.2.0       |       hdf63c60_3         1.2 MB
    intel-openmp-2018.0.0      |                8         620 KB
    libtiff-4.0.9              |       h28f6b97_0         586 KB
    pytorch-0.3.1              |py36_cuda9.1.85_cudnn7.0.5_2       475.0 MB  pytorch
    torchvision-0.2.0          |   py36h17b6947_1         102 KB  pytorch
    jpeg-9b                    |       h024ee3a_2         248 KB
    numpy-1.14.2               |   py36hdbf6ddf_0         4.0 MB
    olefile-0.45.1             |           py36_0          47 KB
    ------------------------------------------------------------
                                           Total:       688.7 MB
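
With 688.7 MB of downloads, it may be worth confirming that the target
location has room before running the install. A minimal Python sketch
using only the standard library follows. The path "." is a placeholder
for the actual scratch directory, and the 2x margin for unpacked files
is an assumption, not a measured figure.

```python
import shutil

def has_room(path, needed_mb, margin=2.0):
    """Return (ok, free_mb) for the filesystem that holds `path`.

    `margin` is a guessed multiplier to allow for unpacked packages;
    it is an assumption, not something conda reports.
    """
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= needed_mb * margin, free_mb

# "." stands in for the scratch directory; substitute the real path.
ok, free_mb = has_room(".", 688.7)
print("enough space" if ok else "not enough space", f"({free_mb:.0f} MB free)")
```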


Make sure you point the installation at your scratch directory, since
the file server is full. I got a clean installation but did not test it
any further. One thing that worries me is this line:

pytorch-0.3.1              |py36_cuda9.1.85_cudnn7.0.5_2       475.0 MB  pytorch

We had problems with cuDNN on CUDA 9.1, apparently because upstream was
assuming cuDNN 7.0.5 when in reality I have 7.1.1 for CUDA 9, or even
7.1.5 for CUDA 9.1.

GPU3 has cuDNN 7.0.5 under cuda-9.0, so try adjusting the conda command
accordingly.
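
The versions a package was built against can be read straight off the
conda build string and compared with what a node has. The sketch below
is a rough Python illustration, not a definitive check: the helper
names are hypothetical, the local values are the GPU3 numbers quoted
above, and comparing only major.minor for CUDA is an assumption.
Whether a mismatch actually matters depends on how the package links
its libraries, which this does not verify.

```python
import re

def parse_build_string(build):
    """Pull the CUDA and cuDNN versions out of a conda build string."""
    cuda = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    cudnn = re.search(r"cudnn(\d+(?:\.\d+)*)", build)
    return (cuda.group(1) if cuda else None,
            cudnn.group(1) if cudnn else None)

def major_minor(version):
    """Reduce a dotted version such as '9.1.85' to the tuple (9, 1)."""
    return tuple(int(part) for part in version.split(".")[:2])

# Build string from the conda output above.
built_cuda, built_cudnn = parse_build_string("py36_cuda9.1.85_cudnn7.0.5_2")

# Values reported for GPU3 in this message.
local_cuda, local_cudnn = "9.0", "7.0.5"

cuda_ok = major_minor(built_cuda) == major_minor(local_cuda)
cudnn_ok = built_cudnn == local_cudnn
print(f"built: CUDA {built_cuda}, cuDNN {built_cudnn}")
print(f"CUDA matches: {cuda_ok}, cuDNN matches: {cudnn_ok}")
```

For GPU3 the cuDNN versions agree but the CUDA toolkit versions do not,
which is why a different CUDA variant of the package is worth trying.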


Best,
Predrag






> -------- Original message --------
> From: Predrag Punosevac <predragp at andrew.cmu.edu>
> Date: 3/26/18 9:00 PM (GMT-05:00)
> To: Manzil Zaheer <manzil at cmu.edu>
> Cc: Barnabas Poczos <bapoczos at andrew.cmu.edu>, users at autonlab.org
> Subject: Re: Lua Torch
> 
> Manzil Zaheer <manzil at cmu.edu> wrote:
> 
> > Hi Predrag,
> >
> > I am not able to use any GPUs on gpu5,6,7,9. I tried all 3 versions of CUDA, but I get the following error:
> >
> 
> 
> I was able to build it after adding this
> 
> export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
> 
> per
> 
> https://github.com/torch/torch7/issues/1086
> 
> When I try to run it I get errors that Lua packages are missing (probably
> due to my path variables). I have a vague recollection that Simon and I
> helped you once with this thing in the past. IIRC it was very picky about
> the version of some Lua package and required their version, not the one
> which comes with yum.
> 
> Anyhow I am forwarding this to users at autonlab in hope somebody is using
> it and might be of more help. Please stop by NSH 3119 and let us try to
> debug this.
> 
> Predrag
> 
> 
> 
> 
> > THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "/zfsauton/home/manzilz/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 384, in _lazy_new
> >     _lazy_init()
> >   File "/zfsauton/home/manzilz/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 142, in _lazy_init
> >     torch._C._cuda_init()
> > RuntimeError: cuda runtime error (30) : unknown error at /pytorch/torch/lib/THC/THCGeneral.c:70
> >
> > Can you kindly look into it?
> >
> > Thanks,
> > Manzil

