PyTorch

Manzil Zaheer manzil at cmu.edu
Tue Mar 27 01:00:39 EDT 2018


Hi Predrag,

Thanks again for your help. But I still cannot get anything running on GPU5,6,7,9. Also notice that almost all of GPU1,2,3,4,8 are full, while no one is using GPU5,6,7,9. This might mean that no one else is able to run anything there either.

So I tried many things. Everything installs without issue. But when I try to run simple code like:

import torch
x = torch.cuda.FloatTensor(2,3,4)
print(x)


I get the following error:
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/zfsauton/home/manzilz/.local/lib/python3.6/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
  File "/zfsauton/home/manzilz/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 384, in _lazy_new
    _lazy_init()
  File "/zfsauton/home/manzilz/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 142, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /pytorch/torch/lib/THC/THCGeneral.c:70
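
A minimal diagnostic sketch that may help narrow down where the failure above comes from. It only gathers facts (the PyTorch version, the CUDA version the binary was built for, the visible devices) and catches the initialisation error instead of crashing. The function name cuda_report and the report keys are made up for illustration, and the getattr call is there on the assumption that older builds might not expose torch.version.cuda.

```python
import os


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup without crashing."""
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
    }
    try:
        import torch
    except ImportError as exc:
        report["import_error"] = str(exc)
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the binary was built against (None for CPU-only builds).
    report["built_for_cuda"] = getattr(torch.version, "cuda", None)
    try:
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
        if report["cuda_available"]:
            # Moving a tensor to the GPU forces the lazy CUDA initialisation.
            torch.zeros(1).cuda()
            report["init_ok"] = True
    except RuntimeError as exc:
        report["init_ok"] = False
        report["init_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in sorted(cuda_report().items()):
        print(key, "=", value)
```

Running this on a working node (e.g. GPU3) and on a failing one (e.g. GPU5) and comparing the two outputs should show whether the difference lies in the PyTorch build or in the node itself.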

Thanks,
Manzil

________________________________________
From: Predrag Punosevac <predragp at andrew.cmu.edu>
Sent: 26 March 2018 22:50
To: Manzil Zaheer
Cc: Barnabas Poczos; users at autonlab.org
Subject: Re: PyTorch

Manzil Zaheer <manzil at cmu.edu> wrote:

> Thanks for the detailed analysis. But I am using pytorch. I have not tried Lua torch. Can you please check? Thanks again!
>

I did. You have Python 3.6.4 in /opt/miniconda3/bin/python3.6

predrag@gpu3$ /opt/miniconda3/bin/python3.6
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.


Try reinstalling things in your scratch directory with

/opt/miniconda3/bin/conda  install pytorch torchvision cuda91 -c pytorch

You should see something like

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pillow-5.0.0               |   py36h3deb7b8_0         561 KB
    mkl-2018.0.2               |                1       205.2 MB
    cuda91-1.0                 |       h4c16780_0           3 KB  pytorch
    libpng-1.6.34              |       hb9fc6fc_0         334 KB
    freetype-2.8               |       hab7d2ae_1         804 KB
    libgfortran-ng-7.2.0       |       hdf63c60_3         1.2 MB
    intel-openmp-2018.0.0      |                8         620 KB
    libtiff-4.0.9              |       h28f6b97_0         586 KB
    pytorch-0.3.1              |py36_cuda9.1.85_cudnn7.0.5_2       475.0 MB  pytorch
    torchvision-0.2.0          |   py36h17b6947_1         102 KB  pytorch
    jpeg-9b                    |       h024ee3a_2         248 KB
    numpy-1.14.2               |   py36hdbf6ddf_0         4.0 MB
    olefile-0.45.1             |           py36_0          47 KB
    ------------------------------------------------------------
                                           Total:       688.7 MB


Make sure you use your scratch directory as the install path, since the
file server is full. I got a clean installation but I didn't play with it
further. One thing that worries me is this line

pytorch-0.3.1              |py36_cuda9.1.85_cudnn7.0.5_2       475.0 MB  pytorch

We had problems with cuDNN on CUDA 9.1, apparently because upstream was
assuming 7.0.5 when in reality I have 7.1.1 for CUDA 9 or even 7.1.5 for
CUDA 9.1.

GPU3 has cuDNN library 7.0.5 in cuda-9.0, so try adjusting the conda
command accordingly.
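
To make the concern about the build string concrete, here is a small sketch that pulls the CUDA and cuDNN versions out of a conda build string such as the one above and compares them with what a node has installed. The helper names parse_build and matches_host are hypothetical, and treating "same CUDA major.minor and same cuDNN version" as the test is a deliberately strict assumption, not NVIDIA's actual compatibility rule.

```python
import re


def parse_build(build):
    """Extract (cuda, cudnn) version tuples from a conda build string."""
    match = re.search(r"cuda([\d.]+)_cudnn([\d.]+)", build)
    if match is None:
        return None
    cuda, cudnn = (
        tuple(int(part) for part in version.split("."))
        for version in match.groups()
    )
    return cuda, cudnn


def matches_host(build, host_cuda, host_cudnn):
    """True if the build's CUDA major.minor and cuDNN version equal the host's."""
    parsed = parse_build(build)
    if parsed is None:
        return False
    cuda, cudnn = parsed
    return cuda[:2] == tuple(host_cuda[:2]) and cudnn == tuple(host_cudnn)


if __name__ == "__main__":
    build = "py36_cuda9.1.85_cudnn7.0.5_2"
    print(parse_build(build))
    # GPU3 as described above: cuDNN 7.0.5 under cuda-9.0.
    print(matches_host(build, (9, 0), (7, 0, 5)))
```

With the values quoted in this message, the cuDNN versions agree but the CUDA versions (9.1 in the package, 9.0 on GPU3) do not, which is why a package built for CUDA 9.0 looks like the safer choice there.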


Best,
Predrag






> -------- Original message --------
> From: Predrag Punosevac <predragp at andrew.cmu.edu>
> Date: 3/26/18 9:00 PM (GMT-05:00)
> To: Manzil Zaheer <manzil at cmu.edu>
> Cc: Barnabas Poczos <bapoczos at andrew.cmu.edu>, users at autonlab.org
> Subject: Re: Lua Torch
>
> Manzil Zaheer <manzil at cmu.edu> wrote:
>
> > Hi Predrag,
> >
> > I am not able to use any GPUs on gpu5,6,7,9. I tried all 3 versions of cuda, but I get the following error:
> >
>
>
> I was able to build it after adding this
>
> export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
>
> per
>
> https://github.com/torch/torch7/issues/1086
>
> When I try to run it I get errors that Lua packages are missing (probably
> due to my path variables). I have a vague recollection that Simon and I
> helped you once with this thing in the past. IIRC it was very picky about
> the version of some Lua package and required their version, not the one
> which comes with yum.
>
> Anyhow, I am forwarding this to users at autonlab in the hope that somebody
> is using it and might be of more help. Please stop by NSH 3119 and let us
> try to debug this.
>
> Predrag
>
>
>
>
> > THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "/zfsauton/home/manzilz/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 384, in _lazy_new
> >     _lazy_init()
> >   File "/zfsauton/home/manzilz/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 142, in _lazy_init
> >     torch._C._cuda_init()
> > RuntimeError: cuda runtime error (30) : unknown error at /pytorch/torch/lib/THC/THCGeneral.c:70
> >
> > Can you kindly look into it?
> >
> > Thanks,
> > Manzil


