PyTorch problem

Predrag Punosevac predragp at andrew.cmu.edu
Wed Sep 5 16:40:37 EDT 2018


I just rebooted GPU8. All packages are up to date. The NVIDIA driver appears
to be working properly, and I can run GPU computations from MATLAB. Let's now
try to get PyTorch working on GPU8.

Predrag
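
One quick way to check whether PyTorch on GPU8 can reach cuBLAS is to run a
single small matrix multiply on the device, since the errors reported in this
thread appear when the cuBLAS handle is created. The following is only a
sketch: the function name `cublas_smoke_test` and its status strings are made
up for illustration, and it assumes a PyTorch build that accepts
`device="cuda"` (0.4 or later).

```python
def cublas_smoke_test(n=64):
    """Run one small matmul on the GPU and report what happened.

    Returns one of the (illustrative) status strings:
    "no-torch", "no-cuda", "ok", or "error: <message>".
    """
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing, or its shared libraries failed to load.
        return "no-torch"

    if not torch.cuda.is_available():
        return "no-cuda"

    try:
        a = torch.randn(n, n, device="cuda")
        b = torch.randn(n, n, device="cuda")
        c = torch.mm(a, b)          # dense matmul goes through cuBLAS
        torch.cuda.synchronize()    # wait so asynchronous errors surface here
        return "ok" if tuple(c.shape) == (n, n) else "error: unexpected shape"
    except RuntimeError as exc:
        return "error: {}".format(exc)


if __name__ == "__main__":
    print(cublas_smoke_test())
```

To target one card, the script can be run with CUDA_VISIBLE_DEVICES set, in
the same way as the simpleCUBLAS run quoted below.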

On Wed, Sep 5, 2018 at 12:19 AM, Biswajit Paria <bparia at cs.cmu.edu> wrote:

> I am facing a similar error on all GPU machines. Has anyone found a
> solution yet?
>
> 2018-09-05 00:27:41.546064: E tensorflow/stream_executor/cuda/cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
>
> On Tue, Sep 4, 2018 at 10:03 PM Manzil Zaheer <manzil at cmu.edu> wrote:
>
>> Hi Yichong
>>
>> Yes, I am able to run TF and PyTorch on these machines. Recently someone
>> else had a similar issue, but it was fixed by reinstalling some local
>> packages.
>>
>> Thanks,
>> Manzil
>>
>>
>> -------- Original message --------
>> From: Yichong Xu <yichongx at cs.cmu.edu>
>> Date: 9/4/18 9:58 PM (GMT-05:00)
>> To: Emre Yolcu <eyolcu at cs.cmu.edu>, Predrag Punosevac <predragp at andrew.cmu.edu>
>> Cc: users at autonlab.org
>> Subject: Re: PyTorch problem
>>
>> Just wondering: can TensorFlow run well on these machines? I hope
>> someone can confirm this so that we can isolate the problem.
>> OK, so here’s a further test: I tried running the CUDA examples from the
>> CUDA installation (in /usr/local/cuda/sample) on gpu2, in my scratch
>> directory. Simple jobs like deviceQuery succeed, but simpleCUBLAS failed:
>> yichongx at gpu2$ cd /home/scratch/yichongx/
>> yichongx at gpu2$ cd
>> 0_Simple/        2_Graphics/      4_Finance/       6_Advanced/        bin/       conda/
>> 1_Utilities/     3_Imaging/       5_Simulations/   7_CUDALibraries/   common/    miniconda3/
>> yichongx at gpu2$ cd 7_CUDALibraries/
>> yichongx at gpu2$ cd simpleCUBLAS
>> yichongx at gpu2$ CUDA_VISIBLE_DEVICES=3 ./simpleCUBLAS
>> GPU Device 0: "TITAN X (Pascal)" with compute capability 6.1
>>
>> simpleCUBLAS test running..
>> !!!! CUBLAS initialization error
>> yichongx at gpu2$
>>
>>
>> This is also consistent with our previous errors from PyTorch, which say
>> that the cuBLAS library is not initialized.
>>
>> So there is at least some problem with cuBLAS on gpu2. This
>> post suggests that running with sudo can resolve the problem, probably
>> because of incorrect permissions on the cuBLAS libraries:
>> https://devtalk.nvidia.com/default/topic/1027602/cuda-setup-and-installation/cublas-libraries-with-incorrect-permissions/
>> @Predrag: Can you try running the simpleCUBLAS example from the CUDA
>> samples, with and without root privileges? I think that is something
>> you are more familiar with. Thank you very much!
>>
>>
>> *Thanks,*
>> *Yichong*
>>
>> On Sep 4, 2018, at 3:18 PM, Emre Yolcu <eyolcu at cs.cmu.edu> wrote:
>>
>> Hi,
>>
>> We are trying to troubleshoot the PyTorch issue with Predrag and were
>> wondering:
>>
>> Is anybody able to run PyTorch GPU models on gpu1-9? If you can, we would
>> appreciate it if you could respond.
>>
>> Also, is it a problem for anyone if gpu8 is rebooted today?
>>
>> Thanks,
>>
>> Emre
>>
>>
>>
>
> --
> Biswajit Paria
> PhD in ML @ CMU
>
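
The permission theory raised earlier in this thread can be checked without
root. The sketch below lists the cuBLAS files in a library directory and flags
any that are not readable by all users. The directory `/usr/local/cuda/lib64`
and the glob pattern `libcublas*` are assumptions about a typical Linux CUDA
install, and `find_unreadable` is a hypothetical helper name.

```python
import glob
import os
import stat


def find_unreadable(lib_dir="/usr/local/cuda/lib64", pattern="libcublas*"):
    """Return (path, mode) pairs for matching files that lack world-read permission.

    lib_dir and pattern are assumptions about a typical install; adjust as needed.
    """
    flagged = []
    for path in sorted(glob.glob(os.path.join(lib_dir, pattern))):
        try:
            mode = os.stat(path).st_mode   # follows symlinks to the real file
        except OSError:
            # Broken symlink or a directory we cannot traverse.
            flagged.append((path, None))
            continue
        if not mode & stat.S_IROTH:
            flagged.append((path, stat.filemode(mode)))
    return flagged


if __name__ == "__main__":
    for path, mode in find_unreadable():
        print(mode or "unstat-able", path)
```

An empty result means every matching file is world-readable, which would point
away from the permission explanation. Any flagged path is worth comparing
against a machine where simpleCUBLAS runs.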