PyTorch problem
Biswajit Paria
bparia at cs.cmu.edu
Wed Sep 5 00:19:50 EDT 2018
I am seeing a similar error on all the GPU machines. Has anyone found a
solution yet?
2018-09-05 00:27:41.546064: E tensorflow/stream_executor/cuda/cuda_blas.cc:459] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
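
For comparison across machines, a small GPU matrix multiply from PyTorch should exercise the same CUBLAS initialization path. The following is only a sketch: the function name check_cublas and the status strings are made up for illustration, and it assumes a PyTorch build with CUDA support is installed.

```python
def check_cublas():
    """Try a small GPU matmul and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"

    if not torch.cuda.is_available():
        return "no-cuda-device"

    try:
        # A matrix multiply on the GPU is dispatched to CUBLAS, so this
        # should fail if the CUBLAS handle cannot be created.
        a = torch.randn(64, 64, device="cuda")
        b = torch.randn(64, 64, device="cuda")
        # .item() copies the result to the host, which forces the
        # kernel to actually run before we report success.
        (a @ b).sum().item()
        return "ok"
    except RuntimeError as exc:
        # Keep only the first line of the error message.
        first_line = (str(exc).splitlines() or [""])[0]
        return "gpu-matmul-failed: " + first_line


if __name__ == "__main__":
    print(check_cublas())
```

Running it with CUDA_VISIBLE_DEVICES set to each device in turn would show whether the failure depends on the GPU or on the machine.
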
On Tue, Sep 4, 2018 at 10:03 PM Manzil Zaheer <manzil at cmu.edu> wrote:
> Hi Yichong
>
> Yes, I am able to run TF and PyTorch on these machines. Recently someone
> else also had a similar issue, but it was fixed by reinstalling some local
> packages.
>
> Thanks,
> Manzil
>
>
> -------- Original message --------
> From: Yichong Xu <yichongx at cs.cmu.edu>
> Date: 9/4/18 9:58 PM (GMT-05:00)
> To: Emre Yolcu <eyolcu at cs.cmu.edu>, Predrag Punosevac <
> predragp at andrew.cmu.edu>
> Cc: users at autonlab.org
> Subject: Re: PyTorch problem
>
> Just wondering - can TensorFlow run well on these machines? I hope someone
> can confirm this so that we can isolate the problem.
> OK, so here’s a further test: I tried running the CUDA examples from the
> CUDA installation (in /usr/local/cuda/sample), on gpu2 in my scratch
> directory. Simple jobs like deviceQuery succeed, but simpleCUBLAS failed:
> yichongx@gpu2$ cd /home/scratch/yichongx/
> yichongx@gpu2$ cd
> 0_Simple/     2_Graphics/  4_Finance/      6_Advanced/       bin/     conda/
> 1_Utilities/  3_Imaging/   5_Simulations/  7_CUDALibraries/  common/  miniconda3/
> yichongx@gpu2$ cd 7_CUDALibraries/
> yichongx@gpu2$ cd simpleCUBLAS
> yichongx@gpu2$ CUDA_VISIBLE_DEVICES=3 ./simpleCUBLAS
> GPU Device 0: "TITAN X (Pascal)" with compute capability 6.1
>
> simpleCUBLAS test running..
> !!!! CUBLAS initialization error
> yichongx@gpu2$
>
>
> This is also consistent with our previous errors from PyTorch, which said
> that the CUBLAS library was not initialized.
>
> So there is at least some problem with CUBLAS on gpu2. This post suggests
> that running with sudo can resolve the problem, which would point to a
> permission problem with the CUBLAS libraries:
>
> https://devtalk.nvidia.com/default/topic/1027602/cuda-setup-and-installation/cublas-libraries-with-incorrect-permissions/
> @Predrag: Can you try running the simpleCUBLAS example from the CUDA
> library, with and without root privilege? I think that might be something
> that you are more familiar with. Thank you very much!
>
>
> *Thanks,*
> *Yichong*
>
> On Sep 4, 2018, at 3:18 PM, Emre Yolcu <eyolcu at cs.cmu.edu> wrote:
>
> Hi,
>
> We are trying to troubleshoot the PyTorch issue with Predrag and were
> wondering:
>
> Is anybody able to run PyTorch GPU models on gpu1-9? If you can, we would
> appreciate it if you could respond.
>
> Also, is it a problem for anyone if gpu8 is rebooted today?
>
> Thanks,
>
> Emre
>
>
>
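
The permissions hypothesis in the quoted message can be partly checked without root. Below is a minimal sketch that lists CUBLAS library files the current user cannot read. The helper name find_unreadable is hypothetical, and the path /usr/local/cuda/lib64 is an assumption about where the libraries live on these machines, so adjust it as needed.

```python
import glob
import os
import stat


def find_unreadable(pattern):
    """Return (path, mode) pairs for matching files the current user cannot read."""
    problems = []
    for path in sorted(glob.glob(pattern)):
        # Library names are often symlinks; check the file they point to.
        real = os.path.realpath(path)
        if not os.access(real, os.R_OK):
            try:
                mode = stat.filemode(os.stat(real).st_mode)
            except OSError:
                mode = "?"  # e.g. a dangling symlink
            problems.append((path, mode))
    return problems


if __name__ == "__main__":
    # ASSUMPTION: library directory; not confirmed anywhere in this thread.
    pattern = "/usr/local/cuda/lib64/libcublas*"
    bad = find_unreadable(pattern)
    if not bad:
        print("no unreadable files matched", pattern)
    for path, mode in bad:
        print(mode, path)
```

An empty result would not rule out a permissions problem elsewhere (device nodes, for example), so the run with and without root is still worth doing.
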
--
Biswajit Paria
PhD in ML @ CMU