gpu10: pytorch and cuda
Predrag Punosevac
predragp at andrew.cmu.edu
Sun Mar 10 22:07:32 EDT 2019
Yichong Xu <yichongx at cs.cmu.edu> wrote:
> I tried installing the nightly version and the same error appears. I
> guess it is a recent problem: a few weeks ago I could run pytorch,
> but now it breaks (at that time there were only 3 gpus available on
> gpu10).
This is likely due to the CUDA upgrade. NVIDIA is aggressively pushing
the CUDA 10 branch, which we were already using on this server. Both
PyTorch and TensorFlow were working fine up until I added another GPU
card a week ago and upgraded the kernel and CUDA to 10.1. I would
suggest that we do a bit of debugging together with upstream. In my
experience, upstream has probably not yet caught up with the latest
changes, and that is what we are seeing. Rather than me guessing,
somebody needs to communicate with the PyTorch and TensorFlow
developers (via their mailing lists).
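
For whoever writes to upstream, it may help to attach the basic version
information in one place. The following is only a rough sketch, not a
tested procedure for gpu10: the helper names run() and collect_report()
are made up for illustration, and it assumes uname, nvcc and nvidia-smi
are on the PATH (it reports a note instead of failing if they are not).

```python
import shutil
import subprocess


def run(cmd):
    """Return the combined output of cmd, or a short note if it cannot be run."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found on PATH"
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{cmd[0]}: could not be run ({exc})"
    return result.stdout.strip()


def collect_report():
    """Gather kernel, CUDA toolkit, driver and PyTorch details as plain strings."""
    report = {
        "kernel": run(["uname", "-r"]),
        "nvcc": run(["nvcc", "--version"]),
        "nvidia-smi": run(["nvidia-smi"]),
    }
    try:
        import torch
    except Exception as exc:  # torch missing or failing to load its libraries
        report["torch"] = f"not importable: {exc}"
    else:
        report["torch"] = (
            f"version={torch.__version__}, "
            f"built for CUDA {torch.version.cuda}, "
            f"cuda.is_available()={torch.cuda.is_available()}"
        )
    return report


if __name__ == "__main__":
    for name, text in collect_report().items():
        print(f"== {name} ==\n{text}\n")
```

Comparing the CUDA version PyTorch was built for against what nvcc and
nvidia-smi report should at least show whether there is a mismatch
worth mentioning in the report.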
Cheers,
Predrag
>
>
> Thanks,
> Yichong
>
>
>
> On Mar 10, 2019, at 3:05 PM, Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
>
> Regarding tensorflow you don't need to compile from source.
>
> pip install tf-nightly-gpu
>
> Should get it done. I think that's what I did, but that was a few weeks ago, so try it out and if it doesn't work I'll try to debug it.
> Notice that you'll have to uninstall it and install the regular version when you switch back to the other GPUs.
>
> Not sure regarding pytorch, I haven't tried to install it yet.
>
> Yotam.
>
>
> On Sun, Mar 10, 2019 at 2:24 PM Yichong Xu <yichongx at cs.cmu.edu> wrote:
> It seems like tensorflow does not support cuda10 right now - it has to be installed from source.
> But I'm mainly using pytorch, and the version with cuda10 does not run either.
> Plus, I tried the original cuda example and it cannot find the gpu either:
> (base) yichongx at gpu10$ ls
> Makefile readme.txt simplePrintf simplePrintf.cu simplePrintf.o
> (base) yichongx at gpu10$ ./simplePrintf
> CUDA error at ../../common/inc/helper_cuda.h:744 code=999(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"
> (base) yichongx at gpu10$
>
>
>
> Thanks,
> Yichong
>
>
>
> On Mar 10, 2019, at 9:52 AM, Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
>
> The cuda version on gpu10 is not the same as on the rest, so I think a different version of tensorflow has to be installed.
>
> Check your tensorflow version and whether it supports the cuda version on gpu10.
>
>
>
> On Saturday, March 9, 2019, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> Try CUDA 10.0 instead of 10.1
>
> On Mar 9, 2019 5:28 PM, Yichong Xu <yichongx at cs.cmu.edu> wrote:
> Same issue here.
>
> From my iPhone
>
> On Mar 9, 2019, at 4:01 PM, Emre Yolcu <eyolcu at andrew.cmu.edu> wrote:
>
>
> Hi,
>
>
>
> Right now on gpu10 `nvcc --version` and `nvidia-smi` seem to work, but `python -c "import torch; print(torch.cuda.is_available())"` prints False. Is anybody running into the same issue?
>
>
>
> Emre
>
>
>