gpu10: pytorch and cuda
Predrag Punosevac
predragp at andrew.cmu.edu
Sun Mar 10 22:19:03 EDT 2019
Yichong Xu <yichongx at cs.cmu.edu> wrote:
> Thanks for the suggestion, Predrag! However, it seems I cannot even run the CUDA 10.1 samples, as I mentioned previously:
> yichongx at gpu10$ pwd
> /home/scratch/yichongx/NVIDIA_CUDA-10.1_Samples/0_Simple/simplePrintf
> yichongx at gpu10$ ls
> Makefile readme.txt simplePrintf simplePrintf.cu simplePrintf.o
> yichongx at gpu10$ ./simplePrintf
> CUDA error at ../../common/inc/helper_cuda.h:744 code=999(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"
>
> This seems to be a problem with CUDA itself. I downloaded the CUDA 10.1 samples from here:
> https://docs.nvidia.com/cuda/cuda-samples/index.html
I can't do anything tonight. Later this week (perhaps Tuesday) I will
try to reinstall everything.
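To tell whether the CUDA runtime itself fails, independently of PyTorch or the sample code, one can call `cudaGetDeviceCount` directly. That is the call `helper_cuda.h` reports failing with code 999 (`cudaErrorUnknown`). Below is a minimal Python sketch using `ctypes`. It assumes `libcudart` is on the dynamic loader path. The candidate library names and the helper names `load_cudart` and `device_count` are hypothetical, so adjust them to the local install:

```python
import ctypes
import ctypes.util

# Assumed sonames; adjust to whatever exists under /usr/local/cuda*/lib64.
CANDIDATES = ("libcudart.so", "libcudart.so.10.1", "libcudart.so.10.0")


def load_cudart(candidates=CANDIDATES):
    """Return a handle to the CUDA runtime library, or None if none loads."""
    found = ctypes.util.find_library("cudart")
    names = ([found] if found else []) + list(candidates)
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not on the loader path; try the next name
    return None


def device_count():
    """Call cudaGetDeviceCount directly, like the samples' helper does.

    Returns (error_code, error_name, count), or None if libcudart is missing.
    """
    lib = load_cudart()
    if lib is None:
        return None
    lib.cudaGetErrorName.restype = ctypes.c_char_p
    count = ctypes.c_int(0)
    err = lib.cudaGetDeviceCount(ctypes.byref(count))
    raw = lib.cudaGetErrorName(err)
    name = raw.decode() if raw else "unknown"
    return err, name, count.value


if __name__ == "__main__":
    result = device_count()
    if result is None:
        print("libcudart not found on the loader path")
    else:
        err, name, n = result
        # A code of 999 (cudaErrorUnknown) would match the failure reported above.
        print(f"cudaGetDeviceCount -> {err} ({name}), devices: {n}")
```

If this also reports 999, the problem lies below PyTorch and TensorFlow, in the driver/runtime installation.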
>
> Thanks,
> Yichong
>
> On Mar 10, 2019, at 10:07 PM, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
>
> Yichong Xu <yichongx at cs.cmu.edu> wrote:
>
> I tried installing the nightly version and the same error appears. I
> guess it is a recent problem: a few weeks ago I could still run PyTorch,
> but now it breaks (at that time there were only 3 GPUs available on
> gpu10).
>
>
> This is likely due to the CUDA upgrade. NVIDIA is aggressively pushing
> the CUDA 10 branch, which we were already using on this server. Both PyTorch
> and TensorFlow were working fine until I added another GPU card a week ago
> and upgraded the kernel and CUDA to 10.1. I would suggest that we do a
> bit of debugging together with upstream. In my experience, upstream has
> probably not caught up with the latest changes yet, and this is what we see.
> Rather than me guessing, somebody needs to communicate with the PyTorch and
> TensorFlow developers (via their mailing lists).
>
> Cheers,
> Predrag
>
> Thanks,
> Yichong
>
> On Mar 10, 2019, at 3:05 PM, Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
>
> Regarding TensorFlow, you don't need to compile from source.
>
> pip install tf-nightly-gpu
>
> That should get it done. I think that's what I did, but it was a few weeks ago, so try it out, and if it doesn't work I'll try to debug it.
> Note that you'll have to uninstall it and reinstall the regular version when you switch back to the other GPUs.
>
> Not sure about PyTorch; I haven't tried to install it yet.
>
> Yotam.
>
>
> On Sun, Mar 10, 2019 at 2:24 PM Yichong Xu <yichongx at cs.cmu.edu> wrote:
> It seems TensorFlow does not support CUDA 10 right now; it has to be built from source.
> But I'm mainly using PyTorch, and the CUDA 10 build does not run either.
> Also, I tried the original CUDA sample and it cannot find the GPU either:
> (base) yichongx at gpu10$ ls
> Makefile readme.txt simplePrintf simplePrintf.cu simplePrintf.o
> (base) yichongx at gpu10$ ./simplePrintf
> CUDA error at ../../common/inc/helper_cuda.h:744 code=999(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"
> (base) yichongx at gpu10$
>
> Thanks,
> Yichong
>
> On Mar 10, 2019, at 9:52 AM, Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
>
> gpu10 does not have the same CUDA version as the rest, so I think a different version of TensorFlow has to be installed.
>
> Check your TensorFlow version and whether it supports the CUDA version on gpu10.
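One way to act on the suggestion to check whether a framework build matches gpu10's CUDA is sketched roughly below. It extracts the toolkit release from `nvcc --version` and compares its major.minor with the CUDA version a framework reports it was built against (for PyTorch, `torch.version.cuda`). The helper names are hypothetical. A matching major.minor is only a heuristic, because the installed NVIDIA driver must also be new enough for the runtime the framework uses.

```python
import re
import subprocess


def nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_version(s):
    """Turn a string like '10.0' or '10.0.130' into (major, minor)."""
    parts = s.split(".")
    return int(parts[0]), int(parts[1])


def same_toolkit(build_cuda, installed_text):
    """True if the framework's build CUDA matches the installed toolkit."""
    installed = nvcc_release(installed_text)
    return installed is not None and parse_version(build_cuda) == installed


def installed_nvcc_text():
    """Return `nvcc --version` output, or '' if nvcc is not on PATH."""
    try:
        return subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        return ""
```

For example, a PyTorch wheel reporting `torch.version.cuda == "10.0"` on a host whose `nvcc` prints `release 10.1` gives `same_toolkit("10.0", installed_nvcc_text()) == False`.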
>
> On Saturday, March 9, 2019, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> Try CUDA 10.0 instead of 10.1
>
> On Mar 9, 2019 5:28 PM, Yichong Xu <yichongx at cs.cmu.edu> wrote:
> Same issue here.
>
> From my iPhone
>
> On Mar 9, 2019, at 4:01 PM, Emre Yolcu <eyolcu at andrew.cmu.edu> wrote:
>
>
> Hi,
>
> Right now on gpu10, `nvcc --version` and `nvidia-smi` seem to work, but `python -c "import torch; print(torch.cuda.is_available())"` prints False. Is anybody running into the same issue?
>
> Emre
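`nvcc` is only the compiler and `nvidia-smi` talks to the driver, so both can succeed while PyTorch still cannot initialize CUDA. It can help to collect what PyTorch itself sees in one place. Below is a small sketch; the function name `torch_cuda_report` is hypothetical. It uses only `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`, and it also records `CUDA_VISIBLE_DEVICES`. It still runs when torch is not installed:

```python
import os


def torch_cuda_report():
    """Collect what PyTorch sees about CUDA; does not raise if torch is absent."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except (ImportError, OSError):
        # torch missing, or one of its shared libraries failed to load
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["is_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

Posting this output alongside `nvidia-smi` makes it easier to compare results across users and across the gpu servers.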
>
More information about the Autonlab-users
mailing list