<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">

Thanks for the suggestion, Predrag! However, it seems I cannot even run the CUDA 10.1 samples, as I mentioned previously:

<div class="">

<div class="">yichongx@gpu10$ pwd</div>

<div class="">/home/scratch/yichongx/NVIDIA_CUDA-10.1_Samples/0_Simple/simplePrintf</div>

<div class="">yichongx@gpu10$ ls</div>

<div class="">Makefile  readme.txt  simplePrintf  simplePrintf.cu  simplePrintf.o</div>

<div class="">yichongx@gpu10$ ./simplePrintf</div>

<div class="">CUDA error at ../../common/inc/helper_cuda.h:744 code=999(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)”</div>

<div class=""><br class="">

</div>

<div class="">This seems like a problem of CUDA its own. I downloaded the cuda10.1 examples from here:</div>

<div class=""><a href="https://docs.nvidia.com/cuda/cuda-samples/index.html" class="">https://docs.nvidia.com/cuda/cuda-samples/index.html</a></div>

<div class=""><br class="">

</div>

<div class=""><br class="">

</div>

<br class="">

<br class="">

<div class="">

<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">

<i class="">Thanks,</i></div>

<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;">

<i class="">Yichong</i></div>

<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">

<br class="">

</div>

<br class="Apple-interchange-newline">

</div>

<br class="">

<div>

<blockquote type="cite" class="">

<div class="">On Mar 10, 2019, at 10:07 PM, Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu" class="">predragp@andrew.cmu.edu</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div class="">Yichong Xu <<a href="mailto:yichongx@cs.cmu.edu" class="">yichongx@cs.cmu.edu</a>> wrote:<br class="">

<br class="">

<blockquote type="cite" class="">I tried installing the nightly version and the same error appears. I<br class="">

guess it is a recent problem - a few weeks ago I can also run pytorch<br class="">

but now it breaks (at that time there were only 3 gpus available on<br class="">

gpu10).<br class="">

</blockquote>

<br class="">

<br class="">

This is likely due to the CUDA upgrade. NVIDIA is aggressively pushing<br class="">

the CUDA 10 branch, which we were already using on this server. Both PyTorch and<br class="">

TensorFlow were working fine until I added another GPU card a week ago<br class="">

and upgraded the kernel and CUDA to 10.1. I would suggest that we do a<br class="">

bit of debugging in unison with upstream. In my experience, upstream has<br class="">

probably not yet caught up with the latest changes, and this is what we are seeing.<br class="">

Instead of me guessing, somebody needs to communicate with the PyTorch and<br class="">

TensorFlow developers (via their mailing lists).<br class="">

<br class="">

Cheers,<br class="">

Predrag<br class="">

<br class="">

<br class="">

<blockquote type="cite" class=""><br class="">

<br class="">

Thanks,<br class="">

Yichong<br class="">

<br class="">

<br class="">

<br class="">

On Mar 10, 2019, at 3:05 PM, Yotam Hechtlinger <<a href="mailto:yhechtli@andrew.cmu.edu" class="">yhechtli@andrew.cmu.edu</a>> wrote:<br class="">

<br class="">

Regarding TensorFlow, you don't need to compile it from source.<br class="">

<br class="">

pip install tf-nightly-gpu<br class="">

<br class="">

That should get it done. I think that's what I did, but it was a few weeks ago, so try it out, and if it doesn't work I'll try to debug it.<br class="">

Note that you'll have to uninstall it and reinstall the regular version when you switch back to the other GPUs.<br class="">

<br class="">

I'm not sure about PyTorch; I haven't tried to install it yet.<br class="">

<br class="">

Yotam.<br class="">

<br class="">

<br class="">

On Sun, Mar 10, 2019 at 2:24 PM Yichong Xu <<a href="mailto:yichongx@cs.cmu.edu" class="">yichongx@cs.cmu.edu</a>> wrote:<br class="">

It seems that TensorFlow does not support CUDA 10 right now; it has to be installed from source.<br class="">

But I'm mainly using PyTorch, and the CUDA 10 build does not run either.<br class="">

Plus, I tried the original CUDA sample, and it cannot find the GPU either:<br class="">

(base) yichongx@gpu10$ ls<br class="">

Makefile  readme.txt  simplePrintf  simplePrintf.cu  simplePrintf.o<br class="">

(base) yichongx@gpu10$ ./simplePrintf<br class="">

CUDA error at ../../common/inc/helper_cuda.h:744 code=999(cudaErrorUnknown) "cudaGetDeviceCount(&device_count)"<br class="">

(base) yichongx@gpu10$<br class="">

<br class="">

<br class="">

<br class="">

Thanks,<br class="">

Yichong<br class="">

<br class="">

<br class="">

<br class="">

On Mar 10, 2019, at 9:52 AM, Yotam Hechtlinger <<a href="mailto:yhechtli@andrew.cmu.edu" class="">yhechtli@andrew.cmu.edu</a>> wrote:<br class="">

<br class="">

The CUDA version on gpu10 is not the same as on the rest; I think a different version of TensorFlow has to be installed.<br class="">

<br class="">

Check your TensorFlow version and whether it supports the CUDA version on gpu10.<br class="">

<br class="">

<br class="">

<br class="">

On Saturday, March 9, 2019, Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu" class="">predragp@andrew.cmu.edu</a>> wrote:<br class="">

Try CUDA 10.0 instead of 10.1<br class="">

<br class="">

On Mar 9, 2019 5:28 PM, Yichong Xu <<a href="mailto:yichongx@cs.cmu.edu" class="">yichongx@cs.cmu.edu</a>> wrote:<br class="">

Same issue here.<br class="">

<br class="">


<br class="">

On Mar 9, 2019, at 4:01 PM, Emre Yolcu <<a href="mailto:eyolcu@andrew.cmu.edu" class="">eyolcu@andrew.cmu.edu</a>> wrote:<br class="">

<br class="">

<br class="">

Hi,<br class="">

<br class="">

<br class="">

<br class="">

Right now on gpu10, `nvcc --version` and `nvidia-smi` seem to work, but `python -c "import torch; print(torch.cuda.is_available())"` prints False. Is anybody else running into the same issue?<br class="">

<br class="">

<br class="">

<br class="">

Emre<br class="">

<br class="">

<br class="">

<br class="">

</blockquote>

</div>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>