<div dir="ltr"><span style="font-size:small">The error you showed<span class="inbox-inbox-Apple-converted-space"> </span></span><i style="font-size:small">should</i><span style="font-size:small"> be triggered by starting a session (not just by importing tensorflow, but the command I sent earlier does that).</span><br><div><span style="font-size:small"><br></span></div><div><font size="2">It could be that your torch install in your home directory is messing with things. Try exporting LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib64/mpich/lib before starting python.</font></div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, May 12, 2017 at 6:20 PM Kirthevasan Kandasamy <<a href="mailto:kandasamy@cmu.edu">kandasamy@cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">hey Dougal,<div><br></div><div>I could run python and import tensorflow on GPU1 but the issue is when I run my command.</div><div>Could it be that GPU4 is still using the older version of tensorflow?</div><div><br></div><div>I can run my stuff on GPU4 without much of an issue but not on GPU1. Here's what LD_LIBRARY_PATH gives me oin GPU4 and GPU1</div><div><br></div><div><div>kkandasa@gpu4$ echo $LD_LIBRARY_PATH</div><div>/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:</div></div><div><br></div><div><div>kkandasa@gpu1$ echo $LD_LIBRARY_PATH</div><div>/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/usr/local/cuda/lib64:/usr/lib64/mpich/lib</div></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 12, 2017 at 11:59 AM, Dougal Sutherland <span dir="ltr"><<a href="mailto:dougal@gmail.com" target="_blank">dougal@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">It's possible that you followed some instructions I sent a while ago and are using your own version of cudnn. Try "echo $LD_LIBRARY_PATH" and make sure it only has things in /usr/local, /usr/lib64 (nothing in your own directories), and make sure that your python code doesn't change that....<div><br></div><div>The Anaconda python distribution now distributes cudnn and tensorflow-gpu, so you could also install that in your scratch dir to have your own install. But they only have tensorflow 1.0 and higher, so your old code would require some changes (system install on gpu1 is 0.10, and there were breaking changes in both 1.0 and 1.1).</div></div><div class="m_2014180978264298058HOEnZb"><div class="m_2014180978264298058h5"><br><div class="gmail_quote"><div dir="ltr">On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland <<a href="mailto:dougal@gmail.com" target="_blank">dougal@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">It works for me too, not in IPython. 
On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland <dougal@gmail.com> wrote:

It works for me too, not in IPython. Try this:

CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; tf.InteractiveSession()'
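If that one-liner also aborts with the cuDNN mismatch, two quick things to compare between gpu1 and gpu4 (a rough sketch; the torch path is taken from the LD_LIBRARY_PATH output earlier in the thread):

# Which tensorflow each machine actually imports:
python -c 'import tensorflow as tf; print(tf.__version__)'

# Which cuDNN copies are candidates for loading (torch install vs. system CUDA tree):
ls -l ~/torch/install/lib/libcudnn* /usr/local/cuda/lib64/libcudnn* 2>/dev/null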
On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy <kandasamy@cmu.edu> wrote:

No, I don't use IPython.

On Fri, May 12, 2017 at 11:22 AM, <chiragn@andrew.cmu.edu> wrote:

Have you tried running it from within an IPython notebook as an interactive session?

I am doing that right now and it works.

Chirag
<div class="m_2014180978264298058m_3488426076459665065m_-4840874716259804677m_424370253774490668HOEnZb"><div class="m_2014180978264298058m_3488426076459665065m_-4840874716259804677m_424370253774490668h5"><br>
<br>
> Kirthevasan Kandasamy <<a href="mailto:kandasamy@cmu.edu" target="_blank">kandasamy@cmu.edu</a>> wrote:<br>
><br>
>> Hi Predrag,<br>
>><br>
>> I am re-running a tensorflow project on GPU1 - I haven't touched it in<br>
>> 4/5<br>
>> months, and the last time I ran it it worked fine, but when I try now I<br>
>> seem to be getting the following error.<br>
>><br>
><br>
> This is the first time I hear about it. I was under impression that GPU<br>
> nodes were usable. I am redirecting your e-mail to <a href="mailto:users@autonlab.org" target="_blank">users@autonlab.org</a><br>
> in the hope that somebody who is using TensorFlow on the regular basis<br>
> can be of more help.<br>
><br>
> Predrag<br>
><br>
><br>
><br>
><br>
>> Can you please tell me what the issue might be or direct me to someone<br>
>> who<br>
>> might know?<br>
>><br>
>> This is for the NIPS deadline, so I would appreciate a quick response.<br>
>><br>
>> thanks,<br>
<br>
>> Samy<br>
>><br>
>><br>
>> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0<br>
>> with<br>
>> properties:<br>
>> name: Tesla K80<br>
>> major: 3 minor: 7 memoryClockRate (GHz) 0.8235<br>
>> pciBusID 0000:05:00.0<br>
>> Total memory: 11.17GiB<br>
>> Free memory: 11.11GiB<br>
>> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0<br>
>> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y<br>
>> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating<br>
>> TensorFlow<br>
>> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id:<br>
>> 0000:05:00.0)<br>
>> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN<br>
>> library: 4007 (compatibility version 4000) but source was compiled with<br>
>> 5103 (compatibility version 5100). If using a binary install, upgrade<br>
>> your<br>
>> CuDNN library to match. If building from sources, make sure the library<br>
>> loaded at runtime matches a compatible version specified during compile<br>
>> configuration.<br>
>> F tensorflow/core/kernels/conv_ops.cc:457] Check failed:<br>
>> stream->parent()->GetConvolveAlgorithms(&algorithms)<br>
>> run_resnet.sh: line 49: 22665 Aborted (core dumped)<br>
>> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir<br>
>> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every<br>
>> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus<br>
>> 1<br>
>> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY<br>
>> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE<br>
>> --skip_size<br>
>> $SKIP_SIZE<br>
><br>