Naive Tensorflow/GPU1 question

Fri May 12 11:59:28 EDT 2017

It's possible that you followed some instructions I sent a while ago and
are using your own version of cudnn. The error below says the runtime
loaded cudnn 4007 while tensorflow was compiled against 5103, so an older
library is being picked up. Try "echo $LD_LIBRARY_PATH" and make sure it
only has things in /usr/local and /usr/lib64 (nothing in your own
directories), and make sure that your python code doesn't change it.
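
If you'd rather check that from python than by eye, here is a minimal
sketch. It assumes the only acceptable prefixes are the two named above;
the names ALLOWED_PREFIXES and suspicious_entries are just made up for
illustration, not part of any library.

```python
import os

# Assumption: these two prefixes are the only "system" locations,
# taken from the advice in this message.
ALLOWED_PREFIXES = ("/usr/local", "/usr/lib64")


def suspicious_entries(ld_library_path, allowed=ALLOWED_PREFIXES):
    """Return the LD_LIBRARY_PATH entries that lie outside the allowed prefixes."""
    bad = []
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # skip empty entries left by stray colons
        norm = os.path.normpath(entry)
        # Match the prefix itself or anything below it, but not
        # look-alikes such as /usr/localfoo.
        if not any(norm == p or norm.startswith(p + "/") for p in allowed):
            bad.append(entry)
    return bad


if __name__ == "__main__":
    for entry in suspicious_entries(os.environ.get("LD_LIBRARY_PATH", "")):
        print("outside system dirs:", entry)
```

Running it at the top of your own script, right before "import tensorflow",
would also catch the case where the code itself has changed the variable.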

The Anaconda python distribution now ships cudnn and tensorflow-gpu, so
you could also install it in your scratch dir to have your own install.
But it only offers tensorflow 1.0 and higher, so your old code would
require some changes (the system install on gpu1 is 0.10, and there were
breaking changes in both 1.0 and 1.1).
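
To tell which side of the 1.0 break a given install is on, something like
the sketch below should do. The helper names are hypothetical; the only
real API used is tf.__version__. Note that the versions are compared as
integers, since comparing the strings would sort "0.10" before "0.9".

```python
def major_minor(version):
    """Parse a version string such as '0.10.0' or '1.1.0rc2' into (major, minor)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def predates_1_0(version):
    """True if this version is older than 1.0, i.e. before the breaking changes."""
    return major_minor(version) < (1, 0)


def installed_tensorflow_version():
    """Return tf.__version__, or None if tensorflow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__
```

For example, predates_1_0("0.10.0") is true for the system install on gpu1,
while an Anaconda install would report 1.0 or later.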

On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland <dougal at gmail.com> wrote:

> It works for me too, not in IPython. Try this:
>
> CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; tf.InteractiveSession()'
>
> On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy <kandasamy at cmu.edu>
> wrote:
>
>> No, I don't use iPython.
>>
>> On Fri, May 12, 2017 at 11:22 AM, <chiragn at andrew.cmu.edu> wrote:
>>
>>> Have you tried running it from within an IPython notebook as an
>>> interactive session?
>>>
>>> I am doing that right now and it works.
>>>
>>> Chirag
>>>
>>>
>>> > Kirthevasan Kandasamy <kandasamy at cmu.edu> wrote:
>>> >
>>> >> Hi Predrag,
>>> >>
>>> >> I am re-running a tensorflow project on GPU1 - I haven't touched it in
>>> >> 4/5 months, and the last time I ran it it worked fine, but when I try
>>> >> now I seem to be getting the following error.
>>> >>
>>> >
>>> > This is the first time I have heard about it. I was under the
>>> > impression that GPU nodes were usable. I am redirecting your e-mail to
>>> > users at autonlab.org in the hope that somebody who uses TensorFlow on
>>> > a regular basis can be of more help.
>>> >
>>> > Predrag
>>> >
>>> >
>>> >
>>> >
>>> >> Can you please tell me what the issue might be or direct me to someone
>>> >> who might know?
>>> >>
>>> >> This is for the NIPS deadline, so I would appreciate a quick response.
>>> >>
>>> >> thanks,
>>> >> Samy
>>> >>
>>> >>
>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
>>> >> name: Tesla K80
>>> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235
>>> >> pciBusID 0000:05:00.0
>>> >> Total memory: 11.17GiB
>>> >> Free memory: 11.11GiB
>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y
>>> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:05:00.0)
>>> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 4007 (compatibility version 4000) but source was compiled with 5103 (compatibility version 5100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
>>> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
>>> >> run_resnet.sh: line 49: 22665 Aborted                 (core dumped) CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus 1 --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE --skip_size $SKIP_SIZE
>>> >
>>>
>>>
>>>
>>