Possible CUDA problem
Yotam Hechtlinger
yhechtli at andrew.cmu.edu
Tue Jan  8 03:49:13 EST 2019

Hi Predrag,
Is cuDNN properly installed?
I can't see it inside /usr/local/cuda.
Also, *import tensorflow* gives:
*ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory*
Thanks,
Yotam.
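One way to check whether the loader can see the libraries named in these errors, without going through TensorFlow, is to try opening them directly. This is a minimal sketch, assuming the standard-library `ctypes` module. On Linux, `ctypes.CDLL` calls `dlopen()`, so it should search roughly the same places the TensorFlow import does (LD_LIBRARY_PATH, the ld.so cache, and the default system directories). The helper name `can_load` is hypothetical. Note that cuDNN is distributed separately from the CUDA toolkit, so a working CUDA install does not by itself provide libcudnn.so.7.

```python
import ctypes


def can_load(soname):
    """Hypothetical helper: True if the dynamic loader can open `soname`.

    ctypes.CDLL uses dlopen() on Linux and raises OSError when the
    shared object cannot be found or loaded.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


# The two libraries named in the ImportErrors in this thread.
for lib in ("libcudnn.so.7", "libcublas.so.9.0"):
    print(lib, "OK" if can_load(lib) else "NOT FOUND")
```

Run it with the same Python and the same environment you use for TensorFlow, since LD_LIBRARY_PATH is read when the process starts.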
On Mon, Jan 7, 2019 at 10:43 PM Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> Ok. I found one problem. CUDA 10 was not properly installed on GPU10
> due to dependency problems. I had to disable the rpmfusion repos
> (both free and non-free), which I had considered safe in the past. Now
> CUDA 10 is installed from the NVIDIA repo in /usr/local, and
> /usr/local/cuda is a symbolic link to the actual /usr/local/cuda-10.0
> folder. Please try now.
>
> Predrag
>
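Since /usr/local/cuda is described as a symbolic link to /usr/local/cuda-10.0, a quick way to confirm what the link currently resolves to, and whether the expected lib64 directory is there, is sketched below. It assumes the toolkit directory follows the `cuda-<major>.<minor>` naming quoted above; the function name `describe_cuda_link` is hypothetical.

```python
import os
import re


def describe_cuda_link(link="/usr/local/cuda"):
    """Hypothetical helper: resolve the CUDA symlink and report its target."""
    target = os.path.realpath(link)  # follows the symlink chain
    # Assumes toolkit directories are named like /usr/local/cuda-10.0.
    match = re.search(r"cuda-(\d+\.\d+)$", target)
    return {
        "target": target,
        "version": match.group(1) if match else None,
        "has_lib64": os.path.isdir(os.path.join(target, "lib64")),
    }


print(describe_cuda_link())
```

If `version` comes back as None, the link is missing or points somewhere that does not follow that naming.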
> On Mon, Jan 7, 2019 at 2:28 PM Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
> >
> > Hi Predrag,
> >
> > With GPU10 the problem is probably that LD_LIBRARY_PATH points to
> > /usr/local/cuda/lib64, but that's not where CUDA is installed (where is it?).
> >
> > Yotam.
> >
> >
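To see whether LD_LIBRARY_PATH actually points at directories that exist and contain a given library, one can walk its entries by hand. This is a minimal sketch; the helper `scan_ld_library_path` is hypothetical, and it only inspects LD_LIBRARY_PATH, not the ld.so cache or the default system directories the loader also searches.

```python
import os


def scan_ld_library_path(libname, ld_library_path=None):
    """Hypothetical helper: look for `libname` in each LD_LIBRARY_PATH entry.

    Returns (hits, missing_dirs): full paths where the file exists, and
    entries that are not directories at all (for example a stale
    /usr/local/cuda/lib64 when CUDA is installed elsewhere).
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits, missing_dirs = [], []
    for entry in ld_library_path.split(":"):
        if not entry:
            # ld.so treats an empty entry as the current directory;
            # this sketch simply skips it.
            continue
        if not os.path.isdir(entry):
            missing_dirs.append(entry)
            continue
        candidate = os.path.join(entry, libname)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits, missing_dirs


print(scan_ld_library_path("libcublas.so.9.0"))
```

An empty `hits` list together with a non-empty `missing_dirs` list would match the situation described above: the variable names a directory that is not there.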
> > On Mon, Jan 7, 2019 at 9:00 PM Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> >>
> >> Yotam,
> >>
> >> Thank you so much for this report! I am CC-ing users at autonlab.org so
> >> that everyone is on the same page. Could you please work with me on
> >> this one? Let's try to fix GPU10 first. GPU10 was recently
> >> provisioned. It has three GeForce GTX 1080 Ti cards (one was DOA). I am
> >> running the latest NVIDIA-Linux-x86_64-410.78 driver and the latest
> >> cuda-10.0.130-1. You have two versions of Python: /opt/rh/rh-python36
> >> gives you the latest 3.6.7, while /opt/miniconda3 installs
> >> Python 3.7.2. Once we fix GPU10 we will move on to the other machines.
> >> Note that the other machines are still running an older version of the
> >> NVIDIA driver and CUDA 9.2. I have changed nothing on them, so whatever
> >> is broken is broken upstream (Python, TensorFlow, NVIDIA, or CUDA).
> >>
> >> Please keep CC-ing users on this discussion so that people know what
> >> is going on.
> >>
> >> Predrag
> >>
> >>
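Because several Python installs are in play (the two system ones listed above, and the traceback below comes from a personal ~/anaconda3), it can help to confirm which interpreter is actually running before debugging TensorFlow. A minimal sketch; the prefixes are the ones quoted in this thread, and the `KNOWN_PREFIXES` table and `classify_interpreter` function are hypothetical.

```python
import sys

# Install prefixes mentioned in this thread; anything else (for example a
# personal ~/anaconda3) is reported as "other".
KNOWN_PREFIXES = {
    "/opt/rh/rh-python36": "system Python 3.6 (rh-python36)",
    "/opt/miniconda3": "system Miniconda (Python 3.7)",
}


def classify_interpreter(executable):
    """Hypothetical helper: label an interpreter path by its install prefix."""
    for prefix, label in KNOWN_PREFIXES.items():
        if executable.startswith(prefix + "/"):
            return label
    return "other"


print(sys.executable, "->", classify_interpreter(sys.executable))
print("Python", sys.version.split()[0])
```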
> >> On Mon, Jan 7, 2019 at 8:02 AM Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
> >> >
> >> > Hi Predrag,
> >> >
> >> > There might be a CUDA problem on GPU5, GPU6, and GPU10.
> >> > I get the following message when I try to import tensorflow:
> >> >
> >> >
> >> >
> >> > >>> import tensorflow
> >> > Traceback (most recent call last):
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
> >> >     from tensorflow.python.pywrap_tensorflow_internal import *
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
> >> >     _pywrap_tensorflow_internal = swig_import_helper()
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
> >> >     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
> >> >     return load_dynamic(name, filename, file)
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
> >> >     return _load(spec)
> >> > ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
> >> >
> >> > During handling of the above exception, another exception occurred:
> >> >
> >> > Traceback (most recent call last):
> >> >   File "<stdin>", line 1, in <module>
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
> >> >     from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
> >> >     from tensorflow.python import pywrap_tensorflow
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
> >> >     raise ImportError(msg)
> >> > ImportError: Traceback (most recent call last):
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
> >> >     from tensorflow.python.pywrap_tensorflow_internal import *
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
> >> >     _pywrap_tensorflow_internal = swig_import_helper()
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
> >> >     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
> >> >     return load_dynamic(name, filename, file)
> >> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
> >> >     return _load(spec)
> >> > ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
> >> >
> >> >
> >> > Failed to load the native TensorFlow runtime.
> >> >
> >> > See https://www.tensorflow.org/install/errors
> >> >
> >> > for some common reasons and solutions.  Include the entire stack trace
> >> > above this error message when asking for help.
> >> >
>
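The traceback asks for libcublas.so.9.0, which suggests that this TensorFlow binary was built against CUDA 9.0, while the thread describes CUDA 9.2 on the older machines and CUDA 10.0 on GPU10. The sketch below compares the soname the loader is asked for with the one each installed toolkit would provide. It assumes that, through CUDA 10.0, the cuBLAS soname carries the toolkit's major.minor version (libcublas.so.9.0, libcublas.so.9.2, libcublas.so.10.0). Later releases changed this naming, so the sketch rejects them instead of guessing. The function name is hypothetical.

```python
def expected_cublas_soname(cuda_version):
    """Hypothetical helper: map a CUDA "major.minor" string to a cuBLAS soname.

    Assumption: for CUDA 9.x and 10.0 the soname is
    libcublas.so.<major>.<minor>; newer toolkits are not covered.
    """
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if (major, minor) > (10, 0):
        raise ValueError("soname naming after CUDA 10.0 is not covered here")
    return "libcublas.so.{}.{}".format(major, minor)


required = "libcublas.so.9.0"  # what the TensorFlow build in the traceback asks for
for cuda in ("9.2", "10.0"):  # toolkits described in this thread
    provided = expected_cublas_soname(cuda)
    status = "match" if provided == required else "MISMATCH"
    print("CUDA {}: provides {}, TensorFlow wants {} -> {}".format(
        cuda, provided, required, status))
```

If this assumption holds, no amount of LD_LIBRARY_PATH tuning would satisfy this particular TensorFlow build on either toolkit. It would need a TensorFlow build that matches the installed CUDA (and cuDNN) versions.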
More information about the Autonlab-users
mailing list