Possible CUDA problem

Yotam Hechtlinger yhechtli at andrew.cmu.edu
Mon Jan 7 14:28:09 EST 2019


Hi Predrag,

On GPU10 the problem is probably that LD_LIBRARY_PATH points to
/usr/local/cuda/lib64, but that is not where CUDA is installed. Where
is it installed?
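
One way to check this is to list which directories on LD_LIBRARY_PATH
exist and whether any of them holds a libcublas file. The sketch below
is only an illustration: find_shared_library is a hypothetical helper,
not part of TensorFlow or CUDA. The dynamic loader also consults other
locations (the ldconfig cache, rpath entries), so an empty result here
is a hint rather than proof.

```python
import glob
import os


def find_shared_library(name_prefix, search_path=None):
    """Return files named name_prefix* found in the directories of search_path.

    search_path is a colon-separated string in the style of LD_LIBRARY_PATH;
    when omitted, the current LD_LIBRARY_PATH is used.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        if not os.path.isdir(directory):
            # A listed directory that does not exist is the suspected problem.
            print(f"missing directory on search path: {directory}")
            continue
        pattern = os.path.join(glob.escape(directory), name_prefix + "*")
        matches.extend(sorted(glob.glob(pattern)))
    return matches


if __name__ == "__main__":
    # The traceback quoted below asks for libcublas.so.9.0.
    for path in find_shared_library("libcublas.so"):
        print(path)
```

If nothing is printed, or only a libcublas with a different version
suffix than 9.0 shows up, that would be consistent with the ImportError.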

Yotam.


On Mon, Jan 7, 2019 at 9:00 PM Predrag Punosevac <predragp at andrew.cmu.edu>
wrote:

> Yotam,
>
> Thank you so much for this report! I am CC-ing users at autonlab.org so
> that everyone is on the same page. Could you please work with me on
> this one? Let's try to fix GPU10 first. GPU10 was recently
> provisioned. It has three (one was DoA) GeForce 1080 Ti cards. I am
> running the latest NVIDIA-Linux-x86_64-410.78 driver and the latest
> cuda-10.0.130-1. You have two versions of Python: /opt/rh/rh-python36
> will give you the latest 3.6.7, while /opt/miniconda3 will install
> python-3.7.2. Once we fix GPU10 we will move on to the other machines.
> Note that the other machines are still running an older version of the
> NVIDIA driver and CUDA-9.2. I have changed nothing on them, so whatever
> is broken is broken upstream (Python, TensorFlow, NVIDIA, or CUDA).
>
> Please keep CC-ing users on this discussion so that people know what
> is going on.
>
> Predrag
>
>
> On Mon, Jan 7, 2019 at 8:02 AM Yotam Hechtlinger
> <yhechtli at andrew.cmu.edu> wrote:
> >
> > Hi Predrag,
> >
> > There might be a CUDA problem on GPU5, GPU6 and GPU10.
> > I get the following message when I try to import tensorflow:
> >
> > >>> import tensorflow
> > Traceback (most recent call last):
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
> >     from tensorflow.python.pywrap_tensorflow_internal import *
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
> >     _pywrap_tensorflow_internal = swig_import_helper()
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
> >     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
> >     return load_dynamic(name, filename, file)
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
> >     return _load(spec)
> > ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
> >
> > During handling of the above exception, another exception occurred:
> >
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
> >     from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
> >     from tensorflow.python import pywrap_tensorflow
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
> >     raise ImportError(msg)
> > ImportError: Traceback (most recent call last):
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
> >     from tensorflow.python.pywrap_tensorflow_internal import *
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
> >     _pywrap_tensorflow_internal = swig_import_helper()
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
> >     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
> >     return load_dynamic(name, filename, file)
> >   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
> >     return _load(spec)
> > ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
> >
> >
> > Failed to load the native TensorFlow runtime.
> >
> > See https://www.tensorflow.org/install/errors
> >
> > for some common reasons and solutions.  Include the entire stack trace
> > above this error message when asking for help.
> >
>