Possible CUDA problem

Predrag Punosevac predragp at andrew.cmu.edu
Mon Jan 7 14:00:04 EST 2019


Yotam,

Thank you so much for this report! I am CC-ing users at autonlab.org so
that everyone is on the same page. Could you please work with me on
this one? Let's try to fix GPU10 first. GPU10 was recently
provisioned. It has three (one was DoA) GeForce GTX 1080 Ti cards. I am
running the latest NVIDIA-Linux-x86_64-410.78 driver and the latest
cuda-10.0.130-1. You have two versions of Python: /opt/rh/rh-python36
will give you the latest 3.6.7, while /opt/miniconda3 will install
python-3.7.2. Once we fix GPU10 we will move on to the other machines.
Note that the other machines are still running an older version of the
NVIDIA driver and CUDA-9.2. I have changed nothing on them, so whatever
is broken is broken upstream (Python, TensorFlow, NVIDIA, or CUDA).
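
As a starting point for debugging, here is a minimal sketch that asks the
dynamic linker which CUDA libraries it can actually load on a given
machine. The traceback quoted below complains about libcublas.so.9.0, which
suggests that the TensorFlow build in that Anaconda environment was linked
against CUDA 9.0, while GPU10 has CUDA 10.0 and the other machines have
CUDA 9.2. The helper names (can_load, report) and the list of sonames to
probe are my own assumptions for illustration, not something taken from the
TensorFlow documentation.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic linker can find and load `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False


def report(sonames):
    """Map each soname to whether it could be loaded on this machine."""
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    # Assumed list of candidates: the library named in the traceback plus
    # the sonames I would expect CUDA 9.2 and CUDA 10.0 to provide.
    candidates = [
        "libcublas.so.9.0",
        "libcublas.so.9.2",
        "libcublas.so.10.0",
        "libcudart.so.10.0",
    ]
    for name, ok in report(candidates).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Run it with the same interpreter and the same environment (in particular
the same LD_LIBRARY_PATH) that you use when importing tensorflow, otherwise
the result may not match what TensorFlow sees.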

Please keep CC-ing users on this discussion so that people know what
is going on.

Predrag


On Mon, Jan 7, 2019 at 8:02 AM Yotam Hechtlinger
<yhechtli at andrew.cmu.edu> wrote:
>
> Hi Predrag,
>
> There might be some CUDA problem on GPU5, GPU6, and GPU10.
> I get the following message when I try to import tensorflow:
>
>
>
> >>> import tensorflow
> Traceback (most recent call last):
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
>     from tensorflow.python.pywrap_tensorflow_internal import *
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
>     _pywrap_tensorflow_internal = swig_import_helper()
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
>     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
>     return load_dynamic(name, filename, file)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
>     return _load(spec)
> ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
>     from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
>     from tensorflow.python import pywrap_tensorflow
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
>     raise ImportError(msg)
> ImportError: Traceback (most recent call last):
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
>     from tensorflow.python.pywrap_tensorflow_internal import *
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
>     _pywrap_tensorflow_internal = swig_import_helper()
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
>     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
>     return load_dynamic(name, filename, file)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
>     return _load(spec)
> ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
>
>
> Failed to load the native TensorFlow runtime.
>
> See https://www.tensorflow.org/install/errors
>
> for some common reasons and solutions.  Include the entire stack trace
> above this error message when asking for help.
>
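
The line that matters in the trace above is the final ImportError: the
version suffix of the missing shared object tells you which CUDA release
the TensorFlow binary expects. A small sketch for pulling that out of a
saved error message follows; the function name missing_library and the
regular expression are illustrative assumptions, and the pattern only
covers messages of the exact form shown in this report.

```python
import re

# Matches e.g. "ImportError: libcublas.so.9.0: cannot open shared object file"
_MISSING_LIB = re.compile(
    r"ImportError: (lib[\w+-]+)\.so\.([\d.]+): cannot open shared object file"
)


def missing_library(message):
    """Return (library, version) for the first missing .so, or None."""
    match = _MISSING_LIB.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


line = ("ImportError: libcublas.so.9.0: cannot open shared object file: "
        "No such file or directory")
print(missing_library(line))  # ('libcublas', '9.0')
```

Here the result points at cuBLAS from CUDA 9.0, which is neither the
CUDA 10.0 on GPU10 nor the CUDA 9.2 on the other machines.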

