GPU3 is "configured"

Dougal Sutherland dougal at gmail.com
Thu Oct 13 13:51:23 EDT 2016


I actually haven't gotten tensorflow working yet -- the bazel build just
hangs on me. I think it may have to do with home directories being on NFS,
but I can't figure out bazel at all. I'll try some more tonight.
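
One thing that might be worth trying, assuming the hang really is
NFS-related: point bazel's output root at local scratch instead of the NFS
home (the path and target below are just placeholders):

bazel --output_user_root=/home/scratch/$USER/.bazel build <target>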

Caffe should be workable following the instructions Predrag forwarded.

- Dougal

On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac <
predragp at imap.srv.cs.cmu.edu> wrote:

> Dear Autonians,
>
> In case anybody is interested in what happens behind the scenes, Doug
> got Caffe and TensorFlow to work on
> GPU3. Please see the message below. I also got very useful feedback
> from the Princeton and Rutgers people. Please check it out if you care
> (you will have to log into Gmail to see the exchange).
>
> https://groups.google.com/forum/#!forum/springdale-users
>
> I need to think about how we move forward with this before we start
> pulling triggers. If somebody is itchy and can't wait, please build Caffe
> and TensorFlow in your scratch directory following the howto below.
>
> Predrag
>
> On 2016-10-13 13:24, Dougal Sutherland wrote:
> > A note about cudnn:
> >
> > There are a bunch of versions of cudnn. They're not
> > backwards-compatible, and different versions of
> > caffe/tensorflow/whatever want different ones.
> >
> > I am currently using the setup in ~dsutherl/cudnn_files:
> >
> >       * I have a bunch of versions of the installer there.
> >       * The use-cudnn.sh script, intended to be used like "source
> > use-cudnn.sh 5.1", will untar the appropriate one into a scratch
> > directory (if it hasn't already been done) and set
> > CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is
> > needed for caffe binaries, since they don't link to the absolute path;
> > the first two (not sure about the third) are needed for theano.
> > Dunno about tensorflow yet. A rough sketch of what such a script does
> > is below.
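> >
> > For illustration only, a minimal sketch of what a use-cudnn-style
> > script might do; the tarball naming and the cuda/ layout inside the
> > archive are assumptions here, not a description of the actual files in
> > ~dsutherl/cudnn_files:
> >
> > #!/bin/bash
> > # usage: source use-cudnn.sh 5.1
> > ver="$1"
> > src_dir=~dsutherl/cudnn_files                     # where the tarballs live
> > tarball="$src_dir/cudnn-$ver.tgz"                 # assumed naming scheme
> > dest="/home/scratch/$USER/cudnn-$ver"
> > if [ ! -d "$dest" ]; then
> >     mkdir -p "$dest"
> >     tar xzf "$tarball" -C "$dest"                 # NVIDIA's archives unpack into cuda/
> > fi
> > export CUDNN_DIR="$dest/cuda"
> > export CPATH="$CUDNN_DIR/include:$CPATH"               # compile time
> > export LIBRARY_PATH="$CUDNN_DIR/lib64:$LIBRARY_PATH"   # link time
> > export LD_LIBRARY_PATH="$CUDNN_DIR/lib64:$LD_LIBRARY_PATH"  # run time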
> >
> > So, here's the Caffe setup:
> >
> > cd /home/scratch/$USER
> > git clone https://github.com/BVLC/caffe
> > cd caffe
> > cp Makefile.config.example Makefile.config
> >
> > # tell it to use openblas; using atlas needs some changes to the Makefile
> > sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
> >
> > # configure to use cudnn (optional)
> > source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
> > sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
> > perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
> > perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
> >
> > # build the library
> > make -j23
> >
> > # to do tests (takes ~10 minutes):
> > make -j23 test
> > make runtest
> >
> > # Now, to run caffe binaries you'll need to remember to source
> > # use-cudnn if you used cudnn before.
> >
> > # To build the python library:
> > make py
> >
> > # Requirements for the python library:
> > # Some of the system packages are too old; this installs them in your
> > # scratch directory.
> > # You'll have to set PYTHONUSERBASE again before running any python
> > # processes that use these libs.
> > export PYTHONUSERBASE=$HOME/scratch/.local;
> > export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
> > pip install --user -r python/requirements.txt
> >
> > # Caffe is dumb and doesn't package its python library properly. The
> > # easiest way to use it is:
> > export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
> > python -c 'import caffe'
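> >
> > # Putting it together: a sketch of the environment to set up in a new
> > # shell before running anything against this caffe build (assuming the
> > # cudnn 5.1 setup above was used at build time):
> > source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
> > export PYTHONUSERBASE=$HOME/scratch/.local
> > export PATH=$PYTHONUSERBASE/bin:"$PATH"
> > export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH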
> >
> > On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland <dougal at gmail.com>
> > wrote:
> >
> >> Java fix seemed to work. Now tensorflow wants  python-wheel  and
> >> swig.
> >>
> >> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac
> >> <predragp at imap.srv.cs.cmu.edu> wrote:
> >>
> >>> On 2016-10-13 11:46, Dougal Sutherland wrote:
> >>>
> >>>> Having some trouble with tensorflow, because:
> >>>>
> >>>> * it requires Google's bazel build system
> >>>>
> >>>> * the bazel installer says
> >>>>   "Java version is 1.7.0_111 while at least 1.8 is needed."
> >>>>
> >>>> * $ java -version
> >>>>   openjdk version "1.8.0_102"
> >>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
> >>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
> >>>>   $ javac -version
> >>>>   javac 1.7.0_111
> >>>
> >>> I just did yum -y install java-1.8.0* which installs openjdk 1.8.
> >>> Please change your java. Let me know if you want me to install
> >>> Oracle JDK 1.8.
> >>>
> >>> Predrag
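> >>>
> >>> If the default still points at the old javac after that, switching it
> >>> on a RHEL-style box is usually just an alternatives call, roughly:
> >>>
> >>> alternatives --config java     # pick the 1.8.0 entry
> >>> alternatives --config javac    # same for the compiler
> >>> java -version; javac -version  # both should now report 1.8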
> >>>
> >>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac
> >>>> <predragp at cs.cmu.edu> wrote:
> >>>>
> >>>>> Dougal Sutherland <dougal at gmail.com> wrote:
> >>>>>
> >>>>>> Also, this seemed to work for me so far for protobuf:
> >>>>>>
> >>>>>> cd /home/scratch/$USER
> >>>>>> VER=3.1.0
> >>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
> >>>>>> tar xf protobuf-cpp-$VER.tar.gz
> >>>>>> cd protobuf-cpp-$VER
> >>>>>> ./configure --prefix=/home/scratch/$USER
> >>>>>> make -j12
> >>>>>> make -j12 check
> >>>>>> make install
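> >>>>>>
> >>>>>> To have later builds pick this protobuf up instead of the system
> >>>>>> one, something along these lines should work in the build shell
> >>>>>> (untested sketch; adjust the prefix to taste):
> >>>>>>
> >>>>>> P=/home/scratch/$USER
> >>>>>> export PATH=$P/bin:$PATH                        # protoc
> >>>>>> export CPATH=$P/include:$CPATH                  # headers
> >>>>>> export LIBRARY_PATH=$P/lib:$LIBRARY_PATH        # link time
> >>>>>> export LD_LIBRARY_PATH=$P/lib:$LD_LIBRARY_PATH  # run time
> >>>>>> export PKG_CONFIG_PATH=$P/lib/pkgconfig:$PKG_CONFIG_PATH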
> >>>>>
> >>>>> That is great help!
> >>>>>
> >>>>>> You could change --prefix=/usr if making an RPM.
> >>>>>>
> >>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland
> >>>>>> <dougal at gmail.com> wrote:
> >>>>>>
> >>>>>>> Some more packages for caffe:
> >>>>>>>
> >>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
> >>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
> >>>>>>>
> >>>>>>> (Some of those might be installed already, but at least gflags
> >>>>>>> is definitely missing.)
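> >>>>>>>
> >>>>>>> In other words, on the admin side this should boil down to
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> yum -y install leveldb-devel snappy-devel opencv-devel boost-devel \
> >>>>>>>     hdf5-devel gflags-devel glog-devel lmdb-devel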
> >>>>>>>
> >>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac <
> >>>>>>> predragp at imap.srv.cs.cmu.edu> wrote:
> >>>>>>>
> >>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
> >>>>>>>
> >>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice also
> >>>>>>>> hangs on my computer with R2016a.
> >>>>>>>
> >>>>>>> We would have to escalate this with MathWorks. I have seen
> >>>>>>> workarounds on the Internet, but it looks like a bug in one of the
> >>>>>>> MathWorks-provided MEX files.
> >>>>>>>
> >>>>>>>> I was able to compile the matrixMul example in the CUDA samples
> >>>>>>>> and run it on gpu3, so I think the build environment is probably
> >>>>>>>> all set.
> >>>>>>>>
> >>>>>>>> As for the OpenGL, I think it's possibly a problem with their
> >>>>>>>> build script findgl.mk [1], which is not familiar with Springdale
> >>>>>>>> OS. The demo_suite directory has a precompiled nbody binary you
> >>>>>>>> may try, but I suspect most users will not need graphics.
> >>>>>>>
> >>>>>>> That should not be too hard to fix. Some header files have to be
> >>>>>>> manually edited. The funny part is that until 7.2 the Princeton
> >>>>>>> people didn't bother to remove the RHEL branding, which actually
> >>>>>>> made things easier for us.
> >>>>>>>
> >>>>>>> Doug is trying right now to compile the latest Caffe, TensorFlow,
> >>>>>>> and protobuf-3. We will try to create an RPM for that so that we
> >>>>>>> don't have to go through this again. I also asked the Princeton
> >>>>>>> and Rutgers guys if they have WIP RPMs to share.
> >>>>>>>
> >>>>>>> Predrag
> >>>>>>>
> >>>>>>>> Arne
> >>>>>>>>
> >>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac
> >>>>>>>>> <predragp at cs.cmu.edu> wrote:
> >>>>>>>>>
> >>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Predrag,
> >>>>>>>>>> Don't know if this applies to you, but I just built a machine
> >>>>>>>>>> with a GTX1080, which has the same Pascal architecture as the
> >>>>>>>>>> Titan.  After installing CUDA 8, I still found I needed to
> >>>>>>>>>> install the latest driver off of the NVIDIA web site to get
> >>>>>>>>>> the card recognized.  Right now, I am running 367.44.
> >>>>>>>>>>
> >>>>>>>>>> Arne
> >>>>>>>>>
> >>>>>>>>> Arne,
> >>>>>>>>>
> >>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
> >>>>>>>>> architecture; I see lots of people complaining about it on the
> >>>>>>>>> forums. I downloaded and installed the driver from
> >>>>>>>>>
> >>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
> >>>>>>>>>
> >>>>>>>>> That seems to have made a real difference. Check out these
> >>>>>>>>> beautiful outputs:
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ ls nvidia*
> >>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
> >>>>>>>>> nvidia-uvm-tools
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ lspci | grep -i nvidia
> >>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ ls /proc/driver
> >>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ lsmod | grep nvidia
> >>>>>>>>> nvidia_uvm            738901  0
> >>>>>>>>> nvidia_drm             43405  0
> >>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
> >>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
> >>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
> >>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
> >>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ nvidia-smi
> >>>>>>>>> Wed Oct 12 22:03:27 2016
> >>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
> >>>>>>>>> |-------------------------------+----------------------+----------------------+
> >>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> >>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> >>>>>>>>> |===============================+======================+======================|
> >>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> >>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> >>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> >>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> >>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>>
> >>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>> | Processes:                                                       GPU Memory |
> >>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
> >>>>>>>>> |=============================================================================|
> >>>>>>>>> |  No running processes found                                                 |
> >>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>>
> >>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
> >>>>>>>>>
> >>>>>>>>> Alignment requirement for Surfaces:            Yes
> >>>>>>>>> Device has ECC support:                        Disabled
> >>>>>>>>> Device supports Unified Addressing (UVA):      Yes
> >>>>>>>>> Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
> >>>>>>>>> Compute Mode:
> >>>>>>>>>    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
> >>>>>>>>>
> >>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA
> >>>>>>>>> Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal),
> >>>>>>>>> Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 =
> >>>>>>>>> TITAN X (Pascal)
> >>>>>>>>> Result = PASS
> >>>>>>>>>
> >>>>>>>>> Now not everything is rosy:
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> >>>>>>>>> root@gpu3$ make
> >>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>>
> >>>>>>>>> even though those are installed. For example:
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ yum whatprovides */libX11.so
> >>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> >>>>>>>>> Repo        : core
> >>>>>>>>> Matched from:
> >>>>>>>>> Filename    : /usr/lib/libX11.so
> >>>>>>>>>
> >>>>>>>>> also
> >>>>>>>>>
> >>>>>>>>> mesa-libGLU-devel
> >>>>>>>>> mesa-libGL-devel
> >>>>>>>>> xorg-x11-drv-nvidia-devel
> >>>>>>>>>
> >>>>>>>>> but
> >>>>>>>>>
> >>>>>>>>> root@gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel
> >>>>>>>>> xorg-x11-drv-nvidia-devel
> >>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and
> >>>>>>>>> latest version
> >>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already
> >>>>>>>>> installed and latest version
> >>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already
> >>>>>>>>> installed and latest version
> >>>>>>>>>
> >>>>>>>>> Also, from MATLAB, gpuDevice hangs.
> >>>>>>>>>
> >>>>>>>>> So we still don't have a working installation. Any help would be
> >>>>>>>>> appreciated.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Predrag
> >>>>>>>>>
> >>>>>>>>> P.S. Once we have a working installation we can think of
> >>>>>>>>> installing Caffe and TensorFlow. For now we have to see why
> >>>>>>>>> things are not working.
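> >>>>>>>>>
> >>>>>>>>> Regarding the libGL.so/libGLU.so/libX11.so warnings above: if the
> >>>>>>>>> samples' findgllib.mk honors the usual GLPATH override on this
> >>>>>>>>> system (an assumption, not verified here), pointing it at the
> >>>>>>>>> 64-bit libraries might be enough, e.g.:
> >>>>>>>>>
> >>>>>>>>> cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> >>>>>>>>> make GLPATH=/usr/lib64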
> >>>>>>>>>>
> >>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac
> >>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Dear Autonians,
> >>>>>>>>>>>
> >>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
> >>>>>>>>>>> packages are installed. However, I couldn't get the
> >>>>>>>>>>> NVIDIA-provided CUDA driver to recognize the GPU cards. They
> >>>>>>>>>>> appear to be properly installed from the hardware point of view
> >>>>>>>>>>> and you can list them with
> >>>>>>>>>>>
> >>>>>>>>>>> lshw -class display
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> root@gpu3$ lshw -class display
> >>>>>>>>>>>   *-display UNCLAIMED
> >>>>>>>>>>>        description: VGA compatible controller
> >>>>>>>>>>>        product: NVIDIA Corporation
> >>>>>>>>>>>        vendor: NVIDIA Corporation
> >>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>        bus info: pci@0000:02:00.0
> >>>>>>>>>>>        version: a1
> >>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>        configuration: latency=0
> >>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >>>>>>>>>>>          memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
> >>>>>>>>>>>          memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
> >>>>>>>>>>>          memory:d0000000-d007ffff
> >>>>>>>>>>>   *-display UNCLAIMED
> >>>>>>>>>>>        description: VGA compatible controller
> >>>>>>>>>>>        product: NVIDIA Corporation
> >>>>>>>>>>>        vendor: NVIDIA Corporation
> >>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>>        version: a1
> >>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>        configuration: latency=0
> >>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >>>>>>>>>>>          memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
> >>>>>>>>>>>          memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
> >>>>>>>>>>>          memory:ce000000-ce07ffff
> >>>>>>>>>>>   *-display
> >>>>>>>>>>>        description: VGA compatible controller
> >>>>>>>>>>>        product: ASPEED Graphics Family
> >>>>>>>>>>>        vendor: ASPEED Technology, Inc.
> >>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>        bus info: pci@0000:06:00.0
> >>>>>>>>>>>        version: 30
> >>>>>>>>>>>        width: 32 bits
> >>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>        capabilities: pm msi vga_controller bus_master cap_list rom
> >>>>>>>>>>>        configuration: driver=ast latency=0
> >>>>>>>>>>>        resources: irq:19 memory:cb000000-cbffffff
> >>>>>>>>>>>          memory:cc000000-cc01ffff ioport:4000(size=128)
> >>>>>>>>>>>   *-display UNCLAIMED
> >>>>>>>>>>>        description: VGA compatible controller
> >>>>>>>>>>>        product: NVIDIA Corporation
> >>>>>>>>>>>        vendor: NVIDIA Corporation
> >>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>        bus info: pci@0000:82:00.0
> >>>>>>>>>>>        version: a1
> >>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>        configuration: latency=0
> >>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >>>>>>>>>>>          memory:fa000000-faffffff memory:387fe0000000-387fefffffff
> >>>>>>>>>>>          memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
> >>>>>>>>>>>          memory:fb000000-fb07ffff
> >>>>>>>>>>>   *-display UNCLAIMED
> >>>>>>>>>>>        description: VGA compatible controller
> >>>>>>>>>>>        product: NVIDIA Corporation
> >>>>>>>>>>>        vendor: NVIDIA Corporation
> >>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>        bus info: pci@0000:83:00.0
> >>>>>>>>>>>        version: a1
> >>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>        configuration: latency=0
> >>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >>>>>>>>>>>          memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
> >>>>>>>>>>>          memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
> >>>>>>>>>>>          memory:f9000000-f907ffff
> >>>>>>>>>>>
> >>>>>>>>>>> However, what scares the hell out of me is that I don't see the
> >>>>>>>>>>> NVIDIA driver loaded:
> >>>>>>>>>>>
> >>>>>>>>>>> lsmod | grep nvidia
> >>>>>>>>>>>
> >>>>>>>>>>> and the device nodes /dev/nvidia are not created. I am guessing
> >>>>>>>>>>> I just missed some trivial step during the CUDA installation,
> >>>>>>>>>>> which is very involved. I am unfortunately too tired to debug
> >>>>>>>>>>> this tonight.
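> >>>>>>>>>>>
> >>>>>>>>>>> For reference, the usual quick sanity checks at this point would
> >>>>>>>>>>> be something along these lines:
> >>>>>>>>>>>
> >>>>>>>>>>> lsmod | grep nvidia        # is the kernel module loaded at all?
> >>>>>>>>>>> modprobe nvidia            # try loading it by hand
> >>>>>>>>>>> dmesg | tail               # any complaints from the module?
> >>>>>>>>>>> ls -l /dev/nvidia*         # device nodes present?
> >>>>>>>>>>> nvidia-smi                 # does the driver see the cards?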
> >>>>>>>>>>>
> >>>>>>>>>>> Predrag
> >
> > Links:
> > ------
> > [1] http://findgl.mk
>