GPU3 is "configured"

Dougal Sutherland dougal at gmail.com
Thu Oct 13 13:58:58 EDT 2016


According to the tensorflow site, the conda package doesn't support GPUs.

On Thu, Oct 13, 2016, 6:55 PM Predrag Punosevac <
predragp at imap.srv.cs.cmu.edu> wrote:

> On 2016-10-13 13:51, Dougal Sutherland wrote:
> > I actually haven't gotten tensorflow working yet -- the bazel build
> > just hangs on me. I think it may have to do with home directories
> > being on NFS, but I can't figure out bazel at all. I'll try some more
> > tonight.
> >
>
> According to one of the Princeton guys, we could just use conda for
> TensorFlow. Please check it out, and use your scratch directory instead
> of NFS.
>
> Quote:
>
> Hello, Predrag.
>
> We have caffe 1.00rc3 if you are interested.
>
>
> ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
>
> TensorFlow and protobuf-3 work great with conda
> (http://conda.pydata.org). I just tried and had no problems installing
> it for Python 2.7 and 3.5.
>
>
> > Caffe should be workable following the instructions Predrag forwarded.
> >
> > - Dougal
> >
> > On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac
> > <predragp at imap.srv.cs.cmu.edu> wrote:
> >
> >> Dear Autonians,
> >>
> >> In case anybody is interested in what happens behind the scenes, Doug
> >> got Caffe and TensorFlow to work on GPU3. Please see the message
> >> below. I also got very useful feedback from the Princeton and Rutgers
> >> people. Please check it out if you care (you will have to log into
> >> Gmail to see the exchange).
> >>
> >> https://groups.google.com/forum/#!forum/springdale-users
> >>
> >> I need to think about how we move forward with this before we start
> >> pulling triggers. If somebody is itchy and can't wait, please build
> >> Caffe and TensorFlow in your scratch directory following the howto
> >> below.
> >>
> >> Predrag
> >>
> >> On 2016-10-13 13:24, Dougal Sutherland wrote:
> >>> A note about cudnn:
> >>>
> >>> There are a bunch of versions of cudnn. They're not
> >>> backwards-compatible, and different versions of
> >>> caffe/tensorflow/whatever want different ones.
> >>>
> >>> I'm currently using the setup in ~dsutherl/cudnn_files:
> >>>
> >>> * I have a bunch of versions of the installer there.
> >>> * The use-cudnn.sh script, intended to be used like "source
> >>> use-cudnn.sh 5.1", will untar the appropriate one into a scratch
> >>> directory (if it hasn't already been done) and set
> >>> CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is
> >>> needed for caffe binaries, since they don't link to the absolute path;
> >>> the first two (not sure about the third) are needed for theano.
> >>> Dunno about tensorflow yet.
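The use-cudnn.sh script itself isn't reproduced in this thread. A minimal sketch of the pattern it describes (untar the requested version into scratch once, then export the three search paths) might look like the following; the tarball layout and paths are illustrative assumptions, demonstrated against a mock cudnn tarball rather than a real NVIDIA download.

```shell
# Sketch only: fake a cudnn-style tarball so the untar-and-export logic
# below has something to act on. Real tarball names and paths will differ.
set -e
SCRATCH=$(mktemp -d)                       # stand-in for /home/scratch/$USER
VER=5.1
mkdir -p "$SCRATCH/stage/cuda/include" "$SCRATCH/stage/cuda/lib64"
touch "$SCRATCH/stage/cuda/include/cudnn.h"
tar -C "$SCRATCH/stage" -czf "$SCRATCH/cudnn-$VER.tgz" cuda

# The core of a use-cudnn.sh-style helper: untar the requested version
# into scratch if it isn't there yet, then point compiler and loader at it.
CUDNN_DIR=$SCRATCH/cudnn-$VER
if [ ! -d "$CUDNN_DIR" ]; then
    mkdir -p "$CUDNN_DIR"
    tar -C "$CUDNN_DIR" --strip-components=1 -xzf "$SCRATCH/cudnn-$VER.tgz"
fi
export CPATH=$CUDNN_DIR/include${CPATH:+:$CPATH}
export LIBRARY_PATH=$CUDNN_DIR/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
echo "using cudnn $VER from $CUDNN_DIR"
```

Because the script is sourced rather than executed, the exports persist in the calling shell, which is what the caffe binaries need at run time.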
> >>>
> >>> So, here's the Caffe setup:
> >>>
> >>> cd /home/scratch/$USER
> >>> git clone https://github.com/BVLC/caffe
> >>> cd caffe
> >>> cp Makefile.config.example Makefile.config
> >>>
> >>> # tell it to use openblas; using atlas needs some changes to the Makefile
> >>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
> >>>
> >>> # configure to use cudnn (optional)
> >>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
> >>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
> >>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
> >>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
> >>>
> >>> # build the library
> >>> make -j23
> >>>
> >>> # to do tests (takes ~10 minutes):
> >>> make -j23 test
> >>> make runtest
> >>>
> >>> # Now, to run caffe binaries you'll need to remember to source
> >>> # use-cudnn if you used cudnn before.
> >>>
> >>> # To build the python library:
> >>> make py
> >>>
> >>> # Requirements for the python library: some of the system packages
> >>> # are too old; this installs them in your scratch directory.
> >>> # You'll have to set PYTHONUSERBASE again before running any python
> >>> # processes that use these libs.
> >>> export PYTHONUSERBASE=$HOME/scratch/.local;
> >>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
> >>> pip install --user -r python/requirements.txt
> >>>
> >>> # Caffe is dumb and doesn't package its python library properly.
> >>> # The easiest way to use it is:
> >>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
> >>> python -c 'import caffe'
> >>>
> >>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland
> >> <dougal at gmail.com>
> >>> wrote:
> >>>
> >>>> Java fix seemed to work. Now tensorflow wants python-wheel and
> >>>> swig.
> >>>>
> >>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac
> >>>> <predragp at imap.srv.cs.cmu.edu> wrote:
> >>>>
> >>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
> >>>>>
> >>>>>> Having some trouble with tensorflow, because:
> >>>>>>
> >>>>>> * it requires Google's bazel build system
> >>>>>>
> >>>>>> * The bazel installer says
> >>>>>>   Java version is 1.7.0_111 while at least 1.8 is needed.
> >>>>>>
> >>>>>> * $ java -version
> >>>>>>   openjdk version "1.8.0_102"
> >>>>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
> >>>>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
> >>>>>>
> >>>>>>   $ javac -version
> >>>>>>   javac 1.7.0_111
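The mismatch above is between the JRE's java (1.8) and the JDK's javac (still 1.7); bazel's installer checks the compiler. A trivial way to express the check in shell, with the version strings hard-coded from the outputs quoted above rather than queried live:

```shell
# Compare major versions of java vs javac. The strings are taken from the
# outputs quoted above; on a live box you would capture them from
# `java -version` and `javac -version` instead.
JAVA_VER="1.8.0_102"     # from: java -version
JAVAC_VER="1.7.0_111"    # from: javac -version
if [ "${JAVA_VER%.*}" != "${JAVAC_VER%.*}" ]; then
    echo "mismatch: java $JAVA_VER vs javac $JAVAC_VER"
else
    echo "java and javac agree"
fi
```

`${VAR%.*}` strips everything after the last dot, leaving the `1.8` / `1.7` major version for comparison.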
> >>>>>
> >>>>> I just did yum -y install java-1.8.0* which installs openjdk 1.8.
> >>>>> Please change your java. Let me know if you want me to install
> >>>>> Oracle JDK 1.8.
> >>>>>
> >>>>> Predrag
> >>>>>
> >>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac
> >>>>>> <predragp at cs.cmu.edu> wrote:
> >>>>>>
> >>>>>>> Dougal Sutherland <dougal at gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Also, this seemed to work for me so far for protobuf:
> >>>>>>>>
> >>>>>>>> cd /home/scratch/$USER
> >>>>>>>> VER=3.1.0
> >>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
> >>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
> >>>>>>>> cd protobuf-cpp-$VER
> >>>>>>>> ./configure --prefix=/home/scratch/$USER
> >>>>>>>> make -j12
> >>>>>>>> make -j12 check
> >>>>>>>> make install
> >>>>>>>
> >>>>>>> That is great help!
> >>>>>>>
> >>>>>>>> You could change --prefix=/usr if making an RPM.
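One follow-up worth noting on the recipe above: with --prefix=/home/scratch/$USER, the installed protoc binary lands in bin/ under that prefix, which is not on the default PATH. The effect can be sketched with a mktemp stand-in prefix and a stub protoc (so it runs anywhere, without a real protobuf build):

```shell
# Mock the post-install layout: --prefix puts binaries under $PREFIX/bin.
# The stub protoc below is a stand-in for the real one, not protobuf itself.
set -e
PREFIX=$(mktemp -d)                 # stand-in for /home/scratch/$USER
mkdir -p "$PREFIX/bin"
printf '#!/bin/sh\necho "libprotoc 3.1.0"\n' > "$PREFIX/bin/protoc"
chmod +x "$PREFIX/bin/protoc"

# Without this export the shell won't find the freshly installed protoc.
export PATH="$PREFIX/bin:$PATH"
protoc --version
```

The same PATH export is needed in any shell (or build script) that should pick up the scratch-installed protobuf instead of a system copy.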
> >>>>>
> >>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland
> >>>>>>>> <dougal at gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Some more packages for caffe:
> >>>>>>>>>
> >>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
> >>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
> >>>>>>>>>
> >>>>>>>>> (Some of those might be installed already, but at least gflags
> >>>>>>>>> is definitely missing.)
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac <
> >>>>>>>>> predragp at imap.srv.cs.cmu.edu> wrote:
> >>>>>>>>>
> >>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
> >>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice
> >>>>>>>>>> also hangs on my computer with R2016a.
> >>>>>>>>>
> >>>>>>>>> We would have to escalate this with MathWorks. I have seen
> >>>>>>>>> workarounds on the Internet, but it looks like a bug in one of
> >>>>>>>>> the MathWorks-provided MEX files.
> >>>>>>>>>
> >>>>>>>>>> I was able to compile the matrixMul example in the CUDA
> >>>>>>>>>> samples and run it on gpu3, so I think the build environment
> >>>>>>>>>> is probably all set.
> >>>>>>>>>>
> >>>>>>>>>> As for the OpenGL issue, I think it's possibly a problem with
> >>>>>>>>>> their build script findgl.mk [1], which is not familiar with
> >>>>>>>>>> Springdale OS. The demo_suite directory has a precompiled
> >>>>>>>>>> nbody binary you may try, but I suspect most users will not
> >>>>>>>>>> need graphics.
> >>>>>>>>>
> >>>>>>>>> That should not be too hard to fix. Some header files have to
> >>>>>>>>> be manually edited. The funny part is that until 7.2 the
> >>>>>>>>> Princeton people didn't bother to remove the RHEL branding,
> >>>>>>>>> which actually made things easier for us.
> >>>>>>>>>
> >>>>>>>>> Doug is trying right now to compile the latest Caffe,
> >>>>>>>>> TensorFlow, and protobuf-3. We will try to create an RPM for
> >>>>>>>>> that so that we don't have to go through this again. I also
> >>>>>>>>> asked the Princeton and Rutgers guys if they have WIP RPMs to
> >>>>>>>>> share.
> >>>>>>>>>
> >>>>>>>>> Predrag
> >>>>>>>>>
> >>>>>>>>>> Arne
> >>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac
> >>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Predrag,
> >>>>>>>>>>>> Don't know if this applies to you, but I just built a
> >>>>>>>>>>>> machine with a GTX 1080, which has the same Pascal
> >>>>>>>>>>>> architecture as the Titan. After installing CUDA 8, I still
> >>>>>>>>>>>> found I needed to install the latest driver off of the
> >>>>>>>>>>>> NVIDIA web site to get the card recognized. Right now, I am
> >>>>>>>>>>>> running 367.44.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Arne
> >>>>>>>>>>>
> >>>>>>>>>>> Arne,
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
> >>>>>>>>>>> architecture; I see lots of people complaining about it on
> >>>>>>>>>>> the forums. I downloaded and installed the driver from
> >>>>>>>>>>>
> >>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
> >>>>>>>>>>>
> >>>>>>>>>>> That seems to have made a real difference. Check out these
> >>>>>>>>>>> beautiful outputs:
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ ls nvidia*
> >>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
> >>>>>>>>>>> nvidia-uvm-tools
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ lspci | grep -i nvidia
> >>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ ls /proc/driver
> >>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ lsmod | grep nvidia
> >>>>>>>>>>> nvidia_uvm            738901  0
> >>>>>>>>>>> nvidia_drm             43405  0
> >>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
> >>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
> >>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
> >>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
> >>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
> >>>>>>>>>>> root at gpu3$ nvidia-smi
> >>>>>>>>>>> Wed Oct 12 22:03:27 2016
> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
> >>>>>>>>>>> |-------------------------------+----------------------+----------------------+
> >>>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> >>>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> >>>>>>>>>>> |===============================+======================+======================|
> >>>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> >>>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> >>>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> >>>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> >>>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >>>>>>>>>>>
> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>>>> | Processes:                                                       GPU Memory |
> >>>>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
> >>>>>>>>>>> |=============================================================================|
> >>>>>>>>>>> |  No running processes found                                                 |
> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >>>>>>>>>>>
> >>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
> >>>>>>>>>>>
> >>>>>>>>>>> Alignment requirement for Surfaces:            Yes
> >>>>>>>>>>> Device has ECC support:                        Disabled
> >>>>>>>>>>> Device supports Unified Addressing (UVA):      Yes
> >>>>>>>>>>> Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
> >>>>>>>>>>> Compute Mode:
> >>>>>>>>>>>   < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
> >>>>>>>>>>>
> >>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
> >>>>>>>>>>> CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X
> >>>>>>>>>>> (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X
> >>>>>>>>>>> (Pascal), Device3 = TITAN X (Pascal)
> >>>>>>>>>>> Result = PASS
> >>>>>>>>>>>
> >>>>>>>>>>> Now not everything is rosy:
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> >>>>>>>>>>> root at gpu3$ make
> >>>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>>>>>>>>>>
> >>>>>>>>>>> even though those are installed. For example:
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so
> >>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> >>>>>>>>>>> Repo        : core
> >>>>>>>>>>> Matched from:
> >>>>>>>>>>> Filename    : /usr/lib/libX11.so
> >>>>>>>>>>>
> >>>>>>>>>>> also
> >>>>>>>>>>>
> >>>>>>>>>>> mesa-libGLU-devel
> >>>>>>>>>>> mesa-libGL-devel
> >>>>>>>>>>> xorg-x11-drv-nvidia-devel
> >>>>>>>>>>>
> >>>>>>>>>>> but
> >>>>>>>>>>>
> >>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> >>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> >>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> >>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
> >>>>>>>>>>>
> >>>>>>>>>>> Also, gpuDevice hangs from within MATLAB.
> >>>>>>>>>>>
> >>>>>>>>>>> So we still don't have a working installation. Any help would
> >>>>>>>>>>> be appreciated.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Predrag
> >>>>>>>>>>>
> >>>>>>>>>>> P.S. Once we have a working installation we can think of
> >>>>>>>>>>> installing Caffe and TensorFlow. For now we have to see why
> >>>>>>>>>>> things are not working.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac
> >>>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Dear Autonians,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
> >>>>>>>>>>>>> packages are installed. However, I couldn't get the NVIDIA
> >>>>>>>>>>>>> provided CUDA driver to recognize the GPU cards. They
> >>>>>>>>>>>>> appear to be properly installed from the hardware point of
> >>>>>>>>>>>>> view and you can list them with
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lshw -class display
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> root at gpu3$ lshw -class display
> >>>>>>>>>>>>> *-display UNCLAIMED
> >>>>>>>>>>>>>   description: VGA compatible controller
> >>>>>>>>>>>>>   product: NVIDIA Corporation
> >>>>>>>>>>>>>   vendor: NVIDIA Corporation
> >>>>>>>>>>>>>   physical id: 0
> >>>>>>>>>>>>>   bus info: pci at 0000:02:00.0
> >>>>>>>>>>>>>   version: a1
> >>>>>>>>>>>>>   width: 64 bits
> >>>>>>>>>>>>>   clock: 33MHz
> >>>>>>>>>>>>>   capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>>>   configuration: latency=0
> >>>>>>>>>>>>>   resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >>>>>>>>>>>>>     memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
> >>>>>>>>>>>>>     memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
> >>>>>>>>>>>>>     memory:d0000000-d007ffff
> >>>>>>>>>>>>> *-display UNCLAIMED
> >>>>>>>>>>>>>   description: VGA compatible controller
> >>>>>>>>>>>>>   product: NVIDIA Corporation
> >>>>>>>>>>>>>   vendor: NVIDIA Corporation
> >>>>>>>>>>>>>   physical id: 0
> >>>>>>>>>>>>>   bus info: pci at 0000:03:00.0
> >>>>>>>>>>>>>   version: a1
> >>>>>>>>>>>>>   width: 64 bits
> >>>>>>>>>>>>>   clock: 33MHz
> >>>>>>>>>>>>>   capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>>>   configuration: latency=0
> >>>>>>>>>>>>>   resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >>>>>>>>>>>>>     memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
> >>>>>>>>>>>>>     memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
> >>>>>>>>>>>>>     memory:ce000000-ce07ffff
> >>>>>>>>>>>>> *-display
> >>>>>>>>>>>>>   description: VGA compatible controller
> >>>>>>>>>>>>>   product: ASPEED Graphics Family
> >>>>>>>>>>>>>   vendor: ASPEED Technology, Inc.
> >>>>>>>>>>>>>   physical id: 0
> >>>>>>>>>>>>>   bus info: pci at 0000:06:00.0
> >>>>>>>>>>>>>   version: 30
> >>>>>>>>>>>>>   width: 32 bits
> >>>>>>>>>>>>>   clock: 33MHz
> >>>>>>>>>>>>>   capabilities: pm msi vga_controller bus_master cap_list rom
> >>>>>>>>>>>>>   configuration: driver=ast latency=0
> >>>>>>>>>>>>>   resources: irq:19 memory:cb000000-cbffffff
> >>>>>>>>>>>>>     memory:cc000000-cc01ffff ioport:4000(size=128)
> >>>>>>>>>>>>> *-display UNCLAIMED
> >>>>>>>>>>>>>   description: VGA compatible controller
> >>>>>>>>>>>>>   product: NVIDIA Corporation
> >>>>>>>>>>>>>   vendor: NVIDIA Corporation
> >>>>>>>>>>>>>   physical id: 0
> >>>>>>>>>>>>>   bus info: pci at 0000:82:00.0
> >>>>>>>>>>>>>   version: a1
> >>>>>>>>>>>>>   width: 64 bits
> >>>>>>>>>>>>>   clock: 33MHz
> >>>>>>>>>>>>>   capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>>>   configuration: latency=0
> >>>>>>>>>>>>>   resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >>>>>>>>>>>>>     memory:fa000000-faffffff memory:387fe0000000-387fefffffff
> >>>>>>>>>>>>>     memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
> >>>>>>>>>>>>>     memory:fb000000-fb07ffff
> >>>>>>>>>>>>> *-display UNCLAIMED
> >>>>>>>>>>>>>   description: VGA compatible controller
> >>>>>>>>>>>>>   product: NVIDIA Corporation
> >>>>>>>>>>>>>   vendor: NVIDIA Corporation
> >>>>>>>>>>>>>   physical id: 0
> >>>>>>>>>>>>>   bus info: pci at 0000:83:00.0
> >>>>>>>>>>>>>   version: a1
> >>>>>>>>>>>>>   width: 64 bits
> >>>>>>>>>>>>>   clock: 33MHz
> >>>>>>>>>>>>>   capabilities: pm msi pciexpress vga_controller cap_list
> >>>>>>>>>>>>>   configuration: latency=0
> >>>>>>>>>>>>>   resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >>>>>>>>>>>>>     memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
> >>>>>>>>>>>>>     memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
> >>>>>>>>>>>>>     memory:f9000000-f907ffff
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, what scares the hell out of me is that I don't
> >>>>>>>>>>>>> see the NVIDIA driver loaded
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lsmod | grep nvidia
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> and the device nodes /dev/nvidia are not created. I am
> >>>>>>>>>>>>> guessing I just missed some trivial step during the CUDA
> >>>>>>>>>>>>> installation, which is very involved. I am unfortunately
> >>>>>>>>>>>>> too tired to debug this tonight.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Predrag
> >>>>>
> >>>>>> Links:
> >>>>>> ------
> >>>>>> [1] http://findgl.mk
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20161013/179a6805/attachment-0001.html>


More information about the Autonlab-users mailing list