GPU3 is "configured"
Predrag Punosevac
predragp at imap.srv.cs.cmu.edu
Thu Oct 13 13:55:34 EDT 2016
On 2016-10-13 13:51, Dougal Sutherland wrote:
> I actually haven't gotten TensorFlow working yet -- the bazel build
> just hangs on me. I think it may have to do with home directories
> being on NFS, but I can't figure out bazel at all. I'll try some more
> tonight.
>
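(Side note: if the hang is NFS-related, bazel keeps its build tree under
~/.cache/bazel by default; it has a startup flag to relocate that, which
might be worth a try -- untested here:)

bazel --output_user_root=/home/scratch/$USER/bazel build //tensorflow/tools/pip_package:build_pip_package
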
According to one of the Princeton guys, we could just use Python conda for
TensorFlow. Please check it out, and use your scratch directory instead of
NFS.
Quote:
Hello, Predrag.
We have caffe 1.00rc3 if you are interested.
ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
TensorFlow and protobuf-3 work great with conda
(http://conda.pydata.org). I just tried and had no problems installing
it for Python 2.7 and 3.5.
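
For anyone who wants to try that route, a minimal sketch (untested on
GPU3; the Miniconda installer URL and exact package names are
assumptions):

cd /home/scratch/$USER
# install Miniconda under scratch so nothing lands on NFS
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh -b -p /home/scratch/$USER/miniconda
export PATH=/home/scratch/$USER/miniconda/bin:$PATH

# keep it in its own environment
conda create -y -n tf python=2.7
source activate tf
pip install tensorflow   # CPU build; a GPU build may need a different package
python -c 'import tensorflow'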
> Caffe should be workable following the instructions Predrag forwarded.
>
> - Dougal
>
> On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac
> <predragp at imap.srv.cs.cmu.edu> wrote:
>
>> Dear Autonians,
>>
>> In case anybody is interested in what happens behind the scenes, Doug
>> got Caffe and TensorFlow to work on GPU3. Please see the message
>> below. I also got very useful feedback from the Princeton and Rutgers
>> people. Please check it out if you care (you will have to log into
>> Gmail to see the exchange).
>>
>> https://groups.google.com/forum/#!forum/springdale-users
>>
>> I need to think about how we move forward with this before we start
>> pulling triggers. If somebody is itchy and can't wait, please build
>> Caffe and TensorFlow in your scratch directory following the howto
>> below.
>>
>> Predrag
>>
>> On 2016-10-13 13:24, Dougal Sutherland wrote:
>>> A note about cudnn:
>>>
>>> There are a bunch of versions of cudnn. They're not
>>> backwards-compatible, and different versions of
>>> caffe/tensorflow/whatever want different ones.
>>>
>>> I'm currently using the setup in ~dsutherl/cudnn_files:
>>>
>>> * I have a bunch of versions of the installer there.
>>> * The use-cudnn.sh script, intended to be used like "source
>>> use-cudnn.sh 5.1", will untar the appropriate one into a scratch
>>> directory (if it hasn't already been done) and set
>>> CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is
>>> needed for caffe binaries, since they don't link to the absolute path;
>>> the first two (not sure about the third) are needed for theano.
>>> Dunno about tensorflow yet.
>>>
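>>> (Roughly what use-cudnn.sh does -- a sketch from memory, assuming the
>>> tarballs are named cudnn-<version>.tgz; read the actual script before
>>> relying on this:)
>>>
>>> # sketch of use-cudnn.sh; meant to be source'd: "source use-cudnn.sh 5.1"
>>> VER=$1
>>> CUDNN_DIR=/home/scratch/$USER/cudnn-$VER
>>> if [ ! -d "$CUDNN_DIR" ]; then
>>>     # one-time: unpack the matching installer tarball into scratch
>>>     mkdir -p "$CUDNN_DIR"
>>>     tar xzf ~dsutherl/cudnn_files/cudnn-$VER.tgz -C "$CUDNN_DIR" --strip-components=1
>>> fi
>>> export CUDNN_DIR
>>> export CPATH=$CUDNN_DIR/include:$CPATH              # headers at compile time
>>> export LIBRARY_PATH=$CUDNN_DIR/lib64:$LIBRARY_PATH  # libs at link time
>>> export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH  # libs at run time
>>>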
>>> So, here's the Caffe setup:
>>>
>>> cd /home/scratch/$USER
>>> git clone https://github.com/BVLC/caffe
>>> cd caffe
>>> cp Makefile.config.example Makefile.config
>>>
>>> # tell it to use openblas; using atlas needs some changes to the Makefile
>>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
>>>
>>> # configure to use cudnn (optional)
>>> source ~dsutherl/cudnn_files/use-cudnn.sh 5.1
>>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
>>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
>>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
>>>
>>> # build the library
>>> make -j23
>>>
>>> # to do tests (takes ~10 minutes):
>>> make -j23 test
>>> make runtest
>>>
>>> # Now, to run caffe binaries, remember to source use-cudnn again
>>> # if you built with cudnn.
>>>
>>> # To build the python library:
>>> make py
>>>
>>> # Requirements for the python library:
>>> # Some of the system packages are too old; this installs them in your
>>> # scratch directory.
>>> # You'll have to set PYTHONUSERBASE again before running any python
>>> # processes that use these libs.
>>> export PYTHONUSERBASE=$HOME/scratch/.local
>>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
>>> pip install --user -r python/requirements.txt
>>>
>>> # Caffe is dumb and doesn't package its python library properly. The
>>> # easiest way to use it is:
>>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
>>> python -c 'import caffe'
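>>>
>>> (None of this persists across shells, so it may be worth collecting
>>> the exports in a file you can source -- a sketch using the paths
>>> above:)
>>>
>>> # ~/caffe-env.sh -- run "source ~/caffe-env.sh" in each new shell
>>> source ~dsutherl/cudnn_files/use-cudnn.sh 5.1
>>> export PYTHONUSERBASE=$HOME/scratch/.local
>>> export PATH=$PYTHONUSERBASE/bin:$PATH
>>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH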
>>>
>>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland
>>> <dougal at gmail.com> wrote:
>>>
>>>> Java fix seemed to work. Now tensorflow wants python-wheel and
>>>> swig.
>>>>
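>>>> (Something like "yum -y install python-wheel swig" presumably covers
>>>> those, assuming standard EL7 package names.)
>>>>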
>>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac
>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>>
>>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
>>>>>
>>>>>> Having some trouble with tensorflow, because:
>>>>>>
>>>>>> * it requires Google's bazel build system
>>>>>>
>>>>>> * the bazel installer says
>>>>>>   "Java version is 1.7.0_111 while at least 1.8 is needed."
>>>>>>
>>>>>> * $ java -version
>>>>>>   openjdk version "1.8.0_102"
>>>>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
>>>>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
>>>>>>
>>>>>>   $ javac -version
>>>>>>   javac 1.7.0_111
>>>>>
>>>>> I just did yum -y install java-1.8.0* which installs openjdk 1.8.
>>>>> Please change your java. Let me know if you want me to install
>>>>> Oracle JDK 1.8.
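>>>>>
>>>>> (If javac still reports 1.7 after that, the usual RHEL-style fix --
>>>>> untested here -- is to switch the alternatives entries:)
>>>>>
>>>>> alternatives --config java    # pick the 1.8.0 entry
>>>>> alternatives --config javac   # likewise for the compiler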
>>>>>
>>>>> Predrag
>>>>>
>>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac
>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>
>>>>>>> Dougal Sutherland <dougal at gmail.com> wrote:
>>>>>>>
>>>>>>>> Also, this seemed to work for me so far for protobuf:
>>>>>>>>
>>>>>>>> cd /home/scratch/$USER
>>>>>>>> VER=3.1.0
>>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
>>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
>>>>>>>> cd protobuf-$VER  # the cpp tarball unpacks to protobuf-$VER
>>>>>>>> ./configure --prefix=/home/scratch/$USER
>>>>>>>> make -j12
>>>>>>>> make -j12 check
>>>>>>>> make install
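>>>>>>>>
>>>>>>>> (To actually pick this protobuf up later over the system one,
>>>>>>>> you'll presumably also want the scratch prefix on your paths --
>>>>>>>> a sketch:)
>>>>>>>>
>>>>>>>> export PATH=/home/scratch/$USER/bin:$PATH
>>>>>>>> export LD_LIBRARY_PATH=/home/scratch/$USER/lib:$LD_LIBRARY_PATH
>>>>>>>> export PKG_CONFIG_PATH=/home/scratch/$USER/lib/pkgconfig:$PKG_CONFIG_PATH
>>>>>>>> protoc --version   # should report libprotoc 3.1.0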
>>>>>>>
>>>>>>> That is a great help!
>>>>>>>
>>>>>>>> You could change --prefix=/usr if making an RPM.
>>>>>>>>
>>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland
>>>>>>>> <dougal at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Some more packages for caffe:
>>>>>>>>>
>>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
>>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
>>>>>>>>>
>>>>>>>>> (Some of those might be installed already, but at least gflags
>>>>>>>>> is definitely missing.)
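>>>>>>>>>
>>>>>>>>> (i.e., presumably one shot of: yum -y install leveldb-devel
>>>>>>>>> snappy-devel opencv-devel boost-devel hdf5-devel gflags-devel
>>>>>>>>> glog-devel lmdb-devel)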
>>>>>>>>>
>>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac
>>>>>>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>>>>>>>
>>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
>>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice also
>>>>>>>>>> hangs on my computer with R2016a.
>>>>>>>>>
>>>>>>>>> We would have to escalate this with MathWorks. I have seen
>>>>>>>>> workarounds on the Internet, but it looks like a bug in one of
>>>>>>>>> the MathWorks-provided MEX files.
>>>>>>>>>
>>>>>>>>>> I was able to compile the matrixMul example in the CUDA samples
>>>>>>>>>> and run it on gpu3, so I think the build environment is
>>>>>>>>>> probably all set.
>>>>>>>>>>
>>>>>>>>>> As for the OpenGL issue, I think it's possibly a problem with
>>>>>>>>>> their build script findgl.mk, which is not familiar with
>>>>>>>>>> Springdale OS. The demo_suite directory has a precompiled nbody
>>>>>>>>>> binary you may try, but I suspect most users will not need
>>>>>>>>>> graphics.
>>>>>>>>>
>>>>>>>>> That should not be too hard to fix. Some header files have to be
>>>>>>>>> manually edited. The funny part is that until 7.2 the Princeton
>>>>>>>>> people didn't bother to remove the RHEL branding, which actually
>>>>>>>>> made things easier for us.
>>>>>>>>>
>>>>>>>>> Doug is trying right now to compile the latest Caffe,
>>>>>>>>> TensorFlow, and protobuf-3. We will try to create an RPM for
>>>>>>>>> that so that we don't have to go through this again. I also
>>>>>>>>> asked the Princeton and Rutgers guys if they have WIP RPMs to
>>>>>>>>> share.
>>>>>>>>>
>>>>>>>>> Predrag
>>>>>>>>>
>>>>>>>>>> Arne
>>>>>>>>>>
>>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac
>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Predrag,
>>>>>>>>>>>> Don't know if this applies to you, but I just built a machine
>>>>>>>>>>>> with a GTX 1080, which has the same Pascal architecture as the
>>>>>>>>>>>> Titan. After installing CUDA 8, I still found I needed to
>>>>>>>>>>>> install the latest driver off of the NVIDIA web site to get
>>>>>>>>>>>> the card recognized. Right now, I am running 367.44.
>>>>>>>>>>>>
>>>>>>>>>>>> Arne
>>>>>>>>>>>>
>>>>>>>>>>> Arne,
>>>>>>>>>>>
>>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
>>>>>>>>>>> architecture; I see lots of people complaining about it on the
>>>>>>>>>>> forums. I downloaded and installed the driver from
>>>>>>>>>>>
>>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
>>>>>>>>>>>
>>>>>>>>>>> That seems to have made a real difference. Check out these
>>>>>>>>>>> beautiful outputs:
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ ls nvidia*
>>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
>>>>>>>>>>> nvidia-uvm-tools
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ lspci | grep -i nvidia
>>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ ls /proc/driver
>>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ lsmod | grep nvidia
>>>>>>>>>>> nvidia_uvm            738901  0
>>>>>>>>>>> nvidia_drm             43405  0
>>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
>>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
>>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
>>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
>>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ nvidia-smi
>>>>>>>>>>> Wed Oct 12 22:03:27 2016
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
>>>>>>>>>>> |-------------------------------+----------------------+----------------------+
>>>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>>>>>>>>>> |===============================+======================+======================|
>>>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
>>>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>>
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> | Processes:                                                       GPU Memory |
>>>>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>>>>>>>>> |=============================================================================|
>>>>>>>>>>> |  No running processes found                                                 |
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
>>>>>>>>>>>
>>>>>>>>>>>   Alignment requirement for Surfaces:            Yes
>>>>>>>>>>>   Device has ECC support:                        Disabled
>>>>>>>>>>>   Device supports Unified Addressing (UVA):      Yes
>>>>>>>>>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>>>>>>>>>>>   Compute Mode:
>>>>>>>>>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>>>>>>>>>>>
>>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
>>>>>>>>>>> CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X
>>>>>>>>>>> (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal),
>>>>>>>>>>> Device3 = TITAN X (Pascal)
>>>>>>>>>>> Result = PASS
>>>>>>>>>>>
>>>>>>>>>>> Now, not everything is rosy:
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
>>>>>>>>>>> root at gpu3$ make
>>>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>>
>>>>>>>>>>> even though those are installed. For example:
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so
>>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
>>>>>>>>>>> Repo        : core
>>>>>>>>>>> Matched from:
>>>>>>>>>>> Filename    : /usr/lib/libX11.so
>>>>>>>>>>>
>>>>>>>>>>> also
>>>>>>>>>>>
>>>>>>>>>>> mesa-libGLU-devel
>>>>>>>>>>> mesa-libGL-devel
>>>>>>>>>>> xorg-x11-drv-nvidia-devel
>>>>>>>>>>>
>>>>>>>>>>> but
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
>>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
>>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
>>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
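>>>>>>>>>>>
>>>>>>>>>>> (One untested idea: the yum match above is the i686 package in
>>>>>>>>>>> /usr/lib, and the samples' find scripts only probe a fixed set
>>>>>>>>>>> of paths but honor a GLPATH override, so pointing it at the
>>>>>>>>>>> 64-bit libs might get past the check:)
>>>>>>>>>>>
>>>>>>>>>>> cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
>>>>>>>>>>> make GLPATH=/usr/lib64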
>>>>>>>>>>>
>>>>>>>>>>> Also, from MATLAB, gpuDevice hangs.
>>>>>>>>>>>
>>>>>>>>>>> So we still don't have a working installation. Any help would
>>>>>>>>>>> be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Predrag
>>>>>>>>>>>
>>>>>>>>>>> P.S. Once we have a working installation, we can think about
>>>>>>>>>>> installing Caffe and TensorFlow. For now we have to see why
>>>>>>>>>>> things are not working.
>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac
>>>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Autonians,
>>>>>>>>>>>>>
>>>>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
>>>>>>>>>>>>> packages are installed. However, I couldn't get the
>>>>>>>>>>>>> NVIDIA-provided CUDA driver to recognize the GPU cards. They
>>>>>>>>>>>>> appear to be properly installed from the hardware point of
>>>>>>>>>>>>> view, and you can list them with
>>>>>>>>>>>>>
>>>>>>>>>>>>> lshw -class display
>>>>>>>>>>>>>
>>>>>>>>>>>>> root at gpu3$ lshw -class display
>>>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci at 0000:02:00.0
>>>>>>>>>>>>>        version: a1
>>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>>>>          memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>>>>>>>>>>>>>          memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>>>>>>>>>>>>>          memory:d0000000-d007ffff
>>>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci at 0000:03:00.0
>>>>>>>>>>>>>        version: a1
>>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>>>>          memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>>>>>>>>>>>>>          memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>>>>>>>>>>>>>          memory:ce000000-ce07ffff
>>>>>>>>>>>>>   *-display
>>>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>>>        product: ASPEED Graphics Family
>>>>>>>>>>>>>        vendor: ASPEED Technology, Inc.
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci at 0000:06:00.0
>>>>>>>>>>>>>        version: 30
>>>>>>>>>>>>>        width: 32 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi vga_controller bus_master cap_list rom
>>>>>>>>>>>>>        configuration: driver=ast latency=0
>>>>>>>>>>>>>        resources: irq:19 memory:cb000000-cbffffff
>>>>>>>>>>>>>          memory:cc000000-cc01ffff ioport:4000(size=128)
>>>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci at 0000:82:00.0
>>>>>>>>>>>>>        version: a1
>>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>>>>          memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>>>>>>>>>>>>>          memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>>>>>>>>>>>>>          memory:fb000000-fb07ffff
>>>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci at 0000:83:00.0
>>>>>>>>>>>>>        version: a1
>>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>>>>          memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>>>>>>>>>>>>>          memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>>>>>>>>>>>>>          memory:f9000000-f907ffff
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, what scares the hell out of me is that I don't see
>>>>>>>>>>>>> the NVIDIA driver loaded
>>>>>>>>>>>>>
>>>>>>>>>>>>> lsmod | grep nvidia
>>>>>>>>>>>>>
>>>>>>>>>>>>> and the device nodes /dev/nvidia* are not created. I am
>>>>>>>>>>>>> guessing I just missed some trivial step during the CUDA
>>>>>>>>>>>>> installation, which is quite involved. I am unfortunately too
>>>>>>>>>>>>> tired to debug this tonight.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Predrag