GPU3 is "configured"

Predrag Punosevac predragp at imap.srv.cs.cmu.edu
Thu Oct 13 13:55:34 EDT 2016


On 2016-10-13 13:51, Dougal Sutherland wrote:
> I actually haven't gotten tensorflow working yet -- the bazel build
> just hangs on me. I think it maybe has to do with home directories
> being on NFS, but I can't figure out bazel at all. I'll try some more
> tonight.
> 

According to one of the Princeton guys, we could just use conda Python 
for TensorFlow. Please check it out, and use your scratch directory 
instead of NFS.

Quote:

Hello, Predrag.

We have caffe 1.00rc3 if you are interested.

ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm

TensorFlow and protobuf-3 work great with conda 
(http://conda.pydata.org). I just tried and had no problems installing 
it for Python 2.7 and 3.5.


> Caffe should be workable following the instructions Predrag forwarded.
> 
> - Dougal
> 
> On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac
> <predragp at imap.srv.cs.cmu.edu> wrote:
> 
>> Dear Autonians,
>> 
>> In case anybody is interested in what happens behind the scenes, Doug
>> got Caffe and TensorFlow to work on
>> GPU3. Please see the message below. I also got very useful feedback
>> from the Princeton and Rutgers people. Please check it out if you care
>> (you will have to log into Gmail to see the exchange).
>> 
>> https://groups.google.com/forum/#!forum/springdale-users
>> 
>> I need to think about how we move forward with this before I start
>> pulling triggers. If somebody is itchy and can't wait, please build
>> Caffe and TensorFlow in your scratch directory following the howto
>> below.
>> 
>> Predrag
>> 
>> On 2016-10-13 13:24, Dougal Sutherland wrote:
>>> A note about cudnn:
>>> 
>>> There are a bunch of versions of cudnn. They're not
>>> backwards-compatible, and different versions of
>>> caffe/tensorflow/whatever want different ones.
>>> 
>>> I'm currently using the setup in ~dsutherl/cudnn_files:
>>> 
>>> * I have a bunch of versions of the installer there.
>>> * The use-cudnn.sh script, intended to be used like "source
>>> use-cudnn.sh 5.1", will untar the appropriate one into a scratch
>>> directory (if it hasn't already been done) and set
>>> CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is
>>> needed for caffe binaries, since they don't link to the absolute
>>> path; the first two (not sure about the third) are needed for
>>> theano. Dunno about tensorflow yet.
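[Editor's note: for readers without access to that script, here is a rough sketch of what a use-cudnn.sh-style helper does, written as a shell function. The CUDNN_TARBALL_DIR and SCRATCH variables are illustrative stand-ins, not part of the actual setup, and the real script in ~dsutherl may differ.]

```shell
# Sketch (hypothetical) of a use-cudnn.sh style helper: untar the requested
# cudnn version into scratch once, then export compiler/linker/loader paths.
use_cudnn() {
    ver="$1"
    tarball="${CUDNN_TARBALL_DIR:-$HOME/cudnn_files}/cudnn-$ver.tar.gz"
    dest="${SCRATCH:-/home/scratch/$USER}/cudnn-$ver"

    # Untar once into scratch; later invocations just re-export the paths.
    if [ ! -d "$dest" ]; then
        mkdir -p "$dest"
        tar xzf "$tarball" -C "$dest" --strip-components=1
    fi

    # Headers for the compiler, libs for the linker and the runtime loader.
    export CUDNN_DIR="$dest"
    export CPATH="$dest/include${CPATH:+:$CPATH}"
    export LIBRARY_PATH="$dest/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
    export LD_LIBRARY_PATH="$dest/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Sourcing a file containing such a function once per shell mirrors the "source use-cudnn.sh 5.1" workflow described above.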
>>> 
>>> So, here's the Caffe setup:
>>> 
>>> cd /home/scratch/$USER
>>> git clone https://github.com/BVLC/caffe
>>> cd caffe
>>> cp Makefile.config.example Makefile.config
>>> 
>>> # tell it to use openblas; using atlas needs some changes to the
>>> # Makefile
>>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
>>> 
>>> # configure to use cudnn (optional)
>>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
>>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
>>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
>>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
>>> 
>>> # build the library
>>> make -j23
>>> 
>>> # to do tests (takes ~10 minutes):
>>> make -j23 test
>>> make runtest
>>> 
>>> # Now, to run caffe binaries you'll need to remember to source
>>> # use-cudnn if you used cudnn before.
>>> 
>>> # To build the python library:
>>> make py
>>> 
>>> # Requirements for the python library:
>>> # Some of the system packages are too old; this installs them in
>>> # your scratch directory.
>>> # You'll have to set PYTHONUSERBASE again before running any python
>>> # processes that use these libs.
>>> export PYTHONUSERBASE=$HOME/scratch/.local
>>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
>>> pip install --user -r python/requirements.txt
>>> 
>>> # Caffe is dumb and doesn't package its python library properly. The
>>> # easiest way to use it is:
>>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
>>> python -c 'import caffe'
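[Editor's note: a quick way to convince yourself the PYTHONUSERBASE trick above works: Python's site.USER_BASE, which is where pip's --user installs land, follows the variable. A small check using a throwaway directory; python3 is used here for illustration in place of whatever interpreter you actually run.]

```shell
# PYTHONUSERBASE redirects where `pip install --user` puts things;
# python's site.USER_BASE reflects the variable directly.
export PYTHONUSERBASE="$(mktemp -d)"   # throwaway stand-in for scratch
python3 -c 'import site; print(site.USER_BASE)'
```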
>>> 
>>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland
>>> <dougal at gmail.com> wrote:
>>> 
>>>> Java fix seemed to work. Now tensorflow wants python-wheel and
>>>> swig.
>>>> 
>>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac
>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>> 
>>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
>>>>> 
>>>>>> Having some trouble with tensorflow, because:
>>>>>> 
>>>>>> * it requires Google's bazel build system
>>>>>> * The bazel installer says
>>>>>>   "Java version is 1.7.0_111 while at least 1.8 is needed."
>>>>>> 
>>>>>> $ java -version
>>>>>> openjdk version "1.8.0_102"
>>>>>> OpenJDK Runtime Environment (build 1.8.0_102-b14)
>>>>>> OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
>>>>>> $ javac -version
>>>>>> javac 1.7.0_111
>>>>> 
>>>>> I just did yum -y install java-1.8.0*, which installs OpenJDK 1.8.
>>>>> Please change your java. Let me know if you want me to install
>>>>> Oracle JDK 1.8.
>>>>> 
>>>>> Predrag
>>>>> 
>>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac
>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>> 
>>>>>>> Dougal Sutherland <dougal at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Also, this seemed to work for me so far for protobuf:
>>>>>>>> 
>>>>>>>> cd /home/scratch/$USER
>>>>>>>> VER=3.1.0
>>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
>>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
>>>>>>>> cd protobuf-cpp-$VER
>>>>>>>> ./configure --prefix=/home/scratch/$USER
>>>>>>>> make -j12
>>>>>>>> make -j12 check
>>>>>>>> make install
>>>>>>> 
>>>>>>> That is a great help!
>>>>>>> 
>>>>>>>> You could change --prefix=/usr if making an RPM.
>>>>>>>> 
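[Editor's note: because the protobuf build above installs into a non-standard --prefix, later shells and builds have to be pointed at it. A minimal sketch of the environment setup involved; the use_prefix name and the exact set of variables are illustrative, not from the original thread.]

```shell
# Make a --prefix=/some/dir install visible to the shell, compilers,
# the dynamic loader, and pkg-config. Illustrative helper, not the
# thread's actual method.
use_prefix() {
    p="$1"
    export PATH="$p/bin${PATH:+:$PATH}"
    export CPATH="$p/include${CPATH:+:$CPATH}"
    export LIBRARY_PATH="$p/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
    export LD_LIBRARY_PATH="$p/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PKG_CONFIG_PATH="$p/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
}
```

For example, after `make install` with --prefix=/home/scratch/$USER, running `use_prefix /home/scratch/$USER` should let the shell find the freshly installed protoc.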
>>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland
>>>>>>>> <dougal at gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Some more packages for caffe:
>>>>>>>>> 
>>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
>>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
>>>>>>>>> 
>>>>>>>>> (Some of those might be installed already, but at least gflags
>>>>>>>>> is definitely missing.)
>>>>>>>>> 
>>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac
>>>>>>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>>> 
>>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
>>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice
>>>>>>>>>> also hangs on my computer with R2016a.
>>>>>>>>> 
>>>>>>>>> We would have to escalate this with MathWorks. I have seen
>>>>>>>>> workarounds on the Internet, but it looks like a bug in one of
>>>>>>>>> the MathWorks-provided MEX files.
>>>>>>>>> 
>>>>>>>>>> I was able to compile the matrixMul example in the CUDA
>>>>>>>>>> samples and run it on gpu3, so I think the build environment
>>>>>>>>>> is probably all set.
>>>>>>>>>> 
>>>>>>>>>> As for the openGL, I think it's possibly a problem with their
>>>>>>>>>> build script findgl.mk [1], which is not familiar with
>>>>>>>>>> Springdale OS. The demo_suite directory has a precompiled
>>>>>>>>>> nbody binary you may try, but I suspect most users will not
>>>>>>>>>> need graphics.
>>>>>>>>> 
>>>>>>>>> That should not be too hard to fix. Some header files have to
>>>>>>>>> be manually edited. The funny part is that until 7.2 the
>>>>>>>>> Princeton people didn't bother to remove the RHEL branding,
>>>>>>>>> which actually made things easier for us.
>>>>>>>>> 
>>>>>>>>> Doug is trying right now to compile the latest Caffe,
>>>>>>>>> TensorFlow, and protobuf-3. We will try to create an RPM for
>>>>>>>>> that so that we don't have to go through this again. I also
>>>>>>>>> asked the Princeton and Rutgers guys if they have WIP RPMs to
>>>>>>>>> share.
>>>>>>>>> 
>>>>>>>>> Predrag
>>>>>>>>> 
>>>>>>>>>> Arne
>>>>>>>>>> 
>>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac
>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Predrag,
>>>>>>>>>>>> Don't know if this applies to you, but I just built a
>>>>>>>>>>>> machine with a GTX1080, which has the same Pascal
>>>>>>>>>>>> architecture as the Titan.  After installing CUDA 8, I
>>>>>>>>>>>> still found I needed to install the latest driver off of
>>>>>>>>>>>> the NVIDIA web site to get the card recognized.  Right
>>>>>>>>>>>> now, I am running 367.44.
>>>>>>>>>>>> 
>>>>>>>>>>>> Arne
>>>>>>>>>>> 
>>>>>>>>>>> Arne,
>>>>>>>>>>> 
>>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn
>>>>>>>>>>> Pascal architecture; I see lots of people complaining about
>>>>>>>>>>> it on the forums. I downloaded and installed the driver from
>>>>>>>>>>> 
>>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
>>>>>>>>>>> 
>>>>>>>>>>> That seems to have made a real difference. Check out these
>>>>>>>>>>> beautiful outputs:
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ ls nvidia*
>>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
>>>>>>>>>>> nvidia-uvm-tools
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ lspci | grep -i nvidia
>>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ ls /proc/driver
>>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ lsmod | grep nvidia
>>>>>>>>>>> nvidia_uvm            738901  0
>>>>>>>>>>> nvidia_drm             43405  0
>>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
>>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
>>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
>>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
>>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ nvidia-smi
>>>>>>>>>>> Wed Oct 12 22:03:27 2016
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
>>>>>>>>>>> |-------------------------------+----------------------+----------------------+
>>>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>>>>>>>>>> |===============================+======================+======================|
>>>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
>>>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
>>>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>>> 
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> | Processes:                                                       GPU Memory |
>>>>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>>>>>>>>> |=============================================================================|
>>>>>>>>>>> |  No running processes found                                                 |
>>>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>>> 
>>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
>>>>>>>>>>> 
>>>>>>>>>>> Alignment requirement for Surfaces:            Yes
>>>>>>>>>>> Device has ECC support:                        Disabled
>>>>>>>>>>> Device supports Unified Addressing (UVA):      Yes
>>>>>>>>>>> Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>>>>>>>>>>> Compute Mode:
>>>>>>>>>>> < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>>>>>>>>>>> 
>>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
>>>>>>>>>>> CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal),
>>>>>>>>>>> Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal),
>>>>>>>>>>> Device3 = TITAN X (Pascal)
>>>>>>>>>>> Result = PASS
>>>>>>>>>>> 
>>>>>>>>>>> Now not everything is rosy.
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
>>>>>>>>>>> root at gpu3$ make
>>>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>>>>>>>>> 
>>>>>>>>>>> even though those are installed. For example:
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so
>>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
>>>>>>>>>>> Repo        : core
>>>>>>>>>>> Matched from:
>>>>>>>>>>> Filename    : /usr/lib/libX11.so
>>>>>>>>>>> 
>>>>>>>>>>> also
>>>>>>>>>>> 
>>>>>>>>>>> mesa-libGLU-devel
>>>>>>>>>>> mesa-libGL-devel
>>>>>>>>>>> xorg-x11-drv-nvidia-devel
>>>>>>>>>>> 
>>>>>>>>>>> but
>>>>>>>>>>> 
>>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
>>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
>>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
>>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
>>>>>>>>>>> 
>>>>>>>>>>> Also, gpuDevice hangs from MATLAB.
>>>>>>>>>>> 
>>>>>>>>>>> So we still don't have a working installation. Any help
>>>>>>>>>>> would be appreciated.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Predrag
>>>>>>>>>>> 
>>>>>>>>>>> P.S. Once we have a working installation we can think of
>>>>>>>>>>> installing Caffe and TensorFlow. For now we have to see why
>>>>>>>>>>> things are not working.
>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac
>>>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dear Autonians,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
>>>>>>>>>>>>> packages are installed. However, I couldn't get the
>>>>>>>>>>>>> NVIDIA-provided CUDA driver to recognize the GPU cards.
>>>>>>>>>>>>> They appear to be properly installed from the hardware
>>>>>>>>>>>>> point of view, and you can list them with
>>>>>>>>>>>>> 
>>>>>>>>>>>>> lshw -class display
>>>>>>>>>>>>> 
>>>>>>>>>>>>> root at gpu3$ lshw -class display
>>>>>>>>>>>>> *-display UNCLAIMED
>>>>>>>>>>>>> description: VGA compatible controller
>>>>>>>>>>>>> product: NVIDIA Corporation
>>>>>>>>>>>>> vendor: NVIDIA Corporation
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci at 0000:02:00.0
>>>>>>>>>>>>> version: a1
>>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>> configuration: latency=0
>>>>>>>>>>>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>>>> memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>>>>>>>>>>>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>>>>>>>>>>>>> memory:d0000000-d007ffff
>>>>>>>>>>>>> *-display UNCLAIMED
>>>>>>>>>>>>> description: VGA compatible controller
>>>>>>>>>>>>> product: NVIDIA Corporation
>>>>>>>>>>>>> vendor: NVIDIA Corporation
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci at 0000:03:00.0
>>>>>>>>>>>>> version: a1
>>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>> configuration: latency=0
>>>>>>>>>>>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>>>> memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>>>>>>>>>>>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>>>>>>>>>>>>> memory:ce000000-ce07ffff
>>>>>>>>>>>>> *-display
>>>>>>>>>>>>> description: VGA compatible controller
>>>>>>>>>>>>> product: ASPEED Graphics Family
>>>>>>>>>>>>> vendor: ASPEED Technology, Inc.
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci at 0000:06:00.0
>>>>>>>>>>>>> version: 30
>>>>>>>>>>>>> width: 32 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi vga_controller bus_master cap_list rom
>>>>>>>>>>>>> configuration: driver=ast latency=0
>>>>>>>>>>>>> resources: irq:19 memory:cb000000-cbffffff
>>>>>>>>>>>>> memory:cc000000-cc01ffff ioport:4000(size=128)
>>>>>>>>>>>>> *-display UNCLAIMED
>>>>>>>>>>>>> description: VGA compatible controller
>>>>>>>>>>>>> product: NVIDIA Corporation
>>>>>>>>>>>>> vendor: NVIDIA Corporation
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci at 0000:82:00.0
>>>>>>>>>>>>> version: a1
>>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>> configuration: latency=0
>>>>>>>>>>>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>>>> memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>>>>>>>>>>>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>>>>>>>>>>>>> memory:fb000000-fb07ffff
>>>>>>>>>>>>> *-display UNCLAIMED
>>>>>>>>>>>>> description: VGA compatible controller
>>>>>>>>>>>>> product: NVIDIA Corporation
>>>>>>>>>>>>> vendor: NVIDIA Corporation
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci at 0000:83:00.0
>>>>>>>>>>>>> version: a1
>>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>>> configuration: latency=0
>>>>>>>>>>>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>>>> memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>>>>>>>>>>>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>>>>>>>>>>>>> memory:f9000000-f907ffff
>>>>>>>>>>>>> 
>>>>>>>>>>>>> However, what scares the hell out of me is that I don't
>>>>>>>>>>>>> see the NVIDIA driver loaded
>>>>>>>>>>>>> 
>>>>>>>>>>>>> lsmod | grep nvidia
>>>>>>>>>>>>> 
>>>>>>>>>>>>> and the device nodes /dev/nvidia are not created. I am
>>>>>>>>>>>>> guessing I just missed some trivial step during the CUDA
>>>>>>>>>>>>> installation, which is very involved. I am unfortunately
>>>>>>>>>>>>> too tired to debug this tonight.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Predrag
>>>>> 
>>>>>> 
>>>>>> Links:
>>>>>> ------
>>>>>> [1] http://findgl.mk


More information about the Autonlab-users mailing list