GPU3 is "configured"

Kirthevasan Kandasamy kandasamy at cmu.edu
Mon Oct 17 18:23:47 EDT 2016


Hi,

Just following up. Has anyone managed to resolve this yet?
I still can't run tensorflow on gpu3.

samy

On Thu, Oct 13, 2016 at 1:58 PM, Dougal Sutherland <dougal at gmail.com> wrote:

> According to the tensorflow site, the conda package doesn't support GPUs.
>
> On Thu, Oct 13, 2016, 6:55 PM Predrag Punosevac <predragp at imap.srv.cs.cmu.edu> wrote:
>
>> On 2016-10-13 13:51, Dougal Sutherland wrote:
>> > I actually haven't gotten tensorflow working yet -- the bazel build
>> > just hangs on me. I think it may have to do with home directories
>> > being on NFS, but I can't figure out bazel at all. I'll try some more
>> > tonight.
>> >
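
A hedged sketch of one way to keep bazel's working tree off NFS by pointing its
output root at local scratch; the flag and the TensorFlow build target below are
the standard ones for that era but are untested here, so treat them as
assumptions rather than a fix confirmed in this thread:

    # sketch: keep bazel's output tree in local scratch instead of the NFS home dir
    mkdir -p /home/scratch/$USER/bazel-root
    cd /home/scratch/$USER/tensorflow       # a checkout in scratch, per Predrag's suggestion below
    ./configure                             # TensorFlow's interactive configure script
    bazel --output_user_root=/home/scratch/$USER/bazel-root \
        build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
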
>>
>> According to one of the Princeton guys, we could just use Python conda for
>> TensorFlow. Please check it out, and use your scratch directory instead of
>> NFS.
>>
>> Quote:
>>
>> Hello, Predrag.
>>
>> We have caffe 1.00rc3 if you are interested.
>>
>> ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
>>
>> TensorFlow and protobuf-3 work great with conda
>> (http://conda.pydata.org). I just tried and had no problems installing
>> it for Python 2.7 and 3.5.
>>
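
A hedged sketch of that conda route in a scratch directory (the installer URL,
channel, and package availability are assumptions about what was meant here;
per Dougal's note above, the conda builds at this time were CPU-only):

    cd /home/scratch/$USER
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/scratch/$USER/miniconda3
    export PATH=/home/scratch/$USER/miniconda3/bin:$PATH
    conda create -y -n tf python=3.5
    source activate tf
    conda install -y -c conda-forge tensorflow protobuf    # CPU-only build
    python -c 'import tensorflow as tf; print(tf.__version__)'
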
>>
>> > Caffe should be workable following the instructions Predrag forwarded.
>> >
>> > - Dougal
>> >
>> > On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac <predragp at imap.srv.cs.cmu.edu> wrote:
>> >
>> >> Dear Autonians,
>> >>
>> >> In case anybody is interested in what happens behind the scenes, Doug
>> >> got Caffe and TensorFlow to work on GPU3. Please see the message below.
>> >> I also got very useful feedback from the Princeton and Rutgers people.
>> >> Please check it out if you care (you will have to log into Gmail to see
>> >> the exchange).
>> >>
>> >> https://groups.google.com/forum/#!forum/springdale-users
>> >>
>> >> I need to think about how we move forward with this before we start
>> >> pulling triggers. If somebody is itchy and can't wait, please build
>> >> Caffe and TensorFlow in your scratch directory following the howto
>> >> below.
>> >>
>> >> Predrag
>> >>
>> >> On 2016-10-13 13:24, Dougal Sutherland wrote:
>> >>> A note about cudnn:
>> >>>
>> >>> There are a bunch of versions of cudnn. They're not
>> >>> backwards-compatible, and different versions of
>> >>> caffe/tensorflow/whatever want different ones.
>> >>>
>> >>> I'm currently using the setup in ~dsutherl/cudnn_files:
>> >>>
>> >>> * I have a bunch of versions of the installer there.
>> >>> * The use-cudnn.sh script, intended to be used like "source
>> >>>   use-cudnn.sh 5.1", will untar the appropriate one into a scratch
>> >>>   directory (if it hasn't already been done) and set
>> >>>   CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is
>> >>>   needed for caffe binaries, since they don't link to the absolute path;
>> >>>   the first two (not sure about the third) are needed for theano.
>> >>>   Dunno about tensorflow yet.
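
For readers who can't see that script, a minimal sketch of the kind of thing
use-cudnn.sh is described as doing (the tarball name and layout are
assumptions; the real script is the authority):

    CUDNN_VER=5.1
    CUDNN_DIR=/home/scratch/$USER/cudnn-$CUDNN_VER          # scratch location, assumed
    mkdir -p "$CUDNN_DIR"
    tar xzf ~dsutherl/cudnn_files/cudnn-$CUDNN_VER-linux-x64.tgz -C "$CUDNN_DIR" --strip-components=1
    export CPATH=$CUDNN_DIR/include:$CPATH
    export LIBRARY_PATH=$CUDNN_DIR/lib64:$LIBRARY_PATH
    export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH
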
>> >>>
>> >>> So, here's the Caffe setup:
>> >>>
>> >>> cd /home/scratch/$USER
>> >>> git clone https://github.com/BVLC/caffe
>> >>> cd caffe
>> >>> cp Makefile.config.example Makefile.config
>> >>>
>> >>> # tell it to use openblas; using atlas needs some changes to the Makefile
>> >>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
>> >>>
>> >>> # configure to use cudnn (optional)
>> >>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
>> >>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
>> >>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
>> >>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
>> >>>
>> >>> # build the library
>> >>> make -j23
>> >>>
>> >>> # to do tests (takes ~10 minutes):
>> >>> make -j23 test
>> >>> make runtest
>> >>>
>> >>> # Now, to run caffe binaries you'll need to remember to source
>> >>> # use-cudnn if you used cudnn before.
>> >>>
>> >>> # To build the python library:
>> >>> make py
>> >>>
>> >>> # Requirements for the python library:
>> >>> # Some of the system packages are too old; this installs them in your
>> >>> # scratch directory.
>> >>> # You'll have to set PYTHONUSERBASE again before running any python
>> >>> # processes that use these libs.
>> >>> export PYTHONUSERBASE=$HOME/scratch/.local
>> >>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
>> >>> pip install --user -r python/requirements.txt
>> >>>
>> >>> # Caffe is dumb and doesn't package its python library properly. The
>> >>> # easiest way to use it is:
>> >>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
>> >>> python -c 'import caffe'
>> >>>
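
Pulling the runtime pieces above into one snippet that could be sourced before
using the Python library (the file name and exact contents are just a
suggestion based on the steps above, not part of Dougal's instructions):

    # e.g. save as ~/caffe-env.sh and "source ~/caffe-env.sh" in each new shell
    source ~dsutherl/cudnn-files/use-cudnn.sh 5.1                  # only if built with cudnn
    export PYTHONUSERBASE=$HOME/scratch/.local
    export PATH=$PYTHONUSERBASE/bin:"$PATH"
    export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
    python -c 'import caffe; print(caffe.__file__)'                # quick sanity check
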
>> >>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland <dougal at gmail.com> wrote:
>> >>>
>> >>>> Java fix seemed to work. Now tensorflow wants python-wheel and swig.
>> >>>>
>> >>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac <predragp at imap.srv.cs.cmu.edu> wrote:
>> >>>>
>> >>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
>> >>>>>
>> >>>>>> Having some trouble with tensorflow, because:
>> >>>>>>
>> >>>>>> * it requires Google's bazel build system
>> >>>>>>
>> >>>>>> * The bazel installer says
>> >>>>>>   Java version is 1.7.0_111 while at least 1.8 is needed.
>> >>>>>>
>> >>>>>> * $ java -version
>> >>>>>>   openjdk version "1.8.0_102"
>> >>>>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
>> >>>>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
>> >>>>>>   $ javac -version
>> >>>>>>   javac 1.7.0_111
>> >>>>>
>> >>>>> I just did yum -y install java-1.8.0* which installs openjdk 1.8.
>> >>>>> Please change your java. Let me know if you want me to install
>> >>>>> Oracle JDK 1.8.
>> >>>>>
>> >>>>> Predrag
>> >>>>>
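
The mismatch above (java reports 1.8 while javac reports 1.7) suggests the 1.8
JDK either isn't installed or isn't selected; a hedged sketch of sorting that
out on a RHEL-style system (exact package and alternatives names on Springdale
are assumptions):

    yum -y install java-1.8.0-openjdk-devel       # javac comes from the -devel package
    alternatives --config java                    # pick the 1.8.0 entry
    alternatives --config javac                   # pick the 1.8.0 entry
    # or, per user, just point at the 1.8 JDK:
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
    export PATH="$JAVA_HOME/bin:$PATH"
    java -version && javac -version               # both should now report 1.8.0_x
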
>> >>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>> >>>>>
>> >>>>>>
>> >>>>>
>> >>>>>>> Dougal Sutherland <dougal at gmail.com> wrote:
>> >>>>>
>> >>>>>>>
>> >>>>>
>> >>>>>>>> Also, this seemed to work for me so far for protobuf:
>> >>>>>>>>
>> >>>>>>>> cd /home/scratch/$USER
>> >>>>>>>> VER=3.1.0
>> >>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
>> >>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
>> >>>>>>>> cd protobuf-cpp-$VER
>> >>>>>>>> ./configure --prefix=/home/scratch/$USER
>> >>>>>>>> make -j12
>> >>>>>>>> make -j12 check
>> >>>>>>>> make install
>> >>>>>>>>
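
A small follow-up sketch for using that --prefix install afterwards, assuming
the default autoconf layout under /home/scratch/$USER (not something stated in
the thread itself):

    export PATH=/home/scratch/$USER/bin:$PATH
    export LD_LIBRARY_PATH=/home/scratch/$USER/lib:$LD_LIBRARY_PATH
    export PKG_CONFIG_PATH=/home/scratch/$USER/lib/pkgconfig:$PKG_CONFIG_PATH
    protoc --version        # should report libprotoc 3.1.0
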
>> >>>>>>>
>> >>>>>>> That is great help!
>> >>>>>>>
>> >>>>>>>> You could change --prefix=/usr if making an RPM.
>> >>>>>>>>
>> >>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland <dougal at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>
>> >>>>>>>>> Some more packages for caffe:
>> >>>>>>>>>
>> >>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
>> >>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
>> >>>>>>>>>
>> >>>>>>>>> (Some of those might be installed already, but at least gflags is
>> >>>>>>>>> definitely missing.)
>> >>>>>>>>>
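
For convenience, the same list as a single install command (run as root):

    yum -y install leveldb-devel snappy-devel opencv-devel boost-devel \
                   hdf5-devel gflags-devel glog-devel lmdb-devel
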
>> >>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac <predragp at imap.srv.cs.cmu.edu> wrote:
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice also
>> >>>>>>>>>> hangs on my computer with R2016a.
>> >>>>>>>>>
>> >>>>>>>>> We would have to escalate this with MathWorks. I have seen
>> >>>>>>>>> workarounds on the Internet, but it looks like a bug in one of the
>> >>>>>>>>> MathWorks-provided MEX files.
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>>> I was able to compile the matrixMul example in the CUDA samples
>> >>>>>>>>>> and run it on gpu3, so I think the build environment is probably
>> >>>>>>>>>> all set.
>> >>>>>>>>>>
>> >>>>>>>>>> As for the OpenGL, I think it's possibly a problem with their
>> >>>>>>>>>> build script findgl.mk, which is not familiar with Springdale OS.
>> >>>>>>>>>> The demo_suite directory has a precompiled nbody binary you may
>> >>>>>>>>>> try, but I suspect most users will not need graphics.
>> >>>>>
>> >>>>>>>>>>
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>> That should not be too hard to fix. Some header files have to be
>> >>>>>>>>> manually edited. The funny part is that until 7.2 the Princeton
>> >>>>>>>>> people didn't bother to remove the RHEL branding, which actually
>> >>>>>>>>> made things easier for us.
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>> Doug is trying right now to compile the latest Caffe, TensorFlow,
>> >>>>>>>>> and protobuf-3. We will try to create an RPM for that so that we
>> >>>>>>>>> don't have to go through this again. I also asked the Princeton and
>> >>>>>>>>> Rutgers guys if they have WIP RPMs to share.
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>> Predrag
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>>> Arne
>> >>>>>
>> >>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>> Hi Predrag,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Don't know if this applies to you, but I just built a machine
>> >>>>>>>>>>>> with a GTX 1080, which has the same Pascal architecture as the
>> >>>>>>>>>>>> Titan. After installing CUDA 8, I still found I needed to install
>> >>>>>>>>>>>> the latest driver off of the NVIDIA web site to get the card
>> >>>>>>>>>>>> recognized. Right now, I am running 367.44.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Arne
>> >>>>>
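
A generic sketch of that kind of driver install, for reference only; it is not
necessarily what was run on the GTX 1080 box or on gpu3, and the .run installer
expects the display manager stopped and matching kernel headers present:

    chmod +x NVIDIA-Linux-x86_64-367.57.run
    ./NVIDIA-Linux-x86_64-367.57.run        # as root, with X stopped
    nvidia-smi                              # should list the cards once the module loads
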
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> Arne,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
>> >>>>>>>>>>> architecture; I see lots of people complaining about it on the
>> >>>>>>>>>>> forums. I downloaded and installed the driver from
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
>> >>>>>>>>>>>
>> >>>>>>>>>>> That seems to have made a real difference. Check out these
>> >>>>>>>>>>> beautiful outputs:
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ ls nvidia*
>> >>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm  nvidia-uvm-tools
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ lspci | grep -i nvidia
>> >>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>> >>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>> >>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>> >>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>> >>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>> >>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>> >>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>> >>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ ls /proc/driver
>> >>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ lsmod | grep nvidia
>> >>>>>>>>>>> nvidia_uvm            738901  0
>> >>>>>>>>>>> nvidia_drm             43405  0
>> >>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
>> >>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
>> >>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
>> >>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
>> >>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ nvidia-smi
>> >>>>>>>>>>> Wed Oct 12 22:03:27 2016
>> >>>>>>>>>>> +-----------------------------------------------------------------------------+
>> >>>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
>> >>>>>>>>>>> |-------------------------------+----------------------+----------------------+
>> >>>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>> >>>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>> >>>>>>>>>>> |===============================+======================+======================|
>> >>>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
>> >>>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>> >>>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
>> >>>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>> >>>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
>> >>>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>> >>>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
>> >>>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
>> >>>>>>>>>>>
>> >>>>>>>>>>> +-----------------------------------------------------------------------------+
>> >>>>>>>>>>> | Processes:                                                       GPU Memory |
>> >>>>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
>> >>>>>>>>>>> |=============================================================================|
>> >>>>>>>>>>> |  No running processes found                                                 |
>> >>>>>>>>>>> +-----------------------------------------------------------------------------+
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
>> >>>>>>>>>>>
>> >>>>>>>>>>> Alignment requirement for Surfaces:            Yes
>> >>>>>>>>>>> Device has ECC support:                        Disabled
>> >>>>>>>>>>> Device supports Unified Addressing (UVA):      Yes
>> >>>>>>>>>>> Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>> >>>>>>>>>>> Compute Mode:
>> >>>>>>>>>>>    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
>> >>>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>> >>>>>>>>>>>
>> >>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA
>> >>>>>>>>>>> Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal),
>> >>>>>>>>>>> Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 =
>> >>>>>>>>>>> TITAN X (Pascal)
>> >>>>>>>>>>> Result = PASS
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> Now not everything is rosy:
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
>> >>>>>>>>>>> root at gpu3$ make
>> >>>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>> >>>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>> >>>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>> >>>>>>>>>>>
>> >>>>>>>>>>> even though those are installed. For example:
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so
>> >>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
>> >>>>>>>>>>> Repo        : core
>> >>>>>>>>>>> Matched from:
>> >>>>>>>>>>> Filename    : /usr/lib/libX11.so
>> >>>>>>>>>>>
>> >>>>>>>>>>> also
>> >>>>>>>>>>>
>> >>>>>>>>>>> mesa-libGLU-devel
>> >>>>>>>>>>> mesa-libGL-devel
>> >>>>>>>>>>> xorg-x11-drv-nvidia-devel
>> >>>>>>>>>>>
>> >>>>>>>>>>> but
>> >>>>>>>>>>>
>> >>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
>> >>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
>> >>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
>> >>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
>> >>>>>
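
Two hedged guesses, neither verified on gpu3: the yum output above only shows
the 32-bit libX11-devel, and the samples' findgllib.mk may simply be looking in
the wrong directories on Springdale, so either of these might get nbody to
build:

    yum -y install libX11-devel.x86_64      # 64-bit headers/symlink for libX11.so
    make GLPATH=/usr/lib64                  # only if the samples honor the GLPATH override
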
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>> Also, from MATLAB, gpuDevice hangs.
>> >>>>>>>>>>>
>> >>>>>>>>>>> So we still don't have a working installation. Any help would be
>> >>>>>>>>>>> appreciated.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Predrag
>> >>>>>>>>>>>
>> >>>>>>>>>>> P.S. Once we have a working installation we can think about
>> >>>>>>>>>>> installing Caffe and TensorFlow. For now we have to see why things
>> >>>>>>>>>>> are not working.
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>> >>>>>
>> >>>>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>>> Dear Autonians,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
>> >>>>>>>>>>>>> packages are installed. However, I couldn't get the
>> >>>>>>>>>>>>> NVIDIA-provided CUDA driver to recognize the GPU cards. They
>> >>>>>>>>>>>>> appear to be properly installed from the hardware point of view,
>> >>>>>>>>>>>>> and you can list them with
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> lshw -class display
>> >>>>>
>> >>>>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>>> root at gpu3$ lshw -class display
>> >>>>>>>>>>>>>   *-display UNCLAIMED
>> >>>>>>>>>>>>>        description: VGA compatible controller
>> >>>>>>>>>>>>>        product: NVIDIA Corporation
>> >>>>>>>>>>>>>        vendor: NVIDIA Corporation
>> >>>>>>>>>>>>>        physical id: 0
>> >>>>>>>>>>>>>        bus info: pci at 0000:02:00.0
>> >>>>>>>>>>>>>        version: a1
>> >>>>>>>>>>>>>        width: 64 bits
>> >>>>>>>>>>>>>        clock: 33MHz
>> >>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>> >>>>>>>>>>>>>        configuration: latency=0
>> >>>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>> >>>>>>>>>>>>>          memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>> >>>>>>>>>>>>>          memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>> >>>>>>>>>>>>>          memory:d0000000-d007ffff
>> >>>>>>>>>>>>>   *-display UNCLAIMED
>> >>>>>>>>>>>>>        description: VGA compatible controller
>> >>>>>>>>>>>>>        product: NVIDIA Corporation
>> >>>>>>>>>>>>>        vendor: NVIDIA Corporation
>> >>>>>>>>>>>>>        physical id: 0
>> >>>>>>>>>>>>>        bus info: pci at 0000:03:00.0
>> >>>>>>>>>>>>>        version: a1
>> >>>>>>>>>>>>>        width: 64 bits
>> >>>>>>>>>>>>>        clock: 33MHz
>> >>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>> >>>>>>>>>>>>>        configuration: latency=0
>> >>>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>> >>>>>>>>>>>>>          memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>> >>>>>>>>>>>>>          memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>> >>>>>>>>>>>>>          memory:ce000000-ce07ffff
>> >>>>>>>>>>>>>   *-display
>> >>>>>>>>>>>>>        description: VGA compatible controller
>> >>>>>>>>>>>>>        product: ASPEED Graphics Family
>> >>>>>>>>>>>>>        vendor: ASPEED Technology, Inc.
>> >>>>>>>>>>>>>        physical id: 0
>> >>>>>>>>>>>>>        bus info: pci at 0000:06:00.0
>> >>>>>>>>>>>>>        version: 30
>> >>>>>>>>>>>>>        width: 32 bits
>> >>>>>>>>>>>>>        clock: 33MHz
>> >>>>>>>>>>>>>        capabilities: pm msi vga_controller bus_master cap_list rom
>> >>>>>>>>>>>>>        configuration: driver=ast latency=0
>> >>>>>>>>>>>>>        resources: irq:19 memory:cb000000-cbffffff
>> >>>>>>>>>>>>>          memory:cc000000-cc01ffff ioport:4000(size=128)
>> >>>>>>>>>>>>>   *-display UNCLAIMED
>> >>>>>>>>>>>>>        description: VGA compatible controller
>> >>>>>>>>>>>>>        product: NVIDIA Corporation
>> >>>>>>>>>>>>>        vendor: NVIDIA Corporation
>> >>>>>>>>>>>>>        physical id: 0
>> >>>>>>>>>>>>>        bus info: pci at 0000:82:00.0
>> >>>>>>>>>>>>>        version: a1
>> >>>>>>>>>>>>>        width: 64 bits
>> >>>>>>>>>>>>>        clock: 33MHz
>> >>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>> >>>>>>>>>>>>>        configuration: latency=0
>> >>>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>> >>>>>>>>>>>>>          memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>> >>>>>>>>>>>>>          memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>> >>>>>>>>>>>>>          memory:fb000000-fb07ffff
>> >>>>>>>>>>>>>   *-display UNCLAIMED
>> >>>>>>>>>>>>>        description: VGA compatible controller
>> >>>>>>>>>>>>>        product: NVIDIA Corporation
>> >>>>>>>>>>>>>        vendor: NVIDIA Corporation
>> >>>>>>>>>>>>>        physical id: 0
>> >>>>>>>>>>>>>        bus info: pci at 0000:83:00.0
>> >>>>>>>>>>>>>        version: a1
>> >>>>>>>>>>>>>        width: 64 bits
>> >>>>>>>>>>>>>        clock: 33MHz
>> >>>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>> >>>>>>>>>>>>>        configuration: latency=0
>> >>>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>> >>>>>>>>>>>>>          memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>> >>>>>>>>>>>>>          memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>> >>>>>>>>>>>>>          memory:f9000000-f907ffff
>> >>>>>
>> >>>>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>>>>> However, what scares the hell out of me is that I don't see
>> >>>>>>>>>>>>> the NVIDIA driver loaded with
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> lsmod | grep nvidia
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> and the device nodes /dev/nvidia* are not created. I am guessing
>> >>>>>>>>>>>>> I just missed some trivial step during the CUDA installation,
>> >>>>>>>>>>>>> which is very involved. I am unfortunately too tired to debug
>> >>>>>>>>>>>>> this tonight.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Predrag
>> >>>>>
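
Generic first checks for a driver that won't load, for anyone who lands here
later (not specific to what eventually fixed gpu3, which per the follow-ups
above was installing the 367.57 driver from nvidia.com):

    lspci | grep -i nvidia                  # cards visible on the PCI bus?
    lsmod | grep nvidia                     # kernel module loaded?
    modprobe nvidia && ls /dev/nvidia*      # try loading it and check the device nodes
    dmesg | grep -iE 'nvidia|nvrm'          # look for module/driver errors
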
>> >>>>>>>>>>>
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>>>>
>> >>>>>
>> >>>>>>
>> >>>>>
>>
>