GPU3 is "configured"
Predrag Punosevac
predragp at imap.srv.cs.cmu.edu
Thu Oct 13 13:39:19 EDT 2016
Dear Autonians,

In case anybody is interested in what happens behind the scenes: Doug
got Caffe and TensorFlow to work on GPU3. Please see the message below.
I also got very useful feedback from the Princeton and Rutgers people.
Check it out if you care (you will have to log into Gmail to see the
exchange):

https://groups.google.com/forum/#!forum/springdale-users

I need to think about how we move forward with this before we start
pulling any triggers. If somebody is itchy and can't wait, please build
Caffe and TensorFlow in your scratch directory following the howto
below.
Predrag
On 2016-10-13 13:24, Dougal Sutherland wrote:
> A note about cudnn:
>
> There are a bunch of versions of cudnn. They're not
> backwards-compatible, and different versions of
> caffe/tensorflow/whatever want different ones.
>
> I am currently using the setup in ~dsutherl/cudnn_files:
>
> * I have a bunch of versions of the installer there.
> * The use-cudnn.sh script, intended to be used like "source
> use-cudnn.sh 5.1", will untar the appropriate one into a scratch
> directory (if it hasn't already been done) and set
> CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately (a rough sketch of
> the script follows below). LD_LIBRARY_PATH is needed for caffe
> binaries, since they don't link to the absolute path; the first two
> (not sure about the third) are needed for theano. Dunno about
> tensorflow yet.
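>
> For reference, the rough shape of use-cudnn.sh is something like this
> (a sketch reconstructed from the description above, not the script's
> exact contents; the tarball name and location are made up):
>
> VER="$1"                                # e.g. 5.1
> DIR="/home/scratch/$USER/cudnn-$VER"
> if [ ! -d "$DIR" ]; then
>     mkdir -p "$DIR"
>     # untar the matching installer (hypothetical filename)
>     tar xzf ~dsutherl/cudnn_files/cudnn-$VER.tgz -C "$DIR" --strip-components=1
> fi
> export CUDNN_DIR="$DIR"                 # used by the Makefile edits below
> export CPATH="$CUDNN_DIR/include:$CPATH"
> export LIBRARY_PATH="$CUDNN_DIR/lib64:$LIBRARY_PATH"
> export LD_LIBRARY_PATH="$CUDNN_DIR/lib64:$LD_LIBRARY_PATH"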
>
> So, here's the Caffe setup:
>
> cd /home/scratch/$USER
> git clone https://github.com/BVLC/caffe
> cd caffe
> cp Makefile.config.example Makefile.config
>
> # tell it to use openblas; using atlas needs some changes to the Makefile
> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
>
> # configure to use cudnn (optional)
> source ~dsutherl/cudnn_files/use-cudnn.sh 5.1
> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
>
> # build the library
> make -j23
>
> # to run tests (takes ~10 minutes):
> make -j23 test
> make runtest
>
> # Now, to run caffe binaries, you'll need to remember to source
> # use-cudnn.sh if you used cudnn before.
>
> # To build the python library:
> make py
>
> # Requirements for the python library:
> # Some of the system packages are too old; this installs them in your
> # scratch directory.
> # You'll have to set PYTHONUSERBASE again before running any python
> # processes that use these libs.
> export PYTHONUSERBASE=$HOME/scratch/.local
> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
> pip install --user -r python/requirements.txt
>
> # Caffe is dumb and doesn't package its python library properly. The
> # easiest way to use it is:
> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
> python -c 'import caffe'
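>
> (To avoid retyping all of that every session, a tiny wrapper could
> live in your scratch directory; a sketch only, same paths as above:
>
> # caffe-env.sh -- source this before running caffe binaries or python
> source ~dsutherl/cudnn_files/use-cudnn.sh 5.1   # sets $CUDNN_DIR and lib paths
> export PYTHONUSERBASE=$HOME/scratch/.local      # pip --user packages
> export PATH="$PYTHONUSERBASE/bin:$PATH"
> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
>
> Then: source caffe-env.sh && python -c 'import caffe')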
>
> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland <dougal at gmail.com>
> wrote:
>
>> Java fix seemed to work. Now tensorflow wants python-wheel and
>> swig.
>>
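>> (Presumably something like "yum -y install python-wheel swig" on the
>> admin side would cover those; package names as above, untested.)
>>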
>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac
>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>
>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
>>>
>>>> Having some trouble with tensorflow, because:
>>>>
>>>> * it requires Google's bazel build system
>>>> * The bazel installer says
>>>>   "Java version is 1.7.0_111 while at least 1.8 is needed."
>>>>
>>>> $ java -version
>>>> openjdk version "1.8.0_102"
>>>> OpenJDK Runtime Environment (build 1.8.0_102-b14)
>>>> OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
>>>>
>>>> $ javac -version
>>>> javac 1.7.0_111
>>>
>>> I just did yum -y install java-1.8.0* which installs OpenJDK 1.8.
>>> Please change your java. Let me know if you want me to install
>>> Oracle JDK 1.8.
>>>
>>> Predrag
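>>>
>>> (On the user side, switching the default compiler is probably just
>>> the alternatives mechanism or PATH; a sketch, where the JVM path is
>>> the usual RHEL layout and may differ on Springdale:
>>>
>>> sudo alternatives --config javac    # select the 1.8.0 entry
>>> # or, without root, for the current shell only:
>>> export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
>>> export PATH="$JAVA_HOME/bin:$PATH"
>>> javac -version                      # should now report 1.8.0_102
>>> )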
>>>
>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac
>>>> <predragp at cs.cmu.edu> wrote:
>>>>
>>>>> Dougal Sutherland <dougal at gmail.com> wrote:
>>>>>
>>>>>> Also, this seemed to work for me so far for protobuf:
>>>>>>
>>>>>> cd /home/scratch/$USER
>>>>>> VER=3.1.0
>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
>>>>>> tar xf protobuf-cpp-$VER.tar.gz
>>>>>> cd protobuf-$VER   # the cpp tarball unpacks to protobuf-$VER
>>>>>> ./configure --prefix=/home/scratch/$USER
>>>>>> make -j12
>>>>>> make -j12 check
>>>>>> make install
>>>>>
>>>>> That is great help!
>>>>>
>>>>>> You could change --prefix=/usr if making an RPM.
>>>>>>
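>>>>>> (If you keep --prefix=/home/scratch/$USER, later builds and shells
>>>>>> need to be pointed at that prefix; a sketch, assuming the standard
>>>>>> autotools layout under it:
>>>>>>
>>>>>> export PATH=/home/scratch/$USER/bin:$PATH          # for protoc
>>>>>> export LD_LIBRARY_PATH=/home/scratch/$USER/lib:$LD_LIBRARY_PATH
>>>>>> export PKG_CONFIG_PATH=/home/scratch/$USER/lib/pkgconfig:$PKG_CONFIG_PATH
>>>>>> protoc --version    # should print libprotoc 3.1.0
>>>>>> )
>>>>>>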
>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland
>>>>>> <dougal at gmail.com> wrote:
>>>>>>
>>>>>>> Some more packages for caffe:
>>>>>>>
>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
>>>>>>>
>>>>>>> (Some of those might be installed already, but at least gflags is
>>>>>>> definitely missing.)
>>>>>>>
>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac
>>>>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>>>>>
>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice also
>>>>>>>> hangs on my computer with R2016a.
>>>>>>>
>>>>>>> We would have to escalate this with MathWorks. I have seen
>>>>>>> workarounds on the Internet, but it looks like a bug in one of the
>>>>>>> MathWorks-provided MEX files.
>>>>>>>
>>>>>>>> I was able to compile the matrixMul example in the CUDA samples
>>>>>>>> and run it on gpu3, so I think the build environment is probably
>>>>>>>> all set.
>>>>>>>>
>>>>>>>> As for OpenGL, I think it's possibly a problem with their build
>>>>>>>> script findgl.mk, which is not familiar with Springdale OS. The
>>>>>>>> demo_suite directory has a precompiled nbody binary you may try,
>>>>>>>> but I suspect most users will not need graphics.
>>>>>>>
>>>>>>> That should not be too hard to fix. Some header files have to be
>>>>>>> manually edited. The funny part is that until 7.2 the Princeton
>>>>>>> people didn't bother to remove the RHEL branding, which actually
>>>>>>> made things easier for us.
>>>>>>>
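>>>>>>> (A hedged suggestion, since findgl.mk varies between CUDA
>>>>>>> releases: first locate the libraries the probe is missing, e.g.
>>>>>>>
>>>>>>> find /usr/lib64 -name 'libGL*.so*' -o -name 'libX11.so*'
>>>>>>>
>>>>>>> and then either edit the distro detection in the .mk file to treat
>>>>>>> Springdale like RHEL, or point the build at those paths directly.)
>>>>>>>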
>>>>>>> Doug is trying right now to compile the latest Caffe, TensorFlow,
>>>>>>> and protobuf-3. We will try to create an RPM for that so that we
>>>>>>> don't have to go through this again. I also asked the Princeton
>>>>>>> and Rutgers guys if they have WIP RPMs to share.
>>>>>>>
>>>>>>> Predrag
>>>>>>>
>>>>>>>> Arne
>>>>>>>>
>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac
>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>
>>>>>>>>> Arne Suppe <suppe at andrew.cmu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Predrag,
>>>>>>>>>> Don't know if this applies to you, but I just built a machine
>>>>>>>>>> with a GTX 1080, which has the same Pascal architecture as the
>>>>>>>>>> Titan. After installing CUDA 8, I still found I needed to
>>>>>>>>>> install the latest driver off of the NVIDIA web site to get
>>>>>>>>>> the card recognized. Right now, I am running 367.44.
>>>>>>>>>>
>>>>>>>>>> Arne
>>>>>>>>>
>>>>>>>>> Arne,
>>>>>>>>>
>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
>>>>>>>>> architecture; I see lots of people complaining about it on the
>>>>>>>>> forums. I downloaded and installed the driver from
>>>>>>>>>
>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
>>>>>>>>>
>>>>>>>>> That seems to have made a real difference. Check out these
>>>>>>>>> beautiful outputs:
>>>>>>>>>
>>>>>>>>> root at gpu3$ ls nvidia*
>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
>>>>>>>>> nvidia-uvm-tools
>>>>>>>>>
>>>>>>>>> root at gpu3$ lspci | grep -i nvidia
>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>
>>>>>>>>> root at gpu3$ ls /proc/driver
>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
>>>>>>>>>
>>>>>>>>> root at gpu3$ lsmod | grep nvidia
>>>>>>>>> nvidia_uvm            738901  0
>>>>>>>>> nvidia_drm             43405  0
>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>>>>>>>>>
>>>>>>>>> root at gpu3$ nvidia-smi
>>>>>>>>> Wed Oct 12 22:03:27 2016
>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
>>>>>>>>> |-------------------------------+----------------------+----------------------+
>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
>>>>>>>>> |===============================+======================+======================|
>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
>>>>>>>>> +-------------------------------+----------------------+----------------------+
>>>>>>>>>
>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>> | Processes:                                                       GPU Memory |
>>>>>>>>> |  GPU       PID  Type  Process name                               Usage      |
>>>>>>>>> |=============================================================================|
>>>>>>>>> |  No running processes found                                                 |
>>>>>>>>> +-----------------------------------------------------------------------------+
>>>>>>>>>
>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
>>>>>>>>>
>>>>>>>>>   Alignment requirement for Surfaces:            Yes
>>>>>>>>>   Device has ECC support:                        Disabled
>>>>>>>>>   Device supports Unified Addressing (UVA):      Yes
>>>>>>>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>>>>>>>>>   Compute Mode:
>>>>>>>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
>>>>>>>>> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>>>>>>>>>
>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
>>>>>>>>> CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X
>>>>>>>>> (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal),
>>>>>>>>> Device3 = TITAN X (Pascal)
>>>>>>>>> Result = PASS
>>>>>>>>>
>>>>>>>>> Now, not everything is rosy:
>>>>>>>>>
>>>>>>>>> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
>>>>>>>>> root at gpu3$ make
>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting Started
>>>>>>>>> Guide for how to find and install them. <<<
>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started
>>>>>>>>> Guide for how to find and install them. <<<
>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting Started
>>>>>>>>> Guide for how to find and install them. <<<
>>>>>>>>>
>>>>>>>>> even though those are installed. For example:
>>>>>>>>>
>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so
>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
>>>>>>>>> Repo : core
>>>>>>>>> Matched from:
>>>>>>>>> Filename : /usr/lib/libX11.so
>>>>>>>>>
>>>>>>>>> also
>>>>>>>>>
>>>>>>>>> mesa-libGLU-devel
>>>>>>>>> mesa-libGL-devel
>>>>>>>>> xorg-x11-drv-nvidia-devel
>>>>>>>>>
>>>>>>>>> but
>>>>>>>>>
>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel
>>>>>>>>> xorg-x11-drv-nvidia-devel
>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed
>>>>>>>>> and latest version
>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already
>>>>>>>>> installed and latest version
>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already
>>>>>>>>> installed and latest version
>>>>>>>>>
>>>>>>>>> Also, gpuDevice hangs from MATLAB.
>>>>>>>>>
>>>>>>>>> So we still don't have a working installation. Any help would be
>>>>>>>>> appreciated.
>>>>>>>>>
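>>>>>>>>> (One thing worth checking, hedged: the whatprovides hit above is
>>>>>>>>> the i686 package, i.e. /usr/lib/libX11.so, while a 64-bit build
>>>>>>>>> will look in /usr/lib64. Something like
>>>>>>>>>
>>>>>>>>> yum -y install libX11-devel.x86_64
>>>>>>>>> ls /usr/lib64/libGL.so /usr/lib64/libGLU.so /usr/lib64/libX11.so
>>>>>>>>>
>>>>>>>>> would confirm whether the 64-bit devel symlinks are actually
>>>>>>>>> there.)
>>>>>>>>>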
>>>>>>>>> Best,
>>>>>>>>> Predrag
>>>>>>>>>
>>>>>>>>> P.S. Once we have a working installation we can think about
>>>>>>>>> installing Caffe and TensorFlow. For now we have to see why
>>>>>>>>> things are not working.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac
>>>>>>>>>>> <predragp at cs.cmu.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Dear Autonians,
>>>>>>>>>>>
>>>>>>>>>>> GPU3 is "configured". Namely, you can log into it and all
>>>>>>>>>>> packages are installed. However, I couldn't get the
>>>>>>>>>>> NVIDIA-provided CUDA driver to recognize the GPU cards. They
>>>>>>>>>>> appear to be properly installed from the hardware point of
>>>>>>>>>>> view, and you can list them with
>>>>>>>>>>>
>>>>>>>>>>> lshw -class display
>>>>>>>>>>>
>>>>>>>>>>> root at gpu3$ lshw -class display
>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci at 0000:02:00.0
>>>>>>>>>>>        version: a1
>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>> memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>>>>>>>>>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>>>>>>>>>>> memory:d0000000-d007ffff
>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci at 0000:03:00.0
>>>>>>>>>>>        version: a1
>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>>>>>>>>>> memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>>>>>>>>>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>>>>>>>>>>> memory:ce000000-ce07ffff
>>>>>>>>>>>   *-display
>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>        product: ASPEED Graphics Family
>>>>>>>>>>>        vendor: ASPEED Technology, Inc.
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci at 0000:06:00.0
>>>>>>>>>>>        version: 30
>>>>>>>>>>>        width: 32 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi vga_controller bus_master cap_list rom
>>>>>>>>>>>        configuration: driver=ast latency=0
>>>>>>>>>>>        resources: irq:19 memory:cb000000-cbffffff
>>>>>>>>>>> memory:cc000000-cc01ffff ioport:4000(size=128)
>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci at 0000:82:00.0
>>>>>>>>>>>        version: a1
>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>> memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>>>>>>>>>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>>>>>>>>>>> memory:fb000000-fb07ffff
>>>>>>>>>>>   *-display UNCLAIMED
>>>>>>>>>>>        description: VGA compatible controller
>>>>>>>>>>>        product: NVIDIA Corporation
>>>>>>>>>>>        vendor: NVIDIA Corporation
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci at 0000:83:00.0
>>>>>>>>>>>        version: a1
>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi pciexpress vga_controller cap_list
>>>>>>>>>>>        configuration: latency=0
>>>>>>>>>>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>>>>>>>>>> memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>>>>>>>>>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>>>>>>>>>>> memory:f9000000-f907ffff
>>>>>>>>>>>
>>>>>>>>>>> However, what scares the hell out of me is that I don't see
>>>>>>>>>>> the NVIDIA driver loaded:
>>>>>>>>>>>
>>>>>>>>>>> lsmod | grep nvidia
>>>>>>>>>>>
>>>>>>>>>>> and the /dev/nvidia device nodes are not created. I am
>>>>>>>>>>> guessing I just missed some trivial step during the CUDA
>>>>>>>>>>> installation, which is very involved. I am unfortunately too
>>>>>>>>>>> tired to debug this tonight.
>>>>>>>>>>>
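>>>>>>>>>>> (For when we pick this up again, a few standard checks; a
>>>>>>>>>>> sketch, nothing Springdale-specific assumed:
>>>>>>>>>>>
>>>>>>>>>>> modinfo nvidia                      # module built for this kernel?
>>>>>>>>>>> modprobe nvidia && lsmod | grep nvidia
>>>>>>>>>>> dmesg | grep -i -e nvidia -e nvrm   # driver load errors, if any
>>>>>>>>>>> )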
>>>>>>>>>>>
>>>>>>>>>>> Predrag