GPU3 is "configured"

Predrag Punosevac predragp at cs.cmu.edu
Wed Oct 12 22:23:32 EDT 2016


Arne Suppe <suppe at andrew.cmu.edu> wrote:

> Hi Predrag,
> Don't know if this applies to you, but I just built a machine with a GTX 1080, which has the same Pascal architecture as the Titan. After installing CUDA 8, I still found I needed to install the latest driver from the NVIDIA web site to get the card recognized. Right now, I am running 367.44.
> 
> Arne

Arne,

Thank you so much for this e-mail. Yes, it is the damn Pascal architecture; I
see lots of people complaining about it on the forums. I downloaded and
installed the driver from

http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce

That seems to have made a real difference. Check out this beautiful output:

root at gpu3$ ls nvidia*
nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm  nvidia-uvm-tools

root at gpu3$ lspci | grep -i nvidia
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)


root at gpu3$ ls /proc/driver
nvidia  nvidia-uvm  nvram  rtc

root at gpu3$ lsmod |grep nvidia
nvidia_uvm            738901  0 
nvidia_drm             43405  0 
nvidia_modeset        764432  1 nvidia_drm
nvidia              11492947  2 nvidia_modeset,nvidia_uvm
drm_kms_helper        125056  2 ast,nvidia_drm
drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
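
The module check above can be scripted. The following is a minimal sketch and was not part of the original session; the function name `missing_modules` is hypothetical, and which modules count as "expected" is an assumption the caller supplies as arguments.

```shell
#!/bin/sh
# Hypothetical helper: report which expected kernel modules are absent.
# Reads lsmod-style text on stdin; the module names to look for are
# given as arguments. Prints one missing module name per line.
missing_modules() {
    input=$(cat)
    for mod in "$@"; do
        # lsmod prints the module name in the first column
        if ! printf '%s\n' "$input" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'
        then
            printf '%s\n' "$mod"
        fi
    done
}

# Typical use on a live system (the module list is an assumption):
#   lsmod | missing_modules nvidia nvidia_uvm
```

Empty output means every listed module is loaded.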

root at gpu3$ nvidia-smi 
Wed Oct 12 22:03:27 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
| 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
| 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
| 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
|  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+



/usr/local/cuda/extras/demo_suite/deviceQuery 

  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = TITAN X (Pascal)
Result = PASS
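
The peer-access entries above pair GPU0 with GPU1 and GPU2 with GPU3, which is consistent with the two PCI bus groups (02/03 and 82/83) in the lspci listing, although nothing in this thread confirms the cause. As a minimal sketch, assuming each peer-access entry arrives on a single line, the report can be condensed into "source destination answer" triples. The helper name `peer_pairs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: condense deviceQuery peer-access lines.
# Reads deviceQuery output on stdin and prints "SRC DST Yes|No",
# one line per peer-access entry; all other lines are ignored.
peer_pairs() {
    sed -nE 's/^> Peer access from .*\(GPU([0-9]+)\) -> .*\(GPU([0-9]+)\) : (Yes|No)[[:space:]]*$/\1 \2 \3/p'
}

# Typical use, keeping only the pairs that can reach each other:
#   /usr/local/cuda/extras/demo_suite/deviceQuery | peer_pairs | awk '$3 == "Yes"'
```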



However, not everything is rosy:

root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
root at gpu3$ make
>>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<


even though those libraries are installed. For example:

root at gpu3$ yum whatprovides  */libX11.so
libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
Repo        : core
Matched from:
Filename    : /usr/lib/libX11.so
 
The relevant development packages are

mesa-libGLU-devel
mesa-libGL-devel
xorg-x11-drv-nvidia-devel

but yum reports that they are already installed:

root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
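
Note that the whatprovides match above is the i686 package, with the file under /usr/lib, whereas x86_64 development packages on EL7 normally place their .so links under /usr/lib64. Whether that mismatch is what trips the samples' check is not established here. A minimal sketch for seeing which directories actually hold the three libraries follows; the helper name `find_lib` is hypothetical and the directory list in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print every candidate directory that contains
# the named library file. Usage: find_lib NAME DIR...
find_lib() {
    name=$1
    shift
    for dir in "$@"; do
        [ -e "$dir/$name" ] && printf '%s\n' "$dir/$name"
    done
    return 0
}

# Typical use (the directories listed are assumptions):
#   for lib in libGL.so libGLU.so libX11.so; do
#       find_lib "$lib" /usr/lib64 /usr/lib /usr/lib64/nvidia
#   done
```

A library that shows up only under /usr/lib, or not at all, would be the first thing to look at.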

Also, gpuDevice hangs when called from MATLAB.

So we still don't have a working installation. Any help would be
appreciated.

Best,
Predrag

P.S. Once we have a working installation we can think about installing
Caffe and TensorFlow. For now we have to figure out why things are not
working.






> 
> > On Oct 12, 2016, at 6:26 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
> > 
> > Dear Autonians,
> > 
> > GPU3 is "configured". Namely, you can log into it and all packages are
> > installed. However, I couldn't get the NVIDIA-provided CUDA driver to
> > recognize the GPU cards. They appear to be properly installed from the
> > hardware point of view and you can list them with
> > 
> > lshw -class display
> > 
> > root at gpu3$ lshw -class display
> >  *-display UNCLAIMED     
> >       description: VGA compatible controller
> >       product: NVIDIA Corporation
> >       vendor: NVIDIA Corporation
> >       physical id: 0
> >       bus info: pci at 0000:02:00.0
> >       version: a1
> >       width: 64 bits
> >       clock: 33MHz
> >       capabilities: pm msi pciexpress vga_controller cap_list
> >       configuration: latency=0
> >       resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cf000000-cfffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff ioport:6000(size=128) memory:d0000000-d007ffff
> >  *-display UNCLAIMED
> >       description: VGA compatible controller
> >       product: NVIDIA Corporation
> >       vendor: NVIDIA Corporation
> >       physical id: 0
> >       bus info: pci at 0000:03:00.0
> >       version: a1
> >       width: 64 bits
> >       clock: 33MHz
> >       capabilities: pm msi pciexpress vga_controller cap_list
> >       configuration: latency=0
> >       resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff memory:383fd0000000-383fd1ffffff ioport:5000(size=128) memory:ce000000-ce07ffff
> >  *-display
> >       description: VGA compatible controller
> >       product: ASPEED Graphics Family
> >       vendor: ASPEED Technology, Inc.
> >       physical id: 0
> >       bus info: pci at 0000:06:00.0
> >       version: 30
> >       width: 32 bits
> >       clock: 33MHz
> >       capabilities: pm msi vga_controller bus_master cap_list rom
> >       configuration: driver=ast latency=0
> >       resources: irq:19 memory:cb000000-cbffffff memory:cc000000-cc01ffff ioport:4000(size=128)
> >  *-display UNCLAIMED
> >       description: VGA compatible controller
> >       product: NVIDIA Corporation
> >       vendor: NVIDIA Corporation
> >       physical id: 0
> >       bus info: pci at 0000:82:00.0
> >       version: a1
> >       width: 64 bits
> >       clock: 33MHz
> >       capabilities: pm msi pciexpress vga_controller cap_list
> >       configuration: latency=0
> >       resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:fa000000-faffffff memory:387fe0000000-387fefffffff memory:387ff0000000-387ff1ffffff ioport:e000(size=128) memory:fb000000-fb07ffff
> >  *-display UNCLAIMED
> >       description: VGA compatible controller
> >       product: NVIDIA Corporation
> >       vendor: NVIDIA Corporation
> >       physical id: 0
> >       bus info: pci at 0000:83:00.0
> >       version: a1
> >       width: 64 bits
> >       clock: 33MHz
> >       capabilities: pm msi pciexpress vga_controller cap_list
> >       configuration: latency=0
> >       resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff memory:387fd0000000-387fd1ffffff ioport:d000(size=128) memory:f9000000-f907ffff
> > 
> > 
> > However what scares the hell out of me is that I don't see NVIDIA driver
> > loaded
> > 
> > lsmod|grep nvidia
> > 
> > and the device nodes /dev/nvidia* are not created. I am guessing I just
> > missed some trivial step during the CUDA installation, which is very
> > involved. I am unfortunately too tired to debug this tonight.
> > 
> > Predrag


