GPU3 is "configured"

Arne Suppe suppe at andrew.cmu.edu
Wed Oct 12 23:26:48 EDT 2016


Hmm - I don't use MATLAB for deep learning, but gpuDevice also hangs on my computer with R2016a.

I was able to compile the matrixMul example in the CUDA samples and run it on gpu3, so I think the build environment is probably all set.
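
For anyone repeating that check, a minimal sketch follows. It assumes the sample prints a "Result = PASS" line on success, as the deviceQuery output quoted below does. The sample_passed helper name and the matrixMul path in the comments are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Succeed if the text on stdin contains the "Result = PASS" line that the
# CUDA samples print when their self-check succeeds.
sample_passed() {
    grep -q 'Result = PASS'
}

# Hypothetical usage (the directory is an assumption modelled on the nbody
# path quoted below; adjust it to wherever the samples were unpacked):
#   cd ~/NVIDIA_CUDA-8.0_Samples/0_Simple/matrixMul && make
#   ./matrixMul | sample_passed && echo "matrixMul OK"
```

The helper only looks at the program's output, so it says nothing about performance, only that the sample built, ran, and validated its own result.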

As for OpenGL, I think it's possibly a problem with their build script, findgl.mk, which is not familiar with Springdale OS.  The demo_suite directory has a precompiled nbody binary you may try, but I suspect most users will not need graphics.
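
One quick way to see where the libraries actually live is sketched below. The find_lib helper and the directory list are assumptions for illustration only. Note that the yum output quoted below matches only the i686 package under /usr/lib, while 64-bit libraries on el7 normally sit in /usr/lib64, so it may be worth checking both.

```shell
#!/bin/sh
# Print the first path among the given directories that holds the named
# library; return non-zero if none of them does.
# Usage: find_lib NAME DIR...
find_lib() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Check the three libraries the nbody Makefile warns about.
# The directory list is an assumption about a typical el7 layout.
for lib in libGL.so libGLU.so libX11.so; do
    find_lib "$lib" /usr/lib64 /usr/lib || echo "$lib: not found"
done
```

If the files show up in a directory the build script does not search, that would be consistent with the script not recognizing the distribution, though this sketch does not prove it.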

Arne




> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
> 
> Arne Suppe <suppe at andrew.cmu.edu> wrote:
> 
>> Hi Predrag,
>> Don't know if this applies to you, but I just built a machine with a GTX 1080, which has the same Pascal architecture as the Titan.  After installing CUDA 8, I still found I needed to install the latest driver from the NVIDIA web site to get the card recognized.  Right now, I am running 367.44.
>> 
>> Arne
> 
> Arne,
> 
> Thank you so much for this e-mail. Yes, it is the damn Pascal architecture; I
> see lots of people complaining about it on the forums. I downloaded and
> installed the driver from
> 
> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
> 
> That seems to have made a real difference. Check out these beautiful outputs:
> 
> root at gpu3$ ls nvidia*
> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm  nvidia-uvm-tools
> 
> root at gpu3$ lspci | grep -i nvidia
> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 
> 
> root at gpu3$ ls /proc/driver
> nvidia  nvidia-uvm  nvram  rtc
> 
> root at gpu3$ lsmod |grep nvidia
> nvidia_uvm            738901  0 
> nvidia_drm             43405  0 
> nvidia_modeset        764432  1 nvidia_drm
> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
> drm_kms_helper        125056  2 ast,nvidia_drm
> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
> 
> root at gpu3$ nvidia-smi 
> Wed Oct 12 22:03:27 2016       
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> 
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
> 
> 
> 
> /usr/local/cuda/extras/demo_suite/deviceQuery 
> 
>  Alignment requirement for Surfaces:            Yes
>  Device has ECC support:                        Disabled
>  Device supports Unified Addressing (UVA):      Yes
>  Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>  Compute Mode:
>     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
> 
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = TITAN X (Pascal)
> Result = PASS
> 
> 
> 
> Now, not everything is rosy:
> 
> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> root at gpu3$ make
> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> 
> 
> even though those are installed. For example
> 
> root at gpu3$ yum whatprovides  */libX11.so
> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> Repo        : core
> Matched from:
> Filename    : /usr/lib/libX11.so
> 
> also
> 
> mesa-libGLU-devel
> mesa-libGL-devel
> xorg-x11-drv-nvidia-devel
> 
> but 
> 
> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
> 
> Also, gpuDevice hangs when called from MATLAB.
> 
> So we still don't have a working installation. Any help would be
> appreciated.
> 
> Best,
> Predrag
> 
> P.S. Once we have a working installation, we can think about installing
> Caffe and TensorFlow. For now, we have to find out why things are not
> working.
> 
> 
> 
> 
> 
> 
>> 
>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>>> 
>>> Dear Autonians,
>>> 
>>> GPU3 is "configured". Namely, you can log into it and all packages are
>>> installed. However, I couldn't get the NVIDIA-provided CUDA driver to
>>> recognize the GPU cards. They appear to be properly installed from the
>>> hardware point of view, and you can list them with
>>> 
>>> lshw -class display
>>> 
>>> root at gpu3$ lshw -class display
>>> *-display UNCLAIMED     
>>>      description: VGA compatible controller
>>>      product: NVIDIA Corporation
>>>      vendor: NVIDIA Corporation
>>>      physical id: 0
>>>      bus info: pci at 0000:02:00.0
>>>      version: a1
>>>      width: 64 bits
>>>      clock: 33MHz
>>>      capabilities: pm msi pciexpress vga_controller cap_list
>>>      configuration: latency=0
>>>      resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cf000000-cfffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff ioport:6000(size=128) memory:d0000000-d007ffff
>>> *-display UNCLAIMED
>>>      description: VGA compatible controller
>>>      product: NVIDIA Corporation
>>>      vendor: NVIDIA Corporation
>>>      physical id: 0
>>>      bus info: pci at 0000:03:00.0
>>>      version: a1
>>>      width: 64 bits
>>>      clock: 33MHz
>>>      capabilities: pm msi pciexpress vga_controller cap_list
>>>      configuration: latency=0
>>>      resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff memory:383fd0000000-383fd1ffffff ioport:5000(size=128) memory:ce000000-ce07ffff
>>> *-display
>>>      description: VGA compatible controller
>>>      product: ASPEED Graphics Family
>>>      vendor: ASPEED Technology, Inc.
>>>      physical id: 0
>>>      bus info: pci at 0000:06:00.0
>>>      version: 30
>>>      width: 32 bits
>>>      clock: 33MHz
>>>      capabilities: pm msi vga_controller bus_master cap_list rom
>>>      configuration: driver=ast latency=0
>>>      resources: irq:19 memory:cb000000-cbffffff memory:cc000000-cc01ffff ioport:4000(size=128)
>>> *-display UNCLAIMED
>>>      description: VGA compatible controller
>>>      product: NVIDIA Corporation
>>>      vendor: NVIDIA Corporation
>>>      physical id: 0
>>>      bus info: pci at 0000:82:00.0
>>>      version: a1
>>>      width: 64 bits
>>>      clock: 33MHz
>>>      capabilities: pm msi pciexpress vga_controller cap_list
>>>      configuration: latency=0
>>>      resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:fa000000-faffffff memory:387fe0000000-387fefffffff memory:387ff0000000-387ff1ffffff ioport:e000(size=128) memory:fb000000-fb07ffff
>>> *-display UNCLAIMED
>>>      description: VGA compatible controller
>>>      product: NVIDIA Corporation
>>>      vendor: NVIDIA Corporation
>>>      physical id: 0
>>>      bus info: pci at 0000:83:00.0
>>>      version: a1
>>>      width: 64 bits
>>>      clock: 33MHz
>>>      capabilities: pm msi pciexpress vga_controller cap_list
>>>      configuration: latency=0
>>>      resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff memory:387fd0000000-387fd1ffffff ioport:d000(size=128) memory:f9000000-f907ffff
>>> 
>>> 
>>> However, what scares the hell out of me is that I don't see the NVIDIA
>>> driver loaded
>>> 
>>> lsmod|grep nvidia
>>> 
>>> and the device nodes /dev/nvidia* are not created. I am guessing I just
>>> missed some trivial step during the CUDA installation, which is very
>>> involved. Unfortunately, I am too tired to debug this tonight.
>>> 
>>> Predrag
> 



