GPU3 is "configured"
Arne Suppe
suppe at andrew.cmu.edu
Wed Oct 12 23:26:48 EDT 2016
Hmm - I don't use MATLAB for deep learning, but gpuDevice also hangs on my computer with R2016a.
I was able to compile the matrixMul example in the CUDA samples and run it on gpu3, so I think the build environment is probably all set.
As for OpenGL, I think it's possibly a problem with their build script findgl.mk, which is not familiar with Springdale OS. The demo_suite directory has a precompiled nbody binary you may try, but I suspect most users will not need graphics.
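For anyone who wants to repeat the sanity check, it was roughly this (a sketch; the paths are the stock CUDA 8 locations, and the last line just reproduces from a shell the MATLAB call that currently hangs):

    # build and run one of the bundled CUDA samples to confirm the toolchain
    cd ~/NVIDIA_CUDA-8.0_Samples/0_Simple/matrixMul
    make && ./matrixMul

    # precompiled binaries that need no OpenGL headers live in the demo suite
    /usr/local/cuda/extras/demo_suite/deviceQuery
    /usr/local/cuda/extras/demo_suite/nbody -benchmark

    # reproduce the gpuDevice symptom without the MATLAB desktop (R2016a)
    matlab -nodisplay -r "gpuDevice, exit"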
Arne
> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>
> Arne Suppe <suppe at andrew.cmu.edu> wrote:
>
>> Hi Predrag,
>> Don't know if this applies to you, but I just built a machine with a GTX 1080, which has the same Pascal architecture as the Titan. After installing CUDA 8, I still found I needed to install the latest driver off of the NVIDIA web site to get the card recognized. Right now, I am running 367.44.
>>
>> Arne
>
> Arne,
>
> Thank you so much for this e-mail. Yes, it is the damn Pascal architecture;
> I see lots of people complaining about it on the forums. I downloaded and
> installed the driver from
>
> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
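>
> (For the record, the install itself was the usual .run-file routine; a sketch,
> assuming the file was fetched to the current directory and X was not running:)
>
> root at gpu3$ sh ./NVIDIA-Linux-x86_64-367.57.run
> root at gpu3$ nvidia-smi    # quick check that the new driver is active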
>
> That seems to have made a real difference. Check out these beautiful outputs:
>
> root at gpu3$ ls nvidia*
> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm  nvidia-uvm-tools
>
> root at gpu3$ lspci | grep -i nvidia
> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>
>
> root at gpu3$ ls /proc/driver
> nvidia nvidia-uvm nvram rtc
>
> root at gpu3$ lsmod |grep nvidia
> nvidia_uvm 738901 0
> nvidia_drm 43405 0
> nvidia_modeset 764432 1 nvidia_drm
> nvidia 11492947 2 nvidia_modeset,nvidia_uvm
> drm_kms_helper 125056 2 ast,nvidia_drm
> drm 349210 5 ast,ttm,drm_kms_helper,nvidia_drm
> i2c_core 40582 7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>
> root at gpu3$ nvidia-smi
> Wed Oct 12 22:03:27 2016
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
>
>
> /usr/local/cuda/extras/demo_suite/deviceQuery
>
> Alignment requirement for Surfaces: Yes
> Device has ECC support: Disabled
> Device supports Unified Addressing (UVA): Yes
> Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0
> Compute Mode:
>    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
> CUDA Runtime Version = 8.0, NumDevs = 4,
> Device0 = TITAN X (Pascal), Device1 = TITAN X (Pascal),
> Device2 = TITAN X (Pascal), Device3 = TITAN X (Pascal)
> Result = PASS
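>
> (The Yes/No pattern presumably just mirrors the PCIe layout: the 02/03 and
> 82/83 cards hang off different root complexes, so peer access only works
> within each pair. A sketch of how to double-check, assuming this driver's
> nvidia-smi already has the topology subcommand:)
>
> root at gpu3$ nvidia-smi topo -m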
>
>
>
> Now, not everything is rosy.
>
> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> root at gpu3$ make
>>>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>>>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>
>
> even though those libraries are installed. For example:
>
> root at gpu3$ yum whatprovides */libX11.so
> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> Repo : core
> Matched from:
> Filename : /usr/lib/libX11.so
>
> The following are also installed:
>
> mesa-libGLU-devel
> mesa-libGL-devel
> xorg-x11-drv-nvidia-devel
>
> but
>
> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
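>
> (One thing worth trying: the whatprovides hit above is the 32-bit i686 package,
> which owns /usr/lib/libX11.so, while the samples link 64-bit, and Arne already
> suspects findgl.mk does not recognize Springdale. A sketch of a workaround,
> assuming findgl.mk honors a GLPATH override and the 64-bit .so symlinks exist
> under /usr/lib64:)
>
> root at gpu3$ ls /usr/lib64/libGL.so /usr/lib64/libGLU.so /usr/lib64/libX11.so
> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> root at gpu3$ make GLPATH=/usr/lib64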
>
> Also, gpuDevice hangs from within MATLAB.
>
> So we still don't have a working installation. Any help would be
> appreciated.
>
> Best,
> Predrag
>
> P.S. Once we have a working installation we can think about installing
> Caffe and TensorFlow. For now we have to see why things are not working.
>
>
>
>
>
>
>>
>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac <predragp at cs.cmu.edu> wrote:
>>>
>>> Dear Autonians,
>>>
>>> GPU3 is "configured". Namely, you can log into it and all packages are
>>> installed. However, I couldn't get the NVIDIA-provided CUDA driver to
>>> recognize the GPU cards. They appear to be properly installed from the
>>> hardware point of view, and you can list them with
>>>
>>> lshw -class display
>>>
>>> root at gpu3$ lshw -class display
>>> *-display UNCLAIMED
>>> description: VGA compatible controller
>>> product: NVIDIA Corporation
>>> vendor: NVIDIA Corporation
>>> physical id: 0
>>> bus info: pci at 0000:02:00.0
>>> version: a1
>>> width: 64 bits
>>> clock: 33MHz
>>> capabilities: pm msi pciexpress vga_controller cap_list
>>> configuration: latency=0
>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>> memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>>> memory:d0000000-d007ffff
>>> *-display UNCLAIMED
>>> description: VGA compatible controller
>>> product: NVIDIA Corporation
>>> vendor: NVIDIA Corporation
>>> physical id: 0
>>> bus info: pci at 0000:03:00.0
>>> version: a1
>>> width: 64 bits
>>> clock: 33MHz
>>> capabilities: pm msi pciexpress vga_controller cap_list
>>> configuration: latency=0
>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>> memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>>> memory:ce000000-ce07ffff
>>> *-display
>>> description: VGA compatible controller
>>> product: ASPEED Graphics Family
>>> vendor: ASPEED Technology, Inc.
>>> physical id: 0
>>> bus info: pci at 0000:06:00.0
>>> version: 30
>>> width: 32 bits
>>> clock: 33MHz
>>> capabilities: pm msi vga_controller bus_master cap_list rom
>>> configuration: driver=ast latency=0
>>> resources: irq:19 memory:cb000000-cbffffff
>>> memory:cc000000-cc01ffff ioport:4000(size=128)
>>> *-display UNCLAIMED
>>> description: VGA compatible controller
>>> product: NVIDIA Corporation
>>> vendor: NVIDIA Corporation
>>> physical id: 0
>>> bus info: pci at 0000:82:00.0
>>> version: a1
>>> width: 64 bits
>>> clock: 33MHz
>>> capabilities: pm msi pciexpress vga_controller cap_list
>>> configuration: latency=0
>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>> memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>>> memory:fb000000-fb07ffff
>>> *-display UNCLAIMED
>>> description: VGA compatible controller
>>> product: NVIDIA Corporation
>>> vendor: NVIDIA Corporation
>>> physical id: 0
>>> bus info: pci at 0000:83:00.0
>>> version: a1
>>> width: 64 bits
>>> clock: 33MHz
>>> capabilities: pm msi pciexpress vga_controller cap_list
>>> configuration: latency=0
>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>> memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>>> memory:f9000000-f907ffff
>>>
>>>
>>> However, what scares the hell out of me is that I don't see the NVIDIA
>>> driver loaded:
>>>
>>> lsmod|grep nvidia
>>>
>>> and the device nodes /dev/nvidia are not created. I am guessing I just
>>> missed some trivial step during the CUDA installation, which is quite
>>> involved. I am unfortunately too tired to debug this tonight.
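>>>
>>> (If anyone wants to poke at it before I get back to it, the obvious first
>>> checks are roughly these; a sketch, nothing gpu3-specific assumed:)
>>>
>>> lspci | grep -i nvidia          # cards visible on the PCI bus
>>> modprobe nvidia; dmesg | tail   # try loading the module and read any errors
>>> ls -l /dev/nvidia*              # device nodes appear once the module loads
>>> nvidia-smi                      # driver/GPU handshake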
>>>
>>> Predrag
>