GPU8

Predrag Punosevac predragp at andrew.cmu.edu
Thu Mar 29 14:44:25 EDT 2018


Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:

> Hello Predrag,
> 
> There might be a bug with GPU8 also.
> I didn't have time to test it yet, but python crashes when trying to call
> keras.

I did a cold reboot. It didn't help. I think what we are seeing is a bug
in driver 390.30. The bug could be specific to the Titan Xp, which would
explain why the older machines are working. Nvidia has a website where one
can download the scripts used to recompile the latest driver. I think the
latest driver is 390.48, which is quite a few versions ahead of 390.30.
I am installing it right now on GPU9. If that doesn't work, I will try
downgrading the kernel, on the assumption that it is a kernel bug. The
following kernels are available:

kernel.x86_64                    3.10.0-693.5.2.el7 @updates
kernel.x86_64                    3.10.0-693.11.6.el7 @updates
kernel.x86_64                    3.10.0-693.21.1.el7  

Right now I am running 3.10.0-693.21.1, but we can try going one or even
two kernels back.

If all that fails, I still have a few magic tricks up my sleeve, but they
are related to the motherboard firmware. GPU8 and GPU9 have the same
motherboards; the other servers do not.

Best,
Predrag



> Unlike GPU 5, 6 & 9, you can actually get the GPU working, but when I run a
> keras prediction function it crashes and says:
> 
> Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source
> was compiled with 7004 (compatibility version 7000).  If using a binary
> install, upgrade your CuDNN library to match.  If building from sources,
> make sure the library loaded at runtime matches a compatible version
> specified during compile configuration.
> 2018-03-29 09:57:49.807855: F tensorflow/core/kernels/conv_ops.cc:717]
> Check failed: stream->parent()->GetConvolveAlgorithms(
> conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
> 
> Same code works on GPU4.
> I know this is not very informative. I'll look into it later; I just
> wanted to give you a heads up.
> I think this might be why there aren't any users on GPU8 but there are on
> GPU4.
> 
> Thanks,
> Yotam.
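The quoted error is TensorFlow reporting a cuDNN version mismatch: the
library loaded at runtime reports 7101 (cuDNN 7.1.1), while the TensorFlow
binary was built against 7004 (cuDNN 7.0.4). Below is a minimal sketch for
checking which cuDNN a machine actually loads. It assumes cuDNN 7's
major*1000 + minor*100 + patch version encoding and a shared library named
libcudnn.so.7; the helper function names are hypothetical. cudnnGetVersion
is the cuDNN C API call that returns the version number.

```python
import ctypes
from typing import Optional, Tuple


def decode_cudnn_version(version: int) -> Tuple[int, int, int]:
    """Split a cuDNN version integer such as 7101 into (major, minor, patch).

    Assumes the cuDNN 7 encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compatibility_version(version: int) -> int:
    """Drop the patch level, matching the log: 7101 -> 7100, 7004 -> 7000."""
    return (version // 100) * 100


def query_runtime_cudnn_version(libname: str = "libcudnn.so.7") -> Optional[int]:
    """Ask the cuDNN library found by the dynamic loader for its version.

    The library file name is an assumption; returns None if it cannot be loaded.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # size_t cudnnGetVersion(void) in the cuDNN C API
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetVersion.argtypes = []
    return int(lib.cudnnGetVersion())


if __name__ == "__main__":
    compiled = 7004  # version TensorFlow says it was built against (from the log)
    runtime = query_runtime_cudnn_version()
    if runtime is None:
        print("libcudnn.so.7 could not be loaded")
    else:
        print("runtime cuDNN  %d.%d.%d" % decode_cudnn_version(runtime))
        print("compiled cuDNN %d.%d.%d" % decode_cudnn_version(compiled))
        if compatibility_version(runtime) != compatibility_version(compiled):
            print("compatibility versions differ, as in the GPU8 error")
```

Running this on GPU8 and GPU4 would show whether the two machines load
different cuDNN builds, which could explain why the same code works on
GPU4.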


More information about the Autonlab-users mailing list