GPU8
Predrag Punosevac
predragp at andrew.cmu.edu
Thu Mar 29 14:44:25 EDT 2018
Yotam Hechtlinger <yhechtli at andrew.cmu.edu> wrote:
> Hello Predrag,
>
> There might be a bug with GPU8 also.
> I didn't have time to test it yet, but python crashes when trying to call
> keras.
I did a cold reboot. It didn't help. I think what we are seeing is a bug in
driver 390.30. The bug could be Titan Xp specific, which would explain why
the older machines still work. Nvidia has a website where one can download
the scripts which one can use to recompile the latest driver. I think the
latest driver is 390.48, which is quite a few versions ahead of 390.30.
I am installing it right now on GPU9. If that doesn't work I will try
downgrading the kernel, on the assumption that it is a kernel bug. The
following kernels are available:
kernel.x86_64 3.10.0-693.5.2.el7 @updates
kernel.x86_64 3.10.0-693.11.6.el7 @updates
kernel.x86_64 3.10.0-693.21.1.el7
Right now I am running 3.10.0-693.21.1 but we can try to go one or even
two kernels back.
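A rough sketch of the fallback order, in Python, assuming only the three
kernel strings listed above. The helper name kernel_key is made up for
illustration. The numeric fields are compared as integers because a plain
string sort would put 693.5.2 after 693.11.6.

```python
import re

# Kernel release strings copied from the list above.
kernels = [
    "3.10.0-693.5.2.el7",
    "3.10.0-693.11.6.el7",
    "3.10.0-693.21.1.el7",
]

def kernel_key(release):
    """Hypothetical helper: turn the numeric fields into a tuple of ints."""
    # Drop the ".el7" suffix so its digit is not treated as a version field.
    return tuple(int(n) for n in re.findall(r"\d+", release.split(".el")[0]))

running = "3.10.0-693.21.1.el7"

# Kernels older than the running one, newest first: the order in which
# a downgrade would be tried.
fallbacks = sorted(
    (k for k in kernels if kernel_key(k) < kernel_key(running)),
    key=kernel_key,
    reverse=True,
)
print(fallbacks)
```

This only orders the candidates; actually booting an older kernel is a
separate step that depends on the bootloader setup.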
If all that fails I still have a few magic tricks up my sleeve, but they are
related to motherboard firmware. GPU8 and GPU9 have the same
motherboard, unlike the other servers.
Best,
Predrag
> Unlike GPU 5, 6 & 9, you can actually get the GPU working, but when I run a
> keras prediction function it crashes and says:
>
> Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source
> was compiled with 7004 (compatibility version 7000). If using a binary
> install, upgrade your CuDNN library to match. If building from sources,
> make sure the library loaded at runtime matches a compatible version
> specified during compile configuration.
> 2018-03-29 09:57:49.807855: F tensorflow/core/kernels/conv_ops.cc:717]
> Check failed: stream->parent()->GetConvolveAlgorithms(
> conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
>
> Same code works on GPU4.
> I know this is not informative, I'll look into it later, just wanted to
> give you a heads up.
> I think this might be why there aren't any users on GPU8 but there are on
> GPU4.
>
> Thanks,
> Yotam.
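
The version numbers in the quoted error can be decoded as follows. This is
a minimal sketch in Python, assuming the cuDNN 7 encoding
major*1000 + minor*100 + patch, and assuming the "compatibility version"
is the same number with the patch digits dropped, which is consistent with
the values in the log (7101 -> 7100, 7004 -> 7000). The function names are
made up for illustration and are not part of TensorFlow or cuDNN.

```python
def decode_cudnn(version):
    """Split a cuDNN 7-style version integer into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

def compat_version(version):
    """Assumed rule: zero out the patch digits (e.g. 7101 -> 7100)."""
    return version // 100 * 100

loaded, compiled = 7101, 7004  # values taken from the error message

print(decode_cudnn(loaded))     # (7, 1, 1): library found at runtime
print(decode_cudnn(compiled))   # (7, 0, 4): library TensorFlow was built with
print(compat_version(loaded) == compat_version(compiled))  # False
```

Read this way, the machine has cuDNN 7.1.1 installed while the TensorFlow
binary was built against cuDNN 7.0.4, so the minor versions differ.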