incompatible cudnn.h and libcublas?

Predrag Punosevac predragp at andrew.cmu.edu
Mon Apr 19 11:58:53 EDT 2021


Hi Ben,

That is super useful info, and exactly the feedback I was hoping to get by
CC-ing users at autonlab. There are just too many moving parts, and 90% of
the time changing nothing is the correct approach to system administration.

Best,
Predrag

On Mon, Apr 19, 2021 at 10:30 AM Benedikt Boecking <boecking at andrew.cmu.edu>
wrote:

> Just FYI, as far as I am aware, PyTorch only supports CUDA up to 11.1 for
> now. It would be great if we could hold off on updating CUDA to 11.3, since
> many lab members rely on PyTorch.
>
>
>
> On Apr 18, 2021, at 10:20 PM, Predrag Punosevac <predragp at andrew.cmu.edu>
> wrote:
>
> Hi Ifigeneia,
>
> I am CC-ing the list, as this might be of wider interest to the lab members.
>
> This seems to be a cuDNN issue. gpu1 runs CUDA 11.2 on RHEL 7.9, while gpu2
> runs CUDA 11 on RHEL 7.9. The current CUDA release is 11.3, and all recently
> provisioned computing nodes run RHEL 8.3. In an ideal world I would first
> upgrade all computing nodes to 8.3 and all CUDA installations to 11.3
> before we talk about cuDNN libraries. cuDNN is proprietary software. I
> logged into my NVIDIA developer account and I am downloading the RHEL 8.1
> RPMs of cuDNN v8.1, released on February 26. That release is supposedly
> compatible with all versions of the CUDA 11 branch, i.e. 11.0, 11.1, 11.2,
> and 11.3, but it targets RHEL 8.1 (so there is no guarantee that it will
> run on 8.3). I can download RPMs for RHEL 7.3, but obviously there is no
> guarantee that they will work on RHEL 7.9.
>
> Upgrading 7.9 to 8.3 on 30+ computing nodes is not realistic. The downtime
> would be significant. Updating CUDA and cuDNN across 23+ servers is also
> nontrivial, as it requires a reboot. Upgrading CUDA on 5 GPU servers per
> week seems a more reasonable and less risky approach. Are there any
> impending deadlines that I should be aware of? If Ben, who is CC'd on this
> email, confirms, I would be happy to try to upgrade CUDA to 11.3 on
> GPU[1-5] and install cuDNN v8.1, but I will not upgrade the OS to 8.3.
>
> Best,
> Predrag
>
> On Sat, Apr 17, 2021 at 10:40 AM Ifigeneia Apostolopoulou <
> iapostol at andrew.cmu.edu> wrote:
>
>> Hi Predrag,
>>
>> On gpu1/gpu2, I'm getting the following error:
>>
>> RuntimeError: Mixed dnn version. The header is version 8002 while the
>> library is version 7605.
>>
>> It seems that there is an updated cudnn.h in /usr/include/ but not in
>>
>> /usr/local/cuda-11/include
>> /usr/local/cuda-11/targets/include/
>>
>> On gpu20, there seems to be no cudnn.h.
>>
>> Would it be possible to sync cudnn.h?
>>
>> Thanks!
>>
>>
>>
>
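
For anyone else hitting the "Mixed dnn version" error quoted above, here is a
minimal sketch of how the two numbers can be decoded and how the headers on a
node can be compared. It assumes the cuDNN 7.x/8.x encoding, where the version
integer is MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, and that the macros live
in cudnn.h (7.x) or cudnn_version.h (8.x). The directory list is copied from
the thread and may differ per host; the function names are made up for this
example and are not part of any NVIDIA tool.

```python
import re
from pathlib import Path

# Include directories mentioned in the thread (assumption: adjust per host).
INCLUDE_DIRS = [
    "/usr/include",
    "/usr/local/cuda-11/include",
    "/usr/local/cuda-11/targets/include",
]

# Matches lines such as "#define CUDNN_MAJOR 8".
_MACRO = re.compile(
    r"^\s*#\s*define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", re.M
)


def decode_cudnn_version(code):
    """Split a cuDNN 7.x/8.x version integer into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def header_version(text):
    """Return (major, minor, patch) from header text, or None if absent."""
    found = {name: int(value) for name, value in _MACRO.findall(text)}
    if {"MAJOR", "MINOR", "PATCHLEVEL"} <= found.keys():
        return found["MAJOR"], found["MINOR"], found["PATCHLEVEL"]
    return None


def scan(include_dirs=INCLUDE_DIRS):
    """Map each cuDNN header found under include_dirs to its version."""
    results = {}
    for directory in include_dirs:
        for name in ("cudnn_version.h", "cudnn.h"):
            path = Path(directory) / name
            if path.is_file():
                version = header_version(path.read_text(errors="replace"))
                if version is not None:
                    results[str(path)] = version
    return results


if __name__ == "__main__":
    for code in (8002, 7605):
        print(code, "->", ".".join(map(str, decode_cudnn_version(code))))
    for path, version in scan().items():
        print(path, "->", ".".join(map(str, version)))
```

Under that encoding, 8002 is cuDNN 8.0.2 (the header) and 7605 is cuDNN 7.6.5
(the library), so the error points to a v8 header sitting next to a v7 shared
library. The scan only reads headers; it does not check which libcudnn the
dynamic linker actually loads.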