<div dir="ltr">Predrag, now it works fine. thanks a million! :-D<div><br></div><div>gpu2,10,11,12,13,14,21 seem to have a similar issue.</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 18, 2020 at 5:23 PM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu">predragp@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Ifigeneia Apostolopoulou <<a href="mailto:iapostol@andrew.cmu.edu" target="_blank">iapostol@andrew.cmu.edu</a>> wrote:<br>
<br>
> Yes, but there is still no bin/ptxas in CUDA 10.2. Actually, there's no bin<br>
> directory. It seems that cuda-10.2 is corrupted?<br>
> <br>
<br>
I took a clue from your message and did a fresh installation of CUDA<br>
on GPU1 only. I upgraded the kernel and the driver to the latest versions<br>
supporting the 7.8 branch of Red Hat. The driver works as expected in my<br>
limited testing. CUDA is upgraded to the newly released 11.0. I really<br>
hate that NVIDIA is intentionally breaking previous stable releases as<br>
soon as a new one is branched out. <br>
<br>
Could you please try building TensorFlow on GPU1 and report your<br>
progress? We will eventually have to upgrade all GPU nodes to CUDA 11<br>
even if they are fully working now. <br>
<br>
Best,<br>
Predrag<br>
<br>
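A quick way to check a node before building is to see where the cuda folder resolves and whether bin/ptxas and extras/CUPTI are present under it. Below is a minimal Python sketch, assuming the toolkit lives under /usr/local/cuda as in the paths quoted in this thread. The function name check_cuda_root is made up for illustration and is not part of any CUDA or TensorFlow tooling.<br>
<br>
<pre>
```python
import os


def check_cuda_root(root):
    """Report where root resolves and whether ptxas and CUPTI exist under it."""
    # realpath follows symbolic links, e.g. a cuda link pointing at cuda-10.2
    resolved = os.path.realpath(root)
    return {
        "resolved": resolved,
        # TensorFlow's warning in this thread is about a missing bin/ptxas
        "ptxas": os.path.isfile(os.path.join(resolved, "bin", "ptxas")),
        # CUPTI is expected under extras/CUPTI in the paths quoted here
        "cupti": os.path.isdir(os.path.join(resolved, "extras", "CUPTI")),
    }


if __name__ == "__main__":
    # Assumed default location; change it if the toolkit is installed elsewhere.
    print(check_cuda_root("/usr/local/cuda"))
```
</pre>
If either entry comes back False, the toolkit directory on that node is likely incomplete or the link points at the wrong place, which is worth reporting along with the node name.<br>
<br>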
<br>
<br>
> On Tue, Aug 18, 2020 at 11:41 AM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu" target="_blank">predragp@andrew.cmu.edu</a>><br>
> wrote:<br>
> <br>
> > Because the cuda folder is the cuda-10.2 folder. The cuda folder is typically just a<br>
> > symbolic link to the current version of CUDA.<br>
> ><br>
> > On Tue, Aug 18, 2020, 11:31 AM Kyle Miller <<a href="mailto:mille856@andrew.cmu.edu" target="_blank">mille856@andrew.cmu.edu</a>><br>
> > wrote:<br>
> ><br>
> >> I see. I ran a few find commands on gpu13 and couldn't find a cuda folder<br>
> >> or CUPTI.<br>
> >><br>
> >> On Tue, Aug 18, 2020 at 10:00 AM Ifigeneia Apostolopoulou <<br>
> >> <a href="mailto:iapostol@andrew.cmu.edu" target="_blank">iapostol@andrew.cmu.edu</a>> wrote:<br>
> >><br>
> >>> Hi Kyle,<br>
> >>> Thanks a lot for your reply!<br>
> >>><br>
> >>> I also had this issue and I solved it as you did. However, this seems to<br>
> >>> be another issue:<br>
> >>> I currently can't see CUPTI in /usr/local/cuda/extras/CUPTI (or anywhere<br>
> >>> on gpu1 to add it to my path), which causes the issue.<br>
> >>> I am also attaching a screenshot with the working (gpu3) and<br>
> >>> not-working (gpu1) cases. On gpu1, gpu2, and gpu13, it seems that the cuda<br>
> >>> directory (and all its contents) has been moved (and I can't find it in any other<br>
> >>> directory).<br>
> >>><br>
> >>><br>
> >>><br>
> >>><br>
> >>><br>
> >>> On Tue, Aug 18, 2020 at 9:32 AM Kyle Miller <<a href="mailto:mille856@andrew.cmu.edu" target="_blank">mille856@andrew.cmu.edu</a>><br>
> >>> wrote:<br>
> >>><br>
> >>>> Ifi,<br>
> >>>> I recently had difficulty on GPU13, having not used it in a long<br>
> >>>> while. For me, the issue was that miniconda had moved. I added<br>
> >>>> /opt/miniconda-py38/bin to my path and rebuilt my environment (not sure if<br>
> >>>> that was necessary). Then it worked.<br>
> >>>> -Kyle<br>
> >>>><br>
> >>>> On Tue, Aug 18, 2020 at 2:14 AM Predrag Punosevac <<br>
> >>>> <a href="mailto:predragp@andrew.cmu.edu" target="_blank">predragp@andrew.cmu.edu</a>> wrote:<br>
> >>>><br>
> >>>>> Ifigeneia Apostolopoulou <<a href="mailto:iapostol@andrew.cmu.edu" target="_blank">iapostol@andrew.cmu.edu</a>> wrote:<br>
> >>>>><br>
> >>>>> > Hi Predrag,<br>
> >>>>> ><br>
> >>>>> > I hope that this (weird) summer is going well!<br>
> >>>>> ><br>
> >>>>> > I noticed a change on servers gpu1, gpu2, gpu13, and gpu14.<br>
> >>>>> > Specifically, I can no longer find<br>
> >>>>><br>
> >>>>> I have not touched those servers in a very long time. I am CC-ing the users<br>
> >>>>> mailing list. My brain is shutting down at this late hour. Maybe<br>
> >>>>> somebody could be of more help tomorrow morning.<br>
> >>>>><br>
> >>>>> ><br>
> >>>>> > /usr/local/cuda/extras/CUPTI<br>
> >>>>> ><br>
> >>>>><br>
> >>>>> I believe you.<br>
> >>>>><br>
> >>>>><br>
> >>>>> > which results in the following error when I'm building my TensorFlow models.<br>
> >>>>> ><br>
> >>>>> > Not found: ./bin/ptxas not found. Relying on driver to perform ptx<br>
> >>>>> > compilation. This message will be only logged once.<br>
> >>>>> ><br>
> >>>>> > Any ideas on how I could solve this issue? Would it be possible to<br>
> >>>>> > restore the cuda directory?<br>
> >>>>> ><br>
> >>>>> > Also, I currently do not have access to gpu21.<br>
> >>>>><br>
> >>>>> It is fixed now. I just restarted the sssd daemon. Please don't use gpu20<br>
> >>>>> and gpu21 unless you are training 3D neural networks for which you<br>
> >>>>> need a lot of GPU memory.<br>
> >>>>><br>
> >>>>> Predrag<br>
> >>>>><br>
> >>>>><br>
> >>>>> ><br>
> >>>>> > Thanks a lot in advance!<br>
> >>>>><br>
> >>>><br>
</blockquote></div>