cuda problem
Ifigeneia Apostolopoulou
iapostol at andrew.cmu.edu
Tue Aug 18 21:37:08 EDT 2020
Predrag, now it works fine. Thanks a million! :-D
gpu2, 10, 11, 12, 13, 14, and 21 seem to have a similar issue.
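
A minimal sketch for checking a node for the symptoms discussed in this thread, assuming the CUDA tree lives under /usr/local/cuda as on our machines. The function name check_cuda is made up for illustration; it only looks for the two pieces reported missing below (bin/ptxas and extras/CUPTI) and says nothing about whether the driver itself works.

```shell
#!/bin/sh
# check_cuda [ROOT]: report whether a CUDA tree looks intact.
# check_cuda is a hypothetical helper, not part of CUDA or TensorFlow.
# ROOT defaults to /usr/local/cuda, the path mentioned in this thread.
check_cuda() {
    root="${1:-/usr/local/cuda}"
    rc=0
    # The cuda folder is typically a symlink to a versioned directory.
    if [ -L "$root" ]; then
        echo "symlink: $root -> $(readlink "$root")"
    fi
    # Covers both a missing folder and a dangling symlink.
    if [ ! -d "$root" ]; then
        echo "missing: $root"
        return 1
    fi
    # ptxas and CUPTI are the two pieces reported missing in this thread.
    for rel in bin/ptxas extras/CUPTI; do
        if [ -e "$root/$rel" ]; then
            echo "ok: $root/$rel"
        else
            echo "missing: $root/$rel"
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_cuda with no argument checks /usr/local/cuda; passing a path such as /usr/local/cuda-10.2 checks that tree instead. A non-zero return means something was missing.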
On Tue, Aug 18, 2020 at 5:23 PM Predrag Punosevac <predragp at andrew.cmu.edu>
wrote:
> Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
>
> > yes, but there is still no bin/ptxas in cuda 10.2. actually there's no
> > bin directory. it seems that cuda-10.2 is corrupted?
> >
>
> I took a cue from your message and did a fresh installation of CUDA
> on GPU1 only. I upgraded the kernel and the driver to the latest one
> supporting branch 7.8 of RedHat. The driver works as expected in my
> limited testing. CUDA is upgraded to the newly released 11.0. I really
> hate that NVidia is intentionally breaking previous stable releases as
> soon as a new one is branched out.
>
> Could you please try building TensorFlow on GPU1 and report the
> progress? We will eventually have to upgrade all GPU nodes to CUDA 11
> even if they are fully working now.
>
> Best,
> Predrag
>
>
>
> > On Tue, Aug 18, 2020 at 11:41 AM Predrag Punosevac <predragp at andrew.cmu.edu>
> > wrote:
> >
> > > Because the cuda folder is the cuda 10.2 folder. The cuda folder is
> > > typically just a symbolic link to the current version of cuda.
> > >
> > > On Tue, Aug 18, 2020, 11:31 AM Kyle Miller <mille856 at andrew.cmu.edu>
> > > wrote:
> > >
> > >> I see. I ran a few find commands on gpu13 and couldn't find a cuda
> > >> folder or CUPTI.
> > >>
> > >> On Tue, Aug 18, 2020 at 10:00 AM Ifigeneia Apostolopoulou <
> > >> iapostol at andrew.cmu.edu> wrote:
> > >>
> > >>> Hi Kyle,
> > >>> Thanks a lot for your reply!
> > >>>
> > >>> I also had this issue and I solved it as you did. However, this seems to
> > >>> be another issue:
> > >>> I currently can't see CUPTI in /usr/local/cuda/extras/CUPTI (or anywhere
> > >>> on gpu1 to add it to my path), which causes the issue.
> > >>> I am also attaching a screenshot of the working (gpu3) and
> > >>> not-working (gpu1) cases. On gpu1, gpu2, and gpu13, it seems that the
> > >>> cuda directory (and all its contents) has been moved (and I can't find
> > >>> it in any other directory).
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Aug 18, 2020 at 9:32 AM Kyle Miller <mille856 at andrew.cmu.edu>
> > >>> wrote:
> > >>>
> > >>>> Ifi,
> > >>>> I recently had difficulty on GPU13, having not used it in a long
> > >>>> while. For me, the issue was that miniconda had moved. I added
> > >>>> /opt/miniconda-py38/bin to my path and rebuilt my environment (not
> > >>>> sure if that was necessary). Then it worked.
> > >>>> -Kyle
> > >>>>
> > >>>> On Tue, Aug 18, 2020 at 2:14 AM Predrag Punosevac <
> > >>>> predragp at andrew.cmu.edu> wrote:
> > >>>>
> > >>>>> Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
> > >>>>>
> > >>>>> > Hi Predrag,
> > >>>>> >
> > >>>>> > I hope that this (weird) summer is going well!
> > >>>>> >
> > >>>>> > I noticed a change in servers gpu1, gpu2, gpu13, gpu14.
> > >>>>> > Specifically, I no longer can find
> > >>>>>
> > >>>>> I have not touched those servers in a very long time. I am CC-ing
> > >>>>> the users mailing list. My brain is shutting down at this late hour.
> > >>>>> Maybe somebody could be of more help tomorrow morning.
> > >>>>>
> > >>>>> >
> > >>>>> > /usr/local/cuda/extras/CUPTI
> > >>>>> >
> > >>>>>
> > >>>>> I believe you.
> > >>>>>
> > >>>>>
> > >>>>> > which results in the error when I'm building my tensorflow models.
> > >>>>> >
> > >>>>> > Not found: ./bin/ptxas not found. Relying on driver to perform ptx
> > >>>>> > compilation. This message will be only logged once.
> > >>>>> >
> > >>>>> > Any ideas on how I could solve this issue? Would it be possible to
> > >>>>> > restore the cuda directory?
> > >>>>> >
> > >>>>> > Also, I currently do not have access to gpu21.
> > >>>>>
> > >>>>> It is fixed now. I just restarted the sssd daemon. Please don't use
> > >>>>> gpu20 and gpu21 unless you are training 3D neural networks for which
> > >>>>> you need a lot of GPU memory.
> > >>>>>
> > >>>>> Predrag
> > >>>>>
> > >>>>>
> > >>>>> >
> > >>>>> > Thanks a lot in advance!
> > >>>>>
> > >>>>
>