GPU memory Usage

Predrag Punosevac predragp at andrew.cmu.edu
Wed Apr 29 21:10:45 EDT 2020


Gus Welter <gwelter at andrew.cmu.edu> wrote:

> [image: image.png]

Yum. Check out Dos and Don'ts on our wiki. I am issuing serious threats.

Also check out the FAQ I added today:

https://www.autonlab.org/autonlab_wiki/faq.html

I wrote two long paragraphs just for you :-) 

Why not Ubuntu?

and 

May I use Docker?

Speaking of which, I got a lot of crap today from Artur regarding those
r-stan scripts. He wants those working ASAP. I can't deal with that
Docker nonsense. I am busy before the weekend but I will have to get to
the bottom of it and see why your R script is crash dumping. On a
related note, I have marching orders to get you an Auton systems server
account. Congratulations on the new job! I will do it over the weekend.
The good news is that you will have more work while people are being laid
off left and right. The bad news is that you will be working pro bono
like all of us. On second thought, that seems like good news too, as it
will help with your diet plan :-)


Cheers,
Predrag




> 
> On Wed, Apr 29, 2020 at 3:43 PM Predrag Punosevac <predragp at andrew.cmu.edu>
> wrote:
> 
> > Sarveshwaran Jayaraman <sarveshj at andrew.cmu.edu> wrote:
> >
> > > Hi All,
> > >
> > > I was trying to run my experiments on GPU 14 and came across this
> > > situation. On GPU IDs 0 & 1 (highlighted in green & blue respectively),
> > > a user has not released the GPU memory after an experiment.
> >
> > This is one of the quintessential don'ts and it is now well documented:
> >
> > https://www.autonlab.org/autonlab_wiki/
> >
> > Offending members will have their accounts suspended until they take
> > a quiz and score above 80%. If you take the quiz and flunk it, a mandatory
> > seven-day waiting period will be enforced :-))))))
> >
> > Cheers,
> > Predrag
> >
> >
> >
> > > A possible scenario is that the user has not shut down the
> > > Jupyter notebook after use (closing it does not suffice). As a
> > > result, 2 of the 4 GPUs on that node are not available.
> > >
> > >
> > > Please be mindful to free GPU memory after use, for the sake of other
> > > users, if that's the case. One simple workaround I found is to convert
> > > your notebooks to Python scripts and run them with the nohup command.
> > >
> > > Thanks for your understanding!
> > >
> > >
> > > (base) sarveshj at gpu14$ nvidia-smi -l 3
> > > Wed Apr 29 11:18:10 2020
> > >
> > > +-----------------------------------------------------------------------------+
> > > | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
> > > |-------------------------------+----------------------+----------------------+
> > > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > > |===============================+======================+======================|
> > > |   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |
> > > | 32%   48C    P2    59W / 250W |  10980MiB / 11019MiB |      0%      Default |
> > > +-------------------------------+----------------------+----------------------+
> > > |   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
> > > | 41%   66C    P2    98W / 250W |  10984MiB / 11019MiB |     18%      Default |
> > > +-------------------------------+----------------------+----------------------+
> > > |   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
> > > | 33%   54C    P2    67W / 250W |   1935MiB / 11019MiB |      9%      Default |
> > > +-------------------------------+----------------------+----------------------+
> > > |   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
> > > | 32%   43C    P2    62W / 250W |   8849MiB / 11019MiB |      4%      Default |
> > > +-------------------------------+----------------------+----------------------+
> > >
> > > +-----------------------------------------------------------------------------+
> > > | Processes:                                                       GPU Memory |
> > > |  GPU       PID   Type   Process name                             Usage      |
> > > |=============================================================================|
> > > |    0    203849      C   python3                                     1677MiB |
> > > |    1    203849      C   python3                                      155MiB |
> > > |    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |
> > > |    2    203849      C   python3                                      155MiB |
> > > |    2    232877      C   python                                      1613MiB |
> > > |    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |
> > > |    3    147113      C   python                                      1613MiB |
> > > |    3    203849      C   python3                                      155MiB |
> > > +-----------------------------------------------------------------------------+
> > >
> > >
> > >
> > >
> > >
> > >         Sarvesh Jayaraman<https://www.linkedin.com/in/sarveshjayaraman/>
> > > Sr. Research Analyst, Auton Lab
> > > Carnegie Mellon University
> > > Mob: +1-240-893-4287
> > >
> > >
> >
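
One way to spot the situation reported above (GPU memory nearly full
while the GPU itself sits idle) is to ask nvidia-smi for machine-readable
output and filter it. What follows is a minimal Python sketch, not an
Auton Lab tool. The function name idle_but_full and both thresholds (90%
of memory, 5% utilization) are assumptions chosen for illustration. The
sample rows are copied from the gpu14 listing above, arranged in the
layout printed by

    nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits

```python
import csv
import io

# Sample rows: index, memory.used (MiB), memory.total (MiB), utilization.gpu (%).
# Values are taken from the gpu14 listing quoted in this thread.
SAMPLE = """\
0, 10980, 11019, 0
1, 10984, 11019, 18
2, 1935, 11019, 9
3, 8849, 11019, 4
"""


def idle_but_full(csv_text, mem_fraction=0.9, max_util=5):
    """Return indices of GPUs with nearly full memory and low utilization.

    mem_fraction and max_util are hypothetical thresholds; tune them
    for your own cluster.
    """
    flagged = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, used, total, util = (int(field) for field in row)
        if used >= mem_fraction * total and util <= max_util:
            flagged.append(index)
    return flagged


print(idle_but_full(SAMPLE))  # prints [0]
```

On the sample this flags only GPU 0. GPU 1 is just as full but shows 18%
utilization, so it falls outside the assumed threshold. A single snapshot
at 0% utilization does not prove a job is abandoned, so treat the result
as a hint to check with the owner of the process rather than as a verdict.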


More information about the Autonlab-users mailing list