<div dir="ltr"><div><img src="cid:ii_k9lufxsr0" alt="image.png" width="268" height="313"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 29, 2020 at 3:43 PM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu">predragp@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Sarveshwaran Jayaraman <<a href="mailto:sarveshj@andrew.cmu.edu" target="_blank">sarveshj@andrew.cmu.edu</a>> wrote:<br>
<br>
> Hi All,<br>
> <br>
> I was trying to run my experiments on GPU 14 and came across this<br>
> situation. On GPU IDs 0 & 1 (highlighted in green & blue respectively),<br>
> a user has not released the GPU memory after their experiment.<br>
<br>
This is one of the quintessential don'ts, and it is now well documented:<br>
<br>
<a href="https://www.autonlab.org/autonlab_wiki/" rel="noreferrer" target="_blank">https://www.autonlab.org/autonlab_wiki/</a><br>
<br>
Offending members will have their accounts suspended until they take <br>
a quiz and score above 80%. If you take the quiz and flunk it, a mandatory<br>
seven-day waiting period will be enforced :-))))))<br>
<br>
Cheers,<br>
Predrag<br>
<br>
<br>
<br>
> A possible scenario is that the user has not shut down the<br>
> Jupyter notebook after use (closing it does not suffice). As a result, 2<br>
> of the 4 GPUs on that node are not available.<br>
> <br>
> <br>
> If that's the case, please be mindful to free GPU memory after use so that other users can access it. One simple workaround I found is to convert your notebooks to Python scripts and run them with the nohup command.<br>
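<br>
The notebook-to-script workaround could look roughly like the sketch below. It assumes the `jupyter nbconvert` CLI and `nohup` are on the PATH and that the notebook uses a Python kernel (so the converted file ends in `.py`). The helper names `convert_notebook` and `run_detached` and the file name `experiment.ipynb` are made up for illustration. Because the script runs as an ordinary process, its GPU memory is released as soon as it exits.<br>

```python
import subprocess
import sys
from pathlib import Path


def convert_notebook(notebook: Path) -> Path:
    """Convert a notebook to a plain script with `jupyter nbconvert --to script`.

    Assumption: a Python kernel, so nbconvert writes <name>.py next to the notebook.
    """
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "script", str(notebook)],
        check=True,
    )
    return notebook.with_suffix(".py")


def run_detached(script: Path, log: Path) -> subprocess.Popen:
    """Start the script under nohup, sending stdout and stderr to a log file.

    The process keeps running after logout and frees the GPU when it exits.
    """
    with open(log, "w") as out:
        return subprocess.Popen(
            ["nohup", sys.executable, str(script)],
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=subprocess.STDOUT,
        )


# Hypothetical usage:
#   script = convert_notebook(Path("experiment.ipynb"))
#   run_detached(script, Path("experiment.log"))
```

The same thing can be done by hand in a shell. The Python wrapper is only one way to make the two steps repeatable.<br>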
> <br>
> Thanks for your understanding!<br>
> <br>
> <br>
> (base) sarveshj@gpu14$ nvidia-smi -l 3<br>
> Wed Apr 29 11:18:10 2020<br>
> +-----------------------------------------------------------------------------+<br>
> | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |<br>
> |-------------------------------+----------------------+----------------------+<br>
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |<br>
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |<br>
> |===============================+======================+======================|<br>
> |   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |<br>
> | 32%   48C    P2    59W / 250W |  10980MiB / 11019MiB |      0%      Default |<br>
> +-------------------------------+----------------------+----------------------+<br>
> |   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |<br>
> | 41%   66C    P2    98W / 250W |  10984MiB / 11019MiB |     18%      Default |<br>
> +-------------------------------+----------------------+----------------------+<br>
> |   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |<br>
> | 33%   54C    P2    67W / 250W |   1935MiB / 11019MiB |      9%      Default |<br>
> +-------------------------------+----------------------+----------------------+<br>
> |   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |<br>
> | 32%   43C    P2    62W / 250W |   8849MiB / 11019MiB |      4%      Default |<br>
> +-------------------------------+----------------------+----------------------+<br>
> <br>
> +-----------------------------------------------------------------------------+<br>
> | Processes:                                                       GPU Memory |<br>
> |  GPU       PID   Type   Process name                             Usage      |<br>
> |=============================================================================|<br>
> |    0    203849      C   python3                                     1677MiB |<br>
> |    1    203849      C   python3                                      155MiB |<br>
> |    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |<br>
> |    2    203849      C   python3                                      155MiB |<br>
> |    2    232877      C   python                                      1613MiB |<br>
> |    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |<br>
> |    3    147113      C   python                                      1613MiB |<br>
> |    3    203849      C   python3                                      155MiB |<br>
> +-----------------------------------------------------------------------------+<br>
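<br>
To see which processes are holding memory, the process table above can be summed per PID. This is a minimal sketch that assumes the table layout printed by the 440-series driver shown here; other driver versions may use different columns. The names `ROW`, `memory_by_pid` and `SAMPLE` are illustrative only, and the sample rows are copied from the output above.<br>

```python
import re
from collections import defaultdict

# One row of the "Processes" table: | GPU  PID  Type  Process name  Usage |
ROW = re.compile(r"\|\s+(\d+)\s+(\d+)\s+([CG+]+)\s+(\S.*?)\s+(\d+)MiB\s+\|")


def memory_by_pid(table_text):
    """Return {pid: total MiB held across all GPUs} for each listed process."""
    totals = defaultdict(int)
    for line in table_text.splitlines():
        match = ROW.search(line)
        if match:
            _gpu, pid, _type, _name, mib = match.groups()
            totals[int(pid)] += int(mib)
    return dict(totals)


# Rows copied from the nvidia-smi output in this message.
SAMPLE = """\
|    0    203849      C   python3                                     1677MiB |
|    1    203849      C   python3                                      155MiB |
|    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |
|    2    203849      C   python3                                      155MiB |
|    2    232877      C   python                                      1613MiB |
|    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |
|    3    147113      C   python                                      1613MiB |
|    3    203849      C   python3                                      155MiB |
"""

for pid, mib in sorted(memory_by_pid(SAMPLE).items()):
    print(pid, mib, "MiB")
```

Once a PID is known, `ps -o user= -p PID` on the node shows which account owns it.<br>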
> <br>
> <br>
> <br>
> <br>
> <a href="https://www.autonlab.org/" rel="noreferrer" target="_blank">https://www.autonlab.org/</a><br>
> <br>
>         Sarvesh Jayaraman<<a href="https://www.linkedin.com/in/sarveshjayaraman/" rel="noreferrer" target="_blank">https://www.linkedin.com/in/sarveshjayaraman/</a>><br>
> Sr. Research Analyst, Auton Lab<br>
> Carnegie Mellon University<br>
> Mob: +1-240-893-4287<br>
> <br>
> <br>
</blockquote></div>