<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:11pt;color:#000000;font-family:'Trebuchet MS',Trebuchet,sans-serif;" dir="ltr">
<p></p>
<div>Hi All,<br>
</div>
<div><br>
</div>
<div>I was trying to run my experiments on the gpu14 node and came across this situation. On GPU IDs 0 and 3 (highlighted in green and blue, respectively), a user has not released the GPU memory after an experiment.</div>
<div>A possible explanation is that the user has not shut down the Jupyter notebook after use (closing it does not suffice). As a result, 2 of the 4 GPUs on that node are unavailable.<br>
</div>
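<div><br>
</div>
<div>For anyone who wants to check whether one of their own processes is still holding GPU memory, here is a minimal Python sketch (an illustration, not a lab tool). It assumes nvidia-smi and ps are available on the node; the function names parse_compute_apps, owner_of and report are made up for this example.</div>
<pre>
```python
import csv
import io
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = (field.strip() for field in fields)
        if not (pid.isdigit() and used.isdigit()):
            continue  # skip rows where a value is not a plain number
        rows.append((int(pid), name, int(used)))
    return rows


def owner_of(pid):
    """Return the user that owns a PID according to ps, or '' if it is gone."""
    result = subprocess.run(
        ["ps", "-o", "user=", "-p", str(pid)],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def report():
    """Print owner, PID, memory and name for every compute process on the node."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    for pid, name, used_mib in parse_compute_apps(output):
        print(f"{owner_of(pid) or '?'}\t{pid}\t{used_mib} MiB\t{name}")
```
</pre>
<div>Calling report() on the node prints one line per compute process together with its owner. If a PID of yours is still listed after your experiment has finished, that process is a candidate to shut down.</div>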
<div><br>
</div>
<div><br>
</div>
<div>
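<div>If that is the case, please remember to free GPU memory after use so that other users can run on those GPUs. One simple workaround I have found is to convert your notebooks to Python scripts and run them with the nohup command, so that the process exits and releases its GPU memory as soon as the script finishes.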
<br>
</div>
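<div><br>
</div>
<div>If you would rather start the converted script from Python than from the shell, a rough equivalent of the nohup approach is sketched below. Note that this swaps nohup for subprocess with start_new_session=True (POSIX only), so it is an analogue rather than the command itself; launch_detached, train.py and train.log are hypothetical names.</div>
<pre>
```python
import subprocess
import sys


def launch_detached(script_args, log_path):
    """Start a Python script in its own session, appending its output to log_path."""
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            [sys.executable, *script_args],
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            # A new session detaches the child from the launching terminal,
            # which is similar in effect to running it under nohup.
            start_new_session=True,
        )
    # The child holds its own copy of the log descriptor, so closing ours is safe.
    return process
```
</pre>
<div>For example, launch_detached(["train.py"], "train.log") starts the script and returns immediately. Once the script ends, its process exits and the GPU memory it held is released.</div>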
<div><br>
</div>
<div>Thanks for your understanding!<br>
</div>
</div>
<br>
<div><br>
</div>
<div>
<div><pre>(base) sarveshj@gpu14$ nvidia-smi -l 3
Wed Apr 29 11:18:10 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
<font color="#4BA524">|   0  GeForce RTX 208...  Off  | 00000000:18:00.0 Off |                  N/A |</font>
<font color="#4BA524">| 32%   48C    P2    59W / 250W |  10980MiB / 11019MiB |      0%      Default |</font>
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   66C    P2    98W / 250W |  10984MiB / 11019MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:86:00.0 Off |                  N/A |
| 33%   54C    P2    67W / 250W |   1935MiB / 11019MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
<font color="#006FC9"><b>|   3  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |</b></font>
<font color="#006FC9"><b>| 32%   43C    P2    62W / 250W |   8849MiB / 11019MiB |      4%      Default |</b></font>
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
<font color="#4BA524"><b>|    0    203849      C   python3                                     1677MiB |</b></font>
|    1    203849      C   python3                                      155MiB |
|    1    236031      C   /home/scratch/sarveshj/mini/bin/python3    10817MiB |
|    2    203849      C   python3                                      155MiB |
|    2    232877      C   python                                      1613MiB |
|    2    236031      C   /home/scratch/sarveshj/mini/bin/python3      155MiB |
<font color="#006FC9">|    3    147113      C   python                                      1613MiB |</font>
<font color="#006FC9">|    3    203849      C   python3                                      155MiB |</font>
+-----------------------------------------------------------------------------+</pre></div>
</div>
<div id="Signature">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: 'Trebuchet MS', Trebuchet, sans-serif, 'EmojiFont', 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols;">
<span style="font-size:11pt"></span><span style="font-size:11pt"></span><span style="font-size:11pt"></span>
<table class="ms-rteTable-lightlines" style="border-collapse:collapse; table-layout:fixed; border:1px solid transparent" cellspacing="0">
<tbody>
<tr class="ms-rteTableEvenRow-lightlines" style="border-collapse:collapse; border:1px solid transparent">
<td class="ms-rteTableEvenCol-lightlines" style="border-collapse:collapse; width:123px; border-top-color:transparent; border-bottom-color:rgb(146,192,224); border-left-color:transparent; border-right-color:transparent; border-top-width:1px; border-bottom-width:1px; border-top-style:solid; border-bottom-style:solid" align="left">
<a href="https://www.autonlab.org/" class="OWAAutoLink" title="Ctrl+Click or tap to follow the link" id="LPNoLP"><img class="EmojiInsert" alt="1562005799537" src="cid:195b1e27-f61e-4b9b-b9f0-ca8f55164cc2"></a><br>
<br>
</td>
<td class="ms-rteTableOddCol-lightlines" style="border-collapse:collapse; width:248px; border-top-color:transparent; border-bottom-color:rgb(146,192,224); border-left-color:transparent; border-right-color:transparent; border-top-width:1px; border-bottom-width:1px; border-top-style:solid; border-bottom-style:solid">
<span style="font-size:11pt"></span><a href="https://www.linkedin.com/in/sarveshjayaraman/" class="OWAAutoLink" id="LPNoLP"><span style="font-size:11pt">Sarvesh Jayaraman</span></a><br>
<span style="font-size:11pt"></span><span style="font-size:11pt">Sr. Research Analyst,
</span><span style="font-size:11pt">Auton Lab</span><br>
<span style="font-size:11pt">Carnegie Mellon University</span><br>
<span style="font-size:11pt">Mob: +1-240-893-4287</span><br>
<span style="font-size:11pt"></span></td>
</tr>
</tbody>
</table>
<span style="font-size:11pt"></span><br>
</div>
</div>
</div>
</body>
</html>