<div dir="ltr">Hi Autonians, <div><br></div><div>Adding to Predrag's long list of misuses, I would also ask everyone running TensorFlow jobs on GPUs to check the following. <div><br></div><div>1. Limit GPU usage to only the cards your job requires. <b>BY DEFAULT</b>, TensorFlow claims nearly all of the memory on <b>ALL</b> <b>GPU cards</b> visible to the process, including cards that then sit idle most of the time. You can restrict this by setting the <a href="https://stackoverflow.com/questions/37893755/tensorflow-set-cuda-visible-devices-within-jupyter">CUDA_VISIBLE_DEVICES</a> environment variable or by calling <code dir="ltr" style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:14.4px;line-height:1;font-family:"Roboto Mono",monospace;box-sizing:inherit;color:rgb(26,115,232);outline:0px;background:rgb(241,243,244);padding:1px 4px;word-break:break-word"><a href="https://www.tensorflow.org/api_docs/python/tf/config/set_visible_devices" style="box-sizing:inherit;color:rgb(26,115,232);outline:0px;text-decoration-line:none;font-family:Roboto,"Noto Sans","Noto Sans JP","Noto Sans KR","Noto Naskh Arabic","Noto Sans Thai","Noto Sans Hebrew","Noto Sans Bengali",sans-serif;font-size:16px">tf.config.experimental.set_visible_devices</a></code> within TensorFlow.</div><div><br></div><div>2. 
You may also want to <b>SET</b> the environment <b>FLAG</b> <span style="background-color:rgb(241,243,244);color:rgb(55,71,79);font-family:"Roboto Mono",monospace;font-size:14.4px"><b>TF_FORCE_GPU_ALLOW_GROWTH</b></span> to true, or call <code dir="ltr" style="box-sizing:inherit;color:rgb(26,115,232);outline:0px;text-decoration-line:none;font-family:"Roboto Mono",monospace;font-size:14.4px;background:rgb(241,243,244);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:1;padding:1px 4px;word-break:break-word"><a href="https://www.tensorflow.org/api_docs/python/tf/config/experimental/set_memory_growth" style="box-sizing:inherit;color:rgb(26,115,232);outline:0px;text-decoration-line:none;font-family:Roboto,"Noto Sans","Noto Sans JP","Noto Sans KR","Noto Naskh Arabic","Noto Sans Thai","Noto Sans Hebrew","Noto Sans Bengali",sans-serif;font-size:16px">tf.config.experimental.set_memory_growth</a></code>, so that TensorFlow allocates GPU memory only as the job needs it instead of reserving it all up front.</div><div><br></div><div>I hope that following the above practices will help us use our shared resources more efficiently.</div><div><div><div><br></div><div>Reference: <a href="https://www.tensorflow.org/guide/gpu">https://www.tensorflow.org/guide/gpu</a></div><div><div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div><font face="arial, helvetica, sans-serif">Thanking you, </font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">Warm Regards, </font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div dir="ltr"><font face="arial, helvetica, sans-serif"><span></span>Tanmay Agarwal<span></span> | MSR Graduate Student</font><div><font face="arial, helvetica, sans-serif">Robotics Institute @ CMU</font></div><div><font face="arial, helvetica, sans-serif">mailto: <a 
href="mailto:tanmaya@andrew.cmu.edu" target="_blank">tanmaya@andrew.cmu.edu</a></font></div></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div></div></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 11, 2020 at 1:09 AM Predrag Punosevac <<a href="mailto:predragp@andrew.cmu.edu">predragp@andrew.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Autonians,<br>

<br>

This is a short list of commonly observed misuses of our lab resources.<br>

<br>

1. Using GPU nodes for CPU jobs. 6 out of 19 GPU nodes are<br>

currently running CPU jobs while I am typing this email. This has to<br>

stop immediately!<br>

<br>

2. Using a cache to avoid recomputing data or accessing a slow database<br>

can provide you with a great performance boost. Do not under any<br>

circumstance use your home directory for caching. Do not use the /tmp<br>

partition for caching. /tmp is part of the / slice, which is limited to<br>

50-60GB only and will quickly fill up, rendering the machine unusable for<br>

everyone.<br>

<br>

3. Don't put the Jupyter sqlite database in your home directory. It is<br>

likely going to become incoherent due to NFS properties. Please use <br>

<br>

/home/scratch/$username <br>

<br>

for sqlite databases, caching, and volatile data in particular.<br>

<br>

4. Please make sure you release GPU cards once you are done running your<br>

Python scripts. Typically the easiest way for me to deal with those as<br>

well as zombie processes is a reboot. The chances of comp nodes experiencing<br>

hardware problems grow exponentially with each reboot (dead RAM<br>

typically). Those are very time consuming to fix.<br>

<br>

5. Please clean your scratch directories regularly. I can't emphasize<br>

enough how important this is.<br>

<br>

6. If you do have an Auton Lab issued desktop which is VPN connected to<br>

the computing nodes please don't use shell gateways under any<br>

circumstances to connect to comp nodes. Your desktops are your private<br>

shell gateways and they are ssh reachable from anywhere in the world.<br>

<br>

7. Do not store non-essential things in your home directories. An<br>

example would be your conda or R packages. Please put those in<br>

scratch. Write a small script which can recreate scratch directories for<br>

you. <br>

<br>

8. Please don't transfer large amounts of data via shell gateways. Log<br>

into the comp nodes and use outgoing ssh connections to pull the data<br>

onto the server from outside the lab.<br>

<br>

Likely to be continued after a good night's sleep...<br>

<br>

<br>

Cheers,<br>

Predrag<br>

</blockquote></div>
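<div><br></div><div>For anyone who wants a starting point, here is a minimal sketch of the two TensorFlow settings recommended at the top of this message. It is not an official lab script: the helper names <code>configure_gpu_env</code> and <code>configure_gpu_in_tensorflow</code> are made up for illustration, and the second helper assumes TensorFlow 2.1 or later, where <code>tf.config.list_physical_devices</code> is available. The environment variables should be in place before TensorFlow initializes the GPUs, so the simplest approach is to set them before importing it.</div>
<pre>
```python
import os


def configure_gpu_env(gpu_ids, allow_growth=True):
    """Restrict this process to the given GPU cards via environment variables.

    Call this before importing TensorFlow so the settings are in place
    when the GPUs are first initialized.
    """
    # Comma-separated list of the card indices this job may use, e.g. "0" or "1,3".
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    if allow_growth:
        # Ask TensorFlow to grow GPU memory on demand instead of reserving it all.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["CUDA_VISIBLE_DEVICES"]


def configure_gpu_in_tensorflow(gpu_index=0):
    """Alternative from inside TensorFlow (assumed 2.1+), before any GPU op runs.

    Both tf.config calls raise RuntimeError if the GPUs were already initialized.
    """
    import tensorflow as tf  # imported lazily so the helper above works without it

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return []  # CPU-only machine: nothing to restrict
    chosen = [gpus[gpu_index]]
    # Hide every other card from this process.
    tf.config.set_visible_devices(chosen, "GPU")
    for gpu in chosen:
        # Allocate memory on the chosen card only as it is needed.
        tf.config.experimental.set_memory_growth(gpu, True)
    return chosen


if __name__ == "__main__":
    # Example: use only card 0 with on-demand memory growth.
    print(configure_gpu_env([0]))
```
</pre>
<div>Use one helper or the other, not both: the environment-variable route also works for other CUDA programs, while the <code>tf.config</code> route keeps the choice inside the training script. Pick the card index after checking <code>nvidia-smi</code> to see which cards are actually free.</div>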