Memory taken

Predrag Punosevac predragp at cs.cmu.edu
Tue Feb 14 22:23:46 EST 2017


Kaylan Burleigh <kburleigh at lbl.gov> wrote:

> Hi Predrag,
> 
> Yes, I do know how to use unix. All the machines I'm used to run slurm and
> users are not root so the bashrc's are renamed to something else and the
> users edit those.
> 

SLURM is a queueing system used to manage and schedule jobs on clusters,
including GPU clusters. In the Auton Lab we don't currently operate a
single cluster, because we typically buy equipment from smaller
general-purpose grants. If we land a $300K equipment grant this year, we
will buy a cluster and run SLURM on it as well. SLURM is used on most
CMU clusters.

> Anyway, gpu1-3 each have a few gpu's that aren't being used but 99% of the
> memory is taken. See attached. Can we fix that?
> 

Please see this thread

https://github.com/tensorflow/tensorflow/issues/1578

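In short, this is a well-known TensorFlow "feature": by default a
TensorFlow process reserves nearly all of the memory on every GPU it can
see, even GPUs it never uses for computation. The only way for me to
"clear" that memory is to reboot the node.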
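A minimal sketch of how a user could avoid grabbing every GPU on a shared
node, assuming `nvidia-smi` is on the PATH and TensorFlow 2.x is installed.
The helper names and the 5% "free" threshold are hypothetical, chosen for
illustration. The idea: query per-GPU memory use, pick a GPU that is
essentially idle, expose only that GPU via `CUDA_VISIBLE_DEVICES`, and ask
TensorFlow to allocate memory on demand instead of reserving it all up front.

```python
import os
import subprocess


def query_nvidia_smi():
    """Run nvidia-smi on this node (assumes it is installed and on PATH)."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_gpu_memory(csv_text):
    """Parse the CSV output above into (index, used_MiB, total_MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(used), int(total)))
    return gpus


def free_gpus(gpus, max_used_fraction=0.05):
    """Return indices of GPUs using at most max_used_fraction of their memory.
    The 5% default is a hypothetical threshold, not a lab policy."""
    return [idx for idx, used, total in gpus if used <= max_used_fraction * total]


def pin_to_gpu(index):
    """Expose a single GPU to this process. Set this before TensorFlow
    initializes CUDA (safest: before importing tensorflow at all)."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


def enable_tf_memory_growth():
    """TensorFlow 2.x: allocate GPU memory as needed rather than reserving
    nearly all of it at startup. Must run before any GPU is initialized."""
    import tensorflow as tf  # imported lazily so the helpers above work without TF
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)
```

Typical use at the top of a training script would be
`pin_to_gpu(free_gpus(parse_gpu_memory(query_nvidia_smi()))[0])` followed by
`enable_tf_memory_growth()`. With the TensorFlow 1.x API the equivalent of
memory growth is `config = tf.ConfigProto()`,
`config.gpu_options.allow_growth = True`, then `tf.Session(config=config)`.
Note that memory growth only stops a process from over-reserving; memory
already held by a running process is released only when that process exits.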


Best,
Predrag


> Thanks,
> Kaylan
> 


More information about the Autonlab-users mailing list