Memory taken
Predrag Punosevac
predragp at cs.cmu.edu
Tue Feb 14 22:23:46 EST 2017
Kaylan Burleigh <kburleigh at lbl.gov> wrote:
> Hi Predrag,
>
> Yes, I do know how to use unix. All the machines I'm used to run slurm and
> users are not root so the bashrc's are renamed to something else and the
> users edit those.
>
SLURM is a queueing system used to manage and control jobs on clusters,
including GPU clusters. At this point the Auton Lab does not operate a
single cluster, because we typically buy equipment from smaller
general-purpose grants. If we score a 300K equipment grant this year,
we will buy a cluster and run SLURM on it as well. SLURM is used on
most CMU clusters.
> Anyway, gpu1-3 each have a few gpu's that aren't being used but 99% of the
> memory is taken. See attached. Can we fix that?
>
Please see this thread
https://github.com/tensorflow/tensorflow/issues/1578
In short, it is a well-known TensorFlow "feature". The only way for me to
"clear" the memory is to reboot the node.
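
To see which GPUs are in the state described (memory nearly full while
compute sits idle), something like the following rough sketch may help.
It parses the CSV output of nvidia-smi with the query fields index,
memory.used, memory.total and utilization.gpu. The function names and the
90% / 5% thresholds are my own illustrative assumptions, not anything
configured on our nodes.

```python
import csv
import io
import subprocess

# nvidia-smi query: one CSV line per GPU, values without units (MiB and %).
GPU_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def query_nvidia_smi():
    """Run nvidia-smi on the local node and return its raw CSV text."""
    return subprocess.check_output(GPU_QUERY).decode()


def parse_gpu_rows(text):
    """Parse 'index, used, total, util' CSV lines into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 4:
            continue  # skip blank or malformed lines
        try:
            index, used, total, util = (int(field) for field in rec)
        except ValueError:
            continue  # skip GPUs that report a non-numeric field
        rows.append({"index": index, "used": used, "total": total, "util": util})
    return rows


def idle_but_full(rows, mem_frac=0.9, max_util=5):
    """Return indices of GPUs whose memory is mostly taken while compute is idle.

    mem_frac and max_util are hypothetical thresholds chosen for illustration.
    """
    return [
        r["index"]
        for r in rows
        if r["used"] >= mem_frac * r["total"] and r["util"] <= max_util
    ]
```

On a node with the NVIDIA driver installed, calling
idle_but_full(parse_gpu_rows(query_nvidia_smi())) should list the GPU
indices that look occupied but idle. It only reports the state; it does
not free anything.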
Best,
Predrag
> Thanks,
> Kaylan
>