GPU2 upgraded

Dougal Sutherland dougal at gmail.com
Wed Oct 26 15:38:09 EDT 2016


Hmm - I tested that tensorflow worked on the mnist example when I built it,
but when I run

python -m tensorflow.models.image.mnist.convolutional

on gpu3 now, it gets to "Initialized!" and then just hangs, not responding
to ^C or ^Z and also not leading to any GPU utilization according to
nvidia-smi; it takes a kill -9 to stop it.
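
As a stopgap for anyone hitting the same hang, here is a minimal sketch of a
wrapper that force-kills a command once it exceeds a deadline. It assumes GNU
coreutils "timeout" is on the PATH; the name run_with_deadline and the
600-second limit are made up for illustration and are not installed on these
machines. It does nothing about the underlying cause.

```shell
# Hypothetical helper: run a command, but force-kill it past a time limit.
# SIGKILL is used because the hung process ignores ^C and ^Z.
run_with_deadline() {
    deadline_seconds="$1"
    shift
    # timeout sends SIGKILL to the command if it is still running
    # after deadline_seconds, and then returns a nonzero status.
    timeout -s KILL "$deadline_seconds" "$@"
}

# Usage, with an arbitrary 10-minute limit:
#   run_with_deadline 600 python -m tensorflow.models.image.mnist.convolutional
```

A command that finishes in time keeps its own exit status, so the wrapper can
sit in front of normal runs as well.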


On Wed, Oct 26, 2016 at 8:25 PM Hanqi Sun <hanqis at andrew.cmu.edu> wrote:

> Hi,
>
> I am having similar issues with tensorflow as well.
>
> When I run a tensorflow program (like python XX.py), it hangs after
> loading the graph (after executing session.run()). I cannot quit it by
> hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr.
>
> Strangely, my program does eventually terminate and does its job (like
> writing results to files). After it terminates I am able to see all the
> strings I printed to stdout/stderr, but until then I can do nothing about it.
>
> I have tried both my local tensorflow and the global one on all three GPU
> machines. They all had the same problem.
>
> Best,
> Hanqi
>
> On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland <dougal at gmail.com>
> wrote:
>
> a) You might also want to do
>
> export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter
>
> if you use jupyter notebooks and whatnot.
>
>
> b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of
> this afternoon. It hangs in a similar way, though I've set the compiledir
> to be in scratch in my theanorc. I've tracked it down in the debugger to this
> line
> <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L556>
> or this one
> <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L571> (both
> hang, which one is called depends on theano settings), which call this
> function
> <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L3170>
> or this one
> <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L2947>.
> No idea why this started happening or if it's related -- it doesn't seem
> like it should be hitting nfs at all -- but it seemed to start at the same
> time.
>
>
>
> On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva <junieroliva at gmail.com>
> wrote:
>
> Awesome, seems to work for me too. Thanks!!
>
> -Junier
>
> On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland <dougal at gmail.com>
> wrote:
>
> Here's a workaround that keeps ipython's config files etc. off nfs; it
> seems to work for me:
>
> export IPYTHONDIR=/home/scratch/$USER/.ipython
>
> You can do this in a terminal or put it in your .bash_profile / similar to
> make it permanent.
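
A minimal sketch of making this permanent, combined with the
JUPYTER_CONFIG_DIR setting mentioned later in the thread. The function name
use_scratch_config is hypothetical; the default base directory mirrors the
/home/scratch/$USER path used above, and creating the directories up front is
an assumption about what is convenient, not something the thread tested.

```shell
# Hypothetical helper: point IPython and Jupyter at a directory off nfs.
# With no argument it uses /home/scratch/$USER, as in the workaround above.
use_scratch_config() {
    scratch_base="${1:-/home/scratch/${USER:-$(id -un)}}"
    # Create both config directories; give up if scratch is not writable.
    mkdir -p "$scratch_base/.ipython" "$scratch_base/.jupyter" || return 1
    export IPYTHONDIR="$scratch_base/.ipython"
    export JUPYTER_CONFIG_DIR="$scratch_base/.jupyter"
}

# Usage: define the function in ~/.bash_profile and call it there:
#   use_scratch_config
```

Passing a different base directory as the first argument overrides the
default, which may help on machines where scratch lives elsewhere.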
>
> (I guess this means something changed about the nfs server yesterday/today
> that broke this.)
>
>
> On Wed, Oct 26, 2016 at 7:03 PM yifei ma <mayifei1012 at gmail.com> wrote:
>
> Second that on foxconn. It launches but the ipython client won't start.
>
> Thanks,
> Yifei
>
>
> On 10/26/2016 01:03 PM, Junier Oliva wrote:
>
> Not sure if this is at all related, but is ipython broken for anyone else? It
> seems to just hang upon launching it on several auton machines (GPU1, GPU2,
> LOV4, LOW1).
>
> Thanks,
> Junier
>
> On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland <dougal at gmail.com>
> wrote:
>
> The same version of TensorFlow as on gpu3 is now installed on gpu2, along
> with cudnn; let me know if there are issues.
>
> I didn't do a global install of Caffe on either machine, because Caffe is
> kind of dumb and doesn't really do global installs. If anyone wants this,
> talk to me and we can figure out what makes sense.
>
> - Dougal
>
> On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac <
> predragp at imap.srv.cs.cmu.edu> wrote:
>
> Dear Autonians,
>
> As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards
> per Barnabas. Dougal is currently upgrading TensorFlow and Caffe to the
> latest and the greatest. Please wait for his e-mail before you hit the
> machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am
> working now on GPU1.
>
>
> Predrag
>
> P.S. I will escalate the MATLAB issue with MathWorks but I don't expect it
> to be fixed before R2017a.
>