GPU2 upgraded
Predrag Punosevac
predragp at cs.cmu.edu
Thu Oct 27 09:12:41 EDT 2016
Dougal Sutherland <dougal at gmail.com> wrote:
> I'm not sure if anyone changed anything, but ipython (without the
> IPYTHONDIR workaround above), theano with GPUs, and tensorflow
> -m tensorflow.models.image.mnist.convolutional are all working for me now.
>
My guess is that somebody had some ipython notebooks open (I forget how
it works, but ipython is notorious for opening sockets and freezing the
system), which probably killed NFS for a while. Waiting a bit for such
problems to resolve on their own before poking the system is a good
strategy.
Predrag
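
For reference, the two workarounds quoted below (IPYTHONDIR and
JUPYTER_CONFIG_DIR pointed at scratch instead of the NFS home directory)
could be generated together with something like the following. This is only
a minimal sketch: the helper names scratch_config_env and export_lines are
made up for illustration, and it assumes the /home/scratch/$USER layout
mentioned in the thread.

```python
import os
import posixpath


def scratch_config_env(user, scratch_root="/home/scratch"):
    """Build env vars that keep IPython/Jupyter config under scratch.

    Hypothetical helper: the scratch_root default is taken from the
    paths quoted in this thread, not from any system configuration.
    """
    base = posixpath.join(scratch_root, user)
    return {
        "IPYTHONDIR": posixpath.join(base, ".ipython"),
        "JUPYTER_CONFIG_DIR": posixpath.join(base, ".jupyter"),
    }


def export_lines(env):
    """Format the variables as shell export lines, sorted by name."""
    return ["export {}={}".format(name, value)
            for name, value in sorted(env.items())]


if __name__ == "__main__":
    # Falls back to a placeholder if USER is not set in the environment.
    current_user = os.environ.get("USER", "nobody")
    for line in export_lines(scratch_config_env(current_user)):
        print(line)
```

The printed lines can be pasted into a terminal or added to
.bash_profile, same as the single-variable workaround.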
> On Wed, Oct 26, 2016 at 8:38 PM Dougal Sutherland <dougal at gmail.com> wrote:
>
> > Hmm - I tested that tensorflow worked on the mnist example when I built
> > it, but when I run
> >
> > python -m tensorflow.models.image.mnist.convolutional
> >
> > on gpu3 now, it gets to "Initialized!" and then just hangs, not responding
> > to ^C or ^Z and also not leading to any GPU utilization according to
> > nvidia-smi; it takes a kill -9 to stop it.
> >
> >
> > On Wed, Oct 26, 2016 at 8:25 PM Hanqi Sun <hanqis at andrew.cmu.edu> wrote:
> >
> > Hi,
> >
> > I am having similar issues with tensorflow as well.
> >
> > When I run a tensorflow program (like python XX.py), it hangs after
> > loading the graph (after executing session.run()). I cannot quit it by
> > hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr.
> >
> > Strangely, my program will terminate and do the job (like writing results
> > to files). And after it terminates I am able to see all the strings I print
> > to stdout/stderr. But before its termination I can do nothing about it.
> >
> > I have tried both my local tensorflow and the global one on all three GPU
> > machines. They all had the same problem.
> >
> > Best,
> > Hanqi
> >
> > On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland <dougal at gmail.com>
> > wrote:
> >
> > a) You might also want to do
> >
> > export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter
> >
> > if you use jupyter notebooks and whatnot.
> >
> >
> > b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of
> > this afternoon. It hangs in a similar way, though I've set the compiledir
> > to be in scratch in my theanorc. I've tracked it down in the debugger to this
> > line
> > <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L556>
> > or this one
> > <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L571> (both
> > hang, which one is called depends on theano settings), which call this
> > function
> > <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L3170>
> > or this one
> > <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L2947>.
> > No idea why this started happening or if it's related -- it doesn't seem
> > like it should be hitting nfs at all -- but it seemed to start at the same
> > time.
> >
> >
> >
> > On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva <junieroliva at gmail.com>
> > wrote:
> >
> > Awesome, seems to work for me too. Thanks!!
> >
> > -Junier
> >
> > On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland <dougal at gmail.com>
> > wrote:
> >
> > Here's a workaround that avoids ipython having its config files / etc on
> > nfs, it seems to work for me:
> >
> > export IPYTHONDIR=/home/scratch/$USER/.ipython
> >
> > You can do this in a terminal or put it in your .bash_profile / similar to
> > make it permanent.
> >
> > (I guess this means something changed about the nfs server yesterday/today
> > that broke this.)
> >
> >
> > On Wed, Oct 26, 2016 at 7:03 PM yifei ma <mayifei1012 at gmail.com> wrote:
> >
> > Second that on foxconn. It launches but the ipython client won't start.
> >
> > Thanks,
> > Yifei
> >
> >
> > On 10/26/2016 01:03 PM, Junier Oliva wrote:
> >
> > Not sure if this is at all related, but is ipython broken for anyone else? It
> > seems to just hang upon launching it on several auton machines (GPU1, GPU2,
> > LOV4, LOW1).
> >
> > Thanks,
> > Junier
> >
> > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland <dougal at gmail.com>
> > wrote:
> >
> > The same version of TensorFlow as on gpu3 is now installed on gpu2, along
> > with cudnn; let me know if there are issues.
> >
> > I didn't do a global install of Caffe on either machine, because Caffe is
> > kind of dumb and doesn't really do global installs. If anyone wants this,
> > talk to me and we can figure out what makes sense.
> >
> > - Dougal
> >
> > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac <
> > predragp at imap.srv.cs.cmu.edu> wrote:
> >
> > Dear Autonians,
> >
> > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards
> > per Barnabas. Dougal is currently upgrading TensorFlow and Caffe to the
> > latest and the greatest. Please wait for his e-mail before you hit the
> > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am
> > working now on GPU1.
> >
> >
> > Predrag
> >
> > P.S. I will escalate the MATLAB issue with MathWorks, but I don't expect it
> > to be fixed before R2017a.
> >
> >
> >
> >
> >
> >