GPU2 upgraded
Hanqi Sun
hanqis at andrew.cmu.edu
Wed Oct 26 15:24:22 EDT 2016
Hi,
I am having similar issues with TensorFlow as well.
When I run a TensorFlow program (e.g. python XX.py), it hangs after loading
the graph (after executing session.run()). I cannot stop it with
Ctrl-Z/Ctrl-C, and nothing is printed to stdout/stderr.
Strangely, the program does eventually terminate and do its job (such as
writing results to files), and once it terminates I can see everything I
printed to stdout/stderr. But until it terminates I cannot do anything with it.
I have tried both my local TensorFlow install and the global one on all three
GPU machines. They all show the same problem.
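
In case it helps narrow this down, here is a minimal sketch (not something
taken from this thread) for seeing where a seemingly hung Python process is
stuck. It assumes Python 3, where the standard-library faulthandler module is
available; the helper names install_hang_diagnostics and
remove_hang_diagnostics and the 60-second default are made up for
illustration. It would not explain the hang, only show which call is blocking.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(timeout_s=60, stream=None):
    """Periodically dump all thread tracebacks while the process is running.

    Hypothetical helper for illustration; the name and the default timeout
    are assumptions, not part of TensorFlow or Theano.
    """
    if stream is None:
        stream = sys.stderr
    # A watchdog thread writes the tracebacks of all threads to `stream`
    # every `timeout_s` seconds, so it still fires if the main thread is
    # blocked inside a C extension call.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
    # On Unix, also allow an on-demand dump via `kill -USR1 <pid>`.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)


def remove_hang_diagnostics():
    """Undo install_hang_diagnostics()."""
    faulthandler.cancel_dump_traceback_later()
    if hasattr(faulthandler, "unregister") and hasattr(signal, "SIGUSR1"):
        faulthandler.unregister(signal.SIGUSR1)
```

Calling install_hang_diagnostics() at the top of the script, before importing
tensorflow or theano, should make the blocking call show up in stderr once the
timeout passes. faulthandler writes straight to the file descriptor, so the
dump should not be held back by Python-level output buffering.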
Best,
Hanqi
On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland <dougal at gmail.com> wrote:
> a) You might also want to do
>
> export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter
>
> if you use jupyter notebooks and whatnot.
>
>
> b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of
> this afternoon. It hangs in a similar way, though I've set the compiledir
> to be in scratch in my theanorc. I've tracked it down in the debugger to this
> line
> <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L556>
> or this one
> <https://github.com/Theano/Theano/blob/25f0dee338b901070e021d431c938102890bc69f/theano/sandbox/cuda/__init__.py#L571> (both
> hang, which one is called depends on theano settings), which call this
> function
> <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L3170>
> or this one
> <https://github.com/Theano/Theano/blob/140d0a064523349b630a284247c7cddd767fc46e/theano/sandbox/cuda/cuda_ndarray.cu#L2947>.
> No idea why this started happening or if it's related -- it doesn't seem
> like it should be hitting nfs at all -- but it seemed to start at the same
> time.
>
>
>
> On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva <junieroliva at gmail.com>
> wrote:
>
>> Awesome, seems to work for me too. Thanks!!
>>
>> -Junier
>>
>> On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland <dougal at gmail.com>
>> wrote:
>>
>> Here's a workaround that avoids ipython having its config files / etc on
>> nfs, it seems to work for me:
>>
>> export IPYTHONDIR=/home/scratch/$USER/.ipython
>>
>> You can do this in a terminal or put it in your .bash_profile / similar to
>> make it permanent.
>>
>> (I guess this means something changed about the nfs server
>> yesterday/today that broke this.)
>>
>>
>> On Wed, Oct 26, 2016 at 7:03 PM yifei ma <mayifei1012 at gmail.com> wrote:
>>
>> Second that on foxconn. It launches but the ipython client won't start.
>>
>> Thanks,
>> Yifei
>>
>>
>> On 10/26/2016 01:03 PM, Junier Oliva wrote:
>>
>> Not sure if this is at all related, but is ipython broken for anyone else?
>> It seems to just hang upon launching it on several auton machines (GPU1,
>> GPU2, LOV4, LOW1).
>>
>> Thanks,
>> Junier
>>
>> On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland <dougal at gmail.com>
>> wrote:
>>
>> The same version of TensorFlow as on gpu3 is now installed on gpu2, along
>> with cudnn; let me know if there are issues.
>>
>> I didn't do a global install of Caffe on either machine, because Caffe is
>> kind of dumb and doesn't really do global installs. If anyone wants this,
>> talk to me and we can figure out what makes sense.
>>
>> - Dougal
>>
>> On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac <
>> predragp at imap.srv.cs.cmu.edu> wrote:
>>
>> Dear Autonians,
>>
>> As scheduled, I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards
>> per Barnabas. Dougal is currently upgrading TensorFlow and Caffe to the
>> latest and greatest. Please wait for his e-mail before you hit the
>> machine. MATLAB is broken, just like on GPU3, due to cuda-8.0. I am
>> working now on GPU1.
>>
>>
>> Predrag
>>
>> P.S. I will escalate the MATLAB issue with MathWorks, but I don't expect
>> it to be fixed before R2017a.
>>