CUDA hangs

Wed Nov 7 08:52:34 EST 2018

Problem solved after restarting tmux.
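
For anyone who finds this thread later, here is a minimal Python sketch of the CUDA_CACHE_PATH workaround quoted below. It assumes the variable has to be set before CUDA is initialized in the process, so it should run before `import torch`. The function name set_cuda_cache_path and the folder name "cuda_cache" are made up for illustration; only the /home/scratch/[your_id]/[some_folder] layout comes from the thread.

```python
import os


def set_cuda_cache_path(base_dir, user_id, folder):
    """Create base_dir/user_id/folder if needed and point CUDA_CACHE_PATH at it.

    Hypothetical helper, not part of PyTorch or CUDA. Returns the path used.
    """
    path = os.path.join(base_dir, user_id, folder)
    # Create the directory only if it is missing (works on Python 2 and 3).
    if not os.path.isdir(path):
        os.makedirs(path)
    # Assumption: the CUDA driver reads this variable when it initializes,
    # so it must be set before the first CUDA call in this process.
    os.environ["CUDA_CACHE_PATH"] = path
    return path


# Example call, to be placed before `import torch`
# ("your_id" and "cuda_cache" are placeholders):
# set_cuda_cache_path("/home/scratch", "your_id", "cuda_cache")
```

If the variable is exported in the shell instead, it may be worth checking os.environ.get("CUDA_CACHE_PATH") from inside Python to confirm that the process actually sees it.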

On 11/6/18 10:01 PM, Vincent Jeanselme wrote:
>
> Unfortunately not for me, I already had this path ...
>
> On 06/11/2018 at 21:51, Matthew Barnes wrote:
>> The CUDA_CACHE_PATH works! Thanks for the quick fix.
>>
>> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu> wrote:
>>
>>     We have run into this issue before: for some reason the CUDA
>>     cache can no longer live on the NFS server. Moving it to a
>>     scratch directory resolves the problem (works for me):
>>     export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
>>
>>     Thanks,
>>     Yichong
>>
>>
>>
>>>     On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu> wrote:
>>>
>>>     Could you try setting up everything in the scratch directory and
>>>     test that way (if that's not what you're already doing)? The
>>>     last time we had a CUDA problem I moved everything from
>>>     /zfsauton/home to /home/scratch directories and I cannot
>>>     reproduce the error on gpu{6,8,9}.
>>>
>>>     On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca> wrote:
>>>
>>>         I have a similar issue. When I submit the job, it says
>>>         Runtime error: CUDA error: unknown error. I tried the simple
>>>         commands that you provided, and they don't work either.
>>>
>>>         Qiong
>>>
>>>
>>>         November 6, 2018 3:02 PM, "Matthew Barnes"
>>>         <mbarnes1 at andrew.cmu.edu> wrote:
>>>
>>>             Is anyone else having issues with CUDA since this week?
>>>             Even simple pytorch commands hang:
>>>             (torch) bash-4.2$ python
>>>             Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>>>             [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>>>             Type "help", "copyright", "credits" or "license" for
>>>             more information.
>>>             >>> import torch
>>>             >>> x = torch.zeros(4)
>>>             >>> x.cuda()
>>>             nvidia-smi works, and torch.cuda.is_available() returns
>>>             True.
>>>
>>>
>>>
>>>
>>
> -- 
> Vincent Jeanselme
> -----------------
> Analyst Researcher
> Auton Lab - Robotics Institute
> Carnegie Mellon University

-- 
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University
