CUDA hangs
Vincent Jeanselme
vjeansel at andrew.cmu.edu
Tue Nov 6 22:01:18 EST 2018
Unfortunately not for me, I already had this path ...
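
For anyone wanting to double-check that setting, here is a rough Python sketch (an illustration only, assuming a Linux machine that exposes /proc/mounts). It reports whether CUDA_CACHE_PATH is visible to the current process and which filesystem type backs it, so an "nfs" result would point at the problem described below. The helper names filesystem_type and describe_cuda_cache are made up for this example and are not part of CUDA or PyTorch.

```python
import os


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return the filesystem type of the deepest mount point containing path.

    Assumes a Linux-style mounts table; returns None if it cannot be read.
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    try:
        with open(mounts_file) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mount_point, fs_type = fields[1], fields[2]
                prefix = mount_point.rstrip("/") + "/"
                if path == mount_point or path.startswith(prefix):
                    # Prefer the longest match; later entries win ties
                    # because they shadow earlier mounts.
                    if len(mount_point) >= len(best_mount):
                        best_mount, best_type = mount_point, fs_type
    except (IOError, OSError):
        return None
    return best_type


def describe_cuda_cache(environ=os.environ):
    """Summarize the CUDA_CACHE_PATH setting seen by this process."""
    cache = environ.get("CUDA_CACHE_PATH")
    if not cache:
        return "CUDA_CACHE_PATH is not set in this process"
    return "CUDA_CACHE_PATH=%s (filesystem type: %s)" % (
        cache, filesystem_type(cache))


if __name__ == "__main__":
    print(describe_cuda_cache())
```

Running this inside the submitted job itself, rather than only in the login shell, shows whether the variable actually reaches the process that calls CUDA.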
On 06/11/2018 at 21:51, Matthew Barnes wrote:
> The CUDA_CACHE_PATH works! Thanks for the quick fix.
>
> On Tue, Nov 6, 2018 at 9:44 PM Yichong Xu <yichongx at cs.cmu.edu
> <mailto:yichongx at cs.cmu.edu>> wrote:
>
> We have run into this issue before: for some reason the CUDA cache
> can no longer live on the NFS server. Pointing it at a scratch
> directory resolves the problem (works for me):
> export CUDA_CACHE_PATH=/home/scratch/[your_id]/[some_folder]
>
> Thanks,
> Yichong
>
>
>
>> On Nov 6, 2018, at 7:41 PM, Emre Yolcu <eyolcu at cs.cmu.edu
>> <mailto:eyolcu at cs.cmu.edu>> wrote:
>>
>> Could you try setting up everything in the scratch directory and
>> test that way (if that's not what you're already doing)? The last
>> time we had a CUDA problem I moved everything from /zfsauton/home
>> to /home/scratch directories and I cannot reproduce the error on
>> gpu{6,8,9}.
>>
>> On Tue, Nov 6, 2018 at 6:41 PM, <qiong.zhang at stat.ubc.ca
>> <mailto:qiong.zhang at stat.ubc.ca>> wrote:
>>
>> I have a similar issue. When I submit the job, it says
>> Runtime error: CUDA error: unknown error. I tried the simple
>> commands that you provided, and they don't work either.
>>
>> Qiong
>>
>>
>> November 6, 2018 3:02 PM, "Matthew Barnes"
>> <mbarnes1 at andrew.cmu.edu
>> <mailto:mbarnes1 at andrew.cmu.edu>>
>> wrote:
>>
>> Has anyone else been having issues with CUDA this week?
>> Even simple pytorch commands hang:
>> (torch) bash-4.2$ python
>> Python 2.7.5 (default, Jul 3 2018, 19:30:05)
>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
>> Type "help", "copyright", "credits" or "license" for more
>> information.
>> >>> import torch
>> >>> x = torch.zeros(4)
>> >>> x.cuda()
>> nvidia-smi works, and torch.cuda.is_available() returns True.
>>
>>
>>
>>
>
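
A hang like the one in the reproduction quoted above can be turned into a clear result by running the same commands in a child process with a time limit. Below is a minimal sketch, assuming Python 3.5 or later and that PyTorch is installed in the interpreter being used. The helper name run_with_timeout and the 60-second limit are arbitrary choices for illustration.

```python
import subprocess
import sys

# The same three commands as in the quoted session, as a one-liner.
REPRO = "import torch; x = torch.zeros(4); print(x.cuda())"


def run_with_timeout(code, seconds):
    """Run `code` in a fresh interpreter; return 'ok', 'failed' or 'hung'."""
    try:
        result = subprocess.run([sys.executable, "-c", code], timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    print(run_with_timeout(REPRO, 60))
```

One caveat: if the child is blocked inside the kernel (for example on an unresponsive NFS mount), it may not exit promptly even when killed, so the wrapper itself could still take longer than the limit.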
--
Vincent Jeanselme
-----------------
Analyst Researcher
Auton Lab - Robotics Institute
Carnegie Mellon University