GPU1 multiple crashes, server restored
Ian Char
ichar at andrew.cmu.edu
Fri Feb 10 12:18:49 EST 2023
Sorry for the late follow-up on this. To the best of my knowledge I have
been the only one using this machine (others please correct me if I am
wrong). At first I thought it was related to the previous power outages,
but given how frequent the crashes are, I think something else is going on.
I'm happy to look through my code again, but I think the cause lies
elsewhere, because:
1. I am running the same code on other machines without problems, and
2. I noticed that a crash happened while I had no jobs running at all.
Please let me know if there is anything I can do to help investigate this.
Thanks,
Ian
On Tue, Feb 7, 2023 at 10:51 PM Predrag Punosevac <predragp at andrew.cmu.edu>
wrote:
> Whoever was using it should debug her/his code
>
> root@gpu1$ pwd
> /var/crash
>
> root@gpu1$ ls
> 127.0.0.1-2022-12-21-17:28:08 127.0.0.1-2023-01-08-09:21:13
> 127.0.0.1-2022-12-28-10:30:57 127.0.0.1-2023-01-28-17:12:06
> 127.0.0.1-2023-01-01-07:47:43 127.0.0.1-2023-02-02-12:21:30
> 127.0.0.1-2023-01-04-08:42:19 127.0.0.1-2023-02-07-01:54:10
>
> Predrag
>