GPU1 multiple crashes, server restored
Predrag Punosevac
predragp at andrew.cmu.edu
Fri Feb 10 12:26:54 EST 2023
GPU1 is an 8-year-old server. I will check the RAID and a few other things,
but we have to be realistic about the life span of these machines (5 years is
the industry standard).
Predrag
On Fri, Feb 10, 2023 at 12:19 PM Ian Char <ichar at andrew.cmu.edu> wrote:
> Sorry for the late follow-up on this. To the best of my knowledge I have
> been the only one using this machine (others please correct me if I am
> wrong). At first I thought it was related to the previous power outages,
> but I think there is something else going on here because of the frequency.
>
> I'm happy to look through my code again, but I think there is another
> issue because
> 1. I am running the same code on other machines fine and
> 2. I have noticed that a crash happened when I had no jobs running at all.
>
> Please let me know if there is anything I can do to help investigate this.
>
> Thanks,
> Ian
>
> On Tue, Feb 7, 2023 at 10:51 PM Predrag Punosevac <predragp at andrew.cmu.edu>
> wrote:
>
>> Whoever was using it should debug her/his code.
>>
>> root at gpu1$ pwd
>> /var/crash
>>
>> root at gpu1$ ls
>> 127.0.0.1-2022-12-21-17:28:08 127.0.0.1-2023-01-08-09:21:13
>> 127.0.0.1-2022-12-28-10:30:57 127.0.0.1-2023-01-28-17:12:06
>> 127.0.0.1-2023-01-01-07:47:43 127.0.0.1-2023-02-02-12:21:30
>> 127.0.0.1-2023-01-04-08:42:19 127.0.0.1-2023-02-07-01:54:10
>>
>> Predrag
>>
>
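The crash frequency mentioned in this thread can be estimated from the dump
directory names alone. What follows is a minimal sketch in Python, assuming
the names follow the <address>-YYYY-MM-DD-HH:MM:SS layout seen in the
/var/crash listing above. The helper names parse_crash_time and
crash_gaps_in_days are hypothetical, and the timestamps are copied by hand
from that listing rather than read from the server.

```python
from datetime import datetime

# Directory names copied from the /var/crash listing quoted above.
CRASH_DIRS = [
    "127.0.0.1-2022-12-21-17:28:08",
    "127.0.0.1-2022-12-28-10:30:57",
    "127.0.0.1-2023-01-01-07:47:43",
    "127.0.0.1-2023-01-04-08:42:19",
    "127.0.0.1-2023-01-08-09:21:13",
    "127.0.0.1-2023-01-28-17:12:06",
    "127.0.0.1-2023-02-02-12:21:30",
    "127.0.0.1-2023-02-07-01:54:10",
]


def parse_crash_time(name):
    """Extract the timestamp from a dump directory name.

    Assumes the layout <address>-YYYY-MM-DD-HH:MM:SS, so everything
    after the first hyphen is the timestamp.
    """
    _, stamp = name.split("-", 1)
    return datetime.strptime(stamp, "%Y-%m-%d-%H:%M:%S")


def crash_gaps_in_days(names):
    """Return the gaps, in days, between consecutive crashes."""
    times = sorted(parse_crash_time(n) for n in names)
    return [(later - earlier).total_seconds() / 86400
            for earlier, later in zip(times, times[1:])]


gaps = crash_gaps_in_days(CRASH_DIRS)
for gap in gaps:
    print(f"{gap:.1f} days")
```

Comparing these gaps with the job history on the machine would be one way to
check whether the crashes line up with a particular workload or also occur
while the machine is idle.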