Lov5 Not Responding

Predrag Punosevac predragp at andrew.cmu.edu
Thu Apr 30 21:55:28 EDT 2020


Naji Shajarisales <najis at cmu.edu> wrote:

> Hi Predrag,
> 
> Just wanted to say I cannot reach lov5 now. 

Just fixed with a cold reboot. A runaway Python script did it. I am not
naming names :-) I would like to reboot it one more time after updating
all packages. The server is already out of production. Give it 10-15
minutes before trying to log in.


> I have quite some good updates
> that I haven't committed on it right now.
> 

I am not following 100%. If it is code, Git commits are cheap. You
should commit frequently and push regularly. If it is data stored on
anything ZFS, it will come back undamaged. If it is the output of a
script that was still writing, it could very well be corrupt or
nonexistent because the process was terminated. Anything on XFS could be
corrupted due to the unclean power cycle. Only ZFS, HAMMER1, and HAMMER2
handle crashes 100% clean. WAPBL will be OK in most situations. Linux
has none of it.
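
For script output specifically, one common way to reduce the risk of a
half-written file is to write to a temporary file and rename it into
place once it is complete. The following is only a minimal sketch,
assuming a POSIX system; the function name atomic_write and the ".tmp-"
prefix are made up for illustration and are not anything installed on
lov5. It does not protect against every filesystem failure mode, only
against a reader finding a partially written result.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so a reader sees the old file or the new one,
    never a partially written one (hypothetical helper, POSIX assumed)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to disk first
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Best effort: fsync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A script would call atomic_write("results.bin", payload) at each
checkpoint instead of keeping one long-lived output file open. If the
process is killed mid-write, the previous complete results.bin is still
there and only the temporary file is lost.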


> I hope it is possible to bring lov5 back without deleting anything from
> scratch.


Scratch has been deleted only 3 times in the almost 8 years I have been
affiliated with the Lab. Every time, that happened after a looong wait,
when it was at 100%, unusable, and people didn't care about it. Now you
are probably referring to the data corruption described earlier. I can't
promise that. I am a system admin, not a magician.

> 
> I would appreciate if you address this asap.

As always :-).

Cheers,
Predrag

> 
> Best,
> Naji


More information about the Autonlab-users mailing list