ipython hangs on Auton cluster

Predrag Punosevac predragp at andrew.cmu.edu
Wed Aug 19 18:46:45 EDT 2020


Your report indicates that my gut feeling that the SQLite database is the
culprit seems to be correct.  Per our documentation

https://www.autonlab.org/autonlab_wiki/aetiquette.html#don-ts

*Use your scratch directory to store the Jupyter SQLite database!*

You placed your SQLite database on the NFS share (zfsauton2) and you are
surprised that it is incoherent. I hope you now better understand the lack
of urgency in my responses.

Predrag
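
A minimal sketch of how one might check whether the IPython history
database sits on a network file system, which is the situation described
above. It parses text in the /proc/mounts format; the helper names, the
list of network file system types, and the mount points in the usage
comment are assumptions for illustration, not cluster facts.

```python
import os

# File system types treated as "network" here; an illustrative, incomplete list.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "lustre", "glusterfs"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is text in /proc/mounts format (device, mount point, type, ...).
    Escaped characters in mount points (e.g. \\040 for spaces) are not decoded.
    """
    path = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so /data does not match /database.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def on_network_fs(path, mounts_text):
    """True if path lives on one of the file system types in NETWORK_FS."""
    return fs_type_for(path, mounts_text)[1] in NETWORK_FS


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    # Default location of the IPython history database for the default profile.
    history = os.path.expanduser("~/.ipython/profile_default/history.sqlite")
    with open("/proc/mounts") as handle:
        mounts = handle.read()
    print(history, fs_type_for(history, mounts), on_network_fs(history, mounts))
```

If the check reports a network file system, IPython can be pointed at a
local file instead, for example with
`ipython --HistoryManager.hist_file=/tmp/ipython_hist.sqlite`, or by setting
`c.HistoryManager.hist_file` in `ipython_config.py` to a path inside your
scratch directory (the exact scratch path depends on the server).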

On Wed, Aug 19, 2020 at 6:03 PM Viraj Mehta <virajm at andrew.cmu.edu> wrote:

> Hi Predrag & Users,
>
> I have a clue as to what is wrong with our cluster. Had a few processes
> running which broke due to this sqlite error from ipython:
> I’d imagine this is what is wrong with all our ipython stuff. No idea how
> to debug this, but I hope it can be helpful as we try to fix this.
>
> Thanks,
> Viraj
>
> On Aug 18, 2020, at 10:28 PM, Chufan Gao <chufang at andrew.cmu.edu> wrote:
>
> Hi All,
>
> Rachel and I are also facing a similar issue with our Jupyter notebooks.
> We also both reinstalled jupyter with no effect.
>
> For me, these notebooks are extremely helpful in fast code iteration and
> testing out concepts.
> I also have the intuition that it is an upstream issue, as they were
> running fine (without any changes) before lop2 went down.
> Would you please take another look?
>
> Worst case, I have to convert my notebooks into .py files, which will slow
> things down.
>
> Sincerely,
> Chufan (Andy) Gao
> ------------------------------
> *From:* Autonlab-users <autonlab-users-bounces at autonlab.org> on behalf of
> Predrag Punosevac <predragp at andrew.cmu.edu>
> *Sent:* Tuesday, August 18, 2020 10:35:11 PM
> *To:* Viraj Mehta
> *Cc:* users at autonlab.org
> *Subject:* Re: ipython hangs on Auton cluster
>
> Viraj Mehta <virajm at andrew.cmu.edu> wrote:
>
> > I'm pretty sure it's not an upstream bug, as many environments
> > (conda and virtualenv) which were working with ipython across several
> > python versions before are now not working.
> >
> > I understand that ipython and ipdb aren't typically required for
> > Python workflows but certain efforts, like stepping through code that
> > requires a GPU and loads a model from the Auton cluster, are difficult
> > to debug without ipdb. Is there anything else that has changed that
> > might have broken it?
>
> Nothing that I am aware of. However, you do understand that the system
> is very complex and it is like a live organism constantly morphing.
>
> Best,
> Predrag
>
>
>
> >
> > Thanks,
> > Viraj
> >
> > > On Aug 18, 2020, at 6:21 PM, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> > >
> > > I looked a bit more carefully. It could be an upstream bug. It
> wouldn't be the first time
> > >
> > > https://github.com/ipython/ipython/issues/11678
> ipython won't start · Issue #11678 · ipython/ipython · GitHub
> Now I'm facing that ipython won't start without any error messages. I
> tried to run it with DEBUG, then the command will be "uninterruptible
> sleep" after the logs. $ pyenv global system $ python --version Python
> 2.7.5 $ ipython --version ...
>
>
> > >
> > > You don't need ipython to run Python code. You could work and debug
> your code on your local machine and just run production code on the server.
> Typical Python code is just a script starting with a shebang followed by
> the path to the interpreter. I fail to see how ipython could be useful for
> that. It is surely useful for interactive work.
> > >
> > > Predrag
> > >
> > > On Tue, Aug 18, 2020 at 5:45 PM Viraj Mehta <virajm at andrew.cmu.edu> wrote:
> > > Tried this with 3.7 and 3.8 and it still hangs. Also if it's a good
> clue, it doesn't stop even if I send SIGINT or SIGQUIT. Not really sure
> what's going on here.
> > >
> > >> On Aug 18, 2020, at 4:39 PM, Viraj Mehta <virajm at andrew.cmu.edu> wrote:
> > >>
> > >> Yeah, I'll give it a shot. Thanks!
> > >>
> > >>> On Aug 18, 2020, at 4:38 PM, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> > >>>
> > >>> I just upgraded all /opt/conda-py37 and /opt/conda-py38 packages on
> both GPU9 and GPU11. Could you please try again? Could you also try with
> py38 which is now recommended and report back. If this works I will upgrade
> packages across all servers. This could be potentially remotely related to
> the fact that Ifegenia could not build TensorFlow. Another thought is that
> the ipython SQLite database is corrupted.
> > >>>
> > >>> Best,
> > >>> Predrag
> > >>>
> > >>> On Tue, Aug 18, 2020 at 4:34 PM Viraj Mehta <virajm at andrew.cmu.edu> wrote:
> > >>> Hi Predrag,
> > >>>
> > >>> Hope you're doing well. I've been running into an issue the last
> couple days on the Auton cluster that is blocking my work on code that used
> to work and was hoping to get your thoughts. I have tried to distill this
> down to a small but replicable issue, as seen in the attachment, which I
> have seen hang on the ipython call on GPU9 and GPU11 so far. Do you know
> why this might be? Thanks.
> > >>>
> > >>> Best,
> > >>> Viraj
> > >>
> > >
> >
>
>
>
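
The thread also raises the possibility that the ipython SQLite database is
corrupted. A minimal sketch of how that could be checked with Python's
standard sqlite3 module and SQLite's `PRAGMA integrity_check` follows. The
function name is hypothetical, and the timeout only bounds waits on SQLite
locks; it cannot rescue a process stuck on an unresponsive NFS mount, so it
is safer to run this on a copy of the file on local disk.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def integrity_report(db_path, timeout=5.0):
    """Return the rows of PRAGMA integrity_check, or the error text on failure.

    A healthy database yields ["ok"]. The file is opened read-only so the
    check itself cannot modify or create it.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        with closing(sqlite3.connect(uri, uri=True, timeout=timeout)) as conn:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # Covers unreadable files, non-database files, and lock timeouts.
        return [str(exc)]
    return [row[0] for row in rows]
```

Called on a copy of `~/.ipython/profile_default/history.sqlite` (the default
history location), anything other than a single "ok" entry suggests the file
should be moved aside so that IPython can create a fresh one.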
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 534822 bytes
Desc: not available
URL: <http://mailman.srv.cs.cmu.edu/pipermail/autonlab-users/attachments/20200819/816c63ba/attachment-0001.png>

