GPU 1 error
Predrag Punosevac
predragp at andrew.cmu.edu
Sat Nov 3 19:11:14 EDT 2018
Biswajit Paria <bparia at cs.cmu.edu> wrote:
> Hi Predrag,
>
> I am trying to use GPU 1, and getting an unusual segmentation fault. The
> same code that I was running for two days is now throwing a segmentation
> fault. Is it possible to restart GPU1? It doesn't look like anyone else is
> using it other than me.
Sure, if nobody is using it. Are you sure that you were using this
machine after I rebooted it last week? Those library exception errors are
typically due to NVIDIA's third-party binary blob drivers, which need to be
reinstalled occasionally. I will give it two hours and then reboot it at
the same time as I reboot GPU2. If the driver gets broken, it will have to
wait until Monday.
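
A minimal sketch of how one might confirm that no compute process is
still holding a GPU before a reboot. It assumes nvidia-smi is on the PATH
and that the driver is healthy enough to answer the query; the helper
names parse_compute_apps and list_compute_apps are hypothetical and are
not part of any existing tool on these machines.

```python
import csv
import io
import subprocess

# Assumed query: asks nvidia-smi for the compute processes currently on the GPUs.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(text):
    """Parse the CSV text printed by QUERY into a list of dicts."""
    apps = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        # Skip blank lines and anything that is not a pid,name,memory row.
        if len(fields) != 3:
            continue
        pid, name, mem = (f.strip() for f in fields)
        if not pid.isdigit():
            continue
        apps.append({"pid": int(pid), "process_name": name, "used_memory": mem})
    return apps


def list_compute_apps():
    """Run nvidia-smi (hypothetical helper) and return the parsed process list."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_compute_apps(result.stdout)
```

An empty list from list_compute_apps() would suggest that no compute job
is attached to the GPUs at that moment. It says nothing about users who
are logged in but idle, so it is only a rough check.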
>
> Here is the stack trace in case you want to have a look:
>
> Stack trace returned 10 entries:
> [bt] (0) /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x31f81a) [0x7feebb24f81a]
> [bt] (1) /zfsauton/home/bparia/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x29f33b6) [0x7feebd9233b6]
> [bt] (2) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> [bt] (3) /lib64/libpthread.so.0(raise+0x2b) [0x7fef7831954b]
> [bt] (4) /lib64/libpthread.so.0(+0xf680) [0x7fef78319680]
> [bt] (5) /usr/lib64/nvidia/libcuda.so.1(+0xf88d5) [0x7fef304548d5]
> [bt] (6) /usr/lib64/nvidia/libcuda.so.1(+0x248914) [0x7fef305a4914]
> [bt] (7) /usr/lib64/nvidia/libcuda.so.1(+0x1e4e80) [0x7fef30540e80]
> [bt] (8) /lib64/libpthread.so.0(+0x7dd5) [0x7fef78311dd5]
> [bt] (9) /lib64/libc.so.6(clone+0x6d) [0x7fef7803bb3d]
>
>
> Thanks in advance!
> --
> Biswajit Paria
> PhD student
> Machine Learning Department
> Carnegie Mellon University