From awd at cs.cmu.edu Tue May 2 10:14:44 2017 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 2 May 2017 10:14:44 -0400 Subject: Fwd: Second Paper Presentation - Karen Chen - Thursday, May 4 at noon - Room 2003 In-Reply-To: References: Message-ID: <8ee134a8-2229-285c-bfab-69b36a738ca1@cs.cmu.edu> Team, Karen will be presenting her qualifier work at Heinz college this Thursday. Please join if you can. Thanks Artur -------- Forwarded Message -------- Subject: CORRECTION: Second Paper Presentation - Karen Chen - Thursday, May 4 at noon - Room 2003 Date: Fri, 28 Apr 2017 18:59:15 +0000 From: Michelle Wirtz To: Heinz-phd at lists.andrew.cmu.edu , heinz-faculty at lists.andrew.cmu.edu , Amy Ogan Hi all, Please join us on Thursday, May 4, 2017 in Hamburg Hall Room 2003 at noon when Karen Chen will be presenting her second paper. *Title:*Peek into the Black Box: A Multimodal Analysis Framework for Automatic Characterization of the One-on-one Tutoring Processes *Committee: *Artur Dubrawski (chair), Daniel Nagin and Amy Ogen (HCII,SCS) *Abstract:* Student-teacher interactions during the one-on-one tutoring processes are rich forms of inter-personal communications with significant educational impact. An ideal teacher is able to pick up student's subtle signals in real time and respond optimally to offer cognitive and emotional support. However, until recently, the characterization of this information rich process has relied upon human observations which do not scale well. In this study, I made an attempt to automate the characterization process by leveraging the recent advances in affective computing and multi-modal machine learning techniques. I analyzed a series of video recordings of math problem solving sessions by a young student under support of his tutor, demonstrating a multimodal analysis framework to characterize several aspects of the student-teacher interaction patterns at a fine-grained temporal resolution. I then build machine learning models to predict teacher's response using extracted multi-modal features. In addition, I validate the performance of automatic detector of affect, intent-to-connect behavior, and voice activity, using annotated data, which provides evidence of the potential utility of the presented tools in scaling up analysis of this type to large number of subjects and in implementing decision support tools to guide teachers towards optimal intervention in real time. *Paper:*https://drive.google.com/open?id=0B8SWduW_x8gYcnN6YkhZSDA3WE0 ** -------------- next part -------------- An HTML attachment was scrubbed... URL: From sheath at andrew.cmu.edu Wed May 3 16:43:29 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Wed, 3 May 2017 16:43:29 -0400 Subject: Auton Lab Code and Coffee, next Thursday at 12:30 pm In-Reply-To: References: Message-ID: Hi all, Karen Chen is presenting a paper this Thursday at noon, so we'll be pushing this back a bit to 1 to 1:30 pm or thereabouts so we can all go see her talk as well. Simon On Wed, Apr 26, 2017 at 10:44 AM, Simon Heath wrote: > Hi all, > > I want to try something new, which I call Code and Coffee. The goal is > basically to be the equivalent of the brainstorming sessions, but oriented > less around what research we're doing and more around what research tools > we're using/developing/wish we had. I want to do them at least monthly, > maybe weekly if we can manage it, just to get a lot of people together in > the same room to talk about programming. > > The first session will be next Thursday at 12:30 PM, in NSH 3001. 
Feel > free to bring lunches, and if people want we can pass a hat around and get > a couple pots of actually good coffee. I will be kicking off by talking > about Collective Mind Data Server, a web service for time-series data > analysis being built by Jarod, Saswati, Anthony and myself. > > Simon > > -- > Simon Heath, Research Programmer and Analyst > Robotics Institute - Auton Lab > Carnegie Mellon University > sheath at andrew.cmu.edu > -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From sheath at andrew.cmu.edu Thu May 4 13:03:58 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Thu, 4 May 2017 13:03:58 -0400 Subject: Auton Lab Code and Coffee, next Thursday at 12:30 pm In-Reply-To: References: Message-ID: There's coffee in NSH 3001, if people bring code. Simon On Wed, Apr 26, 2017 at 10:44 AM, Simon Heath wrote: > Hi all, > > I want to try something new, which I call Code and Coffee. The goal is > basically to be the equivalent of the brainstorming sessions, but oriented > less around what research we're doing and more around what research tools > we're using/developing/wish we had. I want to do them at least monthly, > maybe weekly if we can manage it, just to get a lot of people together in > the same room to talk about programming. > > The first session will be next Thursday at 12:30 PM, in NSH 3001. Feel > free to bring lunches, and if people want we can pass a hat around and get > a couple pots of actually good coffee. I will be kicking off by talking > about Collective Mind Data Server, a web service for time-series data > analysis being built by Jarod, Saswati, Anthony and myself. > > Simon > > -- > Simon Heath, Research Programmer and Analyst > Robotics Institute - Auton Lab > Carnegie Mellon University > sheath at andrew.cmu.edu > -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Sat May 6 08:00:12 2017 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Sat, 6 May 2017 08:00:12 -0400 Subject: an opportunity to fight human trafficking with your own hands Message-ID: Team, We will have a unique opportunity to join a concerted effort and help generate and track leads to guide law enforcement in their field operations against human trafficking. This will involve using our Traffic Jam tool, as well as other analytic tools, in real time. If you are interested in participating and if you have the weekend of May 20-21 (plus sometime on Friday just before it) available please let me and Emily Kennedy (cc-d) know. Thanks Artur From awd at cs.cmu.edu Sun May 7 11:34:17 2017 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Sun, 7 May 2017 11:34:17 -0400 Subject: Fwd: PhD Speaking Qualifier (3-4pm, Mon, May 8th) - NSH 1109 - Matting and Depth Recovery of Thin Structures using a Focal Stack In-Reply-To: References: Message-ID: Good stuff from Chao on display tomorrow. Every Autonian is welcome to join and listen to his talk. 
Artur -------- Forwarded Message -------- Subject: PhD Speaking Qualifier (3-4pm, Mon, May 8th) - NSH 1109 - Matting and Depth Recovery of Thin Structures using a Focal Stack Date: Fri, 05 May 2017 10:37:59 -0400 From: Chao Liu Reply-To: chao.liu at cs.cmu.edu To: ri-people at cs.cmu.edu Hi everyone, I will be talking about my work on depth recovery for fine-grained structures for the PhD speaking qualifier next Monday (May 8th) at 3:00pm at NSH 1109. Everyone is invited. ----------------- Time: May 8, 2017, 3:00PM - 4:00PM Location: NSH 1109 Title: Matting and Depth Recovery of Thin Structures using a Focal Stack Abstract: Thin structures such as fences, grass and vessels are common in photography and scientific imaging. They exhibit complex 3D structures with sharp depth variations/discontinuities and mutual occlusions. In this paper, we develop a method to estimate the occlusion matte and depths of thin structures from a focal image stack, which is obtained either by varying the focus/aperture of the lens or computed from a one-shot light field image. We propose an image formation model that explicitly describes the spatially varying optical blur and mutual occlusions for structures located at different depths. Based on the model, we derive an efficient MCMC inference algorithm that enables direct and analytical computations of the iterative update for the model/images without re-rendering images in the sampling process. Then, the depths of the thin structures are recovered using gradient descent with the differential terms computed using the image formation model. We apply the proposed method to scenes at both macro and micro scales. For macro-scale, we evaluate our method on scenes with complex 3D thin structures such as tree branches and grass. For micro-scale, we apply our method to in-vivo microscopic images of micro-vessels with diameters less than 50 um. To our knowledge, the proposed method is the first approach to reconstruct the 3D structures of micro-vessels from non-invasive in-vivo image measurements. Committee: Srinivasa Narasimhan (co-advisor) Artur Dubrawski (co-advisor) Aswin Sankaranarayanan Yuxiong Wang ----------------- Thank you! - Chao -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at cs.cmu.edu Thu May 11 11:41:52 2017 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 11 May 2017 11:41:52 -0400 Subject: GPU1 scratch 100% full Message-ID: <20170511154152.Du-RefjZK%predragp@cs.cmu.edu> Dear Autonians, As you know we are not enforcing any quotas on our servers. That also mean that we also have to follow Auton Lab net etiquette. Could you please delete unnecessary files from your scratch directory on GPU1? If you need extra space we have 10TB RAID 6 on GPU1 and GPU2 that can be used on the need base. Best, Predrag From sheath at andrew.cmu.edu Thu May 11 15:32:45 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Thu, 11 May 2017 15:32:45 -0400 Subject: Code and Coffee, Thursday May 18th Message-ID: Hey all, Code and Coffee is next Thursday, May 18th in NSH 3001. If anyone has a topic they want to present or talk about, speak up or forever hold your peace! One topic suggestion from Ben and Maria is to talk about ways we can more effectively deal with resource sharing on our compute nodes: CPU, memory, scratch space, GPU's, etc. Is there some system or policy we can implement to help everyone share more harmoniously? 
Do we want a technical solution like a queue system, or it just a matter of keeping an eye on your work and communicating effectively? Perhaps we need more public service announcements like the one attached? Discuss, or offer alternative topics! Simon -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: gpu-machines.jpg Type: image/jpeg Size: 32781 bytes Desc: not available URL: From sheath at andrew.cmu.edu Thu May 11 15:52:22 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Thu, 11 May 2017 15:52:22 -0400 Subject: Code and Coffee, Thursday May 18th In-Reply-To: References: Message-ID: Oops, forgot to mention, it will start at 12:30 pm. Feel free to bring your lunch. Simon On Thu, May 11, 2017 at 3:32 PM, Simon Heath wrote: > Hey all, > > Code and Coffee is next Thursday, May 18th in NSH 3001. If anyone has a > topic they want to present or talk about, speak up or forever hold your > peace! > > One topic suggestion from Ben and Maria is to talk about ways we can more > effectively deal with resource sharing on our compute nodes: CPU, memory, > scratch space, GPU's, etc. Is there some system or policy we can implement > to help everyone share more harmoniously? Do we want a technical solution > like a queue system, or it just a matter of keeping an eye on your work and > communicating effectively? Perhaps we need more public service > announcements like the one attached? > > Discuss, or offer alternative topics! > > Simon > > -- > Simon Heath, Research Programmer and Analyst > Robotics Institute - Auton Lab > Carnegie Mellon University > sheath at andrew.cmu.edu > -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at cs.cmu.edu Thu May 11 21:55:24 2017 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 11 May 2017 21:55:24 -0400 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: Message-ID: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> Kirthevasan Kandasamy wrote: > Hi Predrag, > > I am re-running a tensorflow project on GPU1 - I haven't touched it in 4/5 > months, and the last time I ran it it worked fine, but when I try now I > seem to be getting the following error. > This is the first time I hear about it. I was under impression that GPU nodes were usable. I am redirecting your e-mail to users at autonlab.org in the hope that somebody who is using TensorFlow on the regular basis can be of more help. Predrag > Can you please tell me what the issue might be or direct me to someone who > might know? > > This is for the NIPS deadline, so I would appreciate a quick response. 
> > thanks, > Samy > > > I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with > properties: > name: Tesla K80 > major: 3 minor: 7 memoryClockRate (GHz) 0.8235 > pciBusID 0000:05:00.0 > Total memory: 11.17GiB > Free memory: 11.11GiB > I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 > I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y > I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow > device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:05:00.0) > E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN > library: 4007 (compatibility version 4000) but source was compiled with > 5103 (compatibility version 5100). If using a binary install, upgrade your > CuDNN library to match. If building from sources, make sure the library > loaded at runtime matches a compatible version specified during compile > configuration. > F tensorflow/core/kernels/conv_ops.cc:457] Check failed: > stream->parent()->GetConvolveAlgorithms(&algorithms) > run_resnet.sh: line 49: 22665 Aborted (core dumped) > CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir > $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every > $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus 1 > --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY > --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE --skip_size > $SKIP_SIZE From chiragn at andrew.cmu.edu Fri May 12 11:22:13 2017 From: chiragn at andrew.cmu.edu (chiragn at andrew.cmu.edu) Date: Fri, 12 May 2017 11:22:13 -0400 Subject: Naive Tensorflow/GPU1 question In-Reply-To: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> Message-ID: <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Have you tried running it from with iPython notebook as an interactive session? I am doing that right now and it works. Chirag > Kirthevasan Kandasamy wrote: > >> Hi Predrag, >> >> I am re-running a tensorflow project on GPU1 - I haven't touched it in >> 4/5 >> months, and the last time I ran it it worked fine, but when I try now I >> seem to be getting the following error. >> > > This is the first time I hear about it. I was under impression that GPU > nodes were usable. I am redirecting your e-mail to users at autonlab.org > in the hope that somebody who is using TensorFlow on the regular basis > can be of more help. > > Predrag > > > > >> Can you please tell me what the issue might be or direct me to someone >> who >> might know? >> >> This is for the NIPS deadline, so I would appreciate a quick response. >> >> thanks, >> Samy >> >> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 >> with >> properties: >> name: Tesla K80 >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >> pciBusID 0000:05:00.0 >> Total memory: 11.17GiB >> Free memory: 11.11GiB >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >> TensorFlow >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >> 0000:05:00.0) >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN >> library: 4007 (compatibility version 4000) but source was compiled with >> 5103 (compatibility version 5100). If using a binary install, upgrade >> your >> CuDNN library to match. 
If building from sources, make sure the library >> loaded at runtime matches a compatible version specified during compile >> configuration. >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >> stream->parent()->GetConvolveAlgorithms(&algorithms) >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus >> 1 >> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >> --skip_size >> $SKIP_SIZE > From kandasamy at cmu.edu Fri May 12 11:47:52 2017 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 12 May 2017 11:47:52 -0400 Subject: Naive Tensorflow/GPU1 question In-Reply-To: <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: No, I don't use iPython. On Fri, May 12, 2017 at 11:22 AM, wrote: > Have you tried running it from with iPython notebook as an interactive > session? > > I am doing that right now and it works. > > Chirag > > > > Kirthevasan Kandasamy wrote: > > > >> Hi Predrag, > >> > >> I am re-running a tensorflow project on GPU1 - I haven't touched it in > >> 4/5 > >> months, and the last time I ran it it worked fine, but when I try now I > >> seem to be getting the following error. > >> > > > > This is the first time I hear about it. I was under impression that GPU > > nodes were usable. I am redirecting your e-mail to users at autonlab.org > > in the hope that somebody who is using TensorFlow on the regular basis > > can be of more help. > > > > Predrag > > > > > > > > > >> Can you please tell me what the issue might be or direct me to someone > >> who > >> might know? > >> > >> This is for the NIPS deadline, so I would appreciate a quick response. > >> > >> thanks, > > >> Samy > >> > >> > >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 > >> with > >> properties: > >> name: Tesla K80 > >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 > >> pciBusID 0000:05:00.0 > >> Total memory: 11.17GiB > >> Free memory: 11.11GiB > >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 > >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y > >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating > >> TensorFlow > >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: > >> 0000:05:00.0) > >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN > >> library: 4007 (compatibility version 4000) but source was compiled with > >> 5103 (compatibility version 5100). If using a binary install, upgrade > >> your > >> CuDNN library to match. If building from sources, make sure the library > >> loaded at runtime matches a compatible version specified during compile > >> configuration. 
> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: > >> stream->parent()->GetConvolveAlgorithms(&algorithms) > >> run_resnet.sh: line 49: 22665 Aborted (core dumped) > >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir > >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every > >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus > >> 1 > >> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY > >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE > >> --skip_size > >> $SKIP_SIZE > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Fri May 12 11:59:28 2017 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 12 May 2017 15:59:28 +0000 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: It's possible that you followed some instructions I sent a while ago and are using your own version of cudnn. Try "echo $LD_LIBRARY_PATH" and make sure it only has things in /usr/local, /usr/lib64 (nothing in your own directories), and make sure that your python code doesn't change that.... The Anaconda python distribution now distributes cudnn and tensorflow-gpu, so you could also install that in your scratch dir to have your own install. But they only have tensorflow 1.0 and higher, so your old code would require some changes (system install on gpu1 is 0.10, and there were breaking changes in both 1.0 and 1.1). On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland wrote: > It works for me too, not in IPython. Try this: > > CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; > tf.InteractiveSession()' > > On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy > wrote: > >> No, I don't use iPython. >> >> On Fri, May 12, 2017 at 11:22 AM, wrote: >> >>> Have you tried running it from with iPython notebook as an interactive >>> session? >>> >>> I am doing that right now and it works. >>> >>> Chirag >>> >>> >>> > Kirthevasan Kandasamy wrote: >>> > >>> >> Hi Predrag, >>> >> >>> >> I am re-running a tensorflow project on GPU1 - I haven't touched it in >>> >> 4/5 >>> >> months, and the last time I ran it it worked fine, but when I try now >>> I >>> >> seem to be getting the following error. >>> >> >>> > >>> > This is the first time I hear about it. I was under impression that GPU >>> > nodes were usable. I am redirecting your e-mail to users at autonlab.org >>> > in the hope that somebody who is using TensorFlow on the regular basis >>> > can be of more help. >>> > >>> > Predrag >>> > >>> > >>> > >>> > >>> >> Can you please tell me what the issue might be or direct me to someone >>> >> who >>> >> might know? >>> >> >>> >> This is for the NIPS deadline, so I would appreciate a quick response. 
>>> >> >>> >> thanks, >>> >>> >> Samy >>> >> >>> >> >>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 >>> >> with >>> >> properties: >>> >> name: Tesla K80 >>> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >>> >> pciBusID 0000:05:00.0 >>> >> Total memory: 11.17GiB >>> >> Free memory: 11.11GiB >>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >>> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >>> >> TensorFlow >>> >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >>> >> 0000:05:00.0) >>> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime >>> CuDNN >>> >> library: 4007 (compatibility version 4000) but source was compiled >>> with >>> >> 5103 (compatibility version 5100). If using a binary install, upgrade >>> >> your >>> >> CuDNN library to match. If building from sources, make sure the >>> library >>> >> loaded at runtime matches a compatible version specified during >>> compile >>> >> configuration. >>> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >>> >> stream->parent()->GetConvolveAlgorithms(&algorithms) >>> >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >>> >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir >>> >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >>> >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET >>> --num_gpus >>> >> 1 >>> >> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY >>> >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >>> >> --skip_size >>> >> $SKIP_SIZE >>> > >>> >>> >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Fri May 12 11:55:49 2017 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 12 May 2017 15:55:49 +0000 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: It works for me too, not in IPython. Try this: CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; tf.InteractiveSession()' On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy wrote: > No, I don't use iPython. > > On Fri, May 12, 2017 at 11:22 AM, wrote: > >> Have you tried running it from with iPython notebook as an interactive >> session? >> >> I am doing that right now and it works. >> >> Chirag >> >> >> > Kirthevasan Kandasamy wrote: >> > >> >> Hi Predrag, >> >> >> >> I am re-running a tensorflow project on GPU1 - I haven't touched it in >> >> 4/5 >> >> months, and the last time I ran it it worked fine, but when I try now I >> >> seem to be getting the following error. >> >> >> > >> > This is the first time I hear about it. I was under impression that GPU >> > nodes were usable. I am redirecting your e-mail to users at autonlab.org >> > in the hope that somebody who is using TensorFlow on the regular basis >> > can be of more help. >> > >> > Predrag >> > >> > >> > >> > >> >> Can you please tell me what the issue might be or direct me to someone >> >> who >> >> might know? >> >> >> >> This is for the NIPS deadline, so I would appreciate a quick response. 
>> >> >> >> thanks, >> >> >> Samy >> >> >> >> >> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 >> >> with >> >> properties: >> >> name: Tesla K80 >> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >> >> pciBusID 0000:05:00.0 >> >> Total memory: 11.17GiB >> >> Free memory: 11.11GiB >> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >> >> TensorFlow >> >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >> >> 0000:05:00.0) >> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN >> >> library: 4007 (compatibility version 4000) but source was compiled with >> >> 5103 (compatibility version 5100). If using a binary install, upgrade >> >> your >> >> CuDNN library to match. If building from sources, make sure the >> library >> >> loaded at runtime matches a compatible version specified during compile >> >> configuration. >> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >> >> stream->parent()->GetConvolveAlgorithms(&algorithms) >> >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >> >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir >> >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >> >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET >> --num_gpus >> >> 1 >> >> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY >> >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >> >> --skip_size >> >> $SKIP_SIZE >> > >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Fri May 12 13:19:56 2017 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 12 May 2017 13:19:56 -0400 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: hey Dougal, I could run python and import tensorflow on GPU1 but the issue is when I run my command. Could it be that GPU4 is still using the older version of tensorflow? I can run my stuff on GPU4 without much of an issue but not on GPU1. Here's what LD_LIBRARY_PATH gives me oin GPU4 and GPU1 kkandasa at gpu4$ echo $LD_LIBRARY_PATH /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib: kkandasa at gpu1$ echo $LD_LIBRARY_PATH /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/usr/local/cuda/lib64:/usr/lib64/mpich/lib On Fri, May 12, 2017 at 11:59 AM, Dougal Sutherland wrote: > It's possible that you followed some instructions I sent a while ago and > are using your own version of cudnn. Try "echo $LD_LIBRARY_PATH" and make > sure it only has things in /usr/local, /usr/lib64 (nothing in your own > directories), and make sure that your python code doesn't change that.... > > The Anaconda python distribution now distributes cudnn and tensorflow-gpu, > so you could also install that in your scratch dir to have your own > install. But they only have tensorflow 1.0 and higher, so your old code > would require some changes (system install on gpu1 is 0.10, and there were > breaking changes in both 1.0 and 1.1). > > On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland > wrote: > >> It works for me too, not in IPython. 
Try this: >> >> CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; >> tf.InteractiveSession()' >> >> On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy >> wrote: >> >>> No, I don't use iPython. >>> >>> On Fri, May 12, 2017 at 11:22 AM, wrote: >>> >>>> Have you tried running it from with iPython notebook as an interactive >>>> session? >>>> >>>> I am doing that right now and it works. >>>> >>>> Chirag >>>> >>>> >>>> > Kirthevasan Kandasamy wrote: >>>> > >>>> >> Hi Predrag, >>>> >> >>>> >> I am re-running a tensorflow project on GPU1 - I haven't touched it >>>> in >>>> >> 4/5 >>>> >> months, and the last time I ran it it worked fine, but when I try >>>> now I >>>> >> seem to be getting the following error. >>>> >> >>>> > >>>> > This is the first time I hear about it. I was under impression that >>>> GPU >>>> > nodes were usable. I am redirecting your e-mail to >>>> users at autonlab.org >>>> > in the hope that somebody who is using TensorFlow on the regular basis >>>> > can be of more help. >>>> > >>>> > Predrag >>>> > >>>> > >>>> > >>>> > >>>> >> Can you please tell me what the issue might be or direct me to >>>> someone >>>> >> who >>>> >> might know? >>>> >> >>>> >> This is for the NIPS deadline, so I would appreciate a quick >>>> response. >>>> >> >>>> >> thanks, >>>> >>>> >> Samy >>>> >> >>>> >> >>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 >>>> >> with >>>> >> properties: >>>> >> name: Tesla K80 >>>> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >>>> >> pciBusID 0000:05:00.0 >>>> >> Total memory: 11.17GiB >>>> >> Free memory: 11.11GiB >>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >>>> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >>>> >> TensorFlow >>>> >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >>>> >> 0000:05:00.0) >>>> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime >>>> CuDNN >>>> >> library: 4007 (compatibility version 4000) but source was compiled >>>> with >>>> >> 5103 (compatibility version 5100). If using a binary install, >>>> upgrade >>>> >> your >>>> >> CuDNN library to match. If building from sources, make sure the >>>> library >>>> >> loaded at runtime matches a compatible version specified during >>>> compile >>>> >> configuration. >>>> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >>>> >> stream->parent()->GetConvolveAlgorithms(&algorithms) >>>> >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >>>> >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py >>>> --data_dir >>>> >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >>>> >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET >>>> --num_gpus >>>> >> 1 >>>> >> --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY >>>> >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >>>> >> --skip_size >>>> >> $SKIP_SIZE >>>> > >>>> >>>> >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dougal at gmail.com Fri May 12 13:24:14 2017 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 12 May 2017 17:24:14 +0000 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: The error you showed *should* be triggered by starting a session (not just by importing tensorflow, but the command I sent earlier does that). It could be that your torch install in your home directory is messing with things. Try exporting LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib64/mpich/lib before starting python. On Fri, May 12, 2017 at 6:20 PM Kirthevasan Kandasamy wrote: > hey Dougal, > > I could run python and import tensorflow on GPU1 but the issue is when I > run my command. > Could it be that GPU4 is still using the older version of tensorflow? > > I can run my stuff on GPU4 without much of an issue but not on GPU1. > Here's what LD_LIBRARY_PATH gives me oin GPU4 and GPU1 > > kkandasa at gpu4$ echo $LD_LIBRARY_PATH > > /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib: > > kkandasa at gpu1$ echo $LD_LIBRARY_PATH > > /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/usr/local/cuda/lib64:/usr/lib64/mpich/lib > > > > On Fri, May 12, 2017 at 11:59 AM, Dougal Sutherland > wrote: > >> It's possible that you followed some instructions I sent a while ago and >> are using your own version of cudnn. Try "echo $LD_LIBRARY_PATH" and make >> sure it only has things in /usr/local, /usr/lib64 (nothing in your own >> directories), and make sure that your python code doesn't change that.... >> >> The Anaconda python distribution now distributes cudnn and >> tensorflow-gpu, so you could also install that in your scratch dir to have >> your own install. But they only have tensorflow 1.0 and higher, so your old >> code would require some changes (system install on gpu1 is 0.10, and there >> were breaking changes in both 1.0 and 1.1). >> >> On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland >> wrote: >> >>> It works for me too, not in IPython. Try this: >>> >>> CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; >>> tf.InteractiveSession()' >>> >>> On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy >>> wrote: >>> >>>> No, I don't use iPython. >>>> >>>> On Fri, May 12, 2017 at 11:22 AM, wrote: >>>> >>>>> Have you tried running it from with iPython notebook as an interactive >>>>> session? >>>>> >>>>> I am doing that right now and it works. >>>>> >>>>> Chirag >>>>> >>>>> >>>>> > Kirthevasan Kandasamy wrote: >>>>> > >>>>> >> Hi Predrag, >>>>> >> >>>>> >> I am re-running a tensorflow project on GPU1 - I haven't touched it >>>>> in >>>>> >> 4/5 >>>>> >> months, and the last time I ran it it worked fine, but when I try >>>>> now I >>>>> >> seem to be getting the following error. >>>>> >> >>>>> > >>>>> > This is the first time I hear about it. I was under impression that >>>>> GPU >>>>> > nodes were usable. I am redirecting your e-mail to >>>>> users at autonlab.org >>>>> > in the hope that somebody who is using TensorFlow on the regular >>>>> basis >>>>> > can be of more help. >>>>> > >>>>> > Predrag >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >> Can you please tell me what the issue might be or direct me to >>>>> someone >>>>> >> who >>>>> >> might know? 
>>>>> >> >>>>> >> This is for the NIPS deadline, so I would appreciate a quick >>>>> response. >>>>> >> >>>>> >> thanks, >>>>> >>>>> >> Samy >>>>> >> >>>>> >> >>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 >>>>> >> with >>>>> >> properties: >>>>> >> name: Tesla K80 >>>>> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >>>>> >> pciBusID 0000:05:00.0 >>>>> >> Total memory: 11.17GiB >>>>> >> Free memory: 11.11GiB >>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >>>>> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >>>>> >> TensorFlow >>>>> >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >>>>> >> 0000:05:00.0) >>>>> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime >>>>> CuDNN >>>>> >> library: 4007 (compatibility version 4000) but source was compiled >>>>> with >>>>> >> 5103 (compatibility version 5100). If using a binary install, >>>>> upgrade >>>>> >> your >>>>> >> CuDNN library to match. If building from sources, make sure the >>>>> library >>>>> >> loaded at runtime matches a compatible version specified during >>>>> compile >>>>> >> configuration. >>>>> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >>>>> >> stream->parent()->GetConvolveAlgorithms(&algorithms) >>>>> >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >>>>> >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py >>>>> --data_dir >>>>> >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >>>>> >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET >>>>> --num_gpus >>>>> >> 1 >>>>> >> --save_model_dir $SAVE_MODEL_DIR --save_model_every >>>>> $SAVE_MODEL_EVERY >>>>> >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >>>>> >> --skip_size >>>>> >> $SKIP_SIZE >>>>> > >>>>> >>>>> >>>>> >>>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Fri May 12 13:32:39 2017 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 12 May 2017 13:32:39 -0400 Subject: Naive Tensorflow/GPU1 question In-Reply-To: References: <20170512015524.wNTLB7Eso%predragp@cs.cmu.edu> <3dd033ee1efd4d00c14db94d52ba37a7.squirrel@webmail.andrew.cmu.edu> Message-ID: actually, I could run your commdn too: kkandasa at gpu1$ CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; tf.InteractiveSession()' I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:85:00.0 Total memory: 11.17GiB Free memory: 11.11GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0) Ihere is the error, I get. 
kkandasa at gpu1$ bash run_resnet2.sh I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally ./cifar10py/train/data_batch_3 ./cifar10py/train/data_batch_4 ./cifar10py/train/train_file Could not read file train_file. ./cifar10py/train/data_batch_2 ./cifar10py/train/data_batch_1 ./cifar10py/valid/data_batch_5 ./cifar10py/valid/valid_file Could not read file valid_file. --- Architecture --- initial filters: 64 residual groups: (11) [64, 64], [64, 64], [128, 128], [128, 128], [128, 128], [256, 256], [256, 256], [256, 256], [256, 256], [512, 512], [512, 512] final fc nodes: 1000 skip add method: linear # total model params: 29703429 (29,703,429) # trainable model params: 14846530 (14,846,530) I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: Tesla K80 major: 3 minor: 7 memoryClockRate (GHz) 0.8235 pciBusID 0000:85:00.0 Total memory: 11.17GiB Free memory: 11.11GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0) E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime CuDNN library: 4007 (compatibility version 4000) but source was compiled with 5103 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration. F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) run_resnet2.sh: line 33: 9035 Aborted (core dumped) CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py --data_dir $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET --num_gpus 1 --save_model_dir $SAVE_MODEL_DIR --save_model_every $SAVE_MODEL_EVERY --architecture $ARCHITECTURE --skip_size $SKIP_SIZE kkandasa at gpu1$ echo $LD_LIBRARY_PATH /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib:/usr/local/cuda/lib64:/usr/lib64/mpich/lib On Fri, May 12, 2017 at 1:24 PM, Dougal Sutherland wrote: > The error you showed *should* be triggered by starting a session (not > just by importing tensorflow, but the command I sent earlier does that). > > It could be that your torch install in your home directory is messing with > things. Try exporting LD_LIBRARY_PATH=/usr/local/ > cuda/lib64:/usr/lib64/mpich/lib before starting python. > > On Fri, May 12, 2017 at 6:20 PM Kirthevasan Kandasamy > wrote: > >> hey Dougal, >> >> I could run python and import tensorflow on GPU1 but the issue is when I >> run my command. >> Could it be that GPU4 is still using the older version of tensorflow? >> >> I can run my stuff on GPU4 without much of an issue but not on GPU1. 
>> Here's what LD_LIBRARY_PATH gives me oin GPU4 and GPU1 >> >> kkandasa at gpu4$ echo $LD_LIBRARY_PATH >> /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/ >> kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/install/lib: >> >> kkandasa at gpu1$ echo $LD_LIBRARY_PATH >> /zfsauton/home/kkandasa/torch/install/lib:/zfsauton/home/ >> kkandasa/torch/install/lib:/zfsauton/home/kkandasa/torch/ >> install/lib:/usr/local/cuda/lib64:/usr/lib64/mpich/lib >> >> >> >> On Fri, May 12, 2017 at 11:59 AM, Dougal Sutherland >> wrote: >> >>> It's possible that you followed some instructions I sent a while ago and >>> are using your own version of cudnn. Try "echo $LD_LIBRARY_PATH" and make >>> sure it only has things in /usr/local, /usr/lib64 (nothing in your own >>> directories), and make sure that your python code doesn't change that.... >>> >>> The Anaconda python distribution now distributes cudnn and >>> tensorflow-gpu, so you could also install that in your scratch dir to have >>> your own install. But they only have tensorflow 1.0 and higher, so your old >>> code would require some changes (system install on gpu1 is 0.10, and there >>> were breaking changes in both 1.0 and 1.1). >>> >>> On Fri, May 12, 2017 at 4:55 PM Dougal Sutherland >>> wrote: >>> >>>> It works for me too, not in IPython. Try this: >>>> >>>> CUDA_VISIBLE_DEVICES=5 python -c 'import tensorflow as tf; >>>> tf.InteractiveSession()' >>>> >>>> On Fri, May 12, 2017 at 4:55 PM Kirthevasan Kandasamy < >>>> kandasamy at cmu.edu> wrote: >>>> >>>>> No, I don't use iPython. >>>>> >>>>> On Fri, May 12, 2017 at 11:22 AM, wrote: >>>>> >>>>>> Have you tried running it from with iPython notebook as an interactive >>>>>> session? >>>>>> >>>>>> I am doing that right now and it works. >>>>>> >>>>>> Chirag >>>>>> >>>>>> >>>>>> > Kirthevasan Kandasamy wrote: >>>>>> > >>>>>> >> Hi Predrag, >>>>>> >> >>>>>> >> I am re-running a tensorflow project on GPU1 - I haven't touched >>>>>> it in >>>>>> >> 4/5 >>>>>> >> months, and the last time I ran it it worked fine, but when I try >>>>>> now I >>>>>> >> seem to be getting the following error. >>>>>> >> >>>>>> > >>>>>> > This is the first time I hear about it. I was under impression that >>>>>> GPU >>>>>> > nodes were usable. I am redirecting your e-mail to >>>>>> users at autonlab.org >>>>>> > in the hope that somebody who is using TensorFlow on the regular >>>>>> basis >>>>>> > can be of more help. >>>>>> > >>>>>> > Predrag >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> >> Can you please tell me what the issue might be or direct me to >>>>>> someone >>>>>> >> who >>>>>> >> might know? >>>>>> >> >>>>>> >> This is for the NIPS deadline, so I would appreciate a quick >>>>>> response. 
>>>>>> >> >>>>>> >> thanks, >>>>>> >>>>>> >> Samy >>>>>> >> >>>>>> >> >>>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found >>>>>> device 0 >>>>>> >> with >>>>>> >> properties: >>>>>> >> name: Tesla K80 >>>>>> >> major: 3 minor: 7 memoryClockRate (GHz) 0.8235 >>>>>> >> pciBusID 0000:05:00.0 >>>>>> >> Total memory: 11.17GiB >>>>>> >> Free memory: 11.11GiB >>>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 >>>>>> >> I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y >>>>>> >> I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating >>>>>> >> TensorFlow >>>>>> >> device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: >>>>>> >> 0000:05:00.0) >>>>>> >> E tensorflow/stream_executor/cuda/cuda_dnn.cc:347] Loaded runtime >>>>>> CuDNN >>>>>> >> library: 4007 (compatibility version 4000) but source was compiled >>>>>> with >>>>>> >> 5103 (compatibility version 5100). If using a binary install, >>>>>> upgrade >>>>>> >> your >>>>>> >> CuDNN library to match. If building from sources, make sure the >>>>>> library >>>>>> >> loaded at runtime matches a compatible version specified during >>>>>> compile >>>>>> >> configuration. >>>>>> >> F tensorflow/core/kernels/conv_ops.cc:457] Check failed: >>>>>> >> stream->parent()->GetConvolveAlgorithms(&algorithms) >>>>>> >> run_resnet.sh: line 49: 22665 Aborted (core dumped) >>>>>> >> CUDA_VISIBLE_DEVICES=$GPU python ../resnettf/resnet_main.py >>>>>> --data_dir >>>>>> >> $DATA_DIR --max_batch_iters $NUM_ITERS --report_results_every >>>>>> >> $REPORT_RESULTS_EVERY --log_root $LOG_ROOT --dataset $DATASET >>>>>> --num_gpus >>>>>> >> 1 >>>>>> >> --save_model_dir $SAVE_MODEL_DIR --save_model_every >>>>>> $SAVE_MODEL_EVERY >>>>>> >> --skip_add_method $SKIP_ADD_METHOD --architecture $ARCHITECTURE >>>>>> >> --skip_size >>>>>> >> $SKIP_SIZE >>>>>> > >>>>>> >>>>>> >>>>>> >>>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Fri May 12 14:04:15 2017 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 12 May 2017 14:04:15 -0400 Subject: Hogging up GPU4 Message-ID: Hi all, I am going to hog up all 4 nodes on GPU4 for the next week. All 4 GPUs have to be on the same machine since it is a parallelised experiment. Please don't start a job on any of the GPUs on GPU4. Sometimes, the GPUs might be idle, but please don't start a job. (The idle time is part of the experiment where I am trying to show that our method uses less idle time.) For some of the alternatives, this idle time could be up to a couple of hours. Thanks! samy -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Fri May 12 14:04:15 2017 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 12 May 2017 14:04:15 -0400 Subject: Hogging up GPU4 Message-ID: Hi all, I am going to hog up all 4 nodes on GPU4 for the next week. All 4 GPUs have to be on the same machine since it is a parallelised experiment. Please don't start a job on any of the GPUs on GPU4. Sometimes, the GPUs might be idle, but please don't start a job. (The idle time is part of the experiment where I am trying to show that our method uses less idle time.) For some of the alternatives, this idle time could be up to a couple of hours. Thanks! samy -------------- next part -------------- An HTML attachment was scrubbed... 
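[A small sketch of how one might check current device usage before launching anything, so that a GPU which merely looks idle, like the reserved ones above, is skipped deliberately. It assumes nvidia-smi is on the PATH; the query fields are standard nvidia-smi options, and the snippet is illustrative rather than part of the announcement.]

```python
# Sketch: list per-GPU utilization and memory before starting a job.
# Assumes nvidia-smi is installed and on the PATH of the GPU node.
import subprocess

query = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]
output = subprocess.check_output(query).decode("utf-8")
for line in output.splitlines():
    index, util, used, total = [field.strip() for field in line.split(",")]
    print("GPU %s: %s%% util, %s/%s MiB memory in use" % (index, util, used, total))
```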
URL: From sheath at andrew.cmu.edu Mon May 15 11:51:25 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Mon, 15 May 2017 11:51:25 -0400 Subject: Code and Coffee, Thursday May 18th In-Reply-To: References: Message-ID: Predrag has requested we not do this topic while he's out of town, so I guess we should push it back to the next meeting. Any suggestions for things people want to cover? If not I can say a few words about how to interface Python and C code and get them talking back and forth. Simon On Thu, May 11, 2017 at 3:32 PM, Simon Heath wrote: > Hey all, > > Code and Coffee is next Thursday, May 18th in NSH 3001. If anyone has a > topic they want to present or talk about, speak up or forever hold your > peace! > > One topic suggestion from Ben and Maria is to talk about ways we can more > effectively deal with resource sharing on our compute nodes: CPU, memory, > scratch space, GPU's, etc. Is there some system or policy we can implement > to help everyone share more harmoniously? Do we want a technical solution > like a queue system, or it just a matter of keeping an eye on your work and > communicating effectively? Perhaps we need more public service > announcements like the one attached? > > Discuss, or offer alternative topics! > > Simon > > -- > Simon Heath, Research Programmer and Analyst > Robotics Institute - Auton Lab > Carnegie Mellon University > sheath at andrew.cmu.edu > -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From boecking at andrew.cmu.edu Wed May 17 13:40:27 2017 From: boecking at andrew.cmu.edu (Benedikt Boecking) Date: Wed, 17 May 2017 13:40:27 -0400 Subject: Computing node resources Message-ID: All, Right now there are over 80 threads running on lov4, a machine that only has 64 cores. The same happened on lov3 earlier today. I know that deadlines are approaching but please try to follow some reasonable person principles. Here is a non-exhaustive list of things you should do before running experiments on our servers: 1. Before starting a new job, check the amount of available memory and how many other jobs are currently running. The easiest way to do this is to use htop. 2. If a computing node is at its limit, check if any other nodes are underutilized (http://monit.autonlab.org:8080/status/hosts/ ) 3. ?nice" your jobs if they require a lot of resources and will be running for a long time (https://en.wikipedia.org/wiki/Nice_(Unix)) 4. Use a reasonable number of threads and limit excessive memory usage. 5. Close your jupyter notebooks, matlab sessions etc. that you don?t need anymore 6. Move files from the scratch to your home directory on zfsauton if you don?t need them anymore for your current experiments. 7. If you are using GPUs, use nvidia-smi to check utilization and make sure your code does not automatically allocate all GPUs and all GPU memory to your experiment. Please respond to this email if you have any additional recommendations for your fellow lab members. Best, Ben -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sibi.venkatesan at gmail.com Wed May 17 13:52:52 2017 From: sibi.venkatesan at gmail.com (Sibi Venkatesan) Date: Wed, 17 May 2017 17:52:52 +0000 Subject: Computing node resources In-Reply-To: References: Message-ID: Hi everyone, I ran into an issue earlier today where I thought I was running some single threaded python code but numpy defaults to multi-threaded for some linear algebra. I think I might be the only one running into this problem (perhaps because my numpy version is different or something). But I got around it by setting the following environment variables when I wanted to force it to be single threaded: export MKL_NUM_THREADS=1 export NUMEXPR_NUM_THREADS=1 export OMP_NUM_THREADS=1 More info here: http://stackoverflow.com/questions/17053671/python-how-do-you-stop-numpy-from-multithreading http://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy Maybe there's a simpler way, or maybe no one else faces this problem. But it did fix the issue for me. On Wed, May 17, 2017 at 1:41 PM Benedikt Boecking wrote: > All, > > Right now there are over *80 threads* running on *lov4*, a machine that > only has *64 cores*. The same happened on lov3 earlier today. I know that > deadlines are approaching but please try to follow some reasonable person > principles. Here is a non-exhaustive list of things you should do before > running experiments on our servers: > > 1. Before starting a new job, check the amount of available memory and how > many other jobs are currently running. The easiest way to do this is to use > htop. > 2. If a computing node is at its limit, check if any other nodes are > underutilized (http://monit.autonlab.org:8080/status/hosts/) > 3. ?nice" your jobs if they require a lot of resources and will be running > for a long time (https://en.wikipedia.org/wiki/Nice_(Unix)) > 4. Use a reasonable number of threads and limit excessive memory usage. > 5. Close your jupyter notebooks, matlab sessions etc. that you don?t need > anymore > 6. Move files from the scratch to your home directory on zfsauton if you > don?t need them anymore for your current experiments. > 7. If you are using GPUs, use nvidia-smi to check utilization and make > sure your code does not automatically allocate all GPUs and all GPU memory > to your experiment. > > Please respond to this email if you have any additional recommendations > for your fellow lab members. > > Best, > Ben > > > > -- - Sibi -------------- next part -------------- An HTML attachment was scrubbed... URL: From sheath at andrew.cmu.edu Wed May 17 15:53:30 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Wed, 17 May 2017 15:53:30 -0400 Subject: Code and Coffee reminder -- tomorrow at 12:30 pm in NSH 3001 Message-ID: Just a reminder, you are all invited to Code and Coffee on Thursday May 18th at 12:30 pm. It will be in NSH 3001 and we will talk about whatever you want, though if nobody has any objections I can give a quick lesson on how to write fast numerical code in C and then access it from Python. Predrag wants to be present for the talk about compute node sharing policy and he is on vacation this week, though if we don't mind rehashing the topic for him we can do that too. I'll bring the coffee, everyone else bring the code. Simon -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From awd at cs.cmu.edu Wed May 17 17:52:40 2017 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Wed, 17 May 2017 17:52:40 -0400 Subject: Fwd: Congratulations to Dean Andrew Moore! In-Reply-To: <37EF77A8-C06C-46CB-B4A8-2F7006C97E62@andrew.cmu.edu> References: <37EF77A8-C06C-46CB-B4A8-2F7006C97E62@andrew.cmu.edu> Message-ID: <7e66c671-9e28-6155-d285-e7de5b217256@cs.cmu.edu> Hearty congrats Andrew ! PS I am told that the rumors suggesting that the bow tie in the attached picture was photoshopped are totally unfounded. PSPS Yeah. PSPSPS They forgot to recognize his biggest achievement: Establishing the Auton Lab :P -------- Forwarded Message -------- Subject: Congratulations to Dean Andrew Moore! Date: Wed, 17 May 2017 20:41:10 +0000 From: CMU Provost Dear Friends and Colleagues, I am writing to share some exciting recognition recently received by Andrew Moore, dean of the School of Computer Science. At the recent Maecenas Gala , a black-tie fundraiser for the Pittsburgh Opera, several local educators, public servants, business leaders and artists were honored for contributions to Pittsburgh that have helped distinguish and define our city. The gala celebrated Andrew as the Maecenas Honoree in Technology. (You might be wondering what Andrew Moore at a black-tie gala might look like, so I have attached a photo of the occasion ? but don?t worry, he was still wearing his trademark orange socks!) In addition, David Porges, Executive Chairman of CMU?s Board of Trustees, was also honored for his contributions to the energy industry here in Pittsburgh. Andrew has not only been an incredible leader of our School of Computer Science, but through his mentorship of students and his cultivation of excellence in the technology sector in Pittsburgh, he is ensuring that this city ? and the institution of Carnegie Mellon University ? remain global leaders in innovative technologies, next generation computing, and data-driven discoveries. Congrats to Andrew Moore on this well-deserved recognition! Warm regards, Farnam *Farnam Jahanian | Provost and Chief Academic Officer | Carnegie Mellon University* 5000 Forbes Avenue, Warner Hall| Pittsburgh, PA 15213 | o: 412.268.3363 provost at cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0379 Maecenas XXXIII.jpg Type: image/jpeg Size: 1332195 bytes Desc: not available URL: From sheath at andrew.cmu.edu Thu May 18 11:24:59 2017 From: sheath at andrew.cmu.edu (Simon Heath) Date: Thu, 18 May 2017 11:24:59 -0400 Subject: Reminder: Code and coffee TODAY 12:30 pm in NSH 3001 Message-ID: This is the last spammy reminder, I promise! -- Simon Heath, Research Programmer and Analyst Robotics Institute - Auton Lab Carnegie Mellon University sheath at andrew.cmu.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at imap.srv.cs.cmu.edu Mon May 22 12:24:04 2017 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Mon, 22 May 2017 12:24:04 -0400 Subject: Neill file server practically killed In-Reply-To: References: Message-ID: <60eafd306c8a0e5cb72ded80312db8fa@imap.srv.cs.cmu.edu> This message is of concern only for the members of Neill group. I just got from a vocation to find out that one of the group members have practically trashed the file server over the past 7 days. 
A prime suspect is emcfowla, with 5.8T of data.

Over the past week somebody has created so much data on the file server that the ZFS pool is practically 100% in use (6.5 TB of compressed data). ZFS is not a normal file system, and once it is that full I will have a very hard time recovering file server operation. I need you to stop generating any data onto NFS right now; otherwise none of you will be able to log into the system.

It will take me a day or two to develop a strategy for how to deal with this (the fact that I am receiving 4 interns today is not helping). The server might be unusable for at least a week. One possible outcome is that we will need to buy a new file server to resume day-to-day operations while we clean and rebuild this one.

Best,
Predrag

-------- Original Message --------
Subject: Autonlab-sysinfo Digest, Vol 34, Issue 51
Date: 2017-05-22 12:00
From: autonlab-sysinfo-request at autonlab.org
To: autonlab-sysinfo at autonlab.org
Reply-To: autonlab-sysinfo at autonlab.org

Send Autonlab-sysinfo mailing list submissions to
autonlab-sysinfo at autonlab.org

To subscribe or unsubscribe via the World Wide Web, visit
https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo
or, via email, send a message with subject or body 'help' to
autonlab-sysinfo-request at autonlab.org

You can reach the person managing the list at
autonlab-sysinfo-owner at autonlab.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of Autonlab-sysinfo digest..."

Today's Topics:

   1. Critical Alerts (auton.sysnotify at gmail.com)
   2. Critical Alerts (auton.sysnotify at gmail.com)
   3. Critical Alerts (auton.sysnotify at gmail.com)
   4. Critical Alerts (auton.sysnotify at gmail.com)
   5. M/Monit report (ILIM: Monit instance changed monit on 22 May 13:36:34 +0000) (auton.sysnotify at gmail.com)
   6. Critical Alerts (auton.sysnotify at gmail.com)

----------------------------------------------------------------------

Message: 1
Date: Mon, 22 May 2017 10:10:01 -0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: Critical Alerts
Message-ID: <5922b8fa.83d4370a.3eeea.3018 at mx.google.com>
Content-Type: text/plain; charset="utf-8"

The capacity for the volume 'zfsneill' is currently at 92%, while the recommended value is below 80%.

------------------------------

Message: 2
Date: Mon, 22 May 2017 10:55:00 -0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: Critical Alerts
Message-ID: <5922c385.c72c370a.2f0d0.2fec at mx.google.com>
Content-Type: text/plain; charset="utf-8"

The capacity for the volume 'zfsneill' is currently at 93%, while the recommended value is below 80%.

------------------------------

Message: 3
Date: Mon, 22 May 2017 11:45:00 -0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: Critical Alerts
Message-ID: <5922cf3d.4e9f370a.98314.4357 at mx.google.com>
Content-Type: text/plain; charset="utf-8"

The capacity for the volume 'zfsneill' is currently at 94%, while the recommended value is below 80%.

------------------------------

Message: 4
Date: Mon, 22 May 2017 13:10:01 -0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: Critical Alerts
Message-ID: <5922e32a.e332c80a.cf14b.05cc at mx.google.com>
Content-Type: text/plain; charset="utf-8"

The capacity for the volume 'zfsneill' is currently at 95%, while the recommended value is below 80%.
------------------------------

Message: 5
Date: Mon, 22 May 2017 13:53:36 +0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: M/Monit report (ILIM: Monit instance changed monit on 22 May 13:36:34 +0000)
Message-ID: <177C61CBCF6F4F16A56E7A5025779844 at monit.int.autonlab.org>
Content-Type: text/plain; charset="utf-8"

Date: 22 May 13:36:34 +0000
Host: ILIM
Service: monit
Action: Start
Description: Monit 5.16 started

Your faithful employee,
M/Monit
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

Message: 6
Date: Mon, 22 May 2017 15:40:01 -0000
From: auton.sysnotify at gmail.com
To: sysinfo at autonlab.org
Subject: Critical Alerts
Message-ID: <59230652.9163370a.9da54.5e52 at mx.google.com>
Content-Type: text/plain; charset="utf-8"

The capacity for the volume 'zfsneill' is currently at 96%, while the recommended value is below 80%.

------------------------------

Subject: Digest Footer

_______________________________________________
Autonlab-sysinfo mailing list
Autonlab-sysinfo at autonlab.org
https://mailman.srv.cs.cmu.edu/mailman/listinfo/autonlab-sysinfo

------------------------------

End of Autonlab-sysinfo Digest, Vol 34, Issue 51
************************************************

From kandasamy at cmu.edu Mon May 22 14:24:30 2017
From: kandasamy at cmu.edu (Kirthevasan Kandasamy)
Date: Mon, 22 May 2017 14:24:30 -0400
Subject: Hogging up GPU4
In-Reply-To:
References:
Message-ID:

Hi all,

Thanks everyone for your understanding on this. I have created a set of new (larger) jobs, and once again they all need to be on the same machine. I am expecting this to take till about next Friday, i.e. June 2nd. Since the NIPS deadline is over and some of the other GPUs are free now, I hope people can use them instead.

best,
Samy

On Fri, May 12, 2017 at 2:04 PM, Kirthevasan Kandasamy wrote:
> Hi all,
>
> I am going to hog up all 4 nodes on GPU4 for the next week. All 4 GPUs have to be on the same machine since it is a parallelised experiment.
>
> Please don't start a job on any of the GPUs on GPU4.
>
> Sometimes, the GPUs might be idle, but please don't start a job. (The idle time is part of the experiment, where I am trying to show that our method uses less idle time.) For some of the alternatives, this idle time could be up to a couple of hours.
>
> Thanks!
> samy
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chiragn at andrew.cmu.edu Mon May 22 18:44:23 2017
From: chiragn at andrew.cmu.edu (chiragn at andrew.cmu.edu)
Date: Mon, 22 May 2017 18:44:23 -0400
Subject: benchmarking on low1
Message-ID: <4ce358fa1a033e9d5357194b9bfb28be.squirrel@webmail.andrew.cmu.edu>

Hi all,

I am benchmarking some code on low1 for a submission due tomorrow. I would appreciate it if you could please avoid using low1 until tomorrow at 11:59 PM.

PS: I am willing to free up all the other resources I have occupied on any other machines until then, in case that is required.

Many thanks,
Chirag

From predragp at imap.srv.cs.cmu.edu Wed May 24 13:23:31 2017
From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac)
Date: Wed, 24 May 2017 13:23:31 -0400
Subject: Why is lop1 rebooting alla time?
In-Reply-To: <717df7a8-2fb2-912d-2d36-a5a6c495f2da@ri.cmu.edu>
References: <717df7a8-2fb2-912d-2d36-a5a6c495f2da@ri.cmu.edu>
Message-ID: <1643f02839a249e465ae8392b5372ce1@imap.srv.cs.cmu.edu>

On 2017-05-24 13:18, Rob Maclachlan wrote:
> Messes with my x2go session.
>
> Rob
>
> -bash-4.4$ uptime
>  1:15PM up 1 min, 1 user, load averages: 1.27, 0.39, 0.14
> -bash-4.4$ last reboot
> reboot ~ Wed May 24 13:15
> reboot ~ Wed May 24 09:07
> reboot ~ Tue May 23 22:33
> reboot ~ Tue May 23 17:50
> reboot ~ Tue May 23 17:30
> reboot ~ Tue May 23 08:34
> reboot ~ Mon May 22 21:09
> reboot ~ Mon May 22 18:18
> reboot ~ Mon May 22 17:29
> reboot ~ Sun May 21 14:15
> reboot ~ Sat May 20 22:47
> reboot ~ Sat May 20 13:57
> reboot ~ Sat May 20 08:42
>
> wtmp begins Sat May 20 08:42 2017

Thanks for bringing this to my attention! My guess is flaky hardware. Something is dying. Possibly the power supply. I noticed lots of Monit restart messages but I was not too concerned as the machine is fairly new.

I will follow up on this report.

Predrag

From predragp at imap.srv.cs.cmu.edu Wed May 24 16:45:44 2017
From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac)
Date: Wed, 24 May 2017 16:45:44 -0400
Subject: Why is lop1 rebooting alla time?
In-Reply-To: <1643f02839a249e465ae8392b5372ce1@imap.srv.cs.cmu.edu>
References: <717df7a8-2fb2-912d-2d36-a5a6c495f2da@ri.cmu.edu> <1643f02839a249e465ae8392b5372ce1@imap.srv.cs.cmu.edu>
Message-ID: <8ec40450fecd2c4c05d49537e02fc6da@imap.srv.cs.cmu.edu>

On 2017-05-24 13:23, Predrag Punosevac wrote:
> On 2017-05-24 13:18, Rob Maclachlan wrote:
>> Messes with my x2go session.
>>

I just got back from the server room and I think we have a hardware issue. The AC adapter was very hot (this micro server has no classical power supply). I just put in a new AC adapter. If that doesn't fix the problem we will try something else. Bottom line, I have enough hardware to replace this machine if it turns out we have a bigger problem.

Predrag

>> Rob
>>
>> -bash-4.4$ uptime
>>  1:15PM up 1 min, 1 user, load averages: 1.27, 0.39, 0.14
>> -bash-4.4$ last reboot
>> reboot ~ Wed May 24 13:15
>> reboot ~ Wed May 24 09:07
>> reboot ~ Tue May 23 22:33
>> reboot ~ Tue May 23 17:50
>> reboot ~ Tue May 23 17:30
>> reboot ~ Tue May 23 08:34
>> reboot ~ Mon May 22 21:09
>> reboot ~ Mon May 22 18:18
>> reboot ~ Mon May 22 17:29
>> reboot ~ Sun May 21 14:15
>> reboot ~ Sat May 20 22:47
>> reboot ~ Sat May 20 13:57
>> reboot ~ Sat May 20 08:42
>>
>> wtmp begins Sat May 20 08:42 2017
>
> Thanks for bringing this to my attention! My guess is flaky hardware.
> Something is dying. Possibly the power supply. I noticed lots of Monit
> restart messages but I was not too concerned as the machine is fairly
> new.
>
> I will follow up on this report.
>
> Predrag

From predragp at imap.srv.cs.cmu.edu Wed May 24 16:48:21 2017
From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac)
Date: Wed, 24 May 2017 16:48:21 -0400
Subject: Why is lop1 rebooting alla time?
In-Reply-To: <1643f02839a249e465ae8392b5372ce1@imap.srv.cs.cmu.edu>
References: <717df7a8-2fb2-912d-2d36-a5a6c495f2da@ri.cmu.edu> <1643f02839a249e465ae8392b5372ce1@imap.srv.cs.cmu.edu>
Message-ID:

On 2017-05-24 13:23, Predrag Punosevac wrote:
> On 2017-05-24 13:18, Rob Maclachlan wrote:
>> Messes with my x2go session.
>>
>> Rob
>>

I just got back from the server room. The AC adapter was extremely hot (this micro server doesn't have a normal power supply). I just put in the new AC adapter. We will monitor the situation for a few days. If this doesn't fix the problem we will try something else. Bottom line, I have enough hardware on my hands right now even to replace this machine if need be.
Predrag

>> -bash-4.4$ uptime
>>  1:15PM up 1 min, 1 user, load averages: 1.27, 0.39, 0.14
>> -bash-4.4$ last reboot
>> reboot ~ Wed May 24 13:15
>> reboot ~ Wed May 24 09:07
>> reboot ~ Tue May 23 22:33
>> reboot ~ Tue May 23 17:50
>> reboot ~ Tue May 23 17:30
>> reboot ~ Tue May 23 08:34
>> reboot ~ Mon May 22 21:09
>> reboot ~ Mon May 22 18:18
>> reboot ~ Mon May 22 17:29
>> reboot ~ Sun May 21 14:15
>> reboot ~ Sat May 20 22:47
>> reboot ~ Sat May 20 13:57
>> reboot ~ Sat May 20 08:42
>>
>> wtmp begins Sat May 20 08:42 2017
>
> Thanks for bringing this to my attention! My guess is flaky hardware.
> Something is dying. Possibly the power supply. I noticed lots of Monit
> restart messages but I was not too concerned as the machine is fairly
> new.
>
> I will follow up on this report.
>
> Predrag

From sheath at andrew.cmu.edu Wed May 31 04:06:51 2017
From: sheath at andrew.cmu.edu (Simon Heath)
Date: Wed, 31 May 2017 04:06:51 -0400
Subject: Code and Coffee -- This Thursday at 12:30 pm, NSH 3001
Message-ID:

Dear all,

Just a reminder, Code and Coffee is this Thursday (tomorrow!) at 12:30 pm in NSH 3001. I am out of town, though, so you are on your own for discussion topics. If anyone wants to make coffee, it's on the shelf next to my desk.

Enjoy!

Simon

--
Simon Heath, Research Programmer and Analyst
Robotics Institute - Auton Lab
Carnegie Mellon University
sheath at andrew.cmu.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
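As a possible starting point for the open discussion topics above, here is a minimal sketch of the "fast numerical code in C, called from Python" topic Simon floated for the May 18 session. The actual lesson content is not in this thread; this only illustrates the ctypes calling pattern, using the system math library (libm) as a stand-in because no lab-built library is named here.

    # Call a C function from Python via ctypes, with libm's sqrt as a stand-in
    # for a compiled shared library of your own.
    import ctypes
    import ctypes.util

    libm_path = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
    if libm_path is None:
        raise OSError("could not locate the C math library on this system")

    libm = ctypes.CDLL(libm_path)

    # Declare the C signature so ctypes converts arguments correctly:
    #     double sqrt(double x);
    libm.sqrt.argtypes = [ctypes.c_double]
    libm.sqrt.restype = ctypes.c_double

    print(libm.sqrt(2.0))  # 1.4142135623730951

For your own C code the steps are the same: compile it into a shared library (for example, cc -O3 -shared -fPIC yourcode.c -o libyourcode.so, with hypothetical file names), load it with ctypes.CDLL, and declare argtypes/restype for each function you call. Cython and cffi are common alternatives to plain ctypes.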