From predragp at andrew.cmu.edu Fri Jan 4 19:11:23 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 4 Jan 2019 19:11:23 -0500
Subject: GPU10 fully functional
Message-ID:

Happy New Year to everyone!

I just got out of the machine room and I have good news (sort of). GPU10 is now fully functional and nvidia-smi correctly reports three GeForce 1080Ti cards. It turns out the fourth card was a lemon (DoA). It took 2h of playing combinatorics to pinpoint the culprit. The other problem was that the proprietary pigtail power cables used for GPU cards have a tendency to lose connection on one of the 4x4 ends (GPU cards need a 4x4+3x3 power supply). That shows up as the NVidia card being reported as underpowered. I think that I have this one under control now.

Best,
Predrag

From awd at cs.cmu.edu Mon Jan 7 09:20:40 2019
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 7 Jan 2019 09:20:40 -0500
Subject: Lecture at Pitt of potential relevance [Fwd: FW: REMINDER: CRISMA Weekly Research Conference - January 8, 2019]
In-Reply-To:
References:
Message-ID:

[note this is not our Mathieu!]

Artur

---------- Forwarded message ---------
From: Clermont, Gilles
Date: Mon, Jan 7, 2019 at 8:07 AM
Subject: FW: REMINDER: CRISMA Weekly Research Conference - January 8, 2019
To: awd at cs.cmu.edu, Hauskrecht, Milos

Matthieu is in town for a couple of days. Probably appropriate for some of the students to attend this. @Artur. He is from Imperial College. Maybe overlap with Fabian.
gilles

*From:* Toboz, Jaclyn
*Sent:* Monday, January 7, 2019 7:57 AM
*Subject:* REMINDER: CRISMA Weekly Research
Conference - January 8, 2019

*Please note room location for this lecture:*

*ANNOUNCEMENT*

*CCM CRISMA WEEKLY RESEARCH CONFERENCE*

Tuesday, January 8, 2019
12:00 pm - 1:00 pm
Lecture Room 5, 4th floor, Scaife Hall

*"Improving sepsis resuscitation with reinforcement learning"*

Matthieu Komorowski, MD
Clinical Senior Lecturer, Imperial College London
Honorary Consultant, Intensive Care Unit, Charing Cross Hospital, London
Visiting Scholar, Laboratory of Computational Physiology, Harvard-MIT Health Sciences and Technology

*Lunch will be provided*

*Continuing Medical Education*

*The University of Pittsburgh School of Medicine is accredited by the ACCME to provide continuing medical education for physicians.*

*The University of Pittsburgh School of Medicine designates this educational activity for a maximum of 1.0 AMA PRA Category 1 Credit™.*

*Physicians should only claim credit commensurate with the extent of their participation in the activity.*

*Other health care professionals are awarded 1.0 continuing education units (CEU's) which are equal to 1.0 contact hours.*

*Faculty Disclosure*

*In accordance with Accreditation Council for Continuing Medical Education requirements on disclosure, information about relationships of presenters with commercial interests (if any) will be included in materials distributed at the time of the conference.*

Thank You,

*Jackie*

Jackie Toboz
Senior Administrative Assistant
Department of Critical Care Medicine (CCM)
And CRISMA Center -- *C*linical *R*esearch, *I*nvestigation and *S*ystems *M*odeling of *A*cute Illness
3550 Terrace St
606E Scaife Hall
Pittsburgh, PA 15261
Phone: 412-647-7125
Fax: 412-647-8060
Email: tobozjm2 at upmc.edu

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From awd at cs.cmu.edu Mon Jan 7 09:25:48 2019
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 7 Jan 2019 09:25:48 -0500
Subject: Fwd: [ML-news] [KDD 2019] Call for Applied Data Science Papers
In-Reply-To: <932110c4-d36d-480e-95fd-c144c2b2ed29@googlegroups.com>
References: <932110c4-d36d-480e-95fd-c144c2b2ed29@googlegroups.com>
Message-ID:

This is potentially a good outlet for some of our work that has a strong application context. Please let me know if you'd like to submit something.

Artur

---------- Forwarded message ---------
From: Ping Zhang
Date: Sun, Jan 6, 2019 at 9:53 PM
Subject: [ML-news] [KDD 2019] Call for Applied Data Science Papers
To: Machine Learning News

=====================================
KDD 2019 Call for Applied Data Science Papers
=====================================

Details in https://www.kdd.org/kdd2019/calls/view/kdd-2019-call-for-applied-data-science-papers
Website for submissions: https://easychair.org/conferences/?conf=kdd19 .

**************
Key Dates
**************

Submission: February 3, 2019
Notification: Apr 28, 2019
Camera-ready: May 17, 2019
Short Promotional Video of Accepted Papers (Required): June 2, 2019
Source Code (Optional): June 2, 2019
Conference (Anchorage, Alaska): August 3 to August 7, 2019

All deadlines are at 11:59PM Alofi Time. There will be absolutely no exception to these deadlines.

**************
Description
**************

We solicit submissions of papers describing designs and implementations of solutions and systems for practical tasks in data mining, data analytics, data science, and applied machine learning. The primary emphasis is on papers that either solve or advance the understanding of issues related to deploying data science technologies in the real world. Submitted papers will go through a peer review process.
The Applied Data Science Track is distinct from the Research Track in that submissions focus on applied work addressing real-world problems and systems demonstrating tangible impact/value in their respective domains (e.g., industries, government initiatives, social programs).

Submissions must clearly identify the category they fall into: "deployed" or "evidential". The ADS Chairs might shift a submission from one category to another if they find that the submission is misplaced. The criteria for submissions in each category are as follows:

CATEGORY Deployed: Must describe implementation of a system that solves a significant real-world problem and is (or was) in production use for an extended period of time. The paper should present the problem, its significance to the application domain, the decisions and tradeoffs made when making design choices for the solution, the deployment challenges, and the lessons learned from successes and failures. Evidence must be provided that the solution has been deployed by quantifying post-launch performance. Papers that describe enabling infrastructure for large-scale deployment of applied machine learning also fall in this category. An example might be a deployed system that collects heartbeat audio from mobile phones during a marathon race and uses machine learning to identify potentially irregular signals and to alert support personnel.

Examples from past KDD conferences:
HinDroid: An Intelligent Android Malware Detection System Based on Structured Heterogeneous Information Network
Cascade Ranking for Operational E-commerce Search

CATEGORY Evidential: Must describe fundamental insights derived from addressing a significant real-world problem, even though a system has not been deployed. This might include papers providing significant gains in the understanding of an applied area/domain (for example, involving data or system deployment needs) or even papers where a conclusion has been reached that the problem is unsolvable.
In addition to insights the paper must explain what milestones were reached, what the practical impact is, and (if applicable) what the obstacles to deployment are. Straightforward improvements over trivial baseline solutions are unlikely to qualify. Continuing the example above, a paper in this category might present a system that achieves reasonable error rates in an experiment with many volunteers but suffers from interferences among mobiles that are located very close to each other.

Examples from past KDD conferences:
DeepSD: Generating High Resolution Climate Change Projections through Single Image Super-Resolution
A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments
Backpage and Bitcoin: Uncovering Human Traffickers
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
Automated Categorization of Onion Sites for Analyzing the Darkweb Ecosystem

Please consult the guidelines for authors here.

Submission topics include but are not limited to:

Target application area: Business
  Advertising and E-commerce
  Finance
  Marketing
  Markets and Crowds
  Recommender systems

Target application area: Life Sciences
  Bioinformatics
  Clinical Decision Support
  Clinical Research
  Healthcare and Caregiving
  mHealth
  Patient Empowerment

Target application area: Social and Network Sciences
  Crowdsourcing
  Network sciences
  Social good
  Social media and publishing
  Social sciences
  User modeling
  Web mining

Target application area: Facilitating the Learning Process
  Big Data infrastructures
  Cloud, Map-Reduce, MPI
  Data protection
  Design of experiments
  Interpretable models
  Large-scale optimization
  Scalable algorithms

Further target application areas
  Education
  Mobile and Sensor devices
  Security
  Transportation

****************************
Submission directions
****************************

KDD is a dual track conference hosting both a Research track and an Applied Data Science track.
Due to the large number of submissions, papers submitted to the Research track will not be considered for publication in the Applied Data Science track and vice versa. Authors are encouraged to read the track descriptions carefully and to choose an appropriate track for their submissions.

Following KDD conference tradition, reviews are not double-blind, and author names and affiliations should be listed. Submissions are limited to a total of 9 (nine) pages, including all content and references, and must be in PDF format and formatted according to the new Standard ACM Conference Proceedings Template. For LaTeX users: unzip acmart.zip, make, and use sample-sigconf.tex as a template. Additional information about formatting and style files is available online at: https://www.acm.org/publications/proceedings-template. Papers that do not meet the formatting requirements will be rejected without review.

In addition, authors can provide an optional two (2) page supplement at the end of their submitted paper (it needs to be in the same PDF file and start at page 10) focused on reproducibility (see reproducibility section for more details)*.

Website for submissions: https://easychair.org/conferences/?conf=kdd2019

****************************
Important policies
****************************

Reproducibility

Submitted papers will be assessed based on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. Authors are strongly encouraged to make their code and data publicly available when possible. Algorithms and resources used in a paper should be described as completely as possible to allow reproducibility. This includes experimental methodology, empirical evaluations, and results. The reproducibility factor will play an important role in the assessment of each submission.
*Important Note: To encourage reproducibility of the results presented in KDD, only papers that include a supplement (up to two pages as described in the submission directions) aiming to provide reproducibility-related information will be considered for the best paper awards. This supplement can only be used to include (i) information necessary for reproducing the experimental results, insights, or conclusions reported in the paper (e.g., various algorithmic and model parameters and configurations, hyper-parameter search spaces, details related to dataset filtering and train/test splits, software versions, detailed hardware configuration, etc.), and (ii) any implementation, pseudo-code, or proofs that, due to space limitations, could not be included in the main nine-page manuscript, but that help in reproducibility.

Authorship

Every person named as the author of a paper must have contributed substantially both to the work described in the paper and to the writing of the paper. Every listed author must take responsibility for the entire content of a paper. Persons who do not meet these requirements may be acknowledged, but should not be listed as authors. Post-submission changes to the author list are not allowed.

Dual submissions

Submitted papers must describe work that is substantively different from work that has already been published, or accepted for publication, or submitted in parallel to other conferences or journals. However, there are several exceptions to this rule. Submission is permitted for a shorter version of a paper submitted to a journal, but not yet published. Authors must declare such dual-submissions on the submission form and must ensure that the journal in question allows concurrent submissions to conferences. Submissions are permitted for papers presented or to be presented at seminars, conferences or workshops without proceedings.
Submissions are permitted for papers that have previously been made available only in the form of a technical report with no peer review, in particular on arXiv.

Conflicts of interest

During the submission process, enter the email domains of all institutions with which you have an institutional conflict of interest. You have an institutional conflict of interest if you are currently employed or have been employed at this institution in the past three years, or you have extensively collaborated with this institution within the past three years. Authors are also required to identify all PC/SPC members with whom they have a conflict of interest, e.g., advisor, student, colleague, or coauthor in the last five years.

Attendance

For each accepted paper, at least one author must attend the conference and present the paper. Authors of all accepted papers must prepare a final version for publication, a poster, and a three-minute short video presentation.

Copyright

Accepted papers will be published in the conference proceedings by ACM and also appear in the ACM Digital Library. The rights retained by authors who transfer copyright to ACM can be found here.

AUTHORS TAKE NOTE: The official publication date is the date the proceedings are made available in the ACM Digital Library. This date for KDD 2019 is on or after July 1st, 2019. The official publication date affects the deadline for any patent filings related to published work.

For any questions, please contact applied2019 at kdd.org

--
You received this message because you are subscribed to the Google Groups "Machine Learning News" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ml-news+unsubscribe at googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
From predragp at andrew.cmu.edu Mon Jan 7 14:00:04 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 7 Jan 2019 14:00:04 -0500
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Yotam,

Thank you so much for this report! I am CC-ing users at autonlab.org so that everyone is on the same page. Could you please work with me on this one? Let's try to fix GPU10 first. GPU10 was recently provisioned. It has three (one was DoA) GeForce 1080Ti cards. I am running the latest NVIDIA-Linux-x86_64-410.78 driver and the latest cuda-10.0.130-1. You have two versions of Python: /opt/rh/rh-python36 will give you the latest 3.6.7, while /opt/miniconda3 will install python-3.7.2. Once we fix GPU10 we will move to other machines. Note that the other machines are still running an older version of the NVidia driver and CUDA-9.2. I have changed nothing on them, so whatever is broken is broken upstream (Python, TensorFlow, NVidia, or CUDA).

Please keep CC-ing users on this discussion so that people know what is going on.

Predrag

On Mon, Jan 7, 2019 at 8:02 AM Yotam Hechtlinger wrote:
>
> Hi Predrag,
>
> There might be some CUDA problem on GPU 5,6 & 10.
> I get the following message when I try to import tensorflow:
>
> >>> import tensorflow
> Traceback (most recent call last):
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
>     from tensorflow.python.pywrap_tensorflow_internal import *
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
>     _pywrap_tensorflow_internal = swig_import_helper()
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
>     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
>     return load_dynamic(name, filename, file)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
>     return _load(spec)
> ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in
>     from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in
>     from tensorflow.python import pywrap_tensorflow
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in
>     raise ImportError(msg)
> ImportError: Traceback (most recent call last):
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
>     from tensorflow.python.pywrap_tensorflow_internal import *
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
>     _pywrap_tensorflow_internal = swig_import_helper()
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
>     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 243, in load_module
>     return load_dynamic(name, filename, file)
>   File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
>     return _load(spec)
> ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
>
>
> Failed to load the native TensorFlow runtime.
>
> See https://www.tensorflow.org/install/errors
>
> for some common reasons and solutions. Include the entire stack trace
> above this error message when asking for help.
>

From yhechtli at andrew.cmu.edu Mon Jan 7 14:28:09 2019
From: yhechtli at andrew.cmu.edu (Yotam Hechtlinger)
Date: Mon, 7 Jan 2019 21:28:09 +0200
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Hi Predrag,

With GPU10 the problem is probably that LD_LIBRARY_PATH points to /usr/local/cuda/lib64 but that's not where CUDA is installed (where is it?).

Yotam.

On Mon, Jan 7, 2019 at 9:00 PM Predrag Punosevac wrote:
> Yotam,
>
> Thank you so much for this report! I am CC-ing users at autonlab.org so
> that everyone is on the same page. Could you please work with me on
> this one? Let's try to fix GPU10 first. GPU10 was recently
> provisioned. It has three (one was DoA) GeForce 1080Ti. I am running
> the latest NVIDIA-Linux-x86_64-410.78 driver and the latest
> cuda-10.0.130-1. You have two versions of Python. /opt/rh/rh-python36
> will give you the latest 3.6.7. While /opt/miniconda3 will install
> python-3.7.2. Once we fix GPU10 we will move to other machines.
From predragp at andrew.cmu.edu Mon Jan 7 15:43:19 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 7 Jan 2019 15:43:19 -0500
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Ok. I found one problem. CUDA 10 was not properly installed on GPU10 due to dependency problems. I had to disable the rpmfusion repos (both free and non-free), which I considered safe in the past. Now CUDA 10 is installed from the NVidia repo and is in /usr/local, and /usr/local/cuda is a symbolic link to the actual /usr/local/cuda-10.0 folder. Please try now.

Predrag

From yhechtli at andrew.cmu.edu Tue Jan 8 03:49:13 2019
From: yhechtli at andrew.cmu.edu (Yotam Hechtlinger)
Date: Tue, 8 Jan 2019 10:49:13 +0200
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Hi Predrag,

Is cuDNN properly installed? I can't see it inside /usr/local/cuda. Also *import tensorflow* gives:

*ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory*

Thanks,
Yotam.

On Mon, Jan 7, 2019 at 10:43 PM Predrag Punosevac wrote:
> Ok. I found one problem. CUDA 10 was not properly installed on GPU10
> due to the dependency problems. I had to disable rpmfusion repos
> (both free and non-free) which I considered safe in the past. Now CUDA
> 10 is installed from NVidia repo and is in /usr/local and
> /usr/local/cuda is the symbolic link to actual /usr/local/cuda-10.0
> folder. Please try now.
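The checks discussed in this thread (where LD_LIBRARY_PATH points, what /usr/local/cuda resolves to, and whether libcudnn.so.7 can be found) can be scripted. What follows is a minimal sketch in Python, not something posted to the list. The function names search_library_path, describe_cuda_link, and can_load are hypothetical, the lib64 subdirectory is assumed from the LD_LIBRARY_PATH value mentioned in the thread, and the library names are the ones from the two ImportError messages.

```python
import ctypes
import glob
import os


def search_library_path(soname, library_path=None):
    """Return (directory, exists, has_library) for each LD_LIBRARY_PATH entry.

    Hypothetical helper for illustration only.
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = []
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue  # ignore empty entries such as a trailing separator
        exists = os.path.isdir(directory)
        has_library = exists and os.path.exists(os.path.join(directory, soname))
        report.append((directory, exists, has_library))
    return report


def describe_cuda_link(cuda_link="/usr/local/cuda", pattern="libcudnn.so*"):
    """Resolve the CUDA symlink and list files matching `pattern` in its lib64.

    The default path is the one named in the thread; lib64 is an assumption.
    """
    real_path = os.path.realpath(cuda_link)
    matches = sorted(glob.glob(os.path.join(real_path, "lib64", pattern)))
    return {
        "is_symlink": os.path.islink(cuda_link),
        "real_path": real_path,
        "matches": matches,
    }


def can_load(soname):
    """Ask the dynamic loader for `soname`; False if it cannot be opened."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for soname in ("libcublas.so.9.0", "libcudnn.so.7"):
        print(soname, "loadable:", can_load(soname))
        for directory, exists, has_library in search_library_path(soname):
            print("  ", directory,
                  "exists" if exists else "missing",
                  "found" if has_library else "not found")
    print(describe_cuda_link())
```

Run on the machine in question, this prints one line per LD_LIBRARY_PATH entry for each library, followed by the resolved target of /usr/local/cuda and any cuDNN files found under it. Note that the first traceback asks for libcublas.so.9.0, so a directory holding only CUDA 10.0 libraries would still report "not found" for that name.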
> > Predrag > > On Mon, Jan 7, 2019 at 2:28 PM Yotam Hechtlinger > wrote: > > > > Hi Predrag, > > > > With GPU10 the problem is probably because LD_LIBRARY_PATH directs to > /usr/local/cuda/lib64 but that's not where CUDA is installed (where is it?). > > > > Yotam. > > > > > > On Mon, Jan 7, 2019 at 9:00 PM Predrag Punosevac < > predragp at andrew.cmu.edu> wrote: > >> > >> Yotam, > >> > >> Thank you so much for this report! I am CC-ing users at autonlab.org so > >> that everyone is on the same page. Could you please work with me on > >> this one? Let's try to fix GPU10 first. GPU10 was recently > >> provisioned. It has three (one was DoA) GeForce 1080Ti. I am running > >> the latest NVIDIA-Linux-x86_64-410.78 driver and the latest > >> cuda-10.0.130-1. You have two versions of Python. /opt/rh/rh-python36 > >> will give you the latest 3.6.7. While /opt/miniconda3 will install > >> python-3.7.2. Once we fix GPU10 we will move to other machines. Note > >> that other machines are still running older version of NVidia driver > >> and CUDA-9.2. I have changed nothing on them so whatever is broken it > >> is broken upstream (Python,TensorFlow, NVidia, or CUDA). > >> > >> Please keep CC-ing users to this discussion so that people know what > >> is going on. > >> > >> Predrag > >> > >> > >> On Mon, Jan 7, 2019 at 8:02 AM Yotam Hechtlinger > >> wrote: > >> > > >> > Hi Predrag, > >> > > >> > There might be some CUDA problem on GPU 5,6 & 10. 
> >> > I get the following message when I try to import tensorflow: > >> > > >> > > >> > > >> > >>> import tensorflow > >> > Traceback (most recent call last): > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", > line 58, in > >> > from tensorflow.python.pywrap_tensorflow_internal import * > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", > line 28, in > >> > _pywrap_tensorflow_internal = swig_import_helper() > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", > line 24, in swig_import_helper > >> > _mod = imp.load_module('_pywrap_tensorflow_internal', fp, > pathname, description) > >> > File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line > 243, in load_module > >> > return load_dynamic(name, filename, file) > >> > File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line > 343, in load_dynamic > >> > return _load(spec) > >> > ImportError: libcublas.so.9.0: cannot open shared object file: No > such file or directory > >> > > >> > During handling of the above exception, another exception occurred: > >> > > >> > Traceback (most recent call last): > >> > File "", line 1, in > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", > line 24, in > >> > from tensorflow.python import pywrap_tensorflow # pylint: > disable=unused-import > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", > line 49, in > >> > from tensorflow.python import pywrap_tensorflow > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", > line 74, in > >> > raise ImportError(msg) > >> > ImportError: Traceback (most recent call last): > >> > File > 
"/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", > line 58, in > >> > from tensorflow.python.pywrap_tensorflow_internal import * > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", > line 28, in > >> > _pywrap_tensorflow_internal = swig_import_helper() > >> > File > "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", > line 24, in swig_import_helper > >> > _mod = imp.load_module('_pywrap_tensorflow_internal', fp, > pathname, description) > >> > File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line > 243, in load_module > >> > return load_dynamic(name, filename, file) > >> > File "/zfsauton/home/yhechtli/anaconda3/lib/python3.6/imp.py", line > 343, in load_dynamic > >> > return _load(spec) > >> > ImportError: libcublas.so.9.0: cannot open shared object file: No > such file or directory > >> > > >> > > >> > Failed to load the native TensorFlow runtime. > >> > > >> > See https://www.tensorflow.org/install/errors > >> > > >> > for some common reasons and solutions. Include the entire stack trace > >> > above this error message when asking for help. > >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at andrew.cmu.edu Tue Jan 8 09:42:44 2019 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Tue, 08 Jan 2019 09:42:44 -0500 Subject: Possible CUDA problem In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From yhechtli at andrew.cmu.edu Tue Jan 8 12:45:43 2019 From: yhechtli at andrew.cmu.edu (Yotam Hechtlinger) Date: Tue, 8 Jan 2019 19:45:43 +0200 Subject: Possible CUDA problem In-Reply-To: References: Message-ID: With GPU5 & 6 the problem is that /usr/local is missing a symbolic link. It has cuda9.0 and cuda9.1 but not /usr/local/cuda. 

Regarding GPU10 - I also think consistency would be useful with the CUDA versions.
The TensorFlow nightly build has supported CUDA 10 since mid-December, see:
https://github.com/tensorflow/tensorflow/issues/22706

but using it would require switching TensorFlow versions between the servers, because the stable version only supports CUDA 9.

Regarding cuDNN - not sure I understand, but I can't debug the nightly version on GPU10 until it's installed.

Yotam.

On Tue, Jan 8, 2019 at 4:42 PM Predrag Punosevac wrote:

> It is not. That is a proprietary Intel optimization library. One of you
> with the active Intel dev account needs to download it and put it somewhere
> I can find it. Please read the documentation to make sure which CUDA version
> is supported. In the past we had to downgrade CUDA to use cuDNN. I would not
> be surprised if we had to go back to CUDA 9.2 to use it.

From predragp at andrew.cmu.edu Tue Jan 8 17:06:00 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 8 Jan 2019 17:06:00 -0500
Subject: lov6 beefed up
Message-ID:

Dear Autonians,

Thanks to Dr. Barnabas Poczos I just beefed up LOV6 with a brand new
Intel Xeon Gold 6152 22-core processor and an extra 192 GB of RAM.
The current specs of this CPU computing node are:

44 cores (88 threads) and 384 GB of RAM

This server can't not be upgraded further.
Enjoy,
Predrag

From predragp at andrew.cmu.edu Tue Jan 8 17:38:28 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 8 Jan 2019 17:38:28 -0500
Subject: lov6 beefed up
In-Reply-To: <67FE4EBD-9D9C-4469-8CC7-DFB59C417C1B@andrew.cmu.edu>
References: <67FE4EBD-9D9C-4469-8CC7-DFB59C417C1B@andrew.cmu.edu>
Message-ID:

One would think that a guy who claims to have a PhD in mathematics would know that

$$\lnot \lnot p \Leftrightarrow p$$

:-)

Predrag

P.S. The grammatical error I made is an artifact of my Serbian mother
tongue, where double negation is a perfectly correct way to express a
negation. Funnily enough, if George Boole had been Serbian, the following

$$\lnot \lnot p \Leftrightarrow \lnot p$$

would have been a tautology instead of the above. Such a propositional
calculus can be made perfectly logically consistent, just like the one
we use, as proved by mathematicians :-)

On Tue, Jan 8, 2019 at 5:08 PM Jayanth Koushik wrote:
>
> > can't not be upgraded further
> so it can? ;-)

From predragp at andrew.cmu.edu Tue Jan 8 18:32:21 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 8 Jan 2019 18:32:21 -0500
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Many other GPU servers have no symbolic link from /usr/local/cuda to
the actually installed version of CUDA because multiple CUDA versions
are installed. That is an artifact of the fact that we started with
CUDA 8.0 and went through a bunch of upgrades. You should be able to
work without the symbolic link.

cuDNN is proprietary software (although not an Intel library, as I said
in an earlier e-mail). I don't have an account to download it:

https://developer.nvidia.com/rdp/form/cudnn-download-survey

Please download it if you have an NVidia dev account and put it
somewhere where I can access it. I forgot how it works. We had similar
issues with the Intel proprietary compiler and optimization libraries.
They can't be downloaded for free but there are a bunch of hoops we
have to jump through to get them.
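
For anyone who wants to check a server without relying on the
/usr/local/cuda symbolic link, here is a minimal sketch in Python. It is
only an illustration, not something installed on the servers: the helper
names (find_cuda_installs, ld_library_path_dirs, find_library) are made up
for this example, the lib64 subdirectory is assumed from the
LD_LIBRARY_PATH setting mentioned earlier in this thread, and it looks
only at LD_LIBRARY_PATH and the /usr/local/cuda* directories, not at the
linker cache (ldconfig) that the dynamic loader also consults.

```python
"""Locate CUDA shared libraries without relying on /usr/local/cuda."""
import os
from pathlib import Path


def find_cuda_installs(prefix="/usr/local"):
    """List real (non-symlink) directories named cuda* under prefix."""
    root = Path(prefix)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("cuda*")
                  if p.is_dir() and not p.is_symlink())


def ld_library_path_dirs(env=None):
    """Split LD_LIBRARY_PATH into its non-empty entries."""
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]


def find_library(soname, search_dirs):
    """Return the first directory that contains soname, or None."""
    for directory in search_dirs:
        if (Path(directory) / soname).exists():
            return Path(directory)
    return None


if __name__ == "__main__":
    link = Path("/usr/local/cuda")
    if link.is_symlink():
        print("/usr/local/cuda ->", os.path.realpath(link))
    else:
        print("/usr/local/cuda is not a symbolic link")

    installs = find_cuda_installs()
    # Assumption: each install keeps its libraries in a lib64 subdirectory.
    candidates = ld_library_path_dirs() + [p / "lib64" for p in installs]
    # The two sonames are the ones from the ImportError messages in this thread.
    for soname in ("libcublas.so.9.0", "libcudnn.so.7"):
        print(soname, "->", find_library(soname, candidates) or "not found")
```

If a library turns up under a versioned directory, adding that directory
to LD_LIBRARY_PATH (instead of /usr/local/cuda/lib64) should let the
import find it, provided the library version matches what the installed
TensorFlow was built against.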

Predrag

From predragp at andrew.cmu.edu Wed Jan 9 15:46:23 2019
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 9 Jan 2019 15:46:23 -0500
Subject: Possible CUDA problem
In-Reply-To:
References:
Message-ID:

Did you download cuDNN from the NVidia website? I tried today but I don't
have an account (I am 100% sure now I don't have it).

Predrag

From awd at cs.cmu.edu Thu Jan 17 10:37:36 2019
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Thu, 17 Jan 2019 10:37:36 -0500
Subject: This just in: "Nationwide Sting Operation Targets Illegal Asian Brothels, Six Indicted for Racketeering"
In-Reply-To:
References:
Message-ID:

Dear Fellow Autonians,

I am very happy to report that our Traffic Jam software helped identify and nab a major international organization conducting sex trafficking activities in multiple countries and multiple states throughout the United States.

Please see the release from the U.S. Attorney's Office, District of Oregon, linked below.

Even though neither CMU's nor our spinoff Marinus Analytics' name is mentioned in the document, it is a fact that the federal agencies involved in tracking the indicted perpetrators' activities, as well as the National Cyber-Forensics and Training Alliance (NCFTA) mentioned in it, have been using the results of the analyses conducted with our software to guide their investigations. It turns out that our software triggered the first alerts of the suspicious activities of this very organization as early as three years ago. We are very glad that those and other subsequent findings obtained with Traffic Jam were correct and informative, and that they ultimately led to the currently announced indictments. Some of the apparent perpetrators are in custody and will stand trial, and so until convicted they are presumed innocent. We all hope that the potentially numerous victims involved in the activities these individuals are accused of will all be rescued and will have a good chance to return to their normal, safe lives.

Cara and Emily from Marinus Analytics are working with the representatives of NCFTA to determine if and how this achievement could or should be publicized more broadly, so please keep the news internal until further notice.

Huge congratulations to the Auton Lab and everyone who contributed to developing and deploying Traffic Jam!

Cheers,
Artur

https://www.justice.gov/usao-or/pr/nationwide-sting-operation-targets-illegal-asian-brothels-six-indicted-racketeering

From tom.mitchell at cs.cmu.edu Thu Jan 17 10:43:42 2019
From: tom.mitchell at cs.cmu.edu (Tom Mitchell)
Date: Thu, 17 Jan 2019 07:43:42 -0800
Subject: This just in: "Nationwide Sting Operation Targets Illegal Asian Brothels, Six Indicted for Racketeering"
In-Reply-To:
References:
Message-ID:

Artur,

Terrific - congratulations to you and your whole team on again showing how your research goes all the way to making a practical, positive impact!

best,
Tom

--
Tom M. Mitchell
E. Fredkin University Professor
Interim Dean
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~tom