From awd at cs.cmu.edu Mon Oct 3 12:53:14 2022
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 3 Oct 2022 12:53:14 -0400
Subject: Fwd: [HCII Seminar] Rich Caruana - "Friends Don't Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning"
In-Reply-To: References: Message-ID:

A relevant talk by an old friend of the Auton Lab.

Artur

---------- Forwarded message ---------
From: Adam Perer
Date: Mon, Oct 3, 2022 at 12:50 PM
Subject: [HCII Seminar] Rich Caruana - "Friends Don't Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning"
To:

Rich Caruana, a Senior Principal Researcher at Microsoft Research, will be giving a virtual HCII seminar on Friday, entitled "Friends Don't Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning". Details below:

*Topic:* HCII Fall Seminar Series
*Time:* Friday, October 7, 2022 - 1:30 PM Eastern Time (US and Canada) - Virtual

The next presentation of the Fall 2022 seminar series will be held on *Friday, October 7, from 1:30 to 2:45 pm via Zoom*. The featured speaker -- virtually (remotely) -- will be *Rich Caruana from Microsoft Research*.

*Join Zoom Meeting:* https://cmu.zoom.us/j/98589029500?pwd=YzYrTkVUU1pvN1AwUXpkbU55SXpYZz09
Meeting ID: 985 8902 9500
Passcode: 916104

*Rich Caruana, Senior Principal Researcher at Microsoft Research in Redmond, WA*

*Presentation Title:* Friends Don't Let Friends Deploy Black-Box Models: The Importance of Intelligibility in Machine Learning

*Abstract:* In machine learning, tradeoffs must sometimes be made between accuracy, privacy, and intelligibility: the most accurate models usually are not very intelligible or private, and the most intelligible models usually are less accurate. This can limit the accuracy and privacy of models that can safely be deployed in mission-critical applications such as healthcare, where being able to understand, validate, edit, and trust models is important. EBMs (Explainable Boosting Machines) are a recent learning method based on generalized additive models (GAMs) that are as accurate as full-complexity models, more intelligible than linear models, and which can be made differentially private with little loss in accuracy. EBMs make it easy to understand what a model has learned and to edit the model when it learns inappropriate things. In the talk, I'll present multiple case studies where EBMs discover surprising patterns in data that would have made deploying black-box models risky. I'll describe how to train these glassbox models with boosted trees and with deep neural nets, and I'll briefly discuss how we're using these models to uncover and mitigate bias in models where fairness and transparency are important.

*Bio:* Rich Caruana is a senior principal researcher at Microsoft Research. Before joining Microsoft, Rich was on the faculty in the Computer Science Department at Cornell University, at UCLA's Medical School, and at CMU's Center for Learning and Discovery. Rich's Ph.D. is from Carnegie Mellon University, where he worked with Tom Mitchell and Herb Simon. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning.
Rich received an NSF CAREER Award in 2004 for Meta Clustering, best paper awards in 2005 (with Alex Niculescu-Mizil), 2007 (with Daria Sorokina), and 2014 (with Todd Kulesza, Saleema Amershi, Danyel Fisher, and Denis Charles), and co-chaired KDD in 2007. His current research focus is on learning for medical decision making, transparent modeling, and deep learning.

Adam Perer ( http://perer.org )
Data Interaction Group ( http://dig.cmu.edu )
Human-Computer Interaction Institute
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From awd at cs.cmu.edu Mon Oct 3 14:58:30 2022
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 3 Oct 2022 14:58:30 -0400
Subject: talk of interest
Message-ID:

... particularly to those of us who are interested in healthcare analytics, ICD, causality in EHR, etc.

https://seminartracker.tepper.cmu.edu/SeminarDetail?SeminarId=113

Cheers,
Artur

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From awd at cs.cmu.edu Fri Oct 7 16:24:39 2022
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 7 Oct 2022 16:24:39 -0400
Subject: Fwd: Seminar: David Page, Monday Faculty Research Seminars
In-Reply-To: References: <000000000000a2e68e05ea727d8a@google.com>
Message-ID:

This Heinz seminar's topic may be very relevant to many of us interested in healthcare AI.

Cheers
Artur

---------- Forwarded message ---------
From:
Date: Fri, Oct 7, 2022 at 10:19 AM
Subject: Seminar: David Page, Monday Faculty Research Seminars
To:

Seminar: David Page, Monday Faculty Research Seminars
Monday Oct 17, 2022, 12pm - 1:20pm (Eastern Time - New York)

Please join us on *Monday, October 17th*, from *12:00 PM - 01:20 PM* in *Hamburg Hall, Room 1002* as part of the Monday Faculty Research Seminars (MFRS). The *schedule is now open* to reserve your time to meet with David Page on *October 17th*: *Seminar Scheduler*

*Presenter:* David Page is a Professor of Biostatistics & Bioinformatics at Duke University. He works on algorithms for data mining and machine learning, and their applications to biomedical data, especially de-identified electronic health records and high-throughput genetic and other molecular data. Of particular interest are machine learning methods for complex multi-relational data (such as electronic health records or molecules as shown) and irregular temporal data, and methods that find causal relationships or produce human-interpretable output (such as the rules for molecular bioactivity shown in green to the side).

*Paper:* "Prediction and Causation in EHRs"

*Abstract:* This talk begins with two empirical results in prediction in electronic health records (EHRs). The first occurrence of thousands of ICD codes can be predicted with average AUC above 0.7, and accuracies can be further significantly improved by using family histories constructed entirely automatically from de-identified patient data. The talk then turns to tasks where causal discovery is needed rather than merely prediction. Motivated first by empirical results with neural point processes and other methods, we propose a theoretical model of causal discovery that adds time and sample complexity to the normal formulations.
This model shows that some traditional machine learning algorithms can be used for causal discovery if they are modified to consider unobserved or partially-observed confounders, including time-varying confounders.

______________________________________________________

See the *Heinz Event Calendar* for additional upcoming seminars.
Location: HBH 1002 (In Person)
Organizer: heinz-events at andrew.cmu.edu

--
George H. Chen
Assistant Professor, Heinz College of Information Systems and Public Policy
Affiliated Faculty, Machine Learning Department
Carnegie Mellon University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From awd at cs.cmu.edu Tue Oct 11 13:07:23 2022
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 11 Oct 2022 13:07:23 -0400
Subject: Fwd: Intro to Python, Git and Shell October 26-28
In-Reply-To: References: Message-ID:

This is quite likely too basic for most of the Lab's veterans, but I am sharing the info on this introductory crash course in case it is of help to some newer members of the team.

---------- Forwarded message ---------
From: Sarah Young
Date: Tue, Oct 11, 2022 at 12:45 PM
Subject: Intro to Python, Git and Shell October 26-28
To: , <heinz-phd-post-docall at lists.andrew.cmu.edu>, <heinz-all-adjunct at lists.andrew.cmu.edu>, ,

Dear all,

Registration is open for an upcoming virtual 3-day Software Carpentries workshop *Intro to Python, Shell, and Git*, *on October 26-28, 2022*. The Carpentries is an organization that teaches foundational computing skills to researchers. This event is a virtual example-driven workshop on basic computing skills that runs over 3 half days (12 hours total). Short tutorials alternate with hands-on practical exercises. The workshop is designed for those with little or no Python programming experience and is suitable for researchers in any subject area. Attendees will learn how to use the command line, automate their data analysis workflows in Python, and use Git for version control.

The workshop runs *October 26-28, 9:00 AM-12:00 PM Eastern Daylight Time*.

*Register here*: https://www.eventbrite.com/e/carnegie-mellon-university-software-carpentry-workshop-tickets-395972704017

Note that we will hold an install session on *Monday, October 24 at 10am EDT*, to ensure that everyone has all necessary software installed in advance. You will receive installation and setup instructions prior to the workshop, which must be completed in advance in order to receive the Zoom information for the training. Seating is limited, so please only register if you can commit to coming to most or all of the training.

Please write to me if you have any questions.

Best,
Sarah

--
Sarah Young
Library Liaison, Heinz College
Social & Decision Sciences | Information Systems
Institute for Politics & Strategy | Statistics & Data Science
Carnegie Mellon University Libraries
she/her/hers | (412) 268-7384

_______________________________________________
Heinz-all-faculty mailing list
Heinz-all-faculty at lists.andrew.cmu.edu
https://lists.andrew.cmu.edu/mailman/listinfo/heinz-all-faculty
_______________________________________________
Heinz-affiliate-faculty mailing list
Heinz-affiliate-faculty at lists.andrew.cmu.edu
https://lists.andrew.cmu.edu/mailman/listinfo/heinz-affiliate-faculty

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Tue Oct 11 23:04:37 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 11 Oct 2022 23:04:37 -0400
Subject: RedHat 9.0 & GPU14 heads up
In-Reply-To: References: Message-ID:

Dear Autonians,

After some delay due to the conference deadlines, GPU14 is finally reprovisioned to run RHEL 9. According to my limited testing, things appear to work as expected. Please log into GPU14 and test extensively. If this turns out to be a truly working configuration, Piotr Bartosiewicz and I will reprovision all existing RHEL 7 computing nodes with RHEL 9.

Best,
Predrag

On Tue, Sep 27, 2022 at 8:46 PM Predrag Punosevac wrote:
> Dear Autonians,
>
> The short version of this email is that I am planning to offline gpu14 this Thursday in order to upgrade from RHEL 7.9 to the recently released RHEL 9.0. The GPU node is currently idle. Please don't start jobs as you will lose them.
>
> If you have 10 minutes, the long version of this email reads:
>
> It has been brought to my attention that some lab members are running into a problem with the glibc library version 2.28 shipped with RHEL 8.6. Currently, about 50% of our computing nodes run an even older release, RHEL 7.9, which ships with glibc 2.17.
>
> RHEL 9.0 was released less than half a year ago. It is shipped with glibc 2.34, GCC 11.2.1, and binutils 2.35.2. The default Python version is 3.9. The common wisdom is to hold any upgrades until the 9.1 release. Due to my personal connection with the Springdale community (Princeton University), which provides us with a free clone of RHEL, I know that 9.0 is production ready. I have already checked the NVidia/CUDA stack and RPMs are built. Therefore, I decided to schedule an experiment and to try to upgrade gpu14 to RHEL 9.0. If the experiment is successful, Piotr Bartosiewicz and I will upgrade all computing nodes currently running RHEL 7.9 to the 9.0 release. Computing nodes and all workstations currently running 8.6 will not be touched, to maintain the usability of the system.
>
> Ubuntu fans should be aware that 22.04 is shipped with glibc 2.35, but GCC and many other things are a minor point release behind RHEL 9.0. The last time I looked, CMU CS facilities were upgrading all desktops to Ubuntu 20.04 from 18.04 and earlier. CMU CS facilities don't run Ubuntu on servers, and they are still in crisis mode due to the unanticipated termination of the CentOS clone of RHEL 8.x. The last time I talked to Ed Walter they were running CentOS/RHEL 7.9 and thinking about what to do next (Alma Linux, Rocky Linux, as well as what to do with obsolete ROCKS clusters). I did my best to spare you from those things.
>
> Cheers,
> Predrag

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Tue Oct 11 23:51:36 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 11 Oct 2022 23:51:36 -0400
Subject: RedHat 9.0 & GPU14 heads up
In-Reply-To: References: Message-ID:

Oh come on. The server was reprovisioned. All cryptographic keys were regenerated. You need to open your file ~/.ssh/known_hosts and remove the old cryptographic key.

P^2

On Tue, Oct 11, 2022 at 11:43 PM Ravi Tej Akella wrote:
> Hi Predrag,
>
> When I try ssh-ing into GPU14, I get the following error message (I don't get any error message when I log into other GPUs):
>
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!
@ > @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ > IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! > Someone could be eavesdropping on you right now (man-in-the-middle attack)! > It is also possible that a host key has just been changed. > The fingerprint for the ECDSA key sent by the remote host is ------------. > Please contact your system administrator. > Add correct host key in /zfsauton2/home/rakella/.ssh/known_hosts to get > rid of this message. > Offending ECDSA key in /zfsauton2/home/rakella/.ssh/known_hosts:40 > ECDSA host key for gpu14 has changed and you have requested strict > checking. > Host key verification failed. > > Regards, > Ravi Tej Akella > > ? > > On Tue, Oct 11, 2022 at 11:04 PM Predrag Punosevac < > predragp at andrew.cmu.edu> wrote: > >> >> Dear Autonians, >> >> After some delay due to the conference deadlines, GPU14 is finally >> reprovisioned to run RHEL 9. According to my limited testing, things appear >> to work as expected. Please log into GPU14 and test extensively. If this >> turns out to be a truly working configuration, Piotr Bartosiewicz and I >> will reprovision all existing RHEL 7 computing nodes with RHEL 9. >> >> Best, >> >> Predrag >> >> On Tue, Sep 27, 2022 at 8:46 PM Predrag Punosevac < >> predragp at andrew.cmu.edu> wrote: >> >>> Dear Autonians, >>> >>> The short version of this email is that I am planning to offline gpu14 >>> this Thursday in order to upgrade from RHEL 7.9 to recently released RHEL >>> 9.0. The GPU node is currently idle. Please don't start jobs as you will >>> lose them. >>> >>> If you have 10 minutes, the long version of this email reads: >>> >>> >>> It has been brought to my attention that some lab members are running >>> into a problem with glibc library version 2.28 shipped with RHEL 8.6. >>> Currently, about 50% of our computing nodes run an even older version of >>> RHEL 7.9 which is shipped with 2.17. >>> >>> RHEL 9.0 was released less than half a year ago. It is shipped with >>> glibc 2.34. GCC 11.2.1 and binutils 2.35.2. The default Python version is >>> 3.9. The common wisdom is to hold any upgrades until 9.1 release. Due to my >>> personal connection with the Springdale community (Princeton university), >>> which provides us with a free clone of RHEL, I know that 9.0 us production >>> ready. I have already checked the NVidia/CUDA stack and RPMs are built. >>> Therefore, I decided to schedule an experiment and to try to upgrade gpu14 >>> to RHEL 9.0. If the experiment is successful, Piotr Bartosiewicz and I >>> will upgrade all computing nodes currently running RHEL 7.9 to 9.0 release. >>> Computing nodes and all workstations currently running 8.6 will not be >>> touched to maintain the usability of the system. >>> >>> Ubuntu fans should be aware that 22.04 is shipped with glibc 2.35 but >>> GCC and many other things a minor point releases behind RHEL 9.0. The last >>> time I looked, CS CMU facilities were upgrading all desktops to Ubuntu >>> 20.04 from 18.04 and earlier. CMU CS facilities don't run Ubuntu on servers >>> and they are still in crisis mode due to unanticipated termination of the >>> CentOS clone of RHEL 8.xxx. The last time I talked to Ed Walter they were >>> running Cent/RHEL 7.9 and thinking about what to do next (Alma Linux, Rocky >>> Linux as well as what to do with obsolete ROCKS clusters). I did my best to >>> spare you from those things. >>> >>> Cheers, >>> Predrag >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
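[A minimal sketch of the fix Predrag describes above, assuming a standard OpenSSH client; gpu14 is the host from this thread, and the same recipe applies to any node whose host keys get regenerated:

    # drop the stale gpu14 entry from ~/.ssh/known_hosts
    ssh-keygen -R gpu14
    # reconnect and accept the newly generated host key
    ssh gpu14

Deleting the offending line from ~/.ssh/known_hosts in an editor, as suggested in the thread, works just as well.]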
From predragp at andrew.cmu.edu Sat Oct 15 16:46:26 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sat, 15 Oct 2022 16:46:26 -0400
Subject: conda gpu26
In-Reply-To: References: Message-ID:

The autofs daemon was not working properly. Your home directory was not mounted. I fixed it manually just now and tested with your account.

Predrag

On Sat, Oct 15, 2022 at 3:32 PM Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
> yes, I am doing export PATH="/opt/miniconda-py39/bin:$PATH" but I can't find my conda environments. also when I try to create a new one I am getting:
>
> NoWritableEnvsDirError: No writeable envs directories configured.
> - /zfsauton3/home/iapostol/.conda/envs
> - /opt/miniconda-py39/envs
>
> On Sat, Oct 15, 2022 at 3:08 PM Predrag Punosevac wrote:
>> System conda is in /opt. Whatever you installed on your own is not my responsibility.
>>
>> On Sat, Oct 15, 2022, 2:45 PM Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
>>> also conda on gpu26 seems not to work properly. I cannot retrieve my conda environments or create new ones (I'm setting PATH properly)
>>>
>>> thanks!

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From iapostol at andrew.cmu.edu Sat Oct 15 17:23:59 2022
From: iapostol at andrew.cmu.edu (Ifigeneia Apostolopoulou)
Date: Sat, 15 Oct 2022 17:23:59 -0400
Subject: conda gpu26
In-Reply-To: References: Message-ID:

thanks!

On Sat, Oct 15, 2022 at 4:46 PM Predrag Punosevac wrote:
> The autofs daemon was not working properly. Your home directory was not mounted. I fixed it manually just now and tested with your account.
>
> Predrag
>
> On Sat, Oct 15, 2022 at 3:32 PM Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
>> yes, I am doing export PATH="/opt/miniconda-py39/bin:$PATH" but I can't find my conda environments. also when I try to create a new one I am getting:
>>
>> NoWritableEnvsDirError: No writeable envs directories configured.
>> - /zfsauton3/home/iapostol/.conda/envs
>> - /opt/miniconda-py39/envs
>>
>> On Sat, Oct 15, 2022 at 3:08 PM Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
>>> System conda is in /opt. Whatever you installed on your own is not my responsibility.
>>>
>>> On Sat, Oct 15, 2022, 2:45 PM Ifigeneia Apostolopoulou <iapostol at andrew.cmu.edu> wrote:
>>>> also conda on gpu26 seems not to work properly. I cannot retrieve my conda environments or create new ones (I'm setting PATH properly)
>>>>
>>>> thanks!

-------------- next part --------------
An HTML attachment was scrubbed...
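[For context: the NoWritableEnvsDirError above simply means conda could not write to any of its configured environment directories, which is exactly what happens when the NFS home directory fails to mount. A hedged sketch of one way to keep environments off the home directory altogether, e.g. on node-local scratch; the scratch path is illustrative rather than an official lab location, and the conda.sh path assumes the system Miniconda layout under /opt/miniconda-py39:

    # enable conda in the current shell (system conda lives in /opt, per Predrag)
    source /opt/miniconda-py39/etc/profile.d/conda.sh
    # create and activate an environment under a node-local directory instead of ~/.conda/envs
    conda create --prefix /home/scratch/$USER/envs/myenv python=3.9
    conda activate /home/scratch/$USER/envs/myenv
]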
*See details of upcoming talks here and note we have the same Zoom Webinar registration link for all fall 2022 talks* *Improving Communication for Differential Privacy: Insight from Human Behavior* *October 20, 1 pm PT/3 pm CT* *Rachel Cummings, Assistant Professor of Industrial Engineering and Operations Research, Columbia University* Differential privacy (DP) is widely regarded as a gold standard for privacy-preserving computation over users? data. A key challenge is that the privacy guarantees are difficult to communicate to users, leaving them uncertain about how and whether they are protected. Despite recent widespread deployment of DP, relatively little is known about user perceptions and how to effectively communicate DP's practical privacy guarantees. This talk will cover a series of user studies aimed at measuring and improving communication with non-technical end users about DP. The first set explores users' privacy expectations related to DP and measures the efficacy of existing methods for communicating the privacy guarantees of DP systems. We find that the ways in which DP is described in-the-wild largely set users' privacy expectations haphazardly, which can be misleading depending on the deployment. Motivated by these findings, the second set develops and evaluates prototype descriptions designed to help end users understand DP guarantees. These descriptions target two important technical details in DP deployments that are often poorly communicated to end users: the privacy parameter epsilon (which governs the level of privacy protections) and distinctions between the local and central models of DP (which governs who can access exact user data). Rachel Cummings is an Assistant Professor of Industrial Engineering and Operations Research at Columbia University. Before joining Columbia, she was an Assistant Professor at the Georgia Institute of Technology. Her research interests lie primarily in data privacy, with connections to machine learning, algorithmic economics, optimization, statistics, and public policy. She is the recipient of an NSF CAREER Award, a DARPA Young Faculty Award, an Apple Privacy-Preserving Machine Learning Award, JP Morgan Chase Faculty Award, a Simons-Google Research Fellowship, a Mozilla Research Grant, and multiple best paper awards. *Watch all C3.ai DTI talks on our YouTube channel* YouTube.com/C3DigitalTransformationInstitute *The Science of Digital Transformation* *About the C3.ai Digital Transformation Institute* Established in March 2020 by C3 AI, Microsoft, and leading universities, the C3.ai Digital Transformation Institute is a research consortium dedicated to accelerating the socioeconomic benefits of artificial intelligence. The Institute engages the world?s leading scientists to conduct research and train practitioners in the new Science of Digital Transformation, which operates at the intersection of artificial intelligence, machine learning, cloud computing, internet of things, big data analytics, organizational behavior, public policy, and ethics. The ten C3.ai Digital Transformation Institute consortium member universities and research laboratories are: University of Illinois at Urbana-Champaign; University of California, Berkeley; Carnegie Mellon University; KTH Royal Institute of Technology; Lawrence Berkeley National Laboratory; Massachusetts Institute of Technology; National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign; Princeton University; Stanford University; and University of Chicago. 
Learn more at C3DTI.ai .

*C3.ai Digital Transformation Institute @ Berkeley*
University of California, Berkeley
750 Sutardja Dai Hall, MC 1764
Berkeley, California 94720-1764

*C3.ai Digital Transformation Institute @ Illinois*
University of Illinois at Urbana-Champaign
1205 W. Clark Street, MC-257, Room 1008
Urbana, Illinois 61801

*Copyright © 2022 C3.ai Digital Transformation Institute, All rights reserved.*

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Wed Oct 19 18:13:33 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 19 Oct 2022 18:13:33 -0400
Subject: Extra scratch gpu[24-27]
Message-ID:

Dear Autonians,

Due to popular demand, I just added 2TB of extra scratch space to gpu[24-27]. Kudos to Dr. Dubrawski, who paid $300 for the new 2.5" HDD needed for this upgrade.

Best,
Dr. P^2

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Wed Oct 19 18:26:47 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 19 Oct 2022 18:26:47 -0400
Subject: Extra scratch gpu[24-27]
In-Reply-To: References: Message-ID:

I forgot to say: the mount point is /home/extra_scatch/$username

P^2

On Wed, Oct 19, 2022 at 6:13 PM Predrag Punosevac wrote:
> Dear Autonians,
>
> Due to popular demand, I just added 2TB of extra scratch space to gpu[24-27]. Kudos to Dr. Dubrawski, who paid $300 for the new 2.5" HDD needed for this upgrade.
>
> Best,
> Dr. P^2

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Thu Oct 20 22:01:05 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Thu, 20 Oct 2022 22:01:05 -0400
Subject: upload.autonlab.org cryptographic key changed
Message-ID:

Dear Autonians,

I had to repurpose computing node Athena as a dedicated InfluxDB host. Athena was the backend of the shell gateway upload.autonlab.org. Currently that role is played by the computing node Foxconn, until I commission a dedicated small server for that role. Please delete the old upload cryptographic keys from your ~/.ssh/known_hosts file before trying to use upload, as you will otherwise see nasty warnings. For those of you who are careful, the new key fingerprint should read: ED25519 SHA256:IUQxesUiVl0JBF9f1ilsQOEK7bzrcA0sxPejAmmL0LI.

Best,
Dr. P^2

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Mon Oct 24 11:43:05 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 24 Oct 2022 11:43:05 -0400
Subject: ssh login problems (NFS server killed due to overload)
Message-ID:

Dear Autonians,

I got several reports this morning from a few of you (Ifi, Abby, Ben, Vedant) about problems accessing the system. After a bit of investigation, I nailed down the culprit to the main file server. The server (NFS instance) appears to be dead or severely degraded due to the overload.

I am afraid that the only medicine will be to reboot the machine, perhaps followed by a reboot of all 45+ computing nodes. This will result in a significant loss of work and productivity.
We did go through this exercise less than two months ago.

The Auton Lab cluster is not policed for rogue users. Its usability depends on the collegial behaviour of each of our 130 members. Use of scratch directories instead of taxing NFS is well described in the documentation, and as recently as last week I added extra scratch space on at least four machines.

Best,
Predrag

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Mon Oct 24 12:01:12 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 24 Oct 2022 12:01:12 -0400
Subject: ssh login problems (NFS server killed due to overload)
In-Reply-To: <70CF9D01-A418-47A2-A0DF-3EEED712A9BB@andrew.cmu.edu>
References: <70CF9D01-A418-47A2-A0DF-3EEED712A9BB@andrew.cmu.edu>
Message-ID:

I am trying really hard not to reboot anything. I manually restarted a bunch of daemons on the main file server Gaia (nfsd, mountd, rpcbind). I noticed that restarting the autofs daemons on computing nodes restored access. I am using Ansible to propagate the autofs daemon restart over all computing nodes. It appears that some of them hang. I am hoping to get away with rebooting only a machine or two and definitely avoid rebooting the main file server.

For the curious: NFS is last-century technology (1980s Sun Microsystems). It is a centralized system with a single point of failure. We mitigate this risk by having NFS exports distributed over several different physical file servers, which run their own NFS instances. That is why /zfsauton/data and /zfsauton/project as well as /zfsauton/datasets are not affected. Unfortunately, all of your home directories are located on Gaia. If I catch rogue users I could theoretically move their home directories to a different file server and avoid this mess. The other option I have been looking at is migrating from NFS to GlusterFS (a distributed network file system). The migration would be non-trivial, and the performance penalty with small files might be significant. This is not an exact science.

Predrag

On Mon, Oct 24, 2022 at 11:47 AM Benedikt Boecking wrote:
> If there is any way to not reboot gpu24 and gpu27 you might save me 2 weeks of work. If they are rebooted I may be screwed for my ICLR rebuttal.
>
> But ultimately, do what you have to of course. Thanks!
>
> > On Oct 24, 2022, at 10:43 AM, Predrag Punosevac wrote:
> >
> > Dear Autonians,
> >
> > I got several reports this morning from a few of you (Ifi, Abby, Ben, Vedant) about problems accessing the system. After a bit of investigation, I nailed down the culprit to the main file server. The server (NFS instance) appears to be dead or severely degraded due to the overload.
> >
> > I am afraid that the only medicine will be to reboot the machine, perhaps followed by a reboot of all 45+ computing nodes. This will result in a significant loss of work and productivity. We did go through this exercise less than two months ago.
> >
> > The Auton Lab cluster is not policed for rogue users. Its usability depends on the collegial behaviour of each of our 130 members. Use of scratch directories instead of taxing NFS is well described in the documentation, and as recently as last week I added extra scratch space on at least four machines.
> >
> > Best,
> > Predrag

-------------- next part --------------
An HTML attachment was scrubbed...
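[The scratch-versus-NFS point above is the key one for heavy jobs. A hedged sketch of the intended pattern; the paths and script name below are placeholders, so check the lab documentation for the actual scratch location on each node:

    SCRATCH=/home/scratch/$USER                      # node-local scratch (placeholder path)
    mkdir -p "$SCRATCH/myrun"
    cp -r ~/project/data "$SCRATCH/myrun/"           # stage inputs onto local disk once
    cd "$SCRATCH/myrun"
    python train.py --data ./data --out ./results    # hypothetical training script; read and write locally
    cp -r results ~/project/                         # copy only the final results back to NFS
]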
URL: From predragp at andrew.cmu.edu Mon Oct 24 12:12:47 2022 From: predragp at andrew.cmu.edu (Predrag Punosevac) Date: Mon, 24 Oct 2022 12:12:47 -0400 Subject: ssh login problems (NFS server killed due to overload) In-Reply-To: References: <70CF9D01-A418-47A2-A0DF-3EEED712A9BB@andrew.cmu.edu> Message-ID: Please try to test bash.autonlab.org, upload.autonlab.org, and lop2.autonlab.org. It appears that NFS mounts work on these shell gateways. If you have an Auton Lab workstation please mount -o remount your network home directory or reboot it. Predrag On Mon, Oct 24, 2022 at 12:01 PM Predrag Punosevac wrote: > I am trying really hard not to reboot anything. I manually restarted a > bunch of daemons on the main file server Gaia (nfsd, mounted, rpcbind). I > noticed that restarting autofs daemons on computing nodes restored the > access. I am using Ansible to propagate autofs daemon restart over all > computing nodes. It appears that some of them hang. I am hoping to get away > with rebooting only a machine or two and definitely avoid rebooting the > main file server. > > For curiosity. NFS is the last century (1980s Sun Microsystem) technology. > It is a centralized single point of failure system. We mitigate this risk > by having NFS exports distributed over several different physical file > servers which run their own NFS instances. That is why /zfsauton/data and > /zfsauton/project as well as /zfsauton/datasets are not affected. > Unfortunately all of your home directories are located on GAIA. If I catch > rough users I could theoretically move their home directory to the > different file server and avoid this mess. The other option I was looking > for was migrating NFS to GlusterFS (distributed network file system). The > migration will be non-trivial and the performance penalty with small files > might be significant. This is not an exact science. > > Predrag > > > > > On Mon, Oct 24, 2022 at 11:47 AM Benedikt Boecking < > boecking at andrew.cmu.edu> wrote: > >> If there is any way to not reboot gpu24 and gpu27 you might save me 2 >> weeks of work. If they are rebooted I may be screwed for my ICLR rebuttal. >> >> But ultimately, do what you have to of course. Thanks! >> >> >> >> > On Oct 24, 2022, at 10:43 AM, Predrag Punosevac < >> predragp at andrew.cmu.edu> wrote: >> > >> > >> > >> > Dear Autoninas, >> > >> > I got several reports this morning from a few of you (Ifi, Abby, Ben, >> Vedant) that they are having problems accessing the system. After a bit of >> investigation, I nailed down the culprit to the main file server. The >> server (NFS instance) appears to be dead or severely degraded due to the >> overload. >> > >> > I am afraid that the only medicine will be to reboot the machine, >> perhaps followed up by the reboot of all 45+ computing nodes. This will >> result in a significant loss of work and productivity. We did go through >> this exercise less than two months ago. >> > >> > The Auton Lab cluster is not policed for rogue users. Its usability >> depends on collegial behaviour of each of our 130 members. Use of scratch >> directories instead of taxing NFS is well described in the documentation >> and as recently as last week I added extra scratch on at least four >> machines. >> > >> > Best, >> > Predrag >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
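[For workstation users following the advice above, a hedged sketch of checking and refreshing the automounted home directory; the mount point shown is only an example (substitute your own), and the last two commands assume local root/sudo access:

    df -hT ~                                         # confirm the home directory is an NFS mount and responsive
    sudo mount -o remount /zfsauton2/home/$USER      # remount it, as Predrag suggests
    sudo systemctl restart autofs                    # or restart the automounter, as was done on the compute nodes
]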
From predragp at andrew.cmu.edu Mon Oct 24 13:27:07 2022
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 24 Oct 2022 13:27:07 -0400
Subject: ssh login problems (NFS server killed due to overload)
In-Reply-To: <4674D02C-875E-4347-BD66-6A9082231E14@andrew.cmu.edu>
References: <70CF9D01-A418-47A2-A0DF-3EEED712A9BB@andrew.cmu.edu> <4674D02C-875E-4347-BD66-6A9082231E14@andrew.cmu.edu>
Message-ID:

That means that the processes which caused the crash are still alive. I need to think a bit about how to proceed in the most efficient way. Logging into 45 computing nodes and poking around doesn't scale well. If I end up doing that, the offending accounts will be suspended.

Predrag

On Mon, Oct 24, 2022, 1:21 PM Benedikt Boecking wrote:
> Just to confirm, looks like things are down again.
>
> > On Oct 24, 2022, at 11:12 AM, Predrag Punosevac wrote:
> >
> > Please try to test bash.autonlab.org, upload.autonlab.org, and lop2.autonlab.org.
> >
> > It appears that NFS mounts work on these shell gateways. If you have an Auton Lab workstation, please mount -o remount your network home directory or reboot it.
> >
> > Predrag
> >
> > On Mon, Oct 24, 2022 at 12:01 PM Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> >> I am trying really hard not to reboot anything. I manually restarted a bunch of daemons on the main file server Gaia (nfsd, mountd, rpcbind). I noticed that restarting the autofs daemons on computing nodes restored access. I am using Ansible to propagate the autofs daemon restart over all computing nodes. It appears that some of them hang. I am hoping to get away with rebooting only a machine or two and definitely avoid rebooting the main file server.
> >>
> >> For the curious: NFS is last-century technology (1980s Sun Microsystems). It is a centralized system with a single point of failure. We mitigate this risk by having NFS exports distributed over several different physical file servers, which run their own NFS instances. That is why /zfsauton/data and /zfsauton/project as well as /zfsauton/datasets are not affected. Unfortunately, all of your home directories are located on Gaia. If I catch rogue users I could theoretically move their home directories to a different file server and avoid this mess. The other option I have been looking at is migrating from NFS to GlusterFS (a distributed network file system). The migration would be non-trivial, and the performance penalty with small files might be significant. This is not an exact science.
> >>
> >> Predrag
> >>
> >> On Mon, Oct 24, 2022 at 11:47 AM Benedikt Boecking <boecking at andrew.cmu.edu> wrote:
> >>> If there is any way to not reboot gpu24 and gpu27 you might save me 2 weeks of work. If they are rebooted I may be screwed for my ICLR rebuttal.
> >>>
> >>> But ultimately, do what you have to of course. Thanks!
> >>>
> >>> > On Oct 24, 2022, at 10:43 AM, Predrag Punosevac <predragp at andrew.cmu.edu> wrote:
> >>> >
> >>> > Dear Autonians,
> >>> >
> >>> > I got several reports this morning from a few of you (Ifi, Abby, Ben, Vedant) about problems accessing the system. After a bit of investigation, I nailed down the culprit to the main file server. The server (NFS instance) appears to be dead or severely degraded due to the overload.
> >>> >
> >>> > I am afraid that the only medicine will be to reboot the machine, perhaps followed by a reboot of all 45+ computing nodes. This will result in a significant loss of work and productivity.
We did go through >>> this exercise less than two months ago. >>> > >>> > The Auton Lab cluster is not policed for rogue users. Its usability >>> depends on collegial behaviour of each of our 130 members. Use of scratch >>> directories instead of taxing NFS is well described in the documentation >>> and as recently as last week I added extra scratch on at least four >>> machines. >>> > >>> > Best, >>> > Predrag >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Tue Oct 25 15:18:20 2022 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Tue, 25 Oct 2022 15:18:20 -0400 Subject: Auton survival models sweep the board in a prostate cancer study just published in Nature Scientific Reports Message-ID: Dear Autonians, You will enjoy checking this out: https://www.nature.com/articles/s41598-022-22118-y Both of the two winning methods, DSM and RDSM, are parts of the open source software package auton-survival. Way to go Chirag, Rachel, Vincent, Willa, and the rest of our auton-survival team! Artur -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Wed Oct 26 13:17:14 2022 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Wed, 26 Oct 2022 13:17:14 -0400 Subject: Fwd: ML/Duolingo Seminar - Michael Oberst In-Reply-To: References: Message-ID: this talk could be of interest to all of us who work in the ML4HC space and who are curious about causality in the context of training predictive models ---------- Forwarded message --------- From: Sharon Cavlovich Date: Wed, Oct 26, 2022 at 12:37 PM Subject: ML/Duolingo Seminar - Michael Oberst To: Please join us for a ML/Duolingo Seminar! Tuesday, Nov. 1, 2022 NSH 4305 10:30am Michael Oberst, PhD Candidate, MIT Title: What is the role of causality in reliable prediction? Abstract: How should we incorporate causal knowledge into the development of predictive models in high-risk domains like healthcare? Rather than attempting to learn "causal" models, I present an alternative viewpoint: Partial causal knowledge can be used to anticipate how model performance will change in novel (but plausible) scenarios, and can be used as a guide for developing reliable models. First, I will discuss my work on learning linear predictors that are worst-case optimal under a set of user-specified interventions on unobserved variables (e.g., moving from a hospital with high-income patients to one with lower-income patients). This work assumes the existence of noisy proxies for those background variables at training time, and an underlying linear causal model over all variables. A key insight is that the optimal predictor is not necessarily a "causal" predictor, but depends on the scale (and direction) of plausible interventions. Second, I will demonstrate how similar ideas can be extended to more general settings, including computer vision. Here, I will discuss work on evaluating the worst-case performance of predictive models under a set of user-specified, causally interpretable changes in distribution (e.g., a change in X-ray scanning policies). In contrast to work that considers a worst-case over subpopulations or distributions in an f-divergence ball, we consider parametric shifts in the distribution of a subset of variables. This allows us to further constrain the space of plausible shifts, and in some cases directly interpret the worst-case shift to build intuition for model vulnerabilities. 
This talk is based on joint work with Nikolaj Thams, David Sontag, and Jonas Peters. (https://arxiv.org/abs/2103.02477, https://arxiv.org/abs/2205.15947) Bio: Michael Oberst is a PhD Candidate in EECS at MIT, advised by David Sontag. His research lies at the intersection of causality, machine learning, and healthcare, with an emphasis on improving the reliability of both causal inference and prediction models. His work has been published at a range of machine learning venues (NeurIPS / ICML / AISTATS / KDD), including work with clinical collaborators from NYU Langone, Beth Israel Deaconess Medical Center, and Mass General Brigham. He has also worked on clinical applications of machine learning, including work on learning effective antibiotic treatment policies (published in Science Translational Medicine). He earned his undergraduate degree in Statistics at Harvard. -- -- Sharon Cavlovich Senior Department Administrative Assistant | Machine Learning Department Carnegie Mellon University 5000 Forbes Avenue | Gates Hillman Complex 8215 Pittsburgh, PA 15213 412.268.5196 (office) | 412.268.3431 (fax) -------------- next part -------------- An HTML attachment was scrubbed... URL: