From awd at cs.cmu.edu Fri Feb 3 10:51:47 2023
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Fri, 3 Feb 2023 10:51:47 -0500
Subject: Our good friend and partner, Dr. Debra Bogen, to become Pennsylvania Secretary of Health
Message-ID:

The current head of the Allegheny County Health Department and the intellectual lead of our ongoing Covid-19 surveillance project using wastewater as the source data for analyses, Dr. Debra Bogen (also faculty at Pitt and a pediatrician at UPMC), has been selected by the new Governor of Pennsylvania to assume a highly prominent post in our state government.

Check this out:
https://www.pittsburghmagazine.com/governor-elect-josh-shapiro-calls-on-dr-debra-bogen-as-pa-secretary-of-health/

We will see if this will or won't help us continue and expand our wastewater analytics project, but I am moderately optimistic :)

Cheers,
Artur
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Mon Feb 6 18:15:18 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 6 Feb 2023 16:15:18 -0700
Subject: Identity management issues
Message-ID:

A number of you have reported some identity management issues in the past 2 days.

I took PTO until Thursday because I am not in Pittsburgh and I have very limited access to the Internet. I will try to look into it before Thursday but no promises.

Predrag
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Tue Feb 7 22:47:52 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 7 Feb 2023 22:47:52 -0500
Subject: GPU1 multiple crashes, server restored
Message-ID:

Whoever was using it should debug her/his code

root at gpu1$ pwd
/var/crash

root at gpu1$ ls
127.0.0.1-2022-12-21-17:28:08  127.0.0.1-2023-01-08-09:21:13
127.0.0.1-2022-12-28-10:30:57  127.0.0.1-2023-01-28-17:12:06
127.0.0.1-2023-01-01-07:47:43  127.0.0.1-2023-02-02-12:21:30
127.0.0.1-2023-01-04-08:42:19  127.0.0.1-2023-02-07-01:54:10

Predrag

From predragp at andrew.cmu.edu Tue Feb 7 23:07:28 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 7 Feb 2023 23:07:28 -0500
Subject: lov2 down
In-Reply-To: <4D4F5929-5ADE-4D3E-900A-B87CC9B8FBDC@andrew.cmu.edu>
References: <4D4F5929-5ADE-4D3E-900A-B87CC9B8FBDC@andrew.cmu.edu>
Message-ID:

Finally, I had some time to look into this. Almost surely it is either a dirty file system, like lov5 and lov6 a week ago, or faulty hardware. This will have to wait until Friday when I have full access to the server room. If it turns out to be a dirty file system I will just rebuild the machine with RHEL 9.1 like I did with lov5 and lov6, which I still haven't made available as I ran out of time before my PTO.

Predrag

On Sat, Feb 4, 2023 at 1:49 PM Vikram Duvvur wrote:

> Dear Predrag,
>
> Thank you for the quick response. Please let me know if I can help when you
> get back.
>
> Regards,
> Vikram
>
> On Feb 4, 2023, at 11:11 AM, Predrag Punosevac wrote:
>
> I took PTO. This will have to wait.
>
> Predrag
>
> On Sat, Feb 4, 2023, 10:48 AM Vikram Duvvur wrote:
>
>> Hi Predrag,
>> I was running a (heavy workload) job on lov2 yesterday, and it seems
>> like it crashed around 3am last night. I'm a little confused because I ran
>> a similar workload on it last week, but maybe there is another issue.
>> Is there anything I can do to help bring it back online?
>>
>> Thank you,
>> Vikram
>>
>

From predragp at andrew.cmu.edu Tue Feb 7 23:11:27 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 7 Feb 2023 23:11:27 -0500
Subject: Identity management issues
In-Reply-To:
References:
Message-ID:

I had an hour to check our identity management services. Nothing appears to be out of the ordinary. I think several people caught the LDAP server while it was busy (typically if you wait 10 min it clears up). They started panicking due to the deadlines and tried to force ssh login, which in turn resulted in temporary account suspension (on the level of individual computing nodes).

Predrag

On Mon, Feb 6, 2023 at 6:15 PM Predrag Punosevac wrote:

> A number of you have reported some identity management issues in the past
> 2 days.
>
> I took PTO until Thursday because I am not in Pittsburgh and I have very limited
> access to the Internet. I will try to look into it before Thursday but no
> promises.
>
> Predrag

From predragp at andrew.cmu.edu Wed Feb 8 10:11:32 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 8 Feb 2023 08:11:32 -0700
Subject: Identity management issues
In-Reply-To:
References:
Message-ID:

The autofs daemon was killed when the machines were struggling with computational tasks. I will restart it the next time I have access to the Internet, which will not be before tomorrow.

Predrag

On Wed, Feb 8, 2023, 5:29 AM Vedant Sanil wrote:

> Hi Predrag,
>
> I can't seem to log in to any one of these machines: lov2, lov5, lov6,
> lov13. It requests my LDAP password but throws me permission denied when I
> enter it correctly.
>
> I am able to log in to lov10 but am greeted with this message: Could not
> chdir to home directory /zfsauton2/home/vsanil: No such file or directory
>
> I'd appreciate it if you would be able to help me out with this issue!
> Thanks.
>
> Regards,
> Vedant Sanil

From predragp at andrew.cmu.edu Fri Feb 10 09:41:41 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 10 Feb 2023 09:41:41 -0500
Subject: Identity management issues
In-Reply-To:
References:
Message-ID:

The autofs daemon was dead on lov10, which is why you were getting this message:

Could not chdir to home directory /zfsauton2/home/vsanil: No such file or directory

It is fixed now. Nothing wrong with lov2 and lov13.

LOV1, LOV5, and LOV6 are not currently available. These three machines have failed to reboot with an unclean file system and are being reprovisioned. This was communicated earlier.

Predrag

On Wed, Feb 8, 2023 at 7:29 AM Vedant Sanil wrote:

> Hi Predrag,
>
> I can't seem to log in to any one of these machines: lov2, lov5, lov6,
> lov13.
> It requests my LDAP password but throws me permission denied when I
> enter it correctly.
>
> I am able to log in to lov10 but am greeted with this message: Could not
> chdir to home directory /zfsauton2/home/vsanil: No such file or directory
>
> I'd appreciate it if you would be able to help me out with this issue!
> Thanks.
>
> Regards,
> Vedant Sanil

From ichar at andrew.cmu.edu Fri Feb 10 12:18:49 2023
From: ichar at andrew.cmu.edu (Ian Char)
Date: Fri, 10 Feb 2023 12:18:49 -0500
Subject: GPU1 multiple crashes, server restored
In-Reply-To:
References:
Message-ID:

Sorry for the late follow-up on this. To the best of my knowledge I have been the only one using this machine (others please correct me if I am wrong). At first I thought it was related to the previous power outages, but I think there is something else going on here because of the frequency.

I'm happy to look through my code again, but I think there is another issue because 1. I am running the same code on other machines fine and 2.
I have noticed that a crash happened when I had no jobs running at all.

Please let me know if there is anything I can do to help investigate this.

Thanks,
Ian

On Tue, Feb 7, 2023 at 10:51 PM Predrag Punosevac wrote:

> Whoever was using it should debug her/his code
>
> root at gpu1$ pwd
> /var/crash
>
> root at gpu1$ ls
> 127.0.0.1-2022-12-21-17:28:08  127.0.0.1-2023-01-08-09:21:13
> 127.0.0.1-2022-12-28-10:30:57  127.0.0.1-2023-01-28-17:12:06
> 127.0.0.1-2023-01-01-07:47:43  127.0.0.1-2023-02-02-12:21:30
> 127.0.0.1-2023-01-04-08:42:19  127.0.0.1-2023-02-07-01:54:10
>
> Predrag

From predragp at andrew.cmu.edu Fri Feb 10 12:26:54 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Fri, 10 Feb 2023 12:26:54 -0500
Subject: GPU1 multiple crashes, server restored
In-Reply-To:
References:
Message-ID:

GPU1 is an 8-year-old server. I will check the RAID and a few other things, but we have to be realistic about the life span of these machines (5 years is the industry standard).

Predrag

On Fri, Feb 10, 2023 at 12:19 PM Ian Char wrote:

> Sorry for the late follow-up on this. To the best of my knowledge I have
> been the only one using this machine (others please correct me if I am
> wrong). At first I thought it was related to the previous power outages,
> but I think there is something else going on here because of the frequency.
>
> I'm happy to look through my code again, but I think there is another
> issue because
> 1. I am running the same code on other machines fine and
> 2. I have noticed that a crash happened when I had no jobs running at all.
>
> Please let me know if there is anything I can do to help investigate this.
>
> Thanks,
> Ian

From awd at cs.cmu.edu Mon Feb 13 15:36:43 2023
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Mon, 13 Feb 2023 15:36:43 -0500
Subject: Fwd: Seminar Detail - Seminar Tracker
In-Reply-To: <6C35753E-3625-4EDF-9CAB-59D88003C5D1@andrew.cmu.edu>
References: <6C35753E-3625-4EDF-9CAB-59D88003C5D1@andrew.cmu.edu>
Message-ID:

Maria is back!

Artur

---------- Forwarded message ---------
From: Martin Gaynor
Date: Mon, Feb 13, 2023 at 3:35 PM
Subject: Seminar Detail - Seminar Tracker
To: heinz-phd at lists. edu , < heinz-all-faculty at lists.andrew.cmu.edu>

FYI our terrific PhD alum Maria De-Arteaga, who some of you may recall, is giving a talk over at Tepper this Friday.

Best,
Marty

https://seminartracker.tepper.cmu.edu/

Maria De-Arteaga
BT/IS Seminar - A Case for Humans-in-the-Loop: Decisions in the Presence of Misestimated Algorithmic Scores
Business Technology
February 17, 2023 at 12:00 PM EST (local) || Duration: 60 minutes
Location: Virtual, Meeting Link: (Virtual)
UT Austin

The increased use of machine learning to assist with decision-making in high-stakes domains has been met with both enthusiasm and concern. One source of ongoing debate is the effect and value of decision makers' discretionary power to override algorithmic recommendations. In this paper, we study the adoption of an algorithmic tool used to help with decisions in child maltreatment hotline screenings.
By taking advantage of an implementation glitch, we investigate corrective overrides: whether decision makers are more likely to override algorithmic recommendations when the tool misestimates the risk score shown to call workers. We find that, after the deployment of the tool, decisions became better aligned with algorithmic assessments, but human adherence to the tool's recommendation was less likely when the displayed score was misestimated as a result of the glitch. Then, analyzing the effect of adoption and overrides on racial and socioeconomic disproportionalities, we find that the deployment of the tool did not affect disproportionalities with respect to the pre-deployment period. We also observe that the disproportionalities resulting from algorithmic-informed decisions were substantially smaller than those associated with the algorithm in isolation. Together, these results make a case for the value of humans in-the-loop, showing that in high-stakes contexts, human discretionary power can mitigate the risks of algorithmic errors and reduce disparities.

Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4050125

Additional Notes:
Meeting ID: 951 4541 3358
Passcode: 294771

If you have any questions, please contact Phil Conley at pconley at andrew.cmu.edu (412) 268-6212 (Carnegie Mellon University).

_______________________________________________
Heinz-all-faculty mailing list
Heinz-all-faculty at lists.andrew.cmu.edu
https://lists.andrew.cmu.edu/mailman/listinfo/heinz-all-faculty
_______________________________________________
Heinz-affiliate-faculty mailing list
Heinz-affiliate-faculty at lists.andrew.cmu.edu
https://lists.andrew.cmu.edu/mailman/listinfo/heinz-affiliate-faculty
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From predragp at andrew.cmu.edu Mon Feb 13 16:59:05 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 13 Feb 2023 16:59:05 -0500
Subject: Can't login to lov13
In-Reply-To: <6A8C58F7-ED7B-40A4-B79D-57E30D534D98@andrew.cmu.edu>
References: <6A8C58F7-ED7B-40A4-B79D-57E30D534D98@andrew.cmu.edu>
Message-ID:

You managed to put a shell gateway and a computing node in jail.

fail2ban> status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed: 4
|  |- Total failed: 184
|  `- Journal matches: _SYSTEMD_UNIT=sshd.service + _COMM=sshd
`- Actions
   |- Currently banned: 2
   |- Total banned: 3
   `- Banned IP list: 192.168.6.91 10.8.0.14

I fixed it. If the server is temporarily unavailable there is no point in brute-forcing it. Just wait a few minutes before trying again.

Predrag

On Mon, Feb 13, 2023 at 4:30 PM Arundhati Banerjee wrote:

> Hi Predrag,
>
> Hope you are doing well. I am currently unable to ssh into lov13 - it
> keeps asking for the password, but says 'permission denied' even with the
> correct password entered. Is there something wrong?
>
> Thanks,
> Arundhati

From predragp at andrew.cmu.edu Sun Feb 26 20:33:42 2023
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Sun, 26 Feb 2023 20:33:42 -0500
Subject: gpu27 will crash
Message-ID:

Someone is using /tmp for caching, which is one of the major don'ts according to the documentation. The root file system is almost full. This will eventually crash the server.

Just saying ...

Predrag
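
For anyone who wants to check how full the root file system is before writing cache files, here is a minimal sketch in Python. It is not part of the lab documentation: the function names and the 90% threshold are made up for illustration, and it assumes only the standard library call shutil.disk_usage.

```python
import shutil


def used_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        # Defensive guard for pseudo file systems that report no capacity.
        return 0.0
    return usage.used / usage.total


def is_almost_full(path: str = "/", threshold: float = 0.9) -> bool:
    """True when usage is at or above `threshold` (0.9 is an illustrative value, not a site policy)."""
    return used_fraction(path) >= threshold


if __name__ == "__main__":
    # Report on the root file system, the one mentioned in the message above.
    print(f"root file system: {used_fraction('/'):.0%} used")
    if is_almost_full("/"):
        print("root file system is almost full; avoid writing cache files there")
```

Running it on a node before starting a job gives a quick read on whether the root file system is close to full. Where caches should go instead is described in the documentation and is not assumed here.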