From bogus@does.not.exist.com Tue Apr 15 14:28:36 2014
From: bogus@does.not.exist.com ()
Date: Tue, 15 Apr 2014 18:28:36 -0000
Subject: No subject
Message-ID:

loki.auton.cs.cmu.edu" or "ssh 128.2.179.122" still works. However,
doing either of these from one of the alphas, for example, will fail
because loki has address 192.168.1.4 inside the lab. This is
inconvenient for everyone, and we have been working around this problem
in a non-sustainable way.

We believe the best solution is to put the "inside" machines in our
other domain, autonlab.org. This would allow external machines to ssh to
loki.auton.cs.cmu.edu *or* loki.autonlab.org *or* 128.2.179.122. Inside
the lab, one could still ssh to loki.autonlab.org *or* loki, and the
right thing would happen. However, ssh'ing to loki.auton.cs.cmu.edu or
128.2.179.122 from inside the lab would still fail.

Summary: if we make the proposed change, you can always use the name
loki.autonlab.org, and loki.auton.cs.cmu.edu would continue to work from
outside the lab.

Alternative suggestions are welcome. In fact, *all* comments are
welcome! Please let us know any concerns or reservations you have about
this proposal.

-Paul Komarek

From bogus@does.not.exist.com Tue Apr 15 14:28:36 2014
From: bogus@does.not.exist.com ()
Date: Tue, 15 Apr 2014 18:28:36 -0000
Subject: No subject
Message-ID:

as many times as you want, and will not be asked for a password. Because
of the way we set things up in the lab, this buys you similar free
access to loki, liver, and limey.

NOTES:

- IF YOUR HOME DIRECTORY ON YOUR "PRIMARY WORK MACHINE" IS ON AFS, then
  the private key can be "sniffed" (stolen) by hackers when it travels
  from the AFS server to you. YOU SHOULDN'T BE USING files in .ssh/ IF
  IT'S ON AFS. If you're in this kind of situation you can tell
  ssh-keygen to store the keys in a directory on the local disk, and
  similarly tell ssh-add to read them from the same directory. Make sure
  that you're the only one with read and write access to that directory.
- You can save typing the "eval ssh-agent" part by putting it into a
  startup file or wrapping "ssh-agent" around your window manager. I
  won't go into the details here, but if someone cares to post them to
  this list they're very welcome.

- If the machine you're logging in TO is a facilitized SCS machine, you
  will not be granted Kerberos tokens, meaning that you won't be able to
  access files in AFS. There are tricks to get around that, again too
  detailed to go into here. I'll just say that if you want this to work
  so you can use the said machine to access the CVS repo, then it will
  work if you follow the instructions on the web-site on how to set up
  CVS access for AFS-ignorant clients. Again, if you do this, make sure
  that the machine with AFS on it only stores your PUBLIC key and does
  not have the PRIVATE version.

Good luck!

-- Dan Pelleg

From predragp at andrew.cmu.edu Thu Apr 3 14:42:02 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 3 Apr 2014 14:42:02 -0400
Subject: [auton-users] neill-fs non-responsive
Message-ID: <06dcc021ccfa6c34fd29a8438c78d355.squirrel@webmail.andrew.cmu.edu>

Users in Neill's group are probably experiencing a total lack of
services. I just went upstairs and neill-fs is practically
non-responsive. The last time that happened was when the backup scripts
dragged the machine down to almost a halt. The decision was made to get
a new file server.

For the rest of today I will go back to my original plan of fixing the
last remaining bits of neill-zfs, neill1, and neill2 and releasing that
group of servers. No work is going to be done on dying hardware.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Fri Apr 4 17:13:19 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 4 Apr 2014 17:13:19 -0400
Subject: [auton-users] Status Page replaced with M/Monit
Message-ID: <9d078e32b9cafb1db854513668b00746.squirrel@webmail.andrew.cmu.edu>

There is no point in keeping the link to the old status page.
Please use M/Monit to check on our computing nodes and some other
machines. To log into M/Monit:

username:auton
password:Dr.Who

I am hoping to link Munin soon so that you can have nice RRD graphs for
all resources as well as the number of users logged into machines.

Predrag

P.S. Several people were complaining about MATLAB on LOW1. There is no
problem with the license manager. I firewalled it intentionally, as I
was expecting to rebuild LOW1 this week with a self-hosting instance of
MATLAB. Unfortunately I am still fighting the issues coming from the
LDAP conversion.

From predragp at andrew.cmu.edu Mon Apr 7 12:45:06 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Mon, 7 Apr 2014 12:45:06 -0400
Subject: [auton-users] Important MATLAB update status
Message-ID:

Dear Autonians,

Late last night I started updating MATLAB on all computing nodes. This
is what has been done so far.

Computing nodes running a self-hosting version of the latest MATLAB
R2014a with all toolboxes:

1. lera
2. lxv1
3. lxv2
4. neill1 (LDAP still off)
5. neill2 (LDAP still off)

Computing nodes which have the new version, R2014a, but where I need
everyone to stop their MATLAB sessions before I can restart the
licensing server:

1. lov3
2. lov4
3. lou1

Computing nodes which have to be rebuilt before we can even attempt
installation of R2014a, which doesn't support RedHat 5.xxx:

1. low1
2. Guard Dog Cluster
   - Compute-0-0
   - Compute-0-1
   - Compute-0-2
   - Compute-0-3
3. Neill3
4. Neill4

Cheers,
Predrag

From predragp at andrew.cmu.edu Wed Apr 9 16:19:22 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Wed, 9 Apr 2014 16:19:22 -0400
Subject: [auton-users] LOT2 back in business
Message-ID:

Lot2 is back online after a kernel memory dump. I am not sure what
caused the dump, but we might be dealing with dead memory modules here.
Predrag

From predragp at andrew.cmu.edu Thu Apr 10 10:58:10 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 10 Apr 2014 10:58:10 -0400
Subject: [auton-users] MATLAB licensing server LOV3/LOV4
Message-ID: <8d74725c387f29cff1c2df5e120dc0cc.squirrel@webmail.andrew.cmu.edu>

MATLAB licensing has to be restarted on LOV3/LOV4. In order for me to do
that, everyone has to get off MATLAB. How does 3:00 PM today sound?

Predrag

From predragp at andrew.cmu.edu Thu Apr 10 15:02:01 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 10 Apr 2014 15:02:01 -0400
Subject: [auton-users] LDAP enabled computing nodes (Neill1, Neill2, and ...)
Message-ID:

Dear Autonians,

After a week and a half of fighting security certificates and TLS
connections between computing nodes/desktops and the new LDAP server, I
have things 100% under control. This is the last bit I needed to
complete rebuilding the Auton Lab computer infrastructure.

Computing nodes Neill1 and Neill2 and my desktop (as a proof of concept)
have been hooked to the new LDAP server as of this morning. As of this
morning I have 16 of the 60 or so Auton Lab users in the LDAP database.

My plan is to migrate the info of the members of Neill's group tonight
and tomorrow morning so that Neill's group can migrate from the old to
the new hardware (including home directories on the new file server).

Desktop conversion to LDAP will start tomorrow after 5 pm and hopefully
be finished by Monday morning. The further conversion of all computing
nodes to LDAP will continue next week.

In order to be able to use computing nodes which are plugged into LDAP
you will need a new password. There are two options.

1. I randomly generate a password and e-mail it to you.
2. You stop by my office and type the password of your choice into the
   hash generator.

Note that the very simple version of LDAP which we use in the lab will
not allow users to change their password.
I do not care to know your passwords, and if you could e-mail me the
SSHA hash of your password (slappasswd -h {SSHA}) I would be happy to
put it into the database.

Most Kind Regards,
Predrag

From predragp at andrew.cmu.edu Thu Apr 10 15:42:03 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 10 Apr 2014 15:42:03 -0400
Subject: [auton-users] MATLAB LOV3/LOV4 fully functional
Message-ID: <37f63cf581176d82491b18707d6dde23.squirrel@webmail.andrew.cmu.edu>

As of 3:00 PM today I killed all instances of 2013b MATLAB on LOV3 and
LOV4 and restarted the licensing server. To use the new MATLAB make sure
you use the full path

/usr/local/MATLAB/R2014a/bin/matlab

This is because by default the software is first launched from the
/auton shares, which are mounted via NFS. /auton has the 2013b version
of MATLAB, which will not be updated, as we are migrating toward local
self-hosting software to improve performance and increase the security
of the whole system. Self-hosting means that the MATLAB instances on
LOV3 and LOV4 (just like Neill1, Neill2, Lera, LXV1, LXV2) do not depend
on the university licensing server.

I am planning on restarting the licensing server on LOU1 tomorrow at
exactly 3 PM. Please kill all instances of MATLAB on LOU1 by that time.

Predrag

From predragp at andrew.cmu.edu Thu Apr 10 16:17:00 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Thu, 10 Apr 2014 16:17:00 -0400
Subject: [auton-users] LOFTY and LOCK are dying painful dead
Message-ID: <854e77c066ad28b45dad71ac5af96175.squirrel@webmail.andrew.cmu.edu>

Many of you have experienced sloooow DNS resolution and a slooooow login
process on the LOP1 and LOP2 gateways and many computing nodes. If you
check M/Monit you will see that this is because our old primary domain
controller, LOFTY, is dying a painful death. Just check how much memory
LOFTY is swapping. LOFTY runs the master instance of Bind DNS as well as
the NIS server and even the DHCP server.
The good news is that we have a new cluster of DNS servers and a fully
functional replacement for NIS (the LDAP server). However, connecting
the computing nodes is going to take a bit longer. Ten minutes ago I
edited the DHCP server configuration files on LOFTY to point the
computing nodes which have not been rebuilt to the new DNS cluster.
However, network connections will have to be restarted for the change to
take effect, and that will not be forced on the nodes.

The nodes which need to be rebuilt also use LOCK as a default gateway,
and LOCK is dying too. I am scared to switch nodes en masse to AREAS as
a gateway, because AREAS has very strict firewall rules and might break
some of the services that are not documented.

The whole situation has been anticipated and the conversion will be
finished before LOFTY and LOCK die.

Predrag

From predragp at andrew.cmu.edu Fri Apr 11 18:46:24 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Fri, 11 Apr 2014 18:46:24 -0400
Subject: [auton-users] MATLAB licences restarted on LOU1
Message-ID: <10d467732f615c84ced0af9081919f9f.squirrel@webmail.andrew.cmu.edu>

I have restarted MATLAB licenses on LOU1. The installation of MATLAB on
LOU1 is self-hosting. Please use the full path

/usr/local/MATLAB/R2014a/bin/matlab

to start the application. By default, when you issue the command it will
try to start the software from the auton shares, which have an old
version of MATLAB that cannot be used on the new computing nodes.

Predrag

From predragp at andrew.cmu.edu Sun Apr 13 14:43:49 2014
From: predragp at andrew.cmu.edu (predragp at andrew.cmu.edu)
Date: Sun, 13 Apr 2014 14:43:49 -0400
Subject: [auton-users] LDAP switch update
Message-ID: <3626dd3956b41b7688c661ef0ebe6972.squirrel@webmail.andrew.cmu.edu>

All our Linux desktops are now switched over to LDAP.
If you have not stopped by my office to type your LDAP password into the
database, please do so, as a mismatch between your local desktop
password (old NIS password) and your LDAP password will result in file
permission problems on NFS.

Your new zfsauton/home directories are still not mounted. I am
proceeding carefully. However, if you need storage space on the new file
server immediately, please let me know and I will mount it for you (I am
thinking of you, Ben). The only people who have access to certain data
on the new file server are Jarod and Jieshi Chen.

Cheers,
Predrag

P.S. I still have not gotten around to putting Neill's people into the
LDAP database. It is the no. 1 priority now.