From jieshic at andrew.cmu.edu Mon Oct 1 10:16:59 2018
From: jieshic at andrew.cmu.edu (Jieshi Chen)
Date: Mon, 1 Oct 2018 14:16:59 +0000
Subject: *** Auton Lab's 25th Annual Picnic: Sunday, October 7th at Schenley Park ***
In-Reply-To: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu>
References: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu>
Message-ID: <4f1d82938a4c43798c36e241758ddca8@andrew.cmu.edu>

Please RSVP through the following link if you haven't done so. We will place the order for food and CAKE soon.
https://goo.gl/forms/HysbH5sndcs4bZjr1

Thanks,
Jessie

________________________________
From: Autonlab-users on behalf of Chen Jieshi
Sent: Thursday, September 20, 2018 10:32:37 AM
To: users at autonlab.org
Subject: *** Auton Lab's 25th Annual Picnic: Sunday, October 7th at Schenley Park ***

Dear Autonians,

We would like to invite you and your family to celebrate the 25th birthday of the Auton Lab. Please save the date for our annual lab picnic at the Vietnam Veterans Pavilion in Schenley Park on Sunday, October 7th.

Please RSVP through the web form below so that we can plan resources properly.
https://goo.gl/forms/HysbH5sndcs4bZjr1

Looking forward to seeing all of you!

Best,
Jessie

Jieshi (Jessie) Chen
Senior Research Analyst
Auton Lab, Robotics Institute
Carnegie Mellon University
Newell-Simon Hall, Room 3123
5000 Forbes Ave, Pittsburgh, PA 15213

From predragp at andrew.cmu.edu Mon Oct 1 16:27:25 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 01 Oct 2018 16:27:25 -0400
Subject: MATLAB R2018b
Message-ID: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu>

Dear Autonians,

I just installed MATLAB R2018b on low1. Due to the size of the installation files (12GB), things are moving slowly, but I am on a roll now and a few computing nodes will be added within 30 minutes.
I hope that all servers, including the servers of Neill's group, will have the new version by tomorrow morning. I will not spam you with additional e-mail.

Desktops, as well as the one virtual machine which uses MATLAB, will have to wait until Wednesday.

Predrag

From predragp at andrew.cmu.edu Mon Oct 1 18:35:47 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 01 Oct 2018 18:35:47 -0400
Subject: MATLAB R2018b
In-Reply-To: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu>
References: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu>
Message-ID: <20181001223547.K5vSLHkue%predragp@andrew.cmu.edu>

Quick update on the status of MATLAB: some good news and some not-so-good news.

MATLAB is now upgraded to the latest 2018b on the following CPU nodes:

low1, lov1, lov2, lov3, lov4, lov5, ari, foxconn, athena

as well as on gpu1. As a bonus, I also upgraded all the software, but with the exception of gpu1 I didn't reboot the machines, so the kernel is still a few months old. However, the userland is the latest.

I have encountered a few problems. It looks like the MATLAB installation breaks the NVIDIA driver, and a reboot is needed for MATLAB to see the GPU devices. That means the GPU nodes will have to be rebooted unless there is a serious reason to hold off. Also, none of Neill's servers have enough HDD space for the latest MATLAB (30 GB during the installation). I will have to take a little break from this before I can think about what can be done.

Cheers,
Predrag

P.S.
If you don't want me to reboot the GPU servers, please speak now!

From predragp at andrew.cmu.edu Mon Oct 1 22:26:41 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 01 Oct 2018 22:26:41 -0400
Subject: MATLAB R2018b
In-Reply-To: <20181001223547.K5vSLHkue%predragp@andrew.cmu.edu>
References: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu> <20181001223547.K5vSLHkue%predragp@andrew.cmu.edu>
Message-ID: <20181002022641.g6vIqSgGX%predragp@andrew.cmu.edu>

I was able to install the latest release of MATLAB on all GPU servers (2-9) without rebooting them. Also, all the software is now up to date. Please rebuild your "deep learning" toolboxes accordingly.

The remaining issue is the lack of space on Neill's servers.
I will have to deal with it tomorrow. Personal desktops and that single virtual machine instance which uses MATLAB will have to wait until Wednesday, as I have things which must be done tomorrow.

Predrag

From predragp at andrew.cmu.edu Mon Oct 1 22:54:27 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 01 Oct 2018 22:54:27 -0400
Subject: MATLAB R2018b
In-Reply-To: <20181002022641.g6vIqSgGX%predragp@andrew.cmu.edu>
References: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu> <20181001223547.K5vSLHkue%predragp@andrew.cmu.edu> <20181002022641.g6vIqSgGX%predragp@andrew.cmu.edu>
Message-ID: <20181002025427.om6wI-D4K%predragp@andrew.cmu.edu>

I was able to install MATLAB (all 23 GB of it) on the neill3 and neill4 machines by utilizing the scratch directory. The neill1 and neill2 machines have tiny HDDs. My plan is to install just the core and a few toolboxes instead of the standard installation with all toolboxes. However, I really don't have the energy to do this tonight.

Cheers,
Predrag

From mbarnes1 at andrew.cmu.edu Tue Oct 2 07:35:55 2018
From: mbarnes1 at andrew.cmu.edu (Matthew Barnes)
Date: Tue, 2 Oct 2018 07:35:55 -0400
Subject: MATLAB R2018b
In-Reply-To: <20181002025427.om6wI-D4K%predragp@andrew.cmu.edu>
References: <20181001202725.84ih5D2Ku%predragp@andrew.cmu.edu> <20181001223547.K5vSLHkue%predragp@andrew.cmu.edu> <20181002022641.g6vIqSgGX%predragp@andrew.cmu.edu> <20181002025427.om6wI-D4K%predragp@andrew.cmu.edu>
Message-ID:

Thanks Predrag!! We all appreciate this.

From jieshic at andrew.cmu.edu Fri Oct 5 17:33:23 2018
From: jieshic at andrew.cmu.edu (Chen Jieshi)
Date: Fri, 5 Oct 2018 17:33:23 -0400
Subject: *** Auton Lab's 25th Annual Picnic: Sunday, October 7th at Schenley Park ***
In-Reply-To: <4f1d82938a4c43798c36e241758ddca8@andrew.cmu.edu>
References: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu> <4f1d82938a4c43798c36e241758ddca8@andrew.cmu.edu>
Message-ID: <6873BDFB-345A-4D84-A062-2CC78F3BA78F@andrew.cmu.edu>

Just a friendly reminder of the picnic this coming Sunday! We'll start around noon, and here is the Google map for the pavilion location:
https://goo.gl/maps/imDVFypstfC2

Looking forward to seeing all of you!

Best,
Jessie

From jieshic at andrew.cmu.edu Tue Oct 9 11:49:44 2018
From: jieshic at andrew.cmu.edu (Jieshi Chen)
Date: Tue, 9 Oct 2018 15:49:44 +0000
Subject: *** Auton Lab's 25th Annual Picnic: Sunday, October 7th at Schenley Park ***
In-Reply-To: <6873BDFB-345A-4D84-A062-2CC78F3BA78F@andrew.cmu.edu>
References: <70D7E5A2-54D9-4607-8C1D-EE868038CDDD@andrew.cmu.edu> <4f1d82938a4c43798c36e241758ddca8@andrew.cmu.edu>, <6873BDFB-345A-4D84-A062-2CC78F3BA78F@andrew.cmu.edu>
Message-ID: <50d4c0f43a8643f6b495c96a4f243064@andrew.cmu.edu>

Hi Everyone,

Thanks for coming to the picnic! Here's the picture of the cake, with the slogan credited to Artur. The cake is Le Fraisier from the bakery La Gourmandine.

P.S. If you left a black water bottle at the picnic, please contact me (jieshic at andrew.cmu.edu) to pick it up.

Thanks,
Jessie

[Attachment: IMG_5142.jpg, image/jpeg, 2072891 bytes]

From predragp at andrew.cmu.edu Wed Oct 10 22:57:16 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 10 Oct 2018 22:57:16 -0400
Subject: gsl25, gsl25-devel
In-Reply-To:
References:
Message-ID: <20181011025716.WGI4VAkD0%predragp@andrew.cmu.edu>

Ifigeneia Apostolopoulou wrote:

> thanks a lot both of you!
>
> I replaced some functions to use the older gsl version
> so that I can finish part of my experiments as a temporary solution
>

You can put those functions back and run the experiment just like you did on your Mac. Josko Plazonic of Princeton University, who is the lead developer of Springdale Linux, has built RPMs for both gsl25 and gsl25-devel. They are already in the computational repo, and I installed them on low1 (gsl25, gsl25-devel, env modules in /usr/local/share/Modules/modulefiles/...).
I hope you do realize that this software is not well tested, and any crazy things should be reported.

On a related note, since I am CC-ing this to the entire lab: I am adding llvm-toolset-7 (clang) to all computing nodes right now and replacing devtoolset-6 with devtoolset-7 (GCC 7).

Cheers,
Predrag

>
> On Wed, Oct 10, 2018 at 2:58 PM Predrag Punosevac wrote:
>
> > I just asked on the Springdale mailing lists if anybody has encountered this
> > before and what is the safest way to add GSL 2.5, which was released three
> > months ago.
> >
> > Predrag
> >
> > On Wed, Oct 10, 2018 at 1:44 PM Donghan Wang wrote:
> >
> >> Ifi,
> >>
> >> The following commands should compile the code on LOW1 with GSL 2.5 or
> >> newer.
> >>
> >> source /opt/rh/devtoolset-7/enable
> >> g++ -g -std=c++11 -Wall Kernels.cpp PointProcess.cpp HawkesProcess.cpp
> >> GeneralizedHawkesProcess.cpp EventSequence.cpp
> >> polyagammaSampler/PolyaGamma.cpp polyagammaSampler/RNG.cpp
> >> polyagammaSampler/GRNG.cpp stat_utils.cpp plot_utils.cpp fit_pp_models.cpp
> >> -lboost_iostreams -lboost_filesystem -lboost_system -lboost_serialization
> >> -lboost_timer -lpthread -lgsl -lcblas -llapack
> >> -I/usr/local/gsl/2.4/x86_64/include/
> >>
> >> Note the path specification needs to contain the trailing slash.
> >>
> >> The commands will fail due to the lack of the gsl_ran_wishart function.
> >> The function was added in GSL 2.5; however, GSL 2.4 is currently installed
> >> on LOW1. Predrag is aware of the issue and will look into it.
> >>
> >> Thanks,
> >> Jarod
> >>
> >> On Wed, Oct 10, 2018 at 1:13 PM Ifigeneia Apostolopoulou <
> >> iapostol at andrew.cmu.edu> wrote:
> >>
> >>> g++ -g -std=c++11 -Wall Kernels.cpp PointProcess.cpp HawkesProcess.cpp
> >>> GeneralizedHawkesProcess.cpp EventSequence.cpp polyagammaSampler/PolyaGamma.cpp
> >>> polyagammaSampler/RNG.cpp polyagammaSampler/GRNG.cpp stat_utils.cpp
> >>> plot_utils.cpp gen_synth_data.cpp -lboost_iostreams -lboost_filesystem
> >>> -lboost_system -lboost_serialization -lboost_timer -lpthread -lgsl
> >>> -lcblas -llapack -o gen
> >>>
> >>>
> >>> ./gen 2 1 dpp_example_1 2 0 500 1.0 5 1.0 5 2 1 0 1 0 1 2 1 0
> >>> 1 0 1
> >>>
> >>> On Wed, Oct 10, 2018 at 1:05 PM Ifigeneia Apostolopoulou <
> >>> iapostol at andrew.cmu.edu> wrote:
> >>>
> >>>> pplib.zip
> >>>>
> >>>>
> >>>> g++ -g -std=c++11 -Wall Kernels.cpp PointProcess.cpp HawkesProcess.cpp
> >>>> GeneralizedHawkesProcess.cpp EventSequence.cpp polyagammaSampler/PolyaGamma.cpp
> >>>> polyagammaSampler/RNG.cpp polyagammaSampler/GRNG.cpp stat_utils.cpp
> >>>> plot_utils.cpp fit_pp_models.cpp -lboost_iostreams -lboost_filesystem
> >>>> -lboost_system -lboost_serialization -lboost_timer -lpthread -lgsl
> >>>> -lcblas -llapack
> >>>>
> >>>

From predragp at andrew.cmu.edu Mon Oct 15 12:33:37 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 15 Oct 2018 12:33:37 -0400
Subject: GPU8 killed by python caching
Message-ID: <20181015163337.QfXPm_LMh%predragp@andrew.cmu.edu>

Dear Autonians,

Using Python caching is a good idea as long as we are careful where we put the cache.

root at gpu8$ df -h /tmp
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/sl_gpu8-root   50G   50G  5.9M 100% /

root at gpu8$ du -h -s /tmp
26G     /tmp

I had to reboot the server.
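One way to follow that advice in Python code is sketched below. It is a minimal illustration, not a lab-provided tool: the `SCRATCH_DIR` environment variable, the `/home/scratch` default, the `python_cache` subdirectory, and the 10 GiB free-space threshold are all hypothetical placeholders. The idea is to keep cache files off the root file system (which holds /tmp on gpu8) and to refuse to cache when the target file system is nearly full, using only the standard library.

```python
import os
import shutil
from pathlib import Path

# Hypothetical defaults: the real scratch location on each node may differ.
SCRATCH_ROOT = Path(os.environ.get("SCRATCH_DIR", "/home/scratch")) / os.environ.get("USER", "unknown")

# Assumed safety margin: refuse to cache when less than 10 GiB is free.
MIN_FREE_BYTES = 10 * 1024**3


def pick_cache_dir(root: Path, min_free: int = MIN_FREE_BYTES) -> Path:
    """Create a per-user cache directory under `root` and make sure the
    file system that holds it still has at least `min_free` bytes free."""
    cache_dir = root / "python_cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    free = shutil.disk_usage(cache_dir).free
    if free < min_free:
        raise RuntimeError(
            f"only {free} bytes free under {cache_dir}; clean up before caching more"
        )
    return cache_dir
```

To steer temporary files created through Python's `tempfile` module, one could assign `tempfile.tempdir = str(pick_cache_dir(SCRATCH_ROOT))` early in a script, or export `TMPDIR` before starting Python; whether a particular caching library honors either setting depends on that library.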
Predrag

From predragp at andrew.cmu.edu Tue Oct 16 13:22:05 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Tue, 16 Oct 2018 13:22:05 -0400
Subject: Fwd: Autonlab-sysinfo Digest, Vol 51, Issue 14
In-Reply-To:
Message-ID: <446f729f-16c0-4d22-9b73-84410958cac4@email.android.com>

An HTML attachment was scrubbed...

From predragp at andrew.cmu.edu Mon Oct 22 22:28:40 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Mon, 22 Oct 2018 22:28:40 -0400
Subject: GPU2 scratch directory cleaning
Message-ID: <20181023022840.pDZgVw73F%predragp@andrew.cmu.edu>

Dear Autonians,

Could you please clean your scratch directories on GPU2? The 2TB scratch directory is 100% full and people are unable to run their experiments.

Thank you.
Predrag

From awd at cs.cmu.edu Wed Oct 24 13:02:52 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Wed, 24 Oct 2018 13:02:52 -0400
Subject: Ben receives Uber Presidential Fellowship
Message-ID:

Dear Autonians,

Please join me in congratulating our distinguished colleague Benedikt Boecking on receiving the 2018 Uber Presidential Fellowship Award!

This great honor for Ben and our team comes with the added benefit of reducing the Lab's responsibility for the costs of Ben's stipend and tuition for one year. It should be noted that Ben follows in the (equally distinguished) footsteps of Nick Gisolfi, who won the Presidential Award in 2016. At that point, we set a lofty but not completely crazy goal of making sure that every Auton Lab graduate student receives the same award, so that we could perpetually stabilize our research budget. We are evidently still working on it, yet progress has just been made :)

Cheers to Ben!

Artur

From bapoczos at cs.cmu.edu Wed Oct 24 13:25:57 2018
From: bapoczos at cs.cmu.edu (Barnabas Poczos)
Date: Wed, 24 Oct 2018 13:25:57 -0400
Subject: Ben receives Uber Presidential Fellowship
In-Reply-To:
References:
Message-ID:

This is awesome! Congrats Ben!

Best,
Barnabas

======================
Barnabas Poczos, PhD
Associate Professor
Co-Director of PhD Program
Machine Learning Department
Carnegie Mellon University

From awd at cs.cmu.edu Tue Oct 30 07:20:45 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Tue, 30 Oct 2018 07:20:45 -0400
Subject: Maria's interview on youtube
Message-ID:

Dear Autonians,

Take a look at what Maria has to say about the use of Machine Learning in developing countries. She was interviewed by ZettaBytes in Switzerland during her recent European tour. As a bonus, you can hear how to pronounce Maria's name in French :)

ZettaBytes, affiliated with the School of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL), in their own words "aims at promoting and explaining big ideas of computer science to a general audience."

https://www.youtube.com/watch?v=tRgiaXFEtwI

Enjoy!
Artur

From predragp at andrew.cmu.edu Wed Oct 31 13:38:39 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 31 Oct 2018 13:38:39 -0400
Subject: GPU1 rebooted GPU3 scratch
Message-ID: <20181031173839.F_MhxVZKw%predragp@andrew.cmu.edu>

Dear Autonians,

Per user reports, I had to reboot GPU1. nvidia-smi works for me as a regular user, although it takes some time. I can also see the GPU cards through MATLAB. However, I might need to upgrade the driver in the future if people think it is too slow.

On an unrelated note, the scratch directory on GPU3 is full, making it less useful to people. Please clean your scratch before I do rm -rf and simply rebuild the space.

Best,
Predrag

From awd at cs.cmu.edu Wed Oct 31 16:16:29 2018
From: awd at cs.cmu.edu (Artur Dubrawski)
Date: Wed, 31 Oct 2018 16:16:29 -0400
Subject: Fwd: FW: Workshop on Systematic Reviews - next Tuesday (11/6)
In-Reply-To:
References: <3EC2F8EA-841C-4600-B04B-13DB6AD30257@cmu.edu>
Message-ID:

This could be useful.

Artur

---------- Forwarded message ---------
From: Dorothy Holland-Minkley
Date: Wed, Oct 31, 2018 at 3:06 PM
Subject: FW: Workshop on Systematic Reviews - next Tuesday (11/6)
To: ml-faculty at cs.cmu.edu, ml-students at cs.cmu.edu
Cc: Huajin Wang

*From: *Huajin Wang
*Date: *Wednesday, October 31, 2018 at 3:02 PM

Hi everyone,

We are hosting a workshop, "Finding Research Evidence for Decision-making: An Introduction to Systematic Reviews", next Tuesday. Anyone who would like to learn about comprehensive literature reviews, meta-analysis, or reproducible literature searches is encouraged to come!
For more details and to register: https://cmu.libcal.com/event/4500063

Best,
Huajin

--------
Huajin Wang, PhD
Biomedical Data Science Liaison
University Libraries
Carnegie Mellon University
huajinw at cmu.edu | 412.268.3172

From predragp at andrew.cmu.edu Wed Oct 31 20:36:05 2018
From: predragp at andrew.cmu.edu (Predrag Punosevac)
Date: Wed, 31 Oct 2018 20:36:05 -0400
Subject: Main File server issues
Message-ID: <20181101003605.S9_HZ1LQ2%predragp@andrew.cmu.edu>

Dear Autonians,

Our group has grown to 129 active accounts, which is one shy of the National Robotics Center. Unfortunately, some of the computing infrastructure practices we adopted over the past 25 years no longer scale well. One of those policies is unrestricted-size home directories that share the same ZFS data set on one of our NFS file servers.

Currently there are 96 historical accounts which share a single 36TB data set on a ZFS pool of the same size (zfsauton); 21 recently created accounts with home directories restricted to 250 GB per user, each one a separate ZFS data set on the 44TB ZFS pool (zfsauton2); and 12 accounts of Neill's group, which has its own file server with ample space on its own ZFS pool (zfsneill). Neill's accounts also share a single data set, but they are irrelevant for the purpose of this e-mail.

Besides enabling per-account storage restrictions, having each home directory as a separate data set allows more fine-grained ZFS snapshot-taking and retention policies. However, migrating old accounts to separate data sets (I have several 44TB ZFS pools available for those accounts) is a time-consuming, manual process (essentially, I have to rsync each old home directory to a new one).
All file servers in question have 10 Gigabit network cards, so it makes no difference which server hosts your home directory.

I have been dragging my feet on this, but I can't put it off any longer. The 36TB ZFS pool which hosts the /zfsauton/home data set is over 90% full, which seriously impacts NFS speed (in spite of the 10 Gigabit network card) and makes the pool very expensive to rebuild (ZFS resilvering after a dead HDD would take a month instead of a day). I have no choice but to do the following.

I am looking for 10-15 volunteers who don't mind removing old junk from their home directories and having some downtime while I rsync those directories to the new file server (ZFS pool). Once I migrate 10-15 accounts, I will stop snapshots for the remaining people and clear stale snapshots to free up space on the 36TB ZFS pool. This will be repeated multiple times until all accounts are separate ZFS data sets with a limit of 250 GB per account (additional storage space will be granted at the request of the sponsoring faculty member).

If there are no volunteers, I will compute the sizes of the home directories, and the 5 largest will be migrated to the new file server after being reduced to the proper size.

I appreciate your cooperation in this matter.

Sincerely,
Predrag Punosevac

P.S. We will also have to deploy the Slurm workload manager on all computing nodes no later than January 1st next year:

https://slurm.schedmd.com/

This will essentially put the Auton Lab on the same modus operandi as the Pittsburgh Supercomputing Center (apart from the Lustre file system, http://lustre.org/).
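The size ranking described above (finding the 5 largest home directories and checking them against the 250 GB limit) can be sketched in Python roughly as follows. This is only an illustration, not the procedure actually used on the file servers: the function names are hypothetical, sizes are apparent file sizes (on a compressed ZFS data set these can differ from what `du` or `zfs list` report), symbolic links are not followed, and "250 GB" is assumed to mean 250 x 10^9 bytes.

```python
import os
import stat
from pathlib import Path
from typing import List, Tuple

# 250 GB per account, interpreted here (an assumption) as decimal gigabytes.
QUOTA_BYTES = 250 * 10**9


def dir_size(path: Path) -> int:
    """Sum the apparent sizes of regular files under `path`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):  # followlinks=False by default
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file removed or unreadable during the scan
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_homes(root: Path, n: int = 5) -> List[Tuple[str, int]]:
    """Return the `n` largest subdirectories of `root` as (name, bytes), largest first."""
    sizes = [
        (entry.name, dir_size(entry))
        for entry in root.iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:n]


def over_quota(sizes: List[Tuple[str, int]], quota: int = QUOTA_BYTES) -> List[str]:
    """Names whose usage exceeds `quota` bytes."""
    return [name for name, size in sizes if size > quota]
```

Assuming the /zfsauton/home data set is mounted at that path, `over_quota(largest_homes(Path("/zfsauton/home")))` would list which of the five largest accounts exceed the limit. Walking 36TB of files this way is slow; on the server itself, per-data-set accounting from ZFS is the more natural source once each home directory is its own data set.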