From predragp at cs.cmu.edu Sun Oct 2 16:56:58 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Sun, 02 Oct 2016 16:56:58 -0400 Subject: MATLAB upgrade completed Message-ID: <20161002205658.2FVmVxnCA%predragp@cs.cmu.edu> Dear Autonians, MATLAB is now upgraded on all computing nodes. Predrag From predragp at cs.cmu.edu Mon Oct 3 07:31:00 2016 From: predragp at cs.cmu.edu (predragp at cs.cmu.edu) Date: Mon, 3 Oct 2016 11:31:00 +0000 (UTC) Subject: Fwd: POSTPONED: [Power Outage] SCS Wean Hall machine room - October 4th 2016 In-Reply-To: <4dff3aaa-a607-1682-0f69-b604579731c3@cs.cmu.edu> References: <0aee22ff-fd00-56d4-4563-f912cfe06184@cs.cmu.edu> <4dff3aaa-a607-1682-0f69-b604579731c3@cs.cmu.edu> Message-ID: <7B1F10ED8F1045F6.d4c53dad-27b1-4003-9697-cf9e62464310@mail.outlook.com> Get Outlook for Android ---------- Forwarded message ---------- From: "Edward Walter" Date: Mon, Oct 3, 2016 at 7:28 AM -0400 Subject: POSTPONED: [Power Outage] SCS Wean Hall machine room - October 4th 2016 To: "Edward J Walter" The planned power outage for the SCS Wean Hall machine room has been postponed and will NOT take place on October 4th. We are coordinating with the electrical contractors to get the work re-scheduled. We will let you know as soon as we have a new date for this power outage. Thank you. SCS Help Desk On 09/12/2016 08:01 AM, Edward Walter wrote: > SCS Computing Facilities and FMS are planning a partial power outage in > the SCS Wean Hall machine room. We expect this work to begin on Oct > 4th, 2016 and to take less than 24 hours. The outage may run into 48 > hours in the event that the electrical contractor encounters something > unexpected. 
> > The following servers or computational clusters will be affected by this > power outage: > > Affected clusters: > ACTR.HPC1.CS.CMU.EDU > AUTON > COMA.HPC1.CS.CMU.EDU > CORTEX.ML.CMU.EDU > LATEDAYS.ANDREW.CMU.EDU > PSYCH-O.HPC1.CS.CMU.EDU > ROCKS.IS.CS.CMU.EDU > WORKHORSE.LTI.CS.CMU.EDU > YODA.GRAPHICS.CS.CMU.EDU > > > Affected servers: > OMEPSLID.COMPBIO > SLIF.COMPBIO > PACIFIC.DB > GPUSERVER.PERCEPTION > GPUSERVER2.PERCEPTION > GPUSERVER3.PERCEPTION > GPUSERVER5.PERCEPTION > GPUSERVER6.PERCEPTION > GPUSERVER7.PERCEPTION > DENVER.LTI > LOR.LTI > MIAMI.LTI > SASKIA.ML > MARTEN.ML > JAN.ML > ARNOUT.ML > LYSBET.ML > FLORIS.ML > > > Please contact the SCS Help Desk at x8-4231 or send mail to > help at cs.cmu.edu with any questions or concerns regarding this > maintenance period. > > Thank you for your attention, > > SCS Help Desk -------------- next part -------------- An HTML attachment was scrubbed... URL: From jieshic at andrew.cmu.edu Mon Oct 3 11:30:18 2016 From: jieshic at andrew.cmu.edu (jieshic at andrew.cmu.edu) Date: Mon, 3 Oct 2016 11:30:18 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> Message-ID: <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> Hi Everyone, Thanks for attending the picnic last Saturday. Attached is the picture for the cake with winning slogan by Kyle. Since we received a lot of questions about the food vendors, here is the information.BBQ food from Double Wide Grill (2339 E Carson St, Pittsburgh, PA 15203).Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, PA 15217). Thank you again for your support. Best, Auton Lab Entertainment Committee > > Hi Everyone, > Tomorrow's picnic will start at 11:30am. The shelter > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if > you'll drink :P ). 
> We have prepared some long games and you are > also welcome to bring your bikes, games, cameras, etc.. > BTW, here's > the weather forecast for your reference. "Showers in the morning, > then partly cloudy in the afternoon. Thunder possible. High 73F. Winds SSE > at 5 to 10 mph. Chance of rain 50%." (source: www.weather.com) > > Pls feel free to let me know if you have any question. > > Looking forward to seeing you tomorrow! > > > > Cheers, > Jessie >> Dear Autonians, >> >> We will be celebrating the 23rd > anniversary of the Lab this year at a >> nearby location. >> > We have reserved Vietnam Veterans Pavilion at the Schenley Park: >> > >> > https://www.google.com/maps/place/Vietnam+Veterans+Pavilion/@40.434036,-79.9453924,17z/data=!4m12!1m6!3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSchenley+Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0:0x4bcca3ccfd92c919!8m2!3d40.4338281!4d-79.9439894 >> >> We have it booked from 11am till 9pm, we will have lunch > food, and the >> Auton Lab Entertainment >> Committee led by > our CEO (Chief Entertainment Officer) Jessie is working >> on the - > you've guessed it - program >> of entertainment and activities. >> >> Would you please go to this google doc form to rsvp and > provide >> information useful for planning: >> >> > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVSFuUzTT4mwTPNBWr1kIH7z0H69Q/viewform?usp=send_form >> >> Quick hint for the new members of our team: each year we > run a contest >> for the most fitting slogan to put >> on > the Auton Lab birthday cake. The bids are judged by a few wise men >> and the author of the winning slogan >> feels the pride of > seeing it on the cake and basks in glory forever. >> >> > Cheers! >> Artur >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: 20161001_141039.jpeg Type: image/jpeg Size: 160849 bytes Desc: not available URL: From krw at andrew.cmu.edu Mon Oct 3 11:36:09 2016 From: krw at andrew.cmu.edu (Karen Widmaier) Date: Mon, 3 Oct 2016 11:36:09 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> Message-ID: <023e01d21d8b$dd6aa860$983ff920$@andrew.cmu.edu> Hi Jieshi, Thank you for all your hard work pulling it all together. A special thanks to you and Maria and Ben. Karen From: Autonlab-users [mailto:autonlab-users-bounces at autonlab.org] On Behalf Of jieshic at andrew.cmu.edu Sent: Monday, October 03, 2016 11:30 AM To: users at autonlab.org Cc: Chirag Nagpal; Luna Yang Subject: Re: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park Hi Everyone, Thanks for attending the picnic last Saturday. Attached is the picture for the cake with winning slogan by Kyle. Since we received a lot of questions about the food vendors, here is the information. * BBQ food from Double Wide Grill (2339 E Carson St, Pittsburgh, PA 15203). * Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, PA 15217). Thank you again for your support. Best, Auton Lab Entertainment Committee > > Hi Everyone, > Tomorrow's picnic will start at 11:30am. The shelter > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if > you'll drink :P ). > We have prepared some long games and you are > also welcome to bring your bikes, games, cameras, etc.. > BTW, here's > the weather forecast for your reference. "Showers in the morning, > then partly cloudy in the afternoon. Thunder possible. High 73F. Winds SSE > at 5 to 10 mph. Chance of rain 50%." 
(source: www.weather.com) > > Pls feel free to let me know if you have any question. > > Looking forward to seeing you tomorrow! > > > > Cheers, > Jessie >> Dear Autonians, >> >> We will be celebrating the 23rd > anniversary of the Lab this year at a >> nearby location. >> > We have reserved Vietnam Veterans Pavilion at the Schenley Park: >> > >> > https://www.google.com/maps/place/Vietnam+Veterans+Pavilion/@40.434036,-79.9 453924,17z/data=!4m12!1m6!3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSche nley+Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0:0x4bcca3ccfd92c919!8m2!3d 40.4338281!4d-79.9439894 >> >> We have it booked from 11am till 9pm, we will have lunch > food, and the >> Auton Lab Entertainment >> Committee led by > our CEO (Chief Entertainment Officer) Jessie is working >> on the - > you've guessed it - program >> of entertainment and activities. >> >> Would you please go to this google doc form to rsvp and > provide >> information useful for planning: >> >> > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVSFuUzTT4mwTPN BWr1kIH7z0H69Q/viewform?usp=send_form >> >> Quick hint for the new members of our team: each year we > run a contest >> for the most fitting slogan to put >> on > the Auton Lab birthday cake. The bids are judged by a few wise men >> and the author of the winning slogan >> feels the pride of > seeing it on the cake and basks in glory forever. >> >> > Cheers! >> Artur >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbarnes1 at andrew.cmu.edu Mon Oct 3 11:47:50 2016 From: mbarnes1 at andrew.cmu.edu (Matt Barnes) Date: Mon, 03 Oct 2016 15:47:50 +0000 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: <023e01d21d8b$dd6aa860$983ff920$@andrew.cmu.edu> References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> <023e01d21d8b$dd6aa860$983ff920$@andrew.cmu.edu> Message-ID: Seconded. 
Thanks for organizing -- I've never eaten so many ribs. On Mon, Oct 3, 2016 at 11:37 AM Karen Widmaier wrote: > Hi Jieshi, > > Thank you for all your hard work pulling it all together. > > > > A special thanks to you and Maria and Ben. > > > > Karen > > > > > > *From:* Autonlab-users [mailto:autonlab-users-bounces at autonlab.org] *On > Behalf Of *jieshic at andrew.cmu.edu > *Sent:* Monday, October 03, 2016 11:30 AM > *To:* users at autonlab.org > *Cc:* Chirag Nagpal; Luna Yang > *Subject:* Re: Annual Auton Lab Picnic: Saturday October 1st at Schenley > Park > > > > Hi Everyone, > > Thanks for attending the picnic last Saturday. Attached is the picture for > the cake with winning slogan by Kyle. > > Since we received a lot of questions about the food vendors, here is the > information. > > - BBQ food from Double Wide Grill (2339 E Carson St, Pittsburgh, PA > 15203). > - Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, PA > 15217). > > Thank you again for your support. > > Best, > Auton Lab Entertainment Committee > > > > > Hi Everyone, > > Tomorrow's picnic will start at 11:30am. The shelter > > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with > > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if > > you'll drink :P ). > > We have prepared some long games and you are > > also welcome to bring your bikes, games, cameras, etc.. > > BTW, here's > > the weather forecast for your reference. "Showers in the morning, > > then partly cloudy in the afternoon. Thunder possible. High 73F. Winds > SSE > > at 5 to 10 mph. Chance of rain 50%." (source: www.weather.com) > > > > Pls feel free to let me know if you have any question. > > > > Looking forward to seeing you tomorrow! > > > > > > > > Cheers, > > Jessie > >> Dear Autonians, > >> > >> We will be celebrating the 23rd > > anniversary of the Lab this year at a > >> nearby location. 
> >> > > We have reserved Vietnam Veterans Pavilion at the Schenley Park: > >> > > > >> > > > https://www.google.com/maps/place/Vietnam+Veterans+Pavilion/@40.434036,-79.9453924,17z/data=!4m12!1m6!3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSchenley+Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0:0x4bcca3ccfd92c919!8m2!3d40.4338281!4d-79.9439894 > >> > >> We have it booked from 11am till 9pm, we will have lunch > > food, and the > >> Auton Lab Entertainment > >> Committee led by > > our CEO (Chief Entertainment Officer) Jessie is working > >> on the - > > you've guessed it - program > >> of entertainment and activities. > >> > >> Would you please go to this google doc form to rsvp and > > provide > >> information useful for planning: > >> > >> > > > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVSFuUzTT4mwTPNBWr1kIH7z0H69Q/viewform?usp=send_form > >> > >> Quick hint for the new members of our team: each year we > > run a contest > >> for the most fitting slogan to put > >> on > > the Auton Lab birthday cake. The bids are judged by a few wise men > >> and the author of the winning slogan > >> feels the pride of > > seeing it on the cake and basks in glory forever. > >> > >> > > Cheers! > >> Artur > >> > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From awd at cs.cmu.edu Mon Oct 3 11:54:51 2016 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 3 Oct 2016 11:54:51 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> Message-ID: <54c94770-e719-ac60-aef3-26ad110e7be8@cs.cmu.edu> Jessie and the Entertainment Committee, Thank you so much for organizing a meticulous and very enjoyable event! 
This might have been the most attended Lab Picnic in Auton history, so feeding and entertaining everyone was not a small feat. Way to go! Thanks Artur On 10/3/2016 11:30 AM, jieshic at andrew.cmu.edu wrote: > > Hi Everyone, > > Thanks for attending the picnic last Saturday. Attached is the picture > for the cake with winning slogan by Kyle. > > Since we received a lot of questions about the food vendors, here is > the information. > > * BBQ food from Double Wide Grill(2339 E Carson St, Pittsburgh, PA > 15203). > * Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, > PA 15217). > > Thank you again for your support. > > Best, > Auton Lab Entertainment Committee > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Mon Oct 3 11:56:27 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Mon, 3 Oct 2016 11:56:27 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> <023e01d21d8b$dd6aa860$983ff920$@andrew.cmu.edu> Message-ID: Thanks for organising this Jieshi, Ben, Maria and everyone else who chipped in. This was one of the best picnics we've had! On Mon, Oct 3, 2016 at 11:47 AM, Matt Barnes wrote: > Seconded. Thanks for organizing -- I've never eaten so many ribs. > > On Mon, Oct 3, 2016 at 11:37 AM Karen Widmaier wrote: > >> Hi Jieshi, >> >> Thank you for all your hard work pulling it all together. >> >> >> >> A special thanks to you and Maria and Ben. 
>> >> >> >> Karen >> >> >> >> >> >> *From:* Autonlab-users [mailto:autonlab-users-bounces at autonlab.org] *On >> Behalf Of *jieshic at andrew.cmu.edu >> *Sent:* Monday, October 03, 2016 11:30 AM >> *To:* users at autonlab.org >> *Cc:* Chirag Nagpal; Luna Yang >> *Subject:* Re: Annual Auton Lab Picnic: Saturday October 1st at Schenley >> Park >> >> >> >> Hi Everyone, >> >> Thanks for attending the picnic last Saturday. Attached is the picture >> for the cake with winning slogan by Kyle. >> >> Since we received a lot of questions about the food vendors, here is the >> information. >> >> - BBQ food from Double Wide Grill (2339 E Carson St, Pittsburgh, PA >> 15203). >> - Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, >> PA 15217). >> >> Thank you again for your support. >> >> Best, >> Auton Lab Entertainment Committee >> >> > >> > Hi Everyone, >> > Tomorrow's picnic will start at 11:30am. The shelter >> > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with >> > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if >> > you'll drink :P ). >> > We have prepared some long games and you are >> > also welcome to bring your bikes, games, cameras, etc.. >> > BTW, here's >> > the weather forecast for your reference. "Showers in the morning, >> > then partly cloudy in the afternoon. Thunder possible. High 73F. Winds >> SSE >> > at 5 to 10 mph. Chance of rain 50%." (source: www.weather.com) >> > >> > Pls feel free to let me know if you have any question. >> > >> > Looking forward to seeing you tomorrow! >> > >> > >> > >> > Cheers, >> > Jessie >> >> Dear Autonians, >> >> >> >> We will be celebrating the 23rd >> > anniversary of the Lab this year at a >> >> nearby location. >> >> >> > We have reserved Vietnam Veterans Pavilion at the Schenley Park: >> >> >> > >> >> >> > https://www.google.com/maps/place/Vietnam+Veterans+ >> Pavilion/@40.434036,-79.9453924,17z/data=!4m12!1m6! 
>> 3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSchenley+ >> Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0: >> 0x4bcca3ccfd92c919!8m2!3d40.4338281!4d-79.9439894 >> >> >> >> We have it booked from 11am till 9pm, we will have lunch >> > food, and the >> >> Auton Lab Entertainment >> >> Committee led by >> > our CEO (Chief Entertainment Officer) Jessie is working >> >> on the - >> > you've guessed it - program >> >> of entertainment and activities. >> >> >> >> Would you please go to this google doc form to rsvp and >> > provide >> >> information useful for planning: >> >> >> >> >> > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVS >> FuUzTT4mwTPNBWr1kIH7z0H69Q/viewform?usp=send_form >> >> >> >> Quick hint for the new members of our team: each year we >> > run a contest >> >> for the most fitting slogan to put >> >> on >> > the Auton Lab birthday cake. The bids are judged by a few wise men >> >> and the author of the winning slogan >> >> feels the pride of >> > seeing it on the cake and basks in glory forever. >> >> >> >> >> > Cheers! >> >> Artur >> >> >> > >> > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sray at cs.cmu.edu Mon Oct 3 14:57:17 2016 From: sray at cs.cmu.edu (Saswati Ray) Date: Mon, 3 Oct 2016 14:57:17 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> Message-ID: <0447e89e-8649-e950-59a6-4c4951c99667@cs.cmu.edu> Wonderful picnic. Very delicious cake! Thank you all, Saswati On 10/03/2016 11:30 AM, jieshic at andrew.cmu.edu wrote: > > Hi Everyone, > > Thanks for attending the picnic last Saturday. Attached is the picture > for the cake with winning slogan by Kyle. > > Since we received a lot of questions about the food vendors, here is > the information. 
> > * BBQ food from Double Wide Grill(2339 E Carson St, Pittsburgh, PA > 15203). > * Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, > PA 15217). > > Thank you again for your support. > > Best, > Auton Lab Entertainment Committee > > > > > Hi Everyone, > > Tomorrow's picnic will start at 11:30am. The shelter > > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with > > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if > > you'll drink :P ). > > We have prepared some long games and you are > > also welcome to bring your bikes, games, cameras, etc.. > > BTW, here's > > the weather forecast for your reference. "Showers in the morning, > > then partly cloudy in the afternoon. Thunder possible. High 73F. > Winds SSE > > at 5 to 10 mph. Chance of rain 50%." (source: www.weather.com) > > > > Pls feel free to let me know if you have any question. > > > > Looking forward to seeing you tomorrow! > > > > > > > > Cheers, > > Jessie > >> Dear Autonians, > >> > >> We will be celebrating the 23rd > > anniversary of the Lab this year at a > >> nearby location. > >> > > We have reserved Vietnam Veterans Pavilion at the Schenley Park: > >> > > > >> > > > https://www.google.com/maps/place/Vietnam+Veterans+Pavilion/@40.434036,-79.9453924,17z/data=!4m12!1m6!3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSchenley+Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0:0x4bcca3ccfd92c919!8m2!3d40.4338281!4d-79.9439894 > >> > >> We have it booked from 11am till 9pm, we will have lunch > > food, and the > >> Auton Lab Entertainment > >> Committee led by > > our CEO (Chief Entertainment Officer) Jessie is working > >> on the - > > you've guessed it - program > >> of entertainment and activities. 
> >> > >> Would you please go to this google doc form to rsvp and > > provide > >> information useful for planning: > >> > >> > > > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVSFuUzTT4mwTPNBWr1kIH7z0H69Q/viewform?usp=send_form > >> > >> Quick hint for the new members of our team: each year we > > run a contest > >> for the most fitting slogan to put > >> on > > the Auton Lab birthday cake. The bids are judged by a few wise men > >> and the author of the winning slogan > >> feels the pride of > > seeing it on the cake and basks in glory forever. > >> > >> > > Cheers! > >> Artur > >> > > > > > > > -- Saswati Ray Senior Research Programmer Carnegie Mellon University - Auton Lab Newell-Simon Hall Room 3115, Pittsburgh PA 15213 Phone: 412-268-1238 -------------- next part -------------- An HTML attachment was scrubbed... URL: From junieroliva at gmail.com Mon Oct 3 14:59:47 2016 From: junieroliva at gmail.com (Junier Oliva) Date: Mon, 3 Oct 2016 14:59:47 -0400 Subject: Annual Auton Lab Picnic: Saturday October 1st at Schenley Park In-Reply-To: <0447e89e-8649-e950-59a6-4c4951c99667@cs.cmu.edu> References: <07b2034c-153b-d4e3-5e46-8e1b65f8a2ac@cs.cmu.edu> <942587fbd979ccea27143628ce8ee5a3.squirrel@webmail.andrew.cmu.edu> <0447e89e-8649-e950-59a6-4c4951c99667@cs.cmu.edu> Message-ID: Everything was great (especially those ribs!) :D Thanks! Junier On Mon, Oct 3, 2016 at 2:57 PM, Saswati Ray wrote: > Wonderful picnic. > > Very delicious cake! > > Thank you all, > Saswati > > > On 10/03/2016 11:30 AM, jieshic at andrew.cmu.edu wrote: > > Hi Everyone, > > Thanks for attending the picnic last Saturday. Attached is the picture for > the cake with winning slogan by Kyle. > > Since we received a lot of questions about the food vendors, here is the > information. > > - BBQ food from Double Wide Grill (2339 E Carson St, Pittsburgh, PA > 15203). > - Taro cake from Pink Box Bakery Cafe (2104 Murray Ave, Pittsburgh, PA > 15217). > > Thank you again for your support. 
> > Best, > Auton Lab Entertainment Committee > > > > > Hi Everyone, > > Tomorrow's picnic will start at 11:30am. The shelter > > is Vietnam Veterans Pavilion at the Schenley Park. We'll have lunch with > > BBQ food, a large Taro cake, and beers & wines (pls bring your ID if > > you'll drink :P ). > > We have prepared some long games and you are > > also welcome to bring your bikes, games, cameras, etc.. > > BTW, here's > > the weather forecast for your reference. "Showers in the morning, > > then partly cloudy in the afternoon. Thunder possible. High 73F. Winds > SSE > > at 5 to 10 mph. Chance of rain 50%." (source: www.weather.com) > > > > Pls feel free to let me know if you have any question. > > > > Looking forward to seeing you tomorrow! > > > > > > > > Cheers, > > Jessie > >> Dear Autonians, > >> > >> We will be celebrating the 23rd > > anniversary of the Lab this year at a > >> nearby location. > >> > > We have reserved Vietnam Veterans Pavilion at the Schenley Park: > >> > > > >> > > https://www.google.com/maps/place/Vietnam+Veterans+ > Pavilion/@40.434036,-79.9453924,17z/data=!4m12!1m6! > 3m5!1s0x8834f18b186c4403:0xd24a0faef8f7e126!2sSchenley+ > Park!8m2!3d40.4318311!4d-79.9462078!3m4!1s0x0:0x4bcca3ccfd92c919!8m2!3d40. > 4338281!4d-79.9439894 > >> > >> We have it booked from 11am till 9pm, we will have lunch > > food, and the > >> Auton Lab Entertainment > >> Committee led by > > our CEO (Chief Entertainment Officer) Jessie is working > >> on the - > > you've guessed it - program > >> of entertainment and activities. > >> > >> Would you please go to this google doc form to rsvp and > > provide > >> information useful for planning: > >> > >> > > https://docs.google.com/forms/d/e/1FAIpQLSdf75XSHzcACDi9eiiUWjVS > FuUzTT4mwTPNBWr1kIH7z0H69Q/viewform?usp=send_form > >> > >> Quick hint for the new members of our team: each year we > > run a contest > >> for the most fitting slogan to put > >> on > > the Auton Lab birthday cake. 
The bids are judged by a few wise men > >> and the author of the winning slogan > >> feels the pride of > > seeing it on the cake and basks in glory forever. > >> > >> > > Cheers! > >> Artur > >> > > > > > > > > > > -- > Saswati Ray > Senior Research Programmer > Carnegie Mellon University - Auton Lab > Newell-Simon Hall Room 3115, Pittsburgh PA 15213 > Phone: 412-268-1238 > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From db78349 at gmail.com Mon Oct 3 17:33:33 2016 From: db78349 at gmail.com (dave booni) Date: Mon, 3 Oct 2016 17:33:33 -0400 Subject: Request for Submission of Website Content Message-ID: Hello Fellow Autonians- First, so that you don't worry, this is David Ba. , new guy at the lab, room 3119 NSH - if you have any questions about the authenticity of this email, feel free to pay me a visit. I am partially responsible for assembling the latest iteration of the Auton Lab website. In an effort to fill the website with up-to-date, relevant content, we are asking the members of the lab to provide materials relating to their work which they have the ability to publicly share. For instance, if you submitted any papers which are in the public domain, a link to said paper would be appreciated. If you presented at a conference, we would like to know - if there is a video of your presentation, a pointer (link) to said presentation would be all the better. Any additional content that you would like to share, including appropriate content from your personal website, is also welcome. Thank you all for your cooperation. We look forward to scrap-booking the souvenirs of your excellent output. Feel free to email me with any questions. -Sincerely David B. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dbayani at andrew.cmu.edu Wed Oct 5 09:02:25 2016 From: dbayani at andrew.cmu.edu (David Bayani) Date: Wed, 5 Oct 2016 06:02:25 -0700 Subject: Request for Submission of Website Content (Update) Message-ID: Hello Fellow Autonians- First, so that you don't worry, this is David Ba. , new guy at the lab, room 3119 NSH - if you have any questions about the authenticity of this email, feel free to pay me a visit. I am partially responsible for assembling the latest iteration of the Auton Lab website. In an effort to fill the website with up-to-date, relevant content, we are asking the members of the lab to provide materials relating to their work which they have the ability to publicly share. For instance, if you submitted any papers which are in the public domain, a link to said paper would be appreciated. If you presented at a conference, we would like to know - if there is a video of your presentation, a pointer (link) to said presentation would be all the better. Any additional content that you would like to share, including appropriate content from your personal website, is also welcome. We ask that some material be provided within the next two weeks. Naturally, we will accept submissions later than this, but some representative content from each lab member would be appreciated within this time frame. This email was previously sent from my other account, db78349 at gmail.com, which was quite possibly caught in the spam filter of some lab members. As an addendum to the previous email, we ask that you respond indicating that this message was received. We understand that gathering the requested materials may take some time, which is a rationed resource given the busy schedules of our lab members; with that in mind, we need to distinguish between cases of "received message, but gathering materials" and "failed to receive message". Please direct responses to db78349 at gmail.com. 
While the methods of constructing a website are largely solved research questions, understand that autonlab.org is an important component of the Auton Lab's public face. It provides an overview of the laboratory's focus and achievements to the general public, interested students, and potential funding organizations. A lack of current and sufficient content on the website would fail to reflect the otherwise exceptional work produced by the members of this laboratory. Thank you all for your cooperation. We look forward to scrap-booking the souvenirs of your excellent output. Feel free to email me with any questions. -Sincerely David B. -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at cs.cmu.edu Thu Oct 6 15:00:43 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 06 Oct 2016 15:00:43 -0400 Subject: Auton Lab Intranet is now functional! Message-ID: <20161006190043.Fpozmq95S%predragp@cs.cmu.edu> Dear Autonians, You should have received, or will shortly receive, the initial password for the new Auton Lab Intranet. Make sure you check your spam mailbox before reporting a problem. You can use that password to log into the new Intranet: http://www.autonlab.org/start?do=login&sectok=f4aa30200a856d99d7410faad4a007db Feel free to change the password to whatever you fancy upon first login. Our new webpage is DokuWiki based, and anybody with a password will be able to edit internal content. Only admins can edit external content or create new users. The Login tab is located under the Tools tab in the far upper right corner. We are working on putting it in a more prominent place. We (Simon, David, and I) are working on resurrecting old content, but any help will be appreciated.
Predrag From predragp at cs.cmu.edu Fri Oct 7 20:52:00 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Fri, 07 Oct 2016 20:52:00 -0400 Subject: GPU2 status update Message-ID: <20161008005200.0cb1MfHlv%predragp@cs.cmu.edu> Dear Autonians, I got out of the machine room 45 minutes ago, after spending an hour trying to boot GPU3 off the USB drive. For some reason it didn't work. No big deal, as the machine has a DVD drive, which is much easier to boot from. However, it will have to wait until Monday. I am sorry for that. Predrag From awd at cs.cmu.edu Mon Oct 10 08:35:02 2016 From: awd at cs.cmu.edu (Artur Dubrawski) Date: Mon, 10 Oct 2016 08:35:02 -0400 Subject: "The closest thing to getting to go to Hogwarts" In-Reply-To: <6d8ebc95-4f4c-21f7-ce99-ddf109af3d55@cs.cmu.edu> References: <6d8ebc95-4f4c-21f7-ce99-ddf109af3d55@cs.cmu.edu> Message-ID: <2b8a82d9-4d08-4eba-27c9-f33fbbb32ed3@cs.cmu.edu> A must see :) http://www.cbsnews.com/news/60-minutes-artificial-intelligence-real-life-applications/ From predragp at cs.cmu.edu Mon Oct 10 10:27:26 2016 From: predragp at cs.cmu.edu (predragp at cs.cmu.edu) Date: Mon, 10 Oct 2016 14:27:26 +0000 (UTC) Subject: Fwd: RESCHEDULED - October 25th 2016: [Power Outage] SCS Wean Hall machine room In-Reply-To: References: <0aee22ff-fd00-56d4-4563-f912cfe06184@cs.cmu.edu> <4dff3aaa-a607-1682-0f69-b604579731c3@cs.cmu.edu> Message-ID: <7B1F10ED8F1045F6.cc1488a9-74dc-4d22-92e2-c3ffc206a072@mail.outlook.com> Get Outlook for Android ---------- Forwarded message ---------- From: "Edward Walter" Date: Mon, Oct 10, 2016 at 9:27 AM -0400 Subject: RESCHEDULED - October 25th 2016: [Power Outage] SCS Wean Hall machine room To: "Edward J Walter" The partial power outage for the SCS Wean Hall machine room has been rescheduled for October 25th. We expect this work to take less than 24 hours. The outage may run into 48 hours if the electrical contractor encounters problems related to the planned maintenance tasks.
Please contact the SCS Help Desk at x8-4231 or send mail to help at cs.cmu.edu with any questions or concerns regarding this maintenance period. Thank you for your attention, SCS Help Desk On 10/03/2016 07:28 AM, Edward Walter wrote: > The planned power outage for the SCS Wean Hall machine room has been > postponed and will NOT take place on October 4th. We are coordinating > with the electrical contractors to get the work re-scheduled. We will > let you know as soon as we have a new date for this power outage. > > Thank you. > > SCS Help Desk > > On 09/12/2016 08:01 AM, Edward Walter wrote: >> SCS Computing Facilities and FMS are planning a partial power outage in >> the SCS Wean Hall machine room. We expect this work to begin on Oct >> 4th, 2016 and to take less than 24 hours. The outage may run into 48 >> hours in the event that the electrical contractor encounters something >> unexpected. >> >> The following servers or computational clusters will be affected by this >> power outage: >> >> Affected clusters: >> ACTR.HPC1.CS.CMU.EDU >> AUTON >> COMA.HPC1.CS.CMU.EDU >> CORTEX.ML.CMU.EDU >> LATEDAYS.ANDREW.CMU.EDU >> PSYCH-O.HPC1.CS.CMU.EDU >> ROCKS.IS.CS.CMU.EDU >> WORKHORSE.LTI.CS.CMU.EDU >> YODA.GRAPHICS.CS.CMU.EDU >> >> >> Affected servers: >> OMEPSLID.COMPBIO >> SLIF.COMPBIO >> PACIFIC.DB >> GPUSERVER.PERCEPTION >> GPUSERVER2.PERCEPTION >> GPUSERVER3.PERCEPTION >> GPUSERVER5.PERCEPTION >> GPUSERVER6.PERCEPTION >> GPUSERVER7.PERCEPTION >> DENVER.LTI >> LOR.LTI >> MIAMI.LTI >> SASKIA.ML >> MARTEN.ML >> JAN.ML >> ARNOUT.ML >> LYSBET.ML >> FLORIS.ML >> >> >> Please contact the SCS Help Desk at x8-4231 or send mail to >> help at cs.cmu.edu with any questions or concerns regarding this >> maintenance period. >> >> Thank you for your attention, >> >> SCS Help Desk -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From predragp at cs.cmu.edu Wed Oct 12 18:26:58 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Wed, 12 Oct 2016 18:26:58 -0400 Subject: GPU3 is "configured" Message-ID: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> Dear Autonians, GPU3 is "configured". Namely, you can log into it and all packages are installed. However, I couldn't get the NVIDIA-provided CUDA driver to recognize the GPU cards. They appear to be properly installed from the hardware point of view, and you can list them with lshw -class display root at gpu3$ lshw -class display *-display UNCLAIMED description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci at 0000:02:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller cap_list configuration: latency=0 resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cf000000-cfffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff ioport:6000(size=128) memory:d0000000-d007ffff *-display UNCLAIMED description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci at 0000:03:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller cap_list configuration: latency=0 resources: iomemory:383f0-383ef iomemory:383f0-383ef memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff memory:383fd0000000-383fd1ffffff ioport:5000(size=128) memory:ce000000-ce07ffff *-display description: VGA compatible controller product: ASPEED Graphics Family vendor: ASPEED Technology, Inc.
physical id: 0 bus info: pci at 0000:06:00.0 version: 30 width: 32 bits clock: 33MHz capabilities: pm msi vga_controller bus_master cap_list rom configuration: driver=ast latency=0 resources: irq:19 memory:cb000000-cbffffff memory:cc000000-cc01ffff ioport:4000(size=128) *-display UNCLAIMED description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci at 0000:82:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller cap_list configuration: latency=0 resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:fa000000-faffffff memory:387fe0000000-387fefffffff memory:387ff0000000-387ff1ffffff ioport:e000(size=128) memory:fb000000-fb07ffff *-display UNCLAIMED description: VGA compatible controller product: NVIDIA Corporation vendor: NVIDIA Corporation physical id: 0 bus info: pci at 0000:83:00.0 version: a1 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress vga_controller cap_list configuration: latency=0 resources: iomemory:387f0-387ef iomemory:387f0-387ef memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff memory:387fd0000000-387fd1ffffff ioport:d000(size=128) memory:f9000000-f907ffff However, what scares the hell out of me is that I don't see the NVIDIA driver loaded (lsmod | grep nvidia) and the device nodes /dev/nvidia* are not created. I am guessing I just missed some trivial step during the CUDA installation, which is quite involved. I am unfortunately too tired to debug this tonight. Predrag From predragp at cs.cmu.edu Wed Oct 12 22:23:32 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Wed, 12 Oct 2016 22:23:32 -0400 Subject: GPU3 is "configured" In-Reply-To: References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> Message-ID: <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> Arne Suppe wrote: > Hi Predrag, > Don't know if this applies to you, but I just built a machine with a GTX1080 which has the same PASCAL architecture as the Titan.
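The symptoms described above (controllers marked UNCLAIMED in lshw, no nvidia module in lsmod, no /dev/nvidia* device nodes) all point the same way: the kernel driver never bound to the cards. As an illustrative aside, the first symptom can be checked mechanically from captured lshw output. This is only a sketch; the sample text is abridged from the listing above, with the "@" that the archive renders as " at " restored:

```python
import re

def unclaimed_displays(lshw_text):
    """Return PCI bus IDs of display controllers that lshw reports as
    UNCLAIMED, i.e. devices that no kernel driver has bound to."""
    results = []
    for chunk in re.split(r"\*-display", lshw_text)[1:]:
        if chunk.lstrip().startswith("UNCLAIMED"):
            m = re.search(r"bus info:\s*(pci@\S+)", chunk)
            results.append(m.group(1) if m else "?")
    return results

sample = """
  *-display UNCLAIMED
       description: VGA compatible controller
       product: NVIDIA Corporation
       bus info: pci@0000:02:00.0
  *-display
       description: VGA compatible controller
       product: ASPEED Graphics Family
       bus info: pci@0000:06:00.0
       configuration: driver=ast
"""
print(unclaimed_displays(sample))  # → ['pci@0000:02:00.0']
```

Note that the onboard ASPEED VGA is claimed (configuration shows driver=ast), so only the NVIDIA devices are flagged, matching the situation in the message.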
After installing CUDA 8, I still found I needed to install the latest driver off of the NVIDIA web site to get the card recognized. Right now, I am running 367.44. > > Arne Arne, Thank you so much for this e-mail. Yes, it is the damn PASCAL architecture; I see lots of people complaining about it on the forums. I downloaded and installed the driver from http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce That seems to have made a real difference. Check out these beautiful outputs: root at gpu3$ ls nvidia* nvidia0 nvidia1 nvidia2 nvidia3 nvidiactl nvidia-uvm nvidia-uvm-tools root at gpu3$ lspci | grep -i nvidia 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1) 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1) 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1) 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1) 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) root at gpu3$ ls /proc/driver nvidia nvidia-uvm nvram rtc root at gpu3$ lsmod |grep nvidia nvidia_uvm 738901 0 nvidia_drm 43405 0 nvidia_modeset 764432 1 nvidia_drm nvidia 11492947 2 nvidia_modeset,nvidia_uvm drm_kms_helper 125056 2 ast,nvidia_drm drm 349210 5 ast,ttm,drm_kms_helper,nvidia_drm i2c_core 40582 7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia root at gpu3$ nvidia-smi Wed Oct 12 22:03:27 2016 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 367.57 Driver Version: 367.57 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr.
ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 TITAN X (Pascal) Off | 0000:02:00.0 Off | N/A | | 23% 32C P0 56W / 250W | 0MiB / 12189MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 1 TITAN X (Pascal) Off | 0000:03:00.0 Off | N/A | | 23% 36C P0 57W / 250W | 0MiB / 12189MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 2 TITAN X (Pascal) Off | 0000:82:00.0 Off | N/A | | 23% 35C P0 57W / 250W | 0MiB / 12189MiB | 0% Default | +-------------------------------+----------------------+----------------------+ | 3 TITAN X (Pascal) Off | 0000:83:00.0 Off | N/A | | 0% 35C P0 56W / 250W | 0MiB / 12189MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+ /usr/local/cuda/extras/demo_suite/deviceQuery Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X 
(Pascal) (GPU3) : No > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = TITAN X (Pascal) Result = PASS Now not everything is rosy root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody root at gpu3$ make >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<< >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<< >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<< even though those are installed. For example root at gpu3$ yum whatprovides */libX11.so libX11-devel-1.6.3-2.el7.i686 : Development files for libX11 Repo : core Matched from: Filename : /usr/lib/libX11.so also mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel but root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version Also from MATLAB gpuDevice hangs. So we still don't have a working installation. Any help would be appreciated. Best, Predrag P.S. Once we have a working installation we can think of installing Caffe and TensorFlow. 
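The peer-access pattern in the deviceQuery output above is regular: GPUs 0 and 1 (bus 02/03) can reach each other, GPUs 2 and 3 (bus 82/83) can reach each other, and nothing crosses the middle, most likely because each pair hangs off a different CPU's PCIe root complex on this dual-socket board. A small illustrative parser for those lines (not part of the CUDA tooling, just a sketch):

```python
import re

def parse_peer_access(lines):
    """Build {(src, dst): bool} from deviceQuery 'Peer access' lines."""
    pat = re.compile(r"\(GPU(\d)\) -> .*?\(GPU(\d)\) : (Yes|No)")
    access = {}
    for line in lines:
        m = pat.search(line)
        if m:
            access[(int(m.group(1)), int(m.group(2)))] = m.group(3) == "Yes"
    return access

sample = [
    "Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes",
    "Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No",
    "Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes",
]
access = parse_peer_access(sample)
print(access[(0, 1)], access[(0, 2)])  # → True False
```

The practical upshot is that a multi-GPU job wanting fast device-to-device copies should be pinned to the pair 0/1 or the pair 2/3.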
For now we have to see why the things are not working. > > > On Oct 12, 2016, at 6:26 PM, Predrag Punosevac wrote: > > > > [...]
> > Predrag From suppe at andrew.cmu.edu Wed Oct 12 23:26:48 2016 From: suppe at andrew.cmu.edu (Arne Suppe) Date: Wed, 12 Oct 2016 23:26:48 -0400 Subject: GPU3 is "configured" In-Reply-To: <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> Message-ID: <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> Hmm - I don't use MATLAB for deep learning, but gpuDevice also hangs on my computer with R2016a. I was able to compile the matrixMul example in the CUDA samples and run it on gpu3, so I think the build environment is probably all set. As for the OpenGL issue, I think it's possibly a problem with their build script findgl.mk, which is not familiar with Springdale OS. The demo_suite directory has a precompiled nbody binary you may try, but I suspect most users will not need graphics. Arne > On Oct 12, 2016, at 10:23 PM, Predrag Punosevac wrote: > > Arne Suppe wrote: > >> Hi Predrag, >> Don't know if this applies to you, but I just built a machine with a GTX1080 which has the same PASCAL architecture as the Titan. After installing CUDA 8, I still found I needed to install the latest driver off of the NVIDIA web site to get the card recognized. Right now, I am running 367.44. >> >> Arne > > Arne, > > Thank you so much for this e-mail. Yes, it is the damn PASCAL architecture; I > see lots of people complaining about it on the forums. I downloaded and > installed the driver from > > http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce > > That seems to have made a real difference.
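Arne's findgl.mk theory fits the evidence, and it also fits Predrag's later remark that Springdale kept its RHEL branding before 7.2: the samples' library probe picks its search paths based on the distribution it detects, so an unrecognized distro ends up with no search paths and a spurious "not found", even with the devel packages installed. A loose sketch of that failure mode (the table below is illustrative; it is not the actual contents of findgl.mk):

```python
# Map detected distro -> GL library directories, the way a makefile-style
# probe might. An unrecognized distro falls through to an empty list, so
# the probe reports libGL.so "not found" although the library is installed.
GL_SEARCH_DIRS = {
    "ubuntu": ["/usr/lib/x86_64-linux-gnu"],
    "fedora": ["/usr/lib64"],
    "rhel":   ["/usr/lib64", "/usr/lib64/nvidia"],
    "centos": ["/usr/lib64", "/usr/lib64/nvidia"],
}

def gl_search_dirs(distro):
    """Unknown distros get an empty search list: the failure mode above."""
    return GL_SEARCH_DIRS.get(distro.lower(), [])

print(gl_search_dirs("CentOS"))     # → ['/usr/lib64', '/usr/lib64/nvidia']
print(gl_search_dirs("Springdale")) # → []
```

While Springdale still reported itself as RHEL, a probe like this would have matched; after the rebranding it falls through to the empty case.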
Check out these beautiful outputs. > [...] >> [...] >>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac wrote: >>> >>> [...]
>>> >>> Predrag > From predragp at imap.srv.cs.cmu.edu Thu Oct 13 10:44:16 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Thu, 13 Oct 2016 10:44:16 -0400 Subject: GPU3 is "configured" In-Reply-To: <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> Message-ID: On 2016-10-12 23:26, Arne Suppe wrote: > Hmm - I don?t use matlab for deep learning, but gpuDevice also hangs > on my computer with R2016a. > We would have to escalate this with MathWorks. I have seen work around Internet but it looks like a bug in one of Mathworks provided MEX files. > I was able compile the matrixMul example in the CUDA samples and run > it on gpu3, so I think the build environment is probably all set. > > As for the openGL, I think its possibly a problem with their build > script findgl.mk which is not familiar with Springdale OS. The > demo_suite directory has a precompiled nbody binary you may try, but I > suspect most users will not need graphics. > That should not be too hard to fix. Some header files have to be manually edited. The funny part until 7.2 Princeton people didn't bother to remove RHEL branding which actually made things easier for us. Doug is trying right now to compile the latest Caffe, TensorFlow, and protobuf-3. We will try to create an RPM for that so that we don't have to go through this again. I also asked Princeton and Rutgers guys if they have WIP RPMs to share. Predrag > Arne > > > > >> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac >> wrote: >> >> Arne Suppe wrote: >> >>> Hi Predrag, >>> Don???t know if this applies to you, but I just build a machines with >>> a GTX1080 which has the same PASCAL architecture as the Titan. After >>> installing CUDA 8, I still found I needed to install the latest >>> driver off of the NVIDIA web site to get the card recognized. 
Right >>> now, I am running 367.44. >>> >>> Arne >> >> Arne, >> >> Thank you so much for this e-mail. Yes it is damn PASCAL arhitecture I >> see lots of people complaining about it on the forums. I downloaded >> and >> installed driver from >> >> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce >> >> That seems to made a real difference. Check out this beautiful outputs >> >> root at gpu3$ ls nvidia* >> nvidia0 nvidia1 nvidia2 nvidia3 nvidiactl nvidia-uvm >> nvidia-uvm-tools >> >> root at gpu3$ lspci | grep -i nvidia >> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev >> a1) >> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) >> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev >> a1) >> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) >> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev >> a1) >> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) >> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev >> a1) >> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) >> >> >> root at gpu3$ ls /proc/driver >> nvidia nvidia-uvm nvram rtc >> >> root at gpu3$ lsmod |grep nvidia >> nvidia_uvm 738901 0 >> nvidia_drm 43405 0 >> nvidia_modeset 764432 1 nvidia_drm >> nvidia 11492947 2 nvidia_modeset,nvidia_uvm >> drm_kms_helper 125056 2 ast,nvidia_drm >> drm 349210 5 ast,ttm,drm_kms_helper,nvidia_drm >> i2c_core 40582 7 >> ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia >> >> root at gpu3$ nvidia-smi >> Wed Oct 12 22:03:27 2016 >> +-----------------------------------------------------------------------------+ >> | NVIDIA-SMI 367.57 Driver Version: 367.57 >> | >> |-------------------------------+----------------------+----------------------+ >> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile >> Uncorr. 
ECC | >> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util >> Compute M. | >> |===============================+======================+======================| >> | 0 TITAN X (Pascal) Off | 0000:02:00.0 Off | >> N/A | >> | 23% 32C P0 56W / 250W | 0MiB / 12189MiB | 0% >> Default | >> +-------------------------------+----------------------+----------------------+ >> | 1 TITAN X (Pascal) Off | 0000:03:00.0 Off | >> N/A | >> | 23% 36C P0 57W / 250W | 0MiB / 12189MiB | 0% >> Default | >> +-------------------------------+----------------------+----------------------+ >> | 2 TITAN X (Pascal) Off | 0000:82:00.0 Off | >> N/A | >> | 23% 35C P0 57W / 250W | 0MiB / 12189MiB | 0% >> Default | >> +-------------------------------+----------------------+----------------------+ >> | 3 TITAN X (Pascal) Off | 0000:83:00.0 Off | >> N/A | >> | 0% 35C P0 56W / 250W | 0MiB / 12189MiB | 0% >> Default | >> +-------------------------------+----------------------+----------------------+ >> >> >> +-----------------------------------------------------------------------------+ >> | Processes: GPU >> Memory | >> | GPU PID Type Process name >> Usage >> | >> |=============================================================================| >> | No running processes found >> | >> +-----------------------------------------------------------------------------+ >> >> >> >> /usr/local/cuda/extras/demo_suite/deviceQuery >> >> Alignment requirement for Surfaces: Yes >> Device has ECC support: Disabled >> Device supports Unified Addressing (UVA): Yes >> Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0 >> Compute Mode: >> < Default (multiple host threads can use ::cudaSetDevice() with >> device simultaneously) > >>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : >> Yes >>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : >> No >>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : >> No >>> Peer access from TITAN X (Pascal) (GPU1) -> 
TITAN X (Pascal) (GPU0) : >> Yes >>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : >> No >>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : >> No >>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : >> No >>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : >> No >>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : >> Yes >>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : >> No >>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : >> No >>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : >> Yes >> >> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA >> Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal), >> Device1 >> = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = TITAN X >> (Pascal) >> Result = PASS >> >> >> >> Now not everything is rosy >> >> root at gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody >> root at gpu3$ make >>>>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide >> for how to find and install them. <<< >>>>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide >> for how to find and install them. <<< >>>>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide >> for how to find and install them. <<< >> >> >> even though those are installed. 
For example >> >> root at gpu3$ yum whatprovides */libX11.so >> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11 >> Repo : core >> Matched from: >> Filename : /usr/lib/libX11.so >> >> also >> >> mesa-libGLU-devel >> mesa-libGL-devel >> xorg-x11-drv-nvidia-devel >> >> but >> >> root at gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel >> xorg-x11-drv-nvidia-devel >> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and >> latest version >> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already >> installed >> and latest version >> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already >> installed and latest version >> >> Also from MATLAB gpuDevice hangs. >> >> So we still don't have a working installation. Any help would be >> appreciated. >> >> Best, >> Predrag >> >> P.S. Once we have a working installation we can think of installing >> Caffe and TensorFlow. For now we have to see why the things are not >> working. >> >> >> >> >> >> >>> >>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac >>>> wrote: >>>> >>>> Dear Autonians, >>>> >>>> GPU3 is "configured". Namely you can log into it and all packages >>>> are >>>> installed. However I couldn't get NVIDIA provided CUDA driver to >>>> recognize GPU cards. 
They appear to be properly installed from the >>>> hardware point of view and you can list them with >>>> >>>> lshw -class display >>>> >>>> root at gpu3$ lshw -class display >>>> *-display UNCLAIMED >>>> description: VGA compatible controller >>>> product: NVIDIA Corporation >>>> vendor: NVIDIA Corporation >>>> physical id: 0 >>>> bus info: pci at 0000:02:00.0 >>>> version: a1 >>>> width: 64 bits >>>> clock: 33MHz >>>> capabilities: pm msi pciexpress vga_controller cap_list >>>> configuration: latency=0 >>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef >>>> memory:cf000000-cfffffff memory:383fe0000000-383fefffffff >>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128) >>>> memory:d0000000-d007ffff >>>> *-display UNCLAIMED >>>> description: VGA compatible controller >>>> product: NVIDIA Corporation >>>> vendor: NVIDIA Corporation >>>> physical id: 0 >>>> bus info: pci at 0000:03:00.0 >>>> version: a1 >>>> width: 64 bits >>>> clock: 33MHz >>>> capabilities: pm msi pciexpress vga_controller cap_list >>>> configuration: latency=0 >>>> resources: iomemory:383f0-383ef iomemory:383f0-383ef >>>> memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff >>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128) >>>> memory:ce000000-ce07ffff >>>> *-display >>>> description: VGA compatible controller >>>> product: ASPEED Graphics Family >>>> vendor: ASPEED Technology, Inc. 
>>>> physical id: 0 >>>> bus info: pci at 0000:06:00.0 >>>> version: 30 >>>> width: 32 bits >>>> clock: 33MHz >>>> capabilities: pm msi vga_controller bus_master cap_list rom >>>> configuration: driver=ast latency=0 >>>> resources: irq:19 memory:cb000000-cbffffff >>>> memory:cc000000-cc01ffff ioport:4000(size=128) >>>> *-display UNCLAIMED >>>> description: VGA compatible controller >>>> product: NVIDIA Corporation >>>> vendor: NVIDIA Corporation >>>> physical id: 0 >>>> bus info: pci at 0000:82:00.0 >>>> version: a1 >>>> width: 64 bits >>>> clock: 33MHz >>>> capabilities: pm msi pciexpress vga_controller cap_list >>>> configuration: latency=0 >>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef >>>> memory:fa000000-faffffff memory:387fe0000000-387fefffffff >>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128) >>>> memory:fb000000-fb07ffff >>>> *-display UNCLAIMED >>>> description: VGA compatible controller >>>> product: NVIDIA Corporation >>>> vendor: NVIDIA Corporation >>>> physical id: 0 >>>> bus info: pci at 0000:83:00.0 >>>> version: a1 >>>> width: 64 bits >>>> clock: 33MHz >>>> capabilities: pm msi pciexpress vga_controller cap_list >>>> configuration: latency=0 >>>> resources: iomemory:387f0-387ef iomemory:387f0-387ef >>>> memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff >>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128) >>>> memory:f9000000-f907ffff >>>> >>>> >>>> However what scares the hell out of me is that I don't see NVIDIA >>>> driver >>>> loaded >>>> >>>> lsmod|grep nvidia >>>> >>>> and the device nodes /dev/nvidia are not created. I am guessing I >>>> just >>>> missed some trivial step during the CUDA installation which is very >>>> involving. I am unfortunately too tired to debug this tonight. 
>>>> >>>> Predrag >> From predragp at imap.srv.cs.cmu.edu Thu Oct 13 13:39:19 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Thu, 13 Oct 2016 13:39:19 -0400 Subject: GPU3 is "configured" In-Reply-To: References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> <20161013153826.f4agzWkMb%predragp@cs.cmu.edu> <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu> Message-ID: <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu> Dear Autonians, In case anybody is interested in what happens behind the scenes, Doug got Caffe and TensorFlow to work on GPU3. Please see the message below. I also got very useful feedback from the Princeton and Rutgers people. Please check it out if you care (you will have to log into Gmail to see the exchange). https://groups.google.com/forum/#!forum/springdale-users I need to think about how we move forward with this before we start pulling any triggers. If somebody is itchy and can't wait, please build Caffe and TensorFlow in your scratch directory following the howto below. Predrag On 2016-10-13 13:24, Dougal Sutherland wrote: > A note about cudnn: > > There are a bunch of versions of cudnn. They're not > backwards-compatible, and different versions of > caffe/tensorflow/whatever want different ones. > > I am currently using the setup in ~dsutherl/cudnn_files: > > * I have a bunch of versions of the installer there. > * The use-cudnn.sh script, intended to be used like "source > use-cudnn.sh 5.1", will untar the appropriate one into a scratch > directory (if it hasn't already been done) and set > CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is > needed for caffe binaries, since they don't link to the absolute path; > the first two (not sure about the third) are needed for theano. > Dunno about tensorflow yet.
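The use-cudnn.sh behavior described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual contents of ~dsutherl/cudnn_files/use-cudnn.sh; the tarball name and scratch layout are assumptions.

```shell
# Hypothetical sketch of a use-cudnn.sh-style helper (not the real
# script); intended to be sourced, e.g. "source use-cudnn.sh 5.1".
CUDNN_VER="${1:-5.1}"
CUDNN_DIR="/home/scratch/$USER/cudnn-$CUDNN_VER"

# Untar the matching installer into scratch only if not already done.
if [ ! -d "$CUDNN_DIR" ]; then
    mkdir -p "$CUDNN_DIR" 2>/dev/null
    # tar xzf ~dsutherl/cudnn_files/cudnn-$CUDNN_VER.tgz -C "$CUDNN_DIR"  # tarball name assumed
fi

# CPATH is searched by the compiler, LIBRARY_PATH by the link step, and
# LD_LIBRARY_PATH by the runtime loader (needed for caffe binaries).
export CPATH="$CUDNN_DIR/include${CPATH:+:$CPATH}"
export LIBRARY_PATH="$CUDNN_DIR/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDNN_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Sourcing (rather than executing) the script is what lets the exported variables persist in the calling shell, which is why the "remember to source use-cudnn" reminder below matters.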
> > So, here's the Caffe setup: > > cd /home/scratch/$USER > git clone https://github.com/BVLC/caffe > cd caffe > cp Makefile.config.example Makefile.config > > # tell it to use openblas; using atlas needs some changes to the > Makefile > sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config > > # configure to use cudnn (optional) > source ~dsutherl/cudnn-files/use-cudnn.sh 5.1 > sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config > perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' > Makefile.config > perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' > Makefile.config > > # build the library > make -j23 > > # to do tests (takes ~10 minutes): > make -j23 test > make runtest > > # Now, to run caffe binaries you'll need to remember to source > use-cudnn if you used cudnn before. > > # To build the python library: > make py > > # Requirements for the python library: > # Some of the system packages are too old; this installs them in your > scratch directory. > # You'll have to set PYTHONUSERBASE again before running any python > processes that use these libs. > export PYTHONUSERBASE=$HOME/scratch/.local; > export PATH=$PYTHONUSERBASE/bin:"$PATH" # <- optional > pip install --user -r python/requirements.txt > > # Caffe is dumb and doesn't package its python library properly. The > easiest way to use it is: > export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH > python -c 'import caffe' > > On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland > wrote: > >> Java fix seemed to work. Now tensorflow wants python-wheel and >> swig. >> >> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac >> wrote: >> >>> On 2016-10-13 11:46, Dougal Sutherland wrote: >>> >>>> Having some trouble with tensorflow, because: >>> >>>> >>> >>>> * it requires Google's bazel build system >>> >>>> >>> >>>> * The bazel installer says >>> >>>> Java version is 1.7.0_111 while at least 1.8 is needed.
>>> >>>> * >>> >>>> >>> >>>> * $ java -version >>> >>>> openjdk version "1.8.0_102" >>> >>>> OpenJDK Runtime Environment (build 1.8.0_102-b14) >>> >>>> OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode) >>> >>>> $ javac -version >>> >>>> javac 1.7.0_111 >>> >>>> >>> >>> I just did yum -y install java-1.8.0* which installs openjdk 1.8. >>> Please >>> >>> change your java. Let me know if >>> >>> you want me to install Oracle JDK 1.8 >>> >>> Predrag >>> >>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac >>> >>>> wrote: >>> >>>> >>> >>>>> Dougal Sutherland wrote: >>> >>>>> >>> >>>>>> Also, this seemed to work for me so far for protobuf: >>> >>>>>> >>> >>>>>> cd /home/scratch/$USER >>> >>>>>> VER=3.1.0 >>> >>>>>> wget >>> >>>>>> >>> >>>>> >>> >>>> >>> >> > https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz >>> >>>>>> tar xf protobuf-cpp-$VER.tar.gz >>> >>>>>> cd protobuf-cpp-$VER >>> >>>>>> ./configure --prefix=/home/scratch/$USER >>> >>>>>> make -j12 >>> >>>>>> make -j12 check >>> >>>>>> make install >>> >>>>> >>> >>>>> That is a great help! >>> >>>>> >>> >>>>>> >>> >>>>>> You could change --prefix=/usr if making an RPM. >>> >>>>>> >>> >>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland >>> >>>>> wrote: >>> >>>>>> >>> >>>>>>> Some more packages for caffe: >>> >>>>>>> >>> >>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel >>> >>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel >>> >>>>>>> >>> >>>>>>> (Some of those might be installed already, but at least >>> gflags >>> >>>>> is >>> >>>>>>> definitely missing.) >>> >>>>>>> >>> >>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac < >>> >>>>>>> predragp at imap.srv.cs.cmu.edu> wrote: >>> >>>>>>> >>> >>>>>>> On 2016-10-12 23:26, Arne Suppe wrote: >>> >>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice >>> >>>>> also hangs >>> >>>>>>>> on my computer with R2016a. >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> We would have to escalate this with MathWorks.
I have seen >>> workarounds >>> >>>>> on the >>> >>>>>>> Internet but it looks like a bug in one of the MathWorks-provided >>> >>>>> MEX files. >>> >>>>>>>> I was able to compile the matrixMul example in the CUDA >>> samples >>> >>>>> and run >>> >>>>>>>> it on gpu3, so I think the build environment is probably >>> all >>> >>>>> set. >>> >>>>>>>> >>> >>>>>>>> As for the openGL, I think it's possibly a problem with >>> their >>> >>>>> build >>> >>>>>>>> script findgl.mk [1] which is not familiar with >>> Springdale OS. >>> >>>>> The >>> >>>>>>>> demo_suite directory has a precompiled nbody binary you may >>> >>>>> try, but I >>> >>>>>>>> suspect most users will not need graphics. >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> That should not be too hard to fix. Some header files have to >>> be >>> >>>>>>> manually edited. The funny part is that until 7.2 the Princeton people >>> >>>>>>> didn't bother >>> >>>>>>> to remove RHEL branding, which actually made things easier for >>> >>>>> us. >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> Doug is trying right now to compile the latest Caffe, >>> >>>>> TensorFlow, and >>> >>>>>>> protobuf-3. We will try to create an RPM for that so that we >>> >>>>> don't have >>> >>>>>>> to go through this again. I also asked the Princeton and Rutgers >>> >>>>> guys if >>> >>>>>>> they >>> >>>>>>> have WIP RPMs to share. >>> >>>>>>> >>> >>>>>>> Predrag >>> >>>>>>> >>> >>>>>>>> Arne >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac >>> >>>>> >>> >>>>>>>>> wrote: >>> >>>>>>>>> >>> >>>>>>>>> Arne Suppe wrote: >>> >>>>>>>>> >>> >>>>>>>>>> Hi Predrag, >>> >>>>>>>>>> Don't know if this applies to you, but I just built a >>> >>>>> machine with >>> >>>>>>>>>> a GTX1080 which has the same PASCAL architecture as the >>> >>>>> Titan. After >>> >>>>>>>>>> installing CUDA 8, I still found I needed to install the >>> >>>>> latest >>> >>>>>>>>>> driver off of the NVIDIA web site to get the card >>> >>>>> recognized.
Right >>> >>>>>>>>>>>> now, I am running 367.44. >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> Arne >>> >>>>>>>>>>> >>> >>>>>>>>>>> Arne, >>> >>>>>>>>>>> >>> >>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn PASCAL >>> >>>>> architecture; I >>> >>>>>>>>> see lots of people complaining about it on the forums. I >>> >>>>> downloaded >>> >>>>>>>>> and >>> >>>>>>>>> installed the driver from >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce >>> >>>>>>>>> >>> >>>>>>>>> That seems to have made a real difference. Check out these >>> >>>>> beautiful outputs >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ ls nvidia* >>> >>>>>>>>> nvidia0 nvidia1 nvidia2 nvidia3 nvidiactl nvidia-uvm >>> >>>>>>>>> nvidia-uvm-tools >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ lspci | grep -i nvidia >>> >>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation >>> Device >>> >>>>> 1b00 (rev >>> >>>>>>>>> a1) >>> >>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>> a1) >>> >>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation >>> Device >>> >>>>> 1b00 (rev >>> >>>>>>>>> a1) >>> >>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>> a1) >>> >>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation >>> Device >>> >>>>> 1b00 (rev >>> >>>>>>>>> a1) >>> >>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>> a1) >>> >>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation >>> Device >>> >>>>> 1b00 (rev >>> >>>>>>>>> a1) >>> >>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>> a1) >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ ls /proc/driver >>> >>>>>>>>> nvidia nvidia-uvm nvram rtc >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ lsmod |grep nvidia >>> >>>>>>>>> nvidia_uvm 738901 0 >>> >>>>>>>>> nvidia_drm 43405 0 >>> >>>>>>>>> nvidia_modeset 764432 1 nvidia_drm
>>> >>>>>>>>> nvidia 11492947 2 nvidia_modeset,nvidia_uvm >>> >>>>>>>>> drm_kms_helper 125056 2 ast,nvidia_drm >>> >>>>>>>>> drm 349210 5 >>> >>>>> ast,ttm,drm_kms_helper,nvidia_drm >>> >>>>>>>>> i2c_core 40582 7 >>> >>>>>>>>> ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ nvidia-smi >>> >>>>>>>>> Wed Oct 12 22:03:27 2016 >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>> >>>>>>>>> | NVIDIA-SMI 367.57 Driver Version: 367.57 >>> >>>>>>>>> | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > |-------------------------------+----------------------+----------------------+ >>> >>>>>>>>> | GPU Name Persistence-M| Bus-Id Disp.A | >>> >>>>> Volatile >>> >>>>>>>>> Uncorr. ECC | >>> >>>>>>>>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | >>> >>>>> GPU-Util >>> >>>>>>>>> Compute M. | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > |===============================+======================+======================| >>> >>>>>>>>> | 0 TITAN X (Pascal) Off | 0000:02:00.0 Off | >>> >>>>>>>>> N/A | >>> >>>>>>>>> | 23% 32C P0 56W / 250W | 0MiB / 12189MiB | >>> >>>>> 0% >>> >>>>>>>>> Default | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>> >>>>>>>>> | 1 TITAN X (Pascal) Off | 0000:03:00.0 Off | >>> >>>>>>>>> N/A | >>> >>>>>>>>> | 23% 36C P0 57W / 250W | 0MiB / 12189MiB | >>> >>>>> 0% >>> >>>>>>>>> Default | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>> >>>>>>>>> | 2 TITAN X (Pascal) Off | 0000:82:00.0 Off | >>> >>>>>>>>> N/A | >>> >>>>>>>>> | 23% 35C P0 57W / 250W | 0MiB / 12189MiB | >>> >>>>> 0% >>> >>>>>>>>> Default | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>> >>>>>>>>> 
| 3 TITAN X (Pascal) Off | 0000:83:00.0 Off | >>> >>>>>>>>> N/A | >>> >>>>>>>>> | 0% 35C P0 56W / 250W | 0MiB / 12189MiB | >>> >>>>> 0% >>> >>>>>>>>> Default | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>> >>>>>>>>> | Processes: >>> >>>>> GPU >>> >>>>>>>>> Memory | >>> >>>>>>>>> | GPU PID Type Process name >>> >>>>>>>>> Usage >>> >>>>>>>>> | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > |=============================================================================| >>> >>>>>>>>> | No running processes found >>> >>>>>>>>> | >>> >>>>>>>>> >>> >>>>>>> >>> >>>>> >>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery >>> >>>>>>>>> >>> >>>>>>>>> Alignment requirement for Surfaces: Yes >>> >>>>>>>>> Device has ECC support: Disabled >>> >>>>>>>>> Device supports Unified Addressing (UVA): Yes >>> >>>>>>>>> Device PCI Domain ID / Bus ID / location ID: 0 / 131 / >>> 0 >>> >>>>>>>>> Compute Mode: >>> >>>>>>>>> < Default (multiple host threads can use >>> >>>>> ::cudaSetDevice() with >>> >>>>>>>>> device simultaneously) > >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>> (Pascal) >>> >>>>> (GPU1) : >>> >>>>>>>>> Yes >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>> (Pascal) >>> >>>>> (GPU2) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>> (Pascal) >>> >>>>> (GPU3) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>> (Pascal) >>> >>>>> (GPU0) : >>> >>>>>>>>> Yes >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>> (Pascal) >>> >>>>> (GPU2) : 
>>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>> (Pascal) >>> >>>>> (GPU3) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>> (Pascal) >>> >>>>> (GPU0) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>> (Pascal) >>> >>>>> (GPU1) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>> (Pascal) >>> >>>>> (GPU3) : >>> >>>>>>>>> Yes >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>> (Pascal) >>> >>>>> (GPU0) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>> (Pascal) >>> >>>>> (GPU1) : >>> >>>>>>>>> No >>> >>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>> (Pascal) >>> >>>>> (GPU2) : >>> >>>>>>>>> Yes >>> >>>>>>>>> >>> >>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = >>> 8.0, >>> >>>>> CUDA >>> >>>>>>>>> Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X >>> >>>>> (Pascal), >>> >>>>>>>>> Device1 >>> >>>>>>>>> = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = >>> >>>>> TITAN X >>> >>>>>>>>> (Pascal) >>> >>>>>>>>> Result = PASS >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> Now not everything is rosy >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ cd >>> ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody >>> >>>>>>>>> root at gpu3$ make >>> >>>>>>>>>>>> WARNING - libGL.so not found, refer to CUDA Getting >>> >>>>> Started Guide >>> >>>>>>>>> for how to find and install them. <<< >>> >>>>>>>>>>>> WARNING - libGLU.so not found, refer to CUDA Getting >>> >>>>> Started Guide >>> >>>>>>>>> for how to find and install them. <<< >>> >>>>>>>>>>>> WARNING - libX11.so not found, refer to CUDA Getting >>> >>>>> Started Guide >>> >>>>>>>>> for how to find and install them. <<< >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> even though those are installed. 
For example >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ yum whatprovides */libX11.so >>> >>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for >>> libX11 >>> >>>>>>>>> Repo : core >>> >>>>>>>>> Matched from: >>> >>>>>>>>> Filename : /usr/lib/libX11.so >>> >>>>>>>>> >>> >>>>>>>>> also >>> >>>>>>>>> >>> >>>>>>>>> mesa-libGLU-devel >>> >>>>>>>>> mesa-libGL-devel >>> >>>>>>>>> xorg-x11-drv-nvidia-devel >>> >>>>>>>>> >>> >>>>>>>>> but >>> >>>>>>>>> >>> >>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel >>> mesa-libGL-devel >>> >>>>>>>>> xorg-x11-drv-nvidia-devel >>> >>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already >>> >>>>> installed and >>> >>>>>>>>> latest version >>> >>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 >>> already >>> >>>>>>>>> installed >>> >>>>>>>>> and latest version >>> >>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 >>> >>>>> already >>> >>>>>>>>> installed and latest version >>> >>>>>>>>> >>> >>>>>>>>> Also from MATLAB gpuDevice hangs. >>> >>>>>>>>> >>> >>>>>>>>> So we still don't have a working installation. Any help >>> would >>> >>>>> be >>> >>>>>>>>> appreciated. >>> >>>>>>>>> >>> >>>>>>>>> Best, >>> >>>>>>>>> Predrag >>> >>>>>>>>> >>> >>>>>>>>> P.S. Once we have a working installation we can think of >>> >>>>> installing >>> >>>>>>>>> Caffe and TensorFlow. For now we have to see why the >>> things >>> >>>>> are not >>> >>>>>>>>> working. >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac >>> >>>>> >>> >>>>>>>>>>> wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> Dear Autonians, >>> >>>>>>>>>>> >>> >>>>>>>>>>> GPU3 is "configured". Namely you can log into it and all >>> >>>>> packages >>> >>>>>>>>>>> are >>> >>>>>>>>>>> installed. However I couldn't get NVIDIA provided CUDA >>> >>>>> driver to >>> >>>>>>>>>>> recognize GPU cards. 
They appear to be properly >>> installed >>> >>>>> from the >>> >>>>>>>>>>> hardware point of view and you can list them with >>> >>>>>>>>>>> >>> >>>>>>>>>>> lshw -class display >>> >>>>>>>>>>> >>> >>>>>>>>>>> root at gpu3$ lshw -class display >>> >>>>>>>>>>> *-display UNCLAIMED >>> >>>>>>>>>>> description: VGA compatible controller >>> >>>>>>>>>>> product: NVIDIA Corporation >>> >>>>>>>>>>> vendor: NVIDIA Corporation >>> >>>>>>>>>>> physical id: 0 >>> >>>>>>>>>>> bus info: pci at 0000:02:00.0 >>> >>>>>>>>>>> version: a1 >>> >>>>>>>>>>> width: 64 bits >>> >>>>>>>>>>> clock: 33MHz >>> >>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>> >>>>> cap_list >>> >>>>>>>>>>> configuration: latency=0 >>> >>>>>>>>>>> resources: iomemory:383f0-383ef >>> iomemory:383f0-383ef >>> >>>>>>>>>>> memory:cf000000-cfffffff >>> memory:383fe0000000-383fefffffff >>> >>>>>>>>>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128) >>> >>>>>>>>>>> memory:d0000000-d007ffff >>> >>>>>>>>>>> *-display UNCLAIMED >>> >>>>>>>>>>> description: VGA compatible controller >>> >>>>>>>>>>> product: NVIDIA Corporation >>> >>>>>>>>>>> vendor: NVIDIA Corporation >>> >>>>>>>>>>> physical id: 0 >>> >>>>>>>>>>> bus info: pci at 0000:03:00.0 >>> >>>>>>>>>>> version: a1 >>> >>>>>>>>>>> width: 64 bits >>> >>>>>>>>>>> clock: 33MHz >>> >>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>> >>>>> cap_list >>> >>>>>>>>>>> configuration: latency=0 >>> >>>>>>>>>>> resources: iomemory:383f0-383ef >>> iomemory:383f0-383ef >>> >>>>>>>>>>> memory:cd000000-cdffffff >>> memory:383fc0000000-383fcfffffff >>> >>>>>>>>>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128) >>> >>>>>>>>>>> memory:ce000000-ce07ffff >>> >>>>>>>>>>> *-display >>> >>>>>>>>>>> description: VGA compatible controller >>> >>>>>>>>>>> product: ASPEED Graphics Family >>> >>>>>>>>>>> vendor: ASPEED Technology, Inc. 
>>> >>>>>>>>>>> physical id: 0 >>> >>>>>>>>>>> bus info: pci at 0000:06:00.0 >>> >>>>>>>>>>> version: 30 >>> >>>>>>>>>>> width: 32 bits >>> >>>>>>>>>>> clock: 33MHz >>> >>>>>>>>>>> capabilities: pm msi vga_controller bus_master >>> >>>>> cap_list rom >>> >>>>>>>>>>> configuration: driver=ast latency=0 >>> >>>>>>>>>>> resources: irq:19 memory:cb000000-cbffffff >>> >>>>>>>>>>> memory:cc000000-cc01ffff ioport:4000(size=128) >>> >>>>>>>>>>> *-display UNCLAIMED >>> >>>>>>>>>>> description: VGA compatible controller >>> >>>>>>>>>>> product: NVIDIA Corporation >>> >>>>>>>>>>> vendor: NVIDIA Corporation >>> >>>>>>>>>>> physical id: 0 >>> >>>>>>>>>>> bus info: pci at 0000:82:00.0 >>> >>>>>>>>>>> version: a1 >>> >>>>>>>>>>> width: 64 bits >>> >>>>>>>>>>> clock: 33MHz >>> >>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>> >>>>> cap_list >>> >>>>>>>>>>> configuration: latency=0 >>> >>>>>>>>>>> resources: iomemory:387f0-387ef >>> iomemory:387f0-387ef >>> >>>>>>>>>>> memory:fa000000-faffffff >>> memory:387fe0000000-387fefffffff >>> >>>>>>>>>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128) >>> >>>>>>>>>>> memory:fb000000-fb07ffff >>> >>>>>>>>>>> *-display UNCLAIMED >>> >>>>>>>>>>> description: VGA compatible controller >>> >>>>>>>>>>> product: NVIDIA Corporation >>> >>>>>>>>>>> vendor: NVIDIA Corporation >>> >>>>>>>>>>> physical id: 0 >>> >>>>>>>>>>> bus info: pci at 0000:83:00.0 >>> >>>>>>>>>>> version: a1 >>> >>>>>>>>>>> width: 64 bits >>> >>>>>>>>>>> clock: 33MHz >>> >>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>> >>>>> cap_list >>> >>>>>>>>>>> configuration: latency=0 >>> >>>>>>>>>>> resources: iomemory:387f0-387ef >>> iomemory:387f0-387ef >>> >>>>>>>>>>> memory:f8000000-f8ffffff >>> memory:387fc0000000-387fcfffffff >>> >>>>>>>>>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128) >>> >>>>>>>>>>> memory:f9000000-f907ffff >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> However what scares the hell out of me is that I don't >>> see >>> 
>>>>> NVIDIA >>> >>>>>>>>>>> driver >>> >>>>>>>>>>> loaded >>> >>>>>>>>>>> >>> >>>>>>>>>>> lsmod|grep nvidia >>> >>>>>>>>>>> >>> >>>>>>>>>>> and the device nodes /dev/nvidia are not created. I am >>> >>>>> guessing I >>> >>>>>>>>>>> just >>> >>>>>>>>>>> missed some trivial step during the CUDA installation >>> which >>> >>>>> is very >>> >>>>>>>>>>> involving. I am unfortunately too tired to debug this >>> >>>>> tonight. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Predrag >>> >>>>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>> >>> >>>> >>> >>>> Links: >>> >>>> ------ >>> >>>> [1] http://findgl.mk > > > Links: > ------ > [1] http://findgl.mk From predragp at imap.srv.cs.cmu.edu Thu Oct 13 13:55:34 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Thu, 13 Oct 2016 13:55:34 -0400 Subject: GPU3 is "configured" In-Reply-To: References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> <20161013153826.f4agzWkMb%predragp@cs.cmu.edu> <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu> <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu> Message-ID: On 2016-10-13 13:51, Dougal Sutherland wrote: > I actually haven't gotten tensorflow working yet -- the bazel build > just hangs on me. I think it maybe has to do with home directories > being on NFS, but I can't figure out bazel at all. I'll try some more > tonight. > According to one of the Princeton guys, we could just use Python conda for TensorFlow. Please check it out, and use your scratch directory instead of NFS. Quote: Hello, Predrag. We have caffe 1.00rc3 if you are interested. ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm TensorFlow and protobuf-3 work great with conda (http://conda.pydata.org). I just tried and had no problems installing it for Python 2.7 and 3.5 > Caffe should be workable following the instructions Predrag forwarded.
> > - Dougal > > On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac > wrote: > >> Dear Autonians, >> >> In the case anybody is interested what happens behind the scenes, >> Doug >> got Caffe and TensorFlow to work on >> GPU3. Please see message below. I also got the very useful feed >> back >> from Princeton and Rutgers people. Please check out if you care (you >> will have to log into Gmail to see the exchange). >> >> https://groups.google.com/forum/#!forum/springdale-users >> >> I need to think how we move forward with this before start pulling >> triggers. If somebody is itchy and can't wait please build Caffe and >> TensorFlow in your scratch directory following below howto. >> >> Predrag >> >> On 2016-10-13 13:24, Dougal Sutherland wrote: >>> A note about cudnn: >>> >>> There are a bunch of versions of cudnn. They're not >>> backwards-compatible, and different versions of >>> caffe/tensorflow/whatever want different ones. >>> >>> I currently am using the setup in ~dsutherl/cudnn_files: >>> >>> * I have a bunch of versions of the installer there. >>> * The use-cudnn.sh script, intended to be used like "source >>> use-cudnn.sh 5.1", will untar the appropriate one into a scratch >>> directory (if it hasn't already been done) and set >>> CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH >> is >>> needed for caffe binaries, since they don't link to the absolute >> path; >>> the first two (not sure about the the third) are needed for >> theano. >>> Dunno about tensorflow yet. 
>>> >>> So, here's the Caffe setup: >>> >>> cd /home/scratch/$USER >>> git clone https://github.com/BVLC/caffe >>> cd caffe >>> cp Makefile.config.example Makefile.config >>> >>> # tell it to use openblas; using atlas needs some changes to the >>> Makefile >>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config >>> >>> # configure to use cudnn (optional) >>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1 >>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config >>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' >>> Makefile.config >>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' >>> Makefile.config >>> >>> # build the library >>> make -j23 >>> >>> # to do tests (takes ~10 minutes): >>> make -j23 test >>> make runtest >>> >>> # Now, to run caffe binaries you'll need to remember to source >>> use-cudnn if you used cudnn before. >>> >>> # To build the python libary: >>> make py >>> >>> # Requirements for the python library: >>> # Some of the system packages are too old; this installs them in >> your >>> scratch directory. >>> # You'll have to set PYTHONUSERBASE again before running any >> python >>> processes that use these libs. >>> export PYTHONUSERBASE=$HOME/scratch/.local; >>> export PATH=$PYTHONUSERBASE/bin:"$PATH" # <- optional >>> pip install --user -r python/requirements.txt >>> >>> # Caffe is dumb and doesn't package its python library properly. >> The >>> easiest way to use it is: >>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH >>> python -c 'import caffe' >>> >>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland >> >>> wrote: >>> >>>> Java fix seemed to work. Now tensorflow wants python-wheel and >>>> swig. 
>>>> >>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac >>>> wrote: >>>> >>>>> On 2016-10-13 11:46, Dougal Sutherland wrote: >>>>> >>>>>> Having some trouble with tensorflow, because: >>>>> >>>>>> >>>>> >>>>>> * it require's Google's bazel build system >>>>> >>>>>> >>>>> >>>>>> * The bazel installer says >>>>> >>>>>> Java version is 1.7.0_111 while at least 1.8 is needed. >>>>> >>>>>> * >>>>> >>>>>> >>>>> >>>>>> * $ java -version >>>>> >>>>>> openjdk version "1.8.0_102" >>>>> >>>>>> OpenJDK Runtime Environment (build 1.8.0_102-b14) >>>>> >>>>>> OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode) >>>>> >>>>>> $ javac -version >>>>> >>>>>> javac 1.7.0_111 >>>>> >>>>>> >>>>> >>>>> I just did yum -y install java-1.8.0* which installs openjdk >> 1.8. >>>>> Please >>>>> >>>>> change your java. Let me know if >>>>> >>>>> you want me to install Oracle JDK 1.8 >>>>> >>>>> Predrag >>>>> >>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac >>>>> >>>>>> wrote: >>>>> >>>>>> >>>>> >>>>>>> Dougal Sutherland wrote: >>>>> >>>>>>> >>>>> >>>>>>>> Also, this seemed to work for me so far for protobuf: >>>>> >>>>>>>> >>>>> >>>>>>>> cd /home/scratch/$USER >>>>> >>>>>>>> VER=3.1.0 >>>>> >>>>>>>> wget >>>>> >>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz >>>>> >>>>>>>> tar xf protobuf-cpp-$VER.tar.gz >>>>> >>>>>>>> cd protobuf-cpp-$VER >>>>> >>>>>>>> ./configure --prefix=/home/scratch/$USER >>>>> >>>>>>>> make -j12 >>>>> >>>>>>>> make -j12 check >>>>> >>>>>>>> make install >>>>> >>>>>>> >>>>> >>>>>>> That is great help! >>>>> >>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> You could change --prefix=/usr if making an RPM. 
>>>>> >>>>>>>> >>>>> >>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland >>>>> >>>>>>> wrote: >>>>> >>>>>>>> >>>>> >>>>>>>>> Some more packages for caffe: >>>>> >>>>>>>>> >>>>> >>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel >>>>> >>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel >>>>> >>>>>>>>> >>>>> >>>>>>>>> (Some of those might be installed already, but at least >>>>> gflags >>>>> >>>>>>> is >>>>> >>>>>>>>> definitely missing.) >>>>> >>>>>>>>> >>>>> >>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac < >>>>> >>>>>>>>> predragp at imap.srv.cs.cmu.edu> wrote: >>>>> >>>>>>>>> >>>>> >>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote: >>>>> >>>>>>>>>> Hmm - I don???t use matlab for deep learning, but gpuDevice >>>>> >>>>>>> also hangs >>>>> >>>>>>>>>> on my computer with R2016a. >>>>> >>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> We would have to escalate this with MathWorks. I have seen >>>>> work >>>>> >>>>>>> around >>>>> >>>>>>>>> Internet but it looks like a bug in one of Mathworks >> provided >>>>> >>>>>>> MEX files. >>>>> >>>>>>>>> >>>>> >>>>>>>>>> I was able compile the matrixMul example in the CUDA >>>>> samples >>>>> >>>>>>> and run >>>>> >>>>>>>>>> it on gpu3, so I think the build environment is probably >>>>> all >>>>> >>>>>>> set. >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> As for the openGL, I think its possibly a problem with >>>>> their >>>>> >>>>>>> build >>>>> >>>>>>>>>> script findgl.mk [1] [1] [1] which is not familiar with >>>>> Springdale OS. >>>>> >>>>>>> The >>>>> >>>>>>>>>> demo_suite directory has a precompiled nbody binary you may >>>>> >>>>>>> try, but I >>>>> >>>>>>>>>> suspect most users will not need graphics. >>>>> >>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> That should not be too hard to fix. Some header files have >> to >>>>> be >>>>> >>>>>>>>> manually edited. 
The funny part is that until 7.2 the Princeton people >>>>> >>>>>>> didn't bother >>>>> >>>>>>>>> to remove the RHEL branding, which actually made things easier >> for >>>>> >>>>>>> us. >>>>> >>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> Doug is trying right now to compile the latest Caffe, >>>>> >>>>>>> TensorFlow, and >>>>> >>>>>>>>> protobuf-3. We will try to create an RPM for that so that we >>>>> >>>>>>> don't have >>>>> >>>>>>>>> to go through this again. I also asked the Princeton and Rutgers >>>>> >>>>>>> guys if >>>>> >>>>>>>>> they >>>>> >>>>>>>>> have WIP RPMs to share. >>>>> >>>>>>>>> >>>>> >>>>>>>>> Predrag >>>>> >>>>>>>>> >>>>> >>>>>>>>>> Arne >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac >>>>> >>>>>>> >>>>> >>>>>>>>>>> wrote: >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Arne Suppe wrote: >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>>> Hi Predrag, >>>>> >>>>>>>>>>>> Don't know if this applies to you, but I just built a >>>>> >>>>>>> machine with >>>>> >>>>>>>>>>>> a GTX1080 which has the same PASCAL architecture as the >>>>> >>>>>>> Titan. After >>>>> >>>>>>>>>>>> installing CUDA 8, I still found I needed to install the >>>>> >>>>>>> latest >>>>> >>>>>>>>>>>> driver off of the NVIDIA web site to get the card >>>>> >>>>>>> recognized. Right >>>>> >>>>>>>>>>>> now, I am running 367.44. >>>>> >>>>>>>>>>>> >>>>> >>>>>>>>>>>> Arne >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Arne, >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn PASCAL >>>>> >>>>>>> architecture; I >>>>> >>>>>>>>>>> see lots of people complaining about it on the forums. 
I >>>>> >>>>>>> downloaded >>>>> >>>>>>>>>>> and >>>>> >>>>>>>>>>> installed driver from >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> That seems to made a real difference. Check out this >>>>> >>>>>>> beautiful outputs >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ ls nvidia* >>>>> >>>>>>>>>>> nvidia0 nvidia1 nvidia2 nvidia3 nvidiactl nvidia-uvm >>>>> >>>>>>>>>>> nvidia-uvm-tools >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ lspci | grep -i nvidia >>>>> >>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation >>>>> Device >>>>> >>>>>>> 1b00 (rev >>>>> >>>>>>>>>>> a1) >>>>> >>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>>>> a1) >>>>> >>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation >>>>> Device >>>>> >>>>>>> 1b00 (rev >>>>> >>>>>>>>>>> a1) >>>>> >>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>>>> a1) >>>>> >>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation >>>>> Device >>>>> >>>>>>> 1b00 (rev >>>>> >>>>>>>>>>> a1) >>>>> >>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>>>> a1) >>>>> >>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation >>>>> Device >>>>> >>>>>>> 1b00 (rev >>>>> >>>>>>>>>>> a1) >>>>> >>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev >>>>> a1) >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ ls /proc/driver >>>>> >>>>>>>>>>> nvidia nvidia-uvm nvram rtc >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ lsmod |grep nvidia >>>>> >>>>>>>>>>> nvidia_uvm 738901 0 >>>>> >>>>>>>>>>> nvidia_drm 43405 0 >>>>> >>>>>>>>>>> nvidia_modeset 764432 1 nvidia_drm >>>>> >>>>>>>>>>> nvidia 11492947 2 nvidia_modeset,nvidia_uvm >>>>> >>>>>>>>>>> drm_kms_helper 
125056 2 ast,nvidia_drm >>>>> >>>>>>>>>>> drm 349210 5 >>>>> >>>>>>> ast,ttm,drm_kms_helper,nvidia_drm >>>>> >>>>>>>>>>> i2c_core 40582 7 >>>>> >>>>>>>>>>> ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ nvidia-smi >>>>> >>>>>>>>>>> Wed Oct 12 22:03:27 2016 >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>>>> >>>>>>>>>>> | NVIDIA-SMI 367.57 Driver Version: 367.57 >>>>> >>>>>>>>>>> | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > |-------------------------------+----------------------+----------------------+ >>>>> >>>>>>>>>>> | GPU Name Persistence-M| Bus-Id Disp.A | >>>>> >>>>>>> Volatile >>>>> >>>>>>>>>>> Uncorr. ECC | >>>>> >>>>>>>>>>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | >>>>> >>>>>>> GPU-Util >>>>> >>>>>>>>>>> Compute M. | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > |===============================+======================+======================| >>>>> >>>>>>>>>>> | 0 TITAN X (Pascal) Off | 0000:02:00.0 Off | >>>>> >>>>>>>>>>> N/A | >>>>> >>>>>>>>>>> | 23% 32C P0 56W / 250W | 0MiB / 12189MiB | >>>>> >>>>>>> 0% >>>>> >>>>>>>>>>> Default | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>>>> >>>>>>>>>>> | 1 TITAN X (Pascal) Off | 0000:03:00.0 Off | >>>>> >>>>>>>>>>> N/A | >>>>> >>>>>>>>>>> | 23% 36C P0 57W / 250W | 0MiB / 12189MiB | >>>>> >>>>>>> 0% >>>>> >>>>>>>>>>> Default | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>>>> >>>>>>>>>>> | 2 TITAN X (Pascal) Off | 0000:82:00.0 Off | >>>>> >>>>>>>>>>> N/A | >>>>> >>>>>>>>>>> | 23% 35C P0 57W / 250W | 0MiB / 12189MiB | >>>>> >>>>>>> 0% >>>>> 
>>>>>>>>>>> Default | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>>>> >>>>>>>>>>> | 3 TITAN X (Pascal) Off | 0000:83:00.0 Off | >>>>> >>>>>>>>>>> N/A | >>>>> >>>>>>>>>>> | 0% 35C P0 56W / 250W | 0MiB / 12189MiB | >>>>> >>>>>>> 0% >>>>> >>>>>>>>>>> Default | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-------------------------------+----------------------+----------------------+ >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>>>> >>>>>>>>>>> | Processes: >>>>> >>>>>>> GPU >>>>> >>>>>>>>>>> Memory | >>>>> >>>>>>>>>>> | GPU PID Type Process name >>>>> >>>>>>>>>>> Usage >>>>> >>>>>>>>>>> | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > |=============================================================================| >>>>> >>>>>>>>>>> | No running processes found >>>>> >>>>>>>>>>> | >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>> >>>>> >>>>>> >>>>> >>>> >>> >> > +-----------------------------------------------------------------------------+ >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Alignment requirement for Surfaces: Yes >>>>> >>>>>>>>>>> Device has ECC support: Disabled >>>>> >>>>>>>>>>> Device supports Unified Addressing (UVA): Yes >>>>> >>>>>>>>>>> Device PCI Domain ID / Bus ID / location ID: 0 / 131 / >>>>> 0 >>>>> >>>>>>>>>>> Compute Mode: >>>>> >>>>>>>>>>> < Default (multiple host threads can use >>>>> >>>>>>> ::cudaSetDevice() with >>>>> >>>>>>>>>>> device simultaneously) > >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU1) : >>>>> >>>>>>>>>>> Yes >>>>> 
>>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU2) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU3) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU0) : >>>>> >>>>>>>>>>> Yes >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU2) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU3) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU0) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU1) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU3) : >>>>> >>>>>>>>>>> Yes >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU0) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU1) : >>>>> >>>>>>>>>>> No >>>>> >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X >>>>> (Pascal) >>>>> >>>>>>> (GPU2) : >>>>> >>>>>>>>>>> Yes >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = >>>>> 8.0, >>>>> >>>>>>> CUDA >>>>> >>>>>>>>>>> Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X >>>>> >>>>>>> (Pascal), >>>>> >>>>>>>>>>> Device1 >>>>> >>>>>>>>>>> = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = >>>>> >>>>>>> TITAN X >>>>> >>>>>>>>>>> (Pascal) >>>>> >>>>>>>>>>> Result = PASS >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Now not everything is rosy >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ cd >>>>> 
~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody >>>>> >>>>>>>>>>> root at gpu3$ make >>>>> >>>>>>>>>>>>>> WARNING - libGL.so not found, refer to CUDA Getting >>>>> >>>>>>> Started Guide >>>>> >>>>>>>>>>> for how to find and install them. <<< >>>>> >>>>>>>>>>>>>> WARNING - libGLU.so not found, refer to CUDA Getting >>>>> >>>>>>> Started Guide >>>>> >>>>>>>>>>> for how to find and install them. <<< >>>>> >>>>>>>>>>>>>> WARNING - libX11.so not found, refer to CUDA Getting >>>>> >>>>>>> Started Guide >>>>> >>>>>>>>>>> for how to find and install them. <<< >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> even though those are installed. For example >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so >>>>> >>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for >>>>> libX11 >>>>> >>>>>>>>>>> Repo : core >>>>> >>>>>>>>>>> Matched from: >>>>> >>>>>>>>>>> Filename : /usr/lib/libX11.so >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> also >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> mesa-libGLU-devel >>>>> >>>>>>>>>>> mesa-libGL-devel >>>>> >>>>>>>>>>> xorg-x11-drv-nvidia-devel >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> but >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel >>>>> mesa-libGL-devel >>>>> >>>>>>>>>>> xorg-x11-drv-nvidia-devel >>>>> >>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already >>>>> >>>>>>> installed and >>>>> >>>>>>>>>>> latest version >>>>> >>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 >>>>> already >>>>> >>>>>>>>>>> installed >>>>> >>>>>>>>>>> and latest version >>>>> >>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 >>>>> >>>>>>> already >>>>> >>>>>>>>>>> installed and latest version >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Also from MATLAB gpuDevice hangs. >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> So we still don't have a working installation. Any help >>>>> would >>>>> >>>>>>> be >>>>> >>>>>>>>>>> appreciated. 
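The libGL/libGLU/libX11 warnings above come from the samples' findgl.mk not knowing where Springdale keeps its libraries. A possible workaround, assuming (unverified) that findgl.mk honors a GLPATH override like stock CUDA sample makefiles, is to point it at the right directory explicitly; find_lib_dir below is a helper invented for illustration.

```shell
# Hypothetical helper: print the first directory that contains the named
# library, which is roughly what findgl.mk is failing to do on Springdale.
find_lib_dir() {
  lib="$1"; shift
  for d in "$@"; do
    if [ -e "$d/$lib" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}

# On the node, something like:
#   make GLPATH="$(find_lib_dir libGL.so /usr/lib64 /usr/lib /usr/lib64/nvidia)"
```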
>>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> Best, >>>>> >>>>>>>>>>> Predrag >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> P.S. Once we have a working installation we can think of >>>>> >>>>>>> installing >>>>> >>>>>>>>>>> Caffe and TensorFlow. For now we have to see why the >>>>> things >>>>> >>>>>>> are not >>>>> >>>>>>>>>>> working. >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>>> >>>>> >>>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac >>>>> >>>>>>> >>>>> >>>>>>>>>>>>> wrote: >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> Dear Autonians, >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> GPU3 is "configured". Namely you can log into it and all >>>>> >>>>>>> packages >>>>> >>>>>>>>>>>>> are >>>>> >>>>>>>>>>>>> installed. However I couldn't get NVIDIA provided CUDA >>>>> >>>>>>> driver to >>>>> >>>>>>>>>>>>> recognize GPU cards. They appear to be properly >>>>> installed >>>>> >>>>>>> from the >>>>> >>>>>>>>>>>>> hardware point of view and you can list them with >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> lshw -class display >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> root at gpu3$ lshw -class display >>>>> >>>>>>>>>>>>> *-display UNCLAIMED >>>>> >>>>>>>>>>>>> description: VGA compatible controller >>>>> >>>>>>>>>>>>> product: NVIDIA Corporation >>>>> >>>>>>>>>>>>> vendor: NVIDIA Corporation >>>>> >>>>>>>>>>>>> physical id: 0 >>>>> >>>>>>>>>>>>> bus info: pci at 0000:02:00.0 >>>>> >>>>>>>>>>>>> version: a1 >>>>> >>>>>>>>>>>>> width: 64 bits >>>>> >>>>>>>>>>>>> clock: 33MHz >>>>> >>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>>>> >>>>>>> cap_list >>>>> >>>>>>>>>>>>> configuration: latency=0 >>>>> >>>>>>>>>>>>> resources: iomemory:383f0-383ef >>>>> iomemory:383f0-383ef >>>>> >>>>>>>>>>>>> memory:cf000000-cfffffff >>>>> memory:383fe0000000-383fefffffff >>>>> >>>>>>>>>>>>> memory:383ff0000000-383ff1ffffff ioport:6000(size=128) >>>>> >>>>>>>>>>>>> memory:d0000000-d007ffff >>>>> >>>>>>>>>>>>> *-display 
UNCLAIMED >>>>> >>>>>>>>>>>>> description: VGA compatible controller >>>>> >>>>>>>>>>>>> product: NVIDIA Corporation >>>>> >>>>>>>>>>>>> vendor: NVIDIA Corporation >>>>> >>>>>>>>>>>>> physical id: 0 >>>>> >>>>>>>>>>>>> bus info: pci at 0000:03:00.0 >>>>> >>>>>>>>>>>>> version: a1 >>>>> >>>>>>>>>>>>> width: 64 bits >>>>> >>>>>>>>>>>>> clock: 33MHz >>>>> >>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>>>> >>>>>>> cap_list >>>>> >>>>>>>>>>>>> configuration: latency=0 >>>>> >>>>>>>>>>>>> resources: iomemory:383f0-383ef >>>>> iomemory:383f0-383ef >>>>> >>>>>>>>>>>>> memory:cd000000-cdffffff >>>>> memory:383fc0000000-383fcfffffff >>>>> >>>>>>>>>>>>> memory:383fd0000000-383fd1ffffff ioport:5000(size=128) >>>>> >>>>>>>>>>>>> memory:ce000000-ce07ffff >>>>> >>>>>>>>>>>>> *-display >>>>> >>>>>>>>>>>>> description: VGA compatible controller >>>>> >>>>>>>>>>>>> product: ASPEED Graphics Family >>>>> >>>>>>>>>>>>> vendor: ASPEED Technology, Inc. >>>>> >>>>>>>>>>>>> physical id: 0 >>>>> >>>>>>>>>>>>> bus info: pci at 0000:06:00.0 >>>>> >>>>>>>>>>>>> version: 30 >>>>> >>>>>>>>>>>>> width: 32 bits >>>>> >>>>>>>>>>>>> clock: 33MHz >>>>> >>>>>>>>>>>>> capabilities: pm msi vga_controller bus_master >>>>> >>>>>>> cap_list rom >>>>> >>>>>>>>>>>>> configuration: driver=ast latency=0 >>>>> >>>>>>>>>>>>> resources: irq:19 memory:cb000000-cbffffff >>>>> >>>>>>>>>>>>> memory:cc000000-cc01ffff ioport:4000(size=128) >>>>> >>>>>>>>>>>>> *-display UNCLAIMED >>>>> >>>>>>>>>>>>> description: VGA compatible controller >>>>> >>>>>>>>>>>>> product: NVIDIA Corporation >>>>> >>>>>>>>>>>>> vendor: NVIDIA Corporation >>>>> >>>>>>>>>>>>> physical id: 0 >>>>> >>>>>>>>>>>>> bus info: pci at 0000:82:00.0 >>>>> >>>>>>>>>>>>> version: a1 >>>>> >>>>>>>>>>>>> width: 64 bits >>>>> >>>>>>>>>>>>> clock: 33MHz >>>>> >>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>>>> >>>>>>> cap_list >>>>> >>>>>>>>>>>>> configuration: latency=0 >>>>> >>>>>>>>>>>>> resources: iomemory:387f0-387ef >>>>> 
iomemory:387f0-387ef >>>>> >>>>>>>>>>>>> memory:fa000000-faffffff >>>>> memory:387fe0000000-387fefffffff >>>>> >>>>>>>>>>>>> memory:387ff0000000-387ff1ffffff ioport:e000(size=128) >>>>> >>>>>>>>>>>>> memory:fb000000-fb07ffff >>>>> >>>>>>>>>>>>> *-display UNCLAIMED >>>>> >>>>>>>>>>>>> description: VGA compatible controller >>>>> >>>>>>>>>>>>> product: NVIDIA Corporation >>>>> >>>>>>>>>>>>> vendor: NVIDIA Corporation >>>>> >>>>>>>>>>>>> physical id: 0 >>>>> >>>>>>>>>>>>> bus info: pci at 0000:83:00.0 >>>>> >>>>>>>>>>>>> version: a1 >>>>> >>>>>>>>>>>>> width: 64 bits >>>>> >>>>>>>>>>>>> clock: 33MHz >>>>> >>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller >>>>> >>>>>>> cap_list >>>>> >>>>>>>>>>>>> configuration: latency=0 >>>>> >>>>>>>>>>>>> resources: iomemory:387f0-387ef >>>>> iomemory:387f0-387ef >>>>> >>>>>>>>>>>>> memory:f8000000-f8ffffff >>>>> memory:387fc0000000-387fcfffffff >>>>> >>>>>>>>>>>>> memory:387fd0000000-387fd1ffffff ioport:d000(size=128) >>>>> >>>>>>>>>>>>> memory:f9000000-f907ffff >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> However what scares the hell out of me is that I don't >>>>> see >>>>> >>>>>>> NVIDIA >>>>> >>>>>>>>>>>>> driver >>>>> >>>>>>>>>>>>> loaded >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> lsmod|grep nvidia >>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> and the device nodes /dev/nvidia are not created. I am >>>>> >>>>>>> guessing I >>>>> >>>>>>>>>>>>> just >>>>> >>>>>>>>>>>>> missed some trivial step during the CUDA installation >>>>> which >>>>> >>>>>>> is very >>>>> >>>>>>>>>>>>> involving. I am unfortunately too tired to debug this >>>>> >>>>>>> tonight. 
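The two symptoms described above (no nvidia module in lsmod, no /dev/nvidia* device nodes) are the first things to re-check after any driver reinstall. A minimal triage sketch, factored so the lsmod parsing can be exercised without GPU hardware; driver_loaded is a name invented here.

```shell
# Reads lsmod-style output on stdin and reports whether the nvidia kernel
# module is present (the '^nvidia ' anchor skips nvidia_uvm, nvidia_drm, etc.).
driver_loaded() {
  if grep -q '^nvidia '; then
    echo "nvidia module loaded"
  else
    echo "nvidia module missing"
  fi
}

# On the node: lsmod | driver_loaded   (and also check: ls /dev/nvidia*)
printf 'ast 12345 1\n' | driver_loaded
# -> nvidia module missing
```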
>>>>> >>>>>>>>>>>>> >>>>> >>>>>>>>>>>>> Predrag >>>>> >>>>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>>>>> >>>>> >>>>>> >>>>> >>>>>> >>>>> >>>>>> Links: >>>>> >>>>>> ------ >>>>> >>>>>> [1] http://findgl.mk From dougal at gmail.com Thu Oct 13 13:51:23 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Thu, 13 Oct 2016 17:51:23 +0000 Subject: GPU3 is "configured" In-Reply-To: <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu> References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> <20161013153826.f4agzWkMb%predragp@cs.cmu.edu> <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu> <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu> Message-ID: I actually haven't gotten tensorflow working yet -- the bazel build just hangs on me. I think it maybe has to do with home directories being on NFS, but I can't figure out bazel at all. I'll try some more tonight. Caffe should be workable following the instructions Predrag forwarded. - Dougal On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac < predragp at imap.srv.cs.cmu.edu> wrote: > Dear Autonians, > > In case anybody is interested in what happens behind the scenes, Doug > got Caffe and TensorFlow to work on > GPU3. Please see the message below. I also got very useful feedback > from the Princeton and Rutgers people. Please check it out if you care (you > will have to log into Gmail to see the exchange). > > https://groups.google.com/forum/#!forum/springdale-users > > I need to think about how we move forward with this before I start pulling > any triggers. If somebody is itchy and can't wait, please build Caffe and > TensorFlow in your scratch directory following the howto below. > > Predrag > > On 2016-10-13 13:24, Dougal Sutherland wrote: > > A note about cudnn: > > > > There are a bunch of versions of cudnn. 
They're not > > backwards-compatible, and different versions of > > caffe/tensorflow/whatever want different ones. > > > > I am currently using the setup in ~dsutherl/cudnn_files: > > > > * I have a bunch of versions of the installer there. > > * The use-cudnn.sh script, intended to be used like "source > > use-cudnn.sh 5.1", will untar the appropriate one into a scratch > > directory (if it hasn't already been done) and set > > CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH is > > needed for caffe binaries, since they don't link to the absolute path; > > the first two (not sure about the third) are needed for theano. > > Dunno about tensorflow yet. > > > > So, here's the Caffe setup: > > > > cd /home/scratch/$USER > > git clone https://github.com/BVLC/caffe > > cd caffe > > cp Makefile.config.example Makefile.config > > > > # tell it to use openblas; using atlas needs some changes to the > > Makefile > > sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config > > > > # configure to use cudnn (optional) > > source ~dsutherl/cudnn-files/use-cudnn.sh 5.1 > > sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config > > perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' > > Makefile.config > > perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' > > Makefile.config > > > > # build the library > > make -j23 > > > > # to do tests (takes ~10 minutes): > > make -j23 test > > make runtest > > > > # Now, to run caffe binaries you'll need to remember to source > > use-cudnn if you used cudnn before. > > > > # To build the python library: > > make py > > > > # Requirements for the python library: > > # Some of the system packages are too old; this installs them in your > > scratch directory. > > # You'll have to set PYTHONUSERBASE again before running any python > > processes that use these libs. 
> > export PYTHONUSERBASE=$HOME/scratch/.local; > > export PATH=$PYTHONUSERBASE/bin:"$PATH" # <- optional > > pip install --user -r python/requirements.txt > > > > # Caffe is dumb and doesn't package its python library properly. The > > easiest way to use it is: > > export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH > > python -c 'import caffe' > > > > On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland > > wrote: > > > >> Java fix seemed to work. Now tensorflow wants python-wheel and > >> swig. > >> > >> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac > >> wrote: > >> > >>> On 2016-10-13 11:46, Dougal Sutherland wrote: > >>> > >>>> Having some trouble with tensorflow, because: > >>> > >>>> > >>> > >>>> * it require's Google's bazel build system > >>> > >>>> > >>> > >>>> * The bazel installer says > >>> > >>>> Java version is 1.7.0_111 while at least 1.8 is needed. > >>> > >>>> * > >>> > >>>> > >>> > >>>> * $ java -version > >>> > >>>> openjdk version "1.8.0_102" > >>> > >>>> OpenJDK Runtime Environment (build 1.8.0_102-b14) > >>> > >>>> OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode) > >>> > >>>> $ javac -version > >>> > >>>> javac 1.7.0_111 > >>> > >>>> > >>> > >>> I just did yum -y install java-1.8.0* which installs openjdk 1.8. > >>> Please > >>> > >>> change your java. 
Let me know if > >>> > >>> you want me to install Oracle JDK 1.8 > >>> > >>> Predrag > >>> > >>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac > >>> > >>>> wrote: > >>> > >>>> > >>> > >>>>> Dougal Sutherland wrote: > >>> > >>>>> > >>> > >>>>>> Also, this seemed to work for me so far for protobuf: > >>> > >>>>>> > >>> > >>>>>> cd /home/scratch/$USER > >>> > >>>>>> VER=3.1.0 > >>> > >>>>>> wget > >>> > >>>>>> > >>> > >>>>> > >>> > >>>> > >>> > >> > > > https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz > >>> > >>>>>> tar xf protobuf-cpp-$VER.tar.gz > >>> > >>>>>> cd protobuf-cpp-$VER > >>> > >>>>>> ./configure --prefix=/home/scratch/$USER > >>> > >>>>>> make -j12 > >>> > >>>>>> make -j12 check > >>> > >>>>>> make install > >>> > >>>>> > >>> > >>>>> That is great help! > >>> > >>>>> > >>> > >>>>>> > >>> > >>>>>> You could change --prefix=/usr if making an RPM. > >>> > >>>>>> > >>> > >>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland > >>> > >>>>> wrote: > >>> > >>>>>> > >>> > >>>>>>> Some more packages for caffe: > >>> > >>>>>>> > >>> > >>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel > >>> > >>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel > >>> > >>>>>>> > >>> > >>>>>>> (Some of those might be installed already, but at least > >>> gflags > >>> > >>>>> is > >>> > >>>>>>> definitely missing.) > >>> > >>>>>>> > >>> > >>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac < > >>> > >>>>>>> predragp at imap.srv.cs.cmu.edu> wrote: > >>> > >>>>>>> > >>> > >>>>>>> On 2016-10-12 23:26, Arne Suppe wrote: > >>> > >>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice > >>> > >>>>> also hangs > >>> > >>>>>>>> on my computer with R2016a. > >>> > >>>>>>>> > >>> > >>>>>>> > >>> > >>>>>>> We would have to escalate this with MathWorks. I have seen > >>> workarounds > >>> > >>>>> on the > >>> > >>>>>>> Internet but it looks like a bug in one of the MathWorks-provided > >>> > >>>>> MEX files. 
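The bazel complaint quoted earlier in this message ("Java version is 1.7.0_111 while at least 1.8 is needed", even though java -version reports 1.8) happens when java and javac resolve to different JDKs. Below is a sketch of the version check bazel is effectively doing; jdk_feature is a name invented here, and the "1.x.y_zz" string format is assumed from the outputs in the thread.

```shell
# Extract the JDK feature release from an old-style Java version string.
jdk_feature() {
  # "1.7.0_111" -> 7, "1.8.0_102" -> 8
  echo "$1" | cut -d. -f2
}

if [ "$(jdk_feature 1.7.0_111)" -lt 8 ]; then
  echo "javac too old for bazel"
fi
# -> javac too old for bazel
```

On RHEL-family systems the usual fix is `alternatives --config javac`, or installing java-1.8.0-openjdk-devel so that javac matches java.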
> >>> > >>>>>>> > >>> > >>>>>>>> I was able to compile the matrixMul example in the CUDA > >>> samples > >>> > >>>>> and run > >>> > >>>>>>>> it on gpu3, so I think the build environment is probably > >>> all > >>> > >>>>> set. > >>> > >>>>>>>> > >>> > >>>>>>>> As for the openGL, I think it's possibly a problem with > >>> their > >>> > >>>>> build > >>> > >>>>>>>> script findgl.mk [1] which is not familiar with > >>> Springdale OS. > >>> > >>>>> The > >>> > >>>>>>>> demo_suite directory has a precompiled nbody binary you may > >>> > >>>>> try, but I > >>> > >>>>>>>> suspect most users will not need graphics. > >>> > >>>>>>>> > >>> > >>>>>>> > >>> > >>>>>>> That should not be too hard to fix. Some header files have to > >>> be > >>> > >>>>>>> manually edited. The funny part is that until 7.2 the Princeton people > >>> > >>>>> didn't bother > >>> > >>>>>>> to remove the RHEL branding, which actually made things easier for > >>> > >>>>> us. > >>> > >>>>>>> > >>> > >>>>>>> > >>> > >>>>>>> Doug is trying right now to compile the latest Caffe, > >>> > >>>>> TensorFlow, and > >>> > >>>>>>> protobuf-3. We will try to create an RPM for that so that we > >>> > >>>>> don't have > >>> > >>>>>>> to go through this again. I also asked the Princeton and Rutgers > >>> > >>>>> guys if > >>> > >>>>>>> they > >>> > >>>>>>> have WIP RPMs to share. > >>> > >>>>>>> > >>> > >>>>>>> Predrag > >>> > >>>>>>> > >>> > >>>>>>>> Arne > >>> > >>>>>>>> > >>> > >>>>>>>> > >>> > >>>>>>>> > >>> > >>>>>>>> > >>> > >>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac > >>> > >>>>> > >>> > >>>>>>>>> wrote: > >>> > >>>>>>>>> > >>> > >>>>>>>>> Arne Suppe wrote: > >>> > >>>>>>>>> > >>> > >>>>>>>>>> Hi Predrag, > >>> > >>>>>>>>>> Don't know if this applies to you, but I just built a > >>> > >>>>> machine with > >>> > >>>>>>>>>> a GTX1080 which has the same PASCAL architecture as the > >>> > >>>>> Titan. 
After > >>> > >>>>>>>>>> installing CUDA 8, I still found I needed to install the > >>> > >>>>> latest > >>> > >>>>>>>>>> driver off of the NVIDIA web site to get the card > >>> > >>>>> recognized. Right > >>> > >>>>>>>>>> now, I am running 367.44. > >>> > >>>>>>>>>> > >>> > >>>>>>>>>> Arne > >>> > >>>>>>>>> > >>> > >>>>>>>>> Arne, > >>> > >>>>>>>>> > >>> > >>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn PASCAL > >>> > >>>>> architecture; I > >>> > >>>>>>>>> see lots of people complaining about it on the forums. I > >>> > >>>>> downloaded > >>> > >>>>>>>>> and > >>> > >>>>>>>>> installed the driver from > >>> > >>>>>>>>> > >>> > >>>>>>>>> > >>> > >>>>>>> > >>> > >>>>> > >>> > >>>> > >>> > >> > > > http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce > >>> > >>>>>>>>> > >>> > >>>>>>>>> That seems to have made a real difference. Check out these > >>> > >>>>> beautiful outputs > >>> > >>>>>>>>> > >>> > >>>>>>>>> root at gpu3$ ls nvidia* > >>> > >>>>>>>>> nvidia0 nvidia1 nvidia2 nvidia3 nvidiactl nvidia-uvm > >>> > >>>>>>>>> nvidia-uvm-tools > >>> > >>>>>>>>> > >>> > >>>>>>>>> root at gpu3$ lspci | grep -i nvidia > >>> > >>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation > >>> Device > >>> > >>>>> 1b00 (rev > >>> > >>>>>>>>> a1) > >>> > >>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev > >>> a1) > >>> > >>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation > >>> Device > >>> > >>>>> 1b00 (rev > >>> > >>>>>>>>> a1) > >>> > >>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev > >>> a1) > >>> > >>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation > >>> Device > >>> > >>>>> 1b00 (rev > >>> >
>>>>>>>>> a1) > >>> > >>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev > >>> a1) > >>> > >>>>>>>>> > >>> > >>>>>>>>> > >>> > >>>>>>>>> root at gpu3$ ls /proc/driver > >>> > >>>>>>>>> nvidia nvidia-uvm nvram rtc > >>> > >>>>>>>>> > >>> > >>>>>>>>> root at gpu3$ lsmod |grep nvidia > >>> > >>>>>>>>> nvidia_uvm 738901 0 > >>> > >>>>>>>>> nvidia_drm 43405 0 > >>> > >>>>>>>>> nvidia_modeset 764432 1 nvidia_drm > >>> > >>>>>>>>> nvidia 11492947 2 nvidia_modeset,nvidia_uvm > >>> > >>>>>>>>> drm_kms_helper 125056 2 ast,nvidia_drm > >>> > >>>>>>>>> drm 349210 5 > >>> > >>>>> ast,ttm,drm_kms_helper,nvidia_drm > >>> > >>>>>>>>> i2c_core 40582 7 > >>> > >>>>>>>>> ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia > >>> > >>>>>>>>> > >>> > >>>>>>>>> root at gpu3$ nvidia-smi > >>> > >>>>>>>>> Wed Oct 12 22:03:27 2016 > >>> > >>>>>>>>> > >>> > >>>>>>> > >>> > >>>>> > >>> > >>>> > >>> > >> > > > +-----------------------------------------------------------------------------+ > >>> > >>>>>>>>> | NVIDIA-SMI 367.57 Driver Version: 367.57 > >>> > >>>>>>>>> | > >>> > >>>>>>>>> > >>> > >>>>>>> > >>> > >>>>> > >>> > >>>> > >>> > >> > > > |-------------------------------+----------------------+----------------------+ > >>> > >>>>>>>>> | GPU Name Persistence-M| Bus-Id Disp.A | > >>> > >>>>> Volatile > >>> > >>>>>>>>> Uncorr. ECC | > >>> > >>>>>>>>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | > >>> > >>>>> GPU-Util > >>> > >>>>>>>>> Compute M. 
> |===============================+======================+======================|
> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID  Type  Process name                               Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+
>
> /usr/local/cuda/extras/demo_suite/deviceQuery
>
>   Alignment requirement for Surfaces:            Yes
>   Device has ECC support:                        Disabled
>   Device supports Unified Addressing (UVA):      Yes
>   Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
>   Compute Mode:
>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
>
> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime
> Version = 8.0, NumDevs = 4, Device0 = TITAN X (Pascal), Device1 = TITAN X
> (Pascal), Device2 = TITAN X (Pascal), Device3 = TITAN X (Pascal)
> Result = PASS
>
> Now not everything is rosy:
>
> root@gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> root@gpu3$ make
> >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
>
> even though those are installed. For example:
>
> root@gpu3$ yum whatprovides */libX11.so
> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> Repo        : core
> Matched from:
> Filename    : /usr/lib/libX11.so
>
> also
>
> mesa-libGLU-devel
> mesa-libGL-devel
> xorg-x11-drv-nvidia-devel
>
> but
>
> root@gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
>
> Also, gpuDevice hangs from MATLAB.
>
> So we still don't have a working installation. Any help would be appreciated.
>
> Best,
> Predrag
>
> P.S. Once we have a working installation we can think of installing Caffe
> and TensorFlow. For now we have to see why things are not working.
>
>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac wrote:
>>
>> Dear Autonians,
>>
>> GPU3 is "configured". Namely, you can log into it and all packages are
>> installed. However, I couldn't get the NVIDIA-provided CUDA driver to
>> recognize the GPU cards. They appear to be properly installed from the
>> hardware point of view and you can list them with
>>
>> lshw -class display
>>
>> root@gpu3$ lshw -class display
>>   *-display UNCLAIMED
>>        description: VGA compatible controller
>>        product: NVIDIA Corporation
>>        vendor: NVIDIA Corporation
>>        physical id: 0
>>        bus info: pci@0000:02:00.0
>>        version: a1
>>        width: 64 bits
>>        clock: 33MHz
>>        capabilities: pm msi pciexpress vga_controller cap_list
>>        configuration: latency=0
>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>          memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
>>          memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
>>          memory:d0000000-d007ffff
>>   *-display UNCLAIMED
>>        description: VGA compatible controller
>>        product: NVIDIA Corporation
>>        vendor: NVIDIA Corporation
>>        physical id: 0
>>        bus info: pci@0000:03:00.0
>>        version: a1
>>        width: 64 bits
>>        clock: 33MHz
>>        capabilities: pm msi pciexpress vga_controller cap_list
>>        configuration: latency=0
>>        resources: iomemory:383f0-383ef iomemory:383f0-383ef
>>          memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>>          memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>>          memory:ce000000-ce07ffff
>>   *-display
>>        description: VGA compatible controller
>>        product: ASPEED Graphics Family
>>        vendor: ASPEED Technology, Inc.
>>        physical id: 0
>>        bus info: pci@0000:06:00.0
>>        version: 30
>>        width: 32 bits
>>        clock: 33MHz
>>        capabilities: pm msi vga_controller bus_master cap_list rom
>>        configuration: driver=ast latency=0
>>        resources: irq:19 memory:cb000000-cbffffff
>>          memory:cc000000-cc01ffff ioport:4000(size=128)
>>   *-display UNCLAIMED
>>        description: VGA compatible controller
>>        product: NVIDIA Corporation
>>        vendor: NVIDIA Corporation
>>        physical id: 0
>>        bus info: pci@0000:82:00.0
>>        version: a1
>>        width: 64 bits
>>        clock: 33MHz
>>        capabilities: pm msi pciexpress vga_controller cap_list
>>        configuration: latency=0
>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>          memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>>          memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>>          memory:fb000000-fb07ffff
>>   *-display UNCLAIMED
>>        description: VGA compatible controller
>>        product: NVIDIA Corporation
>>        vendor: NVIDIA Corporation
>>        physical id: 0
>>        bus info: pci@0000:83:00.0
>>        version: a1
>>        width: 64 bits
>>        clock: 33MHz
>>        capabilities: pm msi pciexpress vga_controller cap_list
>>        configuration: latency=0
>>        resources: iomemory:387f0-387ef iomemory:387f0-387ef
>>          memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>>          memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>>          memory:f9000000-f907ffff
>>
>> However, what scares the hell out of me is that I don't see the NVIDIA
>> driver loaded
>>
>> lsmod | grep nvidia
>>
>> and the device nodes /dev/nvidia* are not created. I am guessing I just
>> missed some trivial step during the CUDA installation, which is very
>> involved. I am unfortunately too tired to debug this tonight.
>>
>> Predrag

Links:
------
[1] http://findgl.mk

From dougal at gmail.com Thu Oct 13 13:58:58 2016
From: dougal at gmail.com (Dougal Sutherland)
Date: Thu, 13 Oct 2016 17:58:58 +0000
Subject: GPU3 is "configured"
In-Reply-To: 
References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu>
 <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu>
 <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu>
 <20161013153826.f4agzWkMb%predragp@cs.cmu.edu>
 <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu>
 <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu>
Message-ID: 

According to the tensorflow site, the conda package doesn't support GPUs.
On Thu, Oct 13, 2016, 6:55 PM Predrag Punosevac
<predragp at imap.srv.cs.cmu.edu> wrote:

> On 2016-10-13 13:51, Dougal Sutherland wrote:
>> I actually haven't gotten tensorflow working yet -- the bazel build
>> just hangs on me. I think it may have to do with home directories
>> being on NFS, but I can't figure out bazel at all. I'll try some more
>> tonight.
>
> According to one of the Princeton guys we could just use conda for
> TensorFlow. Please check it out, and use your scratch directory instead
> of NFS.
>
> Quote:
>
> Hello, Predrag.
>
> We have caffe 1.00rc3 if you are interested.
>
> ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
>
> TensorFlow and protobuf-3 work great with conda
> (http://conda.pydata.org). I just tried and had no problems installing
> it for Python 2.7 and 3.5.
>
>> Caffe should be workable following the instructions Predrag forwarded.
>>
>> - Dougal
>>
>> On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac wrote:
>>
>>> Dear Autonians,
>>>
>>> In case anybody is interested in what happens behind the scenes, Doug
>>> got Caffe and TensorFlow to work on GPU3. Please see the message
>>> below. I also got very useful feedback from the Princeton and Rutgers
>>> people. Please check it out if you care (you will have to log into
>>> Gmail to see the exchange):
>>>
>>> https://groups.google.com/forum/#!forum/springdale-users
>>>
>>> I need to think about how we move forward with this before we start
>>> pulling triggers. If somebody is itchy and can't wait, please build
>>> Caffe and TensorFlow in your scratch directory following the howto
>>> below.
>>>
>>> Predrag
>>>
>>> On 2016-10-13 13:24, Dougal Sutherland wrote:
>>>> A note about cudnn:
>>>>
>>>> There are a bunch of versions of cudnn. They're not
>>>> backwards-compatible, and different versions of
>>>> caffe/tensorflow/whatever want different ones.
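[Editor's note: Predrag's suggestion above (use conda in your scratch directory instead of the NFS home) can be sketched roughly as below. The environment prefix, package names, and `conda` invocations are assumptions for illustration, not a tested recipe for these machines; note Dougal's point that the conda tensorflow package at the time was CPU-only.]

```shell
# Hedged sketch: put a conda environment on local scratch rather than the
# NFS-mounted home directory. The prefix path mirrors the convention used
# elsewhere in this thread; package availability is an assumption.
SCRATCH="/home/scratch/${USER:-demo}"
ENV_PREFIX="$SCRATCH/conda-envs/tf"
echo "environment prefix: $ENV_PREFIX"

# Actual install steps (commented out; they require conda on PATH):
# conda create  --prefix "$ENV_PREFIX" python=2.7 -y
# conda install --prefix "$ENV_PREFIX" tensorflow protobuf -y  # CPU-only build
# source activate "$ENV_PREFIX"
```

Keeping the whole environment under `/home/scratch` avoids the NFS issues suspected in the bazel hang above.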
>>>> I currently am using the setup in ~dsutherl/cudnn_files:
>>>>
>>>> * I have a bunch of versions of the installer there.
>>>> * The use-cudnn.sh script, intended to be used like "source
>>>>   use-cudnn.sh 5.1", will untar the appropriate one into a scratch
>>>>   directory (if it hasn't already been done) and set
>>>>   CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately. LD_LIBRARY_PATH
>>>>   is needed for caffe binaries, since they don't link to the absolute
>>>>   path; the first two (not sure about the third) are needed for
>>>>   theano. Dunno about tensorflow yet.
>>>>
>>>> So, here's the Caffe setup:
>>>>
>>>> cd /home/scratch/$USER
>>>> git clone https://github.com/BVLC/caffe
>>>> cd caffe
>>>> cp Makefile.config.example Makefile.config
>>>>
>>>> # tell it to use openblas; using atlas needs some changes to the Makefile
>>>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
>>>>
>>>> # configure to use cudnn (optional)
>>>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
>>>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
>>>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
>>>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
>>>>
>>>> # build the library
>>>> make -j23
>>>>
>>>> # to do tests (takes ~10 minutes):
>>>> make -j23 test
>>>> make runtest
>>>>
>>>> # Now, to run caffe binaries you'll need to remember to source
>>>> # use-cudnn if you used cudnn before.
>>>>
>>>> # To build the python library:
>>>> make py
>>>>
>>>> # Requirements for the python library:
>>>> # Some of the system packages are too old; this installs them in your
>>>> # scratch directory.
>>>> # You'll have to set PYTHONUSERBASE again before running any python
>>>> # processes that use these libs.
>>>> export PYTHONUSERBASE=$HOME/scratch/.local
>>>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
>>>> pip install --user -r python/requirements.txt
>>>>
>>>> # Caffe is dumb and doesn't package its python library properly. The
>>>> # easiest way to use it is:
>>>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
>>>> python -c 'import caffe'
>>>>
>>>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland wrote:
>>>>
>>>>> Java fix seemed to work. Now tensorflow wants python-wheel and swig.
>>>>>
>>>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac wrote:
>>>>>
>>>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
>>>>>>
>>>>>>> Having some trouble with tensorflow, because:
>>>>>>>
>>>>>>> * it requires Google's bazel build system
>>>>>>> * The bazel installer says
>>>>>>>   "Java version is 1.7.0_111 while at least 1.8 is needed."
>>>>>>>
>>>>>>> * $ java -version
>>>>>>>   openjdk version "1.8.0_102"
>>>>>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
>>>>>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
>>>>>>>   $ javac -version
>>>>>>>   javac 1.7.0_111
>>>>>>
>>>>>> I just did yum -y install java-1.8.0* which installs openjdk 1.8.
>>>>>> Please change your java.
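[Editor's note: a minimal sketch of a "use-cudnn" helper like the one Dougal describes above: untar a versioned cuDNN tarball into scratch once, then export the compiler and linker search paths. The tarball name and layout, and the scratch location, are assumptions; only the three exported variables come from the thread.]

```shell
# Sketch of use-cudnn.sh; intended to be sourced, e.g.:  source use-cudnn.sh 5.1
VER="${1:-5.1}"
SCRATCH="${SCRATCH:-/home/scratch/${USER:-demo}}"
CUDNN_DIR="$SCRATCH/cudnn-$VER"

# Untar once into scratch (tarball name/layout is a guess):
if [ ! -d "$CUDNN_DIR/include" ]; then
  mkdir -p "$CUDNN_DIR" 2>/dev/null || true
  # tar xzf ~dsutherl/cudnn_files/cudnn-$VER-linux-x64.tgz \
  #     -C "$CUDNN_DIR" --strip-components=1
fi

export CPATH="$CUDNN_DIR/include${CPATH:+:$CPATH}"                        # headers at compile time
export LIBRARY_PATH="$CUDNN_DIR/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"     # libs at link time
export LD_LIBRARY_PATH="$CUDNN_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"  # libs at run time (caffe binaries)
echo "using cuDNN $VER from $CUDNN_DIR"
```

Because the exports must land in the calling shell, the script has to be sourced rather than executed, which matches the "source use-cudnn.sh 5.1" usage above.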
>>>>>> Let me know if you want me to install Oracle JDK 1.8.
>>>>>>
>>>>>> Predrag
>>>>>>
>>>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac wrote:
>>>>>>>
>>>>>>> Dougal Sutherland wrote:
>>>>>>>
>>>>>>>> Also, this seemed to work for me so far for protobuf:
>>>>>>>>
>>>>>>>> cd /home/scratch/$USER
>>>>>>>> VER=3.1.0
>>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
>>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
>>>>>>>> cd protobuf-cpp-$VER
>>>>>>>> ./configure --prefix=/home/scratch/$USER
>>>>>>>> make -j12
>>>>>>>> make -j12 check
>>>>>>>> make install
>>>>>>>
>>>>>>> That is great help!
>>>>>>>
>>>>>>>> You could change --prefix=/usr if making an RPM.
>>>>>>>>
>>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland wrote:
>>>>>>>>
>>>>>>>>> Some more packages for caffe:
>>>>>>>>>
>>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
>>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
>>>>>>>>>
>>>>>>>>> (Some of those might be installed already, but at least gflags
>>>>>>>>> is definitely missing.)
>>>>>>>>>
>>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac
>>>>>>>>> <predragp at imap.srv.cs.cmu.edu> wrote:
>>>>>>>>>
>>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
>>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice
>>>>>>>>>> also hangs on my computer with R2016a.
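[Editor's note: a `--prefix=/home/scratch/$USER` install like the protobuf recipe above is only visible to later builds if the usual search paths point at it. A hedged sketch of the exports one would typically add; the variable names are standard POSIX/pkg-config conventions, while the exact subdirectories are assumptions about what `make install` produced.]

```shell
# Make a --prefix=/home/scratch/$USER install usable by later builds.
PREFIX="/home/scratch/${USER:-demo}"
export PATH="$PREFIX/bin${PATH:+:$PATH}"                                       # e.g. protoc
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"  # e.g. protobuf.pc
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"      # e.g. libprotobuf.so at run time
echo "prefix: $PREFIX"
# protoc --version   # should now resolve to $PREFIX/bin/protoc
```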
>>>>>>>>> We would have to escalate this with MathWorks. I have seen
>>>>>>>>> workarounds on the Internet, but it looks like a bug in one of
>>>>>>>>> the MathWorks-provided MEX files.
>>>>>>>>>
>>>>>>>>>> I was able to compile the matrixMul example in the CUDA samples
>>>>>>>>>> and run it on gpu3, so I think the build environment is
>>>>>>>>>> probably all set.
>>>>>>>>>>
>>>>>>>>>> As for the openGL, I think it's possibly a problem with their
>>>>>>>>>> build script findgl.mk [1], which is not familiar with
>>>>>>>>>> Springdale OS. The demo_suite directory has a precompiled nbody
>>>>>>>>>> binary you may try, but I suspect most users will not need
>>>>>>>>>> graphics.
>>>>>>>>>
>>>>>>>>> That should not be too hard to fix. Some header files have to be
>>>>>>>>> manually edited. The funny part is that until 7.2 the Princeton
>>>>>>>>> people didn't bother to remove the RHEL branding, which actually
>>>>>>>>> made things easier for us.
>>>>>>>>>
>>>>>>>>> Doug is trying right now to compile the latest Caffe,
>>>>>>>>> TensorFlow, and protobuf-3. We will try to create an RPM for
>>>>>>>>> that so that we don't have to go through this again. I also
>>>>>>>>> asked the Princeton and Rutgers guys if they have WIP RPMs to
>>>>>>>>> share.
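[Editor's note: Arne's findgl.mk point suggests a workaround worth trying before editing headers. The CUDA samples' find scripts search a short list of known distro library paths, and overriding that search with a GLPATH variable is a common fix on unrecognized distros; treat both the GLPATH behavior and the library directory below as assumptions to verify against the findgl.mk shipped with the samples.]

```shell
# Hypothetical workaround for the libGL/libGLU/libX11 "not found" warnings:
# point the samples' OpenGL search at the directory that holds the libraries.
GLPATH=/usr/lib64   # where mesa-libGL-devel installs libGL.so on x86_64 (assumption)
export GLPATH
echo "GLPATH=$GLPATH"
# cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
# make
```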
>>>>>>>>> Predrag
>>>>>>>>>
>>>>>>>>>> Arne
>>>>>>>>>>
>>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac wrote:
>>>>>>>>>>>
>>>>>>>>>>> Arne Suppe wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Predrag,
>>>>>>>>>>>> Don't know if this applies to you, but I just built a machine
>>>>>>>>>>>> with a GTX 1080, which has the same Pascal architecture as
>>>>>>>>>>>> the Titan. After installing CUDA 8, I still found I needed to
>>>>>>>>>>>> install the latest driver off of the NVIDIA web site to get
>>>>>>>>>>>> the card recognized. Right now, I am running 367.44.
>>>>>>>>>>>>
>>>>>>>>>>>> Arne
>>>>>>>>>>>
>>>>>>>>>>> Arne,
>>>>>>>>>>>
>>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn Pascal
>>>>>>>>>>> architecture; I see lots of people complaining about it on the
>>>>>>>>>>> forums. I downloaded and installed the driver from
>>>>>>>>>>>
>>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
>>>>>>>>>>>
>>>>>>>>>>> That seems to have made a real difference. Check out these
>>>>>>>>>>> beautiful outputs:
>>>>>>>>>>>
>>>>>>>>>>> root@gpu3$ ls nvidia*
>>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
>>>>>>>>>>> nvidia-uvm-tools
>>>>>>>>>>>
>>>>>>>>>>> root@gpu3$ lspci | grep -i nvidia
>>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
>>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
>>>>>>>>>>>
>>>>>>>>>>> root@gpu3$ ls /proc/driver
>>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
>>>>>>>>>>>
>>>>>>>>>>> root@gpu3$ lsmod | grep nvidia
>>>>>>>>>>> nvidia_uvm            738901  0
>>>>>>>>>>> nvidia_drm             43405  0
>>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
>>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
>>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
>>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
>>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>>>>>>>>>>> root at gpu3$ nvidia-smi > >>>>> > >>>>>>>>>>> Wed Oct 12 22:03:27 2016 > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-----------------------------------------------------------------------------+ > >>>>> > >>>>>>>>>>> | NVIDIA-SMI 367.57 Driver Version: 367.57 > >>>>> > >>>>>>>>>>> | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > |-------------------------------+----------------------+----------------------+ > >>>>> > >>>>>>>>>>> | GPU Name Persistence-M| Bus-Id Disp.A | > >>>>> > >>>>>>> Volatile > >>>>> > >>>>>>>>>>> Uncorr. ECC | > >>>>> > >>>>>>>>>>> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | > >>>>> > >>>>>>> GPU-Util > >>>>> > >>>>>>>>>>> Compute M. | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > |===============================+======================+======================| > >>>>> > >>>>>>>>>>> | 0 TITAN X (Pascal) Off | 0000:02:00.0 Off | > >>>>> > >>>>>>>>>>> N/A | > >>>>> > >>>>>>>>>>> | 23% 32C P0 56W / 250W | 0MiB / 12189MiB | > >>>>> > >>>>>>> 0% > >>>>> > >>>>>>>>>>> Default | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-------------------------------+----------------------+----------------------+ > >>>>> > >>>>>>>>>>> | 1 TITAN X (Pascal) Off | 0000:03:00.0 Off | > >>>>> > >>>>>>>>>>> N/A | > >>>>> > >>>>>>>>>>> | 23% 36C P0 57W / 250W | 0MiB / 12189MiB | > >>>>> > >>>>>>> 0% > >>>>> > >>>>>>>>>>> Default | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-------------------------------+----------------------+----------------------+ > >>>>> > >>>>>>>>>>> | 2 TITAN X (Pascal) Off | 0000:82:00.0 Off | > >>>>> > >>>>>>>>>>> N/A | > >>>>> > >>>>>>>>>>> | 23% 35C P0 57W / 250W | 0MiB / 12189MiB | > >>>>> > >>>>>>> 0% > >>>>> > 
>>>>>>>>>>> Default | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-------------------------------+----------------------+----------------------+ > >>>>> > >>>>>>>>>>> | 3 TITAN X (Pascal) Off | 0000:83:00.0 Off | > >>>>> > >>>>>>>>>>> N/A | > >>>>> > >>>>>>>>>>> | 0% 35C P0 56W / 250W | 0MiB / 12189MiB | > >>>>> > >>>>>>> 0% > >>>>> > >>>>>>>>>>> Default | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-------------------------------+----------------------+----------------------+ > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-----------------------------------------------------------------------------+ > >>>>> > >>>>>>>>>>> | Processes: > >>>>> > >>>>>>> GPU > >>>>> > >>>>>>>>>>> Memory | > >>>>> > >>>>>>>>>>> | GPU PID Type Process name > >>>>> > >>>>>>>>>>> Usage > >>>>> > >>>>>>>>>>> | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > |=============================================================================| > >>>>> > >>>>>>>>>>> | No running processes found > >>>>> > >>>>>>>>>>> | > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>> > >>>>> > >>>>>>> > >>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > > +-----------------------------------------------------------------------------+ > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> Alignment requirement for Surfaces: Yes > >>>>> > >>>>>>>>>>> Device has ECC support: Disabled > >>>>> > >>>>>>>>>>> Device supports Unified Addressing (UVA): Yes > >>>>> > >>>>>>>>>>> Device PCI Domain ID / Bus ID / location ID: 0 / 131 / > >>>>> 0 > >>>>> > >>>>>>>>>>> Compute Mode: > >>>>> > >>>>>>>>>>> < Default (multiple 
host threads can use > >>>>> > >>>>>>> ::cudaSetDevice() with > >>>>> > >>>>>>>>>>> device simultaneously) > > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU1) : > >>>>> > >>>>>>>>>>> Yes > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU2) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU3) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU0) : > >>>>> > >>>>>>>>>>> Yes > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU2) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU3) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU0) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU1) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU3) : > >>>>> > >>>>>>>>>>> Yes > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU0) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU1) : > >>>>> > >>>>>>>>>>> No > >>>>> > >>>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X > >>>>> (Pascal) > >>>>> > >>>>>>> (GPU2) : > >>>>> > >>>>>>>>>>> Yes > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = > >>>>> 8.0, > >>>>> > >>>>>>> CUDA > >>>>> > >>>>>>>>>>> 
Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X > >>>>> > >>>>>>> (Pascal), > >>>>> > >>>>>>>>>>> Device1 > >>>>> > >>>>>>>>>>> = TITAN X (Pascal), Device2 = TITAN X (Pascal), Device3 = > >>>>> > >>>>>>> TITAN X > >>>>> > >>>>>>>>>>> (Pascal) > >>>>> > >>>>>>>>>>> Result = PASS > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> Now not everything is rosy > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> root at gpu3$ cd > >>>>> ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody > >>>>> > >>>>>>>>>>> root at gpu3$ make > >>>>> > >>>>>>>>>>>>>> WARNING - libGL.so not found, refer to CUDA Getting > >>>>> > >>>>>>> Started Guide > >>>>> > >>>>>>>>>>> for how to find and install them. <<< > >>>>> > >>>>>>>>>>>>>> WARNING - libGLU.so not found, refer to CUDA Getting > >>>>> > >>>>>>> Started Guide > >>>>> > >>>>>>>>>>> for how to find and install them. <<< > >>>>> > >>>>>>>>>>>>>> WARNING - libX11.so not found, refer to CUDA Getting > >>>>> > >>>>>>> Started Guide > >>>>> > >>>>>>>>>>> for how to find and install them. <<< > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> > >>>>> > >>>>>>>>>>> even though those are installed. 
> For example
>
> root@gpu3$ yum whatprovides */libX11.so
> libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> Repo        : core
> Matched from:
> Filename    : /usr/lib/libX11.so
>
> also
>
> mesa-libGLU-devel
> mesa-libGL-devel
> xorg-x11-drv-nvidia-devel
>
> but
>
> root@gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
>
> Also, gpuDevice hangs from within MATLAB.
>
> So we still don't have a working installation. Any help would be appreciated.
>
> Best,
> Predrag
>
> P.S. Once we have a working installation we can think of installing
> Caffe and TensorFlow. For now we have to see why things are not working.
>
> > On Oct 12, 2016, at 6:26 PM, Predrag Punosevac wrote:
> >
> > Dear Autonians,
> >
> > GPU3 is "configured". Namely, you can log into it and all packages are
> > installed. However, I couldn't get the NVIDIA-provided CUDA driver to
> > recognize the GPU cards. They appear to be properly installed from the
> > hardware point of view, and you can list them with
> >
> > lshw -class display
> >
> > root@gpu3$ lshw -class display
> >   *-display UNCLAIMED
> >        description: VGA compatible controller
> >        product: NVIDIA Corporation
> >        vendor: NVIDIA Corporation
> >        physical id: 0
> >        bus info: pci@0000:02:00.0
> >        version: a1
> >        width: 64 bits
> >        clock: 33MHz
> >        capabilities: pm msi pciexpress vga_controller cap_list
> >        configuration: latency=0
> >        resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >          memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
> >          memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
> >          memory:d0000000-d007ffff
> >   *-display UNCLAIMED
> >        description: VGA compatible controller
> >        product: NVIDIA Corporation
> >        vendor: NVIDIA Corporation
> >        physical id: 0
> >        bus info: pci@0000:03:00.0
> >        version: a1
> >        width: 64 bits
> >        clock: 33MHz
> >        capabilities: pm msi pciexpress vga_controller cap_list
> >        configuration: latency=0
> >        resources: iomemory:383f0-383ef iomemory:383f0-383ef
> >          memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
> >          memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
> >          memory:ce000000-ce07ffff
> >   *-display
> >        description: VGA compatible controller
> >        product: ASPEED Graphics Family
> >        vendor: ASPEED Technology, Inc.
> >        physical id: 0
> >        bus info: pci@0000:06:00.0
> >        version: 30
> >        width: 32 bits
> >        clock: 33MHz
> >        capabilities: pm msi vga_controller bus_master cap_list rom
> >        configuration: driver=ast latency=0
> >        resources: irq:19 memory:cb000000-cbffffff
> >          memory:cc000000-cc01ffff ioport:4000(size=128)
> >   *-display UNCLAIMED
> >        description: VGA compatible controller
> >        product: NVIDIA Corporation
> >        vendor: NVIDIA Corporation
> >        physical id: 0
> >        bus info: pci@0000:82:00.0
> >        version: a1
> >        width: 64 bits
> >        clock: 33MHz
> >        capabilities: pm msi pciexpress vga_controller cap_list
> >        configuration: latency=0
> >        resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >          memory:fa000000-faffffff memory:387fe0000000-387fefffffff
> >          memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
> >          memory:fb000000-fb07ffff
> >   *-display UNCLAIMED
> >        description: VGA compatible controller
> >        product: NVIDIA Corporation
> >        vendor: NVIDIA Corporation
> >        physical id: 0
> >        bus info: pci@0000:83:00.0
> >        version: a1
> >        width: 64 bits
> >        clock: 33MHz
> >        capabilities: pm msi pciexpress vga_controller cap_list
> >        configuration: latency=0
> >        resources: iomemory:387f0-387ef iomemory:387f0-387ef
> >          memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
> >          memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
> >          memory:f9000000-f907ffff
> >
> > However, what scares the hell out of me is that I don't see the NVIDIA
> > driver loaded
> >
> > lsmod | grep nvidia
> >
> > and the device nodes /dev/nvidia* are not created. I am guessing I just
> > missed some trivial step during the CUDA installation, which is very
> > involved. I am unfortunately too tired to debug this tonight.
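The check described here — grepping lsmod for the NVIDIA kernel modules — is easy to script. A minimal sketch (a hypothetical helper, not something installed on gpu3; the module names are taken from the lsmod output quoted later in this thread):

```python
def loaded_nvidia_modules(lsmod_output):
    """Return names of NVIDIA kernel modules found in `lsmod` output.

    An empty result matches the symptom described above: the driver is
    not loaded and the /dev/nvidia* device nodes were never created.
    """
    modules = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        # lsmod lines start with the module name; NVIDIA's are nvidia,
        # nvidia_uvm, nvidia_drm, nvidia_modeset, ...
        if fields and fields[0].startswith("nvidia"):
            modules.append(fields[0])
    return modules
```

In practice you would feed it the output of `subprocess.check_output(["lsmod"], text=True)` and treat an empty list as "driver not loaded".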
> > Predrag

Links:
------
[1] http://findgl.mk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dbayani at andrew.cmu.edu Mon Oct 17 02:13:05 2016
From: dbayani at andrew.cmu.edu (David Bayani)
Date: Sun, 16 Oct 2016 23:13:05 -0700
Subject: Auton Lab Website Personnel List
Message-ID:

Dear Autonians-

We will be updating the website's personnel list in the near future. Beyond
what was described in the previous website-related email (sent October
5th), no action is needed from any lab members. However, if you would
prefer that we not list you online for whatever reason (the list carries no
more information than what is currently standard for our website), feel
free to contact me so that we can work something out. We consider it
important to respect personal wishes regarding content release.

Sincerely,
David B.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kandasamy at cmu.edu Mon Oct 17 18:23:47 2016
From: kandasamy at cmu.edu (Kirthevasan Kandasamy)
Date: Mon, 17 Oct 2016 18:23:47 -0400
Subject: GPU3 is "configured"
In-Reply-To:
References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu> <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu> <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu> <20161013153826.f4agzWkMb%predragp@cs.cmu.edu> <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu> <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu>
Message-ID:

Hi,

Just following up. Has anyone managed to resolve this yet? I still can't
run TensorFlow on gpu3.

samy

On Thu, Oct 13, 2016 at 1:58 PM, Dougal Sutherland wrote:

> According to the tensorflow site, the conda package doesn't support GPUs.
> > On Thu, Oct 13, 2016, 6:55 PM Predrag Punosevac wrote:
> >
> > > On 2016-10-13 13:51, Dougal Sutherland wrote:
> > > > I actually haven't gotten tensorflow working yet -- the bazel build
> > > > just hangs on me. I think it may have to do with home directories
> > > > being on NFS, but I can't figure out bazel at all. I'll try some
> > > > more tonight.
> > >
> > > According to one of the Princeton guys, we could just use conda for
> > > TensorFlow. Please check it out, and use your scratch directory
> > > instead of NFS.
> > >
> > > Quote:
> > >
> > > Hello, Predrag.
> > >
> > > We have caffe 1.00rc3 if you are interested.
> > >
> > > ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
> > >
> > > TensorFlow and protobuf-3 work great with conda
> > > (http://conda.pydata.org). I just tried and had no problems
> > > installing it for Python 2.7 and 3.5.
> > >
> > > > Caffe should be workable following the instructions Predrag
> > > > forwarded.
> > > >
> > > > - Dougal
> > > >
> > > > On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac wrote:
> > > >
> > > > > Dear Autonians,
> > > > >
> > > > > In case anybody is interested in what happens behind the scenes,
> > > > > Doug got Caffe and TensorFlow to work on GPU3. Please see the
> > > > > message below. I also got very useful feedback from the Princeton
> > > > > and Rutgers people. Please check it out if you care (you will
> > > > > have to log into Gmail to see the exchange).
> > > > >
> > > > > https://groups.google.com/forum/#!forum/springdale-users
> > > > >
> > > > > I need to think about how we move forward with this before we
> > > > > start pulling triggers. If somebody is itchy and can't wait,
> > > > > please build Caffe and TensorFlow in your scratch directory
> > > > > following the howto below.
> > > > >
> > > > > Predrag
> > > > >
> > > > > On 2016-10-13 13:24, Dougal Sutherland wrote:
> > > > > > A note about cudnn:
> > > > > >
> > > > > > There are a bunch of versions of cudnn. They're not
> > > > > > backwards-compatible, and different versions of
> > > > > > caffe/tensorflow/whatever want different ones.
> > > > > >
> > > > > > I am currently using the setup in ~dsutherl/cudnn_files:
> > > > > >
> > > > > > * I have a bunch of versions of the installer there.
> > > > > > * The use-cudnn.sh script, intended to be used like "source
> > > > > >   use-cudnn.sh 5.1", will untar the appropriate one into a
> > > > > >   scratch directory (if it hasn't already been done) and set
> > > > > >   CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately.
> > > > > >   LD_LIBRARY_PATH is needed for caffe binaries, since they
> > > > > >   don't link to the absolute path; the first two (not sure
> > > > > >   about the third) are needed for theano. Dunno about
> > > > > >   tensorflow yet.
> > > > > >
> > > > > > So, here's the Caffe setup:
> > > > > >
> > > > > > cd /home/scratch/$USER
> > > > > > git clone https://github.com/BVLC/caffe
> > > > > > cd caffe
> > > > > > cp Makefile.config.example Makefile.config
> > > > > >
> > > > > > # tell it to use openblas; using atlas needs some changes to
> > > > > > # the Makefile
> > > > > > sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
> > > > > >
> > > > > > # configure to use cudnn (optional)
> > > > > > source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
> > > > > > sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
> > > > > > perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
> > > > > > perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
> > > > > >
> > > > > > # build the library
> > > > > > make -j23
> > > > > >
> > > > > > # to do tests (takes ~10 minutes):
> > > > > > make -j23 test
> > > > > > make runtest
> > > > > >
> > > > > > # Now, to run caffe binaries you'll need to remember to source
> > > > > > # use-cudnn if you used cudnn before.
> > > > > >
> > > > > > # To build the python library:
> > > > > > make py
> > > > > >
> > > > > > # Requirements for the python library:
> > > > > > # Some of the system packages are too old; this installs them
> > > > > > # in your scratch directory.
> > > > > > # You'll have to set PYTHONUSERBASE again before running any
> > > > > > # python processes that use these libs.
> > > > > > export PYTHONUSERBASE=$HOME/scratch/.local
> > > > > > export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
> > > > > > pip install --user -r python/requirements.txt
> > > > > >
> > > > > > # Caffe is dumb and doesn't package its python library
> > > > > > # properly. The easiest way to use it is:
> > > > > > export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
> > > > > > python -c 'import caffe'
> > > > > >
> > > > > > On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland wrote:
> > > > > >
> > > > > > > Java fix seemed to work. Now tensorflow wants python-wheel
> > > > > > > and swig.
> > > > > > >
> > > > > > > On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac wrote:
> > > > > > >
> > > > > > > > On 2016-10-13 11:46, Dougal Sutherland wrote:
> > > > > > > > > Having some trouble with tensorflow, because:
> > > > > > > > >
> > > > > > > > > * it requires Google's bazel build system
> > > > > > > > > * the bazel installer says
> > > > > > > > >   Java version is 1.7.0_111 while at least 1.8 is needed.
> > > > > > > > >
> > > > > > > > > $ java -version
> > > > > > > > > openjdk version "1.8.0_102"
> > > > > > > > > OpenJDK Runtime Environment (build 1.8.0_102-b14)
> > > > > > > > > OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
> > > > > > > > > $ javac -version
> > > > > > > > > javac 1.7.0_111
> > > > > > > >
> > > > > > > > I just did yum -y install java-1.8.0*, which installs
> > > > > > > > OpenJDK 1.8. Please change your java. Let me know if you
> > > > > > > > want me to install Oracle JDK 1.8.
> > > > > > > >
> > > > > > > > Predrag
> > > > > > > >
> > > > > > > > On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac wrote:
> > > > > > > >
> > > > > > > > > Dougal Sutherland wrote:
> > > > > > > > >
> > > > > > > > > > Also, this seemed to work for me so far for protobuf:
> > > > > > > > > >
> > > > > > > > > > cd /home/scratch/$USER
> > > > > > > > > > VER=3.1.0
> > > > > > > > > > wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
> > > > > > > > > > tar xf protobuf-cpp-$VER.tar.gz
> > > > > > > > > > cd protobuf-cpp-$VER
> > > > > > > > > > ./configure --prefix=/home/scratch/$USER
> > > > > > > > > > make -j12
> > > > > > > > > > make -j12 check
> > > > > > > > > > make install
> > > > > > > > > >
> > > > > > > > > > You could change --prefix=/usr if making an RPM.
> > > > > > > > >
> > > > > > > > > That is great help!
> > > > > > > > >
> > > > > > > > > > On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland wrote:
> > > > > > > > > >
> > > > > > > > > > > Some more packages for caffe:
> > > > > > > > > > >
> > > > > > > > > > > leveldb-devel snappy-devel opencv-devel boost-devel
> > > > > > > > > > > hdf5-devel gflags-devel glog-devel lmdb-devel
> > > > > > > > > > >
> > > > > > > > > > > (Some of those might be installed already, but at
> > > > > > > > > > > least gflags is definitely missing.)
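The environment juggling in these build notes (PYTHONUSERBASE and PYTHONPATH for the Caffe python library, plus the CPATH/LIBRARY_PATH/LD_LIBRARY_PATH that use-cudnn.sh exports) can be collected in one place. A sketch only — the helper name is invented, and the paths follow the /home/scratch/$USER convention described in this thread, not anything verified on gpu3:

```python
import os


def scratch_env(user, cudnn_dir=None, scratch_root="/home/scratch"):
    """Build the environment variables the build notes export by hand.

    Returns a dict suitable for os.environ.update(). If cudnn_dir is
    given (the directory use-cudnn.sh untars into), the cuDNN include
    and lib64 paths are added as well.
    """
    scratch = os.path.join(scratch_root, user)
    env = {
        # for `pip install --user` of Caffe's python requirements
        "PYTHONUSERBASE": os.path.join(scratch, ".local"),
        # Caffe doesn't package its python library; point at the tree
        "PYTHONPATH": os.path.join(scratch, "caffe", "python"),
    }
    if cudnn_dir:
        env["CPATH"] = os.path.join(cudnn_dir, "include")
        env["LIBRARY_PATH"] = os.path.join(cudnn_dir, "lib64")
        env["LD_LIBRARY_PATH"] = os.path.join(cudnn_dir, "lib64")
    return env
```

Calling `os.environ.update(scratch_env(...))` before launching a python process mirrors the shell exports above; it does not replace sourcing use-cudnn.sh for compiled caffe binaries.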
> > > > > > > > > > > On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On 2016-10-12 23:26, Arne Suppe wrote:
> > > > > > > > > > > > > Hmm - I don't use matlab for deep learning, but
> > > > > > > > > > > > > gpuDevice also hangs on my computer with R2016a.
> > > > > > > > > > > >
> > > > > > > > > > > > We would have to escalate this with MathWorks. I
> > > > > > > > > > > > have seen workarounds on the Internet, but it looks
> > > > > > > > > > > > like a bug in one of the MathWorks-provided MEX
> > > > > > > > > > > > files.
> > > > > > > > > > > >
> > > > > > > > > > > > > I was able to compile the matrixMul example in the
> > > > > > > > > > > > > CUDA samples and run it on gpu3, so I think the
> > > > > > > > > > > > > build environment is probably all set.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As for the OpenGL warnings, I think it's possibly
> > > > > > > > > > > > > a problem with their build script findgl.mk [1],
> > > > > > > > > > > > > which is not familiar with Springdale OS. The
> > > > > > > > > > > > > demo_suite directory has a precompiled nbody
> > > > > > > > > > > > > binary you may try, but I suspect most users will
> > > > > > > > > > > > > not need graphics.
> > > > > > > > > > > >
> > > > > > > > > > > > That should not be too hard to fix. Some header
> > > > > > > > > > > > files have to be manually edited. The funny part is
> > > > > > > > > > > > that until 7.2 the Princeton people didn't bother to
> > > > > > > > > > > > remove the RHEL branding, which actually made things
> > > > > > > > > > > > easier for us.
> > > > > > > > > > > >
> > > > > > > > > > > > Doug is trying right now to compile the latest
> > > > > > > > > > > > Caffe, TensorFlow, and protobuf-3. We will try to
> > > > > > > > > > > > create an RPM for that so that we don't have to go
> > > > > > > > > > > > through this again. I also asked the Princeton and
> > > > > > > > > > > > Rutgers guys if they have WIP RPMs to share.
> > > > > > > > > > > >
> > > > > > > > > > > > Predrag
> > > > > > > > > > > >
> > > > > > > > > > > > > Arne
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Oct 12, 2016, at 10:23 PM, Predrag Punosevac wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Arne Suppe wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Predrag,
> > > > > > > > > > > > > > Don't know if this applies to you, but I just
> > > > > > > > > > > > > > built a machine with a GTX 1080, which has the
> > > > > > > > > > > > > > same Pascal architecture as the Titan. After
> > > > > > > > > > > > > > installing CUDA 8, I still found I needed to
> > > > > > > > > > > > > > install the latest driver off of the NVIDIA web
> > > > > > > > > > > > > > site to get the card recognized. Right now, I am
> > > > > > > > > > > > > > running 367.44.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Arne
> > > > > > > > > > > > >
> > > > > > > > > > > > > Arne,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you so much for this e-mail. Yes, it is the
> > > > > > > > > > > > > damn Pascal architecture; I see lots of people
> > > > > > > > > > > > > complaining about it on the forums. I downloaded
> > > > > > > > > > > > > and installed the driver from
> > > > > > > > > > > > >
> > > > > > > > > > > > > http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
> > > > > > > > > > > > >
> > > > > > > > > > > > > That seems to have made a real difference. Check
> > > > > > > > > > > > > out this beautiful output:
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ ls nvidia*
> > > > > > > > > > > > > nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl
> > > > > > > > > > > > > nvidia-uvm  nvidia-uvm-tools
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ lspci | grep -i nvidia
> > > > > > > > > > > > > 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> > > > > > > > > > > > > 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> > > > > > > > > > > > > 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> > > > > > > > > > > > > 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> > > > > > > > > > > > > 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> > > > > > > > > > > > > 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> > > > > > > > > > > > > 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> > > > > > > > > > > > > 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ ls /proc/driver
> > > > > > > > > > > > > nvidia  nvidia-uvm  nvram  rtc
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ lsmod | grep nvidia
> > > > > > > > > > > > > nvidia_uvm            738901  0
> > > > > > > > > > > > > nvidia_drm             43405  0
> > > > > > > > > > > > > nvidia_modeset        764432  1 nvidia_drm
> > > > > > > > > > > > > nvidia              11492947  2 nvidia_modeset,nvidia_uvm
> > > > > > > > > > > > > drm_kms_helper        125056  2 ast,nvidia_drm
> > > > > > > > > > > > > drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
> > > > > > > > > > > > > i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ nvidia-smi
> > > > > > > > > > > > > Wed Oct 12 22:03:27 2016
> > > > > > > > > > > > > +------------------------------------------------------------------------------+
> > > > > > > > > > > > > | NVIDIA-SMI 367.57                  Driver Version: 367.57                    |
> > > > > > > > > > > > > |-------------------------------+----------------------+-----------------------+
> > > > > > > > > > > > > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC  |
> > > > > > > > > > > > > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M.  |
> > > > > > > > > > > > > |===============================+======================+=======================|
> > > > > > > > > > > > > |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A  |
> > > > > > > > > > > > > | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%       Default |
> > > > > > > > > > > > > +-------------------------------+----------------------+-----------------------+
> > > > > > > > > > > > > |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A  |
> > > > > > > > > > > > > | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%       Default |
> > > > > > > > > > > > > +-------------------------------+----------------------+-----------------------+
> > > > > > > > > > > > > |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A  |
> > > > > > > > > > > > > | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%       Default |
> > > > > > > > > > > > > +-------------------------------+----------------------+-----------------------+
> > > > > > > > > > > > > |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A  |
> > > > > > > > > > > > > |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%       Default |
> > > > > > > > > > > > > +-------------------------------+----------------------+-----------------------+
> > > > > > > > > > > > > |  Processes:                                                       GPU Memory |
> > > > > > > > > > > > > |  GPU       PID  Type  Process name                                Usage      |
> > > > > > > > > > > > > |==============================================================================|
> > > > > > > > > > > > > |  No running processes found                                                  |
> > > > > > > > > > > > > +------------------------------------------------------------------------------+
> > > > > > > > > > > > >
> > > > > > > > > > > > > /usr/local/cuda/extras/demo_suite/deviceQuery
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alignment requirement for Surfaces: Yes
> > > > > > > > > > > > > Device has ECC support: Disabled
> > > > > > > > > > > > > Device supports Unified Addressing (UVA): Yes
> > > > > > > > > > > > > Device PCI Domain ID / Bus ID / location ID: 0 / 131 / 0
> > > > > > > > > > > > > Compute Mode:
> > > > > > > > > > > > >   < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> > > > > > > > > > > > > Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
> > > > > > > > > > > > >
> > > > > > > > > > > > > deviceQuery, CUDA Driver = CUDART, CUDA Driver
> > > > > > > > > > > > > Version = 8.0, CUDA Runtime Version = 8.0,
> > > > > > > > > > > > > NumDevs = 4, Device0 = TITAN X (Pascal), Device1 =
> > > > > > > > > > > > > TITAN X (Pascal), Device2 = TITAN X (Pascal),
> > > > > > > > > > > > > Device3 = TITAN X (Pascal)
> > > > > > > > > > > > > Result = PASS
> > > > > > > > > > > > >
> > > > > > > > > > > > > Now not everything is rosy:
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> > > > > > > > > > > > > root@gpu3$ make
> > > > > > > > > > > > > >>> WARNING - libGL.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> > > > > > > > > > > > > >>> WARNING - libGLU.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> > > > > > > > > > > > > >>> WARNING - libX11.so not found, refer to CUDA Getting Started Guide for how to find and install them. <<<
> > > > > > > > > > > > >
> > > > > > > > > > > > > even though those are installed. For example
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ yum whatprovides */libX11.so
> > > > > > > > > > > > > libX11-devel-1.6.3-2.el7.i686 : Development files for libX11
> > > > > > > > > > > > > Repo        : core
> > > > > > > > > > > > > Matched from:
> > > > > > > > > > > > > Filename    : /usr/lib/libX11.so
> > > > > > > > > > > > >
> > > > > > > > > > > > > also mesa-libGLU-devel, mesa-libGL-devel, and
> > > > > > > > > > > > > xorg-x11-drv-nvidia-devel, but
> > > > > > > > > > > > >
> > > > > > > > > > > > > root@gpu3$ yum -y install mesa-libGLU-devel mesa-libGL-devel xorg-x11-drv-nvidia-devel
> > > > > > > > > > > > > Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already installed and latest version
> > > > > > > > > > > > > Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 already installed and latest version
> > > > > > > > > > > > > Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 already installed and latest version
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also, gpuDevice hangs from within MATLAB.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So we still don't have a working installation.
> > > > > > > > > > > > > Any help would be appreciated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Predrag
> > > > > > > > > > > > >
> > > > > > > > > > > > > P.S. Once we have a working installation we can
> > > > > > > > > > > > > think of installing Caffe and TensorFlow. For now
> > > > > > > > > > > > > we have to see why things are not working.

Links:
------
[1] http://findgl.mk
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From predragp at cs.cmu.edu  Mon Oct 17 20:37:31 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Mon, 17 Oct 2016 20:37:31 -0400
Subject: GPU3 is "configured"
In-Reply-To: 
References: <20161012222658.bee4yVvYw%predragp@cs.cmu.edu>
 <20161013022332.Coy6tY6j7%predragp@cs.cmu.edu>
 <22C1E88D-D614-4509-B0EC-3EB2357C0C3A@andrew.cmu.edu>
 <20161013153826.f4agzWkMb%predragp@cs.cmu.edu>
 <3ad25168d2dc7502872b0cde94950655@imap.srv.cs.cmu.edu>
 <576ceb12fa4fffb3b72b68a742a9b0b1@imap.srv.cs.cmu.edu>
Message-ID: <20161018003731.k_VYAK8Xs%predragp@cs.cmu.edu>

Kirthevasan Kandasamy wrote:

> Hi,
>
> Just following up. Has anyone managed to resolve this yet?
> I still can't run tensorflow on gpu3.
>
> samy

I will not have time to look into this again before Wednesday.

Predrag

> On Thu, Oct 13, 2016 at 1:58 PM, Dougal Sutherland wrote:
>
> > According to the tensorflow site, the conda package doesn't support
> > GPUs.
> >
> > On Thu, Oct 13, 2016, 6:55 PM Predrag Punosevac <
> > predragp at imap.srv.cs.cmu.edu> wrote:
> >
> >> On 2016-10-13 13:51, Dougal Sutherland wrote:
> >> > I actually haven't gotten tensorflow working yet -- the bazel build
> >> > just hangs on me. I think it may have to do with home directories
> >> > being on NFS, but I can't figure out bazel at all. I'll try some
> >> > more tonight.
> >>
> >> According to one of the Princeton guys we could just use conda for
> >> TensorFlow. Please check it out, and use your scratch directory
> >> instead of NFS.
> >>
> >> Quote:
> >>
> >> Hello, Predrag.
> >>
> >> We have caffe 1.00rc3 if you are interested.
> >>
> >> ftp://ftp.cs.princeton.edu/pub/people/advorkin/SRPM/sd7/caffe-1.00rc3-3.sd7.src.rpm
> >>
> >> TensorFlow and protobuf-3 work great with conda
> >> (http://conda.pydata.org). I just tried and had no problems
> >> installing it for Python 2.7 and 3.5
> >>
> >> > Caffe should be workable following the instructions Predrag forwarded.
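Since the bazel hang above is suspected to involve home directories living on NFS, a quick way to check which filesystem a directory is actually backed by (and so whether a build should be moved to local scratch) is `stat -f`. A minimal sketch; the directory argument is just an example:

```shell
#!/bin/sh
# Print the filesystem type backing a directory. On a machine where $HOME
# is NFS-mounted this would print "nfs"; a local scratch directory would
# report the local filesystem type instead.
DIR="${1:-$HOME}"
FSTYPE="$(stat -f -c %T "$DIR")"
echo "$FSTYPE"
```

Running it against `/home/scratch/$USER` versus `$HOME` makes the NFS-vs-local distinction obvious before kicking off a large build.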
> >> >
> >> > - Dougal
> >> >
> >> > On Thu, Oct 13, 2016 at 6:39 PM Predrag Punosevac wrote:
> >> >
> >> >> Dear Autonians,
> >> >>
> >> >> In case anybody is interested in what happens behind the scenes,
> >> >> Doug got Caffe and TensorFlow to work on GPU3. Please see the
> >> >> message below. I also got very useful feedback from the Princeton
> >> >> and Rutgers people. Please check it out if you care (you will have
> >> >> to log into Gmail to see the exchange).
> >> >>
> >> >> https://groups.google.com/forum/#!forum/springdale-users
> >> >>
> >> >> I need to think about how we move forward with this before we
> >> >> start pulling triggers. If somebody is itchy and can't wait,
> >> >> please build Caffe and TensorFlow in your scratch directory
> >> >> following the howto below.
> >> >>
> >> >> Predrag
> >> >>
> >> >> On 2016-10-13 13:24, Dougal Sutherland wrote:
> >> >>> A note about cudnn:
> >> >>>
> >> >>> There are a bunch of versions of cudnn. They're not
> >> >>> backwards-compatible, and different versions of
> >> >>> caffe/tensorflow/whatever want different ones.
> >> >>>
> >> >>> I currently am using the setup in ~dsutherl/cudnn_files:
> >> >>>
> >> >>> * I have a bunch of versions of the installer there.
> >> >>> * The use-cudnn.sh script, intended to be used like "source
> >> >>>   use-cudnn.sh 5.1", will untar the appropriate one into a
> >> >>>   scratch directory (if it hasn't already been done) and set
> >> >>>   CPATH/LIBRARY_PATH/LD_LIBRARY_PATH appropriately.
> >> >>>   LD_LIBRARY_PATH is needed for caffe binaries, since they don't
> >> >>>   link to the absolute path; the first two (not sure about the
> >> >>>   third) are needed for theano. Dunno about tensorflow yet.
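The use-cudnn.sh behavior described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual script in ~dsutherl/cudnn_files: the directory layout and tarball name are placeholders, and a temp directory stands in for /home/scratch/$USER so the sketch is self-contained:

```shell
#!/bin/sh
# Hedged sketch of a "source use-cudnn.sh VERSION"-style helper: unpack the
# requested cuDNN version into scratch (once) and point the compile- and
# run-time search paths at it.

CUDNN_VER="${1:-5.1}"          # requested version, e.g. 5.1
SCRATCH="$(mktemp -d)"         # stand-in for /home/scratch/$USER
CUDNN_DIR="$SCRATCH/cudnn-$CUDNN_VER"

if [ ! -d "$CUDNN_DIR" ]; then
    mkdir -p "$CUDNN_DIR/include" "$CUDNN_DIR/lib64"
    # The real script would untar an installer here, something like:
    # tar xf "cudnn-$CUDNN_VER-linux-x64.tgz" -C "$CUDNN_DIR"
fi

# CPATH/LIBRARY_PATH cover compilation (theano, the caffe build);
# LD_LIBRARY_PATH lets prebuilt caffe binaries find the .so at run time.
export CPATH="$CUDNN_DIR/include${CPATH:+:$CPATH}"
export LIBRARY_PATH="$CUDNN_DIR/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$CUDNN_DIR/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export CUDNN_DIR
```

Because the exports only take effect in the current shell, such a helper has to be sourced (as the message says) rather than executed as a child process.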
> >> >>>
> >> >>> So, here's the Caffe setup:
> >> >>>
> >> >>> cd /home/scratch/$USER
> >> >>> git clone https://github.com/BVLC/caffe
> >> >>> cd caffe
> >> >>> cp Makefile.config.example Makefile.config
> >> >>>
> >> >>> # tell it to use openblas; using atlas needs some changes to the
> >> >>> # Makefile
> >> >>> sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
> >> >>>
> >> >>> # configure to use cudnn (optional)
> >> >>> source ~dsutherl/cudnn-files/use-cudnn.sh 5.1
> >> >>> sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config
> >> >>> perl -i -pe 's|$| '$CUDNN_DIR'/include| if /INCLUDE_DIRS :=/' Makefile.config
> >> >>> perl -i -pe 's|$| '$CUDNN_DIR'/lib64| if /LIBRARY_DIRS :=/' Makefile.config
> >> >>>
> >> >>> # build the library
> >> >>> make -j23
> >> >>>
> >> >>> # to do tests (takes ~10 minutes):
> >> >>> make -j23 test
> >> >>> make runtest
> >> >>>
> >> >>> # Now, to run caffe binaries you'll need to remember to source
> >> >>> # use-cudnn if you used cudnn before.
> >> >>>
> >> >>> # To build the python library:
> >> >>> make py
> >> >>>
> >> >>> # Requirements for the python library:
> >> >>> # Some of the system packages are too old; this installs them in
> >> >>> # your scratch directory.
> >> >>> # You'll have to set PYTHONUSERBASE again before running any
> >> >>> # python processes that use these libs.
> >> >>> export PYTHONUSERBASE=$HOME/scratch/.local
> >> >>> export PATH=$PYTHONUSERBASE/bin:"$PATH"  # <- optional
> >> >>> pip install --user -r python/requirements.txt
> >> >>>
> >> >>> # Caffe is dumb and doesn't package its python library properly.
> >> >>> # The easiest way to use it is:
> >> >>> export PYTHONPATH=/home/scratch/$USER/caffe/python:$PYTHONPATH
> >> >>> python -c 'import caffe'
> >> >>>
> >> >>> On Thu, Oct 13, 2016 at 6:01 PM Dougal Sutherland wrote:
> >> >>>
> >> >>>> Java fix seemed to work. Now tensorflow wants python-wheel and
> >> >>>> swig.
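The PYTHONUSERBASE trick in the setup above can be verified quickly: both `pip install --user` and Python's `site` module honor the variable, so user packages land in scratch instead of the NFS home directory. A minimal self-contained check (a temp directory stands in for the scratch path):

```shell
#!/bin/sh
# Point per-user Python installs at a scratch directory instead of $HOME.
# A temp dir stands in for /home/scratch/$USER here.
export PYTHONUSERBASE="$(mktemp -d)"
export PATH="$PYTHONUSERBASE/bin:$PATH"

# Python's site module reports where `pip install --user` will now go:
USER_BASE="$(python3 -m site --user-base)"
echo "$USER_BASE"
```

If the printed path matches the exported PYTHONUSERBASE, any later `pip install --user` in that shell installs under scratch, which is why the variable must be re-exported before running Python processes that use those libs.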
> >> >>>>
> >> >>>> On Thu, Oct 13, 2016 at 5:08 PM Predrag Punosevac wrote:
> >> >>>>
> >> >>>>> On 2016-10-13 11:46, Dougal Sutherland wrote:
> >> >>>>>
> >> >>>>>> Having some trouble with tensorflow, because:
> >> >>>>>>
> >> >>>>>> * it requires Google's bazel build system
> >> >>>>>> * The bazel installer says
> >> >>>>>>   Java version is 1.7.0_111 while at least 1.8 is needed.
> >> >>>>>>
> >> >>>>>> * $ java -version
> >> >>>>>>   openjdk version "1.8.0_102"
> >> >>>>>>   OpenJDK Runtime Environment (build 1.8.0_102-b14)
> >> >>>>>>   OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
> >> >>>>>>   $ javac -version
> >> >>>>>>   javac 1.7.0_111
> >> >>>>>
> >> >>>>> I just did yum -y install java-1.8.0* which installs openjdk
> >> >>>>> 1.8. Please change your java. Let me know if you want me to
> >> >>>>> install Oracle JDK 1.8.
> >> >>>>>
> >> >>>>> Predrag
> >> >>>>>
> >> >>>>>> On Thu, Oct 13, 2016 at 4:38 PM Predrag Punosevac wrote:
> >> >>>>>>
> >> >>>>>>> Dougal Sutherland wrote:
> >> >>>>>>>
> >> >>>>>>>> Also, this seemed to work for me so far for protobuf:
> >> >>>>>>>>
> >> >>>>>>>> cd /home/scratch/$USER
> >> >>>>>>>> VER=3.1.0
> >> >>>>>>>> wget https://github.com/google/protobuf/releases/download/v$VER/protobuf-cpp-$VER.tar.gz
> >> >>>>>>>> tar xf protobuf-cpp-$VER.tar.gz
> >> >>>>>>>> cd protobuf-cpp-$VER
> >> >>>>>>>> ./configure --prefix=/home/scratch/$USER
> >> >>>>>>>> make -j12
> >> >>>>>>>> make -j12 check
> >> >>>>>>>> make install
> >> >>>>>>>
> >> >>>>>>> That is a great help!
> >> >>>>>>>
> >> >>>>>>>> You could change --prefix=/usr if making an RPM.
> >> >>>>>>>>
> >> >>>>>>>> On Thu, Oct 13, 2016 at 4:26 PM Dougal Sutherland wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Some more packages for caffe:
> >> >>>>>>>>>
> >> >>>>>>>>> leveldb-devel snappy-devel opencv-devel boost-devel
> >> >>>>>>>>> hdf5-devel gflags-devel glog-devel lmdb-devel
> >> >>>>>>>>>
> >> >>>>>>>>> (Some of those might be installed already, but at least
> >> >>>>>>>>> gflags is definitely missing.)
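After a `./configure --prefix=...` install into scratch like the protobuf steps above, later builds only find the result if the shell's search paths are pointed at the prefix. A sketch of the usual exports; the directory is a temp-dir placeholder for /home/scratch/$USER, and which variables are actually needed depends on the consuming build system:

```shell
#!/bin/sh
# After `./configure --prefix=$PREFIX && make install`, tell subsequent
# builds where the scratch-installed protobuf (or anything else) lives.
PREFIX="$(mktemp -d)"          # stand-in for /home/scratch/$USER
mkdir -p "$PREFIX/bin" "$PREFIX/lib" "$PREFIX/include" "$PREFIX/lib/pkgconfig"

export PATH="$PREFIX/bin:$PATH"                                    # protoc
export CPATH="$PREFIX/include${CPATH:+:$CPATH}"                    # headers
export LIBRARY_PATH="$PREFIX/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"   # link time
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```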
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Oct 13, 2016 at 3:44 PM Predrag Punosevac <
> >> >>>>>>>>> predragp at imap.srv.cs.cmu.edu> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>> On 2016-10-12 23:26, Arne Suppe wrote:
> >> >>>>>>>>>> Hmm - I don't use matlab for deep learning, but gpuDevice
> >> >>>>>>>>>> also hangs on my computer with R2016a.
> >> >>>>>>>>>
> >> >>>>>>>>> We would have to escalate this with MathWorks. I have seen
> >> >>>>>>>>> workarounds on the Internet, but it looks like a bug in one
> >> >>>>>>>>> of the MathWorks-provided MEX files.
> >> >>>>>>>>>
> >> >>>>>>>>>> I was able to compile the matrixMul example in the CUDA
> >> >>>>>>>>>> samples and run it on gpu3, so I think the build
> >> >>>>>>>>>> environment is probably all set.
> >> >>>>>>>>>>
> >> >>>>>>>>>> As for the OpenGL, I think it's possibly a problem with
> >> >>>>>>>>>> their build script findgl.mk [1] which is not familiar
> >> >>>>>>>>>> with Springdale OS. The demo_suite directory has a
> >> >>>>>>>>>> precompiled nbody binary you may try, but I suspect most
> >> >>>>>>>>>> users will not need graphics.
> >> >>>>>>>>>
> >> >>>>>>>>> That should not be too hard to fix. Some header files have
> >> >>>>>>>>> to be manually edited. The funny part: until 7.2 the
> >> >>>>>>>>> Princeton people didn't bother to remove the RHEL branding,
> >> >>>>>>>>> which actually made things easier for us.
> >> >>>>>>>>>
> >> >>>>>>>>> Doug is trying right now to compile the latest Caffe,
> >> >>>>>>>>> TensorFlow, and protobuf-3. We will try to create an RPM
> >> >>>>>>>>> for that so that we don't have to go through this again. I
> >> >>>>>>>>> also asked the Princeton and Rutgers guys if they have WIP
> >> >>>>>>>>> RPMs to share.
> >> >>>>>>>>>
> >> >>>>>>>>> Predrag
> >> >>>>>>>>>
> >> >>>>>>>>>> Arne
> >> >>>>>>>>>>
> >> >>>>>>>>>>> On Oct 12, 2016, at 10:23 PM, Predrag Punosevac wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Arne Suppe wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi Predrag,
> >> >>>>>>>>>>>> Don't know if this applies to you, but I just built a
> >> >>>>>>>>>>>> machine with a GTX1080 which has the same PASCAL
> >> >>>>>>>>>>>> architecture as the Titan. After installing CUDA 8, I
> >> >>>>>>>>>>>> still found I needed to install the latest driver off of
> >> >>>>>>>>>>>> the NVIDIA web site to get the card recognized. Right
> >> >>>>>>>>>>>> now, I am running 367.44.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Arne
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Arne,
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Thank you so much for this e-mail. Yes, it is the damn
> >> >>>>>>>>>>> PASCAL architecture; I see lots of people complaining
> >> >>>>>>>>>>> about it on the forums. I downloaded and installed the
> >> >>>>>>>>>>> driver from
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> http://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run&lang=us&type=GeForce
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> That seems to have made a real difference. Check out
> >> >>>>>>>>>>> these beautiful outputs:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> root@gpu3$ ls nvidia*
> >> >>>>>>>>>>> nvidia0  nvidia1  nvidia2  nvidia3  nvidiactl  nvidia-uvm
> >> >>>>>>>>>>> nvidia-uvm-tools
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> root@gpu3$ lspci | grep -i nvidia
> >> >>>>>>>>>>> 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >> >>>>>>>>>>> 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >> >>>>>>>>>>> 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >> >>>>>>>>>>> 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >> >>>>>>>>>>> 82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >> >>>>>>>>>>> 82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
> >> >>>>>>>>>>> 83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
> >> >>>>>>>>>>> 83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
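As a quick sanity check, the lspci output above can be counted mechanically: four VGA-class NVIDIA controllers should appear, one per Titan X. The block below feeds the exact lines quoted above through grep via a here-document, so it runs anywhere:

```shell
#!/bin/sh
# Count NVIDIA VGA controllers in saved lspci output (pasted from above;
# on the live machine you would pipe `lspci` itself into grep instead).
COUNT=$(grep -c 'VGA compatible controller: NVIDIA' <<'EOF'
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
82:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
83:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
83:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1)
EOF
)
echo "$COUNT"   # 4
```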
> >> >>>>>>>>>>> root@gpu3$ ls /proc/driver
> >> >>>>>>>>>>> nvidia  nvidia-uvm  nvram  rtc
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> root@gpu3$ lsmod | grep nvidia
> >> >>>>>>>>>>> nvidia_uvm            738901  0
> >> >>>>>>>>>>> nvidia_drm             43405  0
> >> >>>>>>>>>>> nvidia_modeset        764432  1 nvidia_drm
> >> >>>>>>>>>>> nvidia              11492947  2 nvidia_modeset,nvidia_uvm
> >> >>>>>>>>>>> drm_kms_helper        125056  2 ast,nvidia_drm
> >> >>>>>>>>>>> drm                   349210  5 ast,ttm,drm_kms_helper,nvidia_drm
> >> >>>>>>>>>>> i2c_core               40582  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> root@gpu3$ nvidia-smi
> >> >>>>>>>>>>> Wed Oct 12 22:03:27 2016
> >> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >> >>>>>>>>>>> | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
> >> >>>>>>>>>>> |-------------------------------+----------------------+----------------------+
> >> >>>>>>>>>>> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> >> >>>>>>>>>>> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> >> >>>>>>>>>>> |===============================+======================+======================|
> >> >>>>>>>>>>> |   0  TITAN X (Pascal)    Off  | 0000:02:00.0     Off |                  N/A |
> >> >>>>>>>>>>> | 23%   32C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >> >>>>>>>>>>> |   1  TITAN X (Pascal)    Off  | 0000:03:00.0     Off |                  N/A |
> >> >>>>>>>>>>> | 23%   36C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >> >>>>>>>>>>> |   2  TITAN X (Pascal)    Off  | 0000:82:00.0     Off |                  N/A |
> >> >>>>>>>>>>> | 23%   35C    P0    57W / 250W |      0MiB / 12189MiB |      0%      Default |
> >> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >> >>>>>>>>>>> |   3  TITAN X (Pascal)    Off  | 0000:83:00.0     Off |                  N/A |
> >> >>>>>>>>>>> |  0%   35C    P0    56W / 250W |      0MiB / 12189MiB |      0%      Default |
> >> >>>>>>>>>>> +-------------------------------+----------------------+----------------------+
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >> >>>>>>>>>>> | Processes:                                                       GPU Memory |
> >> >>>>>>>>>>> |  GPU       PID  Type  Process name                                    Usage |
> >> >>>>>>>>>>> |=============================================================================|
> >> >>>>>>>>>>> |  No running processes found                                                 |
> >> >>>>>>>>>>> +-----------------------------------------------------------------------------+
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> /usr/local/cuda/extras/demo_suite/deviceQuery
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>   Alignment requirement for Surfaces:            Yes
> >> >>>>>>>>>>>   Device has ECC support:                        Disabled
> >> >>>>>>>>>>>   Device supports Unified Addressing (UVA):      Yes
> >> >>>>>>>>>>>   Device PCI Domain ID / Bus ID / location ID:   0 / 131 / 0
> >> >>>>>>>>>>>   Compute Mode:
> >> >>>>>>>>>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU1) : Yes
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU2) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU0) -> TITAN X (Pascal) (GPU3) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU0) : Yes
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU2) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU1) -> TITAN X (Pascal) (GPU3) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU0) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU1) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU2) -> TITAN X (Pascal) (GPU3) : Yes
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU0) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU1) : No
> >> >>>>>>>>>>> Peer access from TITAN X (Pascal) (GPU3) -> TITAN X (Pascal) (GPU2) : Yes
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0,
> >> >>>>>>>>>>> CUDA Runtime Version = 8.0, NumDevs = 4, Device0 = TITAN X
> >> >>>>>>>>>>> (Pascal), Device1 = TITAN X (Pascal), Device2 = TITAN X
> >> >>>>>>>>>>> (Pascal), Device3 = TITAN X (Pascal)
> >> >>>>>>>>>>> Result = PASS
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Now not everything is rosy:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> root@gpu3$ cd ~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody
> >> >>>>>>>>>>> root@gpu3$ make
> >> >>>>>>>>>>> >>> WARNING - libGL.so not found, refer to CUDA Getting
> >> >>>>>>>>>>>     Started Guide for how to find and install them. <<<
> >> >>>>>>>>>>> >>> WARNING - libGLU.so not found, refer to CUDA Getting
> >> >>>>>>>>>>>     Started Guide for how to find and install them. <<<
> >> >>>>>>>>>>> >>> WARNING - libX11.so not found, refer to CUDA Getting
> >> >>>>>>>>>>>     Started Guide for how to find and install them. <<<
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> even though those are installed.
For example > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> root at gpu3$ yum whatprovides */libX11.so > >> >>>>> > >> >>>>>>>>>>> libX11-devel-1.6.3-2.el7.i686 : Development files for > >> >>>>> libX11 > >> >>>>> > >> >>>>>>>>>>> Repo : core > >> >>>>> > >> >>>>>>>>>>> Matched from: > >> >>>>> > >> >>>>>>>>>>> Filename : /usr/lib/libX11.so > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> also > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> mesa-libGLU-devel > >> >>>>> > >> >>>>>>>>>>> mesa-libGL-devel > >> >>>>> > >> >>>>>>>>>>> xorg-x11-drv-nvidia-devel > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> but > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> root at gpu3$ yum -y install mesa-libGLU-devel > >> >>>>> mesa-libGL-devel > >> >>>>> > >> >>>>>>>>>>> xorg-x11-drv-nvidia-devel > >> >>>>> > >> >>>>>>>>>>> Package mesa-libGLU-devel-9.0.0-4.el7.x86_64 already > >> >>>>> > >> >>>>>>> installed and > >> >>>>> > >> >>>>>>>>>>> latest version > >> >>>>> > >> >>>>>>>>>>> Package mesa-libGL-devel-10.6.5-3.20150824.el7.x86_64 > >> >>>>> already > >> >>>>> > >> >>>>>>>>>>> installed > >> >>>>> > >> >>>>>>>>>>> and latest version > >> >>>>> > >> >>>>>>>>>>> Package 1:xorg-x11-drv-nvidia-devel-367.48-1.el7.x86_64 > >> >>>>> > >> >>>>>>> already > >> >>>>> > >> >>>>>>>>>>> installed and latest version > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> Also from MATLAB gpuDevice hangs. > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> So we still don't have a working installation. Any help > >> >>>>> would > >> >>>>> > >> >>>>>>> be > >> >>>>> > >> >>>>>>>>>>> appreciated. > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> Best, > >> >>>>> > >> >>>>>>>>>>> Predrag > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> P.S. Once we have a working installation we can think of > >> >>>>> > >> >>>>>>> installing > >> >>>>> > >> >>>>>>>>>>> Caffe and TensorFlow. 
For now we have to see why the > >> >>>>> things > >> >>>>> > >> >>>>>>> are not > >> >>>>> > >> >>>>>>>>>>> working. > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>>> On Oct 12, 2016, at 6:26 PM, Predrag Punosevac > >> >>>>> > >> >>>>>>> > >> > >> >>>>> > >> >>>>>>>>>>>>> wrote: > >> >>>>> > >> >>>>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>>> Dear Autonians, > >> >>>>> > >> >>>>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>>> GPU3 is "configured". Namely you can log into it and all > >> >>>>> > >> >>>>>>> packages > >> >>>>> > >> >>>>>>>>>>>>> are > >> >>>>> > >> >>>>>>>>>>>>> installed. However I couldn't get NVIDIA provided CUDA > >> >>>>> > >> >>>>>>> driver to > >> >>>>> > >> >>>>>>>>>>>>> recognize GPU cards. They appear to be properly > >> >>>>> installed > >> >>>>> > >> >>>>>>> from the > >> >>>>> > >> >>>>>>>>>>>>> hardware point of view and you can list them with > >> >>>>> > >> >>>>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>>> lshw -class display > >> >>>>> > >> >>>>>>>>>>>>> > >> >>>>> > >> >>>>>>>>>>>>> root at gpu3$ lshw -class display > >> >>>>> > >> >>>>>>>>>>>>> *-display UNCLAIMED > >> >>>>> > >> >>>>>>>>>>>>> description: VGA compatible controller > >> >>>>> > >> >>>>>>>>>>>>> product: NVIDIA Corporation > >> >>>>> > >> >>>>>>>>>>>>> vendor: NVIDIA Corporation > >> >>>>> > >> >>>>>>>>>>>>> physical id: 0 > >> >>>>> > >> >>>>>>>>>>>>> bus info: pci at 0000:02:00.0 > >> >>>>> > >> >>>>>>>>>>>>> version: a1 > >> >>>>> > >> >>>>>>>>>>>>> width: 64 bits > >> >>>>> > >> >>>>>>>>>>>>> clock: 33MHz > >> >>>>> > >> >>>>>>>>>>>>> capabilities: pm msi pciexpress vga_controller > >> >>>>> > >> >>>>>>> cap_list > >> >>>>> > >> >>>>>>>>>>>>> configuration: latency=0 > >> >>>>> > >> >>>>>>>>>>>>> resources: iomemory:383f0-383ef > >> >>>>> iomemory:383f0-383ef > >> >>>>> > >> >>>>>>>>>>>>> 
> memory:cf000000-cfffffff memory:383fe0000000-383fefffffff
> memory:383ff0000000-383ff1ffffff ioport:6000(size=128)
> memory:d0000000-d007ffff
> *-display UNCLAIMED
>      description: VGA compatible controller
>      product: NVIDIA Corporation
>      vendor: NVIDIA Corporation
>      physical id: 0
>      bus info: pci@0000:03:00.0
>      version: a1
>      width: 64 bits
>      clock: 33MHz
>      capabilities: pm msi pciexpress vga_controller cap_list
>      configuration: latency=0
>      resources: iomemory:383f0-383ef iomemory:383f0-383ef
>        memory:cd000000-cdffffff memory:383fc0000000-383fcfffffff
>        memory:383fd0000000-383fd1ffffff ioport:5000(size=128)
>        memory:ce000000-ce07ffff
> *-display
>      description: VGA compatible controller
>      product: ASPEED Graphics Family
>      vendor: ASPEED Technology, Inc.
>      physical id: 0
>      bus info: pci@0000:06:00.0
>      version: 30
>      width: 32 bits
>      clock: 33MHz
>      capabilities: pm msi vga_controller bus_master cap_list rom
>      configuration: driver=ast latency=0
>      resources: irq:19 memory:cb000000-cbffffff
>        memory:cc000000-cc01ffff ioport:4000(size=128)
> *-display UNCLAIMED
>      description: VGA compatible controller
>      product: NVIDIA Corporation
>      vendor: NVIDIA Corporation
>      physical id: 0
>      bus info: pci@0000:82:00.0
>      version: a1
>      width: 64 bits
>      clock: 33MHz
>      capabilities: pm msi pciexpress vga_controller cap_list
>      configuration: latency=0
>      resources: iomemory:387f0-387ef iomemory:387f0-387ef
>        memory:fa000000-faffffff memory:387fe0000000-387fefffffff
>        memory:387ff0000000-387ff1ffffff ioport:e000(size=128)
>        memory:fb000000-fb07ffff
> *-display UNCLAIMED
>      description: VGA compatible controller
>      product: NVIDIA Corporation
>      vendor: NVIDIA Corporation
>      physical id: 0
>      bus info: pci@0000:83:00.0
>      version: a1
>      width: 64 bits
>      clock: 33MHz
>      capabilities: pm msi pciexpress vga_controller cap_list
>      configuration: latency=0
>      resources: iomemory:387f0-387ef iomemory:387f0-387ef
>        memory:f8000000-f8ffffff memory:387fc0000000-387fcfffffff
>        memory:387fd0000000-387fd1ffffff ioport:d000(size=128)
>        memory:f9000000-f907ffff
>
> However, what scares the hell out of me is that I don't see the NVIDIA
> driver loaded:
>
>     lsmod | grep nvidia
>
> returns nothing, and the /dev/nvidia device nodes are not created. I am
> guessing I just missed some trivial step during the CUDA installation,
> which is very involved. I am unfortunately too tired to debug this
> tonight.
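The two checks described above can be bundled into a quick sanity script (a sketch; `lsmod` and the `/dev/nvidia*` device paths are the standard Linux/NVIDIA locations, not anything specific to this machine):

```shell
#!/bin/sh
# Sanity-check an NVIDIA/CUDA install: is the kernel module loaded,
# and did the driver create its device nodes?
if lsmod 2>/dev/null | grep -q '^nvidia'; then
    echo "nvidia module: loaded"
else
    echo "nvidia module: missing"
fi

if ls /dev/nvidia* >/dev/null 2>&1; then
    echo "device nodes: present"
else
    echo "device nodes: absent"
fi
```

If the module is missing, loading it by hand (`modprobe nvidia` as root) or re-running the driver installer is the usual next step.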
> Predrag

> Links:
> ------
> [1] http://findgl.mk

From predragp at cs.cmu.edu Tue Oct 18 11:36:01 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 18 Oct 2016 11:36:01 -0400
Subject: Critical ssh patches
Message-ID: <20161018153601.uLUtv9wCy%predragp@cs.cmu.edu>

Dear Autonians,

I had to apply critical ssh patches to our infrastructure servers, which caused a short interruption of a few minutes on the ssh gateways; they had to be rebooted for the patches to take effect. No further interruptions are anticipated, and both lop1 and bash are now available and fully functional. If an Auton Lab desktop behaves strangely (it should not, but just in case it does), please reboot it to restart OpenVPN and remount the NFS shares.

Best,
Predrag

P.S. These ssh problems have so far only been observed on OpenBSD and fixed in the non-portable OpenSSH version. The Linux fix is probably a few days away.

From predragp at cs.cmu.edu Tue Oct 18 13:54:27 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 18 Oct 2016 13:54:27 -0400
Subject: GPU3 CUDA downgrade
Message-ID: <20161018175427.WWY0b754S%predragp@cs.cmu.edu>

Dear Autonians,

I would like to schedule a CUDA downgrade from 8.0 to 7.5 on GPU3 for tomorrow at 11:00 AM. If you need the machine up to finish a job, please speak up now. We have some reason to believe that our problems compiling TensorFlow are due to the newest version of CUDA. I will try downgrading to 7.5, which we use on GPU1 and GPU2, to see if we can make progress. We also hope it might fix the MATLAB problem.
I just received extra RAM for GPU1, GPU2, and GPU3, so once GPU3 is back online it should have 256 GB of RAM.

Predrag

From predragp at cs.cmu.edu Tue Oct 18 15:45:48 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Tue, 18 Oct 2016 15:45:48 -0400
Subject: CMU clock drifting
Message-ID: <20161018194548.mKFDPsvcY%predragp@cs.cmu.edu>

Dear Autonians,

I have been trying to figure out what was wrong with AFS and Kerberos on Jeff's computer for over a week now. What I found is such a subtle problem that I would like to share it with you, as it affects everyone on campus.

Namely, the clock on Jeff's desktop was drifting, so the Kerberos server would not give him a ticket. I wouldn't have expected that to happen, since I run the ntpd daemon there just as I do on all our servers and virtual machines. Well, I learned the hard way that Carnegie Mellon University blocks external clock synchronization at their firewalls, expecting people to run isc-dhcp clients, which can alter the clock synchronization pool. I now have the list of their ntpd servers, and I will set up an Auton Lab ntpd server which polls their machines and passes the correct time to ours.

So the moral of the story: if something behaves very strangely, please check the clock first.

Best,
Predrag

From predragp at cs.cmu.edu Wed Oct 19 13:22:09 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Wed, 19 Oct 2016 13:22:09 -0400
Subject: GPU3 back in business
Message-ID: <20161019172209.esq71ATV8%predragp@cs.cmu.edu>

Dear Autonians,

I have added an additional 128 GB of RAM to GPU3 and downgraded CUDA to 7.5. The good news is that the CUDA downgrade has fixed the MATLAB problem; you can use MATLAB on GPU3 now. I am looking at TensorFlow right now.
Predrag

From predragp at cs.cmu.edu Wed Oct 19 16:10:53 2016
From: predragp at cs.cmu.edu (Predrag Punosevac)
Date: Wed, 19 Oct 2016 16:10:53 -0400
Subject: GPU3 back in business
In-Reply-To: 
References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com>
Message-ID: <20161019201053.KHd8koxJl%predragp@cs.cmu.edu>

Dougal Sutherland wrote:

> I tried for a while. I failed.

Damn, this doesn't look good. I guess back to the drawing board. Thanks for the quick feedback.

Predrag

> Version 0.10.0 fails immediately on build: "The specified --crosstool_top
> '@local_config_cuda//crosstool:crosstool' is not a valid cc_toolchain_suite
> rule." Apparently this is because 0.10 required an older version of bazel
> (https://github.com/tensorflow/tensorflow/issues/4368), and I don't have the
> energy to install an old version of bazel.
>
> Version 0.11.0rc0 gets almost done and then complains about no such file or
> directory for libcudart.so.7.5 (which is there, where I told tensorflow it
> was...).
>
> Non-release versions from git fail immediately because they call git -C to
> get version info, which is only in git 1.9 (we have 1.8).
>
> Some other notes:
> - I made a symlink from ~/.cache/bazel to /home/scratch/$USER/.cache/bazel,
> because bazel is the worst. (It complains about doing things on NFS, and
> hung for me [clock-related?], and I can't find a global config file or
> anything to change that in; it seems like there might be one, but their
> documentation is terrible.)
>
> - I wasn't able to use the actual Titan X compute capability of 6.1,
> because that requires cuda 8; I used 5.2 instead. Probably not a huge deal,
> but I don't know.
>
> - I tried explicitly including /usr/local/cuda/lib64 in LD_LIBRARY_PATH and
> set CUDA_HOME to /usr/local/cuda before building, hoping that would help
> with the 0.11.0rc0 problem, but it didn't.
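The "no such file or directory for libcudart.so.7.5" failure above is usually an environment problem, so it is worth checking what the build will actually see before invoking bazel. A minimal sketch (the /usr/local/cuda path is the one mentioned in the thread; adjust for the real install):

```shell
#!/bin/sh
# Point the build and the dynamic loader at one specific CUDA install,
# then confirm the runtime library is actually visible there.
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# The first loader search entry should now be the CUDA lib directory:
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | head -n 1

# Warn early if the runtime the build will look for is not there:
[ -e "$CUDA_HOME/lib64/libcudart.so.7.5" ] || \
    echo "libcudart.so.7.5 not found under $CUDA_HOME/lib64"
```

Note that `LD_LIBRARY_PATH` set in an interactive shell does not automatically survive into bazel's build sandbox, which may be why exporting it did not help here.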
From kandasamy at cmu.edu Fri Oct 21 13:14:11 2016
From: kandasamy at cmu.edu (Kirthevasan Kandasamy)
Date: Fri, 21 Oct 2016 13:14:11 -0400
Subject: GPU3 back in business
In-Reply-To: 
References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu>
Message-ID: 

Predrag,

Any updates on gpu3? I have tried both tensorflow and chainer, and in both cases the problem seems to be with cuda.

On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac wrote:

> Dougal Sutherland wrote:
>
> > I tried for a while. I failed.
>
> Damn, this doesn't look good. I guess back to the drawing board. Thanks
> for the quick feedback.
>
> Predrag
>
> > Version 0.10.0 fails immediately on build: "The specified --crosstool_top
> > '@local_config_cuda//crosstool:crosstool' is not a valid cc_toolchain_suite
> > rule." Apparently this is because 0.10 required an older version of bazel
> > (https://github.com/tensorflow/tensorflow/issues/4368), and I don't have the
> > energy to install an old version of bazel.
> >
> > Version 0.11.0rc0 gets almost done and then complains about no such file or
> > directory for libcudart.so.7.5 (which is there, where I told tensorflow it
> > was...).
> >
> > Non-release versions from git fail immediately because they call git -C to
> > get version info, which is only in git 1.9 (we have 1.8).
> >
> > Some other notes:
> > - I made a symlink from ~/.cache/bazel to /home/scratch/$USER/.cache/bazel,
> > because bazel is the worst. (It complains about doing things on NFS, and
> > hung for me [clock-related?], and I can't find a global config file or
> > anything to change that in; it seems like there might be one, but their
> > documentation is terrible.)
> > - I wasn't able to use the actual Titan X compute capability of 6.1,
> > because that requires cuda 8; I used 5.2 instead. Probably not a huge deal,
> > but I don't know.
> >
> > - I tried explicitly including /usr/local/cuda/lib64 in LD_LIBRARY_PATH and
> > set CUDA_HOME to /usr/local/cuda before building, hoping that would help
> > with the 0.11.0rc0 problem, but it didn't.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dougal at gmail.com Fri Oct 21 14:03:21 2016
From: dougal at gmail.com (Dougal Sutherland)
Date: Fri, 21 Oct 2016 18:03:21 +0000
Subject: GPU3 back in business
In-Reply-To: 
References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu>
Message-ID: 

I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda 8.0 install, and it built fine. So additionally installing 7.5 was probably not necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute architecture that the Titan Xs use, so Theano at least needs to be manually told to use an older architecture.

A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I think it should work fine with the cudnn in my scratch directory.

You should probably install it to scratch, either by running this first to put libraries in your scratch directory or by using a virtualenv or something:

    export PYTHONUSERBASE=/home/scratch/$USER/.local

You'll need this to use the library and probably to install it:

    export LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH"

To install:

    pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl

(remove --user if you're using a virtualenv)

(A request: I'm submitting to ICLR in two weeks, and for some of the models I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't run a ton of stuff on gpu3 unless you're working on a deadline too.)

Steps to install it, for the future:

- Install bazel in your home directory:
  - wget https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh
  - bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER --base=/home/scratch/$USER/.bazel
- Configure bazel to build in scratch. There's probably a better way to do this, but this works:
  - mkdir /home/scratch/$USER/.cache
  - ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel
- Build tensorflow. Note that builds from git checkouts don't work, because they assume a newer version of git than is on gpu3:
  - cd /home/scratch/$USER
  - wget 
  - tar xf 
  - cd tensorflow-0.11.0rc0
  - ./configure
    - This is an interactive script that doesn't seem to let you pass arguments or anything. It's obnoxious.
    - Use the default python
    - don't use cloud platform or hadoop file system
    - use the default site-packages path if it asks
    - build with GPU support
    - default gcc
    - default Cuda SDK version
    - specify /usr/local/cuda-8.0
    - default cudnn version
    - specify $CUDNN_DIR from use-cudnn.sh, e.g. /home/scratch/dsutherl/cudnn-8.0-5.1/cuda
    - Pascal Titan Xs have compute capability 6.1
  - bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
  - bazel-bin/tensorflow/tools/pip_package/build_pip_package ./
  - A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the directory you specified above.

- Dougal

On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy wrote:

Predrag,

Any updates on gpu3? I have tried both tensorflow and chainer and in both cases the problem seems to be with cuda

On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac wrote:

Dougal Sutherland wrote:

> I tried for a while. I failed.

Damn, this doesn't look good. I guess back to the drawing board. Thanks for the quick feedback.
Predrag > Version 0.10.0 fails immediately on build: "The specified --crosstool_top > '@local_config_cuda//crosstool:crosstool' is not a valid cc_toolchain_suite > rule." Apparently this is because 0.10 required an older version of bazel ( > https://github.com/tensorflow/tensorflow/issues/4368), and I don't have the > energy to install an old version of bazel. > > Version 0.11.0rc0 gets almost done and then complains about no such file or > directory for libcudart.so.7.5 (which is there, where I told tensorflow it > was...). > > Non-release versions from git fail immediately because they call git -C to > get version info, which is only in git 1.9 (we have 1.8). > > > Some other notes: > - I made a symlink from ~/.cache/bazel to /home/scratch/$USER/.cache/bazel, > because bazel is the worst. (It complains about doing things on NFS, and > hung for me [clock-related?], and I can't find a global config file or > anything to change that in; it seems like there might be one, but their > documentation is terrible.) > > - I wasn't able to use the actual Titan X compute capability of 6.1, > because that requires cuda 8; I used 5.2 instead. Probably not a huge deal, > but I don't know. > > - I tried explicitly including /usr/local/cuda/lib64 in LD_LIBRARY_PATH and > set CUDA_HOME to /usr/local/cuda before building, hoping that would help > with the 0.11.0rc0 problem, but it didn't. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Fri Oct 21 15:08:09 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 21 Oct 2016 19:08:09 +0000 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> Message-ID: I installed it in my scratch directory (not sure if there's a global install?). 
The main thing was to put its cache on scratch; it got really upset when the cache directory was on NFS. (Instructions at the bottom of my previous email.) On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote: > That's great! Thanks Dougal. > > As I remember bazel was not installed correctly previously on GPU3. Do > you know what went wrong with it before and why it is good now? > > Thanks, > Barnabas > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland > wrote: > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda > 8.0 > > install, and it built fine. So additionally installing 7.5 was probably > not > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > architecture > > that the Titan Xs use, so Theano at least needs to be manually told to > use > > an older architecture. > > > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > think > > it should work fine with the cudnn in my scratch directory. > > > > You should probably install it to scratch, either running this first to > put > > libraries your scratch directory or using a virtualenv or something: > > export PYTHONUSERBASE=/home/scratch/$USER/.local > > > > You'll need this to use the library and probably to install it: > > export > > > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > > > > To install: > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > > (remove --user if you're using a virtualenv) > > > > (A request: I'm submitting to ICLR in two weeks, and for some of the > models > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > > run a ton of stuff on gpu3 unless you're working on a deadline too. 
> > > > > > > > Steps to install it, for the future: > > > > Install bazel in your home directory: > > > > wget > > > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER > > --base=/home/scratch/$USER/.bazel > > > > Configure bazel to build in scratch. There's probably a better way to do > > this, but this works: > > > > mkdir /home/scratch/$USER/.cache > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > > > > Build tensorflow. Note that builds from git checkouts don't work, because > > they assume a newer version of git than is on gpu3: > > > > cd /home/scratch/$USER > > wget > > tar xf > > cd tensorflow-0.11.0rc0 > > ./configure > > > > This is an interactive script that doesn't seem to let you pass > arguments or > > anything. It's obnoxious. > > Use the default python > > don't use cloud platform or hadoop file system > > use the default site-packages path if it asks > > build with GPU support > > default gcc > > default Cuda SDK version > > specify /usr/local/cuda-8.0 > > default cudnn version > > specify $CUDNN_DIR from use-cudnn.sh, e.g. > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > > Pascal Titan Xs have compute capability 6.1 > > > > bazel build -c opt --config=cuda > > //tensorflow/tools/pip_package:build_pip_package > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the > > directory you specified above. > > > > > > - Dougal > > > > > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > > > wrote: > >> > >> Predrag, > >> > >> Any updates on gpu3? > >> I have tried both tensorflow and chainer and in both cases the problem > >> seems to be with cuda > >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac > > >> wrote: > >>> > >>> Dougal Sutherland wrote: > >>> > >>> > I tried for a while. I failed. 
> >>> > > >>> > >>> Damn this doesn't look good. I guess back to the drawing board. Thanks > >>> for the quick feed back. > >>> > >>> Predrag > >>> > >>> > Version 0.10.0 fails immediately on build: "The specified > >>> > --crosstool_top > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > >>> > cc_toolchain_suite > >>> > rule." Apparently this is because 0.10 required an older version of > >>> > bazel ( > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't > have > >>> > the > >>> > energy to install an old version of bazel. > >>> > > >>> > Version 0.11.0rc0 gets almost done and then complains about no such > >>> > file or > >>> > directory for libcudart.so.7.5 (which is there, where I told > tensorflow > >>> > it > >>> > was...). > >>> > > >>> > Non-release versions from git fail immediately because they call git > -C > >>> > to > >>> > get version info, which is only in git 1.9 (we have 1.8). > >>> > > >>> > > >>> > Some other notes: > >>> > - I made a symlink from ~/.cache/bazel to > >>> > /home/scratch/$USER/.cache/bazel, > >>> > because bazel is the worst. (It complains about doing things on NFS, > >>> > and > >>> > hung for me [clock-related?], and I can't find a global config file > or > >>> > anything to change that in; it seems like there might be one, but > their > >>> > documentation is terrible.) > >>> > > >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, > >>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge > >>> > deal, > >>> > but I don't know. > >>> > > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > LD_LIBRARY_PATH > >>> > and > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would > >>> > help > >>> > with the 0.11.0rc0 problem, but it didn't. > >> > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From kandasamy at cmu.edu Fri Oct 21 15:10:50 2016
From: kandasamy at cmu.edu (Kirthevasan Kandasamy)
Date: Fri, 21 Oct 2016 15:10:50 -0400
Subject: GPU3 back in business
In-Reply-To: 
References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu>
Message-ID: 

Thanks Dougal. I'll take a look at this and get back to you. So are you suggesting that this is an issue with the Titan Xs not being compatible with 7.5?

On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland wrote:

> I installed it in my scratch directory (not sure if there's a global
> install?). The main thing was to put its cache on scratch; it got really
> upset when the cache directory was on NFS. (Instructions at the bottom of
> my previous email.)
>
> On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote:
>
>> That's great! Thanks Dougal.
>>
>> As I remember bazel was not installed correctly previously on GPU3. Do
>> you know what went wrong with it before and why it is good now?
>>
>> Thanks,
>> Barnabas
>> ======================
>> Barnabas Poczos, PhD
>> Assistant Professor
>> Machine Learning Department
>> Carnegie Mellon University
>>
>> On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland wrote:
>> > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda 8.0
>> > install, and it built fine. So additionally installing 7.5 was probably not
>> > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute architecture
>> > that the Titan Xs use, so Theano at least needs to be manually told to use
>> > an older architecture.
>> >
>> > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I think
>> > it should work fine with the cudnn in my scratch directory.
>> > >> > You should probably install it to scratch, either running this first to >> put >> > libraries your scratch directory or using a virtualenv or something: >> > export PYTHONUSERBASE=/home/scratch/$USER/.local >> > >> > You'll need this to use the library and probably to install it: >> > export >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/ >> lib64:"$LD_LIBRARY_PATH" >> > >> > To install: >> > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl >> > (remove --user if you're using a virtualenv) >> > >> > (A request: I'm submitting to ICLR in two weeks, and for some of the >> models >> > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please >> don't >> > run a ton of stuff on gpu3 unless you're working on a deadline too. >> > >> > >> > >> > Steps to install it, for the future: >> > >> > Install bazel in your home directory: >> > >> > wget >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/ >> bazel-0.3.2-installer-linux-x86_64.sh >> > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER >> > --base=/home/scratch/$USER/.bazel >> > >> > Configure bazel to build in scratch. There's probably a better way to do >> > this, but this works: >> > >> > mkdir /home/scratch/$USER/.cache >> > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel >> > >> > Build tensorflow. Note that builds from git checkouts don't work, >> because >> > they assume a newer version of git than is on gpu3: >> > >> > cd /home/scratch/$USER >> > wget >> > tar xf >> > cd tensorflow-0.11.0rc0 >> > ./configure >> > >> > This is an interactive script that doesn't seem to let you pass >> arguments or >> > anything. It's obnoxious. 
>> > Use the default python >> > don't use cloud platform or hadoop file system >> > use the default site-packages path if it asks >> > build with GPU support >> > default gcc >> > default Cuda SDK version >> > specify /usr/local/cuda-8.0 >> > default cudnn version >> > specify $CUDNN_DIR from use-cudnn.sh, e.g. >> > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda >> > Pascal Titan Xs have compute capability 6.1 >> > >> > bazel build -c opt --config=cuda >> > //tensorflow/tools/pip_package:build_pip_package >> > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ >> > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the >> > directory you specified above. >> > >> > >> > - Dougal >> > >> > >> > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy < >> kandasamy at cmu.edu> >> > wrote: >> >> >> >> Predrag, >> >> >> >> Any updates on gpu3? >> >> I have tried both tensorflow and chainer and in both cases the problem >> >> seems to be with cuda >> >> >> >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac < >> predragp at cs.cmu.edu> >> >> wrote: >> >>> >> >>> Dougal Sutherland wrote: >> >>> >> >>> > I tried for a while. I failed. >> >>> > >> >>> >> >>> Damn this doesn't look good. I guess back to the drawing board. Thanks >> >>> for the quick feed back. >> >>> >> >>> Predrag >> >>> >> >>> > Version 0.10.0 fails immediately on build: "The specified >> >>> > --crosstool_top >> >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >> >>> > cc_toolchain_suite >> >>> > rule." Apparently this is because 0.10 required an older version of >> >>> > bazel ( >> >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't >> have >> >>> > the >> >>> > energy to install an old version of bazel. >> >>> > >> >>> > Version 0.11.0rc0 gets almost done and then complains about no such >> >>> > file or >> >>> > directory for libcudart.so.7.5 (which is there, where I told >> tensorflow >> >>> > it >> >>> > was...). 
>> >>> > >> >>> > Non-release versions from git fail immediately because they call >> git -C >> >>> > to >> >>> > get version info, which is only in git 1.9 (we have 1.8). >> >>> > >> >>> > >> >>> > Some other notes: >> >>> > - I made a symlink from ~/.cache/bazel to >> >>> > /home/scratch/$USER/.cache/bazel, >> >>> > because bazel is the worst. (It complains about doing things on NFS, >> >>> > and >> >>> > hung for me [clock-related?], and I can't find a global config file >> or >> >>> > anything to change that in; it seems like there might be one, but >> their >> >>> > documentation is terrible.) >> >>> > >> >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, >> >>> > because that requires cuda 8; I used 5.2 instead. Probably not a >> huge >> >>> > deal, >> >>> > but I don't know. >> >>> > >> >>> > - I tried explicitly including /usr/local/cuda/lib64 in >> LD_LIBRARY_PATH >> >>> > and >> >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would >> >>> > help >> >>> > with the 0.11.0rc0 problem, but it didn't. >> >> >> >> >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bapoczos at cs.cmu.edu Fri Oct 21 15:04:08 2016 From: bapoczos at cs.cmu.edu (Barnabas Poczos) Date: Fri, 21 Oct 2016 15:04:08 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> Message-ID: That's great! Thanks Dougal. As I remember bazel was not installed correctly previously on GPU3. Do you know what went wrong with it before and why it is good now? 
Thanks, Barnabas ====================== Barnabas Poczos, PhD Assistant Professor Machine Learning Department Carnegie Mellon University On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland wrote: > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda 8.0 > install, and it built fine. So additionally installing 7.5 was probably not > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute architecture > that the Titan Xs use, so Theano at least needs to be manually told to use > an older architecture. > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I think > it should work fine with the cudnn in my scratch directory. > > You should probably install it to scratch, either running this first to put > libraries your scratch directory or using a virtualenv or something: > export PYTHONUSERBASE=/home/scratch/$USER/.local > > You'll need this to use the library and probably to install it: > export > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > > To install: > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > (remove --user if you're using a virtualenv) > > (A request: I'm submitting to ICLR in two weeks, and for some of the models > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > run a ton of stuff on gpu3 unless you're working on a deadline too. > > > > Steps to install it, for the future: > > Install bazel in your home directory: > > wget > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER > --base=/home/scratch/$USER/.bazel > > Configure bazel to build in scratch. There's probably a better way to do > this, but this works: > > mkdir /home/scratch/$USER/.cache > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > > Build tensorflow. 
Note that builds from git checkouts don't work, because > they assume a newer version of git than is on gpu3: > > cd /home/scratch/$USER > wget > tar xf > cd tensorflow-0.11.0rc0 > ./configure > > This is an interactive script that doesn't seem to let you pass arguments or > anything. It's obnoxious. > Use the default python > don't use cloud platform or hadoop file system > use the default site-packages path if it asks > build with GPU support > default gcc > default Cuda SDK version > specify /usr/local/cuda-8.0 > default cudnn version > specify $CUDNN_DIR from use-cudnn.sh, e.g. > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > Pascal Titan Xs have compute capability 6.1 > > bazel build -c opt --config=cuda > //tensorflow/tools/pip_package:build_pip_package > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the > directory you specified above. > > > - Dougal > > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > wrote: >> >> Predrag, >> >> Any updates on gpu3? >> I have tried both tensorflow and chainer and in both cases the problem >> seems to be with cuda >> >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac >> wrote: >>> >>> Dougal Sutherland wrote: >>> >>> > I tried for a while. I failed. >>> > >>> >>> Damn this doesn't look good. I guess back to the drawing board. Thanks >>> for the quick feed back. >>> >>> Predrag >>> >>> > Version 0.10.0 fails immediately on build: "The specified >>> > --crosstool_top >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >>> > cc_toolchain_suite >>> > rule." Apparently this is because 0.10 required an older version of >>> > bazel ( >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't have >>> > the >>> > energy to install an old version of bazel. 
>>> > Version 0.11.0rc0 gets almost done and then complains about no such file or
>>> > directory for libcudart.so.7.5 (which is there, where I told tensorflow it
>>> > was...).
>>> >
>>> > Non-release versions from git fail immediately because they call git -C to
>>> > get version info, which is only in git 1.9 (we have 1.8).
>>> >
>>> > Some other notes:
>>> > - I made a symlink from ~/.cache/bazel to /home/scratch/$USER/.cache/bazel,
>>> > because bazel is the worst. (It complains about doing things on NFS, and
>>> > hung for me [clock-related?], and I can't find a global config file or
>>> > anything to change that in; it seems like there might be one, but their
>>> > documentation is terrible.)
>>> >
>>> > - I wasn't able to use the actual Titan X compute capability of 6.1,
>>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge deal,
>>> > but I don't know.
>>> >
>>> > - I tried explicitly including /usr/local/cuda/lib64 in LD_LIBRARY_PATH and
>>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would help
>>> > with the 0.11.0rc0 problem, but it didn't.

From dougal at gmail.com Fri Oct 21 15:17:13 2016
From: dougal at gmail.com (Dougal Sutherland)
Date: Fri, 21 Oct 2016 19:17:13 +0000
Subject: GPU3 back in business
In-Reply-To: 
References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu>
Message-ID: 

They do work with 7.5 if you specify an older compute architecture; it's just that their actual compute capability of 6.1 isn't supported by cuda 7.5. Theano is thrown off by this, for example, but it can be fixed by telling it to pass compute capability 5.2 (for example) to nvcc. I don't think that this was my problem with building tensorflow on 7.5; I'm not sure what that was.
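The fallback described above comes down to passing nvcc `-gencode` flags that CUDA 7.5 understands. A hedged sketch of what that looks like (the flag syntax is standard nvcc; the choice of 5.2 follows the thread, and the kernel file name is hypothetical):

```shell
#!/bin/sh
# CUDA 7.5's nvcc predates sm_61 (Pascal Titan X), so target sm_52 and
# also embed compute_52 PTX so the driver can JIT it for newer cards.
GENCODE="-gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52"
echo "$GENCODE"

# Usage with a hypothetical kernel file:
#   nvcc $GENCODE -O2 -o kernel kernel.cu
# The Theano equivalent is roughly:
#   THEANO_FLAGS='nvcc.flags=-arch=sm_52' python train.py
```

Because the binary only contains sm_52 SASS plus PTX, it runs on the Titan X via JIT compilation, just without any 6.1-specific optimizations.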
On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy wrote: > Thanks Dougal. I'll take a look at this and get back to you. > So are you suggesting that this is an issue with Titan Xs not being > compatible with 7.5? > > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland > wrote: > > I installed it in my scratch directory (not sure if there's a global > install?). The main thing was to put its cache on scratch; it got really > upset when the cache directory was on NFS. (Instructions at the bottom of > my previous email.) > > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote: > > That's great! Thanks Dougal. > > As I remember, bazel was not installed correctly previously on GPU3. Do > you know what went wrong with it before and why it is good now? > > Thanks, > Barnabas > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland > wrote: > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda > 8.0 > > install, and it built fine. So additionally installing 7.5 was probably > not > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > architecture > > that the Titan Xs use, so Theano at least needs to be manually told to > use > > an older architecture. > > > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > think > > it should work fine with the cudnn in my scratch directory.
> > > > You should probably install it to scratch, either running this first to > put > > libraries in your scratch directory or using a virtualenv or something: > > export PYTHONUSERBASE=/home/scratch/$USER/.local > > > > You'll need this to use the library and probably to install it: > > export > > > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > > > > To install: > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > > (remove --user if you're using a virtualenv) > > > > (A request: I'm submitting to ICLR in two weeks, and for some of the > models > > I'm running, gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > > run a ton of stuff on gpu3 unless you're working on a deadline too.) > > > > > > > > Steps to install it, for the future: > > > > Install bazel in your home directory: > > > > wget > > > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER > > --base=/home/scratch/$USER/.bazel > > > > Configure bazel to build in scratch. There's probably a better way to do > > this, but this works: > > > > mkdir /home/scratch/$USER/.cache > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > > > > Build tensorflow. Note that builds from git checkouts don't work, because > > they assume a newer version of git than is on gpu3: > > > > cd /home/scratch/$USER > > wget > > tar xf > > cd tensorflow-0.11.0rc0 > > ./configure > > > > This is an interactive script that doesn't seem to let you pass > arguments or > > anything. It's obnoxious. > > Use the default python > > don't use cloud platform or hadoop file system > > use the default site-packages path if it asks > > build with GPU support > > default gcc > > default Cuda SDK version > > specify /usr/local/cuda-8.0 > > default cudnn version > > specify $CUDNN_DIR from use-cudnn.sh, e.g.
> > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > > Pascal Titan Xs have compute capability 6.1 > > > > bazel build -c opt --config=cuda > > //tensorflow/tools/pip_package:build_pip_package > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the > > directory you specified above. > > > > > > - Dougal > > > > > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > > > wrote: > >> > >> Predrag, > >> > >> Any updates on gpu3? > >> I have tried both tensorflow and chainer and in both cases the problem > >> seems to be with cuda > >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac > > >> wrote: > >>> > >>> Dougal Sutherland wrote: > >>> > >>> > I tried for a while. I failed. > >>> > > >>> > >>> Damn this doesn't look good. I guess back to the drawing board. Thanks > >>> for the quick feed back. > >>> > >>> Predrag > >>> > >>> > Version 0.10.0 fails immediately on build: "The specified > >>> > --crosstool_top > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > >>> > cc_toolchain_suite > >>> > rule." Apparently this is because 0.10 required an older version of > >>> > bazel ( > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't > have > >>> > the > >>> > energy to install an old version of bazel. > >>> > > >>> > Version 0.11.0rc0 gets almost done and then complains about no such > >>> > file or > >>> > directory for libcudart.so.7.5 (which is there, where I told > tensorflow > >>> > it > >>> > was...). > >>> > > >>> > Non-release versions from git fail immediately because they call git > -C > >>> > to > >>> > get version info, which is only in git 1.9 (we have 1.8). > >>> > > >>> > > >>> > Some other notes: > >>> > - I made a symlink from ~/.cache/bazel to > >>> > /home/scratch/$USER/.cache/bazel, > >>> > because bazel is the worst. 
(It complains about doing things on NFS, > >>> > and > >>> > hung for me [clock-related?], and I can't find a global config file > or > >>> > anything to change that in; it seems like there might be one, but > their > >>> > documentation is terrible.) > >>> > > >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, > >>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge > >>> > deal, > >>> > but I don't know. > >>> > > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > LD_LIBRARY_PATH > >>> > and > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would > >>> > help > >>> > with the 0.11.0rc0 problem, but it didn't. > >> > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kandasamy at cmu.edu Fri Oct 21 15:20:13 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 21 Oct 2016 15:20:13 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> Message-ID: I didn't understand half of what you said :P But I'll give this a shot and get back to you if I run into any issues. On Fri, Oct 21, 2016 at 3:17 PM, Dougal Sutherland wrote: > They do work with 7.5 if you specify an older compute architecture; it's > just that their actual compute capability of 6.1 isn't supported by cuda > 7.5. Theano is thrown off by this, for example, but it can be fixed by > telling it to pass compute capability 5.2 (for example) to nvcc. I don't > think that this was my problem with building tensorflow on 7.5; I'm not > sure what that was. > > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy > wrote: > >> Thanks Dougal. I'll take a look at this and get back to you.
>> So are you suggesting that this is an issue with TitanX's not being >> compatible with 7.5? >> >> On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland >> wrote: >> >> I installed it in my scratch directory (not sure if there's a global >> install?). The main thing was to put its cache on scratch; it got really >> upset when the cache directory was on NFS. (Instructions at the bottom of >> my previous email.) >> >> On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos >> wrote: >> >> That's great! Thanks Dougal. >> >> As I remember bazel was not installed correctly previously on GPU3. Do >> you know what went wrong with it before and why it is good now? >> >> Thanks, >> Barnabas >> ====================== >> Barnabas Poczos, PhD >> Assistant Professor >> Machine Learning Department >> Carnegie Mellon University >> >> >> On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland >> wrote: >> > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda >> 8.0 >> > install, and it built fine. So additionally installing 7.5 was probably >> not >> > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute >> architecture >> > that the Titan Xs use, so Theano at least needs to be manually told to >> use >> > an older architecture. >> > >> > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I >> think >> > it should work fine with the cudnn in my scratch directory. 
>> > >> > You should probably install it to scratch, either running this first to >> put >> > libraries your scratch directory or using a virtualenv or something: >> > export PYTHONUSERBASE=/home/scratch/$USER/.local >> > >> > You'll need this to use the library and probably to install it: >> > export >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/ >> lib64:"$LD_LIBRARY_PATH" >> > >> > To install: >> > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl >> > (remove --user if you're using a virtualenv) >> > >> > (A request: I'm submitting to ICLR in two weeks, and for some of the >> models >> > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please >> don't >> > run a ton of stuff on gpu3 unless you're working on a deadline too. >> > >> > >> > >> > Steps to install it, for the future: >> > >> > Install bazel in your home directory: >> > >> > wget >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/ >> bazel-0.3.2-installer-linux-x86_64.sh >> > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER >> > --base=/home/scratch/$USER/.bazel >> > >> > Configure bazel to build in scratch. There's probably a better way to do >> > this, but this works: >> > >> > mkdir /home/scratch/$USER/.cache >> > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel >> > >> > Build tensorflow. Note that builds from git checkouts don't work, >> because >> > they assume a newer version of git than is on gpu3: >> > >> > cd /home/scratch/$USER >> > wget >> > tar xf >> > cd tensorflow-0.11.0rc0 >> > ./configure >> > >> > This is an interactive script that doesn't seem to let you pass >> arguments or >> > anything. It's obnoxious. 
>> > Use the default python >> > don't use cloud platform or hadoop file system >> > use the default site-packages path if it asks >> > build with GPU support >> > default gcc >> > default Cuda SDK version >> > specify /usr/local/cuda-8.0 >> > default cudnn version >> > specify $CUDNN_DIR from use-cudnn.sh, e.g. >> > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda >> > Pascal Titan Xs have compute capability 6.1 >> > >> > bazel build -c opt --config=cuda >> > //tensorflow/tools/pip_package:build_pip_package >> > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ >> > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the >> > directory you specified above. >> > >> > >> > - Dougal >> > >> > >> > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy < >> kandasamy at cmu.edu> >> > wrote: >> >> >> >> Predrag, >> >> >> >> Any updates on gpu3? >> >> I have tried both tensorflow and chainer and in both cases the problem >> >> seems to be with cuda >> >> >> >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac < >> predragp at cs.cmu.edu> >> >> wrote: >> >>> >> >>> Dougal Sutherland wrote: >> >>> >> >>> > I tried for a while. I failed. >> >>> > >> >>> >> >>> Damn this doesn't look good. I guess back to the drawing board. Thanks >> >>> for the quick feed back. >> >>> >> >>> Predrag >> >>> >> >>> > Version 0.10.0 fails immediately on build: "The specified >> >>> > --crosstool_top >> >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >> >>> > cc_toolchain_suite >> >>> > rule." Apparently this is because 0.10 required an older version of >> >>> > bazel ( >> >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't >> have >> >>> > the >> >>> > energy to install an old version of bazel. >> >>> > >> >>> > Version 0.11.0rc0 gets almost done and then complains about no such >> >>> > file or >> >>> > directory for libcudart.so.7.5 (which is there, where I told >> tensorflow >> >>> > it >> >>> > was...). 
>> >>> > >> >>> > Non-release versions from git fail immediately because they call >> git -C >> >>> > to >> >>> > get version info, which is only in git 1.9 (we have 1.8). >> >>> > >> >>> > >> >>> > Some other notes: >> >>> > - I made a symlink from ~/.cache/bazel to >> >>> > /home/scratch/$USER/.cache/bazel, >> >>> > because bazel is the worst. (It complains about doing things on NFS, >> >>> > and >> >>> > hung for me [clock-related?], and I can't find a global config file >> or >> >>> > anything to change that in; it seems like there might be one, but >> their >> >>> > documentation is terrible.) >> >>> > >> >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, >> >>> > because that requires cuda 8; I used 5.2 instead. Probably not a >> huge >> >>> > deal, >> >>> > but I don't know. >> >>> > >> >>> > - I tried explicitly including /usr/local/cuda/lib64 in >> LD_LIBRARY_PATH >> >>> > and >> >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would >> >>> > help >> >>> > with the 0.11.0rc0 problem, but it didn't. >> >> >> >> >> > >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Fri Oct 21 15:27:32 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 21 Oct 2016 19:27:32 +0000 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> Message-ID: Heh. :) An explanation: - Different nvidia gpu architectures are called "compute capabilities". This is a number that describes the behavior of the card: the maximum size of various things, which API functions it supports, etc. There's a reference here , but it shouldn't really matter. - When CUDA compiles code, it targets a certain architecture, since it needs to know what features to use and whatnot. 
I *think* that if you compile for compute capability x, it will work on a card with compute capability y approximately iff x <= y. - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. - CUDA 7.5 doesn't know about compute capability 6.1, so if you ask to compile for 6.1 it crashes. - Theano by default tries to compile for the capability of the card, but can be configured to compile for a different capability. - Tensorflow asks for a list of capabilities to compile for when you build it in the first place. On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland wrote: > They do work with 7.5 if you specify an older compute architecture; it's > just that their actual compute capability of 6.1 isn't supported by cuda > 7.5. Thank is thrown off by this, for example, but it can be fixed by > telling it to pass compute capability 5.2 (for example) to nvcc. I don't > think that this was my problem with building tensorflow on 7.5; I'm not > sure what that was. > > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy > wrote: > > Thanks Dougal. I'll take a look atthis and get back to you. > So are you suggesting that this is an issue with TitanX's not being > compatible with 7.5? > > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland > wrote: > > I installed it in my scratch directory (not sure if there's a global > install?). The main thing was to put its cache on scratch; it got really > upset when the cache directory was on NFS. (Instructions at the bottom of > my previous email.) > > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote: > > That's great! Thanks Dougal. > > As I remember bazel was not installed correctly previously on GPU3. Do > you know what went wrong with it before and why it is good now? 
> > Thanks, > Barnabas > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland > wrote: > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda > 8.0 > > install, and it built fine. So additionally installing 7.5 was probably > not > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > architecture > > that the Titan Xs use, so Theano at least needs to be manually told to > use > > an older architecture. > > > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > think > > it should work fine with the cudnn in my scratch directory. > > > > You should probably install it to scratch, either running this first to > put > > libraries your scratch directory or using a virtualenv or something: > > export PYTHONUSERBASE=/home/scratch/$USER/.local > > > > You'll need this to use the library and probably to install it: > > export > > > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > > > > To install: > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > > (remove --user if you're using a virtualenv) > > > > (A request: I'm submitting to ICLR in two weeks, and for some of the > models > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > > run a ton of stuff on gpu3 unless you're working on a deadline too. > > > > > > > > Steps to install it, for the future: > > > > Install bazel in your home directory: > > > > wget > > > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER > > --base=/home/scratch/$USER/.bazel > > > > Configure bazel to build in scratch. 
There's probably a better way to do > > this, but this works: > > > > mkdir /home/scratch/$USER/.cache > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > > > > Build tensorflow. Note that builds from git checkouts don't work, because > > they assume a newer version of git than is on gpu3: > > > > cd /home/scratch/$USER > > wget > > tar xf > > cd tensorflow-0.11.0rc0 > > ./configure > > > > This is an interactive script that doesn't seem to let you pass > arguments or > > anything. It's obnoxious. > > Use the default python > > don't use cloud platform or hadoop file system > > use the default site-packages path if it asks > > build with GPU support > > default gcc > > default Cuda SDK version > > specify /usr/local/cuda-8.0 > > default cudnn version > > specify $CUDNN_DIR from use-cudnn.sh, e.g. > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > > Pascal Titan Xs have compute capability 6.1 > > > > bazel build -c opt --config=cuda > > //tensorflow/tools/pip_package:build_pip_package > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the > > directory you specified above. > > > > > > - Dougal > > > > > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > > > wrote: > >> > >> Predrag, > >> > >> Any updates on gpu3? > >> I have tried both tensorflow and chainer and in both cases the problem > >> seems to be with cuda > >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac > > >> wrote: > >>> > >>> Dougal Sutherland wrote: > >>> > >>> > I tried for a while. I failed. > >>> > > >>> > >>> Damn this doesn't look good. I guess back to the drawing board. Thanks > >>> for the quick feed back. > >>> > >>> Predrag > >>> > >>> > Version 0.10.0 fails immediately on build: "The specified > >>> > --crosstool_top > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > >>> > cc_toolchain_suite > >>> > rule." 
Apparently this is because 0.10 required an older version of > >>> > bazel ( > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't > have > >>> > the > >>> > energy to install an old version of bazel. > >>> > > >>> > Version 0.11.0rc0 gets almost done and then complains about no such > >>> > file or > >>> > directory for libcudart.so.7.5 (which is there, where I told > tensorflow > >>> > it > >>> > was...). > >>> > > >>> > Non-release versions from git fail immediately because they call git > -C > >>> > to > >>> > get version info, which is only in git 1.9 (we have 1.8). > >>> > > >>> > > >>> > Some other notes: > >>> > - I made a symlink from ~/.cache/bazel to > >>> > /home/scratch/$USER/.cache/bazel, > >>> > because bazel is the worst. (It complains about doing things on NFS, > >>> > and > >>> > hung for me [clock-related?], and I can't find a global config file > or > >>> > anything to change that in; it seems like there might be one, but > their > >>> > documentation is terrible.) > >>> > > >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, > >>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge > >>> > deal, > >>> > but I don't know. > >>> > > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > LD_LIBRARY_PATH > >>> > and > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would > >>> > help > >>> > with the 0.11.0rc0 problem, but it didn't. > >> > >> > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From predragp at cs.cmu.edu Fri Oct 21 15:37:27 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Fri, 21 Oct 2016 15:37:27 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> Message-ID: <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> Dougal Sutherland wrote: Sorry that I am late for the party. This is my interpretation of what we should do. 1. I will go back to CUDA 8.0, which will break MATLAB. We have to live with it. Barnabas, please OK this. I will work with MathWorks for this to be fixed for the 2017a release. 2. Then I could install TensorFlow compiled by Dougal system wide. Please, Dougal, after I upgrade back to 8.0, recompile it again using CUDA 8.0. I could give you the root password so that you can compile and install directly. 3. If everyone is OK with the above, I will pull the trigger on GPU3 at 4:30PM and upgrade to 8.0. 4. MATLAB will be broken on GPU2 as well after I put in the Titan cards during the October 25 power outage. Predrag > Heh. :) > > An explanation: > > - Different nvidia gpu architectures are called "compute capabilities". > This is a number that describes the behavior of the card: the maximum size > of various things, which API functions it supports, etc. There's a > reference here > , > but it shouldn't really matter. > - When CUDA compiles code, it targets a certain architecture, since it > needs to know what features to use and whatnot. I *think* that if you > compile for compute capability x, it will work on a card with compute > capability y approximately iff x <= y. > - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. > - CUDA 7.5 doesn't know about compute capability 6.1, so if you ask to > compile for 6.1 it crashes.
> - Theano by default tries to compile for the capability of the card, but > can be configured to compile for a different capability. > - Tensorflow asks for a list of capabilities to compile for when you > build it in the first place. > > > On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland wrote: > > > They do work with 7.5 if you specify an older compute architecture; it's > > just that their actual compute capability of 6.1 isn't supported by cuda > > 7.5. Thank is thrown off by this, for example, but it can be fixed by > > telling it to pass compute capability 5.2 (for example) to nvcc. I don't > > think that this was my problem with building tensorflow on 7.5; I'm not > > sure what that was. > > > > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy > > wrote: > > > > Thanks Dougal. I'll take a look atthis and get back to you. > > So are you suggesting that this is an issue with TitanX's not being > > compatible with 7.5? > > > > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland > > wrote: > > > > I installed it in my scratch directory (not sure if there's a global > > install?). The main thing was to put its cache on scratch; it got really > > upset when the cache directory was on NFS. (Instructions at the bottom of > > my previous email.) > > > > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote: > > > > That's great! Thanks Dougal. > > > > As I remember bazel was not installed correctly previously on GPU3. Do > > you know what went wrong with it before and why it is good now? > > > > Thanks, > > Barnabas > > ====================== > > Barnabas Poczos, PhD > > Assistant Professor > > Machine Learning Department > > Carnegie Mellon University > > > > > > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland > > wrote: > > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda > > 8.0 > > > install, and it built fine. 
So additionally installing 7.5 was probably > > not > > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > > architecture > > > that the Titan Xs use, so Theano at least needs to be manually told to > > use > > > an older architecture. > > > > > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > > think > > > it should work fine with the cudnn in my scratch directory. > > > > > > You should probably install it to scratch, either running this first to > > put > > > libraries your scratch directory or using a virtualenv or something: > > > export PYTHONUSERBASE=/home/scratch/$USER/.local > > > > > > You'll need this to use the library and probably to install it: > > > export > > > > > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > > > > > > To install: > > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > > > (remove --user if you're using a virtualenv) > > > > > > (A request: I'm submitting to ICLR in two weeks, and for some of the > > models > > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > > > run a ton of stuff on gpu3 unless you're working on a deadline too. > > > > > > > > > > > > Steps to install it, for the future: > > > > > > Install bazel in your home directory: > > > > > > wget > > > > > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > > > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER > > > --base=/home/scratch/$USER/.bazel > > > > > > Configure bazel to build in scratch. There's probably a better way to do > > > this, but this works: > > > > > > mkdir /home/scratch/$USER/.cache > > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > > > > > > Build tensorflow. 
Note that builds from git checkouts don't work, because > > > they assume a newer version of git than is on gpu3: > > > > > > cd /home/scratch/$USER > > > wget > > > tar xf > > > cd tensorflow-0.11.0rc0 > > > ./configure > > > > > > This is an interactive script that doesn't seem to let you pass > > arguments or > > > anything. It's obnoxious. > > > Use the default python > > > don't use cloud platform or hadoop file system > > > use the default site-packages path if it asks > > > build with GPU support > > > default gcc > > > default Cuda SDK version > > > specify /usr/local/cuda-8.0 > > > default cudnn version > > > specify $CUDNN_DIR from use-cudnn.sh, e.g. > > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > > > Pascal Titan Xs have compute capability 6.1 > > > > > > bazel build -c opt --config=cuda > > > //tensorflow/tools/pip_package:build_pip_package > > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the > > > directory you specified above. > > > > > > > > > - Dougal > > > > > > > > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > > > > > wrote: > > >> > > >> Predrag, > > >> > > >> Any updates on gpu3? > > >> I have tried both tensorflow and chainer and in both cases the problem > > >> seems to be with cuda > > >> > > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac > > > > >> wrote: > > >>> > > >>> Dougal Sutherland wrote: > > >>> > > >>> > I tried for a while. I failed. > > >>> > > > >>> > > >>> Damn this doesn't look good. I guess back to the drawing board. Thanks > > >>> for the quick feed back. > > >>> > > >>> Predrag > > >>> > > >>> > Version 0.10.0 fails immediately on build: "The specified > > >>> > --crosstool_top > > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > > >>> > cc_toolchain_suite > > >>> > rule." 
Apparently this is because 0.10 required an older version of > > >>> > bazel ( > > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't > > have > > >>> > the > > >>> > energy to install an old version of bazel. > > >>> > > > >>> > Version 0.11.0rc0 gets almost done and then complains about no such > > >>> > file or > > >>> > directory for libcudart.so.7.5 (which is there, where I told > > tensorflow > > >>> > it > > >>> > was...). > > >>> > > > >>> > Non-release versions from git fail immediately because they call git > > -C > > >>> > to > > >>> > get version info, which is only in git 1.9 (we have 1.8). > > >>> > > > >>> > > > >>> > Some other notes: > > >>> > - I made a symlink from ~/.cache/bazel to > > >>> > /home/scratch/$USER/.cache/bazel, > > >>> > because bazel is the worst. (It complains about doing things on NFS, > > >>> > and > > >>> > hung for me [clock-related?], and I can't find a global config file > > or > > >>> > anything to change that in; it seems like there might be one, but > > their > > >>> > documentation is terrible.) > > >>> > > > >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, > > >>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge > > >>> > deal, > > >>> > but I don't know. > > >>> > > > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > > LD_LIBRARY_PATH > > >>> > and > > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would > > >>> > help > > >>> > with the 0.11.0rc0 problem, but it didn't. 
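[Editorial note] Before rebuilding against CUDA 8.0 as planned above, a quick sanity check of which toolkit actually lives at the expected path can save a failed build. This is a hypothetical helper, not a command from the thread; the path is the one the messages assume.

```shell
# Hypothetical check: report the CUDA toolkit at the path used in the
# thread; prints a note instead of failing if the path does not exist.
NVCC=/usr/local/cuda-8.0/bin/nvcc
if [ -x "$NVCC" ]; then
    "$NVCC" --version
else
    echo "nvcc not found at $NVCC"
fi
```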
> > >> > > >> > > > > > > > > > From bapoczos at cs.cmu.edu Fri Oct 21 15:44:02 2016 From: bapoczos at cs.cmu.edu (Barnabas Poczos) Date: Fri, 21 Oct 2016 15:44:02 -0400 Subject: GPU3 back in business In-Reply-To: <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> Message-ID: Hi Predrag, If there is no other solution, then I think it is OK not to have Matlab on GPU2 and GPU3. Tensorflow has higher priority on these nodes. Best, Barnabas ====================== Barnabas Poczos, PhD Assistant Professor Machine Learning Department Carnegie Mellon University On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac wrote: > Dougal Sutherland wrote: > > > Sorry that I am late for the party. This is my interpretation of what we > should do. > > 1. I will go back to CUDA 8.0 which will brake MATLAB. We have to live > with it. Barnabas please OK this. I will work with MathWorks for this to > be fixed for 2017a release. > > 2. Then I could install TensorFlow compiled by Dougal system wide. > Please Dugal after I upgrade back to 8.0 recompile it again using CUDA > 8.0. I could give you the root password so that you can compile and > install directly. > > 3. If everyone is OK with above I will pull the trigger on GPU3 at > 4:30PM and upgrade to 8.0 > > 4. MATLAB will be broken on GPU2 as well after I put Titan cards during > the October 25 power outrage. > > Predrag > > > > > > >> Heh. :) >> >> An explanation: >> >> - Different nvidia gpu architectures are called "compute capabilities". >> This is a number that describes the behavior of the card: the maximum size >> of various things, which API functions it supports, etc. There's a >> reference here >> , >> but it shouldn't really matter. 
>> - When CUDA compiles code, it targets a certain architecture, since it >> needs to know what features to use and whatnot. I *think* that if you >> compile for compute capability x, it will work on a card with compute >> capability y approximately iff x <= y. >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you ask to >> compile for 6.1 it crashes. >> - Theano by default tries to compile for the capability of the card, but >> can be configured to compile for a different capability. >> - Tensorflow asks for a list of capabilities to compile for when you >> build it in the first place. >> >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland wrote: >> >> > They do work with 7.5 if you specify an older compute architecture; it's >> > just that their actual compute capability of 6.1 isn't supported by cuda >> > 7.5. Theano is thrown off by this, for example, but it can be fixed by >> > telling it to pass compute capability 5.2 (for example) to nvcc. I don't >> > think that this was my problem with building tensorflow on 7.5; I'm not >> > sure what that was. >> > >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy >> > wrote: >> > >> > Thanks Dougal. I'll take a look at this and get back to you. >> > So are you suggesting that this is an issue with the Titan Xs not being >> > compatible with 7.5? >> > >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland >> > wrote: >> > >> > I installed it in my scratch directory (not sure if there's a global >> > install?). The main thing was to put its cache on scratch; it got really >> > upset when the cache directory was on NFS. (Instructions at the bottom of >> > my previous email.) >> > >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos wrote: >> > >> > That's great! Thanks Dougal. >> > >> > As I remember, bazel was not installed correctly previously on GPU3. Do >> > you know what went wrong with it before and why it is good now?
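The compatibility rule in the explanation above, that code built for compute capability x runs on a card of capability y roughly when x <= y, can be sketched as a one-line version comparison. `runs_on` is a hypothetical helper for illustration, not part of CUDA or nvcc, and the rule is approximate, as the original says.

```shell
# Sketch of the rule above: a binary compiled for compute capability X is
# expected to run on a card of capability Y roughly iff X <= Y.
# runs_on is a hypothetical helper, not a CUDA tool.
runs_on() {
    compiled=$1
    card=$2
    # sort -V orders dotted version strings numerically (5.2 < 6.1).
    [ "$(printf '%s\n%s\n' "$compiled" "$card" | sort -V | head -n 1)" = "$compiled" ]
}

runs_on 5.2 6.1 && echo "a 5.2 build should run on a Pascal Titan X (6.1)"
runs_on 6.1 5.2 || echo "a 6.1 build will not run on a 5.2 card"
```

This is why Theano under cuda 7.5 can be told to target 5.2 and still drive the 6.1 Titan Xs, while asking 7.5 to target 6.1 directly simply fails.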
>> > >> > Thanks, >> > Barnabas >> > ====================== >> > Barnabas Poczos, PhD >> > Assistant Professor >> > Machine Learning Department >> > Carnegie Mellon University >> > >> > >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland >> > wrote: >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda >> > 8.0 >> > > install, and it built fine. So additionally installing 7.5 was probably >> > not >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute >> > architecture >> > > that the Titan Xs use, so Theano at least needs to be manually told to >> > use >> > > an older architecture. >> > > >> > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I >> > think >> > > it should work fine with the cudnn in my scratch directory. >> > > >> > > You should probably install it to scratch, either running this first to >> > put >> > > libraries in your scratch directory or using a virtualenv or something: >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local >> > > >> > > You'll need this to use the library and probably to install it: >> > > export >> > > >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" >> > > >> > > To install: >> > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl >> > > (remove --user if you're using a virtualenv) >> > > >> > > (A request: I'm submitting to ICLR in two weeks, and for some of the >> > models >> > > I'm running, gpu3's cards are 4x the speed of gpu1's or 2's. So please don't >> > > run a ton of stuff on gpu3 unless you're working on a deadline too.)
>> > > >> > > >> > > >> > > Steps to install it, for the future: >> > > >> > > Install bazel in your home directory: >> > > >> > > wget >> > > >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh >> > > bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER >> > > --base=/home/scratch/$USER/.bazel >> > > >> > > Configure bazel to build in scratch. There's probably a better way to do >> > > this, but this works: >> > > >> > > mkdir /home/scratch/$USER/.cache >> > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel >> > > >> > > Build tensorflow. Note that builds from git checkouts don't work, because >> > > they assume a newer version of git than is on gpu3: >> > > >> > > cd /home/scratch/$USER >> > > wget >> > > tar xf >> > > cd tensorflow-0.11.0rc0 >> > > ./configure >> > > >> > > This is an interactive script that doesn't seem to let you pass >> > arguments or >> > > anything. It's obnoxious. >> > > Use the default python >> > > don't use cloud platform or hadoop file system >> > > use the default site-packages path if it asks >> > > build with GPU support >> > > default gcc >> > > default Cuda SDK version >> > > specify /usr/local/cuda-8.0 >> > > default cudnn version >> > > specify $CUDNN_DIR from use-cudnn.sh, e.g. >> > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda >> > > Pascal Titan Xs have compute capability 6.1 >> > > >> > > bazel build -c opt --config=cuda >> > > //tensorflow/tools/pip_package:build_pip_package >> > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ >> > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put in the >> > > directory you specified above. >> > > >> > > >> > > - Dougal >> > > >> > > >> > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy > > > >> > > wrote: >> > >> >> > >> Predrag, >> > >> >> > >> Any updates on gpu3? 
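Collected in one place, the install recipe quoted above comes down to two environment settings plus pip. The concrete paths (Dougal's scratch cudnn directory, the wheel under ~dsutherl) are the thread's own examples and specific to the Auton GPU nodes; this is a sketch, not a general recipe.

```shell
# Sketch of the wheel-install steps quoted above; the dsutherl paths are
# the thread's examples and are specific to those machines.

# 1. Keep pip's --user installs on local scratch rather than the NFS home.
export PYTHONUSERBASE="/home/scratch/$USER/.local"

# 2. Make the unpacked cuDNN visible at install time and at import time.
export LD_LIBRARY_PATH="/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:$LD_LIBRARY_PATH"

# 3. Install the prebuilt wheel (drop --user inside a virtualenv).
wheel=~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl
if [ -f "$wheel" ]; then
    pip install --user "$wheel"
fi
```

The LD_LIBRARY_PATH export has to be repeated (or put in a shell profile) in every session that imports tensorflow, since the wheel links against that cuDNN at run time.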
>> > >> I have tried both tensorflow and chainer and in both cases the problem >> > >> seems to be with cuda >> > >> >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac > > > >> > >> wrote: >> > >>> >> > >>> Dougal Sutherland wrote: >> > >>> >> > >>> > I tried for a while. I failed. >> > >>> > >> > >>> >> > >>> Damn this doesn't look good. I guess back to the drawing board. Thanks >> > >>> for the quick feed back. >> > >>> >> > >>> Predrag >> > >>> >> > >>> > Version 0.10.0 fails immediately on build: "The specified >> > >>> > --crosstool_top >> > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >> > >>> > cc_toolchain_suite >> > >>> > rule." Apparently this is because 0.10 required an older version of >> > >>> > bazel ( >> > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I don't >> > have >> > >>> > the >> > >>> > energy to install an old version of bazel. >> > >>> > >> > >>> > Version 0.11.0rc0 gets almost done and then complains about no such >> > >>> > file or >> > >>> > directory for libcudart.so.7.5 (which is there, where I told >> > tensorflow >> > >>> > it >> > >>> > was...). >> > >>> > >> > >>> > Non-release versions from git fail immediately because they call git >> > -C >> > >>> > to >> > >>> > get version info, which is only in git 1.9 (we have 1.8). >> > >>> > >> > >>> > >> > >>> > Some other notes: >> > >>> > - I made a symlink from ~/.cache/bazel to >> > >>> > /home/scratch/$USER/.cache/bazel, >> > >>> > because bazel is the worst. (It complains about doing things on NFS, >> > >>> > and >> > >>> > hung for me [clock-related?], and I can't find a global config file >> > or >> > >>> > anything to change that in; it seems like there might be one, but >> > their >> > >>> > documentation is terrible.) >> > >>> > >> > >>> > - I wasn't able to use the actual Titan X compute capability of 6.1, >> > >>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge >> > >>> > deal, >> > >>> > but I don't know. 
>> > >>> > - I tried explicitly including /usr/local/cuda/lib64 in >> > LD_LIBRARY_PATH >> > >>> > and >> > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would >> > >>> > help >> > >>> > with the 0.11.0rc0 problem, but it didn't. >> > >> >> > >> >> > > >> > >> > >> > From predragp at cs.cmu.edu Fri Oct 21 15:50:32 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Fri, 21 Oct 2016 15:50:32 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> Message-ID: <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Barnabas Poczos wrote: > Hi Predrag, > > If there is no other solution, then I think it is OK not to have > Matlab on GPU2 and GPU3. > Tensorflow has higher priority on these nodes. We could possibly have multiple CUDA libraries for different versions, but that is going to bite us in the rear quickly. People who want to use MATLAB with GPUs will have to live with GPU1, probably until the spring release of MATLAB. Predrag > > Best, > Barnabas > > > > > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac wrote: > > Dougal Sutherland wrote: > > > > > > Sorry that I am late for the party. This is my interpretation of what we > > should do. > > > > 1. I will go back to CUDA 8.0, which will break MATLAB. We have to live > > with it. Barnabas, please OK this. I will work with MathWorks for this to > > be fixed for the 2017a release. > > > > 2. Then I could install TensorFlow compiled by Dougal system wide. > > Please, Dougal, after I upgrade back to 8.0, recompile it again using CUDA > > 8.0.
> >> > >> > >> > >> > >> > > > >> > > >> > > >> > From bapoczos at cs.cmu.edu Fri Oct 21 15:54:08 2016 From: bapoczos at cs.cmu.edu (Barnabas Poczos) Date: Fri, 21 Oct 2016 15:54:08 -0400 Subject: GPU3 back in business In-Reply-To: <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: Sounds good. Let us have tensorflow system wide on all GPU nodes. We can worry about Matlab later. Best, B ====================== Barnabas Poczos, PhD Assistant Professor Machine Learning Department Carnegie Mellon University On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac wrote: > Barnabas Poczos wrote: > >> Hi Predrag, >> >> If there is no other solution, then I think it is OK not to have >> Matlab on GPU2 and GPU3. >> Tensorflow has higher priority on these nodes. > > We could possibly have multiple CUDA libraries for different versions > but that is going to bite us for the rear end quickly. People who want > to use MATLAB with GPUs will have to live with GPU1 probably until > Spring release of MATLAB. > > Predrag > >> >> Best, >> Barnabas >> >> >> >> >> ====================== >> Barnabas Poczos, PhD >> Assistant Professor >> Machine Learning Department >> Carnegie Mellon University >> >> >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac wrote: >> > Dougal Sutherland wrote: >> > >> > >> > Sorry that I am late for the party. This is my interpretation of what we >> > should do. >> > >> > 1. I will go back to CUDA 8.0 which will brake MATLAB. We have to live >> > with it. Barnabas please OK this. I will work with MathWorks for this to >> > be fixed for 2017a release. >> > >> > 2. Then I could install TensorFlow compiled by Dougal system wide. 
>> >> > >> >> >> > >> >> >> > > >> >> > >> >> > >> >> > From kandasamy at cmu.edu Fri Oct 21 16:21:33 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 21 Oct 2016 16:21:33 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: Hi all, I was planning on using Matlab with GPUs for one of my projects. Can we please keep gpu2 as it is for now? samy On Fri, Oct 21, 2016 at 3:54 PM, Barnabas Poczos wrote: > Sounds good. Let us have tensorflow system wide on all GPU nodes. We > can worry about Matlab later. > > Best, > B > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac > wrote: > > Barnabas Poczos wrote: > > > >> Hi Predrag, > >> > >> If there is no other solution, then I think it is OK not to have > >> Matlab on GPU2 and GPU3. > >> Tensorflow has higher priority on these nodes. > > > > We could possibly have multiple CUDA libraries for different versions > > but that is going to bite us for the rear end quickly. People who want > > to use MATLAB with GPUs will have to live with GPU1 probably until > > Spring release of MATLAB. > > > > Predrag > > > >> > >> Best, > >> Barnabas > >> > >> > >> > >> > >> ====================== > >> Barnabas Poczos, PhD > >> Assistant Professor > >> Machine Learning Department > >> Carnegie Mellon University > >> > >> > >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac > wrote: > >> > Dougal Sutherland wrote: > >> > > >> > > >> > Sorry that I am late for the party. This is my interpretation of what > we > >> > should do. > >> > > >> > 1. 
I will go back to CUDA 8.0 which will brake MATLAB. We have to live > >> > with it. Barnabas please OK this. I will work with MathWorks for this > to > >> > be fixed for 2017a release. > >> > > >> > 2. Then I could install TensorFlow compiled by Dougal system wide. > >> > Please Dugal after I upgrade back to 8.0 recompile it again using CUDA > >> > 8.0. I could give you the root password so that you can compile and > >> > install directly. > >> > > >> > 3. If everyone is OK with above I will pull the trigger on GPU3 at > >> > 4:30PM and upgrade to 8.0 > >> > > >> > 4. MATLAB will be broken on GPU2 as well after I put Titan cards > during > >> > the October 25 power outrage. > >> > > >> > Predrag > >> > > >> > > >> > > >> > > >> > > >> > > >> >> Heh. :) > >> >> > >> >> An explanation: > >> >> > >> >> - Different nvidia gpu architectures are called "compute > capabilities". > >> >> This is a number that describes the behavior of the card: the > maximum size > >> >> of various things, which API functions it supports, etc. There's a > >> >> reference here > >> >> and_specifications>, > >> >> but it shouldn't really matter. > >> >> - When CUDA compiles code, it targets a certain architecture, > since it > >> >> needs to know what features to use and whatnot. I *think* that if > you > >> >> compile for compute capability x, it will work on a card with > compute > >> >> capability y approximately iff x <= y. > >> >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. > >> >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you > ask to > >> >> compile for 6.1 it crashes. > >> >> - Theano by default tries to compile for the capability of the > card, but > >> >> can be configured to compile for a different capability. > >> >> - Tensorflow asks for a list of capabilities to compile for when > you > >> >> build it in the first place. 
> >> >> > >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland > wrote: > >> >> > >> >> > They do work with 7.5 if you specify an older compute > architecture; it's > >> >> > just that their actual compute capability of 6.1 isn't supported > by cuda > >> >> > 7.5. Theano is thrown off by this, for example, but it can be fixed > by > >> >> > telling it to pass compute capability 5.2 (for example) to nvcc. I > don't > >> >> > think that this was my problem with building tensorflow on 7.5; > I'm not > >> >> > sure what that was. > >> >> > > >> >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy < kandasamy at cmu.edu> > >> >> > wrote: > >> >> > > >> >> > Thanks Dougal. I'll take a look at this and get back to you. > >> >> > So are you suggesting that this is an issue with Titan Xs not being > >> >> > compatible with 7.5? > >> >> > > >> >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland < dougal at gmail.com> > >> >> > wrote: > >> >> > > >> >> > I installed it in my scratch directory (not sure if there's a > global > >> >> > install?). The main thing was to put its cache on scratch; it got > really > >> >> > upset when the cache directory was on NFS. (Instructions at the > bottom of > >> >> > my previous email.) > >> >> > > >> >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos > wrote: > >> >> > > >> >> > That's great! Thanks Dougal. > >> >> > > >> >> > As I remember bazel was not installed correctly previously on > GPU3. Do > >> >> > you know what went wrong with it before and why it is good now? > >> >> > > >> >> > Thanks, > >> >> > Barnabas > >> >> > ====================== > >> >> > Barnabas Poczos, PhD > >> >> > Assistant Professor > >> >> > Machine Learning Department > >> >> > Carnegie Mellon University > >> >> > > >> >> > > >> >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland < dougal at gmail.com> > >> >> > wrote: > >> >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! 
I used > the cuda > >> >> > 8.0 > >> >> > > install, and it built fine. So additionally installing 7.5 was > probably > >> >> > not > >> >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > >> >> > architecture > >> >> > > that the Titan Xs use, so Theano at least needs to be manually > told to > >> >> > use > >> >> > > an older architecture. > >> >> > > > >> >> > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. > I > >> >> > think > >> >> > > it should work fine with the cudnn in my scratch directory. > >> >> > > > >> >> > > You should probably install it to scratch, either running this > first to > >> >> > put > >> >> > > libraries in your scratch directory or using a virtualenv or > something: > >> >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local > >> >> > > > >> >> > > You'll need this to use the library and probably to install it: > >> >> > > export > >> >> > > > >> >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > >> >> > > > >> >> > > To install: > >> >> > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > >> >> > > (remove --user if you're using a virtualenv) > >> >> > > > >> >> > > (A request: I'm submitting to ICLR in two weeks, and for some of > the > >> >> > models > >> >> > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So > please don't > >> >> > > run a ton of stuff on gpu3 unless you're working on a deadline > too.) > >> >> > > > >> >> > > > >> >> > > > >> >> > > Steps to install it, for the future: > >> >> > > > >> >> > > Install bazel in your home directory: > >> >> > > > >> >> > > wget > >> >> > > > >> >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > >> >> > > bash bazel-0.3.2-installer-linux-x86_64.sh > --prefix=/home/scratch/$USER > >> >> > > --base=/home/scratch/$USER/.bazel > >> >> > > > >> >> > > Configure bazel to build in scratch. 
There's probably a better > way to do > >> >> > > this, but this works: > >> >> > > > >> >> > > mkdir /home/scratch/$USER/.cache > >> >> > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > >> >> > > > >> >> > > Build tensorflow. Note that builds from git checkouts don't > work, because > >> >> > > they assume a newer version of git than is on gpu3: > >> >> > > > >> >> > > cd /home/scratch/$USER > >> >> > > wget > >> >> > > tar xf > >> >> > > cd tensorflow-0.11.0rc0 > >> >> > > ./configure > >> >> > > > >> >> > > This is an interactive script that doesn't seem to let you pass > >> >> > arguments or > >> >> > > anything. It's obnoxious. > >> >> > > Use the default python > >> >> > > don't use cloud platform or hadoop file system > >> >> > > use the default site-packages path if it asks > >> >> > > build with GPU support > >> >> > > default gcc > >> >> > > default Cuda SDK version > >> >> > > specify /usr/local/cuda-8.0 > >> >> > > default cudnn version > >> >> > > specify $CUDNN_DIR from use-cudnn.sh, e.g. > >> >> > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > >> >> > > Pascal Titan Xs have compute capability 6.1 > >> >> > > > >> >> > > bazel build -c opt --config=cuda > >> >> > > //tensorflow/tools/pip_package:build_pip_package > >> >> > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > >> >> > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put > in the > >> >> > > directory you specified above. > >> >> > > > >> >> > > > >> >> > > - Dougal > >> >> > > > >> >> > > > >> >> > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy < > kandasamy at cmu.edu > >> >> > > > >> >> > > wrote: > >> >> > >> > >> >> > >> Predrag, > >> >> > >> > >> >> > >> Any updates on gpu3? 
> >> >> > >> I have tried both tensorflow and chainer and in both cases the > problem > >> >> > >> seems to be with cuda > >> >> > >> > >> >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac < predragp at cs.cmu.edu > >> >> > > > >> >> > >> wrote: > >> >> > >>> > >> >> > >>> Dougal Sutherland wrote: > >> >> > >>> > >> >> > >>> > I tried for a while. I failed. > >> >> > >>> > > >> >> > >>> > >> >> > >>> Damn this doesn't look good. I guess back to the drawing > board. Thanks > >> >> > >>> for the quick feedback. > >> >> > >>> > >> >> > >>> Predrag > >> >> > >>> > >> >> > >>> > Version 0.10.0 fails immediately on build: "The specified > >> >> > >>> > --crosstool_top > >> >> > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > >> >> > >>> > cc_toolchain_suite > >> >> > >>> > rule." Apparently this is because 0.10 required an older > version of > >> >> > >>> > bazel ( > >> >> > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and > I don't > >> >> > have > >> >> > >>> > the > >> >> > >>> > energy to install an old version of bazel. > >> >> > >>> > > >> >> > >>> > Version 0.11.0rc0 gets almost done and then complains about > no such > >> >> > >>> > file or > >> >> > >>> > directory for libcudart.so.7.5 (which is there, where I told > >> >> > tensorflow > >> >> > >>> > it > >> >> > >>> > was...). > >> >> > >>> > > >> >> > >>> > Non-release versions from git fail immediately because they > call git > >> >> > -C > >> >> > >>> > to > >> >> > >>> > get version info, which is only in git 1.9 (we have 1.8). > >> >> > >>> > > >> >> > >>> > > >> >> > >>> > Some other notes: > >> >> > >>> > - I made a symlink from ~/.cache/bazel to > >> >> > >>> > /home/scratch/$USER/.cache/bazel, > >> >> > >>> > because bazel is the worst. 
(It complains about doing things > on NFS, > >> >> > >>> > and > >> >> > >>> > hung for me [clock-related?], and I can't find a global > config file > >> >> > or > >> >> > >>> > anything to change that in; it seems like there might be > one, but > >> >> > their > >> >> > >>> > documentation is terrible.) > >> >> > >>> > > >> >> > >>> > - I wasn't able to use the actual Titan X compute capability > of 6.1, > >> >> > >>> > because that requires cuda 8; I used 5.2 instead. Probably > not a huge > >> >> > >>> > deal, > >> >> > >>> > but I don't know. > >> >> > >>> > > >> >> > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > >> >> > LD_LIBRARY_PATH > >> >> > >>> > and > >> >> > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping > that would > >> >> > >>> > help > >> >> > >>> > with the 0.11.0rc0 problem, but it didn't. > >> >> > >> > >> >> > >> > >> >> > > > >> >> > > >> >> > > >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bapoczos at cs.cmu.edu Fri Oct 21 16:44:14 2016 From: bapoczos at cs.cmu.edu (Barnabas Poczos) Date: Fri, 21 Oct 2016 16:44:14 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: Hi Samy, Gpu1 will still have Matlab and 4 K80 GPus (which is technically 8 GPUs). Won't that be enough for now? Best, B ====================== Barnabas Poczos, PhD Assistant Professor Machine Learning Department Carnegie Mellon University On Fri, Oct 21, 2016 at 4:21 PM, Kirthevasan Kandasamy wrote: > Hi all, > > I was planning on using Matlab with GPUs for one of my projects. > Can we please keep gpu2 as it is for now? 
> > samy > > On Fri, Oct 21, 2016 at 3:54 PM, Barnabas Poczos > wrote: >> >> Sounds good. Let us have tensorflow system wide on all GPU nodes. We >> can worry about Matlab later. >> >> Best, >> B >> ====================== >> Barnabas Poczos, PhD >> Assistant Professor >> Machine Learning Department >> Carnegie Mellon University >> >> >> On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac >> wrote: >> > Barnabas Poczos wrote: >> > >> >> Hi Predrag, >> >> >> >> If there is no other solution, then I think it is OK not to have >> >> Matlab on GPU2 and GPU3. >> >> Tensorflow has higher priority on these nodes. >> > >> > We could possibly have multiple CUDA libraries for different versions >> > but that is going to bite us for the rear end quickly. People who want >> > to use MATLAB with GPUs will have to live with GPU1 probably until >> > Spring release of MATLAB. >> > >> > Predrag >> > >> >> >> >> Best, >> >> Barnabas >> >> >> >> >> >> >> >> >> >> ====================== >> >> Barnabas Poczos, PhD >> >> Assistant Professor >> >> Machine Learning Department >> >> Carnegie Mellon University >> >> >> >> >> >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac >> >> wrote: >> >> > Dougal Sutherland wrote: >> >> > >> >> > >> >> > Sorry that I am late for the party. This is my interpretation of what >> >> > we >> >> > should do. >> >> > >> >> > 1. I will go back to CUDA 8.0 which will brake MATLAB. We have to >> >> > live >> >> > with it. Barnabas please OK this. I will work with MathWorks for this >> >> > to >> >> > be fixed for 2017a release. >> >> > >> >> > 2. Then I could install TensorFlow compiled by Dougal system wide. >> >> > Please Dugal after I upgrade back to 8.0 recompile it again using >> >> > CUDA >> >> > 8.0. I could give you the root password so that you can compile and >> >> > install directly. >> >> > >> >> > 3. If everyone is OK with above I will pull the trigger on GPU3 at >> >> > 4:30PM and upgrade to 8.0 >> >> > >> >> > 4. 
MATLAB will be broken on GPU2 as well after I put Titan cards >> >> > during >> >> > the October 25 power outrage. >> >> > >> >> > Predrag >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> >> Heh. :) >> >> >> >> >> >> An explanation: >> >> >> >> >> >> - Different nvidia gpu architectures are called "compute >> >> >> capabilities". >> >> >> This is a number that describes the behavior of the card: the >> >> >> maximum size >> >> >> of various things, which API functions it supports, etc. There's >> >> >> a >> >> >> reference here >> >> >> >> >> >> , >> >> >> but it shouldn't really matter. >> >> >> - When CUDA compiles code, it targets a certain architecture, >> >> >> since it >> >> >> needs to know what features to use and whatnot. I *think* that if >> >> >> you >> >> >> compile for compute capability x, it will work on a card with >> >> >> compute >> >> >> capability y approximately iff x <= y. >> >> >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. >> >> >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you >> >> >> ask to >> >> >> compile for 6.1 it crashes. >> >> >> - Theano by default tries to compile for the capability of the >> >> >> card, but >> >> >> can be configured to compile for a different capability. >> >> >> - Tensorflow asks for a list of capabilities to compile for when >> >> >> you >> >> >> build it in the first place. >> >> >> >> >> >> >> >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland >> >> >> wrote: >> >> >> >> >> >> > They do work with 7.5 if you specify an older compute >> >> >> > architecture; it's >> >> >> > just that their actual compute capability of 6.1 isn't supported >> >> >> > by cuda >> >> >> > 7.5. Thank is thrown off by this, for example, but it can be fixed >> >> >> > by >> >> >> > telling it to pass compute capability 5.2 (for example) to nvcc. I >> >> >> > don't >> >> >> > think that this was my problem with building tensorflow on 7.5; >> >> >> > I'm not >> >> >> > sure what that was. 
>> >> >> > >> >> >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy >> >> >> > >> >> >> > wrote: >> >> >> > >> >> >> > Thanks Dougal. I'll take a look atthis and get back to you. >> >> >> > So are you suggesting that this is an issue with TitanX's not >> >> >> > being >> >> >> > compatible with 7.5? >> >> >> > >> >> >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland >> >> >> > >> >> >> > wrote: >> >> >> > >> >> >> > I installed it in my scratch directory (not sure if there's a >> >> >> > global >> >> >> > install?). The main thing was to put its cache on scratch; it got >> >> >> > really >> >> >> > upset when the cache directory was on NFS. (Instructions at the >> >> >> > bottom of >> >> >> > my previous email.) >> >> >> > >> >> >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos >> >> >> > wrote: >> >> >> > >> >> >> > That's great! Thanks Dougal. >> >> >> > >> >> >> > As I remember bazel was not installed correctly previously on >> >> >> > GPU3. Do >> >> >> > you know what went wrong with it before and why it is good now? >> >> >> > >> >> >> > Thanks, >> >> >> > Barnabas >> >> >> > ====================== >> >> >> > Barnabas Poczos, PhD >> >> >> > Assistant Professor >> >> >> > Machine Learning Department >> >> >> > Carnegie Mellon University >> >> >> > >> >> >> > >> >> >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland >> >> >> > >> >> >> > wrote: >> >> >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used >> >> >> > > the cuda >> >> >> > 8.0 >> >> >> > > install, and it built fine. So additionally installing 7.5 was >> >> >> > > probably >> >> >> > not >> >> >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute >> >> >> > architecture >> >> >> > > that the Titan Xs use, so Theano at least needs to be manually >> >> >> > > told to >> >> >> > use >> >> >> > > an older architecture. >> >> >> > > >> >> >> > > A pip package is in >> >> >> > > ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. 
I >> >> >> > think >> >> >> > > it should work fine with the cudnn in my scratch directory. >> >> >> > > >> >> >> > > You should probably install it to scratch, either running this >> >> >> > > first to >> >> >> > put >> >> >> > > libraries your scratch directory or using a virtualenv or >> >> >> > > something: >> >> >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local >> >> >> > > >> >> >> > > You'll need this to use the library and probably to install it: >> >> >> > > export >> >> >> > > >> >> >> > >> >> >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" >> >> >> > > >> >> >> > > To install: >> >> >> > > pip install --user >> >> >> > > ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl >> >> >> > > (remove --user if you're using a virtualenv) >> >> >> > > >> >> >> > > (A request: I'm submitting to ICLR in two weeks, and for some of >> >> >> > > the >> >> >> > models >> >> >> > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So >> >> >> > > please don't >> >> >> > > run a ton of stuff on gpu3 unless you're working on a deadline >> >> >> > > too. >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > Steps to install it, for the future: >> >> >> > > >> >> >> > > Install bazel in your home directory: >> >> >> > > >> >> >> > > wget >> >> >> > > >> >> >> > >> >> >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh >> >> >> > > bash bazel-0.3.2-installer-linux-x86_64.sh >> >> >> > > --prefix=/home/scratch/$USER >> >> >> > > --base=/home/scratch/$USER/.bazel >> >> >> > > >> >> >> > > Configure bazel to build in scratch. There's probably a better >> >> >> > > way to do >> >> >> > > this, but this works: >> >> >> > > >> >> >> > > mkdir /home/scratch/$USER/.cache >> >> >> > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel >> >> >> > > >> >> >> > > Build tensorflow. 
Note that builds from git checkouts don't >> >> >> > > work, because >> >> >> > > they assume a newer version of git than is on gpu3: >> >> >> > > >> >> >> > > cd /home/scratch/$USER >> >> >> > > wget >> >> >> > > tar xf >> >> >> > > cd tensorflow-0.11.0rc0 >> >> >> > > ./configure >> >> >> > > >> >> >> > > This is an interactive script that doesn't seem to let you pass >> >> >> > arguments or >> >> >> > > anything. It's obnoxious. >> >> >> > > Use the default python >> >> >> > > don't use cloud platform or hadoop file system >> >> >> > > use the default site-packages path if it asks >> >> >> > > build with GPU support >> >> >> > > default gcc >> >> >> > > default Cuda SDK version >> >> >> > > specify /usr/local/cuda-8.0 >> >> >> > > default cudnn version >> >> >> > > specify $CUDNN_DIR from use-cudnn.sh, e.g. >> >> >> > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda >> >> >> > > Pascal Titan Xs have compute capability 6.1 >> >> >> > > >> >> >> > > bazel build -c opt --config=cuda >> >> >> > > //tensorflow/tools/pip_package:build_pip_package >> >> >> > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ >> >> >> > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put >> >> >> > > in the >> >> >> > > directory you specified above. >> >> >> > > >> >> >> > > >> >> >> > > - Dougal >> >> >> > > >> >> >> > > >> >> >> > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy >> >> >> > > > >> >> > > >> >> >> > > wrote: >> >> >> > >> >> >> >> > >> Predrag, >> >> >> > >> >> >> >> > >> Any updates on gpu3? >> >> >> > >> I have tried both tensorflow and chainer and in both cases the >> >> >> > >> problem >> >> >> > >> seems to be with cuda >> >> >> > >> >> >> >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac >> >> >> > >> > >> >> > > >> >> >> > >> wrote: >> >> >> > >>> >> >> >> > >>> Dougal Sutherland wrote: >> >> >> > >>> >> >> >> > >>> > I tried for a while. I failed. >> >> >> > >>> > >> >> >> > >>> >> >> >> > >>> Damn this doesn't look good. 
I guess back to the drawing >> >> >> > >>> board. Thanks >> >> >> > >>> for the quick feed back. >> >> >> > >>> >> >> >> > >>> Predrag >> >> >> > >>> >> >> >> > >>> > Version 0.10.0 fails immediately on build: "The specified >> >> >> > >>> > --crosstool_top >> >> >> > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >> >> >> > >>> > cc_toolchain_suite >> >> >> > >>> > rule." Apparently this is because 0.10 required an older >> >> >> > >>> > version of >> >> >> > >>> > bazel ( >> >> >> > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and I >> >> >> > >>> > don't >> >> >> > have >> >> >> > >>> > the >> >> >> > >>> > energy to install an old version of bazel. >> >> >> > >>> > >> >> >> > >>> > Version 0.11.0rc0 gets almost done and then complains about >> >> >> > >>> > no such >> >> >> > >>> > file or >> >> >> > >>> > directory for libcudart.so.7.5 (which is there, where I told >> >> >> > tensorflow >> >> >> > >>> > it >> >> >> > >>> > was...). >> >> >> > >>> > >> >> >> > >>> > Non-release versions from git fail immediately because they >> >> >> > >>> > call git >> >> >> > -C >> >> >> > >>> > to >> >> >> > >>> > get version info, which is only in git 1.9 (we have 1.8). >> >> >> > >>> > >> >> >> > >>> > >> >> >> > >>> > Some other notes: >> >> >> > >>> > - I made a symlink from ~/.cache/bazel to >> >> >> > >>> > /home/scratch/$USER/.cache/bazel, >> >> >> > >>> > because bazel is the worst. (It complains about doing things >> >> >> > >>> > on NFS, >> >> >> > >>> > and >> >> >> > >>> > hung for me [clock-related?], and I can't find a global >> >> >> > >>> > config file >> >> >> > or >> >> >> > >>> > anything to change that in; it seems like there might be >> >> >> > >>> > one, but >> >> >> > their >> >> >> > >>> > documentation is terrible.) >> >> >> > >>> > >> >> >> > >>> > - I wasn't able to use the actual Titan X compute capability >> >> >> > >>> > of 6.1, >> >> >> > >>> > because that requires cuda 8; I used 5.2 instead. 
Probably >> >> >> > >>> > not a huge >> >> >> > >>> > deal, >> >> >> > >>> > but I don't know. >> >> >> > >>> > >> >> >> > >>> > - I tried explicitly including /usr/local/cuda/lib64 in >> >> >> > LD_LIBRARY_PATH >> >> >> > >>> > and >> >> >> > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping >> >> >> > >>> > that would >> >> >> > >>> > help >> >> >> > >>> > with the 0.11.0rc0 problem, but it didn't. >> >> >> > >> >> >> >> > >> >> >> >> > > >> >> >> > >> >> >> > >> >> >> > > > From kandasamy at cmu.edu Fri Oct 21 16:46:59 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Fri, 21 Oct 2016 16:46:59 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: Ok, that should be enough. samy On Fri, Oct 21, 2016 at 4:44 PM, Barnabas Poczos wrote: > Hi Samy, > > Gpu1 will still have Matlab and 4 K80 GPus (which is technically 8 > GPUs). Won't that be enough for now? > > Best, > B > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 4:21 PM, Kirthevasan Kandasamy > wrote: > > Hi all, > > > > I was planning on using Matlab with GPUs for one of my projects. > > Can we please keep gpu2 as it is for now? > > > > samy > > > > On Fri, Oct 21, 2016 at 3:54 PM, Barnabas Poczos > > wrote: > >> > >> Sounds good. Let us have tensorflow system wide on all GPU nodes. We > >> can worry about Matlab later. 
> >> > >> Best, > >> B > >> ====================== > >> Barnabas Poczos, PhD > >> Assistant Professor > >> Machine Learning Department > >> Carnegie Mellon University > >> > >> > >> On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac > > >> wrote: > >> > Barnabas Poczos wrote: > >> > > >> >> Hi Predrag, > >> >> > >> >> If there is no other solution, then I think it is OK not to have > >> >> Matlab on GPU2 and GPU3. > >> >> Tensorflow has higher priority on these nodes. > >> > > >> > We could possibly have multiple CUDA libraries for different versions > >> > but that is going to bite us for the rear end quickly. People who want > >> > to use MATLAB with GPUs will have to live with GPU1 probably until > >> > Spring release of MATLAB. > >> > > >> > Predrag > >> > > >> >> > >> >> Best, > >> >> Barnabas > >> >> > >> >> > >> >> > >> >> > >> >> ====================== > >> >> Barnabas Poczos, PhD > >> >> Assistant Professor > >> >> Machine Learning Department > >> >> Carnegie Mellon University > >> >> > >> >> > >> >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac > >> >> wrote: > >> >> > Dougal Sutherland wrote: > >> >> > > >> >> > > >> >> > Sorry that I am late for the party. This is my interpretation of > what > >> >> > we > >> >> > should do. > >> >> > > >> >> > 1. I will go back to CUDA 8.0 which will brake MATLAB. We have to > >> >> > live > >> >> > with it. Barnabas please OK this. I will work with MathWorks for > this > >> >> > to > >> >> > be fixed for 2017a release. > >> >> > > >> >> > 2. Then I could install TensorFlow compiled by Dougal system wide. > >> >> > Please Dugal after I upgrade back to 8.0 recompile it again using > >> >> > CUDA > >> >> > 8.0. I could give you the root password so that you can compile and > >> >> > install directly. > >> >> > > >> >> > 3. If everyone is OK with above I will pull the trigger on GPU3 at > >> >> > 4:30PM and upgrade to 8.0 > >> >> > > >> >> > 4. 
MATLAB will be broken on GPU2 as well after I put Titan cards > >> >> > during > >> >> > the October 25 power outrage. > >> >> > > >> >> > Predrag > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> >> Heh. :) > >> >> >> > >> >> >> An explanation: > >> >> >> > >> >> >> - Different nvidia gpu architectures are called "compute > >> >> >> capabilities". > >> >> >> This is a number that describes the behavior of the card: the > >> >> >> maximum size > >> >> >> of various things, which API functions it supports, etc. > There's > >> >> >> a > >> >> >> reference here > >> >> >> > >> >> >> and_specifications>, > >> >> >> but it shouldn't really matter. > >> >> >> - When CUDA compiles code, it targets a certain architecture, > >> >> >> since it > >> >> >> needs to know what features to use and whatnot. I *think* that > if > >> >> >> you > >> >> >> compile for compute capability x, it will work on a card with > >> >> >> compute > >> >> >> capability y approximately iff x <= y. > >> >> >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. > >> >> >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you > >> >> >> ask to > >> >> >> compile for 6.1 it crashes. > >> >> >> - Theano by default tries to compile for the capability of the > >> >> >> card, but > >> >> >> can be configured to compile for a different capability. > >> >> >> - Tensorflow asks for a list of capabilities to compile for > when > >> >> >> you > >> >> >> build it in the first place. > >> >> >> > >> >> >> > >> >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland < > dougal at gmail.com> > >> >> >> wrote: > >> >> >> > >> >> >> > They do work with 7.5 if you specify an older compute > >> >> >> > architecture; it's > >> >> >> > just that their actual compute capability of 6.1 isn't supported > >> >> >> > by cuda > >> >> >> > 7.5. 
Thank is thrown off by this, for example, but it can be > fixed > >> >> >> > by > >> >> >> > telling it to pass compute capability 5.2 (for example) to > nvcc. I > >> >> >> > don't > >> >> >> > think that this was my problem with building tensorflow on 7.5; > >> >> >> > I'm not > >> >> >> > sure what that was. > >> >> >> > > >> >> >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy > >> >> >> > > >> >> >> > wrote: > >> >> >> > > >> >> >> > Thanks Dougal. I'll take a look atthis and get back to you. > >> >> >> > So are you suggesting that this is an issue with TitanX's not > >> >> >> > being > >> >> >> > compatible with 7.5? > >> >> >> > > >> >> >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland > >> >> >> > > >> >> >> > wrote: > >> >> >> > > >> >> >> > I installed it in my scratch directory (not sure if there's a > >> >> >> > global > >> >> >> > install?). The main thing was to put its cache on scratch; it > got > >> >> >> > really > >> >> >> > upset when the cache directory was on NFS. (Instructions at the > >> >> >> > bottom of > >> >> >> > my previous email.) > >> >> >> > > >> >> >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos > >> >> >> > wrote: > >> >> >> > > >> >> >> > That's great! Thanks Dougal. > >> >> >> > > >> >> >> > As I remember bazel was not installed correctly previously on > >> >> >> > GPU3. Do > >> >> >> > you know what went wrong with it before and why it is good now? > >> >> >> > > >> >> >> > Thanks, > >> >> >> > Barnabas > >> >> >> > ====================== > >> >> >> > Barnabas Poczos, PhD > >> >> >> > Assistant Professor > >> >> >> > Machine Learning Department > >> >> >> > Carnegie Mellon University > >> >> >> > > >> >> >> > > >> >> >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland > >> >> >> > > >> >> >> > wrote: > >> >> >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used > >> >> >> > > the cuda > >> >> >> > 8.0 > >> >> >> > > install, and it built fine. 
So additionally installing 7.5 was > >> >> >> > > probably > >> >> >> > not > >> >> >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 > compute > >> >> >> > architecture > >> >> >> > > that the Titan Xs use, so Theano at least needs to be manually > >> >> >> > > told to > >> >> >> > use > >> >> >> > > an older architecture. > >> >> >> > > > >> >> >> > > A pip package is in > >> >> >> > > ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > >> >> >> > think > >> >> >> > > it should work fine with the cudnn in my scratch directory. > >> >> >> > > > >> >> >> > > You should probably install it to scratch, either running this > >> >> >> > > first to > >> >> >> > put > >> >> >> > > libraries your scratch directory or using a virtualenv or > >> >> >> > > something: > >> >> >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local > >> >> >> > > > >> >> >> > > You'll need this to use the library and probably to install > it: > >> >> >> > > export > >> >> >> > > > >> >> >> > > >> >> >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/ > lib64:"$LD_LIBRARY_PATH" > >> >> >> > > > >> >> >> > > To install: > >> >> >> > > pip install --user > >> >> >> > > ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > >> >> >> > > (remove --user if you're using a virtualenv) > >> >> >> > > > >> >> >> > > (A request: I'm submitting to ICLR in two weeks, and for some > of > >> >> >> > > the > >> >> >> > models > >> >> >> > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So > >> >> >> > > please don't > >> >> >> > > run a ton of stuff on gpu3 unless you're working on a deadline > >> >> >> > > too. 
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From dougal at gmail.com Fri Oct 21 17:07:35 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 21 Oct 2016 21:07:35 +0000 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: I don't think it would be a bad thing to have both versions of cuda installed and default to 8.0. To use 7.5 for matlab you probably just have to write a wrapper script to set LD_LIBRARY_PATH appropriately. On Fri, Oct 21, 2016 at 9:21 PM Kirthevasan Kandasamy wrote: > Hi all, > > I was planning on using Matlab with GPUs for one of my projects. > Can we please keep gpu2 as it is for now? > > samy > > On Fri, Oct 21, 2016 at 3:54 PM, Barnabas Poczos > wrote: > > Sounds good. Let us have tensorflow system wide on all GPU nodes. We > can worry about Matlab later. > > Best, > B > ====================== > Barnabas Poczos, PhD > Assistant Professor > Machine Learning Department > Carnegie Mellon University > > > On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac > wrote: > > Barnabas Poczos wrote: > > > >> Hi Predrag, > >> > >> If there is no other solution, then I think it is OK not to have > >> Matlab on GPU2 and GPU3. > >> Tensorflow has higher priority on these nodes. > > > > We could possibly have multiple CUDA libraries for different versions > > but that is going to bite us in the rear quickly. People who want > > to use MATLAB with GPUs will have to live with GPU1, probably until the > > Spring release of MATLAB.
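A wrapper along the lines Dougal suggests might look like the sketch below; the CUDA 7.5 install prefix and the assumption that `matlab` is on the PATH are mine, not confirmed by the thread.

```shell
#!/bin/sh
# Hypothetical wrapper: launch MATLAB against the CUDA 7.5 libraries
# instead of the default 8.0 ones. The install prefix is an assumption.
CUDA75_LIB=/usr/local/cuda-7.5/lib64
# Prepend so the 7.5 libcudart and friends are found before any 8.0 copies.
export LD_LIBRARY_PATH="$CUDA75_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
# exec matlab "$@"   # uncomment to actually launch MATLAB
```

Saving this as e.g. `matlab75` would let MATLAB users keep 7.5 while the system-wide default stays at 8.0.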
> > > > Predrag > > > >> > >> Best, > >> Barnabas > >> > >> > >> > >> ====================== > >> Barnabas Poczos, PhD > >> Assistant Professor > >> Machine Learning Department > >> Carnegie Mellon University > >> > >> > >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac > wrote: > >> > Dougal Sutherland wrote: > >> > > >> > > >> > Sorry that I am late to the party. This is my interpretation of what we > >> > should do. > >> > > >> > 1. I will go back to CUDA 8.0, which will break MATLAB. We have to live > >> > with it. Barnabas, please OK this. I will work with MathWorks for this to > >> > be fixed for the 2017a release. > >> > > >> > 2. Then I could install TensorFlow compiled by Dougal system wide. > >> > Dougal, please recompile it again using CUDA 8.0 after I upgrade back to > >> > 8.0. I could give you the root password so that you can compile and > >> > install directly. > >> > > >> > 3. If everyone is OK with the above I will pull the trigger on GPU3 at > >> > 4:30PM and upgrade to 8.0. > >> > > >> > 4. MATLAB will be broken on GPU2 as well after I put the Titan cards in > >> > during the October 25 power outage. > >> > > >> > Predrag > >> > > >> > > >> > > >> > > >> > > >> >> Heh. :) > >> >> > >> >> An explanation: > >> >> > >> >> - Different nvidia gpu architectures are called "compute capabilities". > >> >> This is a number that describes the behavior of the card: the maximum size > >> >> of various things, which API functions it supports, etc. There's a > >> >> reference here > >> >> < https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications>, > >> >> but it shouldn't really matter. > >> >> - When CUDA compiles code, it targets a certain architecture, since it > >> >> needs to know what features to use and whatnot. I *think* that if you > >> >> compile for compute capability x, it will work on a card with compute > >> >> capability y approximately iff x <= y.
> >> >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. > >> >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you ask to > >> >> compile for 6.1 it crashes. > >> >> - Theano by default tries to compile for the capability of the card, but > >> >> can be configured to compile for a different capability. > >> >> - Tensorflow asks for a list of capabilities to compile for when you > >> >> build it in the first place. > >> >> > >> >> > >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland > wrote: > >> >> > >> >> > They do work with 7.5 if you specify an older compute architecture; it's > >> >> > just that their actual compute capability of 6.1 isn't supported by cuda > >> >> > 7.5. Theano is thrown off by this, for example, but it can be fixed by > >> >> > telling it to pass compute capability 5.2 (for example) to nvcc. I don't > >> >> > think that this was my problem with building tensorflow on 7.5; I'm not > >> >> > sure what that was. > >> >> > > >> >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy < kandasamy at cmu.edu> > >> >> > wrote: > >> >> > > >> >> > Thanks Dougal. I'll take a look at this and get back to you. > >> >> > So are you suggesting that this is an issue with Titan Xs not being > >> >> > compatible with 7.5? > >> >> > > >> >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland < dougal at gmail.com> > >> >> > wrote: > >> >> > > >> >> > I installed it in my scratch directory (not sure if there's a global > >> >> > install?). The main thing was to put its cache on scratch; it got really > >> >> > upset when the cache directory was on NFS. (Instructions at the bottom of > >> >> > my previous email.) > >> >> > > >> >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos > wrote: > >> >> > > >> >> > That's great! Thanks Dougal. > >> >> > > >> >> > As I remember, bazel was not installed correctly previously on GPU3. Do > >> >> > you know what went wrong with it before and why it is good now?
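Dougal's rule of thumb (code compiled for capability x runs on a card of capability y roughly iff x <= y) can be sketched as a tiny helper; 6.1 for the Pascal Titan X and the 5.2 fallback target come from the thread, while 3.7 for the Tesla K80 is my assumption.

```shell
# can_run X Y: succeeds iff code compiled for compute capability X should
# run on a card of compute capability Y (the approximate x <= y rule).
can_run() {
    awk -v x="$1" -v y="$2" 'BEGIN { exit !(x <= y) }'
}

# A 5.2-targeted build still runs on a 6.1 (Pascal Titan X) card:
can_run 5.2 6.1 && echo "5.2 build runs on Pascal Titan X"
# A 6.1-targeted build won't run on a 3.7 (Tesla K80, assumed value) card:
can_run 6.1 3.7 || echo "6.1 build does not run on Tesla K80"
```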
> >> >> > > >> >> > Thanks, > >> >> > Barnabas > >> >> > ====================== > >> >> > Barnabas Poczos, PhD > >> >> > Assistant Professor > >> >> > Machine Learning Department > >> >> > Carnegie Mellon University > >> >> > > >> >> > > >> >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland < dougal at gmail.com> > >> >> > wrote: > >> >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda > >> >> > 8.0 > >> >> > > install, and it built fine. So additionally installing 7.5 was probably > >> >> > not > >> >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute > >> >> > architecture > >> >> > > that the Titan Xs use, so Theano at least needs to be manually told to > >> >> > use > >> >> > > an older architecture. > >> >> > > > >> >> > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. I > >> >> > think > >> >> > > it should work fine with the cudnn in my scratch directory. > >> >> > > > >> >> > > You should probably install it to scratch, either running this first to > >> >> > put > >> >> > > libraries in your scratch directory or using a virtualenv or something: > >> >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local > >> >> > > > >> >> > > You'll need this to use the library and probably to install it: > >> >> > > export > >> >> > > > >> >> > > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" > >> >> > > > >> >> > > To install: > >> >> > > pip install --user > ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl > >> >> > > (remove --user if you're using a virtualenv) > >> >> > > > >> >> > > (A request: I'm submitting to ICLR in two weeks, and for some of the > >> >> > models > >> >> > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't > >> >> > > run a ton of stuff on gpu3 unless you're working on a deadline too.
> >> >> > > > >> >> > > > >> >> > > > >> >> > > Steps to install it, for the future: > >> >> > > > >> >> > > Install bazel in your home directory: > >> >> > > > >> >> > > wget > >> >> > > > >> >> > > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh > >> >> > > bash bazel-0.3.2-installer-linux-x86_64.sh > --prefix=/home/scratch/$USER > >> >> > > --base=/home/scratch/$USER/.bazel > >> >> > > > >> >> > > Configure bazel to build in scratch. There's probably a better > way to do > >> >> > > this, but this works: > >> >> > > > >> >> > > mkdir /home/scratch/$USER/.cache > >> >> > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel > >> >> > > > >> >> > > Build tensorflow. Note that builds from git checkouts don't > work, because > >> >> > > they assume a newer version of git than is on gpu3: > >> >> > > > >> >> > > cd /home/scratch/$USER > >> >> > > wget > >> >> > > tar xf > >> >> > > cd tensorflow-0.11.0rc0 > >> >> > > ./configure > >> >> > > > >> >> > > This is an interactive script that doesn't seem to let you pass > >> >> > arguments or > >> >> > > anything. It's obnoxious. > >> >> > > Use the default python > >> >> > > don't use cloud platform or hadoop file system > >> >> > > use the default site-packages path if it asks > >> >> > > build with GPU support > >> >> > > default gcc > >> >> > > default Cuda SDK version > >> >> > > specify /usr/local/cuda-8.0 > >> >> > > default cudnn version > >> >> > > specify $CUDNN_DIR from use-cudnn.sh, e.g. > >> >> > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda > >> >> > > Pascal Titan Xs have compute capability 6.1 > >> >> > > > >> >> > > bazel build -c opt --config=cuda > >> >> > > //tensorflow/tools/pip_package:build_pip_package > >> >> > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ > >> >> > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is put > in the > >> >> > > directory you specified above. 
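For quick reference, the user-side part of the recipe above (everything after the wheel exists) collapses to a few lines; the wheel and cudnn paths are the site-specific ones quoted in this thread, so the `pip install` line is left commented out.

```shell
# Keep user-installed Python packages and the cudnn libraries on local
# scratch, then install the prebuilt wheel (paths are from this thread).
export PYTHONUSERBASE=/home/scratch/$USER/.local
export LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl
```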
> >> >> > > > >> >> > > > >> >> > > - Dougal > >> >> > > > >> >> > > > >> >> > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy < > kandasamy at cmu.edu > >> >> > > > >> >> > > wrote: > >> >> > >> > >> >> > >> Predrag, > >> >> > >> > >> >> > >> Any updates on gpu3? > >> >> > >> I have tried both tensorflow and chainer and in both cases the > problem > >> >> > >> seems to be with cuda > >> >> > >> > >> >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac < > predragp at cs.cmu.edu > >> >> > > > >> >> > >> wrote: > >> >> > >>> > >> >> > >>> Dougal Sutherland wrote: > >> >> > >>> > >> >> > >>> > I tried for a while. I failed. > >> >> > >>> > > >> >> > >>> > >> >> > >>> Damn this doesn't look good. I guess back to the drawing > board. Thanks > >> >> > >>> for the quick feed back. > >> >> > >>> > >> >> > >>> Predrag > >> >> > >>> > >> >> > >>> > Version 0.10.0 fails immediately on build: "The specified > >> >> > >>> > --crosstool_top > >> >> > >>> > '@local_config_cuda//crosstool:crosstool' is not a valid > >> >> > >>> > cc_toolchain_suite > >> >> > >>> > rule." Apparently this is because 0.10 required an older > version of > >> >> > >>> > bazel ( > >> >> > >>> > https://github.com/tensorflow/tensorflow/issues/4368), and > I don't > >> >> > have > >> >> > >>> > the > >> >> > >>> > energy to install an old version of bazel. > >> >> > >>> > > >> >> > >>> > Version 0.11.0rc0 gets almost done and then complains about > no such > >> >> > >>> > file or > >> >> > >>> > directory for libcudart.so.7.5 (which is there, where I told > >> >> > tensorflow > >> >> > >>> > it > >> >> > >>> > was...). > >> >> > >>> > > >> >> > >>> > Non-release versions from git fail immediately because they > call git > >> >> > -C > >> >> > >>> > to > >> >> > >>> > get version info, which is only in git 1.9 (we have 1.8). 
> >> >> > >>> > > >> >> > >>> > > >> >> > >>> > Some other notes: > >> >> > >>> > - I made a symlink from ~/.cache/bazel to > >> >> > >>> > /home/scratch/$USER/.cache/bazel, > >> >> > >>> > because bazel is the worst. (It complains about doing things > on NFS, > >> >> > >>> > and > >> >> > >>> > hung for me [clock-related?], and I can't find a global > config file > >> >> > or > >> >> > >>> > anything to change that in; it seems like there might be > one, but > >> >> > their > >> >> > >>> > documentation is terrible.) > >> >> > >>> > > >> >> > >>> > - I wasn't able to use the actual Titan X compute capability > of 6.1, > >> >> > >>> > because that requires cuda 8; I used 5.2 instead. Probably > not a huge > >> >> > >>> > deal, > >> >> > >>> > but I don't know. > >> >> > >>> > > >> >> > >>> > - I tried explicitly including /usr/local/cuda/lib64 in > >> >> > LD_LIBRARY_PATH > >> >> > >>> > and > >> >> > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping > that would > >> >> > >>> > help > >> >> > >>> > with the 0.11.0rc0 problem, but it didn't. > >> >> > >> > >> >> > >> > >> >> > > > >> >> > > >> >> > > >> >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Fri Oct 21 18:07:03 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 21 Oct 2016 22:07:03 +0000 Subject: TensorFlow 0.11.0rc0 now globally installed on gpu3 Message-ID: I just installed TensorFlow 0.11.0rc0 system-wide on gpu3. "import tensorflow" should now work without you having to do anything else; no messing with LD_LIBRARY_PATH, installing anything to your local python site, or anything like that anymore. Let me+Predrag know if anything seems broken. We'll also update to a later RC/full release when one is available. - Dougal PS: To do this, I did a global install of cudnn 5.1. 
If you need to use a different version of cudnn for some software, it *should* still work like it did before; just make sure that your cudnn.h takes priority over the one in /usr/local/cuda/include. But Theano, Caffe, etc. all work with cudnn 5.1 now, I think. -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at cs.cmu.edu Sat Oct 22 11:22:22 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Sat, 22 Oct 2016 11:22:22 -0400 Subject: Fwd: POSTPONED - October 25th 2016: [Power Outage] SCS Wean Hall machine room Message-ID: <20161022152222.dWNM09cuM%predragp@cs.cmu.edu> Dear Autonians, SCS is again postponing the power outage. Since I have received extra RAM and new GPU cards for GPU1 and GPU2, I am planning to offline them on Tuesday starting at 11:00 AM for hopefully no more than three hours, unless there is a serious community objection. If you think that my plan is dumb, please speak now. Predrag -------- Original Message -------- Subject: POSTPONED - October 25th 2016: [Power Outage] SCS Wean Hall machine room To: Edward J Walter From: Edward Walter Date: Fri, 21 Oct 2016 22:37:54 -0400 The planned power outage for the SCS Wean Hall machine room has been postponed again and will NOT take place on October 25th. We will let you know as soon as we have a new date for this power outage. We anticipate that it will happen in mid-late November. Thank you for your attention and patience, SCS Help Desk On 10/10/2016 09:26 AM, Edward Walter wrote: > The partial power outage for the SCS Wean Hall machine room has been > rescheduled for October 25th. We expect this work to take less than > 24 hours. The outage may run into 48 hours if the electrical > contractor encounters problems related to the planned maintenance > tasks. > > Please contact the SCS Help Desk at x8-4231 or send mail to > help at cs.cmu.edu with any questions or concerns regarding this > maintenance period.
> > Thank you for your attention, > > SCS Help Desk > > On 10/03/2016 07:28 AM, Edward Walter wrote: >> The planned power outage for the SCS Wean Hall machine room has >> been postponed and will NOT take place on October 4th. We are >> coordinating with the electrical contractors to get the work >> re-scheduled. We will let you know as soon as we have a new date >> for this power outage. >> >> Thank you. >> >> SCS Help Desk >> >> On 09/12/2016 08:01 AM, Edward Walter wrote: >>> SCS Computing Facilities and FMS are planning a partial power >>> outage in the SCS Wean Hall machine room. We expect this work >>> to begin on Oct 4th, 2016 and to take less than 24 hours. The >>> outage may run into 48 hours in the event that the electrical >>> contractor encounters something unexpected. >>> >>> The following servers or computational clusters will be affected >>> by this power outage: >>> >>> Affected clusters: ACTR.HPC1.CS.CMU.EDU AUTON >>> COMA.HPC1.CS.CMU.EDU CORTEX.ML.CMU.EDU LATEDAYS.ANDREW.CMU.EDU >>> PSYCH-O.HPC1.CS.CMU.EDU ROCKS.IS.CS.CMU.EDU >>> WORKHORSE.LTI.CS.CMU.EDU YODA.GRAPHICS.CS.CMU.EDU >>> >>> >>> Affected servers: OMEPSLID.COMPBIO SLIF.COMPBIO PACIFIC.DB >>> GPUSERVER.PERCEPTION GPUSERVER2.PERCEPTION GPUSERVER3.PERCEPTION >>> GPUSERVER5.PERCEPTION GPUSERVER6.PERCEPTION >>> GPUSERVER7.PERCEPTION DENVER.LTI LOR.LTI MIAMI.LTI SASKIA.ML >>> MARTEN.ML JAN.ML ARNOUT.ML LYSBET.ML FLORIS.ML >>> >>> >>> Please contact the SCS Help Desk at x8-4231 or send mail to >>> help at cs.cmu.edu with any questions or concerns regarding this >>> maintenance period. >>> >>> Thank you for your attention, >>> >>> SCS Help Desk From predragp at imap.srv.cs.cmu.edu Tue Oct 25 13:08:54 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Tue, 25 Oct 2016 13:08:54 -0400 Subject: GPU2 upgraded Message-ID: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Dear Autonians, As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards per Barnabas. 
Dougal is currently upgrading TensorFlow and Caffe to the latest and the greatest. Please wait for his e-mail before you hit the machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am working now on GPU1. Predrag P.S. I will escalate the MATLAB issue with MathWorks but I don't expect it to be fixed before R2017a. From dougal at gmail.com Tue Oct 25 14:14:18 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Tue, 25 Oct 2016 18:14:18 +0000 Subject: GPU2 upgraded In-Reply-To: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Message-ID: The same version of TensorFlow as on gpu3 is now installed on gpu2, along with cudnn; let me know if there are issues. I didn't do a global install of Caffe on either machine, because Caffe is kind of dumb and doesn't really do global installs. If anyone wants this, talk to me and we can figure out what makes sense. - Dougal On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < predragp at imap.srv.cs.cmu.edu> wrote: > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > per Barnabas. Dougal is currently upgrading TensorFlow and Caffe to the > latest and the greatest. Please wait for his e-mail before you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. > > > Predrag > > P.S. I will escalate the MATLAB issue with MathWorks but I don't expect it > to be fixed before R2017a. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at imap.srv.cs.cmu.edu Tue Oct 25 16:18:18 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Tue, 25 Oct 2016 16:18:18 -0400 Subject: GPU1 upgraded with caveat Message-ID: <456e00342968479a32953d9b109df5ae@imap.srv.cs.cmu.edu> Dear Autonians, GPU1 is upgraded with a caveat. The RAM is boosted to a total of 256 GB. The machine now has 4 Tesla K80 cards.
However, SuperMicro failed to send me two power dongles https://www.pinterest.com/pin/446208275556474885/ (top right corner). So only 2 K80s are usable now. I just got off the phone with the Silicon Mechanics people and they are furious about that, just like me. Mind you, the cards were over $4K each and a power dongle is $3 apiece, but it has to be shipped from SuperMicro. Predrag P.S. MATLAB should work as before on GPU1. GPU1 has cuda 7.5. From junieroliva at gmail.com Wed Oct 26 13:03:04 2016 From: junieroliva at gmail.com (Junier Oliva) Date: Wed, 26 Oct 2016 13:03:04 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Message-ID: Not sure if this is at all related, but is ipython broken for anyone else? It seems to just hang upon launching it on several auton machines (GPU1, GPU2, LOV4, LOW1). Thanks, Junier On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland wrote: > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > predragp at imap.srv.cs.cmu.edu> wrote: > >> Dear Autonians, >> >> As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards >> per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the >> latest and the greatest. Please wait until his e-mail until you hit the >> machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am >> working now on GPU1. >> >> >> Predrag >> >> P.S. I will escalate MATLAB issue with MathWorks but I don't expect to >> fixed before R2017a. >> >> -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dougal at gmail.com Wed Oct 26 13:25:37 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Wed, 26 Oct 2016 17:25:37 +0000 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Message-ID: Yes, that's happening for me too. I assumed it was just me because a) it doesn't happen for the auton-local account b) it also happens for my anaconda-installed ipython but this started happening for me today. Something very weird happening there. On Wed, Oct 26, 2016 at 6:03 PM Junier Oliva wrote: > Not sure if this at all related, but is ipython broken for anyone else? It > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > predragp at imap.srv.cs.cmu.edu> wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > latest and the greatest. Please wait until his e-mail until you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. > > > Predrag > > P.S. I will escalate MATLAB issue with MathWorks but I don't expect to > fixed before R2017a. > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From predragp at imap.srv.cs.cmu.edu Wed Oct 26 13:37:31 2016 From: predragp at imap.srv.cs.cmu.edu (Predrag Punosevac) Date: Wed, 26 Oct 2016 13:37:31 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Message-ID: <3192147ce48f50f2f9a131c1950039e2@imap.srv.cs.cmu.edu> On 2016-10-26 13:25, Dougal Sutherland wrote: > Yes, that's happening for me too. I assumed it was just me because > > a) it doesn't happen for the auton-local account > b) it also happens for my anaconda-installed ipython > > but this started happening for me today. Something very weird > happening there. > I just checked and it hangs for me as well with the regular account, but it doesn't hang with the auton-local account. The difference is that auton-local stores info on the local drive while my regular account is storing data on NFS shares. Since it is happening across the computing nodes, it is not an NFS client issue but an NFS server issue. I am clueless at this point about why this is happening. I don't want to reboot the file server until the power outage. I am going to think about this. I am guessing it would be possible to install ipython in your local scratch directory, and that one should start. Predrag > On Wed, Oct 26, 2016 at 6:03 PM Junier Oliva > wrote: > >> Not sure if this at all related, but is ipython broken for anyone >> else? It seems to just hang upon launching it on several auton >> machines (GPU1, GPU2, LOV4, LOW1). >> >> Thanks, >> Junier >> >> On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland >> wrote: >> >> The same version of TensorFlow as on gpu3 is now installed on gpu2, >> along with cudnn; let me know if there are issues. >> >> I didn't do a global install of Caffe on either machine, because >> Caffe is kind of dumb and doesn't really do global installs. If >> anyone wants this, talk to me and we can figure out what makes >> sense.
>> >> - Dougal >> >> On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac >> wrote: >> Dear Autonians, >> >> As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) >> cards >> per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to >> the >> latest and the greatest. Please wait until his e-mail until you hit >> the >> machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am >> working now on GPU1. >> >> Predrag >> >> P.S. I will escalate MATLAB issue with MathWorks but I don't expect >> to >> fixed before R2017a. From mayifei1012 at gmail.com Wed Oct 26 13:55:45 2016 From: mayifei1012 at gmail.com (yifei ma) Date: Wed, 26 Oct 2016 13:55:45 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> Message-ID: <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Second that on foxconn. It launches but the ipython client won't start. Thanks, Yifei On 10/26/2016 01:03 PM, Junier Oliva wrote: > Not sure if this at all related, but is ipython broken for anyone > else? It seems to just hang upon launching it on several auton > machines (GPU1, GPU2, LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on > gpu2, along with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because > Caffe is kind of dumb and doesn't really do global installs. If > anyone wants this, talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac > > wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan > (Pascal) cards > per Barnabas. Dugal is currently upgrading TensorFlow and > Caffe to the > latest and the greatest. Please wait until his e-mail until > you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. 
> > Predrag > > P.S. I will escalate MATLAB issue with MathWorks but I don't > expect to > fixed before R2017a. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Wed Oct 26 14:08:34 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Wed, 26 Oct 2016 18:08:34 +0000 Subject: GPU2 upgraded In-Reply-To: <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: Here's a workaround that avoids ipython having its config files / etc. on nfs; it seems to work for me: export IPYTHONDIR=/home/scratch/$USER/.ipython You can do this in a terminal or put it in your .bash_profile / similar to make it permanent. (I guess this means something changed about the nfs server yesterday/today that broke this.) On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > Second that on foxconn. It launches but the ipython client won't start. > > Thanks, > Yifei > > > On 10/26/2016 01:03 PM, Junier Oliva wrote: > > Not sure if this at all related, but is ipython broken for anyone else? It > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > predragp at imap.srv.cs.cmu.edu> wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > latest and the greatest.
Please wait until his e-mail until you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. > > > Predrag > > P.S. I will escalate MATLAB issue with MathWorks but I don't expect to > fixed before R2017a. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From junieroliva at gmail.com Wed Oct 26 14:20:32 2016 From: junieroliva at gmail.com (Junier Oliva) Date: Wed, 26 Oct 2016 14:20:32 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: Awesome, seems to work for me too. Thanks!! -Junier On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland wrote: > Here's a workaround that avoids ipython having its config files / etc on > nfs, it seems to work for me: > > export IPYTHONDIR=/home/scratch/$USER/.ipython > > You can do this in a terminal or put it your .bash_profile / similar to > make it permanent. > > (I guess this means something changed about the nfs server yesterday/today > that broke this.) > > > On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > >> Second that on foxconn. It launches but the ipython client won't start. >> >> Thanks, >> Yifei >> >> >> On 10/26/2016 01:03 PM, Junier Oliva wrote: >> >> Not sure if this at all related, but is ipython broken for anyone else? >> It seems to just hang upon launching it on several auton machines (GPU1, >> GPU2, LOV4, LOW1). >> >> Thanks, >> Junier >> >> On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland >> wrote: >> >> The same version of TensorFlow as on gpu3 is now installed on gpu2, along >> with cudnn; let me know if there are issues. >> >> I didn't do a global install of Caffe on either machine, because Caffe is >> kind of dumb and doesn't really do global installs. If anyone wants this, >> talk to me and we can figure out what makes sense. 
>> >> - Dougal >> >> On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < >> predragp at imap.srv.cs.cmu.edu> wrote: >> >> Dear Autonians, >> >> As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards >> per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the >> latest and the greatest. Please wait until his e-mail until you hit the >> machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am >> working now on GPU1. >> >> >> Predrag >> >> P.S. I will escalate MATLAB issue with MathWorks but I don't expect to >> fixed before R2017a. >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Wed Oct 26 15:11:01 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Wed, 26 Oct 2016 19:11:01 +0000 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: a) You might also want to do export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter if you use jupyter notebooks and whatnot. b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of this afternoon. It hangs in a similar way, though I've set the compiledir to be in scratch in my theanorc. I've tracked it down in the debugger to this line or this one (both hang, which one is called depends on theano settings), which call this function or this one . No idea why this started happening or if it's related -- it doesn't seem like it should be hitting nfs at all -- but it seemed to start at the same time. On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva wrote: > Awesome, seems to work for me too. Thanks!! 
> > -Junier > > On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland > wrote: > > Here's a workaround that avoids ipython having its config files / etc on > nfs, it seems to work for me: > > export IPYTHONDIR=/home/scratch/$USER/.ipython > > You can do this in a terminal or put it your .bash_profile / similar to > make it permanent. > > (I guess this means something changed about the nfs server yesterday/today > that broke this.) > > > On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > > Second that on foxconn. It launches but the ipython client won't start. > > Thanks, > Yifei > > > On 10/26/2016 01:03 PM, Junier Oliva wrote: > > Not sure if this at all related, but is ipython broken for anyone else? It > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > predragp at imap.srv.cs.cmu.edu> wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > latest and the greatest. Please wait until his e-mail until you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. > > > Predrag > > P.S. I will escalate MATLAB issue with MathWorks but I don't expect to > fixed before R2017a. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
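The scratch-directory exports from this thread can be collected into one ~/.bash_profile fragment; a sketch, where the /home/scratch layout and the SCRATCH override are assumptions:

```shell
# Keep IPython/Jupyter state off NFS by pointing both tools at local scratch.
SCRATCH="${SCRATCH:-/home/scratch/$USER}"
export IPYTHONDIR="$SCRATCH/.ipython"
export JUPYTER_CONFIG_DIR="$SCRATCH/.jupyter"
# Create the directories up front so the tools don't fall back to $HOME.
mkdir -p "$IPYTHONDIR" "$JUPYTER_CONFIG_DIR" 2>/dev/null || true
```

Sourcing this once per login makes the workaround permanent instead of per-terminal.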
URL: From hanqis at andrew.cmu.edu Wed Oct 26 15:24:22 2016 From: hanqis at andrew.cmu.edu (Hanqi Sun) Date: Wed, 26 Oct 2016 15:24:22 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: Hi, I am having similar issues with tensorflow as well. When I run a tensorflow program (like python XX.py), it hangs after loading the graph (after executing session.run()). I cannot quit it by hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr. Strangely, my program will terminate and do the job (like writing results to files). And after it terminates I am able to see all the strings I print to stdout/stderr. But before its termination I can do nothing about it. I have tried both my local tensorflow and the global one on all three GPU machines. They all had the same problem. Best, Hanqi On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland wrote: > a) You might also want to do > > export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter > > if you use jupyter notebooks and whatnot. > > > b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of > this afternoon. It hangs in a similar way, though I've set the compiledir > to be in scratch in my theanorc. I've tracked it down in the debugger to this > line > > or this one > (both > hang, which one is called depends on theano settings), which call this > function > > or this one > . > No idea why this started happening or if it's related -- it doesn't seem > like it should be hitting nfs at all -- but it seemed to start at the same > time. > > > > On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva > wrote: > >> Awesome, seems to work for me too. Thanks!! 
>> >> -Junier >> >> On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland >> wrote: >> >> Here's a workaround that avoids ipython having its config files / etc on >> nfs, it seems to work for me: >> >> export IPYTHONDIR=/home/scratch/$USER/.ipython >> >> You can do this in a terminal or put it your .bash_profile / similar to >> make it permanent. >> >> (I guess this means something changed about the nfs server >> yesterday/today that broke this.) >> >> >> On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: >> >> Second that on foxconn. It launches but the ipython client won't start. >> >> Thanks, >> Yifei >> >> >> On 10/26/2016 01:03 PM, Junier Oliva wrote: >> >> Not sure if this at all related, but is ipython broken for anyone else? >> It seems to just hang upon launching it on several auton machines (GPU1, >> GPU2, LOV4, LOW1). >> >> Thanks, >> Junier >> >> On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland >> wrote: >> >> The same version of TensorFlow as on gpu3 is now installed on gpu2, along >> with cudnn; let me know if there are issues. >> >> I didn't do a global install of Caffe on either machine, because Caffe is >> kind of dumb and doesn't really do global installs. If anyone wants this, >> talk to me and we can figure out what makes sense. >> >> - Dougal >> >> On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < >> predragp at imap.srv.cs.cmu.edu> wrote: >> >> Dear Autonians, >> >> As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards >> per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the >> latest and the greatest. Please wait until his e-mail until you hit the >> machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am >> working now on GPU1. >> >> >> Predrag >> >> P.S. I will escalate MATLAB issue with MathWorks but I don't expect to >> fixed before R2017a. >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dougal at gmail.com Wed Oct 26 15:38:09 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Wed, 26 Oct 2016 19:38:09 +0000 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: Hmm - I tested that tensorflow worked on the mnist example when I built it, but when I run python -m tensorflow.models.image.mnist.convolutional on gpu3 now, it gets to "Initialized!" and then just hangs, not responding to ^C or ^Z and also not leading to any GPU utiliziation according to nvidia-smi; it takes a kill -9 to stop it. On Wed, Oct 26, 2016 at 8:25 PM Hanqi Sun wrote: > Hi, > > I am having similar issues with tensorflow as well. > > When I run a tensorflow program (like python XX.py), it hangs after > loading the graph (after executing session.run()). I cannot quit it by > hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr. > > Strangely, my program will terminate and do the job (like writing results > to files). And after it terminates I am able to see all the strings I print > to stdout/stderr. But before its termination I can do nothing about it. > > I have tried both my local tensorflow and the global one on all three GPU > machines. They all had the same problem. > > Best, > Hanqi > > On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland > wrote: > > a) You might also want to do > > export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter > > if you use jupyter notebooks and whatnot. > > > b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of > this afternoon. It hangs in a similar way, though I've set the compiledir > to be in scratch in my theanorc. I've tracked it down in the debugger to this > line > > or this one > (both > hang, which one is called depends on theano settings), which call this > function > > or this one > . 
> No idea why this started happening or if it's related -- it doesn't seem > like it should be hitting nfs at all -- but it seemed to start at the same > time. > > > > On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva > wrote: > > Awesome, seems to work for me too. Thanks!! > > -Junier > > On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland > wrote: > > Here's a workaround that avoids ipython having its config files / etc on > nfs, it seems to work for me: > > export IPYTHONDIR=/home/scratch/$USER/.ipython > > You can do this in a terminal or put it your .bash_profile / similar to > make it permanent. > > (I guess this means something changed about the nfs server yesterday/today > that broke this.) > > > On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > > Second that on foxconn. It launches but the ipython client won't start. > > Thanks, > Yifei > > > On 10/26/2016 01:03 PM, Junier Oliva wrote: > > Not sure if this at all related, but is ipython broken for anyone else? It > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. > > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > predragp at imap.srv.cs.cmu.edu> wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > latest and the greatest. Please wait until his e-mail until you hit the > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > working now on GPU1. > > > Predrag > > P.S. 
I will escalate MATLAB issue with MathWorks but I don't expect to > fixed before R2017a. > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dougal at gmail.com Thu Oct 27 09:04:02 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Thu, 27 Oct 2016 13:04:02 +0000 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: I'm not sure if anyone changed anything, but ipython (without the IPYTHONDIR workaround above), theano with GPUs, and tensorflow -m tensorflow.models.image.mnist.convolutional are all working for me now. On Wed, Oct 26, 2016 at 8:38 PM Dougal Sutherland wrote: > Hmm - I tested that tensorflow worked on the mnist example when I built > it, but when I run > > python -m tensorflow.models.image.mnist.convolutional > > on gpu3 now, it gets to "Initialized!" and then just hangs, not responding > to ^C or ^Z and also not leading to any GPU utiliziation according to > nvidia-smi; it takes a kill -9 to stop it. > > > On Wed, Oct 26, 2016 at 8:25 PM Hanqi Sun wrote: > > Hi, > > I am having similar issues with tensorflow as well. > > When I run a tensorflow program (like python XX.py), it hangs after > loading the graph (after executing session.run()). I cannot quit it by > hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr. > > Strangely, my program will terminate and do the job (like writing results > to files). And after it terminates I am able to see all the strings I print > to stdout/stderr. But before its termination I can do nothing about it. > > I have tried both my local tensorflow and the global one on all three GPU > machines. They all had the same problem. > > Best, > Hanqi > > On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland > wrote: > > a) You might also want to do > > export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter > > if you use jupyter notebooks and whatnot. 
> > > b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of > this afternoon. It hangs in a similar way, though I've set the compiledir > to be in scratch in my theanorc. I've tracked it down in the debugger to this > line > > or this one > (both > hang, which one is called depends on theano settings), which call this > function > > or this one > . > No idea why this started happening or if it's related -- it doesn't seem > like it should be hitting nfs at all -- but it seemed to start at the same > time. > > > > On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva > wrote: > > Awesome, seems to work for me too. Thanks!! > > -Junier > > On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland > wrote: > > Here's a workaround that avoids ipython having its config files / etc on > nfs, it seems to work for me: > > export IPYTHONDIR=/home/scratch/$USER/.ipython > > You can do this in a terminal or put it your .bash_profile / similar to > make it permanent. > > (I guess this means something changed about the nfs server yesterday/today > that broke this.) > > > On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > > Second that on foxconn. It launches but the ipython client won't start. > > Thanks, > Yifei > > > On 10/26/2016 01:03 PM, Junier Oliva wrote: > > Not sure if this at all related, but is ipython broken for anyone else? It > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > LOV4, LOW1). > > Thanks, > Junier > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > wrote: > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > with cudnn; let me know if there are issues. > > I didn't do a global install of Caffe on either machine, because Caffe is > kind of dumb and doesn't really do global installs. If anyone wants this, > talk to me and we can figure out what makes sense. 
> > - Dougal > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > > predragp at imap.srv.cs.cmu.edu> wrote: > > Dear Autonians, > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > > latest and the greatest. Please wait until his e-mail until you hit the > > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > > working now on GPU1. > > > > > > Predrag > > > > P.S. I will escalate MATLAB issue with MathWorks but I don't expect to > > fixed before R2017a. > > > > > > > > > > From predragp at cs.cmu.edu Thu Oct 27 09:12:41 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 27 Oct 2016 09:12:41 -0400 Subject: GPU2 upgraded In-Reply-To: References: <11f6d11f7b5ef1519ec010d5562d1d71@imap.srv.cs.cmu.edu> <01c9e61e-ce42-bc8e-a9ca-c21d3895a301@gmail.com> Message-ID: <20161027131241.50BoK9FGQ%predragp@cs.cmu.edu> Dougal Sutherland wrote: > I'm not sure if anyone changed anything, but ipython (without the > IPYTHONDIR workaround above), theano with GPUs, and tensorflow > -m tensorflow.models.image.mnist.convolutional are all working for me now. > My guess is that somebody had some ipython notebooks open (I forgot how it works, but ipython is notorious for opening sockets and freezing the system) which probably killed NFS for a while. Waiting a bit for those problems to resolve on their own before poking the system is a good strategy. Predrag > On Wed, Oct 26, 2016 at 8:38 PM Dougal Sutherland wrote: > > > Hmm - I tested that tensorflow worked on the mnist example when I built > > it, but when I run > > > > python -m tensorflow.models.image.mnist.convolutional > > > > on gpu3 now, it gets to "Initialized!" and then just hangs, not responding > > to ^C or ^Z and also not leading to any GPU utiliziation according to > > nvidia-smi; it takes a kill -9 to stop it.
> > > > > > On Wed, Oct 26, 2016 at 8:25 PM Hanqi Sun wrote: > > > > Hi, > > > > I am having similar issues with tensorflow as well. > > > > When I run a tensorflow program (like python XX.py), it hangs after > > loading the graph (after executing session.run()). I cannot quit it by > > hitting Ctrl-Z/Ctrl-C and nothing is printed to the stdout/stderr. > > > > Strangely, my program will terminate and do the job (like writing results > > to files). And after it terminates I am able to see all the strings I print > > to stdout/stderr. But before its termination I can do nothing about it. > > > > I have tried both my local tensorflow and the global one on all three GPU > > machines. They all had the same problem. > > > > Best, > > Hanqi > > > > On Wed, Oct 26, 2016 at 3:11 PM, Dougal Sutherland > > wrote: > > > > a) You might also want to do > > > > export JUPYTER_CONFIG_DIR=/home/scratch/$USER/.jupyter > > > > if you use jupyter notebooks and whatnot. > > > > > > b) I'm also not able to import theano when using GPUs on gpu2/gpu3 as of > > this afternoon. It hangs in a similar way, though I've set the compiledir > > to be in scratch in my theanorc. I've tracked it down in the debugger to this > > line > > > > or this one > > (both > > hang, which one is called depends on theano settings), which call this > > function > > > > or this one > > . > > No idea why this started happening or if it's related -- it doesn't seem > > like it should be hitting nfs at all -- but it seemed to start at the same > > time. > > > > > > > > On Wed, Oct 26, 2016 at 7:20 PM Junier Oliva > > wrote: > > > > Awesome, seems to work for me too. Thanks!! 
> > > > -Junier > > > > On Wed, Oct 26, 2016 at 2:08 PM, Dougal Sutherland > > wrote: > > > > Here's a workaround that avoids ipython having its config files / etc on > > nfs, it seems to work for me: > > > > export IPYTHONDIR=/home/scratch/$USER/.ipython > > > > You can do this in a terminal or put it your .bash_profile / similar to > > make it permanent. > > > > (I guess this means something changed about the nfs server yesterday/today > > that broke this.) > > > > > > On Wed, Oct 26, 2016 at 7:03 PM yifei ma wrote: > > > > Second that on foxconn. It launches but the ipython client won't start. > > > > Thanks, > > Yifei > > > > > > On 10/26/2016 01:03 PM, Junier Oliva wrote: > > > > Not sure if this at all related, but is ipython broken for anyone else? It > > seems to just hang upon launching it on several auton machines (GPU1, GPU2, > > LOV4, LOW1). > > > > Thanks, > > Junier > > > > On Tue, Oct 25, 2016 at 2:14 PM, Dougal Sutherland > > wrote: > > > > The same version of TensorFlow as on gpu3 is now installed on gpu2, along > > with cudnn; let me know if there are issues. > > > > I didn't do a global install of Caffe on either machine, because Caffe is > > kind of dumb and doesn't really do global installs. If anyone wants this, > > talk to me and we can figure out what makes sense. > > > > - Dougal > > > > On Tue, Oct 25, 2016 at 6:09 PM Predrag Punosevac < > > predragp at imap.srv.cs.cmu.edu> wrote: > > > > Dear Autonians, > > > > As scheduled I upgraded GPU2 to 256 GB of RAM and 4 Titan (Pascal) cards > > per Barnabas. Dugal is currently upgrading TensorFlow and Caffe to the > > latest and the greatest. Please wait until his e-mail until you hit the > > machine. MATLAB is broken just like on GPU3 due to cuda-8.0. I am > > working now on GPU1. > > > > > > Predrag > > > > P.S. I will escalate MATLAB issue with MathWorks but I don't expect to > > fixed before R2017a. 
> > > > > > > > > > From predragp at cs.cmu.edu Thu Oct 27 12:57:48 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 27 Oct 2016 12:57:48 -0400 Subject: GPU1 to be powered down at 3:00 PM Message-ID: <20161027165748.FpSTtwU-k%predragp@cs.cmu.edu> I just got the power cables for those 2 non-functional Tesla cards in GPU1. I would like to install them ASAP, which means that I need to power down the server, hopefully for no more than 45 minutes. Predrag From predragp at cs.cmu.edu Thu Oct 27 15:36:59 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Thu, 27 Oct 2016 15:36:59 -0400 Subject: GPU1 to be powered down at 3:00 PM In-Reply-To: <20161027165748.FpSTtwU-k%predragp@cs.cmu.edu> References: <20161027165748.FpSTtwU-k%predragp@cs.cmu.edu> Message-ID: <20161027193659._06VWF9R6%predragp@cs.cmu.edu> Predrag Punosevac wrote: > I just got the power cables for those 2 non-functional Tesla cards in > GPU1. I would like to install them ASAP which means that I need to power > down the server hopefully for no more than 45 minutes. > > Predrag Folks, This is done! All 4 Tesla K80 GPU cards appear to be fully functional. Running nvidia-smi will show that you actually have 8 GPU devices, as each Tesla K80 consists of 2 K40 cards. If you are doing anything PDE- or ODE-related, this should be your go-to machine due to the properties of the Tesla cards. Deep learning and image processing work should stick to the GPU2 and GPU3 servers. MATLAB is also fully functional on GPU1. Predrag From dougal at gmail.com Fri Oct 28 13:04:24 2016 From: dougal at gmail.com (Dougal Sutherland) Date: Fri, 28 Oct 2016 17:04:24 +0000 Subject: a note about tensorflow and gpu memory usage Message-ID: Hi all, Something that's not necessarily obvious to everyone about tensorflow: if you just run something with tensorflow, it will by default allocate all of the memory on *all* GPUs on the machine. It's pretty unlikely that whatever model you're running is going to need all 48 GB in all 4 cards on gpu{2,3}.
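A quick sketch of the device pinning that the rest of this note explains (the GPU ids and the train.py job below are placeholders, not lab tooling):

```shell
# Limit which cards the CUDA runtime exposes to a job, so tensorflow
# cannot grab memory on every GPU. Ids follow the nvidia-smi ordering.
GPUS="0"                               # e.g. GPUS="1,3" to expose two cards
export CUDA_VISIBLE_DEVICES="$GPUS"
# python train.py   # <- hypothetical job; it would now see only gpu0
```

Putting the assignment on the command line itself, as in CUDA_VISIBLE_DEVICES=0 python train.py, scopes it to a single job instead of the whole shell session.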
:) To stop this behavior, set the environment variable CUDA_VISIBLE_DEVICES to only show tensorflow the relevant devices. For example, "CUDA_VISIBLE_DEVICES=0 python" will then have that tensorflow session use only gpu0. You can check which devices are free with nvidia-smi. Theano will pick a single gpu to use by default; to choose a specific one, use THEANO_FLAGS=device=gpu0. If you're running small models and want to run more than one on a single gpu, you can tell tensorflow to avoid allocating all of a GPU's memory with the methods discussed here. Setting per_process_gpu_memory_fraction lets it allocate a certain portion of the GPU's memory; setting allow_growth=True makes it only claim memory as it needs it. Theano's default behavior is similar to allow_growth=True; you can make it preallocate memory (and often get substantial speedups) with THEANO_FLAGS=device=gpu0,lib.cnmem=1. (lib.cnmem=.5 will allocate half the GPU's memory; lib.cnmem=1024 will allocate 1GB.) - Dougal -------------- next part -------------- An HTML attachment was scrubbed... URL: From predragp at cs.cmu.edu Sat Oct 29 10:35:11 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Sat, 29 Oct 2016 10:35:11 -0400 Subject: Gogs (Go Git Service) upgraded fully functional Message-ID: <20161029143511.-HDG-I7aA%predragp@cs.cmu.edu> Dear Autonians, Some of you were aware (or even using it) that for several months now we have been running our own Auton Lab Gogs (Go Git Service), a self-hosted Git service, in preparation for a complete migration from CVS and Subversion software versioning and revision control systems to Git. Gogs, as you know, is an open source alternative to GitHub (another one being GitLab, which we also tested). I just upgraded git and Gogs to [git at git ~/gogs/templates]$ git --version git version 2.9.2 [git at git ~/gogs/templates]$ more .VERSION 0.9.99.0915 Gogs is available at http://git.int.autonlab.org from the Auton Lab managed desktops or via x2goclient anywhere in the world.
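For people coming from CVS/Subversion, the basic Git cycle differs mainly in that commits are local until pushed; a self-contained sketch (runs in a throwaway directory, no Gogs account needed):

```shell
# Minimal local git cycle; nothing here touches the network.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.org"   # placeholder identity
git config user.name "Your Name"
echo "hello" > notes.txt
git add notes.txt                  # stage the file (roughly: cvs add)
git commit -q -m "first commit"    # recorded locally, unlike svn commit
```

With a Gogs account, publishing is then a matter of git remote add origin (using the clone URL shown in the Gogs web UI) followed by git push.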
I can now reproducibly and safely (using ZFS clone features) build and upgrade Gogs. We have a very robust installation running out of a FreeBSD jail on top of a ZFS data set. We take snapshots 4 times a day and have two remote replicas which, with a little work, can be made secondary mirrors. We have also just acquired a new server which will run in NREC and be used as a third remote replication target. Our Gogs server is integrated with the Jenkins continuous integration service, which we also run. Finally, my understanding is that the migration from gmake-magic to cmake is 100% under control. At this point I would like to ask everyone to wind down use of the Auton Lab CVS and Subversion services and prepare for a migration to Git. If you don't already have an account, please e-mail Simon (sheath at andrew.cmu.edu) or me (predragp at cs.cmu.edu) to get one. On behalf of the Git/Jenkins/cmake migration team (Anthony Wertz, Simon Heath, formerly Terence Wong, and yours truly). Predrag Punosevac P.S. We are in the process of creating a user's guide at our DokuWiki. People are welcome to stop by NSH 3119 and get a quick intro. For people who are not familiar with Git we recommend the Pro Git book https://git-scm.com/book/en/v2 From predragp at cs.cmu.edu Sat Oct 29 10:53:59 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Sat, 29 Oct 2016 10:53:59 -0400 Subject: Gogs (Go Git Service) upgraded fully functional In-Reply-To: <20161029143511.-HDG-I7aA%predragp@cs.cmu.edu> References: <20161029143511.-HDG-I7aA%predragp@cs.cmu.edu> Message-ID: <20161029145359.YltJuDS5-%predragp@cs.cmu.edu> Predrag Punosevac wrote: > Dear Autonians, > > Some of you were aware (or even using) that for several months now we Yes, e-mail notification now works as well :) Predrag > were running our own Auton Lab Gogs (Go Git Service), a self-hosted Git > service in preparation for a complete migration from CVS and Subversion > software versioning and revision control systems to Git.
Gogs as you > know is an open source alternative to GitHub (another one being GitLab > which we also tested). > > I just upgraded git and Gogs to > > [git at git ~/gogs/templates]$ git --version > git version 2.9.2 > [git at git ~/gogs/templates]$ more .VERSION > 0.9.99.0915 > > Gogs is available at > > http://git.int.autonlab.org > > from the Auton Lab managed desktops or via x2goclient anywhere on the > world. > > I can reproducibly and safely (using ZFS clone features) build and > upgrade Gogs now. We have a very robust installation running out of > FreeBSD jail on the top of a ZFS data set. We are talking snapshots 4 > times a day and have two remote replicates which with little work can be > made secondary mirrors. We have also just acquired a new server which > will run in NREC and be used as a third remote replication target. > > Our Gogs server is integrated with Jenkins continuous integration > service which we also run. Finally, my understanding is that we have > 100% under control migration from gmake-magic to cmake. > > At this point I would like to ask everyone to wind use of the Auton Lab > CVS and Subversion services and prepare for a migration to Git. If you > don't already have an account please e-mail Simon > (sheath at andrew.cmu.edu) or I (prdragp at cs.cmu.edu) to get one. > > In behalf of Git/Jenkins/cmake migration team (Anthony Wertz, Simon > Heath, formerly Terence Wong, and yours truly). > > > Predrag Punosevac > > P.S. We are in the process of creating user's guide at our DokuWiki. > People are welcome to stop by NSH 3119 and get quick intro. 
For people > who are not familiar with Git we recommend Pro Git book > > https://git-scm.com/book/en/v2 From kandasamy at cmu.edu Sat Oct 29 21:02:34 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Sat, 29 Oct 2016 21:02:34 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019172209.esq71ATV8%predragp@cs.cmu.edu> <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: Predrag, Following up on matlab and cuda 7.5. - It seems as if Matlab is not installed on gpu2 and gpu3. Can we have it installed even without GPU support. In most cases, when I use matlab for my gpu experiments, I will be executing an external python command which can call the GPU. - Also, on gpu2/3 as Dougal suggested can we also install cuda 7.5, but default to 8.0? thanks, samy On Fri, Oct 21, 2016 at 5:07 PM, Dougal Sutherland wrote: > I don't think it would be a bad thing to have both versions of cuda > installed and default to 8.0. To use 7.5 for matlab you probably just have > to write a wrapper script to set LD_LIBRARY_FLAGS appropriately. > > On Fri, Oct 21, 2016 at 9:21 PM Kirthevasan Kandasamy > wrote: > >> Hi all, >> >> I was planning on using Matlab with GPUs for one of my projects. >> Can we please keep gpu2 as it is for now? >> >> samy >> >> On Fri, Oct 21, 2016 at 3:54 PM, Barnabas Poczos >> wrote: >> >> Sounds good. Let us have tensorflow system wide on all GPU nodes. We >> can worry about Matlab later. 
>> >> Best, >> B >> ====================== >> Barnabas Poczos, PhD >> Assistant Professor >> Machine Learning Department >> Carnegie Mellon University >> >> >> On Fri, Oct 21, 2016 at 3:50 PM, Predrag Punosevac >> wrote: >> > Barnabas Poczos wrote: >> > >> >> Hi Predrag, >> >> >> >> If there is no other solution, then I think it is OK not to have >> >> Matlab on GPU2 and GPU3. >> >> Tensorflow has higher priority on these nodes. >> > >> > We could possibly have multiple CUDA libraries for different versions >> > but that is going to bite us in the rear end quickly. People who want >> > to use MATLAB with GPUs will have to live with GPU1 probably until >> > the Spring release of MATLAB. >> > >> > Predrag >> > >> >> >> >> Best, >> >> Barnabas >> >> >> >> >> >> >> >> >> >> ====================== >> >> Barnabas Poczos, PhD >> >> Assistant Professor >> >> Machine Learning Department >> >> Carnegie Mellon University >> >> >> >> >> >> On Fri, Oct 21, 2016 at 3:37 PM, Predrag Punosevac < >> predragp at cs.cmu.edu> wrote: >> >> > Dougal Sutherland wrote: >> >> > >> >> > >> >> > Sorry that I am late for the party. This is my interpretation of >> what we >> >> > should do. >> >> > >> >> > 1. I will go back to CUDA 8.0, which will break MATLAB. We have to >> live >> >> > with it. Barnabas please OK this. I will work with MathWorks for >> this to >> >> > be fixed for the 2017a release. >> >> > >> >> > 2. Then I could install TensorFlow compiled by Dougal system wide. >> >> > Please Dougal after I upgrade back to 8.0 recompile it again using >> CUDA >> >> > 8.0. I could give you the root password so that you can compile and >> >> > install directly. >> >> > >> >> > 3. If everyone is OK with the above I will pull the trigger on GPU3 at >> >> > 4:30PM and upgrade to 8.0 >> >> > >> >> > 4. MATLAB will be broken on GPU2 as well after I put Titan cards in >> during >> >> > the October 25 power outage. >> >> > >> >> > Predrag >> >> > >> >> > >> >> > >> >> > >> >> > >> >> > >> >> >> Heh.
:) >> >> >> >> >> >> An explanation: >> >> >> >> >> >> - Different nvidia gpu architectures are called "compute >> capabilities". >> >> >> This is a number that describes the behavior of the card: the >> maximum size >> >> >> of various things, which API functions it supports, etc. There's >> a >> >> >> reference here >> >> >> > and_specifications>, >> >> >> but it shouldn't really matter. >> >> >> - When CUDA compiles code, it targets a certain architecture, >> since it >> >> >> needs to know what features to use and whatnot. I *think* that >> if you >> >> >> compile for compute capability x, it will work on a card with >> compute >> >> >> capability y approximately iff x <= y. >> >> >> - Pascal Titan Xs, like gpu3 has, have compute capability 6.1. >> >> >> - CUDA 7.5 doesn't know about compute capability 6.1, so if you >> ask to >> >> >> compile for 6.1 it crashes. >> >> >> - Theano by default tries to compile for the capability of the >> card, but >> >> >> can be configured to compile for a different capability. >> >> >> - Tensorflow asks for a list of capabilities to compile for when >> you >> >> >> build it in the first place. >> >> >> >> >> >> >> >> >> On Fri, Oct 21, 2016 at 8:17 PM Dougal Sutherland >> wrote: >> >> >> >> >> >> > They do work with 7.5 if you specify an older compute >> architecture; it's >> >> >> > just that their actual compute capability of 6.1 isn't supported >> by cuda >> >> >> > 7.5. Theano is thrown off by this, for example, but it can be >> fixed by >> >> >> > telling it to pass compute capability 5.2 (for example) to nvcc. >> I don't >> >> >> > think that this was my problem with building tensorflow on 7.5; >> I'm not >> >> >> > sure what that was. >> >> >> > >> >> >> > On Fri, Oct 21, 2016, 8:11 PM Kirthevasan Kandasamy < >> kandasamy at cmu.edu> >> >> >> > wrote: >> >> >> > >> >> >> > Thanks Dougal. I'll take a look at this and get back to you.
>> >> >> > So are you suggesting that this is an issue with Titan Xs not >> being >> >> >> > compatible with 7.5? >> >> >> > >> >> >> > On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland < >> dougal at gmail.com> >> >> >> > wrote: >> >> >> > >> >> >> > I installed it in my scratch directory (not sure if there's a >> global >> >> >> > install?). The main thing was to put its cache on scratch; it got >> really >> >> >> > upset when the cache directory was on NFS. (Instructions at the >> bottom of >> >> >> > my previous email.) >> >> >> > >> >> >> > On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos < >> bapoczos at cs.cmu.edu> wrote: >> >> >> > >> >> >> > That's great! Thanks Dougal. >> >> >> > >> >> >> > As I remember, bazel was not installed correctly previously on >> GPU3. Do >> >> >> > you know what went wrong with it before and why it works now? >> >> >> > >> >> >> > Thanks, >> >> >> > Barnabas >> >> >> > ====================== >> >> >> > Barnabas Poczos, PhD >> >> >> > Assistant Professor >> >> >> > Machine Learning Department >> >> >> > Carnegie Mellon University >> >> >> > >> >> >> > >> >> >> > On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland < >> dougal at gmail.com> >> >> >> > wrote: >> >> >> > > I was just able to build tensorflow 0.11.0rc0 on gpu3! I used >> the cuda >> >> >> > 8.0 >> >> >> > > install, and it built fine. So additionally installing 7.5 was >> probably >> >> >> > not >> >> >> > > necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute >> >> >> > architecture >> >> >> > > that the Titan Xs use, so Theano at least needs to be manually >> told to >> >> >> > use >> >> >> > > an older architecture. >> >> >> > > >> >> >> > > A pip package is in ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl. >> I >> >> >> > think >> >> >> > > it should work fine with the cudnn in my scratch directory.
>> >> >> > > >> >> >> > > You should probably install it to scratch, either running this >> first to >> >> >> > put >> >> >> > > libraries in your scratch directory or using a virtualenv or >> something: >> >> >> > > export PYTHONUSERBASE=/home/scratch/$USER/.local >> >> >> > > >> >> >> > > You'll need this to use the library and probably to install it: >> >> >> > > export >> >> >> > > >> >> >> > LD_LIBRARY_PATH=/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64:"$LD_LIBRARY_PATH" >> >> >> > > >> >> >> > > To install: >> >> >> > > pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl >> >> >> > > (remove --user if you're using a virtualenv) >> >> >> > > >> >> >> > > (A request: I'm submitting to ICLR in two weeks, and for some >> of the >> >> >> > models >> >> >> > > I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So >> please don't >> >> >> > > run a ton of stuff on gpu3 unless you're working on a deadline >> too.) >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > Steps to install it, for the future: >> >> >> > > >> >> >> > > Install bazel in your home directory: >> >> >> > > >> >> >> > > wget >> >> >> > > >> >> >> > https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh >> >> >> > > bash bazel-0.3.2-installer-linux-x86_64.sh >> --prefix=/home/scratch/$USER >> >> >> > > --base=/home/scratch/$USER/.bazel >> >> >> > > >> >> >> > > Configure bazel to build in scratch. There's probably a better >> way to do >> >> >> > > this, but this works: >> >> >> > > >> >> >> > > mkdir /home/scratch/$USER/.cache >> >> >> > > ln -s /home/scratch/$USER/.cache/bazel ~/.cache/bazel >> >> >> > > >> >> >> > > Build tensorflow.
Note that builds from git checkouts don't >> work, because >> >> >> > > they assume a newer version of git than is on gpu3: >> >> >> > > >> >> >> > > cd /home/scratch/$USER >> >> >> > > wget >> >> >> > > tar xf >> >> >> > > cd tensorflow-0.11.0rc0 >> >> >> > > ./configure >> >> >> > > >> >> >> > > This is an interactive script that doesn't seem to let you pass >> >> >> > arguments or >> >> >> > > anything. It's obnoxious. >> >> >> > > Use the default python >> >> >> > > don't use cloud platform or hadoop file system >> >> >> > > use the default site-packages path if it asks >> >> >> > > build with GPU support >> >> >> > > default gcc >> >> >> > > default Cuda SDK version >> >> >> > > specify /usr/local/cuda-8.0 >> >> >> > > default cudnn version >> >> >> > > specify $CUDNN_DIR from use-cudnn.sh, e.g. >> >> >> > > /home/scratch/dsutherl/cudnn-8.0-5.1/cuda >> >> >> > > Pascal Titan Xs have compute capability 6.1 >> >> >> > > >> >> >> > > bazel build -c opt --config=cuda >> >> >> > > //tensorflow/tools/pip_package:build_pip_package >> >> >> > > bazel-bin/tensorflow/tools/pip_package/build_pip_package ./ >> >> >> > > A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-any.whl, is >> put in the >> >> >> > > directory you specified above. >> >> >> > > >> >> >> > > >> >> >> > > - Dougal >> >> >> > > >> >> >> > > >> >> >> > > On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy < >> kandasamy at cmu.edu >> >> >> > > >> >> >> > > wrote: >> >> >> > >> >> >> >> > >> Predrag, >> >> >> > >> >> >> >> > >> Any updates on gpu3? >> >> >> > >> I have tried both tensorflow and chainer and in both cases the >> problem >> >> >> > >> seems to be with cuda >> >> >> > >> >> >> >> > >> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac < >> predragp at cs.cmu.edu >> >> >> > > >> >> >> > >> wrote: >> >> >> > >>> >> >> >> > >>> Dougal Sutherland wrote: >> >> >> > >>> >> >> >> > >>> > I tried for a while. I failed. >> >> >> > >>> > >> >> >> > >>> >> >> >> > >>> Damn this doesn't look good. 
I guess back to the drawing >> board. Thanks >> >> >>> for the quick feedback. >> >> >>> >> >> >>> Predrag >> >> >>> >> >> >>> > Version 0.10.0 fails immediately on build: "The specified >> >> >>> > --crosstool_top >> >> >>> > '@local_config_cuda//crosstool:crosstool' is not a valid >> >> >>> > cc_toolchain_suite >> >> >>> > rule." Apparently this is because 0.10 required an older >> version of >> >> >>> > bazel ( >> >> >>> > https://github.com/tensorflow/tensorflow/issues/4368), and >> I don't >> >> > have >> >> >>> > the >> >> >>> > energy to install an old version of bazel. >> >> >>> > >> >> >>> > Version 0.11.0rc0 gets almost done and then complains about >> no such >> >> >>> > file or >> >> >>> > directory for libcudart.so.7.5 (which is there, where I told >> >> > tensorflow >> >> >>> > it >> >> >>> > was...). >> >> >>> > >> >> >>> > Non-release versions from git fail immediately because they >> call git >> >> > -C >> >> >>> > to >> >> >>> > get version info, which is only in git 1.9 (we have 1.8). >> >> >>> > >> >> >>> > >> >> >>> > Some other notes: >> >> >>> > - I made a symlink from ~/.cache/bazel to >> >> >>> > /home/scratch/$USER/.cache/bazel, >> >> >>> > because bazel is the worst. (It complains about doing >> things on NFS, >> >> >>> > and >> >> >>> > hung for me [clock-related?], and I can't find a global >> config file >> >> > or >> >> >>> > anything to change that in; it seems like there might be >> one, but >> >> > their >> >> >>> > documentation is terrible.) >> >> >>> > >> >> >>> > - I wasn't able to use the actual Titan X compute >> capability of 6.1, >> >> >>> > because that requires cuda 8; I used 5.2 instead. Probably >> not a huge >> >> >>> > deal, >> >> >>> > but I don't know.
>> >> >> > >>> > >> >> >> > >>> > - I tried explicitly including /usr/local/cuda/lib64 in >> >> > LD_LIBRARY_PATH >> >> > >>> > and >> >> > >>> > set CUDA_HOME to /usr/local/cuda before building, hoping >> that would >> >> > >>> > help >> >> > >>> > with the 0.11.0rc0 problem, but it didn't. >> >> > >> >> >> > >> >> >> > > >> >> > >> >> >> > >> >> >> > >> >> >> From predragp at cs.cmu.edu Sat Oct 29 21:52:49 2016 From: predragp at cs.cmu.edu (Predrag Punosevac) Date: Sat, 29 Oct 2016 21:52:49 -0400 Subject: GPU3 back in business In-Reply-To: References: <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> Message-ID: <20161030015249.2Y1mt3H_h%predragp@cs.cmu.edu> Kirthevasan Kandasamy wrote: > Predrag, > > Following up on matlab and cuda 7.5. > - It seems as if Matlab is not installed on gpu2 and gpu3. Can we have it > installed even without GPU support? In most cases, when I use matlab for my > gpu experiments, I will be executing an external python command which can > call the GPU. > - Also, on gpu2/3 as Dougal suggested can we also install cuda 7.5, but > default to 8.0? > > thanks, > samy MATLAB is installed. The licensing manager was not started, per Barnabas. I started it now because you insist on it. I really don't appreciate this back and forth discussion on Saturday evening over something I believed we agreed on. My understanding was that you guys wanted the latest and greatest TensorFlow on Titan X (Pascal) cards. That requires CUDA 8.0. MATLAB is broken on CUDA 8.0 and it will freeze if GPUs are involved no matter how calls are made. MATLAB is only tested on CUDA 7.5 (not by me but by people at MathWorks).
MATLAB is fully functional on GPU1, which has 8 Tesla K80 (Kepler) GPU cards. We all agreed that we can live with MATLAB only on GPU1. Those were the last marching orders of Dr. Poczos. Predrag P.S. If we freeze servers tonight, a bunch of image processing guys will be screaming at me until Monday. From kandasamy at cmu.edu Sat Oct 29 22:04:55 2016 From: kandasamy at cmu.edu (Kirthevasan Kandasamy) Date: Sat, 29 Oct 2016 22:04:55 -0400 Subject: GPU3 back in business In-Reply-To: <20161030015249.2Y1mt3H_h%predragp@cs.cmu.edu> References: <20161019183651.3Xy0murhr%predragp@cs.cmu.edu> <7B1F10ED8F1045F6.b801b0a4-484f-4c6e-9558-4a83a0cb33ec@mail.outlook.com> <20161019201053.KHd8koxJl%predragp@cs.cmu.edu> <20161021193727.9hJzyE8s5%predragp@cs.cmu.edu> <20161021195032.EWGQfn13b%predragp@cs.cmu.edu> <20161030015249.2Y1mt3H_h%predragp@cs.cmu.edu> Message-ID: Hi Predrag, Thanks for the prompt response. The MATLAB issue wasn't really urgent. I sent the email now, lest I forget, but it would have worked well had you looked into it on Monday. samy On Sat, Oct 29, 2016 at 9:52 PM, Predrag Punosevac wrote: > Kirthevasan Kandasamy wrote: > > > Predrag, > > > > Following up on matlab and cuda 7.5. > > - It seems as if Matlab is not installed on gpu2 and gpu3. Can we have it > > installed even without GPU support? In most cases, when I use matlab for > my > > gpu experiments, I will be executing an external python command which can > > call the GPU. > > - Also, on gpu2/3 as Dougal suggested can we also install cuda 7.5, but > > default to 8.0? > > > > thanks, > > samy > > > MATLAB is installed. The licensing manager was not started, per Barnabas. > I started it now because you insist on it. I really don't appreciate > this back and forth discussion on Saturday evening over something I > believed we agreed on. > > My understanding was that you guys wanted the latest and greatest TensorFlow > on Titan X (Pascal) cards. That requires CUDA 8.0.
MATLAB is broken on CUDA > 8.0 and it will freeze if GPUs are involved no matter how calls are > made. MATLAB is only tested on CUDA 7.5 (not by me but by people at > MathWorks). MATLAB is fully functional on GPU1, which has 8 Tesla K80 > (Kepler) GPU cards. We all agreed that we can live with MATLAB only on > GPU1. Those were the last marching orders of Dr. Poczos. > > Predrag > > P.S. If we freeze servers tonight, a bunch of image processing guys will be > screaming at me until Monday. >
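For future reference, the environment setup that the install instructions in this thread rely on can be collected in one place. This is a sketch only: the cuDNN path and scratch layout are the ones quoted by Dougal, specific to those machines, and may not exist elsewhere.

```python
# Collected sketch of the scratch-install environment from the thread.
# The cudnn path and /home/scratch layout are machine-specific
# assumptions taken from Dougal's messages above.
import os

scratch = "/home/scratch/" + os.environ.get("USER", "demo")

# Keep pip --user installs (and their caches) off the NFS-mounted home:
os.environ["PYTHONUSERBASE"] = scratch + "/.local"

# Make the privately unpacked cuDNN visible to TensorFlow at runtime:
cudnn_lib = "/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64"
old_path = os.environ.get("LD_LIBRARY_PATH", "")
os.environ["LD_LIBRARY_PATH"] = cudnn_lib + ((":" + old_path) if old_path else "")

print(os.environ["PYTHONUSERBASE"].endswith("/.local"))  # -> True
```

With these variables exported (the shell equivalent of the two assignments), the `pip install --user` command quoted in the thread lands under the scratch directory instead of the NFS home.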