<div dir="ltr">Thanks Dougal. I'll take a look at this and get back to you.<div>So are you suggesting that this is an issue with the Titan Xs not being compatible with CUDA 7.5?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 21, 2016 at 3:08 PM, Dougal Sutherland <span dir="ltr"><<a href="mailto:dougal@gmail.com" target="_blank">dougal@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">I installed it in my scratch directory (not sure if there's a global install?). The main thing was to put its cache on scratch; it got really upset when the cache directory was on NFS. (Instructions at the bottom of my previous email.)</p><div class="HOEnZb"><div class="h5">
<br><div class="gmail_quote"><div dir="ltr">On Fri, Oct 21, 2016, 8:04 PM Barnabas Poczos <<a href="mailto:bapoczos@cs.cmu.edu" target="_blank">bapoczos@cs.cmu.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">That's great! Thanks Dougal.<br class="m_-6269736360502246421gmail_msg">
<br class="m_-6269736360502246421gmail_msg">
As I remember bazel was not installed correctly previously on GPU3. Do<br class="m_-6269736360502246421gmail_msg">
you know what went wrong with it before and why it is good now?<br class="m_-6269736360502246421gmail_msg">
<br class="m_-6269736360502246421gmail_msg">
Thanks,<br class="m_-6269736360502246421gmail_msg">
Barnabas<br class="m_-6269736360502246421gmail_msg">
======================<br class="m_-6269736360502246421gmail_msg">
Barnabas Poczos, PhD<br class="m_-6269736360502246421gmail_msg">
Assistant Professor<br class="m_-6269736360502246421gmail_msg">
Machine Learning Department<br class="m_-6269736360502246421gmail_msg">
Carnegie Mellon University<br class="m_-6269736360502246421gmail_msg">
<br class="m_-6269736360502246421gmail_msg">
<br class="m_-6269736360502246421gmail_msg">
On Fri, Oct 21, 2016 at 2:03 PM, Dougal Sutherland <<a href="mailto:dougal@gmail.com" class="m_-6269736360502246421gmail_msg" target="_blank">dougal@gmail.com</a>> wrote:<br class="m_-6269736360502246421gmail_msg">
> I was just able to build tensorflow 0.11.0rc0 on gpu3! I used the cuda 8.0<br class="m_-6269736360502246421gmail_msg">
> install, and it built fine. So additionally installing 7.5 was probably not<br class="m_-6269736360502246421gmail_msg">
> necessary; in fact, cuda 7.5 doesn't know about the 6.1 compute architecture<br class="m_-6269736360502246421gmail_msg">
> that the Titan Xs use, so Theano at least needs to be manually told to use<br class="m_-6269736360502246421gmail_msg">
> an older architecture.<br class="m_-6269736360502246421gmail_msg">
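<br>
For the CUDA 7.5 case, a minimal shell sketch of telling Theano to target an older architecture might look like the following. The helper name cc_to_sm is made up for illustration, 5.2 is the fallback capability mentioned further down this thread, and the nvcc.flags setting assumes Theano's older CUDA backend; treat it as a starting point, not a verified recipe for gpu3.<br>
<pre>
```shell
# cc_to_sm is a hypothetical helper: turns a compute capability
# such as "5.2" into the matching nvcc architecture name "sm_52".
cc_to_sm() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# CUDA 7.5 does not know about 6.1, so fall back to 5.2 there.
ARCH="$(cc_to_sm 5.2)"

# Assumption: Theano's old CUDA backend passes nvcc.flags on to nvcc.
# Note that this overwrites any THEANO_FLAGS already set in the shell.
export THEANO_FLAGS="nvcc.flags=-arch=$ARCH"
```
</pre>
With the CUDA 8.0 install this fallback should not be needed, since 8.0 knows about 6.1.<br>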
><br class="m_-6269736360502246421gmail_msg">
> A pip package is in ~dsutherl/tensorflow-0.11.<wbr>0rc0-py2-none-any.whl. I think<br class="m_-6269736360502246421gmail_msg">
> it should work fine with the cudnn in my scratch directory.<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> You should probably install it to scratch, either running this first to put<br class="m_-6269736360502246421gmail_msg">
> libraries in your scratch directory or using a virtualenv or something:<br class="m_-6269736360502246421gmail_msg">
> export PYTHONUSERBASE=/home/scratch/$<wbr>USER/.local<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> You'll need this to use the library and probably to install it:<br class="m_-6269736360502246421gmail_msg">
> export<br class="m_-6269736360502246421gmail_msg">
> LD_LIBRARY_PATH=/home/scratch/<wbr>dsutherl/cudnn-8.0-5.1/cuda/<wbr>lib64:"$LD_LIBRARY_PATH"<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> To install:<br class="m_-6269736360502246421gmail_msg">
> pip install --user ~dsutherl/tensorflow-0.11.<wbr>0rc0-py2-none-any.whl<br class="m_-6269736360502246421gmail_msg">
> (remove --user if you're using a virtualenv)<br class="m_-6269736360502246421gmail_msg">
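<br>
The setup above can be collected into one snippet to source before installing or importing tensorflow. This is only a sketch: SCRATCH and CUDNN_LIB are illustrative variable names, and the paths are the ones quoted in this email. The one change from the commands above is that LD_LIBRARY_PATH is extended without leaving a trailing colon when it starts out empty (an empty entry makes the loader search the current directory).<br>
<pre>
```shell
# Hypothetical variable names; the paths themselves come from this email.
SCRATCH="/home/scratch/$USER"
CUDNN_LIB="/home/scratch/dsutherl/cudnn-8.0-5.1/cuda/lib64"

# Make "pip install --user" put packages on scratch instead of NFS home.
export PYTHONUSERBASE="$SCRATCH/.local"

# Prepend the cuDNN libraries. The ${VAR:+...} form only appends
# ":$LD_LIBRARY_PATH" when the variable is already set and non-empty.
export LD_LIBRARY_PATH="$CUDNN_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Then install as described above (left commented out in this sketch):
# pip install --user ~dsutherl/tensorflow-0.11.0rc0-py2-none-any.whl
```
</pre>
These exports only last for the current shell, so they need to be repeated (or added to a shell startup file) in every session that uses the library.<br>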
><br class="m_-6269736360502246421gmail_msg">
> (A request: I'm submitting to ICLR in two weeks, and for some of the models<br class="m_-6269736360502246421gmail_msg">
> I'm running gpu3's cards are 4x the speed of gpu1 or 2's. So please don't<br class="m_-6269736360502246421gmail_msg">
> run a ton of stuff on gpu3 unless you're working on a deadline too.)<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> Steps to install it, for the future:<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> Install bazel in your home directory:<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> wget<br class="m_-6269736360502246421gmail_msg">
> <a href="https://github.com/bazelbuild/bazel/releases/download/0.3.2/bazel-0.3.2-installer-linux-x86_64.sh" rel="noreferrer" class="m_-6269736360502246421gmail_msg" target="_blank">https://github.com/bazelbuild/<wbr>bazel/releases/download/0.3.2/<wbr>bazel-0.3.2-installer-linux-<wbr>x86_64.sh</a><br class="m_-6269736360502246421gmail_msg">
> bash bazel-0.3.2-installer-linux-x86_64.sh --prefix=/home/scratch/$USER<br class="m_-6269736360502246421gmail_msg">
> --base=/home/scratch/$USER/.<wbr>bazel<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> Configure bazel to build in scratch. There's probably a better way to do<br class="m_-6269736360502246421gmail_msg">
> this, but this works:<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> mkdir -p /home/scratch/$USER/.cache/bazel<br class="m_-6269736360502246421gmail_msg">
> ln -s /home/scratch/$USER/.cache/<wbr>bazel ~/.cache/bazel<br class="m_-6269736360502246421gmail_msg">
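<br>
A slightly more defensive version of the symlink step, as a sketch only: link_bazel_cache is a made-up helper name, not part of bazel. It creates both the scratch directory and the parent of the link, can be re-run safely, and refuses to replace a real directory that is already sitting where the link should go.<br>
<pre>
```shell
# link_bazel_cache is a hypothetical helper.
#   $1: cache directory on local disk (for example on scratch)
#   $2: path where bazel looks for its cache
link_bazel_cache() {
    target="$1"
    link="$2"

    # Make sure the real directory and the link's parent both exist.
    mkdir -p "$target" "$(dirname "$link")" || return 1

    # Do not clobber an existing non-symlink (it may hold build output).
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace existing $link" >&2
        return 1
    fi

    # -f replaces an old link, -n avoids descending into a linked directory.
    ln -sfn "$target" "$link"
}

# Usage with the paths from this email (left commented out in this sketch):
# link_bazel_cache "/home/scratch/$USER/.cache/bazel" "$HOME/.cache/bazel"
```
</pre>
If ~/.cache/bazel already exists as a real directory from an earlier build on NFS, it has to be moved or removed by hand before the helper will create the link.<br>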
><br class="m_-6269736360502246421gmail_msg">
> Build tensorflow. Note that builds from git checkouts don't work, because<br class="m_-6269736360502246421gmail_msg">
> they assume a newer version of git than is on gpu3:<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> cd /home/scratch/$USER<br class="m_-6269736360502246421gmail_msg">
> wget<br class="m_-6269736360502246421gmail_msg">
> tar xf<br class="m_-6269736360502246421gmail_msg">
> cd tensorflow-0.11.0rc0<br class="m_-6269736360502246421gmail_msg">
> ./configure<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> This is an interactive script that doesn't seem to let you pass arguments or<br class="m_-6269736360502246421gmail_msg">
> anything. It's obnoxious.<br class="m_-6269736360502246421gmail_msg">
> Use the default python<br class="m_-6269736360502246421gmail_msg">
> don't use cloud platform or hadoop file system<br class="m_-6269736360502246421gmail_msg">
> use the default site-packages path if it asks<br class="m_-6269736360502246421gmail_msg">
> build with GPU support<br class="m_-6269736360502246421gmail_msg">
> default gcc<br class="m_-6269736360502246421gmail_msg">
> default Cuda SDK version<br class="m_-6269736360502246421gmail_msg">
> specify /usr/local/cuda-8.0<br class="m_-6269736360502246421gmail_msg">
> default cudnn version<br class="m_-6269736360502246421gmail_msg">
> specify $CUDNN_DIR from use-cudnn.sh, e.g.<br class="m_-6269736360502246421gmail_msg">
> /home/scratch/dsutherl/cudnn-<wbr>8.0-5.1/cuda<br class="m_-6269736360502246421gmail_msg">
> Pascal Titan Xs have compute capability 6.1<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> bazel build -c opt --config=cuda<br class="m_-6269736360502246421gmail_msg">
> //tensorflow/tools/pip_<wbr>package:build_pip_package<br class="m_-6269736360502246421gmail_msg">
> bazel-bin/tensorflow/tools/<wbr>pip_package/build_pip_package ./<br class="m_-6269736360502246421gmail_msg">
> A .whl file, e.g. tensorflow-0.11.0rc0-py2-none-<wbr>any.whl, is put in the<br class="m_-6269736360502246421gmail_msg">
> directory you specified above.<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> - Dougal<br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
> On Fri, Oct 21, 2016 at 6:14 PM Kirthevasan Kandasamy <<a href="mailto:kandasamy@cmu.edu" class="m_-6269736360502246421gmail_msg" target="_blank">kandasamy@cmu.edu</a>><br class="m_-6269736360502246421gmail_msg">
> wrote:<br class="m_-6269736360502246421gmail_msg">
>><br class="m_-6269736360502246421gmail_msg">
>> Predrag,<br class="m_-6269736360502246421gmail_msg">
>><br class="m_-6269736360502246421gmail_msg">
>> Any updates on gpu3?<br class="m_-6269736360502246421gmail_msg">
>> I have tried both tensorflow and chainer and in both cases the problem<br class="m_-6269736360502246421gmail_msg">
>> seems to be with cuda<br class="m_-6269736360502246421gmail_msg">
>><br class="m_-6269736360502246421gmail_msg">
>> On Wed, Oct 19, 2016 at 4:10 PM, Predrag Punosevac <<a href="mailto:predragp@cs.cmu.edu" class="m_-6269736360502246421gmail_msg" target="_blank">predragp@cs.cmu.edu</a>><br class="m_-6269736360502246421gmail_msg">
>> wrote:<br class="m_-6269736360502246421gmail_msg">
>>><br class="m_-6269736360502246421gmail_msg">
>>> Dougal Sutherland <<a href="mailto:dougal@gmail.com" class="m_-6269736360502246421gmail_msg" target="_blank">dougal@gmail.com</a>> wrote:<br class="m_-6269736360502246421gmail_msg">
>>><br class="m_-6269736360502246421gmail_msg">
>>> > I tried for a while. I failed.<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>><br class="m_-6269736360502246421gmail_msg">
>>> Damn this doesn't look good. I guess back to the drawing board. Thanks<br class="m_-6269736360502246421gmail_msg">
>>> for the quick feedback.<br class="m_-6269736360502246421gmail_msg">
>>><br class="m_-6269736360502246421gmail_msg">
>>> Predrag<br class="m_-6269736360502246421gmail_msg">
>>><br class="m_-6269736360502246421gmail_msg">
>>> > Version 0.10.0 fails immediately on build: "The specified<br class="m_-6269736360502246421gmail_msg">
>>> > --crosstool_top<br class="m_-6269736360502246421gmail_msg">
>>> > '@local_config_cuda//<wbr>crosstool:crosstool' is not a valid<br class="m_-6269736360502246421gmail_msg">
>>> > cc_toolchain_suite<br class="m_-6269736360502246421gmail_msg">
>>> > rule." Apparently this is because 0.10 required an older version of<br class="m_-6269736360502246421gmail_msg">
>>> > bazel (<br class="m_-6269736360502246421gmail_msg">
>>> > <a href="https://github.com/tensorflow/tensorflow/issues/4368" rel="noreferrer" class="m_-6269736360502246421gmail_msg" target="_blank">https://github.com/tensorflow/<wbr>tensorflow/issues/4368</a>), and I don't have<br class="m_-6269736360502246421gmail_msg">
>>> > the<br class="m_-6269736360502246421gmail_msg">
>>> > energy to install an old version of bazel.<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> > Version 0.11.0rc0 gets almost done and then complains about no such<br class="m_-6269736360502246421gmail_msg">
>>> > file or<br class="m_-6269736360502246421gmail_msg">
>>> > directory for libcudart.so.7.5 (which is there, where I told tensorflow<br class="m_-6269736360502246421gmail_msg">
>>> > it<br class="m_-6269736360502246421gmail_msg">
>>> > was...).<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> > Non-release versions from git fail immediately because they call git -C<br class="m_-6269736360502246421gmail_msg">
>>> > to<br class="m_-6269736360502246421gmail_msg">
>>> > get version info, which is only in git 1.9 (we have 1.8).<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> > Some other notes:<br class="m_-6269736360502246421gmail_msg">
>>> > - I made a symlink from ~/.cache/bazel to<br class="m_-6269736360502246421gmail_msg">
>>> > /home/scratch/$USER/.cache/<wbr>bazel,<br class="m_-6269736360502246421gmail_msg">
>>> > because bazel is the worst. (It complains about doing things on NFS,<br class="m_-6269736360502246421gmail_msg">
>>> > and<br class="m_-6269736360502246421gmail_msg">
>>> > hung for me [clock-related?], and I can't find a global config file or<br class="m_-6269736360502246421gmail_msg">
>>> > anything to change that in; it seems like there might be one, but their<br class="m_-6269736360502246421gmail_msg">
>>> > documentation is terrible.)<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> > - I wasn't able to use the actual Titan X compute capability of 6.1,<br class="m_-6269736360502246421gmail_msg">
>>> > because that requires cuda 8; I used 5.2 instead. Probably not a huge<br class="m_-6269736360502246421gmail_msg">
>>> > deal,<br class="m_-6269736360502246421gmail_msg">
>>> > but I don't know.<br class="m_-6269736360502246421gmail_msg">
>>> ><br class="m_-6269736360502246421gmail_msg">
>>> > - I tried explicitly including /usr/local/cuda/lib64 in LD_LIBRARY_PATH<br class="m_-6269736360502246421gmail_msg">
>>> > and<br class="m_-6269736360502246421gmail_msg">
>>> > set CUDA_HOME to /usr/local/cuda before building, hoping that would<br class="m_-6269736360502246421gmail_msg">
>>> > help<br class="m_-6269736360502246421gmail_msg">
>>> > with the 0.11.0rc0 problem, but it didn't.<br class="m_-6269736360502246421gmail_msg">
>><br class="m_-6269736360502246421gmail_msg">
>><br class="m_-6269736360502246421gmail_msg">
><br class="m_-6269736360502246421gmail_msg">
</blockquote></div>
</div></div></blockquote></div><br></div>