GPU[5-7] status update
Predrag Punosevac
predragp at andrew.cmu.edu
Fri Sep 8 23:55:34 EDT 2017
Michael Andrews <mbandrews at cmu.edu> wrote:
> Hi Predrag,
>
> Many of the gpus on the auton machines seem to have their memory maxed out,
> and for those that do have memory (gpu3 for instance), it seems to take
> hours for a session to initialize... is this something expected?
>
> Thanks,
> Michael
Dear Autonians,
This is a status update on the long-expected hardware addition to our
lab.
GPU5 built, 4 Titan Xp cards installed, up and running. You can log into
the machine and use it. However, the installed driver
http://www.nvidia.com/download/driverResults.aspx/123103/en-us
does not seem to be loaded into the kernel. I did install the cuda-8.0
toolkit nonetheless. I am out of fuel tonight to see what is going on.
If somebody sees something, please let me know.
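
For anyone who wants to take a look, here is a minimal sketch of one way
to check whether the driver module is loaded. It assumes GPU5 runs Linux
and that the kernel module is named "nvidia"; neither is confirmed above.
The helper names are hypothetical. It only reads /proc/modules and does
not touch the driver itself.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by its size,
    reference count, dependencies, state, and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nvidia_loaded(proc_modules_text, module_name="nvidia"):
    """True if a module with exactly this name is listed.

    The default name "nvidia" is an assumption about how the driver
    registers itself; related modules such as nvidia_uvm do not count.
    """
    return module_name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable.
        text = ""
    print("nvidia module loaded:", nvidia_loaded(text))
```

If this reports False even though the driver package is installed, the
module was probably never built or inserted for the running kernel, but
that is a guess until somebody looks at the machine.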
GPU6 built, 2 Titan Xp cards installed, up and running. NVidia/CUDA has
the same issue as on GPU5. I ordered 2 Titan X cards (note that the p is
missing) to complete the server. If it is not too late I will try to
switch the order on Monday.
GPU7 built, missing GPU cards. However, 4 Titan X cards are on order. You
can log in and use the CPUs. I am kind of reluctant to switch the order
to Titan Xp due to the driver issues. Titan X has been rock solid for us.
I am not sure when I will receive the GPU cards.
MATLAB is not installed on GPU[5-7]. I will wait a week or so for the
R2017b release. We will see if this release is going to work with the
Titan X cards on GPU[2-4], which use an older Nvidia driver. I am not too
optimistic that MATLAB is going to work with the latest driver.
Please don't bother me with questions about TensorFlow, Caffe, and
similar until I sort things out with the hardware.
Finally, I think I will have enough HDDs to create an additional 7-disk
RAID 6 array on GPU5 with a storage capacity of 10TB. The OS HDDs have
scratch space of about 2TB.
GPU6 and GPU7, just like GPU3 and GPU4, will only have 2TB scratch space.
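
The 10TB figure for the RAID 6 array on GPU5 follows from RAID 6 using
two disks' worth of space for parity. A short sketch of the arithmetic,
assuming 2TB disks (the disk size is my assumption, chosen because it
matches the 10TB total; it is not stated above). The function name is
hypothetical, and filesystem overhead and TB versus TiB are ignored.

```python
def raid6_usable_tb(num_disks, disk_tb):
    """Usable capacity of a RAID 6 array: (n - 2) disks hold data."""
    if num_disks < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return (num_disks - 2) * disk_tb


# 7 disks, assumed to be 2TB each: (7 - 2) * 2 = 10
print(raid6_usable_tb(7, 2))  # 10
```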
Cheers,
Predrag
P.S. I sent two e-mails earlier about GPU1, but I didn't see that they
got posted. The GPU1 driver problems appear to be fixed and the unit is
fully functional.