Misc issues

Predrag Punosevac predragp at andrew.cmu.edu
Fri Nov 2 18:03:45 EDT 2018


Dear Autonians,

During today's orientation session with three new members of the lab, a few
issues caught my attention, so I am going to share them with you along with
a status update on the main file server.


1. As of this morning ZFS snapshots are disabled on the main file server,
which hosts the majority of older accounts, and old snapshots have been
deleted. Right now the zpool hosting home directories is 88% full. At the
moment I am moving at least one large legacy home directory to the attic,
which should relieve 1 TB of space. This is still insufficient to drop the
pool load below the 80% needed for normal NFS, resilvering, and snapshot
operations. I am recalculating home directory sizes this very moment
and I hope to have a report by Monday. Any directory larger than
0.5 TB will be a prime target for migration (or removal).
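
A minimal sketch of how the size report in item 1 could be produced. The
function name list_large_dirs and the home root in the usage comment are
hypothetical, and 0.5 TB is taken to mean 0.5 TiB. Note that du reports
on-disk usage, which may differ from the logical file size.

```shell
# Print every immediate subdirectory of ROOT whose disk usage exceeds LIMIT_KB.
# Usage: list_large_dirs ROOT LIMIT_KB
list_large_dirs() {
    root=$1
    limit_kb=$2
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # du -sk prints "<kilobytes><TAB><path>"; keep only the size.
        size_kb=$(du -sk "$dir" | awk '{print $1}')
        if [ "$size_kb" -gt "$limit_kb" ]; then
            printf '%s\t%s\n' "$size_kb" "${dir%/}"
        fi
    done
}

# 0.5 TiB expressed in KiB is 0.5 * 1024^3 = 536870912.
# The path below is a placeholder, not the real home root:
# list_large_dirs /path/to/home 536870912
```

Piping the result through sort -rn puts the largest directories first.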

2. The Git web interface was temporarily down due to the SSL certificate
update. After I restarted the PostgreSQL database, things work as expected
for both local accounts (Anthony's and mine) and LDAP accounts
(everyone else). Git clone, pull, and push also work per testing.

3. There was a report of an SSH key issue with the Git authentication needed
for git operations from the CLI. This issue is user specific and has
nothing to do with the SELinux NFS policies we had in the past

setsebool -P use_nfs_home_dirs=true

or the fact that 25 home directories have already been migrated to the new
file server and are mounted by the autofs daemon on login. I verified this
both on GPU machines (which have SELinux disabled) and on regular
CPU machines (which have SELinux enabled), using an account whose home
directory is autofs mounted at login.
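
For anyone who wants to rule out the SELinux boolean from item 3 themselves,
here is a small sketch. The helper name nfs_home_dirs_enabled is hypothetical.
It reads the output of getsebool on stdin and succeeds only when the boolean
is reported as on.

```shell
# Succeed when `getsebool use_nfs_home_dirs` output (read on stdin)
# reports the boolean as on; fail otherwise, including when SELinux
# is disabled and getsebool prints a different message.
nfs_home_dirs_enabled() {
    grep -q '^use_nfs_home_dirs --> on$'
}

# Usage on a machine with SELinux enabled:
# getsebool use_nfs_home_dirs | nfs_home_dirs_enabled && echo "enabled"
```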


4. You can also put your public key into .ssh/authorized_keys and use
passwordless authentication whether you have an old permanently mounted
home directory or one mounted by the autofs daemon. This is tested.
I had a rough time today during the demo (I guess it is just my age).
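
The key setup from item 4 can be sketched as follows. The function name
install_pubkey and the paths in the usage comment are hypothetical. The
permissions are the ones sshd expects when its StrictModes option is left
at the default. The standard ssh-copy-id tool does much the same job.

```shell
# Append a public key to HOME_DIR/.ssh/authorized_keys unless it is already
# present, and tighten permissions on the directory and the file.
# Usage: install_pubkey PUBKEY_FILE HOME_DIR
install_pubkey() {
    pubkey_file=$1
    home_dir=$2
    ssh_dir=$home_dir/.ssh
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F fixed strings, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Example (both paths are placeholders):
# install_pubkey ~/.ssh/id_ed25519.pub "$HOME"
```

Running it twice with the same key leaves a single entry in the file.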


5. Finally, it appears that one out of four GPU cards on each of the GPU3 and
GPU4 computing nodes is not working properly. Please see the nvidia-smi
output below, which lists only three cards per node. I have
seen this in the past and the reason was dead hardware. I would like to
reboot those two servers and do some further testing before we shell out
$2500 for two used Titan Xp cards. 

I hope you have a great weekend.

Predrag

root@gpu3$ nvidia-smi
Fri Nov  2 17:01:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   29C    P8    17W / 250W |   1081MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   28C    P8     9W / 250W |   1081MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   31C    P8    11W / 250W |   1081MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4458      C   python3                                     1071MiB |
|    1     19870      C   python3                                     1071MiB |
|    2      3102      C   python3                                     1071MiB |
+-----------------------------------------------------------------------------+

root@gpu4$ nvidia-smi
Fri Nov  2 17:01:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 41%   67C    P2   169W / 250W |   3213MiB / 12196MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 25%   45C    P8    19W / 250W |   2810MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   34C    P8    15W / 250W |    317MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     21927      C   python2                                     1689MiB |
|    1      7063      C   ...auton/home/cnagpal/anaconda2/bin/python  1267MiB |
|    1     12081      C   ...auton/home/cnagpal/anaconda2/bin/python  1525MiB |
|    2     28460      C   python                                       307MiB |
+-----------------------------------------------------------------------------+
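
Both listings above show three cards on nodes that should have four. As a
rough sketch, a check like the following could flag a missing card
automatically. The helper name check_gpu_count is hypothetical. It reads
the output of nvidia-smi -L, which prints one "GPU <index>: ..." line per
visible card, on stdin.

```shell
# Compare the number of GPUs listed on stdin with the expected count.
# Usage: nvidia-smi -L | check_gpu_count 4
check_gpu_count() {
    expected=$1
    # grep -c exits non-zero when nothing matches but still prints 0,
    # so "|| true" keeps the count usable under "set -e".
    found=$(grep -c '^GPU [0-9][0-9]*:' || true)
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found of $expected GPUs visible"
    else
        echo "WARNING: $found of $expected GPUs visible"
        return 1
    fi
}
```

A non-zero return status makes the check easy to call from cron or a
monitoring script.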

