====== Differences ====== This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
ace-gpu-1_installation_log [2017/02/15 17:41] csteel created |
ace-gpu-1_installation_log [2018/04/04 13:00] (current) xteng [/etc/skel] |
||
---|---|---|---|
Line 2: | Line 2: | ||
Log of platform setup and configuration | Log of platform setup and configuration | ||
+ | |||
+ | ===== Base ===== | ||
+ | |||
+ | |||
+ | ===== NVIDIA Driver ===== | ||
+ | |||
+ | This step is probably not required, because a driver is included with the CUDA package. | ||
+ | |||
+ | <code> | ||
+ | chmod 770 NVIDIA-Linux-x86_64-375.26.run | ||
+ | /etc/init.d/lightdm stop | ||
+ | ./NVIDIA-Linux-x86_64-375.26.run | ||
+ | reboot | ||
+ | </code> | ||
+ | |||
+ | ===== CUDA ===== | ||
+ | |||
+ | Install CUDA from the Debian package. | ||
+ | |||
+ | ==== Confirm GPU ==== | ||
+ | |||
+ | <code> | ||
+ | lspci | grep -i nvidia | ||
+ | </code> | ||
+ | |||
+ | Output example | ||
+ | |||
+ | <code> | ||
+ | 01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) | ||
+ | 01:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) | ||
+ | 02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) | ||
+ | 02:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) | ||
+ | 03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1) | ||
+ | 03:00.1 Audio device: NVIDIA Corporation Device 10ef (rev a1) | ||
+ | |||
+ | </code> | ||
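To check the GPU count in a script rather than by eye, the same listing can be filtered and counted. This is only a minimal sketch: the function name `count_nvidia_gpus` is hypothetical, and it assumes each GPU appears as one "VGA compatible controller" or "3D controller" line, as in the output example above.

```shell
# count_nvidia_gpus: read lspci output on stdin and print the number of
# NVIDIA display devices (audio functions on the same card are ignored)
count_nvidia_gpus() {
    grep -i 'nvidia' | grep -c -E 'VGA compatible controller|3D controller'
}

# Usage:
#   lspci | count_nvidia_gpus
```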
+ | |||
+ | ==== gcc version ==== | ||
+ | |||
+ | <code> | ||
+ | gcc --version | ||
+ | </code> | ||
+ | |||
+ | <code> | ||
+ | gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 | ||
+ | Copyright (C) 2015 Free Software Foundation, Inc. | ||
+ | This is free software; see the source for copying conditions. There is NO | ||
+ | warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
+ | </code> | ||
+ | |||
+ | ==== Ensure kernel headers are installed ==== | ||
+ | |||
+ | <code> | ||
+ | sudo apt-get install linux-headers-$(uname -r) | ||
+ | </code> | ||
+ | |||
+ | ==== Download CUDA Toolkit ==== | ||
+ | |||
+ | <code> | ||
+ | wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb | ||
+ | </code> | ||
+ | |||
+ | ==== Confirm checksum ==== | ||
+ | |||
+ | <code> | ||
+ | wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/md5sum-txt | ||
+ | </code> | ||
+ | |||
+ | <code> | ||
+ | md5sum cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb | ||
+ | cat md5sum-txt | grep cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64 | ||
+ | </code> | ||
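The two commands above leave the comparison to the reader. A small helper can do it automatically; this is a sketch under assumptions: the name `verify_md5` is hypothetical, and it only checks that the computed digest appears at the start of some line in the checksum list (it does not match on file name, since the download is saved with a `-deb` suffix rather than `.deb`).

```shell
# verify_md5 FILE SUMFILE: succeed only if the MD5 digest of FILE is listed
# at the start of a line in SUMFILE
verify_md5() {
    # md5sum prints "<digest>  <name>"; keep only the digest
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    grep -q -i "^${actual}[[:space:]]" "$2"
}

# Usage:
#   verify_md5 cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb md5sum-txt \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```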
+ | |||
+ | ==== Install ==== | ||
+ | |||
+ | <code> | ||
+ | mv cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb | ||
+ | sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb | ||
+ | sudo apt-get update | ||
+ | sudo apt-get install cuda | ||
+ | </code> | ||
+ | |||
+ | Reboot the system. | ||
+ | |||
+ | ==== Environment Setup ==== | ||
+ | |||
+ | === Add CUDA bin path === | ||
+ | |||
+ | <code> | ||
+ | export PATH=/usr/local/cuda-8.0/bin:${PATH} | ||
+ | echo $PATH | ||
+ | </code> | ||
+ | |||
+ | === Ensure LD_LIBRARY_PATH is set === | ||
+ | |||
+ | Ensure that LD_LIBRARY_PATH includes `/usr/local/cuda-8.0/lib64`: | ||
+ | |||
+ | <code> | ||
+ | echo $LD_LIBRARY_PATH | ||
+ | </code> | ||
+ | |||
+ | If it is not set, set it using: | ||
+ | |||
+ | <code> | ||
+ | export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64 | ||
+ | </code> | ||
+ | |||
+ | Otherwise, prepend the CUDA directory to the existing value: | ||
+ | |||
+ | <code> | ||
+ | export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH | ||
+ | </code> | ||
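The set and unset cases can be handled by one expression using the POSIX `${VAR:+...}` expansion, which adds the separator only when the existing value is non-empty. A minimal sketch; the helper name `prepend_dir` is hypothetical.

```shell
# prepend_dir DIR CURRENT: print DIR prepended to the colon-separated list
# CURRENT, without leaving a trailing ":" when CURRENT is empty
prepend_dir() {
    printf '%s\n' "$1${2:+:$2}"
}

# Usage:
#   export LD_LIBRARY_PATH="$(prepend_dir /usr/local/cuda-8.0/lib64 "${LD_LIBRARY_PATH:-}")"
```

Avoiding the trailing colon matters because the dynamic linker treats an empty entry in the list as the current directory.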
+ | |||
+ | ==== /etc/skel ==== | ||
+ | |||
+ | Configure `/etc/skel` so that new users get the proper environment configuration. | ||
+ | |||
+ | <code> | ||
+ | nano /etc/skel/.profile | ||
+ | </code> | ||
+ | |||
+ | Content example | ||
+ | |||
+ | <code> | ||
+ | # set user PATH to include /usr/local/cuda-8.0/bin | ||
+ | if [ -d "/usr/local/cuda-8.0/bin" ]; then | ||
+ | PATH="/usr/local/cuda-8.0/bin:$PATH" | ||
+ | fi | ||
+ | |||
+ | # set user LD_LIBRARY_PATH to include /usr/local/cuda-8.0/lib64 | ||
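+ | if [ -d "/usr/local/cuda-8.0/lib64" ]; then | ||
+ | # avoid a trailing ":" when LD_LIBRARY_PATH is empty, and export so child processes see it | ||
+ | export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" | ||
+ | fi | ||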
+ | </code> | ||
+ | |||
+ | === GPU Accounting Setup and Configuration === | ||
+ | |||
+ | The NVIDIA persistence daemon, nvidia-persistenced, needs to be configured for the target OS's init system. On Ubuntu 16.04 this is systemd. | ||
+ | |||
+ | == Confirm Driver Version == | ||
+ | |||
+ | <code> | ||
+ | nvidia-smi | ||
+ | </code> | ||
+ | |||
+ | Output example | ||
+ | |||
+ | <code> | ||
+ | Wed Apr 26 14:20:40 2017 | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | NVIDIA-SMI 375.39 Driver Version: 375.39 | | ||
+ | |-------------------------------+----------------------+----------------------+ | ||
+ | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | ||
+ | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | ||
+ | |===============================+======================+======================| | ||
+ | | 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A | | ||
+ | | 41% 68C P2 95W / 250W | 1902MiB / 12186MiB | 96% Default | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | |||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | Processes: GPU Memory | | ||
+ | | GPU PID Type Process name Usage | | ||
+ | |=============================================================================| | ||
+ | | 0 1977 G /usr/lib/xorg/Xorg 60MiB | | ||
+ | | 0 10406 C /data1/data/kwagstyl/anaconda2/bin/python 189MiB | | ||
+ | | 0 10660 C /data1/data/kwagstyl/anaconda2/bin/python 1495MiB | | ||
+ | | 0 16118 C ...freesurfer_LBL/bin/mris_fix_topology_cuda 153MiB | | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | </code> | ||
+ | |||
+ | == Download and uncompress == | ||
+ | |||
+ | Download the nvidia-persistenced release that matches your driver version (see the output example above). | ||
+ | |||
+ | <code> | ||
+ | mkdir -p ~/src/ubuntu/16.04/nvidia | ||
+ | cd ~/src/ubuntu/16.04/nvidia | ||
+ | wget ftp://download.nvidia.com/XFree86/nvidia-persistenced/nvidia-persistenced-375.39.tar.bz2 | ||
+ | tar xvjf nvidia-persistenced-375.39.tar.bz2 | ||
+ | </code> | ||
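To avoid copying the version number by hand, it can be parsed from the `nvidia-smi` banner and used to build the tarball name. A minimal sketch; the function names `driver_version` and `persistenced_tarball` are hypothetical, and the parsing assumes the banner contains `Driver Version: <number>` as in the output example above.

```shell
# driver_version: read nvidia-smi output on stdin, print the driver version
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# persistenced_tarball VERSION: print the matching tarball file name
persistenced_tarball() {
    printf 'nvidia-persistenced-%s.tar.bz2\n' "$1"
}

# Usage:
#   ver=$(nvidia-smi | driver_version)
#   wget "ftp://download.nvidia.com/XFree86/nvidia-persistenced/$(persistenced_tarball "$ver")"
```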
+ | |||
+ | == Edit the nvidia-persistenced.conf.template == | ||
+ | |||
+ | Confirm that the nvidia-persistenced user has been created: | ||
+ | |||
+ | <code> | ||
+ | grep nvidia /etc/passwd | ||
+ | </code> | ||
+ | |||
+ | Output example: | ||
+ | |||
+ | <code> | ||
+ | nvidia-persistenced:x:126:132:NVIDIA Persistence Daemon,,,:/:/sbin/nologin | ||
+ | </code> | ||
+ | |||
+ | == Edit the systemd template == | ||
+ | |||
+ | <code> | ||
+ | cd nvidia-persistenced-375.39/init/systemd | ||
+ | nano nvidia-persistenced.service.template | ||
+ | </code> | ||
+ | |||
+ | Replace `__USER__` with the name of the persistence daemon user, `nvidia-persistenced`. | ||
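The same substitution can be made without opening an editor. A minimal sketch; the helper name `fill_user` is hypothetical, and it rewrites the file in place via a temporary copy.

```shell
# fill_user TEMPLATE USER: replace every __USER__ placeholder in TEMPLATE with USER
fill_user() {
    tmp="$1.tmp"
    sed "s/__USER__/$2/g" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage:
#   fill_user nvidia-persistenced.service.template nvidia-persistenced
```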
+ | |||
+ | == Run the installer == | ||
+ | |||
+ | <code> | ||
+ | cd ~/src/ubuntu/16.04/nvidia/nvidia-persistenced-375.39/init | ||
+ | sudo ./install.sh | ||
+ | </code> | ||
+ | |||
+ | Output example | ||
+ | |||
+ | <code> | ||
+ | Checking for common requirements... | ||
+ | sed found in PATH? Yes | ||
+ | useradd found in PATH? Yes | ||
+ | userdel found in PATH? Yes | ||
+ | id found in PATH? Yes | ||
+ | Common installation/uninstallation supported | ||
+ | |||
+ | Creating sample System V script... done. | ||
+ | Creating sample systemd service file... done. | ||
+ | Creating sample Upstart service file... done. | ||
+ | |||
+ | Checking for systemd requirements... | ||
+ | /usr/lib/systemd/system directory exists? No | ||
+ | /etc/systemd/system directory exists? Yes | ||
+ | systemctl found in PATH? Yes | ||
+ | systemd installation/uninstallation supported | ||
+ | |||
+ | Installation parameters: | ||
+ | User : nvidia-persistenced | ||
+ | Group : nvidia-persistenced | ||
+ | systemd service installation path : /etc/systemd/system | ||
+ | |||
+ | User 'nvidia-persistenced' already exists, skipping useradd... | ||
+ | User 'nvidia-persistenced' is in primary group 'nvidia-persistenced'. | ||
+ | Stopping nvidia-persistenced.service... done. | ||
+ | Installing sample systemd service nvidia-persistenced.service... done. | ||
+ | Enabling nvidia-persistenced.service... done. | ||
+ | Starting nvidia-persistenced.service... done. | ||
+ | </code> | ||
+ | |||
+ | == Check == | ||
+ | |||
+ | <code> | ||
+ | sudo service nvidia-persistenced status | ||
+ | </code> | ||
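Besides the service status, the per-GPU persistence mode can be checked. A minimal sketch, assuming `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader` prints one `Enabled` or `Disabled` line per GPU; the function name `all_persistent` is hypothetical.

```shell
# all_persistent: read one persistence-mode value per line on stdin;
# succeed only if there is at least one line and every line is "Enabled"
all_persistent() {
    awk '{ n++; if ($1 == "Enabled") ok++ } END { exit !(n > 0 && ok == n) }'
}

# Usage:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#     && echo "persistence mode is on for all GPUs"
```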
+ | |||
+ | == Troubleshooting == | ||
+ | |||
+ | For more diagnostic output, add `--persistence-mode --verbose` to the line that starts the service (the `ExecStart=` line in the systemd unit file). | ||
+ |