====== GPU Resources ======
  
This is a collaborative resource; please improve it. Log in using your MCIN user name and ID and add your discoveries.

===== Items of Interest / for Discussion? =====

==== Resources ====

* [[https://developer.nvidia.com/openacc/3-steps-to-more-science|OpenACC Tutorial - Steps to More Science]]

"Here are three simple steps to start accelerating your code with GPUs. We will be using PGI OpenACC compiler for C, C++, FORTRAN, along with tools from the PGI Community Edition."

* [[https://devblogs.nvidia.com/parallelforall/performance-portability-gpus-cpus-openacc/|Performance Portability from GPUs to CPUs with OpenACC]]

* [[http://www.nvidia.com/object/data-center-managment-tools.html|Data Center Management Tools]]

    * The GPU Deployment Kit
    * Ganglia
    * Slurm
    * NVIDIA Docker
    * Others???

"...performance on multicore CPUs for HPC apps using MPI + OpenACC is equivalent to MPI + OpenMP code. Compiling and running the same code on a Tesla K80 GPU can provide large speedups."


===== Preventing Job Clobbering =====

There are currently 3 GPUs in ace-gpu-1. To select one of the three (0, 1, or 2), set the CUDA_VISIBLE_DEVICES environment variable. This can be accomplished by adding the following line to your ~/.bash_profile file on ace-gpu-1, where X is either 0, 1, or 2:

<code>
export CUDA_VISIBLE_DEVICES=X
</code>

This only takes effect when you log in, so log out and back in, then run the following to make sure it worked:

<code>
echo $CUDA_VISIBLE_DEVICES
</code>

If it outputs the ID that you selected, you're ready to use the GPU.
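
If you would rather pin a GPU per job instead of per login session, you can also set the variable from inside the script itself, before the GPU framework initializes CUDA. This is a minimal sketch, not site policy; the device index and the use of TensorFlow here are illustrative assumptions:

<code>
import os

# Illustrative: make only GPU 1 visible to this process. This must happen
# before the framework (TensorFlow, Theano, ...) initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf  # will now see the selected GPU as device 0
</code>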

==== Sharing a single GPU ====
To configure TensorFlow not to pre-allocate all GPU memory, you can use the following Python code:

<code>
import tensorflow as tf
from keras import backend as K  # K is the Keras backend

# Configure TensorFlow to allocate GPU memory as needed rather than
# grabbing it all up front.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)
</code>

This has been found to work only to a certain extent: when several jobs each use a significant amount of GPU memory, jobs can still clobber each other even with the code above.
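
If growth-based allocation is not enough, another option is to give each job an explicit cap on its share of GPU memory. This is a sketch only; the 30% fraction below is an illustrative assumption, not an agreed-upon limit:

<code>
import tensorflow as tf
from keras import backend as K

# Illustrative: hard-cap this process at roughly 30% of the GPU's memory.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.3
session = tf.Session(config=config)
K.set_session(session)
</code>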
===== GPU Info =====
  
nsight
</code>

The NVIDIA Visual Profiler (https://developer.nvidia.com/nvidia-visual-profiler) would be useful for GPU monitoring if we had X visualization, but we do not:

<code>
/usr/local/cuda/bin/nvvp
</code>

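Without a graphical session, monitoring has to stay on the command line (see GPU Accounting below) or go through the NVML bindings. The sketch below assumes the nvidia-ml-py (pynvml) Python package is available, which may or may not be installed on ace-gpu-1:

<code>
import pynvml

# Query current utilization and memory use for every GPU via NVML.
pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print("GPU %d: %d%% busy, %d / %d MiB" %
          (i, util.gpu, mem.used // 2**20, mem.total // 2**20))
pynvml.nvmlShutdown()
</code>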
  
===== GPU Accounting =====
</code>
  
Output example:

<code>
==============NVSMI LOG==============

Timestamp                           : Thu Apr 27 09:09:50 2017
Driver Version                      : 375.39

Attached GPUs                       : 1
GPU 0000:01:00.0
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 1920
    Accounted Processes
        Process ID                  : 15819
            GPU Utilization         : 100 %
            Memory Utilization      : 6 %
            Max memory usage        : 187 MiB
            Time                    : 3769 ms
            Is Running              : 0
...
</code>
Users: to check GPU stats per process:
<code>
nvidia-smi -i 0 --query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time --format=csv
</code>

Output example:

<code>
gpu_name, pid, gpu_utilization [%], max_memory_usage [MiB], time [ms]
TITAN X (Pascal), 15819, 100 %, 187 MiB, 3769 ms
TITAN X (Pascal), 15633, 87 %, 8465 MiB, 200626 ms
TITAN X (Pascal), 15944, 0 %, 153 MiB, 382 ms
TITAN X (Pascal), 16000, 0 %, 155 MiB, 299 ms
TITAN X (Pascal), 15862, 80 %, 8465 MiB, 215039 ms
TITAN X (Pascal), 15842, 41 %, 425 MiB, 721223 ms
TITAN X (Pascal), 16294, 74 %, 8465 MiB, 231517 ms
TITAN X (Pascal), 16436, 70 %, 10425 MiB, 229470 ms
TITAN X (Pascal), 16118, 40 %, 155 MiB, 1310156 ms
TITAN X (Pascal), 16908, 72 %, 8465 MiB, 511122 ms
TITAN X (Pascal), 17102, 73 %, 8465 MiB, 833806 ms
TITAN X (Pascal), 17900, 0 %, 153 MiB, 358 ms
TITAN X (Pascal), 18018, 0 %, 153 MiB, 235 ms
TITAN X (Pascal), 17632, 75 %, 8465 MiB, 823193 ms
TITAN X (Pascal), 18376, 74 %, 8529 MiB, 827336 ms
TITAN X (Pascal), 18637, 74 %, 8465 MiB, 547161 ms
TITAN X (Pascal), 16377, 54 %, 153 MiB, 0 ms
TITAN X (Pascal), 18752, 55 %, 8465 MiB, 0 ms
</code>
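
If you want to post-process this output (for example, to spot which PIDs hold the most memory), a small Python sketch along these lines could work. The query string matches the command above; everything else (field handling, printed summary) is an illustrative assumption:

<code>
import csv
import subprocess

# Run the same accounting query as above and parse its CSV output.
cmd = ["nvidia-smi", "-i", "0",
       "--query-accounted-apps=gpu_name,pid,gpu_util,max_memory_usage,time",
       "--format=csv"]
out = subprocess.check_output(cmd).decode()

reader = csv.reader(out.splitlines())
header = next(reader)  # column names: gpu_name, pid, gpu_utilization, ...
for row in reader:
    gpu_name, pid, gpu_util, max_mem, time_ms = [f.strip() for f in row]
    print("PID %s: up to %s at %s utilization for %s" %
          (pid, max_mem, gpu_util, time_ms))
</code>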
  
</code>
  
==== nvidia-smi flags used ====

<code>
    -i,   --id=                 Target a specific GPU.
    -am   --accounting-mode=    Enable or disable Accounting Mode: 0/DISABLED, 1/ENABLED
    -q,   --query               Display GPU or Unit info.
    -d,   --display=            Display only selected information: MEMORY,
                                    UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK,
                                    COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS,
                                    PAGE_RETIREMENT, ACCOUNTING.
                                Flags can be combined with comma e.g. ECC,POWER.
                                Sampling data with max/min/avg is also returned
                                for POWER, UTILIZATION and CLOCK display types.
                                Doesn't work with -u or -x flags.
</code>

* [[http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-mode]]

* [[http://docs.nvidia.com/deploy/driver-persistence/index.html#persistence-daemon]]
===== Deep Learning =====
  