gpu_resources [2017/06/08 17:48] (current) adoyle
===== Preventing Job Clobbering =====
There are currently 3 GPUs in ace-gpu-1. To select one of the three (0, 1, or 2), set the CUDA_VISIBLE_DEVICES environment variable. This can be done by adding the following line to your ~/.bash_profile file on ace-gpu-1, where X is 0, 1, or 2:
<code>
export CUDA_VISIBLE_DEVICES=X
</code>
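If you would rather pick a GPU per job than fix one in ~/.bash_profile, you can set the variable for a single command instead (a sketch; the python3 -c invocation below is a stand-in for your own training command):

<code>
# Run this one command on GPU 2 only; the setting is not persisted anywhere
CUDA_VISIBLE_DEVICES=2 python3 -c 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'
</code>

The variable is inherited only by that child process, so concurrent jobs can each target a different GPU without editing any files.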
This takes effect only when you log in, so log out and back in, then run the following to confirm that it worked:

<code>
echo $CUDA_VISIBLE_DEVICES
</code>

If it prints the ID you selected, you're ready to use the GPU.
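The same check can be done from inside Python, since any process you start after logging back in inherits the variable (a minimal sketch):

<code>
import os

# CUDA-aware libraries read this variable to decide which GPUs to expose;
# unset means every GPU on the machine is visible to the process
print(os.environ.get("CUDA_VISIBLE_DEVICES", "not set"))
</code>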
+ | |||
+ | ==== Sharing a single GPU ==== | ||
To configure TensorFlow not to pre-allocate all of the GPU's memory, you can use the following Python code:
<code>
import tensorflow as tf

# Allocate GPU memory as needed instead of claiming it all at start-up
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
</code>
This has been found to work only to a certain extent: when several jobs each use a significant share of the GPU's resources, jobs can still be ruined even with the code above.
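A stricter option (a sketch, assuming the TensorFlow 1.x session API in use at the time this page was written) is to cap each job's share of GPU memory explicitly, so that several capped jobs can coexist on one device:

<code>
import tensorflow as tf

# Let this process claim at most roughly 30% of the GPU's memory up front
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
sess = tf.Session(config=config)
</code>

Note that even a capped job still competes for compute, so heavy jobs sharing a GPU will slow each other down.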
===== GPU Info =====