checkpoint_techniques_on_compute_canada_clusters

====== Differences ======

This shows you the differences between two versions of the page.

checkpoint_techniques_on_compute_canada_clusters [2015/04/17 15:28]
132.216.122.26
checkpoint_techniques_on_compute_canada_clusters [2016/11/03 17:23] (current)
Line 1: Line 1:
-====== Checkpointing techniques on guillimin ====== 
- 
- 
 These are the notes for the Checkpoint Techniques workshop I attended on March 26th, 2015 (the workshop materials can be found [[http://www.hpc.mcgill.ca/index.php/training#chkpt|here]]). They might be useful for people who want to learn how to implement checkpointing in their own programs. Please don't hesitate to edit this page if you feel I left something out, if you want to add something of your own, or if my English sounds funny.
  
Line 82: Line 79:
  
 # New version of this script. Now we use DMTCP to launch
-# the scripts (and gnu-parallel).
+# the scripts.
  
 def chunks(l, n):
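The body of ''chunks'' is not visible in this diff. As a rough sketch only, assuming that ''l'' is a list of jobs and ''n'' is the maximum number of items per chunk (both assumptions, the original script may define them differently), a helper with this signature could look like the following. The job names in the usage part are hypothetical.

```python
def chunks(l, n):
    """Yield successive chunks of at most n items from the list l.

    Assumption: n is the chunk size, not the number of chunks.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Step through the list n items at a time; the last chunk may be shorter.
    for i in range(0, len(l), n):
        yield l[i:i + n]


if __name__ == "__main__":
    # Hypothetical job names, only for illustration.
    jobs = ["job%d" % i for i in range(7)]
    for batch in chunks(jobs, 3):
        print(batch)
```

Splitting the job list this way would let each batch be launched (and later restarted) as one group, which is presumably why the script needs such a helper.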
Line 172: Line 169:
  
 **Currently this is not working as expected; for some unknown reason, only 2 random jobs get re-started. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)**
+
+**Update 2: they did not reply.**
checkpoint_techniques_on_compute_canada_clusters.1429284525.txt.gz · Last modified: 2016/11/03 17:23 (external edit)