====== Differences ======
This shows you the differences between two versions of the page.
checkpoint_techniques_on_compute_canada_clusters [2015/04/17 15:28] 132.216.122.26 |
checkpoint_techniques_on_compute_canada_clusters [2016/11/03 17:23] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Checkpointing techniques on guillimin ====== | ||
- | |||
- | |||
These are my notes from the Checkpoint Techniques workshop I attended on March 26th, 2015 (the workshop materials can be found [[http://www.hpc.mcgill.ca/index.php/training#chkpt|here]]). They might be useful for people who want to learn how to implement checkpointing in their own programs. Please don't hesitate to edit this page if you feel I left something out, if you want to add something of your own, or if my English sounds funny. | These are my notes from the Checkpoint Techniques workshop I attended on March 26th, 2015 (the workshop materials can be found [[http://www.hpc.mcgill.ca/index.php/training#chkpt|here]]). They might be useful for people who want to learn how to implement checkpointing in their own programs. Please don't hesitate to edit this page if you feel I left something out, if you want to add something of your own, or if my English sounds funny. | ||
Line 82: | Line 79: | ||
# New version of this script. Now we use DMTCP to launch | # New version of this script. Now we use DMTCP to launch | ||
- | # the scripts (and gnu-parallel). | + | # the scripts. |
def chunks(l, n): | def chunks(l, n): | ||
Line 172: | Line 169: | ||
**Currently this is not working as expected; for some unknown reason, only 2 random jobs get re-started. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)** | **Currently this is not working as expected; for some unknown reason, only 2 random jobs get re-started. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)** | ||
+ | |||
+ | **Update 2: they did not reply.** |