guillimin_faq

Just some questions I had while using this cluster system.

====== What is the maximum ''wall'' time? ======

It's 30 days. You can run jobs for up to that long without asking an administrator for permission. Keep in mind, however, that this limit cannot be extended. If you are going to run very long jobs, please consult the [[checkpoint_techniques_on_compute_canada_clusters|checkpointing manual]], which contains notes from a Calcul Québec workshop.

====== One of my jobs is running under an account that is not mine! ======

This happened to me once. See this:

<code>
$ showq -r -u jmateos
...
30486658  R  gm-  93.14  1.2  no  jmateos  eim-670-  sw-2r14-n14  12  1:16:49:28  Thu Apr 16 03:51:21
30486659  R  gm-  88.11  1.3  no  jmateos  eim-670-  sw-2r15-n70  12  1:18:30:22  Thu Apr 16 05:32:15
30621080  R  gm-  91.52  1.0  no  jmateos  atlaspt   sw-2r14-n13  12  3:03:19:00  Thu Apr 16 14:20:53
...
</code>

Job number ''30621080'' seems to be running under allocation group ''atlaspt'' and not one of our own. I contacted ''guillimin'' support and they told me this:

> This is an issue with our scheduler software that is only a cosmetic problem. The job 30621080 will be correctly charged to the account eim-670-aa. The 'group' parameter in the scheduler is distinct from one called 'account' which is the one used for accounting purposes. The group parameter is correct on the worker nodes, but gets confused at some point on the scheduler node. However, it will not affect your jobs.

So don't worry too much about this.

====== I would like to receive an e-mail when my jobs are done ======

You can use the ''-m'' and ''-M'' switches in your script header. Example:

<code>
...
#PBS -m abe
#PBS -M your.email@address.here
...
</code>

The ''-m abe'' option instructs the scheduler to send you an e-mail when your job begins (b), ends (e) or is aborted due to an error (a). You can select any combination of these letters.
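
For context, here is a minimal sketch of a complete job script that uses the mail switches. It asks for mail only when the job ends or aborts (''-m ae''), to show that the letters can be combined freely. The one-hour ''walltime'' request and the ''job_label'' variable are assumptions made for illustration and are not taken from the cluster documentation; ''PBS_O_WORKDIR'' and ''PBS_JOBID'' are the usual PBS/Torque environment variables, and the fallbacks only let the script run by hand outside the scheduler.

```shell
#!/bin/bash
# Hypothetical resource request: one hour of wall time (illustrative value).
#PBS -l walltime=01:00:00
# Send mail when the job aborts (a) or ends (e), but not when it begins.
#PBS -m ae
#PBS -M your.email@address.here

# The scheduler sets PBS_O_WORKDIR to the directory the job was submitted from;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}"

# PBS_JOBID is set by the scheduler; use a placeholder label otherwise.
job_label="${PBS_JOBID:-interactive}"
echo "Job ${job_label} started in ${PWD}"
```

Submitting the script with ''qsub'' should then produce one message at the end of the job, or earlier if it aborts.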

guillimin_faq.txt · Last modified: 2016/11/03 17:23 (external edit)