<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="April 18, 2024" />
<title>Savio intermediate training: Savio tips and tricks – making the most of the Slurm scheduler and of Mamba/Conda environments</title>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
div.columns{display: flex; gap: min(4vw, 1.5em);}
div.column{flex: auto; overflow-x: auto;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
ul.task-list li input[type="checkbox"] {
width: 0.8em;
margin: 0 0.8em 0.2em -1.6em;
vertical-align: middle;
}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { color: #008000; } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { color: #008000; font-weight: bold; } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
<link rel="stylesheet" type="text/css" media="screen, projection, print"
href="https://www.w3.org/Talks/Tools/Slidy2/styles/slidy.css" />
<script src="https://www.w3.org/Talks/Tools/Slidy2/scripts/slidy.js"
charset="utf-8" type="text/javascript"></script>
</head>
<body>
<div class="slide titlepage">
<h1 class="title">Savio intermediate training: Savio tips and tricks –
making the most of the Slurm scheduler and of Mamba/Conda
environments</h1>
<p class="author">
April 18, 2024
</p>
<p class="date">Chris Paciorek and Jeffrey Jacob</p>
</div>
<div id="upcoming-events-and-hiring" class="slide section level1">
<h1>Upcoming events and hiring</h1>
<ul>
<li><p>Cybersecurity for Researchers</p>
<ul>
<li>Tuesday, October 22, 2024 at 1 pm via Zoom</li>
<li>This brown bag session will focus on secure campus tools and
services that Research IT and Berkeley IT offer to researchers, tips on
navigating campus security processes, and cybersecurity best practices
for keeping your research and research subjects safe.</li>
<li>In partnership with the UC Berkeley Information Security Office and
Industry Alliances Office.</li>
<li>Check our <a
href="https://research-it.berkeley.edu/events-trainings/upcoming-events-trainings">Events
& Training page</a> for more information about this and other
upcoming events.</li>
</ul></li>
<li><p>We offer platforms and services for researchers working with <a
href="https://docs-research-it.berkeley.edu/services/srdc/">sensitive
data</a>.</p></li>
<li><p>Get paid to develop your skills in research data and
computing!</p>
<ul>
<li>Berkeley Research Computing is hiring several graduate student
Domain Consultants for flexible appointments, 10% to 25% effort (4-10
hours/week).</li>
<li>Email your cover letter and CV to: [email protected].</li>
</ul></li>
</ul>
</div>
<div id="introduction" class="slide section level1">
<h1>Introduction</h1>
<p>We’ll do this in part as a demonstration. We encourage you to login
to your account and try out the various examples yourself as we go
through them.</p>
<p>Much of this material is based on the extensive Savio documentation we
have prepared and continue to update, available at <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/">https://docs-research-it.berkeley.edu/services/high-performance-computing/</a>.</p>
<p>The materials for this tutorial are available using git at the short
URL (<a href="https://tinyurl.com/brc-apr24">tinyurl.com/brc-apr24</a>),
the GitHub URL (<a
href="https://github.com/ucb-rit/savio-training-slurm-conda-spring-2024">https://github.com/ucb-rit/savio-training-slurm-conda-spring-2024</a>),
or simply as a <a
href="https://github.com/ucb-rit/savio-training-slurm-conda-spring-2024/archive/main.zip">zip
file</a>.</p>
</div>
<div id="how-to-get-additional-help" class="slide section level1">
<h1>How to get additional help</h1>
<ul>
<li>For technical issues and questions about using Savio:
<ul>
<li>[email protected]</li>
</ul></li>
<li>For questions about computing resources in general, including cloud
computing:
<ul>
<li>[email protected] or [email protected]</li>
<li>office hours: Wed. 1:30-3:00 and Thur. 9:30-11:00 <a
href="https://research-it.berkeley.edu/programs/berkeley-research-computing/research-computing-consulting">on
Zoom</a></li>
</ul></li>
<li>For questions about data management (including HIPAA-protected
data):
<ul>
<li>[email protected]</li>
<li>office hours: Wed. 1:30-3:00 and Thur. 9:30-11:00 <a
href="https://research-it.berkeley.edu/programs/berkeley-research-computing/research-computing-consulting">on
Zoom</a></li>
</ul></li>
<li>Status & Service Updates
<ul>
<li>The best way to stay informed about the status of Research IT
services is the front page of the Research IT website. If you are
having issues or are unsure whether one of our services is down, check
there first before sending us a ticket.</li>
</ul></li>
</ul>
</div>
<div id="outline" class="slide section level1">
<h1>Outline</h1>
<p>This training session will cover the following topics:</p>
<ul>
<li>Slurm tips and tricks
<ul>
<li>Associations: Accounts, partitions and queues</li>
<li>Requesting specific resources, including GPUs</li>
<li>Diagnosing Slurm submission errors</li>
<li>Understanding the queue and getting jobs to start faster</li>
<li>Using Slurm flags for parallelization</li>
<li>Using MPI and troubleshooting problems</li>
<li>Diagnosing job run-time errors</li>
</ul></li>
<li>Working with Conda/Mamba environments
<ul>
<li>Introduction and Conda vs. Mamba</li>
<li>Creating and isolating environments</li>
<li>Disk space and Conda</li>
<li>Jupyter kernels</li>
</ul></li>
</ul>
</div>
<div id="slurm-scheduler" class="slide section level1">
<h1>Slurm scheduler</h1>
<p>All computations are done by submitting jobs to the scheduling
software that manages jobs on the cluster, called Slurm.</p>
<p>Why is this necessary? Without a scheduler, your jobs would be
slowed down by other people’s jobs running on the same node, and there
would be no way for everyone to share Savio fairly.</p>
<p>Savio uses Slurm to:</p>
<ol style="list-style-type: decimal">
<li>Allocate access to resources (compute nodes) for users’ jobs</li>
<li>Start and monitor jobs on allocated resources</li>
<li>Manage the queue of pending jobs</li>
</ol>
<center>
<img src="savio_diagram.jpeg">
</center>
</div>
<div id="submitting-jobs-accounts-and-partitions"
class="slide section level1">
<h1>Submitting jobs: accounts and partitions</h1>
<p>Generally request:</p>
<ul>
<li>project account (FCA, condo, etc.)</li>
<li>partition (type of node)</li>
</ul>
<p>You can see what accounts you have access to and which partitions
within those accounts as follows:</p>
<pre><code>sacctmgr -p show associations user=SAVIO_USERNAME</code></pre>
<p>Here’s an example of the output for a user who has access to an FCA
and a condo.</p>
<pre><code>Cluster|Account|User|Partition|Share|Priority|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|
brc|ucb|paciorek|ood-inter|1|||||||||||||ood_interactive|ood_interactive||
brc|fc_paciorek|paciorek|savio4_gpu|1|||||||||||||a5k_gpu4_normal,savio_lowprio|a5k_gpu4_normal||
brc|fc_paciorek|paciorek|savio4_htc|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio3_gpu|1|||||||||||||a40_gpu3_normal,gtx2080_gpu3_normal,savio_lowprio,v100_gpu3_normal|gtx2080_gpu3_normal||
brc|fc_paciorek|paciorek|savio3_htc|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio3_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio3|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_1080ti|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_knl|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_gpu|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_htc|1|||||||||||||savio_debug,savio_long,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|co_stat|paciorek|savio3_gpu|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio4_gpu|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio4_htc|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio3_htc|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio3_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio3|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_1080ti|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_knl|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_gpu|1|||||||||||||savio_lowprio,stat_gpu2_normal|stat_gpu2_normal||
brc|co_stat|paciorek|savio2_htc|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2|1|||||||||||||savio_lowprio,stat_savio2_normal|stat_savio2_normal||</code></pre>
<p>If you are part of a condo, you’ll notice that you have
<em>low-priority</em> access to certain partitions. For example, I am
part of the statistics condo <em>co_stat</em>, which owns some ‘savio2’
and ‘savio2_gpu’ nodes, so I have normal access to those, but I can
also burst beyond the condo and use other partitions at low
priority.</p>
<p>In contrast, through my FCA, I have access to most partitions at
normal priority, but not all of them…</p>
<pre><code>[paciorek@ln002 ~]$ srun -p savio3_xlmem -A co_stat -t 5:00 --pty bash
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified</code></pre>
</div>
<div id="submitting-a-batch-job" class="slide section level1">
<h1>Submitting a batch job</h1>
<p>Let’s see how to submit a simple job. If your job will only use the
resources on a single node, you can do the following.</p>
<p>Here’s an example job script that I’ll run.</p>
<div class="sourceCode" id="cb4"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/bin/bash</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Job name:</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --job-name=test</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Account:</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --account=fc_paciorek</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Partition:</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --partition=savio3_htc</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Cores:</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --cpus-per-task=2</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Wall clock limit (2 minutes here):</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --time=00:02:00</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a><span class="co">## Command(s) to run:</span></span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a><span class="ex">module</span> load python/3.10.10 </span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> calc.py <span class="op">>&</span> calc.out</span></code></pre></div>
<p>Note: The number of cores and nodes requested default to 1.</p>
<p>Tip: It’s generally a good idea to specify module versions explicitly
for reproducibility. Default versions will change over time.</p>
</div>
<div id="monitoring-jobs" class="slide section level1">
<h1>Monitoring jobs</h1>
<p>Now let’s submit and monitor the job:</p>
<pre><code>sbatch test.sh
squeue -j <JOB_ID>
wwall -j <JOB_ID></code></pre>
<p>You can also login to the node where you are running and use commands
like <code>top</code>, <code>free</code>, and <code>ps</code>:</p>
<pre><code>srun --jobid=<JOB_ID> --pty /bin/bash</code></pre>
<p>After a job has completed (or been terminated/cancelled), you can
review the maximum memory used (and other information) via the sacct
command.</p>
<pre><code>sacct -j <JOB_ID> --format=JobID,JobName,MaxRSS,Elapsed</code></pre>
<p>MaxRSS shows the maximum amount of memory the job used, in
kilobytes.</p>
</div>
<div id="specific-resources-cpus-cores" class="slide section level1">
<h1>Specific resources: CPUs (cores)</h1>
<p><strong>Per-core allocations</strong>: For partitions whose names end
in <code>_htc</code> or <code>_gpu</code>, jobs are scheduled (and
charged) per core. The default is one core.</p>
<p><strong>Per-node allocations</strong>: For other partitions, jobs are
given exclusive access to entire node(s) (and your account is charged
for all of the cores on the node(s)).</p>
<p>In a few partitions, the number of cores differs between machines in
the partition.</p>
<ul>
<li><p>E.g., <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/hardware-config/">in
<code>savio3</code>, some nodes have 40 cores and some have 32
cores</a>.</p></li>
<li><p>To request <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/scheduler-config/">particular
‘features’</a>, you can use <code>-C</code>, e.g.,</p>
<pre><code>srun -p savio3 -C savio3_c40 -A ac_scsguest --pty -t 5:00 bash # 40 cores
srun -p savio3 -C savio3 -A ac_scsguest --pty -t 5:00 bash # 32 cores</code></pre></li>
</ul>
</div>
<div id="specific-resources-memory-ram" class="slide section level1">
<h1>Specific resources: Memory (RAM)</h1>
<p>You generally should not request a particular amount of memory:</p>
<ul>
<li>full-node allocations can automatically use all the memory</li>
<li>per-core allocations are given memory proportional to the <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/hardware-config/">number
of cores</a>.
<ul>
<li>to get more memory, request the number of cores equivalent to the
memory you need (see the sketch below).</li>
</ul></li>
<li>In some partitions (<code>savio4_htc</code>,
<code>savio3_gpu</code>), the amount of CPU memory per node varies. (See
previous slide about ‘constraints’.)</li>
</ul>
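<p>For example, here’s a minimal sketch of requesting extra cores on a
per-core partition purely to get a larger memory share (the account
name is hypothetical; the exact memory per core depends on the
partition’s hardware config):</p>
<pre><code># each core comes with a fixed share of the node's memory,
# so requesting 4 cores gives roughly 4x the per-core memory
srun -A fc_foo -p savio3_htc -c 4 -t 30:00 --pty bash</code></pre>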
</div>
<div id="specific-resources-gpus" class="slide section level1">
<h1>Specific resources: GPUs</h1>
<p>GPU technology is advancing fast. As a result, it’s hard to maintain
a large, homogeneous pool of <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/hardware-config">GPU
nodes</a>.</p>
<ul>
<li><code>savio2_gpu</code> has (old) K80 GPUs.</li>
<li><code>savio3_gpu</code> has GTX2080TI, TITAN RTX, V100, and A40
nodes.</li>
<li><code>savio4_gpu</code> has A5000 nodes.</li>
</ul>
<p>Required submission info:</p>
<ul>
<li>Request the number of GPUs.</li>
<li>Request a <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/submitting-jobs/#gpu-jobs">fixed
number of CPUs for each GPU you need</a>.</li>
<li>Request the GPU type (required for <code>savio3_gpu</code> and
<code>savio4_gpu</code>).</li>
</ul>
<p>For example:</p>
<pre><code>sbatch -A fc_foo -p savio3_gpu --gres=gpu:GTX2080TI:1 -c 2 -t 60:00 job.sh
sbatch -A fc_foo -p savio3_gpu --gres=gpu:A40:2 -c 16 -t 60:00 job.sh</code></pre>
<p><code>CUDA_VISIBLE_DEVICES</code> will be set to <code>0,....</code>
(i.e., “internal” numbering within the job).</p>
</div>
<div id="submission-problems---obvious-failures"
class="slide section level1">
<h1>Submission problems - obvious failures</h1>
<ul>
<li><p>Submitting to an account/partition/QoS you don’t have access to
(“Invalid account or account/partition combination specified”).</p></li>
<li><p>FCA is exhausted (“This user/account pair does not have enough
service units”): If you’d like to see how much of an FCA has been
used:</p>
<pre><code>check_usage.sh -a fc_bands</code></pre></li>
</ul>
</div>
<div id="submission-problems---non-obvious-failures"
class="slide section level1">
<h1>Submission problems - non-obvious failures</h1>
<p>Frustratingly, some submissions can simply hang. They will never
start but do not give an error message.</p>
<ul>
<li>Time limit too long (e.g., more than 3 hours in the
<code>savio_debug</code> queue, or more than 72 hours for an FCA job in
<code>savio_normal</code>):</li>
</ul>
<pre><code>[paciorek@ln002 ~]$ srun -A ac_scsguest -t 74:00:00 -p savio3_htc --pty bash
[paciorek@ln002 ~]$ squeue -u paciorek -o "%.7i %.12P %.20j %.8u %.2t %.5C %.5D %.12M %.12l %.14r %.8p %.20q %.12b %.20R"
JOBID PARTITION NAME USER ST CPUS NODES TIME TIME_LIMIT REASON PRIORITY QOS TRES_PER_NOD NODELIST(REASON)
1809333 savio3_htc bash paciorek PD 1 1 0:00 3-02:00:00 QOSMaxWallDura 0.000034 savio_normal N/A (QOSMaxWallDurationP</code></pre>
<ul>
<li>Too many nodes requested:</li>
</ul>
<pre><code>[paciorek@ln002 ~]$ srun -A fc_paciorek -p savio4_htc -N 40 --pty -t 5:00 bash
[paciorek@ln002 ~]$ squeue -u paciorek -o "%.7i %.12P %.20j %.8u %.2t %.5C %.5D %.12M %.12l %.14r %.8p %.20q %.12b %.20R"
JOBID PARTITION NAME USER ST CPUS NODES TIME TIME_LIMIT REASON PRIORITY QOS TRES_PER_NOD NODELIST(REASON)
1809334 savio4_htc bash paciorek PD 40 40 0:00 5:00 QOSMaxNodePerJ 0.000085 savio_normal N/A (QOSMaxNodePerJobLim</code></pre>
<ul>
<li>GPU jobs not requesting sufficient CPUs:</li>
</ul>
<pre><code>[paciorek@ln002 ~]$ srun -A fc_paciorek -p savio4_gpu -c 2 --gres=gpu:A5000:1 --pty -t 5:00 bash
[paciorek@ln002 ~]$ squeue -u paciorek -o "%.7i %.12P %.20j %.8u %.2t %.5C %.5D %.12M %.12l %.14r %.8p %.20q %.12b %.20R"
JOBID PARTITION NAME USER ST CPUS NODES TIME TIME_LIMIT REASON PRIORITY QOS TRES_PER_NOD NODELIST(REASON)
1809335 savio4_gpu bash paciorek PD 2 1 0:00 5:00 QOSMinCpuNotSa 0.000108 a5k_gpu4_normal gres:gpu:A50 (QOSMinCpuNotSatisfi</code></pre>
<ul>
<li>Invalid or missing GPU type:</li>
</ul>
<pre><code>[paciorek@ln002 ~]$ srun -A fc_paciorek -p savio4_gpu -c 4 --gres=gpu:1 --pty -t 5:00 bash
[paciorek@ln002 ~]$ squeue -u paciorek -o "%.7i %.12P %.20j %.8u %.2t %.5C %.5D %.12M %.12l %.14r %.8p %.20q %.12b %.20R"
JOBID PARTITION NAME USER ST CPUS NODES TIME TIME_LIMIT REASON PRIORITY QOS TRES_PER_NOD NODELIST(REASON)
1809336 savio4_gpu bash paciorek PD 4 1 0:00 5:00 QOSMinGRES 0.000108 a5k_gpu4_normal gres:gpu:1 (QOSMinGRES)</code></pre>
</div>
<div id="monitoring-jobs-the-job-queue-and-overall-usage"
class="slide section level1">
<h1>Monitoring jobs, the job queue, and overall usage</h1>
<p>The basic command for seeing what is running on the system is
<code>squeue</code>:</p>
<pre><code>squeue
squeue -u $USER
squeue -A co_stat</code></pre>
<p>To see what nodes are available in a given partition:</p>
<pre><code>sinfo -p savio3
sinfo -p savio2_gpu</code></pre>
<p>For more information on cores, QoS, and additional (e.g., GPU)
resources, here’s some syntax:</p>
<pre><code>squeue -o "%.7i %.12P %.20j %.8u %.2t %.5C %.5D %.12M %.12l %.14r %.8p %.20q %.12b %.20R"</code></pre>
</div>
<div id="waiting-in-the-queue" class="slide section level1">
<h1>Waiting in the queue</h1>
<p>Tools to diagnose queueing situations:</p>
<ul>
<li>Our <code>sq</code> tool, which wraps <code>squeue</code>.</li>
<li><code>sinfo -p savio3_htc</code></li>
<li><code>squeue</code>
<ul>
<li><code>--state=PD</code> may be a helpful flag.</li>
</ul></li>
</ul>
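<p>For example, to gauge how backed up a partition is (using
<code>savio3_htc</code> just as an illustration):</p>
<pre><code>sinfo -p savio3_htc             # how many nodes are idle vs. allocated?
squeue -p savio3_htc --state=PD # how many jobs are already waiting?</code></pre>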
<p><a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/why-job-not-run/">Reasons
your job might sit in the queue</a>:</p>
<ul>
<li>The partition may be fully occupied (<code>Priority</code>,
<code>Resources</code>).</li>
<li>Your condo may be fully utilizing its purchased resources
(<code>QOSGrpCpuLimit</code>, <code>QOSGrpNodeLimit</code>).</li>
<li>The total number of FCA jobs in small partitions may be at its limit
(<code>QOSGrpCpuLimit</code>, <code>QOSGrpNodeLimit</code>).</li>
<li>Slurm’s fair share policy will prioritize less-active FCA groups
(and less-active users) (<code>Priority</code>).</li>
<li>FCA jobs have lower priority than condo jobs
(<code>Priority</code>).</li>
<li>Your time limit may overlap with a scheduled downtime
(<code>ReqNodeNotAvail, Reserved for Maintenance</code>).</li>
</ul>
<p>Let’s experiment with submitting jobs to heavily-used partitions and
see what the queue looks like.</p>
</div>
<div id="how-the-queue-works" class="slide section level1">
<h1>How the queue works</h1>
<ul>
<li><p>Fairshare</p>
<ul>
<li>Condo jobs get top priority and will go to the top of the queue.
<ul>
<li>Users within a condo will be prioritized inversely to recent
usage.</li>
</ul></li>
<li>FCAs (and then users within FCAs) prioritized inversely to recent
usage (see the <code>PRIORITY</code> column of
<code>squeue</code>).</li>
</ul></li>
<li><p>Backfilling</p>
<ul>
<li>Slurm uses <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/why-job-not-run/">“backfilling”</a>
to try to fit in lower-priority jobs that won’t delay higher-priority
jobs.</li>
</ul></li>
</ul>
<center>
<img src="scheduler_cartoon.jpg">
</center>
</div>
<div id="how-the-queue-works-condos" class="slide section level1">
<h1>How the queue works (condos)</h1>
<ul>
<li>A condo’s usage, aggregated over all the condo’s users, is limited
to at most the number of nodes purchased by the condo at any given
time.</li>
<li>Additional jobs will be queued until usage drops below that limit.
<ul>
<li>The pending jobs will be ordered based on the Slurm Fairshare
priority, with users with less recent usage prioritized.</li>
</ul></li>
<li>Sometimes a condo job may not start immediately even if the condo’s
usage is below its allocation:
<ul>
<li>Because the partition is fully used, across all condo and FCA users
of the given partition.</li>
<li>This can occur when a condo has not been fully used and FCA jobs
have filled up the partition during that period of limited usage.</li>
<li>Condo jobs are prioritized over FCA jobs in the queue and will start
as soon as resources become available.</li>
<li>Usually any lag in starting condo jobs under this circumstance is
limited.</li>
</ul></li>
</ul>
</div>
<div id="how-the-queue-works-fcas" class="slide section level1">
<h1>How the queue works (FCAs)</h1>
<ul>
<li>Jobs start when they reach the top of the queue and resources become
available as running jobs finish.</li>
<li>The queue is ordered based on the Slurm Fairshare priority
(specifically the Fair Tree algorithm).</li>
<li>The primary influence on this priority is the overall recent usage
by all users in the same FCA as the user submitting the job.</li>
<li>Jobs from multiple users within an FCA are then influenced by their
individual recent usage.</li>
<li>In more detail, usage at the FCA level (summed across all
partitions) is ordered across all FCAs,
<ul>
<li>Priority for a given job depends inversely on that recent usage
(based on the FCA the job is using).</li>
<li>Similarly, amongst users within an FCA, usage is ordered amongst
those users, such that for a given partition, a user with lower recent
usage in that partition will have higher priority than one with higher
recent usage.</li>
</ul></li>
</ul>
</div>
<div id="when-will-my-job-start" class="slide section level1">
<h1>When will my job start?</h1>
<p><code>sq</code> provides a user-friendly way to understand why your
job isn’t running yet, or to check the status of a finished or failed job.</p>
<pre><code># should be loaded by default, but if it isn't:
# module load sq
# sq -h # for help with `sq`
sq</code></pre>
<pre><code>Showing results for user paciorek
Currently 0 running jobs and 1 pending job (most recent job first):
+---------|------|-------------|-----------|--------------|------|---------|-----------+
| Job ID | Name | Account | Nodes | QOS | Time | State | Reason |
+---------|------|-------------|-----------|--------------|------|---------|-----------+
| 7510375 | test | fc_paciorek | 1x savio2 | savio_normal | 0:00 | PENDING | Resources |
+---------|------|-------------|-----------|--------------|------|---------|-----------+
7510375:
This job is scheduled to run after 21 higher priority jobs.
Estimated start time: N/A
To get scheduled sooner, you can try reducing wall clock time as appropriate.
Recent jobs (most recent job first):
+---------|------|-------------|-----------|----------|---------------------|-----------+
| Job ID | Name | Account | Nodes | Elapsed | End | State |
+---------|------|-------------|-----------|----------|---------------------|-----------+
| 7509474 | test | fc_paciorek | 1x savio2 | 00:00:16 | 2021-02-09 23:47:45 | COMPLETED |
+---------|------|-------------|-----------|----------|---------------------|-----------+
7509474:
- This job ran for a very short amount of time (0:00:16). You may want to check that the output was correct or if it exited because of a problem.</code></pre>
<p>To see another user’s jobs:</p>
<pre><code>sq -u paciorek</code></pre>
<p>The <code>-a</code> flag shows current and past jobs together, the
<code>-q</code> flag suppresses messages about job issues, and the
<code>-n</code> flag sets the limit on the number of jobs to show in the
output (default = 8).</p>
<pre><code>sq -u paciorek -aq -n 10</code></pre>
<pre><code>Showing results for user paciorek
Recent jobs (most recent job first):
+-----------|------|-------------|-----------|------------|---------------------|-----------+
| Job ID | Name | Account | Nodes | Elapsed | End | State |
+-----------|------|-------------|-----------|------------|---------------------|-----------+
| 7487633.1 | ray | co_stat | 1x | 1-20:19:03 | Unknown | RUNNING |
| 7487633.0 | ray | co_stat | 1x | 1-20:19:08 | Unknown | RUNNING |
| 7487633 | test | co_stat | 2x savio2 | 1-20:19:12 | Unknown | RUNNING |
| 7487879 | bash | ac_scsguest | 1x savio | 00:00:27 | 2021-02-08 14:54:19 | COMPLETED |
| 7487633.2 | bash | co_stat | 2x | 00:00:34 | 2021-02-08 14:53:38 | FAILED |
| 7487515 | test | co_stat | 2x savio2 | 00:04:53 | 2021-02-08 14:22:17 | CANCELLED |
| 7487515.1 | ray | co_stat | 1x | 00:00:06 | 2021-02-08 14:17:39 | FAILED |
| 7487515.0 | ray | co_stat | 1x | 00:00:05 | 2021-02-08 14:17:33 | FAILED |
| 7473988 | test | co_stat | 2x savio2 | 3-00:00:16 | 2021-02-08 13:33:40 | TIMEOUT |
| 7473989 | test | ac_scsguest | 2x savio | 2-22:30:11 | 2021-02-08 11:47:54 | CANCELLED |
+-----------|------|-------------|-----------|------------|---------------------|-----------+</code></pre>
</div>
<div id="getting-your-job-to-start-faster" class="slide section level1">
<h1>Getting your job to start faster</h1>
<ul>
<li>Reduce the time limit.</li>
<li>Request fewer nodes or cores.</li>
<li>Find a less-used partition (using <code>sinfo</code>).</li>
<li>Submit to a condo instead of an FCA (if you’re in both) for higher
priority.</li>
<li>Submit to an FCA instead of a condo (if you’re in both) if condo is
full.</li>
</ul>
</div>
<div id="parallelization" class="slide section level1">
<h1>Parallelization</h1>
<p>Some flavors of parallelization:</p>
<ul>
<li>single node only:
<ul>
<li>threaded code (e.g., <code>openMP</code>, <code>TBB</code>)</li>
<li>threaded linear algebra in Python/numpy, R, Julia, etc. (uses
<code>openMP</code> or <code>MKL</code>), e.g., our <code>test.sh</code>
example earlier</li>
</ul></li>
<li>one or more nodes:
<ul>
<li>parallel loops, parallel maps in Python, R, etc. (usually one Linux
process per worker)
<ul>
<li>Python: <code>dask</code>, <code>ray</code>,
<code>ipyparallel</code> packages</li>
<li>R: <code>future</code>, <code>parallel</code>, <code>foreach</code>
packages</li>
</ul></li>
<li>MPI (message-passing)</li>
<li><a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/gnu-parallel/">GNU
parallel</a>: parallelize independent tasks</li>
</ul></li>
</ul>
<p>Various other executables (e.g., in bioinformatics, computational
chemistry, computational fluid mechanics, etc.) use one or more of
these approaches internally.</p>
</div>
<div id="parallelization-considerations" class="slide section level1">
<h1>Parallelization considerations</h1>
<p>Rules-of-thumb:</p>
<ul>
<li>Often one core per process (i.e., “worker”)
<ul>
<li>Multiple cores per process for threaded code</li>
<li>Avoid having multiple processes per core</li>
</ul></li>
<li>One or more computational units per worker</li>
</ul>
<p>Confusingly, “task” could mean “worker” in the context of MPI or
“computational unit” more generally.</p>
<p>Important:</p>
<ul>
<li>Is the executable you’re using written so as to use
parallelization?</li>
<li>What does the user need to specify?
<ul>
<li>Sometimes multi-core, single-node parallelization will occur without
user specification.</li>
</ul></li>
</ul>
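<p>As one illustration of the “avoid multiple processes per core”
rule-of-thumb: if you run many single-core workers using software that
also does implicit threading (e.g., threaded linear algebra), you may
want to limit each worker to one thread. A sketch, assuming
openMP-based threading:</p>
<pre><code># one thread per worker process, so the workers don't oversubscribe cores
export OMP_NUM_THREADS=1</code></pre>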
</div>
<div id="slurm-flags" class="slide section level1">
<h1>Slurm flags:</h1>
<ul>
<li><code>--cpus-per-task</code> (<code>-c</code>): number of cores for
each task</li>
<li><code>--ntasks</code> (<code>-n</code>): total number of tasks</li>
<li><code>--ntasks-per-node</code>: number of tasks on each node</li>
<li><code>--nodes</code> (<code>-N</code>): the number of nodes to
use</li>
</ul>
<p>Based on the flags, Slurm will set various shell environment
variables your code can use to configure parallelization, e.g.,
<code>SLURM_NTASKS</code>, <code>SLURM_CPUS_PER_TASK</code>,
<code>SLURM_NODELIST</code>, <code>SLURM_NNODES</code>.</p>
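<p>For instance, inside a job script submitted with
<code>--cpus-per-task=8</code>, one might configure threading based on
what Slurm actually allocated (a sketch, assuming openMP-based
threading):</p>
<pre><code>export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # use all 8 requested cores
echo "$SLURM_NTASKS task(s) across $SLURM_NNODES node(s)"</code></pre>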
<p>We generally refer to “cores” rather than “CPUs” as modern CPUs have
multiple computational cores that can each carry out independent
work.</p>
</div>
<div id="cpus-per-task-vs.-ntasks" class="slide section level1">
<h1><code>cpus-per-task</code> vs. <code>ntasks</code></h1>
<p>In some cases one can either use <code>--cpus-per-task</code> or
<code>--ntasks</code> (or <code>--ntasks-per-node</code>) to get
multiple cores on a single node.</p>
<p>Caveats:</p>
<ul>
<li>Can’t use <code>--cpus-per-task</code> to get cores on multiple
nodes.</li>
<li><code>--ntasks</code> does not guarantee cores all on a single node
(but <code>--ntasks-per-node</code> does).</li>
<li>Need to use <code>--ntasks</code> (or
<code>--ntasks-per-node</code>) for MPI jobs.</li>
<li>Need to specify both cpus and tasks for hybrid jobs with multiple
threaded processes (e.g., MPI+openMP or GNU parallel+openMP).</li>
</ul>
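<p>A sketch contrasting the two flags for an 8-core request (account
name hypothetical, as in earlier examples):</p>
<pre><code># 8 cores for one task, guaranteed on a single node (threaded code):
sbatch -A fc_foo -p savio3_htc -c 8 -t 30:00 job.sh
# 8 tasks, possibly spread across nodes (MPI or one worker per task):
sbatch -A fc_foo -p savio3_htc -n 8 -t 30:00 job.sh</code></pre>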
</div>
<div id="examples" class="slide section level1">
<h1>Examples</h1>
<p>Some common paradigms are:</p>
<ul>
<li>one node, many cores
<ul>
<li>openMP/threaded jobs - one task, <em>c</em> cores for the task</li>
<li>Python/R/GNU parallel - <em>n</em> tasks, one per core at any given
time, often more computational units than tasks</li>
</ul></li>
<li>many nodes, many cores
<ul>
<li>MPI jobs that use one core per task for each of <em>n</em> tasks,
spread across multiple nodes</li>
<li>Python/R/GNU parallel - <em>n</em> tasks, one per core at any given
time, often more computational units than tasks</li>
</ul></li>
<li>hybrid jobs that use <em>c</em> cores for each of <em>n</em> tasks
<ul>
<li>e.g., MPI+threaded code</li>
</ul></li>
</ul>
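<p>As a sketch of the hybrid case, here’s how the flags and environment
variables fit together in a job script (the program name is
hypothetical):</p>
<pre><code>#SBATCH --ntasks=4
#SBATCH --cpus-per-task=5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # 5 threads per MPI task
mpirun ./hybrid_app                          # 4 MPI tasks, 20 cores total</code></pre>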
<p>We have lots more <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/scheduler-examples">examples
of job submission scripts</a> for different kinds of parallelization
(multi-node (MPI), multi-core (openMP), hybrid, etc.).</p>
</div>
<div id="mpi-and-slurm" class="slide section level1">
<h1>MPI and Slurm</h1>
<p>Slurm’s “ntasks” corresponds to the number of MPI tasks.</p>
<p>MPI knows about the Slurm job specification.</p>
<p>So you don’t need to specify <code>-np</code> or
<code>--machinefile</code> with <code>mpirun/mpiexec</code>.</p>
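<p>So a minimal MPI job script can look like this (a sketch; the
program name is hypothetical, and the module versions are the ones
shown on the next slide):</p>
<pre><code>#SBATCH --ntasks=8
module load gcc/11.3.0 openmpi/5.0.0-ucx
mpirun ./my_mpi_app  # no -np needed: mpirun picks up the 8 tasks from Slurm</code></pre>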
</div>
<div id="mpi-troubleshooting" class="slide section level1">
<h1>MPI troubleshooting</h1>
<p>It’s not uncommon to get MPI run-time errors on Savio that can be
hard to decipher, particularly when running on multiple nodes.</p>
<ul>
<li>Load the compiler module (e.g., <code>gcc</code>,
<code>intel</code>), then load the compiler-specific MPI module (e.g.,
<code>openmpi</code>)</li>
<li>The MPI version used to compile code should be the same as used to
run the code.</li>
<li>The MPI version used inside an Apptainer/Singularity container
should be the same as the module loaded on the system.</li>
<li>Use MPI+UCX for MPI jobs on <code>savio4_htc</code> for efficiency
(<code>module load gcc/11.3.0 openmpi/5.0.0-ucx</code>)</li>
</ul>
<p>If you troubleshoot based on the above items and are still stuck,
please contact us.</p>
</div>
<div id="using-multiple-gpus" class="slide section level1">
<h1>Using multiple GPUs</h1>
<ul>
<li>Is your code set up to use multiple GPUs?</li>
<li><code>CUDA_VISIBLE_DEVICES</code> will be set when your job
starts.</li>
<li>With PyTorch, you will refer to the GPUs indexed starting with
0.</li>
</ul>
<pre><code>import torch
gpu0 = torch.device("cuda:0")
gpu1 = torch.device("cuda:1")
x = torch.rand(100)
x0 = x.to(gpu0)
x1 = x.to(gpu1)</code></pre>
</div>
<div id="parallelizing-independent-computations"
class="slide section level1">
<h1>Parallelizing independent computations</h1>
<p>You may have many serial jobs to run. It may be more cost-effective
and/or simply easier to manage if you collect those jobs together and
run them across multiple cores on one or more nodes.</p>
<p>Here are some options:</p>
<ul>
<li>using <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/gnu-parallel/">GNU
parallel</a> to run many computational tasks (e.g., thousands of
simulations, scanning tens of thousands of parameter values, etc.) as
part of single Savio job submission</li>
<li>using <a
href="https://berkeley-scf.github.io/tutorial-parallelization">single-node
or multi-node parallelism</a> in Python, R, Julia, MATLAB, etc.
<ul>
<li>parallel R tools such as <em>future</em>, <em>foreach</em>,
<em>parLapply</em>, and <em>mclapply</em></li>
<li>parallel Python tools such as <em>ipyparallel</em>, <em>Dask</em>,
and <em>ray</em></li>
<li>parallel functionality in MATLAB through <em>parfor</em></li>
</ul></li>
</ul>
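<p>A minimal GNU parallel sketch (the module name and
<code>sim.py</code>, a script taking one parameter value as its
argument, are assumptions):</p>
<pre><code>module load gnu-parallel
# run one task per allocated core, cycling through 1000 parameter values
parallel -j $SLURM_CPUS_ON_NODE python sim.py ::: {1..1000}</code></pre>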
</div>
<div id="troubleshooting-failed-or-misbehaving-jobs"
class="slide section level1">
<h1>Troubleshooting failed or misbehaving jobs</h1>
<ul>
<li>Look at the software’s log/output files and Slurm’s job/error files
(<code>slurm-<JOB_ID>.out</code>,
<code>slurm-<JOB_ID>.err</code>)</li>
<li>Use <code>sacct</code> to look at result of failed jobs (memory use,
time limit, error codes):
<ul>
<li><code>sacct -j <JOB_ID> --format=JobID,JobName,MaxRSS,Elapsed</code></li>
<li><code>sacct -u <USER> -S 2024-04-04 --format User,JobID,JobName,Partition,Account,AllocCPUS,State,MaxRSS,ExitCode,Submit,Start,End,Elapsed,Timelimit,NodeList</code></li>
</ul></li>
<li>Possible hardware failures – use <code>sacct</code> to see if
repeated failures occur on particular node(s)
<ul>
<li>Specify nodes with <code>-w</code> or exclude with
<code>-x</code>.</li>
</ul></li>
<li>Run your code interactively via <code>srun</code></li>
<li>Run multi-node jobs on a single node to check for communication
issues or issues with modules on additional nodes</li>
<li>Contact us if you’re stuck.</li>
</ul>
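<p>For example, to check whether repeated failures cluster on a
particular node and then steer around it (the node name is
hypothetical):</p>
<pre><code>sacct -u $USER -S 2024-04-04 --format JobID,State,ExitCode,NodeList
sbatch -x n0123.savio3 job.sh  # exclude the suspect node</code></pre>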
</div>
</body>
</html>