Applying Affinity and Core Binding
I. Environment (Rocky 8.8 / openEuler 22.03, Slurm 23.02)
1. Node
Nodename                            | N1 | N2 | N3 | N5 |
Number of Sockets                   | 2  | 2  | 2  | 1  |
Number of Cores per Socket          | 4  | 4  | 4  | 4  |
Total Number of Cores               | 8  | 8  | 8  | 4  |
Number of Threads (CPUs) per Core   | 1  | 1  | 1  | 2  |
Total Number of CPUs                | 8  | 8  | 8  | 8  |
2. Partition
PartitionName | Part001  | Part003 |
Nodes         | N1/N2/N3 | N5      |
Default       | YES      | -       |
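II. Running the Job
1. Job requirements
The job needs 6 CPUs (6 tasks with no overcommitment). It runs on a single node in the default partition, and core binding is applied to each task.
2. Task distribution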
Nodename                 | N1              |
Socket id                | 0     | 1       |
Number of Allocated CPUs | 3     | 3       |
Allocated CPU ids        | 0 1 2 | 4 5 6   |

Binding of Tasks to CPUs
CPU id  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Task id | 0 | 2 | 4 | - | 1 | 3 | 5 | - |
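The task-to-CPU mapping in the table can be reproduced with a small round-robin model. The sketch below is only an illustration, not Slurm's implementation: the function name `cyclic_layout` is hypothetical, and it assumes CPU ids are numbered socket by socket (socket 0 holds CPUs 0-3, socket 1 holds CPUs 4-7), which matches the allocation shown for N1.

```python
def cyclic_layout(num_tasks, sockets=2, cores_per_socket=4):
    """Return {task_id: cpu_id} for a simple round-robin over sockets.

    Simplified illustration only; this is not Slurm's own algorithm.
    Assumes CPU ids are numbered socket-major (socket 0 first).
    """
    next_core = [0] * sockets          # next free core index within each socket
    layout = {}
    for task in range(num_tasks):
        socket_id = task % sockets     # alternate sockets task by task
        if next_core[socket_id] >= cores_per_socket:
            raise ValueError(f"not enough cores on socket {socket_id}")
        layout[task] = socket_id * cores_per_socket + next_core[socket_id]
        next_core[socket_id] += 1
    return layout


if __name__ == "__main__":
    for task, cpu in cyclic_layout(6).items():
        print(f"task {task} -> CPU {cpu}")
```

Under these assumptions, even-numbered tasks land on socket 0 and odd-numbered tasks on socket 1, which leaves CPUs 3 and 7 unused as in the table.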
3. Configuration parameters
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
TaskPlugin=task/affinity
4. Command
srun --nodes=1-1 --ntasks=6 --cpu-bind=cores sleep 60
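Since `sleep 60` prints nothing, one way to see the binding from inside the job is to run a small script in its place. The following is a minimal sketch, assuming Python 3 is available on the compute node; the file name `show_binding.py` and the function `describe_binding` are hypothetical. It relies on the `SLURM_PROCID` environment variable that srun sets for each task and on the Linux-only call `os.sched_getaffinity`.

```python
import os
import socket


def describe_binding():
    """Return (task rank, hostname, sorted CPU ids this process may run on)."""
    # SLURM_PROCID is set by srun for each task; default to 0 outside Slurm.
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    # Linux-only: the set of CPUs the current process is allowed to use.
    cpus = sorted(os.sched_getaffinity(0))
    return rank, socket.gethostname(), cpus


if __name__ == "__main__":
    rank, host, cpus = describe_binding()
    print(f"task {rank} on {host}: CPUs {cpus}")
```

It could be launched with the same options as above, for example `srun --nodes=1-1 --ntasks=6 --cpu-bind=cores python3 show_binding.py`. On a node with one thread per core, such as N1, each task would be expected to report a single CPU id if the binding took effect.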
III. Log
1. N1
[2024-02-07T16:54:07.965] launch task StepId=16.0 request from UID:0 GID:0 HOST:192.168.100.40 PORT:41324
[2024-02-07T16:54:07.965] task/affinity: lllp_distribution: JobId=16 binding: cores,one_thread, dist 8192
[2024-02-07T16:54:07.965] task/affinity: _task_layout_lllp_cyclic: _task_layout_lllp_cyclic
[2024-02-07T16:54:07.965] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [16]: mask_cpu,one_thread, 0x01,0x10,0x02,0x20,0x04,0x40
[2024-02-07T16:55:08.157] [16.0] done with job
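IV. Summary
The log first confirms that all tasks ran on a single node (N1). Second, the cpu_mask list in the log (0x01,0x10,0x02,0x20,0x04,0x40) describes the binding: the number of masks gives the number of tasks (6), and the value of each mask gives the position of the CPU that task is bound to. Each mask has exactly one bit set, so tasks and CPUs are bound one-to-one.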
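The reading of the masks can be checked mechanically. The sketch below takes the `_lllp_generate_cpu_bind` line from the N1 log, extracts the hex masks in task order, and lists the CPU ids whose bits are set in each one. The helper names (`masks_from_log`, `cpus_in_mask`, `task_bindings`) are hypothetical, and the regular expression simply assumes that the masks are the only `0x...` tokens on the line.

```python
import re

# The cpu_bind line copied from the N1 log above.
LOG_LINE = (
    "[2024-02-07T16:54:07.965] task/affinity: _lllp_generate_cpu_bind: "
    "_lllp_generate_cpu_bind jobid [16]: mask_cpu,one_thread, "
    "0x01,0x10,0x02,0x20,0x04,0x40"
)


def masks_from_log(line):
    """Extract the hex CPU masks, in task order, from a cpu_bind log line."""
    return re.findall(r"0x[0-9a-fA-F]+", line)


def cpus_in_mask(mask):
    """Return the CPU ids whose bits are set in one hex mask."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def task_bindings(line):
    """Map each task id to the list of CPU ids in its mask."""
    return {task: cpus_in_mask(m) for task, m in enumerate(masks_from_log(line))}


if __name__ == "__main__":
    for task, cpus in task_bindings(LOG_LINE).items():
        print(f"task {task} -> CPUs {cpus}")
```

Bit n of a mask corresponds to CPU id n, so 0x01 is CPU 0 and 0x10 is CPU 4. The decoded result agrees with the task distribution table: tasks 0, 2, 4 on CPUs 0, 1, 2 and tasks 1, 3, 5 on CPUs 4, 5, 6.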