TOWAYINFO

Applying Affinity and Core Binding


I. Environment (Rocky 8.8 / openEuler 22.03, Slurm 23.02)

   1.  Node     

Nodename                             N1    N2    N3    N5
Number of Sockets                    2     2     2     1
Number of Cores per Socket           4     4     4     4
Total Number of Cores                8     8     8     4
Number of Threads (CPUs) per Core    1     1     1     2
Total Number of CPUs                 8     8     8     8
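The totals in the node table follow from sockets × cores per socket × threads per core. A minimal Python sketch that recomputes them from the three base values; the `Node` class and its field names are illustrative only and are not Slurm identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hardware layout of one node (field names are illustrative)."""
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_cpus(self) -> int:
        # Each hardware thread counts as one CPU, as in the table.
        return self.total_cores * self.threads_per_core


# Values copied from the node table.
NODES = [
    Node("N1", 2, 4, 1),
    Node("N2", 2, 4, 1),
    Node("N3", 2, 4, 1),
    Node("N5", 1, 4, 2),
]

for node in NODES:
    print(node.name, "cores:", node.total_cores, "CPUs:", node.total_cpus)
```

N5 reaches the same CPU count as the other nodes with half the cores because it has two threads per core.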

   2. Partition

PartitionName    Part001     Part003
Nodes            N1/N2/N3    N5
Default          YES         -

 

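II. Running the Job

    1. Job requirements

      The job needs 6 CPUs (6 tasks with no overcommitment). Run the job on a single node in the default partition, and apply core binding to each task.

    2. Task distribution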

Nodename                    N1
Socket id                   0        1
Number of Allocated CPUs    3        3
Allocated CPU ids           0 1 2    4 5 6

Binding of Tasks to CPUs

CPU id     0    1    2    3    4    5    6    7
Task id    0    2    4    -    1    3    5    -
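The binding table above is consistent with tasks being dealt round-robin across the two sockets, each task taking the next free core on its socket. The sketch below is a simplified model of that pattern, not Slurm's actual implementation. It assumes CPU ids are numbered consecutively within a socket (0-3 on socket 0, 4-7 on socket 1), as the allocated CPU ids in the table suggest. The function name `cyclic_layout` is hypothetical.

```python
def cyclic_layout(ntasks, sockets, cores_per_socket):
    """Return {task_id: cpu_id}, dealing tasks round-robin across sockets.

    Simplified model: assumes one CPU per task and CPU ids numbered
    consecutively within each socket.
    """
    next_core = [0] * sockets  # next free core index on each socket
    binding = {}
    for task in range(ntasks):
        socket = task % sockets
        if next_core[socket] >= cores_per_socket:
            raise ValueError("not enough cores for the requested tasks")
        binding[task] = socket * cores_per_socket + next_core[socket]
        next_core[socket] += 1
    return binding


layout = cyclic_layout(ntasks=6, sockets=2, cores_per_socket=4)
for task, cpu in layout.items():
    print(f"task {task} -> CPU {cpu}")
```

For 6 tasks on N1 this places tasks 0, 2, 4 on CPUs 0, 1, 2 and tasks 1, 3, 5 on CPUs 4, 5, 6, leaving CPUs 3 and 7 idle, which matches the table.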

 

    3. Configuration parameters

        SelectType=select/cons_tres

        SelectTypeParameters=CR_Core

        TaskPlugin=task/affinity

    4. Command

       srun --nodes=1-1 --ntasks=6 --cpu-bind=cores sleep 60
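To observe the binding from inside the job rather than with `sleep`, each task can report its own CPU affinity. A minimal sketch, assuming Python 3 is available on the compute node; the script name `show_binding.py` is hypothetical. `SLURM_PROCID` is the task rank that srun exports to each task, and `os.sched_getaffinity` is a Linux-only call.

```python
import os


def current_binding():
    """Return (task_id, sorted CPU ids this process is allowed to run on)."""
    # SLURM_PROCID is set by srun for each task; fall back to 0 outside Slurm.
    task_id = int(os.environ.get("SLURM_PROCID", "0"))
    # Linux-only: pid 0 means "the calling process".
    cpus = sorted(os.sched_getaffinity(0))
    return task_id, cpus


if __name__ == "__main__":
    task_id, cpus = current_binding()
    print(f"task {task_id} -> CPUs {cpus}")
```

It could be launched with the same options as above, for example `srun --nodes=1-1 --ntasks=6 --cpu-bind=cores python3 show_binding.py`. If the binding matches the task distribution table, each task should report a single, distinct CPU id.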

 

 III. Log

     1. N1    

[2024-02-07T16:54:07.965] launch task StepId=16.0 request from UID:0 GID:0 HOST:192.168.100.40 PORT:41324

[2024-02-07T16:54:07.965] task/affinity: lllp_distribution: JobId=16 binding: cores,one_thread, dist 8192

[2024-02-07T16:54:07.965] task/affinity: _task_layout_lllp_cyclic: _task_layout_lllp_cyclic

[2024-02-07T16:54:07.965] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [16]: mask_cpu,one_thread, 0x01,0x10,0x02,0x20,0x04,0x40

[2024-02-07T16:55:08.157] [16.0] done with job
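The line of interest is the `_lllp_generate_cpu_bind` entry, which carries the job id, the bind type, and the mask list. A minimal Python sketch for pulling those fields out of a slurmd log line follows. The regular expression is derived only from the single sample line above, so other Slurm versions or log levels may format the line differently. The function name `parse_cpu_bind` is hypothetical.

```python
import re

# Sample line copied from the N1 log above.
LINE = (
    "[2024-02-07T16:54:07.965] task/affinity: _lllp_generate_cpu_bind: "
    "_lllp_generate_cpu_bind jobid [16]: mask_cpu,one_thread, "
    "0x01,0x10,0x02,0x20,0x04,0x40"
)

# Pattern inferred from the sample line only (an assumption, not a spec).
PATTERN = re.compile(
    r"_lllp_generate_cpu_bind jobid \[(?P<job>\d+)\]: "
    r"(?P<bind>[\w,]+), "
    r"(?P<masks>0x[0-9A-Fa-f]+(?:,0x[0-9A-Fa-f]+)*)"
)


def parse_cpu_bind(line):
    """Return job id, bind type and mask strings, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "job_id": int(match.group("job")),
        "bind_type": match.group("bind"),
        "masks": match.group("masks").split(","),
    }


record = parse_cpu_bind(LINE)
print(record)
```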

 

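IV. Summary

The log first confirms that all tasks were allocated to and ran on a single node (N1). Second, the cpu mask list in the log (0x01,0x10,0x02,0x20,0x04,0x40) has one entry per task, so its length gives the number of tasks (6), and each value gives the CPU that task is bound to. Each mask is a hexadecimal bitmask in which bit n stands for CPU n, so 0x01 is CPU 0 and 0x10 is CPU 4. The binding is one-to-one: one CPU per task.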
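The mask list can be decoded mechanically. The sketch below converts each hexadecimal mask into the CPU ids whose bits are set and checks that the binding is one-to-one. It assumes the i-th mask in the list belongs to task i on the node. The helper names are illustrative.

```python
def mask_to_cpus(mask):
    """Return the CPU ids whose bits are set in an integer bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def decode_binding(mask_list):
    """Map task id -> CPU ids, assuming mask i belongs to task i."""
    return {
        task_id: mask_to_cpus(int(text, 16))
        for task_id, text in enumerate(mask_list.split(","))
    }


# Mask list copied from the N1 log.
MASKS = "0x01,0x10,0x02,0x20,0x04,0x40"

binding = decode_binding(MASKS)
for task_id, cpus in binding.items():
    print(f"task {task_id} -> CPUs {cpus}")

# One-to-one check: every task has exactly one CPU and no CPU is shared.
all_cpus = [cpu for cpus in binding.values() for cpu in cpus]
one_to_one = (
    all(len(cpus) == 1 for cpus in binding.values())
    and len(set(all_cpus)) == len(all_cpus)
)
print("one-to-one:", one_to_one)
```

The decoded result places tasks 0 to 5 on CPUs 0, 4, 1, 5, 2, 6, which agrees with the task distribution table in section II.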