TOWAYINFO

Allocating Resources to Minimize Fragmentation


I. Environment (Rocky 8.8 / openEuler 22.03, Slurm 23.02)

   1. Node

Nodename                             N1    N2    N3    N5
Number of Sockets                    2     2     2     1
Number of Cores per Socket           4     4     4     4
Total Number of Cores                8     8     8     4
Number of Threads (CPUs) per Core    1     1     1     2
Total Number of CPUs                 8     8     8     8
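
The totals in the node table follow from the topology: total cores = sockets x cores per socket, and total CPUs = total cores x threads per core. As a minimal sketch, the arithmetic can be checked as follows. The `NODES` dictionary and the `totals` helper are illustrative names, not part of Slurm.

```python
# Node topology as listed in the table:
# (sockets, cores per socket, threads per core).
NODES = {
    "N1": (2, 4, 1),
    "N2": (2, 4, 1),
    "N3": (2, 4, 1),
    "N5": (1, 4, 2),
}


def totals(sockets, cores_per_socket, threads_per_core):
    """Return (total cores, total CPUs) for one node."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core


for name, topology in NODES.items():
    cores, cpus = totals(*topology)
    print(f"{name}: {cores} cores, {cpus} CPUs")
```

N5 reaches 8 CPUs with only 4 cores because each of its cores exposes 2 threads.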

   2. Partition

PartitionName    Part001     Part003
Nodes            N1/N2/N3    N5
Default          YES         -

 

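II. Running the Job

    1. Job requirements

      A job needs 12 CPUs (12 tasks and 1 CPU per task with no overcommitment). The CPUs should be allocated using the minimum number of nodes and the minimum number of sockets the job requires, so that fragmentation of allocated/unallocated CPUs in the cluster is kept as low as possible.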
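
As a rough worked example of that requirement, the lower bounds on nodes and sockets for 12 CPUs can be computed with a ceiling division. The sketch below assumes the Part001 nodes described in the Environment section (8 CPUs per node, 4 CPUs per socket); `minimum_units` is a hypothetical helper name.

```python
import math


def minimum_units(ncpus, cpus_per_unit):
    """Smallest number of units (nodes or sockets) that can hold ncpus."""
    return math.ceil(ncpus / cpus_per_unit)


min_nodes = minimum_units(12, 8)    # 8 CPUs per node in Part001
min_sockets = minimum_units(12, 4)  # 4 CPUs per socket (4 cores, 1 thread each)
print(min_nodes, min_sockets)
```

So the job cannot fit on fewer than 2 nodes or fewer than 3 sockets.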

    2. Task distribution

Nodename                    N1          N2          N3
Socket id                   0     1     0     1     0     1
Number of Allocated CPUs    4     4     4     0     0     0
Number of Tasks             8           4           0
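
The task distribution above is what a simple block-packing model produces: fill one socket completely before moving to the next, and one node completely before moving to the next. The sketch below is only a simplified illustration under that assumption. It is not Slurm's actual select/cons_tres algorithm, and `pack_block` is a hypothetical helper.

```python
def pack_block(ntasks, nodes):
    """Fill sockets one after another, node by node, one CPU per task.

    nodes: list of (name, sockets, cores_per_socket), one thread per core.
    Returns {name: [allocated CPUs on socket 0, socket 1, ...]}.
    """
    layout = {}
    remaining = ntasks
    for name, sockets, cores_per_socket in nodes:
        per_socket = []
        for _ in range(sockets):
            used = min(remaining, cores_per_socket)
            per_socket.append(used)
            remaining -= used
        layout[name] = per_socket
    if remaining:
        raise ValueError(f"{remaining} tasks do not fit on the given nodes")
    return layout


part001 = [("N1", 2, 4), ("N2", 2, 4), ("N3", 2, 4)]
layout = pack_block(12, part001)
for name, per_socket in layout.items():
    print(name, per_socket, "tasks:", sum(per_socket))
```

With 12 tasks this leaves one socket of N2 and all of N3 free, so the unallocated CPUs stay in whole-socket and whole-node units.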

    3. Parameter configuration

        SelectType=select/cons_tres

        SelectTypeParameters=CR_Core,CR_CORE_DEFAULT_DIST_BLOCK

    4. Command

       srun  --ntasks=12  sleep 60

 

 III. Logs

     1. N1    

[2024-02-07T17:58:18.063] launch task StepId=25.0 request from UID:0 GID:0 HOST:192.168.100.40 PORT:49022

[2024-02-07T17:58:18.064] task/affinity: lllp_distribution: JobId=25 implicit auto binding: cores,one_thread, dist 8192

[2024-02-07T17:58:18.064] task/affinity: _task_layout_lllp_block: _task_layout_lllp_block

[2024-02-07T17:58:18.064] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [25]: mask_cpu,one_thread, 0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80

[2024-02-07T17:59:18.301] [25.0] done with job

    2. N2

[2024-02-07T17:58:16.690] launch task StepId=25.0 request from UID:0 GID:0 HOST:192.168.100.40 PORT:57456

[2024-02-07T17:58:16.690] task/affinity: lllp_distribution: JobId=25 implicit auto binding: cores,one_thread, dist 8192

[2024-02-07T17:58:16.690] task/affinity: _task_layout_lllp_block: _task_layout_lllp_block

[2024-02-07T17:58:16.690] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [25]: mask_cpu,one_thread, 0x01,0x02,0x04,0x08

[2024-02-07T17:59:16.886] [25.0] done with job

 

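IV. Summary

First, the logs confirm that the tasks ran on two nodes, N1 and N2. Second, the CPU masks in the mask_cpu log entries (0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80 on N1 and 0x01,0x02,0x04,0x08 on N2) show where the allocated CPUs sit on each node. Each mask has exactly one bit set, so the eight tasks on N1 are bound to CPUs 0-7 and the four tasks on N2 to CPUs 0-3, which matches the 8 + 4 task split.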
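
To read the masks mechanically, each hexadecimal value can be converted into the indices of its set bits, where bit i corresponds to logical CPU i on that node. The following is a minimal sketch; `mask_to_cpus` and `decode` are illustrative helper names, and the input strings are copied from the mask_cpu log lines above.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU indices whose bits are set in one hex mask."""
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def decode(mask_list):
    """Decode a comma-separated mask list, one entry per task."""
    return [mask_to_cpus(m) for m in mask_list.split(",")]


n1 = decode("0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80")
n2 = decode("0x01,0x02,0x04,0x08")
print("N1:", n1)
print("N2:", n2)
```

Each task maps to a single CPU: tasks on N1 land on CPUs 0 through 7, and tasks on N2 on CPUs 0 through 3.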