Querying node and job details: scontrol

scontrol views and modifies Slurm configuration and state. This section covers only the commonly used query commands.

Viewing node information

scontrol show node[=node_name]

Shows detailed node information. If node_name is omitted, information for all nodes is displayed; if node_name is given, only that node is shown.

$ scontrol show node b13r2n18
NodeName=b13r2n18 Arch=x86_64 CoresPerSocket=8 
CPUAlloc=32 CPUTot=32 CPULoad=0.10
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=加速卡:国产处理器:4
NodeAddr=b13r2n18 NodeHostName=b13r2n18 Version=18.08
OS=Linux 3.10.0-693.el7.x86_64 #1 SMP Mon Apr 8 19:28:59 CST 2019 
RealMemory=126530 AllocMem=61856 FreeMem=116609 Sockets=4 Boards=1
MemSpecLimit=10240
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=debug 
BootTime=2019-07-14T13:42:57 SlurmdStartTime=2019-07-14T13:43:22
CfgTRES=cpu=32,mem=126530M,billing=32,gres/加速卡=4
AllocTRES=cpu=32,mem=61856M,gres/加速卡=4
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
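
The output above is a series of Key=Value tokens, so a single field can be pulled out with standard text tools. The following is a minimal sketch, not part of Slurm: get_field is a hypothetical helper name, and it assumes the value contains no spaces (true for fields such as State or CPUTot, but not for free-text fields such as OS).

```shell
# get_field KEY: print the value of the first KEY=value token read from stdin.
# Hypothetical helper; assumes the value itself contains no whitespace.
get_field() {
    # Put every whitespace-separated token on its own line, then match on the key.
    tr -s ' \t' '\n' |
        awk -F= -v key="$1" '!found && $1 == key {
            # Print everything after "KEY=", so values that contain "=" survive
            # (for example CfgTRES=cpu=32,mem=126530M,...).
            print substr($0, length(key) + 2)
            found = 1
        }'
}

# Usage on a cluster where Slurm is installed:
#   scontrol show node b13r2n18 | get_field State
```

For the node shown above, this would be expected to print the State value, ALLOCATED.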

Viewing job information

scontrol show job [job_id]

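Shows detailed job information. If job_id is omitted, information for all jobs is displayed; if job_id is given, only that job is shown.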

$ scontrol show job 64509
JobId=64509 JobName=DL_test
UserId=hc(1060) GroupId=nobody(1060) MCS_label=N/A
Priority=1000 Nice=0 Account=sugon QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=04:01:35 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2019-07-16T13:44:19 EligibleTime=2019-07-16T13:44:19
AccrueTime=2019-07-16T13:44:19
StartTime=2019-07-16T13:44:20 EndTime=2019-07-17T13:44:20 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-07-16T13:44:20
Partition=debug AllocNode:Sid=j00admin2:44146
ReqNodeList=(null) ExcNodeList=(null)
NodeList=b01r1n[00-08],b01r4n[06-08],b03r1n[00-08,10-18],b06r2n[10-18],b06r3n[00-08,10-18],b07r1n[00-08,10-18],b07r2n[00-08,10-18],b07r3n[00-08,10-18],b08r1n[00-08,10-18],b08r2n[00-08,10-18],b08r4n[00-08,10-18],b09r1n[00-08,10-18],b09r2n[00-08,10-18],b09r3n[00-08,10-18],b09r4n[00-08,10-18],b10r4n[00-08,10-18],b11r1n[00-08,10-18],g05r4n[15-19],g06r1n[00-06],g09r1n[15-19],g09r2n[00-09]
BatchHost=b01r1n00
NumNodes=300 NumCPUs=9600 NumTasks=9600 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=9600,mem=18556800M,node=300,billing=9600,gres/加速卡=1200
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=1933M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/public/home/hc/slurm/sleep.slurm
WorkDir=/public/home/hc/slurm
StdErr=/public/home/hc/slurm/slurm-64509.out
StdIn=/dev/null
StdOut=/public/home/hc/slurm/slurm-64509.out
Power=
TresPerNode=加速卡:4
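
The job record is long, and often only a few fields matter, such as the job state and how much of the time limit has been used. The following is a minimal sketch that condenses the output to one line per job. job_summary is a hypothetical helper name, not a Slurm command, and it assumes JobId, JobState, RunTime and TimeLimit appear in the order shown above.

```shell
# job_summary: read `scontrol show job` output on stdin and print
# "JobId JobState RunTime TimeLimit" for each job record found.
# Hypothetical helper; relies on TimeLimit following the other three fields.
job_summary() {
    awk '{
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq == 0) continue          # skip tokens that are not Key=Value
            key = substr($i, 1, eq - 1)
            val = substr($i, eq + 1)
            if (key == "JobId") id = val
            else if (key == "JobState") state = val
            else if (key == "RunTime") run = val
            else if (key == "TimeLimit") print id, state, run, val
        }
    }'
}

# Usage on a cluster where Slurm is installed:
#   scontrol show job 64509 | job_summary
```

For the job shown above, this would be expected to print `64509 RUNNING 04:01:35 1-00:00:00`. To turn a compressed NodeList expression into one host name per line, scontrol also provides `scontrol show hostnames 'b01r1n[00-08]'`.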
