Danny Auble | 2 May 23:23 2014

Slurm version 14.03.2 is now available

We are pleased to announce that Slurm 14.03.2 is available at

Please upgrade at your earliest convenience.

Here is a list of changes/fixes since 14.03.1-2.

  -- Fix race condition if PrologFlags=Alloc,NoHold is used.
  -- Cray - Make NPC only limit running other NPC jobs on shared blades
     instead of limiting non-NPC jobs.
  -- Fix for sbatch #PBS -m (mail) option parsing.
  -- Fix job dependency bug. Jobs dependent upon multiple other jobs may 
  -- Set "Reason" field for all elements of a job array on short-circuited
     scheduling for job arrays.
  -- Allow -D option of salloc/srun/sbatch to specify relative path.
  -- Added SchedulerParameter of batch_sched_delay to permit many batch jobs
     to be submitted between each scheduling attempt to reduce overhead of
     scheduling logic.
  -- Added job reason of "SchedTimeout" if the scheduler was not able to reach
     the job to attempt scheduling it.
  -- Add job's exit state and exit code to email message.
  -- scontrol hold/release accepts job name option (in addition to job ID).
  -- Better handle attempts to cancel a step that has not yet started.
  -- Handle Max/GrpCPU limits better.
  -- Add --priority option to salloc, sbatch and srun commands.
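As an illustration of the new batch_sched_delay option, it would be added to the SchedulerParameters line of slurm.conf; the 3-second value below is only an example, not a recommendation:

```
# slurm.conf fragment (example value): delay scheduling attempts so that
# many batch jobs submitted within the window are handled by a single
# pass of the scheduling logic.
SchedulerParameters=batch_sched_delay=3
```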
(Continue reading)

jette | 21 Apr 21:46 2014

CFP: Slurm User Group Meeting 2014

You are invited to submit an abstract of a tutorial, technical  
presentation or site report to be given at the Slurm User Group  
Meeting 2014. This event is sponsored and organized by The Swiss  
National Supercomputing Centre and will be held in Lugano, Switzerland  
on 23-24 September 2014.

This international event is open to everyone who wants to:
Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
Share their knowledge and experience with other users and administrators
Get detailed information about the latest features and developments
Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  

Important Dates:
6 June 2014: Abstracts due
27 June 2014: Notification of acceptance
23-24 September 2014: Slurm User Group Meeting 2014

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)

(Continue reading)

jette | 21 Apr 21:42 2014

Slurm version 14.03.1 is now available

Slurm version 14.03.1 is now available with four weeks' worth of bug
fixes as described below. You can download Slurm from:

* Changes in Slurm 14.03.1
  -- Add support for job std_in, std_out and std_err fields in Perl API.
  -- Add "Scheduling Configuration Guide" web page.
  -- BGQ - Fix check for jobinfo when it is NULL.
  -- Do not check cleaning on "pending" steps.
  -- task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
  -- In the PMI implementation by default don't check for duplicate keys.
     Set the SLURM_PMI_KVS_DUP_KEYS if you want the code to check for
     duplicate keys.
  -- Add job submission time to squeue.
  -- Permit user root to propagate resource limits higher than the hard limit
     slurmd has on that compute node (i.e. raise both current and maximum limits).
  -- Fix issue with license used count when doing an scontrol reconfig.
  -- Fix the PMI iterator to not report duplicated keys.
  -- Fix issue with sinfo when -o is used without the %P option.
  -- Rather than immediately invoking an execution of the scheduling logic on
     every event type that can enable the execution of a new job, queue its
     execution. This permits faster execution of some operations, such as
     modifying large counts of jobs, by executing the scheduling logic less
     frequently, but still in a timely fashion.
  -- If an environment variable is longer than MAX_ENV_STRLEN, don't set it
     in the job environment; otherwise the exec() fails.
  -- Optimize scontrol hold/release logic for job arrays.
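The queued-execution change described above (running the scheduling logic once per batch of triggering events, rather than immediately on every event) can be sketched as follows. This is a hypothetical illustration of the pattern, not Slurm's actual code:

```python
class DeferredScheduler:
    """Coalesce scheduling requests: many triggering events result in a
    single pass of the expensive scheduling logic (illustrative sketch,
    not Slurm's internals)."""

    def __init__(self):
        self.pending = False
        self.passes = 0      # how many times the real scheduler ran

    def notify_event(self):
        # Called on every event that could enable a new job (job
        # completion, node coming up, job modification, ...).
        # Cheap: just record that a scheduling pass is needed.
        self.pending = True

    def run_if_pending(self):
        # Called periodically by the main loop; runs the (expensive)
        # scheduling logic at most once per batch of events.
        if self.pending:
            self.pending = False
            self.passes += 1   # stand-in for the real scheduling pass

sched = DeferredScheduler()
for _ in range(1000):          # e.g. modifying a large count of jobs
    sched.notify_event()
sched.run_if_pending()         # one pass covers all 1000 events
```

Modifying a thousand jobs thus triggers one scheduling pass instead of a thousand, which is the performance gain the change describes.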
(Continue reading)

Moe Jette | 26 Mar 21:37 2014

Slurm version 14.03.0 is now available

Slurm version 14.03.0 is now available. This is a major Slurm release  
with many new features. See the RELEASE_NOTES and NEWS files in the  
distribution for detailed descriptions of the changes, a few of which  
are noted below. Upgrading from Slurm version 2.5 or 2.6 should
proceed without loss of jobs or other state. Just be sure to upgrade
the slurmdbd first. (Upgrades from pre-releases of version 14.03 may
result in job loss.) Slurm downloads are available from

Highlights of changes in Slurm version 14.03.0 include:
  -- Added support for native Slurm operation on Cray systems (without ALPS).
  -- Added partition configuration parameters AllowAccounts, AllowQOS,
     DenyAccounts and DenyQOS to provide greater control over use.
  -- Added the ability to perform load-based scheduling, allocating resources
     to jobs on the nodes with the largest number of idle CPUs.
  -- Added support for reserving cores on a compute node for system services
     (core specialization)
  -- Add mechanism for job_submit plugin to generate error message for srun,
     salloc or sbatch to stderr.
  -- Support for the Postgres database has long been out of date and
     problematic, so it has been removed entirely. If you would like to use
     it, the code still exists in version 2.6 and earlier, but it will not
     be included in this or future versions of the code.
  -- Added new structures and support for both server and cluster resources.
  -- Significant performance improvements, especially with respect to job
     array support.
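For example, the new partition access-control parameters go on the PartitionName line in slurm.conf; the partition, node, account, and QOS names below are made up:

```
# slurm.conf fragment: only the physics and chem accounts may use this
# partition, and jobs submitted under the debug QOS are rejected.
PartitionName=batch Nodes=n[001-100] AllowAccounts=physics,chem DenyQOS=debug
```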

(Continue reading)

Danny Auble | 14 Mar 23:14 2014

Slurm versions 2.6.7 and 14.03.0-rc1 are now available

We are pleased to announce the availability of Slurm version 2.6.7, plus 
version 14.03.0-rc1 (release candidate 1). We plan to release version 
14.03.0 by the end of the month. See the "RELEASE_NOTES" file in the 
distribution for a description of the major changes in version 14.03.

This will most likely be the last 2.6 release.  14.03 code has been 
frozen for development and will only get bug fixes from here on out.  
Thanks to all those that have contributed to the effort!

The Slurm distributions are available from:

Bug fixes and enhancements in these 2 versions are listed below...

* Changes in Slurm 2.6.7
  -- Properly enforce a job's cpus-per-task option when a job's allocation is
     constrained on some nodes by the mem-per-cpu option.
  -- Correct the slurm.conf man pages and checkpoint_blcr.html page
     describing that jobs must be drained from cluster before deploying
     any checkpoint plugin. Corrected in version 14.03.
  -- Fix issue where if using munge and munge wasn't running and a slurmd
     needed to forward a message, the slurmd would core dump.
  -- Update srun.1 man page documenting the PMI2 support.
  -- Fix slurmctld core dump when a job gets its QOS updated but there
     is no corresponding association.
  -- If a job requires specific nodes and can not run due to those nodes 
(Continue reading)

Moe Jette | 23 Dec 21:45 2013

Slurm versions 2.6.5 and 14.03.0-pre5 are now available

Slurm version 2.6.5, with a multitude of bug fixes, is now available.
We are also making available version 14.03.0-pre5 with more
development work for the next major release. A summary of the changes
is listed below. Downloads are available from

* Changes in Slurm 2.6.5
  -- Correction to hostlist parsing bug introduced in v2.6.4 for hostlists with
     more than one numeric range in brackets (e.g. "rack[0-3]_blade[0-63]").
  -- Add notification if using proctrack/cgroup and task/cgroup when oom hits.
  -- Corrections to advanced reservation logic with overlapping jobs.
  -- job_submit/lua - add cpus_per_task field to those available.
  -- Add cpu_load to the node information available using the Perl API.
  -- Correct a job's GRES allocation data in accounting records for non-Cray
  -- Substantial performance improvement for systems with Shared=YES or FORCE
     and large numbers of running jobs (replace bubble sort with quick sort).
  -- proctrack/cgroup - Add locking to prevent race condition where one job
     step is ending for a user or job at the same time another job step is
     starting and the user or job container is deleted from under the
     starting job step.
  -- Fixed sh5util loop when there are no node-step files.
  -- Fix race condition on batch job termination that could result in a job
     exit code of 0xfffffffe if the slurmd on node zero registers its active
     jobs at the same time that slurmstepd is recording the job's exit code.
  -- Correct logic returning remaining job dependencies in job information
     reported by scontrol and squeue. Eliminates vestigial descriptors with
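The bracketed hostlist syntax involved in the parsing fix above (e.g. "rack[0-3]_blade[0-63]") expands to the Cartesian product of its numeric ranges. A minimal sketch of that expansion follows; it is an illustration of the notation, not Slurm's actual parser:

```python
import itertools
import re

def expand_hostlist(expr):
    """Expand a Slurm-style bracketed hostlist expression such as
    "rack[0-3]_blade[0-63]" into the full list of host names.
    Handles multiple bracketed groups, comma-separated ranges inside
    brackets, and zero-padded bounds (e.g. n[01-03])."""
    # Split into alternating literal text and bracket contents.
    parts = re.split(r'\[([^\]]+)\]', expr)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])          # literal text, single choice
        else:
            values = []
            for item in part.split(','):
                if '-' in item:
                    lo, hi = item.split('-')
                    width = len(lo)         # preserve zero padding
                    values.extend(str(n).zfill(width)
                                  for n in range(int(lo), int(hi) + 1))
                else:
                    values.append(item)
            choices.append(values)
    # The Cartesian product of all groups yields every host name.
    return [''.join(combo) for combo in itertools.product(*choices)]

hosts = expand_hostlist("rack[0-3]_blade[0-63]")   # 4 * 64 = 256 hosts
```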
(Continue reading)

Moe Jette | 16 Nov 01:22 2013

Slurm release schedule update

Several Slurm enhancements planned for the next major release will not  
be available for a December release. The major organizations  
performing Slurm development have decided that the best course of
action is to delay the next major release until March 2014. This will  
provide sufficient time to complete the development planned for the  
December release, plus some additional work.

Organizations compelled to upgrade before March can make use of  
version 13.12/14.03 pre-releases. Waiting for the 14.03 release is  
advised for mission critical applications. As with any pre-releases,  
upgrades from a pre-release of version 13.12/14.03 to version 14.03  
may result in loss of jobs and other state information. We regret any  
inconvenience this delay may cause.

Moe Jette | 5 Nov 01:07 2013

Slurm versions 2.6.4 and 13.12.0-pre4 are now available

Slurm version 2.6.4 with a multitude of bug fixes plus some new  
development to better support Torque/PBS commands and options is now  
available. We are also making available version 13.12.0-pre4 with more  
development work for the next major release. Detailed descriptions of  
the changes are shown below. Downloads are available from

* Changes in Slurm 2.6.4
  -- Fixed sh5util to print its usage.
  -- Corrected commit f9a3c7e4e8ec.
  -- Honor ntasks-per-node option with exclusive node allocations.
  -- sched/backfill - Prevent invalid memory reference if bf_continue option is
     configured and slurm is reconfigured during one of the sleep cycles or if
     there are any changes to the partition configuration or if the normal
     scheduler runs and starts a job that the backfill scheduler is actively
     working on.
  -- Update man pages information about acct-freq and JobAcctGatherFrequency
     to reflect only the latest supported format.
  -- Minor document update to include note about PrivateData=Usage for the
     slurm.conf when using the DBD.
  -- Expand information reported with DebugFlags=backfill.
  -- Initiate jobs pending to run in a reservation as soon as the reservation
     becomes active.
  -- Purge expired reservations even if they have pending jobs.
  -- Corrections to calculation of a pending job's expected start time.
  -- Remove some vestigial logic treating job priority of 1 as a special case.
  -- Free memory at daemon shutdown to avoid minor memory leaks.
  -- Updated documentation to give correct units being displayed.
(Continue reading)

Moe Jette | 4 Oct 00:02 2013

Slurm versions 2.6.3 and 13.12.0-pre3 are now available

Slurm version 2.6.3 with a multitude of bug fixes plus some new development to
better support Torque/PBS commands and options is now available. We are also
making available version 13.12.0-pre3 with more development work for the next
major release. The latest versions (and earlier versions) are available from:

* Changes in Slurm 2.6.3
  -- Add support for some new #PBS options in sbatch scripts and qsub wrapper:
     -l accelerator=true|false	(GPU use)
     -l mpiprocs=#	(processors per node)
     -l naccelerators=#	(GPU count)
     -l select=#		(node count)
     -l ncpus=#		(task count)
     -v key=value	(environment variable)
     -W depend=opts	(job dependencies, including "on" and "before" options)
     -W umask=#		(set job's umask)
  -- Added qalter and qrerun commands to torque package.
  -- Corrections to qstat logic: job CPU count and partition time format.
  -- Add job_submit/pbs plugin to translate PBS job dependency options to the
     extent possible (no support for PBS "before" options) and set some PBS
     environment variables.
  -- Add spank/pbs plugin to set a bunch of PBS environment variables.
  -- Backported sh5util from master to 2.6 as there are some important
     bugfixes and the new item extraction feature.
  -- select/cons_res - Correct MaxCPUsPerNode partition constraint for
  -- scontrol - for setdebugflags command, avoid parsing "-flagname" as an
     scontrol command line option.
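With the expanded #PBS parsing, a Torque-style batch script such as the following can be submitted through sbatch or the qsub wrapper; the directive values and application name below are illustrative:

```
#!/bin/bash
# Torque-style directives parsed by sbatch's #PBS support:
#PBS -l select=4
#PBS -l mpiprocs=8
#PBS -l naccelerators=2
#PBS -v RUNMODE=test
#PBS -W umask=022
srun ./my_mpi_app
```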
(Continue reading)

Moe Jette | 20 Sep 19:19 2013

Slurm User Group Meeting, Presentations

Copies of (almost all of) the Slurm User Group Meeting presentations
are now available here:


Thanks to all of the speakers. We hope to see you again in Lugano,
Switzerland at the 2014 meeting hosted by the Swiss National
Supercomputing Centre.

Moe Jette

Moe Jette | 11 Sep 00:29 2013

Slurm versions 2.6.2 and 13.12.0-pre2 are now available

We are pleased to announce the availability of Slurm version 2.6.2  
(with various bug fixes) and 13.12.0-pre2 (with second installment of  
development for the next major release). Downloads are available from  

* Changes in Slurm 2.6.2
  -- Fix issue with reconfig and GrpCPURunMins.
  -- Fix wrong node/job state problem after reconfig.
  -- Allow users who are coordinators to update their own limits in the
     accounts they are coordinators over.
  -- BackupController - Make sure we have a connection to the DBD first thing
     to avoid it thinking we don't have a cluster name.
  -- Correct value of min_nodes returned by loading job information to consider
     the job's task count and maximum CPUs per node.
  -- If running jobacct_gather/none, fix issue with unpacking step completion.
  -- Reservation with CoreCnt: Avoid possible invalid memory reference.
  -- sjstat - Add man page when generating rpms.
  -- Make sure GrpCPURunMins is added when creating a user, account or QOS with
  -- Fix for invalid memory reference due to multiple free calls caused by
     job arrays submitted to multiple partitions.
  -- Enforce --ntasks-per-socket=1 job option when allocating by socket.
  -- Validate permissions of key directories at slurmctld startup. Report
     anything that is world writable.
  -- Improve GRES support for CPU topology. Previous logic would pick CPUs then
     reject jobs that can not match GRES to the allocated CPUs. New logic first
     filters out CPUs that can not use the GRES, next picks CPUs for the job,
     and finally picks the GRES that best match those CPUs.
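The improved GRES selection order described above (filter out CPUs no GRES can use, then pick CPUs, then pick the GRES matching those CPUs) can be sketched as follows. The data model and function name are hypothetical, not Slurm's internals:

```python
def pick_cpus_for_gres(cpus, gres_affinity, ncpus_needed):
    """Illustrative sketch of GRES-aware CPU selection:
    1. discard CPUs that no requested GRES can reach,
    2. pick CPUs for the job from what remains,
    3. choose the GRES reachable from the picked CPUs.

    cpus: list of CPU ids on the node.
    gres_affinity: dict mapping GRES id -> set of CPU ids it can use.
    """
    # Step 1: keep only CPUs usable by at least one GRES.
    usable = [c for c in cpus
              if any(c in cpuset for cpuset in gres_affinity.values())]
    if len(usable) < ncpus_needed:
        return None                        # job cannot be satisfied here
    picked_cpus = usable[:ncpus_needed]    # Step 2 (trivial pick policy)
    # Step 3: the GRES chosen are those matching the allocated CPUs.
    picked_gres = [g for g, cpuset in gres_affinity.items()
                   if any(c in cpuset for c in picked_cpus)]
    return picked_cpus, picked_gres

# Two GPUs, each tied to one socket's CPUs; CPUs 4-5 reach no GPU.
affinity = {"gpu0": {0, 1}, "gpu1": {2, 3}}
result = pick_cpus_for_gres([0, 1, 2, 3, 4, 5], affinity, 2)
```

Under the old order, CPUs 4-5 could be picked first and the job then rejected for lacking a matching GRES; filtering first avoids that failure mode.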
(Continue reading)