jette | 18 Jul 17:41 2014

Slurm User Group Meeting 2014, Schedule and Registration


The fifth annual Slurm User Group Meeting will be held on September 23  
and 24, hosted by the Swiss National Supercomputing Centre in Lugano,  
Switzerland. The meeting will include an assortment of tutorials,  
technical presentations, and site reports. This is an excellent  
opportunity to learn more about how Slurm works and help to set future  
directions.

The schedule, registration and hotel information are now available online:
http://slurm.schedmd.com/slurm_ug_agenda.html

Thank you for your continued interest and support. We hope to see you  
in Lugano!

Sincerely,
Moe Jette
CTO, SchedMD LLC

jette | 17 Jul 00:57 2014

Slurm version 14.03.6 is now available


Slurm version 14.03.6 is now available. Version 14.03.6 includes a few
bug fixes, including one for a bug related to generic resources that
could cause the slurmctld daemon to abort.

Slurm downloads are available from
http://www.schedmd.com/#repos

Highlights of changes in Slurm version 14.03.6 include:

  -- Added examples to demonstrate the use of the sacct -T option to the man
     page (see the usage sketch below).
  -- Fix for regression in 14.03.5 with sacctmgr load when Parent has "'"
     around it.
  -- Update comments in sacctmgr dump header.
  -- Fix for possible abort on change in GRES configuration.
  -- CRAY - Fix modules file (backport from 14.11 commit 78fe86192b).
  -- Fix race condition which could result in a requeue if batch job exit and
     node registration occur at the same time.
  -- switch/nrt - Unload job tables (in addition to windows) in user space
     mode.
  -- Differentiate between two identical debug messages about purging vestigial
     job scripts.
  -- If the socket used by slurmstepd to communicate with slurmd already exists
     when slurmstepd attempts to create it (for example, left over from a
     previous requeue or crash), delete it and recreate it.
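
As a quick illustration of the sacct -T (--truncate) option mentioned in the
first item above, a hypothetical query (the time window and format fields are
illustrative):

  # Report jobs overlapping a time window, truncating the reported Start/End
  # times to that window:
  sacct -T -S 2014-07-01T00:00:00 -E 2014-07-02T00:00:00 \
        -o JobID,JobName,Start,End,State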

jette | 10 Jul 23:24 2014

Slurm versions 14.03.5 and 14.11.0-pre2 are now available


Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version  
14.03.5 includes about 40 relatively minor bug fixes and enhancements  
as described below. Version 14.11.0-pre2 is the second pre-release of  
the next major release of Slurm scheduled for November 2014. This is  
very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.5 include:
  -- If srun runs in an exclusive allocation, does not use the entire
     allocation, and CR_PACK_NODES is set, lay out tasks appropriately (see the
     configuration sketch below).
  -- Correct Shared field in job state information seen by scontrol, sview,
     etc.
  -- Print Slurm error string in scontrol update job and reset the Slurm errno
     before each call to the API.
  -- Fix task/cgroup to handle -m block:fcyclic correctly.
  -- Fix for core-based advanced reservations where the distribution of cores
     across nodes is not even.
  -- Fix issue where association maxnodes wouldn't be evaluated correctly if a
     QOS had a GrpNodes set.
  -- GRES fix with multiple files defined per line in gres.conf.
  -- When a job is requeued make sure accounting marks it as such.
  -- Print the state of requeued job as REQUEUED.
  -- If a job's partition was taken away from it, do not allow a requeue.
  -- Make sure we lock on the conf when sending slurmd's conf to the
     slurmstepd.
  -- Fix issue where sacctmgr 'load' could not gracefully handle a badly
     formatted file.
  -- sched/backfill: Correct job start time estimate with advanced  
(Continue reading)
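
For reference, CR_PACK_NODES (noted in the first item above) is a
SelectTypeParameters flag. A minimal slurm.conf sketch, assuming the
select/cons_res plugin is in use:

  # slurm.conf (sketch): pack tasks onto as few nodes as possible rather
  # than distributing them evenly across the allocated nodes
  SelectType=select/cons_res
  SelectTypeParameters=CR_Core_Memory,CR_PACK_NODES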

jette | 16 Jun 23:59 2014

Slurm versions 14.03.4 and 14.11.0-pre1 are now available


Slurm versions 14.03.4 and 14.11.0-pre1 are now available.
Version 14.03.4 includes about 40 relatively minor bug fixes and enhancements
as described below. Of particular note, there are several enhancements to
control layout of tasks across resources and significant performance
improvements for backfill scheduling.

Version 14.11.0-pre1 is the first pre-release of the next major release of
Slurm scheduled for November 2014. This is very much a work in  
progress and not
intended for production use.

Slurm downloads are available from
http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.4 include:

  -- Fix issue where QOS is not being enforced, but a partition either allows
     or denies specific QOS values.
  -- CRAY - Make switch/cray default when running on a Cray natively.
  -- CRAY - Make job_container/cncu default when running on a Cray natively.
  -- Disable job time limit change if its preemption is in progress.
  -- Correct logic to properly enforce job preemption GraceTime.
  -- Fix sinfo -R to print each down/drained node once, rather than once per
     partition.
  -- If a job has a non-responding node, retry job step creation rather than
     returning a DOWN node error.
  -- Support a SLURM_CONF path which does not have "slurm.conf" as the file
     name (see the example below).
  -- Fix issue where batch cpuset wasn't looked at correctly in
(Continue reading)
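
As an illustration of the SLURM_CONF change noted above, the configuration
file no longer has to be named "slurm.conf" (the path below is hypothetical):

  # Point Slurm commands and daemons at an alternate configuration file:
  export SLURM_CONF=/etc/slurm/cluster_test.conf
  sinfo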

jette | 27 May 23:11 2014

CFP: Slurm User Group Meeting 2014, Due 6 June


You are invited to submit an abstract of a tutorial, technical  
presentation or site report to be given at the Slurm User Group  
Meeting 2014. This event is sponsored and organized by The Swiss  
National Supercomputing Centre and will be held in Lugano, Switzerland  
on 23-24 September 2014.

This international event is open to everyone who wants to:
Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
Share their knowledge and experience with other users and administrators
Get detailed information about the latest features and developments
Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  
sugc@...

Important Dates:
6 June 2014: Abstracts due
27 June 2014: Notification of acceptance
23-24 September 2014: Slurm User Group Meeting 2014

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)

(Continue reading)

jette | 6 May 02:30 2014

Slurm version 14.03.3 is now available


We are pleased to announce that Slurm 14.03.3 is available at
http://www.schedmd.com/#repos.

* Changes in Slurm 14.03.3
==========================
  -- Fix perlapi to compile correctly with perl 5.18
  -- Correction to default batch output file name. Version 14.03.2 was using
     "slurm_<jobid>_4294967294.out" due to an error in job array logic.
  -- In slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with
     "Requires mysql-devel".
  -- Switch/nrt - On switch resource allocation failure, free the partial
     allocation.
  -- Switch/nrt - Properly track usage of CAU and RDMA resources with multiple
     tasks per compute node.
  -- Fix issue where a user requests --acctg-freq=0 and no memory limits.
  -- BGQ - Temporary fix for an issue where a job could be left on job_list
     after it finished.
  -- BGQ - Fix issue where limits were checked on midplane counts instead of
     cnode counts.
  -- BGQ - Move code to only start job on a block after limits are checked.
  -- Handle node ranges better when dealing with accounting max node limits.

Danny Auble | 2 May 23:23 2014

Slurm version 14.03.2 is now available


We are pleased to announce that Slurm 14.03.2 is available at
http://www.schedmd.com/#repos.

Please upgrade at your earliest convenience.

Here is a list of changes/fixes since 14.03.1-2.

  -- Fix race condition if PrologFlags=Alloc,NoHold is used.
  -- Cray - Make NPC only limit running other NPC jobs on shared blades instead
     of limiting non-NPC jobs.
  -- Fix for sbatch #PBS -m (mail) option parsing.
  -- Fix job dependency bug. Jobs dependent upon multiple other jobs may start
     prematurely.
  -- Set "Reason" field for all elements of a job array on short-circuited
     scheduling for job arrays.
  -- Allow -D option of salloc/srun/sbatch to specify relative path.
  -- Added SchedulerParameters option batch_sched_delay to permit many batch
     jobs to be submitted between scheduling attempts, reducing the overhead of
     the scheduling logic (see the configuration sketch below).
  -- Added job reason of "SchedTimeout" if the scheduler was not able to reach
     the job to attempt scheduling it.
  -- Add job's exit state and exit code to email message.
  -- scontrol hold/release accepts job name option (in addition to job ID).
  -- Better handle attempts to cancel a step that has not started yet.
  -- Handle Max/GrpCPU limits better.
  -- Add --priority option to salloc, sbatch and srun commands.
(Continue reading)
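
A minimal sketch of the batch_sched_delay option referenced above (the
3-second value is illustrative):

  # slurm.conf (sketch): defer scheduling of newly submitted batch jobs for
  # up to 3 seconds so a burst of submissions triggers one scheduling pass
  # instead of one pass per job
  SchedulerParameters=batch_sched_delay=3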

jette | 21 Apr 21:46 2014

CFP: Slurm User Group Meeting 2014


You are invited to submit an abstract of a tutorial, technical  
presentation or site report to be given at the Slurm User Group  
Meeting 2014. This event is sponsored and organized by The Swiss  
National Supercomputing Centre and will be held in Lugano, Switzerland  
on 23-24 September 2014.

This international event is open to everyone who wants to:
Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
Share their knowledge and experience with other users and administrators
Get detailed information about the latest features and developments
Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  
sugc@...

Important Dates:
6 June 2014: Abstracts due
27 June 2014: Notification of acceptance
23-24 September 2014: Slurm User Group Meeting 2014

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)

(Continue reading)

jette | 21 Apr 21:42 2014

Slurm version 14.03.1 is now available


Slurm version 14.03.1 is now available with four weeks' worth of bug
fixes as described below. You can download Slurm from:
http://www.schedmd.com/#repos

* Changes in Slurm 14.03.1
==========================
  -- Add support for job std_in, std_out and std_err fields in Perl API.
  -- Add "Scheduling Configuration Guide" web page.
  -- BGQ - fix check for jobinfo when it is NULL
  -- Do not check cleaning on "pending" steps.
  -- task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
  -- In the PMI implementation, do not check for duplicate keys by default.
     Set SLURM_PMI_KVS_DUP_KEYS if you want the code to check for
     duplicate keys.
  -- Add job submission time to squeue (see the example below).
  -- Permit user root to propagate resource limits higher than the hard limit
     slurmd has on that compute node (i.e. raise both current and maximum
     limits).
  -- Fix issue with license used count when doing an scontrol reconfig.
  -- Fix the PMI iterator to not report duplicated keys.
  -- Fix issue with sinfo when -o is used without the %P option.
  -- Rather than immediately invoking an execution of the scheduling logic on
     every event type that can enable the execution of a new job, queue its
     execution. This permits faster execution of some operations, such as
     modifying large counts of jobs, by executing the scheduling logic less
     frequently, but still in a timely fashion.
  -- If an environment variable is longer than MAX_ENV_STRLEN, do not set it in
     the job environment; otherwise the exec() fails.
  -- Optimize scontrol hold/release logic for job arrays.
(Continue reading)
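
As an example of the new squeue submission-time field noted above, a
hypothetical format string assuming the %V format specifier:

  # List jobs with their submission time (%V) appended to a typical format:
  squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.20V"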

Moe Jette | 26 Mar 21:37 2014

Slurm version 14.03.0 is now available


Slurm version 14.03.0 is now available. This is a major Slurm release
with many new features. See the RELEASE_NOTES and NEWS files in the
distribution for detailed descriptions of the changes, a few of which
are noted below. Upgrading from Slurm versions 2.5 or 2.6 should proceed
without loss of jobs or other state. Just be sure to upgrade the
slurmdbd first. (Upgrades from pre-releases of version 14.03 may result
in job loss.) Slurm downloads are available from
http://www.schedmd.com/#repos
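
A minimal sketch of that upgrade order, assuming RPM packages and the init
scripts shipped with Slurm (the package file name is hypothetical):

  # 1. Stop, upgrade, and restart the database daemon first
  /etc/init.d/slurmdbd stop
  rpm -Uvh slurm-slurmdbd-14.03.0-1.x86_64.rpm   # hypothetical package name
  /etc/init.d/slurmdbd start
  # 2. Then upgrade and restart slurmctld, and finally slurmd on the
  #    compute nodes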

Highlights of changes in Slurm version 14.03.0 include:
  -- Added support for native Slurm operation on Cray systems (without ALPS).
  -- Added partition configuration parameters AllowAccounts, AllowQOS,
     DenyAccounts and DenyQOS to provide greater control over use (see the
     configuration sketch below).
  -- Added the ability to perform load-based scheduling, allocating resources
     to jobs on the nodes with the largest number of idle CPUs.
  -- Added support for reserving cores on a compute node for system services
     (core specialization).
  -- Add mechanism for the job_submit plugin to generate an error message for
     srun, salloc or sbatch on stderr.
  -- Support for the Postgres database has long been out of date and
     problematic, so it has been removed entirely. If you would like to use it,
     the code still exists in versions <= 2.6, but it will not be included in
     this or future versions of the code.
  -- Added new structures and support for both server and cluster resources.
  -- Significant performance improvements, especially with respect to job
     array support.

(Continue reading)
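
A minimal slurm.conf sketch of the new partition access parameters mentioned
above (partition, node, account, and QOS names are hypothetical):

  # Restrict the "debug" partition to two accounts, and keep standby-QOS
  # jobs out of the "batch" partition
  PartitionName=debug Nodes=node[01-04] AllowAccounts=physics,chemistry
  PartitionName=batch Nodes=node[05-64] DenyQOS=standby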

Danny Auble | 14 Mar 23:14 2014

Slurm versions 2.6.7 and 14.03.0-rc1 are now available


We are pleased to announce the availability of Slurm version 2.6.7, plus 
version 14.03.0-rc1 (release candidate 1). We plan to release version 
14.03.0 by the end of the month. See the "RELEASE_NOTES" file in the 
distribution for a description of the major changes in version 14.03.

This will most likely be the last 2.6 release.  The 14.03 code has been
frozen for development and will only receive bug fixes from here on out.
Thanks to all who have contributed to the effort!

The Slurm distributions are available from:
http://www.schedmd.com/#repos

Bug fixes and enhancements in these 2 versions are listed below...

* Changes in Slurm 2.6.7
========================
  -- Properly enforce a job's cpus-per-task option when a job's allocation is
     constrained on some nodes by the mem-per-cpu option.
  -- Correct the slurm.conf man pages and checkpoint_blcr.html page to describe
     that jobs must be drained from the cluster before deploying any checkpoint
     plugin. Corrected in version 14.03.
  -- Fix issue where, if munge was in use but not running and a slurmd needed
     to forward a message, the slurmd would core dump.
  -- Update srun.1 man page documenting the PMI2 support.
  -- Fix slurmctld core dump when a job gets its QOS updated but there
     is no corresponding association.
  -- If a job requires specific nodes and can not run due to those nodes 
being
(Continue reading)

