Moe Jette | 12 Jul 21:51 2013

Slurm User Group Meeting and Training


A preliminary agenda for the Slurm User Group Meeting is now
available and registration is open. Details are at the site below:

http://www.schedmd.com/slurmdocs/slurm_ug_agenda.html

We may also provide introductory Slurm usage and administration  
training given sufficient demand. This training would occur the day  
before the Slurm User Group Meeting and would be available at an  
additional cost. If you are interested in attending such training,
please respond directly to me (NOT the mailing list).

Moe Jette | 10 Jul 18:04 2013

Slurm version 2.6 is now available


We are pleased to announce the availability of Slurm version 2.6. Changes from
version 2.5 are extensive and highlights are listed below. Please see the
RELEASE_NOTES file in the Slurm distribution for more details. Note the Slurm
documentation at schedmd.com has been updated to version 2.6.

Download the latest version of Slurm from:
http://www.schedmd.com/#repos

Highlights of changes in Slurm version 2.6 include:
  - Added support for job arrays, which increases performance and ease of use
    for sets of similar jobs. This may necessitate changes in prolog and/or
    epilog scripts due to the change in the job ID format, which is now of the
    form "<job_id>_<index>" for job arrays.
    http://slurm.schedmd.com/job_array.html
  - Added support for job profiling to periodically capture each task's CPU
    use, memory use, power consumption, Lustre use and Infiniband network use.
    http://slurm.schedmd.com/hdf5_profile_user_guide.html
  - Added support for generic external sensor plugins which can be used to
    capture temperature and power consumption data.
    http://slurm.schedmd.com/ext_sensorsplugins.html
    http://slurm.schedmd.com/ext_sensors.conf.html
  - Added mpi/pmi2 plugin with much more scalable performance for MPI
    implementations using the PMI communications interface.
    http://slurm.schedmd.com/mpi_guide.html#mpich2
  - Added prolog and epilog support for advanced reservations.
  - Much faster throughput for job step execution with the --exclusive option.
    The srun process is notified when resources become available rather than
    polling periodically.
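Since job array elements now carry IDs of the form "<job_id>_<index>", a prolog or epilog that treated the job ID as a plain integer may need to split it first. The Python sketch below assumes the script receives the identifier as a string in that form; the helper name parse_job_id is hypothetical and not part of Slurm.

```python
from typing import Optional, Tuple


def parse_job_id(job_id: str) -> Tuple[int, Optional[int]]:
    """Split a job identifier into (job_id, array_index).

    Hypothetical helper: plain jobs such as "1234" yield (1234, None),
    job array elements such as "1234_7" yield (1234, 7).
    """
    base, sep, index = job_id.partition("_")
    if not sep:
        # No underscore: an ordinary (non-array) job ID.
        return int(base), None
    return int(base), int(index)


if __name__ == "__main__":
    for raw in ("1234", "1234_7"):
        print(raw, "->", parse_job_id(raw))
```

Using str.partition keeps the plain-integer case working unchanged, so an existing script only needs to handle the extra index when one is present.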

Moe Jette | 5 Jun 22:31 2013

Slurm versions 2.5.7 and 2.6.0-rc1 are now available


We are pleased to announce the availability of Slurm version 2.5.7  
with the bug fixes listed below, plus version 2.6.0-rc1 (release  
candidate 1) with the bug fixes and enhancements listed below. We plan  
to release version 2.6.0 after more testing. See the "RELEASE_NOTES"  
file in the distribution for a description of the major changes in  
version 2.6.

A great way to find out about Slurm development is to attend the Slurm  
User Group Meeting, September 18 - 19 in Oakland, California, USA:
http://www.schedmd.com/slurmdocs/slurm_ug_agenda.html

The Slurm distributions are available from:
http://www.schedmd.com/#repos

* Changes in Slurm 2.5.7
========================
  -- Fix linking of the select/cray plugin so it does not warn about an
     undefined variable.
  -- Add missing symbols to xlator.h.
  -- Avoid placing pending jobs in AdminHold state due to backfill scheduler
     interactions with advanced reservation.
  -- Accounting - compute averages per task rather than per CPU.
  -- CRAY - Change logging of transient ALPS errors from error() to debug().
  -- POE - Correct logic to support poe option "-euidevice sn_all" and
     "-euidevice sn_single".
  -- Accounting - Fix minor initialization error.
  -- POE - Correct logic to support srun network instances count with POE.
  -- POE - With the srun --launch-cmd option, report proper task count when
     the --cpus-per-task option is used without the --ntasks option.

Moe Jette | 24 May 18:53 2013

Slurm User Group Meeting, Call for Abstracts, Deadline extension


You are invited to submit an abstract of a presentation or tutorial to  
be given at the Slurm User Group Meeting 2013. This event is sponsored  
and organized by SchedMD and will be held in Oakland, California, USA  
on September 18 and 19, 2013.

This international event is open to everyone who wants to:

* Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
* Share their knowledge and experience with other users and administrators
* Get detailed information about the latest features and developments
* Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  
sugc@...

IMPORTANT DATES:
May 31, 2013: Abstracts due
June 21, 2013: Notification of acceptance
September 18-19, 2013: Slurm User Group Meeting 2013

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)


Moe Jette | 6 May 00:38 2013

CFA: Slurm User Group Meeting, Abstracts due May 24


This is a reminder that abstracts for the 2013 Slurm User Group
Meeting are due on 24 May. The event will be held in Oakland,  
California, USA on September 18 and 19, 2013.

This international event is open to everyone who wants to:

* Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
* Share their knowledge and experience with other users and administrators
* Get detailed information about the latest features and developments
* Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  
sugc@...

For more information, please see:
http://slurm.schedmd.com/slurm_ug_cfp.html

Important Dates:
May 24, 2013: Abstracts due
June 21, 2013: Notification of acceptance
September 18-19, 2013: Slurm User Group Meeting 2013

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)

Danny Auble | 26 Apr 02:22 2013

2.5.6 now available


We have just found a regression in 2.5.5 when using the MySQL database for
accounting along with GRES. Therefore we tagged 2.5.6 with the fix.

You can download it from http://www.schedmd.com/#repos.

This bug only exists in 2.5.5 and 2.6.0-0pre3 systems. Those running
2.6 pre-releases are advised to patch their code base or just do a pull
from GitHub.

A simple patch is available here:
https://github.com/SchedMD/slurm/commit/e5bc5e3615515ae1c023f1e7d067fa2933307467.patch.

2.5.6 also contains a fix for requeuing jobs that use GRES.

Sorry for the inconvenience.

Danny

Danny Auble | 25 Apr 02:15 2013

Slurm versions 2.5.5 and 2.6.0-pre3 are now available


Slurm versions 2.5.5 and 2.6.0-pre3 are now available.

The latest versions of Slurm are available from 
http://www.schedmd.com/#repos

* Changes in Slurm 2.5.5
========================
  -- Fix for sacctmgr add qos to handle the 'flags' option.
  -- Export SLURM_ environment variables from sbatch, even if "--export"
     option does not explicitly list them.
  -- If a node is in more than one partition, correctly count allocated CPUs.
  -- If a step requests more CPUs than possible in the specified node count of
     the job allocation, then return ESLURM_TOO_MANY_REQUESTED_CPUS rather than
     ESLURM_NODES_BUSY and retrying.
  -- CRAY - Fix SLURM_TASKS_PER_NODE to be set correctly.
  -- Accounting - more checks for strings with a possible `'` in them.
  -- sreport - Fix by adding planned down time to utilization reports.
  -- Do not report an error when sstat identifies job steps terminated during
     its execution, but log using a debug type message.
  -- Select/cons_res - Permit a node removed from a job by going down to be
     returned to service and re-used by another job.
  -- Select/cons_res - Tighter packing of job allocations on sockets.
  -- SlurmDBD - Fix to allow user root along with the slurm user to register a
     cluster.
  -- Select/cons_res - Fix for support of consecutive node option.
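The sbatch --export change above can be modeled as a simple filter. This is a minimal Python sketch of the list form of --export only (the ALL and NONE forms are not modeled), and the helper name exported_environment is hypothetical: variables named in the list pass through, and every SLURM_* variable is exported regardless.

```python
from typing import Dict, Iterable


def exported_environment(env: Dict[str, str],
                         export_list: Iterable[str]) -> Dict[str, str]:
    """Model which variables reach a batch job (hypothetical helper).

    Variables named in export_list are kept, and, following the 2.5.5
    change, any SLURM_* variable is kept even if it is not listed.
    """
    wanted = set(export_list)
    return {
        name: value
        for name, value in env.items()
        if name in wanted or name.startswith("SLURM_")
    }
```

For example, exporting only PATH from an environment that also holds SLURM_ACCOUNT and HOME would keep PATH and SLURM_ACCOUNT but drop HOME.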

Moe Jette | 3 Apr 23:06 2013

Slurm User Group Meeting, Call for Abstracts


You are invited to submit an abstract of a presentation or tutorial to  
be given at the Slurm User Group Meeting 2013. This event is sponsored  
and organized by SchedMD and will be held in Oakland, California, USA  
on September 18 and 19, 2013.

This international event is open to everyone who wants to:

* Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
* Share their knowledge and experience with other users and administrators
* Get detailed information about the latest features and developments
* Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site  
report, or tutorial about Slurm is invited to send an abstract to  
sugc@...

IMPORTANT DATES:
May 24, 2013: Abstracts due
June 21, 2013: Notification of acceptance
September 18-19, 2013: Slurm User Group Meeting 2013

Program Committee:
Yiannis Georgiou (Bull)
Matthieu Hautreux (CEA)
Morris Jette (SchedMD)
Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)


Moe Jette | 8 Mar 21:20 2013

Slurm versions 2.5.4 and 2.6.0-pre2 are now available


Slurm version 2.5.4 is now available with the bug fixes listed below,
along with version 2.6.0-pre2 with the enhancements listed below.

The latest versions of Slurm are available from www.schedmd.com/#repos.

* Changes in Slurm 2.5.4
========================
  -- Fix bug in PrologSlurmctld use that would block job steps until node
     responds.
  -- CRAY - If a partition has MinNodes=0 and a batch job doesn't request nodes,
     set the allocation to 1 node instead of 0, which would prevent the
     allocation from happening.
  -- Better debug when the database is down and using the --cluster option in
     the user commands.
  -- When asking for job states with sacct, default to 'now' instead of
     midnight of the current day.
  -- Fix for handling a test-only job or immediate job that fails while being
     built.
  -- Comment out all of the logic in the job_submit/defaults plugin. The logic
     is only an example and not meant for actual use.
  -- Eliminate configuration file 4096 character line limitation.
  -- More robust logic for tree message forwarding.
  -- BGQ - When cnodes fail with a timeout, correctly look up the parent
     midplane.
  -- Correct sinfo "%c" (node's CPU count) output value for Bluegene systems.
  -- Backfill - Responsiveness improvements for systems with large numbers of
     jobs (>5000) and using the SchedulerParameters option bf_max_job_user.
  -- slurmstepd: ensure that IO redirection openings from/to files correctly
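The bf_max_job_user option caps how many jobs per user the backfill scheduler attempts to schedule. A rough Python model, assuming a pending queue already sorted by priority and treating a value of 0 as "no limit"; backfill_candidates is a hypothetical name and this is not Slurm's scheduler code.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def backfill_candidates(queue: List[Tuple[int, str]],
                        bf_max_job_user: int) -> List[int]:
    """Return the job IDs a backfill pass would consider, in queue order.

    queue holds (job_id, user) pairs sorted by priority. A user's jobs
    beyond the first bf_max_job_user are skipped; 0 means unlimited.
    """
    considered: DefaultDict[str, int] = defaultdict(int)
    picked: List[int] = []
    for job_id, user in queue:
        if bf_max_job_user and considered[user] >= bf_max_job_user:
            continue  # this user has hit the per-user limit
        considered[user] += 1
        picked.append(job_id)
    return picked
```

Skipping a user's surplus jobs keeps one user with thousands of queued jobs from consuming the whole backfill pass, which is the responsiveness concern the changelog entry addresses.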

Moe Jette | 6 Feb 00:31 2013

Slurm version 2.5.3 is now available


Slurm version 2.5.3 is now available with the bug fixes listed below.  
Of particular note, SchedMD has been working with the Swiss National  
Supercomputing Centre to identify and fix a Slurm bug which can cause  
the slurmctld daemon to terminate with an invalid memory reference.  
This bug may have been reported by several sites in the past couple of  
weeks.

The latest versions of Slurm are available from http://www.schedmd.com/#repos

The fix for the invalid memory reference is available from
https://github.com/SchedMD/slurm/commit/ff26cc50db9e2fe2f9745a16c8c59fd3e0bd7ae8

* Changes in SLURM 2.5.3
========================
  -- Gres/gpu plugin - If no GPUs requested, set
     CUDA_VISIBLE_DEVICES=NoDevFiles. This bug was introduced in 2.5.2 for the
     case where a GPU count was configured, but without device files.
  -- task/affinity plugin - Fix bug in CPU masks for some processors.
  -- Modify sacct command to get format from SACCT_FORMAT environment variable.
  -- BGQ - Changed order of library inclusions and fixed incorrect declaration
     to compile correctly on newer compilers.
  -- Fix for not building sview if glib exists on a system but not the gtk
     libs.
  -- BGQ - Fix for handling a job cleanup on a small block if the job has long
     since left the system.
  -- Fix race condition in job dependency logic which can result in invalid
     memory reference.
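With the 2.5.3 gres/gpu fix, a job that requested no GPUs sees CUDA_VISIBLE_DEVICES=NoDevFiles. A job script could check for that value as in the Python sketch below, which assumes the variable otherwise holds comma-separated numeric device indices; allocated_gpus is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional


def allocated_gpus(env: Optional[Mapping[str, str]] = None) -> List[int]:
    """Return the GPU indices visible to this job (hypothetical helper).

    An unset or empty variable, or the placeholder "NoDevFiles", is
    treated as "no GPUs allocated".
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    if value in ("", "NoDevFiles"):
        return []
    # Assumes numeric indices such as "0,1"; other formats are not handled.
    return [int(dev) for dev in value.split(",")]
```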


Moe Jette | 31 Jan 23:33 2013

Slurm versions 2.5.2 and 2.6.0-pre1 now available


Slurm version 2.5.2 is now available with the bug fixes described
below. We have also made available a pre-release of version 2.6 (still
under development). Notable features in v2.6 include support for job
arrays and accounting for a job's energy consumption using IPMI. The
job array documentation is available here:
http://www.schedmd.com/slurmdocs/job_array.html

The latest versions of Slurm are available from:
http://www.schedmd.com/#repos

* Changes in SLURM 2.5.2
========================
  -- Fix advanced reservation recovery logic when upgrading from version 2.4.
  -- BLUEGENE - fix for QOS/Association node limits.
  -- Add missing "safe" flag from print of AccountStorageEnforce option.
  -- Fix logic to optimize GRES topology with respect to allocated CPUs.
  -- Add job_submit/all_partitions plugin to set a job's default partition
     to ALL available partitions in the cluster.
  -- Modify switch/nrt logic to permit build without libnrt.so library.
  -- Handle srun task launch failure without duplicate error messages or abort.
  -- Fix bug in QoS limits enforcement when slurmctld restarts and user not yet
     added to the QOS list.
  -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
  -- Fix for job request of multiple partitions in which some partitions lack
     nodes with required features.
  -- Permit a job to use a QOS its user does not have access to if an
     administrator manually set the job's QOS (previously the job would be
     rejected).
  -- Make more variables available to job_submit/lua plugin: slurm.MEM_PER_CPU,
     slurm.NO_VAL, etc.
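The job_submit/all_partitions entry above reduces to a single rule. A hedged Python model of that rule, not the plugin's actual C code: if a job named no partition, it is given a comma-separated list of every partition in the cluster. The function and argument names are hypothetical.

```python
from typing import List, Optional


def default_partition(requested: Optional[str],
                      all_partitions: List[str]) -> str:
    """Model the job_submit/all_partitions default (hypothetical helper).

    A job that already names a partition keeps it; otherwise it is
    assigned every partition in the cluster.
    """
    if requested:
        return requested
    return ",".join(all_partitions)
```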