Moe Jette | 2 Nov 2012 18:15

Slurm version 2.4.4 is now available


Slurm version 2.4.4 is now available from

http://www.schedmd.com/#repos

Changes since version 2.4.3 are almost all bug fixes for IBM  
Bluegene/Q systems as listed below.

We plan to tag version 2.5.0-rc1 (release candidate 1) next week, just
before the SC12 conference. For those of you attending SC12, there is
a Slurm BOF on Thursday at 12:15 and there will be a Slurm booth on the
exhibit floor.

* Changes in SLURM 2.4.4
========================
  -- BGQ - minor fix to make build work in emulated mode.
  -- BGQ - Fix if large block goes into error and the next highest priority
     jobs are planning on using the block.  Previously it would fail those jobs
     erroneously.
  -- BGQ - Fix issue when a cnode goes into an error (not SoftwareError) state
     with a job running or trying to run on it.
  -- Execute slurm_spank_job_epilog when there is no system Epilog configured.
  -- Fix for srun --test-only to work correctly with timelimits
  -- BGQ - If a job goes away while still trying to free it up in the
     database, and the job is running on a small block make sure we free up
     the correct node count.
  -- BGQ - Logic added to make sure a job has finished on a block before it is
     purged from the system if its front-end node goes down.
  -- Modify strigger so that a filter option of "--user=0" is supported.

Danny Auble | 26 Sep 2012 18:39

SUG Meeting 2012


Just in case you aren't on the dev list...

The Slurm User Group Meeting is being hosted by the Barcelona Supercomputing
Center on 9 and 10 October.

http://www.bsc.es/marenostrum-support-services/hpc-events-trainings/others/slurm-user-group-meeting-2012

We are pleased to announce the keynotes:

9 October:

"The OmpSs programming model and its links to resource managers" by
Jesús Labarta, Director of the Computer Sciences Department at Barcelona
Supercomputing Center.

10 October:

"Challenges in Evaluating Parallel Job Schedulers" by Dror Feitelson,
head of the Experimental System Labs at Hebrew University.

If you plan to come, do not forget to register using the link above.

See you there.

Moe Jette | 19 Sep 2012 00:00

SLURM versions 2.4.3 and 2.5.0-pre3 now available


We are pleased to announce the availability of SLURM version 2.4.3 with a
sizable number of bug fixes, primarily for IBM Bluegene systems. A full list
of changes is shown below.

We have also made available version 2.5.0-pre3, a pre-release of the version
2.5 code, which is still under development. Of particular note, this version
of SLURM supports the IBM Parallel Environment (PE) including POE and IBM's
NRT switch interface. We are nearing the end of development for version 2.5
and will soon move into a testing phase before release, planned for November.
Changes in version 2.5.0-pre3 are also shown below.

The files are available for download from:
http://www.schedmd.com/#repos

* Changes in SLURM 2.4.3
========================
  -- Accounting - Fix so complete 32 bit numbers can be put in for a priority.
  -- cgroups - fix so that if the initial directory does not exist SLURM
     creates it correctly.  Previously the errno wasn't being checked correctly.
  -- BGQ - fixed srun when only requesting a task count and not a node count
     to operate the same way salloc or sbatch did and assign a task per cpu
     by default instead of task per node.
  -- Fix salloc --gid to work correctly.  Reported by Brian Gilmer
  -- BGQ - fix smap to set the correct default MloaderImage
  -- BLUEGENE - updated documentation.
  -- Close the batch job's environment file when it contains no data to avoid
     leaking file descriptors.
  -- Fix sbcast's credential to last till the end of a job instead of the
     previous 20 minute time limit.  The previous behavior would fail for

Moe Jette | 1 Aug 2012 21:01

SLURM version 2.4.2 is now available


SLURM version 2.4.2 is now available from
http://www.schedmd.com/#repos

It includes many bug fixes, most of which are IBM BlueGene specific.

* Changes in SLURM 2.4.2
========================
  -- BLUEGENE - Correct potential deadlock issue when hardware goes bad and
     there are jobs running on that hardware.
  -- If job is submitted to more than one partition, its partition pointer can
     be set to an invalid value. This can result in the count of CPUs allocated
     on a node being bad, resulting in over- or under-allocation of its CPUs.
     Patch by Carles Fenoy, BSC.
  -- Fix bug in task layout with select/cons_res plugin and --ntasks-per-node
     option. Patch by Martin Perry, Bull.
  -- BLUEGENE - remove race condition where if a block is removed while waiting
     for a job to finish on it the number of unused cpus wasn't updated
     correctly.
  -- BGQ - make sure we have a valid block when creating or finishing a step
     allocation.
  -- BLUEGENE - If a large block (> 1 midplane) is in error and underlying
     hardware is marked bad remove the larger block and create a block over
     just the bad hardware making the other hardware available to run on.
  -- BLUEGENE - Handle job completion correctly if an admin removes a block
     where other blocks on an overlapping midplane are running jobs.
  -- BLUEGENE - correctly remove running jobs when freeing a block.
  -- BGQ - correct logic to place multiple (< 1 midplane) steps inside a
     multi midplane block allocation.
  -- BGQ - Make it possible for a multi midplane allocation to run on more

Danny Auble | 2 Jul 2012 19:29

SLURM version 2.4.0 released


It has come to our attention that a bug in 2.4.0 results in job loss when
upgrading from 2.3.* to 2.4.0.

2.4.1 has fixed this problem.  This is the only patch in 2.4.1 from 2.4.0.

2.4.1 will preserve job state from 2.4.0 as well as state from 2.1+.

Sorry for the inconvenience, and thanks to Carles Fenoy for bringing the
issue to our attention.

You may download it at http://www.schedmd.com/#repos.  To avoid future
job loss we have taken 2.4.0 away from download.  If you need it for
historic purposes please feel free to download the tag from GitHub.

Change log below.

Danny

* Changes in SLURM 2.4.1
========================
  -- Fix bug for job state change from 2.3 -> 2.4.  Job state can now be
     preserved correctly when transitioning.  This also applies for
     2.4.0 -> 2.4.1, no state will be lost. (Thanks to Carles Fenoy)

Danny Auble | 29 Jun 2012 00:20

SLURM versions 2.4.0 and 2.5.0-pre1 are now available

We are pleased to announce the formal 2.4.0 release, along with a first development release of 2.5.

Both are available now for download at http://www.schedmd.com/#repos.

If you are developing new code, please code against the master git repo https://github.com/SchedMD/slurm, which is constantly updated, so as to avoid as many conflicts as possible.

Note to BGQ early adopters:  Recently there have been a few changes that require the runjob_mux to run as your SLURM user.  The plugin_flags must also be updated to avoid a possible runjob_mux crash if you are starting a job and decide to turn off the slurmctld at the same time.  Please read the updated bluegene web page http://schedmd.com/slurmdocs/bluegene.html and look for "System Administration for BlueGene/Q only" for full instructions.

Thanks for all your help and support.  Among other things, 2.4 brings substantial performance enhancements and many other improvements, many of which can be found in the RELEASE_NOTES file in the code.

As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

Below are changes for 2.4.0 and 2.5.0-pre1 since the last tag.

* Changes in SLURM 2.4.0
========================
 -- Cray - Improve support for zero compute node resource allocations.
    Partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
    Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
    compiling with --enable-debug
 -- Modify scontrol to require "-dd" option to report batch job's script. Patch
    from Don Albert, Bull.
 -- Modify SchedulerParameters option to match documentation: "bf_res="
    changed to "bf_resolution=". Patch from Rod Schultz, Bull.
 -- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
 -- In etc/init.d/slurm move check for scontrol after sourcing
    /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
 -- Fix in scheduling logic that can delay jobs with min/max node counts.
 -- BGQ - fix issue where if a step uses the entire allocation and then
    the next step in the allocation only uses part of the allocation it gets
    the correct cnodes.
 -- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1 previous
    function didn't always work correctly.
 -- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
    to make a larger small block and are running with sub-blocks.
 -- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
 -- BGQ - When using an old IBM driver cnodes that go into error because of
    a job kill timeout aren't always reported to the system.  This is now
    handled by the runjob_mux plugin.
 -- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
 -- Improve memory consumption on step layouts with high task count.
 -- BGQ - quieter debug when the real time server comes back but there are
    still messages we find when we poll but haven't given it back to the real
    time yet.
 -- BGQ - fix for if a request comes in smaller than the smallest block and
    we must use a small block instead of a shared midplane block.
 -- Fix issues on large jobs (>64k tasks) to have the correct counter type when
    packing the step layout structure.
 -- BGQ - fix issue where if a user was asking for tasks and ntasks-per-node
    but not node count the node count is correctly figured out.
 -- Move logic to always use the 1st alphanumeric node as the batch host for
    batch jobs.
 -- BLUEGENE - fix race condition where if a nodeboard/card goes down at the
    same time a block is destroyed and that block just happens to be the
    smallest overlapping block over the bad hardware.
 -- Fix bug when querying accounting looking for a job node size.
 -- BLUEGENE - fix possible race condition if cleaning up a block and the
    removal of the job on the block failed.
 -- BLUEGENE - fix issue if a cable was in an error state make it so we can
    check if a block is still makable if the cable wasn't in error.
 -- Put node names in alphabetic order in node table.
 -- If preempted job should have a grace time and preempt mode is not cancel
    but job is going to be canceled because it is interactive or other reason
    it now receives the grace time.
 -- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
    in order for the runjob_mux to run correctly.
 -- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.
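
The parse_uint32/16 item above is about rejecting input that contains a non-digit character. A minimal Python sketch of that idea follows. It is not Slurm's C implementation: the function name `parse_uint` and its `bits` parameter are hypothetical and exist only for illustration.

```python
def parse_uint(text: str, bits: int = 32) -> int:
    """Parse a decimal unsigned integer, rejecting any non-digit character.

    Hypothetical helper for illustration; not Slurm's parse_uint32/16.
    """
    # Require at least one character, and only ASCII digits 0-9.
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError(f"not an unsigned integer: {text!r}")
    value = int(text)
    # Enforce the width of the target type (for example 16 or 32 bits).
    if value >= 1 << bits:
        raise ValueError(f"value out of range for uint{bits}: {text!r}")
    return value
```

With this sketch, `parse_uint("42")` returns 42, while `parse_uint("12abc")` raises `ValueError` instead of silently accepting the leading digits.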

* Changes in SLURM 2.5.0.pre1
=============================
 -- Add new output to "scontrol show configuration" of LicensesUsed. Output is
    "name:used/total"
 -- Changed jobacct_gather plugin infrastructure to be cleaner and easier to
    maintain.
 -- Change license option count separator from "*" to ":" for consistency with
    the gres option (e.g. "--licenses=foo:2 --gres=gpu:2"). The "*" will still
    be accepted, but is no longer documented.
 -- Permit more than 100 jobs to be scheduled per node (new limit is 10,000
    jobs).
 -- Restructure of srun code to allow outside programs to utilize existing
    logic.
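
The license separator item above says a count is now written as "foo:2", with the older "foo*2" form still accepted. A minimal Python sketch of a parser that accepts both forms is shown here. The function `parse_license` is hypothetical and not part of Slurm, and treating a missing count as 1 is an assumption of this sketch.

```python
from typing import Tuple


def parse_license(spec: str) -> Tuple[str, int]:
    """Split one license spec into (name, count).

    Hypothetical helper: accepts the new ":" separator and the older "*"
    separator. A spec with no separator is assumed to mean a count of 1.
    """
    for sep in (":", "*"):
        if sep in spec:
            name, _, count = spec.partition(sep)
            return name, int(count)
    return spec, 1
```

Both `parse_license("foo:2")` and `parse_license("foo*2")` give `("foo", 2)`.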

Danny Auble | 16 May 2012 22:15

SLURM versions 2.3.5 and 2.4.0-rc1 are now available


SLURM versions 2.3.5 and 2.4.0-rc1 are now available from
http://www.schedmd.com/#repos
A description of the changes is appended.

This will most likely be the last 2.3 release unless a 2.3.6 is really 
warranted.

Development for 2.4 has been halted and only bug fixes will be applied 
from now on.  Our plans are to release an rc2 in a couple of weeks and a 
2.4.0-1 a couple of weeks after that.  Please test 2.4 and report any 
bugs to us through http://bugs.schedmd.com or through the slurm-dev list.

Future developments will be in 2.5 released later this year (planned for 
October).  We will release a 2.5.0-pre1 shortly.

* Changes in SLURM 2.3.5
========================
  -- Improve support for overlapping advanced reservations. Patch from
     Bill Brophy, Bull.
  -- Modify Makefiles for support of Debian hardening flags. Patch from
     Simon Ruderich.
  -- CRAY: Fix support for configuration with SlurmdTimeout=0 (never mark
     node that is DOWN in ALPS as DOWN in SLURM).
  -- Fixed the setting of SLURM_SUBMIT_DIR for jobs submitted by Moab
     (BZ#1467). Patch by Don Lipari, LLNL.
  -- Correction to init.d/slurmdbd exit code for status option. Patch by Bill
     Brophy, Bull.
  -- When the optional max_time is not specified for --switches=count, the site
     max (SchedulerParameters=max_switch_wait=seconds) is used for the job.
     Based on patch from Rod Schultz.
  -- Fix bug in select/cons_res plugin when used with topology/tree and a node
     range count in job allocation request.
  -- Fixed moab_2_slurmdb.pl script to correctly work for end records.
  -- Add support for new SchedulerParameters of max_depend_depth defining the
     maximum number of jobs to test for circular dependencies (i.e. job A waits
     for job B to start and job B waits for job A to start). Default value is
     10 jobs.
  -- Fix potential race condition if MinJobAge is very low (i.e. 1) and using
     slurmdbd accounting and running large amounts of jobs (>50 sec).  Job
     information could be corrupted before it had a chance to reach the DBD.
  -- Fix state restore of job limit set from admin value for min_cpus.
  -- Fix clearing of limit values if an admin removes the limit for max cpus
     and time limit where it was previously set by an admin.
  -- Fix issue where log message is more than 256 chars and then has a format.
  -- Fix sched/wiki2 to support job account name, gres, partition name, wckey,
     or working directory that contains "#" (a job record separator). Also fix
     for wckey or working directory that contains a double quote '\"'.
  -- CRAY - fix for handling memory requests from user for an allocation.
  -- Add support for switches parameter to the job_submit/lua plugin. Work by
     Par Andersson, NSC.
  -- Fix to job preemption logic to preempt multiple jobs at the same time.
  -- Fix minor issue where uid and gid were switched in sview for submitting
     batch jobs.
  -- Fix possible illegal memory reference in slurmctld for job step with
     relative option. Work by Matthieu Hautreux (CEA).
  -- Reset priority of system held jobs when dependency is satisfied. Work by
     Don Lipari, LLNL.
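
The max_depend_depth item above bounds how many jobs are tested when looking for circular dependencies. A minimal Python sketch of such a bounded check is shown here. It assumes dependencies are given as a dict mapping a job ID to the list of job IDs it waits for; that data layout and the function `has_circular_dependency` are assumptions of this sketch, not slurmctld's implementation.

```python
from typing import Dict, List


def has_circular_dependency(job: int,
                            depends_on: Dict[int, List[int]],
                            max_depth: int = 10) -> bool:
    """Return True if following dependencies from `job` leads back to `job`.

    At most `max_depth` jobs (counting `job` itself) are examined, so a
    cycle longer than that limit goes undetected by design.
    """
    to_visit = [job]
    seen = set()
    tested = 0
    while to_visit and tested < max_depth:
        current = to_visit.pop()
        if current in seen:
            continue
        seen.add(current)
        tested += 1
        deps = depends_on.get(current, [])
        if job in deps:
            # Some job reachable from `job` waits for `job`: a cycle.
            return True
        to_visit.extend(deps)
    return False
```

For the case described in the changelog, where job A waits for job B and job B waits for job A, `has_circular_dependency(1, {1: [2], 2: [1]})` returns True. The depth limit trades completeness for a bounded amount of work per check.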

* Changes in SLURM 2.4.0.rc1
=============================
  -- Improve task binding logic by making fuller use of HWLOC library,
     especially with respect to Opteron 6000 series processors. Work contributed
     by Komoto Masahiro.
  -- Add new configuration parameter PriorityFlags, based upon work by
     Carles Fenoy (Barcelona Supercomputer Center).
  -- Modify the step completion RPC between slurmd and slurmstepd in order to
     eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
  -- Change the owner of slurmctld and slurmdbd log files to the appropriate
     user. Without this change the files will be created by and owned by the
     user starting the daemons (likely user root).
  -- Reorganize the slurmstepd logic in order to better support NFS and
     Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
  -- Fix bug in allocating GRES that are associated with specific CPUs. In some
     cases the code allocated first available GRES to job instead of allocating
     GRES accessible to the specific CPUs allocated to the job.
  -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit}
     and job epilog/prolog: slurm_spank_job_{prolog,epilog}
  -- spank: Add spank_option_getopt() function to api
  -- Change resolution of switch wait time from minutes to seconds.
  -- Added GrpCPUMins to the output of sshare -l for those using hard limit
     accounting.  Work contributed by Mark Nelson.
  -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring
     additional resources for newly launched tasks. Contributed by Hongjia Cao,
     NUDT.
  -- BGQ - fixed issue where if a user asked for a specific node count and more
     tasks than possible without overcommit the request would be allowed on more
     nodes than requested.
  -- Add support for new SchedulerParameters of bf_max_job_user, maximum number
     of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik,
     University of Oslo.
  -- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited
     larger than midplane jobs.
  -- Added cpu_run_min to the output of sshare --long.  Work contributed by
     Mark Nelson.
  -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
  -- Add sinfo output format option of "%R" for partition name without "*"
     appended for default partition.
  -- Cray - Add support for zero compute node resource allocation to run batch
     script on front-end node with no ALPS reservation. Useful for pre- or
     post-processing.
  -- Support for cyclic distribution of cpus in task/cgroup plugin from Martin
     Perry, Bull.
  -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge
     Mevik, University of Oslo.
  -- Various performance improvements for up to 500% higher throughput depending
     upon configuration. Work supported by the Oak Ridge National Laboratory
     Extreme Scale Systems Center.
  -- Added jobacct_gather/cgroup plugin.  It is not advised to use this in
     production as it isn't currently complete and doesn't provide an equivalent
     substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.

Moe Jette | 30 Apr 2012 18:56

SLURM User Group Meeting


The deadline for submitting proposals for SLURM tutorials and  
technical presentations is May 15. This year's meeting will be hosted  
by the Barcelona Supercomputing Center on 9 and 10 October 2012.  
Details are available here:
http://www.bsc.es/marenostrum-support-services/hpc-events-trainings/others/slurm-user-group-meeting-2012

Moe Jette | 20 Mar 2012 00:00

SLURM versions 2.3.4 and 2.4.0-pre4 are now available


SLURM versions 2.3.4 and 2.4.0-pre4 are now available from  
http://www.schedmd.com/#repos
A description of the changes is appended.

* Changes in SLURM 2.3.4
========================
  -- Set DEFAULT flag in partition structure when slurmctld reads the
     configuration file. Patch from Rémi Palancher.
  -- Fix for possible deadlock in accounting logic: Avoid calling
     jobacct_gather_g_getinfo() until there is data to read from the socket.
  -- Fix typo in accounting when using reservations. Patch from Alejandro
     Lucero Palau.
  -- Fix to the multifactor priority plugin to calculate effective usage earlier
     to give a correct priority on the first decay cycle after a restart of the
     slurmctld. Patch from Martin Perry, Bull.
  -- Permit user root to run a job step for any job as any user. Patch from
     Didier Gazen, Laboratoire d'Aerologie.
  -- BLUEGENE - fix for not allowing jobs if all midplanes are drained and all
     blocks are in an error state.
  -- Avoid slurmctld abort due to bad pointer when setting an advanced
     reservation MAINT flag if it contains no nodes (only licenses).
  -- Fix bug when requeued batch job is scheduled to run on a different node
     zero, but attempts job launch on old node zero.
  -- Fix bug in step task distribution when nodes are not configured in numeric
     order. Patch from Hongjia Cao, NUDT.
  -- Fix for srun allocating running within existing allocation with --exclude
     option and --nnodes count small enough to remove more nodes. Patch from
     Phil Eckert, LLNL.
  -- Work around to handle certain combinations of glibc/kernel
     (i.e. glibc-2.14/Linux-3.1) to correctly open the pty of the slurmstepd
     as the job user. Patch from Mark Grondona, LLNL.
  -- Modify linking to include "-ldl" only when needed. Patch from Aleksej
     Saushev.
  -- Fix smap regression to display nodes that are drained or down correctly.
  -- Several bug fixes and performance improvements related to batch
     scripts containing very large numbers of arguments. Patches from Par
     Andersson, NSC.
  -- Fixed extremely hard to reproduce threading issue in assoc_mgr.
  -- Correct "scontrol show daemons" output if there is more than one
     ControlMachine configured.
  -- Add node read lock where needed in slurmctld/agent code.
  -- Added test for LUA library named "liblua5.1.so.0" in addition to
     "liblua5.1.so" as needed by Debian. Patch by Remi Palancher.
  -- Added partition default_time field to job_submit LUA plugin. Patch by
     Remi Palancher.
  -- Fix bug in cray/srun wrapper stdin/out/err file handling.
  -- In cray/srun wrapper, only include aprun "-q" option when srun "--quiet"
     option is used.
  -- BLUEGENE - fix issue where if a small block was in error it could hold up
     the queue when trying to place a larger than midplane job.
  -- CRAY - ignore all interactive nodes and jobs on interactive nodes.
  -- Add new job state reason of "FrontEndDown" which applies only to Cray and
     IBM BlueGene systems.
  -- Cray - Enable configure option of "--enable-salloc-background" to permit
     the srun and salloc commands to be executed in the background. This does
     NOT remove the ALPS limitation that only one job reservation can be created
     for each Linux session ID.
  -- Cray - For srun wrapper when creating a job allocation, set the default job
     name to the executable file's name.
  -- Add support for Cray ALPS 5.0.0
  -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
     mark front end node down.
  -- FRONTEND - don't down a front end node if you have an epilog error.
  -- Cray - fix for if a frontend slurmd was started after the slurmctld had
     already pinged it on startup the unresponding flag would be removed from
     the frontend node.
  -- Cray - Fix issue on smap not displaying grid correctly.
  -- Fixed minor memory leak in sview.

* Changes in SLURM 2.4.0-pre4
=============================
  -- Add logic to cache GPU file information (bitmap index mapping to device
     file number) in the slurmd daemon and transfer that information to the
     slurmstepd whenever a job step is initiated. This is needed to set the
     appropriate CUDA_VISIBLE_DEVICES environment variable value when the
     devices are not in strict numeric order (e.g. some GPUs are skipped).
     Based upon work by Nicolas Bigaouette.
  -- BGQ - Remove ability to make a sub-block with a geometry with one or more
     of its dimensions of length 3.  There is a limitation in the IBM I/O
     subsystem that is problematic with multiple sub-blocks with a dimension
     of length 3, so we disallow their creation.  This means that if you ask
     the system for an allocation of 12 c-nodes you will be given 16.  If this
     is ever fixed in BGQ you can remove this patch.
  -- BLUEGENE - Better handling of blocks that go into error state or deallocate
     while jobs are running on them.
  -- BGQ - fix for handling mix of steps running at same time some of which
     are full allocation jobs, and others that are smaller.
  -- BGQ - fix for core dump after running multiple sub-block jobs on static
     blocks.
  -- BGQ - fixed sync issue where if a job finishes in SLURM but not in mmcs
     for a long time after the SLURM job has been flushed from the system
     we don't have to worry about rebooting the block to sync the system.
  -- BGQ - In scontrol/sview node counts are now displayed with
     CnodeCount/CnodeErrCount so to point out there are cnodes in an error state
     on the block.  Draining the block and having it reboot when all jobs are
     gone will clear up the cnodes in Software Failure.
  -- Change default SchedulerParameters max_switch_wait field value from 60 to
     300 seconds.
  -- BGQ - catch errors from the kill option of the runjob client.
  -- BLUEGENE - make it so the epilog runs until slurmctld tells it the job is
     gone.  Previously it had a timelimit which has proven to not be the right
     thing.
  -- FRONTEND - fix issue where if a compute node was in a down state and
     an admin updates the node to idle/resume the compute nodes will go
     instantly to idle instead of idle* which means no response.
  -- Fix regression in 2.4.0.pre3 where number of submitted jobs limit wasn't
     being honored for QOS.
  -- Cray - Enable logging of BASIL communications with environment variables.
     Set XML_LOG to enable logging. Set XML_LOG_LOC to specify path to log file
     or "SLURM" to write to SlurmctldLogFile or unset for "slurm_basil_xml.log".
     Patch from Steve Tronfinoff, CSCS.
  -- FRONTEND - if a front end unexpectedly reboots kill all jobs but don't
     mark front end node down.
  -- FRONTEND - don't down a front end node if you have an epilog error
  -- BLUEGENE - if a job has an epilog error don't down the midplane it was
     running on.
  -- BGQ - added new DebugFlag (NoRealTime) for only printing debug from
     state change while the realtime server is running.
  -- Fix multi-cluster mode with sview starting on a non-bluegene cluster going
     to a bluegene cluster.
  -- BLUEGENE - ability to show Rack Midplane name of midplanes in sview and
     scontrol.
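
The first item in the list above describes mapping a GPU bitmap index to a device file number so that CUDA_VISIBLE_DEVICES is right when device numbers are not contiguous. A minimal Python sketch of that mapping is shown here. The `/dev/nvidiaN` file names, the list-of-booleans bitmap, and both function names are assumptions of this sketch. The exact value slurmd and slurmstepd compute is not shown in the announcement.

```python
import re
from typing import List


def device_numbers(device_files: List[str]) -> List[int]:
    """Extract the trailing device number from each device file path.

    List position plays the role of the bitmap index in this sketch.
    """
    numbers = []
    for path in device_files:
        match = re.search(r"(\d+)$", path)
        if match is None:
            raise ValueError(f"no trailing device number in {path!r}")
        numbers.append(int(match.group(1)))
    return numbers


def cuda_visible_devices(device_files: List[str],
                         allocated: List[bool]) -> str:
    """Build a comma-separated device list from an allocation bitmap."""
    numbers = device_numbers(device_files)
    return ",".join(str(num) for num, used in zip(numbers, allocated) if used)
```

With device files `/dev/nvidia0`, `/dev/nvidia2` and `/dev/nvidia3` (device 1 skipped) and bitmap entries 1 and 2 allocated, the sketch yields "2,3" where using the bitmap indices directly would have given "1,2".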

Herbert Jung | 16 Mar 2012 17:26

cant get the backup controller to take over

Hi All,

We have 2 servers being set up for Slurm (1 master and 1 backup controller), but for some reason we can't get failover to work and we can't figure out why.

When I stop the slurmctld daemon on the master controller, the backup Slurm server takes over and all the jobs in the queue are started via the backup controller. But when I shut down the slurmd daemon on the master controller, the backup controller won't take over.

Installed Slurm version: 2.3.3

Daemons running on the master controller:

slurm    15096     1  0 Mar15 ?        00:00:02 /usr/local/sbin/slurmctld
root     15105     1  0 Mar15 ?        00:00:00 /usr/local/sbin/slurmd
munge     2286     1  0 Mar08 ?        00:00:26 /usr/sbin/munged

Daemons running on the backup controller:

slurm      313     1  0 Mar14 ?        00:00:00 /usr/local/sbin/slurmdbd
slurm    13192     1  0 Mar15 ?        00:00:01 /usr/local/sbin/slurmctld
root     13202     1  0 Mar15 ?        00:00:00 /usr/local/sbin/slurmd
munge    19842     1  0 Mar08 ?        00:00:26 /usr/sbin/munged
mysqld

The slurm.state folder is mounted on both servers, and user slurm has read-write permissions.

Example:

I just submitted a few jobs on the master controller:

sbatch --begin=now+20 -n4 -w "host[1,2]" -o /tmp/my.stdout /tmp/my.script
sbatch --begin=now+40 -n4 -w "host[1,2]" -o /tmp/my.stdout /tmp/my.script
.

squeue says:

  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
    281     debug my.scrip     root  PD       0:00      2 (BeginTime)
    282     debug my.scrip     root  PD       0:00      2 (BeginTime)
    283     debug my.scrip     root  PD       0:00      2 (BeginTime)
    284     debug my.scrip     root  PD       0:00      2 (BeginTime)
    285     debug my.scrip     root  PD       0:00      2 (BeginTime)
    286     debug my.scrip     root  PD       0:00      2 (BeginTime)

Everything looks good so far.

.
.

Current squeue status:

  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
    283     debug my.scrip     root  PD       0:00      2 (BeginTime)
    284     debug my.scrip     root  PD       0:00      2 (BeginTime)
    285     debug my.scrip     root  PD       0:00      2 (BeginTime)
    286     debug my.scrip     root  PD       0:00      2 (BeginTime)

Next I am going to shut down slurmd/slurmctld on the master controller.

squeue status shortly after:

  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
    283     debug my.scrip     root  PD       0:00      2 (ReqNodeNotAvail)
    284     debug my.scrip     root  PD       0:00      2 (ReqNodeNotAvail)
    285     debug my.scrip     root  PD       0:00      2 (ReqNodeNotAvail)
    286     debug my.scrip     root  PD       0:00      2 (ReqNodeNotAvail)

I have reconfigured the slurm.conf file several times, but nothing seems to work.

Any help would be very much appreciated.

I have attached the slurmctld and slurmd logs from the master and backup controllers, as well as a snapshot of our current config file and the output of scontrol-show-config.

Thanks a lot

Herbert

Attachment (slurmctld.log-backup-controller): application/octet-stream, 26 KiB
Attachment (slurmctld.log-master-controller): application/octet-stream, 70 KiB
Attachment (slurmd.log-backup-controller): application/octet-stream, 5982 bytes
Attachment (slurmd.log-master-controller): application/octet-stream, 1301 bytes
Attachment (scontrol-show-config): application/octet-stream, 4745 bytes
Attachment (slurm-conf-master-backup): application/octet-stream, 1406 bytes

Moe Jette | 3 Feb 2012 21:32

To remain on slurm-announce mailing list


In an effort to remove defunct email addresses from our mailing list,  
please respond to lists@... with the names of the lists in the  
subject to remain on them. A subject of "slurm-dev slurm-announce"  
will keep you on both lists.

If you do not respond by 1 March, your email address will be removed  
from our mailing list(s). You can subscribe or unsubscribe from either  
list at any time here:
http://lists.schedmd.com/cgi-bin/dada/mail.cgi/list

