pbstop 5.01 update
Dennis McRitchie <dmcr <at> princeton.edu>
2015-08-24 16:06:08 GMT
I have created a significantly upgraded version of pbstop that is based on
v4.16 and includes the array job support changes contributed by Gareth
Williams, as well as the other enhancements and bug fixes listed below.
Notably, it includes support for Torque 4+, improved build for perl-PBS, and
also supports the SLURM resource manager. This combined build is called
schedtop 5.01, and it was developed and tested on a number of Princeton
University clusters running a Springdale 6 environment (i.e., a RedHat 6
distribution built at Princeton University).
This release is available from our Subversion repository at
https://svn.princeton.edu/schedtop, and can be downloaded with: svn export
The release is provided in the form of a schedtop source RPM and a schedtop
tarball. Binary RPMs for pbstop and perl-PBS can be built by typing:
> rpmbuild --rebuild --with pbs schedtop-5.01-1.sdl6.src.rpm
(a "--with slurm" option exists to build a binary slurmtop rpm)
pbstop can also be built using the attached tarball by extracting the files
and following the directions in the README file. Note that the appropriate
scheduler's client libraries must be installed.
The changes since v4.16 of pbstop are as follows:
1) Added SLURM support, including support for subwindow node-level (offline,
restore) and job-level (delete, hold, release, rerun) commands.
2) Refactored code base to share common pbstop/slurmtop code in new
schedtop.pm module. The slurmtop and pbstop scripts contain the
scheduler-specific code. The same division was applied to the POD
documentation, with the schedtop.pm, pbstop, and slurmtop documentation split
into several POD files that are assembled at build time.
3) Ported array job support, including array job compression (array job
support courtesy of Gareth Williams at CSIRO).
4) Added array job support enhancements:
4a) Display total number of allocated cores for compressed array jobs.
4b) Array jobs are now supported by pbstop when command-line utilities are
used as the backend.
4c) Fixed sort order problem with array job index zero.
5) Fixed perl-PBS build script to support Torque 3 and 4, including
circumventing lack of swig support for Torque 3 and 4's pbs_error.h file.
6) Fixed parsing of the 'jobs' output of 'pbsnodes -a' (and its equivalent
perl-PBS API call) under Torque 4, which outputs a ','-delimited list
instead of a ', '-delimited list, causing pbstop to see only the first job
running on any given node. The fix supports both Torque 3 and 4. (pbstop)
7) Re-implemented secondary "timeshare" grid to support servers with a very
large number of cpus per node (i.e., nodes or servers, such as SGI UV, that
require multiple terminal lines to display all their cpus).
8) Major auto-configuration enhancements added for many cluster types and
8a) Unless explicitly set, show_cpu and maxnodegrid are automatically set
to display all cluster nodes in the primary grid, except for those that
cannot be displayed on a single terminal line.
8b) All nodes/servers whose cpu display will not fit on one terminal line
are automatically assigned to be displayed in their own secondary grid.
8c) Either show_cpu or maxnodegrid can be explicitly set in order to force
larger nodes into the secondary grid.
9) Autocolumns support was improved to better assign nodes to fit terminal
10) Added support for "-f" command-line option and "f" interactive command,
which toggle filling the background with black.
11) Compact display ("no space", -n) was improved; a new interactive "N"
toggle was also added.
12) New interactive "L" toggle for limiting job view to a specific queue was
added.
13) New interactive "m" command to specify primary grid's max per-node CPU
count was added. Can be reduced from the default to force larger CPU nodes
into their own secondary grid.
14) Brought all POD (man page) documentation up-to-date, including new
documentation for subwindow commands to offline and restore nodes, and
delete, hold, release, and re-run jobs.
15) Updated -h menu and interactive help screen to match man pages.
16) Better support for mixed busy/free node grid display: a node's cpu status
(busy/free) in the grid is now shown with cpu-level rather than node-level
granularity; if job display is disabled or user-specific job filtering is in
effect, nodes with 'free' status show cpu status accurately.
17) Helpful warning displayed if $maxrows is set too small to display all
jobs (1500 default). Instructions for correcting value are provided in the
18) $maxcols changed to default to 300 (from 250) to accommodate wider
19) Grid legend moved directly under the grids for better visibility.
20) Window and subwindow formatting improvements.
21) Display expected run delay for queued jobs as negative elapsed time
22) Highlight recently completed jobs (slurmtop)
23) Fixed bug with 0-9 CPU number toggle in primary grid, which broke CPU
numbers > 9. This early feature is now deprecated, since it was not designed
for nodes with 10 or more CPUs.
24) Display USC copyright for pbstop only.
25) Miscellaneous bug fixes.
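For anyone curious about the delimiter problem in item 6, here is a minimal sketch of the idea. It is written in Python for illustration only and is not the actual pbstop (Perl) code. The function name split_jobs and the sample job strings are hypothetical, made up for this example. The sketch splits the 'jobs' value on a comma with optional surrounding whitespace, so both the ', '-delimited and the ','-delimited forms yield the same list:

```python
import re

def split_jobs(jobs_field):
    """Split a 'jobs' value on commas, tolerating optional whitespace.

    Hypothetical helper for illustration; not taken from pbstop.
    """
    # Accept both ", " and "," as the delimiter; drop empty entries so an
    # empty field yields an empty list.
    return [job for job in re.split(r"\s*,\s*", jobs_field.strip()) if job]

# Made-up sample values: the first uses ", " as the delimiter, the second ",".
with_space = "0/1234.head, 1/1235.head"
without_space = "0/1234.head,1/1235.head"

print(split_jobs(with_space))
print(split_jobs(without_space))
```

Splitting on a strict ', ' would return the whole second string as a single entry, which matches the symptom described in item 6, where only the first job on a node was seen.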
I hope you will find this new version helpful.