Michael Weiser | 2 Sep 17:18 2015
Picon

Interactive jobs bound to one core?

Hi,

a customer is running Torque 4.1.9 with Maui as scheduler on OpenSuSE
13.1 and sees the following behaviour: When requesting an interactive
shell on a node via qsub -I -l nodes=10 all processes spawned from that
shell share the same CPU.

I've found the documentation "Managing Nodes > Linux Cpuset Support " saying:

 TM tasks are constrained to a single core, thus a multi-threaded
 process could seriously suffer.

Is that what we're seeing and if so, can it be disabled or changed to
expand the cpuset to the actually allocated nodes other than recompiling
Torque without cpuset support?

Thanks,
-- 
Michael Weiser                science + computing ag
Senior Solutions Architect    Geschaeftsstelle Duesseldorf
                              Faehrstrasse 1
phone: +49 211 302 708 32     D-40221 Duesseldorf
fax:   +49 211 302 708 50     www.science-computing.de
--

-- 
Vorstandsvorsitzender/Chairman of the board of management:
Gerd-Lothar Leonhart
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Arno Steitz
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
(Continue reading)

Michael Weiser | 2 Sep 12:19 2015
Picon

Interactive jobs bound to one core?

Hi,

a customer is running Torque 4.1.9 with Maui as scheduler on OpenSuSE
13.1 and sees the following behaviour: When requesting an interactive
shell on a node via qsub -I -l nodes=10 all processes spawned from that
shell share the same CPU.

I've found the documentation "Managing Nodes > Linux Cpuset Support " saying:

 TM tasks are constrained to a single core, thus a multi-threaded
 process could seriously suffer.

Is that what we're seeing and if so, can it be disabled or changed to
expand the cpuset to the actually allocated nodes other than recompiling
Torque without cpuset support?

Thanks,
-- 
Michael Weiser                science + computing ag
Senior Solutions Architect    Geschaeftsstelle Duesseldorf
                              Faehrstrasse 1
phone: +49 211 302 708 32     D-40221 Duesseldorf
fax:   +49 211 302 708 50     www.science-computing.de
--

-- 
Vorstandsvorsitzender/Chairman of the board of management:
Gerd-Lothar Leonhart
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Dr. Arno Steitz
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
(Continue reading)

Mahmoud A. A. Ibrahim | 2 Sep 16:25 2015
Picon

Disable a particular node

Dear Torque Users
We were wondering what the proper command should be used to:
#disable running of jobs on a particular node. This is to free this node for maintenance.
#disable running of jobs on all nodes, but accept putting jobs on queue.
We run Rocks cluster 6.1.
Any kind of support will be highly appreciated
Sincerely;
M. Ibrahim

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Bas van der Vlies | 27 Aug 15:44 2015
Picon

Torque 5.1.1.2 and allow_node_submit problem

Hello,

 We are upgrading from 2.5.X to this version to install MOAB, but the first problem hit is that
allow_node_submit does not work. We get this error message 
    * qsub: submit error (Job rejected by all possible destinations (check syntax, queue resources, …))

Has somebody else hit this problem and has a solution for it or is it a bug?

regards,

---
Bas van der Vlies
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG  Amsterdam
| T +31 (0) 20 800 1300  | bas.vandervlies <at> surfsara.nl | www.surfsara.nl |

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Hongyi Zhao | 26 Aug 07:00 2015
Picon

Using torque without trqauthd.

Hi all,

Considering that the cluster of mine is located at internal network and 
is used only by several people.  Can I running torque + maui without 
using the trqauthd?

If so, how should I config torque + maui for working without using 
trqauthd?

Regards
--

-- 
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
Michel Béland | 24 Aug 20:59 2015
Picon

qstat bug

Hello everyone,

I am running Torque 5.1.1. It seems that there is a bug with qstat :

beland <at> foudre2 ~]$ qstat -a 1 2 3 4 5 6 7 8 9
Unable to communicate with egeon2(192.168.86.8)
Can not resolve name for server ����. (rc = -1 - )
Cannot resolve specified server host '����'.
Can not resolve name for server ����. (rc = -1 - )
Cannot resolve specified server host '����'.
qstat: cannot connect to server ���� (errno=15010) Access from host not 
allowed, or unknown host
qstat: Unknown Job Id Error 1.egeon2
qstat: Unknown Job Id Error 2.egeon2
qstat: Unknown Job Id Error 3.egeon2
qstat: Unknown Job Id Error 4.egeon2
qstat: Unknown Job Id Error 5.egeon2
qstat: Unknown Job Id Error 6.egeon2
qstat: Unknown Job Id Error 7.egeon2
qstat: Unknown Job Id Error 8.egeon2
qstat: Unknown Job Id Error 8.egeon2
[beland <at> foudre2 ~]$

The Unknown Job ID errors are ok (except that there are two error 
messages for job 8), but what is above is not.

Here is my qmgr configuration, if needed :

[beland <at> egeon2 ~]$ qmgr
Max open servers: 9
Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = True
set server acl_hosts = egeon2
set server acl_hosts += localhost
set server acl_hosts += foudre2
set server managers = rqchppbs <at> egeon2
set server managers += rqchppbs <at> foudre2
set server operators = rqchppbs <at> egeon2
set server operators += rqchppbs <at> foudre2
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 300
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 30
set server next_job_number = 110
set server moab_array_compatible = True
set server nppcu = 1
Qmgr:

--

-- 
Michel Béland, analyste en calcul scientifique
michel.beland <at> calculquebec.ca
bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
Calcul Québec (www.calculquebec.ca)
Calcul Canada (calculcanada.ca)

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Dennis McRitchie | 24 Aug 18:06 2015
Picon

pbstop 5.01 update

Hi all,

I have created a significantly upgraded version of pbstop that is based on
v4.16 and includes the array job support changes contributed by Gareth
Williams, as well as the other enhancements and bug fixes listed below.
Notably, it includes support for Torque 4+, improved build for perl-PBS, and
also supports the SLURM resource manager. This combined build is called
schedtop 5.01, and it was developed and tested on a number of Princeton
University clusters running a Springdale 6 environment (i.e., RedHat 6
distribution built at Princeton University).

This release is available from our Subversion repository at
https://svn.princeton.edu/schedtop, and can be downloaded with: svn export
https://svn.princeton.edu/schedtop .

The release is provided in the form of a schedtop source RPM and a schedtop
tarball. Binary RPMs for pbstop and perl-PBS can be built by typing:
> rpmbuild --rebuild --with pbs schedtop-5.01-1.sdl6.src.rpm 
(a "--with slurm" option exists to build a binary slurmtop rpm)

pbstop can also be built using the attached tarball by extracting the files
and following the directions in the README file. Note that the appropriate
scheduler's client libraries must be installed.

The changes since v4.16 of pbstop are as follows:

1) Added SLURM support, including support for subwindow node-level (offline,
restore) and job-level (delete, hold, release, rerun) commands.
2) Refactored code base to share common pbstop/slurmtop code in new
schedtop.pm module.  slurmtop and pbstop scripts contain the
scheduler-specific code.  Same division was done with the POD documentation,
with the schedtop.pm, pbstop, and slurmtop documentation split into several
POD files that are assembled at build time.
3) Ported array job support, including array job compression (array job
support courtesy of Gareth Williams at CSIRO).
4) Added array job support enhancements:
  4a) Enhanced array job support by displaying total number of allocated
cores for compressed array jobs.
  4b) Supported with pbstop when using command-line utilities as backend.
  4c) Fixed sort order problem with array job index zero.
5) Fixed perl-PBS build script to support Torque 3 and 4, including
circumventing lack of swig support for Torque 3 and 4's pbs_error.h file.
(pbstop)
6) Fixed parsing of the 'jobs' output of 'pbsnodes -a' (and its equivalent
perl-PBS API call) under Torque 4, which outputs a ','-delimited list
instead of a ', '-delimited list causing pbstop to only see the first job
running on any given node. Fix supports both Torque 3 and 4. (pbstop)
7) Re-implemented secondary "timeshare" grid to support servers with a very
large number of cpus per node (i.e., nodes or servers requiring multiple
terminal lines to display all their cpus such as SGI UV).
8) Major auto-configuration enhancements added for many cluster types and
sizes: 
  8a) Unless explicitly set, show_cpu and maxnodegrid are automatically set
to display all cluster nodes in the primary grid, except for those that
cannot be displayed on a single terminal line. 
  8b) All nodes/servers whose cpu display will not fit on one terminal line
are automatically assigned to be displayed in their own secondary grid.
  8c) Either show_cpu or maxnodegrid can be explicitly set in order to force
larger nodes into the secondary grid.
9) Autocolumns support was improved to better assign nodes to fit terminal
width.
10) Added support for "-f" command-line option and "f" interactive command:
toggle fill background with black.
11) Compact display ("no space", -n) was improved; also new interactive "N"
toggle was added.
12) New interactive "L" toggle for limiting job view to specific queue was
added.
13) New interactive "m" command to specify primary grid's max per-node CPU
count was added. Can be reduced from the default to force larger CPU nodes
into their own secondary grid.
14) Brought all POD (man page) documentation up-to-date, including new
documentation for subwindow commands to offline and restore nodes, and
delete, hold, release,  and re-run jobs.
15) Updated -h menu and interactive help screen to match man pages.
16) Better support for mixed busy/free node grid display: node's cpu status
(busy/free) in grid now shown with cpu-level rather than node-level
granularity; if job display disabled or user-specific job filtering in
effect, nodes with 'free' status show cpu status accurately.
17) Helpful warning displayed if $maxrows is set too small to display all
jobs (1500 default). Instructions for correcting value are provided in the
message.
18) $maxcols changed to default to 300 (from 250) to accommodate wider
terminals.
19) Grid legend moved directly under the grids for better visibility.
20) Window and subwindow formatting improvements.
21) Display expected run delay for queued jobs as negative elapsed time
(slurmtop)
22) Highlight recently completed jobs (slurmtop)
23) Fixed bug with 0-9 CPU number toggle in primary grid: it broke CPU
numbers > 9; deprecated this early feature: not designed for nodes with 10
or more CPUs.
24) Display USC copyright for pbstop only.
25) Miscellaneous bug fixes.

I hope you will find this new version helpful.

Dennis McRitchie
Masataro Asai | 20 Aug 17:58 2015
Picon

Memory usage and periodical death of pbs

Hi all,

I am running torque/pbs 2.4.16 on a small ubuntu-based cluster. When I
submit around >10k jobs, the server eats up over 3GB memory and the
system begins to swap. Also, pbs_sched suddenly dies and the jobs no
longer gets started. As a workaround, I made a script which watch and
restarts the scheduler daemon, but the memory/swap problem is still
unbearable.

Now the question is, is there any server configuration/attribute which
can alleviate this problem? I understand that the PBS version is largely
outdated compared to the latest. However, I'd rather want to avoid
installing the software manually (i.e. outside package systems) due to
several reasons.

Bests,

--

-- 
Masataro Asai
Lenox, Billy CTR (US | 20 Aug 16:16 2015

Re: qsub problem in more detail

Thanks for the help and reply


Billy

On 8/20/15, 9:04 AM, "torqueusers-bounces <at> supercluster.org on behalf of
Michel Béland" <torqueusers-bounces <at> supercluster.org on behalf of
michel.beland <at> calculquebec.ca> wrote:

>You have to kill pbs_sched (either with the command kill or with
>/etc/init.d/pbs_sched stop) and make sure it is not started up
>automatically on boot.
>
>You install Maui and modify maui.cfg to have something like this at the
>beginning:
>
>RMPOLLINTERVAL        00:00:30
>
>SERVERHOST        yourtorqueserver.yourdomain.mil
>SERVERPORT        42559
>SERVERMODE        NORMAL
>
>RMCFG[base]        TYPE=PBS
>
>ADMIN1                manager root # replace manager with the name of
>the account that is manager with Torque
>ADMIN3                ALL
>
>The rest of the Maui configuration is up to you, but post any question
>to mauiusers instead of torqueusers.
>
>Hope this helps,
>
>Michel Béland
>
>Le 2015-08-20 06:55, Lenox, Billy CTR (US) a écrit :
>> Sorry it is pbs_sched. What do I do to remove that and install Maui?
>>
>>
>> Billy
>>
>> On 8/19/15, 3:23 PM, "torqueusers-bounces <at> supercluster.org on behalf of
>> Michel Béland" <torqueusers-bounces <at> supercluster.org on behalf of
>> michel.beland <at> calculquebec.ca> wrote:
>>
>>> Hello,
>>>
>>> I am not sure if you told us which scheduler you are using. If you use
>>> pbs_sched, it will not work. Use Maui (free) or Moab (commercial).
>>>
>>>
>>>> I have a weird Problem when users submit jobs
>>>>
>>>> *************************************************************
>>>>
>>>> Here is my Node List
>>>> *************************************************************
>>>>
>>>> node001 np=12 default high
>>>> node002 np=12 default high
>>>> node003 np=12 default high
>>>> node004 np=12 default high
>>>> node005 np=12 group2
>>>> node006 np=12 group2
>>>> node007 np=12 group2
>>>> node008 np=12 group2
>>>>
>>>> *************************************************************
>>>>
>>>> Here is what I see as a User issuing qmgr -c Œp s¹
>>>>
>>>> *************************************************************
>>>>
>>>>
>>>> #
>>>> # Create queues and set their attributes.
>>>> #
>>>> #
>>>> # Create and define queue default
>>>> #
>>>> create queue default
>>>> set queue default queue_type = Execution
>>>> set queue default Priority = 75
>>>> set queue default max_running = 888
>>>> set queue default max_user_run = 888
>>>> set queue default enabled = True
>>>> set queue default started = True
>>>> #
>>>> # Create and define queue high
>>>> #
>>>> create queue high
>>>> set queue high queue_type = Execution
>>>> set queue high Priority = 100
>>>> set queue high max_running = 96
>>>> set queue high resources_min.nodect = 1
>>>> set queue high resources_default.nodes = 1
>>>> set queue high resources_default.walltime = 24:00:00
>>>> set queue high max_user_run = 96
>>>> set queue high enabled = True
>>>> set queue high started = True
>>>> #
>>>> # Create and define queue group2
>>>> #
>>>> create queue group2
>>>> set queue group2 queue_type = Execution
>>>> set queue group2 acl_host_enable = False
>>>> set queue group2 acl_hosts = node005+node006+node007+node008
>>>> set queue group2 acl_users = user1
>>>> set queue group2 acl_users += user2
>>>> set queue group2 resources_default.walltime = 240:00:00
>>>> set queue group2 enabled = True
>>>> set queue group2 started = True
>>>> #
>>>> # Set server attributes.
>>>> #
>>>> set server scheduling = True
>>>> set server max_running = 888
>>>> set server max_user_run = 888
>>>> set server acl_host_enable = True
>>>> set server acl_hosts = master.local
>>>> set server acl_hosts += cluster.local
>>>> set server default_queue = default
>>>> set server log_events = 511
>>>> set server mail_from = root
>>>> set server query_other_jobs = True
>>>> set server resources_default.nodect = 1
>>>> set server resources_default.nodes = 1
>>>> set server scheduler_iteration = 600
>>>> set server node_check_rate = 150
>>>> set server tcp_timeout = 6
>>>> set server allow_node_submit = True
>>>> set server next_job_number = 100
>>>> set server authorized_users = * <at> *.local
>>>>
>>>> *************************************************************
>>>> Here is what is see as root issuing qmgr -c Œp s¹
>>>>
>>>> *************************************************************
>>>>
>>>>
>>>> #
>>>> # Create queues and set their attributes.
>>>> #
>>>> #
>>>> # Create and define queue default
>>>> #
>>>> create queue default
>>>> set queue default queue_type = Execution
>>>> set queue default Priority = 75
>>>> set queue default max_running = 888
>>>> set queue default resources_default.neednodes = default
>>>><-------------
>>>> set queue default max_user_run = 888
>>>> set queue default enabled = True
>>>> set queue default started = True
>>>> #
>>>> # Create and define queue high
>>>> #
>>>> create queue high
>>>> set queue high queue_type = Execution
>>>> set queue high Priority = 100
>>>> set queue high max_running = 96
>>>> set queue high resources_min.nodect = 1
>>>> set queue high resources_default.neednodes = high  <-------------
>>>> set queue high resources_default.nodes = 1
>>>> set queue high resources_default.walltime = 24:00:00
>>>> set queue high max_user_run = 96
>>>> set queue high enabled = True
>>>> set queue high started = True
>>>> #
>>>> # Create and define queue group2
>>>> #
>>>> create queue group2
>>>> set queue group2 queue_type = Execution
>>>> set queue group2 acl_host_enable = False
>>>> set queue group2 acl_hosts = node005+node006+node007+node008
>>>> set queue group2 acl_users = user1
>>>> set queue group2 acl_users += user2
>>>> set queue group2 resources_default.neednodes = group2  <-------------
>>>> set queue group2 resources_default.walltime = 240:00:00
>>>> set queue group2 enabled = True
>>>> set queue group2 started = True
>>>> #
>>>> # Set server attributes.
>>>> #
>>>> set server scheduling = True
>>>> set server max_running = 888
>>>> set server max_user_run = 888
>>>> set server acl_host_enable = True
>>>> set server acl_hosts = master.local
>>>> set server acl_hosts += cluster.local
>>>> set server default_queue = default
>>>> set server log_events = 511
>>>> set server mail_from = root
>>>> set server query_other_jobs = True
>>>> set server resources_default.neednodes = 1  <-------------
>>>> set server resources_default.nodect = 1
>>>> set server resources_default.nodes = 1
>>>> set server scheduler_iteration = 600
>>>> set server node_check_rate = 150
>>>> set server tcp_timeout = 6
>>>> set server allow_node_submit = True
>>>> set server next_job_number = 100
>>>> set server authorized_users = * <at> *.local
>>>>
>>>>
>>>>
>>>> A user does not see the lines Marked above <‹‹‹‹‹‹
>>>>
>>>>
>>>> Users submit jobs to default and request 8 nodes even though 4 nodes
>>>>are
>>>> dedicated to default queue.
>>>> How can I stop this from grabbing nodes in group2 and running on all 8
>>>> nodes?
>>>>
>>>> Billy
>>>>
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers <at> supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers

>>>
>>> -- 
>>> Michel Béland, analyste en calcul scientifique
>>> michel.beland <at> calculquebec.ca
>>> bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
>>> téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
>>> Calcul Québec (www.calculquebec.ca)

>>> Calcul Canada (calculcanada.ca)
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers <at> supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers

>> _______________________________________________
>> torqueusers mailing list
>> torqueusers <at> supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers

>
>
>-- 
>Michel Béland, analyste en calcul scientifique
>michel.beland <at> calculquebec.ca
>bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
>téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
>Calcul Québec (www.calculquebec.ca)

>Calcul Canada (calculcanada.ca)
>
>_______________________________________________
>torqueusers mailing list
>torqueusers <at> supercluster.org
>http://www.supercluster.org/mailman/listinfo/torqueusers


_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Lenox, Billy CTR (US | 20 Aug 12:55 2015

Re: qsub problem in more detail

Sorry it is pbs_sched. What do I do to remove that and install Maui?


Billy

On 8/19/15, 3:23 PM, "torqueusers-bounces <at> supercluster.org on behalf of
Michel Béland" <torqueusers-bounces <at> supercluster.org on behalf of
michel.beland <at> calculquebec.ca> wrote:

>Hello,
>
>I am not sure if you told us which scheduler you are using. If you use
>pbs_sched, it will not work. Use Maui (free) or Moab (commercial).
>
>
>> I have a weird Problem when users submit jobs
>>
>> *************************************************************
>>
>> Here is my Node List
>> *************************************************************
>>
>> node001 np=12 default high
>> node002 np=12 default high
>> node003 np=12 default high
>> node004 np=12 default high
>> node005 np=12 group2
>> node006 np=12 group2
>> node007 np=12 group2
>> node008 np=12 group2
>>
>> *************************************************************
>>
>> Here is what I see as a User issuing qmgr -c Œp s¹
>>
>> *************************************************************
>>
>>
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue default
>> #
>> create queue default
>> set queue default queue_type = Execution
>> set queue default Priority = 75
>> set queue default max_running = 888
>> set queue default max_user_run = 888
>> set queue default enabled = True
>> set queue default started = True
>> #
>> # Create and define queue high
>> #
>> create queue high
>> set queue high queue_type = Execution
>> set queue high Priority = 100
>> set queue high max_running = 96
>> set queue high resources_min.nodect = 1
>> set queue high resources_default.nodes = 1
>> set queue high resources_default.walltime = 24:00:00
>> set queue high max_user_run = 96
>> set queue high enabled = True
>> set queue high started = True
>> #
>> # Create and define queue group2
>> #
>> create queue group2
>> set queue group2 queue_type = Execution
>> set queue group2 acl_host_enable = False
>> set queue group2 acl_hosts = node005+node006+node007+node008
>> set queue group2 acl_users = user1
>> set queue group2 acl_users += user2
>> set queue group2 resources_default.walltime = 240:00:00
>> set queue group2 enabled = True
>> set queue group2 started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server max_running = 888
>> set server max_user_run = 888
>> set server acl_host_enable = True
>> set server acl_hosts = master.local
>> set server acl_hosts += cluster.local
>> set server default_queue = default
>> set server log_events = 511
>> set server mail_from = root
>> set server query_other_jobs = True
>> set server resources_default.nodect = 1
>> set server resources_default.nodes = 1
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server allow_node_submit = True
>> set server next_job_number = 100
>> set server authorized_users = * <at> *.local
>>
>> *************************************************************
>> Here is what is see as root issuing qmgr -c Œp s¹
>>
>> *************************************************************
>>
>>
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue default
>> #
>> create queue default
>> set queue default queue_type = Execution
>> set queue default Priority = 75
>> set queue default max_running = 888
>> set queue default resources_default.neednodes = default  <-------------
>> set queue default max_user_run = 888
>> set queue default enabled = True
>> set queue default started = True
>> #
>> # Create and define queue high
>> #
>> create queue high
>> set queue high queue_type = Execution
>> set queue high Priority = 100
>> set queue high max_running = 96
>> set queue high resources_min.nodect = 1
>> set queue high resources_default.neednodes = high  <-------------
>> set queue high resources_default.nodes = 1
>> set queue high resources_default.walltime = 24:00:00
>> set queue high max_user_run = 96
>> set queue high enabled = True
>> set queue high started = True
>> #
>> # Create and define queue group2
>> #
>> create queue group2
>> set queue group2 queue_type = Execution
>> set queue group2 acl_host_enable = False
>> set queue group2 acl_hosts = node005+node006+node007+node008
>> set queue group2 acl_users = user1
>> set queue group2 acl_users += user2
>> set queue group2 resources_default.neednodes = group2  <-------------
>> set queue group2 resources_default.walltime = 240:00:00
>> set queue group2 enabled = True
>> set queue group2 started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server max_running = 888
>> set server max_user_run = 888
>> set server acl_host_enable = True
>> set server acl_hosts = master.local
>> set server acl_hosts += cluster.local
>> set server default_queue = default
>> set server log_events = 511
>> set server mail_from = root
>> set server query_other_jobs = True
>> set server resources_default.neednodes = 1  <-------------
>> set server resources_default.nodect = 1
>> set server resources_default.nodes = 1
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server allow_node_submit = True
>> set server next_job_number = 100
>> set server authorized_users = * <at> *.local
>>
>>
>>
>> A user does not see the lines Marked above <‹‹‹‹‹‹
>>
>>
>> Users submit jobs to default and request 8 nodes even though 4 nodes are
>> dedicated to default queue.
>> How can I stop this from grabbing nodes in group2 and running on all 8
>> nodes?
>>
>> Billy
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers <at> supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers

>
>
>-- 
>Michel Béland, analyste en calcul scientifique
>michel.beland <at> calculquebec.ca
>bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
>téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
>Calcul Québec (www.calculquebec.ca)

>Calcul Canada (calculcanada.ca)
>
>_______________________________________________
>torqueusers mailing list
>torqueusers <at> supercluster.org
>http://www.supercluster.org/mailman/listinfo/torqueusers


_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Lenox, Billy CTR (US | 19 Aug 21:37 2015

qsub problem in more detail

I have a weird Problem when users submit jobs

*************************************************************

Here is my Node List
*************************************************************

node001 np=12 default high
node002 np=12 default high
node003 np=12 default high
node004 np=12 default high
node005 np=12 group2
node006 np=12 group2
node007 np=12 group2
node008 np=12 group2

*************************************************************

Here is what I see as a User issuing qmgr -c Œp s¹

*************************************************************

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 75
set queue default max_running = 888
set queue default max_user_run = 888
set queue default enabled = True
set queue default started = True
#
# Create and define queue high
#
create queue high
set queue high queue_type = Execution
set queue high Priority = 100
set queue high max_running = 96
set queue high resources_min.nodect = 1
set queue high resources_default.nodes = 1
set queue high resources_default.walltime = 24:00:00
set queue high max_user_run = 96
set queue high enabled = True
set queue high started = True
#
# Create and define queue group2
#
create queue group2
set queue group2 queue_type = Execution
set queue group2 acl_host_enable = False
set queue group2 acl_hosts = node005+node006+node007+node008
set queue group2 acl_users = user1
set queue group2 acl_users += user2
set queue group2 resources_default.walltime = 240:00:00
set queue group2 enabled = True
set queue group2 started = True
#
# Set server attributes.
#
set server scheduling = True
set server max_running = 888
set server max_user_run = 888
set server acl_host_enable = True
set server acl_hosts = master.local
set server acl_hosts += cluster.local
set server default_queue = default
set server log_events = 511
set server mail_from = root
set server query_other_jobs = True
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server allow_node_submit = True
set server next_job_number = 100
set server authorized_users = * <at> *.local

*************************************************************
Here is what is see as root issuing qmgr -c Œp s¹

*************************************************************

#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 75
set queue default max_running = 888
set queue default resources_default.neednodes = default  <-------------
set queue default max_user_run = 888
set queue default enabled = True
set queue default started = True
#
# Create and define queue high
#
create queue high
set queue high queue_type = Execution
set queue high Priority = 100
set queue high max_running = 96
set queue high resources_min.nodect = 1
set queue high resources_default.neednodes = high  <-------------
set queue high resources_default.nodes = 1
set queue high resources_default.walltime = 24:00:00
set queue high max_user_run = 96
set queue high enabled = True
set queue high started = True
#
# Create and define queue group2
#
create queue group2
set queue group2 queue_type = Execution
set queue group2 acl_host_enable = False
set queue group2 acl_hosts = node005+node006+node007+node008
set queue group2 acl_users = user1
set queue group2 acl_users += user2
set queue group2 resources_default.neednodes = group2  <-------------
set queue group2 resources_default.walltime = 240:00:00
set queue group2 enabled = True
set queue group2 started = True
#
# Set server attributes.
#
set server scheduling = True
set server max_running = 888
set server max_user_run = 888
set server acl_host_enable = True
set server acl_hosts = master.local
set server acl_hosts += cluster.local
set server default_queue = default
set server log_events = 511
set server mail_from = root
set server query_other_jobs = True
set server resources_default.neednodes = 1  <-------------
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server allow_node_submit = True
set server next_job_number = 100
set server authorized_users = * <at> *.local

A user does not see the lines Marked above <‹‹‹‹‹‹

Users submit jobs to default and request 8 nodes even though 4 nodes are
dedicated to default queue.
How can I stop this from grabbing nodes in group2 and running on all 8
nodes?

Billy

Gmane