Shepherd, Melissa D | 29 Jun 19:01 2016

LBNL Node Health Check in physical cluster - question

Greetings,

 

I'm new to the management, architecture, and configuration of TORQUE. I don't see anything in the list archives that clarifies an issue for me, and appreciate guidance.

 

The cluster I am supporting has physical machines (1 blade/host=1 compute node) all with RHEL OS (6.x).

TORQUE client (pbs_mom) runs on each compute node, which communicates with one head node running TORQUE server (pbs_server).

 

I've put NHC from .rpm on some compute nodes, and tested that the scripts do work correctly in detecting issues on the node. However, the helper scripts, etc. run from the nodes--how can the node-mark-offline & node-mark-online scripts invoke 'pbsnodes' when that's on the head node?

 

Thanks,

Mel 

 

​​


This electronic message transmission contains information from CSRA that may be attorney-client privileged, proprietary or confidential. The information in this message is intended only for use by the individual(s) to whom it is addressed. If you believe you have received this message in error, please contact me immediately and be aware that any use, disclosure, copying or distribution of the contents of this message is strictly prohibited. NOTE: Regardless of content, this email shall not operate to bind CSRA to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of email for such purpose.

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Christopher Wirth | 29 Jun 17:10 2016
Picon

Are multiple array dependencies possible?

Hi,

Another question about array dependencies: If I have array jobs A, B, and C, is it possible to make C wait
until both A and B have completed ok?

In the documentation I can see how to do multiple non-array dependencies, using a colon-separated list of
job IDs:

afterok:jobid[:jobid…]

However, this doesn’t appear to be an option for afterokarray:

afterokarray:arrayid[count]

A quick test, just in case it was implemented but not documented, suggests that the docs are accurate as
array job C started when array job A finished, but array job B was still running.

Is there a way to make this dual-array-dependency happen?

Many thanks,

Chris
________________________________

This email is confidential and intended solely for the use of the person(s) ('the intended recipient') to
whom it was addressed. Any views or opinions presented are solely those of the author and do not
necessarily represent those of the Cancer Research UK Manchester Institute or the University of
Manchester. It may contain information that is privileged & confidential within the meaning of
applicable law. Accordingly any dissemination, distribution, copying, or other use of this message, or
any of its contents, by any person other than the intended recipient may constitute a breach of civil or
criminal law and is strictly prohibited. If you are NOT the intended recipient please contact the sender
and dispose of this e-mail as soon as possible.
Skip Montanaro | 27 Jun 21:38 2016
Picon

Can't send email from a job?

I'm trying to get myself reacquainted with Torque after a several year
hiatus. I've got pbs_server and maui running on server1, and pbs_mom
running on server1 and server2. This simpleton command works just
fine:

echo "echo hello world > /home/skipm/torque.out" | qsub

creating the desired file in my home directory. This command completes
with an error, however:

echo "echo hello world | mailx -s test user <at> host.domain" | qsub

06/27/2016 13:59:26.074;13;PBS_Server.109057;Job;6.server1;Not sending
email: User does not want mail of this type.

I did not use --with-sendmail=... when configuring, so it set the
SENDMAIL_CMD macro/variable to /usr/lib/sendmail. I'm not sure what
effect that might have had on this failed command. Mailx is in
/usr/bin/mailx, which should be in PATH.

Searching for the error message yielded no useful hits. Your guidance
would be appreciated.

Thanks,

Skip Montanaro
Mark Lohry | 25 Jun 20:09 2016
Picon

jobs stuck in queue unless forced with qrun

Several other similar queries on this in the mailing list archives, but 
I've yet to find an answer to fix this. Hope someone has an idea.

Fresh install on ubuntu 16.04. I can submit jobs, but they just sit in 
the queue indefinitely. If I force them to run with sudo qrun, then they 
run just fine.

Is there anything obviously wrong with my setup here:

qmgr setup is like so:

:~$ sudo qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 24:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = templeton
set server managers = root <at> templeton
set server operators = root <at> templeton
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server submit_hosts = templeton
set server allow_node_submit = True
set server next_job_number = 27
sudo | 21 Jun 05:17 2016
Picon

PBS_Server 6.0.1 segfault by 'qdel'

Hello,

I'm trying to use torque 6.0.1 with numa support.
Most of torque functions run well as expected.
The only thing I'm confusing is that pbs_server segfaults by user's qdel.
The admin(root) can not qdel the jobs (also pbs_server gets segfault).
Only 'qdel -p <jid>' can kill the jobs.(by root only)

I recreate db by torque.setup, but, still pbs_server keep segfault by 'qdel'.

Are there any similar issues exists of torque v6.0.1 before?

I'm appreciate any comments/suggestions/workarounds.

------------------------------------------------------------------------------

Here is my experience and environment.

- user's qdel make pbs_server segfault.

[]$ date; qstat -a
2016  6 21  11:12:01 JST

vsmp003:
Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ---------
--------- - ---------
0.vsmp003               user009     batch    himeNO7 13059     1      9       --   01:00:00 C       --
1.vsmp003               user009     batch    himeNO7 14544     1      9       --   01:00:00 R  00:01:27

[]$ date; qdel 1
2016  6 21  11:12:06 JST

Unable to communicate with vsmp003(192.168.21.3)

Unable to communicate with vsmp003(192.168.21.3)
Cannot connect to specified server host 'vsmp003'.
qdel: cannot connect to server vsmp003 (errno=111) Connection refused
[]$ date; qstat -a
2016  6 21  11:12:09 JST

Unable to communicate with vsmp003(192.168.21.3)
Cannot connect to specified server host 'vsmp003'.

Unable to communicate with vsmp003(192.168.21.3)
Cannot connect to specified server host 'vsmp003'.

Unable to communicate with vsmp003(192.168.21.3)
Cannot connect to specified server host 'vsmp003'.
qstat: cannot connect to server vsmp003 (errno=111) Connection refused
qstat: Error (111 - Connection refused)

- /var/log/messages:

Jun 21 11:12:06 vsmp003 kernel: [ 2065.849677] pbs_server[12740]: segfault at 1b0 ip 00000000004e16e2
sp 00002b87b47fb7d0 error 4 in pbs_server[400000+15b000]
Jun 21 11:12:06 vsmp003 abrt[15112]: Saved core dump of pid 12730 (/usr/local/sbin/pbs_server) to
/var/spool/abrt/ccpp-2016-06-21-11:12:06-12730 (115888128 bytes)
Jun 21 11:12:06 vsmp003 abrtd: Directory 'ccpp-2016-06-21-11:12:06-12730' creation detected
Jun 21 11:12:06 vsmp003 abrtd: Executable '/usr/local/sbin/pbs_server' doesn't belong to any package
and ProcessUnpackaged is set to 'no'
Jun 21 11:12:06 vsmp003 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-06-21-11:12:06-12730'
exited with 1
Jun 21 11:12:06 vsmp003 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2016-06-21-11:12:06-12730'

- TORQUE configure

  HWLOC_LIBS='-L/usr/local/lib -lhwloc'  (note; hwloc-1.9)

  $ ./configure --with-default-server=vsmp003 --enable-numa-support --enable-cpuset --enable-cgroups

- Installed prerequired packages.

libtool-2.2.6-15.5.el6.x86_64
libcgroup-devel-0.40.rc1-16.el6.x86_64
openssl-devel-1.0.1e-42.el6.x86_64
libxml2-devel-2.7.6-20.el6.x86_64
zlib-devel-1.2.3-29.el6.x86_64
libtool-ltdl-2.2.6-15.5.el6.x86_64
boost-devel-1.41.0-27.el6.x86_64
pciutils-devel-3.1.10-4.el6.x86_64

-----------------------------------------------------------------

Best Regards,
 --Sudo
Skip Montanaro | 16 Jun 17:59 2016
Picon

Boost assertion error when starting pbs_server

I've been away from Torque and Maui for about five years. I was recently asked to get going with them again for a new user. It appears that in the intervening time the Boost libraries have insinuated themselves into Torque. I am building and running on a couple openSuSE 12.2 systems, and installed boost-devel (1.49) to build Torque. I also removed everything from PATH other than the minimal required, to keep from polluting the build with locally installed stuff. From config.log:

PATH: /usr/X11R6/bin
PATH: /usr/bin
PATH: /bin
PATH: /usr/sbin

When I try to start pbs_server, I get a Boost assertion error:

% sudo /opt/local/sbin/pbs_server 
skipm's password:
pbs_server: /usr/include/boost/unordered/detail/table.hpp:387: size_t boost::unordered::detail::table<Types>::min_buckets_for_size(size_t) const [with Types = boost::unordered::detail::map<std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, boost::hash<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >]: Assertion `this->mlf_ != 0' failed.

Not being a Boost user, I haven't the slightest idea what that means. I ran things from within gdb and got this backtrace:

(gdb) bt
#0  0x00007ffff552ad25 in raise () from /lib64/libc.so.6
#1  0x00007ffff552c1a8 in abort () from /lib64/libc.so.6
#2  0x00007ffff5523c22 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff5523cd2 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000426f92 in boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<std::string const, int> >, std::string, int, boost::hash<std::string>, std::equal_to<std::string> > >::min_buckets_for_size (this=0x7ffff7dda368 <cache+72>, size=1)
    at /usr/include/boost/unordered/detail/table.hpp:387
#5  0x0000000000426405 in boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<std::string const, int> >, std::string, int, boost::hash<std::string>, std::equal_to<std::string> > >::reserve_for_insert (this=0x7ffff7dda368 <cache+72>, size=1) at /usr/include/boost/unordered/detail/table.hpp:643
#6  0x00000000004254dd in boost::unordered::detail::table_impl<boost::unordered::detail::map<std::allocator<std::pair<std::string const, int> >, std::string, int, boost::hash<std::string>, std::equal_to<std::string> > >::operator[] (this=0x7ffff7dda368 <cache+72>, k=...) at /usr/include/boost/unordered/detail/unique.hpp:351
#7  0x0000000000424731 in boost::unordered::unordered_map<std::string, int, boost::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, int> > >::operator[] (
    this=0x7ffff7dda368 <cache+72>, k=...) at /usr/include/boost/unordered/unordered_map.hpp:1192
#8  0x00007ffff7510d8d in container::item_container<int>::find (this=0x7ffff7dda320 <cache>, id=...)
    at ../../../src/include/container.hpp:389
#9  0x00007ffff751082d in addrcache::getFromCache (this=0x7ffff7dda320 <cache>, 
    hostName=0xb53b20 <server_host> "blade") at ../Libnet/net_cache.c:354
#10 0x00007ffff750f6f3 in get_cached_fullhostname (hostname=0xb53b20 <server_host> "blade", sai=0x0)
    at ../Libnet/net_cache.c:464
#11 0x00007ffff7505eb6 in get_fullhostname (shortname=0xb53b20 <server_host> "blade", 
    namebuf=0xb53b20 <server_host> "blade", bufsize=1024, EMsg=0x7fffffffe610 "")
    at ../Libnet/get_hostname.c:153
#12 0x00000000004533b2 in main (argc=1, argv=0x7fffffffeb28) at pbsd_main.c:1670

I see that in get_fullhostname it shows just "blade". For some reason, many years ago, our admins concluded that fully qualified hostnames were a bad idea. When I built Torque, I had configured --with-server-home=/opt/local/torque, so created a server_name file in that directory with the true fully qualified domain name, but it still craps out with the same backtrace.

Setting up and starting the trqauthd.service seems to yield similar results:

# systemctl status trqauthd.service
trqauthd.service - TORQUE trqauthd daemon
 Loaded: loaded (/usr/lib/systemd/system/trqauthd.service; enabled)
 Active: failed (Result: core-dump) since Thu, 16 Jun 2016 10:57:01 -0500; 1min 4s ago
Process: 18810 ExecStart=/opt/local/sbin/trqauthd (code=dumped, signal=ABRT)
 CGroup: name=systemd:/system/trqauthd.service

Jun 16 10:57:01 blade trqauthd[18810]: trqauthd: /usr/include/boost/unorder...d.

Any suggestions about how to further debug this error appreciated. 

Thx,

Skip Montanaro

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Nicholas Lindberg | 13 Jun 19:32 2016

Getting "#PBS -M" user supplied e-mail to an epilogue script?

Hello,

 

I’m trying to do something that I feel should be fairly easy, but can’t figure out a straightforward answer to, which is this: how do I retrieve a user’s supplied e-mail address using

 

#PBS –M user <at> user.edu

 

from within an epilogue script?

 

As far as I know, the only parameters the epilogue script are passed are below (which doesn’t include the user supplied e-mail).  Does this mean that I have to do something like have a prologue script (or some kind of submit filter) write out an environment variable containing the user supplied e-mail address after doing some perl/sed magic to grep it out, store it in the environment,

so that I can then reference said environment inside my epilogue? 

 

Seems like way too much work, but also seems like it’s the only way.  If somebody has found another way, I’m all ears.

 

Thanks.

Nick

 

 

PARAMETERS PASSED TO EPILOGUE (example from docs):

               

#!/bin/sh

 

echo "Epilogue Args:"

echo "Job ID: $1"

echo "User ID: $2"

echo "Group ID: $3"

echo "Job Name: $4"

echo "Session ID: $5"

echo "Resource List: $6"

echo "Resources Used: $7"

echo "Queue Name: $8"

echo "Account String: $9"

echo ""

 

exit 0

stdout   

 

 

Epilogue Args:

Job ID: 13724.node01

User ID: user1

Group ID: user1

Job Name: script.sh

Session ID: 28244

Resource List: neednodes=node01,nodes=1,walltime=00:01:00

Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07

Queue Name: batch

Account String:

 

 

 

--

 

Nick Lindberg

Director of Engineering

Milwaukee Institute 

414-269-8332 (O)

608-215-3508 (M)

 

_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Stijn De Weirdt | 13 Jun 15:03 2016
Picon

jobstart vs prologue

hi all,

does anyone know (or point to the documentation) if the pbs_mom
node_check_interval value jobstart runs the healthcheck before or after
prologue (and similar for jobend vs epilogue)?

many thanks,

stijn
Gabriel A. Devenyi | 8 Jun 20:23 2016
Picon
Gravatar

PBS Equivalent to SGE's -sync?

Is there an equivalent method of blocking a qsub until the submitted job is complete, similar to functionality of SGE's -sync?

> -sync y causes qsub to wait for the job to complete before exiting.  If the job completes successfully, qsub's exit code will be that of the completed job.  If the job fails to complete successfully, qsub will print out a error message indicating why the job failed and will have an exit code of 1.  If qsub is interrupted, e.g. with CTRL-C, before the job completes, the job will be canceled.

--
Gabriel A. Devenyi B.Eng. Ph.D.
Research Computing Associate
Computational Brain Anatomy Laboratory
Cerebral Imaging Center
Douglas Mental Health University Institute
Affiliate, Department of Psychiatry
McGill University
t: 514.761.6131x4781
e: gdevenyi <at> gmail.com
_______________________________________________
torqueusers mailing list
torqueusers <at> supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
Go Yoshimura | 7 Jun 06:53 2016
Picon

pbsnodes reports incorrect total_sockets/total_numa_nodes/total_cores/total_threads for some numanodes

Hello,

I have a question about torque v6.0.1 .
I have 8nodes cluster each has 2socket/28cores per node.
I built torque v6.0.1 with NUMA support and all cluster nodes
has HWLOC-1.9 installed already.
Most of cluster nodes behaves just fine, but, pbsnodes reports
strange total_cores/total_threads values on the few nodes in this cluster.

I tried pbs_server recreate database, reboot pbs_mom, or, reboot entire cluster
didn't help.

Would you suggest us what is wrong on this node(nc02 below)?
Where to check pbs_server/pbs_mom configuration?

Here are some information.

===========================================================================
0) torque configue 

     It was created by torque configure 6.0.1, which was
     generated by GNU Autoconf 2.69.  Invocation command line was

       $ ./configure --enable-numa-support --enable-cpuset --enable-cgroups

     (e.g ac_cv_env_HWLOC_LIBS_value='-L/usr/local/lib -lhwloc'
          HWLOC_LIBS='-L/usr/local/lib -lhwloc' . . . fm config.log )

1) HWLOC information on the cluster (nc01~nc08)

# pdsh -w nc0[1-8] hwloc-info --version | sort
nc01: hwloc-info 1.9
nc02: hwloc-info 1.9
nc03: hwloc-info 1.9
nc04: hwloc-info 1.9
nc05: hwloc-info 1.9
nc06: hwloc-info 1.9
nc07: hwloc-info 1.9
nc08: hwloc-info 1.9

# pdsh -w nc0[1-8] hwloc-info |grep PU | sort
nc01:         depth 8:	28 PU (type #6)
nc02:         depth 8:	28 PU (type #6)
nc03:         depth 8:	28 PU (type #6)
nc04:         depth 8:	28 PU (type #6)
nc05:         depth 8:	28 PU (type #6)
nc06:         depth 8:	28 PU (type #6)
nc07:         depth 8:	28 PU (type #6)
nc08:         depth 8:	28 PU (type #6)

1) PBS_SERVER : server_priv/node file

[root <at> fs9 ~]# cat /var/spool/torque/server_priv/nodes 
nc01  np=28 num_node_boards=2
nc02  np=28 num_node_boards=2
nc03  np=28 num_node_boards=2
nc04  np=28 num_node_boards=2
nc05  np=28 num_node_boards=2
nc06  np=28 num_node_boards=2
nc07  np=28 num_node_boards=2
nc08  np=28 num_node_boards=2

2) PBS_MOM : mom.layout on the cluster nodes (all the same)

# cat /var/spool/torque/mom_priv/mom.layout 
nodes=0 cpus=0-13 mems=0
nodes=1 cpus=14-27 mems=1

3) Strange node 'nc02-0'

   It seems mom recognize ncpus=14 both for nc02-0/nc02-1 which is good.
   But, it reports "total_sockets = 1, total_cores = 4" on nc02-0.
   They are "total_sockets = 2, total_cores = 28" on other NUMA nodes.

[root <at> fs9 ~]# pbsnodes nc02
nc02-0
     state = free
     power_state = Running
     np = 14
     ntype = cluster
     status = rectime=1465269096,macaddr=1c:b7:2c:14:56:c7,cpuclock=OnDemand:1200MHz,varattr=,jobs=,state=free,netload=,gres=,loadave=0.00,ncpus=14,physmem=67007008kb,availmem=65065912kb,totmem=
67007008kb,idletime=0,nusers=0,nsessions=0,uname=Linux nc02 2.6.32-573.el6.x86_64 #1 SMP Thu Jul
23 15:44:03 UTC 2015 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 1
     total_numa_nodes = 1
     total_cores = 4
     total_threads = 4
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 0

nc02-1
     state = free
     power_state = Running
     np = 14
     ntype = cluster
     status = rectime=1465269096,macaddr=1c:b7:2c:14:56:c7,cpuclock=OnDemand:1200MHz,varattr=,jobs=,state=free,netload=,gres=,loadave=0.00,ncpus=14,physmem=67108864kb,availmem=65672456kb,totmem=
67108864kb,idletime=0,nusers=0,nsessions=0,uname=Linux nc02 2.6.32-573.el6.x86_64 #1 SMP Thu Jul
23 15:44:03 UTC 2015 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numa_nodes = 2
     total_cores = 28
     total_threads = 28
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 14

4) Healthy node 'node08' (for example)

# pbsnodes nc08
nc08-0
     state = free
     power_state = Running
     np = 14
     ntype = cluster
     status = rectime=1465269939,macaddr=14:dd:a9:24:2f:8d,cpuclock=OnDemand:1200MHz,varattr=,jobs=,state=free,netload=,gres=,loadave=0.00,ncpus=14,physmem=67007052kb,availmem=65149424kb,totmem=
67007052kb,idletime=0,nusers=0,nsessions=0,uname=Linux nc08 2.6.32-573.el6.x86_64 #1 SMP Thu Jul
23 15:44:03 UTC 2015 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numa_nodes = 2
     total_cores = 28
     total_threads = 28
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 0

nc08-1
     state = free
     power_state = Running
     np = 14
     ntype = cluster
     status = rectime=1465269939,macaddr=14:dd:a9:24:2f:8d,cpuclock=OnDemand:1200MHz,varattr=,jobs=,state=free,netload=,gres=,loadave=0.00,ncpus=14,physmem=67108864kb,availmem=65574536kb,totmem=
67108864kb,idletime=0,nusers=0,nsessions=0,uname=Linux nc08 2.6.32-573.el6.x86_64 #1 SMP Thu Jul
23 15:44:03 UTC 2015 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 2
     total_numa_nodes = 2
     total_cores = 28
     total_threads = 28
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 0

5) Other 
 - cgconfig is 'on' on cluster nodes.
 - trqauth is 'on' 
 - pbs_mom is 'on'
 - CentOS6.7 
 - Kernel Linux nc02 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

thank you
go
---

----
Go Yoshimura <go-yoshimura <at> sstc.co.jp>
Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
              Tel: 81-6-6224-4115
Tokyo Kojimachi Office  BUREX Kojimachi 11F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan 
              Tel: 81-3-5875-4718 Fax: 81-3-3237-7612              
Go Yoshimura | 6 Jun 09:46 2016
Picon

recommended torque version for CentOS6.7

Hi everyone!

- We are going to install torque into CentOS6.7.
- There are many versions of torque.
- Latest is 6.0.1.
- Which version is recommended to install into CentOS6.7?

- We want to enable numa-support.
- As for 6.0.1, we tried both
   ./configure --enable-numa-support --enable-cpuset --enable-cgroups
  and
   ./configure --enable-numa-support
- Both of above, we failed in configure because of too old hwloc version(1.5).
- "--enable-cgroups" is new feature.
  But even only for "--enable-numa-support",  hwloc is required.
- Required version is hwloc 1.9 or later like ((6.0.1 failure)).
- hwloc 1.9 is much newer than CentOS6.7 or CentOS7.2 provide 
   centOS6.7    hwloc1.5
   centOS7.2    hwloc1.7
- We can configure torque-5.1.3 like ((5.1.3 success))
  torqueRelaseNotes5.1.3.pdf mensions that
  RHEL7.x/CentOS7.x are newly supported

thank you
go
----

((6.0.1 failure))
checking for HWLOC... no
configure: error: cpuset support requires the hwloc development package
cgroup support requires the hwloc development package

Requested 'hwloc >= 1.9' but version of hwloc is 1.5

This can be solved by configuring with --with-hwloc-path=<path>. This path
should be the path to the directory containing the lib/ and include/ directories
for your version of hwloc.

hwloc can be loaded by running the hwloc_install.sh script in the
contrib directory within this Torque distribution.

Another option is adding the directory containing 'hwloc.pc'
to the PKG_CONFIG_PATH environment variable.

If you have done these and still get this  error, try running ./autogen.sh and
then configuring again.

((5.1.3 success))
Building components: server=yes mom=yes clients=yes
                     gui=no drmaa=no pam=no
PBS Machine type    : linux
Remote copy         : /usr/bin/scp -rpB
PBS home            : /var/spool/torque
Default server      : cent6-07

Unix Domain sockets : 
Linux cpusets       : yes
Tcl                 : disabled
Tk                  : disabled
Authentication      : classic (pbs_iff)

Ready for 'make'.

----
Go Yoshimura <go-yoshimura <at> sstc.co.jp>
Scalable Systems Co., Ltd.  <http://www.sstc.co.jp/>
Osaka Office            HONMACHI-COLLABO Bldg. 4F, 4-4-2 Kita-kyuhoji-machi, Chuo-ku, Osaka 541-0057 Japan
              Tel: 81-6-6224-4115
Tokyo Kojimachi Office  BUREX Kojimachi 11F, 3-5-2 Kojimachi, Chiyoda-ku, Tokyo 102-0083 Japan 
              Tel: 81-3-5875-4718 Fax: 81-3-3237-7612              

Gmane