Steven Lo | 24 Apr 01:44 2014

version 4.2.8 release

Hi,

Any insight on when 4.2.8 will be released officially?

We understand that 4.2.6.1 is the most stable release and that 4.2.7 is also stable,
but 4.2.7 has a memory leak that is due to be fixed in 4.2.8.

Thanks.

Steven.
Andrew Mather | 23 Apr 03:53 2014

Incorrect qstat output ?

Hi,

We're in the process of configuring our new Torque (4.2.7) installation and have run into an issue.

Our previous install (2.5.1) produces output as below:

[root@dev1 ~]# qstat -a

mgt-mgmt.basc.dpi.vic.gov.au:
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
2218246.mgt-mgmt     ss1b     batch    MIRA_4-2014         --      1   8  800gb 1200: Q   --
2500686.mgt-mgmt     sk00     batch    Trinity.fasta.ca   6105     1   6   22gb 170:0 R 141:0
2502414.mgt-mgmt     im18     batch    FG_BN_Job25       13862     1   1   18gb 45:00 R 25:26
2502516.mgt-mgmt     jp24     asreml   RFI_validation    16526   --   -- 13800m 40:00 R 22:49

qstat -f on one of these jobs (2502414) shows:
    Resource_List.mem = 18gb
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.pmem = 3800mb


The new version (4.2.7) shows:

[root@dev3 NCBI]# qstat -a

                                                                                  Req'd    Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
187.mgt.basc.science.d  pb26        batch    samtools_4        43945     1      1  3800m  02:00:00 R  00:34:08
188.mgt.basc.science.d  pb26        batch    samtools_5        43906     1      1  3800m  02:00:00 R  00:34:07
189.mgt.basc.science.d  pb26        batch    samtools_6        43792     1      1  3800m  02:00:00 R  00:34:07
190.mgt.basc.science.d  pb26        batch    samtools_7        43880     1      1  3800m  02:00:00 R  00:34:06
192.mgt.basc.science.d  pb26        batch    samtools_9        43180     1      1  3800m  02:00:00 R  00:34:05
193.mgt.basc.science.d  pb26        batch    samtools_10       43945     1      1  3800m  02:00:00 R  00:34:04
226.mgt.basc.science.d  pb26        batch    memstress_33      44137     1      1  3800m  06:00:00 R  00:04:26
227.mgt.basc.science.d  pb26        batch    memstress_34      44209     1      1  3800m  06:00:00 R  00:01:37
228.mgt.basc.science.d  pb26        batch    memstress_35      44178     1      1  3800m  06:00:00 R  00:01:06
229.mgt.basc.science.d  pb26        batch    memstress_36      44192     1      1  3800m  06:00:00 R  00:01:06
230.mgt.basc.science.d  pb26        batch    memstress_37      44234     1      1  3800m  06:00:00 R  00:01:06
231.mgt.basc.science.d  pb26        batch    memstress_38      46662     1      1  3800m  06:00:00 R  00:00:35



qstat -f on one of these jobs (226) shows:
    Resource_List.mem = 376gb
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=1
    Resource_List.pmem = 3800mb
    Resource_List.walltime = 06:00:00

It seems that the newer version is showing the value of the Resource_List.pmem field (which is the default value for the queue) rather than the job's requested memory (Resource_List.mem).

Is there a way to have the output display the requested memory? My Google-Fu is letting me down.
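
For reference, the full per-job attributes still carry the original request even when the summary column does not; a minimal sketch of pulling out just that field, using job 226 from the listing above (prompt hypothetical):

[root@dev3 ~]# qstat -f 226 | grep "Resource_List.mem"
    Resource_List.mem = 376gb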

Thanks,
Andrew







--
-
http://surfcoast.redbubble.com | https://picasaweb.google.com/107747436224613508618
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"Unless someone like you, cares a whole awful lot, nothing is going to get better...It's not !" - The Lorax
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
A committee is a cul-de-sac, down which ideas are lured and then quietly strangled.
  Sir Barnett Cocks
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"A mind is like a parachute. It doesnt work if it's not open." :- Frank Zappa
-
Ken Nielson | 22 Apr 18:37 2014

What features would you like to see in TORQUE

Hi all,

We would like to hear your suggestions for new features or improvements to TORQUE. We are looking for features and improvements, not bug fixes. Keep in mind what kind of future hardware you may be planning to purchase and whether TORQUE will need changes to take advantage of that hardware.

The community is the place from which most of the improvements for TORQUE come.

Thanks

Ken

--

Ken Nielson Sr. Software Engineer
+1 801.717.3700 office    +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300     Provo, UT 84606
Daniel Benavides | 22 Apr 15:19 2014

nodes exclusive for one queue

Hi

I have installed Torque 4.2.6.1 and Maui 3.3.1.

I have nodes with GPUs and nodes without GPUs.

I created two queues: one for the GPU nodes and one for the other nodes.

create queue gpu
set queue gpu queue_type = Execution
set queue gpu acl_host_enable = False
set queue gpu acl_hosts = n002
set queue gpu acl_hosts += n001
set queue gpu resources_default.neednodes = gpu
set queue gpu resources_default.nodes = 1
set queue gpu resources_default.walltime = 1440:00:00
set queue gpu enabled = True
set queue gpu started = True

create queue batch
set queue batch queue_type = Execution
set queue batch acl_hosts = n006
set queue batch acl_hosts += n005
set queue batch acl_hosts += n004
set queue batch acl_hosts += n003
set queue batch resources_default.neednodes = batch
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 1440:00:00
set queue batch enabled = True
set queue batch started = True
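
With resources_default.neednodes set as above, submitting to a queue should steer jobs onto nodes carrying the matching property; e.g. (script name hypothetical):

qsub -q gpu myjob.sh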

The properties are assigned on the nodes.
E.g., one node with a GPU:

create node n001
set node n001 state = free
set node n001 np = 8
set node n001 properties = gpu
set node n001 ntype = cluster
set node n001 status = rectime=1396011140
set node n001 status += varattr=
set node n001 status += jobs=
set node n001 status += state=free
set node n001 status += netload=15685031
set node n001 status += gres=
set node n001 status += loadave=0.00
set node n001 status += ncpus=8
set node n001 status += physmem=32981792kb
set node n001 status += availmem=37490084kb
set node n001 status += totmem=38101784kb
set node n001 status += idletime=77551
set node n001 status += nusers=1
set node n001 status += nsessions=1
set node n001 status += sessions=2987
set node n001 status += uname=Linux n001 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64
set node n001 status += opsys=linux
set node n001 gpus = 1

and one node without a GPU:

create node n003
set node n003 state = free
set node n003 np = 16
set node n003 properties = batch
set node n003 ntype = cluster
set node n003 status = rectime=1396011229
set node n003 status += varattr=
set node n003 status += jobs=
set node n003 status += state=free
set node n003 status += netload=8922119
set node n003 status += gres=
set node n003 status += loadave=0.00
set node n003 status += ncpus=16
set node n003 status += physmem=32905824kb
set node n003 status += availmem=37380212kb
set node n003 status += totmem=38025816kb
set node n003 status += idletime=81451
set node n003 status += nusers=1
set node n003 status += nsessions=1
set node n003 status += sessions=3327
set node n003 status += uname=Linux n003 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64
set node n003 status += opsys=linux

But Maui maps all the nodes to all the classes.

In PBS Pro the equivalent command would be:

set node n001 queue = gpu
set node n002 queue = gpu
set node n003 queue = batch
set node n004 queue = batch
set node n005 queue = batch
set node n006 queue = batch

but this command is not supported in Torque.
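
The closest equivalent in Torque is to express the binding through node properties in TORQUE_HOME/server_priv/nodes, matched by each queue's resources_default.neednodes; a minimal sketch of that file for the configuration above (assuming the standard nodes-file syntax):

n001 np=8 gpus=1 gpu
n002 np=8 gpus=1 gpu
n003 np=16 batch
n004 np=16 batch
n005 np=16 batch
n006 np=16 batch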

The Maui log shows:

03/28 10:17:03 ServerProcessRequests()
03/28 10:17:03 INFO:     not rolling logs (1456817 < 10000000)
03/28 10:17:03 MResAdjust(NULL,0,0)
03/28 10:17:03 MStatInitializeActiveSysUsage()
03/28 10:17:03 MStatClearUsage([NONE],Active)
03/28 10:17:03 ServerUpdate()
03/28 10:17:03 MSysUpdateTime()
03/28 10:17:03 INFO:     starting iteration 1705
03/28 10:17:03 MRMGetInfo()
03/28 10:17:03 MClusterClearUsage()
03/28 10:17:03 MRMClusterQuery()
03/28 10:17:03 MPBSClusterQuery(BIRLIBIRLOKUS,RCount,SC)
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n001 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n001,n001,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n001,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n002 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n002,n002,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n002,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n003 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n003,n003,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n003,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n004 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n004,n004,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n004,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n005 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n005,n005,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n005,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 __MPBSGetNodeState(Name,State,PNode)
03/28 10:17:03 INFO:     PBS node n006 set to state Idle (free)
03/28 10:17:03 MPBSNodeUpdate(n006,n006,Idle,BIRLIBIRLOKUS)
03/28 10:17:03 MPBSLoadQueueInfo(BIRLIBIRLOKUS,n006,SC)
03/28 10:17:03 INFO:     queue 'gpu' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'gpu'
03/28 10:17:03 INFO:     queue 'batch' started state set to True
03/28 10:17:03 INFO:     class to node mapping enabled for queue 'batch'
03/28 10:17:03 INFO:     6 PBS resources detected on RM BIRLIBIRLOKUS
03/28 10:17:03 INFO:     resources detected: 6
03/28 10:17:03 MRMWorkloadQuery()
03/28 10:17:03 MPBSWorkloadQuery(BIRLIBIRLOKUS,JCount,SC)
03/28 10:17:03 INFO:     queue is empty
03/28 10:17:03 INFO:     0 PBS jobs detected on RM BIRLIBIRLOKUS
03/28 10:17:03 WARNING:  no workload detected
03/28 10:17:03 MStatClearUsage(node,Active)
03/28 10:17:03 MClusterUpdateNodeState()
03/28 10:17:03 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
03/28 10:17:03 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
03/28 10:17:03 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
03/28 10:17:03 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
03/28 10:17:03 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
03/28 10:17:03 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
03/28 10:17:03 MSchedUpdateStats()
03/28 10:17:03 INFO:     iteration: 1705   scheduling time:  0.003 seconds
03/28 10:17:03 MResUpdateStats()
03/28 10:17:03 INFO:     current util[1705]:  0/6 (0.00%)  PH: 0.00%  active jobs: 0 of 0 (completed: 3)
03/28 10:17:03 MQueueCheckStatus()
03/28 10:17:03 MNodeCheckStatus()
03/28 10:17:03 MUClearChild(PID)
03/28 10:17:03 INFO:     scheduling complete.  sleeping 30 seconds

Please help me.
hitesh chugani | 18 Apr 01:15 2014

shared compute nodes

Hello,


I have a question: can I share compute nodes between two pbs_servers? If so, what should the entry in TORQUE_HOME/mom_priv/config be? Thanks in advance.
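
For what it is worth, pbs_mom does accept more than one $pbsserver line in its config (this is normally used for an HA server pair rather than two independent servers); a minimal sketch of TORQUE_HOME/mom_priv/config under that assumption (hostnames hypothetical):

$pbsserver server1.example.com
$pbsserver server2.example.com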



Regards,
Hitesh Chugani.
Zhang,Jun | 15 Apr 21:55 2014

There are CPUs available, but job is queued

Out of my 16 nodes, 3 CPUs appear to be vacant; they belong to two different nodes. When I submit a job at this time, it just sits in the queue. I am under every limit of the queue the job is trying to execute in. Can somebody help?
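
A first diagnostic step, assuming a Maui or Moab scheduler is driving Torque (job ID hypothetical):

checkjob -v 12345   # the scheduler's reason for deferring the job
qstat -f 12345      # the job's resource requests and comment attribute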

 

Jun

Ken Nielson | 15 Apr 19:14 2014

TORQUE master branch head reset in github

Hi all,

We had a bad merge into the master branch of TORQUE last Friday, April 11, 2014. We have since removed that bad merge by resetting the head of master back to the last commit before the merge. There have been no other commits since the merge, so nothing else has been lost.

We strongly discourage anyone from using the master branch in any production environment and do not support the master branch for such purposes. If you have pulled from the master branch since last Friday, be aware that it has changed.


Please respond with any questions.

Regards

Ken Nielson
--

Ken Nielson Sr. Software Engineer
+1 801.717.3700 office    +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300     Provo, UT 84606
Clotho Tsang | 15 Apr 03:40 2014

Torque HA is unable to handle network disconnection?

I am testing Torque's HA feature by using the "--ha" option of pbs_server.
Torque version: 4.2.2.
OS: CentOS 6.4 x86_64
The Torque servers and the file server are VMs.
The file server is only a NFS server for testing purpose.

I set it up by 
  • mounting /var/spool/torque/server_priv/ on an external file server.
  • removing $pbsserver from /var/spool/torque/mom_priv/config
  • adding both server hostnames to /var/spool/torque/server_name
  • setting managers, operators, acl_hosts, lock_file_check_time, and lock_file_update_time in qmgr
  • setting up a separate Maui on each host, with no HA setup for Maui (for testing purposes; will change later; see the sketch after this list)
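
A minimal sketch of the two HA-specific pieces above (values illustrative; hostnames from this report):

# /var/spool/torque/server_name on every host: both servers, comma-separated
m20,m30

# qmgr settings controlling the HA lock handoff
qmgr -c 'set server lock_file_update_time = 3'
qmgr -c 'set server lock_file_check_time = 9'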
With this setup, I pass the following tests:
  • Both nodes should be able to see the same results from the following commands: pbsnodes, qstat
  • Stopping pbs_server on one node (with the "kill" command); the commands above should keep working.
But I fail the following test:
  • Disconnecting the network of one node; the commands above should keep working.
Instead, pbsnodes / qstat hang forever. Tracing them with strace shows that they are waiting
for a connection to the disconnected host, which of course will fail.

Before I disconnect the network, server.lock stores the PID of server1 (m20).
After I disconnect the network of server1, server.lock stores the PID of server2 (m30).
Its timestamp is kept updated.

[root@m30 torque]# ls -l /var/spool/torque/server_priv/server.lock
-rw------- 1 root root 5 Apr  9 17:01 /var/spool/torque/server_priv/server.lock
[root@m30 torque]# date
Wed Apr  9 17:01:44 CST 2014
[root@m30 torque]# cat /var/spool/torque/server_priv/server.lock
9339
[root@m30 torque]# ps aux | grep pbs_server
root      9339  0.0  3.9 574196 26988 ?        Sl   16:54   0:00 /usr/sbin/pbs_server -d /var/spool/torque --ha
root     11219  0.0  0.1 103236   856 pts/0    R+   16:59   0:00 grep pbs_server
[root@m30 torque]# pbsnodes
(hang here)
^C
[root@m30 torque]# strace pbsnodes
:
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
connect(3, {sa_family=AF_INET, sin_port=htons(15001), sin_addr=inet_addr("192.168.122.20")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, {300, 0}

(hang here. 192.168.122.20 is the IP of server1)

Is it the expected behavior that Torque's native HA cannot handle network disconnection?


--
Clotho Tsang
Senior Software Engineer
Cluster Technology Limited
Email: clotho@clustertech.com
Tel: (852) 2655-6129
Fax: (852) 2994-2101
Website: www.clustertech.com
Islam, Sharif | 14 Apr 22:31 2014

nallocpolicy

Our current NODEALLOCATIONPOLICY is set to PRIORITY (We are using torque
4.2.6 and moab 7.2.7).

We are testing other policies and were wondering about the "-l
nallocpolicy" resource manager extension
(http://docs.adaptivecomputing.com/mwm/7-2-7/help.htm#topics/resourceManagers/rmextensions.html#nallocpolicy).

It seems that when NODEALLOCATIONPOLICY is set to PRIORITY in Moab, it
does not honor "-l nallocpolicy=CONTIGUOUS". However, when
NODEALLOCATIONPOLICY is set to CONTIGUOUS, I was able to get a priority
allocation with "-l nallocpolicy=PRIORITY". Is this expected? We also
have this set in Moab: NODECFG[DEFAULT] PRIORITYF='-NODEINDEX' (Cray
XE/XK machine).
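
For context, the extension is passed per job on the qsub command line; a minimal sketch (resource counts and script name hypothetical):

qsub -l nodes=4:ppn=16 -l nallocpolicy=CONTIGUOUS job.sh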



—sharif 


-- 
Sharif Islam 
System Engineer 
Blue Waters (http://www.ncsa.illinois.edu/BlueWaters/)

3006 E NCSA, 1205 W. Clark St. Urbana, IL


David Beer | 14 Apr 22:20 2014

Improvements Coming in the June Release of TORQUE

All,

At MoabCon 2 weeks ago we discussed a lot of the scalability and throughput improvements that are being made in Ascent. For those who couldn't come or would like a review, here's a blog entry on the same topic: http://www.adaptivecomputing.com/blog-opensource/update-ascent-torque/

Cheers,

--
David Beer | Senior Software Engineer
Adaptive Computing