HA pbs server setup
Rui Zhang <zhang <at> physik.uni-bonn.de>
2015-06-29 07:10:28 GMT
I would like to ask for some advice on setting up a highly available PBS server. I installed torque-4.2.10 from the EPEL
repository on two different virtual machines to act as PBS servers. I want them to be highly available and load
balanced. I also have two worker nodes. I set up the cluster on these four machines, and jobs can be
submitted from either server. However, I suspect I have simply created two separate PBS systems, one consisting of server1 +
2 worker nodes and the other of server2 + 2 worker nodes, for the following reasons:
1. When I submit jobs from the two servers, the job IDs are assigned separately. Each time, I submit three jobs
from server 1 and two jobs from server 2. For example, in the second round of submissions, server 1 assigns IDs
4, 5, 6 and server 2 assigns IDs 3, 4.
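The reasoning in point 1 can be made concrete: servers that shared one job counter would never issue the same number twice, so overlapping IDs point to two independent counters. The Python sketch below checks the numbers from the second round. The function name `may_share_job_counter` is hypothetical, and it takes only the numeric part of each job ID. Disjoint IDs are necessary but not sufficient evidence of a shared counter.

```python
def may_share_job_counter(ids_a, ids_b):
    """Return True if two servers' job IDs are consistent with one shared counter.

    A single shared sequence never hands out the same number twice, so any
    overlap means each server keeps its own counter. No overlap does not
    prove the counter is shared; it only fails to rule it out.
    """
    return set(ids_a).isdisjoint(ids_b)


# Numeric job IDs from the second round of submissions described above.
server1_ids = [4, 5, 6]
server2_ids = [3, 4]
print(may_share_job_counter(server1_ids, server2_ids))  # False: ID 4 was issued by both
```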
2. I created a queue with the same name on both servers. If I delete the queue on server 2 and then submit jobs again, qsub fails with:
qsub: submit error (Unknown queue MSG=requested queue not found)
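One way to check point 2 more directly is to compare the queue definitions each server reports. The sketch below assumes each server's configuration has been dumped to text with `qmgr -c "print server"` and that queues appear there as `create queue NAME` lines; verify that format against your own output. The helper names and the queue name `batch` in the sample dumps are hypothetical. If the two servers truly shared one configuration, both difference sets would be empty, although identical queue lists alone do not prove sharing, since the queues may simply have been created twice, as in point 2.

```python
from __future__ import annotations

import re

# Matches lines such as "create queue batch" in a qmgr "print server" dump
# (assumed format; check it against your servers' actual output).
CREATE_QUEUE = re.compile(r"^\s*create queue\s+(\S+)", re.MULTILINE)


def queue_names(qmgr_dump: str) -> set[str]:
    """Return the queue names defined in one server's qmgr dump."""
    return set(CREATE_QUEUE.findall(qmgr_dump))


def queue_differences(dump_a: str, dump_b: str) -> tuple[set[str], set[str]]:
    """Return (queues only on server A, queues only on server B)."""
    a, b = queue_names(dump_a), queue_names(dump_b)
    return a - b, b - a


# Hypothetical dumps: the queue still exists on server 1 but was deleted on server 2.
server1_dump = "create queue batch\nset queue batch queue_type = Execution\n"
server2_dump = "#\n# Set server attributes.\n#\nset server scheduling = True\n"
only_on_1, only_on_2 = queue_differences(server1_dump, server2_dump)
print(only_on_1, only_on_2)
```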
3. The server_priv/ directory is not shared between the two servers; only server_name is shared via NFS. I also set the lock file with
qmgr -c "set server lock_file=/nfs/path", but it is hard to tell whether this was successful.
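Point 3 concerns the lock file. As a generic illustration of how a lock file can arbitrate between an active and a standby server, the Python sketch below lets only the first process that takes an exclusive `flock` on the file become active. This is not TORQUE's code, and which locking call pbs_server actually uses is not stated here. The path and the function name `try_become_active` are hypothetical. Lock semantics on NFS depend on the NFS version and client configuration, so a test on local disk like this one does not show that the lock works across the two VMs.

```python
import fcntl
import os
import tempfile


def try_become_active(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file descriptor if the caller now holds the lock
    (i.e. acts as the active server), or None if someone else holds it.
    Closing the returned descriptor releases the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another open file description already holds the lock: stay standby.
        os.close(fd)
        return None
    return fd


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "server.lock")  # hypothetical lock file location
    server1 = try_become_active(path)  # first "server" wins the lock
    server2 = try_become_active(path)  # second "server" stays on standby
    print("server1 active:", server1 is not None)
    print("server2 active:", server2 is not None)
    os.close(server1)  # releasing the lock would let a standby take over
```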
Can anyone suggest how to check whether I have actually set up a highly available server? I start the service
on both PBS servers by
Thanks in advance.
torqueusers mailing list
torqueusers <at> supercluster.org