Lars Ellenberg | 27 Feb 16:52 2015

This Mailing List is considered deprecated. Please subscribe to users <at>

One of the things that was agreed at the recent cluster summit was a
consolidation of cluster-related IRC channels, mailing lists, and websites.

In keeping with this, the linux-ha mailing list should now be considered
deprecated.
We still want to hear your questions (and answers!), but you'll need to
subscribe to the 'users' list at:

The mailing list itself will automatically remind posters of this
(at most) once a week.

Apologies for the inconvenience.

	Lars Ellenberg

Linux-HA mailing list
Linux-HA <at>
See also:

jarek | 26 Feb 09:36 2015

Automatic Postgres recovery


I've set up a replicated PostgreSQL cluster with Pacemaker and Corosync.

A testing instance of this cluster runs on virtual machines, and
sometimes after a host restart postgres remains unsynchronized and I need
to manually set up the slave database with pg_basebackup. Fortunately this
problem has never happened on the production cluster, but I can imagine
that in case of a power failure a similar scenario could happen there too.

Is there any way to configure it so that, in case of a serious slave failure,
it will automatically perform the recovery process with pg_basebackup?
best regards
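For what it's worth, the manual re-clone can at least be scripted. This is only a sketch, not something tested against your setup: the resource name (pgsql), master hostname (db-master) and PGDATA path are all assumptions, and automating it safely from within the cluster would need a custom resource agent or wrapper.

```shell
# Sketch of re-cloning a failed slave with pg_basebackup.
# Resource name, master host and PGDATA are assumptions.
PGDATA=/var/lib/pgsql/data

# Take the slave's resource out of cluster control first.
crm resource stop pgsql

# Wipe the stale data directory and re-clone from the current master.
rm -rf "$PGDATA"/*
pg_basebackup -h db-master -U replication -D "$PGDATA" -X stream -R

# Hand the resource back to the cluster.
crm resource start pgsql
```

Note that `-R` writes a minimal recovery configuration for the slave (available since PostgreSQL 9.3); on older versions you would create recovery.conf yourself.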


Maily Peng | 25 Feb 11:47 2015

conditionally start a resource


I'd like to start a Dummy resource only if a file exists; if not, the
resource should not start.

Thank you .
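One way to do this (a sketch only; the trigger-file path and the idea of wrapping ocf:heartbeat:Dummy are assumptions, not an existing agent feature) is to gate the agent's start action on the file's existence, so the start fails cleanly when the file is missing:

```shell
#!/bin/sh
# Minimal sketch of a file-gated start check, e.g. inside a wrapper
# around ocf:heartbeat:Dummy. The trigger path is an assumption.
TRIGGER_FILE="${TRIGGER_FILE:-/etc/cluster/start-dummy-ok}"

can_start() {
    # Allow the start action only if the trigger file exists.
    [ -f "$TRIGGER_FILE" ]
}

if can_start; then
    echo "start allowed"
else
    echo "start refused"   # a real RA would return a failure code here
fi
```

If you take this route, remember that a failed start counts against the resource's failcount, so you may also want appropriate failure-handling settings.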

Markus Guertler | 24 Feb 23:22 2015

fence_ec2 agent

Dear list,
I was just trying to configure the fence_ec2 stonith agent from 2012, written by Andrew Beekhof. It looks
like this one is not working anymore with newer stonith / cluster versions. Is there any other EC2
agent that is still maintained?

If not, I'll write one myself. However, I'd like to check all options first.



Barry Haycock | 16 Feb 12:48 2015

Maintaining TCP State and configuring Conntrackd

I am building a corosync/pacemaker/haproxy HA load balancer in Active/Active mode using ClusterIP. As
this is built on RHEL 6.5, I am restricted to using pcs to configure the LB.

One of the requirements is to maintain TCP state so that TCP based syslog audit is not lost during a fail over. 

I have two questions: 

1) Is it possible, when using conntrackd to maintain TCP state, to have a seamless transition to the remaining
LB should one of the servers be shut down? The work group in question cannot afford to lose any messages
once the connection has commenced. Some machines will be using a reliable transmission method for syslog,
such as RELP, but others will be using raw TCP.

My testing shows that when sending a large number of raw TCP messages via a single connection, the syslog server
will lose messages when one of the LBs is shut down or put into standby. The client machine will start
ARPing for the MAC address assigned to the VIP until a connection is established with the remaining LB. This
can lose us up to 3 seconds' worth of messages. In reality I don't expect such a large amount of traffic to be
generated via a single connection, but the work group will not accept the solution if we lose any messages.

Will this be a matter of managing the expectations of the work group, that during fail over, messages in
transit will be lost when using raw TCP?

2) I have been looking for instructions to implement conntrackd as a resource using PCS in order to maintain
TCP state and haven't had any luck. All instructions I have found implement conntrackd using cman. 
If anyone has an example for implementing conntrackd via pcs it would be much appreciated.
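For reference, on the pcs 0.9.x shipped with RHEL 6 something along these lines should create a conntrackd clone. This is an untested sketch: the resource name, monitor interval, ClusterIP resource name and constraint are assumptions, so check `pcs resource describe ocf:heartbeat:conntrackd` for the exact parameters on your version.

```shell
# Create a cloned conntrackd resource so state sync runs on both LBs.
# Resource name and monitor interval are arbitrary choices.
pcs resource create conntrackd ocf:heartbeat:conntrackd \
    op monitor interval=30s --clone

# Start conntrackd before the ClusterIP clone on each node
# (the ClusterIP-clone resource name is an assumption).
pcs constraint order start conntrackd-clone then ClusterIP-clone
```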




Vladislav Bogdanov | 13 Feb 15:03 2015

crmsh fails to stop already stopped resource


The following fails with the current crmsh (e4b10ee).
# crm resource stop cl-http-lv
# crm resource stop cl-http-lv
ERROR: crm_diff apparently failed to produce the diff (rc=0)
ERROR: Failed to commit updates to cl-http-lv
# echo $?


Ulrich Windl | 13 Feb 10:38 2015

Q: Resource migration (Xen live migration)


I have some questions about Pacemaker's resource migration. We have a Xen host with some problems (still
to be investigated) that cause some VM disks not to be ready for use.

When trying to migrate a VM from the bad host to a good host through Pacemaker, migration seemed to hang. At
some point the "source VM" was no longer present on the bad host (Unable to find domain 'v09'), but
Pacemaker still tried a migration:
crmd[6779]:   notice: te_rsc_command: Initiating action 100: migrate_from
prm_xen_v09_migrate_from_0 on h05
Only after the timeout did the CRM realize that there was a problem:
crmd[6779]:  warning: status_from_rc: Action 100 (prm_xen_v09_migrate_from_0) on h05 failed (target:
0 vs. rc: 1): Error
After that the CRM still tried a stop on the "source host" (h10) (and on the destination host):
crmd[6779]:   notice: te_rsc_command: Initiating action 98: stop prm_xen_v09_stop_0 on h10
crmd[6779]:   notice: te_rsc_command: Initiating action 26: stop prm_xen_v09_stop_0 on h05

Q1: Is this the way it should work?

Before that we had the same situation (the bad host had been set to "standby") when someone, tired of waiting
so long, destroyed the affected Xen VMs on the source host while the cluster was migrating. Eventually the
VMs came up (restarted instead of being live-migrated) on the good hosts.

Then we shut down OpenAIS on the bad host, installed updates and rebooted the bad host (during reboot
OpenAIS was started (still standby)).
To my surprise Pacemaker thought the VMs were still running on the bad host and initiated a migration. As
there were no source VMs on the bad host, but all the affected VMs were running on some good host, the CRM
shut down the VMs on the good hosts, just to restart them.

Q2: Is this expected behavior? I can hardly believe it!

Lars Ellenberg | 10 Feb 22:24 2015

Announcing the Heartbeat 3.0.6 Release


  If you intend to set up a new High Availability cluster
  using the Pacemaker cluster manager,
  you typically should not care for Heartbeat,
  but use recent releases (2.3.x) of Corosync.

  If you don't care for Heartbeat, don't read further.

Unless you are beekhof... there's a question below ;-)


After 3½ years since the last "officially tagged" release of Heartbeat,
I have seen the need to do a new "maintenance release".

  The Heartbeat 3.0.6 release tag: 3d59540cf28d
  and the change set it points to: cceeb47a7d8f

The main reason for this was that pacemaker more recent than
somewhere between 1.1.6 and 1.1.7 would no longer work properly
on the Heartbeat cluster stack.

Because some of the daemons have moved from "glue" to "pacemaker" proper,
and changed their paths. This has been fixed in Heartbeat.

And because during that time, stonith-ng was refactored, and would still
reliably fence, but not understand its own confirmation message, so it
was effectively broken. This I fixed in pacemaker.

Lukas Kostyan | 6 Feb 18:16 2015

pacemaker doesn't start after cman config

Hi all,

I was following the guide from Clusterlabs but am using Debian Wheezy.
corosync   1.4.2-3
pacemaker  1.1.7-1
cman       3.0.12-3.2+deb7u2

I configured active/passive with no problems, but as soon as I try to
configure active/active with cman, Pacemaker doesn't start anymore. It
doesn't even write anything related to Pacemaker in the logs. Any ideas
how to get a hint?

/etc/init.d/service.d/pcmk is removed

Starting cluster:
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Starting gfs_controld... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
root <at> vm-2:~# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    264   2015-02-06 10:09:15
   2   M    256   2015-02-06 10:08:59
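One thing worth checking (an assumption about your setup, since your output only shows cman starting): with the cman stack, removing /etc/init.d/service.d/pcmk means Pacemaker is no longer launched by cman at all, and it has its own init script that must be started after cman.

```shell
# With the cman stack, pacemakerd is started by its own init script,
# not by the cman one; start it explicitly after cman is up.
service cman start
service pacemaker start

# If it still dies silently, running the daemon directly in a terminal
# often surfaces the error that never reaches the logs.
pacemakerd
```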

Muhammad Sharfuddin | 6 Feb 06:28 2015

how to configure the cluster not to run resources simultaneously


resource 'app2' depends on 'app1', i.e. without app1, app2 can't run.

resource 'app1' is configured to run on node1, while app2 is configured 
to run on node2 via location rules.

cluster is configured to never start the app2 atop node1.

I don't want to run both resources on node2 simultaneously. How can
I configure the cluster not to run app2 on node2 when the cluster has
to run app1 on node2 (due to failure/unavailability of node1)?
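An anti-colocation constraint, combined with an ordering constraint for the dependency, should express this. A sketch in crm shell syntax, assuming 'app1' and 'app2' are your resource ids and that your existing location rules stay in place:

```shell
# app2 must never run on the node where app1 is running; when app1
# fails over to node2, the -inf colocation keeps app2 stopped there.
crm configure <<'EOF'
order ord_app1_before_app2 inf: app1 app2
colocation col_app2_not_with_app1 -inf: app2 app1
EOF
```

Note that with the -inf colocation, app2 will simply stay stopped while app1 occupies node2, which matches what you describe.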



Muhammad Sharfuddin


Dejan Muhamedagic | 30 Jan 21:52 2015

resource-agents 3.9.6 released


We've tagged today (Jan 30) a new stable resource-agents release
(3.9.6) in the upstream repository.

Big thanks go to all contributors! Needless to say, without you
this release would not be possible.

It has been almost two years since the release v3.9.5, hence the
number of changes is quite big. Still, every precaution has been
taken not to introduce regressions.

These are the most significant new features in the linux-ha set:

- new resource agents:


- the drbd agent was removed (it has been deprecated for quite
  some time in favour of ocf:linbit:drbd)