John Scalia | 21 Oct 17:15 2014

Configuring corosync on a CentOS 6.5

Hi all, again,

My network engineer and I have found that our security team configured the VMs' hypervisor to block multicast
traffic. We're not really certain why, or whether we can change that
for at least my 3 systems; he's speaking with them now. Anyway, since you don't have to configure corosync on
CentOS or Red Hat, and there isn't even an /etc/corosync/corosync.conf on
these systems, what problems could I cause by creating a config file, and would the system actually use it on a
restart? I want to try setting the multicast address to a unicast
one, at least for testing.
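For reference, on CentOS 6 the CMAN-based stack reads /etc/cluster/cluster.conf and generates the corosync configuration itself, so a hand-written corosync.conf is generally ignored when cman starts corosync. A minimal sketch of switching that stack to UDP unicast instead (cluster name and hostnames here are just the ones from my other thread; treat this as an untested example):

```
<cluster name="csgha" config_version="2">
  <!-- transport="udpu" makes corosync use UDP unicast instead of multicast -->
  <cman transport="udpu"/>
  <clusternodes>
    <clusternode name="csgha1" nodeid="1"/>
    <clusternode name="csgha2" nodeid="2"/>
    <clusternode name="csgha3" nodeid="3"/>
  </clusternodes>
</cluster>
```

After editing, the cluster stack would need a restart on all nodes for the transport change to take effect.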

This whole setup seems a little odd since CentOS uses CMAN and pacemaker, but corosync is getting started,
and I see all the systems listening on ports 5404 and 5405, similar to:

udp    0    0  *
udp    0    0  *
udp    0    0  *

So, if CentOS uses CMAN and pacemaker, why is corosync still in the mix?
Linux-HA mailing list
Linux-HA <at>
See also:

Robert.Koeppl | 20 Oct 20:51 2014

AUTO: Robert Koeppl is out of the office (returning 23.10.2014)

I will return on 23.10.2014.

Note: This is an automated reply to your message "Re:
[Linux-HA] Remote node attributes support in crmsh" sent on 20.10.2014.

This is the only notification you will receive while
this person is away.


John Scalia | 20 Oct 20:50 2014

New user can't get cman to recognize other systems

Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the
instructions to the letter at:

and everything appears to start normally, but if I run "cman_tool nodes -a", I only see:

Node   Sts   Inc   Joined                Name
   1   M      64   2014-10-20 14:00:00   csgha1
   2   X       0                         csgha2
   3   X       0                         csgha3

On the other systems the output is the same except for which system is shown as joined: each shows only
itself as belonging to the cluster. "pcs status" reflects the same,
with the non-self systems showing offline. I've checked "netstat -an" and see each machine listening on ports
5404 and 5405. The logs are rather involved, but I'm not seeing errors
in them.

Any ideas for where to look for what's causing them to not communicate?
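If multicast is the suspect, one quick sanity check is whether multicast sockets work at all on a single host. The sketch below (group address and port are hypothetical test values, not your cluster's real mcastaddr/mcastport) joins a group on the loopback interface, sends itself a packet, and prints what it receives. It only proves the local stack handles multicast; it says nothing about whether the hypervisor or switches pass multicast between nodes:

```python
import socket
import struct

# Hypothetical test group/port; a real corosync cluster uses its own mcastaddr/mcastport.
GROUP, PORT = "239.192.0.1", 5405

# Receiver: bind and join the multicast group on the loopback interface.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("127.0.0.1"))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(5)

# Sender: force outgoing multicast onto loopback and enable local loopback delivery.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton("127.0.0.1"))
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
tx.sendto(b"totem-test", (GROUP, PORT))

data, addr = rx.recvfrom(1024)
print("received %r from %s" % (data, addr[0]))
```

To test the path between two real nodes, the corosync project's omping tool is the usual next step.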

Vladislav Bogdanov | 20 Oct 08:03 2014

Remote node attributes support in crmsh

Hi Kristoffer,

do you plan to add support for the recently added "remote node attributes"
feature in crmsh?

Currently (at least as of 2.1, and I do not see anything relevant in the
git log) crmsh fails to update the CIB if it contains node attributes for a
remote (bare-metal) node, complaining that a duplicate element was found.
But for bare-metal nodes it is natural to have an ocf:pacemaker:remote
resource whose name equals the remote node's uname (I doubt it can be
configured differently).
If I comment out the check for 'obj_id in id_set', it then fails to update the
CIB because it inserts the primitive definition above into the node section.
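For illustration, the CIB in that situation looks roughly like this (node name and addresses are made up): the <node> entry carrying the attributes and the ocf:pacemaker:remote primitive share the same name, which is what trips the duplicate-element check:

```
<nodes>
  <node id="remote1" uname="remote1" type="remote">
    <instance_attributes id="nodes-remote1">
      <nvpair id="nodes-remote1-site" name="site" value="dc1"/>
    </instance_attributes>
  </node>
</nodes>
<resources>
  <!-- the bare-metal node's connection resource carries the same name -->
  <primitive id="remote1" class="ocf" provider="pacemaker" type="remote">
    <instance_attributes id="remote1-params">
      <nvpair id="remote1-server" name="server" value="192.168.122.10"/>
    </instance_attributes>
  </primitive>
</resources>
```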


Ulrich Windl | 8 Oct 14:25 2014

crm shell: misleading "Do you want to edit again?"


I discovered an inconsistency in crm shell crmsh-1.2.6-0.35.11 (SLES11): when you add a primitive
interactively using an unknown parameter, you can commit the change. However, if you use "crm configure
edit <primitive>", after saving you'll see: "Do you want to edit again?"

My assumption was that answering "no" will keep the changes as written, but in fact the changes seem to be
discarded when answering "no":
Do you want to edit again? no
crm(live)configure# commit
INFO: apparently there is nothing to commit
INFO: try changing something first

What about changing the question to "What now? (Keep|Fix|Revert)?"

(With the obvious semantics: Keep = keep the changes as written, Fix = try again to fix the problem,
Revert = revert to the loaded configuration.)

I assume I did something dirty on my system: I had just updated the RA on one node, so the other node didn't know
about the new RA, but anyway...



Greg Woods | 1 Oct 16:40 2014

Corosync 1 -> 2

I notice that the "network:ha-clustering:Stable" repo for CentOS 6 now
contains Corosync 2.3.3-1. I am currently running 1.4.1-17. Is it safe to
just run this update? Are there configuration changes I have to make in
order for the new version to work? (If there is a document or wiki page
describing how to convert from Corosync 1 to 2, I would be happy to be
pointed to it.)
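For what it's worth, the most visible change between the versions is the configuration format: corosync 2 brings quorum into corosync itself via votequorum and typically uses an explicit nodelist. A rough sketch of a two-node corosync 2 config (names and transport choice are placeholders, not a drop-in conversion of an existing 1.4 setup):

```
totem {
    version: 2
    cluster_name: mycluster    # hypothetical name
    transport: udpu            # unicast; multicast remains the default
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum   # new in corosync 2
    two_node: 1                     # special-case quorum for 2-node clusters
}
```

Note also that corosync 1 and 2 do not speak a compatible wire protocol, so a rolling update of a live cluster is not expected to work.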


Matthias Ferdinand | 29 Sep 14:44 2014

Re: corosync communication stops after link down

On Fri, Sep 26, 2014 at 12:00:04PM -0600, linux-ha-request <at> wrote:
> Message: 1
> Date: Fri, 26 Sep 2014 14:41:41 +0200
> From: Helmut Wollmersdorfer <helmut.wollmersdorfer <at>>
> To: General Linux-HA mailing list <linux-ha <at>>
> Subject: Re: [Linux-HA] corosync communication stops after link down
> Message-ID: <1B2FBDF7-C012-4296-8D51-8597492071D5 <at>>
> Content-Type: text/plain; charset=us-ascii
> Am 24.09.2014 um 22:35 schrieb Matthias Ferdinand <mf <at>>:
> > OS: Ubuntu 14.04 64bit
> > corosync: 2.3.3-1ubuntu1
> > 2 nodes
> > 2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
> >        all with crossover cables, no switches
> > transport: udpu
> So, this bug
> is solved in your version of corosync? It must be, because the cross-over
> point-to-point connection would always fail.

these bug reports are for corosync 1.x and point-to-point interfaces,

Stefan Schloesser | 29 Sep 12:05 2014

Totem: Received message has invalid digest after upgrade of cluster node


I am currently testing the Ubuntu release upgrade from 12.04 to 14.04. With this, the corosync version changes
from 1.4.2 to 2.3.3.
After upgrading a node I wanted to start corosync and shift services to the already upgraded node in order to
upgrade the primary.

Unfortunately I get the following error:
Totem: Received message has invalid digest

I presume this is due to the big difference in corosync versions. So is it in principle not possible to have
nodes with such a big difference in version in the same cluster?

My workaround would be to stop corosync on all involved nodes, start the services manually on the already
upgraded node, upgrade the remaining node, and then, with all nodes on the same version, hope
that the cluster starts again.

Would that be the correct procedure?


Linux-HA mailing list
Linux-HA <at>
See also:

Matthias Ferdinand | 24 Sep 22:35 2014

corosync communication stops after link down

OS: Ubuntu 14.04 64bit
corosync: 2.3.3-1ubuntu1
2 nodes
2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
        all with crossover cables, no switches
transport: udpu

If the cluster is up for some time (here: ~1 week) and one node is
rebooted, corosync on the surviving node (no-carrier on all
corosync-related interfaces) does not resume
sending packets when the links come up again after the peer finishes rebooting
(3-4 minutes of link down; tcpdump on both nodes, on both em1 and bond0,
shows no packets from the surviving node). The rebooted node then cannot
see any neighbor and consequently decides to stonith the peer before
starting resources. But the resources still cannot run until the
stonith'd node has completely rebooted, because the drbd volumes became
outdated at "shutdown -r now" time.

Subsequent reboots do not show any problems. Repeat after ~ 1 week
uptime, and the problem shows up again.

This happened on two different cluster installs with roughly the same
hardware (Dell PowerEdge R520 and R420, respectively; onboard Broadcom BCM5720 (em1),
2x2-port Intel I350 (p2p1, p1p1)).

Any ideas?

  Matthias Ferdinand

Atul Yadav | 23 Sep 13:25 2014

Heartbeat fail-over Email Alert

Dear Team,

In our environment we are using the heartbeat method for storage HA.

Our storage HA is working fine under Heartbeat management.

Now we need your guidance on setting up an email alert for when a fail-over
starts and when it completes.

We have already set up SMTP on both servers,
and we are able to send mail from a terminal window.

Please guide us.
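One common approach with classic heartbeat (haresources-style, R1 mode) is the bundled MailTo resource agent: appended to a resource line, it "starts" on takeover and "stops" on release, sending a mail each time. A sketch only; the node name, resources, address, and subject below are placeholders:

```
# /etc/ha.d/haresources (sketch): MailTo fires on takeover and release
node1 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 \
      MailTo::admin@example.com::StorageFailover
```

In a Pacemaker-managed setup the equivalent is an ocf:heartbeat:MailTo primitive grouped with the storage resources.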

Thank You
Atul Yadav

Oliver Weichhold | 18 Sep 17:51 2014

Unable to start any node of pgsql Master/Slave Cluster

I'm currently struggling a bit with setting up a PostgreSQL 9.3 HA
Master/Slave cluster on CentOS 7 (Corosync 2 and Pacemaker 1.1.10).

Please note that I've deliberately stopped node2 for now in order to keep
the scenario simpler (I hope).

After starting the cluster on node1, crm_mon shows the following:

Stack: corosync
Current DC: node1 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
4 Resources configured

Online: [ node1 ]
OFFLINE: [ node2 ]

Full list of resources:

 Master/Slave Set: pgsql_master_slave [pgsql]
     Stopped: [ node1 node2 ]
 Resource Group: master-group
     pgsql_vip_rep      (ocf::heartbeat:IPaddr2):       Stopped
     pgsql_forward_listen_port  (ocf::heartbeat:portforward):   Stopped

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-status                      : STOP
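One thing that stands out in the crm_mon output above is "partition WITHOUT quorum": with node2 stopped, a two-node cluster loses quorum, and by default Pacemaker then refuses to start resources at all, which alone would explain everything being Stopped. A common way to make a deliberately degraded two-node test setup run is the corosync two_node option (sketch; whether this is appropriate outside testing depends on your fencing setup):

```
# /etc/corosync/corosync.conf (fragment)
quorum {
    provider: corosync_votequorum
    two_node: 1    # let a 2-node cluster retain quorum with one node up
}
```

Alternatively, `pcs property set no-quorum-policy=ignore` tells Pacemaker to keep running resources without quorum.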
