Matthias Ferdinand | 29 Sep 14:44 2014

Re: corosync communication stops after link down

On Fri, Sep 26, 2014 at 12:00:04PM -0600, linux-ha-request <at> lists.linux-ha.org wrote:
> Message: 1
> Date: Fri, 26 Sep 2014 14:41:41 +0200
> From: Helmut Wollmersdorfer <helmut.wollmersdorfer <at> fixpunkt.de>
> To: General Linux-HA mailing list <linux-ha <at> lists.linux-ha.org>
> Subject: Re: [Linux-HA] corosync communication stops after link down
> Message-ID: <1B2FBDF7-C012-4296-8D51-8597492071D5 <at> fixpunkt.de>
> Content-Type: text/plain; charset=us-ascii
> 
> 
> Am 24.09.2014 um 22:35 schrieb Matthias Ferdinand <mf <at> 14v.de>:
> 
> > OS: Ubuntu 14.04 64bit
> > corosync: 2.3.3-1ubuntu1
> > 2 nodes
> > 2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
> >        all with crossover cables, no switches
> > transport: udpu
> 
> 
> So, this bug 
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=746269
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=821352
> 
> is solved in your version of corosync? It must be, because otherwise the
> cross-over point-to-point connection would always fail.

these bug reports are for corosync 1.x and point-to-point interfaces,

Stefan Schloesser | 29 Sep 12:05 2014

Totem: Received message has invalid digest after upgrade of cluster node

Hi,

I am currently testing the Ubuntu release upgrade from 12.04 to 14.04. With this, the corosync version changes
from 1.4.2 to 2.3.3.
After upgrading one node, I wanted to start corosync and shift the services to the already upgraded node in order to
upgrade the primary.

Unfortunately I get the following error:
Totem: Received message has invalid digest

I presume this is due to the large difference in corosync versions. So is it in principle not possible to have
nodes with such a large version difference in the same cluster?

My workaround would be to stop corosync on all involved nodes, start the services manually on the already
upgraded node, upgrade the remaining node, and then hope that the cluster starts again once all nodes
run the same version.

Would that be the correct procedure?

Stefan

_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Matthias Ferdinand | 24 Sep 22:35 2014

corosync communication stops after link down

OS: Ubuntu 14.04 64bit
corosync: 2.3.3-1ubuntu1
2 nodes
2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
        all with crossover cables, no switches
transport: udpu

If the cluster has been up for some time (here: ~1 week) and one node is
rebooted, corosync on the surviving node (no-carrier on all
corosync-related interfaces) does not resume
sending packets when the links come up again after the peer has finished rebooting
(3-4 minutes link down; tcpdump on both nodes, on both em1 and bond0,
shows no packets from the surviving node). The rebooted node then cannot
see any neighbor and consequently decides to stonith the peer before
starting resources. But the resources still cannot run until the
stonith'd node is completely rebooted, because the drbd volumes became
outdated at "shutdown -r now" time.

Subsequent reboots do not show any problems. Repeat after ~ 1 week
uptime, and the problem shows up again.

This happened on two different cluster installs with roughly the same
hardware (Dell Poweredge R520 resp. R420, onboard Broadcom BCM5720 (em1),
2x2port Intel I350 (p2p1,p1p1)).

Any ideas?

Regards
  Matthias Ferdinand

Atul Yadav | 23 Sep 13:25 2014

Heartbeat fail-over Email Alert

Dear Team ,

In our environment we use Heartbeat for storage HA.

Our storage HA is working fine under Heartbeat management.

Now we need your guidance on setting up an email alert that is sent when a
fail-over starts and when the fail-over has completed.

We have already set up SMTP on both servers,
and we are able to send mail from a terminal window.
Storage1
Storage2

Please guide us.

Thank You
Atul Yadav

Oliver Weichhold | 18 Sep 17:51 2014

Unable to start any node of pgsql Master/Slave Cluster

I'm currently struggling a bit with setting up a PostgreSQL 9.3 HA
Master/Slave Cluster using CentOS 7 (Corosync 2 and Pacemaker 1.1.10).

Please note that I've deliberately stopped node2 currently in order to keep
the scenario simpler (I hope).

After starting the cluster on node1, crm_mon shows the following:

Stack: corosync
Current DC: node1 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
4 Resources configured

Online: [ node1 ]
OFFLINE: [ node2 ]

Full list of resources:

 Master/Slave Set: pgsql_master_slave [pgsql]
     Stopped: [ node1 node2 ]
 Resource Group: master-group
     pgsql_vip_rep      (ocf::heartbeat:IPaddr2):       Stopped
     pgsql_forward_listen_port  (ocf::heartbeat:portforward):   Stopped

Node Attributes:
* Node node1:
    + master-pgsql                      : -INFINITY
    + pgsql-status                      : STOP


Ulrich Windl | 12 Sep 08:06 2014

Re: Antw: Re: Postgresql RA fails starting master node

[...]
    If I use:  *ocf_log err "$OCF_RESKEY_config"*   in pgsql
    Where do I have to check this print? Because I’m not seeing it in
corosync.log.
[...]

It depends on which log you configured. In my configuration (and probably yours
also) these messages should go to syslog. Maybe try ;-)
ocf_log err "HEY, LOOK here: $OCF_RESKEY_config"
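
To check where such a marker actually ends up, one could search the likely log files for it. The following is only a minimal sketch: the helper name find_marker and the list of log paths are assumptions (typical defaults, not confirmed anywhere in this thread), so adjust them to whatever your logging setup uses.

```shell
#!/bin/sh
# Hypothetical helper: print every readable file from the argument list
# that contains the given marker string. Unreadable or missing files
# are skipped silently.
find_marker() {
    marker=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        # -F: treat the marker as a fixed string, not a regular expression
        if grep -F -- "$marker" "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}

# Example call; the paths below are assumed defaults and may differ
# on your distribution.
find_marker "HEY, LOOK here" \
    /var/log/syslog /var/log/messages /var/log/corosync/corosync.log
```

If the function prints nothing, the message went to none of the listed files, and the logging configuration of the cluster stack is the next place to look.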

Federico Castro | 11 Sep 22:45 2014

Re: Antw: Re: Postgresql RA fails starting master node

Using ocf-tester I get:

ocf-tester -n pgsql -o repuser="ha" -o pgdba="postgres" -o
restart_on_promote="true" -o pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" -o
psql="/usr/lib/postgresql/9.1/bin/psql" -o
pgdata="/var/lib/postgresql/9.1/main/" -o
config="/etc/postgresql/9.1/main/postgresql.conf" -o rep_mode="async" -o
node_list="pz01 pz02" -o restore_command="cp
/var/lib/postgresql/9.1/main/archive/%f %p" -o
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
keepalives_count=5" -o master_ip="10.10.10.80" -o stop_escalate="0"
/usr/lib/ocf/resource.d/heartbeat/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
/usr/sbin/ocf-tester: 268: export: /var/lib/postgresql/9.1/main/archive/%f:
bad variable name

Is this the reason why I get `invalid parameter`? Do you know what is
wrong there?
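
The "bad variable name" message is consistent with the restore_command value being split on whitespace before it reaches export, so that the archive path arrives as a separate argument. That is an assumption about how the value is handled, not something confirmed in this thread. The sketch below only demonstrates the general shell behaviour (unquoted versus quoted expansion); count_words is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it received.
count_words() {
    echo "$#"
}

value='cp /var/lib/postgresql/9.1/main/archive/%f %p'

# Unquoted expansion: the shell splits the value on whitespace, so a
# command would receive "cp", the archive path and "%p" as separate
# arguments.
unquoted=$(count_words $value)

# Quoted expansion: the value stays a single argument.
quoted=$(count_words "$value")

echo "unquoted: $unquoted arguments, quoted: $quoted argument"
```

If this is indeed what happens, any -o value containing spaces would be affected in the same way, which would also explain why the run without restore_command gets past this point.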

And without restore_command:

ocf-tester -n msPgsql -o repuser="ha" -o pgdba="postgres" -o
restart_on_promote="true" -o pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" -o
psql="/usr/lib/postgresql/9.1/bin/psql" -o
pgdata="/var/lib/postgresql/9.1/main/" -o
config="/etc/postgresql/9.1/main/postgresql.conf" -o rep_mode="async" -o
node_list="pz01 pz02" -o primary_conninfo_opt="keepalives_idle=60
keepalives_interval=5 keepalives_count=5" -o master_ip="10.10.10.80" -o
stop_escalate="0" /usr/lib/ocf/resource.d/heartbeat/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...

Federico Castro | 11 Sep 21:52 2014

Re: Antw: Re: Postgresql RA fails starting master node

Using ocf-tester I get:

ocf-tester -n pgsql -o repuser="ha" -o pgdba="postgres" -o
restart_on_promote="true" -o pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" -o
psql="/usr/lib/postgresql/9.1/bin/psql" -o
pgdata="/var/lib/postgresql/9.1/main/" -o
config="/etc/postgresql/9.1/main/postgresql.conf" -o rep_mode="async" -o
node_list="pz01 pz02" -o restore_command="cp
/var/lib/postgresql/9.1/main/archive/%f %p" -o
primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
keepalives_count=5" -o master_ip="10.10.10.80" -o stop_escalate="0"
/usr/lib/ocf/resource.d/heartbeat/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...
/usr/sbin/ocf-tester: 268: export: /var/lib/postgresql/9.1/main/archive/%f:
bad variable name

Is this the reason why I get `invalid parameter`? Do you know what is
wrong there?

And without restore_command:

ocf-tester -n msPgsql -o repuser="ha" -o pgdba="postgres" -o
restart_on_promote="true" -o pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" -o
psql="/usr/lib/postgresql/9.1/bin/psql" -o
pgdata="/var/lib/postgresql/9.1/main/" -o
config="/etc/postgresql/9.1/main/postgresql.conf" -o rep_mode="async" -o
node_list="pz01 pz02" -o primary_conninfo_opt="keepalives_idle=60
keepalives_interval=5 keepalives_count=5" -o master_ip="10.10.10.80" -o
stop_escalate="0" /usr/lib/ocf/resource.d/heartbeat/pgsql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/pgsql...

Federico Castro | 10 Sep 20:49 2014

Postgresql RA fails starting master node

Hi all,

I'm working on a two-node cluster with pacemaker and postgresql.
For some reason I don't really understand, the pgsql RA fails to start postgres
on the first node.
If I start postgresql manually on my two nodes, then replication works
correctly.

I would really appreciate some clue on what to check from my installation
or configuration.

Thanks in advance.

I'm using:
OS: Debian 7
RA: resource-agents    1:3.9.2-5+deb7u2
      but using pgsql RA from
https://raw.githubusercontent.com/ClusterLabs/resource-agents/a6f4ddf76cb4bbc1b3df4c9b6632a6351b63c19e/heartbeat/pgsql
Pacemaker: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
Postgresql: 9.1+134wheezy4

CRM actual state:

============
Last updated: Thu Aug 28 12:58:51 2014
Last change: Thu Aug 28 12:58:46 2014 via crmd on pz01
Stack: openais
Current DC: pz01 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes

Alan Robertson | 10 Sep 02:31 2014

Anyone use Neo4j? Is there interest in a resource agent for Neo4j?

Hi,

I use the Neo4j graph database in the Assimilation project, and the
Assimilation code uses OCF RAs (among others).

So I wrote a neo4j resource agent.  I'm currently publishing it as part
of the Assimilation project code - because it's convenient for me.

If there is interest from others who would use it, I'd be happy to
provide it individually, or maybe even as part of the resource-agents
package.

The reason for me not to do that is that the distros lag far behind
(usually years behind) current source.  So even if I published it there,
it would likely be years before I could stop publishing my own copy.

[I confess I haven't yet written the metadata for the agent - because I
don't need it.  If someone wants to use it, I'd be happy to take a patch
with metadata, or *gasp* write it myself].

As an aside:
The reason I use a graph database (like Neo4j) is this: we model data
centers (servers, applications, networks, IPs, MACs, switch connections,
dependencies, etc) -- and almost all interesting questions about data
centers are naturally graph questions.

    -- Alan Robertson
       alanr <at> assimilationsystems.com OR alanr <at> unix.sh
       http://assimilationsystems.com/

Ulrich Windl | 9 Sep 16:20 2014

FYI: Patched ocf:pacemaker:ping RA

Hi!

Here's the patch I made today to the ping RA of pacemaker (current version from SLES11 SP3). Basically I wanted
the RA to use ping even if fping is found on the system. Anyway, here it is (edited, because it's one of 14
patches; all tabs were expanded to spaces through copy from PuTTY and paste to Windows):
---
From 63f5d42d316f562a8c8ebc4bed6dff4859a9fc57 Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl <at> RZ.Uni-Regensburg.DE>
Date: Tue, 9 Sep 2014 15:26:33 +0200
Subject: [PATCH 1/1] Changed ping from pacemaker (SLES11 SP3)

Change ping: Parameter "pidfile" is "unique" now.  Improve description of
"dampen" parameter.  Indicate the correct default for "multiplier" and
"attempts".  Add parameter "flavor" to select ping or fping.  Fix output of
ping_usage().  Use options also for fping.  Only use fping if ping was not
selected.
---
 ping        |   27 +-

diff --git a/ping b/ping
index b9a69b8..adb7682 100755
--- a/ping
+++ b/ping
@@ -40,7 +40,7 @@ meta_data() {
 <?xml version="1.0"?>
 <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
 <resource-agent name="ping">
-<version>1.0</version>
+<version>1.1</version>
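
The selection rule from the commit message ("Only use fping if ping was not selected") could look roughly like the sketch below. This is not the patched RA code: the function name choose_ping_tool, the way the flavor value is passed in, and the fallback to ping when fping is not installed are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: choose the probe tool from a "flavor" setting.
# Names and argument handling are illustrative, not taken from the RA.
choose_ping_tool() {
    flavor=$1
    if [ "$flavor" = "ping" ]; then
        # ping was selected explicitly: use it even if fping exists
        echo ping
    elif command -v fping >/dev/null 2>&1; then
        # ping was not selected and fping is installed
        echo fping
    else
        # assumed fallback when fping is not available
        echo ping
    fi
}

choose_ping_tool ping
```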


