Dejan Muhamedagic | 21 Jul 16:49 2014

glue 1.0.12 released

Hello,

The current glue repository has been tagged as 1.0.12.

It's been a while since the release candidate 1.0.12-rc1. There
were a few minor fixes and additions in the meantime, mostly for
hb_report.

Please upgrade at the earliest possible opportunity.

You can get the 1.0.12 tarball here:

	http://hg.linux-ha.org/glue/archive/glue-1.0.12.tar.bz2

The ChangeLog is available here:

http://hg.linux-ha.org/glue/file/glue-1.0.12/ChangeLog

A set of rpms is also available at the openSUSE Build Service:*)

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

*) The packages at the openSUSE Build Service will not work with
pacemaker versions earlier than v1.1.8 because the LRM bits are
not compiled.

Many thanks to all contributors. Without you this release would
not have been possible.

Enjoy!

Nirmal Fernando | 21 Jul 13:48 2014

Errors when starting heartbeat on CentOS

Hi All,

I was trying to configure Heartbeat on 2 AWS EC2 instances (CentOS) and
am currently facing the following error [1].

The installed kernel and heartbeat packages are:

[root@node01 stratos]# rpm -qa | egrep 'heartbeat|kernel-2.6'
kernel-2.6.32-431.5.1.el6.x86_64
kernel-2.6.32-279.1.1.el6.x86_64
heartbeat-3.0.4-2.el6.x86_64
kernel-2.6.32-431.11.2.el6.x86_64
heartbeat-libs-3.0.4-2.el6.x86_64
kernel-2.6.32-431.17.1.el6.x86_64
kernel-2.6.32-431.20.3.el6.x86_64

Any help is appreciated.

[1]
Jul 21 10:22:25 node01 heartbeat: [3083]: info: **************************
Jul 21 10:22:25 node01 heartbeat: [3083]: info: Configuration validated. Starting heartbeat 3.0.4
Jul 21 10:22:25 node01 heartbeat: [3084]: info: heartbeat: version 3.0.4
Jul 21 10:22:25 node01 heartbeat: [3084]: info: Heartbeat generation: 1405925294
Jul 21 10:22:25 node01 heartbeat: [3084]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Jul 21 10:22:25 node01 heartbeat: [3084]: info: glib: ucast: bound send socket to device: eth0


Charles Taylor | 17 Jul 17:24 2014

Managed Failovers w/ NFS HA Cluster

I feel like this must have been covered extensively already, but after a lot of googling and looking at a
lot of cluster configs I have not found the solution.

I have an HA NFS cluster (corosync+pacemaker).  The relevant rpms are listed below but I'm not sure they are
that important to the question which is this...

When performing managed failovers of the NFS-exported file system resource from one node to the other (crm
resource move), any active NFS clients experience an I/O error when the file system is unexported.  In
other words, you must unexport it to unmount it.  As soon as it is unexported, clients are no longer able to
write to it and experience an I/O error (rather than just blocking).

In a failure scenario this is not a problem because the file system is never unexported on the primary
server.  Rather the server just goes down, the secondary takes over the resources and client I/O blocks
until the process is complete and then goes about its business.   We would like this same behavior for a
*managed* failover but have not found a mount or export option/scenario that works.   Is it possible?  What
am I missing?

I realize this is more of an nfs/exportfs question but I would think that those implementing NFS HA clusters
would be familiar with the scenario I'm describing.

Regards,

Charlie Taylor

pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
resource-agents-3.9.2-40.el6.x86_64
fence-agents-3.1.5-35.el6.x86_64

willi.fehler@t-online.de | 18 Jul 13:05 2014

DRBD on CentOS7

Hello,

I'm trying to use DRBD on CentOS7. It looks like RedHat hasn't compiled DRBD into the Kernel.
So I downloaded the source rpm from Fedora 19 and created my own rpm.

[root@centos7 ~]# rpm -qa | grep drbd
drbd-utils-8.4.3-2.el7.centos.x86_64
drbd-8.4.3-2.el7.centos.x86_64
drbd-udev-8.4.3-2.el7.centos.x86_64

But I cannot load the drbd kernel module:

[root@centos7 ~]# modprobe drbd
modprobe: FATAL: Module drbd not found.

Regards - Willi
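
One quick sanity check for the error above is to look for a drbd module file under the running kernel's module tree. The following is only a minimal sketch and is not taken from the post: the helper name `find_module` and the list of file suffixes are assumptions, and modprobe itself consults its dependency index rather than walking the tree, so an empty result is a hint, not a proof.

```python
import os
import platform

# Suffixes a kernel module file may carry (plain or compressed).
# This list is an assumption made for the sketch.
SUFFIXES = (".ko", ".ko.xz", ".ko.gz")


def find_module(name, root=None):
    """Return sorted paths under `root` whose file name is `name` plus a module suffix."""
    if root is None:
        # Default to the module tree of the currently running kernel.
        root = os.path.join("/lib/modules", platform.release())
    wanted = {name + suffix for suffix in SUFFIXES}
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        hits.extend(os.path.join(dirpath, f) for f in files if f in wanted)
    return sorted(hits)
```

Calling `find_module("drbd")` on the affected machine returns a list of matching module files. An empty list would suggest that no drbd kernel module is installed for the running kernel, whatever userland packages are present.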

_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Felipe Lima | 15 Jul 14:33 2014

Ha.cf with IPv6 address

Dear,

I am using the following configuration on my server:

ubuntu 12.04 + heartbeat 1:3.0.5-3ubuntu

Is it possible to configure an IPv6 address in the ha.cf file?
With what I believe is the proper setting, I get the following error messages:

Starting High-Availability services: ipv6addr[3390]: INFO: Resource is stopped
Heartbeat failure [rc=6]. Failed.

heartbeat[3436]: 2014/07/14_15:59:32 ERROR: glib: ucast: can not resolve hostname
heartbeat[3436]: 2014/07/14_15:59:32 ERROR: glib: ucast: Interface [eth0] does not exist
heartbeat[3436]: 2014/07/14_15:59:32 ERROR: Heartbeat not started: configuration error.
heartbeat[3436]: 2014/07/14_15:59:32 ERROR: Configuration error, heartbeat not started.

Thank you in advance for your help.

Best Regards,
Felipe Lima

Kristoffer Grönlund | 30 Jun 09:35 2014

Announcing crmsh release 2.1

Today we are proud to announce the release of `crmsh` version 2.1!
This version primarily fixes all known issues found since the release
of `crmsh` 2.0 in April, but also has some major new features.

A massive thank you to everyone who has helped out with bug fixes,
comments and contributions for this release!

For a complete list of changes since the previous version, please
refer to the changelog:

* https://github.com/crmsh/crmsh/blob/2.1.0/ChangeLog

Packages for several popular Linux distributions can be downloaded
from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

(Will be updated soon)

Archives of the tagged release:

* https://github.com/crmsh/crmsh/archive/2.1.0.tar.gz
* https://github.com/crmsh/crmsh/archive/2.1.0.zip

Here are some of the highlights of this release:

== Rule expressions in attribute lists

One of the biggest features in this release is full support for rule

Daniel Thielking | 26 Jun 13:13 2014

IP-Address problem with cluster

Dear members!
I have a problem with our cluster.
We have two nodes, each with its own IP, plus one more IP for the cluster.
All our resources run on node-1, and node-2 is the standby. But if I
ping from node-1 to any other machine, it uses the cluster IP as the
source address instead of its own IP. Do you know how to fix that?

Thank You for your help!

-- 
_____________________________________________________

Auszubildender Fachinformatiker für Systemintegration
RWTH Aachen
Lehrstuhl für Integrierte Analogschaltungen
Raum 24C 313
Walter-Schottky-Haus
Sommerfeldstr. 24
D-52074 Aachen

www.ias.rwth-aachen.de

Email: Daniel.Thielking <at> ias.rwth-aachen.de
Phone: +49-(0)241-80-27771
   FAX: +49-(0)241-80-627771
_____________________________________________________


Pasi Kärkkäinen | 24 Jun 22:20 2014

heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)

Hello!

I've been seeing heartbeat cluster problems in Linux-based Vyatta and more recent VyOS
networking/router appliances.
These are currently based on Debian Squeeze, and thus are using:

Package: heartbeat
Version: 1:3.0.3-2

VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244

The problem is that when (unexpected) networking problems cause multicast issues, which in turn
disrupt the inter-cluster communications, the heartbeat processes die on the cluster nodes,
which is bad, right? I assume heartbeat should never die, especially not because of temporary networking issues.

I've also seen heartbeat dying because of temporary network maintenance breaks..

Basically, I first see messages like these:

Jun 23 17:55:02 vyos03 heartbeat: [4119]: WARN: node vyos01: is dead
Jun 23 17:59:23 vyos03 heartbeat: [4119]: CRIT: Cluster node vyos01 returning after partition.
Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Deadtime value may be too small.
Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Late heartbeat: Node vyos01: interval 273580 ms
Jun 23 17:59:23 vyos03 harc[4961]: info: Running /etc/ha.d//rc.d/status status
Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Releasing resource group: vyos01 IPaddr2-vyatta::10.0.0.10/24/eth1
Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Running /etc/ha.d/resource.d/IPaddr2-vyatta 10.0.0.10/24/eth1 stop
Jun 23 17:59:26 vyos03 heartbeat: [4119]: WARN: 1 lost packet(s) for [vyos01] [421:423]
Jun 23 17:59:39 vyos03 heartbeat: [4119]: WARN: Logging daemon is disabled --enabling logging daemon is recommended

fank | 20 Jun 21:18 2014

unable to recover from split-brain in a two-node cluster

Hi,

New to this list and hope I can get some help here.

I'm using pacemaker 1.0.10 and heartbeat 3.0.5 for a two-node cluster. I run into a split-brain problem
when heartbeat messages are occasionally dropped while the system is under high load. The real problem,
however, is that the cluster never recovers once the system load drops again.

I created a test setup to reproduce this by setting the dead time to 6 seconds and using iptables to
repeatedly drop one-way heartbeat packets (udp dst port 694) for 5~8 seconds and then let the traffic
through for 1~2 seconds. After the system got into a split-brain state, I stopped the test and allowed all
heartbeat traffic to go through. Sometimes the system recovered, but sometimes it didn't. There are
various symptoms when the system didn't recover from split-brain:

1. In one instance, cl_status listnodes becomes empty. The syslog keeps showing
2014-06-19T18:59:57+00:00 node-0 heartbeat:  [daemon.warning] [2853]: WARN: Message hist queue is filling up (436 messages in queue)
2014-06-19T18:59:57+00:00 node-0 heartbeat:  [daemon.debug] [2853]: debug: hist->ackseq =12111
2014-06-19T18:59:57+00:00 node-0 heartbeat:  [daemon.debug] [2853]: debug: hist->lowseq =12111, hist->hiseq=12547
2014-06-19T18:59:57+00:00 node-0 heartbeat:  [daemon.debug] [2853]: debug: expecting from node-1
2014-06-19T18:59:57+00:00 node-0 heartbeat:  [daemon.debug] [2853]: debug: it's ackseq=12111

2. In another instance, cl_status nodestatus <node> shows both nodes as active, but "crm_mon -1" shows
that each of the two nodes considers itself the DC and the peer node offline. The pengine process is running
on one node only. On the node not running pengine (which still considers itself the DC), the log shows that
crmd terminated pengine because it detected that the peer was active. After that, the peer status kept
flapping between dead and active, but pengine was never started again. The last log entry shows the peer
as active (after I stopped the test and allowed all traffic). However, "crm_mon -1" shows the node itself
as the DC and the peer as offline:

[root@node-1 ~]# crm_mon -1
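
The test procedure described in this post can be sketched roughly as follows. This is an illustrative reconstruction, not the poster's actual script: it assumes the DROP rule is inserted into the INPUT chain of one node (which makes the loss one-way), and the names `rule` and `flap` are hypothetical. With `dry_run=True` it only returns the iptables commands it would run and neither sleeps nor touches the firewall.

```python
import random
import subprocess
import time

HEARTBEAT_PORT = 694  # UDP destination port named in the post


def rule(action):
    """Build the iptables argv that inserts ("-I") or deletes ("-D") the DROP rule."""
    return ["iptables", action, "INPUT", "-p", "udp",
            "--dport", str(HEARTBEAT_PORT), "-j", "DROP"]


def flap(cycles, dry_run=True):
    """Alternate between dropping (5-8 s) and passing (1-2 s) heartbeat packets.

    Returns the list of commands in the order they are (or would be) issued.
    """
    issued = []
    for _ in range(cycles):
        # Each cycle: insert the DROP rule, wait, delete it again, wait.
        for action, low, high in (("-I", 5, 8), ("-D", 1, 2)):
            cmd = rule(action)
            issued.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)  # needs root privileges
                time.sleep(random.uniform(low, high))
    return issued
```

Running `flap(10, dry_run=False)` as root on one node would go through ten drop/pass cycles. Every cycle ends by deleting the rule, so heartbeat traffic flows normally again once the function returns.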

Ilo Lorusso | 19 Jun 09:54 2014

ldirectord question

Hi,

I have a general question about how ldirectord works. I have set up my
virtual service and real servers.

I have an active connection, and traffic is flowing through to the real
server perfectly, as shown below.

Is it possible to move an established connection between the real
servers without resetting the connection?

[root@lbmaster ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=32768)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.16.162.190:40054 wlc persistent 300
  -> 172.16.162.199:40054         Masq    100    1          0
  -> 172.16.162.200:40054         Masq    99     0          0

Bauer, Stefan (IZLBW Extern | 18 Jun 08:42 2014

Patch/recommendation for ocf:heartbeat:Filesystem cifs

Dear Users/Developers,

we're using ocf:heartbeat:Filesystem, but it fails to unmount cifs mounts if the cifs server has gone down.
Please consider adding -l (lazy umount) to the umount_force variable in the RA.

With the above option in use, we could unmount the cifs share cleanly without running into any timeouts.

Cheers

Stefan
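
The effect of the proposed change can be illustrated with a small sketch. It does not reproduce the resource agent's code: the function name `umount_command` is hypothetical, and two things are assumptions of this sketch rather than part of the suggestion above, namely limiting `-l` to cifs and combining it with `-f`. Both `-f` (force) and `-l` (lazy) are standard umount options.

```python
def umount_command(mountpoint, fstype, force=True):
    """Build a umount argv; add -l (lazy) for cifs when a forced unmount is requested."""
    cmd = ["umount"]
    if force:
        cmd.append("-f")
        if fstype == "cifs":
            # Lazy unmount: detach the mount point now and finish the
            # cleanup once it is no longer busy.
            cmd.append("-l")
    cmd.append(mountpoint)
    return cmd
```

For example, `umount_command("/mnt/share", "cifs")` yields `umount -f -l /mnt/share`, while other file system types keep the plain forced unmount.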

