Kristoffer Grönlund | 29 Oct 00:33 2014

Announcing crmsh release 2.1.1


Today we are proud to announce the release of `crmsh` version 2.1.1!
This version primarily fixes all known issues found since the release
of `crmsh` 2.1 in June. We recommend that all users of crmsh upgrade
to this version, especially if using Pacemaker 1.1.12 or newer.

A massive thank you to everyone who has helped out with bug fixes,
comments and contributions for this release!

For a complete list of changes since the previous version, please
refer to the changelog:

* https://github.com/crmsh/crmsh/blob/2.1.1/ChangeLog

Packages for several popular Linux distributions can be downloaded
from the Stable repository at the OBS:

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/crmsh/crmsh/archive/2.1.1.tar.gz
* https://github.com/crmsh/crmsh/archive/2.1.1.zip

Changes since the previous release:

 - cibconfig: Clean up output from crm_verify (bnc#893138)
 - high: constants: Add acl_target and acl_group to cib_cli_map (bnc#894041)
 - high: parse: split shortcuts into valid rules
 - medium: Handle broken CIB in find_objects

Ulrich Windl | 27 Oct 10:36 2014

Q: crm node status

Hi!

A simple question: Is it intentional that "crm node status" outputs XML? Usually crm tries to avoid bashing
the user with XML ;-)
(crmsh-1.2.6-0.35.11 of SLES 11)

And an RFE: Implement "version" in crm shell to display its version...

Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

John Scalia | 21 Oct 17:15 2014

Configuring corosync on a CentOS 6.5

Hi all, again,

My network engineer and I have found that our security team set up the VMs' hypervisor to block multicast
traffic. We're not certain why, or whether that can be changed for at least my 3 systems; he's speaking with
them now. Anyway, since you don't normally have to configure corosync on CentOS or Red Hat, and there isn't
even an /etc/corosync/corosync.conf on these systems, what problems could I cause by creating a config file,
and would the system actually use it on a restart? I want to try replacing the multicast address with a
unicast one, at least for testing.

This whole setup seems a little odd, since CentOS uses CMAN and pacemaker, yet corosync is getting started,
and I see all the systems listening on ports 5404 and 5405, similar to the following:

udp    0    0 10.10.1.129:5404            0.0.0.0:*
udp    0    0 10.10.1.129:5405            0.0.0.0:*
udp    0    0 239.192.143.91:5405         0.0.0.0:*

So, if CentOS uses CMAN and pacemaker, why is corosync still in the mix?
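
For reference, one quick way to tell which sockets in a listing like the one above are bound to a
multicast address is sketched below. This is only an illustration using Python's standard `ipaddress`
module; the function name `classify_udp_listeners` is made up here and is not part of any cluster tool,
and it assumes the local address is the fourth whitespace-separated column, as in the listing.

```python
import ipaddress


def classify_udp_listeners(netstat_lines):
    """Return (local_address, port, is_multicast) for each 'udp' line.

    Assumes netstat-style lines where the fourth column is 'address:port'.
    """
    listeners = []
    for line in netstat_lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue  # skip headers and non-UDP lines
        host, _, port = fields[3].rpartition(":")
        is_multicast = ipaddress.ip_address(host).is_multicast
        listeners.append((host, int(port), is_multicast))
    return listeners


sample = [
    "udp    0    0 10.10.1.129:5404            0.0.0.0:*",
    "udp    0    0 10.10.1.129:5405            0.0.0.0:*",
    "udp    0    0 239.192.143.91:5405         0.0.0.0:*",
]

for host, port, is_multicast in classify_udp_listeners(sample):
    print(host, port, "multicast" if is_multicast else "unicast")
```

The third socket is the one that depends on multicast being allowed through the hypervisor.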
--
Jay

Robert.Koeppl | 20 Oct 20:51 2014

AUTO: Robert Koeppl is out of office (returning 23.10.2014)

I will return on 23.10.2014.

Note: This is an automatic reply to your message "Re:
[Linux-HA] Remote node attributes support in crmsh" sent on 20.10.2014
17:23:46.

This is the only notification you will receive while
this person is away.


John Scalia | 20 Oct 20:50 2014

New user can't get cman to recognize other systems

Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running CentOS 6.5. I followed the
instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes -a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

On the other systems, the output is the same except for which system is shown as joined. Each shows just
itself as belonging to the cluster. "pcs status" reflects the same thing, with the non-self systems shown
as offline. I've checked "netstat -an" and see each machine listening on ports 5404 and 5405. The logs are
rather involved, but I'm not seeing errors in them.
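
When comparing this output across nodes, it can help to reduce it to the names that have not joined.
A minimal sketch, assuming the column layout shown above and that "M" in the Sts column marks a joined
member; `unjoined_nodes` is a hypothetical helper, not a cman command:

```python
def unjoined_nodes(cman_output):
    """Return the names of nodes whose Sts column is anything other than 'M'.

    Assumes rows of the form: <node id> <Sts> <Inc> [<joined date> <time>] <name>.
    """
    missing = []
    for line in cman_output.splitlines():
        fields = line.split()
        # Node rows start with a numeric node id; header and address rows do not.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1] != "M":
            missing.append(fields[-1])
    return missing


sample = """\
Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3
"""

print(unjoined_nodes(sample))
```

Running it against the output from each of the three machines shows which peers every node is missing.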

Any ideas for where to look for what's causing them to not communicate?
--
Jay

Vladislav Bogdanov | 20 Oct 08:03 2014

Remote node attributes support in crmsh

Hi Kristoffer,

do you plan to add support for the recently added "remote node attributes"
feature to crmsh?

Currently (at least as of 2.1, and I do not see anything relevant in the
git log) crmsh fails to update the CIB if it contains node attributes for
a remote (bare-metal) node, complaining that a duplicate element is found.
But for bare-metal nodes it is natural to have an ocf:pacemaker:remote
resource with a name equal to the remote node's uname (I doubt it can be
configured differently).
If I comment out the check for 'obj_id in id_set', it then fails to update
the CIB because it inserts the above primitive definition into the node section.

Best,
Vladislav

Ulrich Windl | 8 Oct 14:25 2014

crm shell: misleading "Do you want to edit again?"

Hi!

I discovered an inconsistency in crm shell crmsh-1.2.6-0.35.11 (SLES11): When you add a primitive
interactively using an unknown parameter, you can commit the change. However if you use "crm configure
edit <primitive>", after saving you'll see: "Do you want to edit again?"

My assumption was that answering "no" will keep the changes as written, but in fact the changes seem to be
discarded when answering "no":
[...]
Do you want to edit again? no
crm(live)configure# commit
INFO: apparently there is nothing to commit
INFO: try changing something first

What about changing the question to "What now? (Keep|Fix|Revert)"?

(With the obvious semantics: Keep=Keeps the changes as written, Fix=Try again to fix the problem,
Revert=Revert to the loaded configuration)
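
The proposed semantics could be sketched roughly as follows. This is a hypothetical illustration in
Python, not crmsh's actual code; `resolve_edit` and its return values are invented for the example:

```python
def resolve_edit(answer, loaded, edited):
    """Map a Keep/Fix/Revert answer to (next_action, configuration).

    loaded -- the configuration as it was before editing
    edited -- the configuration as the user saved it
    """
    choice = answer.strip().lower()
    if choice in ("k", "keep"):
        return ("done", edited)   # keep the changes as written
    if choice in ("f", "fix"):
        return ("edit", edited)   # reopen the editor on the saved text
    if choice in ("r", "revert"):
        return ("done", loaded)   # discard the changes
    raise ValueError("expected Keep, Fix or Revert, got %r" % answer)


print(resolve_edit("Keep", "loaded config", "edited config"))
print(resolve_edit("revert", "loaded config", "edited config"))
```

With three explicit choices, discarding the edit becomes something the user has to ask for by name.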

I assume I did something dirty on my system: Just updated the RA on one node, so the other node didn't know
about the new RA, but anyway...

Regards,
Ulrich


Greg Woods | 1 Oct 16:40 2014

Corosync 1 -> 2

I notice that the "network:ha-clustering:Stable" repo for CentOS 6 now
contains Corosync 2.3.3-1. I am currently running 1.4.1-17. Is it safe to
just run this update? Are there configuration changes I have to make in
order for the new version to work? (If there is a document or wiki page
describing how to convert from Corosync 1 to 2, I would be happy to be
pointed to it).

Thanks,
--Greg

Matthias Ferdinand | 29 Sep 14:44 2014

Re: corosync communication stops after link down

On Fri, Sep 26, 2014 at 12:00:04PM -0600, linux-ha-request <at> lists.linux-ha.org wrote:
> Message: 1
> Date: Fri, 26 Sep 2014 14:41:41 +0200
> From: Helmut Wollmersdorfer <helmut.wollmersdorfer <at> fixpunkt.de>
> To: General Linux-HA mailing list <linux-ha <at> lists.linux-ha.org>
> Subject: Re: [Linux-HA] corosync communication stops after link down
> Message-ID: <1B2FBDF7-C012-4296-8D51-8597492071D5 <at> fixpunkt.de>
> Content-Type: text/plain; charset=us-ascii
> 
> 
> Am 24.09.2014 um 22:35 schrieb Matthias Ferdinand <mf <at> 14v.de>:
> 
> > OS: Ubuntu 14.04 64bit
> > corosync: 2.3.3-1ubuntu1
> > 2 nodes
> > 2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
> >        all with crossover cables, no switches
> > transport: udpu
> 
> 
> So, this bug 
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=746269
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=821352
> 
> is solved in your version of corosync? It must, because the cross-over point-to-point connection would
> always fail.

these bug reports are for corosync 1.x and point-to-point interfaces,

Stefan Schloesser | 29 Sep 12:05 2014

Totem: Received message has invalid digest after upgrade of cluster node

Hi,

I am currently testing an Ubuntu release upgrade from 12.04 to 14.04. With this, the corosync version
changes from 1.4.2 to 2.3.3.
After upgrading one node, I wanted to start corosync and shift services to the already upgraded node in
order to upgrade the primary.

Unfortunately I get the following error:
Totem: Received message has invalid digest

I presume this is due to the big difference in corosync versions. So is it fundamentally impossible to
have nodes with such a big difference in version in the same cluster?

My workaround would be to stop corosync on all involved nodes, start the services manually on the already
upgraded node, upgrade the remaining node, and then hope that, with all nodes on the same version, the
cluster starts again.

Would that be the correct procedure?

Stefan


Matthias Ferdinand | 24 Sep 22:35 2014

corosync communication stops after link down

OS: Ubuntu 14.04 64bit
corosync: 2.3.3-1ubuntu1
2 nodes
2 rings (em1, bond0(p2p1,p1p1)) rrp_mode: active,
        all with crossover cables, no switches
transport: udpu

If the cluster is up for some time (here: ~ 1 week), and one node is
rebooted, corosync on the surviving node (no-carrier on all
corosync-related interfaces) does not resume
sending packets when links go up again after peer finished rebooting
(3-4 minutes link down; tcpdump on both nodes and both em1 and bond0
show: no packets from the surviving node). The rebooted node then cannot
see any neighbor and consequently decides to stonith the peer before
starting resources. But the resources still cannot run until the
stonith'd node is completely rebooted, because the drbd volumes became
outdated at "shutdown -r now" time.

Subsequent reboots do not show any problems. Repeat after ~ 1 week
uptime, and the problem shows up again.

This happened on two different cluster installs with roughly the same
hardware (Dell PowerEdge R520 resp. R420, onboard Broadcom BCM5720 (em1),
2x2port Intel I350 (p2p1,p1p1)).

Any ideas?

Regards
  Matthias Ferdinand