robin | 1 Sep 2011 05:04

Re: Antw: Why 'crm resource cleanup' cannot work

Hi Gent,

There seem to be some errors in syslog when I run "crm_resource -C -r linkmon".

[root@master ~]# crm status
============
Last updated: Thu Sep  1 10:58:18 2011
Stack: Heartbeat
Current DC: master (1f226e55-fb60-4dc4-b800-f5fc3126b3b6) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 1 expected votes
2 Resources configured.
============

Online: [ master installer-11-00 ]

 linkmon    (ocf::platform:linkmon):    Started installer-11-00
 Resource Group: Installer
     ip1    (ocf::heartbeat:IPaddr2):    Started installer-11-00
     ip1arp    (ocf::heartbeat:SendArp):    Started installer-11-00
     ip2    (ocf::heartbeat:IPaddr2):    Started installer-11-00
     ip2arp    (ocf::heartbeat:SendArp):    Started installer-11-00
     dhcp    (lsb:dhcpd):    Started installer-11-00

Failed actions:
    linkmon_start_0 (node=master, call=71, rc=1, status=complete): unknown error
[root@master ~]# crm_resource -C -r linkmon
Cleaning up linkmon on master
Cleaning up linkmon on installer-11-00


Nikita Michalko | 1 Sep 2011 08:27

Re: Why 'crm resource cleanup' cannot work

Hi Robin,

I had a similar problem in the past with the old version. Could you send
your configuration and logs?

Regards

Nikita Michalko

On Wednesday, 31 August 2011 13:22:36, robin wrote:
> If they are official stable releases, we can consider the upgrade.
> But can the upgrade resolve my issue?
> 
> 
> Regards,
> -Robin
> 
> At 2011-08-31 17:56:23,"Nikita Michalko" <michalko.system <at> a-i-p.com> wrote:
> >On Wednesday, 31 August 2011 11:23:46, robin wrote:
> >> Append the version
> >>
> >> [root@master ~]# rpm -qa|grep heartbeat
> >> heartbeat-3.0.3-2.3.el5
> >> heartbeat-libs-3.0.3-2.3.el5
> >> [root@master ~]# rpm -qa|grep pacemaker
> >> pacemaker-libs-1.0.9.1-1.15.el5
> >> pacemaker-1.0.9.1-1.15.el5
> >>
> >>
> >> Regards,

Lorenzo Milesi | 1 Sep 2011 09:00

DRBD+Xen, problems during shutdown

Hi.

I've set up a configuration with drbd as storage for a Xen VM.
Both resources are handled with Pacemaker 1.0.9.1.

It works great, and seems to watch and check resources correctly, but I have an issue upon shutdown.
If I "halt" the last node, Pacemaker initiates the stop of all resources at once, so it tries to stop drbd
before the VM has shut down. This causes drbd to time out and other ops to hang, and prevents the hardware from halting.
I've tried setting up an "order" directive, but even though it works on startup, it doesn't seem to affect stop.

How can I make drbd wait for the VM to stop before trying to release the resource?
Should I use resource groups or is there another way?

Thanks.

node host1
node host2
primitive DRBD-ubuntu ocf:linbit:drbd \
	params drbd_resource="ubuntu" \
	operations $id="DRBD-ubuntu-ops" \
	op monitor interval="20" role="Master" timeout="40" \
	op monitor interval="30" role="Slave" timeout="40" \
	meta target-role="started"
primitive XEN-ubuntu ocf:heartbeat:Xen \
	params xmfile="/etc/xen/test.yotest.com.cfg" \
	op monitor interval="10s" \
	op start interval="0" timeout="240s" \
	op stop interval="0" timeout="240s" \
	meta allow-migrate="false" target-role="Started"
ms ubuntu-MS DRBD-ubuntu \

Dejan Muhamedagic | 1 Sep 2011 11:38

Re: custom jboss init script on pacemaker

Hi,

On Wed, Aug 31, 2011 at 04:17:27PM -0500, David Gersic wrote:
> Note: I know that I'm following up on an old list message here...
> 
> 
> >>> On 11/30/2010 at 04:55 AM, Michael Kromer <michael.kromer <at> millenux.com> wrote:
> 
> > right, for reference:
> > 
> > http://www.linux-ha.org/doc/re-ra-jboss.html
> 
> Which is now moved to:
> 
> http://www.linux-ha.org/doc/man-pages/re-ra-jboss.html
> 
> 
> > I just recommend to take a safe look at the timeouts, as 60s could be
> > too short for some larger applications.
> 
> Agreed. I've also modified this OCF to include a JAVA_OPTS parameter, to allow
> passing in parameters for Java itself, not just for JBoss. I'll see if I can
> merge my changes in to the current version from linux-ha.org. Assuming that I
> can, who do I then provide them to so that others can benefit?

I guess the best way is to clone the GitHub repository, then
send a pull request. Otherwise, just send patches to the
linux-ha-dev ML.

Cheers,


gilmarlinux | 1 Sep 2011 12:41

Re: DRBD+Xen, problems during shutdown


Hello! I think you'll have to change the execution order of the shutdown
of services. Edit the startup script and change the lines as below. In this
example, when heartbeat is shut down, the script is stopped first and then
xen and drbd.

# Required-Start: $network $remote_fs $time $syslog xend drbd
# Should-Start: openhpid
# Required-Stop: $network $remote_fs $time $syslog xend drbd

> Hi.
>
> I've set up a configuration with drbd as storage for a Xen VM.
> Both resources are handled with Pacemaker 1.0.9.1.
>
> It works great, and seems to watch and check for resources correctly, but I
> have an issue upon shutdown.
> If I "halt" the last node pacemaker will initiate the stop of all resources
> at once, so trying to stop drbd before the VM has shutdown. This causes drbd
> to timeout, other ops to hang and preventing the hardware to halt.
> I've tried setting up an "order" directive, but even if it can work on
> startup doesn't seem to affect stop.
>
> How can I make drbd wait for the VM to stop before trying to release the
> resource?
> Should I use resource groups or is there another way?
>
> Thanks.
>
> node host1
> node host2
> primitive DRBD-ubuntu ocf:linbit:drbd \
> 	params drbd_resource="ubuntu" \
> 	operations $id="DRBD-ubuntu-ops" \
> 	op monitor interval="20" role="Master" timeout="40" \
> 	op monitor interval="30" role="Slave" timeout="40" \
> 	meta target-role="started"
> primitive XEN-ubuntu ocf:heartbeat:Xen \
> 	params xmfile="/etc/xen/test.yotest.com.cfg" \
> 	op monitor interval="10s" \
> 	op start interval="0" timeout="240s" \
> 	op stop interval="0" timeout="240s" \
> 	meta allow-migrate="false" target-role="Started"
> ms ubuntu-MS DRBD-ubuntu \
> 	meta

alain.moulle | 1 Sep 2011 14:00

Pacemaker : Pb on stop on a resource while the monitoring is performed

Hi

My release is:
pacemaker-1.1.2-7 (on RHEL6)
and I have checked that the patch:
High: PE: Bug lf#2433 - No services should be stopped until probes finish
is indeed integrated in this release.

Nevertheless, it seems that I get a similar problem from time to time for
any primitive: a primitive under Pacemaker is flagged "failed" for one
node whereas the primitive is already started on the other node. A
simple cleanup on the group then erases the failure and all is fine, but
it happens, let's say, within two hours when I start a loop (a robustness
test) of migrations of the group (which includes the primitive) from one
node to the other and vice versa, with a delay of 300s between each
migration.
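
A minimal sketch of such a migration loop, for anyone who wants to reproduce the test. The group and node names in the usage comment are hypothetical, and the CRM variable is an assumption added here so that the loop can be dry-run with "echo" instead of touching a cluster:

```shell
# Command used to talk to the cluster; set CRM=echo for a dry run.
CRM=${CRM:-crm}

# migrate_loop GROUP NODE_A NODE_B ROUNDS DELAY
# Moves GROUP back and forth between the two nodes, starting with NODE_B,
# and waits DELAY seconds after each migration.
migrate_loop() {
    group=$1 node_a=$2 node_b=$3 rounds=$4 delay=$5
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Alternate the target node on every round.
        if [ $((i % 2)) -eq 0 ]; then
            target=$node_b
        else
            target=$node_a
        fi
        $CRM resource migrate "$group" "$target"
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example with hypothetical names:
#   migrate_loop my-group node1 node2 24 300
```

Running it first with CRM=echo prints the commands without executing them, which is a cheap way to check the order of the targets.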

If I compare the logs (syslog) generated by the scenario when all is fine
and when I get the error, the first error I find is:
node1 daemon info lrmd [38904]: info: flush_op: process for operation 
monitor[2973] on ocf:<provider>:<scriptname>::<primitive name> for client 
38907 still running, flush delayed 
node1 daemon debug crmd [38907]: debug: cancel_op: Op 2973 for 
<primitive-name> (<primitive-name>:2973): cancelled 

It seems that Pacemaker applies the stop to the primitive running on node1
just at the moment when a monitor operation is checking the primitive,
so stopping the monitor operation is delayed. The primitive stop is effective and the

Ulrich Windl | 2 Sep 2011 09:10

Antw: DRBD+Xen, problems during shutdown

order Xen-after-DRBD inf: ubuntu-MS:promote XEN-ubuntu:start

Hi!

I think you only specify ordering for the start, not for the stop.
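
A minimal sketch of what an explicit stop ordering could look like in crm shell syntax, assuming the resource names from the quoted configuration. The constraint ID Xen-stop-before-DRBD and the file name are hypothetical, and whether this is needed at all depends on whether the existing constraint is symmetrical (the Pacemaker default), which this thread does not establish:

```shell
# Write the extra constraint to a file (the file name is arbitrary).
# It asks for XEN-ubuntu to be stopped before ubuntu-MS is demoted;
# symmetrical=false keeps it from also implying a start ordering.
cat > xen-stop-order.crm <<'EOF'
order Xen-stop-before-DRBD inf: XEN-ubuntu:stop ubuntu-MS:demote symmetrical=false
EOF

# Review the file, then load it on a cluster node with:
#   crm configure load update xen-stop-order.crm
```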

Ulrich

>>> Lorenzo Milesi <maxxer <at> ufficyo.com> wrote on 01.09.2011 at 09:00 in message
<1665271283.991.1314860403727.JavaMail.root <at> quaglia>:
> Hi.
> 
> I've set up a configuration with drbd as storage for a Xen VM.
> Both resources are handled with Pacemaker 1.0.9.1.
> 
> It works great, and seems to watch and check for resources correctly, but I 
> have an issue upon shutdown.
> If I "halt" the last node pacemaker will initiate the stop of all resources 
> at once, so trying to stop drbd before the VM has shutdown. This causes drbd 
> to timeout, other ops to hang and preventing the hardware to halt.
> I've tried setting up an "order" directive, but even if it can work on 
> startup doesn't seem to affect stop.
> 
> How can I make drbd wait for the VM to stop before trying to release the 
> resource?
> Should I use resource groups or is there another way?
> 
> Thanks.
> 
> node host1

Dipti Bharvirkar | 2 Sep 2011 16:16

Queries on Heartbeat 3.

Hi,

I had some very basic queries on the Heartbeat 3 software.

1. For our Geo-redundancy requirement, we need some software to
maintain the heartbeat between the active and the standby node and
raise an alarm if either of the nodes is found to be down. Can I use
the Heartbeat software for this purpose (only the messaging
component)? We do not want to do any resource management or automatic
failover. We need the software to simply detect when the other node is
down and raise a flag. Can this be achieved?
2. With Heartbeat 3, I believe we can use the messaging component
alone without the other components. Is that right?
3. Ours will be a "warm" standby. We need the active server to raise
an alarm when standby goes down and vice-versa. Can Heartbeat work
both ways?
4. I understand that Heartbeat can write to syslog. Could you confirm?
5. Could the Heartbeat software be used to monitor services on the system?

Thanks,
DB
_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Andreas Mock | 4 Sep 2011 23:22

Install problems with ha resource

Hi all,

I downloaded the current resource agent file with
git clone --depth=1 https://github.com/ClusterLabs/resource-agents/
After 
./configure   --prefix=/usr/local --localstatedir=/var/local
make
make install

I get the error 
/usr/local/etc/ha.d/shellfuncs: Zeile 96: /usr/lib/ocf/lib//heartbeat/ocf-shellfuncs: file or
directory not found

After manually correcting the path to
/usr/local//usr/lib/ocf/lib//heartbeat/ocf-shellfuncs
I get the error
/usr/local//usr/lib/ocf/lib//heartbeat/ocf-shellfuncs: Zeile 56:
/usr/lib/ocf/lib/heartbeat/ocf-binaries: file or directory not
found

So I'm pretty sure that the directories given to ./configure are
not honoured correctly.
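
A small sketch for checking where the files actually ended up. The function name find_ocf_lib is made up for illustration, and the candidate directories are guesses derived from the paths in the error messages above, not locations confirmed by the build:

```shell
# find_ocf_lib FILE ROOT...
# Prints the first existing ROOT/heartbeat/FILE and returns 0,
# or returns 1 if none of the candidate roots contains the file.
find_ocf_lib() {
    file=$1
    shift
    for root in "$@"; do
        if [ -f "$root/heartbeat/$file" ]; then
            printf '%s\n' "$root/heartbeat/$file"
            return 0
        fi
    done
    return 1
}

# Candidate roots are assumptions based on the error messages.
find_ocf_lib ocf-shellfuncs \
    /usr/lib/ocf/lib \
    /usr/local/usr/lib/ocf/lib \
    /usr/local/lib/ocf/lib \
    || echo "ocf-shellfuncs not found under the candidate roots"
```

The same call with ocf-binaries as the first argument shows whether both files live under the same root, which is what the two error messages suggest they do not.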

Can someone with the right knowledge correct these problems?
Or give the right hints?

Best regards
Andreas Mock


Oualid Nouri | 5 Sep 2011 10:38

two node cluster: clvm depending resources restart/stuck when failing node joins cluster

Hi all,
I have set up a DRBD-based dual-primary two-node cluster with Pacemaker on openSUSE 11.4 for testing.
I have also set up drbd=>controld=>clvm=>lvm=>ocfs2 resources (all clones) and a samba+IP resource
(primitive). Fencing is done via UPS with two apcsmart resources.
So far it seems to work. The resources all come up, and I can access the samba share.
Putting one of the nodes into standby, shutting it down and restarting it all worked as expected.
After this test I started testing failover functionality by powering off one node.
After I powered off one node by pulling the power cable, the hosted resources failed over to the remaining
node (the failover node), as expected.
The "failing" node gets fenced by powering off the UPS, as expected.

So far so good.....

But when the "failing" node comes back online, the drbd and controld resources come up. The resources depending
on controld (clvm=>lvm=>ocfs2 etc.) get stuck on both the failed node and the failover node, ending in a failed
status of the lvm resource. None of the clvm-dependent resources comes up, and the previously functioning resources
on the failover node are no longer accessible.
Checking the status on the command line (on the failover node) shows that all lvm-specific commands hang
after the failed node tries to rejoin the cluster.

There are many howtos and my setup is nearly identical to them.
I have searched the web but did not find any hints.
Does this behavior result from wrong parameters?
Does it result from the combination of cluster components used?

Any help appreciated, thank you!

Used Software:
Opensuse 11.4 x86_64
Pacemaker 1.1.5

