song loesprite | 1 Apr 09:17 2006

Re: Race conditions when using mon

Hi!
I am now using Heartbeat 2.0.4, configured with "--enable-crm".
But when I add "crm on|yes" to ha.cf, an error appears.
I would like some howtos on the crm. Thanks.
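
If the directive went into ha.cf literally as "crm on|yes", that may be the cause: the "|" usually separates alternative values, so the line would be either "crm on" or "crm yes". A minimal Python sketch that flags crm lines that are not one of those two forms follows. The function name is made up for illustration, and the accepted values are taken only from the message above, not from the Heartbeat sources.

```python
def suspicious_crm_lines(ha_cf_text):
    """Return (line number, text) for crm directives that are not exactly 'crm on' or 'crm yes'."""
    flagged = []
    for number, raw in enumerate(ha_cf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        fields = line.split()
        if fields and fields[0] == "crm":
            # Assumption: only the two spellings from the message count as "enabled".
            if len(fields) != 2 or fields[1] not in ("on", "yes"):
                flagged.append((number, raw.strip()))
    return flagged


if __name__ == "__main__":
    for number, text in suspicious_crm_lines("crm on|yes\n"):
        print(f"line {number}: check this directive: {text}")
```

The check is deliberately strict, so it also reports a line that switches the crm off.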

2006/3/29, Andrew Beekhof <beekhof <at> gmail.com>:
On 3/18/06, Héctor Cordobés <hcordobes <at> motorola.com> wrote:
> Hi all
>
> I am facing an awkward situation. I am using mon for monitoring
> services, and I am running it with a respawn clause.
>
> But I see the following:
>
> 1.- I try to stop heartbeat.
> 2.- Heartbeat stops services
> 3.- Mon restarts services
> 4.- Partitions cannot be unmounted or whatever happens...
>
> I feel like this is caused because the respawned processes are killed at
> the end.
>
> Is there any means to take care of this case and let mon die gracefully
> upon stopping heartbeat?

Yes... use the 2.x series and enable the crm.  Then you don't need mon
at all :-)

http://www.linux-ha.org/v2
_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Andrew Beekhof | 1 Apr 10:15 2006

Re: Race conditions when using mon


On Apr 1, 2006, at 9:17 AM, song loesprite wrote:

Hi!
I am now using Heartbeat 2.0.4, configured with "--enable-crm".
But when I add "crm on|yes" to ha.cf, an error appears.
I would like some howtos on the crm. Thanks.

try the link in the email you replied to 


2006/3/29, Andrew Beekhof <beekhof <at> gmail.com>:
On 3/18/06, Héctor Cordobés <hcordobes <at> motorola.com> wrote:
> Hi all
>
> I am facing an awkward situation. I am using mon for monitoring
> services, and I am running it with a respawn clause.
>
> But I see the following:
>
> 1.- I try to stop heartbeat.
> 2.- Heartbeat stops services
> 3.- Mon restarts services
> 4.- Partitions cannot be unmounted or whatever happens...
>
> I feel like this is caused because the respawned processes are killed at
> the end.
>
> Is there any means to take care of this case and let mon die gracefully
> upon stopping heartbeat?

Yes... use the 2.x series and enable the crm.  Then you don't need mon
at all :-)

http://www.linux-ha.org/v2

--
Andrew Beekhof

“The greatest trick the devil ever pulled was convincing the world he didn’t exist.” - The Usual Suspects



Paul Sindelar | 1 Apr 19:20 2006

Re: ipfail and CRM

Larkin Lowrey wrote:
> From various posts, it appears that ipfail does not work with CRM. Is 
> this correct? If so, is there an alternative strategy or will ipfail 
> ultimately be made compatible with CRM?
>
> If I go to a v1 config, I will be giving up clusters with more than 
> two nodes, right? The docs say that v1 has an "inability to monitor 
> resources for their correct operation." What does that mean exactly? 
> Will it not use the 'status' command of the script periodically to 
> check for process death?
>
> --Larkin
>
I could be mistaken, but I think the gods are working on making 
ipfail work with crm.  Judging from a post I saw a week or so ago, it'll 
probably take at least a couple of weeks, maybe a month or two.  There is 
a lot of demand for this feature, so it should be coming soon.

Regards,
-Paul


Robert Heinzmann | 2 Apr 18:58 2006

Re: Starting and stopping resources manually

Hello,

I played some more with heartbeat 2.0.4 (HEAD branch from CVS) and found 
some answers regarding my prior questions. Could someone please verify if 
my assumptions are correct ?

>
> - how can I manually start and stop resources from the command line.
I can use "lrmadmin" for this. With lrmadmin I can stop and start 
resources. The command:
./lrmadmin -E resource_WebserverIP stop 0 0 EVERYTIME
stops the IP resource. And the "start" action starts the resource again.
The "lrmadmin" command is the only command to manually start and stop 
resources / resource groups in the cluster.

The problem with lrmadmin is that it does not take constraints like 
startafter and stopbefore into account. I can stop the IP manually with 
the lrmadmin command although the Webserver keeps running and I have a 
symmetric "before" constraint in place. I verified that the constraint 
works correctly by stopping and starting heartbeat on the machine 
(via init script): the stop and start order was correct, so constraints 
work OK.
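
What a symmetric "before" constraint implies for ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it models the ordering, not the CRM itself. The name resource_Webserver and the direction of the constraint (IP before Webserver) are assumptions; only resource_WebserverIP appears in this message.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def start_and_stop_order(before_pairs):
    """before_pairs: (first, then) tuples meaning 'first' starts before 'then'.

    Returns (start order, stop order); a symmetric constraint stops in reverse.
    """
    graph = {}
    for first, then in before_pairs:
        graph.setdefault(first, set())
        graph.setdefault(then, set()).add(first)  # 'then' depends on 'first'
    start = list(TopologicalSorter(graph).static_order())
    return start, start[::-1]


if __name__ == "__main__":
    # Hypothetical pair: the IP is assumed to start before the web server.
    start, stop = start_and_stop_order([("resource_WebserverIP", "resource_Webserver")])
    print("start:", start)
    print("stop: ", stop)
```

A manual stop of the IP alone skips this ordering, which matches the behaviour described for lrmadmin.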

Another problem is that if I have monitor operations for the resource 
in place and I stop a resource via "lrmadmin", heartbeat detects the 
resource as not running during the next check and restarts it.

Therefore I think using lrmadmin is not a solution but more of a 
workaround to start and stop resources. One way to achieve what I want 
could be:
- remove all monitor operations from the resources I want to stop
- stop all resources in the right order manually
- do what I like with the resources
- start the resources in the right order manually (reverse stop order)
- add the monitor operations to the resources again.

Is there a better way ? :)

>
> - How can I take resources out of monitoring ?
>
I can take resources out of monitoring by removing the monitor operations 
from the resource. I can use the cibadmin command for this purpose.
> - How can I stop the cluster software without causing the resources to 
> be stopped (rolling update WITHOUT downtime)
>
I have not found a way to do this - any hints appreciated :)

Robert

Andrew Beekhof | 3 Apr 12:37 2006

Re: Starting and stopping resources manually

On 4/2/06, Robert Heinzmann <Robert.Heinzmann <at> gmx.net> wrote:
> Hello,
>
> I played some more with heartbeat 2.0.4 (HEAD branch from CVS) and found
> some answers regarding my prior questions. Could someone please verify if
> my assumptions are correct ?
>
> >
> > - how can I manually start and stop resources from the command line.

The short answer is: don't.

You have a cluster resource manager for a reason... you should tell it
what you want and then let it do its job.  Telling it that you want a
service to always be running and then manually stopping it is asking
for trouble.

If you need to do maintenance on a resource, try "is_managed".  When
set to false, the CRM won't start, stop, or recover it anywhere in the
cluster.

As of a few minutes ago, the config now also allows you to specify
what "target_role" you want a resource to be in (Started or Stopped)
with a default of Started.

"target_role = Stopped" is approximately equal to the sequence:
1) is_managed = false
2) manual stop
3) cancel monitoring

> I can use "lrmadmin" for this. With lrmadmin I can stop and start
> resources. The command:
> ./lrmadmin -E resource_WebserverIP stop 0 0 EVERYTIME
> stops the IP resource. And the "start" action starts the resource again.
> The "lrmadmin" command is the only command to manually start and stop
> resources / resource groups in the cluster.
>
> The problem with lrmadmin is that it does not take constraints like
> startafter and stopbefore into account. I can stop the IP manually with
> the lrmadmin command although the Webserver keeps running and I have a
> symmetric "before" constraint in place. I verified that the constraint
> works correctly by stopping and starting heartbeat on the machine
> (via init script): the stop and start order was correct, so constraints
> work OK.
>
> Another problem is that if I have monitor operations for the resource
> in place and I stop a resource via "lrmadmin", heartbeat detects the
> resource as not running during the next check and restarts it.
>
> Therefore I think using lrmadmin is not a solution but more of a
> workaround to start and stop resources. One way to achieve what I want
> could be:
> - remove all monitor operations from the resources I want to stop
> - stop all resources in the right order manually
> - do what I like with the resources
> - start the resources in the right order manually (reverse stop order)
> - add the monitor operations to the resources again.
>
> Is there a better way ? :)
>
>
> >
> > - How can I take resources out of monitoring ?
> >
> I can take resources out of monitoring by removing the monitor operations
> from the resource. I can use the cibadmin command for this purpose.
> > - How can I stop the cluster software without causing the resources to
> > be stopped (rolling update WITHOUT downtime)
> >
> I have not found a way to do this - any hints appreciated :)
>

is_managed_default = false

Diego Leider | 3 Apr 14:00 2006

Re: heartbeat 2.0.4 rpm's for Redhat Enterprise Linux 4

Hi Alberto,

When trying to use crm_mon with the heartbeat package you provided i get
the error:

[root <at> venus ~]# crm_mon
The use of crm_mon requires ncurses to be available during the build
process

I'm still getting the error message below when i try to build the
package 

[root <at> venus ~]# rpmbuild -ba ./heartbeat.spec

----------------------------------------------------------------
(cd .libs && rm -f libhbmgmtcommon.so.0 && ln -s
libhbmgmtcommon.so.0.0.0 libhbmgmtcommon.so.0)
(cd .libs && rm -f libhbmgmtcommon.so && ln -s libhbmgmtcommon.so.0.0.0
libhbmgmtcommon.so)
ar cru .libs/libhbmgmtcommon.a  libhbmgmtcommon_la-mgmt_common_lib.o
ranlib .libs/libhbmgmtcommon.a
creating libhbmgmtcommon.la
(cd .libs && rm -f libhbmgmtcommon.la && ln -s ../libhbmgmtcommon.la
libhbmgmtcommon.la)
swig not found -python pymgmt.i
gmake[2]: swig: Command not found
gmake[2]: *** [pymgmt.py] Error 127
gmake[2]: Leaving directory
`/usr/src/redhat/BUILD/heartbeat-2.0.4/lib/mgmt'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/usr/src/redhat/BUILD/heartbeat-2.0.4/lib'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.53712 (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.53712 (%build)
----------------------------------------------------------------

Can you please give a hand with this one?

-- 

Diëgo Leider

--------------------------------
Software Developement
Phone:  +31 (0)36 5483783
Fax:    +31 (0)36 5483788
Email:  diego <at> daisycon.com

Daisycon B.V.
P.J. Oudweg 5
1314CH, Almere
The Netherlands
--------------------------------


Werner Fischer | 3 Apr 16:01 2006

Heartbeat with Xen/qemu/linux-vserver/openVZ/UML/... experiences?

Hi all,

we are preparing a talk ("High availability clustering of virtual
machines – possibilities and pitfalls") for the Linuxtag in May.

The talk will cover clusters on different levels in a virtualized
environment:
1. clusters between the underlying host system
2. clusters between virtual machines (which run themselves on different
   hardware machines)
3. clusters between a physical machine and a virtual machine

We'll talk about how well these different kinds of cluster
implementations work with different virtualization techniques.

We'd also like to mention some in-production examples, so if you have
such an environment running we would be very happy if you could share
your experiences with us on the list.

greetings from Austria,
Werner

PS: details on the talk can be found at
http://www.linuxtag.org/2006/de/besucher/programm/freies-vortragsprogramm/samstag.html?talkid=306


Andrew Beekhof | 3 Apr 16:21 2006

Re: Undocumented configurations in <crm_config> ?

On 3/31/06, Chun Tian (binghe) <binghe.lisp <at> gmail.com> wrote:
> Hi,
>
> I'm using Heartbeat 2.0.3 on several Debian boxes. When I check
> heartbeat's debug file, I find these two configuration items in
> <crm_config> which don't appear in 'Annotated CIB DTD (1.0)':
>
> crm_transition_idle_timeout
> crm_remove_after_stop
>
> What do they mean? Are they useful?

not sure about the crm_ prefix but:

remove_after_stop was something I was experimenting with.
when set to true, after a resource is stopped on a node, it is also
removed from the LRM.  it was intended to reduce the size of the CIB
but it hasn't been tested much and may have other side-effects (good
and bad).

transition_idle_timeout was documented as transition_timeout (i've
since fixed that)

Paul Sindelar | 3 Apr 16:43 2006

Re: heartbeat 2.0.4 rpm's for Redhat Enterprise Linux 4

Diego Leider wrote:
> Hi Alberto,
>
> When trying to use crm_mon with the heartbeat package you provided i get
> the error:
>
> [root <at> venus ~]# crm_mon
> The use of crm_mon requires ncurses to be available during the build
> process
>
> I'm still getting the error message below when i try to build the
> package 
>
> [root <at> venus ~]# rpmbuild -ba ./heartbeat.spec
>
> ----------------------------------------------------------------
> (cd .libs && rm -f libhbmgmtcommon.so.0 && ln -s
> libhbmgmtcommon.so.0.0.0 libhbmgmtcommon.so.0)
> (cd .libs && rm -f libhbmgmtcommon.so && ln -s libhbmgmtcommon.so.0.0.0
> libhbmgmtcommon.so)
> ar cru .libs/libhbmgmtcommon.a  libhbmgmtcommon_la-mgmt_common_lib.o
> ranlib .libs/libhbmgmtcommon.a
> creating libhbmgmtcommon.la
> (cd .libs && rm -f libhbmgmtcommon.la && ln -s ../libhbmgmtcommon.la
> libhbmgmtcommon.la)
> swig not found -python pymgmt.i
> gmake[2]: swig: Command not found
> gmake[2]: *** [pymgmt.py] Error 127
> gmake[2]: Leaving directory
> `/usr/src/redhat/BUILD/heartbeat-2.0.4/lib/mgmt'
> gmake[1]: *** [all-recursive] Error 1
> gmake[1]: Leaving directory `/usr/src/redhat/BUILD/heartbeat-2.0.4/lib'
> make: *** [all-recursive] Error 1
> error: Bad exit status from /var/tmp/rpm-tmp.53712 (%build)
>
>
> RPM build errors:
>     Bad exit status from /var/tmp/rpm-tmp.53712 (%build)
> ----------------------------------------------------------------
>
> Can you please give a hand with this one?
>
>   

Do you have the ncurses-devel package installed?  I had received the same 
error, but after installing the latest ncurses-devel I was able to 
build.  I'm running CentOS 4.2 x86.
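
The quoted build log stops at "swig: Command not found", which is a separate matter from the ncurses message printed by crm_mon. A quick way to see which build tools are missing from PATH is sketched below in Python. Only swig comes from the log; the other tool names and the helper name are illustrative assumptions, and development headers such as ncurses-devel are not covered by this check.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # swig is taken from the build log; gcc and make are assumed extras.
    for tool in missing_tools(["swig", "gcc", "make"]):
        print(f"not on PATH: {tool}")
```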

Good luck,
-Paul


Laurent Denel | 3 Apr 15:48 2006

questions about resource_stickiness, DC_TIMEOUT, rsc_order, suppress_cib_writes

Hi,

I've been facing some behaviours of heartbeat2 (2.0.4) that brought up
some questions:

Is it possible, when heartbeat2 is starting, to reduce the DC_TIMEOUT?
I suppose this is when heartbeat waits for a possible DC on another
node to show up before starting its own. These are 2 consecutive lines
that show this (check the roughly 60 seconds of elapsed time between
the 2 events):
Apr  3 11:18:23 ha01 crmd: [4993]: info:
mask(utils.c:crm_timer_popped): Wait Timer (I_NULL) just popped!
Apr  3 11:19:24 ha01 crmd: [4993]: info:
mask(utils.c:crm_timer_popped): Election Trigger (I_DC_TIMEOUT) just
popped!
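
The elapsed time between the two events can be computed directly from the log stamps. The Python sketch below assumes the year 2006 (syslog stamps carry no year) and uses made-up helper names; the two log lines are the ones quoted above, joined onto one line each.

```python
from datetime import datetime


def syslog_time(line, year=2006):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is supplied by the caller."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def gap_seconds(earlier_line, later_line):
    """Whole seconds between the stamps of two log lines from the same year."""
    return int((syslog_time(later_line) - syslog_time(earlier_line)).total_seconds())


WAIT_TIMER = ("Apr  3 11:18:23 ha01 crmd: [4993]: info: "
              "mask(utils.c:crm_timer_popped): Wait Timer (I_NULL) just popped!")
ELECTION = ("Apr  3 11:19:24 ha01 crmd: [4993]: info: "
            "mask(utils.c:crm_timer_popped): Election Trigger (I_DC_TIMEOUT) just popped!")

if __name__ == "__main__":
    print(gap_seconds(WAIT_TIMER, ELECTION), "seconds between the two events")
```

For these two lines the gap works out to 61 seconds.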

Suppress_cib_writes has been removed after 2.0.2. What procedure
should I use to keep my cib.xml under CVS and be sure it is the one
that will be loaded on the nodes after a possible reboot? We have a
configuration delivery system that runs every day to ensure
conformance between the CVS repository and the host.

Next, I wasn't able to add a resource order constraint. I used the
following syntax, but heartbeat keeps complaining at each start that
"mysql_front01-COMMON-db01" is not a resource, which is not true.
Apr  3 09:03:14 ha03 pengine: [26021]: ERROR:
mask(unpack.c:unpack_rsc_order): Constraint order_front01-COMMON-db01:
no resource found for LHS of mysql_front01-COMMON-db01
Here's the config line :
<rsc_order id="order_front01-COMMON-db01"
from="vol1_front01-COMMON-db01" action="start" type="before"
to="mysql_front01-COMMON-db01" symmetrical="TRUE"/>
Is there something wrong with my syntax ?
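
One check that can be run offline against a cib.xml is whether every "from" and "to" of an rsc_order matches the id of a resource element. The Python sketch below does that under stated assumptions: the set of element names treated as resources is a guess, the function name is made up, and the sample CIB is a stripped-down, hypothetical one with a deliberately misspelled primitive id. It is not the attached cib.xml and does not claim to identify the actual cause.

```python
import xml.etree.ElementTree as ET

# Assumption: these element names are the ones that define resources in the CIB.
RESOURCE_TAGS = {"primitive", "group", "clone", "master_slave"}


def unknown_order_refs(cib_xml):
    """Return (constraint id, attribute, value) for rsc_order references with no matching resource id."""
    root = ET.fromstring(cib_xml)
    known = {el.get("id") for el in root.iter() if el.tag in RESOURCE_TAGS}
    problems = []
    for order in root.iter("rsc_order"):
        for attr in ("from", "to"):
            ref = order.get(attr)
            if ref not in known:
                problems.append((order.get("id"), attr, ref))
    return problems


# Hypothetical CIB: the second primitive id ends in "db1" instead of "db01".
SAMPLE = """
<cib>
  <configuration>
    <resources>
      <primitive id="vol1_front01-COMMON-db01"/>
      <primitive id="mysql_front01-COMMON-db1"/>
    </resources>
    <constraints>
      <rsc_order id="order_front01-COMMON-db01"
                 from="vol1_front01-COMMON-db01" action="start" type="before"
                 to="mysql_front01-COMMON-db01" symmetrical="TRUE"/>
    </constraints>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    for constraint_id, attr, ref in unknown_order_refs(SAMPLE):
        print(f"{constraint_id}: {attr}={ref!r} matches no resource id")
```

If the check comes back empty for the real file, the ids agree and the cause lies elsewhere.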

Finally, my major concern is about "default_resource_stickiness", which
is set to INFINITY. After a successful switch from the master to the
slave, when I powered the master on again it took back the resources.
That is the opposite of what I was aiming for, i.e. no automatic
switch back.

I've attached my cib.xml and ha.cf (the Mysql and Kill resources are
custom). I use a Kill resource to power down the master node. I know
there's stonith with fencing for that, but I already have too much
stuff pending before digging into that part again.

Thanks beforehand for the help !

Cheers,
LD
Attachment (cib.xml): text/xml, 4782 bytes
Attachment (ha.cf): application/octet-stream, 393 bytes
