Ron Lorah | 1 Sep 2007 09:19

Virtual Hosting Software

Greetings,

I'm trying to put together a simple IP fail-over solution utilizing some
kind of virtual hosting software. Are any known to work? I tried Plesk as
a trial by fire, but requests just go to the default Apache page.

Thanks in advance,

~Ron
_______________________________________________
Linux-HA mailing list
Linux-HA <at> lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Michael Schwartzkopff | 2 Sep 2007 11:03

Understanding "orphan" LRM resources

Hi,

When I use the GUI
- add an apache/ocf resource, but do not start it and
- remove it immediately again

I still see it in the LRM resources, once per node:
cibadmin -Q -o status
(...)
<lrm_resources>
 <lrm_resource id="resource_apache" type="apache" class="ocf"
provider="heartbeat">
  <lrm_rsc_op id="resource_apache_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource"
transition_key="3:21:d3e0b6cb-0d0d-4b2b-8220-ab6bd96f1798"
transition_magic="0:7;3:21:d3e0b6cb-0d0d-4b2b-8220-ab6bd96f1798"
call_id="14" crm_feature_set="1.0.9" rc_code="7" op_status="0"
interval="0" op_digest="02e4774ae092e0e208c54ca2d084a1c7"/>
 </lrm_resource>
</lrm_resources>
(...)

Could anybody please explain why these resources still appear in my CIB?
Is there any automatic removal / cleanup process other than manually
invoking:
crm_resource -C -r resource_apache

I am using 2.1.2 from debian/unstable.

Thanks for the clarification.

mingdao lu | 2 Sep 2007 10:38

How to monitor a ip resource?

Hi all,

My server has two network interfaces, enet0 and enet1, and I use heartbeat
to manage one IP resource.
Initially the IP resource runs on enet0. When enet0 stops working, it should
move to enet1.
Can heartbeat do this, and how do I configure it?

Thanks
Mingdao

Christian Rishøj | 2 Sep 2007 12:44

Re: How to monitor a ip resource?

On 9/2/07, mingdao lu <mingdaolu <at> gmail.com> wrote:
> Hi all,
>
> My server has two network interfaces, enet0 and enet1, and I use heartbeat
> to manage one IP resource.
> Initially the IP resource runs on enet0. When enet0 stops working, it should
> move to enet1.
> Can heartbeat do this, and how do I configure it?

You could configure two IPaddr2 resources with the same IP address
(one for each network interface) and define a colocation rule on them
with -INFINITY, so that they would never run together.

Christian

Junko IKEDA | 3 Sep 2007 06:16

crmd dumps core while recovering from split-brain

Hi,

When some nodes recover from split-brain,
crmd dumps core.
Is this the intended behavior?
Is crmd supposed to shut down if the membership instance ID goes backwards?

heartbeat[2635]: 2007/09/03_11:48:21 info: Link prec370d:eth2 up.
...
heartbeat[2635]: 2007/09/03_11:48:22 info: Status update for node prec370d:
status active
crmd[2648]: 2007/09/03_11:48:22 notice: crmd_ha_status_callback: Status
update: Node prec370d now has status [active]
...
crmd[2648]: 2007/09/03_11:48:23 info: crmd_ccm_msg_callback: Quorum
(re)attained after event=NEW MEMBERSHIP (id=1)
crmd[2648]: 2007/09/03_11:48:23 ERROR: crmd_ccm_msg_callback: Membership
instance ID went backwards! 3->1
cib[2644]: 2007/09/03_11:48:23 info: cib_ccm_msg_callback: PEER: prec370e
crmd[2648]: 2007/09/03_11:48:23 ERROR: crm_abort: crmd_ccm_msg_callback:
Triggered fatal assert at callbacks.c:526 : current_ccm_membership_id <=
membership->m_instance
pengine[2655]: 2007/09/03_11:48:23 ERROR: subsystem_msg_dispatch: The server
2648 has left us: Shutting down...NOW
heartbeat[2635]: 2007/09/03_11:48:23 WARN: Exiting /usr/lib64/heartbeat/crmd
process 2648 killed by signal 6 [SIGABRT - Abort].
heartbeat[2635]: 2007/09/03_11:48:23 ERROR: Exiting
/usr/lib64/heartbeat/crmd process 2648 dumped core <= ???
tengine[2654]: 2007/09/03_11:48:23 ERROR: subsystem_msg_dispatch: The server
2648 has left us: Shutting down...NOW

Junko IKEDA | 3 Sep 2007 09:04

the way to stop all resources after recovering from Split-Brain

Hi,

Usually, a resource is reallocated to the appropriate node once an
incident such as split-brain has passed.
That is exactly the right behavior,
but is there any way I can stop all resources after recovering from
split-brain?
With STONITH, perhaps?

Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION


Dejan Muhamedagic | 3 Sep 2007 12:11

Re: Understanding "orphan" LRM resources

Hi,

On Sun, Sep 02, 2007 at 11:03:47AM +0200, Michael Schwartzkopff wrote:
> Hi,
> 
> When I use the GUI
> - add an apache/ocf resource, but do not start it and
> - remove it immediately again
> 
> I still see it in the LRM resources, once per node:
> cibadmin -Q -o status
> (...)
> <lrm_resources>
>  <lrm_resource id="resource_apache" type="apache" class="ocf"
> provider="heartbeat">
>   <lrm_rsc_op id="resource_apache_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource"
> transition_key="3:21:d3e0b6cb-0d0d-4b2b-8220-ab6bd96f1798"
> transition_magic="0:7;3:21:d3e0b6cb-0d0d-4b2b-8220-ab6bd96f1798"
> call_id="14" crm_feature_set="1.0.9" rc_code="7" op_status="0"
> interval="0" op_digest="02e4774ae092e0e208c54ca2d084a1c7"/>
>  </lrm_resource>
> </lrm_resources>
> (...)
> 
> Please could anybody explain why these resources still appear in my CIB?

This is most probably a probe monitor operation.
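
To check this against the status section, something like the following
could help. It is only a rough sketch: it assumes that a monitor operation
with interval="0" is a probe, and it reads rc_code 7 as the OCF "not
running" return code. The function name find_probe_results and the trimmed
sample XML are made up for illustration; in practice the input would be the
output of "cibadmin -Q -o status".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the status fragment quoted above (illustrative only).
STATUS_XML = """
<lrm_resources>
 <lrm_resource id="resource_apache" type="apache" class="ocf" provider="heartbeat">
  <lrm_rsc_op id="resource_apache_monitor_0" operation="monitor"
   call_id="14" rc_code="7" op_status="0" interval="0"/>
 </lrm_resource>
</lrm_resources>
"""

OCF_NOT_RUNNING = 7  # OCF return code meaning "resource is not running"


def find_probe_results(xml_text):
    """Return (resource id, rc_code) for each one-shot monitor op.

    Assumption: a monitor op with interval="0" is a probe.
    """
    root = ET.fromstring(xml_text)
    probes = []
    for rsc in root.iter("lrm_resource"):
        for op in rsc.iter("lrm_rsc_op"):
            if op.get("operation") == "monitor" and op.get("interval") == "0":
                probes.append((rsc.get("id"), int(op.get("rc_code"))))
    return probes


for rsc_id, rc in find_probe_results(STATUS_XML):
    state = "not running" if rc == OCF_NOT_RUNNING else "rc=%d" % rc
    print("%s: probe result %s" % (rsc_id, state))
```

An entry reported this way would only record that the node was asked
whether the resource is running there and answered no, which fits a
resource that was added and removed without ever being started.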

> Is there any other automatic removal / cleanup process other than manual

sebastien lorandel | 3 Sep 2007 12:27

Re: Re: nodes won't auto_failback after network failure

Ok,

now I don't have this error anymore, but services still don't fail back to
my node (ha2), not even when my other node (ha1) is shut down. So I
don't have any services running in my cluster anymore...

- After I reconnect the cable, the cluster sees it and says that both nodes
are able to run services:

Sep  3 12:01:13 ha1 crmd: [20955]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.

- But then it can't make them run again on the node that failed:

Sep  3 12:01:13 ha1 pengine: [22141]: info: determine_online_status: Node
ha2 is online
Sep  3 12:01:13 ha1 pengine: [22141]: WARN: unpack_rsc_op: Processing failed
op (IPaddr_start_0) on ha2
Sep  3 12:01:13 ha1 pengine: [22141]: WARN: unpack_rsc_op: Handling failed
start for IPaddr on ha2
Sep  3 12:01:13 ha1 pengine: [22141]: WARN: unpack_rsc_op: Processing failed
op (IPaddr_monitor_5000) on ha2

- Then, when I stop ha1, which was running my resources, it says ha2 is
eligible:

Sep  3 12:17:42 ha2 crmd: [7347]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.

- And then...

Dominik Klein | 3 Sep 2007 15:09

Re: patch: bug in the xen resource agent

> Apply my patch to the Resource Agent and it should be good as gold. I've 
> been running some testing and it seems to be quite stable.

I had some time to test it and it really looks good.

But: I need to put some constraints on my Xen machine. Once I add
constraints, "crm_resource -M -r domU" does not migrate the machine but
stops and starts it.

Is there any way around this (except for coding the constraints into the 
"migrate_to" function)?

Regards
Dominik


Dejan Muhamedagic | 3 Sep 2007 16:18

Re: Is the following logic by design for the Split-Brain Case ? If yes - can i disable it ?

Hi,

On Fri, Aug 31, 2007 at 09:05:54AM -0700, Harakiri wrote:
> Hello,
> 
> > It is most probably a bug. The cluster should be
> > able to recover
> > from split brain. Please post the logs.
> > 
> > Dejan
> 
> attached to this message are the log files.
> 
> server1_network_down.txt - the log of server1 when the
> network went down
> 
> server2_network_down.txt - the log of server2 when the
> network went down
> 
> server1_network_restored.txt - the log of server1 when
> the network has been restored
> 
> server2_network_restored.txt - the log of server2 when
> the network has been restored
> 
> resource_my_service = the service which has been
> configured for heartbeat

I read the logs, and everything looks fine there. I don't know why
crm_mon shows the nodes as offline on that one node. In the logs,

