Rajkumar, Anoop | 1 Jul 2010 16:11

Re: RHEL Cluster node fencing and cluster

Hi

The GFS file system I created is not being used by the services running on this cluster. I was planning to use it for a database, but I ran into the stale cluster problem as soon as the file system was created. After switching to gfs instead of gfs2, everything is fine.

Thanks
Anoop


Message: 2
Date: Tue, 29 Jun 2010 13:16:42 +0300
From: jacob ishak <jacob.ishak <at> gmail.com>
To: linux clustering <linux-cluster <at> redhat.com>
Subject: Re: [Linux-cluster] RHEL Cluster node fencing and cluster
Message-ID:
        <AANLkTik2y7_oRHENLq-Z6shLwcV-AsBtB6zsENHm2LQh <at> mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

in your cluster.conf

fstype="ext3"

it should be fstype="gfs" or gfs2

BR
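
A quick way to see which fstype each <fs> resource declares, before and after a change like the one suggested above, is to parse cluster.conf and list the resources. The following is only a minimal sketch: the helper name list_fs_types and the trimmed SAMPLE_CONF are illustrative assumptions, and only the two <fs> entries are taken from the config quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample modelled on the cluster.conf quoted in this thread.
# Only the <fs> resources are kept; everything else is omitted for brevity.
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster alias="cluster1" config_version="33" name="cluster1">
  <rm>
    <resources>
      <fs device="/dev/sda2" fstype="ext3" mountpoint="/var/www/html" name="httpd-content"/>
      <fs device="/dev/sda1" fstype="ext3" mountpoint="/var/lib/mysql" name="mysql-content"/>
    </resources>
  </rm>
</cluster>
"""


def list_fs_types(conf_xml):
    """Return (name, device, fstype) for every <fs> under <rm>/<resources>."""
    root = ET.fromstring(conf_xml)
    return [
        (fs.get("name"), fs.get("device"), fs.get("fstype"))
        for fs in root.findall("./rm/resources/fs")
    ]


if __name__ == "__main__":
    for name, device, fstype in list_fs_types(SAMPLE_CONF):
        print(f"{name}: {device} is declared as {fstype}")
```

On a real node the same function could be fed the contents of the actual cluster.conf instead of the sample string.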

On Mon, Jun 28, 2010 at 8:42 PM, Rajkumar, Anoop
<anoop_rajkumar <at> merck.com>wrote:

>  Hi
>
> I am no longer hitting the problem of the cluster going stale now that I
> created a gfs file system instead of gfs2. Here is my cluster.conf file.
>
> [root <at> system1 cluster]# more cluster.conf
> <?xml version="1.0"?>
> <cluster alias="cluster1" config_version="33" name="cluster1">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="100"/>
>         <clusternodes>
>                 <clusternode name="system1.merck.com" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="system1r"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="system2.merck.com" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="system2r"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_ilo" hostname="system1r.merck.com" login="admin" name="system1r" passwd="Anwyccdfy57"/>
>                 <fencedevice agent="fence_ilo" hostname="system2r.merck.com" login="admin" name="system1r" passwd="Anwyccdfy57"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="webdomain" nofailback="0" ordered="1" restricted="1">
>                                 <failoverdomainnode name="system1.merck.com" priority="1"/>
>                                 <failoverdomainnode name="system2.merck.com" priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="54.3.xyz.abc" monitor_link="1"/>
>                         <script file="/etc/init.d/orig.httpd" name="http startup script"/>
>                         <fs device="/dev/sda2" force_fsck="0" force_unmount="0" fsid="6443" fstype="ext3" mountpoint="/var/www/html" name="httpd-content" options="" self_fence="0"/>
>                         <fs device="/dev/sda1" force_fsck="0" force_unmount="0" fsid="30579" fstype="ext3" mountpoint="/var/lib/mysql" name="mysql-content" options="" self_fence="0"/>
>                         <script file="/etc/init.d/mysqld" name="mysql startup script"/>
>                         <ip address="192.168.0.3" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="webdomain" name="http-service" recovery="restart">
>                         <script ref="http startup script"/>
>                         <fs ref="httpd-content"/>
>                         <ip ref="54.3.xyz.abc"/>
>                 </service>
>                 <service autostart="1" domain="webdomain" exclusive="0" name="mysql" recovery="disable">
>                         <fs ref="mysql-content"/>
>                         <script ref="mysql startup script"/>
>                         <ip ref="192.168.0.3"/>
>                 </service>
>         </rm>
> </cluster>
>
> Thanks
> Anoop
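
One detail in the quoted config may be worth a second look: both <fencedevice> entries carry name="system1r", while the second node's fence method refers to a device named "system2r". Whether that is a paste error or the real file cannot be told from the thread. The sketch below shows one possible way to cross-check fence device names; the function name check_fence_devices is hypothetical and SAMPLE_CONF is a trimmed copy of the quoted file (login and passwd attributes omitted).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the fencing-related parts of the quoted cluster.conf.
SAMPLE_CONF = """<cluster name="cluster1" config_version="33">
  <clusternodes>
    <clusternode name="system1.merck.com" nodeid="1">
      <fence><method name="1"><device name="system1r"/></method></fence>
    </clusternode>
    <clusternode name="system2.merck.com" nodeid="2">
      <fence><method name="1"><device name="system2r"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="system1r.merck.com" name="system1r"/>
    <fencedevice agent="fence_ilo" hostname="system2r.merck.com" name="system1r"/>
  </fencedevices>
</cluster>
"""


def check_fence_devices(conf_xml):
    """Return (duplicate device names, referenced-but-undefined device names)."""
    root = ET.fromstring(conf_xml)
    # Count how often each fence device name is defined.
    defined = Counter(
        dev.get("name") for dev in root.findall("./fencedevices/fencedevice")
    )
    # Collect every device name that a node's fence method refers to.
    referenced = {
        dev.get("name")
        for dev in root.findall("./clusternodes/clusternode/fence/method/device")
    }
    duplicates = sorted(name for name, count in defined.items() if count > 1)
    missing = sorted(referenced - set(defined))
    return duplicates, missing


if __name__ == "__main__":
    duplicates, missing = check_fence_devices(SAMPLE_CONF)
    print("defined more than once:", duplicates)
    print("referenced but not defined:", missing)
```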

C. Handel | 2 Jul 2010 15:59

Re: postgres cluster without shared storage (ESGLinux)

> I need to set up a two-node cluster with postgres as a service. I have set
> one up in the past, but with shared storage and GFS, and now I don't have
> that element.
>
> The idea is to have a master node with all the data on its own disk and a
> mechanism to replicate this data to the slave node on its own disk. If the
> master goes down, the slave begins to provide the service and the flow of data
> goes from this node to the other one (the slave node becomes the master
> one).
>
> Is it possible to do something like this?

To get a good HA solution you always need to know all the edge
cases of the clients. If you can modify the clients/application, it is
much easier to get a working HA solution.

You also need to define what HA means for you. Failover in seconds? Minutes?
Seamless? Client reconnect?

With postgresql 8.3+ (I think it started with 8.3) you can do
log shipping. That results in an active server accepting connections
and a standby. The standby does not accept connections. It follows the
original data pretty closely but can lag behind (the same lag applies to mysql
replication, except that mysql allows read-only access on the slave). You need to
script the failover. That is, detect that the master misbehaves,
shoot it (important!) and switch the standby from replication to
service. You can't have a switchback to the master. Clients need to
detect that they lost the connection and reconnect. This solution is
straightforward, easy to understand and holds no surprises (no "so why
did it do this? How did I lose this transaction?"). Failover time
depends on timeouts (how long do you wait before you decide that the
master is dead?) but can be kept under five minutes.
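
The scripted failover described above (detect, shoot, then promote) can be sketched as a small decision loop. This is only an illustration under stated assumptions: the class name FailoverMonitor, the threshold of three failed checks and the action labels "fence_master" and "promote_standby" are all hypothetical, and the real fencing and promotion commands are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decide when to fail over, based on consecutive failed health checks."""

    max_failures: int = 3   # hypothetical threshold before the master is declared dead
    failures: int = 0       # consecutive failed checks seen so far
    promoted: bool = False  # once promoted, there is no automatic switchback

    def observe(self, master_healthy: bool) -> list:
        """Feed one health-check result; return the actions to run, in order."""
        if self.promoted:
            return []
        if master_healthy:
            self.failures = 0
            return []
        self.failures += 1
        if self.failures < self.max_failures:
            return []
        self.promoted = True
        # Fence first, so the old master cannot keep accepting writes,
        # and only then switch the standby from replication to service.
        return ["fence_master", "promote_standby"]


if __name__ == "__main__":
    monitor = FailoverMonitor(max_failures=3)
    for healthy in [True, False, False, False, False]:
        actions = monitor.observe(healthy)
        if actions:
            print("run:", actions)
```

The check interval multiplied by the threshold is what determines the failover time mentioned above; a single successful check resets the counter so that one dropped probe does not trigger a failover.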

Starting with postgresql 9 you can have a hot standby that accepts
read-only connections. And the standby can upgrade the connections to
read-write without a reconnect. You still need to work your way
through the IP failover. I haven't used this (and 9 is still beta).

Greetings
   Christoph

AlannY | 4 Jul 2010 17:24

Cross mount of block devices on 2 nodes

Hi there. I'm new to clustering, so I have a question.

I have 2 nodes, for example ONE and TWO. On ONE, I can export a disk block
device (via GNBD or iSCSI) to TWO. On TWO, I can export another disk block
device and import it on ONE.

Some ASCII art:

  +-------------+          +-----------+
  | ONE +-------+          | TWO       |
  |     | disk1 | -------> |           |
  |     +-------+          +-------+   |
  |             | <------- | disk2 |   |
  |             |          +-------+   |
  +-------------+          +-------+---+

Is it possible? Can I combine these 2 block devices into one LVM volume group?
Can I format this volume group with GFS2 (or any other cluster filesystem)?

Are there any pros and cons to this method?

Thanks for your patience.

P.S. I'm not dumb, just a newbie.

Joseph L. Casale | 4 Jul 2010 20:39

Re: Cross mount of block devices on 2 nodes

>Is it possible? Can I append this 2 block devices in one LVM volume group?
>Can I format this volume group in GFS2 (or any other cluster filesystem)?
>
>Is there any pros and cons for this method?

Ugh, why reinvent the wheel...
Look into DRBD; it essentially does what you want. Or just use the tools
RH provides to create shared storage, if that is in fact what you
need.

What exactly are you trying to achieve?

jlc

Kaloyan Kovachev | 4 Jul 2010 20:55

Re: Cross mount of block devices on 2 nodes

Hi,

On Sun, 04 Jul 2010 19:24:03 +0400, AlannY <m <at> alanny.ru> wrote:
> Hi there. I'm new in clustering, so I have a question.
> 
> I have 2 nodes. For example, ONE and TWO. On ONE, I can export disk block
> device
> (via GNBD or iSCSI) to TWO. On TWO, I can export another disk block device
> and import
> it in ONE.
> 
> Some, ASCII art:
> 
>   +-------------+          +-----------+
>   | ONE +-------+          | TWO       |
>   |     | disk1 | -------> |           |
>   |     +-------+          +-------+   |
>   |             | <------- | disk2 |   |
>   |             |          +-------+   |
>   +-------------+          +-------+---+
> 
> Is it possible? Can I append this 2 block devices in one LVM volume group?
> Can I format this volume group in GFS2 (or any other cluster filesystem)?
> 

Yes, it is possible, but I guess you want to make a mirror between the two.
In that case, take a look at DRBD.

> Is there any pros and cons for this method?
> 

It depends on your use case, but you haven't provided any info about it.

> Thanks for patience.
> 
> P.S. I'm not dumb, just newbie.
> 
> --
> Linux-cluster mailing list
> Linux-cluster <at> redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

Torintino T | 5 Jul 2010 22:46

Redhat Cluster

 
Dear All,

I am a newbie to Linux clustering. I have 2 standalone Red Hat servers and I want to set up a cluster on them
so that they synchronize with each other and one acts as standby for the other: if one fails, the other takes over.

I will mostly use Apache, MySQL, and PHP.

I have read the "Cluster Administration" document and found that there are multiple methods to set up the cluster.
I would like to ask the clustering experts which method is the most appropriate,
and whether I should use a fence device and, if so, which one is preferable.

Thanks for your assistance.

Abraham Alawi | 6 Jul 2010 03:22

CLVM/GFS hangs while rebuilding an AoE RAID5 LUN, anyone experienced that before? any clue?

The system was running well for a while, but lately we had a flaky disk in the RAID array, which we replaced with a
healthy one. Suddenly CLVM/GFS became unusable: we can mount GFS, but listing it recursively with
'ls -R' hangs with an Input/output error. We can't even read the cLVM LUN raw using 'dd', BUT we can still
read the LVM PV devices using 'dd'. Reconfiguring the LVM volume as a local one and accessing it
exclusively from one node doesn't make a difference.

RHEL5: 2.6.18-164.11.1.el5
# modinfo gfs
filename:       /lib/modules/2.6.18-164.11.1.el5/weak-updates/gfs/gfs.ko
license:        GPL
author:         Red Hat, Inc.
description:    Global File System 0.1.34-2.el5
srcversion:     3B1BAC4069F1A4B556A958A
depends:        dlm
vermagic:       2.6.18-159.el5 SMP mod_unload gcc-4.1

# uname -r
2.6.18-164.11.1.el5

# modinfo /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
description:    AoE block/char driver for 2.6.2 and newer 2.6 kernels
author:         Sam Hopkins <sah <at> coraid.com>
license:        GPL
srcversion:     42BF122979AC807F2BB50E6
depends:        
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
parm:           aoe_iflist:aoe_iflist=dev1[,dev2...]
 (string)
parm:           version:aoe module version 74
 (string)
parm:           aoe_dyndevs:Use dynamic minor numbers for devices. (int)
parm:           aoe_deadsecs:After aoe_deadsecs seconds, give up and fail dev. (int)
parm:           aoe_maxout:Only aoe_maxout outstanding packets for every MAC on eX.Y. (int)
parm:           aoe_maxsectors:When nonzero, set the maximum number of sectors per I/O request in new devices. (int)

# modinfo dlm
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/fs/dlm/dlm.ko
license:        GPL
author:         Red Hat, Inc.
description:    Distributed Lock Manager
srcversion:     E768995007648CA8DB078AE
depends:        configfs
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
module_sig:	883f3504b56fe19c59c69348c13cf1f1126a509f6ddaee3965ee8b5fcd04163669647a889a9801e09f722187d1de068c0d52cd2b99bc3d475cb6ca1a0

Here is what the kernel spits out:

Jul  6 11:27:36 kiwiland kernel: GFS 0.1.34-2.el5 (built Sep  9 2009 06:54:42) installed
Jul  6 11:27:36 kiwiland kernel: Lock_DLM (built Sep  9 2009 06:54:38) installed
Jul  6 11:27:36 kiwiland kernel: Lock_Nolock (built Sep  9 2009 06:54:37) installed
Jul  6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:files"
Jul  6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Trying to acquire journal lock...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Looking at journal...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Acquiring the transaction lock...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replaying journal...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replayed 0 of 11 blocks
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: replays = 0, skips = 4, sames = 7
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Journal replayed in 1s
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Done
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Trying to acquire journal lock...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Looking at journal...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Done
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Scanning for log elements...
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found 2 unlinked inodes
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found quota changes for 2 IDs
Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Done
Jul  6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:webcluster"
Jul  6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Trying to acquire journal lock...
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Looking at journal...
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Done
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Scanning for log elements...
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found 0 unlinked inodes
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found quota changes for 0 IDs
Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Done
Jul  6 11:27:37 kiwiland kernel: Installing knfsd (copyright (C) 1996 okir <at> monad.swb.de).
Jul  6 11:27:39 kiwiland kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jul  6 11:27:39 kiwiland kernel: NFSD: starting 90-second grace period
Jul  6 11:32:21 kiwiland kernel: dlm: closing connection to node 1
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Trying to acquire journal lock...
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   bh = 1432543247 (magic)
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   function = gfs_rgrp_read
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/rgrp.c, line = 830
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   time = 1278372781
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Looking at journal...
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Acquiring the transaction lock...
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replaying journal...
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replayed 0 of 0 blocks
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: replays = 0, skips = 0, sames = 0
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Journal replayed in 1s
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Done
Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:files.0: withdrawn
Jul  6 11:33:02 kiwiland kernel: 
Jul  6 11:33:02 kiwiland kernel: Call Trace:
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88805018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80063a36>] __wait_on_bit+0x60/0x6e
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8001538b>] sync_buffer+0x0/0x3f
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881cc97>] :gfs:gfs_meta_check_ii+0x32/0x3e
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88819439>] :gfs:gfs_rgrp_read+0x139/0x225
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fb8e8>] :gfs:glock_wait_internal+0x229/0x2c3
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fbd17>] :gfs:gfs_glock_nq+0x395/0x3d6
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fbd6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88817466>] :gfs:gfs_rgrp_lvb_init+0x1e/0x3f
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881a46f>] :gfs:gfs_stat_gfs+0x213/0x273
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881353d>] :gfs:gfs_statfs+0x67/0xea
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff800deba3>] vfs_statfs+0x63/0x7f
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886d2ce>] :nfsd:nfsd_statfs+0x28/0x38
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff888745f8>] :nfsd:nfsd3_proc_fsstat+0x3f/0x54
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff886e0529>] :sunrpc:svc_process+0x454/0x71b
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80064644>] __down_read+0x12/0x92
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a746>] :nfsd:nfsd+0x1a5/0x2cb
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
Jul  6 11:33:02 kiwiland kernel: 

Kernel output from another node:
Jul  5 02:01:19 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278252079
Jul  5 03:01:16 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278255676
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   bh = 86700288 (magic)
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   function = gfs_get_meta_buffer
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 1225
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   time = 1278255737
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
Jul  5 03:02:21 Hercules kernel: GFS: fsid=FSC:files.0: withdrawn
Jul  5 03:02:21 Hercules kernel: 
Jul  5 03:02:21 Hercules kernel: Call Trace:
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8880a018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8001538b>] sync_buffer+0x0/0x3f
Jul  5 03:02:21 Hercules kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Jul  5 03:02:21 Hercules kernel:  [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
Jul  5 03:02:21 Hercules kernel:  [<ffffffff88821c97>] :gfs:gfs_meta_check_ii+0x32/0x3e
Jul  5 03:02:21 Hercules kernel:  [<ffffffff887f7717>] :gfs:gfs_get_meta_buffer+0x1d1/0x247
Jul  5 03:02:21 Hercules kernel:  [<ffffffff88804193>] :gfs:gfs_copyin_dinode+0x1d/0x12f
Jul  5 03:02:21 Hercules kernel:  [<ffffffff88800d6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
Jul  5 03:02:21 Hercules kernel:  [<ffffffff888043e3>] :gfs:inode_create+0x13e/0x1df
Jul  5 03:02:21 Hercules kernel:  [<ffffffff88804a5d>] :gfs:gfs_inode_get+0x9d/0xba
Jul  5 03:02:21 Hercules kernel:  [<ffffffff888053bb>] :gfs:gfs_lookupi+0x33d/0x3df
Jul  5 03:02:21 Hercules kernel:  [<ffffffff887fce57>] :gfs:ea_find_i+0x0/0x6b
Jul  5 03:02:21 Hercules kernel:  [<ffffffff888172af>] :gfs:gfs_lookup+0x363/0x41a
Jul  5 03:02:21 Hercules kernel:  [<ffffffff80025426>] igrab+0x25/0x34
Jul  5 03:02:21 Hercules kernel:  [<ffffffff888055a0>] :gfs:gfs_iget+0x3d/0x1f1
Jul  5 03:02:21 Hercules kernel:  [<ffffffff88801224>] :gfs:gfs_glock_dq+0x13c/0x14b
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000cf01>] do_lookup+0xe5/0x1e6
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000a22b>] __link_path_walk+0xa01/0xf42
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
Jul  5 03:02:21 Hercules kernel:  [<ffffffff80012752>] getname+0x15b/0x1c2
Jul  5 03:02:21 Hercules kernel:  [<ffffffff800236ba>] __user_walk_fd+0x37/0x4c
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8003f235>] vfs_lstat_fd+0x18/0x47
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8002a95a>] sys_newlstat+0x19/0x31
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8005dde9>] error_exit+0x0/0x84
Jul  5 03:02:21 Hercules kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83

Thanks in advance,

  -- Abraham

''''''''''''''''''''''''''''''''''''''''''''''''''''''
Abraham Alawi

Unix/Linux Systems Administrator
Science IT
University of Auckland
e: a.alawi <at> auckland.ac.nz
p: +64-9-373 7599, ext#: 87572

''''''''''''''''''''''''''''''''''''''''''''''''''''''

POWERBALL ONLINE | 6 Jul 2010 05:44

Re: Redhat Cluster

I think you should use Conga because it is very easy.

2010/7/6 Torintino T <torintino1 <at> hotmail.com>
 
Dear All,

I am a newbie to Linux clustering. I have 2 standalone Red Hat servers and want to set up a cluster on them
so that they synchronize with each other and one acts as a standby for the other: if one fails, the other takes over.

I will mostly use Apache, MySQL, and PHP.

I have read the "Cluster Administration" document and found that there are multiple methods for setting up the cluster.
I would like to ask people experienced in clustering which method is the most appropriate,
and whether I should use a fence device and, if so, which one is preferable.

Thanks for assistance.

--
Linux-cluster mailing list
Linux-cluster <at> redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Steven Whitehouse | 6 Jul 2010 10:22

Re: CLVM/GFS hangs while rebuilding an AoE RAID5 LUN, anyone experienced that before? any clue?

Hi,

It looks to me as if the fs is corrupt in some manner. Make sure you have
a backup of the data in question first, then try unmounting the filesystem
on all nodes and running fsck on it from one node. Save the output of fsck
in case it is useful for future debugging.
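
A minimal sketch of that sequence, under some assumptions: the device path, mount point, and log file below are hypothetical placeholders, and the `step` helper is only an illustration that prints each command unless RUN=1 is set. `gfs_fsck` is the checker shipped for GFS (not GFS2) on RHEL5; `-n` answers "no" to every repair prompt and `-y` answers "yes".

```shell
#!/bin/sh
# Sketch only. DEVICE, MOUNTPOINT and LOG are hypothetical; set them for your cluster.
DEVICE="${DEVICE:-/dev/vg_fsc/files}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/files}"
LOG="${LOG:-/root/gfs_fsck-files.log}"

# Prints each step by default; set RUN=1 to actually execute it.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        echo "would run: $1"
    fi
}

# On every node that has the filesystem mounted:
step "umount $MOUNTPOINT"

# Then on one node only. Read-only pass first (-n makes no changes),
# keeping the full output for later debugging:
step "gfs_fsck -n $DEVICE 2>&1 | tee $LOG"

# Repair pass (-y accepts every fix); run it only once a backup exists:
step "gfs_fsck -y $DEVICE 2>&1 | tee -a $LOG"
```

Running the read-only pass first gives a record of what fsck found before anything on disk is changed.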

It's tricky to say exactly what might have gone wrong (the fsck output
might give a clue), but you will certainly need fsck to fix whatever the
problem is.

Steve.

On Tue, 2010-07-06 at 13:22 +1200, Abraham Alawi wrote:
> The system was running well for a while, but lately we had a flaky disk in the RAID array, which we replaced with
> a healthy one. Suddenly the CLVM/GFS became unusable: we can mount GFS, but listing it recursively with
> 'ls -R' hangs with an Input/output error. We can't even access the cLVM LUN directly using 'dd', BUT we can still
> access the LVM PV devices using 'dd'. Reconfiguring the LVM volume as a local one and accessing it
> exclusively from one node doesn't make a difference.
> 
> RHEL5: 2.6.18-164.11.1.el5
> # modinfo gfs
> filename:       /lib/modules/2.6.18-164.11.1.el5/weak-updates/gfs/gfs.ko
> license:        GPL
> author:         Red Hat, Inc.
> description:    Global File System 0.1.34-2.el5
> srcversion:     3B1BAC4069F1A4B556A958A
> depends:        dlm
> vermagic:       2.6.18-159.el5 SMP mod_unload gcc-4.1
> 
> # uname -r
> 2.6.18-164.11.1.el5
> 
> # modinfo /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
> filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/block/aoe/aoe.ko
> description:    AoE block/char driver for 2.6.2 and newer 2.6 kernels
> author:         Sam Hopkins <sah <at> coraid.com>
> license:        GPL
> srcversion:     42BF122979AC807F2BB50E6
> depends:        
> vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
> parm:           aoe_iflist:aoe_iflist=dev1[,dev2...]
>  (string)
> parm:           version:aoe module version 74
>  (string)
> parm:           aoe_dyndevs:Use dynamic minor numbers for devices. (int)
> parm:           aoe_deadsecs:After aoe_deadsecs seconds, give up and fail dev. (int)
> parm:           aoe_maxout:Only aoe_maxout outstanding packets for every MAC on eX.Y. (int)
> parm:           aoe_maxsectors:When nonzero, set the maximum number of sectors per I/O request in new devices. (int)
> 
> # modinfo dlm
> filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/fs/dlm/dlm.ko
> license:        GPL
> author:         Red Hat, Inc.
> description:    Distributed Lock Manager
> srcversion:     E768995007648CA8DB078AE
> depends:        configfs
> vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
> module_sig:	883f3504b56fe19c59c69348c13cf1f1126a509f6ddaee3965ee8b5fcd04163669647a889a9801e09f722187d1de068c0d52cd2b99bc3d475cb6ca1a0
> 
> 
> 
> Here is what the kernel spits out:
> 
> Jul  6 11:27:36 kiwiland kernel: GFS 0.1.34-2.el5 (built Sep  9 2009 06:54:42) installed
> Jul  6 11:27:36 kiwiland kernel: Lock_DLM (built Sep  9 2009 06:54:38) installed
> Jul  6 11:27:36 kiwiland kernel: Lock_Nolock (built Sep  9 2009 06:54:37) installed
> Jul  6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:files"
> Jul  6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Trying to acquire journal lock...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Looking at journal...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Acquiring the transaction lock...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replaying journal...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Replayed 0 of 11 blocks
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: replays = 0, skips = 4, sames = 7
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Journal replayed in 1s
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=0: Done
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Trying to acquire journal lock...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Looking at journal...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: jid=1: Done
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Scanning for log elements...
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found 2 unlinked inodes
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Found quota changes for 2 IDs
> Jul  6 11:27:36 kiwiland kernel: GFS: fsid=FSC:files.0: Done
> Jul  6 11:27:36 kiwiland kernel: Trying to join cluster "lock_dlm", "FSC:webcluster"
> Jul  6 11:27:36 kiwiland kernel: Joined cluster. Now mounting FS...
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Trying to acquire journal lock...
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Looking at journal...
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=1: Done
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Scanning for log elements...
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found 0 unlinked inodes
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Found quota changes for 0 IDs
> Jul  6 11:27:37 kiwiland kernel: GFS: fsid=FSC:webcluster.1: Done
> Jul  6 11:27:37 kiwiland kernel: Installing knfsd (copyright (C) 1996 okir <at> monad.swb.de).
> Jul  6 11:27:39 kiwiland kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Jul  6 11:27:39 kiwiland kernel: NFSD: starting 90-second grace period
> Jul  6 11:32:21 kiwiland kernel: dlm: closing connection to node 1
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Trying to acquire journal lock...
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   bh = 1432543247 (magic)
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   function = gfs_rgrp_read
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/rgrp.c, line = 830
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0:   time = 1278372781
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
> Jul  6 11:33:01 kiwiland kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Looking at journal...
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Acquiring the transaction lock...
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replaying journal...
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Replayed 0 of 0 blocks
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: replays = 0, skips = 0, sames = 0
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Journal replayed in 1s
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:webcluster.1: jid=0: Done
> Jul  6 11:33:02 kiwiland kernel: GFS: fsid=FSC:files.0: withdrawn
> Jul  6 11:33:02 kiwiland kernel: 
> Jul  6 11:33:02 kiwiland kernel: Call Trace:
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88805018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80063a36>] __wait_on_bit+0x60/0x6e
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8001538b>] sync_buffer+0x0/0x3f
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881cc97>] :gfs:gfs_meta_check_ii+0x32/0x3e
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88819439>] :gfs:gfs_rgrp_read+0x139/0x225
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fb8e8>] :gfs:glock_wait_internal+0x229/0x2c3
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fbd17>] :gfs:gfs_glock_nq+0x395/0x3d6
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff887fbd6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff88817466>] :gfs:gfs_rgrp_lvb_init+0x1e/0x3f
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881a46f>] :gfs:gfs_stat_gfs+0x213/0x273
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8881353d>] :gfs:gfs_statfs+0x67/0xea
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff800deba3>] vfs_statfs+0x63/0x7f
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886d2ce>] :nfsd:nfsd_statfs+0x28/0x38
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff888745f8>] :nfsd:nfsd3_proc_fsstat+0x3f/0x54
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a1db>] :nfsd:nfsd_dispatch+0xd8/0x1d6
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff886e0529>] :sunrpc:svc_process+0x454/0x71b
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff80064644>] __down_read+0x12/0x92
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a746>] :nfsd:nfsd+0x1a5/0x2cb
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8886a5a1>] :nfsd:nfsd+0x0/0x2cb
> Jul  6 11:33:02 kiwiland kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
> Jul  6 11:33:02 kiwiland kernel: 
> 
> 
> Another node's kernel spat out:
> Jul  5 02:01:19 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278252079
> Jul  5 03:01:16 Hercules kernel: GFS: fsid=FSC:files.0: fast statfs start time = 1278255676
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: fatal: invalid metadata block
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   bh = 86700288 (magic)
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   function = gfs_get_meta_buffer
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   file = /builddir/build/BUILD/gfs-kmod-0.1.34/_kmod_build_/src/gfs/dio.c, line = 1225
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0:   time = 1278255737
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: about to withdraw from the cluster
> Jul  5 03:02:17 Hercules kernel: GFS: fsid=FSC:files.0: telling LM to withdraw
> Jul  5 03:02:21 Hercules kernel: GFS: fsid=FSC:files.0: withdrawn
> Jul  5 03:02:21 Hercules kernel: 
> Jul  5 03:02:21 Hercules kernel: Call Trace:
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8880a018>] :gfs:gfs_lm_withdraw+0xc4/0xd3
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8001538b>] sync_buffer+0x0/0x3f
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff800a00e5>] wake_bit_function+0x0/0x23
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff88821c97>] :gfs:gfs_meta_check_ii+0x32/0x3e
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff887f7717>] :gfs:gfs_get_meta_buffer+0x1d1/0x247
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff88804193>] :gfs:gfs_copyin_dinode+0x1d/0x12f
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff88800d6e>] :gfs:gfs_glock_nq_init+0x16/0x2a
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff888043e3>] :gfs:inode_create+0x13e/0x1df
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff88804a5d>] :gfs:gfs_inode_get+0x9d/0xba
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff888053bb>] :gfs:gfs_lookupi+0x33d/0x3df
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff887fce57>] :gfs:ea_find_i+0x0/0x6b
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff888172af>] :gfs:gfs_lookup+0x363/0x41a
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff80025426>] igrab+0x25/0x34
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff888055a0>] :gfs:gfs_iget+0x3d/0x1f1
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff88801224>] :gfs:gfs_glock_dq+0x13c/0x14b
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000cf01>] do_lookup+0xe5/0x1e6
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000a22b>] __link_path_walk+0xa01/0xf42
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000e9cc>] link_path_walk+0x42/0xb2
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8000cc9c>] do_path_lookup+0x275/0x2f1
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff80012752>] getname+0x15b/0x1c2
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff800236ba>] __user_walk_fd+0x37/0x4c
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8003f235>] vfs_lstat_fd+0x18/0x47
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8002a95a>] sys_newlstat+0x19/0x31
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8005dde9>] error_exit+0x0/0x84
> Jul  5 03:02:21 Hercules kernel:  [<ffffffff8005d116>] system_call+0x7e/0x83
> 
> 
> Thanks in advance,
> 
>   -- Abraham

Peter McGowan | 6 Jul 2010 15:12

Two node cluster with iLO fencing.

Hi

I'm looking to put together a two-node cluster that will use HP iLOs for fencing. The systems have their bond0 interfaces connected to the public LAN, and I was intending to use a private non-routable network (bond1) purely for cluster communication traffic.

What's confused me is the line in the CMAN FAQ which says "If you are using per-node power management of any sort where the device is not shared between cluster nodes, it must be connected to the same network used by CMAN for cluster communication"

Is anyone able to define "same network" in this context? My iLOs will be reachable through bond0 (public LAN), but not through bond1 (cluster network) - does this count as "same network"?
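
For reference, a rough sketch of how one might check which interface a node would use to reach the other node's iLO. It does not settle what the FAQ means by "same network"; it only shows the path fence traffic would take. The address in ILO_ADDR is a made-up placeholder and `route_dev` is a hypothetical helper that pulls the interface name out of `ip route get` output.

```shell
#!/bin/sh
# ILO_ADDR is a hypothetical address for the peer node's iLO; replace it.
ILO_ADDR="${ILO_ADDR:-192.0.2.10}"

# Print the interface name that follows the "dev" keyword in
# `ip route get` output, e.g. "192.0.2.10 via 10.0.0.1 dev bond0 src 10.0.0.5".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Ask the kernel which route it would pick for the iLO address.
if command -v ip >/dev/null 2>&1; then
    dev=$(ip route get "$ILO_ADDR" 2>/dev/null | route_dev)
    echo "fence traffic to $ILO_ADDR would leave via: ${dev:-unknown}"
fi
```

With the layout described above, this would presumably report bond0 rather than bond1.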

Thanks for any clarification,
Peter
